May 17, 2022

Penn State University: IST Academic Search Engine ‘CiteSeerX’ Awarded ‘Best Open Source Project’ by British Computer Society (BCS)

Ed. Note: Congratulations to Dr. Giles and his team at PSU. We are longtime admirers (since the late 1990s) of his important and useful work and use CiteSeerX often.

From Penn State University:

Dr. C. Lee Giles (Image Source: Penn St.)

CiteSeerX, one of the world’s earliest open source academic search engines and based in the Penn State College of Information Sciences and Technology (IST), has been recognized by the Information Retrieval Specialist Group of the British Computer Society (BCS) as the Best Open Source Project as part of its 2021 Search Industry Awards.

“It’s quite an honor for Penn State and IST to have this recognition from such a prominent society,” said C. Lee Giles, David Reese Professor of Information Sciences and Technology and co-creator of the search engine.

Originally launched as CiteSeer in 1998 and renamed CiteSeerX in 2008, the search engine was one of the first platforms to implement automated citation indexing, which connects papers and researchers as a network. It actively crawls and harvests academic and scientific documents online and uses autonomous citation indexing, making it possible for users to find related papers using citation graphs. To perform this indexing and information extraction at scale, CiteSeerX uses several machine learning methods. It is often considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search.
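To make the idea of a citation graph concrete, here is a minimal sketch in Python. It is not CiteSeerX's code: the paper IDs, the `references` data, and the function names are all hypothetical, and co-citation counting is used here only as one simple way to find related papers. The sketch assumes the reference lists have already been extracted from the documents.

```python
from collections import defaultdict

# Hypothetical toy data: each paper ID maps to the IDs of the papers it cites.
# A real system would extract these reference lists from crawled documents.
references = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def build_citation_index(references):
    """Invert the reference lists: cited paper -> set of papers citing it."""
    cited_by = defaultdict(set)
    for paper, cited in references.items():
        for target in cited:
            cited_by[target].add(paper)
    return cited_by


def citation_counts(cited_by, papers):
    """Number of citations each paper has received within the collection."""
    return {paper: len(cited_by.get(paper, ())) for paper in papers}


def related_by_cocitation(paper, references, cited_by):
    """Rank other papers by how often they are cited alongside `paper`."""
    scores = defaultdict(int)
    for citing in cited_by.get(paper, ()):
        for other in references[citing]:
            if other != paper:
                scores[other] += 1
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


cited_by = build_citation_index(references)
counts = citation_counts(cited_by, references)
related = related_by_cocitation("C", references, cited_by)

print(counts)   # citations received per paper
print(related)  # papers most often cited together with "C"
```

In this toy collection, paper "D" ranks as most related to "C" because both papers that cite "C" also cite "D". The citation counts are the kind of figure Giles describes being surfaced automatically.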

“Automatically, we were able to bring up how many citations a paper had gotten,” said Giles. “Indexing based on importance was revolutionary at the time.”

“Lee’s innovation and machine learning expertise, along with his proficiency in developing novel specialized search engines including CiteSeerX, have elevated him as a world-renowned leader in his field,” added Andrew Sears, dean of the College of IST. “We are proud to join BCS in celebrating Lee and recognizing CiteSeerX as a cutting-edge platform more than a decade after its launch.”

CiteSeerX has grown to host more than 10 million full-text English documents and metadata, including 32 million authors and 240 million citation mentions. It has three million individual users worldwide and receives one billion hits and 180 million downloads annually. The code and data supporting CiteSeerX have been open access since its inception, meaning they can be adapted as needed, by anyone, to fit users’ requirements.

“We don’t keep it to ourselves,” Giles said. “We’ve shared it with others so they can build similar systems. Because it’s modular, it can be changed to meet their needs.”

CiteSeerX was funded by the National Science Foundation, Microsoft, NASA and the Penn State College of Information Sciences and Technology. The initial search engine, CiteSeer, was created by Giles and his colleagues Kurt Bollacker and Steve Lawrence when they were at the NEC Research Institute (now NEC Labs). Its second generation, CiteSeerX, was developed by Giles and Isaac G. Councill, who earned a doctorate from the College of IST in 2006 and continued with the college as a postdoctoral scholar until 2008. The next generation CiteSeerX is being developed at Penn State in collaboration with Jian Wu, assistant professor of computer science at Old Dominion University. According to Wu, the team is “refactoring CiteSeerX from Solr Lucene and mySQL to Elasticsearch, all of which is open source.”

The BCS Search Industry Awards recognize people, projects and organizations that have excelled in the design of search and information retrieval products and services. A charity with a royal charter, BCS aims to lead the information technology industry through its ethical challenges, support the people who work in the industry and make IT good for society. BCS currently has more than 60,000 members in 150 countries.

Direct to CiteSeerX

See Also: Learn More About Dr. Giles

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.