Ed. Note: Congratulations to Dr. Giles and his team at PSU. We are longtime admirers (since the late 1990s) of his important and useful work and use CiteSeerX often.
CiteSeerX, one of the world’s earliest open source academic search engines, based in the Penn State College of Information Sciences and Technology (IST), has been recognized by the Information Retrieval Specialist Group of the British Computer Society (BCS) as the Best Open Source Project in its 2021 Search Industry Awards.
“It’s quite an honor for Penn State and IST to have this recognition from such a prominent society,” said C. Lee Giles, David Reese Professor of Information Sciences and Technology and co-creator of the search engine.
Originally launched as CiteSeer in 1998 and renamed CiteSeerX in 2008, the search engine was one of the first platforms to implement automated citation indexing, connecting papers and researchers as a network. It actively crawls and harvests academic and scientific documents online and uses autonomous citation indexing, making it possible for users to find related papers using citation graphs. To perform this indexing and information extraction at scale, CiteSeerX uses several machine learning methods. It is often considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search.
“Automatically, we were able to bring up how many citations a paper had gotten,” said Giles. “Indexing based on importance was revolutionary at the time.”
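As a rough illustration of the idea Giles describes, the sketch below counts citations and finds related papers from a small citation graph. It is a toy example under stated assumptions, not CiteSeerX's actual code: the paper IDs, the `references` data, and both function names are hypothetical, and "related" here simply means sharing references with another paper.

```python
from collections import defaultdict

# Hypothetical toy data: each paper ID maps to the paper IDs it cites.
references = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def citation_counts(references):
    """Return how many times each paper is cited by the others."""
    cited_by = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            cited_by[ref].add(paper)
    return {paper: len(cited_by.get(paper, set())) for paper in references}


def related_by_shared_references(paper, references):
    """Rank other papers by how many references they share with `paper`."""
    own_refs = set(references[paper])
    scored = []
    for other, refs in references.items():
        if other == paper:
            continue
        shared = len(own_refs & set(refs))
        if shared > 0:
            scored.append((other, shared))
    # Most shared references first; ties broken by paper ID.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(citation_counts(references))
    print(related_by_shared_references("A", references))
```

In this toy graph, papers C, D and E are each cited twice, and paper B is the only one that shares references with paper A. A production system would also have to extract and match the citations from the documents themselves, which this sketch takes as given.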
“Lee’s innovation and machine learning expertise, along with his proficiency in developing novel specialized search engines including CiteSeerX, have elevated him as a world-renowned leader in his field,” added Andrew Sears, dean of the College of IST. “We are proud to join BCS in celebrating Lee and recognizing CiteSeerX as a cutting-edge platform more than a decade after its launch.”
CiteSeerX has grown to host more than 10 million full-text English documents and metadata, including 32 million authors and 240 million citation mentions. It has three million individual users worldwide and receives one billion hits and 180 million downloads annually. The code and data supporting CiteSeerX have been open access since its inception, meaning they can be adapted as needed, by anyone, to fit users’ requirements.
“We don’t keep it to ourselves,” Giles said. “We’ve shared it with others so they can build similar systems. Because it’s modular, it can be changed to meet their needs.”
CiteSeerX was funded by the National Science Foundation, Microsoft, NASA and the Penn State College of Information Sciences and Technology. The initial search engine, CiteSeer, was created by Giles and his colleagues Kurt Bollacker and Steve Lawrence when they were at the NEC Research Institute (now NEC Labs). Its second generation, CiteSeerX, was developed by Giles and Isaac G. Councill, who earned a doctorate from the College of IST in 2006 and continued with the college as a postdoctoral scholar until 2008. The next-generation CiteSeerX is being developed at Penn State in collaboration with Jian Wu, assistant professor of computer science at Old Dominion University. According to Wu, the team is “refactoring CiteSeerX from Solr Lucene and mySQL to Elasticsearch, all of which is open source.”
The BCS Search Industry Awards recognize people, projects and organizations that have excelled in the design of search and information retrieval products and services. A charity with a royal charter, BCS aims to lead the information technology industry through its ethical challenges, support the people who work in the industry and make IT good for society. BCS currently has more than 60,000 members in 150 countries.
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area.
He earned his MLIS degree from Wayne State University in Detroit.
Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.