Recently Published Article: “The Number of Scholarly Documents on the Public Web”
Below is a recently published article co-authored by someone we’ve admired for well over a decade, Dr. Lee Giles at Penn State University.
Dr. Giles is one of the developers of CiteSeerX, a wonderful specialty open-web database/search engine that focuses on info tech/computer science scholarly literature. The original CiteSeer predates Google Scholar by several years.
By the way, the “Seer” code is open source, and a number of other “seers” have been released at various points over the years.
Now, to the research article.
The Number of Scholarly Documents on the Public Web
Penn State University
C. Lee Giles
Penn State University
May 9, 2014
The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search.
Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million.
Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind.
In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%.
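For readers who haven't met capture/recapture before, here is a minimal sketch of the classic two-sample (Lincoln-Petersen) estimator, which is the textbook form of the idea. It is not taken from the article and the authors' actual procedure may differ. The function names and the toy document IDs are hypothetical, and the estimator assumes the two samples are independent, which real search engine indexes probably are not.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Two-sample capture/recapture estimate of a total population size.

    n_first  -- items seen in the first sample (e.g. one search engine)
    n_second -- items seen in the second sample (e.g. another engine)
    n_both   -- items seen in both samples (the overlap)
    """
    if n_both <= 0:
        raise ValueError("overlap must be positive to form an estimate")
    if n_both > min(n_first, n_second):
        raise ValueError("overlap cannot exceed either sample size")
    # If the samples are independent, n_both / n_second approximates
    # n_first / N, so N is roughly n_first * n_second / n_both.
    return n_first * n_second / n_both


def estimate_from_sets(first: set, second: set) -> float:
    """Convenience wrapper (hypothetical) that takes two sets of document IDs."""
    return lincoln_petersen(len(first), len(second), len(first & second))


# Hypothetical toy data: document IDs returned by two imaginary engines.
engine_a = {f"doc{i}" for i in range(0, 60)}    # 60 documents
engine_b = {f"doc{i}" for i in range(40, 90)}   # 50 documents, 20 shared
toy_estimate = estimate_from_sets(engine_a, engine_b)

# Share of freely available documents, using the abstract's own figures
# (27 million out of 114 million).
free_share_percent = round(27 / 114 * 100)

print(toy_estimate)        # 150.0
print(free_share_percent)  # 24
```

The smaller the overlap relative to the two sample sizes, the larger the estimated total, which is why the amount of overlap between the two engines matters so much to this kind of estimate.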
Direct to Full Text Article ||| PDF Version
Note: We are working to confirm the current status of Microsoft Academic Search (MAS). The search tool is still available online, but we don’t know whether it’s still being developed.
Longtime readers of infoDOCKET know that we are/were big admirers (and users) of this search tool. Regardless of what we learn, for older material (let’s say pre-2013) MAS remains a potentially valuable research tool you should know about.
Our friend and librarian colleague, Lee Dirks, who was central in the development of MAS, was tragically killed along with his wife, Judy, in August 2012.
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.