May 27, 2022

Data Visualization: “How to Turn 175 Years of Words in Scientific American into an Image” + Interactive Resource

From Scientific American:

Summarizing the history of a 175-year-old magazine—that’s 5,107 editions with 199,694 pages containing 110,292,327 words!—into a series of graphics was a daunting assignment. When the hard drive with 64 gigabytes of .pdf files arrived at my home in Germany, I was curious to dig in but also a bit scared: as a data-visualization consultant with a background in cognitive science, I am well aware that the nuance of language and its semantic contents can only be approximated with computational methods.


A central question in any data-science project is how wide a net one casts on the data set. If the net is too coarse, all the interesting little fish might escape. Yet if it is too fine, one can end up with a lot of debris, and too much detail can obscure the big picture. Can we find a simple but interesting and truthful way to distill a wealth of data into a digestible form? The editors and I explored many concept ideas: looking at sentence lengths, the first occurrences of specific words, changes in interpunctuation styles (would there be a rise of question marks?), and mentions of persons and places. Would any of these approaches be supported by the available data?

Learn More, Read the Complete Article

Direct to Interactive Data Visualization

Search a 4,000-word database to see how language in the magazine evolved over time.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.