From Scientific American:
Summarizing the history of a 175-year-old magazine—that’s 5,107 editions with 199,694 pages containing 110,292,327 words!—into a series of graphics was a daunting assignment. When the hard drive with 64 gigabytes of PDF files arrived at my home in Germany, I was curious to dig in but also a bit scared: as a data-visualization consultant with a background in cognitive science, I am well aware that the nuance of language and its semantic content can only be approximated with computational methods.
A central question in any data-science project is how wide a net one casts over the data set. If the net is too coarse, all the interesting little fish might escape. Yet if it is too fine, one can end up with a lot of debris, and too much detail can obscure the big picture. Can we find a simple but interesting and truthful way to distill a wealth of data into a digestible form? The editors and I explored many concepts: looking at sentence lengths, the first occurrences of specific words, changes in punctuation styles (would there be a rise in question marks?), and mentions of persons and places. Would any of these approaches be supported by the available data?
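One of the ideas above—tracking changes in punctuation, such as a possible rise in question marks—can be approximated with a few lines of code. The sketch below is not the author's actual pipeline; the function name, the per-1,000-words normalization, and the sample data are all illustrative assumptions.

```python
from collections import Counter

def punctuation_rates(texts_by_year, marks="?!"):
    """For each year, count occurrences of each punctuation mark
    per 1,000 words of text (a common normalization, assumed here)."""
    rates = {}
    for year, text in texts_by_year.items():
        words = len(text.split())
        counts = Counter(ch for ch in text if ch in marks)
        rates[year] = {m: 1000 * counts[m] / max(words, 1) for m in marks}
    return rates

# Hypothetical toy corpus, not actual magazine text:
sample = {
    1920: "Is radio the future? Engineers think so. Progress continues.",
    2020: "AI raises questions. What does it mean? How will it change us?",
}
print(punctuation_rates(sample))
```

Normalizing by word count rather than comparing raw counts matters here, since the number of pages per issue changed over the magazine's history.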
Interactive data visualization: search a 4,000-word database to see how language in the magazine evolved over time.