Machine Learning: “This Machine Read 3.5 Million Books Then Told Us What it Thought About Men and Women”
From the World Economic Forum:
Machine learning analyzed 3.5 million books to find that adjectives ascribed to women tend to describe physical appearance, whereas words that refer to behavior go to men.
“Beautiful” and “sexy” are two of the adjectives most frequently used to describe women. Commonly used descriptors for men include righteous, rational, and courageous.
Researchers trawled through an enormous quantity of books in an effort to find out whether there is a difference between the types of words that describe men and women in literature. Using a new computer model, the researchers analyzed a dataset of 3.5 million books, all published in English between 1900 and 2008. The books include a mix of fiction and non-fiction literature.
“We are clearly able to see that the words used for women refer much more to their appearances than the words used to describe men. Thus, we have been able to confirm a widespread perception, only now at a statistical level,” says computer scientist and assistant professor Isabelle Augenstein of the University of Copenhagen’s computer science department.
[Clip]
Additional coauthors of the study are from the University of Maryland, Google Research, Johns Hopkins University, the University of Massachusetts Amherst, and Microsoft Research.
They presented a paper at the 2019 Annual Meeting of the Association for Computational Linguistics.
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.