May 17, 2022

Just Released Paper: "Computer-Assisted Update of a Consumer Health Vocabulary Through Mining of Social Network Data"

Super Interesting!

Author: Kristina M Doing-Harris*, BCompSci MA MS PhD; Qing Zeng-Treitler*, PhD
Affiliation (Both Authors) University of Utah, Department of Biomedical Informatics, Salt Lake City, UT, United States
Source: Journal of Internet Medical Research (13.2, 2011)


Background: Consumer health vocabularies (CHVs) have been developed to aid consumer health informatics applications. This purpose is best served if the vocabulary evolves with consumers’ language.
Objective: Our objective was to create a computer assisted update (CAU) system that works with live corpora to identify new candidate terms for inclusion in the open access and collaborative (OAC) CHV.
Methods: The CAU system consisted of three main parts: a Web crawler and an HTML parser, a candidate term filter that utilizes natural language processing tools including term recognition methods, and a human review interface. In evaluation, the CAU system was applied to the health-related social network website The system’s utility was assessed by comparing the candidate term list it generated to a list of valid terms hand extracted from the text of the crawled webpages.
Results: The CAU system identified 88,994 unique terms 1- to 7-grams (“n-grams” are n consecutive words within a sentence) in 300 crawled webpages. The manual review of the crawled webpages identified 651 valid terms not yet included in the OAC CHV or the Unified Medical Language System (UMLS) Metathesaurus, a collection of vocabularies amalgamated to form an ontology of medical terms, (ie, 1 valid term per 136.7 candidate n-grams). The term filter selected 774 candidate terms, of which 237 were valid terms, that is, 1 valid term among every 3 or 4 candidates reviewed.
Conclusion: The CAU system is effective for generating a list of candidate terms for human review during CHV development.

Access the Complete Article

About Gary Price

Gary Price ( is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.