May 27, 2022

University at Buffalo is Developing a New Data Science Librarian Training Program

From the University at Buffalo:

University at Buffalo researchers in several disciplines are collaborating to develop a new data science training curriculum for library and information science graduate students and practicing health science librarians.

The $25,000 grant from the National Library of Medicine (NLM) of the National Institutes of Health (NIH) has been awarded to researchers in the Department of Biomedical Informatics in the Jacobs School of Medicine and Biomedical Sciences and the Department of Library and Information Studies in the Graduate School of Education.

The new grant is a supplement to a $2.5 million grant that the NLM awarded in 2017 to UB researchers led by Peter Elkin, MD, professor and chair of the Department of Biomedical Informatics. That grant supports doctoral- and postdoctoral-level training for research careers in biomedical informatics and data science.

The new data science librarian program will focus on preparing people to work in academic, hospital, health-related and public libraries, as well as libraries focused on specific subjects.

Students and librarians who complete the program will earn data science micro-credentials: credentials that are more narrowly focused, more flexible and quicker to achieve than traditional degrees or certificate programs.

“Our goal is to develop micro-credentials that will provide library and information studies graduate students and practicing health science librarians with the knowledge, skills and attributes they need in order to successfully compete for data science positions,” said Diane G. Schwartz, research associate professor of biomedical informatics and co-investigator on the grant with Ying Sun, PhD, associate professor in the Department of Library and Information Studies.

“The need to train data science librarians stems from the ever-increasing amounts of patient data being generated by electronic health records, as well as by the internet and social media,” Schwartz added.

The power of data science is just beginning to be appreciated, she noted. “Data science has the potential to facilitate the identification of new treatments for disease even before biomedical scientists and physicians know that a particular therapy is successful,” she explained.

The grant focuses on developing a training program that gives practitioners the specific skills they need to help health care professionals and biomedical scientists make sense of, and leverage, the ever-growing deluge of data that the biomedical sciences now generate.

“For example, data science librarians collaborate with health care professionals to assess, manage, analyze and interpret data, developing data sets that can then be communicated to physicians, nurses and other health care providers who will apply these data sets to improve disease prevention, diagnosis and treatment,” Schwartz said.

Schwartz and Sun are creating the data science librarian curriculum around five skill sets:

  • Data analytics or analysis, a scientific, mathematical and statistical area in which data is “cleaned” to enable accurate evaluations or calculations.
  • Data management, in which librarians confront the issue of how best to manage the data deluge and instill in data users best practices for proper data handling and management.
  • Data archiving/curation, which focuses on alleviating technical issues researchers face, such as data loss, versioning issues, management of obsolete file formats in long-term projects and provision of secure collaboration tools.
  • Data visualization, in which librarians create visual representations of data in order to more powerfully explore, examine and communicate the meaning of the data.
  • Terminology/ontology, in which librarians work to develop the skills that will enable them to partner with ontologists — people who specialize in the study of organizing and categorizing knowledge on a topic — in order to better find, organize, categorize, integrate and label relevant data.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.