For the past 500 years, the National Library of Sweden has collected virtually every word published in Swedish, from priceless medieval manuscripts to present-day pizza menus.
Thanks to a centuries-old law that requires a copy of everything published in Swedish to be submitted to the library — also known as Kungliga biblioteket, or KB — its collections span from the obvious to the obscure: books, newspapers, radio and TV broadcasts, internet content, Ph.D. dissertations, postcards, menus and video games. It’s a wildly diverse collection of nearly 26 petabytes of data, ideal for training state-of-the-art AI.
“We can build state-of-the-art AI models for the Swedish language since we have the best data,” said Love Börjeson, director of KBLab, the library’s data lab.
The library’s datasets represent the full diversity of the Swedish language — including its formal and informal variations, regional dialects and changes over time.
“Our inflow is continuous and growing — every month, we see more than 50 terabytes of new data,” said Börjeson. “Between the exponential growth of digital data and ongoing work digitizing physical collections that date back hundreds of years, we’ll never be finished adding to our collections.”
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area.
He earned his MLIS degree from Wayne State University in Detroit.
Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.