Report: Sweden’s National Library Turns Page to AI to Parse Centuries of Data
From an NVIDIA Blog Post:
For the past 500 years, the National Library of Sweden has collected virtually every word published in Swedish, from priceless medieval manuscripts to present-day pizza menus.
Thanks to a centuries-old law that requires a copy of everything published in Swedish to be submitted to the library — also known as Kungliga biblioteket, or KB — its collections span from the obvious to the obscure: books, newspapers, radio and TV broadcasts, internet content, Ph.D. dissertations, postcards, menus and video games. It’s a wildly diverse collection of nearly 26 petabytes of data, ideal for training state-of-the-art AI.
“We can build state-of-the-art AI models for the Swedish language since we have the best data,” said Love Börjeson, director of KBLab, the library’s data lab.
The library’s datasets represent the full diversity of the Swedish language — including its formal and informal variations, regional dialects and changes over time.
“Our inflow is continuous and growing — every month, we see more than 50 terabytes of new data,” said Börjeson. “Between the exponential growth of digital data and ongoing work digitizing physical collections that date back hundreds of years, we’ll never be finished adding to our collections.”
Learn More, Read the Complete Blog Post (about 800 words)
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com.