May 27, 2022

Guest Post: Where is the “Big Data” for Libraries? by Matt Weaver

Editor’s Note
Matt Weaver is the Web Librarian at Westlake Porter Public Library (Westlake, Ohio) and a board member for Library Renewal.

Matt is also a great friend and supporter of infoDOCKET. For example, just about daily he shares news items and new resources that we share with all of you.

Today, in his first guest post, Matt shares his thoughts about the first OverDrive “Big Data” report that was released about a month ago and that we shared (full text) the other day.

We look forward to sharing more from Matt in the future.

Guest Post: Where is the “Big Data” for Libraries?
by Matt Weaver

My first response on looking at this Big Data report: this is not Big Data.

From my days as an indexer/abstractor for a business database company, one thing that I learned was that monthly reports are rarely of value. Snapshots offer no context for the data being analyzed, and no patterns or trends can be determined in such a short time frame. But, if you were going to choose one month, why March and not January?

For my library, March marked a definite decline from the post-New Year boom. Hopefully future reports will offer data that is representative and useful.

Oh, and be made available to the libraries whose usage they represent.

You see, this report was written for publishers, ALA and libraries (UK), but “Content Partners” did not receive it (to my knowledge, anyway; my library did not). Libraries should be equal partners in these releases, because it’s our data. So, if Mr. Berners-Lee doesn’t mind, I’ll follow his lead and argue that libraries need to demand their data from OverDrive.

It’s not that we are without data entirely. OverDrive’s Content Reserve provides tools with a number of metrics. Reports are customizable and can show trends. They are great tools to have at our disposal. But OverDrive’s report contains data that libraries cannot access via Content Reserve. As the source of the data in these reports, each and every OverDrive customer should receive a copy.

For instance, there is a page that is of particular interest to me as the Web Librarian: the Device Share page, which features a breakdown of site visits, page views and other data by device. A local version of this would be valuable so that I could understand my mobile users. I also have had an active role in training staff and working with patrons, so having this data, and being able to monitor developments, would help us prepare training materials and classes that are as effective as possible.

Then there is OverDrive’s discussion of cover views, page views, etc. Maybe I’m revealing a gap in my knowledge, but in a system like OverDrive’s, are those figures necessarily measures of user engagement? I thought page view data was more important when considering selling advertising on pages. So, that data is used to validate the company’s role as custodian of publishers’ content.

I know from working with patrons that the search/browse interface for ebooks/audiobooks is clunky, and search results get rather large, requiring more effort to find things that they would want to read, especially in the absence of many popular authors/titles since the loss of Penguin. Can that data possibly reflect the cumbersome nature of the interface?

I would like to see how many books patrons checked out/put on hold/added to the wish list. Given the average amount of time spent on the site of 9 minutes and 34 seconds looking at an average of 11.6 pages (page 2, Key Findings), I would be interested to learn how much of that time was spent finding and choosing, and how much was on winnowing. How many people left OverDrive sites without selecting anything? I would like to see a breakdown of frequency of site visits: what percentage of users are frequent users, and how many have never returned.

We do not build OverDrive collections the way we do our print collections, seeking to make sure they are rich, reflect a range of viewpoints, all of that first-semester library school stuff, right?

We do not have access to a broad enough range of content to do that. So, after massive changes in access in the past year or so, including HarperCollins, Penguin, Brilliance Audio, and Random House, we license content from what is proving to be a dwindling pool of resources.

Our administrative costs have not been adjusted, and particular publishers have significantly increased their costs. Having the tools to evaluate OverDrive is important, lest libraries find themselves “pot locked,” and bound to paying into a system, even if it doesn’t satisfy patron demand.

If this report truly “confirms the benefits that books and authors in library channels enjoy in terms of exposure and discovery to a highly desirable audience…” then let’s look at the entire audience. How do consortia fare in this environment?

Personally, I think this report holds a modicum of value for libraries at best. As an industry, we need to get as much data out of OverDrive as possible, since our libraries’ ebook/audiobook presence exists solely on their servers, but without giving OverDrive the opportunity to turn such reports into another a la carte service.

However, if this and future Big Data reports assuage publishers’ concerns about ebooks and audiobooks in libraries, then there is definite value in this exercise; but libraries need data every bit as much as publishers in order to serve our communities as effectively as possible.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.