April 16, 2014

Fun & Facts with Roy Tennant: Where Do WorldCat Items Come From?

Roy Tennant of OCLC Research, known to many infoDOCKET readers for his work, writing, and presentations, has a great post (with data) on the Hanging Together blog explaining how he used the WorldCat database to build a list of the countries of origin for the 300+ million MARC records in WorldCat.

Specifically, Roy’s research looked at the 260 $a subfield (place of publication, distribution, etc.).

Cool!

Tennant writes about some of the challenges he had with this project.

As you might imagine, what results from such an investigation is a complete dog’s breakfast, with a large variety of punctuation marks, typographical errors, imaginative spellings, and just plain junk. No, it is much better to parse bytes 15-17 of the 008 field, which at least are supposed to only contain values from this list maintained by the Library of Congress. Progress.

That is, until one discovers that this “Code List for Countries” is not exactly that. If you happen to be in a certain select part of the world (mostly the United States, Canada, and Australia), you can also select state or province-specific codes. So before I used this table to translate the codes for actual countries I first had to translate the table, so that the code for “California” translated instead to “United States”. Progress.
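For readers who want to try the approach Roy describes, here is a minimal Python sketch of the two steps: pulling bytes 15-17 out of an 008 field, then folding state and province codes into their parent country before counting. It is not Roy's code. The function names, the `make_008` demo helper, and the two lookup tables are assumptions made up for illustration, and the tables hold only a handful of entries from the Library of Congress list rather than the full set.

```python
from collections import Counter

# Illustrative subset only (assumption): state/province codes mapped to
# the code of their parent country. The real list is much longer.
SUBDIVISION_TO_COUNTRY = {
    "cau": "xxu",  # California -> United States
    "nyu": "xxu",  # New York (State) -> United States
    "onc": "xxc",  # Ontario -> Canada
    "xna": "at",   # New South Wales -> Australia
}

# Illustrative subset only (assumption): country codes mapped to names.
COUNTRY_NAMES = {
    "xxu": "United States",
    "xxc": "Canada",
    "at": "Australia",
    "fr": "France",
}


def place_code(field_008: str) -> str:
    """Return the place code stored in bytes 15-17 of an 008 field."""
    # Two-letter codes are padded with a blank, so strip whitespace.
    return field_008[15:18].strip().lower()


def country_of_publication(field_008: str) -> str:
    """Translate an 008 place code to a country name, or "Unknown"."""
    code = place_code(field_008)
    # First "translate the table": state/province codes become country codes.
    code = SUBDIVISION_TO_COUNTRY.get(code, code)
    return COUNTRY_NAMES.get(code, "Unknown")


def top_countries(fields, n=25):
    """Count countries across many 008 fields and return the n most common."""
    return Counter(country_of_publication(f) for f in fields).most_common(n)


def make_008(place: str) -> str:
    """Hypothetical demo helper: build a 40-character 008 field for testing."""
    return "140416" + "s" + "2014" + "    " + place.ljust(3) + " " * 17 + "eng" + " " + "d"


if __name__ == "__main__":
    sample = [make_008("cau"), make_008("nyu"), make_008("fr"), make_008("onc")]
    for name, count in top_countries(sample):
        print(name, count)
```

Anything missing from the small tables falls through to "Unknown", which is a reminder that real records still need the kind of cleanup Roy writes about.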

Make sure to read the complete blog post, where Roy explains the other issues he faced.

The list of the Top 25 “Countries” of Publication from WorldCat is included at the bottom of the blog post and also available here.

Thanks Roy!

But Wait…There’s More!

Roy and His OCLC Research Colleagues Recently Shared MARC Usage in WorldCat Data Visualizations

Learn More About Them Here (via Hanging Together)

See Also: A Real-Time Stream of New/Updated WorldCat Records (with Visualization)
It’s WorldCat Live!

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching infoDOCKET, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.