January 17, 2017

New Beta Release Allows Users to Keyword Search Some Material Found in The Wayback Machine

UPDATE October 26, 2016: It has been quite a week (so far) at the Internet Archive for new and improved search options.

On Monday, The Wayback Machine launched a keyword search beta. Our post about this new option is below this update. We also shared news of new and enhanced search options from The Open Book Project, an Internet Archive initiative.

Today brings news of MORE search capabilities now available when searching the Internet Archive, including facets to focus search results and full-text search (beta) of books. Details here.

Some VERY exciting news! Something we’ve all wanted for a long time.

The Internet Archive has just launched, as a beta release, the ability to keyword search a limited amount of material found in The Wayback Machine.

At this point you CANNOT keyword search specific words/phrases on specific pages.

This topic is discussed in a set of FAQs available here. We hope complete keyword search comes soon. Today’s launch is a terrific start to making Wayback even more useful.

So, what is available today? What can you search?

1. Keyword search a limited amount of Wayback Machine content: the homepages of more than 350 million sites. FYI, the complete Wayback Machine contains over 510 billion pages.

2. Keyword search using word(s) that describe a site. For example, “Toronto Government” or websites related to “air traffic control.”

3. You can limit your search to a specific domain or site by using the site: filter. This syntax can be combined with keywords, e.g., pages from MIT related to economics.

4. Results appear as you type. I saw no hiccups or stutter when I ran several searches. Impressive!

5. Clicking on any result will take you to a traditional Wayback Machine results page with links to archived copies of the page, PDF, etc.

6. The beta index is multilingual.

7. Cool! You can search the index using Unicode characters!
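
For readers who want to script queries, here is a minimal sketch of combining keywords with the site: filter described above. The function name, the placement of the filter after the keywords, and the mit.edu domain are all assumptions made for illustration; the announcement does not specify an ordering, and this code only builds the query text, it does not call any Internet Archive service.

```python
def build_query(keywords, site=None):
    """Join free-text keywords with an optional site: filter.

    Hypothetical helper: it only assembles the text you would type
    into the search box. Putting the filter last is an assumption.
    """
    # Split and rejoin so stray whitespace is collapsed.
    terms = keywords.split()
    if site:
        terms.append(f"site:{site}")
    return " ".join(terms)


if __name__ == "__main__":
    # Keywords limited to one (assumed) domain.
    print(build_query("economics", site="mit.edu"))
    # Keywords only, no site restriction.
    print(build_query("air traffic control"))
```

The resulting string can be pasted into the beta search box as-is.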

Much more in the complete blog post, FAQs, and this discussion about how the Internet Archive defines web pages, web sites, and web captures.

Resources

Direct to Wayback Machine’s NEW Keyword Search Beta: web-beta.archive.org

Direct to Wayback Machine (Complete Index, Not Keyword Searchable): web.archive.org

Note: Archived Pages Made Available in Archive-It Public Collections Have Always Been Keyword Searchable

Note: How to Instantly Archive Web Pages and PDFs Using Wayback

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.
