April 15, 2014

Wikipedia Moving to New Search Technology, Elasticsearch

From the Wikimedia Tech Blog:

We’re in the process of rolling out new search infrastructure to all of the wikis, so it’s a good time to explain what’s coming to all Wikimedia wikis in the very immediate future and why we’re changing it.

First a bit of background. All Wikimedia sites have been using a home-grown search system based on Apache Lucene since 2005 or 2006. It was written primarily by volunteer Robert Stojnić and is called lucene-search-2. This is a fantastic search engine, which has powered the sites for years now, and has managed to scale very well for the past 8 years or so. Early in 2013 this became a point of significant operational problems; short-term we were able to patch some of the most glaring issues in lucene-search-2 but it became increasingly apparent that a replacement was needed. Robert is no longer around and the system is showing its age.

We’re very happy with Lucene but we wanted to get out of the business of maintaining a special-purpose open-source search system when there are two very good general-purpose open-source search systems available: Solr and Elasticsearch.

Both are based on Lucene and horizontally scalable for data and query volume. After experimenting with both and implementing basic MediaWiki integration we chose to settle on Elasticsearch.

The new search engine is coming soon to all Wikimedia wikis, and may already be on your favorite wiki

We plan for this replacement search to be a Beta Feature for all wikis by the end of February and the primary search in March or April. See our ever-evolving timeline for ever-evolving specifics.

Read the complete blog post to learn more about why Elasticsearch was selected.

See Also: Learn More About Elasticsearch

Others Using Elasticsearch Include:

  • Sound and Vision (Dutch National Audiovisual Archives)
  • Foursquare
  • SoundCloud
  • GitHub
  • StumbleUpon
About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.