I am very happy to announce that, as of today, anonymized search log files for Wikipedia and its sister projects are publicly available. Collecting data about search queries is important for at least three reasons:
+ it provides valuable feedback to our editor community, who can use it to detect topics of interest that are currently insufficiently covered.
+ we can improve our search index by benchmarking improvements against real queries.
+ we give outside researchers the opportunity to discover gems in the data.
Starting today, we will publish each day's search queries on the following day at http://dumps.wikimedia.org/other/search/. We expect to keep a rolling three-month window of search data available.
Each line in the log files is tab-separated and contains the following fields, in this order:
+ URL-encoded search query
+ Total number of results
+ Lucene score of the best match
+ Namespace (coded as an integer)
+ Title of the best matching article
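For anyone who wants to start working with the files, here is a minimal Python sketch of a line parser. It assumes that every line has exactly the five fields listed above, in that order, and that only the query is URL-encoded (it is decoded with `unquote_plus`, which also turns `+` into a space). The names `SearchLogRecord`, `parse_line` and `parse_log` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator
from urllib.parse import unquote_plus


@dataclass
class SearchLogRecord:
    # Hypothetical container; field order follows the list above.
    query: str          # decoded search query
    total_results: int  # total number of results
    best_score: float   # Lucene score of the best match
    namespace: int      # namespace, coded as an integer
    best_title: str     # title of the best matching article


def parse_line(line: str) -> SearchLogRecord:
    """Parse one tab-separated log line (assumes exactly five fields)."""
    # Strip only the line terminator so an empty trailing title is kept.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    query, total, score, namespace, title = fields
    return SearchLogRecord(
        query=unquote_plus(query),  # assumption: '+' may encode a space
        total_results=int(total),
        best_score=float(score),
        namespace=int(namespace),
        best_title=title,  # assumption: the title is stored unencoded
    )


def parse_log(lines: Iterable[str]) -> Iterator[SearchLogRecord]:
    """Yield a record for every non-blank line, e.g. from an open file."""
    for line in lines:
        if line.strip():
            yield parse_line(line)


if __name__ == "__main__":
    sample = "albert+einstein\t1234\t5.67\t0\tAlbert Einstein\n"
    print(parse_line(sample))
```

Because `parse_log` accepts any iterable of strings, you can pass it an open text file directly and aggregate the records however you like, for example counting queries that returned zero results to spot missing articles.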
We are making this data available under a CC0 license, which means that you can copy, modify, distribute and perform the work, even for commercial purposes, without asking permission. We do, however, appreciate it if you cite us when you use this data source for your research, experimentation or product development.