From the Financial Times:
The Financial Times is working with the British Library to open up access to the FT Digital Archive for academic research.
Extracts from the archive materials were used to produce the feature about Britain’s 1975 referendum on European Community membership that was published on FT.com today.
The archive consists of scanned images of each of the 903,029 pages comprising all 37,464 print editions of the Financial Times published between 1888 and 2010.
For each page, the archive consists of a high-resolution image file of the scanned page and a large XML file that includes the full text of the page (generated by optical character recognition software) and detailed metadata about the position of each scanned word. The full 123-year dataset is 2.5 terabytes in size.
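To put those figures in perspective, here is a quick back-of-the-envelope calculation in Python. It uses only the numbers quoted above and assumes that a terabyte means 10^12 bytes (the announcement does not say which convention it uses). The per-page figure is an average that covers both the image and the XML file.

```python
# Figures quoted in the announcement.
PAGES = 903_029
EDITIONS = 37_464
YEARS = 123

# Assumption: 1 terabyte = 10**12 bytes (decimal convention).
TOTAL_BYTES = 2.5e12

pages_per_edition = PAGES / EDITIONS
editions_per_year = EDITIONS / YEARS
# Average storage per page (image plus XML), in megabytes of 10**6 bytes.
mb_per_page = TOTAL_BYTES / PAGES / 1e6

print(f"{pages_per_edition:.1f} pages per edition on average")
print(f"{editions_per_year:.1f} editions per year on average")
print(f"{mb_per_page:.2f} MB per page on average")
```

On these assumptions the archive averages roughly 24 pages per edition and a little under 3 MB of data per page.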
A sample of the archive — consisting of the data for the front page of one (randomly selected) day’s edition in 1888, 1939, 1966, and 1991 — can be downloaded (for non-commercial use).
UPDATE September 24: More on the British Library – Financial Times announcement (via BL)
The collaboration with the Financial Times is one part of emerging plans for British Library news data. The structure of news content offers numerous opportunities for analysing, interrogating, visualising and rethinking what news archives are today, as well as creating new kinds of newspaper and other news media history. We held a news data workshop on September 7th, where we brought together researchers, developers and content owners to look at ways we might develop plans for news data that would best benefit researchers. There’s a report on the workshop on our Digital Scholarship blog.