Paper — Reliable Electronic Text: The Elusive Prerequisite for a Host of Human Language Technologies

April 24, 2011 by fulltextreports

Electronic text for use by human language technologies originates from a number of sources: direct keyboard entry, optical character recognition, speech recognition, and text-containing computer files. In particular, text-containing computer files may elude processing by an array of human language technology applications (e.g., search, language ID, machine translation, and text analytics). This paper brings to light the effort required to extract electronic text from these files, preserve its integrity, and, for some use cases, preserve its structure. It explores a series of specific human language technologies, highlighting the following aspects for each: relevant use cases, the impact of text extraction or conversion errors, the criticality of dependable text extraction and reliable electronic text, and the importance of experimentation and/or testing prior to use. Overall, this paper promotes the successful use of human language technology by equipping the reader to be discerning about the use of human language technology applications with text-containing files.

Source: MITRE Corporation
