DARPA Announces SafeDocs Program with Goal to Restore “Trust in Electronic Documents”
From the Defense Advanced Research Projects Agency (DARPA):
Today, the expeditious delivery of electronic documents, messages, and other data is relied on for everything from communications to navigation. As the near instantaneous exchange of information has increased in volume, so has the variety of electronic data formats, from images and videos to text and maps. Verifying the trustworthiness and provenance of this mountain of electronic information is an exceedingly difficult task as individuals and organizations routinely engage with data shared by unauthenticated and potentially compromised sources. Further, the software used to process electronic data is error-prone and vulnerable to exploitation through maliciously crafted data inputs, opening the technology and its underlying systems to compromise. An attacker’s ability to deliver novel cyberattacks via electronic documents, messages, and streaming data formats appears unbounded, creating an unsustainable situation for software security.
To reduce the sizable attack surface created across consumer, enterprise, and critical infrastructure systems and to help tackle the threat posed by unauthenticated and potentially compromised electronic data, DARPA today announced a new program called Safe Documents (SafeDocs). The goal of the SafeDocs program is to dramatically improve software’s ability to detect and reject invalid or maliciously crafted input data, without impacting the key functionality of new and existing electronic data formats.
Read the Complete DARPA Release
From a Special Notice From DARPA Posted on FBO.gov (5 pages; PDF):
Internet users expect pictures, charts, spreadsheets, maps, audio, video, as well as rich messages potentially including any and all of these, to be received with a click of a button. However, the complexity of managing such electronic data results in software vulnerable to attack. This situation is unsustainable.
DARPA is interested in research that will radically improve software’s ability to safely reject invalid and maliciously crafted input data, while preserving essential functionality of legacy electronic data formats. It is anticipated that research will build on an existing base of knowledge of electronic document, message, and streaming formats and the nature of security vulnerabilities associated with these formats.
SafeDocs aims to restore trust in electronic documents and messages, mitigating one of the root causes of the Internet insecurity epidemic, which is exploitation of software’s input-handling weaknesses via complex, maliciously crafted data inputs. Today’s risks of allowing software to interact with untrusted electronic documents and messages (e.g., by clicking on an email attachment to open it) approach those of downloading and running untrusted programs.
The SafeDocs program will research methods to create technological assurance that an electronic document or message automatically checked and found well-formed is safe to open, as well as generate safer document formats that are subsets of the current untrustworthy ones, preserve existing information, and are also safe to open. It is envisioned that the technology developed for writing input-handling code will result in systems that are more secure and faster to write, to test, and to run.
The program will develop novel verified programming methodologies for building high assurance parsers for extant electronic data formats, and novel methodologies for comprehending, simplifying, and reducing these formats to their safe, unambiguous, verification-friendly subsets (“safe sub-setting”).
SafeDocs will address the ambiguity and complexity obstacles to the application of verified programming posed by extant electronic data formats. The program’s multi-pronged approach will combine:
- a) extraction of the extant formats’ de facto syntax (including any non-compliant syntax deliberately accepted and substantially used in the wild);
- b) identifying a syntactically simpler subset of this syntax that yields itself to use in verified programming while preserving the format’s essential functionality; and
- c) creating software construction kits for building secure, verified parsers for this syntactically simpler subset, and high-assurance translators for converting extant instances of the format to this subset.
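To make the three prongs above more concrete, here is a minimal sketch in Python. It is not DARPA's method and involves no formal verification; it only illustrates the idea of a strict parser for a safe subset paired with a translator from a looser de facto syntax. The toy "key=value" format, the names `parse_strict` and `translate_to_subset`, and the size limits are all assumptions invented for this illustration.

```python
import re

# Hypothetical safe subset: one "key=value" pair per line, lowercase ASCII
# keys, printable ASCII values that contain no '=', and no duplicate keys.
STRICT_LINE = re.compile(r"[a-z][a-z0-9_]{0,31}=[ -<>-~]{0,256}")

# Hypothetical de facto syntax seen "in the wild": mixed-case keys, ':' or
# '=' as the separator, and stray whitespace around keys and values.
LENIENT_LINE = re.compile(
    r"\s*([A-Za-z][A-Za-z0-9_]{0,31})\s*[:=]\s*([ -<>-~]{0,256}?)\s*"
)

MAX_LINES = 64  # assumed bound so that input size is always limited


def parse_strict(text: str) -> dict:
    """Parse the safe subset, rejecting anything outside it."""
    lines = text.split("\n")
    if len(lines) > MAX_LINES:
        raise ValueError("too many lines")
    result = {}
    for number, line in enumerate(lines, start=1):
        if STRICT_LINE.fullmatch(line) is None:
            raise ValueError(f"line {number} is outside the safe subset")
        key, value = line.split("=", 1)
        if key in result:
            raise ValueError(f"duplicate key on line {number}")
        result[key] = value
    return result


def translate_to_subset(text: str) -> str:
    """Convert a de facto document to the safe subset, or refuse."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines tolerated in the wild are dropped
        match = LENIENT_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"cannot translate line: {line!r}")
        out.append(f"{match.group(1).lower()}={match.group(2)}")
    result = "\n".join(out)
    parse_strict(result)  # the translator's output must itself be accepted
    return result
```

In this sketch, `translate_to_subset("Title : Report\n\nAUTHOR=Ann\n")` produces a document that `parse_strict` accepts, while `parse_strict("Title: Report")` raises `ValueError` because the input lies outside the subset. The design choice worth noting is that the strict parser never tries to repair input: all leniency lives in the translator, which either emits a well-formed document or refuses.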
Direct to Complete Document
See Also: SafeDocs Proposers Day
August 24, 2018.

About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.