The extent to which data held in public repositories are described (their metadata) varies widely, forcing users to assess the quality of records individually, which rapidly becomes impractical. Automatically scoring records on the richness of their description enables sorting by quality. Here, we introduce an objective measure for metadata, the ‘Metadata Coverage Index’ (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated for a whole database, for individual records or for their component parts (variables or subsets of the data). The MCI score can be used to filter, rank or search for records, to assess the metadata quality of an ad hoc collection, or to determine how frequently the fields in a particular record type are filled. We demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the ‘Minimum Information about a Genome Sequence’ standard developed by the Genomic Standards Consortium. Finally, we discuss a number of challenges and the further application of MCI score data to show improvements in annotation quality over time, to inform standards bodies and repository providers about the usability and popularity of their standards, and to credit the work of curators. Such an index is a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
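As an illustration of the definition above, the MCI of a record can be sketched as the number of filled fields divided by the number of available fields, expressed as a percentage. The short Python sketch below is not the implementation used for GOLD. The field names, the sample records and the rule for deciding that a field counts as "filled" (not missing, not `None`, not a blank string) are assumptions made for this example only.

```python
from typing import Any, Dict, Iterable, Mapping, Sequence


def is_filled(value: Any) -> bool:
    """Assumed rule: a field is filled unless it is None or a blank string."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True  # numeric zero and False still count as filled values


def mci(record: Mapping[str, Any], fields: Sequence[str]) -> float:
    """Percentage of the available fields that are filled in one record."""
    if not fields:
        raise ValueError("at least one available field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def collection_mci(records: Iterable[Mapping[str, Any]],
                   fields: Sequence[str]) -> float:
    """Mean record MCI over a collection that shares one set of fields."""
    scores = [mci(record, fields) for record in records]
    if not scores:
        raise ValueError("at least one record is required")
    return sum(scores) / len(scores)


def field_fill_rate(records: Iterable[Mapping[str, Any]],
                    fields: Sequence[str]) -> Dict[str, float]:
    """Percentage of records in which each field is filled."""
    rows = list(records)
    if not rows:
        raise ValueError("at least one record is required")
    return {
        name: 100.0 * sum(1 for r in rows if is_filled(r.get(name))) / len(rows)
        for name in fields
    }


# Hypothetical fields and records, for illustration only.
FIELDS = ["organism", "isolation_site", "latitude", "longitude"]
RECORDS = [
    {"organism": "organism_A", "isolation_site": "soil",
     "latitude": 51.5, "longitude": ""},
    {"organism": "organism_B"},
]

# Rank records from richest to poorest description.
ranked = sorted(RECORDS, key=lambda r: mci(r, FIELDS), reverse=True)
```

With these sample records, the first record fills three of four fields and the second fills one of four. Sorting on `mci` gives the ranking use described in the text, and `field_fill_rate` gives the per-field fill frequency for a record type.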