From the National Academies of Sciences, Engineering, and Medicine (NASEM):
While computational reproducibility in scientific research is generally expected when the original data and code are available, lack of ability to replicate a previous study — or obtain consistent results looking at the same scientific question but with different data — is more nuanced and occasionally can aid in the process of scientific discovery, says a new congressionally mandated report from the National Academies of Sciences, Engineering, and Medicine.
Reproducibility and Replicability in Science recommends ways that researchers, academic institutions, journals, and funders should help strengthen rigor and transparency in order to improve the reproducibility and replicability of scientific research.
Defining Reproducibility and Replicability
The terms “reproducibility” and “replicability” are often used interchangeably, but the report uses each term to refer to a separate concept. Reproducibility means obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis. Replicability means obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.
Reproducing research involves using the original data and code, while replicating research involves new data collection and similar methods used in previous studies, the report says. Even when a study was rigorously conducted according to best practices, correctly analyzed, and transparently reported, it may fail to be replicated.
“Being able to reproduce the computational results of another researcher starting with the same data and replicating a previous study to test its results facilitate the self-correcting nature of science, and are often cited as hallmarks of good science,” said Harvey Fineberg, president of the Gordon and Betty Moore Foundation and chair of the committee that conducted the study. “However, factors such as lack of transparency of reporting, lack of appropriate training, and methodological errors can prevent researchers from being able to reproduce or replicate a study. Research funders, journals, academic institutions, policymakers, and scientists themselves each have a role to play in improving reproducibility and replicability by ensuring that scientists adhere to the highest standards of practice, understand and express the uncertainty inherent in their conclusions, and continue to strengthen the interconnected web of scientific knowledge — the principal driver of progress in the modern world.”
The committee’s definition of reproducibility focuses on computation because most scientific and engineering research disciplines use computation as a tool, and the abundance of data and widespread use of computation have transformed many disciplines. However, this revolution is not yet uniformly reflected in how scientists use software and how scientific results are published and shared, the report says. These shortfalls have implications for reproducibility, because scientists who wish to reproduce research may lack the information or training they need to do so.
When results are produced by complex computational processes using large volumes of data, the methods section of a scientific paper is insufficient to convey the necessary information for others to reproduce the results, the report says. Additional information related to data, code, models, and computational analysis is needed.
If sufficient additional information is available and a second researcher follows the methods described by the first researcher, one expects in many cases to obtain the same exact numeric values – or bitwise reproduction. For some research questions, bitwise reproduction may not be attainable and reproducible results could be obtained within an accepted range of variation.
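The distinction between bitwise reproduction and reproduction within an accepted range can be illustrated with a minimal Python sketch. It is not taken from the report: the function names, the sample values, and the tolerance of 1e-9 are assumptions chosen for illustration. It adds the same three numbers in two different orders, as might happen when the same analysis is run with a different degree of parallelism.

```python
import math

def total_forward(values):
    """Sum the values left to right."""
    total = 0.0
    for v in values:
        total += v
    return total

def total_reverse(values):
    """Sum the same values right to left."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total

# Hypothetical input data; any floating-point values could be used.
values = [0.1, 0.2, 0.3]

forward = total_forward(values)
reverse = total_reverse(values)

# Bitwise reproduction: the two results must be exactly identical.
bitwise_equal = forward == reverse

# Reproduction within an accepted range: the results must agree to a
# relative tolerance (1e-9 here is an illustrative assumption).
within_tolerance = math.isclose(forward, reverse, rel_tol=1e-9)

print(bitwise_equal, within_tolerance)
```

Because floating-point addition is not associative, the two sums differ in their last digit, so the exact comparison fails while the tolerance-based comparison succeeds. What counts as an acceptable tolerance depends on the research question and is for each field to decide.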
The evidence base to determine the prevalence of non-reproducibility in research is incomplete, and determining the extent of issues related to computational reproducibility across or within fields of science would be a massive undertaking with a low probability of success, the committee found. However, a number of systematic efforts to reproduce computational results across a variety of fields have failed in more than half of attempts made — mainly due to insufficient detail on data, code, and computational workflow.
One important way to confirm or build on previous results is to follow the same methods, obtain new data, and see if the results are consistent with the original. A successful replication does not guarantee that the original scientific results of a study were correct, however, nor does a single failed replication conclusively refute the original claims, the report says.
Non-replicability can arise from a number of sources. The committee classified sources of non-replicability into those that are potentially helpful to gaining knowledge, and those that are unhelpful.
Potentially helpful sources of non-replicability include inherent but uncharacterized uncertainties in the system being studied. These sources of non-replicability are a normal part of the scientific process, due to the intrinsic variation or complexity in nature, the scope of current scientific knowledge, and the limits of current technologies. In such cases, a failure to replicate may lead to the discovery of new phenomena or new insights about variability in the system being studied.
In other cases, the report says, non-replicability is due to shortcomings in the design, conduct, and communication of a study. Whether arising from lack of knowledge, perverse incentives, sloppiness, or bias, these unhelpful sources of non-replicability reduce the efficiency of scientific progress.
Unhelpful sources of non-replicability can be minimized through initiatives and practices aimed at improving research design and methodology through training and mentoring, repeating experiments before publication, rigorous peer review, utilizing tools for checking analysis and results, and better transparency in reporting. Efforts to minimize avoidable and unhelpful sources of non-replicability warrant continued attention, the report says.
Researchers who knowingly use questionable research practices with the intent to deceive are committing misconduct or fraud. It can be difficult in practice to differentiate between honest mistakes and deliberate misconduct, because the underlying action may be the same while the intent is not. Scientific misconduct in the form of misrepresentation and fraud is a continuing concern for all of science, even though it accounts for a very small percentage of published scientific papers, the committee found.
Improving Reproducibility and Replicability in Research
The report recommends a range of steps that stakeholders in the research enterprise should take to improve reproducibility and replicability, including:
- All researchers should include a clear, specific, and complete description of how the reported results were reached. Reports should include details appropriate for the type of research, such as a clear description of all methods, instruments, materials, procedures, measurements, and other variables involved in the study; a clear description of the analysis of data and decisions for exclusion of some data or inclusion of other; and discussion of the uncertainty of the measurements, results, and inferences.
- Funding agencies and organizations should consider investing in research and development of open-source, usable tools and infrastructure that support reproducibility for a broad range of studies across different domains in a seamless fashion. Concurrently, investments would be helpful in outreach to inform and train researchers on best practices and how to use these tools.
- Journals should consider ways to ensure computational reproducibility for publications that make claims based on computations, to the extent ethically and legally possible.
- The National Science Foundation should take steps to facilitate the transparent sharing and availability of digital artifacts, such as data and code, for NSF-funded studies – including developing a set of criteria for trusted open repositories to be used by the scientific community for objects of the scholarly record, and endorsing or considering the creation of code and data repositories for long-term archiving and preservation of digital artifacts that support claims made in the scholarly record based on NSF-funded research, among other actions.
Confidence in Science
Replicability and reproducibility, useful as they are in building confidence in scientific knowledge, are not the only ways to gain confidence in scientific results. Research synthesis and meta-analysis, for example, are valuable methods for assessing the reliability and validity of bodies of research, the report says. A goal of science is to understand the overall effect from a set of scientific studies, not to strictly determine whether any one study has replicated any other.
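As a rough illustration of estimating an overall effect from a set of studies, the sketch below applies fixed-effect inverse-variance weighting, one common meta-analytic approach. The report does not prescribe this or any particular method; the function name and the input values are hypothetical.

```python
import math

def pooled_effect(effects, std_errors):
    """Combine per-study effect estimates by inverse-variance weighting.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    Returns the pooled estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return estimate, pooled_se

# Hypothetical effect sizes and standard errors from three studies.
effects = [0.30, 0.10, 0.25]
std_errors = [0.10, 0.15, 0.08]

estimate, pooled_se = pooled_effect(effects, std_errors)
print(f"pooled effect = {estimate:.3f} +/- {pooled_se:.3f}")
```

The pooled estimate draws on all three studies at once, rather than asking whether any one of them replicated another. A fixed-effect model assumes the studies share a single true effect; where that assumption is doubtful, a random-effects model is the usual alternative.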
Among other related recommendations, the report says that people making personal or policy decisions based on scientific evidence should be wary of making a serious decision based on the results, no matter how promising, of a single study. By the same token, they should not take a new, single contrary study as refutation of scientific conclusions supported by multiple lines of previous evidence.
The study, undertaken by the Committee on Reproducibility and Replicability in Science, was sponsored by the National Science Foundation and the Alfred P. Sloan Foundation.