Data Management and Open Science: Best Practices for Research Transparency

Scientist presenting research data findings, emphasizing the importance of transparency and open science principles

“Open science is not just a policy agenda; it is a way to make science more effective, transparent, and inclusive,” writes Jean-Claude Burgelman, Professor of Open Science Policy at Vrije Universiteit Brussel and a major figure in the European Open Science movement.[1]

We agree, but there are formidable obstacles that make it difficult to work within principles of open science and data management. The era of big data and digital research presents both increasingly sophisticated tools to help us work with these principles—and increasingly calcified barriers to collaboration across platforms and disciplines.

The stakes are high. Transparent research practices, including data sharing and reproducibility, not only enhance the integrity of scientific inquiry but also lead to the types of collaboration and knowledge dissemination that compound innovation and impact. In short—if we can get this right, the benefits to society will be enormous.

Now—this is easier said than done, and it’s not always prudent to abide by these principles, especially for start-ups making a go at mission-driven entrepreneurship. If you want your initiative to succeed in our economy, there are good reasons to be discerning about what to share and when.

In other words: we encourage you to adhere to principles of open science to the extent possible without undermining sustainability, including livelihoods. A world without open science and data management will plunge deeper into silos of specialized knowledge and ignorance, particularly as data is stored in isolated systems at higher and higher costs.

For this article’s purposes, we’ll assume that your circumstances are friendly to the principles of data management and open science—and suggest some basic guidelines for how you might start embracing them, particularly while developing a data management plan (DMP). We’ll describe some common strategies for addressing data management requirements from funding agencies, as well as some of the resources and tools for implementing best practices in data management and open science.

Principles of Data Management and Open Science

There are three major principles to pay attention to in the discourse of data management and open science: transparency, accessibility, and reproducibility (along with its close cousin, replicability). Imagine a climate consultant working on environmental impact studies. They are likely to have a greater impact by working under conditions that make data openly available, and that allow for independent verification and replication of research findings. Let’s parse out how transparency, accessibility, reproducibility, and replicability each factor in.

Transparency

Transparency refers to the practice of openly sharing all relevant information about a study, including the methods, data, and analysis used. No guesswork, no secrecy, no hoarding of knowledge. This openness allows our hypothetical climate consultant (and the public) to see exactly how a research project was conducted and how conclusions were drawn. 

As in personal relationships, transparency builds trust and helps prevent errors or biases from going unnoticed. In science, it also enables independent verification of results. By being transparent, scientists ensure that their work can be scrutinized, validated, and built upon by others (including our climate consultant), which strengthens the overall reliability of scientific knowledge. We could use a higher degree of trust in this world, don’t you think?

Accessibility

If transparency is about making knowledge visible, accessibility is about making knowledge usable. In the context of scientific research, “accessible” information refers to research findings, data, and resources that are available to as wide an audience as possible. This includes providing open access to research papers, sharing data in publicly accessible repositories, and ensuring that information is understandable to non-specialists. One way of thinking about it is as the opposite impulse of a copyright or patent. 

Accessibility is important because it democratizes knowledge, allowing not only scientists but also climate consultants, policymakers, educators, and the general public to benefit from scientific advancements. It requires resisting the impulse to hoard the tools and act like a hero, but when research is accessible, it can be used to inform policy, education, and further research, leading to broader societal benefits.

Reproducibility and Replicability

Reproducibility and replicability are often discussed as a pair. Reproducibility refers to the ability of other researchers to arrive at the same results as a scientific study by using the same data and methods that were originally used. This requires making the data, methods, and computational tools used in the research openly accessible and thoroughly documented. The documentation should be clear enough to function as direct instructions to a complete stranger.
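In computational work, reproducibility in this narrow sense often comes down to eliminating every hidden source of variation—random seeds included. Here is a minimal sketch in Python; the dataset and analysis are invented for illustration, standing in for loading an archived dataset and running a documented pipeline:

```python
import random
import statistics

# Fix the random seed so anyone re-running the script gets identical numbers.
# State the seed prominently (and in the methods documentation), not buried.
SEED = 42

def run_analysis(seed: int = SEED) -> float:
    """Simulate drawing a sample and computing its mean.

    In a real project, this function would load the archived dataset
    and apply the analysis steps exactly as documented.
    """
    rng = random.Random(seed)  # isolated RNG: no hidden global state
    sample = [rng.gauss(20.0, 2.0) for _ in range(1000)]
    return statistics.mean(sample)

# Two independent runs must agree exactly -- that is reproducibility
# in the same-data, same-code sense described above.
assert run_analysis() == run_analysis()
```

The point is not the statistics but the discipline: a stranger with the same script and the same seed gets the same answer, with no step left to guesswork.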

Replicability, on the other hand, refers to the robustness of a finding: whether it holds up when tested with different data or methods. A replicable finding is one that can be found consistently and objectively. This is an important quality because it forces researchers to resist and avoid designing experiments (or data interpretation methods) that are influenced by conscious or unconscious biases. Without replicability, we risk circulating misinformation (which has an annoying staying power, by the way).

Reproducibility ensures that scientific findings are reliable and can be independently verified, which helps to build trust in scientific research. Victoria Stodden, Associate Professor of Information Sciences at the University of Illinois, is a prolific writer and researcher in the area of reproducibility, and has developed frameworks and guidelines that have helped improve the reproducibility of research findings through better data management and sharing practices.

Of course, there are other important themes to consider and frameworks for considering them. For example, Barend Mons, Professor of Bioinformatics at Leiden University Medical Center, speaks in terms of the “FAIR data principles”: Findable, Accessible, Interoperable, Reusable.

Digital data repository used for archiving and sharing research data, supporting open science and accessibility

Developing a Data Management Plan (DMP)

A data management plan (DMP) is a formal document that outlines procedures for managing research data throughout the data lifecycle, from collection and documentation to storage and sharing. For a grant writing consultant, crafting and working with a comprehensive DMP is indispensable (and a great way to avoid errors and headaches) when developing grant proposals. 

Researchers need to consider the following key variables when selecting and designing a DMP: 

  • data collection methods, including data types, sources, and formats
  • documentation standards, including metadata and data dictionaries
  • storage and backup procedures, including data security and preservation measures
  • data sharing policies, including access restrictions and embargo periods
  • measures for maintaining data integrity, including regular updating, cleaning, and review of data. It’s absolutely critical that you work with clean data! We recommend regular querying in databases like Access, as well as free (and easy to learn) programming languages like R.

Those are a lot of complex factors to try to keep in mind… so it’s easy to understand why there’s such a need for a DMP to be simple and systematic—and free up your attention to form insights from the information.
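Parts of the data-integrity work mentioned above—checking for missing values and duplicate records—can be automated. Here is a minimal sketch in Python using only the standard library; the column names and sample rows are invented for illustration:

```python
import csv
import io

def check_integrity(csv_text: str, required: list[str]) -> list[str]:
    """Return a list of human-readable problems found in a CSV dataset."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    # 1. Required columns must be present.
    for col in required:
        if col not in fields:
            problems.append(f"missing column: {col}")
    seen = set()
    for i, row in enumerate(reader, start=2):  # line 1 is the header
        # 2. No empty cells in required columns.
        for col in required:
            if col in fields and not (row.get(col) or "").strip():
                problems.append(f"row {i}: empty value in {col}")
        # 3. No fully duplicated records.
        key = tuple(row.get(c, "") for c in fields)
        if key in seen:
            problems.append(f"row {i}: duplicate record")
        seen.add(key)
    return problems

# Hypothetical dataset with one empty cell and one duplicated record.
data = "site_id,reading\nA1,20.5\nA2,\nA1,20.5\n"
print(check_integrity(data, ["site_id", "reading"]))
# → ['row 3: empty value in reading', 'row 4: duplicate record']
```

Running checks like these on a schedule—rather than eyeballing spreadsheets—is exactly the kind of systematic habit a good DMP should prescribe.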

Incidentally, many data scientists report a mismatch between their job expectations and the realities they face in the workplace. They often find themselves performing routine data preparation and management tasks (and feeling like they’re “herding cats”) rather than engaging in the advanced analytics and modeling they were trained for. This discrepancy leads to frustration and high turnover rates among data professionals. 

More broadly, developing a comprehensive DMP enables researchers to ensure the integrity, accessibility, and longevity of their research data.

Addressing Data Management Requirements from Funding Agencies

Many funding agencies require researchers to develop and implement data management plans as a condition of funding. These requirements vary by agency and may include compliance with specific data sharing policies, standards, and guidelines. 

Inconveniently, the cost of managing and storing vast amounts of data persists as a significant concern. Researchers need to balance the expense of high-quality data storage solutions with their often limited budgets. This financial pressure is particularly acute for long-term data preservation and the need for regular backups and disaster recovery solutions, which are essential but costly—and may necessitate earning a grant in order to cover the costs.

Researchers should be prepared to address questions related to data management and open science in grant proposals and project reports, demonstrating their commitment to transparent and accountable research practices. Funding agencies are increasingly sensitive to the ethical use of data, particularly with the rise of artificial intelligence and machine learning. 

We encourage you to keep an eye on the quickly evolving discourses about these themes, and to remain attentive to the guidelines and frameworks that ensure ethical data collection, usage, and decision-making. This ethical dimension adds another layer of complexity to data management, requiring continuous education and adaptation of practices to align with evolving standards.

Resources and Tools for Implementing Best Practices

Implementing best practices in data management and open science requires access to resources and tools that support data sharing, documentation, and preservation. Effective tools like these can make your work a lot easier.

For instance, we recommend using data repositories such as Dryad, Zenodo, and Figshare to archive and share research data with the scientific community. Metadata standards, such as Dublin Core and DataCite, provide guidelines for describing and documenting research data to enhance discoverability and interoperability. Data management software, such as DMPTool and REDCap, offers tools for developing, implementing, and monitoring data management plans. 
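To make the metadata idea concrete, here is a sketch of a machine-readable metadata record serialized to JSON. The field names loosely follow Dublin Core elements (title, creator, description, date, identifier); the dataset, author, and DOI are all invented placeholders, and a real deposit should follow the target repository’s own schema:

```python
import json

# A minimal descriptive-metadata record. Every value below is a
# placeholder for illustration -- including the DOI, which is not real.
record = {
    "title": "Hypothetical urban heat-island measurements, 2023",
    "creator": ["Doe, Jane"],
    "description": "Air-temperature readings from 12 rooftop sensors.",
    "date": "2023-09-01",
    "identifier": "doi:10.xxxx/example",  # placeholder identifier
    "format": "text/csv",
    "rights": "CC-BY-4.0",
}

# Serializing to JSON yields a sidecar file that travels with the
# dataset and helps make it findable and interoperable.
print(json.dumps(record, indent=2))
```

A small file like this, deposited alongside the data, is often the difference between a dataset a stranger can reuse and one they have to email you about.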

These resources and tools enable researchers to streamline their data management workflows. They also promote transparency and reproducibility in their research.

Conclusion

Data management and open science are essential components of transparent and accountable research practices—and can make our work more impactful in an increasingly complex world. By adhering to principles of data sharing, reproducibility, and transparency, researchers can enhance the integrity, reliability, and impact of their research findings. 

As you begin thinking about how you can honor these principles, remember to develop a comprehensive data management plan, address data management requirements from funding agencies, and leverage resources and tools for implementing best practices. By promoting collaboration, innovation, and knowledge dissemination in the scientific community, scientists and science writers can advance the collective pursuit of knowledge and discovery.


[1] https://policylabs.frontiersin.org/content/commentary-introduction-jean-claude-burgelman