Open Research Data

Figure by Nik Papageorgiou, 2020; “Open Science is the Future”, CC BY-NC-ND 3.0

“Open research data that include, among others, digital and analogue data, both raw and processed, and the accompanying metadata, as well as numerical scores, textual records, images and sounds, protocols, analysis code and workflows that can be openly used, reused, retained and redistributed by anyone, subject to acknowledgement. Open research data are available in a timely and user-friendly, human- and machine-readable and actionable format, in accordance with principles of good data governance and stewardship, notably the FAIR (Findable, Accessible, Interoperable, and Reusable) principles, supported by regular curation and maintenance.” (UNESCO, 2021)

“As open as possible, as closed as necessary”

Some data should not be openly available and reusable, according to criteria established by the institution or local government. Such data must remain the exception, for example where opening it would violate legislation, compromise state security, or infringe patents. It is also important to develop tools and protocols for pseudonymizing and anonymizing data, as well as systems for mediated access, so that as much data as possible can be shared as appropriate. The need for justified restrictions may also change over time, allowing the data to be made accessible, or access to be restricted, at a later point (UNESCO, 2021).

Best practices for Research Data Management

Planning and organising data collection is essential for scientific success. It decreases the probability of unexpected occurrences such as loss of data, errors, or data misuse and, just as important, it is a way to follow specific instructions from research funders.
When data is organized following a plan and shared, it supports transparency and openness and increases the return on investment (ROI) of publicly funded research. Some of the advantages of sharing data are (1) reinforcing the verification and replication of the original results, (2) promoting new research, (3) fostering collaborations, (4) avoiding duplicate data collection and the spread of fraudulent data, (5) enhancing visibility and citation, and (6) preserving data for future use.

How can you organize and manage your data for publication?

Open science democratizes access to scientific knowledge and thus enhances research development. Research Data Management is a fundamental requirement for the validation and reproducibility of scientific results. Moreover, funders are aligning their position with the mantra “As open as possible, as closed as necessary” to ensure transparency and openness.
The data collected and used to validate a scientific project should follow the FAIR principles, an acronym for findable, accessible, interoperable, and reusable.

Creating a DMP (Data Management Plan)

In the scope of H2020 projects and FCT grants, a detailed Data Management Plan (DMP) should be submitted along with the other project requirements. A DMP is a document that details in advance how the data for a specific project will be created, collected, stored, and documented, and who will oversee its preservation for long-term usability.
Although a DMP is not a static document and will be adapted during the research process, it serves as a guideline for following best practices in Research Data Management.

The Digital Curation Centre provides a guiding checklist with questions and tips covering:

  • Administrative data;
  • Data collection;
  • Documentation and metadata;
  • Ethical and legal issues;
  • Storage and replications;
  • Selection and preservation;
  • Data sharing;
  • Resources and responsibilities.

When creating a DMP, it is fundamental to organize and document data in a systematic way to facilitate future preservation and long-term storage. This means that you should be thinking in advance about formats and file names according to the instructions of the repository where the dataset is going to be archived.
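As an illustration of deciding on file names in advance, the sketch below builds names from a fixed pattern. The pattern itself (project, description, collection date, version) and the function name `build_filename` are assumptions made for this example; the repository where the dataset will be archived may prescribe a different convention, which should take precedence.

```python
import re
from datetime import date


def build_filename(project: str, description: str, collected: date,
                   version: int, extension: str) -> str:
    """Build a consistent, sortable file name (hypothetical convention)."""

    def slug(text: str) -> str:
        # Lowercase the text and replace every run of characters other than
        # a-z and 0-9 with a single hyphen, then trim hyphens at the ends.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # ISO-style date (YYYYMMDD) and zero-padded version keep files sortable.
    return (f"{slug(project)}_{slug(description)}_"
            f"{collected:%Y%m%d}_v{version:02d}."
            f"{extension.lstrip('.').lower()}")


name = build_filename("Soil Survey", "pH readings (site A)",
                      date(2021, 3, 5), 2, "CSV")
print(name)  # soil-survey_ph-readings-site-a_20210305_v02.csv
```

Avoiding spaces and special characters, and writing dates as year, month, day, keeps names portable across operating systems and makes files sort chronologically.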

Some online tools can help researchers create a DMP aligned with the funder’s requirements.

Common topics that funders require in a DMP are a description of the data (content, type, format, volume), the methodology followed when collecting the data, ethical and intellectual property issues, data sharing plans (how, when and with whom), and long-term preservation strategies.
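The common topics listed above can be kept as a simple checklist while drafting. The following is a minimal sketch, not a funder template: the key names in `REQUIRED_TOPICS` and the helper `missing_topics` are hypothetical and chosen only to mirror the topics in the paragraph above.

```python
# Hypothetical keys mirroring the common DMP topics; not an official schema.
REQUIRED_TOPICS = [
    "data_description",
    "collection_methodology",
    "ethics_and_intellectual_property",
    "data_sharing",
    "long_term_preservation",
]


def missing_topics(dmp: dict) -> list:
    """Return the required topics that are absent or left empty in a draft."""
    missing = []
    for topic in REQUIRED_TOPICS:
        value = dmp.get(topic)
        if value is None or not str(value).strip():
            missing.append(topic)
    return missing


draft = {
    "data_description": "Survey responses, CSV, about 50 MB",
    "collection_methodology": "Online questionnaire",
    "data_sharing": "",  # still to be written
}
print(missing_topics(draft))
```

A check like this only confirms that each topic has been addressed; whether the content satisfies a particular funder still has to be verified against that funder's own guidance.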

Sensitive Data

The GDPR (General Data Protection Regulation) took effect in the EU in 2018. As a result, there are a few steps you should follow to avoid any infringement or penalty. Information that allows the identification of a person, including PII (Personally Identifiable Information), PHI (Protected Health Information) and other sensitive information, should be transformed (anonymized or pseudonymized), encrypted, and archived with a closed license.
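One common way to pseudonymize a direct identifier is to replace it with a keyed hash. The sketch below uses HMAC-SHA256 from the Python standard library; it is an illustration under stated assumptions, not a compliance recipe. The function name `pseudonymize`, the example records, and the truncation to 16 hexadecimal characters are choices made for this example, and the secret key shown is a placeholder that would have to be generated securely and stored separately from the data.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records
    belonging to one person can still be linked without storing the name.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    # Truncation to 16 hex characters is an assumption for readability.
    return digest.hexdigest()[:16]


# Placeholder key: in practice, generate it securely and keep it apart
# from the dataset.
key = b"replace-with-a-secret-key-stored-separately"

# Made-up example records.
records = [
    {"name": "Ana Silva", "age": 34},
    {"name": "Ana Silva", "age": 35},
    {"name": "Rui Costa", "age": 41},
]

# Drop the direct identifier and keep only the pseudonym.
pseudonymized = [
    {"participant_id": pseudonymize(r["name"], key), "age": r["age"]}
    for r in records
]
```

Because anyone holding the key can recompute the pseudonyms, pseudonymized data generally still counts as personal data under the GDPR, and the remaining fields (here, age) may also need to be generalized or removed before sharing.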

For more guidelines on sensitive data, check the OpenAIRE factsheet on personal data and open research data.

License and Archive

When the dataset is ready for storage in an archive, you should choose the license that best fits the nature of your dataset. If it is sensitive data, you should archive it with a closed license. If the research data are classified as literary work or open software, a CC BY 4.0 license is usually applied. A CC BY-SA (ShareAlike) license is also compatible with Open Access policies and with Plan S.

Finally, the research data should be archived in a trustworthy repository: institutional, specific to the research discipline, or general. To find a repository for your research data, search first for a disciplinary repository. If none is suitable, look for an institutional repository that guarantees long-term preservation, or use a general repository such as Zenodo. You can search re3data.org for repositories that suit your needs.

How to license your data?

Learn more about licenses for research data and how to apply them