FAIR research, what’s that?

FAIR is an acronym for Findable, Accessible, Interoperable, and Reusable. The goal of the FAIR principles is to make research more accessible and usable to a wider audience. The idea is that research data, methods, and outputs should be possible to search for, find, and access. It also means that research data should be interoperable and reusable: stored in such a way that it can be opened and used in different systems and combined with datasets from other research projects.

FAIR and Open Science

Open science is an umbrella term encompassing initiatives to make as much research as possible open and accessible to everyone. Making research open is a way of increasing transparency, replicability, and opportunities for collaboration. The concept includes several movements related to openness, such as open access to scientific publications, open research methods, and open source code.

The FAIR principles are often mentioned alongside open science, but it’s important to understand that FAIR does not require all data to be open to everyone. Instead, the principles are about making research data findable, accessible, interoperable, and reusable, but not necessarily open.

Both open science and the FAIR principles contribute to improving the quality, transparency, and reproducibility of research, which many believe promotes scientific advancement and innovation. Furthermore, the Swedish government has set a clear goal that the results of all publicly funded research should be immediately open and accessible ("as open as possible, as closed as necessary") and meet the FAIR principles by 2026.

FAIR in Practice

To achieve the FAIR principles, digital infrastructure is required to enable the publication of research data. Not all research data can or should be openly available, especially not if it contains sensitive and protected information, but metadata about the dataset in question can usually be published. Looking at each FAIR principle, it’s clear that good metadata is crucial for all of them.

Findable

To make research data findable, comprehensive, high-quality metadata is needed. This means that clear, structured, and detailed descriptions of the dataset and its content need to be available. Examples of metadata are title, creator, collection location, and publications related to the dataset. The metadata should follow recognized standards, and each dataset must have a unique and persistent identifier (PID). A persistent identifier makes datasets easier to find, for example by linking publications to the datasets they are based on.

To enhance the searchability of datasets, standardized vocabularies and terms should be used in the metadata descriptions (such as MeSH, Medical Subject Headings, in medicine).
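
As a rough illustration of what "comprehensive metadata" can mean in practice, the sketch below checks a metadata record for missing descriptive fields. The field names, the list of required fields, and the identifier are all hypothetical and chosen for this example; they do not follow any particular metadata standard, and a real repository will define its own schema.

```python
# Hypothetical list of fields a record should contain. These names are
# illustrative only and are not taken from any specific metadata standard.
REQUIRED_FIELDS = ("identifier", "title", "creator", "collection_location", "keywords")


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "identifier": "doi:10.1234/example-dataset",  # placeholder PID, not a real DOI
    "title": "Example survey dataset",
    "creator": "Example Researcher",
    "collection_location": "",  # left empty on purpose
    "keywords": ["example controlled-vocabulary term"],
    "related_publications": [],  # optional in this sketch
}

print(missing_fields(record))
```

Here the check reports that the collection location has not been filled in. A check like this is only a starting point: it shows whether fields exist, not whether their content is accurate or uses the right vocabulary.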

Accessible

Accessible means that datasets should be available in a secure and controlled manner and preserved over time. To achieve this, it's essential to use technical solutions that are well-established and long-lasting. It's crucial to be clear about the access conditions and limitations for each dataset. This is communicated through licenses that indicate whether a dataset is open or only accessible under specific conditions. For fully open datasets, licenses like Creative Commons can be used. Whether or not a dataset is openly accessible, its metadata should still be published. Metadata indicates the existence of the dataset and its contents, allowing potential users to find information about the dataset and request access (if it's not already published openly).

Additionally, all datasets should have clear documentation that describes how to download, interpret, and use them, as well as what tools or software are needed to open and analyze the data.

Interoperable

Datasets are made interoperable by ensuring that the data can be understood, used, and combined with other datasets, tools, and systems, as well as by other researchers. This is the most technical of the FAIR principles, as it relies on many technical standards and protocols.

By storing datasets in standardized formats, it becomes possible to open and work with the data across different tools and systems. Moreover, data should be interoperable in the sense that it is comprehensible to other researchers. This is achieved by using standardized vocabulary and terminology for the relevant subject area (such as MeSH in medicine). The metadata must include information about how the dataset is structured and in what format the data is stored.
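
One way to picture this is a data file in a widely supported format, accompanied by a machine-readable description of its structure. The sketch below writes a small table as CSV and describes its columns in JSON, using only the Python standard library. The column names, the example values, and the layout of the structure description are assumptions made for this example, not a prescribed standard.

```python
import csv
import io
import json

# Illustrative data; the column names and values are made up for this sketch.
rows = [
    {"sample_id": "S1", "temperature_c": 21.5},
    {"sample_id": "S2", "temperature_c": 19.0},
]

# Hypothetical structure description: file format, encoding, and columns.
structure = {
    "format": "text/csv",
    "encoding": "utf-8",
    "columns": [
        {"name": "sample_id", "type": "string", "description": "Sample identifier"},
        {"name": "temperature_c", "type": "number", "unit": "degree Celsius"},
    ],
}


def to_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Take the column order from the structure description so data and
# documentation cannot drift apart.
column_names = [column["name"] for column in structure["columns"]]
csv_text = to_csv(rows, column_names)
metadata_json = json.dumps(structure, indent=2)

print(csv_text)
print(metadata_json)
```

Because both outputs are plain text in common formats, another researcher can open them in almost any tool and see what each column means without contacting the original author.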

Reusable

This is the FAIR principle that ties everything together. When the other principles are fulfilled, datasets become reusable by other researchers. At this stage, it becomes clear that a data management plan (DMP) is highly useful in achieving the FAIR principles. By creating a DMP from the start, researchers are reminded to make deliberate choices regarding file formats, rich metadata descriptions, and more.

Ensuring data quality is key to reusability. This means that the data should be cleaned of errors and duplicates, and that any potential sources of error or limitations in the dataset are documented in the metadata. This helps others to assess whether the dataset is suitable for their specific research project.
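
A minimal sketch of this kind of quality step is shown below: it removes exact duplicate rows and records both the clean-up and a known limitation (missing values) in a small note that could be included in the metadata. The data, the function name, and the keys of the quality note are hypothetical, and real datasets will usually need subject-specific checks beyond this.

```python
def remove_duplicates(rows: list[dict]) -> tuple[list[dict], int]:
    """Drop exact duplicate rows, keeping first occurrences and their order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique, len(rows) - len(unique)


# Illustrative raw data with one duplicate row and one missing value.
raw = [
    {"sample_id": "S1", "value": 4.2},
    {"sample_id": "S2", "value": None},
    {"sample_id": "S1", "value": 4.2},
]

cleaned, removed = remove_duplicates(raw)
missing = sum(1 for row in cleaned if row["value"] is None)

# Hypothetical quality note to include alongside the dataset's metadata.
quality_note = {
    "duplicates_removed": removed,
    "rows_with_missing_value": missing,
}

print(cleaned)
print(quality_note)
```

Documenting what was cleaned, and what could not be fixed, lets other researchers judge for themselves whether the dataset fits their purpose.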

The Library's Role in Promoting FAIR Research Data

Libraries and librarians play a crucial role in supporting researchers in their efforts to follow the FAIR principles. With expertise in managing information and metadata, they can assist with metadata standards and descriptions of research data, and help make the data searchable and usable. Librarians can also help with documentation, publishing metadata and datasets, and choosing repositories. They can assist in designing and implementing DMPs and can collaborate with other experts, such as archivists, on long-term archiving and preservation planning.

Challenges in Implementing the FAIR Principles

The implementation of FAIR principles presents technical, organizational, and cultural challenges, but a successful implementation also offers great opportunities for improving research quality and increasing international collaboration.

There is a significant need for infrastructure for the storage, publication, and archiving of research data. However, resources for such infrastructure are often lacking, and both the initial investments and the ongoing maintenance costs are perceived as expensive. There is also a shortage of training resources, even though educational efforts are necessary to improve knowledge about the FAIR principles and their practical application.

Since openness and sharing are relatively new aspects of research data management, many have concerns regarding competition, copyright, and responsibilities related to data sharing. Researchers may also lack confidence in how their data will be managed, used, and cited, making them hesitant to share their datasets. Those who wish to share data might encounter legal and ethical obstacles. Additionally, there is currently a general lack of incentives to share datasets and follow the FAIR principles.

However, there is hope that the future will bring clear and reasonable incentives for open data and FAIR practices from both the research community and research funders. With more information and education on the topic, knowledge about the FAIR principles should increase, leading to further development of storage and publication infrastructure. Additionally, new AI solutions may be able to assist with metadata, standardization, and quality control. Taken together, this makes the outlook for more FAIR research data in the world fairly bright.

Read more

The FAIR principles were introduced in a 2016 article in Scientific Data, a journal in the Nature family: The FAIR Guiding Principles for scientific data management and stewardship

Swedish National Data Service information on the FAIR principles