At BioSistemika, we get many questions about data integrity and about what matters when implementing the FAIR principles in a laboratory or organization. We have gathered the most frequently asked questions and share the answers in this blog post.
What is data integrity?
Data integrity describes the quality of data throughout its entire lifecycle. It means that data is valid, complete, authentic, and consistent, and that it does not change between source and destination. To fulfill these requirements, you need appropriate quality and risk management systems that assure the security and reliability of the data.
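One common way to check that data has not changed between source and destination is to compare cryptographic checksums. The sketch below is only an illustration, assuming the data are files on a local disk. The function names `sha256_of` and `verify_copy` are hypothetical and do not come from any particular laboratory software.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large instrument files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, destination: Path) -> bool:
    """Return True when both files have identical content (same digest)."""
    return sha256_of(source) == sha256_of(destination)
```

Storing the digest alongside the data at creation time makes it possible to repeat the same check later, for example after a transfer to an archive.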
Why is data integrity important?
Data integrity is crucial because it ensures the transparency and trustworthiness of scientific data. As the National Academy of Sciences explained, it is necessary to ensure public trust in research data, which is especially important today. That trust is essential for scientific knowledge and innovation to advance.
What are common data integrity issues?
Most data integrity issues are related to manual tasks, such as recording the name and time of data creation, noting the instrument used to generate the data, and transferring the data to other media or computers.
The study shows that the likelihood of scientific data being available decreases significantly every year. While almost 100% of the data can be accessed 2 years after publication, only 33% of the data are available from studies published 10 years ago (Figure 1). In most cases, the data are unavailable because the storage media have been lost or are inaccessible.
How to improve data integrity?
There are many areas where each of us can find an opportunity to improve the integrity of the generated data. In most cases, the steps and tasks that reduce our data quality are related to manual work, which is significantly more prone to errors than automated workflows.
Digitalization of laboratory processes can be an effective way to strengthen data integrity and credibility, both in the scientific community and in the public eye. The first step that you can take today is to start mapping your processes and data flows. In many cases, you will conclude that at least some processes can be fully automated. Laboratory automation will also bring you closer to achieving data integrity, mainly because it eliminates the errors that result from manual tasks. You can read more about this in our blog post.
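As a small illustration of automating a manual step, the sketch below records basic details about a data file (name, timestamps, instrument, computer) in a JSON file next to it, so that nobody has to write them down by hand. It is a minimal sketch under stated assumptions: the function `write_sidecar`, the `.meta.json` naming, and the field names are hypothetical, and the instrument identifier is assumed to be supplied by your acquisition software.

```python
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def write_sidecar(data_file: Path, instrument_id: str) -> Path:
    """Write '<file name>.meta.json' next to data_file and return its path."""
    stat = data_file.stat()
    record = {
        "file_name": data_file.name,
        "size_bytes": stat.st_size,
        # Last modification time of the file, as reported by the file system.
        "modified_utc": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        # Time at which this metadata record was written.
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
        "instrument_id": instrument_id,  # assumed to come from the acquisition software
        "host": platform.node(),         # name of the computer that wrote the record
    }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar(Path("plate7.csv"), "thermocycler-02")` would create `plate7.csv.meta.json` in the same folder. The file and instrument names here are made up for the example.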
What are FAIR data principles?
FAIR data principles were developed to improve data integrity. They should serve as a guide for anyone in science who produces and publishes data. Their use should also extend to the algorithms, tools, and workflows that produced the data in the laboratory, as well as to the digital environment in which the data is stored.
Why would I use FAIR data principles?
There are several reasons why it makes sense to make your data FAIR:
If you publish your results alongside all the tools and software you used to generate the scientific data, your research will be easier to reproduce and verify. That will improve the integrity of your data, and you will be able to return to the procedure even after some time has passed.
When your data is trustworthy in the long run, you avoid unnecessary costs associated with repeated analyses. That is especially important when your research is publicly funded and an experiment requires a dedicated project, or part of one.
Acceleration of scientific discoveries
When reliable data can be continually reused, we are better equipped to face and solve the global challenges ahead of us.
Meaning of FAIR data principles
FAIR data principles are high-level guidance to follow when thinking about improving data integrity in your laboratory. To help you implement these principles in your everyday laboratory routine, the overview below elaborates on the meaning behind each of them: findable, accessible, interoperable, and reusable.
Findable means that either people or computers can find your data. This applies to data as well as metadata, which essentially describes other data.
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data they describe
F4. (meta)data are registered or indexed in a searchable resource
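The four findability points can be pictured with a toy example. The sketch below is an assumption-laden illustration, not a real registry: `MetadataRecord` and `MetadataIndex` are hypothetical names, the index lives only in memory, and a UUID stands in for a persistent identifier (in practice, persistent identifiers such as DOIs are issued and maintained by a registry).

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class MetadataRecord:
    data_id: str      # F1 and F3: identifier of the data this record describes
    title: str
    instrument: str
    keywords: List[str] = field(default_factory=list)  # F2: descriptive metadata


class MetadataIndex:
    """Toy in-memory index standing in for a searchable resource (F4)."""

    def __init__(self) -> None:
        self._records: Dict[str, MetadataRecord] = {}

    def register(self, title: str, instrument: str,
                 keywords: Iterable[str]) -> MetadataRecord:
        # uuid4 gives a unique ID; persistence depends on the registry, not the ID.
        record = MetadataRecord(str(uuid.uuid4()), title, instrument, list(keywords))
        self._records[record.data_id] = record
        return record

    def search(self, term: str) -> List[MetadataRecord]:
        """Case-insensitive match on the title or on an exact keyword."""
        term = term.lower()
        return [
            record for record in self._records.values()
            if term in record.title.lower()
            or any(term == keyword.lower() for keyword in record.keywords)
        ]
```

For example, after `index.register("qPCR run plate 7", "thermocycler-02", ["qPCR"])`, a call to `index.search("qpcr")` returns that record, including the identifier needed to retrieve the data.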
Accessible data does not mean that your data must be open for everyone to access. It means that the data can be retrieved through a well-defined procedure, with authentication and authorization where necessary.
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1. the protocol is open, free, and universally implementable
A1.2. the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
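The behaviour described by A1.2 and A2 can be sketched with a toy repository. This is a minimal sketch under stated assumptions: the `Repository` class and its methods are hypothetical, everything is kept in memory, and a real system would expose the same lookups over a standardized protocol such as HTTPS rather than as Python method calls.

```python
from typing import Dict, Iterable, Optional, Set


class Repository:
    """Toy repository: lookup by identifier with optional access control."""

    def __init__(self) -> None:
        self._metadata: Dict[str, dict] = {}
        self._data: Dict[str, bytes] = {}
        self._restricted: Dict[str, Set[str]] = {}  # identifier -> allowed users

    def deposit(self, identifier: str, metadata: dict, data: bytes,
                allowed_users: Optional[Iterable[str]] = None) -> None:
        self._metadata[identifier] = metadata
        self._data[identifier] = data
        if allowed_users is not None:
            self._restricted[identifier] = set(allowed_users)

    def get_metadata(self, identifier: str) -> dict:
        # A2: metadata stay retrievable even after the data are withdrawn.
        return self._metadata[identifier]

    def get_data(self, identifier: str, user: Optional[str] = None) -> bytes:
        allowed = self._restricted.get(identifier)
        # A1.2: check authorization only where the data set is restricted.
        if allowed is not None and user not in allowed:
            raise PermissionError(f"{user!r} may not access {identifier}")
        if identifier not in self._data:
            raise LookupError(f"data for {identifier} are no longer available")
        return self._data[identifier]

    def withdraw_data(self, identifier: str) -> None:
        # Remove only the data; the metadata record is deliberately kept.
        self._data.pop(identifier, None)
```

The design choice worth noting is that metadata and data are stored separately, so withdrawing a data set never deletes the record that describes it.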
Interoperability of data means that both people and machines can recognize and use the data. Standardization and the use of recognized file formats are essential to achieve this.
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
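One possible way to make metadata machine-readable is to express it as JSON-LD with a shared vocabulary. The sketch below uses the schema.org `Dataset` type purely as an example; the choice of vocabulary is our assumption, the function `to_jsonld` is hypothetical, and the `example.org` identifiers are placeholders.

```python
import json
from typing import Optional


def to_jsonld(identifier: str, name: str, based_on: Optional[str] = None) -> str:
    """Serialize a minimal data set description as JSON-LD text."""
    record = {
        "@context": "https://schema.org",  # I1/I2: shared, published vocabulary
        "@type": "Dataset",
        "identifier": identifier,
        "name": name,
    }
    if based_on:
        # I3: a qualified reference that states how the two data sets relate.
        record["isBasedOn"] = based_on
    return json.dumps(record, indent=2, sort_keys=True)


print(to_jsonld(
    "https://example.org/dataset/42",
    "qPCR run, plate 7",
    based_on="https://example.org/dataset/41",
))
```

Because the property names come from a published vocabulary, another tool can interpret the record without knowing anything about the laboratory that produced it.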
Reusable data is data that other scientists can easily use to understand and recreate the experiment.
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
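A simple automated check can flag metadata records that are not yet ready for reuse. The sketch below is a minimal illustration: the field names `license`, `provenance`, and `community_standard` are assumptions made for this example, and the mapping to R1.1 through R1.3 is our own reading of the list above.

```python
from typing import List

# Hypothetical field names mapped to the sub-principle they support.
REQUIRED_FOR_REUSE = {
    "license": "R1.1",
    "provenance": "R1.2",
    "community_standard": "R1.3",
}


def missing_reuse_fields(metadata: dict) -> List[str]:
    """Return the sub-principles whose field is absent or empty."""
    return [
        f"{principle} ({name})"
        for name, principle in REQUIRED_FOR_REUSE.items()
        if not metadata.get(name)
    ]


record = {"license": "CC-BY-4.0", "provenance": ""}
print(missing_reuse_fields(record))
```

A check like this only confirms that the fields are filled in. Whether the license, provenance, and standard are accurate still needs human review.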
How to work towards FAIR data in your lab
Saving data in many different locations is still very common in laboratories. However, it is not easy to search through paper documents, and retrieving folders and files without proper metadata can be a nightmare.
To avoid this, we recommend using dedicated (laboratory) software that supports the FAIR principles described above. Keep in mind, however, that no single solution ensures data integrity out of the box. To achieve data integrity, you will much more likely need an integrated ecosystem of applications and the collaboration of different stakeholders (IT department, QA department, R&D department, instrument vendors, and others).
- Data integrity is important because it ensures the transparency and trustworthiness of scientific data.
- Most data integrity issues are related to manual tasks that can be automated.
- FAIR (findable – accessible – interoperable – reusable) data principles are used when discussing the high-level approach to data integrity.
- You can improve the integrity of data with digitalization and automation of laboratory processes and workflows.
Are you facing challenges on your digitalization journey?
You are welcome to book a meeting with our laboratory digitalization consultancy specialists.