Storing data as files on a hard drive is a common practice for software in many industries. However, when entering regulated environment markets such as pharmaceuticals, biotech, and medical devices, upgrading software to be 21 CFR Part 11 ready is necessary. Data integrity is paramount for 21 CFR Part 11 readiness, and database systems can offer significant advantages over storing data in files. Still, there are numerous complex business and design decisions to be made that may make a significant difference in the cost of such migration.
This blog post discusses the pros and cons of migrating to a database system and other factors to consider when designing laboratory software.
Regulated environments and data integrity
We have already written about software development for the pharmaceutical industry and how to implement 21 CFR Part 11 features into your software on our blog. For further discussion, we’ll recap two sets of principles applicable when designing and evaluating a system from a data integrity point of view.
The first set of principles is ALCOA-CCEA principles, also known as ALCOA+ principles, which are guidelines used to ensure the integrity and reliability of electronic records. Adhering to these principles may reduce the risk of errors and fraud. The ALCOA+ principles require electronic records to be attributable, legible, contemporaneous, original, accurate, complete, consistent, enduring, and available.
The second set of principles or properties a 21 CFR Part 11 ready system must meet is derived directly from the 21 CFR Part 11 requirements:
- data security – ensuring no data is corrupted or lost
- traceability – all actions relevant to electronic records are traced in audit trails
- non-repudiation – approval of an electronic record with an electronic signature can not be disputed
- no falsification – tampering of records is made difficult and can be detected
Data security and no falsification may be most relevant for this blog post since database systems shine in that department.
Storing data in files
Storing data in a file is, in many cases, a straightforward task. One of the most attractive features is that libraries that handle interacting with the operating system’s file system and handling files themselves are part of the standard library suits in most programming languages. Moreover, object (de-)serialization libraries are also readily available and do not introduce a complex dependency. However, one can easily see shortcomings in storing data in files when it comes to questions about concurrent access, fault tolerance, etc. On the contrary, various database systems resolve these dicey issues. Nevertheless, choosing the right database system is a challenging task. Furthermore, a database system is a much more complex software dependency that requires specialized skill sets.
We have compiled a list of some rules of thumb that identify situations where storing data in files is a reasonable choice:
- Kiosk mode: Systems where software runs in the so-called kiosk mode restrict non-privileged users from interacting with the system outside the provided user interface, e.g., no file browsing and modifications through the operating system’s file explorer. Kiosk mode encloses the system and drastically reduces interaction points and the attack surface. In general, kiosk mode makes many data integrity issues irrelevant.
- Simple data: A file-based system may be sufficient if the data model is relatively simple and does not require a high level of structure or organization. For example, a file-based system may be adequate for storing a small number of images or text-based documents such as simple, non-related configuration files.
- Infrequent access: A file-based system may be a good choice if accessing the data is infrequent. Note that file-based systems may not be as performant as database systems when searching and retrieving data.
- Low volume of data: A file-based system may be sufficient for storing a relatively small volume of data. However, as the volume of data grows, a file-based system may become less efficient, and a database system may be a better choice.
- Standalone applications: For a standalone application that does not need to be tightly integrated with other systems or share data with other users, e.g., project files, a file-based system may be sufficient. Still, some file integrity-checking mechanisms must be implemented to prevent users from interacting/tampering with files outside the application.
- Simple queries: A file-based system may be sufficient for simple queries on the data, such as retrieving all records or filtering based on a single criterion.
It is easy to see that file-based data storage system quickly becomes the system’s limiting factor. Drawbacks of such design that stand out are:
- Data security and integrity: Implementing security measures to protect the data stored in a file-based system can take significantly more work.
- Constraints and data integrity checks: A file-based system usually provides a lower level of data integrity than a database management system.
- Data sharing: In a file-based system, allowing multiple users to access and update the data concurrently may be more difficult to implement correctly.
- Data organization: In a file-based system, the data is stored in files and may not be organized in a structured manner. That can make the implementation of efficient searching and retrieving data more difficult.
- Performance: A database management system may perform better than a file-based system when searching and retrieving data, primarily if the data needs to be structured.
- Scalability: A file-based system may not scale as well as a database management system, especially if data grows significantly.
Data security, data integrity, and sharing could quickly become the most challenging points in upgrading a file-based system to a 21 CFR Part 11-ready system, especially as they are interconnected. Let’s examine how introducing a database system might resolve some issues.
Storing data in a database
A database is an organized collection of data stored and accessed electronically. A database management system (DBMS), or colloquially a database system, is specialized software for creating and managing data efficiently and allowing it to safely persists over time. The principles and technologies behind DBMS stand on decades of research and development. A tapestry of different DBMSs has been conceived, each adjusted for the specificity of the
particular problem domain addressed. We may divide database systems into relational database systems and NoSQL database systems, which accommodate everything that is not classified as a relational database system. Choosing the best DBMS for a particular problem is a topic for another article.
Data integrity is a crucial requirement for DBMS. A DBMS may support several measures for data integrity:

Constraints and integrity checks
Constraints enforce rules on the data stored in the database to ensure that a column only contains values within a specified range or that a column cannot contain null values. Constraints help ensure the data’s accuracy and consistency in the database. A DBMS may also perform integrity checks on the data to ensure accuracy and consistency. Accuracy implies trustworthiness which requires assurance that data has not been tampered with.

Transactions
Transactions are the essential concept in most DBMS, along with the ACID properties of a transaction. A database transaction is a sequence of operations that satisfies the ACID properties, and as such, it may be perceived as a single logical operation on the data. ACID is a set of properties for transactions that a DBMS with ACID transactions must implement. The acronym ACID stands for Atomicity, Consistency, Isolation, and Durability.
Implementing something remotely resembling ACID transactions in an application is exceedingly complex. Regardless. ACID transactions cover adequately 21 CFR Part 11 requirement about data security that no data is corrupted. Note that not all DBMSs support transactions, let alone ACID transactions, especially NoSQL DBMSs.

Data backup and recovery
A DBMS typically includes mechanisms for backing up and recovering data in the event of a failure or disaster. That prevents data loss.

Access control
Some DBMSs support access control to restrict access to data stored in the database. That can include user authentication to ensure that only authorized users can access the data. A database can be deployed locally where only users on a local system, e.g., PC, instrument, etc., may access the database system. Alternatively, it can be deployed remotely, where users from different systems may access the same database. That is an excellent feature for sharing data.

Role-based access control
Some DBMSs support even finer access control, namely role-based access control (RBAC), which can restrict user access to specific database objects or actions based on their roles. RBAC and access control offer a handy way to implement the triplet create, check, and approve an electronic record by three different persons with different roles within a typical GMP workflow.

Data encryption
Some DBMSs support encrypting the data stored in the database files and encrypting data transmitted over the network. That prevents unauthorized access to the data.

Auditing
Some DBMSs support tracking and logging all access to the data stored in the database. This can include logging the user who accessed the data, the time of the access, and when the specific data was accessed and modified. There might be mechanisms available to audit changes in the data itself automatically.
These are just a few measures that a DBMS may support to ensure data integrity. A particular DBMS may not support all of that. In general, the more features a DBMS has, the more complex the DBMS is to set up and maintain. Furthermore, depending on the specific DBMSs and the application’s needs, a system designer may select only a subset of DBMS features to use.
Practical considerations before migration to database system for lab software
The decision to migrate currently implemented file-based storage to database-based storage is not easy—quite the opposite. There are several key points that a system architect should consider before migrating from a file-based system to a database-based system.
- Upgrading systems in production: If a system is already deployed in production and a typical workflow is incompatible with a common workflow used in regulated environments, it is unreasonable to expect current users to change their workflows.
- Kiosk mode: Keeping data encapsulated in a system that operates in kiosk mode is similar to keeping data in a database system, albeit kiosk mode is not nearly as sound as some database systems. But in some cases, this might be a viable alternative.
- Data organization: Consider how the data is currently organized in the file-based system and how it will be organized in the database. Migration to DBMS requires to design of the data model (schema for relational database systems), to accommodate the data and ensure that it is structured in an easy search and retrieve way.
- Data migration: Extracting the data from the files and importing it into the database may require data transformation and cleanup. If a product is already deployed at the customer site, consider the logistics and risks associated with such migration.
- Data access and querying: Consider how the software will access and query the data in the database. That might require reimplementing the application’s data access layer and writing SQL queries to access and manipulate the data.
- Data security: Consider securing the data in the database to prevent data loss or security breaches. Regulated environments are usually not connected to the Internet, which helps quite a bit.
- Performance: Consider how the database will perform under various workloads and how it will scale as data grows. Some optimizations of the database system may be required to improve performance.
- Cost: Consider the cost of implementing and maintaining a database system, including the cost of personnel with database expertise for designing and implementing the system, the hardware and software requirements, and the ongoing maintenance and support costs. Even lightweight choices, such as SQLite, may have high costs associated.
- Vetting process: Choosing a good DBMS for a particular problem domain is challenging. On top of that, a system designer must vet the DBMS that fits the IT requirements of regulated industries. DBMS may be deployed separately, and the customer’s IT department allows only established DBMS. On the other hand, concealing a very niche DBMS into a system also introduces some risks.
- Licensing: Consider different licensing (and pricing) of the 21 CFR part 11 product variant.
If a system is still in the design phase and at least a moderately complex data model or data workflows are expected in the long run, then it’s safer to design a system with the database system. A correctly selected database system provides many benefits and prevents some very tricky implementation issues, e.g., concurrency.
On the other side of the spectrum, if a system is already in production and “only” an upgrade with essential features for 21 CFR part 11 readiness needs to be implemented, the decision for migration is significantly more complex. The above points should help with the appropriate decision to meet your needs and requirements.

Summary
Data integrity is essential for the 21 CFR Part 11 ready systems. In that regard, upgrading a software system with data stored in files to meet the data integrity requirements quickly becomes exceedingly difficult for the moderately complex data model or workflows that incorporate data sharing. Migration to some database system data storage might be a viable solution.
The blog post presents several aspects of designing or upgrading software systems to become 21 CFR part 11 ready, emphasizing the shortcomings of data stored in files and potential solutions to migrate data to a database system. Ultimately, there are numerous complex design decisions to be made that can make a significant difference in the cost of such migration. Or as one wise person once said: There are no solutions, only tradeoffs.
This article is an overview of the in-depth whitepaper, which you can download here.