DNA Data Storage – SME Instrument I
Project duration: 1.12.2019 – 30.5.2020
Project budget: 70.000€
Project co-financing: 50.000€
Grant: SME Instrument I (H2020)


DNA data storage
The world is producing data at an ever-increasing rate: about 2.5 exabytes (EB) per day in 2017, projected to rise to 463 EB per day in 2025. With the rise of connected devices, the so-called Internet of Things (IoT), these numbers are likely to be even greater. Existing data storage technologies are sufficient for now, but we are likely to face challenges soon because of limited supply and the technologies' shortcomings: limited lifespan (up to 30 years), physical space requirements, and limited reliability. That is why only 50% of all the data generated in the world today is saved. This gap will widen: in 2025 we will be able to save only 20% of all data. Since data is the fuel for artificial intelligence, which is revolutionizing our society, technology, and healthcare, it is all the more important to invest in new data storage technologies that overcome the limitations of existing ones.
One of the most promising new data storage media is DNA, a molecule that can hold 20 million times more data per gram than state-of-the-art technologies. Because traditional DNA synthesis methods are still far too expensive for data storage, novel ways are needed to encode data as nucleotides (the building blocks of DNA) and to synthesize the data-bearing DNA molecules efficiently.
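To make "encoding data as nucleotides" concrete, the sketch below maps every two bits of a file to one of the four bases. This is a generic textbook mapping chosen for illustration only. It is an assumption, not the patented Biosistemika encoding, and the names `BITS_TO_BASE`, `encode` and `decode` are hypothetical.

```python
# Illustrative only: a naive 2-bits-per-nucleotide mapping.
# ASSUMPTION: this is NOT the patented encoding described in the text.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, 4 bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a nucleotide string produced by encode() back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

A practical scheme would also have to handle synthesis and sequencing errors and avoid problematic sequences such as long runs of the same base. This sketch ignores all of that.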
At Biosistemika we invented and patented a novel data encoding system that allows cheap and fast synthesis of DNA molecules, and we demonstrated its usability at laboratory scale. As the next step, we are scaling down the volumes a million-fold to reach a price comparable to that of existing data storage technologies. In this study we investigated which technologies could be used for scaling down and identified inkjet technology as the most promising, owing to its low volumes (1.5 picoliters) and high speed (more than 100,000 droplets per second). We also studied the economic feasibility of DNA data storage and determined that the cost of storing 1 gigabyte of data in DNA can be comparable to the cost of storing the same data on magnetic tape.
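The inkjet figures quoted above can be turned into rough throughput numbers. The sketch below is a back-of-the-envelope calculation, not a model of the actual process. It takes the droplet volume and droplet rate from the text and assumes, purely for illustration, that each droplet adds one nucleotide carrying 2 bits and that the droplets form a single stream. All function names are hypothetical.

```python
# Figures quoted in the text.
DROPLET_VOLUME_PL = 1.5        # picoliters per droplet
DROPLETS_PER_SECOND = 100_000  # lower bound ("more than 100,000")

# ASSUMPTIONS (not from the source): one droplet adds one nucleotide,
# and each nucleotide carries 2 bits of data.
BITS_PER_DROPLET = 2


def liquid_per_second_nl() -> float:
    """Liquid dispensed per second, in nanoliters (1 nL = 1,000 pL)."""
    return DROPLET_VOLUME_PL * DROPLETS_PER_SECOND / 1_000


def bytes_per_second() -> float:
    """Data written per second under the assumptions above."""
    return DROPLETS_PER_SECOND * BITS_PER_DROPLET / 8


def hours_to_write(num_bytes: float) -> float:
    """Hours needed to write num_bytes with a single droplet stream."""
    return num_bytes / bytes_per_second() / 3_600


if __name__ == "__main__":
    print(f"{liquid_per_second_nl():.0f} nL of liquid per second")
    print(f"{bytes_per_second():.0f} bytes per second")
    print(f"{hours_to_write(1e9):.1f} hours per gigabyte")
```

Under these assumptions a single stream dispenses 150 nL of liquid and writes 25,000 bytes per second, so one gigabyte would take about 11 hours. Running several streams in parallel would shorten this proportionally.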
DNA data storage is an important new technology that could solve many present and future data storage challenges, as outlined above. The low-volume liquid handling technology behind it could have an equally great impact on life sciences and medicine. It would greatly reduce the price of diagnostic tests, particularly genetic tests and pathogen detection. This is especially relevant during the COVID-19 pandemic, when greater accessibility of tests is essential. Low-volume liquid handling would also enable new diagnostic approaches, moving from targeted diagnostics to screening diagnostics, which would help diagnose diseases that might not be obvious from the symptoms.
DNA data storage is already relevant today for digital preservation. Its longevity (a lifespan of 1000+ years) and resistance to electromagnetic fields make it a great medium for storing information important to humanity. Many libraries and archives use non-electronic technologies, based on paper or film reels, to preserve information beyond the typical lifespan of magnetic tape (up to 30 years). One of the better-known efforts is the GitHub Archive Program, which archived all open source software on multiple non-electronic media in case humanity is destroyed. DNA can therefore be used in similar digital preservation efforts and can serve as an additional layer of data safety.