Digitalization has given us a better lifestyle and new, innovative ways of living. As a result, we now produce huge and unmanageable volumes of digital data, which ultimately creates a storage problem. More data was created in the past two years than in all of preceding history.

That torrent of information may soon outstrip the ability of hard drives to capture it. In 2012, the total information stored worldwide was around 2.7 ZB, a figure expected to grow by 50 percent every year.[1]

IEBS - DNA digital data storage

Research has identified DNA as one of the potential solutions for data storage. Genomic DNA provides a rich medium for storing information in living cells.

It has been reported that 215 petabytes (215 million gigabytes) can be stored in 1 gram of DNA. Information stored in DNA can last for thousands of years, which strengthens the case for “DNA data storage”. As a biological molecule, DNA inherently stores information; the basic challenge has been to retrieve the information stored in DNA as computer-readable data.
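To put the two figures quoted so far side by side, here is a rough worked estimate of how much DNA would hold the 2.7 ZB stored worldwide in 2012 at 215 PB per gram. It assumes decimal SI prefixes (1 ZB = 10^21 bytes, 1 PB = 10^15 bytes) and ignores the redundancy, error-correction overhead, and physical handling that any real system would add, so it is a lower bound rather than a design figure.

$$
m = \frac{2.7\ \text{ZB}}{215\ \text{PB/g}} = \frac{2.7 \times 10^{21}\ \text{bytes}}{2.15 \times 10^{17}\ \text{bytes/g}} \approx 1.26 \times 10^{4}\ \text{g} \approx 12.6\ \text{kg}
$$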

Over recent years, researchers have developed diverse encoding models for reading and writing data onto DNA, error-correcting codes that address errors introduced during writing and reading, and approaches for designing codons and storage formats. Together these give a strong base for using DNA as a future data storage solution.

DNA lies at the center of the central dogma of life. It is also one of the largest and most complex molecules in living systems, but its significance lies in being the carrier of genetic information. DNA is built from four nucleotide bases, adenine (A), guanine (G), cytosine (C), and thymine (T), which pair as A-T and G-C. Because each position can hold one of four bases, a single nucleotide can represent 2 bits of information, and DNA's compact 3D structure gives it a very high storage density.
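The "2 bits per nucleotide" idea can be made concrete with a short Python sketch. It assumes a hypothetical one-to-one mapping (A=00, C=01, G=10, T=11), chosen only for illustration; published encoding schemes differ and add constraints such as avoiding long runs of the same base. Under this mapping every byte becomes exactly four nucleotides.

```python
# Hypothetical 2-bit mapping for illustration only; real schemes differ.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string: 2 bits per nucleotide, 4 nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Invert encode(): map each base back to 2 bits, then regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

Here "H" (binary 01001000) becomes CAGA and "i" (binary 01101001) becomes CGGC, so two bytes occupy eight nucleotides.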

Limitations of DNA data storage that researchers must focus on

Error Handling:

A common practice when writing software is exception handling, which manages erroneous statements and runtime errors so that the code can be validated before a product is released to the market. In the same manner, the two strands of DNA are linked to each other by hydrogen bonds in a strict combination: only complementary nucleotides pair, i.e. A=T and G≡C. Any other pairing results in impaired DNA. When DNA is copied or transcribed into RNA, one strand is used as the source of data and the other serves as a control sequence, which makes it easier to correct impaired nucleotides. Researchers have also devised in vitro methods for correcting such sequences in case the cell's own repair system fails.[2]
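The "source strand plus control strand" idea can be sketched in Python as a simple consistency check. This is a software analogy, not a model of the biochemistry: it ignores the antiparallel orientation of real strands, assumes both strands have the same length, and assumes the source strand is the trusted one, since a single mismatch on its own cannot tell you which strand is damaged. The function names are hypothetical.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Base-by-base complement of a strand (orientation ignored for simplicity)."""
    return "".join(COMPLEMENT[base] for base in strand)


def find_mismatches(source: str, control: str) -> list[int]:
    """Positions where the control strand does not complement the source strand."""
    if len(source) != len(control):
        raise ValueError("strands must have the same length")
    return [i for i, (s, c) in enumerate(zip(source, control)) if COMPLEMENT[s] != c]


def repair(source: str, control: str) -> str:
    """Rewrite mismatched positions of the control strand, assuming the source is correct."""
    fixed = list(control)
    for i in find_mismatches(source, control):
        fixed[i] = COMPLEMENT[source[i]]
    return "".join(fixed)


if __name__ == "__main__":
    source = "ATGC"
    damaged = "TAGG"                         # position 2 should be C, not G
    print(find_mismatches(source, damaged))  # [2]
    print(repair(source, damaged))           # TACG
```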


Because DNA is part of biological systems, it is vulnerable to viruses, which invade every type of organism, from bacteria to complex human systems. Viruses are masters of data corruption, so they clearly threaten the integrity of digital information stored in DNA. It would be both fascinating and alarming to see a virus interpret the digital information encoded in DNA and insert its own malicious code. That scenario is still a long way off, but it raises the question of how to secure stored data against existing viruses.

Lack of Search functionality

Unlike today's digital storage media, DNA data storage lacks search functionality, and reading data back is slow, which limits its usefulness as a storage platform.[3]


One major shortcoming of write-once media, such as recordable CDs, is that once data is stored in a region of memory, that region cannot be reused. DNA data storage shares this limitation: stored information cannot be updated, and the space cannot be reused.

Interpretation of data

Unlike current storage media, DNA data storage does not allow random access to information: to access any single piece of information, the entire stored data must be decoded.

DNA is one of the largest molecules known: the haploid human genome contains about 3 billion base pairs. So how is it possible to read such a huge amount of data? Scientists developed DNA sequencing, in which DNA is cut into shorter segments that can be analyzed in parallel. Even so, the complete DNA strand still needs to be decoded.
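The two points above, parallel processing of short segments and the lack of random access, can be illustrated together with a small Python sketch. It reuses a hypothetical 2-bit mapping (A=00, C=01, G=10, T=11), cuts an encoded strand into fixed-length segments, decodes them concurrently, and reassembles them in order. Note that the lookup still has to decode the whole strand before it can search. Threads stand in for parallel sequencing reads only as an analogy; this does not model sequencing chemistry, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical 2-bit mapping, used only for illustration.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BASES_PER_BYTE = 4  # 8 bits per byte / 2 bits per nucleotide


def decode_segment(segment: str) -> bytes:
    """Decode one segment of bases back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in segment)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split(strand: str, segment_len: int) -> list[str]:
    """Cut the strand into shorter segments that can be processed independently."""
    if segment_len <= 0 or segment_len % BASES_PER_BYTE:
        raise ValueError("segment_len must be a positive multiple of 4 bases")
    return [strand[i:i + segment_len] for i in range(0, len(strand), segment_len)]


def decode_parallel(strand: str, segment_len: int = 16) -> bytes:
    """Decode all segments concurrently, then reassemble them in order."""
    if len(strand) % BASES_PER_BYTE:
        raise ValueError("strand length must be a multiple of 4 bases")
    segments = split(strand, segment_len)
    with ThreadPoolExecutor() as pool:
        # Executor.map yields results in input order, so the bytes join back correctly.
        return b"".join(pool.map(decode_segment, segments))


def contains(strand: str, key: bytes) -> bool:
    """No random access: the whole strand is decoded before it can be searched."""
    return key in decode_parallel(strand)


if __name__ == "__main__":
    strand = "CAGACGGC"  # b"Hi" under the mapping above
    print(decode_parallel(strand, segment_len=4))  # b'Hi'
    print(contains(strand, b"i"))                 # True
```

In CPython, threads give little real speedup for CPU-bound work like this; they are used here only to show the split, process, and reassemble structure.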

[1] H. A. Hakami, Z. Chaczko, and A. Kale, “Review of big data storage based on DNA computing,” in Proceedings of the Asia-Pacific Conference on Computer-Aided System Engineering (APCASE ’15), pp. 113–117, Quito, Ecuador, July 2015




Author: Dr. Neeraj Maurya & Mr. Ravi Shanker