THE STUDY “Alignment-Based Approach for Durable Data Storage Into Living Organisms” by Nozomu Yachie et al., in the April 9, 2007, issue of Biotechnology Progress.
THE MOTIVE No data-storage method is foolproof. Bacteria, however, have been passing genetic information from one generation to the next for at least 3 billion years, and they will most likely still be reproducing when humans are long gone. If we could encode information in their genomes, it would be preserved and replicated in perpetuity. That’s what molecular biologists in Japan propose, at least. They have backed their argument by inserting Einstein’s famous equation, E = mc², plus its year of publication, 1905, into the DNA of the bacterium Bacillus subtilis. This is more than a clever stunt, the researchers insist. Biotech companies could conceivably patent or copyright genetically modified organisms by encoding a brand name within the newly created creature’s genetic code.
THE METHODS Yoshiaki Ohashi, a molecular biologist at the Institute for Advanced Biosciences at Keio University in Japan, chose to work with hardy Bacillus subtilis, a harmless soil bacterium that forms spores resistant to ultraviolet light, dehydration, oxygen and nutrient starvation, and organic solvents. He and his team selected the relativity equation because it is “one of the most important legacies” of the 20th century, although, he adds, “we are not fanatic admirers of Dr. Einstein.”
The first step was to convert each of the characters in “E = mc² 1905!” into binary code, the standard computer language consisting of zeros and ones. The next step was to insert that information into the bacterium’s own data-storing code, its DNA. The basic units of DNA are four nucleotide bases: adenine (abbreviated as A), cytosine (C), guanine (G), and thymine (T), which are linked by a phosphate-sugar backbone. Using their own encoding method, Ohashi and his collaborators made the two codes compatible by converting each group of four binary digits into a pair of nucleotides. For example, a sequence of four zeros became AA, the series 0001 became CA, 0010 became GA, and so on.
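The conversion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Keio team's actual scheme: the article gives only three of the sixteen four-bit codes, so the rest of the table is extrapolated from the pattern they suggest (the two low bits choose the first base, the two high bits choose the second, in the order A, C, G, T). The use of 8-bit ASCII, the function names, and the stand-in string "E=mc^2 1905!" (the superscript 2 is not an ASCII character) are likewise assumptions.

```python
# Assumed base order: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
# Only 0000 -> AA, 0001 -> CA and 0010 -> GA come from the article;
# every other entry is an extrapolation from that pattern.
BASES = "ACGT"


def nibble_to_pair(nibble: int) -> str:
    """Turn four bits into two bases: low two bits first, high two bits second."""
    low = nibble & 0b11
    high = (nibble >> 2) & 0b11
    return BASES[low] + BASES[high]


def pair_to_nibble(pair: str) -> int:
    """Inverse of nibble_to_pair."""
    low = BASES.index(pair[0])
    high = BASES.index(pair[1])
    return (high << 2) | low


def encode(text: str) -> str:
    """Encode ASCII text as DNA, four bases per character (8-bit ASCII assumed)."""
    dna = []
    for byte in text.encode("ascii"):
        dna.append(nibble_to_pair(byte >> 4))    # upper four bits of the character
        dna.append(nibble_to_pair(byte & 0x0F))  # lower four bits of the character
    return "".join(dna)


def decode(dna: str) -> str:
    """Recover the ASCII text from a sequence produced by encode()."""
    pairs = [dna[i:i + 2] for i in range(0, len(dna), 2)]
    nibbles = [pair_to_nibble(p) for p in pairs]
    data = bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))
    return data.decode("ascii")


message = "E=mc^2 1905!"  # ASCII stand-in for the message in the article
dna = encode(message)
print(dna)
print(decode(dna))
```

Under these assumptions each character costs four bases, so the 12-character stand-in message becomes a 48-base sequence, and decoding it returns the original text.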
Using DNA-splicing enzymes, the Japanese team then inserted the binary-code-carrying sequence into plasmids: circular, double-stranded DNA molecules that float around in bacterial cytoplasm, outside the organism’s main chromosome. As a marker for successful insertion, the researchers also put a gene encoding antibiotic resistance into the plasmid. After mixing the plasmids with cells of B. subtilis, they could easily test whether the plasmid had integrated into the chromosome by checking which cells survived exposure to the antibiotic.
Finally, the team built a computer model to estimate how much natural mutations might alter the encoded message over time. They found that they could, in theory, retrieve 99 percent of the encoded information even if 15 percent of the bases in the DNA region were swapped at random, a process that could take several thousand years.
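The article does not say how the model recovers a message from mutated DNA, so the sketch below swaps in a deliberately simple technique, majority voting across several redundant copies, purely to show why redundancy lets most of a message survive random base swaps. The number of copies, the sequence length, the random seed, and all function names are assumptions, and the code makes no attempt to reproduce the 99 percent figure.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base with a different, randomly chosen base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def majority_vote(copies: list[str]) -> str:
    """Rebuild a sequence by taking the most common base at each position."""
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*copies))


def fraction_matching(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so repeated runs give the same result
original = "".join(rng.choice(BASES) for _ in range(10_000))

# Hypothetical setup: five copies, each mutated independently at a 15 percent rate.
copies = [mutate(original, 0.15, rng) for _ in range(5)]

single_accuracy = fraction_matching(original, copies[0])
voted_accuracy = fraction_matching(original, majority_vote(copies))
print(f"one copy:        {single_accuracy:.3f}")
print(f"five-copy vote:  {voted_accuracy:.3f}")
```

A single mutated copy matches the original at roughly 85 percent of positions, while the voted consensus matches at a noticeably higher fraction, because an error survives the vote only when most copies happen to be wrong at the same position.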
THE MEANING The beauty of DNA lies in its ubiquity: It is a universal code that we share with almost all living species. Ohashi is fond of the idea that intelligent beings yet to evolve could possibly retrieve the equation he has encoded into B. subtilis’s DNA. “If we, like the dinosaurs, are threatened with extinction, how can we leave a message to the future?” he asks. “Unfortunately, present methods using floppy disks and CDs are not common between the new intelligent organisms and humans, but DNA is common. If an extraterrestrial life is based on DNA, we have the possibility of communicating with them.”
The Keio team’s method allows for the integration of large chunks of code into bacterial DNA. At present, they can handle up to 200,000 characters. “According to my rough calculation, the New Testament of the Bible contains approximately 1 million characters in English,” Ohashi says. “Hence, one bacterium can store one-fifth of the New Testament.” Lila Kari, a professor of biocomputing at the University of Western Ontario, adds that “DNA has tremendous information density. To encode the same amount of data that you can encode in five grams of DNA, you would need something like 150 hectares of the absolutely latest IBM hard disk. It’s mind-boggling.”