This article appeared in the January/February 2022 issue of Discover magazine as "Finishing the Human Blueprint."
At long last, scientists have declared “mission accomplished” on the complete sequencing of the human genome — one of the most ambitious research undertakings of the past few decades. The news may trigger déjà vu: Scientists with the Human Genome Project first announced they had sequenced the human genome in 2003.
That initial effort came with some notable omissions, though. A sizable chunk of the genome remained inaccessible because the era’s technology could not parse more complex DNA regions. Later work filled in some of the gaps, but around 8 percent of the human genome remained a mystery until 2021, when an international collaboration called the Telomere-to-Telomere (T2T) Consortium completed the sequence.
Many of these tricky regions include long stretches of highly repetitive DNA sequences. Although these sequences often don’t code for proteins (the body’s building blocks), they likely contain important clues to understanding rare genetic diseases, says Karen Miga, a satellite DNA biologist at the University of California, Santa Cruz. These regions might also change what is known about basic human biology, such as cell division.
“We had a pretty darn good first sequence of the human genome,” says Eric Green, director of the National Human Genome Research Institute and a member of the Human Genome Project. But when it came to more complex stretches of the genome, the computers and “the little chemical tricks we do in the test tube, they just choke.”
Initially, scientists relied on a technique called “shotgun sequencing,” which broke long DNA sequences into small, overlapping pieces. Computer algorithms then had to stitch those pieces back together, and they sometimes struggled, especially in repetitive regions where many pieces look alike. Today, more advanced methods let geneticists read stretches of DNA hundreds of thousands of base pairs (the “letters” that compose DNA) long, and occasionally more than a million. Those long reads allowed researchers to “thread through and resolve some of these trickier bits,” says Miga, who helped lead the recent project.
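For readers curious why repeats trip up the stitching step, here is a minimal Python sketch of the idea. It assumes idealized, error-free reads that start at every position of a made-up toy genome, and it merges them with a simple greedy overlap rule. The function names (`shotgun`, `overlap`, `greedy_assemble`) and the toy sequence are hypothetical, and this is not the assembly software the Human Genome Project or the T2T Consortium actually used.

```python
"""Toy greedy overlap assembler: an illustration, not a real assembly tool."""


def shotgun(genome: str, read_len: int) -> list[str]:
    """Idealized, error-free reads: one starting at every position."""
    return [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` matching a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the largest overlap."""
    # Identical reads (e.g. from different copies of a repeat) collapse here.
    pieces = list(dict.fromkeys(reads))
    # Drop any read that sits entirely inside another read.
    pieces = [r for r in pieces if not any(r != o and r in o for o in pieces)]
    while len(pieces) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left overlaps: return separate fragments
        merged = pieces[best_i] + pieces[best_j][best_k:]
        pieces = [p for n, p in enumerate(pieces) if n not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Hypothetical toy genome: unique ends flanking a tandem "CA" repeat.
genome = "AAGT" + "CA" * 4 + "TGGC"
short_contigs = greedy_assemble(shotgun(genome, read_len=4))
long_contigs = greedy_assemble(shotgun(genome, read_len=12))

print("genome:     ", genome)
print("short reads:", short_contigs)
print("long reads: ", long_contigs)
```

With 4-letter reads, every piece from inside the "CA" repeat looks like every other, so the sketch cannot tell how many copies the repeat holds and the assembly falls short of the original sequence. With 12-letter reads, each read covers the entire repeat and the pieces snap back together into the full toy genome, a miniature version of the advantage long reads offered.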
That effort, involving dozens of scientists from around 30 institutions, finalized the human genome sequence in a series of papers posted to bioRxiv, a preprint server, in May 2021. The researchers added nearly 200 million base pairs to the previously published sequence, including 115 genes that likely code for proteins.
The new additions offer a wealth of information for geneticists to comb through. Some genes “probably have new roles that we haven’t even imagined yet for how the cell functions,” Miga says.
In the meantime, there’s work still to be done. For one, the current version of the genome represents a single person. The T2T team, now merged with the Human Pangenome Reference Center at Washington University, is working to add more diverse sequences to their database — so the human genome may contain further surprises.