One of the great achievements of modern science was the Human Genome Project, which set out to map the sequence of genes in human DNA. The project produced unprecedented insight into the function of genes, their role in human health and the nature of life itself.
And yet the Human Genome Project was just the beginning. Armed with the sequence of genes in DNA, life scientists now want to know how the extraordinarily rich complexity of life emerges from this code.
Closely linked and just as puzzling is how small changes in the genome can lead to the rich tapestry of human life, with its seemingly endless variety of faces, ethnicities and susceptibilities to disease.
Now scientists at more than 120 laboratories in the US and elsewhere have joined forces to search for an answer. The group is called the Impact of Genomic Variation on Function (IGVF) Consortium and its goal is to understand how variations in the genome influence its function and consequently the phenotype of the resulting human.
The project has the potential to revolutionize the way scientists understand life and the role that genes play in disease. “To unlock these insights, we need a systematic and comprehensive catalog of genome function and the molecular and cellular effects of genomic variants,” say the team.
Profound Problem
The scale of the task is huge. The Human Genome Project revealed some 25,000 genes. But just a small subset of these is switched on in any particular tissue at any one time. How this genetic switching works in perfect synchrony is a question of profound importance.
Scientists know that each gene codes for a specific protein. In other words, it is a section of DNA that can be transcribed into RNA and then translated into a protein. These proteins are the building blocks of all cells and of the molecular machinery of life. But transcribing a single gene is no simple task.
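The two steps just described can be sketched in a few lines of Python. This is a toy illustration only: the gene sequence is made up, the codon table holds just the six codons the example needs (the standard genetic code has 64), and transcription is simplified to swapping T for U on the coding strand.

```python
# Toy sketch of the gene -> RNA -> protein flow; not a real bioinformatics tool.

# Partial codon table: only the codons used below, taken from the standard genetic code.
CODON_TABLE = {
    "AUG": "M",   # methionine, the usual start codon
    "UUU": "F",   # phenylalanine
    "GGC": "G",   # glycine
    "AAA": "K",   # lysine
    "GAA": "E",   # glutamate
    "UAA": None,  # stop codon: translation ends here
}


def transcribe(coding_strand: str) -> str:
    """Simplified transcription: the RNA matches the coding strand with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the RNA three letters at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return "".join(protein)


example_gene = "ATGTTTGGCAAAGAATAA"  # hypothetical sequence, for illustration only
mrna = transcribe(example_gene)
protein = translate(mrna)
print(mrna)     # AUGUUUGGCAAAGAAUAA
print(protein)  # MFGKE
```

Real genes are far longer and are processed in ways this sketch ignores, which is part of why the task is not simple.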
Each copy of the genome — almost every cell has its own copy — consists of about 3 billion base pairs lined up in the famous double helix structure. If laid out in a straight line, this strand of DNA would stretch to about 2 meters.
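The length figure can be checked with some quick arithmetic. The sketch below assumes the textbook spacing of roughly 0.34 nanometers between neighboring base pairs, a value that does not come from the article. On that assumption, a single copy of 3 billion base pairs spans about 1 meter, and the 2-meter figure corresponds to the two copies, one from each parent, that most human cells carry.

```python
# Back-of-envelope check of how far the DNA in one cell would stretch.
BASE_PAIRS_PER_COPY = 3_000_000_000  # from the article
NM_PER_BASE_PAIR = 0.34              # assumed textbook spacing, not from the article
COPIES_PER_CELL = 2                  # assumption: a typical cell with two genome copies


def stretched_length_m(base_pairs: int, nm_per_bp: float = NM_PER_BASE_PAIR) -> float:
    """Length in meters of DNA laid out in a straight line."""
    return base_pairs * nm_per_bp * 1e-9  # convert nanometers to meters


one_copy_m = stretched_length_m(BASE_PAIRS_PER_COPY)
per_cell_m = one_copy_m * COPIES_PER_CELL
print(f"one copy: {one_copy_m:.2f} m, per cell: {per_cell_m:.2f} m")
```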
But instead, it is packed tightly inside the cell and must be unpacked to access the genes it carries. This packing and unpacking is highly orchestrated. The DNA strand is first wound onto molecular “cotton reels” called histones, which then weave tightly together to form a “DNA rope” called chromatin. This itself weaves back and forth into shapes called chromosomes.
To access a gene, the chromatin must be unpacked in a way that reveals the precise location of the gene and then packed away again afterwards.
All this is managed by complex networks of molecules working together in synchrony. One of the great discoveries of the Human Genome Project was that DNA does not just code for proteins. It also contains numerous genes that produce RNA strands that do not code for proteins.
This non-coding RNA coordinates the processes of life through a complex network of operations (switching, shepherding, binding and so on) that controls this enormous ballet of molecular construction.
Given this glimpse of the processes of life, scientists now want to know how it all works; that’s essentially the goal of the Impact of Genomic Variation on Function Consortium.
They already know that single changes within the genome can lead to significant differences between individuals, for example in their susceptibility to certain diseases. But it is no easy task to tease apart the role of each single nucleotide variation, not least because many phenotypic features are the result of combinations of many nucleotide variations. Even when scientists are aware of the variations between individuals, their significance is not always clear.
Rate-Limiting Step
And that makes it hard to determine the role of genes in many diseases or how to treat those diseases. “The interpretation of the impact of genomic variation on function is currently a rate-limiting step for delivering on the promise of precision medicine,” say the group.
So the Impact of Genomic Variation on Function project aims to create a map of the predicted effects of every possible single-nucleotide variant on the key aspects of genome function. That will mean working out how coding variants change the shape and function of proteins, how non-coding variants influence gene expression and how together these might influence molecular networks throughout a cell.
Given that the genome has 3 billion nucleotides, there will be no way to experimentally measure the effect of a variant in each position in all cells in every circumstance. The combination of possibilities is mind-bogglingly large.
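A rough count shows why. Each position holds one of four nucleotides, so it can be swapped for any of the other three, which gives about 9 billion possible single-nucleotide variants. The sketch below then multiplies that by a number of cell types and conditions; those two figures are purely hypothetical placeholders chosen for illustration, not estimates from the consortium.

```python
# Rough count of the single-nucleotide variants the map would need to cover.
GENOME_LENGTH = 3_000_000_000  # from the article
NUCLEOTIDES = "ACGT"


def alternative_bases(reference_base: str) -> list[str]:
    """The bases that could replace the reference base at one position."""
    return [base for base in NUCLEOTIDES if base != reference_base]


alternatives_per_position = len(alternative_bases("A"))  # three alternatives
total_variants = GENOME_LENGTH * alternatives_per_position

# Hypothetical placeholders, for illustration only.
HYPOTHETICAL_CELL_TYPES = 400
HYPOTHETICAL_CONDITIONS = 10
measurements_needed = total_variants * HYPOTHETICAL_CELL_TYPES * HYPOTHETICAL_CONDITIONS

print(f"possible single-nucleotide variants: {total_variants:,}")
print(f"measurements under the placeholder assumptions: {measurements_needed:,}")
```

Even before considering combinations of variants, the count runs into the tens of trillions under these placeholder numbers.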
So scientists will attempt to measure the effects of many variants, but computer modeling will need to shoulder the load of predicting the effects of many others. “The amount of data needed to build accurate models of genome function is unknown, and fully realizing the goal of mapping the impact of genomic variation on function will require additional advances in both experimental and computational methods,” say the team.
That’s why the IGVF consortium is so large — the required skills range across the whole of the life sciences sector and beyond into bioinformatics and computer science.
Mapping the impact of genomic variation is an ambitious goal with profound implications for how we understand human health and in particular the role of genetic variation in disease. The results will be worth following for years to come.
Ref: The Impact of Genomic Variation on Function (IGVF) Consortium: arxiv.org/abs/2307.13708