The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings with the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.
Humans are diploid organisms. We have two copies of each gene, one inherited from each parent (the exception here is for males, who have only one X chromosome, inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild as a mosaic of the genes of the four grandparents. But the relationship between grandparent and grandchild is not deterministic at any given locus. Rather, it is defined by a probability. To give a concrete example, consider an individual who has four grandparents, three of whom are Chinese, one of whom is Swedish. Imagine that the Swedish individual has blue eyes. One can reasonably assume then that on the locus which controls the blue vs. non-blue eye color difference one of the grandparents is homozygous for the “blue eye” allele, while the other grandparents are homozygous for the “brown eye” allele. What is the probability that any given grandchild will carry a “blue eye” allele, and so be a heterozygote? Each individual has two “slots” at a given locus. We know that in one of those slots the individual can only have a brown eye allele, inherited from the parent with two Chinese parents. Variation is possible only in the other slot, inherited from the parent whom we know is a heterozygote. That parent in their turn may contribute to their offspring a blue eye allele, or a brown eye allele. So there is a 50% probability that any given grandchild will be a heterozygote, and a 50% probability that they will be a homozygote for the brown eye allele.
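The 50% figure can be checked by brute-force enumeration. Below is a minimal sketch, assuming a single locus with two alleles; the labels "b" (blue) and "B" (brown) and the helper name offspring_genotypes are my own illustrative choices, not standard notation.

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent_a, parent_b):
    """Map each possible offspring genotype to its probability."""
    outcomes = {}
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sort so that ("b", "B") and ("B", "b") count as the same genotype.
        genotype = "".join(sorted((allele_a, allele_b)))
        outcomes[genotype] = outcomes.get(genotype, Fraction(0)) + Fraction(1, 4)
    return outcomes

# Hypothetical labels: "b" = blue eye allele, "B" = brown eye allele.
swedish_grandparent = ("b", "b")
chinese_grandparent = ("B", "B")

# Generation 1: the two parents of the grandchild.
mixed_parent = offspring_genotypes(swedish_grandparent, chinese_grandparent)
other_parent = offspring_genotypes(chinese_grandparent, chinese_grandparent)

# Both parental genotypes are certain here, so take the single outcome of each.
(mixed_genotype,) = mixed_parent
(other_genotype,) = other_parent

# Generation 2: the grandchild.
grandchild = offspring_genotypes(tuple(mixed_genotype), tuple(other_genotype))
for genotype, probability in sorted(grandchild.items()):
    print(genotype, probability)
```

The heterozygote ("Bb") and the brown homozygote ("BB") each come out at 1/2, and every sibling is a fresh draw from that same distribution.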
The above “toy” example on one locus is to illustrate that the variation that one sees among individuals is in part due to the fact that we are not a “blend” of our ancestors, but a combination of various discrete genetic elements which are recombined and synthesized from generation to generation. Each sibling then can be conceptualized as a different “experiment” or “trial,” and their differences are a function of the fact that they are distinctive and unique combinations of their ancestors’ genetic variants. That is the most general theory, without any direct reference to proximate biophysical details of inheritance. Pure Mendelian abstraction as a formal model tells us that reproductive events are discrete sampling processes. But we live in the genomic age, and as you can see above we can measure the variation in genetic relationships among siblings today in an empirical sense. The expectation, unsurprisingly, is 0.50, but there is variance around that expectation. It is not likely that all of your siblings are “created equal” in reference to their coefficient of genetic relationship to you.
We now know that the human genome consists of ~3 billion base pairs of A, G, C, and T. In the oldest classical evolutionary genetic models each of these base pairs can be conceived of as being inherited independently of the others. In other words, evolution is a game of independent probabilities. But this idealization is not the concrete reality. To the left is a visualization of a human male karyotype, the set of 23 chromosomal pairs in which the human genome (excluding the mtDNA) manifests. Because the ~3 billion aforementioned base pairs have a physical position within these chromosomes the reality is that some are inherited together. That is, their inheritance patterns are associated due to their physical linkage. The karyotype you see is clearly diploid. Each chromosome comes as a pair of matching homologs, one inherited from each parent (the exception is pair 23, the sex chromosomes, which in a male do not match). The chromosomal numbers also correspond roughly to a rank order of size. To give you a sense of the gap, chromosome 1 has 250,000,000 bases and 4,200 genes, while chromosome 22 has 50,000,000 bases and 1,100 genes (the Y chromosome has a paltry 450 genes, as opposed to the 1,800 on the X).
In the toy example above the eye color locus is on a chromosome. Specifically, chromosome 15. Each individual will inherit one copy of 15 from each of their parents. But there is no guarantee that each sibling will inherit the same copy from the generation of the grandparents. Let’s illustrate this schematically. Below you see the four combinations possible in relation to the chromosomes inherited by an individual’s parents from their own parents. “Paternal” and “maternal” here are in reference to the parental generation, so there are two of each. In each cell the copy passed on by the father is listed first, and the copy passed on by the mother second.

Possible outcomes of combinations from grandparents

                    Mother: Paternal       Mother: Maternal
Father: Paternal    Paternal + Paternal    Paternal + Maternal
Father: Maternal    Maternal + Paternal    Maternal + Maternal
The outcomes are as follows:

Top-left cell: paternal grandfather’s chromosome + maternal grandfather’s chromosome
Top-right cell: paternal grandfather’s chromosome + maternal grandmother’s chromosome
Bottom-left cell: paternal grandmother’s chromosome + maternal grandfather’s chromosome
Bottom-right cell: paternal grandmother’s chromosome + maternal grandmother’s chromosome
As an example, if on chromosome 15 two siblings were both characterized by the top-left cell, we might say that they were 100% “identical-by-descent” (IBD). This just means that their genes came down from the exact same ancestors. On the other hand, if one sibling was characterized by the top-left cell, and another by the bottom-right, then they would be 0% IBD! In other words, in theory with this model siblings could be 0% IBD on the autosomal chromosomes if they kept inheriting different homologs from their grandparents, chromosome by chromosome. (This would not be possible for chromosome 23. Brothers by necessity inherit the same Y from their father, while sisters must share the same X from their father.)
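To see how often each outcome turns up under this no-recombination model, one can enumerate every pairing of cells for two siblings on a single chromosome. This is a minimal sketch; the labels PGF, PGM, MGF and MGM (paternal grandfather, and so on) are shorthand I am introducing for the four grandparental homologs.

```python
from fractions import Fraction
from itertools import product

# Each child receives one of the father's two homologs and one of the
# mother's two. Labels are illustrative shorthand for the grandparents.
father_homologs = ("PGF", "PGM")
mother_homologs = ("MGF", "MGM")
child_options = list(product(father_homologs, mother_homologs))  # the four cells

def ibd_fraction(sib_a, sib_b):
    """Fraction of the two homologs that two siblings share by descent."""
    shared = sum(1 for x, y in zip(sib_a, sib_b) if x == y)
    return Fraction(shared, 2)

# All 4 x 4 equally likely pairings of cells for two siblings.
distribution = {}
for sib_a, sib_b in product(child_options, repeat=2):
    share = ibd_fraction(sib_a, sib_b)
    distribution[share] = distribution.get(share, Fraction(0)) + Fraction(1, 16)

mean = sum(share * p for share, p in distribution.items())

for share in sorted(distribution):
    print(f"IBD {share}: probability {distribution[share]}")
print("expected IBD:", mean)
```

On any one intact chromosome the siblings are 0% IBD a quarter of the time, 50% IBD half of the time, and 100% IBD a quarter of the time, which averages out to the familiar 1/2.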
If you have a background in biology, you know this is wrong, because there’s more to the story. Recombination means that in fact you don’t invariably inherit intact copies of your grandparents’ chromosomes. Rather, during meiosis, an individual’s chromosomes often “mix & match” their strands so that new mosaics are formed. So instead of inheriting homologous chromosomes which resemble exactly those carried by their grandparents, individuals often have chromosomes which are a mosaic of maternal and paternal due to the two meiosis events which intervened (one in the grandparents, during the formation of the gametes which led to one’s parents, and another in the parents, during the formation of the gametes which led to oneself). If you are still confused, the following 3 minute instructional video may help. The narration has information, so if you can’t listen: blue = paternal chromosomal segments, and red = maternal chromosomal segments. Focus especially on recombination, about halfway through the video.
This process works against the conditional dependence of inheritance of variants due to physical linkage on the same chromosomal regions. In other words, though without recombination it would be theoretically possible for siblings to be very different, realistically recombination breaks apart many of the associations and reduces the realized variance. In the figure above the low-bound outliers among sibling pairs are about midway between the coefficient of relatedness of half-siblings (0.25) and that of full-siblings (0.50), at ~0.35 or so (the high bounds are ~0.65).
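The variance-reducing effect of recombination can be illustrated with a small Monte Carlo sketch. It rests on simplifying assumptions of my own, not on the paper's method: crossovers fall as a uniform Poisson process with one expected crossover per Morgan, there is no interference, and sharing is measured at evenly spaced points. The function names are hypothetical.

```python
import random
import statistics

def gamete(length_morgans, n_points, rng):
    """Grandparental origin (0 or 1) at evenly spaced points along one gamete."""
    # Crossover positions: exponential gaps, one expected crossover per Morgan.
    crossovers = []
    pos = rng.expovariate(1.0)
    while pos < length_morgans:
        crossovers.append(pos)
        pos += rng.expovariate(1.0)
    origin = rng.randint(0, 1)  # which grandparental homolog the gamete starts on
    states = []
    next_idx = 0
    for i in range(n_points):
        x = (i + 0.5) * length_morgans / n_points
        # Switch homolog each time a crossover is passed.
        while next_idx < len(crossovers) and crossovers[next_idx] < x:
            origin = 1 - origin
            next_idx += 1
        states.append(origin)
    return states

def sibling_sharing(length_morgans, rng, n_points=200):
    """Coefficient of relatedness of two full siblings on one chromosome."""
    matches = 0
    for _parent in range(2):  # one pair of gametes from each parent
        a = gamete(length_morgans, n_points, rng)
        b = gamete(length_morgans, n_points, rng)
        matches += sum(1 for x, y in zip(a, b) if x == y)
    return matches / (2 * n_points)

def sharing_stats(length_morgans, n_pairs=2000, seed=0):
    """Mean and standard deviation of sharing across simulated sibling pairs."""
    rng = random.Random(seed)
    values = [sibling_sharing(length_morgans, rng) for _ in range(n_pairs)]
    return statistics.mean(values), statistics.pstdev(values)

if __name__ == "__main__":
    for length in (0.0, 0.5, 1.0, 2.0):
        mean, sd = sharing_stats(length)
        print(f"{length:.1f} Morgans: mean {mean:.3f}, SD {sd:.3f}")
```

A length of 0.0 Morgans stands in for a chromosome that never recombines. The mean stays near 0.50 throughout, while the spread around it shrinks as the genetic length grows.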
At any given locus the variance of IBD for siblings is 1/8. Since the expectation is ~0.50, you can infer from this that on a specific gene there’s a lot of deviation across a cohort of siblings. This makes sense when you consider that siblings differ a great deal on single gene Mendelian traits. But what about the whole genome? Because now you have many more “draws,” the “law of large numbers” tends to reduce the variance. The figure to the right shows the standard deviation of IBD by chromosome. Remember that the expectation is ~0.50. Observe that longer chromosomes have lower deviations. This is because longer chromosomes undergo more recombination events, and so behave like a larger number of independent draws; rates of recombination also vary across the genome. We’ve come a long way from an abstract Mendelian model, to the point where one can integrate an understanding of differences in rates of recombination across regions of the genome into the model. The total genome standard deviation of IBD turns out to be 0.036, which is close to older theoretical models which predicted ~0.04. This means that if you randomly drew two full-siblings and measured their total genome IBD, it would typically differ from 0.50 by about 0.036. Assuming a normal distribution, about 68% of sibling pairs would fall within one standard deviation, between 0.464 and 0.536 in coefficient of relatedness. About 95% would fall within two standard deviations, between 0.428 and 0.572. About 99.7% would fall within three standard deviations, between 0.392 and 0.608.
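The arithmetic behind these numbers is short enough to check directly. The sketch below assumes the three-outcome sharing distribution at a single locus (0, 1/2 or 1 with probabilities 1/4, 1/2, 1/4) and a normal distribution for whole-genome sharing. The last quantity, effective_segments, is only my own back-of-envelope reading and is not a figure from the paper.

```python
import math

# Per-locus sharing between full siblings: 0, 1/2 or 1 of the two alleles
# IBD, with probabilities 1/4, 1/2 and 1/4.
shares = (0.0, 0.5, 1.0)
probs = (0.25, 0.5, 0.25)
mean = sum(s * p for s, p in zip(shares, probs))
per_locus_variance = sum(p * (s - mean) ** 2 for s, p in zip(shares, probs))

GENOME_SD = 0.036  # whole-genome standard deviation quoted in the text

def interval(k):
    """Range within k standard deviations of the expectation."""
    return (mean - k * GENOME_SD, mean + k * GENOME_SD)

def normal_coverage(k):
    """Probability mass within k standard deviations of a normal mean."""
    return math.erf(k / math.sqrt(2))

print("per-locus variance:", per_locus_variance)
for k in (1, 2, 3):
    low, high = interval(k)
    print(f"{k} SD: {low:.3f} to {high:.3f}, coverage {normal_coverage(k):.4f}")

# Back-of-envelope only: the number of independent, equally sized segments
# that would produce the quoted SD if each had the per-locus variance.
effective_segments = per_locus_variance / GENOME_SD ** 2
print("effective independent segments:", round(effective_segments))
```

Under that crude reading the genome behaves as if it were passed on in something like a hundred independent chunks, which is why whole-genome sharing is so much tighter than sharing at any single locus.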
The paper from which I’m drawing the figures and statistics is “Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings.” The citations, as well as the follow-up papers, are very interesting. It shows how modern genomics is literally swallowing whole the insights of classical quantitative genetics. Nature is one, and abstractions ultimately map onto the concrete. I’d long thought I should review this paper and its insights, as comparisons across siblings are likely going to be a future avenue of understanding the genetic basis of many traits. But I have a more personal reason for looking into this issue.
This week many of my family members came “online” to the 23andMe system. To review:
RF = Father
RM = Mother
RS1 = Sibling 1 (female)
RS2 = Sibling 2 (male)
Later to come will be RS3, another male. But his data has not loaded….
23andMe has many features related to disease risk and ancestry information. The former was not of great interest to me, as my family is large enough that I had a good sense of what we were at risk for. 23andMe told me that I was at more risk for various ailments which are common across my extended pedigree. It also told me I was at more risk for ailments which are not known in my family. And, it told me I was at less risk for ailments common across my extended pedigree. Finally, it told me I was at less risk for ailments not common across my pedigree. You get the picture. For most people there isn’t much value-add here. I haven’t even touched the issue of “odds ratios”.
In regards to ancestry, I have received some value. I suspect I’m near the end of the line in this area, unless I get into some serious DIY genetics. My involvement in the Harappa Ancestry Project is more about understanding regional patterns of variation than that of my own family.
So we’re at the next stage: looking at patterns in my own family. The screenshot you see above is from the ‘family inheritance’ feature, and shows the IBD between RS2 and RF, my male sibling and my father, chromosome by chromosome. As you can see they are “half-identical” across the whole genome, as they should be. Of each gene on the autosomes my father contributes one copy. There’s no variance here. The total 2.86 GB value is also what you’d expect: there are ~3 billion base pairs, and you’re excluding the X and Y, as well as “no calls.” I can tell you that I exhibit the exact same relationship to my father as my brother. In contrast, my sister has more segments shared. That’s because she has an X chromosome from my father. The relationship to our mother is also as expected. We’re all equally related to our parents, once you account for sex differences on chromosome 23.
Below are the screenshots from family inheritance comparing the three siblings in terms of our genomes. Remember that half-identical (light blue) has half the weight as full-identical (dark blue).
Here’s the top-line. I share about the same length of segments that are half-identical to both RS1 and RS2, 2.26 and 2.27 GB. But, while I have 0.60 full-identical with RS1, I have 0.86 full-identical with RS2. And here’s the even more surprising part: RS1 and RS2 have much less in common than I do with either of them. 2.09 GB half-identical, and 0.5 full-identical.
But that’s not all. 23andMe has a “relative finder” feature. Its main goal is to find relatives you don’t know about. I don’t have any distant relatives matched so far, in contrast to most others from what I have heard. It may be that most of the Bangladeshis in the database are from my own immediate family! (Though there are some Indian Bengalis, I’ve found only one other Bangladeshi in the database to “share” genes with.) You can though include your own family in the mix. You get two different values, % of DNA shared, and # of shared segments. The former basically seems to be a proxy for IBD. I have a person of European American ancestry on my account, and they have many “relatives” matched with whom they share 0.1-1% of their genome. One individual who asked for a contact did turn out to be a very distant cousin (his surname was the same as that of a grandparent). In any case, the matrix above shows the results so far for my family. My parents are not related; they share no segments or DNA IBD. In contrast, we are all about ~50% IBD with our parents (remember that the father contributes no X chromosome to sons). But look at the sibling comparisons. In particular, RS1 & RS2 share only 42% of their DNA! This aligns with the earlier results. RS1 and I are a bit closer than expectation. RS2 and I are a bit more distinct. Interestingly, while RS2 and I have 49 segments in common, RS1 and RS2 have 55 in common. Why the discrepancy? Presumably RS1 and RS2 load up on the number of segments on smaller chromosomes. This seems clear in the images above.
Where does this leave us? We know intuitively that siblings differ, and cluster, in their traits. These data and methods illustrate how in the near future parents may be able to determine which siblings cluster at the level of total genome content! As I have stated before, RS2 and I in particular resemble each other physically, far more than either of us resembles RS1. Could this relate to what we’ve found genomically? I believe so. Physical appearance is controlled by many different variants across many different genes, so the phenotype may be a good reflection of the character of the total genome. This can be generalized to other quantitative traits.
Finally, this has clear implications for our study of genetic inheritance within families. Classical genetic techniques had to assume that the coefficient of relatedness between siblings was 0.50. The deviation from this expectation would have introduced errors into estimates of heritability and possibly masked the understanding of the genetic architecture of a trait. But now we can correct for deviations from the 0.50 value, and so better understand the genetic basis of complex traits such as behavior.
Citation: Visscher, P., Medland, S., Ferreira, M., Morley, K., Zhu, G., Cornes, B., Montgomery, G., & Martin, N. (2006). Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings PLoS Genetics, 2 (3) DOI: 10.1371/journal.pgen.0020041