Linkage Disequilibrium: Concept, Utility and Evolutionary Dynamics

in the Context of the Human Genome Variation

Ranajit Chakraborty, Ph.D.

Allan King Professor

Human Genetics Center

The University of Texas School of Public Health

P.O. Box 20334, Houston, Texas 77225

rc@hgc9.sph.uth.tmc.edu

Abstract

Linkage disequilibrium (LD), better called gametic phase disequilibrium, is a concept that describes the association of alleles across two or more loci. Since allelic states are discrete characteristics of genetic variation at individual sites of the genome, any statistical measure of association for multidimensional categorical data can serve to define LD. In population genetics, however, some specific measures of LD gained popularity because of the purposes for which they are used. Historically, LD was first used to measure the genetic proximity of loci (i.e., the stronger the LD, the closer the loci are expected to be on a chromosome). However, it was soon discovered that recombination during meiosis affects the dynamics of LD, erasing the signature of linkage over time. Since most measures of LD depend on gene frequencies, other factors were also shown to affect LD. These include genetic substructure within a population, natural selection, and genetic drift due to the finite size of a population. Traditionally, LD was defined for scenarios in which each of the two loci has only two alleles present in a population. When molecular techniques were developed that could detect multiple segregating alleles at each site, the need arose for measures of LD that could encompass more than two segregating alleles per locus. As a consequence, the role of mutation (model as well as rate) in the dynamics of LD became a subject of intensive investigation. Since the strength of LD is dictated by the combined effect of all these factors, it is virtually impossible to isolate the principal cause of LD in any observed data without a carefully conducted genetic breeding experiment.
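As an illustration of the two-allele case described above, the following minimal sketch computes three commonly used pairwise measures: the coefficient D, Lewontin's normalized D', and the squared correlation r^2. The abstract does not name specific measures, so this choice is an assumption, as are the function and parameter names. The second function shows the textbook decay of D under recombination alone, which assumes random mating, an effectively infinite population, and no selection, mutation, or substructure; these are the very simplifications the abstract cautions about.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    All names are illustrative, not taken from the source.
    """
    # D: departure of the haplotype frequency from its expectation
    # under independent association of alleles.
    d = p_ab - p_a * p_b

    # D_max: largest |D| possible given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between allelic states at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def ld_after_generations(d0, c, t):
    """Expected D after t generations with recombination fraction c.

    Idealized model only: D_t = (1 - c)^t * D_0.
    """
    return d0 * (1 - c) ** t


if __name__ == "__main__":
    # Hypothetical frequencies chosen purely for illustration.
    d, d_prime, r2 = ld_measures(p_ab=0.5, p_a=0.5, p_b=0.5)
    print(d, d_prime, r2)
    print(ld_after_generations(d, c=0.5, t=2))
```

Because all three measures depend on allele frequencies, the same value of D can correspond to very different D' and r^2 values in populations with different frequencies, which is one reason a single measure rarely identifies the cause of an observed association.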

In the context of human genome studies, the resurgence of interest in LD arose mainly from the need for fine mapping of genes, when direct evidence of genetic recombination between closely spaced sites became difficult to find in family data. Since past recombination events dictate the strength of LD in an extant population, LD served as the population genetic rationale for positional cloning of genes. The localization and eventual cloning of the Cystic Fibrosis gene is a classic example of the success of the LD approach to gene mapping. More recent studies indicate that the LD approach to gene mapping may also be a fruitful method for uncovering genes underlying complex phenotypes, particularly when populations of known admixture history are used in the investigation.

It is expected that, with the completion of the Human Genome Project, we will soon have detailed information on the physical locations of polymorphic sites (such as Single Nucleotide Polymorphism (SNP) sites, microsatellites, insertion/deletion sites, etc.) evolving under different mutation mechanisms. This will offer a valuable resource for examining the rate and variability of recombination in different regions of the genome, and how natural selection shapes the LD between loci, with the effects of population substructure and demographic history incorporated in the analysis. Such data should also make it possible to avoid the overly simplistic analytical models of the dynamics of LD that have been used in past and current studies. (Research supported by US Public Health Service research grants GM 41399, GM 52601, and GM 58545 from the US National Institutes of Health.)