Talk:Linkage disequilibrium


I removed the statement "and their genetic distance", since it is not relevant to LD. Lower linkage disequilibrium is expected in data sets the farther apart the loci are. However, LD reflects covariance between alleles, thereby comparing joint occurrence with a product distribution, irrespective of distance. -sboehringer

I reverted the wording to "non-random association" as this is the standard textbook definition. See Hartl & Clark Principles of Population Genetics 3rd edition 1997. Correlation may imply a particular measure of non-random association, whereas there are many measures of LD. --Lexor|Talk 05:01, 25 January 2006 (UTC)

Isn't there some way to reformat the 2x2 table of haplotype frequencies? The way it is, the entries on the rows abut, so that D is immediately followed by x_21 when in fact they are in different expressions. I tried some ways of putting blanks in between but none worked. Felsenst 23:40, 29 September 2006 (UTC)

I added a section on analysis software and added links to Haploview and PyPop. Jrandall 22:02, 24 October 2006 (UTC)

To Do

-Begin article with a definition of linkage disequilibrium, instead of stating what it is used for. (For instance, if I wrote an entry on bears, I would start off by saying "Massive plantigrade carnivorous or omnivorous mammals with long shaggy coats and strong claws." Rather than "Never feed hungry bears, because then they will start chasing you")

-Someone needs to add info about multiplicative and epistatic fitnesses for the two locus model -- which explains how selection can only produce LD at polymorphic equilibrium when haplotypes have epistatic fitness.

-Start off article with a small section at the top on Linkage EQUILibrium, this should give good contrast to then explain DISequilibrium.

-Add text on 'selective sweeps'; it's a related topic to that of LD.

-Add example of LD, use HLA gene example.

-Text on how LD and 'selective sweep' can be used to find genes that cause drug resistance.

-Text on how LD can be selectively advantageous, neutral or disadvantageous.

--Mike Spenard 07:13, 1 January 2007 (UTC)

I've removed the statement that LD is generated by epistasis as this is untrue. LD arises as a consequence of mutations being syntenic to each other. When a mutation arises it is then in complete LD with other polymorphisms on that chromosome. This LD is broken down over time by recombination. It may be true that epistasis may selectively favor certain combinations of trans-acting genes and this would affect the fitness of certain haplotypes, but epistasis does not cause LD.

I also feel that using HLA as an example for LD is disingenuous. It's a highly polymorphic region with a relatively high number of recombination hot-spots. Starting off with bi-allelic loci (i.e. SNPs) would be a better demonstration of the principles of LD, rather than complicating it with multi-allelic loci. Given the HapMap has data on this in four populations, this would serve as a good start. It would also serve to demonstrate how population sub-structure affects the range of LD (viz. range of LD in populations of different ages, and the effect of population bottlenecks).

With regards to epistasis there is a formal genetic definition, but how to actually detect epistatic effects is still problematic, thus introducing the concept within a discussion of LD would I feel complicate the article unnecessarily. See Cordell H.J. (2002) Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum. Mol. Genet. 11(20): 2463-2468 for a discussion of this. There is after all an article on epistasis, so this should be pointed to and epistasis between syntenic polymorphisms discussed there.

Slack---line 12:05, 2 August 2007 (UTC)

Diploids

The statement that

When extending these formula for diploid cells rather than investigating the gametes/haplotypes directly, the laid out principle prevails, the recombination rate between the two loci A and B must be taken into account, though, which is commonly denoted by the letter c.

seems wrong to me. Aside from whether "the laid out principle" is good English, the estimate of D can be made without knowing anything about the recombination fraction between A and B (for example, by the EM method of Hill, 1974). Felsenst 01:34, 1 January 2007 (UTC)

Agreed, 'the laid out principle' needs to go. Also, recombination rate is usually denoted by r (in my experience); I think the article should use that, since r makes symbolic sense more easily. As for D, perhaps D' was meant (as in the value of D for the next generation), D'=(1-r)D ??? But you are correct, D can be calculated without r. D is the deviation from 1 of the summation of all possible haplotype frequencies. r decays D, which is why I think the above article statement should be changed to apply to D'. [haplotypes A1B1=a, A1B2=b, A2B1=c, A2B2=d] ... D=(ad-bc), right? Could you toss the EM method into this thread? --Mike Spenard 07:03, 1 January 2007 (UTC)
I think you must have meant something else when you said "the [sum] of all possible haplotype frequencies", since the frequencies of everything always add to 1. D is the frequency of a particular (two-locus) haplotype (say A1B1), minus the expected frequency of that haplotype if the alleles at the two loci are distributed randomly, which is p(A1)p(B1). Felsenst 23:43, 4 September 2007 (UTC)
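
Since the EM method was asked for above, here is a minimal sketch of a generic EM estimator for two biallelic loci. It is written in the spirit of the approach attributed to Hill (1974) in this thread, but it is not claimed to reproduce that paper's exact formulation. The function names, the 3x3 genotype-count layout, the uniform starting point and the random-mating assumption are all choices made for this illustration. Note that the recombination fraction appears nowhere in it.

```python
def em_haplotype_freqs(n, max_iter=500, tol=1e-12):
    """Estimate (f_AB, f_Ab, f_aB, f_ab) from a 3x3 table of genotype counts.

    n[i][j] is the number of individuals carrying i copies of allele A
    and j copies of allele B (i, j in 0..2). Random mating is assumed.
    """
    total = 2 * sum(sum(row) for row in n)  # number of haplotypes sampled
    # Haplotype counts from genotypes whose phase is unambiguous.
    known = [
        2 * n[2][2] + n[2][1] + n[1][2],  # AB
        2 * n[2][0] + n[2][1] + n[1][0],  # Ab
        2 * n[0][2] + n[1][2] + n[0][1],  # aB
        2 * n[0][0] + n[1][0] + n[0][1],  # ab
    ]
    double_het = n[1][1]  # AB/ab or Ab/aB: the only phase-ambiguous class
    f = [0.25, 0.25, 0.25, 0.25]  # illustrative starting point
    for _ in range(max_iter):
        cis, trans = f[0] * f[3], f[1] * f[2]
        # E-step: expected share of double heterozygotes that are AB/ab.
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-count haplotypes with the ambiguous class split by w.
        new = [
            (known[0] + w * double_het) / total,
            (known[1] + (1 - w) * double_het) / total,
            (known[2] + (1 - w) * double_het) / total,
            (known[3] + w * double_het) / total,
        ]
        converged = max(abs(a - b) for a, b in zip(new, f)) < tol
        f = new
        if converged:
            break
    return tuple(f)


def d_from_freqs(f):
    """D = f(AB) - p(A) p(B), using the haplotype frequencies from above."""
    f_AB, f_Ab, f_aB, _ = f
    return f_AB - (f_AB + f_Ab) * (f_AB + f_aB)


if __name__ == "__main__":
    counts = [[10, 0, 0], [0, 20, 0], [0, 0, 10]]  # made-up example data
    print(d_from_freqs(em_haplotype_freqs(counts)))
```

EM of this kind can converge to a local maximum, so trying more than one starting point is prudent with real data.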

What's with δ?

I think there is a problem with the section on δ. As stated, the formula for it is backwards -- it ought to be h12 - p1 p2, not the other way around. Furthermore there is the puzzling issue of why we have the two measures D and δ when they will be exactly the same number. Can someone enlighten me about this? (I have written a number of papers on linkage disequilibrium starting in 1965 so I believe that I know what I am talking about, but maybe I've missed something). Felsenst 05:38, 29 August 2007 (UTC)

Let me answer my own question. p1 p2 - h12 is not correct. But I had failed to notice that it was h12. However it is more conventional to express it in terms of h11, p1, and p2, whereupon it would be h11 - p1 p2. To express it in terms of h12 the correct formula would be p1(1-p2) - h12. But all this notation is not clear because h11 is the frequency of haplotype A1B1, but p2 is described as the marginal frequency at the B locus, so that it is unclear whether it is for the B1 or the B2 allele. Felsenst 22:35, 26 September 2007 (UTC)
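
A quick numerical check of the expressions discussed in this section may help. The sketch below assumes four haplotype frequencies h11, h12, h21, h22 that sum to 1, and, to sidestep the ambiguity about p2, it uses p1 for the frequency of A1 and q1 for the frequency of B1. The name q1 and the function name are introduced here only for illustration.

```python
def d_three_ways(h11, h12, h21, h22):
    """Return D computed by three algebraically equivalent expressions."""
    p1 = h11 + h12  # marginal frequency of allele A1
    q1 = h11 + h21  # marginal frequency of allele B1
    d_from_h11 = h11 - p1 * q1          # observed minus expected for A1B1
    d_from_h12 = p1 * (1 - q1) - h12    # expected minus observed for A1B2
    d_cross = h11 * h22 - h12 * h21     # cross-product form (ad - bc)
    return d_from_h11, d_from_h12, d_cross


if __name__ == "__main__":
    # Made-up frequencies; all three values agree up to rounding (about 0.1).
    print(d_three_ways(0.4, 0.1, 0.2, 0.3))
```

The sign flips between the first two forms because a positive D means an excess of A1B1 and therefore a deficit of A1B2.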

What's with r^2?

In the discussion of the r^2 measure of disequilibrium it is said that "This however is not adjusted to the loci having different allele frequencies." The whole point of r^2 is that it is adjusted for gene frequencies, so this is incorrect. It may not be the best possible way of adjusting, but it tries to do so. This is followed up by a statement that if we take the square root of r^2 and give it the sign of D, we get the measure D'. Which is fine, except that this isn't true (try doing both the r^2 and D' computations in a case with gene frequencies away from 50% at one or more of the loci and you will see that they don't come out the same). Felsenst 07:08, 6 September 2007 (UTC)
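
The suggested computation can be sketched as follows, using the usual textbook definitions: r^2 = D^2 / (pA(1-pA) pB(1-pB)), and D' = D / Dmax, where Dmax is the largest magnitude of D of the same sign that the allele frequencies allow. The function name and the choice to return a signed D' are assumptions made for this example, and both loci are assumed to be polymorphic.

```python
import math


def ld_measures(h11, h12, h21, h22):
    """Return (D, signed D', signed r) from four haplotype frequencies."""
    p_a = h11 + h12  # frequency of the first allele at locus A
    p_b = h11 + h21  # frequency of the first allele at locus B
    d = h11 - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # Signed square root of r^2; assumes both loci are polymorphic.
    r = d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r


if __name__ == "__main__":
    # Allele frequencies of 0.1 and 0.5: D' is about 1 while r is about 1/3.
    print(ld_measures(0.1, 0.0, 0.4, 0.5))
    # Both allele frequencies at 0.5: D' and r coincide (about 0.6).
    print(ld_measures(0.4, 0.1, 0.1, 0.4))
```

The two measures agree in the second case only because all allele frequencies are 50%; in the first case one haplotype is absent, so D' is at its maximum while r stays well below 1.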

Not necessarily on the same chromosome

I have reverted the recent edit that described the loci whose disequilibrium is being calculated as "necessarily on the same chromosome arm". This is incorrect -- linkage disequilibrium can be calculated for loci on opposite arms of the same chromosome, or even on different chromosomes. Even when the recombination fraction becomes 50%, linkage disequilibrium can be nonzero. (When disequilibrium is calculated not within a single population but across populations, it can persist for substantial numbers of generations even for unlinked loci). Felsenst 17:19, 6 September 2007 (UTC)

I agree that they need not be on the same arm, but different chromosomes, that's a new one to me! Yes you can have alleles at unlinked loci in disequilibria, but it's not linkage, which is what the article is about (see my above comments with regards to epistasis).
Also why would you calculate LD across populations? Allele frequencies are key to calculating LD, and even under stochastic variation (Kimura's neutral theory) these will vary between populations, so it doesn't even make sense to calculate LD in a mixed pool --Slack---line 17:37, 23 September 2007 (UTC)
Your problem is that you are trying to have a logical terminology. The term "linkage disequilibrium" has always been unpopular among population geneticists because it implies that it will be zero when genes are unlinked (which is, as I mentioned, not quite true). Nevertheless the phrase has stuck. So yes, although D is small for unlinked genes, it is not zero and yes, we can calculate it. As for calculating it between populations, D can result from (among other causes) admixture of individuals from different populations, so it makes sense to compare within- and between-population D. Felsenst 06:04, 29 September 2007 (UTC)
I guess my issue is that linkage to me implies exactly that, a physical link. I never said that unlinked genes couldn't display disequilibria; is there an alternative term for describing such disequilibria when loci aren't linked? I've not come across any that I can think of, although "unlinked loci in disequilibria" would make the distinction (although I guess you are saying that there is no distinction, just a historically inaccurate artifact in the nomenclature). I shall have a dig through some books and references to see if I can find the first use of the term (if only to clarify things in my mind).
Fully aware of the effects of admixture and the confounding it causes when performing association mapping (and even the utility of admixture mapping in identifying disease loci). I think I misunderstood the point you were making, which if I understand correctly is that loci can appear to be in LD in admixed populations? Slack---line 15:19, 2 October 2007 (UTC)
I'd go a bit further and say that the unlinked loci don't just "appear" to be in LD in admixed populations, they are in LD in that case. LD is still called LD when the loci actually aren't linked. Furthermore that LD reflects covariation of allele frequencies across the source populations, and it is legitimate to analyze it by computing LD both within the population and across the source populations. This was first discussed, as far as I know, in the appendix by Timothy Prout to a paper by Jeff Mitton and Richard Koehn in Genetics, 1973 (73: 487-496). Felsenst 23:23, 3 October 2007 (UTC)
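
The admixture point in this thread can be illustrated with a small sketch. It assumes two source populations that are each in linkage equilibrium (D = 0 within each) but differ in allele frequencies, mixed in proportions m and 1 - m, followed by random mating with recombination fraction r. All names and numbers below are invented for illustration.

```python
def admixture_d(m, pa1, pb1, pa2, pb2):
    """D in a pooled population built from two sources, each with D = 0.

    m is the proportion from source 1; pa*/pb* are the frequencies of
    alleles A and B in each source.
    """
    f_ab = m * pa1 * pb1 + (1 - m) * pa2 * pb2  # pooled AB haplotype freq
    p_a = m * pa1 + (1 - m) * pa2               # pooled frequency of A
    p_b = m * pb1 + (1 - m) * pb2               # pooled frequency of B
    return f_ab - p_a * p_b


def decay(d0, r, generations):
    """D over time under random mating: D_t = (1 - r)**t * D_0."""
    return [d0 * (1 - r) ** t for t in range(generations + 1)]


if __name__ == "__main__":
    d0 = admixture_d(0.5, 0.9, 0.8, 0.1, 0.2)
    # r = 0.5 corresponds to unlinked loci: D halves each generation
    # rather than vanishing at once.
    print(decay(d0, 0.5, 5))
```

Algebraically the pooled value equals m(1-m)(pA1 - pA2)(pB1 - pB2), so it is nonzero whenever both loci differ in allele frequency between the sources, whether or not the loci are linked.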

First Paragraph is incomplete

I'm an undergrad genetics student... but I do know that the first para ends abruptly. Ashton 07:17, 19 November 2007 (UTC)

That's the least of the problems of this page. It is a complete mess. Here are some problems:
  1. The utterly mysterious statement that "It may be instructive to study genetic equilibrium, and its application in the Hardy-Weinberg principle." Yup, it may be instructive to study that, or astronomy, or linguistics, but why not discuss linkage disequilibrium instead?
  2. Linkage disequilibrium is given the symbol δ, then there is some wrong algebra. The expression p1p2-h12 is given which is not equal to δ but in fact works out to p1p2-p1(1-p2)+D which is 2 p1 p2-p1+D. Then that is (for no reason) equated to h11 h22 - h12 h21 which actually is one way of writing the linkage disequilibrium but which is not at all equal to p1p2-h12.
  3. Next the linkage disequilibrium is called D, without any mention of why the letter suddenly changed, and the discussion goes off sideways into all sorts of discussion of genotype frequencies for no particular reason. In the middle of which is a casual definition of D that happens to be correct but is a tiny fraction of all the discursive stuff.
  4. Then there is a discussion of D' which also describes its rival r^2 as not adjusted for gene frequencies, when of course it is (so is D', they're just different adjustments).
  5. Then for no reason whatsoever Tajima's measure of departure from neutrality is described. It uses the symbol D, but is not at all related to linkage disequilibrium.
I wish I had the guts to totally rewrite the page but I'd just get all these authors who were responsible for the mess mad at me. Felsenst (talk) 07:40, 20 November 2007 (UTC)

I've been trying to tidy bits and pieces up myself, by adding references, altering glaring mistakes and so forth, but haven't made much headway. Personally I wouldn't get mad (it's rather futile ranting in cyber-space), but having been enlightened by yourself I now feel that the terminology in the field should be revised and standardised (viz. above query about linkage disequilibrium, although it is highly unlikely that Wikipedia is the forum for this :-) ). Personally I'd defer to yourself, Felsenst, to make the majority of the changes, but would be more than happy to contribute and help. There are a number of things that need to be addressed first, like

  1. Content : What is the remit of the article? The vague reference to HapMap isn't really useful to the article, nor is its adjunct about Ensembl or dbSNP (there are other irrelevant items in the article as well). Worked examples of how admixture can lead to (spurious ;-) ) LD would be useful, as would a few graphs demonstrating the decay of LD over time as a function of the recombination rate/LD measure/strength of epistatic selection. Software for calculating and giving visual representation of LD should be included.
  2. Structure : How the content should be structured. Intro, background (which should give reference to HWeqm and genetic equilibrium and explain why they are pertinent), formulae, worked examples, software, references, links.
  3. Accuracy : All formulae should be accurate (and referenced).
  4. Referencing : I've attempted to add references where possible, but if the article is to be fully restructured then it should also be fully referenced.
  5. Anything Else? This is just off the top of my head and I've not sat down and given it any hard thought, so there are bound to be glaring mistakes.

I wouldn't be too worried about making the changes; if there were some sort of sandbox facility then the article could be written before being changed. I believe there are ways of getting wiki pages locked down if it's clear that someone is consistently defacing them. I took it upon myself to merge two coalescent theory pages a while back and expected to have people complaining, but left a notice of the planned merge on each discussion page, waited a month or so, didn't hear anything and went ahead with it (besides, based on the discussion it seems you and I are the only ones watching these pages; the other changes are made by people stumbling across it saying "this is a bit shoddy, let's add my tuppence here"). Slack---line (talk) 19:19, 28 November 2007 (UTC)

Well, I'm too busy right now. I was hoping by ranting to impel someone to try to fix it. In general it ought to start out much as it does, defining LD as nonrandom association. Then it should simply define it in terms of the frequency of a gamete: LD is when f(AB) is not equal to p(A)p(B). Then one can talk about the D = f(AB) - p(A)p(B) measure. Then perhaps some discussion of the forces that can cause it, including selection on interacting loci, genetic drift, and admixture (migration). Next perhaps some discussion of how genotype frequencies depend on the gene frequencies and D, as well as how there are separate D's for every pair of alleles at these two loci. Then some mention of how they aren't all independent (with m and n alleles at the two loci there are actually (m-1)(n-1) degrees of freedom for D, as the D's for all pairs of alleles are not independent quantities). Then I think only two more topics are essential: (1) the higher-order D's can be defined (there's a nice formula by Bill Hill for them), and (2) standardized measures of LD such as D' and r^2 need to be mentioned. Maybe somewhere in here the issue of estimating D from diploid genotype frequencies too. There are many peripheral topics but this would already be fairly long. BTW in the first paragraph both LD and linkage are defined as association. One is association across a population, the other association among gametes produced by a double heterozygote, but this is not clarified. Felsenst (talk) 08:03, 29 November 2007 (UTC)
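
The degrees-of-freedom remark above can be made concrete with a short sketch. It assumes a table of haplotype frequencies with m rows (alleles at the first locus) and n columns (alleles at the second), and computes one D for every pair of alleles as D_ij = h_ij - p_i q_j. The function name and the example table are invented for illustration.

```python
def d_matrix(h):
    """Return the matrix of pairwise D_ij = h_ij - p_i * q_j.

    h[i][j] is the frequency of the haplotype carrying allele i at the
    first locus and allele j at the second; entries should sum to 1.
    """
    p = [sum(row) for row in h]        # allele frequencies, first locus
    q = [sum(col) for col in zip(*h)]  # allele frequencies, second locus
    return [[h[i][j] - p[i] * q[j] for j in range(len(q))]
            for i in range(len(p))]


if __name__ == "__main__":
    # Made-up 3 x 2 example: 6 pairwise D's but only (3-1)(2-1) = 2 free.
    table = [[0.2, 0.1], [0.1, 0.3], [0.2, 0.1]]
    for row in d_matrix(table):
        print(row)
```

Every row and every column of the resulting matrix sums to zero (up to rounding), which is why only (m-1)(n-1) of the m*n values can vary freely.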

Linkage disequilibrium measure, D

Things seem to be improving. Some concerns:

  1. I am not sure why the haplotype frequencies have to be carefully described as "relative frequencies". They don't add up to 1? Actually they do add up to 1 (if there are two alleles at each of the two loci) so describing them as relative is not helpful. They are absolute frequencies, not relative frequencies.
  2. Lewontin and Kojima's paper gives LD its name. That was eliminated. I am not sure whether they were the first to use the letter D, as the reference in the text now implies.
  3. I am told by Monty Slatkin that the historical attribution to Robbins is in fact incorrect -- LD and its mathematics were introduced slightly earlier in a paper by H. S. Jennings, who was the great pioneer of genetics of protists.

Some of the problems I noticed earlier are still around but things are improving. Felsenst (talk) 14:16, 12 December 2007 (UTC)

On a reread of Jennings's paper I am not so sure. He did work out the math of two loci each with two alleles, and correctly. However he did not introduce D but just gave expressions in terms of the four haplotype frequencies. Robbins was the first to recast this in terms of D, which he called δ. Jennings's paper is: Jennings, H. S. 1917. The numerical results of diverse systems of breeding, with respect to two pairs of characters, linked or independent, with special relation to the effects of linkage. Genetics 2: 97-154. Felsenst (talk) 13:18, 13 December 2007 (UTC)

Oops, actually, Robbins used the symbol Δ, not δ. Felsenst (talk) 12:56, 14 December 2007 (UTC)