Talk:Hypergeometric distribution

From Wikipedia, the free encyclopedia

Example?

Could someone please add an example? - Someone who didn't sign

Does it really make sense to allow the parameter n to be 0, as the side bar suggests? 81.159.124.90 04:12, 4 December 2005 (UTC)

No. Both N and n should be positive. The support is incorrect also. John Lawrence 21:05, 26 June 2007 (UTC)

To my mind the table on the mathematical characteristics of the hypergeometric distribution is much too wide. I suggest that a line break be used in the formula for kurtosis. Falk Lieder 16:00, 6 October 2006 (UTC)

Doesn't it make sense if n=0? In that case the probability of 0 successes is 1, and the probability of any other number of successes is 0. I mean the math doesn't break, and it gives the solution you would intuitively expect. Hwttdz 15:13, 22 October 2007 (UTC)
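
A minimal sketch of the n=0 case, for anyone who wants to check it numerically. The function name `hypergeom_pmf` and the argument order (N = population size, m = successes in the population, n = draws) are assumptions made for illustration, not the article's notation at the time.

```python
from math import comb

def hypergeom_pmf(k, N, m, n):
    """P(X = k) when n items are drawn without replacement from N items,
    m of which are successes. Names are illustrative assumptions."""
    if k < max(0, n + m - N) or k > min(n, m):
        return 0.0  # k lies outside the support
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)

# With n = 0 draws, all of the probability mass sits on k = 0.
N, m = 10, 4
print([hypergeom_pmf(k, N, m, 0) for k in range(3)])  # prints [1.0, 0.0, 0.0]
```

The formula needs no special case here, because comb(N, 0) is 1 and the support collapses to the single point k = 0.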

Agreed, I'll change it back. John Lawrence 14:35, 25 October 2007 (UTC)

(was: helpme)

Formatting problem:

The two tables "Probability mass function" and "drawn / not drawn" are overlapping each other when viewed in Mozilla Firefox version 1 and 2. (There is no problem in Internet Explorer).

I have tried with various html tags, putting headings between the tables, centering, etc. Nothing helps.

I think it has to do with the style sheets allowing text to float around tables. Do I have any access to change this? Arnold90 10:42, 17 June 2007 (UTC)

I tried an align=right. If that isn't ok I suggest asking at the Help desk to get more opinions.--Commander Keane 10:55, 17 June 2007 (UTC)
That helped. Thank you. 13:22, 17 June 2007 (UTC)

The fix has been undone?! Fixing it again. Arnold90

related distributions section

The statement in the related distributions section doesn't make sense to me. X is a random variable, not a sequence, so taking the limit as written doesn't make sense. Also, unless D goes to infinity, the limiting distribution will be degenerate. I would like to replace it with a statement that expresses the same idea in an informal way. I would also like to add comments showing the relationship to the Bernoulli and Normal distribution. I will wait for comments about this change before changing anything. Here is how I would like the section to read:

Let X ~ Hypergeometric(D, N, n) and p = D / N.

  • If N and D are large compared to n and p is not close to 0 or 1, then P[X \le x] \approx P[Y \le x] where Y has a binomial distribution with parameters n and p.

Johnlv12 14:13, 26 June 2007 (UTC)
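
The proposed approximation can be checked numerically with a rough sketch like the following. The helper names `hypergeom_cdf` and `binom_cdf` and the sample values N = 10000, D = 4000, n = 10 are assumptions chosen only to illustrate "N and D large compared to n".

```python
from math import comb

def hypergeom_cdf(x, N, D, n):
    """P(X <= x) for X ~ Hypergeometric(D, N, n): n draws without
    replacement from N items, D of which are successes."""
    lo = max(0, n + D - N)
    hi = min(x, n, D)
    # Exact integer sum of the numerators, divided once at the end.
    return sum(comb(D, k) * comb(N - D, n - k) for k in range(lo, hi + 1)) / comb(N, n)

def binom_cdf(x, n, p):
    """P(Y <= x) for Y ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min(x, n) + 1))

# Illustrative values only: N and D are large compared to n, p = 0.4.
N, D, n = 10_000, 4_000, 10
p = D / N
max_gap = max(abs(hypergeom_cdf(x, N, D, n) - binom_cdf(x, n, p)) for x in range(n + 1))
print(max_gap)  # largest difference between the two CDFs over x = 0..n
```

Shrinking N toward n (with p held fixed) makes the gap grow, which is the informal content of the statement.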

application and example section

I would like to remove everything from this section after the first horizontal line. The example of how to use the calculator is not appropriate here. I would just leave the link to the website under "external links" at the bottom of the page. Also, the relationship to the binomial distribution is already in the "related distributions" section, so it is not needed here. Johnlv12 14:21, 26 June 2007 (UTC)

various additions and modifications

I would like to change the name of parameter D to m. Any objections?

I am going to add another symmetry relation and recurrence relations, and a formula for the mode.

I want to add a section for the multivariate hypergeometric distribution. I think it is OK to put this into the same article, in analogy with Fisher's noncentral hypergeometric distribution and Wallenius' noncentral hypergeometric distribution. It is convenient to have the formulas for more than two colors of marbles on the same page. Any arguments for putting the multivariate distribution in a separate entry?

Arnold90 15:59, 1 July 2007 (UTC)

Somebody anonymously interchanged white to be failure and black to be success in the example. But they only changed it in one place, which makes the remainder incorrect. I have changed it back to the way it was originally written. I have no problem with changing the colors or not referring to colors at all (although I think visualizing different colored balls helps). If someone wants to change the colors, they have to make all the changes needed to keep the example correct. John Lawrence 14:25, 19 October 2007 (UTC)

Two new subsections

I just added two subsections to greatly expand the symmetries section, putting a stronger expository emphasis on non-sampling applications where the symmetries are more self-evident. I do research concerning stochastic algorithms which greatly involve hypergeometric distributions, but I'm not by any means a proper statistician, so my focus is more on alternate modes of conceptualization than on technical statements. I suspect I added more exposition than is appropriate for an article of mathematical bent. However, I do think that my material at least points out some perspectives that could be more forcefully treated in line with the rest of the exposition, if anyone wishes to adapt my contribution in that direction. I worked quite hard to distinguish sampling artifacts from the underlying distribution.

Coming from the symmetric perspective, I do have some technical concerns elsewhere in the article, but as a non-statistician I held back on making any edits to existing material.

The parameters of the distribution are expressed as m on 0..N and n on 1..N. Yet technically, that statement breaks the formal statements of symmetry. Under that statement of parameters, one cannot interchange n with m when m==0, because n would then have to take the value 0.
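
A small exact-arithmetic sketch of the symmetry in question. The names `hypergeom_pmf` and `swap_symmetric` and the sample values are assumptions for illustration. Note that the m==0 check only works because this function accepts n==0 after the swap.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, m, n):
    """Exact P(X = k): n draws from N items, m of which are successes."""
    if k < max(0, n + m - N) or k > min(n, m):
        return Fraction(0)
    return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))

def swap_symmetric(N, m, n):
    """True if exchanging m and n leaves every probability unchanged."""
    return all(
        hypergeom_pmf(k, N, m, n) == hypergeom_pmf(k, N, n, m)
        for k in range(N + 1)
    )

print(swap_symmetric(12, 5, 3))  # prints True
print(swap_symmetric(12, 0, 3))  # prints True; the swapped call uses n = 0
```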

Equations provided for the mean and mode exhibit obvious symmetry in n and m, but the variance equation does not. I feel it should. If I can still do basic algebra (highly debatable) a symmetric expression of the variance might look like this:

 \frac{nmN(N-n-m) + (nm)^2}{N^2 (N-1)}

I happen to like grouping all the division terms in the denominator, as you then have only one place to look to figure out the values of N for which the variance is undefined. Note that none of the higher-order terms indicate the conditions under which they are defined. Perhaps it is standard fare in the stats articles that higher-order statistics are only defined for sufficiently large N. E.g. skewness is undefined for N==2, even though N==2 is viable under the stated parameters. Interestingly, the skewness is expressed with symmetry, but the kurtosis is not. I grant that anyone who can reason correctly from kurtosis can establish the symmetric form themselves.
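
The symmetric variance expression above can be checked against the usual form n(m/N)(1 - m/N)(N-n)/(N-1), since (N-m)(N-n) = N(N-n-m) + nm. Here is a minimal sketch using exact fractions; the function names and the sample values N=20, m=7, n=5 are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

def pmf(k, N, m, n):
    """Exact P(X = k): n draws from N items, m of which are successes."""
    return Fraction(comb(m, k) * comb(N - m, n - k), comb(N, n))

def variance_from_pmf(N, m, n):
    """Variance computed directly from the definition over the support."""
    ks = range(max(0, n + m - N), min(n, m) + 1)
    mean = sum(k * pmf(k, N, m, n) for k in ks)
    return sum((k - mean) ** 2 * pmf(k, N, m, n) for k in ks)

def variance_symmetric(N, m, n):
    """Symmetric form: (nmN(N-n-m) + (nm)^2) / (N^2 (N-1)). Requires N >= 2."""
    return Fraction(n * m * N * (N - n - m) + (n * m) ** 2, N**2 * (N - 1))

def variance_standard(N, m, n):
    """Usual form: n (m/N) (1 - m/N) (N-n)/(N-1). Requires N >= 2."""
    return Fraction(n * m * (N - m) * (N - n), N**2 * (N - 1))

N, m, n = 20, 7, 5
print(variance_from_pmf(N, m, n) == variance_symmetric(N, m, n) == variance_standard(N, m, n))
```

Both closed forms divide by N^2 (N-1), so N==1 is the one population size where they are undefined.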

Why is entropy not given with summation notation? It's a fairly simple sum, and I certainly use it a lot in my own work. Likewise, the median could be expressed easily enough to the nearest integer by averaging the endpoints of the support. I suppose technically the median is only defined when it happens to land on an integer. As a computer scientist, I would tend to view median as one end of a rank order, where rank is abs(left_side-right_side), so by my view, nearby integers adjoined by rounding would both be medians under best-rank selection. Midpoints don't necessarily exist, but non-empty ranks always have mins/maxs. Sometimes you require a true median, sometimes you just want something as close as possible.
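
For reference, the entropy sum I have in mind is just -sum p(k) ln p(k) over the support. A minimal sketch follows, assuming natural logarithms (nats); the function name `hypergeom_entropy` and the parameter order are illustrative assumptions.

```python
from math import comb, log

def hypergeom_entropy(N, m, n):
    """Shannon entropy in nats of the hypergeometric distribution,
    summed over the support max(0, n+m-N) .. min(n, m)."""
    total = comb(N, n)
    h = 0.0
    for k in range(max(0, n + m - N), min(n, m) + 1):
        p = comb(m, k) * comb(N - m, n - k) / total
        if p > 0:  # guard against floating-point underflow for huge N
            h -= p * log(p)
    return h

print(hypergeom_entropy(20, 7, 5))
```

The sum has at most min(n, m) + 1 terms, so it is cheap to evaluate directly even though no closed form is listed.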

As I said, I'm not wedded to the exposition I added. I offer it in the spirit that it will at least suggest useful perspectives to future editors regardless of whether my text stands as contributed. It goes without saying my technical claims should be checked by a competent checker. MaxEnt 08:17, 28 September 2007 (UTC)

Neither of these sections makes sense to me, and I'm not sure they belong in an encyclopedia-type article. I feel like I would have to read them at least 3-4 times to understand what is written, and then what would I know about the hypergeometric distribution afterwards? They are too specialized and technical to be of general interest to most people who want to find out what the hypergeometric distribution is. John Lawrence 14:12, 19 October 2007 (UTC)