Talk:Statistical significance


This article is within the scope of WikiProject Statistics, which collaborates to improve Wikipedia's coverage of statistics. If you would like to participate, please visit the project page.

WikiProject Mathematics
This article is within the scope of WikiProject Mathematics, which collaborates on articles related to mathematics.
Mathematics rating: Start-Class, Mid-priority. Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.

The assertion "the smaller the p-value, the more significant" is fallacious. A finding is either significant or it isn't. The p-value is not capable of measuring "how significant" a difference between two groups is. Before statistical analysis is performed, an a priori p-value, which is the cutoff point for statistical significance, must be chosen. This value is arbitrary, but 0.05 is by convention the most commonly used value. If the p-value is greater than this a priori p-value, then the null hypothesis cannot be rejected. If the p-value is less, then the null hypothesis is rejected. It is a simple yes or no question. To put more stock in the p-value than this is to miss the point. Confidence intervals are a nice alternative to the p-value for this reason.
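For concreteness, here is a minimal Python sketch of the yes-or-no decision rule described above, with a confidence interval as the suggested alternative. The data, sample sizes, and the 0.05 cutoff are made-up assumptions for illustration (using numpy and scipy):

```python
# Minimal sketch of the yes/no decision rule described above (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=30)   # hypothetical measurements
group_b = rng.normal(loc=11.0, scale=2.0, size=30)   # hypothetical measurements

alpha = 0.05                          # cutoff chosen before looking at the data
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# The decision is binary: reject the null hypothesis or fail to reject it.
reject_null = p_value < alpha
print(f"p = {p_value:.4f}; reject null at alpha = {alpha}: {reject_null}")

# A confidence interval for the difference in means also conveys effect size.
n_a, n_b = len(group_a), len(group_b)
diff = group_a.mean() - group_b.mean()
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)) / df
se = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_crit = stats.t.ppf(1 - alpha / 2, df)
print(f"95% CI for the difference: [{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")
```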

I'm not an expert, but this article is mixing two independent concepts in statistics. See: http://hops.wharton.upenn.edu/ideas/pdf/Armstrong/StatisticalSignificance.pdf

Should this be merged or linked to the article p-value?

Shouldn't critical value get its own article?


[edit] Comments moved from article

131.130.93.136 put the following at the top of the article significance... "THIS ARTICLE IS HORRENDOUS."


The following article seems to have an error, as statistical significance is defined the other way round from how it is used here.
The cited significance level of 5% is actually known as the alpha error, error of the first kind, or Type I error, whereas the significance level is 95%. Thus, comparing two significance levels of 99% and 95% obviously leads to the facts stated below.
The statistical power is defined as 1 − beta, beta being the Type II error or error of the second kind.
The original article below:
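For readers untangling the terminology in the comment above (alpha as the Type I error rate fixed in advance, beta as the Type II error rate, power = 1 − beta), here is a minimal Python sketch for a one-sided z-test; the effect size, sigma, and sample size are arbitrary assumptions:

```python
# Illustrative relationship between alpha, beta, and power for a one-sided z-test.
# The effect size, sigma, and n below are arbitrary assumptions for the example.
import math
from scipy.stats import norm

alpha = 0.05      # Type I error rate (probability of rejecting a true null)
effect = 0.5      # assumed true difference under the alternative
sigma = 1.0       # known standard deviation (z-test assumption)
n = 25            # sample size

z_crit = norm.ppf(1 - alpha)               # critical value under the null
shift = effect / (sigma / math.sqrt(n))    # standardized shift under the alternative
beta = norm.cdf(z_crit - shift)            # Type II error rate
power = 1 - beta
print(f"critical z = {z_crit:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```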

I find this anonymous user's comments to be without merit. Since it's anonymous, I don't think any further comment is needed. Michael Hardy 01:20, 20 Nov 2004 (UTC)

To my mind, there is a confusion in the article between Type I error (α) OF A TEST, which has to be decided a priori, and significance level (p value) OF A RESULT, which is a posteriori. I don't regard this confusion as very serious, but some people do. --Henri de Solages 21:57, 10 December 2005 (UTC)

I'd like to see this corrected. Smoe
The Raymond Hubbard paper in External Links goes into great detail about the confusion between p and α. I have not got my head around it yet, but the article in its present form appears to treat them as identical. 172.173.27.197 13:27, 28 March 2007 (UTC)

Added link from Level to Quantile. Maybe this should move to "See also". Smoe 21:22, 15 January 2006 (UTC)

[edit] Some major changes, mostly in tone

All things considered, I applaud the writers of "statistical significance." With the exception of the p-value choice error (an a priori decision), this describes and documents important issues very well! I do programmatic evaluation and am forever battling "lies, damn lies, and statistics." About to write yet another reminder to staff about the meaning of the "statistical significance?!" of annual change that shows up in our instruments, I ran across this, and will just refer them to Wikipedia for now. A little statistics is a dangerous thing. Thanks, folks.


I removed the final paragraph only after I had written what is now the second paragraph, and noted that it covered the same ground. I tried to convey the same information somewhat less formally and put it in a more prominent position. I did this because I think this is of great importance to lay persons, who are constantly confused by the concept of significance. This is a serious social issue, since pharmaceutical companies are willfully misleading the public by the use of this term. There are drugs out there, Aricept being one, that have a trivial effect and no long-term effect on curtailing Alzheimer's, yet were approved and sold because of their "significant" effect, the degree of which is not even described.

With all due respect for those whose work I am being presumptuous enough to modify, it is those without the benefit of a good college statistics course who need this article. I do not believe I "dumbed it down" but rather attempted to make it more accessible. I, of course, left untouched the technical description, which is excellent.

I also included the paragraph on the spurious significance of multiple groups, which is another way that the public can be confused. I will follow up with a reference to the recent women's study, or someone else can do it if they choose.

I would welcome any comments. Arodb 01:11, 26 February 2006 (UTC)

Arodb, I've moved your edits. I think your points should be made, but I felt that the article in its previous form was a masterpiece of succinctness, so I created a section for them in order to restore the clarity of the original. BrendanH 21:04, 20 March 2006 (UTC)


[edit] First Sentence Confusing

"In statistics, a result is significant if it is unlikely to have occurred by chance, given that in reality, the independent variable (the test condition being examined) has no effect, or, formally stated, that a presumed null hypothesis is true."

I understand and agree with everything up to the second comma. After the comma it appears to say that "the independent variable has no effect in reality" which of course depends on the situation... could someone reword it? --Username132 (talk) 03:58, 16 April 2006 (UTC)

[edit] More Confusion

"For example, one may choose a significance level of, say, 5%, and calculate a critical value of a statistic (such as the mean) so that the probability of it exceeding that value, given the truth of the null hypothesis, would be 5%. If the actual, calculated statistic value exceeds the critical value, then it is significant "at the 5% level"."

What is the word "it" in reference to? --Username132 (talk) 04:14, 16 April 2006 (UTC)
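As an illustration of the quoted passage ("it" presumably refers to the calculated test statistic), here is a minimal Python sketch with an assumed normal model and a made-up observed value:

```python
# Sketch of the quoted passage: pick alpha = 5%, find the critical value of a
# statistic so that P(statistic > critical value | null true) = 5%, then compare.
# The normal model and the observed value below are assumptions for illustration.
from scipy.stats import norm

alpha = 0.05
critical_value = norm.ppf(1 - alpha)   # one-sided critical value for a z statistic
observed_statistic = 1.8               # hypothetical value calculated from the data

if observed_statistic > critical_value:
    print(f"z = {observed_statistic} > {critical_value:.3f}: significant at the 5% level")
else:
    print(f"z = {observed_statistic} <= {critical_value:.3f}: not significant at the 5% level")
```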

It seems to me that the first sentence contradicts point #2 in the "frequent misunderstandings" section of Wikipedia's description of a "p value." —Preceding unsigned comment added by 68.10.153.140 (talk) 20:43, 25 May 2008 (UTC)

[edit] Small cleanup

The article seems messy right now. The first paragraph in particular was horrible. I've altered some parts for clarity and to try and make it more concise. Let me know what you think (particularly about the opening paragraph - I'm thinking more should be added to that). --Davril2020 06:21, 31 October 2006 (UTC)

I tried to make the opening paragraph more readable and more accessible to the ordinary person. It can still be further improved. I also added a paragraph to the "pitfalls" section (the last paragraph), describing one more pitfall. --Coppertwig 23:46, 6 November 2006 (UTC)

[edit] Popular levels of significance

I changed this, in the opening paragraph, from 10%, 5% and 1% to 5%, 1% and 0.1%. In any of the sciences where I've seen significance levels used, as far as I remember, 5% is the largest level usually considered "statistically significant". If some people do sometimes use 10% somewhere, my edit is still not incorrect, since it's just listing some examples of commonly used levels. Coppertwig 19:53, 6 November 2006 (UTC)

[edit] Armstrong

I oppose this edit. [1] It looks like fringe to me. --Coppertwig 02:14, 29 August 2007 (UTC)

Does anyone know if the points raised in the Armstrong paragraph represent more than a single researcher's concerns? I think that this type of material might have a role if it represents an emerging, but broad-based set of concerns. However, if it is just one guy who doesn't like significance tests, I would recommend it be removed or at least toned down. Right now it seems a little prominent. Neltana 13:51, 24 October 2007 (UTC)


Actually, I think it is fair to call his work part of an emerging but broad-based set of concerns, and if anything, I don't think these concerns receive ENOUGH prominence in the article. See work by McCloskey and Ziliak, as well as books like "What if there were no significance tests?", "The significance test controversy", etc. 99.233.173.150 (talk) 01:48, 19 December 2007 (UTC)

[edit] Someone should probably document this

http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pmed.0020124&ct=1 —Preceding unsigned comment added by 84.68.149.153 (talk) 09:45, 11 October 2007 (UTC)

[edit] Recipes to using statistical significance in practice

I read this article when a reviewer said I needed to test the significance of the results in my paper submitted to a conference. But I found no recipe for how to add such a test to my experiments. I think the article needs some practical formulas: when one has n experiments where population X gave the average A and population Y gave the average B, how can one reason about the significance of A > B? Thanks! 189.141.63.166 (talk) 22:01, 25 December 2007 (UTC)

You're quite right! There is some information about tests at t-test. Perhaps we need better links to it; but the information there can also be improved, and some information about tests on this page would be good: enough that people can actually do the tests. I might help with this at some point. --Coppertwig (talk) 00:40, 26 December 2007 (UTC)
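For readers in the same situation, a minimal sketch of the kind of two-sample t-test the reply points to, using scipy; the measurement lists are hypothetical:

```python
# Minimal sketch of testing whether population X's mean exceeds population Y's,
# as described in the request above. The measurement lists are made up.
from scipy import stats

x = [12.1, 11.8, 12.5, 12.9, 11.6, 12.3]   # results from population X (average A)
y = [11.2, 11.5, 10.9, 11.7, 11.0, 11.4]   # results from population Y (average B)

# The question "is A > B?" is one-sided; scipy's ttest_ind is two-sided by
# default, so halve the p-value when the difference is in the expected direction.
t_stat, p_two_sided = stats.ttest_ind(x, y)
p_one_sided = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2
print(f"t = {t_stat:.3f}, one-sided p = {p_one_sided:.4f}")
```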

[edit] P-values versus Alphas

I appreciate the article as a valiant attempt to explain statistical significance and hypothesis testing. There is one major fallacy: Fisher's p-values and Neyman-Pearson alpha levels are not equivalent. Obtaining a p-value of 0.05 or less tells you absolutely nothing about the Type I error rate. To calculate the p-value, you only need knowledge of the null hypothesis and the distribution of the test statistic under the null. To calculate the probability of a Type I error and choose a suitable alpha, you need to know both the null and alternative hypotheses (and distributions). In practice, correctly determining alpha is not feasible in scientific experiments. Instead, we cite a p-value (if less than 0.05) and erroneously believe that this will limit the overall false positive rate of published scientific works to 5%. It does not. A p-value represents only the level of significance: how unlikely it is to get this result (or a more extreme one) by chance. Nothing else. See the nice article by Hubbard cited at the bottom of the page. —Preceding unsigned comment added by Franek cimono (talk • contribs) 17:02, 14 March 2008 (UTC)
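To illustrate the point above that a p-value is computed from the null distribution of the test statistic alone, here is a minimal sketch assuming a statistic that is standard normal under the null; the observed value is made up:

```python
# A p-value needs only the distribution of the test statistic under the null.
# Here the statistic is assumed to be standard normal under H0; the observed
# value is hypothetical.
from scipy.stats import norm

z_observed = 2.1                                 # made-up test statistic
p_value = 2 * (1 - norm.cdf(abs(z_observed)))    # two-sided tail probability
print(f"two-sided p-value = {p_value:.4f}")
```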

[edit] Pitfalls

The first paragraph of the Pitfalls section appears to have been copied verbatim from this website: http://d-edreckoning.blogspot.com/2008/01/statistical-significance-in-education.html Krashski35 (talk) 17:36, 29 May 2008 (UTC)