Pearson product-moment correlation coefficient
From Wikipedia, the free encyclopedia
In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the MCV or PMCC) is a common measure of the correlation between two variables X and Y. When measured in a population, it is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter r and is sometimes called "Pearson's r." Pearson's correlation reflects the degree of linear relationship between two variables and ranges from +1 to −1. A correlation of +1 means that there is a perfect positive linear relationship between the variables. A correlation of −1 means that there is a perfect negative linear relationship between the variables. A correlation of 0 means that there is no linear relationship between the two variables. In practice, correlations computed from data are rarely if ever exactly 0, 1, or −1. The sign of the coefficient indicates whether the relationship is positive or negative.[1]
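The statistic is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom.[1] If the data come from a sample, then

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{s_X}\right) \left(\frac{Y_i - \bar{Y}}{s_Y}\right)$$

where $\frac{X_i - \bar{X}}{s_X}$, $\bar{X}$, and $s_X$ are the standard score, sample mean, and sample standard deviation of X (calculated using n − 1 in the denominator), and the corresponding quantities for Y are defined in the same way.[1]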
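If the data come from a population, then

$$\rho = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{X_i - \mu_X}{\sigma_X}\right) \left(\frac{Y_i - \mu_Y}{\sigma_Y}\right)$$

where $\frac{X_i - \mu_X}{\sigma_X}$, $\mu_X$, and $\sigma_X$ are the standard score, population mean, and population standard deviation of X (calculated using n in the denominator), and the corresponding quantities for Y are defined in the same way.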
The result obtained is equivalent to dividing the covariance between the two variables by the product of their standard deviations.
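As a minimal sketch of the equivalence stated above, the following Python computes the sample coefficient two ways: from standard scores, and as the covariance divided by the product of the standard deviations. The function names and the small data set are illustrative assumptions, not taken from the cited source.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation, using n - 1 in the denominator."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def pearson_r_from_z(xs, ys):
    """Sum of products of standard scores, divided by n - 1."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_std(xs), sample_std(ys)
    products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

def pearson_r_from_cov(xs, ys):
    """Sample covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (sample_std(xs) * sample_std(ys))

# Hypothetical illustration data (not from the source).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

r_z = pearson_r_from_z(xs, ys)
r_cov = pearson_r_from_cov(xs, ys)
print(r_z, r_cov)  # both are approximately 0.8
```

Both routes agree up to floating-point rounding, because the factors of n − 1 in the covariance and in the standard deviations combine to the same quantity in either form.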
The coefficient ranges from −1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of −1 shows that all data points lie on a single line but that Y decreases as X increases. A value of 0 shows that a linear model is inappropriate, that is, that there is no linear relationship between the variables.[1]
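The three boundary cases can be illustrated with a short Python sketch. The data sets are hypothetical and chosen only for illustration; the parabola example is an added assumption showing that a coefficient of 0 rules out a linear relationship, not every relationship.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation from sums of squared and cross deviations.

    The n - 1 (or n) factors cancel between numerator and denominator,
    so they are omitted here.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration data.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
increasing = [2 * x + 3 for x in xs]   # points on a line with positive slope
decreasing = [-2 * x + 3 for x in xs]  # points on a line with negative slope
parabola = [x * x for x in xs]         # Y depends on X, but not linearly

r_up = pearson_r(xs, increasing)     # approximately +1
r_down = pearson_r(xs, decreasing)   # approximately -1
r_curve = pearson_r(xs, parabola)    # approximately 0
print(r_up, r_down, r_curve)
```

In the last case Y is completely determined by X, yet the coefficient is 0 because the positive and negative deviations cancel across the symmetric range of X.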
The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.
The linear equation that best describes the relationship between X and Y can be found by linear regression. This equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y′.
Any value of Y can therefore be defined as the sum of Y′ and the difference between Y and Y′:

$$Y = Y' + (Y - Y')$$

The variance of Y is equal to the sum of the variances of the two components of Y:

$$s_y^2 = s_{y'}^2 + s_{y.x}^2$$

Since the coefficient of determination implies that $s_{y.x}^2 = s_y^2 (1 - r^2)$, we can derive the identity

$$r^2 = \frac{s_{y'}^2}{s_y^2}$$
The square of r is conventionally used as a measure of the association between X and Y. For example, if r² is 0.90, then 90% of the variance of Y can be "accounted for" by changes in X and the linear relationship between X and Y.[1]
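The variance decomposition can be checked numerically. The sketch below fits a least-squares line, splits Y into the predicted part Y′ and the residual Y − Y′, and compares the variances. The data and variable names are hypothetical; as an assumption made for this illustration, all three variances use the same n − 1 denominator so that the components add up exactly.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_var(values):
    """Sample variance, using n - 1 in the denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Hypothetical illustration data (not from the source).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]

# Ordinary least-squares line: Y' = intercept + slope * X.
mx, my = mean(xs), mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
slope = sxy / sxx
intercept = my - slope * mx

predicted = [intercept + slope * x for x in xs]     # Y'
residuals = [y - p for y, p in zip(ys, predicted)]  # Y - Y'

var_y = sample_var(ys)             # variance of Y
var_pred = sample_var(predicted)   # variance of Y'
var_resid = sample_var(residuals)  # variance of Y - Y'

# Variance of Y' divided by variance of Y gives r squared.
r_squared = var_pred / var_y

print(var_y, var_pred + var_resid)  # the two values agree up to rounding
print(r_squared)                    # approximately 0.64 for this data
```

For this data about 64% of the variance of Y is accounted for by the linear relationship with X, and the residual variance matches the variance of Y multiplied by one minus r squared.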
See also
- Linear correlation (wikiversity)
- Spearman's rank correlation coefficient
References
- [1] Moore, David (August 2006). "4", Basic Practice of Statistics, 4th ed., WH Freeman Company, pp. 90-114. ISBN 0-7167-7463-1.