Hotelling's T-square distribution

From Wikipedia, the free encyclopedia

In statistics, Hotelling's T-square statistic,[1] named for Harold Hotelling, is a generalization of Student's t statistic that is used in multivariate hypothesis testing.

Hotelling's T-square statistic is defined as


t^2=n({\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}({\mathbf x}-{\mathbf\mu})

where n is the number of points (see below), {\mathbf x} is a column vector of p elements, and {\mathbf W} is a p\times p covariance matrix.
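The quadratic form in this definition can be evaluated without explicitly inverting W: solve the linear system W z = x − μ and take n·(x − μ)′z. The following is a minimal Python sketch, standard library only, assuming W is non-singular. The helper names `solve` and `t_squared` are illustrative and not taken from any package.

```python
def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting.

    a is a square list of lists and b a list; neither input is modified.
    """
    n = len(b)
    # Augmented matrix [a | b], built from copies of the rows.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def t_squared(x, mu, w, n):
    """Return n * (x - mu)' W^{-1} (x - mu) without forming W^{-1}."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(w, d)  # z = W^{-1} (x - mu)
    return n * sum(di * zi for di, zi in zip(d, z))
```

For example, `t_squared([1.0, 2.0], [0.0, 0.0], [[2.0, 0.0], [0.0, 4.0]], 5)` returns 7.5, because W^{-1}(x − μ) = (0.5, 0.5)′. Solving the system rather than inverting W is the usual choice because it is cheaper and numerically better behaved.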

If x\sim N_p(\mu,{\mathbf V}/n) is a random vector with a multivariate Gaussian distribution and m{\mathbf W}\sim W_p({\mathbf V},m), independent of x, has a Wishart distribution with the same non-singular covariance matrix \mathbf V and m = n − 1 degrees of freedom, then the distribution of t^2 is T^2(p,m), Hotelling's T-square distribution with parameters p and m. It can be shown that


\frac{m-p+1}{pm}\,t^2 \sim F_{p,m-p+1}

where F_{p,m-p+1} is the F-distribution with p and m − p + 1 degrees of freedom.
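In practice this relation is what turns an observed t^2 into something that can be compared with F tables. The small Python helper below applies the scaling factor (m − p + 1)/(pm). The name `t2_to_f` is illustrative, and the check m ≥ p reflects that the F denominator degrees of freedom must be at least 1.

```python
def t2_to_f(t2, p, m):
    """Map a value of T^2(p, m) onto the F(p, m - p + 1) scale.

    Implements (m - p + 1) / (p * m) * T^2 ~ F(p, m - p + 1).
    Returns (f_value, numerator_df, denominator_df).
    """
    if p < 1 or m < p:
        raise ValueError("need 1 <= p <= m")
    df2 = m - p + 1
    return df2 / (p * m) * t2, p, df2
```

For instance, `t2_to_f(7.5, 2, 4)` returns `(2.8125, 2, 3)`. For p = 1 the factor is 1, consistent with the square of a Student t variable with m degrees of freedom following F(1, m). If SciPy is available, an upper-tail p-value could then be taken from `scipy.stats.f.sf(f_value, p, m - p + 1)`.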

Now suppose that

{\mathbf x}_1,\dots,{\mathbf x}_n

are p×1 column vectors whose entries are real numbers. Let

\overline{\mathbf x}=(\mathbf{x}_1+\cdots+\mathbf{x}_n)/n

be their mean. Let the p×p positive-definite matrix

{\mathbf W}=\sum_{i=1}^n (\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})'/(n-1)

be their "sample variance" matrix. (The transpose of any matrix M is denoted above by M′). Let μ be some known p×1 column vector (in applications a hypothesized value of a population mean). Then Hotelling's T-square statistic is


t^2=n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}).

Note that t^2 is n times the squared Mahalanobis distance of \overline{\mathbf x} from \mathbf\mu, computed with the sample covariance matrix {\mathbf W}.
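Putting these definitions together, the one-sample statistic can be computed directly from data. This Python sketch (standard library only; `one_sample_t2` is an illustrative name) forms the sample mean and the unbiased covariance W, factors W = LL′ by Cholesky decomposition, and solves L y = x̄ − μ. Then y′y is the squared Mahalanobis distance and t^2 = n·y′y. It assumes W is positive definite and raises an error otherwise.

```python
import math


def one_sample_t2(xs, mu):
    """One-sample Hotelling statistic for points xs (list of p-vectors) against mu.

    Returns (t2, d2): d2 is the squared Mahalanobis distance of the sample
    mean from mu under the sample covariance W, and t2 = n * d2.
    """
    n, p = len(xs), len(xs[0])
    if n <= p:
        raise ValueError("need n > p for W to be positive definite")
    xbar = [sum(col) / n for col in zip(*xs)]
    # Unbiased sample covariance W with divisor n - 1.
    w = [[sum((x[r] - xbar[r]) * (x[c] - xbar[c]) for x in xs) / (n - 1)
          for c in range(p)] for r in range(p)]
    # Cholesky factorization W = L L' with L lower triangular.
    low = [[0.0] * p for _ in range(p)]
    for r in range(p):
        for c in range(r + 1):
            s = w[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                if s <= 0:
                    raise ValueError("W is not positive definite")
                low[r][c] = math.sqrt(s)
            else:
                low[r][c] = s / low[c][c]
    # Forward substitution: L y = xbar - mu, so (xbar-mu)' W^{-1} (xbar-mu) = y'y.
    diff = [a - b for a, b in zip(xbar, mu)]
    y = []
    for r in range(p):
        y.append((diff[r] - sum(low[r][k] * y[k] for k in range(r))) / low[r][r])
    d2 = sum(v * v for v in y)
    return n * d2, d2
```

For the three points (1, 2), (3, 4), (5, 9) and μ = (0, 0), the sample mean is (3, 5), W has rows (4, 7) and (7, 13), the squared Mahalanobis distance is 7/3 and t^2 = 7, up to floating-point rounding.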

In particular, it can be shown[2] that if {\mathbf x}_1,\dots,{\mathbf x}_n\sim N_p(\mu,{\mathbf V}) are independent, and \overline{\mathbf x} and {\mathbf W} are as defined above, then (n-1){\mathbf W} has a Wishart distribution with n − 1 degrees of freedom,

(n-1)\mathbf{W} \sim W_p(V,n-1),

and is independent of \overline{\mathbf x}, and

\overline{\mathbf x}\sim N_p(\mu,V/n)

This implies that:

t^2 = n(\overline{\mathbf x}-{\mathbf\mu})'{\mathbf W}^{-1}(\overline{\mathbf x}-{\mathbf\mu}) \sim T^2(p, n-1).
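This distributional result can be checked by simulation. The sketch below makes arbitrary illustrative choices: p = 2, V = I, μ = 0 and n = 20. It draws many samples and compares the average of t^2 with the mean of T^2(p, m), which is pm/(m − p − 1) when m > p + 1. That mean follows from the relation to F_{p,m-p+1}, because an F(d_1, d_2) variable has mean d_2/(d_2 − 2). The function names are hypothetical, and p = 2 is hard-coded so W^{-1} can use the closed-form 2×2 inverse.

```python
import random


def expected_mean_t2(p, m):
    """Mean of T^2(p, m), i.e. p*m/(m - p - 1); finite only when m > p + 1."""
    if m <= p + 1:
        raise ValueError("mean of T^2(p, m) is infinite unless m > p + 1")
    return p * m / (m - p - 1)


def simulate_mean_t2(n=20, reps=20000, seed=0):
    """Monte Carlo average of t^2 for samples of size n from N_2(0, I), with mu = 0.

    p = 2 is hard-coded so that W^{-1} can use the closed-form 2x2 inverse.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        mx = sum(a for a, _ in pts) / n
        my = sum(b for _, b in pts) / n
        # Entries of the unbiased sample covariance W = [[sxx, sxy], [sxy, syy]].
        sxx = sum((a - mx) ** 2 for a, _ in pts) / (n - 1)
        syy = sum((b - my) ** 2 for _, b in pts) / (n - 1)
        sxy = sum((a - mx) * (b - my) for a, b in pts) / (n - 1)
        det = sxx * syy - sxy * sxy
        # xbar' W^{-1} xbar using W^{-1} = [[syy, -sxy], [-sxy, sxx]] / det.
        quad = (syy * mx * mx - 2.0 * sxy * mx * my + sxx * my * my) / det
        total += n * quad
    return total / reps
```

With the defaults, m = 19 and the theoretical mean `expected_mean_t2(2, 19)` is 2.375. The value returned by `simulate_mean_t2()` should agree with it up to Monte Carlo error.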

Hotelling's two-sample T-square statistic

If {\mathbf x}_1,\dots,{\mathbf x}_{n_x}\sim N_p(\mu,{\mathbf V}) and {\mathbf y}_1,\dots,{\mathbf y}_{n_y}\sim N_p(\mu,{\mathbf V}) are independent samples drawn from two multivariate normal distributions with the same mean and covariance, and we define

\overline{\mathbf x}=\frac{1}{n_x}\sum_{i=1}^{n_x} \mathbf{x}_i \qquad \overline{\mathbf y}=\frac{1}{n_y}\sum_{i=1}^{n_y} \mathbf{y}_i

as the sample means, and

{\mathbf W}= \frac{\sum_{i=1}^{n_x}(\mathbf{x}_i-\overline{\mathbf x})(\mathbf{x}_i-\overline{\mathbf x})' + \sum_{i=1}^{n_y}(\mathbf{y}_i-\overline{\mathbf y})(\mathbf{y}_i-\overline{\mathbf y})'}{n_x+n_y-2}

as the unbiased pooled covariance matrix estimate, then Hotelling's two-sample T-square statistic is

t^2 = \frac{n_x n_y}{n_x+n_y}(\overline{\mathbf x}-\overline{\mathbf y})'{\mathbf W}^{-1}(\overline{\mathbf x}-\overline{\mathbf y}) \sim T^2(p, n_x+n_y-2)

and it can be related to the F-distribution by[2]

\frac{n_x+n_y-p-1}{(n_x+n_y-2)p}\,t^2 \sim F_{p,n_x+n_y-1-p}.
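The two-sample statistic and its F transformation can be sketched the same way. The Python function below (standard library only; `two_sample_t2` is an illustrative name) pools the two scatter matrices with divisor n_x + n_y − 2. It evaluates the quadratic form through a Cholesky factorization of the pooled W and rescales t^2 by (n_x + n_y − p − 1)/((n_x + n_y − 2)p). It assumes both samples have the same dimension p and that the pooled W is positive definite.

```python
import math


def two_sample_t2(xs, ys):
    """Two-sample Hotelling statistic with pooled covariance.

    Returns (t2, f, df1, df2), where f = (nx + ny - p - 1) / ((nx + ny - 2) p) * t2
    is to be referred to F(df1, df2) = F(p, nx + ny - 1 - p).
    """
    nx, ny, p = len(xs), len(ys), len(xs[0])
    if nx + ny - 2 < p:
        raise ValueError("need nx + ny - 2 >= p")
    mx = [sum(col) / nx for col in zip(*xs)]
    my = [sum(col) / ny for col in zip(*ys)]

    def scatter(pts, mean):
        # Sum of outer products (v - mean)(v - mean)'.
        return [[sum((v[r] - mean[r]) * (v[c] - mean[c]) for v in pts)
                 for c in range(p)] for r in range(p)]

    sx, sy = scatter(xs, mx), scatter(ys, my)
    # Unbiased pooled covariance with divisor nx + ny - 2.
    w = [[(sx[r][c] + sy[r][c]) / (nx + ny - 2) for c in range(p)] for r in range(p)]
    # Cholesky W = L L', then forward-solve L z = xbar - ybar.
    low = [[0.0] * p for _ in range(p)]
    for r in range(p):
        for c in range(r + 1):
            s = w[r][c] - sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                if s <= 0:
                    raise ValueError("pooled W is not positive definite")
                low[r][c] = math.sqrt(s)
            else:
                low[r][c] = s / low[c][c]
    diff = [a - b for a, b in zip(mx, my)]
    z = []
    for r in range(p):
        z.append((diff[r] - sum(low[r][k] * z[k] for k in range(r))) / low[r][r])
    # z'z equals (xbar - ybar)' W^{-1} (xbar - ybar).
    t2 = nx * ny / (nx + ny) * sum(v * v for v in z)
    df2 = nx + ny - 1 - p
    f = df2 / ((nx + ny - 2) * p) * t2
    return t2, f, p, df2
```

For the corners of two unit-offset squares, xs = (0, 0), (2, 0), (0, 2), (2, 2) and ys = (3, 1), (5, 1), (3, 3), (5, 3), the pooled W is (4/3)I. This gives t^2 = 15 and an F value of 6.25 on (2, 5) degrees of freedom, up to floating-point rounding.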


References

  1. H. Hotelling (1931). The generalization of Student's ratio. Ann. Math. Statist. 2: 360–378.
  2. K. V. Mardia, J. T. Kent, and J. M. Bibby (1979). Multivariate Analysis. Academic Press.