Empirical distribution function


In statistics, an empirical distribution function is the cumulative distribution function that assigns probability 1/n to each of the n observations in a sample.

Let X_1,\ldots,X_n be iid random variables in \mathbb{R} with the cdf F(x).

The empirical distribution function F_n(x) based on the sample X_1,\ldots,X_n is a step function defined by

F_n(x) = \frac{\mbox{number of elements in the sample} \leq x}{n} = \frac{1}{n} \sum_{i=1}^n I(X_i \le x),

where I(A) is the indicator of event A.

For fixed x, I(X_i\leq x) is a Bernoulli random variable with parameter p = F(x), hence nF_n(x) is a binomial random variable with mean nF(x) and variance nF(x)(1 − F(x)).
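As a concrete illustration, the following Python sketch evaluates F_n(x) directly from the defining sum of indicators; the standard normal sample, the sample size, and the evaluation point x = 0.5 are illustrative choices, not part of the definition.

    import numpy as np

    # Minimal sketch of the definition: F_n(x) is the fraction of sample
    # points that are <= x.  Sample and evaluation point are illustrative.
    rng = np.random.default_rng(0)
    n = 100
    sample = rng.normal(size=n)          # X_1, ..., X_n, here iid N(0, 1)

    def ecdf(sample, x):
        """Empirical distribution function F_n(x) = (1/n) * sum_i I(X_i <= x)."""
        return np.mean(sample <= x)

    x = 0.5
    print("F_n(x) =", ecdf(sample, x))
    # n * F_n(x) counts successes in n Bernoulli(F(x)) trials, so it is
    # Binomial(n, F(x)) with mean n*F(x) and variance n*F(x)*(1 - F(x)).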

Asymptotic properties

F_n(x)\to F(x) almost surely for fixed x.
In particular, F_n(x) is a strongly consistent estimator of the cumulative distribution function F(x); since E[F_n(x)] = F(x) by the binomial distribution above, it is also unbiased.
\sqrt{n}(F_n(x)-F(x)) converges in distribution to the normal distribution N(0, F(x)(1 − F(x))) for fixed x.

The Berry–Esséen theorem provides the rate of this convergence.
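A minimal simulation sketch of the pointwise central limit statement above; the choice of a standard normal F, the fixed point x, the sample size, and the number of replications are all illustrative assumptions.

    import numpy as np
    from scipy.stats import norm

    # Illustrative check (assumptions: F is the standard normal cdf, x = 0.3,
    # n = 500, 2000 replications): the simulated values of
    # sqrt(n) * (F_n(x) - F(x)) should have variance close to F(x)*(1 - F(x)).
    rng = np.random.default_rng(1)
    n, reps, x = 500, 2000, 0.3
    Fx = norm.cdf(x)

    Fn_x = np.array([np.mean(rng.normal(size=n) <= x) for _ in range(reps)])
    z = np.sqrt(n) * (Fn_x - Fx)

    print("simulated variance of sqrt(n)(F_n(x) - F(x)):", z.var())
    print("limit variance F(x)(1 - F(x))               :", Fx * (1 - Fx))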
\|F_n-F\|_\infty = \sup_x|F_n(x)-F(x)| \to 0 with probability 1; this is the Glivenko–Cantelli theorem.
The Dvoretzky–Kiefer–Wolfowitz inequality provides the rate of this convergence.
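A hedged illustration of this uniform convergence: the sketch below computes the sup-norm distance at the jump points of the step function F_n for a few sample sizes, with a standard normal F as an assumed example.

    import numpy as np
    from scipy.stats import norm

    # Illustrative sketch: sup_x |F_n(x) - F(x)| for a continuous cdf can be
    # computed at the jump points of F_n; it shrinks as n grows.
    # (The Dvoretzky-Kiefer-Wolfowitz inequality bounds
    #  P(sup |F_n - F| > eps) by 2 * exp(-2 * n * eps**2).)
    rng = np.random.default_rng(2)

    def sup_distance(sample, cdf):
        """Return sup_x |F_n(x) - F(x)| for a continuous cdf."""
        xs = np.sort(sample)
        n = len(xs)
        F = cdf(xs)
        i = np.arange(1, n + 1)
        return max(np.max(i / n - F), np.max(F - (i - 1) / n))

    for n in (100, 1000, 10000):
        d = sup_distance(rng.normal(size=n), norm.cdf)
        print(f"n={n:6d}  sup|F_n - F| = {d:.4f}")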
\sqrt{n}\,\|F_n-F\|_\infty converges in distribution to the Kolmogorov distribution, provided that F(x) is continuous.
The Kolmogorov–Smirnov test for goodness-of-fit is based on this fact.
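As a minimal practical sketch of such a test, scipy.stats.kstest compares a sample's empirical distribution function with a hypothesized continuous cdf; the standard normal sample below is an arbitrary choice.

    import numpy as np
    from scipy.stats import kstest

    # Minimal Kolmogorov-Smirnov goodness-of-fit sketch: arbitrary standard
    # normal data tested against the hypothesized cdf "norm".
    rng = np.random.default_rng(3)
    data = rng.normal(size=200)

    result = kstest(data, "norm")      # sup-distance statistic and p-value
    print("KS statistic:", result.statistic)
    print("p-value     :", result.pvalue)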
\sqrt{n}(F_n-F), as a process indexed by x, converges weakly in \ell^\infty(\mathbb{R}) to the process B(F(x)), where B is a standard Brownian bridge (Donsker's theorem).

See also