Empirical distribution function
From Wikipedia, the free encyclopedia
In statistics, an empirical distribution function is a cumulative probability distribution function that concentrates probability 1/n at each of the n numbers in a sample.
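Let $X_1, \ldots, X_n$ be iid random variables in $\mathbb{R}$ with the cdf $F(x)$.
The empirical distribution function $F_n(x)$ based on the sample $X_1, \ldots, X_n$ is a step function defined by
$$F_n(x) = \frac{\text{number of elements in the sample} \le x}{n} = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x),$$
where $I(A)$ is the indicator of event $A$.
For fixed $x$, $I(X_i \le x)$ is a Bernoulli random variable with parameter $p = F(x)$, hence $nF_n(x)$ is a binomial random variable with mean $nF(x)$ and variance $nF(x)(1 - F(x))$.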
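The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `ecdf` and the example sample are assumptions made here, and only the standard library is used.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function F_n of a sample.

    F_n(x) = (number of sample points <= x) / n.
    The name `ecdf` is an illustrative choice, not a standard API.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def F_n(x):
        # bisect_right returns how many of the sorted values are <= x.
        return bisect_right(xs, x) / n

    return F_n


# Hypothetical sample with a repeated value, to show a jump of size 2/n.
sample = [3.0, 1.0, 2.0, 2.0]
F4 = ecdf(sample)
values = [F4(x) for x in (0.5, 1.0, 2.0, 2.5, 3.0)]
print(values)
```

Sorting once and counting with a binary search makes each evaluation cost O(log n). The returned function is right-continuous, as a distribution function should be.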
Asymptotic properties
- By the strong law of large numbers, $F_n(x) \to F(x)$ almost surely for fixed $x$.
- In other words, $F_n(x)$ is a consistent estimator of the cumulative distribution function $F(x)$. It is also unbiased, since $E[F_n(x)] = F(x)$.
- By the central limit theorem, $\sqrt{n}\,(F_n(x) - F(x))$ converges in distribution to a normal distribution $N(0, F(x)(1 - F(x)))$ for fixed $x$.
- The Berry–Esseen theorem provides the rate of this convergence.
- By the Glivenko–Cantelli theorem, $F_n(x) \to F(x)$ uniformly over $x$, that is, $\|F_n - F\|_\infty = \sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \to 0$ with probability 1.
- The Dvoretzky–Kiefer–Wolfowitz inequality provides the rate of this convergence.
- Kolmogorov showed that $\sqrt{n}\,\sup_{x \in \mathbb{R}} |F_n(x) - F(x)|$ converges in distribution to the Kolmogorov distribution, provided that $F(x)$ is continuous.
- The Kolmogorov-Smirnov test for goodness-of-fit is based on this fact.
- $\sqrt{n}\,(F_n - F)$, as a process indexed by $x$, converges weakly in $\ell^\infty(\mathbb{R})$ to a Brownian bridge $B(F(x))$.
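The uniform convergence statements in the list above can be explored numerically. The sketch below makes several assumptions not in the article: $F$ is taken to be the Uniform(0, 1) cdf, the function names `sup_distance_uniform` and `dkw_epsilon` are hypothetical, and the Dvoretzky–Kiefer–Wolfowitz inequality is used in the form $P(\sup_x |F_n(x) - F(x)| > \varepsilon) \le 2e^{-2n\varepsilon^2}$.

```python
import math
import random


def sup_distance_uniform(sample):
    """Compute sup_x |F_n(x) - F(x)| for F the Uniform(0, 1) cdf, F(x) = x.

    F_n jumps from (i - 1)/n to i/n at the i-th order statistic, so the
    supremum is reached at a jump point: check both sides of each jump.
    """
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1)
    )


def dkw_epsilon(n, alpha):
    """Solve 2 * exp(-2 * n * eps**2) = alpha for eps."""
    return math.sqrt(math.log(2 / alpha) / (2 * n))


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
distances = {}
for n in (100, 1_000, 10_000):
    sample = [rng.random() for _ in range(n)]
    distances[n] = sup_distance_uniform(sample)
    # Columns: n, sup distance, sqrt(n) * sup distance, 95% DKW half-width.
    print(n, distances[n], math.sqrt(n) * distances[n], dkw_epsilon(n, 0.05))
```

Under these assumptions the sup distance should shrink as $n$ grows, while the rescaled value $\sqrt{n}\,\sup_x |F_n(x) - F(x)|$ should stay of order one, in line with Kolmogorov's result. The last column is the half-width of a uniform 95% confidence band around $F_n$ implied by the inequality.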
See also
- Càdlàg functions
- Empirical probability
- Empirical process



