Rand index
From Wikipedia, the free encyclopedia
In statistics, the Rand index or Rand measure is a measure of the similarity between two clusterings of the same data.
Definition
Given a set of n elements $S = \{o_1, \ldots, o_n\}$ and two partitions of S to compare, $X = \{X_1, \ldots, X_r\}$ and $Y = \{Y_1, \ldots, Y_s\}$, we define the following:
- a, the number of pairs of elements in S that are in the same set in X and in the same set in Y
- b, the number of pairs of elements in S that are in different sets in X and in different sets in Y
- c, the number of pairs of elements in S that are in the same set in X and in different sets in Y
- d, the number of pairs of elements in S that are in different sets in X and in the same set in Y
The Rand index, R, is:
$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$
Intuitively, one can think of a + b as the number of agreements between X and Y and c + d as the number of disagreements between X and Y.
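As an illustration, the four pair counts and the index can be computed by enumerating every pair of elements. The sketch below assumes the standard form R = (a + b) / (a + b + c + d) and represents each partition as a list of cluster labels, one per element. The function name rand_index and the sample labelings are hypothetical choices made for this example, not taken from the references.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Rand index of two partitions given as per-element cluster labels.

    labels_x[i] and labels_y[i] are the clusters that element i belongs to
    in partitions X and Y (a representation assumed for this sketch).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both labelings must cover the same elements")
    if len(labels_x) < 2:
        raise ValueError("need at least two elements to form a pair")

    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1  # same set in X, same set in Y
        elif not same_x and not same_y:
            b += 1  # different sets in X, different sets in Y
        elif same_x:
            c += 1  # same set in X, different sets in Y
        else:
            d += 1  # different sets in X, same set in Y

    # a + b + c + d equals the total number of pairs, n * (n - 1) / 2.
    return (a + b) / (a + b + c + d)


# Hypothetical example: five elements clustered two different ways.
x = [0, 0, 0, 1, 1]
y = [0, 0, 1, 1, 1]
print(rand_index(x, y))  # a = 2, b = 4, c = 2, d = 2, so R = 6/10 = 0.6
```

Enumerating pairs takes time quadratic in n, which is fine for small examples. Only equality of labels within one partition matters, so relabelling the clusters does not change the result.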
The Rand index takes values between 0 and 1: 0 indicates that the two clusterings do not agree on any pair of elements, and 1 indicates that the two clusterings are exactly the same.
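A small worked example may make the two extremes concrete. It assumes the standard form R = (a + b) / (a + b + c + d). The three-element set and the partitions are chosen purely for illustration.

```latex
% Illustrative set and partitions (not taken from the references).
S = \{1, 2, 3\}, \qquad
X = \{\{1, 2, 3\}\}, \qquad
Y = \{\{1\}, \{2\}, \{3\}\}

% All three pairs (1,2), (1,3), (2,3) are together in X and apart in Y:
a = 0, \quad b = 0, \quad c = 3, \quad d = 0
\quad\Longrightarrow\quad
R = \frac{a + b}{a + b + c + d} = \frac{0}{3} = 0

% Comparing X with itself, every pair is an agreement (c = d = 0):
a = 3, \quad b = 0
\quad\Longrightarrow\quad
R = \frac{3}{3} = 1
```

With three or more elements, the value 0 arises only in this situation, where one partition has a single cluster and the other has only singleton clusters.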


