Rand index

From Wikipedia, the free encyclopedia

In statistics, the Rand index or Rand measure is a measure of the similarity between two data clusterings.

Definition

Given a set of $n$ elements $S = \{O_1, \ldots, O_n\}$ and two partitions of $S$ to compare, $X = \{x_1, \ldots, x_r\}$ and $Y = \{y_1, \ldots, y_s\}$, we define the following:

$a$, the number of pairs of elements in $S$ that are in the same set in $X$ and in the same set in $Y$
$b$, the number of pairs of elements in $S$ that are in different sets in $X$ and in different sets in $Y$
$c$, the number of pairs of elements in $S$ that are in the same set in $X$ and in different sets in $Y$
$d$, the number of pairs of elements in $S$ that are in different sets in $X$ and in the same set in $Y$

The Rand index, $R$, is:

$R = \frac{a+b}{a+b+c+d} = \frac{a+b}{{n \choose 2}}$

Intuitively, one can think of $a + b$ as the number of agreements between $X$ and $Y$ and $c + d$ as the number of disagreements between $X$ and $Y$.
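
As a small illustration (the partitions here are chosen only for the example), take $n = 4$, $X = \{\{O_1, O_2\}, \{O_3, O_4\}\}$ and $Y = \{\{O_1, O_2, O_3\}, \{O_4\}\}$. Of the ${4 \choose 2} = 6$ pairs, $(O_1, O_2)$ is together in both partitions, so $a = 1$; $(O_1, O_4)$ and $(O_2, O_4)$ are separated in both, so $b = 2$; $(O_3, O_4)$ is together in $X$ only, so $c = 1$; and $(O_1, O_3)$ and $(O_2, O_3)$ are together in $Y$ only, so $d = 2$. There are therefore 3 agreements and 3 disagreements, giving:

$R = \frac{1+2}{1+2+1+2} = \frac{3}{6} = 0.5$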

The Rand index takes a value between 0 and 1, with 0 indicating that the two clusterings do not agree on any pair of elements and 1 indicating that the two clusterings are exactly the same.
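
The definition can be sketched as a brute-force pair count in Python. This is only an illustrative sketch, not code from the cited papers: the function name `rand_index` and the choice to represent each partition as a list of cluster labels (one label per element, in the same element order) are assumptions made here.

```python
from itertools import combinations


def rand_index(labels_x, labels_y):
    """Return (a, b, c, d, R) for two clusterings of the same elements.

    labels_x[i] and labels_y[i] are the cluster labels assigned to
    element i by partitions X and Y (a hypothetical representation).
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both clusterings must label the same elements")

    a = b = c = d = 0
    # Visit every unordered pair of elements exactly once.
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1      # together in X and together in Y
        elif not same_x and not same_y:
            b += 1      # separated in X and separated in Y
        elif same_x:
            c += 1      # together in X, separated in Y
        else:
            d += 1      # separated in X, together in Y

    total = a + b + c + d  # equals n choose 2
    if total == 0:
        raise ValueError("need at least two elements")
    return a, b, c, d, (a + b) / total


# Partial agreement: 3 of the 6 pairs agree.
print(rand_index([0, 0, 1, 1], [0, 0, 0, 1]))   # (1, 2, 1, 2, 0.5)
# Identical clusterings agree on every pair.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 1]))   # (2, 4, 0, 0, 1.0)
# One big cluster versus all singletons: no pair agrees.
print(rand_index([0, 0, 0, 0], [0, 1, 2, 3]))   # (0, 0, 6, 0, 0.0)
```

The loop examines all ${n \choose 2}$ pairs, so it runs in quadratic time; that is acceptable for a demonstration, although large data sets would call for a contingency-table approach instead.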

References

W. M. Rand (1971). "Objective criteria for the evaluation of clustering methods". Journal of the American Statistical Association 66: 846–850. doi:10.2307/2284239.
K. Y. Yeung & W. L. Ruzzo (2001). "Principal component analysis for clustering gene expression data". Bioinformatics 17 (9): 763–774. doi:10.1093/bioinformatics/17.9.763.

