Cosine similarity
From Wikipedia, the free encyclopedia
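Cosine similarity is a measure of similarity between two n-dimensional vectors, obtained from the angle between them; it is often used to compare documents in text mining. Given two vectors of attributes, A and B, separated by an angle θ, the cosine similarity, cos θ, is expressed using the dot product and the magnitudes as
$$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|}, \qquad \theta = \arccos \frac{A \cdot B}{\|A\| \, \|B\|}$$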
For text matching, the attribute vectors A and B are usually the tf-idf vectors of the documents.
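As an illustration of the text-matching case, the following minimal Python sketch builds tf-idf vectors for a tiny corpus and compares them. The weighting used here (raw term count times log(N / df)) is only one of several common tf-idf variants and is an assumption of this example; the function names `tfidf_vectors` and `cosine_similarity` and the sample sentences are likewise hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one tf-idf vector per document over the shared vocabulary.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vectors


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]
vecs = tfidf_vectors(docs)
sim_01 = cosine_similarity(vecs[0], vecs[1])  # documents sharing several terms
sim_02 = cosine_similarity(vecs[0], vecs[2])  # documents sharing no terms
```

In this sketch the first two documents share terms and so receive a positive similarity, while the first and third share none and their vectors are orthogonal. Because tf-idf weights are non-negative, similarities computed this way cannot be negative.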
Since the angle θ lies in the range [0, π], a value of π (cosine of −1) means the vectors point in exactly opposite directions, π / 2 (cosine of 0) means they are orthogonal, or independent, and 0 (cosine of 1) means they point in exactly the same direction, with in-between values indicating intermediate similarity or dissimilarity.
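The three boundary cases can be checked numerically. The sketch below is a minimal illustration in plain Python; the helper names `cosine_similarity` and `angle_between` and the sample vectors are assumptions of this example, not part of any particular library.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def angle_between(a, b):
    """Angle in radians between a and b, in the range [0, pi]."""
    # Clamp to [-1, 1] so floating-point rounding cannot push acos out of range.
    c = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.acos(c)


same = angle_between([1.0, 2.0], [2.0, 4.0])        # same direction
orthogonal = angle_between([1.0, 0.0], [0.0, 3.0])  # independent
opposite = angle_between([1.0, 2.0], [-1.0, -2.0])  # opposite direction
```

Up to floating-point rounding, `same` is 0, `orthogonal` is π / 2, and `opposite` is π. Note that the first pair differs in magnitude but not in direction, so the measure treats the two vectors as identical.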
This cosine similarity metric may be extended so that it yields the Jaccard coefficient in the case of binary attributes. This extension is the Tanimoto coefficient, T(A,B), given by
$$T(A,B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B}$$
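The relationship to the Jaccard coefficient can be checked on a small binary example. The sketch below assumes the standard dot-product form of the Tanimoto coefficient (stated in the docstring); the function names `tanimoto` and `jaccard` and the sample vectors are hypothetical.

```python
def tanimoto(a, b):
    """Assumed form: T(A, B) = A.B / (|A|^2 + |B|^2 - A.B)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    if denom == 0:
        raise ValueError("Tanimoto coefficient is undefined for two zero vectors")
    return dot / denom


def jaccard(a, b):
    """Jaccard coefficient of the sets of positions holding a nonzero value."""
    set_a = {i for i, x in enumerate(a) if x}
    set_b = {i for i, y in enumerate(b) if y}
    union = set_a | set_b
    if not union:
        raise ValueError("Jaccard coefficient is undefined for two empty sets")
    return len(set_a & set_b) / len(union)


# Binary attribute vectors: 1 means the attribute is present.
a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 1]

t = tanimoto(a, b)  # 2 / (3 + 3 - 2)
j = jaccard(a, b)   # |{0, 3}| / |{0, 1, 3, 4}|
```

For binary vectors the dot product counts the shared attributes and each squared magnitude counts the attributes of one vector, so the denominator equals the size of the union and both functions return 0.5 for this pair.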
See also
- Sørensen's quotient of similarity
- Mountford's index of similarity
- Hamming distance
- Correlation
- Dice's coefficient
- Jaccard index



