Data Mining - Cosine Similarity (Measure of Angle)
About
The cosine similarity is the cosine of the angle between two vectors, so it depends on their direction and not on their magnitude. It is computed by dividing the dot product of the two vectors by the product of their magnitudes.
Formula
By equating the algebraic and geometric definitions of the dot product (see Linear Algebra - (Dot|Scalar|Inner) Product of two vectors), we get the cosine similarity, which is the normalized dot product of two vectors: <MATH> similarity = \cos \theta = \frac{a \cdot b}{\|a\| \, \|b\|} = \frac{ \sum_i a_i b_i }{ \sqrt{\sum_i a_i^2} \sqrt{\sum_i b_i^2} } </MATH>
- If the angle is small (the vectors share many tokens), the cosine is large, close to 1.
- If the angle is large (the vectors share few tokens), the cosine is small, close to 0.
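The formula and the token interpretation above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `cosine_similarity` and `token_cosine` are hypothetical, and representing a text as lower-cased, whitespace-split token counts is an assumption made only for the example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Return (a . b) / (||a|| ||b||) for two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when one of the vectors has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def token_cosine(text_a, text_b):
    """Cosine similarity of two texts built from token counts.

    Assumption for illustration: tokens are lower-cased words split on whitespace.
    """
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Shared vocabulary so that both count vectors use the same coordinates.
    vocab = sorted(set(counts_a) | set(counts_b))
    a = [counts_a[token] for token in vocab]
    b = [counts_b[token] for token in vocab]
    return cosine_similarity(a, b)
```

Under these assumptions, `token_cosine("the cat sat", "the cat ran")` gives about 0.67 because the texts share two of their three tokens, while `token_cosine("red apple", "blue sky")` gives 0 because they share none. Scaling a vector leaves the result unchanged: `cosine_similarity([1, 2, 3], [2, 4, 6])` is 1 up to floating-point rounding.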