Data Mining - Cosine Similarity (Measure of Angle)

About

The cosine similarity measures the angle between two vectors, independently of their magnitudes. It is the dot product of the two vectors divided by the product of their magnitudes.

Formula

By equating the algebraic and geometric definitions of the dot product (see Linear Algebra - (Dot|Scalar|Inner) Product of two vectors), we get the cosine similarity, which is the normalized dot product of two vectors: <MATH> similarity = \cos \theta = \frac{a \cdot b}{\|a\| \|b\|} = \frac{ \sum_i a_i b_i }{ \sqrt{\sum_i a_i^2} \sqrt{\sum_i b_i^2} } </MATH>
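
The formula can be sketched directly in plain Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` and the choice to raise an error for zero vectors (where the angle is undefined) are assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: the dot product, sum of a_i * b_i
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the two Euclidean norms
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: treat the zero vector as an error
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For example, `cosine_similarity([3, 4], [4, 3])` evaluates 24 / (5 * 5), and two perpendicular vectors such as `[1, 0]` and `[0, 1]` give 0.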

  • If the angle is small (for text vectors, the two documents share many tokens), the cosine is large (close to 1).
  • If the angle is large (the two documents share few tokens), the cosine is small (close to 0 for non-negative vectors such as token counts).
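
The token interpretation can be illustrated with a small sketch that turns each text into a vector of token counts. The helper name `token_cosine`, the whitespace tokenizer, the lowercasing, and returning 0.0 for an empty text are all assumptions made for this example; real pipelines usually use a proper tokenizer and often a weighting such as TF-IDF.

```python
import math
from collections import Counter

def token_cosine(text_a, text_b):
    """Cosine similarity between the token-count vectors of two texts."""
    # Assumption: naive tokenization by lowercasing and splitting on whitespace
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    # Only tokens present in both texts contribute to the dot product
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption for this sketch: an empty text is similar to nothing
        return 0.0
    return dot / (norm_a * norm_b)

# Many shared tokens: small angle, cosine of about 0.875 (7 / 8)
print(token_cosine("the cat sat on the mat", "the cat sat on the rug"))
# No shared tokens: the vectors are perpendicular, cosine of 0
print(token_cosine("the cat sat on the mat", "dogs bark loudly"))
```

Because token counts are never negative, the result always falls between 0 and 1 here.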

Comparison

Text

Documentation / Reference

