Data Mining - Cosine Similarity (Measure of Angle)

1 - About

Cosine similarity measures the angle between two vectors independently of their magnitudes. It is the dot product of the two vectors divided by the product of their magnitudes (lengths).
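The definition above translates directly into code. The following is a minimal sketch in plain Python. The function name `cosine_similarity` is illustrative, not a library API. This sketch raises an error for zero-length vectors because the angle is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of a and b divided by the product of their magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of the products of corresponding components.
    dot = sum(x * y for x, y in zip(a, b))
    # Magnitude (Euclidean length) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 0], [0, 1]))               # orthogonal: 0.0
print(round(cosine_similarity([1, 2], [2, 4]), 6))     # same direction: 1.0
print(round(cosine_similarity([1, 2], [-1, -2]), 6))   # opposite direction: -1.0
```

Scaling a vector changes its magnitude but not its direction. As a result, `[1, 2]` and `[2, 4]` are maximally similar even though their lengths differ.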

3 - Formula

The algebraic definition of the dot product is the sum of the products of corresponding components. The geometric definition is the product of the two magnitudes and the cosine of the angle between the vectors. Equating the two gives the cosine similarity, a normalized dot product of two vectors: <MATH> similarity = \cos \theta = \frac{a \cdot b}{||a|| \, ||b||} = \frac{ \sum_i a_i b_i }{ \sqrt{\sum_i a_i^2} \sqrt{\sum_i b_i^2} } </MATH>
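As a worked example, take two arbitrarily chosen vectors a = (1, 2, 2) and b = (2, 0, 1). Apply the formula term by term:

<MATH> \cos \theta = \frac{1 \cdot 2 + 2 \cdot 0 + 2 \cdot 1}{\sqrt{1^2 + 2^2 + 2^2} \sqrt{2^2 + 0^2 + 1^2}} = \frac{4}{3 \sqrt{5}} \approx 0.596 </MATH>

This corresponds to an angle of about 53.4 degrees between the two vectors.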

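  • If the angle is small (for text vectors, the documents share many tokens), the cosine is large (close to 1).
  • If the angle is large (the documents have few tokens in common), the cosine is small. For term-count vectors it reaches 0 when the documents share no tokens.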
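The token behaviour described above can be illustrated with bag-of-words vectors. This is a sketch under stated assumptions:

- Each document is represented by its token counts.
- Tokenization is a naive lowercase-and-split-on-whitespace, chosen only for illustration.
- The helper names `term_vector` and `cosine_similarity` are hypothetical.

Counts are never negative, so the result always lies between 0 and 1.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    # Assumption: naive tokenization, lowercase and split on whitespace.
    return Counter(text.lower().split())


def cosine_similarity(u: Counter, v: Counter) -> float:
    # Only tokens present in both documents contribute to the dot product.
    dot = sum(count * v[token] for token, count in u.items() if token in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: an empty document matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


d1 = term_vector("the cat sat on the mat")
d2 = term_vector("the cat sat on the rug")
d3 = term_vector("a dog barked loudly")

print(round(cosine_similarity(d1, d2), 3))  # many shared tokens: 0.875
print(cosine_similarity(d1, d3))            # no shared tokens: 0.0
```

Here `d1` and `d2` differ by a single token, which gives a small angle and a cosine near 1. By contrast, `d1` and `d3` have no token in common, so their vectors are orthogonal.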

4 - Comparison

4.1 - Text

5 - Documentation / Reference

