
Natural Language - Document (Cosine) Similarity

About

This page applies cosine similarity to measuring how similar two documents are.

Implementation

Each document becomes a vector in a high-dimensional space, with one dimension per distinct token. To compare two documents, we compute the cosine of the angle between their vectors.

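Both the dot product and the norms can be computed directly from the documents' bag-of-words representations: the dot product sums the products of the counts of tokens the two documents share, and each norm is the square root of the sum of a document's squared counts.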
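
A minimal sketch of that computation, assuming lowercased whitespace tokenization and raw token counts. The names bag_of_words and cosine_similarity are illustrative and not taken from any particular library, and returning 0.0 for an empty document is a convention chosen here:

```python
import math
from collections import Counter


def bag_of_words(text):
    """Map a document to token counts (assumes lowercase, whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words count vectors."""
    # Dot product: only tokens present in both documents contribute.
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    # Norm: square root of the sum of squared counts.
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_a * norm_b)


doc1 = bag_of_words("the cat sat on the mat")
doc2 = bag_of_words("the cat lay on the rug")
similarity = cosine_similarity(doc1, doc2)
print(similarity)  # approximately 0.75
```

Here the shared tokens "the", "cat", and "on" contribute 4 + 1 + 1 = 6 to the dot product, and each document's squared norm is 8, so the similarity is about 6 / 8.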

The geometric interpretation is more intuitive. When the angle between two document vectors is small, the vectors point in roughly the same direction because the documents share many tokens.
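
To make the geometric picture concrete, the sketch below converts the cosine into an angle in degrees. It assumes non-empty texts, whitespace tokenization, and raw counts, and angle_degrees is a hypothetical helper name. Because counts are never negative, the angle always falls between 0 and 90 degrees:

```python
import math
from collections import Counter


def angle_degrees(text_a, text_b):
    """Angle in degrees between the token-count vectors of two non-empty texts."""
    a, b = Counter(text_a.split()), Counter(text_b.split())
    # Only tokens that appear in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


near = angle_degrees("red green blue", "red green yellow")    # two of three tokens shared
far = angle_degrees("red green blue", "cyan magenta yellow")  # no tokens shared
print(near, far)
```

The pair that shares two of its three tokens is separated by roughly 48 degrees, while the pair with no tokens in common is orthogonal at 90 degrees.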