A term-document matrix is an important representation for text analytics.
Each row of the matrix is a document vector, with one column for every term in the entire corpus.
Most documents contain only a small fraction of the terms in the corpus, so this matrix is sparse. The value in each cell of the matrix is the term frequency. In practice this value is often a weighted term frequency, typically tf-idf (term frequency-inverse document frequency).
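As a rough illustration of the representation described above, the sketch below stores only the nonzero cells of the matrix as nested dictionaries and then applies one common tf-idf weighting. The toy triples, the function names `build_matrix` and `tfidf`, and the choice of `log(N / df)` as the idf term are all assumptions made for this example; other tf-idf variants exist.

```python
from collections import defaultdict
import math

# Hypothetical toy corpus as (docid, term, count) triples.
triples = [
    ("doc1", "washington", 2),
    ("doc1", "taxes", 1),
    ("doc2", "treasury", 3),
    ("doc2", "taxes", 1),
    ("doc3", "weather", 4),
]


def build_matrix(triples):
    """Return {docid: {term: count}}, storing only the nonzero cells."""
    matrix = defaultdict(dict)
    for docid, term, count in triples:
        matrix[docid][term] = matrix[docid].get(term, 0) + count
    return dict(matrix)


def tfidf(matrix):
    """Weight each count by log(N / df), one common tf-idf variant (assumed here)."""
    n_docs = len(matrix)
    # Document frequency: the number of documents that contain each term.
    df = defaultdict(int)
    for row in matrix.values():
        for term in row:
            df[term] += 1
    return {
        docid: {term: count * math.log(n_docs / df[term]) for term, count in row.items()}
        for docid, row in matrix.items()
    }


matrix = build_matrix(triples)
weighted = tfidf(matrix)
```

Terms that a document does not contain are simply absent from its row, which is what keeps the sparse representation small.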
Primitive search capabilities
Add a fictitious document that contains the search words, then compute the similarity scores for this document only:
SELECT * FROM frequency
UNION SELECT 'search' AS docid, 'washington' AS term, 1 AS count
UNION SELECT 'search' AS docid, 'taxes' AS term, 1 AS count
UNION SELECT 'search' AS docid, 'treasury' AS term, 1 AS count;
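One way to try this query end to end is the minimal sketch below, which loads a toy `frequency` table into an in-memory SQLite database, wraps the query in a view, and scores every real document against the 'search' document. The sample rows, the view name `corpus`, and the use of a plain dot product of term counts as the similarity measure are assumptions for illustration; the text above does not fix a particular similarity function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE frequency (docid TEXT, term TEXT, count INTEGER)")

# Hypothetical sample rows; a real frequency table would be far larger.
conn.executemany(
    "INSERT INTO frequency VALUES (?, ?, ?)",
    [
        ("doc1", "washington", 2),
        ("doc1", "taxes", 1),
        ("doc2", "treasury", 3),
        ("doc2", "taxes", 1),
        ("doc3", "weather", 4),
    ],
)

# The corpus plus the query document holding the search words.
conn.execute("""
    CREATE VIEW corpus AS
    SELECT * FROM frequency
    UNION SELECT 'search' AS docid, 'washington' AS term, 1 AS count
    UNION SELECT 'search' AS docid, 'taxes' AS term, 1 AS count
    UNION SELECT 'search' AS docid, 'treasury' AS term, 1 AS count
""")

# Dot product of the query row with every other document row.
# Documents that share no term with the query do not appear in the result.
scores = conn.execute("""
    SELECT d.docid, SUM(q.count * d.count) AS score
    FROM corpus AS q
    JOIN corpus AS d ON q.term = d.term
    WHERE q.docid = 'search' AND d.docid <> 'search'
    GROUP BY d.docid
    ORDER BY score DESC, d.docid
""").fetchall()

for docid, score in scores:
    print(docid, score)
```

Restricting the left side of the join to the 'search' document computes a single row of the similarity matrix instead of all document pairs, which is why this works as a primitive search.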