Table of Contents

About

To stem words is to remove word endings like -s and -ing.

Stemming is replacing words with their stems.

Stemming is the process of reducing search tokens to their root (or “stem”)

A search for different type of a word will still yield results.

Example

For instance

  • “bikes” is replaced with “bike”; now query “bike” can find both documents containing “bike” and those containing “bikes”.
  • “search”, “searching” and “searched” can all be reduced to the stem “search”.

Algorithm

SnowBall is a good stemming algorithm.

  • See the attribute filter StringsToWordVector in Weka

Weka Stemmer Stringtowordvector

Documentation / Reference