NLP - (Word Stem|Stemming)


To stem words is to remove word endings like -s and -ing.

Stemming is replacing words with their stems.

Stemming is the process of reducing search tokens to their root (or “stem”)

A search for different type of a word will still yield results.


For instance

  • “bikes” is replaced with “bike”; now query “bike” can find both documents containing “bike” and those containing “bikes”.
  • “search”, “searching” and “searched” can all be reduced to the stem “search”.


SnowBall is a good stemming algorithm.

  • See the attribute filter StringsToWordVector in Weka

Documentation / Reference

Powered by ComboStrap