To stem words is to remove word endings like -s and -ing.
Stemming is replacing words with their stems.
Stemming is the process of reducing search tokens to their root (or “stem”)
A search for different type of a word will still yield results.
- “bikes” is replaced with “bike”; now query “bike” can find both documents containing “bike” and those containing “bikes”.
- “search”, “searching” and “searched” can all be reduced to the stem “search”.
SnowBall is a good stemming algorithm.
- See the attribute filter StringsToWordVector in Weka