Natural Language - Token (Word|Term)

Text Mining


In natural language processing, tokens can be things like:

  • words,
  • numbers,
  • acronyms,
  • word roots,
  • or fixed-length character strings.

A token is the result of parsing (tokenizing) a document down to its atomic elements, generally those of a language.
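As a rough illustration of tokenization, the sketch below splits a sentence into word, number, and acronym tokens with a regular expression, and also produces fixed-length character strings (character n-grams). The pattern and the function names `tokenize` and `char_ngrams` are assumptions made for this example, not a standard API; real tokenizers apply far more elaborate word-break rules.

```python
import re

# Hypothetical pattern, tried left to right:
#   1. dotted acronyms such as U.S.A.
#   2. numbers such as 50 or 3.14
#   3. plain words
TOKEN_PATTERN = re.compile(r"(?:[A-Za-z]\.){2,}|\d+(?:\.\d+)?|\w+")


def tokenize(text):
    """Split text into word, number, and acronym tokens."""
    return TOKEN_PATTERN.findall(text)


def char_ngrams(text, n=3):
    """Return the fixed-length character strings (n-grams) of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


tokens = tokenize("The U.S.A. has 50 states.")
print(tokens)                   # ['The', 'U.S.A.', 'has', '50', 'states']
print(char_ngrams("token", 3))  # ['tok', 'oke', 'ken']
```

Word roots are not covered here; they would need an additional stemming step on top of the tokens.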

The tokens are then searchable.
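One common way to make tokens searchable is to map each token to the documents that contain it. The sketch below assumes a naive lowercase, whitespace-based tokenization and a hypothetical helper named `build_index`; it is only meant to show the idea, not how any particular search engine implements it.

```python
from collections import defaultdict


def build_index(documents):
    """Map each token to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Naive tokenization (assumption): lowercase, split on whitespace.
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


docs = {
    1: "Tokens are searchable",
    2: "A token is an atomic element",
}
index = build_index(docs)

print(sorted(index["token"]))   # [2]
print(sorted(index["tokens"]))  # [1]
```

Note that `token` and `tokens` are distinct entries here; reducing both to the same word root would require stemming.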




See Natural Language Processing - (Tokenization|Parser|Text Segmentation|Word Break rules|Text Analysis)




NLP - (Word Stem|Stemming)


Discover More
English - Word

Every word has a form (what it looks like) and a function (what it does in the sentence). See its counterpart Where are you going? Form: Adverb Function: Adverb (it answers the question...
Lexical Analysis - (Token|Lexical unit|Lexeme|Symbol|Word)

A token is a symbol of the vocabulary of the language. Each token is a single atomic unit of the language. The token syntax is typically a regular language, so a finite state automaton constructed from...

Lucene is a text search engine library. The following applications are built on Lucene: Solr, Elasticsearch, New Relic Logs, ... The text data model of Lucene is...
NLP - Forward index

In text search, a forward index is an index that maps documents in a data set to the tokens they contain. This is also called the natural relationship. Its counterpart is the inverted index.
NLP - Stop Words

Stop words are common words (tokens) that do not contribute much to the content or meaning of a document. Stop words add noise, have less value, and need to be ignored/excluded in: bag-of-words...
NLP - Synonym Expansion

Adding in synonyms at the same token position as the current word can mean better matching when users search with words in the synonym set.
Natural Language - Crawler

A crawler is an application (bot) that reads documents (such as web pages, Word files, ...) and parses them to extract meaningful information. Software for scanning large bodies of text such as collections...
Natural Language - Document

This page defines a document in natural language processing. In natural language processing, a document is represented by the bag-of-words model, i.e. a document has one or more term...
Natural Language Processing - (Tokenization|Parser|Text Segmentation|Word Break rules|Text Analysis)

Tokenization is the process of breaking input text into small indexing elements, called tokens. Parsing and tokenization are often called Text Analysis or Analysis in NLP. The tokens (or terms) are used either:...
Search Engine - Search Index

A search index is an index of tokens (words) to web pages. A search engine queries it in order to return results. Its structure is an inverted index, meaning that it maps words to URLs (pages). The search index is...
