A token is a symbol of the vocabulary of a language.
Each token is a single atomic unit of the language.
The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.
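As an illustration of that equivalence, here is a minimal sketch that recognizes one kind of token, an identifier, in two ways: with a regular expression and with a hand-written two-state automaton. The identifier syntax (a letter or underscore followed by letters, digits, or underscores) and the names `IDENTIFIER_RE` and `is_identifier_dfa` are assumptions made for this example, not part of any particular language definition.

```python
import re

# Assumed identifier syntax: a letter or underscore, then letters, digits, or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier_dfa(text: str) -> bool:
    """Hand-written two-state automaton accepting the same strings as IDENTIFIER_RE."""
    state = "start"
    for ch in text:
        if state == "start":
            # First character: only an ASCII letter or underscore is allowed.
            if ch.isascii() and (ch.isalpha() or ch == "_"):
                state = "in_identifier"
            else:
                return False
        else:
            # Following characters: ASCII letters, digits, or underscores.
            if not (ch.isascii() and (ch.isalnum() or ch == "_")):
                return False
    # "in_identifier" is the only accepting state, so the empty string is rejected.
    return state == "in_identifier"


def is_identifier_regex(text: str) -> bool:
    """Same check, delegated to the regular expression engine."""
    return IDENTIFIER_RE.fullmatch(text) is not None
```

Both functions should agree on any input, for example accepting `sum` and rejecting `3sum`. A lexer generator typically performs the regex-to-automaton construction automatically; the automaton is written by hand here only to make the states visible.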
The process of finding and categorizing tokens in an input stream is called “tokenizing” and is performed by a lexer (lexical analyzer).
A token is the result of breaking a document down into its atomic elements, generally those of a language.
See also Natural Language - Token (Word|Term)
A token might be, for instance, an identifier, an operator, a literal, or a statement terminator.
Example:
Consider the following programming expression:
sum = 3 + 2;
Each row of the following table is one token of that expression:
Lexeme | Lexeme type |
---|---|
sum | Identifier |
= | Assignment operator |
3 | Integer literal |
+ | Addition operator |
2 | Integer literal |
; | End of statement |
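The table above can be reproduced with a small lexer. The following is a minimal sketch, assuming only the five lexeme types that appear in the table plus whitespace, which is skipped; the names `TOKEN_PATTERNS` and `tokenize` are hypothetical and chosen for this example.

```python
import re

# Assumed token classes, taken from the table; None marks whitespace, which is skipped.
TOKEN_PATTERNS = [
    ("Integer literal", re.compile(r"[0-9]+")),
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Assignment operator", re.compile(r"=")),
    ("Addition operator", re.compile(r"\+")),
    ("End of statement", re.compile(r";")),
    (None, re.compile(r"\s+")),
]


def tokenize(text):
    """Return a list of (lexeme, lexeme type) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        for lexeme_type, pattern in TOKEN_PATTERNS:
            match = pattern.match(text, pos)  # try to match at the current position
            if match:
                if lexeme_type is not None:
                    tokens.append((match.group(), lexeme_type))
                pos = match.end()
                break
        else:
            # No pattern matched: the character is not in the assumed vocabulary.
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
    return tokens


for lexeme, lexeme_type in tokenize("sum = 3 + 2;"):
    print(f"{lexeme} | {lexeme_type}")
```

Each pattern is a regular expression, so this sketch is one way of applying the regular-language view of token syntax. A character outside the assumed vocabulary raises a `ValueError` rather than being silently dropped.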
A token that has a name is called an identifier.
A symbol table is a table of all tokens that have a name (i.e., identifiers).
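To make the idea concrete, here is a minimal sketch of building a symbol table from a list of tokens. It assumes tokens are (lexeme, lexeme type) pairs and records only the positions where each identifier occurs; the function name `build_symbol_table` is hypothetical, and real compilers usually store more per entry (such as type and scope).

```python
def build_symbol_table(tokens):
    """Map each identifier name to the token positions where it occurs."""
    symbol_table = {}
    for index, (lexeme, lexeme_type) in enumerate(tokens):
        if lexeme_type == "Identifier":  # only named tokens enter the table
            symbol_table.setdefault(lexeme, []).append(index)
    return symbol_table


# Tokens of the example expression "sum = 3 + 2;", written out by hand.
example_tokens = [
    ("sum", "Identifier"),
    ("=", "Assignment operator"),
    ("3", "Integer literal"),
    ("+", "Addition operator"),
    ("2", "Integer literal"),
    (";", "End of statement"),
]

print(build_symbol_table(example_tokens))  # prints {'sum': [0]}
```

Literals, operators, and the statement terminator are left out because they have no name.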