Lexical Analysis - (Token|Lexical unit|Lexeme|Symbol|Word)


A token is a symbol of the vocabulary of the language.

Each token is a single atomic unit of the language.

The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.
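As an illustration of the automaton idea, here is a minimal sketch of a hand-written finite state automaton. It assumes two hypothetical token classes, equivalent to the regular expressions [A-Za-z_][A-Za-z0-9_]* (identifier) and [0-9]+ (integer). The state names, the char_class helper and the recognize function are made up for this example and do not come from any particular lexer.

```python
# Hand-written finite state automaton for two hypothetical token classes.
# The states and transitions are illustrative only.

def char_class(ch):
    """Map a character to the input class used by the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return "other"

# (current state, input class) -> next state; a missing entry means "reject".
TRANSITIONS = {
    ("start", "letter"): "identifier",
    ("start", "digit"): "integer",
    ("identifier", "letter"): "identifier",
    ("identifier", "digit"): "identifier",
    ("integer", "digit"): "integer",
}

ACCEPTING = {"identifier", "integer"}

def recognize(text):
    """Return the token class of text, or None if the automaton rejects it."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:
            return None
    return state if state in ACCEPTING else None
```

With these assumptions, recognize("sum") ends in the identifier state, while an input such as "3x" is rejected because the integer state has no transition on a letter.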

A token is:

The process of finding and categorizing tokens from an input stream is called “tokenizing” and is performed by a Lexer (Lexical analyzer).

A token represents a symbol of the vocabulary of a language.

A token is the result of breaking a document down into its atomic elements, generally those of a language.

Lexeme Type

A token might be:


Consider the following programming expression:

sum = 3 + 2;

It is tokenized as shown in the following table:

Lexeme    Lexeme type
sum       Identifier
=         Assignment operator
3         Integer literal
+         Addition operator
2         Integer literal
;         End of statement
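The tokenization above can be reproduced with a small regular-expression lexer. This is only a sketch: TOKEN_SPEC and tokenize are hypothetical names, and the specification covers just the lexeme types that appear in the example statement.

```python
import re

# Hypothetical token specification: (lexeme type, compiled pattern).
# It covers only what the example statement needs.
TOKEN_SPEC = [
    ("Identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("Integer literal", re.compile(r"[0-9]+")),
    ("Assignment operator", re.compile(r"=")),
    ("Addition operator", re.compile(r"\+")),
    ("End of statement", re.compile(r";")),
]

def tokenize(source):
    """Split source into (lexeme, lexeme type) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        # Try each pattern at the current position; the first match wins.
        for lexeme_type, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                tokens.append((match.group(), lexeme_type))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens

for lexeme, lexeme_type in tokenize("sum = 3 + 2;"):
    print(lexeme, lexeme_type)
```

Trying the patterns in order and taking the first match is a simplification. Lexer generators usually prefer the longest match, which matters once the token classes overlap (for example = and ==).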


Terminal / Non terminal


A token that has a name is called an identifier.

Symbol Table

A symbol table is a table of all tokens that have a name (i.e. identifiers).
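As a rough sketch, a symbol table can be built by keeping only the identifier tokens from a token list. The build_symbol_table function and the choice to store token positions are assumptions made for this example. Symbol tables in real compilers usually record more, such as the type and scope of each name.

```python
def build_symbol_table(tokens):
    """Map each identifier name to the positions where it occurs in the token list."""
    table = {}
    for index, (lexeme, lexeme_type) in enumerate(tokens):
        if lexeme_type == "Identifier":
            table.setdefault(lexeme, []).append(index)
    return table

# Tokens of the statement "sum = 3 + 2;" as (lexeme, lexeme type) pairs.
tokens = [
    ("sum", "Identifier"),
    ("=", "Assignment operator"),
    ("3", "Integer literal"),
    ("+", "Addition operator"),
    ("2", "Integer literal"),
    (";", "End of statement"),
]

symbols = build_symbol_table(tokens)
print(symbols)
```

Only sum ends up in the table, because it is the only token in the statement that has a name.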

