Lexical Analysis - (Token|Lexical unit|Lexeme|Symbol|Word)

1 - About

A token is a symbol of the vocabulary of the language.

Each token is a single atomic unit of the language.

The token syntax is typically a regular language, so a finite state automaton constructed from a regular expression can be used to recognize it.
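As a sketch of this idea, the same token class can be recognized either with a regular expression or with an equivalent hand-written finite state automaton. The example assumes an identifier is a letter or underscore followed by letters, digits or underscores (a common convention, not one this page fixes). The function names are illustrative only.

```python
import re
import string

# Assumed identifier rule: a letter or underscore, then letters, digits or underscores.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Character classes used by the hand-written automaton (ASCII only, like the regex).
START_CHARS = set(string.ascii_letters) | {"_"}
PART_CHARS = START_CHARS | set(string.digits)


def is_identifier_dfa(text: str) -> bool:
    """Run a two-state finite automaton over the input.

    State "start": nothing read yet; only a START_CHARS character has a transition.
    State "ident": at least one character read; this is the only accepting state.
    A character with no transition rejects the input immediately.
    """
    state = "start"
    for ch in text:
        if state == "start" and ch in START_CHARS:
            state = "ident"
        elif state == "ident" and ch in PART_CHARS:
            state = "ident"
        else:
            return False  # no transition defined: reject
    return state == "ident"


def is_identifier_regex(text: str) -> bool:
    # fullmatch requires the whole string to match, just as the automaton must consume it all
    return IDENTIFIER_RE.fullmatch(text) is not None


for candidate in ["sum", "_tmp1", "3abc", "a-b", ""]:
    print(f"{candidate!r:8} dfa={is_identifier_dfa(candidate)} regex={is_identifier_regex(candidate)}")
```

Both functions accept exactly the same strings: the regular expression is a compact description, and the automaton is what a lexer generator would typically build from it.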

A token is:

The process of finding and categorizing tokens in an input stream is called “tokenizing” and is performed by a lexer (lexical analyzer).


A token is the result of breaking a document down into the atomic elements of its language.

3 - Lexeme Type

A token might be:


Consider the following statement:

sum = 3 + 2;

It is tokenized as shown in the following table:

Lexeme   Lexeme type
sum      Identifier
=        Assignment operator
3        Integer literal
+        Addition operator
2        Integer literal
;        End of statement
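The table can be reproduced with a small regex-driven lexer. This is a minimal sketch using Python's standard `re` module: it joins one named group per token type into a single alternation and reads `match.lastgroup` to categorize each lexeme. It covers only the token types in the table; the group names (`INTEGER`, `IDENT`, ...) and the `tokenize` function are illustrative, not part of any standard API.

```python
import re

# Token specification: (group name, regular expression, lexeme type from the table).
# Only the token types from the example are covered; names are illustrative.
TOKEN_SPEC = [
    ("INTEGER", r"[0-9]+", "Integer literal"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*", "Identifier"),
    ("ASSIGN", r"=", "Assignment operator"),
    ("PLUS", r"\+", "Addition operator"),
    ("END", r";", "End of statement"),
    ("SKIP", r"\s+", None),  # whitespace separates tokens but is not itself a token
    ("MISMATCH", r".", None),  # any other character is a lexical error
]

# One alternation with a named group per token type; the first alternative that matches wins.
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern, _ in TOKEN_SPEC))
LEXEME_TYPES = {name: label for name, _, label in TOKEN_SPEC}


def tokenize(source: str) -> list[tuple[str, str]]:
    """Return (lexeme, lexeme type) pairs for the given source text."""
    tokens = []
    for match in MASTER_RE.finditer(source):
        kind = match.lastgroup  # name of the group that matched
        lexeme = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {lexeme!r} at offset {match.start()}")
        tokens.append((lexeme, LEXEME_TYPES[kind]))
    return tokens


for lexeme, lexeme_type in tokenize("sum = 3 + 2;"):
    print(f"{lexeme:<6} {lexeme_type}")
```

Because the alternatives are tried left to right, a language with keywords would need its keyword patterns listed before `IDENT`, otherwise every keyword would be reported as an identifier.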

4 - Properties

4.1 - Terminal / Nonterminal

4.2 - Identifier

A token that is a name, such as sum in the example above, is called an identifier.

5 - Symbol Table

A symbol table is a table of all tokens that have a name (i.e., all identifiers).
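A minimal sketch of building a symbol table, assuming the token stream is a list of (lexeme, lexeme type) pairs like the table above. The `SymbolEntry` fields (first position and list of occurrences) and the hand-written token list are hypothetical, chosen for illustration; real compilers typically also record attributes such as the type and scope of each name.

```python
from dataclasses import dataclass, field


@dataclass
class SymbolEntry:
    """One row of the symbol table (fields chosen for illustration)."""
    name: str
    first_seen: int  # index of the token where the name first appears
    occurrences: list[int] = field(default_factory=list)  # indices of every use


def build_symbol_table(tokens: list[tuple[str, str]]) -> dict[str, SymbolEntry]:
    """Collect every identifier token into a name -> entry mapping."""
    table: dict[str, SymbolEntry] = {}
    for index, (lexeme, lexeme_type) in enumerate(tokens):
        if lexeme_type != "Identifier":
            continue  # only named tokens go into the symbol table
        entry = table.get(lexeme)
        if entry is None:
            entry = SymbolEntry(name=lexeme, first_seen=index)
            table[lexeme] = entry
        entry.occurrences.append(index)
    return table


# Hypothetical token stream for "sum = 3 + 2; total = sum + 1;", written out by hand
TOKENS = [
    ("sum", "Identifier"), ("=", "Assignment operator"), ("3", "Integer literal"),
    ("+", "Addition operator"), ("2", "Integer literal"), (";", "End of statement"),
    ("total", "Identifier"), ("=", "Assignment operator"), ("sum", "Identifier"),
    ("+", "Addition operator"), ("1", "Integer literal"), (";", "End of statement"),
]

for name, entry in build_symbol_table(TOKENS).items():
    print(name, entry.first_seen, entry.occurrences)
```

Each name appears once in the table no matter how often it is used, which is what lets later phases look up everything known about an identifier in one place.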
