Lexical Mode / Lexer Context / Lexical State

About

The lexical mode is the state that a lexer is in while it creates a token.

It's also known as:

lexer context
lexer state

It is generally the only context-related data that a lexer keeps.

It allows lexer rules to be applied conditionally.

In other words, different lexing rules apply depending on the lexer context. It's like having multiple separate sub-lexers:

one for each context,
that the lexer can switch between.
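The sub-lexer idea can be sketched in a few lines of Python. This is only an illustration, not the implementation of any particular library: the mode names (default, string), the token names and the helper functions are all made up for the example. Each mode has its own function, and a small driver calls the function of the current mode.

```python
import re

WORD = re.compile(r"[A-Za-z]+")
NUMBER = re.compile(r"\d+")
SPACE = re.compile(r"\s+")
STRING_TEXT = re.compile(r'[^"]+')

def lex_default(text, pos):
    """Sub-lexer for the default mode: returns (token, new_pos, next_mode)."""
    if text[pos] == '"':
        # An opening quote switches the lexer to the string mode.
        return ("QUOTE", '"'), pos + 1, "string"
    for name, pattern in (("SPACE", SPACE), ("NUMBER", NUMBER), ("WORD", WORD)):
        m = pattern.match(text, pos)
        if m:
            return (name, m.group()), m.end(), "default"
    raise ValueError(f"unexpected character at position {pos}")

def lex_string(text, pos):
    """Sub-lexer for the string mode: words and numbers are not recognized here."""
    if text[pos] == '"':
        # A closing quote switches back to the default mode.
        return ("QUOTE", '"'), pos + 1, "default"
    m = STRING_TEXT.match(text, pos)  # always matches: text[pos] is not a quote
    return ("TEXT", m.group()), m.end(), "string"

# One sub-lexer per mode (hypothetical mode names).
SUB_LEXERS = {"default": lex_default, "string": lex_string}

def tokenize(text):
    mode, pos, tokens = "default", 0, []
    while pos < len(text):
        # The current mode decides which sub-lexer produces the next token.
        token, pos, mode = SUB_LEXERS[mode](text, pos)
        if token[0] != "SPACE":
            tokens.append(token)
    return tokens
```

With this sketch, the same characters (x 42) yield a WORD and a NUMBER token in the default mode, but a single TEXT token between quotes.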

Example

The most obvious example is an HTML pre or HTML code block, inside of which no other syntax should be recognized by the lexer (i.e. no other lexer rules should apply).

This capability is also necessary to tokenize some languages, such as HTML ¹⁾ and XML ²⁾.
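As a rough illustration of the pre example, the sketch below uses one rule table per mode. The rule tables, the token names and the toy star syntax are assumptions made for the example and do not come from a real HTML grammar. In the pre mode, the only rule besides raw text is the one that leaves the mode.

```python
import re

# Hypothetical rule tables: (token name, pattern, mode to switch to or None).
RULES = {
    "default": [
        ("PRE_OPEN", re.compile(r"<pre>"), "pre"),
        ("STAR", re.compile(r"\*"), None),
        ("TEXT", re.compile(r"[^*<]+|<"), None),
    ],
    # Inside <pre>, no markup rule applies: only the closing tag is recognized.
    "pre": [
        ("PRE_CLOSE", re.compile(r"</pre>"), "default"),
        ("RAW", re.compile(r"[^<]+|<"), None),
    ],
}

def tokenize(text):
    mode, pos, tokens = "default", 0, []
    while pos < len(text):
        # Only the rules of the current mode are tried, in order.
        for name, pattern, next_mode in RULES[mode]:
            m = pattern.match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()
                mode = next_mode or mode
                break
        else:
            raise ValueError(f"no rule matches at position {pos} in mode {mode!r}")
    return tokens
```

For the input a*b<pre>a*b</pre>, the star is a STAR token outside of the pre block but stays part of a single RAW token inside it.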

Name in Library

Chevrotain

push_mode and pop_mode in Chevrotain

ANTLR

mode, pushMode and popMode in ANTLR

JFlex

A mode is called a state in JFlex.

You enter a state with the function yybegin. JFlex starts in the state called YYINITIAL.

yybegin(MY_STATE)

JFlex does not have the notion of push mode: you need to implement it yourself.

States are declared with one of the following directives:

%s[tate] "state identifier" [, "state identifier", ... ] for inclusive states
%x[state] "state identifier" [, "state identifier", ... ] for exclusive states
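Implementing push mode yourself usually comes down to keeping a stack of states next to the lexer. The sketch below shows the idea in Python rather than in JFlex's Java: the StateStack class and the state names STRING and INTERPOLATION are hypothetical, and the begin callback only stands in for yybegin.

```python
class StateStack:
    """Emulates push/pop on top of a lexer that only offers begin(state)."""

    def __init__(self, begin, initial="YYINITIAL"):
        self._begin = begin        # stand-in for JFlex's yybegin
        self._current = initial    # state the lexer is currently in
        self._stack = []           # states to return to, innermost last

    def push(self, state):
        # Remember the current state, then enter the new one.
        self._stack.append(self._current)
        self._current = state
        self._begin(state)

    def pop(self):
        # Return to the state that was active before the last push.
        self._current = self._stack.pop()
        self._begin(self._current)


# Usage with a fake lexer that just records the states it is asked to enter.
calls = []
states = StateStack(calls.append)
states.push("STRING")
states.push("INTERPOLATION")
states.pop()   # back to STRING
states.pop()   # back to YYINITIAL
```

In a JFlex specification, the same stack would live in the user code section, and the lexer actions would call the push and pop helpers instead of calling yybegin directly.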

³⁾ ⁴⁾ ⁵⁾

¹⁾ https://github.com/antlr/grammars-v4/blob/master/html/HTMLLexer.g4

²⁾ https://github.com/antlr/grammars-v4/blob/master/xml/XMLLexer.g4

³⁾ state in dokuwiki

⁴⁾ https://chevrotain.io/docs/features/lexer_modes.html

⁵⁾ https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexical-modes