Antlr - Lexer Rule (Token names|Lexical Rule)


About

This page is about lexer rules (also known as token names or lexical rules) in Antlr.

Lexer rules are the rules that define tokens.

They are generally written in the combined grammar file, but they may also be written in a separate lexer grammar file.

Each lexer rule either matches or does not match, so every lexer rule expression is a boolean expression.

Syntax

Token

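TokenName : pattern -> lexerCommand ;

where:

  • TokenName is the name of the token (it begins with an uppercase letter)
  • pattern is the expression that the input characters must match
  • -> lexerCommand is optional (for instance -> skip)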

Fragment

  • A fragment is just a named pattern (it does not produce a token, but it can be used in token definitions to improve readability)
fragment Name : pattern ;

A fragment is a special type of lexer rule that does not result in the creation of tokens. Fragments exist only to name sub-patterns that simplify the grammar.
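
As an illustration of the fragment syntax, here is a minimal sketch. The grammar name FragmentDemo and the rule names DIGIT and LETTER are hypothetical, chosen only for this example. The three token rules reuse the two fragments, and the lexer never emits DIGIT or LETTER as tokens.

```antlr
lexer grammar FragmentDemo;   // hypothetical grammar name

// Token rules: each match produces a token.
INT   : DIGIT+ ;
FLOAT : DIGIT+ '.' DIGIT+ ;
ID    : LETTER (LETTER | DIGIT)* ;

// Fragments: reusable pattern names, never emitted as tokens.
fragment DIGIT  : [0-9] ;
fragment LETTER : [a-zA-Z_] ;
```

Without the fragment keyword, DIGIT and LETTER would be ordinary token rules and could be returned to the parser on their own.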

Catch All

A catchall rule is a lexical rule:

  • placed at the end of the lexical grammar
  • that catches every character that did not match any other rule.

The name is often ANY.

Example:

ANY : . ;

Example

ID  :   [a-zA-Z]+ ;      // match lowercase and uppercase letters from A to Z
INT :   [0-9]+ ;         // match a series of digits from 0 to 9
DIGITS : [0-9]+ ;        // same pattern as INT (INT is listed first, so it wins the tie)
NEWLINE:'\r'? '\n' ;     // match/return newlines to parser (end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out spaces and tabs
HEX : ('%' [a-fA-F0-9] [a-fA-F0-9])+ ; // one or more '%' followed by two hexadecimal digits
STRING : ([a-zA-Z~] |HEX) ([a-zA-Z0-9.-] | HEX)*; // a lexer rule can use other lexer rules
TEXT: ~[\])]+ ; // match everything except the characters ] and ) (~ negates the set, \] is an escaped ])
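
The WS rule above uses the skip lexer command. As a hedged sketch of how a command changes what the lexer does with a match, the rules below contrast skip with channel(HIDDEN). The grammar name CommandDemo and the rule LINE_COMMENT are assumptions made for this example.

```antlr
lexer grammar CommandDemo;   // hypothetical grammar name

ID           : [a-zA-Z]+ ;
WS           : [ \t\r\n]+ -> skip ;                // discard the match: no token is produced
LINE_COMMENT : '//' ~[\r\n]* -> channel(HIDDEN) ;  // produce a token, but off the default channel
```

With skip the text is thrown away. With channel(HIDDEN) the token still exists in the token stream, so tools can retrieve it, but the parser does not see it.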

Syntax

Lexer rules have basically the same syntax as parser rules, except that lexer rules:

  • cannot have arguments,
  • cannot have return values or local variables.

Lexer rule names (also known as token names) must begin with an uppercase letter, whereas parser rule names begin with a lowercase letter.

A lexer rule can be associated with:

  • a single literal string expected in the input
  • a selection of literal strings that may be found
  • a sequence of specific characters and ranges of characters using the quantifiers (greedy: ?, * and +, or lazy: ??, *? and +?)

A lexer rule:

  • cannot be associated with a regular expression (the pattern is written in Antlr's own notation, as in the examples above).
  • can refer to other lexer rules.
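
The kinds of patterns listed above can be sketched as follows. The grammar and rule names are hypothetical and only serve the illustration.

```antlr
lexer grammar PatternDemo;   // hypothetical grammar name

SEMI    : ';' ;                     // a single literal string
BOOL    : 'true' | 'false' ;        // a selection of literal strings
NUMBER  : [0-9]+ ('.' [0-9]+)? ;    // character ranges with greedy quantifiers
COMMENT : '/*' .*? '*/' -> skip ;   // lazy quantifier: stops at the first '*/'
```

In COMMENT, a greedy .* would run to the last '*/' in the input, whereas the lazy .*? ends the match at the first one.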

Order of Precedence

The lexer chooses the rule that matches the most characters. If several rules match the same number of characters, the rule that is defined first in the grammar is used.
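
A minimal sketch of both cases, the longest match and the tie, using a keyword rule and an identifier rule. The grammar name PrecedenceDemo and the rule IF are assumptions made for this example.

```antlr
lexer grammar PrecedenceDemo;   // hypothetical grammar name

IF  : 'if' ;                 // defined before ID: wins the tie on the input "if"
ID  : [a-z]+ ;               // wins on "iffy": it matches 4 characters, IF only 2
WS  : [ \t\r\n]+ -> skip ;
ANY : . ;                    // catch-all, defined last: used only when no other rule matches
```

If ID were defined before IF, the input "if" would always be tokenized as ID. This is why keyword rules are usually placed before the identifier rule and the catch-all rule is placed last.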
