Antlr - (Lexical) Rule

About

Language - (Grammar|Production) Rule in Antlr.

Antlr has two types of rule:

Name case | Type | Description | Example from the getting started
--- | --- | --- | ---
Starts with an uppercase letter | Lexer rule (also known as a token name) | Defines a token that the lexer will produce / capture | ID : [a-z]+ ; defines an ID token made of letters from a to z
Starts with a lowercase letter | Parser rule | Defines the relation between tokens and therefore how the parse tree is built | r : 'hello' ID ; defines a pattern made of the word hello followed by the ID token defined just above

In other words, the lexer rules (i.e. the token specifications) define each token that will be used in the parser rules.

By default, both kinds of rule are specified in a single file: the grammar file.

The classes generated will contain a method for each rule in the grammar.
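
As an illustration of a single grammar file holding both rule types, here is a minimal sketch built around the two rules quoted in the table. The grammar name Hello and the WS rule are assumptions borrowed from the ANTLR getting-started guide, not something this page defines.

```antlr
// Hello.g4: one file holding both parser and lexer rules
grammar Hello;

// Parser rule (name starts with a lowercase letter)
r  : 'hello' ID ;

// Lexer rules (names start with an uppercase letter)
ID : [a-z]+ ;                 // identifier made of letters a to z
WS : [ \t\r\n]+ -> skip ;     // assumed helper: discard whitespace between tokens
```

With the Java target, running the ANTLR tool on such a file would typically produce a HelloLexer and a HelloParser class, the parser exposing a method r() for the parser rule r.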

Management

Test

See the Test Rule tool of the ANTLR plugin for IntelliJ IDEA.

Order of precedence

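When several lexer rules can match the input, the lexer uses this order of precedence:

1. The rule that matches the most input characters wins (longest match).
2. If several rules match the same number of characters, the rule defined first in the grammar wins.

Example: for the input 1. the lexer produces a single FLOAT token rather than INT followed by DOT, because FLOAT matches more characters.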

INT : [0-9]+ ;
DOT : '.' ; // match period
FLOAT : [0-9]+ '.' ; 
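
The example above exercises the longest-match case only. The tie-break case (when two rules match exactly the same text, the rule listed first wins) can be sketched with a keyword and an identifier rule. The grammar name PrecedenceDemo and the rule names BEGIN, ID and WS are hypothetical, chosen only for illustration.

```antlr
lexer grammar PrecedenceDemo;

BEGIN : 'begin' ;             // listed first: wins the tie on the input "begin"
ID    : [a-z]+ ;              // matches more characters on "beginner", so it wins there
WS    : [ \t\r\n]+ -> skip ;  // discard whitespace between tokens
```

If ID were moved above BEGIN, the input begin would be expected to come out as an ID token, which is why keyword rules are usually placed before the general identifier rule.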

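You may also use a guard function, known as a semantic predicate, to enable or disable a rule at runtime. The rule below can match only when the predicate evaluates to true: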

RegularExpressionLiteral : {isRegexPossible()}? '/' RegularExpressionBody '/' RegularExpressionFlags ;
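
How the guard itself gets defined depends on the target language. A minimal sketch, assuming the Java target: the predicate reads a field declared in a @members block. The grammar name PredicateDemo and the flag enumIsKeyword are hypothetical and do not come from the rule above.

```antlr
lexer grammar PredicateDemo;

@members {
    // Hypothetical flag (Java target assumed); host code may set it before lexing.
    public boolean enumIsKeyword = true;
}

ENUM : 'enum' {enumIsKeyword}? ;   // viable only while the predicate is true
ID   : [a-z]+ ;                    // otherwise "enum" falls through to ID
WS   : [ \t\r\n]+ -> skip ;
```

The code between the braces is copied verbatim into the generated lexer, so a grammar written this way is tied to one target language.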

Conflict

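Token matching may conflict, for example when the same input text can be matched by more than one lexer rule.

In case of token conflict, there are a couple of ways to handle it. One of them is to let a parser rule accept each of the alternatives: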

names : name|Number;

Case insensitivity

https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md
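
As a rough sketch of what the linked document describes, and assuming ANTLR 4.10 or later, a lexer can be made case-insensitive with a grammar-level option. The grammar name CaseDemo and the rule names are hypothetical.

```antlr
lexer grammar CaseDemo;

options { caseInsensitive = true; }   // assumes ANTLR 4.10 or later

SELECT : 'select' ;           // should also match SELECT, Select, sElEcT
ID     : [a-z]+ ;             // should also match upper-case letters
WS     : [ \t\r\n]+ -> skip ;
```

With older versions, a common workaround is to spell each keyword with one fragment rule per letter, such as fragment S : [sS] ;.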