Antlr - (Lexical) Rule
About
Language - (Grammar|Production) Rule in Antlr.
Antlr has two types of rules, distinguished by the case of the first letter of the rule name:
Name case | Type | Description | Example from the getting started guide |
---|---|---|---|
Starts with an uppercase letter | lexer rule | Also known as a token name. Lexer rules define the tokens that the lexer produces (captures). | ID : [a-z]+ ; defines an ID token made of one or more letters from a to z |
Starts with a lowercase letter | parser rule | Parser rules define the relations between the tokens and therefore how the parse tree is built. | r : 'hello' ID ; defines a pattern made of the word hello followed by the ID token defined just above |
In other words, the lexer rules (i.e. the token specifications) define each token that is used in the parser rules.
By default, both kinds of rules are specified in a single file, the grammar file.
The generated parser class contains a method for each parser rule in the grammar.
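As a minimal sketch, the two example rules from the table can be put together in a single grammar file. The file name Hello.g4 and the WS rule follow the Antlr getting started example; treat the remarks about generated class names as an assumption based on the default Java target.

```antlr
// Hello.g4: a combined grammar holding parser and lexer rules in one file
grammar Hello;

// Parser rule (lowercase first letter): the word hello followed by an ID token
r  : 'hello' ID ;

// Lexer rule (uppercase first letter): one or more lowercase letters
ID : [a-z]+ ;

// Lexer rule that discards spaces, tabs and newlines
WS : [ \t\r\n]+ -> skip ;
```

With the Java target, running the Antlr tool on this file should produce a HelloLexer and a HelloParser class, the parser exposing a method r() for the parser rule r.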
Management
Test
Order of precedence
When more than one lexer rule can match the input, the lexer applies this order of precedence:
- The lexer rule that recognizes the most input characters
- The lexer rule occurring first in the grammar file
- Imported rules
Example:
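- With the rules below, the input 34. matches FLOAT (3 characters) rather than INT followed by DOT, because FLOAT recognizes the most input characters.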
INT : [0-9]+ ;
DOT : '.' ; // match period
FLOAT : [0-9]+ '.' ;
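The FLOAT example illustrates the first criterion. The second criterion can be sketched with a keyword rule and an identifier rule; the rule names IF and ID are illustrative and not taken from a particular grammar.

```antlr
// "if" is matched by both rules with the same length (2 characters):
// IF occurs first in the grammar file, so the lexer produces an IF token.
IF : 'if' ;

// "iffy" is matched by ID with 4 characters and by IF with only 2:
// the longest match wins, so the lexer produces a single ID token.
ID : [a-z]+ ;
```

Swapping the two rules would make IF effectively unreachable, since ID would then win every tie.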
You may also use a guard function, known as a semantic predicate, so that a lexer rule only matches when a condition holds:
RegularExpressionLiteral : {isRegexPossible()}? '/' RegularExpressionBody '/' RegularExpressionFlags ;
Conflict
Token matching may conflict, for example when:
- you allow digits in a name: this conflicts with a number token if you also define one.
- the / character can be either the start of a regular expression or the division operator.
In case of a token conflict, there are a few ways to handle it:
- the first one is to create a parser rule with alternatives that accepts both tokens. For example, you can get rid of the name conflict by creating a names rule such as the one below and using it in place of the name rule wherever the conflict occurs.
names : name|Number;
- the second one is to rely on the order of precedence.
- the third one is to use a semantic predicate.
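To make the name conflict concrete, here is a small sketch of the first approach. The Name token and its character set are assumptions made for illustration; only the names rule comes from the text above.

```antlr
// Parser rule that accepts either token wherever a name is expected
names  : name | Number ;
name   : Name ;

// "42" is matched by both lexer rules with the same length.
// Number occurs first, so the lexer always produces Number for pure digits.
Number : [0-9]+ ;

// "abc" and "a1" only match Name; "42" never reaches the parser as a Name.
Name   : [a-zA-Z0-9]+ ;
```

A rule that referenced name alone would reject an input such as 42, whereas names accepts it.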
Case insensitivity
https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md
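One approach that needs no special tool support is to spell each keyword with per-letter fragment rules. The sketch below assumes a hypothetical SELECT keyword; the fragment names are arbitrary.

```antlr
// SELECT matches select, Select, SELECT, sElEcT, ...
SELECT : S E L E C T ;

// Fragment rules are helpers: they never produce tokens on their own.
fragment S : [sS] ;
fragment E : [eE] ;
fragment L : [lL] ;
fragment C : [cC] ;
fragment T : [tT] ;
```

Recent Antlr 4 releases also document a caseInsensitive grammar option; see the linked page for details.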