Antlr - (Lexical) Rule

About

This page describes grammar (production) rules in Antlr.

Antlr has two types of rule, distinguished by the case of the first letter of the rule name:

| Name case | Type | Description | Example from the getting started |
|---|---|---|---|
| Uppercase first letter | Lexer rule (also known as a token name) | Defines a token that the lexer will produce (capture) | ID : [a-z]+ ; defines an ID token made of the letters a to z |
| Lowercase first letter | Parser rule | Defines how the tokens relate to each other, and therefore how the parse tree is built | r : 'hello' ID ; defines a pattern made of the word hello followed by the ID token defined just above |

In other words, the lexer rules (i.e. the token specifications) define each token that can be used in the parser rules.
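The split between the two kinds of rule can be sketched in plain Python. This is only an illustration of the idea and not the code that Antlr generates: the `tokenize` and `parse_r` functions and the `HELLO` token name are hypothetical, and whitespace handling is an assumption, since the example grammar shown here does not define it.

```python
import re

# Lexer rule from the getting started example: ID : [a-z]+ ;
ID_PATTERN = re.compile(r"[a-z]+")

def tokenize(text):
    """Turn text into (token_type, text) pairs.

    Assumption: words are separated by whitespace, which is dropped.
    """
    tokens = []
    for word in text.split():
        if word == "hello":
            # Literal used in the parser rule; HELLO is a hypothetical token name.
            tokens.append(("HELLO", word))
        elif ID_PATTERN.fullmatch(word):
            tokens.append(("ID", word))
        else:
            raise ValueError(f"no lexer rule matches {word!r}")
    return tokens

def parse_r(tokens):
    """Parser rule r : 'hello' ID ; builds a tiny tree from the token types."""
    types = [token_type for token_type, _ in tokens]
    if types != ["HELLO", "ID"]:
        raise ValueError(f"rule r does not match token types {types}")
    return ("r", tokens[0], tokens[1])

tree = parse_r(tokenize("hello world"))
print(tree)  # ('r', ('HELLO', 'hello'), ('ID', 'world'))
```

The parser function never looks at raw characters: it only checks the sequence of token types that the lexer step produced.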

By default, both kinds of rule are specified in a single file: the grammar file.

The classes generated will contain a method for each rule in the grammar.

Management

Test

See the Test rule tool in Idea.

Order of precedence

When more than one lexer rule can match the input, the lexer applies this order of precedence:

  • The lexer rule that recognizes the most input characters
  • The lexer rule occurring first in the grammar file
  • Imported rules

Example:

  • 34. will match FLOAT, not INT followed by DOT, because FLOAT recognizes the most input characters
INT : [0-9]+ ;
DOT : '.' ; // match period
FLOAT : [0-9]+ '.' ; 
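The first two precedence rules (longest match, then first rule in the file) can be emulated with a short Python sketch. It is not Antlr's implementation: `next_token` is a hypothetical helper, the regular expressions are hand translations of the three rules above, and the `IF` / `ID` pair used for the tie case is an assumed example.

```python
import re

# Lexer rules in grammar-file order, hand-translated from the example above.
LEXER_RULES = [
    ("INT", re.compile(r"[0-9]+")),
    ("DOT", re.compile(r"\.")),
    ("FLOAT", re.compile(r"[0-9]+\.")),
]

def next_token(text, pos=0, rules=LEXER_RULES):
    """Return (rule_name, matched_text) for the token starting at pos."""
    best = None  # (rule_name, end_position) of the best match so far
    for name, pattern in rules:
        match = pattern.match(text, pos)
        # The strict '>' keeps the earlier rule when two rules match the same length.
        if match and (best is None or match.end() > best[1]):
            best = (name, match.end())
    if best is None:
        raise ValueError(f"no lexer rule matches at position {pos}")
    name, end = best
    return name, text[pos:end]

print(next_token("34."))  # ('FLOAT', '34.')
print(next_token("34"))   # ('INT', '34')

# Tie case (hypothetical rules): both match "if" with the same length,
# so the rule listed first wins.
TIE_RULES = [("IF", re.compile(r"if")), ("ID", re.compile(r"[a-z]+"))]
print(next_token("if", 0, TIE_RULES))    # ('IF', 'if')
print(next_token("iffy", 0, TIE_RULES))  # ('ID', 'iffy')
```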

You may also use a guard function, known as a semantic predicate, to enable a lexer rule only when a condition holds:

RegularExpressionLiteral : {isRegexPossible()}? '/' RegularExpressionBody '/' RegularExpressionFlags ;
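To show what such a guard does, here is a minimal Python sketch of a lexer that skips a rule when its predicate returns false. Everything in it is an assumption made for illustration: the `SketchLexer` class, the simplified token patterns, and above all the body of `is_regex_possible`, which here simply says that a regular expression cannot start right after an identifier or a number.

```python
import re

class SketchLexer:
    """Toy longest-match lexer in which a rule may carry a guard (predicate)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.last_type = None  # type of the last non-whitespace token
        # (name, pattern, guard); all patterns are simplified assumptions.
        self.rules = [
            ("REGEX", re.compile(r"/[^/\n]+/[a-z]*"), self.is_regex_possible),
            ("DIV", re.compile(r"/"), None),
            ("ID", re.compile(r"[a-z]+"), None),
            ("INT", re.compile(r"[0-9]+"), None),
            ("WS", re.compile(r"[ \t]+"), None),
        ]

    def is_regex_possible(self):
        # Assumed policy: after an operand, '/' can only be the division operator.
        return self.last_type not in ("ID", "INT")

    def tokens(self):
        out = []
        while self.pos < len(self.text):
            best = None
            for name, pattern, guard in self.rules:
                if guard is not None and not guard():
                    continue  # predicate is false: the rule is not viable here
                match = pattern.match(self.text, self.pos)
                if match and (best is None or match.end() > best[1]):
                    best = (name, match.end())
            if best is None:
                raise ValueError(f"no lexer rule matches at position {self.pos}")
            name, end = best
            if name != "WS":
                out.append((name, self.text[self.pos:end]))
                self.last_type = name
            self.pos = end
        return out

print(SketchLexer("a / b / c").tokens())
print(SketchLexer("/ab/g").tokens())
```

Without the guard, the REGEX pattern would win the longest-match contest on `/ b /` in the first input; with it, both slashes are lexed as DIV.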

Conflict

Token matching may conflict, for example when:

  • you allow digits in a name: you get a conflict if you also define a number token.
  • the / character can be either the start of a regular expression or the division operator.

In case of token conflict, there are a couple of ways to handle it:

  • the first one is to create a parser rule with an alternative (allowing either of the two tokens to be matched). For example, you can get rid of the name conflict by creating a names rule such as the one below and replacing the name rule with the names rule wherever you get the conflict.
names : name|Number;
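The effect of the names rule can be sketched in Python. The token definitions below are assumptions, since the page does not show them: a `Number` rule listed first, and a `Name` token that also allows digits, so that an all-digit word always lexes as `Number`. The `name : Name ;` shape of the original rule is assumed as well.

```python
import re

# Hypothetical token rules; Number is checked first, as if listed first in the grammar.
NUMBER = re.compile(r"[0-9]+")
NAME = re.compile(r"[A-Za-z0-9]+")

def lex_word(word):
    """Classify one word as a Number or Name token."""
    if NUMBER.fullmatch(word):
        return ("Number", word)
    if NAME.fullmatch(word):
        return ("Name", word)
    raise ValueError(f"no lexer rule matches {word!r}")

def parse_name(token):
    """name : Name ;  (assumed shape of the original rule)"""
    kind, text = token
    if kind != "Name":
        raise ValueError(f"expected Name, got {kind}")
    return ("name", text)

def parse_names(token):
    """names : name | Number ;  accepts either alternative."""
    kind, text = token
    if kind == "Number":
        return ("names", ("Number", text))
    return ("names", parse_name(token))

print(parse_names(lex_word("a1")))  # ('names', ('name', 'a1'))
print(parse_names(lex_word("42")))  # ('names', ('Number', '42'))
```

A name made only of digits is rejected by `parse_name` but accepted by `parse_names`, which is why the names rule is used wherever the conflict appears.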

Case insensitivity

See https://github.com/antlr/antlr4/blob/master/doc/case-insensitive-lexing.md




