Each lexer rule either matches or does not, so every lexer rule expression is a boolean expression.
- token: i.e. a terminal symbol (a leaf of the parse tree)
TokenName : pattern -> lexerCommand ;
- fragment: just a named pattern (it does not produce a token, but it can be used in token definitions to improve readability)
fragment Name : pattern ;
A fragment is a special type of lexer rule that does not result in the creation of tokens. Fragments exist only to name sub-expressions that simplify the grammar.
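As a minimal sketch of how fragments are reused inside token rules, assuming ANTLR 4 syntax (the grammar name and the rule names DIGIT, HEX_DIGIT, HEX_LITERAL and INT are illustrative, not taken from an existing grammar):

```antlr
lexer grammar FragmentDemo;   // hypothetical grammar name

// Fragments are never emitted as tokens on their own.
fragment DIGIT     : [0-9] ;
fragment HEX_DIGIT : [0-9a-fA-F] ;

// Token rules reuse the fragments to stay readable.
HEX_LITERAL : '0x' HEX_DIGIT+ ;
INT         : DIGIT+ ;
```

With these rules the parser only ever sees HEX_LITERAL and INT tokens; DIGIT and HEX_DIGIT are building blocks.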
A catch-all rule is a lexer rule:
- placed at the end of the lexer grammar
- that catches every character that did not match any earlier rule.
The name is often ANY.
ANY : . ;
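A small sketch of where the catch-all rule sits, assuming ANTLR 4 syntax (the grammar name and the rules other than ANY are illustrative):

```antlr
lexer grammar CatchAllDemo;   // hypothetical grammar name

ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;

// Last rule: matches any single character that no rule above accepted,
// so the lexer emits an ANY token instead of a token recognition error.
ANY : . ;
```

Because ANY matches exactly one character and is defined last, any earlier rule that matches at the same position either matches more characters or wins the tie.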
ID : [a-zA-Z]+ ; // match one or more lowercase or uppercase letters from A to Z
INT : [0-9]+ ; // match a series of digits from 0 to 9
DIGITS : [0-9] + ; // same
NEWLINE : '\r'? '\n' ; // match/return newlines to parser (end-statement signal)
WS : [ \t]+ -> skip ; // toss out spaces and tabs
HEX : ('%' [a-fA-F0-9] [a-fA-F0-9])+ ; // one or more '%' followed by two hexadecimal digits
STRING : ([a-zA-Z~] | HEX) ([a-zA-Z0-9.-] | HEX)* ; // a lexer rule can use other lexer rules
TEXT : ~[\])]+ ; // match everything except the characters ] and ) (negated character set)
Lexer rules use basically the same syntax as parser rules, except that lexer rules:
- cannot have arguments,
- cannot have return values or local variables.
Lexer rule names (also known as token names) must begin with an uppercase letter, whereas parser rule names begin with a lowercase letter.
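The naming convention can be sketched in a small combined grammar, assuming ANTLR 4 syntax (the grammar name and all rule names are illustrative):

```antlr
grammar NamingDemo;   // hypothetical combined grammar

// Parser rule: the name starts with a lowercase letter.
assignment : ID '=' INT ;

// Lexer rules: the names start with an uppercase letter.
ID  : [a-zA-Z]+ ;
INT : [0-9]+ ;
WS  : [ \t\r\n]+ -> skip ;
```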
A lexer rule can be associated with:
- a single literal string expected in the input
- a selection of literal strings that may be found
A lexer rule:
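- cannot be associated with a regular expression written in regex-library syntax; its pattern uses the grammar's own regex-like notation (character sets, ?, *, +, ~).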
- can refer to other lexer rules.
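The cases listed above can be sketched as follows, assuming ANTLR 4 syntax (the grammar name and the rule names SEMI, BOOL, SIGN and SIGNED_INT are illustrative):

```antlr
lexer grammar LiteralDemo;   // hypothetical grammar name

SEMI : ';' ;                 // a single literal string
BOOL : 'true' | 'false' ;    // a selection of literal strings
SIGN : '+' | '-' ;

// A lexer rule referring to another lexer rule (SIGN).
SIGNED_INT : SIGN [0-9]+ ;
```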
Order of Precedence
The lexer chooses the rule that matches the most characters. If there is a tie, the rule defined first in the grammar is used.
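A minimal sketch of both parts of this policy, assuming ANTLR 4 syntax (the grammar name and the rule names IF and ID are illustrative):

```antlr
lexer grammar PrecedenceDemo;   // hypothetical grammar name

// Defined before ID so that it wins the tie on the input "if".
IF : 'if' ;
ID : [a-zA-Z]+ ;
WS : [ \t\r\n]+ -> skip ;
```

On the input "if", both IF and ID match two characters, so the earlier rule IF is chosen. On the input "iffy", ID matches four characters while IF matches only two, so the longer match ID is chosen. If ID were defined before IF, the input "if" would always be tokenized as ID.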