Compiler - Lexer rule (Token names, Lexical rule, Token name)

1 - About

Lexer rules are rules written specifically for the lexer: they define tokens.

The lexer creates tokens from the input text that matches these rules.
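To make the idea concrete, here is a minimal sketch of a rule-driven tokenizer in Python. It is not how ANTLR is implemented; the `RULES` table, the `SKIPPED` set and the `tokenize` function are hypothetical names chosen for illustration. It assumes the usual lexer behavior: the longest match wins, and on a tie the rule defined first wins.

```python
import re

# Hypothetical rule table: (token name, compiled pattern), in definition order.
RULES = [
    ("ID", re.compile(r"[a-zA-Z]+")),
    ("INT", re.compile(r"[0-9]+")),
    ("NEWLINE", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]
SKIPPED = {"WS"}  # matched but not emitted, similar to "-> skip"


def tokenize(text):
    """Return a list of (token name, matched text) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_text = None, ""
        for name, pattern in RULES:
            match = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier rule stays.
            if match and len(match.group()) > len(best_text):
                best_name, best_text = name, match.group()
        if best_name is None:
            raise ValueError(f"no lexer rule matches at position {pos}")
        if best_name not in SKIPPED:
            tokens.append((best_name, best_text))
        pos += len(best_text)
    return tokens
```

For instance, `tokenize("abc 42\n")` yields an `ID` token, an `INT` token and a `NEWLINE` token; the space is consumed by `WS` and dropped.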

A lexer rule is also known as:

  • lexical rule
  • token specification
  • token name

In ANTLR, lexer rule names begin with an uppercase letter whereas parser rule names begin with a lowercase letter.
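The naming convention can be checked mechanically. The helper below is a small sketch, with `rule_kind` being a hypothetical name rather than part of any ANTLR API; it only looks at the case of the first letter.

```python
def rule_kind(name):
    """Classify a rule name by the case of its first letter."""
    if not name or not name[0].isalpha():
        raise ValueError(f"not a valid rule name: {name!r}")
    # Uppercase first letter: lexer rule; lowercase: parser rule.
    return "lexer" if name[0].isupper() else "parser"
```

With this helper, `rule_kind("NEWLINE")` returns `"lexer"` and `rule_kind("expr")` returns `"parser"`.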

2 - Example

Example of lexer rules written in the ANTLR grammar syntax:

ID  :   [a-zA-Z]+ ;      // match one or more lowercase or uppercase letters (a-z, A-Z)
INT :   [0-9]+ ;         // match a series of digits from 0 to 9
DIGITS : [0-9]+ ;        // same pattern as INT; INT is defined first, so it wins the tie
NEWLINE : '\r'? '\n' ;   // match/return newlines to parser (end-statement signal)
WS  :   [ \t]+ -> skip ; // toss out spaces and tabs
HEX : ('%' [a-fA-F0-9] [a-fA-F0-9])+ ; // one or more '%' followed by two hexadecimal digits
STRING : ([a-zA-Z~] | HEX) ([a-zA-Z0-9.-] | HEX)* ; // a lexer rule can use other lexer rules
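The way STRING reuses HEX can be imitated with plain regular expressions. The sketch below is an assumed, hand-written translation of the two rules into Python `re` syntax, not output produced by ANTLR; `is_string_token` is a hypothetical helper name.

```python
import re

# Regex fragments mirroring the HEX and STRING rules (assumed translation).
HEX = r"(?:%[a-fA-F0-9][a-fA-F0-9])+"
# STRING embeds the HEX fragment, just as the grammar rule references HEX.
STRING = rf"(?:[a-zA-Z~]|{HEX})(?:[a-zA-Z0-9.\-]|{HEX})*"


def is_string_token(text):
    """Return True when the whole text matches the STRING pattern."""
    return re.fullmatch(STRING, text) is not None
```

Under this translation, `is_string_token("my%20file.txt")` is true, while `is_string_token("9abc")` is false because a STRING cannot start with a digit.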

