Compiler - Lexer rule (Token names, Lexical rule, Token name)


A lexer rule is a rule written specifically for the lexer: it defines a token.

The lexer creates tokens from the input text that matches these rules.

A lexer rule is also known as:

  • lexical rule
  • token specification
  • or token name

In ANTLR, lexer rule names begin with an uppercase letter whereas parser rule names begin with a lowercase letter.
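
As a minimal sketch of this naming convention, the small combined grammar below places one parser rule next to a few lexer rules. The grammar name Assign and the parser rule stat are made up for illustration.

```antlr
grammar Assign;                  // hypothetical grammar name

// parser rule: lowercase first letter, built from tokens
stat    : ID '=' INT NEWLINE ;

// lexer rules: uppercase first letter, built from characters
ID      : [a-zA-Z]+ ;
INT     : [0-9]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;
```

The case of the first letter is what tells ANTLR whether a rule belongs to the lexer or to the parser.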


Example of lexer rules in the ANTLR grammar syntax:

ID      : [a-zA-Z]+ ;      // match one or more lowercase or uppercase letters
INT     : [0-9]+ ;         // match a series of digits from 0 to 9
DIGITS  : [0-9]+ ;         // same pattern as INT (INT is defined first, so it wins)
NEWLINE : '\r'? '\n' ;     // match/return newlines to parser (end-statement signal)
WS      : [ \t]+ -> skip ; // toss out spaces and tabs
HEX     : ('%' [a-fA-F0-9] [a-fA-F0-9])+ ; // one or more '%' followed by two hexadecimal digits
STRING  : ([a-zA-Z~] | HEX) ([a-zA-Z0-9.-] | HEX)* ; // a lexer rule can use other lexer rules


You can apply lexer rules conditionally with lexical modes.
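
As a rough sketch of what a lexical mode looks like in ANTLR 4, the lexer grammar below switches rule sets when it enters and leaves a tag. The grammar name TagLexer, the mode name INSIDE and the token names are assumptions chosen for illustration. Modes are only allowed in a lexer grammar, not in a combined grammar.

```antlr
lexer grammar TagLexer;           // hypothetical name; modes require a lexer grammar

// default mode: everything outside a tag
OPEN  : '<' -> pushMode(INSIDE) ; // switch to the INSIDE mode
TEXT  : ~'<'+ ;                   // any run of characters other than '<'

mode INSIDE;                      // the rules below apply only inside a tag
CLOSE : '>' -> popMode ;          // return to the previous mode
NAME  : [a-zA-Z]+ ;
WS    : [ \t\r\n]+ -> skip ;
```

With the input `<b>hi`, this should yield the tokens OPEN, NAME, CLOSE, TEXT. The letters of `hi` become TEXT rather than NAME because the NAME rule is only active in the INSIDE mode.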
