Compiler - Lexer rule (Token names, Lexical rule, Token name)
About
Lexer rules are rules written specifically for the lexer: they define tokens.
The lexer creates tokens from the parts of the input text that match these rules.
A lexer rule is also known as:
- lexical rule
- token specification
- or token name
In ANTLR, lexer rule names begin with an uppercase letter, whereas parser rule names begin with a lowercase letter.
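The naming convention can be illustrated with a small combined grammar. This is only a minimal sketch: the grammar name Expr and the parser rules prog, stat and expr are illustrative assumptions, while the lexer rules reuse patterns from the Example section.

```antlr
grammar Expr;   // hypothetical grammar name, for illustration only

// Parser rules: names start with a lowercase letter
prog : stat+ ;
stat : expr NEWLINE ;
expr : expr ('*' | '/') expr
     | expr ('+' | '-') expr
     | INT
     | ID
     ;

// Lexer rules: names start with an uppercase letter
ID      : [a-zA-Z]+ ;
INT     : [0-9]+ ;
NEWLINE : '\r'? '\n' ;
WS      : [ \t]+ -> skip ;
```

The parser rules refer to tokens only by their lexer rule names (INT, ID, NEWLINE), which is why the two kinds of names must be distinguishable.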
Example
Examples of lexer rules in the ANTLR grammar syntax:
ID : [a-zA-Z]+ ; // match one or more letters, lowercase or uppercase
INT : [0-9]+ ; // match a series of digits from 0 to 9
DIGITS : [0-9]+ ; // same pattern as INT; if both are in one grammar, INT wins because it is listed first
NEWLINE : '\r'? '\n' ; // match newlines and return them to the parser (end-of-statement signal)
WS : [ \t]+ -> skip ; // toss out spaces and tabs
HEX : ('%' [a-fA-F0-9] [a-fA-F0-9])+ ; // one or more '%' escapes, each followed by two hexadecimal digits
STRING : ([a-zA-Z~] | HEX) ([a-zA-Z0-9.-] | HEX)* ; // a lexer rule can reference other lexer rules
Conditionally
You can apply lexer rules conditionally with lexical modes: a rule declared inside a mode is only active while the lexer is in that mode.
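As a minimal sketch of conditional rules in ANTLR 4, the lexer grammar below switches modes when it meets an angle bracket. The grammar name TagLexer, the mode name INSIDE and all token names are assumptions chosen for illustration. ANTLR 4 allows the mode keyword only in lexer grammars, not in combined grammars.

```antlr
lexer grammar TagLexer;   // hypothetical name; modes require a lexer grammar

// Default mode: everything up to the next '<' is plain text
OPEN : '<' -> pushMode(INSIDE) ;   // enter the INSIDE mode
TEXT : ~'<'+ ;

// The rules below apply only while the lexer is in mode INSIDE
mode INSIDE;
CLOSE : '>' -> popMode ;           // return to the previous mode
NAME  : [a-zA-Z]+ ;
WS    : [ \t\r\n]+ -> skip ;
```

With this sketch, letters outside a tag are matched by TEXT, whereas the same letters between '<' and '>' are matched by NAME, because only the rules of the current mode are considered.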