A context-free grammar is:
- a set of instructions called grammar rules (known as production rules or production)
- that describe all possible strings in a given formal language
- without context
- in order to build a lexer
- that will tokenize code written in this language
They are the type 2 of the Chomsky grammar hierarchy.
Context-free grammars :
- are used to describe the syntax of essentially every programming language.
- play an important role in describing natural languages.
DTD’s taken as a whole, are really CFG’s.
Context-free grammars are used to put a tree structure on strings, typically text strings.
In context-free grammars, all rules are:
- or one-to-none.
These rules can be applied regardless of context.
In a context free grammar, each (production|grammar) rule has :
- a non-terminal symbol as its left-hand side (ie the symbol will not appear in the resulting formal language)
- and a sequence of zero or more non-terminal and terminal symbols as its right-hand side.
In context-free grammars, terminal symbols never appear on the left hand side of a production rule
Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.
Rules can be applied in reverse to check if a string is grammatically correct according to the grammar.
Syntactic equations of the form defined in EBNF generate context-free languages.
Context Free meaning
The term context free is due to Chomsky and stems from the fact that substitution of:
- the symbol left of =
- by a sequence derived from the expression to the right of =
is always permitted, regardless of the context in which the symbol is embedded within the sentence.
It has turned out that this restriction to context freedom (in the sense of Chomsky) is quite acceptable for programming languages, and that it is even desirable.
A CFG consists of the following components:
- a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar.
- a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols.
- a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the left side of the production) in a string with other nonterminal or terminal symbols (on the right side of the production).
- a start symbol, which is a special nonterminal symbol that appears in the initial string generated by the grammar.
To generate a string of terminal symbols from a CFG, we:
- Begin with a string consisting of the start symbol;
- Apply one of the productions with the start symbol on the left hand size, replacing the start symbol with the right hand side of the production;
- Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.