Language - Context Free Grammar (CFG)

About

A context-free grammar is:

a set of instructions called grammar rules (known as production rules or production)
that describe all possible strings in a given formal language
without context
in order to build a lexer
that will tokenize code written in this language

They are the type 2 of the Chomsky grammar hierarchy.

Context-free grammars :

are used to describe the syntax of essentially every programming language.
play an important role in describing natural languages.

DTD’s taken as a whole, are really CFG’s.

Context-free grammars are used to put a tree structure on strings, typically text strings.

In context-free grammars, all rules are:

one-to-one,
one-to-many,
or one-to-none.

These rules can be applied regardless of context.

In a context free grammar, each (production|grammar) rule has :

a non-terminal symbol as its left-hand side (ie the symbol will not appear in the resulting formal language)
and a sequence of zero or more non-terminal and terminal symbols as its right-hand side.

In context-free grammars, terminal symbols never appear on the left hand side of a production rule

Starting from a sentence consisting of a single distinguished nonterminal, called the goal symbol, a given context-free grammar specifies a language, namely, the (perhaps infinite) set of possible sequences of terminal symbols that can result from repeatedly replacing any nonterminal in the sequence with a right-hand side of a production for which the nonterminal is the left-hand side.

Usage

Grammar Checking

Rules can be applied in reverse to check if a string is grammatically correct according to the grammar.

EBNF

Syntactic equations of the form defined in EBNF generate context-free languages.

Context Free meaning

The term context free is due to Chomsky and stems from the fact that substitution of:

the symbol left of =
by a sequence derived from the expression to the right of =

is always permitted, regardless of the context in which the symbol is embedded within the sentence.

It has turned out that this restriction to context freedom (in the sense of Chomsky) is quite acceptable for programming languages, and that it is even desirable.

Syntax

A CFG consists of the following components:

a set of terminal symbols, which are the characters of the alphabet that appear in the strings generated by the grammar.
a set of nonterminal symbols, which are placeholders for patterns of terminal symbols that can be generated by the nonterminal symbols.
a set of productions, which are rules for replacing (or rewriting) nonterminal symbols (on the left side of the production) in a string with other nonterminal or terminal symbols (on the right side of the production).
a start symbol, which is a special nonterminal symbol that appears in the initial string generated by the grammar.

Implementation

To generate a string of terminal symbols from a CFG, we:

Begin with a string consisting of the start symbol;
Apply one of the productions with the start symbol on the left hand size, replacing the start symbol with the right hand side of the production;
Repeat the process of selecting nonterminal symbols in the string, and replacing them with the right hand side of some corresponding production, until all nonterminals have been replaced by terminal symbols.

About

Grammar Checking

EBNF

Context Free meaning

Syntax

Implementation

Documentation / Reference