Grammar in the context of a compiler. Ie how to define it.
See the grammar section if you are interessed on the structure.
In processing computer languages, semantic processing generally comes after syntactic processing, but in some cases semantic processing is necessary for complete syntactic analysis, and these are done together or concurrently.
The grammar language (ie type) is dependent of the parser generator
Below is a non-exhaustive list:
See also Grammar Conversion Tool
Computer language syntax is generally distinguished into three levels:
How you work to define the grammar depends on your taste and knowledge of the syntax.
There is basically two ways to define the rules tree.
A bottom-up approach will:
The Top-down approach will:
See Grammar - Hierarchy of grammar (Chomsky)
For example, a correct sentence always consists of a subject followed by a predicate.
This rule can be expressed with formula's that are called:
This above sentence can be described by the following formula:
sentence = subject predicate.
where:
If we add to this formula the two further formulas
subject = "John" | "Mary".
predicate = "eats" | "talks".
where:
We define herewith exactly four possible sentences, namely
John eats Mary eats
John talks Mary talks
S = AB.
A = "a" | "b".
B = "c" | "d".
L = {ac, ad, bc, bd}
where:
Nesting is a property particularly important in the definition of structured languages.
The graphical representations are:
See What is a Railroad Diagram? known as Syntax diagram
A language is defined by the following symbol type:
Symbol Type | Also Called | Cardinality | Definition | Example |
---|---|---|---|---|
terminal | term | set (vocabulary) | These symbols are said to be terminal, because they cannot be substituted by any other symbols. | John, Mary, eats, talks. |
non-terminal | identifier | set | They denote syntactic (classes|category) and can be substituted. | Sentence, Subject and predicate |
syntactic equations | productions | set | These define the possible substitutions of non-terminal symbols. An equation is specified for each non-terminal symbol. | |
start | 1 | This symbol is a non-terminal symbol. | Sentence |
Syntactic equations - Backus Naur Form (BNF), Extended BNF (EBNF) and ABNF
A language is, therefore, the set of sequences of terminal symbols which, starting with the start symbol, can be generated by repeated application of syntactic equations, that is, substitutions.
The idea of defining languages and their grammar with mathematical precision goes back to N. Chomsky. It became clear, however, that the presented, simple scheme of substitution rules was insufficient to represent the complexity of spoken languages. This remained true even after the formalisms were considerably expanded. In contrast, this work proved extremely fruitful for the theory of programming languages and mathematical formalisms. In passing, we emphasize that this rigour applied to the syntax only, not to the semantics.
The use of the Chomsky formalism is also responsible for the term programming language, because programming languages seemed to exhibit a structure similar to spoken languages. This term is rather unfortunate, because a programming language is not spoken, and therefore is not a language in the true sense of the word. Formalism or formal notation would have been more appropriate terms.
The structure of a (well formed) sentence is relevant, because it is instrumental in establishing the sentence's meaning. Owing to the syntactic structure, the individual parts of the sentence and their meaning can be recognized independently, and together they yield the meaning of the whole.
expression = number | expression "+" expression.
number = "1" | "2" | "3" | "4" .
“4 + 2 + 1” is a well-formed expression. This expression may even be derived in several ways, each corresponding to a different structural trees, as shown below:
The two differing structures may also be expressed with appropriate parentheses, respectively :
Fortunately, thanks to the associativity of addition both yield the same value 7.
But this need not always be the case. The mere use of subtraction in place of addition yields a counter example which shows that the two differing structures also yield a different interpretation and result:
The example illustrates two facts:
We call a syntactic class ambiguous if it can be attributed several structures. A language is ambiguous if it contains at least one ambiguous syntactic class (construct).
Language - Context Free Grammar (CFG)