Language - (Syntax|Structural) Tree

Compiler

About

The graphical representations of a grammar are called Syntax or Structural tree.

Whether your language input is (Markdown, HTML, or prose, ..), it needs to be parsed to a workable format. Such a format is called a syntax tree.

There is actually two well-known trees:

  • The concrete syntax tree (CST) (or parse tree). A low-level representation of the grammar that represent every detail (such as white-space in white-space insensitive languages)
  • and the abstract syntax tree (AST). A high-level representation of the grammar structures that only represent details relating to the syntactic structure of code (such as ignoring whether a double or single quote was used in languages).

Example

Grammar Graphic Representation

Concret Syntax Tree (CST) vs Abstract Syntax Tree (AST)

In short, a A CST has all information needed to restore the original document completely, AST does not.

For example for an HTML document, CSTs house info on style such as tabs or spaces, but ASTs do not. This makes ASTs often easier to work with.

ASTs can still recreate an exact syntactic representation.

A parse tree (CST) is a record of the rules (and tokens) used to match some input text whereas a abstract syntax tree records the structure of the input and is insensitive to the grammar that produced it.

There are an infinite number of grammars for any single language and hence every grammar will result in a different tree form because of all the intermediate rules.

An abstract syntax tree is a far superior intermediate form precisely because of this insensitivity and because it highlights the structure of the language not the grammar.

It's best to look at an example. For input 3+4 you really want to use the following intermediate form:

+
|  \
3  4

That is, the operator at the root and 3 and 4 as operands (children). In child-sibling form, you'd have

+
|
3 -- 4

Ok, so now a parse tree. I'll pick an extremely simple one out of the infinite number:

expr
   |
plus
   | \ \ 
  3  +  4

Usage

A syntax tree (from a compiler,…) may be used to perform:

  • static semantic analyses like checking that all variables are defined.
  • code generation (type-checking, code optimization, flow analysis, checking for variables being assigned values before they’re used, and so on)
  • pretty-printing (ie Syntax Formatting)
  • program restructuring,
  • code instrumentation,
  • and computing various metrics of a program.

Most of these operations will need to treat nodes that represent assignment statements differently from nodes that represent variables or arithmetic expressions.

Specification

Documentation / Reference





Discover More
Card Puncher Data Processing
Code - Syntax Highlighting

Syntax Highlighting gives color to every type of token. Syntax highlighting is generally achieved via linter. The linter creates a syntax tree that is traversed by the syntax syntax highlighter code...
Compiler
Compiler - LL parser

An LL(k) parser is a type of parser that build the tree in a top-down way. LR approach It is a top-down parser that parses from left to right, constructs a leftmost derivation of the input ...
Compiler
Compiler - LR parser

LR is a type of parser that build the tree from the leaves (bottom-up). LL parser LR(k) means:Left to right,Rightmost derivation parser. LR(k), is: a bottom-up parser that parses from left to...
Compiler
Compiler - Parser Rule

A parser rule is a rule that defines the structure of the parse tree. The parser uses them to build the parse tree. lexer rulestokenstree Parser rule names always start with a lowercase letter (...
Compiler
Concrete Syntax Tree - What is that ?

A concrete syntax tree is a tree representation of a grammar that contains the whole parsed document
Prosemirror Dom
How Rich Text editor in HTML are made (Principles and Demo)

How do you create a Rich Text editor in HTML, what are the well-known text editor and what are the principals. This article includes also a basic example where you can extend from to build your own
Grammar Different Structural Trees For The Same Expression
Language - (Grammar | Syntax | Lexicon)

Grammar in the context of a compiler. Ie how to define it. grammar section In processing computer languages, semantic processing generally comes after syntactic processing, but in some cases semantic...
Compiler
Language - Compiler compilers or (lexer|parser) generators

Compiler-compilers splits the work into a lexer and a parser: The Lexer reads text data (file, string,...) and divides it into tokens using lexer rule (patterns). It generates as output a list of tokens...
Code Analysis Variable Unused Name Error
Language - Linter (Code Analysis, Validation)

A linter is a tool that: statically analyze code finds problems in them may enforce a coding style. syntax highlighting Bad name writing is a problem that occurs when the developer makes...
Ast Explorer Php
Parser / Compiler - (Abstract) Syntax Tree (AST)

An Abstract Syntax Tree (AST) is a syntax tree. It represents the structure of the grammar at a higher level than a concrete syntax tree (parse tree) and thus easier to manipulate. At the opposite of...



Share this page:
Follow us:
Task Runner