Abstract syntax tree

From Wikipedia, the free encyclopedia

In computer science, an abstract syntax tree (AST), or just syntax tree, is a finite, labeled, directed tree in which each interior node represents a programming language construct and the children of that node represent the meaningful components of that construct. Each construct is named by an operator: interior nodes are labeled with these operators, and leaf nodes represent their operands. The leaves can thus be regarded as nullary operators that stand only for variables or constants. An AST differs from a parse tree (also known as a concrete syntax tree) by omitting nodes and edges for syntax rules that do not affect the semantics of the program; only the significant programming language constructs are kept. The classic example of such an omission is grouping parentheses, since in an AST the grouping of operands is implicit in the tree structure.
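As an illustration, the following Python sketch shows one possible way to represent such a tree for simple arithmetic expressions. The class names Var, Const and BinOp and the evaluate function are assumptions made for this example and do not come from any particular compiler. The two trees differ only in shape, which is how the grouping expressed by parentheses in the source text survives without a parenthesis node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    """Leaf node: a variable reference."""
    name: str


@dataclass(frozen=True)
class Const:
    """Leaf node: an integer constant."""
    value: int


@dataclass(frozen=True)
class BinOp:
    """Interior node: a binary operator applied to two subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Var, Const, BinOp]

# AST for "(a + b) * 2": the addition is the left child of the product.
grouped = BinOp("*", BinOp("+", Var("a"), Var("b")), Const(2))

# AST for "a + b * 2": the product is the right child of the addition.
ungrouped = BinOp("+", Var("a"), BinOp("*", Var("b"), Const(2)))


def evaluate(node: Node, env: dict) -> int:
    """Evaluate a tree by recursion on its structure."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    left = evaluate(node.left, env)
    right = evaluate(node.right, env)
    if node.op == "+":
        return left + right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator: {node.op}")
```

With a bound to 1 and b bound to 2, evaluating the first tree gives 6 and the second gives 5, although neither tree contains a node for parentheses.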

In a parser, the AST serves as an intermediate form between a parse tree and the data structure that a compiler or interpreter often uses as its internal representation of a computer program while the program is being optimized and from which code generation is performed. The range of all possible such structures is described by the abstract syntax. Creating an AST in a parser for a language described by a context-free grammar, as nearly all programming languages are, is straightforward. Most rules in the grammar create a new node whose edges lead to the symbols in the rule. Rules that do not contribute to the AST, such as grouping rules, merely pass through the node for one of their symbols. Alternatively, a parser can create a full parse tree, and a post-pass over the parse tree can convert it to an AST by removing the nodes and edges not used in the abstract syntax.
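The direct approach can be sketched with a small recursive-descent parser in Python. The grammar, the tuple encoding of nodes and the names tokenize, Parser and parse are all assumptions chosen for this illustration. The rules for addition and multiplication each create a new node, while the grouping rule for parentheses simply returns the node built for the enclosed expression.

```python
import re

# Assumed toy grammar:
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | NAME | '(' expr ')'
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[()+*])")


def tokenize(text):
    """Split the input into numbers, names, operators and parentheses."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() == "+":
            self.advance()
            node = ("+", node, self.term())  # rule creates a new node
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.advance()
            node = ("*", node, self.factor())  # rule creates a new node
        return node

    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node  # grouping rule: pass the inner node through
        if token.isdigit():
            return ("const", int(token))
        if token[0].isalpha() or token[0] == "_":
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Parse an expression and return its AST as nested tuples."""
    parser = Parser(tokenize(text))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node
```

In this sketch, parse("(a + b) * 2") yields a "*" node whose left child is the "+" node, and parse("((a))") yields the same leaf as parse("a"), since the redundant parentheses leave no trace in the tree.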