Kompilatorer och interpretatorer: Lecture 2

Note: This is an outline of what I intend to say in the lecture. It is not a definition of the course content, and it does not replace the textbook.

Today: More about syntax analysis ("parsing"),
Aho et al, sections 2.1 - 2.4

But first, what was left over from lecture 1:

1.5 The Grouping of Phases

ASU p 21, Reducing the number of passes:

More left over from lecture 1:

1.6 Compiler-Construction Tools

ASU p 22: "Compiler-compiler" = a complete system for compiler building. But! "Yacc" = "Yet Another Compiler-Compiler" is a parser generator.

2.1 Overview

A compiler that translates infix to postfix:

Tree (figure)                           Infix notation   Postfix notation
Abstract syntax tree for 2 + 3          2 + 3            2 3 +
Abstract syntax tree for 2 + 3 * 4      2 + 3 * 4        2 3 4 * +
Abstract syntax tree for 2 * 3 + 4      2 * 3 + 4        2 3 * 4 +
Abstract syntax tree for 2 * (3 + 4)    2 * (3 + 4)      2 3 4 + *

Source and target as text.

Postfix: Stack machine. Easy to write an interpreter.
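
To make "easy to write an interpreter" concrete, here is a minimal sketch of a stack machine for postfix. The function name eval_postfix, the stack size and the restriction to single-digit operands are assumptions made for this example, not something taken from the textbook.

```c
#include <ctype.h>

/* Hypothetical helper: evaluates a postfix string of single-digit
   operands and the operators + - * /, for example "2 3 4 * +".
   Returns 0 and stores the value in *result on success,
   returns -1 on malformed input. */
int eval_postfix(const char *s, int *result)
{
    int stack[64];
    int top = 0;                        /* number of values on the stack */

    for (; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
            continue;                   /* blanks only separate tokens */
        if (isdigit((unsigned char)*s)) {
            if (top == 64)
                return -1;              /* stack overflow */
            stack[top++] = *s - '0';    /* operand: push it */
        } else {
            int left, right;
            if (top < 2)
                return -1;              /* an operator needs two operands */
            right = stack[--top];
            left = stack[--top];
            switch (*s) {
            case '+': stack[top++] = left + right; break;
            case '-': stack[top++] = left - right; break;
            case '*': stack[top++] = left * right; break;
            case '/':
                if (right == 0)
                    return -1;          /* division by zero */
                stack[top++] = left / right;
                break;
            default:
                return -1;              /* unknown character */
            }
        }
    }
    if (top != 1)
        return -1;                      /* leftover or missing operands */
    *result = stack[0];
    return 0;
}
```

With an int v declared, eval_postfix("2 3 4 * +", &v) leaves 14 in v, and eval_postfix("2 3 * 4 +", &v) leaves 10. No precedence rules and no parentheses are needed: the order of the postfix symbols already says what to do.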

The "2.5" program: simple grammar (Sw: "grammatik") (only + and -), simple parser, very simple scanner (one character = one token).
The "2.9" program: more advanced grammar (identifiers, *, /, mod, div), therefore a more complex parser, a "real" scanner.

2.2 Syntax definition

Example: the if statement in C. An instance:

if (a == b)
  printf("Same!\n");
else
  printf("Not same!\n");

This, as you know, is the syntax for the if statement:

if ( some expression ) some statement else some other statement

A rule that could be part of a context-free grammar (Sw: kontextfri grammatik) for C:

statement -> if ( expression ) statement else statement
statement -> if ( expression ) statement
statement -> { statement-list } (forgot what?)
...

"Context-free": a production "X -> ..." can always be used to replace X with "...", no matter what the rest of the program (that is, the context, Sw: kontext, omgivning) looks like.

A context-free grammar has four components:

  1. A set of terminals (Sw: terminaler) = terminal symbols = tokens
  2. A set of non-terminals (Sw: icke-terminaler) = non-terminal symbols (compound grammatical constructs)
  3. A set of productions (Sw: produktioner) = rules of the form non-terminal -> sequence of tokens and/or non-terminals. A production is "for" the non-terminal on its left-hand side.
  4. One non-terminal designated as the start symbol (Sw: startsymbolen)
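
The four parts can be written down as plain data. The sketch below is one possible C representation, filled in with the grammar of Example 2.1. All type and variable names (struct production, struct grammar, list_grammar, count_productions_for) are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

struct production {
    const char *lhs;       /* the non-terminal the production is "for" */
    const char *rhs[4];    /* right-hand side symbols, NULL-terminated */
};

struct grammar {
    const char *const *terminals;          /* 1. NULL-terminated list */
    const char *const *nonterminals;       /* 2. NULL-terminated list */
    const struct production *productions;  /* 3. */
    size_t n_productions;
    const char *start;                     /* 4. the start symbol */
};

/* The grammar of Example 2.1 */
static const char *const list_terminals[] = {
    "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "+", "-", NULL
};
static const char *const list_nonterminals[] = { "list", "digit", NULL };
static const struct production list_productions[] = {
    { "list",  { "list", "+", "digit", NULL } },
    { "list",  { "list", "-", "digit", NULL } },
    { "list",  { "digit", NULL } },
    { "digit", { "0", NULL } }, { "digit", { "1", NULL } },
    { "digit", { "2", NULL } }, { "digit", { "3", NULL } },
    { "digit", { "4", NULL } }, { "digit", { "5", NULL } },
    { "digit", { "6", NULL } }, { "digit", { "7", NULL } },
    { "digit", { "8", NULL } }, { "digit", { "9", NULL } }
};
const struct grammar list_grammar = {
    list_terminals, list_nonterminals, list_productions,
    sizeof list_productions / sizeof list_productions[0],
    "list"
};

/* Counts the productions "for" a given non-terminal. */
size_t count_productions_for(const struct grammar *g, const char *nonterminal)
{
    size_t i, n = 0;
    for (i = 0; i < g->n_productions; i++)
        if (strcmp(g->productions[i].lhs, nonterminal) == 0)
            n++;
    return n;
}
```

Here count_productions_for(&list_grammar, "list") gives 3 and count_productions_for(&list_grammar, "digit") gives 10, which matches the "|" alternatives in the compact notation.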
Other concepts:

Example 2.1 (p. 27)

7+3, 7+3-4+6, 3 (but not 17, -3 or 2*2)

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
list -> digit
list -> list + digit
list -> list - digit

or

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
list -> digit | list + digit | list - digit

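Try 9-5+2. (In the steps below, "->" reads "is recognized as", which is the reverse direction of a production.)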
9 -> digit -> list.
5 -> digit.
9-5 -> list - digit -> list
2 -> digit.
9-5+2 -> list + digit -> list

ASU fig 2.2, the parse tree (= concrete syntax tree) and the syntax tree (= abstract syntax tree):

Parse tree for 9-5+2

Why list + digit etc? Asymmetrical and ugly? Why not just list + list, like this:

digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
string -> digit | string + string | string - string

ASU fig 2.3 (slide!):

Two parse trees for 9-5+2
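
Why the ambiguity matters: the two parse trees group the operands differently, and the groupings have different values. A small sketch (the function names are invented for this example):

```c
/* Applies one of the two operators of the grammar. */
static int apply(int left, char op, int right)
{
    return op == '+' ? left + right : left - right;
}

/* Value when "a op1 b op2 c" is grouped as (a op1 b) op2 c */
int value_grouped_left(int a, char op1, int b, char op2, int c)
{
    return apply(apply(a, op1, b), op2, c);
}

/* Value when "a op1 b op2 c" is grouped as a op1 (b op2 c) */
int value_grouped_right(int a, char op1, int b, char op2, int c)
{
    return apply(a, op1, apply(b, op2, c));
}
```

For 9-5+2, value_grouped_left(9, '-', 5, '+', 2) is (9-5)+2 = 6, while value_grouped_right(9, '-', 5, '+', 2) is 9-(5+2) = 2. A grammar that allows both trees does not tell us which one is meant.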

Operator associativity (Sw: Operatorassociativitet)

ASU fig 2.4:

Parse trees for left- and right-associative operators

Use a grammar like above, that is,
list -> list + digit
for left-associative (Sw: vänsterassociativa) operators. Use a grammar like
list -> digit + list
for right-associative (Sw: högerassociativa) operators, for example:

right -> letter = right | letter
letter -> a | b | c | ... | z
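
A right-recursive production maps directly onto a recursive procedure, and the recursion produces the right-associative grouping by itself. The sketch below makes the grouping visible by writing out parentheses. It assumes the alternative "right -> letter" as the base case, and the names group_assignments and emit are invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

static const char *in;     /* next unread input character */
static char *out;          /* next free position in the output buffer */
static char *out_end;      /* last position, reserved for the '\0' */

static int emit(char c)
{
    if (out >= out_end)
        return -1;         /* output buffer full */
    *out++ = c;
    return 0;
}

/* right -> letter = right | letter */
static int right(void)
{
    char letter;

    if (!islower((unsigned char)*in))
        return -1;                     /* expected a letter */
    letter = *in++;
    if (*in != '=')
        return emit(letter);           /* right -> letter */
    in++;                              /* consume '=' */
    if (emit('(') || emit(letter) || emit('='))
        return -1;
    if (right() != 0)                  /* everything to the right first */
        return -1;
    return emit(')');
}

/* Writes a fully parenthesized copy of src to dst.
   Returns 0 on success, -1 on a syntax error or a too small buffer. */
int group_assignments(const char *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return -1;
    in = src;
    out = dst;
    out_end = dst + dst_size - 1;
    if (right() != 0 || *in != '\0')
        return -1;
    *out = '\0';
    return 0;
}
```

For the input a=b=c the result is (a=(b=c)): the rightmost assignment is grouped first, as in C.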

Operator precedence (Sw: Operatorprioritet, operatorprecedens)

9 + 5 * 2 = 9 + (5 * 2), not (9 + 5) * 2.
"*" has higher precedence than "+".

Express this in the grammar:

factor -> digit | ( expr )
term -> term * factor | term / factor | factor
expr -> expr + term | expr - term | term
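
One way to see that this grammar encodes precedence is to write one C function per non-terminal and let each function compute a value. This is only a sketch: the left-recursive productions are handled with loops (a design choice of this example, not something the grammar prescribes), one character is one token, and the name eval_infix is invented.

```c
#include <ctype.h>

static const char *p;      /* next unread character */
static int failed;         /* set to 1 on the first syntax error */

static int expr(void);

/* factor -> digit | ( expr ) */
static int factor(void)
{
    int v;

    if (isdigit((unsigned char)*p))
        return *p++ - '0';
    if (*p == '(') {
        p++;
        v = expr();
        if (*p == ')')
            p++;
        else
            failed = 1;    /* missing ')' */
        return v;
    }
    failed = 1;            /* neither a digit nor '(' */
    return 0;
}

/* term -> term * factor | term / factor | factor */
static int term(void)
{
    int v = factor();

    while (!failed && (*p == '*' || *p == '/')) {
        char op = *p++;
        int rhs = factor();
        if (op == '*')
            v *= rhs;
        else if (rhs != 0)
            v /= rhs;
        else
            failed = 1;    /* division by zero */
    }
    return v;
}

/* expr -> expr + term | expr - term | term */
static int expr(void)
{
    int v = term();

    while (!failed && (*p == '+' || *p == '-')) {
        char op = *p++;
        int rhs = term();
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

/* Returns 0 and stores the value in *result, or -1 on a syntax error. */
int eval_infix(const char *s, int *result)
{
    int v;

    p = s;
    failed = 0;
    v = expr();
    if (failed || *p != '\0')
        return -1;
    *result = v;
    return 0;
}
```

For "9+5*2" this gives 19, not 28, because expr only sees complete terms: the multiplication is finished inside term before the addition happens. "(9+5)*2" gives 28, since the parenthesized expression is a single factor.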

2.3 Syntax-directed translation

Not just the parse tree for a certain construct, but also keep track of attributes of each subtree. An attribute can be any data that we want to keep track of in the tree, such as the data type of an expression, or even generated code.

Two different types:

Syntax-directed definitions

Grammar + what to do for each production

Syntax-directed definition = context-free grammar, plus a semantic rule (Sw: semantisk regel) for each production, that specifies how to calculate values of attributes. Example:

Production               Semantic rule
term -> 0                term.output = " 0"
term -> 1                term.output = " 1"
term -> 2                term.output = " 2"
...                      ...
expr -> expr1 + term1    expr.output = expr1.output + term1.output + " +"
...                      ...
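
A sketch of how such rules could be evaluated, assuming that the parse tree has already been built by someone else. Each node stores its own output attribute, computed from the attributes of its children. The node layout, the rule for "expr -> term" (copy the attribute) and the rule for "-" (same as for "+") are assumptions filling in the "..." rows.

```c
#include <stdio.h>

enum kind { TERM_DIGIT, EXPR_TERM, EXPR_PLUS, EXPR_MINUS };

struct node {
    enum kind kind;
    char digit;                  /* used by TERM_DIGIT only */
    struct node *left, *right;   /* children: expr1 and term1 */
    char output[64];             /* the attribute */
};

/* Computes n->output from the children's attributes (bottom-up). */
void compute_output(struct node *n)
{
    switch (n->kind) {
    case TERM_DIGIT:   /* term -> d       term.output = " d" */
        snprintf(n->output, sizeof n->output, " %c", n->digit);
        break;
    case EXPR_TERM:    /* expr -> term    expr.output = term.output */
        compute_output(n->left);
        snprintf(n->output, sizeof n->output, "%s", n->left->output);
        break;
    case EXPR_PLUS:    /* expr -> expr1 + term1 */
    case EXPR_MINUS:   /* expr -> expr1 - term1 */
        compute_output(n->left);
        compute_output(n->right);
        snprintf(n->output, sizeof n->output, "%s%s %c",
                 n->left->output, n->right->output,
                 n->kind == EXPR_PLUS ? '+' : '-');
        break;
    }
}
```

For the parse tree of 9-5+2, the root ends up with the attribute value " 9 5 - 2 +". Note that the code above never decides which production to use. That information is already in the tree, in the kind field.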

ASU fig 2.6:

Attribute values at nodes in a parse tree

But a syntax-directed definition says nothing about how the parser should build the parse tree! Just the grammar, and what to do when we have found which production to use.

(Syntax-directed) translation schemes

Grammar + what to do for each production, embedded in the grammar itself: in the right-hand sides of the productions

Translation scheme = context-free grammar, plus semantic actions (Sw: semantiska aktioner, semantiska åtgärder) embedded in the productions, that specify what to do. Example:

expr -> expr1 + term1 { print("+"); }

Generates postfix!

Or, with the action somewhere in the middle:

rest -> + term1 { print("+"); } rest1

The semantic actions are put in the parse tree, just like the "real" parts. ASU fig 2.12:

An extra leaf is constructed for a semantic action

ASU fig 2.14:

Actions translating 9-5+2 into 95-2+
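
A translation scheme with the action in the middle can be turned into code almost mechanically: each non-terminal becomes a function, and each action is executed at the point where it stands in the production. The sketch below uses the "rest" production from above. The productions "expr -> term rest" and "term -> digit { print(digit); }", the empty alternative for rest, and all C names are assumptions made for this example.

```c
#include <ctype.h>
#include <stddef.h>

static const char *lookahead;  /* current token (one character = one token) */
static char *out, *out_end;    /* output buffer */
static int failed;             /* set to 1 on the first error */

static void print(char c)      /* the semantic action */
{
    if (out < out_end)
        *out++ = c;
    else
        failed = 1;
}

static void match(char c)
{
    if (*lookahead == c)
        lookahead++;
    else
        failed = 1;
}

/* term -> digit { print(digit); } */
static void term(void)
{
    if (isdigit((unsigned char)*lookahead)) {
        char d = *lookahead;
        match(d);
        print(d);
    } else {
        failed = 1;
    }
}

/* rest -> + term { print('+'); } rest
         | - term { print('-'); } rest
         | (empty) */
static void rest(void)
{
    if (failed)
        return;
    if (*lookahead == '+') {
        match('+'); term(); print('+'); rest();
    } else if (*lookahead == '-') {
        match('-'); term(); print('-'); rest();
    }
}

/* expr -> term rest */
static void expr(void)
{
    term();
    rest();
}

/* Returns 0 on success, -1 on a syntax error or a too small buffer. */
int infix_to_postfix(const char *src, char *dst, size_t dst_size)
{
    if (dst_size == 0)
        return -1;
    lookahead = src;
    out = dst;
    out_end = dst + dst_size - 1;
    failed = 0;
    expr();
    if (failed || *lookahead != '\0')
        return -1;
    *out = '\0';
    return 0;
}
```

For the input 9-5+2 the buffer ends up containing 95-2+. The operator is printed after the term that follows it, because that is where the action stands in the production.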

2.4 Parsing

So, how does the parser build the parse tree?
Or rather: how does the parser "navigate through the grammar", guided by the source tokens it sees, in such a way that it could build a parse tree?

...............

Recursive-descent parsing = the parser is a program with a procedure (in C: "function") for each non-terminal

current token, lookahead symbol

backtracking

Predictive parsing

Predictive parsing = a form of recursive-descent, with no backtracking

FIRST(some-nonterminal)
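
A small sketch of how FIRST is used, taking "factor -> digit | ( expr )" from the precedence grammar in section 2.2 as the example. The first alternative can only start with a digit and the second only with "(", so one lookahead symbol is enough to choose, and no backtracking is needed. The function names are invented for this illustration.

```c
#include <ctype.h>

/* FIRST(factor) = { 0, 1, ..., 9, ( } */
int in_first_factor(char c)
{
    return isdigit((unsigned char)c) || c == '(';
}

/* Chooses an alternative of "factor -> digit | ( expr )" from the
   lookahead symbol alone.
   Returns 1 for "digit", 2 for "( expr )", 0 for a syntax error. */
int choose_factor_alternative(char lookahead)
{
    if (isdigit((unsigned char)lookahead))
        return 1;
    if (lookahead == '(')
        return 2;
    return 0;
}
```

Since every term and every expr in that grammar must begin with a factor, FIRST(term) and FIRST(expr) are the same set as FIRST(factor).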

Designing a predictive parser

Left-recursion

ASU fig 2.15:

Steps in top-down construction of a parse tree

ASU fig 2.16:

Top-down parsing while scanning the input from left to right


Thomas Padron-McCarthy (Thomas.Padron-McCarthy@tech.oru.se) January 22, 2003