"http://www.w3.org/TR/html4/loose.dtd"> >

Chapter 10
Infix/Prefix/Postfix notation converter

10.1 Introduction

In the previous example, the parser computes the value of the expression on the fly, while parsing. It is also possible to build an abstract syntax tree to store an abstract representation of the input. This may be usefull when several passes are necessary.

This example shows how to parse an expression (infix, prefix or postfix) and convert it in infix, prefix and postfix notation. The expression is saved in a tree. Each node of the tree correspond to an operator in the expression. Each leave is a number. Then to write the expression in infix, prefix or postfix notation, we just need to walk throught the tree in a particular order.

10.2 Abstract syntax trees

The AST of this converter has two types of node:

class Op
is used to store operators (+, -, *, /, ^). It has two sons associated to the sub expressions.
class Atom
is an atomic expression (a number or a symbolic name).

Both classes are instanciated by the __init__ method. The infix, prefix and postfix methods return strings containing the representation of the node in infix, prefix and postfix notation.

10.3 Grammar

10.3.1 Infix expressions

The grammar for infix expressions is similar to the grammar used in the previous example.
EXPR/e -> TERM/e ( '[+-]'/op TERM/t e=Op<op,e,t,1> )* ;  
TERM/t -> FACT/t ( '[*/]'/op FACT/f t=Op<op,t,f,2> )* ;  
FACT/f -> ATOM/f ( '\^'/op FACT/e f=Op<op,f,e,3> )? ;  
 
ATOM/a -> ident/s a=Atom<s> | '\(' EXPR/a '\)' ;

10.3.2 Prefix expressions

The grammar for prefix expressions is very simple. A compound prefix expression is an operator followed by two subexpressions.
EXPR_PRE/e ->  
    ident/s e=Atom<s>  
|   '\(' EXPR_PRE/e '\)'  
|   OP/<op,prec> EXPR_PRE/a EXPR_PRE/b e=Op<op,a,b,prec>  
;  
 
OP/<op,prec> ->  
    '[+-]'/op prec=1  
|   '[*/]'/op prec=2  
|   '\^'/op   prec=3  
;

10.3.3 Postfix expressions

At first sight postfix and infix grammars may be very similar. Only the position of the operators changes. So a compound postfix expression is a first expression followed by a second and an operator. This rule is left recursive. As TPG is a descendant recursive parser, such rules are forbidden to avoid infinite recursion. To remove the left recursion a classical solution is to rewrite the grammar like this:
EXPR_POST/e -> ATOM_POST/a SEXPR_POST<a>/e ;  
 
ATOM_POST/a ->  
    ident/s a=Atom<s>  
|   '\(' EXPR_POST/a '\)'  
;  
 
SEXPR_POST<e>/e ->  
    EXPR_POST/e2 OP/<op,prec> SEXPR_POST<Op<op,e,e2,prec>>/e  
|   ;

The parser first searches for an atomic expression and then builds the AST by passing partial expressions by the attributes of the SEXPR_POST symbol.

10.4 Source code

Here is the complete source code (notation.py):

#!/usr/bin/env python  
 
# Infix/prefix/postfix expression conversion  
 
import tpg  
 
class Op:  
    """ Binary operator """  
    def __init__(self, op, a, b, prec):  
        self.op = op            # operator ("+", "-", "*", "/", "^")  
        self.prec = prec        # precedence of the operator  
        self.a, self.b = a, b   # operands  
    def infix(self):  
        a = self.a.infix()  
        if self.a.prec < self.prec: a = "(%s)"%a  
        b = self.b.infix()  
        if self.b.prec <= self.prec: b = "(%s)"%b  
        return "%s %s %s"%(a, self.op, b)  
    def prefix(self):  
        a = self.a.prefix()  
        b = self.b.prefix()  
        return "%s %s %s"%(self.op, a, b)  
    def postfix(self):  
        a = self.a.postfix()  
        b = self.b.postfix()  
        return "%s %s %s"%(a, b, self.op)  
 
class Atom:  
    """ Atomic expression """  
    def __init__(self, s):  
        self.a = s  
        self.prec = 99  
    def infix(self): return self.a  
    def prefix(self): return self.a  
    def postfix(self): return self.a  
 
exec(tpg.compile(r"""  
 
# Grammar for arithmetic expressions  
 
parser ExpressionParser:  
 
separator space: '\s+';  
token ident: '\w+';  
 
START/<e,t> ->  
    EXPR/e      t='infix'       '\n'  
|   EXPR_PRE/e  t='prefix'      '\n'  
|   EXPR_POST/e t='postfix'     '\n'  
;  
 
# Infix expressions  
 
EXPR/e -> TERM/e ( '[+-]'/op TERM/t e=Op<op,e,t,1> )* ;  
TERM/t -> FACT/t ( '[*/]'/op FACT/f t=Op<op,t,f,2> )* ;  
FACT/f -> ATOM/f ( '\^'/op FACT/e f=Op<op,f,e,3> )? ;  
 
ATOM/a -> ident/s a=Atom<s> | '\(' EXPR/a '\)' ;  
 
# Prefix expressions  
 
EXPR_PRE/e ->  
    ident/s e=Atom<s>  
|   '\(' EXPR_PRE/e '\)'  
|   OP/<op,prec> EXPR_PRE/a EXPR_PRE/b e=Op<op,a,b,prec>  
;  
 
# Postfix expressions  
 
EXPR_POST/e -> ATOM_POST/a SEXPR_POST<a>/e ;  
 
ATOM_POST/a ->  
    ident/s a=Atom<s>  
|   '\(' EXPR_POST/a '\)'  
;  
 
SEXPR_POST<e>/e ->  
    EXPR_POST/e2 OP/<op,prec> SEXPR_POST<Op<op,e,e2,prec>>/e  
|   ;  
 
OP/<op,prec> ->  
    '[+-]'/op prec=1  
|   '[*/]'/op prec=2  
|   '\^'/op   prec=3  
;  
 
"""))  
 
parser = ExpressionParser()  
while 1:  
    e = raw_input(":")  
    if e == "": break  
    try:  
        expr, t = parser(e+"\n")  
    except (tpg.LexicalError, tpg.SyntaxError), e:  
        print e  
    else:  
        print e, "is a", t, "expression"  
        print "\tinfix   :", expr.infix()  
        print "\tprefix  :", expr.prefix()  
        print "\tpostfix :", expr.postfix()