NAME

Parse::Eyapp::Node - The nodes of the Syntax Trees

SYNOPSIS

use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;

sub TERMINAL::info {
  $_[0]{attr}
}

my $grammar = q{
  %right  '='     # Lowest precedence
  %left   '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
  %left   '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
  %left   NEG     # Disambiguate -a-b as (-a)-b and not as -(a-b)
  %tree           # Let us build an abstract syntax tree ...

  %%
  line:
      exp <%name EXPRESION_LIST + ';'>
        { $_[1] } /* list of expressions separated by ';' */
  ;

  /* The %name directive defines the name of the class to which the node being built belongs */
  exp:
      %name NUM
      NUM
    | %name VAR
      VAR
    | %name ASSIGN
      VAR '=' exp
    | %name PLUS
      exp '+' exp
    | %name MINUS
      exp '-' exp
    | %name TIMES
      exp '*' exp
    | %name DIV
      exp '/' exp
    | %name UMINUS
      '-' exp %prec NEG
    | '(' exp ')'
        { $_[2] }  /* Let us simplify a bit the tree */
  ;

  %%
  sub _Error { die "Syntax error near ".($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n" }

  sub _Lexer {
    my($parser)=shift; # The parser object

    for ($parser->YYData->{INPUT}) { # Topicalize
      m{\G\s+}gc;
      $_ eq '' and return('',undef);
      m{\G([0-9]+(?:\.[0-9]+)?)}gc and return('NUM',$1);
      m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return('VAR',$1);
      m{\G(.)}gcs and return($1,$1);
    }
    return('',undef);
  }

  sub Run {
      my($self)=shift;
      $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, );
  }
}; # end grammar

our (@all, $uminus);

Parse::Eyapp->new_grammar( # Create the parser package/class
  input=>$grammar,
  classname=>'Calc', # The name of the package containing the parser
  firstline=>7       # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new();                # Create a parser
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n"; # Set the input
my $t = $parser->Run;                    # Parse it!
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;

# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
    { #  Example of support code
      my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
    }
    constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
      => {
        my $op = $Op{ref($bin)};
        $x->{attr} = eval  "$x->{attr} $op $y->{attr}";
        $_[0] = $NUM[0];
      }
    uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
    zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
    whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
  },
  OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations

$t->s($uminus); # Transform UMINUS nodes
$t->s(@all);    # constant folding and mult. by zero

local $Parse::Eyapp::Node::INDENT=0;
print "\nSyntax Tree after transformations:\n",$t->str,"\n";

Introduction

Parse::Eyapp (Extended yapp) is a collection of modules that extends Francois Desarmenien Parse::Yapp 1.05. Eyapp extends yacc/yapp syntax with functionalities like named attributes, EBNF-like expressions, modifiable default action, automatic syntax tree building, semi-automatic abstract syntax tree building, translation schemes, tree regular expressions, tree transformations, scope analysis support, directed acyclic graphs and a few more.

Basic Concepts

Parsing is the activity of producing a syntax tree from an input stream. The program example in the synopsis section shows an example of parsing. The variable $grammar contains a context free eyapp grammar defining the language of lists of arithmetic expressions. A context free grammar is a mathematical device to define languages. The grammar for the example in the synopsis section is:

line: exp <+ ';'>  
;

exp:
    NUM            
  | VAR         
  | VAR '=' exp
  | exp '+' exp    
  | exp '-' exp 
  | exp '*' exp
  | exp '/' exp 
  | '-'  exp 
  |   '(' exp ')' 
;

A grammar generates a language. A grammar is defined by a set of production rules. A production rule has two components: a left hand side which is a syntactic variable or non terminal and a right hand side which is a phrase made of syntactic variables and terminals. The left hand side (lhs) and the right hand side (rhs) are usually separated by an arrow like in:

exp -> VAR = exp

A note: the production rule

line: exp <+ ';'>

is not really a production rule but an abbreviation for two productions. It stands for:

line : exp
     | line ';' exp
;

A terminal or token never appears on the left hand side of a production rule. The phrases of the language are those obtained succesively applying the production rules of the grammar until no more rules can be applied. The succesive substitutions must start from the start symbol of the grammar (line in the example). Such legal sequence of substitutions is known as a derivation. The following is an example of a legal derivation (the big arrow => is read derives):

line => exp => VAR = exp => VAR = exp + exp => VAR = exp + NUM => VAR = VAR + NUM

thus the phrase VAR = VAR + NUM belongs to the language generated by the former grammar. A derivation like can be seen as a tree. For instance, the former derivation is equivalen (has the same information) than the tree:

line(exp(VAR, '=', exp(exp(VAR), '+',  exp(NUM))))

Such a tree is called a syntax tree for the input VAR = VAR + NUM. A grammar is said to be ambiguous if there are phrases in the generated language that have more than one syntax tree. The grammar in the synopsis example is ambiguous. Here is an alternative tree for the same phrase VAR = VAR + NUM:

line(exp(exp(VAR, '=', exp(VAR)), '+', exp(NUM)))

Parsers created by eyapp do not deal directly with the input. Instead they expect the input to be processed by a lexical analyzer. The lexical analyzer parses the input and produces the next token. A token is a pair. The first component is the name of the token (like NUM or VAR) and the second is its attribute (i.e. ay information associated with the token, like that the value is 4 for a NUM or the identifier is temperature for a VAR). Tokens are usually defined using regular expressions. Thus the token NUM is characterized by /[0-9]+(?:\.[0-9]+)?/ and the token VAR by /[A-Za-z][A-Za-z0-9_]*/. The subroutine _Lexer in the tail section of the SYNOPSIS section is a typical example of a typical lexical analyzer:

sub _Lexer {
  my($parser)=shift; # The parser object

  for ($parser->YYData->{INPUT}) { # Topicalize
    m{\G\s+}gc;     # skip whites
    $_ eq '' and return('',undef);
    m{\G([0-9]+(?:\.[0-9]+)?)}gc and return('NUM',$1);
    m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return('VAR',$1);
    m{\G(.)}gcs and return($1,$1);
  }
  return('',undef);
}

The input was saved in the YYData->{INPUT} section of the $parser object. The for loop is a false for: its goal is to make $_ an alias of $parser->YYData->{INPUT}. To catch the next pattern we use the anchor \G. The \G anchor matches at the point where the previous /g match left off. Normally, when a scalar m{}g match fails, the match position is reset and \G will start matching at the beginning of the string. The c option causes the match position to be retained following an unsuccessful match. The couple ('',undef) signals the end of the input.

Parse::Eyapp can analyze your grammar and produce a parser from your grammar. Actually Parse::Eyapp is a translation scheme analyzer. A translation scheme scheme is a context free grammar where the right hand sides of the productions have been augmented with semantic actions (i.e. with chunks of Perl code):

A -> alpha { action(@_) } beta

The analyzer generated by Eyapp executes { action(@_) } after all the semantic actions asssociated with alpha have been executed and before the execution of any of the semantic actions associated with beta.

Notice that ambiguous grammars produce ambiguous translation schemes: since a phrase may have two syntactic trees it will be more than one tree-traversing and consequently more than one way to execute the embedded semantic actions. Certainly different execution orders will usually produce different results. Thus, syntactic ambiguities translate onto semantic ambiguities. That is why it is important to resolve all the ambiguities and conflicts that may arise in our grammar. This is the function of the %left and %right declarations on the header section:

my $grammar = q{
  # header section
  %right  '='     # Lowest precedence
  %left   '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
  %left   '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
  %left   NEG     # Disambiguate -a-b as (-a)-b and not as -(a-b)
  %tree           # Let us build an abstract syntax tree ...

  %%
  .... # body section
  %%
  .... # tail section
};

Priority can be assigned to tokens by using the %left and %right declarations. Tokens in lines below have more precedence than tokens in line above. By giving token '+' more precedence than token '=' we solve the ambiguity for phrases like VAR = VAR + NUM. The tree

line(exp(VAR, '=', exp(exp(VAR), '+',  exp(NUM))))

will be built, discarding the other tree. Since priority means earlier evaluation and the evaluation of semantic actions is bottom up, the deeper the associated subtree the higher the priority.

In a translation scheme the embedded actions modify the attributes associated with the symbols of the grammar.

A -> alpha { action(@_) } beta

Each symbol on the right hand side of a production rule has an associated scalar attribute. In eyapp the attributes of the symbol to the left of action are passed as arguments to action (in the example, those of alpha). These arguments are preceded by a reference to the syntax analyzer object. There is no way inside an ordinary eyapp program for an intermediate action to access the attributes of the symbols on its right, i.e. those associated with the symbols of beta. This restriction is lifted if you use the %metatree directive to build a full translation scheme. See the section "Translation Schemes and the %metatree directive" in Parse::Eyapp to know more about full translation schemes.

Actions on the right hand side counts as symbols and so they can be referenced by its positional arument in later actions in the same production rule. For intermediate actions, the value returned by the action is the attribute associated with such action. For an action at the end of the rule:

A -> alpha { lastaction(@_) } 

the returned value constitutes the attribute of the left hand side of the rule (the attribute of A in this case). The action at the end of the right hand side is called the action associated with the production rule. When no explicit action has been associated with a production rule the default action applies. In Parse::Eyapp the programmer can define what is the default action.

A very special action is "build the node associated with this production rule" which is performed by the YYBuildAST method of the parser object:

%default action { goto &Parse::Eyapp::Driver::YYBuildAST }

The %tree directive used in the SYNOPSIS example is an abbreviation for this and has the effect of building an abstract syntax tree for the input.

The call to

Parse::Eyapp->new_grammar( # Create the parser package/class
   input=>$grammar,
   classname=>"Calc", # The name of the package containing the parser
 );

compiles $grammar and produces a new class Calc containing a LALR parser for such grammar. The call

$parser = Calc->new()

creates a parser object for the language generated by $grammar. Using the YYParse of the parser object:

$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, )

YYParse is called with arguments a reference to the lexical analyzer and a reference to the error diagnostic subroutine _Error. Such subroutine will be called by YYParse when an error occurs. Is therefore convenient to give a meaningful diagnostic:

sub _Error { 
  die "Syntax error near "
  .($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n" 
}

The parser method YYCurval returns the value of the current token. A more accurate error diagnostic subroutine can be obtained if the lexical analyzer is modified so that tokens keep the line number where they start (i.e. the token is a pair (TOKENNAME, [ ATTRIBUTE, LINENUMBER]). In such case the _Error subroutine can be rewritten as:

sub _Error {
  my($token)=$_[0]->YYCurval;
  my($what)= $token ? "input: '$token->[0]' in line $token->[1]" : "end of input";
  my @expected = $_[0]->YYExpect();
  my $expected = @expected? "Expected one of these tokens: '@expected'":"";

  croak "Syntax error near $what. $expected\n";
}

The YYExpect method returns the set of tokens that were expected when the error occurred.

The input in

$parser->YYData->{INPUT} 

is then analyzed by YYParse and an abstract syntax tree is built. The tree rooted on a Parse::Eyapp::Node can be displayed using the method str:

local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;

The following is the description of the syntax tree produced by the call $t->str for the list of expressions "2*-3+b*0;--2\n";:

pl@nereida:~/LEyapp/examples$ synopsis.pl
Syntax Tree:
EXPRESION_LIST(
  PLUS(
    TIMES(
      NUM(
        TERMINAL[2]
      ),
      UMINUS(
        NUM(
          TERMINAL[3]
        )
      ) # UMINUS
    ) # TIMES,
    TIMES(
      VAR(
        TERMINAL[b]
      ),
      NUM(
        TERMINAL[0]
      )
    ) # TIMES
  ) # PLUS,
  UMINUS(
    UMINUS(
      NUM(
        TERMINAL[2]
      )
    ) # UMINUS
  ) # UMINUS
) # EXPRESION_LIST

Did you notice that the TERMINAL nodes appear decorated with its attribute? This is because each time the Parse::Eyapp::Node method str visits a node checks if the node has a method info (i.e. $node->can(info)). If so, the info method is called and the string returned is concatenated in the description string. This is the reason for these three lines at the beginning of the SYNOPSIS example:

sub TERMINAL::info {
  $_[0]{attr}
}

Parse::Eyapp not only gives support to parsing but to later phases of the translation process: tree transformations and scope analysis (scope analysis is the task to find which definition applies to an use of an object in the source). The program in the synopsis section shows an example of a tree transformation specification. Tree transformations are specified using a language called Tree regular expressions. The transformation object is created by the constructor Parse::Eyapp::Treeregexp->new.

my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
    { #  Example of support code
      my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
    }
    constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
      => {
        my $op = $Op{ref($bin)};
        $x->{attr} = eval  "$x->{attr} $op $y->{attr}";
        $_[0] = $NUM[0];
      }
    uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
    zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
    whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
  },
);

The set of transformations specified in the example are

  • The transformation constantfold produces constant folding i.e. trees of expressions like 3*2+4 are reduced to the tree for 10

    { #  Example of support code
      my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
    }
    constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
      => {
        my $op = $Op{ref($bin)};
        $x->{attr} = eval  "$x->{attr} $op $y->{attr}";
        $_[0] = $NUM[0];
      }

    Here constantfold is the name of the transformation. The treeregexp compiler will produce an object $constantfold implementing the transformation. After the name comes the tree pattern:

    /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))

    It matches any subtree rooted in any node belonging to one of these classes: TIMES or PLUS or DIV or MINUS that has two children belonging to the NUM class. The Perl code after the big arrow is executed on any matching subtree. We can refer to the root of the subtree using the variable $bin. We can also refer to the child of the first NUM node using $x. In the same way $y refers to the child of the second NUM node. Since there are two NUM nodes in the pattern, we refer to them inside the transformation part using the array @NUM:

    $_[0] = $NUM[0];

    The action uses and eval and the hash %Op to compute the corresponding reduction of the two nodes. The hash %Op was defined in a previous section containing support code. You can insert in any place of a treeregexp program such support code by surrounding it with curly brackets. The subtree that matched (that is in $_[0]) is substituted by its left child:

    $_[0] = $NUM[0];
  • The transformations zero_times_whatever and whatever_times_zero produce the simplification of trees corresponding to multiplications by zero. Trees for expressions like (a+b)*0 or 0*(b-4) are reduced to the tree for 0.

    zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }

    Here zero_times_whatever is the name of the transformation. The pattern TIMES(NUM($x), .) matches any TIMES node with two children and whose first child belongs to the NUM class. The dot matches any subtree, indicating that we don't care what sort of tree the right child is. The third component

    { $x->{attr} == 0 }

    is the semantic pattern. If both the shape pattern and the semantic pattern apply the action after the arrow is applied. The subtrees is substituted by its left child.

  • The transformation uminus simplifies the tree for unary minus of constant expressions.

    uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }

    It matches trees rooted in a UMINUS node whose only child is a NUM node. In such case the sign of the number that is the attribute of the TERMINAL node is changed and the tree is substituted by its single child.

The call

$p->generate(); 

compiles the transformation specification producing a set of transformations $constantfold, $zero_times_whatever, whatever_times_zero and $uminus. Transformations are Parse::Eyapp::YATW objects. The list variable @all refer to the whole set of Parse::Eyapp::YATW transformations.

The nodes of the abstract syntax tree are objects. The class (NUM, TIMES, UMINUS, etc.) defines the type of node. All node classes inherit from the class Parse::Eyapp::Node. Parse::Eyapp::Node provides a set of methods to manipulate nodes. Among these methods are str, m and s. The m and s methods resemble the matching and substitution operators for regular expressions. But instead of regular expressions they work with tree transformations or treeregexps or, more precisely with Parse::Eyapp::YATW objects. By calling:

$t->s($uminus);

subtrees like

UMINUS(UMINUS(NUM(TERMINAL[2])))

are simplified to

NUM(TERMINAL[2])

The call to

$t->s(@all);

applies the whole set of transformations. The transformations in @all are iteratively applied to the tree $t until no transformation succeeds: Yes, that means that a inappropriate set of transformations my hang your program.

Thus, the former syntax tree for "2*-3+b*0;--2\n"; becomes:

EXPRESION_LIST(NUM(TERMINAL[-6]),NUM(TERMINAL[2]))

The analyzer has been able to optimize - at compile time - the computation of these two expressions

2*-3+b*0;
--2

reducing them to the computation of:

-6;
 2

Parse::Eyapp::Node Methods

The Parse::Eyapp::Node objects represent the nodes of the syntax tree. All the node classes build by %tree and %metatree directives inherit from Parse::Eyapp::Node and consequently have acces to the methods provided in such module.

Parse::Eyapp::Node->new

Nodes are usually created using the %tree or %metatree Parse::Eyapp directives. The Parse::Eyapp::Node constructor new offers an alternative way to create forests.

This class method can be used to build multiple nodes on a row. It receives a string describing the tree and optionally a reference to a subroutine. Such subroutine (called the attribute handler) is in charge to initialize the attributes of the just created nodes. The attribute handler is called with the array of references to the nodes as they appear in the string from left to right.

Parse::Eyapp::Node->new returns an array of pointers to the nodes created as they appear in the input string from left to right. In scalar context returns a pointer to the first of these trees.

The following example (see file examples/28foldwithnewwithvars.pl) of a treeregexp transformation creates a new NUM(TERMINAL) node using Parse::Eyapp::Node->new:

my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
  {
    my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
  }
  constantfold: /TIMES|PLUS|MINUS|DIV/(NUM($x), NUM($y))
     => {
    my $op = $Op{ref($_[0])};

    my $res = Parse::Eyapp::Node->new(
      q{NUM(TERMINAL)},
      sub {
        my ($NUM, $TERMINAL) = @_;
        $TERMINAL->{attr} = eval "$x->{attr} $op $y->{attr}";
        $TERMINAL->{token} = 'NUM';
      },
    );
    $_[0] = $res;
  }
  },
);

The string can describe more than one tree like in:

my @t = Parse::Eyapp::Node->new(
         'A(C,D) E(F)', sub { my $i = 0; $_->{n} = $i++ for @_ });

The following trees will be built:

bless( { 'n' => 0,
  'children' => [
    bless( { 'n' => 1, 'children' => [] }, 'C' ),
    bless( { 'n' => 2, 'children' => [] }, 'D' )
  ]
}, 'A' );
bless( { 'n' => 3,
  'children' => [
    bless( { 'n' => 4, 'children' => [] }, 'F' )
  ]
}, 'E' );

and @t will contain 5 references to the corresponding subtrees A(C,D), C, D, E(F) and F.

Directed Acyclic Graphs with Parse::Eyapp::Node->hnew

Parse::Eyapp provides the method Parse::Eyapp::Node->hnew to build Directed Acyclic Graphs (DAGs) instead of trees. They are built using hashed consing, i.e. memoizing the creation of nodes. It works very much like Parse::Eyapp::Node->new but if one of the implied trees was previously built, hnew returns a reference to the existing one. See the following debugger session where several DAGs describing type expressions are built:

 DB<2> x $a = Parse::Eyapp::Node->hnew('F(X_3(A_3(A_5(INT)), CHAR, A_5(INT)),CHAR)')
0  F=HASH(0x85f6a20)
   'children' => ARRAY(0x85e92e4)
   |- 0  X_3=HASH(0x83f55fc)
   |     'children' => ARRAY(0x83f5608)
   |     |- 0  A_3=HASH(0x85a0488)
   |     |     'children' => ARRAY(0x859fad4)
   |     |        0  A_5=HASH(0x85e5d3c)
   |     |           'children' => ARRAY(0x83f4120)
   |     |              0  INT=HASH(0x83f5200)
   |     |                 'children' => ARRAY(0x852ccb4)
   |     |                      empty array
   |     |- 1  CHAR=HASH(0x8513564)
   |     |     'children' => ARRAY(0x852cad4)
   |     |          empty array
   |     `- 2  A_5=HASH(0x85e5d3c)
   |           -> REUSED_ADDRESS
   `- 1  CHAR=HASH(0x8513564)
         -> REUSED_ADDRESS
 DB<3> x $a->str
0  'F(X_3(A_3(A_5(INT)),CHAR,A_5(INT)),CHAR)'

The second occurrence of A_5(INT) is labelled REUSED_ADDRESS. The same occurs with the second instance of CHAR. Parse::Eyapp::Node->hnew can be more convenient than new when dealing with optimizations like common subexpressions or during type checking. See file examples/Types.eyp for a more comprehensive example.

$node->type

Returns the type of the node. It can be called as a sub when $node is not a Parse::Eyapp::Node like this:

Parse::Eyapp::Node::type($scalar)

This is the case when visiting CODE nodes.

The following session with the debugger illustrates how it works:

> perl -MParse::Eyapp::Node -de0
DB<1> @t = Parse::Eyapp::Node->new("A(B,C)") # Creates a tree
DB<2> x map { $_->type } @t # Get the types of the three nodes
0  'A'
1  'B'
2  'C'
DB<3> x Parse::Eyapp::Node::type(sub {})
0  'CODE'
DB<4> x Parse::Eyapp::Node::type("hola")
0  'Parse::Eyapp::Node::STRING'
DB<5> x Parse::Eyapp::Node::type({ a=> 1})
0  'HASH'
DB<6> x Parse::Eyapp::Node::type([ a, 1 ])
0  'ARRAY'

As it is shown in the example it can be called as a subroutine with a (CODE/HASH/ARRAY) reference or an ordinary scalar.

The words HASH, CODE, ARRAY and STRING are reserved for ordinary Perl references. Avoid naming a node with one of those words.

$node->child

Setter-getter to modify a specific child of a node. It is called like:

$node->child($i)

Returns the child with index $i. Returns undef if the child does not exists. It has two obligatory parameters: the node (since it is a method) and the index of the child. Sets the new value if called

$node->child($i, $tree)

The method will croak if the obligatory parameters are not provided. Follows an example of use inside a Treereg program (see file examples/TSwithtreetransformations2.eyp) that swaps the children of a PLUS node:

my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
   commutative_add: PLUS($x, ., $y, .) # 1st . = '+' 2nd . = CODE
     => { my $t = $x; $_[0]->child(0, $y); $_[0]->child(2, $t)}
}

Child Access Through %tree alias

Remember that when the Eyapp program runs under the %tree alias directive The dot and dollar notations can be used to generate named getter-setters to access the children:

%tree bypass alias
.... 
%%
exp: %name PLUS
       exp.left '+' exp.right
....
%%
.... # and later
print $exp->left->str;

Here methods with names left and right will be created to access the corresponding children associated with the two instances of exp in the right hand side of the production rule.

$node->children

Returns the array of children of the node. When the tree is a translation scheme the CODE references are also included. See examples/TSPostfix3.eyp for an example of use inside a Translation Scheme:

pl@nereida:~/src/perl/YappWithDefaultAction/examples$\
                      sed -ne '31,34p' TSPostfix3.eyp
line: %name PROG
       exp <%name EXP + ';'>
         { @{$lhs->{t}} = map { $_->{t}} ($_[1]->children()); }

The tree in a Translation Scheme contains the references to the CODE implementing the semantic actions. For example, the syntax tree built by the parser for the input a=-b*3 in TSPostfix3.eyp is:

PROG(EXP(
    ASSIGN(
      TERMINAL[a],
      TERMINAL[=],
      TIMES(
        NEG(TERMINAL[-], VAR(TERMINAL[b], CODE), CODE),
        TERMINAL[*],
        NUM(TERMINAL[3], CODE),
        CODE
      ) # TIMES,
      CODE
    ) # ASSIGN
  ) # EXP,
  CODE
) # PROG

$node->children can also be used as a setter.

$node->Children

Returns the array of children of the node. When dealing with a translation scheme, the $node->Children method (first in uppercase) returns the non CODE children of the node.

$node->last_child

Return the last child of the node. When dealing with translation schemes, the last can be a CODE node.

$node->Last_child

The $node->Last_child method returns the last non CODE child of the node. See an example:

line:       %name EXP
              exp <+ ';'> /* Expressions separated by semicolons */
          { $lhs->{n} = $_[1]->Last_child->{n} }
;

$node->descendant

The descendant method returns the descendant of a node given its coordinates. The coordinates of a node $s relative to a tree $t to which it belongs is a string of numbers separated by dots like ".1.3.2" which denotes the child path from $t to $s, i.e. $s == $t->child(1)->child(3)->child(2).

See a session with the debugger:

  DB<7> x $t->child(0)->child(0)->child(1)->child(0)->child(2)->child(1)->str
0  '
BLOCK[8:4:test]^{0}(
  CONTINUE[10,10]
)
  DB<8> x $t->descendant('.0.0.1.0.2.1')->str
0  '
BLOCK[8:4:test]^{0}(
  CONTINUE[10,10]

$node->str

The str method returns a string representation of the tree. The str method traverses the syntax tree dumping the type of the node being visited in a string. To be specific the value returned by the function referenced by $CLASS_HANDLER will be dumped. The default value fo such function is to return the type of the node. If the node being visited has a method info it will be executed and its result inserted between $DELIMITERs into the string. Thus, in the "SYNOPSIS" example, by adding the info method to the class TERMINAL:

sub TERMINAL::info {
  $_[0]{attr}
}

we achieve the insertion of attributes in the string being built by str.

The existence of some methods (like footnote) and the values of some package variables influence the behavior of str. Among the most important are:

@PREFIXES = qw(Parse::Eyapp::Node::);                                 # Prefixes to supress 
$INDENT = 0; # -1 compact, no info, no footnotes 
             # 0 = compact, 1 = indent, 2 = indent and include Types in closing parenthesis
$STRSEP = ',';                                # Separator between nodes, by default a comma
$DELIMITER = '[';                         # The string returned by C<info> will be enclosed 
$FOOTNOTE_HEADER = "\n---------------------------\n"; 
$FOOTNOTE_SEP = ")\n"; 
$FOOTNOTE_LEFT = '^{';                               # Left delimiter for a footnote number
$FOOTNOTE_RIGHT = '}';                              # Right delimiter for a footnote number
$LINESEP = 4;                             # When indent=2 the enclosing parenthesis will be
                                          # commented if more than $LINESEP apart
$CLASS_HANDLER = sub { type($_[0]) }; # What to print to identify the node

Footnotes and attribute info will not be inserted when $INDENT is -1. A compact representation will be obtained. Such representation can be feed to new or hnew to obtain a copy of the tree. See the following session with the debugger:

pl@nereida:~/LEyapp$ perl -MParse::Eyapp::Node -wde 0
main::(-e:1):   0
  DB<1> $x = Parse::Eyapp::Node->new('A(B(C,D),D)', sub { $_->{order} = $i++ for @_; })
  DB<2> *A::info = *B::info = *C::info = *D::info = sub { shift()->{order} }
  DB<3> p $x->str
A[0](B[1](C[2],D[3]),D[4])
  DB<4> $Parse::Eyapp::Node::INDENT=-1
  DB<5> p $x->str
A(B(C,D),D)
  DB<6> x Parse::Eyapp::Node->hnew($x->str)
0  A=HASH(0x8574704)
   'children' => ARRAY(0x85745d8)
      0  B=HASH(0x857468c)
         'children' => ARRAY(0x8574608)
            0  C=HASH(0x85745b4)
               'children' => ARRAY(0x8509670)
                    empty array
            1  D=HASH(0x8574638)
               'children' => ARRAY(0x857450c)
                    empty array
      1  D=HASH(0x8574638)
         -> REUSED_ADDRESS
1  B=HASH(0x857468c)
   -> REUSED_ADDRESS
2  C=HASH(0x85745b4)
   -> REUSED_ADDRESS
3  D=HASH(0x8574638)
   -> REUSED_ADDRESS
4  D=HASH(0x8574638)
   -> REUSED_ADDRESS

The following list defines the $DELIMITERs you can choose for attribute representation:

'[' => ']', '{' => '}', '(' => ')', '<' => '>'

If the node being visited has a method footnote, the string returned by the method will be concatenated at the end of the string as a footnote. The variables $FOOTNOTE_LEFT and $FOOTNOTE_RIGHT govern the displaying of footnote numbers.

Follows an example of output using footnotes.

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/script> \
                                         usetypes.pl prueba24.c
PROGRAM^{0}(FUNCTION[f]^{1}(RETURNINT(TIMES(INUM(TERMINAL[2:2]),VAR(TERMINAL[a:2])))))
---------------------------
0)
Types:
$VAR1 = {
  'CHAR' => bless( {
    'children' => []
  }, 'CHAR' ),
  'VOID' => bless( {
    'children' => []
  }, 'VOID' ),
  'INT' => bless( {
    'children' => []
  }, 'INT' ),
  'F(X_1(INT),INT)' => bless( {
    'children' => [
      bless( {
        'children' => [
          $VAR1->{'INT'}
        ]
      }, 'X_1' ),
      $VAR1->{'INT'}
    ]
  }, 'F' )
};
Symbol Table:
$VAR1 = {
  'f' => {
    'type' => 'F(X_1(INT),INT)',
    'line' => 1
  }
};

---------------------------
1)
$VAR1 = {
  'a' => {
    'type' => 'INT',
    'param' => 1,
    'line' => 1
  }
};

The first footnote was due to a call to PROGRAM:footnote. The footnote method for the PROGRAM node was defined as:

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                            sed -n -e '691,696p' Types.eyp | cat -n
    1  sub PROGRAM::footnote {
    2    return "Types:\n"
    3           .Dumper($_[0]->{types}).
    4           "Symbol Table:\n"
    5           .Dumper($_[0]->{symboltable})
    6  }

The second footnote was produced by the existence of a FUNCTION::footnote method:

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                           sed -n -e '702,704p' Types.eyp | cat -n
1  sub FUNCTION::footnote {
2    return Dumper($_[0]->{symboltable})
3  }

The source program for the example was:

1  int f(int a) {
2    return 2*a;
3  }

$node->equal

A call $tree1->equal($tree2) compare the two trees $tree1 and $tree2. Two trees are considered equal if their root nodes belong to the same class, they have the same number of children and the children are (recursively) equal.

Additionally to the two trees the programmer can specify pairs attribute_key => equality_handler:

$tree1->equal($tree2, attr1 => \&handler1, attr2 => \&handler2, ...)

In such case the definition of equality is more restrictive: Two trees are considered equal if

  • Their root nodes belong to the same class,

  • They have the same number of children

  • For each of the specified attributes occur that for both nodes the existence and definition of the key is the same

  • Assuming the key exists and is defined for both nodes, the equality handlers return true for each of its attributes and

  • The respective children are (recursively) equal.

An attribute handler receives as arguments the values of the attributes of the two nodes being compared and must return true if, and only if, these two attributes are considered equal. Follows an example:

pl@nereida:~/LEyapp/examples$ cat equal.pl
#!/usr/bin/perl -w
use strict;
use Parse::Eyapp::Node;

my $string1 = shift || 'ASSIGN(VAR(TERMINAL))';
my $string2 = shift || 'ASSIGN(VAR(TERMINAL))';
my $t1 = Parse::Eyapp::Node->new($string1, sub { my $i = 0; $_->{n} = $i++ for @_ });
my $t2 = Parse::Eyapp::Node->new($string2);

# Without attributes
if ($t1->equal($t2)) {
  print "\nNot considering attributes: Equal\n";
}
else {
  print "\nNot considering attributes: Not Equal\n";
}

# Equality with attributes
if ($t1->equal($t2, n => sub { return $_[0] == $_[1] })) {
  print "\nConsidering attributes: Equal\n";
}
else {
  print "\nConsidering attributes: Not Equal\n";
}

When the former program is run without arguments produces the following output:

pl@nereida:~/LEyapp/examples$ equal.pl

Not considering attributes: Equal

Considering attributes: Not Equal

Using equal During Testing

During the development of your compiler you add new stages to the existing ones. The consequence is that the AST is decorated with new attributes. Unfortunately, this implies that tests you wrote using is_deeply and comparisons against formerly correct abstract syntax trees are no longer valid. This is due to the fact that is_deeply requires both tree structures to be equivalent in every detail and that our new code produces a tree with new attributes.

Instead of is_deeply use the equal method to check for partial equivalence between abstract syntax trees. You can follow these steps:

  • Dump the tree for the source inserting Data::Dumper statements

  • Carefully check that the tree is really correct

  • Decide which attributes will be used for comparison

  • Write the code for the expected value editing the output produced by Data::Dumper

  • Write the handlers for the attributes you decided. Write the comparison using equal.

Tests using this methodology will not fail even if later code decorating the AST with new attributes is introduced. See an example:

pl@nereida:~/LEyapp/examples$ cat -n testequal.pl
   1  #!/usr/bin/perl -w
   2  use strict;
   3  use Parse::Eyapp::Node;
   4  use Data::Dumper;
   5  use Data::Compare;
   6
   7  my $debugging = 0;
   8
   9  my $handler = sub {
  10    print Dumper($_[0], $_[1]) if $debugging;
  11    Compare($_[0], $_[1])
  12  };
  13
  14  my $t1 = bless( {
  15                   'types' => {
  16                                'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
  17                                'VOID' => bless( { 'children' => [] }, 'VOID' ),
  18                                'INT' => bless( { 'children' => [] }, 'INT' ),
  19                                'F(X_0(),INT)' => bless( {
  20                                   'children' => [
  21                                      bless( { 'children' => [] }, 'X_0' ),
  22                                      bless( { 'children' => [] }, 'INT' ) ]
  23                                 }, 'F' )
  24                              },
  25                   'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
  26                   'lines' => 2,
  27                   'children' => [
  28                                   bless( {
  29                                            'symboltable' => {},
  30                                            'fatherblock' => {},
  31                                            'children' => [],
  32                                            'depth' => 1,
  33                                            'parameters' => [],
  34                                            'function_name' => [ 'f', 1 ],
  35                                            'symboltableLabel' => {},
  36                                            'line' => 1
  37                                          }, 'FUNCTION' )
  38                                 ],
  39                   'depth' => 0,
  40                   'line' => 1
  41                 }, 'PROGRAM' );
  42  $t1->{'children'}[0]{'fatherblock'} = $t1;
  43
  44  # Tree similar to $t1 but without some attributes (line, depth, etc.)
  45  my $t2 = bless( {
  46                   'types' => {
  47                                'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
  48                                'VOID' => bless( { 'children' => [] }, 'VOID' ),
  49                                'INT' => bless( { 'children' => [] }, 'INT' ),
  50                                'F(X_0(),INT)' => bless( {
  51                                   'children' => [
  52                                      bless( { 'children' => [] }, 'X_0' ),
  53                                      bless( { 'children' => [] }, 'INT' ) ]
  54                                 }, 'F' )
  55                              },
  56                   'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
  57                   'children' => [
  58                                   bless( {
  59                                            'symboltable' => {},
  60                                            'fatherblock' => {},
  61                                            'children' => [],
  62                                            'parameters' => [],
  63                                            'function_name' => [ 'f', 1 ],
  64                                          }, 'FUNCTION' )
  65                                 ],
  66                 }, 'PROGRAM' );
  67  $t2->{'children'}[0]{'fatherblock'} = $t2;
  68
  69  # Tree similar to $t1 but without some attributes (line, depth, etc.)
  70  # and without the symboltable and types attributes used in the comparison
  71  my $t3 = bless( {
  72                   'types' => {
  73                                'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
  74                                'VOID' => bless( { 'children' => [] }, 'VOID' ),
  75                                'INT' => bless( { 'children' => [] }, 'INT' ),
  76                                'F(X_0(),INT)' => bless( {
  77                                   'children' => [
  78                                      bless( { 'children' => [] }, 'X_0' ),
  79                                      bless( { 'children' => [] }, 'INT' ) ]
  80                                 }, 'F' )
  81                              },
  82                   'children' => [
  83                                   bless( {
  84                                            'symboltable' => {},
  85                                            'fatherblock' => {},
  86                                            'children' => [],
  87                                            'parameters' => [],
  88                                            'function_name' => [ 'f', 1 ],
  89                                          }, 'FUNCTION' )
  90                                 ],
  91                 }, 'PROGRAM' );
  92
  93  $t3->{'children'}[0]{'fatherblock'} = $t2;
  94
  95  # Without attributes
  96  if (Parse::Eyapp::Node::equal($t1, $t2)) {
  97    print "\nNot considering attributes: Equal\n";
  98  }
  99  else {
 100    print "\nNot considering attributes: Not Equal\n";
 101  }
 102
 103  # Equality with attributes
 104  if (Parse::Eyapp::Node::equal(
 105        $t1, $t2,
 106        symboltable => $handler,
 107        types => $handler,
 108      )
 109     ) {
 110        print "\nConsidering attributes: Equal\n";
 111  }
 112  else {
 113    print "\nConsidering attributes: Not Equal\n";
 114  }
 115
 116  # Equality with attributes
 117  if (Parse::Eyapp::Node::equal(
 118        $t1, $t3,
 119        symboltable => $handler,
 120        types => $handler,
 121      )
 122     ) {
 123        print "\nConsidering attributes: Equal\n";
 124  }
 125  else {
 126    print "\nConsidering attributes: Not Equal\n";
 127  }

The code defining tree $t1 was obtained from an output using Data::Dumper. The code for trees $t2 and $t3 was written using cut-and-paste from $t1. They have the same shape than $t1 but differ in their attributes. Tree $t2 shares with $t1 the attributes symboltable and types used in the comparison and so equal returns true when compared. Since $t3 differs from $t1 in the attributes symboltable and types the call to equal returns false.

$node->delete

The $node->delete($child) method is used to delete the specified child of $node. The child to delete can be specified using the index or a reference. It returns the deleted child.

Throws an exception if the object can't do children or has no children. See also the delete method of treeregexes (Parse::Eyapp:YATW objects) to delete the node being visited.

The following example moves out of a loop an assignment statement assuming is an invariant of the loop. To do it, it uses the delete and insert_before methods:

nereida:~/src/perl/YappWithDefaultAction/examples> \
            sed -ne '98,113p' moveinvariantoutofloopcomplexformula.pl
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
  moveinvariant: BLOCK(
                   @prests,
                   WHILE(VAR($b), BLOCK(@a, ASSIGN($x, NUM($e)), @c)),
                   @possts
                 )
    => {
         my $assign = $ASSIGN;
         $BLOCK[1]->delete($ASSIGN);
         $BLOCK[0]->insert_before($WHILE, $assign);
       }
  },
  FIRSTLINE => 99,
);
$p->generate();
$moveinvariant->s($t);

The example below deletes CODE nodes from the tree build for a translation scheme:

my $transform = Parse::Eyapp::Treeregexp->new( 
  STRING=>q{
    delete_code: CODE => { Parse::Eyapp::Node::delete($CODE) }
  },
)

Observe how delete is called as a subroutine.

$node->unshift($newchild)

Inserts $newchild at the beginning of the list of children of $node. See also the unshift method for Parse::Eyapp:YATW treeregexp transformation objects

$node->push($newchild)

Inserts $newchild at the end of the list of children of $node.

$node->insert_before($position, $new_child)

Inserts $newchild before $position in the list of children of $node. Variable $position can be an index or a reference.

The method throws an exception if $position is an index and is not in range. Also if $node has no children.

The method throws a warning if $position is a reference and does not define an actual child. In such case $new_child is not inserted.

See also the insert_before method for Parse::Eyapp:YATW treeregexp transformation objects

$node->insert_after($position, $new_child)

Inserts $newchild after $position in the list of children of $node. Variable $position can be an index or a reference.

The method throws an exception if $position is an index and is not in the range of $node-children>.

The method throws a warning if $position is a reference and does not exists in the list of children. In such case $new_child is not inserted.

$node->translation_scheme

Traverses $node. Each time a CODE node is visited the subroutine referenced is called with arguments the node and its children. Usually the code will decorate the nodes with new attributes or will update existing ones. Obviously this method does nothing for an ordinary AST. It is used after compiling an Eyapp program that makes use of the %metatree directive.

$node->bud

Bottom-up decorator. The tree is traversed bottom-up. The set of transformations is applied to each node in the order supplied by the user. As soon as one succeeds no more transformations are applied. For an example see the files examples/Types.eyp and examples/Trans.trg. The code below shows an extract of the type-checking phase of a toy-example compiler:

nereida:~/src/perl/YappWithDefaultAction/examples> \
                        sed -ne '600,611p' Types.eyp
 my @typecheck = (
   our $inum,
   our $charconstant,
   our $bin,
   our $arrays,
   our $assign,
   our $control,
   our $functioncall,
   our $statements,
 );

 $t->bud(@typecheck);

As an example of the appearance of the treeregexp transformations involved in the former call, see the code of the $control treeregexp transformation:

nereida:~/src/perl/YappWithDefaultAction/examples> \
                        sed -ne '183,192p' Trans.trg
control: /IF|IFELSE|WHILE/:con($bool)
  => {
    $bool = char2int($con, 0) if $bool->{t} == $CHAR;
      type_error("Condition must have integer type!", $bool->line)
    unless $bool->{t} == $INT;

    $con->{t} = $VOID;

    return 1;
  }

Parse::Eyapp:YATW Methods

Parse::Eyapp:YATW objects represent tree transformations. They carry the information of what nodes match and how to modify them.

Parse::Eyapp::YATW->new

Builds a treeregexp transformation object. Though usually you build a transformation by means of Treeregexp programs you can directly invoke the method to build a tree transformation. A transformation object can be built from a function that conforms to the YATW tree transformation call protocol (see the section "The YATW Tree Transformation Call Protocol"). Follows an example (file examples/12ts_simplify_with_s.pl):

nereida:~/src/perl/YappWithDefaultAction/examples> \
       sed -ne '68,$p' 12ts_simplify_with_s.pl | cat -n
 1  sub is_code {
 2    my $self = shift; # tree
 3
 4    # After the shift $_[0] is the father, $_[1] the index
 5    if ((ref($self) eq 'CODE')) {
 6      splice(@{$_[0]->{children}}, $_[1], 1);
 7      return 1;
 8    }
 9    return 0;
10  }
11
12  Parse::Eyapp->new_grammar(
13    input=>$translationscheme,
14    classname=>'Calc',
15    firstline =>7,
16  );
17  my $parser = Calc->new();                # Create the parser
18
19  $parser->YYData->{INPUT} = "2*-3\n";  print "2*-3\n"; # Set the input
20  my $t = $parser->Run;                    # Parse it
21  print $t->str."\n";
22  my $p = Parse::Eyapp::YATW->new(PATTERN => \&is_code);
23  $p->s($t);
24  { no warnings; # make attr info available only for this display
25    local *TERMINAL::info = sub { $_[0]{attr} };
26    print $t->str."\n";
27  }

After the Parse::Eyapp::YATW object $p is built at line 22 the call to method $p->s($t) applies the transformation is_code using a bottom-up traversing of the tree $t. The achieved effect is the elimination of CODE references in the translation scheme tree. When executed the former code produces:

nereida:~/src/perl/YappWithDefaultAction/examples> 12ts_simplify_with_s.pl
2*-3
EXP(TIMES(NUM(TERMINAL,CODE),TERMINAL,UMINUS(TERMINAL,NUM(TERMINAL,CODE),CODE),CODE),CODE)
EXP(TIMES(NUM(TERMINAL[2]),TERMINAL[*],UMINUS(TERMINAL[-],NUM(TERMINAL[3]))))

The file foldrule6.pl in the examples/ distribution directory gives you another example:

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n foldrule6.pl
  1  #!/usr/bin/perl -w
  2  use strict;
  3  use Rule6;
  4  use Parse::Eyapp::YATW;
  5
  6  my %BinaryOperation = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
  7
  8  sub set_terminfo {
  9    no warnings;
 10    *TERMINAL::info = sub { $_[0]{attr} };
 11  }
 12  sub is_foldable {
 13    my ($op, $left, $right);
 14    return 0 unless defined($op = $BinaryOperation{ref($_[0])});
 15    return 0 unless ($left = $_[0]->child(0), $left->isa('NUM'));
 16    return 0 unless ($right = $_[0]->child(1), $right->isa('NUM'));
 17
 18    my $leftnum = $left->child(0)->{attr};
 19    my $rightnum = $right->child(0)->{attr};
 20    $left->child(0)->{attr} = eval "$leftnum $op $rightnum";
 21    $_[0] = $left;
 22  }
 23
 24  my $parser = new Rule6();
 25  $parser->YYData->{INPUT} = "2*3";
 26  my $t = $parser->Run;
 27  &set_terminfo;
 28  print "\n***** Before ******\n";
 29  print $t->str;
 30  my $p = Parse::Eyapp::YATW->new(PATTERN => \&is_foldable);
 31  $p->s($t);
 32  print "\n***** After ******\n";
 33  print $t->str."\n";

when executed produces:

nereida:~/src/perl/YappWithDefaultAction/examples> foldrule6.pl

***** Before ******
TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3]))
***** After ******
NUM(TERMINAL[6])

The YATW Tree Transformation Call Protocol

For a subroutine pattern_sub to work as a YATW tree transformation - as subroutines is_foldable and is_code above - has to conform to the following call description:

pattern_sub(
    $_[0],  # Node being visited
    $_[1],  # Father of this node
    $index, # Index of this node in @Father->children
    $self,  # The YATW pattern object
);

The pattern_sub must return TRUE if matched and FALSE otherwise.

The protocol may change in the near future. Avoid using other information than the fact that the first argument is the node being visited.

Parse::Eyapp::YATW->buildpatterns

Works as Parse::Eyapp->new but receives an array of subs conforming to the YATW Tree Transformation Call Protocol.

our @all = Parse::Eyapp::YATW->buildpatt(\&delete_code, \&delete_tokens);

$yatw->delete

The root of the tree that is currently matched by the YATW transformation $yatw will be deleted from the tree as soon as is safe. That usually means when the processing of their siblings is finished. The following example (taken from file examples/13ts_simplify_with_delete.pl in the Parse::Eyapp distribution) illustrates how to eliminate CODE and syntactic terminals from the syntax tree:

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
       sed -ne '62,$p' 13ts_simplify_with_delete.pl | cat -n
 1  sub not_useful {
 2    my $self = shift; # node
 3    my $pat = $_[2];  # get the YATW object
 4
 5    (ref($self) eq 'CODE') or ((ref($self) eq 'TERMINAL') and ($self->{token} eq $self->{attr}))
 6      or do { return 0 };
 7    $pat->delete();
 8    return 1;
 9  }
10
11  Parse::Eyapp->new_grammar(
12    input=>$translationscheme,
13    classname=>'Calc',
14    firstline =>7,
15  );
16  my $parser = Calc->new();                # Create the parser
17
18  $parser->YYData->{INPUT} = "2*3\n"; print $parser->YYData->{INPUT};
19  my $t = $parser->Run;                    # Parse it
20  print $t->str."\n";                      # Show the tree
21  my $p = Parse::Eyapp::YATW->new(PATTERN => \&not_useful); 
22  $p->s($t);                               # Delete nodes
23  print $t->str."\n";                      # Show the tree

when executed we get the following output:

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ 13ts_simplify_with_delete.pl
2*3
EXP(TIMES(NUM(TERMINAL[2],CODE),TERMINAL[*],NUM(TERMINAL[3],CODE),CODE))
EXP(TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))

$yatw->unshift

Tha call $yatw->unshift($b) safely unshifts (inserts at the beginning) the node $b in the list of its siblings of the node that matched (i.e in the list of siblings of $_[0]). The following example shows a YATW transformation insert_child that illustrates the use of unshift (file examples/26delete_with_trreereg.pl):

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
        sed -ne '70,$p' 26delete_with_trreereg.pl | cat -n
 1  my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
 2
 3      delete_code : CODE => { $delete_code->delete() }
 4
 5      {
 6        sub not_semantic {
 7          my $self = shift;
 8          return  1 if ((ref($self) eq 'TERMINAL') and ($self->{token} eq $self->{attr}));
 9          return 0;
10        }
11      }
12
13      delete_tokens : TERMINAL and { not_semantic($TERMINAL) } => {
14        $delete_tokens->delete();
15      }
16
17      insert_child : TIMES(NUM(TERMINAL), NUM(TERMINAL)) => {
18        my $b = Parse::Eyapp::Node->new( 'UMINUS(TERMINAL)',
19          sub { $_[1]->{attr} = '4.5' }); # The new node will be a sibling of TIMES
20
21        $insert_child->unshift($b); 
22      }
23    },
24  )->generate();
25
26  Parse::Eyapp->new_grammar(
27    input=>$translationscheme,
28    classname=>'Calc',
29    firstline =>7,
30  );
31  my $parser = Calc->new();                # Create the parser
32
33  $parser->YYData->{INPUT} = "2*3\n"; print $parser->YYData->{INPUT}; # Set the input
34  my $t = $parser->Run;                # Parse it
35  print $t->str."\n";                        # Show the tree
36  # Get the AST
37  our ($delete_tokens, $delete_code);
38  $t->s($delete_tokens, $delete_code);
39  print $t->str."\n";                        # Show the tree
40  our $insert_child;
41  $insert_child->s($t);
42  print $t->str."\n";                        # Show the tree

When is executed the program produces the following output:

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ 26delete_with_trreereg.pl
2*3
EXP(TIMES(NUM(TERMINAL[2],CODE),TERMINAL[*],NUM(TERMINAL[3],CODE),CODE))
EXP(TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
EXP(UMINUS(TERMINAL[4.5]),TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))

Don't try to take advantage that the transformation sub receives in $_[1] a reference to the father (see the section "The YATW Tree Transformation Call Protocol") and do something like:

unshift $_[1]->{children}, $b

it is unsafe.

$yatw->insert_before

A call to $yatw->insert_before($node) safely inserts $node in the list of siblings of $_[0] just before $_[0] (i.e. the ndoe that matched with $yatw). The following example (file t/33moveinvariantoutofloop.t) illustrates its use:

my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
  moveinvariant: WHILE(VAR($b), BLOCK(@a, ASSIGN($x, $e), @c)) 
       and { is_invariant($ASSIGN, $WHILE) } => {
         my $assign = $ASSIGN;
         $BLOCK->delete($ASSIGN);
         $moveinvariant->insert_before($assign);
       }
  },
);

Here the ASSIGN($x, $e) subtree - if is loop invariant - will be moved to the list of siblings of $WHILE just before the $WHILE.

Tree Matching and Tree Substitution

Matching Trees

Both the transformation objects in Parse::Eyapp::YATW and the nodes in Parse::Eyapp::Node have a method named m for matching. For a Parse::Eyapp::YATW object, the method -when called in a list context- returns a list of Parse::Eyapp::Node::Match nodes.

@R = $t->m($yatw1, $yatw2, $yatw3, ...)

A Parse::Eyapp::Node::Match object describes the nodes of the actual tree that have matched. The nodes in the returned list are organized in a hierarchy. They appear in the list sorted according to a depth-first visit of the actual tree $t. In a scalar context m returns the first element of the list.

Let us denote by $t the actual tree being searched and $r one of the Parse::Eyapp::Node::Match nodes in the resulting forest @R. Then we have the following methods:

  • The method $r->node return the node $t of the actual tree that matched

  • The method $r->father returns the father of $r in the matching forest. The father of $r is defined by this property: $r->father->node is the nearest ancestor of $r->node that matched with the treeregexp pattern. That is, there is no ancestor that matched between $r->node and $r->father->node. Otherwise $r->father is undef

  • The method $r->coord returns the coordinates of $r->node relative to $t. For example, the coordinate ".1.3.2" denotes the node $t->child(1)->child(3)->child(2), where $t is the root of the search.

  • The method $r->depth returns the depth of $r->node in $t.

  • When m was called as a Parse::Eyapp::Node method, i. e. with potentially more than one YATW treeregexp, the method $r->names returns the array of names of the transformations that matched with $r->node.

The following example illustrates a use of m as a Parse::Eyapp:YATW method. It solves a problem of scope analysis in a C compiler: matching each RETURN statement with the function that surrounds it. The parsing was already done, the AST was built and left in $t. The treeregexp used is:

retscope: /FUNCTION|RETURN/

and the code that solves the problem is:

# Associate each "return exp" with its "function"
my @returns = $retscope->m($t); 
for (@returns) {
  my $node = $_->node;
  if (ref($node) eq 'RETURN') {
    my $function = $_->father->node; 
    $node->{function}  = $function;  
    $node->{t} = $function->{t};
  }
}

The first line gets a list of Parse::Eyapp::Node::Match nodes describing the actual nodes that matched /FUNCTION|RETURN/. If the node described by $_ is a 'RETURN' node, the expresion $_->father->node must necessarily point to the function node that encloses it.

The second example shows the use of m as a Parse::Eyapp::Node method.

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n m2.pl
 1  #!/usr/bin/perl -w
 2  use strict;
 3  use Rule6;
 4  use Parse::Eyapp::Treeregexp;
 5
 6  Parse::Eyapp::Treeregexp->new( STRING => q{
 7    fold: /times|plus|div|minus/i:bin(NUM($n), NUM($m))
 8    zxw: TIMES(NUM($x), .) and { $x->{attr} == 0 }
 9    wxz: TIMES(., NUM($x)) and { $x->{attr} == 0 }
10  })->generate();
11
12  # Syntax analysis
13  my $parser = new Rule6();
14  $parser->YYData->{INPUT} = "0*0*0";
15  my $t = $parser->Run;
16  print "Tree:",$t->str,"\n";
17
18  # Search
19  my $m = $t->m(our ($fold, $zxw, $wxz));
20  print "Match Node:\n",$m->str,"\n";

When executed with input 0*0*0 the program generates this output:

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ m2.pl
Tree:TIMES(TIMES(NUM(TERMINAL),NUM(TERMINAL)),NUM(TERMINAL))
Match Node:
Match[[TIMES:0:wxz]](Match[[TIMES:1:fold,zxw,wxz]])

The representation of Match nodes by str deserves a comment. Match nodes have their own info method. It returns a string containing the concatenation of the class of $r->node (i.e. the actual node that matched), the depth ($r->depth) and the names of the transformations that matched (as provided by the method $r->names)

The SEVERITY option of Parse::Eyapp::Treeregexp::new

The SEVERITY option of Parse::Eyapp::Treeregexp::new controls the way matching succeeds regarding the number of children. To illustrate its use let us consider the following example. The grammar used Rule6.yp is similar to the one in the SYNOPSIS example.

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n numchildren.pl
 1  #!/usr/bin/perl -w
 2  use strict;
 3  use Rule6;
 4  use Parse::Eyapp::Treeregexp;
 5
 6  sub TERMINAL::info { $_[0]{attr} }
 7
 8  my $severity = shift || 0;
 9  my $parser = new Rule6();
10  $parser->YYData->{INPUT} = shift || '0*2';
11  my $t = $parser->Run;
12
13  my $transform = Parse::Eyapp::Treeregexp->new(
14    STRING => q{
15      zero_times_whatever: TIMES(NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
16    },
17    SEVERITY => $severity,
18    FIRSTLINE => 14,
19  )->generate;
20
21  $t->s(our @all);
22
23  print $t->str,"\n";

The program gets the severity level from the command line (line 9). The specification of the term TIMES(NUM($x)) inside the transformation zero_times_whatever does not clearly state that TIMES must have two children. There are several interpretations of the treregexp depending on the level fixed for SEVERITY:

  • 0: TIMES must have at least one child. Don't care if it has more.

  • 1: TIMES must have exactly one child.

  • 2: TIMES must have exactly one child. When visit a TIMES node with a different number of children issue a warning.

  • 3: TIMES must have exactly one child. When visit a TIMES node with a different number of children issue an error.

Observe the change in behavior according to the level of SEVERITY:

pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 0 '0*2'
NUM(TERMINAL[0])
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 1 '0*2'
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 2 '0*2'
Warning! found node TIMES with 2 children.
Expected 1 children (see line 15 of ./numchildren.pl)"
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 3 '0*2'
Error! found node TIMES with 2 children.
Expected 1 children (see line 15 of ./numchildren.pl)"
 at (eval 2) line 29

Tree Substitution: The s methods

Both Parse::Eyapp:Node and Parse::Eyapp::YATW objects (i.e. nodes and tree transformations) are provided with a s method.

In the case of a Parse::Eyapp::YATW object the method s applies the tree transformation using a single bottom-up traversing: the transformation is recursively applied to the children and then to the current node.

For Parse::Eyapp:Node nodes the set of transformations is applied to each node until no transformation matches any more. The example in the "SYNOPSIS" section illustrates the use:

 1  # Let us transform the tree. Define the tree-regular expressions ..
 2  my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
 3    { #  Example of support code
 4      my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
 5    }
 6    constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
 7      => {
 8        my $op = $Op{ref($_[0])};
 9        $x->{attr} = eval  "$x->{attr} $op $y->{attr}";
10        $_[0] = $NUM[0];
11      }
12    uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
13    zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
14    whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
15    },
16    OUTPUTFILE=> 'main.pm'
17  );
18  $p->generate(); # Create the tranformations
19 
20  $t->s($uminus); # Transform UMINUS nodes
21  $t->s(@all);    # constant folding and mult. by zero

The call at line 20 can be substituted by $uminus->s($t) without changes.

AUTHOR

Casiano Rodriguez-Leon (casiano@ull.es)

ACKNOWLEDGMENTS

This work has been supported by CEE (FEDER) and the Spanish Ministry of Educación y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.

A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.

I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.

LICENCE AND COPYRIGHT

Copyright (c) 2006-2007 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.

Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001

These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

1 POD Error

The following errors were encountered while parsing the POD:

Around line 1925:

Non-ASCII character seen before =encoding in 'I<Educación'. Assuming UTF-8