NAME
Parse::Eyapp::Node - The nodes of the Syntax Trees
SYNOPSIS
use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;
sub TERMINAL::info {
$_[0]{attr}
}
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
line:
exp <%name EXPRESION_LIST + ';'>
{ $_[1] } /* list of expressions separated by ';' */
;
/* The %name directive defines the name of the class to which the node being built belongs */
exp:
%name NUM
NUM
| %name VAR
VAR
| %name ASSIGN
VAR '=' exp
| %name PLUS
exp '+' exp
| %name MINUS
exp '-' exp
| %name TIMES
exp '*' exp
| %name DIV
exp '/' exp
| %name UMINUS
'-' exp %prec NEG
| '(' exp ')'
{ $_[2] } /* Let us simplify a bit the tree */
;
%%
sub _Error { die "Syntax error near ".($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n" }
sub _Lexer {
my($parser)=shift; # The parser object
for ($parser->YYData->{INPUT}) { # Topicalize
m{\G\s+}gc;
$_ eq '' and return('',undef);
m{\G([0-9]+(?:\.[0-9]+)?)}gc and return('NUM',$1);
m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return('VAR',$1);
m{\G(.)}gcs and return($1,$1);
}
return('',undef);
}
sub Run {
my($self)=shift;
$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, );
}
}; # end grammar
our (@all, $uminus);
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>'Calc', # The name of the package containing the parser
firstline=>7 # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new(); # Create a parser
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n"; # Set the input
my $t = $parser->Run; # Parse it!
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
=> {
my $op = $Op{ref($bin)};
$x->{attr} = eval "$x->{attr} $op $y->{attr}";
$_[0] = $NUM[0];
}
uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
},
OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations
$t->s($uminus); # Transform UMINUS nodes
$t->s(@all); # constant folding and mult. by zero
local $Parse::Eyapp::Node::INDENT=0;
print "\nSyntax Tree after transformations:\n",$t->str,"\n";
Introduction
Parse::Eyapp (Extended yapp) is a collection of modules that extends Francois Desarmenien Parse::Yapp 1.05. Eyapp extends yacc/yapp syntax with functionalities like named attributes, EBNF-like expressions, modifiable default action, automatic syntax tree building, semi-automatic abstract syntax tree building, translation schemes, tree regular expressions, tree transformations, scope analysis support, directed acyclic graphs and a few more.
Basic Concepts
Parsing is the activity of producing a syntax tree from an input stream. The program example in the synopsis section shows an example of parsing. The variable $grammar
contains a context free eyapp grammar defining the language of lists of arithmetic expressions. A context free grammar is a mathematical device to define languages. The grammar for the example in the synopsis section is:
line: exp <+ ';'>
;
exp:
NUM
| VAR
| VAR '=' exp
| exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| '-' exp
| '(' exp ')'
;
A grammar generates a language. A grammar is defined by a set of production rules. A production rule has two components: a left hand side which is a syntactic variable or non terminal and a right hand side which is a phrase made of syntactic variables and terminals. The left hand side (lhs) and the right hand side (rhs) are usually separated by an arrow like in:
exp -> VAR = exp
A note: the production rule
line: exp <+ ';'>
is not really a production rule but an abbreviation for two productions. It stands for:
line : exp
| line ';' exp
;
A terminal or token never appears on the left hand side of a production rule. The phrases of the language are those obtained succesively applying the production rules of the grammar until no more rules can be applied. The succesive substitutions must start from the start
symbol of the grammar (line
in the example). Such legal sequence of substitutions is known as a derivation. The following is an example of a legal derivation (the big arrow =>
is read derives):
line => exp => VAR = exp => VAR = exp + exp => VAR = exp + NUM => VAR = VAR + NUM
thus the phrase VAR = VAR + NUM
belongs to the language generated by the former grammar. A derivation like can be seen as a tree. For instance, the former derivation is equivalen (has the same information) than the tree:
line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM))))
Such a tree is called a syntax tree for the input VAR = VAR + NUM
. A grammar is said to be ambiguous if there are phrases in the generated language that have more than one syntax tree. The grammar in the synopsis example is ambiguous. Here is an alternative tree for the same phrase VAR = VAR + NUM
:
line(exp(exp(VAR, '=', exp(VAR)), '+', exp(NUM)))
Parsers created by eyapp
do not deal directly with the input. Instead they expect the input to be processed by a lexical analyzer. The lexical analyzer parses the input and produces the next token. A token is a pair. The first component is the name of the token (like NUM
or VAR
) and the second is its attribute (i.e. ay information associated with the token, like that the value is 4
for a NUM
or the identifier is temperature
for a VAR
). Tokens are usually defined using regular expressions. Thus the token NUM
is characterized by /[0-9]+(?:\.[0-9]+)?/
and the token VAR
by /[A-Za-z][A-Za-z0-9_]*/
. The subroutine _Lexer
in the tail section of the SYNOPSIS section is a typical example of a typical lexical analyzer:
sub _Lexer {
my($parser)=shift; # The parser object
for ($parser->YYData->{INPUT}) { # Topicalize
m{\G\s+}gc; # skip whites
$_ eq '' and return('',undef);
m{\G([0-9]+(?:\.[0-9]+)?)}gc and return('NUM',$1);
m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return('VAR',$1);
m{\G(.)}gcs and return($1,$1);
}
return('',undef);
}
The input was saved in the YYData->{INPUT}
section of the $parser
object. The for
loop is a false for
: its goal is to make $_
an alias of $parser->YYData->{INPUT}
. To catch the next pattern we use the anchor \G
. The \G
anchor matches at the point where the previous /g
match left off. Normally, when a scalar m{}g
match fails, the match position is reset and \G
will start matching at the beginning of the string. The c
option causes the match position to be retained following an unsuccessful match. The couple ('',undef)
signals the end of the input.
Parse::Eyapp
can analyze your grammar and produce a parser from your grammar. Actually Parse::Eyapp
is a translation scheme analyzer. A translation scheme scheme is a context free grammar where the right hand sides of the productions have been augmented with semantic actions (i.e. with chunks of Perl code):
A -> alpha { action(@_) } beta
The analyzer generated by Eyapp executes { action(@_) }
after all the semantic actions asssociated with alpha
have been executed and before the execution of any of the semantic actions associated with beta
.
Notice that ambiguous grammars produce ambiguous translation schemes: since a phrase may have two syntactic trees it will be more than one tree-traversing and consequently more than one way to execute the embedded semantic actions. Certainly different execution orders will usually produce different results. Thus, syntactic ambiguities translate onto semantic ambiguities. That is why it is important to resolve all the ambiguities and conflicts that may arise in our grammar. This is the function of the %left
and %right
declarations on the header section:
my $grammar = q{
# header section
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
.... # body section
%%
.... # tail section
};
Priority can be assigned to tokens by using the %left
and %right
declarations. Tokens in lines below have more precedence than tokens in line above. By giving token '+'
more precedence than token '='
we solve the ambiguity for phrases like VAR = VAR + NUM
. The tree
line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM))))
will be built, discarding the other tree. Since priority means earlier evaluation and the evaluation of semantic actions is bottom up, the deeper the associated subtree the higher the priority.
In a translation scheme the embedded actions modify the attributes associated with the symbols of the grammar.
A -> alpha { action(@_) } beta
Each symbol on the right hand side of a production rule has an associated scalar attribute. In eyapp
the attributes of the symbol to the left of action
are passed as arguments to action
(in the example, those of alpha
). These arguments are preceded by a reference to the syntax analyzer object. There is no way inside an ordinary eyapp
program for an intermediate action
to access the attributes of the symbols on its right, i.e. those associated with the symbols of beta
. This restriction is lifted if you use the %metatree
directive to build a full translation scheme. See the section "Translation Schemes and the %metatree directive" in Parse::Eyapp to know more about full translation schemes.
Actions on the right hand side counts as symbols and so they can be referenced by its positional arument in later actions in the same production rule. For intermediate actions, the value returned by the action
is the attribute associated with such action. For an action at the end of the rule:
A -> alpha { lastaction(@_) }
the returned value constitutes the attribute of the left hand side of the rule (the attribute of A
in this case). The action at the end of the right hand side is called the action associated with the production rule. When no explicit action has been associated with a production rule the default action applies. In Parse::Eyapp
the programmer can define what is the default action.
A very special action is "build the node associated with this production rule" which is performed by the YYBuildAST
method of the parser object:
%default action { goto &Parse::Eyapp::Driver::YYBuildAST }
The %tree
directive used in the SYNOPSIS example is an abbreviation for this and has the effect of building an abstract syntax tree for the input.
The call to
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>"Calc", # The name of the package containing the parser
);
compiles $grammar
and produces a new class Calc
containing a LALR parser for such grammar. The call
$parser = Calc->new()
creates a parser object for the language generated by $grammar
. Using the YYParse
of the parser object:
$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, )
YYParse
is called with arguments a reference to the lexical analyzer and a reference to the error diagnostic subroutine _Error
. Such subroutine will be called by YYParse
when an error occurs. Is therefore convenient to give a meaningful diagnostic:
sub _Error {
die "Syntax error near "
.($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n"
}
The parser method YYCurval
returns the value of the current token. A more accurate error diagnostic subroutine can be obtained if the lexical analyzer is modified so that tokens keep the line number where they start (i.e. the token is a pair (TOKENNAME, [ ATTRIBUTE, LINENUMBER])
. In such case the _Error
subroutine can be rewritten as:
sub _Error {
my($token)=$_[0]->YYCurval;
my($what)= $token ? "input: '$token->[0]' in line $token->[1]" : "end of input";
my @expected = $_[0]->YYExpect();
my $expected = @expected? "Expected one of these tokens: '@expected'":"";
croak "Syntax error near $what. $expected\n";
}
The YYExpect
method returns the set of tokens that were expected when the error occurred.
The input in
$parser->YYData->{INPUT}
is then analyzed by YYParse
and an abstract syntax tree is built. The tree rooted on a Parse::Eyapp::Node
can be displayed using the method str
:
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
The following is the description of the syntax tree produced by the call $t->str
for the list of expressions "2*-3+b*0;--2\n";
:
pl@nereida:~/LEyapp/examples$ synopsis.pl
Syntax Tree:
EXPRESION_LIST(
PLUS(
TIMES(
NUM(
TERMINAL[2]
),
UMINUS(
NUM(
TERMINAL[3]
)
) # UMINUS
) # TIMES,
TIMES(
VAR(
TERMINAL[b]
),
NUM(
TERMINAL[0]
)
) # TIMES
) # PLUS,
UMINUS(
UMINUS(
NUM(
TERMINAL[2]
)
) # UMINUS
) # UMINUS
) # EXPRESION_LIST
Did you notice that the TERMINAL
nodes appear decorated with its attribute? This is because each time the Parse::Eyapp::Node
method str
visits a node checks if the node has a method info
(i.e. $node->can(info)
). If so, the info
method is called and the string returned is concatenated in the description string. This is the reason for these three lines at the beginning of the SYNOPSIS example:
sub TERMINAL::info {
$_[0]{attr}
}
Parse::Eyapp
not only gives support to parsing but to later phases of the translation process: tree transformations and scope analysis (scope analysis is the task to find which definition applies to an use of an object in the source). The program in the synopsis section shows an example of a tree transformation specification. Tree transformations are specified using a language called Tree regular expressions. The transformation object is created by the constructor Parse::Eyapp::Treeregexp->new
.
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
=> {
my $op = $Op{ref($bin)};
$x->{attr} = eval "$x->{attr} $op $y->{attr}";
$_[0] = $NUM[0];
}
uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
},
);
The set of transformations specified in the example are
The transformation
constantfold
produces constant folding i.e. trees of expressions like3*2+4
are reduced to the tree for10
{ # Example of support code my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/'); } constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y)) => { my $op = $Op{ref($bin)}; $x->{attr} = eval "$x->{attr} $op $y->{attr}"; $_[0] = $NUM[0]; }
Here
constantfold
is the name of the transformation. The treeregexp compiler will produce an object$constantfold
implementing the transformation. After the name comes the tree pattern:/TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
It matches any subtree rooted in any node belonging to one of these classes:
TIMES
orPLUS
orDIV
orMINUS
that has two children belonging to theNUM
class. The Perl code after the big arrow is executed on any matching subtree. We can refer to the root of the subtree using the variable$bin
. We can also refer to the child of the firstNUM
node using$x
. In the same way$y
refers to the child of the secondNUM
node. Since there are twoNUM
nodes in the pattern, we refer to them inside the transformation part using the array@NUM
:$_[0] = $NUM[0];
The action uses and
eval
and the hash%Op
to compute the corresponding reduction of the two nodes. The hash%Op
was defined in a previous section containing support code. You can insert in any place of a treeregexp program such support code by surrounding it with curly brackets. The subtree that matched (that is in$_[0]
) is substituted by its left child:$_[0] = $NUM[0];
The transformations
zero_times_whatever
andwhatever_times_zero
produce the simplification of trees corresponding to multiplications by zero. Trees for expressions like(a+b)*0
or0*(b-4)
are reduced to the tree for 0.zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
Here
zero_times_whatever
is the name of the transformation. The patternTIMES(NUM($x), .)
matches anyTIMES
node with two children and whose first child belongs to theNUM
class. The dot matches any subtree, indicating that we don't care what sort of tree the right child is. The third component{ $x->{attr} == 0 }
is the semantic pattern. If both the shape pattern and the semantic pattern apply the action after the arrow is applied. The subtrees is substituted by its left child.
The transformation
uminus
simplifies the tree for unary minus of constant expressions.uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
It matches trees rooted in a
UMINUS
node whose only child is aNUM
node. In such case the sign of the number that is the attribute of theTERMINAL
node is changed and the tree is substituted by its single child.
The call
$p->generate();
compiles the transformation specification producing a set of transformations $constantfold
, $zero_times_whatever
, whatever_times_zero
and $uminus
. Transformations are Parse::Eyapp::YATW
objects. The list variable @all
refer to the whole set of Parse::Eyapp::YATW
transformations.
The nodes of the abstract syntax tree are objects. The class (NUM
, TIMES
, UMINUS
, etc.) defines the type of node. All node classes inherit from the class Parse::Eyapp::Node
. Parse::Eyapp::Node
provides a set of methods to manipulate nodes. Among these methods are str
, m
and s
. The m
and s
methods resemble the matching and substitution operators for regular expressions. But instead of regular expressions they work with tree transformations or treeregexp
s or, more precisely with Parse::Eyapp::YATW
objects. By calling:
$t->s($uminus);
subtrees like
UMINUS(UMINUS(NUM(TERMINAL[2])))
are simplified to
NUM(TERMINAL[2])
The call to
$t->s(@all);
applies the whole set of transformations. The transformations in @all
are iteratively applied to the tree $t
until no transformation succeeds: Yes, that means that a inappropriate set of transformations my hang your program.
Thus, the former syntax tree for "2*-3+b*0;--2\n";
becomes:
EXPRESION_LIST(NUM(TERMINAL[-6]),NUM(TERMINAL[2]))
The analyzer has been able to optimize - at compile time - the computation of these two expressions
2*-3+b*0;
--2
reducing them to the computation of:
-6;
2
Parse::Eyapp::Node Methods
The Parse::Eyapp::Node
objects represent the nodes of the syntax tree. All the node classes build by %tree
and %metatree
directives inherit from Parse::Eyapp::Node
and consequently have acces to the methods provided in such module.
Parse::Eyapp::Node->new
Nodes are usually created using the %tree
or %metatree
Parse::Eyapp
directives. The Parse::Eyapp::Node
constructor new
offers an alternative way to create forests.
This class method can be used to build multiple nodes on a row. It receives a string describing the tree and optionally a reference to a subroutine. Such subroutine (called the attribute handler) is in charge to initialize the attributes of the just created nodes. The attribute handler is called with the array of references to the nodes as they appear in the string from left to right.
Parse::Eyapp::Node->new
returns an array of pointers to the nodes created as they appear in the input string from left to right. In scalar context returns a pointer to the first of these trees.
The following example (see file examples/28foldwithnewwithvars.pl
) of a treeregexp transformation creates a new NUM(TERMINAL)
node using Parse::Eyapp::Node->new
:
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|MINUS|DIV/(NUM($x), NUM($y))
=> {
my $op = $Op{ref($_[0])};
my $res = Parse::Eyapp::Node->new(
q{NUM(TERMINAL)},
sub {
my ($NUM, $TERMINAL) = @_;
$TERMINAL->{attr} = eval "$x->{attr} $op $y->{attr}";
$TERMINAL->{token} = 'NUM';
},
);
$_[0] = $res;
}
},
);
The string can describe more than one tree like in:
my @t = Parse::Eyapp::Node->new(
'A(C,D) E(F)', sub { my $i = 0; $_->{n} = $i++ for @_ });
The following trees will be built:
bless( { 'n' => 0,
'children' => [
bless( { 'n' => 1, 'children' => [] }, 'C' ),
bless( { 'n' => 2, 'children' => [] }, 'D' )
]
}, 'A' );
bless( { 'n' => 3,
'children' => [
bless( { 'n' => 4, 'children' => [] }, 'F' )
]
}, 'E' );
and @t
will contain 5 references to the corresponding subtrees A(C,D), C, D, E(F) and F.
Directed Acyclic Graphs with Parse::Eyapp::Node->hnew
Parse::Eyapp
provides the method Parse::Eyapp::Node->hnew
to build Directed Acyclic Graphs (DAGs) instead of trees. They are built using hashed consing, i.e. memoizing the creation of nodes. It works very much like Parse::Eyapp::Node->new
but if one of the implied trees was previously built, hnew
returns a reference to the existing one. See the following debugger session where several DAGs describing type expressions are built:
DB<2> x $a = Parse::Eyapp::Node->hnew('F(X_3(A_3(A_5(INT)), CHAR, A_5(INT)),CHAR)')
0 F=HASH(0x85f6a20)
'children' => ARRAY(0x85e92e4)
|- 0 X_3=HASH(0x83f55fc)
| 'children' => ARRAY(0x83f5608)
| |- 0 A_3=HASH(0x85a0488)
| | 'children' => ARRAY(0x859fad4)
| | 0 A_5=HASH(0x85e5d3c)
| | 'children' => ARRAY(0x83f4120)
| | 0 INT=HASH(0x83f5200)
| | 'children' => ARRAY(0x852ccb4)
| | empty array
| |- 1 CHAR=HASH(0x8513564)
| | 'children' => ARRAY(0x852cad4)
| | empty array
| `- 2 A_5=HASH(0x85e5d3c)
| -> REUSED_ADDRESS
`- 1 CHAR=HASH(0x8513564)
-> REUSED_ADDRESS
DB<3> x $a->str
0 'F(X_3(A_3(A_5(INT)),CHAR,A_5(INT)),CHAR)'
The second occurrence of A_5(INT)
is labelled REUSED_ADDRESS
. The same occurs with the second instance of CHAR
. Parse::Eyapp::Node->hnew
can be more convenient than new
when dealing with optimizations like common subexpressions or during type checking. See file examples/Types.eyp
for a more comprehensive example.
$node->type
Returns the type of the node. It can be called as a sub when $node
is not a Parse::Eyapp::Node
like this:
Parse::Eyapp::Node::type($scalar)
This is the case when visiting CODE
nodes.
The following session with the debugger illustrates how it works:
> perl -MParse::Eyapp::Node -de0
DB<1> @t = Parse::Eyapp::Node->new("A(B,C)") # Creates a tree
DB<2> x map { $_->type } @t # Get the types of the three nodes
0 'A'
1 'B'
2 'C'
DB<3> x Parse::Eyapp::Node::type(sub {})
0 'CODE'
DB<4> x Parse::Eyapp::Node::type("hola")
0 'Parse::Eyapp::Node::STRING'
DB<5> x Parse::Eyapp::Node::type({ a=> 1})
0 'HASH'
DB<6> x Parse::Eyapp::Node::type([ a, 1 ])
0 'ARRAY'
As it is shown in the example it can be called as a subroutine with a (CODE/HASH/ARRAY) reference or an ordinary scalar.
The words HASH, CODE, ARRAY and STRING are reserved for ordinary Perl references. Avoid naming a node with one of those words.
$node->child
Setter-getter to modify a specific child of a node. It is called like:
$node->child($i)
Returns the child with index $i. Returns undef
if the child does not exists. It has two obligatory parameters: the node (since it is a method) and the index of the child. Sets the new value if called
$node->child($i, $tree)
The method will croak if the obligatory parameters are not provided. Follows an example of use inside a Treereg program (see file examples/TSwithtreetransformations2.eyp
) that swaps the children of a PLUS
node:
my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
commutative_add: PLUS($x, ., $y, .) # 1st . = '+' 2nd . = CODE
=> { my $t = $x; $_[0]->child(0, $y); $_[0]->child(2, $t)}
}
Child Access Through %tree alias
Remember that when the Eyapp
program runs under the %tree alias
directive The dot and dollar notations can be used to generate named getter-setters to access the children:
%tree bypass alias
....
%%
exp: %name PLUS
exp.left '+' exp.right
....
%%
.... # and later
print $exp->left->str;
Here methods with names left
and right
will be created to access the corresponding children associated with the two instances of exp
in the right hand side of the production rule.
$node->children
Returns the array of children of the node. When the tree is a translation scheme the CODE references are also included. See examples/TSPostfix3.eyp
for an example of use inside a Translation Scheme:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$\
sed -ne '31,34p' TSPostfix3.eyp
line: %name PROG
exp <%name EXP + ';'>
{ @{$lhs->{t}} = map { $_->{t}} ($_[1]->children()); }
The tree in a Translation Scheme contains the references to the CODE
implementing the semantic actions. For example, the syntax tree built by the parser for the input a=-b*3
in TSPostfix3.eyp
is:
PROG(EXP(
ASSIGN(
TERMINAL[a],
TERMINAL[=],
TIMES(
NEG(TERMINAL[-], VAR(TERMINAL[b], CODE), CODE),
TERMINAL[*],
NUM(TERMINAL[3], CODE),
CODE
) # TIMES,
CODE
) # ASSIGN
) # EXP,
CODE
) # PROG
$node->children
can also be used as a setter.
$node->Children
Returns the array of children of the node. When dealing with a translation scheme, the $node->Children method (first in uppercase) returns the non CODE children of the node.
$node->last_child
Return the last child of the node. When dealing with translation schemes, the last can be a CODE
node.
$node->Last_child
The $node->Last_child
method returns the last non CODE child of the node. See an example:
line: %name EXP
exp <+ ';'> /* Expressions separated by semicolons */
{ $lhs->{n} = $_[1]->Last_child->{n} }
;
$node->descendant
The descendant
method returns the descendant of a node given its coordinates. The coordinates of a node $s
relative to a tree $t
to which it belongs is a string of numbers separated by dots like ".1.3.2"
which denotes the child path from $t
to $s
, i.e. $s == $t->child(1)->child(3)->child(2)
.
See a session with the debugger:
DB<7> x $t->child(0)->child(0)->child(1)->child(0)->child(2)->child(1)->str
0 '
BLOCK[8:4:test]^{0}(
CONTINUE[10,10]
)
DB<8> x $t->descendant('.0.0.1.0.2.1')->str
0 '
BLOCK[8:4:test]^{0}(
CONTINUE[10,10]
$node->str
The str
method returns a string representation of the tree. The str method traverses the syntax tree dumping the type of the node being visited in a string. To be specific the value returned by the function referenced by $CLASS_HANDLER
will be dumped. The default value fo such function is to return the type of the node. If the node being visited has a method info
it will be executed and its result inserted between $DELIMITER
s into the string. Thus, in the "SYNOPSIS" example, by adding the info
method to the class TERMINAL
:
sub TERMINAL::info {
$_[0]{attr}
}
we achieve the insertion of attributes in the string being built by str
.
The existence of some methods (like footnote
) and the values of some package variables influence the behavior of str
. Among the most important are:
@PREFIXES = qw(Parse::Eyapp::Node::); # Prefixes to supress
$INDENT = 0; # -1 compact, no info, no footnotes
# 0 = compact, 1 = indent, 2 = indent and include Types in closing parenthesis
$STRSEP = ','; # Separator between nodes, by default a comma
$DELIMITER = '['; # The string returned by C<info> will be enclosed
$FOOTNOTE_HEADER = "\n---------------------------\n";
$FOOTNOTE_SEP = ")\n";
$FOOTNOTE_LEFT = '^{'; # Left delimiter for a footnote number
$FOOTNOTE_RIGHT = '}'; # Right delimiter for a footnote number
$LINESEP = 4; # When indent=2 the enclosing parenthesis will be
# commented if more than $LINESEP apart
$CLASS_HANDLER = sub { type($_[0]) }; # What to print to identify the node
Footnotes and attribute info will not be inserted when $INDENT
is -1. A compact representation will be obtained. Such representation can be feed to new
or hnew
to obtain a copy of the tree. See the following session with the debugger:
pl@nereida:~/LEyapp$ perl -MParse::Eyapp::Node -wde 0
main::(-e:1): 0
DB<1> $x = Parse::Eyapp::Node->new('A(B(C,D),D)', sub { $_->{order} = $i++ for @_; })
DB<2> *A::info = *B::info = *C::info = *D::info = sub { shift()->{order} }
DB<3> p $x->str
A[0](B[1](C[2],D[3]),D[4])
DB<4> $Parse::Eyapp::Node::INDENT=-1
DB<5> p $x->str
A(B(C,D),D)
DB<6> x Parse::Eyapp::Node->hnew($x->str)
0 A=HASH(0x8574704)
'children' => ARRAY(0x85745d8)
0 B=HASH(0x857468c)
'children' => ARRAY(0x8574608)
0 C=HASH(0x85745b4)
'children' => ARRAY(0x8509670)
empty array
1 D=HASH(0x8574638)
'children' => ARRAY(0x857450c)
empty array
1 D=HASH(0x8574638)
-> REUSED_ADDRESS
1 B=HASH(0x857468c)
-> REUSED_ADDRESS
2 C=HASH(0x85745b4)
-> REUSED_ADDRESS
3 D=HASH(0x8574638)
-> REUSED_ADDRESS
4 D=HASH(0x8574638)
-> REUSED_ADDRESS
The following list defines the $DELIMITER
s you can choose for attribute representation:
'[' => ']', '{' => '}', '(' => ')', '<' => '>'
If the node being visited has a method footnote
, the string returned by the method will be concatenated at the end of the string as a footnote. The variables $FOOTNOTE_LEFT
and $FOOTNOTE_RIGHT
govern the displaying of footnote numbers.
Follows an example of output using footnotes
.
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/script> \
usetypes.pl prueba24.c
PROGRAM^{0}(FUNCTION[f]^{1}(RETURNINT(TIMES(INUM(TERMINAL[2:2]),VAR(TERMINAL[a:2])))))
---------------------------
0)
Types:
$VAR1 = {
'CHAR' => bless( {
'children' => []
}, 'CHAR' ),
'VOID' => bless( {
'children' => []
}, 'VOID' ),
'INT' => bless( {
'children' => []
}, 'INT' ),
'F(X_1(INT),INT)' => bless( {
'children' => [
bless( {
'children' => [
$VAR1->{'INT'}
]
}, 'X_1' ),
$VAR1->{'INT'}
]
}, 'F' )
};
Symbol Table:
$VAR1 = {
'f' => {
'type' => 'F(X_1(INT),INT)',
'line' => 1
}
};
---------------------------
1)
$VAR1 = {
'a' => {
'type' => 'INT',
'param' => 1,
'line' => 1
}
};
The first footnote was due to a call to PROGRAM:footnote
. The footnote
method for the PROGRAM
node was defined as:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '691,696p' Types.eyp | cat -n
1 sub PROGRAM::footnote {
2 return "Types:\n"
3 .Dumper($_[0]->{types}).
4 "Symbol Table:\n"
5 .Dumper($_[0]->{symboltable})
6 }
The second footnote was produced by the existence of a FUNCTION::footnote
method:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '702,704p' Types.eyp | cat -n
1 sub FUNCTION::footnote {
2 return Dumper($_[0]->{symboltable})
3 }
The source program for the example was:
1 int f(int a) {
2 return 2*a;
3 }
$node->equal
A call $tree1->equal($tree2)
compare the two trees $tree1
and $tree2
. Two trees are considered equal if their root nodes belong to the same class, they have the same number of children and the children are (recursively) equal.
Additionally to the two trees the programmer can specify pairs attribute_key => equality_handler
:
$tree1->equal($tree2, attr1 => \&handler1, attr2 => \&handler2, ...)
In such case the definition of equality is more restrictive: Two trees are considered equal if
Their root nodes belong to the same class,
They have the same number of children
For each of the specified attributes occur that for both nodes the existence and definition of the key is the same
Assuming the key exists and is defined for both nodes, the equality handlers return true for each of its attributes and
The respective children are (recursively) equal.
An attribute handler receives as arguments the values of the attributes of the two nodes being compared and must return true if, and only if, these two attributes are considered equal. Follows an example:
pl@nereida:~/LEyapp/examples$ cat equal.pl
#!/usr/bin/perl -w
use strict;
use Parse::Eyapp::Node;
my $string1 = shift || 'ASSIGN(VAR(TERMINAL))';
my $string2 = shift || 'ASSIGN(VAR(TERMINAL))';
my $t1 = Parse::Eyapp::Node->new($string1, sub { my $i = 0; $_->{n} = $i++ for @_ });
my $t2 = Parse::Eyapp::Node->new($string2);
# Without attributes
if ($t1->equal($t2)) {
print "\nNot considering attributes: Equal\n";
}
else {
print "\nNot considering attributes: Not Equal\n";
}
# Equality with attributes
if ($t1->equal($t2, n => sub { return $_[0] == $_[1] })) {
print "\nConsidering attributes: Equal\n";
}
else {
print "\nConsidering attributes: Not Equal\n";
}
When the former program is run without arguments produces the following output:
pl@nereida:~/LEyapp/examples$ equal.pl
Not considering attributes: Equal
Considering attributes: Not Equal
Using equal
During Testing
During the development of your compiler you add new stages to the existing ones. The consequence is that the AST is decorated with new attributes. Unfortunately, this implies that tests you wrote using is_deeply
and comparisons against formerly correct abstract syntax trees are no longer valid. This is due to the fact that is_deeply
requires both tree structures to be equivalent in every detail and that our new code produces a tree with new attributes.
Instead of is_deeply
use the equal
method to check for partial equivalence between abstract syntax trees. You can follow these steps:
Dump the tree for the source inserting
Data::Dumper
statementsCarefully check that the tree is really correct
Decide which attributes will be used for comparison
Write the code for the expected value editing the output produced by
Data::Dumper
Write the handlers for the attributes you decided. Write the comparison using
equal
.
Tests using this methodology will not fail even if later code decorating the AST with new attributes is introduced. See an example:
pl@nereida:~/LEyapp/examples$ cat -n testequal.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Parse::Eyapp::Node;
4 use Data::Dumper;
5 use Data::Compare;
6
7 my $debugging = 0;
8
9 my $handler = sub {
10 print Dumper($_[0], $_[1]) if $debugging;
11 Compare($_[0], $_[1])
12 };
13
14 my $t1 = bless( {
15 'types' => {
16 'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
17 'VOID' => bless( { 'children' => [] }, 'VOID' ),
18 'INT' => bless( { 'children' => [] }, 'INT' ),
19 'F(X_0(),INT)' => bless( {
20 'children' => [
21 bless( { 'children' => [] }, 'X_0' ),
22 bless( { 'children' => [] }, 'INT' ) ]
23 }, 'F' )
24 },
25 'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
26 'lines' => 2,
27 'children' => [
28 bless( {
29 'symboltable' => {},
30 'fatherblock' => {},
31 'children' => [],
32 'depth' => 1,
33 'parameters' => [],
34 'function_name' => [ 'f', 1 ],
35 'symboltableLabel' => {},
36 'line' => 1
37 }, 'FUNCTION' )
38 ],
39 'depth' => 0,
40 'line' => 1
41 }, 'PROGRAM' );
42 $t1->{'children'}[0]{'fatherblock'} = $t1;
43
44 # Tree similar to $t1 but without some attributes (line, depth, etc.)
45 my $t2 = bless( {
46 'types' => {
47 'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
48 'VOID' => bless( { 'children' => [] }, 'VOID' ),
49 'INT' => bless( { 'children' => [] }, 'INT' ),
50 'F(X_0(),INT)' => bless( {
51 'children' => [
52 bless( { 'children' => [] }, 'X_0' ),
53 bless( { 'children' => [] }, 'INT' ) ]
54 }, 'F' )
55 },
56 'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
57 'children' => [
58 bless( {
59 'symboltable' => {},
60 'fatherblock' => {},
61 'children' => [],
62 'parameters' => [],
63 'function_name' => [ 'f', 1 ],
64 }, 'FUNCTION' )
65 ],
66 }, 'PROGRAM' );
67 $t2->{'children'}[0]{'fatherblock'} = $t2;
68
69 # Tree similar to $t1 but without some attributes (line, depth, etc.)
70 # and without the symboltable and types attributes used in the comparison
71 my $t3 = bless( {
72 'types' => {
73 'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
74 'VOID' => bless( { 'children' => [] }, 'VOID' ),
75 'INT' => bless( { 'children' => [] }, 'INT' ),
76 'F(X_0(),INT)' => bless( {
77 'children' => [
78 bless( { 'children' => [] }, 'X_0' ),
79 bless( { 'children' => [] }, 'INT' ) ]
80 }, 'F' )
81 },
82 'children' => [
83 bless( {
84 'symboltable' => {},
85 'fatherblock' => {},
86 'children' => [],
87 'parameters' => [],
88 'function_name' => [ 'f', 1 ],
89 }, 'FUNCTION' )
90 ],
91 }, 'PROGRAM' );
92
93 $t3->{'children'}[0]{'fatherblock'} = $t2;
94
95 # Without attributes
96 if (Parse::Eyapp::Node::equal($t1, $t2)) {
97 print "\nNot considering attributes: Equal\n";
98 }
99 else {
100 print "\nNot considering attributes: Not Equal\n";
101 }
102
103 # Equality with attributes
104 if (Parse::Eyapp::Node::equal(
105 $t1, $t2,
106 symboltable => $handler,
107 types => $handler,
108 )
109 ) {
110 print "\nConsidering attributes: Equal\n";
111 }
112 else {
113 print "\nConsidering attributes: Not Equal\n";
114 }
115
116 # Equality with attributes
117 if (Parse::Eyapp::Node::equal(
118 $t1, $t3,
119 symboltable => $handler,
120 types => $handler,
121 )
122 ) {
123 print "\nConsidering attributes: Equal\n";
124 }
125 else {
126 print "\nConsidering attributes: Not Equal\n";
127 }
The code defining tree $t1
was obtained from an output using Data::Dumper
. The code for trees $t2
and $t3
was written using cut-and-paste from $t1
. They have the same shape than $t1
but differ in their attributes. Tree $t2
shares with $t1
the attributes symboltable
and types
used in the comparison and so equal
returns true
when compared. Since $t3
differs from $t1
in the attributes symboltable
and types
the call to equal
returns false
.
$node->delete
The $node->delete($child)
method is used to delete the specified child of $node
. The child to delete can be specified using the index or a reference. It returns the deleted child.
Throws an exception if the object can't do children
or has no children
. See also the delete method of treeregexes (Parse::Eyapp:YATW
objects) to delete the node being visited.
The following example moves out of a loop an assignment statement assuming is an invariant of the loop. To do it, it uses the delete
and insert_before
methods:
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '98,113p' moveinvariantoutofloopcomplexformula.pl
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
moveinvariant: BLOCK(
@prests,
WHILE(VAR($b), BLOCK(@a, ASSIGN($x, NUM($e)), @c)),
@possts
)
=> {
my $assign = $ASSIGN;
$BLOCK[1]->delete($ASSIGN);
$BLOCK[0]->insert_before($WHILE, $assign);
}
},
FIRSTLINE => 99,
);
$p->generate();
$moveinvariant->s($t);
The example below deletes CODE nodes from the tree build for a translation scheme:
my $transform = Parse::Eyapp::Treeregexp->new(
STRING=>q{
delete_code: CODE => { Parse::Eyapp::Node::delete($CODE) }
},
)
Observe how delete is called as a subroutine.
$node->unshift($newchild)
Inserts $newchild
at the beginning of the list of children of $node
. See also the unshift method for Parse::Eyapp:YATW
treeregexp transformation objects
$node->push($newchild)
Inserts $newchild
at the end of the list of children of $node
.
$node->insert_before($position, $new_child)
Inserts $newchild
before $position
in the list of children of $node
. Variable $position
can be an index or a reference.
The method throws an exception if $position
is an index and is not in range. Also if $node
has no children.
The method throws a warning if $position
is a reference and does not define an actual child. In such case $new_child
is not inserted.
See also the insert_before method for Parse::Eyapp:YATW
treeregexp transformation objects
$node->insert_after($position, $new_child)
Inserts $newchild
after $position
in the list of children of $node
. Variable $position
can be an index or a reference.
The method throws an exception if $position
is an index and is not in the range of $node-
children>.
The method throws a warning if $position
is a reference and does not exists in the list of children. In such case $new_child
is not inserted.
$node->translation_scheme
Traverses $node. Each time a CODE node is visited the subroutine referenced is called with arguments the node and its children. Usually the code will decorate the nodes with new attributes or will update existing ones. Obviously this method does nothing for an ordinary AST. It is used after compiling an Eyapp program that makes use of the %metatree
directive.
$node->bud
Bottom-up decorator. The tree is traversed bottom-up. The set of transformations is applied to each node in the order supplied by the user. As soon as one succeeds no more transformations are applied. For an example see the files examples/Types.eyp
and examples/Trans.trg
. The code below shows an extract of the type-checking phase of a toy-example compiler:
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '600,611p' Types.eyp
my @typecheck = (
our $inum,
our $charconstant,
our $bin,
our $arrays,
our $assign,
our $control,
our $functioncall,
our $statements,
);
$t->bud(@typecheck);
As an example of the appearance of the treeregexp transformations involved in the former call, see the code of the $control
treeregexp transformation:
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '183,192p' Trans.trg
control: /IF|IFELSE|WHILE/:con($bool)
=> {
$bool = char2int($con, 0) if $bool->{t} == $CHAR;
type_error("Condition must have integer type!", $bool->line)
unless $bool->{t} == $INT;
$con->{t} = $VOID;
return 1;
}
Parse::Eyapp:YATW Methods
Parse::Eyapp:YATW
objects represent tree transformations. They carry the information of what nodes match and how to modify them.
Parse::Eyapp::YATW->new
Builds a treeregexp transformation object. Though usually you build a transformation by means of Treeregexp programs you can directly invoke the method to build a tree transformation. A transformation object can be built from a function that conforms to the YATW tree transformation call protocol (see the section "The YATW Tree Transformation Call Protocol"). Follows an example (file examples/12ts_simplify_with_s.pl
):
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '68,$p' 12ts_simplify_with_s.pl | cat -n
1 sub is_code {
2 my $self = shift; # tree
3
4 # After the shift $_[0] is the father, $_[1] the index
5 if ((ref($self) eq 'CODE')) {
6 splice(@{$_[0]->{children}}, $_[1], 1);
7 return 1;
8 }
9 return 0;
10 }
11
12 Parse::Eyapp->new_grammar(
13 input=>$translationscheme,
14 classname=>'Calc',
15 firstline =>7,
16 );
17 my $parser = Calc->new(); # Create the parser
18
19 $parser->YYData->{INPUT} = "2*-3\n"; print "2*-3\n"; # Set the input
20 my $t = $parser->Run; # Parse it
21 print $t->str."\n";
22 my $p = Parse::Eyapp::YATW->new(PATTERN => \&is_code);
23 $p->s($t);
24 { no warnings; # make attr info available only for this display
25 local *TERMINAL::info = sub { $_[0]{attr} };
26 print $t->str."\n";
27 }
After the Parse::Eyapp::YATW
object $p
is built at line 22 the call to method $p->s($t)
applies the transformation is_code
using a bottom-up traversing of the tree $t
. The achieved effect is the elimination of CODE
references in the translation scheme tree. When executed the former code produces:
nereida:~/src/perl/YappWithDefaultAction/examples> 12ts_simplify_with_s.pl
2*-3
EXP(TIMES(NUM(TERMINAL,CODE),TERMINAL,UMINUS(TERMINAL,NUM(TERMINAL,CODE),CODE),CODE),CODE)
EXP(TIMES(NUM(TERMINAL[2]),TERMINAL[*],UMINUS(TERMINAL[-],NUM(TERMINAL[3]))))
The file foldrule6.pl
in the examples/
distribution directory gives you another example:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n foldrule6.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::YATW;
5
6 my %BinaryOperation = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
7
8 sub set_terminfo {
9 no warnings;
10 *TERMINAL::info = sub { $_[0]{attr} };
11 }
12 sub is_foldable {
13 my ($op, $left, $right);
14 return 0 unless defined($op = $BinaryOperation{ref($_[0])});
15 return 0 unless ($left = $_[0]->child(0), $left->isa('NUM'));
16 return 0 unless ($right = $_[0]->child(1), $right->isa('NUM'));
17
18 my $leftnum = $left->child(0)->{attr};
19 my $rightnum = $right->child(0)->{attr};
20 $left->child(0)->{attr} = eval "$leftnum $op $rightnum";
21 $_[0] = $left;
22 }
23
24 my $parser = new Rule6();
25 $parser->YYData->{INPUT} = "2*3";
26 my $t = $parser->Run;
27 &set_terminfo;
28 print "\n***** Before ******\n";
29 print $t->str;
30 my $p = Parse::Eyapp::YATW->new(PATTERN => \&is_foldable);
31 $p->s($t);
32 print "\n***** After ******\n";
33 print $t->str."\n";
when executed produces:
nereida:~/src/perl/YappWithDefaultAction/examples> foldrule6.pl
***** Before ******
TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3]))
***** After ******
NUM(TERMINAL[6])
The YATW Tree Transformation Call Protocol
For a subroutine pattern_sub
to work as a YATW tree transformation - as subroutines is_foldable
and is_code
above - has to conform to the following call description:
pattern_sub(
$_[0], # Node being visited
$_[1], # Father of this node
$index, # Index of this node in @Father->children
$self, # The YATW pattern object
);
The pattern_sub
must return TRUE if matched and FALSE otherwise.
The protocol may change in the near future. Avoid using other information than the fact that the first argument is the node being visited.
Parse::Eyapp::YATW->buildpatterns
Works as Parse::Eyapp->new but receives an array of subs conforming to the YATW Tree Transformation Call Protocol.
our @all = Parse::Eyapp::YATW->buildpatt(\&delete_code, \&delete_tokens);
$yatw->delete
The root of the tree that is currently matched by the YATW transformation $yatw
will be deleted from the tree as soon as is safe. That usually means when the processing of their siblings is finished. The following example (taken from file examples/13ts_simplify_with_delete.pl
in the Parse::Eyapp distribution) illustrates how to eliminate CODE and syntactic terminals from the syntax tree:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
sed -ne '62,$p' 13ts_simplify_with_delete.pl | cat -n
1 sub not_useful {
2 my $self = shift; # node
3 my $pat = $_[2]; # get the YATW object
4
5 (ref($self) eq 'CODE') or ((ref($self) eq 'TERMINAL') and ($self->{token} eq $self->{attr}))
6 or do { return 0 };
7 $pat->delete();
8 return 1;
9 }
10
11 Parse::Eyapp->new_grammar(
12 input=>$translationscheme,
13 classname=>'Calc',
14 firstline =>7,
15 );
16 my $parser = Calc->new(); # Create the parser
17
18 $parser->YYData->{INPUT} = "2*3\n"; print $parser->YYData->{INPUT};
19 my $t = $parser->Run; # Parse it
20 print $t->str."\n"; # Show the tree
21 my $p = Parse::Eyapp::YATW->new(PATTERN => \¬_useful);
22 $p->s($t); # Delete nodes
23 print $t->str."\n"; # Show the tree
when executed we get the following output:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ 13ts_simplify_with_delete.pl
2*3
EXP(TIMES(NUM(TERMINAL[2],CODE),TERMINAL[*],NUM(TERMINAL[3],CODE),CODE))
EXP(TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
$yatw->unshift
Tha call $yatw->unshift($b)
safely unshifts (inserts at the beginning) the node $b
in the list of its siblings of the node that matched (i.e in the list of siblings of $_[0]
). The following example shows a YATW transformation insert_child
that illustrates the use of unshift
(file examples/26delete_with_trreereg.pl
):
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
sed -ne '70,$p' 26delete_with_trreereg.pl | cat -n
1 my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
2
3 delete_code : CODE => { $delete_code->delete() }
4
5 {
6 sub not_semantic {
7 my $self = shift;
8 return 1 if ((ref($self) eq 'TERMINAL') and ($self->{token} eq $self->{attr}));
9 return 0;
10 }
11 }
12
13 delete_tokens : TERMINAL and { not_semantic($TERMINAL) } => {
14 $delete_tokens->delete();
15 }
16
17 insert_child : TIMES(NUM(TERMINAL), NUM(TERMINAL)) => {
18 my $b = Parse::Eyapp::Node->new( 'UMINUS(TERMINAL)',
19 sub { $_[1]->{attr} = '4.5' }); # The new node will be a sibling of TIMES
20
21 $insert_child->unshift($b);
22 }
23 },
24 )->generate();
25
26 Parse::Eyapp->new_grammar(
27 input=>$translationscheme,
28 classname=>'Calc',
29 firstline =>7,
30 );
31 my $parser = Calc->new(); # Create the parser
32
33 $parser->YYData->{INPUT} = "2*3\n"; print $parser->YYData->{INPUT}; # Set the input
34 my $t = $parser->Run; # Parse it
35 print $t->str."\n"; # Show the tree
36 # Get the AST
37 our ($delete_tokens, $delete_code);
38 $t->s($delete_tokens, $delete_code);
39 print $t->str."\n"; # Show the tree
40 our $insert_child;
41 $insert_child->s($t);
42 print $t->str."\n"; # Show the tree
When is executed the program produces the following output:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ 26delete_with_trreereg.pl
2*3
EXP(TIMES(NUM(TERMINAL[2],CODE),TERMINAL[*],NUM(TERMINAL[3],CODE),CODE))
EXP(TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
EXP(UMINUS(TERMINAL[4.5]),TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
Don't try to take advantage that the transformation sub receives in $_[1]
a reference to the father (see the section "The YATW Tree Transformation Call Protocol") and do something like:
unshift $_[1]->{children}, $b
it is unsafe.
$yatw->insert_before
A call to $yatw->insert_before($node)
safely inserts $node
in the list of siblings of $_[0]
just before $_[0]
(i.e. the ndoe that matched with $yatw
). The following example (file t/33moveinvariantoutofloop.t
) illustrates its use:
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
moveinvariant: WHILE(VAR($b), BLOCK(@a, ASSIGN($x, $e), @c))
and { is_invariant($ASSIGN, $WHILE) } => {
my $assign = $ASSIGN;
$BLOCK->delete($ASSIGN);
$moveinvariant->insert_before($assign);
}
},
);
Here the ASSIGN($x, $e)
subtree - if is loop invariant - will be moved to the list of siblings of $WHILE
just before the $WHILE
.
Tree Matching and Tree Substitution
Matching Trees
Both the transformation objects in Parse::Eyapp::YATW
and the nodes in Parse::Eyapp::Node
have a method named m
for matching. For a Parse::Eyapp::YATW
object, the method -when called in a list context- returns a list of Parse::Eyapp::Node::Match
nodes.
@R = $t->m($yatw1, $yatw2, $yatw3, ...)
A Parse::Eyapp::Node::Match
object describes the nodes of the actual tree that have matched. The nodes in the returned list are organized in a hierarchy. They appear in the list sorted according to a depth-first visit of the actual tree $t
. In a scalar context m
returns the first element of the list.
Let us denote by $t
the actual tree being searched and $r
one of the Parse::Eyapp::Node::Match
nodes in the resulting forest @R
. Then we have the following methods:
The method
$r->node
return the node$t
of the actual tree that matchedThe method
$r->father
returns the father of$r
in the matching forest. The father of$r
is defined by this property:$r->father->node
is the nearest ancestor of$r->node
that matched with the treeregexp pattern. That is, there is no ancestor that matched between$r->node
and$r->father->node
. Otherwise$r->father
isundef
The method
$r->coord
returns the coordinates of$r->node
relative to$t
. For example, the coordinate".1.3.2"
denotes the node$t->child(1)->child(3)->child(2)
, where$t
is the root of the search.The method
$r->depth
returns the depth of$r->node
in$t
.When
m
was called as aParse::Eyapp::Node
method, i. e. with potentially more than oneYATW
treeregexp, the method$r->names
returns the array of names of the transformations that matched with$r->node
.
The following example illustrates a use of m
as a Parse::Eyapp:YATW
method. It solves a problem of scope analysis in a C compiler: matching each RETURN
statement with the function that surrounds it. The parsing was already done, the AST was built and left in $t
. The treeregexp used is:
retscope: /FUNCTION|RETURN/
and the code that solves the problem is:
# Associate each "return exp" with its "function"
my @returns = $retscope->m($t);
for (@returns) {
my $node = $_->node;
if (ref($node) eq 'RETURN') {
my $function = $_->father->node;
$node->{function} = $function;
$node->{t} = $function->{t};
}
}
The first line gets a list of Parse::Eyapp::Node::Match
nodes describing the actual nodes that matched /FUNCTION|RETURN/
. If the node described by $_
is a 'RETURN'
node, the expresion $_->father->node
must necessarily point to the function node that encloses it.
The second example shows the use of m
as a Parse::Eyapp::Node
method.
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n m2.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::Treeregexp;
5
6 Parse::Eyapp::Treeregexp->new( STRING => q{
7 fold: /times|plus|div|minus/i:bin(NUM($n), NUM($m))
8 zxw: TIMES(NUM($x), .) and { $x->{attr} == 0 }
9 wxz: TIMES(., NUM($x)) and { $x->{attr} == 0 }
10 })->generate();
11
12 # Syntax analysis
13 my $parser = new Rule6();
14 $parser->YYData->{INPUT} = "0*0*0";
15 my $t = $parser->Run;
16 print "Tree:",$t->str,"\n";
17
18 # Search
19 my $m = $t->m(our ($fold, $zxw, $wxz));
20 print "Match Node:\n",$m->str,"\n";
When executed with input 0*0*0
the program generates this output:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ m2.pl
Tree:TIMES(TIMES(NUM(TERMINAL),NUM(TERMINAL)),NUM(TERMINAL))
Match Node:
Match[[TIMES:0:wxz]](Match[[TIMES:1:fold,zxw,wxz]])
The representation of Match
nodes by str
deserves a comment. Match
nodes have their own info
method. It returns a string containing the concatenation of the class of $r->node
(i.e. the actual node that matched), the depth ($r->depth
) and the names of the transformations that matched (as provided by the method $r->names
)
The SEVERITY
option of Parse::Eyapp::Treeregexp::new
The SEVERITY
option of Parse::Eyapp::Treeregexp::new
controls the way matching succeeds regarding the number of children. To illustrate its use let us consider the following example. The grammar used Rule6.yp
is similar to the one in the SYNOPSIS example.
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n numchildren.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::Treeregexp;
5
6 sub TERMINAL::info { $_[0]{attr} }
7
8 my $severity = shift || 0;
9 my $parser = new Rule6();
10 $parser->YYData->{INPUT} = shift || '0*2';
11 my $t = $parser->Run;
12
13 my $transform = Parse::Eyapp::Treeregexp->new(
14 STRING => q{
15 zero_times_whatever: TIMES(NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
16 },
17 SEVERITY => $severity,
18 FIRSTLINE => 14,
19 )->generate;
20
21 $t->s(our @all);
22
23 print $t->str,"\n";
The program gets the severity level from the command line (line 9). The specification of the term TIMES(NUM($x))
inside the transformation zero_times_whatever
does not clearly state that TIMES
must have two children. There are several interpretations of the treregexp depending on the level fixed for SEVERITY
:
0:
TIMES
must have at least one child. Don't care if it has more.1:
TIMES
must have exactly one child.2:
TIMES
must have exactly one child. When visit aTIMES
node with a different number of children issue a warning.3:
TIMES
must have exactly one child. When visit aTIMES
node with a different number of children issue an error.
Observe the change in behavior according to the level of SEVERITY
:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 0 '0*2'
NUM(TERMINAL[0])
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 1 '0*2'
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 2 '0*2'
Warning! found node TIMES with 2 children.
Expected 1 children (see line 15 of ./numchildren.pl)"
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 3 '0*2'
Error! found node TIMES with 2 children.
Expected 1 children (see line 15 of ./numchildren.pl)"
at (eval 2) line 29
Tree Substitution: The s
methods
Both Parse::Eyapp:Node
and Parse::Eyapp::YATW
objects (i.e. nodes and tree transformations) are provided with a s
method.
In the case of a Parse::Eyapp::YATW
object the method s
applies the tree transformation using a single bottom-up traversing: the transformation is recursively applied to the children and then to the current node.
For Parse::Eyapp:Node
nodes the set of transformations is applied to each node until no transformation matches any more. The example in the "SYNOPSIS" section illustrates the use:
1 # Let us transform the tree. Define the tree-regular expressions ..
2 my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
3 { # Example of support code
4 my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
5 }
6 constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
7 => {
8 my $op = $Op{ref($_[0])};
9 $x->{attr} = eval "$x->{attr} $op $y->{attr}";
10 $_[0] = $NUM[0];
11 }
12 uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
13 zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
14 whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
15 },
16 OUTPUTFILE=> 'main.pm'
17 );
18 $p->generate(); # Create the tranformations
19
20 $t->s($uminus); # Transform UMINUS nodes
21 $t->s(@all); # constant folding and mult. by zero
The call at line 20 can be substituted by $uminus->s($t)
without changes.
AUTHOR
Casiano Rodriguez-Leon (casiano@ull.es)
ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educación y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.
A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.
I wish to thank Francois Desarmenien for his Parse::Yapp
module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.
LICENCE AND COPYRIGHT
Copyright (c) 2006-2007 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.
Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001
These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 1925:
Non-ASCII character seen before =encoding in 'I<Educación'. Assuming UTF-8