NAME
eyapptut - A Parse::Eyapp Tutorial
VERSION
1.06552
SYNOPSIS
use strict;
use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;
sub TERMINAL::info {
$_[0]{attr}
}
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
line: exp <%name EXPRESION_LIST + ';'> { $_[1] } /* list of expressions separated by ';' */
;
/* The %name directive defines the name of the class to which the node being built belongs */
exp:
%name NUM NUM | %name VAR VAR | %name ASSIGN VAR '=' exp
| %name PLUS exp '+' exp | %name MINUS exp '-' exp | %name TIMES exp '*' exp
| %name DIV exp '/' exp | %name UMINUS '-' exp %prec NEG
| '(' exp ')' { $_[2] } /* Let us simplify a bit the tree */
;
%%
sub _Error { die "Syntax error near ".($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n" }
sub _Lexer {
my($parser)=shift; # The parser object
for ($parser->YYData->{INPUT}) {
s/^\s+//;
$_ eq '' and return('',undef);
s/^([0-9]+(?:\.[0-9]+)?)// and return('NUM',$1);
s/^([A-Za-z][A-Za-z0-9_]*)// and return('VAR',$1);
s/^(.)//s and return($1,$1);
}
}
sub Run {
my($self)=shift;
$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, );
}
}; # end grammar
our (@all, $uminus);
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>'Calc', # The name of the package containing the parser
firstline=>7 # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new(); # Create a parser
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n"; # Set the input
my $t = $parser->Run; # Parse it!
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
=> {
my $op = $Op{ref($_[0])};
$x->{attr} = eval "$x->{attr} $op $y->{attr}";
$_[0] = $NUM[0];
}
uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
},
OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations
$t->s($uminus); # Transform UMINUS nodes
$t->s(@all); # constant folding and mult. by zero
local $Parse::Eyapp::Node::INDENT=0;
print "\nSyntax Tree after transformations:\n",$t->str,"\n";
Introduction to Parse::Eyapp
Parse::Eyapp (Extended yapp) is a collection of modules that extends Francois Desarmenien Parse::Yapp 1.05. Eyapp extends yacc/yapp syntax with the functionalities briefly described in this section. This is an introductory tutorial. For a reference guide see Parse::Eyapp. If you are not familiar with yacc or yapp and you can speak Spanish start reading the contents in http://nereida.deioc.ull.es/~pl/perlexamples/chapter_parseeyapp.html.
Input from strings
Grammars can be compiled from a file or from source on the fly (See the synopsis section for an example).
Names for attributes
Attributes can be referenced by meaningful names instead of the classic error-prone positional approach using the dot notation like in:
exp : exp.left '-' exp.right { $left - $right }
By qualifying the first appearance of the syntactic variable exp
with the notation exp.left
we can later refer inside the actions to the associated attribute using the lexical variable $left
. Thus the former code is equivalent to:
exp '-' exp
{
my $lhs = shift;
my ($left, $right) = @_[1, 3];
$lhs->{n} = $left->{n} - $right->{n}
}
The dolar notation $A
can be used as an abbreviation of A.A
. For example, the code:
$VAR '=' $exp
{ $lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n} }
is equivalent to:
VAR '=' exp
{
my $lhs = shift;
my ($VAR, $exp) = @_[1, 3];
$lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n}
}
Lists and Optionals
Lists, optional lists, list separated by tokens, etc. like in the start rule in the Synopsis example can be used:
line: exp <%name EXPRESION_LIST + ';'> { $_[1] }
which defines line
as the language of non empty lists of exp
elements separated by semicolons. The use of %name EXPRESION_LIST
gives a name to the created list. Actually the right hand side of this production has only one element which is the reference to the list. The associated action { $_[1] }
makes the generated parser to return the reference to such list.
The former rule is almost equivalent to:
line: line ';' exp { push $_[1]->{children}, $_[3] }
| exp { bless { children => [ $_[1] ] }, 'EXPRESION_LIST' }
other operators are *
(meaning 0 or more) and ?
(for optionality)
Default actions
When no action is specified both yapp
and eyapp
implicitly insert the semantic action { $_[1] }
. In Parse::Eyapp
you can modify such behavior using the %defaultaction { Perl code }
directive. The { Perl code }
clause that follows the %defaultaction
directive is executed when reducing by any production for which no explicit action was specified.
See an example that translates an infix expression like a=b*-3
into a postfix expression like a b 3 NEG * =
:
# File Postfix.eyp (See the examples/ directory)
%right '='
%left '-' '+'
%left '*' '/'
%left NEG
%defaultaction { return "$left $right $op"; }
%%
line: $exp { print "$exp\n" }
;
exp: $NUM { $NUM }
| $VAR { $VAR }
| VAR.left '='.op exp.right
| exp.left '+'.op exp.right
| exp.left '-'.op exp.right
| exp.left '*'.op exp.right
| exp.left '/'.op exp.right
| '-' $exp %prec NEG { "$exp NEG" }
| '(' $exp ')' { $exp }
;
%%
# Support subroutines as in the Synopsis example
...
Compiling with eyapp
The file containing the Eyapp
program must be compiled with eyapp
:
nereida:~/src/perl/YappWithDefaultAction/examples> eyapp Postfix.eyp
Next, you have to write a client program:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n usepostfix.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Postfix;
4
5 my $parser = new Postfix();
6 $parser->Run;
Now we can run the client program:
nereida:~/src/perl/YappWithDefaultAction/examples> usepostfix.pl
Write an expression: -(2*a-b*-3)
2 a * b 3 NEG * - NEG
Abstract Syntax Trees
Parse::Eyapp
facilitates the construction of concrete syntax trees and abstract syntax trees (abbreviated AST from now on) through the %tree
directive. Nodes in the AST are blessed in the production name
. By default the name of a production is the concatenation of the left hand side and the production number. The production number is the ordinal number of the production as they appear in the associated .output
file (see option -v
of eyapp) However, a production can be named using the %name
directive. Therefore, in the following code:
exp:
%name NUM NUM
| %name VAR VAR
| %name ASSIGN VAR '=' exp
. .............................
| %name UMINUS '-' exp %prec NEG
| '(' exp ')' { $_[2] } /* Let us simplify a bit the tree */
we are explictly naming the productions. Thus, the node corresponding to the production exp: VAR '=' exp
will be named ASSIGN
. Explicit actions can be specified by the programmer like in
| '(' exp ')' { $_[2] } /* Let us simplify a bit the tree */
Explicit actions receive as arguments the references to the children nodes already built. The programmer can influence the shape of the tree by inserting these explicit actions. In this example the programmer has decided to simplify the syntax tree: the nodes associated with the parenthesis are discarded and the reference to the subtree containing the proper expression is returned.
Displaying Trees
All the node classes build by %tree
inherit from Parse::Eyapp::Node
and consequently have access to the methods provided in such module. Among them is the str
method which dumps the tree. The str method traverses the syntax tree dumping the type of the node being visited in a string. If the node has a method info
it will be executed and its result concatenated to the string. Thus, in the Synopsis example, by adding the info
method to the class TERMINAL
:
sub TERMINAL::info {
$_[0]{attr}
}
we achieve the insertion of the TERMINAL
attribute attr
in the string build by str
(see the partial output of synopsis.pl
in section "Syntactic and Semantic tokens"). The existence of some methods (like footnote
) and the values of some package variables influence the behavior of str
. Among the most important are:
@PREFIXES = qw(Parse::Eyapp::Node::); # Prefixes to supress
$INDENT = 0; # 0 = compact, 1 = indent, 2 = indent
# and include Types in closing parenthesis
$STRSEP = ','; # Separator between nodes, by default a comma
$DELIMITER = '['; # The string returned by C<info> will be enclosed
$FOOTNOTE_HEADER = "\n---------------------------\n";
$FOOTNOTE_SEP = ")\n";
$FOOTNOTE_LEFT = '^{'; # Left delimiter for a footnote number
$FOOTNOTE_RIGHT = '}'; # Right delimiter for a footnote number
$LINESEP = 4; # When indent=2 the enclosing parenthesis will be
# commented if more than $LINESEP apart
TERMINAL
nodes
Nodes named TERMINAL
correspond to tokens provided by the lexical analyzer. They are Parse::Eyapp::Node
nodes (hashes) with an attribute attr
holding the attribute provided by the lexical analyzer. The attr
method can be used to get/set the attribute.
User Attributes and System Attributes
All the nodes in the AST are Parse::Eyapp::Node
nodes. They are hashes that the user can decorate with new keys/attributes. The only reserved words are those listed in the reference section. Basically they have a children
key. TERMINAL
nodes have the attr
key.
Syntactic and Semantic tokens
Parse::Eyapp
diferences between syntactic tokens
and semantic tokens
. By default all tokens declared using string notation (i.e. between quotes like '+'
, '='
, in the Synopsis example) are considered syntactic tokens
. Tokens declared by an identifier (like NUM
or VAR
in the Synopsis example) are by default considered semantic tokens
. Syntactic tokens are eliminated when building the syntactic tree. Thus, the first print in the former Synopsis example:
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n";
my $t = $parser->Run;
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
gives as result the following output:
nereida:~/src/perl/YappWithDefaultAction/examples> synopsis.pl
Syntax Tree:
EXPRESION_LIST(
PLUS(
TIMES(
NUM(
TERMINAL[2]
),
UMINUS(
NUM(
TERMINAL[3]
)
) # UMINUS
) # TIMES,
TIMES(
VAR(
TERMINAL[b]
),
NUM(
TERMINAL[0]
)
) # TIMES
) # PLUS,
UMINUS(
UMINUS(
NUM(
TERMINAL[2]
)
) # UMINUS
) # UMINUS
) # EXPRESION_LIST
Saving the Information In Syntactic Tokens
The reason for the adjective %syntactic
applied to a token is to state that the token influences the shape of the syntax tree but carries no other information. When the tree is built the node corresponding to the token is discarded.
Sometimes the difference between syntactic and semantic tokens is blurred. For example the line number associated with an instance of the syntactic token '+'
can be used later -say during type checking- to emit a more accurate error diagnostic. But if the node was discarded the information about that line number is no longer available. When building the syntax tree Parse::Eyapp
(namely the method Parse::Eyapp::YYBuildAST
) checks a TERMINAL::save_attributes
method exists and if so it will be called when visiting a syntactic terminal. The method receives as argument - additionally to the reference to the TERMINAL
node - a reference to the node associated with the left hand side of the production. Here is an example (file examples/Types.eyp
) of use:
sub TERMINAL::save_attributes {
# $_[0] is a syntactic terminal
# $_[1] is the father.
push @{$_[1]->{lines}}, $_[0]->[1]; # save the line!
}
The directives %syntactic token
and %semantic token
The new token declaration directives %syntactic token
and %semantic token
can change the status of a token. For example (file 15treewithsyntactictoken.pl
in the examples/
directory), given the grammar:
%syntactic token b
%semantic token 'a' 'c'
%tree
%%
S: %name ABC
A B C
| %name BC
B C
;
A: %name A
'a'
;
B: %name B
b
;
C: %name C
'c'
;
%%
the tree build for input abc
will be ABC(A(TERMINAL),B,C(TERMINAL))
.
The bypass
clause and the %no bypass
directive
The shape of the tree can be also modified using some %tree
clauses as %tree bypass
which will produce an automatic bypass of any node with only one child at tree-construction-time.
A bypass operation consists in returning the only child of the node being visited to the father of the node and re-typing (re-blessing) the node in the name of the production (if a name is provided).
A node may have only one child at tree-construction-time for one of two reasons.
The first occurs when the right hand side of the production was already unary like in:
exp: %name NUM NUM
Here the
NUM
node will be bypassed and the childTERMINAL
built from the information provided by the lexical analyzer will be renamed asNUM
.Another reason for a node to be bypassed is the fact that though the right hand side of the production may have more than one symbol, only one of them is not a syntactic token like in:
exp: '(' exp ')'
As consequence of the blind application of the bypass rule undesired bypasses may occur like in
exp : %name UMINUS
'-' $exp %prec NEG
though the right hand side has two symbols, token '-'
is a syntactic token and therefore only exp
is left. The bypass operation will be applied when building this node. This bypass can be avoided applying the no bypass ID
directive to the corresponding production:
exp : %no bypass UMINUS
'-' $exp %prec NEG
The following example is the equivalent of the Synopsis example but using the bypass
clause instead:
use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;
sub TERMINAL::info { $_[0]{attr} }
{ no warnings; *VAR::info = *NUM::info = \&TERMINAL::info; }
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+'
%left '*' '/'
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree bypass # Let us build an abstract syntax tree ...
%%
line: exp <%name EXPRESION_LIST + ';'> { $_[1] }
;
exp:
%name NUM NUM | %name VAR VAR | %name ASSIGN VAR '=' exp
| %name PLUS exp '+' exp | %name MINUS exp '-' exp | %name TIMES exp '*' exp
| %name DIV exp '/' exp
| %no bypass UMINUS
'-' $exp %prec NEG
| '(' exp ')'
;
%%
# sub _Error, _Lexer and Run like in the synopsis example
# ...
}; # end grammar
our (@all, $uminus);
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>'Calc', # The name of the package containing the parser
firstline=>7 # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new(); # Create a parser
$parser->YYData->{INPUT} = "a=2*-3+b*0\n"; # Set the input
my $t = $parser->Run; # Parse it!
print "\n************\n".$t->str."\n************\n";
# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM, NUM)
=> {
my $op = $Op{ref($_[0])};
$NUM[0]->{attr} = eval "$NUM[0]->{attr} $op $NUM[1]->{attr}";
$_[0] = $NUM[0];
}
zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
uminus: UMINUS(NUM) => { $NUM->{attr} = -$NUM->{attr}; $_[0] = $NUM }
},
OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations
$t->s(@all); # constant folding and mult. by zero
print $t->str,"\n";
when running this example we obtain the following output:
nereida:~/src/perl/YappWithDefaultAction/examples> bypass.pl
************
EXPRESION_LIST(ASSIGN(TERMINAL[a],PLUS(TIMES(NUM[2],UMINUS(NUM[3])),TIMES(VAR[b],NUM[0]))))
************
EXPRESION_LIST(ASSIGN(TERMINAL[a],NUM[-6]))
As you can see the trees are more compact when using the bypass
directive.
Explictly building nodes with the YYBuildAST
method
Sometimes the best time to decorate a node with some attributes is just after being built. In such cases the programmer can take manual control building the node with YYBuildAST
to inmediately proceed to decorate it.
The following example illustrates the situation:
Variable:
%name VARARRAY
$ID ('[' binary ']') <%name INDEXSPEC +>
{
my $self = shift;
my $node = $self->YYBuildAST(@_);
$node->{line} = $ID->[1];
return $node;
}
This example defines the expression to access an array element as an identifier followed by a non empty list of binary expressions. The node corresponding to the list of indices has been named INDEXSPEC
.
When no explicit action is inserted a binary node will be built having as first child the node corresponding to the identifier $ID
and as second child the reference to the list of binary expressions. However, the programmer wants to decorate the node being built with a line
attribute holding the line number in the source code where the identifier being used appears. The call to the Parse::Eyapp::Driver
method YYBuildAST
does the job of building the node. After that the node can be decorated and returned.
Actually, the %tree
directive is semantically equivalent to:
%default action { goto &Parse::Eyapp::Driver::YYBuildAST }
The child
and descendant
methods
Access to the children of the AST is achieved through the children
and child
methods. More general is the descendant
method that returns the descendant of a node given its coordinates. See a session with the debugger:
DB<7> x $t->child(0)->child(0)->child(1)->child(0)->child(2)->child(1)->str
0 '
BLOCK[8:4:test]^{0}(
CONTINUE[10,10]
)
DB<8> x $t->descendant('.0.0.1.0.2.1')->str
0 '
BLOCK[8:4:test]^{0}(
CONTINUE[10,10]
Returning non References Under %tree
When a explicit user action returns s.t. that is not a reference no node will be inserted. This fact can be used to supress nodes in the AST being built. See the following example:
nereida:~/src/perl/YappWithDefaultAction/examples> sed -ne '1,11p' returnnonode.yp | cat -n
1 %tree
2 %semantic token 'a' 'b'
3 %%
4 S: /* empty */
5 | S A
6 | S B
7 ;
8 A : 'a'
9 ;
10 B : 'b' { }
11 ;
since the action at line 10 returns undef
the B : 'b'
subtree will not be inserted in the AST:
nereida:~/src/perl/YappWithDefaultAction/examples> usereturnnonode.pl
ababa
S_2(S_3(S_2(S_3(S_2(S_1,A_4(TERMINAL[a]))),A_4(TERMINAL[a]))),A_4(TERMINAL[a]))
Observe the absence of B
s and 'b'
s.
Intermediate actions and %tree
Intermediate actions can be used to change the shape of the AST (prune it, decorate it, etc.) but the value returned by them is ignored. The grammar below has two intermediate actions. They modify the attributes of the node to its left and return a reference $f
to such node (lines 5 and 6):
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '1,10p' intermediateactiontree.yp | cat -n
1 %semantic token 'a' 'b'
2 %tree bypass
3 %%
4 S: /* empty */
5 | S A.f { $f->{attr} = "A"; $f; } A
6 | S B.f { $f->{attr} = "B"; $f; } B
7 ;
8 A : %name A 'a'
9 ;
10 B : %name B 'b'
See the client program running:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n useintermediateactiontree.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Parse::Eyapp;
4 use intermediateactiontree;
5
6 { no warnings;
7 *A::info = *B::info = sub { $_[0]{attr} };
8 }
9
10 my $parser = intermediateactiontree->new();
11 my $t = $parser->Run;
12 print $t->str,"\n";
nereida:~/src/perl/YappWithDefaultAction/examples> useintermediateactiontree.pl
aabbaa
S_2(S_4(S_2(S_1,A[A],A[a]),B[B],B[b]),A[A],A[a])
The attributes of left A
s have been changed from 'a'
to 'A'
. However no further children have been inserted.
The alias
clause of the %tree
directive
Access to children in Parse::Eyapp is made through the child
and children
methods. There are occasions however where access by name to the children may be preferable. The use of the alias
clause with the %tree
directive creates accessors to the children with names specified by the programmer. The dot and dolar notations are used for this. When dealing with a production like:
A:
%name A_Node
Node B.bum N.pum $Chip
methods bum
, pum
and Chip
will be created for the class A_Node
. Those methods wil provide access to the respective child (first, second and third in the example). The methods are build at compile-time and therefore later transformations of the AST modifying the order of the children may invalidate the use of these getter-setters.
As an example, the CPAN module Language::AttributeGrammar provides AST decorators from an attribute grammar specification of the AST. To work Language::AttributeGrammar requires named access to the children of the AST nodes. Follows an example of a small calculator:
use Parse::Eyapp;
use Language::AttributeGrammar;
my $grammar = q{
... # priority declarations. Like in previous examples
%tree bypass alias
%%
line: $exp { $_[1] }
;
exp:
%name NUM
$NUM
| %name VAR
$VAR
............ # as in the bypass example
}; # end grammar
Parse::Eyapp->new_grammar(
input=>$grammar, classname=>'Rule6', firstline =>7,
);
my $parser = Rule6->new();
$parser->YYData->{INPUT} = "a = -(2*3+5-1)\n";
my $t = $parser->Run;
my $attgram = new Language::AttributeGrammar <<'EOG';
# Compute the expression
NUM: $/.val = { $<attr> }
TIMES: $/.val = { $<left>.val * $<right>.val }
PLUS: $/.val = { $<left>.val + $<right>.val }
MINUS: $/.val = { $<left>.val - $<right>.val }
UMINUS: $/.val = { -$<exp>.val }
ASSIGN: $/.val = { $<exp>.val }
EOG
my $res = $attgram->apply($t, 'val');
Tree Regular Expressions
Parse::Eyapp
introduces a new language called Tree Regular Expressions that easies the transformation of trees. Let us recall the previous example used in the bypass
section:
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM, NUM)
=> {
my $op = $Op{ref($_[0])};
$NUM[0]->{attr} = eval "$NUM[0]->{attr} $op $NUM[1]->{attr}";
$_[0] = $NUM[0];
}
zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
uminus: UMINUS(NUM) => { $NUM->{attr} = -$NUM->{attr}; $_[0] = $NUM }
},
OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations
$t->s(@all); # constant folding and mult. by zero
The call to the constructor new
builds a Parse::Eyapp::Treeregexp
object. The subsequent call to the method $p->generate
compiles the object producing tree-transformations built according to the specification given in the treeregexp program. A tree transformation is a Parse::Eyapp::YATW
object. The example contains four tree programa transformations named constantfold
, zero_times_whatever
, whatever_times_zero
and <uminus>. These transformations can be grouped in transformation families. Such families of transformations can be applied to any Parse::Eyapp::Node
trees. An special variable @PACKAGE::all
refers to the whole set of transformations in the program. Here PACKAGE
refers to the package where the transformations live. When no PACKAGE
argument is specified in the call to new
- as is the case in this example - the package of the caller is used instead. The call $t->s(@all)
proceeds to the execution of the method s
(for substitution) using all the specified transformations. The transformations will be iteratively applied to all nodes of the tree until there are no changes. Summarizing, that means that
All
UMINUS
nodes whose only child is a numberNUM
will be substituted by theNUM
node but with the sign changedConstant folding will be applied: trees representing constants expressions will be substituted by a
NUM
node representing its valueAll the
TIMES
nodes with one child holding the value 0 will be substituted by that child
The Syntax of Treeregexp
The example illustrates the syntax of the language. A tree transformation conforms to the syntax:
treeregexp:
IDENT ':' treereg ('and' CODE)? ('=>' CODE)?
like in:
zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
The IDENT
is the name given to the tree transformation. A tree transformation is actually a Parse::Eyapp::YATW
object. After generation time two package objects are created per transformation:
A subroutine with name
zero_times_whatever
holding the actual code for the tree transformation will be available andA scalar variable named
$zero_times_whatever
will refer to theParse::Eyapp::YATW
tree transformation object.
These names live in the package specified by the user in the call to new
through the PACKAGE
argument. When no package name is specified the name of the caller package is used instead.
After the IDENT
and the colon comes the treeregexp. The treeregexp is a term, that is a parenthesized description of the shape of the tree like TIMES(NUM, .)
which says: match nodes of type TIMES
whose left child is a NUM
and whose right child is whatever. The dot stands for whatever and is a treeregexp that matches any node.
Then comes the reserved word and
and some Perl code specifying the semantic conditions for the node being visited to match
{ $NUM->{attr} == 0 }
The code can access to the different subtrees using lexical variables whose names match the type of the node. Thus, in the example:
zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
variable $NUM
refers to the left child while variables $TIMES
and $_[0]
will refer refer to the node being visited. When more than one node of the same type exists (for instance TIMES(NUM,NUM)
) the associated lexical variable changes its type from scalar to array and thus if several NUM
nodes appear in the term we will speak about $NUM[0]
, $NUM[1]
, etc.
Separated Compilation with treereg
A Treeregexp program can be isolated in a file an compiled with the program treereg
. The default extension is .trg
. See the following example:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n Shift.trg
1 # File: Shift.trg
2 {
3 sub log2 {
4 my $n = shift;
5 return log($n)/log(2);
6 }
7
8 my $power;
9 }
10 mult2shift: TIMES($e, NUM($m)) and { $power = log2($m->{attr}); (1 << $power) == $m->{attr} }
11 => {
12 $_[0]->delete(1);
13 $_[0]->{shift} = $power;
14 $_[0]->type('SHIFTLEFT');
15 }
Note that auxiliary support code can be inserted at any point between transformations (lines 2-6). The code will be inserted (without the defining curly brackets) at that point. Note also that the lexical variable $power
is visible inside the definition of the mult2shift
transformation.
A treeregexp like $e
matches any node. A reference to the node is saved in the lexical variable $e
. The scope of the variable $e
is the current tree transformation, i.e. mult2shift
. Such kind of treeregexps are called scalar treeregexps.
The call to the delete
method at line 12 deletes the second child of the node being visited (i.e. NUM($m)
).
The call to type
at line 14 retypes the node as a SHIFTLEFT
node.
The program is compiled using the script treereg
:
nereida:~/src/perl/YappWithDefaultAction/examples> treereg Shift
nereida:~/src/perl/YappWithDefaultAction/examples> ls -ltr | tail -1
-rw-rw---- 1 pl users 1405 2006-11-06 14:09 Shift.pm
The module Shift.pm
contains the code implementing the tree transformations.
The client program follows:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n useruleandshift.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Shift;
5 { no warnings; *TERMINAL::info = \&TERMINAL::attr; }
6
7 push @SHIFTLEFT::ISA, 'Parse::Eyapp::Node';
8 sub SHIFTLEFT::info { $_[0]{shift} }
9
10 my $parser = new Rule6();
11 $parser->YYData->{INPUT} = <>;
12 my $t = $parser->Run;
13 print "***********\n",$t->str,"\n";
14 $t->s(@Shift::all);
15 print "***********\n",$t->str,"\n";
Lines 5 and 8 provide the node classes TERMINAL
and SHIFTLEFT
of info
methods to be used by str
(lines 13 and 14). To make SHIFTLEFT
a node class it has to inherit from Parse::Eyapp::Node
(line 7).
Multiplications by a power of two are substituted by the corresponding shifts:
nereida:~/src/perl/YappWithDefaultAction/examples> useruleandshift.pl
a=b*8
***********
ASSIGN(TERMINAL[a],TIMES(VAR(TERMINAL[b]),NUM(TERMINAL[8])))
***********
ASSIGN(TERMINAL[a],SHIFTLEFT[3](VAR(TERMINAL[b])))
Regexp Treeregexps
We can use an ordinary regular expression regexp
inside the term part of a treeregexp. The constantfold
transformation in the Synopsis example shows how:
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
=> {
my $op = $Op{ref($bin)};
$x->{attr} = eval "$x->{attr} $op $y->{attr}";
$_[0] = $NUM[0];
}
The regexp is specified between division slashes /
. It is legal to specify options after the second slash (like e
, i
, etc.). The optional identifier bin
after the regexp indicates the name for the lexical variable holding a copy that references the node. If no identifier is specified, the special variable $W
is used instead. If the treeregexp has several anonymous regexp or dot treeregexps they will be stored in the array variable @W
.
The operation of the ordinary string oriented regexps are slightly modified when they are used inside a treeregexp. by default the option x
will be assumed. The treeregexp compiler will automatically insert it. Use the new option X
(upper case X) if you want to supress such behavior. There is no need also to insert \b
word anchors to delimit identifiers: all the identifiers in a regexp treeregexp are automatically surrounded by \b
. Use the option B
(upper case B) to supress this behavior.
The following fragment of the type checking stage of a simple compiler shows that x
is implictly assumed:
# Binary Operations
bin: / PLUS
|MINUS
|TIMES
|DIV
|MOD
|GT
|GE
|LE
|EQ
|NE
|LT
|AND
|EXP
|OR
/($x, $y)
=> {
$x = char2int($_[0], 0);
$y = char2int($_[0], 1);
if (($x->{t} == $INT) and ( $y->{t} == $INT)) {
$_[0]->{t} = $INT;
return 1;
}
type_error("Incompatible types with operator '".($_[0]->lexeme)."'", $_[0]->line);
}
With the natural Perl regexp semantic the language reserved word WHILE
would match the regexp (see the LE
for less or equal) leading to an erroneous type checking. The automatic insertion of word anchors prevent it.
Matching Trees
Both the transformation objects in Parse::Eyapp::YATW
and the nodes in Parse::Eyapp::Node
have a method named m
for matching.
For a Parse::Eyapp::YATW
object, the method -when called in a list context- returns a list of Parse::Eyapp::Node::Match
nodes referencing the nodes of the actual tree that have matched. The nodes in the list are organized in a hierarchy.
The nodes are sorted in the list of trees (a forest) according to a depth-first visit of the actual tree $t
.
In a scalar context m
returns the first element of the list.
Let us denote by $t
the actual tree being searched and $r
one of the Parse::Eyapp::Node::Match
nodes in the resulting forest. Then we have the following methods:
The method
$r->node
return the node$t
of the actual tree that matchedThe method
$r->father
returns the tree in the matching forest. The father is defined by this property:$r->father->node
is the nearest ancestor of$r->node
that matched with the treeregexp pattern. That is, there is no ancestor that matched between$r->node
and$r->father->node
. Otherwise$r->father
isundef
The method
$r->coord
returns the coordinates of the actual tree that matched using s.t similar to the Dewey notation. for example, the coordinate".1.3.2"
denotes the node$t->child(1)->child(3)->child(2)
, where$t
is the root of the search.The method
$r->depth
returns the depth of$r->node
in$t
.When called as a
Parse::Eyapp::Node
method,$r->names
returns the array of names of the transformations that matched.
The following example illustrates a use of m
as a Parse::Eyapp:YATW
method. It solves a problem of scope analysis in a C compiler: matching each RETURN
statement with the function that surrounds it. The treeregexp used is:
retscope: /FUNCTION|RETURN/
and the code that solves the problem is:
# Scope Analysis: Return-Function
my @returns = $retscope->m($t);
for (@returns) {
my $node = $_->node;
if (ref($node) eq 'RETURN') {
my $function = $_->father->node;
$node->{function} = $function;
$node->{t} = $function->{t};
}
}
The first line gets a list of Parse::Eyapp::Node::Match
nodes describing the actual nodes that matched /FUNCTION|RETURN/
. If the node described by $_
is a 'RETURN'
node, the expresion $_->father->node
must necessarily point to the function node that surrounds it.
The second example shows the use of m
as a Parse::Eyapp::Node
method.
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n m2.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::Treeregexp;
5
6 Parse::Eyapp::Treeregexp->new( STRING => q{
7 fold: /times|plus|div|minus/i:bin(NUM($n), NUM($m))
8 zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 }
9 whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 }
10 })->generate();
11
12 # Syntax analysis
13 my $parser = new Rule6();
14 print "Expression: "; $parser->YYData->{INPUT} = <>;
15 my $t = $parser->Run;
16 local $Parse::Eyapp::Node::INDENT = 1;
17 print "Tree:",$t->str,"\n";
18
19 # Search
20 my $m = $t->m(our ($fold, $zero_times_whatever, $whatever_times_zero));
21 print "Match Node:",$m->str,"\n";
When executed with input 0*0*0
the program generates this output:
nereida:~/src/perl/YappWithDefaultAction/examples> m2.pl
Expression: 0*0*0
Tree:
TIMES(
TIMES(
NUM(
TERMINAL
),
NUM(
TERMINAL
)
),
NUM(
TERMINAL
)
)
Match Node:
Match[TIMES:0:whatever_times_zero](
Match[TIMES:1:fold,zero_times_whatever,whatever_times_zero]
)
The representation of Match
nodes by str
deserves a comment. Match
nodes have their own info
method. It returns a string containing the concatenation of the class of $r->node
(i.e. the actual node that matched), the depth ($r->depth
) and the names of the transformations that matched (as provided by the method $r->names
)
The SEVERITY
option of Parse::Eyapp::Treeregexp::new
The SEVERITY
option of Parse::Eyapp::Treeregexp::new
controls the way matching succeeds regarding the number of children. To illustrate its use let us consider the following example. The grammar Rule6
used by the example is similar to the one in the Synopsis example.
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n numchildren.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::Treeregexp;
5 use Parse::Eyapp::Node;
6
7 sub TERMINAL::info { $_[0]{attr} }
8
9 my $severity = shift || 0;
10 my $parser = new Rule6();
11 $parser->YYData->{INPUT} = shift || '0*2';
12 my $t = $parser->Run;
13
14 my $transform = Parse::Eyapp::Treeregexp->new(
15 STRING => q{
16 zero_times_whatever: TIMES(NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
17 },
18 SEVERITY => $severity,
19 FIRSTLINE => 15,
20 )->generate;
21
22 $t->s(our @all);
23
24 print $t->str,"\n";
The program gets the severity level from the command line (line 9). The specification of the term TIMES(NUM($x))
inside the transformation zero_times_whatever
does not clearly state that TIMES
must have two children. There are several interpretations of the treregexp depending on the level fixed for SEVERITY
:
0:
TIMES
must have at least one child. Don't care if it has more.1:
TIMES
must have exactly one child.2:
TIMES
must have exactly one child. When visit aTIMES
node with a different number of children issue a warning.3:
TIMES
must have exactly one child. When visit aTIMES
node with a different number of children issue an error.
Observe the change in behavior according to the level of SEVERITY
:
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 0 '0*2'
NUM(TERMINAL[0])
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 1 '0*2'
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 2 '0*2'
Warning! found node TIMES with 2 children.
Expected 1 children (see line 16 of numchildren.pl)"
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 3 '0*2'
Error! found node TIMES with 2 children.
Expected 1 children (see line 16 of numchildren.pl)"
at (eval 2) line 29
Array Treeregexp Expressions
The Treeregexp language permits expressions like:
A(@a,B($x),@c)
After the matching variable @A
contains the shortest prefix of $A->children
that does not match B($x)
. The variable @c
contains the remaining sufix of $A->children
.
The following example uses array treereg expressions to move an assignment out of loop (to be correct, we have to guarantee that the assignment is an invariant of the loop). See lines 98-111:
nereida:~/src/perl/YappWithDefaultAction/examples> \
cat -n moveinvariantoutofloopcomplexformula.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Parse::Eyapp;
4 use Parse::Eyapp::Treeregexp;
5
6 my $grammar = q{
.. ............ # as usual, but including WHILE loops
80 }; # end grammar
.. ......................................................................
93 my $program = "a =1000; c = 1; while (a) { c = c*a; b = 5; a = a-1 }\n";
94 $parser->YYData->{INPUT} = $program;
95 my $t = $parser->Run;
96 my @output = split /\n/, $t->str;
97
98 my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
99 moveinvariant: BLOCK(
100 @prests,
101 WHILE(VAR($b), BLOCK(@a, ASSIGN($x, NUM($e)), @c)),
102 @possts
103 )
104 => {
105 my $assign = $ASSIGN;
106 $BLOCK[1]->delete($ASSIGN);
107 $BLOCK[0]->insert_before($WHILE, $assign);
108 }
109 },
110 FIRSTLINE => 99,
111 );
112 $p->generate();
113 $moveinvariant->s($t);
114 my @output2 = split /\n/, $t->str;
115
116 my ($node1, $node2);
117 format STDOUT_TOP =
118 PROGRAM
119 -------------------------------------------------------
120 @||||||||||||||||||||||||||||||||||||||||||||||||||||||
121 $program
122 -------------------------------------------------------
123 Before | After
124 ---------------------------|---------------------------
125 .
126
127 format STDOUT =
128 @<<<<<<<<<<<<<<<<<<<<<<<<<<@|@<<<<<<<<<<<<<<<<<<<<<<<<<
129 $node1, '|',$node2
130 .
131
132 for (1..$#output) {
133 $node1 = $output[$_];
134 $node2 = $output2[$_];
135 write;
136 }
The call to the method delete
at line 106 deletes the ASSIGN
child of the second BLOCK
. The copy saved in $assign
is inserted as a child of the first block before the loop. Here is the output:
nereida:~/src/perl/YappWithDefaultAction/examples> \
moveinvariantoutofloopcomplexformula.pl | cat -n
1 PROGRAM
2 -------------------------------------------------------
3 a =1000; c = 1; while (a) { c = c*a; b = 5; a = a-1 }
4 -------------------------------------------------------
5 Before | After
6 ---------------------------|---------------------------
7 BLOCK( | BLOCK(
8 ASSIGN( | ASSIGN(
9 TERMINAL[a], | TERMINAL[a],
10 NUM( | NUM(
11 TERMINAL[1000] | TERMINAL[1000]
12 ) | )
13 ) # ASSIGN, | ) # ASSIGN,
14 ASSIGN( | ASSIGN(
15 TERMINAL[c], | TERMINAL[c],
16 NUM( | NUM(
17 TERMINAL[1] | TERMINAL[1]
18 ) | )
19 ) # ASSIGN, | ) # ASSIGN,
20 WHILE( | ASSIGN(
21 VAR( | TERMINAL[b],
22 TERMINAL[a] | NUM(
23 ), | TERMINAL[5]
24 BLOCK( | )
25 ASSIGN( | ) # ASSIGN,
26 TERMINAL[c], | WHILE(
27 TIMES( | VAR(
28 VAR( | TERMINAL[a]
29 TERMINAL[c] | ),
30 ), | BLOCK(
31 VAR( | ASSIGN(
32 TERMINAL[a] | TERMINAL[c],
33 ) | TIMES(
34 ) # TIMES | VAR(
35 ) # ASSIGN, | TERMINAL[c]
36 ASSIGN( | ),
37 TERMINAL[b], | VAR(
38 NUM( | TERMINAL[a]
39 TERMINAL[5] | )
40 ) | ) # TIMES
41 ) # ASSIGN, | ) # ASSIGN,
42 ASSIGN( | ASSIGN(
43 TERMINAL[a], | TERMINAL[a],
44 MINUS( | MINUS(
45 VAR( | VAR(
46 TERMINAL[a] | TERMINAL[a]
47 ), | ),
48 NUM( | NUM(
49 TERMINAL[1] | TERMINAL[1]
50 ) | )
51 ) # MINUS | ) # MINUS
52 ) # ASSIGN | ) # ASSIGN
53 ) # BLOCK | ) # BLOCK
54 ) # WHILE | ) # WHILE
55 ) # BLOCK | ) # BLOCK
Translation Schemes
Eyapp allows through the %metatree
directive the creation of Translation Schemes as described in the Dragon's book. Instead of executing the semantic actions associated with the productions, the syntax tree is built. Semantic actions aren't executed. Instead they are inserted as nodes of the syntax tree. The main difference with ordinary nodes being that the attribute of such a CODE
node is a reference to the anonymous subroutine representing the semantic action. The tree is later traversed in depth-first order using the $t->translation_scheme
method: each time a CODE
node is visited the action is executed.
The following example parses a tiny subset of a typical typed language and decorates the syntax tree with a new attribute t
holding the type of each declared variable:
use strict; # File examples/trans_scheme_simple_decls4.pl
use Data::Dumper;
use Parse::Eyapp;
our %s; # symbol table
my $ts = q{
%token FLOAT INTEGER NAME
%{
our %s;
%}
%metatree
%%
Dl: D <* ';'>
;
D : $T { $L->{t} = $T->{t} } $L
;
T : FLOAT { $lhs->{t} = "FLOAT" }
| INTEGER { $lhs->{t} = "INTEGER" }
;
L : $NAME
{ $NAME->{t} = $lhs->{t}; $s{$NAME->{attr}} = $NAME }
| $NAME { $NAME->{t} = $lhs->{t}; $L->{t} = $lhs->{t} } ',' $L
{ $s{$NAME->{attr}} = $NAME }
;
%%
}; # end $ts
sub Error { die "Error sintáctico\n"; }
{ # Closure of $input, %reserved_words and $validchars
my $input = "";
my %reserved_words = ();
my $validchars = "";
sub parametrize__scanner {
$input = shift;
%reserved_words = %{shift()};
$validchars = shift;
}
sub scanner {
$input =~ m{\G\s+}gc; # skip whites
if ($input =~ m{\G([a-z_A_Z]\w*)\b}gc) {
my $w = uc($1); # upper case the word
return ($w, $w) if exists $reserved_words{$w};
return ('NAME', $1); # not a reserved word
}
return ($1, $1) if ($input =~ m/\G([$validchars])/gc);
die "Not valid token: $1\n" if ($input =~ m/\G(\S)/gc);
return ('', undef); # end of file
}
} # end closure
Parse::Eyapp->new_grammar(input=>$ts,classname=>'main',outputfile=>'Types.pm');
my $parser = main->new(yylex => \&scanner, yyerror => \&Error);
parametrize__scanner(
"float x,y;\ninteger a,b\n",
{ INTEGER => 'INTEGER', FLOAT => 'FLOAT'},
",;"
);
my $t = $parser->YYParse() or die "Syntax Error analyzing input";
$t->translation_scheme;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Deepcopy = 1;
$Data::Dumper::Deparse = 1;
print Dumper($t);
print Dumper(\%s);
Inside a Translation Scheme the lexical variable $lhs
refers to the attribute of the father.
Execution Stages of a Translation Scheme
The execution of a Translation Scheme can be divided in the following stages:
- 1. During the first stage the grammar is analyzed and the parser is built:
-
Parse::Eyapp->new_grammar(input=>$ts,classname=>'main',outputfile=>'Types.pm');
This stage is called Class Construction Time
- 2. A parser conforming to the generated grammar is built
-
my $parser = main->new(yylex => \&scanner, yyerror => \&Error);
This stage is called Parser Construction Time
- 3. The next phase is Tree construction time. The input is set and the tree is built:
-
parametrize__scanner( "float x,y;\ninteger a,b\n", { INTEGER => 'INTEGER', FLOAT => 'FLOAT'}, ",;" ); my $t = $parser->YYParse() or die "Syntax Error analyzing input";
- 4. The last stage is Execution Time. The tree is traversed in depth first order and the
CODE
nodes are executed. -
$t->translation_scheme;
This combination of bottom-up parsing with depth first traversin leads to a semantic behavior similar to LL and top-down parsers but with several differences:
The grammar can be left-recursive
At the time of executing the action the syntax tree is already built, therefore we can refer to nodes on the right side of the action like in:
D : $T { $L->{t} = $T->{t} } $L
The %begin
directive
The %begin { code }
directive can be used when building a translation scheme, i.e. when under the control of the %metatree
directive. It indicates that such code
will be executed at tree construction time. Therefore the code receives as arguments the references to the nodes of the branch than is being built. Usually the code assist in the construction of the tree. Line 39 of the following code shows an example. The action { $exp }
simplifies the syntax tree bypassing the parenthesis node. The example also illustrates the combined use of default actions and translation schemes.
nereida:~/src/perl/YappWithDefaultAction/examples> \
cat -n trans_scheme_default_action.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Data::Dumper;
4 use Parse::Eyapp;
5 use IO::Interactive qw(is_interactive);
6
7 my $translationscheme = q{
8 %{
9 # head code is available at tree construction time
10 use Data::Dumper;
11 our %sym; # symbol table
12 %}
13
14 %defaultaction { $lhs->{n} = eval " $left->{n} $_[2]->{attr} $right->{n} " }
15
16 %metatree
17
18 %right '='
19 %left '-' '+'
20 %left '*' '/'
21
22 %%
23 line: %name EXP
24 exp <+ ';'> /* Expressions separated by semicolons */
25 { $lhs->{n} = $_[1]->Last_child->{n} }
26 ;
27
28 exp:
29 %name PLUS
30 exp.left '+' exp.right
31 | %name MINUS
32 exp.left '-' exp.right
33 | %name TIMES
34 exp.left '*' exp.right
35 | %name DIV
36 exp.left '/' exp.right
37 | %name NUM $NUM
38 { $lhs->{n} = $NUM->{attr} }
39 | '(' $exp ')' %begin { $exp }
40 | %name VAR
41 $VAR
42 { $lhs->{n} = $sym{$VAR->{attr}}->{n} }
43 | %name ASSIGN
44 $VAR '=' $exp
45 { $lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n} }
46
47 ;
48
49 %%
50 # tail code is available at tree construction time
51 sub _Error {
52 die "Syntax error.\n";
53 }
54
55 sub _Lexer {
56 my($parser)=shift;
57
58 for ($parser->YYData->{INPUT}) {
59 defined($_) or return('',undef);
60
61 s/^\s*//;
62 s/^([0-9]+(?:\.[0-9]+)?)// and return('NUM',$1);
63 s/^([A-Za-z][A-Za-z0-9_]*)// and return('VAR',$1);
64 s/^(.)// and return($1,$1);
65 s/^\s*//;
66 }
67 }
68
69 sub Run {
70 my($self)=shift;
71 return $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error );
72 }
73 }; # end translation scheme
74
75 $Data::Dumper::Indent = 1;
76 $Data::Dumper::Terse = 1;
77 $Data::Dumper::Deepcopy = 1;
78 my $p = Parse::Eyapp->new_grammar(
79 input=>$translationscheme,
80 classname=>'main',
81 firstline => 6,
82 outputfile => 'main.pm');
83 die $p->qtables() if $p->Warnings;
84 my $parser = main->new();
85 print "Write a sequence of arithmetic expressions: " if is_interactive();
86 $parser->YYData->{INPUT} = <>;
87 my $t = $parser->Run() or die "Syntax Error analyzing input";
88 $t->translation_scheme;
89 my $treestring = Dumper($t);
90 our %sym;
91 my $symboltable = Dumper(\%sym);
92 print <<"EOR";
93 ***********Tree*************
94 $treestring
95 ******Symbol table**********
96 $symboltable
97 ************Result**********
98 $t->{n}
99
100 EOR
Scope Analysis with Parse::Eyapp::Scope
Parse::Eyapp
provides support for Scope Analysis through the module Parse::Eyapp::Scope
. Scope Analysis solves the problem of matching each instance or use of an object in the source text with the definition that applies to such instance. Since it is a matching problem it can sometimes easily solved using m
as it was explained in section "Matching Trees".
The following pieces of code show how to implement scope analysis for a C-like language using Parse::Eyapp::Scope
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '131,149p' Types.eyp | cat -n
1 sub reset_file_scope_vars {
2 %st = (); # reset symbol table
3 ($tokenbegin, $tokenend) = (1, 1);
4 %type = ( INT => Parse::Eyapp::Node->hnew('INT'), # like new but
5 CHAR => Parse::Eyapp::Node->hnew('CHAR'), # creates a DAG
6 VOID => Parse::Eyapp::Node->hnew('VOID'),
7 );
8 $depth = 0;
9 $ids = Parse::Eyapp::Scope->new(
10 SCOPE_NAME => 'block',
11 ENTRY_NAME => 'info',
12 SCOPE_DEPTH => 'depth',
13 );
14 $loops = Parse::Eyapp::Scope->new(
15 SCOPE_NAME => 'exits',
16 );
17 $ids->begin_scope();
18 $loops->begin_scope();
19 }
Of course you have to include a directive
use Parse::Eyapp::Scope
in your client program.
The calls to Parse::Eyapp::Scope->new
method (lines 9-13 and 14-16 in the code above) create two Scope Manager objects. One scope manager to solve the scope problem for variables ($ids
) and another to solve the scope problem for loops ($loops
). The scope problem for loops consists in matching each instance of a BREAK
or CONTINUE
with the enclosing loop. The beginning of a scope is set by calling to the begin_scope
method (lines 17 and 18). The end of a scope is signalled by a call to the method end_scope
. Of course, sub reset_file_scope_vars
must be executed at the proper time:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '170,203p' Types.eyp | cat -n
1 program: /* program -> definition + */
2 {
3 reset_file_scope_vars();
4 }
5 definition<%name PROGRAM +>.program
6 {
7 $program->{symboltable} = { %st }; # creates a copy of the s.t.
8 $program->{depth} = 0;
9 $program->{line} = 1;
10 $program->{types} = { %type };
11 $program->{lines} = $tokenend;
12
13 my ($nondec, $declared) = $ids->end_scope($program->{symboltable}, $program, 'type');
14
15 # Type checking: add a direct pointer to the data-structure
16 # describing the type
17 $_->{t} = $type{$_->{type}} for @$declared;
18
19 if (@$nondec) {
20 warn "Identifier ".$_->key." not declared at line ".$_->line."\n" for @$nondec;
21 die "\n";
22 }
23
24 my $out_of_loops = $loops->end_scope($program);
25 if (@$out_of_loops) {
26 warn "Error: ".ref($_)." outside of loop at line $_->{line}\n" for @$out_of_loops;
27 die "\n";
28 }
29
30 # Check that are not dangling breaks
31 reset_file_scope_vars();
32
33 $program;
34 }
Observe the different ways of calling end_scope
(lines 13 and 24). When a hash table is provided as first argument the declared symbols will be automatically inserted in it. In such case the classes of the nodes being inserted must have a key
method that computes the key for such node.
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '651,657p' Types.eyp | cat -n
1 sub VAR::key {
2 my $self = shift;
3
4 return $self->child(0)->{attr}[0];
5 }
6
7 *VARARRAY::key = *FUNCTIONCALL::key = \&VAR::key;
Each instance of an scoped object must be declared as belonging to the current scope using the scope_instance
method. The following is an example for the $loops
scope manager object:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '335,346p' Types.eyp | cat -n
1 statement:
2 expression ';' { $_[1] }
3 | ';'
4 | %name BREAK
5 $BREAK ';'
6 {
7 my $self = shift;
8 my $node = $self->YYBuildAST(@_);
9 $node->{line} = $BREAK->[1];
10 $loops->scope_instance($node);
11 return $node;
12 }
and the following illustrates the same for the $ids
scope manager:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '410,425p' Types.eyp | cat -n
1 Primary:
2 %name INUM
3 INUM
4 | %name CHARCONSTANT
5 CHARCONSTANT
6 | $Variable
7 {
8 $ids->scope_instance($Variable);
9 return $Variable
10 }
11 | '(' expression ')' { $_[2] }
12 | $function_call
13 {
14 $ids->scope_instance($function_call);
15 return $function_call # bypass
16 }
Of course, in each place where a new scope begins/ends the corresponding calls to begin_scope
and end_scope
must be issued. See the following code:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '277,302p' Types.eyp | cat -n
1 block: /* Production is: block -> '{' declaration * statement * '}' */
2 '{'.bracket
3 { $ids->begin_scope(); }
4 declaration<%name DECLARATIONS *>.decs statement<%name STATEMENTS *>.sts '}'
5 {
6 my %st;
7
8 for my $lst ($decs->children) {
9
10 # control duplicated declarations
11 my $message;
12 die $message if $message = is_duplicated(\%st, $lst);
13
14 %st = (%st, %$lst);
15 }
16 $sts->{symboltable} = \%st;
17 $sts->{line} = $bracket->[1];
18 $sts->type("BLOCK") if (%st);
19 my ($nondec, $dec) = $ids->end_scope(\%st, $sts, 'type');
20
21 # Type checking: add a direct pointer to the data-structure
22 # describing the type
23 $_->{t} = $type{$_->{type}} for @$dec;
24
25 return $sts;
26 }
SEE ALSO
perldoc Parse::Eyapp
The pdf file in http://nereida.deioc.ull.es/~pl/perlexamples/Eyapp.pdf
perldoc eyapp,
perldoc treereg,
Análisis Léxico y Sintáctico, (Notes for a course in compiler construction) by Casiano Rodriguez-Leon. Available at http://nereida.deioc.ull.es/~pl/perlexamples/ Is the more complete and reliable source for Parse::Eyapp. However is in Spanish.
Man pages of yacc(1),
Man pages of bison(1),
REFERENCES
The classic book "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman (Addison-Wesley 1986)
AUTHOR
Casiano Rodriguez-Leon (casiano@ull.es)
A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.
ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educación y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.
I wish to thank Francois Desarmenien for his Parse::Yapp
module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.
LICENCE AND COPYRIGHT
Copyright (c) 2006-2007 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.
Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001
These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
3 POD Errors
The following errors were encountered while parsing the POD:
- Around line 820:
Expected '=item *'
- Around line 823:
Expected '=item *'
- Around line 1443:
Non-ASCII character seen before =encoding in 'sintáctico\n";'. Assuming CP1252