NAME
Parse::Eyapp - Extensions for Parse::Yapp
VERSION
1.088
SYNOPSIS
use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;
sub TERMINAL::info {
$_[0]{attr}
}
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
line:
exp <%name EXPRESION_LIST + ';'>
{ $_[1] } /* list of expressions separated by ';' */
;
/* The %name directive defines the name of the class to which the node being built belongs */
exp:
%name NUM
NUM
| %name VAR
VAR
| %name ASSIGN
VAR '=' exp
| %name PLUS
exp '+' exp
| %name MINUS
exp '-' exp
| %name TIMES
exp '*' exp
| %name DIV
exp '/' exp
| %name UMINUS
'-' exp %prec NEG
| '(' exp ')'
{ $_[2] } /* Let us simplify a bit the tree */
;
%%
sub _Error { die "Syntax error near ".($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n" }
sub _Lexer {
my($parser)=shift; # The parser object
for ($parser->YYData->{INPUT}) { # Topicalize
m{\G\s+}gc;
$_ eq '' and return('',undef);
m{\G([0-9]+(?:\.[0-9]+)?)}gc and return('NUM',$1);
m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return('VAR',$1);
m{\G(.)}gcs and return($1,$1);
}
return('',undef);
}
sub Run {
my($self)=shift;
$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, );
}
}; # end grammar
our (@all, $uminus);
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>'Calc', # The name of the package containing the parser
firstline=>7 # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new(); # Create a parser
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n"; # Set the input
my $t = $parser->Run; # Parse it!
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
=> {
my $op = $Op{ref($bin)};
$x->{attr} = eval "$x->{attr} $op $y->{attr}";
$_[0] = $NUM[0];
}
uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
},
OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations
$t->s($uminus); # Transform UMINUS nodes
$t->s(@all); # constant folding and mult. by zero
local $Parse::Eyapp::Node::INDENT=0;
print "\nSyntax Tree after transformations:\n",$t->str,"\n";
Structure of this Document
This document is divided in four parts:
The introduction attempts to summarize the way
eyapp
works. It assumes a reader familiar with parsing techniques and with some knowledge ofyacc
oryapp
orRecDescent
or similar tools.The first part describes the Eyapp language. It goes from section "The Eyapp Language" to section "Parse::Eyapp::Parse objects". If you are just interested in parsing and are looking for a yacc/yapp compatible extension read this part. If you have trouble with your
eyapp
program have a look at section "Debugging Parse::Eyapp Programs".The second part describes the Treeregexp language. It goes from section "The Treeregexp Language" to section "Tree Substitution: The s methods". Read it if you are interested in generating abstract syntax trees and transform them.
The third part (section "Scope Analysis with Parse::Eyapp::Scope") describes the Eyapp interface for Scope Analysis. It can be useful if your application language has nested scopes.
Introduction
Parse::Eyapp (Extended yapp) is a collection of modules that extends Francois Desarmenien Parse::Yapp 1.05. Eyapp extends yacc/yapp syntax with functionalities like named attributes, EBNF-like expressions, modifiable default action, automatic syntax tree building, semi-automatic abstract syntax tree building, translation schemes, tree regular expressions, tree transformations, scope analysis support, directed acyclic graphs and a few more.
Basic Concepts
Parsing is the activity of producing a syntax tree from an input stream. The program example in the synopsis section shows an example of parsing. The variable $grammar
contains a context free eyapp grammar defining the language of lists of arithmetic expressions. A context free grammar is a mathematical device to define languages. The grammar for the example in the synopsis section is:
line: exp <+ ';'>
;
exp:
NUM
| VAR
| VAR '=' exp
| exp '+' exp
| exp '-' exp
| exp '*' exp
| exp '/' exp
| '-' exp
| '(' exp ')'
;
A grammar generates a language. A grammar is defined by a set of production rules. A production rule has two components: a left hand side which is a syntactic variable or non terminal and a right hand side which is a phrase made of syntactic variables and terminals. The left hand side (lhs) and the right hand side (rhs) are usually separated by an arrow like in:
exp -> VAR = exp
A note: the production rule
line: exp <+ ';'>
is not really a production rule but an abbreviation for two productions. It stands for:
line : exp
| line ';' exp
;
A terminal or token never appears on the left hand side of a production rule. The phrases of the language are those obtained succesively applying the production rules of the grammar until no more rules can be applied. The succesive substitutions must start from the start
symbol of the grammar (line
in the example). Such legal sequence of substitutions is known as a derivation. The following is an example of a legal derivation (the big arrow =>
is read derives):
line => exp => VAR = exp => VAR = exp + exp => VAR = exp + NUM => VAR = VAR + NUM
thus the phrase VAR = VAR + NUM
belongs to the language generated by the former grammar. A derivation like can be seen as a tree. For instance, the former derivation is equivalen (has the same information) than the tree:
line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM))))
Such a tree is called a syntax tree for the input VAR = VAR + NUM
. A grammar is said to be ambiguous if there are phrases in the generated language that have more than one syntax tree. The grammar in the synopsis example is ambiguous. Here is an alternative tree for the same phrase VAR = VAR + NUM
:
line(exp(exp(VAR, '=', exp(VAR)), '+', exp(NUM)))
Parsers created by eyapp
do not deal directly with the input. Instead they expect the input to be processed by a lexical analyzer. The lexical analyzer parses the input and produces the next token. A token is a pair. The first component is the name of the token (like NUM
or VAR
) and the second is its attribute (i.e. ay information associated with the token, like that the value is 4
for a NUM
or the identifier is temperature
for a VAR
). Tokens are usually defined using regular expressions. Thus the token NUM
is characterized by /[0-9]+(?:\.[0-9]+)?/
and the token VAR
by /[A-Za-z][A-Za-z0-9_]*/
. The subroutine _Lexer
in the tail section of the SYNOPSIS section is a typical example of a typical lexical analyzer:
sub _Lexer {
my($parser)=shift; # The parser object
for ($parser->YYData->{INPUT}) { # Topicalize
m{\G\s+}gc; # skip whites
$_ eq '' and return('',undef);
m{\G([0-9]+(?:\.[0-9]+)?)}gc and return('NUM',$1);
m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return('VAR',$1);
m{\G(.)}gcs and return($1,$1);
}
return('',undef);
}
The input was saved in the YYData->{INPUT}
section of the $parser
object. The for
loop is a false for
: its goal is to make $_
an alias of $parser->YYData->{INPUT}
. To catch the next pattern we use the anchor \G
. The \G
anchor matches at the point where the previous /g
match left off. Normally, when a scalar m{}g
match fails, the match position is reset and \G
will start matching at the beginning of the string. The c
option causes the match position to be retained following an unsuccessful match. The couple ('',undef)
signals the end of the input.
Parse::Eyapp
can analyze your grammar and produce a parser from your grammar. Actually Parse::Eyapp
is a translation scheme analyzer. A translation scheme scheme is a context free grammar where the right hand sides of the productions have been augmented with semantic actions (i.e. with chunks of Perl code):
A -> alpha { action(@_) } beta
The analyzer generated by Eyapp executes { action(@_) }
after all the semantic actions asssociated with alpha
have been executed and before the execution of any of the semantic actions associated with beta
.
Notice that ambiguous grammars produce ambiguous translation schemes: since a phrase may have two syntactic trees it will be more than one tree-traversing and consequently more than one way to execute the embedded semantic actions. Certainly different execution orders will usually produce different results. Thus, syntactic ambiguities translate onto semantic ambiguities. That is why it is important to resolve all the ambiguities and conflicts that may arise in our grammar. This is the function of the %left
and %right
declarations on the header section:
my $grammar = q{
# header section
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
.... # body section
%%
.... # tail section
};
Priority can be assigned to tokens by using the %left
and %right
declarations. Tokens in lines below have more precedence than tokens in line above. By giving token '+'
more precedence than token '='
we solve the ambiguity for phrases like VAR = VAR + NUM
. The tree
line(exp(VAR, '=', exp(exp(VAR), '+', exp(NUM))))
will be built, discarding the other tree. Since priority means earlier evaluation and the evaluation of semantic actions is bottom up, the deeper the associated subtree the higher the priority.
In a translation scheme the embedded actions modify the attributes associated with the symbols of the grammar.
A -> alpha { action(@_) } beta
Each symbol on the right hand side of a production rule has an associated scalar attribute. In eyapp
the attributes of the symbol to the left of action
are passed as arguments to action
(in the example, those of alpha
). These arguments are preceded by a reference to the syntax analyzer object. There is no way inside an ordinary eyapp
program for an intermediate action
to access the attributes of the symbols on its right, i.e. those associated with the symbols of beta
. This restriction is lifted if you use the %metatree
directive to build a full translation scheme. See the section "Translation Schemes and the %metatree directive" in Parse::Eyapp to know more about full translation schemes.
Actions on the right hand side counts as symbols and so they can be referenced by its positional arument in later actions in the same production rule. For intermediate actions, the value returned by the action
is the attribute associated with such action. For an action at the end of the rule:
A -> alpha { lastaction(@_) }
the returned value constitutes the attribute of the left hand side of the rule (the attribute of A
in this case). The action at the end of the right hand side is called the action associated with the production rule. When no explicit action has been associated with a production rule the default action applies. In Parse::Eyapp
the programmer can define what is the default action.
A very special action is "build the node associated with this production rule" which is performed by the YYBuildAST
method of the parser object:
%default action { goto &Parse::Eyapp::Driver::YYBuildAST }
The %tree
directive used in the SYNOPSIS example is an abbreviation for this and has the effect of building an abstract syntax tree for the input.
The call to
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>"Calc", # The name of the package containing the parser
);
compiles $grammar
and produces a new class Calc
containing a LALR parser for such grammar. The call
$parser = Calc->new()
creates a parser object for the language generated by $grammar
. Using the YYParse
of the parser object:
$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, )
YYParse
is called with arguments a reference to the lexical analyzer and a reference to the error diagnostic subroutine _Error
. Such subroutine will be called by YYParse
when an error occurs. Is therefore convenient to give a meaningful diagnostic:
sub _Error {
die "Syntax error near "
.($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n"
}
The parser method YYCurval
returns the value of the current token. A more accurate error diagnostic subroutine can be obtained if the lexical analyzer is modified so that tokens keep the line number where they start (i.e. the token is a pair (TOKENNAME, [ ATTRIBUTE, LINENUMBER])
. In such case the _Error
subroutine can be rewritten as:
sub _Error {
my($token)=$_[0]->YYCurval;
my($what)= $token ? "input: '$token->[0]' in line $token->[1]" : "end of input";
my @expected = $_[0]->YYExpect();
my $expected = @expected? "Expected one of these tokens: '@expected'":"";
croak "Syntax error near $what. $expected\n";
}
The YYExpect
method returns the set of tokens that were expected when the error occurred.
The input in
$parser->YYData->{INPUT}
is then analyzed by YYParse
and an abstract syntax tree is built. The tree rooted on a Parse::Eyapp::Node
can be displayed using the method str
:
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
The following is the description of the syntax tree produced by the call $t->str
for the list of expressions "2*-3+b*0;--2\n";
:
pl@nereida:~/LEyapp/examples$ synopsis.pl
Syntax Tree:
EXPRESION_LIST(
PLUS(
TIMES(
NUM(
TERMINAL[2]
),
UMINUS(
NUM(
TERMINAL[3]
)
) # UMINUS
) # TIMES,
TIMES(
VAR(
TERMINAL[b]
),
NUM(
TERMINAL[0]
)
) # TIMES
) # PLUS,
UMINUS(
UMINUS(
NUM(
TERMINAL[2]
)
) # UMINUS
) # UMINUS
) # EXPRESION_LIST
Did you notice that the TERMINAL
nodes appear decorated with its attribute? This is because each time the Parse::Eyapp::Node
method str
visits a node checks if the node has a method info
(i.e. $node->can(info)
). If so, the info
method is called and the string returned is concatenated in the description string. This is the reason for these three lines at the beginning of the SYNOPSIS example:
sub TERMINAL::info {
$_[0]{attr}
}
Parse::Eyapp
not only gives support to parsing but to later phases of the translation process: tree transformations and scope analysis (scope analysis is the task to find which definition applies to an use of an object in the source). The program in the synopsis section shows an example of a tree transformation specification. Tree transformations are specified using a language called Tree regular expressions. The transformation object is created by the constructor Parse::Eyapp::Treeregexp->new
.
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
=> {
my $op = $Op{ref($bin)};
$x->{attr} = eval "$x->{attr} $op $y->{attr}";
$_[0] = $NUM[0];
}
uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
},
);
The set of transformations specified in the example are
The transformation
constantfold
produces constant folding i.e. trees of expressions like3*2+4
are reduced to the tree for10
{ # Example of support code my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/'); } constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y)) => { my $op = $Op{ref($bin)}; $x->{attr} = eval "$x->{attr} $op $y->{attr}"; $_[0] = $NUM[0]; }
Here
constantfold
is the name of the transformation. The treeregexp compiler will produce an object$constantfold
implementing the transformation. After the name comes the tree pattern:/TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
It matches any subtree rooted in any node belonging to one of these classes:
TIMES
orPLUS
orDIV
orMINUS
that has two children belonging to theNUM
class. The Perl code after the big arrow is executed on any matching subtree. We can refer to the root of the subtree using the variable$bin
. We can also refer to the child of the firstNUM
node using$x
. In the same way$y
refers to the child of the secondNUM
node. Since there are twoNUM
nodes in the pattern, we refer to them inside the transformation part using the array@NUM
:$_[0] = $NUM[0];
The action uses and
eval
and the hash%Op
to compute the corresponding reduction of the two nodes. The hash%Op
was defined in a previous section containing support code. You can insert in any place of a treeregexp program such support code by surrounding it with curly brackets. The subtree that matched (that is in$_[0]
) is substituted by its left child:$_[0] = $NUM[0];
The transformations
zero_times_whatever
andwhatever_times_zero
produce the simplification of trees corresponding to multiplications by zero. Trees for expressions like(a+b)*0
or0*(b-4)
are reduced to the tree for 0.zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
Here
zero_times_whatever
is the name of the transformation. The patternTIMES(NUM($x), .)
matches anyTIMES
node with two children and whose first child belongs to theNUM
class. The dot matches any subtree, indicating that we don't care what sort of tree the right child is. The third component{ $x->{attr} == 0 }
is the semantic pattern. If both the shape pattern and the semantic pattern apply the action after the arrow is applied. The subtrees is substituted by its left child.
The transformation
uminus
simplifies the tree for unary minus of constant expressions.uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
It matches trees rooted in a
UMINUS
node whose only child is aNUM
node. In such case the sign of the number that is the attribute of theTERMINAL
node is changed and the tree is substituted by its single child.
The call
$p->generate();
compiles the transformation specification producing a set of transformations $constantfold
, $zero_times_whatever
, whatever_times_zero
and $uminus
. Transformations are Parse::Eyapp::YATW
objects. The list variable @all
refer to the whole set of Parse::Eyapp::YATW
transformations.
The nodes of the abstract syntax tree are objects. The class (NUM
, TIMES
, UMINUS
, etc.) defines the type of node. All node classes inherit from the class Parse::Eyapp::Node
. Parse::Eyapp::Node
provides a set of methods to manipulate nodes. Among these methods are str
, m
and s
. The m
and s
methods resemble the matching and substitution operators for regular expressions. But instead of regular expressions they work with tree transformations or treeregexp
s or, more precisely with Parse::Eyapp::YATW
objects. By calling:
$t->s($uminus);
subtrees like
UMINUS(UMINUS(NUM(TERMINAL[2])))
are simplified to
NUM(TERMINAL[2])
The call to
$t->s(@all);
applies the whole set of transformations. The transformations in @all
are iteratively applied to the tree $t
until no transformation succeeds: Yes, that means that a inappropriate set of transformations my hang your program.
Thus, the former syntax tree for "2*-3+b*0;--2\n";
becomes:
EXPRESION_LIST(NUM(TERMINAL[-6]),NUM(TERMINAL[2]))
The analyzer has been able to optimize - at compile time - the computation of these two expressions
2*-3+b*0;
--2
reducing them to the computation of:
-6;
2
The Eyapp Language
Eyapp Grammar
This section describes the syntax of the Eyapp language using its own notation. The grammar extends yacc and yapp grammars. Semicolons have been omitted to save space. Between C-like comments you can find an (informal) explanation of the language associated with the token.
eyapp: head body tail ;
symbol: LITERAL /* A string literal like 'hello' */
| ident
ident: IDENT /* IDENT is [A-Za-z_][A-Za-z0-9_]* */
head: headsec '%%'
headsec: decl *
decl: '\n'
| SEMANTIC typedecl symlist '\n' /* SEMANTIC is %semantic\s+token */
| SYNTACTIC typedecl symlist '\n' /* SYNTACTIC is %syntactic\s+token */
| TOKEN typedecl symlist '\n' /* TOKEN is %token */
| ASSOC typedecl symlist '\n' /* ASSOC is %(left|right|nonassoc) */
| START ident '\n' /* START is %start */
| HEADCODE '\n' /* HEADCODE is %{ Perl code ... %} */
| UNION CODE '\n' /* UNION CODE see yacc/bison */
| DEFAULTACTION CODE '\n' /* DEFAULTACTION is %defaultaction */
| TREE treeclauses? '\n' /* TREE is %tree */
| METATREE '\n' /* METATREE is %metatree */
| TYPE typedecl identlist '\n' /* TYPE is %type */
| EXPECT NUMBER '\n' /* EXPECT is %expect */
/* NUMBER is \d+ */
typedecl: /* empty */
| '<' IDENT '>'
treeclauses: BYPASS ALIAS? | ALIAS BYPASS?
symlist: symbol +
identlist: ident +
body: rules * '%%'
rules: IDENT ':' rhss ';'
rhss: rule <+ '|'>
rule: optname rhs (prec epscode)?
rhs: rhseltwithid *
rhseltwithid :
rhselt '.' IDENT
| '$' rhselt
| rhselt
rhselt: symbol
| code
| '(' optname rhs ')'
| rhselt STAR /* STAR is (%name\s*([A-Za-z_]\w*)\s*)?\* */
| rhselt '<' STAR symbol '>'
| rhselt OPTION /* OPTION is (%name\s*([A-Za-z_]\w*)\s*)?\? */
| rhselt '<' PLUS symbol '>'
| rhselt PLUS /* PLUS is (%name\s*([A-Za-z_]\w*)\s*)?\+ */
optname: (NAME IDENT)? /* NAME is %name */
| NOBYPASS IDENT /* NOBYPASS is %no\s+bypass */
prec: PREC symbol /* PREC is %prec */
epscode: code ?
code:
CODE /* CODE is { Perl code ... } */
| BEGINCODE /* BEGINCODE is %begin { Perl code ... } */
tail: TAILCODE ? /* TAILCODE is { Perl code ... } */
The semantic of Eyapp
agrees with the semantic of yacc
and yapp
for all the common constructions.
Comments
Comments are either Perl style, from #
up to the end of line, or C style, enclosed between /*
and */
.
Syntactic Variables, Symbolic Tokens and String Literals
Two kind of symbols may appear inside a Parse::Eyapp program: Non-terminal symbols or syntactic variables, called also left-hand-side symbols and Terminal symbols, called also Tokens.
Tokens are the symbols the lexical analyzer function returns to the parser. There are two kinds: symbolic tokens and string literals.
Syntactic variables and symbolic tokens identifiers must conform to the regular expression [A-Za-z][A-Za-z0-9_]*
.
When building the syntax tree (i.e. when running under the %tree
directive) symbolic tokens will be considered semantic tokens (see section "Syntactic and Semantic tokens").
String literals are enclosed in single quotes and can contain almost anything. They will be received by the parser as double-quoted strings. Any special character as '"'
, '$'
and '@'
is escaped. To have a single quote inside a literal, escape it with '\'.
When building the syntax tree (i.e. when running under the %tree
directive) string literals will be considered syntactic tokens (see section "Syntactic and Semantic tokens").
Parts of an eyapp
Program
An Eyapp program has three parts called head, body and tail:
eyapp: head body tail ;
Each part is separated from the former by the symbol %%
:
head: headsec '%%'
body: rulesec '%%'
The Head Section
The head section contains a list of declarations
headsec: decl *
There are different kinds of declarations.
This reference does not fully describes all the declarations that are shared with yacc and yapp.
Example of Head Section
In this and the next sections we will describe the basics of the Eyapp language using the file examples/Calc.eyp
that accompanies this distribution. This file implements a trivial calculator. Here is the header section:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '1,11p' Calc.eyp | cat -n
1 # examples/Calc.eyp
2 %right '='
3 %left '-' '+'
4 %left '*' '/'
5 %left NEG
6 %right '^'
7 %{
8 my %s; # symbol table
9 %}
10
11 %%
Declarations and Precedence
Lines 2-5 declare several tokens. The usual way to declare tokens is through the %token
directive. The declarations %nonassoc
, %left
and %right
not only declare the tokens but also associate a priority with them. Tokens declared in the same line have the same precedence. Tokens declared with these directives in lines below have more precedence than those declared above. Thus, in the example above we are saying that "+"
and "-"
have the same precedence but higher precedence than =. The final effect of "-"
having greater precedence than = will be that an expression like:
a = 4 - 5
will be interpreted as
a = (4 - 5)
and not as
(a = 4) - 5
The use of the %left
indicates that - in case of ambiguity and a match between precedences - the parser must build the tree corresponding to a left parenthesization. Thus, the expression
4 - 5 - 9
will be interpreted as
(4 - 5) - 9
Header Code
Perl code surrounded by %{
and %}
can be inserted in the head section. Such code will be inserted in the module generated by eyapp
near the beginning. Therefore, declarations like the one of the calculator symbol table %s
7 %{
8 my %s; # symbol table
9 %}
will be visible from almost any point in the file.
The Start Symbol of the Grammar
%start IDENT
declares IDENT
as the start symbol of the grammar. When %start
is not used, the first rule in the body section will be used.
Expect
The %expect #NUMBER
directive works as in bison and suppress warnings when the number of Shift/Reduce conflicts is exactly #NUMBER
. See section "Solving Ambiguities and Conflicts" to know more about Shift/Reduce conflicts.
Type and Union
C oriented declarations like %type
and %union
are parsed but ignored.
The %strict
Directive
By default, identifiers appearing in the rule section will be classified as terminal if they don't appear in the left hand side of any production rules.
The directive %strict
forces the declaration of all tokens. The following eyapp
program issues a warning:
pl@nereida:~/LEyapp/examples$ cat -n bugyapp2.eyp
1 %strict
2 %%
3 expr: NUM;
4 %%
pl@nereida:~/LEyapp/examples$ eyapp bugyapp2.eyp
Warning! Non declared token NUM at line 3 of bugyapp2.eyp
To keep silent the compiler declare all tokens using one of the token declaration directives (%token
, %left
, etc.)
pl@nereida:~/LEyapp/examples$ cat -n bugyapp3.eyp
1 %strict
2 %token NUM
3 %%
4 expr: NUM;
5 %%
pl@nereida:~/LEyapp/examples$ eyapp bugyapp3.eyp
pl@nereida:~/LEyapp/examples$
It is a good practice to use %strict
at the beginning of your grammar.
Default Action Directive
In Parse::Eyapp
you can modify the default action using the %defaultaction { Perl code }
directive. See section "Default actions".
Tree Construction Directives
Parse::Eyapp
facilitates the construction of concrete syntax trees and abstract syntax trees (abbreviated AST from now on) through the %tree
%metatree
directives. See sections "Abstract Syntax Trees : %tree and %name" and "Translation Schemes and the %metatree directive".
Syntactic and Semantic Tokens
The new token declaration directives %syntactic token
and %semantic token
can change the way eyapp
builds the abstract syntax tree. See section "Syntactic and Semantic tokens".
The Body
The body section contains the rules describing the grammar:
body: rules * '%%'
rules: IDENT ':' rhss ';'
rhss: (optname rhs (prec epscode)?) <+ '|'>
Rules
A rule is made of a left-hand-side symbol (the syntactic variable), followed by a ':'
and one or more right-hand-sides (or productions) separated by '|'
and terminated by a ';'
like in:
exp:
exp '+' exp
| exp '-' exp
| NUM
;
A production (right hand side) may be empty:
input:
/* empty */
| input line
;
The former two productions can be abbreviated as
input:
line *
;
The operators *
, +
and ?
are presented in section "Lists and Optionals".
A syntactic variable cannot appear more than once as a rule name (This differs from yacc).
Semantic Values and Semantic Actions
In Parse::Eyapp
a production rule
A -> X_1 X_2 ... X_n
can be followed by a semantic action:
A -> X_1 X_2 ... X_n { Perl Code }
Such semantic action is nothing but Perl code that will be treated as an anonymous subroutine. The semantic action associated with production rule A -> X_1 X_2 ... X_n
is executed after any actions associated with the subtrees of X_1
, X_2
, ..., X_n
. Eyapp
parsers build the syntax tree using a left-right bottom-up traverse of the syntax tree. Each times the Parser visits the node associated with the production A -> X_1 X_2 ... X_n
the associated semantic action is called. Asociated with each symbol of a Parse::Eyapp grammar there is a scalar Semantic Value or Attribute. The semantic values of terminals are provided by the lexical analyzer. In the calculator example (see file examples/Calc.yp
in the distribution), the semantic value associated with an expression is its numeric value. Thus in the rule:
exp '+' exp { $_[1] + $_[3] }
$_[1]
refers to the attribute of the first exp
, $_[2]
is the attribute associated with '+'
, which is the second component of the pair provided by the lexical analyzer and $_[3]
refers to the attribute of the second exp
.
When the semantic action/anonymous subroutine is called, the arguments are as follows:
$_[1]
to$_[n]
are the attributes of the symbolsX_1
,X_2
, ...,X_n
. Just as$1
to$n
in yacc,$_[0]
is the parser object itself. Having$_[0]
beeing the parser object itself allows you to call parser methods. Most yacc macros have been converted into parser methods. See section "Methods Available in the Generated Class".
The returned value will be the attribute associated with the left hand side of the production.
Names can be given to the attributes using the dot notation (see file examples/CalcSimple.eyp
):
exp.left '+' exp.right { $left + $right }
See section "Names for attributes" for more details about the dot and dollar notations.
If no action is specified and no %defaultaction
is specified the default action
{ $_[1] }
will be executed instead. See section "Default actions" to know more.
Actions in Mid-Rule
Actions can be inserted in the middle of a production like in:
block: '{'.bracket { $ids->begin_scope(); } declaration*.decs statement*.sts '}' { ... }
A middle production action is managed by inserting a new rule in the grammar and associating the semantic action with it:
Temp: /* empty */ { $ids->begin_scope(); }
Middle production actions can refer to the attributes on its left. They count as one of the components of the production. Thus the program:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '1,4p' intermediateaction2.yp
%%
S: 'a' { $_[1]x4 }.mid 'a' { print "$_[2], $mid, $_[3]\n"; }
;
%%
The auxiliar syntactic variables are named @#position-#order
where #position
is the position of the action in the rhs and order
is an ordinal number. See the .output
file for the former example:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ eyapp -v intermediateaction2.yp
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '1,5p' intermediateaction2.output
Rules:
------
0: $start -> S $end
1: S -> 'a' @1-1 'a'
2: @1-1 -> /* empty */
when given input aa
the execution will produce as output aaaa, aaaa, a
.
Example of Body Section
Following with the calculator example, the body is:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '12,48p' Calc.eyp | cat -n
1 start:
2 input { \%s }
3 ;
4
5 input: line *
6 ;
7
8 line:
9 '\n' { undef }
10 | exp '\n' { print "$_[1]\n" if defined($_[1]); $_[1] }
11 | error '\n'
12 {
13 $_[0]->YYErrok;
14 undef
15 }
16 ;
17
18 exp:
19 NUM
20 | $VAR { $s{$VAR} }
21 | $VAR '=' $exp { $s{$VAR} = $exp }
22 | exp.left '+' exp.right { $left + $right }
23 | exp.left '-' exp.right { $left - $right }
24 | exp.left '*' exp.right { $left * $right }
25 | exp.left '/' exp.right
26 {
27 $_[3] and return($_[1] / $_[3]);
28 $_[0]->YYData->{ERRMSG} = "Illegal division by zero.\n";
29 $_[0]->YYError; # Pretend that a syntactic error ocurred: _Error will be called
30 undef
31 }
32 | '-' $exp %prec NEG { -$exp }
33 | exp.left '^' exp.right { $left ** $right }
34 | '(' $exp ')' { $exp }
35 ;
36
37 %%
This example does not uses any of the Eyapp extensions (with the exception of the star list at line 5) and the dot and dollar notations. Please, see the Parse::Yapp pages and elsewhere documentation on yacc and bison for more information.
Solving Ambiguities and Conflicts
When Eyapp analizes a grammar like:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n ambiguities.eyp
1 %%
2 exp:
3 NUM
4 | exp '-' exp
5 ;
6 %%
it will produce a warning announcing the existence of shift-reduce conflicts:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ eyapp ambiguities.eyp
1 shift/reduce conflict (see .output file)
State 5: reduce by rule 2: exp -> exp '-' exp (default action)
State 5: shifts:
to state 3 with '-'
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ ls -ltr | tail -1
-rw-rw---- 1 pl users 1082 2007-02-06 08:26 ambiguities.output
when eyapp
finds warnings automatically produces a .output
file describing the conflict.
What the warning is saying is that an expression like exp '-' exp
(rule 2) followed by a minus '-'
can be worked in more than one way. If we have an input like NUM - NUM - NUM
the activity of a LALR(1) parser (the family of parsers to which Eyapp belongs) consists of a sequence of shift and reduce actions. A shift action has as consequence the reading of the next token. A reduce action is finding a production rule that matches and substituting the rhs of the production by the lhs. For input NUM - NUM - NUM
the activity will be as follows (the dot is used to indicate where the next input token is):
.NUM - NUM - NUM # shift
NUM.- NUM - NUM # reduce exp: NUM
exp.- NUM - NUM # shift
exp -.NUM - NUM # shift
exp - NUM.- NUM # reduce exp: NUM
exp - exp.- NUM # shift/reduce conflict
up this point two different decisions can be taken: the next description can be
exp.- NUM # reduce by exp: exp '-' exp (rule 2)
or:
exp - exp -.NUM # shift '-' (to state 3)
that is why it is called a shift-reduce conflict.
That is also the reason for the precedence declarations in the head section. Another kind of conflicts are reduce-reduce conflicts. They arise when more that rhs can be applied for a reduction action.
Eyapp solves the conflicts applying the following rules:
In a shift/reduce conflict, the default is the shift.
In a reduce/reduce conflict, the default is to reduce by the earlier grammar production (in the input sequence).
The precedences and associativities are associated with tokens in the declarations section. This is made by a sequence of lines beginning with one of the directives:
%left
,%right
, or%nonassoc
, followed by a list of tokens. All the tokens on the same line have the same precedence and associativity; the lines are listed in order of increasing precedence.A precedence and associativity is associated with each grammar production; it is the precedence and associativity of the last token or literal in the right hand side of the production.
The
%prec
directive can be used when a rhs is involved in a conflict and has no tokens inside or it has but the precedence of the last token leads to an incorrect interpretation. A rhs can be followed by an optional%prec token
directive giving the production the precedence of thetoken
exp: '-' exp %prec NEG { -$_[1] }
If there is a shift/reduce conflict, and both the grammar production and the input character have precedence and associativity associated with them, then the conflict is solved in favor of the action (shift or reduce) associated with the higher precedence. If the precedences are the same, then the associativity is used; left associative implies reduce, right associative implies shift, and nonassociating implies error.
To solve a shift-reduce conflict between a production A --> SOMETHING
and a token 'a'
you can follow this procedure:
- 1. Edit the
.output
file - 2. Search for the state where the conflict between the production and the token is. In our example it looks like:
-
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '56,65p' ambiguities.output State 5: exp -> exp . '-' exp (Rule 2) exp -> exp '-' exp . (Rule 2) '-' shift, and go to state 3 '-' [reduce using rule 2 (exp)] $default reduce using rule 2 (exp)
- 3. Inside the state there has to be a production of the type
A --> SOMETHING.
(with the dot at the end) indicating that a reduction must take place. There has to be also another production of the formA --> prefix . suffix
, where suffix can start with the involved token'a'
. - 4. Decide what action shift or reduce matches the kind of trees you want. In this example we want
NUM - NUM - NUM
to produce a tree likeMINUS(MINUS(NUM, NUM), NUM)
and notNUM, MINUS(MINUS(NUM, NUM))
. We want the conflict inexp - exp.- NUM
to be solved in favor of the reduction byexp: exp '-' exp
. This is achieved by declaring%left '-'
.
Error Recovery
The token name error
is reserved for error handling. This name can be used in grammar productions; it suggests places where errors are expected, and recovery can take place:
line:
'\n' { undef }
| exp '\n' { print "$_[1]\n" if defined($_[1]); $_[1] }
| error '\n'
{
$_[0]->YYErrok;
undef
}
The parser pops its stack until it enters a state where the token error
is legal. It then shifts the token error
and proceeds to discard tokens until finding one that is acceptable. In the example all the tokens until finding a '\n'
will be skipped. If no special error productions have been specified, the processing will halt.
In order to prevent a cascade of error messages, the parser, after detecting an error, remains in error state until three tokens have been successfully read and shifted. If an error is detected when the parser is already in error state, no message is given, and the input token is quietly deleted. The method YYErrok
used in the example communicates to the parser that a satisfactory recovery has been reached and that it can safely emit new error messages.
You cannot have a literal 'error' in your grammar as it would confuse the driver with the error token. Use a symbolic token instead.
The Tail
The tail section contains Perl code. Usually the lexical analyzer and the Error management subroutines go there. A better practice however is to isolate both subroutines in a module and use them in the grammar. An example of this is in files examples/CalcUsingTail.eyp
and examples/Tail.pm
.
The Lexical Analyzer
The Lexical Analyzer is called each time the parser needs a new token. It is called with only one argument (the parser object) and returns a pair containing the next token and its associated attribute.
The fact that is a method of the parser object means that the parser methods are accesible inside the lexical analyzer. Specially interesting is the $_[0]->YYData
method which provides access to the user data area.
When the lexical analyzer reaches the end of input, it must return the pair ('', undef)
See below how to write a lexical analyzer (file examples/Calc.eyp
):
1 sub make_lexer {
2 my $input = shift;
3
4 return sub {
5 my $parser = shift;
6
7 for ($$input) {
8 m{\G[ \t]*}gc;
9 m{\G([0-9]+(?:\.[0-9]+)?)}gc and return ('NUM',$1);
10 m{\G([A-Za-z][A-Za-z0-9_]*)}gc and return ('VAR',$1);
11 m{\G\n}gc and do { $lineno++; return ("\n", "\n") };
12 m{\G(.)}gc and return ($1,$1);
13
14 return('',undef);
15 }
16 }
17 }
The subroutine make_lexer
creates the lexical analyzer as a closure. The lexer returned by make_lexer
is used by the YYParse
method:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '90,97p' Calc.eyp | cat -n
1 sub Run {
2 my($self)=shift;
3 my $input = shift or die "No input given\n";
4
5 return $self->YYParse( yylex => make_lexer($input), yyerror => \&_Error,
6 #yydebug =>0x1F
7 );
8 }
The Error Report Subroutine
The Error Report subroutine is also a parser method, and consequently receives as parameter the parser object.
See the error report subroutine for the example in examples/Calc.eyp
:
1 %%
2
3 my $lineno = 1;
4
5 sub _Error {
6 my $parser = shift;
7
8 exists $parser->YYData->{ERRMSG}
9 and do {
10 print $parser->YYData->{ERRMSG};
11 delete $parser->YYData->{ERRMSG};
12 return;
13 };
14 my($token)=$parser->YYCurval;
15 my($what)= $token ? "input: '$token'" : "end of input";
16 my @expected = $parser->YYExpect();
17 local $" = ', ';
18 print << "ERRMSG";
19
20 Syntax error near $what (lin num $lineno).
21 Expected one of these terminals: @expected
22 ERRMSG
23 }
See the Parse::Yapp pages and elsewhere documentation on yacc and bison for more information.
Using an Eyapp Program
The following is an example of a program that uses the calculator explained in the two previous sections:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n usecalc.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Calc;
4
5 my $parser = Calc->new();
6 my $input = <<'EOI';
7 a = 2*3
8 d = 5/(a-6)
9 b = (a+1)/7
10 c=a*3+4)-5
11 a = a+1
12 EOI
13 my $t = $parser->Run(\$input);
14 print "========= Symbol Table ==============\n";
15 print "$_ = $t->{$_}\n" for sort keys %$t;
The output for this program is (the input for each output appear as a Perl comment on the right):
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ eyapp Calc.eyp
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ usecalc.pl
6 # a = 2*3
Illegal division by zero. # d = 5/(a-6)
1 # b = (a+1)/7
Syntax error near input: ')' (lin num 4). # c=a*3+4)-5
Expected one of these terminals: -, /, ^, *, +,
7 # a = a+1
========= Symbol Table ==============
a = 7
b = 1
c = 22
Lists and Optionals
The elements of a rhs can be one of these:
rhselt:
symbol
| code
| '(' optname rhs ')'
| rhselt STAR /* STAR is (%name\s*([A-Za-z_]\w*)\s*)?\* */
| rhselt '<' STAR symbol '>'
| rhselt OPTION /* OPTION is (%name\s*([A-Za-z_]\w*)\s*)?\? */
| rhselt '<' PLUS symbol '>'
| rhselt PLUS /* PLUS is (%name\s*([A-Za-z_]\w*)\s*)?\+ */
The STAR
, OPTION
and PLUS
operators provide a simple mechanism to express lists:
In Eyapp the
+
operator indicates one or more repetitions of the element to the left of+
, thus a rule like:decls: decl +
is the same as:
decls: decls decl | decl
An additional symbol may be included to indicate lists of elements separated by such symbol. Thus
rhss: rule <+ '|'>
is equivalent to:
rhss: rhss '|' rule | rule
The operators
*
and?
have their usual meaning: 0 or more for*
and optionality for?
. Is legal to parenthesize arhs
expression as in:optname: (NAME IDENT)?
The Semantic of Lists Operators
The +
operator
The grammar:
pl@nereida:~/LEyapp/examples$ head -12 List3.yp | cat -n
1 # List3.yp
2 %semantic token 'c'
3 %{
4 use Data::Dumper;
5 %}
6 %%
7 S: 'c'+ 'd'+
8 {
9 print Dumper($_[1]);
10 print Dumper($_[2]);
11 }
12 ;
Is equivalent to:
pl@nereida:~/LEyapp/examples$ eyapp -v List3.yp | head -9 List3.output
Rules:
------
0: $start -> S $end
1: PLUS-1 -> PLUS-1 'c'
2: PLUS-1 -> 'c'
3: PLUS-2 -> PLUS-2 'd'
4: PLUS-2 -> 'd'
5: S -> PLUS-1 PLUS-2
By default, the semantic action associated with a +
returns the lists of attributes to which the +
applies:
pl@nereida:~/LEyapp/examples$ use_list3.pl
ccdd
$VAR1 = [ 'c', 'c' ];
$VAR1 = [ 'd', 'd' ];
The semantic associated with a +
changes when one of the tree creation directives is active (for instance %tree
or %metatree
) or it has been explicitly requested with a call to the YYBuildingTree
method:
$self->YYBuildingTree(1);
Other ways to change the associated semantic are to use the yybuildingtree
option of YYParse
:
$self->YYParse( yylex => \&_Lexer, yyerror => \&_Error,
yybuildingtree => 1,
# yydebug => 0x1F
);
In such case the associated semantic action creates a node labelled
_PLUS_LIST_#number
whose children are the attributes associated with the items in the plus list. The #number
in _PLUS_LIST_#number
is the ordinal of the production rule as it appears in the .output
file. As it happens when using the %tree
directive syntactic tokens are skipped.
When executing the example above but under the %tree
directive the ouput changes:
pl@nereida:~/LEyapp/examples$ head -3 List3.yp; eyapp List3.yp
# List3.yp
%semantic token 'c'
%tree
pl@nereida:~/LEyapp/examples$ use_list3.pl
ccdd
$VAR1 = bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'c', 'token' => 'c' }, 'TERMINAL' ),
bless( { 'children' => [], 'attr' => 'c', 'token' => 'c' }, 'TERMINAL' )
]
}, '_PLUS_LIST_1' );
$VAR1 = bless( { 'children' => [] }, '_PLUS_LIST_2' );
The node associated with the list of d
s is empty since terminal d
wasn't declared semantic.
When Nodes Dissappear from Lists
When under the influence of the %tree
directive the action associated with a list operator is to flat the children in a single list.
In the former example, the d
nodes dont show up since 'd'
is a syntactic token. However, it may happen that changing the status of 'd'
to semantic will not suffice.
When inserting the children, the tree (%tree
) node construction method (YYBuildAST
) omits any attribute that is not a reference. Therefore, when inserting explicit actions, it is necessary to guarantee that the returned value is a reference or a semantic token to assure the presence of the value in the lists of children of the node. Certainly you can use this property to prune parts of the tree. Consider the following example:
pl@nereida:~/LEyapp/examples$ head -19 ListWithRefs1.eyp | cat -n
1 # ListWithRefs.eyp
2 %semantic token 'c' 'd'
3 %{
4 use Data::Dumper;
5 %}
6 %%
7 S: 'c'+ D+
8 {
9 print Dumper($_[1]);
10 print $_[1]->str."\n";
11 print Dumper($_[2]);
12 print $_[2]->str."\n";
13 }
14 ;
15
16 D: 'd'
17 ;
18
19 %%
To activate the tree semantic for lists we use the yybuildingtree
option of YYParse
:
pl@nereida:~/LEyapp/examples$ tail -7 ListWithRefs1.eyp | cat -n
1 sub Run {
2 my($self)=shift;
3 $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error,
4 yybuildingtree => 1,
5 #, yydebug => 0x1F
6 );
7 }
The execution gives an ouput like this:
pl@nereida:~/LEyapp/examples$ eyapp ListWithRefs1.eyp; use_listwithrefs1.pl
ccdd
$VAR1 = bless( {
'children' => [
bless( {
'children' => [],
'attr' => 'c',
'token' => 'c'
}, 'TERMINAL' ),
bless( {
'children' => [],
'attr' => 'c',
'token' => 'c'
}, 'TERMINAL' )
]
}, '_PLUS_LIST_1' );
_PLUS_LIST_1(TERMINAL,TERMINAL)
$VAR1 = bless( {
'children' => []
}, '_PLUS_LIST_2' );
_PLUS_LIST_2
Though 'd'
was declared semantic the default action assoaciated with the production D: 'd'
in line 16 returns $_[1]
(that is, the scalar 'd'
). Since it is not a reference it won't be inserted in the list of children of _PLUS_LIST
.
Recovering the Missing Nodes
The solution is to be sure that the attribute is a reference:
pl@nereida:~/LEyapp/examples$ head -22 ListWithRefs.eyp | cat -n
1 # ListWithRefs.eyp
2 %semantic token 'c'
3 %{
4 use Data::Dumper;
5 %}
6 %%
7 S: 'c'+ D+
8 {
9 print Dumper($_[1]);
10 print $_[1]->str."\n";
11 print Dumper($_[2]);
12 print $_[2]->str."\n";
13 }
14 ;
15
16 D: 'd'
17 {
18 bless { attr => $_[1], children =>[]}, 'DES';
19 }
20 ;
21
22 %%
Now the attribute associated with D
is a reference and appears in the list of children of _PLUS_LIST
:
pl@nereida:~/LEyapp/examples$ eyapp ListWithRefs.eyp; use_listwithrefs.pl
ccdd
$VAR1 = bless( {
'children' => [
bless( {
'children' => [],
'attr' => 'c',
'token' => 'c'
}, 'TERMINAL' ),
bless( {
'children' => [],
'attr' => 'c',
'token' => 'c'
}, 'TERMINAL' )
]
}, '_PLUS_LIST_1' );
_PLUS_LIST_1(TERMINAL,TERMINAL)
$VAR1 = bless( {
'children' => [
bless( {
'children' => [],
'attr' => 'd'
}, 'DES' ),
bless( {
'children' => [],
'attr' => 'd'
}, 'DES' )
]
}, '_PLUS_LIST_2' );
_PLUS_LIST_2(DES,DES)
Building a Tree with Parse::Eyapp::Node->new
The former solution consisting on writing by hand the code to build the node may suffice when dealing with a single node. Writing by hand the code to build a node is a cumbersome task. Even worst: though the node built in the former example looks like a Parse::Eyapp
node actually isn't. Parse::Eyapp
nodes always inherit from Parse::Eyapp::Node
and consequently have access to the methods in such package. Thefollowing execution using the debugger illustrates the point:
pl@nereida:~/LEyapp/examples$ perl -wd use_listwithrefs.pl
Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(use_listwithrefs.pl:4): $parser = new ListWithRefs();
DB<1> f ListWithRefs.eyp
1 2 #line 3 "ListWithRefs.eyp"
3
4: use Data::Dumper;
5
6 #line 7 "ListWithRefs.eyp"
7 #line 8 "ListWithRefs.eyp"
8
9: print Dumper($_[1]);
10: print $_[1]->str."\n";
through the command f ListWithRefs.eyp
we inform the debugger that subsequent commands will refer to such file. Next we execute the program up to the semantic action associated with the production rule S: 'c'+ D+
(line 9)
DB<2> c 9 # Continue up to line 9 of ListWithRefs.eyp
ccdd
ListWithRefs::CODE(0x84ebe5c)(ListWithRefs.eyp:9):
9: print Dumper($_[1]);
Now we are in condition to look at the contents of the arguments:
DB<3> x $_[2]->str
0 '_PLUS_LIST_2(DES,DES)'
DB<4> x $_[2]->child(0)
0 DES=HASH(0x85c4568)
'attr' => 'd'
'children' => ARRAY(0x85c458c)
empty array
the str
method works with the object $_[2]
since _PLUS_LIST_2
nodes inherit from Parse::Eyapp::Node
. However, when we try with the DES
node we get an error:
DB<6> x $_[2]->child(0)->str
Can't locate object method "str" via package "DES" at \
(eval 11)[/usr/share/perl/5.8/perl5db.pl:628] line 2, <STDIN> line 1.
DB<7>
More robust than the former solution of building the node by hand is to use the constructor Parse::Eyapp::Node->new
: The method Parse::Eyapp::Node->new
is uset to build forests of syntactic trees.
It receives a list of terms describing the trees and - optionally - a reference to a subroutine used to set up the attributes of the just created nodes. After the creation of the trees the sub is called by Parse::Eyapp::Node->new
with arguments the list of references to the nodes (in the order in which they appear in the terms, from left to right). Parse::Eyapp::Node->new
returns a list of references to the jsut created nodes. In a scalar context returns a reference to the first of such trees. See an example:
pl@nereida:~/LEyapp/examples$ perl -MParse::Eyapp -MData::Dumper -wde 0
main::(-e:1): 0
DB<1> @t = Parse::Eyapp::Node->new('A(C,D) E(F)', sub { my $i = 0; $_->{n} = $i++ for @_ })
DB<2> $Data::Dumper::Indent = 0
DB<3> print Dumper($_)."\n" for @t
$VAR1 = bless( {'n' => 0,'children' => [bless( {'n' => 1,'children' => []}, 'C' ),
bless( {'n' => 2,'children' => []}, 'D' )
]
}, 'A' );
$VAR1 = bless( {'n' => 1,'children' => []}, 'C' );
$VAR1 = bless( {'n' => 2,'children' => []}, 'D' );
$VAR1 = bless( {'n' => 3,'children' => [bless( {'n' => 4,'children' => []}, 'F' )]}, 'E' );
$VAR1 = bless( {'n' => 4,'children' => []}, 'F' );
See the following example in which the nodes associated with 'd'
are explictly constructed:
pl@nereida:~/LEyapp/examples$ head -28 ListWithRefs2.eyp| cat -n
1 # ListWithRefs2.eyp
2 %semantic token 'c'
3 %{
4 use Data::Dumper;
5 %}
6 %%
7 S: 'c'+ D+
8 {
9 print Dumper($_[1]);
10 print $_[1]->str."\n";
11 print Dumper($_[2]);
12 print $_[2]->str."\n";
13 }
14 ;
15
16 D: 'd'.d
17 {
18 Parse::Eyapp::Node->new(
19 'DES(TERMINAL)',
20 sub {
21 my ($DES, $TERMINAL) = @_;
22 $TERMINAL->{attr} = $d;
23 }
24 );
25 }
26 ;
27
28 %%
To know more about Parse::Eyapp::Node->new
see the section Parse::Eyapp::Node->new
When the former eyapp program is executed produces the following output:
pl@nereida:~/LEyapp/examples$ eyapp ListWithRefs2.eyp; use_listwithrefs2.pl
ccdd
$VAR1 = bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'c', 'token' => 'c' }, 'TERMINAL' ),
bless( { 'children' => [], 'attr' => 'c', 'token' => 'c' }, 'TERMINAL' )
]
}, '_PLUS_LIST_1' );
_PLUS_LIST_1(TERMINAL,TERMINAL)
$VAR1 = bless( {
'children' => [
bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'd' }, 'TERMINAL' )
]
}, 'DES' ),
bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'd' }, 'TERMINAL' )
]
}, 'DES' )
]
}, '_PLUS_LIST_2' );
_PLUS_LIST_2(DES(TERMINAL),DES(TERMINAL))
The *
operator
Any list operator operates on the factor to its left. A list in the right hand side of a production rule counts as a single symbol.
Both operators *
and +
can be used with the format X <* Separator>
. In such case they describe lists of X
s separated by separator
. See an example:
pl@nereida:~/LEyapp/examples$ head -25 CsBetweenCommansAndD.eyp | cat -n
1 # CsBetweenCommansAndD.eyp
2
3 %semantic token 'c' 'd'
4
5 %{
6 sub TERMINAL::info {
7 $_[0]->attr;
8 }
9 %}
10 %tree
11 %%
12 S:
13 ('c' <* ','> 'd')*
14 {
15 print "\nNode\n";
16 print $_[1]->str."\n";
17 print "\nChild 0\n";
18 print $_[1]->child(0)->str."\n";
19 print "\nChild 1\n";
20 print $_[1]->child(1)->str."\n";
21 $_[1]
22 }
23 ;
24
25 %%
The rule
S: ('c' <* ','> 'd')*
has only two items in its right hand side: the (separated by commas) list of c
s and the list of d
s. The production rule is equivalent to:
pl@nereida:~/LEyapp/examples$ eyapp -v CsBetweenCommansAndD.eyp
pl@nereida:~/LEyapp/examples$ head -11 CsBetweenCommansAndD.output | cat -n
1 Rules:
2 ------
3 0: $start -> S $end
4 1: STAR-1 -> STAR-1 ',' 'c'
5 2: STAR-1 -> 'c'
6 3: STAR-2 -> STAR-1
7 4: STAR-2 -> /* empty */
8 5: PAREN-3 -> STAR-2 'd'
9 6: STAR-4 -> STAR-4 PAREN-3
10 7: STAR-4 -> /* empty */
11 8: S -> STAR-4
The semantic action associated with *
is to return a reference to a list with the attributes of the matching items.
When working -as in the example - under a tree creation directive it returns a node belonging to a class named _STAR_LIST_#number
whose children are the items in the list. The #number
is the ordinal number of the production rule as it appears in the .output
file. The attributes must be references or associated with semantic tokens to be included in the list. Notice -in the execution of the former example that follows - how the node for PAREN-3
has been eliminated from the tree. Parenthesis nodes are - generally - obivated:
pl@nereida:~/LEyapp/examples$ use_csbetweencommansandd.pl
c,c,cd
Node
_STAR_LIST_4(_STAR_LIST_1(TERMINAL[c],TERMINAL[c],TERMINAL[c]),TERMINAL[d])
Child 0
_STAR_LIST_1(TERMINAL[c],TERMINAL[c],TERMINAL[c])
Child 1
TERMINAL[d]
Notice that the comma (since it is a syntactic token) has also been supressed.
Giving Names to Lists
To set the name of the node associated with a list operator the %name
directive must precede the operator as in the following example:
pl@nereida:~/LEyapp/examples$ sed -ne '1,27p' CsBetweenCommansAndDWithNames.eyp | cat -n
1 # CsBetweenCommansAndDWithNames.eyp
2
3 %semantic token 'c' 'd'
4
5 %{
6 sub TERMINAL::info {
7 $_[0]->attr;
8 }
9 %}
10 %tree
11 %%
12 Start: S
13 ;
14 S:
15 ('c' <%name Cs * ','> 'd') %name Cs_and_d *
16 {
17 print "\nNode\n";
18 print $_[1]->str."\n";
19 print "\nChild 0\n";
20 print $_[1]->child(0)->str."\n";
21 print "\nChild 1\n";
22 print $_[1]->child(1)->str."\n";
23 $_[1]
24 }
25 ;
26
27 %%
The execution shows the renamed nodes:
pl@nereida:~/LEyapp/examples$ use_csbetweencommansanddwithnames.pl c,c,c,cd
Node
Cs_and_d(Cs(TERMINAL[c],TERMINAL[c],TERMINAL[c],TERMINAL[c]),TERMINAL[d])
Child 0
Cs(TERMINAL[c],TERMINAL[c],TERMINAL[c],TERMINAL[c])
Child 1
TERMINAL[d]
Optionals
The X?
operator stands for the presence or omission of X
.
The grammar:
pl@nereida:~/LEyapp/examples$ head -11 List5.yp | cat -n
1 %semantic token 'c'
2 %tree
3 %%
4 S: 'c' 'c'?
5 {
6 print $_[2]->str."\n";
7 print $_[2]->child(0)->attr."\n" if $_[2]->children;
8 }
9 ;
10
11 %%
is equivalent to:
pl@nereida:~/LEyapp/examples$ eyapp -v List5
pl@nereida:~/LEyapp/examples$ head -7 List5.output
Rules:
------
0: $start -> S $end
1: OPTIONAL-1 -> 'c'
2: OPTIONAL-1 -> /* empty */
3: S -> 'c' OPTIONAL-1
When yybuildingtree
is false the associated attribute is a list that will be empty if CX> does not show up.
Under the %tree
directive the action creates an c<_OPTIONAL> node:
pl@nereida:~/LEyapp/examples$ use_list5.pl
cc
_OPTIONAL_1(TERMINAL)
c
pl@nereida:~/LEyapp/examples$ use_list5.pl
c
_OPTIONAL_1
Parenthesis
Any substring on the right hand side of a production rule can be grouped using a parenthesis. The introduction of a parenthesis implies the introduction of an additional syntactic variable whose only production is the sequence of symbols between the parenthesis. Thus the grammar:
pl@nereida:~/LEyapp/examples$ head -6 Parenthesis.eyp | cat -n
1 %%
2 S:
3 ('a' S ) 'b' { shift; [ @_ ] }
4 | 'c'
5 ;
6 %%
is equivalent to:
pl@nereida:~/LEyapp/examples$ eyapp -v Parenthesis.eyp; head -6 Parenthesis.output
Rules:
------
0: $start -> S $end
1: PAREN-1 -> 'a' S
2: S -> PAREN-1 'b'
3: S -> 'c'
By default the semantic rule associated with a parenthesis returns an anonymous list with the attributes of the symbols between the parenthesis:
pl@nereida:~/LEyapp/examples$ cat -n use_parenthesis.pl
1 #!/usr/bin/perl -w
2 use Parenthesis;
3 use Data::Dumper;
4
5 $Data::Dumper::Indent = 1;
6 $parser = Parenthesis->new();
7 print Dumper($parser->Run);
pl@nereida:~/LEyapp/examples$ use_parenthesis.pl
acb
$VAR1 = [
[ 'a', 'c' ], 'b'
];
pl@nereida:~/LEyapp/examples$ use_parenthesis.pl
aacbb
$VAR1 = [
[
'a',
[ [ 'a', 'c' ], 'b' ]
],
'b'
];
when working under a tree directive or when the attribute buildingtree
is set via theYYBuildingtree
method the semantic action returns a node with children the attributes of the symbols between parenthesis. As usual attributes which aren't references will be skipped from the list of children. See an example:
pl@nereida:~/LEyapp/examples$ head -23 List2.yp | cat -n
1 %{
2 use Data::Dumper;
3 %}
4 %semantic token 'a' 'b' 'c'
5 %tree
6 %%
7 S:
8 (%name AS 'a' S )'b'
9 {
10 print "S -> ('a' S )'b'\n";
11 print "Attribute of the first symbol:\n".Dumper($_[1]);
12 print "Attribute of the second symbol: $_[2]\n";
13 $_[0]->YYBuildAST(@_[1..$#_]);
14 }
15 | 'c'
16 {
17 print "S -> 'c'\n";
18 my $r = Parse::Eyapp::Node->new(qw(C(TERMINAL)), sub { $_[1]->attr('c') }) ;
19 print Dumper($r);
20 $r;
21 }
22 ;
23 %%
The example shows (line 8) how to rename a _PAREN
node. The %name CLASSNAME
goes after the opening parenthesis.
The call to YYBuildAST
at line 13 with argumetns the attributes of the symbols on the right hand side returns the node describing the current production rule. Notice that line 13 can be rewritten as:
goto &Parse::Eyapp::Driver::YYBuildAST;
At line 18 the node for the rule is explictly created using Parse::Eyapp::Node-
new>. The handler passed as second argument is responsible for setting the value of the atribute attr
of the just created TERMINAL
node.
Let us see an execution:
pl@nereida:~/LEyapp/examples$ use_list2.pl
aacbb
S -> 'c'
$VAR1 = bless( {
'children' => [
bless( {
'children' => [],
'attr' => 'c'
}, 'TERMINAL' )
]
}, 'C' );
the first reduction occurs by the non recursive rule. The execution shows the tree built by the call to Parse::Eyapp::Node-
new> at line 18.
The execution continues with the reduction or antiderivation by the rule S -> ('a' S )'b'
. The action at lines 9-14 dumps the attribute associated with ('a' S)
- or, in other words, the attribute associated with the variable PAREN-1
. It also dumps the attribute of 'b'
:
S -> ('a' S )'b'
Attribute of the first symbol:
$VAR1 = bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'a', 'token' => 'a' }, 'TERMINAL' ),
bless( { 'children' => [ bless( { 'children' => [], 'attr' => 'c' }, 'TERMINAL' )
]
}, 'C' )
]
}, 'AS' );
Attribute of the second symbol: b
The last reduction shown is by the rule: S -> ('a' S )'b'
:
S -> ('a' S )'b'
Attribute of the first symbol:
$VAR1 = bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'a', 'token' => 'a' }, 'TERMINAL' ),
bless( {
'children' => [
bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'a', 'token' => 'a' }, 'TERMINAL' ),
bless( {
'children' => [
bless( { 'children' => [], 'attr' => 'c' }, 'TERMINAL' )
]
}, 'C' )
]
}, 'AS' ),
bless( { 'children' => [], 'attr' => 'b', 'token' => 'b' }, 'TERMINAL' )
]
}, 'S_2' )
]
}, 'AS' );
Attribute of the second symbol: b
Actions Inside Parenthesis
Though is a practice to avoid, since it clutters the code, it is certainly permitted to introduce actions between the parenthesis, as in the example below:
pl@nereida:~/LEyapp/examples$ head -16 ListAndAction.eyp | cat -n
1 # ListAndAction.eyp
2 %{
3 my $num = 0;
4 %}
5
6 %%
7 S: 'c'
8 {
9 print "S -> c\n"
10 }
11 | ('a' {$num++; print "Seen <$num> 'a's\n"; $_[1] }) S 'b'
12 {
13 print "S -> (a ) S b\n"
14 }
15 ;
16 %%
This is the output when executing this program with input aaacbbb
:
pl@nereida:~/LEyapp/examples$ use_listandaction.pl
aaacbbb
Seen <1> 'a's
Seen <2> 'a's
Seen <3> 'a's
S -> c
S -> (a ) S b
S -> (a ) S b
S -> (a ) S b
Names for attributes
Attributes can be referenced by meaningful names instead of the classic error-prone positional approach using the dot notation:
rhs: rhseltwithid *
rhseltwithid :
rhselt '.' IDENT
| '$' rhselt
| rhselt
for example:
exp : exp.left '-' exp.right { $left - $right }
By qualifying the first appearance of the syntactic variable exp
with the notation exp.left
we can later refer inside the actions to the associated attribute using the lexical variable $left
.
The dolar notation $A
can be used as an abbreviation of A.A
.
Default actions
When no action is specified both yapp
and eyapp
implicitly insert the semantic action { $_[1] }
. In Parse::Eyapp
you can modify such behavior using the %defaultaction { Perl code }
directive. The { Perl code }
clause that follows the %defaultaction
directive is executed when reducing by any production for which no explicit action was specified.
Translator from Infix to Postfix
See an example that translates an infix expression like a=b*-3
into a postfix expression like a b 3 NEG * =
:
# File Postfix.eyp (See the examples/ directory)
%right '='
%left '-' '+'
%left '*' '/'
%left NEG
%defaultaction { return "$left $right $op"; }
%%
line: $exp { print "$exp\n" }
;
exp: $NUM { $NUM }
| $VAR { $VAR }
| VAR.left '='.op exp.right
| exp.left '+'.op exp.right
| exp.left '-'.op exp.right
| exp.left '*'.op exp.right
| exp.left '/'.op exp.right
| '-' $exp %prec NEG { "$exp NEG" }
| '(' $exp ')' { $exp }
;
%%
# Support subroutines as in the Synopsis example
...
The file containing the Eyapp
program must be compiled with eyapp
:
nereida:~/src/perl/YappWithDefaultAction/examples> eyapp Postfix.eyp
Next, you have to write a client program:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n usepostfix.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Postfix;
4
5 my $parser = new Postfix();
6 $parser->Run;
Now we can run the client program:
nereida:~/src/perl/YappWithDefaultAction/examples> usepostfix.pl
Write an expression: -(2*a-b*-3)
2 a * b 3 NEG * - NEG
Default Actions, %name
and YYName
In eyapp
each production rule has a name. The name of a rule can be explicitly given by the programmer using the %name
directive. For example, in the piece of code that follows the name ASSIGN
is given to the rule exp: VAR '=' exp
.
When no explicit name is given the rule has an implicit name. The implicit name of a rule is shaped by concatenating the name of the syntactic variable on its left, an underscore and the ordinal number of the production rule Lhs_#
as it appears in the .output
file. Avoid giving names matching such pattern to production rules. The patterns /${lhs}_\d+$/
where ${lhs}
is the name of the syntactic variable are reserved for internal use by eyapp
.
pl@nereida:~/LEyapp/examples$ cat -n Lhs.eyp
1 # Lhs.eyp
2
3 %right '='
4 %left '-' '+'
5 %left '*' '/'
6 %left NEG
7
8 %defaultaction {
9 my $self = shift;
10 my $name = $self->YYName();
11 bless { children => [ grep {ref($_)} @_] }, $name;
12 }
13
14 %%
15 input:
16 /* empty */
17 { [] }
18 | input line
19 {
20 push @{$_[1]}, $_[2] if defined($_[2]);
21 $_[1]
22 }
23 ;
24
25 line: '\n' { }
26 | exp '\n' { $_[1] }
27 ;
28
29 exp:
30 NUM { $_[1] }
31 | VAR { $_[1] }
32 | %name ASSIGN
33 VAR '=' exp
34 | %name PLUS
35 exp '+' exp
36 | %name MINUS
37 exp '-' exp
38 | %name TIMES
39 exp '*' exp
40 | %name DIV
41 exp '/' exp
42 | %name UMINUS
43 '-' exp %prec NEG
44 | '(' exp ')' { $_[2] }
45 ;
Inside a semantic action the name of the current rule can be recovered using the method YYName
of the parser object.
The default action (lines 8-12) computes as attribute of the left hand side a reference to an object blessed in the name of the rule. The object has an attribute children
which is a reference to the list of children of the node. The call to grep
11 bless { children => [ grep {ref($_)} @_] }, $name;
excludes children that aren't references. Notice that the lexical analyzer only returns references for the NUM
and VAR
terminals:
59 sub _Lexer {
60 my($parser)=shift;
61
62 for ($parser->YYData->{INPUT}) {
63 s/^[ \t]+//;
64 return('',undef) unless $_;
65 s/^([0-9]+(?:\.[0-9]+)?)//
66 and return('NUM', bless { attr => $1}, 'NUM');
67 s/^([A-Za-z][A-Za-z0-9_]*)//
68 and return('VAR',bless {attr => $1}, 'VAR');
69 s/^(.)//s
70 and return($1, $1);
71 }
72 return('',undef);
73 }
follows the client program:
pl@nereida:~/LEyapp/examples$ cat -n uselhs.pl
1 #!/usr/bin/perl -w
2 use Lhs;
3 use Data::Dumper;
4
5 $parser = new Lhs();
6 my $tree = $parser->Run;
7 $Data::Dumper::Indent = 1;
8 if (defined($tree)) { print Dumper($tree); }
9 else { print "Cadena no válida\n"; }
When executed with input a=(2+3)*b
the parser produces the following tree:
ASSIGN(TIMES(PLUS(NUM[2],NUM[3]), VAR[b]))
See the result of an execution:
pl@nereida:~/LEyapp/examples$ uselhs.pl
a=(2+3)*b
$VAR1 = [
bless( {
'children' => [
bless( { 'attr' => 'a' }, 'VAR' ),
bless( {
'children' => [
bless( {
'children' => [
bless( { 'attr' => '2' }, 'NUM' ),
bless( { 'attr' => '3' }, 'NUM' )
]
}, 'PLUS' ),
bless( { 'attr' => 'b' }, 'VAR' )
]
}, 'TIMES' )
]
}, 'ASSIGN' )
];
The name of a production rule can be changed at execution time. See the following example:
29 exp:
30 NUM { $_[1] }
31 | VAR { $_[1] }
32 | %name ASSIGN
33 VAR '=' exp
34 | %name PLUS
35 exp '+' exp
36 | %name MINUS
37 exp '-' exp
38 {
39 my $self = shift;
40 $self->YYName('SUBSTRACT'); # rename it
41 $self->YYBuildAST(@_); # build the node
42 }
43 | %name TIMES
44 exp '*' exp
45 | %name DIV
46 exp '/' exp
47 | %name UMINUS
48 '-' exp %prec NEG
49 | '(' exp ')' { $_[2] }
50 ;
When the client program is executed we can see the presence of the SUBSTRACT
nodes:
pl@nereida:~/LEyapp/examples$ useyynamedynamic.pl
2-b
$VAR1 = [
bless( {
'children' => [
bless( {
'attr' => '2'
}, 'NUM' ),
bless( {
'attr' => 'b'
}, 'VAR' )
]
}, 'SUBSTRACT' )
];
Abstract Syntax Trees : %tree
and %name
Parse::Eyapp
facilitates the construction of concrete syntax trees and abstract syntax trees (abbreviated AST from now on) through the %tree
directive. Nodes in the AST are blessed in the production name
. By default the name of a production is the concatenation of the left hand side and the production number. The production number is the ordinal number of the production as they appear in the associated .output
file (see option -v
of eyapp). For example, given the grammar:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '9,28p' treewithoutnames.pl
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
line: exp <+ ';'> { $_[1] } /* list of expressions separated by ';' */
;
exp:
NUM | VAR | VAR '=' exp
| exp '+' exp | exp '-' exp | exp '*' exp
| exp '/' exp
| '-' exp %prec NEG
| '(' exp ')' { $_[2] }
;
The tree produced by the parser when feed with input a=2*b
is:
_PLUS_LIST(exp_6(TERMINAL[a],exp_9(exp_4(TERMINAL[2]),exp_5(TERMINAL[b]))))
If we want to see the correspondence between names and rules we can generate and check the corresponding file .output
:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '28,42p' treewithoutnames.output
Rules:
------
0: $start -> line $end
1: PLUS-1 -> PLUS-1 ';' exp
2: PLUS-1 -> exp
3: line -> PLUS-1
4: exp -> NUM
5: exp -> VAR
6: exp -> VAR '=' exp
7: exp -> exp '+' exp
8: exp -> exp '-' exp
9: exp -> exp '*' exp
10: exp -> exp '/' exp
11: exp -> '-' exp
12: exp -> '(' exp ')'
We can see now that the node exp_9
corresponds to the production exp -> exp '*' exp
. Observe also that the Eyapp production:
line: exp <+ ';'>
actually produces the productions:
1: PLUS-1 -> PLUS-1 ';' exp
2: PLUS-1 -> exp
and that the name of the class associated with the non empty list is _PLUS_LIST
.
A production rule can be named using the %name IDENTIFIER
directive. For each production rule a namespace/package is created. The IDENTIFIER
is the name of the associated package. Therefore, by modifying the former grammar with additional %name
directives:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '8,26p' treewithnames.pl
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
%left '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree # Let us build an abstract syntax tree ...
%%
line: exp <%name EXPS + ';'> { $_[1] } /* list of expressions separated by ';' */
;
exp:
%name NUM NUM | %name VAR VAR | %name ASSIGN VAR '=' exp
| %name PLUS exp '+' exp | %name MINUS exp '-' exp | %name TIMES exp '*' exp
| %name DIV exp '/' exp
| %name UMINUS '-' exp %prec NEG
| '(' exp ')' { $_[2] }
;
we are explictly naming the productions. Thus, all the node instances corresponding to the production exp: VAR '=' exp
will belong to the class ASSIGN
. Now the tree for a=2*b
becomes:
EXPS(ASSIGN(TERMINAL[a],TIMES(NUM(TERMINAL[2]),VAR(TERMINAL[b]))))
Observe how the list has been named EXPS
. The %name
directive prefixes the list operator ([+*?]
).
About the Encapsulation of Nodes
There is no encapsulation of nodes. The user/client knows that they are hashes that can be decorated with new keys/attributes. All nodes in the AST created by %tree
are Parse::Eyapp::Node
nodes. The only reserved field is children
which is a reference to the array of children. You can always create a Node
class by hand by inheriting from Parse::Eyapp::Node
. See section "Compiling with eyapp and treereg" for an example.
TERMINAL Nodes
Nodes named TERMINAL
are built from the tokens provided by the lexical analyzer. Parse::Eyapp
follows the same protocol than Parse::Yapp for communication between the parser and the lexical analyzer: A couple ($token, $attribute)
is returned by the lexical analyzer. These values are stored under the keys token
and attr
. TERMINAL
nodes as all Parse::Eyapp::Node
nodes also have the attribute children
but is - almost always - empty.
Explicit Actions Inside %tree
Explicit actions can be specified by the programmer like in this line from the "SYNOPSIS" example:
| '(' exp ')' { $_[2] } /* Let us simplify a bit the tree */
Explicit actions receive as arguments the references to the children nodes already built. The programmer can influence the shape of the tree by inserting these explicit actions. In this example the programmer has decided to simplify the syntax tree: the nodes associated with the parenthesis are discarded and the reference to the subtree containing the proper expression is returned. Such manoeuvre is called bypassing. See section "The bypass clause and the %no bypass directive" to know more about automatic bypassing
Explicitly Building Nodes With YYBuildAST
Sometimes the best time to decorate a node with some attributes is just after being built. In such cases the programmer can take manual control building the node with YYBuildAST
to inmediately proceed to decorate it.
The following example illustrates the situation:
Variable:
%name VARARRAY
$ID ('[' binary ']') <%name INDEXSPEC +>
{
my $self = shift;
my $node = $self->YYBuildAST(@_);
$node->{line} = $ID->[1];
return $node;
}
This production rule defines the expression to access an array element as an identifier followed by a non empty list of binary expressions Variable: ID ('[' binary ']')+
. Furthermore, the node corresponding to the list of indices has been named INDEXSPEC
.
When no explicit action is inserted a binary node will be built having as first child the node corresponding to the identifier $ID
and as second child the reference to the list of binary expressions. The children corresponding to '['
and ']'
are discarded since they are -by default- syntactic tokens (see section "Syntactic and Semantic tokens"). However, the programmer wants to decorate the node being built with a line
attribute holding the line number in the source code where the identifier being used appears. The call to the Parse::Eyapp::Driver
method YYBuildAST
does the job of building the node. After that the node can be decorated and returned.
Actually, the %tree
directive is semantically equivalent to:
%default action { goto &Parse::Eyapp::Driver::YYBuildAST }
Returning non References Under %tree
When a explicit user action returns s.t. that is not a reference no node will be inserted. This fact can be used to supress nodes in the AST being built. See the following example (file examples/returnnonode.yp
):
nereida:~/src/perl/YappWithDefaultAction/examples> sed -ne '1,11p' returnnonode.yp | cat -n
1 %tree
2 %semantic token 'a' 'b'
3 %%
4 S: /* empty */
5 | S A
6 | S B
7 ;
8 A : 'a'
9 ;
10 B : 'b' { }
11 ;
since the action at line 10 returns undef
the B : 'b'
subtree will not be inserted in the AST:
nereida:~/src/perl/YappWithDefaultAction/examples> usereturnnonode.pl
ababa
S_2(S_3(S_2(S_3(S_2(S_1,A_4(TERMINAL[a]))),A_4(TERMINAL[a]))),A_4(TERMINAL[a]))
Observe the absence of B
s and 'b'
s.
Intermediate actions and %tree
Intermediate actions can be used to change the shape of the AST (prune it, decorate it, etc.) but the value returned by them is ignored. The grammar below has two intermediate actions. They modify the attributes of the node to its left and return a reference $f
to such node (lines 5 and 6):
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '1,10p' intermediateactiontree.yp | cat -n
1 %semantic token 'a' 'b'
2 %tree bypass
3 %%
4 S: /* empty */
5 | S A.f { $f->{attr} = "A"; $f; } A
6 | S B.f { $f->{attr} = "B"; $f; } B
7 ;
8 A : %name A 'a'
9 ;
10 B : %name B 'b'
See the client program running:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n useintermediateactiontree.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Parse::Eyapp;
4 use intermediateactiontree;
5
6 { no warnings;
7 *A::info = *B::info = sub { $_[0]{attr} };
8 }
9
10 my $parser = intermediateactiontree->new();
11 my $t = $parser->Run;
12 print $t->str,"\n";
nereida:~/src/perl/YappWithDefaultAction/examples> useintermediateactiontree.pl
aabbaa
S_2(S_4(S_2(S_1,A[A],A[a]),B[B],B[b]),A[A],A[a])
The attributes of left A
s have been effectively changed by the intermediate actions from 'a'
to 'A'
. However no further children have been inserted.
Syntactic and Semantic tokens
Parse::Eyapp
diferences between syntactic tokens
and semantic tokens
. By default all tokens declared using string notation (i.e. between quotes like '+'
, '='
) are considered syntactic tokens. Tokens declared by an identifier (like NUM
or VAR
) are by default considered semantic tokens. Syntactic tokens do not yield to nodes in the syntactic tree. Thus, the first print in the former "SYNOPSIS" example:
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n";
my $t = $parser->Run;
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;
gives as result the following output:
nereida:~/src/perl/YappWithDefaultAction/examples> synopsis.pl
Syntax Tree:
EXPRESION_LIST(
PLUS(
TIMES(
NUM(
TERMINAL[2]
),
UMINUS(
NUM(
TERMINAL[3]
)
) # UMINUS
) # TIMES,
TIMES(
VAR(
TERMINAL[b]
),
NUM(
TERMINAL[0]
)
) # TIMES
) # PLUS,
UMINUS(
UMINUS(
NUM(
TERMINAL[2]
)
) # UMINUS
) # UMINUS
) # EXPRESION_LIST
TERMINAL
nodes corresponding to tokens that were defined by strings like '='
, '-'
, '+'
, '/'
, '*'
, '('
and ')'
do not appear in the tree. TERMINAL
nodes corresponding to tokens that were defined using an identifer, like NUM
or VAR
are, by default, semantic tokens and appear in the AST.
Changing the Status of a Token
The new token declaration directives %syntactic token
and %semantic token
can change the status of a token. For example (file 15treewithsyntactictoken.pl
in the examples/
directory), given the grammar:
%syntactic token b
%semantic token 'a' 'c'
%tree
%%
S: %name ABC
A B C
| %name BC
B C
;
A: %name A
'a'
;
B: %name B
b
;
C: %name C
'c'
;
%%
the tree build for input abc
will be ABC(A(TERMINAL[a]),B,C(TERMINAL[c]))
.
Saving the Information of Syntactic Tokens in their Father
The reason for the adjective %syntactic
applied to a token is to state that the token influences the shape of the syntax tree but carries no other information. When the syntax tree is built the node corresponding to the token is discarded.
Sometimes the difference between syntactic and semantic tokens is blurred. For example the line number associated with an instance of the syntactic token '+'
can be used later -say during type checking- to emit a more accurate error diagnostic. But if the node was discarded the information about that line number is no longer available. When building the syntax tree Parse::Eyapp
(namely the method Parse::Eyapp::YYBuildAST
) checks if the method TERMINAL::save_attributes
exists and if so it will be called when dealing with a syntactic token. The method receives as argument - additionally to the reference to the attribute of the token as it is returned by the lexical analyzer - a reference to the node associated with the left hand side of the production. Here is an example (file examples/Types.eyp
) of use:
sub TERMINAL::save_attributes {
# $_[0] is a syntactic terminal
# $_[1] is the father.
push @{$_[1]->{lines}}, $_[0]->[1]; # save the line number
}
The bypass
clause and the %no bypass
directive
The shape of the tree can be also modified using some %tree
clauses as %tree bypass
which will produce an automatic bypass of any node with only one child at tree-construction-time.
A bypass operation consists in returning the only child of the node being visited to the father of the node and re-typing (re-blessing) the node in the name of the production (if a name was provided).
A node may have only one child at tree-construction-time for one of two reasons.
The first occurs when the right hand side of the production was already unary like in:
exp: %name NUM NUM
Here - if the
bypass
clause is used - theNUM
node will be bypassed and the childTERMINAL
built from the information provided by the lexical analyzer will be renamed/reblessed asNUM
.Another reason for a node to be bypassed is the fact that though the right hand side of the production may have more than one symbol, only one of them is not a syntactic token like in:
exp: '(' exp ')'
A consequence of the global scope application of %tree bypass
is that undesired bypasses may occur like in
exp : %name UMINUS
'-' $exp %prec NEG
though the right hand side has two symbols, token '-'
is a syntactic token and therefore only exp
is left. The bypass operation will be applied when building this node. This bypass can be avoided applying the no bypass ID
directive to the corresponding production:
exp : %no bypass UMINUS
'-' $exp %prec NEG
The following example (file examples/bypass.pl
) is the equivalent of the "SYNOPSIS" example but using the bypass
clause instead:
use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;
sub TERMINAL::info { $_[0]{attr} }
{ no warnings; *VAR::info = *NUM::info = \&TERMINAL::info; }
my $grammar = q{
%right '=' # Lowest precedence
%left '-' '+'
%left '*' '/'
%left NEG # Disambiguate -a-b as (-a)-b and not as -(a-b)
%tree bypass # Let us build an abstract syntax tree ...
%%
line: exp <%name EXPRESION_LIST + ';'> { $_[1] }
;
exp:
%name NUM NUM | %name VAR VAR | %name ASSIGN VAR '=' exp
| %name PLUS exp '+' exp | %name MINUS exp '-' exp | %name TIMES exp '*' exp
| %name DIV exp '/' exp
| %no bypass UMINUS
'-' $exp %prec NEG
| '(' exp ')'
;
%%
# sub _Error, _Lexer and Run like in the synopsis example
# ...
}; # end grammar
our (@all, $uminus);
Parse::Eyapp->new_grammar( # Create the parser package/class
input=>$grammar,
classname=>'Calc', # The name of the package containing the parser
firstline=>7 # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new(); # Create a parser
$parser->YYData->{INPUT} = "a=2*-3+b*0\n"; # Set the input
my $t = $parser->Run; # Parse it!
print "\n************\n".$t->str."\n************\n";
# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{ # Example of support code
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM, NUM)
=> {
my $op = $Op{ref($_[0])};
$NUM[0]->{attr} = eval "$NUM[0]->{attr} $op $NUM[1]->{attr}";
$_[0] = $NUM[0];
}
zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
whatever_times_zero: TIMES(., NUM) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
uminus: UMINUS(NUM) => { $NUM->{attr} = -$NUM->{attr}; $_[0] = $NUM }
},
OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations
$t->s(@all); # constant folding and mult. by zero
print $t->str,"\n";
when running this example with input "a=2*-3+b*0\n"
we obtain the following output:
nereida:~/src/perl/YappWithDefaultAction/examples> bypass.pl
************
EXPRESION_LIST(ASSIGN(TERMINAL[a],PLUS(TIMES(NUM[2],UMINUS(NUM[3])),TIMES(VAR[b],NUM[0]))))
************
EXPRESION_LIST(ASSIGN(TERMINAL[a],NUM[-6]))
As you can see the trees are more compact when using the bypass
directive.
The alias
clause of the %tree
directive
Access to children in Parse::Eyapp is made through the child
and children
methods. There are occasions however where access by name to the children may be preferable. The use of the alias
clause with the %tree
directive creates accessors to the children with names specified by the programmer. The dot and dolar notations are used for this. When dealing with a production like:
A:
%name A_Node
Node B.bum N.pum $Chip
methods bum
, pum
and Chip
will be created for the class A_Node
. Those methods wil provide access to the respective child (first, second and third in the example). The methods are build at compile-time and therefore later transformations of the AST modifying the order of the children may invalidate the use of these getter-setters.
As an example, the CPAN module Language::AttributeGrammar provides AST decorators from an attribute grammar specification of the AST. To work Language::AttributeGrammar requires named access to the children of the AST nodes. Follows an example (file examples/CalcwithAttributeGrammar.pl
) of a small calculator:
use Parse::Eyapp;
use Language::AttributeGrammar;
my $grammar = q{
... # priority declarations. Like in previous examples
%tree bypass alias
%%
line: $exp { $_[1] }
;
exp:
%name NUM
$NUM
| %name VAR
$VAR
............ # as in the bypass example
}; # end grammar
Parse::Eyapp->new_grammar(
input=>$grammar, classname=>'Rule6', firstline =>7,
);
my $parser = Rule6->new();
$parser->YYData->{INPUT} = "a = -(2*3+5-1)\n";
my $t = $parser->Run;
my $attgram = new Language::AttributeGrammar <<'EOG';
# Compute the expression
NUM: $/.val = { $<attr> }
TIMES: $/.val = { $<left>.val * $<right>.val }
PLUS: $/.val = { $<left>.val + $<right>.val }
MINUS: $/.val = { $<left>.val - $<right>.val }
UMINUS: $/.val = { -$<exp>.val }
ASSIGN: $/.val = { $<exp>.val }
EOG
my $res = $attgram->apply($t, 'val');
Debugging Parse::Eyapp
Programs
The sources of error when programming with eyapp
are many and various. Some of them are minor, as having a nonterminal without production rules or a terminal that is never produced by the lexical analyzer. These kind of errors can be catched with the help of the %strict
directive.
In this section we will discuss three main kind of errors that correspond to three development stages:
Conflict errors:
Conflicts with the grammar: the grammar is ambiguous or is not clear - perhaps due to the fact that
eyapp
uses only a lookahead symbol - which sort of tree must be built for some inputsTree building errors:
There are no conflicts but the parser does not build the syntas tree as expected. May be it rejects correct sentences or accepts incorrect ones. Or may be it accepts correct ones but the syntax tree has not the shape we want.
Semantic errors:
We have solved the conflicts and trees are satisfactory but we have errors inside the semantic actions.
Conflict Errors
The following simplified eyapp
program has some errors. The generated language is made of lists of declarations (D
stands for declaration) followed by lists of sentences (S
stands for stament) separated by semicolons:
pl@nereida:~/LEyapp/examples$ cat -n Debug.eyp
1 %token D S
2
3 %{
4 our $VERSION = '0.01';
5 %}
6
7 %%
8 p:
9 ds ';' ss
10 | ss
11 ;
12
13 ds:
14 D ';' ds
15 | D
16 {
17 print "Reducing by rule:\n";
18 print "\tds -> D\n";
19 $_[1];
20 }
21 ;
22
23 ss:
24 S ';' ss
25 | S
26 ;
27
28 %%
29
30 my $tokenline = 0;
31
32 sub _Error {
33 my $parser = shift;
34 my ($token) = $parser->YYCurval;
35 my ($what) = $token ? "input: '$token'" : "end of input";
36 die "Syntax error near $what line num $tokenline\n";
37 }
38
39 my $input;
40
41 sub _Lexer {
42
43 for ($input) {
44 s{^(\s)}{} and $tokenline += $1 =~ tr{\n}{};
45 return ('',undef) unless $_;
46 return ($1,$1) if s/^(.)//;
47 }
48 return ('',undef);
49 }
50
51 sub Run {
52 my ($self) = shift;
53
54 $input = shift;
55
56 return $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error,
57 yydebug => 0xF
58 );
59 }
When compiling, eyapp
produces a warning message announcing the existence of a conflict:
pl@nereida:~/LEyapp/examples$ eyapp Debug.eyp
1 shift/reduce conflict (see .output file)
State 4: shifts:
to state 8 with ';'
The existence of warnings triggers the creation of a file Debug.output
containing information about the grammar and the syntax analyzer.
Let us see the contents of the Debug.output
file:
pl@nereida:~/LEyapp/examples$ cat -n Debug.output
1 Warnings:
2 ---------
3 1 shift/reduce conflict (see .output file)
4 State 4: shifts:
5 to state 8 with ';'
6
7 Conflicts:
8 ----------
9 State 4 contains 1 shift/reduce conflict
10
11 Rules:
12 ------
13 0: $start -> p $end
14 1: p -> ds ';' ss
15 2: p -> ss
16 3: ds -> D ';' ds
17 4: ds -> D
18 5: ss -> S ';' ss
19 6: ss -> S
20
21 States:
22 -------
23 State 0:
24
25 $start -> . p $end (Rule 0)
26
27 D shift, and go to state 4
28 S shift, and go to state 1
29
30 p go to state 2
31 ss go to state 3
32 ds go to state 5
33
.. .........................................
55 State 4:
56
57 ds -> D . ';' ds (Rule 3)
58 ds -> D . (Rule 4)
59
60 ';' shift, and go to state 8
61
62 ';' [reduce using rule 4 (ds)]
63
.. .........................................
84 State 8:
85
86 ds -> D ';' . ds (Rule 3)
87
88 D shift, and go to state 4
89
90 ds go to state 11
91
.. .........................................
112 State 12:
113
114 p -> ds ';' ss . (Rule 1)
115
116 $default reduce using rule 1 (p)
117
118
119 Summary:
120 --------
121 Number of rules : 7
122 Number of terminals : 4
123 Number of non-terminals : 4
124 Number of states : 13
The parser generated by Parse::Eyapp
is based on a deterministic finite automata. Each state of the automata remembers what production rules are candidates to apply and what have been seen from the right hand side of the production rule. The problem, according to the warning, occurs in state 4. State 4 contains:
55 State 4:
56
57 ds -> D . ';' ds (Rule 3)
58 ds -> D . (Rule 4)
59
60 ';' shift, and go to state 8
61
62 ';' [reduce using rule 4 (ds)]
63
An state is a set of production rules with a marker (the dot in rules 3 and 4) somewhere in its right hand side. If the parser is in state 4 is because the production rules ds -> D ';' ds
and ds -> D
are potential candidates to build the syntax tree. That they will be or not depends on what will happen next when more input is processed.
The dot that appears on the right hand side means position in our guessing. The fact that ds -> D .';' ds
is in state 4 means that if the parser is in state 4 we have already seen D
and we expect to see a semicolon followed by ds
(or something derivable from ds
). If such thing happens this production will be the right one (will be the handle in the jargon). The comment
60 ';' shift, and go to state 8
means that if the next token is a semicolon the next state will be state 8:
84 State 8:
85
86 ds -> D ';' . ds (Rule 3)
87
88 D shift, and go to state 4
89
90 ds go to state 11
As we see state 8 has the item ds -> D ';' . ds
which means that we have already seen a D
and a semicolon.
The fact that ds -> D .
is in state 4 means that we have already seen D
and since the dot is at the end of the rule, this production can be the right one, even if a semicolon is just waiting in the input.
That is why eyapp
talks about a shift/reduce conflict with ';'
because in state 4 there are two rules that compete to be the right one:
pl@nereida:~/LEyapp/examples$ eyapp Debug.eyp
1 shift/reduce conflict (see .output file)
We can guess that the right item (the states of the automata are called LR(0) items in the jargon) is ds -> D .';' ds
and shift to state 8 consuming the semicolon, expecting to see something derivable from ds
later or guess that ds -> D .
is the right LR(0) item and reduce for such rule. This is the meaning of the comments in state 4:
60 ';' shift, and go to state 8
61
62 ';' [reduce using rule 4 (ds)]
To illustrate the problem let us consider the phrases D;S
and D;D;S
.
For both phrases, after consuming the D
the parser will go to state 4 and the current token will be the semicolon.
For the first phrase D;S
the correct decison is to use rule 4 ds -> D
(to reduce in the jargon). For the second phrase D;D;S
the correct decision is to follow rule 3 ds -> D . ';' ds
.
The parser generated by eyapp
could know wich rule is correct for each case if it were allowed to look at the token after the semicolon: if it is a S
is rule 4, if it is a D
is rule 3. But the parsers generated by Eyapp
do not lookahead more than the next token (this is what the "1" means when we say that Parse::Eyapp
parsers are LALR(1)) and therefore is not in condition to decide which producction rule applies.
To solve the conflict the Eyapp
programmer has to reformulate the grammar modifying priorities and reorganizing the rules. Rewriting the recursive rule for ds
to be let recursive solves the conflict:
pl@nereida:~/LEyapp/examples$ sed -ne '/^ds:/,/^;/p' Debug1.eyp | cat -n
1 ds:
2 ds ';' D
3 | D
4 {
5 print "Reducing by rule:\n";
6 print "\tds -> D\n";
7 $_[1];
8 }
9 ;
Now, for any phrase matching the pattern D ; ...
the action to build the tree is to reduce by ds -> D
.
The rightmost antiderivation for D;D;S
is:
Derivation | Tree
--------------------------------------+-----------------------------
D;D;S <= ds;D;S <= ds;S <= ds;ss <= p | p(ds(ds(D),';',D),';',ss(S))
while the rightmost antiderivation for D;S
is:
Derivation | Tree
--------------------------------------+-----------------------------
D;S <= ds;S <= ds;ss <= p | p(ds(D),';',ss(S))
When we recompile the modified grammar no warnings appear:
pl@nereida:~/LEyapp/examples$ eyapp Debug1.eyp
pl@nereida:~/LEyapp/examples$
Errors During Tree Construction
Let us write the typical client program:
pl@nereida:~/LEyapp/examples$ cat -n usedebug1.pl
1 #!/usr/bin/perl -w
2 # usetreebypass.pl prueba2.exp
3 use strict;
4 use Debug1;
5
6 sub slurp_file {
7 my $fn = shift;
8 my $f;
9
10 local $/ = undef;
11 if (defined($fn)) {
12 open $f, $fn or die "Can't find file $fn!\n";
13 }
14 else {
15 $f = \*STDIN;
16 }
17 my $input = <$f>;
18 return $input;
19 }
20
21 my $input = slurp_file( shift() );
22
23 my $parser = Debug1->new();
24
25 $parser->Run($input);
When executing the program we observe an abnormal behavior. We activate the option yydebug => 0xF
in the call to a YYParser
. The integer parameter yydebug
of new
and YYParse
controls the level of debugging. Different levels of verbosity can be obtained by setting the bits of this argument. It works as follows:
/============================================================\
| Bit Value | Outputs |
|------------+-----------------------------------------------|
| 0x01 | Token reading (useful for Lexer debugging) |
|------------+-----------------------------------------------|
| 0x02 | States information |
|------------+-----------------------------------------------|
| 0x04 | Driver actions (shifts, reduces, accept...) |
|------------+-----------------------------------------------|
| 0x08 | Parse Stack dump |
|------------+-----------------------------------------------|
| 0x10 | Error Recovery tracing |
\============================================================/
Let us see what happens when the input is D;S
. We have introduced some white spaces and carriage returns between the terminals:
pl@nereida:~/LEyapp/examples$ usedebug1.pl
D
;
S
----------------------------------------
In state 0:
Stack:[0]
Need token. Got >D<
Shift and go to state 4.
----------------------------------------
In state 4:
Stack:[0,4]
Don't need token.
Reduce using rule 4 (ds --> D): Reducing by rule:
ds -> D
Back to state 0, then go to state 5.
----------------------------------------
In state 5:
Stack:[0,5]
Need token. Got ><
Syntax error near end of input line num 1
What's going on? After reading the carriage return
Need token. Got >D<
the parser receives an end of file. ¿Why?. Something is going wrong in the communications between lexer and parser. Let us review the lexical analyzer:
pl@nereida:~/LEyapp/examples$ sed -ne '/sub.*_Lexer/,/^}/p' Debug1.eyp | cat -n
1 sub _Lexer {
2
3 for ($input) {
4 s{^(\s)}{} and $tokenline += $1 =~ tr{\n}{};
5 return ('',undef) unless $_;
6 return ($1,$1) if s/^(.)//;
7 }
8 return ('',undef);
9 }
The error is at line 4. Only a single white space is eaten! The second white in the input does not match lines 5 and 6 and the contextualizing for
finishes. Line 8 the returns the ('',undef)
signalling the end of input.
Let us write a new version Debug2.eyp
that fixes the problem:
pl@nereida:~/LEyapp/examples$ sed -ne '/sub.*_Lexer/,/^}/p' Debug2.eyp | cat -n
1 sub _Lexer {
2
3 for ($input) {
4 s{^(\s+)}{} and $tokenline += $1 =~ tr{\n}{};
5 return ('',undef) unless $_;
6 return ($1,$1) if s/^(.)//;
7 }
8 return ('',undef);
9 }
Now the analysis seems to work:
pl@nereida:~/LEyapp/examples$ usedebug2.pl
D
;
S
----------------------------------------
In state 0:
Stack:[0]
Need token. Got >D<
Shift and go to state 4.
----------------------------------------
In state 4:
Stack:[0,4]
Don't need token.
Reduce using rule 4 (ds --> D): Reducing by rule:
ds -> D
Back to state 0, then go to state 5.
----------------------------------------
In state 5:
Stack:[0,5]
Need token. Got >;<
Shift and go to state 8.
----------------------------------------
In state 8:
Stack:[0,5,8]
Need token. Got >S<
Shift and go to state 1.
----------------------------------------
In state 1:
Stack:[0,5,8,1]
Need token. Got ><
Reduce using rule 6 (ss --> S): Back to state 8, then go to state 10.
----------------------------------------
In state 10:
Stack:[0,5,8,10]
Don't need token.
Reduce using rule 1 (p --> ds ; ss): Back to state 0, then go to state 2.
----------------------------------------
In state 2:
Stack:[0,2]
Shift and go to state 7.
----------------------------------------
In state 7:
Stack:[0,2,7]
Don't need token.
Accept.
Understanding the Output of yydebug
The YYParse
methods implements the generic LR parsing algorithm. It very much works Parse::Yapp::YYParse
and as yacc/bison yyparse
. It accepts almost the same arguments as Class->new
(Being Class
the name of the generated class).
The parser uses two tables and a stack. The two tables are called the action table and the goto table. The stack is used to keep track of the states visited.
At each step the generated parser consults the action
table and takes one decision: To shift to a new state consuming one token (and pushing the current state in the stack) or to reduce by some production rule. In the last case the parser pops from its stack as many states as symbols are on the right hand side of the production rule. Here is a Perl/C like pseudocode summarizing the activity of YYParse
:
1 my $parser = shift; # The parser object
2 push(@stack, $parser->{startstate});
3 $b = $parser->YYLexer(); # Get the first token
4 FOREVER: {
5 $s = top(0); # Get the state on top of the stack
6 $a = $b;
7 switch ($parser->action[$s->state][$a]) {
8 case "shift t" :
9 my $t;
10 $t->{state} = t;
11 $t->{attr} = $a->{attr};
12 push($t);
13 $b = $parser->YYLexer(); # Call the lexical analyzer
14 break;
15 case "reduce A->alpha" :
16 # Call the semantic action with the attributes of the rhs as args
17 my $semantic = $parser->Semantic{A ->alpha}; # The semantic action
18 my $r;
19 $r->{attr} = $semantic->($parser, top(|alpha|-1)->attr, ... , top(0)->attr);
20
21 # Pop as many states as symbols on the rhs of A->alpha
22 pop(|alpha|);
23
24 # Goto next state
25 $r->{state} = $parser->goto[top(0)][A];
26 push($r);
27 break;
28 case "accept" : return (1);
29 default : $parser->YYError("syntax error");
30 }
31 redo FOREVER;
32 }
Here |alpha|
stands for the length of alpha
. Function top(k)
returns the state in position k
from the top of the stack, i.e. the state at depth k
. Function pop(k)
extracts k
states from the stack. The call $state->attr
returns the attribute associated with $state
. The call $parser->Semantic{A ->alpha}
returns the semantic action associated with production A ->alpha
.
Let us see a trace for the small gramar in examples/aSb.yp
:
pl@nereida:~/LEyapp/examples$ /usr/local/bin/paste.pl aSb.yp aSb.output | head -5
%% | Rules:
S: { print "S -> epsilon\n" } | ------
| 'a' S 'b' { print "S -> a S b\n" } | 0: $start -> S $end
; | 1: S -> /* empty */
%% | 2: S -> 'a' S 'b'
The tables in file aSb.output
describe the actions and transitions to take:
pl@nereida:~/LEyapp/examples$ cat -n aSb.output
. .........................................
7 States:
8 -------
9 State 0:
10
11 $start -> . S $end (Rule 0)
12
13 'a' shift, and go to state 2
14
15 $default reduce using rule 1 (S)
16
17 S go to state 1
18
19 State 1:
20
21 $start -> S . $end (Rule 0)
22
23 $end shift, and go to state 3
24
25 State 2:
26
27 S -> 'a' . S 'b' (Rule 2)
28
29 'a' shift, and go to state 2
30
31 $default reduce using rule 1 (S)
32
33 S go to state 4
34
35 State 3:
36
37 $start -> S $end . (Rule 0)
38
39 $default accept
40
41 State 4:
42
43 S -> 'a' S . 'b' (Rule 2)
44
45 'b' shift, and go to state 5
46
47 State 5:
48
49 S -> 'a' S 'b' . (Rule 2)
50
51 $default reduce using rule 2 (S)
52
53
54 Summary:
55 --------
56 Number of rules : 3
57 Number of terminals : 3
58 Number of non-terminals : 2
59 Number of states : 6
When executed with yydebug
set and input aabb
we obtain the following output:
pl@nereida:~/LEyapp/examples$ use_aSb.pl
----------------------------------------
In state 0:
Stack:[0]
aabb <----------- user input
Need token. Got >a<
Shift and go to state 2.
----------------------------------------
In state 2:
Stack:[0,2]
Need token. Got >a<
Shift and go to state 2.
----------------------------------------
In state 2:
Stack:[0,2,2]
Need token. Got >b<
Reduce using rule 1 (S --> /* empty */): S -> epsilon
Back to state 2, then go to state 4.
The output S-> epsilon
is consequence of the semantic action associated with such production rule.
----------------------------------------
In state 4:
Stack:[0,2,2,4]
Shift and go to state 5.
----------------------------------------
In state 5:
Stack:[0,2,2,4,5]
Don't need token.
Reduce using rule 2 (S --> a S b): S -> a S b
Back to state 2, then go to state 4.
As a result of reducing by rule 2 the semantic action is executed
{ print "S -> a S b\n" }
and the three last visited states are popped from the stack, and the stack becomes [0,2]
. But that means that we are now in state 2 seeing a S
. If you look at the table above being in state2 and seeing a S
we go to state 4.
----------------------------------------
In state 4:
Stack:[0,2,4]
Need token. Got >b<
Shift and go to state 5.
----------------------------------------
In state 5:
Stack:[0,2,4,5]
Don't need token.
Reduce using rule 2 (S --> a S b): S -> a S b
Back to state 0, then go to state 1.
----------------------------------------
In state 1:
Stack:[0,1]
Need token. Got ><
Shift and go to state 3.
----------------------------------------
In state 3:
Stack:[0,1,3]
Don't need token.
Accept.
Errors Inside Semantic Actions
A third type of error occurs when the code inside a semantic action does'nt behave as expected.
the semantic actions are translated in anonymous methods of the parser object. Since they are anonymous we ca't use breakpoints as
b subname # stop when arriving at sub ''name''
or
c subname # contine up to reach sub ''name''
Furthermore the file loaded by the client program is the generated .pm
. The code in Debug.pm
is alien to us - Was automatically generated by Parse::Eyapp
- and it can be difficult to find where our inserted semantic actions are.
To watch the execution of a semantic action is simple: We use the debugger f file.eyp
option to switch the viewing filename to our grammar file.
pl@nereida:~/LEyapp/examples$ perl -wd usedebug2.pl
Loading DB routines from perl5db.pl version 1.28
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(usedebug2.pl:21): my $input = slurp_file( shift() );
DB<1> f Debug2.eyp
1 2 #line 3 "Debug2.eyp"
3
4: our $VERSION = '0.01';
5
6 7 8 9 10
Now we can set a breakpoint at any line of our grammar file. Thus the 18
in the command b 18
refers to line 18 in Debug2.eyp
. The command l
shows the corresponding lines of the .eyp
file
DB<2> b 18
DB<3> l
11 12 13 14 15 16 #line 17 "Debug2.eyp"
17
18:b print "Reducing by rule:\n";
19: print "\tds -> D\n";
20: $_[1];
We issue now the command c
(continue). The execution continues up to linea 18 of Debug2.eyp
:
DB<3> c
D
;
S
Debug2::CODE(0x85129d8)(Debug2.eyp:18):
18: print "Reducing by rule:\n";
DB<3> n
Reducing by rule:
Now we can issue any debugger commands (like x
, p
, etc.) to investigate the internal state of our program and determine what are the reasons for any abnormal behavior.
Debug2::CODE(0x85129d8)(Debug2.eyp:19):
19: print "\tds -> D\n";
DB<3> x $_[0]{GRAMMAR}
0 ARRAY(0x8538360)
0 ARRAY(0x855aa88)
0 '_SUPERSTART'
1 '$start'
2 ARRAY(0x855ab60)
0 'p'
1 '$end'
3 0
1 ARRAY(0x855a890)
0 'p_1'
1 'p'
2 ARRAY(0x855a8fc)
0 'ds'
1 ';'
2 'ss'
3 0
2 ARRAY(0x855a800)
0 'p_2'
1 'p'
2 ARRAY(0x855a830)
0 'ss'
3 0
3 ARRAY(0x855a764)
0 'ds_3'
1 'ds'
2 ARRAY(0x855a7a0)
0 'ds'
1 ';'
2 'D'
3 0
4 ARRAY(0x85421d4)
0 'ds_4'
1 'ds'
2 ARRAY(0x855a6e0)
0 'D'
3 0
5 ARRAY(0x8538474)
0 'ss_5'
1 'ss'
2 ARRAY(0x854f9c8)
0 'S'
1 ';'
2 'ss'
3 0
6 ARRAY(0x85383b4)
0 'ss_6'
1 'ss'
2 ARRAY(0x85383f0)
0 'S'
3 0
DB<4>
Using a second c
the execution continues until reaching the end of the program:
DB<3> c
Debugged program terminated. Use q to quit or R to restart,
use o inhibit_exit to avoid stopping after program termination,
h q, h R or h o to get additional info.
DB<3>
Parse::Eyapp
Methods
A Parse::Eyapp
object holds the information about the Eyapp
input grammar: parsing tables, conflicts, semantic actions, etc.
Parse::Eyapp->new_grammar
To translate an Eyapp grammar you must use either the eyapp script or call the class constructor new_grammar
. The Parse::Eyapp
method Parse::Eyapp->new_grammar(input=>$grammar)
creates a package containing the code that implements a LALR parser for the input grammar:
my $p = Parse::Eyapp->new_grammar(
input=>$translationscheme,
classname=>'Grammar',
firstline => 6,
outputfile => 'main'
);
die $p->Warnings if $p->Warnings;
my $new_parser_for_grammar = Grammar->new();
The method returns a Parse::Eyapp
object.
You can check the object to see if there were problems during the construction of the parser for your grammar:
die $p->qtables() if $p->Warnings;
The method Warnings
returns the warnings produced during the parsing. The absence of warnings indicates the correctness of the input program.
The call to Parse::Eyapp->new_grammar
generates a class/package containing the parser for your input grammar. Such package lives in the namespace determined by the classname
argument of new_grammar
. To create a parser for the grammar you call the constructor new
of the just created class:
my $new_parser_for_grammar = Grammar->new();
The meaning of the arguments of Parse::Eyapp->new_grammar
is:
- - input
-
The string containing the input
- - classname
-
The name of the package that will held the code for the LALR parser. The package of the caller will be used as default if none is specified.
- - firstline
-
For error diagnostics. The line where the definition of the Eyapp grammar starts.
- - linenumbers
-
Include/not include
# line directives
in the generated code - - outputfile
-
If defined the generated code fill be dumped in the specified filename (with extension .pm) and the LALR information ambigueties and conflicts) in the specified filename with extension .output.
$eyapp->qtables
Returns a string containing information on warnings, ambiguities, conflicts, rules and the generated DFA tables. Is the same information in file.output
when using the command eyapp -v file.eyp
.
my $p = Parse::Eyapp->new_grammar(
input=>$eyappprogram,
classname=>'SimpleC',
outputfile => 'SimpleC.pm',
firstline=>12,
);
print $p->qtables() if $p->Warnings;
$eyapp->outputtables
It receives two arguments
$eyapp->outputtables($path, $base)
Similar to qtables
but prints the information on warnings, conflicts and rules to the specified $path/$file
.
$eyapp->Warnings
Returns the warnings resulting from compiling the grammar:
my $p = Parse::Eyapp->new_grammar(
input=>$translationscheme,
classname=>'main',
firstline => 6,
outputfile => 'main'
);
die $p->Warnings if $p->Warnings;
Returns the empty string if there were no conflicts.
$eyapp->ShowDfa
Returns a string with the information about the LALR generated DFA.
$eyapp->Summary
Returns a string with summary information about the compilation of the grammar. No arguments.
$eyapp->Conflicts
Returns a string with summary information about the conflicts that arised when compiling the grammar. No arguments.
$eyapp->DfaTable
Returns a string with the parsing tables
Methods Available in the Generated Class
The class containing the parser generated by Parse::Eyapp
inherits from Parse::Eyapp::Driver
. Therefore all the methods in Parse::Eyapp::Driver
are avaialbe in the generated class.
This section describes the methods and objects belonging to the class generated either using eyapp or Parse::Eyapp->new_grammar
. In the incoming paragraphs we will assume that Class
was the value selected for the classname
argument when Parse::Eyapp->new_grammar
was called. Objects belonging to Class
are the actual parsers for the input grammar.
Class->new
The method Class->new
returns a new LALR parser object. Here Class
stands for the name of the class containing the parser. See an example of call:
my $parser = main->new(yyprefix => 'Parse::Eyapp::Node::',
yylex => \&main::_Lexer,
yyerror => \&main::_Error,
yydebug => 0x1F,
);
The meaning of the arguments used in the example are as follows:
- - yyprefix
-
Used with
%tree
or%metatree
. When used, the type names of the nodes of the syntax tree will be build prefixing the value associated toyyprefix
to the name of the production rule. The name of the production rule is either explicitly given through a %name directive or the concatenation of the left hand side of the rule with the ordinal of the right hand side of the production. See section "Compiling with eyapp and treereg" in Parse::Eyapp for an example. - - yylex
-
Reference to the lexical analyzer subroutine
- - yyerror
-
Reference to the error subroutine. The error subroutine receives as first argument the reference to the
Class
parser object. This way it can take advantage of methods likeYYCurval
and YYExpect (see below):sub _Error { my($token)=$_[0]->YYCurval; my($what)= $token ? "input: '$token'" : "end of input"; my @expected = $_[0]->YYExpect(); local $" = ', '; die "Syntax error near $what. Expected one of these tokens: @expected\n"; }
- - yydebug
-
Controls the level of debugging. Must be a number.
The package produced from the grammar has several methods.
The parser object has the following methods that work at parsing time exactly as in Parse::Yapp. These methods can be found in the module Parse::Eyapp::Driver. Assume you have in $parser
the reference to your parser object:
$parser->YYParse()
It very much works Parse::Yapp::YYParse
and as yacc/bison yyparse
. It accepts almost the same arguments as Class->new
with the exception of yyprefix
which can be used only with new
.
$parser->YYErrok
Works as yacc/bison yyerrok
. Modifies the error status so that subsequent error messages will be emitted.
$parser->YYError
Works as yacc/bison YYERROR
. Pretends that a syntax error has been detected.
$parser->YYNberr
The current number of errors
$parser->YYAccept
Works as yacc/bison YYACCEPT
. The parser finishes returning the current semantic value to indicate success.
$parser->YYAbort
Works as yacc/bison YYABORT
. The parser finishes returning undef
to indicate failure.
$parser->YYBuildingTree
Influences the semantic of list operators. If true the action associated with X+
will be to build a Parse::Eyapp::Node
node with all the attributes of the elements in the list as children. This is the appropriate semantic when working under the %tree
directive. If set to false the semantic action will return an anonymous list with the attributes associated with the X
in the plus list. Same thing with the operators *
and ?
.
$parser->YYRecovering
Works as yacc/bison YYRECOVERING
. Returns TRUE
if the parser is recovering from a syntax error.
$parser->YYCurtok
Gives the current token
$parser->YYCurval
Gives the attribute associated with the current token
$parser->YYExpect
Returns the list of tokens the parser expected when the failure occurred
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
sed -ne '26,33p' Postfix.eyp
sub _Error {
my($token)=$_[0]->YYCurval;
my($what)= $token ? "input: '$token'" : "end of input";
my @expected = $_[0]->YYExpect();
local $" = ', ';
die "Syntax error near $what. Expected one of these tokens: @expected\n";
}
$parser->YYLexer
Returns a reference to the lexical analyzer
$parser->YYLhs
Returns the identifier of the left hand side of the current production (the one that is being used for reduction/antiderivation. An example of use can be found in examples/Lhs1.yp
:
%defaultaction { print $_[0]->YYLhs,"\n" }
$parser->YYRuleindex
Returns the index of the production rule, counting the super rule as rule 0. To know the numbers have a look at the .output
file. To get a .output
file use the option -v
of eyapp
or the outputfile
parameter when using method new_grammar
(see the documentation for eyapp).
$parser->YYRightside
Returns an array of strings describing the right hand side of the rule
$parser->YYIsterm
Returns TRUE if the symbol given as argument is a terminal. Example:
DB<0> x $self->YYIsterm('exp')
0 ''
DB<1> x $self->YYIsterm('*')
0 1
An example of combined use of YYRightside
, YYRuleindex
, YYLhs
and YYIsterm
can be found examples/Rule3.yp
:
nereida:~/src/perl/YappWithDefaultAction/examples> sed -n -e '4,22p' Rule3.yp | cat -n
1 sub build_node {
2 my $self = shift;
3 my @children = @_;
4 my @right = $self->YYRightside();
5 my $var = $self->YYLhs;
6 my $rule = $self->YYRuleindex();
7
8 for(my $i = 0; $i < @right; $i++) {
9 $_ = $right[$i];
10 if ($self->YYIsterm($_)) {
11 $children[$i] = bless { token => $_, attr => $children[$i] },
12 __PACKAGE__.'::TERMINAL';
13 }
14 }
15 bless {
16 children => \@children,
17 info => "$var -> @right"
18 }, __PACKAGE__."::${var}_$rule"
19 }
when executed an output similar to this is produced:
nereida:~/src/perl/YappWithDefaultAction/examples> userule3.pl
2*3
$VAR1 = bless( {
'info' => 'exp -> exp * exp',
'children' => [
bless( {
'info' => 'exp -> NUM',
'children' => [ bless( { 'attr' => '2', 'token' => 'NUM' }, 'Rule3::TERMINAL' ) ]
}, 'Rule3::exp_6' ),
bless( { 'attr' => '*', 'token' => '*' }, 'Rule3::TERMINAL' ),
bless( {
'info' => 'exp -> NUM',
'children' => [ bless( { 'attr' => '3', 'token' => 'NUM' }, 'Rule3::TERMINAL' )
]
}, 'Rule3::exp_6' )
]
}, 'Rule3::exp_11' );
$parser->YYIssemantic
Returns TRUE if the terminal is semantic. Semantics token can be declared using the directive %semantic token
. The opposite of a Semantic token is a Syntactic token. Syntactic tokens can be declared using the directive %syntactic token
.
When using the %tree
directive all the nodes corresponding to syntactic tokens are pruned from the tree. Under this directive tokens in the text delimited by simple quotes (like '+'
) are, by default, considered syntactic tokens.
When using the %metatree
directive all the tokens are considered, by default, semantic tokens. Thus, no nodes will be - by default- pruned when construction the code augmented tree. The exception are string tokens used as separators in the definition of lists, like in S <* ';'>
. If you want the separating string token to appear include an explicit semantic declaration for it (example %semantic token ';'
).
$parser->YYName
Returns the name of the current rule (The production whose reduction gave place to the execution of the current semantic action).
DB<12> x $self->YYName
0 'exp_11'
$parser->YYPrefix
Return and/or sets the yyprefix
attribute. This a string that will be concatenated as a prefix to any Parse::Eyapp::Node
nodes in the syntax tree.
$parser->YYBypass
Returns TRUE if running under the %tree bypass
clause
$parser->YYBypassrule
Returns TRUE if the production being used for reduction was marked to be bypassed.
$parser->YYFirstline
First line of the input string describing the grammar
Parse::Eyapp::Driver::BeANode
Is not a method. Receives as input a Class
name. Introduces Parse::Eyapp::Node
as an ancestor class of Class
. To work correctly, objects belonging to Class
must be hashes with a children
key whose value must be a reference to the array of children. The children must be also Parse::Eyapp::Node
nodes. Actually you can circumvent this call by directly introducing Parse::Eyapp::Node
in the ancestors of Class
:
push @{$class."::ISA"}, "Parse::Eyapp::Node"
$parser->YYBuildAST
Sometimes the best time to decorate a node with some attributes is just after being built. In such cases the programmer can take manual control building the node with YYBuildAST
to inmediately proceed to decorate it.
The following example from examples/Types.eyp
illustrates the idea:
Variable:
%name VARARRAY
$ID ('[' binary ']') <%name INDEXSPEC +>
{
my $self = shift;
my $node = $self->YYBuildAST(@_);
$node->{line} = $ID->[1];
return $node;
}
Actually, the %tree
directive is semantically equivalent to:
%default action { goto &Parse::Eyapp::Driver::YYBuildAST }
$parser->YYBuildTS
Similar to $parser->YYBuildAST
but builds nodes for translation schemes.
Parse::Eyapp::Parse objects
The parser for the Eyapp
language was written and generated using Parse::Eyapp
and the eyapp
compiler (actually the first version was bootstrapped using the yapp compiler). The Eyapp program parsing the Eyapp
language is in the file Parse/Eyapp/Parse.yp
in the Parse::Eyapp
distribution. Therefore Parse::Eyapp::Parse
objects have all the methods in Parse::Eyapp::driver
.
A Parse::Eyapp::Parse
is nothing but a particular kind of Parse::Eyapp
parser: the one that parses Eyapp
grammars.
Translation Schemes and the %metatree
directive
A translation scheme scheme is a context free grammar where the right hand sides of the productions have been augmented with semantic actions (i.e. with chunks of Perl code):
A -> alpha { action(@_) } beta
The analyzer generated by Parse::Eyapp
executes action()
after all the semantic actions asssociated with alpha
have been executed and before the execution of any of the semantic actions associated with beta
.
In a translation scheme the embedded actions modify the attributes associated with the symbols of the grammar.
A -> alpha { action(@_) } beta
each symbol on the right hand side of a production rule has an associated scalar attribute. In ordinary eyapp
programs the attributes of the symbol to the left of action
are passed as arguments to action
(in the example, those of alpha
). These arguments are preceded by a reference to the syntax analyzer object. There is no way inside an ordinary eyapp
program for an intermediate action
to access the attributes of the symbols on its right, i.e. those associated with the symbols of beta
. This restriction is lifted if you use the %metatree
directive.
Eyapp allows through the %metatree
directive the creation of Translation Schemes where the actions have access to almost any node of the syntax tree.
When using the %metatree
directive semantic actions aren't executed. Instead they are inserted as nodes of the syntax tree. The main difference with ordinary nodes being that the attribute of such a CODE
node is a reference to the anonymous subroutine representing the semantic action. The tree is later traversed in depth-first order using the $t->translation_scheme
method: each time a CODE
node is visited the action is executed.
The following example parses a tiny subset of a typical typed language and decorates the syntax tree with a new attribute t
holding the type of each declared variable:
use strict; # File examples/trans_scheme_simple_decls4.pl
use Data::Dumper;
use Parse::Eyapp;
our %s; # symbol table
my $ts = q{
%token FLOAT INTEGER NAME
%{
our %s;
%}
%metatree
%%
Dl: D <* ';'>
;
D : $T { $L->{t} = $T->{t} } $L
;
T : FLOAT { $lhs->{t} = "FLOAT" }
| INTEGER { $lhs->{t} = "INTEGER" }
;
L : $NAME
{ $NAME->{t} = $lhs->{t}; $s{$NAME->{attr}} = $NAME }
| $NAME { $NAME->{t} = $lhs->{t}; $L->{t} = $lhs->{t} } ',' $L
{ $s{$NAME->{attr}} = $NAME }
;
%%
}; # end $ts
sub Error { die "Error sintáctico\n"; }
{ # Closure of $input, %reserved_words and $validchars
my $input = "";
my %reserved_words = ();
my $validchars = "";
sub parametrize__scanner {
$input = shift;
%reserved_words = %{shift()};
$validchars = shift;
}
sub scanner {
$input =~ m{\G\s+}gc; # skip whites
if ($input =~ m{\G([a-z_A_Z]\w*)\b}gc) {
my $w = uc($1); # upper case the word
return ($w, $w) if exists $reserved_words{$w};
return ('NAME', $1); # not a reserved word
}
return ($1, $1) if ($input =~ m/\G([$validchars])/gc);
die "Not valid token: $1\n" if ($input =~ m/\G(\S)/gc);
return ('', undef); # end of file
}
} # end closure
Parse::Eyapp->new_grammar(input=>$ts,classname=>'main',outputfile=>'Types.pm');
my $parser = main->new(yylex => \&scanner, yyerror => \&Error);
parametrize__scanner(
"float x,y;\ninteger a,b\n",
{ INTEGER => 'INTEGER', FLOAT => 'FLOAT'},
",;"
);
my $t = $parser->YYParse() or die "Syntax Error analyzing input";
$t->translation_scheme;
$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Deepcopy = 1;
$Data::Dumper::Deparse = 1;
print Dumper($t);
print Dumper(\%s);
Inside a Translation Scheme the lexical variable $lhs
refers to the attribute of the father.
Execution Stages of a Translation Scheme
The execution of a Translation Scheme can be divided in the following stages:
- 1. During the first stage the grammar is analyzed and the parser is built:
-
Parse::Eyapp->new_grammar(input=>$ts,classname=>'main',outputfile=>'Types.pm');
This stage is called Class Construction Time
- 2. A parser conforming to the generated grammar is built
-
my $parser = main->new(yylex => \&scanner, yyerror => \&Error);
This stage is called Parser Construction Time
- 3. The next phase is Tree construction time. The input is set and the tree is built:
-
parametrize__scanner( "float x,y;\ninteger a,b\n", { INTEGER => 'INTEGER', FLOAT => 'FLOAT'}, ",;" ); my $t = $parser->YYParse() or die "Syntax Error analyzing input";
- 4. The last stage is Execution Time. The tree is traversed in depth first order and the
CODE
nodes are executed. -
$t->translation_scheme;
This combination of bottom-up parsing with depth first traversing leads to a semantic behavior similar to recursive top-down parsers but with two advantages:
The grammar can be left-recursive
At the time of executing the action the syntax tree is already built, therefore we can refer to nodes on the right side of the action like in:
D : $T { $L->{t} = $T->{t} } $L
The %begin
directive
The %begin { code }
directive can be used when building a translation scheme, i.e. when under the control of the %metatree
directive. It indicates that such { code }
will be executed at tree construction time. Therefore the code receives as arguments the references to the nodes of the branch than is being built. Usually begin code assist in the construction of the tree. Line 39 of the following code shows an example. The action { $exp }
simplifies the syntax tree bypassing the parenthesis node. The example also illustrates the combined use of default actions and translation schemes.
pl@nereida:~/LEyapp/examples$ cat -n trans_scheme_default_action.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Data::Dumper;
4 use Parse::Eyapp;
5 use IO::Interactive qw(is_interactive);
6
7 my $translationscheme = q{
8 %{
9 # head code is available at tree construction time
10 use Data::Dumper;
11 our %sym; # symbol table
12 %}
13
14 %defaultaction {
15 $lhs->{n} = eval " $left->{n} $_[2]->{attr} $right->{n} "
16 }
17
18 %metatree
19
20 %right '='
21 %left '-' '+'
22 %left '*' '/'
23
24 %%
25 line: %name EXP
26 exp <+ ';'> /* Expressions separated by semicolons */
27 { $lhs->{n} = $_[1]->Last_child->{n} }
28 ;
29
30 exp:
31 %name PLUS
32 exp.left '+' exp.right
33 | %name MINUS
34 exp.left '-' exp.right
35 | %name TIMES
36 exp.left '*' exp.right
37 | %name DIV
38 exp.left '/' exp.right
39 | %name NUM
40 $NUM
41 { $lhs->{n} = $NUM->{attr} }
42 | '(' $exp ')' %begin { $exp }
43 | %name VAR
44 $VAR
45 { $lhs->{n} = $sym{$VAR->{attr}}->{n} }
46 | %name ASSIGN
47 $VAR '=' $exp
48 { $lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n} }
49
50 ;
51
52 %%
53 # tail code is available at tree construction time
54 sub _Error {
55 die "Syntax error.\n";
56 }
57
58 sub _Lexer {
59 my($parser)=shift;
60
61 for ($parser->YYData->{INPUT}) {
62 s/^\s+//;
63 $_ or return('',undef);
64 s/^([0-9]+(?:\.[0-9]+)?)// and return('NUM',$1);
65 s/^([A-Za-z][A-Za-z0-9_]*)// and return('VAR',$1);
66 s/^(.)// and return($1,$1);
67 }
68 return('',undef);
69 }
70
71 sub Run {
72 my($self)=shift;
73 return $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error );
74 }
75 }; # end translation scheme
76
77 sub TERMINAL::info { $_[0]->attr }
78
79 my $p = Parse::Eyapp->new_grammar(
80 input=>$translationscheme,
81 classname=>'main',
82 firstline => 6,
83 outputfile => 'main.pm');
84 die $p->qtables() if $p->Warnings;
85 my $parser = main->new();
86 print "Write a sequence of arithmetic expressions: " if is_interactive();
87 $parser->YYData->{INPUT} = <>;
88 my $t = $parser->Run() or die "Syntax Error analyzing input";
89 $t->translation_scheme;
90
91 $Parse::Eyapp::Node::INDENT = 2;
92 my $treestring = $t->str;
93
94 $Data::Dumper::Indent = 1;
95 $Data::Dumper::Terse = 1;
96 $Data::Dumper::Deepcopy = 1;
97 our %sym;
98 my $symboltable = Dumper(\%sym);
99
100 print <<"EOR";
101 ***********Tree*************
102 $treestring
103 ******Symbol table**********
104 $symboltable
105 ************Result**********
106 $t->{n}
107
108 EOR
When executed with input a=2*3;b=a*a
the program produces an output similar to this:
pl@nereida:~/LEyapp/examples$ trans_scheme_default_action.pl
Write a sequence of arithmetic expressions: a=2*3;b=a*a
***********Tree*************
EXP(
_PLUS_LIST(
ASSIGN(
TERMINAL[a],
TERMINAL[=],
TIMES(
NUM(TERMINAL[2], CODE),
TERMINAL[*],
NUM(TERMINAL[3], CODE),
CODE
) # TIMES,
CODE
) # ASSIGN,
ASSIGN(
TERMINAL[b],
TERMINAL[=],
TIMES(
VAR(TERMINAL[a], CODE),
TERMINAL[*],
VAR(TERMINAL[a], CODE),
CODE
) # TIMES,
CODE
) # ASSIGN
) # _PLUS_LIST,
CODE
) # EXP
******Symbol table**********
{
'a' => {
'n' => 6
},
'b' => {
'n' => 36
}
}
************Result**********
36
The Treeregexp Language
A Treeregexp program is made of the repetition of three kind of primitives: The treeregexp transformations, auxiliar Perl code and Transformation Families.
treeregexplist: treeregexp*
treeregexp:
IDENT ':' treereg ('=>' CODE)? # Treeregexp
| CODE # Auxiliar code
| IDENT '=' IDENT + ';' # Transformation families
Treeregexp themselves follow the rule:
IDENT ':' treereg ('=>' CODE)?
Several instances of this rule can be seen in the example in the "SYNOPSIS" section. The identifier IDENT
gives the name to the rule. At the time of this writing (2006) there are the following kinds of treeregexes:
treereg:
/* tree patterns with children */
IDENT '(' childlist ')' ('and' CODE)?
| REGEXP (':' IDENT)? '(' childlist ')' ('and' CODE)?
| SCALAR '(' childlist ')' ('and' CODE)?
| '.' '(' childlist ')' ('and' CODE)?
/* leaf tree patterns */
| IDENT ('and' CODE)?
| REGEXP (':' IDENT)? ('and' CODE)?
| '.' ('and' CODE)?
| SCALAR ('and' CODE)?
| ARRAY
| '*'
Treeregexp rules
When seen a rule like
zero_times: TIMES(NUM($x), ., .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
The Treeregexp translator creates a Parse::Eyapp:YATW
object that can be later referenced in the user code by the package variable $zero_times
.
The treeregexp
The first part of the rule TIMES(NUM($x), ., .)
indicates that for a matching to succeed the node being visited must be of type
TIMES
, have a left child of type
NUM
and two more children.
If the first part succeeded then the following part takes the control to see if the semantic conditions are satisfied.
Semantic condition
The second part is optional and must be prefixed by the reserved word and
followed by a Perl code manifesting the semantic conditions that must be hold by the node to succeed. Thus, in the example:
zero_times: TIMES(NUM($x), ., .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
the semantic condition $x->{attr} == 0
states that the value of the number stored in the TERMINAL
node referenced by $x
must be zero.
Referencing the matching nodes
The node being visited can be referenced/modified inside the semantic actions using $_[0]
.
The Treeregexp translator automatically creates a set of lexical variables for us. The scope of these variables is limited to the semantic condition and the transformation code.
Thus, in the example
zero_times: TIMES(NUM($x), ., .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
the node being visited $_[0]
can be also referenced using the lexical variable $TIMES
which is created by he Treeregexp compiler. In the same way a reference to the left child NUM
will be stored in the lexical variable $NUM
and a reference to the child of $NUM
will be stored in $x
. The semantic condition states that the attribute of the node associated with $x
must be zero.
When the same type of node appears several times inside the treeregexp part the associated lexical variable is declared by the Treeregexp compiler as an array. This is the case in the constantfold
transformation in the "SYNOPSIS" example, where there are two nodes of type NUM
:
constantfold: /TIMES|PLUS|DIV|MINUS/(NUM($x), ., NUM($y))
=> {
$x->{attr} = eval "$x->{attr} $W->{attr} $y->{attr}";
$_[0] = $NUM[0];
}
Thus variable $NUM[0]
references the node that matches the first NUM
term in the formula and $NUM[1]
the one that matches the second.
Transformation code
The third part of the rule is also optional and comes prefixed by the big arrow =>
. The Perl code in this section usually transforms the matching tree. To achieve the modification of the tree, the Treeregexp programmer must use $_[0]
and not the lexical variables provided by the translator. Remember that in Perl $_[0]
is an alias of the actual parameter. The constantfold
example above will not work if we rewrite the code $_[0] = $NUM[0]
as
{ $TIMES = $NUM }
Regexp Treeregexes
The previous constantfold
example used a classic Perl linear regexp to explicit that the root node of the matching subtree must match the Perl regexp. The general syntax for REGEXP
treeregexes patterns is:
treereg: REGEXP (':' IDENT)? '(' childlist ')' ('and' CODE)?
The REGEXP
must be specified between slashes (other delimiters as {}
are not accepted). It is legal to specify options after the second slash (like e
, i
, etc.).
The operation of string oriented regexps is slightly modified when they are used inside a treeregexp: by default the option x
will be assumed. The treeregexp compiler will automatically insert it. Use the new option X
(upper case X) if you want to supress such behavior. There is no need also to insert \b
word anchors to delimit identifiers: all the identifiers in a regexp treeregexp are automatically surrounded by \b
. Use the option B
(upper case B) to supress this behavior.
The optional identifier after the REGEXP
indicates the name of the lexical variable that will be held a reference to the node whose type matches REGEXP
. Variable $W
(or @W
if there are more than one REGEXP and or dot treeregexes) will be used instead if no identifier is specified.
Scalar Treeregexes
A scalar treeregxp is defined writing a Perl scalar inside the treeregexp, like $x
in NUM($x)
. A scalar treeregxp immediately matches any node that exists and stores a reference to such node inside the Perl lexical scalar variable. The scope of the variable is limited to the semantic parts of the Treeregexp. Is illegal to use $W
or $W_#num
as variable names for scalar treeregexes.
Dot Treeregexes
A dot matches any node. It can be seen as an abbreviation for scalar treeregexes. The reference to the matching node is stored in the lexical variable $W
. The variable @W
will be used instead if there are more than one REGEXP and or dot treeregexes
Array Treeregexp Expressions
The Treeregexp language permits expressions like:
A(@a,B($x),@c)
After the matching variable @A
contains the shortest prefix of $A->children
that does not match B($x)
. The variable @c
contains the remaining sufix of $A->children
.
The following example uses array treereg expressions to move the assignment b = 5
out of the while
loop:
.. ......................................................................
93 my $program = "a =1000; c = 1; while (a) { c = c*a; b = 5; a = a-1 }\n";
94 $parser->YYData->{INPUT} = $program;
95 my $t = $parser->Run;
96 my @output = split /\n/, $t->str;
97
98 my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
99 moveinvariant: BLOCK(
100 @prests,
101 WHILE(VAR($b), BLOCK(@a, ASSIGN($x, NUM($e)), @c)),
102 @possts
103 )
104 => {
105 my $assign = $ASSIGN;
106 $BLOCK[1]->delete($ASSIGN);
107 $BLOCK[0]->insert_before($WHILE, $assign);
108 }
109 },
110 FIRSTLINE => 99,
111 );
112 $p->generate();
Star Treeregexp
Deprecated. Don't use it. Is still there but not to endure.
Transformation Families
Transformations created by Parse::Eyapp::Treeregexp
can be grouped in families. That is the function of the rule:
treeregexp: IDENT '=' IDENT + ';'
The next example (file examples/TSwithtreetransformations3.eyp
) defines the family
algebraic_transformations = constantfold zero_times times_zero comasocfold;
Follows the code:
my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
uminus: UMINUS(., NUM($x), .) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($z), ., NUM($y))
=> {
$z->{attr} = eval "$z->{attr} $W->{attr} $y->{attr}";
$_[0] = $NUM[0];
}
commutative_add: PLUS($x, ., $y, .)
=> { my $t = $x; $_[0]->child(0, $y); $_[0]->child(2, $t)}
comasocfold: TIMES(DIV(NUM($x), ., $b), ., NUM($y))
=> {
$x->{attr} = $x->{attr} * $y->{attr};
$_[0] = $DIV;
}
zero_times: TIMES(NUM($x), ., .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
times_zero: TIMES(., ., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
algebraic_transformations = constantfold zero_times times_zero comasocfold;
},
);
$transform->generate();
our ($uminus);
$uminus->s($tree);
The transformations belonging to a family are usually applied toghether:
$tree->s(@algebraic_transformations);
Code Support
In between Treeregexp rules and family assignments the programmer can insert Perl code between curly brackets. That code usually gives support to the semantic conditions and transformations inside the rules. See for example test 14 in the t/
directory of the Parse::Eyapp distribution.
{
sub not_semantic {
my $self = shift;
return 1 if $self->{token} eq $self->{attr};
return 0;
}
}
delete_tokens : TERMINAL and { not_semantic($TERMINAL) }
=> { $delete_tokens->delete() }
Parse::Eyapp::Node Methods
The Parse::Eyapp::Node
objects represent the nodes of the syntax tree. All the node classes build by %tree
and %metatree
directives inherit from Parse::Eyapp::Node
and consequently have acces to the methods provided in such module.
Parse::Eyapp::Node->new
Nodes are usually created using the %tree
or %metatree
Parse::Eyapp
directives. The Parse::Eyapp::Node
constructor new
offers an alternative way to create forests.
This class method can be used to build multiple nodes on a row. It receives a string describing the tree and optionally a reference to a subroutine. Such subroutine (called the attribute handler) is in charge to initialize the attributes of the just created nodes. The attribute handler is called with the array of references to the nodes as they appear in the string from left to right.
Parse::Eyapp::Node->new
returns an array of pointers to the nodes created as they appear in the input string from left to right. In scalar context returns a pointer to the first of these trees.
The following example (see file examples/28foldwithnewwithvars.pl
) of a treeregexp transformation creates a new NUM(TERMINAL)
node using Parse::Eyapp::Node->new
:
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
{
my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
}
constantfold: /TIMES|PLUS|MINUS|DIV/(NUM($x), NUM($y))
=> {
my $op = $Op{ref($_[0])};
my $res = Parse::Eyapp::Node->new(
q{NUM(TERMINAL)},
sub {
my ($NUM, $TERMINAL) = @_;
$TERMINAL->{attr} = eval "$x->{attr} $op $y->{attr}";
$TERMINAL->{token} = 'NUM';
},
);
$_[0] = $res;
}
},
);
The string can describe more than one tree like in:
my @t = Parse::Eyapp::Node->new(
'A(C,D) E(F)', sub { my $i = 0; $_->{n} = $i++ for @_ });
The following trees will be built:
bless( { 'n' => 0,
'children' => [
bless( { 'n' => 1, 'children' => [] }, 'C' ),
bless( { 'n' => 2, 'children' => [] }, 'D' )
]
}, 'A' );
bless( { 'n' => 3,
'children' => [
bless( { 'n' => 4, 'children' => [] }, 'F' )
]
}, 'E' );
and @t
will contain 5 references to the corresponding subtrees A(C,D), C, D, E(F) and F.
Directed Acyclic Graphs with Parse::Eyapp::Node->hnew
Parse::Eyapp
provides the method Parse::Eyapp::Node->hnew
to build Directed Acyclic Graphs (DAGs) instead of trees. They are built using hashed consing, i.e. memoizing the creation of nodes. It works very much like Parse::Eyapp::Node->new
but if one of the implied trees was previously built, hnew
returns a reference to the existing one. See the following debugger session where several DAGs describing type expressions are built:
DB<2> x $a = Parse::Eyapp::Node->hnew('F(X_3(A_3(A_5(INT)), CHAR, A_5(INT)),CHAR)')
0 F=HASH(0x85f6a20)
'children' => ARRAY(0x85e92e4)
|- 0 X_3=HASH(0x83f55fc)
| 'children' => ARRAY(0x83f5608)
| |- 0 A_3=HASH(0x85a0488)
| | 'children' => ARRAY(0x859fad4)
| | 0 A_5=HASH(0x85e5d3c)
| | 'children' => ARRAY(0x83f4120)
| | 0 INT=HASH(0x83f5200)
| | 'children' => ARRAY(0x852ccb4)
| | empty array
| |- 1 CHAR=HASH(0x8513564)
| | 'children' => ARRAY(0x852cad4)
| | empty array
| `- 2 A_5=HASH(0x85e5d3c)
| -> REUSED_ADDRESS
`- 1 CHAR=HASH(0x8513564)
-> REUSED_ADDRESS
DB<3> x $a->str
0 'F(X_3(A_3(A_5(INT)),CHAR,A_5(INT)),CHAR)'
The second occurrence of A_5(INT)
is labelled REUSED_ADDRESS
. The same occurs with the second instance of CHAR
. Parse::Eyapp::Node->hnew
can be more convenient than new
when dealing with optimizations like common subexpressions or during type checking. See file examples/Types.eyp
for a more comprehensive example.
Expanding Directed Acyclic Graphs with Parse::Eyapp::Node->hexpand
Calls to Parse::Eyapp::Node->hexpand
have the syntax
$z = Parse::Eyapp::Node->hexpand('CLASS', @children, \&handler)
Creates a dag of type 'CLASS'
with children @children
in a way compatible with hew
. The last optional argument can be a reference to a sub. Such sub will be called after the creation of the DAG with a reference to the root of the DAG as single argument. The following session with the debugger illustrates the use of Parse::Eyapp::Node->hexpand
. First we create a DAG using hnew
:
pl@nereida:~/Lbook/code/Simple-Types/script$ perl -MParse::Eyapp::Node -wde 0
main::(-e:1): 0
DB<1> $x = Parse::Eyapp::Node->hnew('A(C(B),C(B))')
DB<2> x $x
0 A=HASH(0x850c850)
'children' => ARRAY(0x850ca30)
0 C=HASH(0x850c928)
'children' => ARRAY(0x850c9e8)
0 B=HASH(0x850c9a0)
'children' => ARRAY(0x83268c8)
empty array
1 C=HASH(0x850c928)
-> REUSED_ADDRESS
Now we can expand the DAG using hexpand
:
DB<3> $y = Parse::Eyapp::Node->hexpand('A', $x->child(0))
DB<4> x $y
0 A=HASH(0x8592558)
'children' => ARRAY(0x832613c)
0 C=HASH(0x850c928)
'children' => ARRAY(0x850c9e8)
0 B=HASH(0x850c9a0)
'children' => ARRAY(0x83268c8)
empty array
DB<5> $z = Parse::Eyapp::Node->hexpand('A', $x->children)
DB<6> x $z
0 A=HASH(0x850c850)
'children' => ARRAY(0x850ca30)
0 C=HASH(0x850c928)
'children' => ARRAY(0x850c9e8)
0 B=HASH(0x850c9a0)
'children' => ARRAY(0x83268c8)
empty array
1 C=HASH(0x850c928)
-> REUSED_ADDRESS
Notice that the address for $z
is the same than the address for $x
. The following command illustrates the use with a handler:
DB<7> $z = Parse::Eyapp::Node->hexpand('A', $x->children, sub { $_[0]->{t} = "X" })
DB<8> x $z
0 A=HASH(0x850c850)
'children' => ARRAY(0x850ca30)
0 C=HASH(0x850c928)
'children' => ARRAY(0x850c9e8)
0 B=HASH(0x850c9a0)
'children' => ARRAY(0x83268c8)
empty array
1 C=HASH(0x850c928)
-> REUSED_ADDRESS
't' => 'X'
$node->type
Returns the type of the node. It can be called as a sub when $node
is not a Parse::Eyapp::Node
like this:
Parse::Eyapp::Node::type($scalar)
This is the case when visiting CODE
nodes.
The following session with the debugger illustrates how it works:
> perl -MParse::Eyapp::Node -de0
DB<1> @t = Parse::Eyapp::Node->new("A(B,C)") # Creates a tree
DB<2> x map { $_->type } @t # Get the types of the three nodes
0 'A'
1 'B'
2 'C'
DB<3> x Parse::Eyapp::Node::type(sub {})
0 'CODE'
DB<4> x Parse::Eyapp::Node::type("hola")
0 'Parse::Eyapp::Node::STRING'
DB<5> x Parse::Eyapp::Node::type({ a=> 1})
0 'HASH'
DB<6> x Parse::Eyapp::Node::type([ a, 1 ])
0 'ARRAY'
As it is shown in the example it can be called as a subroutine with a (CODE/HASH/ARRAY) reference or an ordinary scalar.
The words HASH, CODE, ARRAY and STRING are reserved for ordinary Perl references. Avoid naming a node with one of those words.
$node->child
Setter-getter to modify a specific child of a node. It is called like:
$node->child($i)
Returns the child with index $i. Returns undef
if the child does not exists. It has two obligatory parameters: the node (since it is a method) and the index of the child. Sets the new value if called
$node->child($i, $tree)
The method will croak if the obligatory parameters are not provided. Follows an example of use inside a Treereg program (see file examples/TSwithtreetransformations2.eyp
) that swaps the children of a PLUS
node:
my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
commutative_add: PLUS($x, ., $y, .) # 1st . = '+' 2nd . = CODE
=> { my $t = $x; $_[0]->child(0, $y); $_[0]->child(2, $t)}
}
Child Access Through %tree alias
Remember that when the Eyapp
program runs under the %tree alias
directive The dot and dollar notations can be used to generate named getter-setters to access the children:
%tree bypass alias
....
%%
exp: %name PLUS
exp.left '+' exp.right
....
%%
.... # and later
print $exp->left->str;
Here methods with names left
and right
will be created to access the corresponding children associated with the two instances of exp
in the right hand side of the production rule.
$node->children
Returns the array of children of the node. When the tree is a translation scheme the CODE references are also included. See examples/TSPostfix3.eyp
for an example of use inside a Translation Scheme:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$\
sed -ne '31,34p' TSPostfix3.eyp
line: %name PROG
exp <%name EXP + ';'>
{ @{$lhs->{t}} = map { $_->{t}} ($_[1]->children()); }
The tree in a Translation Scheme contains the references to the CODE
implementing the semantic actions. For example, the syntax tree built by the parser for the input a=-b*3
in TSPostfix3.eyp
is:
PROG(EXP(
ASSIGN(
TERMINAL[a],
TERMINAL[=],
TIMES(
NEG(TERMINAL[-], VAR(TERMINAL[b], CODE), CODE),
TERMINAL[*],
NUM(TERMINAL[3], CODE),
CODE
) # TIMES,
CODE
) # ASSIGN
) # EXP,
CODE
) # PROG
$node->children
can also be used as a setter.
$node->Children
Returns the array of children of the node. When dealing with a translation scheme, the $node->Children method (first in uppercase) returns the non CODE children of the node.
$node->last_child
Return the last child of the node. When dealing with translation schemes, the last can be a CODE
node.
$node->Last_child
The $node->Last_child
method returns the last non CODE child of the node. See an example:
line: %name EXP
exp <+ ';'> /* Expressions separated by semicolons */
{ $lhs->{n} = $_[1]->Last_child->{n} }
;
$node->descendant
The descendant
method returns the descendant of a node given its coordinates. The coordinates of a node $s
relative to a tree $t
to which it belongs is a string of numbers separated by dots like ".1.3.2"
which denotes the child path from $t
to $s
, i.e. $s == $t->child(1)->child(3)->child(2)
.
See a session with the debugger:
DB<7> x $t->child(0)->child(0)->child(1)->child(0)->child(2)->child(1)->str
0 '
BLOCK[8:4:test]^{0}(
CONTINUE[10,10]
)
DB<8> x $t->descendant('.0.0.1.0.2.1')->str
0 '
BLOCK[8:4:test]^{0}(
CONTINUE[10,10]
$node->str
The str
method returns a string representation of the tree. The str method traverses the syntax tree dumping the type of the node being visited in a string. To be specific the value returned by the function referenced by $CLASS_HANDLER
will be dumped. The default value fo such function is to return the type of the node. If the node being visited has a method info
it will be executed and its result inserted between $DELIMITER
s into the string. Thus, in the "SYNOPSIS" example, by adding the info
method to the class TERMINAL
:
sub TERMINAL::info {
$_[0]{attr}
}
we achieve the insertion of attributes in the string being built by str
.
The existence of some methods (like footnote
) and the values of some package variables influence the behavior of str
. Among the most important are:
@PREFIXES = qw(Parse::Eyapp::Node::); # Prefixes to supress
$INDENT = 0; # -1 compact, no info, no footnotes
# 0 = compact, 1 = indent, 2 = indent and include Types in closing parenthesis
$STRSEP = ','; # Separator between nodes, by default a comma
$DELIMITER = '['; # The string returned by C<info> will be enclosed
$FOOTNOTE_HEADER = "\n---------------------------\n";
$FOOTNOTE_SEP = ")\n";
$FOOTNOTE_LEFT = '^{'; # Left delimiter for a footnote number
$FOOTNOTE_RIGHT = '}'; # Right delimiter for a footnote number
$LINESEP = 4; # When indent=2 the enclosing parenthesis will be
# commented if more than $LINESEP apart
$CLASS_HANDLER = sub { type($_[0]) }; # What to print to identify the node
Footnotes and attribute info will not be inserted when $INDENT
is -1. A compact representation will be obtained. Such representation can be feed to new
or hnew
to obtain a copy of the tree. See the following session with the debugger:
pl@nereida:~/LEyapp$ perl -MParse::Eyapp::Node -wde 0
main::(-e:1): 0
DB<1> $x = Parse::Eyapp::Node->new('A(B(C,D),D)', sub { $_->{order} = $i++ for @_; })
DB<2> *A::info = *B::info = *C::info = *D::info = sub { shift()->{order} }
DB<3> p $x->str
A[0](B[1](C[2],D[3]),D[4])
DB<4> $Parse::Eyapp::Node::INDENT=-1
DB<5> p $x->str
A(B(C,D),D)
DB<6> x Parse::Eyapp::Node->hnew($x->str)
0 A=HASH(0x8574704)
'children' => ARRAY(0x85745d8)
0 B=HASH(0x857468c)
'children' => ARRAY(0x8574608)
0 C=HASH(0x85745b4)
'children' => ARRAY(0x8509670)
empty array
1 D=HASH(0x8574638)
'children' => ARRAY(0x857450c)
empty array
1 D=HASH(0x8574638)
-> REUSED_ADDRESS
1 B=HASH(0x857468c)
-> REUSED_ADDRESS
2 C=HASH(0x85745b4)
-> REUSED_ADDRESS
3 D=HASH(0x8574638)
-> REUSED_ADDRESS
4 D=HASH(0x8574638)
-> REUSED_ADDRESS
The following list defines the $DELIMITER
s you can choose for attribute representation:
'[' => ']', '{' => '}', '(' => ')', '<' => '>'
If the node being visited has a method footnote
, the string returned by the method will be concatenated at the end of the string as a footnote. The variables $FOOTNOTE_LEFT
and $FOOTNOTE_RIGHT
govern the displaying of footnote numbers.
Follows an example of output using footnotes
.
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/script> \
usetypes.pl prueba24.c
PROGRAM^{0}(FUNCTION[f]^{1}(RETURNINT(TIMES(INUM(TERMINAL[2:2]),VAR(TERMINAL[a:2])))))
---------------------------
0)
Types:
$VAR1 = {
'CHAR' => bless( {
'children' => []
}, 'CHAR' ),
'VOID' => bless( {
'children' => []
}, 'VOID' ),
'INT' => bless( {
'children' => []
}, 'INT' ),
'F(X_1(INT),INT)' => bless( {
'children' => [
bless( {
'children' => [
$VAR1->{'INT'}
]
}, 'X_1' ),
$VAR1->{'INT'}
]
}, 'F' )
};
Symbol Table:
$VAR1 = {
'f' => {
'type' => 'F(X_1(INT),INT)',
'line' => 1
}
};
---------------------------
1)
$VAR1 = {
'a' => {
'type' => 'INT',
'param' => 1,
'line' => 1
}
};
The first footnote was due to a call to PROGRAM:footnote
. The footnote
method for the PROGRAM
node was defined as:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '691,696p' Types.eyp | cat -n
1 sub PROGRAM::footnote {
2 return "Types:\n"
3 .Dumper($_[0]->{types}).
4 "Symbol Table:\n"
5 .Dumper($_[0]->{symboltable})
6 }
The second footnote was produced by the existence of a FUNCTION::footnote
method:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
sed -n -e '702,704p' Types.eyp | cat -n
1 sub FUNCTION::footnote {
2 return Dumper($_[0]->{symboltable})
3 }
The source program for the example was:
1 int f(int a) {
2 return 2*a;
3 }
$node->equal
A call $tree1->equal($tree2)
compare the two trees $tree1
and $tree2
. Two trees are considered equal if their root nodes belong to the same class, they have the same number of children and the children are (recursively) equal.
Additionally to the two trees the programmer can specify pairs attribute_key => equality_handler
:
$tree1->equal($tree2, attr1 => \&handler1, attr2 => \&handler2, ...)
In such case the definition of equality is more restrictive: Two trees are considered equal if
Their root nodes belong to the same class,
They have the same number of children
For each of the specified attributes occur that for both nodes the existence and definition of the key is the same
Assuming the key exists and is defined for both nodes, the equality handlers return true for each of its attributes and
The respective children are (recursively) equal.
An attribute handler receives as arguments the values of the attributes of the two nodes being compared and must return true if, and only if, these two attributes are considered equal. Follows an example:
pl@nereida:~/LEyapp/examples$ cat equal.pl
#!/usr/bin/perl -w
use strict;
use Parse::Eyapp::Node;
my $string1 = shift || 'ASSIGN(VAR(TERMINAL))';
my $string2 = shift || 'ASSIGN(VAR(TERMINAL))';
my $t1 = Parse::Eyapp::Node->new($string1, sub { my $i = 0; $_->{n} = $i++ for @_ });
my $t2 = Parse::Eyapp::Node->new($string2);
# Without attributes
if ($t1->equal($t2)) {
print "\nNot considering attributes: Equal\n";
}
else {
print "\nNot considering attributes: Not Equal\n";
}
# Equality with attributes
if ($t1->equal($t2, n => sub { return $_[0] == $_[1] })) {
print "\nConsidering attributes: Equal\n";
}
else {
print "\nConsidering attributes: Not Equal\n";
}
When the former program is run without arguments produces the following output:
pl@nereida:~/LEyapp/examples$ equal.pl
Not considering attributes: Equal
Considering attributes: Not Equal
Using equal
During Testing
During the development of your compiler you add new stages to the existing ones. The consequence is that the AST is decorated with new attributes. Unfortunately, this implies that tests you wrote using is_deeply
and comparisons against formerly correct abstract syntax trees are no longer valid. This is due to the fact that is_deeply
requires both tree structures to be equivalent in every detail and that our new code produces a tree with new attributes.
Instead of is_deeply
use the equal
method to check for partial equivalence between abstract syntax trees. You can follow these steps:
Dump the tree for the source inserting
Data::Dumper
statementsCarefully check that the tree is really correct
Decide which attributes will be used for comparison
Write the code for the expected value editing the output produced by
Data::Dumper
Write the handlers for the attributes you decided. Write the comparison using
equal
.
Tests using this methodology will not fail even if later code decorating the AST with new attributes is introduced. See an example:
pl@nereida:~/LEyapp/examples$ cat -n testequal.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Parse::Eyapp::Node;
4 use Data::Dumper;
5 use Data::Compare;
6
7 my $debugging = 0;
8
9 my $handler = sub {
10 print Dumper($_[0], $_[1]) if $debugging;
11 Compare($_[0], $_[1])
12 };
13
14 my $t1 = bless( {
15 'types' => {
16 'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
17 'VOID' => bless( { 'children' => [] }, 'VOID' ),
18 'INT' => bless( { 'children' => [] }, 'INT' ),
19 'F(X_0(),INT)' => bless( {
20 'children' => [
21 bless( { 'children' => [] }, 'X_0' ),
22 bless( { 'children' => [] }, 'INT' ) ]
23 }, 'F' )
24 },
25 'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
26 'lines' => 2,
27 'children' => [
28 bless( {
29 'symboltable' => {},
30 'fatherblock' => {},
31 'children' => [],
32 'depth' => 1,
33 'parameters' => [],
34 'function_name' => [ 'f', 1 ],
35 'symboltableLabel' => {},
36 'line' => 1
37 }, 'FUNCTION' )
38 ],
39 'depth' => 0,
40 'line' => 1
41 }, 'PROGRAM' );
42 $t1->{'children'}[0]{'fatherblock'} = $t1;
43
44 # Tree similar to $t1 but without some attributes (line, depth, etc.)
45 my $t2 = bless( {
46 'types' => {
47 'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
48 'VOID' => bless( { 'children' => [] }, 'VOID' ),
49 'INT' => bless( { 'children' => [] }, 'INT' ),
50 'F(X_0(),INT)' => bless( {
51 'children' => [
52 bless( { 'children' => [] }, 'X_0' ),
53 bless( { 'children' => [] }, 'INT' ) ]
54 }, 'F' )
55 },
56 'symboltable' => { 'f' => { 'type' => 'F(X_0(),INT)', 'line' => 1 } },
57 'children' => [
58 bless( {
59 'symboltable' => {},
60 'fatherblock' => {},
61 'children' => [],
62 'parameters' => [],
63 'function_name' => [ 'f', 1 ],
64 }, 'FUNCTION' )
65 ],
66 }, 'PROGRAM' );
67 $t2->{'children'}[0]{'fatherblock'} = $t2;
68
69 # Tree similar to $t1 but without some attributes (line, depth, etc.)
70 # and without the symboltable and types attributes used in the comparison
71 my $t3 = bless( {
72 'types' => {
73 'CHAR' => bless( { 'children' => [] }, 'CHAR' ),
74 'VOID' => bless( { 'children' => [] }, 'VOID' ),
75 'INT' => bless( { 'children' => [] }, 'INT' ),
76 'F(X_0(),INT)' => bless( {
77 'children' => [
78 bless( { 'children' => [] }, 'X_0' ),
79 bless( { 'children' => [] }, 'INT' ) ]
80 }, 'F' )
81 },
82 'children' => [
83 bless( {
84 'symboltable' => {},
85 'fatherblock' => {},
86 'children' => [],
87 'parameters' => [],
88 'function_name' => [ 'f', 1 ],
89 }, 'FUNCTION' )
90 ],
91 }, 'PROGRAM' );
92
93 $t3->{'children'}[0]{'fatherblock'} = $t2;
94
95 # Without attributes
96 if (Parse::Eyapp::Node::equal($t1, $t2)) {
97 print "\nNot considering attributes: Equal\n";
98 }
99 else {
100 print "\nNot considering attributes: Not Equal\n";
101 }
102
103 # Equality with attributes
104 if (Parse::Eyapp::Node::equal(
105 $t1, $t2,
106 symboltable => $handler,
107 types => $handler,
108 )
109 ) {
110 print "\nConsidering attributes: Equal\n";
111 }
112 else {
113 print "\nConsidering attributes: Not Equal\n";
114 }
115
116 # Equality with attributes
117 if (Parse::Eyapp::Node::equal(
118 $t1, $t3,
119 symboltable => $handler,
120 types => $handler,
121 )
122 ) {
123 print "\nConsidering attributes: Equal\n";
124 }
125 else {
126 print "\nConsidering attributes: Not Equal\n";
127 }
The code defining tree $t1
was obtained from an output using Data::Dumper
. The code for trees $t2
and $t3
was written using cut-and-paste from $t1
. They have the same shape than $t1
but differ in their attributes. Tree $t2
shares with $t1
the attributes symboltable
and types
used in the comparison and so equal
returns true
when compared. Since $t3
differs from $t1
in the attributes symboltable
and types
the call to equal
returns false
.
$node->delete
The $node->delete($child)
method is used to delete the specified child of $node
. The child to delete can be specified using the index or a reference. It returns the deleted child.
Throws an exception if the object can't do children
or has no children
. See also the delete method of treeregexes (Parse::Eyapp:YATW
objects) to delete the node being visited.
The following example moves out of a loop an assignment statement assuming is an invariant of the loop. To do it, it uses the delete
and insert_before
methods:
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '98,113p' moveinvariantoutofloopcomplexformula.pl
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
moveinvariant: BLOCK(
@prests,
WHILE(VAR($b), BLOCK(@a, ASSIGN($x, NUM($e)), @c)),
@possts
)
=> {
my $assign = $ASSIGN;
$BLOCK[1]->delete($ASSIGN);
$BLOCK[0]->insert_before($WHILE, $assign);
}
},
FIRSTLINE => 99,
);
$p->generate();
$moveinvariant->s($t);
The example below deletes CODE nodes from the tree build for a translation scheme:
my $transform = Parse::Eyapp::Treeregexp->new(
STRING=>q{
delete_code: CODE => { Parse::Eyapp::Node::delete($CODE) }
},
)
Observe how delete is called as a subroutine.
$node->unshift($newchild)
Inserts $newchild
at the beginning of the list of children of $node
. See also the unshift method for Parse::Eyapp:YATW
treeregexp transformation objects
$node->push($newchild)
Inserts $newchild
at the end of the list of children of $node
.
$node->insert_before($position, $new_child)
Inserts $newchild
before $position
in the list of children of $node
. Variable $position
can be an index or a reference.
The method throws an exception if $position
is an index and is not in range. Also if $node
has no children.
The method throws a warning if $position
is a reference and does not define an actual child. In such case $new_child
is not inserted.
See also the insert_before method for Parse::Eyapp:YATW
treeregexp transformation objects
$node->insert_after($position, $new_child)
Inserts $newchild
after $position
in the list of children of $node
. Variable $position
can be an index or a reference.
The method throws an exception if $position
is an index and is not in the range of $node-
children>.
The method throws a warning if $position
is a reference and does not exists in the list of children. In such case $new_child
is not inserted.
$node->translation_scheme
Traverses $node. Each time a CODE node is visited the subroutine referenced is called with arguments the node and its children. Usually the code will decorate the nodes with new attributes or will update existing ones. Obviously this method does nothing for an ordinary AST. It is used after compiling an Eyapp program that makes use of the %metatree
directive.
$node->bud
Bottom-up decorator. The tree is traversed bottom-up. The set of transformations is applied to each node in the order supplied by the user. As soon as one succeeds no more transformations are applied. For an example see the files examples/Types.eyp
and examples/Trans.trg
. The code below shows an extract of the type-checking phase of a toy-example compiler:
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '600,611p' Types.eyp
my @typecheck = (
our $inum,
our $charconstant,
our $bin,
our $arrays,
our $assign,
our $control,
our $functioncall,
our $statements,
);
$t->bud(@typecheck);
As an example of the appearance of the treeregexp transformations involved in the former call, see the code of the $control
treeregexp transformation:
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '183,192p' Trans.trg
control: /IF|IFELSE|WHILE/:con($bool)
=> {
$bool = char2int($con, 0) if $bool->{t} == $CHAR;
type_error("Condition must have integer type!", $bool->line)
unless $bool->{t} == $INT;
$con->{t} = $VOID;
return 1;
}
Tree Matching and Tree Substitution
Matching Trees
Both the transformation objects in Parse::Eyapp::YATW
and the nodes in Parse::Eyapp::Node
have a method named m
for matching. For a Parse::Eyapp::YATW
object, the method -when called in a list context- returns a list of Parse::Eyapp::Node::Match
nodes.
@R = $t->m($yatw1, $yatw2, $yatw3, ...)
A Parse::Eyapp::Node::Match
object describes the nodes of the actual tree that have matched. The nodes in the returned list are organized in a hierarchy. They appear in the list sorted according to a depth-first visit of the actual tree $t
. In a scalar context m
returns the first element of the list.
Let us denote by $t
the actual tree being searched and $r
one of the Parse::Eyapp::Node::Match
nodes in the resulting forest @R
. Then we have the following methods:
The method
$r->node
return the node$t
of the actual tree that matchedThe method
$r->father
returns the father of$r
in the matching forest. The father of$r
is defined by this property:$r->father->node
is the nearest ancestor of$r->node
that matched with the treeregexp pattern. That is, there is no ancestor that matched between$r->node
and$r->father->node
. Otherwise$r->father
isundef
The method
$r->coord
returns the coordinates of$r->node
relative to$t
. For example, the coordinate".1.3.2"
denotes the node$t->child(1)->child(3)->child(2)
, where$t
is the root of the search.The method
$r->depth
returns the depth of$r->node
in$t
.When
m
was called as aParse::Eyapp::Node
method, i. e. with potentially more than oneYATW
treeregexp, the method$r->names
returns the array of names of the transformations that matched with$r->node
.
The following example illustrates a use of m
as a Parse::Eyapp:YATW
method. It solves a problem of scope analysis in a C compiler: matching each RETURN
statement with the function that surrounds it. The parsing was already done, the AST was built and left in $t
. The treeregexp used is:
retscope: /FUNCTION|RETURN/
and the code that solves the problem is:
# Associate each "return exp" with its "function"
my @returns = $retscope->m($t);
for (@returns) {
my $node = $_->node;
if (ref($node) eq 'RETURN') {
my $function = $_->father->node;
$node->{function} = $function;
$node->{t} = $function->{t};
}
}
The first line gets a list of Parse::Eyapp::Node::Match
nodes describing the actual nodes that matched /FUNCTION|RETURN/
. If the node described by $_
is a 'RETURN'
node, the expresion $_->father->node
must necessarily point to the function node that encloses it.
The second example shows the use of m
as a Parse::Eyapp::Node
method.
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n m2.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::Treeregexp;
5
6 Parse::Eyapp::Treeregexp->new( STRING => q{
7 fold: /times|plus|div|minus/i:bin(NUM($n), NUM($m))
8 zxw: TIMES(NUM($x), .) and { $x->{attr} == 0 }
9 wxz: TIMES(., NUM($x)) and { $x->{attr} == 0 }
10 })->generate();
11
12 # Syntax analysis
13 my $parser = new Rule6();
14 $parser->YYData->{INPUT} = "0*0*0";
15 my $t = $parser->Run;
16 print "Tree:",$t->str,"\n";
17
18 # Search
19 my $m = $t->m(our ($fold, $zxw, $wxz));
20 print "Match Node:\n",$m->str,"\n";
When executed with input 0*0*0
the program generates this output:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ m2.pl
Tree:TIMES(TIMES(NUM(TERMINAL),NUM(TERMINAL)),NUM(TERMINAL))
Match Node:
Match[[TIMES:0:wxz]](Match[[TIMES:1:fold,zxw,wxz]])
The representation of Match
nodes by str
deserves a comment. Match
nodes have their own info
method. It returns a string containing the concatenation of the class of $r->node
(i.e. the actual node that matched), the depth ($r->depth
) and the names of the transformations that matched (as provided by the method $r->names
)
The SEVERITY
option of Parse::Eyapp::Treeregexp::new
The SEVERITY
option of Parse::Eyapp::Treeregexp::new
controls the way matching succeeds regarding the number of children. To illustrate its use let us consider the following example. The grammar used Rule6.yp
is similar to the one in the SYNOPSIS example.
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ cat -n numchildren.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::Treeregexp;
5
6 sub TERMINAL::info { $_[0]{attr} }
7
8 my $severity = shift || 0;
9 my $parser = new Rule6();
10 $parser->YYData->{INPUT} = shift || '0*2';
11 my $t = $parser->Run;
12
13 my $transform = Parse::Eyapp::Treeregexp->new(
14 STRING => q{
15 zero_times_whatever: TIMES(NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
16 },
17 SEVERITY => $severity,
18 FIRSTLINE => 14,
19 )->generate;
20
21 $t->s(our @all);
22
23 print $t->str,"\n";
The program gets the severity level from the command line (line 9). The specification of the term TIMES(NUM($x))
inside the transformation zero_times_whatever
does not clearly state that TIMES
must have two children. There are several interpretations of the treregexp depending on the level fixed for SEVERITY
:
0:
TIMES
must have at least one child. Don't care if it has more.1:
TIMES
must have exactly one child.2:
TIMES
must have exactly one child. When visit aTIMES
node with a different number of children issue a warning.3:
TIMES
must have exactly one child. When visit aTIMES
node with a different number of children issue an error.
Observe the change in behavior according to the level of SEVERITY
:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 0 '0*2'
NUM(TERMINAL[0])
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 1 '0*2'
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 2 '0*2'
Warning! found node TIMES with 2 children.
Expected 1 children (see line 15 of ./numchildren.pl)"
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ numchildren.pl 3 '0*2'
Error! found node TIMES with 2 children.
Expected 1 children (see line 15 of ./numchildren.pl)"
at (eval 2) line 29
Tree Substitution: The s
methods
Both Parse::Eyapp:Node
and Parse::Eyapp::YATW
objects (i.e. nodes and tree transformations) are provided with a s
method.
In the case of a Parse::Eyapp::YATW
object the method s
applies the tree transformation using a single bottom-up traversing: the transformation is recursively applied to the children and then to the current node.
For Parse::Eyapp:Node
nodes the set of transformations is applied to each node until no transformation matches any more. The example in the "SYNOPSIS" section illustrates the use:
1 # Let us transform the tree. Define the tree-regular expressions ..
2 my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
3 { # Example of support code
4 my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
5 }
6 constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
7 => {
8 my $op = $Op{ref($_[0])};
9 $x->{attr} = eval "$x->{attr} $op $y->{attr}";
10 $_[0] = $NUM[0];
11 }
12 uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
13 zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
14 whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
15 },
16 OUTPUTFILE=> 'main.pm'
17 );
18 $p->generate(); # Create the tranformations
19
20 $t->s($uminus); # Transform UMINUS nodes
21 $t->s(@all); # constant folding and mult. by zero
The call at line 20 can be substituted by $uminus->s($t)
without changes.
Parse::Eyapp:YATW Methods
Parse::Eyapp:YATW
objects represent tree transformations. They carry the information of what nodes match and how to modify them.
Parse::Eyapp::YATW->new
Builds a treeregexp transformation object. Though usually you build a transformation by means of Treeregexp programs you can directly invoke the method to build a tree transformation. A transformation object can be built from a function that conforms to the YATW tree transformation call protocol (see the section "The YATW Tree Transformation Call Protocol"). Follows an example (file examples/12ts_simplify_with_s.pl
):
nereida:~/src/perl/YappWithDefaultAction/examples> \
sed -ne '68,$p' 12ts_simplify_with_s.pl | cat -n
1 sub is_code {
2 my $self = shift; # tree
3
4 # After the shift $_[0] is the father, $_[1] the index
5 if ((ref($self) eq 'CODE')) {
6 splice(@{$_[0]->{children}}, $_[1], 1);
7 return 1;
8 }
9 return 0;
10 }
11
12 Parse::Eyapp->new_grammar(
13 input=>$translationscheme,
14 classname=>'Calc',
15 firstline =>7,
16 );
17 my $parser = Calc->new(); # Create the parser
18
19 $parser->YYData->{INPUT} = "2*-3\n"; print "2*-3\n"; # Set the input
20 my $t = $parser->Run; # Parse it
21 print $t->str."\n";
22 my $p = Parse::Eyapp::YATW->new(PATTERN => \&is_code);
23 $p->s($t);
24 { no warnings; # make attr info available only for this display
25 local *TERMINAL::info = sub { $_[0]{attr} };
26 print $t->str."\n";
27 }
After the Parse::Eyapp::YATW
object $p
is built at line 22 the call to method $p->s($t)
applies the transformation is_code
using a bottom-up traversing of the tree $t
. The achieved effect is the elimination of CODE
references in the translation scheme tree. When executed the former code produces:
nereida:~/src/perl/YappWithDefaultAction/examples> 12ts_simplify_with_s.pl
2*-3
EXP(TIMES(NUM(TERMINAL,CODE),TERMINAL,UMINUS(TERMINAL,NUM(TERMINAL,CODE),CODE),CODE),CODE)
EXP(TIMES(NUM(TERMINAL[2]),TERMINAL[*],UMINUS(TERMINAL[-],NUM(TERMINAL[3]))))
The file foldrule6.pl
in the examples/
distribution directory gives you another example:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n foldrule6.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Parse::Eyapp::YATW;
5
6 my %BinaryOperation = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
7
8 sub set_terminfo {
9 no warnings;
10 *TERMINAL::info = sub { $_[0]{attr} };
11 }
12 sub is_foldable {
13 my ($op, $left, $right);
14 return 0 unless defined($op = $BinaryOperation{ref($_[0])});
15 return 0 unless ($left = $_[0]->child(0), $left->isa('NUM'));
16 return 0 unless ($right = $_[0]->child(1), $right->isa('NUM'));
17
18 my $leftnum = $left->child(0)->{attr};
19 my $rightnum = $right->child(0)->{attr};
20 $left->child(0)->{attr} = eval "$leftnum $op $rightnum";
21 $_[0] = $left;
22 }
23
24 my $parser = new Rule6();
25 $parser->YYData->{INPUT} = "2*3";
26 my $t = $parser->Run;
27 &set_terminfo;
28 print "\n***** Before ******\n";
29 print $t->str;
30 my $p = Parse::Eyapp::YATW->new(PATTERN => \&is_foldable);
31 $p->s($t);
32 print "\n***** After ******\n";
33 print $t->str."\n";
when executed produces:
nereida:~/src/perl/YappWithDefaultAction/examples> foldrule6.pl
***** Before ******
TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3]))
***** After ******
NUM(TERMINAL[6])
The YATW Tree Transformation Call Protocol
For a subroutine pattern_sub
to work as a YATW tree transformation - as subroutines is_foldable
and is_code
above - has to conform to the following call description:
pattern_sub(
$_[0], # Node being visited
$_[1], # Father of this node
$index, # Index of this node in @Father->children
$self, # The YATW pattern object
);
The pattern_sub
must return TRUE if matched and FALSE otherwise.
The protocol may change in the near future. Avoid using other information than the fact that the first argument is the node being visited.
Parse::Eyapp::YATW->buildpatterns
Works as Parse::Eyapp->new but receives an array of subs conforming to the YATW Tree Transformation Call Protocol.
our @all = Parse::Eyapp::YATW->buildpatt(\&delete_code, \&delete_tokens);
$yatw->delete
The root of the tree that is currently matched by the YATW transformation $yatw
will be deleted from the tree as soon as is safe. That usually means when the processing of their siblings is finished. The following example (taken from file examples/13ts_simplify_with_delete.pl
in the Parse::Eyapp distribution) illustrates how to eliminate CODE and syntactic terminals from the syntax tree:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
sed -ne '62,$p' 13ts_simplify_with_delete.pl | cat -n
1 sub not_useful {
2 my $self = shift; # node
3 my $pat = $_[2]; # get the YATW object
4
5 (ref($self) eq 'CODE') or ((ref($self) eq 'TERMINAL') and ($self->{token} eq $self->{attr}))
6 or do { return 0 };
7 $pat->delete();
8 return 1;
9 }
10
11 Parse::Eyapp->new_grammar(
12 input=>$translationscheme,
13 classname=>'Calc',
14 firstline =>7,
15 );
16 my $parser = Calc->new(); # Create the parser
17
18 $parser->YYData->{INPUT} = "2*3\n"; print $parser->YYData->{INPUT};
19 my $t = $parser->Run; # Parse it
20 print $t->str."\n"; # Show the tree
21 my $p = Parse::Eyapp::YATW->new(PATTERN => \¬_useful);
22 $p->s($t); # Delete nodes
23 print $t->str."\n"; # Show the tree
when executed we get the following output:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ 13ts_simplify_with_delete.pl
2*3
EXP(TIMES(NUM(TERMINAL[2],CODE),TERMINAL[*],NUM(TERMINAL[3],CODE),CODE))
EXP(TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
$yatw->unshift
Tha call $yatw->unshift($b)
safely unshifts (inserts at the beginning) the node $b
in the list of its siblings of the node that matched (i.e in the list of siblings of $_[0]
). The following example shows a YATW transformation insert_child
that illustrates the use of unshift
(file examples/26delete_with_trreereg.pl
):
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ \
sed -ne '70,$p' 26delete_with_trreereg.pl | cat -n
1 my $transform = Parse::Eyapp::Treeregexp->new( STRING => q{
2
3 delete_code : CODE => { $delete_code->delete() }
4
5 {
6 sub not_semantic {
7 my $self = shift;
8 return 1 if ((ref($self) eq 'TERMINAL') and ($self->{token} eq $self->{attr}));
9 return 0;
10 }
11 }
12
13 delete_tokens : TERMINAL and { not_semantic($TERMINAL) } => {
14 $delete_tokens->delete();
15 }
16
17 insert_child : TIMES(NUM(TERMINAL), NUM(TERMINAL)) => {
18 my $b = Parse::Eyapp::Node->new( 'UMINUS(TERMINAL)',
19 sub { $_[1]->{attr} = '4.5' }); # The new node will be a sibling of TIMES
20
21 $insert_child->unshift($b);
22 }
23 },
24 )->generate();
25
26 Parse::Eyapp->new_grammar(
27 input=>$translationscheme,
28 classname=>'Calc',
29 firstline =>7,
30 );
31 my $parser = Calc->new(); # Create the parser
32
33 $parser->YYData->{INPUT} = "2*3\n"; print $parser->YYData->{INPUT}; # Set the input
34 my $t = $parser->Run; # Parse it
35 print $t->str."\n"; # Show the tree
36 # Get the AST
37 our ($delete_tokens, $delete_code);
38 $t->s($delete_tokens, $delete_code);
39 print $t->str."\n"; # Show the tree
40 our $insert_child;
41 $insert_child->s($t);
42 print $t->str."\n"; # Show the tree
When is executed the program produces the following output:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ 26delete_with_trreereg.pl
2*3
EXP(TIMES(NUM(TERMINAL[2],CODE),TERMINAL[*],NUM(TERMINAL[3],CODE),CODE))
EXP(TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
EXP(UMINUS(TERMINAL[4.5]),TIMES(NUM(TERMINAL[2]),NUM(TERMINAL[3])))
Don't try to take advantage that the transformation sub receives in $_[1]
a reference to the father (see the section "The YATW Tree Transformation Call Protocol") and do something like:
unshift $_[1]->{children}, $b
it is unsafe.
$yatw->insert_before
A call to $yatw->insert_before($node)
safely inserts $node
in the list of siblings of $_[0]
just before $_[0]
(i.e. the ndoe that matched with $yatw
). The following example (file t/33moveinvariantoutofloop.t
) illustrates its use:
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
moveinvariant: WHILE(VAR($b), BLOCK(@a, ASSIGN($x, $e), @c))
and { is_invariant($ASSIGN, $WHILE) } => {
my $assign = $ASSIGN;
$BLOCK->delete($ASSIGN);
$moveinvariant->insert_before($assign);
}
},
);
Here the ASSIGN($x, $e)
subtree - if is loop invariant - will be moved to the list of siblings of $WHILE
just before the $WHILE
.
Compiling with eyapp
and treereg
A Treeregexp program can be isolated in a file an compiled with the program treereg
. The default extension is .trg
. See the following example:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n Shift.trg
1 # File: Shift.trg
2 {
3 sub log2 {
4 my $n = shift;
5 return log($n)/log(2);
6 }
7
8 my $power;
9 }
10 mult2shift: TIMES($e, NUM($m)) and { $power = log2($m->{attr}); (1 << $power) == $m->{attr} }
11 => {
12 $_[0]->delete(1);
13 $_[0]->{shift} = $power;
14 $_[0]->type('SHIFTLEFT');
15 }
Note that auxiliary support code can be inserted at any point between transformations (lines 2-6). The code will be inserted (without the defining curly brackets) at that point. Note also that the lexical variable $power
is visible inside the definition of the mult2shift
transformation.
A treeregexp like $e
matches any node. A reference to the node is saved in the lexical variable $e
. The scope of the variable $e
is the current tree transformation, i.e. mult2shift
. Such kind of treeregexps are called scalar treeregexps.
The call to the delete
method at line 12 deletes the second child of the node being visited (i.e. NUM($m)
).
The call to type
at line 14 retypes the node as a SHIFTLEFT
node.
The program is compiled using the script treereg
:
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ eyapp Rule6
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ treereg Shift
pl@nereida:~/src/perl/YappWithDefaultAction/examples$ ls -ltr | tail -2
-rw-rw---- 1 pl users 5960 2007-01-30 09:09 Rule6.pm
-rw-rw---- 1 pl users 1424 2007-01-30 09:09 Shift.pm
The Grammar Rule6.yp
is similar to the one in the "SYNOPSIS" section. Module Rule6.pm
contains the parser. The module Shift.pm
contains the code implementing the tree transformations.
The client program follows:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n useruleandshift.pl
1 #!/usr/bin/perl -w
2 use strict;
3 use Rule6;
4 use Shift;
5 { no warnings; *TERMINAL::info = \&TERMINAL::attr; }
6
7 push @SHIFTLEFT::ISA, 'Parse::Eyapp::Node';
8 sub SHIFTLEFT::info { $_[0]{shift} }
9
10 my $parser = new Rule6();
11 $parser->YYData->{INPUT} = <>;
12 my $t = $parser->Run;
13 print "***********\n",$t->str,"\n";
14 $t->s(@Shift::all);
15 print "***********\n",$t->str,"\n";
Lines 5 and 8 provide the node classes TERMINAL
and SHIFTLEFT
of info
methods to be used by str
(lines 13 and 14). To make SHIFTLEFT
a node class it has to inherit from Parse::Eyapp::Node
(line 7).
Multiplications by a power of two are substituted by the corresponding shifts:
nereida:~/src/perl/YappWithDefaultAction/examples> useruleandshift.pl
a=b*8
***********
ASSIGN(TERMINAL[a],TIMES(VAR(TERMINAL[b]),NUM(TERMINAL[8])))
***********
ASSIGN(TERMINAL[a],SHIFTLEFT[3](VAR(TERMINAL[b])))
Compiling: More Options
See files Rule9.yp
, Transform4.trg
and foldand0rule9_4.pl
in the examples directory for a more detailed vision of this example. File Rule9.yp
is very much like the grammar in the "SYNOPSIS" example. To compile the grammar Rule9.yp
and the treeregexp file Transform4.trg
use the commands:
eyapp -m 'Calc' Rule9.yp
That will produce a file Calc.pm
containing a package Calc
that implements the LALR parser. Then the command:
treereg -o T.pm -p 'R::' -m T Transform4
produces a file T.pm
containing a package T
that implements the tree transformation program. The -p
option announces that node classes are prefixed by 'R::'
.
With such parameters the client program uses the generated modules as follows:
nereida:~/src/perl/YappWithDefaultAction/examples> cat -n foldand0rule9_4.pl
1 #!/usr/bin/perl -w
2 # File: foldand0rule9_4.pl. Compile it with
3 # eyapp -m 'Calc' Rule9.yp; treereg -o T.pm -p 'R::' -m T Transform4
4 use strict;
5 use Calc;
6 use T;
7
8 sub R::TERMINAL::info { $_[0]{attr} }
9 my $parser = new Calc(yyprefix => "R::");
10 my $t = $parser->YYParse( yylex => \&Calc::Lexer, yyerror => \&Calc::Error);
11 print "\n***** Before ******\n";
12 print $t->str."\n";
13 $t->s(@T::all);
14 print "\n***** After ******\n";
15 print $t->str."\n";
running the program produces the following output:
nereida:~/src/perl/YappWithDefaultAction/examples> foldand0rule9_4.pl
2*3
***** Before ******
R::TIMES(R::NUM(R::TERMINAL[2]),R::TERMINAL[*],R::NUM(R::TERMINAL[3]))
***** After ******
R::NUM(R::TERMINAL[6])
Scope Analysis with Parse::Eyapp::Scope
A scope manager helps to compute the mapping function that maps the uses (instances) of source objects to their definitions. For instance, when dealing with identifier scope analysis the problem is to associate each ocurrence of an identifier with the declaration that applies to it. Another example is loop scope analysis where the problem is to associate each occurrence of a CONTINUE
or BREAK
node with the shallowest LOOP
that encloses it. Or label scope analysis, the problem to associate a GOTO
node with the node to jump to, that is, with the STATEMENT
associated with the label.
To take advantage of Parse::Eyapp::Scope
, the compiler writer must mark at the appropriate time (for example a new block or new subroutine for identifier scope analysis, a new loop for loop scope analysis, etc.) the beginning of a new scope calling the method begin_scope. From that point on any ocurring instance of an object (for example, variables in expressions for identifier scope analysis, breaks and continues for loop scope analysis, etc.) must be declared calling the method scope_instance. The programmer must also mark the end of the current scope at the appropriate time.
$scope->end_scope
There are three ways of calling $scope->end_scope
. The first one is for Scope Analysis Problems where a symbol table is needed (for example in identifier scope analysis and label scope analysis and there is a Parse::Eyapp::Node
node that owns the scope.
$scope->end_scope with first Arg a Symbol Table and Second Argument a Node
For each ocurring instance of an object $x
that occurred since the last call to begin_scope the call to
$scope->end_scope(\%symboltable, $definition_node, 'attr1', 'attr2', ... )
decorates the ocurring instance $x
with several attributes:
An entry
$x->{SCOPE_NAME}
is built that will reference$definition_node
.An entry
$x->{ENTRY_NAME}
is built. That entry references$symboltable{$x->key}
(to have a faster access from the instance to the attributes of the object). The instantiated nodes must have a$x->key
method which provides the entry for the node in the symbol table:pl@nereida:~/src/perl/YappWithDefaultAction/examples$ sed -ne '651,657p' Types.eyp sub VAR::key { my $self = shift; return $self->child(0)->{attr}[0]; } *VARARRAY::key = *FUNCTIONCALL::key = \&VAR::key;
For each aditional arguments
attr#k
an entry$x->{attr#k
} will be built. That entry references$symboltable{$x->key}{attr#k}
. Therefore the entry for$x
in the symbol table must already have a field namedattr#k
. If the hash referenced by$symboltable{$x->key}
does not have a keyattr#k
no reference is built.
In a list context $scope>end_scope
returns two references. The first one is a reference to a list of node instantiated that weren't defined in the current scope. The second is a reference to a list of nodes that were defined in this scope. In a scalar context returns the first of these two. An instance $x
is defined if, and only if, exists $symboltable{$_->key}
.
$scope->end_scope with first Arg a Symbol Table and Remaining Arguments strings
For each ocurring instance of an object $x
that occurred since the last call to begin_scope the call to
$scope->end_scope(\%symboltable, 'attr1', 'attr2', ... )
decorates the ocurring instance $x
with several attributes:
An entry
$x->{ENTRY_NAME}
is built. That entry references$symboltable{$x->key}
(to have a faster access from the instance to the attributes of the object). The instantiated nodes must have a$x->key
method which provides the entry for the node in the symbol table.For each aditional arguments
attr#k
an entry$x->{attr#k
} will be built. That entry references$symboltable{$x->key}{attr#k}
. Therefore the entry for$x
in the symbol table must already have a field namedattr#k
. If the hash referenced by$symboltable{$x->key}
does not have a keyattr#k
no reference is built.
$scope->end_scope for Simple Scope Analysis
Some scope analysis problems do not require the existence of a symbol table (for instance, the problem of associating a RETURN
node with the FUNCTION
that encloses it). For such kind of problems $scope>end_scope
provides a second form of call. The second way to call $scope>end_scope
is
$declared = $scopemanager->end_scope($definition_node);
The only argument is the reference to the node that controls/defines the scope. The method returns a reference to the declared nodes. Any node instanced with scope_instance
since the last call to begin_scope
is considered declared.
$scope->begin_scope
Marks the beginning of an scope. Example (file examples/Types.eyp
):
loopPrefix:
$WHILE '(' expression ')'
{
$loops->begin_scope;
$_[3]->{line} = $WHILE->[1]; # Save the line for error diagostic
$_[3]
}
$scope->scope_instance
Declares the node argument to be an occurring instance of the scope:
nereida:~/doc/casiano/PLBOOK/PLBOOK/code> \
sed -ne '375,380p' Simple6.eyp | cat -n
1 $Variable '=' binary
2 {
3 my $parser = shift;
4 $ids->scope_instance($Variable);
5 $parser->YYBuildAST(@_); # "Manually" build the node
6 }
Parse::Eyapp::Scope->new
Parse::Eyapp::Scope->new
returns a scope managment object. The scope mapping function is implemented by Parse::Eyapp::Scope
through a set of attributes that are added to the nodes involved in the scope analysis. The names of these attributes can be specified using the parameters of Parse::Eyapp::Scope->new
. The arguments of new
are:
SCOPE_NAME
is the name chosen for the attribute of the node instance which will held the reference to the definition node. If not specified it will take the value"scope"
.ENTRY_NAME
is the name of the attribute of the node instance which will held the reference to the symbol table entry. By default takes the value"entry"
.SCOPE_DEPTH
is the name for an attribute of the definition node. Optional. If not specified it will not be defined.
ENVIRONMENT
Remember to set the environment variable PERL5LIB
if you decide to install Parse::Eyapp
at a location other than the standard. For example, on a bash or sh:
export PERL5LIB-/home/user/wherever_it_is/lib/:$PERL5LIB
on a csh
or tcsh
setenv PERL5LIB /home/user/wherever_it_is/lib/:$PERL5LIB
Be sure the scripts eyapp
and treereg
are in the execution PATH.
DEPENDENCIES
This distribution depends on the following modules:
It seems that List::Util is in the core of Perl distributions since version 5.73:
> perl -MModule::CoreList -e 'print Module::CoreList->first_release("List::Util")'
5.007003
and Data::Dumper is also in the core since 5.5:
> perl -MModule::CoreList -e 'print Module::CoreList->first_release("Data::Dumper")'
5.005
and Pod::Usage is also in the core since 5.6:
> perl -MModule::CoreList -e 'print Module::CoreList->first_release("Pod::Usage")'
5.006
I also recommend the following modules:
The dependence on Test::Warn, Test::Pod and Test::Exception is merely for the execution of tests. If the modules aren't installed the tests depending on them will be skipped.
INSTALLATION
To install it, follow the traditional mantra:
perl Makefile.PL
make
make test
make install
Also:
Make a local copy of the
examples/
directory in this distributionProbably it will be also a good idea to make a copy of the tests in the
t/
directory. They also illustrate the use of EyappPrint and read the pdf file in http://nereida.deioc.ull.es/~pl/perlexamples/Eyapp.pdf
BUGS AND LIMITATIONS
The way Parse::Eyapp parses Perl code is verbatim the way it does Parse::Yapp 1.05. Quoting Francois Desarmenien Parse::Yapp documentation:
"Be aware that matching braces in Perl is much more difficult than in C: inside strings they don't need to match. While in C it is very easy to detect the beginning of a string construct, or a single character, it is much more difficult in Perl, as there are so many ways of writing such literals. So there is no check for that today. If you need a brace in a double-quoted string, just quote it (
\{
or\}
). For single-quoted strings, you will need to make a comment matching it in th right order. Sorry for the inconvenience.{ "{ My string block }". "\{ My other string block \}". qq/ My unmatched brace \} /. # Force the match: { q/ for my closing brace } / q/ My opening brace { / # must be closed: } }
All of these constructs should work."
Alternative exact solutions were tried but resulted in much slower code. Therefore, until something faster is found, I rather prefer for Parse::Eyapp to live with this limitation.
The same limitation may appear inside header code (code between
%{
and%}
)
SEE ALSO
The pdf file in http://nereida.deioc.ull.es/~pl/perlexamples/Eyapp.pdf
The tutorial Parsing Strings and Trees with
Parse::Eyapp
(An Introduction to Compiler Construction in seven pages) in http://nereida.deioc.ull.es/~pl/eyapsimple/perldoc eyapp,
perldoc treereg,
Análisis Léxico y Sintáctico, (Notes for a course in compiler construction) by Casiano Rodriguez-Leon. Available at http://nereida.deioc.ull.es/~pl/perlexamples/ Is the more complete and reliable source for Parse::Eyapp. However is in Spanish.
Man pages of yacc(1),
Man pages of bison(1),
REFERENCES
The classic Dragon's book Compilers: Principles, Techniques, and Tools by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman (Addison-Wesley 1986)
AUTHOR
Casiano Rodriguez-Leon (casiano@ull.es)
ACKNOWLEDGMENTS
This work has been supported by CEE (FEDER) and the Spanish Ministry of Educación y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.
A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.
I wish to thank Francois Desarmenien for his Parse::Yapp
module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.
LICENCE AND COPYRIGHT
Copyright (c) 2006-2007 Casiano Rodriguez-Leon (casiano@ull.es). All rights reserved.
Parse::Yapp copyright is of Francois Desarmenien, all rights reserved. 1998-2001
These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 2420:
Non-ASCII character seen before =encoding in 'válida\n";'. Assuming CP1252