NAME

eyapptut - A Parse::Eyapp Tutorial

VERSION

1.06552

SYNOPSIS

use strict;
use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;

sub TERMINAL::info {
  $_[0]{attr}
}

my $grammar = q{
  %right  '='     # Lowest precedence
  %left   '-' '+' # + and - have more precedence than = Disambiguate a-b-c as (a-b)-c
  %left   '*' '/' # * and / have more precedence than + Disambiguate a/b/c as (a/b)/c
  %left   NEG     # Disambiguate -a-b as (-a)-b and not as -(a-b)
  %tree           # Let us build an abstract syntax tree ...

  %%
  line: exp <%name EXPRESION_LIST + ';'>  { $_[1] } /* list of expressions separated by ';' */
  ;

  /* The %name directive defines the name of the class to which the node being built belongs */
  exp:
      %name NUM  NUM            | %name VAR   VAR         | %name ASSIGN VAR '=' exp
    | %name PLUS exp '+' exp    | %name MINUS exp '-' exp | %name TIMES  exp '*' exp
    | %name DIV     exp '/' exp | %name UMINUS '-' exp %prec NEG
    |   '(' exp ')'  { $_[2] }  /* Let us simplify a bit the tree */
  ;

  %%
  sub _Error { die "Syntax error near ".($_[0]->YYCurval?$_[0]->YYCurval:"end of file")."\n" }

  sub _Lexer {
    my($parser)=shift; # The parser object

    for ($parser->YYData->{INPUT}) {
      s/^\s+//;
      $_ eq '' and return('',undef);
      s/^([0-9]+(?:\.[0-9]+)?)// and return('NUM',$1);
      s/^([A-Za-z][A-Za-z0-9_]*)// and return('VAR',$1);
      s/^(.)//s and return($1,$1);
    }
  }

  sub Run {
      my($self)=shift;
      $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error, );
  }
}; # end grammar

our (@all, $uminus);

Parse::Eyapp->new_grammar( # Create the parser package/class
  input=>$grammar,
  classname=>'Calc', # The name of the package containing the parser
  firstline=>7       # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new();                # Create a parser
$parser->YYData->{INPUT} = "2*-3+b*0;--2\n"; # Set the input
my $t = $parser->Run;                    # Parse it!
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;

# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
  { #  Example of support code
    my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
  }
  constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
    => {
      my $op = $Op{ref($_[0])};
      $x->{attr} = eval  "$x->{attr} $op $y->{attr}";
      $_[0] = $NUM[0];
    }
  uminus: UMINUS(NUM($x)) => { $x->{attr} = -$x->{attr}; $_[0] = $NUM }
  zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 } => { $_[0] = $NUM }
  whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
  },
  OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations

$t->s($uminus); # Transform UMINUS nodes
$t->s(@all);    # constant folding and mult. by zero

local $Parse::Eyapp::Node::INDENT=0;
print "\nSyntax Tree after transformations:\n",$t->str,"\n";

Introduction to `Parse::Eyapp`

Parse::Eyapp (Extended yapp) is a collection of modules that extends Francois Desarmenien Parse::Yapp 1.05. Eyapp extends yacc/yapp syntax with the functionalities briefly described in this section. This is an introductory tutorial. For a reference guide see Parse::Eyapp. If you are not familiar with yacc or yapp and you can speak Spanish start reading the contents in http://nereida.deioc.ull.es/~pl/perlexamples/chapter_parseeyapp.html.

Input from strings

Grammars can be compiled from a file or from source on the fly (See the synopsis section for an example).

Names for attributes

Attributes can be referenced by meaningful names instead of the classic error-prone positional approach using the dot notation like in:

exp : exp.left '-' exp.right  { $left - $right }

By qualifying the first appearance of the syntactic variable exp with the notation exp.left we can later refer inside the actions to the associated attribute using the lexical variable $left. Thus the former code is equivalent to:

exp '-' exp
 { 
   my $lhs = shift; 
   my ($left, $right) = @_[1, 3];
   $lhs->{n} = $left->{n} - $right->{n} 
 }

The dolar notation $A can be used as an abbreviation of A.A. For example, the code:

$VAR '=' $exp
  { $lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n} }

is equivalent to:

VAR '=' exp
{ 
  my $lhs = shift;
  my ($VAR, $exp) = @_[1, 3];
  $lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n} 
}

Lists and Optionals

Lists, optional lists, list separated by tokens, etc. like in the start rule in the Synopsis example can be used:

line: exp <%name EXPRESION_LIST + ';'>  { $_[1] }

which defines line as the language of non empty lists of exp elements separated by semicolons. The use of %name EXPRESION_LIST gives a name to the created list. Actually the right hand side of this production has only one element which is the reference to the list. The associated action { $_[1] } makes the generated parser to return the reference to such list.

The former rule is almost equivalent to:

line: line ';' exp  { push $_[1]->{children}, $_[3] }
    | exp           { bless { children => [ $_[1] ] }, 'EXPRESION_LIST' }

other operators are * (meaning 0 or more) and ? (for optionality)

Default actions

When no action is specified both yapp and eyapp implicitly insert the semantic action { $_[1] }. In Parse::Eyapp you can modify such behavior using the %defaultaction { Perl code } directive. The { Perl code } clause that follows the %defaultaction directive is executed when reducing by any production for which no explicit action was specified.

See an example that translates an infix expression like a=b*-3 into a postfix expression like a b 3 NEG * = :

# File Postfix.eyp (See the examples/ directory)
%right  '='
%left   '-' '+'
%left   '*' '/'
%left   NEG

%defaultaction { return  "$left $right $op"; }

%%
line: $exp  { print "$exp\n" }
;

exp:        $NUM  { $NUM }
        |   $VAR  { $VAR }
        |   VAR.left '='.op exp.right
        |   exp.left '+'.op exp.right
        |   exp.left '-'.op exp.right
        |   exp.left '*'.op exp.right
        |   exp.left '/'.op exp.right
        |   '-' $exp %prec NEG { "$exp NEG" }
        |   '(' $exp ')' { $exp }
;

%%

# Support subroutines as in the Synopsis example
...

Compiling with `eyapp`

The file containing the Eyapp program must be compiled with eyapp:

nereida:~/src/perl/YappWithDefaultAction/examples> eyapp Postfix.eyp

Next, you have to write a client program:

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n usepostfix.pl
     1  #!/usr/bin/perl -w
     2  use strict;
     3  use Postfix;
     4
     5  my $parser = new Postfix();
     6  $parser->Run;

Now we can run the client program:

nereida:~/src/perl/YappWithDefaultAction/examples> usepostfix.pl
Write an expression: -(2*a-b*-3)
2 a * b 3 NEG * - NEG

Abstract Syntax Trees

Parse::Eyapp facilitates the construction of concrete syntax trees and abstract syntax trees (abbreviated AST from now on) through the %tree directive. Nodes in the AST are blessed in the production name. By default the name of a production is the concatenation of the left hand side and the production number. The production number is the ordinal number of the production as they appear in the associated .output file (see option -v of eyapp) However, a production can be named using the %name directive. Therefore, in the following code:

exp:
       %name NUM  NUM            
     | %name VAR   VAR         
     | %name ASSIGN VAR '=' exp
     . .............................
     | %name UMINUS '-' exp %prec NEG
     |   '(' exp ')'  { $_[2] }  /* Let us simplify a bit the tree */

we are explictly naming the productions. Thus, the node corresponding to the production exp: VAR '=' exp will be named ASSIGN. Explicit actions can be specified by the programmer like in

|   '(' exp ')'  { $_[2] }  /* Let us simplify a bit the tree */

Explicit actions receive as arguments the references to the children nodes already built. The programmer can influence the shape of the tree by inserting these explicit actions. In this example the programmer has decided to simplify the syntax tree: the nodes associated with the parenthesis are discarded and the reference to the subtree containing the proper expression is returned.

Displaying Trees

All the node classes build by %tree inherit from Parse::Eyapp::Node and consequently have access to the methods provided in such module. Among them is the str method which dumps the tree. The str method traverses the syntax tree dumping the type of the node being visited in a string. If the node has a method info it will be executed and its result concatenated to the string. Thus, in the Synopsis example, by adding the info method to the class TERMINAL:

sub TERMINAL::info {
  $_[0]{attr}
}

we achieve the insertion of the TERMINAL attribute attr in the string build by str (see the partial output of synopsis.pl in section "Syntactic and Semantic tokens"). The existence of some methods (like footnote) and the values of some package variables influence the behavior of str. Among the most important are:

@PREFIXES = qw(Parse::Eyapp::Node::);                              # Prefixes to supress 
$INDENT = 0;                                       # 0 = compact, 1 = indent, 2 = indent 
                                              # and include Types in closing parenthesis
$STRSEP = ',';                             # Separator between nodes, by default a comma
$DELIMITER = '[';                      # The string returned by C<info> will be enclosed 
$FOOTNOTE_HEADER = "\n---------------------------\n"; 
$FOOTNOTE_SEP = ")\n"; 
$FOOTNOTE_LEFT = '^{';                            # Left delimiter for a footnote number
$FOOTNOTE_RIGHT = '}';                           # Right delimiter for a footnote number
$LINESEP = 4;                          # When indent=2 the enclosing parenthesis will be
                                       # commented if more than $LINESEP apart

`TERMINAL` nodes

Nodes named TERMINAL correspond to tokens provided by the lexical analyzer. They are Parse::Eyapp::Node nodes (hashes) with an attribute attr holding the attribute provided by the lexical analyzer. The attr method can be used to get/set the attribute.

User Attributes and System Attributes

All the nodes in the AST are Parse::Eyapp::Node nodes. They are hashes that the user can decorate with new keys/attributes. The only reserved words are those listed in the reference section. Basically they have a children key. TERMINAL nodes have the attr key.

Syntactic and Semantic tokens

Parse::Eyapp diferences between syntactic tokens and semantic tokens. By default all tokens declared using string notation (i.e. between quotes like '+', '=', in the Synopsis example) are considered syntactic tokens. Tokens declared by an identifier (like NUM or VAR in the Synopsis example) are by default considered semantic tokens. Syntactic tokens are eliminated when building the syntactic tree. Thus, the first print in the former Synopsis example:

$parser->YYData->{INPUT} = "2*-3+b*0;--2\n"; 
my $t = $parser->Run;                    
local $Parse::Eyapp::Node::INDENT=2;
print "Syntax Tree:",$t->str;

gives as result the following output:

nereida:~/src/perl/YappWithDefaultAction/examples> synopsis.pl
Syntax Tree:
EXPRESION_LIST(
  PLUS(
    TIMES(
      NUM(
        TERMINAL[2]
      ),
      UMINUS(
        NUM(
          TERMINAL[3]
        )
      ) # UMINUS
    ) # TIMES,
    TIMES(
      VAR(
        TERMINAL[b]
      ),
      NUM(
        TERMINAL[0]
      )
    ) # TIMES
  ) # PLUS,
  UMINUS(
    UMINUS(
      NUM(
        TERMINAL[2]
      )
    ) # UMINUS
  ) # UMINUS
) # EXPRESION_LIST

Saving the Information In Syntactic Tokens

The reason for the adjective %syntactic applied to a token is to state that the token influences the shape of the syntax tree but carries no other information. When the tree is built the node corresponding to the token is discarded.

Sometimes the difference between syntactic and semantic tokens is blurred. For example the line number associated with an instance of the syntactic token '+' can be used later -say during type checking- to emit a more accurate error diagnostic. But if the node was discarded the information about that line number is no longer available. When building the syntax tree Parse::Eyapp (namely the method Parse::Eyapp::YYBuildAST) checks a TERMINAL::save_attributes method exists and if so it will be called when visiting a syntactic terminal. The method receives as argument - additionally to the reference to the TERMINAL node - a reference to the node associated with the left hand side of the production. Here is an example (file examples/Types.eyp) of use:

sub TERMINAL::save_attributes {
  # $_[0] is a syntactic terminal
  # $_[1] is the father.
  push @{$_[1]->{lines}}, $_[0]->[1]; # save the line!
}

The directives `%syntactic token` and `%semantic token`

The new token declaration directives %syntactic token and %semantic token can change the status of a token. For example (file 15treewithsyntactictoken.pl in the examples/ directory), given the grammar:

%syntactic token b
%semantic token 'a' 'c'
%tree

%%

S: %name ABC
     A B C
 | %name BC
     B C
;

A: %name A
     'a'
;

B: %name B
     b
;

C: %name C
    'c'
;
%%

the tree build for input abc will be ABC(A(TERMINAL),B,C(TERMINAL)).

The `bypass` clause and the `%no bypass` directive

The shape of the tree can be also modified using some %tree clauses as %tree bypass which will produce an automatic bypass of any node with only one child at tree-construction-time.

A bypass operation consists in returning the only child of the node being visited to the father of the node and re-typing (re-blessing) the node in the name of the production (if a name is provided).

A node may have only one child at tree-construction-time for one of two reasons.

The first occurs when the right hand side of the production was already unary like in:
```
exp:
    %name NUM  NUM 
```
Here the NUM node will be bypassed and the child TERMINAL built from the information provided by the lexical analyzer will be renamed as NUM.
Another reason for a node to be bypassed is the fact that though the right hand side of the production may have more than one symbol, only one of them is not a syntactic token like in:
```
exp: '(' exp ')'
```

As consequence of the blind application of the bypass rule undesired bypasses may occur like in

exp : %name UMINUS
      '-' $exp %prec NEG

though the right hand side has two symbols, token '-' is a syntactic token and therefore only exp is left. The bypass operation will be applied when building this node. This bypass can be avoided applying the no bypass ID directive to the corresponding production:

exp : %no bypass UMINUS
      '-' $exp %prec NEG

The following example is the equivalent of the Synopsis example but using the bypass clause instead:

use Parse::Eyapp;
use Parse::Eyapp::Treeregexp;

sub TERMINAL::info { $_[0]{attr} }
{ no warnings; *VAR::info = *NUM::info = \&TERMINAL::info; }

my $grammar = q{
  %right  '='     # Lowest precedence
  %left   '-' '+' 
  %left   '*' '/' 
  %left   NEG     # Disambiguate -a-b as (-a)-b and not as -(a-b)
  %tree bypass    # Let us build an abstract syntax tree ...

  %%
  line: exp <%name EXPRESION_LIST + ';'>  { $_[1] } 
  ;

  exp:
      %name NUM  NUM            | %name VAR   VAR         | %name ASSIGN VAR '=' exp
    | %name PLUS exp '+' exp    | %name MINUS exp '-' exp | %name TIMES  exp '*' exp
    | %name DIV     exp '/' exp
    | %no bypass UMINUS
      '-' $exp %prec NEG
    |   '(' exp ')'
  ;

  %%
  # sub _Error, _Lexer and Run like in the synopsis example
  # ...
}; # end grammar

our (@all, $uminus);

Parse::Eyapp->new_grammar( # Create the parser package/class
  input=>$grammar,
  classname=>'Calc', # The name of the package containing the parser
  firstline=>7       # String $grammar starts at line 7 (for error diagnostics)
);
my $parser = Calc->new();                # Create a parser
$parser->YYData->{INPUT} = "a=2*-3+b*0\n"; # Set the input
my $t = $parser->Run;                    # Parse it!

print "\n************\n".$t->str."\n************\n";

# Let us transform the tree. Define the tree-regular expressions ..
my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
  { #  Example of support code
    my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
  }
  constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM, NUM)
    => {
      my $op = $Op{ref($_[0])};
      $NUM[0]->{attr} = eval  "$NUM[0]->{attr} $op $NUM[1]->{attr}";
      $_[0] = $NUM[0];
    }
  zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
  whatever_times_zero: TIMES(., NUM) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
  uminus: UMINUS(NUM) => { $NUM->{attr} = -$NUM->{attr}; $_[0] = $NUM }
  },
  OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations

$t->s(@all);    # constant folding and mult. by zero

print $t->str,"\n";

when running this example we obtain the following output:

nereida:~/src/perl/YappWithDefaultAction/examples> bypass.pl

************
EXPRESION_LIST(ASSIGN(TERMINAL[a],PLUS(TIMES(NUM[2],UMINUS(NUM[3])),TIMES(VAR[b],NUM[0]))))
************
EXPRESION_LIST(ASSIGN(TERMINAL[a],NUM[-6]))

As you can see the trees are more compact when using the bypass directive.

Explictly building nodes with the `YYBuildAST` method

Sometimes the best time to decorate a node with some attributes is just after being built. In such cases the programmer can take manual control building the node with YYBuildAST to inmediately proceed to decorate it.

The following example illustrates the situation:

Variable:
    %name  VARARRAY
    $ID ('[' binary ']') <%name INDEXSPEC +>
      {
        my $self = shift;
        my $node =  $self->YYBuildAST(@_);
        $node->{line} = $ID->[1];
        return $node;
      }

This example defines the expression to access an array element as an identifier followed by a non empty list of binary expressions. The node corresponding to the list of indices has been named INDEXSPEC.

When no explicit action is inserted a binary node will be built having as first child the node corresponding to the identifier $ID and as second child the reference to the list of binary expressions. However, the programmer wants to decorate the node being built with a line attribute holding the line number in the source code where the identifier being used appears. The call to the Parse::Eyapp::Driver method YYBuildAST does the job of building the node. After that the node can be decorated and returned.

Actually, the %tree directive is semantically equivalent to:

%default action { goto &Parse::Eyapp::Driver::YYBuildAST }

The `child` and `descendant` methods

Access to the children of the AST is achieved through the children and child methods. More general is the descendant method that returns the descendant of a node given its coordinates. See a session with the debugger:

  DB<7> x $t->child(0)->child(0)->child(1)->child(0)->child(2)->child(1)->str
0  '
BLOCK[8:4:test]^{0}(
  CONTINUE[10,10]
)
  DB<8> x $t->descendant('.0.0.1.0.2.1')->str
0  '
BLOCK[8:4:test]^{0}(
  CONTINUE[10,10]

Returning non References Under `%tree`

When a explicit user action returns s.t. that is not a reference no node will be inserted. This fact can be used to supress nodes in the AST being built. See the following example:

nereida:~/src/perl/YappWithDefaultAction/examples> sed -ne '1,11p' returnnonode.yp | cat -n
 1  %tree
 2  %semantic token 'a' 'b'
 3  %%
 4  S:  /* empty */
 5      | S A
 6      | S B
 7  ;
 8  A : 'a'
 9  ;
10  B : 'b' { }
11  ;

since the action at line 10 returns undef the B : 'b' subtree will not be inserted in the AST:

nereida:~/src/perl/YappWithDefaultAction/examples> usereturnnonode.pl
ababa
S_2(S_3(S_2(S_3(S_2(S_1,A_4(TERMINAL[a]))),A_4(TERMINAL[a]))),A_4(TERMINAL[a]))

Observe the absence of Bs and 'b's.

Intermediate actions and `%tree`

Intermediate actions can be used to change the shape of the AST (prune it, decorate it, etc.) but the value returned by them is ignored. The grammar below has two intermediate actions. They modify the attributes of the node to its left and return a reference $f to such node (lines 5 and 6):

nereida:~/src/perl/YappWithDefaultAction/examples> \
         sed -ne '1,10p' intermediateactiontree.yp | cat -n
 1  %semantic token 'a' 'b'
 2  %tree bypass
 3  %%
 4  S:    /* empty */
 5      | S A.f { $f->{attr} = "A"; $f; } A
 6      | S B.f { $f->{attr} = "B"; $f; } B
 7  ;
 8  A : %name A 'a'
 9  ;
10  B : %name B 'b'

See the client program running:

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n useintermediateactiontree.pl
 1  #!/usr/bin/perl -w
 2  use strict;
 3  use Parse::Eyapp;
 4  use intermediateactiontree;
 5
 6  { no warnings;
 7  *A::info = *B::info = sub { $_[0]{attr} };
 8  }
 9
10  my $parser = intermediateactiontree->new();
11  my $t = $parser->Run;
12  print $t->str,"\n";
nereida:~/src/perl/YappWithDefaultAction/examples> useintermediateactiontree.pl
aabbaa
S_2(S_4(S_2(S_1,A[A],A[a]),B[B],B[b]),A[A],A[a])

The attributes of left As have been changed from 'a' to 'A'. However no further children have been inserted.

The `alias` clause of the `%tree` directive

Access to children in Parse::Eyapp is made through the child and children methods. There are occasions however where access by name to the children may be preferable. The use of the alias clause with the %tree directive creates accessors to the children with names specified by the programmer. The dot and dolar notations are used for this. When dealing with a production like:

A: 
   %name A_Node
   Node B.bum N.pum $Chip

methods bum, pum and Chip will be created for the class A_Node. Those methods wil provide access to the respective child (first, second and third in the example). The methods are build at compile-time and therefore later transformations of the AST modifying the order of the children may invalidate the use of these getter-setters.

As an example, the CPAN module Language::AttributeGrammar provides AST decorators from an attribute grammar specification of the AST. To work Language::AttributeGrammar requires named access to the children of the AST nodes. Follows an example of a small calculator:

use Parse::Eyapp;
use Language::AttributeGrammar;

my $grammar = q{
... # priority declarations. Like in previous examples
%tree bypass alias

%%
line: $exp  { $_[1] }
;

exp:
    %name NUM
          $NUM
        | %name VAR
          $VAR
    ............ # as in the bypass example
}; # end grammar

Parse::Eyapp->new_grammar(
  input=>$grammar, classname=>'Rule6', firstline =>7,
);
my $parser = Rule6->new();
$parser->YYData->{INPUT} = "a = -(2*3+5-1)\n";
my $t = $parser->Run;
my $attgram = new Language::AttributeGrammar <<'EOG';
# Compute the expression
NUM:    $/.val = { $<attr> }
TIMES:  $/.val = { $<left>.val * $<right>.val }
PLUS:   $/.val = { $<left>.val + $<right>.val }
MINUS:  $/.val = { $<left>.val - $<right>.val }
UMINUS: $/.val = { -$<exp>.val }
ASSIGN: $/.val = { $<exp>.val }
EOG

my $res = $attgram->apply($t, 'val');

Tree Regular Expressions

Parse::Eyapp introduces a new language called Tree Regular Expressions that easies the transformation of trees. Let us recall the previous example used in the bypass section:

my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
  { #  Example of support code
    my %Op = (PLUS=>'+', MINUS => '-', TIMES=>'*', DIV => '/');
  }
  constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM, NUM)
    => {
      my $op = $Op{ref($_[0])};
      $NUM[0]->{attr} = eval  "$NUM[0]->{attr} $op $NUM[1]->{attr}";
      $_[0] = $NUM[0];
    }
  zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
  whatever_times_zero: TIMES(., NUM) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }
  uminus: UMINUS(NUM) => { $NUM->{attr} = -$NUM->{attr}; $_[0] = $NUM }
  },
  OUTPUTFILE=> 'main.pm'
);
$p->generate(); # Create the tranformations

$t->s(@all);    # constant folding and mult. by zero

The call to the constructor new builds a Parse::Eyapp::Treeregexp object. The subsequent call to the method $p->generate compiles the object producing tree-transformations built according to the specification given in the treeregexp program. A tree transformation is a Parse::Eyapp::YATW object. The example contains four tree programa transformations named constantfold, zero_times_whatever, whatever_times_zero and <uminus>. These transformations can be grouped in transformation families. Such families of transformations can be applied to any Parse::Eyapp::Node trees. An special variable @PACKAGE::all refers to the whole set of transformations in the program. Here PACKAGE refers to the package where the transformations live. When no PACKAGE argument is specified in the call to new - as is the case in this example - the package of the caller is used instead. The call $t->s(@all) proceeds to the execution of the method s (for substitution) using all the specified transformations. The transformations will be iteratively applied to all nodes of the tree until there are no changes. Summarizing, that means that

All UMINUS nodes whose only child is a number NUM will be substituted by the NUM node but with the sign changed
Constant folding will be applied: trees representing constants expressions will be substituted by a NUM node representing its value
All the TIMES nodes with one child holding the value 0 will be substituted by that child

The Syntax of Treeregexp

The example illustrates the syntax of the language. A tree transformation conforms to the syntax:

treeregexp: 
    IDENT ':' treereg ('and' CODE)?  ('=>' CODE)?

like in:

zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }

The IDENT is the name given to the tree transformation. A tree transformation is actually a Parse::Eyapp::YATW object. After generation time two package objects are created per transformation:

A subroutine with name zero_times_whatever holding the actual code for the tree transformation will be available and
A scalar variable named $zero_times_whatever will refer to the Parse::Eyapp::YATW tree transformation object.

These names live in the package specified by the user in the call to new through the PACKAGE argument. When no package name is specified the name of the caller package is used instead.

After the IDENT and the colon comes the treeregexp. The treeregexp is a term, that is a parenthesized description of the shape of the tree like TIMES(NUM, .) which says: match nodes of type TIMES whose left child is a NUM and whose right child is whatever. The dot stands for whatever and is a treeregexp that matches any node.

Then comes the reserved word and and some Perl code specifying the semantic conditions for the node being visited to match

{ $NUM->{attr} == 0 }

The code can access to the different subtrees using lexical variables whose names match the type of the node. Thus, in the example:

zero_times_whatever: TIMES(NUM, .) and { $NUM->{attr} == 0 } => { $_[0] = $NUM }

variable $NUM refers to the left child while variables $TIMES and $_[0] will refer refer to the node being visited. When more than one node of the same type exists (for instance TIMES(NUM,NUM)) the associated lexical variable changes its type from scalar to array and thus if several NUM nodes appear in the term we will speak about $NUM[0], $NUM[1], etc.

Separated Compilation with `treereg`

A Treeregexp program can be isolated in a file an compiled with the program treereg. The default extension is .trg. See the following example:

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n Shift.trg
 1  # File: Shift.trg
 2  {
 3    sub log2 {
 4      my $n = shift;
 5      return log($n)/log(2);
 6    }
 7
 8    my $power;
 9  }
10  mult2shift: TIMES($e, NUM($m)) and { $power = log2($m->{attr}); (1 << $power) == $m->{attr} }
11    => {
12      $_[0]->delete(1);
13      $_[0]->{shift} = $power;
14      $_[0]->type('SHIFTLEFT');
15    }

Note that auxiliary support code can be inserted at any point between transformations (lines 2-6). The code will be inserted (without the defining curly brackets) at that point. Note also that the lexical variable $power is visible inside the definition of the mult2shift transformation.

A treeregexp like $e matches any node. A reference to the node is saved in the lexical variable $e. The scope of the variable $e is the current tree transformation, i.e. mult2shift. Such kind of treeregexps are called scalar treeregexps.

The call to the delete method at line 12 deletes the second child of the node being visited (i.e. NUM($m)).

The call to type at line 14 retypes the node as a SHIFTLEFT node.

The program is compiled using the script treereg:

nereida:~/src/perl/YappWithDefaultAction/examples> treereg Shift
nereida:~/src/perl/YappWithDefaultAction/examples> ls -ltr | tail -1
-rw-rw----  1 pl users   1405 2006-11-06 14:09 Shift.pm

The module Shift.pm contains the code implementing the tree transformations.

The client program follows:

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n useruleandshift.pl
 1  #!/usr/bin/perl -w
 2  use strict;
 3  use Rule6;
 4  use Shift;
 5  { no warnings; *TERMINAL::info = \&TERMINAL::attr; }
 6
 7  push @SHIFTLEFT::ISA, 'Parse::Eyapp::Node';
 8  sub SHIFTLEFT::info { $_[0]{shift} }
 9
10  my $parser = new Rule6();
11  $parser->YYData->{INPUT} = <>;
12  my $t = $parser->Run;
13  print "***********\n",$t->str,"\n";
14  $t->s(@Shift::all);
15  print "***********\n",$t->str,"\n";

Lines 5 and 8 provide the node classes TERMINAL and SHIFTLEFT of info methods to be used by str (lines 13 and 14). To make SHIFTLEFT a node class it has to inherit from Parse::Eyapp::Node (line 7).

Multiplications by a power of two are substituted by the corresponding shifts:

nereida:~/src/perl/YappWithDefaultAction/examples> useruleandshift.pl
a=b*8
***********
ASSIGN(TERMINAL[a],TIMES(VAR(TERMINAL[b]),NUM(TERMINAL[8])))
***********
ASSIGN(TERMINAL[a],SHIFTLEFT[3](VAR(TERMINAL[b])))

Regexp Treeregexps

We can use an ordinary regular expression regexp inside the term part of a treeregexp. The constantfold transformation in the Synopsis example shows how:

constantfold: /TIMES|PLUS|DIV|MINUS/:bin(NUM($x), NUM($y))
  => {
    my $op = $Op{ref($bin)};
    $x->{attr} = eval  "$x->{attr} $op $y->{attr}";
    $_[0] = $NUM[0];
  }

The regexp is specified between division slashes /. It is legal to specify options after the second slash (like e, i, etc.). The optional identifier bin after the regexp indicates the name for the lexical variable holding a copy that references the node. If no identifier is specified, the special variable $W is used instead. If the treeregexp has several anonymous regexp or dot treeregexps they will be stored in the array variable @W.

The operation of the ordinary string oriented regexps are slightly modified when they are used inside a treeregexp. by default the option x will be assumed. The treeregexp compiler will automatically insert it. Use the new option X (upper case X) if you want to supress such behavior. There is no need also to insert \b word anchors to delimit identifiers: all the identifiers in a regexp treeregexp are automatically surrounded by \b. Use the option B (upper case B) to supress this behavior.

The following fragment of the type checking stage of a simple compiler shows that x is implictly assumed:

# Binary Operations
bin: / PLUS
      |MINUS
      |TIMES
      |DIV
      |MOD
      |GT
      |GE
      |LE
      |EQ
      |NE
      |LT
      |AND
      |EXP
      |OR
     /($x, $y)
  => {
    $x = char2int($_[0], 0);
    $y = char2int($_[0], 1);

    if (($x->{t} == $INT) and ( $y->{t} == $INT)) {
      $_[0]->{t} = $INT;
      return 1;
    }
    type_error("Incompatible types with operator '".($_[0]->lexeme)."'", $_[0]->line);
  }

With the natural Perl regexp semantic the language reserved word WHILE would match the regexp (see the LE for less or equal) leading to an erroneous type checking. The automatic insertion of word anchors prevent it.

Matching Trees

Both the transformation objects in Parse::Eyapp::YATW and the nodes in Parse::Eyapp::Node have a method named m for matching.

For a Parse::Eyapp::YATW object, the method -when called in a list context- returns a list of Parse::Eyapp::Node::Match nodes referencing the nodes of the actual tree that have matched. The nodes in the list are organized in a hierarchy.

The nodes are sorted in the list of trees (a forest) according to a depth-first visit of the actual tree $t.

In a scalar context m returns the first element of the list.

Let us denote by $t the actual tree being searched and $r one of the Parse::Eyapp::Node::Match nodes in the resulting forest. Then we have the following methods:

The method $r->node return the node $t of the actual tree that matched
The method $r->father returns the tree in the matching forest. The father is defined by this property: $r->father->node is the nearest ancestor of $r->node that matched with the treeregexp pattern. That is, there is no ancestor that matched between $r->node and $r->father->node. Otherwise $r->father is undef
The method $r->coord returns the coordinates of the actual tree that matched using s.t similar to the Dewey notation. for example, the coordinate ".1.3.2" denotes the node $t->child(1)->child(3)->child(2), where $t is the root of the search.
The method $r->depth returns the depth of $r->node in $t.
When called as a Parse::Eyapp::Node method, $r->names returns the array of names of the transformations that matched.

The following example illustrates a use of m as a Parse::Eyapp:YATW method. It solves a problem of scope analysis in a C compiler: matching each RETURN statement with the function that surrounds it. The treeregexp used is:

retscope: /FUNCTION|RETURN/

and the code that solves the problem is:

# Scope Analysis: Return-Function
my @returns = $retscope->m($t); 
for (@returns) {
  my $node = $_->node;
  if (ref($node) eq 'RETURN') {
    my $function = $_->father->node;
    $node->{function}  = $function;
    $node->{t} = $function->{t};
  }
}

The first line gets a list of Parse::Eyapp::Node::Match nodes describing the actual nodes that matched /FUNCTION|RETURN/. If the node described by $_ is a 'RETURN' node, the expresion $_->father->node must necessarily point to the function node that surrounds it.

The second example shows the use of m as a Parse::Eyapp::Node method.

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n m2.pl
 1  #!/usr/bin/perl -w
 2  use strict;
 3  use Rule6;
 4  use Parse::Eyapp::Treeregexp;
 5
 6  Parse::Eyapp::Treeregexp->new( STRING => q{
 7    fold: /times|plus|div|minus/i:bin(NUM($n), NUM($m))
 8    zero_times_whatever: TIMES(NUM($x), .) and { $x->{attr} == 0 }
 9    whatever_times_zero: TIMES(., NUM($x)) and { $x->{attr} == 0 }
10  })->generate();
11
12  # Syntax analysis
13  my $parser = new Rule6();
14  print "Expression: "; $parser->YYData->{INPUT} = <>;
15  my $t = $parser->Run;
16  local $Parse::Eyapp::Node::INDENT = 1;
17  print "Tree:",$t->str,"\n";
18
19  # Search
20  my $m = $t->m(our ($fold, $zero_times_whatever, $whatever_times_zero));
21  print "Match Node:",$m->str,"\n";

When executed with input 0*0*0 the program generates this output:

nereida:~/src/perl/YappWithDefaultAction/examples> m2.pl
Expression: 0*0*0
Tree:
TIMES(
  TIMES(
    NUM(
      TERMINAL
    ),
    NUM(
      TERMINAL
    )
  ),
  NUM(
    TERMINAL
  )
)
Match Node:
Match[TIMES:0:whatever_times_zero](
  Match[TIMES:1:fold,zero_times_whatever,whatever_times_zero]
)

The representation of Match nodes by str deserves a comment. Match nodes have their own info method. It returns a string containing the concatenation of the class of $r->node (i.e. the actual node that matched), the depth ($r->depth) and the names of the transformations that matched (as provided by the method $r->names)

The `SEVERITY` option of `Parse::Eyapp::Treeregexp::new`

The SEVERITY option of Parse::Eyapp::Treeregexp::new controls the way matching succeeds regarding the number of children. To illustrate its use let us consider the following example. The grammar Rule6 used by the example is similar to the one in the Synopsis example.

nereida:~/src/perl/YappWithDefaultAction/examples> cat -n numchildren.pl
 1  #!/usr/bin/perl -w
 2  use strict;
 3  use Rule6;
 4  use Parse::Eyapp::Treeregexp;
 5  use Parse::Eyapp::Node;
 6
 7  sub TERMINAL::info { $_[0]{attr} }
 8
 9  my $severity = shift || 0;
10  my $parser = new Rule6();
11  $parser->YYData->{INPUT} = shift || '0*2';
12  my $t = $parser->Run;
13
14  my $transform = Parse::Eyapp::Treeregexp->new(
15    STRING => q{
16      zero_times_whatever: TIMES(NUM($x)) and { $x->{attr} == 0 } => { $_[0] = $NUM }
17    },
18    SEVERITY => $severity,
19    FIRSTLINE => 15,
20  )->generate;
21
22  $t->s(our @all);
23
24  print $t->str,"\n";

The program gets the severity level from the command line (line 9). The specification of the term TIMES(NUM($x)) inside the transformation zero_times_whatever does not clearly state that TIMES must have two children. There are several interpretations of the treregexp depending on the level fixed for SEVERITY:

0: TIMES must have at least one child. Don't care if it has more.
1: TIMES must have exactly one child.
2: TIMES must have exactly one child. When visit a TIMES node with a different number of children issue a warning.
3: TIMES must have exactly one child. When visit a TIMES node with a different number of children issue an error.

Observe the change in behavior according to the level of SEVERITY:

nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 0 '0*2'
NUM(TERMINAL[0])
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 1 '0*2'
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 2 '0*2'
Warning! found node TIMES with 2 children.
Expected 1 children (see line 16 of numchildren.pl)"
TIMES(NUM(TERMINAL[0]),NUM(TERMINAL[2]))
nereida:~/src/perl/YappWithDefaultAction/examples> numchildren.pl 3 '0*2'
Error! found node TIMES with 2 children.
Expected 1 children (see line 16 of numchildren.pl)"
 at (eval 2) line 29

Array Treeregexp Expressions

The Treeregexp language permits expressions like:

A(@a,B($x),@c)

After the matching variable @A contains the shortest prefix of $A->children that does not match B($x). The variable @c contains the remaining sufix of $A->children.

The following example uses array treereg expressions to move an assignment out of loop (to be correct, we have to guarantee that the assignment is an invariant of the loop). See lines 98-111:

nereida:~/src/perl/YappWithDefaultAction/examples> \
      cat -n moveinvariantoutofloopcomplexformula.pl
  1  #!/usr/bin/perl -w
  2  use strict;
  3  use Parse::Eyapp;
  4  use Parse::Eyapp::Treeregexp;
  5
  6  my $grammar = q{
 ..      ............ # as usual, but including WHILE loops
 80  }; # end grammar
 ..  ......................................................................
 93  my $program = "a =1000; c = 1; while (a) { c = c*a; b = 5; a = a-1 }\n";
 94  $parser->YYData->{INPUT} = $program;
 95  my $t = $parser->Run;
 96  my @output = split /\n/, $t->str;
 97
 98  my $p = Parse::Eyapp::Treeregexp->new( STRING => q{
 99    moveinvariant: BLOCK(
100                     @prests,
101                     WHILE(VAR($b), BLOCK(@a, ASSIGN($x, NUM($e)), @c)),
102                     @possts
103                   )
104      => {
105           my $assign = $ASSIGN;
106           $BLOCK[1]->delete($ASSIGN);
107           $BLOCK[0]->insert_before($WHILE, $assign);
108         }
109    },
110    FIRSTLINE => 99,
111  );
112  $p->generate();
113  $moveinvariant->s($t);
114  my @output2 = split /\n/, $t->str;
115
116  my ($node1, $node2);
117  format STDOUT_TOP =
118                          PROGRAM
119  -------------------------------------------------------
120  @||||||||||||||||||||||||||||||||||||||||||||||||||||||
121  $program
122  -------------------------------------------------------
123  Before                     |    After
124  ---------------------------|---------------------------
125  .
126
127  format STDOUT =
128  @<<<<<<<<<<<<<<<<<<<<<<<<<<@|@<<<<<<<<<<<<<<<<<<<<<<<<<
129  $node1,                    '|',$node2
130  .
131
132  for (1..$#output) {
133    $node1 = $output[$_];
134    $node2 = $output2[$_];
135    write;
136  }

The call to the method delete at line 106 deletes the ASSIGN child of the second BLOCK. The copy saved in $assign is inserted as a child of the first block before the loop. Here is the output:

nereida:~/src/perl/YappWithDefaultAction/examples> \
      moveinvariantoutofloopcomplexformula.pl | cat -n
 1                          PROGRAM
 2  -------------------------------------------------------
 3   a =1000; c = 1; while (a) { c = c*a; b = 5; a = a-1 }
 4  -------------------------------------------------------
 5  Before                     |    After
 6  ---------------------------|---------------------------
 7  BLOCK(                     | BLOCK(
 8    ASSIGN(                  |   ASSIGN(
 9      TERMINAL[a],           |     TERMINAL[a],
10      NUM(                   |     NUM(
11        TERMINAL[1000]       |       TERMINAL[1000]
12      )                      |     )
13    ) # ASSIGN,              |   ) # ASSIGN,
14    ASSIGN(                  |   ASSIGN(
15      TERMINAL[c],           |     TERMINAL[c],
16      NUM(                   |     NUM(
17        TERMINAL[1]          |       TERMINAL[1]
18      )                      |     )
19    ) # ASSIGN,              |   ) # ASSIGN,
20    WHILE(                   |   ASSIGN(
21      VAR(                   |     TERMINAL[b],
22        TERMINAL[a]          |     NUM(
23      ),                     |       TERMINAL[5]
24      BLOCK(                 |     )
25        ASSIGN(              |   ) # ASSIGN,
26          TERMINAL[c],       |   WHILE(
27          TIMES(             |     VAR(
28            VAR(             |       TERMINAL[a]
29              TERMINAL[c]    |     ),
30            ),               |     BLOCK(
31            VAR(             |       ASSIGN(
32              TERMINAL[a]    |         TERMINAL[c],
33            )                |         TIMES(
34          ) # TIMES          |           VAR(
35        ) # ASSIGN,          |             TERMINAL[c]
36        ASSIGN(              |           ),
37          TERMINAL[b],       |           VAR(
38          NUM(               |             TERMINAL[a]
39            TERMINAL[5]      |           )
40          )                  |         ) # TIMES
41        ) # ASSIGN,          |       ) # ASSIGN,
42        ASSIGN(              |       ASSIGN(
43          TERMINAL[a],       |         TERMINAL[a],
44          MINUS(             |         MINUS(
45            VAR(             |           VAR(
46              TERMINAL[a]    |             TERMINAL[a]
47            ),               |           ),
48            NUM(             |           NUM(
49              TERMINAL[1]    |             TERMINAL[1]
50            )                |           )
51          ) # MINUS          |         ) # MINUS
52        ) # ASSIGN           |       ) # ASSIGN
53      ) # BLOCK              |     ) # BLOCK
54    ) # WHILE                |   ) # WHILE
55  ) # BLOCK                  | ) # BLOCK

Translation Schemes

Eyapp allows through the %metatree directive the creation of Translation Schemes as described in the Dragon's book. Instead of executing the semantic actions associated with the productions, the syntax tree is built. Semantic actions aren't executed. Instead they are inserted as nodes of the syntax tree. The main difference with ordinary nodes being that the attribute of such a CODE node is a reference to the anonymous subroutine representing the semantic action. The tree is later traversed in depth-first order using the $t->translation_scheme method: each time a CODE node is visited the action is executed.

The following example parses a tiny subset of a typical typed language and decorates the syntax tree with a new attribute t holding the type of each declared variable:

use strict; # File examples/trans_scheme_simple_decls4.pl
use Data::Dumper;
use Parse::Eyapp;
our %s; # symbol table

my $ts = q{ 
  %token FLOAT INTEGER NAME

  %{
  our %s;
  %}

  %metatree

  %%
  Dl:  D <* ';'>
  ;

  D : $T { $L->{t} = $T->{t} } $L
  ;

  T : FLOAT    { $lhs->{t} = "FLOAT" }
    | INTEGER  { $lhs->{t} = "INTEGER" }
  ;

  L : $NAME
        { $NAME->{t} = $lhs->{t}; $s{$NAME->{attr}} = $NAME }
    | $NAME { $NAME->{t} = $lhs->{t}; $L->{t} = $lhs->{t} } ',' $L
        { $s{$NAME->{attr}} = $NAME }
  ;
  %%
}; # end $ts

sub Error { die "Error sintáctico\n"; }

{ # Closure of $input, %reserved_words and $validchars
  my $input = "";
  my %reserved_words = ();
  my $validchars = "";

  sub parametrize__scanner {
    $input = shift;
    %reserved_words = %{shift()};
    $validchars = shift;
  }

  sub scanner {
    $input =~ m{\G\s+}gc;                     # skip whites
    if ($input =~ m{\G([a-z_A_Z]\w*)\b}gc) {
      my $w = uc($1);                 # upper case the word
      return ($w, $w) if exists $reserved_words{$w};
      return ('NAME', $1);            # not a reserved word
    }
    return ($1, $1) if ($input =~ m/\G([$validchars])/gc);
    die "Not valid token: $1\n" if ($input =~ m/\G(\S)/gc);
    return ('', undef); # end of file
  }
} # end closure

Parse::Eyapp->new_grammar(input=>$ts,classname=>'main',outputfile=>'Types.pm');
my $parser = main->new(yylex => \&scanner, yyerror => \&Error); 

parametrize__scanner(
  "float x,y;\ninteger a,b\n",
  { INTEGER => 'INTEGER', FLOAT => 'FLOAT'},
  ",;"
);

my $t = $parser->YYParse() or die "Syntax Error analyzing input";

$t->translation_scheme;

$Data::Dumper::Indent = 1;
$Data::Dumper::Terse = 1;
$Data::Dumper::Deepcopy  = 1;
$Data::Dumper::Deparse = 1;
print Dumper($t);
print Dumper(\%s);

Inside a Translation Scheme the lexical variable $lhs refers to the attribute of the father.

Execution Stages of a Translation Scheme

The execution of a Translation Scheme can be divided in the following stages:

1. During the first stage the grammar is analyzed and the parser is built:

Parse::Eyapp->new_grammar(input=>$ts,classname=>'main',outputfile=>'Types.pm');

This stage is called Class Construction Time

2. A parser conforming to the generated grammar is built

my $parser = main->new(yylex => \&scanner, yyerror => \&Error);

This stage is called Parser Construction Time

3. The next phase is Tree construction time. The input is set and the tree is built:

parametrize__scanner(
   "float x,y;\ninteger a,b\n",
   { INTEGER => 'INTEGER', FLOAT => 'FLOAT'},
   ",;"
 );

 my $t = $parser->YYParse() or die "Syntax Error analyzing input";

4. The last stage is Execution Time. The tree is traversed in depth first order and the CODE nodes are executed.

$t->translation_scheme;

This combination of bottom-up parsing with depth first traversin leads to a semantic behavior similar to LL and top-down parsers but with several differences:

The grammar can be left-recursive
At the time of executing the action the syntax tree is already built, therefore we can refer to nodes on the right side of the action like in:
```
D : $T { $L->{t} = $T->{t} } $L
```

The `%begin` directive

The %begin { code } directive can be used when building a translation scheme, i.e. when under the control of the %metatree directive. It indicates that such code will be executed at tree construction time. Therefore the code receives as arguments the references to the nodes of the branch than is being built. Usually the code assist in the construction of the tree. Line 39 of the following code shows an example. The action { $exp } simplifies the syntax tree bypassing the parenthesis node. The example also illustrates the combined use of default actions and translation schemes.

nereida:~/src/perl/YappWithDefaultAction/examples> \
               cat -n trans_scheme_default_action.pl
  1  #!/usr/bin/perl -w
  2  use strict;
  3  use Data::Dumper;
  4  use Parse::Eyapp;
  5  use IO::Interactive qw(is_interactive);
  6
  7  my $translationscheme = q{
  8  %{
  9  # head code is available at tree construction time
 10  use Data::Dumper;
 11  our %sym; # symbol table
 12  %}
 13
 14  %defaultaction { $lhs->{n} = eval " $left->{n} $_[2]->{attr} $right->{n} " }
 15
 16  %metatree
 17
 18  %right   '='
 19  %left   '-' '+'
 20  %left   '*' '/'
 21
 22  %%
 23  line:       %name EXP
 24                exp <+ ';'> /* Expressions separated by semicolons */
 25                  { $lhs->{n} = $_[1]->Last_child->{n} }
 26  ;
 27
 28  exp:
 29              %name PLUS
 30                exp.left '+' exp.right
 31          |   %name MINUS
 32                exp.left '-' exp.right
 33          |   %name TIMES
 34                exp.left '*' exp.right
 35          |   %name DIV
 36                exp.left '/' exp.right
 37          |   %name NUM   $NUM
 38                  { $lhs->{n} = $NUM->{attr} }
 39          |   '(' $exp ')'  %begin { $exp }
 40          |   %name VAR
 41                $VAR
 42                  { $lhs->{n} = $sym{$VAR->{attr}}->{n} }
 43          |   %name ASSIGN
 44                $VAR '=' $exp
 45                  { $lhs->{n} = $sym{$VAR->{attr}}->{n} = $exp->{n} }
 46
 47  ;
 48
 49  %%
 50  # tail code is available at tree construction time
 51  sub _Error {
 52    die "Syntax error.\n";
 53  }
 54
 55  sub _Lexer {
 56      my($parser)=shift;
 57
 58      for ($parser->YYData->{INPUT}) {
 59          defined($_) or  return('',undef);
 60
 61          s/^\s*//;
 62          s/^([0-9]+(?:\.[0-9]+)?)// and return('NUM',$1);
 63          s/^([A-Za-z][A-Za-z0-9_]*)// and return('VAR',$1);
 64          s/^(.)// and return($1,$1);
 65          s/^\s*//;
 66      }
 67  }
 68
 69  sub Run {
 70      my($self)=shift;
 71      return $self->YYParse( yylex => \&_Lexer, yyerror => \&_Error );
 72  }
 73  }; # end translation scheme
 74
 75  $Data::Dumper::Indent = 1;
 76  $Data::Dumper::Terse = 1;
 77  $Data::Dumper::Deepcopy  = 1;
 78  my $p = Parse::Eyapp->new_grammar(
 79    input=>$translationscheme,
 80    classname=>'main',
 81    firstline => 6,
 82    outputfile => 'main.pm');
 83  die $p->qtables() if $p->Warnings;
 84  my $parser = main->new();
 85  print "Write a sequence of arithmetic expressions: " if is_interactive();
 86  $parser->YYData->{INPUT} = <>;
 87  my $t = $parser->Run() or die "Syntax Error analyzing input";
 88  $t->translation_scheme;
 89  my $treestring = Dumper($t);
 90  our %sym;
 91  my $symboltable = Dumper(\%sym);
 92  print <<"EOR";
 93  ***********Tree*************
 94  $treestring
 95  ******Symbol table**********
 96  $symboltable
 97  ************Result**********
 98  $t->{n}
 99
100  EOR

Scope Analysis with `Parse::Eyapp::Scope`

Parse::Eyapp provides support for Scope Analysis through the module Parse::Eyapp::Scope. Scope Analysis solves the problem of matching each instance or use of an object in the source text with the definition that applies to such instance. Since it is a matching problem it can sometimes easily solved using m as it was explained in section "Matching Trees".

The following pieces of code show how to implement scope analysis for a C-like language using Parse::Eyapp::Scope

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                          sed -n -e '131,149p' Types.eyp | cat -n
 1  sub reset_file_scope_vars {
 2    %st = (); # reset symbol table
 3    ($tokenbegin, $tokenend) = (1, 1);
 4    %type = ( INT  => Parse::Eyapp::Node->hnew('INT'),  # like new but
 5              CHAR => Parse::Eyapp::Node->hnew('CHAR'), # creates a DAG
 6              VOID => Parse::Eyapp::Node->hnew('VOID'), 
 7            );
 8    $depth = 0;
 9    $ids = Parse::Eyapp::Scope->new(
10             SCOPE_NAME => 'block',
11             ENTRY_NAME => 'info',
12             SCOPE_DEPTH => 'depth',
13    );
14    $loops = Parse::Eyapp::Scope->new(
15             SCOPE_NAME => 'exits',
16    );
17    $ids->begin_scope();
18    $loops->begin_scope(); 
19  }

Of course you have to include a directive

use Parse::Eyapp::Scope

in your client program.

The calls to Parse::Eyapp::Scope->new method (lines 9-13 and 14-16 in the code above) create two Scope Manager objects. One scope manager to solve the scope problem for variables ($ids) and another to solve the scope problem for loops ($loops). The scope problem for loops consists in matching each instance of a BREAK or CONTINUE with the enclosing loop. The beginning of a scope is set by calling to the begin_scope method (lines 17 and 18). The end of a scope is signalled by a call to the method end_scope. Of course, sub reset_file_scope_vars must be executed at the proper time:

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                            sed -n -e '170,203p' Types.eyp | cat -n
 1  program: /* program -> definition +  */
 2        {
 3          reset_file_scope_vars();
 4        }
 5      definition<%name PROGRAM +>.program
 6        {
 7          $program->{symboltable} = { %st };  # creates a copy of the s.t.
 8          $program->{depth} = 0;
 9          $program->{line}  = 1;
10          $program->{types} = { %type };
11          $program->{lines} = $tokenend;
12
13          my ($nondec, $declared) = $ids->end_scope($program->{symboltable}, $program, 'type');
14
15          # Type checking: add a direct pointer to the data-structure
16          # describing the type
17          $_->{t} = $type{$_->{type}} for @$declared;
18
19          if (@$nondec) {
20            warn "Identifier ".$_->key." not declared at line ".$_->line."\n" for @$nondec;
21            die "\n";
22          }
23
24          my $out_of_loops = $loops->end_scope($program);
25          if (@$out_of_loops) {
26            warn "Error: ".ref($_)." outside of loop at line $_->{line}\n" for @$out_of_loops;
27            die "\n";
28          }
29
30          # Check that are not dangling breaks
31          reset_file_scope_vars();
32
33          $program;
34        }

Observe the different ways of calling end_scope (lines 13 and 24). When a hash table is provided as first argument the declared symbols will be automatically inserted in it. In such case the classes of the nodes being inserted must have a key method that computes the key for such node.

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                           sed -n -e '651,657p' Types.eyp | cat -n
  1  sub VAR::key {
  2    my $self = shift;
  3
  4    return $self->child(0)->{attr}[0];
  5  }
  6
  7  *VARARRAY::key = *FUNCTIONCALL::key = \&VAR::key;

Each instance of an scoped object must be declared as belonging to the current scope using the scope_instance method. The following is an example for the $loops scope manager object:

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                           sed -n -e '335,346p' Types.eyp | cat -n
    1  statement:
    2      expression ';' { $_[1] }
    3    | ';'
    4    | %name BREAK
    5      $BREAK ';'
    6        {
    7          my $self = shift;
    8          my $node = $self->YYBuildAST(@_);
    9          $node->{line} = $BREAK->[1];
   10          $loops->scope_instance($node);
   11          return $node;
   12        }

and the following illustrates the same for the $ids scope manager:

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                            sed -n -e '410,425p' Types.eyp | cat -n
    1  Primary:
    2      %name INUM
    3      INUM
    4    | %name CHARCONSTANT
    5      CHARCONSTANT
    6    | $Variable
    7        {
    8          $ids->scope_instance($Variable);
    9          return $Variable
   10        }
   11    | '(' expression ')' { $_[2] }
   12    | $function_call
   13        {
   14          $ids->scope_instance($function_call);
   15          return $function_call  # bypass
   16        }

Of course, in each place where a new scope begins/ends the corresponding calls to begin_scope and end_scope must be issued. See the following code:

nereida:~/doc/casiano/PLBOOK/PLBOOK/code/Simple-Types/lib/Simple> \
                            sed -n -e '277,302p' Types.eyp | cat -n
  1  block:  /* Production is: block -> '{' declaration * statement * '}' */
  2      '{'.bracket
  3         { $ids->begin_scope(); }
  4       declaration<%name DECLARATIONS *>.decs statement<%name STATEMENTS *>.sts '}'
  5         {
  6           my %st;
  7
  8           for my $lst ($decs->children) {
  9
 10               # control duplicated declarations
 11             my $message;
 12             die $message if $message = is_duplicated(\%st, $lst);
 13
 14             %st = (%st, %$lst);
 15           }
 16           $sts->{symboltable} = \%st;
 17           $sts->{line} = $bracket->[1];
 18           $sts->type("BLOCK") if (%st);
 19           my ($nondec, $dec) = $ids->end_scope(\%st, $sts, 'type');
 20
 21           # Type checking: add a direct pointer to the data-structure
 22           # describing the type
 23           $_->{t} = $type{$_->{type}} for @$dec;
 24
 25           return $sts;
 26         }

REFERENCES

The classic book "Compilers: Principles, Techniques, and Tools" by Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman (Addison-Wesley 1986)

AUTHOR

Casiano Rodriguez-Leon (casiano@ull.es)

A large percentage of code is verbatim taken from Parse::Yapp 1.05. The author of Parse::Yapp is Francois Desarmenien.

ACKNOWLEDGMENTS

This work has been supported by CEE (FEDER) and the Spanish Ministry of Educación y Ciencia through Plan Nacional I+D+I number TIN2005-08818-C04-04 (ULL::OPLINK project http://www.oplink.ull.es/). Support from Gobierno de Canarias was through GC02210601 (Grupos Consolidados). The University of La Laguna has also supported my work in many ways and for many years.

I wish to thank Francois Desarmenien for his Parse::Yapp module, to my students at La Laguna and to the Perl Community. Special thanks to my family and Larry Wall.

LICENCE AND COPYRIGHT

These modules are free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

3 POD Errors

The following errors were encountered while parsing the POD:

Around line 820:: Expected '=item *'
Around line 823:: Expected '=item *'
Around line 1443:: Non-ASCII character seen before =encoding in 'sintáctico\n";'. Assuming CP1252

To install Parse::Eyapp, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Parse::Eyapp

CPAN shell

perl -MCPAN -e shell
install Parse::Eyapp

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)

NAME

VERSION

SYNOPSIS

Introduction to Parse::Eyapp

Input from strings

Names for attributes

Lists and Optionals

Default actions

Compiling with eyapp

Abstract Syntax Trees

Displaying Trees

TERMINAL nodes

User Attributes and System Attributes

Syntactic and Semantic tokens

Saving the Information In Syntactic Tokens

The directives %syntactic token and %semantic token

The bypass clause and the %no bypass directive

Explictly building nodes with the YYBuildAST method

The child and descendant methods

Returning non References Under %tree

Intermediate actions and %tree

The alias clause of the %tree directive