The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Parse::Stallion - EBNF based regexp backtracking parser and tree evaluator.

SYNOPSIS

  use Parse::Stallion;

  my %rules = (rule_name_1 => ..rule_definition.. ,
   rule_name_2 => ..rule_definition.. ,
   ...);

  my $stallion = new Parse::Stallion(
    \%rules,
     # the following parameters are optional
   {start_rule => 'rule_name_1', #default the rule which is not a subrule
    do_evaluation_in_parsing => 0, #default 0
    max_steps => 200000, #default 1000000;
    do_not_compress_eval => 0, #default 0
    separator => '__XZ__', #default '__XZ__'
    parse_forward => sub {...}, #default no sub
    parse_backtrack => sub {...}, #default no sub
  });

  my $parse_info_hash = {}; # optional, little impact on performance
  my $parse_trace = []; # optional, some impact on performance
  my $result = $stallion->parse_and_evaluate($given_string,
    # usually omit the following
   {max_steps => 30000, #default from parser
    parse_info => $parse_info_hash, #if provided, parse info returned
    trace => $parse_trace # if provided, trace returned
   });
  # returns undef if unable to parse

Rule Definitions (may be abbreviated to first letter):

  AND('subrule_1', 'subrule_2', ..., EVALUATION(sub{...}))

  OR('subrule_1', 'subrule_2', ..., EVALUATION(sub{...}))

  MULTIPLE('subrule_1', EVALUATION(sub{...}))

  LEAF(qr/regex/, EVALUATION(sub{...}))

DESCRIPTION

Stallion parses and evaluates a string using entered grammar rules. The parsing is done top-down via an initial start rule, in a depth first search forming a parse tree. When a rule does not match the parser backtracks to a node that has another option.

For evaluating a tree node, the evaluation subroutine is given a reference to a hash representing the returned values of the child nodes. The evaluation may be done while creating the parse tree and reject a match affecting which strings parse. This allows complex grammars.

If the evaluation is not done while parsing, on a successful parse, the tree is evaluated in bottom up, left to right order.

The grammars recognized are context free and are similar to those expressed in Extended Backus-Naur Form (EBNF).

The object being parsed does not need to be a string. Except for the section on non-strings, the documentation assumes strings are being parsed.

COMPLETE EXAMPLES

The following examples read in two unsigned integers and adds them.

  use Parse::Stallion;

   my %basic_grammar = (
    expression =>
     AND('number',
      qr/\s*\+\s*/,
      'number',
      EVALUATION(
       sub {return $_[0]->{number}->[0] + $_[0]->{number}->[1]})
    ),
    number => LEAF(qr/\d+/,
      E(sub{return 0 + $_[0];}))
     #0 + $_[0] converts the matched string into a number
   );

   my $parser = new Parse::Stallion(\%basic_grammar);

   my $result = $parser->parse_and_evaluate('7+4');
   #$result should contain 11

   my %grammar_2 = (
    expression =>
     A('number',
      qr/\s*\+\s*/,
      {right_number => 'number'},
      E(sub {return $_[0]->{number} + $_[0]->{right_number}})
    ),
    number => L(qr/\d+/,
      EVALUATION(sub{return 0 + $_[0];}))
   );

   my $parser_2 = new Parse::Stallion(
    \%grammar_2, {start_rule => 'expression'});

   my $result_2 = $parser_2->parse_and_evaluate('8 + 5');
   #$result_2 should contain 13

   use Parse::Stallion::EBNF; #see documenation on Parse::Stallion::EBNF

   my $grammar_3 = 'start = (left.number qr/\s*\+\s*/ right.number)
      S{return $_[0]->{left} + $_[0]->{right}}S;
     number = qr/\d+/;';

   my $parser_3 = ebnf_new Parse::Stallion::EBNF($grammar_3);

   my $result_3 = $parser_3->parse_and_evaluate('1 + 6');
   #$result_3 should contain 7

RULES

There are 4 rule types: 'LEAF', 'AND', 'OR', and 'MULTIPLE'.

Parsing begins from the start rule, if the 'start_rule' parameter is omitted, the rule which is not a subrule is used as the start rule. The start rule can be of any type, though if the start rule is a 'LEAF', the grammar is essentially just a regular expression.

After a successful parse, the external nodes correspond to the substrings that the 'LEAF' rules matched; the other rule types correspond to the internal nodes.

LEAF

A 'LEAF' rule contains a regexp that must match the beginning part of the remaining input string, internally \A is prepended to the regexp. During parsing, when a 'LEAF' matches, the matched substring is removed from the input string, though reattached if backtracking occurs. The text LEAF is optional, regexp's are assumed to be leaves, but then there can be no EVALUATION routine for that leaf.

If the regexp matches but is empty, the matched value is set to an empty string as opposed to an undefined value.

Examples (equivalent):

  LEAF(qr/xx\w+/)

  L(qr/xx\w+/)

  qr/xx\w+/

would match any perl word (\w+) starting with "xx".

See the section 'LEAF DETAILS' for other ways to handle leaves.

AND

An 'AND' rule contains a list of subrules that must be completely matched, from left to right, for the 'and' rule to match.

Examples (equivalent):

  AND('rule_1', 'rule_2', 'rule_3')

  A('rule_1', 'rule_2', 'rule_3')

OR

An 'OR' rule contains a list of subrules, one of which must be matched for the 'OR' rule to match.

During parsing, the subrules are attempted to be matched left to right. If a subrule matches and then is subsequently backtracked, the parser will try to match the next subrule.

Examples (equivalent):

  OR('rule_1', 'rule_2', 'rule_3')

  O('rule_1', 'rule_2', 'rule_3')

MULTIPLE (and OPTIONAL)

A 'MULTIPLE' rule matches if its subrule matches repeatedly between a minimum and maximum number of times. The first parameter to 'MULTIPLE' is the subrule, the next 2 optional parameters are the minium and maximum repititions allowed. The default minimum is 0 and the default maximum is "infinite", though this is represented by setting the maximum to 0.

For there to be another repetition, the input string must have been shortened, else it would be considered an illegal form of "left recursion".

By default the maximal number of possible matches of the repeating rule are tried and then if backtracking occurs, the number of matches is decremented. If the parameter 'match_min_first' is passed in, the minimal number of matches is tried first and the number of matches increases when backtracking.

Examples (equivalent):

  MULTIPLE('rule_1')

  M('rule_1')

  M('rule_1', 0, 0)

One can label a rule with the value 'OPTIONAL' that maps to a 'MULTIPLE' rule with minimum 0 and maximum 1.

Examples (equivalent):

  OPTIONAL('rule_1')

  ZERO_OR_ONE('rule_1')

  Z('rule_1')

  MULTIPLE('rule_1', 0, 1)

to get one or more:

  MULTIPLE('rule_2', 1, 0)

SIMILARITY BETWEEN RULE TYPES.

The following rules all parse tree-wise equivalently and evaluate equivalently since a MULTIPLE rule that has at most 1 child does not return an array ref.

  AND('subrule')

  O('subrule')

  MULTIPLE('subrule', 1, 1)

  M('subrule', 1, 1, 'match_min_first')

NESTED RULES

Rules can be nested inside of other rules. See the section EVALUATION for how nested rules affect tree evaluations.

Example:

  sum => AND('number', MULTIPLE(AND('plus', 'number')));

is equivalent, parsing wise, to

  sum => A('number', 'plus_numbers');
  plus_numbers = M('plus_number');
  plus_number => A('plus', 'number');

ALIASES

One can also use an alias for a rule by a hash reference with a single key/value pair. This sets the name to the key when evaluating the parsed expression and parsing the subrule specified by the value.

  adding =  A(
   'number', L(qr/\s*[+]\s*/),
     {right_number => 'number'},
   E(sub {return $_[0]->{number} + $_[0]->{right_number}})
  );

RULE NAMES

Rule names cannot contain the 'separator' substring '__XZ__', to avoid confliciting with internally generated rule names. This can be changed by using the 'separator' parameter.

ENSURING RULES FORM COMPLETE GRAMMAR

Stallion ensures that a grammar is complete and croaks if the given grammar has any rules not reachable from the start rule or if within any rule a subrule does not exist.

PARSE_AND_EVALUATE

After setting up a Parse::Stallion parser, strings are parsed and evaluated via parse_and_evaluate. In scalar context, the returned value is the returned value of the root node's evaluation routine.

RETURNED VALUES

To get details on the parsing from parse_and_evaluate, a parameter parse_info should be passed an empty hash that will be filled in. Unless an evaluation can return an 'undef' value, not true in the examples provided, one can check if the returned value is 'undef' to determine if an evaluation failed.

  my $parse_info = {};
  my $parse_trace = [];
  my $value =
   $stallion->parse_and_evaluate($given_string, {parse_info=>$parse_info,
    parse_trace => $parse_trace});

  $parse_info->{parse_succeeded}; # is 1 if the string parses, else 0.
  $parse_info->{number_of_steps}; # number of steps parsing took
  $parse_info->{start_rule};

  # $parse_trace contains one hash per step, the hash keys are
  #  1) rule_name
  #  2) moving_forward (value 0 if backtracking),
  #  3) moving_down (value 0 if moving up parse tree)
  #  4) value (position in string or value returned from parse_forward)
  #  5) node_creation_step, uniquely identifies node in parse tree
  #  6) parent_node_creation_step, parent in parse tree
  #  7) informative message of most recent parse step
  #  8) nodes of the current parse tree

  $parse_info->{tree}; # the parse tree if the string parses.

The tree is an Parse::Stallion object having a function, that converts a tree into a string, each node consisting of one line:

  $parse_info->{tree}->stringify({values=>['name','parse_match']});

Internally generated node names, from rules generated by breaking up the entered rules into subrules, will show up. The module Parse::Stallion::EBNF shows the grammar with these generated subrules.

NUMBER OF PARSE STEPS

If the parsing reaches the maximum number of steps without completing a parse tree, the parse fails (croak). Each step is listed in the parse_trace.

A step is an action on a node, roughly speaking matching a regex for a 'leaf' node, or moving forward or backtracking from a node.

The maximum number of steps can be changed, default 100,000. If max_steps is set to a negative number, then there is no limit on the number of steps.

  $stallion->parse_and_evaluate($string, {max_steps=>200000});

"LEFT RECURSION"

Parse::Stallion may encounter "left recursiveness" during parsing in which case the parsing stops and a message is 'croak'ed.

"Left recursion" occurs during parsing when the same non-'leaf' rule shows up a second time on the parse tree with the at the same position in the input string.

Illegal Case 1:

     expression => AND('expression', 'plus', 'term')

Illegal Case 2:

     rule_with_empty => AND('empty', 'rule_with_empty', 'other_rule')
     empty => qr//

Illegal Case 3:

     rule_with_optional => AND('nothing', 'optional_rule', 'nothing')
     nothing => AND('empty')
     empty => L(qr//)
     optional_rule => OPTIONAL('some_other_rule')
     some_other_rule => qr/x/

The 3rd case will detect left recursiveness if optional_rule does not match and modify the input.

EVALUATION

Evaluation can occur during or after parsing.

Each rule may have an evaluation subroutine tied to it, if when defining a rule there is no subroutine, a default "do nothing" subroutine is provided. Subrules may have a subroutine tied to them though by default they have none.

When setting up a rule, one can specify the EVALUATION subroutine by the parameter 'EVALUATION' or 'E'. If the parameter is given a subroutine, that is used. If the parameter is given a regexp, that is applied to the matched string of the text and the first parenthized match is returned as the value. Otherise the parameter within the EVALUATION is returned.

Each node has a computed value that is the result of calling its evaluation routine. The returned value of the parse is the computed value of the root node.

There are two parameters to a node's evaluation routine.

The first parameter to the evaluation routine is either a hash or a string:

If the node is a leaf regexp that has a parenthesized match inside, what is matched by the first parenthesized match is the parameter. Else if the node is a leaf then what is matched by the leaf is the first parameter. Else if 'USE_PARSE_MATCH()' has been set for the node's rule, a join of all the matched strings of the nodes descendents is the parameter.

For other internal nodes, the first parameter is a hash. The hash's keys are the named subrules of the node's rule, the values are the computed value of the corresponding child node. If a key could repeat, the value is an array reference.

The second parameter to an evaluation routine is a hash to several parameters. These are:

   parse_this_ref # The object being parsed
   current_value # The current value if evaluation during parsing
   parameters # Same as first parameter if hash
   node_parse_match # Same as first parameter if USE_PARSE_MATCH
   parse_hash # Writeable hash used for parse (can store global variables)
   node_hash # Writeable hash used for node (combine with parse_forward)

By nesting a rule with an alias, the alias is used for the name of the hash parameter instead of the rule name.

Examples:

   LEAF( qr/\w+/, E(sub {...})) # subroutine is called with word

   L( qr/(d)\w+/, E(sub {...})) # subroutine is called with 'd'

   L( qr/\s*\w+\s*/, E(qr/\s*(\w+)\s*/) # white space trimmed

   L( qr/\s*(\w+)\s*/) # white space trimmed

   qr/\s*(\w+)\s*/ # white space trimmed

   L( qr/\s*\w+\s*/, E('x')) # evaluates to value x

   L( qr/\s*\w+\s*/, E(['x','ss','t'])) # evaluates to array reference

   A( qr/a/, qr/b/, E(sub {...})) #params: $_[0]->{''}->['a','b']

   A( qr/a/, qr/b/, E(sub {...}),
    USE_PARSE_MATCH()) #params: $_[0] eq 'ab'

   A( {f=>qr/a/}, {f=>qr/b/},
    E(sub {...})) #params: $_[0]->{'f'}->['a','b']

A function, LOCATION is provided that takes a string reference and a position in the string and computes the line number and tab value of the given position. This can be used in evaluation functions if one passes in the object being parsed (second argument) and the current value (third argument). This is used in Parse::Stallion::EBNF to show where within the input grammar an error occurs, see for example test case t/ebnf_in.t.

EVALUATION AFTER PARSING

If after parsing, Stallion will evaluate the parse tree in a bottom up left to right traversal.

EVALUATION (AND UNEVALUATION) DURING PARSING

If the do_evaluation_in_parsing is set when a Parse::Stallion object is created the evaluation occurs during the parsing instead of afterwards. Alternatively, if there exists any UNEVALUATION routine, the evaluation is done during parsing.

Every time a node is matched, its evaluation routine is called with a hash as it would be during evaluation after parsing. A second parameter is also passed in which corresponds to the current object being parsed. This allows look ahead.

The evaluation routine may return a second parameter that tells Parse::Stallion to reject or not reject the match. This allows more control over what can be parsed.

The same node may be evaluated several times due to backtracking. One must be careful either to not change the references/objects passed in to the evaluation routine or to undo changes in an unevaluation routine. An unevaluation routine is called when backtracking reverts back to a node, the parameters are the same as for the evaluation routine. UNEVALUATION (alternatively U) are used to set the subroutine.

The following example shows a 'leaf' rule that matches all words except for those marked in the hash %keywords:

  our %keywords = (...);
  my %grammar = (...
   leaf => L(
     qr/\w+/,
     E(sub {if ($keywords{$_[0]}) {return (undef, 1)} return $_[0]}),
     U(), #forces do_evaluation_in_parsing
   ),
  ...
  );

EVALUATION DURING PARSING EXAMPLE

In this example, the first statement tells a truth about the number of elements in a list, the second tells whether or not the first statement is true. If the second statement is true, the string parses.

  my %parsing_rules = (
   start_expression =>
    AND('two_statements', qr/\z/,
     E(sub {return $_[0]->{'two_statements'}})
   ),
   two_statements =>
     A('list_statement','truth_statement',
      EVALUATION(sub {
       if ($_[0]->{list_statement} != $_[0]->{truth_statement}) {
         return (undef, 1);
       }
       return 1;
     })
   ),
   list_statement =>
     A('count_statement', 'list',
      E(sub {
       if ($_[0]->{count_statement} == scalar(@{$_[0]->{list}})) {
         return 1;
       }
       return 0;
     })
   ),
   count_statement =>
     A(qr/there are /i,'number',L(qr/ elements in /),
      E(sub { return $_[0]->{number}; })
    ),
   number =>
    L(qr/\d+/,
     E(sub { return 0 + shift; })
   ),
   list => A('number', M(A(qr/\,/}, 'number')),
     EVALUATION(sub {return $_[0]->{number}})
   ),
   truth_statement =>
     O({t => qr/\. that is the truth\./},
      {t=>qr/\. that is not the truth\./},
      E(sub {
       if ($_[0]->{t} =~ /not/) {
         return 0;
       }
       return 1;
     })
   ),
  );
  
  my $how_many_parser = new Parse::Stallion(
   \%parsing_rules,
   {do_evaluation_in_parsing => 1,
    start_rule => 'start_expression',
  });
  
  $result = $how_many_parser->parse_and_evaluate(
    "there are 5 elements in 5,4,3,2,1. that is the truth.");
  
  print "$result should be 1\n";
  
  $result = $how_many_parser->parse_and_evaluate(
    "there are 5 elements in 5,4,3,1. that is not the truth.");
  
  print "$result should be 1\n";
  
  $result = $how_many_parser->parse_and_evaluate(
    "there are 5 elements in 5,4,3,1. that is the truth.");
  
  print "$result should be undef\n";

DEFAULT EVALUATION ROUTINE

If a rule does not have an evaluation routine specified, it is as if a generic subroutine is used.

Generic Evaluation Routine 1

  • If the passed in hash reference has only one key, then the value of that key in the hash reference is returned.

  • If the passed in hash reference has more than one key, then the hash reference is returned.

It is as if this routine were run:

  sub {
    my $parameters = shift;
    if (ref $parameters eq 'HASH') {
      if (keys %$parameters == 1) {
        my ($key) = keys %$parameters;
        return $parameters->{$key};
      }
    }
    return $parameters;
  }

Generic Evaluation Routine 2

If do_not_compress_eval is set when creating the parser, it is as if this routine were run:

  sub {
    return shift;
  }

Example:

  my %no_eval_rules = (
   start_rule => A('term',
    M(A({plus=>qr/\s*\+\s*/}, 'term'))),
   term => A({left=>'number'},
    M(A({times=>qr/\s*\*\s*/},
     {right=>'number'}))),
   number => qr/\s*\d*\s*/
  );                               
                      
  my $no_eval_parser = new Parse::Stallion(\%no_eval_rules,
   {do_not_compress_eval => 0});

  $result = $no_eval_parser->parse_and_evaluate("7+4*8");
   
  #$result contains:
  { 'plus' => [ '+' ],
    'term' => [ '7',
                { 'left' => '4',
                  'right' => [ '8' ],
                  'times' => [ '*' ] } ] }

  my $dnce_no_eval_parser = new Parse::Stallion(\%no_eval_rules,
   {do_not_compress_eval => 1});

  $result = $dnce_no_eval_parser->parse_and_evaluate("7+4*8");

  #$result contains:
  { 'plus' => [ '+' ],
    'term' => [ { 'left' => '7' },
                { 'left' => '4',
                  'right' => [ '8' ],
                  'times' => [ '*' ] } ] }

Parameter types to Evaluation Routines

If a named parameter could appear more than once, it is passed as an array, else as a scalar. Being passed as an array could be caused by the name being within a 'MULTIPLE' rule which does not have maximum children of 1 or occuring more than once within the subrules of an 'AND' rule.

The routine which_parameters_are_arrays returns a hash of the possible values passed to an evaluation routine. For a given key, if the value is 1, the key would be passed to the evaluation routine as an array, if the value is 0, it would be passed as a scalar.

MORE COMPLICATED EXAMPLE

  The following is a simple calculator:

   %calculator_rules = (
    start_expression => A(
      'expression', qr/\z/,
      E(sub {return $_[0]->{expression}})
     ),
    expression => A(
      'term', 
       M(A('plus_or_minus', 'term')),
      E(sub {my $to_combine = $_[0]->{term};
       my $plus_or_minus = $_[0]->{plus_or_minus};
       my $value = $to_combine->[0];
       for my $i (1..$#{$to_combine}) {
         if ($plus_or_minus->[$i-1] eq '+') {
           $value += $to_combine->[$i];
         }
         else {
           $value -= $to_combine->[$i];
         }
       }
       return $value;
      })
     ),
    term => A(
      'number', 
       M(A('times_or_divide', 'number')),
       E(sub {my $to_combine = $_[0]->{number};
       my $times_or_divide = $_[0]->{times_or_divide};
       my $value = $to_combine->[0];
       for my $i (1..$#{$to_combine}) {
         if ($times_or_divide->[$i-1] eq '*') {
           $value *= $to_combine->[$i];
         }
         else {
           $value /= $to_combine->[$i]; #does not check for zero
         }
       }
       return $value;
      })
    ),
    number => L(
      qr/\s*[+\-]?(\d+(\.\d*)?|\.\d+)\s*/,
      E(sub{ return 0 + $_[0]; })
    ),
    plus_or_minus => qr/\s*([\-+])\s*/,
    times_or_divide => qr/\s*([*\/])\s*/
   );

   $calculator_parser = new Parse::Stallion(\%calculator_rules,
     {start_rule => 'start_expression'
   });

   my $result = $calculator_parser->parse_and_evaluate("3+7*4");
   # $result should contain 31

  my $array_p = $calculator_parser->which_parameters_are_arrays('term');
  # $array_p would be {number => 1, times_or_divide => 1}

  $array_p = $calculator_parser->which_parameters_are_arrays(
   'start_expression');
  # $array_p would be {expression => 0}

STRING_EVALUATION

Instead of passing an evaluation subroutine, one can pass in the string of a subroutine via STRING_EVALAUATION (or SE). The string is modified so that the passed in parameters become local variables.

Example:

    a_rule => A('x_1', 'x_2', 'x_3', 'x_1'),
      SE('#comment'));

results in the evaluation routine equivalent to:

     sub {
     $x_1 = $_[0]->{x_1};
     $x_2 = $_[0]->{x_2};
     $x_3 = $_[0]->{x_3};
     #comment
     }

If 'use_parse_match' or rule is a leaf then $_ is set to the parameter $_[0]; else, $_ is the one unnamed rule or an array ref of the unnamed rules:

    $_ = $_[0]->{''};

LEAF DETAILS

Leaf rules can be set up as follows:

  LEAF($leaf_arg, PARSE_FORWARD(sub{...}), PARSE_BACKTRACK(sub{...}),
   EVALUATION(sub{...}), UNEVALUATION(sub{...}), DISPLAY($display));

If $leaf_arg is a Regexp, it is converted into a hash ref: {regex_match => $leaf_arg} for internal purposes.

If a default PARSE_FORWARD is not provided then a regexp match is attempted on the string being parsed at the current_value's position. Default parse_forward and parse_backtrack subroutines can be provided for leaves.

The subroutine in PARSE_FORWARD (or PF) is called when moving forwards during the parse. Its one parameter is a hash of: parse_this_ref # The object being parsed current_value # current value of parse parameters # Same as to evaluation routine of leaf node's parent if # evaluation in parsing and only known parameters node_parse_match # Same as to evaluation routine of leaf node's parent # if evaluation in parsing and only existing descendents parse_hash # hash for storing global variables during given parse node_hash # hash for storing variables of parent's node leaf_rule_info # $leaf_arg's

The regexp matching done if there is no PARSE_FORWARD routine is similar to:

If the parsing should continue forward it should return an array with the first argument true (1), the second argument a "parse match" to store as what was matched, and the thrid argument the new current value. Else it should return 0.

The subroutine in PARSE_BACKTRACK (or PB) is called when backtracking through a leaf. Its one parameter is a hash of parse_this_ref # The object being parsed current_value # current value of parse value_when_entered # value when leaf was created match # value that was returned by parse_forward as match parameters # Same as to evaluation routine of leaf node's parent if # evaluation in parsing and only known parameters node_parse_match # Same as to evaluation routine of leaf node's parent # if evaluation in parsing and only existing descendents parse_hash # hash for storing global variables during given parse node_hash # hash for storing variables of parent's node leaf_rule_info # $leaf_arg's

It should return false. If it returns true, then the parsing immediately ends in failure. This can be used to set up a rule

  pass_this_no_backtrack => L(qr//,PB(sub{return 1}))

that if encountered during parsing during a backtrack means that the parsing will end.

The string $display is used in the related module Parse::Stallion::EBNF as to the string to show for the leaf rule.

EVALUATION and UNEVALUATION are explained in the section 'EVALUATION'.

PARSING NON-STRINGS

Four subroutines may be provided: a default 'leaf' rule matcher/modifier for when the parser is moving forward and a default 'leaf' rule unmodifier for when the parser is backtracking. A third optional subroutine, initial_value, sets the initial current value else it is 0.

The fourth subroutine, final_value, should return the final value of a successful parse for a given object. This subroutine is similar to parsing strings ensuring, or not ensuring, that the entire string is matched instead of matching only a portion.

  my $object_parser = new Parse::Stallion({
    ...
    parse_forward =>
     sub {
       my $parameters = shift;
       ...
       return ($true_if_object_matches_rule,
        $value_to_store_in_leaf_node,
        $value_equal_or_greater_than_current_value);
     },
    parse_backtrack =>
     sub {
       my $parameters = shift;
       ...
       return; #else parsing halts
      },
    initial_value => sub {my ($object_ref, $parse_hash) = @_;
       ...
       return $initial_value;
    },
    final_value =>
     sub {my ($object_ref, $current_value, $parse_hash) = @_;
        ...
       return $final_value; # parse ends if $final_value==$current_value
     },
  });

When evaluating the parse tree, the parameters to the 'leaf' nodes are the values returned in parse_forward, $value_to_store_in_leaf_node. These values are joined together for parse_match.

The script object_string.pl in the example directory shows how to use this.

'LEAF' LEAF PARSE FORWARD/BACKTRACK

All 'leaf' rules need to be set up such that when the parser is moving forward and reaches a 'leaf', the 'leaf' rule attempts to match the object being parsed at the current value. If there is a match, then the current_value may increase.

When backtracking, the object being parsed should be reverted, if changed, to the state before being matched by the 'leaf' rule.

NON_DECREASING VALUE

The third value returned from parse_forward should be equal or greater than the $current_value that was passed in.

This value is used to detect and prevent "left recursion" by not allowing a non-'leaf' rule to repeat at the same value. 'Multiple' rules are prevented from repeating more than once at the same value.

The value also cuts down on the number of steps by allowing the parser to not repeat dead-end parses. If during the parse, the same rule is attempted a second time on the parse object with the same value, and the first parse did not succeed, the parser will begin backtracking.

In parsing a string, the value is the current position within the string.

STRINGS

By default, strings are matched, which, if a reference to the string instead of the string is passed in to parse_and_evaluate, is similar to that found in the test case object_string.t:

  my $calculator_stallion = new Parse::Stallion({
    ...
    parse_forward =>
     sub {
      my $parameters = shift;
      my $input_string_ref = $parameters->{parse_this_ref};
      my $rule_definition = $parameters->{leaf_rule_info};
      my $m = $rule_definition->{regex_match};
      if ($$input_string_ref =~ s/\A($m)//) {
        return (1, $1, 0 - length($string));
      }
      return 0;
     },

    parse_backtrack =>
     sub {
      my $parameters = shift;
      my $input_string_ref = $parameters->{parse_this_ref};
      my $stored_value = $parameters->{match};
      if (defined $stored_value) {
        $$input_string_ref = $stored_value.$$input_string_ref;
      }
      return;
     },

    initial_value =>
     sub {
       my $input_string_ref = shift;
       return 0 - length($$input_string_ref);
     },

    final_value =>
     sub {
       return 0;
     }
  });

EXPORT

The following are EXPORTED from this module:

 A AND E EVALUATION L LEAF LEAF_DISPLAY LOCATION M MULTIPLE O OPTIONAL
 OR PARSE_FORWARD PARSE_BACKTRACK PB PF SE STRING_EVALUATION U
 UNEVALUATION USE_PARSE_MATCH Z ZERO_OR_ONE

PERL Requirements

Parse::Stallion's installation uses Test::More and Time::Local requiring perl 5.6 or higher. Parse::Stallion should work with earlier versions of perl, neither of those modules is required outside of the test cases for installation.

AUTHOR

Arthur Goldstein, <arthur@acm.org>

ACKNOWLEDGEMENTS

Damian Conway and Greg London.

COPYRIGHT AND LICENSE

Copyright (C) 2007-9 by Arthur Goldstein. All Rights Reserved.

This module is free software. It may be used, redistributed and/or modified under the terms of the Perl Artistic License (see http://www.perl.com/perl/misc/Artistic.html)

BUGS

Please email in bug reports.

TO DO AND FUTURE POSSIBLE CHANGES

License

Please send in suggestions.

SEE ALSO

example directory (includes stallion.html, a javascript translation of Parse::Stallion) and test case directory t

Parse::Stallion::EBNF. Outputs grammars in more readable form. Also contains an example of how one could input a grammar from a string, similar to how many parsers take their input.

Parse::Stallion::CSV. Example of how to create a parser from specification.

Perl 6 grammars.

lex, yacc, ..., other parsers.