Name

Data::DFA - Deterministic finite state parser from regular expression

Synopsis

Create a deterministic finite state parser to recognize sequences of symbols that match a given regular expression.

To recognize sequences of symbols drawn from 'a'..'e' that match the regular expression: a (b|c)+ d? e:

# Construct a deterministic finite state automaton from the regular expression:

use Data::DFA qw(:all);
use Data::Table::Text qw(:all);
use Test::More qw(no_plan);

my $dfa = fromExpr
 (element("a"),
  oneOrMore(choice(element("b"), element("c"))),
  optional(element("d")),
  element("e")
 );

ok  $dfa->parser->accepts(qw(a b e));
ok !$dfa->parser->accepts(qw(a d));

# Print the symbols used and the transitions table:

is_deeply ['a'..'e'], [$dfa->symbols];

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
  State        Final  Symbol  Target       Final
1   0                   a       1 3          0
2   1 2 3 4 5 6  0      b       1 2 3 4 5 6  0
3                       c       1 3 4 5 6    0
4                       d       6            0
5                       e       7            1
6   1 3          0      b       1 2 3 4 5 6  0
7                       c       1 3 4 5 6    0
8   1 3 4 5 6    0      b       1 2 3 4 5 6  0
9                       c       1 3 4 5 6    0
10                      d       6            0
11                      e       7            1
12  6            0      e       7            1
END

# Create a parser and use it to parse a sequence of symbols

my $parser = $dfa->parser;                                                    # New parser

eval { $parser->accept($_) } for qw(a b a);                                   # Try to parse a b a

say STDERR $@;                                                                # Error message
#   Already processed: a b
#   Expected one of  : b c d e
#   But was given    : a

is_deeply [$parser->next],     [qw(b c d e)];                                 # Next acceptable symbol
is_deeply  $parser->processed, [qw(a b)];                                     # Symbols processed
ok !$parser->final;                                                           # Not in a final state

Description

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Construct regular expression

Construct a regular expression that defines the language to be parsed using the following combining operations which can all be imported:

element($)

One element.

   Parameter  Description
1  $label     Transition symbol

This is a static method and so should be invoked as:

Data::DFA::element

This method can be imported via:

use Data::DFA qw(element)

sequence(@)

Sequence of elements.

   Parameter  Description
1  @elements  Elements

This is a static method and so should be invoked as:

Data::DFA::sequence

This method can be imported via:

use Data::DFA qw(sequence)

optional(@)

An optional sequence of element.

   Parameter  Description
1  @element   Elements

This is a static method and so should be invoked as:

Data::DFA::optional

This method can be imported via:

use Data::DFA qw(optional)

zeroOrMore(@)

Zero or more repetitions of a sequence of elements.

   Parameter  Description
1  @element   Elements

This is a static method and so should be invoked as:

Data::DFA::zeroOrMore

This method can be imported via:

use Data::DFA qw(zeroOrMore)

oneOrMore(@)

One or more repetitions of a sequence of elements.

   Parameter  Description
1  @element   Elements

This is a static method and so should be invoked as:

Data::DFA::oneOrMore

This method can be imported via:

use Data::DFA qw(oneOrMore)

choice(@)

Choice from amongst one or more elements.

   Parameter  Description
1  @elements  Elements to be chosen from

This is a static method and so should be invoked as:

Data::DFA::choice

This method can be imported via:

use Data::DFA qw(choice)

Deterministic finite state parser

Create a deterministic finite state automaton to parse sequences of symbols in the language defined by a regular expression.

fromExpr(@)

Create a DFA from a regular expression.

   Parameter  Description
1  @expr      Expression

This is a static method and so should be invoked as:

Data::DFA::fromExpr

print($$$)

Print DFA to a string and optionally to STDERR or STDOUT

   Parameter  Description
1  $dfa       DFA
2  $title     Title
3  $print     1 - STDOUT or 2 - STDERR

printNws($$$)

Print DFA with whitespace normalized

   Parameter  Description
1  $dfa       DFA
2  $title     Title
3  $print     1 - STDOUT or 2 - STDERR

symbols($)

Return an array of all the symbols accepated by the DFA

   Parameter  Description
1  $dfa       DFA

parser($)

Create a parser from a deterministic finite state automaton constructed from a regular expression.

   Parameter  Description
1  $dfa       Deterministic finite state automaton generated from an expression

This is a static method and so should be invoked as:

Data::DFA::parser

Parser methods

Use the DFA to parse a sequence of symbols

Data::DFA::Parser::accept($$)

Accept the next symbol drawn from the symbol set if possible by moving to a new state otherwise confessing with a helpful message

   Parameter  Description
1  $parser    DFA Parser
2  $symbol    Next symbol to be processed by the finite state automaton

Data::DFA::Parser::final($)

Returns whether we are currently in a final state or not

   Parameter  Description
1  $parser    DFA Parser

Data::DFA::Parser::next($)

Returns an array of symbols that would be accepted in the current state

   Parameter  Description
1  $parser    DFA Parser

Data::DFA::Parser::accepts($@)

Confirm that a DFA accepts an array representing a sequence of symbols

   Parameter  Description
1  $parser    DFA Parser
2  @symbols   Array of symbols

Private Methods

finalState($$)

Check whether any of the specified states in the NFA are final

   Parameter  Description
1  $nfa       NFA
2  $reach     Hash of states in the NFA

superState($$$$$)

Create super states from existing superstate

   Parameter              Description
1  $dfa                   DFA
2  $superStateName        Start state in DFA
3  $nfa                   NFA we are converting
4  $symbols               Symbols in the NFA we are converting
5  $nfaSymbolTransitions  States reachable from each state by symbol

superStates($$$)

Create super states from existing superstate

   Parameter        Description
1  $dfa             DFA
2  $SuperStateName  Start state in DFA
3  $nfa             NFA we are tracking

transitionOnSymbol($$$)

The super state reached by transition on a symbol from a specified state

   Parameter        Description
1  $dfa             DFA
2  $superStateName  Start state in DFA
3  $symbol          Symbol

Index

1 choice

2 Data::DFA::Parser::accept

3 Data::DFA::Parser::accepts

4 Data::DFA::Parser::final

5 Data::DFA::Parser::next

6 element

7 finalState

8 fromExpr

9 oneOrMore

10 optional

11 parser

12 print

13 printNws

14 sequence

15 superState

16 superStates

17 symbols

18 transitionOnSymbol

19 zeroOrMore

Exports

All of the following methods can be imported via:

use Data::DFA qw(:all);

Or individually via:

use Data::DFA qw(<method>);

1 choice

2 element

3 oneOrMore

4 optional

5 sequence

6 zeroOrMore

Installation

This module is written in 100% Pure Perl and, thus, it is easy to read, use, modify and install.

Standard Module::Build process for building and installing modules:

perl Build.PL
./Build
./Build test
./Build install

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2018 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.