Name

Data::DFA - Deterministic finite state parser from regular expression

Synopsis

Create a deterministic finite state parser to recognize sequences of symbols that match a given regular expression.

For example: to recognize sequences of symbols drawn from 'a'..'e' that match the regular expression: a (b|c)+ d? e proceed as follows:

# Construct a deterministic finite state automaton from the regular expression:

use Data::DFA qw(:all);
use Data::Table::Text qw(:all);
use Test::More qw(no_plan);

my $dfa = dfaFromExpr
 (element("a"),
  oneOrMore(choice(element("b"), element("c"))),
  optional(element("d")),
  element("e")
 );

# Print the symbols used and the transitions table:

is_deeply ['a'..'e'], [$dfa->symbols];

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
     0  0  { a => 1 }
     1  0  { b => 2, c => 2 }
     2  0  { b => 2, c => 2, d => 6, e => 7 }
     6  0  { e => 7 }
     7  1  undef
END

# Create a parser and parse a sequence of symbols with the returned sub:

my ($parser, $end, $next, $processed) = $dfa->parser;                         # New parser

eval { &$parser($_) } for(qw(a b a));                                         # Try to parse a b a

say STDERR $@;                                                                # Error message
#   Already processed: a b
#   Expected one of  : b c d e
#   But was given    : a

is_deeply [&$next],      [qw(b c d e)];                                       # Next acceptable symbol
is_deeply [&$processed], [qw(a b)];                                           # Symbols processed
ok !&$end;                                                                    # Not at the end

Description

The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.

Construct regular expression

Construct a regular expression that defines the language to be parsed using the following combining operations which can all be imported:

element($)

One element.

   Parameter  Description
1  $label     Transition label

Example:

ok dfaFromExpr(element("a"))->print("Element: a") eq nws <<END;
Element: a
Location  F  Transitions
       0  0  { a => 1 }
       1  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::element

sequence(@)

Sequence of elements.

   Parameter  Description
1  @elements  Elements

Example:

ok dfaFromExpr(sequence(element("a"), element("b")))

->print("Sequence: ab") eq nws <<END;
Sequence: ab
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2 }
       2  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::sequence

optional($)

Zero or one of an element.

   Parameter  Description
1  $element   Element

Example:

ok dfaFromExpr(element("a"), optional(element("b")), element("c"))

->print("Optional: ab?c") eq nws <<END;
Optional: ab?c
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 3 }
       2  0  { c => 3 }
       3  1  undef
END

my $dfa = dfaFromExpr

(element("a"),

oneOrMore(choice(element("b"), element("c"))),

optional(element("d")),

element("e")

);

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { b => 2, c => 2, d => 6, e => 7 }
       6  0  { e => 7 }
       7  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::optional

zeroOrMore($)

Zero or more repetitions of an element.

   Parameter  Description
1  $element   Element to be repeated

Example:

ok dfaFromExpr(element("a"), zeroOrMore(element("b")), element("c"))

->print("Zero Or More: ab*c") eq nws <<END;
Zero Or More: ab*c
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 1, c => 4 }
       4  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::zeroOrMore

oneOrMore($)

One or more repetitions of an element.

   Parameter  Description
1  $element   Element to be repeated

Example:

my $dfa = dfaFromExpr(element("a"), oneOrMore(element("b")), element("c"));

ok $dfa->print("One or More: ab+c") eq nws <<END;
One or More: ab+c
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2 }
       2  0  { b => 2, c => 4 }
       4  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::oneOrMore

choice(@)

Choice from amongst one or more elements.

   Parameter  Description
1  @elements  Elements to be chosen from

Example:

my $dfa = dfaFromExpr(element("a"),

choice(element("b"), element("c")),

element("d"));

ok $dfa->print("Choice: (a(b|c)d") eq nws <<END;
Choice: (a(b|c)d
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { d => 5 }
       5  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::choice

Deterministic finite state parser

Create a deterministic finite state automaton to parse sequences of symbols in the language defined by a regular expression.

print($$$)

Print the current state of the finite automaton. If it is non deterministic, the non deterministic jumps will be shown as well as the transitions table. If deterministic, only the transitions table will be shown.

   Parameter  Description
1  $states    States
2  $title     Title
3  $print     Print to STDERR if 2 or to STDOUT if 1

Example:

my $dfa = dfaFromExpr

(element("a"),

oneOrMore(choice(element("b"), element("c"))),

optional(element("d")),

element("e")

);

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { b => 2, c => 2, d => 6, e => 7 }
       6  0  { e => 7 }
       7  1  undef
END

dfaFromExpr(@)

Create a DFA from a regular expression.

   Parameter  Description
1  @expr      Expression

Example:

my $dfa = dfaFromExpr

(element("a"),

oneOrMore(choice(element("b"), element("c"))),

optional(element("d")),

element("e")

);

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { b => 2, c => 2, d => 6, e => 7 }
       6  0  { e => 7 }
       7  1  undef
END

This is a static method and so should be invoked as:

Data::DFA::dfaFromExpr

parser(@)

Create a parser from a deterministic finite state automaton constructed from a regular expression.

   Parameter  Description
1  $dfa       Deterministic finite state automaton generated from an expression

Example:

my $dfa = dfaFromExpr

(element("a"),

oneOrMore(choice(element("b"), element("c"))),

optional(element("d")),

element("e")

);

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { b => 2, c => 2, d => 6, e => 7 }
       6  0  { e => 7 }
       7  1  undef
END

my ($parser, $end) = $dfa->parser;

&$parser($_) for qw(a b);

ok !&$end;

my ($parser, $end) = $dfa->parser;

&$parser($_) for qw(a b b c e);

ok &$end;

my ($parser, $end, $next, $processed) = $dfa->parser;

eval{&$parser($_)} for(qw(a b a));

ok !index(nws($@), nws <<END);
Already processed: a b
Expected one of  : b c d e
But was given    : a
END

is_deeply [&$next],      [qw(b c d e)];

is_deeply [&$processed], [qw(a b)];

is_deeply [&$processed], [qw(a b)];

This is a static method and so should be invoked as:

Data::DFA::parser

Parser methods

The following subs accept the input sequence of symbols to be validated and describe the current state of the parse. They are returned by parser when a new parser is constructed.

($)

Accept the next symbol drawn from the symbol set if possible by moving to a new state otherwise confessing with a helpful message

   Parameter  Description
1  $symbol    Next symbol to be processed by the finite state automaton

()

Returns whether we are currently in a final state or not

()

Returns an array of symbols that would be accepted in the current state

()

Returns the array of symbols processed so far by this parser

Private Methods

nfaFromExpr2($$)

Create a DFA from an expression by pushing it on to the array of state transitions and connecting it up to existing states with jumps.

   Parameter  Description
1  $states    States
2  $expr      Expression to convert to a DFA

nfaFromExpr(@)

Create an NFA from an expression.

   Parameter  Description
1  @expr      Expressions

This is a static method and so should be invoked as:

Data::DFA::nfaFromExpr

printNfa($$$)

Print the current state of a NFA.

   Parameter  Description
1  $states    States
2  $title     Title
3  $print     Print to STDERR if 2 or to STDOUT if 1

printDfa($$$)

Print the current state of a DFA.

   Parameter  Description
1  $states    States
2  $title     Title
3  $print     Print to STDERR if 2 or to STDOUT if 1

symbols($)

Return an array of all the transition symbols.

   Parameter  Description
1  $states    States

Example:

my $dfa = dfaFromExpr

(element("a"),

oneOrMore(choice(element("b"), element("c"))),

optional(element("d")),

element("e")

);

ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location  F  Transitions
       0  0  { a => 1 }
       1  0  { b => 2, c => 2 }
       2  0  { b => 2, c => 2, d => 6, e => 7 }
       6  0  { e => 7 }
       7  1  undef
END

is_deeply ['a'..'e'], [$dfa->symbols];

reachableStates($$$$)

Find the names of all the states that can be reached from a specified state via a specified symbol and all the jumps available.

   Parameter   Description
1  $states     States
2  $stateName  Name of start state
3  $symbol     Symbol
4  $targets    Optional array reference of reachable targets so far

removeJumpsFromState($$)

Remove the jumps from a state

   Parameter   Description
1  $states     States
2  $stateName  Name of the state to be dejumped.

reachableFrom($$$)

Find the names of all the states that can be reached from a specified state using any symbol.

   Parameter   Description
1  $states     States
2  $stateName  Name of start state
3  $targets    Optional hash reference of reachable targets so far

removeJumpsFromAllStates($)

Remove the jumps from every state.

   Parameter  Description
1  $states    States

removeDuplicateStates($)

Remove any states with duplicate transition sets redirecting transitions to the surviving state.

   Parameter  Description
1  $states    States

Index

1

2 choice

3 dfaFromExpr

4 element

5 nfaFromExpr

6 nfaFromExpr2

7 oneOrMore

8 optional

9 parser

10 print

11 printDfa

12 printNfa

13 reachableFrom

14 reachableStates

15 removeDuplicateStates

16 removeJumpsFromAllStates

17 removeJumpsFromState

18 sequence

19 symbols

20 zeroOrMore

Installation

This module is written in 100% Pure Perl and, thus, it is easy to read, use, modify and install.

Standard Module::Build process for building and installing modules:

perl Build.PL
./Build
./Build test
./Build install

Author

philiprbrenan@gmail.com

http://www.appaapps.com

Copyright

Copyright (c) 2016-2018 Philip R Brenan.

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.