Name
Data::DFA - Deterministic finite state parser from regular expression
Synopsis
Create a deterministic finite state parser to recognize sequences of symbols that match a given regular expression.
For example: to recognize sequences of symbols drawn from 'a'..'e' that match the regular expression: a (b|c)+ d? e proceed as follows:
# Construct a deterministic finite state automaton from the regular expression:
use Data::DFA qw(:all);
use Data::Table::Text qw(:all);
use Test::More qw(no_plan);
my $dfa = dfaFromExpr
(element("a"),
oneOrMore(choice(element("b"), element("c"))),
optional(element("d")),
element("e")
);
# Print the symbols used and the transitions table:
is_deeply ['a'..'e'], [$dfa->symbols];
ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { b => 2, c => 2, d => 6, e => 7 }
6 0 { e => 7 }
7 1 undef
END
# Create a parser and parse a sequence of symbols with the returned sub:
my ($parser, $end, $next, $processed) = $dfa->parser; # New parser
eval { &$parser($_) } for(qw(a b a)); # Try to parse a b a
say STDERR $@; # Error message
# Already processed: a b
# Expected one of : b c d e
# But was given : a
is_deeply [&$next], [qw(b c d e)]; # Next acceptable symbol
is_deeply [&$processed], [qw(a b)]; # Symbols processed
ok !&$end; # Not at the end
Description
The following sections describe the methods in each functional area of this module. For an alphabetic listing of all methods by name see Index.
Construct regular expression
Construct a regular expression that defines the language to be parsed using the following combining operations which can all be imported:
element($)
One element.
Parameter Description
1 $label Transition label
Example:
ok dfaFromExpr(element("a"))->print("Element: a") eq nws <<END;
Element: a
Location F Transitions
0 0 { a => 1 }
1 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::element
sequence(@)
Sequence of elements.
Parameter Description
1 @elements Elements
Example:
ok dfaFromExpr(sequence(element("a"), element("b")))
->print("Sequence: ab") eq nws <<END;
Sequence: ab
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2 }
2 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::sequence
optional($)
Zero or one of an element.
Parameter Description
1 $element Element
Example:
ok dfaFromExpr(element("a"), optional(element("b")), element("c"))
->print("Optional: ab?c") eq nws <<END;
Optional: ab?c
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 3 }
2 0 { c => 3 }
3 1 undef
END
my $dfa = dfaFromExpr
(element("a"),
oneOrMore(choice(element("b"), element("c"))),
optional(element("d")),
element("e")
);
ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { b => 2, c => 2, d => 6, e => 7 }
6 0 { e => 7 }
7 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::optional
zeroOrMore($)
Zero or more repetitions of an element.
Parameter Description
1 $element Element to be repeated
Example:
ok dfaFromExpr(element("a"), zeroOrMore(element("b")), element("c"))
->print("Zero Or More: ab*c") eq nws <<END;
Zero Or More: ab*c
Location F Transitions
0 0 { a => 1 }
1 0 { b => 1, c => 4 }
4 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::zeroOrMore
oneOrMore($)
One or more repetitions of an element.
Parameter Description
1 $element Element to be repeated
Example:
my $dfa = dfaFromExpr(element("a"), oneOrMore(element("b")), element("c"));
ok $dfa->print("One or More: ab+c") eq nws <<END;
One or More: ab+c
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2 }
2 0 { b => 2, c => 4 }
4 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::oneOrMore
choice(@)
Choice from amongst one or more elements.
Parameter Description
1 @elements Elements to be chosen from
Example:
my $dfa = dfaFromExpr(element("a"),
choice(element("b"), element("c")),
element("d"));
ok $dfa->print("Choice: (a(b|c)d") eq nws <<END;
Choice: (a(b|c)d
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { d => 5 }
5 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::choice
Deterministic finite state parser
Create a deterministic finite state automaton to parse sequences of symbols in the language defined by a regular expression.
print($$$)
Print the current state of the finite automaton. If it is non deterministic, the non deterministic jumps will be shown as well as the transitions table. If deterministic, only the transitions table will be shown.
Parameter Description
1 $states States
2 $title Title
3 $print Print to STDERR if 2 or to STDOUT if 1
Example:
my $dfa = dfaFromExpr
(element("a"),
oneOrMore(choice(element("b"), element("c"))),
optional(element("d")),
element("e")
);
ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { b => 2, c => 2, d => 6, e => 7 }
6 0 { e => 7 }
7 1 undef
END
dfaFromExpr(@)
Create a DFA from a regular expression.
Parameter Description
1 @expr Expression
Example:
my $dfa = dfaFromExpr
(element("a"),
oneOrMore(choice(element("b"), element("c"))),
optional(element("d")),
element("e")
);
ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { b => 2, c => 2, d => 6, e => 7 }
6 0 { e => 7 }
7 1 undef
END
This is a static method and so should be invoked as:
Data::DFA::dfaFromExpr
parser(@)
Create a parser from a deterministic finite state automaton constructed from a regular expression.
Parameter Description
1 $dfa Deterministic finite state automaton generated from an expression
Example:
my $dfa = dfaFromExpr
(element("a"),
oneOrMore(choice(element("b"), element("c"))),
optional(element("d")),
element("e")
);
ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { b => 2, c => 2, d => 6, e => 7 }
6 0 { e => 7 }
7 1 undef
END
my ($parser, $end) = $dfa->parser;
&$parser($_) for qw(a b);
ok !&$end;
my ($parser, $end) = $dfa->parser;
&$parser($_) for qw(a b b c e);
ok &$end;
my ($parser, $end, $next, $processed) = $dfa->parser;
eval{&$parser($_)} for(qw(a b a));
ok !index(nws($@), nws <<END);
Already processed: a b
Expected one of : b c d e
But was given : a
END
is_deeply [&$next], [qw(b c d e)];
is_deeply [&$processed], [qw(a b)];
is_deeply [&$processed], [qw(a b)];
This is a static method and so should be invoked as:
Data::DFA::parser
Parser methods
The following subs accept the input sequence of symbols to be validated and describe the current state of the parse. They are returned by parser when a new parser is constructed.
($)
Accept the next symbol drawn from the symbol set if possible by moving to a new state otherwise confessing with a helpful message
Parameter Description
1 $symbol Next symbol to be processed by the finite state automaton
()
Returns whether we are currently in a final state or not
()
Returns an array of symbols that would be accepted in the current state
()
Returns the array of symbols processed so far by this parser
Private Methods
nfaFromExpr2($$)
Create a DFA from an expression by pushing it on to the array of state transitions and connecting it up to existing states with jumps.
Parameter Description
1 $states States
2 $expr Expression to convert to a DFA
nfaFromExpr(@)
Create an NFA from an expression.
Parameter Description
1 @expr Expressions
This is a static method and so should be invoked as:
Data::DFA::nfaFromExpr
printNfa($$$)
Print the current state of a NFA.
Parameter Description
1 $states States
2 $title Title
3 $print Print to STDERR if 2 or to STDOUT if 1
printDfa($$$)
Print the current state of a DFA.
Parameter Description
1 $states States
2 $title Title
3 $print Print to STDERR if 2 or to STDOUT if 1
symbols($)
Return an array of all the transition symbols.
Parameter Description
1 $states States
Example:
my $dfa = dfaFromExpr
(element("a"),
oneOrMore(choice(element("b"), element("c"))),
optional(element("d")),
element("e")
);
ok $dfa->print("Dfa for a(b|c)+d?e :") eq nws <<END;
Dfa for a(b|c)+d?e :
Location F Transitions
0 0 { a => 1 }
1 0 { b => 2, c => 2 }
2 0 { b => 2, c => 2, d => 6, e => 7 }
6 0 { e => 7 }
7 1 undef
END
is_deeply ['a'..'e'], [$dfa->symbols];
reachableStates($$$$)
Find the names of all the states that can be reached from a specified state via a specified symbol and all the jumps available.
Parameter Description
1 $states States
2 $stateName Name of start state
3 $symbol Symbol
4 $targets Optional array reference of reachable targets so far
removeJumpsFromState($$)
Remove the jumps from a state
Parameter Description
1 $states States
2 $stateName Name of the state to be dejumped.
reachableFrom($$$)
Find the names of all the states that can be reached from a specified state using any symbol.
Parameter Description
1 $states States
2 $stateName Name of start state
3 $targets Optional hash reference of reachable targets so far
removeJumpsFromAllStates($)
Remove the jumps from every state.
Parameter Description
1 $states States
removeDuplicateStates($)
Remove any states with duplicate transition sets redirecting transitions to the surviving state.
Parameter Description
1 $states States
Index
2 choice
4 element
8 optional
9 parser
10 print
11 printDfa
12 printNfa
18 sequence
19 symbols
20 zeroOrMore
Installation
This module is written in 100% Pure Perl and, thus, it is easy to read, use, modify and install.
Standard Module::Build process for building and installing modules:
perl Build.PL
./Build
./Build test
./Build install
Author
Copyright
Copyright (c) 2016-2018 Philip R Brenan.
This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.