NAME
Algorithm::ConstructDFA2 - Deterministic finite automaton construction
SYNOPSIS
use Algorithm::ConstructDFA2;
my $dfa = Algorithm::ConstructDFA2->new(
input_alphabet => [ @symbols ],
input_vertices => [ qw/ 2 3 4 / ],
input_edges => [ [ 2, 3 ], [ 3, 4 ] ],
vertex_nullable => sub($vertex) { ... },
vertex_matches => sub($vertex, $input) { ... },
storage_dsn => 'dbi:SQLite:dbname=...',
);
my $start_id = $dfa->find_or_create_state_id(qw/ 2 /);
while (my $count = $dfa->compute_some_transitions(1_000)) {
...
}
my @accepting = $dfa->cleanup_dead_states(sub(@vertices) {
...
});
DESCRIPTION
This module computes deterministic finite automata from equivalent non-deterministic finite automata. The input NFA must be expressed as directed graph with labeled vertices. Vertex labels indicate if vertices match a particular terminal symbol from an input alphabet, or match the empty string, meaning they can be crossed without any input when matching a string.
This is slightly different from how NFA graphs are usually encoded in literature (as graph with labeled edges), but the conversion is straightforward (turn edges into additional vertices). Finding a suitable alphabet is more difficult, Set::IntSpan::Partition can help with that (the module splits sets of sets of terminals like "letters" and "digits" and "hexdigits" into non-overlapping sets, each of which can then be used as a terminal for this module).
DFAs can be exponentially larger than equivalent NFAs; to accomodate large or complicated NFAs, computed data is held in a SQLite database to reduce memory use. Since a DFA is basically just the result of exhaustively computing cross-products, most computation is done in SQL, leaving only minimal Perl code.
CONSTRUCTOR
- new(%options)
-
The
%options
hash supports the following keys:input_vertices
-
Array of vertices (unsigned integers) in the input graph.
input_edges
-
Array of edges (arrays of two vertices) in the input graph.
input_alphabet
-
Array of terminal symbols (unsigned integers).
vertex_nullable
-
Code reference called for each vertex in the input graph. Should return a true value if and only if the vertex matches the empty string.
vertex_matches
-
Code reference called for each pair of input vertex and input symbol from the input alphabet. Should return a true value if and only if the vertex matches the input symbol.
storage_dsn
-
Database to use for computations,
dbi:SQLite:dbname=:memory:
by default.
METHODS
- $dfa->find_or_create_state_id(@vertices)
-
Given a list of vertices, computes a new state, adds it to the automaton if it does not already exist, and returns an identifier for the state. This is used to create a start state in the DFA.
- $dfa->compute_some_transitions($limit)
-
Computes up to
$limit
additional transitions and returns the number of transitions actually computed. A return value of zero indicates that all transitions have been computed. - $dfa->dead_state_id()
-
Returns the state identifier for a fixed dead state (from which no accepting configuration can be reached).
- $dfa->cleanup_dead_states(\&vertices_accept)
-
Given a code reference that takes a list of vertices and returns true if and only if the vertices are an accepting configuration, this method changes the automaton so that dead states have no transitions to different dead states.
If, for example, the input NFA has a special "final" vertex that indicates acceptance when reached, the code reference would check if the vertex list contains this vertex.
- $dfa->transitions_as_3tuples()
-
Returns a list of all transitions computed so far as. Transitions are arrays with three identifiers for the source state, the input symbol, and the destination state.
for my $transition ( $dfa->transitions_as_3tuples() ) { my ($src_state, $input, $dst_state) = @$transition; ... }
- $dfa->vertices_in_state($state_id)
-
Returns a list of vertices in the state
$state_id
. - $dfa->transitions_as_5tuples()
-
Returns a list of all transitions computed so far as. Transitions are arrays with five identifiers: the source state, an input vertex included in the source state, the input symbol, the destination state and an input vertex included in the destination state.
for my $transition ( $dfa->transitions_as_5tuples() ) { my ($src_state, $src_vertex, $input, $dst_state, $dst_vertex) = @$transition; ... }
Note that unlike
transitions_as_3tuples
this omits transitions involving the main dead state. - $dfa->backup_to_file('v0', $file)
-
Create a backup of the database used to store input and computed data into
$file
. The first parameter must bev0
and indicates the version of the database schema.
TODO
It does not make sense for
transitions_as_5tuples
and its companions to return a list for large automata. But short of returning the DBI statement handle there does not seem to be a good way to return something more lazy....
BUG REPORTS
Please report bugs in this module via http://rt.cpan.org/NoAuth/Bugs.html?Dist=Algorithm-ConstructDFA2
SEE ALSO
Set::IntSpan::Partition - Useful to create alphabets from sets
Acme::Partitioner - Useful to minimise automata
Algorithm::ConstructDFA - obsolete predecessor
Algorithm::ConstructDFA::XS - obsolete predecessor
ACKNOWLEDGEMENTS
Thanks to Slaven Rezic for bug reports.
AUTHOR / COPYRIGHT / LICENSE
Copyright (c) 2017-2018 Bjoern Hoehrmann <bjoern@hoehrmann.de>.
This module is licensed under the same terms as Perl itself.