NAME
Data::Match - Complex data structure pattern matching
SYNOPSIS
use Data::Match qw(:all);
my ($match, $results) = match($structure, $pattern);
use Data::Match;
my $obj = new Data::Match;
my ($match, $results) = $obj->execute($structure, $pattern);
DESCRIPTION
Data::Match provides extensible complex Perl data structure searching and matching.
EXPORT
None are exported by default. :func
exports match
and matches
, :pat
exports all the pattern element generators below, :all
exports :func
and :pat
.
PATTERNS
A data pattern is a complex data structure that possibly matches another complex data structure. For example:
matches([ 1, 2 ], [ 1, 2 ]); # TRUE
matches([ 1, 2, 3 ], [ 1, ANY, 3 ]); # TRUE
matches([ 1, 2, 3 ], [ 1, ANY, 2 ]); # FALSE: 3 != 2
ANY
matches anything, including an undefined value.
my $results = matches([ 1, 2, 1 ], [ BIND('x'), ANY, BIND('x') ]); # TRUE
BIND($name)
matches anything and remembers each match and its position with every BIND($name)
in $result-
{'BIND'}{$name}>. If BIND($name)
is not the same as the first value bound to BIND($name)
it does not match. For example:
my $results = matches([ 1, 2, 3 ], [ BIND('x'), 2, BIND('x') ]); # FALSE: 3 != 1
COLLECT($name)
is similar to BIND but does not compare first bound values.
REST
matches all remaining elements of an array or hash.
matches([ 1, 2, 3 ], [ 1, REST() ]); # TRUE
matches({ 'a'=>1, 'b'=>1 }, { 'b'=>1, REST() => REST() }); # TRUE
FIND
searches at all depths for matching subpatterns.
matches([ 1, [ 1, 2 ], 3], FIND(COLLECT('x', [ 1, REST() ])); # is true.
See the test script t/t1.t
in the package distribution for more pattern examples.
MATCH COLLECTIONS
When a BIND
or COLLECT
matches a datum, an entry is collected in $result-
{BIND}> and $result-
{COLLECT}>, respectively. (This might change in the future)
Each entry for the binding name is a hash containing 'v'
, 'p'
and 'ps'
lists. 'v'
is a list of the value at each match. 'p'
is a list describing where the corresponding match was found based on the root of the searchat each match. 'ps'
is a list of code strings that describes where the match was for each match.
SUBPATTERNS
All patterns can have sub-patterns. Most patterns match the ANDed results of their sub-patterns and their own behavior, first trying the sub-patterns before attempting to match the intrinsic behavior. However, OR
and ANY
match any sub-patterns;
For example:
match([ ['a', 1 ], ['b', 2], ['a', 3] ], EACH(COLLECT('x', ['a', ANY())))) # TRUE
The above pattern means:
For EACH element in the root structure (an array):
COLLECT each element, in 'x'
, that is,
An ARRAY of length 2 that starts with 'a'
.
On the other hand.
match( [ ['a', 1 ], ['b', 2], ['a', 3] ], ALL(COLLECT('x', [ 'a', ANY() ])) )
# IS FALSE
Because the second root element (an array) does not start with 'a'
. But,
match( [ ['a', 1 ], ['a', 2], ['a', 3] ], ALL(COLLECT('x', [ 'a', ANY() ])) )
# IS TRUE
The pattern below essentially does a flatten of the nested array:
match(
[ 1, 'x',
[ 2, 'x',
[ 3, 'x'],
[ 4,
[ 5,
[ 'x' ]
],
6
]
]
],
FIND(COLLECT('x', EXPR(q{! ref}))),
{ 'no_collect_path' => 1 }
);
no_collect_path
causes COLLECT
and BIND
to not collect any paths.
MATCH SLICES
Match slices are objects that contain slices of matched portions of a data structure. This is useful for inflicting change into substructures matched by patterns like REST
.
For example:
do {
my $a = [ 1, 2, 3, 4 ];
my $p = [ 1, ANY, REST(BIND('s')) ];
my $r = matches($a, $p); #
ok($r); # TRUE
ok(Compare($r->{'BIND'}{'s'}{'v'}[0], [ 3, 4 ])); # TRUE
$p->[0] = 'x';
matches($a, [ 1, 2, 'x', 4 ]); # TRUE
}
Hash match slices are generate for each key-value pair for a hash matched by EACH
and ALL
. Each of these match slices can be matched as a hash with a single key->value pair.
Match slices are useful for search and replace missions.
VISITATION ADAPTERS
By default Data::Match is blind to Perl object interfaces. To instruct Data::Match to not traverse object implementation containers and honor object interfaces you must provide a visitation adaptor. A visitation adaptor tells Data::Match what to traverse through an object interface and how to keep track of how it got through.
For example:
package Foo;
sub new
{
my ($cls, %opts) = @_;
bless \%opts, $cls;
}
sub x { shift->{x}; }
sub parent { shift->{parent}; }
sub children { shift->{children}; }
sub add_child {
my $self = shift;
for my $c ( @_ ) {
$c->{parent} = $self;
}
push(@{$self->{children}}, @_);
}
my $foos = [ map(new Foo('x' => $_), 1 .. 10) ];
for my $f ( @$foos ) { $f->add_child($foos->[rand($#$foo)); }
my $pat = FIND(COLLECT('Foo', ISA('Foo', { 'parent' => $foos->[0], REST() => REST() })));
$match->match($foos, $pat);
The problem with the above example is: FIND
will not honor the interface of class Foo by default and will eventually find a Foo where $_-
parent eq $foos->[0]> through all the parent and child links in the objects' implementation container. To force Data::Match to honor an interface (or a subset of an interface) during FIND
traversal we create a 'find' adapter sub that will do the right thing.
my $opts = {
'find' => {
'Foo' => sub {
my ($self, $visitor) = @_;
# Always do 'x'.
$visitor->($self->x, 'METHOD', 'x');
# Optional children traversal.
if ( $match->{'Foo_traverse_children'} ) {
$visitor->($self->children, 'METHOD', 'children');
}
# Optional parent traversal.
if ( $match->{'Foo_traverse_parent'} ) {
$visitor->($self->parent, 'METHOD', 'parent');
}
}
}
}
my $match = new Data::Match($opts);
$match = $match->match($foos, $pat);
See t/t4.t
for more examples of visitation adaptors.
DESIGN
Data::Match employs a mostly-functional external interface since this module was inspired by a Lisp tutorial ("The Little Lisper", maybe) I read too many years ago; besides, pattern matching is largely recursively functional. The optional control hashes and traverse adapter interfaces are better represented by an object interface so implemented a functional veneer over the core object interface.
Internally, objects are used to represent the pattern primitives because most of the pattern primitives have common behavior. There are a few design patterns that are particularly applicable in Data::Match: Visitor and Adapter. Adapter is used to provide the extensiblity for the traversal of blessed structures such that Data::Match can honor the external interfaces of a class and not blindly violate encapsulation. Visitor is the basis for some of the FIND
pattern implementation. The Data::Match::Slice
classes that provide the match slices are probably a Veneer on the array and hash types through the tie meta-behaviors.
CAVEATS
Does not have regex-like operators like '?', '*', '+'.
Should probably have more interfaces with Data::DRef and Data::Walker.
The visitor adaptors do not use
UNIVERSAL::isa
to search for the adapter is usesref
.Since hash keys do not retain blessedness (what was Larry thinking?) it is difficult to have patterns match keys without resorting to some bizarre regex instead of using
isa
.
STATUS
If you find this to be useful please contact the author. This is alpha software; all APIs, semantics and behavors are subject to change.
INTERFACE
This section describes the external interface of this module.
%match_opts
Default options for match
.
_match_pre
Initialize the match object before pattern traversal.
_match_post
Initialize the match object before pattern traversal.
execute
Matches a structure against a pattern. In a list context, returns both the match success and results; in a scalar context returns the results hash if match succeeded or undef.
use Data::Match;
my $obj = new Data::Match();
my $matched = $obj->execute($thing, $pattern);
match
match($thing, $pattern, @opts)
it same as:
Data::Match->new(@opts)->execute($thing, $pattern);
matches
Same as match
in scalar context.
match_path_str
Returns a perl expression that will generate code to point to the element of the path.
match_path_DRef_path
Returns a string suitable for Data::DRef.
match_path_get
Returns the value pointing to the location for the match path in the root.
my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
my $x = match_path_get($thing, $results->{'BIND'}{'x'}{'p'}[0]);
The above example returns the first array that begins with 'x'
.
match_path_set
Returns the value pointing to the location for the match path in the root.
my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
match_path_set($thing, $results->{'BIND'}{'x'}{'p'}[0], 'y');
The above example replaces the first array found that starts with 'x' with 'y';
match_path_ref
Returns a scalar ref pointing to the location for the match path in the root.
my $results = matches($thing, FIND(BIND('x', [ 'x', REST ])));
my $ref = match_path_ref($thing, $results->{'BIND'}{'x'}{'p'}[0]);
$$ref = 'y';
The above example replaces the first array that starts with 'x' with 'y';
VERSION
Version 0.03, $Revision: 1.8 $.
AUTHOR
Kurt A. Stephens <kurtstephens@acm.org>
COPYRIGHT
Copyright (c) 2001, Kurt A. Stephens and ION, INC.
SEE ALSO
perl, Array::PatternMatcher, Data::Compare, Data::Dumper, Data::DRef, Data::Walker.