NAME

Data::Seek - Search Complex Data Structures

VERSION

version 0.07

SYNOPSIS

use Data::Seek;

my $hash   = {...};
my $seeker = Data::Seek->new(data => $hash);
my $result = $seeker->search('*');
my $data   = $result->data;

DESCRIPTION

Data::Seek is a module for traversing complex data structures. It allows you to select specific node(s) in a hierarchical data structure using criteria: one or more query expressions, each of which is called a criterion. A criterion selects the node(s) to be returned in the result. Data::Seek is akin to Data::DPath but with fewer features and a simpler node selection syntax. Data::Seek node traversal is extremely fast, and it can throw exception objects which explain, in detail, why a search failed. This is useful both internally and externally, for example when processing foreign data structures where you need to provide detailed errors explaining how to resolve missing or malformed data nodes.

CONCEPTS

The following is a short and simple overview of the strategy and syntax used by Data::Seek to query complex data structures. The overall idea behind Data::Seek is to flatten/fold the data structure, preferably once, reduce it by applying a series of patterns, then unflatten/unfold and operate on the new data structure. The introspection strategy is to flatten the data structure, producing a non-hierarchical data structure whose keys represent endpoints within the structure. Endpoints use dot-notation to separate nested hash keys and colons to separate array indices.

ENCODING

During the process of flattening, a data structure with nested data, such as the following, is converted into a collection of endpoint/value pairs.

{
    'id' => 12345,
    'patient' => {
        'name' => {
            'first' => 'Bob',
            'last'  => 'Bee'
        }
    },
    'medications' => [{
        'aceInhibitors' => [{
            'name'      => 'lisinopril',
            'strength'  => '10 mg Tab',
            'dose'      => '1 tab',
            'route'     => 'PO',
            'sig'       => 'daily',
            'pillCount' => '#90',
            'refills'   => 'Refill 3'
        }],
        'antianginal' => [{
            'name'      => 'nitroglycerin',
            'strength'  => '0.4 mg Sublingual Tab',
            'dose'      => '1 tab',
            'route'     => 'SL',
            'sig'       => 'q15min PRN',
            'pillCount' => '#30',
            'refills'   => 'Refill 1'
        }],
    }]
}

Given the aforementioned data structure, the following would be the resulting flattened structure comprised of endpoint/value pairs.

{
    'id'                                      => 12345,
    'medications:0.aceInhibitors:0.dose'      => '1 tab',
    'medications:0.aceInhibitors:0.name'      => 'lisinopril',
    'medications:0.aceInhibitors:0.pillCount' => '#90',
    'medications:0.aceInhibitors:0.refills'   => 'Refill 3',
    'medications:0.aceInhibitors:0.route'     => 'PO',
    'medications:0.aceInhibitors:0.sig'       => 'daily',
    'medications:0.aceInhibitors:0.strength'  => '10 mg Tab',
    'medications:0.antianginal:0.dose'        => '1 tab',
    'medications:0.antianginal:0.name'        => 'nitroglycerin',
    'medications:0.antianginal:0.pillCount'   => '#30',
    'medications:0.antianginal:0.refills'     => 'Refill 1',
    'medications:0.antianginal:0.route'       => 'SL',
    'medications:0.antianginal:0.sig'         => 'q15min PRN',
    'medications:0.antianginal:0.strength'    => '0.4 mg Sublingual Tab',
    'patient.name.first'                      => 'Bob',
    'patient.name.last'                       => 'Bee',
}

This structure provides the endpoint strings which will be matched against using the querying strategy.
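
The folding step can be illustrated with a short recursive routine. This is a minimal sketch, not Data::Seek's own encoder: the flatten helper is hypothetical, and it assumes the input is an unblessed hash reference whose leaves are plain scalars.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Seek: recursively fold a nested
# structure into endpoint/value pairs.
sub flatten {
    my ($node, $prefix, $out) = @_;
    $out //= {};

    if (ref $node eq 'HASH') {
        for my $key (keys %$node) {
            # Hash keys are joined to their parent with a period.
            my $path = defined $prefix ? "$prefix.$key" : $key;
            flatten($node->{$key}, $path, $out);
        }
    }
    elsif (ref $node eq 'ARRAY') {
        for my $i (0 .. $#$node) {
            # Array indices are joined to their parent with a colon.
            flatten($node->[$i], "$prefix:$i", $out);
        }
    }
    else {
        # Anything else is treated as a leaf value stored under its endpoint.
        $out->{$prefix} = $node;
    }

    return $out;
}

my $flat = flatten({
    id          => 12345,
    patient     => { name => { first => 'Bob', last => 'Bee' } },
    medications => [{ aceInhibitors => [{ name => 'lisinopril' }] }],
});

print "$_ => $flat->{$_}\n" for sort keys %$flat;
```

Because every endpoint is a plain string, the querying step only has to match strings rather than walk the nested structure again.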

QUERYING

During the process of querying the data structure, the criteria (query expressions) are converted into a series of regular expressions which are applied sequentially, filtering/reducing the endpoints and either producing a data set of matching nodes or throwing an exception explaining the search failure.
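
As a rough illustration of that filtering step, the sketch below applies a list of hand-written patterns to a flattened structure. The filter_endpoints helper is hypothetical, and a plain die stands in for the exception objects that Data::Seek throws; how the module builds its patterns internally is not shown here.

```perl
use strict;
use warnings;

# A few endpoint/value pairs taken from the flattened example structure.
my %flat = (
    'id'                 => 12345,
    'patient.name.first' => 'Bob',
    'patient.name.last'  => 'Bee',
);

# Hypothetical helper: apply each pattern in turn and collect the matches.
sub filter_endpoints {
    my ($flat, @patterns) = @_;
    my %matched;
    for my $pattern (@patterns) {
        my @hits = grep { $_ =~ $pattern } sort keys %$flat;
        # Assumption: a failed pattern aborts the whole search.
        die "no endpoint matches $pattern\n" unless @hits;
        @matched{@hits} = @{$flat}{@hits};
    }
    return \%matched;
}

# Two patterns that match: the result holds both endpoints.
my $found = filter_endpoints(\%flat, qr/\Aid\z/, qr/\Apatient\.name\.first\z/);

# A pattern that matches nothing: the error is captured instead of a result.
my $error = eval { filter_endpoints(\%flat, qr/\Apatient\.age\z/); 1 } ? '' : $@;
```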

Node Expression

id
patient
medications

The node expression is the part of a criterion that performs an exact match against a node in the data structure. It is a string which can contain letters, numbers, and/or underscores.

Step Expression

my $result = $seeker->search(...);

# given "id"
{ id => 12345 }

# given "patient.name.first"
{ patient => { name => { first => "Bob" } } }

# given "patient.name.last"
{ patient => { name => { last => "Bee" } } }

The step expression is a criterion, or part of a criterion, made up of one or more node expressions separated by periods, which matches against nodes in the data structure. Each node expression is a string which can contain letters, numbers, and/or underscores.

Index Expression

my $result = $seeker->search(...);

# given "medications:0.aceInhibitors:0.dose"
{ medications => [{ aceInhibitors => [{ dose => "1 tab" }] }] }

# given "medications:0.aceInhibitors:0.name"
{ medications => [{ aceInhibitors => [{ name => "lisinopril" }] }] }

# given "medications:0.aceInhibitors:0.pillCount"
{ medications => [{ aceInhibitors => [{ pillCount => "#90" }] }] }

The index expression is a criterion, or part of a criterion, consisting of a node expression suffixed with a colon followed by a number. It only matches an array element whose index corresponds to the numeric portion of the suffix. It is a string which can contain letters, numbers, and/or underscores, suffixed with a colon followed by a number.
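
The result structures shown above are nested again, so matched endpoints have to be unfolded back into hashes and arrays. A minimal sketch of that step follows; the unfold helper is hypothetical, not the module's own decoder, and it assumes that periods and colons never occur inside a hash key.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild a nested structure from endpoint/value pairs.
sub unfold {
    my ($flat) = @_;
    my $root = {};

    for my $endpoint (sort keys %$flat) {
        # Split "medications:0.dose" into hash keys and [array indices]:
        # ("medications", [0], "dose").
        my @steps;
        for my $step (split /\./, $endpoint) {
            my ($key, @indices) = split /:/, $step;
            push @steps, $key, map { [$_] } @indices;
        }

        # Walk the steps; Perl autovivifies the intermediate containers.
        my $slot = \$root;
        for my $step (@steps) {
            $slot = ref $step ? \$$slot->[ $step->[0] ] : \$$slot->{$step};
        }
        $$slot = $flat->{$endpoint};
    }

    return $root;
}

my $tree = unfold({
    'id'                                 => 12345,
    'medications:0.aceInhibitors:0.dose' => '1 tab',
    'medications:0.aceInhibitors:0.name' => 'lisinopril',
});
```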

Iterator Expression

my $result = $seeker->search(...);

# given "@medications.@aceInhibitors.dose"
{ medications => [{ aceInhibitors => [{ dose => "1 tab" }] }] }

# given "@medications.@aceInhibitors.name"
{ medications => [{ aceInhibitors => [{ name => "lisinopril" }] }] }

# given "@medications.@aceInhibitors.pillCount"
{ medications => [{ aceInhibitors => [{ pillCount => "#90" }] }] }

The iterator expression is a criterion, or part of a criterion, consisting of a node expression preceded by an "at" character (@). It denotes that the node expression should match all nodes in the data structure which are mapped to array objects, whatever their index. It is a string which can contain letters, numbers, and/or underscores, preceded by a single "at" character.
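
One way to picture the iterator expression is as shorthand for "this node followed by any array index". The sketch below translates such a criterion into a regular expression over endpoint strings. The iterator_to_regex helper and the pattern it produces are assumptions made for illustration, not the module's actual translation.

```perl
use strict;
use warnings;

# Hypothetical translation, not Data::Seek's actual implementation.
sub iterator_to_regex {
    my ($criterion) = @_;
    my @parts;
    for my $step (split /\./, $criterion) {
        if ($step =~ /\A\@(\w+)\z/) {
            # "@name" matches "name" followed by any array index.
            push @parts, $1 . ':\d+';
        }
        else {
            # Any other step has to match literally.
            push @parts, quotemeta $step;
        }
    }
    my $body = join '\.', @parts;
    return qr/\A$body\z/;
}

my @endpoints = (
    'medications:0.aceInhibitors:0.dose',
    'medications:0.antianginal:0.dose',
    'patient.name.first',
);
my $regex   = iterator_to_regex('@medications.@aceInhibitors.dose');
my @matched = grep { $_ =~ $regex } @endpoints;
print "$_\n" for @matched;
```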

Wildcard Expression

my $result = $seeker->search(...);

# given "*"
{ id => 12345 }

# given "*.*.first"
{ patient => { name => { first => "Bob" } } }

# given "*.*.last"
{ patient => { name => { last => "Bee" } } }

# given "patient.*.first"
{ patient => { name => { first => "Bob" } } }

# given "patient.*.last"
{ patient => { name => { last => "Bee" } } }

# given "@*.@*.pillCount"
{
    medications => [{
        aceInhibitors => [{ pillCount => "#90" }],
        antianginal   => [{ pillCount => "#30" }],
    }],
}

The wildcard expression is a criterion, or part of a criterion, in which a single "star" character matches and represents exactly one node expression. It is a string which can contain letters, numbers, underscores, and/or a single star character.
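
The examples above suggest that a lone star stands in for exactly one node and never crosses a period or a colon. The sketch below models that with \w+ for each star. Both the wildcard_to_regex helper and the choice of \w+ are assumptions for illustration; the sketch only handles a star that makes up a whole step and leaves out the "@" prefix.

```perl
use strict;
use warnings;

# Hypothetical translation, not Data::Seek's actual implementation.
sub wildcard_to_regex {
    my ($criterion) = @_;
    my @parts;
    for my $step (split /\./, $criterion) {
        # A lone star stands for one node name; other steps match literally.
        push @parts, $step eq '*' ? '\w+' : quotemeta $step;
    }
    my $body = join '\.', @parts;
    return qr/\A$body\z/;
}

my @endpoints = (
    'id',
    'patient.name.first',
    'patient.name.last',
    'medications:0.aceInhibitors:0.dose',
);

for my $criterion ('*', '*.*.first', 'patient.*.last') {
    my $regex   = wildcard_to_regex($criterion);
    my @matched = grep { $_ =~ $regex } @endpoints;
    print "$criterion => @matched\n";
}
```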

Greedy-Wildcard Expression

my $result = $seeker->search(...);

# given "**.first"
{ patient => { name => { first => "Bob" } } }

# given "**.last"
{ patient => { name => { last => "Bee" } } }

# given "patient.**"
{ patient => { name => { first => "Bob", last => "Bee" } } }

# given "medications**.pillCount"
{
    medications => [{
        aceInhibitors => [{ pillCount => "#90" }],
        antianginal   => [{ pillCount => "#30" }],
    }],
}

The greedy-wildcard expression is a criterion, or part of a criterion, in which a double "star" character matches and represents one or more of any character, and can therefore span any number of nodes. It is a string which can contain letters, numbers, underscores, and/or a double star character.
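
Since the double star represents one or more of any character, it can be modelled as .+ in a regular expression, which lets it cross periods and colons. The sketch below shows that idea. The greedy_to_regex helper is hypothetical and only handles the double star, not the other expression types.

```perl
use strict;
use warnings;

# Hypothetical translation, not Data::Seek's actual implementation.
sub greedy_to_regex {
    my ($criterion) = @_;
    my $body = quotemeta $criterion;   # escape everything, including "*"
    $body =~ s/\\\*\\\*/.+/g;          # then turn each escaped "**" into ".+"
    return qr/\A$body\z/;
}

my @endpoints = (
    'id',
    'patient.name.first',
    'patient.name.last',
    'medications:0.aceInhibitors:0.pillCount',
    'medications:0.antianginal:0.pillCount',
);

for my $criterion ('**.first', 'patient.**', 'medications**.pillCount') {
    my $regex   = greedy_to_regex($criterion);
    my @matched = grep { $_ =~ $regex } @endpoints;
    print "$criterion => @matched\n";
}
```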

ATTRIBUTES

cache

$seeker->cache;
$seeker->cache(1);

Encode the data structure and cache the result. Allows multiple queries to execute faster. Caching is disabled by default.

data

$seeker->data;
$seeker->data(Data::Seek::Data->new(...));

The data structure to be introspected. It must be a hash reference, blessed or not; it defaults to, or is converted into, a Data::Seek::Data object.

ignore

$seeker->ignore;
$seeker->ignore(1);

Bypass the exceptions thrown when a criterion is invalid or when no matching data can be found.

METHODS

search

my @criteria = ('id', 'patient.name.*');
my $result   = $seeker->search(@criteria);

Prepare a search object to use the supplied criteria and return a result object. Introspection is deferred until the result object is used. See Data::Seek::Search::Result for usage information.

AUTHOR

Al Newkirk <anewkirk@ana.io>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Al Newkirk.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.