NAME

Data::Walk::Extracted - An extracted dataref walker

SYNOPSIS

#! C:/Perl/bin/perl
use Modern::Perl;
use YAML::Any;
use Moose::Util qw( with_traits );
use Data::Walk::Extracted v0.007;
use Data::Walk::Print v0.007;

$| = 1;

#Use YAML to compress writing the data ref
my  $firstref = Load(
    '---
    Someotherkey:
        value
    Parsing:
        HashRef:
            LOGGER:
                run: INFO
    Helping:
        - Somelevel
        - MyKey:
            MiddleKey:
                LowerKey1: lvalue1
                LowerKey2:
                    BottomKey1: 12345
                    BottomKey2:
                    - bavalue1
                    - bavalue2
                    - bavalue3'
);
my  $secondref = Load(
    '---
    Someotherkey:
        value
    Helping:
        - Somelevel
        - MyKey:
            MiddleKey:
                LowerKey1: lvalue1
                LowerKey2:
                    BottomKey1: 12346
                    BottomKey2:
                    - bavalue1
                    - bavalue3'
);
my $newclass = with_traits( 'Data::Walk::Extracted', ( 'Data::Walk::Print' ) );
my $AT_ST = $newclass->new(
        match_highlighting => 1,#This is the default
        sort_HASH => 1,#To force order for demo purposes
);
$AT_ST->print_data(
    print_ref     =>  $firstref,
    match_ref   =>  $secondref,
);

#######################################
#     Output of SYNOPSIS
# 01:{#<--- Ref Type Match
# 02:	Helping => [#<--- Secondary Key Match - Ref Type Match
# 03:		'Somelevel',#<--- Secondary Position Exists - Secondary Value Matches
# 04:		{#<--- Secondary Position Exists - Ref Type Match
# 05:			MyKey => {#<--- Secondary Key Match - Ref Type Match
# 06:				MiddleKey => {#<--- Secondary Key Match - Ref Type Match
# 07:					LowerKey1 => 'lvalue1',#<--- Secondary Key Match - Secondary Value Matches
# 08:					LowerKey2 => {#<--- Secondary Key Match - Ref Type Match
# 09:						BottomKey1 => '12345',#<--- Secondary Key Match - Secondary Value Does NOT Match
# 10:						BottomKey2 => [#<--- Secondary Key Match - Ref Type Match
# 11:							'bavalue1',#<--- Secondary Position Exists - Secondary Value Matches
# 12:							'bavalue2',#<--- Secondary Position Exists - Secondary Value Does NOT Match
# 13:							'bavalue3',#<--- Secondary Position Does NOT Exist - Secondary Value Does NOT Match
# 14:						],
# 15:					},
# 16:				},
# 17:			},
# 18:		},
# 19:	],
# 20:	Parsing => {#<--- Secondary Key Mismatch - Ref Type Mismatch
# 21:		HashRef => {#<--- Secondary Key Mismatch - Ref Type Mismatch
# 22:			LOGGER => {#<--- Secondary Key Mismatch - Ref Type Mismatch
# 23:				run => 'INFO',#<--- Secondary Key Mismatch - Secondary Value Does NOT Match
# 24:			},
# 25:		},
# 26:	},
# 27:	Someotherkey => 'value',#<--- Secondary Key Match - Secondary Value Matches
# 28:},
#######################################

DESCRIPTION

This module takes one or two data references and recursively travels through them. Where the two references diverge the walker follows the primary data reference. At the beginning and end of each node the code will attempt to call a method using data from the current location of the node.

Beware: recursive parsing is not a good fit for all data, since very deep data structures will burn a fair amount of perl memory! As the module recursively parses through the levels, perl leaves behind snapshots of the previous levels that allow it to keep track of its location.

This is an implementation of the concept of extracted data walking from Higher-Order Perl Chapter 1 by Mark Jason Dominus. The book is well worth the money! With that said, I diverged from MJD purity in two ways. First, this is object oriented code, not functional code, and moreover it is written in Moose. :) Second, the code uses methods that are not included in the class to provide add-on functionality at the appropriate places for action. The MJD equivalent expects to use a passed CodeRef at the action points. There is clearly some overhead associated with both of these differences. I made those choices consciously, and if that upsets you do not hassle MJD!

Default Functionality

This module does not do anything by itself but walk the data structure. It takes no action on its own during the walk. All the output above is from Data::Walk::Print.

Basic interface

The module uses five basic pieces of data to work:

primary_ref - a dataref that the walker will walk
secondary_ref - a dataref that is used for comparison while walking
before_method - some action performed at the beginning of each node
after_method - some action performed at the end of each node
conversion_ref - a way to change the data ref naming used in the role to the name used in the base class. This allows the data to be named in a way unique to the role so that any bad callout can be caught but still be used generically by the base class.

An example

my $passed_ref = {
    print_ref => {
        First_key => 'first_value',
    },
    match_ref => {
        First_key => 'second_value',
    },
    before_method => '_print_before_method',
    after_method  => '_print_after_method',
};

my $conversion_ref = {
    primary_ref   => 'print_ref',# generic_name => role_name,
    secondary_ref => 'match_ref',
};

The minimum acceptable list of passed arguments is 'primary_ref' and either of 'before_method' or 'after_method'. The list can also contain 'secondary_ref' and 'branch_ref' but they are not required. When naming the before_method and after_method for the role keep in mind possible namespace collisions with other role methods. The input scrubber will use the $conversion_ref to test the $passed_ref for the correct $key names. If the key names passed from the role differ from the generic names then the scrubber will change the keys prior to sending the $passed_ref to the data walker. Any errors will be 'croak'ed using the passed names not the data walker names.

After the data scrubbing the $passed_ref is sent to the data walker.
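
The key translation described above can be pictured with a small core-Perl sketch. This is not the module's own scrubber: the function name scrub_keys and its exact checks are hypothetical, written only to show role-specific keys being renamed to the generic walker names while errors are still reported with the names the caller used.

```perl
use strict;
use warnings;
use Carp qw( croak );

# Hypothetical helper (NOT part of Data::Walk::Extracted): rename the
# role-specific keys in $passed_ref to the generic walker names.
sub scrub_keys {
    my ( $passed_ref, $conversion_ref ) = @_;
    my %clean = %$passed_ref;    # shallow copy so the caller's hash is untouched
    for my $generic_name ( sort keys %$conversion_ref ) {
        my $role_name = $conversion_ref->{$generic_name};
        next if !exists $clean{$role_name};
        $clean{$generic_name} = delete $clean{$role_name};
    }

    # Report problems with the name the caller used, not the generic name
    my $primary_name = $conversion_ref->{primary_ref} // 'primary_ref';
    croak "'$primary_name' is required" if !exists $clean{primary_ref};
    croak "One of 'before_method' or 'after_method' is required"
        if !exists $clean{before_method} and !exists $clean{after_method};
    return \%clean;
}

my $passed_ref = {
    print_ref     => { First_key => 'first_value' },
    match_ref     => { First_key => 'second_value' },
    before_method => '_print_before_method',
    after_method  => '_print_after_method',
};
my $conversion_ref = {
    primary_ref   => 'print_ref',    # generic_name => role_name
    secondary_ref => 'match_ref',
};

my $scrubbed = scrub_keys( $passed_ref, $conversion_ref );
print join( ', ', sort keys %$scrubbed ), "\n";
# prints: after_method, before_method, primary_ref, secondary_ref
```

Working on a copy keeps the role's own $passed_ref intact, which is a choice made for this sketch and not something the module is known to do.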

v0.007

State: This code is still in Beta state and therefore the API is subject to change. I like the basics and will try to add rather than modify whenever possible in the future. Future development will focus on supporting additional branch types. API changes will only occur if the current functionality proves excessively flawed in some fashion. All fixed functionality will be defined by the test suite.
Included: ArrayRefs and HashRefs are supported data walker nodes. Strings and Numbers are all currently treated as base states.
Excluded: Objects and CodeRefs are not currently handled. They should cause the code to croak if the module encounters them (not tested). See "TODO".

Extending Data::Walk::Extracted

All action taken during the data walking must be initiated by implementing one or both of two possible methods: the before_method and the after_method. The methods are not provided by the base Data::Walk::Extracted class. They can be added with a Moose::Role or by extending the class.

How to add Roles to the Class?

One way to incorporate a role into this class and then use it is the method 'with_traits' from Moose::Util.

What is the recommended way to build a role that uses this class?

First start by creating the 'action' method for the role. This would preferably be named something descriptive like 'mangle_data'. This method should build a $passed_ref and possibly a $conversion_ref. The $passed_ref can include up to two data references, a call to either a 'before_method' or an 'after_method' or both, and possibly a 'branch_ref'. The $conversion_ref should contain key / value pairs that represent the translation of the $passed_ref keys used in the Role to the names used by the class. This allows for generic handling of walking while still allowing multiple roles to coexist in the class when built.

Then build one or both of before_method and after_method for use when walking the data. For examples review the code in Data::Walk::Print.

Write some tests for your role!

What is the recursive data walking sequence?

First: The class checks for an available 'before_method' using the test exists $passed_ref->{before_method}. If the test passes then the sequence $method = $passed_ref->{before_method}; $passed_ref = $self->$method( $passed_ref ); is run. If the new $passed_ref contains the key $passed_ref->{bounce} or is undef, the program deletes the key 'bounce' from the $passed_ref (as needed) and then returns $passed_ref directly back up the data tree. Do not pass 'Go', do not collect $200. Otherwise $passed_ref is sent on to the node parser. If the $passed_ref is modified by the 'before_method' then the node parser will parse the new ref and not the old one.
Second: It determines what reference type the node is at the current level. Strings and Numbers are considered 'TERMINATOR' types and are handled as single element nodes. Then a list of the elements of that node is created, and if the list should be sorted then it is sorted. If the current node is 'undef' this is considered a 'base state' and the code skips to the "Fifth" step.
Third: Building the $passed_ref. For each element of the node a new dataset is built. The dataset consists of a "primary_ref", a "secondary_ref" and a "branch_ref". The primary_ref contains only the portion of the dataset that exists below the selected element of that node. The secondary_ref is only constructed if it has a matching element at that node with the primary_ref. Node matching for hashrefs is done by string compares of the key only. Node matching for arrayrefs is done by testing if the secondary_ref has the same array position available as the primary_ref. No position content compare is done! The secondary_ref would then be built like the primary_ref. The branch_ref will contain an array ref of array refs. Each of the top array positions represents a previously traveled node on the current branch. The lower array ref will have four positions which describe the element taken for that branch. The values in each position are: 0-ref type, 1-hash key name or '', 2-element sequence position (from 0), and 3-level of the node (from 1). The branch_ref arrays are effectively the linear (vertical) breadcrumbs that show how the parser got to that point. Past completed branches and future pending branches are not shown. The new dataset is then passed to the recursive (private) subroutine to be parsed in the same manner ("First").
Fourth: When the values are returned from the recursion call, the returned values are used to replace the passed primary_ref and secondary_ref values in the current $passed_ref.
Fifth: The class checks for an available 'after_method' using the test exists $passed_ref->{after_method}. If the test passes then the sequence $method = $passed_ref->{after_method}; $passed_ref = $self->$method( $passed_ref ); is run.
Sixth: The $passed_ref is passed back up to the next level (with changes).
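
The walking sequence can be sketched in core Perl. What follows is a simplified illustration and not the module's source: the package Sketch::Walker, its walk method, and the _note_before / _note_after / _path helpers are all hypothetical, only the primary_ref is written back after recursion, and unsupported ref types are silently ignored where the real class is expected to croak.

```perl
use strict;
use warnings;

package Sketch::Walker;    # hypothetical stand-in, NOT the real class

sub new {
    my ( $class, %args ) = @_;
    return bless { sort_HASH => 0, log => [], %args }, $class;
}

sub walk {
    my ( $self, $passed_ref ) = @_;

    # First: call the before_method and bounce straight back up if asked to
    if ( exists $passed_ref->{before_method} ) {
        my $method = $passed_ref->{before_method};
        $passed_ref = $self->$method( $passed_ref );
        return $passed_ref if !defined $passed_ref;
        if ( exists $passed_ref->{bounce} ) {
            delete $passed_ref->{bounce};
            return $passed_ref;
        }
    }

    # Second: find the node type and list its elements (scalars list none)
    my $primary  = $passed_ref->{primary_ref};
    my $type     = ref $primary;
    my @elements =
          $type eq 'HASH'  ? keys %$primary
        : $type eq 'ARRAY' ? ( 0 .. $#$primary )
        :                    ();
    @elements = sort @elements if $type eq 'HASH' and $self->{sort_HASH};

    # Third: build a new dataset for each element and recurse into it
    my @trail    = @{ $passed_ref->{branch_ref} // [] };
    my $level    = scalar( @trail ) + 1;
    my $position = 0;
    for my $element ( @elements ) {
        my $is_hash = $type eq 'HASH';
        my $key     = $is_hash ? $element : '';
        my %next    = (
            %$passed_ref,
            primary_ref => $is_hash ? $primary->{$element} : $primary->[$element],
            branch_ref  => [ @trail, [ $type, $key, $position++, $level ] ],
        );

        # The secondary_ref only follows where it has a matching element
        delete $next{secondary_ref};
        my $secondary = $passed_ref->{secondary_ref};
        if ( $is_hash and ref( $secondary ) eq 'HASH' ) {
            $next{secondary_ref} = $secondary->{$element} if exists $secondary->{$element};
        }
        elsif ( !$is_hash and ref( $secondary ) eq 'ARRAY' ) {
            $next{secondary_ref} = $secondary->[$element] if $element <= $#$secondary;
        }

        # Fourth: the returned value replaces the element that was walked
        my $returned = $self->walk( \%next );
        next if !defined $returned;
        if ( $is_hash ) { $primary->{$element} = $returned->{primary_ref} }
        else            { $primary->[$element] = $returned->{primary_ref} }
    }

    # Fifth: call the after_method. Sixth: pass the result back up
    if ( exists $passed_ref->{after_method} ) {
        my $method = $passed_ref->{after_method};
        $passed_ref = $self->$method( $passed_ref );
    }
    return $passed_ref;
}

# Hypothetical action methods that a role would normally supply
sub _path {
    my ( $passed_ref ) = @_;
    my @names = map { my $step = $_; $step->[0] eq 'HASH' ? $step->[1] : $step->[2] }
        @{ $passed_ref->{branch_ref} // [] };
    return join '/', 'top', @names;
}

sub _note_before {
    my ( $self, $passed_ref ) = @_;
    push @{ $self->{log} }, 'before ' . _path( $passed_ref );
    return $passed_ref;
}

sub _note_after {
    my ( $self, $passed_ref ) = @_;
    push @{ $self->{log} }, 'after ' . _path( $passed_ref );
    return $passed_ref;
}

package main;

my $walker = Sketch::Walker->new( sort_HASH => 1 );
$walker->walk( {
    primary_ref   => { Helping => [ 'Somelevel' ], Someotherkey => 'value' },
    before_method => '_note_before',
    after_method  => '_note_after',
} );
print "$_\n" for @{ $walker->{log} };
```

The log shows each node being entered before its children and left after them, and the path for each entry is rebuilt purely from the branch_ref breadcrumbs.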

Attributes

Data passed to ->new when creating an instance. For modification of these attributes see "Methods". The ->new function will accept either a fat comma list or a complete hash ref that has the possible attributes as the top keys.

sort_HASH

Definition: This attribute is set to sort (or not) Hash Ref keys prior to walking the Hash Ref node.
Default 0 (No sort)
Range Boolean values. See "TODO" for future direction.

sort_ARRAY

Definition: This attribute is set to sort (or not) Array values prior to walking the Array Ref node. Warning: this will permanently sort the actual data in the passed ref. If a secondary ref also exists it will be sorted as well!
Default 0 (No sort)
Range Boolean values. See "TODO" for future direction.
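
The sort_ARRAY warning is easy to reproduce in core Perl. This sketch does not call the module at all. It only assumes, as the warning states, that the sort is applied through the reference that was passed in, and shows how walking a copy avoids the side effect.

```perl
use strict;
use warnings;

my $data = { list => [ 'pear', 'apple', 'fig' ] };

# Sorting through the reference rewrites the array the caller still holds
my $node = $data->{list};
@$node = sort @$node;
print "original now: @{ $data->{list} }\n";    # prints: original now: apple fig pear

# Sorting a copy leaves the original order alone
my $original = [ 'pear', 'apple', 'fig' ];
my $copy     = [ @$original ];
@$copy = sort @$copy;
print "original kept: @$original\n";           # prints: original kept: pear apple fig
```

The copy here is shallow. For nested data a deep copy such as dclone from the core Storable module would be needed before handing the structure to the walker.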

skip_HASH_ref

Definition: This attribute is set to skip (or not) the processing of HASH Ref nodes.
Default 0 (Don't skip)
Range Boolean values.

skip_ARRAY_ref

Definition: This attribute is set to skip (or not) the processing of ARRAY Ref nodes.
Default 0 (Don't skip)
Range Boolean values.

skip_TERMINATOR_ref

Definition: This attribute is set to skip (or not) the processing of TERMINATORs of ref branches.
Default 0 (Don't skip)
Range Boolean values.

change_array_size

Definition: This attribute will not be used by this class directly. However the Data::Walk::Prune Role and the Data::Walk::Graft Role both use it so it is placed here so there will be no conflicts.
Default 1 (This usually means that the array position will be added or removed)
Range Boolean values.

Methods

change_array_size_behavior( $bool )

Definition: This method is used to change the "change_array_size" attribute after the instance is created. This attribute is not used by this class! However, it is provided so multiple Roles can share behavior rather than each handling this attribute differently. See Data::Walk::Prune and Data::Walk::Graft for specific effects of this attribute.
Accepts: a Boolean value
Returns: ''

GLOBAL VARIABLES

$ENV{Smart_Comments}

The module uses Smart::Comments with the '-ENV' option so setting the variable $ENV{Smart_Comments} will turn on smart comment reporting. There are three levels of 'Smartness' called in this module '### #### #####'. See the Smart::Comments documentation for more information.

$Carp::Verbose

The module uses Carp to die(croak) so the variable $Carp::Verbose can be set for more detailed debugging.
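
The effect of $Carp::Verbose can be seen without this module. In the sketch below the package Sketch::Thrower and its risky function are hypothetical stand-ins for any code that croaks. Only Carp itself is real.

```perl
use strict;
use warnings;

package Sketch::Thrower;    # hypothetical package standing in for a module
use Carp qw( croak );
sub risky { croak 'bad input' }

package main;

# Default: a short message located at the caller's line
my $short = do { eval { Sketch::Thrower::risky() }; $@ };

# Verbose: the same croak now carries a full stack backtrace
my $verbose = do {
    local $Carp::Verbose = 1;
    eval { Sketch::Thrower::risky() };
    $@;
};

print "short:\n$short\nverbose:\n$verbose";
```

Using local keeps the verbose setting confined to the block being debugged, so the rest of the program keeps the short messages.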

BUGS

Data-Walk-Extracted/issues

TODO

Support recursion through CodeRefs
Support recursion through Objects
Allow the sort_XXX attributes to receive a sort subroutine
Add a Data::Walk::Top Role to the package
Add a Data::Walk::Thin Role to the package

SUPPORT

jandrew@cpan.org

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

Dependencies

Modern::Perl
version
Carp
Moose
MooseX::StrictConstructor
MooseX::Types::Moose
Smart::Comments -ENV option set

SEE ALSO

Data::Walk
Data::Walker
Data::Dumper - Dump
YAML - Dump
Data::Walk::Print - or other action object