NAME
Data::Walk::Extracted - An extracted dataref walker
SYNOPSIS
#!perl
use Modern::Perl;
use YAML::Any;
use Moose::Util qw( with_traits );
use Data::Walk::Extracted v0.015;
use Data::Walk::Print v0.009;

$| = 1;

# Use YAML to compress writing the data ref
my $firstref = Load(
'---
Someotherkey:
    value
Parsing:
    HashRef:
        LOGGER:
            run: INFO
Helping:
    - Somelevel
    - MyKey:
        MiddleKey:
            LowerKey1: lvalue1
            LowerKey2:
                BottomKey1: 12345
                BottomKey2:
                    - bavalue1
                    - bavalue2
                    - bavalue3'
);
my $secondref = Load(
'---
Someotherkey:
    value
Helping:
    - Somelevel
    - MyKey:
        MiddleKey:
            LowerKey1: lvalue1
            LowerKey2:
                BottomKey2:
                    - bavalue1
                    - bavalue2
                BottomKey1: 12354'
);
my $AT_ST = with_traits(
    'Data::Walk::Extracted',
    ( 'Data::Walk::Print' ),
)->new(
    match_highlighting => 1,# This is the default
);
$AT_ST->print_data(
    print_ref => $firstref,
    match_ref => $secondref,
    sort_HASH => 1,# To force order for demo purposes
);
############################################################################
# Output of SYNOPSIS
# 01:{#<--- Ref Type Match
# 02:    Helping => [#<--- Secondary Key Match - Ref Type Match
# 03:        'Somelevel',#<--- Secondary Position Exists - Secondary Value Matches
# 04:        {#<--- Secondary Position Exists - Ref Type Match
# 05:            MyKey => {#<--- Secondary Key Match - Ref Type Match
# 06:                MiddleKey => {#<--- Secondary Key Match - Ref Type Match
# 07:                    LowerKey1 => 'lvalue1',#<--- Secondary Key Match - Secondary Value Matches
# 08:                    LowerKey2 => {#<--- Secondary Key Match - Ref Type Match
# 09:                        BottomKey1 => '12345',#<--- Secondary Key Match - Secondary Value Does NOT Match
# 10:                        BottomKey2 => [#<--- Secondary Key Match - Ref Type Match
# 11:                            'bavalue1',#<--- Secondary Position Exists - Secondary Value Matches
# 12:                            'bavalue2',#<--- Secondary Position Exists - Secondary Value Does NOT Match
# 13:                            'bavalue3',#<--- Secondary Position Does NOT Exist - Secondary Value Does NOT Match
# 14:                        ],
# 15:                    },
# 16:                },
# 17:            },
# 18:        },
# 19:    ],
# 20:    Parsing => {#<--- Secondary Key Mismatch - Ref Type Mismatch
# 21:        HashRef => {#<--- Secondary Key Mismatch - Ref Type Mismatch
# 22:            LOGGER => {#<--- Secondary Key Mismatch - Ref Type Mismatch
# 23:                run => 'INFO',#<--- Secondary Key Mismatch - Secondary Value Does NOT Match
# 24:            },
# 25:        },
# 26:    },
# 27:    Someotherkey => 'value',#<--- Secondary Key Match - Secondary Value Matches
# 28:},
############################################################################
DESCRIPTION
This module takes a data reference (or two) and recursively travels through it (or them). Where the two references diverge, the walker follows the primary data reference. At the beginning and end of each "node" the code will attempt to call a method using data from the current location of the node.
Definitions
node
Each branch point of a data reference is considered a node. The original top level reference is the 'zeroth' node. Recursion 'base state' nodes are understood to have zero elements, so an additional node type called 'END' is recognized after a scalar.
Caveat utilitor
This is not an extension of Data::Walk.
This module uses the 'defined or' operator ( //= ) and so requires perl 5.010 or higher.
This is a Moose based data handling class. Many software developers will tell you Moose and data manipulation don't belong together. They are most certainly right in startup-time critical circumstances.
Recursive parsing is not a good fit for all data since very deep data structures will consume a fair amount of computer memory! The code leaves a snapshot of the active data at the previous node in memory when it travels down the data tree. This means that the memory footprint of the originally passed primary ref and secondary ref (and a few other data points) is multiplied many times as a function of the depth of the data structure.
This class has no external effect! All output above is from the role Data::Walk::Print.
The primary_ref and secondary_ref are effectively deep cloned during this process. To leave the primary_ref pointer intact see "fixed_primary".
The "COPYRIGHT" is down lower.
Supported node walking types
- ARRAY
- HASH
- SCALAR
Other node support
Support for objects is partially implemented and as a consequence '_process_the_data' won't immediately die when asked to parse an object. It will still die, but on a dispatch table call that indicates where object support is missing rather than at the top of the node.
Supported one shot "Attributes"
- sort_HASH
- sort_ARRAY
- skip_HASH_ref
- skip_ARRAY_ref
- skip_SCALAR_ref
- change_array_size
- fixed_primary
What is the unique value of this module?
With the recursive part of data walking extracted the various functionalities desired when walking the data can be modularized without copying this code. The Moose framework also allows diverse and targeted data parsing without dragging along a kitchen sink API for every implementation of this Class.
Acknowledgement of MJD
This is an implementation of the concept of extracted data walking from Higher-Order Perl Chapter 1 by Mark Jason Dominus. The book is well worth the money! With that said I diverged from MJD purity in two ways. First, this is object oriented code, not functional code. Second, like the MJD equivalent, the code does nothing on its own, but unlike the MJD equivalent it looks for methods provided in a role or class extension at the appropriate places for action. The MJD equivalent expects to use a passed CodeRef at the action points. There is clearly some overhead associated with both of these differences. I made those choices consciously, and if that upsets you do not hassle MJD!
Extending Data::Walk::Extracted
All action taken during the data walking must be initiated by implementation of action methods that do not exist in this class. They can be added with a traditionally incorporated role (Moose::Role), by extending the class, or by attaching a role with the needed functionality at run time using 'with_traits' from Moose::Util. See the internal method _process_the_data for the detail of how these methods are incorporated, and review the "Recursive Parsing Flow" to understand how the methods are used.
What is the recommended way to build a role that uses this class?
First build a method to be used when the class reaches a data "node" and another to be used when the class leaves a data node (as needed). Then create the 'action' method for the role. This would preferably be named something descriptive like 'mangle_data'. Remember that if more than one role is added to Data::Walk::Extracted then the method names of all the roles must be considered together so that they do not collide. The 'action' method should compose any required node action methods and data references into a $passed_ref and possibly a $conversion_ref to be used by _process_the_data. Then the 'action' method should call:
$passed_ref = $self->_process_the_data( $passed_ref, $conversion_ref );
Afterwards it should return anything of interest from the $passed_ref.
Finally, write some tests for your role!
Methods
Methods used to write Roles
_process_the_data( $passed_ref, $conversion_ref ) - internal
- Definition: This method is the core access to the recursive parsing of Data::Walk::Extracted. While the method is listed as a private (leading underscore) method, it is intended to be used by consuming roles or classes. To use this method you compose this class with a role or inherit from this class, and then send the needed information from your code to this method. This method will scrub the data inputs and send them to the recursive parser. Extensions or roles that use this method are expected to compose and pass the following data to this method.
- Accepts: ( $passed_ref, $conversion_ref )
  - $passed_ref - this ref contains key value pairs as follows:
    - primary_ref - a dataref that the walker will walk - required
    - secondary_ref - a dataref that is used for comparison while walking - optional
    - before_method - a method name that will perform some action at the beginning of each node - optional
    - after_method - a method name that will perform some action at the end of each node - optional
    - [attribute name] - attribute names are accepted with temporary attribute settings. These settings are temporarily set for a single "_process_the_data" call and then the original attribute values are restored. For this to work the attribute must have the following prefixed methods: get_$name, set_$name, clear_$name, and has_$name. - optional
  - $conversion_ref - this allows a public method to accept different key names for the various keys listed above and then convert them later to the generic terms used by this class. - optional
- Example

    $passed_ref = {
        print_ref => {
            First_key => [ 'first_value', 'second_value' ],
        },
        match_ref => {
            First_key => 'second_value',
        },
        before_method => '_print_before_method',
        after_method  => '_print_after_method',
        sort_ARRAY    => 1,# One shot attribute setter
    };
    $conversion_ref = {
        primary_ref   => 'print_ref',# generic_name => role_name,
        secondary_ref => 'match_ref',
    };
- Action: This method begins by scrubbing the top level of the inputs and ensures that the minimum requirements for the recursive data parser are met. If needed it will use a conversion ref (also provided by the caller) to change input hash keys to the generic hash keys used by this class. When the recursive data walker is called it walks through the passed primary_ref data structure. Each time the walker reaches a "node" it will attempt to call a provided before_method. It will then check if the secondary_ref matches; see the "Recursive Parsing Flow" for more information. At this point it will recursively walk the node. After the node has been processed it will attempt to call an after_method. The before_method and after_method are allowed to change the primary_ref and secondary_ref.
- Returns: the $passed_ref (only) with the key names restored to their original versions.
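The key conversion described above can be pictured with a minimal plain Perl sketch. The helper names to_generic_keys and to_role_keys are hypothetical and are not part of Data::Walk::Extracted; the sketch only illustrates the renaming idea under the assumption that $conversion_ref maps generic_name => role_name, and it is not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: rename role specific keys to the generic names.
# $conversion_ref maps generic_name => role_name.
sub to_generic_keys {
    my ( $passed_ref, $conversion_ref ) = @_;
    my %generic = %$passed_ref;    # shallow copy, the caller's hash is untouched
    for my $generic_name ( keys %$conversion_ref ) {
        my $role_name = $conversion_ref->{$generic_name};
        next if !exists $generic{$role_name};
        $generic{$generic_name} = delete $generic{$role_name};
    }
    return \%generic;
}

# Hypothetical helper: restore the role specific key names afterwards.
sub to_role_keys {
    my ( $generic_ref, $conversion_ref ) = @_;
    my %restored = %$generic_ref;
    for my $generic_name ( keys %$conversion_ref ) {
        next if !exists $restored{$generic_name};
        $restored{ $conversion_ref->{$generic_name} } = delete $restored{$generic_name};
    }
    return \%restored;
}

my $conversion_ref = { primary_ref => 'print_ref', secondary_ref => 'match_ref' };
my $passed_ref     = {
    print_ref => { First_key => [ 'first_value', 'second_value' ] },
    match_ref => { First_key => 'second_value' },
};

my $generic  = to_generic_keys( $passed_ref, $conversion_ref );
my $restored = to_role_keys( $generic, $conversion_ref );
print join( ', ', sort keys %$generic ),  "\n";    # generic key names
print join( ', ', sort keys %$restored ), "\n";    # role key names again
```

Working on a shallow copy keeps the caller's hash keys stable, which is one simple way to satisfy the "key names restored" promise.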
_build_branch( $seed_ref, @arg_list ) - internal
- Definition: There are times when a role will wish to reconstruct the data branch that led from the 'zeroth' node to the current position of the data walker. This private method takes a seed reference and uses the branch ref to recursively append to the front of the seed until a complete branch to the zeroth node is generated.
- Accepts: a list of arguments starting with the $seed_ref to build from. The remaining arguments are just the array elements of the 'branch ref'.
- Example

    $ref = $self->_build_branch( $seed_ref, @{ $passed_ref->{branch_ref} }, );
- Returns: a data reference with the current path back to the start pre-pended to the $seed_ref
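As a rough illustration of the idea, the following plain Perl sketch rebuilds a branch from a seed. The function build_branch is a hypothetical stand-in, not the module's code, and it assumes each branch element has the four positions [ ref_type, item, position, level ] described under "A position trace is generated".

```perl
use strict;
use warnings;

# Hypothetical stand-in for _build_branch: wrap $seed_ref in one container
# per branch element, working from the deepest element back to the zeroth node.
sub build_branch {
    my ( $seed_ref, @branch ) = @_;
    return $seed_ref if !@branch;
    my ( $ref_type, $item, $position ) = @{ pop @branch };
    my $wrapped;
    if ( $ref_type eq 'HASH' ) {
        $wrapped = { $item => $seed_ref };
    }
    elsif ( $ref_type eq 'ARRAY' ) {
        $wrapped = [];
        $wrapped->[$position] = $seed_ref;    # earlier positions stay undef
    }
    else {
        die "Unsupported branch type: $ref_type";
    }
    return build_branch( $wrapped, @branch );
}

my @branch_ref = (
    [ 'HASH',  'Helping', 0, 0 ],
    [ 'ARRAY', '',        1, 1 ],
    [ 'HASH',  'MyKey',   0, 2 ],
);
my $branch = build_branch( 'seed', @branch_ref );
print $branch->{Helping}[1]{MyKey}, "\n";
```

Only the single path to the seed is rebuilt; sibling elements of each node are not recreated in this sketch.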
_extracted_ref_type( $test_ref ) - internal
- Definition: In order to manage the data types necessary for this class, a data walker compliant type tester is provided. This is necessary to support a few non perl-standard types. First, the base state 'END' is treated as a data type and is not generated in the normal perl data typing systems. Second, strings and numbers both return as 'SCALAR' (not '' or undef). For the purposes of this class you should always call this method to get the correct data type when using dispatch tables!
- Accepts: This method expects to be called by $self. It receives a data reference that can include undef.
- Returns: a data walker type or it confesses. (For more details see $discover_type_dispatch in the code)
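A type tester along these lines could look like the sketch below. The function extracted_ref_type is a hypothetical stand-in, and two points are assumptions made for illustration only: that undef represents the 'END' base state, and that only HASH and ARRAY references are otherwise accepted.

```perl
use strict;
use warnings;
use Carp qw( confess );

my %supported = map { $_ => 1 } qw( HASH ARRAY );

# Hypothetical stand-in for _extracted_ref_type.
# Assumption: undef is the 'END' base state and any non-reference is 'SCALAR'.
sub extracted_ref_type {
    my ($test_ref) = @_;
    return 'END'    if !defined $test_ref;
    return 'SCALAR' if !ref $test_ref;
    my $type = ref $test_ref;
    return $type if $supported{$type};
    confess "Unsupported node type: $type";
}

for my $sample ( undef, 'text', 42, {}, [] ) {
    print extracted_ref_type($sample), "\n";
}
```

Because strings and numbers come back as 'SCALAR' rather than '', the result can be used directly as a dispatch table key.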
_dispatch_method( $dispatch_ref, $call, @arg_list ) - internal
- Definition: To make this class extensible, the majority of the decision points are managed by dispatch (hash) tables. In order to have the dispatch behavior common across all methods, the dispatch call is provided for all consuming classes and roles.
- Accepts: This method expects to be called by $self. It first receives the dispatch table (hash) as a data reference. Next, the data type is accepted as $call. Finally, any arguments needed by the dispatch table are passed through in @arg_list.
- Returns: defined by the dispatch table
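The dispatch pattern can be sketched in a few lines of plain Perl. Both dispatch_method and the $list_elements table are hypothetical illustrations of the technique and do not reproduce the tables used inside the class.

```perl
use strict;
use warnings;
use Carp qw( confess );

# Hypothetical stand-in for _dispatch_method: look up $call in the table
# and run the stored coderef with the remaining arguments.
sub dispatch_method {
    my ( $dispatch_ref, $call, @arg_list ) = @_;
    my $action = $dispatch_ref->{$call}
        or confess "No dispatch entry for: $call";
    return $action->(@arg_list);
}

# A dispatch table that lists the elements of a node by node type
my $list_elements = {
    'HASH'   => sub { [ sort keys %{ $_[0] } ] },
    'ARRAY'  => sub { [ 0 .. $#{ $_[0] } ] },
    'SCALAR' => sub { [ $_[0] ] },
    'END'    => sub { [] },
};

my $hash_items  = dispatch_method( $list_elements, 'HASH',  { b => 1, a => 2 } );
my $array_items = dispatch_method( $list_elements, 'ARRAY', [ 'x', 'y', 'z' ] );
print "@$hash_items\n";
print "@$array_items\n";
```

Adding support for a new node type then only means adding one more key to the table, which is the extensibility argument made in the definition.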
Public Methods
set_change_array_size( $bool )
- Definition: This method is used to change the "change_array_size" attribute after the instance is created.
- Accepts: a Boolean value
- Returns: nothing
get_change_array_size()
- Definition: This method returns the current state of the "change_array_size" attribute.
- Accepts: nothing
- Returns: $Bool value representing the state of the 'change_array_size' attribute
has_change_array_size()
- Definition: This method is used to test if the "change_array_size" attribute is set.
- Accepts: nothing
- Returns: $Bool value indicating if the 'change_array_size' attribute has been set
clear_change_array_size()
- Definition: This method clears the "change_array_size" attribute.
- Accepts: nothing
- Returns: nothing
set_fixed_primary( $bool )
- Definition: This method is used to change the "fixed_primary" attribute after the instance is created.
- Accepts: a Boolean value
- Returns: nothing
get_fixed_primary()
- Definition: This method returns the current state of the "fixed_primary" attribute.
- Accepts: nothing
- Returns: $Bool value representing the state of the 'fixed_primary' attribute
has_fixed_primary()
- Definition: This method is used to test if the "fixed_primary" attribute is set.
- Accepts: nothing
- Returns: $Bool value indicating if the 'fixed_primary' attribute has been set
clear_fixed_primary()
- Definition: This method clears the "fixed_primary" attribute.
- Accepts: nothing
- Returns: nothing
Attributes
Data passed to ->new when creating an instance. For modification of these attributes see "Public Methods". The ->new function will accept either a fat comma list or a complete hash ref that has the attribute names as the top level keys. Additionally, some attributes that meet the criteria can be passed to _process_the_data and will be adjusted for just the run of that method call. These are called one shot attributes.
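The one shot behavior can be sketched with a small hand rolled class. The package My::Walker and the method with_one_shot are hypothetical and only mimic the get_$name, set_$name, has_$name and clear_$name convention; this is not how the Moose attributes are actually declared.

```perl
use strict;
use warnings;

package My::Walker;    # hypothetical stand-in class, not the real module

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# The four prefixed methods required for a one shot attribute
sub get_sort_HASH   { $_[0]{sort_HASH} }
sub set_sort_HASH   { $_[0]{sort_HASH} = $_[1]; return }
sub has_sort_HASH   { exists $_[0]{sort_HASH} }
sub clear_sort_HASH { delete $_[0]{sort_HASH}; return }

# Apply $value to attribute $name only while $code runs, then restore.
sub with_one_shot {
    my ( $self, $name, $value, $code ) = @_;
    my ( $get, $set, $has, $clear ) = map { "${_}_$name" } qw( get set has clear );
    my $was_set = $self->$has;
    my $old     = $was_set ? $self->$get : undef;
    $self->$set($value);
    my $result = $code->($self);
    if ($was_set) { $self->$set($old) }
    else          { $self->$clear }
    return $result;
}

package main;

my $walker = My::Walker->new;
my $during = $walker->with_one_shot( 'sort_HASH', 1, sub { $_[0]->get_sort_HASH } );
print "during: $during, set afterwards: ", ( $walker->has_sort_HASH ? 1 : 0 ), "\n";
```

The has_ and clear_ methods matter because an attribute that was never set should go back to "not set" rather than to an explicit undef.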
sort_HASH
- Definition: This attribute is set to sort (or not) Hash Ref keys prior to walking the Hash Ref node.
- Default 0 (No sort)
- Range Boolean values and sort coderefs.
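To picture the difference between the Boolean and coderef settings, here is a small sketch. The helper ordered_keys is hypothetical, and the calling convention for the coderef (two keys passed as arguments) is an assumption made for this example; check the module source for the convention it really uses.

```perl
use strict;
use warnings;

# Hypothetical helper showing one way a Boolean or coderef setting could be
# applied. Assumption: a coderef receives the two keys to compare as arguments.
sub ordered_keys {
    my ( $hash_ref, $sort_HASH ) = @_;
    my @keys = keys %$hash_ref;
    return @keys if !$sort_HASH;    # false: leave the keys in perl's own order
    my @sorted =
        ref($sort_HASH) eq 'CODE'
        ? ( sort { $sort_HASH->( $a, $b ) } @keys )    # coderef: custom order
        : ( sort @keys );                              # true: default string sort
    return @sorted;
}

my $node    = { Someotherkey => 1, Parsing => 2, Helping => 3 };
my @default = ordered_keys( $node, 1 );
my @reverse = ordered_keys( $node, sub { $_[1] cmp $_[0] } );
print "@default\n";
print "@reverse\n";
```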
sort_ARRAY
- Definition: This attribute is set to sort (or not) Array values prior to walking the Array Ref node. Warning: this will permanently sort the actual data in the passed ref. If a secondary ref also exists it will NOT be sorted! Sorting Arrays is not recommended.
- Default 0 (No sort)
- Range Boolean values and sort coderefs.
skip_HASH_ref
- Definition: This attribute is set to skip (or not) the processing of HASH Ref nodes.
- Default 0 (Don't skip)
- Range Boolean values.
skip_ARRAY_ref
- Definition: This attribute is set to skip (or not) the processing of ARRAY Ref nodes.
- Default 0 (Don't skip)
- Range Boolean values.
skip_SCALAR_ref
- Definition: This attribute is set to skip (or not) the processing of SCALARs of ref branches.
- Default 0 (Don't skip)
- Range Boolean values.
change_array_size
- Definition: This attribute will not be used by this class directly. However the Data::Walk::Prune Role and the Data::Walk::Graft Role both use it so it is placed here so there will be no conflicts.
- Default 1 (This usually means that the array will grow or shrink when a position is added or removed)
- Range Boolean values.
fixed_primary
- Definition: This attribute will leave the primary_ref data ref intact rather than deep cloning it. This also means that no changes made at lower levels will be passed upwards.
- Default 0 = The primary ref is not fixed (and will be changed / deep cloned)
- Range Boolean values.
Recursive Parsing Flow
Assess and implement the before_method
When the recursive process is called, the class checks for an available 'before_method' using the test:
exists $passed_ref->{before_method};
If the test passes then the next sequence is run.
$method = $passed_ref->{before_method};
$passed_ref = $self->$method( $passed_ref );
Then, if the new $passed_ref is undef or contains the key 'bounce', the program deletes the key 'bounce' from the $passed_ref (as needed) and returns $passed_ref directly back up the data tree. Do not pass 'Go', do not collect $200. Otherwise the $passed_ref is sent on to the node parser. If the $passed_ref is modified by the 'before_method' then the node parser will parse the new ref and not the old one.
Determine node type
The current node is examined to determine its reference type. A node type below SCALAR called 'END' is generated to manage the 'before_method' and 'after_method' implementation. The relevant skip attribute is consulted and if this node should be skipped then the program goes directly to the "Assess and implement the after_method" step.
Identify node elements
If the node type is not skipped then a list is generated for all paths within a node. For example a 'HASH' node would generate a list of hash keys for that node. SCALARs are considered 'SCALAR' types and are handled as single element nodes with the scalar value as the only item in the list. 'END' nodes always have the empty set for this step. If the list should be sorted then the list is sorted. The node is then tested for an empty set. If the set is empty this is considered a 'base state' and the code skips to the "Assess and implement the after_method" step else the code sends the list to "Iterate through each element".
Iterate through each element
For each element a new $passed_ref is generated containing the data below that element. The secondary_ref is only constructed if it has a matching element to the primary ref. Matching for hashrefs is done by key matching only. Matching for arrayrefs is done by testing if the secondary_ref has the same array position available as the primary_ref. No position content compare is done!
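The matching rule can be sketched as a small helper. The function matching_secondary is hypothetical and only illustrates the rule stated above: key existence for hashrefs, position existence for arrayrefs, and no comparison of the content found there.

```perl
use strict;
use warnings;

# Hypothetical helper: return the secondary sub element that lines up with
# $item in the primary node, or an empty list when nothing lines up.
# A list is returned so that an existing but undef value still counts.
sub matching_secondary {
    my ( $node_type, $secondary, $item ) = @_;
    if ( $node_type eq 'HASH' and ref($secondary) eq 'HASH' ) {
        return exists $secondary->{$item} ? ( $secondary->{$item} ) : ();
    }
    if ( $node_type eq 'ARRAY' and ref($secondary) eq 'ARRAY' ) {
        return $item <= $#$secondary ? ( $secondary->[$item] ) : ();
    }
    return;
}

my $primary   = [ 'bavalue1', 'bavalue2', 'bavalue3' ];
my $secondary = [ 'bavalue1', 'other' ];
for my $position ( 0 .. $#$primary ) {
    my @match = matching_secondary( 'ARRAY', $secondary, $position );
    print "position $position: ",
        ( @match ? 'secondary exists' : 'no secondary' ), "\n";
}
```

Position 1 is reported as existing even though the values differ, because the value comparison is left to whatever before_method or after_method the role supplies.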
A position trace is generated
The current node list position is then documented using an internally managed key of the $passed_ref labeled branch_ref. The array reference stored in branch_ref can be thought of as the stack trace that documents the node elements directly between the current position and the top (or zeroth) level of the parsed data_ref. Past completed branches and future pending branches are not shown. Each element of the branch_ref contains four positions used to describe the node and selections used to traverse that node level. The values in each sub position are;
    [
        ref_type,                              # The node reference type
        the list item value or '' for ARRAYs,  # key name for hashes, scalar value for scalars
        element sequence position (from 0),    # For hashes this is only relevant if sort_HASH is called
        level of the node (from 0),            # The zeroth level is the passed data ref
    ]
Going deeper in the data
The new (sub) $passed_ref is then passed as a new data set to be parsed and it starts at "Assess and implement the before_method".
Actions on return from recursion
When the values are returned from the recursion call, the last branch_ref element is popped off and the returned value(s) is (are) used to replace the sub elements of the primary_ref and secondary_ref associated with that list element in the current $passed_ref. If there are still pending items in the node element list then the program returns to "Iterate through each element", else it moves to "Assess and implement the after_method".
Assess and implement the after_method
The class checks for an available 'after_method' using the test:
exists $passed_ref->{after_method};
If the test passes then the following sequence is run.
$method = $passed_ref->{after_method};
$passed_ref = $self->$method( $passed_ref );
Go up
The updated $passed_ref is passed back up to the next level.
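The whole flow can be condensed into a miniature walker in plain Perl. Everything in this sketch is hypothetical (the package Mini::Walker, the methods walk, _log_before and _log_after, and the use of undef for the 'END' node). It only handles a primary_ref, always sorts hash keys, and ignores the skip attributes, so treat it as a reading aid for the steps above and not as the module's implementation.

```perl
use strict;
use warnings;

package Mini::Walker;    # hypothetical miniature, not Data::Walk::Extracted

sub new { return bless { log => [] }, shift }

# Assumption: undef stands for the 'END' node and non-refs are 'SCALAR'
sub node_type {
    my ($ref) = @_;
    return 'END' if !defined $ref;
    return ref($ref) || 'SCALAR';
}

sub walk {
    my ( $self, $passed_ref ) = @_;

    # Assess and implement the before_method
    if ( exists $passed_ref->{before_method} ) {
        my $method = $passed_ref->{before_method};
        $passed_ref = $self->$method($passed_ref);
        return $passed_ref if !defined $passed_ref;
        if ( exists $passed_ref->{bounce} ) {
            delete $passed_ref->{bounce};
            return $passed_ref;    # straight back up the data tree
        }
    }

    # Determine node type and identify node elements
    $passed_ref->{branch_ref} //= [];
    my $primary = $passed_ref->{primary_ref};
    my $type    = node_type($primary);
    my @items =
          $type eq 'HASH'   ? ( sort keys %$primary )
        : $type eq 'ARRAY'  ? ( 0 .. $#$primary )
        : $type eq 'SCALAR' ? ($primary)
        :                     ();    # 'END' is the base state

    # Iterate through each element
    my $position = 0;
    for my $item (@items) {
        my $sub_data =
              $type eq 'HASH'  ? $primary->{$item}
            : $type eq 'ARRAY' ? $primary->[$item]
            :                    undef;    # below a SCALAR sits 'END'

        # A position trace is generated (copied, so no pop is needed later)
        my @branch = @{ $passed_ref->{branch_ref} };
        push @branch,
            [ $type, ( $type eq 'ARRAY' ? '' : $item ), $position++, scalar @branch ];

        # Going deeper in the data
        my $returned = $self->walk(
            { %$passed_ref, primary_ref => $sub_data, branch_ref => \@branch } );

        # Actions on return from recursion
        next if !defined $returned;
        $primary->{$item} = $returned->{primary_ref} if $type eq 'HASH';
        $primary->[$item] = $returned->{primary_ref} if $type eq 'ARRAY';
    }

    # Assess and implement the after_method
    if ( exists $passed_ref->{after_method} ) {
        my $method = $passed_ref->{after_method};
        $passed_ref = $self->$method($passed_ref);
    }
    return $passed_ref;    # Go up
}

# Example node action methods that only record the visiting order
sub _log_before {
    my ( $self, $passed_ref ) = @_;
    push @{ $self->{log} }, 'in:' . node_type( $passed_ref->{primary_ref} );
    return $passed_ref;
}

sub _log_after {
    my ( $self, $passed_ref ) = @_;
    push @{ $self->{log} }, 'out:' . node_type( $passed_ref->{primary_ref} );
    return $passed_ref;
}

package main;

my $data   = { Helping => ['Somelevel'] };
my $walker = Mini::Walker->new;
$walker->walk(
    {
        primary_ref   => $data,
        before_method => '_log_before',
        after_method  => '_log_after',
    }
);
print join( ' ', @{ $walker->{log} } ), "\n";
```

For this small data set the log records each node being entered from HASH down to END and then left again in the reverse order, which mirrors the before_method and after_method calls at the beginning and end of each node.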
GLOBAL VARIABLES
- $ENV{Smart_Comments}
-
The module uses Smart::Comments if the '-ENV' option is set. The 'use' is encapsulated in a BEGIN block triggered by the environment variable to comfort non-believers. Setting the variable $ENV{Smart_Comments} will load and turn on smart comment reporting. There are three levels of 'Smartness' available in this module: '###', '####' and '#####'.
SUPPORT
TODO
Support recursion through CodeRefs
Support recursion through Objects
Add a Data::Walk::Top Role to the package
Add a Data::Walk::Thin Role to the package
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
Dependencies
- 5.010
- version
- Carp
- Moose
- MooseX::StrictConstructor
- MooseX::Types::Moose
- Scalar::Util
- Class::Inspector
SEE ALSO
- Smart::Comments - is used if the -ENV option is set
- Data::Walk
- Data::Walker
- Data::Dumper - Dumper
- YAML - Dump
- Data::Walk::Print - available Data::Walk::Extracted Role
- Data::Walk::Prune - available Data::Walk::Extracted Role
- Data::Walk::Graft - available Data::Walk::Extracted Role
- Data::Walk::Clone - available Data::Walk::Extracted Role