NAME
Data::Walk::Extracted - An extracted dataref walker
SYNOPSIS
This is a contrived example! For more functional (complex/useful) examples see the roles in this package.
package Data::Walk::MyRole;
use Moose::Role;
requires '_process_the_data';
use MooseX::Types::Moose qw(
    Str
    ArrayRef
    HashRef
);
my $mangle_keys = {
    Hello_ref => 'primary_ref',
    World_ref => 'secondary_ref',
};
#########1 Public Method 3#########4#########5#########6#########7#########8
sub mangle_data{
    my ( $self, $passed_ref ) = @_;
    @$passed_ref{ 'before_method', 'after_method' } =
        ( '_mangle_data_before_method', '_mangle_data_after_method' );
    ### Start recursive parsing
    $passed_ref = $self->_process_the_data( $passed_ref, $mangle_keys );
    ### End recursive parsing with: $passed_ref
    return $passed_ref->{Hello_ref};
}
#########1 Private Methods 3#########4#########5#########6#########7#########8
### If you are at the string level merge the two references
sub _mangle_data_before_method{
    my ( $self, $passed_ref ) = @_;
    if(
        is_Str( $passed_ref->{primary_ref} ) and
        is_Str( $passed_ref->{secondary_ref} ) ){
        $passed_ref->{primary_ref} .= " " . $passed_ref->{secondary_ref};
    }
    return $passed_ref;
}
### Strip the reference layers on the way out
sub _mangle_data_after_method{
    my ( $self, $passed_ref ) = @_;
    if( is_ArrayRef( $passed_ref->{primary_ref} ) ){
        $passed_ref->{primary_ref} = $passed_ref->{primary_ref}->[0];
    }elsif( is_HashRef( $passed_ref->{primary_ref} ) ){
        $passed_ref->{primary_ref} = $passed_ref->{primary_ref}->{level};
    }
    return $passed_ref;
}
package main;
use Modern::Perl;
use MooseX::ShortCut::BuildInstance qw(
    build_instance
);
my $AT_ST = build_instance(
    package => 'Greeting',
    superclasses => [ 'Data::Walk::Extracted' ],
    roles => [ 'Data::Walk::MyRole' ],
);
print $AT_ST->mangle_data( {
    Hello_ref =>{ level =>[ { level =>[ 'Hello' ] } ] },
    World_ref =>{ level =>[ { level =>[ 'World' ] } ] },
} ) . "\n";
#################################################################################
# Output of SYNOPSIS
# Hello World
#################################################################################
DESCRIPTION
This module takes a data reference (or two) and recursively travels through it (them). Where the two references diverge the walker follows the primary data reference. At the beginning and end of each "node" the code will attempt to call a method using data from the current location of the node.
Acknowledgement of MJD
This is an implementation of the concept of extracted data walking from Higher-Order Perl Chapter 1 by Mark Jason Dominus. The book is well worth the money! With that said I diverged from MJD purity in two ways. First, this is object oriented code, not functional code. Second, like the MJD equivalent, the code does nothing on its own, but unlike the MJD equivalent it looks for methods provided in a role or class extension at the appropriate places for action. The MJD equivalent expects to use a passed CodeRef at the action points. There is clearly some overhead associated with both of these differences. I made those choices consciously and if that upsets you do not hassle MJD!
What is the unique value of this module?
With the recursive part of data walking extracted, the various functionalities desired when walking the data can be modularized without copying this code. The Moose framework also allows diverse and targeted data parsing without dragging along a kitchen sink API for every implementation of this class.
Extending Data::Walk::Extracted
All action taken during the data walking must be initiated by implementation of action methods that do not exist in this class. They can be added with a traditionally incorporated role (see Moose::Role), by extending the class, or by joining a role to the class later. See MooseX::ShortCut::BuildInstance or Moose::Util for more class building information. See the "Recursive Parsing Flow" to understand the details of how the methods are used.
Requirements to build a role that uses this class
First build either or both of the before and after action methods. Then create the 'action' method for the role. This would preferably be named something descriptive like 'mangle_data'. Remember that if more than one role is added to Data::Walk::Extracted then all methods should be named with consideration for other (future?) method names. The 'mangle_data' method should gather any action methods and data references into a $passed_ref and then pass this reference, and possibly a $conversion_ref, to _process_the_data. The 'action' method should then call:
$passed_ref = $self->_process_the_data( $passed_ref, $conversion_ref );
See the "Recursive Parsing Flow" for the details of this action.
Finally, write some tests for your role!
Recursive Parsing Flow
Assess and implement the before_method
The class next checks for an available 'before_method' using the test:
exists $passed_ref->{before_method};
If the test passes then the next sequence is run.
$method = $passed_ref->{before_method};
$passed_ref = $self->$method( $passed_ref );
If the $passed_ref is modified by the 'before_method' then the recursive parser will parse the new ref and not the old one.
Identify node elements
If the next node type is not skipped then a list is generated for all paths within that lower node. For example a 'HASH' node would generate a list of hash keys for that node. SCALARs are handled as a list with a single element and UNDEFs are an empty list. If the list should be sorted then the list is sorted. ARRAYs are hard sorted. This means that the actual items in the (primary) passed data ref are permanently sorted.
Iterate through each element
For each element a new $passed_ref is generated containing the data below that element. The down level secondary_ref is only constructed if it has a matching type/element to the primary ref. Matching for hashrefs is done by key matching only. Matching for arrayrefs is done only by testing that the position exists. No comparison of the position content is done! Scalars are matched on content. The list of items generated for this element is as follows:
- before_method => -->name of before method for this role here<--
- after_method => -->name of after method for this role here<--
- branch_ref => An array ref of array refs
- primary_ref => the piece of the primary data ref below this element
- primary_type => the lower primary ref type
- match => YES|NO (This indicates if the secondary ref meets the matching criteria)
- skip => YES|NO Checks the three skip attributes against the lower primary_ref node. This can also be adjusted in a 'before_method' upon arrival at that node.
- secondary_ref => if match eq 'YES' then built like the primary ref
- secondary_type => if match eq 'YES' then calculated like the primary type
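The matching rules above can be sketched in plain Perl. This is a minimal illustration, not the module's own code: the helper name element_matches and its argument order are hypothetical, and only the three documented rules (key exists, position exists, scalar content equal) are modelled.

```perl
use strict;
use warnings;

# Hypothetical helper: report 'YES' or 'NO' for one element of the primary node.
# $element is a hash key, an array position, or a scalar value.
sub element_matches {
    my ( $type, $element, $secondary ) = @_;
    my $matched = 0;
    if ( $type eq 'HASH' ) {
        # Hashes match on key existence only
        $matched = ( ref($secondary) eq 'HASH' && exists $secondary->{$element} );
    }
    elsif ( $type eq 'ARRAY' ) {
        # Arrays match when the position exists; content is not compared
        $matched = ( ref($secondary) eq 'ARRAY' && exists $secondary->[$element] );
    }
    elsif ( $type eq 'SCALAR' ) {
        # Scalars match on content
        $matched = ( defined $secondary && !ref($secondary) && $secondary eq $element );
    }
    return $matched ? 'YES' : 'NO';
}

print element_matches( 'HASH',   'level', { level => 1 } ), "\n";    # YES
print element_matches( 'ARRAY',  2,       ['only one'] ),   "\n";    # NO
print element_matches( 'SCALAR', 'Hello', 'World' ),        "\n";    # NO
```

Note that the ARRAY case returns 'YES' for any existing position, even when the values at that position differ.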
A position trace is generated
The current node list position is then documented using an internally managed key of the $passed_ref labeled 'branch_ref'. The array reference stored in branch_ref can be thought of as the stack trace that documents the node elements directly between the current position and the initial (or zeroth) level of the parsed primary data_ref. Past completed branches and future pending branches are not maintained. Each element of the branch_ref contains four positions used to describe the node and selections used to traverse that node level. The values in each sub position are;
[
    ref_type,
        # The node reference type
    the list item value or '' for ARRAYs,
        # key name for hashes, scalar value for scalars
    element sequence position (from 0),
        # For hashes this is only relevant if sort_HASH is called
    level of the node (from 0),
        # The zeroth level is the passed data ref
]
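To make the four-position layout concrete, here is a small sketch that turns a hand-built branch_ref into a readable path string. The values in @branch_ref are illustrative assumptions that follow the layout described above (the exact values recorded by the walker may differ), and branch_to_path is a hypothetical helper, not part of this class.

```perl
use strict;
use warnings;

# Hand-built example: three steps down from the zeroth node
# Each step is [ ref_type, item value, sequence position, level ]
my @branch_ref = (
    [ 'HASH',  'level', 0, 0 ],
    [ 'ARRAY', '',      0, 1 ],
    [ 'HASH',  'level', 0, 2 ],
);

# Hypothetical helper: render the trace as a Perl-style access path
sub branch_to_path {
    my @branch = @_;
    my $path   = '$data';
    for my $step (@branch) {
        my ( $type, $item, $position, $level ) = @$step;
        if    ( $type eq 'HASH' )  { $path .= "->{$item}" }
        elsif ( $type eq 'ARRAY' ) { $path .= "->[$position]" }
    }
    return $path;
}

my $path = branch_to_path(@branch_ref);
print "$path\n";    # $data->{level}->[0]->{level}
```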
Going deeper in the data
The down level ref is then passed as a new data set to be parsed and it starts at "Assess and implement the before_method".
Actions on return from recursion
When the values are returned from the recursion call the last branch_ref element is popped off and the returned data ref is used to replace the sub elements of the primary_ref and secondary_ref associated with that list element in the current level of the $passed_ref. If there are still pending items in the node element list then the program returns to "Iterate through each element", else it moves to "Assess and implement the after_method".
Assess and implement the after_method
The class checks for an available 'after_method' using the test;
exists $passed_ref->{after_method};
If the test passes then the following sequence is run.
$method = $passed_ref->{after_method};
$passed_ref = $self->$method( $passed_ref );
If the $passed_ref is modified by the 'after_method' then the recursive parser will parse the new ref and not the old one.
Go up
The updated $passed_ref is passed back up to the next level.
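The flow described in this section can be condensed into a very small standalone walker. This is a simplified sketch under stated assumptions, not the code of Data::Walk::Extracted: the package Mini::Walker and the method names walk and _shout are hypothetical, and it leaves out the secondary_ref, branch_ref, sorting, and skip handling entirely.

```perl
use strict;
use warnings;

package Mini::Walker;    # hypothetical, for illustration only

sub new { return bless {}, shift }

sub walk {
    my ( $self, $passed_ref ) = @_;

    # Assess and implement the before_method
    if ( exists $passed_ref->{before_method} ) {
        my $method = $passed_ref->{before_method};
        $passed_ref = $self->$method($passed_ref);
    }

    # Identify node elements, iterate, and go deeper
    my $node = $passed_ref->{primary_ref};
    if ( ref($node) eq 'HASH' ) {
        for my $key ( sort keys %$node ) {
            my $lower = $self->walk( { %$passed_ref, primary_ref => $node->{$key} } );
            $node->{$key} = $lower->{primary_ref};    # action on return
        }
    }
    elsif ( ref($node) eq 'ARRAY' ) {
        for my $position ( 0 .. $#$node ) {
            my $lower = $self->walk( { %$passed_ref, primary_ref => $node->[$position] } );
            $node->[$position] = $lower->{primary_ref};    # action on return
        }
    }

    # Assess and implement the after_method
    if ( exists $passed_ref->{after_method} ) {
        my $method = $passed_ref->{after_method};
        $passed_ref = $self->$method($passed_ref);
    }

    # Go up
    return $passed_ref;
}

# Hypothetical after method: upper-case plain strings on the way out
sub _shout {
    my ( $self, $passed_ref ) = @_;
    my $value = $passed_ref->{primary_ref};
    $passed_ref->{primary_ref} = uc $value if defined $value and !ref $value;
    return $passed_ref;
}

package main;

my $result = Mini::Walker->new->walk(
    {
        primary_ref  => { greeting => [ 'hello', 'world' ] },
        after_method => '_shout',
    }
);
print join( ' ', @{ $result->{primary_ref}{greeting} } ), "\n";    # HELLO WORLD
```

Because the method name is looked up from the $passed_ref at every node, a role only has to supply the named methods. The walker itself stays unchanged.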
Attributes
Data passed to ->new when creating an instance. For modification of these attributes see "Public Methods". The ->new function will either accept fat comma lists or a complete hash ref that has the possible attributes as the top keys. Additionally some attributes that meet certain criteria can be passed to _process_the_data and will be adjusted for just the run of that method call. These are called one shot attributes. Nested calls to _process_the_data will be tracked and the attribute will remain in force until the parser returns to the calling 'one shot' level. Previous attribute values are restored after the 'one shot' attribute value expires.
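The 'one shot' behaviour amounts to a save, set, run, restore sequence around a single call. The sketch below illustrates that idea with a hypothetical stand-in class (My::Settings) and a hypothetical helper (with_one_shot). It relies only on the has_, get_, set_, and clear_ accessor prefixes this documentation describes, and it does not attempt the nested call tracking or error handling a real implementation would need.

```perl
use strict;
use warnings;

package My::Settings;    # hypothetical stand-in for a walker instance

sub new              { my ( $class, %args ) = @_; return bless {%args}, $class }
sub has_skip_level   { return exists $_[0]{skip_level} }
sub get_skip_level   { return $_[0]{skip_level} }
sub set_skip_level   { $_[0]{skip_level} = $_[1]; return }
sub clear_skip_level { delete $_[0]{skip_level}; return }

# Set attribute $name to $value, run $code, then put the old state back
sub with_one_shot {
    my ( $self, $name, $value, $code ) = @_;
    my ( $has, $get, $set, $clear ) =
      ( "has_$name", "get_$name", "set_$name", "clear_$name" );
    my $had_value = $self->$has;
    my $old_value = $had_value ? $self->$get : undef;
    $self->$set($value);
    my $result = $code->();
    if   ($had_value) { $self->$set($old_value) }
    else              { $self->$clear }
    return $result;
}

package main;

my $settings = My::Settings->new( skip_level => 2 );
my $inside   = $settings->with_one_shot( 'skip_level', 5,
    sub { return $settings->get_skip_level } );
print "inside: $inside, after: ", $settings->get_skip_level, "\n";
# inside: 5, after: 2
```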
sorted_nodes
- Definition: This attribute is set to sort (or not) the list of items in each node.
- Default {} #Nothing is sorted
- Range This accepts a HashRef.
The keys are only used if they match a node type identified by the function _extracted_ref_type. The value for the key can be anything, but if it is a CODEREF it will be treated as a sort function in perl. In general it is sorting a list of strings not the data structure itself. The sort will be applied as follows.
@node_list = sort $coderef @node_list
For the type 'ARRAY' the node is sorted (permanently) as well as the list. This means that if the array contains a list of references it will effectively sort in memory pointer order. Additionally the 'secondary_ref' node is not sorted, so prior alignment may break. In general ARRAY sorts are not recommended.
- Example:
sorted_nodes =>{
    ARRAY => 1,#Will sort the primary_ref only
    HASH => sub{ $b cmp $a }, #reverse sort the keys
}
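As a standalone illustration of how a CODEREF value is applied to a node list, the following sketch sorts the keys of a sample hash with the reverse sort shown in the example. The sample data is invented, and the fallback to Perl's default sort for a non-CODEREF value is an assumption made for this sketch, not documented behaviour.

```perl
use strict;
use warnings;

my %node         = ( pear => 1, apple => 2, fig => 3 );    # sample HASH node
my $sorted_nodes = { HASH => sub { $b cmp $a } };

my @node_list = keys %node;
my $sorter    = $sorted_nodes->{HASH};
if ( ref($sorter) eq 'CODE' ) {
    # A CODEREF is used as the sort function
    @node_list = sort $sorter @node_list;
}
elsif ( exists $sorted_nodes->{HASH} ) {
    # Assumption: any other value falls back to the default sort
    @node_list = sort @node_list;
}
print "@node_list\n";    # pear fig apple
```

Only the list of key names is sorted here. The hash itself is untouched, which is the difference from the 'ARRAY' case described above.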
skipped_nodes
- Definition: This attribute is set to skip (or not) node parsing by type. If the current node type matches (eq) the primary_type then the 'before_method' and 'after_method' are run at that node but no parsing is done.
- Default {} #Nothing is skipped
- Range This accepts a HashRef.
The keys are only used if they match a node type identified by the function _extracted_ref_type. The value for the key can be anything.
skip_level
- Definition: This attribute is set to skip (or not) node parsing at a given level. Because the process doesn't start checking until after it enters the data ref it effectively ignores a skip_level set to 0 (the base node level).
- Default undef #Nothing is skipped
- Range This accepts an integer
skip_node_tests
- Definition: This attribute contains a list of test conditions used to skip certain targeted nodes. A test can target an array position, match a hash key, or even be restricted to only one level. The test is run against the branch_ref, so it skips the node below the matching conditions, not the node at the matching conditions. Matching is done with either 'eq' or '=~'. The attribute is passed an ArrayRef of ArrayRefs. Each sub_ref contains the following:
- $type - this is any of the identified reference node types
- $key - this is either a scalar or regexref to use for matching a hash key
- $position - this is used to match an array position and can be an integer or 'ANY'
- $level - this restricts the skipping test usage to a specific level only or 'ANY'
- Example
[
    [ 'HASH', 'KeyWord', 'ANY', 'ANY'],
    # Skip the node below the value of any hash key eq 'KeyWord'
    [ 'ARRAY', 'ANY', '3', '4'],
    # Skip the nodes below arrays at position three on level four
]
- Range an infinite number of skip tests added to an array
- Default [] = no nodes are skipped
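The way a single skip test could be compared with one branch_ref element can be sketched as follows. The function branch_step_matches is hypothetical, and the sample branch steps are hand-built. The sketch assumes that 'ANY' acts as a wildcard in the key, position, and level slots, and that a regexref key is matched with '=~' while a plain scalar key is matched with 'eq'.

```perl
use strict;
use warnings;

# Hypothetical helper: does one skip test match one branch_ref element?
# $test is [ $type, $key, $position, $level ]
# $step is [ ref_type, item value, sequence position, level ]
sub branch_step_matches {
    my ( $test, $step ) = @_;
    my ( $want_type, $want_key, $want_position, $want_level ) = @$test;
    my ( $type, $item, $position, $level ) = @$step;

    return 0 unless $want_type eq $type;
    if ( ref($want_key) eq 'Regexp' ) {
        return 0 unless $item =~ $want_key;
    }
    elsif ( $want_key ne 'ANY' ) {
        return 0 unless $item eq $want_key;
    }
    return 0 unless $want_position eq 'ANY' or $want_position == $position;
    return 0 unless $want_level eq 'ANY'    or $want_level == $level;
    return 1;
}

my $skip_node_tests = [
    [ 'HASH',  'KeyWord', 'ANY', 'ANY' ],
    [ 'ARRAY', 'ANY',     '3',   '4' ],
];

print branch_step_matches( $skip_node_tests->[0], [ 'HASH',  'KeyWord', 0, 2 ] ), "\n";    # 1
print branch_step_matches( $skip_node_tests->[1], [ 'ARRAY', '',        3, 4 ] ), "\n";    # 1
print branch_step_matches( $skip_node_tests->[1], [ 'ARRAY', '',        3, 1 ] ), "\n";    # 0
```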
change_array_size
- Definition: This attribute will not be used by this class directly. However the Data::Walk::Prune Role and the Data::Walk::Graft Role both use it so it is placed here so there will be no conflicts.
- Default 1 (This usually means that the array will grow or shrink when a position is added or removed)
- Range Boolean values.
fixed_primary
- Definition: This means that no changes made at lower levels will be passed upwards into the final ref.
- Default 0 = The primary ref is not fixed (and can be changed). A value of 0 effectively deep clones the portions of the primary ref that are traversed.
- Range Boolean values.
Methods
Methods used to write Roles
_process_the_data( $passed_ref, $conversion_ref )
- Definition: This method is the core access to the recursive parsing of Data::Walk::Extracted. It should only be used by a method in consuming roles or classes. It should not be used by the end user. This method scrubs the inputs and then sends them to the recursive function.
- Accepts: ( $passed_ref, $conversion_ref )
- $passed_ref this ref contains key value pairs as follows;
- primary_ref - a dataref that the walker will walk. This can be renamed with a $conversion_ref - required
- secondary_ref - a dataref that is used for comparison while walking. (can be renamed) - optional
- before_method - a method name that will perform some action at the beginning of each node - optional
- after_method - a method name that will perform some action at the end of each node - optional
- [attribute name] - attribute names are accepted with temporary attribute settings. These settings are temporarily set for a single "_process_the_data" call and then the original attribute values are restored. For this to work the attribute must have the following prefixed methods: get_$name, set_$name, clear_$name, and has_$name. - optional
- $conversion_ref This allows a public method to accept different key names for the various keys listed above and then convert them later to the generic terms used by this class. - optional
- Example
$passed_ref ={
    print_ref =>{
        First_key => [ 'first_value', 'second_value' ],
    },
    match_ref =>{
        First_key => 'second_value',
    },
    before_method => '_print_before_method',
    after_method => '_print_after_method',
    sorted_nodes =>{ ARRAY => 1 },#One shot attribute setter
};
$conversion_ref ={
    primary_ref => 'print_ref',# generic_name => role_name,
    secondary_ref => 'match_ref',
};
- Action: This method begins by scrubbing the top level of the inputs and ensures that the minimum requirements for the recursive data parser are met. If needed it will use a conversion ref (also provided by the caller) to change input hash keys to the generic hash keys used by this class. This function then calls the actual recursive function. For a better understanding of the recursive steps see "Recursive Parsing Flow".
- Returns: the $passed_ref (only) with the key names restored to their original versions.
_build_branch( $seed_ref, @arg_list )
- Definition: There are times when a role will wish to reconstruct the data branch that led from the 'zeroth' node to the data walker's current position. This private method takes a seed reference and uses the branch ref to recursively append to the front of the seed until a complete branch to the zeroth node is generated.
- Accepts: a list of arguments starting with the $seed_ref to build from. The remaining arguments are just the array elements of the 'branch ref'.
- Example
$ref = $self->_build_branch( $seed_ref, @{ $passed_ref->{branch_ref}}, );
- Returns: a data reference with the current path back to the start pre-pended to the $seed_ref
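The idea of pre-pending the path to a seed can be sketched without the class. The function build_branch below is a hypothetical, iterative stand-in (the private method is described as recursive), and the branch steps are hand-built values in the four-position layout. How the real method fills unused array positions is not stated here, so leaving them undefined is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in: wrap the seed in one container per branch step,
# working from the deepest step back to the zeroth node
sub build_branch {
    my ( $seed_ref, @branch ) = @_;
    my $built = $seed_ref;
    for my $step ( reverse @branch ) {
        my ( $type, $item, $position ) = @$step;
        if ( $type eq 'HASH' ) {
            $built = { $item => $built };
        }
        elsif ( $type eq 'ARRAY' ) {
            my @array;
            $array[$position] = $built;    # earlier positions stay undef
            $built = \@array;
        }
    }
    return $built;
}

my $ref = build_branch(
    'Hello',
    [ 'HASH',  'level', 0, 0 ],
    [ 'ARRAY', '',      1, 1 ],
);
print $ref->{level}[1], "\n";    # Hello
```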
_extracted_ref_type( $test_ref )
- Definition: In order to manage data types necessary for this class a data walker compliant 'Type' tester is provided. This is necessary to support a few non perl-standard types not generated in standard perl typing systems. First, 'undef' is the UNDEF type. Second, strings and numbers both return as 'SCALAR' (not '' or undef). Much of the code in this package runs on dispatch tables that are built around these specific type definitions.
- Accepts: This method expects to be called by $self. It receives a data reference that can include/be undef.
- Returns: a data walker type or it confesses. (For more details see $discover_type_dispatch in the code)
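A reduced version of this kind of type test might look like the sketch below. The function name node_type is hypothetical, and only the four walking types listed under "Supported node walking types" are covered. Anything else simply dies here, whereas the real method is described as confessing and as knowing about further types.

```perl
use strict;
use warnings;

# Hypothetical, reduced type tester for the four walking types
sub node_type {
    my ($test_ref) = @_;
    return 'UNDEF' unless defined $test_ref;
    my $type = ref $test_ref;
    return 'SCALAR' if $type eq '';    # strings and numbers alike
    return $type    if $type eq 'ARRAY' or $type eq 'HASH';
    die "Unsupported node type: $type\n";
}

print join( ' ', map { node_type($_) } ( undef, 'text', 42, [], {} ) ), "\n";
# UNDEF SCALAR SCALAR ARRAY HASH
```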
_get_had_secondary
- Definition: During the initial processing of data in _process_the_data the existence of a passed secondary ref is tested and stored in the attribute '_had_secondary'. On occasion a role might need to know if a secondary ref existed at any level even if it is not represented at the current level.
- Accepts: nothing
- Returns: True|1 if the secondary ref ever existed
_get_current_level
- Definition: On occasion one of the methods may need to know what level is currently being parsed. This will provide that information in integer format.
- Accepts: nothing
- Returns: the integer value for the level
[_private 'one shot' attributes]
- Definition: Private one shot attributes in roles are allowed as well. To implement a private one shot attribute that is not exposed to the end user, add the '_' prefix to the attribute name and create the appropriate _get, _set, _clear, and _has methods.
Public Methods
add_sorted_nodes( NODETYPE => 1, )
- Definition: This method is used to add nodes to be sorted to the walker by adjusting the attribute "sorted_nodes".
- Accepts: Node key => value pairs where the key is the Node name and the value is 1. This method can accept multiple key => value pairs.
- Returns: nothing
has_sorted_nodes
- Definition: This method checks if any sorting is turned on in the attribute "sorted_nodes".
- Accepts: Nothing
- Returns: the count of sorted node types listed
check_sorted_nodes( NODETYPE )
- Definition: This method is used to see if a node type is sorted by testing the attribute "sorted_nodes".
- Accepts: the name of one node type
- Returns: true if that node is sorted as determined by "sorted_nodes"
clear_sorted_nodes
- Definition: This method will clear all values in the attribute "sorted_nodes" and therefore turn off those sorts.
- Accepts: nothing
- Returns: nothing
remove_sorted_node( NODETYPE1, NODETYPE2, )
- Definition: This method will clear the key / value pairs in "sorted_nodes" for the listed items.
- Accepts: a list of NODETYPES to delete
- Returns: In list context it returns a list of values in the hash for the deleted keys. In scalar context it returns the value for the last key specified
set_sorted_nodes( $hashref )
- Definition: This method will completely reset the attribute "sorted_nodes" to $hashref.
- Accepts: a hashref of NODETYPE keys with the value of 1.
- Returns: nothing
get_sorted_nodes
- Definition: This method will return a hashref of the attribute "sorted_nodes"
- Accepts: nothing
- Returns: a hashref
add_skipped_nodes( NODETYPE1 => 1, NODETYPE2 => 1 )
- Definition: This method adds additional skip definition(s) to the "skipped_nodes" attribute.
- Accepts: a list of key value pairs as used in 'skipped_nodes'
- Returns: nothing
has_skipped_nodes
- Definition: This method checks if any nodes are set to be skipped in the attribute "skipped_nodes".
- Accepts: Nothing
- Returns: the count of skipped node types listed
check_skipped_node( $string )
- Definition: This method checks if a specific node type is set to be skipped in the "skipped_nodes" attribute.
- Accepts: a string
- Returns: Boolean value indicating if the specific $string is set
remove_skipped_nodes( NODETYPE1, NODETYPE2 )
- Definition: This method deletes specifically identified node skips from the "skipped_nodes" attribute.
- Accepts: a list of NODETYPES to delete
- Returns: In list context it returns a list of values in the hash for the deleted keys. In scalar context it returns the value for the last key specified
clear_skipped_nodes
- Definition: This method clears all data in the "skipped_nodes" attribute.
- Accepts: nothing
- Returns: nothing
set_skipped_nodes( $hashref )
- Definition: This method will completely reset the attribute "skipped_nodes" to $hashref.
- Accepts: a hashref of NODETYPE keys with the value of 1.
- Returns: nothing
get_skipped_nodes
- Definition: This method will return a hashref of the attribute "skipped_nodes"
- Accepts: nothing
- Returns: a hashref
set_skip_level( $int)
- Definition: This method is used to reset the "skip_level" attribute after the instance is created.
- Accepts: an integer (negative numbers and 0 will be ignored)
- Returns: nothing
get_skip_level()
- Definition: This method returns the current "skip_level" attribute.
- Accepts: nothing
- Returns: an integer
has_skip_level()
- Definition: This method is used to test if the "skip_level" attribute is set.
- Accepts: nothing
- Returns: $Bool value indicating if the 'skip_level' attribute has been set
clear_skip_level()
- Definition: This method clears the "skip_level" attribute.
- Accepts: nothing
- Returns: nothing (always successful)
set_skip_node_tests( ArrayRef[ArrayRef] )
- Definition: This method is used to change (completely) the 'skip_node_tests' attribute after the instance is created. See "skip_node_tests" for an example.
- Accepts: an array ref of array refs
- Returns: nothing
get_skip_node_tests()
- Definition: This method returns the current master list from the "skip_node_tests" attribute.
- Accepts: nothing
- Returns: an array ref of array refs
has_skip_node_tests()
- Definition: This method is used to test if the "skip_node_tests" attribute is set.
- Accepts: nothing
- Returns: The number of sub array refs there are in the list
clear_skip_node_tests()
- Definition: This method clears the "skip_node_tests" attribute.
- Accepts: nothing
- Returns: nothing (always successful)
add_skip_node_tests( ArrayRef1, ArrayRef2 )
- Definition: This method adds additional skip_node_test definition(s) to the "skip_node_tests" attribute list.
- Accepts: a list of array refs as used in 'skip_node_tests'. These are pushed onto the existing list.
- Returns: nothing
set_change_array_size( $bool )
- Definition: This method is used to change the "change_array_size" attribute after the instance is created.
- Accepts: a Boolean value
- Returns: nothing
get_change_array_size()
- Definition: This method returns the current state of the "change_array_size" attribute.
- Accepts: nothing
- Returns: $Bool value representing the state of the 'change_array_size' attribute
has_change_array_size()
- Definition: This method is used to test if the "change_array_size" attribute is set.
- Accepts: nothing
- Returns: $Bool value indicating if the 'change_array_size' attribute has been set
clear_change_array_size()
- Definition: This method clears the "change_array_size" attribute.
- Accepts: nothing
- Returns: nothing
set_fixed_primary( $bool )
- Definition: This method is used to change the "fixed_primary" attribute after the instance is created.
- Accepts: a Boolean value
- Returns: nothing
get_fixed_primary()
- Definition: This method returns the current state of the "fixed_primary" attribute.
- Accepts: nothing
- Returns: $Bool value representing the state of the 'fixed_primary' attribute
has_fixed_primary()
- Definition: This method is used to test if the "fixed_primary" attribute is set.
- Accepts: nothing
- Returns: $Bool value indicating if the 'fixed_primary' attribute has been set
clear_fixed_primary()
- Definition: This method clears the "fixed_primary" attribute.
- Accepts: nothing
- Returns: nothing
Definitions
node
Each branch point of a data reference is considered a node. The original top level reference is the 'zeroth' node. Recursion 'Base state' nodes are understood to have zero elements, so an additional node type called 'END' is recognized after a scalar.
Supported node walking types
- ARRAY
- HASH
- SCALAR
- UNDEF
Other node support
Support for Objects is partially implemented and as a consequence '_process_the_data' won't immediately die when asked to parse an object. It will still die, but on a dispatch table call that indicates where there is missing object support, not at the top of the node. This allows for some of the skip attributes to use 'OBJECT' in their definitions.
Supported one shot attributes
- sorted_nodes
- skipped_nodes
- skip_level
- skip_node_tests
- change_array_size
- fixed_primary
- explanation
Dispatch Tables
This class uses the role Data::Walk::Extracted::Dispatch to implement dispatch tables. When there is a decision point, that role is used to make the class extensible.
Caveat utilitor
This is not an extension of Data::Walk.
The core class has no external effect. All output comes from additions to the class.
This module uses the 'defined or' ( //= ) and so requires perl 5.010 or higher.
This is a Moose based data handling class. Many coders will tell you Moose and data manipulation don't belong together. They are most certainly right in speed intensive circumstances.
Recursive parsing is not a good fit for all data since very deep data structures will burn a fair amount of perl memory! As the module recursively parses through the levels, perl leaves behind snapshots of the previous levels that allow it to keep track of its location.
The primary_ref and secondary_ref are effectively deep cloned during this process. To leave the primary_ref pointer intact see "fixed_primary".
GLOBAL VARIABLES
- $ENV{Smart_Comments}
The module uses Smart::Comments if the '-ENV' option is set. The 'use' is encapsulated in an if block triggered by an environmental variable to comfort non-believers. Setting the variable $ENV{Smart_Comments} in a BEGIN block will load and turn on smart comment reporting. There are three levels of 'Smartness' available in this module '###', '####', and '#####'.
SUPPORT
TODO
provide full recursion through Objects
Support recursion through CodeRefs
Add a Data::Walk::Diff Role to the package
Add a Data::Walk::Top Role to the package
Add a Data::Walk::Thin Role to the package
Add a Data::Walk::Substitute Role to the package
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
Dependencies
- 5.010 (for use of defined or //)
- Moose
- MooseX::StrictConstructor
- Class::Inspector
- Scalar::Util
- Carp
- MooseX::Types::Moose
- Data::Walk::Extracted::Types
- Data::Walk::Extracted::Dispatch
SEE ALSO
- Smart::Comments - is used if the -ENV option is set
- Data::Walk
- Data::Walker
- Data::Dumper - Dumper
- YAML - Dump
- Data::Walk::Print - available Data::Walk::Extracted Role
- Data::Walk::Prune - available Data::Walk::Extracted Role
- Data::Walk::Graft - available Data::Walk::Extracted Role
- Data::Walk::Clone - available Data::Walk::Extracted Role