NAME
CXC::Data::Visitor - Invoke a callback on every element at every level of a data structure.
VERSION
version 0.12
SYNOPSIS
use CXC::Data::Visitor 'visit', 'RESULT_CONTINUE';
my %root = (
fruit => {
berry => 'purple',
apples => [ 'fuji', 'macoun' ],
} );
visit(
\%root,
sub ( $kydx, $vref, @ ) {
$vref->$* = 'blue' if $kydx eq 'berry';
return RESULT_CONTINUE;
} );
say $root{fruit}{berry}
results in
blue
DESCRIPTION
CXC::Data::Visitor::visit performs a depth-first traversal of a data structure, invoking a provided callback subroutine on elements in the structure.
Features
- The type of element passed to the callback (containers, terminal elements) can be selected.
- The order of traversal at a given depth (i.e. within a container's elements) may be customized.
- The callback can modify the traversal process.
- The complete path from the structure to an element (both the ancestor containers and the keys and indexes required to traverse the path) is available to the callback.
- Cycles are detected upon traversing a container a second time in a depth first search, and the resultant action may be specified.
- Objects are treated as terminal elements and are not traversed.
- Containers that can be reached multiple times without cycling are visited once per parent.
Overview
visit recursively traverses the container, $root, calling the
passed subroutine, $callback on each element, $element, which
is allowed by the "visit" option.
The traversal is depth-first, e.g. if $element is a
container, $callback is called on it and then its contents before
processing $element's siblings.
Each container's contents are traversed in sorted order. For hashes, this is alphabetical, for arrays, numerical. (This may be changed with the "key_sort" and "idx_sort" options).
For example, the default traversal order for the structure in the "SYNOPSIS" is
+-------------------------+-----------------------+-----+
| path | value | idx |
+-------------------------+-----------------------+-----+
| $root{fruit} | \$root{fruit} | 0 |
| $root{fruit}{apples} | \$root{fruit}{apples} | 0 |
| $root{fruit}{apples}[0] | fuji | 0 |
| $root{fruit}{apples}[1] | macoun | 1 |
| $root{fruit}{berry} | purple | 1 |
+-------------------------+-----------------------+-----+
Containers that can be reached multiple times without cycling, e.g.
%hash = ( a => { b => 1 }, );
$hash{c} = $hash{a};
are visited once per parent, e.g.
{a}, {a}{b}
{c}, {c}{b}
$callback's return value indicates how visit should proceed (see
"Traversal Directives"). The simplest directive is to continue
traversal; additional directives abort the traversal,
abort descent into a container, revisit the current container
immediately, revisit a container after its contents are visited,
and other obscure combinations.
USAGE
visit has the following signature:
( $completed, $context, $metadata ) = visit( $root, $callback, %options )
The two mandatory arguments are $root, a reference to either a
hash or an array, and $callback, a reference to a subroutine which
will be invoked on visited elements. By default $callback is invoked on
$root's elements, not on $root itself; use the "VISIT_ROOT"
flag change this.
-
$completed => Boolean
true if all elements were visited, false if $callback requested a premature return.
-
$context
The variable of the same name passed to $callback; see the "context" option.
-
$metadata => hash
collected metadata. See "Metadata".
Options
visit may be passed the following options:
-
context optional
Arbitrary data to be passed to "$callback" via the
$contextargument. Use it for whatever you'd like. If not specified, a hash will be created. -
cycle => constant|coderef
How cycles within
$rootshould be handled. See "Cycles". -
visit => constant
Specify elements (by type) which will be passed to
$callback. See "Element Filters" -
key_sort => boolean |
$coderefThe order of keys when traversing hashes. If true (the default), the order is that returned by Perl's
sortroutine. If false, it is the order returned that Perl'skeysroutine.If a coderef, it is used to sort the keys. It is called as
\@sorted_keys = $coderef->( \@unsorted_keys ); -
idx_sort =>
$coderefBy default array elements are traversed in order of their ascending index. Use "idx_sort" to specify a subroutine which returns them in an alternative order. It is called as
\@indices = $coderef->( $n );where
$nis the number of elements in the array. -
sort_keys => coderef
DEPRECATED
An optional coderef which implements a caller specific sort order. It is passed two keys as arguments. It should return
-1,0, or1indicating that the sort order of the first argument is less than, equal to, or greater than that of the second argument. -
revisit_limit
A container may be scanned multiple times during a visit to it. This sets the maximum number of times the container is re-scanned during a visit before
visitthrows an exception to avoid infinite loops. This limit also applies to "RESULT_REVISIT_ROOT".The defaults is
10. Set it to0to indicate no limit.
Callback
visit invokes $callback on selected elements of $root (see
"Element Filters"). $callback is invoked as
$directive = $callback->( $kydx, $vref, $context, \%metadata );
The arguments passed to $callback are:
-
$kydx
The location (key or index) of the element in its parent container. This will be undefined when
$callbackis invoked on$root(see "VISIT_ROOT"). -
$vref
A reference to the element. Use $vref->$* to extract or modify the element's value. Do not cache this value; the full path to the element is provided via the "$metadata" argument.
-
$context
A reference to data reserved for use by
$callback. See the "context" option. -
$metadata
A hash of state information used to keep track of progress. While primarily of use by
visit, some may be of interest to$callback. See "Metadata"
Traversal Directives
"$callback" must return a constant (see "EXPORTS")
indicating what visit should do next. Not all constants
are allowed in all contexts in which $callback is invoked;
see "Calling Contexts and Allowed Traversal Directives".
Single Directives
-
RESULT_CONTINUE
Visit the next element in the parent container.
+-------------------------+-----------------------+-----+ | path | value | idx | +-------------------------+-----------------------+-----+ | $root{fruit} | \$root{fruit} | 0 | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | | $root{fruit}{apples}[0] | fuji | 0 | | $root{fruit}{apples}[1] | macoun | 1 | | $root{fruit}{berry} | purple | 1 | +-------------------------+-----------------------+-----+ -
RESULT_RETURN
Return immediately to the caller of
visit. -
RESULT_STOP_DESCENT
If the current element is a hash or array, do not visit its contents, and visit the next element in the parent container.
If the element is not a container, the next element in the container will be visited (just as with "RESULT_CONTINUE").
For example, If
RESULT_STOP_DESCENTis returned when$root{fruit}{apples}is traversed, the traversal would look like this:+----------------------+-----------------------+-----+ | path | value | idx | +----------------------+-----------------------+-----+ | $root{fruit} | \$root{fruit} | 0 | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | | $root{fruit}{berry} | purple | 1 | +----------------------+-----------------------+-----+ -
RESULT_REVISIT_CONTENTS
Do not visit the next element in the parent container. restart with the first element in the container. The order of elements is determined when the container is visited, so starts within a visit will have the same order.
For example, if
RESULT_REVISIT_CONTENTSis returned the first time$root{fruit}{apples}[0]is traversed, the traversal would look like this:+-------------------------+-----------------------+-----+-------+ | path | value | idx | visit | +-------------------------+-----------------------+-----+-------+ | $root{fruit} | \$root{fruit} | 0 | 1 | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | 1 | | $root{fruit}{apples}[0] | fuji | 0 | 1 | | $root{fruit}{apples}[0] | fuji | 0 | 2 | | $root{fruit}{apples}[1] | macoun | 1 | 2 | | $root{fruit}{berry} | purple | 1 | 1 | +-------------------------+-----------------------+-----+-------+To avoid inadvertent infinite loops, the number of revisits during a traversal of a container is limited (see "revisit_limit"). Containers with multiple parents are traversed once per parent; The limit is reset for each traversal.
-
RESULT_REVISIT_ROOT
Stop processing and re-start at
$root. To avoid inadvertent infinite loops, the number of revisits is limited (see "revisit_limit"). -
RESULT_REVISIT_ELEMENT
If the current element is not a container, the next element in the container will be visited (just as with "RESULT_CONTINUE").
If the current element is a container, its contents will be visited, and "$callback" will be invoked on it again afterwards.
During the call to
$callbackon the container prior to visiting its contents,$metadata->{pass} & PASS_VISIT_ELEMENTwill be true. During the followup visit
$metadata->{pass} & PASS_REVISIT_ELEMENTwill be true.
For example, If
RESULT_REVISIT_ELEMENTis returned when$root{fruit}{apples}is traversed, the traversal would look like this:+-------------------------+-----------------------+-----+----------------------+ | path | value | idx | pass | +-------------------------+-----------------------+-----+----------------------+ | $root{fruit} | \$root{fruit} | 0 | PASS_VISIT_ELEMENT | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | PASS_VISIT_ELEMENT | | $root{fruit}{apples}[0] | fuji | 0 | PASS_VISIT_ELEMENT | | $root{fruit}{apples}[1] | macoun | 1 | PASS_VISIT_ELEMENT | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | PASS_REVISIT_ELEMENT | | $root{fruit}{berry} | purple | 1 | PASS_VISIT_ELEMENT | +-------------------------+-----------------------+-----+----------------------+
Mixed Directives
Some directives can be mixed with "RESULT_REVISIT_CONTENTS" and "RESULT_REVISIT_ELEMENT" by performing a binary OR with them.
-
RESULT_STOP_DESCENT | RESULT_REVISIT_CONTENTS
If the current element is not a container, the next element in the container will be visited (just as with "RESULT_CONTINUE").
If the current element is a hash or array, do not visit its contents, and continue with the next element in the parent container. For non-container elements, continue with the next element in the parent container.
After all of the container's contents have been visited, start again with the first element in the container.
For example, if
RESULT_STOP_DESCENT | RESULT_REVISIT_CONTENTSis returned when$root{fruit}{apples}is traversed when$metadata-{visit} ==1>, the traversal would look like+----------------------+-----------------------+-----+-------+ | path | value | idx | visit | +----------------------+-----------------------+-----+-------+ | $root{fruit} | \$root{fruit} | 0 | 1 | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | 1 | | $root{fruit}{berry} | purple | 1 | 1 | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | 2 | | $root{fruit}{berry} | purple | 1 | 2 | +----------------------+-----------------------+-----+-------+ -
RESULT_CONTINUE | RESULT_REVISIT_CONTENTS
Visit the remaining elements in the parent container, then start again with the first element in the container.
For example, if
RESULT_CONTINUE | RESULT_REVISIT_CONTENTSis returned when$callbackis first passed$root{fruit}{apples}[0], the traversal would look like+-------------------------+-----------------------+-----+-------+ | path | value | idx | visit | +-------------------------+-----------------------+-----+-------+ | $root{fruit} | \$root{fruit} | 0 | 1 | | $root{fruit}{apples} | \$root{fruit}{apples} | 0 | 1 | | $root{fruit}{apples}[0] | fuji | 0 | 1 | | $root{fruit}{apples}[1] | macoun | 1 | 1 | | $root{fruit}{apples}[0] | fuji | 0 | 2 | | $root{fruit}{apples}[1] | macoun | 1 | 2 | | $root{fruit}{berry} | purple | 1 | 1 | +-------------------------+-----------------------+-----+-------+
Calling Contexts and Allowed Traversal Directives
$callback's allowed return value depends upon the context it is
called in. $callback may be called on an element multiple times
during different stages of traversal.
When invoked on an element during a scan of its parent container
-
The
passmetadata attribute is set toPASS_VISIT_ELEMENT -
$callbackmust return one ofRESULT_REVISIT_CONTENTS RESULT_RETURN RESULT_CONTINUE RESULT_CONTINUE | RESULT_REVISIT_CONTENTS RESULT_STOP_DESCENT | RESULT_REVISIT_CONTENTS RESULT_CONTINUE | RESULT_REVISIT_ELEMENT
When invoked on a container immediately after its contents have been visited
-
The
passmetadata attribute is set toPASS_REVISIT_ELEMENT -
$callbackmust return one ofRESULT_RETURN RESULT_CONTINUE RESULT_REVISIT_CONTENTS RESULT_CONTINUE | RESULT_REVISIT_CONTENTS
When invoked on $root before its contents have been visited
See "VISIT_ROOT".
-
The
passmetadata attribute is set toPASS_VISIT_ELEMENT -
$callbackmust return one ofRESULT_CONTINUE RESULT_CONTINUE | RESULT_REVISIT_ELEMENT RESULT_RETURN RESULT_REVISIT_ROOT RESULT_STOP_DESCENT
When invoked on the $root immediately after its elements have been visited
See "VISIT_ROOT" and "RETURN_REVISIT_ELEMENT".
-
The
passmetadata attribute is set toPASS_REVISIT_ELEMENT -
$callbackmust return one ofRESULT_RETURN RESULT_CONTINUE
Metadata
$callback is passed a hash of state information ($metadata) kept
by CXC::Data::Visitor::visit, some of which may be of interest to
the callback:
$metadata has the following entries:
-
container
A reference to the hash or array which contains the element being visited.
-
path
An array which contains the path (keys and indices) used to arrive at the current element from $root.
-
ancestors
An array containing references to the ancestor containers of the current element.
-
pass
A constant indicating the current visit pass of an element. See "RESULT_REVISIT_ELEMENT".
-
visit
A unary-based counter indicating the number of times the element's container has been scanned and its contents processed in a single visit. This will be greater than
1if the "RESULT_REVISIT_CONTENTS" directive has been applied. It is not the number of times that the element has been visited, as scans may be interrupted and restarted. -
idx
A zero-based index indicating the order of the element in its container. Ordering depends upon how container elements are sorted; see "key_sort" and "idx_sort".
Element Filters
The parts of the structure that will trigger a callback. Note that
by default the passed top level structure, $root is not
passed to the callback. See "VISIT_ROOT".
See "EXPORTS" to import the constants.
-
VISIT_CONTAINER
Invoke "$callback" on containers (either hashes or arrays). For example, the elements in the following structure
$root = { a => { b => 1, c => [ 2, 3 ] } }passed to "$callback" are:
a => {...} # $root->{a} c => [...] # $root->{c} -
VISIT_ARRAY
-
VISIT_HASH
Only visit containers of the given type.
-
VISIT_LEAF
Invoke "$callback" on terminal (leaf) elements. For example, the elements in the following structure
$root = { a => { b => 1, c => [ 2, 3 ] } }passed to "$callback" are:
b => 1 # $root->{a}{b} 0 => 2 # $root->{a}{c}[0] 1 => 3 # $root->{a}{c}[1] -
VISIT_ALL
Invoke "$callback" on all elements except for
$root. This is the default. -
VISIT_ROOT
Pass
$rootto$callback. To filter on one of the other values, pass a binary OR of "VISIT_ROOT" and the other filter, e.g.VISIT_ROOT | VISIT_LEAFSpecifying "VISIT_ROOT" on its own is equivalent to
VISIT_ROOT | VISIT_ALL
Cycles
-
CYCLE_DIE
Throw an exception (the default).
-
CYCLE_CONTINUE
Pretend we haven't seen it before. Will cause stack exhaustion if $callback does handle this.
-
CYCLE_TRUNCATE
Truncate before entering the cycle a second time.
-
$coderef
Examine the situation and request a particular resolution. $coderef is called as
$coderef->( $container, $context, $metadata );where $container is the hash or array which has already been traversed. See below for "$context" and "$metadata".
$coderef should return one of CYCLE_DIE, CYCLE_CONTINUE, or CYCLE_TRUNCATE, indicating what should be done.
EXPORTS
This module uses Exporter::Tiny, which provides enhanced import utilities.
Subroutines
The following subroutine may be imported:
visit
Constants
Constants may be imported individually or as groups via tags. The available tags and their respective imported symbols are:
-
all
Import all symbols.
-
results
RESULT_CONTINUE RESULT_RETURN RESULT_REVISIT_CONTAINER # deprecated alias for RESULT_REVISIT_CONTENTS RESULT_REVISIT_CONTENTS RESULT_REVISIT_ELEMENT RESULT_REVISIT_ROOT RESULT_STOP_DESCENT -
cycles
CYCLE_CONTINUE CYCLE_DIE CYCLE_TRUNCATE -
visits
VISIT_ALL VISIT_CONTAINER VISIT_LEAF VISIT_ROOT -
passes
PASS_REVISIT_ELEMENT PASS_VISIT_ELEMENT -
constants
Import tags
cycles,passes,results,visits.
DEPRECATED CONSTRUCTS
- RESULT_REVISIT_CONTAINER is a deprecated alias for "RESULT_REVISIT_CONTENTS".
SUPPORT
Bugs
Please report any bugs or feature requests to bug-cxc-data-visitor@rt.cpan.org or through the web interface at: https://rt.cpan.org/Public/Dist/Display.html?Name=CXC-Data-Visitor
Source
Source is available at
https://codeberg.org/CXC-Optics/p5-CXC-Data-Visitor
and may be cloned from
https://codeberg.org/CXC-Optics/p5-CXC-Data-Visitor.git
SEE ALSO
Please see those modules/websites for more information related to this module.
AUTHOR
Diab Jerius djerius@cpan.org
COPYRIGHT AND LICENSE
This software is Copyright (c) 2024 by Smithsonian Astrophysical Observatory.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007