NAME
Iterator::Flex::Manual::Authoring - How to write an iterator
VERSION
version 0.18
DESCRIPTION
Iterator Phases
Iterators must manage the four different phases that the iterator might be in:
initialization
iteration
exhaustion
error
For more details, see "Iterator life-cycle" in Iterator::Flex::Manual::Overview.
Initialization
When an iterator is constructed it is typically passed some state information; it may be an array or a hash, a database handle, or a file pointer.
The constructor must save the relevant pieces of information (typically through closed-over variables; see below) and initialize variables which keep track of where the iterator is in the data stream.
For example, if the iterator operates on an array, it will need to keep track of the index of the element it must return next.
Some iterators don't need or have access to that information. If an iterator operates on a file handle, returning the next line in the file, the file handle keeps track of the next line in the file, so the iterator doesn't need to. Similarly, if an iterator is retrieving data from a database via a cursor, the database will keep track of where it is in the data stream.
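The difference between the two kinds of state can be sketched in plain Perl. This is only an illustration: array_next and lines_next are hypothetical helpers, not part of Iterator::Flex, and they return undef at the end of the data instead of signalling exhaustion properly.

```perl
use strict;
use warnings;

# Array-backed: the closure itself must remember the next index.
sub array_next {
    my ($arr) = @_;
    my $idx = 0;    # position state kept by the iterator
    return sub { return $idx < @$arr ? $arr->[ $idx++ ] : undef };
}

# Handle-backed: the file handle remembers the position for us.
sub lines_next {
    my ($fh) = @_;
    return sub { return scalar readline($fh) };    # undef at end of file
}

my $from_array = array_next( [ 'a', 'b' ] );

# An in-memory file stands in for a real one.
my $text = "x\ny\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my $from_lines = lines_next($fh);
```

Here $from_array needs its own $idx, while $from_lines carries no position variable at all.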
Iteration
In the iteration phase, the iterator identifies the data to return, updates its internal state (if necessary) so that it will return the correct data on the next iteration, and returns the data.
If the data stream has been exhausted, the iterator must indicate this by calling the signal_exhaustion method. This method implements the exhaustion policy requested by the user who set up the iterator (either returning a sentinel value or throwing an exception).
After this the iterator enters the "Exhaustion" phase.
If there is an error (e.g. if a database connection is dropped), the iterator must signal this by calling the signal_error method. Not all iterators have an error phase.
Exhaustion
Unlike in some other iterator implementations, it is legal to call an iterator's next method after the iterator is exhausted. In the exhaustion phase, the iterator simply invokes the signal_exhaustion method.
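As a rough illustration of this phase, the sketch below calls a closure well past the end of its data. The $signal_exhaustion closure is a hypothetical stand-in for the real signal_exhaustion method, assuming a policy of returning undef.

```perl
use strict;
use warnings;

my @data      = ( 'a', 'b' );
my $idx       = 0;
my $exhausted = 0;

# Hypothetical stand-in for signal_exhaustion with a "return undef" policy.
my $signal_exhaustion = sub { $exhausted = 1; return undef };

my $next = sub {
    # Once the data run out, every call takes this branch.
    return $signal_exhaustion->() if $idx == @data;
    return $data[ $idx++ ];
};

# Five calls on two elements: the last three all hit the exhaustion phase.
my @results = map { $next->() } 1 .. 5;
```

Every call after the second gives the same response, which is the behaviour required of an exhausted iterator.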
Error
Not all iterators have an error phase, but if they are in one, they simply call signal_error.
Capabilities
An iterator must do at least one thing: return the next datum from the data stream. This is the next capability. Iterator::Flex iterators can support a number of other capabilities; see Iterator::Flex::Manual::Overview.
Building an Iterator
Iterators are constructed by passing an attribute hash, %AttrHash, to the Iterator::Flex factory, which uses it to construct an appropriate iterator class, instantiate it, and return it to the user.
The attribute hash (whose contents are documented in much greater detail in "Iterator Parameters" in Iterator::Flex::Manual::Overview) describes the iterator's capabilities and provides implementations.
The main iterator routine (next) must be a closure, with state contained in closed-over variables. Every time a new iterator is constructed, a new closure is generated.
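A minimal sketch of why a fresh closure per iterator matters, using a hypothetical make_counter generator in place of a real iterator constructor:

```perl
use strict;
use warnings;

# Hypothetical generator: each call creates a new lexical $n and a new
# closure bound to it.
sub make_counter {
    my ($start) = @_;
    my $n = $start;
    return sub { return $n++ };
}

my $first  = make_counter(0);
my $second = make_counter(10);

# Interleave calls; advancing $first never disturbs $second.
my @seen = ( $first->(), $first->(), $second->(), $first->() );
# @seen is ( 0, 1, 10, 2 )
```

Each closure owns its private copy of the state, so two iterators built by the same code cannot interfere with one another.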
Writing an iterator generally involves writing a subroutine which returns the %AttrHash containing the closures. As an example, we will construct an iterator which operates on arrays, providing a number of capabilities.
For simplicity, we'll write a construct subroutine which is passed a reference to the array to iterate over, and returns the %AttrHash. Later we'll see how to create a one-off iterator and a standalone iterator class using the concepts we've explored.
Our construct subroutine will be called as
    $ref_AttrHash = construct( \@array );
Creating the next capability
First, let's concentrate on the heart of the iterator, the next capability, which must be implemented as a closure.
next has three responsibilities:
return the next data element
signal exhaustion
(optionally) signal an error.
It usually also ensures that the current and prev capabilities return the proper values. Because it is called most often, it should be as efficient as possible.
next cannot keep state internally. Our construct subroutine will store the state in lexical variables which only our instance of next will have access to.
To illustrate, here's an implementation of next for iteration over an array:
    my $next = sub {
        if ( $next == $len ) {
            # if first time through, set prev
            $prev = $current
              if ! $self->is_exhausted;
            return $current = $self->signal_exhaustion;
        }
        $prev    = $current;
        $current = $next++;
        return $arr->[$current];
    };
Notice that the subroutine doesn't take any parameters. Also notice that it uses a number of variables that are not defined in the subroutine, e.g. $arr, $next, etc. These are lexical variables in construct and are initialized outside of the $next closure.
$arr is the array we're operating on, and $len is its length (so we don't have to look it up every time). Because it's cheap to retain the state of an array (it's just an index), we can easily keep track of what is needed to implement the prev and current capabilities; those are stored in $prev and $current.
Finally, there's $self, which is a handle for our iterator. It's not used for any performance-critical work.
These must all be properly initialized by construct before $next is created; we'll go over that later. Let's first look at the code for the $next closure.
The code is divided into two sections; the first deals with data exhaustion:
    if ( $next == $len ) {
        # if first time through, set prev
        $prev = $current
          if ! $self->is_exhausted;
        return $current = $self->signal_exhaustion;
    }
Every time the iterator is invoked, it checks if it has run out of data. If it has (i.e. $next == $len), then the iterator sets up the exhaustion phase. The is_exhausted predicate will be true if the iterator is already in the exhaustion phase. If it is, it doesn't need to perform work required to handle other capabilities. In our case, the first time the iterator is in the exhausted state it must set $prev so that it correctly returns the last element in the array (which will be $current from the last successful iteration).
Then, it signals exhaustion by returning the result of its signal_exhaustion method (and setting $current to that value, so the current capability will return the correct value).
Recall that it is the client code that determines how the iterator will signal exhaustion (i.e., via a sentinel value or an exception). The iterator itself doesn't care; it simply returns the result of the signal_exhaustion method, which will set the is_exhausted object predicate and then either return a sentinel value or throw an exception.
In other iterator implementations (e.g. C++, Raku), calling next (or other methods) on an exhausted iterator is undefined behavior. This is not true for Iterator::Flex iterators. An exhausted iterator must always respond, identically, to a call to next, so must always return the result of the signal_exhaustion method.
The second part of the code takes care of returning the correct data and setting the iterator up for the succeeding call to next. It also ensures that the current and prev capabilities will return the proper values:
    $prev    = $current;
    $current = $next++;
    return $arr->[$current];
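To watch both sections at work, the closure can be run outside of Iterator::Flex against a stand-in object. In this sketch Sketch::Self is a hypothetical stub that mimics only is_exhausted and signal_exhaustion with an undef sentinel; the real object is supplied by the framework. The closure is named $next_cb purely to keep it visually distinct from the $next index.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the iterator object (assumption: undef sentinel).
package Sketch::Self {
    sub new { my $class = shift; return bless { exhausted => 0 }, $class }
    sub is_exhausted      { return $_[0]{exhausted} }
    sub signal_exhaustion { $_[0]{exhausted} = 1; return undef }
}

my $arr  = [ 10, 20 ];
my $len  = @$arr;
my $self = Sketch::Self->new;
my ( $next, $prev, $current ) = ( 0, undef, undef );

my $next_cb = sub {
    if ( $next == $len ) {
        # only the first exhausted call updates $prev
        $prev = $current if !$self->is_exhausted;
        return $current = $self->signal_exhaustion;
    }
    $prev    = $current;
    $current = $next++;
    return $arr->[$current];
};

# Record the returned value and the $prev / $current indices after each call.
my @log;
for ( 1 .. 4 ) {
    my $value = $next_cb->();
    push @log, [ $value, $prev, $current ];
}
```

After the third call $prev holds index 1 (the last element) and $current is undef; the fourth call leaves both unchanged.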
Other capabilities
For completeness, here's the implementation of the rest of the iterator's capabilities:
    my $reset   = sub { $prev = $current = undef; $next = 0; };
    my $rewind  = sub { $next = 0; };
    my $prev    = sub { return defined $prev    ? $arr->[$prev]    : undef; };
    my $current = sub { return defined $current ? $arr->[$current] : undef; };
They have been written as closures accessing the lexical variables, but they could also have been written as methods if the iterator chose to store its state in some other fashion. Only next must be a closure.
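The difference between reset and rewind as written above is easy to miss: rewind restarts the walk but keeps the prev/current history, while reset clears it. A small sketch, using a simplified $advance closure (an assumption for illustration, with no exhaustion handling) in place of the full next:

```perl
use strict;
use warnings;

my $arr = [ 'a', 'b', 'c' ];
my ( $next, $prev, $current ) = ( 0, undef, undef );

# Simplified advance step: no exhaustion handling, for illustration only.
my $advance = sub { $prev = $current; $current = $next++; return $arr->[$current] };
my $reset   = sub { $prev = $current = undef; $next = 0; };
my $rewind  = sub { $next = 0; };
my $current_value = sub { return defined $current ? $arr->[$current] : undef };

$advance->() for 1 .. 2;                  # current is now 'b'

$rewind->();
my $after_rewind = $current_value->();    # still 'b': history is kept
my $first_again  = $advance->();          # 'a': the walk has restarted

$reset->();
my $after_reset = $current_value->();     # undef: history is cleared
```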
Initialization Phase
Finally, we'll get to the iterator initialization phase, which may make more sense now that we've gone through the other phases. Recall that we are using closed over variables to keep track of state.
Our code should look something like this:
    sub construct ( $array ) {
        # initialize lexical variables here
        my $next = ...;
        my $prev = ...;
        my $current = ...;
        my $arr = ...;
        my $len = ...;
        my $self = ...;
        # create our closures
        my $next = sub { ... };
        my $prev = sub { ... };
        ...
        # return our %AttrHash:
        return {
            _self   => \$self,
            next    => $next,
            prev    => $prev,
            current => $current,
            reset   => $reset,
            rewind  => $rewind,
        };
    }
The first five lexical variables are easy:
    my $next    = 0;
    my $prev    = undef;
    my $current = undef;
    my $arr     = $array;
    my $len     = $array->@*;
Now, what about $self? It is a reference to our iterator object, but the object hasn't been created yet; that's done when %AttrHash is passed to Iterator::Flex::Factory. So where does $self get initialized? The answer lies in the _self entry in %AttrHash, which holds a reference to $self. When Iterator::Flex::Factory creates the iterator object it uses the _self entry to initialize $self. (Note that $self is not a reference to a hash. You cannot store data in it.)
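The late binding through _self relies on ordinary Perl references, and can be sketched without the factory. Everything here is hypothetical scaffolding: Sketch::Object stands in for the iterator object, construct_sketch for construct, and the describe entry is not a real %AttrHash key. The assignment at the end imitates what the factory is described as doing.

```perl
use strict;
use warnings;

# Hypothetical stand-in object; a blessed scalar, not a hash.
package Sketch::Object {
    sub new  { my ( $class, $name ) = @_; return bless \$name, $class }
    sub name { return ${ $_[0] } }
}

sub construct_sketch {
    my $self;    # declared now, assigned later by whoever builds the object
    return {
        _self    => \$self,    # a reference to the variable, not its value
        describe => sub { return defined $self ? $self->name : 'unbound' },
    };
}

my $attr   = construct_sketch();
my $before = $attr->{describe}->();    # closure sees an undefined $self

# Assigning through the reference updates the closed-over variable.
${ $attr->{_self} } = Sketch::Object->new('iter');
my $after = $attr->{describe}->();     # closure now sees the object
```

Because the closure and the _self entry share the same lexical, the closure can be created before the object exists.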
Wrapping up
At this point construct is functionally complete; given an array it'll return a hash that can be fed to the iterator factory.
Passing the %AttrHash to the factory
Iterators may be constructed on-the-fly, or may be formalized as classes.
A one-off iterator
This approach uses "construct_from_attrs" in Iterator::Flex::Factory to create an iterator object from our %AttrHash:
    my @array = ( 1..100 );
    my $AttrHash = construct( \@array );
    $iter = Iterator::Flex::Factory->construct_from_attrs( $AttrHash, \%opts );
In addition to %AttrHash, construct_from_attrs takes another options hash, which is where the exhaustion policy is set.
In this case, we can choose one of the following entries:
exhaustion => 'throw';
  On exhaustion, throw an exception object of class Iterator::Flex::Failure::Exhausted.
exhaustion => [ return => $sentinel ];
  On exhaustion, return the specified sentinel value.
The default is
    exhaustion => [ return => undef ];
At this point $iter is initialized and ready for use.
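The choice of policy matters when the data may contain the sentinel itself. The following plain-Perl sketch imitates the two policies with a hypothetical make_iter helper (it does not use Iterator::Flex, and dies with a plain string rather than an exception object), to show why a sentinel other than undef, or the throw policy, can be the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper mimicking the two policies described above.
sub make_iter {
    my ( $data, $policy ) = @_;
    my $idx = 0;
    return sub {
        return $data->[ $idx++ ] if $idx < @$data;
        die "exhausted\n" if !ref $policy;    # 'throw'
        return $policy->[1];                  # [ return => $sentinel ]
    };
}

my @data = ( 1, undef, 3 );

# With an undef sentinel, the undef datum ends the loop early.
my $it          = make_iter( \@data, [ return => undef ] );
my $count_undef = 0;
$count_undef++ while defined $it->();

# A fresh reference as sentinel cannot be confused with any datum.
my $done = [];
$it = make_iter( \@data, [ return => $done ] );
my $count_ref = 0;
while (1) {
    my $value = $it->();
    last if ref $value && $value == $done;
    $count_ref++;
}

# The throw policy is handled with eval instead of a sentinel test.
my $thrower = make_iter( [], 'throw' );
my $threw   = !eval { $thrower->(); 1 };
```

The first loop sees only one element, while the second sees all three.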
An iterator class
Creating a class requires a few steps more, and gives the following benefits:
A much cleaner interface, e.g.
    $iter = Iterator::Flex::Array->new( \@array );
vs. the multi-liner above.
The ability to freeze and thaw the iterator.
Some of the construction costs can be moved from run time to compile time.
An iterator class must
subclass Iterator::Flex::Base;
provide two class methods, new and construct; and
register its capabilities.
new
The new method converts from the API most comfortable to your usage to the internal API used by Iterator::Flex::Base. By convention, the last argument should be reserved for a hashref containing general iterator arguments (such as the exhaustion key). This hashref is documented in "new_from_attrs" in Iterator::Flex::Base.
The super class' constructor takes two arguments: a variable containing iterator-specific data (state), and the above-mentioned general argument hash. The state variable can take any form; it is not interpreted by the Iterator::Flex framework.
Here's the code for "new" in Iterator::Flex::Array:
    sub new ( $class, $array, $pars={} ) {
        $class->_throw( parameter => "argument must be an ARRAY reference" )
          unless Ref::Util::is_arrayref( $array );
        $class->SUPER::new( { array => $array }, $pars );
    }
It's pretty simple. It saves the general options hash if present, stores the passed array (the state) in a hash, and passes both of them to the super class' constructor. (A hash is used here because Iterator::Flex::Array can be serialized, and extra state is required to do so.)
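The trailing-hashref convention can be sketched on its own. The split_args helper below is hypothetical and uses the core ref function rather than Ref::Util; it peels an optional options hashref off the end of the argument list.

```perl
use strict;
use warnings;

# Hypothetical helper: separate iterator-specific arguments from an
# optional trailing hashref of general parameters.
sub split_args {
    my @args = @_;
    my $pars = ( @args && ref( $args[-1] ) eq 'HASH' ) ? pop @args : {};
    return ( \@args, $pars );
}

# With general parameters supplied ...
my ( $args, $pars ) = split_args( [ 1, 2, 3 ], { exhaustion => 'throw' } );

# ... and without them, an empty hash is substituted.
my ( $args2, $pars2 ) = split_args( [ 1, 2, 3 ] );
```

One consequence of this convention is that an iterator whose own final argument is a hashref needs the general parameters passed explicitly, otherwise that argument would be mistaken for them.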
construct
The construct class method's duty is to return a %AttrHash. It's called as
    $AttrHash = $class->construct( $state );
where $state is the state variable passed to "new" in Iterator::Flex::Base. Unsurprisingly, it is remarkably similar to the construct subroutine developed earlier.
There are a few differences:
The signature changes, as this is a class method, rather than a subroutine.
There are additional %AttrHash entries available: _roles, which supports run-time enabling of capabilities, and freeze, which supports serialization.
Capabilities other than next can be implemented as actual class methods, rather than closures. This decreases the cost of creating iterators (because they only need to be compiled once, rather than for every instance of the iterator) but increases run time costs, as they cannot use closed-over variables to access state information.
Registering Capabilities
Unlike when using "construct_from_attr" in Iterator::Flex::Factory, which helpfully looks at %AttrHash to determine which capabilities are provided (albeit at run time), classes are encouraged to register their capabilities at compile time via the _add_roles method. For the example iterator class, this would be done via
    __PACKAGE__->_add_roles( qw[
        State::Registry
        Next::ClosedSelf
        Rewind::Closure
        Reset::Closure
        Prev::Closure
        Current::Closure
    ] );
(These are all accepted shorthand for roles in the Iterator::Flex::Role namespace.)
If capabilities must be added at run time, use the _roles entry in %AttrHash.
The specific roles used here are:
- Next::ClosedSelf
  This indicates that the next capability uses a closed-over $self variable, and that Iterator::Flex should use the _self hash entry to initialize it.
- State::Registry
  This indicates that the exhaustion state should be stored in the central iterator Registry. Another implementation uses a closed-over variable (and the role State::Closure). See "Exhaustion" in Iterator::Flex::Manual::Internals.
- Reset::Closure
- Prev::Closure
- Current::Closure
- Rewind::Closure
  These indicate that the named capability is present and implemented as a closure.
All together
    package My::Array;
    use strict;
    use warnings;
    use Ref::Util;
    use parent 'Iterator::Flex::Base';
    sub new {
        my $class = shift;
        # general iterator parameters, if any, are the trailing hashref
        my $gpar = Ref::Util::is_hashref( $_[-1] ) ? pop : {};
        $class->_throw( parameter => "argument must be an ARRAY reference" )
          unless Ref::Util::is_arrayref( $_[0] );
        $class->SUPER::new( { array => $_[0] }, $gpar );
    }
    sub construct {
        my ( $class, $state ) = @_;
        # initialize lexical variables here
        ...
        my $arr = $state->{array};
        my %AttrHash = ( ... );
        return \%AttrHash;
    }
    __PACKAGE__->_add_roles( qw[
        State::Registry
        Next::ClosedSelf
        Rewind::Closure
        Reset::Closure
        Prev::Closure
        Current::Closure
    ] );
    1;
INTERNALS
SUPPORT
Bugs
Please report any bugs or feature requests to bug-iterator-flex@rt.cpan.org or through the web interface at: https://rt.cpan.org/Public/Dist/Display.html?Name=Iterator-Flex
Source
Source is available at
https://gitlab.com/djerius/iterator-flex
and may be cloned from
https://gitlab.com/djerius/iterator-flex.git
SEE ALSO
Please see those modules/websites for more information related to this module.
AUTHOR
Diab Jerius <djerius@cpan.org>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2018 by Smithsonian Astrophysical Observatory.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007