NAME

TPath::Forester - a generator of TPath expressions for a particular class of nodes

VERSION

version 1.007

SYNOPSIS

# we apply the TPath::Forester role to a class

{
  package MyForester;

  use Moose;                                      # for simplicity we omit removing Moose droppings, etc.
  use MooseX::MethodAttributes;                   # needed if you're going to add some attributes

  with 'TPath::Forester';                         # compose in the TPath::Forester methods and attributes

  # define abstract methods
  sub children    { $_[1]->children }             # our nodes know their children
  sub parent      { $_[1]->parent }               # our nodes know their parent
  sub has_tag     {                               # our nodes have a tag attribute which is
     my ($self, $node, $tag) = @_;                #   their only tag
     $node->tag eq $tag;
  }
  sub matches_tag { 
     my ($self, $node, $re) = @_;
     $node->tag =~ $re;
  }

  # define an attribute
  sub baz :Attr   {   
    # the canonical order of arguments, none of which we need
    # my ($self, $node, $index, $collection, @args) = @_;
    'baz';
  }
}

# now select some nodes from a tree

my $f     = MyForester->new;                      # make a forester
my $path  = $f->path('//foo/>bar[@depth = 4]');   # compile a path
my $root  = fetch_tree();                         # get a tree of interest
my @nodes = $path->select($root);                 # find the nodes of interest

# say our nodes have a text method that returns a string

$f->add_test( sub { $_[1]->text =~ /^\s+$/ } );   # ignore whitespace nodes
$f->add_test( sub { $_[1]->text =~ /^-?\d+$/ } ); # ignore integers
$f->add_test( sub { ! length $_[1]->text } );     # ignore empty nodes

# reset to ignoring nothing

$f->clear_tests;

DESCRIPTION

A TPath::Forester understands your trees and hence can translate TPath expressions into objects that will select the appropriate nodes from your trees. It can also generate an index appropriate to your trees if you're doing multiple selects on a particular tree.

TPath::Forester is a role. It provides most, but not all, methods and attributes required to construct TPath::Expression objects. You must specify how to find a node's children and its parent (you may have to rely on a TPath::Index for this), and you must define how a tag string or regex may match a node, if at all.

Why "Forester"

Foresters are people who can tell you about trees. A class with the role TPath::Forester can also tell you about trees. I think now "arborist" sounds better, but I don't feel like refactoring everything to use a new name.

ATTRIBUTES

log_stream

A TPath::LogStream used by the @log attribute from TPath::Attributes::Standard. By default it is a TPath::StderrLog.

one_based

Whether to use XPath-style index predicates, with [1] being the index of the first element, or zero-based indices, with [0] being the index of the first element. This affects only non-negative indices. This attribute is false by default.
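
As a rough illustration of the two conventions, the following plain-Perl sketch maps a predicate index onto an array offset. The helper offset_for is hypothetical and is not part of TPath. The sketch assumes that negative indices count from the end, as in Perl arrays, and it makes no claim about what [0] means when one_based is true.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TPath: translate a predicate index
# into a Perl array offset under either convention.
sub offset_for {
    my ( $index, $one_based ) = @_;
    return $index if $index < 0;    # negative indices are unaffected
    return $index unless $one_based;    # zero-based: [0] is the first element
    return undef if $index == 0;    # not covered by the text above
    return $index - 1;              # one-based: [1] is the first element
}

my @nodes = qw(a b c);
print $nodes[ offset_for( 1,  1 ) ], "\n";    # first element, one-based
print $nodes[ offset_for( 0,  0 ) ], "\n";    # first element, zero-based
print $nodes[ offset_for( -1, 1 ) ], "\n";    # last element in either mode
```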

case_insensitive

Whether selectors are case-insensitive in their matching of tags. This attribute is false by default.

METHODS

add_test, has_tests, clear_tests

Add a code ref that will be used to test whether a node is ignorable. The return value of this code will be treated as a boolean value. If it is true, the node, and all its children, will be passed over as possible items to return from a select.

Example test:

$f->add_test(sub {
    my ($forester, $node, $index) = @_;
    return $forester->has_tag($node, 'foo');
});

Every test will receive the forester itself, the node, and the index as arguments. This example test will cause the forester $f to ignore foo nodes.

This method has the companion methods has_tests and clear_tests. The former reports whether any tests have been added and the latter removes them all.
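
To make the pruning behavior concrete, here is a minimal sketch in plain Perl that does not use TPath at all. The node shape (a hash with tag and children keys) and the function visible_tags are hypothetical, and the sketch assumes that an ignored node hides its entire subtree.

```perl
use strict;
use warnings;

# Hypothetical node shape: { tag => 'name', children => [ ... ] }.
my $tree = {
    tag      => 'root',
    children => [
        { tag => 'foo', children => [ { tag => 'bar', children => [] } ] },
        { tag => 'baz', children => [] },
    ],
};

# Tests receive the forester, the node, and the index, in that order.
my @tests = (
    sub { my ( $forester, $node, $index ) = @_; $node->{tag} eq 'foo' },
);

# Collect the tags of every node that no test marks as ignorable.
sub visible_tags {
    my ( $forester, $node, $index ) = @_;
    for my $test (@tests) {
        return () if $test->( $forester, $node, $index );    # skip node and subtree
    }
    return ( $node->{tag},
        map { visible_tags( $forester, $_, $index ) } @{ $node->{children} } );
}

my @seen = visible_tags( undef, $tree, undef );
print "@seen\n";    # prints "root baz"
```

The bar node never appears because its parent, foo, is ignored.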

add_attribute

Expects a name, a code reference, and possibly options. Adds the attribute to the forester.

If the attribute name is already in use, the method will croak unless you specify that this attribute should override the existing attribute of that name. E.g.,

$f->add_attribute( 'foo', sub { ... }, -override => 1 );

If you specify the attribute as overriding and the name is *not* already in use, the method will carp. You can use the -force option to skip all this checking and just add the attribute.

Note that the code reference will receive the forester, a node, an index, a collection of nodes, and optionally any additional arguments. If you want the attribute to evaluate as undefined for a particular node, it must return undef for this node.
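
The name checks described above can be sketched in plain Perl as follows. This is a hypothetical re-creation for illustration, not TPath's own code: add_attribute_sketch and the %attributes table are stand-ins, and the error messages are invented.

```perl
use strict;
use warnings;
use Carp qw(carp croak);

my %attributes;    # stand-in for the forester's attribute table

# Hypothetical re-creation of the documented checks; not TPath's code.
sub add_attribute_sketch {
    my ( $name, $code, %opts ) = @_;
    unless ( $opts{-force} ) {
        if ( exists $attributes{$name} ) {
            croak "attribute $name already exists" unless $opts{-override};
        }
        elsif ( $opts{-override} ) {
            carp "override requested, but $name is not yet defined";
        }
    }
    $attributes{$name} = $code;
    return;
}

add_attribute_sketch( 'foo', sub { 1 } );                    # new name: accepted
add_attribute_sketch( 'foo', sub { 2 }, -override => 1 );    # replaces quietly
add_attribute_sketch( 'foo', sub { 3 }, -force => 1 );       # no checks at all
my $ok = eval { add_attribute_sketch( 'foo', sub { 4 } ); 1 };    # croaks
print $ok ? "added\n" : "refused: $@";
```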

attribute

Expects a TPath::Context, an attribute name, and an optional parameter list. Returns the value of the attribute in that context.

path

Takes a TPath expression and returns a TPath::Expression.

index

Takes a tree node and returns a TPath::Index object that TPath::Expression objects can use to cache information about the tree rooted at the given node.

parent

Expects a TPath::Context and returns the parent of the context node according to the index. If your nodes know their own parents, you probably want to override this method. See also TPath::Index.

id

Expects a node. Returns the id of the node, if any. By default this method always returns undef. Override it if your nodes have some defined notion of id.

autoload_attribute

Expects an attribute name and optionally a list of arguments. Returns a code reference instantiating the attribute. This method is required for attributes such as

//foo[@:a]

or

//bar[@:b(1)]

Note the unescaped colon preceding the attribute name.

Autoloading is useful for things such as HTML or XML trees, where nodes may have ad hoc attributes.

This method must be defined by each forester requiring attribute auto-loading. The default method will always return undef, and if one attempts to use it to autoload an attribute an error will be thrown during expression compilation.
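
A minimal sketch of such a method follows. MyAdHocForester is hypothetical and, to stay self-contained, does not compose TPath::Forester; the node shape (a hash with an attributes hash) is likewise an assumption, as is the idea that the returned code reference receives the forester and the node as its first two arguments.

```perl
use strict;
use warnings;

{
    package MyAdHocForester;    # hypothetical; a real one would compose TPath::Forester

    # Return a code ref that reads the named ad hoc attribute from a node.
    # Assumes nodes are hashes with an 'attributes' hash (illustrative only).
    sub autoload_attribute {
        my ( $self, $name, @args ) = @_;
        return sub {
            my ( $forester, $node ) = @_;
            return $node->{attributes}{$name};    # undef when the node lacks it
        };
    }
}

my $node = { tag => 'foo', attributes => { a => 'x' } };
my $attr = MyAdHocForester->autoload_attribute('a');
print $attr->( 'MyAdHocForester', $node ), "\n";    # prints "x"
```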

is_leaf

Expects a node and an index.

Returns whether the context node is a leaf. Override this with something more efficient where available. E.g., where the node provides an is_leaf method,

sub is_leaf { $_[1]->is_leaf }

is_root

Expects a node and an index.

Returns whether the context node is the root. Delegates to index.

Override this with something more efficient where available. E.g., where the node provides an is_root method,

sub is_root { $_[1]->is_root }

has_tag

Expects a node and a string. Returns whether the node, in whatever sense is appropriate to this sort of node, "has" the string as a tag. See the required tag method.

matches_tag

Expects a node and a compiled regex. Returns whether the node, in whatever sense is appropriate to this sort of node, has a tag that matches the regex. See the required tag method.

wrap

Expects a node and possibly an options hash. Returns a node of the type understood by the forester.

If your forester must coerce things into a tree of the right type, override this method, which otherwise just passes through its second argument.

Note, if you do need to override the default wrap, you'll have to jump through a few Moose hoops. The basic pattern is

...
use Moose;
...
with 'TPath::Forester' => { -excludes => 'wrap' };
...

{
    no warnings 'redefine';
    sub wrap {
        my ($self, $node, %opts) = @_;
        return $node if blessed $node and $node->isa('MyNode');
        # coerce
        ...
    }
}

See TPath::Forester::Ref for an example.

ROLES

TPath::Attributes::Standard, TPath::TypeCheck

REQUIRED METHODS

children

Expects a node and an index. Returns the children of the node as a list.

tag

Expects a node and returns the value selectors are matched against, or undef if the node has no tag.

If your node type cannot be so easily mapped to a particular tag, you may want to override the has_tag and matches_tag methods and supply a no-op method for tag.
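
For instance, nodes that carry several labels have no single tag. The sketch below shows one way the three methods might look in that case. MultiTagForester and the labels key are hypothetical, and the class does not compose TPath::Forester so that the example can run on its own.

```perl
use strict;
use warnings;

{
    package MultiTagForester;    # hypothetical; a real one would compose TPath::Forester

    sub new { bless {}, shift }

    sub tag { return }           # no-op: there is no single tag to report

    # A node "has" a tag if the string is among its labels.
    sub has_tag {
        my ( $self, $node, $str ) = @_;
        return scalar grep { $_ eq $str } @{ $node->{labels} };
    }

    # A node matches if any of its labels matches the compiled regex.
    sub matches_tag {
        my ( $self, $node, $re ) = @_;
        return scalar grep { $_ =~ $re } @{ $node->{labels} };
    }
}

my $f    = MultiTagForester->new;
my $node = { labels => [qw(noun plural)] };
print $f->has_tag( $node, 'plural' )      ? "yes\n" : "no\n";    # prints "yes"
print $f->matches_tag( $node, qr/^verb/ ) ? "yes\n" : "no\n";    # prints "no"
```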

AUTHOR

David F. Houghton <dfhoughton@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by David F. Houghton.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.