my $this = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );
my $p = HTML::Object->new;
my $doc = $p->parse_file( $path_to_html_file ) || die( $p->error );
# Returns a list of HTML::Object::Element objects matching the selector, which is
# converted into an XPath expression
my @nodes = $doc->find( 'p' );
# or directly:
use HTML::Object::XPath;
my $xp = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );
@nodes = $xp->findnodes( $xpath, $element_object );
=head1 VERSION
v0.2.0
=head1 DESCRIPTION
This module implements the XPath engine used by L<HTML::Object::XQuery> to provide a jQuery-like interface to query the parsed DOM object.
=head1 METHODS
=head2 clear_namespaces
Clears all previously set namespace mappings.
=head2 exists
Provided with a C<path> and a C<context>, this returns true if the given path exists.
=head2 findnodes
Provided with a C<path> and an optional C<context>, this returns a list of the nodes found by C<path>, searched relative to C<context> when one is given.
In scalar context it returns an HTML::Object::XPath::NodeSet object.
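The two calling contexts can be sketched as follows. This is a minimal illustration rather than a reference: it assumes that the object returned by C<parse_file> is acceptable as the C<context> argument, and the temporary file and its markup are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use HTML::Object;
use HTML::Object::XPath;

# Write a small throwaway document so that parse_file() has something to read.
# The file and its content are illustrative only.
my( $fh, $file ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print $fh "<html><body><p>One</p><p>Two</p></body></html>\n";
close( $fh );

my $p   = HTML::Object->new;
my $doc = $p->parse_file( $file ) || die( $p->error );
my $xp  = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );

# List context: a plain list of matching nodes.
# Assumption: the parsed document object can serve as the context.
my @nodes = $xp->findnodes( '//p', $doc );

# Scalar context: a node set object, whose size() reports the number of matches.
my $set = $xp->findnodes( '//p', $doc );

printf( "%d node(s) in list context, %d in the node set\n", scalar( @nodes ), $set->size );
```

Use list context when you only need to iterate over the nodes, and scalar context when you want to keep the node set object itself.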
=head2 findnodes_as_string
Provided with a C<path> and a C<context>, this returns the nodes found as a single string. The result is not guaranteed to be valid HTML, though; it could, for example, be plain text if the query returns attribute values.
=head2 findnodes_as_strings
Provided with a C<path> and a C<context>, this returns the nodes found as a list of strings, one per node found.
=head2 findvalue
Provided with a C<path> and a C<context>, this returns the result as a string (the concatenation of the values of the result nodes).
=head2 findvalues
Provided with a C<path> and a C<context>, this returns the values of the result nodes as a list of strings.
=head2 matches($node, $path, $context)
Provided with a C<node> L<object|HTML::Object::Element>, a C<path> and a C<context>, this returns true if the node matches the path.
=head2 find
Provided with a C<path> and a C<context>, this returns either an L<HTML::Object::XPath::NodeSet> object containing the nodes it found (empty if no nodes matched the path), or one of L<HTML::Object::XPath::Literal> (a string), L<HTML::Object::XPath::Number>, or L<HTML::Object::XPath::Boolean>. It should always return something, and you can use C<< ->isa() >> to find out what it returned. To check how many nodes it found, call C<< $nodeset->size >>.
See L<HTML::Object::XPath::NodeSet>.
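Checking the return type with C<< ->isa() >> might look like the sketch below. The helper C<result_kind> is hypothetical and not part of this module, and the objects are blessed by hand as stand-ins so that the example runs without a parsed document.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical helper (not part of HTML::Object::XPath): name the kind of
# value that find() returned, using the four classes listed above.
sub result_kind
{
    my $result = shift;
    return 'unknown' unless blessed( $result );
    return 'nodeset' if $result->isa( 'HTML::Object::XPath::NodeSet' );
    return 'literal' if $result->isa( 'HTML::Object::XPath::Literal' );
    return 'number'  if $result->isa( 'HTML::Object::XPath::Number' );
    return 'boolean' if $result->isa( 'HTML::Object::XPath::Boolean' );
    return 'unknown';
}

# Stand-ins blessed by hand; real code would instead pass the return value
# of $xp->find( $path, $context ).
my $fake_set    = bless( [], 'HTML::Object::XPath::NodeSet' );
my $count       = 3;
my $fake_number = bless( \$count, 'HTML::Object::XPath::Number' );

print result_kind( $fake_set ), "\n";       # prints "nodeset"
print result_kind( $fake_number ), "\n";    # prints "number"
print result_kind( 'plain string' ), "\n";  # prints "unknown"
```

Branching on the class first keeps calls such as C<< $nodeset->size >> from being made on a literal, number or boolean result.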
=head2 get_namespace ($prefix, $node)
Provided with a C<prefix> and a C<node> L<object|HTML::Object::Element>, this returns the URI associated with the prefix for the node (mostly for internal use).
=head2 get_var
Provided with a variable name, this returns the value of the XPath variable (mostly for internal use).
=head2 getNodeText
Provided with a C<path>, this returns the text string for a particular node. It returns a string, or C<undef> if the node does not exist.
=head2 namespaces
Sets or gets a hash reference of namespace attributes.
=head2 new_expr
Creates a new L<HTML::Object::XPath::Expr> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_function
Creates a new L<HTML::Object::XPath::Function> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_literal
Creates a new L<HTML::Object::XPath::Literal> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_location_path
Creates a new L<HTML::Object::XPath::LocationPath> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_nodeset
Creates a new L<HTML::Object::XPath::NodeSet> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_number
Creates a new L<HTML::Object::XPath::Number> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_root
Creates a new L<HTML::Object::XPath::Root> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_step
Creates a new L<HTML::Object::XPath::Step> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 new_variable
Creates a new L<HTML::Object::XPath::Variable> object, passing it whatever arguments were provided, and returns the newly instantiated object, or C<undef> upon L<error|Module::Generic/error>.
=head2 set_namespace
Provided with a C<prefix> and a C<uri>, this sets the namespace prefix mapping to the URI.
Normally in L<HTML::Object::XPath> the prefixes in XPath node tests take their context from the current node. This means that C<foo:bar> will always match an element C<< <foo:bar> >> regardless of the namespace that the prefix C<foo> is mapped to (which might even change within the document, leading to unexpected results). To make prefixes in XPath node tests map to a real URI, you need to enable that with a call to the C<set_namespace> method of your L<HTML::Object::XPath> object.
=head2 parse
Provided with an XPath expression, this returns a new L<HTML::Object::XPath::Expr> object that can then be used repeatedly.
You can create an XPath expression from a CSS selector expression using L<HTML::Selector::XPath>.
=head2 set_strict_namespaces
Takes a boolean value.
By default, for historical as well as convenience reasons, L<HTML::Object::XPath> has a slightly non-standard way of dealing with the default namespace.
If you search for C<//tag>, it will return C<tag> elements. As far as I understand it, if the document has a default namespace, this should not return anything. You would have to first call C<set_namespace>, and then search using the namespace.
Passing a true value to C<set_strict_namespaces> activates this stricter behaviour; passing a false value restores the default behaviour.
=head2 set_var
Provided with a variable name and its value, this sets an XPath variable (that can be used in queries as C<$var>).
=head1 NODE STRUCTURE
All nodes have the same first 2 entries in the array: node_parent and node_pos. The type of the node is determined using the ref() function.
The node_parent always contains an entry for the parent of the current node - except for the root node which has undef in there. And node_pos is the position of this node in the array that it is in (think: $node == $node->[node_parent]->[node_children]->[$node->[node_pos]] )
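The positional relationship described above can be sketched with plain array references. The index constants below are an assumption inferred from the order in which the entries are listed in this section, and the tag names are illustrative only.

```perl
use strict;
use warnings;

# Index constants: the names mirror the entries described in this section;
# the numeric values are assumed from the order in which they are listed.
use constant {
    node_parent   => 0,
    node_pos      => 1,
    node_prefix   => 2,
    node_children => 3,
    node_name     => 4,
};

# Root node: no parent, no position, no prefix, and a list of children.
my $root = [ undef, undef, undef, [] ];

# Append two element nodes, recording each one's position among its siblings.
for my $tag ( qw( head body ) )
{
    my $pos  = scalar( @{ $root->[node_children] } );
    my $node = [ $root, $pos, undef, [], $tag, [], [] ];
    push( @{ $root->[node_children] }, $node );
}

my $body = $root->[node_children]->[1];

# A node is found at index node_pos in its parent's list of children.
my $same = ( $body == $body->[node_parent]->[node_children]->[ $body->[node_pos] ] );

# The root node is recognised by its undefined parent.
my $is_root = !defined( $root->[node_parent] );

print "body is at its recorded position: ", ( $same ? 'yes' : 'no' ), "\n";
print "root has no parent: ", ( $is_root ? 'yes' : 'no' ), "\n";
```

Note that parent and child hold references to each other here, so a long-running program built this way would need to break or weaken those links to free the memory.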
Nodes are structured as follows:
=head2 Root Node
The L<root node|HTML::Object::Root> is just an element node with no parent.
[
undef, # node_parent - check for undef to identify root node
undef, # node_pos
undef, # node_prefix
[ ... ], # node_children (see below)
]
=head2 L<Element|HTML::Object::Element> Node
[
$parent, # node_parent
<position in current array>, # node_pos
'xxx', # node_prefix - namespace prefix on this element
[ ... ], # node_children
'yyy', # node_name - element tag name
[ ... ], # node_attribs - attributes on this element
[ ... ], # node_namespaces - namespaces currently in scope
]
=head2 L<Attribute|HTML::Object::Attribute> Node
[
$parent, # node_parent - the element node
<position in current array>, # node_pos
'xxx', # node_prefix - namespace prefix on this element