NAME

Web::Query - Yet another scraping library like jQuery

SYNOPSIS

use Web::Query;

wq('http://google.com/search?q=foobar')
      ->find('h2')
      ->each(sub {
            my $i = shift;
            printf("%d) %s\n", $i+1, $_->text);
      });

DESCRIPTION

Web::Query is yet another scraping framework, with a jQuery-like interface.

Yes, I know ingy's pQuery. But it's only alpha quality, and it doesn't work. Web::Query is built on top of the CPAN modules HTML::TreeBuilder::XPath, LWP::UserAgent, and HTML::Selector::XPath.

Because this module uses HTML::Selector::XPath, it supports only the CSS 3 selectors supported by that module. Web::Query doesn't support jQuery's extended queries (yet?).

THIS LIBRARY IS UNDER DEVELOPMENT. ANY API MAY CHANGE WITHOUT NOTICE.

FUNCTIONS

wq($stuff)

This is a shortcut for Web::Query->new($stuff). This function is exported by default.

METHODS

my $q = Web::Query->new($stuff, \%options )

Creates a new instance of Web::Query. $stuff can be a URL string (http, https, or file scheme), an HTML string, a URI object, or an instance of HTML::Element.

This method throws an exception if $stuff is of an unknown type.

This method returns an undefined value if $stuff is a URL and the response is not successful.

Currently, the only valid option is indent, which is used as the indentation string when the object is printed.
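As a minimal sketch of the constructor rules above, the following builds an object from an HTML string and checks that an unrecognized input raises an exception. The HTML fragment and the rejected string are made up for illustration.

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical HTML fragment, used only for illustration.
my $html = '<div><h2>First</h2><h2>Second</h2></div>';

# wq() is the exported shortcut for Web::Query->new().
my $q     = wq($html);
my $count = $q->find('h2')->size;
print "h2 count: $count\n";

# A string that is not a URL, not markup, and not an existing file
# is expected to make new() die, so wrap the call in eval.
my $err;
eval { Web::Query->new('not a url, markup, or file'); 1 }
    or $err = $@ || 'unknown error';
print "unknown input rejected\n" if defined $err;
```

Wrapping the call in eval keeps a bad input from ending the whole script.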

my $q = Web::Query->new_from_element($element: HTML::Element)

Creates a new instance of Web::Query from an instance of HTML::Element.

my $q = Web::Query->new_from_html($html: Str)

Creates a new instance of Web::Query from an HTML string.

my $q = Web::Query->new_from_url($url: Str)

Creates a new instance of Web::Query from a URL.

If the response is not successful (that is, the status code does not match /^20[0-9]$/), this method returns an undefined value.

The last response is available in $Web::Query::RESPONSE.

Here is the recommended usage:

my $url = 'http://example.com/';
my $q = Web::Query->new_from_url($url)
    or die "Cannot get a resource from $url: " . Web::Query->last_response()->status_line;

my $q = Web::Query->new_from_file($file_name: Str)

Creates a new instance of Web::Query from a file name.

my @html = $q->html();
my $html = $q->html();
$q->html('<p>foo</p>');

Get/Set the innerHTML.

$q->as_html();

Return the elements associated with the object as strings. If called in a scalar context, only return the string representation of the first element.

my @text = $q->text();
my $text = $q->text();
$q->text('text');

Get/Set the inner text.
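The three call forms above can be sketched as follows. The list is made up, and the sketch reads the signatures to mean that list context returns one string per matched element while scalar context returns only the first.

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical list, used only for illustration.
my $items = wq('<ul><li>one</li><li>two</li></ul>')->find('li');

my @all   = $items->text;   # list context: one string per element
my $first = $items->text;   # scalar context: text of the first element

# Passing an argument replaces the text of every matched element.
$items->text('changed');
my @after = $items->text;

print "all: @all\n";
print "first: $first\n";
print "after: @after\n";
```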

my $attr = $q->attr($name);
$q->attr($name, $val);

Get/Set the attribute value in element.
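A short sketch of reading and writing attributes, using made-up links. The rel value set here is only an example.

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical markup, used only for illustration.
my $links = wq('<div><a href="/a">A</a><a href="/b">B</a></div>')->find('a');

# Getter: in scalar context this reads the attribute of the first element.
my $href = $links->attr('href');

# Setter: applies the value to every matched element.
$links->attr('rel', 'nofollow');
my @rels = $links->attr('rel');

print "first href: $href\n";
print "rel values: @rels\n";
```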

$q = $q->find($selector)

This method finds nodes matching $selector within $q. $selector is a CSS3 selector.

$q->each(sub { my ($i, $elem) = @_; ... })

Visits each node. $i is a zero-based counter, and $elem is the current item. $_ is localized to $elem.
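For instance, the callback arguments might be used like this. The markup and the @lines array are illustrative only.

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical markup, used only for illustration.
my $q = wq('<ol><li>alpha</li><li>beta</li></ol>');

my @lines;
$q->find('li')->each(sub {
    my ($i, $elem) = @_;
    # $elem wraps the current node; $_ refers to the same item.
    push @lines, sprintf('%d) %s', $i + 1, scalar $elem->text);
});

print "$_\n" for @lines;
```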

$q->map(sub { my ($i, $elem) = @_; ... })

Creates a new array with the results of calling a provided function on every element.
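A small sketch of collecting values with map. It dereferences the return value on the assumption that map hands back an array reference; the markup is made up.

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical markup, used only for illustration.
my $items = wq('<ul><li>a</li><li>b</li><li>c</li></ul>')->find('li');

# Assumption: map() returns an array reference, so dereference it.
my @upper = @{ $items->map(sub {
    my ($i, $elem) = @_;
    return uc $elem->text;   # uc forces scalar context on text()
}) };

print "@upper\n";
```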

$q->filter(sub { my ($i, $elem) = @_; ... })

Reduce the elements to those that pass the function's test.
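As an illustration, the following keeps only the list items whose text is an even number. The markup is made up.

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical markup, used only for illustration.
my $items = wq('<ul><li>1</li><li>2</li><li>3</li><li>4</li></ul>')->find('li');

# The callback returns true for elements that should be kept.
my $even = $items->filter(sub {
    my ($i, $elem) = @_;
    return $elem->text % 2 == 0;
});

my @kept = $even->text;
print "kept: @kept\n";
```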

$q->end()

Returns to the previous context, like jQuery's end().
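A sketch of how end() allows a chain to step back after find(), using made-up markup:

```perl
use strict;
use warnings;
use Web::Query;

# Hypothetical markup, used only for illustration.
my $q = wq('<div><p>one</p><span>two</span></div>');

# find('p') narrows the selection; end() steps back to $q,
# so the second find() searches the whole <div> again.
my $span_text = $q->find('p')->end->find('span')->text;

print "span text: $span_text\n";
```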

my $size = $q->size() : Int

Return the number of DOM elements matched by the Web::Query object.

my $parent = $q->parent() : Web::Query

Return the parent node from $q.

my $first = $q->first()

Return the first matching element.

This method constructs a new Web::Query object from the first matching element.

my $last = $q->last()

Return the last matching element.

This method constructs a new Web::Query object from the last matching element.

$q->remove()

Delete the elements associated with the object from the DOM.

# remove all <blink> tags from the document
$q->find('blink')->remove;

$q->replace_with( $replacement );

Replace the elements of the object with the provided replacement. The replacement can be a string, a Web::Query object, or an anonymous function. The anonymous function is passed the index of the current node and the node itself (which is also localized as $_).

my $q = wq( '<p><b>Abra</b><i>cada</i><u>bra</u></p>' );

$q->find('b')->replace_with('<a>Ocus</a>');
    # <p><a>Ocus</a><i>cada</i><u>bra</u></p>

$q->find('u')->replace_with($q->find('b'));
    # <p><i>cada</i><b>Abra</b></p>

$q->find('i')->replace_with(sub{ 
    my $name = $_->text;
    return "<$name></$name>";
});
    # <p><b>Abra</b><cada></cada><u>bra</u></p>

HOW DO I CUSTOMIZE USER AGENT?

You can specify your own instance of LWP::UserAgent.

$Web::Query::UserAgent = LWP::UserAgent->new( agent => 'Mozilla/5.0' );
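A slightly fuller sketch, assuming you also want a request timeout. The agent string and the 10-second value are arbitrary examples; agent and timeout are standard LWP::UserAgent constructor options.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Web::Query;

# Hypothetical agent string and timeout, chosen only for illustration.
$Web::Query::UserAgent = LWP::UserAgent->new(
    agent   => 'MyScraper/0.1',
    timeout => 10,    # seconds
);

print 'agent: ',   $Web::Query::UserAgent->agent,   "\n";
print 'timeout: ', $Web::Query::UserAgent->timeout, "\n";
```

No request is made until a URL is passed to wq() or new_from_url().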

INCOMPATIBLE CHANGES

0.10

new_from_url() no longer throws an exception on a bad response from the HTTP server.

AUTHOR

Tokuhiro Matsuno <tokuhirom AAJKLFJEF@ GMAIL COM>

SEE ALSO

pQuery

LICENSE

Copyright (C) Tokuhiro Matsuno

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.