NAME
Web::Query - Yet another scraping library like jQuery
SYNOPSIS
use Web::Query;
wq('http://google.com/search?q=foobar')
->find('h2')
->each(sub {
my $i = shift;
printf("%d) %s\n", $i+1, $_->text
});
DESCRIPTION
Web::Query is a yet another scraping framework, have a jQuery like interface.
Yes, I know Ingy's pQuery. But it's just a alpha quality. It doesn't works. Web::Query built at top of the CPAN modules, HTML::TreeBuilder::XPath, LWP::UserAgent, and HTML::Selector::XPath.
So, this module uses HTML::Selector::XPath and only supports the CSS 3 selector supported by that module. Web::Query doesn't support jQuery's extended queries(yet?).
THIS LIBRARY IS UNDER DEVELOPMENT. ANY API MAY CHANGE WITHOUT NOTICE.
FUNCTIONS
METHODS
- my $q = Web::Query->new($stuff, \%options )
-
Create new instance of Web::Query. You can make the instance from URL(http, https, file scheme), HTML in string, URL in string, URI object, and instance of HTML::Element.
This method throw the exception on unknown $stuff.
This method returns undefined value on non-successful response with URL.
Currently, the only option valid option is indent, which will be used as the indentation string if the object is printed.
- my $q = Web::Query->new_from_element($element: HTML::Element)
-
Create new instance of Web::Query from instance of HTML::Element.
- my $q = Web::Query->new_from_html($html: Str)
-
Create new instance of Web::Query from HTML.
- my $q = Web::Query->new_from_url($url: Str)
-
Create new instance of Web::Query from URL.
If the response is not success(It means /^20[0-9]$/), this method returns undefined value.
You can get a last result of response, use the
$Web::Query::RESPONSE
.Here is a best practical code:
my $url = 'http://example.com/'; my $q = Web::Query->new_from_url($url) or die "Cannot get a resource from $url: " . Web::Query->last_response()->status_line;
- my $q = Web::Query->new_from_file($file_name: Str)
-
Create new instance of Web::Query from file name.
- my @html = $q->html();
- my $html = $q->html();
- $q->html('<p>foo</p>');
-
Get/Set the innerHTML.
- $q->as_html();
-
Return the elements associated with the object as strings. If called in a scalar context, only return the string representation of the first element.
- my @text = $q->text();
- my $text = $q->text();
- $q->text('text');
-
Get/Set the inner text.
- my $attr = $q->attr($name);
- $q->attr($name, $val);
-
Get/Set the attribute value in element.
- $q = $q->find($selector)
-
This method find nodes by $selector from $q. $selector is a CSS3 selector.
- $q->each(sub { my ($i, $elem) = @_; ... })
-
Visit each nodes.
$i
is a counter value, 0 origin.$elem
is iteration item.$_
is localized by$elem
. - $q->map(sub { my ($i, $elem) = @_; ... })
-
Creates a new array with the results of calling a provided function on every element.
- $q->filter(sub { my ($i, $elem) = @_; ... })
-
Reduce the elements to those that pass the function's test.
- $q->end()
-
Back to the before context like jQuery.
- my $size = $q->size() : Int
-
Return the number of DOM elements matched by the Web::Query object.
- my $parent = $q->parent() : Web::Query
-
Return the parent node from
$q
. - my $first = $q->first()
-
Return the first matching element.
This method constructs a new Web::Query object from the first matching element.
- my $last = $q->last()
-
Return the last matching element.
This method constructs a new Web::Query object from the last matching element.
- $q->remove()
-
Delete the elements associated with the object from the DOM.
# remove all <blink> tags from the document $q->find('blink')->remove;
- $q->replace_with( $replacement );
-
Replace the elements of the object with the provided replacement. The replacement can be a string, a
Web::Query
object or an anonymous function. The anonymous function is passed the index of the current node and the node itself (with is also localized as$_
).my $q = wq( '<p><b>Abra</b><i>cada</i><u>bra</u></p>' ); $q->find('b')->replace_with('<a>Ocus</a>); # <p><a>Ocus</a><i>cada</i><u>bra</u></p> $q->find('u')->replace_with($q->find('b')); # <p><i>cada</i><b>Abra</b></p> $q->find('i')->replace_with(sub{ my $name = $_->text; return "<$name></$name>"; }); # <p><b>Abra</b><cada></cada><u>bra</u></p>
HOW DO I CUSTOMIZE USER AGENT?
You can specify your own instance of LWP::UserAgent.
$Web::Query::UserAgent = LWP::UserAgent->new( agent => 'Mozilla/5.0' );
INCOMPATIBLE CHANGES
AUTHOR
Tokuhiro Matsuno <tokuhirom AAJKLFJEF@ GMAIL COM>
SEE ALSO
LICENSE
Copyright (C) Tokuhiro Matsuno
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.