Revision history for Perl extension Web::Scraper
0.12 Thu Aug 30 02:39:44 PDT 2007
- Added 's' command to scraper to get the HTML source
- You can use $tree variable to deal with the HTML::Element object in scraper shell
- Give a graceful error message if the given Selector/XPath doesn't compile
- Give a better error when number of args in process() seems wrong
0.11 Tue Aug 28 02:50:01 PDT 2007
- Supported hash-reference in process values, like
process "a", "people[]", { link => '@href', name => 'TEXT' };
See t/09-process_hash.t for its usage.
0.10 Mon Aug 27 00:53:51 PDT 2007
- result now returns the entire stash if called without keys
- added bin/scraper CLI
0.09 Wed Aug 15 10:51:14 PDT 2007
- remove Devel::Leak use from tests
0.08 Tue Aug 14 13:25:16 PDT 2007
- Call $tree->delete after the callback to avoid memory leaks by TreeBuilder.
(Thanks to k.daiba for the report)
0.07 Sat May 12 16:23:51 PDT 2007
- Updated dependencies for HTML::TreeBuilder::XPath
0.06 Sat May 12 15:47:27 PDT 2007
- Now don't use decoded_content to work with new H::R::Encoding
0.05 Wed May 9 18:21:22 PDT 2007
- Added (less DSL-ish) Web::Scraper->define(sub { ... }) syntax
- Fixed bug where the module dies if there's no encoding found in HTTP response headers
- Added more examples in eg/
- When we get value using callback, pass HTML::Element object as $_, in addition to $_[0]
(Suggested by Matt S. Trout)
- If the expression (1st argument to process()) starts with "/", it's
treated as a direct XPath and no Selector-to-XPath conversion is done.
0.04 Wed May 9 00:55:32 PDT 2007
- *API CHANGE* Now scraper {} returns Web::Scraper object and not closure.
You should call ->scrape() to get the response back.
(Suggested by Marcus Ramberg)
I loved the code returning closure, but this is more compatible to
scrapi.rb API and hopefully less confusing to people.
0.03 Tue May 8 23:04:13 PDT 2007
- use 'TEXT' rather than 'content' to grab text from element
to be more compatible with scrapi
- Added unit tests using Test::Base
- Refactored internal code for easier reading
- chained callbacks are now passed HTML::Element, not HTML, to avoid double HTML parsing
- Implemented callbacks (iterator) API
- Added 'process_first' to be compatible with scrapi
0.02 Tue May 8 20:03:37 PDT 2007
- Added dependencies to Makefile.PL
0.01 Tue May 8 04:05:59 2007
- original version