NAME
untemplate - analyze several HTML documents based on the same template
VERSION
version 0.004
SYNOPSIS
untemplate [options] HTML1 HTML2 [HTML3] [...]
DESCRIPTION
Takes multiple HTML documents generated from the same template and attempts to extract only the data that was inserted into the original template.
Accepts URLs if AnyEvent::Net::Curl::Queued is installed.
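The idea described above can be illustrated with a minimal Python sketch. This is not the algorithm untemplate uses; it is a simplified stand-in that assumes well-formed HTML, keys each text node by a positional path, and keeps only the paths whose text differs between documents. The names TextByPath, text_by_path and varying_text are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TextByPath(HTMLParser):
    """Collect non-empty text nodes keyed by a simple positional path.

    Assumes properly nested markup; mismatched end tags are not repaired.
    """

    def __init__(self):
        super().__init__()
        self.stack = []     # current path steps, e.g. ["html[1]", "body[1]"]
        self.counts = [{}]  # per-depth counters of child tag names
        self.texts = {}     # path -> concatenated text found at that path

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        seen = self.counts[-1]
        seen[tag] = seen.get(tag, 0) + 1
        self.stack.append(f"{tag}[{seen[tag]}]")
        self.counts.append({})

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        self.stack.pop()
        self.counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/" + "/".join(self.stack)
            self.texts[path] = self.texts.get(path, "") + text


def text_by_path(html):
    """Parse one document and return its path -> text mapping."""
    parser = TextByPath()
    parser.feed(html)
    parser.close()
    return parser.texts


def varying_text(*documents):
    """Return the paths present in every document whose text is not identical."""
    maps = [text_by_path(doc) for doc in documents]
    shared = set(maps[0]).intersection(*maps[1:])
    return {
        path: [m[path] for m in maps]
        for path in sorted(shared)
        if len({m[path] for m in maps}) > 1
    }


page_a = "<html><body><h1>Quotes</h1><p>first quote</p></body></html>"
page_b = "<html><body><h1>Quotes</h1><p>second quote</p></body></html>"
result = varying_text(page_a, page_b)
print(result)
```

In this sketch the shared heading is treated as template text and dropped, while the paragraph that differs between the two pages is reported as data.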
OPTIONS
- --help
Show this help message.
- --[no]color
Enable syntax highlighting for XPath. By default, it is enabled automatically on interactive terminals.
- --[no]strict
Strict mode disables grouping by id, class or name attributes. The grouping is enabled by default.
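One way to picture the difference between grouped and strict addressing is the Python sketch below. It is an assumption about what grouping by id, class or name could look like in an XPath-style step, not a description of the XPath untemplate actually emits. The function path_step and the constant GROUPING_ATTRIBUTES are hypothetical names for this illustration.

```python
# Attributes that the option description names as grouping keys.
GROUPING_ATTRIBUTES = ("id", "class", "name")


def path_step(tag, attrs, position, strict=False):
    """Build one XPath-like step for an element.

    With strict=False, the first non-empty grouping attribute identifies
    the element; otherwise the element falls back to its position among
    same-named siblings.
    """
    if not strict:
        for key in GROUPING_ATTRIBUTES:
            value = attrs.get(key)
            if value:
                return f'{tag}[@{key}="{value}"]'
    return f"{tag}[{position}]"


grouped = path_step("div", {"class": "quote"}, 3)
positional = path_step("div", {"class": "quote"}, 3, strict=True)
print(grouped)
print(positional)
```

Under this reading, grouped steps stay stable when sibling elements are added or removed, while strict steps depend only on document position.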
EXAMPLES
untemplate --color http://bash.org/?1839 http://bash.org/?2486 | less -R
CAVEATS
If the HTML documents are not based on the same template, the results will be empty.
Unfortunately, employing any kind of document identifier as part of element class/id (common practice in WordPress themes) is enough to constitute "not same template".
AUTHOR
Stanislaw Pusep <stas@sysd.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Stanislaw Pusep.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.