NAME
scrape.pl - simple HTML scraping from the command line
ABSTRACT
This is a simple program to extract data from HTML by specifying CSS3 or XPath selectors.
SYNOPSIS
scrape.pl URL selector selector ...
# Print page title
scrape.pl http://perl.org title
# The Perl Programming Language - www.perl.org
# Print links with titles, make links absolute
scrape.pl http://perl.org a //a/@href --uri=2
# Print all links to JPG images, make links absolute
scrape.pl http://perl.org 'a[href$="jpg"]'
DESCRIPTION
This program fetches an HTML page and extracts nodes matched by XPath or CSS selectors from it.
If URL is -, input will be read from STDIN.
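Reading from STDIN can be tried without network access. The following is only a sketch: the sample page and the temporary file are made up for illustration, and the scrape.pl call is skipped when the program is not on the PATH.

```shell
# Write a small sample page to a temporary file (the contents are invented).
page=$(mktemp)
cat > "$page" <<'EOF'
<html><head><title>Sample page</title></head>
<body><a href="/about.html">About</a></body></html>
EOF

# Pass "-" as the URL so the HTML is read from STDIN instead of being fetched.
if command -v scrape.pl >/dev/null 2>&1; then
    scrape.pl - title < "$page"
fi
```

The same form should work at the end of a pipeline, with any command that writes HTML to its standard output in place of the temporary file.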
OPTIONS
- --sep
Separator character to use for columns. Default is tab.
- --uri COLUMNS
Numbers of the columns to convert into absolute URIs, if the automatic conversion of known attributes does not cover everything you want.
- --no-uri
Switches off the automatic translation to absolute URIs for known attributes like href and src.
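Because columns are separated by tabs by default, the output can be passed to standard text tools. The sketch below assumes that each selector yields one column and each match one row; the sample rows are invented and stand in for real scrape.pl output.

```shell
# Sample rows standing in for the output of something like
#   scrape.pl http://perl.org a //a/@href --uri=2
# (link text in column 1, absolute URL in column 2; the rows are invented).
rows=$(printf 'About\thttp://perl.org/about.html\nDocs\thttp://perl.org/docs.html\n')

# Keep only the second column (the URLs); cut splits on tabs by default.
urls=$(printf '%s\n' "$rows" | cut -f2)
printf '%s\n' "$urls"
```

If a different separator is set with --sep, the downstream tool has to be told about it as well, for example with the -d option of cut.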
REPOSITORY
The public repository of this module is http://github.com/Corion/App-scrape.
SUPPORT
The public support forum of this program is http://perlmonks.org/.
AUTHOR
Max Maischein corion@cpan.org
COPYRIGHT (c)
Copyright 2011-2011 by Max Maischein corion@cpan.org.
LICENSE
This module is released under the same terms as Perl itself.