NAME
scrape2rss.pl - extract information as RSS (well, Atom) feed
ABSTRACT
This is a simple program to extract data from HTML by specifying CSS3 or XPath selectors.
SYNOPSIS
    scrape2rss.pl URL OPTIONS

    scrape2rss.pl \
        http://conferences.yapceurope.org/gpw2011/news \
        --feed-title "GPW 2011 Atom Feed" \
        --title "h3 a" \
        --summary "h3+p+p" \
        --permalink "h3 a@href" \
        --date "h3+p em" \
        --date-fmt "%d/%m/%y %H:%M" \
        -o gpw2011.de.atom
DESCRIPTION
This program fetches an HTML page and creates an RSS feed from it. The elements that are turned into the RSS feed are specified as CSS or XPath selectors.
If the URL is -, input will be read from STDIN.
OPTIONS
- --title
    Selector for the entry title.
- --summary
    Selector for the entry summary.
- --permalink
    Selector for the entry permalink.
- --pages
    Selector for the pagination links to follow.
- --date
    Selector for the entry publication date.
- --date-fmt
    strptime-style format (for example "%d/%m/%y %H:%M") that the entry publication date is written in. It is used to convert the date into a proper Atom timestamp.
- --outfile
    Name of the output file. Default is STDOUT.
- --debug
    Output information in clear text.
REPOSITORY
The public repository of this module is http://github.com/Corion/App-scrape.
SUPPORT
The public support forum of this program is http://perlmonks.org/.
AUTHOR
Max Maischein corion@cpan.org
COPYRIGHT (c)
Copyright 2011-2011 by Max Maischein corion@cpan.org.
LICENSE
This module is released under the same terms as Perl itself.