NAME
XML::RSS::Parser - A liberal parser for RSS feeds.
SYNOPSIS
#!/usr/bin/perl -w
use strict;
use XML::RSS::Parser;
use URI;
use LWP::UserAgent;
use Data::Dumper;
my $ua = LWP::UserAgent->new;
$ua->agent('XML::RSS::Parser Test Script');
my @places=( 'http://www.mplode.com/tima/xml/index.xml' );
my $p = XML::RSS::Parser->new;
foreach my $place ( @places ) {
# retrieve feed
my $url=URI->new($place);
my $req=HTTP::Request->new;
$req->method('GET');
$req->uri($url);
my $feed = $ua->request($req);
# parse feed
$p->parse( $feed->content );
# print feed title and items data dump to screen
print $p->channel->{ $p->ns_qualify('title', $p->rss_namespace_uri ) }."\n";
my $d = Data::Dumper->new([ $p->items ]);
print $d->Dump."\n\n";
}
DESCRIPTION
XML::RSS::Parser is a lightweight liberal parser of RSS feeds that is derived from the XML::Parser::LP module I developed for mt-rssfeed -- a MovableType plugin. This parser is "liberal" in that it does not demand compliance with a specific RSS version and will attempt to gracefully handle tags it does not expect or understand. The parser's only requirement is that the file is well-formed XML. The module is leaner than XML::RSS, since the majority of that module's code is for generating RSS files.
Your feedback and suggestions are greatly appreciated. See the TO DO section for some brief thoughts on next steps.
This module requires the XML::Parser package.
METHODS
The following methods are available:
new
Constructor for XML::RSS::Parser. Returns a reference to an XML::RSS::Parser object.
parse(source)
Inherited from XML::Parser. The SOURCE parameter should be either an open IO::Handle or a string containing the whole XML document. Dies if a parse error occurs; otherwise returns 1.
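Because parse dies on a parse error, callers that process untrusted feeds may want to trap the exception with eval. The following is a minimal sketch of that pattern, not part of the module: it uses XML::Parser directly so that it stays self-contained, and parses_ok is a hypothetical helper name. The same wrapper should apply to an XML::RSS::Parser object, assuming it behaves as described above.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns 1 if the document is well-formed XML,
# 0 if the parser died while reading it.
sub parses_ok {
    my ($xml) = @_;
    my $parser = XML::Parser->new;
    # eval traps the die raised on a parse error; the trailing 1
    # guarantees a true value when parsing completes normally.
    my $ok = eval { $parser->parse($xml); 1 };
    return $ok ? 1 : 0;
}

for my $doc ( '<rss><channel><title>t</title></channel></rss>', '<rss><channel>' ) {
    print parses_ok($doc) ? "well-formed\n" : "not well-formed\n";
}
```

A fresh parser is created for each call in this sketch so that a failed parse cannot leave state behind for the next document.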
parsefile(file)
Inherited from XML::Parser. FILE is the name of a file, which is opened for reading and passed to parse. The file is closed no matter how parse returns. Dies if a parse error occurs; otherwise returns 1.
channel
Returns a HASH reference of elements found directly under the channel element. Each key is the fully namespace-qualified element name.
items
Returns a reference to an ARRAY of HASH references. Each referenced hash contains the fully namespace-qualified elements found directly under an item element. The ordering of the item elements in the feed is maintained within the array.
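As a rough illustration of walking that structure, the sketch below uses a hand-built array of hash references as a stand-in for the value returned by items. The sample data, the plain 'title' key, and the item_titles helper are all assumptions made for the example; with a real parser object the key would come from $p->ns_qualify('title', $p->rss_namespace_uri), as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the structure returned by $p->items:
# an array reference holding one hash reference per item, in feed order.
my $items = [
    { title => 'First post',  link => 'http://example.com/1' },
    { title => 'Second post', link => 'http://example.com/2' },
    { link  => 'http://example.com/3' },    # an item with no title
];

# Hypothetical helper: collect one element from every item, in order,
# substituting a placeholder when an item lacks that element.
sub item_titles {
    my ( $items, $key ) = @_;
    return map { defined $_->{$key} ? $_->{$key} : '(untitled)' } @$items;
}

print "$_\n" for item_titles( $items, 'title' );
```

Checking for a missing key matters here because a liberal parser accepts feeds whose items omit elements that stricter tools would require.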
image
Returns a HASH reference of elements found directly under the image element. If an image has not been defined the hash will not contain any key/value pairs.
ns_qualify(element, namespace_uri)
A simple utility method that will return a fully namespace qualified string for the supplied element.
rss_namespace_uri
A utility method for determining which namespace, if any, the RSS elements are in. This is important since different RSS namespaces are in use. Returns the default namespace if it is defined; otherwise it hunts for one based on a list of common namespace URIs. Returns a null string if a namespace cannot be determined or was not defined at all.
SEE ALSO
XML::Parser, http://feeds.archive.org/validator/, http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html, http://www.oreillynet.com/pub/a/webservices/2002/11/19/rssfeedquality.html
TO DO AND ISSUES
Add attribute handling and storage.
Add handling for SkipDays, SkipHours, textinput and rdf:items.
Implement processing switches for turning section processing on and off.
LICENSE
The software is released under the Artistic License. The terms of the Artistic License are described at http://www.perl.com/language/misc/Artistic.html.
AUTHOR & COPYRIGHT
Except where otherwise noted, XML::RSS::Parser is Copyright 2003, Timothy Appnel, tima@mplode.com. All rights reserved.