NAME

XML::RSS::Parser - A liberal parser for RSS feeds.

SYNOPSIS

#!/usr/bin/perl -w

use strict;
use XML::RSS::Parser;
use URI;
use LWP::UserAgent;
use HTTP::Request;
use Data::Dumper;

my $ua = LWP::UserAgent->new;
$ua->agent('XML::RSS::Parser Test Script');
my @places=( 'http://www.mplode.com/tima/xml/index.xml' );

my $p = XML::RSS::Parser->new;

foreach my $place ( @places ) {

	# retrieve feed
	my $url=URI->new($place);
	my $req=HTTP::Request->new;
	$req->method('GET');
	$req->uri($url);
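	my $feed = $ua->request($req);

	# skip this feed if the request failed
	unless ( $feed->is_success ) {
		warn "Could not fetch $place: ".$feed->status_line."\n";
		next;
	}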

	# parse feed
	$p->parse( $feed->content );

	# print feed title and items data dump to screen
	print $p->channel->{ $p->ns_qualify('title', $p->rss_namespace_uri ) }."\n";
	my $d = Data::Dumper->new([ $p->items ]);
	print $d->Dump."\n\n";

}

DESCRIPTION

XML::RSS::Parser is a lightweight, liberal parser of RSS feeds derived from the XML::Parser::LP module that I developed for mt-rssfeed, a MovableType plugin. The parser is "liberal" in that it does not demand compliance with a specific RSS version and attempts to gracefully handle tags it does not expect or understand. Its only requirement is that the file is well-formed XML. The module is leaner than XML::RSS, where the majority of the code is for generating RSS files.

Your feedback and suggestions are greatly appreciated. See the TO DO section for some brief thoughts on next steps.

This module requires the XML::Parser package.

METHODS

The following methods are available:

  • new

    Constructor for XML::RSS::Parser. Returns a reference to an XML::RSS::Parser object.

  • parse(source)

    Inherited from XML::Parser. The SOURCE parameter should be either an open IO::Handle or a string containing the whole XML document. Dies if a parse error occurs; otherwise returns 1.

  • parsefile(file)

    Inherited from XML::Parser. FILE is the name of a file, which is opened for reading and passed to parse. The file is closed no matter how parse returns. Dies if a parse error occurs; otherwise returns 1.

  • channel

    Returns a HASH reference of elements found directly under the channel element. Each key is the fully namespace-qualified element name.

  • items

    Returns a reference to an ARRAY of HASH references. Each hash contains the fully namespace-qualified elements found directly under an item element. The ordering of the item elements in the feed is maintained within the array.

  • image

    Returns a HASH reference of elements found directly under the image element. If an image has not been defined the hash will not contain any key/value pairs.

  • ns_qualify(element, namespace_uri)

    A simple utility method that will return a fully namespace qualified string for the supplied element.

  • rss_namespace_uri

    A utility method for determining which namespace, if any, the RSS elements are in. This is important since different RSS namespaces are in use. Returns the default namespace if it is defined; otherwise it looks for a match against a list of common namespace URIs. Returns an empty string if a namespace cannot be determined or was not defined at all.
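
The methods above can be combined roughly as follows. This is a minimal sketch, not part of the module: summarize_feed is a hypothetical helper name, and it assumes only the behavior documented above (parse dies on error, channel returns a HASH reference, items returns a reference to an ARRAY of HASH references, and keys are built with ns_qualify and rss_namespace_uri as in the SYNOPSIS).

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by XML::RSS::Parser): parses a feed
# string and returns a hash reference holding the channel title and the
# item titles, or an error message if parsing failed.
sub summarize_feed {
    my ( $p, $xml ) = @_;

    # parse() dies on a parse error, so trap the exception with eval.
    my $ok = eval { $p->parse($xml); 1 };
    return { error => $@ || 'unknown parse error' } unless $ok;

    # Build the namespace-qualified key for <title>, as in the SYNOPSIS.
    my $title_key = $p->ns_qualify( 'title', $p->rss_namespace_uri );

    return {
        title       => $p->channel->{$title_key},
        # items() returns an array reference in feed order.
        item_titles => [ map { $_->{$title_key} } @{ $p->items } ],
    };
}
```

With the module loaded, this might be called as summarize_feed( XML::RSS::Parser->new, $feed->content ), checking the returned hash for an error key before using title and item_titles.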

SEE ALSO

XML::Parser, http://feeds.archive.org/validator/, http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html, http://www.oreillynet.com/pub/a/webservices/2002/11/19/rssfeedquality.html

TO DO AND ISSUES

  • Add attribute handling and storage.

  • Add handling for SkipDays, SkipHours, textinput and rdf:items.

  • Implement processing switches for turning section processing on and off.

LICENSE

The software is released under the Artistic License. The terms of the Artistic License are described at http://www.perl.com/language/misc/Artistic.html.

AUTHOR & COPYRIGHT

Except where otherwise noted, XML::RSS::Parser is Copyright 2003, Timothy Appnel, tima@mplode.com. All rights reserved.