NAME

XML::RSS::Parser - A liberal object-oriented parser for RSS feeds.

SYNOPSIS

#!/usr/bin/perl -w

use strict;
use XML::RSS::Parser;

my $p = XML::RSS::Parser->new;
my $feed = $p->parsefile('/path/to/some/rss/file');

# output some values
my $title = XML::RSS::Parser->ns_qualify('title',$feed->rss_namespace_uri);
print $feed->channel->children($title)->value."\n";
print "item count: ".$feed->item_count()."\n\n";
foreach my $i ( $feed->items ) {
	# print every child element of the item as "name: value"
	print $_->name.": ".$_->value."\n" for $i->children;
	print "\n";
}

DESCRIPTION

XML::RSS::Parser is a lightweight liberal parser of RSS feeds that is derived from the XML::Parser::LP module that I developed for mt-rssfeed -- a Movable Type plugin. This parser is "liberal" in that it does not demand compliance with a specific RSS version and will attempt to gracefully handle tags it does not expect or understand. The parser's only requirement is that the file is well-formed XML and remotely resembles RSS. The module is leaner than XML::RSS, the majority of whose code is for generating RSS files.

Your feedback and suggestions are greatly appreciated. See the "TO DO" section for some brief thoughts on next steps.

This module requires the XML::Parser package.

SPECIAL PROCESSING NOTES

There are a number of different RSS formats in use today. These formats differ in subtle ways and are not entirely compatible with one another. To ease working with RSS data in different formats, the parser does not create the feed's parse tree verbatim. Instead it makes a few assumptions to "normalize" the parse tree into a more consistent form.

  • The parser will not include the root tags of rss or RDF in the tree. Namespace declaration information is still extracted.

  • The parser also forces channel and item into a parent-child relationship. In versions 0.9 and 1.0, channel and item tags are siblings.

  • Some more advanced feeds take advantage of the namespace extensions permitted by RSS 1.0 and 2.0 (two otherwise unrelated formats) and embed complex blocks of markup from other dialects. Two somewhat common dialects found in feeds are XHTML bodies and FOAF persons. The parser preserves these blocks as a single node in the tree for ease of handling.

    An XHTML element can be retrieved by the element name of http://www.w3.org/1999/xhtml/body.

    A FOAF person can be retrieved by the element name of http://xmlns.com/foaf/0.1/person. Some feeds use Person (capital P) -- the parser will preserve those blocks but you have to retrieve the node with the slightly different name.
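As a rough illustration of retrieving the preserved blocks described above, the sketch below walks each item's children and keeps those whose name matches one of the qualified names given in the notes. The helper preserved_blocks is hypothetical and not part of this package. It assumes that $element->name returns the fully qualified element name and that $item->children returns a list, as the SYNOPSIS suggests.

```perl
use strict;
use warnings;

# Qualified names taken from the notes above.
use constant XHTML_BODY  => 'http://www.w3.org/1999/xhtml/body';
use constant FOAF_PERSON => 'http://xmlns.com/foaf/0.1/person';
use constant FOAF_PERSON_CAP => 'http://xmlns.com/foaf/0.1/Person';

# Hypothetical helper (not provided by XML::RSS::Parser).
# Returns every child of every item whose name is one of @names.
# Assumption: name() yields the namespace-qualified element name.
sub preserved_blocks {
    my ($feed, @names) = @_;
    my %wanted = map { $_ => 1 } @names;
    my @found;
    for my $item ($feed->items) {
        push @found, grep { $wanted{ $_->name } } $item->children;
    }
    return @found;
}
```

With a parsed feed in $feed, preserved_blocks($feed, FOAF_PERSON, FOAF_PERSON_CAP) would collect FOAF persons under either spelling, and preserved_blocks($feed, XHTML_BODY) would collect XHTML bodies.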

METHODS

The following objects and methods are provided in this package.

XML::RSS::Parser->new

Constructor. Returns a reference to a new XML::RSS::Parser object.

$parser->parse(source)

Inherited from XML::Parser. The SOURCE parameter should be either an open IO::Handle or a string containing the whole XML document. The method dies if a parse error occurs; otherwise it returns an XML::RSS::Parser::Feed object.

$parser->parsefile(file)

Inherited from XML::Parser. FILE is the name of a file, which is opened for reading and then passed to parse. The file is closed no matter how parse returns. The method dies if a parse error occurs; otherwise it returns an XML::RSS::Parser::Feed object.
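Because both parse and parsefile die on a parse error, callers handling untrusted feeds will usually want to trap the exception. The following is a minimal sketch of one way to do that with eval. The helper name parse_or_warn is hypothetical and not part of this package, and it accepts any object that provides a parse method.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by XML::RSS::Parser).
# Returns the feed object on success, or nothing after warning on failure.
sub parse_or_warn {
    my ($parser, $source) = @_;

    # eval traps the die raised on a parse error.
    my $feed = eval { $parser->parse($source) };
    unless (defined $feed) {
        my $err = $@ || "unknown error\n";
        warn "RSS parse failed: $err";
        return;
    }
    return $feed;
}
```

With this module it would be called as parse_or_warn(XML::RSS::Parser->new, $xml_string), where $xml_string holds the whole document.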

XML::RSS::Parser->ns_qualify(element, namespace_uri)

A simple utility method, called as a class method, that returns a fully namespace-qualified string for the supplied element.
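To make the shape of a qualified name concrete, here is a small stand-in written for illustration only. It is not the module's own code. The function qualify_name is hypothetical, and its behavior (namespace URI, a "/" separator if the URI lacks a trailing one, then the element name) is an assumption inferred from the element names quoted in SPECIAL PROCESSING NOTES.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ns_qualify; NOT the module's implementation.
# Assumption: qualified name = namespace URI + "/" (if missing) + element.
sub qualify_name {
    my ($element, $namespace_uri) = @_;

    # With no namespace there is nothing to qualify.
    return $element unless defined $namespace_uri && length $namespace_uri;

    $namespace_uri .= '/' unless $namespace_uri =~ m{/\z};
    return $namespace_uri . $element;
}

print qualify_name('body',   'http://www.w3.org/1999/xhtml'), "\n";
print qualify_name('person', 'http://xmlns.com/foaf/0.1/'),   "\n";
```

Under that assumption the two calls yield the XHTML body and FOAF person names given earlier in this document.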

DEPENDENCIES

XML::Parser

SEE ALSO

XML::RSS::Parser::Element, XML::RSS::Parser::Feed, XML::Parser, XML::SimpleObject

The Feed Validator http://www.feedvalidator.org/

What is RSS? http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html

Raising the Bar on RSS Feed Quality http://www.oreillynet.com/pub/a/webservices/2002/11/19/rssfeedquality.html

TO DO

  • Abstraction layer for handling overlapping elements found throughout the various RSS formats.

  • Add simple XPath matching capabilities to the package.

  • Parser collects a lot of unnecessary whitespace. Keep or filter? Filter what?

  • Add method for adding more blocks to preserve.

LICENSE

The software is released under the Artistic License. The terms of the Artistic License are described at http://www.perl.com/language/misc/Artistic.html.

AUTHOR & COPYRIGHT

Except where otherwise noted, XML::RSS::Parser is Copyright 2003-4, Timothy Appnel, cpan@timaoutloud.org. All rights reserved.
