NAME
XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser
SYNOPSIS
use XML::RSSLite;
parseRSS(\%result, \$content);
print "=== Channel ===\n",
      "Title: $result{'title'}\n",
      "Desc:  $result{'description'}\n",
      "Link:  $result{'link'}\n\n";
foreach my $item (@{$result{'items'}}) {
  print "  --- Item ---\n",
        "  Title: $item->{'title'}\n",
        "  Desc:  $item->{'description'}\n",
        "  Link:  $item->{'link'}\n\n";
}
DESCRIPTION
This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing a valid RSS file. This means you get the basic title, description, and link for a channel and its items.
This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the output for best results. The munging includes:
- Remove leading whitespace from URIs
- By default, strip all characters except 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s
- Use misplaced URLs in <title> when <link> is empty
- Extract links from <a href=...> if required
- Limit links to ftp and http(s)
- Join relative item URLs (beginning with / or #) to the site base
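The link-related steps above can be sketched in plain Perl. This is an illustration only, not the module's actual code: tidy_link and its three arguments are hypothetical names chosen for the example, the regular expressions are assumptions about what each step might look like, and the character-stripping step is left out.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of XML::RSSLite): tidy one item link,
# given the item's title and the channel's base URL.
sub tidy_link {
    my ($link, $title, $base) = @_;
    $link = '' unless defined $link;

    # Remove leading whitespace from the URI.
    $link =~ s/^\s+//;

    # Use a URL misplaced in the title when the link is empty.
    if ($link eq '' && defined $title
        && $title =~ m{((?:https?|ftp)://\S+)}i) {
        $link = $1;
    }

    # Extract the target from <a href=...> markup if required.
    if ($link =~ m{<a\s[^>]*href\s*=\s*["']?([^"'\s>]+)}i) {
        $link = $1;
    }

    # Join relative URLs (beginning with / or #) to the site base.
    if (defined $base) {
        if ($link =~ m{^/}) {
            (my $site = $base) =~ s{/+$}{};
            $link = $site . $link;
        }
        elsif ($link =~ /^#/) {
            $link = $base . $link;
        }
    }

    # Limit links to ftp and http(s); anything else is rejected.
    return $link =~ m{^(?:https?|ftp)://}i ? $link : undef;
}

# Prints "http://example.com/news/1.html"
print tidy_link('  /news/1.html', undef, 'http://example.com/'), "\n";
```

Returning undef for rejected links is a choice made for this sketch so that callers can tell "no usable link" apart from an empty string.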
 
EXPORT
- parseRSS($outHashRef, $inScalarRef, [$strip])
  - inScalarRef - required
    Reference to a scalar containing the document to be parsed. NOTE: The contents will effectively be destroyed. Make a deep copy first if you care.
  - outHashRef - required
    Reference to the hash within which to store the parsed content.
  - strip - optional
    An expression indicating the level of winnowing to be performed on the characters permitted in the results.
EXPORTABLE
- parseXML(\%parsedTree, \$parseThis, 'topTag', $comments);
 
CAVEATS
This is not a conforming parser. It does not handle the following:
- <foo bar=">">
- <foo><bar> <bar></bar> <bar></bar> </bar></foo>
- <![CDATA[ ]]>
- PI (processing instructions)
It is non-validating; without a DTD, the following cannot be properly addressed:
- entities
- namespaces
  This may or may not be arriving in some future release.
SEE ALSO
perl(1), XML::RSS, XML::SAX::PurePerl, XML::Parser::Lite, XML::Parser
AUTHOR
Jerrad Pierce <jpierce@cpan.org>.
Scott Thomason <scott@thomasons.org>
LICENSE
Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.