NAME

XML::RSSLite - Perl extension for "relaxed" RSS parsing

SYNOPSIS

use XML::RSSLite;

. . .

parseXML(\%result, \$content);

print "=== Channel ===\n",
      "Title: $result{'title'}\n",
      "Desc:  $result{'description'}\n",
      "Link:  $result{'link'}\n\n";

foreach my $item (@{$result{'items'}}) {
  print "  --- Item ---\n",
        "  Title: $item->{'title'}\n",
        "  Desc:  $item->{'description'}\n",
        "  Link:  $item->{'link'}\n\n";
}

DESCRIPTION

This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing a valid RSS file. This means you get the basic title, description, and link for a channel and its items. Anything else present in the hash is a bonus :)

This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the output for best results. The munging includes:

Remove HTML tags to leave plain text
Remove characters other than 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s
Use <url> tags when <link> is empty
Use misplaced URLs in <title> when <link> is empty
Join relative URLs to the site base
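
The first four steps can be approximated in a few lines of core Perl. This is only a sketch of the idea, not the module's actual code: sanitize_text and pick_link are hypothetical helpers invented for illustration, and the item hash keys (link, url, title) are assumed to mirror the tag names.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::RSSLite): strip tags, then drop
# every character that is not on the documented allow-list.
sub sanitize_text {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;    # anything that looks like an HTML tag
    $text =~ s/[^0-9~!\@#\$%^&*()\-+=a-zA-Z\[\];',.:"<>?\s]//g;
    return $text;
}

# Hypothetical helper: choose a link for an item, falling back to the
# <url> value and then to a URL that ended up in the title.
sub pick_link {
    my ($item) = @_;
    for my $key (qw(link url)) {
        my $value = $item->{$key};
        return $value if defined $value && length $value;
    }
    if (defined $item->{title} && $item->{title} =~ m{(https?://\S+)}) {
        return $1;
    }
    return;    # nothing usable found
}

print sanitize_text('<b>Hello</b>, world!'), "\n";    # prints: Hello, world!
print pick_link({ link => '', title => 'Read http://example.com/story' }), "\n";
```

Note that the allow-list has no "/" in it, so a filter like sanitize_text suits titles and descriptions rather than links.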

EXPORT

parseXML($outHashRef, $inScalarRef)

$inScalarRef is a reference to a scalar containing the document to be parsed; its contents will effectively be destroyed. $outHashRef is a reference to the hash in which to store the parsed content.
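
Because the referenced scalar is consumed, a caller who still needs the raw document may want to keep a copy before parsing. The following is a minimal sketch of that pattern; the inlined feed and the variable names are made up for illustration, and the title key follows the SYNOPSIS.

```perl
use strict;
use warnings;
use XML::RSSLite;

# Hypothetical feed, inlined so the example is self-contained.
my $content = <<'XML';
<?xml version="1.0"?>
<rss version="0.91">
  <channel>
    <title>Example channel</title>
    <link>http://example.com/</link>
    <description>An example feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
  </channel>
</rss>
XML

my $original = $content;    # untouched copy, kept for later use
my %result;
parseXML(\%result, \$content);

# Do not rely on $content from here on; $original still holds the feed.
printf "Kept %d bytes of the original document\n", length $original;
print "Channel title: ",
      (defined $result{title} ? $result{title} : '(none)'), "\n";
```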

usableXML($inScalarRef)

Tests whether XML::RSSLite understands the content of the referenced document.

EXPORTABLE

isRDF($inScalarRef)

Tests if a referenced document is RDF.

isRSS($inScalarRef)

Tests if a referenced document is RSS.

isSN($inScalarRef)

Tests if a referenced document is scriptingNews.

isWL($inScalarRef)

Tests if a referenced document is weblog.

BUGS

Sometimes the title of an item will be missed; the condition will persist until additional items have been added to the document. As a stopgap, when this happens the item title is set equal to the item link.
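
Callers can spot items affected by this by comparing the two fields. A small sketch, using a hypothetical %result hash shaped like the one in the SYNOPSIS rather than real parser output:

```perl
use strict;
use warnings;

# Hypothetical parsed result, shaped like the hash in the SYNOPSIS.
my %result = (
    title => 'Example channel',
    items => [
        { title => 'First post',           link => 'http://example.com/1' },
        { title => 'http://example.com/2', link => 'http://example.com/2' },
    ],
);

# Collect items whose title is just a copy of the link.
my @suspect = grep {
    defined $_->{title} && defined $_->{link} && $_->{title} eq $_->{link}
} @{ $result{items} };

print "Title probably missed: $_->{link}\n" for @suspect;
```

A feed that really does use its link as the title will be flagged too, so treat the result as a hint rather than proof.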

It may take a while for the tuits to fix this to accumulate. Feel free to submit a patch.

SEE ALSO

perl(1), XML::RSS

AUTHOR

Jerrad Pierce <jpierce@cpan.org>

Scott Thomason <scott@industrial-linux.org>

LICENSE

Portions Copyright (c) 2002 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.