NAME

WWW::SitemapIndex::XML - XML Sitemap index protocol

VERSION

version 1.121160

SYNOPSIS

use WWW::SitemapIndex::XML;

my $index = WWW::SitemapIndex::XML->new();

# add new sitemaps
$index->add( 'http://mywebsite.com/sitemap1.xml.gz' );

# or
$index->add(
    loc => 'http://mywebsite.com/sitemap1.xml.gz',
    lastmod => '2010-11-26',
);

# or
$index->add(
    WWW::SitemapIndex::XML::Sitemap->new(
        loc => 'http://mywebsite.com/sitemap1.xml.gz',
        lastmod => '2010-11-26',
    )
);

# read sitemaps from existing sitemap_index.xml file
my @sitemaps = $index->read( location => 'sitemap_index.xml' );

# load sitemaps from existing sitemap_index.xml file
$index->load( location => 'sitemap_index.xml' );

# get XML::LibXML object
my $xml = $index->as_xml;

print $xml->toString(1);

# write to file
$index->write( 'sitemap_index.xml', my $pretty_print = 1 );

# write compressed
$index->write( 'sitemap_index.xml.gz' );

DESCRIPTION

Reads and writes sitemap index XML files as defined at http://www.sitemaps.org/.

METHODS

add($sitemap|%attrs)

$index->add(
    WWW::SitemapIndex::XML::Sitemap->new(
        loc => 'http://mywebsite.com/sitemap1.xml.gz',
        lastmod => '2010-11-26',
    )
);

Adds the $sitemap object, which represents a single sitemap in the sitemap index.

Accepts blessed objects implementing WWW::SitemapIndex::XML::Sitemap::Interface.

Otherwise the arguments %attrs are passed as-is to create a new WWW::SitemapIndex::XML::Sitemap object.

$index->add(
    loc => 'http://mywebsite.com/sitemap1.xml.gz',
    lastmod => '2010-11-26',
);

# single url argument
$index->add( 'http://mywebsite.com/sitemap1.xml.gz' );

# is same as
$index->add( loc => 'http://mywebsite.com/sitemap1.xml.gz' );

Performs basic validation of sitemaps added:

  • maximum of 50 000 sitemaps in a single sitemap index

  • URL no longer than 2048 characters

  • all URLs should use the same protocol and reside on the same host
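The checks listed above can be approximated in plain Perl. The following is a minimal sketch under stated assumptions, not the module's actual implementation: the helper name check_locations and its messages are hypothetical, and the protocol/host comparison uses a simple regular expression rather than a full URI parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::SitemapIndex::XML): returns a list
# of problems found in the given sitemap locations; an empty list means OK.
sub check_locations {
    my @locs = @_;
    my @errors;

    push @errors, 'more than 50 000 sitemaps' if @locs > 50_000;

    my $first_origin;
    for my $loc (@locs) {
        push @errors, 'URL longer than 2048 characters'
            if length($loc) > 2048;

        # Capture "scheme://host[:port]" as the origin of the URL.
        my ($origin) = $loc =~ m{\A([a-z][a-z0-9+.-]*://[^/?#]+)}i;
        unless ( defined $origin ) {
            push @errors, "not an absolute URL: $loc";
            next;
        }

        # The first valid URL defines the expected protocol and host.
        $first_origin //= lc($origin);
        push @errors, "different protocol or host: $loc"
            if lc($origin) ne $first_origin;
    }

    return @errors;
}

my @errors = check_locations(
    'http://mywebsite.com/sitemap1.xml.gz',
    'https://mywebsite.com/sitemap2.xml.gz',
);
print "$_\n" for @errors;
# prints: different protocol or host: https://mywebsite.com/sitemap2.xml.gz
```

Returning a list of messages instead of dying on the first problem is a design choice of this sketch; the module itself may report validation failures differently.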

sitemaps

my @sitemaps = $index->sitemaps;

Returns a list of all Sitemap objects added to the sitemap index.

load(%sitemap_index_location)

$index->load( location => $sitemap_index_file );

It is a shortcut for:

$index->add($_) for $index->read( location => $sitemap_index_file );

Please see "read" for details.

read(%sitemap_index_location)

# file or url to sitemap index
my @sitemaps = $index->read( location => $file_or_url );

# file handle
my @sitemaps = $index->read( IO => $fh );

# xml string
my @sitemaps = $index->read( string => $xml );

Reads the sitemap index from a file, URL, open file handle or string and returns the list of WWW::SitemapIndex::XML::Sitemap objects representing <sitemap> elements.
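As an illustration of the string form, the sketch below parses a small in-memory sitemap index. The XML content is made-up sample data, and the loc and lastmod accessors on the returned objects are assumed to mirror the constructor attributes shown for add.

```perl
use strict;
use warnings;
use WWW::SitemapIndex::XML;

# A small sitemap index in the sitemaps.org 0.9 format (illustrative data).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://mywebsite.com/sitemap1.xml.gz</loc>
    <lastmod>2010-11-26</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://mywebsite.com/sitemap2.xml.gz</loc>
  </sitemap>
</sitemapindex>
XML

my $index    = WWW::SitemapIndex::XML->new();
my @sitemaps = $index->read( string => $xml );

for my $sitemap (@sitemaps) {
    # Assumption: loc and lastmod are readable accessors; lastmod may be
    # undefined when the <lastmod> element is absent.
    printf "%s (lastmod: %s)\n", $sitemap->loc, $sitemap->lastmod // 'n/a';
}
```

Note that read only returns the objects; use load, or pass them to add, to store them in $index.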

write($file, $format = 0)

# write to file
$index->write( 'sitemap_index.xml', my $pretty_print = 1 );

# or (requires "use IO::File;")
my $fh = IO::File->new();
$fh->open('sitemap_index.xml', 'w');
$index->write( $fh, my $pretty_print = 1 );
$fh->close;

# write compressed
$index->write( 'sitemap_index.xml.gz' );

Write XML sitemap index to $file - a file name or IO::Handle object.

If the file name ends in .gz then the output file will be compressed by setting compression on the XML object. Please note that this requires libxml2 to be compiled with zlib support.

The optional $format is passed to the toFH method (when $file is a file handle) or the toFile method (when $file is a file name), as described in XML::LibXML.
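The dispatch described above can be sketched with XML::LibXML alone. This is an illustration under stated assumptions rather than the module's own code: the helper name write_document is hypothetical, and only blessed IO::Handle objects are treated as file handles.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);
use XML::LibXML;

# Hypothetical helper: write an XML::LibXML::Document to a file name or an
# IO::Handle object, roughly following the rules described for write().
sub write_document {
    my ( $doc, $file, $format ) = @_;
    $format ||= 0;

    if ( blessed($file) && $file->isa('IO::Handle') ) {
        # File handle: serialize with toFH; no compression is set here.
        return $doc->toFH( $file, $format );
    }

    # File name: enable gzip compression when the name ends in ".gz".
    # This only takes effect if libxml2 was built with zlib support.
    $doc->setCompression(8) if $file =~ /\.gz\z/;
    return $doc->toFile( $file, $format );
}

# Build an empty sitemap index document to have something to write.
my $doc  = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $doc->createElementNS(
    'http://www.sitemaps.org/schemas/sitemap/0.9', 'sitemapindex' );
$doc->setDocumentElement($root);

write_document( $doc, 'sitemap_index.xml', 1 );
```

A plain glob reference from open() is not blessed, so this sketch would treat it as a file name; wrap such handles in IO::File or IO::Handle first.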

as_xml

my $xml = $index->as_xml;

# pretty print
print $xml->toString(1);

# write compressed
$xml->setCompression(8);
$xml->toFile( "sitemap_index.xml.gz" );

Returns XML::LibXML::Document object representing the sitemap index in XML format.

The <sitemap> elements are built by calling as_xml on each Sitemap object added to the sitemap index.
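Because as_xml returns an XML::LibXML::Document, the result can be inspected with XPath. The sketch below assumes the generated document uses the sitemaps.org 0.9 namespace; the "sm" prefix is an arbitrary name chosen for this example.

```perl
use strict;
use warnings;
use WWW::SitemapIndex::XML;
use XML::LibXML;

my $index = WWW::SitemapIndex::XML->new();
$index->add(
    loc     => 'http://mywebsite.com/sitemap1.xml.gz',
    lastmod => '2010-11-26',
);
$index->add( loc => 'http://mywebsite.com/sitemap2.xml.gz' );

my $doc = $index->as_xml;

# Assumption: elements live in the sitemaps.org 0.9 namespace, so it has
# to be registered under a prefix before XPath queries can match them.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( sm => 'http://www.sitemaps.org/schemas/sitemap/0.9' );

my @locs = $xpc->findnodes('/sm:sitemapindex/sm:sitemap/sm:loc');
print $_->textContent, "\n" for @locs;
```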

SEE ALSO

Please see the following website for more information related to this module:

  • http://www.sitemaps.org/

AUTHOR

Alex J. G. Burzyński <ajgb@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2010 by Alex J. G. Burzyński <ajgb@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.