NAME
Web::Sitemap - Simple way to generate sitemap files with paging support
SYNOPSIS
use Web::Sitemap;
my $sm = Web::Sitemap->new(
    output_dir => '/path/for/sitemap',
    ### Options ###
    temp_dir => '/path/to/tmp',
    loc_prefix => 'http://my_domain.com',
    index_name => 'sitemap',
    file_prefix => 'sitemap.',
    # mark for grouping urls
    default_tag => 'my_tag',
    # add <mobile:mobile/> inside <url>, and appropriate namespace (Google standard)
    mobile => 1,
    # add appropriate namespace (Google standard)
    images => 1,
    # additional namespaces (scalar or array ref) for <urlset>
    namespace => 'xmlns:some_namespace_name="..."',
    # location prefix for files-parts of the sitemap (default is loc_prefix value)
    file_loc_prefix => 'http://my_domain.com',
    # specify data input charset
    charset => 'utf8',
    move_from_temp_action => sub {
        my ($temp_file_name, $public_file_name) = @_;
        # ...some action...
        #
        # default behavior is
        # File::Copy::move($temp_file_name, $public_file_name);
    }
);
$sm->add(\@url_list);
# When adding another batch of URLs, you can specify a tag that labels the files these URLs are written to
$sm->add(\@url_list1, tag => 'articles');
$sm->add(\@url_list2, tag => 'users');
# If a file would exceed the limit of 50 000 URLs or grow larger than 50MB, it is rotated and a new file is started
$sm->add(\@url_list3, tag => 'articles');
# Calling finish() creates the index file, which links to the files containing the URLs
$sm->finish;
DESCRIPTION
This module is a utility for generating indexed sitemaps.
Each sitemap file can have up to 50 000 URLs or up to 50MB in size (after decompression) according to sitemaps.org. A site that exceeds either limit must split its URLs across several sitemap files and list them in a sitemap index file.
Web::Sitemap generates a single sitemap index with links to multiple sitemap pages. The pages are automatically split when they reach the limit and are always gzip compressed. Files are first written as temporary files and then moved to the destination directory, but the move action can be hooked into to change that behavior.
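The splitting described above can be illustrated with a minimal, standalone sketch. It is not the module's internal code: the helper name split_into_pages is hypothetical, the URLs are made up, and only the 50 000 URL limit is modeled (the 50MB size limit is ignored).

```perl
use strict;
use warnings;

# URL count limit per sitemap file, as quoted from sitemaps.org above.
my $MAX_URLS = 50_000;

# Hypothetical helper (not part of Web::Sitemap): split a list of URLs
# into chunks holding at most $limit entries each.
sub split_into_pages {
    my ($urls, $limit) = @_;
    my @remaining = @$urls;    # work on a copy, leave the caller's list intact
    my @pages;
    while (@remaining) {
        push @pages, [ splice(@remaining, 0, $limit) ];
    }
    return @pages;
}

# Made-up input: 120 000 relative URLs.
my @urls  = map { "/page/$_" } 1 .. 120_000;
my @pages = split_into_pages(\@urls, $MAX_URLS);

printf "%d pages: %s\n", scalar(@pages), join(', ', map { scalar @$_ } @pages);
# prints "3 pages: 50000, 50000, 20000"
```

Each chunk would correspond to one sitemap page, and the index file would then link to all of them.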
INTERFACE
Web::Sitemap provides only an object-oriented interface.
Methods
new
my $sitemap = Web::Sitemap->new(output_dir => $dirname, %options);
Constructs a new Web::Sitemap object that will generate the sitemap.
Files will be put into output_dir. This argument is required.
Other optional arguments include:
temp_dir
Path to a temporary directory. Must already exist and be writable. If not specified, a new temporary directory will be created using File::Temp.
loc_prefix
A location prefix for all the URLs in the sitemap, like 'http://my_domain.com'. Defaults to an empty string.
index_name
Name of the sitemap index (basename without the extension). Defaults to 'sitemap'.
file_prefix
Prefix for all sitemap files containing URLs. Defaults to 'sitemap.'.
default_tag
A default tag that will be used for grouping URLs in files when they are added without an explicit tag. Defaults to 'pages'.
mobile
Will add a mobile namespace to the sitemap files, and each URL will contain <mobile:mobile/>. This is a Google standard. Disabled by default.
images
Will add an images namespace to the sitemap files. This is a Google standard. Disabled by default.
namespace
Additional namespaces to be added to the sitemap files. This can be a string or an array reference containing strings. Empty by default.
file_loc_prefix
A prefix that will be put before the filenames in the sitemap index. This will not cause files to be put in a different directory; it only affects the sitemap index. Defaults to the value of loc_prefix.
charset
Encoding to be used for writing the files. Defaults to 'utf8'.
move_from_temp_action
A coderef that will change how the files are handled after successful generation. Will be called once for each generated file and be passed these arguments: $temporary_file_path, $destination_file_path. By default it will move the files using File::Copy::move.
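As an illustration of move_from_temp_action, here is a minimal sketch of a callback that copies files instead of moving them and records what was published. The callback body, the @published list and the file name are assumptions made for this example. The sketch calls the coderef by hand with the two documented arguments so that it runs without Web::Sitemap installed.

```perl
use strict;
use warnings;
use File::Copy ();
use File::Spec;
use File::Temp qw(tempdir);

# Destination paths handled so far (hypothetical bookkeeping for this sketch).
my @published;

# Hypothetical callback: copy rather than move, so the temporary file survives.
my $move_action = sub {
    my ($temp_file_name, $public_file_name) = @_;
    File::Copy::copy($temp_file_name, $public_file_name)
        or die "Cannot copy $temp_file_name to $public_file_name: $!";
    push @published, $public_file_name;
    return;
};

# Standalone demonstration with throwaway directories.
my $temp_dir   = tempdir(CLEANUP => 1);
my $output_dir = tempdir(CLEANUP => 1);
my $name       = 'sitemap.pages.1.xml.gz';    # illustrative file name only
my $temp_file  = File::Spec->catfile($temp_dir, $name);

open(my $fh, '>', $temp_file) or die "Cannot write $temp_file: $!";
print {$fh} "placeholder\n";
close($fh) or die "Cannot close $temp_file: $!";

# Invoke the callback with the arguments described above.
$move_action->($temp_file, File::Spec->catfile($output_dir, $name));
printf "published %d file(s)\n", scalar @published;
```

In real use the coderef would be passed to the constructor as move_from_temp_action => $move_action.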
add
$sitemap->add(\@links, tag => $tagname);
Adds more links to the sitemap under $tagname. The tag can be omitted, in which case it defaults to 'pages' or to the default_tag specified in the constructor.
Links can be simple scalars (URL strings) or hashrefs. See "new" in Web::Sitemap::Url for a list of possible hashref arguments.
Can be called multiple times.
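The grouping behavior described above can be modeled with a small standalone sketch. It does not use Web::Sitemap and is not its implementation: the add_links helper and the %links_by_tag hash are hypothetical, and the only hashref key used is loc, as in the examples of this document.

```perl
use strict;
use warnings;

# Hypothetical model of tag grouping: tag name => list of link hashrefs.
my %links_by_tag;

# Hypothetical stand-in for add(): accepts plain URL strings or hashrefs,
# and falls back to the 'pages' tag when none is given.
sub add_links {
    my ($links, %opts) = @_;
    my $tag = defined $opts{tag} ? $opts{tag} : 'pages';
    push @{ $links_by_tag{$tag} },
        map { ref($_) eq 'HASH' ? $_ : +{ loc => $_ } } @$links;
    return;
}

add_links(['/', '/about']);                                         # default tag
add_links(['/articles/1', { loc => '/articles/2' }], tag => 'articles');
add_links(['/articles/3'], tag => 'articles');                      # same tag again

for my $tag (sort keys %links_by_tag) {
    printf "%s: %d link(s)\n", $tag, scalar @{ $links_by_tag{$tag} };
}
# prints:
# articles: 3 link(s)
# pages: 2 link(s)
```

Repeated calls with the same tag accumulate into the same group, which mirrors how add can be called multiple times.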
finish
$sitemap->finish;
Finalizes the sitemap creation and calls the function to move temporary files to the output directory.
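To show roughly what the resulting index contains, here is a minimal sketch that builds a sitemap index document in the sitemaps.org format. It is an illustration only, not the module's code: the build_index helper is hypothetical, the file names are made up (the actual naming scheme is not specified in this document), and no XML escaping is performed.

```perl
use strict;
use warnings;

# Prefix as used for file_loc_prefix in the SYNOPSIS.
my $file_loc_prefix = 'http://my_domain.com';

# Illustrative file names only; the real generated names may differ.
my @files = ('sitemap.articles.1.xml.gz', 'sitemap.users.1.xml.gz');

# Hypothetical helper: build a sitemap index that links to each file.
# Values are inserted as-is, so they must not contain XML special characters.
sub build_index {
    my ($prefix, @names) = @_;
    my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
            . qq{<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n};
    $xml .= "<sitemap><loc>$prefix/$_</loc></sitemap>\n" for @names;
    $xml .= "</sitemapindex>\n";
    return $xml;
}

my $index = build_index($file_loc_prefix, @files);
print $index;
```

Changing $file_loc_prefix alters only the <loc> values in the index, which matches the description of file_loc_prefix above.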
EXAMPLES
Support for Google images format
Format 1
$sitemap->add([{
    loc => 'http://test1.ru/',
    images => {
        caption_format => sub {
            my ($iterator_value) = @_;
            return sprintf('Vasya - foto %d', $iterator_value);
        },
        loc_list => [
            'http://img1.ru/',
            'http://img2.ru'
        ]
    }
}]);
Format 2
$sitemap->add([{
    loc => 'http://test11.ru/',
    images => {
        caption_format_simple => 'Vasya - foto',
        loc_list => ['http://img11.ru/', 'http://img21.ru']
    }
}]);
Format 3
$sitemap->add([{
    loc => 'http://test122.ru/',
    images => {
        loc_list => [
            { loc => 'http://img122.ru/', caption => 'image #1' },
            { loc => 'http://img133.ru/', caption => 'image #2' },
            { loc => 'http://img144.ru/', caption => 'image #3' },
            { loc => 'http://img222.ru', caption => 'image #4' }
        ]
    }
}]);
AUTHOR
Mikhail N Bogdanov <mbogdanov at cpan.org>
CONTRIBUTORS
In no particular order:
Ivan Bessarabov
Bartosz Jarzyna (@brtastic)
LICENSE
This module and all the packages in this module are governed by the same license as Perl itself.