NAME
Data::Mirror - a simple way to efficiently retrieve data from the World Wide Web.
VERSION
version 0.07
SYNOPSIS
use Data::Mirror qw(:all);
# set the global time-to-live of all cached resources
$Data::Mirror::TTL_SECONDS = 30;
# get some data
$file = mirror_file($url);
$string = mirror_str($url);
$fh = mirror_fh($url);
$json = mirror_json($url);
$xml = mirror_xml($url);
$yaml = mirror_yaml($url);
$rows = mirror_csv($url);
DESCRIPTION
Data::Mirror tries to take away as much pain as possible when it comes to retrieving and using remote data sources such as JSON objects, YAML documents, XML instances and CSV files.
Many Perl programs need to retrieve, store, and then parse remote data resources. This can result in a lot of repetitive code: generating a local filename, checking whether it already exists and is sufficiently fresh, retrieving a copy of the remote resource if needed, and then parsing it. If a program uses data sources of many different types (say JSON, XML and CSV), it often does the same thing over and over again, just with different modules for parsing.
Data::Mirror does all that for you, so you can focus on using the data.
USAGE
The general form of this module's API is:
$value = Data::Mirror::mirror_TYPE($url);
where TYPE corresponds to the expected data type of the resource at $url (which can be a string or a URI object).
The return value will be undef if there's an error. The module will also carp() a warning describing the problem, which you can intercept with a $SIG{__WARN__} handler.
Note: undef may also be returned if the remote resource is itself something that evaluates to undef (for example, a JSON document that is exactly "null", or a YAML document that is exactly "~"), or if there is an error parsing the resource once retrieved. Consider wrapping the function call in eval if you need to distinguish between these scenarios.
By default, if the locally cached copy of the resource is less than $Data::Mirror::TTL_SECONDS seconds old, Data::Mirror will just use it and won't try to refresh it. You can override this per request by passing the $ttl argument:
$value = Data::Mirror::mirror_TYPE($url, $ttl);
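Because carp() reports problems as warnings rather than exceptions, an eval on its own will not see them. The following is a minimal sketch, assuming failures are signalled exactly as described above (a carp() plus an undef return), that collects warnings and any exception alongside the return value. call_checked() is a hypothetical helper, not part of Data::Mirror.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Mirror): run $code and return
# its scalar result, any warnings it emitted (e.g. via carp()) and any
# exception it threw, so that a genuine undef value can be told apart
# from a failure.
sub call_checked {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $value = eval { $code->() };
    my $error = $@;    # empty string if $code did not die
    return ($value, \@warnings, $error);
}
```

For example, my ($data, $warnings, $error) = call_checked(sub { mirror_json($url) }). Under the assumption above, an undef $data with no warnings and an empty $error suggests the document itself decoded to undef.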
EXPORTS
To import all the functions listed below, include :all in the tags imported by use:
use Data::Mirror qw(:all);
You can also import specific functions separately:
use Data::Mirror qw(mirror_json mirror_csv);
PACKAGE VARIABLES
$TTL_SECONDS
This is the global "time to live", in seconds, of local copies of files, which is used if the $ttl argument is not passed to a mirror function. By default it's 300 seconds.
If Data::Mirror receives a 304 (Not Modified) response from the server, it will update the mtime of the local file so that another refresh will not occur until a further $TTL_SECONDS seconds have elapsed. The new mtime will be either the current time or the value of the Expires header, whichever is later.
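The mtime rule above can be sketched as follows. This is an illustration, not Data::Mirror's own code: touch_after_304() is a hypothetical name, and it assumes the Expires header has already been converted to epoch seconds (for example with str2time() from HTTP::Date), or is undef if the header was absent.

```perl
use strict;
use warnings;

# Hypothetical sketch: after a 304 Not Modified, move the local file's
# mtime forward to the later of "now" and the Expires time, so that the
# TTL check treats the cached copy as freshly validated.
sub touch_after_304 {
    my ($file, $expires_epoch) = @_;
    my $now   = time();
    my $mtime = (defined $expires_epoch && $expires_epoch > $now)
              ? $expires_epoch
              : $now;
    utime($now, $mtime, $file) or die "utime $file: $!";
    return $mtime;
}
```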
$UA
This is an LWP::UserAgent object used to retrieve remote resources. You may wish to use this variable to configure various aspects of its behaviour, such as credentials, user agent string, TLS options, etc.
$JSON
This is a JSON::XS object used for JSON decoding. You may wish to use this variable to change how it processes JSON data.
$CSV
This is a Text::CSV_XS object used for CSV parsing. You may wish to use this variable to change how it processes CSV data.
FUNCTIONS
mirror_file($url)
This function returns a string containing the name of a local file that holds the resource. All the other functions listed in this section use mirror_file() under the hood.
Data::Mirror writes local copies of files to the appropriate temporary directory (determined using File::Spec->tmpdir) and tries to reduce the risk of collision by hashing the URL and the current username. This means that different programs, run by the same user, that use Data::Mirror to retrieve the same URL will effectively share a cache for that URL, but other users on the system will not. File permissions are set to 0600 so other users cannot read the files.
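The per-user cache file idea can be illustrated with core modules. This is a sketch only: Data::Mirror's actual hashing scheme and file layout may differ, private_cache_path() and write_private() are hypothetical names, and it assumes a Unix-like system (getpwuid() is not implemented on Windows).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);
use File::Spec;
use Fcntl qw(O_WRONLY O_CREAT O_TRUNC);

# Hypothetical: hash the username together with the URL, so the same
# user gets the same path for a URL but other users get a different one.
sub private_cache_path {
    my ($url) = @_;
    my $user = getpwuid($<) // $<;    # fall back to the numeric uid
    return File::Spec->catfile(File::Spec->tmpdir,
                               sha256_hex(join("\0", $user, $url)));
}

# Hypothetical: write content to a file readable only by its owner.
sub write_private {
    my ($file, $content) = @_;
    sysopen(my $fh, $file, O_WRONLY | O_CREAT | O_TRUNC, 0600)
        or die "$file: $!";
    chmod(0600, $file) or die "$file: $!";  # sysopen's mode only applies on creation
    binmode($fh);
    print {$fh} $content;
    close($fh) or die "$file: $!";
}
```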
mirror_str($url)
This function returns a UTF-8 encoded string containing the resource. If the resource might be large enough to use up a lot of memory, consider using mirror_file() or mirror_fh() instead.
mirror_fh($url)
This function returns an IO::File handle from which the resource can be read.
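Because mirror_fh() returns a filehandle rather than the whole resource as a string, a large resource can be processed a line at a time. A minimal sketch; count_matching_lines() is a hypothetical helper that works with any readable handle, including an IO::File object.

```perl
use strict;
use warnings;

# Hypothetical helper: read a handle line by line so that only one
# line is held in memory at a time, counting lines that match $re.
sub count_matching_lines {
    my ($fh, $re) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $re;
    }
    return $count;
}
```

For example, count_matching_lines(mirror_fh($url), qr/^ERROR/), after first checking that mirror_fh() did not return undef.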
mirror_xml($url)
This function returns an XML::LibXML::Document object containing the resource.
mirror_json($url)
This function returns the JSON data structure decoded from the resource. This could be undef, a simple scalar (such as a string), an arrayref or a hashref.
mirror_yaml($url)
This function returns the YAML data structure decoded from the resource. This could be undef, a simple scalar (such as a string), an arrayref or a hashref.
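Since mirror_json() and mirror_yaml() can return undef, a plain scalar, an arrayref or a hashref, calling code usually checks ref() before using the result. A sketch with a hypothetical summarise() helper:

```perl
use strict;
use warnings;

# Hypothetical helper: describe a decoded JSON/YAML value by its shape.
sub summarise {
    my ($data) = @_;
    return 'undef' unless defined $data;
    return 'scalar: ' . $data if ref($data) eq '';
    return 'array of ' . scalar(@$data) . ' items' if ref($data) eq 'ARRAY';
    return 'hash with keys ' . join(', ', sort keys %$data) if ref($data) eq 'HASH';
    # anything else, e.g. a blessed boolean object from a JSON decoder
    return 'other: ' . ref($data);
}
```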
mirror_csv($url)
This function returns a reference to an array of arrayrefs, one for each CSV row in the resource.
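The rows come back as plain arrayrefs. If the CSV file happens to start with a header row (an assumption about your data, not something the module guarantees), a hypothetical rows_to_hashes() helper like this one can turn the remaining rows into hashrefs keyed by column name:

```perl
use strict;
use warnings;

# Hypothetical helper: treat the first row as column names and turn the
# remaining rows into hashrefs. Assumes the CSV actually has a header row.
sub rows_to_hashes {
    my ($rows) = @_;
    my ($header, @body) = @$rows;
    return [] unless $header;
    my @records;
    for my $row (@body) {
        my %record;
        @record{@$header} = @$row;    # hash slice: columns -> values
        push @records, \%record;
    }
    return \@records;
}
```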
OTHER FUNCTIONS
$file = Data::Mirror::filename($url);
Returns the local filename that Data::Mirror would use for the given URL.
$exists = Data::Mirror::mirrored($url);
Returns true if a copy of the resource identified by $url exists locally. This function is equivalent to -e Data::Mirror::filename($url).
$stale = Data::Mirror::stale($url, $ttl);
Returns true if the resource identified by $url (a) does not exist locally, or (b) exists locally but its modification time is more than $ttl seconds in the past. If $ttl is not specified, $Data::Mirror::TTL_SECONDS will be used instead.
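The staleness rule can be written out as follows. This sketch works on a filename (for example one returned by Data::Mirror::filename()) and is not the module's own code; is_stale() is a hypothetical name, and its 300-second default simply mirrors the documented default of $TTL_SECONDS.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the stale() rule: a file is stale if it
# does not exist, or if its mtime is more than $ttl seconds in the past.
sub is_stale {
    my ($file, $ttl) = @_;
    $ttl //= 300;
    my @st = stat($file);
    return 1 unless @st;                        # (a) no local copy
    return (time() - $st[9] > $ttl) ? 1 : 0;    # (b) older than $ttl
}
```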
REPORTING BUGS, CONTRIBUTING ENHANCEMENTS
This module is developed on GitHub at https://github.com/gbxyz/perl-data-mirror.
AUTHOR
Gavin Brown <gavin.brown@fastmail.uk>
COPYRIGHT AND LICENSE
This software is copyright (c) 2024 by Gavin Brown.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.