NAME
CPAN::Access::AdHoc - Retrieve stuff from an arbitrary CPAN repository
SYNOPSIS
use CPAN::Access::AdHoc;
my ( $module ) = @ARGV;
my $cad = CPAN::Access::AdHoc->new();
my $index = $cad->fetch_module_index();
if ( $index->{$module} ) {
print "$module is in $index->{$module}{distribution}\n";
} else {
print "$module is not indexed\n";
}
DESCRIPTION
This class provides a lowish-level interface to an arbitrary CPAN repository. You can fetch anything, but there is particular support for the author and module indices, distributions, and their metadata.
What it does not provide is module installation, dependency resolution, or what-have-you. There are already plenty of tools for that.
The intent is that this should be a zero-configuration system, or at least a configuration-optional system.
Attributes can be specified explicitly either when the object is instantiated or afterwards. The default is from the global section of a Config::Tiny configuration file, CPAN-Access-AdHoc.ini, which is found in directory File::HomeDir->my_dist_config( 'CPAN-Access-AdHoc' ). The named sections are currently unused, though CPAN-Access-AdHoc reserves to itself all section names which contain no uppercase letters.
In addition, it is possible to take the default CPAN repository URL from the user's CPAN::Mini, cpanm, CPAN, or CPANPLUS configuration. They are accessed in this order by default, and the first available is used. Which of these are considered, and the order in which they are considered, are under the user's control via the default_cpan_source attribute/configuration item.
What actually happened here is that I got an RT ticket on one of my CPAN distributions, pointing out that the Free Software Foundation had moved, and I needed to update the copy of the GNU GPL that I distributed. Well, it's the same text for all my distributions, so I wanted a tool to tell me which ones had already been updated in CPAN.
A little later, I realized that a clobbered version of one of my author tests got shipped in a couple distributions, so I wrote another Perl script to see how far the rot had spread.
Then I found out about an interesting but somewhat heavyweight module, and wanted to know what I really needed to install to get it going. Yes, cpanm will do this, but I have not taken that step yet.
So I found myself writing mostly the same code for the third time, and decided there ought to be a better way. Hence this module.
METHODS
This class supports the following public methods:
Instantiator
new
This static method instantiates the object. You can specify attribute values by passing name/value argument pairs. Defaults are documented with the individual attributes.
If you do not specify an explicit cpan argument, and a default CPAN URL can not be computed, an exception is thrown. See the cpan attribute documentation for a few more details.
Accessors/Mutators
config
When called with no arguments, this method acts as an accessor, and returns the current configuration as a Config::Tiny object.
When called with an argument, this method acts as a mutator. If the argument is a Config::Tiny object it becomes the new configuration. If the argument is undef, file CPAN-Access-AdHoc.ini in File::HomeDir->my_dist_config( 'CPAN-Access-AdHoc' ) is read for the configuration. If this file does not exist, the configuration is set to an empty Config::Tiny object.
cpan
When called with no arguments, this method acts as an accessor, and returns a URI object representing the URL of the CPAN repository being accessed.
When called with an argument, this method acts as a mutator. It sets the URL of the CPAN repository accessed by this object, and (for reasons of sanity) calls flush() to purge any data cached from the old repository. The argument can be either a string or an object that stringifies (such as a URI object). To be valid, the scheme must be supported by LWP::UserAgent (that is, LWP::Protocol::implementor() must return a true value), and must support a hierarchical name space. That means that schemes like file:, http:, and ftp: are accepted, but schemes like mailto: (non-hierarchical name space) and foobar: (not known to be supported by LWP::UserAgent) are not.
If the argument is undef, the default URL as computed from the sources in default_cpan_source is used. If no URL can be computed from any source, an exception is thrown.
default_cpan_source
When called with no arguments, this method acts as an accessor, and returns the current list of default CPAN sources as an array reference. This is incompatible with version 0.000_08 and before, where the return was a comma-delimited string.
When called with an argument, this method acts as a mutator, and sets the list of default CPAN sources. This list is either an array reference or a comma-delimited string, and consists of the names of zero or more CPAN::Access::AdHoc::Default::CPAN::* classes. With either mechanism the names of the classes may be passed without the common prefix, which will be added back if needed. See the documentation of these classes for more information.
If any of the elements in the string does not represent an existing CPAN::Access::AdHoc::Default::CPAN:: class, an exception is thrown and the value of the attribute remains unmodified.
If the argument is undef, the default is restored.
The default is 'CPAN::Mini,cpanm,CPAN,CPANPLUS'.
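The two accepted forms of the argument can be illustrated with a short sketch. The helper expand_sources() below is hypothetical and is not part of CPAN::Access::AdHoc. It only shows how an array reference or a comma-delimited string might be normalized to full class names, assuming the common prefix is added only when it is absent. Unlike the real mutator, it does not check that the classes exist.

```perl
use strict;
use warnings;

# Common prefix of the default-source classes, as named above.
my $PREFIX = 'CPAN::Access::AdHoc::Default::CPAN::';

# Hypothetical helper: accept an array reference or a comma-delimited
# string, and return an array reference of fully-qualified class names.
sub expand_sources {
    my ( $arg ) = @_;
    my @names = ref( $arg ) eq 'ARRAY'
        ? @{ $arg }
        : ( split /\s*,\s*/, $arg );
    # Add the prefix only to names that do not already start with it.
    return [ map { index( $_, $PREFIX ) == 0 ? $_ : $PREFIX . $_ } @names ];
}

print "$_\n" for @{ expand_sources( 'CPAN::Mini,cpanm,CPAN,CPANPLUS' ) };
```

Note that a short name such as CPAN::Mini expands to CPAN::Access::AdHoc::Default::CPAN::CPAN::Mini, since the prefix itself ends in CPAN::.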
Functionality
These methods are what all the rest is in aid of.
corpus
This convenience method returns a list of the indexed distributions by the author with the given CPAN ID. This information is derived from the output of indexed_distributions(). The argument is converted to upper case before use.
fetch
This method fetches the named file from the CPAN repository. Its argument is the name of the file relative to the root of the repository.
If this method determines that there might be checksums for this file, it attempts to retrieve them, and if successful will compare the SHA256 checksum of the retrieved data to the retrieved value.
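The comparison described above might look something like the following sketch, which uses the core Digest::SHA module. The helper checksum_matches() is hypothetical, and the assumption that a CHECKSUMS entry is a hash reference with a sha256 key holding a hex digest is mine, not something this documentation states. What the real method does on a mismatch is not shown here.

```perl
use strict;
use warnings;
use Digest::SHA qw{ sha256_hex };

# Hypothetical helper, not part of CPAN::Access::AdHoc.
# $entry is assumed to be a CHECKSUMS entry: a hash reference that may
# contain a 'sha256' key holding a hex digest.
sub checksum_matches {
    my ( $content, $entry ) = @_;
    # No checksum available, so there is nothing to compare (an assumption).
    ref( $entry ) eq 'HASH' && defined $entry->{sha256}
        or return 1;
    return sha256_hex( $content ) eq lc $entry->{sha256} ? 1 : 0;
}

# Stand-in data; a real caller would pass the downloaded file content.
my $data  = "Pretend this is a downloaded file\n";
my $entry = { sha256 => sha256_hex( $data ) };
print checksum_matches( $data, $entry )
    ? "checksum OK\n"
    : "checksum mismatch\n";
```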
If the file is compressed in some way it will be decompressed.
If the fetched file is an archive of some sort, an object representing the archive will be returned. This object will be of one of the CPAN::Access::AdHoc::Archive::* classes, each of which wraps the corresponding Archive::* class and provides CPAN::Access::AdHoc with a consistent interface. These classes will be initialized with
content => the literal content of the archive, as downloaded,
encoding => the MIME encoding used to decode the archive,
path => the path to the archive, relative to the base URL.
If the fetched file is not an archive, it is wrapped in a CPAN::Access::AdHoc::Archive::Null object and returned.
All other fetch functionality is implemented in terms of this method.
fetch_author_index
This method fetches the author index, authors/01mailrc.txt.gz. It is expanded and interpreted, and returned as a hash reference keyed by the authors' CPAN IDs. The data for each author is an anonymous hash with the following keys:
The results of the first fetch are cached; subsequent calls are supplied from cache.
fetch_module_index
This method fetches the module index, modules/02packages.details.txt.gz. It is expanded and interpreted, and returned as a hash reference keyed by the module names. The data for each module is an anonymous hash with the following keys:
If called in list context, the first return is the index, and the second is another hash reference containing the metadata that appears at the top of the expanded index file.
The results of the first fetch are cached; subsequent calls are supplied from cache.
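To make the shape of the return value concrete, here is a minimal sketch of parsing an already-expanded index. It is not the module's own parser. The sample text is invented and abbreviated, parse_module_index() is a hypothetical name, and the version key is an assumption (only the distribution key appears in the SYNOPSIS).

```perl
use strict;
use warnings;

# Invented, abbreviated sample of an expanded module index: header
# lines, a blank line, then one line per module.
my $text = <<'EOD';
File:         02packages.details.txt
Description:  Package names found in directory $CPAN/authors/id/
Line-Count:   2

Johann                 0.001     B/BA/BACH/Johann-0.001.tar.bz2
PDQ                    0.000_01  B/BA/BACH/PDQ-0.000_01.zip
EOD

# Hypothetical parser. Returns the index in scalar context, and the
# index plus the header metadata in list context.
sub parse_module_index {
    my ( $text ) = @_;
    my ( $head, $body ) = split /\n\s*\n/, $text, 2;
    my %meta = map { /^([^:]+):\s*(.*)/ ? ( $1 => $2 ) : () }
        split /\n/, $head;
    my %index;
    foreach my $line ( split /\n/, $body ) {
        my ( $module, $version, $dist ) = split ' ', $line;
        defined $dist or next;    # skip malformed lines
        $index{$module} = {
            version      => $version,    # key name is an assumption
            distribution => $dist,
        };
    }
    return wantarray ? ( \%index, \%meta ) : \%index;
}

my ( $index, $meta ) = parse_module_index( $text );
print "PDQ is in $index->{PDQ}{distribution}\n";
print "Index claims $meta->{'Line-Count'} lines\n";
```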
fetch_distribution_archive
This method takes as its argument the name of a distribution file relative to the archive's authors/id/ directory, and returns the distribution as a CPAN::Access::AdHoc::Archive::* object.
Note that since this method is implemented in terms of fetch(), the archive object's path attribute will be set to its path relative to the base URL of the CPAN repository, not its path relative to the authors/id/ directory. So, for example,
$arc = $cad->fetch_distribution_archive(
'B/BA/BACH/PDQ-0.000_01.zip' );
say $arc->path(); # authors/id/B/BA/BACH/PDQ-0.000_01.zip
For convenience, either the top or the top two directories can be omitted, since they can be reconstructed from the rest. So the above example can also be written as
$arc = $cad->fetch_distribution_archive(
'BACH/PDQ-0.000_01.zip' );
say $arc->path(); # authors/id/B/BA/BACH/PDQ-0.000_01.zip
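The reconstruction of the omitted directories can be sketched as follows. The helper expand_dist_path() is hypothetical and is not the module's own code. It assumes that CPAN IDs are longer than two characters, so that a leading one- or two-character component can only be one of the abbreviated directories.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a distribution path that may omit the top
# one or two directories into a path relative to the repository root.
sub expand_dist_path {
    my ( $path ) = @_;
    my @parts = split m{/}, $path;
    # Drop the abbreviated directories if they were given.
    length( $parts[0] ) == 1 and shift @parts;
    length( $parts[0] ) == 2 and shift @parts;
    # The first remaining component is taken to be the CPAN ID.
    my $author = $parts[0];
    return join '/', 'authors/id',
        substr( $author, 0, 1 ), substr( $author, 0, 2 ), @parts;
}

print expand_dist_path( $_ ), "\n" for qw{
    B/BA/BACH/PDQ-0.000_01.zip
    BA/BACH/PDQ-0.000_01.zip
    BACH/PDQ-0.000_01.zip
};
```

All three calls print authors/id/B/BA/BACH/PDQ-0.000_01.zip.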
fetch_distribution_checksums
use YAML::Any;
print Dump( $cad->fetch_distribution_checksums(
'B/BA/BACH/' ) );
print Dump( $cad->fetch_distribution_checksums(
'B/BA/BACH/Johann-0.001.tar.bz2' ) );
This method takes as its argument either a file name or a directory name relative to authors/id/. A directory is indicated by a trailing slash.
If the request is for the CHECKSUMS file, the return is a reference to a hash which contains the interpreted contents of the entire file.
If the argument is a file name other than CHECKSUMS, the return is a reference to the CHECKSUMS entry for that file, provided it exists.
If the argument is a directory name, it is treated like a request for the CHECKSUMS file in that directory.
If the CHECKSUMS file does not exist, an exception is raised. If the argument was a file name and the file has no entry in the CHECKSUMS file, nothing is returned.
For convenience, either the top or the top two directories can be omitted, since they can be reconstructed from the rest.
The result of the first fetch for a given directory is cached, and subsequent calls for the same author are supplied from cache.
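The rules for what is returned can be summarized in a small sketch that works on an already-interpreted CHECKSUMS hash. The helper lookup_checksums() is hypothetical, the sample data is invented, and fetching and caching are left out entirely.

```perl
use strict;
use warnings;

# Invented stand-in for the interpreted contents of a CHECKSUMS file.
my $checksums = {
    'Johann-0.001.tar.bz2' => { size => 1024 },
};

# Hypothetical helper: given the interpreted CHECKSUMS hash and the
# argument as passed by the caller, return what the text above describes.
sub lookup_checksums {
    my ( $checksums, $arg ) = @_;
    # Everything after the last slash; empty for a directory name.
    my ( $file ) = $arg =~ m{ ( [^/]* ) \z }smx;
    $file eq '' || $file eq 'CHECKSUMS'
        and return $checksums;           # whole file
    exists $checksums->{$file}
        or return;                       # no entry: return nothing
    return $checksums->{$file};          # entry for a single file
}

my $whole = lookup_checksums( $checksums, 'B/BA/BACH/' );
my $entry = lookup_checksums( $checksums, 'B/BA/BACH/Johann-0.001.tar.bz2' );
my @none  = lookup_checksums( $checksums, 'B/BA/BACH/Missing-0.001.tar.gz' );
printf "%d entries, size %d, %d results for the missing file\n",
    scalar keys %{ $whole }, $entry->{size}, scalar @none;
```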
fetch_registered_module_index
This method fetches the registered module index, modules/03modlist.data.gz. It is interpreted, and returned as a hash reference keyed by module name.
If called in list context, the first return is the index, and the second is a hash reference containing the metadata that appears at the top of the expanded index file.
The results of the first fetch are cached; subsequent calls are supplied from cache.
flush
This method deletes all cached results, causing them to be re-fetched when needed.
indexed_distributions
This convenience method returns a list of all indexed distributions in ASCIIbetical order. This information is derived from the results of fetch_module_index(), and is cached.
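How this list, and the per-author list returned by corpus(), could be derived from the module index is sketched below. The two subroutines are stand-alone illustrations that borrow the method names. They are not the module's code, the sample index is invented, and the distribution values are assumed to be paths relative to authors/id/.

```perl
use strict;
use warnings;

# Invented sample in the shape returned by fetch_module_index().
my $index = {
    'Johann'        => { distribution => 'B/BA/BACH/Johann-0.001.tar.bz2' },
    'Johann::Fugue' => { distribution => 'B/BA/BACH/Johann-0.001.tar.bz2' },
    'PDQ'           => { distribution => 'B/BA/BACH/PDQ-0.000_01.zip' },
    'Yehudi'        => { distribution => 'M/ME/MENUHIN/Yehudi-0.01.tar.gz' },
};

# Unique distributions, in ASCIIbetical order.
sub indexed_distributions {
    my ( $index ) = @_;
    my %seen;
    my @dists = grep { ! $seen{$_}++ }
        map { $_->{distribution} } values %{ $index };
    return sort @dists;
}

# Distributions by one author; the CPAN ID is upper-cased first.
sub corpus {
    my ( $index, $cpan_id ) = @_;
    $cpan_id = uc $cpan_id;
    return grep { m{ \A . / .. / \Q$cpan_id\E / }smx }
        indexed_distributions( $index );
}

print "$_\n" for corpus( $index, 'bach' );
```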
Subclass Methods
The following methods exist for the benefit of subclasses, and should not be considered part of the public interface. I am willing to make this interface public on request, but until the request comes I will consider myself at liberty to modify it without notice.
__attr
This method returns a hash containing all attributes specific to the class that makes the call. This hash may be modified, and in fact must be to store new attribute values.
__cache
This method returns a hash containing all values cached by the object. This hash may be modified, and in fact must be to cache new values.
__create_accessor_mutators
__PACKAGE__->__create_accessor_mutators( @attributes );
This static method creates accessor/mutator methods for the attributes named in its argument list. If a subroutine with the same name as an attribute exists at the time this method is called, that subroutine is assumed to be the accessor/mutator for that attribute.
The methods created by __create_accessor_mutators() have three hooks for behavior modification. For any attribute whatever, these are:
- __attr__whatever__default
    my ( $self, $value ) = @_;
    my $code;
    not defined $value
        and $code = $self->can( '__attr__whatever__default' )
        and $value = $code->( $self );
This is called when the mutator is passed a new value of undef. Its only argument is the invocant. It must return a valid value of the attribute.
If a subclass overrides this, the subclass probably should not call $self->SUPER::__attr__whatever__default().
- __attr__whatever__validate
    my ( $self, $value ) = @_;
    my $code;
    $code = $self->can( '__attr__whatever__validate' )
        and $value = $code->( $self, $value );
This method is called after __attr__whatever__default(), and validates the value. It rejects a value by throwing an exception. The preferred way to do this is by calling __wail().
If a subclass overrides this, the subclass must execute
    $value = $self->SUPER::__attr__whatever__validate( $value );
before it performs its own validation. The superclass method must return the internal format of the attribute's value, which the subclass must return after validating.
- __attr__whatever__post_assignment
    $self->__attr__whatever__post_assignment()
This method is called after the new value has been assigned to the attribute.
If a subclass overrides this, it must call $self->SUPER::__attr__whatever__post_assignment(), and it should call it last thing before returning.
All these hooks are optional, but __create_accessor_mutators() will generate dummy __attr__whatever__validate() and __attr__whatever__post_assignment() methods for any attributes that do not have them at the time it is called.
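The order in which the hooks fire can be seen in the following self-contained sketch. It is not the code of CPAN::Access::AdHoc. The package My::Sketch, the attribute color, and the generator create_accessor_mutators() are all hypothetical, attribute storage is a plain hash, the mutator returning the invocant is my choice, and no dummy hooks are generated.

```perl
use strict;
use warnings;

package My::Sketch;

sub new { return bless { attr => {} }, shift }

# Simplified, hypothetical generator for accessor/mutator methods.
sub create_accessor_mutators {
    my ( $class, @attrs ) = @_;
    foreach my $name ( @attrs ) {
        $class->can( $name ) and next;    # an existing sub wins
        my $code = sub {
            my ( $self, @arg ) = @_;
            @arg or return $self->{attr}{$name};    # accessor
            my $value = $arg[0];
            my $hook;
            # 1. Supply a default if the new value is undef.
            not defined $value
                and $hook = $self->can( "__attr__${name}__default" )
                and $value = $hook->( $self );
            # 2. Validate; the hook dies to reject the value.
            $hook = $self->can( "__attr__${name}__validate" )
                and $value = $hook->( $self, $value );
            $self->{attr}{$name} = $value;
            # 3. React to the completed assignment.
            $hook = $self->can( "__attr__${name}__post_assignment" )
                and $hook->( $self );
            return $self;
        };
        no strict 'refs';
        *{"${class}::$name"} = $code;
    }
    return;
}

# Hooks for the hypothetical 'color' attribute.
sub __attr__color__default { return 'red' }

sub __attr__color__validate {
    my ( $self, $value ) = @_;
    $value =~ m/ \A [a-z]+ \z /smx
        or die "Invalid color '$value'\n";
    return $value;
}

__PACKAGE__->create_accessor_mutators( qw{ color } );

package main;

my $obj = My::Sketch->new();
$obj->color( undef );           # the default hook supplies the value
print $obj->color(), "\n";
```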
__init
This method is called when a new object is instantiated. Its arguments are a series of name/value pairs. The subclass should override this, and the override should make use of any arguments it recognizes, deleting them from the argument hash as it does so. The override should then call $self->SUPER::__init( %args ), passing the superclass all unused arguments.
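The override pattern can be sketched with two hypothetical packages. My::Base stands in for the superclass and is not CPAN::Access::AdHoc itself. Its behavior of throwing an exception on leftover arguments is an assumption made so that the example has something to show.

```perl
use strict;
use warnings;

package My::Base;    # hypothetical stand-in for the superclass

sub new {
    my ( $class, %args ) = @_;
    my $self = bless {}, $class;
    $self->__init( %args );
    return $self;
}

sub __init {
    my ( $self, %args ) = @_;
    # Assumption: arguments nobody recognized are an error.
    keys %args
        and die "Unknown argument(s): @{[ sort keys %args ]}\n";
    return $self;
}

package My::Derived;    # hypothetical subclass

our @ISA = ( 'My::Base' );

sub __init {
    my ( $self, %args ) = @_;
    # Use the arguments this class recognizes, deleting them as we go.
    $self->{verbose} = delete $args{verbose};
    # Pass everything else up to the superclass.
    return $self->SUPER::__init( %args );
}

package main;

my $obj = My::Derived->new( verbose => 1 );
print "verbose is $obj->{verbose}\n";
```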
SEE ALSO
CPAN::DistnameInfo, which parses distribution name and version (among other things) from the name of a particular distribution archive. This was very helpful in some of my CPAN ad-hocery.
SUPPORT
Support is by the author. Please file bug reports at http://rt.cpan.org, or in electronic mail to the author.
AUTHOR
Thomas R. Wyant, III wyant at cpan dot org
COPYRIGHT AND LICENSE
Copyright (C) 2012 by Thomas R. Wyant, III
This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.10.0. For more details, see the full text of the licenses in the directory LICENSES.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.