NAME
CPAN::Visitor - Generic traversal of distributions in a CPAN repository
VERSION
version 0.005
SYNOPSIS
use CPAN::Visitor;
my $visitor = CPAN::Visitor->new( cpan => "/path/to/cpan" );
# Prepare to visit all distributions
$visitor->select();
# Or a subset of distributions
$visitor->select(
subtrees => [ 'D/DA', 'A/AD' ], # relative to authors/id/
exclude => qr{/Acme-}, # No Acme- dists
match => qr{/Test-} # Only Test- dists
);
# Action is specified via a callback
$visitor->iterate(
visit => sub {
my $job = shift;
print "$job->{distfile}\n" if -f 'Build.PL';
}
);
# Or start with a list of files
$visitor = CPAN::Visitor->new(
cpan => "/path/to/cpan",
files => \@distfiles, # e.g. ANDK/CPAN-1.94.tar.gz
);
$visitor->iterate( visit => \&callback );
# Iterate in parallel
$visitor->iterate( visit => \&callback, jobs => 5 );
DESCRIPTION
A very generic, callback-driven program to iterate over a CPAN repository.
Needs better documentation and tests, but is provided for others to examine, use or contribute to.
USAGE
new
my $visitor = CPAN::Visitor->new( @args );
Object attributes include:
cpan - path to CPAN or mini CPAN repository. Required.
quiet - whether warnings should be silenced (e.g. from extraction). Optional.
stash - hash-ref of user-data to be made available during iteration. Optional.
files - array-ref with a pre-selection of distribution files. These must be in AUTHOR/NAME.suffix format. Optional.
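If you build the files list from absolute archive paths, each path has to be trimmed to the AUTHOR/NAME.suffix form. The sketch below does this with a hypothetical helper, to_distfile, which is not part of CPAN::Visitor. It assumes the standard authors/id/X/XY/AUTHOR/ layout and returns undef for anything else, including archives in author subdirectories.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CPAN::Visitor): turn an absolute archive
# path under authors/id/ into the AUTHOR/NAME.suffix form that the `files`
# attribute expects. Returns undef if the path does not fit that layout.
sub to_distfile {
    my ($path) = @_;
    # .../authors/id/D/DA/DAGOLDEN/Dist-1.0.tar.gz -> DAGOLDEN/Dist-1.0.tar.gz
    return $path =~ m{authors/id/[^/]+/[^/]+/([^/]+/[^/]+)\z} ? $1 : undef;
}

# Usage (requires CPAN::Visitor and a local mirror; paths are placeholders):
#   my $visitor = CPAN::Visitor->new(
#       cpan  => '/path/to/cpan',
#       quiet => 1,
#       stash => { prefer_bin => 1 },
#       files => [ grep { defined } map { to_distfile($_) } @paths ],
#   );
```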
select
$visitor->select( @args );
Valid arguments include:
subtrees - path or array-ref of paths to search. These must be relative to the 'authors/id/' directory within a CPAN repo. If given, only files within those subtrees will be considered. If not specified, the entire 'authors/id' tree is searched.
exclude - qr() or array-ref of qr() patterns. If a path matches *any* pattern, it is excluded.
match - qr() or array-ref of qr() patterns. If an array-ref is provided, only paths that match *all* patterns are included.
all_files - boolean that determines whether all files or only files that have a distribution archive suffix are selected. Default is false.
append - boolean that determines whether the selected files should be appended to previously selected files. The default is false, which replaces any previous selection.
The select method returns a count of files selected.
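To make the exclude/match rules concrete, here is a simplified model of how a single path is judged. The is_selected function is hypothetical and is not the module's own code; it ignores subtrees, all_files and append, and only illustrates "excluded if it matches any exclude pattern, kept only if it matches every match pattern".

```perl
use strict;
use warnings;

# Normalize a rule that may be a single qr() or an array-ref of qr()s
sub _as_list {
    my $x = shift;
    return ref $x eq 'ARRAY' ? @$x : defined $x ? ($x) : ();
}

# Hypothetical model of the documented exclude/match semantics
sub is_selected {
    my ( $path, %args ) = @_;
    my @exclude = _as_list( $args{exclude} );
    my @match   = _as_list( $args{match} );
    # Excluded if the path matches ANY exclude pattern
    return 0 if grep { $path =~ $_ } @exclude;
    # Kept only if the path matches ALL match patterns
    return 0 if grep { $path !~ $_ } @match;
    return 1;
}
```

With exclude => qr{/Acme-} and match => [ qr{/Test-}, qr{\.tar\.gz\z} ], a Test- tarball is kept, while an Acme- dist or a Test- .zip is dropped.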
iterate
$visitor->iterate( @args );
Valid arguments include:
jobs - non-negative integer specifying the maximum number of forked processes. Defaults to none.
check - code reference callback
start - code reference callback
extract - code reference callback
enter - code reference callback
visit - code reference callback
leave - code reference callback
finish - code reference callback
See "ACTION CALLBACKS" for more. Generally, you only need to provide the visit callback, which is called from inside the unpacked distribution directory.
The iterate method always returns true.
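As an example of a non-default callback, a check callback can drop developer releases before any extraction happens. This sketch assumes the common CPAN convention that an underscore in the version marks a developer release, and that the version is whatever follows the last hyphen in the archive name; skip_dev_releases is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical check callback: return false for developer releases so
# iterate() moves on to the next file without extracting it.
sub skip_dev_releases {
    my $job = shift;
    # Basename after the author directory, e.g. "CPAN-Visitor-0.001.tar.gz"
    my ($name) = $job->{distfile} =~ m{([^/]+)\z};
    # Strip a common archive suffix, e.g. -> "CPAN-Visitor-0.001"
    $name =~ s{\.(?:tar\.gz|tgz|tar\.bz2|zip)\z}{};
    # Assumed convention: version follows the last hyphen; "_" marks a dev release
    my ($version) = $name =~ /-([^-]+)\z/;
    return !( defined $version && $version =~ /_/ );
}

# Usage (requires CPAN::Visitor):
#   $visitor->iterate( check => \&skip_dev_releases, visit => \&callback );
```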
ACTION CALLBACKS
Each selected distribution is processed with a series of callback functions. These are each passed a hash-ref with information about the particular distribution being processed.
sub _my_visit {
my $job = shift;
# do stuff
}
The job hash-ref is initialized with the following fields:
distfile - the unique, short CPAN distfile name, e.g. DAGOLDEN/CPAN-Visitor-0.001.tar.gz
distpath - the absolute path to the distribution archive, e.g. /my/cpan/authors/id/D/DA/DAGOLDEN/CPAN-Visitor-0.001.tar.gz
tempdir - a File::Temp directory object for extraction or other things
stash - the 'stash' hashref from the Visitor object
quiet - the 'quiet' flag from the Visitor object
result - an empty hashref to start; the return values from each action are added and may be referenced by subsequent actions
The result field is used to accumulate the return values from action callbacks. For example, the return value from the default 'extract' action is the unpacked distribution directory:
$job->{result}{extract} # distribution directory path
You do not need to store the results yourself; the iterate method takes care of it for you.
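A visit callback typically combines the job fields above with the current directory. The report_build_tool callback below is hypothetical; it assumes the default enter action has already chdir'd into the unpacked distribution, and simply returns a hash-ref, leaving it to iterate to record that value with the other results.

```perl
use strict;
use warnings;

# Hypothetical visit callback (illustration only). Under iterate(), the
# default enter action has already chdir'd into the unpacked distribution,
# so relative file tests look inside it.
sub report_build_tool {
    my $job = shift;
    my $tool = -f 'Build.PL'    ? 'Build.PL'
             : -f 'Makefile.PL' ? 'Makefile.PL'
             :                    'none';
    warn "$job->{distfile}: no build file\n"
        if $tool eq 'none' && !$job->{quiet};
    # The return value is added to $job->{result} by iterate(), like the
    # extract result, so later actions (e.g. finish) can read it
    return {
        distfile => $job->{distfile},
        tool     => $tool,
        dir      => $job->{result}{extract},    # unpacked directory path
    };
}
```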
Callbacks occur in the following order. Some callbacks skip further processing if the return value is false.
check - determines whether the distribution should be processed; goes to the next file if false; the default always returns true
start - used for any setup, logging, etc.; the default does nothing
extract - operates on the tarball to prepare for visiting; skips to the finish action if it returns a false value; the default extracts a distribution into a temp directory and returns the path to the extracted directory; if the stash has a true value for prefer_bin, binary tar, etc. will be preferred. This is faster, but less portable.
enter - skips to the finish action if it returns false; the default takes the result of extract, chdirs into it, and returns the original directory; if the extract result is missing the +x permission, this will attempt to add it before calling chdir.
visit - examine the distribution or otherwise do stuff; the default does nothing
leave - the default returns to the original directory (the result of enter)
finish - any teardown processing, logging, etc.
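The ordering and skip rules above can be summarized as a small dispatcher. This is a simplified model written for illustration, not CPAN::Visitor's implementation: it does no forking, supplies no default actions, and expects every callback to be given. run_job is a hypothetical name.

```perl
use strict;
use warnings;

# Simplified model of the documented callback order and skip rules.
sub run_job {
    my ( $job, %cb ) = @_;
    my $res = $job->{result} ||= {};

    # check: a false value skips this file entirely (finish is not called)
    return unless $res->{check} = $cb{check}->($job);
    $res->{start} = $cb{start}->($job);

    # extract / enter: a false value jumps straight to finish
    if ( $res->{extract} = $cb{extract}->($job) ) {
        if ( $res->{enter} = $cb{enter}->($job) ) {
            $res->{visit} = $cb{visit}->($job);
            $res->{leave} = $cb{leave}->($job);
        }
    }
    $res->{finish} = $cb{finish}->($job);
    return 1;
}
```

Storing each return value under its action name mirrors how $job->{result}{extract} is described, so a later callback can see what earlier ones produced.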
These allow complete customization of the iteration process. For example, one could do something like this:
* replace the default extract callback with one that returns an arrayref of distribution files without actually unpacking the archive into a physical directory
* replace the default enter callback with one that does nothing but return a true value; replace the default leave callback likewise
* have the visit callback get the $job->{result}{extract} listing and examine it for the presence of certain files
This could potentially speed up iteration if only the file names within the distribution are of interest and not the contents of the actual files.
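A sketch of that approach, using the core Archive::Tar module's list_archive class method to read member names without extracting anything. It assumes tar-based archives only (.tar, .tar.gz, .tgz; gzip support needs IO::Zlib), so .zip and other distributions make extract return false and go straight to finish. The %listing_callbacks hash is a hypothetical name.

```perl
use strict;
use warnings;
use Archive::Tar;

# Sketch of the "list, don't unpack" customization (illustration only).
my %listing_callbacks = (
    extract => sub {
        my $job = shift;
        # Assumption: only tar archives are handled; anything else is false
        return unless $job->{distpath} =~ /\.(?:tar|tar\.gz|tgz)\z/;
        # Member names only; nothing is written to disk
        my @names = Archive::Tar->list_archive( $job->{distpath} );
        return @names ? \@names : undef;
    },
    enter => sub { 1 },    # nothing to chdir into
    leave => sub { 1 },    # so nothing to return from
    visit => sub {
        my $job  = shift;
        my $list = $job->{result}{extract};
        # Count top-level Build.PL entries, e.g. "Foo-1.0/Build.PL"
        return scalar grep { m{\A(?:\./)?[^/]+/Build\.PL\z} } @$list;
    },
);

# Usage (requires CPAN::Visitor and a local mirror):
#   $visitor->iterate( %listing_callbacks );
```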
SEE ALSO
SUPPORT
Bugs / Feature Requests
Please report any bugs or feature requests through the issue tracker at https://github.com/dagolden/CPAN-Visitor/issues. You will be notified automatically of any progress on your issue.
Source Code
This is open source software. The code repository is available for public review and contribution under the terms of the license.
https://github.com/dagolden/CPAN-Visitor
git clone https://github.com/dagolden/CPAN-Visitor.git
AUTHOR
David Golden <dagolden@cpan.org>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2010 by David Golden.
This is free software, licensed under:
The Apache License, Version 2.0, January 2004