NAME
PAUSE::Users - interface to PAUSE's users file (00whois.xml)
SYNOPSIS
use PAUSE::Users;
my $users = PAUSE::Users->new(max_age => '1 day');
my $iterator = $users->user_iterator();
while (defined(my $user = $iterator->next_user)) {
    print "PAUSE id = ", $user->id, "\n";
    print "Name = ", $user->fullname, "\n";
}
DESCRIPTION
PAUSE::Users provides an interface to the 00whois.xml file produced by the Perl Authors Upload Server (PAUSE). This file contains a list of all PAUSE users, with some basic information about each user.
By default PAUSE::Users will request the file from PAUSE at most once a day, using a locally cached copy otherwise. You can change the caching time with the max_age attribute, using any of the expressions supported by Time::Duration::Parse.
At the moment this module supports a single iterator interface. The next_user() method returns an instance of PAUSE::Users::User (I know, bit of an odd name).
Here's the simple skeleton for iterating over all PAUSE users:
my $iterator = PAUSE::Users->new()->user_iterator();
while (my $user = $iterator->next_user) {
    # do something with $user
}
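As a slightly fuller illustration of the loop above, here is a sketch that collects the PAUSE ids of users who currently have a CPAN directory. The function name ids_with_cpandir is made up for this example. It only assumes an iterator object with a next_user method that returns undef at the end, and user objects that provide id and has_cpandir, as described in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: walk any iterator that provides next_user()
# and return the ids of users whose has_cpandir flag is true.
sub ids_with_cpandir {
    my ($iterator) = @_;
    my @ids;

    while (defined(my $user = $iterator->next_user)) {
        push @ids, $user->id if $user->has_cpandir;
    }

    return @ids;
}
```

With the real module you would probably call it as ids_with_cpandir(PAUSE::Users->new->user_iterator), which fetches 00whois.xml or reuses the cached copy.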
Constructor
The constructor takes the following attributes:
cache_path Specify the full path to the local file where the contents of 00whois.xml should be cached. If not set, an appropriate path for your operating system will be generated using File::HomeDir.
If you don't set this attribute, then after instantiating PAUSE::Users you can get this attribute to see where the content is being cached.
path The full path to your own copy of 00whois.xml. If this is provided, then PAUSE::Users won't check to see if CPAN's copy is more recent than your file.
max_age The maximum age for the cached copy, which is stored in the file referenced by the cache_path attribute. If your cached copy was updated within the last max_age seconds, then PAUSE::Users won't even check whether the CPAN copy has been updated. You can specify max_age using any of the notations supported by Time::Duration::Parse. It defaults to '1 day'.
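The max_age rule can be pictured with a small sketch. This is an illustration of the rule as described here, not the module's actual implementation. The function name cache_is_fresh and the use of the file's modification time are assumptions, and $max_age_seconds stands for max_age after it has been converted to seconds.

```perl
use strict;
use warnings;

# Illustration only: decide whether a cached file is recent enough to
# be used without contacting the server.
sub cache_is_fresh {
    my ($cache_path, $max_age_seconds) = @_;

    # No cached copy yet, so it cannot be fresh.
    return 0 unless -f $cache_path;

    my $mtime = (stat $cache_path)[9];    # last modification time (epoch)
    return (time() - $mtime) < $max_age_seconds ? 1 : 0;
}
```

For the default of '1 day', the second argument would be 86400.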
The user object
The user object supports the following methods:
- id
-
The user's PAUSE id. For example my PAUSE id is NEILB.
- fullname
-
The full name of the user, as they would write it. So expect to see Kanji and plenty of other non-ASCII characters here. You are UTF-8 clean, right?
- asciiname
-
An ASCII version of the user's name. This might be the romaji version of a Japanese name, or the fullname without any accents. For example, author NANIS has fullname A. Sinan Ünür, and asciiname A. Sinan Unur.
- email
-
The contact email address for the author, or CENSORED if the author specified that their email address should not be shared.
- has_cpandir
-
Set to 1 if the author has a directory on CPAN, and 0 if not. This is only true (1) if the author currently has something on CPAN. If you upload a dist then delete it, the dist will be on BackPAN but not on CPAN, and has_cpandir will return 0.
- homepage
-
The author's homepage, if they've specified one. This might be their blog, their employer's home page, or any other URL they've chosen to associate with their account.
- introduced
-
When the author's PAUSE account was created, specified as seconds since the epoch. This may change to being an instance of DateTime.
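Two of these values usually need a little handling before display: the contact email address may be the literal string CENSORED, and the account creation time is an epoch value. The helpers below are a minimal sketch of that handling. The names public_email and introduced_date are hypothetical, and treating a missing value as undef is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: return the email address, or nothing (undef in
# scalar context) when the address is missing or reported as CENSORED.
sub public_email {
    my ($email) = @_;
    return if !defined $email || $email eq 'CENSORED';
    return $email;
}

# Hypothetical helper: format an epoch value as a UTC date (YYYY-MM-DD).
sub introduced_date {
    my ($epoch) = @_;
    return unless defined $epoch;
    return strftime('%Y-%m-%d', gmtime($epoch));
}
```

Inside the iteration loop these could be called as public_email($user->email) and introduced_date($user->introduced).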
00whois.xml file format
The meat of the file is a list of <cpanid> elements, each of which contains details of one PAUSE user:
<?xml version="1.0" encoding="UTF-8"?>
<cpan-whois xmlns='http://www.cpan.org/xmlns/whois'
last-generated='Sat Nov 16 18:19:01 2013 UTC'
generated-by='/home/puppet/pause/cron/cron-daily.pl'>
...
<cpanid>
<id>NEILB</id>
<type>author</type>
<fullname>Neil Bowers</fullname>
<email>neil@bowers.com</email>
<has_cpandir>1</has_cpandir>
</cpanid>
...
</cpan-whois>
In addition to all PAUSE users, the underlying file (00whois.xml) also contains details of perl.org mailing lists. For example, here's the entry for Perl5-Porters:
<cpanid>
<id>P5P</id>
<type>list</type>
<asciiname>The Perl5 Porters Mailing List</asciiname>
<email>perl5-porters@perl.org</email>
<info>Mail perl5-porters-subscribe@perl.org</info>
<has_cpandir>0</has_cpandir>
</cpanid>
All list type entries are ignored by PAUSE::Users.
NOTES
I started off trying a couple of XML modules, but I was surprised at how slow they were, and they were not really iterator-friendly. So the current version of the iterator does line-based parsing using regexps. You really shouldn't do that, but 00whois.xml is automatically generated and follows a well-defined format that very rarely changes.
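To give a feel for what line-based parsing of this format can look like, here is a rough sketch. It is not the module's real parser. The function name parse_whois_lines is invented, and it assumes one element per line, as in the samples in this document. It does not decode XML entities.

```perl
use strict;
use warnings;

# Sketch only: collect <cpanid> records line by line into hashes,
# skipping entries whose <type> is "list".
sub parse_whois_lines {
    my (@lines) = @_;
    my (@users, $current);

    for my $line (@lines) {
        if ($line =~ m{<cpanid>}) {
            $current = {};                      # start of a new record
        }
        elsif ($line =~ m{</cpanid>}) {
            push @users, $current
                if $current && ($current->{type} // '') ne 'list';
            $current = undef;                   # end of the record
        }
        elsif ($current && $line =~ m{<(\w+)>(.*?)</\1>}) {
            $current->{$1} = $2;                # e.g. id, fullname, email
        }
    }

    return @users;
}
```

Each returned hash reference holds the element names as keys, so $users[0]{id} would be the PAUSE id of the first author in the input.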
SEE ALSO
Parse::CPAN::Whois is another module that parses 00whois.xml, but you have to download it yourself first.
Parse::CPAN::Authors is another module for getting information about PAUSE users, but based on 01.mailrc.txt.gz.
CPAN::Index::API::File::Whois provides a similar interface to 00whois.xml.
CPAN::Search::Author does a real-time search for CPAN authors using search.cpan.org.
CPAN::Source fetches 4 of the PAUSE indices and lets you query an aggregation of the data they contain.
PAUSE::Permissions, PAUSE::Packages.
REPOSITORY
https://github.com/neilbowers/PAUSE-Users
AUTHOR
Neil Bowers <neilb@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Neil Bowers <neilb@cpan.org>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.