NAME

PAUSE::Users - interface to PAUSE's users file (00whois.xml)

SYNOPSIS

use PAUSE::Users;

my $users    = PAUSE::Users->new(max_age => '1 day');
my $iterator = $users->user_iterator();

while (defined(my $user = $iterator->next_user)) {
  print "PAUSE id = ", $user->id, "\n";
  print "Name     = ", $user->fullname, "\n";
}

DESCRIPTION

PAUSE::Users provides an interface to the 00whois.xml file produced by the Perl Authors Upload Server (PAUSE). This file contains a list of all PAUSE users, with some basic information about each user.

By default PAUSE::Users will request the file from PAUSE at most once a day, using a locally cached copy otherwise. You can change the caching time with the max_age attribute, which accepts any of the expressions supported by Time::Duration::Parse.

At the moment this module supports a single iterator interface. The next_user() method returns an instance of PAUSE::Users::User (I know, bit of an odd name).

Here's the simple skeleton for iterating over all PAUSE users:

my $iterator = PAUSE::Users->new()->user_iterator();

while (my $user = $iterator->next_user) {
   # do something with $user
}

Constructor

The constructor takes the following attributes:

  • cache_path Specify the full path to the local file where the contents of 00whois.xml should be cached. If not set, an appropriate path for your operating system will be generated using File::HomeDir.

    If you don't set this attribute, then after instantiating PAUSE::Users you can read it back to see where the content is being cached.

  • path The full path to your own copy of 00whois.xml. If this is provided, then PAUSE::Users won't check to see if CPAN's copy is more recent than your file.

  • max_age The maximum age for the cached copy, which is stored in the file referenced by the cache_path attribute. If your cached copy was updated within the last max_age seconds, then PAUSE::Users won't even check whether the CPAN copy has been updated.

    You can specify the max_age using any of the notations supported by Time::Duration::Parse. It defaults to '1 day'.
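
As a rough illustration of the attributes above, here is a minimal sketch of two ways to construct the object. The file locations are hypothetical placeholders, not paths the module uses itself, and the second constructor call only runs if that file actually exists.

```perl
use strict;
use warnings;
use PAUSE::Users;

# Keep the cached copy in an explicit location and refresh it
# at most every 12 hours. The path is a hypothetical example.
my $users = PAUSE::Users->new(
    cache_path => '/tmp/pause-00whois.xml',
    max_age    => '12 hours',
);
print "Cache file: ", $users->cache_path, "\n";

# Alternatively, work from your own copy of 00whois.xml.
# With 'path' set, no freshness check is made against CPAN.
my $own_copy = '/path/to/00whois.xml';    # hypothetical location
if (-f $own_copy) {
    my $offline  = PAUSE::Users->new(path => $own_copy);
    my $iterator = $offline->user_iterator();
    my $first    = $iterator->next_user;
    print "First user: ", $first->id, "\n" if defined $first;
}
```

Setting path is mainly useful when you already mirror the file yourself, or when you want repeatable results against a fixed snapshot.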

The user object

The user object supports the following methods:

id

The user's PAUSE id. For example, my PAUSE id is NEILB.

fullname

The full name of the user, as they would write it. So expect to see Kanji and plenty of other non-ASCII characters here. You are UTF-8 clean, right?

asciiname

An ASCII version of the user's name. This might be the romaji version of a Japanese name, or the fullname without any accents. For example, author NANIS has fullname A. Sinan Ünür, and asciiname A. Sinan Unur.

email

The contact email address for the author, or CENSORED if the author specified that their email address should not be shared.

has_cpandir

Set to 1 if the author has a directory on CPAN, and 0 if not. This is only true (1) if the author currently has something on CPAN. If you upload a dist then delete it, the dist will be on BackPAN but not on CPAN, and has_cpandir will return 0.

homepage

The author's homepage, if they've specified one. This might be their blog, their employer's home page, or any other URL they've chosen to associate with their account.

introduced

When the author's PAUSE account was created, specified as seconds since the epoch. This may change to being an instance of DateTime.
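
To tie the methods above together, here is a small sketch that lists a handful of authors who currently have something on CPAN. It assumes that optional fields such as homepage and introduced come back as undef when they are missing from the file, which is a guess rather than documented behaviour, so each one is guarded. The limit of five users is arbitrary.

```perl
use strict;
use warnings;
use PAUSE::Users;

my $iterator = PAUSE::Users->new()->user_iterator();
my $shown    = 0;

while (defined(my $user = $iterator->next_user)) {
    # Only authors who currently have a directory on CPAN.
    next unless $user->has_cpandir;

    # The email is the literal string CENSORED when withheld.
    my $email = $user->email // '(none)';
    $email = '(withheld)' if $email eq 'CENSORED';

    # 'introduced' is seconds since the epoch; format it as UTC.
    my $joined = defined $user->introduced
               ? scalar gmtime($user->introduced)
               : '(unknown)';

    my $homepage = $user->homepage // '(none)';

    print join("\t", $user->id, $email, $joined, $homepage), "\n";

    last if ++$shown >= 5;    # arbitrary limit for the example
}
```

If introduced does change to a DateTime instance in a later release, the gmtime call is the only line that would need to change.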

00whois.xml file format

The meat of the file is a list of <cpanid> elements, each of which contains details of one PAUSE user:

<?xml version="1.0" encoding="UTF-8"?>
<cpan-whois xmlns='http://www.cpan.org/xmlns/whois'
           last-generated='Sat Nov 16 18:19:01 2013 UTC'
           generated-by='/home/puppet/pause/cron/cron-daily.pl'>
 
 ...
 
 <cpanid>
  <id>NEILB</id>
  <type>author</type>
  <fullname>Neil Bowers</fullname>
  <email>neil@bowers.com</email>
  <has_cpandir>1</has_cpandir>
 </cpanid>
 
 ...
 
</cpan-whois>

In addition to all PAUSE users, the underlying file (00whois.xml) also contains details of perl.org mailing lists. For example, here's the entry for Perl5-Porters:

<cpanid>
 <id>P5P</id>
 <type>list</type>
 <asciiname>The Perl5 Porters Mailing List</asciiname>
 <email>perl5-porters@perl.org</email>
 <info>Mail perl5-porters-subscribe@perl.org</info>
 <has_cpandir>0</has_cpandir>
</cpanid>

All entries of type list are ignored by PAUSE::Users.

NOTES

I started off trying a couple of XML modules, but I was surprised at how slow they were, and they weren't really iterator-friendly. So the current version of the iterator does line-based parsing using regexps. You really shouldn't do that, but 00whois.xml is automatically generated and follows a well-defined format that very rarely changes.
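
To give a feel for the approach, here is a minimal sketch of line-based parsing with regexps over the format described above. It is not the module's actual code: make_iterator is a hypothetical helper, the records are returned as plain hashrefs rather than PAUSE::Users::User objects, and the sample data is an abbreviated copy of the entries shown earlier. It relies on each element sitting on its own line, as in the generated file.

```perl
use strict;
use warnings;

# Abbreviated sample in the 00whois.xml layout shown above.
my $xml = <<'END_XML';
<cpan-whois xmlns='http://www.cpan.org/xmlns/whois'>
 <cpanid>
  <id>NEILB</id>
  <type>author</type>
  <fullname>Neil Bowers</fullname>
  <email>neil@bowers.com</email>
  <has_cpandir>1</has_cpandir>
 </cpanid>
 <cpanid>
  <id>P5P</id>
  <type>list</type>
  <asciiname>The Perl5 Porters Mailing List</asciiname>
  <email>perl5-porters@perl.org</email>
  <has_cpandir>0</has_cpandir>
 </cpanid>
</cpan-whois>
END_XML

# Hypothetical helper: returns a closure that yields one hashref
# per author entry, and undef once the input is exhausted.
sub make_iterator {
    my ($fh) = @_;
    return sub {
        my %record;
        my $in_record = 0;
        while (my $line = <$fh>) {
            if ($line =~ m{<cpanid>}) {
                %record    = ();
                $in_record = 1;
            }
            elsif ($line =~ m{</cpanid>}) {
                $in_record = 0;
                # Skip mailing lists; only return author entries.
                return {%record} if ($record{type} // '') eq 'author';
            }
            elsif ($in_record && $line =~ m{<(\w+)>(.*?)</\1>}) {
                $record{$1} = $2;    # element name => text content
            }
        }
        return;
    };
}

# Read the sample through an in-memory filehandle.
open(my $fh, '<', \$xml) or die "Cannot open in-memory file: $!";
my $next_user = make_iterator($fh);

my @ids;
while (defined(my $user = $next_user->())) {
    push @ids, $user->{id};
    print "$user->{id}: $user->{fullname}\n";
}
```

Run against the sample, this prints a single line for NEILB and silently skips the P5P list entry. Because only one record is held in memory at a time, the same loop works on the full file without loading it all at once.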

SEE ALSO

Parse::CPAN::Whois is another module that parses 00whois.xml, but you have to download it yourself first.

Parse::CPAN::Authors is another module for getting information about PAUSE users, but based on 01.mailrc.txt.gz.

CPAN::Index::API::File::Whois provides a similar interface to 00whois.xml.

CPAN::Search::Author does a real-time search for CPAN authors using search.cpan.org.

CPAN::Source fetches 4 of the PAUSE indices and lets you query an aggregation of the data they contain.

PAUSE::Permissions, PAUSE::Packages.

REPOSITORY

https://github.com/neilbowers/PAUSE-Users

AUTHOR

Neil Bowers <neilb@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Neil Bowers <neilb@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.