NAME

Logfile::EPrints - Parse Apache logs from GNU EPrints

SYNOPSIS

  use Logfile::EPrints;

  my $parser = Logfile::EPrints->new(
    handler => Logfile::Repeated->new(
      handler => Logfile::Institution->new(
        handler => MyHandler->new,
      ),
    ),
    identifier => 'oai:myir:', # Prepended to the eprint id
  );
  open(my $fh, '<', 'access_log') or die "Cannot open access_log: $!";
  $parser->parse_file($fh);

  package MyHandler;

  sub new { ... }
  sub AUTOLOAD { ... } # Receives every callback not defined below
  # Called for each full-text download
  sub fulltext {
    my ($self, $hit) = @_;
    printf("%s from %s requested %s (%s)\n",
      $hit->hostname || $hit->address,
      $hit->institution || 'Unknown',
      $hit->page,
      $hit->identifier,
    );
  }

DESCRIPTION

The Logfile::* modules provide a means to analyze log files from Web servers (typically Institutional Repositories) by translating HTTP requests into more informative data, e.g. a full-text download by a user at Caltech.

The design is a chain of pluggable filters. The first filter in the chain reads a log file or stream and converts each line from the log file format into a Perl object representing a single "hit", which it passes to the next filter through a callback. Subsequent filters can then ignore hits (e.g. from robots) and/or augment them with additional data (e.g. country of origin by GeoIP) before passing them on.
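The chaining described above can be sketched in plain Perl. This is only an illustration of the pattern, not the code of the Logfile::* modules: the Toy::* packages, the hash-based hits, the agent, address and country keys, and the lookup table are all hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Final handler: collects the hits that survive the chain.
package Toy::Collector;
sub new { my ($class) = @_; return bless { hits => [] }, $class }
sub fulltext { my ($self, $hit) = @_; push @{ $self->{hits} }, $hit; return }

# Filter that ignores hits whose user agent looks like a robot.
package Toy::SkipRobots;
sub new { my ($class, %args) = @_; return bless { handler => $args{handler} }, $class }
sub fulltext {
    my ($self, $hit) = @_;
    return if $hit->{agent} =~ /bot|crawler|spider/i;   # dropped: not passed on
    return $self->{handler}->fulltext($hit);
}

# Filter that augments each hit, then passes it on.
package Toy::AddCountry;
# Made-up lookup table standing in for a real GeoIP database.
my %COUNTRY_BY_ADDRESS = ('192.0.2.1' => 'GB');
sub new { my ($class, %args) = @_; return bless { handler => $args{handler} }, $class }
sub fulltext {
    my ($self, $hit) = @_;
    $hit->{country} = $COUNTRY_BY_ADDRESS{ $hit->{address} } || 'Unknown';
    return $self->{handler}->fulltext($hit);
}

package main;

# Build the chain from the outside in, as in the SYNOPSIS.
my $collector = Toy::Collector->new;
my $chain = Toy::SkipRobots->new(
    handler => Toy::AddCountry->new(handler => $collector),
);

# Plain hash references stand in for real hit objects here.
$chain->fulltext({ address => '192.0.2.1', agent => 'Mozilla/5.0' });
$chain->fulltext({ address => '192.0.2.2', agent => 'ExampleBot/1.0' });
$chain->fulltext({ address => '192.0.2.3', agent => 'Mozilla/5.0' });

printf "%s (%s)\n", $_->{address}, $_->{country} for @{ $collector->{hits} };
```

Each filter only needs to know the next handler in the chain, so filters can be reordered or removed without touching the others. In this sketch the robot hit never reaches the collector, and the two remaining hits arrive with a country field added.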

CALLBACKS

See Logfile::Hit for the fields available from the 'hit' object.

Filter Callbacks

abstract($handler,$hit)
browse($handler,$hit)
fulltext($handler,$hit)
repeated($handler,$hit)

The repeated callback is implemented by Logfile::Repeated.

search($handler,$hit)
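A handler does not have to define every callback listed above. As a minimal sketch, assuming a handler that only wants to count how often each callback fires, Perl's AUTOLOAD can catch all of them in one place. The Toy::Counter package and its count field are hypothetical, and the empty hash references stand in for real hit objects.

```perl
use strict;
use warnings;

package Toy::Counter;

our $AUTOLOAD;   # Perl stores the name of the missing method here

sub new { my ($class) = @_; return bless { count => {} }, $class }

# Catch every callback that has no explicit method and count it by name.
sub AUTOLOAD {
    my ($self, $hit) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;               # strip the package prefix
    return if $name eq 'DESTROY';    # object destruction is not a callback
    $self->{count}{$name}++;
    return;
}

package main;

my $counter = Toy::Counter->new;
$counter->abstract({});
$counter->fulltext({});
$counter->fulltext({});
$counter->search({});

# Print one line per callback name seen, sorted alphabetically.
printf "%-8s %d\n", $_, $counter->{count}{$_} for sort keys %{ $counter->{count} };
```

A callback with an explicit method, such as fulltext in the SYNOPSIS, takes precedence over AUTOLOAD, so the two approaches can be mixed in one handler.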

SEE ALSO

AUTHOR

Timothy D Brody, <tdb01r@ecs.soton.ac.uk>

TODO

Robots filter:

Exclude users that request robots.txt (probably requires persistent storage)

Exclude users by user-agent string

COPYRIGHT AND LICENSE

Copyright (C) 2005 by Timothy D Brody

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.
