NAME
Logfile - Perl extension for generating reports from logfiles
SYNOPSIS
use Logfile::Cern;
$l = new Logfile::Cern File => 'cache.log.gz',
Group => [Domain,File,Hour];
$l->report(Group => File, Sort => Records);
$l->report(Group => Domain, Sort => Bytes);
$l->report(Group => Hour, List => [Bytes, Records]);
use Logfile::Wftp;
[...]
DESCRIPTION
The Logfile extension will help you to generate various reports from different server logfiles. In general there is no restriction as to what information you extract from the logfiles.
Reading the files
The package can be customized by subclassing Logfile
.
A subclass should provide a function next
which reads the next record from the file handle $self->{Fh}
and returns an object of type Logfile::Record
. In addition a function norm
may be specified to normalize the various record fields.
Here is a shortened version of the Logfile::Cern
class:
package Logfile::Cern;
@ISA = qw ( Logfile::Base ) ;
sub next {
my $self = shift;
my $fh = $self->{Fh};
*S = $fh;
my ($line,$host,$user,$pass,$rest,$date,$req,$code,$bytes);
($host,$user,$pass,$rest) = split ' ', $line, 4;
($rest =~ s!\[([^\]]+)\]\s*!!) && ($date = $1);
($rest =~ s!\"([^\"]+)\"\s*!!) && ($req = (split ' ', $1)[1]);
($code, $bytes) = split ' ', $rest;
Logfile::Record->new(Host => $host,
Date => $date,
File => $req,
Bytes => $bytes);
}
As stated above, in general you are free to choose the fields you enter in the record. But:
- Date
-
should be a valid date string. For conversion to the seconds elapsed since the start of epoch the modules GetDate and Date::DateParse are tried. If both cannot be
use
ed, a crude build-in module is used.The record constructor replaces Date by the date in
yymmdd
form to make it sortable. Also the field Hour is padded in. - Host
-
Setting Host will also set field Domain by the verbose name of the country given by the domain suffix of the fully qualified domain name (hostname.domain).
foo.bar.PG
will be mapped toPapua New
. Host names containing no dot will be assigned to the domain Local. IP numbers will be assigned to the domain Unresolved. Mapping of short to long domain names is done in the Net::Country extension which might be useful in other contexts:use Net::Country; $germany = Net::Country::Name('de');
- Records
-
is always set to 1 in the
Record
constructor. So this field gives the number of successful returns from thenext
function.
Here is the shortened optional norm
method:
sub norm {
my ($self, $key, $val) = @_;
if ($key eq File) {
$val =~ s/\?.*//; # remove query
$val =~ s!%([\da-f][\da-f])!chr(hex($1))!eig; # decode escapes
}
$val;
}
The constructor reads in a logfile and builds one or more indices.
$l = new Logfile::Cern File => 'cache.log.gz',
Group => [Host,Domain,File,Hour,Date];
There is little space but some time overhead in generating additional indexes. If the File parameter is not given, STDIN is used. The Group parameter may be a field name or a reference to a list of field names. Only the field names given as constructor argument can be used for report generation.
Report Generation
The Index to use for a report must be given as the Group parameter. Output is sorted by the index field unless a Sort parameter is given. Also the output can be truncated by a Top argument or Limit.
The report generator lists the fields Bytes and Records for a given index. The option List may be a single field name or a reference to an array of field names. It specifies which field should be listed in addition to the Group field. List defaults to Records.
$l->report(Group => Domain, List => [Bytes, Records])
Output is sorted by the Group field unless overwritten by a Sort option. Default sorting order is increasing for Date and Hour fields and decreasing for all other Fields. The order can be reversed using the Reverse option.
This code
$l->report(Group => File, Sort => Records, Top => 10);
prints:
File Records
=====================================
/htbin/SFgate 30 31.58%
/freeWAIS-sf/* 22 23.16%
/SFgate/SFgate 8 8.42%
/SFgate/SFgate-small 7 7.37%
/icons/* 4 4.21%
/~goevert 3 3.16%
/journals/SIGMOD 3 3.16%
/SFgate/ciw 2 2.11%
/search 1 1.05%
/reports/96/ 1 1.05%
Here are other examples. Also take a look at the t/* files.
$l->report(Group => Domain, Sort => Bytes);
Domain Records
===============================
Germany 12 12.63%
Unresolved 8 8.42%
Israel 34 35.79%
Denmark 4 4.21%
Canada 3 3.16%
Network 6 6.32%
US Commercial 14 14.74%
US Educational 8 8.42%
Hong Kong 2 2.11%
Sweden 2 2.11%
Non-Profit 1 1.05%
Local 1 1.05%
$l->report(Group => Hour, List => [Bytes, Records]);
Hour Bytes Records
======================================
07 245093 17.66% 34 35.79%
08 438280 31.59% 19 20.00%
09 156730 11.30% 11 11.58%
10 255451 18.41% 16 16.84%
11 274521 19.79% 10 10.53%
12 17396 1.25% 5 5.26%
Report options
- Group
=>
field -
Mandatory. field must be one of the fields passed to the constructor.
- List
=>
field - List
=>
[field, field] -
List the subtotals for fields. Defaults to Records.
- Sort
=>
field. -
Sort output by field. By default, Date and Hour are sorted in increasing order, whereas all other fields are sorted in decreasing order.
- Reverse
=> 1
-
Reverse sorting order.
- Top
=>
number -
Print only the first number subtotals.
- Limit
=>
number -
Print only the subtotals with Sort field greater than number (less than number if sorted in increasing order).
Currently reports are simply printed to STDOUT.
AUTHOR
Ulrich Pfeifer <pfeifer@wait.de>
NEWS
Fixed strict refs bug for perl 5.005.
Fixed bug in fallback to included date parsing reported by James Downs.
Fixed y2k bug as suggested by Fred Korz. I chose the two digit version to be a backward compatible as possible. The 20
will be obvious a few years from now ;-) Output columns should now be separated by whitespace in any case.
SEE ALSO
perl(1).