NAME
Parse::IRCLog -- parse internet relay chat logs
VERSION
version 1.101
$Id: /my/cs/projects/irclog/trunk/lib/Parse/IRCLog.pm 27907 2006-11-13T15:40:37.121620Z rjbs $
SYNOPSIS
use Parse::IRCLog;
my $result = Parse::IRCLog->parse("perl-2004-02-01.log");
my %to_print = ( msg => 1, action => 1 );
for ($result->events) {
  next unless $to_print{ $_->{type} };
  print "$_->{nick}: $_->{text}\n";
}
DESCRIPTION
This module provides a simple framework to parse IRC logs in arbitrary formats.
A parser has a set of regular expressions for matching different events that occur in an IRC log, such as "msg" and "action" events. Each line in the log is matched against these rules and a result object, representing the event stream, is returned.
The rule set, described in greater detail below, can be customized by subclassing Parse::IRCLog. In this way, Parse::IRCLog can provide a generic interface for log analysis across many log formats, including custom formats.
Normally, the parse method is used to create a result set without storing a parser object, but a parser may be created and reused.
METHODS
new
-
This method constructs a new parser (with $class->construct) and initializes it (with $obj->init). Construction and initialization are separated for ease of subclassing initialization for future pipe dreams like guessing what ruleset to use.
construct
-
The parser constructor just returns a new, empty parser object. It should be a blessed hashref.
init
-
The initialization method configures the object, loading its ruleset.
patterns
-
This method returns a reference to a hash of regular expressions, which are used to parse the logs. Only a few, so far, are required by the parser, although internally a few more are used to break down the task of parsing lines.
action
matches an action; that is, the result of /ME in IRC. It should return the following matches:
$1 - timestamp
$2 - nick prefix
$3 - nick
$4 - the action
msg
matches a message; that is, the result of /MSG (or "normal talking") in IRC. It should return the following matches:
$1 - timestamp
$2 - nick prefix
$3 - nick
$4 - channel
$5 - the message text
Read the source for a better idea as to how these regexps break down. Oh, and for what it's worth, the default patterns are based on my boring, default irssi configuration. Expect more rulesets to be included in future distributions.
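To make the capture layout above concrete, here is a minimal, self-contained sketch of a pattern hash for an irssi-like format. The regexes, the sample lines, and the `%patterns` variable are assumptions made for illustration; they are not the module's actual defaults, which are best read from the source.

```perl
use strict;
use warnings;

# Hypothetical patterns (NOT the module's defaults) for lines such as:
#   12:34 <@rjbs:#perl> hello there
#   12:34  * rjbs waves
my %patterns = (
  msg => qr/\A
    (\d\d:\d\d) \s+        # group 1: timestamp
    < ([\@+%]?)            # group 2: nick prefix
    ([^\s:>]+)             # group 3: nick
    (?: : (\#[^\s>]+) )?   # group 4: channel, if logged
    > \s
    (.*)                   # group 5: message text
  \z/x,
  action => qr/\A
    (\d\d:\d\d) \s+        # group 1: timestamp
    \* \s+
    ([\@+%]?)              # group 2: nick prefix
    (\S+) \s+              # group 3: nick
    (.*)                   # group 4: the action
  \z/x,
);

# In list context a successful match returns the captured groups in order.
my @msg    = '12:34 <@rjbs:#perl> hello there' =~ $patterns{msg};
my @action = '12:34  * rjbs waves'             =~ $patterns{action};

print join('|', @msg),    "\n";
print join('|', @action), "\n";
```

A subclass would presumably return a reference to a hash like this from its own patterns method; whether the parser needs further keys is something to confirm against the source.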
parse($file)
-
This method parses the file named and returns a Parse::IRCLog::Result object representing the results. The parse method can be called on a parser object or on the class. If called on the class, a parser will be instantiated for the method call and discarded when parse returns.
parse_line($line)
-
This method is used internally by parse to turn each line into an event. While it could someday be made slick, it's adequate for now. It attempts to match each line against the required patterns from the patterns result and if successful returns a hashref describing the event.
If no match can be found, an "unknown" event is returned.
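The matching-with-fallback behavior described here can be sketched roughly as follows. This is a standalone illustration, not the module's implementation: the `line_to_event` helper, the two stand-in patterns, and the `timestamp` key are all assumptions (only `type`, `nick`, and `text` appear in the SYNOPSIS).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the "msg" and "action" patterns.
my %patterns = (
  msg    => qr/\A(\d\d:\d\d) <([\@+]?)([^:>\s]+)(?::([^>\s]+))?> (.*)\z/,
  action => qr/\A(\d\d:\d\d) \* ([\@+]?)(\S+) (.*)\z/,
);

# Turn one log line into an event hashref. Each pattern is tried in turn;
# a line that matches none of them becomes an "unknown" event.
sub line_to_event {
  my ($line) = @_;
  if (my @m = $line =~ $patterns{msg}) {
    return { type => 'msg', timestamp => $m[0], nick => $m[2], text => $m[4] };
  }
  if (my @m = $line =~ $patterns{action}) {
    return { type => 'action', timestamp => $m[0], nick => $m[2], text => $m[3] };
  }
  return { type => 'unknown', text => $line };
}

my @events = map { line_to_event($_) } (
  '10:01 <@rjbs> hello',
  '10:02 * rjbs waves',
  '--- Log opened Sun Feb 01 00:00:00 2004',
);

print "$_->{type}\n" for @events;   # prints msg, action, unknown (one per line)
```

Returning an "unknown" event rather than dropping the line keeps the event stream aligned with the log, so callers can decide for themselves which types to skip.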
TODO
Write a few example subclasses for common log formats.
Add a few more default event types: join, part, nick. Others?
Possibly make the patterns sub a module, to allow subclassing to override only one or two patterns. For example, to use the default nick pattern but override the nick_container or action_leader. This sounds like a very good idea, actually, now that I write it down.
AUTHOR
Ricardo SIGNES <rjbs@cpan.org>
COPYRIGHT
Copyright 2004 by Ricardo Signes.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.