NAME

Log::Log4perl::AutoCategorize - extended Log::Log4perl logging

ABSTRACT

Log::Log4perl::AutoCategorize extends Log::Log4perl's easy mode, adding 2 main features;

1. extended, automatic, transparent categorization capabilities
2. runtime information useful for:
  a. test-coverage assessment
  b. managing your logging config.
3. Minor convenience enhancements
  a. config-load in use statement
  b. Data::Dumper of ref args

There are several more mature alternatives which you should check out for comparison;

#1. search for Stealth Loggers in base POD
use Log::Log4perl qw(:easy);

#2. new functionality, developed at approx same time as AutoCategorize
use Log::Log4perl::Filter;

SYNOPSIS

  use Log::Log4perl::AutoCategorize
    (
     alias => 'Logger', # shorthand class-name alias
     # you can initialize in use statement
     #  1st way gives separation of code from config
     #  2nd way is good for demonstration and early development
     initfile => $filename,
     initstr => q{
         # see Log4perl docs to understand these directives
	 log4perl.rootLogger=DEBUG, A1
	 # log4perl.appender.A1=Log::Dispatch::Screen
	 log4perl.appender.A1 = Log::Dispatch::File
	 log4perl.appender.A1.filename = ./mylog
	 log4perl.appender.A1.mode = write
	 log4perl.appender.A1.layout = PatternLayout
	 log4perl.appender.A1.layout.ConversionPattern=%d %c %m%n
	 # create TEST-COVERAGE log
	 log4perl.appender.COVERAGE = Log::Dispatch::File
         log4perl.appender.COVERAGE.mode = write
	 log4perl.appender.COVERAGE.layout = org.apache.log4j.PatternLayout
	 log4perl.appender.COVERAGE.layout.ConversionPattern = (%d{HH:mm:ss.SSS}) %c: %m%n
	 # save timestamped versions, 1 per process
	 log4perl.appender.COVERAGE.filename = sub {"./test-coverage.txt.". scalar localtime}

	 # now, add the value: send the stuff written at END to it
	 log4perl.logger.Log.Log4perl.Autocategorize.END = INFO, COVERAGE
	 },
     );

  foreach (1..500) {
    Logger->warn($_);
    foo();
    A->bar();
    A::bar();
  }

  sub foo {
    foreach (1..20) {
	Logger->warn($_);
    }
  }

  package A;

  sub bar {
    my @d;
    foreach (reverse 1..10) {
	push @d, $_;
	Logger->warn("t-minus:", $_,\@d);
    }
  }

DESCRIPTION

Before diving in, a few notes:

This is not intended to document Log4perl (abbrev for Log::Log4perl); that documentation is a good tutorial, and is quite thorough and complete.

In this document, I use Logger as the official shorthand for Log::Log4perl::AutoCategorize, as it gets tedious and verbose to repeat the full name. There is module support for you to do this too; note that its used in the example above, ie: 'alias => Logger'. You can alias it as you prefer: myLogger, ourLogger, or you can subclass it.

I try to use call for the act of calling a method at runtime, and invocation to refer to the source-code (ie: package,sub,line-number) that is called. I hope that the distinction will help to bend the brick into a more hat-like shape.

This module requires perl 5.8, as the optimizer is unsupported in 5.6. You can use this module IFF you uncomment the __END__ before the optimization stuff is done (at bottom of file), but runtime costs are considerable (40% overhead), so I wont add this as an option.

AutoCategorization

The primary feature of this module is to extend Log4perl's flexibility by automatically and transparently creating logging categories for each invocation in your application code.

use Log::Log4perl ':easy' offers a comparable, but less capable feature; it automatically infers that $logCat = "$caller_package", and uses that category to properly handle the message at runtime.

package Foo;
use Log::Log4perl(':easy');

package Foo::Bar;
use Log::Log4perl(':easy');

This module extends the $logCat with info thats obtained at runtime, leveraging Log::Log4perl's categorization of log-events by giving more detail upon which to filter.

$logcat = "$package.$subname.$loglevel.$linenum";

Log4perl's bubble-up behavior is still in effect; if your log-config never specifies anything beyond $package, the $logCat of each call bubbles up thru the log-config, until it matches with your package-level config item.

In other words: since the $logcat contains more information than that provided by :easy, your logging configuration can exersize more control over what gets logged.

Note that you could have always used whatever categories you wanted, see "Configuration files" in Log::Log4perl, you just didnt get the help that made it easy to do so.

Test Coverage Results of your Application

When your code uses this module, the logging calls are categorized and counted; this is reported when the program terminates. The info is returned in 3 chunks.

Seen: How many times each invocation was called

The %Seen hash stores and counts all invocations of all functions, keyed by the category. The keys are prefixed with '#log4perl.category.', which simplifies cut-paste of this output into your log-config. This is variously called a usage report, a test-coverage report, a seen-report, etc.

(15:24:24.772) Logger.END.info.106: $Seen Log Events: = [
 {
   '#log4perl.category.A.bar.debug.55' => '-400',
   '#log4perl.category.A.bar.warn.54' => 400,
   '#log4perl.category.Log.Log4perl.AutoCategorize.END.info.201' => 1,
   '#log4perl.category.main.foo.warn.44' => 200,
   '#log4perl.category.main.info.32' => 10,
   '#log4perl.category.main.warn.31' => 10
 }
];

Note that A.bar.debug.55 above has a negative count. This indicates that the logging activity was suppressed by the configuration, but the code was still reached 400 times.

UnSeen: What invocations were never called

This report identifies the Logging invocations which were never called during this run of your program. This can be used systematically to assess how thoroughly you are testing your application.

(15:24:24.772) Logger.END.info.106: $un-Seen Log Events: = [
 {
   'debug_00011' => 'main,probe.pl,52',
   'info_00010' => 'main,probe.pl,51',
 }
];

Category to Munged-Name

This hash identifies the map between the categories and the unique function-name. It is provided primarily for completeness, it may prove useful for debugging and/or other things.

(19:44:06.210) Log.Log4perl.AutoCategorize.END.info.308: cat2data:, {
 'A.truck.debug.63' => 'debug_00005',
 'A.truck.debug.65' => 'debug_00006',
 'A.truck.debug.66' => 'debug_1_00007',
 'A.truck.warn.62' => 'warn_00004',
 ...
 }

Assessing Test Coverage

One way to use the test coverage facility is by running your test suite and collecting the individual coverage reports. Since the config-items are sorted in the report, its easy to compare 2 of them; you can tell what parts of the system are tested by each part of the test suite.

Log4perl can write log-files with configurable names (ex: a timestamped name, see the SYNOPSIS example). You can use this facility to write separate files for each sub-test, and then merge them (ex: `cat *.coverage-log|sort -u`), and quickly assess the total coverage provided by the entire test suite.

Keeping your Logging Config current

Log4perl documentation speaks of the hazards/limitations of categories; this is more critical here, because categories are highly leveraged. See "Pitfalls with Categories" in Log::Log4perl for more info.

This package is undisputably more vulnerable to such problems; it exposes line numbers, which change regularly during development. If you use numbers, be aware that adding comments to your code will change the category reported, rendering those log-config items ineffective for controlling output.

Moreover, you should recognize that as the numbers become more stable, theyre also less valuable; by then, youve reassesed and refined your choices wrt log-levels. (you have, yes?)

However, you can also be more disiplined (TMTOWTDI), using config-lines with a mix of specificity: package, package.method, and package.method.logginglevel. Here the coverage report helps, since it lists all executed logger-invocations, it serves as a pretty good roadmap to your application, letting you see the entire forest through the trees.

Also, the coverage report is formatted to be easily edited into a informative and useful logging configuration, ie:

1. comments/documents the 'existence' of a log-config entry
2. doesnt create the hierarchy in the logger-config, efficient
3. identifies all the code instrumentation available
4. serves as cut-paste fodder for your log-configs

Bottom line: I recommend that you start your log-config file by copying and editing the coverage report. Just remove the quotes put in by Data::Dumper, change '=>' to '=', remove '#' to activate lines, and change the count to DEBUG, WARN, etc.

What %Seen cant tell you

Because the %Seen coverage only includes the categories it sees at (a single) runtime, it cant 'remember' the aggregate coverage from previous runs.

The sort technique given above to assess aggregate test-coverage is not as useful here; it sorts commented lines away from their uncommented counterparts, but its a start.

I may add an option to support a separate 'remembering' aggregate via a tied variable thats initialized in a BEGIN block. This is a distant feature; I need to find and copy prior art from somewhere in CPAN before I design and implement this.

Filtering with Log::Log4perl::AutoCategorize

Restating in different words (at risk of redundancy), filtering with this package uses the base behavior of bubbling up a logging event through the config, and using generic config-items which are not overridden by more specific items.

Using part or all of the hierarchy of categories in your config-file, you can enable or disable logging on a class, a method in that class, particular logging levels with that method, or line by line.

log4perl.category.Frob = DEBUG			# 1
log4perl.category.Frob.nicate.debug = INFO		# 2
log4perl.category.Frob.nicate.debug.118 = DEBUG	# 3
#log4perl.category.Frob.Vernier.twiddle.debug = DEBUG	# 4

(1) enables debug logging accross the Frob class, but then (2) overrides that class-wide setting within the method nicate(), suppressing debug logging (by setting the threshold to INFO). (3) further overrides that setting by enabling the debug statement on line 118.

Tweaking your log config

If you look carefully at the example above, (2) may be confusing; it has 'debug = INFO', which looks like nonsense. The 'debug' is the log-level that the programmer wrote, it reflects his judgement as to the nature of the message.

The '= INFO' part tells you that the config user decided to suppress debug messages from nicate(), which were enabled class-wide by (1). (3) restores normal behavior of debug logging, but only for 1 invocation.

Note also that (4), even if it was uncommented, would have no effect on the logging, because its just saying that all debug() calls done by Frob::Vernier::twiddle() are issued at their default level. Keeping such lines commented is more efficient; it saves Log::Log4perl both cpu and memory by not loading config-items that arent meaningful.

Other Features

Automatic categorization, logging coverage, and fine-grained control of logging are the primary value-added features of this wrapper package. Other secondary features are;

Configuration Loading

If use Log::Log4perl::AutoCategorize is given an import list which includes initstr => q{} or initfile => $filename, the string or file is used to initialize Log::Log4perl. You can still use the old way, ie $base->init().

Classname Aliasing

Because it would be inconvenient to type the full classname for every statement, ex: Log::Log4perl::AutoCategorize->debug(), the package has the ability to alias itself into a classname of your choice, ex:

use Log::Log4perl::AutoCategorize ( alias => myLogger );
myLogger->debug("ok doing it");

Because aliasing is used in optimization phase, it must be imported via use, it cannot be deferred as in require, import.

Easy Initialization

You can initialize the base logger by any of the following;

# as use parameters, least typing
a. use Log::Log4perl::Autocategorize ( initstr  => $string );
b. use Log::Log4perl::Autocategorize ( initfile => $filename );

# assuming youve already used the package
c. Log::Log4perl::AutoCategorize->init( $filename );
d. Log::Log4perl::AutoCategorize->init( \$string );

# you can cheat, and use the base class name
e. Log::Log4perl->init( $filename );
f. Log::Log4perl->init( \$string );

# assuming youve aliased
g. Logger->init( $filename );
h. Logger->init( \$string );

i. # no explicit initialization at all (theres a default config)

The default config writes to stdout, and includes a test-coverage setup. This config is used at import() time, unless a or b is provided by you. If you explicitly call init(), the base is re-initialized, and relys on base behavior to work correctly.

I hope at some point to provide init_modify(), which will overlay new configuration on existing, rather than a full reinitialization. This will be implemented using Log4perls (currently nonexistent) configuration include mechanism.

Automatic Dump of structured data

If you pass in a data-ref, its rendered using Data::Dumper, making it easy to examine the details of the runtime context. Since the Dump is embedded inside the logging-method, its only called when needed, avoiding work to produce output that will just be thrown away.

SubClassing

As sayeth "Planning for the Future: Better Constructors" in perltoot, you make you package suitable for deriving new ones. test 04_subclass.t verifies this, though it wasnt the 1st thing I got working.

Module Architecture

The previous section hopefully provides sufficient motivation for why you should use this module, this section describes how the module works.

2 phase strategy to achieve both greater filtering capability, and similar or better efficiency.

Original AUTOLOAD functionality

In v0.001, AUTOLOAD did the entire job itself, primarily cuz it seemed better to write the code once, rather than 5 times (for debug, info, warn, error, fatal). Every time a call like Logger->warn() was reached, AUTOLOAD would get control, and do the following:

0. determine log-level which was called ($AUTOLOAD)
1. used caller() to construct a category string, $logcat, dynamically.
2. fetched the $logger singleton {$logger = get_logger($logcat)}
3. tested if $logcat was a loggable event {$logger->is_$loglevel()}
4. logged the message, or not, as appropriate.

Doing this repeatedly using caller() for every call is computationally expensive for a logging subsystem. If the module is to be usable application wide, it must be efficient.

Gaining efficiency

The canonical way to get speed from AUTOLOAD is to not call it; or more helpfully, to create the desired subroutine so that it gets called for all future calls to that method. But in this case, the decision reached in step 4 is different for every invocation, so it cannot be reduced to a single subroutine.

The only way to get custom behavior for each invocation is to make them each invoke a unique method name; ie to munge the method names where they are invoked. Once this is done, AUTOLOAD can then vivify subroutines to implement them, and then it wont be called again (for that unique method), avoiding the 0-4 overhead completely.

Using optimizer.pm

I use Simon Cozen''s optimizer.pm to convert the source code:

    FROM: {
	Logger->debug ("msg A");
	Logger->debug ("msg B");
    }
    TO:	{
	Logger->debug_001 ("msg A");
	Logger->debug_002 ("msg B");
	# etc...
    }

The function-munging produces a unique function name for each point in your code where you invoke a logging method. This allows AUTOLOAD (which handles all these calls) to distinguish each invocation, make a custom handler routine for it. Once the custom handler is added into the symbol table, it is called later without any overhead.

How optimizer munges

Your code, 'use optimizer => sub yourfunc;' operates at compile time, letting you search the opcodes for specific op-chains. Once youve got the right opchain, you can do the opcode surgery.

For example, we must recognize this opcode chain;

`perl -MO=Concise,-exec -e 'Logger->info_00010(2,"args")`; t76 <0> pushmark s t77 <$> const(PV "Logger") sM/BARE t78 <$> const(IV 2) sM t79 <$> const(PV "args") sM t7a <$> method_named(PVIV "info_00010") t7b <1> entersub[t23] vKS/TARG

Our criteria is (pushmark, const with PV="Logger", ..., method_named with PVIV=~$namePatt). We also recognize the opcodes that build arguments on the stack, so we can find the balanced method_named opcode. Finally, we insure that the method name being invoked is one of the allowed ones.

Caveat: you must use a specific coding style

To match the above criteria, your code must use *ONLY the 1st* of these 3 different coding styles;

IE: Logger->debug("foo");
NOT: $logger->debug("foo");
NOR: Logger::debug("foo");

This form provides the package name as a bareword (t77), which allows a rather strict (safe?) test; we dont want to stomp in others\' namespaces (except for the explicit takeover of the $Alias package)

The 2nd form is not safely detectable at compile time, since $logger type cannot be absolutely known, and we try not to do risky surgery. For example

# never inadvertently do this -
$logger->warn if ref $logger eq 'FlareGun';

The 3rd form is doable, but clutters the code which does $Alias recognition, for little syntactic gain. Currently, you wont even get a warning you if you use this construct.

lightweight customized subroutines

Given that all the customized anonymous subroutines are instantiated as either sub{} or sub{ logit("$cat") }; they all invoke a common logit() routine that does the real work.

The asubs are trivial wrappers, with the category value stringified and held in code, and passed into logit() when called. This doesnt shrink the number of symbol table entries, but it minimizes the size of the CVs hanging there.

To save more space, a common do_nothing() function could be used instead of the current invocation counting do-nothing, at the cost of losing test-coverage info.

AUTOLOAD revisited

Once method munging was in place, $AUTOLOAD would contain, for example; not "debug", but "debug_00011". Now that each invocation is unique, the distinctions can be computed once and remembered; ie they can be reduced to a dynamically created sub, and added to the symbol table.

IOW, these changes were made to AUTOLOAD:

4. computes the category using caller()
5. determines whether the base logger would send the message.
6. instantiates an anonymous subroutine reflecting 5.
7. adds the sub to the symbol table, with the munged method name.

At runtime, invocations like Logger->warn_$UIID() are called, and dispatched to AUTOLOAD. Once the munged function is in the symbol table, AUTOLOAD is never called again for this invocation.

Benchmarking

Ive done one Benchmark.pm based test (tshort.pl) which doesnt seem quite right. The simpler approach of using 2 similar test-scripts shows more convincing results.

$jimc@harpo logger]$ time t1.pl -n > /dev/null

real	1m30.860s
user	1m28.246s
sys	0m2.002s

$jimc@harpo logger]$ time t2.pl

... debug output snipped ...

real	0m52.229s
user	0m49.363s
sys	0m1.625s

This last benchmark implements above test in pure Log::Log4perl code. As you can see, this package is competetive with Log::Log4perl itself, at least when analogous features are used (that use is a bit forced, you wouldnt NEED to do so all the time).

Nonetheless, its conceivable to get better than native performance if internal short-circuits can be taken.

Please run perftests/timeit.sh and send me email, particularly if you get wildly different results.

CAVEAT (restated)

This package expects you to use Logger->warn() style of coding, $logger->warn() will not work, unless . This is because the type of $logger cannot be known at compile time, and its too risky to munge everything that looks like $flaregun->warn().

Even if it were possible, it would probably be bad to mix lexical code customization with objects and their attributes.

BUGS

The category extension can be abused; you can independently suppress warn() while elevating debug(), leading to suprises in what output goes where. But this can be used sanely, so I dont consider worth disabling. I could be convinced to optionally report it; as ever, a patch is a compelling argument.

The %Seen report can never contain the categories of unexecuted functions, since the catalog entry is created by AUTOLOAD, which is never run if the function isnt executed. Ive thought of a few "solutions", but theyre all insane ..

1. remember functions during munging, call them once in CHECK{}.
   Wont work; would get useless caller() info.

2. Tie %Total_Invocations to a DBM file, update in the END.  This
   is seductive; but it will have trouble forgetting invocations
   that have been removed from the code, or in the case of
   line-numbered log-specs, been moved by a mere comment. Note
   though that reset control could be added to API.

Debugging your code can be problematic; it seems to undo emacs keybindings; a ctrl-P (previous command) will send it into a cpu-sucking vortex.

TODO

fix raft of 'subroutine redefined' warnings

Warnings are issued (under -w) during compile-phase, I cant currently suppress them. I suspect that solution lies in warnings::register, or perhaps in more primitive solutions like { package B::Generate; no warnings 'redefine' }. If you can patch these away, Id be eternally grateful.

devise good way to limit the deep logging of ref-data

AutoCategorize uses Data::Dumper to render the structure of complex data arguments. It would be nice to selectively suppress or modify the Dumper output, and/or print just the strings.

But to do this via the log-config, the config-item syntax must be extended; either by adding new attributes, or by allowing new values. Either approach compromises interoperability with log4java config files. These possible config-schemes should therefore be considered speculative;

# using extra param
log4perl.category.TestUtils.testcase_applies = DEBUG, 0

The 2nd parameter would control whether Data::Dumper was invoked, in this case they would not be. This scheme has advantage of having a wide API; hashrefs etc can (in principle - and subject to reality check by Gurus) be given as parameters.

Other schemes are also possible;

# an additional '.detail' attribute
log4perl.category.A.foo.debug.detail = 0

# new .*.detail attribute, with (needed) wild-card
log4perl.category.A.foo.debug.*.detail = 0
# limit the depth of dumper (one way, others possible)
log4perl.category.A.foo.debug.*.detail = -20
# alternative values
log4perl.category.A.foo.debug.*.detail = YAML
log4perl.category.A.foo.debug.*.detail = sub { YAML->dump }

# alternate (currently unused) directives
log4perl.autocat.A.foo.debug.*.detail = sub { YAML->dump }

allow use of alternates to Data::Dumper

YAML comes to mind, as does Data::Dumper::Sortkeys = sub {...}; I took a quick swipe at using YAML::Dump (-dy debug option to perftests/tdebug.pl), but its not yielding expected results.

Its also possible to get various object and context sensitive serializations, esp if combined with extra params above.

devise extended method-name space

This package is hardcoded to only munge invocations to one of (debug, info, notice, warn, error), non matching are left unaltered, because some calls should delegate thru to Log::Log4perl, and we should be conservative about what gets munged.

But the point is this: A whole panoply of method suffixes are possible; (ex: /(info|debug)_.*/), a careful use of this method namespace could be used to specify many subtle (or not so subtle) variations in behavior (ex: debug_dump, warn_nodump, error_iftrue, info_iffalse).

short-circuit internal checks

Its likely that several checks and/or dereferences internal to Log::Log4perl can be eliminated once its determined (at 1st invocation) that a log event should actually do something. This is however risky; too much deep coupling.

init and watch

Log4perl has ability to watch the logfile for changes, and change its operation accordingly. This package must hook into that notification, and use it to undefine the appropriate symbol-table entry, thus allowing the AUTOLOAD mechanism to re-vivify it according to the new configuration. Obviously it would be nice to do this only to the parts of the config that are different.

This feature will use the %cat2data mapping built at 1st invocation to allow config changes to be mapped back to the munged method-name that resolved to that category in the 1st place.

context sensitive logging

NDC and MDC are tantalizing features, they appear to hold some promise for conditional logging; ie once an application error is detected, a full dump of the NDC or MDC is logged. Alternatively, once an all-ok is determined, a stack of NDC logs can be discarded, and a summary message issued.

Remembering the AutoCat-alog

See also "BUGS/Seen.2". While it borders on the baroque, its possible to tie %TotalInvocations to a DBM or Storable file, whose name is controlled by your log-config. Then you could use Log4perl''s logitem = sub{} to syncronize the filename to your VERSION; either $package::VERSION, a complete CVS Revision, or to your application Version. Interestingly, this could give some insight into longer-term code evolution.

Improve the documentation

In an effort to provide adequate motivation and explanation, Ive gotten long-winded, repeating ideas several times. Im now too close to this to judge their clarity adequately, I welcome feedback and suggestions wrt what is tedious or unclear.

To this end, Im including 2 versions of a lightning (5 min) talk I gave on this module at YAPC::NA in Boca. The longer version I tested on several willing victims, and the short version I actually gave based upon their valuable comments. If these are helpful, please let me know how they could be folded into this document to improve clarity and/or remove redundancy.

More TODO

find and kill bugs..
write more test cases..
optimize more
integrate more fully with Log::Log4perl

help/feedback is welcome.

CREDITS

Mike Schilli and Kevin Goess for Log::Log4perl, and for reviewing my docs and code, and pointing out features I had not fully understood.

Simon Cozens, Artur Bergman for optimizer, B::Generate. The extra features this module provides would not be practical w/o the deep magic in those packages; Im just repeating incantations.

AUTHOR

Jim Cromie <jcromie@cpan.org>

COPYRIGHT AND LICENSE

Copyright 2003 Jim Cromie

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

(I always wanted to write that ;-)