NAME
Log::Log4perl::AutoCategorize - extended Log::Log4perl logging
ABSTRACT
Log::Log4perl::AutoCategorize extends Log::Log4perl's easy mode, adding 2 main features;
1. extended, automatic, transparent categorization capabilities
2. runtime information useful for:
a. test-coverage assessment
b. managing your logging config.
3. Minor convenience enhancements
a. config-load in use statement
b. Data::Dumper of ref args
There are several more mature alternatives which you should check out for comparison;
#1. search for Stealth Loggers in base POD
use Log::Log4perl qw(:easy);
#2. new functionality, developed at approx same time as AutoCategorize
use Log::Log4perl::Filter;
SYNOPSIS
use Log::Log4perl::AutoCategorize
(
alias => 'Logger', # shorthand class-name alias
# you can initialize in use statement
# 1st way gives separation of code from config
# 2nd way is good for demonstration and early development
initfile => $filename,
initstr => q{
# see Log4perl docs to understand these directives
log4perl.rootLogger=DEBUG, A1
# log4perl.appender.A1=Log::Dispatch::Screen
log4perl.appender.A1 = Log::Dispatch::File
log4perl.appender.A1.filename = ./mylog
log4perl.appender.A1.mode = write
log4perl.appender.A1.layout = PatternLayout
log4perl.appender.A1.layout.ConversionPattern=%d %c %m%n
# create TEST-COVERAGE log
log4perl.appender.COVERAGE = Log::Dispatch::File
log4perl.appender.COVERAGE.mode = write
log4perl.appender.COVERAGE.layout = org.apache.log4j.PatternLayout
log4perl.appender.COVERAGE.layout.ConversionPattern = (%d{HH:mm:ss.SSS}) %c: %m%n
# save timestamped versions, 1 per process
log4perl.appender.COVERAGE.filename = sub {"./test-coverage.txt.". scalar localtime}
# now, add the value: send the stuff written at END to it
log4perl.logger.Log.Log4perl.Autocategorize.END = INFO, COVERAGE
},
);
foreach (1..500) {
Logger->warn($_);
foo();
A->bar();
A::bar();
}
sub foo {
foreach (1..20) {
Logger->warn($_);
}
}
package A;
sub bar {
my @d;
foreach (reverse 1..10) {
push @d, $_;
Logger->warn("t-minus:", $_,\@d);
}
}
DESCRIPTION
Before diving in, a few notes:
This is not intended to document Log4perl (abbrev for Log::Log4perl); that documentation is a good tutorial, and is quite thorough and complete.
In this document, I use Logger as the official shorthand for Log::Log4perl::AutoCategorize, as it gets tedious and verbose to repeat the full name. There is module support for you to do this too; note that its used in the example above, ie: 'alias => Logger'. You can alias it as you prefer: myLogger, ourLogger, or you can subclass it.
I try to use call for the act of calling a method at runtime, and invocation to refer to the source-code (ie: package,sub,line-number) that is called. I hope that the distinction will help to bend the brick into a more hat-like shape.
This module requires perl 5.8, as the optimizer is unsupported in 5.6. You can use this module IFF you uncomment the __END__ before the optimization stuff is done (at bottom of file), but runtime costs are considerable (40% overhead), so I wont add this as an option.
AutoCategorization
The primary feature of this module is to extend Log4perl's flexibility by automatically and transparently creating logging categories for each invocation in your application code.
use Log::Log4perl ':easy' offers a comparable, but less capable feature; it automatically infers that $logCat = "$caller_package", and uses that category to properly handle the message at runtime.
package Foo;
use Log::Log4perl(':easy');
package Foo::Bar;
use Log::Log4perl(':easy');
This module extends the $logCat with info thats obtained at runtime, leveraging Log::Log4perl's categorization of log-events by giving more detail upon which to filter.
$logcat = "$package.$subname.$loglevel.$linenum";
Log4perl's bubble-up behavior is still in effect; if your log-config never specifies anything beyond $package, the $logCat of each call bubbles up thru the log-config, until it matches with your package-level config item.
In other words: since the $logcat contains more information than that provided by :easy, your logging configuration can exersize more control over what gets logged.
Note that you could have always used whatever categories you wanted, see "Configuration files" in Log::Log4perl, you just didnt get the help that made it easy to do so.
Test Coverage Results of your Application
When your code uses this module, the logging calls are categorized and counted; this is reported when the program terminates. The info is returned in 3 chunks.
Seen: How many times each invocation was called
The %Seen hash stores and counts all invocations of all functions, keyed by the category. The keys are prefixed with '#log4perl.category.', which simplifies cut-paste of this output into your log-config. This is variously called a usage report, a test-coverage report, a seen-report, etc.
(15:24:24.772) Logger.END.info.106: $Seen Log Events: = [
{
'#log4perl.category.A.bar.debug.55' => '-400',
'#log4perl.category.A.bar.warn.54' => 400,
'#log4perl.category.Log.Log4perl.AutoCategorize.END.info.201' => 1,
'#log4perl.category.main.foo.warn.44' => 200,
'#log4perl.category.main.info.32' => 10,
'#log4perl.category.main.warn.31' => 10
}
];
Note that A.bar.debug.55 above has a negative count. This indicates that the logging activity was suppressed by the configuration, but the code was still reached 400 times.
UnSeen: What invocations were never called
This report identifies the Logging invocations which were never called during this run of your program. This can be used systematically to assess how thoroughly you are testing your application.
(15:24:24.772) Logger.END.info.106: $un-Seen Log Events: = [
{
'debug_00011' => 'main,probe.pl,52',
'info_00010' => 'main,probe.pl,51',
}
];
Category to Munged-Name
This hash identifies the map between the categories and the unique function-name. It is provided primarily for completeness, it may prove useful for debugging and/or other things.
(19:44:06.210) Log.Log4perl.AutoCategorize.END.info.308: cat2data:, {
'A.truck.debug.63' => 'debug_00005',
'A.truck.debug.65' => 'debug_00006',
'A.truck.debug.66' => 'debug_1_00007',
'A.truck.warn.62' => 'warn_00004',
...
}
Assessing Test Coverage
One way to use the test coverage facility is by running your test suite and collecting the individual coverage reports. Since the config-items are sorted in the report, its easy to compare 2 of them; you can tell what parts of the system are tested by each part of the test suite.
Log4perl can write log-files with configurable names (ex: a timestamped name, see the SYNOPSIS example). You can use this facility to write separate files for each sub-test, and then merge them (ex: `cat *.coverage-log|sort -u`), and quickly assess the total coverage provided by the entire test suite.
Keeping your Logging Config current
Log4perl documentation speaks of the hazards/limitations of categories; this is more critical here, because categories are highly leveraged. See "Pitfalls with Categories" in Log::Log4perl for more info.
This package is undisputably more vulnerable to such problems; it exposes line numbers, which change regularly during development. If you use numbers, be aware that adding comments to your code will change the category reported, rendering those log-config items ineffective for controlling output.
Moreover, you should recognize that as the numbers become more stable, theyre also less valuable; by then, youve reassesed and refined your choices wrt log-levels. (you have, yes?)
However, you can also be more disiplined (TMTOWTDI), using config-lines with a mix of specificity: package, package.method, and package.method.logginglevel. Here the coverage report helps, since it lists all executed logger-invocations, it serves as a pretty good roadmap to your application, letting you see the entire forest through the trees.
Also, the coverage report is formatted to be easily edited into a informative and useful logging configuration, ie:
1. comments/documents the 'existence' of a log-config entry
2. doesnt create the hierarchy in the logger-config, efficient
3. identifies all the code instrumentation available
4. serves as cut-paste fodder for your log-configs
Bottom line: I recommend that you start your log-config file by copying and editing the coverage report. Just remove the quotes put in by Data::Dumper, change '=>' to '=', remove '#' to activate lines, and change the count to DEBUG, WARN, etc.
What %Seen cant tell you
Because the %Seen coverage only includes the categories it sees at (a single) runtime, it cant 'remember' the aggregate coverage from previous runs.
The sort technique given above to assess aggregate test-coverage is not as useful here; it sorts commented lines away from their uncommented counterparts, but its a start.
I may add an option to support a separate 'remembering' aggregate via a tied variable thats initialized in a BEGIN block. This is a distant feature; I need to find and copy prior art from somewhere in CPAN before I design and implement this.
Filtering with Log::Log4perl::AutoCategorize
Restating in different words (at risk of redundancy), filtering with this package uses the base behavior of bubbling up a logging event through the config, and using generic config-items which are not overridden by more specific items.
Using part or all of the hierarchy of categories in your config-file, you can enable or disable logging on a class, a method in that class, particular logging levels with that method, or line by line.
log4perl.category.Frob = DEBUG # 1
log4perl.category.Frob.nicate.debug = INFO # 2
log4perl.category.Frob.nicate.debug.118 = DEBUG # 3
#log4perl.category.Frob.Vernier.twiddle.debug = DEBUG # 4
(1) enables debug logging accross the Frob class, but then (2) overrides that class-wide setting within the method nicate(), suppressing debug logging (by setting the threshold to INFO). (3) further overrides that setting by enabling the debug statement on line 118.
Tweaking your log config
If you look carefully at the example above, (2) may be confusing; it has 'debug = INFO', which looks like nonsense. The 'debug' is the log-level that the programmer wrote, it reflects his judgement as to the nature of the message.
The '= INFO' part tells you that the config user decided to suppress debug messages from nicate(), which were enabled class-wide by (1). (3) restores normal behavior of debug logging, but only for 1 invocation.
Note also that (4), even if it was uncommented, would have no effect on the logging, because its just saying that all debug() calls done by Frob::Vernier::twiddle() are issued at their default level. Keeping such lines commented is more efficient; it saves Log::Log4perl both cpu and memory by not loading config-items that arent meaningful.
Other Features
Automatic categorization, logging coverage, and fine-grained control of logging are the primary value-added features of this wrapper package. Other secondary features are;
Configuration Loading
If use Log::Log4perl::AutoCategorize is given an import list which includes initstr => q{} or initfile => $filename, the string or file is used to initialize Log::Log4perl. You can still use the old way, ie $base->init().
Classname Aliasing
Because it would be inconvenient to type the full classname for every statement, ex: Log::Log4perl::AutoCategorize->debug(), the package has the ability to alias itself into a classname of your choice, ex:
use Log::Log4perl::AutoCategorize ( alias => myLogger );
myLogger->debug("ok doing it");
Because aliasing is used in optimization phase, it must be imported via use, it cannot be deferred as in require, import.
Easy Initialization
You can initialize the base logger by any of the following;
# as use parameters, least typing
a. use Log::Log4perl::Autocategorize ( initstr => $string );
b. use Log::Log4perl::Autocategorize ( initfile => $filename );
# assuming youve already used the package
c. Log::Log4perl::AutoCategorize->init( $filename );
d. Log::Log4perl::AutoCategorize->init( \$string );
# you can cheat, and use the base class name
e. Log::Log4perl->init( $filename );
f. Log::Log4perl->init( \$string );
# assuming youve aliased
g. Logger->init( $filename );
h. Logger->init( \$string );
i. # no explicit initialization at all (theres a default config)
The default config writes to stdout, and includes a test-coverage setup. This config is used at import() time, unless a or b is provided by you. If you explicitly call init(), the base is re-initialized, and relys on base behavior to work correctly.
I hope at some point to provide init_modify(), which will overlay new configuration on existing, rather than a full reinitialization. This will be implemented using Log4perls (currently nonexistent) configuration include mechanism.
Automatic Dump of structured data
If you pass in a data-ref, its rendered using Data::Dumper, making it easy to examine the details of the runtime context. Since the Dump is embedded inside the logging-method, its only called when needed, avoiding work to produce output that will just be thrown away.
SubClassing
As sayeth "Planning for the Future: Better Constructors" in perltoot, you make you package suitable for deriving new ones. test 04_subclass.t verifies this, though it wasnt the 1st thing I got working.
Module Architecture
The previous section hopefully provides sufficient motivation for why you should use this module, this section describes how the module works.
2 phase strategy to achieve both greater filtering capability, and similar or better efficiency.
Original AUTOLOAD functionality
In v0.001, AUTOLOAD did the entire job itself, primarily cuz it seemed better to write the code once, rather than 5 times (for debug, info, warn, error, fatal). Every time a call like Logger->warn() was reached, AUTOLOAD would get control, and do the following:
0. determine log-level which was called ($AUTOLOAD)
1. used caller() to construct a category string, $logcat, dynamically.
2. fetched the $logger singleton {$logger = get_logger($logcat)}
3. tested if $logcat was a loggable event {$logger->is_$loglevel()}
4. logged the message, or not, as appropriate.
Doing this repeatedly using caller() for every call is computationally expensive for a logging subsystem. If the module is to be usable application wide, it must be efficient.
Gaining efficiency
The canonical way to get speed from AUTOLOAD is to not call it; or more helpfully, to create the desired subroutine so that it gets called for all future calls to that method. But in this case, the decision reached in step 4 is different for every invocation, so it cannot be reduced to a single subroutine.
The only way to get custom behavior for each invocation is to make them each invoke a unique method name; ie to munge the method names where they are invoked. Once this is done, AUTOLOAD can then vivify subroutines to implement them, and then it wont be called again (for that unique method), avoiding the 0-4 overhead completely.
Using optimizer.pm
I use Simon Cozen''s optimizer.pm to convert the source code:
FROM: {
Logger->debug ("msg A");
Logger->debug ("msg B");
}
TO: {
Logger->debug_001 ("msg A");
Logger->debug_002 ("msg B");
# etc...
}
The function-munging produces a unique function name for each point in your code where you invoke a logging method. This allows AUTOLOAD (which handles all these calls) to distinguish each invocation, make a custom handler routine for it. Once the custom handler is added into the symbol table, it is called later without any overhead.
How optimizer munges
Your code, 'use optimizer => sub yourfunc;' operates at compile time, letting you search the opcodes for specific op-chains. Once youve got the right opchain, you can do the opcode surgery.
For example, we must recognize this opcode chain;
`perl -MO=Concise,-exec -e 'Logger->info_00010(2,"args")`; t76 <0> pushmark s t77 <$> const(PV "Logger") sM/BARE t78 <$> const(IV 2) sM t79 <$> const(PV "args") sM t7a <$> method_named(PVIV "info_00010") t7b <1> entersub[t23] vKS/TARG
Our criteria is (pushmark, const with PV="Logger", ..., method_named with PVIV=~$namePatt). We also recognize the opcodes that build arguments on the stack, so we can find the balanced method_named opcode. Finally, we insure that the method name being invoked is one of the allowed ones.
Caveat: you must use a specific coding style
To match the above criteria, your code must use *ONLY the 1st* of these 3 different coding styles;
IE: Logger->debug("foo");
NOT: $logger->debug("foo");
NOR: Logger::debug("foo");
This form provides the package name as a bareword (t77), which allows a rather strict (safe?) test; we dont want to stomp in others\' namespaces (except for the explicit takeover of the $Alias package)
The 2nd form is not safely detectable at compile time, since $logger type cannot be absolutely known, and we try not to do risky surgery. For example
# never inadvertently do this -
$logger->warn if ref $logger eq 'FlareGun';
The 3rd form is doable, but clutters the code which does $Alias recognition, for little syntactic gain. Currently, you wont even get a warning you if you use this construct.
lightweight customized subroutines
Given that all the customized anonymous subroutines are instantiated as either sub{} or sub{ logit("$cat") }; they all invoke a common logit() routine that does the real work.
The asubs are trivial wrappers, with the category value stringified and held in code, and passed into logit() when called. This doesnt shrink the number of symbol table entries, but it minimizes the size of the CVs hanging there.
To save more space, a common do_nothing() function could be used instead of the current invocation counting do-nothing, at the cost of losing test-coverage info.
AUTOLOAD revisited
Once method munging was in place, $AUTOLOAD would contain, for example; not "debug", but "debug_00011". Now that each invocation is unique, the distinctions can be computed once and remembered; ie they can be reduced to a dynamically created sub, and added to the symbol table.
IOW, these changes were made to AUTOLOAD:
4. computes the category using caller()
5. determines whether the base logger would send the message.
6. instantiates an anonymous subroutine reflecting 5.
7. adds the sub to the symbol table, with the munged method name.
At runtime, invocations like Logger->warn_$UIID() are called, and dispatched to AUTOLOAD. Once the munged function is in the symbol table, AUTOLOAD is never called again for this invocation.
Benchmarking
Ive done one Benchmark.pm based test (tshort.pl) which doesnt seem quite right. The simpler approach of using 2 similar test-scripts shows more convincing results.
$jimc@harpo logger]$ time t1.pl -n > /dev/null
real 1m30.860s
user 1m28.246s
sys 0m2.002s
$jimc@harpo logger]$ time t2.pl
... debug output snipped ...
real 0m52.229s
user 0m49.363s
sys 0m1.625s
This last benchmark implements above test in pure Log::Log4perl code. As you can see, this package is competetive with Log::Log4perl itself, at least when analogous features are used (that use is a bit forced, you wouldnt NEED to do so all the time).
Nonetheless, its conceivable to get better than native performance if internal short-circuits can be taken.
Please run perftests/timeit.sh and send me email, particularly if you get wildly different results.
CAVEAT (restated)
This package expects you to use Logger->warn() style of coding, $logger->warn() will not work, unless . This is because the type of $logger cannot be known at compile time, and its too risky to munge everything that looks like $flaregun->warn().
Even if it were possible, it would probably be bad to mix lexical code customization with objects and their attributes.
BUGS
The category extension can be abused; you can independently suppress warn() while elevating debug(), leading to suprises in what output goes where. But this can be used sanely, so I dont consider worth disabling. I could be convinced to optionally report it; as ever, a patch is a compelling argument.
The %Seen report can never contain the categories of unexecuted functions, since the catalog entry is created by AUTOLOAD, which is never run if the function isnt executed. Ive thought of a few "solutions", but theyre all insane ..
1. remember functions during munging, call them once in CHECK{}.
Wont work; would get useless caller() info.
2. Tie %Total_Invocations to a DBM file, update in the END. This
is seductive; but it will have trouble forgetting invocations
that have been removed from the code, or in the case of
line-numbered log-specs, been moved by a mere comment. Note
though that reset control could be added to API.
Debugging your code can be problematic; it seems to undo emacs keybindings; a ctrl-P (previous command) will send it into a cpu-sucking vortex.
TODO
fix raft of 'subroutine redefined' warnings
Warnings are issued (under -w) during compile-phase, I cant currently suppress them. I suspect that solution lies in warnings::register, or perhaps in more primitive solutions like { package B::Generate; no warnings 'redefine' }. If you can patch these away, Id be eternally grateful.
devise good way to limit the deep logging of ref-data
AutoCategorize uses Data::Dumper to render the structure of complex data arguments. It would be nice to selectively suppress or modify the Dumper output, and/or print just the strings.
But to do this via the log-config, the config-item syntax must be extended; either by adding new attributes, or by allowing new values. Either approach compromises interoperability with log4java config files. These possible config-schemes should therefore be considered speculative;
# using extra param
log4perl.category.TestUtils.testcase_applies = DEBUG, 0
The 2nd parameter would control whether Data::Dumper was invoked, in this case they would not be. This scheme has advantage of having a wide API; hashrefs etc can (in principle - and subject to reality check by Gurus) be given as parameters.
Other schemes are also possible;
# an additional '.detail' attribute
log4perl.category.A.foo.debug.detail = 0
# new .*.detail attribute, with (needed) wild-card
log4perl.category.A.foo.debug.*.detail = 0
# limit the depth of dumper (one way, others possible)
log4perl.category.A.foo.debug.*.detail = -20
# alternative values
log4perl.category.A.foo.debug.*.detail = YAML
log4perl.category.A.foo.debug.*.detail = sub { YAML->dump }
# alternate (currently unused) directives
log4perl.autocat.A.foo.debug.*.detail = sub { YAML->dump }
allow use of alternates to Data::Dumper
YAML comes to mind, as does Data::Dumper::Sortkeys = sub {...}; I took a quick swipe at using YAML::Dump (-dy debug option to perftests/tdebug.pl), but its not yielding expected results.
Its also possible to get various object and context sensitive serializations, esp if combined with extra params above.
devise extended method-name space
This package is hardcoded to only munge invocations to one of (debug, info, notice, warn, error), non matching are left unaltered, because some calls should delegate thru to Log::Log4perl, and we should be conservative about what gets munged.
But the point is this: A whole panoply of method suffixes are possible; (ex: /(info|debug)_.*/), a careful use of this method namespace could be used to specify many subtle (or not so subtle) variations in behavior (ex: debug_dump, warn_nodump, error_iftrue, info_iffalse).
short-circuit internal checks
Its likely that several checks and/or dereferences internal to Log::Log4perl can be eliminated once its determined (at 1st invocation) that a log event should actually do something. This is however risky; too much deep coupling.
init and watch
Log4perl has ability to watch the logfile for changes, and change its operation accordingly. This package must hook into that notification, and use it to undefine the appropriate symbol-table entry, thus allowing the AUTOLOAD mechanism to re-vivify it according to the new configuration. Obviously it would be nice to do this only to the parts of the config that are different.
This feature will use the %cat2data mapping built at 1st invocation to allow config changes to be mapped back to the munged method-name that resolved to that category in the 1st place.
context sensitive logging
NDC and MDC are tantalizing features, they appear to hold some promise for conditional logging; ie once an application error is detected, a full dump of the NDC or MDC is logged. Alternatively, once an all-ok is determined, a stack of NDC logs can be discarded, and a summary message issued.
Remembering the AutoCat-alog
See also "BUGS/Seen.2". While it borders on the baroque, its possible to tie %TotalInvocations to a DBM or Storable file, whose name is controlled by your log-config. Then you could use Log4perl''s logitem = sub{} to syncronize the filename to your VERSION; either $package::VERSION, a complete CVS Revision, or to your application Version. Interestingly, this could give some insight into longer-term code evolution.
Improve the documentation
In an effort to provide adequate motivation and explanation, Ive gotten long-winded, repeating ideas several times. Im now too close to this to judge their clarity adequately, I welcome feedback and suggestions wrt what is tedious or unclear.
To this end, Im including 2 versions of a lightning (5 min) talk I gave on this module at YAPC::NA in Boca. The longer version I tested on several willing victims, and the short version I actually gave based upon their valuable comments. If these are helpful, please let me know how they could be folded into this document to improve clarity and/or remove redundancy.
More TODO
find and kill bugs..
write more test cases..
optimize more
integrate more fully with Log::Log4perl
help/feedback is welcome.
CREDITS
Mike Schilli and Kevin Goess for Log::Log4perl, and for reviewing my docs and code, and pointing out features I had not fully understood.
Simon Cozens, Artur Bergman for optimizer, B::Generate. The extra features this module provides would not be practical w/o the deep magic in those packages; Im just repeating incantations.
AUTHOR
Jim Cromie <jcromie@cpan.org>
COPYRIGHT AND LICENSE
Copyright 2003 Jim Cromie
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
(I always wanted to write that ;-)