NAME
Devel::Init - Control when initialization happens, depending on situation.
SYNOPSIS
package Some::Module;
use Devel::Init qw< :Init InitBlock >;
BEGIN {
my $dbh;
sub getDBH :Init(not_prefork) {
return $dbh ||= InitBlock { _connect_to_database() };
}
}
sub readRecords {
# ...
my $dbh= getDBH();
# ...
}
# ...
package main;
use Devel::Init qw< RunInits >;
use Some::Module;
RunInits('prefork');
fork_worker_children(
sub {
RunInits();
do_work();
},
);
DESCRIPTION
Devel::Init provides aspect-oriented control over initialization. It can make your server code more efficient, allow for faster diagnosis of configuration (and other) problems, make your interactive code more responsive, and make your modules much easier to unit test, among other benefits.
Motivation
The best time to do complex or costly initialization can vary widely, even for the same piece of code. It is impossible for a module author to pick the best time because there is more than one best time, depending on the situation.
Devel::Init allows module authors to declare what initialization steps are required and then allows each situation to very easily drive which types of initializations should be run when (and which types should not be run until they are actually needed).
The documentation also lays out best practices for ensuring that initialization will always be done before it is needed and not before it is possible, even when complex interdependencies between initialization steps exist.
And Devel::Init makes it easy to write your initialization code so that a circular dependency will be detected and be reported in an informative way.
When to initialize
A good example that highlights many benefits of doing initializations at the "right" time is a daemon that fork()s a bunch of worker children. A common example of that is an Apache web server using mod_perl.
For an initialization step that allocates a large structure of static data, running it before the worker children are fork()ed can have several benefits. The work of building the structure doesn't need to be repeated in each child. The pages of memory holding parts of the structure can stay shared among the children, reducing the amount of physical memory required.
Conversely, for an initialization step that connects to a database server, doing that before fork()ing is wasteful (and may even be problematic) because the children would inherit the underlying socket handle but couldn't actually make shared use of it and so would just have to connect to the DB server all over again.
But you might want to have database connections made quickly after each child is fork()ed so that any problems with database connectivity or configuration will be noticed immediately.
When doing unit testing on some part of the larger application, you may want to avoid making database connections early. If the unit test never exercises code that needs the database connection, then you wouldn't need to set up a database that can be reached from within the test environment.
And if you are writing complex servers but are not using Devel::Init, then you probably have run into cases where you can't even use perl -c Some/Module.pm to check for simple syntax errors in a module that you are working on. It is too easy to end up with way too much initialization happening at "compile time" so that the above never gets around to checking the syntax of that module:
$ perl -c Some/Module.pm
Can't connect to database ...
Interdependent initializations
You may not want to connect to the DB before fork()ing, but you might have some other step that needs to connect to the DB in order to build a large data structure. If one registers the connecting to the DB as a step to be run after fork()ing, then that registration should not prevent the connecting to the DB before fork()ing if such is required before then.
So it is important to ensure that initialization steps don't get run "too late". Each needs to be run when it is needed, even if that isn't the "best" time you were hoping for.
You need to write your initialization code to ensure that complex dependencies between initialization steps get carried out in an acceptable order.
Singletons
An easy and reliable way to do this is to have a function that you call whenever you want to use the result of an initialization step. An obvious example is the handle for a connection to a database.
You write a subroutine that returns this DB handle and you ensure that nobody can access the DB handle without first calling that function. A great way to ensure such is to "hide" the DB handle in a lexical variable that is only in scope to that initialization method. For example:
BEGIN {
my $dbh;
sub get_dbh {
return $dbh
if $dbh;
# ...
$dbh= ...;
# ...
return $dbh;
}
}
# ...
my $sth= get_dbh()->prepare( $sql );
# ...
my $dbh= get_dbh();
for my $sql ( @queries ) {
my $sth= $dbh->prepare( $sql );
...
}
# ...
You might recognize this as a common way to implement the DB handle as a "singleton".
The code to initialize the DB handle might be complex. But the function immediately returns the stashed handle if it has already been initialized and so only adds insignificant overhead which is not worth worrying about.
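As a minimal, self-contained sketch of this pattern (plain Perl, without Devel::Init), the two getters below stash their results in hidden lexicals. The names get_config(), get_report(), and the counter %built are invented for this illustration. get_report() depends on get_config() and simply calls it, so the dependency is satisfied on demand no matter which getter happens to be called first:

```perl
use strict;
use warnings;

our %built;    # hypothetical counter: how many times each step really ran

BEGIN {
    my $config;
    sub get_config {
        return $config
            if  $config;
        $built{config}++;               # stands in for costly work
        $config= { greeting => 'hello' };
        return $config;
    }
}

BEGIN {
    my $report;
    sub get_report {
        return $report
            if  $report;
        $built{report}++;
        # Depend on the other step by calling its getter:
        $report= uc get_config()->{greeting};
        return $report;
    }
}

# Calling the dependent step first still initializes both, once each.
my $first=  get_report();
my $second= get_report();
my $cfg=    get_config();
```

Each step's real work runs only once here; later calls return the stashed value.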
Depending on a simple initialization
There are also simple initializations that you probably don't want to control via Devel::Init that need to happen before other initialization steps can be run. So it is important to not allow an initialization step to be run "too early" where it would fail to find some data it needs.
For example:
# A bad example!
use Devel::Init qw< :Init >;
my $Connect= 'dbi:pick:dick@trw:1965';
my $DB_user= 'dnelson';
my $DB_pass= 'convair3537';
{
my $dbh;
sub getDBH :Init(not_prefork) {
$dbh ||= DBI->connect(
$Connect, $DB_user, $DB_pass,
{ RaiseError => 1, PrintError => 0 },
);
return $dbh;
}
}
# Don't do the above!
It is possible for getDBH() to get called before $Connect etc. get initialized. The :Init(not_prefork) attribute notifies Devel::Init of \&getDBH as soon as Perl has finished compiling the sub getDBH code. It may be much later before the code to initialize $Connect etc. gets run.
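The hazard can be reproduced without Devel::Init. In this sketch a BEGIN block stands in for whatever code might call the getter early (an assumption made only to force the timing). The names get_dsn() and @seen, and the connection string, are invented for the illustration:

```perl
use strict;
use warnings;

our @seen;    # records what each call observed

my $Connect= 'dbi:example:demo';    # assigned at run time, not at compile time

sub get_dsn {
    return $Connect;
}

# Runs as soon as it is compiled, before the assignment above has executed.
BEGIN { push @seen, defined( get_dsn() ) ? 'defined' : 'undef' }

# Runs at run time, after the assignment.
push @seen, defined( get_dsn() ) ? 'defined' : 'undef';
```

The first call sees an undefined $Connect even though the assignment appears earlier in the file, because only the declaration (not the assignment) has taken effect at compile time.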
Compare that to:
# A better example.
use Devel::Init qw< :Init >;
use DBI;
BEGIN {
my $Connect= 'DBI:pick:dick@trw:1965';
my $DB_user= 'dnelson';
my $DB_pass= 'convair3537';
my $dbh;
sub getDBH :Init(not_prefork) {
$dbh ||= DBI->connect(
$Connect, $DB_user, $DB_pass,
{ RaiseError => 1, PrintError => 0 },
);
return $dbh;
}
# Devel::Init is notified when the above line is compiled
}
# $Connect etc. are initialized when the above line is compiled
Devel::Init gets notified of the desire to run getDBH() before $Connect etc. get initialized, but there is no opportunity for other code to be run between those two steps so this code should be safe.
Even safer (and more flexible) would be:
use Devel::Init qw< :Init InitBlock >;
use DBI;
our( $Connect, $DB_user, $DB_pass );
BEGIN {
$Connect= 'DBI:pick:dick@trw:1965';
$DB_user= 'dnelson';
$DB_pass= 'convair3537';
}
BEGIN {
my $dbh;
sub getDBH :Init(not_prefork) {
return $dbh ||= InitBlock {
DBI->connect(
$Connect, $DB_user, $DB_pass,
{ RaiseError => 1, PrintError => 0 },
),
};
}
}
[The second BEGIN has no impact in the above example. But we consider the use of BEGIN for blocks around such "static" (or "state") variables (like $dbh above) to be a "best practice" because there are many cases where it can prevent problems.]
Depending on initialization provided by another module
Below is another pattern that could lead to initialization being run before it is ready:
# A bad example!
package Some::Module;
use Devel::Init ':Init';
require Local::Config; # Oops
{
my $dbh;
sub getDBH :Init(not_prefork) {
$dbh ||= DBI->connect(
Local::Config::GetDbConnectParams('some_db'),
);
return $dbh;
}
}
# Avoid the above!
This could cause Local::Config::GetDbConnectParams() to be called before it has even been defined -- because getDBH() could be called before the require Local::Config; line gets run so the Local::Config module might not have been loaded yet.
The compile-time pattern
The simplest (and likely most common) pattern to prevent such problems is to load the other module at compile time via use. To demonstrate the benefit of a BEGIN block, we also add a simple data initialization ($db_name):
# A "best practice"!
package Some::Module;
use Devel::Init qw< :Init InitBlock >;
use Local::Config qw< GetDbConnectParams >;
use DBI ();
BEGIN {
my $dbh;
my $db_name= $ENV{'DB_NAME'} || 'some_db';
sub getDBH :Init(not_prefork) {
return $dbh ||= InitBlock {
DBI->connect(
GetDbConnectParams( $db_name ),
)
or die "Failed to connect to $db_name: $DBI::errstr\n";
};
}
}
The run-time pattern
If you find yourself in the somewhat unusual situation of trying to delay the loading of module code, then you might instead use the following "run-time" pattern:
# Use this pattern with caution.
package Some::Module;
require Devel::Init;
require Local::Config;
require DBI;
{
my $dbh;
my $db_name= $ENV{'DB_NAME'} || 'some_db';
sub getDBH {
return $dbh ||= Devel::Init::InitSub( sub {
DBI->connect(
Local::Config::GetDbConnectParams( $db_name ),
)
or die "Failed to connect to $db_name: $DBI::errstr\n";
} );
}
}
Devel::Init::RegisterInit( \&getDBH, 'not_prefork' );
Note that deviating from this "run time" pattern elsewhere in the code for the above Some::Module could then cause problems.
USAGE
InitBlock
The InitBlock() function can be imported at compile-time (via use). It is declared with a subroutine "prototype", sub InitBlock(&), so that you must pass it a single block (code surrounded by curly braces, { and }) and that block will be treated as a subroutine (as a "CODE" reference).
The passed-in block of code is called (from a scalar- or list-context to match the context InitBlock() was called from) and the value(s) returned by the block are returned by InitBlock().
So, InitBlock() acts like a no-op, simply returning whatever is returned by the code you passed to it.
BEGIN {
my $singleton;
sub get_singleton :Init {
return $singleton ||= InitBlock {
... # do the hard work of initializing $initializedValue
return $initializedValue;
};
}
}
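The context pass-through described above is ordinary Perl behavior for a (&)-prototyped function that returns the result of calling its block. The sketch below uses a hypothetical stand-in named pass_through(); it is not Devel::Init's code and only illustrates how the block sees the caller's context:

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not Devel::Init's implementation.
sub pass_through(&) {
    my( $block )= @_;
    return $block->();    # the block is called in our caller's context
}

# Scalar assignment: the block sees scalar context.
my $scalar= pass_through { wantarray ? 'list' : 'scalar' };

# List assignment: the block sees list context.
my( $list )= pass_through { wantarray ? 'list' : 'scalar' };
```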
However, InitBlock() keeps track of which initialization steps are still in-progress and thus can detect circular dependencies between initialization steps. When a circular dependency is detected, InitBlock() will die, providing an informative summary of exactly what components make up that cycle of dependence.
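One way such tracking could work is to keep a stack of the steps currently in progress and to die when a step is entered a second time. The sketch below is only an illustration of that general idea under stated assumptions: guarded_init(), the explicit step names, and the message format are all hypothetical, and Devel::Init's real bookkeeping and error text may differ.

```perl
use strict;
use warnings;

my @in_progress;    # names of steps currently being initialized

# Hypothetical helper; takes a step name and a code reference.
sub guarded_init {
    my( $name, $code )= @_;
    if( grep { $_ eq $name } @in_progress ) {
        die "Circular initialization: ",
            join( " -> ", @in_progress, $name ), "\n";
    }
    push @in_progress, $name;
    my $value= eval { $code->() };
    my $err= $@;
    pop @in_progress;    # always unwind, even when the step failed
    die $err
        if  $err;
    return $value;
}

# Two steps that (wrongly) depend on each other:
my( $alpha, $beta );
sub get_alpha { $alpha ||= guarded_init( alpha => sub { 'A' . get_beta()  } ) }
sub get_beta  { $beta  ||= guarded_init( beta  => sub { 'B' . get_alpha() } ) }

my $ok= eval { get_alpha(); 1 };
my $message= $@;    # names every step in the cycle
```

Without such a guard, the two getters above would recurse until Perl ran out of memory, which is much harder to diagnose than a message listing the cycle.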
InitSub
InitSub() behaves exactly the same as InitBlock() but it is not declared with a subroutine "prototype" so you must pass a single scalar to it which must be a "CODE" reference (like "sub { ... }" or "\&subname", for example):
{
my $dbh;
sub getDBH {
return $dbh ||= Devel::Init::InitSub( sub {
DBI->connect( @connect_params )
or die "Failed to connect to DB: $DBI::errstr\n";
} );
}
}
Devel::Init::RegisterInit( \&getDBH, 'not_prefork' );
:Init
The basic (compile-time) initialization pattern:
package Some::Module;
use Devel::Init qw< :Init InitBlock >;
BEGIN {
my $single;
sub get_singleton :Init {
return $single ||= InitBlock {
... # figure out $initialized_value
return $initialized_value;
};
}
}
sub some_sub {
... get_singleton() ...
}
Importing :Init (via the use Devel::Init line above) allows the current package to use the :Init attribute on any sub(routine)s it declares. By including that attribute you tell Devel::Init that you want that subroutine [get_singleton(), above] called [with no arguments passed to it] at whatever point the current environment arranges as the "best" time.
You can also pass predicates via the :Init attribute that provide hints about which times might be more or less appropriate for each initialization step. To specify more than one predicate, separate them with spaces, tabs, commas, and/or other white-space. For example:
...
sub get_dynamic_config :Init(not_prefork,not_unittest) {
...
}
...
Note that the text inside the parentheses after :Init is not Perl code. Do not put quotes around the predicates. You also can't interpolate scalars inside there:
sub getFoo :Init(prefork,$other); # Not what it appears!
However, as a special case that you are unlikely to use, you can pretend to interpolate arrays or references to arrays. See "Use in public modules" for the details (on second thought, you probably shouldn't).
:NoDefault
Devel::Init declares its own INIT block as follows:
INIT {
RunInits( 'default' )
if $DefaultInit;
}
This means that RunInits('default') will be run just after the main script has finished being compiled (just before the main script's "run time" begins), unless this default initialization step is disabled prior to that.
The default initialization step is usually disabled by the main script (or some module that it uses) importing RunInits() (usually so that the main script can call RunInits() at the most appropriate time):
use Devel::Init qw< RunInits >; # Disables default initialization
But you can also disable the default initialization step by "importing" ':NoDefault' (which does not import anything into the caller's namespace):
use Devel::Init qw< :NoDefault >; # Disables default initialization
Note that several environments (for example, when Perl is embedded, as is the case with mod_perl) may not run INIT blocks. So don't be surprised if the default initialization step fails to run in some environments (even if nobody disabled it).
The default initialization step is provided mainly for authors of public modules who might otherwise be tempted to concoct their own schemes meant to force early initialization for cases when the users of their module fail to make use of Devel::Init in the main script to define the best point in time for initialization to happen.
It means that using Devel::Init in a public module defaults to acting the same as doing your initialization inside of your own INIT block.
Note that the default initialization step specifies the predicate 'default' just in case somebody wants to declare steps that should only be run by the default step ('only_default') or that should not be run by the default step ('not_default'). However, the author of Devel::Init has not yet imagined any reasons why somebody should want to specify such.
RegisterInit()
The less-likely run-time initialization pattern:
package Some::Module;
use Devel::Init qw< RegisterInit InitBlock >;
{
my $single;
sub get_singleton {
$single ||= InitBlock {
... # figure out $initialized_value
return $initialized_value;
};
return $single;
}
}
RegisterInit( \&get_singleton );
The first argument to RegisterInit() should be a reference to a subroutine that should be run (with no arguments passed to it) at the desired initialization time. Subsequent arguments to RegisterInit(), if any, should be predicate string(s).
As a special case that you are unlikely to use, a predicate string can instead be a reference to an array. The contents of that array will be used as predicates at the time of any future calls to RunInits(). See "Use in public modules" for why you might do that (or not).
Predicates
When you register an initialization step, you can specify zero or more predicates that provide hints as to which initialization point in time might be more or less appropriate for that step to be run at.
A predicate is just a word that describes a property that a point in time for initialization might or might not possess. Each predicate may be prefixed by "not_" or "only_" (or have no prefix).
Registering an initialization step with "not_X" would mean that it would avoid being run at a time that was designated as either "X" or "only_X". Registering an initialization step with "only_X" would mean that it would avoid being run at a time that was NOT designated as either "X" or "only_X". More details on how predicates interact are given later.
You can invent and use any predicate words that make sense to you.
It is expected that most initialization steps will be registered with no predicates specified. The next-most-expected registrations are ones designated as 'not_prefork' [those that connect to external resources in ways that can't be shared with fork()ed children].
RunInits()
Calling RunInits() causes (some of) the previously registered initialization steps to be immediately run.
use Devel::Init qw< RunInits >;
...
RunInits( '-StackTrace', 'not_prefork' );
If the first argument to RunInits() is either '-StackTrace' or '-NoStackTrace', then it overrides the default behavior specified by use Devel::Init qw< -StackTrace >; (see below). Other arguments must not begin with a dash character, '-', as such are reserved for future use as other options.
The remaining arguments to RunInits() are predicate strings that describe aspects of the current initialization phase. These serve to select which of the previously registered initialization steps will get run now.
Just as with :Init and RegisterInit(), each predicate can be prepended with 'not_' or 'only_'. See "Predicate details" for more details.
If an initialization step dies, then RunInits() will cleanly remove that step (and any prior steps that were just run) from the list of pending initializations and then re-die with the same exception or message.
[You likely don't care, but RunInits() returns the count of initialization steps that it just ran -- and this might change in future.]
-StackTrace and -NoStackTrace
Importing -StackTrace from Devel::Init will generate a stack trace (see the Carp module) if Devel::Init detects a fatal error [though this is overridden by RunInits('-NoStackTrace',...), of course].
use Devel::Init qw< RunInits -StackTrace >;
Importing -NoStackTrace restores the default of not generating a stack trace when fatal errors are detected.
Predicate details
More predicate examples
The most-expected arrangements of running initialization steps are:
A simple command-line utility:
...
if( ! GetOpts() ) {
die Usage();
}
# Don't do costly init when we only issue a 'usage' message.
RunInits();
...
A server with worker children:
...
RunInits('prefork');
fork_worker_children(
sub {
RunInits();
do_work();
},
);
...
A unit test script:
use Test::More ...;
...
require_ok( 'Some::Module' );
RunInits( 'only_unittest' );
...
Some possible arrangements of predicates when registering are illustrated below (with subroutine names and comments to help justify each choice), roughly from most-likely to least-likely:
sub getDriverHash :Init {
# Run as soon as possible in all "normal" environments.
# For unit tests, doesn't get run unless/until needed.
}
sub getDBH :Init(not_prefork) {
# Connects to a DB so should not be done before forking.
}
sub getHugeData :Init(only_prefork) {
# Doesn't need to be run until needed, unless we plan
# to fork a bunch of children such that we should
# try to pre-load the huge data so it will likely be
# shared between children.
}
sub getConfig :Init(unittest) {
# Run as soon as possible.
# Even run when unit testing does RunInits('only_unittest')
# because this 'config' knows how to stub itself out then.
}
sub checkContracts :Init(only_unittest) {
# Only do these costly checks when doing unit tests.
}
sub autoStubbedConnection :Init(not_prefork,unittest) {
# Don't run pre-fork but otherwise run as soon as possible,
# even when in a unit test.
}
sub getDynamicConfig :Init(only_unittest) :Init(only_prefork) {
# This step can be expensive. Don't run it until needed.
# Except that we want to run it before fork()ing if we
# will have worker children (lots of data to share).
# And we want to run it early for unit testing where it
# gets stubbed out and so isn't expensive and also arranges
# to stub out other stuff.
# When part of a command-line utility, this step can be
# completely skipped when a "usage" error is detected,
# saving significant time.
}
How predicates interact
When multiple predicates are specified as part of a registration or are passed to RunInits(), it means that all predicates must be "satisfied" for an initialization step to be run at that time.
So, sub getFoo :Init(bar,baz) ... indicates that getFoo(); should be run only when both 'bar' and 'baz' are appropriate.
If you want to specify an "or" relationship, then register the initialization step more than once:
sub getFud :Init(bar,baz) :Init(fig,fug) {
...
}
The above indicates that getFud(); should be called when both 'bar' and 'baz' are appropriate -- or -- when both 'fig' and 'fug' are appropriate. Actually, this means that getFud(); might get called twice (which isn't usually a problem with the 'get singleton' initialization pattern that is encouraged).
Similarly, writing the following:
RunInits( 'bar', 'baz' );
RunInits( 'fig', 'fug' );
would mean that you want to run all steps where 'bar' and 'baz' are appropriate -- or -- where 'fig' and 'fug' are appropriate.
What does it mean for a predicate word to be appropriate? The following table illustrates all of the different possible cases for a single predicate word ("X", in this case). Note that "n/a" denotes when "X" was not specified as a predicate, not even with any prefix:
           not_X   n/a    X    only_X
           -----   ---   ---   ------
 not_X  :   run    run   NOT    NOT
 n/a    :   run    run   run    NOT
 X      :   NOT    run   run    run
 only_X :   NOT    NOT   run    run
These interactions were carefully chosen to allow "common sense" (more like "common English usage", really) to reveal how they work. But, just in case the author's "sense" has little in common with the reader's "sense", let's beat this horse to a proper death.
Below we re-use our extensive examples of initialization steps from above and then give example RunInits() invocations and list which steps would be run in each case.
sub getDriverHash :Init;
sub getDBH :Init(not_prefork);
sub getHugeData :Init(only_prefork);
sub getConfig :Init(unittest);
sub checkContracts :Init(only_unittest);
sub autoStubbedConnection :Init(not_prefork,unittest);
sub getDynamicConfig :Init(only_unittest) :Init(only_prefork);
The command-line utility use-case:
RunInits();
# Runs getDriverHash()
# Runs getDBH()
# Saves getHugeData for later ('only_prefork' not met)
# Runs getConfig()
# Saves checkContracts for later ('only_unittest' not met)
# Runs autoStubbedConnection()
# Saves getDynamicConfig (both registrations) for later
The fork()ing server use-case:
RunInits('prefork');
# Runs getDriverHash()
# Saves getDBH for later ('not_prefork')
# Runs getHugeData()
# Runs getConfig()
# Saves checkContracts for later ('only_unittest')
# Saves autoStubbedConnection for later ('not_prefork' not met)
# Runs getDynamicConfig() /once/ ('only_unittest' not met)
RunInits();
# Ran getDriverHash above (no longer registered)
# Runs getDBH()
# Ran getHugeData above
# Ran getConfig above
# Saves checkContracts for later ('only_unittest')
# Runs autoStubbedConnection()
# Saves getDynamicConfig's 'only_unittest' entry for later
The unit test use-case:
RunInits( 'only_unittest' );
# Saves getDriverHash for later ('unittest' not specified)
# Saves getDBH for later ('unittest' not specified)
# Saves getHugeData for later ('only_prefork' not met either)
# Runs getConfig()
# Runs checkContracts()
# Runs autoStubbedConnection()
# Runs getDynamicConfig() /once/ ('only_prefork' not met)
Finally, the flog-the-dead-horse, nonsensical use-case [such invocations of RunInits() really make no sense]:
RunInits( 'only_prefork', 'not_unittest' ); # Don't do this!
# Saves getDriverHash for later ('prefork' not specified)
# Saves getDBH for later ('not_prefork' very not met)
# Runs getHugeData()
# Saves getConfig for later ('unittest' not met)
# Saves checkContracts for later ('prefork' not specified)
# Saves autoStubbedConnection for later (nothing "met")
# Runs getDynamicConfig() /once/ ('only_unittest' not met)
RunInits( 'not_prefork', 'not_postfork' ); # Don't do this!
# Runs getDriverHash()
# Runs getDBH()
# Ran getHugeData above (no longer registered)
# Runs getConfig()
# Saves checkContracts for later ('only_unittest' not met)
# Runs autoStubbedConnection()
# Saves getDynamicConfig's 'only_unittest' entry for later
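The table and the worked cases above can be condensed into a short simulation. This is a sketch of the documented rules only, not Devel::Init's implementation; parse_predicates(), word_allows(), and would_run() are names invented here, and each call to would_run() models a single registration:

```perl
use strict;
use warnings;

# Split a list of predicates into { word => prefix }, where prefix is
# 'not', 'only', or 'plain'. Words that are absent count as 'n/a'.
sub parse_predicates {
    my %state;
    for my $pred ( @_ ) {
        my( $prefix, $word )= $pred =~ /^(?:(not|only)_)?(.+)$/;
        $state{$word}= $prefix || 'plain';
    }
    return \%state;
}

# The table is symmetric: every pairing runs except these three
# (checked in either order).
my %blocked= map { $_ => 1 } qw< not:plain not:only n/a:only >;

sub word_allows {
    my( $reg, $run )= @_;
    return ! $blocked{"$reg:$run"} && ! $blocked{"$run:$reg"};
}

# One registration runs only if every predicate word allows it.
sub would_run {
    my( $registered, $running )= @_;
    my $reg= parse_predicates( @$registered );
    my $run= parse_predicates( @$running );
    my %words= ( %$reg, %$run );
    for my $word ( keys %words ) {
        return 0
            if  ! word_allows( $reg->{$word} || 'n/a', $run->{$word} || 'n/a' );
    }
    return 1;
}

my %steps= (
    getDriverHash => [],
    getDBH        => [qw< not_prefork >],
    getHugeData   => [qw< only_prefork >],
    getConfig     => [qw< unittest >],
);
my @prefork=  sort( grep { would_run( $steps{$_}, ['prefork'] ) } keys %steps );
my @unittest= sort( grep { would_run( $steps{$_}, ['only_unittest'] ) } keys %steps );
```

A step registered more than once, such as getDynamicConfig above, would be modeled as one would_run() call per registration, running if any of them returns true.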
LIMITATIONS
Compatibility with Attribute::Handlers, etc.
The design of attributes.pm encourages complex and hard-to-coordinate usage patterns that are (unfortunately) well demonstrated by Attribute::Handlers. Although early drafts of Devel::Init included complex code to try to support compatibility with Attribute::Handlers, it was determined that it is more appropriate for such compatibility to be handled in a more sane manner via changes to attributes.pm (or at least some other module).
Devel::Init uses a much, much simpler approach for supporting attributes and also supports attributes::get() (which Attribute::Handlers appears to have completely ignored).
Using use Devel::Init qw<:Init>; in a package is likely to cause uses of Attribute::Handlers or similar attribute-handling modules to be ignored. This is because attributes.pm basically does Some::Module->can('MODIFY_CODE_ATTRIBUTES') and Devel::Init directly defines Some::Module::MODIFY_CODE_ATTRIBUTES() (and Some::Module::FETCH_CODE_ATTRIBUTES) while Attribute::Handlers makes complicated use of multiple layers of inheritance. Only one MODIFY_CODE_ATTRIBUTES() method is found and used by attributes.pm, and it will be the one defined by Devel::Init.
Note that Attribute::Handlers does
push @UNIVERSAL::ISA, 'Attribute::Handlers::UNIVERSAL'
which means that every single class now magically inherits from Attribute::Handlers::UNIVERSAL. This is an extremely heavy-handed way to implement anything.
Devel::Init will cooperate with an attribute-handling module that directly defines a Some::Module::MODIFY_CODE_ATTRIBUTES() method provided either that Devel::Init is loaded second or the other module also cooperates.
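For readers unfamiliar with the mechanism, the sketch below shows what "directly defines" means in plain Perl: a package supplies its own MODIFY_CODE_ATTRIBUTES() and attributes.pm calls it at compile time. The package My::Steps, the :Setup attribute, and @registered are all invented for this illustration; nothing here is Devel::Init's code, and how two such handlers would chain to each other is not shown.

```perl
package My::Steps;
use strict;
use warnings;

our @registered;    # [ code reference, predicate text ] pairs

# Called by attributes.pm at compile time for each sub declared with attributes.
sub MODIFY_CODE_ATTRIBUTES {
    my( $pkg, $code, @attrs )= @_;
    my @unknown;
    for my $attr ( @attrs ) {
        if( $attr =~ /^Setup(?:\((.*)\))?$/s ) {
            push @registered, [ $code, defined $1 ? $1 : '' ];
        } else {
            push @unknown, $attr;    # not ours; report it back as unhandled
        }
    }
    return @unknown;    # an empty list means every attribute was accepted
}

sub build_cache :Setup(not_prefork) {
    return 'cache built';
}

# At run time, the recorded code reference can be called like any other.
my( $step )= @registered;
my $result= $step->[0]->();
```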
Use in public modules
Devel::Init was primarily designed for use within a large code base maintained by some organization, such as a company's internal code base. The precise meaning of possible common predicates like 'prefork' or 'unittest' is difficult to nail down for the universe of all possible environments. Even more difficult would be defining the full set of predicates that might be desired.
So if you make use of Devel::Init in a public module, you may want to allow the user of the module to override which predicates to apply. You could do that by writing your module code something like:
[NOTE: This only works for global arrays from the current package. Specifying what looks like an array is like passing in the values from that array at the time of the subroutine declaration being compiled. Specifying what looks like a reference to an array delays the reading of predicates from the array until each time RunInits() is called.]
package Public::Package;
our @InitWhen= qw< not_prefork >;
# @InitWhen must be a /global/ array declared via 'our' or 'use vars'!
# "my @InitWhen" would break this example!
sub foo :Init(\@InitWhen) {
...
}
sub import {
my( $pkg, %opts )= @_;
for my $when ( $opts{InitWhen} ) {
@InitWhen= @$when
if $when;
}
...
}
and then documenting that the user can write code similar to the following to override the Devel::Init predicates that are applied:
package main;
use Public::Package( InitWhen => [qw< make_db_ready >] );
The code
sub foo :Init(\@InitWhen) {
actually has the same effect as:
BEGIN { RegisterInit( \&foo, '\\@InitWhen' ); }
[Note that the second argument is just a string.]
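Because only a string is passed, the array it names has to be found again later by name, and in Perl only package ("our") variables can be found that way; lexical ("my") variables are not in the symbol table. The sketch below shows that general Perl behavior. The package Public::Demo and the helper predicates_for() are hypothetical, and whether Devel::Init resolves the name in exactly this way is an assumption:

```perl
package Public::Demo;
use strict;
use warnings;

our @InitWhen= qw< not_prefork >;    # package variable: visible by name
my  @Hidden=   qw< not_prefork >;    # lexical: not in the symbol table

# Look up a package array by name at call time (a symbolic reference).
sub predicates_for {
    my( $pkg, $name )= @_;
    no strict 'refs';
    return @{ "${pkg}::$name" };
}

my $spec= '\\@InitWhen';             # single quotes: the 10 characters \@InitWhen
( my $name= $spec ) =~ s/^\\\@//;    # strip the leading \@

my @before= predicates_for( __PACKAGE__, $name );
@InitWhen= qw< make_db_ready >;      # e.g. changed later by import()
my @after=  predicates_for( __PACKAGE__, $name );
my @hidden= predicates_for( __PACKAGE__, 'Hidden' );    # finds nothing
```

Looking the array up at call time is what lets a later change to @InitWhen take effect, and it is also why a "my" array cannot work.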
AUTHOR
Tye McQueen -- http://www.perlmonks.org/?node=tye
Many thanks to http://www.whitepages.com/ and http://www.marchex.com/ for supporting the creation of this module and allowing me to release it to the world.
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself (see http://www.perl.com/perl/misc/Artistic.html).
SEE ALSO
Theo Chocolate Factory, Fremont, WA.US - http://www.theochocolate.com/
git://github.com/TyeMcQueen/Devel-Init.git