NAME

Parse::PerlConfig - parse a configuration file written in Perl

SYNOPSIS

use Parse::PerlConfig;
my $parsed = Parse::PerlConfig::parse(
    File            =>      "/etc/perlapp/conf",
    Handlers        =>      [\%config, \&config],
);

DESCRIPTION

This module is useful for parsing a configuration file written in Perl and obtaining the values defined therein. This is achieved through the parse() function, which creates a namespace, reads in Perl code, evals it, and then examines the namespace's symbol table. Symbols are then processed into a hash and returned.

Export

The parse() function is exportable upon request.

Parsing

Parsing is not a simple do("filename"). Instead, the files specified are opened, read, eval'd, and closed. The justification for this is twofold:

I did not want surprises in what file was found; do("file") searches @INC.

lexicals

I wanted to be able to insert lexicals for the code in the file to see. Being able to define variables without having them parsed back out (remember, the namespace is searched) is a nice feature.

Parsing (in this manner) requires a namespace. By default, the namespace is constructed by appending Namespace_Base to a unique identifier (currently, an encoded version of the filename, but don't rely on this). You can override this behaviour by specifying an explicit Namespace argument.

Prior to eval'ing the contents of a configuration file, the lexical hash %parse_perl_config is initialized with several keys (documented below). If a Lexicals argument was given, each of the lexicals it specifies is initialized as well.

There are a few caveats. Lexicals specified in the Lexicals argument cannot override %parse_perl_config. Values specified in Lexicals cannot be code references, because code references cannot currently be reliably reconstructed. Modifications to %parse_perl_config keys (other than Error, documented below) are discouraged, as the results are not defined.

The %parse_perl_config hash contains the following keys:

Error

Making this key a true value will cause the error handler Error_eval to be called with the value.

Namespace

The namespace the file is being evaluated in.

Filename

The name of the file being parsed.

Parse_Args

A hash of the arguments passed to parse().
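As an illustration of these keys, a configuration file might look something like the fragment below. This is only a sketch: the settings $log_dir and @servers are made up for the example, and the variables are declared with our on the assumption that this is acceptable whether or not the file is evaluated under strict.

```perl
# Hypothetical configuration file, e.g. /etc/perlapp/conf.
# %parse_perl_config is the lexical hash supplied by parse(); it is not
# declared here, so this file only makes sense when read by parse().
our $log_dir = "/var/log/perlapp";    # hypothetical setting
our @servers = ("alpha", "beta");     # hypothetical setting

# Report a problem through the Error_eval handler.
$parse_perl_config{Error} =
    "no servers configured in $parse_perl_config{Filename}"
    unless @servers;
```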

Once the namespace has been set up and the code eval'd, it is parse()'s job to go through the namespace's symbol table and look for "things". What it looks for depends on the Thing_Order and Symbols arguments.

After that, handlers are updated, and a hash reference of what was parsed out is returned.
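The overall mechanism can be sketched in plain Perl. This is not the module's actual implementation, only a minimal illustration of the steps described in this section; the namespace name, the configuration text, the lexical $app_root, and the choice to store scalars as values and arrays and hashes as references are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the contents of a configuration file.
my $config_code = <<'END_CONFIG';
our $name    = "perlapp";
our $log_dir = "$app_root/log";
our @servers = ("alpha", "beta");
our %limits  = (max_users => 10);
END_CONFIG

# Hypothetical namespace name; parse() generates or accepts its own.
my $namespace = "Sketch::ConfigFile::demo";

# Evaluate the code inside the namespace. The lexical $app_root is visible
# to the configuration code but never enters the symbol table.
{
    my $app_root = "/opt/perlapp";
    eval "package $namespace;\n$config_code\n1;"
        or die "eval failed: $@";
}

# Walk the namespace's symbol table and collect what was defined,
# checking the scalar slot first, then the array and hash slots.
my %parsed;
{
    no strict 'refs';
    for my $symbol (sort keys %{"${namespace}::"}) {
        my $glob = \*{"${namespace}::$symbol"};
        if (defined ${ *{$glob}{SCALAR} }) {
            $parsed{$symbol} = ${ *{$glob}{SCALAR} };
        }
        elsif (*{$glob}{ARRAY}) {
            $parsed{$symbol} = *{$glob}{ARRAY};
        }
        elsif (*{$glob}{HASH}) {
            $parsed{$symbol} = *{$glob}{HASH};
        }
    }
}
```

In this sketch %parsed ends up holding name, log_dir, servers and limits, while app_root does not appear because it was only ever a lexical.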

The parse subroutine

Arguments

parse() takes a list of key-value pairs as its arguments and adds them to an argument hash. If the first argument to parse() is a hash or array reference, it is dereferenced and used as if it were specified as a list. All elements following this argument are added to the arguments hash, and they override any settings specified by the reference.

This means the call:

parse(
    { Files => "/home/me/config.conf", Error_default => 'fwarn' },
    Files => "/home/you/config.conf"
);

causes parse()'s argument hash to consist of the following (ignoring default settings):

Files           =>  "/home/you/config.conf",
Error_default   =>  "fwarn",

Simply replace the braces, {}, with brackets, [], and you get the same result. This makes it convenient to store commonly-used arguments to parse() in a hash or array and pass them to parse() efficiently, while still allowing a separate Files argument for each call.
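The merging rule can be sketched as follows. The helper merge_args is hypothetical and is not part of Parse::PerlConfig; it only mirrors the behaviour described above, where a leading hash or array reference is flattened and any pairs that follow override it.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the described argument handling.
sub merge_args {
    my @args = @_;
    my %merged;

    # A leading hash or array reference is dereferenced first...
    if (ref($args[0]) eq 'HASH') {
        %merged = %{ shift @args };
    }
    elsif (ref($args[0]) eq 'ARRAY') {
        %merged = @{ shift @args };
    }

    # ...and the remaining key-value pairs override its settings.
    %merged = (%merged, @args);
    return \%merged;
}

# Commonly-used arguments kept in one place, with Files given per call.
my %common = (Files => "/home/me/config.conf", Error_default => 'fwarn');
my $from_hash  = merge_args(\%common,    Files => "/home/you/config.conf");
my $from_array = merge_args([ %common ], Files => "/home/you/config.conf");
```

Both calls yield the same argument hash, with Files taken from the trailing pair and Error_default taken from the reference.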

The below itemization of parse()'s arguments describes key-value pairs. Each item consists of a key name and a description of the expected value for that key.

The value description requires some explanation.

A single pipe, "|", separates alternative values; specify exactly one of them.

Values bracketed with "<" and ">" are placeholders, not literals. So, in the case of <coderef>, you must specify a code reference (a closure or reference to a named subroutine), not the literal string "<coderef>". Values without such bracketing are literal.

Braces, {}, indicate a hash reference is required; brackets, [], indicate an array reference is required.

Below each key-value description is a description of the default setting, followed by a description of what the argument means.

Files <filename> | [<filename> ...]
default: none, this argument is required

This is the file or files you wish to parse. If a file cannot be parsed for any reason, the entire parse is not abandoned; the file is simply skipped (after calling the appropriate error handler).

File <filename>

Equivalent to Files <filename>.

Handlers <hashref>|<coderef> | [<hashref>|<coderef> ...]
default: none

By default, parse() simply returns a hash reference of symbol names and their values. Given a Handlers argument, parse() will also add the parsed key-value pairs to each hash reference specified, and call each code reference specified with a single argument: the same hash reference that parse() returns.

Handler <hashref>|<coderef>

Equivalent to Handlers <hashref>|<coderef>.
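The way handlers are updated can be pictured with a small sketch. The helper apply_handlers and the sample data are hypothetical; only the two behaviours (copying into hash references, calling code references with the result) come from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper; apply_handlers is not part of Parse::PerlConfig.
sub apply_handlers {
    my ($parsed, @handlers) = @_;
    for my $handler (@handlers) {
        if (ref($handler) eq 'HASH') {
            # Copy every parsed key-value pair into the caller's hash.
            @{$handler}{ keys %$parsed } = values %$parsed;
        }
        elsif (ref($handler) eq 'CODE') {
            # Hand the whole result to the callback as its only argument.
            $handler->($parsed);
        }
        else {
            # The module reports this case through Error_unknown_handler.
            warn "unknown handler: $handler\n";
        }
    }
    return $parsed;
}

# Stand-in for what a parse might have produced.
my $parsed = { name => "perlapp", servers => [ "alpha", "beta" ] };

my (%config, @seen);
apply_handlers($parsed, \%config, sub { push @seen, sort keys %{ $_[0] } });
```

After the call, %config holds the same keys as the result, and the callback has seen the result exactly once.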

Lexicals <hashref>
default: none

The key-value pairs in the specified hashref are made into lexical variables in the configuration eval. See the section on Parsing for further information.

Thing_Order <string>|<arrayref>
default: '$@%&i*'

Specifies the default thing order for symbols parsed from each configuration file. See the section Things for further information.

Taint_Clean <boolean>
default: false

If set to any true value, the filehandle opened on the configuration file is untainted before eval'ing the code contained therein. Because this involves loading IO::Handle, which involves quite a bit of code, the option is turned off by default. You will get taint exceptions if you don't specify this option while running in -T mode.

Also, as the namespace is currently constructed, having a tainted filename will cause the namespace name to be tainted, so it is also untainted. In the case of an explicitly specified Namespace value, it will also be untainted.

No other values are untainted. This includes any key-value pairs specified by the Lexicals argument; you must untaint those yourself, since there is no reasonable way for parse() to determine how best to untaint them.
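Untainting your own values is usually done with a regular expression capture, as described in perlsec. The sketch below assumes a value that is supposed to be a plain path; the helper name untaint_path and the set of accepted characters are choices made for this example, and you should pick a pattern that suits your own data.

```perl
use strict;
use warnings;

# Hypothetical helper: accept only a conservative set of path characters.
# A value captured by a successful match is considered untainted.
sub untaint_path {
    my ($value) = @_;
    my ($clean) = $value =~ m{\A([\w./-]+)\z}
        or die "refusing to untaint '$value'\n";
    return $clean;
}

# A cleaned value like this one could then be supplied through Lexicals.
my $app_root = untaint_path("/opt/perlapp");
```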

Symbols <hashref>
default: empty hashref

This is an override for the Thing_Order argument above. The keys in the specified hashref are the symbols you want parsed specially; the values are the thing order to use for each (either a string or an array reference). See the section Things for further information regarding thing order.

Namespace <string>
default: generated from Namespace_Base and a unique identifier

This option explicitly specifies the namespace the files are parsed in. See the section Parsing for further information.

Namespace_Base <string>
default: Parse::PerlConfig::ConfigFile

Unless the Namespace argument is specified, the namespace a file is parsed in is generated by appending a cleaned up version of the filename to this setting. See the section Parsing for further information.

Error_*

See the section Exception Handling for further information.

Things

"Things" (as taken from the Perl documentation, regarding the *foo{THING} syntax) are the Perl datatypes. These include scalars, arrays, hashes, subroutines, IO handles, and globs.

Anywhere a "things" argument is required you can specify one of two things: a string containing the special "thing" characters, or an array reference of each thing's actual name. The thing characters are as follows: $ for a scalar, % for a hash, @ for an array, & for a subroutine, i for an IO handle, and * for a glob. The full name for each coincides with the name of the datatype's glob slot: SCALAR for a scalar, HASH for a hash, ARRAY for an array, CODE for a subroutine, IO for an IO handle, and GLOB for a glob.
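The relationship between the two forms of a thing order, and how an order might be applied to a symbol, can be sketched like this. The helpers thing_names and first_thing and the namespace Sketch::Things are hypothetical; only the character-to-name mapping is taken from the text above.

```perl
use strict;
use warnings;

# Thing characters and the glob slot names they stand for.
my %thing_name = (
    '$' => 'SCALAR', '@' => 'ARRAY', '%' => 'HASH',
    '&' => 'CODE',   'i' => 'IO',    '*' => 'GLOB',
);

# Hypothetical helper: turn either form of a thing order into slot names.
sub thing_names {
    my ($order) = @_;
    return @$order if ref($order) eq 'ARRAY';
    my @names;
    for my $char (split //, $order) {
        die "unknown thing character '$char'\n"
            unless exists $thing_name{$char};
        push @names, $thing_name{$char};
    }
    return @names;
}

# Hypothetical helper: name of the first slot in the order that is filled.
sub first_thing {
    my ($namespace, $symbol, $order) = @_;
    no strict 'refs';
    my $glob = \*{"${namespace}::$symbol"};
    for my $name (thing_names($order)) {
        my $ref = *{$glob}{$name};
        next unless defined $ref;
        # The scalar slot always exists, so only count a defined value.
        next if $name eq 'SCALAR' && !defined $$ref;
        return $name;
    }
    return;
}

{
    package Sketch::Things;    # hypothetical namespace
    our @foo = (1, 2, 3);
    our %foo = (a => 1);
}

my $from_string = first_thing('Sketch::Things', 'foo', '$@%');
my $from_names  = first_thing('Sketch::Things', 'foo', [ 'HASH', 'ARRAY' ]);
```

With the string order the array wins, because no scalar value was defined and @ precedes %. With the explicit name list the hash wins, because HASH is listed first.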

Exception Handling

parse() takes various Error_* and Warn_* arguments that determine how it handles any problems it encounters. Each argument can take one of several values.

default

The error handling specified by Error_default in the case of an Error_* argument, or Warn_default in the case of a Warn_* argument, is used.

noop

The error is ignored.

warn

Results in a call to CORE::warn() with a trailing newline, but only if $^W is set to a true value.

fwarn

Like warn, but the warning is raised regardless of $^W's value.

die

Results in a call to CORE::die() with a trailing newline.

<code reference>

The code reference will be called with a single argument, that of the error message. The error message is guaranteed to contain no trailing newlines (in case the code reference decides to die() or warn()).
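The values above amount to a small dispatch table. The following is a sketch of how such dispatching could work, not the module's own code; the function name dispatch_error and its argument order are assumptions.

```perl
use strict;
use warnings;

# Hypothetical dispatcher for the handler values described above.
sub dispatch_error {
    my ($handler, $message, $default) = @_;

    # Callbacks are promised a message without trailing newlines.
    $message =~ s/\n+\z//;

    # 'default' defers to the Error_default or Warn_default setting.
    $handler = $default if !ref($handler) && $handler eq 'default';

    return $handler->($message) if ref($handler) eq 'CODE';
    return                      if $handler eq 'noop';
    if ($handler eq 'warn')  { warn "$message\n" if $^W; return }
    if ($handler eq 'fwarn') { warn "$message\n";        return }
    die "$message\n" if $handler eq 'die';
    die "unknown handler '$handler'\n";
}

# A code reference receives the bare message.
my @log;
dispatch_error(sub { push @log, @_ }, "could not open file\n", 'warn');

# 'default' resolves to the supplied default, here 'die'.
my $survived  = eval { dispatch_error('default', "bad argument", 'die'); 1 };
my $died_with = $@;
```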

There are various handler arguments. Unless otherwise specified, the default handler is used (Error_default's or Warn_default's value).

Warn_default
default: noop

The default warning handler.

Warn_preparse

Called just before a file is parsed to indicate parsing is about to begin.

Warn_eval

Called with any warnings issued by the eval'd file.

Error_default
default: warn

The default error handler.

Error_argument

Called if there is a problem with one of the arguments specified.

Error_file_is_dir

Called if a configuration file specified was discovered to be a directory.

Error_failed_open

Called if the open attempt on a configuration file fails.

Error_eval

Called if the variable $parse_perl_config{Error} is set in the configuration file, or if there was an eval error.

Error_unknown_thing

Called if there is a problem with a thing character or thing name in a thing argument (thing thing thing).

Error_unknown_handler

Called if an unknown reference is encountered in the Handlers argument.

Error_invalid_lexical

Called if an invalid lexical name or a CODE reference value is encountered in the Lexicals argument.

Error_invalid_namespace

Called if either the constructed namespace (using Namespace_Base) or a specified Namespace value is invalid. This may indicate an error in the construction of a namespace name (the generation of a unique identifier), but it's most likely you specified Namespace_Base or Namespace with invalid characters.

BUGS

Because the scalar slot in a glob is always filled, it is not possible to distinguish a scalar that was never defined (e.g. @foo was, but $foo was never mentioned) from one that is simply undef. As a result, if you have a thing order of $@ and code along the lines of $foo = undef; @foo = (); the 'foo' key of the hash will be an array reference, despite there being a scalar and $ coming first in the thing order.
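The underlying Perl behaviour can be observed directly. In this sketch the package Sketch::Bug and its variable names are made up for the demonstration; the point is that both scalar slots look identical from the outside.

```perl
use strict;
use warnings;

{
    package Sketch::Bug;             # hypothetical namespace
    our @only_array   = (1, 2);      # $only_array is never mentioned
    our $undef_scalar = undef;       # explicitly assigned undef
    our @undef_scalar = ();
}

# Both lookups return a scalar reference, and both referents are undef.
my $never_mentioned = *Sketch::Bug::only_array{SCALAR};
my $assigned_undef  = *Sketch::Bug::undef_scalar{SCALAR};

my $indistinguishable =
       ref($never_mentioned) eq 'SCALAR'
    && ref($assigned_undef)  eq 'SCALAR'
    && !defined($$never_mentioned)
    && !defined($$assigned_undef);
```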

TODO

t/parse/symbols.t, t/parse/multi-file.t, t/parse/namespace.t

AUTHOR

Michael Fowler <michael@shoebox.net>