NAME
Data::Domain - Data description and validation
SYNOPSIS
use Data::Domain qw/:all/;
my $domain = Struct(
anInt => Int(-min => 3, -max => 18),
aNum => Num(-min => 3.33, -max => 18.5),
aDate => Date(-max => 'today'),
aLaterDate => sub {my $context = shift;
Date(-min => $context->{flat}{aDate})},
aString => String(-min_length => 2, -optional => 1),
anEnum => Enum(qw/foo bar buz/),
anIntList => List(-min_size => 1, -all => Int),
aMixedList => List(Int, String, Int(-min => 0), Date),
aStruct => Struct(foo => String, bar => Int(-optional => 1))
);
my $messages = $domain->inspect($some_data);
my_display_error($messages) if $messages;
DESCRIPTION
A data domain is a description of a set of values, either scalar, or, more interestingly, structured values (arrays or hashes). The description can include many constraints, like minimal or maximal values, regular expressions, required fields, forbidden fields, and also contextual dependencies. From that description, one can then check if a given value belongs to the domain. In case of mismatch, a structured set of error messages is returned.
The motivation for writing this package was to be able to express in a compact way some possibly complex constraints about structured data. The data is a Perl tree (nested hashrefs or arrayrefs) that may come from XML, JSON, from a database through DBIx::DataModel, or from postprocessing an HTML form through CGI::Expand. Data::Domain is a kind of tree parser on that structure, with some facilities for dealing with dependencies within the structure through lazy evaluation of domains. Other packages doing data validation are briefly listed in the "SEE ALSO" section.
DISCLAIMER : this code is still in design exploration phase; some parts of the API may change in future versions.
FUNCTIONS
Shortcuts for domain constructors
Internally, domains are represented as Perl objects; however, it would be tedious to write
my $domain = Data::Domain::Struct->new(
anInt => Data::Domain::Int->new(-min => 3, -max => 18),
aDate => Data::Domain::Date->new(-max => 'today'),
...
);
so for each of its builtin domain constructors, Data::Domain exports a plain function that just calls new on the appropriate subclass. If you import those functions (use Data::Domain qw/:all/, or use Data::Domain qw/Struct Int Date .../), then you can write more conveniently :
my $domain = Struct(
anInt => Int(-min => 3, -max => 18),
aDate => Date(-max => 'today'),
...
);
Function names like Int or String are convenient, but also quite common, and therefore may clash with names exported by other modules. In such cases, don't import the function names; instead, explicitly call the new method on the domain subclasses, or write your own wrappers around them.
node_from_path
my $node = node_from_path($root, @path);
Convenience function to find a given node in a data tree, starting from the root and following a path (a sequence of hash keys or array indices). Returns undef if no such path exists in the tree. Mainly useful for contextual constraints in lazy constructors (see below).
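The traversal described above can be pictured with a small pure-Perl sketch. The function my_node_from_path below is a hypothetical stand-in written for illustration; it is not the module's own code, and the real node_from_path may treat edge cases differently.

```perl
use strict;
use warnings;

# Hypothetical re-implementation, for illustration only.
# Follows @path from $node, one hash key or array index at a time.
sub my_node_from_path {
    my ($node, @path) = @_;
    for my $step (@path) {
        if (ref $node eq 'HASH') {
            return undef unless exists $node->{$step};
            $node = $node->{$step};
        }
        elsif (ref $node eq 'ARRAY') {
            # only non-negative integer indices within bounds are accepted here
            return undef unless $step =~ /^\d+$/ && $step < @$node;
            $node = $node->[$step];
        }
        else {
            return undef;    # cannot descend into a scalar
        }
    }
    return $node;
}

my $root = {foo => [undef, 99, {bar => "hello, world"}]};
print my_node_from_path($root, qw/foo 2 bar/), "\n";
print defined my_node_from_path($root, qw/foo 5 bar/) ? "found" : "no such path", "\n";
```

The first call follows the path foo, 2, bar down to the string; the second one stops at the missing index 5 and yields undef.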
METHODS
new
Creates a new domain object, from one of the domain constructors listed below (Num, Int, Date, etc.). The Data::Domain class itself has no new method, because it is an abstract class.
Arguments to the new method specify various constraints for the domain (minimal/maximal values, regular expressions, etc.); most often they are specific to a given domain constructor, so see the details below. However, there are also some generic options :
-optional
-
if true, an undef value will be accepted, without generating an error message
-default
-
defines a default value for the domain, that can then be retrieved by client code, for example for pre-filling a form
-name
-
defines a name for the domain, that will be printed in error messages instead of the subclass name.
-messages
-
defines how error messages will be generated
Option names always start with a dash. If no option name is given, parameters to the new method are passed to the default option, which differs according to the constructor subclass. For example the default option in List is -items, so
my $domain = List(Int, String, Int);
is equivalent to
my $domain = List(-items => [Int, String, Int]);
inspect
my $messages = $domain->inspect($some_data);
Inspects the supplied data, and returns an error message (or a structured collection of messages) if anything is wrong. If the data successfully passed all domain tests, then nothing is returned.
For scalar domains (Num, String, etc.), the error message is just a string. For structured domains (List, Struct), the return value is a corresponding arrayref or hashref, like for example
{anInt => "smaller than minimum 3",
aDate => "not a valid date",
aList => ["message for item 0", undef, undef, "message for item 3"]}
The client code can then exploit this structure to dispatch error messages to appropriate locations (typically these will be the form fields that gathered the data).
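One way for client code to exploit that structure is to flatten it into "path => message" pairs, which are easy to map onto form field names. The helper flatten_messages below is hypothetical (it is not part of the module) and the dotted path notation is just one possible convention.

```perl
use strict;
use warnings;

# Hypothetical helper: walks the nested message structure and returns
# a flat list of (dotted path => message) pairs, skipping undef entries.
sub flatten_messages {
    my ($messages, @path) = @_;
    return () unless defined $messages;
    if (ref $messages eq 'HASH') {
        return map { flatten_messages($messages->{$_}, @path, $_) }
               sort keys %$messages;
    }
    if (ref $messages eq 'ARRAY') {
        return map { flatten_messages($messages->[$_], @path, $_) }
               0 .. $#$messages;
    }
    return (join('.', @path) => $messages);    # a plain scalar message
}

# Same shape as the structure shown above
my $messages = {
    anInt => "smaller than minimum 3",
    aDate => "not a valid date",
    aList => ["message for item 0", undef, undef, "message for item 3"],
};

my %by_field = flatten_messages($messages);
print "$_: $by_field{$_}\n" for sort keys %by_field;
```

Items without errors (the undef slots in aList) simply do not appear in the flattened result.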
msg
Internal utility function for generating an error message.
subclass
Returns the short name of the subclass of Data::Domain (i.e. returns 'Int' for Data::Domain::Int).
messages
Data::Domain->messages('english'); # the default
Data::Domain->messages('français');
Data::Domain->messages($my_messages);
Data::Domain->messages(sub {my ($msg_id, @args) = @_;
return "you just got it wrong"});
Global setting to choose how error messages are generated. The argument is either the name of a builtin language (right now, only english and french), or a hashref with your own messages, or a reference to your own message handling function.
If supplying your own messages, you should pass a two-level hashref: first-level entries in the hash correspond to Data::Domain subclasses (e.g. Num => {...}, String => {...}); for each of those, the second-level entries should correspond to message identifiers as specified in the doc for each subclass (for example TOO_SHORT, NOT_A_HASH, etc.). Values should be strings suitable to be fed to sprintf.
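As an illustration of the two-level layout, here is a sketch of such a hashref together with a hypothetical render_message function showing how sprintf templates of this kind could be consumed. The message identifiers come from the subclass sections below; the arguments that the module actually passes to sprintf are not described here, so the single argument used in this sketch is an assumption.

```perl
use strict;
use warnings;

# Two-level table: subclass name => message identifier => sprintf template
my $my_messages = {
    Num    => { TOO_SMALL => "value is below the minimum %s",
                TOO_BIG   => "value is above the maximum %s" },
    String => { TOO_SHORT => "needs at least %d characters" },
};

# Hypothetical lookup, for illustration only (not the module's code).
sub render_message {
    my ($table, $subclass, $msg_id, @args) = @_;
    my $template = $table->{$subclass}{$msg_id}
        or return "$subclass: $msg_id";    # fall back to the bare identifier
    return sprintf $template, @args;
}

print render_message($my_messages, Num => 'TOO_SMALL', 3), "\n";
print render_message($my_messages, Int => 'INVALID'), "\n";
```

A table like $my_messages would then be installed through Data::Domain->messages($my_messages), as shown above.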
BUILTIN DOMAIN CONSTRUCTORS
Whatever
my $domain = Struct(
just_anything => Whatever,
is_defined => Whatever(-defined => 1),
is_undef => Whatever(-defined => 0),
is_true => Whatever(-true => 1),
is_false => Whatever(-true => 0),
is_object => Whatever(-isa => 'My::Funny::Object'),
has_methods => Whatever(-can => [qw/jump swim dance sing/]),
);
Encapsulates just any kind of Perl value.
Options
- -defined
-
If true, the data must be defined. If false, the data must be undef.
- -true
-
If true, the data must be true. If false, the data must be false.
- -isa
-
The data must be an object of the specified class.
- -can
-
The data must implement the listed methods, supplied either as an arrayref (several methods) or as a scalar (just one method).
Error messages
In case of failure, the domain returns one of the following scalar messages : MATCH_DEFINED, MATCH_TRUE, MATCH_ISA, MATCH_CAN.
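To make the -isa and -can options more concrete, here is a rough pure-Perl equivalent of what such checks amount to. The function looks_like and its option names are hypothetical; the module's actual implementation may differ, for instance in how it treats unblessed data.

```perl
use strict;
use warnings;
use Scalar::Util qw/blessed/;

# Hypothetical check, for illustration only: is $data an object of the
# given class that also implements all the listed methods ?
sub looks_like {
    my ($data, %opt) = @_;
    return 0 unless blessed $data;
    return 0 if $opt{isa} && !$data->isa($opt{isa});
    # methods may be given as an arrayref (several) or as a scalar (just one)
    my @methods = ref $opt{can} ? @{$opt{can}} : grep { defined $_ } $opt{can};
    return 0 if grep { !$data->can($_) } @methods;
    return 1;
}

{
    package My::Funny::Object;
    sub new  { return bless {}, shift }
    sub jump { }
    sub swim { }
}

my $obj = My::Funny::Object->new;
print looks_like($obj, isa => 'My::Funny::Object', can => [qw/jump swim/]) ? "ok" : "rejected", "\n";
print looks_like($obj, can => 'dance') ? "ok" : "rejected", "\n";
```

The first check succeeds; the second is rejected because the sample class has no dance method.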
Num
my $domain = Num(-min => -3.33, -max => 999, -not_in => [2, 3, 5, 7, 11]);
Domain for numbers (including floats).
Options
- -min
-
The data must be greater or equal to the supplied value.
- -max
-
The data must be smaller or equal to the supplied value.
- -not_in
-
The data must be different from all values in the exclusion set, supplied as an arrayref.
Error messages
In case of failure, the domain returns one of the following scalar messages : INVALID, TOO_SMALL, TOO_BIG, EXCLUSION_SET.
Int
my $domain = Int(-min => 0, -max => 999, -not_in => [2, 3, 5, 7, 11]);
Domain for integers. Accepts the same options as Num and returns the same error messages.
Date
Data::Domain::Date->parser('EU'); # default
my $domain = Date(-min => '01.01.2001',
-max => 'today',
-not_in => ['02.02.2002', '03.03.2003', 'yesterday']);
Domain for dates, implemented via the Date::Calc module. By default, dates are parsed according to the european format, i.e. through the Decode_Date_EU method; this can be changed by setting
Data::Domain::Date->parser('US'); # will use Decode_Date_US
or
Data::Domain::Date->parser(\&your_own_date_parsing_function);
# that func. should return an array ($year, $month, $day)
When outputting error messages, dates will be printed according to Date::Calc's current language (english by default); see that module's documentation for changing the language.
Options
In the options below, the special keywords today, yesterday or tomorrow may be used instead of a date constant, and will be replaced by the appropriate date when performing comparisons.
- -min
-
The data must be greater or equal to the supplied value.
- -max
-
The data must be smaller or equal to the supplied value.
- -not_in
-
The data must be different from all values in the exclusion set, supplied as an arrayref.
Error messages
In case of failure, the domain returns one of the following scalar messages : INVALID, TOO_SMALL, TOO_BIG, EXCLUSION_SET.
Time
my $domain = Time(-min => '08:00', -max => 'now');
Domain for times in format hh:mm:ss (minutes and seconds are optional).
Options
In the options below, the special keyword now may be used instead of a time, and will be replaced by the current local time when performing comparisons.
- -min
-
The data must be greater or equal to the supplied value.
- -max
-
The data must be smaller or equal to the supplied value.
Error messages
In case of failure, the domain returns one of the following scalar messages : INVALID, TOO_SMALL, TOO_BIG.
String
my $domain = String(qr/^[A-Za-z0-9_\s]+$/);
my $domain = String(-regex => qr/^[A-Za-z0-9_\s]+$/,
-antiregex => qr/$RE{profanity}/, # see Regexp::Common
-min => 'AA',
-max => 'zz',
-not_in => [qw/foo bar/]);
Domain for strings.
Options
- -regex
-
The data must match the supplied compiled regular expression. Don't forget to put ^ and $ anchors if you want your regex to check the whole string.
-regex is the default option, so you may just pass the regex as a single unnamed argument to String().
- -antiregex
-
The data must not match the supplied regex.
- -min
-
The data must be greater or equal to the supplied value.
- -max
-
The data must be smaller or equal to the supplied value.
- -not_in
-
The data must be different from all values in the exclusion set, supplied as an arrayref.
Error messages
In case of failure, the domain returns one of the following scalar messages : TOO_SHORT, TOO_LONG, TOO_SMALL, TOO_BIG, EXCLUSION_SET, SHOULD_MATCH, SHOULD_NOT_MATCH.
Enum
my $domain = Enum(qw/foo bar buz/);
Domain for a finite set of scalar values.
Options
- -values
-
Ref to an array of values admitted in the domain. This would be called as Enum(-values => [qw/foo bar buz/]), but since it is the default option, it can simply be written as Enum(qw/foo bar buz/).
Error messages
In case of failure, the domain returns the following scalar message : NOT_IN_LIST.
List
my $domain = List(String, Int, String, Num);
my $domain = List(-items => [String, Int, String, Num]); # same as above
my $domain = List(-all => String(qr/^[A-Z]+$/),
-min_size => 3,
-max_size => 10);
Domain for lists of values (stored as Perl arrayrefs).
Options
- -items
-
Ref to an array of domains; then the first n items in the data must match those domains, in the same order.
This is the default option, so item domains may be passed directly to the new method, without the -items keyword.
- -min_size
-
The data must be a ref to an array with at least that number of entries.
- -max_size
-
The data must be a ref to an array with at most that number of entries.
- -all
-
All remaining entries in the array, after the first n entries as specified by the -items option (if any), must satisfy that domain specification.
- -any
-
At least one remaining entry in the array, after the first n entries as specified by the -items option (if any), must satisfy that domain specification.
Option -any is incompatible with option -all.
Error messages
The domain will first check if the supplied array is of appropriate shape; in case of failure, it will return one of the following scalar messages : NOT_A_LIST, TOO_SHORT, TOO_LONG.
Then it will check all items in the supplied array according to the -items, -all or -any specifications, and return an arrayref of messages, where message positions correspond to the positions of offending data items.
Struct
my $domain = Struct(foo => Int, bar => String);
my $domain = Struct(-fields => [foo => Int, bar => String],
-exclude => '*');
Domain for associative structures (stored as Perl hashrefs).
Options
- -fields
-
Supplies a list of keys with their associated domains. The list might be given either as a hashref or as an arrayref (in which case the order of individual field checks will follow the order in the array). The ordering may make a difference in case of context dependencies (see "LAZY CONSTRUCTORS" below).
- -exclude
-
Specifies which keys are not allowed in the structure. The exclusion may be specified as an arrayref of key names, as a compiled regular expression, or as the string constant '*' or 'all' (meaning that no key will be allowed except those explicitly listed in the -fields option).
Error messages
The domain will first check if the supplied hash is of appropriate shape; in case of failure, it will return one of the following scalar messages : NOT_A_HASH, FORBIDDEN_FIELD.
Then it will check all entries in the supplied hash according to the -fields specification, and return a hashref of messages, where keys correspond to the keys of offending data items.
One_of
my $domain = One_of($domain1, $domain2, ...);
Union of domains : successively checks the member domains, until one of them succeeds.
Options
Error messages
If all member domains failed to accept the data, an arrayref of error messages is returned, where the order of messages corresponds to the order of the checked domains.
LAZY CONSTRUCTORS (CONTEXT DEPENDENCIES)
Principle
If an element of a structured domain (List or Struct) depends on another element, then we need to lazily construct the domain. Consider for example a struct in which the value of field date_end must be greater than date_begin : the subdomain for date_end can only be constructed when the argument to -min is known, namely when the domain inspects an actual data structure.
Lazy domain construction is achieved by supplying a function reference instead of a domain object. That function will be called with some context information, and should return the domain object. So our example becomes :
my $domain = Struct(
date_begin => Date,
date_end => sub {my $context = shift;
Date(-min => $context->{flat}{date_begin})}
);
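The principle can be pictured with a toy model in pure Perl. This is not the module's internal code: the "domains" here are plain hashrefs, resolve_domain is a hypothetical helper, and the context only carries the root and flat entries. The sketch merely shows why a coderef can see the value of date_begin by the time date_end is handled.

```perl
use strict;
use warnings;

# Toy model: a "domain" is either a ready-made description (a plain
# hashref here) or a coderef that builds one from the context.
sub resolve_domain {
    my ($domain, $context) = @_;
    return ref $domain eq 'CODE' ? $domain->($context) : $domain;
}

my %fields = (
    date_begin => { type => 'Date' },
    date_end   => sub {
        my $context = shift;
        return { type => 'Date', min => $context->{flat}{date_begin} };
    },
);

my $data    = { date_begin => '01.01.2001', date_end => '31.12.2001' };
my $context = { root => $data, flat => {} };

# Fields are visited in a fixed order; each visited value is recorded
# in the context, so later fields can depend on earlier ones.
my %resolved;
for my $name (qw/date_begin date_end/) {
    $resolved{$name}        = resolve_domain($fields{$name}, $context);
    $context->{flat}{$name} = $data->{$name};
}

print "date_end must be >= $resolved{date_end}{min}\n";
```

Visiting date_end before date_begin would leave the minimum undefined, which is why the order of field checks can matter when fields depend on each other.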
Structure of context
The supplied context is a hashref containing the following information:
- root
-
the overall root of the inspected data
- path
-
the sequence of keys or array indices that led to the current data node. With that information, the subdomain is able to jump to other ancestor or sibling data nodes within the tree, with the help of the node_from_path function.
- flat
-
a flat hash containing an entry for any hash key met so far while traversing the tree. In case of name clashes, most recent keys (down in the tree) override previous keys.
- list
-
a reference to the last list (arrayref) encountered while traversing the tree.
Here is an example :
my $data = {foo => [undef, 99, {bar => "hello, world"}]};
my $domain = Struct(
foo => List(Whatever,
Whatever,
Struct(bar => sub {my $context = shift;
print Dumper($context);
String;})
)
);
$domain->inspect($data);
This code will print something like
$VAR1 = {
'root' => {'foo' => [undef, 99, {'bar' => 'hello, world'}]},
'path' => ['foo', 2, 'bar'],
'list' => $VAR1->{'root'}{'foo'},
'flat' => {
'bar' => 'hello, world',
'foo' => $VAR1->{'root'}{'foo'}
}
};
Usage examples
Contextual sets
my $some_cities = {
Switzerland => [qw/Genève Lausanne Bern Zurich Bellinzona/],
France => [qw/Paris Lyon Marseille Lille Strasbourg/],
Italy => [qw/Milano Genova Livorno Roma Venezia/],
};
my $domain = Struct(
country => Enum(keys %$some_cities),
city => sub {
my $context = shift;
Enum(-values => $some_cities->{$context->{flat}{country}});
});
Ordered lists
Here is an example of a domain for ordered lists of integers:
my $domain = List(-all => sub {
my $context = shift;
my $index = $context->{path}[-1];
return Int if $index == 0; # first item has no constraint
return Int(-min => $context->{list}[$index-1] + 1);
});
Recursive domains
A domain for expression trees, where leaves are numbers, and intermediate nodes are binary operators on subtrees
my $expr_domain; # declared first, so that the closures below can refer to it
$expr_domain = One_of(Num, Struct(operator => String(qr(^[-+*/]$)),
left => sub {$expr_domain},
right => sub {$expr_domain}));
WRITING NEW DOMAIN CONSTRUCTORS
Implementing new domain constructors is fairly simple : create a subclass of Data::Domain and implement a new method and an _inspect method. See the source code of Data::Domain::Num or Data::Domain::String for short examples.
However, before writing such a class, consider whether the existing mechanisms already cover your needs. For example, many domains could be expressed as a String with a regular expression:
my $Email_dom = String(qr/^[-.\w]+\@[\w.]+$/);
my $Phone_dom = String(qr/^\+?[0-9() ]+$/);
my $Contact_dom = Struct(name => String,
phone => $Phone_dom,
mobile => $Phone_dom,
emails => List(-all => $Email_dom));
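Since such domains are only as good as their regular expressions, it can help to try the patterns on their own first. The sketch below uses the two patterns from the example above in plain Perl, without the module; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Same patterns as in the domains above
my $email_re = qr/^[-.\w]+\@[\w.]+$/;
my $phone_re = qr/^\+?[0-9() ]+$/;

# Made-up sample values
my @candidates = (
    'laurent.dami@example.org',
    'user@my-host.org',
    '+41 (0)22 123 45 67',
    '022-123',
);

for my $candidate (@candidates) {
    printf "%-26s email:%-3s phone:%s\n", $candidate,
        ($candidate =~ $email_re ? 'yes' : 'no'),
        ($candidate =~ $phone_re ? 'yes' : 'no');
}
```

Note that the email pattern is deliberately simple: it rejects host names containing a hyphen, such as my-host.org, and the phone pattern rejects hyphens as separators. Real applications may need more permissive expressions.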
SEE ALSO
Doc and tutorials on complex Perl data structures: perlref, perldsc, perllol.
Other CPAN modules doing data validation : Data::FormValidator, CGI::FormBuilder, HTML::Widget::Constraint, Jifty::DBI, Data::Constraint, Declare::Constraints::Simple. Among those, Declare::Constraints::Simple is the closest to Data::Domain, because it is also designed to deal with substructures; yet it has a different approach to combinations of constraints and scope dependencies.
Some inspiration for Data::Domain came from the wonderful Parse::RecDescent module, especially the idea of passing a context where individual rules can grab information about neighbour nodes.
TODO
- generate javascript validation code
- normalization / conversions (-filter option)
- msg callbacks (-filter_msg option)
- default values within domains ? (good idea ?)
AUTHOR
Laurent Dami, <laurent.dami AT etat geneve ch>
COPYRIGHT AND LICENSE
Copyright 2006 by Laurent Dami.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.