NAME

Data::Domain - Data description and validation

SYNOPSIS

use Data::Domain qw/:all/;

my $domain = Struct(
  anInt      => Int(-min => 3, -max => 18),
  aNum       => Num(-min => 3.33, -max => 18.5),
  aDate      => Date(-max => 'today'),
  aLaterDate => sub {my $context = shift;
                     Date(-min => $context->{flat}{aDate})},
  aString    => String(-min_length => 2, -optional => 1),
  anEnum     => Enum(qw/foo bar buz/),
  anIntList  => List(-min_size => 1, -all => Int),
  aMixedList => List(Int, String, Int(-min => 0), Date),
  aStruct    => Struct(foo => String, bar => Int(-optional => 1))
);

my $messages = $domain->inspect($some_data);
my_display_error($messages) if $messages;

DESCRIPTION

A data domain is a description of a set of values, either scalar, or, more interestingly, structured values (arrays or hashes). The description can include many constraints, like minimal or maximal values, regular expressions, required fields, forbidden fields, and also contextual dependencies. From that description, one can then check if a given value belongs to the domain. In case of mismatch, a structured set of error messages is returned.

The motivation for writing this package was to be able to express in a compact way some possibly complex constraints about structured data. The data is a Perl tree (nested hashrefs or arrayrefs) that may come from XML, JSON, from a database through DBIx::DataModel, or from postprocessing an HTML form through CGI::Expand. Data::Domain is a kind of tree parser on that structure, with some facilities for dealing with dependencies within the structure through lazy evaluation of domains. Other packages doing data validation are briefly listed in the "SEE ALSO" section.

DISCLAIMER : this code is still in a design exploration phase; some parts of the API may change in future versions.

FUNCTIONS

Shortcuts for domain constructors

Internally, domains are represented as Perl objects; however, it would be tedious to write

my $domain = Data::Domain::Struct->new(
  anInt      => Data::Domain::Int->new(-min => 3, -max => 18),
  aDate      => Data::Domain::Date->new(-max => 'today'),
  ...
);

so for each of its builtin domain constructors, Data::Domain exports a plain function that just calls new on the appropriate subclass. If you import those functions (use Data::Domain qw/:all/, or use Data::Domain qw/Struct Int Date .../), then you can write more conveniently :

my $domain = Struct(
  anInt      => Int(-min => 3, -max => 18),
  aDate      => Date(-max => 'today'),
  ...
);

Function names like Int or String are convenient, but also quite common and therefore may cause name clashes with other modules. In such cases, don't import the function names, and explicitly call the new method on domain constructors -- or write your own wrappers around them.

node_from_path

my $node = node_from_path($root, @path);

Convenience function to find a given node in a data tree, starting from the root and following a path (a sequence of hash keys or array indices). Returns undef if no such path exists in the tree. Mainly useful for contextual constraints in lazy constructors (see below).
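
The path-following behaviour described above can be sketched in plain Perl. The walk_path function below is a hypothetical stand-in written for illustration; it is not the module's own implementation, and its handling of edge cases (such as non-numeric array indices) is an assumption.

```perl
use strict;
use warnings;

# walk_path is a hypothetical stand-in, NOT the module's own code:
# it follows @path from the root, one hash key or array index per step.
sub walk_path {
    my ($node, @path) = @_;
    for my $step (@path) {
        if (ref $node eq 'HASH') {
            return undef unless exists $node->{$step};
            $node = $node->{$step};
        }
        elsif (ref $node eq 'ARRAY') {
            return undef unless $step =~ /^\d+$/ && $step < @$node;
            $node = $node->[$step];
        }
        else {
            return undef;    # cannot descend into a plain scalar or undef
        }
    }
    return $node;
}

my $root = {foo => [undef, 99, {bar => "hello, world"}]};
print walk_path($root, 'foo', 2, 'bar'), "\n";
print defined(walk_path($root, 'foo', 7)) ? "found" : "no such path", "\n";
```

The first call reaches the nested string; the second asks for an index beyond the end of the array and therefore yields undef.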

METHODS

new

Creates a new domain object, from one of the domain constructors listed below (Num, Int, Date, etc.). The Data::Domain class itself has no new method, because it is an abstract class.

Arguments to the new method specify various constraints for the domain (minimal/maximal values, regular expressions, etc.); most often they are specific to a given domain constructor, so see the details below. However, there are also some generic options :

-optional

if true, an undef value will be accepted, without generating an error message

-default

defines a default value for the domain, that can then be retrieved by client code, for example for pre-filling a form

-name

defines a name for the domain, that will be printed in error messages instead of the subclass name.

-messages

defines how error messages will be generated

Option names always start with a dash. If no option name is given, parameters to the new method are passed to the default option, which differs according to the constructor subclass. For example the default option in List is -items, so

my $domain = List(Int, String, Int);

is equivalent to

my $domain = List(-items => [Int, String, Int]);

inspect

my $messages = $domain->inspect($some_data);

Inspects the supplied data, and returns an error message (or a structured collection of messages) if anything is wrong. If the data successfully passed all domain tests, then nothing is returned.

For scalar domains (Num, String, etc.), the error message is just a string. For structured domains (List, Struct), the return value is a corresponding arrayref or hashref, like for example

{anInt => "smaller than minimum 3",
 aDate => "not a valid date",
 aList => ["message for item 0", undef, undef, "message for item 3"]}

The client code can then exploit this structure to dispatch error messages to appropriate locations (typically these will be the form fields that gathered the data).
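
As one possible way of exploiting that structure, the sketch below flattens a nested collection of messages into (path, message) pairs that could then be attached to form fields. The function name flatten_messages and the dotted path notation are choices made for this example only; they are not part of Data::Domain.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a nested message structure into a list of
# [path, message] pairs, skipping positions that carry no message.
sub flatten_messages {
    my ($messages, @path) = @_;
    return () unless defined $messages;
    if (ref $messages eq 'HASH') {
        return map { flatten_messages($messages->{$_}, @path, $_) }
               sort keys %$messages;
    }
    if (ref $messages eq 'ARRAY') {
        return map { flatten_messages($messages->[$_], @path, $_) }
               0 .. $#$messages;
    }
    return [join('.', @path), $messages];    # a leaf: plain message string
}

# Same shape as the structure shown above
my $messages = {
    anInt => "smaller than minimum 3",
    aDate => "not a valid date",
    aList => ["message for item 0", undef, undef, "message for item 3"],
};

printf "%-10s %s\n", @$_ for flatten_messages($messages);
```

Each printed line pairs a location such as aList.3 with its message; undefined slots in the list are skipped.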

msg

Internal utility function for generating an error message.

subclass

Returns the short name of the subclass of Data::Domain (for example, returns 'Int' for Data::Domain::Int).

messages

Data::Domain->messages('english');  # the default
Data::Domain->messages('français');
Data::Domain->messages($my_messages); 
Data::Domain->messages(sub {my ($msg_id, @args) = @_;
                            return "you just got it wrong"});

Global setting to choose how error messages are generated. The argument is either the name of a builtin language (right now, only english and french), or a hashref with your own messages, or a reference to your own message handling function.

If supplying your own messages, you should pass a two-level hashref: first-level entries in the hash correspond to Data::Domain subclasses (e.g. Num => {...}, String => {...}); for each of those, the second-level entries should correspond to message identifiers as specified in the doc for each subclass (for example TOO_SHORT, NOT_A_HASH, etc.). Values should be strings suitable to be fed to sprintf.
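
A two-level hashref of that kind might look like the sketch below. The message texts are invented for the example, and which arguments each message identifier actually receives is not specified here; the sprintf call at the end merely assumes, for illustration, that TOO_SMALL is given the minimum value.

```perl
use strict;
use warnings;

# First level: subclass short names; second level: message identifiers.
my $my_messages = {
    Num    => { INVALID   => "this is not a number",
                TOO_SMALL => "must be at least %s" },
    String => { TOO_SHORT => "needs at least %d characters" },
};

# The hashref would then be registered with:
#   Data::Domain->messages($my_messages);

# Illustration only: expanding one template by hand, ASSUMING the
# minimum value is the argument supplied for TOO_SMALL.
my $text = sprintf $my_messages->{Num}{TOO_SMALL}, 3;
print "$text\n";
```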

BUILTIN DOMAIN CONSTRUCTORS

Whatever

my $domain = Struct(
  just_anything => Whatever,
  is_defined    => Whatever(-defined => 1),
  is_undef      => Whatever(-defined => 0),
  is_true       => Whatever(-true => 1),
  is_false      => Whatever(-true => 0),
  is_object     => Whatever(-isa => 'My::Funny::Object'),
  has_methods   => Whatever(-can => [qw/jump swim dance sing/]),
);

Encapsulates just any kind of Perl value.

Options

-defined

If true, the data must be defined. If false, the data must be undef.

-true

If true, the data must be true. If false, the data must be false.

-isa

The data must be an object of the specified class.

-can

The data must implement the listed methods, supplied either as an arrayref (several methods) or as a scalar (just one method).

Error messages

In case of failure, the domain returns one of the following scalar messages : MATCH_DEFINED, MATCH_TRUE, MATCH_ISA, MATCH_CAN.

Num

my $domain = Num(-min => -3.33, -max => 999, -not_in => [2, 3, 5, 7, 11]);

Domain for numbers (including floats).

Options

-min

The data must be greater than or equal to the supplied value.

-max

The data must be smaller than or equal to the supplied value.

-not_in

The data must be different from all values in the exclusion set, supplied as an arrayref.

Error messages

In case of failure, the domain returns one of the following scalar messages : INVALID, TOO_SMALL, TOO_BIG, EXCLUSION_SET.

Int

my $domain = Int(-min => 0, -max => 999, -not_in => [2, 3, 5, 7, 11]);

Domain for integers. Accepts the same options as Num and returns the same error messages.

Date

Data::Domain::Date->parser('EU'); # default    
my $domain = Date(-min => '01.01.2001', 
                  -max => 'today',
                  -not_in => ['02.02.2002', '03.03.2003', 'yesterday']);

Domain for dates, implemented via the Date::Calc module. By default, dates are parsed according to the European format, i.e. through the Decode_Date_EU method; this can be changed by setting

Data::Domain::Date->parser('US'); # will use Decode_Date_US

or

Data::Domain::Date->parser(\&your_own_date_parsing_function);
# that func. should return an array ($year, $month, $day)
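
As an illustration of such a parsing function, here is a minimal sketch for ISO-style dates (yyyy-mm-dd). The name parse_iso_date is hypothetical, and returning an empty list for unparseable input is an assumption modelled on what the Date::Calc decoders do.

```perl
use strict;
use warnings;

# Hypothetical custom parser: accepts 'yyyy-mm-dd' and returns
# ($year, $month, $day), or an empty list when the string does not match.
sub parse_iso_date {
    my ($string) = @_;
    my ($year, $month, $day) = $string =~ /^(\d{4})-(\d{2})-(\d{2})$/
        or return;
    return ($year, $month, $day);
}

# It would be registered with:
#   Data::Domain::Date->parser(\&parse_iso_date);

my @ymd = parse_iso_date('2006-12-31');
print "@ymd\n";    # 2006 12 31
```

Note that this sketch only checks the shape of the string, not whether the date exists in the calendar.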

When outputting error messages, dates will be printed according to Date::Calc's current language (English by default); see that module's documentation for changing the language.

Options

In the options below, the special keywords today, yesterday or tomorrow may be used instead of a date constant, and will be replaced by the appropriate date when performing comparisons.

-min

The data must be greater than or equal to the supplied value.

-max

The data must be smaller than or equal to the supplied value.

-not_in

The data must be different from all values in the exclusion set, supplied as an arrayref.

Error messages

In case of failure, the domain returns one of the following scalar messages : INVALID, TOO_SMALL, TOO_BIG, EXCLUSION_SET.

Time

my $domain = Time(-min => '08:00', -max => 'now');

Domain for times in format hh:mm:ss (minutes and seconds are optional).

Options

In the options below, the special keyword now may be used instead of a time, and will be replaced by the current local time when performing comparisons.

-min

The data must be greater than or equal to the supplied value.

-max

The data must be smaller than or equal to the supplied value.

Error messages

In case of failure, the domain returns one of the following scalar messages : INVALID, TOO_SMALL, TOO_BIG.

String

my $domain = String(qr/^[A-Za-z0-9_\s]+$/);

my $domain = String(-regex     => qr/^[A-Za-z0-9_\s]+$/,
                    -antiregex => qr/$RE{profanity}/,    # see Regexp::Common
                    -min       => 'AA',
                    -max       => 'zz',
                    -not_in    => [qw/foo bar/]);

Domain for strings.

Options

-regex

The data must match the supplied compiled regular expression. Don't forget to put ^ and $ anchors if you want your regex to check the whole string.

-regex is the default option, so you may just pass the regex as a single unnamed argument to String().

-antiregex

The data must not match the supplied regex.

-min

The data must be greater than or equal to the supplied value.

-max

The data must be smaller than or equal to the supplied value.

-not_in

The data must be different from all values in the exclusion set, supplied as an arrayref.

Error messages

In case of failure, the domain returns one of the following scalar messages : TOO_SHORT, TOO_LONG, TOO_SMALL, TOO_BIG, EXCLUSION_SET, SHOULD_MATCH, SHOULD_NOT_MATCH.

Enum

my $domain = Enum(qw/foo bar buz/);

Domain for a finite set of scalar values.

Options

-values

Ref to an array of values admitted in the domain. This would be called as Enum(-values => [qw/foo bar buz/]), but since this is the default option, it can be simply written as Enum(qw/foo bar buz/).

Error messages

In case of failure, the domain returns the following scalar message : NOT_IN_LIST.

List

my $domain = List(String, Int, String, Num);

my $domain = List(-items => [String, Int, String, Num]); # same as above

my $domain = List(-all      => String(qr/^[A-Z]+$/),
                  -min_size => 3,
                  -max_size => 10);

Domain for lists of values (stored as Perl arrayrefs).

Options

-items

Ref to an array of domains; then the first n items in the data must match those domains, in the same order.

This is the default option, so item domains may be passed directly to the new method, without the -items keyword.

-min_size

The data must be a ref to an array with at least that number of entries.

-max_size

The data must be a ref to an array with at most that number of entries.

-all

All remaining entries in the array, after the first n entries as specified by the -items option (if any), must satisfy that domain specification.

-any

At least one remaining entry in the array, after the first n entries as specified by the -items option (if any), must satisfy that domain specification.

Option -any is incompatible with option -all.

Error messages

The domain will first check if the supplied array is of appropriate shape; in case of failure, it will return one of the following scalar messages : NOT_A_LIST, TOO_SHORT, TOO_LONG.

Then it will check all items in the supplied array according to the -items, -all or -any specifications, and return an arrayref of messages, where message positions correspond to the positions of offending data items.

Struct

my $domain = Struct(foo => Int, bar => String);

my $domain = Struct(-fields  => [foo => Int, bar => String],
                    -exclude => '*');

Domain for associative structures (stored as Perl hashrefs).

Options

-fields

Supplies a list of keys with their associated domains. The list might be given either as a hashref or as an arrayref (in which case the order of individual field checks will follow the order in the array). The ordering may make a difference in case of context dependencies (see "LAZY CONSTRUCTORS" below).

-exclude

Specifies which keys are not allowed in the structure. The exclusion may be specified as an arrayref of key names, as a compiled regular expression, or as the string constant '*' or 'all' (meaning that no key will be allowed except those explicitly listed in the -fields option).

Error messages

The domain will first check if the supplied hash is of appropriate shape; in case of failure, it will return one of the following scalar messages : NOT_A_HASH, FORBIDDEN_FIELD.

Then it will check all entries in the supplied hash according to the -fields specification, and return a hashref of messages, where keys correspond to the keys of offending data items.

One_of

my $domain = One_of($domain1, $domain2, ...);

Union of domains : successively checks the member domains, until one of them succeeds.

Options

-options

List of domains to be checked. This is the default option, so the keyword may be omitted.

Error messages

If all member domains failed to accept the data, an arrayref of error messages is returned, where the order of messages corresponds to the order of the checked domains.

LAZY CONSTRUCTORS (CONTEXT DEPENDENCIES)

Principle

If an element of a structured domain (List or Struct) depends on another element, then we need to lazily construct the domain. Consider for example a struct in which the value of field date_end must be greater than date_begin : the subdomain for date_end can only be constructed when the argument to -min is known, namely when the domain inspects an actual data structure.

Lazy domain construction is achieved by supplying a function reference instead of a domain object. That function will be called with some context information, and should return the domain object. So our example becomes :

my $domain = Struct(
     date_begin => Date,
     date_end   => sub {my $context = shift;
                        Date(-min => $context->{flat}{date_begin})}
   );

Structure of context

The supplied context is a hashref containing the following information:

root

the overall root of the inspected data

path

the sequence of keys or array indices that led to the current data node. With that information, the subdomain is able to jump to other ancestor or sibling data nodes within the tree, with the help of the node_from_path function.

flat

a flat hash containing an entry for any hash key met so far while traversing the tree. In case of name clashes, most recent keys (down in the tree) override previous keys.

list

a reference to the last list (arrayref) encountered while traversing the tree.

Here is an example :

my $data   = {foo => [undef, 99, {bar => "hello, world"}]};
my $domain = Struct(
   foo => List(Whatever, 
               Whatever, 
               Struct(bar => sub {my $context = shift;
                                  print Dumper($context);
                                  String;})
              )
   );
$domain->inspect($data);

This code will print something like

$VAR1 = {
  'root' => {'foo' => [undef, 99, {'bar' => 'hello, world'}]},
  'path' => ['foo', 2, 'bar'],
  'list' => $VAR1->{'root'}{'foo'},
  'flat' => {
    'bar' => 'hello, world',
    'foo' => $VAR1->{'root'}{'foo'}
  }
};
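
The override rule stated for the flat entry can be pictured as a plain hash merge. The sketch below is a simplified model written for this explanation, under the assumption that entries are merged each time the traversal enters a hash; it is not taken from the module's code, and the sample keys are invented.

```perl
use strict;
use warnings;

my $data = {
    name  => 'outer',
    begin => '01.01.2001',
    inner => {name => 'inner'},
};

# Simplified model (an assumption, not the module's code): each time the
# traversal enters a hash, its entries are merged over what was seen so far.
my %flat = %$data;                       # entering the root hash
%flat = (%flat, %{ $data->{inner} });    # entering the nested hash

print "$flat{name}\n";     # inner      (the deeper key wins)
print "$flat{begin}\n";    # 01.01.2001 (still visible from the outer level)
```

When two levels share a key name and the outer value is needed, the root and path entries offer an unambiguous route to it.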

Usage examples

Contextual sets

my $some_cities = {
   Switzerland => [qw/Genève Lausanne Bern Zurich Bellinzona/],
   France      => [qw/Paris Lyon Marseille Lille Strasbourg/],
   Italy       => [qw/Milano Genova Livorno Roma Venezia/],
};
my $domain = Struct(
   country => Enum(keys %$some_cities),
   city    => sub {
      my $context = shift;
      Enum(-values => $some_cities->{$context->{flat}{country}});
    });

Ordered lists

Here is an example of a domain for ordered lists of integers:

my $domain = List(-all => sub {
    my $context = shift;
    my $index = $context->{path}[-1];
    return Int if $index == 0; # first item has no constraint
    return Int(-min => $context->{list}[$index-1] + 1);
  });

Recursive domains

A domain for expression trees, where leaves are numbers, and intermediate nodes are binary operators on subtrees

my $expr_domain;   # declared first, so that the closures below can see it
$expr_domain = One_of(Num, Struct(operator => String(qr(^[-+*/]$)),
                                  left     => sub {$expr_domain},
                                  right    => sub {$expr_domain}));

WRITING NEW DOMAIN CONSTRUCTORS

Implementing new domain constructors is fairly simple : create a subclass of Data::Domain and implement a new method and an _inspect method. See the source code of Data::Domain::Num or Data::Domain::String for short examples.

However, before writing such a class, consider whether the existing mechanisms already cover your needs. For example, many domains could be expressed as a String with a regular expression:

my $Email_dom   = String(qr/^[-.\w]+\@[\w.]+$/); 
my $Phone_dom   = String(qr/^\+?[0-9() ]+$/); 
my $Contact_dom = Struct(name   => String,
                         phone  => $Phone_dom,
                         mobile => $Phone_dom,
                         emails => List(-all => $Email_dom));

SEE ALSO

Doc and tutorials on complex Perl data structures: perlref, perldsc, perllol.

Other CPAN modules doing data validation : Data::FormValidator, CGI::FormBuilder, HTML::Widget::Constraint, Jifty::DBI, Data::Constraint, Declare::Constraints::Simple. Among those, Declare::Constraints::Simple is the closest to Data::Domain, because it is also designed to deal with substructures; yet it has a different approach to combinations of constraints and scope dependencies.

Some inspiration for Data::Domain came from the wonderful Parse::RecDescent module, especially the idea of passing a context where individual rules can grab information about neighbour nodes.

TODO

- generate javascript validation code
- normalization / conversions (-filter option)
- msg callbacks (-filter_msg option)
- default values within domains ? (good idea ?)

AUTHOR

Laurent Dami, <laurent.dami AT etat geneve ch>

COPYRIGHT AND LICENSE

Copyright 2006 by Laurent Dami.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
