NAME

Data::Schema - Validate nested data structures with nested structure

VERSION

Version 0.04

SYNOPSIS

# OO interface
use Data::Schema;
my $validator = Data::Schema->new();
my $schema = [array => {min_len=>2, max_len=>4}];
my $data = [1, 2, 3];
my $res = $validator->validate($data, $schema);
print "valid!" if $res->{success}; # prints 'valid!'

# procedural interface
use Data::Schema;
my $sch = ["hash",
           {keys =>
                {name => "str",
                 age  => ["int", {required=>1, min=>18}]
                }
            }
          ];
my $r;
$r = ds_validate({name=>"Lucy", age=>18}, $sch); # success
$r = ds_validate({name=>"Lucy"         }, $sch); # fail: missing age
$r = ds_validate({name=>"Lucy", age=>16}, $sch); # fail: underage

DESCRIPTION

NOTE: THIS IS A PRELIMINARY RELEASE. I have more or less pinned down the general code structure, user interface, and schema syntax that I want, and have implemented a fairly complete set of types and type attributes. You can also already create new types, either from schemas or by writing Perl type handlers. In short, it is already usable for validation tasks (and I am about to use it in production code). However, other "standard" features, like handling of default values and filters, will be implemented in future releases. I am also planning more advanced things like variable substitution, conditionals, etc., but will need to think more about the syntax.

There are already a lot of data validation modules on CPAN. However, most of them do not validate nested data structures. Many seem to focus only on "forms" (which are usually represented as a shallow hash in Perl).

Of the rest that do validate nested data, either I am not really fond of the syntax, or the validator/schema system is not simple or flexible enough for my taste. For example, other data validation modules might require you to write:

{ type => "int" }

just for validating a measly little int with no other requirements at all. I find this rather annoying. I want to be able to just say:

"int"

And thus Data::Schema (DS) is born.

With DS, you validate a nested data structure with a schema, which is also a nested data structure. Simpler cases only require a simple schema, like the plain string "int" above.

Another design consideration for DS is that I want to maximize the reusability of my schemas. DS therefore allows you to define schemas in terms of other schemas (called "schema types"), and your schemas can be "require"-d from Perl variables or YAML files.

Potential applications of DS: validating configuration, function parameters, command line arguments, etc.

To get started, see Data::Schema::Manual::Tutorial.

FUNCTIONS

ds_validate($data, $schema)

Non-OO wrapper for validate(). Exported by default. See validate() method.

ATTRIBUTES

config

Configuration hashref. See CONFIG section.

METHODS

merge_attr_hashes($attr_hashes)

Merge several attribute hashes, if any of them can be merged (i.e. contain a merge prefix in their keys). Used by DST::Base and DST::Schema. As a DS user, you normally won't need this.

init_validation_state()

Initialize validation state. Used internally by validate(). As a DS user, you normally won't need this.

save_validation_state()

Save the validation state (position in data, position in schema, number of errors, etc.) onto a stack, so that you can start using the validator to validate new data with a new schema, even in the middle of validating other data against another schema. Used internally by validate() and DST::Schema. As a DS user, you normally won't need this.

See also: restore_validation_state().

restore_validation_state()

Restore the last saved validation state from the stack. Used internally by validate() and DST::Schema. As a DS user, you normally won't need this.

See also: save_validation_state().
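
The save/restore pair described above behaves like a stack of snapshots. The following is only a minimal stand-alone sketch of that idea, not the module's code: the %state layout and the save_state()/restore_state() names are hypothetical and were chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical miniature of a validator's state (assumed layout).
my %state = (
    data_pos   => ['age'],
    schema_pos => ['keys', 'age'],
    errors     => ['age too low'],
);
my @saved_states;    # stack of snapshots

sub save_state {
    # Copy each array so later changes do not leak into the snapshot.
    my %copy = map { $_ => [ @{ $state{$_} } ] } keys %state;
    push @saved_states, \%copy;
    # Start the nested validation run with a clean state.
    %state = (data_pos => [], schema_pos => [], errors => []);
    return;
}

sub restore_state {
    die "no saved validation state\n" unless @saved_states;
    %state = %{ pop @saved_states };
    return;
}

save_state();
push @{ $state{errors} }, 'error from the nested run';
restore_state();
print "errors after restore: @{ $state{errors} }\n";
```

After restore_state() the error logged during the nested run is gone and the outer run continues where it left off.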

log_error($message)

Add an error during the validation process. The error is not added if there are already too many errors (the too_many_errors attribute is true). Used by type handlers. As a DS user, you normally won't need this.

log_warning($message)

Add a warning during the validation process. The warning is not added if there are already too many warnings (the too_many_warnings attribute is true). Used by type handlers. As a DS user, you normally won't need this.

check_type_name($name)

Checks whether $name is a valid type name. Returns true if valid, false if invalid. By default, a type name must start with a lowercase letter and contain only lowercase letters, numbers, and underscores. The maximum length is 64.

You can override this method if you want stricter/looser type name criteria.
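
The default rule above can be written as a single regex. This is a stand-alone sketch, not the module's source; the function name is_valid_type_name() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the documented default rule:
# a lowercase letter, then up to 63 more of [a-z0-9_] (64 chars max).
sub is_valid_type_name {
    my ($name) = @_;
    return 0 unless defined $name;
    return $name =~ /\A[a-z][a-z0-9_]{0,63}\z/ ? 1 : 0;
}

for my $name (qw(int my_type2 Int 2fast)) {
    printf "%-10s %s\n", $name,
        is_valid_type_name($name) ? "valid" : "invalid";
}
```

An overriding check_type_name() in a subclass could apply a looser or stricter pattern in the same way.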

register_type($name, $class|$obj)

Register a new type, along with a class name ($class) or the actual object ($obj) to handle the type. If $class is given, the class will be require'd (if not already require'd) and instantiated to become the handler object.

Any object can become a type handler, as long as it has:

* a validator() rw property to store/set the validator object;

* a handle_type() method to handle type checking;

* zero or more handle_attr_*() methods to handle attribute checking.

See Data::Schema::Manual::TypeHandler for more details on writing a type handler.
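
Because the requirement is duck-typed, a candidate object can be checked with can(). The sketch below is only an illustration: looks_like_type_handler() and the My::StubHandler class are hypothetical, and the body of handle_type() is a placeholder, since the real calling convention is described in the TypeHandler manual rather than here.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical check for the two mandatory members listed above.
sub looks_like_type_handler {
    my ($obj) = @_;
    return 0 unless blessed($obj);
    return ($obj->can('validator') && $obj->can('handle_type')) ? 1 : 0;
}

{
    package My::StubHandler;    # hypothetical class name
    sub new { return bless { validator => undef }, shift }

    # rw accessor: sets the validator when given an argument.
    sub validator {
        my $self = shift;
        $self->{validator} = shift if @_;
        return $self->{validator};
    }

    # Placeholder only; arguments and return value are NOT specified here.
    sub handle_type { return 1 }
}

print looks_like_type_handler(My::StubHandler->new)
    ? "usable as a type handler\n"
    : "not usable\n";
```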

register_plugin($class|$obj)

Register a new plugin. Accepts a plugin object or class. If $class is given, the class will be require'd (if not already require'd) and instantiated to become the plugin object.

Any object can become a plugin; you don't need to subclass anything, as long as it has:

* a validator() rw property to store/set the validator object;

* zero or more handle_*() methods to handle some events/hooks.

See Data::Schema::Manual::Plugin for more details on writing a plugin.

call_handler($name, [@args])

Try the handle_*() method from each registered plugin until one returns 0 or 1. If a plugin returns -1 (decline), continue to the next plugin. Returns the status of the last plugin tried. Returns -1 if there is no handler to invoke.
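
The dispatch rule above can be sketched in a few lines. This is a stand-alone illustration, not the module's implementation; dispatch_handler() and the two toy plugin classes are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical dispatcher following the documented rule:
# stop at the first plugin that answers 0 or 1; -1 means "declined".
sub dispatch_handler {
    my ($plugins, $name, @args) = @_;
    my $method = "handle_$name";
    my $status = -1;                          # nothing invoked yet
    for my $plugin (@$plugins) {
        next unless $plugin->can($method);    # plugin lacks this hook
        $status = $plugin->$method(@args);
        last if $status == 0 || $status == 1; # definite answer
    }
    return $status;
}

{
    package Declining;    # toy plugin: always declines
    sub new { return bless {}, shift }
    sub handle_unknown_type { return -1 }
}
{
    package Accepting;    # toy plugin: always handles the event
    sub new { return bless {}, shift }
    sub handle_unknown_type { return 1 }
}

my @plugins = (Declining->new, Accepting->new);
my $status  = dispatch_handler(\@plugins, "unknown_type", "my_type");
print "status: $status\n";
```

Here the first plugin declines, so the second one is tried and its status is returned.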

get_type_handler($name)

Try to get the type handler for a given type. If the type is not found, invoke handle_unknown_type() in plugins to give them a chance to load the type. If the type is still not found, return undef.

normalize_schema($schema)

Normalize a schema into the third form (hash form), {type=>..., attr_hashes=>..., def=>...}, and do some sanity checks on it. Returns an error message string on failure.
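
To illustrate what normalizing into the hash form might look like, here is a rough stand-alone sketch. It is not the module's code: to_hash_form() is a hypothetical name, and the sketch ASSUMES that the first form is a bare type-name string and the second form is an array of [TYPE, ATTR_HASH, ...], as used in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical normalizer. Returns a hashref on success and an
# error message string on failure, mirroring the description above.
sub to_hash_form {
    my ($schema) = @_;
    return "schema is undefined" unless defined $schema;

    if (!ref($schema)) {                      # assumed first form: "int"
        return { type => $schema, attr_hashes => [], def => {} };
    }
    if (ref($schema) eq 'ARRAY') {            # assumed second form
        my ($type, @attr_hashes) = @$schema;
        return "array form needs a type name"
            unless defined $type && !ref($type);
        return "attribute hashes must be hashrefs"
            if grep { ref($_) ne 'HASH' } @attr_hashes;
        return { type => $type, attr_hashes => \@attr_hashes, def => {} };
    }
    if (ref($schema) eq 'HASH') {             # already in hash form
        return "hash form needs a 'type' key" unless defined $schema->{type};
        return {
            type        => $schema->{type},
            attr_hashes => $schema->{attr_hashes} || [],
            def         => $schema->{def} || {},
        };
    }
    return "unsupported schema form: " . ref($schema);
}

my $n = to_hash_form([array => { min_len => 2, max_len => 4 }]);
print "type: $n->{type}, attribute hashes: ",
    scalar @{ $n->{attr_hashes} }, "\n";
```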

register_schema_as_type($schema, $name)

Register a schema as a new type. $schema is a normalized schema. Returns {success=>(0 or 1), error=>...}. Fails if a type named $name is already defined, or if $schema cannot be parsed. May register more than one type if the schema contains other type definitions (the hash form of a schema can define types).

validate($data[, $schema])

Validate a data structure. $schema must be given unless you have already supplied the schema via the schema attribute.

Returns {success=>0 or 1, errors=>[...], warnings=>[...]}. The 'success' key will be set to 1 if the data validates, otherwise 'errors' and 'warnings' will be filled with the details.
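
A caller typically branches on 'success' and reports the details otherwise. The sketch below works on a hand-built result of the documented shape, so it runs without the module; format_result() is a hypothetical helper, the sample message text is invented, and it ASSUMES each entry in 'errors' and 'warnings' is a plain string.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a result hashref into printable text.
sub format_result {
    my ($res) = @_;
    return "OK" if $res->{success};
    my @lines = map { "error: $_" } @{ $res->{errors} || [] };
    push @lines, map { "warning: $_" } @{ $res->{warnings} || [] };
    return join "\n", @lines;
}

# Hand-built sample; a real result would come from validate().
my $failed = { success => 0, errors => ["age is required"], warnings => [] };
print format_result($failed), "\n";
```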

CONFIG

Configuration is set like this:

my $validator = Data::Schema->new;
$validator->config->{CONFIGVAR} = 'VALUE';
# ...

Available configuration variables:

max_errors => INT

Maximum number of errors before validation stops. Default is 10.

max_warnings => INT

Maximum number of warnings recorded; once it is reached, further warnings are not added. Default is 10.

schema_search_path => [...]

A list of places to look for schemas. If you use DSP::LoadSchema::YAMLFile, this will be a list of directories to search for YAML files. If you use DSP::LoadSchema::Hash, this will be the hashes to search for schemas. This is used if you use schema types (types based on schema).

See Data::Schema::Type::Schema for more details.

gettext_function => \&func

If set to a coderef, it will be used to get a custom error message when the errmsg attribute suffix is used. For example, if the schema is:

[str => {regex=>'/^\w+$/', 'regex.errmsg'=>'alphanums_only'}]

then your function will be called with 'alphanums_only' as the argument.
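
A lookup-table coderef is one simple way to provide such a function. This is only a sketch: the %catalogue contents and the fallback to the message id are illustrative choices, not behaviour prescribed by the module.

```perl
use strict;
use warnings;

# Hypothetical message catalogue keyed by the errmsg values in schemas.
my %catalogue = (
    alphanums_only => 'Only letters, digits, and underscores are allowed',
);

# Returns the translated text, or the id itself when none is known.
my $gettext = sub {
    my ($msgid) = @_;
    return exists $catalogue{$msgid} ? $catalogue{$msgid} : $msgid;
};

print $gettext->('alphanums_only'), "\n";

# With a validator object at hand it would be installed like this:
# $validator->config->{gettext_function} = $gettext;
```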

PERFORMANCE NOTES

Because of the way the code is written and structured (e.g. it uses Moose, and validation involves a relatively high number of method calls), DS is probably slower than other data validation modules. Note, however, that at this stage the code has not been profiled or optimized.

To give a rough picture, here's how DS 0.03 fares on my Athlon 64 X2 5000+ (which I think is still a fairly decent box in 2009). Perl 5.10.0, Moose 0.72.

1. Using the simplest case:

$validator->validate(1, "int")

the speed is around 14,000 validations per second.

2. Using the dice throws example (see DSM::Tutorial):

$validator->validate([1,2,3,4,5,6,[1,1],[1,2],[1,3],[1,4]], $schema)

the speed is around 150/sec.

3. Using the dice throws example, but moving all subschemas to a hash and using DSP::LoadSchema::Hash to load it, the speed is around 190/sec.

4. Using a fairly complex schema, XXX.

With this kind of performance you might want to reconsider using DS inside functions that are called very frequently (like hundreds or thousands of times per second). But I think DS should be fine for CGI applications or for command line argument checking and I will not be focusing on performance for the time being.

Some tips on performance:

1. move subschemas out;

2. keep schema simple;

3. write heavy-duty validation logic in Perl (e.g. using new type handler and/or type attribute).

SEE ALSO

Data::Schema::Manual::Tutorial, Data::Schema::Manual::Schema, Data::Schema::Manual::TypeHandler, Data::Schema::Manual::Plugin

Some other data validation modules on CPAN: Data::FormValidator, Data::Rx, Kwalify.

AUTHOR

Steven Haryanto, <steven at masterweb.net>

BUGS

Please report any bugs or feature requests to bug-data-schema at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Schema. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Data::Schema

You can also look for information at:

ACKNOWLEDGEMENTS

COPYRIGHT & LICENSE

Copyright 2009 Steven Haryanto, all rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.