NAME

Data::Schema::Manual::Tutorial - Introduction to Data::Schema and how to use it

OVERVIEW

This document is meant to be the first thing you read if you want to learn about and use Data::Schema (DS). It explains what DS is and what it is for, how to write DS schemas, and how to validate data structures using DS.

INTRODUCTION

Often you want to be certain that a piece of data (a scalar, an array, or perhaps a hash of arrays of hashes, etc.) has a specific shape or structure, or falls within a specific range of values. For example, you might want to make sure that the argument to your function is an array containing only numbers, or that your command line arguments are valid email addresses no longer than 64 characters, and so on. In fact, data validation comes up so often that you are probably sick of writing code like this:

if (!defined($arg)) { die "Please specify an argument!" }
if (ref($arg) ne 'ARRAY') { die "Argument is not an array!" }
if (!@$arg) { die "Argument is empty array!" }
for my $i (0..@$arg-1) {
    if (!defined($arg->[$i])) { die "Element #$i is undefined!" }
    if ($arg->[$i] !~ /^-?\d+\.?\d*$/) { die "Element #$i is not a number!" }
}

This is why schemas are a good thing. A schema is a data structure, not code, that declaratively specifies this kind of validation. The functionality of the above code can be replaced with this DS schema:

[array=>{of=>'float', minlen=>1}]

And the final code becomes:

my $res = ds_validate($arg, $schema);
if (!$res->{success}) { die ... }

See how much shorter and simpler it becomes?

DS schemas are less tedious to write, and thus less boring and less error-prone than manual data validation. Because DS schemas are just normal data structures, they are reusable across programming languages and can also be validated themselves (using, why, DS schemas of course).
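
To make the "data structure, not code" idea concrete, here is a minimal sketch of how a generic checker could interpret the schema above. This is not how Data::Schema is implemented: check() is a hypothetical function that only understands the float type and the array type with its of and minlen attributes.

```perl
use strict;
use warnings;

# Hypothetical mini-checker, NOT Data::Schema's implementation. It only
# understands 'float' and 'array' (with the 'of' and 'minlen' attributes).
sub check {
    my ($data, $schema) = @_;

    # A schema is either a type name or a [TYPE, ATTRIBUTES] pair.
    my ($type, $attrs) = ref($schema) eq 'ARRAY' ? @$schema : ($schema, {});
    $attrs ||= {};

    if ($type eq 'float') {
        return 0 unless defined($data) && !ref($data);
        return $data =~ /^-?\d+\.?\d*$/ ? 1 : 0;
    }

    if ($type eq 'array') {
        return 0 unless ref($data) eq 'ARRAY';
        return 0 if defined($attrs->{minlen}) && @$data < $attrs->{minlen};
        if (defined $attrs->{of}) {
            for my $elem (@$data) {
                return 0 unless check($elem, $attrs->{of});
            }
        }
        return 1;
    }

    die "Unknown type '$type'";
}

my $schema = [array => {of => 'float', minlen => 1}];
print check([1, 2.5, -3], $schema) ? "valid\n" : "invalid\n";
print check([],           $schema) ? "valid\n" : "invalid\n";
```

The validation rules live entirely in $schema; the checking code stays the same no matter which schema you pass in.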

WRITING SCHEMAS

The simplest form of a schema is just a string specifying a type:

TYPE

Example:

int

or

hash

If you want to restrict the values that the data can contain, you can add one or more type attributes. The schema becomes a two-element array with the type in the first element, and the hash of attributes as the second element:

[ TYPE, ATTRIBUTES ]

Example:

[ str => {minlen=>4, maxlen=>8} ]

or:

[ hash => {required_keys=>[qw/name age address/]} ]

For a list of available types and their respective attributes, see the documentation for Data::Schema::Type::* modules. There are hash, array, int, float, bool, str, and object types, among others.

If you want, you can even write your own type in Perl.

VALIDATING USING DS

The simplest way is just by using the ds_validate() function. It is exported by default. The syntax is:

ds_validate($data, $schema)

Example:

my $res = ds_validate(12, [int => {min=>10}]);
die "Invalid!" unless $res->{success};

The result ($res) is a hashref:

{success=>(0 or 1), errors=>[...], warnings=>[...]}

The 'success' key is set to 1 if validation succeeds, or 0 if it fails. The 'errors' and 'warnings' keys each hold a list of messages, in case you want to know why validation failed. Each message is prefixed with a path-like position in both the data and the schema, to help you pinpoint where validation failed.
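
As a small sketch of how such a result might be consumed, the code below turns a result hashref into a one-line report. The $res value is built by hand so the example runs without Data::Schema installed; the message text is invented for illustration and is not real DS output, and summarize_result() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-built stand-in for a ds_validate() result. The message text is
# made up for illustration; real DS messages will look different.
my $res = {
    success  => 0,
    errors   => ['(hypothetical message) element 1 is not a number'],
    warnings => [],
};

# Hypothetical helper: turn a result hashref into a one-line report.
sub summarize_result {
    my ($result) = @_;
    return 'OK' if $result->{success};
    my @errors = @{ $result->{errors} || [] };
    return sprintf('FAILED with %d error(s): %s',
                   scalar(@errors), join('; ', @errors));
}

print summarize_result($res), "\n";
```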

The second way, OO-style, provides more control and options:

my $validator = Data::Schema->new;

You can set configuration using:

$validator->config->{CONFIGVAR} = 'VALUE';

You can also load plugins:

$validator->register_plugin('Data::Schema::Plugin::WHATEVER');

You can then validate using:

my $res = $validator->validate($data, $schema);

The result is the same hashref described above.

Refer to Data::Schema for details on available configuration and other methods.

ANY AND ALL

Schemas can be as simple or as complex as you want.

To require that data be of some type OR of some other type, you can write something like this:

[
    "any",
    {of => ["array", "hash"]},
]

This says that your data can be an array(ref) or a hash(ref). any is a "virtual" type that lets you specify several alternatives. Another example:

[
    "any",
    {
        of => [
            [int => {min=>1, max=>10}],
            [int => {min=>101, max=>110}],
            [int => {min=>1001, max=>1010}],
        ],
    },
]

The above says that you want an int between 1-10, OR between 101-110, OR between 1001-1010.
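
The "at least one alternative must match" rule can be sketched in plain Perl as follows. in_any_range() is a hypothetical helper written for this tutorial, not part of Data::Schema, and it covers only the integer-range case shown above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Schema: true if $n is an integer
# that falls inside at least one of the given [min, max] ranges.
sub in_any_range {
    my ($n, @ranges) = @_;
    return 0 unless defined($n) && $n =~ /^-?\d+$/;
    for my $range (@ranges) {
        my ($min, $max) = @$range;
        return 1 if $n >= $min && $n <= $max;   # one match is enough
    }
    return 0;                                   # no alternative matched
}

my @ranges = ([1, 10], [101, 110], [1001, 1010]);
for my $n (5, 50, 105, 1010) {
    printf "%d: %s\n", $n, in_any_range($n, @ranges) ? 'valid' : 'invalid';
}
```

Here 5, 105, and 1010 are reported as valid, while 50 is invalid because it falls between the allowed ranges.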

There is also the all virtual type, which requires the data to satisfy ALL requirements instead of just one. For example:

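[
    "all",
    {
        of => [
            [string => {match=>'^\w+$'}],
            [string => {match=>'(.)\1'}],
            [string => {match=>'[aeiou]'}],
        ],
    },
]

The above says that you need a string which is composed of word characters only (letters, digits, and underscore), which contains two identical characters in a row, and which also contains a lowercase vowel. Note that only the first pattern is anchored: anchoring the other two would require the whole string to be exactly two identical characters and exactly one vowel at the same time, which no string can satisfy.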
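
The "every requirement must match" rule can be sketched in plain Perl like this. matches_all() is a hypothetical helper, not part of Data::Schema. The second and third patterns are deliberately left unanchored here so that they mean "contains", as the description above intends.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Schema: true only if $str
# matches every pattern in the list.
sub matches_all {
    my ($str, @patterns) = @_;
    for my $re (@patterns) {
        return 0 unless $str =~ $re;   # one miss is enough to fail
    }
    return 1;
}

my @patterns = (
    qr/^\w+$/,     # word characters only
    qr/(.)\1/,     # two identical characters in a row
    qr/[aeiou]/,   # at least one lowercase vowel
);

for my $str ('book', 'bake', 'zz', 'oo!') {
    printf "%-5s %s\n", $str, matches_all($str, @patterns) ? 'valid' : 'invalid';
}
```

Only 'book' passes: 'bake' has no doubled character, 'zz' has no vowel, and 'oo!' contains a non-word character.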

DEFINING SCHEMAS IN TERMS OF OTHER SCHEMAS

Schemas can actually be defined in terms of other schemas. For example:

my $schema = {
    def => {
        even      => [int => {divisible_by => 2}],
        odd       => [int => {mod => [2, 1]}],
        alt_array => [array => {elem_regex => {"[02468]\$"=>"even", "[13579]\$"=>"odd"}}],
    },
    type => "alt_array",
};
my $res;
$res = ds_validate([2, 3, 8, -7, 10], $schema); # success
$res = ds_validate([2, 2, 7, -7, 10], $schema); # fail on 2nd and 3rd element

The above schema says that you want an array with alternating even and odd integers. even and odd can be regarded as subschemas, and they are used by the alt_array subschema.
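
Judging by the example results above, the elem_regex patterns are matched against element positions (0, 1, 2, ...) rather than element values. The following sketch reproduces that behavior in plain Perl under that assumption. It is not Data::Schema's implementation; %rules and failing_indices() are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical sketch, not Data::Schema's implementation: each key is a
# regex tested against the element INDEX; the value checks the element.
my %rules = (
    '[02468]$' => sub { $_[0] % 2 == 0 },   # "even"
    '[13579]$' => sub { $_[0] % 2 == 1 },   # "odd" (in Perl, -7 % 2 is 1)
);

# Returns the indices whose element fails the rule selected by its index.
sub failing_indices {
    my ($array) = @_;
    my @failed;
    for my $i (0 .. $#$array) {
        for my $pattern (sort keys %rules) {
            next unless $i =~ /$pattern/;
            push @failed, $i unless $rules{$pattern}->($array->[$i]);
        }
    }
    return @failed;
}

for my $data ([2, 3, 8, -7, 10], [2, 2, 7, -7, 10]) {
    my @failed = failing_indices($data);
    print @failed ? "failed at indices: @failed\n" : "success\n";
}
```

The first array passes, and the second fails at indices 1 and 2, that is, the 2nd and 3rd elements.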

Of course you can also write the schema in "one go":

$schema = [
    array => {
        elem_regex => {
            "[02468]\$"=>[int => {divisible_by => 2}],
            "[13579]\$"=>[int => {mod => [2, 1]}],
        }
    }
];

but some of us might find that breaking a complex schema into pieces makes it easier to understand.

EXTERNAL SCHEMAS

Aside from putting subschemas in a schema, you can also put schemas in a separate hash:

my $schema_types = {
    even => [int => {divisible_by => 2}],
    positive_even => [even => {min => 0}],
    array_of_ints => [array => {of => 'int'}],
    address => [
        "hash",
        {
            required_keys => [qw/line1 line2 city province country postcode/],
            keys => {
                line1 => ["str", {required=>1}],
                line2 => "str",
                city => ["str", {required=>1}],
                province => ["str", {required=>1}],
                country => ["str", {match=>'^[A-Z]{2}$', required=>1}],
                postcode => ["str", {minlen=>4, maxlen=>15}],
            }
        }
    ],
};
$validator->register_plugin('Data::Schema::Plugin::LoadSchema::Hash');
$validator->config->{schema_search_path} = $schema_types;
my $res;
$res = $validator->validate(4, 'positive_even'); # success
$res = $validator->validate(4, [positive_even => {min=>10}]); # fail: less than 10

The address schema above validates an address "record" (or "form"). $schema_types also defines other schema types. They are loaded using the Data::Schema::Plugin::LoadSchema::Hash plugin.
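
To illustrate how a schema such as positive_even builds on even, which in turn builds on int, here is a sketch of a name resolver. It is an assumption about how such lookups could work, not Data::Schema's actual loading logic; resolve_schema() is a hypothetical function.

```perl
use strict;
use warnings;

# Hypothetical resolver, not Data::Schema's implementation: follow a type
# name through a hash of named schemas until a base type is reached, and
# collect the attribute hashes, base definition first.
sub resolve_schema {
    my ($schema, $named, $depth) = @_;
    $depth ||= 0;
    die "Schema nesting too deep (circular definition?)" if $depth > 20;

    my ($type, $attrs) = ref($schema) eq 'ARRAY' ? @$schema : ($schema);
    my @attr_hashes = $attrs ? ($attrs) : ();

    # Not a named schema: treat it as a base type and stop.
    return ($type, @attr_hashes) unless exists $named->{$type};

    my ($base, @inherited) =
        resolve_schema($named->{$type}, $named, $depth + 1);
    return ($base, @inherited, @attr_hashes);
}

my $named = {
    even          => [int  => {divisible_by => 2}],
    positive_even => [even => {min => 0}],
};

my ($base, @attrs) = resolve_schema([positive_even => {min => 10}], $named);
print "base type: $base, attribute hashes: ", scalar(@attrs), "\n";
```

For [positive_even => {min=>10}] the sketch ends up with the base type int and three attribute hashes ({divisible_by=>2}, {min=>0}, {min=>10}), which is why the value 4 fails in the second call above: it is even and non-negative, but less than 10.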

Another alternative is putting schemas in YAML files.

# in schemadir/address.yaml
- hash
- required_keys: [line1, line2, city, province, country, postcode]
  keys:
    line1: [str, {required: 1}]
    line2: str
    city: [str, {required: 1}]
    province: [str, {required: 1}]
    country: [str, {match: '^[A-Z]{2}$', required: 1}]
    postcode: [str, {minlen: 4, maxlen: 15}]

# in schemadir/even.yaml
- int
- divisible_by: 2

# in your code
$validator->register_plugin('Data::Schema::Plugin::LoadSchema::YAMLFile');
$validator->config->{schema_search_path} = ["schemadir"];
my $res;
$res = $validator->validate(4, 'even'); # success
$res = $validator->validate(4, [even => {min=>10}]); # fail: less than 10

MORE EXAMPLES

For now, please see the t/schemas/ directory in the distribution.

SEE ALSO

Data::Schema::Manual::Schema, Data::Schema::Manual::TypeHandler, Data::Schema::Manual::Plugin

AUTHOR

Steven Haryanto, <steven at masterweb.net>

COPYRIGHT & LICENSE

Copyright 2009 Steven Haryanto, all rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.