NAME
Data::Schema::Manual::Tutorial - Introduction to and using Data::Schema
OVERVIEW
This document is meant to be first reading for people wanting to know and use Data::Schema (DS). It explains what DS is and what it is for, how to write DS schemas, and how to validate data structures using DS.
INTRODUCTION
Often you want to be certain that a piece of data (a scalar, an array, or perhaps a hash of arrays of hashes, etc.) is of specific range of values/shape/structure. For example, you might want to make sure that the argument to your function is an array of just numbers, or that your command line arguments are valid email addresses that are no longer than 64 characters, and so on. In fact, data validation happens so often that you're totally sick of writing code like this:
if (!defined($arg)) { die "Please specify an argument!" }
if (ref($arg) ne 'ARRAY') { die "Argument is not an array!" }
if (!@$arg) { die "Argument is empty array!" }
for my $i (0..@$arg-1) {
if (!defined($arg)) { die "Element #$i is undefined!" }
if ($arg->[$i] !~ /^-?\d+\.?\d*$/) { die "Element #$i is not a number!" }
}
This is why schemas are a good thing. A schema is a data structure, not code, that declaratively specifies this kind of validation. The functionality of the above code can be replaced with this DS schema:
[array=>{of=>'float', minlen=>1}]
And the final code becomes:
my $res = ds_validate($arg, $schema);
if (!$res->{success}) { die ... }
See how much shorter and simpler it becomes?
DS schemas are less tedious to write, and thus less boring and less error-prone than manual data validation. Because DS schemas are just normal data structures, they are reusable across programming languages and can also be validated themselves (using, why, DS schemas of course).
WRITING SCHEMAS
First Form (scalar)
The simplest form of a schema is just a string specifying a type:
TYPE
Example:
int
or
hash
Second Form (array)
If you want to restrict the values that the data can contain, you can add one or more type attributes. The schema becomes a two-element array with the type in the first element, and the hash of attributes as the second element:
[ TYPE, ATTRIBUTES ]
Example:
[ str => {minlen=>4, maxlen=>8} ]
or:
[ hash => {required_keys=>[qw/name age address/]} ]
For a list of available types and their respective attributes, see the documentation for Data::Schema::Type::* modules. There are hash, array, int, float, bool, str, and object types, among others.
You can write your own type in Perl. It's pretty simple. See Data::Schema::Manual::TypeHandler. Or you can also write new types in schema itself. See SCHEMA AS TYPE section below.
Third Form (hash)
There is also the third form:
{ type: TYPE, attrs: ATTRIBUTES, ... }
which allows for more complex stuffs, but we do not need to write in this form unless we want to define subschemas (a.k.a schema types).
VALIDATING USING DS
The simplest way is just by using the ds_validate() function. It is exported by default. The syntax is:
ds_validate($data, $schema)
Example:
use Data::Schema;
my $res = ds_validate(12, [int => {min=>10}]);
die "Invalid!" unless $res->{success};
The result ($res) is a hashref:
{success=>(0 or 1), errors=>[...], warnings=>[...]}
The 'success' key will be set to 1 if validation is successful, or 0 if not. The 'errors' (and 'warnings') keys are each a list of errors (and warnings) provided should you want to check for details why the validation fails. Each error/warning message is prefixed with data and schema path-like position to help you pinpoint where in the data and schema the validation fails.
The second way, OO-style, provides more control and options:
use Data::Schema;
my $validator = new Data::Schema;
You can set configuration using:
$validator->config->{CONFIGVAR} = 'VALUE';
You can also load plugins:
$validator->register_plugin('Data::Schema::Plugin::WHATEVER');
You can then validate using:
my $res = $validator->validate($data, $schema);
The result is the same hashref described above.
Refer to Data::Schema for details on available configuration and other methods.
SCHEMA AS TYPE
In DS, you can also write new types using schema itself. In other words, you can define schemas in terms of other schemas.
You can put these schema types in another schema (in which you might call the schema types as "subschemas"), or in a hash, or in YAML files.
Schema types inside schema
Example of putting schema types (subschemas) in another schema:
my $schema = {
def => {
even => [int => {divisible_by => 2}],
odd => [int => {mod => [2, 1]}],
alt_array => [array => {elem_regex => {"[02468]\$"=>"even", "[13579]\$"=>"odd"}}],
},
type => "alt_array",
};
my $res;
$res = ds_validate([2, 3, 8, -7, 10], $schema); # success
$res = ds_validate([2, 2, 7, -7, 10], $schema); # fail on 2nd and 3rd element
The above schema says that you want an array with alternating even and odd integers. even and odd can be regarded as subschemas, and they are used by the alt_array subschema.
Of course you can also write the schema in "one go":
$schema = [
array => {
elem_regex => {
"[02468]\$"=>[int => {divisible_by => 2}],
"[13579]\$"=>[int => {mod => [2, 1]}],
}
}
];
but some of us might find breaking down a complex schema into pieces help in better understanding it.
Putting schema types in a hash
Aside from putting the schema types in the schema itself, you can also put them in a separate hash:
my $schema_types = {
even => [int => {divisible_by => 2}],
positive_even => [even => {min => 0}],
array_of_ints => [array => {of => int}],
address => [
"hash",
{
required_keys => [qw/line1 line2 city province country postcode/],
keys => {
line1 => ["str", {required=>1}],
line2 => "str",
city => ["str", {required=>1}],
province => ["str", {required=>1}],
country => ["str", {regex=>'/^[A-Z]{2}$/', required=>1}],
postcode => ["str", {minlen=>4, maxlen=>15}],
}
}
],
};
$validator->register_plugin('Data::Schema::Plugin::LoadSchema::Hash');
$validator->config->{schema_search_path} = $schema_types;
my $res;
$res = validate(4, 'positive_even'); # success
$res = validate(4, [positive_even => {min=>10}]); # fail: less than 10
The above address schema is for validating an address "record" (or "form"). There are also other schema types defined in $schema_types. They are loaded using DSP::LoadSchema::Hash.
Putting schema types in YAML files
Another alternative is putting schema types in YAML files.
# in schemadir/address.yaml
- hash
- required_keys: [line1, line2, city, province, country, postcode]
keys:
line1: [str, {required: 1}]
line2: str
city: [str, {required: 1}]
province: [str, {required: 1}],
country: [str, {regex: '/^[A-Z]{2}$/', required: 1}]
postcode: [str, {minlen: 4, maxlen: 15}]
# in schemadir/even.yaml
- int
- divisible_by: 2
# in your code
$validator->register_plugin('Data::Schema::Plugin::LoadSchema::YAMLFile');
$validator->config->{schema_search_path} = ["schemadir"];
my $res;
$res = validate(4, 'even'); # success
$res = validate(4, [even => {min=>10}]); # fail: less than 10
MORE EXAMPLES
For now, please see the t/schemas/ directory in the distribution.
SEE ALSO
Data::Schema::Tutorial::Schema, Data::Schema::Tutorial::TypeHandler, Data::Schema::Tutorial::Plugin
AUTHOR
Steven Haryanto, <steven at masterweb.net>
COPYRIGHT & LICENSE
Copyright 2009 Steven Haryanto, all rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.