NAME

Data::Schema::Manual::Schema - Data::Schema schema reference

OVERVIEW

This document explains the syntax of Data::Schema schemas.

FORMS

A Data::Schema schema is just a normal Perl data structure built from scalars, arrays, and hashes.

There are three forms of schema. These different forms are supported for the convenience of schema writers. Internally all schemas and subschemas will be converted ("normalized") to the third form (HASH).

First Form (SCALAR)

TYPE

The simplest form of schema is just a scalar (string) specifying the type name. It states that the data must be of the specified type.

This first form cannot express any further restrictions on the value, so it is very limited.

Example:

"int"

The schema says that data must be an integer. Examples of valid data:

5
-2

Examples of invalid data:

"int" # not an integer, but a string
[1] # not an integer, an array
{} # not an integer, an empty hash

TYPE can also be the name of another schema. For example, if you have already defined a schema named 'short_array' with this definition:

[array => {maxlen => 10}]

Then you can also have a schema that says just:

"short_array"

and it will also mean that the data must satisfy the 'short_array' schema.

Second Form (ARRAY)

[TYPE, ATTRHASH, ATTRHASH, ...]

The second form is the array form. The first element of the array, the type name (or schema name), is required. The remaining elements are attribute hashes and are optional.

The first form is actually equivalent to this second form:

[TYPE]

in which no attribute hashes are specified.

An attribute hash is a mapping of attribute names to values. It further limits the range of acceptable data values. Each type has its own set of known attributes. For example, all numeric types (like int and float) have the min and max attributes, among others. Most types have a one_of attribute that limits the data to a list of values we specify.

For type validation to succeed, the type requirement *as well as* the requirements of all attributes (from all attribute hashes) must be satisfied.

For more details on attribute hashes, see the ATTRHASH section below.

Example:

[str => {one_of => [qw/A B O AB/]}]

This schema states that data must be a string, and it must either be "A", "B", "O", or "AB". Examples of invalid data:

[] # does not satisfy type requirement, not a string
"C" # a string value, but does not satisfy the one_of attribute

Another example:

["int", {min=>0, divisible_by=>2}, {divisible_by=>3}]

The schema effectively says that the data must be a non-negative integer divisible by 6 (since it must be at least 0 and divisible by both 2 AND 3). Examples of valid data:

6
12

Examples of invalid data:

-6 # an int, satisfies all divisible_by attributes, but not the min
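
The rule that every attribute in every attribute hash must hold can be sketched in plain Perl. This is only an illustration and not how Data::Schema is implemented: validate_int and the %check table are hypothetical names, and only the min and divisible_by attributes are covered.

```perl
use strict;
use warnings;

# Hypothetical checks for two int attributes. In Data::Schema the real
# checks live in the type handler modules.
my %check = (
    min          => sub { my ($data, $arg) = @_; return $data >= $arg },
    divisible_by => sub { my ($data, $arg) = @_; return $data % $arg == 0 },
);

# Return 1 if $data is an integer that satisfies every attribute of
# every attribute hash, 0 otherwise.
sub validate_int {
    my ($data, @attr_hashes) = @_;

    # Type requirement: a non-reference scalar that looks like an integer.
    return 0 unless defined $data && !ref $data && $data =~ /\A-?\d+\z/;

    for my $ah (@attr_hashes) {
        for my $attr (sort keys %$ah) {
            my $checker = $check{$attr} or die "Unknown attribute: $attr\n";
            return 0 unless $checker->($data, $ah->{$attr});
        }
    }
    return 1;
}

my @attr_hashes = ({min => 0, divisible_by => 2}, {divisible_by => 3});
for my $data (6, 12, -6, 4) {
    printf "%3d: %s\n", $data,
        validate_int($data, @attr_hashes) ? "valid" : "invalid";
}
```

Here 6 and 12 pass, -6 fails the min attribute, and 4 fails the divisible_by attribute of the second attribute hash.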

If you specify a schema name as the first element, the attributes you can use are those of the schema's base type. Example:

# schema with name = 'even'
[int => {divisible_by=>2}]
# our schema
[even => {min=>20}]

Our schema in effect says that the data must be an even number greater than or equal to 20. Since our schema is based on the even schema, and even is defined as an int, the attributes we can specify are those of the int type.

Third Form (HASH)

{type=>TYPE OR SCHEMA,
attrs=>ATTRHASH, attr_hashes=>[ATTRHASH, ...],
def=>SCHEMADEFS,
...}

The third form (HASH) is the most complete form where you can specify everything. The type key is required, while the rest are optional.

The second form is equivalent to this third form:

{type=>TYPE, attr_hashes=>[ATTRHASH, ...]}

where nothing but type name and attribute hashes are specified.

The first form is equivalent to this third form:

{type=>TYPE}

where nothing but type name is specified.

You can specify attribute hashes in the attr_hashes key, or, if you want to specify just one attribute hash, you can use the attrs key. If both are present, the attribute hashes from both will be used.
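
The equivalence of the three forms can be sketched as a small normalizing function. This is a minimal sketch, not the normalization code Data::Schema actually uses: normalize_schema is a hypothetical name, and placing the attrs hash before the attr_hashes entries is an assumption, since the order is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Schema): convert any of the
# three schema forms to the third (HASH) form.
sub normalize_schema {
    my ($schema) = @_;

    if (!ref $schema) {
        # First form: "TYPE"
        return { type => $schema, attr_hashes => [] };
    }
    elsif (ref $schema eq 'ARRAY') {
        # Second form: [TYPE, ATTRHASH, ...]
        my ($type, @attr_hashes) = @$schema;
        die "Array-form schema needs a type as its first element\n"
            unless defined $type;
        return { type => $type, attr_hashes => \@attr_hashes };
    }
    elsif (ref $schema eq 'HASH') {
        # Third form: fold the single 'attrs' hash into 'attr_hashes'.
        die "Hash-form schema needs a 'type' key\n"
            unless defined $schema->{type};
        my @attr_hashes = (
            ($schema->{attrs} ? $schema->{attrs} : ()),  # assumed to come first
            @{ $schema->{attr_hashes} || [] },
        );
        my %normalized = (type => $schema->{type}, attr_hashes => \@attr_hashes);
        $normalized{def} = $schema->{def} if $schema->{def};
        return \%normalized;
    }
    die "Schema must be a scalar, arrayref, or hashref\n";
}

my $n = normalize_schema([int => {min => 0}, {divisible_by => 3}]);
printf "%s with %d attribute hash(es)\n",
    $n->{type}, scalar @{ $n->{attr_hashes} };
```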

This third form allows us to define other schemas inside our schema using the def key, which must be a hashref mapping schema names to their definitions. This is a way to break down or organize a complex schema into several pieces.

Example:

{
    def => {
        single_dice_throw => [int => {one_of => [1,2,3,4,5,6]}],
        sdt => "single_dice_throw", # short notation
        dice_pair_throw => [array => {len=>2, elems=>["sdt", "sdt"]}],
        dpt => "dice_pair_throw", # short notation
        throw => [or => {alts => ["sdt", "dpt"]}],
        throws => [array => {of => "throw"}],
    },
    type => "throws"
}

This schema specifies that we accept a list of dice throws (throws). Each throw can be either a single dice throw (sdt), which is a number between 1 and 6, or a throw of two dice (dpt), which is a 2-element array where each element is a number between 1 and 6.

Examples of valid data:

[1, [1,3], 6, 4, 2, [3,5]]

Examples of invalid data:

[1, [2, 3], 0] # the third throw is invalid
[1, [2,0,4], 4, 5] # the second throw (a dice pair throw) is invalid
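
To make the meaning of each subschema concrete, the same checks can be written by hand in plain Perl. This sketch does not use Data::Schema at all. The is_* function names are hypothetical and simply mirror the keys in def.

```perl
use strict;
use warnings;

# single_dice_throw / sdt: an integer from 1 to 6.
sub is_sdt {
    my ($d) = @_;
    return defined $d && !ref $d && $d =~ /\A[1-6]\z/;
}

# dice_pair_throw / dpt: an array of exactly two single dice throws.
sub is_dpt {
    my ($d) = @_;
    return ref $d eq 'ARRAY' && @$d == 2 && !grep { !is_sdt($_) } @$d;
}

# throw: either alternative is acceptable.
sub is_throw {
    my ($d) = @_;
    return is_sdt($d) || is_dpt($d);
}

# throws: an array in which every element is a throw.
sub is_throws {
    my ($d) = @_;
    return ref $d eq 'ARRAY' && !grep { !is_throw($_) } @$d;
}

for my $data ([1, [1,3], 6, 4, 2, [3,5]], [1, [2,3], 0], [1, [2,0,4], 4, 5]) {
    print is_throws($data) ? "valid\n" : "invalid\n";
}
```

The first list is accepted and the other two are rejected, matching the examples above.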

TYPE

Data::Schema comes with several types out of the box, for example: bool, int, float, str, array, hash, etc.

Each type is handled by a type handler, which is a Perl module.

For more details on each type, refer to its handler module documentation. For example, for hash type, see Data::Schema::Type::Hash.

You can write your own type handler. For more information on how to write a type handler, see Data::Schema::Manual::TypeHandler.

ATTRHASH

An attribute hash is a mapping of attribute names and values.

Each type has its own set of known attribute names. To see which attributes a type supports, see the documentation of its type handler module. For example, for the hash type, see Data::Schema::Type::Hash.

A schema can specify more than one attribute hash, in which case the attribute hashes are evaluated in order. However, if a key in an attribute hash has a prefix (see the Attribute prefix section below), merging will occur (see the Merging of attribute hashes section below).

Attribute prefix

An attribute prefix is one of these characters:

+ - . ! *

prepended to the attribute name.

These will affect merging behaviour of attribute hashes.

The first attribute hash in the schema is not allowed to have attribute prefixes on its keys.

Attribute suffix

An attribute suffix is the dot character (".") followed by one of these, appended to the attribute name:

errmsg

Suffixes give additional information or instructions associated with the attribute. They are not necessarily passed to the attribute handler sub (handle_attr_ATTRNAME()) of the type handler; some are useful only to the validator.

Validation will fail if an unknown suffix is specified.

errmsg

This attribute suffix is used to supply a custom error message. For example:

[str=>{regex=>'^\w{4,8}$',
'regex.errmsg'=>'4-8 alphanumeric characters only!'}]

When validation of the regex attribute fails, the validator will use the custom error message instead of the default error message from the type handler, giving clearer information to the user.

Note: if the gettext_function configuration is set, this message will be passed to that function before being returned. See Data::Schema for more on configuration.
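
As an illustration of how a key in an attribute hash breaks down into an optional prefix, the attribute name, and an optional suffix, here is a minimal sketch. parse_attr_key is a hypothetical helper, not a Data::Schema function, and it assumes attribute names consist of word characters only.

```perl
use strict;
use warnings;

# The only suffix described in this document.
my %known_suffix = map { $_ => 1 } qw(errmsg);

# Hypothetical helper: split an attribute hash key into its parts.
# Dies on a malformed key or an unknown suffix.
sub parse_attr_key {
    my ($key) = @_;
    my ($prefix, $name, $suffix) =
        $key =~ /\A([-+.!*])?(\w+)(?:\.(\w+))?\z/
        or die "Invalid attribute key: $key\n";
    die "Unknown attribute suffix: $suffix\n"
        if defined $suffix && !$known_suffix{$suffix};
    return { prefix => $prefix, name => $name, suffix => $suffix };
}

for my $key ('one_of', '+one_of', 'regex.errmsg') {
    my $p = parse_attr_key($key);
    printf "%-14s prefix=%s name=%s suffix=%s\n", $key,
        map { defined $_ ? $_ : '(none)' } @{$p}{qw(prefix name suffix)};
}
```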

Merging of attribute hashes

Given several attribute hashes in the schema like:

[TYPE, AH1, AH2, AH3]

all three attribute hashes (AH1, AH2, and AH3) will be evaluated in that order. However, if the keys of AH2 contain merge prefixes, AH1 will be merged with AH2 before being evaluated. If AH3 contains merge prefixes too, then AH1 will be merged with AH2 and the result merged again with AH3 before being evaluated, and so on. In the illustration below, "*" marks an attribute hash whose keys carry merge prefixes and "|" indicates merging:

AH1, AH2, AH3
  eval(AH1)
  eval(AH2)
  eval(AH3)

AH1, *AH2, AH3
  eval(AH1|AH2)
  eval(AH3)

AH1, AH2, *AH3
  eval(AH1)
  eval(AH2|AH3)

AH1, *AH2, *AH3
  eval(AH1|AH2|AH3)
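
The grouping rule illustrated above can be sketched as follows. This is an illustration only, not Data::Schema's actual code. has_merge_prefix and merge_groups are hypothetical names, and the sketch only decides which attribute hashes are merged together, not how.

```perl
use strict;
use warnings;

# True if any key of the attribute hash starts with a merge prefix.
sub has_merge_prefix {
    my ($ah) = @_;
    my $count = grep { /\A[-+.!*]/ } keys %$ah;
    return $count > 0;
}

# Split a list of attribute hashes into groups. Each group would be
# merged into a single attribute hash and then evaluated.
sub merge_groups {
    my @groups;
    for my $ah (@_) {
        if (@groups && has_merge_prefix($ah)) {
            push @{ $groups[-1] }, $ah;   # merged with the group on its left
        }
        else {
            push @groups, [$ah];          # starts a new group
        }
    }
    return @groups;
}

my @groups = merge_groups(
    {min => 0},
    {divisible_by => 2},
    {'*divisible_by' => 3},
);
printf "%d groups, sizes: %s\n",
    scalar @groups, join(", ", map { scalar @$_ } @groups);
```

This corresponds to the "AH1, AH2, *AH3" case: the first attribute hash is evaluated on its own, while the second and third are merged before evaluation.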

Data::Schema uses Data::PrefixMerge to do the merging. The Data::PrefixMerge style of merging allows keys on the right side not only to replace keys on the left side, but also to add to, subtract from, or remove them. This allows a schema definition to relax or remove attribute requirements instead of only adding them.

Examples:

[int => {divisible_by=>2}, {divisible_by=>3}]    # must be divisible by 2 & 3

[int => {divisible_by=>2}, {'*divisible_by'=>3}] # will be merged and become:
[int => {divisible_by=>3}]                       # must be divisible by 3 ONLY

[int => {divisible_by=>2}, {'!divisible_by'=>0}] # will be merged and become:
[int => {}]                                      # need not be divisible at all

[int => {one_of=>[1,2,3,4,5]}, {one_of=>[6]}]    # impossible to satisfy

[int => {one_of=>[1,2,3,4,5]}, {'+one_of'=>[6]}] # will be merged and become:
[int => {one_of=>[1,2,3,4,5,6]}]

[int => {one_of=>[1,2,3,4,5]}, {'-one_of'=>[4]}] # will be merged and become:
[int => {one_of=>[1,2,3,5]}]

Refer to Data::PrefixMerge for details on merging syntax and behaviour.
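
The effect of the prefixes in the examples above can be reproduced with a simplified stand-in. This sketch does not call Data::PrefixMerge and covers far less than it does: merge_attr_hashes is a hypothetical function, the "+" and "-" prefixes are assumed to operate on array values only, and the "." prefix is left out.

```perl
use strict;
use warnings;

# Simplified stand-in for Data::PrefixMerge: merge attribute hash $right
# into a copy of $left, handling only the cases shown in the examples.
sub merge_attr_hashes {
    my ($left, $right) = @_;
    my %merged = %$left;

    for my $key (keys %$right) {
        my ($prefix, $name) = $key =~ /\A([-+!*]?)(.+)\z/s;
        my $value = $right->{$key};

        if ($prefix eq '!') {
            delete $merged{$name};               # remove the attribute
        }
        elsif ($prefix eq '+') {
            # append the new elements to the existing array
            $merged{$name} = [ @{ $merged{$name} || [] }, @$value ];
        }
        elsif ($prefix eq '-') {
            # drop the listed elements from the existing array
            my %drop = map { $_ => 1 } @$value;
            $merged{$name} = [ grep { !$drop{$_} } @{ $merged{$name} || [] } ];
        }
        else {
            $merged{$name} = $value;             # '*' or no prefix: replace
        }
    }
    return \%merged;
}

my $m = merge_attr_hashes({one_of => [1,2,3,4,5]}, {'-one_of' => [4]});
print join(",", @{ $m->{one_of} }), "\n";
```

The last two lines print the remaining one_of values after 4 has been subtracted.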

NAMING SCHEMAS FOR USE IN OTHER SCHEMAS

Schemas can be defined for use in other schemas. Example:

{
    def => {
        single_dice_throw => [int => {one_of => [1,2,3,4,5,6]}],
        sdt => "single_dice_throw", # short notation
        dice_pair_throw => [array => {len=>2, elems=>["sdt", "sdt"]}],
        dpt => "dice_pair_throw", # short notation
        throw => [or => {alts => ["sdt", "dpt"]}],
        throws => [array => {of => "throw"}],
    },
    type => "throws"
}

The above schema defines six subschemas. These subschemas are not available outside of this schema.

Another way is to put schemas in a Perl hash or in YAML files and then load them using DSP::LoadSchema::Hash or DSP::LoadSchema::YAMLFile.

AUTHOR

Steven Haryanto, <steven at masterweb.net>

COPYRIGHT & LICENSE

Copyright 2009 Steven Haryanto, all rights reserved.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.