NAME
Data::Sah - Schema for data structures
VERSION
version 0.04
SYNOPSIS
Sample schemas:
# integer, optional
'int'
# required integer
'int*'
# same thing
['int', {req=>1}]
# integer between 1 and 10
['int*', {min=>1, max=>10}]
# same thing, the curly brace is optional (unless for advanced stuff)
['int*', min=>1, max=>10]
# array of integers between 1 and 10
['array*', {of=>['int*', between=>[1, 10]]}]
# a byte (let's assign it to a new type 'byte')
['int', {between=>[0,255]}]
# a byte that's divisible by 3
['byte', {div_by=>3}]
# a byte that's divisible by 3 *and* 5
['byte', {'div_by&'=>[3, 5]}]
# a byte that's divisible by 3 *or* 5
['byte', {'div_by|'=>[3, 5]}]
# a byte that's *in*divisible by 3
['byte', {'!div_by'=>3}]
# an address hash (let's assign it to a new type called 'address')
['hash' => {
# recognized keys
keys => {
line1 => ['str*', max_len => 80],
line2 => ['str*', max_len => 80],
city => ['str*', max_len => 60],
province => ['str*', max_len => 60],
postcode => ['str*', len_between=>[4, 15], match=>'^[\w-]{4,15}$'],
country => ['str*', len => 2, match => '^[A-Z][A-Z]$'],
},
# keys that must exist in data
req_keys => [qw/line1 city province postcode country/],
}]
# a US address, let's base it on 'address' but change 'postcode' to 'zipcode'.
# also, require country to be set to 'US'
['address' => {
'[merge-]keys' => {postcode=>undef},
'[merge]keys' => {
zipcode => ['str*', len=>5, '^\d{5}$'],
country => ['str*', is=>'US'],
},
'[merge-]req_keys' => [qw/postcode/],
'[merge+]req_keys' => [qw/zipcode/],
}]
Using this module:
use Data::Sah;
my $sah = Data::Sah->new;
# get compiler, e.g. perl
my $perlc = $sah->get_compiler('perl');
Then use the compiler (e.g. see Data::Sah::Compiler::perl for more details on how to generate validator using the perl compiler). There's also an easier interface: Data::Sah::Easy.
DESCRIPTION
NOTE: This is a very early release, with minimal implementation and specification still changing. Do NOT use this module yet.
Sah is a schema language to validate data structures.
Features/highlights:
Pure data structure
A Sah schema is just a normal data structure. Using data structures as schemas simplifies parsing and enables easier manipulation (composition, merging, etc) of schemas as well validation of the schemas themselves. For your convenience, Sah accepts a variety of forms and shortcuts, which will be converted into a normalized data structure form.
Some examples of schema:
# a string 'str' # a required string 'str*' # same thing [str => {req=>1}] # a 3x3 matrix of required integers [array => {req=>1, len=>3, of=> [array => {req=>1, len=>3, of=> 'int*'}]}]
See Data::Sah::Manual::Schema for full description of the syntax.
Compilation
To validate data, Perl validator code is generated (compiled) from your schema. This ensures full validation speed, at least one to two orders of magnitude faster than interpreted validation. Compilers to other languages also exist, e.g. JavaScript. This means you only need to write a schema once and use it to validate data anywhere.
The generated validator code can run without this module.
Natural language description
Sah schema can also be converted into human text (e.g.
[int =
{between=>[1, 10]}]> becomes "a number between 1 and 10"). Technically this is just another compilation. This can be used to generate specification document, error messages, etc directly from the schema. This saves you from having to write for many common error messages (but you can supply your own when needed).The human text is translateable and can be output in various forms (as a single sentence, single paragraph, or multiple paragraphs) and formats (text, HTML).
Power
Sah supports common types and a quite rich set of clauses (and clause attributes) for each type, including range constraints, nested conditionals, dependencies, conflict rules, etc. There are also filters/functions and expressions.
Extensibility
You can add your own types, type clauses, and functions if what you need is not supported out of the box.
Emphasis on reusability
You can define schemas in terms of other schemas. Example:
# array of unique gmail addresses [array => {uniq => 1, of => [email => {match => qr/gmail\.com$/}]}]
In the above example, the schema is based on 'email'. Email can be a type or just another schema:
# definition of email [str => {match => ".+\@.+"}]
Another example:
# schema: even [int => {div_by=>2}] # schema: pos_even [even => {min=>0}]
In the above example, 'pos_even' is defined from 'even' with an additional clause (min=>0). As a matter of fact you can also override and remove constraints from your base schema, for even more flexibility.
# schema: pos_even_or_odd [pos_even => {"[merge!]div_by"=>2}] # remove the div_by clause
The above example makes
pos_even_or_odd
effectively equivalent to positive integer.See Data::Sah::Manual::Schema for more about clause set merging.
For schema-local definition, you can also define schemas within schemas:
# dice_throws: array of dice throw results ["array*" => {of => 'dice_throw*'}, {def => { dice_throw => [int => {between=>[1, 6]}], }}, ]
The
dice_throw
schema will only be visible from within thedice_throws
.See Data::Sah::Manual::Schema for more about base schema definitions.
To get started, see Data::Sah::Manual::Tutorial and Data::Sah::Easy.
This module uses Moo for object system and Log::Any for logging.
ATTRIBUTES
compilers => HASH
A mapping of compiler name and compiler (Data::Sah::Compiler::*) objects.
METHODS
new() => OBJ
Create a new Data::Sah instance.
$sah->get_compiler($name) => OBJ
Get compiler object. "Data::Sah::Compiler::$name" will be loaded first and instantiated if not already so. After that, the compiler object is cached.
Example:
my $plc = $sah->get_compiler("perl"); # loads Data::Sah::Compiler::perl
$sah->normalize_schema($schema) => HASH
Normalize a schema, e.g. change int*
into [int =
{req=>1}]>, as well as do some sanity checks on it. Returns the normalized schema if succeeds, or dies on error.
Can also be used as a function.
Autoloaded.
$sah->normalize_var($var) => STR
Normalize a variable name in expression into its fully qualified/absolute form.
Autoloaded. Not yet implemented.
For example:
[int => {min => 10, 'max=' => '2*$min'}]
$min in the above expression will be normalized as schema:clauses.min.value
.
$sah->compile($compiler_name, %compiler_args) => STR
Basically just a shortcut for get_compiler() and send %compiler_args to the particular compiler. Returns generated code.
$sah->perl(%args) => STR
Shortcut for $sah->compile('perl', %args).
$sah->human(%args) => STR
Shortcut for $sah->compile('human', %args).
$sah->js(%args) => STR
Shortcut for $sah->compile('js', %args).
FAQ
Why use a schema (a.k.a "Turing tarpit")? Why not use pure Perl?
I'll leave it to others to debate DSL (like templating language, schema, etc) vs pure Perl. But my principle is: if a DSL can save me significant amount of time, keep my code clean and maintainable, even if it's not perfect (what is?), I'll take it. 90% of the time, my schemas are some variations of the simple cases like:
'str*'
[str => {len_between=>[1, 10], match=>'some regex'}]
[str => {in => [qw/a b c and some other values/]}]
[array => {of => 'some_other_type'}]
[hash => {keys => {key1=>'some schema', ...}, req_keys => [qw/.../], ...}]
and writing schemas is faster and less tedious/error-prone than writing equivalent Perl code, plus Sah can generate JavaScript code and human description text for me. For more complex validation I stay with Sah until it starts to get unwieldy. It usually can go pretty far since I can add functions and custom clauses to its types; it's for the rare and very complex validation needs that I go pure Perl. Your mileage may vary.
What does 'Sah' mean?
Sah is an Indonesian word, meaning 'valid' or 'legal'. It's short.
The previous incarnation of this module uses the namespace Data::Schema, started in 2009 and deprecated in 2011 in favor of Sah.
Why a new name/module? Difference with Data::Schema?
There are enough incompatibilities between the two (some different syntaxes, renamed clauses). Also, some terminology have been changed, e.g. "attribute" become "clauses", "suffix" becomes "attributes". This warrants a new name.
Compared to Data::Schema, Sah always compiles schemas and there is much greater flexibility in code generation (can generate different forms of code, can change data term, can generate code to validate multiple schemas, etc). There is no longer hash form, schema is either a string or an array. Some clauses have been renamed (mostly, commonly used clauses are abbreviated, Huffman encoding thingy), some removed (usually because they are replaced by a more general solution), and new ones have been added.
If you use Data::Schema, I recommend you migrate to Data::Sah as I will not be developing Data::Schema anymore. Sorry, there's currently no tool to convert your Data::Schema schemas to Sah, but it should be relatively straightforward. I recommend that you look into Data::Sah::Easy.
MODULE ORGANIZATION
Data::Sah::Type::* roles specify Sah types, e.g. Data::Sah::Type::bool specifies the bool type.
Data::Sah::FuncSet::* roles specify bundles of functions, e.g. Data::Sah::FuncSet::Core specifies the core/standard functions.
Data::Sah::Compiler::$LANG:: namespace is for compilers. Each compiler (if derived from BaseCompiler) might further contain ::TH::* and ::FSH::* to implement appropriate functionalities, e.g. Data::Sah::Compiler::perl::TH::bool is the 'bool' type handler for the Perl compiler and Data::Sah::Compiler::perl::FSH::Core is the funcset 'Core' handler for Perl compiler.
Data::Sah::Lang::$LANGCODE::* namespace is reserved for modules that contain translations. Language submodules follows the organization of other modules, e.g. Data::Sah::Lang::en_US::Type::int, Data::Sah::Lang::id_ID::FuncSet::Core, etc.
Data::Sah::Schema:: namespace is reserved for modules that contain bundles of schemas. For example, Data::Sah::Schema::CPANMeta contains the schema to validate CPAN META.yml. Data::Sah::Schema::Sah contains the schema for Sah schema itself.
Data::Sah::TypeX::$TYPENAME::$CLAUSENAME namespace can be used to name distributions that extend an existing Sah type by introducing a new clause for it. It must also contain, at the minimum: perl, js, and human compiler implementations for it, as well as English translations. For example, Data::Sah::TypeX::int::is_prime is a distribution that adds is_prime
clause to the int
type. It will contain the following packages inside: Data::Sah::Type::int, Data::Sah::Compiler::{perl,human,js}::TH::int. Other compilers' implementation can be packaged under Data::Sah::Compiler::$COMPILERNAME::TypeX::$TYPENAME::$CLAUSENAME, e.g. Data::Sah::Compiler::python::TypeX::int::is_prime distribution. Language can be put in Data::Sah::Lang::$LANGCODE::TypeX::int::is_prime.
Data::Sah::Manual::* contains documentation, surprisingly enough.
SEE ALSO
Alternatives to Sah
Moose has a type system. MooseX::Params::Validate, among others, can validate method parameters based on this.
Some other data validation and data schema modules on CPAN: Data::FormValidator, Params::Validate, Data::Rx, Kwalify, Data::Verifier, Data::Validator, JSON::Schema, Validation::Class.
AUTHOR
Steven Haryanto <stevenharyanto@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Steven Haryanto.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.