NAME

Data::Sah::Manual::Developer - Data::Sah developer information

VERSION

This document describes version 0.916 of Data::Sah::Manual::Developer (from Perl distribution Data-Sah), released on 2024-02-16.

OVERVIEW

PERL CODE GENERATION

This section will describe how the schema is converted into Perl code.

From each clause, an equivalent Perl expression will be generated (except for a few special clauses). The expression will return true/false depending on whether data passes the clause. For example, in the schema:

["int", min=>1, max=>10]

the clause min=>1 will be translated into something like:

$data >= 1

and the clause max=>10 will be translated into something like:

$data <= 10

For the type itself (int) we will generate a Perl expression for type checking:

Scalar::Util::Numeric::isint($data)

These Perl expressions are then ordered and combined into a single one. The order follows the priorities specified by the Sah specification: each clause has a priority (the lower the number, the higher the priority). The min and max clauses are "regular" type constraint clauses, so each has a priority of 50. There is a special clause req (unspecified here, so it defaults to 0) which has a high priority of 3, even higher than the type check. If req is given a true value (1), data is required to be defined. If req is false and data is undefined, all the other constraint clauses are skipped (so undef passes the schema).

After the ordering, the type constraint expressions are joined using the Perl operator && so that evaluation short-circuits after the first failure. The final Perl expression becomes:

(!defined($data) ? 1 :
    (Scalar::Util::Numeric::isint($data) &&
    ($data >= 1) &&
    ($data <= 10))
)
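
The ordering and joining step can be sketched in a few lines. This is only an illustration, not the compiler's actual code: is_int() is a core-only stand-in for Scalar::Util::Numeric::isint(), the @clauses structure is hypothetical, and the priority value 10 for the type check is an assumed number (the text only states 50 for min/max and 3 for req).

```perl
use strict;
use warnings;

# Core-only stand-in for Scalar::Util::Numeric::isint() (an assumption made
# so the sketch runs without extra modules).
sub is_int {
    my ($v) = @_;
    return defined($v) && !ref($v) && $v =~ /\A[+-]?[0-9]+\z/ ? 1 : 0;
}

# Per-clause expressions, listed in schema order. The priority of 50 for
# min/max comes from the text; 10 for the type check is an illustrative value.
my @clauses = (
    { prio => 50, expr => '($data >= 1)'  },
    { prio => 50, expr => '($data <= 10)' },
    { prio => 10, expr => 'is_int($data)' },
);

# Lower number sorts first; equal priorities keep their schema order.
my @ordered = map  { $clauses[$_] }
              sort { $clauses[$a]{prio} <=> $clauses[$b]{prio} || $a <=> $b }
              0 .. $#clauses;

# Join with && so evaluation stops at the first failing expression, and wrap
# the chain in the "undef passes" guard used when req is false.
my $chain = join(" && ", map { $_->{expr} } @ordered);
my $code  = 'sub { my $data = shift; !defined($data) ? 1 : (' . $chain . ') }';
my $validator = eval($code) or die $@;

print "$code\n";
```

Calling $validator->(5) then yields a true value, while $validator->(20) and $validator->("x") yield false.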

Default value

The default clause is another special clause with a high priority: it is evaluated before req, the type check, and the other constraint clauses. For example:

["int", min=>1, max=>10, default=>1]

The default clause will be translated into this Perl expression:

(($data //= 1), 1)

The expression evaluates the operand to the left of the comma operator (assigning the default value to data), then evaluates the operand to the right and returns that value. The effect is that the expression always returns true, even when the default value given in the schema is false Perl-wise, like "" or 0.

So the final expression will become:

(($data //= 1), 1) &&
(!defined($data) ? 1 :
    (Scalar::Util::Numeric::isint($data) &&
    ($data >= 1) &&
    ($data <= 10))
)
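
A small experiment shows why the trailing ", 1" matters. The variable names here are illustrative only, and the defined-or assignment operator requires Perl 5.10 or later.

```perl
use strict;
use warnings;

# A false default (0) is assigned, yet the expression still yields true,
# so an && chain that follows it would not be cut short.
my $data;
my $result = (($data //= 0), 1);
print "data=$data result=$result\n";    # data=0 result=1

# Without the trailing ", 1" the value of the assignment itself (0) is used.
my $other;
my $naive = ($other //= 0);
print "naive is ", ($naive ? "true" : "false"), "\n";    # naive is false
```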

Required value (req=>1)

What if req is true?

["int*", min=>1, max=>10] # a.k.a. ["int", req=>1, min=>1, max=>10]

Then the final expression will become this instead:

(defined($data) &&
 Scalar::Util::Numeric::isint($data) &&
 ($data >= 1) &&
 ($data <= 10))

And if we add the default value:

["int*", min=>1, max=>10, default=>1]

Then the final expression will become this:

(($data //= 1), 1) &&
(defined($data) &&
 Scalar::Util::Numeric::isint($data) &&
 ($data >= 1) &&
 ($data <= 10))
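
To see the effect of req on undefined input, the two expressions above can be wrapped in subroutines and compared. This is a hand-written sketch: is_int() is a core-only stand-in for Scalar::Util::Numeric::isint(), and the subroutine names are made up for the example.

```perl
use strict;
use warnings;

# Core-only stand-in for Scalar::Util::Numeric::isint() (assumption).
sub is_int {
    my ($v) = @_;
    return defined($v) && !ref($v) && $v =~ /\A[+-]?[0-9]+\z/ ? 1 : 0;
}

# ["int*", min=>1, max=>10]: undef is rejected by the defined() check.
my $required = sub {
    my $data = shift;
    my $ok = (defined($data) && is_int($data) && ($data >= 1) && ($data <= 10));
    return $ok ? 1 : 0;
};

# ["int*", min=>1, max=>10, default=>1]: the default is assigned first,
# so undef has already become 1 by the time defined() is evaluated.
my $required_with_default = sub {
    my $data = shift;
    my $ok = ((($data //= 1), 1) &&
              defined($data) && is_int($data) && ($data >= 1) && ($data <= 10));
    return $ok ? 1 : 0;
};

printf "required, undef:         %d\n", $required->(undef);               # 0
printf "required+default, undef: %d\n", $required_with_default->(undef);  # 1
```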

Validator subroutine

To generate a validator subroutine, then, is only a matter of adding some bits to make a full subroutine. Let's get back to this schema:

["int", min=>1, max=>10, default=>1]

The final validator code generated would be something like:

require Scalar::Util::Numeric;
my $validator = sub {
    my $data = shift;

    (($data //= 1), 1) &&
    (!defined($data) ? 1 :
        (Scalar::Util::Numeric::isint($data) &&
        ($data >= 1) &&
        ($data <= 10))
    )

};

This is what Data::Sah's gen_validator() function returns. The validator returns true when data is valid, or false otherwise. Let's test it:

$validator->("x");   # false (fails the type check, isint())
$validator->(-1);    # false (fails the min clause, $data >= 1)
$validator->(20);    # false (fails the max clause, $data <= 10)
$validator->(5);     # true
$validator->(undef); # true (because there is the default value of 1)

String-returning validator

The above is fine if all you want is a validator that returns true/false (bool). What if you want an error message on failure instead? gen_validator() supports this: if you pass the option return_type => "str_errmsg", you will get such a validator:

$validator = gen_validator(["int", min=>1, max=>10, default=>1], {return_type=>"str_errmsg"});

To do this, each Perl expression will need to be able to set an error message:

require Scalar::Util::Numeric;
my $validator = sub {
    my $data = shift;

    my $err_data;

    (($data //= 1), 1) &&
    (!defined($data) ? 1 :
        (Scalar::Util::Numeric::isint($data) ?     1 : (($err_data //= "Not integer"),0)     ) &&
        ($data >= 1 ?                              1 : (($err_data //= "Must be at least 1"),0)     ) &&
        ($data <= 10 ?                             1 : (($err_data //= "Must be at most 10"),0)     )
    );

    $err_data //= "";
    $err_data;
};

So each constraint expression still either returns true or false like in the boolean validator case, but before the expression returns 0, it sets $err_data first.

After the whole expression is evaluated, $err_data is returned.

Other possible values for the return_type are:

  • hash_details

    This will generate a validator that returns a hash (instead of a single string) with more information about all the errors and warnings encountered during validation. It works on the same principle.

  • bool_valid+val

  • str_errmsg+val

    The bool_valid+val and str_errmsg+val return types are the same as the bool_valid and str_errmsg return types respectively, but instead of a bool (or str) they return an array(ref) containing the bool (or str) as well as the final input value. The final input value is the input after the default value, coercion, and/or filters have been applied, i.e. the value that is usually used further after validation.

    For the hash_details return type, the final input value is put in the value key.
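
As a rough illustration of the str_errmsg+val idea, the string-returning validator shown earlier can be adapted by hand to return an arrayref. This is a sketch of the concept, not the code Data::Sah generates: is_int() is a core-only stand-in for Scalar::Util::Numeric::isint(), and the $ok variable exists only to avoid evaluating the expression in void context.

```perl
use strict;
use warnings;

# Core-only stand-in for Scalar::Util::Numeric::isint() (assumption).
sub is_int {
    my ($v) = @_;
    return defined($v) && !ref($v) && $v =~ /\A[+-]?[0-9]+\z/ ? 1 : 0;
}

# Hand-written equivalent for ["int", min=>1, max=>10, default=>1].
my $validator = sub {
    my $data = shift;
    my $err_data;

    my $ok =
        (($data //= 1), 1) &&
        (!defined($data) ? 1 :
            (is_int($data) ? 1 : (($err_data //= "Not integer"), 0)) &&
            ($data >= 1    ? 1 : (($err_data //= "Must be at least 1"), 0)) &&
            ($data <= 10   ? 1 : (($err_data //= "Must be at most 10"), 0))
        );

    $err_data //= "";
    return [$err_data, $data];    # error message plus the final input value
};

my ($msg, $val) = @{ $validator->(undef) };
print "msg='$msg' val=$val\n";    # msg='' val=1 (the default was applied)
```

The caller can thus pick up the defaulted value from the second element instead of re-deriving it from the schema.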

Or-logic

Normally all clauses in a clause set must return true for the validation to succeed ("and-logic"). However, other logics are possible: at least N clauses must succeed, at most N clauses may succeed, or a combination of both.

When only one clause needs to succeed, this is called an "or-logic". Example schema for a password policy:

["str*", {
    clause => [
        [min_len => 10],
        [match => qr/\W/],
        [match => qr/[A-Z][0-9]|[0-9][A-Z]/i],
    ],
    "clause.op" => "or",
}]

The above schema says that a password needs to be at least 10 characters long, or contain a symbol (non-word character), or contain both letters and numbers.

This will be translated into something like this:

(defined $data) &&
(!ref($data)) && # type check for str
(do {
     my $_sahv_ok = 0;
     my $_sahv_nok = 0;

     (length($data) >= 10                 ? ++$_sahv_ok : ++$_sahv_nok) &&
     ($data =~ qr/\W/                     ? ++$_sahv_ok : ++$_sahv_nok) &&
     ($data =~ qr/[A-Z][0-9]|[0-9][A-Z]/i ? ++$_sahv_ok : ++$_sahv_nok) &&
     $_sahv_ok >= 1;
})

XXX shortcut after $_sahv_ok becomes 1?
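
The counting idea generalizes to "at least N clauses must succeed". The following is a minimal sketch written with closures rather than generated expressions; passes_at_least() and @password_checks are hypothetical names, and the early return is just one possible way to stop once enough clauses have passed (it assumes N is at least 1).

```perl
use strict;
use warnings;

# Returns 1 as soon as $min_ok checks have passed, 0 otherwise.
sub passes_at_least {
    my ($data, $min_ok, @checks) = @_;
    my $ok = 0;
    for my $check (@checks) {
        next unless $check->($data);
        return 1 if ++$ok >= $min_ok;    # enough clauses passed, stop early
    }
    return 0;
}

# The three clauses of the password schema, as closures.
my @password_checks = (
    sub { length($_[0]) >= 10 },
    sub { $_[0] =~ /\W/ },
    sub { $_[0] =~ /[A-Z][0-9]|[0-9][A-Z]/i },
);

for my $pw ("abc", "abc1", "a b", "abcdefghij") {
    printf "%-12s %s\n", $pw,
        passes_at_least($pw, 1, @password_checks) ? "ok" : "rejected";
}
```

With a threshold of 1 this behaves like the or-logic above; raising the threshold to 2 would require any two of the three clauses to hold.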

HUMAN TEXT GENERATION

This section explains how a Sah schema is converted into human description text, e.g. [int => div_by=>3] into "integer, divisible by 3". This human text is used for error messages or for documentation. You should read the previous section about code generation first, since text generation is basically the same: it's just another "compilation" process. The difference is that instead of generating Perl code, as the "perl" compiler (Data::Sah::Compiler::perl) does, the "human" compiler (Data::Sah::Compiler::human) generates text as the result.

As in generating code, when generating text, we visit the type handler and then clause handler for each clause. Each of these handlers usually calls add_ccl() to add a "compiled clause" which will be joined together to create the final result.

The type handler usually adds a "noun" compiled clause. For example, for schema ["float", min=>1, max=>10], the type handler for float (method handle_type in Data::Sah::Compiler::human::TH::float, TH is short for type handler) will add this compiled clause:

{
  type => 'noun',
  text => ['decimal number', 'decimal numbers'],
  xlt  => 1,
}

The xlt=>1 signifies that the text has been translated (note that the human compiler supports producing human text in languages other than English).

Next, the clause handler for clause min (method clause_min in Data::Sah::Compiler::human::TH::float) will add this compiled clause:

{
  type => 'clause',
  fmt  => '%(modal_verb)s be at least %s',
}

Now, instead of text we have fmt. add_ccl() will convert this into text using sprintfn (see Text::sprintfn). The positional arguments (like %s) are fed from the clause value (in this case, 1), while the named arguments (like %(modal_verb)s) are supplied by add_ccl().

Since xlt is not set to true, this means the format string needs to be translated first. add_ccl() will find a suitable translation first (see "Translation") and then call sprintfn() to finally get text. The final result of this compiled clause is:

{
  type => 'clause',
  text => 'must be at least 1',
  xlt  => 1,
}

For the last clause max, we'll similarly get a compiled clause:

{
  type => 'clause',
  fmt  => '%(modal_verb)s be at most %s',
}

which will become:

{
  type => 'clause',
  text => 'must be at most 10',
  xlt  => 1,
}

Finally, all the compiled clauses will simply be joined and the compilation result is:

"decimal number, must be at least 1, must be at most 10"

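The float example can be replayed end to end with a tiny stand-in for the formatting step. This is a sketch under stated assumptions: fill_fmt() is a hypothetical, much simplified substitute for sprintfn from Text::sprintfn, the translation lookup is skipped entirely, and the modal verb is hard-coded to "must".

```perl
use strict;
use warnings;

# Simplified stand-in for sprintfn(): fills %(name)s from a hash, then lets
# the core sprintf() handle the remaining positional %s arguments.
sub fill_fmt {
    my ($fmt, $named, @args) = @_;
    $fmt =~ s/%\((\w+)\)s/$named->{$1}/g;
    return sprintf($fmt, @args);
}

# Compiled clauses as described in the text; vals holds the clause value.
my @ccls = (
    { type => 'noun',   text => ['decimal number', 'decimal numbers'], xlt => 1 },
    { type => 'clause', fmt  => '%(modal_verb)s be at least %s', vals => [1]  },
    { type => 'clause', fmt  => '%(modal_verb)s be at most %s',  vals => [10] },
);

my @parts;
for my $ccl (@ccls) {
    if ($ccl->{type} eq 'noun') {
        push @parts, $ccl->{text}[0];    # singular form of the noun
    }
    else {
        push @parts, fill_fmt($ccl->{fmt}, { modal_verb => 'must' }, @{ $ccl->{vals} });
    }
}

my $text = join(", ", @parts);
print "$text\n";    # decimal number, must be at least 1, must be at most 10
```
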
Formats

TBD

Handling CLAUSE.op and CLAUSE.err_level

Consider this schema:

[int => 'div_by&' => [3, 5]]

which is a shortcut for:

[int => 'div_by'=>[3, 5], 'div_by.op'=>'and']

This is a clause with multiple values. These are the compiled clauses that will be added during generation:

{type=>'noun', text=>['integer','integers'], xlt=>1}

and:

{type=>'clause', fmt=>'%(modal_verb)s be divisible by %s'}

which will become:

{type=>'clause', text=>'must be divisible by 3 and 5', xlt=>1}

In other words, the clause fmt is the same but the arguments supplied to it are formatted to contain the multiple values.

Another example, for [int => 'div_by&'=>[2,3,5]], the clause will generate this final compiled clause:

{type=>'clause', text=>'must be divisible by all of [2,3,5]', xlt=>1}

For [int => 'div_by|'=>[2,3,5]] (which is shortcut for [int => 'div_by'=>[2,3,5], 'div_by.op'=>'or']) the final compiled clause will be:

{type=>'clause', text=>'must be divisible by one of [2,3,5]', xlt=>1}

For [int => '!div_by'=>3] (which is shortcut for [int => 'div_by'=>3, 'div_by.op'=>'not']) the final compiled clause will be:

{type=>'clause', text=>'must not be divisible by 3', xlt=>1}

that is, the value for modal_verb named argument supplied by add_ccl() is changed from the default must to must not.

For [int => 'div_by'=>3, 'div_by.err_level'=>'warn'], the final compiled clause will be:

{type=>'clause', text=>'should be divisible by 3', xlt=>1}

that is, the value for modal_verb named argument supplied by add_ccl() is changed from the default must to should.
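
The selection of the modal verb can be pictured as a small function of the clause's .op and .err_level attributes. The helper below is hypothetical and covers only the cases described above; combining both attributes into "should not" is an assumption, not something the text states.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the modal verb from a clause's attributes.
sub modal_verb {
    my (%attr) = @_;
    my $verb = (($attr{err_level} // '') eq 'warn') ? 'should' : 'must';
    $verb .= ' not' if ($attr{op} // '') eq 'not';    # combined case is assumed
    return $verb;
}

print modal_verb(), " be divisible by 3\n";                     # must be divisible by 3
print modal_verb(op => 'not'), " be divisible by 3\n";          # must not be divisible by 3
print modal_verb(err_level => 'warn'), " be divisible by 3\n";  # should be divisible by 3
```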

Not all clauses can use multiple clause values in their arguments. For example, in [int => mod=>[3, 1]], the compiled clause for the mod clause will be:

{type=>'clause', fmt=>'%(modal_verb)s leave a remainder of %2$s when divided by %1$s', vals=>[3, 1]}

(Note: the vals key supplies positional arguments for sprintfn if you want something other than the default clause value. In this case we want to flatten the clause value because otherwise the positional arguments array would be [ [3,1] ]. The %1$s and %2$s are printf syntax for using positional arguments; see sprintf in perlfunc.)

The final compiled clause will become:

{type=>'clause', text=>'must leave a remainder of 1 when divided by 3', xlt=>1}
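
The explicit-index syntax can be tried with the core sprintf() alone. In this sketch the named %(modal_verb)s argument is replaced by a plain substitution, standing in for what sprintfn would do; the format string must be single-quoted so that Perl does not try to interpolate $s.

```perl
use strict;
use warnings;

my $fmt  = '%(modal_verb)s leave a remainder of %2$s when divided by %1$s';
my $vals = [3, 1];    # flattened clause value: [divisor, remainder]

# Stand-in for the named-argument step that sprintfn performs.
(my $plain = $fmt) =~ s/%\(modal_verb\)s/must/;

# %1$s picks the first argument and %2$s the second, regardless of where
# they appear in the format string.
my $text = sprintf($plain, @$vals);
print "$text\n";    # must leave a remainder of 1 when divided by 3
```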

Now what if we have this schema: [int => 'mod&' => [ [3,1], [5,1] ]]? If we use the same fmt for multiple values, the final compiled clause will become:

{type=>'clause', text=>'must leave a remainder of [5,1] when divided by [3,1]', xlt=>1}

in which the text doesn't make grammatical sense. In this case, the clause handler will need to add a compiled clause of type list instead of type clause:

{
  type  =>'list',
  text  => 'all of the following must be true',
  items => [
    {type=>'clause', text=>'must leave a remainder of 1 when divided by 3', xlt=>1},
    {type=>'clause', text=>'must leave a remainder of 1 when divided by 5', xlt=>1},
  ],
  xlt   => 1,
}

The list compiled clause is used to create text with bullet points (which can be inlined into a clause in some cases where possible). The final compilation result for the last schema will be:

"integer, all of the following must be true: must leave a remainder of 1 when
divided by 3, must leave a remainder of 1 when divided by 5"
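
How a list compiled clause might be flattened into inline text can be sketched with a short recursive function. render_ccl() is a hypothetical helper that handles only the inline form shown above; it does not attempt the bullet-point rendering.

```perl
use strict;
use warnings;

# Hypothetical renderer: nouns use their singular form, lists are rendered
# inline as "text: item, item", and plain clauses return their text.
sub render_ccl {
    my ($ccl) = @_;
    if ($ccl->{type} eq 'noun') {
        return $ccl->{text}[0];
    }
    if ($ccl->{type} eq 'list') {
        return $ccl->{text} . ': '
             . join(', ', map { render_ccl($_) } @{ $ccl->{items} });
    }
    return $ccl->{text};
}

my @ccls = (
    { type => 'noun', text => ['integer', 'integers'], xlt => 1 },
    {
        type  => 'list',
        text  => 'all of the following must be true',
        items => [
            { type => 'clause', text => 'must leave a remainder of 1 when divided by 3', xlt => 1 },
            { type => 'clause', text => 'must leave a remainder of 1 when divided by 5', xlt => 1 },
        ],
        xlt   => 1,
    },
);

my $text = join(', ', map { render_ccl($_) } @ccls);
print "$text\n";
```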

Coercion (perl)

Coercion rules for perl are organized modularly in Data::Sah::Coerce::perl::To_$TARGET_TYPE::From_$SOURCE_TYPE::$DESCRIPTION modules, where $TARGET_TYPE is the schema being compiled, $SOURCE_TYPE is source type, $DESCRIPTION is some extra description. Example:

Data::Sah::Coerce::perl::To_date::From_float::Epoch

This module contains a rule to convert a number (assumed to be a Unix epoch) into a date. Another example:

Data::Sah::Coerce::perl::To_date::From_str::ISO8601

This module contains a rule to coerce a date from (a subset of) ISO8601 strings.
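
The naming scheme is regular enough to be composed mechanically. The helper below is hypothetical and only builds the name; it does not check that such a module is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: compose a Perl coercion rule module name from the
# target type, the source type, and a short description.
sub coerce_module_for {
    my ($target_type, $source_type, $description) = @_;
    return "Data::Sah::Coerce::perl::To_${target_type}::From_${source_type}::${description}";
}

print coerce_module_for('date', 'float', 'Epoch'), "\n";
print coerce_module_for('date', 'str',   'ISO8601'), "\n";
```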

Handling expression

TBD

Translation

TBD

COERCION

In Data::Sah, coercion rules are organized modularly in Data::Sah::Coerce::$LANG::To_$TARGET_TYPE::From_$SOURCE_TYPE::$DESCRIPTION modules, where $TARGET_TYPE is the schema being compiled, $SOURCE_TYPE is source type, and $DESCRIPTION is some extra description. For language-specific information, see "Coercion (perl)".

Code for coercion is generated by collecting all rules from the coercion handler modules, then combining them and putting the result after the default value is set and before the type check.

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/Data-Sah.

SOURCE

Source repository is at https://github.com/perlancar/perl-Data-Sah.

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

% prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2024, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Data-Sah

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.