NAME

Data::Sah::Compiler - Base class for Sah compilers (Data::Sah::Compiler::*)

VERSION

This document describes version 0.916 of Data::Sah::Compiler (from Perl distribution Data-Sah), released on 2024-02-16.

COMPILATION DATA KEYS

  • v => int

    Version of the compilation data structure, currently 2. Whenever a backward-incompatible change is introduced in the structure, this version number will be bumped. Client code can check this key to deliberately fail when it encounters a version number that it can't handle.

  • args => HASH

    Arguments given to compile().

  • compiler => OBJ

    The compiler object.

  • compiler_name => str

    Compiler name, e.g. perl, js.

  • is_inner => bool

    Convenience. Will be set to 1 when this compilation is a subcompilation (i.e. compilation of a subschema). You can also check for outer_cd to find out if this compilation is an inner compilation.

  • outer_cd => HASH

    If compilation is called from within another compile(), this will be set to the outer compilation's $cd. The inner compilation will inherit some values from the outer, like list of types (th_map) and function sets (fsh_map).

  • th_map => HASH

    Mapping of fully-qualified type names like int to their Data::Sah::Compiler::*::TH::* type handler objects (or to an array, a normalized schema).

  • fsh_map => HASH

    Mapping of function set names like core to their Data::Sah::Compiler::*::FSH::* handler objects.

  • schema => ARRAY

    The current schema (normalized) being processed. Since a schema can contain other schemas, there will be subcompilations, and this value will not necessarily be equal to $cd->{args}{schema}.

  • spath => ARRAY

    An array of strings, with an empty array ([]) as the root. Points to the current location in the schema during compilation. An inner compilation will continue (append to) the path.

    Example:

    # spath, with pointer to location in the schema
    
    spath: ["elems"] ----
                         \
    schema: ["array", {elems => ["float", [int => {min=>3}], [int => "div_by&" => [2, 3]]]}]
    
    spath: ["elems", 0] ------------
                                    \
    schema: ["array", {elems => ["float", [int => {min=>3}], [int => "div_by&" => [2, 3]]]}]
    
    spath: ["elems", 1, "min"] ---------------------
                                                    \
    schema: ["array", {elems => ["float", [int => {min=>3}], [int => "div_by&" => [2, 3]]]}]
    
    spath: ["elems", 2, "div_by", 1] -------------------------------------------------
                                                                                      \
    schema: ["array", {elems => ["float", [int => {min=>3}], [int => "div_by&" => [2, 3]]]}]

    Note: aside from spath, there is also the analogous dpath which points to the location of data (e.g. array element, hash key). But this is declared and maintained by the generated code, not by the compiler.

  • th => OBJ

    Current type handler.

  • type => STR

    Current type name.

  • clsets => ARRAY

    All the clause sets. Each schema might have more than one clause set, due to processing base type's clause set.

  • clset => HASH

    Current clause set being processed. Note that clauses are evaluated not strictly in clset order, but instead based on expression dependencies and priority.

  • clset_dlang => STR

    Default language of the current clause set. This value is taken from $cd->{clset}{default_lang} or $cd->{outer_cd}{default_lang} or the default en_US.

  • clset_num => INT

    Set to 0 for the first clause set, 1 for the second, and so on. Due to merging, we might process more than one clause set during compilation.

  • uclset => HASH

    Short for "unprocessed clause set", a shallow copy of clset. Keys are removed from it as they are processed by clause handlers. Any keys that remain after the clause set has been processed were not recognized by hooks and thus constitute an error.

  • uclsets => ARRAY

    The uclset of each clause set.

  • clause => STR

    Current clause name.

  • cl_meta => HASH

    Metadata about the clause, from the clause definition. This includes prio (priority), attrs (list of attributes specific to this clause), allow_expr (whether the clause allows an expression in its value), etc. See Data::Sah::Type::$TYPENAME for more information.

  • cl_value => ANY

    Clause value. Note: for putting in generated code, use cl_term.

    The clause value will be coerced if there are applicable coercion rules. To get the raw/original value as the schema specifies it, see cl_raw_value.

  • cl_raw_value => any

    Like cl_value, but without any coercion/filtering done to the value.

  • cl_term => STR

    Clause value term. If clause value is a literal (.is_expr is false) then it is produced by passing clause value to literal(). Otherwise, it is produced by passing clause value to expr().

  • cl_is_expr => BOOL

    A copy of $cd->{clset}{"${clause}.is_expr"}, for convenience.

  • cl_op => STR

    A copy of $cd->{clset}{"${clause}.op"}, for convenience.

  • cl_is_multi => BOOL

    Set to true if cl_value contains multiple clause values. This will happen if .op is either and, or, or none and $cd->{CLAUSE_DO_MULTI} is set to true.

  • indent_level => INT

    Current level of indent when printing result using $c->line(). 0 means unindented.

  • all_expr_vars => ARRAY

    All variables in all expressions in the current schema (and all of its subschemas). Used internally by the compiler. For example (XXX syntax not yet finalized):

    # schema
    [array => {of=>'str1', min_len=>1, 'max_len=' => '$min_len*3'},
     {def => {
         str1 => [str => {min_len=>6, 'max_len=' => '$min_len*2',
                          check=>'substr($_,0,1) eq "a"'}],
     }}]
    
    all_expr_vars => ['schema:///clsets/0/min_len', # or perhaps .../min_len/value
                      'schema://str1/clsets/0/min_len']

    This data can be used to order the compilation of clauses based on dependencies. In the above example, min_len needs to be evaluated before max_len (especially if min_len is an expression).

  • modules => array of hash

    List of modules that are required, one way or another. Each element is a hash which must contain at least the name key (module name). There are other keys like version (minimum version), phase (explained below). Some languages might add other keys, like perl with use_statement (statement to load/use the module, used by e.g. pragmas like no warnings 'void' which are not the regular require MODULE statement). Generally, duplicate entries (entries with the same name and phase) are avoided, except in special cases like Perl pragmas.

    There are runtime modules (phase key set to runtime), which are required by the generated code when it runs. For each entry, the only required key is name. Other keys include version (minimum version). Some languages have additional rules for this, e.g. perl has use_statement (how to use the module, e.g. for a pragma like no warnings 'void').

    There are also compile-time modules (phase key set to compile), which are required during compilation of the schema. This includes coercion rule modules like Data::Sah::Coerce::perl::To_date::From_float::Epoch, and so on. This information might be useful for distributions that use Data::Sah. Because Data::Sah is a modular library with third-party extensions for types, coercion rules, and so on, listing these modules as dependencies instead of a single Data::Sah ensures that dependants pull in the right distributions during installation.

  • ccls => [HASH, ...]

    (Result) Compiled clauses, collected during processing of schema's clauses. Each element will contain the compiled code in the target language, error message, and other information. At the end of processing, these will be joined together.

  • result => ...

    (Result) The final result. For most compilers, it will be string/text.

  • has_constraint_clause => bool

    Convenience. True if there is at least one constraint clause in the schema. This excludes the special clauses req and forbidden.

  • has_subschema => bool

    Convenience. True if there is at least one clause which contains a subschema.
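
The uclset bookkeeping described above can be pictured with a minimal sketch in plain Perl. This is not the compiler's actual code: unhandled_clauses() and the handler table are hypothetical names made up for illustration, and real clause sets also carry attribute keys that this sketch ignores.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Sah): returns the names of the
# clauses that no handler claimed.
sub unhandled_clauses {
    my ($clset, $handlers) = @_;

    my %uclset = %$clset;    # shallow copy, playing the role of uclset

    for my $clause (sort keys %$clset) {
        my $handler = $handlers->{$clause} or next;
        $handler->($clset->{$clause});
        delete $uclset{$clause};    # processed, so no longer "unprocessed"
    }

    return sort keys %uclset;       # leftovers were not recognized
}

# Stand-ins for real clause handlers; they do nothing here.
my %handlers = (
    min => sub { },
    max => sub { },
);

my @left = unhandled_clauses({ min => 1, max => 10, foo => 1 }, \%handlers);
print "unhandled: @left\n";
```

Whatever remains in the copy after the loop is what the on_unhandled_clause setting would then act upon.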

ATTRIBUTES

main => OBJ

Reference to the main Data::Sah object.

expr_compiler => OBJ

Reference to the expression compiler object. In the perl compiler, for example, this will be an instance of Language::Expr::Compiler::Perl.

METHODS

new() => OBJ

$c->compile(%args) => HASH

Compile schema into target language.

Arguments (* denotes required arguments, subclass may introduce others):

  • data_name => STR (default: 'data')

    A unique name. Will be used as the default for variable names, etc. Should consist only of letters, numbers, and underscores.

  • schema* => STR|ARRAY

    The schema to use. Will be normalized by compiler, unless schema_is_normalized is set to true.

  • lang => STR (default: from LANG/LANGUAGE or en_US)

    Desired output human language. Defaults (and falls back to) en_US.

  • mark_missing_translation => BOOL (default: 1)

    If a piece of text is not found in the desired human language, the en_US version of the text will be used, in this format:

    (en_US:the text to be translated)

    If you do not want this marker, set the mark_missing_translation option to 0.

  • locale => STR

    Locale name, to be set while generating the human text description. This sometimes needs to be set if setlocale() fails to set the locale using only lang.

  • schema_is_normalized => BOOL (default: 0)

    If set to true, instruct the compiler not to normalize the input schema and assume it is already normalized.

  • allow_expr => BOOL (default: 1)

    Whether to allow expressions. If false, will die when encountering an expression during compilation. Usually set to false for security reasons, to disallow complex expressions when schemas come from untrusted sources.

  • on_unhandled_attr => STR (default: 'die')

    What to do when an attribute can't be handled by compiler (either it is an invalid attribute, or the compiler has not implemented it yet). Valid values include: die, warn, ignore.

  • on_unhandled_clause => STR (default: 'die')

    What to do when a clause can't be handled by compiler (either it is an invalid clause, or the compiler has not implemented it yet). Valid values include: die, warn, ignore.

  • indent_level => INT (default: 0)

    Start at the specified indent level. Useful when the generated code will be inserted into other code (e.g. inside sub {}, where it is nice to be able to indent the inner code).

  • skip_clause => ARRAY (default: [])

    List of clauses to skip (they are treated as if they did not exist). Example when compiling with the human compiler:

    # schema
    [int => {default=>1, between=>[1, 10]}]
    
    # generated human description in English
    integer, between 1 and 10, default 1
    
    # generated human description, with skip_clause => ['default']
    integer, between 1 and 10
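
As an illustration of the mark_missing_translation argument described above, here is a minimal sketch of the fallback rule in plain Perl. The translate() function and the %translations table are hypothetical and exist only for this example; the real compiler looks up translations through its own language modules.

```perl
use strict;
use warnings;

# Hypothetical translation table, for illustration only.
my %translations = (
    id_ID => { 'integer' => 'bilangan bulat' },
);

# Hypothetical helper: returns the text in the requested language, or the
# en_US text (optionally wrapped in the marker) when no translation exists.
sub translate {
    my ($text, %opts) = @_;
    my $lang = $opts{lang} // 'en_US';
    my $mark = $opts{mark_missing_translation} // 1;    # documented default: 1

    return $text if $lang eq 'en_US';

    my $table = $translations{$lang} || {};
    return $table->{$text} if exists $table->{$text};

    # Fall back to en_US, marked unless the caller switched the marker off.
    return $mark ? "(en_US:$text)" : $text;
}

print translate('integer',          lang => 'id_ID'), "\n";
print translate('between 1 and 10', lang => 'id_ID'), "\n";
print translate('between 1 and 10', lang => 'id_ID',
                mark_missing_translation => 0), "\n";
```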

Compilation data

During compilation, compile() will call various hooks (listed below). The hooks will be passed compilation data ($cd) which is a hashref containing various compilation state and result. Compilation data is written to this hashref instead of on the object's attributes to make it easy to do recursive compilation (compilation of subschemas).

Keys that are put into this compilation data include input data, compilation state, and others. Many of these keys might exist only temporarily during certain phases of compilation and will no longer exist at the end of compilation. For example, clause exists only while a clause is being processed: it will be seen by hooks like before_clause and after_clause, but not by before_all_clauses or after_compile.

For a list of keys, see "COMPILATION DATA KEYS". Subclasses may add more data; see their respective documentation.
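
To show why a plain hashref makes recursive compilation convenient, the following sketch builds an outer and an inner compilation data hash by hand. It is only an approximation under stated assumptions: init_cd() is a hypothetical helper, and only a handful of the documented keys (v, schema, is_inner, outer_cd, th_map, fsh_map, spath) are modeled.

```perl
use strict;
use warnings;

# Hypothetical helper: creates compilation data for a schema. When an outer
# $cd is given, the new one is an inner compilation that shares the outer
# type/function-set maps and continues the outer spath.
sub init_cd {
    my ($schema, $outer_cd, @subpath) = @_;

    my $cd = { v => 2, schema => $schema };

    if ($outer_cd) {
        $cd->{outer_cd} = $outer_cd;
        $cd->{is_inner} = 1;
        $cd->{th_map}   = $outer_cd->{th_map};     # inherited, same hash
        $cd->{fsh_map}  = $outer_cd->{fsh_map};    # inherited, same hash
        $cd->{spath}    = [ @{ $outer_cd->{spath} }, @subpath ];
    }
    else {
        $cd->{th_map}  = {};
        $cd->{fsh_map} = {};
        $cd->{spath}   = [];                       # root of the schema
    }

    return $cd;
}

my $outer = init_cd([ 'array', { of => [ 'int', {} ] } ]);
my $inner = init_cd($outer->{schema}[1]{of}, $outer, 'of');

print "inner spath: ", join('/', @{ $inner->{spath} }), "\n";
```

Because each level has its own hash, per-clause state in the inner compilation never overwrites the state of the outer one.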

Return value

The compilation data is returned as the return value. The main result is in the result key. There is also ccls, and subclasses may put additional results in other keys. The final usable result might need to be pieced together from these results, depending on your needs.
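
A caller consuming the returned hash might guard on the v key before touching result, as suggested in "COMPILATION DATA KEYS". The sketch below uses a hand-built hash as a stand-in for real compilation data, and result_from_cd() is a hypothetical helper, not part of Data::Sah.

```perl
use strict;
use warnings;

# Hypothetical helper: refuses compilation data whose structure version it
# does not know, otherwise returns the main result.
sub result_from_cd {
    my ($cd) = @_;

    my $v = defined $cd->{v} ? $cd->{v} : 'undef';
    die "Unsupported compilation data version: $v\n"
        unless $v eq '2';

    return $cd->{result};
}

# Hand-built stand-in for what compile() would return.
my $cd = {
    v             => 2,
    compiler_name => 'human',
    result        => 'integer, between 1 and 10',
    ccls          => [],
};

print result_from_cd($cd), "\n";
```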

Hooks

By default this base compiler does not define any hooks; subclasses can define hooks to implement their compilation process. Each hook will be passed compilation data, and should modify or set the compilation data as needed. The hooks that compile() will call at various points, in calling order, are:

  • $c->before_compile($cd)

    Called once at the beginning of compilation.

  • $c->before_handle_type($cd)

  • $th->handle_type($cd)

  • $c->before_all_clauses($cd)

    Called before calling handler for any clauses.

  • $th->before_all_clauses($cd)

    Called before calling handler for any clauses, after compiler's before_all_clauses().

  • $c->before_clause($cd)

    Called for each clause, before calling the actual clause handler ($th->clause_NAME() or $th->clause).

  • $th->before_clause($cd)

    After compiler's before_clause() is called, type handler's before_clause() will also be called if available.

    Input and output interpretation is the same as compiler's before_clause().

  • $th->before_clause_NAME($cd)

    Can be used to customize clause.

    Introduced in v0.10.

  • $th->clause_NAME($cd)

    Clause handler. Will be called either only once (if $cd->{CLAUSE_DO_MULTI} is set to true by other hooks before this) or once for each value in a multi-value clause (e.g. when the .op attribute is set to and or or). For example, in this schema:

    [int => {"div_by&" => [2, 3, 5]}]

    clause_div_by() can be called only once with $cd->{cl_value} set to [2, 3, 5], or three times, with $cd->{cl_value} set to 2, 3, and 5 respectively.

  • $th->after_clause_NAME($cd)

    Can be used to customize clause.

    Introduced in v0.10.

  • $th->after_clause($cd)

    Called for each clause, after calling the actual clause handler ($th->clause_NAME()).

  • $c->after_clause($cd)

    Called for each clause, after calling the actual clause handler ($th->clause_NAME()).

    Output interpretation is the same as $th->after_clause().

  • $th->after_all_clauses($cd)

    Called after all clauses have been compiled, before compiler's after_all_clauses().

  • $c->after_all_clauses($cd)

    Called after all clauses have been compiled.

  • $c->after_compile($cd)

    Called at the very end, just before the compilation process ends.
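
The calling order listed above can be sketched with a toy driver that calls each hook only if the object defines it. Everything here is hypothetical (the My::Toy::* packages, call_hook(), run_hooks(), and the log key) and greatly simplified compared to what compile() really does; for instance, multi-value clauses and the generic $th->clause fallback are left out.

```perl
use strict;
use warnings;

package My::Toy::Compiler;    # hypothetical compiler defining three hooks
sub new            { return bless {}, shift }
sub before_compile { my ($self, $cd) = @_; push @{ $cd->{log} }, 'c:before_compile' }
sub before_clause  { my ($self, $cd) = @_; push @{ $cd->{log} }, "c:before_clause($cd->{clause})" }
sub after_compile  { my ($self, $cd) = @_; push @{ $cd->{log} }, 'c:after_compile' }

package My::Toy::TH;          # hypothetical type handler with one clause
sub new        { return bless {}, shift }
sub clause_min { my ($self, $cd) = @_; push @{ $cd->{log} }, 'th:clause_min' }

package main;

# Call a hook only if the object implements it.
sub call_hook {
    my ($obj, $name, $cd) = @_;
    my $method = $obj->can($name) or return;
    return $obj->$method($cd);
}

sub run_hooks {
    my ($c, $th, $cd, @clauses) = @_;

    call_hook($c,  'before_compile',     $cd);
    call_hook($c,  'before_handle_type', $cd);
    call_hook($th, 'handle_type',        $cd);
    call_hook($c,  'before_all_clauses', $cd);
    call_hook($th, 'before_all_clauses', $cd);

    for my $clause (@clauses) {
        # The clause key exists only while this clause is being processed.
        local $cd->{clause} = $clause;
        call_hook($c,  'before_clause',         $cd);
        call_hook($th, 'before_clause',         $cd);
        call_hook($th, "before_clause_$clause", $cd);
        call_hook($th, "clause_$clause",        $cd);
        call_hook($th, "after_clause_$clause",  $cd);
        call_hook($th, 'after_clause',          $cd);
        call_hook($c,  'after_clause',          $cd);
    }

    call_hook($th, 'after_all_clauses', $cd);
    call_hook($c,  'after_all_clauses', $cd);
    call_hook($c,  'after_compile',     $cd);
    return $cd;
}

my $demo_cd = run_hooks(My::Toy::Compiler->new, My::Toy::TH->new,
                        { log => [] }, 'min');
print "$_\n" for @{ $demo_cd->{log} };
```

The use of local on the clause key mirrors the note in "Compilation data" that some keys exist only during certain phases of compilation.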

$c->get_th

$c->get_fsh

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/Data-Sah.

SOURCE

Source repository is at https://github.com/perlancar/perl-Data-Sah.

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

% prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2024, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Data-Sah

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.