NAME

Flat::Schema - Deterministic schema contracts for flat files

WHY THIS EXISTS (IN ONE PARAGRAPH)

In real ETL work, yesterday's CSV becomes today's "contract" whether you meant it or not. Flat::Schema makes that contract explicit: generate a deterministic schema from what you observed, record ambiguity as issues, and give the next step (validation) something stable to enforce.

SYNOPSIS

Basic usage:

use Flat::Profile;
use Flat::Schema;

my $profile = Flat::Profile->profile_file(
    file => "data.csv",
);

my $schema = Flat::Schema->from_profile(
    profile => $profile,
);

print Flat::Schema->new()->to_json(schema => $schema);

With overrides:

my $schema = Flat::Schema->from_profile(
    profile   => $profile,
    overrides => [
        { column_index => 0, set => { type => 'integer', nullable => 0 } },
        { column_index => 3, set => { name => 'created_at', type => 'datetime' } },
    ],
);

DESCRIPTION

Flat::Schema consumes reports produced by Flat::Profile and generates a deterministic, inspectable schema contract describing what tabular data should look like.

It is the second module in the Flat::* series:

Flat::Profile — What the data looks like
Flat::Schema — What the data should look like
Flat::Validate — Does the data conform (planned)

The schema is a canonical Perl data structure that:

Is stable and deterministic (identical inputs → identical output)
Is serializable to JSON and YAML
Captures inference decisions and ambiguity as issues
Can be consumed by Flat::Validate (planned) or other tooling

REAL-WORLD USE CASES (THE STUFF YOU ACTUALLY DO)

1) Vendor “helpfully” changes a column (integer → text)

You ingest daily files and one day a numeric column starts containing values like N/A, unknown, or ERR-17. Your pipeline should not silently coerce this into zero or drop rows.

Workflow:

Profile last-known-good
Generate schema (your contract)
Validate future drops against the schema

A typical override when you decide "we accept this as string now":

my $schema = Flat::Schema->from_profile(
    profile   => $profile,
    overrides => [
        { column_index => 7, set => { type => 'string' } },
    ],
);

When the override conflicts with the inferred value, Flat::Schema records the conflict as a warning. That record is useful during incident review.

2) Columns that are “nullable in real life” even if today they are not

Data often arrives complete in a sample window and then starts missing values in production. In v1, nullability is intentionally simple:

nullable = true iff null_count > 0

If you know a field can be missing in production even though the profiled sample contains no nulls, force it:

overrides => [
    { column_index => 2, set => { nullable => 1 } },  # allow missing later
],

3) Timestamp confusion: date vs datetime vs “whatever the exporter did”

When temporal evidence mixes, Flat::Schema chooses predictability over cleverness.

date + datetime → datetime
temporal + non-temporal → string (and it tells you)

This prevents “maybe parseable” data from becoming quietly wrong later.

4) “Header row roulette” and naming cleanup

You may get headers like Customer ID, customer_id, CUSTID, or no header at all. The schema stores both for every column:

index always
name when available

If you need normalized naming for downstream systems:

overrides => [
    { column_index => 0, set => { name => 'customer_id' } },
],

5) Reproducible artifacts for tickets, audits, and “what changed?”

Sometimes the most important feature is being able to paste the schema into a ticket, diff it in Git, or keep it as a build artifact.

Flat::Schema’s serializers are deterministic by design. If the schema changes, it is because the inputs changed (profile or overrides), not because hash order shifted.
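To make the "what changed?" question concrete, here is a minimal sketch that compares the columns of two schema structures and reports per-field differences. The helper name diff_columns and the sample data are hypothetical and not part of Flat::Schema; the sketch only assumes the documented index, name, type, and nullable keys.

```perl
use strict;
use warnings;

# Hypothetical helper: report column-level differences between two schemas.
sub diff_columns {
    my ($old, $new) = @_;
    my %old = map { $_->{index} => $_ } @{ $old->{columns} };
    my %new = map { $_->{index} => $_ } @{ $new->{columns} };
    my %all = (%old, %new);

    my @changes;
    for my $idx (sort { $a <=> $b } keys %all) {
        if (!$old{$idx}) { push @changes, "column $idx: added";   next; }
        if (!$new{$idx}) { push @changes, "column $idx: removed"; next; }
        for my $field (qw(name type nullable)) {
            my $o = defined $old{$idx}{$field} ? $old{$idx}{$field} : '(none)';
            my $n = defined $new{$idx}{$field} ? $new{$idx}{$field} : '(none)';
            push @changes, "column $idx: $field $o -> $n" if $o ne $n;
        }
    }
    return @changes;
}

# Hand-written sample schemas (illustrative data only).
my $yesterday = { columns => [ { index => 7, name => 'qty', type => 'integer', nullable => 0 } ] };
my $today     = { columns => [ { index => 7, name => 'qty', type => 'string',  nullable => 0 } ] };

print "$_\n" for diff_columns($yesterday, $today);
```

In practice a plain text diff of the serialized schema gives the same information; a structural comparison like this is only a convenience when the result needs to go into a report.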

SCHEMA STRUCTURE (AT A GLANCE)

A generated schema contains:

{
    schema_version => 1,
    generator      => { name => "Flat::Schema", version => "0.01" },
    profile        => { ... },
    columns        => [ ... ],
    issues         => [ ... ],
}

Each column contains:

{
    index      => 0,
    name       => "id",
    type       => "integer",
    nullable   => 0,
    length     => { min => 1, max => 12 },  # optional
    overrides  => { ... },                  # optional
    provenance => {
        basis         => "profile",
        rows_observed => 1000,
        null_count    => 0,
        null_rate     => { num => 0, den => 1000 },
        overrides     => [ "type", "nullable" ],  # optional
    },
}
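As an illustration of how a consumer might read this structure, the following sketch walks a hand-written schema hash and builds one summary line per column. The helper name summarize_columns is hypothetical, and the sketch assumes that name is undef when no header was available, which the structure above does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper: one summary line per column, in index order.
sub summarize_columns {
    my ($schema) = @_;
    my @lines;
    for my $col (sort { $a->{index} <=> $b->{index} } @{ $schema->{columns} }) {
        # Assumption: an unnamed column carries name => undef.
        my $name = defined $col->{name} ? $col->{name} : '(unnamed)';
        push @lines, sprintf(
            "%d %s %s %s",
            $col->{index}, $name, $col->{type},
            $col->{nullable} ? 'nullable' : 'not-null',
        );
    }
    return @lines;
}

# Hand-written sample that uses only the keys documented above.
my $sample_schema = {
    schema_version => 1,
    columns        => [
        { index => 1, name => undef, type => 'string',  nullable => 1 },
        { index => 0, name => 'id',  type => 'integer', nullable => 0 },
    ],
    issues => [],
};

print "$_\n" for summarize_columns($sample_schema);
```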

TYPE INFERENCE (v1)

Type inference is based solely on evidence provided by Flat::Profile.

Scalar widening order:

boolean → integer → number → string

Temporal handling:

date + datetime → datetime
temporal + non-temporal → string (with warning)

When a column shows mixed evidence, its type is widened according to the rules above and the decision is recorded as an issue.
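The widening rules can be sketched as a small pure function. This is an illustration of the rules as documented here, not Flat::Schema's implementation: the helper name widen_types, its return shape, and the rank tables are assumptions made for the example.

```perl
use strict;
use warnings;

# Ranks follow the documented order: boolean -> integer -> number -> string.
my %SCALAR_RANK   = (boolean => 0, integer => 1, number => 2, string => 3);
# Temporal rule: date + datetime -> datetime.
my %TEMPORAL_RANK = (date => 0, datetime => 1);

# Hypothetical helper: returns (widened_type, mixed_flag).
sub widen_types {
    my @types = @_;
    die "widen_types needs at least one type\n" unless @types;

    my %seen     = map { $_ => 1 } @types;
    my @temporal = grep { exists $TEMPORAL_RANK{$_} } keys %seen;
    my @scalar   = grep { exists $SCALAR_RANK{$_} } keys %seen;
    die "unknown type in: @types\n"
        if @temporal + @scalar != scalar(keys %seen);

    my $mixed = scalar(keys %seen) > 1 ? 1 : 0;

    # temporal + non-temporal -> string
    return ('string', 1) if @temporal && @scalar;

    # Otherwise pick the widest type within the single family observed.
    my $rank = @temporal ? \%TEMPORAL_RANK : \%SCALAR_RANK;
    my ($widest) = sort { $rank->{$b} <=> $rank->{$a} } (@temporal, @scalar);
    return ($widest, $mixed);
}

for my $case (['integer'], ['boolean', 'number'], ['date', 'datetime'], ['datetime', 'integer']) {
    my ($type, $mixed) = widen_types(@$case);
    printf "%-18s => %s%s\n", join('+', @$case), $type, $mixed ? ' (mixed)' : '';
}
```

The mixed flag marks the cases where a caller would record an issue.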

NULLABILITY INFERENCE (v1)

Rules:

nullable = true iff null_count > 0
If rows_profiled == 0, all columns are nullable
All-null columns emit warning all_null_column
Zero profiled rows emits warning no_rows_profiled
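These four rules fit in a few lines. The sketch below is an assumption-laden illustration: the helper name infer_nullability, its named arguments, and its return shape are hypothetical. The rules do not say whether no_rows_profiled is emitted per column or once per schema; the sketch attaches it to the column result for simplicity.

```perl
use strict;
use warnings;

# Hypothetical helper: returns { nullable => 0|1, warnings => [...] }.
sub infer_nullability {
    my (%args) = @_;
    my $rows  = $args{rows_profiled};
    my $nulls = $args{null_count};

    # Zero profiled rows: no evidence, so the column is treated as nullable.
    return { nullable => 1, warnings => ['no_rows_profiled'] } if $rows == 0;

    my @warnings;
    push @warnings, 'all_null_column' if $nulls == $rows;

    # nullable = true iff null_count > 0
    return { nullable => ($nulls > 0 ? 1 : 0), warnings => \@warnings };
}

for my $case ([1000, 0], [1000, 3], [10, 10], [0, 0]) {
    my ($rows, $nulls) = @$case;
    my $r = infer_nullability(rows_profiled => $rows, null_count => $nulls);
    printf "rows=%d nulls=%d => nullable=%d warnings=[%s]\n",
        $rows, $nulls, $r->{nullable}, join(',', @{ $r->{warnings} });
}
```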

USER OVERRIDES (v1)

Overrides are applied after inference.

Supported fields:

type
nullable
name
length (min/max)

Overrides:

Are index-based (column_index required)
May conflict with inferred values (recorded as warnings)
Are recorded in column.overrides
Are recorded in provenance.overrides
Emit an informational override_applied issue

Overrides referencing unknown columns cause a hard error.
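The behaviour listed above can be sketched roughly as follows. This is not Flat::Schema's code: the helper name apply_overrides, the shape of the issue hashes, and the override_conflict code are hypothetical (only override_applied is named above). To stay short, the sketch handles the scalar fields type, nullable, and name, and flags conflicts only for type and nullable.

```perl
use strict;
use warnings;

# Hypothetical helper: applies overrides in place and returns a list of issues.
sub apply_overrides {
    my ($columns, $overrides) = @_;
    my %by_index = map { $_->{index} => $_ } @$columns;
    my @issues;

    for my $ov (@$overrides) {
        my $idx = $ov->{column_index};
        die "override is missing column_index\n" unless defined $idx;
        my $col = $by_index{$idx}
            or die "override references unknown column index $idx\n";

        for my $field (sort keys %{ $ov->{set} }) {
            my ($old, $new) = ($col->{$field}, $ov->{set}{$field});

            # A differing inferred value is a conflict, recorded as a warning.
            if (($field eq 'type' || $field eq 'nullable')
                && defined $old && $old ne $new) {
                push @issues, { level => 'warning', code => 'override_conflict',
                                column_index => $idx, field => $field };
            }

            $col->{$field}            = $new;
            $col->{overrides}{$field} = $new;
            push @{ $col->{provenance}{overrides} }, $field;
            push @issues, { level => 'info', code => 'override_applied',
                            column_index => $idx, field => $field };
        }
    }
    return \@issues;
}

my @columns = ({ index => 0, name => 'id', type => 'string', nullable => 1 });
my $issues  = apply_overrides(\@columns, [
    { column_index => 0, set => { type => 'integer', nullable => 0 } },
]);
print "$_->{level}: $_->{code} ($_->{field})\n" for @$issues;
```

Sorting the field names keeps the issue order stable, in keeping with the determinism goal.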

DETERMINISTIC SERIALIZATION

Flat::Schema includes built-in deterministic JSON and YAML serializers.

Same input profile + same overrides → identical JSON/YAML.

This is required for reproducible pipelines and meaningful diffs.
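To show why key ordering matters for diffs, here is a minimal sketch using the core JSON::PP module with canonical (sorted-key) output. It is not Flat::Schema's built-in serializer, which this document does not describe in detail; the helper name stable_json is hypothetical.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical helper: serialize with sorted keys so output is order-independent.
sub stable_json {
    my ($data) = @_;
    return JSON::PP->new->canonical(1)->pretty->encode($data);
}

# Two structures with the same content, built in a different key order.
my %first;
$first{type}  = 'integer';
$first{index} = 0;
$first{name}  = 'id';

my %second;
$second{name}  = 'id';
$second{index} = 0;
$second{type}  = 'integer';

print stable_json(\%first) eq stable_json(\%second)
    ? "identical output\n"
    : "output differs\n";
```

Without canonical ordering, Perl's hash key order can differ between runs, so two serializations of the same data could produce a spurious diff.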

STATUS

Implemented in v1:

Canonical schema structure
Deterministic serialization
Type inference
Nullability inference
User overrides (index-based)

Future releases may expand the type lattice, constraint modeling, and schema evolution.

AUTHOR

Sergio de Sousa <sergio@serso.com>

LICENSE

This library is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
