NAME

Type::Guess - Infer data types from an array of scalars

SYNOPSIS

use Type::Guess;

my @list = qw/a b cd efg hijk/;
my $guess = Type::Guess->new(@list);

print $guess->type;       # Str
print $guess->precision;  # 0
print $guess->length;     # 4
print $guess->to_string;  # %-4s

# Use the object directly as a formatter
print $guess->($_) for @list;   # "a   ", "b   ", "cd  ", "efg ", "hijk"
print "$guess";                 # %-4s  (stringifies to format string)

# SQL column definitions
my $sql = Type::Guess->with_roles("+SQL::SQLite")->new(@list);
print $sql->to_sql;             # VARCHAR(4)

# DateTime detection
my $dt = Type::Guess->with_roles("+DateTime", "+SQL::Pg")->new(@dates);
print $dt->type;                # DateTime
print $dt->to_sql;              # TIMESTAMP

DESCRIPTION

Type::Guess analyses an array of scalar values and determines the most appropriate data type (Str, Int, or Num). It also tracks precision, field width, and sign, and generates a sprintf-style format string that can be used directly to format the original values consistently.

The object overloads both stringification ("") and the code dereference operator (&{}), so it can be used as a format string or called directly as a formatting function.

Additional roles extend the base detection to handle datetime values, Unicode strings, Type::Tiny integration, and SQL column definition generation.

METHODS

new

my $guess = Type::Guess->new(@list);

Creates a new Type::Guess object by analysing the input list. Alternatively, a hashref of attribute values may be passed directly:

my $guess = Type::Guess->new({ type => "Int", length => 5 });

analyse

my $guess = $class->analyse(@list);

Analyses a list and returns a new Type::Guess object. Useful for reusing a configured class against multiple datasets.

my $class = Type::Guess->with_roles("+Unicode", "+SQL::Pg");
my $g1 = $class->analyse(@first_column);
my $g2 = $class->analyse(@second_column);

type

my $type = $guess->type;
$guess->type("Str");

Returns or sets the inferred type. Possible values are:

  • Str (string)

  • Int (integer)

  • Num (floating point)

  • DateTime when using the +DateTime or +DateTime::Naive role

  • A Type::Tiny object when using the +Tiny role

Setting type to Str restores length to its originally detected value.

precision

my $p = $guess->precision;
$guess->precision(2);

Returns or sets the number of decimal places for Num values. Always returns 0 for non-Num types.

length

my $len = $guess->length;
$guess->length(12);

Returns or sets the total field width used in the format string. For Num types this is derived from integer_chars and precision; setting it adjusts integer_chars accordingly. Setting a value lower than the detected minimum is ignored with a warning.

integer_chars

my $ic = $guess->integer_chars;
$guess->integer_chars(7);

Returns or sets the number of digits in the integer part of numeric values.

signed

my $sign = $guess->signed;

Returns "-" if only negative values were detected, "+-" if both positive and negative values were detected, or undef if the data is unsigned. Always returns undef for Str types.
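
The rule above can be sketched in plain Perl. The helper sign_marker below is hypothetical and is not part of Type::Guess; treating zero as non-negative is an assumption made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Type::Guess: derive a sign marker
# from a list of numbers, following the rules described above.
sub sign_marker {
    my @values = @_;
    my $neg = grep { $_ < 0 }  @values;   # count of negative values
    my $pos = grep { $_ >= 0 } @values;   # zero counted as non-negative (assumption)
    return undef unless $neg;             # unsigned data
    return $pos ? "+-" : "-";
}

print sign_marker(-1, -2, -3), "\n";              # -
print sign_marker(-1, 2, 3), "\n";                # +-
print sign_marker(1, 2, 3) // "undef", "\n";      # undef
```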

to_string

my $format = $guess->to_string;

Returns a sprintf-style format string for the detected type:

Str  ->  %-4s
Int  ->  %5i
Num  ->  %11.5f

If percentages is true, the Num format gets a trailing %%:

Num (%) ->  %7.1f%%

If a custom format has been set it is returned as-is.
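
To see what these formats produce, the following sketch feeds each one to Perl's built-in sprintf. The format strings are the ones listed above; the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Format strings from the list above, paired with made-up sample values.
my @samples = (
    [ '%-4s',    'cd'    ],   # left-justified, padded to 4 characters
    [ '%5i',     42      ],   # right-justified integer, width 5
    [ '%11.5f',  3.14159 ],   # width 11, 5 decimal places
    [ '%7.1f%%', 12.34   ],   # width 7, 1 decimal place, literal percent sign
);

my @out = map { sprintf $_->[0], $_->[1] } @samples;
print "[$_]\n" for @out;
# [cd  ]
# [   42]
# [    3.14159]
# [   12.3%]
```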

to_sql

my $col = $guess->to_sql;

Returns a SQL column type string for the detected type. Requires a SQL dialect role to be composed; see "SQL ROLES". Croaks if called without a dialect role.

format

$guess->format("%010.2f");

Overrides the auto-generated format string. Once set, to_string returns this value directly.

as_hash

my $href = $guess->as_hash;

Returns all attributes as a plain hashref. Can be passed back to new to reconstruct an equivalent object:

my $clone = Type::Guess->new($guess->as_hash);

with_roles

my $class = Type::Guess->with_roles("+Tiny");
my $class = Type::Guess->with_roles("+DateTime", "+SQL::Pg");

Applies one or more roles to the class and returns the composed class. Role names prefixed with + are expanded to Type::Guess::Role::*. Requires Role::Tiny 2.000001 or later.
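
The naming rule for the + prefix can be illustrated with a one-line substitution. The helper expand_role_name is hypothetical; it only shows the name expansion and does not load or compose any role.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the naming rule only.
sub expand_role_name {
    my ($name) = @_;
    (my $full = $name) =~ s/^\+/Type::Guess::Role::/;
    return $full;
}

print expand_role_name("+SQL::Pg"), "\n";          # Type::Guess::Role::SQL::Pg
print expand_role_name("My::Custom::Role"), "\n";  # My::Custom::Role
```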

OVERLOADING

Type::Guess overloads two operators:

Stringification ("")

print "$guess";   # prints the format string, e.g. "%-4s"

Code dereference (&{})

my $formatted = $guess->($value);   # equivalent to sprintf $guess->to_string, $value

This makes the object usable directly as a formatting function:

my @formatted = map { $guess->($_) } @list;
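
For readers unfamiliar with Perl's overload pragma, here is a minimal sketch of how a class can support both operators. Demo::Formatter is a hypothetical stand-in written for this example; it is not the Type::Guess source and may differ from how the module implements overloading.

```perl
use strict;
use warnings;

# Hypothetical stand-in class: holds a format string and overloads
# stringification and code dereference.
package Demo::Formatter {
    use overload
        '""'  => sub { $_[0]{format} },
        '&{}' => sub { my $self = shift; sub { sprintf $self->{format}, @_ } },
        fallback => 1;

    sub new { my ($class, $format) = @_; return bless { format => $format }, $class }
}

my $fmt = Demo::Formatter->new('%-4s');
my $padded = $fmt->("cd");     # calls the closure returned by the &{} handler
print "[$fmt]\n";              # [%-4s]
print "[$padded]\n";           # [cd  ]
```

The object is a blessed hash, so the handlers can read $self->{format} without triggering the code dereference overload themselves.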

READ-ONLY ATTRIBUTES

The following attributes record the values detected at construction time and do not change when length, precision, or integer_chars are subsequently adjusted. Attempting to set them issues a warning and has no effect.

  • length_ro

  • precision_ro

  • integer_chars_ro

ROLES

+DateTime

my $guess = Type::Guess->with_roles("+DateTime")->new(@dates);

Detects datetime values using DateTime::Format::Flexible by default. Sets type to "DateTime" when the data matches. Requires DateTime::Format::Flexible to be installed.

The parser can be changed via parser_class:

my $class = Type::Guess->with_roles("+DateTime");
$class->parser_class("Strptime", '%Y-%m-%d', "parse_datetime");
my $guess = $class->new(@dates);

parser_class accepts up to three arguments: the format class name (expanded to DateTime::Format::* if not fully qualified), an optional options list, and an optional method name (defaults to parse_datetime).

+DateTime::Naive

my $guess = Type::Guess->with_roles("+DateTime::Naive")->new(@dates);

A self-contained datetime detector with no non-core dependencies. Uses a set of built-in patterns to recognise common date and datetime formats:

YYYY-MM-DD
DD-Mon-YYYY       (e.g. 26-Jan-2024)
YYYY-MM-DD HH:MM:SS
DD.MM.YYYY        (European dot-separated)
DD/MM/YYYY        (European slash-separated)

Also adds a datetime_format attribute containing the detected strftime format string:

print $guess->datetime_format;   # %Y-%m-%d

Use +DateTime::Naive when you want no external dependencies and your date formats are well-controlled. Use +DateTime when you need to handle a wider variety of real-world date strings.
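
One way such pattern-based detection could work is a table of regular expressions, each paired with a strftime format. The sketch below is an assumption about the general approach, not the role's actual code; the table, the detect_format helper, and the rule that every value must match are all hypothetical. The regexes check shape only, not calendar validity.

```perl
use strict;
use warnings;

# Hypothetical pattern table: one entry per layout listed above.
my @patterns = (
    [ qr/^\d{4}-\d{2}-\d{2}$/,                   '%Y-%m-%d'          ],
    [ qr/^\d{2}-[A-Za-z]{3}-\d{4}$/,             '%d-%b-%Y'          ],
    [ qr/^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$/, '%Y-%m-%d %H:%M:%S' ],
    [ qr/^\d{2}\.\d{2}\.\d{4}$/,                 '%d.%m.%Y'          ],
    [ qr{^\d{2}/\d{2}/\d{4}$},                   '%d/%m/%Y'          ],
);

# Return the first format whose pattern matches every value, or undef.
sub detect_format {
    my @values = @_;
    return undef unless @values;
    for my $p (@patterns) {
        my ($re, $fmt) = @$p;
        return $fmt unless grep { $_ !~ $re } @values;
    }
    return undef;
}

print detect_format("2024-01-26", "2023-12-31"), "\n";   # %Y-%m-%d
print detect_format("26-Jan-2024"), "\n";                # %d-%b-%Y
```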

+Unicode

my $guess = Type::Guess->with_roles("+Unicode")->new(@strings);

Handles multibyte and wide characters correctly. Uses visual display width (via Text::VisualWidth::PP) rather than byte or character count for length, and Text::VisualPrintf for formatting.
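
The distinction matters because the same string has different sizes depending on what is counted. This core-Perl sketch shows only the byte and character counts; display width is a third measure, and a terminal typically renders each of these CJK characters in two columns. The sample string is chosen for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Three CJK characters written as escapes so the source file stays ASCII.
my $text  = "\x{65E5}\x{672C}\x{8A9E}";
my $chars = length $text;                    # character count
my $bytes = length encode("UTF-8", $text);   # byte count once encoded as UTF-8

print "characters: $chars, bytes: $bytes\n"; # characters: 3, bytes: 9
```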

+Tiny

my $guess = Type::Guess->with_roles("+Tiny")->new(@list);

Integrates with Type::Tiny. When this role is active, type returns a Type::Tiny object rather than a plain string, so you get $guess->type->name instead of $guess->type. A custom type list can be passed as a trailing hashref to new:

use Type::Tiny;
use Types::Standard qw(Int Num Str);   # provides the Int, Num and Str type objects used below
my $Date = Type::Tiny->new(
    name       => "Date",
    constraint => sub { /^\d{4}-\d{2}-\d{2}$/ },
);
my $guess = Type::Guess->with_roles("+Tiny")->new(@list, { types => [$Date, Int, Num, Str] });

Types are tried in order; the first type that satisfies the tolerance threshold wins.
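
The "first match wins" rule can be sketched without Type::Tiny, using plain code references as checks. Everything here is hypothetical: the candidate list, the regexes, and the first_matching_type helper are assumptions made for illustration, not the role's implementation.

```perl
use strict;
use warnings;

# Hypothetical candidate list: [name, check] pairs, most specific first.
my @candidates = (
    [ Date => sub { $_[0] =~ /^\d{4}-\d{2}-\d{2}$/ } ],
    [ Int  => sub { $_[0] =~ /^[+-]?\d+$/ } ],
    [ Num  => sub { $_[0] =~ /^[+-]?(?:\d+\.?\d*|\.\d+)$/ } ],
    [ Str  => sub { 1 } ],
);

# Return the name of the first candidate whose failure count stays
# within $tolerance (a fraction of the number of values).
sub first_matching_type {
    my ($tolerance, @values) = @_;
    for my $c (@candidates) {
        my ($name, $check) = @$c;
        my $failures = grep { !$check->($_) } @values;
        return $name if $failures <= $tolerance * @values;
    }
    return undef;
}

print first_matching_type(0, "2024-01-26", "2024-02-01"), "\n";  # Date
print first_matching_type(0, 1, 2.5), "\n";                      # Num
print first_matching_type(0.25, 1, 2, 3, "a"), "\n";             # Int
```

Order matters: with Str listed first, every input would match it and the more specific types would never be reached.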

SQL ROLES

SQL roles add a to_sql method that returns a dialect-appropriate column type string based on the detected type and its attributes. Compose one SQL role per object. Combining +DateTime or +DateTime::Naive with a SQL role produces the correct datetime column type for that dialect.

+SQL::SQLite

my $guess = Type::Guess->with_roles("+SQL::SQLite")->new(@list);
print $guess->to_sql;

Type mapping:

Int              -> INTEGER
Num              -> FLOAT
Str (< 1024)     -> VARCHAR(n)
Str (>= 1024)    -> TEXT
DateTime         -> DATETIME

SQLite does not enforce column width or type constraints at the storage level, but the declared types are useful for ORM compatibility.
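
The mapping table can be expressed as a small function. This is a hypothetical sketch that mirrors the table above; sqlite_column_type is not part of the role, and it assumes the detected length is used as n.

```perl
use strict;
use warnings;

# Hypothetical function mirroring the SQLite mapping table.
sub sqlite_column_type {
    my ($type, $length) = @_;
    return "INTEGER"  if $type eq "Int";
    return "FLOAT"    if $type eq "Num";
    return "DATETIME" if $type eq "DateTime";
    return $length < 1024 ? "VARCHAR($length)" : "TEXT";   # Str
}

print sqlite_column_type("Str", 4), "\n";      # VARCHAR(4)
print sqlite_column_type("Str", 2000), "\n";   # TEXT
print sqlite_column_type("Int", 5), "\n";      # INTEGER
```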

+SQL::Pg

my $guess = Type::Guess->with_roles("+SQL::Pg")->new(@list);
print $guess->to_sql;

Type mapping:

Int (<=9 digits)  -> INTEGER
Int (>9 digits)   -> BIGINT
Num               -> DECIMAL(n,p)
Str (< 1024)      -> VARCHAR(n)
Str (>= 1024)     -> TEXT
DateTime          -> TIMESTAMP

Postgres distinguishes TIMESTAMP (no timezone) and TIMESTAMP WITH TIME ZONE. Type::Guess emits TIMESTAMP by default. If your application is timezone-aware, override to_sql in a subclass or local role to return TIMESTAMP WITH TIME ZONE for DateTime types.
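
A subclass override might look like the following sketch. Demo::Guess is a hypothetical stand-in for a class composed with +DateTime and +SQL::Pg, so the example runs without Type::Guess installed; in real code the parent would be the class returned by with_roles.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the composed class.
package Demo::Guess {
    sub new    { my ($class, %args) = @_; return bless {%args}, $class }
    sub type   { $_[0]{type} }
    sub to_sql { $_[0]->type eq "DateTime" ? "TIMESTAMP" : "TEXT" }
}

# Subclass that changes only the DateTime case and defers everything else.
package Demo::Guess::TZ {
    our @ISA = ('Demo::Guess');
    sub to_sql {
        my $self = shift;
        return "TIMESTAMP WITH TIME ZONE" if $self->type eq "DateTime";
        return $self->SUPER::to_sql(@_);
    }
}

my $dt_col  = Demo::Guess::TZ->new(type => "DateTime")->to_sql;
my $str_col = Demo::Guess::TZ->new(type => "Str")->to_sql;
print "$dt_col\n";    # TIMESTAMP WITH TIME ZONE
print "$str_col\n";   # TEXT
```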

Postgres has no UNSIGNED modifier; the signed attribute is detected but not reflected in the SQL output.

CONFIGURATION

These class-level settings apply to all subsequent calls.

tolerance

Type::Guess->tolerance(0.1);

The fraction of values (0-1) permitted to fail the type check while still matching that type. Defaults to 0.

Type::Guess->tolerance(0.25);
my $t = Type::Guess->new(1, 2, 3, "a");  # Int  (1 in 4 fails = 25%)

Type::Guess->tolerance(0);
$t = Type::Guess->new(1, 2, 3, "a");     # Str  (any failure disqualifies)
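
The arithmetic behind these two results can be sketched as a failure rate compared against the tolerance. The helper int_failure_rate and its integer regex are hypothetical; the comparison assumes a type is accepted when the failure rate is less than or equal to the tolerance, which is what the 25% example above implies.

```perl
use strict;
use warnings;

# Hypothetical helper: fraction of values that do not look like integers.
sub int_failure_rate {
    my @values = @_;
    return 0 unless @values;
    my $failures = grep { !/^[+-]?\d+$/ } @values;
    return $failures / @values;
}

my $rate = int_failure_rate(1, 2, 3, "a");
print "failure rate: $rate\n";                              # failure rate: 0.25
print "tolerance 0.25: ", ($rate <= 0.25 ? "Int" : "Str"), "\n";  # Int
print "tolerance 0:    ", ($rate <= 0    ? "Int" : "Str"), "\n";  # Str
```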

skip_empty

Type::Guess->skip_empty(1);   # default

When true (the default), empty strings are excluded before analysis. When false, empty strings are included and count against numeric type detection.

Type::Guess->skip_empty(1);
$t = Type::Guess->new(1, 2, "", 3, 4);  # Int  (empty string ignored)

Type::Guess->skip_empty(0);
$t = Type::Guess->new(1, 2, "", 3, 4);  # Str  (empty string included)
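
The effect of the setting can be reproduced with a plain grep filter. This is a sketch of the idea only; the integer regex and variable names are assumptions, not the module's code.

```perl
use strict;
use warnings;

my @column = (1, 2, "", 3, 4);
my $int    = qr/^[+-]?\d+$/;

# skip_empty on: drop empty strings before checking.
my @kept        = grep { length } @column;
my $int_skipped = !grep { $_ !~ $int } @kept;

# skip_empty off: the empty string is checked too and fails the integer test.
my $int_kept = !grep { $_ !~ $int } @column;

print "skip_empty on:  ", ($int_skipped ? "Int" : "Str"), "\n";   # Int
print "skip_empty off: ", ($int_kept    ? "Int" : "Str"), "\n";   # Str
```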

encoding

Type::Guess->encoding("UTF-8");

Sets the character encoding used when decoding input. Defaults to "". Primarily used by the +Unicode role.

AUTHOR

Simone Cesano <scesano@cpan.org>

LICENSE

This module is licensed under the same terms as Perl itself.
