NAME

Text::CSV::Auto - Comprehensive and automatic loading, processing, and analysis of CSV files.

SYNOPSIS

Given a CSV file like this:

name,age,gender
Jill,44,f
Bob,32,m
Joe,51,m
June,23,f

You could do this:

use Text::CSV::Auto qw( process_csv );

process_csv(
    'path/to/file.csv',
    sub{
        my ($row) = @_;

        print "$row->{name} is $row->{age} years old.\n";
    },
);

You can also slurp all the rows into one array of hashes:

use Text::CSV::Auto qw( slurp_csv );

my $rows = slurp_csv( 'path/to/file.csv' );
foreach my $row (@$rows) {
    print "$row->{name} is $row->{age} years old.\n";
}

You can also get an analysis of the file's contents:

use Text::CSV::Auto qw( analyze_csv );

my $headers = analyze_csv( 'path/to/file.csv' );

This returns something like the following:

[
    {
        header        => 'name',
        string        => 1,
        string_length => 4,
    },
    {
        header          => 'age',
        integer         => 1,
        min             => 23,
        max             => 51,
        integer_length  => 2,
    },
    {
        header        => 'gender',
        string        => 1,
        string_length => 1,
    },
]

DESCRIPTION

This module provides utilities to quickly process and analyze CSV files with as little hassle as possible.

The reliable and robust Text::CSV_XS module does the actual CSV parsing; this module wraps it in a simpler and smarter interface. In most situations all you need to specify is the filename: this module automatically detects which separator is used and sets sensible default options for processing the file.

The name CSV is somewhat misleading: any delimited (as opposed to fixed-width) file should work, including tab-separated (TSV) files and pipe-delimited ("|") files, to name a few.
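This documentation does not describe how the separator is detected. Purely as an illustration, one simple heuristic is to count each candidate separator in the first line and pick the most frequent. The helper name guess_sep_char and the candidate list are assumptions made for this sketch; it is not Text::CSV::Auto's actual algorithm.

```perl
use strict;
use warnings;

# Hypothetical heuristic, not the module's own detection logic:
# count each candidate separator in the first line and return the most
# frequent one. Ties go to the earlier candidate; ',' is the fallback.
sub guess_sep_char {
    my ($first_line) = @_;
    my @candidates = (',', "\t", '|', ';');
    my ($best, $best_count) = (',', 0);
    for my $sep (@candidates) {
        # \Q...\E makes characters such as "|" match literally.
        my $count = () = $first_line =~ /\Q$sep\E/g;
        if ($count > $best_count) {
            ($best, $best_count) = ($sep, $count);
        }
    }
    return $best;
}

print guess_sep_char("name|age|gender"), "\n";
```

A counting heuristic like this ignores quoting, so a header line containing quoted separators (for example "Last, First") can skew the result.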

SUBROUTINES

process_csv

process_csv(
    $filename,
    $options, # optional
    $code_ref,
);

Each row found in the CSV is converted into a hash, and the code reference you pass is called with that row's hashref as its first argument.
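Converting a row into a hash amounts to pairing the header names with that row's values. A minimal sketch using a Perl hash slice follows; row_to_hash is a hypothetical helper for illustration, not part of this module's API.

```perl
use strict;
use warnings;

# Hypothetical helper: pair header names with one row's values.
sub row_to_hash {
    my ($headers, $values) = @_;
    my %row;
    @row{@$headers} = @$values;   # hash slice assigns the values pairwise
    return \%row;
}

my $row = row_to_hash(['name', 'age', 'gender'], ['Jill', 44, 'f']);

# The callback passed to process_csv() receives a hashref shaped like this.
my $code_ref = sub {
    my ($r) = @_;
    print "$r->{name} is $r->{age} years old.\n";
};
$code_ref->($row);
```

With a hash slice, a row that has fewer values than there are headers gets undef for the missing columns.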

Options may be specified as a hashref. Any option that Text::CSV_XS supports, such as sep_char and binary, can be set. Some sane options are set by default, but they can be overridden:

binary    => 1 # Allow binary characters, including embedded newlines, in quoted fields.
auto_diag => 2 # die() if there are any errors.
sep_char  => ... # Automatically detected.

Read the Text::CSV_XS docs to see the many options that it supports.

This module also supports some options of its own that affect how it works. Sane defaults are set for these as well:

format_headers => 1

Each of these additional options is described below.

headers

By default, headers are taken from the first row of the CSV. Some CSV files have no header row; for these, specify an arrayref of the header names you would like to use:

headers => ['foo', 'bar']

format_headers

When the first row is read from the CSV to determine the headers, this option formats them consistently and makes duplicate names unique. For example, if these were the headers:

Parent Name,Parent Age,Child Name,Child Age,Child Name,Child Age

The headers would be transformed to:

parent_name,parent_age,child_name,child_age,child_name_2,child_age_2

This option is enabled by default. You can turn it off if you want:

format_headers => 0
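The transformation in the example can be approximated by lowercasing each header, collapsing runs of non-alphanumeric characters into underscores, and numbering repeated names. The following is a hypothetical re-creation that reproduces the documented output for the example headers; the module's exact rules may differ.

```perl
use strict;
use warnings;

# Hypothetical re-creation of the documented format_headers behaviour.
sub format_headers {
    my @raw = @_;
    my (%seen, @formatted);
    for my $header (@raw) {
        my $name = lc $header;
        $name =~ s/[^a-z0-9]+/_/g;   # runs of spaces/punctuation become "_"
        $name =~ s/^_+|_+$//g;       # trim leading and trailing underscores
        # The first occurrence keeps its name; later ones get _2, _3, ...
        my $count = ++$seen{$name};
        push @formatted, $count > 1 ? "${name}_$count" : $name;
    }
    return @formatted;
}

print join(',', format_headers(
    'Parent Name', 'Parent Age', 'Child Name',
    'Child Age',   'Child Name', 'Child Age',
)), "\n";
```

This simple numbering scheme could still collide if the file already has a header that formats to, say, child_name_2.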

skip_rows

An arrayref of row numbers to skip can be specified. This is useful for CSV files that contain ancillary rows you don't want processed. For example, to ignore the 2nd row and the 5th through 10th rows:

skip_rows => [2, 5..10]
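A natural way to apply a skip_rows list is to turn the arrayref into a lookup hash, so each row number is checked in constant time. This sketch assumes 1-based row numbers counted over every line read; the documentation above does not say whether the header row is included in that count.

```perl
use strict;
use warnings;

# Hypothetical sketch: build a lookup hash from the skip_rows arrayref.
my $skip_rows = [2, 5..10];          # 5..10 expands to 5, 6, 7, 8, 9, 10
my %skip = map { $_ => 1 } @$skip_rows;

my @kept;
for my $row_number (1 .. 12) {
    next if $skip{$row_number};      # skip rows listed in skip_rows
    push @kept, $row_number;
}
print "@kept\n";
```

Run as written, this keeps rows 1, 3, 4, 11, and 12.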

max_rows

By default, all rows are processed. When you only need a sample, this option limits the number of rows processed. It is most useful with analyze_csv() on a very large file where not every row needs to be analyzed.

max_rows => 50
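The benefit of a row limit on large files is that reading can stop early instead of scanning the whole file. A minimal sketch, assuming max_rows counts data rows (not the header line); an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# An in-memory string stands in for a large CSV file.
my $data = "name,age\nJill,44\nBob,32\nJoe,51\nJune,23\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $max_rows    = 2;
my $header_line = <$fh>;   # first line holds the headers
my @sample;

# Stop as soon as max_rows data rows are collected; later lines are never read.
while (@sample < $max_rows) {
    my $line = <$fh>;
    last unless defined $line;   # end of file reached before the limit
    chomp $line;
    push @sample, $line;
}
close $fh;
print join("\n", @sample), "\n";
```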

slurp_csv

my $rows = slurp_csv(
    $filename,
    $options, # optional
);

Specify a filename and all the rows are returned as an arrayref of hashrefs, one per row.

Supports the exact same options as process_csv().
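Since slurp_csv() takes the same options as process_csv(), it can be thought of as process_csv() with a callback that collects every row. The sketch below illustrates that layering with two hypothetical functions: each_row is a simplified stand-in that splits on commas without any quote handling, and neither function is this module's code.

```perl
use strict;
use warnings;

# Hypothetical callback-driven reader (no quoting support).
sub each_row {
    my ($lines, $code_ref) = @_;
    my @headers = split /,/, $lines->[0];
    for my $line (@$lines[1 .. $#$lines]) {
        my %row;
        @row{@headers} = split /,/, $line;
        $code_ref->(\%row);
    }
    return;
}

# Slurp-style wrapper: collect each row hashref the callback receives.
sub slurp_rows {
    my ($lines) = @_;
    my @rows;
    each_row($lines, sub { push @rows, $_[0] });
    return \@rows;
}

my $rows = slurp_rows(['name,age', 'Jill,44', 'Bob,32']);
print scalar(@$rows), " rows\n";
```

The trade-off is memory: the slurped arrayref holds every row at once, so for very large files a callback that handles one row at a time is usually the better choice.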

analyze_csv

my $info = analyze_csv(
    $filename,
    $options, # optional
    $sub, # optional
);

Returns an arrayref of hashes, one per header (column) in the CSV file. Each hash contains metadata describing the values found in that column's rows.

Supports the exact same options as process_csv().

The metadata can contain any of the following keys:

string => 1: A value did not look like a number.
string_length => ...: The length of the longest string value.
integer => 1: A value looked like an integer (a number with no decimal part).
integer_length => ...: The number of integer digits in the largest value.
decimal => 1: A value looked like a decimal.
fractional_length => ...: The number of decimal digits in the value with the most decimal places.
max => ...: The maximum number value found.
min => ...: The minimum number value found.
mdy_date => 1: A value had the format of "MM/DD/YYYY".
ymd_date => 1: A value had the format of "YYYY-MM-DD".
unsigned => 1: A negative number was found.
undef => 1: An empty value was found.

A single header may contain values of several types, for example an integer in one row and a string in another. In that case both the integer => 1 and string => 1 flags are set.
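To make the metadata keys concrete, here is a hypothetical analyzer for one column's values, accumulating flags across rows so that mixed types set several flags. The regular expressions are illustrative guesses, the sign flag is omitted, and treating dates as neither numbers nor strings is a choice made for this sketch; the module's own rules may differ.

```perl
use strict;
use warnings;
use List::Util qw( max min );

# Hypothetical per-column analyzer modelled on the documented keys.
sub analyze_values {
    my @values = @_;
    my %info;
    for my $value (@values) {
        if (!defined $value || $value eq '') {
            $info{undef} = 1;                          # empty value
        }
        elsif ($value =~ m{^\d\d/\d\d/\d{4}$}) {
            $info{mdy_date} = 1;                       # MM/DD/YYYY
        }
        elsif ($value =~ /^\d{4}-\d\d-\d\d$/) {
            $info{ymd_date} = 1;                       # YYYY-MM-DD
        }
        elsif ($value =~ /^-?(\d+)(?:\.(\d+))?$/) {
            my ($int_part, $frac_part) = ($1, $2);
            if (defined $frac_part) {
                $info{decimal} = 1;
                $info{fractional_length} =
                    max($info{fractional_length} // 0, length $frac_part);
            }
            else {
                $info{integer} = 1;
            }
            # Longest integer part seen so far.
            $info{integer_length} = max($info{integer_length} // 0, length $int_part);
            $info{max} = defined $info{max} ? max($info{max}, $value) : $value;
            $info{min} = defined $info{min} ? min($info{min}, $value) : $value;
        }
        else {
            $info{string} = 1;                         # anything non-numeric
            $info{string_length} = max($info{string_length} // 0, length $value);
        }
    }
    return \%info;
}

my $age_info = analyze_values(44, 32, 51, 23);
print "$_ => $age_info->{$_}\n" for sort keys %$age_info;
```

Run on the ages from the SYNOPSIS, this sets integer => 1, integer_length => 2, min => 23, and max => 51, matching the analysis shown there.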

AUTHOR

Aran Clary Deltac <bluefeet@gmail.com>

LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.