NAME

File::Raw::Separated - CSV/TSV plugin for File::Raw

VERSION

Version 0.02

SYNOPSIS

use File::Raw::Separated qw(import);

my $rows = file_parse_buf($scalar);                      # CSV
my $rows = file_parse_buf($scalar, { dialect => 'tsv' });

file_parse_buf_each($scalar, sub {
    my $row = $_[0];     # arrayref of fields (reused across calls;
                         # copy with [@$row] if you need to retain)
});

# Streaming for files larger than RAM
file_parse_stream("huge.csv", sub { my $row = $_[0]; ... });

my $rows = file_parse_buf("name,age\nalice,30\n", { header => 1 });
# $rows = [ { name => 'alice', age => '30' } ]

...

use File::Raw qw(import);
use File::Raw::Separated;            # registers csv + tsv plugins

my $rows = file_slurp("data.csv", plugin => 'csv');
my $rows = file_slurp("data.csv", plugin => 'csv', sep => ';', strict => 1);
my $rows = file_slurp("data.tsv", plugin => 'tsv');
my $text = file_slurp("readme.txt"); # no plugin => raw bytes

OPTIONS

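Every parse function and every plugin => 'csv'|'tsv' dispatch accepts the same set of option keys. The parse functions take them in an options hashref; plugin dispatch takes them as trailing key/value pairs.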

dialect

csv (default for the unified parse_* functions) or tsv. Selects the seeded defaults for sep and quote; any explicit keys you also set override the dialect's defaults. The plugin => ... form picks the dialect by plugin name; dialect in that case is ignored.
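As a minimal sketch of an explicit key overriding a dialect default, the snippet below starts from the tsv dialect and swaps the separator for a pipe. The input string is invented for illustration, and the behaviour noted in the comments follows from the option descriptions in this section rather than from a captured run.

```perl
use strict;
use warnings;
use File::Raw::Separated qw(import);

# tsv seeds sep => "\t" and disables quoting; the explicit sep overrides
# only the separator.
my $rows = file_parse_buf(qq{a|"b"|c\n}, { dialect => 'tsv', sep => '|' });

# Quoting stays disabled (the tsv default), so per the description of
# quote the double quotes around b are kept as literal data.
print join(' / ', @{ $rows->[0] }), "\n";
```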

sep

Single-byte field separator. Default , for CSV, \t for TSV.

quote

Single-byte quote character. Default " for CSV, disabled for TSV. Pass undef to disable quoting (every quote becomes literal data).

escape

Single-byte backslash-style escape character. When set, inside a quoted field the escape char consumes the next byte literally. Default undef (RFC 4180 doubled-quote escape only).
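A short sketch of the escape option, assuming a backslash as the escape character. The sample input is made up for illustration; the point is that the escaped quote is expected to stay inside the field instead of closing it.

```perl
use strict;
use warnings;
use File::Raw::Separated qw(import);

# Input bytes: "a\"b",c  (single-quoted q{} keeps the backslash as-is)
my $buf = q{"a\"b",c} . "\n";

# With escape => '\', the backslash makes the following quote literal
# rather than letting it terminate the quoted field.
my $rows = file_parse_buf($buf, { escape => '\\' });

printf "%d fields, first is: %s\n", scalar @{ $rows->[0] }, $rows->[0][0];
```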

strict

If true, croaks on malformed input (stray quote mid-field, unbalanced quotes, EOL mismatch under a pinned eol). The error message includes the byte offset (and the file path for parse_stream). Default false (lenient recovery).
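Because strict turns malformed input into an exception, callers usually wrap the call in eval. The sketch below feeds the same invented input to both modes; the exact error text is not shown because it is not specified here beyond including the byte offset.

```perl
use strict;
use warnings;
use File::Raw::Separated qw(import);

my $bad = qq{ab"cd,e\n};   # stray quote in the middle of an unquoted field

# Lenient (default): the parser recovers and still returns rows.
my $lenient = file_parse_buf($bad);

# Strict: the same input croaks, so trap the exception.
my $checked = eval { file_parse_buf($bad, { strict => 1 }) };
my $error   = $@;
print "strict parse failed: $error\n" unless defined $checked;
```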

eol

One of auto, lf, crlf, cr. Default auto: locks to the first detected terminator and stays in that mode for the rest of the parse. If eol is pinned and the input uses a different terminator, strict mode croaks.

trim

Strip leading/trailing ASCII space and tab from unquoted fields only. Quoted fields preserve all bytes. Default false.

empty_is_undef

An empty unquoted field becomes undef. A quoted empty field ("") stays the empty string. Default false (empty fields are always returned as "").
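The quoted/unquoted distinction in trim and empty_is_undef is easiest to see on a single row. This is a sketch with an invented input line; the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Raw::Separated qw(import);

# Four fields: padded unquoted, padded quoted, empty unquoted, empty quoted.
my $rows = file_parse_buf(qq{ a ," b ",,""\n},
                          { trim => 1, empty_is_undef => 1 });
my ($plain, $quoted, $empty, $quoted_empty) = @{ $rows->[0] };

# Per the descriptions above: only the unquoted field is trimmed, and
# only the unquoted empty field becomes undef.
printf "plain=[%s] quoted=[%s] empty=%s quoted_empty=[%s]\n",
    $plain, $quoted, defined $empty ? 'defined' : 'undef', $quoted_empty;
```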

binary

Skip UTF-8 BOM stripping and skip sv_utf8_decode on each field. Default false.

header

Controls whether rows are emitted as arrayrefs (default) or hashrefs. Two forms:

header => 1

The first emitted row is consumed as field names; subsequent rows are emitted as hashrefs keyed by those names. Use when the file has its own header line.

header => [qw(name age city)]

Caller supplies the names. The parser does not consume any row as a header - row 0 is treated as data and emitted as a hashref against the supplied names. Use when the file has no header line of its own. The arrayref must be non-empty, contain no undef entries, and have no duplicates (each is checked at call time and croaks otherwise).

In either form: a row with more fields than the header croaks; a shorter row pads missing keys with undef. Default false (arrayref rows).
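A sketch of the caller-supplied form, including the padding and overflow rules just described. The data and column names are invented for illustration.

```perl
use strict;
use warnings;
use File::Raw::Separated qw(import);

my @names = qw(name age);

# No header line in the data: row 0 is already a record.
my $rows = file_parse_buf("alice,30\nbob\n", { header => \@names });
for my $r (@$rows) {
    printf "%s => %s\n", $r->{name}, defined $r->{age} ? $r->{age} : '(undef)';
}

# A row with more fields than names croaks, so trap it if the input
# is not trusted.
my $ok = eval { file_parse_buf("carol,41,extra\n", { header => \@names }); 1 };
print "overlong row rejected\n" unless $ok;
```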

max_field_len

Cap on a single field's byte length. Exceeding the cap croaks with field exceeds max length. Default 16 MiB.

IMPORT

import is an XSUB (installed at BOOT). Each requested name is stamped into the caller's package as file_<name> via newXS - the same mechanism File::Raw uses, so the two modules compose without colliding (file_slurp from File::Raw, file_parse_buf from here, etc.).

The file_ prefix is added by the importer; you request names without it. Unknown names produce a warning, not a die.

import

Bareword shorthand for :all - matches the File::Raw idiom use File::Raw qw(import).

:all

All nine names. Equivalent to :unified :csv :tsv.

:unified

parse_buf, parse_buf_each, parse_stream - dialect read from the opts hash, defaults to csv.

:csv

csv_parse_buf, csv_parse_buf_each, csv_parse_stream - dialect pinned to csv; the dialect key in opts is ignored.

:tsv

tsv_parse_buf, tsv_parse_buf_each, tsv_parse_stream - dialect pinned to tsv; the dialect key in opts is ignored.

Individual names

Any of the nine bare names listed above can be requested directly; each lands as file_<name>.
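Tags and individual names can be mixed in one import list. A minimal sketch, assuming the caller only needs the CSV-pinned functions plus the unified streaming entry point:

```perl
use strict;
use warnings;

# Names are requested without the file_ prefix; the importer adds it.
use File::Raw::Separated qw(:csv parse_stream);

# From the :csv tag:
my $rows = file_csv_parse_buf("x,y\n1,2\n");
print scalar(@$rows), " rows\n";

# From the individual name:
print "file_parse_stream imported\n" if __PACKAGE__->can('file_parse_stream');
```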

ROW-AV ALIASING (callback variants)

file_parse_buf_each, file_parse_stream (and their dialect-pinned counterparts) hand the callback the SAME arrayref every row, with its contents replaced. Stash a copy if you need to retain the row across calls:

file_parse_buf_each($buf, sub {
    my $row = [@{$_[0]}];   # explicit copy
    push @keep, $row;
});

Header mode uses a fresh hashref per row, so the aliasing only affects the array-form callback path.

STREAMING

file_parse_stream opens the file at the C level (PerlLIO_open) and reads in 64 KiB chunks, feeding each to the parser's incremental API. Resident memory (RSS) is bounded by the read buffer plus max_field_len, regardless of total file size.

To abort mid-stream, die from the callback. The exception propagates; the file descriptor and parser state are cleaned up on every exit path.
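A sketch of an early exit, assuming the caller wants only the first few rows. The temporary file, the row limit, and the message text are all illustrative; File::Temp is used only to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Raw::Separated qw(import);

# Build a small throwaway CSV file with ten rows.
my ($fh, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
print {$fh} "$_,row$_\n" for 1 .. 10;
close $fh or die "close $path: $!";

my $seen = 0;
my $done = eval {
    file_parse_stream($path, sub {
        $seen++;
        die "enough rows\n" if $seen == 3;   # abort after the third row
    });
    1;
};
print "stopped after $seen rows: $@" unless $done;
```

Trapping the exception with eval at the call site keeps the abort local; without it, the die would continue up the caller's stack.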

PLUGIN INTEGRATION WITH FILE::RAW

Loading File::Raw::Separated registers two plugins with File::Raw via file_register_plugin (declared in include/file_plugin.h):

  • csv plugin - CSV defaults (sep ,, quote "). The READ phase fires from File::Raw::slurp($p, plugin => 'csv', ...) and returns an array of arrays (or an array of hashes under header => 1).

  • tsv plugin - TSV defaults (sep \t, no quoting).

The plugins register at module load and stay registered for the life of the process. Per-call options arrive through File::Raw's variadic XSUB plumbing; there is no global state to mutate. To opt out for a particular call, just don't pass plugin =>.
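A sketch of a per-call plugin dispatch that returns hashref rows. The temporary file and its contents are invented so the example stands alone.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Raw qw(import);
use File::Raw::Separated;    # registers the csv and tsv plugins

my ($fh, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
print {$fh} "name,age\nalice,30\n";
close $fh or die "close $path: $!";

# Options travel with the call as trailing key/value pairs.
my $people = file_slurp($path, plugin => 'csv', header => 1);
print "$_->{name} is $_->{age}\n" for @$people;
```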

The WRITE / RECORD / STREAM phases are not yet wired - they will land once the parser core grows a serialiser and File::Raw teaches each_line, grep_lines, etc. the plugin pipeline. In the meantime, call file_parse_stream directly for streaming.

SEE ALSO

File::Raw - the underlying fast file IO layer.

AUTHOR

LNATION <email@lnation.org>

LICENSE AND COPYRIGHT

This software is Copyright (c) 2026 by LNATION <email@lnation.org>.

This is free software, licensed under:

The Artistic License 2.0 (GPL Compatible)