NAME
File::Raw::Separated - CSV/TSV plugin for File::Raw
VERSION
Version 0.01
SYNOPSIS
use File::Raw::Separated qw(import);
my $rows = file_parse_buf($scalar); # CSV
my $rows = file_parse_buf($scalar, { dialect => 'tsv' });
file_parse_buf_each($scalar, sub {
    my $row = $_[0];   # arrayref of fields (reused across calls;
                       # copy with [@$row] if you need to retain)
});
# Streaming for files larger than RAM
file_parse_stream("huge.csv", sub { my $row = $_[0]; ... });
my $rows = file_parse_buf("name,age\nalice,30\n", { header => 1 });
# $rows = [ { name => 'alice', age => '30' } ]
...
use File::Raw qw(import);
use File::Raw::Separated; # registers csv + tsv plugins
my $rows = file_slurp("data.csv", plugin => 'csv');
my $rows = file_slurp("data.csv", plugin => 'csv', sep => ';', strict => 1);
my $rows = file_slurp("data.tsv", plugin => 'tsv');
my $text = file_slurp("readme.txt"); # no plugin => raw bytes
OPTIONS
Every parse function and every plugin => 'csv'|'tsv' dispatch accepts the same set of trailing keys.
dialect
csv (default for the unified parse_* functions) or tsv. Selects the seeded defaults for sep and quote; any explicit keys you also set override the dialect's defaults. The plugin => ... form picks the dialect by plugin name; dialect in that case is ignored.
sep
Single-byte field separator. Default , for CSV, \t for TSV.
quote
Single-byte quote character. Default " for CSV, disabled for TSV. Pass undef to disable quoting (every quote becomes literal data).
escape
Single-byte backslash-style escape character. When set, inside a quoted field the escape char consumes the next byte literally. Default undef (RFC 4180 doubled-quote escape only).
strict
If true, croaks on malformed input (stray quote mid-field, unbalanced quotes, EOL mismatch under pinned eol). Error message includes byte offset (and file path for parse_stream). Default false (lenient recovery).
eol
One of auto, lf, crlf, cr. Default auto: locks to the first detected terminator and stays in that mode for the rest of the parse. Pinning a non-matching EOL under strict croaks.
trim
Strip leading/trailing ASCII space and tab from unquoted fields only. Quoted fields preserve all bytes. Default false.
empty_is_undef
Empty unquoted field becomes undef. Quoted empty ("") stays the empty string. Default false (always returns "").
binary
Skip UTF-8 BOM stripping and skip sv_utf8_decode on each field. Default false.
header
Controls whether rows are emitted as arrayrefs (default) or hashrefs. Two forms:
header => 1
The first emitted row is consumed as field names; subsequent rows are emitted as hashrefs keyed by those names. Use when the file has its own header line.
header => [qw(name age city)]
Caller supplies the names. The parser does not consume any row as a header - row 0 is treated as data and emitted as a hashref against the supplied names. Use when the file has no header line of its own. The arrayref must be non-empty, contain no undef entries, and have no duplicates (each is checked at call time and croaks otherwise).
In either form: a row with more fields than the header croaks; a shorter row pads missing keys with undef. Default false (arrayref rows).
max_field_len
Cap on a single field's byte length. Exceeding the cap croaks with field exceeds max length. Default 16 MiB.
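The header rules above (names taken from the first row or supplied by the caller, croak on a longer row, undef padding on a shorter one) can be sketched in pure Perl. This is an illustration only, not the module's XS code: the helper name rows_to_hashes and its error message are hypothetical, and it operates on rows that have already been split into fields.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper, not part of File::Raw::Separated: applies the
# documented header rules to rows that are already split into fields.
sub rows_to_hashes {
    my ($rows, $header) = @_;
    my @rows = @$rows;

    # header => 1:     the first row supplies the names and is not emitted.
    # header => [...]: the caller supplies the names; row 0 is data.
    my @names = ref $header eq 'ARRAY'
        ? @$header
        : @{ shift(@rows) // [] };

    my @out;
    for my $row (@rows) {
        # Assumed message text; the real module's wording may differ.
        croak "row has more fields than the header" if @$row > @names;
        my %h;
        @h{@names} = @$row;    # missing trailing fields become undef
        push @out, \%h;
    }
    return \@out;
}

my $with_own_header = rows_to_hashes(
    [ [qw(name age)], [qw(alice 30)], ['bob'] ], 1);
my $with_supplied = rows_to_hashes(
    [ [qw(alice 30)] ], [qw(name age)]);
```

In the first call the short row ['bob'] still yields an age key, with an undef value; in the second call no row is consumed as a header.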
IMPORT
import is an XSUB (installed at BOOT). Each requested name is stamped into the caller's package as file_<name> via newXS - the same mechanism File::Raw uses, so the two modules compose without colliding (file_slurp from File::Raw, file_parse_buf from here, etc.).
The file_ prefix is added by the importer; you request names without it. Unknown names produce a warning, not a die.
import
Bareword shorthand for :all - matches the File::Raw idiom use File::Raw qw(import).
:all
All nine names. Equivalent to :unified :csv :tsv.
:unified
parse_buf, parse_buf_each, parse_stream - dialect read from the opts hash, defaults to csv.
:csv
csv_parse_buf, csv_parse_buf_each, csv_parse_stream - dialect pinned to csv; the dialect key in opts is ignored.
:tsv
tsv_parse_buf, tsv_parse_buf_each, tsv_parse_stream.
Individual names
Any of the nine bare names listed above can be requested directly; each lands as file_<name>.
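For readers unfamiliar with prefixing importers, here is a rough pure-Perl analogue of the behaviour described above: tags expand to names, each name is installed in the caller as file_<name>, and unknown names warn rather than die. The real importer is an XSUB that uses newXS; the package My::Prefixed, its two stub functions, and the warning text are all hypothetical.

```perl
use strict;
use warnings;

package My::Prefixed;    # hypothetical stand-in, not the real module

# Two stub exports; the real module offers nine parse functions.
my %EXPORT = (
    parse_buf     => sub { "parse_buf(@_)" },
    csv_parse_buf => sub { "csv_parse_buf(@_)" },
);
my %TAGS = (
    ':unified' => ['parse_buf'],
    ':csv'     => ['csv_parse_buf'],
);
$TAGS{':all'}   = [ sort keys %EXPORT ];
$TAGS{'import'} = $TAGS{':all'};    # bareword shorthand for :all

sub import {
    my ($class, @wanted) = @_;
    my $caller = caller;

    # Expand tags into plain names; anything else is taken as a name.
    my @names;
    push @names, $TAGS{$_} ? @{ $TAGS{$_} } : $_ for @wanted;

    for my $name (@names) {
        my $code = $EXPORT{$name};
        unless ($code) {
            warn "unknown export '$name'\n";    # warn, do not die
            next;
        }
        no strict 'refs';
        *{"${caller}::file_$name"} = $code;     # install as file_<name>
    }
}

package main;

my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    My::Prefixed->import(qw(:csv parse_buf no_such_name));
}
my $result = main->can('file_csv_parse_buf')->('a,b');
```

After the call, main has file_csv_parse_buf and file_parse_buf, and @warnings holds one entry for no_such_name.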
ROW-AV ALIASING (callback variants)
file_parse_buf_each, file_parse_stream (and their dialect-pinned counterparts) hand the callback the SAME arrayref every row, with its contents replaced. Stash a copy if you need to retain the row across calls:
my @keep;
file_parse_buf_each($buf, sub {
    my $row = [@{$_[0]}];   # explicit copy
    push @keep, $row;
});
Header mode uses a fresh hashref per row, so the aliasing only affects the array-form callback path.
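The effect of the shared arrayref can be reproduced without the module. The sketch below uses a hypothetical emitter, each_row, that mimics the documented behaviour by passing one arrayref on every call and replacing its contents; it is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical emitter, not the module's code: like the callback variants,
# it passes the SAME arrayref on every call and only swaps the contents.
sub each_row {
    my ($cb, @rows) = @_;
    my @shared;
    for my $fields (@rows) {
        @shared = @$fields;
        $cb->(\@shared);
    }
    return;
}

my (@aliased, @copied);
each_row(
    sub {
        push @aliased, $_[0];             # keeps the shared reference
        push @copied,  [ @{ $_[0] } ];    # keeps a snapshot of this row
    },
    [qw(alice 30)], [qw(bob 41)],
);

print "aliased: $_->[0]\n" for @aliased;    # both lines show the last row
print "copied:  $_->[0]\n" for @copied;     # alice, then bob
```

Every element of @aliased is the same reference, so it reflects whichever row was emitted last; only @copied retains the individual rows.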
STREAMING
file_parse_stream opens the file at the C level (PerlLIO_open) and reads in 64 KiB chunks, feeding each to the parser's incremental API. RSS is bounded by the read buffer + max_field_len regardless of total file size.
To abort mid-stream, die from the callback. The exception propagates; the file descriptor and parser state are cleaned up on every exit path.
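The same shape (fixed-size reads, a small carry-over buffer, abort by die) can be sketched in pure Perl. This is an analogue under stated assumptions, not the module's C implementation: stream_lines is a hypothetical name, it splits on "\n" only, and it knows nothing about quoting or the other options.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical pure-Perl analogue, not the module's C code. Memory use is
# bounded by the chunk size plus the longest unterminated line.
sub stream_lines {
    my ($path, $cb, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $pending = '';
    while (1) {
        my $got = sysread($fh, my $chunk, $chunk_size);
        die "read $path: $!" unless defined $got;
        last if $got == 0;
        $pending .= $chunk;
        # Emit every complete line; keep the unterminated tail buffered.
        while ((my $nl = index($pending, "\n")) >= 0) {
            $cb->(substr($pending, 0, $nl + 1, ''));
        }
    }
    $cb->($pending) if length $pending;
    return;    # $fh is closed when it goes out of scope, also on die
}

# Build a small input file for the demonstration.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "row$_\n" for 1 .. 10;
close $tmp_fh or die "close: $!";

my @seen;
my $ok = eval {
    stream_lines($tmp_path, sub {
        push @seen, $_[0];
        die "enough\n" if @seen == 3;    # abort mid-stream
    }, 8);                               # tiny chunk to force several reads
    1;
};
my $error = $@;
```

Wrapping the call in eval is what lets the caller tell an intentional abort (here the string "enough\n") apart from a real failure; the same pattern applies to a callback passed to file_parse_stream.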
PLUGIN INTEGRATION WITH FILE::RAW
Loading File::Raw::Separated registers two plugins with File::Raw via file_register_plugin (declared in include/file_plugin.h):
csv plugin - CSV defaults (sep ,, quote "). READ phase fires from File::Raw::slurp($p, plugin => 'csv', ...), returning AoA (or AoH under header => 1).
tsv plugin - TSV defaults (sep \t, no quoting).
The plugins register at module load and stay registered for the life of the process. Per-call options arrive through File::Raw's variadic XSUB plumbing; there is no global state to mutate. To opt out for a particular call, just don't pass plugin =>.
The WRITE / RECORD / STREAM phases are not yet wired - they will land once the parser core grows a serialiser and File::Raw teaches each_line, grep_lines, etc. the plugin pipeline. In the meantime use parse_stream for streaming directly.
SEE ALSO
File::Raw - the underlying fast file IO layer.
AUTHOR
LNATION <email@lnation.org>
LICENSE AND COPYRIGHT
This software is Copyright (c) 2026 by LNATION <email@lnation.org>.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)