NAME
Tie::CSV_File - ties a csv-file to an array of arrays
SYNOPSIS
use Tie::CSV_File;
tie my @data, 'Tie::CSV_File', 'xyz.dat';
print "Data in 3rd line, 5th column: ", $data[2][4];
untie @data;
# or to read a tab-, colon-, semicolon- or whitespace-separated file
tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', COLON_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', SEMICOLON_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED;
# or to read a user-defined format
tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char     => '|',
                                          sep_re       => qr/\s*\|\s*/,
                                          quote_char   => undef,
                                          eol          => undef, # default
                                          escape_char  => undef,
                                          always_quote => 0;     # default
$data[1][3] = 4;
$data[-1][-1] = "last column in last line";
$data[0] = [qw/Name Address Country Phone/];
push @data, ["Gates", "Redmond", "Washington", "0800-EVIL"];
push @data, ["Linus", "Helsinki", "Finland", "0800-LINUX"];
delete $data[3][2];
DESCRIPTION
Tie::CSV_File represents a regular CSV file as a Perl array of arrays. The first dimension represents the line number in the original file, the second dimension the column number. Both indices start at 0. Negative indices work as with normal arrays, e.g. $data[-1][-1] stands for the last field in the last line, and @{$data[1]} stands for the columns of the second line.
An empty field has the value '', while a nonexistent field has the value undef. E.g. about the file
"first field",,
"last field"

"the above line is empty"
we can say
$data[0][0] eq "first field"
$data[0][1] eq ""
!defined $data[0][2]
$data[1][0] eq "last field"
@{$data[2]} # is an empty list ()
!defined $data[2][0]
$data[3][0] eq "the above line is empty"
!defined $data[$x][$y] # for every $x > 3, $y any
Note that it is also possible to change the data:
$data[0][0] = "first line, first column";
$data[3][7] = "anywhere in the world";
$data[-1][-1] = "last line, last column";
$data[0] = ["Last name", "First name", "Address"];
push @data, ["Schleicher", "Janek", "Germany"];
my @header = @{ shift @data };
Please note that deleting an array element behaves slightly differently from normal Perl arrays. Deleting an element sets it to empty ("" or []), not to undef.
delete $data[5]; # similar to $data[5] = [];
delete $data[5][5]; # similar to $data[5][5] = "";
In fact, a file has no undefined values. A cell of the CSV file can only be empty (""). An undefined value signals that the line or the column doesn't exist. In particular, the lines ,,, and "","","","" are the same for Tie::CSV_File, and the second form may be changed to the first without a warning when you write to the tied array.
Only a small part of the whole file is held in memory, so this module also works for large files. Please see the Tie::File module for details, as I use it to read the lines of the file.
However, it is not suited to lines with very large fields, as all fields of a line are parsed even if you only want one of them.
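To illustrate the line-based access described above, here is a minimal sketch that uses Tie::File directly. The sample content and the plain split on commas are assumptions for illustration only; Tie::CSV_File itself parses lines with proper CSV rules.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small sample file (hypothetical content, for illustration only).
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} "a,b,c\n", "d,e,f\n", "g,h,i\n";
close $fh or die "Cannot close $filename: $!";

# Tie::File presents the file as an array of lines without loading
# the whole file into memory.
tie my @lines, 'Tie::File', $filename
    or die "Cannot tie $filename: $!";

my $line_count  = scalar @lines;   # number of lines in the file
my $second_line = $lines[1];       # the record separator is already removed

# A whole line has to be split to reach a single field, which is why
# lines with very large fields are costly.
my @fields = split /,/, $second_line;

print "lines: $line_count, second line: $second_line, field 3: $fields[2]\n";

untie @lines;
```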
CSV options for tying
Similar to Text::CSV_XS, you can add the following options:
- quote_char {default: "}
- eol {default: undef}
- sep_char {default: ,}
- escape_char {default: "}
- always_quote {default: 0}
Please read the documentation of Text::CSV_XS for details.
Note that the binary option isn't available.
In addition, to make it easier to work with files whose separator is not a single fixed character (e.g. sometimes one whitespace, sometimes more), I added the sep_re option (defaults to undef).
If it is specified, sep_char is ignored when reading; instead, something similar to a split at the separator is done to find the fields.
E.g., you can say
tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_re       => qr/\s+/,
                                          quote_char   => undef,
                                          eol          => undef, # default
                                          escape_char  => undef,
                                          always_quote => 0;     # default
to read something like
PID TTY TIME CMD
1200 pts/0 00:00:00 bash
1221 pts/0 00:00:01 nedit
1224 pts/0 00:00:01 nedit
1228 pts/0 00:00:06 nedit
1318 pts/0 00:00:01 nedit
1605 pts/0 00:00:00 ps
Note that the value of sep_re must be a regexp object, e.g. generated with qr/.../. A simple string produces an error.
Note also that sep_char is used to write data. As the name suggests, sep_char can only consist of one character.
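The division of labour between sep_re (reading) and sep_char (writing) can be sketched in plain Perl. The helper names parse_line and format_line are hypothetical and are not part of the module; this only mirrors the "something similar to split" behaviour described above, not the actual implementation.

```perl
use strict;
use warnings;

# Assumed option values, mirroring the whitespace example above.
my $sep_re   = qr/\s+/;   # used when reading
my $sep_char = ' ';       # used when writing; a single character

# Hypothetical helper: split one line into fields at the separator regexp.
sub parse_line {
    my ($line) = @_;
    return split $sep_re, $line;
}

# Hypothetical helper: join fields into a line with the separator character.
sub format_line {
    my (@fields) = @_;
    return join $sep_char, @fields;
}

my @fields = parse_line("1200 pts/0    00:00:00 bash");
my $line   = format_line(@fields);

print "fields: ", scalar(@fields), "\n";
print "rewritten: $line\n";
```

Reading tolerates any run of whitespace, while writing always emits exactly one separator character, so a rewritten line need not be byte-identical to the line that was read.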
Predefined file types
Without any options you define a standard CSV file. However, tab-separated, colon-separated, semicolon-separated and whitespace-separated files are also commonly used, so they are predefined. That's why it's possible to say:
tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', COLON_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED;
- TAB_SEPERATED
It's defined with:
    sep_char     => "\t",
    quote_char   => undef,
    eol          => undef, # default
    escape_char  => undef,
    always_quote => 0      # default
Note that the data isn't allowed to contain any tab.
- COLON_SEPERATED
It's defined with:
    sep_char     => ":",
    quote_char   => undef,
    eol          => undef, # default
    escape_char  => undef,
    always_quote => 0      # default
Note that the data isn't allowed to contain any colon.
- SEMICOLON_SEPERATED
It's defined with:
    sep_char     => ";",
    quote_char   => undef,
    eol          => undef, # default
    escape_char  => undef,
    always_quote => 0      # default
Note that the data isn't allowed to contain any semicolon.
Although that looks very similar to CSV files, SEMICOLON_SEPERATED doesn't quote data and can't work properly with quoted data. If you just want a normal CSV file with semicolons instead of commas, write
tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char => ";";
- WHITESPACE_SEPERATED
It's defined with:
    sep_re       => qr/\s+/,
    sep_char     => ' ',
    quote_char   => undef,
    eol          => undef, # default
    escape_char  => undef,
    always_quote => 0      # default
Note that reading splits at every whitespace sequence. In particular, it's not possible to define an empty field. Note also that when setting an element, all whitespace sequences are transformed to a single blank.
Of course, you can override some options. E.g., let's assume that you have a whitespace-separated file, but you want to write a tab instead of a blank when changing the data. That can be done with:
tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED, sep_char => "\t";
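One way to picture why a trailing option overrides a predefined one is ordinary Perl list-to-hash assignment. The sketch below assumes that a predefined file type behaves like a plain list of key/value pairs; @whitespace_defaults is a hypothetical stand-in built from the documented definition, not the module's own constant.

```perl
use strict;
use warnings;

# Assumption: a predefined file type acts like a flat list of options.
# This list copies the documented WHITESPACE_SEPERATED definition.
my @whitespace_defaults = (
    sep_re       => qr/\s+/,
    sep_char     => ' ',
    quote_char   => undef,
    eol          => undef,
    escape_char  => undef,
    always_quote => 0,
);

# When a list is assigned to a hash, a later pair replaces an earlier
# pair with the same key, so the trailing sep_char wins.
my %options = (@whitespace_defaults, sep_char => "\t");

printf "sep_char is a tab: %s\n", ($options{sep_char} eq "\t") ? "yes" : "no";
printf "sep_re still set: %s\n",  defined($options{sep_re})   ? "yes" : "no";
```

Under that assumption, reading still splits at whitespace sequences while writing uses the tab.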
Please suggest other useful file types that I could predefine.
EXPORT
By default these constants are exported:
TAB_SEPERATED
COLON_SEPERATED
SEMICOLON_SEPERATED
WHITESPACE_SEPERATED
BUGS
The indirect write methods like push @data, [1, 2], push @{$data[3]}, ["a", "b"] or slices aren't tested directly. I hope that the implementation of Tie::Array is good enough for them. They will be tested extensively in future versions.
This module is slow, even slower than necessary with object oriented features. I'll change it when implementing some more features.
This module expects that, while the file is tied, nothing other than this module changes it. The file isn't locked, however, so it's your job to take care of that.
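Since locking is left to the caller, one possible approach is an advisory lock on a separate lock file, taken before tying and released after untying. The helper with_exclusive_lock and the ".lock" suffix are assumptions made for this sketch; an advisory lock only protects against other programs that take the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive advisory
# lock on a sidecar lock file ("$file.lock" is a naming assumption).
sub with_exclusive_lock {
    my ($file, $code) = @_;
    open my $lock_fh, '>>', "$file.lock"
        or die "Cannot open $file.lock: $!";
    flock $lock_fh, LOCK_EX
        or die "Cannot lock $file.lock: $!";
    my @result = $code->();
    close $lock_fh;            # closing the handle releases the lock
    return @result;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/xyz.dat";

my ($status) = with_exclusive_lock($file, sub {
    # tie, modify and untie the file here
    return "done";
});

print "critical section finished: $status\n";
```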
Please inform me about every bug or missing feature of this module.
TODO
Possibility to give (memory) options when tying, like mode, memory, dw_size similar to Tie::File.
Discuss differences to AnyData module.
Discuss differences to DBD::CSV module.
Implement binary mode.
An option like filter => sub { s/\s+/ / } that would specify a routine called before a line is processed. Perhaps process would even be a sensible name for this option.
Warn if sep_char isn't matched with a specified sep_re or if sep_char consists of more than one character.
SEE ALSO
Tie::File, Text::CSV, Text::CSV_XS, AnyData, DBD::CSV
AUTHOR
Janek Schleicher, <bigj@kamelfreund.de>
COPYRIGHT AND LICENSE
Copyright 2002 by Janek Schleicher
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.