NAME

Tie::CSV_File - ties a CSV file to an array of arrays

SYNOPSIS

use Tie::CSV_File;

tie my @data, 'Tie::CSV_File', 'xyz.dat';
print "Data in 3rd line, 5th column: ", $data[2][4];
untie @data;

# or to read a tab-, whitespace-, or (semi)colon-separated file
tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPERATED;
# or use COLON_SEPERATED, SEMICOLON_SEPERATED, PIPE_SEPERATED,
#        or even WHITESPACE_SEPERATED instead

# or to read a format of your own
tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char     => '|',
                                          sep_re       => qr/\s*\|\s*/,
                                          quote_char   => undef,
                                          eol          => undef, # default
                                          escape_char  => undef,
                                          always_quote => 0;  # default
                                          
$data[1][3] = 4;
$data[-1][-1] = "last column in last line";

$data[0] = [qw/Name Address Country Phone/];
push @data, ["Gates", "Redmond",  "Washington", "0800-EVIL"];
push @data, ["Linus", "Helsinki", "Finland",    "0800-LINUX"];

delete $data[3][2];

DESCRIPTION

Tie::CSV_File represents a regular CSV file as a Perl array of arrays. The first dimension of the array represents the line number in the original file, the second dimension the column number. Both indices start at 0. Negative indices work as with normal arrays, e.g. $data[-1][-1] stands for the last field in the last line, and @{$data[1]} stands for the columns of the second line.

An empty field has the value '', while a nonexistent field has the value undef. For example, given the file

"first field",,
"last field"

"the above line is empty"

we can say

$data[0][0] eq "first field"
$data[0][1] eq ""
!defined $data[0][2] 

$data[1][0] eq "last field"

@{$data[2]}  # is an empty list ()
!defined $data[2][0]

$data[3][0] eq "the above line is empty"

!defined $data[$x][$y] # for every $x > 3, $y any 
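The same convention can be reproduced with ordinary Perl arrays. The sketch below uses a hypothetical helper, lines_to_rows, that splits unquoted comma-separated lines; it is not how Tie::CSV_File parses lines (the module handles quoting via Text::CSV_XS-style options), it only shows how "" and undef arise.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Tie::CSV_File: splits unquoted,
# comma-separated lines into an array of arrays.
sub lines_to_rows {
    my @rows;
    for my $line (@_) {
        # LIMIT -1 keeps trailing empty fields; an empty line yields ().
        push @rows, [ split /,/, $line, -1 ];
    }
    return @rows;
}

my @rows = lines_to_rows("a,,b", "", "c");

print defined $rows[0][1] ? "empty field\n" : "missing field\n";   # empty field
print defined $rows[0][3] ? "empty field\n" : "missing field\n";   # missing field
print scalar @{ $rows[1] }, " fields in the empty line\n";        # 0 fields in the empty line
```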

It is also possible to change the data:

$data[0][0]   = "first line, first column";
$data[3][7]   = "anywhere in the world";
$data[-1][-1] = "last line, last column";

$data[0] = ["Last name", "First name", "Address"];
push @data, ["Schleicher", "Janek", "Germany"];
my @header = @{ shift @data };

Note that deleting an array element behaves slightly differently from normal arrays: deleting an element sets it to empty ("" or []), not to undef.

delete $data[5];    # similar to $data[5] = [];
delete $data[5][5]; # similar to $data[5][5] = "";

In fact, a file has no undefined values: a cell of a CSV file can only be empty (""). Undefined values signal that the line or column doesn't exist. In particular, the lines ,,, and "","","","" are the same for Tie::CSV_File, and the second form may be silently rewritten as the first when you write to the tied array.
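The BUGS section below indicates that the module relies on Tie::Array. Assuming that, the special delete behaviour comes down to overriding the DELETE method of a tied array. The sketch below is a hypothetical single-row illustration (the class name EmptyOnDeleteRow is made up), not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical row class, not Tie::CSV_File's real implementation.
# It inherits everything from Tie::StdArray (shipped in Tie::Array)
# and only changes what delete does.
package EmptyOnDeleteRow;
use Tie::Array;
our @ISA = ('Tie::StdArray');

# delete $row[$i] stores "" instead of undef, so the field stays
# present but empty, as documented above for fields.
sub DELETE {
    my ($self, $index) = @_;
    my $old = $self->[$index];
    $self->[$index] = "" if $index < @$self;
    return $old;
}

package main;

tie my @row, 'EmptyOnDeleteRow';
@row = ("Gates", "Redmond", "Washington");
delete $row[1];
print "'$row[1]', ", scalar(@row), " fields\n";   # '', 3 fields
```

A plain (untied) array would leave undef in the middle slot instead; the override is what turns delete into "set to empty".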

Only a small part of the file is held in memory, so this module also works with large files. See the Tie::File module for details, as I use it to read the lines of the file.

However, it won't work well with large fields, as all fields of a line are parsed even if you only want to get one of them.
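Both points can be seen with Tie::File (a core module) used directly: records are read on demand, but getting one field still means splitting the whole record. In this minimal sketch the example file is made up, and split /,/ is a simplification that only works for unquoted data.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);
use Tie::File;

# Create a small example file (stands in for a large CSV file).
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "Name,Address,Country\n", "Linus,Helsinki,Finland\n";
close $fh or die "Cannot close $path: $!";

# Tie::File fetches records on demand instead of slurping the file.
tie my @lines, 'Tie::File', $path, mode => O_RDONLY
    or die "Cannot tie $path: $!";

# To get a single field, the whole record must still be split.
my @fields = split /,/, $lines[1], -1;
print "$fields[1]\n";   # Helsinki

untie @lines;
```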

CSV options for tying

Similar to Text::CSV_XS, you can add the following options:

quote_char {default: "}
eol {default: undef}
sep_char {default: ,}
escape_char {default: "}
always_quote {default: 0}

Please read the documentation of Text::CSV_XS for details.

Note that the binary option isn't available.
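To give a rough idea of what the write-side options mean, here is a hypothetical format_line function. It is neither Text::CSV_XS's nor Tie::CSV_File's code, and Text::CSV_XS's real rules differ in details (for example, which characters force quoting); it only shows how quote_char, escape_char, sep_char, always_quote and eol interact, with the defaults listed above.

```perl
use strict;
use warnings;

# Hypothetical writer illustrating the options; defaults mirror the list above.
sub format_line {
    my ($opt, @fields) = @_;
    my %o = (quote_char => '"', escape_char => '"', sep_char => ',',
             always_quote => 0, %$opt);
    my ($q, $e, $s) = @o{qw(quote_char escape_char sep_char)};
    my @out;
    for my $field (@fields) {
        my $value = defined $field ? $field : "";
        if (defined $q) {
            # Quote when forced, or when the value would otherwise be ambiguous.
            my $needs_quotes = $o{always_quote}
                || index($value, $s) >= 0
                || index($value, $q) >= 0
                || $value =~ /[\r\n]/;
            if ($needs_quotes) {
                # Escape each quote_char by prefixing escape_char.
                $value =~ s/\Q$q\E/$e$q/g if defined $e;
                $value = "$q$value$q";
            }
        }
        push @out, $value;
    }
    my $line = join $s, @out;
    $line .= $o{eol} if defined $o{eol};
    return $line;
}

print format_line({}, 'say "hi"', 'a,b', 'plain'), "\n";
# "say ""hi""","a,b",plain
print format_line({ quote_char => undef, sep_char => ':' }, 'x', 'y'), "\n";
# x:y
```

With quote_char => undef, as in the predefined types below, nothing is ever quoted, which is why those formats forbid the separator inside the data.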

To make it easier to work with files whose fields are separated inconsistently, e.g. sometimes by one whitespace character and sometimes by several, I added the sep_re option (default: undef).

If it is specified, sep_char is ignored when reading; instead, each line is split at every match of sep_re (similar to split) to find the fields.

E.g., you can say

tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_re       => qr/\s+/,
                                          quote_char   => undef,
                                          eol          => undef, # default
                                          escape_char  => undef,
                                          always_quote => 0;     # default
                                      

to read something like

   PID TTY          TIME CMD
1200 pts/0    00:00:00 bash
1221 pts/0    00:00:01 nedit
1224 pts/0    00:00:01 nedit
1228 pts/0    00:00:06 nedit
1318 pts/0    00:00:01 nedit
1605 pts/0    00:00:00 ps
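A minimal sketch of what reading with sep_re => qr/\s+/ amounts to, using plain split on two of the lines above. Because the header line is indented, a plain split produces an empty first field, so the sketch strips leading whitespace first. That trimming step is an assumption of this sketch; this documentation does not say how Tie::CSV_File treats leading whitespace.

```perl
use strict;
use warnings;

# Two lines of the ps output shown above.
my @lines  = ("   PID TTY          TIME CMD", "1200 pts/0    00:00:00 bash");
my $sep_re = qr/\s+/;

# With a regexp object, split keeps a leading empty field when the
# line starts with whitespace: ("", "PID", "TTY", "TIME", "CMD").
my @untrimmed = split $sep_re, $lines[0];

# This sketch therefore strips leading whitespace before splitting.
my @rows = map { my $line = $_; $line =~ s/^\s+//; [ split $sep_re, $line ] } @lines;

print join("|", @{ $rows[0] }), "\n";   # PID|TTY|TIME|CMD
print join("|", @{ $rows[1] }), "\n";   # 1200|pts/0|00:00:00|bash
```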

Note that the value of sep_re must be a regexp object, e.g. one generated with qr/.../. A plain string produces an error.

Note also that sep_char is still used to write data. As the name suggests, sep_char can consist of only one character.
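The two rules above can be expressed as a small validation routine. The sketch below is hypothetical (check_options is a made-up name); the TODO list notes that a check on sep_char's length is not implemented in the module yet.

```perl
use strict;
use warnings;

# Hypothetical check enforcing the two rules stated above.
sub check_options {
    my %opt = @_;
    if (defined $opt{sep_re} && ref($opt{sep_re}) ne 'Regexp') {
        die "sep_re must be a regexp object, e.g. qr/\\s+/\n";
    }
    if (defined $opt{sep_char} && length($opt{sep_char}) != 1) {
        die "sep_char must be exactly one character\n";
    }
    return 1;
}

check_options(sep_re => qr/\s+/, sep_char => ' ');   # passes
eval { check_options(sep_re => '\s+') };              # a plain string
print "rejected: $@" if $@;
```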

Predefined file types

Without any options, you get a standard CSV file. However, tab-separated, colon-separated, and whitespace-separated files are also common, so they are predefined. That's why it's possible to say:

tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', COLON_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED;

TAB_SEPERATED

It's defined with:

sep_char     => "\t",
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that the data must not contain any tab characters.

COLON_SEPERATED

It's defined with:

sep_char     => ":",
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that the data must not contain any colons.

SEMICOLON_SEPERATED

It's defined with:

sep_char     => ";",
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that the data must not contain any semicolons.

Although this looks very similar to a CSV file, SEMICOLON_SEPERATED doesn't quote data and can't handle quoted data properly. If you just want a normal CSV file with semicolons instead of commas, write

tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char => ";";
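The difference shows up as soon as a quoted field contains the separator. In the sketch below, plain split stands in for the unquoted SEMICOLON_SEPERATED reading, and the core module Text::ParseWords stands in for a quote-aware parser. Text::ParseWords is not Text::CSV_XS (it uses backslash escapes rather than doubled quotes) and is used here only for illustration; the sample line is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = '"Schleicher; Janek";Germany';

# Unquoted reading: a plain split, quote marks are ordinary data.
my @naive = split /;/, $line, -1;        # 3 fields, quote marks kept

# Quote-aware reading with a stand-in parser.
my @quoted = parse_line(';', 0, $line);  # 2 fields, quotes removed

printf "%d vs %d fields\n", scalar @naive, scalar @quoted;   # 3 vs 2 fields
```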

PIPE_SEPERATED

It's defined with:

sep_char     => "|",
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that the data must not contain any pipe characters.

WHITESPACE_SEPERATED

It's defined with:

sep_re       => qr/\s+/,
sep_char     => ' ',
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that lines are split at every whitespace sequence, so it's not possible to represent an empty field. Note also that when an element is set, all whitespace sequences are replaced by a single blank.

Of course, you can override some options. E.g., suppose you have a whitespace-separated file, but want to write a tab instead of a blank when changing the data. That can be done with:

tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED, sep_char => "\t";
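This kind of override works naturally if each predefined type is a list of key/value pairs, as the "It's defined with:" blocks above suggest; that representation is an assumption here. In a hash assignment, later pairs win, so options written after the constant replace its defaults. The constant below is a local stand-in, not the one exported by Tie::CSV_File.

```perl
use strict;
use warnings;

# Local stand-in for the predefined type, assumed to be a list of pairs.
use constant WHITESPACE_SEPERATED => (
    sep_re       => qr/\s+/,
    sep_char     => ' ',
    quote_char   => undef,
    eol          => undef,
    escape_char  => undef,
    always_quote => 0,
);

# Later pairs override earlier ones when the list is assigned to a hash.
my %opt = (WHITESPACE_SEPERATED, sep_char => "\t");

print "sep_char is now a tab\n" if $opt{sep_char} eq "\t";
```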

Please suggest other useful file types that I could predefine.

EXPORT

By default these constants are exported:

TAB_SEPERATED
COLON_SEPERATED
SEMICOLON_SEPERATED
PIPE_SEPERATED
WHITESPACE_SEPERATED

BUGS

Indirect write operations, such as push @data, [1, 2] or push @{$data[3]}, "a", "b", and assignments to slices aren't tested directly. I hope that the implementation of Tie::Array is good enough for them. They will be tested extensively in future versions.

This module is slow, even slower than necessary because of its object-oriented design. I'll improve this when implementing more features.

This module expects that the tied file isn't changed by anything other than this module while it is tied. However, the file isn't locked, so it's up to you to ensure this.

Please report any bugs or missing features of this module to me.

TODO

Possibility to pass (memory) options when tying, like mode, memory, and dw_size, similar to Tie::File.

Discuss differences from the AnyData module.

Discuss differences from the DBD::CSV module.

Implement binary mode.

An option like filter => sub { s/\s+/ / } that would specify a routine called before a line is processed. Perhaps process would be an even more sensible name for this option.

Warn if sep_char isn't matched with a specified sep_re or if sep_char consists of more than one character.

SEE ALSO

Tie::File Text::CSV Text::CSV_XS AnyData DBD::CSV

AUTHOR

Janek Schleicher, <bigj@kamelfreund.de>

COPYRIGHT AND LICENSE

Copyright 2002 by Janek Schleicher

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.