NAME

Tie::CSV_File - ties a csv-file to an array of arrays

SYNOPSIS

use Tie::CSV_File;

tie my @data, 'Tie::CSV_File', 'xyz.dat';
print "Data in 3rd line, 5th column: ", $data[2][4];
untie @data;

# or to read a tab-, colon- or whitespace-separated file
tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', COLON_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED;

# or to read a format defined with your own options
tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_char     => '|',
                                          sep_re       => qr/\s*\|\s*/,
                                          quote_char   => undef,
                                          eol          => undef,
                                          escape_char  => undef,
                                          always_quote => 0;

$data[1][3] = 4;
$data[-1][-1] = "last column in last line";

$data[0] = [qw/Name Address Country Phone/];
push @data, ["Gates", "Redmond",  "Washington", "0800-EVIL"];
push @data, ["Linus", "Helsinki", "Finland",    "0800-LINUX"];

# [NOT YET IMPLEMENTED]
delete $data[3][2];

DESCRIPTION

Tie::CSV_File represents a regular CSV file as a Perl array of arrays. The first dimension of the array represents the line number in the original file, and the second dimension represents the column number. Both indices start at 0. You can also use the usual array notation, e.g. $data[-1][-1] stands for the last field in the last line, and @{$data[1]} stands for the columns of the second line.
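The indexing works the same way as for an ordinary Perl array of arrays. The following sketch uses a plain, untied array built with split. It ignores CSV quoting and exists only to show which element each index expression refers to:

```perl
use strict;
use warnings;

# A plain in-memory array of arrays built from simple comma-separated
# lines (no quoting), illustrating the $data[line][column] scheme.
my @lines = ("a,b,c", "d,e", "f,g,h,i");
my @data  = map { [ split /,/, $_, -1 ] } @lines;

print "3rd line, 2nd column: $data[2][1]\n";      # g
print "last field of last line: $data[-1][-1]\n"; # i
print "columns of 2nd line: @{$data[1]}\n";       # d e
```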

An empty field has the value '', while a nonexistent field has the value undef. For example, given the file

"first field",,
"last field"

"the above line is empty"

we can say

$data[0][0] eq "first field"
$data[0][1] eq ""
$data[0][2] eq ""
!defined $data[0][3]

$data[1][0] eq "last field"

@{$data[2]}  # is an empty list ()
!defined $data[2][0]

$data[3][0] eq "the above line is empty"

!defined $data[$x][$y] # for every $x > 3 and any $y
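Perl's built-in split can reproduce the difference between an empty field ('') and a nonexistent one (undef). The sketch below is only an approximation. It uses an unquoted copy of the sample file and a plain comma split, not a real CSV parser such as Text::CSV_XS:

```perl
use strict;
use warnings;

# Sketch only: unquoted copy of the sample file, split at commas
# (this is not the module's parser).
my @lines = ("first field,,", "last field", "", "the above line is empty");

# A limit of -1 keeps trailing empty fields; splitting "" yields ().
my @data = map { [ split /,/, $_, -1 ] } @lines;

print scalar @{ $data[0] }, " fields in line 0\n";   # 3 fields in line 0
print defined $data[0][3] ? "defined\n" : "undef\n"; # undef
print scalar @{ $data[2] }, " fields in line 2\n";   # 0 fields in line 2
```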

It is also possible to change the data:

$data[0][0]   = "first line, first column";
$data[3][7]   = "anywhere in the world";
$data[-1][-1] = "last line, last column";

$data[0] = ["Last name", "First name", "Address"];
push @data, ["Schleicher", "Janek", "Germany"];
my @header = @{ shift @data };

Deleting isn't possible yet, but it will be implemented soon.

Only a small part of the file is kept in memory, so this module also works with large files. It uses the Tie::File module to read the lines of the file; please see the Tie::File documentation for details.

However, it won't work well with large fields, because all fields of a line are parsed even if you only want to get one field.
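The core modules Tie::File and Tie::Array can illustrate the line-by-line approach. The class My::LazyRows below is hypothetical and read-only. It is not the module's actual code, only a sketch of the idea that a line is split into fields when its row is fetched. This is also why the whole line is parsed even if only one field is needed:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# My::LazyRows is hypothetical and read-only; it is NOT Tie::CSV_File's
# implementation. Tie::File provides the lines, and a line is split
# into fields only when its row is fetched.
package My::LazyRows;
use parent 'Tie::Array';

sub TIEARRAY {
    my ($class, $file, $sep_re) = @_;
    tie my @lines, 'Tie::File', $file or die "Cannot tie $file: $!";
    return bless { lines => \@lines, sep_re => $sep_re }, $class;
}

# The whole line is parsed, even if the caller wants only one field.
sub FETCH {
    my ($self, $i) = @_;
    my $line = $self->{lines}[$i];
    return undef unless defined $line;
    return [ split $self->{sep_re}, $line, -1 ];
}

sub FETCHSIZE { my ($self) = @_; return scalar @{ $self->{lines} } }

package main;

# Write a small sample file, then tie it.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} "a,b,c\nd,e,f\n";
close $fh;

tie my @data, 'My::LazyRows', $file, qr/,/;
print "$data[1][2]\n";        # f
print scalar(@data), "\n";    # 2
untie @data;
```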

CSV options for tying

Similar to Text::CSV_XS, you can add the following options:

quote_char   {default: "}
eol          {default: undef}
sep_char     {default: ,}
escape_char  {default: "}
always_quote {default: 0}

Please read the documentation of Text::CSV_XS for details.

Note that the binary option isn't available.

To make it easier to work with files whose fields aren't separated consistently, e.g. sometimes by one whitespace character and sometimes by several, I added the sep_re option (defaults to undef).

If it is specified, sep_char is ignored when reading. Instead, the fields are found by something similar to a split at the separator.

E.g., you can say

tie my @data, 'Tie::CSV_File', 'xyz.dat', sep_re       => qr/\s+/,
                                          quote_char   => undef,
                                          eol          => undef, # default
                                          escape_char  => undef,
                                          always_quote => 0;     # default
                                      

to read something like

   PID TTY          TIME CMD
1200 pts/0    00:00:00 bash
1221 pts/0    00:00:01 nedit
1224 pts/0    00:00:01 nedit
1228 pts/0    00:00:06 nedit
1318 pts/0    00:00:01 nedit
1605 pts/0    00:00:00 ps
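Here is a sketch of what splitting this listing at qr/\s+/ with Perl's built-in split yields. The text above doesn't say whether Tie::CSV_File strips the leading blanks of the header line itself. This sketch therefore strips them explicitly, and that step is an assumption of the example:

```perl
use strict;
use warnings;

# Splitting part of the ps listing with Perl's built-in split and the
# same qr/\s+/; this only approximates what "similar to split" means.
my $sep_re = qr/\s+/;
my @lines  = (
    "   PID TTY          TIME CMD",
    "1200 pts/0    00:00:00 bash",
    "1221 pts/0    00:00:01 nedit",
);

# A separator match at the start of a line gives a leading empty field.
my @raw = split $sep_re, $lines[0];
print scalar(@raw), " fields without stripping\n";   # 5 fields without stripping

# This sketch therefore strips leading whitespace before splitting.
my @rows = map { my $l = $_; $l =~ s/^\s+//; [ split $sep_re, $l ] } @lines;

print join("|", @{ $rows[0] }), "\n";   # PID|TTY|TIME|CMD
print "$rows[1][3]\n";                  # bash
```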

Note that the value of sep_re must be a regexp object, e.g. one created with qr/.../. A plain string produces an error.

Note also that sep_char is still used to write data. As the name suggests, sep_char can consist of only one character.
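A check for these two constraints could look like the following sketch. check_sep_options is a hypothetical helper written for this example. The text above only states that a plain string for sep_re produces an error, not how the module detects it:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Tie::CSV_File): sep_re must be a
# qr// object and sep_char must be a single character.
sub check_sep_options {
    my %opt = @_;
    if (defined $opt{sep_re} && ref($opt{sep_re}) ne 'Regexp') {
        die "sep_re must be a regexp object created with qr//\n";
    }
    if (defined $opt{sep_char} && length($opt{sep_char}) != 1) {
        die "sep_char must be exactly one character\n";
    }
    return 1;
}

check_sep_options(sep_re => qr/\s+/, sep_char => ' ');   # passes
eval { check_sep_options(sep_re => '\s+') };             # plain string
print "error: $@" if $@;
```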

Predefined file types

Without any options, a standard CSV file is assumed. However, tab-separated, colon-separated and whitespace-separated files are also common, so they are predefined. That's why it's possible to say:

tie my @data, 'Tie::CSV_File', 'xyz.dat', TAB_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', COLON_SEPERATED;
tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED;

TAB_SEPERATED

It's defined with:

sep_char     => "\t",
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that the data must not contain any tab characters.
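The reason is that TAB_SEPERATED sets quote_char and escape_char to undef, so nothing can mark a tab as part of the data. The same applies to COLON_SEPERATED and colons. The following core-Perl sketch uses only join and split, not the module itself, and shows a field that contains a tab turning into two fields on the way back:

```perl
use strict;
use warnings;

# Sketch: without quoting, a tab inside a field is indistinguishable
# from the tab used as separator.
my @fields = ("Name", "value\twith tab", "end");
my $line   = join "\t", @fields;        # roughly what writing produces
my @back   = split /\t/, $line, -1;     # roughly what reading gives back

print scalar(@fields), " fields written, ", scalar(@back), " read\n";   # 3 fields written, 4 read
```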

COLON_SEPERATED

It's defined with:

sep_char     => ":",
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that the data must not contain any colons.

WHITESPACE_SEPERATED

It's defined with:

sep_re       => qr/\s+/,
sep_char     => ' ',
quote_char   => undef,
eol          => undef, # default
escape_char  => undef,
always_quote => 0     # default

Note that reading splits the line at every whitespace sequence. In particular, it's not possible to represent an empty field. Note also that when an element is set, all whitespace sequences are replaced by a single blank.
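Both effects follow from reading with qr/\s+/ and writing with a single blank. Here is a minimal sketch with join and split. It is not the module's code, and the exact way the module rewrites a line is assumed here:

```perl
use strict;
use warnings;

# Sketch of the WHITESPACE_SEPERATED round trip with plain join/split:
# fields are written with sep_char ' ' and read back with qr/\s+/.
my $sep_re   = qr/\s+/;
my $sep_char = ' ';

my @fields = ("1200", "", "bash");     # middle field is empty
my $line   = join $sep_char, @fields;  # "1200  bash" (two blanks)
my @back   = split $sep_re, $line;     # ("1200", "bash")

print scalar(@back), " fields read back\n";   # 2 fields read back

# Runs of whitespace are not preserved either: rejoining the split
# fields leaves a single blank between them.
my $rewritten = join $sep_char, split $sep_re, "1221 pts/0    00:00:01";
print "$rewritten\n";   # 1221 pts/0 00:00:01
```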

Of course, you can override some options. For example, let's assume that you have a whitespace-separated file, but you want to write a tab instead of a blank when changing the data. That can be done with:

tie my @data, 'Tie::CSV_File', 'xyz.dat', WHITESPACE_SEPERATED, sep_char => "\t";
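The tie call above suggests that each predefined type expands to a list of option pairs that further pairs can follow. Assuming that, the override works like an ordinary Perl hash assignment, in which later keys win. The module's actual definition isn't shown here. MY_WHITESPACE is a hypothetical constant used only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical list constant modelled on WHITESPACE_SEPERATED; the real
# module's definition may differ.
use constant MY_WHITESPACE => (
    sep_re       => qr/\s+/,
    sep_char     => ' ',
    quote_char   => undef,
    eol          => undef,
    escape_char  => undef,
    always_quote => 0,
);

# When a list of pairs is assigned to a hash, later keys win, so options
# written after the constant override the predefined ones.
my %opt = (MY_WHITESPACE, sep_char => "\t");

print $opt{sep_char} eq "\t" ? "tab\n" : "blank\n";   # tab
```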

Please suggest other useful file types that I could predefine.

EXPORT

By default these constants are exported:

TAB_SEPERATED
COLON_SEPERATED
WHITESPACE_SEPERATED

BUGS

The indirect write methods, like push @data, [1, 2], push @{$data[3]}, "a", "b" or similar operations on slices, aren't tested directly. I hope that the implementation of Tie::Array is good enough for them. They will be tested extensively in future versions.
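For background, Tie::Array supplies default PUSH, POP, SHIFT, UNSHIFT and SPLICE methods built on FETCH, STORE, FETCHSIZE and STORESIZE. Those defaults are what the indirect write methods rely on. The hypothetical class My::Rows below is an in-memory stand-in, not Tie::CSV_File. It defines only the basic methods and still supports push and shift:

```perl
use strict;
use warnings;

# Hypothetical minimal tied class: it defines only the basic methods and
# inherits PUSH, POP, SHIFT, UNSHIFT and SPLICE from Tie::Array.
package My::Rows;
use parent 'Tie::Array';

sub TIEARRAY  { my ($class) = @_; return bless { rows => [] }, $class }
sub FETCH     { my ($self, $i) = @_; return $self->{rows}[$i] }
sub STORE     { my ($self, $i, $v) = @_; $self->{rows}[$i] = $v }
sub FETCHSIZE { my ($self) = @_; return scalar @{ $self->{rows} } }
sub STORESIZE { my ($self, $n) = @_; $#{ $self->{rows} } = $n - 1 }

package main;

tie my @data, 'My::Rows';
push @data, ["Gates", "Redmond"], ["Linus", "Helsinki"];   # inherited PUSH
my $first = shift @data;                                   # inherited SHIFT

print scalar(@data), " row left: $data[0][0]\n";   # 1 row left: Linus
```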

This module is slow, even slower than necessary because of its object-oriented design. I'll change that when implementing more features.

Please report every bug or missing feature of this module to me.

TODO

Implement deletion.

Allow passing (memory) options when tying, such as mode, memory and dw_size, similar to Tie::File.

Discuss the differences from the AnyData module.

Implement binary mode.

An option like filter => sub { s/\s+/ / } that would specify a routine to be called before a line is processed. Perhaps process would be an even more sensible name for this option.

Warn if sep_char isn't matched by the specified sep_re, or if sep_char consists of more than one character.
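The filter option proposed in the TODO list above could work by running the code ref on each raw line (in $_) before the line is parsed. In the sketch below, parse_line and its arguments are hypothetical. The filter uses s/\s+/ /g so that every whitespace run is collapsed, not just the first:

```perl
use strict;
use warnings;

# Sketch of the proposed (not yet existing) filter option: a code ref
# that is run on each raw line, available in $_, before it is split.
sub parse_line {
    my ($line, %opt) = @_;
    if ($opt{filter}) {
        local $_ = $line;
        $opt{filter}->();
        $line = $_;
    }
    return [ split $opt{sep_re}, $line, -1 ];
}

my $row = parse_line("a \t  b   c", sep_re => qr/ /, filter => sub { s/\s+/ /g });
print join("|", @$row), "\n";   # a|b|c
```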

SEE ALSO

Tie::File, Text::CSV, Text::CSV_XS, AnyData

AUTHOR

Janek Schleicher, <big@kamelfreund.de>

COPYRIGHT AND LICENSE

Copyright 2002 by Janek Schleicher

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.