NAME

Spreadsheet::Edit::IO - convert between spreadsheet and csv files

SYNOPSIS

use Spreadsheet::Edit::IO
      qw/convert_spreadsheet OpenAsCsv
         cx2let let2cx @sane_CSV_read_options @sane_CSV_write_options/;

# Open a CSV file or result of converting a sheet from a spreadsheet 
# DEPRECATED? 
#
my $hash = OpenAsCsv("/path/to/spreadsheet.odt!Sheet1");  # single-arg form
my $hash = OpenAsCsv(inpath => "/path/to/spreadsheet.odt", 
                     sheetname -> "Sheet1"); 
print "Reading ",$hash->csvpath()," with encoding ",$hash->encoding(),"\n";
while (<$hash->{fh}>) { ... }

# Convert CSV to spreadsheet
$hash = convert_spreadsheet(inpath => "mycsv.csv", cvt_to => "xls");
print "Resulting spreadsheet path is $hash->{outpath}\n";

# Convert a single sheet from a spreadsheet to CSV
$hash = convert_spreadsheet(inpath => "mywork.xls", sheetname => "Sheet1", 
                            cvt_to => "csv");
open my $fh, (openhandle($hash->{outpath}) ? "<&" : "<"), $hash->{outpath};
binmode $fh, $hash->{iolayers};
...

# Convert all sheets in a spreadsheet to CSV files in a subdir
$hash = convert_spreadsheet(inpath => "mywork.xls", allsheets => 1,
                            cvt_to => "csv");
system "ls -l ".$hash->{outpath};  # show resulting .csv files

# Translate between 0-based column index and letter code (A, B, etc.)
print "The first column is column ", cx2let(0), " (duh!)\n";
print "The 100th column is column ", cx2let(99), "\n";
print "Column BF is index ", let2cx("BF"), "\n";

# Extract components of "filepath!SHEETNAME" specifiers
my $path      = filepath_from_spec("/path/to/spreasheet.xls!Sheet1")
my $sheetname = sheetname_from_spec("/path/to/spreasheet.xls!Sheet1")

# Parse a csv file with sane options
my $csv = Text::CSV->new({ @sane_CSV_read_options, eol => $hash->{eol} })
  or die "ERROR: ".Text::CSV->error_diag ();
my @rows
while (my $F = $csv->getline( $infh )) {
  push @rows, $F;
}
close $infh or die "Error reading ", $hash->csvpath(), ": $!";

# Write a csv file with sane options
my $ocsv = Text::CSV->new({ @sane_CSV_write_options })
  or die "ERROR: ".Text::CSV->error_diag ();
open my $outfh, ">:encoding(utf8)", $outpath
  or die "$outpath: $!";
foreach (@rows) { $ocsv->print($outfh, $_) }
close $outfh or die "Error writing $outpath: $!";

DESCRIPTION

Convert between CSV and spreadsheet files using external programs, plus some utility functions

$hash = OpenAsCsv INPUT

$hash = OpenAsCsv inpath => INPUT, sheetname => SHEETNAME, ...

This is a thin wrapper for convert_spreadsheet followed by open (MAYBE 2B DEPRECATED?)

If a single argument is given it specifies INPUT; otherwise all arguments must be specified as key => value pairs, and may include any options supported by convert_spreadsheet.

INPUT may be a csv or spreadsheet workbook path or an open filehandle to one of those; if a spreadsheet, then a single "sheet" is converted, specified by either a !SHEETNAME suffix in the INPUT path, or a separate sheetname => SHEETNAME option.

The resulting file handle refers to a guaranteed-seekable CSV file; this will either be a temporary file (auto-removed at process exit), or the original INPUT if it was already a seekable csv file.

RETURNS: A ref to a hash containing the following:

{
 fh        => the resulting open file handle
 iolayers  => the iolayers (i.e. binmode arg) used by the file handle
 csvpath   => the path {fh} refers to, which might be a temporary file
 inpath    => original input path or open file handle
 sheetname => sheet name if the input was a spreadsheet
 tempdir   => temporary directory, only if specified in input arguments
}

convert_spreadsheet INPUT, cvt_to=>suffix, OPTIONS

convert_spreadsheet INPUT, cvt_to=>"csv", allsheets => 1, OPTIONS

This converts from CSV to one of various spreadsheet formats or vice-versa.

RETURNS: A ref to a hash containing at least:

{
 outpath   => path of output file (or directory with 'allsheets')
 iolayers  => (i.e. binmode arg) needed to read output if CSV
 inpath    => path of original file with any !SHEETNAME suffix removed
 sheetname => sheet name if the input was a spreadsheet
 cvt_from  => input file type, as specified or detected
 cvt_to    => output file type, as specified or derived from outpath
}

cvt_from and cvt_to are the filename suffixes (sans dot) of the corresponding file types, e.g. "csv", "xls", "xlsx", "odt" etc.

INPUT may be a path or a pre-opened file handle. If cvt_from is is not specified then it is inferred from the INPUT path suffix, or if INPUT is a handle or has no suffix then the file content is examined.

If outpath => OUTPATH is specifed in OPTIONS then results are returned there. OUTPATH must have the appropriate file suffix (.csv .xls etc.) except with allsheets => 1 when OUTPATH, if specified, must be a directory path which will be created if it does not exist.

If outpath => OUTPATH is not specified (or is undef) then results are returned in temporary file(s) which are auto-deleted at process exit, except that if no conversion is necessary (cvt_from is the same as cvt_to) then INPUT itself is returned as outpath.

Some vestigial support for spreadsheet formats exists but does not work well and is not documented here.

Other OPTIONS may include:

sheetname => "sheet name"

The workbook 'sheet' name used when reading or writing a spreadsheet.

An input sheet name may also be specified as "!sheetname" appended to the INPUT path.

allsheets => BOOL

Valid only with cvt_to => 'csv'. All sheets in the input spreadsheet are converted to separate .csv files. outpath, if specified, must be a directory, which will be created if necessary; if not specified, then a new sub-directory of tempdir will be created to contain the the resulting .csv files.

verbose => BOOL
use_gnumeric => BOOL # instead of libre/openoffice
tempdir => "/path/to/dir"

If tempdir is not specified a temporary directory will be created and auto-removed when your process exits.

sane_CSV_read_options

sane_CSV_write_options

@Spreadsheet::Edit::IO:sane_CSV_read_options contains the options you will always want to use with Text::CSV objects->new().

Specifically, quotes and embedded newlines are handled correctly.

You can append overrides, e.g. auto_diag or allow_whitespace. If you change quote_char, then escape_char must be set to the same value.

cx2let COLUMNINDEX

let2cx LETTERCODE

Functions which translate between spreadsheet-column letter codes ("A", "B", etc.) and 0-based column indicies.

filepath_from_spec EXPR

sheetname_from_spec EXPR

Functions which decompose strings giving a spreadsheet path and possibly sheetname as "FILEPATH!SHEETNAME", "FILEPATH|||SHEETNAME", or "FILEPATH[SHEETNAME]". sheetname_from_spec returns undef if the input does not have a a sheetname suffix.

form_spec_with_sheetname(PATH, SHEENAME)

Composes a combined string in a "preferred" format ("PATH!SHEETNAME" or one of the others; which is not specified).