NAME
Spreadsheet::ExcelTableReader - Module to extract a table from somewhere within an Excel spreadsheet
VERSION
version 0.000001_001
SYNOPSIS
my $tr= Spreadsheet::ExcelTableReader->new(
file => $filename_or_parser_instance,
sheet => $pattern_or_sheet_ref, # optional. will search all sheets otherwise
fields => [
{ name => 'isbn', header => qr/isbn/i, isa => ISBN }
'author',
'title',
{ name => 'publisher', header => qr/publish/i },
...
],
);
my $data= $tr->hashes;
# -or-
$data= $tr->arrays;
# -or-
my $i= $tr->iterator(hash => 1);
while (my $rec= $i->()) { ... }
DESCRIPTION
Reading data from a spreadsheet isn't too hard thanks to modules like Spreadsheet::ParseExcel and Spreadsheet::XLSX, and Data::Table::Excel. The problem comes from the users, when they are exchanging files, adding rows or columns, or otherwise mucking around with the layout.
The purpose of this module is to help you find your data table somewhere within an excel file, and clean up and/or validate the values as you extract them. It uses the names (or regexes) of header columns to locate the header row, and then pulls the data rows below that until the first blank row (or end of file). The columns do not need to be in the same order as you specified, and you have the option to ignore unknown columns, and the option to proceed even if not all of your columns were found.
The default options are to make sure it found all your data columns, ignore extra columns, strip off whitespace, and throw exceptions if it can't do those things.
ATTRIBUTES
file
This is either a filename (which gets coerced into a parser instance) or a parser instance that you created. Currently supported parsers are Spreadsheet::ParseExcel and Spreadsheet::XLSX.
file
is not required if you supplied a parser's worksheet object as "sheet"
sheet
This is either a sheet name, a regex for matching a sheet name, or a parser's worksheet instance. It is also optional; if you don't specify a sheet then this table reader will search all sheets looking for your table columns.
fields
Fields is an array of field specifications (Spreadsheet::ExcelTableReader::Field) or a hashref that constructs one, or just a simple string that we use to build a field with default values.
# This
fields => [ 'foo' ]
# becomes this
fields => [ Spreadsheet::ExcelTableReader::Field->new(
name => 'foo',
header => qr/^\W*$foo\W*$/,
required => 1,
trim => 1,
blank => undef
) ]
field_list
Convenient list accessor for "fields". Not writeable.
find_table_args
Supplies default arguments to "find_table". These are ignored if you call find_table
directly.
METHODS
new
Standard Moo constructor, accepting attributes as hash or hashref. Dies if it doesn't have any sheets to work with. (i.e. it tries to open the file if necessary, and sees if any sheets match your sheet
specification)
find_table
$tr->find_table( %params )
Perform the search for the header row of the table. After this is called, the rest of the data-reading methods will pull from the located region of the spreadsheet.
Returns true if it located the header, or false otherwise.
table_location
Returns information about the location of the table after a successful find_table. Returns undef if find_table has not yet run.
{
header => \@values, # The literal header values we found
start_cell => $cell_addr, # The Excel cell address of the first data row, first column
end_cell => $cel_addr, # The Excel cell address of the last data row, last column
}
record_count
Returns the number of rows in the table, by a simple difference of Excel cell addresses. You might get a smaller number of rows back if you configure the iterator to skip or stop at empty rows.
records
my $records= $tr->records( %options );
Returns an arrayref of records, each as a hashref (unless arrays are requested in %options).
record_arrays
Returns an arrayref of arrayrefs. Shortcut for $tr->records(as => 'array')
.
iterator
my $i= $tr->iterator(hash => 1);
while ($rec= $i->()) {
...
}
or if you want to ignore invalid data:
my $i= $tr->iterator(on_error => '');
while (1) {
my $rec= $i->();
last unless defined $rec;
if (! ref $rec) { warn "Error on row ".$i->row.", but continuing\n" }
else {
...
}
}
Returns a record iterator. The iterator is a coderef which returns the next record each time you call it. The iterator is also blessed, so you can call methods on it! Isn't that cool?
AUTHOR
Michael Conrad <mike@nrdvana.net>
COPYRIGHT AND LICENSE
This software is copyright (c) 2016 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.