NAME
Spreadsheet::ExcelTableReader - Module to extract a table from somewhere within an Excel spreadsheet
VERSION
version 0.000001_003
SYNOPSIS
my $tr= Spreadsheet::ExcelTableReader->new(
file => $filename_or_parser_instance,
sheet => $pattern_or_sheet_ref, # optional. will search all sheets otherwise
fields => [
{ name => 'isbn', header => qr/isbn/i, isa => ISBN }
'author',
'title',
{ name => 'publisher', header => qr/publish/i },
...
],
);
my $data= $tr->records;
# -or-
$data= $tr->record_arrays;
# -or-
my $i= $tr->iterator(as => 'hash');
while (my $rec= $i->()) { ... }
DESCRIPTION
Reading data from a spreadsheet isn't too hard thanks to modules like Spreadsheet::ParseExcel and Spreadsheet::ParseXLSX, and Data::Table::Excel. However there are often problems with knowing what/where to parse, because Excel files are generally considered a human interface and people might add or edit column headers or move the table around.
The purpose of this module is to help you find your data table somewhere within an excel file, and clean up and/or validate the values as you extract them. It uses the names (or regexes) of header columns to locate the header row, and then pulls the data rows below that until the first blank row (or end of file). The columns do not need to be in the same order as you specified, and you have the option to ignore unknown columns, and the option to proceed even if not all of your columns were found.
The default options are to make sure it found all your data columns, ignore extra columns, strip off whitespace, and throw exceptions if it can't do those things.
ATTRIBUTES
file
This is either a filename (which gets coerced into a parser instance) or a parser instance that you created. Currently supported parsers are Spreadsheet::ParseExcel, Spreadsheet::ParseXLSX, and Spreadsheet::XLSX.
file
is not required if you supplied a parser's worksheet object as "sheet"
sheet
This is either a sheet name, a regex for matching a sheet name, or a parser's worksheet instance. It is also optional; if you don't specify a sheet then this table reader will search all sheets looking for your table columns.
fields
Fields is an array of field specifications (Spreadsheet::ExcelTableReader::Field) or a hashref that constructs one, or just a simple string that we use to build a field with default values.
# This
fields => [ 'foo' ]
# becomes this
fields => [ Spreadsheet::ExcelTableReader::Field->new(
name => 'foo',
header => qr/^\s*$foo\s*$/,
required => 1,
trim => 1,
blank => undef
) ]
field_list
Convenient list accessor for "fields". Not writeable.
find_table_args
Supplies default arguments to "find_table". These are ignored if you call find_table
directly.
METHODS
new
Standard Moo constructor, accepting attributes as hash or hashref. Dies if it doesn't have any sheets to work with. (i.e. it tries to open the file if necessary, and sees if any sheets match your sheet
specification)
find_table
$tr->find_table( %params )
Perform the search for the header row of the table. After this is called, the rest of the data-reading methods will pull from the located region of the spreadsheet.
Returns true if it located the header, or false otherwise.
table_location
Returns information about the location of the table after a successful find_table. Returns undef if find_table has not yet run.
{
header => \@values, # The literal header values we found
start_cell => $cell_addr, # The Excel cell address of the first data row, first column
end_cell => $cel_addr, # The Excel cell address of the last data row, last column
}
records
my $records= $tr->records( %options );
Returns an arrayref of records, each as a hashref (unless arrays are requested in %options). See "iterator" for the list of options.
record_arrays
Returns an arrayref of arrayrefs. Shortcut for $tr->records(as => 'array')
.
iterator
my $i= $tr->iterator(as => 'array');
while ($rec= $i->()) {
...
}
Returns a record iterator. The iterator is a coderef which returns the next record each time you call it. The iterator is also blessed, so you can call methods on it! Isn't that cool?
Parameters:
- as
-
Either "array", for each record to be an arrayref of values in the same order as 'fields', or "hash", for each record to be a hashref of field=>value.
The default is 'hash'.
- blank_row
-
If this is set to 'skip', blank rows in the data will be ignored. The iterator will return non-blank rows until it reaches the end of the file.
If this is set to 'end', the first blank row in the data set will have an EOF effect. No more rows can be returned until the iterator is reset.
The default is 'end'.
- on_error
-
If this is set to a coderef, then the coderef will be called if the row fails its validation instead of throwing an exception.
on_error => sub { my ($record, $failed_fields)= @_; for my $field (@$failed_fields) { if ($record->{ $field->name } ...) { ... } } return ...; # 'use' or 'skip' or 'end' };
The callback is given the record which failed the validation (which might be an arrayref or hashref depending on the other options) and an arrayref of each Field object which had an invalid value.
If the callback returns 'use', the record (possibly modified by the callback) will be returned from the iterator like normal. If the callback returns 'skip', the record will be skipped and the iterator will loop to the next row. If the callback returns 'end', the iterator will return undef and go into an EOF state.
A quick way to simply ignore rows which don't match your validation is:
my $records= $tr->records(on_error => sub { 'skip' });
Methods:
- sheet
-
The current worksheet object being processed by the iterator
- col
-
The column index of the first column, or of the last cell that failed validation
- row
-
The row number of the last record returned, or the index of the header if the first record has not been read
- remaining
-
The estimate of the number of rows remaining. This can be a lie if { blank_row => 'end' }
- rewind
-
Resets the iterator for another run through the same data
AUTHOR
Michael Conrad <mike@nrdvana.net>
COPYRIGHT AND LICENSE
This software is copyright (c) 2016 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.