NAME

Spreadsheet::ExcelTableReader - Module to extract a table from somewhere within an Excel spreadsheet

VERSION

version 0.000001_001

SYNOPSIS

  my $tr= Spreadsheet::ExcelTableReader->new(
	file   => $filename_or_parser_instance,
	sheet  => $pattern_or_sheet_ref,  # optional. will search all sheets otherwise
	fields => [
		{ name => 'isbn', header => qr/isbn/i, isa => ISBN }
		'author',
		'title',
		{ name => 'publisher', header => qr/publish/i },
		...
    ],
  );
  
  my $data= $tr->hashes;
  # -or-
  $data= $tr->arrays;
  # -or-
  my $i= $tr->iterator(hash => 1);
  while (my $rec= $i->()) { ... }

DESCRIPTION

Reading data from a spreadsheet isn't too hard thanks to modules like Spreadsheet::ParseExcel and Spreadsheet::XLSX, and Data::Table::Excel. The problem comes from the users, when they are exchanging files, adding rows or columns, or otherwise mucking around with the layout.

The purpose of this module is to help you find your data table somewhere within an excel file, and clean up and/or validate the values as you extract them. It uses the names (or regexes) of header columns to locate the header row, and then pulls the data rows below that until the first blank row (or end of file). The columns do not need to be in the same order as you specified, and you have the option to ignore unknown columns, and the option to proceed even if not all of your columns were found.

The default options are to make sure it found all your data columns, ignore extra columns, strip off whitespace, and throw exceptions if it can't do those things.

ATTRIBUTES

file

This is either a filename (which gets coerced into a parser instance) or a parser instance that you created. Currently supported parsers are Spreadsheet::ParseExcel and Spreadsheet::XLSX.

file is not required if you supplied a parser's worksheet object as "sheet"

sheet

This is either a sheet name, a regex for matching a sheet name, or a parser's worksheet instance. It is also optional; if you don't specify a sheet then this table reader will search all sheets looking for your table columns.

fields

Fields is an array of field specifications (Spreadsheet::ExcelTableReader::Field) or a hashref that constructs one, or just a simple string that we use to build a field with default values.

# This
fields => [ 'foo' ]

# becomes this
fields => [ Spreadsheet::ExcelTableReader::Field->new( 
  name     => 'foo',
  header   => qr/^\W*$foo\W*$/,
  required => 1,
  trim     => 1,
  blank    => undef
) ]

field_list

Convenient list accessor for "fields". Not writeable.

find_table_args

Supplies default arguments to "find_table". These are ignored if you call find_table directly.

METHODS

new

Standard Moo constructor, accepting attributes as hash or hashref. Dies if it doesn't have any sheets to work with. (i.e. it tries to open the file if necessary, and sees if any sheets match your sheet specification)

find_table

$tr->find_table( %params )

Perform the search for the header row of the table. After this is called, the rest of the data-reading methods will pull from the located region of the spreadsheet.

Returns true if it located the header, or false otherwise.

table_location

Returns information about the location of the table after a successful find_table. Returns undef if find_table has not yet run.

{
  header     => \@values,    # The literal header values we found
  start_cell => $cell_addr,  # The Excel cell address of the first data row, first column
  end_cell   => $cel_addr,   # The Excel cell address of the last data row, last column
}

record_count

Returns the number of rows in the table, by a simple difference of Excel cell addresses. You might get a smaller number of rows back if you configure the iterator to skip or stop at empty rows.

records

my $records= $tr->records( %options );

Returns an arrayref of records, each as a hashref (unless arrays are requested in %options).

record_arrays

Returns an arrayref of arrayrefs. Shortcut for $tr->records(as => 'array').

iterator

my $i= $tr->iterator(hash => 1);
while ($rec= $i->()) {
  ...
}

or if you want to ignore invalid data:

my $i= $tr->iterator(on_error => '');
while (1) {
  my $rec= $i->();
  last unless defined $rec;
  if (! ref $rec) { warn "Error on row ".$i->row.", but continuing\n" }
  else {
    ...
  }
}

Returns a record iterator. The iterator is a coderef which returns the next record each time you call it. The iterator is also blessed, so you can call methods on it! Isn't that cool?

AUTHOR

Michael Conrad <mike@nrdvana.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2016 by Michael Conrad.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.