NAME

Spreadsheet::ExcelTableReader - Module to extract a table from somewhere within an Excel spreadsheet

VERSION

version 0.000001_003

SYNOPSIS

  my $tr= Spreadsheet::ExcelTableReader->new(
	file   => $filename_or_parser_instance,
	sheet  => $pattern_or_sheet_ref,  # optional. will search all sheets otherwise
	fields => [
		{ name => 'isbn', header => qr/isbn/i, isa => ISBN }
		'author',
		'title',
		{ name => 'publisher', header => qr/publish/i },
		...
    ],
  );
  
  my $data= $tr->records;
  # -or-
  $data= $tr->record_arrays;
  # -or-
  my $i= $tr->iterator(as => 'hash');
  while (my $rec= $i->()) { ... }

DESCRIPTION

Reading data from a spreadsheet isn't too hard thanks to modules like Spreadsheet::ParseExcel and Spreadsheet::ParseXLSX, and Data::Table::Excel. However there are often problems with knowing what/where to parse, because Excel files are generally considered a human interface and people might add or edit column headers or move the table around.

The purpose of this module is to help you find your data table somewhere within an excel file, and clean up and/or validate the values as you extract them. It uses the names (or regexes) of header columns to locate the header row, and then pulls the data rows below that until the first blank row (or end of file). The columns do not need to be in the same order as you specified, and you have the option to ignore unknown columns, and the option to proceed even if not all of your columns were found.

The default options are to make sure it found all your data columns, ignore extra columns, strip off whitespace, and throw exceptions if it can't do those things.

ATTRIBUTES

`file`

This is either a filename (which gets coerced into a parser instance) or a parser instance that you created. Currently supported parsers are Spreadsheet::ParseExcel, Spreadsheet::ParseXLSX, and Spreadsheet::XLSX.

file is not required if you supplied a parser's worksheet object as "sheet"

`sheet`

This is either a sheet name, a regex for matching a sheet name, or a parser's worksheet instance. It is also optional; if you don't specify a sheet then this table reader will search all sheets looking for your table columns.

fields

Fields is an array of field specifications (Spreadsheet::ExcelTableReader::Field) or a hashref that constructs one, or just a simple string that we use to build a field with default values.

# This
fields => [ 'foo' ]

# becomes this
fields => [ Spreadsheet::ExcelTableReader::Field->new( 
  name     => 'foo',
  header   => qr/^\s*$foo\s*$/,
  required => 1,
  trim     => 1,
  blank    => undef
) ]

field_list

Convenient list accessor for "fields". Not writeable.

find_table_args

Supplies default arguments to "find_table". These are ignored if you call find_table directly.

METHODS

new

Standard Moo constructor, accepting attributes as hash or hashref. Dies if it doesn't have any sheets to work with. (i.e. it tries to open the file if necessary, and sees if any sheets match your sheet specification)

find_table

$tr->find_table( %params )

Perform the search for the header row of the table. After this is called, the rest of the data-reading methods will pull from the located region of the spreadsheet.

Returns true if it located the header, or false otherwise.

table_location

Returns information about the location of the table after a successful find_table. Returns undef if find_table has not yet run.

{
  header     => \@values,    # The literal header values we found
  start_cell => $cell_addr,  # The Excel cell address of the first data row, first column
  end_cell   => $cel_addr,   # The Excel cell address of the last data row, last column
}

records

my $records= $tr->records( %options );

Returns an arrayref of records, each as a hashref (unless arrays are requested in %options). See "iterator" for the list of options.

record_arrays

Returns an arrayref of arrayrefs. Shortcut for $tr->records(as => 'array').

iterator

my $i= $tr->iterator(as => 'array');
while ($rec= $i->()) {
  ...
}

Returns a record iterator. The iterator is a coderef which returns the next record each time you call it. The iterator is also blessed, so you can call methods on it! Isn't that cool?

Parameters:

as

Either "array", for each record to be an arrayref of values in the same order as 'fields', or "hash", for each record to be a hashref of field=>value.

The default is 'hash'.

blank_row

If this is set to 'skip', blank rows in the data will be ignored. The iterator will return non-blank rows until it reaches the end of the file.

If this is set to 'end', the first blank row in the data set will have an EOF effect. No more rows can be returned until the iterator is reset.

The default is 'end'.

on_error

If this is set to a coderef, then the coderef will be called if the row fails its validation instead of throwing an exception.

on_error => sub {
  my ($record, $failed_fields)= @_;
  for my $field (@$failed_fields) {
    if ($record->{ $field->name } ...) {
      ...
    }
  }
  return ...; # 'use' or 'skip' or 'end'
};

The callback is given the record which failed the validation (which might be an arrayref or hashref depending on the other options) and an arrayref of each Field object which had an invalid value.

If the callback returns 'use', the record (possibly modified by the callback) will be returned from the iterator like normal. If the callback returns 'skip', the record will be skipped and the iterator will loop to the next row. If the callback returns 'end', the iterator will return undef and go into an EOF state.

A quick way to simply ignore rows which don't match your validation is:

my $records= $tr->records(on_error => sub { 'skip' });

Methods:

sheet: The current worksheet object being processed by the iterator
col: The column index of the first column, or of the last cell that failed validation
row: The row number of the last record returned, or the index of the header if the first record has not been read
remaining: The estimate of the number of rows remaining. This can be a lie if { blank_row => 'end' }
rewind: Resets the iterator for another run through the same data

AUTHOR

Michael Conrad <mike@nrdvana.net>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

To install Spreadsheet::ExcelTableReader, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Spreadsheet::ExcelTableReader

CPAN shell

perl -MCPAN -e shell
install Spreadsheet::ExcelTableReader

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)