NAME

Spreadsheet::BasicRead - Methods to easily read data from spreadsheets (.xls, .xlxs and .xlxm)

DESCRIPTION

Provides methods for simple reading of a Excel spreadsheet row at a time returning the row as an array of column values. Properties can be set so that blank rows are skipped. The heading row can also be set so that reading always starts at this row which is the first row of the sheet by default. Properties can also be set to skip the heading row.

Note 1. Leading and trailing white space is removed from cell values.

Note 2. Row and column references are zero (0) indexed. That is cell
        A1 is row 0, column 0

Note 3. Now handles .xlxs and .xlsm files

SYNOPSIS

use Spreadsheet::BasicRead;

my $xlsFileName = 'Test.xls';

my $ss = new Spreadsheet::BasicRead($xlsFileName) ||
   die "Could not open '$xlsFileName': $!";

# Print the row number and data for each row of the
# spreadsheet to stdout using '|' as a separator
my $row = 0;
while (my $data = $ss->getNextRow())
{
   $row++;
   print join('|', $row, @$data), "\n";
}

# Print the number of sheets
print "There are ", $ss->numSheets(), " in the spreadsheet\n";

# Set the heading row to 4
$ss->setHeadingRow(4);

# Skip the first data line, it's assumed to be a heading
$ss->skipHeadings(1);

# Print the name of the current sheet
print "Sheet name is ", $ss->currentSheetName(), "\n";

# Reset back to the first row of the sheet
$ss->getFirstRow();

REQUIRED MODULES

The following modules are required:

Spreadsheet::ParseExcel
Spreadsheet::XLSX

Optional module File::Log can be used to allow simple logging of errors.

METHODS

There are no class methods, the object methods are described below. Private class method start with the underscore character '_' and should be treated as Private.

new

Called to create a new BasicReadNamedCol object. The arguments can be either a single string (see 'SYNOPSIS') which is taken as the filename of the spreadsheet of as named arguments.

eg.  my $ss = Spreadsheet::BasicReadNamedCol->new(
                 fileName      => 'MyExcelSpreadSheet.xls',
                 skipHeadings  => 1,
                 skipBlankRows => 1,
                 log           => $log,
                 oldCell       => 1,
             );

The following named arguments are available:

skipHeadings

Don't output the headings line in the first call to 'getNextRow' if true. This is the first row of the spreadsheet unless the setHeadingRow function has been called to set the heading row.

skipBlankRows

Skip blank lines in the spreadsheet if true.

log

Use the File::Log object to log exceptions. If not provided error conditions are logged to STDERR

fileName

The name (and optionally path) of the spreadsheet file to process.

oldCell

Empty cells returned undef pre version 1.5. They now return ''.

The old functionality can be turned on by setting argument oldCell to true

Note that new will die if the spreadsheet can not be successfully opened. As such you may wish to wrap the call to new in a eval block. See xlsgrep|EXAMPLE APPLICATIONS for an example of when this might be desirable.

getNextRow()

Get the next row of data from the spreadsheet. The data is returned as an array reference.

eg.  $rowDataArrayRef = $ss->getNextRow();

numSheets()

Returns the number of sheets in the spreadsheet

openSpreadsheet(fileName)

Open a new spreadsheet file and set the current sheet to the first sheet. The name and optionally path of the spreadsheet file is a required argument to this method.

currentSheetNum()

Returns the current sheet number or undef if there is no current sheet. 'setCurrentSheetNum' can be called to set the current sheet.

currentSheetName()

Return the name of the current sheet or undef if the current sheet is not defined. see 'setCurrentSheetNum'.

setCurrentSheetNum(num)

Sets the current sheet to the integer value 'num' passed as the required argument to this method. Note that this should not be bigger than the value returned by 'numSheets'.

getNextSheet()

Returns the next sheet "ssBook" object or undef if there are no more sheets to process. If there is no current sheet defined the first sheet is returned.

getFirstSheet()

Returns the first sheet "ssBook" object.

cellValue(row, col)

Returns the value of the cell defined by (row, col)in the current sheet.

getFirstRow()

Returns the first row of data from the spreadsheet (possibly skipping the column headings 'skipHeadings' as an array reference.

setHeadingRow(rowNumber)

Sets the effective minimum row for the spreadsheet to 'rowNumber', since it is assumed that the heading is on this row and anything above the heading is not relavent.

Note: the row (and column) numbers are zero indexed.

setRow(rowNumber)

Sets the row to be returned by the next call to 'getNextRow'. Note that if the heading row has been defined and the row number set with setRow is less than the heading row, data will be returned from the heading row regardless, unless skip heading row has been set, in which case it will be the row after the heading row.

getRowNumber()

Returns the number of the current row (that has been retrieved). Note that row numbers are zero indexed. If a row has not been retrieved as yet, -1 is returned.

logexp(message)

Logs an exception message (can be a list of strings) using the File::Log object if it was defined and then calls die message.

logmsg(debug, message)

If a File::Log object was passed as a named argument 'new' and if 'debug' (integer value) is equal to or greater than the current debug Level (see File::Log) then the message is added to the log file.

If a File::Log object was not passed to new then the message is output to STDERR.

EXAMPLE APPLICATIONS

Two sample (but usefull) applications are included with this distribution.

The simplest is dumpSS.pl which will dump the entire contents of a spreadsheet to STDOUT. Each sheet is preceeded by the sheet name (enclosed in ***) on a line, followed by each row of the spreadsheet, with cell values separated by the pipe '|' character. There is no special handling provided for cells containing the pipe character.

A more complete example is xlsgrep. This application can be used to do a perl pattern match for cell values within xls files in the current and sub directories. There are no special grep flags, however this should not be a problem since perl's pattern matching allows for most requirements within the search pattern.

Usage is: xlsgrep.pl pattern

To do a case insensative search for "Some value" in any xls file in the current directory you would use:

xlsgrep '(?i)Some value'

For further details, see each applications POD.

ACKNOWLEDGEMENTS

I would like to acknowledge the input and patches recieved from the following:

Ilia Lobsanov, Bryan Maloney, Bill (from Datacraft), nadim and D. Dewey Allen

KNOWN ISSUES

None, however please contact the author at gng@cpan.org should you find any problems and I will endevour to resolve then as soon as possible.

If you have any enhancement suggestions please send me an email and I will try to accommodate your suggestion.

SEE ALSO

Spreadsheet:ParseExcel on CPAN does all the hard work, thanks Kawai Takanori (Hippo2000) kwitknr@cpan.org

The included applications dumpSS.pl and xlsgrep.pl

AUTHOR

Greg George, IT Technology Solutions P/L, Australia
Mobile: +61-404-892-159, Email: gng@cpan.org

LICENSE

Copyright (c) 1999- Greg George. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

VERSION

This is version 1.12

UPDATE HISTORY

Revision 1.12  2017/05/01 Greg
- Added handling of .xlxs & .xlxm files
- Added utf8 conversion for .xls[x|m] files as well as decoding of html escapes & < and >

Revision 1.11  2012/04/10 11:08:42  Greg
- Added handling of .xlxs files

Revision 1.10  2006/04/30 05:35:13  Greg
- added getRowNumber()

Revision 1.9  2006/03/05 02:43:34  Greg
- Update of Acknowledgments

Revision 1.8  2006/03/05 02:31:41  Greg
- Changes to cellValue return to cater for 'GENERAL' value sometimes returned from OpenOffice spreadsheets
  patch provided by Ilia Lobsanov <samogon@gmail.com>
  see http://www.annocpan.org/~KWITKNR/Spreadsheet-ParseExcel-0.2602/ParseExcel.pm#note_18

Revision 1.7  2006/01/25 22:17:47  Greg
- Correction to reading of the first row of the next sheet (without calling getFirstRow).
  Error detected and reported by Tim Rossiter
- Reviewed memory useage as reported by Ilia Lobsanov - this seems to be in the underlying OLE::Storage_Lite

Revision 1.6  2005/02/21 09:54:08  Greg
- Update to setCurrentSheetNum() so that the new sheet is handled by BasicRead functions

Revision 1.5  2004/10/08 22:40:27  Greg
- Changed cellValue to return '' for an empty cell rather than undef (requested by D D Allen).  Old functionality can be maintained by setting named parameter 'oldCell' to true in call to new().
- Added examples to POD

Revision 1.4  2004/10/01 11:02:21  Greg
- Updated getNextRow to skip sheets that have nothing on them

Revision 1.3  2004/09/30 12:32:25  Greg
- Update to currentSheetNum and getNextSheet functions

Revision 1.2  2004/08/21 02:30:29  Greg
- Added setHeadingRow and setRow
- Updated documentation
- Remove irrelavant use lib;

Revision 1.1.1.1  2004/07/31 07:45:02  Greg
- Initial release to CPAN