NAME
Spreadsheet::Read::Ingester - ingest and save csv and spreadsheet data to a perl data structure to avoid reparsing
SYNOPSIS
use Spreadsheet::Read::Ingester;
# ingest raw file, store parsed data file, and return data object
my $data = Spreadsheet::Read::Ingester->new('/path/to/file');
# the returned data object has all the methods of a L<Spreadsheet::Read> object
my $num_cols = $data->sheet(1)->maxcol;
# delete old data files older than 30 days to save disk space
Spreadsheet::Read::Ingester->cleanup;
DESCRIPTION
This module is intended to be a drop-in replacement for Spreadsheet::Read and is a simple, unobtrusive wrapper for it.
Parsing spreadsheet and csv data files is time consuming, especially with large data sets. If a data file is ingested more than once, much time and processing power is wasted reparsing the same data. To avoid reparsing, this module uses Storable to save a parsed version of the data to disk when a new file is ingested. All subsequent ingestions are retrieved from the stored Perl data structure. Files are saved in the directory determined by File::UserConfig and is a function of the user's OS.
The stored data file names are the unique file signatures for the raw data file. The signature is used to detect if the original file changed, in which case the data is reingested from the raw file and a new parsed file is saved using an updated file signature. Arguments passed to the constructor are appended to the name of the file to ensure different parse options are accounted for. Parsed data files are kept indefinitely but can be deleted with the cleanup()
method.
Consult the Spreadsheet::Read documentation for accessing the data object returned by this module.
METHODS
new( $path_to_file )
my $data = Spreadsheet::Read::Ingester->new('/path/to/file');
Takes same arguments as the new constructor in Spreadsheet::Read module. Returns an object identical to the object returned by the Spreadsheet::Read module along with its corresponding methods.
cleanup( $file_age_in_days )
cleanup()
Spreadsheet::Read::Ingester->cleanup(0);
Deletes all stored files from the user's application data directory. Takes an optional argument indicating the minimum number of days old the file must be before it is deleted. Defaults to 30 days. Passing a value of 0 deletes all files.
REQUIRES
SUPPORT
Perldoc
You can find documentation for this module with the perldoc command.
perldoc Spreadsheet::Read::Ingester
Websites
The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.
MetaCPAN
A modern, open-source CPAN search engine, useful to view POD in HTML format.
Source Code
The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)
https://github.com/sdondley/Spreadsheet-Read-Ingester
git clone git://github.com/sdondley/Spreadsheet-Read-Ingester.git
BUGS AND LIMITATIONS
You can make new bug reports, and view existing ones, through the web interface at https://github.com/sdondley/Spreadsheet-Read-Ingester/issues.
INSTALLATION
See perlmodinstall for information and options on installing Perl modules.
SEE ALSO
AUTHOR
Steve Dondley <s@dondley.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2019 by Steve Dondley.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.