NAME
Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings - Parser of XLSX format strings
SYNOPSYS
#!/usr/bin/env perl
package MyPackage;
use Moose;
sub get_log_space{};
with 'Spreadsheet::XLSX::Reader::LibXML::UtilFunctions',
'Spreadsheet::XLSX::Reader::LibXML::FmtDefault';
# call 'with' a second time to ensure that the prior methods are recorded
with 'Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings';
package main;
my $parser = MyPackage->new( epoch_year => 1904 );
my $conversion = $parser->parse_excel_format_string( '[$-409]dddd, mmmm dd, yyyy;@' );
print 'For conversion named: ' . $conversion->name . "\n";
for my $unformatted_value ( '7/4/1776 11:00.234 AM', 0.112311 ){
print "Unformatted value: $unformatted_value\n";
print "..coerces to: " . $conversion->assert_coerce( $unformatted_value ) . "\n";
}
###########################
# SYNOPSIS Screen Output
# 01: For conversion named: DATESTRING_0
# 02: Unformatted value: 7/4/1776 11:00.234 AM
# 03: ..coerces to: Thursday, July 04, 1776
# 04: Unformatted value: 0.112311
# 05: ..coerces to: Friday, January 01, 1904
###########################
DESCRIPTION
This documentation is written to explain ways to extend this package. To use the data extraction of Excel workbooks, worksheets, and cells please review the documentation for Spreadsheet::XLSX::Reader::LibXML, Spreadsheet::XLSX::Reader::LibXML::Worksheet, and Spreadsheet::XLSX::Reader::LibXML::Cell
This is a general purpose Moose Role that will convert Excel format strings into Type::Coercion objects in order to implement the conversion defined by the format string. It does allow for the format string to have up to four parts separated by semi-colons. The four parts are positive, zero, negative, and text. In the Excel version the text is just a pass through. Because excel does not record dates prior to 1900ish as numbers this will actually take the date format indicated by the number portion and try and use that output format to coerce a text date into the same values. All dates are processed into and then potentially back out of DateTime objects.
To replace this module just build a Moose::Role that has the following Primary Methods and Attributes. Then set the format_string_parser attribute with the new role name when initially starting Spreadsheet::XLSX::Reader::LibXML.
The decimal conversion to fractions initially uses the continued fraction algorythm to calculate a possible fraction for the pased $decimal value with the setting of 20 max iterations and a maximum denominator width defined by the format string. If that does not resolve satisfactorily it then calculates an over/under numerator with decreasing denominators from the maximum denominator (based on the format string) all the way to the denominator of 2 and takes the most accurate result. There is no early-out set in this computation so if you reach this point for multi digit denominators it is computationally intensive. (Not that continued fractions are computationally so cheap.). However, doing the calculation this way generally yields the same result as Excel. In some few cases the result is more accurate. I was unable to duplicate the results from Excel exactly.
requires
These are method(s) used by this Role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role.
get_log_space
Definition: Used to return the log space used by the code protected by ###LogSD. See Log::Shiras for more information.
get_excel_region
Definition: Used to return the two letter region ID. This ID is then used by DateTime::Format::Flexible to interpret date strings. Currently this method is provided by Spreadsheet::XLSX::Reader::LibXML::FmtDefault when it is loaded as a role of Spreadsheet::XLSX::Reader::LibXML::Styles.
Primary Methods
These are the primary ways to use this Role. For additional ParseExcelFormatStrings options see the Attributes section.
parse_excel_format_string( $string )
Definition: This is the method to convert Excel format strings into Type::Tiny objects with built in coercions. The type coercion objects are then used to convert unformatted values into formatted values using the assert_coerce method. Coercions built allow for the format string to have up to four parts separated by semi-colons. These four parts correlate to four different data input ranges. The four parts are positive, zero, negative, and text. If three substrings are sent then the data input is split to positive and zero, negative, and text. If two input types are sent the data input is split between numbers and text. One input type is a take all comers with the exception of dates. Dates always add a possible from-text conversion to process Excel pre-1900ish dates. This is because excel does not record dates prior to 1900ish as numbers. All dates are processed into and then potentially back out of DateTime objects. This requires deep or stacked coercions .
Accepts: an Excel number format strings
Returns: a Type::Tiny object with type coercions and pre-filters set for each input type from the formatting string
Attributes
Data passed to new when creating the Styles instance. For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes. Most of these are not exposed to the top level of Spreadsheet::XLSX::Reader::LibXML.
epoch_year
Definition: This is the epoch year in the Excel sheet. It differentiates between Windows and Apple Excel implementations. For more information see DateTimeX::Format::Excel
Default: 1900
attribute methods Methods provided to adjust this attribute
get_epoch_year
Definition: returns the value of the attribute
datetime_dates
Definition: It may be that you desire the full DateTime object rather than the finalized datestring when converting unformatted data to formatted date data. This attribute sets whether data coersions are built to do the full conversion or just to a DateTime object.
Default: 0 = unformatted values are coerced completely to date strings (1 = stop at DateTime)
attribute methods Methods provided to adjust this attribute
get_date_behavior
Definition: returns the value of the attribute
set_date_behavior( $Bool )
Definition: sets the attribute value (only new coercions are affected)
Accepts: Boolean values
cache_formats
Definition: In order to save re-building the coercion each time it is required, built coercions can be cached with the format string as the key. This attribute sets whether caching is turned on or not.
Default: 1 = caching is on
attribute methods Methods provided to adjust this attribute
get_cache_behavior
Definition: returns the value of the attribute
format_cash
Definition: This is the format cache described in cache_formats. It stores pre-built formats for re-use.
Default: {}
attribute methods Methods provided to adjust this attribute
has_cached_format( $format_string )
Definition: returns true if the $format_string has a pre-built coersion already stored
=set_cached_format( $format_string = $coercion )>
Definition: sets the coersion object for $format_string key
get_cached_format( $format_string )
Definition: gets the coersion object stored against the $format_string key
SUPPORT
TODO
1. Attempt to merge _split_decimal_integer and _integer_and_decimal
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014 by Jed Lund
DEPENDENCIES
Type::Tiny - 0.046
DateTimeX::Format::Excel - 0.012
Clone - clone
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out