NAME
Spreadsheet::XLSX::Reader::LibXML::GetCell - Top level xlsx Worksheet interface
SYNOPSIS
If you are looking for the synopsis for the package see "SYNOPSIS" in Spreadsheet::XLSX::Reader::LibXML. Otherwise the best example for use of this module alone is the test file in this package t/Spreadsheet/XLSX/Reader/LibXML/10-get_cell.t
DESCRIPTION
This documentation is written to explain ways to use this module when writing your own excel parser. To use the general package for excel parsing out of the box please review the documentation for Workbooks, Worksheets, and Cells
This is the extracted Role to be used as a top level worksheet interface. This is the place where all the various details in each sub XML sheet are coallated into a set of data representing all the necessary information for a requested cell. Since this is the center of data coallation all elements that may be customized should reside outside of this role. This includes any specific elements that would be different between each of the sheet parser types and any element of Excel data presentation that may lend itself to customization. For instance all the XML parser methods, (Reader, DOM, and possibly SAX) should exist outside and preferebly below this role.
This role is also contains a layer of abstraction to allow for run time setting of count-from-one or count-from-zero mode. The layer of abstraction is use with the Moose around modifier.
requires
These are method(s) used by this Role but not provided by the role. Any class consuming this role will not build without first providing these methods prior to loading this role. Since this is the center of data collation the list is long.
min_row
Definition: Used to get the minimum row with data in the worksheet.
max_row
Definition: Used to get the maximum row with data in the worksheet.
min_col
Definition: Used to get the minimum column with data in the worksheet.
max_col
Definition: Used to get the maximum column with data in the worksheet.
row_range
Definition: Used to return a list of the $minimum_row and $maximum_row values
col_range
Definition: Used to return a list of the $minimum_column and $maximum_column values.
_get_next_value_cell
Definition: This should return the next cell data from the worksheet file that contains unique formatting or information. The data is expected in a perl hash ref. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell information at this point. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.
Example of expected return data set
{
'r' => 'A6', # The cell ID
'cell_merge' => 'A6:B6', # The merge range
'row' => 6, # count by 1 (no 'around' performed on leading '_' methods)
'col' => 1, # count by 1 (no 'around' performed on leading '_' methods)
's' => '11', # Styles type (position 11 in the styles sheet)
't' => 's' # Cell data type (string)
'v' =>{ # Cell data (since this cell is string
'raw_text' => '15' # data this actually points to position
} # 15 in the sharedStrings.xml file )
}
_get_next_cell
Definition: Like _get_next_value_cell this method should return the next cell. The difference is it should return undef for empty cells rather than skipping them. This method should collect data left to right and top to bottom. The styles.xml, sharedStrings.xml, and calcChain.xml etc. sheet data are coallated into the cell information at this point. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.
_get_col_row
Definition: This method should provide a targeted way to return the worksheet file information on a cell. It should only accept count-from-one column and row numbers and the column should be required before the row. If the request is made for an out of row bounds position the method should provide an 'EOR' string. An 'EOF' string should be returned when the file has reached the end and then the method should wrap back to the beginning.
The attribute workbook_instance must also be filled correctly since it exports all of the the workbook level functionality to this class.
Primary Methods
These are the various methods provided by this role. Each of them calls a sub method to get the base cell data and then coallates that information into the proper return value(s) defined by "group_return_type" in Spreadsheet::XLSX::Reader::LibXML.
get_cell( $row, $column )
Definition: This calls the supplied method _get_col_row.
Accepts: the list ( $row, $column ) both required (and modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML)
Returns: if data to build a cell instance is provided then the instance is collated, built, and returned. Otherwise the value from '_get_col_row' is returned unfiltered.
get_next_value
Definition: This calls the supplied method _get_next_value_cell
Accepts: nothing
Returns: if data to build a cell instance is provided then the instance is collated, built, and returned. Otherwise the value from '_get_next_value_cell' is returned unfiltered.
fetchrow_arrayref( $row )
Definition: This calls the supplied method _get_row_all. It will return 'EOF' once instead of an array reference for the end of the file before resetting to the first row..
Accepts: undef = next|$row = a row integer indicating the desired row (modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML)
Returns: an array ref of all possible column positions in that row with data filled in as appropriate. (or 'EOF')
fetchrow_array( $row )
Definition: This function calls fetchrow_arrayref except it returns an array instead of an array ref
Accepts: undef = next|$row = a row integer indicating the desired row
Returns: an array of all possible column positions in that row with data filled in as appropriate.
set_headers( @header_row_list [ \&header_scrubber ] )
Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance. If some of the columns do not have values then the instance will auto generate unique headers for each empty header column to fill out the header ref. [ optionally: it is possible to pass a coderef to scrub the headers so they make some sence. for example; ]
my $scrubber = sub{
my $input = $_[0];
$input =~ s/\n//g if $input;
$input =~ s/\s/_/g if $input;
return $input;
};
$self->set_headers( 2, 1, $scrubber ); # Called internally as $new_value = $scrubber->( $old_value );
# Returns/stores the headers set at row 2 and 1 with values from row 2 taking precedence
# Then it scrubs the values by removing newlines and replacing spaces with underscores.
Accepts: a list of row numbers (modified as needed by the attribute state of "count_from_zero" in Spreadsheet::XLSX::Reader::LibXML) and an optional closure .
Returns: an array ref of the built headers for review
fetchrow_hashref( $row )
Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance. This function calls _get_row_all.
Accepts: a target $row number for return values or undef meaning 'next'
Returns: a hash ref of the values for that row
Attributes
Arguments that can be passed to new when creating a class instance or changed using one of the 'attribute methods'. Where an attribute is delegating the 'attribute method' from a method in the instance stored in the attribute the documentation will indicate that the 'attribute method' is 'delegated'. All 'delegated' methods are required for the instance to be accepted by the attribute. For more information on attributes see Moose::Manual::Attributes and Moose::Manual::Delegation.
last_header_row
Definition: This is generally set by the method set_headers( @header_row_list ) method not during ->new and is the largest row number of the @header_row_list not necessarily the last number in the sequence.
Default: undef
attribute methods Methods provided to adjust this attribute
get_last_header_row
Definition: returns the value of the attribute
has_last_header_row
Definition: predicate for the attribute
min_header_col
Definition: When the method fetchrow_hashref is called it is possible to only return a set of information between two defined columns. This is the attribute that defines the start column.
Default: undef
attribute methods Methods provided to adjust this attribute
set_min_header_col
Definition: sets the value of the attribute
Range: integer values Integers less than min_col will be ignored
get_min_header_col
Definition: returns the value of the attribute
has_min_header_col
Definition: predicate for the attribute
clear_min_header_col
Definition: sets min_header_col to 'undef'
max_header_col
Definition: When the method fetchrow_hashref is called it is possible to only collect a set of information between two defined columns. This is the attribute that defines the end column.
Default: undef
attribute methods Methods provided to adjust this attribute
set_max_header_col
Definition: sets the value of the attribute
Range: integer values Integers larger than max_col will be ignored
get_max_header_col
Definition: returns the value of the attribute
has_max_header_col
Definition: predicate for the attribute
clear_max_header_col
Definition: sets min_header_col to 'undef'
custom_formats
Definition: When this role is coallating data about a cell it will check this attribute before it checks the styles sheet to see if there is a format defined by the user for converting the unformatted data. The formats stored must have two methods 'assert_coerce' and 'display_name'. The cell instance builder will consult this attribute by first checking the cellID as a key, then it checks for just the column letter(s) as a key, and finally it checks the row number as a key. For an easy way to build custom conversion review the documentation for Type::Tiny and Type::Coercions. the Chained Coercions are very cool!.
Default: undef
attribute methods Methods provided to adjust this attribute
set_custom_formats( { $key => $conversion } )
Definition: a way to set all $key => $conversion pairs at once
Accepts: a hashref of $key => $conversion pairs
has_custom_format( $key )
Definition: checks if the specific $key for a format is registered
get_custom_format( $key )
Definition: get the custom format for the requested $key
Returns: the $conversion registered to the $key
set_custom_format( $key => $conversion )
Definition: set the custom format $conversion for the identified $key
workbook_instance
Definition: This is where the workbook level methods are accessed by the worksheet. Because the workbook class is complex and I don't wan't to maintain duplicate documentation I request that you review the documentation for that class there. This attribute can/should only be set at ->new, however, it delegates to this class a number of methods that will update the workbook instance and therefore have universal effect when the other sheets are read.
Default: none
Required: yes
attribute methods Methods provided to adjust this attribute
counting_from_zero - delegated
boundary_flag_setting - delegated
change_boundary_flag - delegated
_has_shared_strings_file - delegated
get_shared_string_position - delegated
_has_styles_file - delegated
get_format_position - delegated
set_empty_is_end - delegated
is_empty_the_end - delegated
_starts_at_the_edge - delegated
get_group_return_type - delegated
set_group_return_type - delegated
get_epoch_year - delegated
change_output_encoding - delegated
get_date_behavior - delegated
set_date_behavior - delegated
get_empty_return_type - delegated
set_error - delegated
set_values_only - delegated
get_values_only - delegated
SUPPORT
github Spreadsheet::XLSX::Reader::LibXML/issues
1. Add the workbook attributute to the documentation
TODO
1. Eliminate the min / max row / col calls from this role (and requireds) if possible.
AUTHOR
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.
This software is copyrighted (c) 2014, 2015 by Jed Lund
DEPENDENCIES
version - 0.77
Type::Tiny - 1.000
Spreadsheet::XLSX::Reader::LibXML::Cell
requires
min_row
max_row
min_col
max_col
row_range
col_range
_get_col_row
_get_next_value_cell
_get_row_all
SEE ALSO
Spreadsheet::ParseExcel - Excel 2003 and earlier
Spreadsheet::XLSX - 2007+
Spreadsheet::ParseXLSX - 2007+
All lines in this package that use Log::Shiras are commented out