NAME

Spreadsheet::XLSX::Reader::LibXML::Worksheet - Read xlsx worksheets with LibXML

SYNOPSIS

See the SYNOPSIS in Spreadsheet::XLSX::Reader::LibXML (The Workbook level class)

DESCRIPTION

If you want just learn the key elements of how to use the Worksheet instances returned from Spreadsheet::XLSX::Reader::LibXML this POD is for you. You will also want to skip the next paragraph. First, it is best to generate a worksheet instance with the main class. Once you have done that there are several ways to step through the data and return information from the identified location.

As you may have noticed this is only the pod explaining the Worksheet class not the file containing the actual class. In fact worksheets are built somwhat on the fly with an amalgam of classes, roles, and traits each with its own code and POD file (combined). For protips to manipulate each look at the pod for each and then review the bundled tests for each. In the POD for those classes are also found details on how to extend them. Additionally the test cases generally build the smallest possible group of modules from this package to isolate each set of functionality so the represent a good 'under the hood' look at the way each is integrated into this larger package. The trick for testing only roles or classes missing all role functionality is using MooseX::ShortCut::BuildInstance. For protips on manipulating Worksheet instances start with the role Spreadsheet::XLSX::Reader::LibXML::GetCell.

The way to set what type of information is returned

There is a an attribute set in the primary instance Spreadsheet::XLSX::Reader::LibXML called group_return_type. Setting this attribute will return either a full Spreadsheet::XLSX::Reader::LibXML::Cell instance, just the unformatted value, or the formatted value. For more details on the data available in the Cell instance read the documentation for Spreadsheet::XLSX::Reader::LibXML::Cell.

Ways to select cells for reading

These are the various functions that are available (independent of sheet parser type) to select which cells to read.

get_cell( $row, $column )

    Definition: Indicate both the requested row and requested column and the information for that position will be returned. Both $row and $column are required

    Accepts: the list ( $row, $column ) both required

    Returns: see returns for details on what is returned

get_next_value

    Definition: Reading left to right and top to bottom this will return the next cell with a value. This actually includes cells with no value but some unique formatting such as cells that have been merged with other cells.

    Accepts: nothing

    Returns: see returns for details on what is returned

fetchrow_arrayref( $row )

    Definition: In an homage to DBI I included this function to return an array ref of the cells or values in the requested $row. If no row is requested this returns the 'next' row. In the array ref any empty and non unique cell will show as 'undef'.

    Accepts: undef|$row = a row integer indicating the desired row

    Returns: an array ref of all possible column positions in that row with data filled in as appropriate.

fetchrow_array( $row )

    Definition: This function is just like fetchrow_arrayref except it returns an array instead of an array ref

    Accepts: undef = next|$row = a row integer indicating the desired row

    Returns: an array of all possible column positions in that row with data filled in as appropriate.

set_headers( $header_row_list )

    Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance. If some of the columns do not have values then the instance will auto generate unique headers for each empty header column to fill out the header ref.

    Accepts: a list of row numbers

    Returns: an array ref of the built headers for review

fetchrow_hashref( $row )

    Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance.

    Accepts: a target $row number for return values or undef meaning 'next'

    Returns: a hash ref of the values for that row

Attributes

These are attributes that affect the behaviour of the returned data in the worksheet instance. In general you would not set these on instance generation, Because the primary class will generate this instance for you. Rather you would use the attribue methods listed with each attribute to change the attribute after the worksheet instance has been generated.

min_header_col

    Definition: This attribute affects the hashref that is returned in the method fetchrow_hashref. If the table you are reading does not start in the first column of the sheet then you need to indicate where to start. Otherwize the fetchrow_hashref method will return auto generated headers and attach them to the cell data outside your table but inside the sheet data boundaries. This attribute tells fetchrow_hashref what column to use to start the hash ref build.

    Default: undef (which is equivalent to the minimum column of the sheet)

    Range: The minimum column of the sheet to or less than the max_header_col

attribute methods Methods provided to adjust this attribute

get_min_header_col

    Definition: returns the value stored in the attribute

set_min_header_col

    Definition: Sets a new value for the attribute

has_min_header_col

    Definition: Indicates if the attribute has a stored value

max_header_col

    Definition: This attribute affects the hashref that is returned in the method fetchrow_hashref. If the table you are reading ends before the max column of the sheet then you need indicate where to stop reading. Otherwize the fetchrow_hashref method will return auto generated headers and attach them to the cell values outside your table. This attribute tells fetchrow_hashref what column to use to end the hash ref build.

    Default: undef (equal to the maximum column of the sheet)

    Range: The maximum column of the sheet to or less than the min_header_col

attribute methods Methods provided to adjust this attribute

get_max_header_col

    Definition: returns the value stored in the attribute

set_max_header_col

    Definition: Sets a new value for the attribute

has_max_header_col

    Definition: Indicates if the attribute has a stored value

custom_formats

    Definition: This package will generate value conversions that generally match the numerical conversions set in the Excel spreadsheet. However, it may be that you want to convert the unformatted values for certain cells, rows, or columns in some user defined way. Build an object instance that has the two following methods; 'assert_coerce' and 'display_name'. Then place it here in this attribute as a value to a hash key that is keyed on the target Cell ID, or the row number, or the Column letter callout and this package will assign that conversion when calling 'value' on the cell rather than the conversion stored in the Excel spreadsheet.

    Default: {} = no custom conversions

    Range: keys representing cell ID's, row numbers, or column letter callouts

    Example:

    Building a converter on the fly (or use Type::Library or MooseX::Types

    use DateTimeX::Format::Excel;
    use DateTime::Format::Flexible;
    use Type::Coercion;
    use Type::Tiny;
    my	@args_list	= ( system_type => 'apple_excel' );
    my	$converter	= DateTimeX::Format::Excel->new( @args_list );
    my	$string_via	= sub{ 
    						my	$str = $_[0];
    						return DateTime::Format::Flexible->parse_datetime( $str );#my	$dt	= 
    						#~ return $dt->format_cldr( 'yyyy-M-d' );
    					};
    my	$num_via	= sub{
    						my	$num = $_[0];
    						return $converter->parse_datetime( $num );#my	$dt = 
    						#~ return $dt->format_cldr( 'yyyy-M-d' );
    					};
    
    # Turn date strings or Excel date numbers to DateTime objects!
    my	$date_time_from_value = Type::Coercion->new( 
    	type_coercion_map => [ Num, $num_via, Str, $string_via, ],
    );
    $date_time_type = Type::Tiny->new(
    		name		=> 'Custom_date_type',
    		constraint	=> sub{ ref($_) eq 'DateTime' },
    		coercion	=> $date_time_from_value,
    	);
    
    # Deep coercion! to handle first the $date_time_from_value coercion and then 
    #    build a specific date string output
    $string_type = Type::Tiny->new(
    		name		=> 'YYYYMMDD',
    		constraint	=> sub{
    			!$_ or (
    			$_ =~ /^\d{4}\-(\d{2})-(\d{2})$/ and
    			$1 > 0 and $1 < 13 and $2 > 0 and $2 < 32 )
    		},
    		coercion	=> Type::Coercion->new(
    			type_coercion_map =>[
    				$date_time_type->coercibles, sub{ my $tmp = $date_time_type->coerce( $_ ); $tmp->format_cldr( 'yyyy-MM-dd' ) },
    			],
    		),
    );

    Setting custom conversions to use for the worksheet

    my $worksheet = $workbook->worksheet( 'TargetWorksheetName' );
    $worksheet->set_custom_formats( {
    	E10	=> $date_time_type,
    	10	=> $string_type,
    	D14	=> $string_type,
    } );
    	

attribute methods Methods provided to adjust this attribute

set_custom_formats

    Definition: Sets a new (complete) hashref for the attribute

has_custom_format( $key )

    Definition: checks if the specific custom format $key is set

set_custom_format( $key => $coercion, ... )

    Definition: sets the specific custom format $key(s) with $coercion(s)

get_custom_format( $key )

    Definition: returns the specific custom format for that $key (see has_custom_format )

BUILD / INSTALL from Source

    1. Download a compressed file with the code

    2. Extract the code from the compressed file. If you are using tar this should work:

    tar -zxvf Spreadsheet-XLSX-Reader-LibXML-v0.xx.tar.gz

    3. Change (cd) into the extracted directory

(For Windows find what version of make was used to compile your perl)

perl  -V:make

Then (for Windows substitute the correct make function (s/make/dmake/g)?)

>perl Makefile.PL

>make

>make test

>make install # As sudo/root

>make clean

SUPPORT

TODO

    1. Add a way to set manual headers for fetchrow_hashref

    2. Add a pivot table reader (sometimes returns different values than just the sheet shows)

    3. Add more exposure to worksheet formatting values

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014 by Jed Lund