NAME

Spreadsheet::XLSX::Reader::LibXML Worksheet POD - Read xlsx worksheets with LibXML

DESCRIPTION

If you want just learn the key elements of how to use the Worksheet instances returned from Spreadsheet::XLSX::Reader::LibXML this POD is for you. You will also want to skip the next paragraph. First, it is best to generate a worksheet instance with the main class. Once you have done that there are several ways to step through the data and return information from the identified location.

As you may have noticed this is only the pod explaining the Worksheet class not the file containing the actual class. In fact worksheets are built somwhat on the fly with an amalgam of classes, roles, and traits each with its own code and POD file (combined). For protips to manipulate each look at the pod for each and then review the bundled tests for each. In the POD for those classes are also found details on how to extend them. Additionally the test cases generally build the smallest possible group of modules from this package to isolate each set of functionality so the represent a good 'under the hood' look at the way each is integrated into this larger package. The trick for testing only roles or classes missing all role functionality is using MooseX::ShortCut::BuildInstance. For protips on manipulating Worksheet instances start with the role Spreadsheet::XLSX::Reader::LibXML::GetCell.

The way to set what type of information is returned

There is a an attribute set in the primary instance Spreadsheet::XLSX::Reader::LibXML called group_return_type. Setting this attribute will return either a full Spreadsheet::XLSX::Reader::LibXML::Cell instance, just the unformatted value, or the formatted value. For more details on the data available in the Cell instance read the documentation for Spreadsheet::XLSX::Reader::LibXML::Cell.

Ways to select cells for reading

These are the various functions that are available (independent of sheet parser type) to select which cells to read.

get_cell( $row, $column )

    Definition: Indicate both the requested row and requested column and the information for that position will be returned. Both $row and $column are required

    Accepts: the list ( $row, $column ) both required

    Returns: see returns for details on what is returned

get_next_value

    Definition: Reading left to right and top to bottom this will return the next cell with a value. This actually includes cells with no value but some unique formatting such as cells that have been merged with other cells.

    Accepts: nothing

    Returns: see returns for details on what is returned

fetchrow_arrayref( $row )

    Definition: In an homage to DBI I included this function to return an array ref of the cells or values in the requested $row. If no row is requested this returns the 'next' row. In the array ref any empty and non unique cell will show as 'undef'.

    Accepts: undef|$row = a row integer indicating the desired row

    Returns: an array ref of all possible column positions in that row with data filled in as appropriate.

fetchrow_array( $row )

    Definition: This function is just like fetchrow_arrayref except it returns an array instead of an array ref

    Accepts: undef = next|$row = a row integer indicating the desired row

    Returns: an array of all possible column positions in that row with data filled in as appropriate.

set_headers( $header_row_list )

    Definition: This function is used to set headers used in the function fetchrow_hashref. It accepts a list of row numbers that will be collated into a set of headers used to build the hashref for each row. The header rows are coallated in sequence with the first number taking precedence. The list is also used to set the lowest row of the headers in the table. All rows at that level and higher will be considered out of the table and will return undef while setting the error instance.

    Accepts: a list of row numbers

    Returns: an array ref of the built headers for review

fetchrow_hashref( $row )

    Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance.

    Accepts: a target $row number for return values or undef meaning 'next'

    Returns: a hash ref of the values for that row

fetchrow_hashref( $row )################################################## Start Here

    Definition: This function is used to return a hashref representing the data in the specified row. If no $row value is passed it will return the 'next' row of data. A call to this function without setting the headers first will return undef and set the error instance.

    Accepts: a target $row number for return values or undef meaning 'next'

    Returns: a hash ref of the values for that row

Attributes

Data passed to new when creating an instance (parser). For modification of these attributes see the listed 'attribute methods'. For more information on attributes see Moose::Manual::Attributes.

error_inst

    Definition: This attribute holds an 'error' object instance. It should have several methods for managing errors. Currently no error codes or error translation options are available but this should make implementation of that easier.

    Default: a Spreadsheet::XLSX::Reader::LibXML::Error instance with the attributes set as;

    ( should_warn => 0 )

    Range: The minimum list of methods to implement for your own instance is;

    error set_error clear_error set_warnings if_warn
    	

attribute methods Methods provided to adjust this attribute

get_error_inst

Definition: returns this instance

error

Definition: Used to get the most recently logged error

set_error

Definition: used to set a new error string

clear_error

Definition: used to clear the current error string in this attribute

set_warnings

Definition: used to turn on or off real time warnings when errors are set

if_warn

Definition: a method mostly used to extend this package and see if warnings should be emitted.

file_name

    Definition: This attribute holds the full file name and path for the xlsx file to be parsed.

    Default no default - this must be provided to read a file

    Range any unincrypted xlsx file that can be opened in Microsoft Excel

attribute methods Methods provided to adjust this attribute

set_file_name

Definition: change the set file name (this will reboot the workbook instance)

has_file_name

Definition: this is fundamentally a way to see if the workbook loaded correctly

file_creator

    Definition: This holds the information stored in the Excel Metadata for who created the file originally. You shouldn't set this attribute yourself.

    Default the value from the file

    Range A string

attribute methods Methods provided to adjust this attribute

creator

Definition: returns the name of the file creator

file_date_created

    Definition: This holds the created date in the Excel Metadata for when the file was first built. You shouldn't set this attribute yourself.

    Default the value from the file

    Range A timestamp string (ISO ish)

attribute methods Methods provided to adjust this attribute

date_created

Definition: returns the date the file was created

file_modified_by

    Definition: This holds the information stored in the Excel Metadata for who modified the file last. You shouldn't set this attribute yourself.

    Default the value from the file

    Range A string

attribute methods Methods provided to adjust this attribute

modified_by

Definition: returns the user name of the person who last modified the file

file_date_modified

    Definition: This holds the last modified date in the Excel Metadata for when the file was last changed. You shouldn't set this attribute yourself.

    Default the value from the file

    Range A timestamp string (ISO ish)

attribute methods Methods provided to adjust this attribute

date_modified

Definition: returns the date when the file was last modified

sheet_parser

    Definition: This sets the way the .xlsx file is parsed. For now the only choice is 'reader'.

    Default 'reader'

    Range 'reader'

attribute methods Methods provided to adjust this attribute

set_parser_type

Definition: the way to change the parser type

get_parser_type

Definition: returns the currently set parser type

count_from_zero

    Definition: Excel spreadsheets count from 1. Spreadsheet::ParseExcel counts from zero. This allows you to choose either way.

    Default 1

    Range 1 = counting from zero like Spreadsheet::ParseExcel, 0 = Counting from 1 lke Excel

attribute methods Methods provided to adjust this attribute

counting_from_zero

Definition: a way to check the current attribute setting

set_count_from_zero

Definition: a way to change the current attribute setting

file_boundary_flags

    Definition: When you request data past the end of a row or past the bottom of the data this package can return 'EOR' or 'EOF' to indicate that state. This is especially helpful in 'while' loops. The other option is to return 'undef'. This is problematic if some cells in your table are empty which also returns undef.

    Default 1

    Range 1 = return 'EOR' or 'EOF' flags as appropriate, 0 = return undef when requesting a position that is out of bounds

attribute methods Methods provided to adjust this attribute

boundary_flag_setting

Definition: a way to check the current attribute setting

change_boundary_flag

Definition: a way to change the current attribute setting

empty_is_end

    Definition: The excel convention is to read the table left to right and top to bottom. Some tables have uneven columns from row to row. This allows the several methods that take 'next' values to wrap after the last element with data rather than going to the max column.

    Default 0

    Range 1 = treat all columns short of the max column for the sheet as being in the table, 0 = end each row after the last cell with data rather than going to the max sheet column

attribute methods Methods provided to adjust this attribute

is_empty_the_end

Definition: a way to check the current attribute setting

set_empty_is_end

Definition: a way to set the current attribute setting

from_the_edge

    Definition: Some data tables start in the top left corner. Others do not. I don't reccomend that practice but when aquiring data in the wild it is often good to adapt. This attribute sets whether the file reads from the top left edge or from the top row with data and starting from the leftmost column with data.

    Default 1

    Range 1 = treat the top left corner of the sheet even if there is no data in the top row or leftmost column, 0 = Set the minimum row and minimum columns to be the first row and first column with data

attribute methods Methods provided to adjust this attribute

set_from_the_edge

Definition: a way to set the current attribute setting

default_format_list

    Definition: This is a departure from Spreadsheet::ParseExcel for two reasons. First, it doesn't use the same modules. Second, this accepts a role with two methods where ParseExcel accepts an object instance.

    Default Spreadsheet::XLSX::Reader::LibXML::FmtDefault

    Range a Moose role with the methods 'get_defined_excel_format' and 'change_output_encoding' it should be noted that libxml2 which is the underlying code for XML::LibXML allways attempts to get the data into perl friendly strings. That means this should only tweak the data on the way out and does not affect the data on the way in.

attribute methods Methods provided to adjust this attribute

get_default_format_list

Definition: a way to check the current attribute setting

set_default_format_list

Definition: a way to set the current attribute setting

format_string_parser

    Definition: This is the interpreter that turns the excel into a Type::Tiny coercion. If you don't like the output or the method you can write your own Moose Role and add it here.

    Default Spreadsheet::XLSX::Reader::LibXML::ParseExcelFormatStrings

    Range a Moose role with the method 'parse_excel_format_string'

attribute methods Methods provided to adjust this attribute

get_format_string_parser

Definition: a way to check the current attribute setting

set_format_string_parser

Definition: a way to set the current attribute setting

group_return_type

    Definition: Traditionally ParseExcel returns a cell object with lots of methods to reveal information about the cell. In reality this is probably not used very much so in the interest of simplifying you can get a cell object instance set to the cell information. Or you can just get the raw value in the cell or you can get the cell value formatted either the way the sheet specified or the way you specify. See the 'custom_formats' attribute for the Spreadsheet::XLSX::Reader::LibXML::Worksheet class to insert custom targeted formats for use with the parser. All empty cells return undef no matter what.

    Default instance

    Range instance = returns a populated Spreadsheet::XLSX::Reader::LibXML::Cell instance, unformatted = returns the raw value of the cell with no modifications, value = returns just the formatted value stored in the excel cell

attribute methods Methods provided to adjust this attribute

get_group_return_type

Definition: a way to check the current attribute setting

set_group_return_type

Definition: a way to set the current attribute setting

BUILD / INSTALL from Source

    1. Download a compressed file with the code

    2. Extract the code from the compressed file. If you are using tar this should work:

    tar -zxvf Spreadsheet-XLSX-Reader-LibXML-v0.xx.tar.gz

    3. Change (cd) into the extracted directory

(For Windows find what version of make was used to compile your perl)

perl  -V:make

Then (for Windows substitute the correct make function (s/make/dmake/g)?)

>perl Makefile.PL

>make

>make test

>make install # As sudo/root

>make clean

SUPPORT

TODO

    1. Add a pivot table reader (Not just read the values from the sheet)

    2. Add calc chain methods

    3. Add more exposure to workbook formatting methods

    4. Build a DOM parser alternative for the sheets (Theoretically faster than the reader but uses more memory)

AUTHOR

Jed Lund
jandrew@cpan.org

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

This software is copyrighted (c) 2014 by Jed Lund