NAME

Excel::ValueWriter::XLSX - generating data-only Excel workbooks in XLSX format, fast

SYNOPSIS

my $writer = Excel::ValueWriter::XLSX->new;
$writer->add_sheet($sheet_name1, $table_name1, [qw/a b tot/], [[1, 2, '=[a]+[b]'],
                                                               [3, 4],
                                                               ['TRUE', 'FALSE'],
                                                              ]);
$writer->add_sheet($sheet_name2, $table_name2, \@headers, $row_generator);
$writer->add_sheets_from_database($dbh);
$writer->add_defined_name($name, $formula, $comment);
$writer->save_as($filename);

$writer = Excel::ValueWriter::XLSX->new(bool_regex => qr[^(?:(VRAI)|FAUX)$]);
$writer->add_sheet($sheet_name1, $table_name1, [qw/a b/], [['I like Perl:', 'VRAI']]);

DESCRIPTION

The present module is aimed at fast and cost-effective production of "data-only" Excel workbooks, containing nothing but plain values and formulas.

CPU and memory usage are much lower than with the well-known Excel::Writer::XLSX; however the set of features is also much more restricted : there is no support for formats, colors, figures or other fancy Excel features.

Such workbooks with plain data are useful for example :

  • in architectures where Excel is used merely as a local database, for example in connection with a Power Pivot architecture (for that purpose, see also the companion module Excel::PowerPivot::Utils);

  • for fine tuning of the ZIP compression level to be applied to the generated .xlsx file;

  • for generating files with large amounts of data (this was the original motivation for writing this module, because Excel::Writer::XLSX tends to become very slow on files with large numbers of rows).

METHODS

new

my $writer = Excel::ValueWriter::XLSX->new(%options);

Constructor for a new writer object. Options are :

date_regex

A compiled regular expression for detecting data cells that contain dates. The default implementation recognizes dates in dd.mm.yyyy, yyyy-mm-dd and mm/dd/yyyy formats. User-supplied regular expressions should use named captures so that the day, month and year values can be found respectively in $+{d}, $+{m} and $+{y}.

bool_regex

A compiled regular expression for detecting data cells that contain boolean values. The default implementation recognizes uppercase strings 'TRUE' or 'FALSE' as booleans. User-supplied regular expressions should put the word corresponding to 'TRUE' within parenthesis so that the content is captured in $1. Here is an example for french :

$writer = Excel::ValueWriter::XLSX->new(bool_regex => qr[^(?:(VRAI)|FAUX)$]);
compression_level

A number from 0 (no compression) to 9 (maximum compression) specifying the desired ZIP compression level. High values produce smaller files but consume more CPU. The default is taken from COMPRESSION_LEVEL_DEFAULT in Archive::Zip, which amounts to 6.

add_sheet

$writer->add_sheet($sheet_name, $table_name, [$headers,] $rows);

Adds a new worksheet into the workbook.

  • The $sheet_name is mandatory; it must be unique and between 1 and 31 characters long.

  • If $table_name is not undef, the sheet contents will be registered as an Excel table of that name. Excel tables offer more features than regular ranges of cells, so generally it is a good idea to always assign a table name. Table names must be unique, of minimum 3 characters, without spaces or special characters. Technically table names could be equal to sheet names, but this is not recommended because it may create confusion if the data is later to be referred to from Power Query or from Power Pivot.

  • The $headers argument is optional; it may be undef or may even be absent. If present, it should contain an arrayref of scalar values, that will be used as column names for the table associated with that worksheet. Column names should be unique (otherwise Excel will automatically add a discriminating number). If $headers are not present, the first row in $rows will be treated as headers.

  • The $rows argument may be either a reference to a 2-dimensional array of values, or a reference to a callback function that will return a new row at each call, in the form of a 1-dimensional array reference. An empty return from the callback function signals the end of data (but intermediate empty rows may be returned as []). Callback functions should typically be closures over a lexical variable used to decide when the last row has been met. Here is an example of a callback function used to feed a sheet with 500 lines of 300 columns of random numbers:

    my @headers_for_rand = map {"h$_"} 1 .. 300;
    my $random_rows = do {my $count = 500; sub {$count-- > 0 ? [map {rand()} 1 .. 300] : undef}};
    $writer->add_sheet(RAND_SHEET => rand => \@headers_for_rand, $random_rows);

Cells within a row must contain scalar values. Values that look like numbers are treated as numbers. String values that match the date_regex are converted into numbers and displayed through a date format. String values that start with an initial '=' are treated as formulas; but like in Excel, if you want regular string that starts with a '=', put a single quote just before the '=' -- that single quote will be removed from the string. Everything else is treated as a string. Strings are shared at the workbook level (hence a string that appears several times in the input data will be stored only once within the workbook).

add_sheets_from_database

$writer->add_sheets_from_database($dbh, $sheet_prefix, @table_names);

Gets data from database tables and adds them as sheets into the Excel workbook. Arguments are :

$dbh

An active DBI database handle

$sheet_prefix

A string that will be prepended at the beginning of each worksheet name, so that they are different from names of Excel tables. The default is 'S.'.

@table_names

The list of tables to be read from the database. If empty, table names are retrieved automatically from the database through the "table_info" in DBI method.

add_defined_name

$writer->add_defined_name($name, $formula, $comment);

Adds a "defined name" to the workbook. Defined names can be used in any formula within the workbook, and will be replaced by the corresponding content.

  • $name is mandatory and must be unique

  • $formula is mandatory and will be interpreted by Excel like a formula. References to ranges should include the sheet name and use absolute coordinates; for example for concatenating two cells in sheet 's1', the formula is :

    $writer->add_defined_name(cells_1_and_2 => q{'s1'!$A$1&'s1'!$A$2});

    If the intended content is just a constant string, it must be enclosed in double quotes, i.e.

    $writer->add_defined_name(my_string => q{"my_constant_value"});
  • $comment is optional; it will appear when users consult the name manager in the Formulas tab.

save_as

$writer->save_as($target);

Writes the workbook contents into the specified $target, which can be either a filename or filehandle opened for writing.

ARCHITECTURAL NOTES

Object-orientedness

Although I'm a big fan of Moose and its variants, the present module is implemented in POPO (Plain Old Perl Object) : since the aim is to maximize cost-effectiveness, and since the object model is extremely simple, there was no ground for using a sophisticated object system.

Benchmarks

I did a couple of measurements that demonstrate the effectiveness of this module on large datasets; but the results cannot be included into this distribution yet, because the tests are not sufficiently formalized and they are based on on data that is not public.

To do

- options for workbook properties : author, etc.
- support for 1904 date schema
- reproducible benchmarks

SEE ALSO

Excel::Writer::XLSX, Excel::PowerPivot::Utils.

AUTHOR

Laurent Dami, <dami at cpan.org>

COPYRIGHT AND LICENSE

Copyright 2022, 2023 by Laurent Dami.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.