NAME
Data::Tabular::Dumper - Seamlessly dump tabular data to XML, CSV and XLS.
SYNOPSIS
use Data::Tabular::Dumper;
$date=strftime('%Y%m%d', localtime);
my $dumper = Data::Tabular::Dumper->open(
XML => [ "$date.xml", "data" ],
CSV => [ "$date.csv", {} ],
Excel => [ "$date.xls" ]
);
# $data is a 2-d or 3-d data structure
$data = {
'0-monday' => { hits=>30, misses=>5, GPA=>0.42 },
'1-tuesday' => { hits=>17, misses=>3, GPA=>0.17 },
};
$dumper->dump( $data );
## If you want more control :
$dumper->page_start( "My Page" );
# what each field is called
$dumper->fields([qw(uri hits bytes)]);
# now output the data
foreach my $day (@$month) {
$dumper->write($day);
}
$dumper->page_end( "My Page" );
# sane shutdown
$dumper->close();
This would produce the following XML :
<?xml version="1.0" encoding="iso-8859-1"?>
<access>
<My_Page>
<page>
<uri>/index.html</uri>
<hits>4000</hits>
<bytes>5123412</bytes>
</page>
<page>
<uri>/something/index.html</uri>
<hits>400</hits>
<bytes>51234</bytes>
</page>
</My_Page>
<!-- more page tags here -->
</access>
DESCRIPTION
Data::Tabular::Dumper aims to make it easy to turn tabular data into as many file formats as possible. This is useful when you need to provide data that folks will then process further. Because you don't really know what format they want to use, you can provide as many as possible, and let them choose which they want.
Tabular data means data that has 2 dimensions, like a list of lists, a hash of lists, a list of hashes or a hash of hashes.
You may also dump 3 dimentional data; in this case, each of the top-level elements are called pages and each sub-element is independent.
While it might seem desirable to give an example for each data type, this would be onerous to maintain. Please look at the tests to see what a given data object yields.
2 DIMENSIONAL DATA
List of lists
Simplest type of data; each of the sub-lists is output as-is. For XML, the lowest elements are number 0, 1, etc.
Hash of lists
Each of the sub-lists is output prefixed with the key name. For XML, the lowest elements are number 0, 1, etc, with 0 being the key.
List of hashes
The bottom hashes keyed records, column names are hash keys, column values are hash values. Obviously, the list of column names has to be the same for all records, so all the keys in all the hashes are used. If a given hash doesn't have a key, it will be blank in the output at that position.
[ { camera=>"EOS 2000", price=>12000.00 },
{ camera=>"FinePix 1300", price=>150 },
]
This corresponds to the following table:
camera price
EOS 2000 12000.00
FinePix 1300 150.00
Note that keys are asciibetically sorted.
Hash of hashes
Similar to List of hashes
, except the first column is the key in the top hash. For XML the key is used instead of record
, unless you are using master_key
(see open). Keys are asciibetically sorted.
Example :
{ monday => { honk => 42, bonk=>17 },
wednesday => { honk => 12, blurf=>36 }
}
CSV and Excel would look like:
,blurf,bonk,honk
monday,,17,42
wednesday,36,12
The XML would look like:
<DATA>
<monday>
<bonk>17</bonk>
<honk>42</honk>
</monday>
<wednesday>
<blurf>36</blurf>
<honk>12</honk>
</wednesday>
</DATA>
3 DIMENSIONAL DATA
List of 2D data
Each element in the top list is a page. Pages are named Page 1, Page 2 and so on. Each 2D element is treated seperately as above.
Hash of lists of lists
Hash of lists of hashes
Hash of hashes of hashes
Each value in the top hash is a page. Pages are named by their keys. Each 2D element is treated seperately as above, as if you were doing:
foreach my $key ( sort keys %$HoX ) {
$dumper->page_start( $key );
$dumper->dump( $HoX->{$key} );
$dumper->page_send( $key );
}
Hash of hashes of lists
NOT SUPPORTED
FUNCTIONS
Dump( $data )
Calls dump
as a package method. In other words, it does the following:
Data::Tabular::Dumper->dump( $data );
Data::Tabular::Dumper METHODS
open(%writers)
Creates the Data::Tabular::Dumper object. %writers
is a hash that contains the the package of the object (as keys) and the parameters for it's open()
function (as values). As a convienience, the Data::Tabular::Dumper::* modules can be specified as XML, Excel or CSV. The example in the SYNOPSIS would create 3 objects, via the following calls :
$obj0 = Data::Tabular::Dumper::XML->open( ["$date.xml","users", "user"] );
$obj1 = Data::Tabular::Dumper::Excel->open( ["$date.xls"] );
$obj2 = Data::Tabular::Dumper::CSV->open( ["$date.xls", {}] );
Note that you must load a given package first. Data::Tabular::Dumper-
open> will not do so for you.
You may also create your own packages. See WRITER OBJECTS below.
There is one special key in %writers
:
- master_key
-
Sets the column name for the first column when dumping hash of lists, hash of hashes or the equivalent 3-D structures. The first column corresponds to the key names of the top hash.
close()
Does an orderly close of all the writers. Some of the writers need this to clean up data and write file footers properly. Note that DESTROY also calls close.
master_key( [$key] )
Sets the master_key
, returning old value. If called without a parameter, returns current master_key
.
dump( $data )
Analyses $data
, then dumps each of it's component objects to the configured files.
Dump
is not efficient. It must walk over the data 2 and sometimes 3 times. It may also modify your data, so watch out.
May also be called as a package method, in which case it returns a CSV representation of the data.
print $fh Data::Tabular::Dumper->dump( $data );
page_start( $name )
Opens a new page in each file named $name
. You must call fields() if you want it to have a header.
For XML, a page is an XML element that wraps all furthur data. The element's name is $name
with all non-word characters converted to an underscore ($name =~ s/\W/_/g
.)
page_end( $name )
Closes the current page. Please make sure $name
is identical to what was passed to page_start
.
fields($fieldref)
Sets the column headers to the values in the arrayref $fieldref. Calling this "fields" might be misdenomer. Field headers are often concidered a "special" row of data.
write($dataref)
Writes a row of data from the arrayref $dataref.
WRITER OBJECTS
An object must implement 4 methods for it to be useable by Data::Tabular::Dumper.
open($package, $p)
Create the object, opening any necessary files. $p
is the data handed to Data::Tabular::Dumper->open.
close()
Do any necesssary cleaning up, like outputing a footer, closing files, etc.
fields($fieldref)
Define the names of the fields. $fieldref
is an arrayref containing all the field headings.
write($dataref)
Write a row of data to the output. $dataref
is an arrayref containing a row of data to be output.
page_start($name) =head2 page_end($name)
Start and end a new page in the output. If it is called from dump, all pages are started and ended with the same $name
. If called from user code, all bets are off.
PREDEFINED WRITERS
Data::Tabular::Dumper::XML
Produces an XML file of the tabular data.
open($package, [$file_or_fh, $top, $record])
Opens the file $file_or_fh
for writing if it is a scalar. Otherwise $file_or_fh
is considered a filehandle. The top element is $top
and defaults to DATA. Each record is a $record
element and defaults to RECORD.
fields($fieldref)
Define the tag for each data value.
write($dataref)
Output a record. Each item in the arrayref $dataref
becomes an element named by the corresponding name set in fields()
. If there are more items in $dataref
then fields, the last field name is duplicated. If there are no fields defined, elementes are named 0, 1, etc.
Example :
$xml=Data::Tabular::Dumper::XML->open(['something.xml']);
$xml->fields([qw(foo bar)]);
$xml->write([0..5]);
Would produce the following XML :
<?xml version="1.0" encoding="iso-8859-1"?>
<DATA>
<RECORD>
<foo>0</foo>
<bar>1</bar>
<bar>2</bar>
<bar>3</bar>
<bar>4</bar>
<bar>5</bar>
</RECORD>
</DATA>
Likewise,
$xml=Data::Tabular::Dumper::XML->open(['something.xml']);
$xml->dump( [ [ { up=>1, down=>-1, left=>0.5, right=>-0.5 } ] ] );
$xml->close
Would produce the following XML :
<?xml version="1.0" encoding="iso-8859-1"?>
<DATA>
<Page_1>
<down>-1</down>
<left>0.5</left>
<right>-0.5</right>
<up>1</up>
</Page_1>
</DATA>
Data::Tabular::Dumper::CSV
Produces an CSV file of the tabular data. Each new page is started a row with the page name on it and ending with a blank line.
open($package, [$file_or_fh, $CSVattribs])
Opens the file $file_or_fh
for writing if it is a scalar. Otherwise $file_or_fh
is considered a filehandle. Creates a Text::CSV_XS object using the attributes in the hashref $CSVattribs
.
It should be noted that you probably want to set eol
to \n
, otherwise all the output will be on one line. See Text::CSV_XS
for details.
Example :
$xml=Data::Tabular::Dumper::CSV->open(['something.xml',
{eol=>"\n", binary=>1}]);
$xml->fields([qw(foo bar)]);
$xml->write("me,you", "other");
Would produce the following CSV :
foo,bar
"me,you",other
fields( $fieldref )
Outputs a row that contains the names of the fields. Basically, it's the same as write
.
Data::Tabular::Dumper::Excel
Produces an Excel workbook of the tabular data. Each page is a new worksheet.
If you want a header on each worksheet, you must call fields()
after each page is started. If you do not call page_start()
, a default empty worksheet is used. Note that dump()
handles all this for you.
open($package, [$file])
Creates the workbook $file
.
fields($fieldref)
Creates a row in bold from the elements in the arrayref $fieldref
.
BUGS
There are no test cases for all dump
's edge cases, such as non-heterogeous lower data elements.
There is no verification of the Excel workbooks produced.
No support for RDBMSes. I'm not fully sure how this would work... each page would be a table? What about lists as the lowest data structure? We'd need a way to match data columns to table columns.
close
should call page_end
if there is one pending.
AUTHOR
Philip Gwyn <perl at pied.nu>
SEE ALSO
Text::CSV_XS, Spreadsheet::WriteExcel, http://www.xml.org, perl.