NAME

dbfilepivot - pivot a table, converting multiple rows into single wide row

SYNOPSIS

dbfilepivot [-e empty] -k KeyField -p PivotField [-v ValueField]

DESCRIPTION

Pivot a table, converting multiple rows corresponding to the same key into a single wide row.

In a normalized database, one might have data with a schema like (id, attribute, value), but sometimes it's more convenient to see the data with a schema like (id, attribute1, attribute2). (For example, gnuplot's stacked histograms requires denormalized data.) Dbfilepivot converts the normalized format to the denormalized, but sometimes useful, format. Here the "id" is the key, the attribute is the "pivot", and the value is, well, the optional "value".

An example is clearer. A gradebook usually looks like:

#fsdb name hw_1 hw_2 hw_3
John       97  98  99
Paul       -   80  82

but a properly normalized format would represent it as:

#fsdb name hw score
John       1  97
John       2  98
John       3  99
Paul       2  80
Paul       3  82

This tool converts the second form into the first, when used as

dbfilepivot -k name -p hw -v score

or

dbfilepivot --possible-pivots='1 2 3' -k name -p hw -v score

Here name is the key column that indicates which rows belong to the same entity, hw is the pivot column that will be indicate which column in the output is relevant, and score is the value that indicates what goes in the output.

The pivot creates a new column key_tag1, key_tag2, etc. for each tag, the contents of the pivot field in the input. It then populates those new columns with the contents of the value field in the input.

If no value column is specified, then values are either empty or 1.

Dbfilepivot assumes all lines with the same key are adjacent in the input source, like dbmapreduce(1) with the -S option. To enforce this invariant, by default, it requires input be sorted by key.

There is no requirement that the pivot field be sorted (provided the key field is already sorted).

By default, dbfilepivot makes two passes over its data and so requires temporary disk space equal to the input size. With the --possible-pivots option, the user can specify pivots and skip the second pass and avoid temporary data storage.

Memory usage is proportional to the number of unique pivot values.

The inverse of this commend is dbcolsplittorows.

OPTIONS

-k or --key KeyField

specify which column is the key for grouping. Required (no default).

-p or --pivot PivotField

specify which column is the key to indicate which column in the output is relevant. Required (no default).

-v or --value ValueField

Specify which column is the value in the output. If none is given, 1 is used for the value.

--possible-pivots PP

Specify all possible pivot values as PP, a whitespace-separated list. With this option, data is processed only once (not twice).

-C S or --element-separator S

Specify the separator S used to join the input's key column with its contents. (Defaults to a single underscore.)

-e E or --empty E

give value E as the value for empty (null) records

-S or --pre-sorted

Assume data is already grouped by key. Provided twice, it removes the validation of this assertion. By default, we sort by key.

-T TmpDir

where to put tmp files. Also uses environment variable TMPDIR, if -T is not specified. Default is /tmp.

This module also supports the standard fsdb options:

-d

Enable debugging output.

-i or --input InputSource

Read from InputSource, typically a file name, or - for standard input, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.

-o or --output OutputDestination

Write to OutputDestination, typically a file name, or - for standard output, or (if in Perl) a IO::Handle, Fsdb::IO or Fsdb::BoundedQueue objects.

--autorun or --noautorun

By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The --(no)autorun option controls that behavior within Perl.

--help

Show help.

--man

Show full manual.

SAMPLE USAGE

Input:

#fsdb name hw score
John       1  97
John       2  98
John       3  99
Paul       2  80
Paul       3  82

Command:

cat data.fsdb | dbfilepivot -k name -p hw -v score

Output:

#fsdb name hw_1 hw_2 hw_3
John	97	98	99
Paul	-	80	82
#   | dbfilepivot -k name -p hw -v score

SEE ALSO

Fsdb(3). dbcolmerge(1). dbcolsplittorows(1). dbcolsplittocols(1).

CLASS FUNCTIONS

new

$filter = new Fsdb::Filter::dbfilepivot(@arguments);

Create a new dbfilepivot object, taking command-line arguments.

set_defaults

$filter->set_defaults();

Internal: set up defaults.

parse_options

$filter->parse_options(@ARGV);

Internal: parse command-line arguments.

_find_possible_pivots

$filter->_find_possible_pivots();

Internal: scan input data to find all possible pivot values.

Returns npivots, pivots_aref.

_given_possible_pivots

$filter->_given_possible_pivots();

Internal: parse option of possible pivots.

Returns npivots, pivots_href.

setup

$filter->setup();

Internal: setup, parse headers.

run

$filter->run();

Internal: run over each rows.

AUTHOR and COPYRIGHT

Copyright (C) 2011-2018 by John Heidemann <johnh@isi.edu>

This program is distributed under terms of the GNU general public license, version 2. See the file COPYING with the distribution for details.