NAME

dbcolpercentile - compute percentiles or ranks for an existing column

SYNOPSIS

dbcolpercentile [-rplhS] column

DESCRIPTION

Compute the percentile or rank of each value in a column of numbers. The new column is named percentile or rank, depending on which is requested. Non-numeric records are handled as in other programs (ignored by default; see -a).

If the data is pre-sorted and only a rank is requested, no extra storage is required. In all other cases, a full copy of the data is buffered on disk.
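
The storage claim above is plausible because a rank depends only on a row's position in sorted order, so it can be emitted as each row streams past. The following is a minimal Python sketch of that one-pass case, not the tool's Perl implementation. The function name stream_ranks is hypothetical, and the tie handling (tied values share the rank of the first of them) is inferred from the output shown in SAMPLE USAGE.

```python
def stream_ranks(sorted_values):
    """Yield (value, rank) pairs from already-sorted input in a single pass."""
    previous = object()  # sentinel that compares unequal to any real value
    rank = 0
    for position, value in enumerate(sorted_values, start=1):
        # A new value takes its position as rank; a repeated value keeps
        # the rank of the first row in its run of ties.
        if value != previous:
            rank = position
        previous = value
        yield value, rank


ranked = list(stream_ranks([90, 90, 80, 70, 70, 65]))
print(ranked)
```

Only the previous value and the current rank are held in memory, which is why no buffering is needed in this case.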

OPTIONS

-p or --percentile

Show percentile (default).

-P or --rank or --nopercentile

Compute ranks instead of percentiles.

--fraction

Show fraction (like a percentage, but between 0 and 1; not a cumulative fraction).

-a or --include-non-numeric

Compute stats over all records (treat non-numeric records as zero rather than just ignoring them).

-S or --pre-sorted

Assume data is already sorted. With one -S, we check and confirm this precondition. When repeated, we skip the check.
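
As an illustration of what checking this precondition might involve, here is a hedged Python sketch that passes values through unchanged and fails on the first out-of-order pair. The name checked_sorted and the use of ValueError are assumptions made for the example; the actual tool's check and error reporting may differ.

```python
def checked_sorted(values, descending=False):
    """Yield values unchanged, raising ValueError if they are out of order."""
    previous = None
    for value in values:
        if previous is not None:
            # Equal neighbors are allowed; only a strict inversion is an error.
            out_of_order = value > previous if descending else value < previous
            if out_of_order:
                raise ValueError(
                    f"input not sorted: {value!r} follows {previous!r}"
                )
        previous = value
        yield value


ascending_ok = list(checked_sorted([65, 70, 70, 80, 90, 90]))
print(ascending_ok)
```

Skipping the check, as a repeated -S does, corresponds to consuming the values directly without this wrapper.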

-f FORMAT or --format FORMAT

Specify a printf(3)-style format for output statistics. Defaults to %.5g.
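
To show what the default format does to typical values, this short Python sketch applies %.5g using Python's printf-style operator, which follows the same conversion rules for %g. The sample values are chosen for illustration only.

```python
samples = [1.0, 0.5, 2 / 3, 1 / 6, 123456.789]

# %.5g keeps at most five significant digits, drops trailing zeros,
# and switches to exponent notation for large magnitudes.
formatted = ["%.5g" % value for value in samples]
for text in formatted:
    print(text)
# prints: 1, 0.5, 0.66667, 0.16667, 1.2346e+05 (one per line)
```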

-T TmpDir

Where to put temporary files. If -T is not specified, the environment variable TMPDIR is used. The default is /tmp.

Sort specification options (can be interspersed with column names):

-r or --descending

sort in reverse order (high to low)

-R or --ascending

sort in normal order (low to high)

-n or --numeric

sort numerically (default)

-N or --lexical

sort lexicographically

This module also supports the standard fsdb options:

-d

Enable debugging output.

-i or --input InputSource

Read from InputSource, typically a file name, or - for standard input, or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue object.

-o or --output OutputDestination

Write to OutputDestination, typically a file name, or - for standard output, or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue object.

--autorun or --noautorun

By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The --(no)autorun option controls that behavior within Perl.

--help

Show help.

--man

Show full manual.

SAMPLE USAGE

Input:

#fsdb name id test1
a 1 80
b 2 70
c 3 65
d 4 90
e 5 70
f 6 90

Command:

cat DATA/grades.fsdb | dbcolpercentile test1

Output:

#fsdb name id test1 percentile
d	4	90	1
f	6	90	1
a	1	80	0.66667
b	2	70	0.5
e	5	70	0.5
c	3	65	0.16667
#  | dbsort -n test1
#   | dbcolpercentile test1

Command 2:

cat DATA/grades.fsdb | dbcolpercentile --rank test1

Output 2:

#fsdb name id test1 rank
d	4	90	1
f	6	90	1
a	1	80	3
b	2	70	4
e	5	70	4
c	3	65	6
#  | dbsort -n test1
#   | dbcolpercentile --rank test1
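
The two sample outputs above are consistent with ranking from high to low, giving tied values the rank of the first of them, and computing the percentile as (n - rank + 1) / n for n rows. That formula is inferred from the sample numbers, not taken from the program's source. A minimal Python sketch under that assumption (the function name rank_and_percentile is hypothetical):

```python
def rank_and_percentile(values, descending=True):
    """Map each distinct value to (rank, percentile), with ties sharing a rank."""
    ordered = sorted(values, reverse=descending)
    n = len(ordered)
    result = {}
    for position, value in enumerate(ordered, start=1):
        # Only the first occurrence of a value sets its rank (1, 1, 3, 4, 4, 6).
        if value not in result:
            rank = position
            # Assumed formula, inferred from the sample output.
            result[value] = (rank, (n - rank + 1) / n)
    return result


scores = [80, 70, 65, 90, 70, 90]  # the test1 column from the sample input
table = rank_and_percentile(scores)
for value in sorted(table, reverse=True):
    rank, pct = table[value]
    print(value, rank, "%.5g" % pct)
```

Run on the test1 column, this reproduces the rank and percentile values shown in both outputs.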

SEE ALSO

Fsdb. dbcolhisto.

CLASS FUNCTIONS

new

$filter = new Fsdb::Filter::dbcolpercentile(@arguments);

Create a new dbcolpercentile object, taking command-line arguments.

set_defaults

$filter->set_defaults();

Internal: set up defaults.

parse_options

$filter->parse_options(@ARGV);

Internal: parse command-line arguments.

setup

$filter->setup();

Internal: set up, parse headers.

_count_rows

$n = $self->_count_rows()

Interpose a filter on $self->{_in} that counts the rows.

run

$filter->run();

Internal: run over each row.

AUTHOR and COPYRIGHT

Copyright (C) 1991-2008 by John Heidemann <johnh@isi.edu>

This program is distributed under the terms of the GNU General Public License, version 2. See the file COPYING with the distribution for details.