NAME

dbcolsdecimate - drop rows selectively, keeping large changes and periodic samples

SYNOPSIS

dbcolsdecimate [-p RELATIVE_PREC] [-P ABSOLUTE_PREC] column1 [column2...]

DESCRIPTION

For each of the given columns, prune rows so that the output tracks the input to within a RELATIVE_PREC fraction of that column's total range (default: 0.01; alternatively, one can specify an absolute precision). This tool is designed to reduce the amount of data in a graph while keeping the graph visually identical.

Precisions, if specified, apply to any subsequent columns. (One can therefore have different precisions for different columns.)
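The "applies to subsequent columns" rule can be illustrated with a small sketch. This is not the program's actual option parser; the function name assign_precisions is hypothetical, and the sketch handles only the two precision options documented here, treating every other argument as a column name.

```python
def assign_precisions(args, default=("relative", 0.01)):
    """Map each column to the precision in effect when it was named.

    Illustrative only: handles just the documented precision options;
    any other argument is assumed to be a column name.
    """
    flags = {
        "-p": "relative",
        "--precision-relative": "relative",
        "--relative-precision": "relative",
        "-P": "absolute",
        "--precision-absolute": "absolute",
        "--absolute-precision": "absolute",
    }
    current = default
    result = {}
    it = iter(args)
    for arg in it:
        if arg in flags:
            # A precision option changes the setting for later columns only.
            current = (flags[arg], float(next(it)))
        else:
            result[arg] = current
    return result


print(assign_precisions(["-p", "0.1", "x", "-p", "0.2", "y"]))
# {'x': ('relative', 0.1), 'y': ('relative', 0.2)}
```

A column named before any precision option keeps the default relative precision of 0.01.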

With multiple columns, major changes in any column cause a record to be emitted.

Our goal is to output an identical plot, with fewer points if we can. This goal differs from, and is easier than, prior published work that aims to reduce the number of points by a known factor, or to a constant number, while preserving as much fidelity as possible.

We usually put out a pair of points at each change, so that if the data has stairsteps, they don't turn into diagonals.
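The behavior described above can be approximated with a minimal sketch. The rule is inferred from the sample output in SAMPLE USAGE rather than taken from the program's source, so treat it as an assumption: a row is emitted when any column differs from the last emitted row by more than that column's threshold, and the row just before it is emitted too. The function name decimate and the choice to always emit the final row are also assumptions.

```python
def decimate(rows, thresholds):
    """Keep rows where any column moved more than its threshold.

    rows: list of tuples of numbers; thresholds: one absolute threshold
    per column. Sketch only: the comparison rule is inferred from the
    documented sample output, not from the real implementation.
    """
    out = []
    if not rows:
        return out
    out.append(rows[0])          # the first row is always kept
    ref = rows[0]                # last emitted row, used for comparison
    prev = rows[0]
    prev_emitted = True
    for row in rows[1:]:
        changed = any(abs(row[i] - ref[i]) > thresholds[i]
                      for i in range(len(thresholds)))
        if changed:
            if not prev_emitted:
                out.append(prev)  # pair of points: keeps stairsteps square
            out.append(row)
            ref = row
            prev_emitted = True
        else:
            prev_emitted = False
        prev = row
    if not prev_emitted:
        out.append(prev)          # assumption: keep the final row
    return out


# A stairstep in y; the x threshold is set high so only y triggers output.
steps = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 10), (5, 10), (6, 10)]
print(decimate(steps, [100, 5]))
# [(0, 0), (3, 0), (4, 10), (6, 10)]
```

Emitting (3, 0) along with (4, 10) keeps the step vertical; without it, a plot would draw a diagonal from (0, 0) to (4, 10).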

Take care: relative precision is computed from the range of the data, so it is sensitive to outliers. Verbose output (-v) shows the actual precision that is promised, allowing one to adjust it manually if necessary (with -P).

By default this program temporarily stores a complete copy of the input data on disk. However, if all columns are given absolute precisions, this program runs with constant memory.

OPTIONS

--precision-relative P or --relative-precision P or -p P

Set the precision as a fraction of the total range that should be preserved. Applies to any subsequent columns. Default: 0.01.

--precision-absolute P or --absolute-precision P or -P P

Set the precision in absolute units. Applies to any subsequent columns.

-T TmpDir

Where to put temporary files. If -T is not specified, the environment variable TMPDIR is used. Default is /tmp.

This module also supports the standard fsdb options:

-d

Enable debugging output.

-v

Enable verbose output.

-i or --input InputSource

Read from InputSource, typically a file name, or - for standard input, or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.

-o or --output OutputDestination

Write to OutputDestination, typically a file name, or - for standard output, or (if in Perl) an IO::Handle, Fsdb::IO, or Fsdb::BoundedQueue object.

--autorun or --noautorun

By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The --(no)autorun option controls that behavior within Perl.

--help

Show help.

--man

Show full manual.

SAMPLE USAGE

Input:

#fsdb x y
0 0
1 50
2 50
3 50
4 50
5 50
6 50
7 50
8 50
9 50
10 50
11 50
12 50
13 50
14 50
15 50
16 50
17 50
18 50
19 50
20 50
21 50
22 50
23 50
24 50
25 50
26 50
27 50
28 50
29 50
30 50
31 50
32 50
33 50
34 50
35 50
36 50
37 50
38 50
39 50
40 50
41 50
42 50
43 50
44 50
45 50
46 50
47 50
48 50
49 50
50 50
50 51
50 52
50 53
50 54
50 55
50 56
50 57
50 58
50 59
50 60
50 61
50 62
50 63
50 64
50 65
50 66
50 67
50 68
50 69
50 70
50 71
50 72
50 73
50 74
50 75
50 76
50 77
50 78
50 79
50 80
50 81
50 82
50 83
50 84
50 85
50 86
50 87
50 88
50 89
50 90
50 91
50 92
50 93
50 94
50 95
50 96
50 97
50 98
50 99
100 100

Command:

dbcolsdecimate -v -p 0.1 x -p 0.2 y

Output:

(from TEST/dbcolsdecimate_linear_different.out):

#fsdb x y
# column x with range 100 and relative precision 0.1 gives threshold 10
# column y with range 100 and relative precision 0.2 gives threshold 20
0	0
1	50
11	50
12	50
22	50
23	50
33	50
34	50
44	50
45	50
50	70
50	71
50	91
50	92
50	99
100	100
# output 16 of 101 (0.1584)
#   | dbcolsdecimate -v -p 0.1 x -p 0.2 y

SEE ALSO

Fsdb, dbcolmovingstats.

CLASS FUNCTIONS

new

$filter = new Fsdb::Filter::dbcolsdecimate(@arguments);

Create a new dbcolsdecimate object, taking command-line arguments.

set_defaults

$filter->set_defaults();

Internal: set up defaults.

parse_options

$filter->parse_options(@ARGV);

Internal: parse command-line arguments.

setup

$filter->setup();

Internal: set up, parse headers.

run

$filter->run();

Internal: run over each row.

AUTHOR and COPYRIGHT

Copyright (C) 2023 by John Heidemann <johnh@isi.edu>

This program is distributed under the terms of the GNU General Public License, version 2. See the file COPYING with the distribution for details.