NAME
dbcolsdecimate - drop rows selectively, keeping large changes and periodic samples
SYNOPSIS
dbcolsdecimate [-p RELATIVE_PREC] [-P ABSOLUTE_PREC] column1 [column2...]
DESCRIPTION
For each of the given columns, prune it back so that the output still shows any change larger than RELATIVE_PREC, given as a fraction of that column's total range (default: 0.01; alternatively, one can specify an absolute precision). This tool is designed to reduce the amount of data behind a graph while keeping the graph visually identical.
Precisions, if specified, apply to any subsequent columns. (One can therefore have different precisions for different columns.)
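The "applies to any subsequent columns" rule can be illustrated with a short sketch. This is Python written only for illustration, not the tool's Perl option parser; the function name assign_precisions and the tuple layout are hypothetical.

```python
def assign_precisions(args, default=("relative", 0.01)):
    """Pair each column with the most recent precision option before it."""
    relative_flags = ("-p", "--precision-relative", "--relative-precision")
    absolute_flags = ("-P", "--precision-absolute", "--absolute-precision")
    current = default              # used until the first precision option appears
    columns = []
    it = iter(args)
    for arg in it:
        if arg in relative_flags:
            current = ("relative", float(next(it)))
        elif arg in absolute_flags:
            current = ("absolute", float(next(it)))
        else:
            columns.append((arg, current))   # a column name takes the current precision
    return columns


print(assign_precisions(["-p", "0.1", "x", "-p", "0.2", "y"]))
# [('x', ('relative', 0.1)), ('y', ('relative', 0.2))]
```

A column listed before any precision option keeps the default relative precision of 0.01.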
With multiple columns, major changes in any column cause a record to be emitted.
Our goal is to output an identical plot, with fewer points if we can. This goal differs from, and is easier than, prior published work that aims to reduce the number of points by a known factor, or to a constant number, while preserving as much fidelity as possible.
We usually put out a pair of points at each change, so that if the data has stairsteps, they don't turn into diagonals.
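The behavior described above can be sketched as follows. This is a minimal Python illustration inferred from the example in SAMPLE USAGE, not the actual Perl implementation. It assumes a row is emitted when any column differs from the last emitted row by strictly more than that column's threshold, that the row just before the change is emitted with it, and that the final row is always flushed. The name decimate is hypothetical.

```python
def decimate(rows, thresholds):
    """Keep rows where any column moved more than its absolute threshold.

    rows: list of tuples of numbers; thresholds: one value per column.
    """
    if not rows:
        return []
    kept = [rows[0]]                 # always keep the first row
    ref = rows[0]                    # last row that triggered output
    prev, prev_kept = rows[0], True
    for row in rows[1:]:
        changed = any(abs(v - r) > t for v, r, t in zip(row, ref, thresholds))
        if changed:
            if not prev_kept:
                kept.append(prev)    # row just before the jump, so stairsteps stay square
            kept.append(row)
            ref = row
        prev, prev_kept = row, changed
    if not prev_kept:
        kept.append(prev)            # assumption: flush the final row so the plot ends with the data
    return kept


# Same data as SAMPLE USAGE: a jump, a flat run in y, a vertical run in x, a final jump.
rows = ([(0, 0)] + [(x, 50) for x in range(1, 51)]
        + [(50, y) for y in range(51, 100)] + [(100, 100)])
out = decimate(rows, thresholds=(10, 20))
print(len(out), "of", len(rows))     # 16 of 101
```

Under these assumptions the sketch keeps the same 16 rows as the sample output for thresholds 10 (x) and 20 (y).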
Note that relative precision is computed from the range of the data, so it is sensitive to outliers. Verbose output (-v) shows the actual precision that is promised, allowing one to adjust it manually if necessary (with -P).
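To see why outliers matter, consider how a relative precision turns into an absolute threshold. The verbose output in SAMPLE USAGE suggests the threshold is the precision times the column's range (range 100 and precision 0.1 give threshold 10). The Python sketch below assumes exactly that formula; the function name relative_threshold is hypothetical.

```python
def relative_threshold(values, precision=0.01):
    """Absolute threshold implied by a relative precision: precision * (max - min)."""
    return precision * (max(values) - min(values))


clean = list(range(0, 101))        # values 0..100, so the range is 100
with_outlier = clean + [10_000]    # a single outlier stretches the range to 10000

t_clean = relative_threshold(clean, 0.1)           # about 10
t_outlier = relative_threshold(with_outlier, 0.1)  # about 1000
print(t_clean, t_outlier)
```

With the outlier present, changes of up to about 1000 would be pruned, which hides all structure in the 0 to 100 data. In such a case one would pass the intended threshold directly with -P.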
By default this program temporarily stores a complete copy of the input data on disk. However, if all columns are given absolute precisions, this program runs with constant memory.
OPTIONS
- --precision-relative P or --relative-precision P or -p P
Set the precision as the fraction of the total range that should be preserved. Applies to any subsequent columns. Default: 0.01.
- --precision-absolute P or --absolute-precision P or -P P
Set the precision in absolute units. Applies to any subsequent columns.
- -T TmpDir
Where to put temporary files. If -T is not specified, the environment variable TMPDIR is used. Default is /tmp.
This module also supports the standard fsdb options:
- -d
Enable debugging output.
- -v
Enable verbose output.
- -i or --input InputSource
Read from InputSource, typically a file name, or - for standard input, or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue object.
- -o or --output OutputDestination
Write to OutputDestination, typically a file name, or - for standard output, or (if in Perl) an IO::Handle, Fsdb::IO or Fsdb::BoundedQueue object.
- --autorun or --noautorun
By default, programs process automatically, but Fsdb::Filter objects in Perl do not run until you invoke the run() method. The --(no)autorun option controls that behavior within Perl.
- --help
Show help.
- --man
Show full manual.
SAMPLE USAGE
Input:
#fsdb x y
0 0
1 50
2 50
3 50
4 50
5 50
6 50
7 50
8 50
9 50
10 50
11 50
12 50
13 50
14 50
15 50
16 50
17 50
18 50
19 50
20 50
21 50
22 50
23 50
24 50
25 50
26 50
27 50
28 50
29 50
30 50
31 50
32 50
33 50
34 50
35 50
36 50
37 50
38 50
39 50
40 50
41 50
42 50
43 50
44 50
45 50
46 50
47 50
48 50
49 50
50 50
50 51
50 52
50 53
50 54
50 55
50 56
50 57
50 58
50 59
50 60
50 61
50 62
50 63
50 64
50 65
50 66
50 67
50 68
50 69
50 70
50 71
50 72
50 73
50 74
50 75
50 76
50 77
50 78
50 79
50 80
50 81
50 82
50 83
50 84
50 85
50 86
50 87
50 88
50 89
50 90
50 91
50 92
50 93
50 94
50 95
50 96
50 97
50 98
50 99
100 100
Command:
dbcolsdecimate -v -p 0.1 x -p 0.2 y
Output:
(from TEST/dbcolsdecimate_linear_different.out):
#fsdb x y
# column x with range 100 and relative precision 0.1 gives threshold 10
# column y with range 100 and relative precision 0.2 gives threshold 20
0 0
1 50
11 50
12 50
22 50
23 50
33 50
34 50
44 50
45 50
50 70
50 71
50 91
50 92
50 99
100 100
# output 16 of 101 (0.1584)
# | dbcolsdecimate -v -p 0.1 x -p 0.2 y
SEE ALSO
CLASS FUNCTIONS
new
$filter = new Fsdb::Filter::dbcolsdecimate(@arguments);
Create a new dbcolsdecimate object, taking command-line arguments.
set_defaults
$filter->set_defaults();
Internal: set up defaults.
parse_options
$filter->parse_options(@ARGV);
Internal: parse command-line arguments.
setup
$filter->setup();
Internal: setup, parse headers.
run
$filter->run();
Internal: run over each row.
AUTHOR and COPYRIGHT
Copyright (C) 2023 by John Heidemann <johnh@isi.edu>
This program is distributed under the terms of the GNU General Public License, version 2. See the file COPYING with the distribution for details.