NAME

Statistics::Descriptive::Discrete - Compute descriptive statistics for discrete data sets.

To install, use CPAN; the module is listed at https://metacpan.org/pod/Statistics::Descriptive::Discrete.

SYNOPSIS

    use strict;
    use warnings;
    use Statistics::Descriptive::Discrete;

    my $stats = Statistics::Descriptive::Discrete->new();
    $stats->add_data(1,10,2,1,1,4,5,1,10,8,7);
    print "count = ",$stats->count(),"\n";
    print "uniq  = ",$stats->uniq(),"\n";
    print "sum = ",$stats->sum(),"\n";
    print "min = ",$stats->min(),"\n";
    print "min index = ",$stats->mindex(),"\n";
    print "max = ",$stats->max(),"\n";
    print "max index = ",$stats->maxdex(),"\n";
    print "mean = ",$stats->mean(),"\n";
    print "geometric mean = ",$stats->geometric_mean(),"\n";
    print "harmonic mean = ", $stats->harmonic_mean(),"\n";
    print "standard_deviation = ",$stats->standard_deviation(),"\n";
    print "variance = ",$stats->variance(),"\n";
    print "sample_range = ",$stats->sample_range(),"\n";
    print "mode = ",$stats->mode(),"\n";
    print "median = ",$stats->median(),"\n";
    my $f = $stats->frequency_distribution_ref(3);
    for (sort {$a <=> $b} keys %$f) {
      print "key = $_, count = $f->{$_}\n";
    }

DESCRIPTION

This module provides basic functions used in descriptive statistics. It borrows very heavily from Statistics::Descriptive::Full (which is included with Statistics::Descriptive), with one major difference: this module is optimized for discretized data, e.g., data from an analog-to-digital (A/D) conversion that has a discrete set of possible values.
For example, if your data is produced by an 8-bit A/D converter, there are only 256 possible values in your data set. Even if you have a million data points, those million points contain at most 256 distinct values. Instead of storing the entire data set as Statistics::Descriptive does, this module stores only the values seen and the number of times each value occurs.
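The storage idea can be illustrated with a plain Perl hash that maps each distinct value to its count. This is a minimal, self-contained sketch of the technique, not the module's actual source; the %freq hash and the add_data sub below are hypothetical stand-ins. Sums and means are computed by weighting each distinct value by its count, so the raw data points never need to be kept.

```perl
use strict;
use warnings;

# Illustrative sketch only (not the module's source): one entry per
# distinct value, mapping value => number of occurrences.
my %freq;

# Hypothetical add_data: a single hash increment per data point,
# so memory grows with the number of distinct values, not data points.
sub add_data { $freq{$_}++ for @_ }

# Simulated 8-bit A/D output: 100,000 points but only 256 possible values.
add_data(map { $_ % 256 } 1 .. 100_000);

# Statistics are computed by iterating over the distinct values,
# weighting each value by how many times it occurred.
my ($count, $sum) = (0, 0);
while (my ($value, $n) = each %freq) {
    $count += $n;
    $sum   += $value * $n;
}
my $uniq = keys %freq;    # scalar context: number of distinct values
my $mean = $sum / $count;

print "count = $count, uniq = $uniq, mean = $mean\n";
```

Here 100,000 data points are represented by just 256 hash entries.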

For very large data sets, this storage method results in significant speed and memory improvements. For example, for an 8-bit data set (256 possible values), with 1,000,000 data points, this module is about 10x faster than Statistics::Descriptive::Full or Statistics::Descriptive::Sparse.

Statistics::Descriptive's run time depends on the size of the data set; in particular, repeated calls to add_data are slow. Statistics::Descriptive::Discrete's add_data is optimized for speed. For a given number of data points, this module's run time increases with the number of unique values in the data set. For example, while this module runs about 10x faster than Statistics::Descriptive::Full on an 8-bit data set, the speedup drops to about 3x for a 20-bit data set of the same size.
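To see why run time tracks the number of unique values, consider computing a median from a value => count table: only the distinct values are sorted and walked, however many data points they represent. The median_from_freq helper below is a hypothetical sketch, not the module's median() implementation; it assumes the common convention of averaging the two middle values when the count is even, which may differ from the module's own behavior.

```perl
use strict;
use warnings;

# Hypothetical helper: median from a hash of value => count.
# Work is proportional to the number of distinct values (sorting the
# keys), not to the total number of data points.
sub median_from_freq {
    my ($freq) = @_;
    my @values = sort { $a <=> $b } keys %$freq;
    my $count = 0;
    $count += $freq->{$_} for @values;
    return unless $count;    # no data: undef in scalar context

    # 1-based ranks of the middle element(s); equal when $count is odd.
    my $lo_rank = int(($count + 1) / 2);
    my $hi_rank = int($count / 2) + 1;

    # Walk the sorted distinct values, accumulating counts until both
    # middle ranks have been reached.
    my ($seen, $lo, $hi) = (0);
    for my $v (@values) {
        $seen += $freq->{$v};
        $lo = $v if !defined $lo && $seen >= $lo_rank;
        if ($seen >= $hi_rank) { $hi = $v; last; }
    }
    return ($lo + $hi) / 2;
}

# Same data as the SYNOPSIS, stored as value => count.
my %freq;
$freq{$_}++ for (1, 10, 2, 1, 1, 4, 5, 1, 10, 8, 7);
print "median = ", median_from_freq(\%freq), "\n";    # prints "median = 4"
```

The eleven data points collapse to seven hash entries, so the sort and the walk touch seven values rather than eleven; with millions of 8-bit samples the difference is far larger.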

See sdd_prof.pl in the examples directory to profile this module against Statistics::Descriptive::Full.

METHODS

NOTE

The interface for this module strives to be identical to Statistics::Descriptive.
Any differences are noted in the description for each method.

BUGS

TODO

AUTHOR

Rhet Turnbull, rturnbull+cpan@gmail.com

CREDIT

Thanks to the following individuals for finding bugs, providing feedback, and submitting changes:

COPYRIGHT

Copyright (c) 2002, 2019 Rhet Turnbull. All rights reserved.  This
program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

Portions of this code are from Statistics::Descriptive, which is under
the following copyrights:

Copyright (c) 1997,1998 Colin Kuskie. All rights reserved.  This
program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

Copyright (c) 1998 Andrea Spinelli. All rights reserved.  This program
is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.

Copyright (c) 1994,1995 Jason Kastner. All rights
reserved.  This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

SEE ALSO

Statistics::Descriptive

Statistics::Discrete