PDL::Stats::Basic -- basic statistics and related utilities
The terms FUNCTIONS and METHODS are used, somewhat arbitrarily, to refer to methods that are threadable and methods that are NOT threadable, respectively.
This module does not provide mean or median functions; see PDL::Ufunc under SEE ALSO.
use PDL::LiteF;
use PDL::NiceSlice;
use PDL::Stats::Basic;
my $stdv = $data->stdv;
# or, equivalently, as a function call:
my $stdv = stdv( $data );
Signature: (a(n); float+ [o]b())
Sample standard deviation.
stdv does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); float+ [o]b())
Unbiased estimate of population standard deviation.
stdv_unbiased does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); float+ [o]b())
Sample variance.
var does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); float+ [o]b())
Unbiased estimate of population variance.
var_unbiased does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
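The four dispersion measures above differ only in the divisor and in whether a square root is taken. The following is a minimal plain-Python sketch of that arithmetic, not PDL code. The function names mirror the PDL ones only for readability, and it assumes the "sample" versions divide by N while the "unbiased" versions divide by N - 1.

```python
import math

def ss(xs):
    """Sum of squared deviations from the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def var(xs):
    """Sample variance: divide by N (assumed definition)."""
    return ss(xs) / len(xs)

def var_unbiased(xs):
    """Unbiased population estimate: divide by N - 1 (assumed definition)."""
    return ss(xs) / (len(xs) - 1)

def stdv(xs):
    return math.sqrt(var(xs))

def stdv_unbiased(xs):
    return math.sqrt(var_unbiased(xs))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, sum of squares 32
print(var(data), stdv(data))        # 4.0 2.0
print(var_unbiased(data))           # 32 / 7
```

The gap between the two variants shrinks as N grows, since N / (N - 1) approaches 1.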
Signature: (a(n); float+ [o]b())
# 95% confidence interval for samples with large N
$ci_95_upper = $data->average + 1.96 * $data->se;
$ci_95_lower = $data->average - 1.96 * $data->se;
Standard error of the mean. Useful for calculating confidence intervals.
se does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
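As an illustration of the confidence-interval usage above, here is a plain-Python sketch (not PDL code). It assumes the standard error is computed as sqrt(var_unbiased / N); the helper name ci_95 is hypothetical.

```python
import math

def se(xs):
    """Standard error of the mean, assumed here to be sqrt(var_unbiased / N)."""
    n = len(xs)
    mean = sum(xs) / n
    var_unbiased = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(var_unbiased / n)

def ci_95(xs):
    """Large-sample 95% interval: mean +/- 1.96 * se (hypothetical helper)."""
    mean = sum(xs) / len(xs)
    half_width = 1.96 * se(xs)
    return mean - half_width, mean + half_width

data = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = ci_95(data)
print(se(data), lower, upper)
```

With only 8 observations the 1.96 multiplier is illustrative at best; for small N a t-based multiplier is the usual choice.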
Signature: (a(n); float+ [o]b())
Sum of squared deviations from the mean.
ss does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); float+ [o]b())
Sample skewness, a measure of asymmetry in the data. Skewness is 0 for a normal distribution.
skew does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); float+ [o]b())
Unbiased estimate of population skewness. This is the value reported by Gnumeric's Descriptive Statistics tool.
skew_unbiased does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); float+ [o]b())
Sample kurtosis, a measure of the "peakedness" of the data. Kurtosis is 0 for a normal distribution, i.e. this is excess kurtosis.
kurt does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
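The sample skewness and sample kurtosis can be written in terms of central moments. The sketch below is plain Python rather than PDL, and it assumes the conventional moment-based definitions: skew = m3 / m2^1.5 and kurt = m4 / m2^2 - 3, where the subtraction of 3 is what makes a normal distribution score 0. The helper central_moment is hypothetical.

```python
def central_moment(xs, k):
    """k-th central moment with divisor N (hypothetical helper)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skew(xs):
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurt(xs):
    # Excess kurtosis: subtract 3 so that a normal distribution gives 0.
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 1, 10]
print(skew(symmetric), kurt(symmetric))   # no asymmetry, flatter than normal
print(skew(right_tailed))                 # positive: long right tail
```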
Signature: (a(n); float+ [o]b())
Unbiased estimate of population kurtosis. This is the value reported by Gnumeric's Descriptive Statistics tool.
kurt_unbiased does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); b(n); float+ [o]c())
Sample covariance. See corr for ways to call it.
cov does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
Signature: (a(n); b(n); float+ [o]c())
perldl> $a = random 5, 3
perldl> $b = sequence 5,3
perldl> p $a->corr($b)
[0.20934208 0.30949881 0.26713007]
For a square correlation table:
perldl> p $a->corr($a->dummy(1,1))
[ 1 -0.41995259 -0.029301192]
[ -0.41995259 1 -0.61927619]
[-0.029301192 -0.61927619 1]
Pearson correlation coefficient: r = cov(X,Y) / (stdv(X) * stdv(Y)).
corr does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
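To make the formula r = cov(X,Y) / (stdv(X) * stdv(Y)) concrete, here is a small plain-Python sketch (not PDL code). It assumes the sample covariance and standard deviation both use divisor N; any common divisor cancels in r.

```python
import math

def cov(a, b):
    """Sample covariance with divisor N (assumed definition)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def stdv(a):
    # The variance of a list is its covariance with itself.
    return math.sqrt(cov(a, a))

def corr(a, b):
    return cov(a, b) / (stdv(a) * stdv(b))

x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
print(cov(x, y), corr(x, y))
```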
Signature: (r(); n(); [o]t())
$corr = $data->corr( $data->dummy(1,1) );
$n = $data->n_pair( $data->dummy(1,1) );
$t_corr = $corr->t_corr( $n );
use PDL::GSL::CDF;
$p_2tail = 2 * (1 - gsl_cdf_tdist_P( $t_corr->abs, $n-2 ));
t significance test for Pearson correlations.
t_corr does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
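The t statistic for a Pearson correlation is conventionally t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom, which matches the $n-2 passed to gsl_cdf_tdist_P above. A plain-Python sketch of that formula, assuming this conventional form:

```python
import math

def t_corr(r, n):
    """t statistic for a Pearson r computed from n pairs; df is n - 2."""
    return r * math.sqrt((n - 2) / (1 - r * r))

print(t_corr(0.5, 27))
```

The same r becomes more significant as n grows, which is why the pair count from n_pair is needed alongside the correlation.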
Signature: (a(n); b(n); int [o]c())
Returns the number of good pairs between two lists, i.e. positions where neither value is bad. Useful with corr, especially when bad values are involved.
n_pair does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
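The pair count can be pictured as follows. This plain-Python sketch is not PDL code and uses None as a stand-in for a bad value, which is an assumption made for illustration.

```python
def n_pair(a, b):
    """Count positions where both values are present (None stands in for BAD)."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None)

a = [1.0, None, 3.0, 4.0]
b = [1.5, 2.5, None, 4.5]
print(n_pair(a, b))   # only positions 0 and 3 are complete pairs
```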
Signature: (a(n); b(n); float+ [o]c())
$corr = $a->dev_m->corr_dev($b->dev_m);
Calculates correlations from dev_m values (deviations from the mean). This seems faster than calling corr on the original values when the data pdl is big.
corr_dev does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
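Once the means have been subtracted, the correlation reduces to a ratio of sums of products. A plain-Python sketch of that shortcut, not PDL code, assuming dev_m returns each value minus the mean:

```python
import math

def dev_m(xs):
    """Deviations from the mean (assumed meaning of dev_m)."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]

def corr_dev(da, db):
    """Pearson r computed directly from two lists of deviations."""
    cross = sum(x * y for x, y in zip(da, db))
    return cross / math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))

a = [1, 2, 3, 4]
b = [2, 1, 4, 3]
print(corr_dev(dev_m(a), dev_m(b)))
```

Computing the deviations once pays off when the same variable is correlated against many others.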
Signature: (a(n); b(m); float+ [o]t(); [o]d())
my ($t, $df) = t_test( $pdl1, $pdl2 );
use PDL::GSL::CDF;
my $p_2tail = 2 * (1 - gsl_cdf_tdist_P( $t->abs, $df ));
Independent-samples t-test, assuming equal variances.
t_test does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
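For reference, the textbook equal-variance t-test pools the two sums of squares. The plain-Python sketch below (not PDL code) assumes that textbook form, with t = (m1 - m2) / sqrt(sp^2 * (1/n1 + 1/n2)) and df = n1 + n2 - 2.

```python
import math

def t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, df)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

t, df = t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df)
```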
Signature: (a(n); b(m); float+ [o]t(); [o]d())
my ($t, $df) = $pdl1->t_test_nev( $pdl2 );
Independent-samples t-test NOT assuming equal variances, i.e. the Welch two-sample t-test. Df follows the Welch-Satterthwaite equation rather than Satterthwaite (1946, as cited by Hays, 1994, 5th ed.). The result matches Gnumeric, which matches R.
t_test_nev does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
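The Welch-Satterthwaite degrees of freedom mentioned above can be sketched in plain Python (not PDL code). This assumes the standard form of the equation, with each group's unbiased variance divided by its own N.

```python
import math

def t_test_nev(a, b):
    """Welch two-sample t-test; returns (t, df) with Welch-Satterthwaite df."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    v1 = sum((x - m1) ** 2 for x in a) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in b) / (n2 - 1)
    q1, q2 = v1 / n1, v2 / n2          # per-group squared standard errors
    t = (m1 - m2) / math.sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df

t_eq, df_eq = t_test_nev([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
t_ne, df_ne = t_test_nev([1, 2, 3, 4, 5], [0, 5, 10, 15, 20])
print(t_eq, df_eq)
print(t_ne, df_ne)
```

When the two groups have equal size and equal variance, df comes out as n1 + n2 - 2; as the variances diverge it falls toward the smaller group's N - 1, so df is generally not an integer.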
Signature: (a(n); b(n); float+ [o]t(); [o]d())
Paired-samples t-test.
t_test_paired does handle bad values. It will set the bad-value flag of all output piddles if the flag is set for any of the input piddles.
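A paired test is a one-sample test on the pairwise differences. The plain-Python sketch below (not PDL code) assumes the textbook form: differences are taken as a minus b, t is the mean difference over its standard error, and df = n - 1.

```python
import math

def t_test_paired(a, b):
    """Paired t-test on differences a - b; returns (t, df)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1

t, df = t_test_paired([5, 7, 9, 10], [4, 5, 8, 7])
print(t, df)
```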
get_data reads either a file or a file handle*. It returns an observation x variable pdl, plus the var and obs ids if specified. Ids are returned as perl array refs to allow for non-numeric ids. Other non-numeric entries are treated as missing: they are filled with $opt{MISSN} and then set to BAD**. The number of data rows to read from the top can be specified, but not an arbitrary range.
*If passed a handle, get_data will not close it.
**PDL::Bad::setvaltobad only works consistently with the default TYPE double before PDL-2.4.4_04.
Default options (case insensitive):
V => 1, # prints simple status
TYPE => double,
C_ID => 1,
R_ID => 1,
R_VAR => 0, # set to 1 if var in rows
SEP => "\t", # can take regex qr//
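MISSN => -999, # fill value for missing entries, which are then set to BAD
NROW => '', # number of data rows to read from the top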
($data, $idv, $ido) = get_data( \*STDIN, { TYPE=>long } );
$data = get_data( 'zcat big_data.txt.gz |' );
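The handling of ids and non-numeric entries described above can be pictured with a short plain-Python sketch. This is not how get_data is implemented. It assumes a layout with variable ids in the first row and observation ids in the first column, uses None as a stand-in for BAD, and the name parse_table is hypothetical.

```python
def parse_table(text, sep="\t"):
    """Split delimited text into (data, var_ids, obs_ids).

    Assumed layout: first row holds variable ids, first column holds
    observation ids. Cells that do not parse as numbers become None,
    standing in for BAD.
    """
    rows = [line.split(sep) for line in text.strip().splitlines()]
    var_ids = rows[0][1:]
    obs_ids = [row[0] for row in rows[1:]]
    data = []
    for row in rows[1:]:
        values = []
        for cell in row[1:]:
            try:
                values.append(float(cell))
            except ValueError:
                values.append(None)   # non-numeric entry treated as missing
        data.append(values)
    return data, var_ids, obs_ids

sample = "id\tx\ty\ns1\t1.5\t2\ns2\tNA\t4\n"
data, var_ids, obs_ids = parse_table(sample)
print(var_ids, obs_ids, data)
```

Keeping the ids as ordinary strings is what allows labels such as 's1' that could not be stored in a numeric pdl.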
which_id looks up the specified var (obs) ids in $idv ($ido) (see get_data) and returns their indices in $idv ($ido) as a pdl if found. Useful for selecting data by var (obs) id.
my $ind = which_id $ido, ['2c_1', 'vq_1'];
my $data_subset = $data( $ind, );
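The lookup amounts to mapping each id to its position and collecting the positions of the requested ids. A plain-Python sketch of that idea, not PDL code, assuming ids that are not found are skipped and that results follow the order of the request list; the sample ids other than '2c_1' and 'vq_1' are made up.

```python
def which_id(ids, wanted):
    """Return the positions in ids of each wanted id, skipping unknown ids."""
    position = {label: i for i, label in enumerate(ids)}
    return [position[w] for w in wanted if w in position]

ido = ["1a_1", "2c_1", "x9_2", "vq_1"]   # hypothetical observation ids
print(which_id(ido, ["2c_1", "vq_1"]))
print(which_id(ido, ["vq_1", "missing", "1a_1"]))
```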
PDL::Basic (hist for frequency counts)
PDL::Ufunc (sum, avg, median, min, max, etc.)
PDL::GSL::CDF (various cumulative distribution functions)
