NAME

Statistics::Sequences::Pot - Helmut Schmidt's test of force-like runs of a discrete state within a random distribution

VERSION

This is documentation for version 0.02 of Statistics::Sequences::Pot, released 27 June 2008.

SYNOPSIS

use Statistics::Sequences::Pot;

$pot = Statistics::Sequences::Pot->new();

# Load an array (reference) of data (of strings or numbers) into the pot object:
$pot->load([qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/]);

# Test the relative runs of a specific state (e.g., "5") among these data:
$pot->test(state => 5);

# Print out the Pot-statistic, and a z-test of its significance:
$pot->dump();

# Prints: State 5: Pot = 4.31, z = -0.19, p = 0.42539

# or be discretely informed re individual stats, post-test, e.g., :

print "Observed Pot = $pot->{'observed'} for $pot->{'count'} occurrences " .
      "of $pot->{'state'} among $pot->{'samplings'}; " .
      "probability = $pot->{'p_value'}\n";

# Prints: Observed Pot = 4.3098698035002 for 7 occurrences of 5 among 25; probability = 0.42539

DESCRIPTION

The Pot statistic, conceived by Helmut Schmidt, measures the bunching of a single state relative to its spacing within a series of other states. Schmidt modelled this as a targeted "potential" energy (or Pot) that dissipates exponentially with the distance between occurrences of the state. Unlike the more familiar Runs test of sequences, it is not limited to clusters of consecutive states (or bunches).

Say you're interested in the occurrence of the state 3 within an array of digits: note how, in the following arrays, there are increasing breaks between the 3s (separated by 0, 1 and then 2 other states):

4, 7, 3, 3
3, 4, 3, 7
3, 8, 1, 3

The occurrence of 3 is, with the Pot-test, of exponentially declining interest across these sequences, given the increasing breaks by other states between the occurrences of 3. The statistic does not ignore these ever remoter occurrences of the state of interest; it accounts for increased spacing between them as if there were an exponentially declining force, a potential towards 3, within the data-stream (up to a theoretical or empirical asymptote that may be specified).
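To see the exponential decline concretely, here is a minimal Perl sketch that computes the observed Pot for the three arrays above. It assumes that each pair of occurrences contributes r^d for a spacing of d positions, with r = e^(-N/(MS)) for N occurrences of the state among M observations and scale S (here N = 2, M = 4, S = 1, so r = e^(-0.5)). pot_observed() is a hypothetical helper written for illustration, not a method of Statistics::Sequences::Pot.

```perl
use strict;
use warnings;

# Hypothetical helper: observed Pot for one target state, i.e. the sum of
# r ** |n(I) - n(J)| over all pairs of positions I < J holding the state,
# with r = exp(-N / (M * S)).
sub pot_observed {
    my ($data, $state, $scale) = @_;
    $scale = 1 unless defined $scale && $scale >= 1;   # values below 1 act as 1
    my @pos = grep { $data->[$_] eq $state } 0 .. $#$data;
    my ($n, $m) = (scalar @pos, scalar @$data);
    return undef unless $n;                            # state absent: nothing defined
    my $r = exp(-$n / ($m * $scale));
    my $pot = 0;
    for my $i (0 .. $#pos - 1) {
        for my $j ($i + 1 .. $#pos) {
            $pot += $r ** ($pos[$j] - $pos[$i]);       # positions are ascending
        }
    }
    return $pot;
}

for my $seq ([4, 7, 3, 3], [3, 4, 3, 7], [3, 8, 1, 3]) {
    printf "%s: Pot = %.4f\n", join(', ', @$seq), pot_observed($seq, 3);
}
```

The three values are r, r² and r³ (about 0.607, 0.368 and 0.223): each extra intervening state multiplies the pair's contribution by r.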

Running the Pot-test involves testing its significance as a standard "z" score; Schmidt (2000) provided data demonstrating Pot's conformance with the normal distribution. The normal approximation should improve with repeated sampling, and by using block averages.

METHODS

Methods are essentially as described in Statistics::Sequences. See that manpage for how to handle non-dichotomous data, e.g., numerical data, or those with more than two categories; the relevant methods are not described here.

new

$pot = Statistics::Sequences::Pot->new();

Returns a new Pot object. Expects/accepts no arguments but the classname.

load

$pot->load(@data);
$pot->load(\@data);
$pot->load('sample1' => \@data1, 'sample2' => \@data2);
$pot->load({'sample1' => \@data1, 'sample2' => \@data2});

Loads data anonymously or by name. See load in the Statistics::Sequences manpage.

test

$pot->test(state => 'a'[, scale => 1, full => 0]);

Aliases: perform_pot_test

Runs the Pot-test on a specified state, lumps the Pot object with stats, and returns itself. Note that data must already be loaded into the Pot object before testing; otherwise, expect a croak. The test accepts the following required and optional parameters, each as name => value pairs.

state => string

The state within the data whose bunching is to be tested. This is the only required parameter to test, which will croak if no state is specified, or if a state occurs more often than there are data. If the state does not occur in the data, nothing is defined.

scale => numeric >= 1

Optionally, the scale of the range parameter, which should be greater than or equal to 1. Default = 1; values less than 1 are treated as 1. For how to set this parameter, see the "DESCRIPTION" above, and also the explanation of observed among the statistical "ATTRIBUTES".

full => boolean

If true, standard descriptive statistics regarding the bunches, and the spaces between them, are made available; see bunches and spaces, below. Default = 0, given the independence of Pot from the central tendencies of bunching and spacing. The facility is provided for purposes of hypothesis-testing, and for appraising the character of Pot.

dump

$pot->dump(flag => '1|0', text => '0|1|2');

Print Pot-test results to STDOUT. See dump in the Statistics::Sequences manpage for details.

ATTRIBUTES

After calling test, the pot object is lumped with the following attributes, each of which may be accessed as $pot->{'ATTRIBUTE'}.

samplings

The size of the submitted data (or number of "trials").

count

The number of times the target state occurred in the submitted array.

observed

A measure of the number and size of bunchings of the state that occurred within the array. This is based on Schmidt (2000), Equations 6-7, and his appended program. The formula is:

            $$ Pot = \sum_{1 \le I < J \le N} r^{|n(I) - n(J)|} $$

where

            $$ r = e^{-N/(MS)} $$

N is the number of occurrences of the state, n(I) is the position of its Ith occurrence within the array, M is the number of observations, and S (for scale) is a constant determining the range over which the potential energy between pairs of I and J states decays.

In most situations, should all states be equiprobable, or their probabilities be proportionate to their number, the range S·M/N reflects the average distance, or delay, between successive occurrences of a state: with S = 1, the number of all observations divided by the number of occurrences of the state (for equiprobable states, roughly the number of possible states). For example, if there were 10 possible states and 100 observations had been made, any one state would have a probability of 1/10 of occupying any slot, and so would be expected to occur about 10 times, at an average spacing of 100/10 = 10. The potential between two occurrences at spacing d is r^d, which with S = 1 falls to 1/e at this expected spacing and rises exponentially toward consecutive occurrence. In this way, with S = 1, Pot can be considered a measure of "short-range bunching," as Schmidt called it. Bunching over a larger range than this minimally expected range can be measured with S > 1, specified, optionally, as the argument named scale to test. Hypotheses might be tested with respect to various values of the scale parameter.
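As a worked illustration of the 10-state, 100-observation case, taking N = 10 as the expected number of occurrences of the target state and assuming r = e^(-N/(MS)), the range and decay factor come out as follows for S = 1 and, for comparison, S = 2:

```latex
\text{range} = \frac{S\,M}{N} = \frac{1 \times 100}{10} = 10,
\qquad r = e^{-N/(MS)} = e^{-1/\text{range}} = e^{-0.1} \approx 0.905

\text{with } S = 2:\quad \text{range} = 20,
\qquad r = e^{-0.05} \approx 0.951
```

A larger scale pushes r toward 1, so more distant pairs of occurrences still contribute appreciably to Pot.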

expected

The theoretically expected value of Pot, given N occurrences of the state among M observations, and r as defined above. It is calculated as follows:

              $$ E[Pot] = \frac{N(N-1)}{M(M-1)} \cdot \frac{r}{1-r} \cdot \left( M - \frac{1}{1-r} \right) $$
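The expected-value formula translates directly into Perl. This is a sketch, assuming r = e^(-N/(MS)); pot_expected() is a hypothetical helper, not part of the module's interface.

```perl
use strict;
use warnings;

# Hypothetical helper: chance expectation of Pot for N occurrences of the
# state among M observations, at scale S (default 1).
sub pot_expected {
    my ($n, $m, $scale) = @_;
    $scale = 1 unless defined $scale && $scale >= 1;
    my $r = exp(-$n / ($m * $scale));
    return ($n * ($n - 1)) / ($m * ($m - 1))    # chance both slots hold the state
         * ($r / (1 - $r))
         * ($m - 1 / (1 - $r));
}

printf "Expected Pot for N = 7, M = 25: %.4f\n", pot_expected(7, 25);
```

With the SYNOPSIS data (7 occurrences of 5 among 25), this gives an expectation a little above the observed 4.31, consistent with the small negative z reported there.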

obs_dev

The observed value of Pot less the expected value of Pot, and hence the amount by which the observed value deviates from that expected by chance.

std_dev

The standard deviation, based on the theoretically expected variance of Pot, which is given by:

              $$ Var(Pot) = \frac{r^2}{1 - r^2} \cdot \frac{N}{M} \cdot \left( 1 - \frac{N}{M} \right)^2 \cdot N $$
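Putting the observed value, the expectation and this variance together gives the z-score. The following Perl sketch assumes r = e^(-N/(MS)) and the formulas given under observed, expected and std_dev; pot_z() is a hypothetical helper, and returns undef where z is undefined (state absent, or every observation being the state).

```perl
use strict;
use warnings;

# Hypothetical helper: z-score of the observed Pot against its chance
# expectation, divided by the theoretical standard deviation.
sub pot_z {
    my ($data, $state, $scale) = @_;
    $scale = 1 unless defined $scale && $scale >= 1;   # values below 1 act as 1
    my @pos = grep { $data->[$_] eq $state } 0 .. $#$data;
    my ($n, $m) = (scalar @pos, scalar @$data);
    return undef unless $n >= 1 && $n < $m;            # z undefined otherwise
    my $r = exp(-$n / ($m * $scale));

    my $observed = 0;
    for my $i (0 .. $#pos - 1) {
        $observed += $r ** ($pos[$_] - $pos[$i]) for $i + 1 .. $#pos;
    }
    my $expected = ($n * ($n - 1)) / ($m * ($m - 1))
                 * ($r / (1 - $r)) * ($m - 1 / (1 - $r));
    my $q        = $n / $m;                            # proportion of the state
    my $variance = ($r**2 / (1 - $r**2)) * $q * (1 - $q)**2 * $n;
    return ($observed - $expected) / sqrt($variance);
}

my @data = qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/;
printf "State 5: z = %.2f\n", pot_z(\@data, 5);
```

For the SYNOPSIS data this gives z of about -0.19, in line with the dump() output shown there.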

range

The range of observations over which Pot was assessed: the product of the scale and the number of observations, divided by the number of occurrences of the state (S·M/N).

z_value

The standard score, based on dividing the observed deviation by the standard deviation.

p_value

The one-tailed probability associated with the absolute value of the z-value.
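Where Statistics::Distributions is not available, the same one-tailed probability can be approximated. This sketch swaps in the Abramowitz and Stegun formula 7.1.26 rational approximation to the complementary error function (absolute error below about 1.5e-7) in place of uprob(); p_from_z() and erfc_approx() are hypothetical helpers.

```perl
use strict;
use warnings;

# Abramowitz & Stegun 7.1.26 approximation to erfc(x), valid for x >= 0.
sub erfc_approx {
    my ($x) = @_;
    my $t = 1 / (1 + 0.3275911 * $x);
    my $poly = $t * (0.254829592
             + $t * (-0.284496736
             + $t * (1.421413741
             + $t * (-1.453152027
             + $t * 1.061405429))));
    return $poly * exp(-$x * $x);
}

# Upper-tail standard-normal probability of |z|: P(Z > |z|).
sub p_from_z {
    my ($z) = @_;
    return 0.5 * erfc_approx(abs($z) / sqrt(2));
}

printf "p = %.5f\n", p_from_z(-0.19);
</imports>
```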

bunches

Only provided if passing full => 1 to test.

A Statistics::Descriptive::Full object, loaded with the lengths of each bunch of the state. Statistics such as the count, mean, mode and range of the observed bunches can be called, and the ordered list of bunch sizes can itself be retrieved. E.g.,

print $pot->{'bunches'}->mean() ."\n";
@bunch_lengths = $pot->{'bunches'}->get_data();

spaces

Only provided if passing full => 1 to test.

A Statistics::Descriptive::Full object, loaded with the lengths of the intervals, or spaces, between each bunch of the state. Statistics such as the count, mean, mode and range of the observed spaces can be called. E.g.,

print $pot->{'spaces'}->mode() ."\n";
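To see the kind of lists that end up in the bunches and spaces objects, here is a plain-Perl sketch without Statistics::Descriptive. It assumes a bunch is a run of consecutive occurrences of the state and a space is the number of other states between one bunch and the next; whether the module also counts gaps before the first or after the last bunch is not stated here, so the sketch skips them. bunches_and_spaces() is a hypothetical helper.

```perl
use strict;
use warnings;

# Bunches: runs of consecutive occurrences of the state.
# Spaces: counts of other states between one bunch and the next.
# Gaps before the first or after the last bunch are skipped (an assumption).
sub bunches_and_spaces {
    my ($data, $state) = @_;
    my (@bunches, @spaces);
    my ($run, $gap, $seen) = (0, 0, 0);
    for my $x (@$data) {
        if ($x eq $state) {
            push @spaces, $gap if $run == 0 && $seen;   # a new bunch starts
            $run++;
            $gap  = 0;
            $seen = 1;
        }
        else {
            push @bunches, $run if $run;                 # a bunch just ended
            $run = 0;
            $gap++;
        }
    }
    push @bunches, $run if $run;                         # bunch at the very end
    return (\@bunches, \@spaces);
}

my @data = qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/;
my ($bunches, $spaces) = bunches_and_spaces(\@data, 5);
print "bunches: @$bunches\n";    # 1 1 2 1 1 1
print "spaces:  @$spaces\n";     # 1 8 1 2 2
```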

Note that test returns the Pot object (itself), so you could get "immediate" access to any of the above by, for example:

my $n_zeroes = $pot->test(state => 0)->{'count'};

print 'median spaces = ' . $pot->test(state => 0, full => 1)->{'spaces'}->median() . "\n";

EXAMPLES

1. Using Pot as a test of bunching of a particular state within a collection of quasi-random observations.

use Statistics::Sequences::Pot;
use strict;
my @data;

# Init an array of random data with integers ranging from 0 to 15:
for my $i (0 .. 959) {
  $data[$i] = int(rand(16));
}

# Assess degree of bunching within these data with respect to a randomly selected target state:
my $state = int(rand(16));

my $pot = Statistics::Sequences::Pot->new();
$pot->load(\@data)->test(state => $state);

# Access the results of this analysis:
print "The probability of obtaining as much bunching of $state as observed is $pot->{'p_value'}\n";
# or:
print "For state $pot->{'state'} occurring $pot->{'count'} times among $pot->{'samplings'}, ".
   "the observed value of Pot was $pot->{'observed'}.\n" .
   "The expected value of Pot was $pot->{'expected'}\n" .
   "The deviation from expectation was $pot->{'obs_dev'}\n" .
   "The standard deviation was $pot->{'std_dev'}\n" .
   "Z = $pot->{'z_value'}\n" .
   "Probability of this z-value = $pot->{'p_value'}\n";
# or print the lot, and more, in English:
$pot->dump(text => 2);

# See what else was happening, having already given test() the data to test:
foreach (0 .. 15) {
   next if $_ == $state;
   $pot->test(state => $_)->dump();
}

2. Using Pot as a test of randomness of an array of dichotomous observations. Note: alphabetic strings as the elements of the array; reuse of loaded data between tests; internal storage of the state; and exploitation of the module for semi-Pot purposes.

use Statistics::Sequences::Pot;
use strict;
my @data;

# Init an array of random data with values of either 'hit' or 'miss':
my @categories = qw/hit miss/;
for my $i (0 .. 639) {
  $data[$i] = $categories[int(rand(@categories))];
}

# Make a pot object and load up the data:
my $pot = Statistics::Sequences::Pot->new();
$pot->load(\@data);

# Run and dump a couple analyses, on each possible state:
$pot->test(state => 'hit');
$pot->dump();
$pot->test(state => 'miss');
$pot->dump();

# Be randomly redundant:
$pot->test(state => $categories[int(rand(@categories))], full => 1);

print "Randomly selected state was a $pot->{'state'}, and this occurred $pot->{'count'} times, " .
      "most frequently bunching by a length of " . $pot->{'bunches'}->mode() . "\n";

# Prints, e.g.:
## State hit: Pot = 252.04, z = 0.83, p = 0.20298
## State miss: Pot = 246.43, z = 0.82, p = 0.20721
## Randomly selected state was a miss, and this occurred 315 times, most frequently bunching by a length of 1

REFERENCES

Schmidt, H. (2000). A proposed measure for psi-induced bunching of randomly spaced events. Journal of Parapsychology, 64, 301-316.

SEE ALSO

http://www.fourmilab.ch/rpkp/ for Schmidt's many papers on the physical conceptualisation and properties of psi.

Statistics::Descriptive : The present module adds data to "Full" objects of this package in order to access descriptives re bunches and spaces.

Statistics::Distributions : The present module uses the uprob() method of this package for determining the probability associated with the z-value.

Statistics::Frequency : the proportional_frequency() method in this module could be informative when working with data of the kind used here.

BUGS/LIMITATIONS

No computational bugs as yet identified. Hopefully this will change, given time.

Limitations of the code, perhaps, concern the non-unique storage of data arrays (compared to, say, Statistics::DependantTTest, but see Statistics::TTest). This would require a unique name for each array of data, and explicit reference to one or another array with each test (when, perhaps, you'd have only one data-set, after all). In any case, the data are accepted as array references.

Limitations of the actual Pot statistic may be considered to be its newness, not having had the opportunity to be critiqued by peers, and that some experimentation might be required to find an optimal scale.

REVISION HISTORY

See CHANGES in installation dist for revisions.

AUTHOR/LICENSE

rgarton AT cpan DOT org

This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
