NAME

Statistics::Sequences::Pot - Schmidt's test of force-like runs among randomly spaced events

VERSION

This is documentation for version 0.01 of Statistics::Sequences::Pot, released 13 July 2006.

SYNOPSIS

use Statistics::Sequences::Pot;

$pot = Statistics::Sequences::Pot->new();

# Load an array (reference) of data (of strings or numbers) into the pot object:
$pot->load([qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/]);

# Test the relative runs of a specific event (e.g., "5") among these data:
$pot->test(event => 5);

# Print out the Pot-statistic, and a z-test of its significance:
$pot->dump();

# Prints: Event 5: Pot = 4.31, z = -0.19, p = 0.42539

# or access individual statistics after the test, e.g.:

print "Observed Pot = $pot->{'observed'} for $pot->{'events'} occurrences
 of $pot->{'event'} among $pot->{'units'};
 probability = $pot->{'p_value'}\n";

# Prints: Observed Pot = 4.3098698035002 for 7 occurrences of 5 among 25; probability = 0.42539

DESCRIPTION

The Pot statistic measures the bunching relative to the spacing of a single event within a series of other events, conceived by Helmut Schmidt as a targeted "potential" energy (or Pot) that dissipates exponentially between events. It's not limited to considering only clusters of consecutive events (or bunches), as is the case with the more familiar Runs test of sequences.

Say you're interested in the occurrence of the event 3 within an array of digits: note how, in the following arrays, there are increasing breaks between the 3s (separated by 0, 1 and then 2 other events):

4, 7, 3, 3
3, 4, 3, 7
3, 8, 1, 3

The occurrence of 3 is, with the Pot-test, of exponentially declining interest across these sequences, given the increasing breaks by other events between the occurrences of 3. The statistic does not ignore these ever remoter occurrences of the event of interest; it accounts for increased spacing between them as if there were an exponentially declining force, a potential towards 3, within the data-stream (up to a theoretical or empirical asymptote that may be specified).
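The decline across the three arrays above can be made concrete with a minimal sketch. It assumes the decay constant r = exp(-N/(M*S)) given under "observed" in the "ATTRIBUTES", with N = 2 occurrences of 3 among M = 4 observations and scale S = 1; the variable names are illustrative only and are not part of the module.

```perl
use strict;
use warnings;

# Two occurrences of the event (N = 2) among four observations (M = 4), scale S = 1.
my ($n_events, $units, $scale) = (2, 4, 1);
my $r = exp(-$n_events / ($units * $scale));

# In the three arrays, the two 3s sit 1, 2 and 3 positions apart, respectively.
# Each pair contributes r raised to the power of that distance.
my %contribution = map { $_ => $r**$_ } (1, 2, 3);

printf "distance %d: contribution to Pot = %.4f\n", $_, $contribution{$_} for 1 .. 3;
```

Each extra intervening event multiplies the pair's contribution by the same factor r, which is the sense in which interest in the pair declines exponentially.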

Running the Pot-test involves a z-test for significance; Schmidt (2000) provided data demonstrating Pot's conformance with the normal distribution. This conformance will, of course, be improved by repeated sampling, and by pooling observations into blocks.

METHODS

Methods are essentially as described in Statistics::Sequences. See that manpage for how to handle non-dichotomous data, e.g., numerical data, or those with more than two categories; the relevant methods are not described here.

new

$pot = Statistics::Sequences::Pot->new();

Returns a new Pot object. Accepts no arguments other than the class name.

load

$pot->load(@data);
$pot->load(\@data);
$pot->load('sample1' => \@data1, 'sample2' => \@data2);
$pot->load({'sample1' => \@data1, 'sample2' => \@data2});

Loads data anonymously or by name. See load in the Statistics::Sequences manpage.

test

$pot->test(event => 'a'[, scale => 1, full => 0]);

Aliases: perform_pot_test

Runs the Pot-test on a specified event, populates the Pot object with the resulting statistics, and returns the object itself. Note that data must already be loaded into the Pot object before testing; otherwise, expect a croak. The test takes one required parameter and some optional ones, each given as name => value pairs.

event => string

The event within the data whose bunching is to be tested. This is the only required parameter to test, which will croak if no event is specified, or on the perverse occasion when an event occurs more often than there are data. If the event does not exist in the data, all the statistical attributes are undefined.

scale => numeric >= 1

Optionally, the scale of the range parameter, which should be greater than or equal to 1. Default = 1; values less than 1 are treated as 1. For info on how to set this parameter, see the "DESCRIPTION" above, and also the explanation of observed among the statistical "ATTRIBUTES".

full => boolean

Optionally, standard descriptive statistics regarding the bunches, and the spaces between them, will be available. See bunches and spaces, below. Default = 0, given the independence of Pot from the central tendencies of bunching and spacing. The facility is provided for purposes of hypothesis-testing, and appraising the character of Pot.

dump

$pot->dump(data => '1|0', flag => '1|0', text => '0|1|2');

Print Pot-test results to STDOUT. See dump in the Statistics::Sequences manpage for details.

ATTRIBUTES

After calling test, the Pot object is populated with the following attributes, each of which may be accessed as $pot->{'ATTRIBUTE'}.

units

The size of the submitted data.

n_events

The number of times the target event occurred in the submitted array.

observed

A measure of the number and size of bunchings of the event that occurred within the array. This is based on Schmidt (2000), Equations 6-7, and his Appended program. The formula is:

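            Pot = \sum_{1 \le I < J \le N} r^{|n(I) - n(J)|}

where

            r = e^{-N/(MS)}

n(I) is the position of the I-th occurrence of the event within the series, N is the number of events, M is the number of observations, and S (for scale) is a constant determining the range r of the potential energy between pairs of I and J events.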

In most situations, should all events be equiprobable, or their probability be proportionate to their number, the range would reflect the average distance, or delay, between successive occurrences of an event, equal to the number of all observations divided by the number of events. For example, if there were 10 possible events, and 100 observations have been made, then, with S = 1, any one of the 10 events would be expected to recur at an average spacing of 100/10 = 10 observations, with an exponentially declining tendency toward consecutive occurrence. In this way, with S = 1, Pot can be considered to be a measure of "short-range bunching," as Schmidt called it. Bunching over a larger range than this minimally expected range can be measured with S > 1. This is specified, optionally, as the argument named scale to test. Hypothesis-testing might be made with respect to various values of the scale parameter.
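As a minimal sketch of the pairwise sum described above, the following recomputes the observed Pot for the data in the "SYNOPSIS", taking r = exp(-N/(M*S)) with N occurrences among M observations. The subroutine observed_pot is a hypothetical helper written for illustration; it is not part of the module, and the module's own code may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: sum r**distance over every pair of occurrences of $event.
sub observed_pot {
    my ($data, $event, $scale) = @_;
    $scale = 1 if !defined $scale || $scale < 1;    # values below 1 are treated as 1

    # Positions (indices) at which the event occurs.
    my @positions = grep { $data->[$_] eq $event } 0 .. $#{$data};
    my $n = scalar @positions;                      # N: number of events
    return unless $n;                               # event absent: nothing to compute
    my $m = scalar @{$data};                        # M: number of observations

    my $r   = exp(-$n / ($m * $scale));
    my $pot = 0;
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            $pot += $r**($positions[$j] - $positions[$i]);
        }
    }
    return $pot;
}

my @data = qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/;
my $pot  = observed_pot(\@data, 5);
printf "Observed Pot = %.2f\n", $pot;    # the SYNOPSIS reports Pot = 4.31 for these data
```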

expected

The theoretically expected value of Pot, given N events among M observations, and r range of clustering within these observations. It is calculated as follows, given the above definitions.

              E(Pot) = \frac{N(N - 1)}{M(M - 1)} \cdot \frac{r}{1 - r} \cdot \left(M - \frac{1}{1 - r}\right)
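For illustration, the expected value can be evaluated for the "SYNOPSIS" case of N = 7 events among M = 25 observations with scale S = 1, reading the middle term of the formula as r/(1 - r) and taking r = exp(-N/(M*S)). This is a sketch with illustrative variable names, not the module's code.

```perl
use strict;
use warnings;

my ($n, $m, $scale) = (7, 25, 1);          # N events, M observations, scale S
my $r = exp(-$n / ($m * $scale));          # decay constant

my $expected = ($n * ($n - 1)) / ($m * ($m - 1))    # chance that a pair of slots both hold the event
             * ($r / (1 - $r))
             * ($m - 1 / (1 - $r));

printf "Expected Pot = %.4f\n", $expected;
```

The result, about 4.53, sits slightly above the observed value of 4.31 reported in the "SYNOPSIS", which is why the z-value there is negative.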

deviation

The observed value of Pot less the expected value of Pot, and hence the amount by which the observed value deviates from that expected by chance.

sd

The standard deviation, based on the theoretically expected variance of Pot, which is given by:

              Variance = \frac{r^2}{1 - r^2} \cdot \frac{N}{M} \cdot \left(1 - \frac{N}{M}\right)^2 \cdot N
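A short sketch of this calculation for the "SYNOPSIS" case (N = 7, M = 25, S = 1) follows. It reads the formula with r2 as r squared and the trailing 2 as a square on (1 - N/M), and takes r = exp(-N/(M*S)); variable names are illustrative only.

```perl
use strict;
use warnings;

my ($n, $m, $scale) = (7, 25, 1);          # N events, M observations, scale S
my $r = exp(-$n / ($m * $scale));          # decay constant
my $p = $n / $m;                           # proportion of observations that are the event

my $variance = ($r**2 / (1 - $r**2)) * $p * (1 - $p)**2 * $n;
my $sd       = sqrt($variance);

printf "Variance = %.4f, sd = %.4f\n", $variance, $sd;
```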

range

The range of observations over which Pot was assessed, simply being the product of the scale and number of observations, divided by the number of events.
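As a sketch of how scale feeds into the range, and the range into the decay constant, the lines below tabulate both for N = 7 events among M = 25 observations. The relation r = exp(-1/range) is derived here from r = exp(-N/(M*S)) and range = S*M/N; the variable names are illustrative and not part of the module.

```perl
use strict;
use warnings;

my ($n, $m) = (7, 25);                     # N events among M observations

my (%range_for, %r_for);
for my $scale (1, 2, 4) {
    $range_for{$scale} = $scale * $m / $n;             # scale x observations / events
    $r_for{$scale}     = exp(-1 / $range_for{$scale}); # same as exp(-N/(M*S))
    printf "scale %d: range = %.4f, r = %.4f\n",
        $scale, $range_for{$scale}, $r_for{$scale};
}
```

A larger scale widens the range and pushes r toward 1, so that more distant pairs of events contribute to Pot.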

z_value

The result of the z-test, based on dividing the observed deviation by the standard deviation.

p_value

The probability associated with the absolute value of the z-statistic.
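The z-test can be sketched end to end for the "SYNOPSIS" case. This assumes the expected value and variance formulas as read above (middle term r/(1 - r); squares on r and on (1 - N/M)), and it stands in POSIX::erfc (available from Perl 5.22) for the uprob() function of Statistics::Distributions that the module itself uses, so it is an approximation of the module's behaviour rather than its actual code.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # erfc() is exported by POSIX from Perl 5.22 onwards

my ($n, $m, $scale) = (7, 25, 1);          # N events, M observations, scale S
my $observed = 4.3098698035002;            # observed Pot reported in the SYNOPSIS

my $r        = exp(-$n / ($m * $scale));
my $expected = ($n * ($n - 1)) / ($m * ($m - 1)) * ($r / (1 - $r)) * ($m - 1 / (1 - $r));
my $sd       = sqrt(($r**2 / (1 - $r**2)) * ($n / $m) * (1 - $n / $m)**2 * $n);

my $z = ($observed - $expected) / $sd;     # deviation divided by standard deviation
my $p = 0.5 * erfc(abs($z) / sqrt(2));     # upper-tail normal probability of |z|

printf "z = %.2f, p = %.5f\n", $z, $p;     # compare z = -0.19, p = 0.42539 in the SYNOPSIS
```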

bunches

Only provided if passing full => 1 to test.

A Statistics::Descriptive::Full object, loaded with the lengths of each bunch of the event. Statistics such as the count, mean, mode and range of the observed bunches can be called, and the ordered list of bunch sizes can itself be retrieved. E.g.,

print $pot->{'bunches'}->mean() ."\n";
@bunch_lengths = $pot->{'bunches'}->get_data();

spaces

Only provided if passing full => 1 to test.

A Statistics::Descriptive::Full object, loaded with the lengths of the intervals, or spaces, between each bunch of the event. Statistics such as the count, mean, mode and range of the observed spaces can be called. E.g.,

print $pot->{'spaces'}->mode() ."\n";
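To show what the bunch and space lengths look like, here is a minimal sketch that tallies them by hand for the event 5 in the "SYNOPSIS" data. It assumes a bunch is a run of consecutive occurrences of the event, and that only the gaps lying between two bunches count as spaces (not any stretch before the first or after the last bunch); how the module treats those edges is not stated here, so that part is a guess.

```perl
use strict;
use warnings;

my @data  = qw/2 0 8 5 3 5 2 3 1 1 9 4 4 1 5 5 6 5 8 7 5 3 8 5 6/;
my $event = 5;

my (@bunches, @spaces);
my ($run, $gap, $seen) = (0, 0, 0);    # current run length, current gap length, event seen yet?
for my $datum (@data) {
    if ($datum eq $event) {
        push @spaces, $gap if $seen && $gap;    # close the gap since the previous bunch
        $gap  = 0;
        $seen = 1;
        $run++;
    }
    else {
        push @bunches, $run if $run;            # close the bunch that has just ended
        $run = 0;
        $gap++;
    }
}
push @bunches, $run if $run;                    # a bunch that ends the series

print "bunch lengths: @bunches\n";    # prints: bunch lengths: 1 1 2 1 1 1
print "space lengths: @spaces\n";     # prints: space lengths: 1 8 1 2 2
```

Lists like these are what one would pass to the add_data() method of a Statistics::Descriptive::Full object in order to obtain the mean, mode and so on.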

Note that test returns the Pot object (itself), so one could get "immediate" access to any of the above by, for example:

my $n_zeroes = $pot->test(event => 0)->{'n_events'};

print 'median spaces = ' . $pot->test(event => 0, full => 1)->{'spaces'}->median() . "\n";

EXAMPLES

1. Using Pot as a test of bunching of a particular event within a collection of quasi-random observations.

use Statistics::Sequences::Pot;
use strict;
my ($i, @data) = ();

# Init an array of random data with integers ranging from 0 to 15:
for ($i = 0; $i < 960; $i++) {
  $data[$i] = int(rand(16));
}

# Assess degree of bunching within these data with respect to a randomly selected target event:
my $event = int(rand(16));

my $pot = Statistics::Sequences::Pot->new();
$pot->load(\@data)->test(event => $event);

# Access the results of this analysis:
print "The probability of obtaining as much bunching of $event as observed is $pot->{'p_value'}\n";
# or:
print "For event $pot->{'event'} occurring $pot->{'events'} times among $pot->{'units'}, ".
   "the observed value of Pot was $pot->{'observed'}.\n" .
   "The expected value of Pot was $pot->{'expected'}\n" .
   "The deviation from expectation was $pot->{'deviation'}\n" .
   "The standard deviation was $pot->{'sd'}\n" .
   "Z = $pot->{'z_value'}\n" .
   "Probability of this deviation = $pot->{'p_value'}\n";
# or print the lot, and more, in English:
$pot->dump(text => 2);

# See what else was happening, having already given test() the data to test:
foreach (0 .. 15) {
   next if $_ == $event;
   $pot->test(event => $_)->dump();
 }

2. Using Pot as a test of randomness of an array of dichotomous observations. Note: alphabetic strings as the elements of the array; reuse of loaded data; recycling of loaded data between tests; internal storage of the event; and exploitation of the module for semi-Pot purposes.

use Statistics::Sequences::Pot;
use strict;
my ($i, @data) = ();

# Init an array of random data with values of either 'hit' or 'miss':
my @categories = (qw/hit miss/);  
for ($i = 0; $i < 640; $i++) {
  $data[$i] = $categories[int(rand(@categories))];
}

# Make a pot object and load up the data:
my $pot = Statistics::Sequences::Pot->new();
$pot->load(\@data);

# Run and dump a couple analyses, on each possible event:
$pot->test(event => 'hit');
$pot->dump();
$pot->test(event => 'miss');
$pot->dump();

# Be randomly redundant:
$pot->test(event => $categories[int(rand(@categories))], full => 1);

print "Randomly selected event was a $pot->{'event'}, and this occurred $pot->{'events'} times, " .
      "most frequently bunching by a length of " . $pot->{'bunches'}->mode() . "\n";

# Prints, e.g.:
## Event hit: Pot = 252.04, z = 0.83, p = 0.20298
## Event miss: Pot = 246.43, z = 0.82, p = 0.20721
## Randomly selected event was a miss, and this occurred 315 times, most frequently bunching by a length of 1

REFERENCES

Schmidt, H. (2000). A proposed measure for psi-induced bunching of randomly spaced events. Journal of Parapsychology, 64, 301-316.

SEE ALSO

http://www.fourmilab.ch/rpkp/ for Schmidt's many papers on the physical conceptualisation and properties of psi.

Statistics::Descriptive : The present module adds data to "Full" objects of this package in order to access descriptives re bunches and spaces.

Statistics::Distributions : The present module uses the uprob() method of this package for determining the probability associated with the z-test.

Statistics::Frequency : the proportional_frequency() method in this module could be informative when working with data of the kind used here.

BUGS/LIMITATIONS

No computational bugs as yet identified. Hopefully this will change, given time.

Limitations of the code, perhaps, concern the non-unique storage of data arrays (compared to, say, Statistics::DependantTTest, but see Statistics::TTest). This would require a unique name for each array of data, and explicit reference to one or another array with each test (when, perhaps, you'd have only one data-set, after all). In any case, the data are accepted as array references.

Limitations of the actual Pot statistic may be considered to be its newness, not having had the opportunity to be critiqued by peers, and that some experimentation might be required to find an optimal scale.

REVISION HISTORY

v 0.01

June 2006

Initial release via PAUSE.

AUTHOR/LICENSE

rgarton@utas_DOT_edu_DOT_au

This module is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.
