NAME
Statistics::Sequences::Runs - The Runs-test (Wald-Wolfowitz or Swed-Eisenhart Test)
VERSION
This is documentation for version 0.02 of Statistics::Sequences::Runs, released 27 June 2008.
SYNOPSIS
use Statistics::Sequences::Runs;
$runs = Statistics::Sequences::Runs->new();
$runs->load(qw/1 0 0 0 1 1 0 1 1 0 0 1 0 0 1 1 1 1 0 1/);
$runs->test()->dump();
DESCRIPTION
The Runs-test assesses the difference between two independent distributions, or a difference within a single distribution of dichotomous observations, in terms of the frequency of the runs of states within them.
A run is a sequence of identical states on 1 or more consecutive trials. For example, in a signal-detection test, there'll be a series, over time, of hits (H) and misses (M), which might look like H-H-M-H-M-M-M-M-H. Here, there are 5 runs: 3 of hits, and 2 of misses. This number of runs can be compared with the number expected to occur by chance, given the number of observed hits and misses. More runs than expected ("negative serial dependence") generally indicates irregularity, or instability; fewer runs than expected ("positive serial dependence") indicates regularity, or stability. Both can indicate a sequential dependency: either negative (an extra-chance factor, or bias, to produce too many alternations), or positive (an extra-chance factor, or bias, to produce too many repetitions).
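To make the definition concrete, here is a minimal sketch in plain Perl that counts the runs in the hit/miss series given above. The helper name count_runs is hypothetical and is not part of this module; it only illustrates the counting rule.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences::Runs).
# A new run starts at the first observation and at every point
# where the state differs from the one before it.
sub count_runs {
    my @seq = @_;
    return 0 unless @seq;
    my $runs = 1;
    for my $i (1 .. $#seq) {
        $runs++ if $seq[$i] ne $seq[$i - 1];
    }
    return $runs;
}

my @trials = qw/H H M H M M M M H/;
print count_runs(@trials), "\n";    # prints 5
```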
The distribution of the number of runs is asymptotically normal, and it converges quickly: probabilities are well estimated by the normal distribution when the numbers of H and M both exceed 10 (e.g., Kelly, 1982). The deviation of the observed number of runs from the expected number is therefore reliably assessed by way of a z-score.
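For reference, the z-score is conventionally built from the textbook moments of the Wald-Wolfowitz runs distribution, shown below. The symbols are assumptions introduced here for illustration: n_1 and n_2 are the counts of the two states, N = n_1 + n_2, and R is the observed number of runs. That the module computes exactly these quantities is not stated in this manpage, although the output of the seating example in the EXAMPLE section is consistent with them.

```latex
\begin{align}
  \mathrm{E}[R]   &= \frac{2 n_1 n_2}{N} + 1 \\
  \mathrm{Var}[R] &= \frac{2 n_1 n_2 \,(2 n_1 n_2 - N)}{N^2 (N - 1)} \\
  z               &= \frac{R - \mathrm{E}[R]}{\sqrt{\mathrm{Var}[R]}}
\end{align}
```

With a continuity correction, the absolute deviation |R - E[R]| is usually reduced by 0.5 before dividing by the standard deviation.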
METHODS
Methods are essentially as described in Statistics::Sequences. See that manpage for how to handle non-dichotomous data, e.g., numerical data, or data with more than two categories.
new
$runs = Statistics::Sequences::Runs->new();
Returns a new Runs object. Expects/accepts no arguments but the classname.
load
$runs->load(@data);
$runs->load(\@data);
$runs->load('dist1' => \@data1, 'dist2' => \@data2);
$runs->load({'dist1' => \@data1, 'dist2' => \@data2});
Loads data anonymously or by name. See load in the Statistics::Sequences manpage.
test
$runs->test();
Performs the runs test on the named distributions. If only one distribution name is given, the "one-sample" Runs test is performed, cutting the data at the median, or by the value given as cut. Observations that fall above and below the cut-value then constitute the "groups" to be searched for runs. Otherwise, with two named groups, runs are sought on the basis of the observations belonging to one or the other named group.
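The one-sample cut described above can be pictured with the following plain-Perl sketch, which turns numerical observations into two states by comparing each with the median or with a supplied cut value. The helper name dichotomise is hypothetical. How the module treats observations equal to the cut value is not stated here; this sketch drops them, which is only one common convention.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences::Runs).
# Returns 1 for observations above the cut and 0 for those below it.
# Assumption: observations equal to the cut are dropped.
sub dichotomise {
    my ($data, $cut) = @_;
    if (!defined $cut) {
        # Default cut: the median of the data.
        my @sorted = sort { $a <=> $b } @$data;
        my $mid = int(@sorted / 2);
        $cut = @sorted % 2
            ? $sorted[$mid]
            : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
    }
    return map { $_ > $cut ? 1 : 0 } grep { $_ != $cut } @$data;
}

my @scores = (3, 8, 9, 2, 7, 1, 6, 5);
my @states = dichotomise(\@scores);    # median is 5.5
print "@states\n";                     # prints 0 1 1 0 1 0 1 0
```

The resulting list of 0s and 1s is the kind of dichotomous sequence that can then be searched for runs.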
dump
$runs->dump(flag => '1|0', text => '0|1|2');
Prints Runs-test results to STDOUT. See dump in the Statistics::Sequences manpage for details.
EXAMPLE
Seating at the diner
Swed and Eisenhart (1943) list the occupied (O) and empty (E) seats in a row at a lunch counter. Have people taken up their seats on a random basis? Note there is no need to dichotomise these data: there is already a single sample, with dichotomous, categorical observations.
use Statistics::Sequences::Runs;
my $runs = Statistics::Sequences::Runs->new();
my @seating = (qw/E O E E O E E E O E E E O E O E/);
$runs->load(\@seating);
$runs->test(ccorr => 1, tails => 1)->dump();
Suggesting some non-random basis for people taking their seats, this outputs:
Runs: expected = 7.88, observed = 11.00, z = 1.60, 1-p = 0.054834
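As a cross-check on these figures, the sketch below recomputes the expected number of runs and the z-score by hand, independently of the module. The helper name runs_stats is hypothetical, the formulas are the standard Wald-Wolfowitz moments, and the 0.5 continuity correction is an assumption about what ccorr => 1 does; the one-tailed probability is not recomputed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences::Runs).
# Returns (expected runs, observed runs, z) for a two-state sequence.
sub runs_stats {
    my ($seq, $ccorr) = @_;
    my %count;
    $count{$_}++ for @$seq;
    die "need exactly two states\n" unless keys %count == 2;
    my ($n1, $n2) = values %count;
    my $n = $n1 + $n2;

    # Observed runs: one plus the number of changes of state.
    my $observed = 1;
    for my $i (1 .. $#{$seq}) {
        $observed++ if $seq->[$i] ne $seq->[$i - 1];
    }

    my $expected = 2 * $n1 * $n2 / $n + 1;
    my $variance = 2 * $n1 * $n2 * (2 * $n1 * $n2 - $n)
                 / ($n**2 * ($n - 1));

    # Assumed continuity correction: shrink the deviation by 0.5.
    my $dev = abs($observed - $expected);
    $dev -= 0.5 if $ccorr;
    $dev = 0 if $dev < 0;
    my $z = $dev / sqrt($variance);
    $z = -$z if $observed < $expected;
    return ($expected, $observed, $z);
}

my @seating = qw/E O E E O E E E O E E E O E O E/;
my ($expected, $observed, $z) = runs_stats(\@seating, 1);
printf "Runs: expected = %.2f, observed = %.2f, z = %.2f\n",
    $expected, $observed, $z;
```

With 5 occupied and 11 empty seats, this gives an expected value of 2(5)(11)/16 + 1 = 7.875 and a corrected deviation of 2.625, which agrees with the expected, observed and z values in the module output above.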
ESP runs
In a single run of a classic ESP test, there are 25 trials, each composed of a randomly generated state (typically, one of 5 possible geometric figures), and a human-generated state drawn from the same pool of alternatives. Tests of the synchrony between the random and human data are then made, typically in terms of the number of "hits" observed versus that expected. The runs of hits and misses can also be tested by dichotomising the data on the basis of the match of the random "targets" with the human "responses", like so:
use Statistics::Sequences::Runs;
# Produce pseudo targets and responses:
my ($i, @targets, @responses);
for ($i = 0; $i < 250; $i++) {
$targets[$i] = (qw/circle plus square star wave/)[int(rand(5))];
$responses[$i] = (qw/circle plus square star wave/)[int(rand(5))];
}
# Do the run thing:
my $runs = Statistics::Sequences::Runs->new();
$runs->load(targets => \@targets, responses => \@responses);
$runs->match(data => [qw/targets responses/]);
$runs->test();
print "The probability of obtaining these $runs->{'observed'} runs is $runs->{'p_value'}\n";
# But what if the responses were actually synchronised to the target on the trial one ahead?
$runs->match(data => [qw/targets responses/], lag => 1)->test();
print "With responses synchronised to targets on the next (+1) sample,\n",
"$runs->{'observed'} runs in 249 samplings were produced when $runs->{'expected'} were expected,\n",
"a deviation with an associated probability of $runs->{'p_value'}\n";
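The dichotomising step in this example can be pictured with a plain-Perl sketch like the one below, which scores each trial as a hit (1) or a miss (0) and optionally pairs each response with the target a fixed number of trials ahead. The helper name match_at_lag is hypothetical, and the direction and handling of the lag are assumptions made for illustration; see match in the Statistics::Sequences manpage for what the module actually does.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Sequences::Runs).
# Assumptions: both lists have the same length and $lag is non-negative.
# Response $i is compared with target $i + $lag, so a lag of 1 leaves
# one fewer pair than there are trials.
sub match_at_lag {
    my ($targets, $responses, $lag) = @_;
    $lag ||= 0;
    my @hits;
    for my $i (0 .. $#{$responses} - $lag) {
        push @hits, $responses->[$i] eq $targets->[$i + $lag] ? 1 : 0;
    }
    return @hits;
}

my @targets   = qw/circle plus square star/;
my @responses = qw/plus square star wave/;
print join('', match_at_lag(\@targets, \@responses, 0)), "\n";    # prints 0000
print join('', match_at_lag(\@targets, \@responses, 1)), "\n";    # prints 111
```

Under these assumptions, 250 trials matched at a lag of 1 leave 249 hit/miss observations, in line with the "249 samplings" mentioned in the example.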
REFERENCES
Kelly, E. F. (1982). On grouping of hits in some exceptional psi performers. Journal of the American Society for Psychical Research, 76, 101-142.
Swed, F., & Eisenhart, C. (1943). Tables for testing randomness of grouping in a sequence of alternatives. Annals of Mathematical Statistics, 14, 66-87. [Look in ex/checks.pl in the installation dist for a few examples from this paper for testing.]
Wald, A., & Wolfowitz, J. (1940). On a test whether two samples are from the same population. Annals of Mathematical Statistics, 11, 147-162.
Wolfowitz, J. (1943). On the theory of runs with some applications to quality control. Annals of Mathematical Statistics, 14, 280-288. [Suggests some ways in which data may be dichotomised for testing runs.]
SEE ALSO
Statistics::Sequences for other tests of sequences, and for sharing data between these tests.
TO DO/BUGS
Results are dubious if there are only two observations.
Testing by means other than z-scores, and/or using the Poisson distribution for low numbers of observations.
Fu's Markovian solution
REVISION HISTORY
See CHANGES in installation dist for revisions.
AUTHOR/LICENSE
Copyright (c) 2006-2009 Roderick Garton
rgarton AT cpan DOT org
This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).
DISCLAIMER
To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.