NAME

Statistics::RankCorrelation - Compute the rank correlation between two vectors

SYNOPSIS

use Statistics::RankCorrelation;

$c = Statistics::RankCorrelation->new(\@u, \@v);

$n = $c->spearman;
$n = $c->csim;

DESCRIPTION

This module computes the rank correlation coefficient between two sample vectors.

The following terms are used below:

Statistical rank: The ordinal number of a value's position in a list sorted in a specified order (usually decreasing).

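Tied ranks: Identical data values share one rank, which is the average of the rank numbers they would otherwise occupy (see _rank under PRIVATE FUNCTIONS).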

PUBLIC METHODS

new VECTOR1, VECTOR2

$c = Statistics::RankCorrelation->new(\@u, \@v);

This method constructs a new Statistics::RankCorrelation object, "co-normalizes" the vectors (i.e. pads the shorter one with trailing zero values) if they are not the same size, and finds their statistical ranks.

x_data, y_data

$x = $c->x_data;
$y = $c->y_data;

Return, as array references, the original data samples that were provided to the constructor.

x_rank, y_rank

$x = $c->x_rank;
$y = $c->y_rank;

Return, as array references, the statistical ranks of the data samples that were provided to the constructor.

spearman

$n = $c->spearman;

Spearman's rho rank-order correlation is a nonparametric measure of association based on the rank of the data values. The formula is:

    6 * sum( (Ri - Si)^2 )
1 - ----------------------
        N * (N^2 - 1)

Where Ri and Si are the ranks of the values of the two data vectors, and N is the number of samples in the vectors.

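The Spearman correlation is a special case of the Pearson product-moment correlation: it is the Pearson correlation computed on the ranks of the data rather than on the raw values. The formula above is the usual simplification of that computation and is exact when there are no tied ranks.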
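As a rough illustration of the formula, the following standalone sketch applies it to two array references that already hold ranks. The function name spearman_from_ranks is hypothetical and is not part of this module's interface; the module's own spearman method may be implemented differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module's API: apply
# 1 - 6 * sum( (Ri - Si)^2 ) / ( N * (N^2 - 1) )
# to two equal-length array references of ranks.
sub spearman_from_ranks {
    my ($r, $s) = @_;
    my $n = @$r;
    die "rank vectors differ in length\n" unless $n == @$s;
    die "need at least two samples\n" if $n < 2;

    # Sum of squared rank differences.
    my $sum = 0;
    for my $i (0 .. $n - 1) {
        $sum += ($r->[$i] - $s->[$i]) ** 2;
    }

    return 1 - (6 * $sum) / ($n * ($n ** 2 - 1));
}

my $same     = spearman_from_ranks([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]);
my $reversed = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]);
print "$same $reversed\n";    # prints "1 -1"
```

Identical rankings give 1 and exactly reversed rankings give -1, which are the two ends of the coefficient's range.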

csim

$n = $c->csim;

Return the "contour similarity index measure", which is a single-dimensional measure of the similarity between two vectors.

This returns a measure in the range [-1..1] and is computed using matrices of binary data representing "higher or lower" values in the original vectors.

For details, see the csim entry under SEE ALSO.

PRIVATE FUNCTIONS

_rank

$u_ranks = _rank(\@u);

Return an array reference of the ordinal ranks of the given data.

In the case of a tie in the data (identical values) the rank numbers are averaged. An example will help:

data  = [1.0, 2.1, 3.2,  3.2,  3.2,  4.3]
ranks = [1,   2,   12/3, 12/3, 12/3, 6]
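The averaging rule can be sketched in standalone Perl as follows. This is only an illustration under two assumptions: the helper name average_ranks is hypothetical, and values are ranked in ascending order with 1-based positions, as in the example data. The module's _rank function may differ in both respects. Under this rule, the three tied values at sorted positions 3, 4 and 5 each receive (3 + 4 + 5) / 3 = 4.

```perl
use strict;
use warnings;

# Hypothetical helper (name and ascending order are assumptions):
# return an array reference of ranks, averaging the ranks of ties.
sub average_ranks {
    my ($data) = @_;

    # Indices of the data, ordered by ascending value.
    my @order = sort { $data->[$a] <=> $data->[$b] } 0 .. $#$data;

    my @ranks;
    my $i = 0;
    while ($i < @order) {
        # Extend $j to the end of the run of values equal to the current one.
        my $j = $i;
        $j++ while $j < $#order
            && $data->[ $order[$j + 1] ] == $data->[ $order[$i] ];

        # Positions are 1-based, so the run occupies $i + 1 .. $j + 1;
        # the mean of consecutive integers is the mean of the endpoints.
        my $mean = ($i + 1 + $j + 1) / 2;
        $ranks[ $order[$_] ] = $mean for $i .. $j;

        $i = $j + 1;
    }
    return \@ranks;
}

my $ranks = average_ranks([1.0, 2.1, 3.2, 3.2, 3.2, 4.3]);
print "@$ranks\n";    # prints "1 2 4 4 4 6"
```

Each rank is written back to the position of its original value, so the result lines up with the input vector rather than with the sorted order.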

_pad_vectors

($u, $v) = _pad_vectors($u, $v);

Append zeros to the shorter input vector until both vectors have the same number of elements. That is, "pad" the tail of the shorter vector with zero values, one for each value in the longer vector that has no counterpart.
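A minimal sketch of this kind of padding is shown below. The name pad_with_zeros is hypothetical, and the choice to copy the inputs instead of modifying them in place is an assumption made for the example, not a statement about how _pad_vectors behaves.

```perl
use strict;
use warnings;

# Hypothetical helper: return two array references of equal length,
# padding the tail of the shorter input with zeros.
sub pad_with_zeros {
    my ($u, $v) = @_;

    # Copy the inputs so the caller's arrays are left untouched
    # (an assumption of this sketch).
    my @u = @$u;
    my @v = @$v;

    # At most one of these pushes runs, since the first equalizes the lengths.
    push @u, (0) x (@v - @u) if @u < @v;
    push @v, (0) x (@u - @v) if @v < @u;

    return (\@u, \@v);
}

my ($p, $q) = pad_with_zeros([1, 2, 3], [4]);
print "@$p | @$q\n";    # prints "1 2 3 | 4 0 0"
```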

_correlation_matrix

$matrix = _correlation_matrix($u);

Return the correlation matrix for a single vector.

This function builds a square, binary matrix that records, for each pair of elements in the vector, whether one value is higher or lower than the other.
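One way to build such a matrix is sketched below. The convention used here, that entry [i][j] is 1 when element i is lower than element j and 0 otherwise, is an assumption for illustration, as is the name binary_order_matrix; the encoding used by _correlation_matrix is not specified in this document.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumed convention: entry [i][j] is 1 when
# element i is lower than element j, and 0 otherwise.
sub binary_order_matrix {
    my ($u) = @_;
    my @m;
    for my $i (0 .. $#$u) {
        for my $j (0 .. $#$u) {
            $m[$i][$j] = $u->[$i] < $u->[$j] ? 1 : 0;
        }
    }
    return \@m;
}

my $m = binary_order_matrix([3, 1, 2]);
print join(' ', @$_), "\n" for @$m;
# prints:
# 0 0 0
# 1 0 1
# 1 0 0
```

Two vectors with the same contour (the same pattern of rises and falls) produce identical matrices under this convention, which is what makes a cell-by-cell comparison of the matrices a plausible similarity measure.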

SEE ALSO

For the csim method:

http://www2.mdanderson.org/app/ilya/Publications/JNMRcontour.pdf

For the spearman method:

http://mathworld.wolfram.com/SpearmanRankCorrelationCoefficient.html

http://faculty.vassar.edu/lowry/ch3b.html

http://www.pinkmonkey.com/studyguides/subjects/stats/chap6/s0606801.asp

http://fonsg3.let.uva.nl/Service/Statistics/RankCorrelation_coefficient.html

http://www.statsoftinc.com/textbook/stnonpar.html#correlations

http://software.biostat.washington.edu/~rossini/courses/intro-nonpar/text/Tied_Data.html#SECTION00427000000000000000

AUTHOR

Gene Boggs <gene@cpan.org>

COPYRIGHT AND LICENSE

Copyright 2003, Gene Boggs

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.