NAME

Statistics::ANOVA - Perform one-way analyses of variance

SYNOPSIS

use Statistics::ANOVA 0.061;
my $ano = Statistics::ANOVA->new();

# Some data:
my @gp1 = (qw/8 7 11 14 9/);
my @gp2 = (qw/11 9 8 11 13/);

# Load the data (names can be arbitrary):
$ano->load_data({gp1 => \@gp1, gp2 => \@gp2});
# Here comes another one:
my @gp3 = (qw/7 13 12 8 10/);
$ano->add_data(gp3 => \@gp3);

# If the data are independent, test the equality of variances, then the differences between the means (overall and pairwise):
$ano->obrien_test()->dump(title => 'O\'Brien\'s test of equality of variances');
$ano->levene_test()->dump(title => 'Levene\'s test of equality of variances');
$ano->anova_indep()->dump(title => 'Independent groups ANOVA', eta_squared => 1, omega_squared => 1);
$ano->comparisons_indep();

# or if they are repeated measures:
$ano->anova_dep()->dump(title => 'Dependent groups ANOVA');
$ano->comparisons_dep();
# or:
$ano->anova_friedman()->dump(title => 'Friedman test');
# or, with the F-distribution equivalent:
$ano->anova_friedman(f_equiv => 1)->dump(title => 'Friedman test');

DESCRIPTION

Performs one-way between-groups and repeated-measures ANOVAs, with estimates of the proportion of variance accounted for (eta-squared) and of effect size (omega-squared), plus pairwise comparisons by the relevant t-tests. Also performs tests of the equality of variances (O'Brien's, Levene's).

METHODS

new

Creates a new Statistics::ANOVA object.

load

$ano->load('aname', @data1)
$ano->load('aname', \@data1)
$ano->load({'aname' => \@data1, 'another_name' => \@data2})

Alias: load_data

Accepts either (1) a single name => value pair, where the value is a list (referenced or not) of data; or (2) a hash reference of named array references of data. The data are stored in the class object by name, within a hash named data, as Statistics::Descriptive::Full objects. You can therefore easily get at any descriptives for the groups you have loaded (e.g., $ano->{'data'}->{'aname'}->mean()) or retrieve the data again with $ano->{'data'}->{'aname'}->get_data(), and so on. The names of the data are up to you.

Each call unloads any previously loaded data.

Returns the Statistics::ANOVA object.

add

$ano->add('another_name', @data2)
$ano->add('another_name', \@data2)
$ano->add({'another_name' => \@data2})

Alias: add_data

Same as load, except that any previously loaded data are retained.

unload

$ano->unload();

Empties all cached data and any calculations based on them, ensuring these will not be used for testing. This is called automatically with each new load but, to guard against any future changes to the module, it is good practice to call it yourself whenever switching from one dataset to another.

anova_indep

$ano->anova_indep()

An implementation of a one-way between-groups analysis of variance. Stores the following values in the class object $ano:

$ano->{'f_value'} : the F-statistic
$ano->{'df_t'} : the treatment or numerator or between-groups degree(s) of freedom
$ano->{'df_e'} : the error or denominator or within-groups degree(s) of freedom
$ano->{'p_value'} : associated with the F-value with the above dfs
$ano->{'ss_t'} : treatment sums of squares
$ano->{'ss_e'} : error sums of squares
$ano->{'ms_t'} : treatment mean squares
$ano->{'ms_e'} : error mean squares
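To make these stored values concrete, here is a minimal sketch in core Perl that recomputes them for the three SYNOPSIS groups. It is not the module's own code: the sub oneway_indep is hypothetical, and p_value is left out because it needs an F-distribution routine outside core Perl.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not part of Statistics::ANOVA): computes the values
# that anova_indep is documented to store, except p_value, which needs an
# F-distribution routine (the module uses Math::Cephes' fdtrc).
sub oneway_indep {
    my ($data) = @_;                      # hash ref: name => array ref
    my @groups = values %{$data};
    my @all    = map { @{$_} } @groups;
    my $grand  = sum(@all) / @all;

    my ($ss_t, $ss_e) = (0, 0);
    for my $g (@groups) {
        my $mean = sum(@{$g}) / @{$g};
        $ss_t += @{$g} * ($mean - $grand)**2;           # between groups
        $ss_e += sum(map { ($_ - $mean)**2 } @{$g});    # within groups
    }
    my $df_t = @groups - 1;
    my $df_e = @all - @groups;
    my ($ms_t, $ms_e) = ($ss_t / $df_t, $ss_e / $df_e);
    return {
        ss_t => $ss_t, ss_e => $ss_e, df_t => $df_t, df_e => $df_e,
        ms_t => $ms_t, ms_e => $ms_e, f_value => $ms_t / $ms_e,
    };
}

my $res = oneway_indep({
    gp1 => [8, 7, 11, 14, 9],
    gp2 => [11, 9, 8, 11, 13],
    gp3 => [7, 13, 12, 8, 10],
});
printf "F(%d, %d) = %.4f\n", @{$res}{qw(df_t df_e f_value)};
```

For these data the sketch prints F(2, 12) = 0.0778 (ss_t = 14/15, ss_e = 72, ms_e = 6).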

anova_dep

$ano->anova_dep()

Alias: anova_rm

Performs a one-way repeated-measures analysis of variance (sphericity assumed). See anova_indep for the values stored in the object.
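The repeated-measures calculation differs from the between-groups one in that variation between subjects is removed from the error term. A minimal sketch in core Perl, assuming the usual sphericity-assumed partition (the sub oneway_rm is hypothetical, not the module's code), treating the SYNOPSIS groups as three conditions measured on the same five subjects:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper (not the module's code): one-way repeated-measures
# ANOVA, sphericity assumed. Each array ref is one condition; index $i is the
# same subject in every condition, so all arrays must have the same length.
sub oneway_rm {
    my @conds = @_;
    my $k = @conds;                       # number of conditions
    my $n = @{ $conds[0] };               # number of subjects
    my @all   = map { @{$_} } @conds;
    my $grand = sum(@all) / @all;

    my $ss_total = sum(map { ($_ - $grand)**2 } @all);
    my $ss_t = sum(map { $n * (sum(@{$_}) / $n - $grand)**2 } @conds);

    # Between-subjects variation is removed from the error term.
    my $ss_s = 0;
    for my $i (0 .. $n - 1) {
        my $subj_mean = sum(map { $_->[$i] } @conds) / $k;
        $ss_s += $k * ($subj_mean - $grand)**2;
    }
    my $ss_e = $ss_total - $ss_t - $ss_s;
    my ($df_t, $df_e) = ($k - 1, ($k - 1) * ($n - 1));
    return {
        ss_t => $ss_t, ss_e => $ss_e, df_t => $df_t, df_e => $df_e,
        f_value => ($ss_t / $df_t) / ($ss_e / $df_e),
    };
}

my $rm = oneway_rm([8, 7, 11, 14, 9], [11, 9, 8, 11, 13], [7, 13, 12, 8, 10]);
printf "F(%d, %d) = %.4f\n", @{$rm}{qw(df_t df_e f_value)};
```

Here the subjects' sum of squares (154/15) is taken out of the within-groups 72, leaving ss_e = 926/15 on 8 degrees of freedom, and the sketch prints F(2, 8) = 0.0605.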

anova_friedman

$ano->anova_friedman()

Alias: friedman_test

Performs Friedman's nonparametric analysis of variance - for two or more dependent (matched, related) groups. The statistical attributes now within the class object (see anova_indep) pertain to this test, e.g., $ano->{'chi_value'} gives the chi-square statistic from the Friedman test; and $ano->{'p_value'} gives the associated p-value (area under the right-side, upper tail). There is now no defined 'f_value'.

Accepts one optional argument: if f_equiv => 1, then, instead of the chi_value and a p_value read off the chi-square distribution, you get the equivalent F-value, with the p-value read off the F-distribution.

Nonparametric pairwise comparisons are not provided; use another module for these.
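The chi-square form of the Friedman statistic ranks the scores within each subject and compares the rank sums of the conditions. A minimal sketch in core Perl, assuming the textbook formula without a correction for ties (whether the module corrects for ties is not stated here). The f_equiv conversion is not reproduced, since the exact F form the module uses is not given above; avg_ranks and friedman_chi2 are hypothetical names.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Rank a list, giving tied values the mean of the ranks they span.
sub avg_ranks {
    my @x   = @_;
    my @idx = sort { $x[$a] <=> $x[$b] } 0 .. $#x;
    my @r;
    my $i = 0;
    while ($i <= $#idx) {
        my $j = $i;
        $j++ while $j < $#idx && $x[ $idx[ $j + 1 ] ] == $x[ $idx[$i] ];
        $r[ $idx[$_] ] = ($i + $j) / 2 + 1 for $i .. $j;   # mean of ranks i+1..j+1
        $i = $j + 1;
    }
    return @r;
}

# Hypothetical helper: Friedman chi-square without a correction for ties.
# Each array ref is one condition; index $i is the same subject throughout.
sub friedman_chi2 {
    my @conds = @_;
    my $k = @conds;
    my $n = @{ $conds[0] };
    my @rank_sums = (0) x $k;
    for my $i (0 .. $n - 1) {
        my @r = avg_ranks(map { $_->[$i] } @conds);    # rank within subject
        $rank_sums[$_] += $r[$_] for 0 .. $k - 1;
    }
    my $chi = 12 / ($n * $k * ($k + 1)) * sum(map { $_**2 } @rank_sums)
        - 3 * $n * ($k + 1);
    return { chi_value => $chi, df_t => $k - 1 };
}

my $fr = friedman_chi2([8, 7, 11, 14, 9], [11, 9, 8, 11, 13], [7, 13, 12, 8, 10]);
printf "chi^2(%d) = %.4f\n", @{$fr}{qw(df_t chi_value)};
```

For the SYNOPSIS data the rank sums are 9, 11 and 10, and the sketch prints chi^2(2) = 0.4000.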

obrien_test

Performs O'Brien's (1981) test for equality of variances across the groups. Each observation is transformed according to its group's variance and its deviation from its group mean, and an ANOVA is then performed on these transformed scores (whose group mean equals the variance of the original observations). The procedure is recognised to be robust against violations of normality (unlike F-max).

The statistical attributes now within the class object (see anova_indep) pertain to this test, e.g., $ano->{'f_value'} gives the F-statistic for O'Brien's Test; and $ano->{'p_value'} gives the p-value associated with the F-statistic for O'Brien's Test.
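The transformation step can be sketched in core Perl as follows. This assumes O'Brien's usual form with weight w = 0.5; the sub name obrien_transform is hypothetical, and the subsequent ANOVA on the transformed scores is the same computation as for anova_indep.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: O'Brien's (1981) transformation (weight w = 0.5) for
# one group. Needs at least 3 observations because of the (n - 2) term.
sub obrien_transform {
    my @x = @_;
    my $n = scalar @x;
    die "need at least 3 observations\n" if $n < 3;
    my $mean = sum(@x) / $n;
    my $var  = sum(map { ($_ - $mean)**2 } @x) / ($n - 1);   # unbiased variance
    return map {
        (($n - 1.5) * $n * ($_ - $mean)**2 - 0.5 * $var * ($n - 1))
          / (($n - 1) * ($n - 2))
    } @x;
}

my @r = obrien_transform(8, 7, 11, 14, 9);    # gp1 from the SYNOPSIS
printf "mean of transformed scores = %.4f\n", sum(@r) / @r;
```

For gp1 the sketch prints 7.7000, which is the sample variance (n - 1 denominator) of the original five values, as described above.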

levene_test

Performs Levene's (1960) test for equality of variances across the groups: an ANOVA of the absolute deviations, i.e., the absolute value of each observation's deviation from its group mean.

The statistical attributes now within the class object (see anova_indep) pertain to this test, e.g., $ano->{'f_value'} gives the F-statistic for Levene's Test; and $ano->{'p_value'} gives the p-value associated with the F-statistic for Levene's Test.
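Levene's statistic is simply the one-way ANOVA F computed on the absolute deviations. A minimal core-Perl sketch (hypothetical sub names; the p-value is omitted because it needs an F-distribution routine outside core Perl):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# |x - group mean| for each observation in one group.
sub abs_deviations {
    my @x    = @_;
    my $mean = sum(@x) / @x;
    return map { abs($_ - $mean) } @x;
}

# Plain one-way between-groups F for a list of array references.
sub anova_f {
    my @groups = @_;
    my @all    = map { @{$_} } @groups;
    my $grand  = sum(@all) / @all;
    my ($ss_t, $ss_e) = (0, 0);
    for my $g (@groups) {
        my $mean = sum(@{$g}) / @{$g};
        $ss_t += @{$g} * ($mean - $grand)**2;
        $ss_e += sum(map { ($_ - $mean)**2 } @{$g});
    }
    my ($df_t, $df_e) = (@groups - 1, @all - @groups);
    return ($ss_t / $df_t) / ($ss_e / $df_e);
}

# Hypothetical helper: Levene's F = ANOVA F on the absolute deviations.
sub levene_f {
    return anova_f(map { [ abs_deviations(@{$_}) ] } @_);
}

printf "Levene F = %.4f\n",
    levene_f([8, 7, 11, 14, 9], [11, 9, 8, 11, 13], [7, 13, 12, 8, 10]);
```

For the SYNOPSIS groups the mean absolute deviations are 2.16, 1.52 and 2.0, and the sketch prints Levene F = 0.3888.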

eta_squared

Returns eta-squared if an ANOVA has been performed; otherwise croaks. Also stores the value in $ano, named 'eta_sq'. Values range from 0 to 1: 0 indicates no effect (all group means equal), and 1 indicates that all of the variance in the dependent variable (DV) is accounted for by differences between the group means. Generally, eta-squared indicates the proportion of variance in the DV that is related to an effect.
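For a one-way between-groups design, eta-squared is the treatment sum of squares as a proportion of the total. A minimal sketch, assuming that formula (the sub is hypothetical, and whether the module uses the same formula after anova_dep is not stated here), using the sums of squares worked out by hand for the SYNOPSIS groups:

```perl
use strict;
use warnings;

# Hypothetical helper: eta-squared = ss_t / (ss_t + ss_e), using the sums of
# squares that anova_indep stores as ss_t and ss_e.
sub eta_squared {
    my ($ss_t, $ss_e) = @_;
    return $ss_t / ($ss_t + $ss_e);
}

# SYNOPSIS groups, independent design: ss_t = 14/15, ss_e = 72.
printf "eta_sq = %.4f\n", eta_squared(14 / 15, 72);
```

This prints eta_sq = 0.0128: group membership accounts for only about 1.3% of the variance in these data.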

omega_squared

Returns the effect-size statistic omega-squared if an ANOVA has been performed; otherwise croaks. Also stores the value in $ano, named 'omega_sq'. As a rough guide, the effect is small where omega_sq = .01, medium where omega_sq = .059, and large where omega_sq = .138.
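A minimal sketch of the common textbook estimate for a one-way between-groups design is shown below. The formula is an assumption, since the module's exact formula is not given here, and the sub is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, assuming the textbook one-way between-groups formula:
#   omega^2 = (ss_t - df_t * ms_e) / (ss_t + ss_e + ms_e)
sub omega_squared {
    my ($ss_t, $ss_e, $df_t, $df_e) = @_;
    my $ms_e = $ss_e / $df_e;
    return ($ss_t - $df_t * $ms_e) / ($ss_t + $ss_e + $ms_e);
}

# SYNOPSIS groups, independent design: ss_t = 14/15, ss_e = 72, df 2 and 12.
printf "omega_sq = %.4f\n", omega_squared(14 / 15, 72, 2, 12);
```

This prints omega_sq = -0.1402. Under this formula the estimate is negative whenever F < 1 (here F = 0.0778), which is usually read as no detectable effect.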

comparisons_indep

$ano->comparisons_indep(tails => 1|2, flag => 1|0 )

Performs independent samples t-tests for each pair of the loaded data, using Statistics::TTest, and simply prints the results to STDOUT. The p_value is 2-tailed by default; set tails => 1 for a 1-tailed value. If the optional attribute flag is true (1), an asterisk is appended to each output string whose p_value is less than the Bonferroni-adjusted alpha level. This alpha level (alpha = .05 divided by the number of pairwise comparisons) is printed at the end of the list.
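The Bonferroni-adjusted alpha depends only on the number of groups: with k groups there are k(k - 1)/2 pairs. A minimal sketch (the sub is hypothetical; the order in which the module lists the pairs may differ):

```perl
use strict;
use warnings;

# Hypothetical helper: Bonferroni-adjusted alpha for all pairwise
# comparisons among $k groups (alpha defaults to .05).
sub bonferroni_alpha {
    my ($k, $alpha) = @_;
    $alpha = 0.05 unless defined $alpha;
    my $n_pairs = $k * ($k - 1) / 2;
    return $alpha / $n_pairs;
}

# Enumerate the pairs for the SYNOPSIS group names.
my @names = qw(gp1 gp2 gp3);
for my $i (0 .. $#names - 1) {
    for my $j ($i + 1 .. $#names) {
        print "$names[$i] vs $names[$j]\n";
    }
}
printf "Bonferroni alpha = %.4f\n", bonferroni_alpha(scalar @names);
```

With three groups there are three comparisons, so a pair is flagged only if its p_value is below .05/3 (printed as 0.0167).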

comparisons_dep

$ano->comparisons_dep(tails => 1|2, flag => 1|0 )

Performs dependent samples t-tests for each pair of the loaded data, using Statistics::DependantTTest, and simply prints the results to STDOUT. The number of observations must be equal for each of the datasets tested. The p_value is 2-tailed by default; set tails => 1 for a 1-tailed value. If the optional attribute flag is true (1), an asterisk is appended to each output string whose p_value is less than the Bonferroni-adjusted alpha level. This alpha level (alpha = .05 divided by the number of pairwise comparisons) is printed at the end of the list.
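Each dependent comparison is a t-test on the pairwise differences. A minimal sketch of the standard paired-samples t statistic in core Perl (the sub is hypothetical; Statistics::DependantTTest also supplies the p-value, which is omitted here):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: standard paired-samples t on the differences x - y,
# with df = n - 1. Both lists must have the same length.
sub paired_t {
    my ($x, $y) = @_;
    die "lists must be the same length\n" unless @{$x} == @{$y};
    my @d    = map { $x->[$_] - $y->[$_] } 0 .. $#{$x};
    my $n    = @d;
    my $mean = sum(@d) / $n;
    my $var  = sum(map { ($_ - $mean)**2 } @d) / ($n - 1);
    return ($mean / sqrt($var / $n), $n - 1);
}

my ($t, $df) = paired_t([8, 7, 11, 14, 9], [11, 9, 8, 11, 13]);   # gp1 vs gp2
printf "t(%d) = %.4f\n", $df, $t;
```

For gp1 vs gp2 the mean difference is -0.6 with variance 11.3, so the sketch prints t(4) = -0.3991.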

string

$str = $ano->string(mse => 1, eta_squared => 1, omega_squared => 1, precision_p => integer, precision_s => integer)

Returns a statement of the result in the form F(df_t, df_e) = f_value, p = p_value or, for the Friedman test, chi^2(df_t) = chi_value, p = p_value, with the p-value given to the precision set by precision_p, if any. Optionally, the MSe, eta-squared and omega-squared values are also appended to the string, where relevant. These values and the test statistic are "sprintf"'d to the precision_s specified (or, by default, not at all).

table

$tble = $ano->table(precision_p => integer, precision_s => integer);

Returns a table listing the degrees of freedom, sums of squares, and mean squares for the tested "factor" and "error" (between/within groups), and the F and p values. The test statistics are "sprintf"'d to the precision_s specified (or, by default, not at all); the p value's precision can be specified by precision_p.

Formatting with right-justification where appropriate is left as an exercise for the user.

dump

$ano->dump(title => 'ANOVA test', precision_p => integer, precision_s => integer, mse => 1, eta_squared => 1, omega_squared => 1, verbose => 1)

Prints the string returned by string or, if the attribute table => 1 is given, the table returned by table (plus the string as well, if string => 1 is also given). A newline ("\n") is appended to the printed string. A title can be printed above the string or table by giving a value to the optional title attribute.

If verbose => 1, any anomalies arising in the calculations are noted after the other output. At present, this is only the number of observations that were purged because they were undefined or not-a-number when loaded or added.

Missing/Invalid values

Any data-points (observations) sent to load or add that are undefined or not-a-number are marked for purging before being tested by ANOVA or by pairwise comparisons. The data accessed as above, as Statistics::Descriptive::Full objects, will still show the original values; when you call one of the ANOVA or pairwise methods, however, the data are purged of these invalid values before testing.

For anova_indep and comparisons_indep, each list is simply purged of any undefined or invalid values. This also occurs for the equality of variances tests.

For anova_dep and comparisons_dep, the values at any index at which any list contains an invalid value are purged from every list. So if two lists are (1, 4, 2) and (2, ' ', 3), they become (1, 2) and (2, 3): this removes the invalid value in the second list while keeping all the observations appropriately paired.

The number of indices that were purged is cached in $ano->{'purged'}; the dump method can also report this value.

The looks_like_number function from Scalar::Util is used to check the validity of values.
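The two purging rules can be sketched with looks_like_number as follows. purge_indep and purge_dep are hypothetical names, not the module's internals; the dependent version keeps only the indices that are valid in every list and also returns how many indices were dropped, mirroring the 'purged' count described above.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper, independent data: drop invalid values from each
# list separately. Takes and returns array references.
sub purge_indep {
    return map { [ grep { defined($_) && looks_like_number($_) } @{$_} ] } @_;
}

# Hypothetical helper, dependent (paired) data: drop every index at which ANY
# list holds an invalid value, so the remaining observations stay paired.
# Assumes all lists have the same length. Returns (purged lists, count).
sub purge_dep {
    my @lists = @_;
    my $n = scalar @{ $lists[0] };
    my @keep = grep {
        my $i = $_;
        !grep { !defined($_->[$i]) || !looks_like_number($_->[$i]) } @lists;
    } 0 .. $n - 1;
    my @purged = map { my $l = $_; [ @{$l}[@keep] ] } @lists;
    return (\@purged, $n - @keep);
}

my ($lists, $dropped) = purge_dep([1, 4, 2], [2, ' ', 3]);
print join(' | ', map { "@{$_}" } @{$lists}), "; purged: $dropped\n";
```

With the example lists from the text, this prints 1 2 | 2 3; purged: 1.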

REFERENCES

Gardner, R. C. (2001). Psychological Statistics using SPSS for Windows. Upper Saddle River, NJ, US: Prentice Hall. : An interesting source for open-source.

Maxwell, S. E., & Delaney, H. D. (1990). Designing Experiments and Analyzing Data: A Model Comparison Perspective. Belmont, CA, US: Wadsworth.

SEE ALSO

Statistics::FisherPitman: An alternative to independent groups ANOVA when the variances are unequal.

Math::Cephes: Probabilities for all F-tests are computed with this module's fdtrc function, rather than with the more commonly used Statistics::Distributions module, as the former appears to be more accurate for higher values of F.

Statistics::Descriptive: Fundamental calculations of means and variances are left to this old standard, so any of its limitations or idiosyncrasies are naturally passed on to the present module. Unlike Statistics::Descriptive, however, the present module purges missing and non-numerical values.

Statistics::Table::F: Simply returns an F value. Note that it does not handle missing values: it treats them as zero and so returns an erroneous F-value in these cases.

BUGS/LIMITATIONS

Computational bugs will hopefully be identified with usage over time.

Optimisation welcomed.

No adjustment for violations of sphericity in repeated measures ANOVA.

The pairwise t-test results are only printed, not returned.

REVISION HISTORY

See CHANGES in installation distribution.

AUTHOR/LICENSE

rgarton AT cpan DOT org

This program is free software. It may be used, redistributed and/or modified under the same terms as Perl-5.6.1 (or later) (see http://www.perl.com/perl/misc/Artistic.html).

Disclaimer

To the maximum extent permitted by applicable law, the author of this module disclaims all warranties, either express or implied, including but not limited to implied warranties of merchantability and fitness for a particular purpose, with regard to the software and the accompanying documentation.