NAME
Lingua::Diversity::SamplingScheme - storing the parameters of a sampling scheme
VERSION
This documentation refers to Lingua::Diversity::SamplingScheme version 0.02.
SYNOPSIS
# Lingua::Diversity::SamplingScheme is used by Lingua::Diversity::Variety.
use Lingua::Diversity::Variety;
# Create a new sampling scheme...
my $sampling_scheme = Lingua::Diversity::SamplingScheme->new(
    'mode'           => 'segmental',
    'subsample_size' => 100,
);
# ... Then apply it to a Lingua::Diversity::Variety object.
Lingua::Diversity::Variety->new(
    'transform'       => 'type_token_ratio',
    'sampling_scheme' => $sampling_scheme,
);
DESCRIPTION
This class stores the set of parameters that define a sampling scheme (to be used with a Lingua::Diversity::Variety object). Such a scheme describes the kind of resampling to apply, as well as the number of subsamples and their size.
CREATOR
The creator (new()) returns a new Lingua::Diversity::SamplingScheme object. It takes one required and two optional named parameters:
- subsample_size (required)
The requested number of unit tokens per subsample (a positive integer).
- num_subsamples
The number of subsamples to be drawn (a positive integer). Default is 100. Note that this parameter has no effect in segmental mode (see below), since in this case the number of subsamples is the result of the integer division of text length by requested subsample size.
- mode
Either 'random' (default) or 'segmental'.
Value 'random' means that (i) the order of unit tokens in the text is preserved within a given subsample, and (ii) the probability for a unit token to occur in a given subsample depends only on the requested subsample size (see subsample_size above). For example, from the text 'say you say me', the following subsamples of size 3 (and only these) could be generated, each with equal probability: 'say you say', 'say you me', 'say say me', and 'you say me'.
Value 'segmental' means that subsamples are contiguous, non-overlapping sequences of units in the original text. For example, the text 'say you say me' yields exactly two subsamples of size 2: 'say you' and 'say me'. Incomplete subsamples at the end of the text are ignored, so a subsample size of 3 would produce a single subsample in this example (i.e. 'say you say'). Note that this mode assumes that the unit and category arrays are in the text's order.
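The two modes described above can be illustrated with a minimal, standalone sketch. It is not the code used by Lingua::Diversity; the function names segmental_subsamples and random_subsample are hypothetical and exist only for this illustration. The sketch assumes that the text is given as a reference to an array of unit tokens in text order.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical helper (not part of Lingua::Diversity).
# Segmental mode: contiguous, non-overlapping chunks of $size units.
# A trailing incomplete chunk is ignored.
sub segmental_subsamples {
    my ($units, $size) = @_;
    # Number of subsamples: integer division of text length by size.
    my $count = int(@$units / $size);
    return map { [ @{$units}[ $_ * $size .. ($_ + 1) * $size - 1 ] ] }
           0 .. $count - 1;
}

# Hypothetical helper (not part of Lingua::Diversity).
# Random mode: pick $size distinct positions uniformly at random,
# then sort the positions so that the text's order is preserved.
sub random_subsample {
    my ($units, $size) = @_;
    die "subsample size exceeds text length\n" if $size > @$units;
    my @shuffled  = shuffle(0 .. $#{$units});
    my @positions = sort { $a <=> $b } @shuffled[ 0 .. $size - 1 ];
    return [ @{$units}[@positions] ];
}

my @text = qw(say you say me);

# Two segmental subsamples of size 2.
print join(' ', @$_), "\n" for segmental_subsamples(\@text, 2);

# One random subsample of size 3 (varies from run to run).
print join(' ', @{ random_subsample(\@text, 3) }), "\n";
```

In random mode, the subsample is built from a random set of positions rather than from a shuffled copy of the tokens, which is what keeps the tokens in their original order.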
ACCESSORS
- get_subsample_size() and set_subsample_size()
Getter and setter for the subsample_size attribute.
- get_num_subsamples() and set_num_subsamples()
Getter and setter for the num_subsamples attribute.
- get_mode() and set_mode()
Getter and setter for the mode attribute.
DEPENDENCIES
This module is part of the Lingua::Diversity distribution.
BUGS AND LIMITATIONS
There are no known bugs in this module.
Please report problems to Aris Xanthos (aris.xanthos@unil.ch).
Patches are welcome.
AUTHOR
Aris Xanthos (aris.xanthos@unil.ch)
LICENSE AND COPYRIGHT
Copyright (c) 2011 Aris Xanthos (aris.xanthos@unil.ch).
This program is released under the GPL (see http://www.gnu.org/licenses/gpl.html).
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.