NAME
Statistics::Sampler::Multinomial - Generate multinomial samples using the conditional binomial method.
SYNOPSIS
use Statistics::Sampler::Multinomial;
my $object = Statistics::Sampler::Multinomial->new(
data => [0.1, 0.3, 0.2, 0.4],
);
$object->draw;
# returns a number between 0..3
my $samples = $object->draw_n_samples(5);
# returns an array ref of per-class counts that might look something like
# [1,2,0,2]
# to specify your own PRNG object, in this case the Mersenne Twister
use Math::Random::MT::Auto;
my $mrma = Math::Random::MT::Auto->new;
my $object = Statistics::Sampler::Multinomial->new(
prng => $mrma,
data => [1,2,3,5,10],
);
DESCRIPTION
Implements multinomial sampling using the conditional binomial method (the same algorithm as used in the GSL). Benchmarking shows it to be faster than the Alias method implemented in Statistics::Sampler::Multinomial::AliasMethod, presumably because the calls to the PRNG are made inside XS and so avoid Perl subroutine overheads. Profiling showed the RNG calls to be the main bottleneck for the Alias method.
For more details and background about the various approaches, see http://www.keithschwarz.com/darts-dice-coins.
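The conditional binomial method can be sketched in pure Perl as follows. This is a minimal illustration and not the module's actual code: the function names naive_binomial and draw_multinomial are hypothetical, the binomial draw is a slow trial-counting loop built on Perl's rand(), and handing all remaining samples to the last class is a shortcut used here to guard against floating point rounding.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Naive binomial draw: count successes in $n Bernoulli($p) trials.
# Illustrative only; real PRNG libraries use much faster algorithms.
sub naive_binomial {
    my ($p, $n) = @_;
    my $successes = 0;
    for (1 .. $n) {
        $successes++ if rand() < $p;
    }
    return $successes;
}

# Walk the classes in order. For each class, draw how many of the
# remaining samples fall into it, using its weight relative to the
# weight not yet consumed by earlier classes.
sub draw_multinomial {
    my ($weights, $n) = @_;
    my $remaining_weight  = sum(@$weights);
    my $remaining_samples = $n;
    die "weights must sum to a positive value"
        unless $remaining_weight > 0;

    my @counts;
    for my $i (0 .. $#$weights) {
        my $w = $weights->[$i];
        my $k;
        if ($i == $#$weights) {
            # Last class: its conditional probability is 1,
            # so it takes whatever is left.
            $k = $remaining_samples;
        }
        else {
            my $p = $remaining_weight > 0 ? $w / $remaining_weight : 0;
            $p = 1 if $p > 1;    # guard against rounding
            $k = naive_binomial($p, $remaining_samples);
        }
        push @counts, $k;
        $remaining_samples -= $k;
        $remaining_weight  -= $w;
    }
    return \@counts;
}

# Five samples across four classes; the counts always sum to 5.
my $counts = draw_multinomial([0.1, 0.3, 0.2, 0.4], 5);
print join(',', @$counts), "\n";
```

Each class needs only one binomial draw, so the number of PRNG calls depends on the number of classes and not on the number of samples.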
METHODS
- my $object = Statistics::Sampler::Multinomial->new(data => [0.1, 0.4, 0.5], data_sum_to_one => 1)
- my $object = Statistics::Sampler::Multinomial->new (data => [1,2,3,4,5,100], prng => $prng)
Creates a new object, optionally passing a PRNG object to be used.
Callers can promise that the data sum to one, in which case the sum is not calculated. No checks are made on the validity of such promises, so expect failures if the promise is false. (This should be generalised to use the sum directly.)
If no PRNG object is passed then it croaks. One day it will default to an internal object that uses the perl PRNG stream and has a binomial method.
Passing your own PRNG means you have control over the random number stream used, and can use it as part of a separate analysis. The only requirement of such an object is that it has a binomial() method.
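As a rough illustration of the binomial() requirement, the sketch below defines a hypothetical wrapper class, My::SimplePRNG, around Perl's built-in rand(). The class name, the seed argument and the trial-counting implementation are all assumptions made for this example. The argument order (probability, then number of trials) is assumed to follow Math::Random::MT::Auto, so check the module source before relying on it.

```perl
use strict;
use warnings;

{
    # Hypothetical wrapper, not part of this distribution.
    package My::SimplePRNG;

    sub new {
        my ($class, %args) = @_;
        # Seeds Perl's global PRNG stream if a seed is given.
        srand($args{seed}) if defined $args{seed};
        return bless {}, $class;
    }

    # ASSUMPTION: called as $prng->binomial($probability, $trials).
    # Counts successes over $trials Bernoulli draws; slow but simple.
    sub binomial {
        my ($self, $p, $n) = @_;
        my $successes = 0;
        for (1 .. $n) {
            $successes++ if rand() < $p;
        }
        return $successes;
    }
}

my $prng = My::SimplePRNG->new(seed => 42);
print $prng->binomial(0.5, 10), "\n";
```

An object like this could then be passed as the prng argument to new, although a PRNG with a fast native binomial sampler is likely to perform much better.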
- $object->draw
Draw one sample from the distribution. Returns the sampled class number.
- $object->draw_n_samples ($n)
Returns an array ref of length K holding the number of times each class was drawn across $n samples, where K is the length of the data array passed in to the call to new. The counts sum to $n. e.g. for $n=3 and the K=5 example from above, one could get [0,1,2,0,0].
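If a flat list of sampled class numbers is needed, the counts can be expanded on the caller's side. The snippet below is a small sketch that assumes the returned array holds one count per class, as in the example above. The variable names are illustrative and the input is hard-coded, not drawn from the sampler.

```perl
use strict;
use warnings;

# Example per-class counts for $n = 3 and K = 5.
my $counts = [0, 1, 2, 0, 0];

# Repeat each class index as many times as it was drawn.
my @class_ids;
for my $class (0 .. $#$counts) {
    push @class_ids, ($class) x $counts->[$class];
}

print "@class_ids\n";    # prints "1 2 2"
```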
- $object->get_class_count
Returns the number of classes in the sample, or zero if initialise has not yet been run.
BUGS AND LIMITATIONS
Please report any bugs or feature requests to https://github.com/shawnlaffan/perl-statistics-sampler-multinomial/issues.
SEE ALSO
These packages also have multinomial samplers and are (much) faster than this package, but you cannot supply your own PRNG. If you do not care that all your random samples come from the same PRNG stream then you should use them.
Math::Random, Math::GSL::Randist
AUTHOR
Shawn Laffan <shawnlaffan@gmail.com>
LICENCE AND COPYRIGHT
Copyright (c) 2016, Shawn Laffan <shawnlaffan@gmail.com>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
DISCLAIMER OF WARRANTY
BECAUSE THIS SOFTWARE IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE SOFTWARE, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE SOFTWARE "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE SOFTWARE IS WITH YOU. SHOULD THE SOFTWARE PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR, OR CORRECTION.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE SOFTWARE AS PERMITTED BY THE ABOVE LICENCE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE SOFTWARE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE SOFTWARE TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.