NAME

Algorithm::RandomPointGenerator -- This module generates a set of random points in a 2D plane according to a user-specified probability distribution that is provided to the module in the form of a 2D histogram.

SYNOPSIS

#  The quickest way to use the module is through the script genRand2D that you'll
#  find in the examples subdirectory.  You can move this script to any convenient
#  location in your directory structure.  Call this script as a command-line utility
#  in the following manner:

genRand2D  --histfile  your_histogram_file.csv  --bbfile  your_bounding_box_file.csv

#  where the '--histfile' option supplies the name of the file that contains a 2D
#  histogram and the option '--bbfile' the name of the file that defines a bounding
#  box in the XY-plane to which the histogram applies. The module uses the
#  Metropolis-Hastings algorithm to draw random points from a probability density
#  function that is approximated by the 2D histogram you supply through the
#  '--histfile' option. You can also run the command

genRand2D  --help

#  for further information regarding these two command-line options.  An invocation
#  of genRand2D gives you 2000 random points that are deposited in a file whose
#  name is printed out in the terminal window in which you invoke the genRand2D
#  command.


#  The rest of this Synopsis concerns using the module with greater control over
#  the production and display of random points.  Obviously, the very first thing
#  you would need to do would be to import the module:

use Algorithm::RandomPointGenerator;

#  Next, name the file that contains a 2D histogram for the desired density
#  function for the generation of random points:

my $input_histogram_file = "histogram.csv";

#  Then name the file that defines the bounding box for the random points:

my $bounding_box_file =  "bounding_box.csv";

#  Now construct an instance of RandomPointGenerator using a call that, assuming
#  you wish to set all the constructor options, would look like:

my $generator = Algorithm::RandomPointGenerator->new(
                          input_histogram_file    => $input_histogram_file,
                          bounding_box_file       => $bounding_box_file,
                          number_of_points        => 2000,
                          how_many_to_discard     => 500,
                          proposal_density_width  => 0.1,
                          y_axis_pos_direction    => 'down', 
                          output_hist_bins_along_x => 40,
);

#  The role served by the different constructor options is explained later in this
#  documentation. After constructing an instance of the module, you would need to
#  call the following methods for generating the random points and for displaying
#  their histogram:

$generator->read_histogram_file_for_desired_density();
$generator->read_file_for_bounding_box();
$generator->normalize_input_histogram();
$generator->set_sigmas_for_proposal_density();
$generator->metropolis_hastings();
$generator->write_generated_points_to_a_file();
$generator->make_output_histogram_for_generated_points();
$generator->plot_histogram_3d_surface();

#  As to what is accomplished by each of the methods called above is described
#  later in this documentation.  Note that since several of the constructor
#  parameters have defaults, a minimal call to the constructor may look as brief
#  as:

my $generator = Algorithm::RandomPointGenerator->new(
                          input_histogram_file    => $input_histogram_file,
                          bounding_box_file       => $bounding_box_file,
);

#  In this case, the number of points to be generated is set to 2000.  These will
#  be the points after the first 500 that are discarded to get past the effects of
#  the starting state of the generator.

DESCRIPTION

Several testing protocols for "big data" research projects involving large geographic areas require a random set of points that are distributed according to a user-specified probability density function which is likely to exist in the form of a 2D histogram. This module is an implementation of the Metropolis-Hastings algorithm for generating such a set of points.

METHODS

The module provides the following methods:

new():

A call to new() constructs a new instance of the Algorithm::RandomPointGenerator class. If you wanted to set all the constructor options, this call would look like:

my $generator = Algorithm::RandomPointGenerator->new(
                          input_histogram_file    => $input_histogram_file,
                          bounding_box_file       => $bounding_box_file,
                          number_of_points        => 2000,
                          how_many_to_discard     => 500,
                          proposal_density_width  => 0.1,
                          y_axis_pos_direction    => 'down', 
                );

where input_histogram_file is the name of the file that contains a 2D histogram as an approximation to the desired probability density function for the random points to be generated. The data in the histogram file is expected to be in CSV format. Here is a display of a very small portion of the contents of such a file for an actual geographic region:

0,211407,216387,211410,205621,199122,192870, ........
0,408221,427716,427716,427716,427716,427716,427716, ......
0,408221,427716,427716,427716,427716,427716,427716, ......
....
....
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,165,9282,11967,15143, .....
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,....

The bounding_box_file parameter of the constructor should delineate the portion of the plane to which the input histogram corresponds. Here is an example of the contents of an actual file supplied for this option:

-71.772016, -70.431923
-34.254251,  -33.203240

Apart from any comment lines, there must exist exactly two lines in the bounding-box file, with the first line indicating the left and the right limits of the horizontal coordinates and the second line indicating the lower and the upper limits of the vertical coordinates. (The values shown above are the longitude and the latitude limits for a region in Chile, in case you are curious.)

Constructor Parameters:

input_histogram_file:

This required parameter supplies the name of the file that contains a 2D histogram as the desired density function for the random points that the module will generate. Each line record in this file must correspond to one row of the 2D histogram. The left-to-right direction in the file will be considered to be positive direction for the x-axis. As for the positive direction for the y-axis, it is common for that to be from top to bottom when the histogram is written out to a text file as an array of integers. It is important to bear in mind this orientation of the histogram in light of the fact that a bounding box is specified typically with its y-axis going upwards (whereas the x-axis continues to be positive from left to right). This inconsistency between how a 2D histogram is typically stored in a text file and how a bounding box is likely to be specified means that if the events occur more frequently in the upper right hand corner of the bounding box, those high counts would show up in the lower right hand corner of the histogram in a text file (or in a terminal display of the contents of such a file). You can use the constructor option y_axis_pos_direction to reverse the positive sense of the y direction for the histogram. If you set y_axis_pos_direction to the string up, then the orientation of the y axis in both the histogram and the bounding box will be the same.

bounding_box_file:

This required parameter supplies the name of the file that contains the bounding box information. By bounding box is meant the part of the XY-plane that corresponds to the histogram supplied through the input_histogram_file option. Apart from any comment lines, this file must contain exactly two lines and each line must contain exactly two numeric entries. Additionally, the first entry in each of the two lines must be smaller than the second entry in the same line. The two entries in the first line define the lower and the upper bounds on the x-axis and the two entries in the second line do the same for the y-axis.

number_of_points:

This parameter specifies the number of random points that you want the module to generate.

how_many_to_discard:

The Metropolis-Hastings algorithm belongs to a category of algorithms known as random-walk algorithms. Since the random walk carried out by such algorithms must be initialized with user input, it is necessary to discard the points produced until the effects the initial state have died out. This parameter specifies how many of the generated points will be discarded. This parameter is optional in the sense that it has a default value of 500.

proposal_density_width:

Given the current point, the Metropolis-Hastings randomly chooses a candidate for the next point. This random selection of a candidate point is carried out using what is known as the "proposal density". The module uses a bivariate Gaussian for the proposal density. The value supplied through this parameter controls the standard deviations of the proposal Gaussian in the x and the y directions. The default value for this parameter is 0.1. With that value for the parameter, the standard deviation of the proposal density along the x direction will be set to 0.1 times the width of the bounding box, and the standard deviation of the same along the y-direction to 0.1 times the height of the bounding box.

y_axis_pos_direction:

See the explanation above for the constructor parameter input_histogram_file for why you may need to use the y_axis_pos_direction parameter. The parameter takes one of two values, up and down. The usual interpretation of a 2D histogram in a text file --- with the positive direction of the y-axis pointing downwards --- corresponds to the case when this parameter takes the default value of down.

output_hist_bins_along_x:

This parameter controls the resolution with which the histogram of the generated random points will be displayed. The value you supply is for the number of bins along the x-direction. The number of bins along the y-direction is set according to the aspect ratio of the bounding box.

read_histogram_file_for_desired_density():
$generator->read_histogram_file_for_desired_density();

This is a required call after the constructor is invoked. As you would expect, this call reads in the histogram data for the desired probability density function for random point generation.

read_file_for_bounding_box():
$generator->read_file_for_bounding_box();

This is also a required call. This call reads in the left and the right limits on the x-axis and the lower and the upper limits on the y-axis for the portion of the XY-plane for which the random points are needed.

normalize_input_histogram():
$generator->normalize_input_histogram();

This normalizes the input histogram to turn it into an approximation to the desired probability density function for the random points.

set_sigmas_for_proposal_density():
$generator->set_sigmas_for_proposal_density();

The Metropolis-Hastings algorithm is a random-walk algorithm that at each point first creates a candidate for the next point and then makes a probabilistic decision regarding the acceptance of the candidate point as the next point on the walk. The candidate point is drawn from what is known as the proposal density function. This module uses a bivariate Gaussian for the proposal density. You set the width of the proposal density by calling this method.

metropolis_hastings():

This is the workhorse of the module, as you would expect. As its name implies, it uses the Metropolis-Hastings random-walk algorithm to generate a set of random points whose probability distribution corresponds to the input histogram.

write_generated_points_to_a_file():
$generator->write_generated_points_to_a_file();

This method writes the generated points to a disk file whose name is keyed to the name of the input histogram file. The two coordinates for each generated random point are written out to a new line in this file.

make_output_histogram_for_generated_points():
$generator->make_output_histogram_for_generated_points();

The name of the method speaks for itself.

plot_histogram_3d_surface():
$generator->plot_histogram_3d_surface();

This method uses a Perl wrapper for Gnuplot, as provided by the module Graphics::GnuplotIF, for creating a 3D surface plot of the histogram of the random points generated by the module.

plot_histogram_lineplot():
$generator->plot_histogram_lineplot(); 

This creates a 3D line plot display of the histogram of the generated random points.

display_output_histogram_in_terminal_window():
$generator->display_output_histogram_in_terminal_window();

Useful for debugging purposes, it displays in the terminal window a two dimensional array of numbers that is the histogram of the random points generated by the module.

THE examples DIRECTORY

Probably the most useful item in the examples directory is the command-line script genRand2D that can be called simply with two arguments for generating a set of random points. A call to this script looks like

genRand2D  --histfile  your_histogram_file.csv  --bbfile  your_bounding_box_file.csv

where the --histfile option supplies the name of the file that contains a 2D input histogram and the --bbfile option the name of the file that defines the bounding box in the XY-plane. You can also execute the command line

genRand2D  --help

for displaying information as to what is required by the two options for the genRand2D command. The command-line invocation of genRand2D gives you 2000 random points that are deposited in a file whose name is printed out in the terminal window in which you execute the command.

To become more familiar with all of the different options you can use with this module, you should experiment with the script:

generate_random_points.pl

You can feed it different 2D histograms --- even made-up 2D histograms --- and look at the histogram of the generated random points to see how well the module is working. Keep in mind, though, if your made-up input histogram has disconnected blobs in it, the random-points that are generated may correspond to just one of the blobs. Since the process of random-point generation involves a random walk, the algorithm may not be able to hop from one blob to another in the input histogram if they are too far apart. As to what exactly you'll get by way of the output histogram would depend on your choice of the width of the proposal density.

The examples directory contains the following histogram and bounding-box files for you to get started:

hist1.csv   bb1.csv

hist2.csv   bb2.csv    

If you run the generate_random_points.pl script with the hist1.csv and bb1.csv files, the histogram you get for the 2000 random points generated by the module will look like what you see in the file output_histogram_for_hist1.png. On a Linux machine, you can see this histogram with the usual display command from the ImageMagick library. And if you run generate_random_points.pl script with the hist2.csv and bb2.csv files, you'll see an output histogram that should look like what you see in output_histogram_for_hist2.png.

You should also try invoking the command-line calls:

genRand2D --histfile hist1.csv --bbfile bb1.csv

genRand2D --histfile hist2.csv --bbfile bb2.csv

REQUIRED

This module requires the following three modules:

List::Util
Graphics::GnuplotIF
Math::Big
Math::Random
File::Basename

EXPORT

None by design.

BUGS

Please notify the author if you encounter any bugs. When sending email, please place the string 'RandomPointGenerator' in the subject line.

INSTALLATION

The usual

perl Makefile.PL
make
make test
make install

if you have root access. If not,

perl Makefile.PL prefix=/some/other/directory/
make
make test
sudo make install

AUTHOR

Avinash Kak, kak@purdue.edu

If you send email, please place the string "RandomPointGenerator" in your subject line to get past my spam filter.

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Copyright 2014 Avinash Kak