NAME
Net::Amazon::HadoopEC2::Cluster - Representation of Hadoop-EC2 cluster
SYNOPSIS
my $hadoop = Net::Amazon::HadoopEC2->new(
    {
        aws_account_id => 'my account',
        aws_access_key_id => 'my key',
        aws_secret_access_key => 'my secret',
    }
);
my $cluster = $hadoop->launch_cluster(
    {
        name => 'hadoop-ec2-cluster',
        image_id => 'ami-b0fe1ad9', # hadoop-ec2 official image
        slaves => 2,
        key_name => 'gsg-keypair',
        key_file => "$ENV{HOME}/.ssh/id_rsa-gsg-keypair",
    }
);
$cluster->push_files(
    {
        files => ['map.pl', 'reduce.pl'],
        destination => '/root/',
    }
);
my $option = join(' ', qw(
        -mapper map.pl
        -reducer reduce.pl
        -file map.pl
        -file reduce.pl
    )
);
my $result = $cluster->execute(
    {
        command => "$hadoop jar $streaming $option",
    }
);
DESCRIPTION
A class representing a Hadoop-EC2 cluster.
METHODS
new
Constructor. Net::Amazon::HadoopEC2 normally calls this for you, so you rarely need to call it directly.
launch_cluster ($hashref)
Launches a hadoop-ec2 cluster. Returns the Net::Amazon::HadoopEC2::Cluster instance itself on success. Arguments are:
- image_id (required)
  The image id (AMI) of the cluster.
- key_name (optional)
  The key name to use when launching the cluster. The default is 'gsg-keypair'.
- key_file (required)
  Location of the private key file associated with key_name.
- slaves (optional)
  The number of slaves. The default is 2.
 
find_cluster
Finds a hadoop-ec2 cluster. Returns the Net::Amazon::HadoopEC2::Cluster instance itself if found.
launch_slave ($hashref)
Launches a hadoop-ec2 slave instance for this cluster. Returns the Net::Amazon::HadoopEC2::Cluster instance itself on success. Arguments are:
terminate_cluster
Terminates all EC2 instances of this cluster. Returns a Net::Amazon::EC2::TerminateInstancesResponse instance.
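Because running instances keep costing money, it can help to make sure terminate_cluster is called even when a job fails. The following is a minimal sketch of that idea, not part of this module: run_then_terminate is a hypothetical helper, and $job is any code reference that receives the cluster object.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by this module): run a job against
# the cluster and always call terminate_cluster afterwards.
sub run_then_terminate {
    my ($cluster, $job) = @_;

    my $result;
    my $ok    = eval { $result = $job->($cluster); 1 };
    my $error = $@;

    # Release the EC2 instances whether or not the job succeeded.
    $cluster->terminate_cluster;

    # Re-throw the job's error only after cleanup has run.
    die $error unless $ok;
    return $result;
}
```

A call might look like run_then_terminate($cluster, sub { $_[0]->execute({ command => 'ls' }) }). Note that if terminate_cluster itself dies, the job's original error is lost in this sketch.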
terminate_slaves ($hashref)
Terminates hadoop-ec2 slave instances of this cluster. Returns a Net::Amazon::EC2::TerminateInstancesResponse instance. Arguments are:
- slaves (optional)
  The number of slave instances to terminate. The default is the number of existing instances.
 
execute ($hashref)
Runs a command on the master instance via ssh. Returns a Net::Amazon::HadoopEC2::SSH::Response instance. This method is implemented in Net::Amazon::HadoopEC2::SSH and is only a wrapper around Net::SSH::Perl. Arguments are:
- command (required)
  The command line to run.
- stdin (optional)
  String to pass to STDIN of the command.
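The two arguments above can be assembled as in the following sketch. run_on_master is a hypothetical helper, not part of this module; it only builds the documented hashref (command required, stdin optional) and hands it to execute. The methods of the returned response object are not described in this document, so the sketch returns it untouched.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by this module): build the hashref
# that execute() documents and run the command on the master.
sub run_on_master {
    my ($cluster, $command, $stdin) = @_;
    die "command is required\n"
        unless defined $command && length $command;

    my %args = (command => $command);
    $args{stdin} = $stdin if defined $stdin;   # send stdin only when given

    # Returns whatever execute() returns; per the text above this is a
    # Net::Amazon::HadoopEC2::SSH::Response instance.
    return $cluster->execute(\%args);
}
```

For example, run_on_master($cluster, 'wc -l', "a\nb\n") would pass two lines to the command's STDIN.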
 
push_files ($hashref)
Pushes local files to the hadoop-ec2 master instance via ssh. This method is also implemented in Net::Amazon::HadoopEC2::SSH. Returns true on success. Arguments are:
- files (required)
  Files to push. Accepts a string or an arrayref of strings.
- destination (required)
  Destination of the files.
 
get_files ($hashref)
Gets files from the hadoop-ec2 master instance. This method is implemented in Net::Amazon::HadoopEC2::SSH. Returns true on success. Arguments are:
- files (required)
  Files to get. Accepts a string or an arrayref of strings.
- destination (required)
  Local path to place the files.
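Since get_files reports success only through a true return value, a caller may want to turn a false return into an exception. The sketch below shows one way to do that. fetch_outputs is a hypothetical helper, not part of this module, and creating the local directory first is a precaution rather than a documented requirement.

```perl
use strict;
use warnings;
use File::Path qw(make_path);

# Hypothetical helper (not provided by this module): download files from
# the master into a local directory and die if get_files returns false.
sub fetch_outputs {
    my ($cluster, $files, $local_dir) = @_;

    # Assumption: the destination may need to exist before the transfer.
    make_path($local_dir) unless -d $local_dir;

    # files may be a single string or an arrayref of strings.
    $cluster->get_files({ files => $files, destination => $local_dir })
        or die "get_files returned false for destination $local_dir\n";

    return $local_dir;
}
```

The same pattern applies to push_files, which takes the same two arguments and also returns true on success.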
 
ATTRIBUTES
name
Name of the cluster.
key_file
Location of the private key file associated with key_name.
retry
Boolean indicating whether EC2 API requests are retried.
map_tasks
MAX_MAP_TASKS to pass to the instances at boot.
reduce_tasks
MAX_REDUCE_TASKS to pass to the instances at boot.
compress
COMPRESS to pass to the instances at boot.
user_data
Additional user data to pass to the instances at boot.
master_instance
The Net::Amazon::EC2::RunningInstances instance of the master instance.
slave_instances
Arrayref of Net::Amazon::EC2::RunningInstances instances of the slave instances.
AUTHOR
Nobuo Danjou <nobuo.danjou@gmail.com>