NAME

Net::Amazon::HadoopEC2::Cluster - Representation of Hadoop-EC2 cluster

SYNOPSIS

my $hadoop = Net::Amazon::HadoopEC2->new(
    {
        aws_account_id => 'my account',
        aws_access_key_id => 'my key',
        aws_secret_access_key => 'my secret',
    }
);
my $cluster = $hadoop->launch_cluster(
    {
        name => 'hadoop-ec2-cluster',
        image_id => 'ami-b0fe1ad9', # hadoop-ec2 official image
        slaves => 2,
        key_name => 'gsg-keypair',
        key_file => "$ENV{HOME}/.ssh/id_rsa-gsg-keypair",
    }
);
$cluster->push_files(
    {
        files => ['map.pl', 'reduce.pl'],
        destination => '/root/',
    }
);
my $option = join(' ', qw(
        -mapper map.pl
        -reducer reduce.pl
        -file map.pl
        -file reduce.pl
    )
);
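my $result = $cluster->execute(
    {
        # $hadoop_cmd and $streaming_jar are placeholders for the paths of the
        # hadoop executable and the streaming jar on the master instance.
        command => "$hadoop_cmd jar $streaming_jar $option",
    }
);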

DESCRIPTION

A class representing a Hadoop-EC2 cluster.

METHODS

new

Constructor. Net::Amazon::HadoopEC2 normally calls this for you, so you rarely need to call it directly.

launch_cluster ($hashref)

Launches a hadoop-ec2 cluster. Returns the Net::Amazon::HadoopEC2::Cluster instance itself on success. Arguments are:

image_id (required)

The image id (AMI) to launch the cluster instances from.

key_name (optional)

The key name to use when launching the cluster. The default is 'gsg-keypair'.

key_file (required)

Location of the private key file associated with key_name.

slaves (optional)

The number of slaves. The default is 2.

find_cluster

Finds the hadoop-ec2 cluster. Returns the Net::Amazon::HadoopEC2::Cluster instance itself if found.

launch_slave ($hashref)

Launches hadoop-ec2 slave instances for this cluster. Returns the Net::Amazon::HadoopEC2::Cluster instance itself on success. Arguments are:

slaves (optional)

The number of slaves to launch. The default is 1.

terminate_cluster

Terminates all EC2 instances of this cluster. Returns a Net::Amazon::EC2::TerminateInstancesResponse instance.
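
Since terminate_cluster is the call that shuts down every instance, it can be useful to make sure it runs even when a job dies partway through. The following is a minimal sketch under that assumption; with_cluster is a hypothetical helper name, not part of this module, and it relies only on terminate_cluster as documented above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Net::Amazon::HadoopEC2): runs $job with
# the cluster, then always calls terminate_cluster, even if $job died.
sub with_cluster {
    my ($cluster, $job) = @_;
    my $result;
    my $ok    = eval { $result = $job->($cluster); 1 };
    my $error = $@;    # save before terminate_cluster can overwrite $@

    # Documented above: terminates all EC2 instances of this cluster.
    $cluster->terminate_cluster;

    die $error unless $ok;
    return $result;
}
```

With a real cluster this might be called as with_cluster($cluster, sub { $_[0]->execute({ command => 'uptime' }) }). The job is evaluated in scalar context, so it should return a single value.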

terminate_slaves ($hashref)

Terminates hadoop-ec2 slave instances of this cluster. Returns a Net::Amazon::EC2::TerminateInstancesResponse instance. Arguments are:

slaves (optional)

The number of slave instances to terminate. The default is the number of existing instances.
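
launch_slave and terminate_slaves can be combined to move a cluster to a target size. The sketch below is illustrative only: resize_slaves is a hypothetical helper, and it assumes that the slave_instances attribute reflects the slaves currently running.

```perl
use strict;
use warnings;

# Hypothetical helper: grow or shrink the cluster to $target slaves using
# only the documented slave_instances, launch_slave and terminate_slaves.
# ASSUMPTION: slave_instances is up to date when this is called.
sub resize_slaves {
    my ($cluster, $target) = @_;
    my $current = scalar @{ $cluster->slave_instances || [] };
    my $delta   = $target - $current;

    if ($delta > 0) {
        $cluster->launch_slave({ slaves => $delta });
    }
    elsif ($delta < 0) {
        $cluster->terminate_slaves({ slaves => -$delta });
    }

    # Positive: slaves requested; negative: slaves terminated; 0: no change.
    return $delta;
}
```

For example, resize_slaves($cluster, 4) on a cluster with two slaves would request two more, and would do nothing on a cluster that already has four.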

execute ($hashref)

Runs a command on the master instance via ssh. Returns a Net::Amazon::HadoopEC2::SSH::Response instance. This method is implemented in Net::Amazon::HadoopEC2::SSH and is only a thin wrapper around Net::SSH::Perl. Arguments are:

command (required)

The command line to pass.

stdin (optional)

String to pass to STDIN of the command.
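
A caller will usually want to check whether the remote command succeeded. The sketch below shows one way to do that. run_or_die is a hypothetical helper, and the code, stdout and stderr accessors on the response object are an assumption based on the three values Net::SSH::Perl's cmd returns; check Net::Amazon::HadoopEC2::SSH::Response for the actual names.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command on the master and die on failure.
# ASSUMPTION: the response object exposes code, stdout and stderr
# accessors; these names are not confirmed by this document.
sub run_or_die {
    my ($cluster, $command, $stdin) = @_;

    # command is required, stdin is optional (see the argument list above).
    my %args = (command => $command);
    $args{stdin} = $stdin if defined $stdin;

    my $res = $cluster->execute(\%args);
    die "execute returned nothing for '$command'\n" unless $res;

    if ($res->code != 0) {
        die sprintf("'%s' exited with %d: %s\n",
            $command, $res->code, $res->stderr // '');
    }
    return $res->stdout;
}
```

For example, run_or_die($cluster, 'wc -l', "a\nb\n") would feed two lines to wc on the master and return whatever it printed.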

push_files ($hashref)

Pushes local files to the hadoop-ec2 master instance via ssh. This method is also implemented in Net::Amazon::HadoopEC2::SSH. Returns true on success. Arguments are:

files (required)

Files to push. Accepts a string or an arrayref of strings.

destination (required)

Destination of the files.

get_files ($hashref)

Gets files from the hadoop-ec2 master instance. This method is implemented in Net::Amazon::HadoopEC2::SSH. Returns true on success. Arguments are:

files (required)

Files to get. Accepts a string or an arrayref of strings.

destination (required)

Local path to place the files.
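
push_files, execute and get_files are typically used together: upload scripts, run them, then fetch the output. The following sketch wires the three calls together and checks the documented true-on-success return values. run_remote_script and its option names (scripts, remote_dir, command, outputs, local_dir) are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: upload files, run a command on the master, then
# download the results. Uses only push_files, execute and get_files.
sub run_remote_script {
    my ($cluster, %opt) = @_;

    # push_files returns true on success.
    $cluster->push_files(
        { files => $opt{scripts}, destination => $opt{remote_dir} }
    ) or die "push_files failed\n";

    my $res = $cluster->execute({ command => $opt{command} });

    # get_files returns true on success.
    $cluster->get_files(
        { files => $opt{outputs}, destination => $opt{local_dir} }
    ) or die "get_files failed\n";

    return $res;
}
```

The helper returns the response object from execute unchanged, so the caller decides how to interpret the command's result.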

ATTRIBUTES

name

Name of the cluster.

key_file

Location of the private key file associated with key_name.

retry

Boolean. Whether to retry EC2 API requests.

map_tasks

MAX_MAP_TASKS to pass to the instances at boot.

reduce_tasks

MAX_REDUCE_TASKS to pass to the instances at boot.

compress

COMPRESS to pass to the instances at boot.

user_data

Additional user data to pass to the instances at boot.

master_instance

Net::Amazon::EC2::RunningInstances instance of the master instance.

slave_instances

Arrayref of Net::Amazon::EC2::RunningInstances instances of the slave instances.
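
The attributes above can be read to get a quick overview of a cluster. The sketch below is one possible way to do that: cluster_summary is a hypothetical helper, and it assumes the instance_id and dns_name accessors of Net::Amazon::EC2::RunningInstances.

```perl
use strict;
use warnings;

# Hypothetical helper: build a short text summary from the name,
# master_instance and slave_instances attributes.
sub cluster_summary {
    my ($cluster) = @_;
    my @lines = (sprintf('cluster: %s', $cluster->name));

    # One line per instance; dns_name may be unset while an instance boots.
    my $describe = sub {
        my ($label, $instance) = @_;
        return sprintf('%s %s (%s)',
            $label, $instance->instance_id, $instance->dns_name // 'no DNS yet');
    };

    my $master = $cluster->master_instance;
    push @lines, $describe->('master: ', $master) if $master;
    push @lines, $describe->('slave:  ', $_)
        for @{ $cluster->slave_instances || [] };

    return join("\n", @lines);
}
```

Calling print cluster_summary($cluster), "\n" prints the cluster name followed by one line for the master and one line per slave.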

AUTHOR

Nobuo Danjou nobuo.danjou@gmail.com

SEE ALSO

Net::Amazon::HadoopEC2