NAME
Net::CascadeCopy - efficiently copy files to many servers in multiple locations
SYNOPSIS
use Net::CascadeCopy;
# create a new CascadeCopy object
my $ccp = Net::CascadeCopy->new( {
    ssh          => "/path/to/ssh",
    ssh_flags    => "-x -A",
    max_failures => 3,
    max_forks    => 2,
} );
# set the command and arguments to use to transfer file(s)
$ccp->set_command( "scp", "-p" );
# set path on the local server
$ccp->set_source_path( "/path/on/local/server" );
# set path on all remote servers
$ccp->set_target_path( "/path/on/remote/servers" );
# add lists of servers in multiple datacenters
$ccp->add_group( "datacenter1", \@dc1_servers );
$ccp->add_group( "datacenter2", \@dc2_servers );
# transfer all files
$ccp->transfer();
DESCRIPTION
This module efficiently distributes a file or directory across a large number of servers in multiple datacenters via rsync or scp.
A frequent solution to distributing a file or directory to a large number of servers is to copy it from a central file server to all other servers. To speed this up, multiple file servers may be used, or files may be copied in parallel until the inevitable bottleneck in network/disk/cpu is reached. These approaches run in O(n) time.
This module and the included script, ccp, take a much more efficient approach that is O(log n). Once the file(s) have been copied to a remote server, that server is promoted to a source server for copying to the remaining servers. Thus, the rate of transfer increases exponentially rather than linearly.
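The difference between the two approaches can be illustrated with a small simulation. The sketch below is a simplified model and not the module's scheduler: it assumes that every transfer takes one round, that no transfer fails, that each server holding the files sends to a fixed number of targets per round, and that a target becomes a source as soon as it has the files. The function names cascade_rounds and central_rounds are made up for this illustration.

```perl
use strict;
use warnings;

# Count the rounds needed to reach $targets servers when every server that
# already holds the files sends to $per_source new servers in each round.
sub cascade_rounds {
    my ( $targets, $per_source ) = @_;
    my $sources   = 1;           # the local server starts with the files
    my $remaining = $targets;
    my $rounds    = 0;
    while ( $remaining > 0 ) {
        my $copies = $sources * $per_source;
        $copies = $remaining if $copies > $remaining;
        $remaining -= $copies;
        $sources   += $copies;   # finished targets are promoted to sources
        $rounds++;
    }
    return $rounds;
}

# Rounds needed when one central server pushes to $per_source targets at a time.
sub central_rounds {
    my ( $targets, $per_source ) = @_;
    return int( ( $targets + $per_source - 1 ) / $per_source );
}

for my $n ( 10, 100, 1000 ) {
    printf "%5d servers: cascade %2d rounds, central %4d rounds\n",
        $n, cascade_rounds( $n, 2 ), central_rounds( $n, 2 );
}
```

With two transfers per source, the number of servers holding the files triples in each round of this model, so 1000 servers are reached in 7 rounds, against 500 rounds for a single central server.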
Servers can be specified in groups (e.g. datacenter) to prevent copying across groups. This maximizes the number of transfers done over a local high-speed connection (LAN) while minimizing the number of transfers over the WAN.
The number of simultaneous transfers per source server is configurable. The total number of simultaneously forked processes is limited via Proc::Queue and is currently hard-coded to 32.
CONSTRUCTOR
- new( { option => value } )
-
Returns a reference to a new Net::CascadeCopy object.
Supported options:
- ssh => "/path/to/ssh"
-
Name or path of the ssh script to use to log in to each remote server to begin a transfer to another remote server. Default is simply "ssh", to be invoked from $PATH.
- ssh_flags => "-x -A"
-
Command line options to be passed to ssh script. Default is to disable X11 and enable agent forwarding.
- max_failures => 3
-
The maximum number of transfer failures to allow before giving up on a target host. Default is 3.
- max_forks => 2
-
The maximum number of simultaneous transfers that should be running per source server. Default is 2.
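As an illustration of the options above, the following sketch writes out the documented defaults and overrides one of them. It assumes that the "-x -A" value shown above matches the described ssh_flags default; the override value of 4 for max_forks is an arbitrary example. The object is only constructed when the module is installed, so the sketch can be run anywhere.

```perl
use strict;
use warnings;

# Documented defaults, spelled out so that any change is explicit.
my %defaults = (
    ssh          => "ssh",      # looked up via $PATH
    ssh_flags    => "-x -A",    # assumed default: no X11, agent forwarding on
    max_failures => 3,          # give up on a target after 3 failed transfers
    max_forks    => 2,          # simultaneous transfers per source server
);

# Example override: allow more simultaneous transfers per source server.
my %options = ( %defaults, max_forks => 4 );

# Only construct the object when Net::CascadeCopy is installed.
if ( eval { require Net::CascadeCopy; 1 } ) {
    my $ccp = Net::CascadeCopy->new( \%options );
    print "created ", ref $ccp, "\n";
}
else {
    print "Net::CascadeCopy is not installed; options: ",
        join( ", ", map { "$_=$options{$_}" } sort keys %options ), "\n";
}
```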
INTERFACE
- $self->add_group( $groupname, \@servers )
-
Add a group of servers. Ideally all servers will be located in the same datacenter. This may be called multiple times with different group names to create multiple groups.
- $self->set_command( $command, $args )
-
Set the command and arguments that will be used to transfer files. For example, "rsync" and "-ravuz" could be used for rsync, or "scp" and "-p" could be used for scp.
- $self->set_source_path( $path )
-
Specify the path on the local server where the source files reside.
- $self->set_target_path( $path )
-
Specify the target path on the remote servers where the files should be copied.
- $self->transfer( )
-
Transfer all files. Will not return until all files are transferred.
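The methods above might be combined as in the following sketch, which builds one group per datacenter from a flat host list and then runs an rsync transfer. The host names, the convention that the second label of a host name identifies the datacenter, the helper cascade_to_groups, and the CCP_RUN environment switch are all assumptions made for this example; only the method calls come from the interface described above.

```perl
use strict;
use warnings;

# Hypothetical host names; the second label is taken to be the datacenter.
my @hosts = qw(
    web1.dc1.example.com web2.dc1.example.com
    web1.dc2.example.com db1.dc2.example.com
);

# Build one list of servers per datacenter.
my %groups;
for my $host (@hosts) {
    my ( undef, $dc ) = split /\./, $host;
    push @{ $groups{$dc} }, $host;
}

# Configure a Net::CascadeCopy object and copy to every group.
sub cascade_to_groups {
    my ($groups) = @_;
    require Net::CascadeCopy;
    my $ccp = Net::CascadeCopy->new( { max_forks => 2 } );
    $ccp->set_command( "rsync", "-ravuz" );
    $ccp->set_source_path( "/path/on/local/server" );
    $ccp->set_target_path( "/path/on/remote/servers" );
    $ccp->add_group( $_, $groups->{$_} ) for sort keys %$groups;
    $ccp->transfer();    # does not return until all files are transferred
    return;
}

print "$_: @{ $groups{$_} }\n" for sort keys %groups;

# CCP_RUN is a made-up switch so the sketch can be run without any servers.
cascade_to_groups( \%groups ) if $ENV{CCP_RUN};
```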
BUGS AND LIMITATIONS
Note that this is still an alpha release.
If using rsync for the copy mechanism, it is recommended that you use the "--delete" and "--checksum" options. Otherwise, if the content of the directory structure varies slightly from system to system, then you may potentially sync different files from some servers than from others.
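A minimal sketch of that recommendation follows. It assumes that set_command() accepts several flags as a single space-separated string, which the "-ravuz" example suggests but does not confirm. Keep in mind that rsync's "--delete" removes files on the target that do not exist on the source.

```perl
use strict;
use warnings;

# Flags from the rsync example above plus the two recommended options.
my @rsync_flags = ( "-ravuz", "--delete", "--checksum" );

# Assumption: set_command() takes all flags as one space-separated string.
my $rsync_args = join " ", @rsync_flags;
print "transfer command: rsync $rsync_args\n";

# Configure an object only when Net::CascadeCopy is installed.
if ( eval { require Net::CascadeCopy; 1 } ) {
    my $ccp = Net::CascadeCopy->new( {} );
    $ccp->set_command( "rsync", $rsync_args );
}
```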
Since the copies are performed between machines, you must be able to log in from each source server to each target server (in the same group). Since empty passwords on ssh keys are insecure, the default ssh arguments enable the ssh agent for authentication (the -A option). Note that each server will need an entry in .ssh/known_hosts for each other server.
There are no known bugs in this module. Please report problems to VVu@geekfarm.org. Patches are welcome.
SEE ALSO
http://www.geekfarm.org/wu/muse/CascadeCopy.html
AUTHOR
Alex White <vvu@geekfarm.org>
LICENCE AND COPYRIGHT
Copyright (c) 2007, Alex White <vvu@geekfarm.org>. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the geekfarm.org nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.