NAME

OpenMosix::HA -- High Availability (HA) layer for an openMosix cluster

SYNOPSIS

use OpenMosix::HA;

my $ha = OpenMosix::HA->new;

# start the monitor daemon
$ha->monitor;

DESCRIPTION

This module provides the basic functionality needed to manage resource startup and restart across a cluster of openMosix machines.

This gives you a high-availability cluster with low hardware overhead. In contrast to traditional HA clusters, we use the openMosix cluster membership facility, rather than hardware serial cables or extra ethernet ports, to provide heartbeat and to detect network partitions.

All you need to do is build a relatively conventional openMosix cluster, install this module on each node, and configure it to start and manage your HA processes. You do not need the relatively high-end server machines which traditional HA requires. There is no need for chained SCSI buses (though you can use them) -- you can instead share disks among many nodes via any number of other current technologies, including SAN, NAS, GFS, or Firewire (IEEE-1394).

BACKGROUND

Normally, a process-migration-based cluster computing technology (such as openMosix) is orthogonal to the intent of high availability. When openMosix nodes die, any processes migrated to those nodes will also die, regardless of where they were spawned. The higher the node count, the more frequently these failures are likely to occur.

But if processes are started via OpenMosix::HA, any processes and resource groups which fail due to node failure can be configured to automatically restart on other nodes. OpenMosix::HA detects process failure, selects a new node out of all currently available, and deconflicts the selection so that two nodes don't restart the same process or resource group.

While similar to the normal inittab format, the configuration file for OpenMosix::HA includes an extra "resource group" column -- this is what enables you to group processes, disk mounts, virtual IP addresses, and related resources into resource groups.
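To make the idea of a resource group column concrete, here is a minimal sketch of parsing inittab-style lines and grouping them. The column order (group:tag:runlevels:mode:command), the group name "webgrp", the "run" level, and the sample commands are all assumptions made for illustration; they are not taken from the module's actual file format.

```perl
use strict;
use warnings;

# Hypothetical column order (an assumption, not the documented format):
#   group:tag:runlevels:mode:command
sub parse_clinittab_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
    # Limit the split to 5 fields so colons inside the command survive.
    my ($group, $tag, $levels, $mode, $command) = split /:/, $line, 5;
    return unless defined $command;    # malformed line: too few fields
    return {
        group   => $group,
        tag     => $tag,
        levels  => $levels,
        mode    => $mode,
        command => $command,
    };
}

# Illustrative entries: a virtual IP and a web server in one group.
my @lines = (
    "# web service and its virtual IP share one resource group\n",
    "webgrp:vip:run:wait:/sbin/ifconfig eth0:1 192.168.1.50 up\n",
    "webgrp:httpd:run:respawn:/usr/sbin/httpd -DFOREGROUND\n",
);

# Collect entries by resource group.
my %by_group;
for my $line (@lines) {
    my $entry = parse_clinittab_line($line) or next;
    push @{ $by_group{ $entry->{group} } }, $entry;
}

for my $group (sort keys %by_group) {
    print "$group: ", join(", ", map { $_->{tag} } @{ $by_group{$group} }), "\n";
}
```

With the sample lines above, both entries land in the same group, so whatever node runs "webgrp" would bring up the address and the server together.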

Any given node only needs to be able to support a subset of all resource groups. OpenMosix::HA provides an extra "test" runmode (beyond init's normal "wait", "once", and "respawn"), enabling the module to automatically test a given node for fitness before it considers starting a resource group there.

There is no "head" or "supervisor" node in an OpenMosix::HA cluster -- there is no single point of failure. Each node makes its own observations and decisions about the start or restart of processes and resource groups.

I/O fencing (also STOMITH or STONITH, the art of making sure that a partially-dead node doesn't continue to access shared resources) can be handled as it is in conventional HA clusters, by a combination of exclusive device logins when using Firewire, distributed locks when using GFS or other SAN, and brute-force methods such as X10 or network-controlled power strips. OpenMosix::HA provides a callback hook which can be used to trigger the latter.

METHODS

new(%parms)

Loads Cluster::Init, but does not start any resource groups.

Accepts an optional parameter hash which you can use to override module defaults. Defaults are set for a typical openMosix cluster installation. Parameters you can override include:

mfsbase

MFS mount point. Defaults to /mfs.

mynode

Mosix node number of local machine. You should only override this for testing purposes.

varpath

The local path under / where the module should look for the hactl and clinittab files, and where it should put clinitstat and clinit.s; this is also the subpath where it should look for the same files on other machines, under /mfsbase/NODE. Defaults to var/mosix-ha.

timeout

The maximum age (in seconds) of any node's clinitstat file, after which the module considers that node to be stale, and calls for a STOMITH. Defaults to 60 seconds.
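The way these parameters combine can be sketched as follows: mfsbase, the node number, and varpath form the path to a node's clinitstat, and timeout is compared against that file's age. This is only an illustration under stated assumptions. The helper names clinitstat_path and is_stale are hypothetical, as is the choice to treat a missing file as stale; the module's own internals may differ.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Basename qw(dirname);

# Defaults taken from the parameter descriptions above.
my %default = (mfsbase => '/mfs', varpath => 'var/mosix-ha', timeout => 60);

# Hypothetical helper: path to a node's clinitstat as seen through MFS,
# i.e. /mfsbase/NODE/varpath/clinitstat.
sub clinitstat_path {
    my ($node, %parms) = @_;
    my %p = (%default, %parms);
    return File::Spec->catfile($p{mfsbase}, $node, $p{varpath}, 'clinitstat');
}

# Hypothetical helper: a node counts as stale when its clinitstat is
# older than the timeout. Treating a missing file as stale is an assumption.
sub is_stale {
    my ($path, $timeout, $now) = @_;
    $now = time unless defined $now;
    my $mtime = (stat $path)[9];
    return 1 unless defined $mtime;
    return ($now - $mtime) > $timeout ? 1 : 0;
}

# Demonstrate against a throwaway directory instead of a real /mfs mount.
my $base = tempdir(CLEANUP => 1);
my $path = clinitstat_path(3, mfsbase => $base);
make_path(dirname($path));
open my $fh, '>', $path or die "cannot create $path: $!";
close $fh;

printf "just written, stale? %d\n", is_stale($path, $default{timeout});
printf "120s later, stale?   %d\n", is_stale($path, $default{timeout}, time + 120);
```

Passing an explicit "now" to is_stale makes the age check easy to exercise without waiting for the timeout to elapse.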

XXX STOMITH callback.

monitor()

Starts the monitor daemon. The monitor ensures the resource groups in clinittab are each running somewhere in the cluster, at the runlevels specified in hactl. Any resource groups found not running are candidates for a restart on the local node.

Before restarting a resource group, the local monitor announces its intentions in the local clinitstat file, and observes clinitstat on other nodes. If the monitor on any other node also intends to start the same resource group, then the local monitor will detect this and cancel its own restart. The checks and restarts are staggered by random times on various nodes to prevent oscillation.
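The announce-observe-cancel step can be sketched as below. This is a simplified model, not the module's implementation: the %intent hash stands in for reading each node's clinitstat file, and the helper names conflicting_nodes and stagger_delay, the node numbers, the group names, and the 5-second stagger range are all hypothetical.

```perl
use strict;
use warnings;

# $intent maps node number => list of resource groups that node has
# announced it plans to start (a stand-in for each node's clinitstat).
# Returns the other nodes that plan to start the same group.
sub conflicting_nodes {
    my ($group, $mynode, $intent) = @_;
    my @conflicts;
    for my $node (sort { $a <=> $b } keys %$intent) {
        next if $node == $mynode;
        push @conflicts, $node
            if grep { $_ eq $group } @{ $intent->{$node} };
    }
    return @conflicts;
}

# Random stagger in [0, $max) seconds; the range is an assumption.
sub stagger_delay {
    my ($max) = @_;
    return rand($max);
}

my %intent = (
    1 => ['webgrp'],             # the local node's announced plan
    2 => ['webgrp', 'dbgrp'],
    3 => [],
);
my $mynode = 1;

for my $group (@{ $intent{$mynode} }) {
    my $delay = stagger_delay(5);    # a real monitor would sleep here
    my @others = conflicting_nodes($group, $mynode, \%intent);
    if (@others) {
        print "node $mynode: cancel $group, also planned by node(s) @others\n";
    } else {
        print "node $mynode: start $group\n";
    }
}
```

In this model every node that sees a conflict backs off, so the random delay is what keeps two nodes from repeatedly announcing and cancelling in lockstep.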

XXX document run levels: plan test run stop

INSTALLATION

FILES

XXX list files and their purposes; refer to Cluster::Init default filenames

AVAILABILITY

This module is based on my IS::Init module, which is already in production and available from CPAN. My wife and I had hoped to have a beta version of OpenMosix::HA available by the time of Moshe Bar's Feb 5 2003 openMosix talk at the Silicon Valley Linux Users Group.

Then I unexpectedly became involved in data collection for Columbia's California transit -- SVLUG member Ian Kluft was one of the few witnesses. We decided it best to defer work on this module in favor of improving our understanding of where the orbiter's breakup actually began, relaying our results to Johnson Space Center and working with media to encourage others to do the same. These efforts by ourselves and others have been successful beyond what any of us expected -- NASA JSC emergency ops responded to us personally and as of this writing a search in California is already underway. But I don't have a Perl module for you yet.

For a production version of OpenMosix::HA, check CPAN.org, TerraLuna.Org, or Infrastructures.Org in early March 2003, or contact me. Beta versions will become available as time permits before then.

AUTHOR

Steve Traugott
CPAN ID: STEVEGT
stevegt@TerraLuna.Org
http://www.stevegt.com

COPYRIGHT

Copyright (c) 2003 Steve Traugott. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with this module.

SEE ALSO

IS::Init, openMosix.Org, qlusters.com