High Availability versus Automatic Process Migration
There are two problems associated with automatic process migration when
considered in the context of HA:
1) The HA cluster manager wants to control what's running where *all the
time*, so it can take proper recovery actions and give the
paranoid-by-definition customer the migration strategies they want.
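2) Some kinds of process migration (like Mosix) are low-availability
solutions: a migrated process still depends on both its original machine
and the machine it now runs on, so if every process has migrated away from
its original machine, loss of any node can kill all processes (for a 2-node
system).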
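To make point 2 concrete, here is a minimal Python sketch under simplifying assumptions. Each process has a home node and a current node. In the migrated (Mosix-style) model, a process dies if *either* of those nodes fails. In the HA-managed model, a process depends only on the node it runs on, and the cluster manager restarts it on a surviving node. The node names, the `Proc` type and the "first live node" recovery policy are hypothetical illustrations, not Mosix or Heartbeat code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Proc:
    name: str
    home: str     # node the process was started on
    current: str  # node the process is executing on now


def survivors_migrated(procs: List[Proc], failed: str) -> List[str]:
    """Mosix-style model (assumption): a migrated process still depends on
    its home node, so it dies if its home OR its current node fails."""
    return [p.name for p in procs if failed not in (p.home, p.current)]


def survivors_ha(procs: List[Proc], failed: str, nodes: List[str]) -> Dict[str, str]:
    """HA-managed model (assumption): each process depends only on the node
    it runs on. The manager knows where everything is, so it can restart
    the victims of a failure on a surviving node. This is a restart, not
    transparent continuation."""
    live = [n for n in nodes if n != failed]
    placement: Dict[str, str] = {}
    for p in procs:
        if p.current != failed:
            placement[p.name] = p.current  # unaffected, stays put
        elif live:
            placement[p.name] = live[0]    # simple recovery policy: first live node
    return placement


if __name__ == "__main__":
    nodes = ["A", "B"]
    # Every process has migrated away from its home node.
    procs = [Proc("p1", home="A", current="B"), Proc("p2", home="B", current="A")]
    for failed in nodes:
        print(failed, survivors_migrated(procs, failed), survivors_ha(procs, failed, nodes))
```

On this 2-node example, losing either node leaves no survivors in the migrated model, while the HA model ends up with both processes running on the remaining node.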
Successful HA customers often have these characteristics:
  - Availability and data integrity are EVERYTHING
  - control-freaks
  - anal-retentive
  - paranoid
  - perfectionists
I don't know enough about HPC customers, but I don't associate these
characteristics with the HPC arena.
My guess is that these differences will push the solutions farther apart
and into separate market niches, even if all the technology could
otherwise be common.
Having said that, there are MANY possible common elements between HA and HPC
clusters, and many of them could share a common implementation.
A few examples come readily to mind:
  - cluster membership and corresponding event APIs
  - single-image boot
  - cluster filesystems
  - system monitoring
  - node reset mechanisms (e.g., Stonith)
Unless a given feature REQUIRES a kernel implementation by definition, I
would strongly recommend against having user programs use /proc-like
interfaces. Instead, define an API that can be easily and sensibly
implemented by user-level programs.
Of course, if there is a /proc-thinggie around, the API implementation
could just turn around and ask /proc (through a plug-in model). BUT the
applications shouldn't be doing this themselves.
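A minimal Python sketch of that plug-in model, using a membership query as the example API. All names here are hypothetical: `ClusterAPI`, the `PLUGINS` registry, both backends, and the `/proc/cluster/members` path with its one-node-per-line format. This is not Heartbeat's actual API. Applications only ever call `ClusterAPI`. Whether the answer comes from a user-space manager or from a /proc-style file is a backend detail.

```python
from typing import Callable, Dict, List, Optional, Protocol


class MembershipBackend(Protocol):
    """What every plug-in must provide (hypothetical interface)."""
    def members(self) -> List[str]: ...


class UserSpaceBackend:
    """Plug-in for a user-level cluster manager. Here it simply holds the
    membership list such a manager would maintain (illustrative only)."""
    def __init__(self, nodes: List[str]) -> None:
        self._nodes = list(nodes)

    def members(self) -> List[str]:
        return list(self._nodes)


def _read_file(path: str) -> str:
    with open(path) as f:
        return f.read()


class ProcFileBackend:
    """Plug-in that 'turns around and asks /proc'. The default path and the
    one-node-name-per-line format are assumptions, not a real kernel file."""
    def __init__(self, path: str = "/proc/cluster/members",
                 read_text: Optional[Callable[[str], str]] = None) -> None:
        self._path = path
        self._read = read_text or _read_file

    def members(self) -> List[str]:
        text = self._read(self._path)
        return [line.strip() for line in text.splitlines() if line.strip()]


# Registry of available plug-ins, keyed by a configuration name.
PLUGINS: Dict[str, Callable[..., MembershipBackend]] = {
    "userspace": UserSpaceBackend,
    "procfs": ProcFileBackend,
}


class ClusterAPI:
    """The only interface applications use. They never read /proc directly."""
    def __init__(self, plugin: str, **options) -> None:
        self._backend = PLUGINS[plugin](**options)

    def members(self) -> List[str]:
        return sorted(self._backend.members())

    def is_member(self, node: str) -> bool:
        return node in self._backend.members()
```

For example, `ClusterAPI("userspace", nodes=["node2", "node1"]).members()` and `ClusterAPI("procfs", read_text=lambda _: "node1\nnode2\n").members()` both return `["node1", "node2"]`. Application code stays the same whichever plug-in is configured.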
Heartbeat is a user-space cluster manager. We originally implemented a
/proc interface for it because it was cool and could be shared with kernel
implementations. It was also a mistake, and has been dropped. I now
believe that doing it the other way around (a user-level API that may be
backed by /proc where needed) makes more sense.
-- Alan Robertson
alanr@unix.sh
Linux-cluster: generic cluster infrastructure for Linux
Archive: http://mail.nl.linux.org/linux-cluster/