Re: ETCP Project
Greg Lindahl wrote:
>
> I'd agree. But the failover guys have better tools than I thought they
> had.
We failover guys are glad you think so ;-)
> I have one thing in my clusters I want to make failover, that's
> the "master" node which runs the queue system. The queue system has a
> fairly small amount of state, so drbd+heartbeat/takeover looks like
> it's good enough for my purposes. Neato.
Is your queuing system open source software?
I've been wanting VERY MUCH to start an open source project to put together
a highly available (HA/HPC) job scheduling/queuing system. But I didn't know
what tools were out there for cluster scheduling.
If your queuing software stores its queue on disk in a robust fashion, so that
a crash never leaves it half-written, then it's easy to fail over using a
shared disk or a DRBD-type mirroring arrangement.
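"Robust" here mostly means the node that takes over the disk must find either
the old queue or the new one, never a torn mix. One common way to get that is
to write a temporary file, fsync it, then atomically rename it over the old
file. The sketch below assumes a JSON queue file on a POSIX filesystem; the
file format and the names save_queue/load_queue are hypothetical, not taken
from any particular queuing system.

```python
import json
import os
import tempfile


def save_queue(path, jobs):
    """Persist the job list so a crash never leaves a half-written file.

    Hypothetical helper: writes to a temp file in the same directory,
    fsyncs it, then atomically renames it over the old queue file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".queue-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(jobs, f)
            f.flush()
            os.fsync(f.fileno())  # data is on disk before the rename
        os.replace(tmp_path, path)  # atomic within one POSIX filesystem
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind
        raise
    # fsync the directory so the rename itself survives a crash
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def load_queue(path):
    """Read the queue back, e.g. on the node that has just taken over."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []  # no queue saved yet
```

Whether this is enough also depends on the filesystem journaling and on the
mirror preserving write order; the rename trick only removes the torn-file
case at the application level.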
For what it's worth, there is a (largely inactive) mailing list for this
project. You can find it here:
http://hacqs.community.tummy.com/mailman/listinfo/hacqs
HACQS stands for High-Availability Cluster Queueing System.
Greg: I just subscribed you. Hope that's OK... ;-)
One heartbeat feature a queueing system might want to take advantage of:
each heartbeat packet automatically carries some extra data. This is done in
a fairly modular way, and one of the things it currently adds is the load
average information from /proc/loadavg.
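Once a scheduler can get at that data, using it is simple. The sketch below
assumes the scheduler somehow has the raw /proc/loadavg line from each node
in a dict keyed by node name (how it gets there is hypothetical, since the
API doesn't expose it yet). The line format is Linux's: 1-, 5- and 15-minute
load averages, runnable/total scheduling entities, and the last PID used.
The names parse_loadavg and least_loaded are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class LoadAvg:
    one: float       # 1-minute load average
    five: float      # 5-minute load average
    fifteen: float   # 15-minute load average
    runnable: int    # currently runnable scheduling entities
    total: int       # total scheduling entities
    last_pid: int    # most recently assigned PID


def parse_loadavg(text):
    """Parse one line in /proc/loadavg format, e.g. '0.41 0.32 0.27 2/345 12345'."""
    one, five, fifteen, procs, last_pid = text.split()
    runnable, total = procs.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


def least_loaded(reports):
    """Pick the node with the lowest 1-minute load from {node: loadavg_line}."""
    return min(reports, key=lambda node: parse_loadavg(reports[node]).one)
```

A real scheduler would likely weigh the 5- or 15-minute figure too, since the
1-minute value is noisy.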
The API doesn't have a way for your application to retrieve that information
(today), but it could be easily added. The API is really just now getting
usable, so changing it is no problem.
-- Alan Robertson
alanr@unix.sh
Linux-cluster: generic cluster infrastructure for Linux
Archive: http://mail.nl.linux.org/linux-cluster/