
Re: ETCP Project



Greg Lindahl wrote:
> 
> I'd agree. But the failover guys have better tools than I thought they
> had.

Us failover guys are glad you think so ;-)

> I have one thing in my clusters I want to make failover, that's
> the "master" node which runs the queue system. The queue system has a
> fairly small amount of state, so drbd+heartbeat/takeover looks like
> it's good enough for my purposes. Neato.

Is your queuing system open source software?

I've been wanting VERY MUCH to start an open source project to put together
a highly available (HA) job scheduling/queuing system for HPC clusters.  But
I didn't know what tools were out there for cluster scheduling.

If your queuing software stores its queue on disk, in a robust fashion, then
it's easy to fail over using a shared disk or DRBD-type mirroring
arrangement.
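
To make "robust" a little more concrete, here is a minimal sketch of one
common approach: write the new queue to a temporary file, fsync it, then
rename it over the old file, so that a takeover node never sees a half-written
queue.  This is only an illustration, not how any particular queue system
does it.  The names save_queue and load_queue and the JSON format are
made up for the example, and it assumes a Linux/POSIX filesystem where
rename is atomic.

```python
import json
import os
import tempfile


def save_queue(path, jobs):
    """Atomically replace the queue file at `path` with the list `jobs`.

    Hypothetical helper: the JSON format and the function name are
    assumptions made for this sketch.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on
    # one filesystem (rename is only atomic within a filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".queue-")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(jobs, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data to disk before renaming
        os.replace(tmp_path, path)  # atomic swap of old and new queue
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash (POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def load_queue(path):
    """Return the saved job list, or an empty list if no queue exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return []
```

With something like this, the node that takes over just calls load_queue()
on the shared or mirrored disk and gets either the old queue or the new
one, never a mixture.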

For what it's worth, there is a (largely inactive) mailing list for this
project.  You can find it here:
	http://hacqs.community.tummy.com/mailman/listinfo/hacqs

HACQS stands for High-Availability Cluster Queueing System.

Greg:  I just subscribed you.  Hope that's OK... ;-)

One interesting thing that heartbeat does that a queueing system might like
to take advantage of is that each heartbeat packet has a certain amount of
data added to it automatically.  This is done in a fairly modular way, but
one of the things it currently adds automatically is the load average
information from /proc/loadavg.

The API doesn't have a way for your application to retrieve that information
(today), but it could be easily added.  The API is really just now getting
usable, so changing it is no problem.
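
As a rough idea of what a queueing system could do with that data, here is
a small sketch that parses the contents of /proc/loadavg and picks the
least-loaded node.  It does NOT use the heartbeat API (which, as noted,
doesn't expose this information today).  The names LoadAvg, parse_loadavg,
read_loadavg and least_loaded are all hypothetical, and how the per-node
values would actually reach the scheduler is left open.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    """Fields of one /proc/loadavg line (hypothetical container type)."""
    one_min: float
    five_min: float
    fifteen_min: float
    runnable: int   # currently runnable scheduling entities
    total: int      # total scheduling entities on the system
    last_pid: int   # most recently created PID


def parse_loadavg(text):
    """Parse a line such as '0.20 0.18 0.12 1/80 11206'."""
    fields = text.split()
    if len(fields) < 5:
        raise ValueError("unexpected /proc/loadavg format: %r" % text)
    runnable, total = fields[3].split("/")
    return LoadAvg(float(fields[0]), float(fields[1]), float(fields[2]),
                   int(runnable), int(total), int(fields[4]))


def read_loadavg(path="/proc/loadavg"):
    """Read and parse the local load average (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


def least_loaded(loads):
    """Given a dict of node name -> LoadAvg, return the least-loaded node.

    Uses the 1-minute average; a real scheduler would likely weigh more
    than this single number.
    """
    return min(loads, key=lambda node: loads[node].one_min)
```

A scheduler that received these values from each node could then call
least_loaded() to decide where to dispatch the next job.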

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/