
Re: A proposal for a General Clustering Framework



Bill Todd wrote:
> 
> ----- Original Message -----
> From: "Alan Robertson" <alanr@unix.sh>
> To: "Peter Badovinatz" <tabmowzo@yahoo.com>
> Cc: "linux-cluster" <linux-cluster@nl.linux.org>
> Sent: Wednesday, June 06, 2001 12:14 AM
> Subject: Re: A proposal for a General Clustering Framework
> 
> > Peter Badovinatz wrote:
> > >
> >
> > [snip]
> >
> > > Depends...  We could push customers who really cared about strict HA
> > > to avoid version heterogeneity except during actual node-by-node
> > > upgrade.  It was customers who were more oriented to cluster file
> > > system - where we still needed (a bit looser form of) HA and failover
> > > but they could be up to 500 node clusters - that we had no choice but
> > > to be very flexible as to supporting multiple versions across nodes
> > > because upgrading 500 nodes a few at a time takes a long time, and
> > > unlike more controlled HA clusters the workload was quite varied so
> > > many more things to upgrade/test/migrate.
> > >
> > > The better choice is to be sure the framework lets you handle
> > > multiple active version levels from the beginning, and you work to
> > > minimise their use by decree (as above) or by only supporting some
> > > limited number of levels (N and N+1 or N+2.)
> 
> Since Alan hasn't responded to my request for a reassessment (I understand
> that he's been busy today), and since the comments above may reflect a
> similar lack of understanding, I'll reiterate:
> 
> Supporting at least version N+1 of the *software* along with version N is
> clearly required to achieve on-line rolling upgrades.  What is not required
> is to support (concurrently) multiple versions of the communication
> protocols:  you can instead support only the old protocol set until all
> nodes have been upgraded to version N+1, and then perform a cluster-wide
> synchronous cut-over to the new protocol version.
> 
> If the cluster can be assumed to be under the control of a single
> organization (in contrast to the nodes on the Internet, for example), then
> requiring that all nodes be upgraded before the communication protocols
> change is not prima facie unreasonable:  indeed, it is difficult to create
> examples where a protocol-upgrade is urgently needed but only by some subset
> of the cluster's members, since a cluster is a considerably more
> tightly-bound entity than the nodes on the Internet.
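
For concreteness, the cut-over scheme quoted above might be sketched
roughly as follows.  This is only an illustration under stated
assumptions: the Node and Cluster classes, the max_protocol field and the
try_cutover method are hypothetical names invented here, not part of any
existing cluster software, and a real cut-over would need an agreement
protocol rather than a local min().

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    # Highest protocol version this node's software can speak (hypothetical).
    # Upgraded software is assumed to still speak every older protocol.
    max_protocol: int


class Cluster:
    def __init__(self, nodes, active_protocol):
        self.nodes = list(nodes)
        # Single protocol version used by every node right now.
        self.active_protocol = active_protocol

    def upgrade_node(self, name, new_max):
        """Rolling upgrade of one node; the active protocol is untouched."""
        for node in self.nodes:
            if node.name == name:
                node.max_protocol = max(node.max_protocol, new_max)

    def try_cutover(self):
        """Switch the whole cluster to the newest protocol that every node
        can speak.  Returns True only if the active protocol changed."""
        common = min(node.max_protocol for node in self.nodes)
        if common > self.active_protocol:
            self.active_protocol = common
            return True
        return False


cluster = Cluster([Node("a", 1), Node("b", 1), Node("c", 1)], active_protocol=1)
cluster.upgrade_node("a", 2)
cluster.upgrade_node("b", 2)
partial = cluster.try_cutover()    # node "c" still only speaks protocol 1
cluster.upgrade_node("c", 2)
complete = cluster.try_cutover()   # every node now speaks protocol 2
```

In this sketch two software versions coexist during the upgrade, but only
one protocol version is ever active on the wire.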

You're right that an HA cluster owner has much more control over their
cluster than anyone has over "n" random computers on the Internet.

You're probably also right that we don't need more than two versions of
any one given entity in the cluster at the same time.

However, I disagree that those two versions will be only one release apart.

Most HA customers are extremely conservative.

Many of them upgrade only when forced to by something outside their control.

This sometimes means they upgrade only once every 5 years, so the old and
new versions of the various software pieces might be 3 to 15 releases apart.

I'm not sure there's a big technical difference between being able to
support any two releases that are 15 versions apart and saying that you're
going to support 15 versions at once.  Administratively, there is a *huge*
difference.  "This way lies madness".

Also, I'm not sure that all the technologies we develop will be used only
on clusters whose administration is tightly coupled.

I think that the mechanisms need to be capable of dealing with large
version differences - and that in practice you only support certain
combinations *that you've tested*.
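
As a rough illustration of "only support what you've tested", a check
along these lines could gate which version mixes a cluster accepts.  The
TESTED_PAIRS set, the release numbers in it and the is_supported function
are all made up for this sketch; it also assumes the at-most-two-versions
rule discussed above.

```python
# Hypothetical whitelist: pairs of releases that were actually tested
# together.  The two releases in a pair may be many versions apart.
TESTED_PAIRS = {
    frozenset({3, 4}),
    frozenset({3, 18}),
    frozenset({17, 18}),
}


def is_supported(versions_in_cluster, tested_pairs=TESTED_PAIRS):
    """Return True if the mix of versions running in the cluster is one
    we are willing to support."""
    distinct = set(versions_in_cluster)
    if len(distinct) <= 1:
        return True                   # homogeneous cluster
    if len(distinct) > 2:
        return False                  # never more than two versions at once
    return frozenset(distinct) in tested_pairs


homogeneous = is_supported([18, 18, 18])
far_apart_but_tested = is_supported([3, 3, 18])
untested_pair = is_supported([4, 18])
three_versions = is_supported([3, 4, 18])
```

The mechanism itself places no limit on how far apart the two releases
are; the limit comes entirely from the contents of the whitelist.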

Those supported combinations are probably dictated more by politics and
money than technical judgement ;-)

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/