Re: A proposal for a General Clustering Framework




--- Bill Todd <billtodd@foo.mv.com> wrote:
<snip, other referenced info> 
> While I agree with the observation expressed elsewhere that name/value pairs
> would seem to address the problem you describe, I'll also point out that it
> is often easier (and/or safer) to define upgrades (at least those that
> affect protocol syntax or semantics) such that all nodes get upgraded (one
> by one, with no general outage) to new software that continues to use the
> old protocols - and then after all upgrades are complete, a synchronous
> cluster-wide transition to the new protocols occurs.  Among other things,
> this avoids proliferation of run-time conditionals that depend on the
> capabilities of each member.

I've done this.  Our normal recommendation for truly HA systems is to upgrade
them as fast as possible, even though that still means one node at a time, so
that you get through the mixed-version period quickly.  We also told customers
not to make any configuration changes during this window (e.g., defining new
nodes, modifying resource definitions, etc.).  We didn't strictly disallow
this, we just tried to discourage it.

The code would switch over to the new protocols once all nodes were upgraded.
You still had to maintain the coexistence code, though, until you'd gotten rid
of every older level that the new code could coexist with.
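
As a rough illustration of that switch-over rule (a sketch only, not our
actual code): assume each node advertises the oldest and newest protocol
level its software can speak, and the cluster runs at the highest level
common to every member.  The names Node, min_proto, max_proto and
active_protocol are made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    min_proto: int  # oldest protocol level this node can still speak (assumed field)
    max_proto: int  # newest protocol level this node supports (assumed field)


def active_protocol(nodes):
    """Return the highest protocol level every member can speak, or None."""
    if not nodes:
        return None
    highest_common = min(n.max_proto for n in nodes)
    lowest_required = max(n.min_proto for n in nodes)
    if highest_common < lowest_required:
        return None  # no single level is understood by all members
    return highest_common


# One node ("c") is still on the old software, so the cluster stays on level 1.
mid_upgrade = [Node("a", 1, 2), Node("b", 1, 2), Node("c", 1, 1)]
# After the last node is upgraded, the whole cluster can move to level 2.
fully_upgraded = [Node("a", 1, 2), Node("b", 1, 2), Node("c", 1, 2)]

print(active_protocol(mid_upgrade))
print(active_protocol(fully_upgraded))
```

Note that the min_proto field is exactly the coexistence burden: it can only
be raised once no supported level below it remains in the field.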
> 
> In general, if one can avoid the requirement to support version
> heterogeneity across a cluster, life can be a lot simpler.  While version
> heterogeneity is clearly required in general networks (where systems are
> under the control of completely independent organizations), it's not clear
> that (at least what I think of as) a cluster can't reasonably impose such a
> constraint.
> 
> - bill
> 
Depends...  We could push customers who really cared about strict HA to avoid
version heterogeneity except during the actual node-by-node upgrade.  The
customers who were more oriented toward a cluster file system were different:
they still needed a (somewhat looser) form of HA and failover, but their
clusters could have up to 500 nodes.  There we had no choice but to be very
flexible about supporting multiple versions across nodes, because upgrading
500 nodes a few at a time takes a long time, and unlike the more controlled HA
clusters the workload was quite varied, so there were many more things to
upgrade/test/migrate.

The better choice is to be sure the framework lets you handle multiple active
version levels from the beginning, and then work to minimise their use, either
by decree (as above) or by supporting only a limited number of levels (N and
N+1, or N+2).
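
A limited window like that could be enforced at join time.  The sketch below
is only one way to do it and assumes integer version levels and a maximum
spread of 2 (i.e. N through N+2); may_join and MAX_SKEW are hypothetical
names, not part of any existing framework.

```python
MAX_SKEW = 2  # assumed policy: levels N through N+2 may be active together


def may_join(active_levels, candidate_level, max_skew=MAX_SKEW):
    """Return True if admitting candidate_level keeps the spread within max_skew."""
    levels = set(active_levels) | {candidate_level}
    return max(levels) - min(levels) <= max_skew


print(may_join([3, 4], 5))  # spread would be 3..5
print(may_join([3, 4], 6))  # spread would be 3..6
```

The check is symmetric on purpose: a node that is too old is refused just
like one that is too new, so the lowest level still in use bounds how far
ahead any upgraded node may get.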

Peter

=====
These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
wombat@us.ibm.com/tabmowzo@yahoo.com
and in no way should be construed as official opinion of 
IBM, Corp.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/