
Re: A proposal for a General Clustering Framework



Hi guys,

I am jumping into a rather old discussion, but I didn't find time before.
Customers. Eek. (Should any participant reading this be a customer, you and
everyone else you know are of course excluded from that statement ;-)

On 2001-06-06T00:29:46,
   Alan Robertson <alanr@unix.sh> said:

> I'm not sure if there's a big technical difference between being able to
> support any two releases that are 15 versions apart from saying that you're
> going to support 15 versions at once.  Administratively, there is a *huge*
> difference.  "This way lies madness".

Let me offer a different perspective here.

Do the nodes running two different versions of the protocol really have to be
able to talk to each other? The discussion appears to assume that the answer
to this question is "Yes".

Now, I am going to explain to you why I think the answer should in fact be
"No".

If you are doing an upgrade which changes APIs (i.e., protocol version change,
new attributes), you are asking for the software to bridge a potentially huge
communication gap.

Sure, a new attribute might be filled in automatically by defaulting it to a
sensible value - this is "easy enough" if it is static and independent, but
you have to embed complex logic if it is in fact related to other attributes.
It becomes a nightmare if you aren't adding or deleting an attribute, but
_changing_ the meaning of an attribute, potentially because the old code had a
bug which treated it incorrectly. This different understanding of the
parameters can in fact lead to a quite-non-HA cluster.

So in fact, it is desirable that the answer is "No" to reduce complexity.

So, _can_ the answer be "No"? Yes.

What upgrading a node to a new software release effectively does is manually
partition the cluster - into the nodes which already have the new software
and those which do not.

If a new node is updated, it will join the "new" cluster. If you figure that
the new software release doesn't work, you downgrade it and it will go back to
the old cluster.
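
To make the partitioning idea concrete, here is a minimal sketch in Python.
It is purely illustrative: the function name and the node-to-version mapping
are my own assumptions, not part of any existing cluster software. It only
shows that membership falls out of the installed protocol version.

```python
from collections import defaultdict

def partition_by_version(node_versions):
    """Group nodes into independent clusters keyed by protocol version.

    node_versions is a hypothetical mapping of node name -> protocol version.
    """
    clusters = defaultdict(set)
    for node, version in node_versions.items():
        clusters[version].add(node)
    return dict(clusters)

# Three nodes all on version 1: a single cluster.
nodes = {"a": 1, "b": 1, "c": 1}
before = partition_by_version(nodes)

# Upgrading node "a" moves it into the "new" cluster.
nodes["a"] = 2
during = partition_by_version(nodes)

# Downgrading it again puts it back into the old cluster.
nodes["a"] = 1
after = partition_by_version(nodes)
```

Neither cluster needs to know that the other exists; each one simply sees a
node leave or join.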

As part of this, you do not have to have the two versions talk to each other,
because they are completely independent.

The only requirement you have to satisfy here is that two versions of the
protocol on the same wire (logically speaking) truly do not interfere and
that the software ignores any version of the protocols but its own.
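
As a rough sketch of that requirement, assume every packet starts with a small
header carrying a magic tag and the protocol version. The header layout, the
"CL" tag and the function names below are hypothetical, chosen only for this
example. The receiver then drops anything that is not its own version before
looking at the payload at all:

```python
import struct

# Hypothetical wire header: 2-byte magic tag, 16-bit protocol version,
# both in network byte order. The payload follows the header.
HEADER = struct.Struct("!2sH")
MAGIC = b"CL"

def encode(version, payload):
    """Prefix a payload with the (assumed) version header."""
    return HEADER.pack(MAGIC, version) + payload

def accept(packet, my_version):
    """Return the payload if the packet speaks our version, else None."""
    if len(packet) < HEADER.size:
        return None  # too short to carry a header
    magic, version = HEADER.unpack_from(packet)
    if magic != MAGIC or version != my_version:
        return None  # some other protocol or version: silently ignore
    return packet[HEADER.size:]
```

The important property is that the version check happens before any parsing
of the payload, so a node never has to interpret attributes whose meaning may
have changed.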

This does have a slight penalty obviously: During the upgrade period, your
redundancy is reduced. However, I think this is acceptable, as the upgrade is
a controlled operation and you have experts on site to fix everything which
might go wrong.

It may be desirable to support the following features in the resource
management to make this more seamless for the clients:

- Be able to instruct the resource manager that a node is about to drop out,
  but that the services which were run on this node should NOT be restarted on
  another node, i.e., "detaching" a node and all services it is running.

- When upgrading the software on the local node, be able to tell the local
  resource manager that even though it is going down, it should NOT take the
  resources down.

- Be able to "reattach" to resources.
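
The three features above could look roughly like this in a toy resource
manager. This is a sketch under my own assumptions: the class and method names
are invented for illustration and do not correspond to any existing resource
manager API.

```python
class ResourceManager:
    """Toy model of detach / reattach semantics (all names hypothetical)."""

    def __init__(self):
        self.owner = {}        # service name -> node currently running it
        self.detached = set()  # nodes announced as "about to drop out"

    def start(self, service, node):
        self.owner[service] = node

    def detach(self, node):
        """Announce that a node will drop out but keeps its services."""
        self.detached.add(node)

    def node_left(self, node, survivors):
        """Handle a node leaving; return the services that were failed over."""
        if node in self.detached:
            return []  # detached: do NOT restart its services elsewhere
        moved = []
        for service, owner in list(self.owner.items()):
            if owner == node and survivors:
                self.owner[service] = survivors[0]
                moved.append(service)
        return moved

    def reattach(self, node):
        """Re-adopt the services still running on a returning node."""
        self.detached.discard(node)
        return sorted(s for s, o in self.owner.items() if o == node)


rm = ResourceManager()
rm.start("db", "n1")
rm.start("web", "n2")

rm.detach("n1")
untouched = rm.node_left("n1", ["n2"])  # nothing is failed over
readopted = rm.reattach("n1")           # n1 comes back with "db" still running
failed_over = rm.node_left("n2", ["n1"])  # a normal failure still fails over
```

A real implementation would also have to check on reattach that the services
are in fact still running, which this sketch simply assumes.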

Comments?

-- 
"I'm extraordinarily patient provided I get my own way in the end."
        -- Margaret Thatcher

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/