Re: available resource declaration language(s)





 Hello, all!

 (First of all: yes, I read your paper. Yes, I read your message. Still,
I see from your answer that we are talking about two completely
different things; HA and HP are two different worlds.)


> What would make you think about TCP?  Certainly neither I, nor the article
> mentioned it, nor is it in the code, and I certainly wouldn't recommend it.


 It is a protocol that is used in high-performance cluster applications.
You did not talk about it; I mentioned it only to set it aside, because
it is a point where we agree, so there is no problem there.

> Read the article.  Read the email you responded to ;-).  Better yet, try the
> code.  I'd be delighted to hear how it works for you.

 Yes, I read it, and your full paper. And I think that we are making
different assumptions. You are thinking of HA with a switch. In that case,
your approach is PERFECT. No further discussion.

 I am talking about HP clusters, and with low budgets. This is a
completely different world, with different needs.

> Collisions are rarely a problem in a properly configured full-duplex
> switched network. A switch is cheap.  For example, the 24-port 100-mbit
> full-duplex switch on my home network cost about $300 USD.  If you have a

 (This is only to clarify my position. Yours will be different.)

  First problem. A switch costs more than $500 USD here in Brazil, and in
the north-northwest it is more expensive. We have a 68% import tax here.
And switches cannot be found in all regions of Brazil.

 Second problem. It is not one switch for the entire cluster. And we are
not talking about last-generation Digital nodes. Here the best MIPS/$
ratio is the K6-II. Really, there is not much more to buy (maybe in the
south, but you cannot find last-generation processors in all of Brazil).
That means more nodes, and that means more than one switch.

 Switches were discarded in the first budget. Maybe in another country, or
for another kind of problem. Yes, we can afford some switches, but that
money is the cost of one node with a disk or two diskless nodes.


> Why aren't you running full-duplex switches?  They *are* cheap.

  It depends on where you are; see the rationale below.

> traffic by .6% of busy causes you a problem, your network is too close to
> the edge, and will be in trouble in a few days when the load grows even if
> all this traffic is removed.


 Well, here is the point. All my nodes are overloaded, and my network is
at the edge. At night, I also overload the teaching computer labs.

(Rationale) In my research field, protein structure, the first takes
all. It doesn't matter what you do; if somebody sends the structure to
PDB before you, you will not be published. That means that the work of a
group of nearly a dozen researchers goes to the trash, along with all the
money spent on reagents and so on. If that happens a few times in a year,
some people in the group may lose their grants. Some research groups in
some countries (US, Canada) have really giant clusters and budgets three
orders of magnitude bigger than my whole university's. It is like racing
a Beetle against an F1 car. And sometimes we win with our Beetle.

 That means that most of the money goes to processors and memory, and we
must use our resources at 100%. 98% is not enough. And that is the
battlefield of HP.

 Maybe your battlefield, HA, is completely different, but a broadcast
solution, applied directly, can't work in HP. If you have an HP solution
with broadcasting, there is a better P2P one. At least, that is my own
experience with HP clusters. Yours may be different.

> Each machine receives "n" updates per period of time.  Updating an in-memory
> table is pretty cheap.  If you code it right, the time to update an entry
> for a node is constant, so the total overhead is O(n).


 Yes, if your only job is to see whether each node is working, it is
O(n). But if you have to find the best node that satisfies a set of
constraints in order to send it a piece of work, it is not as easy as
updating a table.
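
To make the contrast concrete, here is a minimal Python sketch. It is an
illustration only: NodeInfo, update and best_node are hypothetical names,
not part of any cluster package discussed in this thread, and the
constraints (free memory, architecture) are assumed for the example.
Updating one entry is a single dictionary write, while every placement
decision has to look at all entries and test each one against the
constraints.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    """Hypothetical per-node record; the fields are assumptions."""
    load: float        # e.g. run-queue length
    free_mem_mb: int   # free memory in megabytes
    arch: str          # CPU architecture label


# In-memory table keyed by node name (hypothetical).
table = {}


def update(name, info):
    """Record the latest report from one node: one dictionary write."""
    table[name] = info


def best_node(min_mem_mb, arch):
    """Return the least-loaded node meeting the constraints, or None.

    Every call scans the whole table, so the cost grows with the number
    of nodes and with the number of constraints checked per node.
    """
    candidates = [
        (info.load, name)
        for name, info in table.items()
        if info.free_mem_mb >= min_mem_mb and info.arch == arch
    ]
    return min(candidates)[1] if candidates else None
```

With richer constraints (data locality, licences, who owns the machine)
the scan only gets more expensive, which is the point made above.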

> How will you know which machines are working and which aren't unless you
> have some kind of keepalive or heartbeat?  This function (cluster
> membership) is a necessary function, unless unreliable clusters are the only
> kind you are interested in.

 In part, you are right. Most of the nodes of my cluster are unreliable
because of the machine's owner, who can switch off the machine at any
time. I solve this at application level, and I think that this is a
problem to be solved at application level. But most HP clusters are in
the same situation: non-dedicated hardware. The budget is always a
problem; most of us spent all our budget on a core cluster, and after
that we try to use pieces of CPU time on non-dedicated machines.
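
One way to handle vanishing nodes at application level, without any
heartbeat, is to give every piece of work a deadline and hand it out
again if no result arrives in time. The Python sketch below is only an
illustration of that idea, not the code of any real application; WorkPool
and its methods are hypothetical names, and a fixed timeout per piece is
an assumption.

```python
import time
from collections import deque


class WorkPool:
    """Hypothetical master-side pool that re-queues overdue work."""

    def __init__(self, pieces, timeout):
        self.pending = deque(pieces)   # pieces not yet handed out
        self.in_flight = {}            # piece -> (node, deadline)
        self.done = {}                 # piece -> result
        self.timeout = timeout

    def _requeue_expired(self, now):
        # A piece whose deadline has passed is assumed lost: its node
        # may have been switched off by its owner.
        for piece, (_node, deadline) in list(self.in_flight.items()):
            if now >= deadline:
                del self.in_flight[piece]
                self.pending.append(piece)

    def hand_out(self, node, now=None):
        """Give the next piece to a node asking for work, or None."""
        now = time.monotonic() if now is None else now
        self._requeue_expired(now)
        if not self.pending:
            return None
        piece = self.pending.popleft()
        self.in_flight[piece] = (node, now + self.timeout)
        return piece

    def complete(self, piece, result):
        """Store a result; the first answer wins, late copies are ignored."""
        self.done.setdefault(piece, result)
        self.in_flight.pop(piece, None)
        if piece in self.pending:      # a late answer for re-queued work
            self.pending.remove(piece)
```

The only traffic is work going out and results coming back, so a node
that disappears costs one timeout and nothing more.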

 Anyway, I think that in an HP cluster it is not so important to know at
application level who is working and who is not. Having this be
transparent is marvellous, and anybody who has had to configure a Beowulf
PVM cluster knows why. ;-)

 On HP clusters, you are not sending heartbeats. You are sending
information about the nodes and LOTS of information about the problem that
you are solving.


> If your machines are not constantly sending information, then they're idle.

 My machines are never idle.

> and reliably if a node leaves the cluster unexpectedly.  This latter piece
> is the single most important property for a high-availability cluster.


 This sentence is what showed me that while I am playing soccer, you
are playing basketball. The rules are different. ;-)

> The thing I emphasized the most in my initial post was that whatever method
> applications use to get this data must be standardized through a single
> agnostic API.  This discussion points out *clearly* why I believe this quite
> passionately.


 I am beginning to think that we will need two different APIs, which will
be different kernel options: one for HA clusters and another for HP
clusters. They are SO different (you are showing me this) that I find it
quite difficult to make an HP+HA API.


> The most important conclusion I draw from this interchange is that we MUST
> create a framework into which we can plug various methods, and have the
> client applications not care at all.  If we create such a framework, then
> the technologies can fight it out, and the winner will always be the user.

 Perfect. I am not going to talk about HA, but about HP; and I think that
in that case the framework should follow these guidelines (this is my
proposal). I will use as reference the four things that I have used most:
MPI, PVM, Mosix and Beowulf. The four are completely different things,
but I will not talk about implementation, only features; it is a wish list.

1) The cluster has to have a semantics. PVM has a semantics, MPI has a
semantics, Mosix has a semantics. Maybe the Mosix one is better: the whole
cluster is shown to the user as an SMP machine. Mosix does this, so it
is possible. Maybe one of the hot points of the discussion is deciding
which semantics is better for an HP cluster.

2) There should be an efficient method to send a task, from the start, to
the least loaded node of the cluster. PVM has this; Mosix does not (in
Mosix the task can migrate after being launched, but its kernel part will
be executed on the launch node).

3) It should be portable between different Linux architectures. Mosix is
not; the others are. (For me, it does not matter; but I know groups that
will find this great.)

4) The network should be as transparent as we can make it. Mosix is great
at this, PVM and MPI do a good job, and Beowulf does nothing.

5) It must allow cheap hardware to run efficiently. This rules out
broadcast protocols, sorry. ;-)

6) Migrating running tasks is great. Mosix does a good job here, but not a
perfect one: code that uses sockets or shared memory cannot migrate.
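
As a rough illustration of point 2 inside a pluggable framework, the
Python sketch below puts two placement policies behind one interface. It
is a toy model, not the actual behaviour or API of PVM or Mosix:
PlacementPolicy, LeastLoaded, LaunchLocal and launch are hypothetical
names, and "load" is simply a task count per node.

```python
from abc import ABC, abstractmethod


class PlacementPolicy(ABC):
    """Hypothetical plug-in point: decides where a new task starts."""

    @abstractmethod
    def place(self, launch_node, loads):
        """Return the name of the node on which the task should start."""


class LeastLoaded(PlacementPolicy):
    """Start the task directly on the node with the lowest load."""

    def place(self, launch_node, loads):
        return min(loads, key=loads.get)


class LaunchLocal(PlacementPolicy):
    """Start the task where it was launched; migration would come later."""

    def place(self, launch_node, loads):
        return launch_node


def launch(policy, launch_node, loads):
    """Place one task with the given policy and account for its load."""
    node = policy.place(launch_node, loads)
    loads[node] += 1
    return node
```

The caller only sees launch(); which policy sits behind it is a choice
the framework can leave open.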

 I think that fault tolerance must be solved at user level. HP has
lots of completely different software, and a general solution can be an
unacceptable overhead (remember that in HP we are always at the limit of
the machines).

 Well, it is only my opinion. I am only a developer of HP applications and
maintainer of a really weird cluster (with dedicated and non-dedicated
nodes), but maybe it gives some insight to somebody.


 Yours:

David


---------------------
http://www.orcero.org
  irbis@orcero.org
---------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/