
Re: available resource declaration language(s)





 Hello, Alan!

> I would suggest that we not give up quickly on the idea of having common
> components and APIs.  I believe that some implementations of components will

 It is not giving up on the idea; I think that there are some subsystems that
may be common, and some that may not. I think that the APIs would be
different, but we can reuse lots of code between the different projects.

 Anyway, maybe the most important part is that the projects are not
mutually incompatible. That means that somebody could enable the HA and HP
kernel options at the same time. At this time, applying the patches of two
different projects simultaneously is really a headache -when it works-.

> And, just as you pointed out, my view isn't sufficient for everyone, and I'm
> sure you would agree that your view isn't sufficient for everyone either.

 I agree completely with this. My view is very focused on HP clusters for
CPU-bound processes; it is not the only use of clusters, but I think
that it is an important one. Not the only one -HA, network-bound and
disk-bound are also very important-; but it is mine, and it is the one
that I know best. I am doing some brainstorming so that all of us
-including me- get a broader view of the whole thing. When we have a clear
common objective, I would also like to collaborate with some code. :-) In
the end, all the discussion will finish in coding. ;-)

> For example, we've talked about membership.  Most (all?) cluster
> applications need some form of membership services.  But different


 Maybe this is one of the points where we can find a common solution, and
reuse lots of code. IMHO our common goals are:

1) A system call to explicitly include or exclude a node in the cluster
2) The kernel structure with the list of nodes of the cluster, with
metrics of the quality of each node.
3) The manipulation routines for that kernel structure
4) For each node, the best IP to forward the packets related to kernel
cluster messages, so that they reach the node better (very interesting on
complex topologies; in my own experience it is a thing that helps, because
I use several network cards attached to each control node, and there is
more than one way to reach some nodes of the cluster. I would like NFS
packets and kernel cluster messages to use different paths; currently Mosix
goes crazy with some topologies). Hacking the routing tables does not work,
because we want kernel packets to take a different route. This can
also help you, as you can have two network cards per node, with
one of them only for HA messages; you avoid a malicious node filling
the channel and blocking HA messages.
5) Routines to drop a node that goes down; this includes freeing the kernel
resources attached to the remote node.
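
 To make the five goals above concrete, here is a minimal user-space
sketch in Python. It is only a model of the idea, not kernel code and not
taken from any existing project; the names Node, Membership, include,
exclude and route_for, and the "load" metric, are all assumptions made for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cluster member (hypothetical layout, for illustration only)."""
    name: str
    cluster_ip: str            # goal 4: preferred address for cluster messages
    load: float = 0.0          # goal 2: one example of a quality metric
    resources: list = field(default_factory=list)  # held on behalf of the node


class Membership:
    """Goals 2 and 3: the node list and the routines that manipulate it."""

    def __init__(self):
        self._nodes = {}

    def include(self, name, cluster_ip):
        """Goal 1: explicitly add a node to the cluster."""
        self._nodes[name] = Node(name, cluster_ip)
        return self._nodes[name]

    def exclude(self, name):
        """Goals 1 and 5: drop a node and release what was attached to it."""
        node = self._nodes.pop(name, None)
        if node is None:
            return []
        released = list(node.resources)
        node.resources.clear()
        return released

    def update_metric(self, name, load):
        """Goal 2: refresh the quality metric of a known node."""
        self._nodes[name].load = load

    def route_for(self, name):
        """Goal 4: address to use for cluster messages to this node."""
        return self._nodes[name].cluster_ip

    def members(self):
        return sorted(self._nodes)
```

 Keeping the preferred address inside the node entry, instead of in the
routing tables, is what lets cluster messages take a different path from
the rest of the traffic to the same node.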

 All these things can be common to all clustering projects, and we can take
any existing implementation and use it. IMHO there will be, anyway, things
that will be different and must be chosen when configuring the kernel:

1) Policy for detecting when a remote node must be dropped. We can allow
choosing an HA policy -your code is perfect- or a lazy policy -a node is
dropped when you repeatedly try to contact it to send work and it does not
answer-.
2) Whether node transparency is allowed (Mosix code?). This includes: a
common PID table -there is some Beowulf code for this, and PVM has some
great ideas-, process migration, and launching new processes on the least
loaded node. This is not good for HA, so it would be something that you
must activate via /proc
3) Policy for forwarding IP packets to a node: whether or not to use the
IP address forward field. The IP forward field could also be enabled via
/proc, and the IP address can also be chosen via /proc.
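
 As an illustration of the lazy policy in point (1), here is a small
Python sketch. It is a user-space model under my own assumptions: the
class name LazyDropPolicy and the threshold of three consecutive failures
are hypothetical, not taken from any existing code.

```python
class LazyDropPolicy:
    """Drop a node only after repeated failed attempts to send it work."""

    def __init__(self, max_failures=3):
        # max_failures is an assumed tunable, not a value from any project.
        self.max_failures = max_failures
        self._failures = {}

    def record_success(self, node):
        """The node answered, so forget its earlier failures."""
        self._failures.pop(node, None)

    def record_failure(self, node):
        """Count one unanswered attempt; return True if the node should be dropped."""
        count = self._failures.get(node, 0) + 1
        self._failures[node] = count
        return count >= self.max_failures
```

 No extra traffic is generated to check the nodes: a dead node is only
noticed when somebody tries to use it, which is the price paid for not
spending bandwidth on monitoring. An HA policy would instead probe the
nodes actively.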


> (QOS).  Using QOS as an analogy, most applications need networking, but some
> need low latency, and others need predictable packet times, and others need
> high bandwidth.


 You are right. What about my comments above?

> In our case, your cluster might need a very low bandwidth solution, and mine
> might need quick discovery of dead nodes.  But - we both need to be able to
> tell what machines are in the cluster, and what ones are out of it.

 You are completely right. Anyway, lots of things -file locking, internal
clustering structures- can be common. The rest is choosing the CuQOS
-cluster Quality of Service- ;-) parameters.

> > the least loaded node  of the cluster. PVM have this, Mosix have not -in
> > Mosix the task can migrate after being launched, but its kernel part will
> > be executed on the launch node-.
>
> A cluster batch scheduler presumably could be a help here...

 Yes; but in point (2) above we must include this ability,
transparent to the user, giving SMP semantics to the whole kernel;
this must be optional, activated via /proc, because HA people would not
like this; you would like to know exactly where each piece of software is
running.
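
 A minimal Python sketch of this optional placement follows. It is only a
model under my own assumptions: pick_node and its "transparent" argument
are hypothetical names, and the flag stands in for whatever switch would
be exposed via /proc.

```python
def pick_node(loads, local, transparent):
    """Choose where a new process starts.

    loads: dict mapping node name to its current load (lower is better).
    local: name of the node where the launch was requested.
    transparent: False keeps every process on the launching node, so that
    HA people always know where each piece of software is running.
    """
    if not transparent or not loads:
        return local
    # Iterating over sorted names makes ties resolve by node name.
    return min(sorted(loads), key=lambda name: loads[name])
```

 For example, pick_node({"a": 2.0, "b": 0.5}, "a", True) selects "b", and
the same call with the flag set to False stays on "a".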

> > 5) It must to allow to run cheap hardware efficiently. this take out
> > broadcast protocols, sorry. ;-)
>
> I would state this differently.  It must be possible to assemble a set of
> components that allows it to work efficiently on cheap hardware.  I would
> also argue that it must be possible to assemble a set of components that
> allow it to take advantage of clusters with more bandwidth.

 You are right. This could also be a CuQOS option. :-) I think that this
is included in the proposal above.

> There is also a class of applications (like weather prediction) where the
> system needs to be HA/HPC.  The US weather bureau wants to perform a set of

 Well, if you have enough bucks, it is possible to do HA and HPC at the
same time. It is only a matter of activating all the CuQOS options
above. ;-)

> > 6) Migrating running task is great. Mosix does a good work here, but not
> > perfect -sockets and shared memory code can not migrate-.
>
> For HA, automatic process migration in the kernel is a hinderance - not a
> help.  It makes it difficult to figure out what has failed, and to restart


 You are completely right here. But for HP it is heaven; I must
admit that there was a before and an after of discovering process
migration. That is why I think that migration should be in the
hypothetical clustering kernel as a kernel option.

> It also makes performance unpredictable.  If you're short on cycles (like
> you describe yourself), then this can be a big problem.

 This is not exact. In Mosix you are sure that you are going to use the
resources of the system at the highest level. Prof. Amnon has his
mathematical proof of this. Yes, you cannot be sure that one
particular process will have some particular performance, but if the
overall result is the best, you have a high probability of running
faster. ;-)


> Application-directed restarts are much harder, but often better performing
> as well.  If you have more human than technological resources, this might be

 Sorry, but no. I have tested it. In the worst case -a programme created
artificially to make the work of Mosix difficult- migration is nearly as
good as the best case, and the difference is very small -some minutes on a
job of a little more than a week-. In real cases, Mosix clearly
outperforms PVM with a directed node assignment policy. (Tested with
several molecular modeling packages, and several QM packages.)

 Hope that my opinion helps. :-)

 Yours:


 David

---------------------
http://www.orcero.org
  irbis@orcero.org
---------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/