
Re: the "cluster" system call (and file system type)



irbis@orcero.org wrote:
> 
>  Hello, David!
> 
> > As I mentioned earlier (months ago) :)
> > it could be a mountable fake file system.
> 
>  Better focused, something like a /proc filesystem. :-)

Part of the objection to including MOSIX as-is in the kernel
is all the mucking about it does in /proc.  This point was made
early on, very possibly by Alan himself.  Mounting the cluster
system interfaces outside of /proc makes their definitions
independent of everything that happens in /proc.  It also gives
very nice independence between clusters: one machine could join
two completely different clusters by mounting them at two mount
points, and the distinction is clean.



>  I think that in that respect the Mosix approach is a very good idea. Since
> all the information is in a subdirectory of /proc, it is a really
> non-intrusive way to get information and to give orders to the cluster.
> (Non-intrusiveness is important. I can hardly imagine myself rewriting
> some of these molecular dynamics dinosaurs, and, what is worse,
> explaining to the HPB why I am doing this.)

It's non-intrusive to the user, but it is very intrusive to the kernel.
Whenever /proc gets modified in the kernel, they have to rewrite the patch.
Bolt-on fake file systems were more difficult when they ported MOSIX
to Linux; they're easier now (there's even a howto document about it
and a standard fake-file-system interface: you don't need to pretend
to be a local NFS server any more).


 
>  Anyway, you have here a good idea, since:
> 
> >
> > mount special-file mount-point -t cluster-type
> >
> 
>  Could be an EXCELLENT way to dynamically control the membership of a
> machine. If you do:
> 
> mount /proc/cluster -t cluster
> 
>  you join the cluster, and with:
> 
> umount /proc/cluster
> 
>  you remove the node from the cluster. To join automatically, you only
> need the line:
> 
> none /proc/cluster cluster defaults 0 0
> 
>  in the fstab.

You'd need an awfully robust cluster system to be able to have
no configuration file at all!  I placed the configuration file
in the device-special slot in mount syntax.  That would make
mounting a mosix-1.0 emulator from fstab look something like:

/etc/mosix.map /proc/mosix cluster mosix 0 0
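
To make the slot assignment concrete, here is a minimal Python sketch that
splits that fstab line into the six standard fstab fields.  The FstabEntry
type and the parse_fstab_line helper are hypothetical names made up for
illustration, and "cluster" is still only the proposed file system type,
not an existing one.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    spec: str         # first field: normally a device, here the cluster config file
    mount_point: str  # where the control interfaces would appear
    fstype: str       # proposed file system type ("cluster" is hypothetical)
    options: list     # comma-separated mount options, split into a list
    dump: int
    passno: int


def parse_fstab_line(line: str) -> FstabEntry:
    """Split one fstab line into its six whitespace-separated fields."""
    spec, mount_point, fstype, opts, dump, passno = line.split()
    return FstabEntry(spec, mount_point, fstype, opts.split(","),
                      int(dump), int(passno))


entry = parse_fstab_line("/etc/mosix.map /proc/mosix cluster mosix 0 0")
print(entry.spec, entry.fstype, entry.options)
```

The configuration file lands in the field that normally names a device,
and the cluster discipline ("mosix") travels in the options field.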

>  (In fact, I have always thought that most of the Mosix code could
> be reused, and we could work on patching the weaknesses of Mosix, so as
> not to reinvent the wheel; this was before the "using XML for
> transmitting control information" and "broadcast on ethernet does not hurt
> network performance" threads.)

Breaking it into individual components could turn into a lot of separate
projects.  I don't know how interrelated the pieces are, or how hard it
would be to apply Shiloh Migration Algorithms to bproc migration, for instance.


> > The mount-point would be a directory where all the
> > control interfaces, including a standard subset and
> > whatever extensions the particular system adds on, will
> > live.
> 
>  Same as Mosix... I think the same.

Yes, except not guaranteed to live at /proc/mosix.


 
> > the cluster-type would be the clustering discipline to
> > give the special-file to, to set itself up.  Mount might
> > be able to figure out what kind it is on its own.
> 
>  Maybe, more so if we take into account that HP and HA have completely
> different objectives. We can keep the same infrastructure (CPIDs, node
> tables, and so on) but choose in the mount type whether we want HP or HA.
> Some of the things that HP people say, such as the laptop stuff, are anathema
> to HA people, and some of the things that HA people say, such as "broadcasting
> constantly XML messages to keep membership information up to date", are
> somewhat hard to swallow for HP people. Thus, we could redefine the mount
> above as:
> 
> mount /proc/cluster -t HPcluster
> 
>  and
> 
> mount /proc/cluster -t HAcluster
> 
> and with this we enable/disable the features proposed for HA people, or
> more Mosix-like features.

The way mount works, you _ALWAYS_ have

	mount special directory

and type, which is declared by -t, is optional but may be required if
mount cannot determine it, for instance

	mount -t nfs humdinger.redhat.com:/pub/current/ /installtree

and then options are provided with -o switches:

	mount -t nfs humdinger.redhat.com:/pub/current/ /installtree -o soft -o tcp


That's how mount works.

So if the cluster file system type is "cluster", all the
options would be specified with -o switches.  -o HP or -o HA
might turn on whole families of options.
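
As a rough illustration of one -o switch standing for a family of options,
here is a minimal Python sketch.  Every option name in it (migrate,
loadbalance, heartbeat, failover) is a hypothetical placeholder; no real
cluster file system defines these, and a real implementation would do
this inside the file system's own option parsing.

```python
# Hypothetical families: none of these option names come from a real file system.
OPTION_FAMILIES = {
    "HP": {"migrate", "loadbalance"},
    "HA": {"heartbeat", "failover"},
}


def expand_options(opt_string: str) -> set:
    """Expand a comma-separated -o string, replacing family names by their members."""
    expanded = set()
    for opt in opt_string.split(","):
        if not opt:
            continue  # tolerate stray commas
        # A family name expands to its members; any other option passes through.
        expanded |= OPTION_FAMILIES.get(opt, {opt})
    return expanded


print(sorted(expand_options("HP,soft")))
```

Options that are not family names pass through unchanged, so the usual
individual switches would keep working next to -o HP or -o HA.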


 
> > I was about to write a completely user-mode system based
> > on unix-domain sockets this spring but got distracted.
> 
>  Well, this already exists (PVM), and works fine. Anyway, PVM has its own
> leaks; please share with us the fresh ideas that you were going to use in
> your user-mode system.

I meant a channel system where node-node communication is established
with a single stream connection that persists, and all other communication
between those two nodes is multiplexed over that channel.

This would remove the IP-address <--> node 1-1 mapping limit that MOSIX 
has, allowing nodes behind a NAT to peer with nodes outside for instance.
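
A minimal Python sketch of the multiplexing idea, assuming a simple frame
format of channel id plus payload length followed by the payload.  The
frame layout and the names pack_frame and demux are assumptions made for
illustration, not a protocol from any existing system.

```python
import struct

# Assumed frame header: channel id and payload length, both unsigned
# 32-bit integers in network byte order.
HEADER = struct.Struct("!II")


def pack_frame(channel: int, payload: bytes) -> bytes:
    """Prefix a payload with its channel id and length."""
    return HEADER.pack(channel, len(payload)) + payload


def demux(stream: bytes) -> dict:
    """Split a byte string of concatenated frames into per-channel data."""
    channels = {}
    offset = 0
    while offset < len(stream):
        channel, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        channels.setdefault(channel, bytearray()).extend(stream[offset:offset + length])
        offset += length
    return {ch: bytes(data) for ch, data in channels.items()}


# Two logical channels interleaved on what would be one persistent connection.
wire = pack_frame(1, b"hello ") + pack_frame(2, b"ctl") + pack_frame(1, b"world")
print(demux(wire))
```

Since peers are identified by the connection and the channel id rather
than by IP address, it does not matter which side opened the connection,
which is what would let a node behind a NAT take part.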

With that in place (a system where you can open a new channel to a peer
node at will), the next step was to re-implement process migration over
these multiplexed streams, with file handles implemented as channels
so that basic I/O can occur remotely, the various handles being
hidden behind an abstraction.
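
A minimal Python sketch of a file handle hidden behind a channel: a
file-like object whose writes turn into frames on the shared stream.  The
ChannelWriter class, the frame header, and the channel number are all
hypothetical, and an in-memory buffer stands in for the real node-to-node
connection.

```python
import io
import struct

# Assumed frame header: channel id and payload length, network byte order.
HEADER = struct.Struct("!II")


class ChannelWriter(io.RawIOBase):
    """File-like object whose writes become frames on a shared stream."""

    def __init__(self, transport, channel):
        super().__init__()
        self._transport = transport  # the single connection shared by all channels
        self._channel = channel      # which logical channel this handle maps to

    def writable(self):
        return True

    def write(self, data):
        payload = bytes(data)
        self._transport.write(HEADER.pack(self._channel, len(payload)) + payload)
        return len(payload)


transport = io.BytesIO()  # stands in for the persistent node-to-node connection
stdout_proxy = ChannelWriter(transport, channel=7)
stdout_proxy.write(b"result\n")
print(transport.getvalue())
```

Code holding stdout_proxy just calls write() as on any file; whether the
bytes end up on a local disk or on a peer node is decided by whatever
sits at the other end of the channel.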

I would have to do some homework to determine efficiencies and stuff.

I think I posted a summary proposal of it earlier this year.

-- 
                                           David Nicol 816.235.1187


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/