From owner-linux-cluster@nl.linux.org Mon Jul  2 23:49:16 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16233AbRGBVtC>; Mon, 2 Jul 2001 23:49:02 +0200
Received: from mail303.mail.bellsouth.net ([205.152.58.163]:36839 "EHLO
	imf03bis.bellsouth.net") by humbolt.nl.linux.org with ESMTP
	id <S16192AbRGBVsq> convert rfc822-to-8bit; Mon, 2 Jul 2001 23:48:46 +0200
Received: from taz ([208.61.48.116]) by imf03bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz>;
          Mon, 2 Jul 2001 17:49:35 -0400
Date:	Mon, 2 Jul 2001 17:48:37 -0400
From:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>
Subject: re: Compaq launches Open SSI Cluster Projects
To:	<bruce@kahuna.cag.cpqcorp.net>, <linux-cluster@nl.linux.org>
cc:	Linux HA - users mailing lins <linux-ha@muc.de>,  SSI Clusters for Linux <ssic-linux-devel@opensource.compaq.com.>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Bruce,

I have just read your paper at

http://bjbrew.org/cpq/ssic_linux/montreal/sld001.htm

and in particular the summary page at

http://bjbrew.org/cpq/ssic_linux/montreal/sld053.htm

I must say that I am blown away with what you are doing.  

Once your goals are accomplished, it looks to me as if you will have the most advanced UNIX level HA/HPC clustering solution available.(I am including the commercial products like TruClusters and Veritas.  I don't know enough about Beowulf or Mosix to comment.)

Would you agree with that?

Without fully understanding the pros/cons, I hope you are successful in garnering interest for getting this into the standard Linux Kernel.  (Unfortunately, I am not a player in the Linux world so my support won't mean anything.)

The one negative I see is that you seem to have been developing this in a vacuum from the Linux communities perspective.

The only Linux technology I see in the presentation is GFS.

Are there any other pre-existing Linux HA/HP cluster technologies you are incorporating?

Greg
=======

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

 >>  Compaq has launched two open source technology projects
 >>  under the GPL license.  They are briefly described below 
 >>  and can be found through www.opensource.compaq.com.

 >>  We are actively looking for technology partners, 
 >>  contributors, consultants and general kibitzers to
 >>  participate via the email lists set up for each project.
 >>  Those that just want to monitor the projects are welcome
 >>  as well.

 >>  Cluster Infrastructure for Linux (CI)
 >>  The goal of this project is to develop a common 
 >>  infrastructure for many if not all forms of Linux 
 >>  clustering by extending the Cluster Membership and 
 >>  Inter-node Communication Subsystems from Compaq's 
 >>  NonStop Clusters for Unixware code base.  This project 
 >>  also provides the basis for the Open SSI Clusters for 
 >>  Linux project.  
 >>  A developers download is available via
 >>  www.opensource.compaq.com for Intel-32, along 
 >>  with build, boot, hook, interface and api documentation.
 >>  We will put the CVS repository on the web when we can.
 >>  A port to the alpha chip has already succeeded and 
 >>  patches for that are available.

 >>  Open Single System Image (SSI) Clusters for Linux Project
 >>  The Open SSI project leverages both Compaq's NonStop
 >>  Clusters for Unixware technology and other open source
 >>  technology to provide a full, highly available SSI
 >>  environment for Linux.  Goals for SSI Clusters include
 >>  availability, scalability and manageability, built from
 >>  standard servers.  Technology pieces will include:
 >>  membership, single root and single init, cluster filesystems
 >>  and DLM, single process space and process migration, load
 >>  leveling, availability monitors and failover, single namespace  
 >>  and shared access for all forms of IPC, devices and networking, 
 >>  and a single management space.  The SSI project will leverage 
 >>  the Cluster Infrastructure for Linux project.
 >>  Source beyond the CI base is not yet available.  We are
 >>  aiming for a developers release of much of functionality in
 >>  July.  In the meantime there is a presentation on SSI
 >>  Clustering on the web. An initial list of component requirements 
 >>  will soon be posted for discussion and refinement.
 >>  Join the mail alias via www.opensource.compaq.com
 >>  to stay updated.

 >>  bruce walker
 >>  SSI Cluster Architect
 >>  Linux Program Office
 >>  Compaq Computers

 >>  Linux-cluster: generic cluster infrastructure for Linux
 >>  Archive:       http://mail.nl.linux.org/linux-cluster/







Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 01:38:00 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16136AbRGBXhl>; Tue, 3 Jul 2001 01:37:41 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:15864 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16255AbRGBXhe>; Tue, 3 Jul 2001 01:37:34 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id 03CB826A3A; Mon,  2 Jul 2001 17:36:52 -0600 (MDT)
Message-ID: <3B410594.94EE59C9@unix.sh>
Date:	Mon, 02 Jul 2001 17:36:52 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>
Cc:	bruce@kahuna.cag.cpqcorp.net, linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
References: <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Freemyer wrote:

> you seem to have been developing this in a vacuum from the Linux communities
> perspective.
> 
> The only Linux technology I see in the presentation is GFS.
> 
> Are there any other pre-existing Linux HA/HP cluster technologies you are
> incorporating?

Hi Greg,

We've certainly encouraged them to work together with us to make their
software fit into the planned community clustering infrastructure project. 
This would be of benefit to them, to the Linux community, and to potential
users of clustering infrastructure.

We're still waiting to hear if they're interested.

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 01:53:21 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16265AbRGBXxN>; Tue, 3 Jul 2001 01:53:13 +0200
Received: from inet-mail4.oracle.com ([148.87.2.204]:61832 "EHLO
	inet-mail4.oracle.com") by humbolt.nl.linux.org with ESMTP
	id <S16255AbRGBXxC>; Tue, 3 Jul 2001 01:53:02 +0200
Received: from gmgw01.us.oracle.com (gmgw01.us.oracle.com [130.35.249.115])
	by inet-mail4.oracle.com (Switch-2.1.3/Switch-2.1.0) with ESMTP id f62NkO916321;
	Mon, 2 Jul 2001 16:46:24 -0700 (PDT)
Received: from oracle.com (dbrower-sun.us.oracle.com [130.35.180.64])
	by gmgw01.us.oracle.com (Switch-2.1.1/Switch-2.1.0) with ESMTP id f62Nq0v24080;
	Mon, 2 Jul 2001 16:52:00 -0700 (PDT)
Message-ID: <3B410920.C23765FA@oracle.com>
Date:	Mon, 02 Jul 2001 16:52:00 -0700
From:	David Brower <David.Brower@oracle.com>
Organization: Oracle Corporation
X-Mailer: Mozilla 4.7 [en] (X11; U; SunOS 5.6 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To:	Alan Robertson <alanr@unix.sh>
CC:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>,
	bruce@kahuna.cag.cpqcorp.net, linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
References: <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz> <3B410594.94EE59C9@unix.sh>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
Unix cluster stuff, pretty much as deployed on that platform.

Personally, I am more interested in the CI part of the project, which
along with the IBM DLM would provide a reasonable GFS platform.  It seems
it would get together faster than GFS-on-DLM-on-heartbeat would.

I'm less into in the very-large scope SSI work, maybe because I don't 
understand the implications of the failure domains.  It has appeared to me
before that failure of a node is likely to have deeper ripples in the SSI 
scheme than it does in clusters with less tightly coupled nodes.  And it
is all very intrusive in ways one wonders if Linus would ever accept.  Its
a noble attempt, but it's gonna be a lot harder to accept.

In a perfect world, many of the components would plug together; I don't know
how the CI stuff maps into the heartbeat model.

-dB

Alan Robertson wrote:
> 
> Greg Freemyer wrote:
> 
> > you seem to have been developing this in a vacuum from the Linux communities
> > perspective.
> >
> > The only Linux technology I see in the presentation is GFS.
> >
> > Are there any other pre-existing Linux HA/HP cluster technologies you are
> > incorporating?
> 
> Hi Greg,
> 
> We've certainly encouraged them to work together with us to make their
> software fit into the planned community clustering infrastructure project.
> This would be of benefit to them, to the Linux community, and to potential
> users of clustering infrastructure.
> 
> We're still waiting to hear if they're interested.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 02:36:29 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16449AbRGCAgV>; Tue, 3 Jul 2001 02:36:21 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:47608 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16234AbRGCAgI>; Tue, 3 Jul 2001 02:36:08 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id 2753F26A3A; Mon,  2 Jul 2001 18:35:44 -0600 (MDT)
Message-ID: <3B411360.64681960@unix.sh>
Date:	Mon, 02 Jul 2001 18:35:44 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	David Brower <David.Brower@oracle.com>
Cc:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>,
	bruce@kahuna.cag.cpqcorp.net, linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
References: <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz> <3B410594.94EE59C9@unix.sh> <3B410920.C23765FA@oracle.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David Brower wrote:
> 
> They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
> Unix cluster stuff, pretty much as deployed on that platform.
> 
> Personally, I am more interested in the CI part of the project, which
> along with the IBM DLM would provide a reasonable GFS platform.  It seems
> it would get together faster than GFS-on-DLM-on-heartbeat would.

Got any Round Tuits you can send me?     Any grad students?
 
> In a perfect world, many of the components would plug together; I don't know
> how the CI stuff maps into the heartbeat model.

Hopefully, heartbeat will map into the framework model, not the other way
around ;-)

It is one of the influences that enters into the framework model, and there
are a few things it does nicely, and we'll preserve those.

But, it will just be one component of many.

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 02:44:40 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16455AbRGCAo1>; Tue, 3 Jul 2001 02:44:27 +0200
Received: from suntan.tandem.com ([192.216.221.8]:46261 "EHLO
	suntan.tandem.com") by humbolt.nl.linux.org with ESMTP
	id <S16261AbRGCAoO>; Tue, 3 Jul 2001 02:44:14 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id RAA27180
	for <linux-cluster@nl.linux.org>; Mon, 2 Jul 2001 17:44:05 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f630Vx218047; Mon, 2 Jul 2001 17:31:59 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107030031.f630Vx218047@kahuna.cag.cpqcorp.net>
Subject: Re: Compaq launches Open SSI Cluster Projects
In-Reply-To: <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz> from Greg Freemyer at "Jul 2, 2001 05:48:37 pm"
To:	freemyer-ml@NorcrossGroup.com (Greg Freemyer)
Date:	Mon, 2 Jul 2001 17:31:59 -0700 (PDT)
Cc:	linux-cluster@nl.linux.org,
	linux-ha@muc.de (Linux HA - users mailing lins),
	ssic-linux-devel@opensource.compaq.com (SSI Clusters for Linux)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg,
  Thanks for the interest in full SSI clustering.
Full SSI clustering is not as familiar to most
people as HA clustering or HPC clustering.  As
you noted, it is very ambious. Fortunately we have been at
it for many years (working on different Unix
bases).  A key component is a single root.  However,
a terse list of some of the components we believe
are needed for an SSI cluster shows that single
root filesystem is just part of one of them
(list provided below).

The plan for the project is to start with a discussion
of the component areas and of requirements for the
component areas.   For many of the component areas we
have Linux code, which was reworked from the code we
had on the Unixware base.  Two of the components
(membership and internode communication) are already
open sourced via the CI project (Cluster Infrastructure,
available via the www.opensource.compaq.com link).
We plan to have an initial integrated developers release of 
many of the other components later this month.

Areas we hope and expect to leverage existing Linux
projects and technology include:
   a. filesystems (we will release a cluster filesystem
	we have developed but hope to involve and
	incorporate any that come around (starting with
	GFS).
   b. all aspects of application monitoring and restart
	(many different linux projects to work from here)
   c. load leveling (both connection load leveling 
	like LVS and process migration load leveling like
	Mosix)
   d. devfs (we have enhancements to the basic devfs to
	provide a transparent clusterwide device view and 
 	clusterwide device access.
   e. DLM (our cluster filesystem didn't need one but many
	others do and we are working to fold the open
	sourced DLM into CI.

The goals of SSI clustering simple - simultaneously provide
high availability, scalability and manageability.  If
we are successful, SSI clusters will not only be the HA
clusters of the future but may also be the load leveling
and high performance clusters as well.

Here is a very terse list of SSI component areas:

1.  Membership - kernel boot time; APIs; coordinate kernel cleanup; 
	split brain; STOMITH 
2.  Internode Communication Subsystem - kernel boot time; channels; 
	flow control; transports, 
3.  Filesystem - single root; single mount tree; access to all filesystems; 
	offset coherency
4.  Processes - single namespace and full access to all from all; arbitrary 
	node failure; /proc; 
5.  Devices - single namespace for all; access to all from anywhere; 
	persistence; parallel access, ...
6.  Interprocess Comm - single namespace; access all sysVipc, pipes, fifos, 
	ptys, Unix sockets, Inet sockets
7.  TCP/IP networking - single set of devices; single port space; 
	cluster virtual IP (CVIP); connection load leveling 
	IP failover or CVIP failover
8.  Paging/Swap - single set of devices; borrow space if needed
9.  kernel data replication service - maintain consistency; populate new 
	nodes
10. Cluster Volume Manager
11. HA shared storage 
12. HA interconnect 
13. DLM;   
14. SSI system mgmt;
	very small enhancements to single machine Linux tools
15. single HA init; cluster booting and run levels
16. HA applications and system daemons
	- simplified versions (due to SSI) of standard HA tools
17. timesync
18. load leveling
19. Packaging and Installation
20. object location interfaces and object movement interfaces
	- moving pipes and sockets etc. from node to node


Soon there will be an annotated presentation on SSI, followed
by an initial list of requirements, component by component.

> Bruce,
> 
> I have just read your paper at
> 
> http://bjbrew.org/cpq/ssic_linux/montreal/sld001.htm
> 
> and in particular the summary page at
> 
> http://bjbrew.org/cpq/ssic_linux/montreal/sld053.htm
> 
> I must say that I am blown away with what you are doing.  
> 
> Once your goals are accomplished, it looks to me as if you will have the most advanced UNIX level HA/HPC clustering solution available.(I am including the commercial products like TruClusters and Veritas.  I don't know enough about Beowulf or Mosix to comment.)
> 
> Would you agree with that?
> 
> Without fully understanding the pros/cons, I hope you are successful in garnering interest for getting this into the standard Linux Kernel.  (Unfortunately, I am not a player in the Linux world so my support won't mean anything.)
> 
> The one negative I see is that you seem to have been developing this in a vacuum from the Linux communities perspective.
> 
> The only Linux technology I see in the presentation is GFS.
> 
> Are there any other pre-existing Linux HA/HP cluster technologies you are incorporating?
> 
> Greg
> =======
> 
> Greg Freemyer
> Internet Engineer
> Deployment and Integration Specialist
> The Norcross Group
> www.NorcrossGroup.com
> 
>  >>  Compaq has launched two open source technology projects
>  >>  under the GPL license.  They are briefly described below 
>  >>  and can be found through www.opensource.compaq.com.
> 
>  >>  We are actively looking for technology partners, 
>  >>  contributors, consultants and general kibitzers to
>  >>  participate via the email lists set up for each project.
>  >>  Those that just want to monitor the projects are welcome
>  >>  as well.
> 
>  >>  Cluster Infrastructure for Linux (CI)
>  >>  The goal of this project is to develop a common 
>  >>  infrastructure for many if not all forms of Linux 
>  >>  clustering by extending the Cluster Membership and 
>  >>  Inter-node Communication Subsystems from Compaq's 
>  >>  NonStop Clusters for Unixware code base.  This project 
>  >>  also provides the basis for the Open SSI Clusters for 
>  >>  Linux project.  
>  >>  A developers download is available via
>  >>  www.opensource.compaq.com for Intel-32, along 
>  >>  with build, boot, hook, interface and api documentation.
>  >>  We will put the CVS repository on the web when we can.
>  >>  A port to the alpha chip has already succeeded and 
>  >>  patches for that are available.
> 
>  >>  Open Single System Image (SSI) Clusters for Linux Project
>  >>  The Open SSI project leverages both Compaq's NonStop
>  >>  Clusters for Unixware technology and other open source
>  >>  technology to provide a full, highly available SSI
>  >>  environment for Linux.  Goals for SSI Clusters include
>  >>  availability, scalability and manageability, built from
>  >>  standard servers.  Technology pieces will include:
>  >>  membership, single root and single init, cluster filesystems
>  >>  and DLM, single process space and process migration, load
>  >>  leveling, availability monitors and failover, single namespace  
>  >>  and shared access for all forms of IPC, devices and networking, 
>  >>  and a single management space.  The SSI project will leverage 
>  >>  the Cluster Infrastructure for Linux project.
>  >>  Source beyond the CI base is not yet available.  We are
>  >>  aiming for a developers release of much of functionality in
>  >>  July.  In the meantime there is a presentation on SSI
>  >>  Clustering on the web. An initial list of component requirements 
>  >>  will soon be posted for discussion and refinement.
>  >>  Join the mail alias via www.opensource.compaq.com
>  >>  to stay updated.
> 
>  >>  bruce walker
>  >>  SSI Cluster Architect
>  >>  Linux Program Office
>  >>  Compaq Computers
> 
>  >>  Linux-cluster: generic cluster infrastructure for Linux
>  >>  Archive:       http://mail.nl.linux.org/linux-cluster/
> 
> 
> 
> 
> 
> 
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 03:02:28 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16458AbRGCBCW>; Tue, 3 Jul 2001 03:02:22 +0200
Received: from mail07b.vwh1.net ([209.238.9.59]:25688 "HELO mail07b.vwh1.net")
	by humbolt.nl.linux.org with SMTP id <S16456AbRGCBCH>;
	Tue, 3 Jul 2001 03:02:07 +0200
Received: from www.bickleywest.com (208.55.15.92)
	by mail07b.vwh1.net (RS ver 1.0.60s) with SMTP id 08082759;
	Mon,  2 Jul 2001 20:56:45 -0400 (EDT)
Message-ID: <3B4118DD.745F1E54@bickleywest.com>
Date:	Mon, 02 Jul 2001 17:59:09 -0700
From:	Lyle Bickley <lbickley@bickleywest.com>
Organization: Bickley Consulting West Inc.
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.16 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	David Brower <David.Brower@oracle.com>
CC:	Alan Robertson <alanr@unix.sh>,
	Greg Freemyer <freemyer-ml@NorcrossGroup.com>,
	bruce@kahuna.cag.cpqcorp.net, linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
References: <20010702214935.CLSG16957.imf03bis.bellsouth.net@taz> <3B410594.94EE59C9@unix.sh> <3B410920.C23765FA@oracle.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Loop-Detect: 1
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

The good news is that this code contains very important intellectual
property - and I am amazed that Compaq has released it under GPL2. 
(I've been working with Tandem/Compaq for several years on highly
scalable, highly reliable SSI and CI architectures.)

I haven't looked at all the code (some is yet to be released...), but if
it is the code I'm familiar with, it allows for process migration,
process pairs, etc. for "serious" HA for business applications and
databases.

David Brower wrote:
> 
> They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
> Unix cluster stuff, pretty much as deployed on that platform.

Yup.

> 
> Personally, I am more interested in the CI part of the project, which
> along with the IBM DLM would provide a reasonable GFS platform.  It seems
> it would get together faster than GFS-on-DLM-on-heartbeat would.
> 
> I'm less into in the very-large scope SSI work, maybe because I don't
> understand the implications of the failure domains.  It has appeared to me
> before that failure of a node is likely to have deeper ripples in the SSI
> scheme than it does in clusters with less tightly coupled nodes.

Not so.  With process pairs and process migration, one can have apps.
that are almost impervious to hardware failure.  (Take a look at Jim
Gray's book - Transaction Processing Concepts and Techniques, section
3.7 - Fault Model and Software Fault Masking).

> And it
> is all very intrusive in ways one wonders if Linus would ever accept.  Its
> a noble attempt, but it's gonna be a lot harder to accept.

By definition, SSI clusters ARE intrusive - at least from the kernel
standpoint.  It's a question of "degree".  If "hooks" are used, then it
would seem that those "hooks" may well server beneficial to all CI
work(?).

At the application level, intrusiveness can be somewhat "hidden".  For
instance, if a JVM is made to be "fault tolerant", applications running
on that JVM "inherit" SOME of the qualities of that fault tolerance. 
For an application (say database) to be fully fault tolerant (hardware
and software), the that app must be architected for fault tolerance
(process pairs, etc.).

> In a perfect world, many of the components would plug together; I don't know
> how the CI stuff maps into the heartbeat model.

Again, I'll have to look at the code - but "typically" all clustering
dealing with HA fault domains must have "heartbeat".  As I recollect
from some of my earlier IP work, HP and Tandem hold the original patents
on "heartbeat".

---- snip ---- snip ----
 
Cheers,
Lyle

> ------------------------------------------------------------------------------
> Linux HA Web Site:
>   http://linux-ha.org/
> Linux HA HOWTO:
>   http://metalab.unc.edu/pub/Linux/ALPHA/linux-ha/High-Availability-HOWTO.html
> ------------------------------------------------------------------------------

-- 
Lyle Bickley             | Bickley Consulting West Inc.
lbickley@acm.org         | 
lbickley@bickleywest.com | V 650-428-0621
http://bickleywest.com/	 | F 650-428-0599
		
"Black holes exist where GOD is dividing by zero"

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 03:06:00 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16242AbRGCBFv>; Tue, 3 Jul 2001 03:05:51 +0200
Received: from suntan.tandem.com ([192.216.221.8]:28342 "EHLO
	suntan.tandem.com") by humbolt.nl.linux.org with ESMTP
	id <S16137AbRGCBFk>; Tue, 3 Jul 2001 03:05:40 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id SAA27457
	for <linux-cluster@nl.linux.org>; Mon, 2 Jul 2001 18:05:34 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f630oSg19466; Mon, 2 Jul 2001 17:50:28 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107030050.f630oSg19466@kahuna.cag.cpqcorp.net>
Subject: Re: Compaq launches Open SSI Cluster Projects
In-Reply-To: <3B410594.94EE59C9@unix.sh> from Alan Robertson at "Jul 2, 2001 05:36:52 pm"
To:	alanr@unix.sh (Alan Robertson)
Date:	Mon, 2 Jul 2001 17:50:28 -0700 (PDT)
Cc:	freemyer-ml@NorcrossGroup.com (Greg Freemyer),
	linux-cluster@nl.linux.org,
	linux-ha@muc.de (Linux HA - users mailing lins),
	ssic-linux-devel@opensource.compaq.com (SSI Clusters for Linux)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Alan,
  We are very interested in working with any and all
Linux cluster groups.  I haven't sent a response to
your earlier message about your framework  because
I haven't studied it enough to react intelligently.

The desire to produce SSI cluster may add
requirements to your framework.  As for APIs, a
set of proposed membership APIs is included in the
Cluster Infrastructure project (along with all the code
to build, boot and play with for clusters that may
scale up to 64 nodes (haven't tested that big yet,
though)).
I would very much like to start a discussion on
membership APIs (since in our experience, this is
the first place an application might become cluster
aware).  I'll send the URL where you can review them.

bruce.walker@compaq.com

> 
> Hi Greg,
> 
> We've certainly encouraged them to work together with us to make their
> software fit into the planned community clustering infrastructure project. 
> This would be of benefit to them, to the Linux community, and to potential
> users of clustering infrastructure.
> 
> We're still waiting to hear if they're interested.
> 
> 	-- Alan Robertson
> 	   alanr@unix.sh
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 03:33:48 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16459AbRGCBdb>; Tue, 3 Jul 2001 03:33:31 +0200
Received: from suntan.tandem.com ([192.216.221.8]:24247 "EHLO
	suntan.tandem.com") by humbolt.nl.linux.org with ESMTP
	id <S16137AbRGCBdY>; Tue, 3 Jul 2001 03:33:24 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id SAA27737
	for <linux-cluster@nl.linux.org>; Mon, 2 Jul 2001 18:33:21 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f6316A720505; Mon, 2 Jul 2001 18:06:10 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107030106.f6316A720505@kahuna.cag.cpqcorp.net>
Subject: Re: Compaq launches Open SSI Cluster Projects
In-Reply-To: <3B410920.C23765FA@oracle.com> from David Brower at "Jul 2, 2001 04:52:00 pm"
To:	David.Brower@oracle.com (David Brower)
Date:	Mon, 2 Jul 2001 18:06:10 -0700 (PDT)
Cc:	alanr@unix.sh (Alan Robertson),
	freemyer-ml@NorcrossGroup.com (Greg Freemyer),
	linux-cluster@nl.linux.org,
	linux-ha@muc.de (Linux HA - users mailing lins),
	ssic-linux-devel@opensource.compaq.com (SSI Clusters for Linux)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David,

> They -have- developed in a linux vacuum; this is the Tandem/NonStop/SCO
> Unix cluster stuff, pretty much as deployed on that platform.

As with any significant contribution to Linux, we started with
something (didn't GFS, DLM, failsafe and many others start that way?).
We are open sourcing the complete NonStop Cluster technology.
Our goal is allow the community to leverage that technology, along
with other open source technology (GFS, LVS, DLM, failsafe, etc.) to
build the best clustering product around.
> 
> Personally, I am more interested in the CI part of the project, which
> along with the IBM DLM would provide a reasonable GFS platform.  It seems
> it would get together faster than GFS-on-DLM-on-heartbeat would.

We broke apart the CI components specifically for this need.  Because
IBM only released a small subset of their clustering, the DLM did not
have a sufficiently rich membership service to layer on.  We felt we
had something they (and thus the community) could use.

> 
> I'm less into in the very-large scope SSI work, maybe because I don't 
> understand the implications of the failure domains.  It has appeared to me
> before that failure of a node is likely to have deeper ripples in the SSI 
> scheme than it does in clusters with less tightly coupled nodes.  And it
> is all very intrusive in ways one wonders if Linus would ever accept.  Its
> a noble attempt, but it's gonna be a lot harder to accept.
> 
> In a perfect world, many of the components would plug together; I don't know
> how the CI stuff maps into the heartbeat model.
> 
> -dB
> 
> Alan Robertson wrote:
> > 
> > Greg Freemyer wrote:
> > 
> > > you seem to have been developing this in a vacuum from the Linux communities
> > > perspective.
> > >
> > > The only Linux technology I see in the presentation is GFS.
> > >
> > > Are there any other pre-existing Linux HA/HP cluster technologies you are
> > > incorporating?
> > 
> > Hi Greg,
> > 
> > We've certainly encouraged them to work together with us to make their
> > software fit into the planned community clustering infrastructure project.
> > This would be of benefit to them, to the Linux community, and to potential
> > users of clustering infrastructure.
> > 
> > We're still waiting to hear if they're interested.
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 07:32:15 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16070AbRGCFb6>; Tue, 3 Jul 2001 07:31:58 +0200
Received: from web9208.mail.yahoo.com ([216.136.129.41]:18186 "HELO
	web9208.mail.yahoo.com") by humbolt.nl.linux.org with SMTP
	id <S16012AbRGCFbj>; Tue, 3 Jul 2001 07:31:39 +0200
Message-ID: <20010703053136.48055.qmail@web9208.mail.yahoo.com>
Received: from [32.102.126.34] by web9208.mail.yahoo.com; Mon, 02 Jul 2001 22:31:36 PDT
Date:	Mon, 2 Jul 2001 22:31:36 -0700 (PDT)
From:	Peter Badovinatz <tabmowzo@yahoo.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
To:	Lyle Bickley <lbickley@bickleywest.com>,
	David Brower <David.Brower@oracle.com>
Cc:	Alan Robertson <alanr@unix.sh>,
	Greg Freemyer <freemyer-ml@NorcrossGroup.com>,
	bruce@kahuna.cag.cpqcorp.net, linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
In-Reply-To: <3B4118DD.745F1E54@bickleywest.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


--- Lyle Bickley <lbickley@bickleywest.com> wrote:
<snip>
> 
> Again, I'll have to look at the code - but "typically" all clustering
> dealing with HA fault domains must have "heartbeat".  As I recollect
> from some of my earlier IP work, HP and Tandem hold the original patents
> on "heartbeat".
> 
Lyle,

Could you expand on this point about HP and Tandem holding original patents on
'heartbeat'?  Do you know the patent numbers, or are there more specifics you
could provide?  There are many implementations of "heartbeat", including one
very notable Open Source one ;) and I'd be quite curious to understand what
these patents may cover.

> ---- snip ---- snip ----
>  
> Cheers,
> Lyle
> 
> -- 
> Lyle Bickley             | Bickley Consulting West Inc.
> lbickley@acm.org         | 
> lbickley@bickleywest.com | V 650-428-0621
> http://bickleywest.com/	 | F 650-428-0599
> 		
> "Black holes exist where GOD is dividing by zero"
> 

Peter


=====
These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
wombat@us.ibm.com/tabmowzo@yahoo.com
and in no way should be construed as official opinion of 
IBM, Corp.

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail
http://personal.mail.yahoo.com/

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 08:34:48 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16139AbRGCGec>; Tue, 3 Jul 2001 08:34:32 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:508 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16034AbRGCGeK>; Tue, 3 Jul 2001 08:34:10 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id A57F826444; Tue,  3 Jul 2001 00:33:43 -0600 (MDT)
Message-ID: <3B416746.3C3ED397@unix.sh>
Date:	Tue, 03 Jul 2001 00:33:43 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Cc:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>,
	linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
References: <200107030050.f630oSg19466@kahuna.cag.cpqcorp.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Bruce Walker wrote:
> 
> Alan,
>   We are very interested in working with any and all
> Linux cluster groups.  I haven't sent a response to
> your earlier message about your framework  because
> I haven't studied it enough to react intelligently.

I suppose you could have taken the alanr-approach, and commented on it
without feeling constrained by a lack of knowledge ;-)

Please keep in mind that the document is being written as we speak, and very
much subject to change.  I've made significant changes as a result of
feedback from the linux-cluster list, and hope to continue to do so.  Don't
hesitate to ask questions about the things which will inevitably be unclear
(or even wrong!) in it.

There are not yet any details on any APIs in the document.  There is an
incomplete list of API sets.

Having helped another large computer company open source some HA software in
the past, I know how time-consuming and frustrating this initial phase of
the project can be.  I'm sure you've come to know and love your lawyers ;-)

Speaking of this -- has COMPAQ committed to a license for the complete set
of software you're going to provide?

Of course, the GPL/LGPL would be the most harmonious choice, since every
other OSS project uses them.

If you're interested in some background on how I have approached HA designs
in the past, or my personal leanings, you might find it helpful to read the
heartbeat design document.  You might also try it you have insomnia, and are
don't like pills ;-)  It's here:
	http://www.linuxshowcase.org/2000/2000papers/papers/robertson/

> The desire to produce SSI cluster may add
> requirements to your framework.

Of course.  Producing an "X" cluster (for any "X") generally requires more
capabilities, and more APIs.  My big concern about APIs in this area, is
that the base set of APIs not be encumbered by the desire to add a set of
APIs for an optional feature for certain types of clusters (like SSI).

The set of APIs is not fixed, nor is it ever intended to be completely
fixed.  The idea is that you should be able to assemble a cluster out of the
components you need -- and leave out those you don't need.  That will
necessarily leave out certain APIs.  The set of APIs needs to be well
thought-out, well-designed and harmonious - but I don't see that it has to
be bounded by any particular hard boundary.  The set of libraries on a Linux
system isn't intended to be bounded, nor is the set of plugins for the GIMP.

> As for APIs, a
> set of proposed membership APIs is included in the
> Cluster Infrastructure project (along with all the code
> to build, boot and play with for clusters that may
> scale up to 64 nodes (haven't tested that big yet,
> though)).

Where can I find those proposed APIs?

> I would very much like to start a discussion on
> membership APIs (since in our experience, this is
> the first place an application might become cluster
> aware).

My two favorite areas are membership and basic cluster messaging.

By the way, keep in mind that in the framework, APIs are those things that
are exposed to the user *or* other cluster components.  So, basic messaging
will be of interest to other cluster components soon.

> I'll send the URL where you can review them.

Great!  On this note, I just added some general, semi-philosophical thoughts
on APIs to the document.

Looking forward to the URL...

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 08:41:00 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16141AbRGCGkv>; Tue, 3 Jul 2001 08:40:51 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:8188 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16130AbRGCGkk>; Tue, 3 Jul 2001 08:40:40 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id DAE7D26444; Tue,  3 Jul 2001 00:40:18 -0600 (MDT)
Message-ID: <3B4168D2.7BF35C35@unix.sh>
Date:	Tue, 03 Jul 2001 00:40:18 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Greg Freemyer <freemyer-ml@NorcrossGroup.com>,
	linux-cluster@nl.linux.org,
	Linux HA - users mailing lins <linux-ha@muc.de>,
	SSI Clusters for Linux 
	<ssic-linux-devel@opensource.compaq.com>
Subject: Re: Compaq launches Open SSI Cluster Projects
References: <200107030050.f630oSg19466@kahuna.cag.cpqcorp.net> <3B416746.3C3ED397@unix.sh>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Alan Robertson wrote:
> 

> Speaking of this -- has COMPAQ committed to a license for the complete set
> of software you're going to provide?
> 
> Of course, the GPL/LGPL would be the most harmonious choice, since every
> other OSS project uses them.

OOPS!

I meant "every other OSS HA project".

	Sorry...

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul  3 19:26:29 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16467AbRGCR0P>; Tue, 3 Jul 2001 19:26:15 +0200
Received: from 66.linscomp.com ([63.141.210.66]:60953 "EHLO exchserv.linsang")
	by humbolt.nl.linux.org with ESMTP id <S16460AbRGCRZy>;
	Tue, 3 Jul 2001 19:25:54 +0200
Received: by EXCHSERV with Internet Mail Service (5.5.2653.19)
	id <N9N846RQ>; Tue, 3 Jul 2001 10:29:32 -0700
Message-ID: <E7FEE1294B1AD51192490002A50EF31F044521@EXCHSERV>
From:	Darryl Rodden <DRodden@Xyterra.com>
To:	linux-cluster <linux-cluster@nl.linux.org>
Subject: RE: Updated General Cluster framework draft document
Date:	Tue, 3 Jul 2001 10:29:21 -0700 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C103E5.B269B230"
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C103E5.B269B230
Content-Type: text/plain;
	charset="iso-8859-1"

Hi,

The HA Forum published an open architecture high availability paper:
ftp://download.intel.com/platforms/applied/eiacomm/papers/ha_solutions.pdf

It takes a more systems view and discusses all aspects: hardware, OS,
middleware, etc.

I noticed that IBM is not part of that forum, though. :-)

Darryl Rodden
Xyterra Computing



-----Original Message-----
From: Alan Robertson [mailto:alanr@unix.sh]
Sent: Sunday, June 17, 2001 6:02 AM
To: linux-cluster
Subject: Updated General Cluster framework draft document


Hi,

I have incorporated the latest comments from the last round of discussions
on the framework document into it.

As before, comments and discussions are encouraged.


	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

------_=_NextPart_001_01C103E5.B269B230
Content-Type: text/html;
	charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Diso-8859-1">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
5.5.2653.12">
<TITLE>RE: Updated General Cluster framework draft document</TITLE>
</HEAD>
<BODY>

<P><FONT SIZE=3D2>Hi,</FONT>
</P>

<P><FONT SIZE=3D2>The HA Forum published an open architecture high =
availability paper:</FONT>
<BR><FONT SIZE=3D2><A =
HREF=3D"ftp://download.intel.com/platforms/applied/eiacomm/papers/ha_sol=
utions.pdf" =
TARGET=3D"_blank">ftp://download.intel.com/platforms/applied/eiacomm/pap=
ers/ha_solutions.pdf</A></FONT>
</P>

<P><FONT SIZE=3D2>It takes a more systems view and discusses all =
aspects: hardware, OS, middleware, etc.</FONT>
</P>

<P><FONT SIZE=3D2>I noticed that IBM is not part of that forum, though. =
:-)</FONT>
</P>

<P><FONT SIZE=3D2>Darryl Rodden</FONT>
<BR><FONT SIZE=3D2>Xyterra Computing</FONT>
</P>
<BR>
<BR>

<P><FONT SIZE=3D2>-----Original Message-----</FONT>
<BR><FONT SIZE=3D2>From: Alan Robertson [<A =
HREF=3D"mailto:alanr@unix.sh">mailto:alanr@unix.sh</A>]</FONT>
<BR><FONT SIZE=3D2>Sent: Sunday, June 17, 2001 6:02 AM</FONT>
<BR><FONT SIZE=3D2>To: linux-cluster</FONT>
<BR><FONT SIZE=3D2>Subject: Updated General Cluster framework draft =
document</FONT>
</P>
<BR>

<P><FONT SIZE=3D2>Hi,</FONT>
</P>

<P><FONT SIZE=3D2>I have incorporated the latest comments from the last =
round of discussions</FONT>
<BR><FONT SIZE=3D2>on the framework document into it.</FONT>
</P>

<P><FONT SIZE=3D2>As before, comments and discussions are =
encouraged.</FONT>
</P>
<BR>

<P>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT SIZE=3D2>-- Alan =
Robertson</FONT>
<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <FONT =
SIZE=3D2>&nbsp;&nbsp; alanr@unix.sh</FONT>
</P>

<P><FONT SIZE=3D2>Linux-cluster: generic cluster infrastructure for =
Linux</FONT>
<BR><FONT SIZE=3D2>Archive:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <A =
HREF=3D"http://mail.nl.linux.org/linux-cluster/" =
TARGET=3D"_blank">http://mail.nl.linux.org/linux-cluster/</A></FONT>
</P>

</BODY>
</HTML>
------_=_NextPart_001_01C103E5.B269B230--

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul  4 08:27:38 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16231AbRGDG1T>; Wed, 4 Jul 2001 08:27:19 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:23824 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16197AbRGDG1H>;
	Wed, 4 Jul 2001 08:27:07 +0200
Received: (qmail 27433 invoked from network); 4 Jul 2001 06:23:23 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 4 Jul 2001 06:23:23 -0000
Message-ID: <3B42B595.6916AB2@kasey.umkc.edu>
Date:	Wed, 04 Jul 2001 01:20:05 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	user-mode-linux-user@lists.sourceforge.net,
	"linux-cluster@nl.linux.org" <linux-cluster@nl.linux.org>
Subject: UML and HA
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


I was thinking about HA and process migration, since I've got MOSIX
up and running again, sort of.  One of the problems that all the
process migration systems have is mapping local PID to distant PID.

So here's where user mode linux could come in:  Machines could
pair up in a "buddy system" (or heavier redundancy -- flavor to
taste) and each system would maintain a UML snapshot of what its
buddy is doing.  That way, if the cubans detonate server A, server
R can activate it's UML backup of server A and gracefully handle
server A's business.



-- 
                                           David Nicol 816.235.1187
                        And the cow threw up seven times, and said:
            "Say it now and say it loud, I'm a cow and I am proud."


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul  4 10:28:58 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16321AbRGDI2j>; Wed, 4 Jul 2001 10:28:39 +0200
Received: from [194.145.62.101] ([194.145.62.101]:41225 "EHLO mail1.siemens.pt")
	by humbolt.nl.linux.org with ESMTP id <S16311AbRGDI20>;
	Wed, 4 Jul 2001 10:28:26 +0200
Received: from siepor43.siemens.pt ([141.29.156.127]) by mail1.siemens.pt with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id NJSWAD62; Wed, 4 Jul 2001 09:29:00 +0100
Received: by siepor43.siemens.pt with Internet Mail Service (5.5.2653.19)
	id <NV59F18V>; Wed, 4 Jul 2001 09:27:53 +0100
Message-ID: <C075BE033D4FD411B1960800060D9D6E02D0E9A7@siepor43.siemens.pt>
From:	Jorge Silva <Jorge.Silva@lis2.siemens.pt>
To:	"David L. Nicol" <david@kasey.umkc.edu>,
	user-mode-linux-user@lists.sourceforge.net,
	linux-cluster@nl.linux.org
Subject: RE: [uml-user] UML and HA
Date:	Wed, 4 Jul 2001 09:27:48 +0100 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



> -----Original Message-----
> From: David L. Nicol [mailto:david@kasey.umkc.edu]
> Sent: Wednesday, July 04, 2001 7:20 AM
> To: user-mode-linux-user@lists.sourceforge.net;
> linux-cluster@nl.linux.org
> Subject: [uml-user] UML and HA
> 
> 
> 
> I was thinking about HA and process migration, since I've got MOSIX
> up and running again, sort of.  One of the problems that all the
> process migration systems have is mapping local PID to distant PID.
> 
> So here's where user mode linux could come in:  Machines could
> pair up in a "buddy system" (or heavier redundancy -- flavor to
> taste) and each system would maintain a UML snapshot of what its
> buddy is doing.  That way, if the cubans detonate server A, server

I think we should fear the americans more.

> R can activate it's UML backup of server A and gracefully handle
> server A's business.
> 
> 
> 
> -- 
>                                            David Nicol 816.235.1187
>                         And the cow threw up seven times, and said:
>             "Say it now and say it loud, I'm a cow and I am proud."
> 
> 
> _______________________________________________
> User-mode-linux-user mailing list
> User-mode-linux-user@lists.sourceforge.net
> http://lists.sourceforge.net/lists/listinfo/user-mode-linux-user
> 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul  5 10:55:42 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16248AbRGEIzg>; Thu, 5 Jul 2001 10:55:36 +0200
Received: from gate.in-addr.de ([212.8.193.158]:54033 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16098AbRGEIzV>;
	Thu, 5 Jul 2001 10:55:21 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP
	id EA4FD37B5B; Thu,  5 Jul 2001 10:54:49 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id 3A5481A909; Wed,  4 Jul 2001 18:56:10 +0200 (CEST)
Date:	Wed, 4 Jul 2001 18:56:10 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	Alan Robertson <alanr@unix.sh>
Cc:	linux-cluster@nl.linux.org
Subject: Re: A proposal for a General Clustering Framework
Message-ID: <20010704185610.A784@marowsky-bree.de>
References: <20010606015801.18290.qmail@web9204.mail.yahoo.com> <3B1DAE2B.43771C47@unix.sh> <02bd01c0ee4c$276c2d60$94627dc7@filesrus> <3B1DCDDA.8803B4E2@unix.sh>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.16i
In-Reply-To: <3B1DCDDA.8803B4E2@unix.sh>; from "Alan Robertson" on 2001-06-06T00:29:46
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Hi guys,

I am jumping in to a rather old discussion, but I didn't find time before.
Customers. Eek. (Should any participant reading this be a customer, you and
everyone else you know is of course excluded from that statement ;-)

On 2001-06-06T00:29:46,
   Alan Robertson <alanr@unix.sh> said:

> I'm not sure if there's a big technical difference between being able to
> support any two releases that are 15 versions apart from saying that you're
> going to support 15 versions at once.  Administratively, there is a *huge*
> difference.  "This way lies madness".

Let me offer a different perspective here.

Do the nodes running two different versions of the protocol really have to be
able to talk to each other? The discussion appears to assume that the answer
to this question is "Yes".

Now, I am going to explain to you why I think the answer should in fact be
"No".

If you are doing an upgrade which changes APIs (ie, protocol version change,
new attributes), you are asking for the software to bridge a potentially huge
communication gap.

Sure, a new attribute might be filled in automatically by defaulting it to a
sensible value - this is "easy enough" if it is static and independant, but
you have to embed complex logic if it is in fact related to other attributes.
It becomes a nightmare if you aren't adding or deleting an attribute, but
_changing_ the meaning of an attribute, potentially because the old code had a
bug which treated it incorrectly. This different understanding of the
parameters can in fact lead to a quite-non-HA cluster.

So in fact, it is desireable that the answer is "No" to reduce complexity.

So, _can_ the answer be "No"? Yes.

What upgrading a node to a new software release effectively is manually
partitioning the cluster - into the nodes which already have the new software
and those which do not.

If a new node is updated, it will join the "new" cluster. If you figure that
the new software release doesn't work, you downgrade it and it will go back to
the old cluster.

As part of this, you do not have to have the two versions talk to eachother,
because they are completely independant.

The only requirement you have to satisfy here is that two versions of the
protocol on the same wire (logically speaking) truely do not interfere and
that the software ignores any version of the protocols but its own.

This does have a slight penalty obviously: During the upgrade period, your
redundancy is reduced. However, I think this is acceptable, as the upgrade is
a controlled operation and you have experts on site to fix everything which
might go wrong.

It may be desireable to support the following features in the resource
management to make this more seamless for the clients:

- Be able to instruct the resource manager that a node is about to drop out,
  but that the services which were run on this node should NOT be restarted on
  another node, ie "detaching" a node and all services it is running.

- When upgrading the software on the local node, be able to tell the local
  resource manager that even though it is going down, it should NOT take the
  resources down.

- Being able to "reattach" to resources.

Comments?

-- 
"I'm extraordinarily patient provided I get my own way in the end."
        -- Margeret Thatcher

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul  5 11:25:04 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16260AbRGEJYz>; Thu, 5 Jul 2001 11:24:55 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:31471 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16264AbRGEJYk>; Thu, 5 Jul 2001 11:24:40 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id 9608C26486; Thu,  5 Jul 2001 03:24:08 -0600 (MDT)
Message-ID: <3B443238.1ABE4F79@unix.sh>
Date:	Thu, 05 Jul 2001 03:24:08 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Darryl Rodden <DRodden@Xyterra.com>
Cc:	linux-cluster <linux-cluster@nl.linux.org>
Subject: Re: Updated General Cluster framework draft document
References: <E7FEE1294B1AD51192490002A50EF31F044521@EXCHSERV>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

> Darryl Rodden wrote:
> 
> Hi,
> 
> The HA Forum published an open architecture high availability paper:
> ftp://download.intel.com/platforms/applied/eiacomm/papers/ha_solutions.pdf
> 
> It takes a more systems view and discusses all aspects: hardware, OS,
> middleware, etc.
> 
> I noticed that IBM is not part of that forum, though. :-)

This comes up from time to time.  Usually someone imagines some vast
conspiracy or rivalry.  That's not the case.  For the most part we take
complementary approaches to HA.  They largely concentrate on hardware and
system issues for telephony systems.   We largely concentrate on clustering
issues for enterprise systems.

We'd been doing HA work on Linux for several years on before they formed
their group.  Contributions to their work are limited to paying members,
whereas open community developments (like linux-ha and the framework) are
open to all.  They certainly haven't sent anyone on the linux-ha team an
invitation to join them ;-)  On the other hand, at least a few of their
members have received invitations to participate here.  This isn't to say
that their work is uninteresting, or there is anything wrong with their
approach.

As an aside: The cluster framework is not an IBM effort, and it would be a
mistake to view it that way.  Unlike the intel group, it is open to
*everyone*, and not intended to be dominated by any one company or person.  
When I worked for SuSE, the largest contributors to linux-HA besides myself
weren't SuSE employees, but Conectiva and VA Linux employees - who were
competitors to SuSE on several fronts.

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul  5 17:19:36 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16095AbRGEPTb>; Thu, 5 Jul 2001 17:19:31 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:6129 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16092AbRGEPTR>; Thu, 5 Jul 2001 17:19:17 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id 796FD26B28; Thu,  5 Jul 2001 09:18:47 -0600 (MDT)
Message-ID: <3B448557.453924DA@unix.sh>
Date:	Thu, 05 Jul 2001 09:18:47 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Lars Marowsky-Bree <lmb@suse.de>
Cc:	linux-cluster@nl.linux.org
Subject: Re: A proposal for a General Clustering Framework
References: <20010606015801.18290.qmail@web9204.mail.yahoo.com> <3B1DAE2B.43771C47@unix.sh> <02bd01c0ee4c$276c2d60$94627dc7@filesrus> <3B1DCDDA.8803B4E2@unix.sh> <20010704185610.A784@marowsky-bree.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Lars Marowsky-Bree wrote:
> 
> Hi guys,
> 

> Let me offer a different perspective here.
> 
> Do the nodes running two different versions of the protocol really have to be
> able to talk to each other? The discussion appears to assume that the answer
> to this question is "Yes".

The question was more of this form:

Should the framework provide the capability which would enable two different
versions of software potentially running different versions of protocols to
communicate sensibly with each other without a lot of work?

It was not a blanket statement regarding whether any two different versions
of any particular protocol be able to talk together at any particular point
in time.

> Now, I am going to explain to you why I think the answer should in fact be
> "No".
> 
> If you are doing an upgrade which changes APIs (ie, protocol version change,
> new attributes), you are asking for the software to bridge a potentially huge
> communication gap.
> 
> Sure, a new attribute might be filled in automatically by defaulting it to a
> sensible value - this is "easy enough" if it is static and independant, but
> you have to embed complex logic if it is in fact related to other attributes.
> It becomes a nightmare if you aren't adding or deleting an attribute, but
> _changing_ the meaning of an attribute, potentially because the old code had a
> bug which treated it incorrectly. This different understanding of the
> parameters can in fact lead to a quite-non-HA cluster.

If you're designing an HA protocol, and you want for this to work in your
favor it is encumbent upon you to make sensible protocol changes which are
easy to do this with.  If you make stupid decisions, you can make every
single version change hard.  Sometimes the result is just hard, and that's
how it is, and you have to shut the cluster down.

For example, in heartbeat, I changed message format entirely early on.  The
new nodes could never communicate with the old ones.  Since that time, it
hasn't ever happened.  It will likely (but not certainly) happen again when
we change to the XML-RPC view of the world.

The intent of this proposal is to make the easy things easy...  The hard
things are still hard.  Fortunately, with some thought, the easy cases
vastly outnumber the hard ones.

> So in fact, it is desireable that the answer is "No" to reduce complexity.
> 
> So, _can_ the answer be "No"? Yes.
> 
> What upgrading a node to a new software release effectively is manually
> partitioning the cluster - into the nodes which already have the new software
> and those which do not.

Why would an administrator (customer) *want* to do this?  That's not what I
want when I install *my* new software ;-).  If it's a 2-node system (the
most common kind), you no longer have quorum and the cluster stops.  Worse
yet, you could wind up with two clusters, *each* of which has quorum...

For example, Paddy understands FailSafe's lack of ability to do this as
being a problem.  IBM's HA-CMP has this feature.  Heartbeat has this
feature.

Wombat tells me that marketing comes in from time to time and has some
particular sale they're going to lose if they can't upgrade seamlessly from
version X to version Y.  Then someone makes the decision and developers
scramble so that the customer gets what they want.  This seems to be
reality.  This proposal makes reality easier to stomach.

If you have a 101-node cluster, you have to downgrade it to a cluster of
significantly less capacity  (>= 51 nodes).  This seems like a lot of loss
of capacity...  If the new cluster doesn't work, you won't find it until the
new cluster achieves quorum, and then if you have problems, you have 51
nodes to back out of...  This is equivalent to a 50/51-node flash cut at the
moment the "old" cluster loses quorum.  You're very vulnerable at that
moment...

If you *want* to take this approach, you can define the cluster to talk over
a different multicast group, or a different port or whatever and leave
resources offline until you're ready to switch over.

My experience in database upgrades indicates that about 9/10 of the upgrades
are easy ones.  You can always use this approach for dealing with
incompatible versions.

It is probably the case that group services and/or the RPC system should
deal with this - and allow the application some flexibility in solving it.

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul  5 18:48:51 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16116AbRGEQsd>; Thu, 5 Jul 2001 18:48:33 +0200
Received: from mail317.mail.bellsouth.net ([205.152.58.177]:54799 "EHLO
	imf17bis.bellsouth.net") by humbolt.nl.linux.org with ESMTP
	id <S16092AbRGEQsT> convert rfc822-to-8bit; Thu, 5 Jul 2001 18:48:19 +0200
Received: from taz ([208.61.65.237]) by imf17bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010705164909.JJFP20225.imf17bis.bellsouth.net@taz>;
          Thu, 5 Jul 2001 12:49:09 -0400
Date:	Thu, 5 Jul 2001 12:48:16 -0400
From:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>
Subject: re: UML and HA
To:	Greg Freemyer <freemyer@NorcrossGroup.com>,
	linux-cluster <linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010705164909.JJFP20225.imf17bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


I don't know if it helps with process migration, but you might want to consider the below:

In Compaq's TruClusters they require each process to have a unique pid across the cluster.

They accomplish this by using a 64-bit pid.  Then they have each node assigned a unique node #.

Effectively the pid becomes a simple combination of the node # and the local pid.

in 'c' code:    pid = (node << 32) | local_pid;

          (There are of course faster ways to code this.)

The above may seem extreme, but Compaq seems to have made a major paradigm shift.

They now seem to be designing a high quality/high functionality cluster OS, and let the single node cluster be a special case.

I for one agree with this conceptualization and hope Linux does the same, but I think it will take a lot of evangelizing in the Linux kernel area to see this paradigm take hold.  (I could be wrong here.  I follow Linux clustering mail lists, but not any of the others.)

Greg
=========

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com

>>  I was thinking about HA and process migration, since I've got MOSIX
>>  up and running again, sort of.  One of the problems that all the
>>  process migration systems have is mapping local PID to distant PID.

>>  So here's where user mode linux could come in:  Machines could
>>  pair up in a "buddy system" (or heavier redundancy -- flavor to
>>  taste) and each system would maintain a UML snapshot of what its
>>  buddy is doing.  That way, if the cubans detonate server A, server
>>  R can activate it's UML backup of server A and gracefully handle
>>  server A's business.



>>  -- 
>>  David Nicol 816.235.1187
>>  And the cow threw up seven times, and said:
>>  "Say it now and say it loud, I'm a cow and I am proud."


>>  Linux-cluster: generic cluster infrastructure for Linux
>>  Archive:       http://mail.nl.linux.org/linux-cluster/









Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul  5 19:05:52 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16093AbRGERFq>; Thu, 5 Jul 2001 19:05:46 +0200
Received: from cpe-24-221-212-80.co.sprintbbd.net ([24.221.212.80]:48625 "EHLO
	host.domain.name") by humbolt.nl.linux.org with ESMTP
	id <S16092AbRGERF0>; Thu, 5 Jul 2001 19:05:26 +0200
Received: from unix.sh (localhost [127.0.0.1])
	by host.domain.name (Postfix) with ESMTP
	id 1E3A626EC5; Thu,  5 Jul 2001 11:04:48 -0600 (MDT)
Message-ID: <3B449E2F.8C5F5EEE@unix.sh>
Date:	Thu, 05 Jul 2001 11:04:47 -0600
From:	Alan Robertson <alanr@unix.sh>
Organization: IBM Linux Technology Center
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>
Cc:	Greg Freemyer <freemyer@NorcrossGroup.com>,
	linux-cluster <linux-cluster@nl.linux.org>
Subject: Re: UML and HA
References: <20010705164909.JJFP20225.imf17bis.bellsouth.net@taz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Freemyer wrote:
> 
> I don't know if it helps with process migration, but you might want to consider the below:
> 
> In Compaq's TruClusters they require each process to have a unique pid across the cluster.
> 
> They accomplish this by using a 64-bit pid.  Then they have each node assigned a unique node #.

MOSIX does something similar.

	-- Alan Robertson
	   alanr@unix.sh

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul  6 08:52:33 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16034AbRGFGwZ>; Fri, 6 Jul 2001 08:52:25 +0200
Received: from cs.huji.ac.il ([132.65.16.10]:14557 "EHLO cs.huji.ac.il")
	by humbolt.nl.linux.org with ESMTP id <S16139AbRGFGwN>;
	Fri, 6 Jul 2001 08:52:13 +0200
Received: from mos218.cs.huji.ac.il ([132.65.173.218] ident=mail)
	by cs.huji.ac.il with esmtp (Exim 3.30 #1)
	id 15IPTI-0001lf-00
	for linux-cluster@nl.linux.org; Fri, 06 Jul 2001 09:52:12 +0300
Received: from amnons by mos218.cs.huji.ac.il with local (Exim 3.16 #1)
	id 15IPTI-0001nk-00
	for linux-cluster@nl.linux.org; Fri, 06 Jul 2001 09:52:12 +0300
Subject: Re: UML and HA
To:	linux-cluster@nl.linux.org
Date:	Fri, 6 Jul 2001 09:52:12 +0300 (IDT)
X-Mailer: ELM [version 2.5 PL3]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <E15IPTI-0001nk-00@mos218.cs.huji.ac.il>
From:	Amnon Shiloh <amnons@cs.huji.ac.il>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Alan Robertson <alanr@unix.sh> Wrote:

> > I don't know if it helps with process migration, but you might want to consider the below:
> > 
> > In Compaq's TruClusters they require each process to have a unique pid across the cluster.
> > 
> > They accomplish this by using a 64-bit pid.  Then they have each node assigned a unique node #.
> 
> MOSIX does something similar.

MOSIX indeed has a unique node #, but still uses only 16-bit PIDs
in order to avoid any changes to the user-level interface.
As a result, PIDs are NOT unique across the cluster.

Amnon Shiloh -- the HUJI MOSIX group.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul  6 09:33:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16141AbRGFHdN>; Fri, 6 Jul 2001 09:33:13 +0200
Received: from [213.98.27.110] ([213.98.27.110]:32264 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16147AbRGFHdE>;
	Fri, 6 Jul 2001 09:33:04 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f669eLb06259
	for <linux-cluster@nl.linux.org>; Fri, 6 Jul 2001 09:40:21 GMT
Date:	Fri, 6 Jul 2001 09:40:21 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
cc:	<linux-cluster@nl.linux.org>
Subject: Re: UML and HA
In-Reply-To: <E15IPTI-0001nk-00@mos218.cs.huji.ac.il>
Message-ID: <Pine.LNX.4.30.0107060938540.5694-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
To:	unlisted-recipients:; (no To-header on input)
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


 Hello, Alan!

> > > They accomplish this by using a 64-bit pid.  Then they have each node assigned a unique node #.
> >
> > MOSIX does something similar.

 No, Mosix doesn't. PVM does something similar.

 Yours:

David

---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sat Jul  7 20:53:54 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S17361AbRGGSxt>; Sat, 7 Jul 2001 20:53:49 +0200
Received: from suntan.tandem.com ([192.216.221.8]:2189 "EHLO suntan.tandem.com")
	by humbolt.nl.linux.org with ESMTP id <S17359AbRGGSxg>;
	Sat, 7 Jul 2001 20:53:36 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id LAA03092
	for <linux-cluster@nl.linux.org>; Sat, 7 Jul 2001 11:53:33 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f67IfNl21269; Sat, 7 Jul 2001 11:41:23 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107071841.f67IfNl21269@kahuna.cag.cpqcorp.net>
Subject: Re: UML and HA and clusterwide pids
In-Reply-To: <E15IPTI-0001nk-00@mos218.cs.huji.ac.il> from Amnon Shiloh at "Jul 6, 2001 09:52:12 am"
To:	amnons@cs.huji.ac.il (Amnon Shiloh)
Date:	Sat, 7 Jul 2001 11:41:23 -0700 (PDT)
Cc:	linux-cluster@nl.linux.org
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Annon wrote:
> 
> MOSIX indeed has a unique node #, but still uses only 16-bit PIDs
> in order to avoid any changes to the user-level interface.
> As a result, PIDs are NOT unique across the cluster.
> 
> Amnon Shiloh -- the HUJI MOSIX group.

The SSI project that was recently launched 
(you can find it from www.opensource.compaq.com) has
unique node numbers, 32-bit pids on 32 bit hardware and
clusterwide unique pids (the technology base the
SSI project is seeded with has had clusterwide unique
pids and process migration for over 15 years and a 
single clusterwide root filesystem for over 20 years
(Locus technology, TCF, TNC, and NonStop Clusters for Unixware).

A first developer release, including:
  - cluster membership and internode communication from
	the Cluster Infrastructure project
  - capability for clusters up to 64 nodes
  - clusterwide root filesystem
  - clusterwide process ids and access to all
	processes from all nodes at all times
  - full SSI remote exec, inheriting pid, open files,
	open sockets, open pipes, open devices, etc.
  - process migration, if we finish the port in time
  - clusterwide device naming and access
  - clusterwide message queue naming and access
  - clusterwide fifo naming and access
  - single init for the cluster, with support to have
	the cluster at different run levels
  - no single points of failure for the cluster
  - application monitoring and failover
should be out this month.

Please come and join the project (at least via the email 
list for now).

bruce.walker@compaq.com
Open SSI Cluster architect
Compaq.



> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul  8 23:59:25 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S17744AbRGHV7K>; Sun, 8 Jul 2001 23:59:10 +0200
Received: from web9205.mail.yahoo.com ([216.136.129.38]:54792 "HELO
	web9205.mail.yahoo.com") by humbolt.nl.linux.org with SMTP
	id <S17742AbRGHV6y>; Sun, 8 Jul 2001 23:58:54 +0200
Message-ID: <20010708215851.85019.qmail@web9205.mail.yahoo.com>
Received: from [32.102.126.40] by web9205.mail.yahoo.com via HTTP; Sun, 08 Jul 2001 14:58:51 PDT
Date:	Sun, 8 Jul 2001 14:58:51 -0700 (PDT)
From:	Peter Badovinatz <tabmowzo@yahoo.com>
Subject: Re: A proposal for a General Clustering Framework
To:	Lars Marowsky-Bree <lmb@suse.de>, Alan Robertson <alanr@unix.sh>
Cc:	linux-cluster@nl.linux.org
In-Reply-To: <20010704185610.A784@marowsky-bree.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Sorry to be even later jumping in, been on vacation for a week...

--- Lars Marowsky-Bree <lmb@suse.de> wrote:
> Hi guys,
> 
> I am jumping in to a rather old discussion, but I didn't find time before.
> Customers. Eek. (Should any participant reading this be a customer, you and
> everyone else you know is of course excluded from that statement ;-)
> 
<snip>
> 
> So in fact, it is desireable that the answer is "No" to reduce complexity.
> 
> So, _can_ the answer be "No"? Yes.

>From experience on the proprietary HA cluster front, the answer can't be no. 
The customers want us to be able to upgrade the software, and reintegrate the
node into the cluster.  They also want failover to work during the upgrade
process.  A set of disks connected to two nodes has no redundancy in the period
when one of the nodes is being upgraded, they want that node active as quickly
as possible.
> 
> What upgrading a node to a new software release effectively is manually
> partitioning the cluster - into the nodes which already have the new software
> and those which do not.

No, it has to be one cluster.
> 
> If a new node is updated, it will join the "new" cluster. If you figure that
> the new software release doesn't work, you downgrade it and it will go back
> to
> the old cluster.

Many serious HA  customers have 'test' clusters, which are set up the same as
their production clusters, and they test the upgrade here first.  But, yes,
they still want the ability to downgrade the node and back off an update.
> 
> As part of this, you do not have to have the two versions talk to eachother,
> because they are completely independant.

They can't be independent, as they are managing a common set of resources.
> 
> The only requirement you have to satisfy here is that two versions of the
> protocol on the same wire (logically speaking) truely do not interfere and
> that the software ignores any version of the protocols but its own.
> 
> This does have a slight penalty obviously: During the upgrade period, your
> redundancy is reduced. However, I think this is acceptable, as the upgrade is
> a controlled operation and you have experts on site to fix everything which
> might go wrong.

Fix what?  If there is no backup node integrated with the active node, where is
the recovery?  You're correct that there are experts on site, although how
expert they are varies by customer.
> 
> It may be desireable to support the following features in the resource
> management to make this more seamless for the clients:
> 
> - Be able to instruct the resource manager that a node is about to drop out,
>   but that the services which were run on this node should NOT be restarted
> on
>   another node, ie "detaching" a node and all services it is running.
> 
> - When upgrading the software on the local node, be able to tell the local
>   resource manager that even though it is going down, it should NOT take the
>   resources down.
> 
> - Being able to "reattach" to resources.

HACMP on AIX calls this support 'forced down'.  You can tell HACMP to leave the
resources running where they are, shut itself down, the other nodes do not take
the resources over, you upgrade HACMP code, restart it, and it reattaches the
resources.  The exposure here is that IF a resource fails during this upgrade
period, it will NOT be recovered, and you have to take manual action.  This
feature is popular among HACMP customers, at least some notable minority of
them.
> 
> Comments?

We have found that customers with HA clusters have some common features:
- they never want to upgrade, once it's working.
- if they must upgrade, it must be one node at a time, and it must be able to
be reversed if anything appears to be going wrong.
- upgraded nodes reintegrate into the cluster seamlessly.
- windows exposing the cluster to SPOFs must be as small as possible.
- they should never need to shut down a resource to upgrade the HA code.
> 
> -- 
> "I'm extraordinarily patient provided I get my own way in the end."
>         -- Margeret Thatcher

Peter

=====
These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
wombat@us.ibm.com/tabmowzo@yahoo.com
and in no way should be construed as official opinion of 
IBM, Corp.

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail
http://personal.mail.yahoo.com/

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 10 15:53:33 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16381AbRGJNxY>; Tue, 10 Jul 2001 15:53:24 +0200
Received: from gate.in-addr.de ([212.8.193.158]:56081 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16355AbRGJNxL>;
	Tue, 10 Jul 2001 15:53:11 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP
	id 107A93F83A; Tue, 10 Jul 2001 15:52:50 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id A2E301A98F; Tue, 10 Jul 2001 15:53:10 +0200 (CEST)
Date:	Tue, 10 Jul 2001 15:53:10 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	linuxfailsafe@lists.community.tummy.com, linux-ha@muc.de,
	linux-cluster@nl.linux.org
Subject: Open Clustering BOF at Ottawa Linux Symposium
Message-ID: <20010710155310.A2114@marowsky-bree.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Hi,

the description of the HA BOF is finally online - please see it at
http://www.linuxsymposium.org/bofs.php .

If you are interested, please attend.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 10 23:02:31 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16424AbRGJVC0>; Tue, 10 Jul 2001 23:02:26 +0200
Received: from pentafluge.infradead.org ([195.224.55.251]:47112 "EHLO
	pentafluge.infradead.org") by humbolt.nl.linux.org with ESMTP
	id <S16420AbRGJVCM> convert rfc822-to-8bit; Tue, 10 Jul 2001 23:02:12 +0200
Received: from mail315.mail.bellsouth.net ([205.152.58.175] helo=imf15bis.bellsouth.net)
	by pentafluge.infradead.org with esmtp (Exim 3.22 #1 (Red Hat Linux))
	id 15K4YY-0006jg-00
	for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 21:56:30 +0100
Received: from taz ([208.61.65.237]) by imf15bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010710205903.RSCI797.imf15bis.bellsouth.net@taz>
          for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 16:59:03 -0400
Date:	Tue, 10 Jul 2001 16:57:38 -0400
From:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>
Subject: Clusterwide pids
To:	<linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010710205903.RSCI797.imf15bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


SInce this mailing list is dedicated to sharing cluster component technologies amongst the various Open Source cluster projects, Bruce Walker's comments got me thinking:



Definition:
CPID - Clusterwide Process ID.  Guaranteed to be unique for each process across the entire cluster.

(please advise if there is a pre-existing Acronym for this.)

Assumptions:
CPIDs are a useful concept and one that is used by several existing cluster solutions.

Sophisticated features, such as transparent clusterwide IPC, seem to require CPIDs

CPIDs may make process migration easier to implement. (I have no knowledge about this.)

For various reasons, implementing them seems difficult in a 16-bit pid world.

Changing pids to a 32-bit int will cause changes to the kernel and to many applications.  At a minimum many user level programs may need to be recompiled if this change were made.

Questions:
Given that CPIDs seem to be difficult to fully implement and involve the kernel, but if accomplished can be widely used throughout the OpenSource cluster projects, should a common CPID project be initiated?

Should CPIDs be added to Alan Robertson's infrastructure document as a future goal?  (Many HA clusters do not need CPIDs, so I would guess that it would not be appropriate, but I thought I would ask anyway.)

Bruce Walker relates that the SSI for Linux Clusters project has this working, and apparently PVM does as well.  Could either of these implementations, or some other existing implementation, be isolated and made available to the Linux Cluster community as a component?

Can the CPID module be written in such a way as to allow multiple cluster membership algorithms?

If so, what are the chances of this component being made a part of the standard Linux kernel?  I realize many people are against significant changes to the kernel, but I do not see a way for CPIDs to be implemented outside of the kernel, and one common solution seems far better than a separate solution for each Cluster Solution which needs CPIDs.


Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


>>  Annon wrote:
>>  > 
>>  > MOSIX indeed has a unique node #, but still uses only 16-bit PIDs
>>  > in order to avoid any changes to the user-level interface.
>>  > As a result, PIDs are NOT unique across the cluster.
>>  > 
>>  > Amnon Shiloh -- the HUJI MOSIX group.

>>  The SSI project that was recently launched 
>>  (you can find it from www.opensource.compaq.com) has
>>  unique node numbers, 32-bit pids on 32 bit hardware and
>>  clusterwide unique pids (the technology base the
>>  SSI project is seeded with has had clusterwide unique
>>  pids and process migration for over 15 years and a 
>>  single clusterwide root filesystem for over 20 years
>>  (Locus technology, TCF, TNC, and NonStop Clusters for Unixware).

>>  A first developer release, including:
>>  - cluster membership and internode communication from
>>      the Cluster Infrastructure project
>>  - capability for clusters up to 64 nodes
>>  - clusterwide root filesystem
>>  - clusterwide process ids and access to all
>>      processes from all nodes at all times
>>  - full SSI remote exec, inheriting pid, open files,
>>      open sockets, open pipes, open devices, etc.
>>  - process migration, if we finish the port in time
>>  - clusterwide device naming and access
>>  - clusterwide message queue naming and access
>>  - clusterwide fifo naming and access
>>  - single init for the cluster, with support to have
>>      the cluster at different run levels
>>  - no single points of failure for the cluster
>>  - application monitoring and failover
>>  should be out this month.

>>  Please come and join the project (at least via the email 
>>  list for now).

>>  bruce.walker@compaq.com
>>  Open SSI Cluster architect
>>  Compaq.



>>  > 
>>  > Linux-cluster: generic cluster infrastructure for Linux
>>  > Archive:       http://mail.nl.linux.org/linux-cluster/


>>  Linux-cluster: generic cluster infrastructure for Linux
>>  Archive:       http://mail.nl.linux.org/linux-cluster/








Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 10 23:19:55 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16441AbRGJVTk>; Tue, 10 Jul 2001 23:19:40 +0200
Received: from pentafluge.infradead.org ([195.224.55.251]:53512 "EHLO
	pentafluge.infradead.org") by humbolt.nl.linux.org with ESMTP
	id <S16422AbRGJVTd> convert rfc822-to-8bit; Tue, 10 Jul 2001 23:19:33 +0200
Received: from mail317.mail.bellsouth.net ([205.152.58.177] helo=imf17bis.bellsouth.net)
	by pentafluge.infradead.org with esmtp (Exim 3.22 #1 (Red Hat Linux))
	id 15K4pL-0006lR-00
	for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 22:13:51 +0100
Received: from taz ([208.61.65.237]) by imf06bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010710163106.LSM13530.imf06bis.bellsouth.net@taz>;
          Tue, 10 Jul 2001 12:31:06 -0400
Date:	Tue, 10 Jul 2001 12:29:42 -0400
From:	Greg Freemyer <freemyer@NorcrossGroup.com>
Subject: Clusterwide pids
To:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
cc:	<linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010710163106.LSM13530.imf06bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


SInce this mailing list is dedicated to sharing cluster component technologies amongst the various Open Source cluster projects, Bruce Walker's comments got me thinking:



Definition:
CPID - Clusterwide Process ID.  Guaranteed to be unique for each process across the entire cluster.

(please advise if there is a pre-existing Acronym for this.)

Assumptions:
CPIDs are a useful concept and one that is used by several existing cluster solutions.

Sophisticated features, such as transparent clusterwide IPC, seem to require CPIDs

CPIDs may make process migration easier to implement. (I have no knowledge about this.)

For various reasons, implementing them seems difficult in a 16-bit pid world.

Changing pids to a 32-bit int will cause changes to the kernel and to many applications.  At a minimum many user level programs may need to be recompiled if this change were made.

Questions:
Given that CPIDs seem to be difficult to fully implement and involve the kernel, but if accomplished can be widely used throughout the OpenSource cluster projects, should a common CPID project be initiated?

Should CPIDs be added to Alan Robertson's infrastructure document as a future goal?  (Many HA clusters do not need CPIDs, so I would guess that it would not be appropriate, but I thought I would ask anyway.)

Bruce Walker relates that the SSI for Linux Clusters project has this working, and apparently PVM does as well.  Could either of these implementations, or some other existing implementation, be isolated and made available to the Linux Cluster community as a component?

Can the CPID module be written in such a way as to allow multiple cluster membership algorithms?

If so, what are the chances of this component being made a part of the standard Linux kernel?  I realize many people are against significant changes to the kernel, but I do not see a way for CPIDs to be implemented outside of the kernel, and one common solution seems far better than a separate solution for each Cluster Solution which needs CPIDs.


Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


 >>  Annon wrote:
 >>  > 
 >>  > MOSIX indeed has a unique node #, but still uses only 16-bit PIDs
 >>  > in order to avoid any changes to the user-level interface.
 >>  > As a result, PIDs are NOT unique across the cluster.
 >>  > 
 >>  > Amnon Shiloh -- the HUJI MOSIX group.

 >>  The SSI project that was recently launched 
 >>  (you can find it from www.opensource.compaq.com) has
 >>  unique node numbers, 32-bit pids on 32 bit hardware and
 >>  clusterwide unique pids (the technology base the
 >>  SSI project is seeded with has had clusterwide unique
 >>  pids and process migration for over 15 years and a 
 >>  single clusterwide root filesystem for over 20 years
 >>  (Locus technology, TCF, TNC, and NonStop Clusters for Unixware).

 >>  A first developer release, including:
 >>  - cluster membership and internode communication from
 >>      the Cluster Infrastructure project
 >>  - capability for clusters up to 64 nodes
 >>  - clusterwide root filesystem
 >>  - clusterwide process ids and access to all
 >>      processes from all nodes at all times
 >>  - full SSI remote exec, inheriting pid, open files,
 >>      open sockets, open pipes, open devices, etc.
 >>  - process migration, if we finish the port in time
 >>  - clusterwide device naming and access
 >>  - clusterwide message queue naming and access
 >>  - clusterwide fifo naming and access
 >>  - single init for the cluster, with support to have
 >>      the cluster at different run levels
 >>  - no single points of failure for the cluster
 >>  - application monitoring and failover
 >>  should be out this month.

 >>  Please come and join the project (at least via the email 
 >>  list for now).

 >>  bruce.walker@compaq.com
 >>  Open SSI Cluster architect
 >>  Compaq.



 >>  > 
 >>  > Linux-cluster: generic cluster infrastructure for Linux
 >>  > Archive:       http://mail.nl.linux.org/linux-cluster/


 >>  Linux-cluster: generic cluster infrastructure for Linux
 >>  Archive:       http://mail.nl.linux.org/linux-cluster/







Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 00:07:01 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16451AbRGJWGq>; Wed, 11 Jul 2001 00:06:46 +0200
Received: from [213.98.27.110] ([213.98.27.110]:4882 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16455AbRGJWGc>;
	Wed, 11 Jul 2001 00:06:32 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6B0Dl312514;
	Wed, 11 Jul 2001 00:13:47 GMT
Date:	Wed, 11 Jul 2001 00:13:47 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Greg Freemyer <freemyer@NorcrossGroup.com>
cc:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	<linux-cluster@nl.linux.org>
Subject: Re: Clusterwide pids
In-Reply-To: <20010710163106.LSM13530.imf06bis.bellsouth.net@taz>
Message-ID: <Pine.LNX.4.30.0107102342330.12297-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, all!

On Tue, 10 Jul 2001, Greg Freemyer wrote:

> Definition:
> CPID - Clusterwide Process ID.  Guaranteed to be unique for each process across the entire cluster.
>
> (please advise if there is a pre-existing Acronym for this.)
>
> Assumptions:
> CPIDs are a useful concept and one that is used by several existing cluster solutions.
>
> Sophisticated features, such as transparent clusterwide IPC, seem to require CPIDs
>
> CPIDs may make process migration easier to implement. (I have no knowledge about this.)

 Not necessarly. In fact, the only working SSI fully transparent migration
scheme running today on productions environments -Mosix- has no
a common PID space. And you have a patch for common PID space for Beowulf
clusters by Internet.

 Personally, I find the PVM pid model great, and that would work on Mosix
-althought it would break the backwards compatibility some user space
apps, like top-. It could be solved parching the most common system apps
-as ps or top-, but Mosix people are not too much worried about what
happends at user space.  -or at least this is that it looked to me-.


> For various reasons, implementing them seems difficult in a 16-bit pid world.

 As an example, in Mosix we have a limit of 2**16 nodes, plus 16 bits pid,
it would need 32 bit pids. Possible, but common aplications like ps could
behave strangely.

> Questions:
> Given that CPIDs seem to be difficult to fully implement and involve
> the kernel, but if accomplished can be widely used throughout the
> OpenSource cluster projects, should a common CPID project be initiated?

 I have seen on the past a path running for 2.0 series for a common PID,
but I lost the link. Anyway, some people would ask first about if it is
a good feature -since not all the people think that it is a good idea
breaking backward compatibility-.


> Bruce Walker relates that the SSI for Linux Clusters project has this
> working, and apparently PVM does as well.  Could either of these
> implementations, or some other existing implementation, be isolated and
> made available to the Linux Cluster community as a component?

 PVM uses the CPID as an internal comodity of the virtual machine, and it
does mapping between the internal common PID and the real PID of the
machine. Anyway, it runs enterly at userland. I doubt that we could use
PVM code, but we can get the ideas.

 The main idea would be doing mapping, as PID does. If you ask for a
16-bit PID, you reference the local PID. If you ask for a 32-bit PID, it
would give to you the global PID. This would mean that all the old
application will run, but it also means that we will need more kernel
calls. As an example, we keep:

 pid_t getpid(void);
 pid_t getppid(void);


 but we also provide:

 pid_t_32 getpid32(void);
 pid_t_32 getppid32(void);

 This would mean that the old apps will run, and we can develop new HP
applications that would make use of the new syscalls.


 For a non-HP kernel, the result of the 16-bits and 32-bits would be the
same -2 most significant bytes at CPID at 0-.

 For a HP kernel, the two lest significat bytes of CPID are the local
PID, and the two most significant bytes are the node tag.

> Can the CPID module be written in such a way as to allow multiple
> cluster membership algorithms?

 I strongly doubt, due to the problems on "live" new memberships. Anyway,
on all the cases that I know on HP computing,  there is not a "multiple
cluster membership" concept. We used to use  something like "partitioning
the cluster". I thing that maybe is better not allowing multiple
memberships, and allowing partitions and groups of jobs; in a way that a
node can  be in more than one partition, and a job of a group of jobs can
only migrate inside its partition.



 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 00:28:36 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16456AbRGJW22>; Wed, 11 Jul 2001 00:28:28 +0200
Received: from jalon.able.es ([212.97.163.2]:21721 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16447AbRGJW2S>;
	Wed, 11 Jul 2001 00:28:18 +0200
Received: from werewolf.able.es ([212.97.170.71]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGA3RZ00.77C; Wed, 11 Jul 2001 00:28:47 +0200 
Date:	Wed, 11 Jul 2001 00:29:40 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	irbis@orcero.org
Cc:	Greg Freemyer <freemyer@NorcrossGroup.com>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010711002940.A1264@werewolf.able.es>
References: <20010710163106.LSM13530.imf06bis.bellsouth.net@taz> <Pine.LNX.4.30.0107102342330.12297-100000@hermes.orcero.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <Pine.LNX.4.30.0107102342330.12297-100000@hermes.orcero.org>; from irbis@orcero.org on Wed, Jul 11, 2001 at 02:13:47 +0200
X-Mailer: Balsa 1.1.6
Content-Length:	1573
Lines:	38
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010711 irbis@orcero.org wrote:
>>
>> CPIDs may make process migration easier to implement. (I have no knowledge about this.)
>

Well, this was the kind of things I thought that were going to be talked about
in this list the first time I saw the announce. Things needed by everyone, that
are not still solved in an standard way (and a way accepted for inclussion in
main kernel tree). This are the kind of things that
must be standarised before anything. I lost my hope when people begun to talk
about XML.... what the hell is that needed for just to say 'I'm here' ?. Well...

I have read sometimes about 32bit pids in lkml. I think the first thing to do is
ask lkml (well, really Linus) what are the plans for 32bit pids on main kernel
stream. Perhaps it is a 2.5 planned feature.

> Not necessarly. In fact, the only working SSI fully transparent migration
>scheme running today on productions environments -Mosix- has no
>a common PID space. And you have a patch for common PID space for Beowulf
>clusters by Internet.
>

Look at http://bproc.sourceforge.net/. I think it is the cleaner implementation.

>
> For a HP kernel, the two lest significat bytes of CPID are the local
>PID, and the two most significant bytes are the node tag.
>

That is the reason I see to ask in kernel list. Perhaps there are plans yet
fot those high 32 bits...

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac2 #1 SMP Sun Jul 8 23:57:11 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 00:59:42 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16465AbRGJW7Y>; Wed, 11 Jul 2001 00:59:24 +0200
Received: from pentafluge.infradead.org ([195.224.55.251]:57096 "EHLO
	pentafluge.infradead.org") by humbolt.nl.linux.org with ESMTP
	id <S16460AbRGJW7E> convert rfc822-to-8bit; Wed, 11 Jul 2001 00:59:04 +0200
Received: from mail108.mail.bellsouth.net ([205.152.58.48] helo=imf08bis.bellsouth.net)
	by pentafluge.infradead.org with esmtp (Exim 3.22 #1 (Red Hat Linux))
	id 15K6Nf-0006qW-00
	for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 23:53:23 +0100
Received: from taz ([208.61.65.237]) by imf08bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz>;
          Tue, 10 Jul 2001 18:55:57 -0400
Date:	Tue, 10 Jul 2001 18:54:31 -0400
From:	Greg Freemyer <freemyer@NorcrossGroup.com>
Subject: re[2]: Clusterwide pids
To:	<irbis@orcero.org>
cc:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	<linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David,

 >>  > Can the CPID module be written in such a way as to allow multiple
 >>  > cluster membership algorithms?

 >>  I strongly doubt, due to the problems on "live" new memberships. Anyway,
 >>  on all the cases that I know on HP computing,  there is not a "multiple
 >>  cluster membership" concept. We used to use  something like "partitioning
 >>  the cluster". I thing that maybe is better not allowing multiple
 >>  memberships, and allowing partitions and groups of jobs; in a way that a
 >>  node can  be in more than one partition, and a job of a group of jobs can
 >>  only migrate inside its partition.

I did not mean to imply simultaneously.

I merely meant: could the CPID module be written generically enough that it could interface to the multiple membership algorithms already in use. (i.e. I'm sure FailSafe, Kimberlite, SteelEye, Beowulf, PVM, SSI for Linux Clusters, etc. each have their own membership algorithm.)

I was not trying to imply that one node could be simultaneously a member of a PVM cluster and a SSI for Linux Cluster.  That seems way beyond the current scope of things.  I guess it is conceivable that with enough common infrastructure it could be done, but I would not consider that a design goal.

What I was concerned about is that each of the membership algorithms and implementations has its own idiosyncrasies and we would need to insure that a CPID module would not have to know which membership algorithm it was working with.

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 01:41:40 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16449AbRGJXlc>; Wed, 11 Jul 2001 01:41:32 +0200
Received: from jalon.able.es ([212.97.163.2]:19163 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16428AbRGJXlT>;
	Wed, 11 Jul 2001 01:41:19 +0200
Received: from werewolf.able.es ([212.97.170.71]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGA75P00.I6J; Wed, 11 Jul 2001 01:41:49 +0200 
Date:	Wed, 11 Jul 2001 01:42:43 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Cc:	"J . A . Magallon" <jamagallon@able.es>, irbis@orcero.org,
	Greg Freemyer <freemyer@NorcrossGroup.com>,
	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010711014243.C1076@werewolf.able.es>
References: <20010711002940.A1264@werewolf.able.es> <200107102329.f6ANTgo08480@kahuna.cag.cpqcorp.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <200107102329.f6ANTgo08480@kahuna.cag.cpqcorp.net>; from bruce@kahuna.cag.cpqcorp.net on Wed, Jul 11, 2001 at 01:29:42 +0200
X-Mailer: Balsa 1.1.6
Content-Length:	758
Lines:	19
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010711 Bruce Walker wrote:
>
>Given a scheme to determine the node number early in boot, I would suggest
>that 0 be the default (no clustering; not a valid node number) and
>you assign a node number if you intend to cluster at some point.  Clearly
>we don't want to hard code the node number into the kernel.
>> 

Could it be done like DCHP ? So if you wanto to re-define nodes just have to
change a main server config. On boot, nodes ask for a number. And you can
force a node to re-new its number.
mmm, yes, just like pump -R.

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac2 #1 SMP Sun Jul 8 23:57:11 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 01:50:51 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16463AbRGJXuo>; Wed, 11 Jul 2001 01:50:44 +0200
Received: from [192.216.221.8] ([192.216.221.8]:2473 "EHLO suntan.tandem.com")
	by humbolt.nl.linux.org with ESMTP id <S16430AbRGJXub>;
	Wed, 11 Jul 2001 01:50:31 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id QAA13453
	for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 16:50:28 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f6ANTgo08480; Tue, 10 Jul 2001 16:29:42 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107102329.f6ANTgo08480@kahuna.cag.cpqcorp.net>
Subject: Re: Clusterwide pids
In-Reply-To: <20010711002940.A1264@werewolf.able.es> from "J . A . Magallon" at "Jul 11, 2001 00:29:40 am"
To:	jamagallon@able.es (J . A . Magallon)
Date:	Tue, 10 Jul 2001 16:29:42 -0700 (PDT)
Cc:	irbis@orcero.org, freemyer@NorcrossGroup.com (Greg Freemyer),
	linux-cluster@nl.linux.org
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

> 
> On 20010711 irbis@orcero.org wrote:
> 
> I have read sometimes about 32bit pids in lkml. I think the first thing to do is
> ask lkml (well, really Linus) what are the plans for 32bit pids on main kernel
> stream. Perhaps it is a 2.5 planned feature.

In 2.4, all the pids in the kernel are 32 bit and getpid() returns a
32 bit number.  We have been node-encoding (i.e. CPIDing) for 9 months
and have noticed only 1 minor problem in Red Hat 7.1.  What libc 
converts a 32 bit pid from the kernel to a 16 bit pid?  B.t.w., top and
ps seem to work just fine with CPIDs.

If in fact the size of the pid is not an issue and the code to create
CPIDs is clearly not an issue, I think the key discussion point would
be  how and where to designate a node number.  For the Cluster Infrastructure
project (CI), we originally were setting the node number in lilo but
on Linus's suggestion, we have made cluster enabling and joining 
command driven.  That means you don't have a node number until you are
ready to join a cluster, potentially long after some processes have
been created.  For the open SSI cluster, the initialization is done
from commands in the ram disk so a node number is known very early.

Given a scheme to determine the node number early in boot, I would suggest
that 0 be the default (no clustering; not a valid node number) and
you assign a node number if you intend to cluster at some point.  Clearly
we don't want to hard code the node number into the kernel.
> 
> > Not necessarly. In fact, the only working SSI fully transparent migration
> >scheme running today on productions environments -Mosix- has no
> >a common PID space.
> >
The Linux version of Mosix has SSI transparent migration but is not
an SSI cluster.  Clusterwide pids are more important in an SSI
cluster.  The distinction is that in Linux version of Mosix, processes
that migrate away from their home node still see the view of their home
node. Each home node has a different view of processes, devices,
IPC objects (fifos, unix sockets, messages queues, etc), filesystems,
networking, etc.).  In an SSI cluster, all processes on all nodes see the same view
of all resources, including all other processes.

bruce walker
Open SSI Cluster architect
Compaq Computers

> -- 
> J.A. Magallon                           #  Let the source be with you...        
> mailto:jamagallon@able.es
> Mandrake Linux release 8.1 (Cooker) for i586
> Linux werewolf 2.4.6-ac2 #1 SMP Sun Jul 8 23:57:11 CEST 2001 i686
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 02:00:11 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16468AbRGKAAG>; Wed, 11 Jul 2001 02:00:06 +0200
Received: from pentafluge.infradead.org ([195.224.55.251]:59400 "EHLO
	pentafluge.infradead.org") by humbolt.nl.linux.org with ESMTP
	id <S16464AbRGJX74> convert rfc822-to-8bit; Wed, 11 Jul 2001 01:59:56 +0200
Received: from mail116.mail.bellsouth.net ([205.152.58.56] helo=imf16bis.bellsouth.net)
	by pentafluge.infradead.org with esmtp (Exim 3.22 #1 (Red Hat Linux))
	id 15K7KY-0006ts-00
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 00:54:15 +0100
Received: from taz ([208.61.65.237]) by imf16bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>;
          Tue, 10 Jul 2001 19:56:49 -0400
Date:	Tue, 10 Jul 2001 19:55:16 -0400
From:	Greg Freemyer <freemyer@NorcrossGroup.com>
Subject: re[2]: Clusterwide pids
To:	<irbis@orcero.org>
cc:	<linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


 >>  > Questions:
 >>  > Given that CPIDs seem to be difficult to fully implement and involve
 >>  > the kernel, but if accomplished can be widely used throughout the
 >>  > OpenSource cluster projects, should a common CPID project be initiated?

 >>  I have seen on the past a path running for 2.0 series for a common PID,
 >>  but I lost the link. Anyway, some people would ask first about if it is
 >>  a good feature -since not all the people think that it is a good idea
 >>  breaking backward compatibility-.

I guess that I am such a cluster convert, that I simply accept that full SSI type clusters are the future.  For IPC to ever have a SSI view of the cluster will require CPIDs, so I take it as a given that eventually, we will have to have a CPID solution.  My hope is that we have just one, not one per cluster technology.

A better question from my optimistic perspective is, how can a CPID solution be implemented with minimal impact.

 >>  > Bruce Walker relates that the SSI for Linux Clusters project has this
 >>  > working, and apparently PVM does as well.  Could either of these
 >>  > implementations, or some other existing implementation, be isolated and
 >>  > made available to the Linux Cluster community as a component?

 >>  PVM uses the CPID as an internal commodity of the virtual machine, and it
 >>  does mapping between the internal common PID and the real PID of the
 >>  machine. Anyway, it runs enterly at userland. I doubt that we could use
 >>  PVM code, but we can get the ideas.

This seems pretty restrictive.  I gather PVM has a limit of 64,000 processes across the entire cluster.  A 100 node cluster could only have 640 processes per node.  

That seems like a pretty big limitation to build in.

I think any core/common cluster infrastructure modules should support at least 100 nodes, and hopefully a lot more than that.

Given the Alpha and the imminent arrival of the Itanium, I would prefer to see a 64 bit CPID with the first 32 bits for the node and the last 32 bits for the local pid.  

I don't have Alpha Linux installed on anything right now, so I'm not sure what size pid it uses, but it seems reasonable to me for the 64-bit processors to use a 64-bit CPID.

 >>  The main idea would be doing mapping, as PID does. If you ask for a
 >>  16-bit PID, you reference the local PID. If you ask for a 32-bit PID, it
 >>  would give to you the global PID. This would mean that all the old
 >>  application will run, but it also means that we will need more kernel
 >>  calls. As an example, we keep:

 >>  pid_t getpid(void);
 >>  pid_t getppid(void);


 >>  but we also provide:

 >>  pid_t_32 getpid32(void);
 >>  pid_t_32 getppid32(void);

 >>  This would mean that the old apps will run, and we can develop new HP
 >>  applications that would make use of the new syscalls.

In addition I would think we should update the appropriate header file

#ifndef LEGACY_PID

#define   getpid()      getpid32()
#define   getppid()    getppid32()

#undef pid_t
typedef    pid_t         pid_t_32

#endif

This would allow a simple recompile would give people a chance to know about the new calls and a fairly easy way to stick with the current pids.

In addition, it might be nice to actually say getcpid() and getpcpid() instead of getpid32() and getppid32().  This would make developers more aware of the availability of the cluster infrastructure as opposed to thinking it was a simple pid expansion.

 >>  For a non-HP kernel, the result of the 16-bits and 32-bits would be the
 >>  same -2 most significant bytes at CPID at 0-.

Agreed, unless my earlier suggestion of a 64-bit pid is given consideration.  In that case I don't know what would happen.

 >>  Yours:

 >>  David




Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 02:05:04 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16470AbRGKAEp>; Wed, 11 Jul 2001 02:04:45 +0200
Received: from suntan.tandem.com ([192.216.221.8]:35497 "EHLO
	suntan.tandem.com") by humbolt.nl.linux.org with ESMTP
	id <S16464AbRGKAE3>; Wed, 11 Jul 2001 02:04:29 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id RAA13713
	for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 17:04:22 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f6ANlrY09648; Tue, 10 Jul 2001 16:47:53 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107102347.f6ANlrY09648@kahuna.cag.cpqcorp.net>
Subject: Re: re[2]: Clusterwide pids
In-Reply-To: <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz> from Greg Freemyer at "Jul 10, 2001 06:54:31 pm"
To:	freemyer@NorcrossGroup.com (Greg Freemyer)
Date:	Tue, 10 Jul 2001 16:47:53 -0700 (PDT)
Cc:	irbis@orcero.org, linux-cluster@nl.linux.org
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg,
> 
> I did not mean to imply simultaneously.
> 
> I merely meant: could the CPID module be written generically enough that it could interface to the multiple membership algorithms already in use. (i.e. I'm sure FailSafe, Kimberlite, SteelEye, Beowulf, PVM, SSI for Linux Clusters, etc. each have their own membership algorithm.)

I'd be surprised if was hard to have each of the cluster solutions move to a common
node number mechanism (whether pids had it encoded or not).  A key goal of our 
Cluster Infrastructure project (can be found via www.opensource.compaq.com) is to
provide a common membership (and thus node numbering) mechanism for any and all clusters.
I promised earlier that I would  share the proposed membership interfaces, which I will
do soon.

> 
> I was not trying to imply that one node could be simultaneously a member of a PVM cluster and a SSI for Linux Cluster.  That seems way beyond the current scope of things.  I guess it is conceivable that with enough common infrastructure it could be done, but I would not consider that a design goal.

While PVM and SSI may not be very likely, overlapping clusters is quite tantalizing.
Steven Tweedie has proposed hierarhical clusters, which I believe VMS clustering has.
I think that has definite merit and going beyond that, allowing nodes to be members
of more than one cluster at once has some attraction.
> 
> Greg Freemyer
> Internet Engineer
> Deployment and Integration Specialist
> The Norcross Group
> www.NorcrossGroup.com
> 
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/

bruce walker
Open SSI Cluster Architect
Compaq Computers


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 02:11:27 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16472AbRGKALJ>; Wed, 11 Jul 2001 02:11:09 +0200
Received: from gw.xkey.com ([206.86.100.52]:55566 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16469AbRGKAKz>;
	Wed, 11 Jul 2001 02:10:55 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id RAA16729 for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 17:10:53 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma016710; Tue Jul 10 17:10:48 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6B0DTV26694
	for linux-cluster@nl.linux.org; Tue, 10 Jul 2001 20:13:29 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Tue, 10 Jul 2001 20:13:29 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010710201329.A26668@wumpus.foo>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>; from freemyer@NorcrossGroup.com on Tue, Jul 10, 2001 at 07:55:16PM -0400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Tue, Jul 10, 2001 at 07:55:16PM -0400, Greg Freemyer wrote:

> I guess that I am such a cluster convert, that I simply accept that
> full SSI type clusters are the future.

Then you'd better listen more carefully to people who have different
needs, and users who have a different definition of a "single system".
I'm currently bidding on a procurement that will involve a machine
that is faster than 5 TFlops, and I assure you that any unnecessary
complications aren't going to be involved, because it's a really big
box that runs MPI jobs, not one that does anything else.

For some people, full SSI clusters are the future. For other people,
other kinds clusters are the future.

BTW, the pid size thing has been done to death in Linux-kernel. Go
read it, it's a good discussion.

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 06:40:59 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16475AbRGKEkv>; Wed, 11 Jul 2001 06:40:51 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:18441 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16477AbRGKEki>;
	Wed, 11 Jul 2001 06:40:38 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6B4eO6405214;
	Wed, 11 Jul 2001 00:40:24 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107110440.f6B4eO6405214@saturn.cs.uml.edu>
Subject: Re: re[2]: Clusterwide pids
To:	freemyer@NorcrossGroup.com (Greg Freemyer)
Date:	Wed, 11 Jul 2001 00:40:24 -0400 (EDT)
Cc:	irbis@orcero.org, linux-cluster@nl.linux.org
In-Reply-To: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz> from "Greg Freemyer" at Jul 10, 2001 07:55:16 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Freemyer writes:

>>>  PVM uses the CPID as an internal commodity of the virtual machine, and it
>>>  does mapping between the internal common PID and the real PID of the
>>>  machine. Anyway, it runs enterly at userland. I doubt that we could use
>>>  PVM code, but we can get the ideas.
>
> This seems pretty restrictive.  I gather PVM has a limit of 64,000
> processes across the entire cluster.  A 100 node cluster could only
> have 640 processes per node.

This works: 4000 compute nodes with 16 processes each.

Where I work, we pack 320 32-bit processors into a 9U space.
These units may be linked up together with fiber. So 4 boxes
gets you to 1280 nodes.

> I think any core/common cluster infrastructure modules should
> support at least 100 nodes, and hopefully a lot more than that.

Make that 1000 at least.

> Given the Alpha and the imminent arrival of the Itanium, I would
> prefer to see a 64 bit CPID with the first 32 bits for the node
> and the last 32 bits for the local pid.

Bad, and it won't happen.

BTW the "ps" code is mine.

How about this: when a node boots, pass it a PID range. If a node
runs out of PIDs, it can ask for more. One need not contact the
global pool on every PID allocation; PIDs can be dished out in
blocks and only returned to the pool when they sit idle. PIDs that
belong to other nodes or the global pool can appear allocated to
most of the local code, perhaps with a new process state.

Let's have an adjustable PID limit too. Right now it is 0xfffe if
I remember right. We could set the default to 9999, and let those
with large clusters change it to 999999 or more. This way you can
have all the PIDs you need without forcing small system users to
be annoyed with huge numbers.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 07:38:56 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16485AbRGKFis>; Wed, 11 Jul 2001 07:38:48 +0200
Received: from mercury.mv.net ([199.125.85.40]:53776 "EHLO mercury.mv.net")
	by humbolt.nl.linux.org with ESMTP id <S16478AbRGKFil>;
	Wed, 11 Jul 2001 07:38:41 +0200
Received: from filesrus (bnh-1-23.mv.com [199.125.99.23]) by mercury.mv.net (8.8.8/mem-971025) with SMTP id BAA03346 for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 01:38:39 -0400 (EDT)
Message-ID: <039701c109cc$4c104e80$28627dc7@filesrus>
From:	"Bill Todd" <billtodd@foo.mv.com>
To:	<linux-cluster@nl.linux.org>
References: <20010710163106.LSM13530.imf06bis.bellsouth.net@taz> <Pine.LNX.4.30.0107102342330.12297-100000@hermes.orcero.org> <20010711002940.A1264@werewolf.able.es>
Subject: Re: Clusterwide pids
Date:	Wed, 11 Jul 2001 01:42:38 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


----- Original Message -----
From: "J . A . Magallon" <jamagallon@able.es>
To: <irbis@orcero.org>
Cc: "Greg Freemyer" <freemyer@NorcrossGroup.com>; "Bruce Walker"
<bruce@kahuna.cag.cpqcorp.net>; <linux-cluster@nl.linux.org>
Sent: Tuesday, July 10, 2001 6:29 PM
Subject: Re: Clusterwide pids

...


> I have read sometimes about 32bit pids in lkml. I think the first thing to
do is
> ask lkml (well, really Linus) what are the plans for 32bit pids on main
kernel
> stream. Perhaps it is a 2.5 planned feature.

Not being at all Linux-savvy, I'll just mention that if the PID size may be
changing anyway taking it all the way to 64 bits might avoid another change
later.  It would allow fairly arbitrary internal formats (including support
for a node to be a member of multiple clusters, should that prove useful,
plus enough space to use just about any node-local PID as the basis for a
cluster-wide PID) with larger-than-16-bit ranges.  If I were a fan of UUIDs
for such things I might even suggest 128 bits, but I tend to prefer
physically-based identifiers where there's no compelling reason to require a
level of look-up indirection.

- bill




Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 08:00:11 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16483AbRGKGAC>; Wed, 11 Jul 2001 08:00:02 +0200
Received: from gw.xkey.com ([206.86.100.52]:38670 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16477AbRGKF74>;
	Wed, 11 Jul 2001 07:59:56 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id WAA09779 for <linux-cluster@nl.linux.org>; Tue, 10 Jul 2001 22:59:53 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma009775; Tue Jul 10 22:59:51 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6B62WB27138
	for linux-cluster@nl.linux.org; Wed, 11 Jul 2001 02:02:32 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Wed, 11 Jul 2001 02:02:32 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010711020232.A27132@wumpus.foo>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010710163106.LSM13530.imf06bis.bellsouth.net@taz> <Pine.LNX.4.30.0107102342330.12297-100000@hermes.orcero.org> <20010711002940.A1264@werewolf.able.es> <039701c109cc$4c104e80$28627dc7@filesrus>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <039701c109cc$4c104e80$28627dc7@filesrus>; from billtodd@foo.mv.com on Wed, Jul 11, 2001 at 01:42:38AM -0400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Wed, Jul 11, 2001 at 01:42:38AM -0400, Bill Todd wrote:

> Not being at all Linux-savvy, I'll just mention that if the PID size may be
> changing anyway taking it all the way to 64 bits might avoid another change
> later.

This is a good example of a point which is pointless to bring up on
this mailing list. But you can read the linux-kernel list to find out
what's actually already in the plans for the main kernel.

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 10:46:24 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16519AbRGKIqU>; Wed, 11 Jul 2001 10:46:20 +0200
Received: from gate.in-addr.de ([212.8.193.158]:12046 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16520AbRGKIqB>;
	Wed, 11 Jul 2001 10:46:01 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP id 1ED703770C
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 10:46:00 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id AF2531A9B6; Wed, 11 Jul 2001 10:27:33 +0200 (CEST)
Date:	Wed, 11 Jul 2001 10:27:33 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010711102733.C1904@marowsky-bree.de>
References: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
In-Reply-To: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>; from "Greg Freemyer" on 2001-07-10T19:55:16
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On 2001-07-10T19:55:16,
   Greg Freemyer <freemyer@NorcrossGroup.com> said:

> I guess that I am such a cluster convert, that I simply accept that full SSI
> type clusters are the future.  For IPC to ever have a SSI view of the
> cluster will require CPIDs, so I take it as a given that eventually, we will
> have to have a CPID solution.  My hope is that we have just one, not one per
> cluster technology.

A CPID and SSI-type clustering certainly are desireable for quite a few
clustering applications.

However, not for all of them. Most notably, clustering nodes which are
geographically distributed - it makes sense to treat them as less-tightly
coupled.

The common cluster infrastructure shouldn't enforce this.

However, a CPID component certainly makes sense in the framework, and as long
as everybody sticks to the APIs suggested, you can have several...

> I think any core/common cluster infrastructure modules should support at
> least 100 nodes, and hopefully a lot more than that.

The model itself should be totally agnostic and scale in any direction. I
think this is possible for quite a few aspects.

The actual implementations might have different limits.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 10:46:26 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16521AbRGKIqS>; Wed, 11 Jul 2001 10:46:18 +0200
Received: from gate.in-addr.de ([212.8.193.158]:11790 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16519AbRGKIqB>;
	Wed, 11 Jul 2001 10:46:01 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP id 2D68D3770D
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 10:46:00 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id B86201A98F; Wed, 11 Jul 2001 10:22:46 +0200 (CEST)
Date:	Wed, 11 Jul 2001 10:22:46 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010711102246.B1904@marowsky-bree.de>
References: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz> <200107110440.f6B4eO6405214@saturn.cs.uml.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
In-Reply-To: <200107110440.f6B4eO6405214@saturn.cs.uml.edu>; from "Albert D. Cahalan" on 2001-07-11T00:40:24
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On 2001-07-11T00:40:24,
   "Albert D. Cahalan" <acahalan@cs.uml.edu> said:

> How about this: when a node boots, pass it a PID range.

This leaves us with the chicken and egg problem - how do you boot a node which
is - at the time of boot - unable to contact the cluster?

IMHO, a node should be able to have "local" processes (node id part of the
CPID = 0). Only after it joined the cluster once (and thus was assigned a node
id) should processes which need a valid CPID be started.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 10:46:28 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16525AbRGKIqQ>; Wed, 11 Jul 2001 10:46:16 +0200
Received: from gate.in-addr.de ([212.8.193.158]:11022 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16521AbRGKIqF>;
	Wed, 11 Jul 2001 10:46:05 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP id 204EA3770A
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 10:45:58 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id 7C1951A9BD; Wed, 11 Jul 2001 10:31:38 +0200 (CEST)
Date:	Wed, 11 Jul 2001 10:31:38 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010711103138.D1904@marowsky-bree.de>
References: <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz> <200107102347.f6ANlrY09648@kahuna.cag.cpqcorp.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
In-Reply-To: <200107102347.f6ANlrY09648@kahuna.cag.cpqcorp.net>; from "Bruce Walker" on 2001-07-10T16:47:53
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On 2001-07-10T16:47:53,
   Bruce Walker <bruce@kahuna.cag.cpqcorp.net> said:

> I'd be surprised if was hard to have each of the cluster solutions move to a
> common node number mechanism (whether pids had it encoded or not).

I bet it will be quite hard ;-)

For now, I would be happy if we could agree on a membership API.

> A key goal of our Cluster Infrastructure project (can be found via
> www.opensource.compaq.com) is to provide a common membership (and thus node
> numbering) mechanism for any and all clusters.  I promised earlier that I
> would  share the proposed membership interfaces, which I will do soon.

I am looking forward to seeing this! Sounds very interesting. This is exactly
what this mailing list is intended for.

Will you - or someone else from Compaq's effort - be at Ottawa Linux
Symposium's clustering working group?

> Steven Tweedie has proposed hierarhical clusters, which I believe VMS
> clustering has.  I think that has definite merit and going beyond that,
> allowing nodes to be members of more than one cluster at once has some
> attraction.

Hierarchial clusters are hard enough, being a member of two clusters at once
is an interesting problem - what possible advantages do you see here?

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 19:28:17 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16406AbRGKR2J>; Wed, 11 Jul 2001 19:28:09 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:4105 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16203AbRGKR1u>;
	Wed, 11 Jul 2001 19:27:50 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6BHRaP15862;
	Wed, 11 Jul 2001 13:27:36 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107111727.f6BHRaP15862@saturn.cs.uml.edu>
Subject: Re: Clusterwide pids
To:	lmb@suse.de (Lars Marowsky-Bree)
Date:	Wed, 11 Jul 2001 13:27:35 -0400 (EDT)
Cc:	linux-cluster@nl.linux.org
In-Reply-To: <20010711102246.B1904@marowsky-bree.de> from "Lars Marowsky-Bree" at Jul 11, 2001 10:22:46 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Lars Marowsky-Bree writes:
>    "Albert D. Cahalan" <acahalan@cs.uml.edu> said:

>> How about this: when a node boots, pass it a PID range.
>
> This leaves us with the chicken and egg problem - how do you
> boot a node which is - at the time of boot - unable to contact
> the cluster?

You don't. It is a mistake to design for this perversion.
Do you, or do you not want a shared PID space? Make up your
mind about this. You don't run an SMP system as multiple
uniprocessor systems, then suddenly decide that you want SMP!

> IMHO, a node should be able to have "local" processes (node id
> part of the CPID = 0). Only after it joined the cluster once
> (and thus was assigned a node id) should processes which need
> a valid CPID be started.

Now what am I supposed to do with the "ps" program I wrote?
Invisible processes? Nope, not at all OK. This is crap too:

  PID TTY          TIME CMD
    1 ?        00:00:03 init
    1 ?        00:00:03 init
    1 ?        00:00:03 init
    1 ?        00:00:03 init
    2 ?        00:00:00 keventd
    2 ?        00:00:00 keventd
    2 ?        00:00:00 keventd
    2 ?        00:00:00 keventd
    3 ?        00:00:10 kswapd
    3 ?        00:00:10 kswapd
    3 ?        00:00:10 kswapd
    3 ?        00:00:10 kswapd
    4 ?        00:00:00 kreclaimd
    4 ?        00:00:00 kreclaimd
    4 ?        00:00:00 kreclaimd
    4 ?        00:00:00 kreclaimd
    5 ?        00:00:00 bdflush
    5 ?        00:00:00 bdflush
    5 ?        00:00:00 bdflush
    5 ?        00:00:00 bdflush
    6 ?        00:00:00 kupdated
    6 ?        00:00:00 kupdated
    6 ?        00:00:00 kupdated
    6 ?        00:00:00 kupdated
    7 ?        00:00:00 mdrecoveryd
    7 ?        00:00:00 mdrecoveryd
    7 ?        00:00:00 mdrecoveryd
    7 ?        00:00:00 mdrecoveryd
31331 pts/0    00:00:00 bash
 9786 tty2     00:00:00 bash
24247 tty2     00:00:00 less
26657 pts/0    00:00:00 ps

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 19:35:47 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16480AbRGKRfi>; Wed, 11 Jul 2001 19:35:38 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:17929 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16419AbRGKRfW>;
	Wed, 11 Jul 2001 19:35:22 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6BHZJx18770;
	Wed, 11 Jul 2001 13:35:19 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107111735.f6BHZJx18770@saturn.cs.uml.edu>
Subject: Re: Clusterwide pids
To:	billtodd@foo.mv.com (Bill Todd)
Date:	Wed, 11 Jul 2001 13:35:19 -0400 (EDT)
Cc:	linux-cluster@nl.linux.org
In-Reply-To: <039701c109cc$4c104e80$28627dc7@filesrus> from "Bill Todd" at Jul 11, 2001 01:42:38 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Bill Todd writes:

>> I have read sometimes about 32bit pids in lkml. I think the first
>> thing to do is ask lkml (well, really Linus) what are the plans for
>> 32bit pids on main kernel stream. Perhaps it is a 2.5 planned feature.

It was a Linux 0.1 feature. The PID wraps early to protect bash.
The limit may be adjusted as you please.

I wish it were 9999 by default, adjustable at run time for large
systems that need 5 or 6 digits.

> Not being at all Linux-savvy, I'll just mention that if the PID size
> may be changing anyway taking it all the way to 64 bits might avoid

Thankfully, Linus already vetoed this idea several months ago.
The PID is 32 bits.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 20:15:45 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16419AbRGKSPj>; Wed, 11 Jul 2001 20:15:39 +0200
Received: from mhro1.mayo.edu ([129.176.212.21]:25790 "EHLO mhro1.mayo.edu")
	by humbolt.nl.linux.org with ESMTP id <S16495AbRGKSPX>;
	Wed, 11 Jul 2001 20:15:23 +0200
Received: from [172.23.52.30] by mhro1.mayo.edu with ESMTP for linux-cluster@nl.linux.org; Wed, 11 Jul 2001 13:15:21 -0500
Message-Id: <3B4C97B8.8094CC65@mayo.edu>
Date:	Wed, 11 Jul 2001 13:15:20 -0500
From:	Patrick Spinler <spinler.patrick@mayo.edu>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.18pre21 i686)
X-Accept-Language: en
MIME-Version: 1.0
CC:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <200107111727.f6BHRaP15862@saturn.cs.uml.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
To:	unlisted-recipients:; (no To-header on input)
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"Albert D. Cahalan" wrote:
> 
> Now what am I supposed to do with the "ps" program I wrote?
> Invisible processes? Nope, not at all OK. This is crap too:
> 
>   PID TTY          TIME CMD
>     1 ?        00:00:03 init
>     1 ?        00:00:03 init
>     1 ?        00:00:03 init
>     1 ?        00:00:03 init
(snip)
> 26657 pts/0    00:00:00 ps

>From a user's viewpoint (& using a VMS cluster daily at work), I like
having a unified cluster PID space, but being able to specify what scope
my commands apply to.  How about something like this ?

# ps -A
   PID TTY          TIME CMD
     1 ?        00:00:03 init
     2 ?        00:00:00 keventd
     3 ?        00:00:10 kswapd
     4 ?        00:00:00 kreclaimd
     5 ?        00:00:00 bdflush
...
#
# ps -A --cluster --sort=pid
NODE     PID TTY          TIME CMD
node1      1 ?        00:00:03 init
node2      1 ?        00:00:03 init
node3      1 ?        00:00:03 init
master     1 ?        00:00:03 init
node1      2 ?        00:00:00 keventd
node2      2 ?        00:00:00 keventd
node3      2 ?        00:00:00 keventd
master     2 ?        00:00:00 keventd
node1      3 ?        00:00:10 kswapd
node2      3 ?        00:00:10 kswapd
node3      3 ?        00:00:10 kswapd
master     3 ?        00:00:10 kswapd
node1      4 ?        00:00:00 kreclaimd
node2      4 ?        00:00:00 kreclaimd
node3      4 ?        00:00:00 kreclaimd
master     4 ?        00:00:00 kreclaimd
node1      5 ?        00:00:00 bdflush
node2      5 ?        00:00:00 bdflush
node3      5 ?        00:00:00 bdflush
master     5 ?        00:00:00 bdflush
...
#
# ps -A --node=master
   PID TTY          TIME CMD
     1 ?        00:00:03 init
     2 ?        00:00:00 keventd
     3 ?        00:00:10 kswapd
     4 ?        00:00:00 kreclaimd
     5 ?        00:00:00 bdflush
...


Or perhaps use 'ps', 'top', 'kill', etc for local processes, 'cps',
'ctop', 'ckill', etc for cluster operations.

-- Pat

-- 
      This message does not represent the policies or positions
	     of the Mayo Foundation or its subsidiaries.
  Patrick Spinler			email:	Spinler.Patrick@Mayo.EDU
  Mayo Foundation			phone:	507/284-9485

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 20:30:22 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16517AbRGKSaE>; Wed, 11 Jul 2001 20:30:04 +0200
Received: from mail006.mail.bellsouth.net ([205.152.58.26]:18533 "EHLO
	imf06bis.bellsouth.net") by humbolt.nl.linux.org with ESMTP
	id <S16515AbRGKS3w> convert rfc822-to-8bit; Wed, 11 Jul 2001 20:29:52 +0200
Received: from taz ([208.61.65.237]) by imf06bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010711183042.ZBXT13530.imf06bis.bellsouth.net@taz>;
          Wed, 11 Jul 2001 14:30:42 -0400
Date:	Wed, 11 Jul 2001 14:29:13 -0400
From:	Greg Freemyer <freemyer@NorcrossGroup.com>
Subject: re[2]: Clusterwide pids
To:	Albert D. Cahalan <acahalan@cs.uml.edu>,  Lars Marowsky-Bree <lmb@suse.de>
cc:	<linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010711183042.ZBXT13530.imf06bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

 >>  Lars Marowsky-Bree writes:
 >>  >    "Albert D. Cahalan" <acahalan@cs.uml.edu> said:

 >>  >> How about this: when a node boots, pass it a PID range.
 >>  >
 >>  > This leaves us with the chicken and egg problem - how do you
 >>  > boot a node which is - at the time of boot - unable to contact
 >>  > the cluster?

 >>  "Albert D. Cahalan" <acahalan@cs.uml.edu> said:
 >>  You don't. It is a mistake to design for this perversion.
 >>  Do you, or do you not want a shared PID space? Make up your
 >>  mind about this. You don't run an SMP system as multiple
 >>  uniprocessor systems, then suddenly decide that you want SMP!

This may be good discussion point for the Ottawa meeting.  (No I won't be there.)

I have to agree that in order to get a shared PID space, it seems best to do it right off the bat.

I am most familiar with TruClusters from Compaq, and that is what they do.

If you must boot a TruCluster node, or set of nodes, without having quoram, then you have a separate diagnostic boot process you can manually use to override the quarom value.

I don't have the details handy, but effectively it is like:

    boot kernel -flags quoram_overide=n

Where n becomes your new quoram value for this one boot.

It is not typically used, but, for instance, if you have a 5 node cluster and 3 of the nodes die, you need a way to get the cluster back operational.

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 23:20:47 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16555AbRGKVUj>; Wed, 11 Jul 2001 23:20:39 +0200
Received: from 108-MADR-X55.libre.retevision.es ([62.83.16.108]:60932 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16421AbRGKVUc>;
	Wed, 11 Jul 2001 23:20:32 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP id 3D15D2F4BA
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 23:26:45 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	linux-cluster@nl.linux.org
Subject: Fwd: Re: re[2]: Clusterwide pids
Date:	Wed, 11 Jul 2001 23:26:44 +0200
X-Mailer: KMail [version 1.2]
MIME-Version: 1.0
Message-Id: <01071123264404.01746@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list




> Given the Alpha and the imminent arrival of the Itanium, I would prefer to
> see a 64 bit CPID with the first 32 bits for the node and the last 32 bits
> for the local pid.

Fun, everytime the process migrate we'll have to change the PID (i assume
this will be generic and cluster with processes migrating will use it)

> I don't have Alpha Linux installed on anything right now, so I'm not sure
> what size pid it uses, but it seems reasonable to me for the 64-bit
> processors to use a 64-bit CPID.

As far as i know pid is a pid_t that is a int in all the plataforms.


Add a new field in task struct (we'll end adding a new structure there
anyway) 32 bits if you want with the node number.
You can begin with node number 0 and then start the clusteringd that read
that node number from /etc/



--
Jordi
  Student in a dark place of Spain



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Wed Jul 11 23:52:41 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16448AbRGKVwd>; Wed, 11 Jul 2001 23:52:33 +0200
Received: from [213.98.27.110] ([213.98.27.110]:10758 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16581AbRGKVwT>;
	Wed, 11 Jul 2001 23:52:19 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6BNwoN02834;
	Wed, 11 Jul 2001 23:58:50 GMT
Date:	Wed, 11 Jul 2001 23:58:50 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Jordi Polo <mumismo@wanadoo.es>
cc:	<linux-cluster@nl.linux.org>
Subject: Re: Fwd: Re: re[2]: Clusterwide pids
In-Reply-To: <01071123264404.01746@mioooldpc>
Message-ID: <Pine.LNX.4.30.0107112352230.2790-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, Jordi!

> > Given the Alpha and the imminent arrival of the Itanium, I would prefer to
> > see a 64 bit CPID with the first 32 bits for the node and the last 32 bits
> > for the local pid.
>
> Fun, everytime the process migrate we'll have to change the PID (i assume
> this will be generic and cluster with processes migrating will use it)
>


 That is not the point. You select the CPID when the process start in this
way, but it does not depend on where is running, but where was born the
process.

 Rationale: Where the process is running is irrelevant on a SSI cluster.
We could have a mechanism, but it will not be commonly used. By the other
way, ensuring that a particular CPID is unique on the system when you create a
process is really complicated, and we must have a way to find a free CPID
without answering to the rest of the cluster, because we can have lots of
race conditions on the protocol that lead to two process with the same
CPID.

 Componing the CPID as <Origin node ID, local PID> assures that there will
be no two process with the same CPID, and there will be no weird race
conditions.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 00:17:43 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16575AbRGKWRe>; Thu, 12 Jul 2001 00:17:34 +0200
Received: from 108-MADR-X55.libre.retevision.es ([62.83.16.108]:20229 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16567AbRGKWRU>;
	Thu, 12 Jul 2001 00:17:20 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id C9E7D2F4BA; Thu, 12 Jul 2001 00:23:06 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	irbis@orcero.org
Subject: Re: Fwd: Re: re[2]: Clusterwide pids
Date:	Thu, 12 Jul 2001 00:23:05 +0200
X-Mailer: KMail [version 1.2]
References: <Pine.LNX.4.30.0107112352230.2790-100000@hermes.orcero.org>
In-Reply-To: <Pine.LNX.4.30.0107112352230.2790-100000@hermes.orcero.org>
Cc:	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071200230500.01984@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


>  Componing the CPID as <Origin node ID, local PID> assures that there will
> be no two process with the same CPID, and there will be no weird race
> conditions.

Ok, then we can just leave the standard kernel as it is and add a 
node-where-was-born field 

--
Jordi

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 00:23:35 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16584AbRGKWX2>; Thu, 12 Jul 2001 00:23:28 +0200
Received: from suntan.tandem.com ([192.216.221.8]:49631 "EHLO
	suntan.tandem.com") by humbolt.nl.linux.org with ESMTP
	id <S16580AbRGKWXN>; Thu, 12 Jul 2001 00:23:13 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id PAA11477
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 15:22:48 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f6BM0Kl03445; Wed, 11 Jul 2001 15:00:20 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107112200.f6BM0Kl03445@kahuna.cag.cpqcorp.net>
Subject: Re: Fwd: Re: re[2]: Clusterwide pids
In-Reply-To: <01071123264404.01746@mioooldpc> from Jordi Polo at "Jul 11, 2001 11:26:44 pm"
To:	mumismo@wanadoo.es (Jordi Polo)
Date:	Wed, 11 Jul 2001 15:00:20 -0700 (PDT)
Cc:	linux-cluster@nl.linux.org
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

[Charset utf-8 unsupported, skipping...]
From owner-linux-cluster@nl.linux.org Wed Jul 11 14:21:11 2001
Return-Path: owner-linux-cluster@nl.linux.org
Received: from kahuna (kahuna [16.61.168.50]) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) with ESMTP id f6BLLBd03182 for <bruce@kahuna.cag.cpqcorp.net>; Wed, 11 Jul 2001 14:21:11 -0700 (PDT)
Received: from exccup-25008.mis.tandem.com
	by kahuna (fetchmail-4.4.0 POP3)
	for <bruce/kahuna.cag.cpqcorp.net> (single-drop); Wed, 11 Jul 2001 14:21:11 PDT
Received: from ztxmail01.ztx.compaq.com (ztxmail01.nz-cce.cpqcorp.net [161.114.8.205]) by exccup-gh01.mis.tandem.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id NKGHY0H5; Wed, 11 Jul 2001 14:20:54 -0700
Received: by ztxmail01.ztx.compaq.com (Postfix, from userid 12345)
	id 872835ABC; Wed, 11 Jul 2001 16:20:50 -0500 (CDT)
Received: from humbolt.nl.linux.org (humbolt.nl.linux.org [131.211.28.48])
	by ztxmail01.ztx.compaq.com (Postfix) with ESMTP
	id 6907258AA; Wed, 11 Jul 2001 16:20:50 -0500 (CDT)
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16555AbRGKVUj>; Wed, 11 Jul 2001 23:20:39 +0200
Received: from 108-MADR-X55.libre.retevision.es ([62.83.16.108]:60932 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16421AbRGKVUc>;
	Wed, 11 Jul 2001 23:20:32 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP id 3D15D2F4BA
	for <linux-cluster@nl.linux.org>; Wed, 11 Jul 2001 23:26:45 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From: Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To: linux-cluster@nl.linux.org
Subject: Fwd: Re: re[2]: Clusterwide pids
Date: 	Wed, 11 Jul 2001 23:26:44 +0200
X-Mailer: KMail [version 1.2]
MIME-Version: 1.0
Message-Id: <01071123264404.01746@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender: owner-linux-cluster@nl.linux.org
Precedence: bulk
X-Status: 
X-SCO-PAD: XXXXXX
Content-Length:   962
Status: RO



> Fun, everytime the process migrate we'll have to change the PID (i assume
> this will be generic and cluster with processes migrating will use it)

A key feature of the Open SSI Cluster Project is that processes retain
their pid when they migrate or rexec.  The node number put into the
pid is the node number where the process was created.  It is not the
node number of the node where it is currently executing.




bruce walker


> --
> Jordi
>   Student in a dark place of Spain



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 09:12:16 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16145AbRGLHMH>; Thu, 12 Jul 2001 09:12:07 +0200
Received: from gate.in-addr.de ([212.8.193.158]:47621 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16458AbRGLHLu>;
	Thu, 12 Jul 2001 09:11:50 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP
	id 2DB3F40840; Thu, 12 Jul 2001 09:11:49 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id 1FB941A9D2; Thu, 12 Jul 2001 09:12:08 +0200 (CEST)
Date:	Thu, 12 Jul 2001 09:12:08 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Cc:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712091208.C579@marowsky-bree.de>
References: <20010711102246.B1904@marowsky-bree.de> <200107111727.f6BHRaP15862@saturn.cs.uml.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
In-Reply-To: <200107111727.f6BHRaP15862@saturn.cs.uml.edu>; from "Albert D. Cahalan" on 2001-07-11T13:27:35
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On 2001-07-11T13:27:35,
   "Albert D. Cahalan" <acahalan@cs.uml.edu> said:

> > This leaves us with the chicken and egg problem - how do you
> > boot a node which is - at the time of boot - unable to contact
> > the cluster?
> You don't. It is a mistake to design for this perversion.

Uh. A node not being able to join the cluster is a perfectly reasonable
exception, and you want it to boot so that you can fix it over the network. It
makes sense not to start any cluster services, that is true.

> Do you, or do you not want a shared PID space? Make up your
> mind about this. You don't run an SMP system as multiple
> uniprocessor systems, then suddenly decide that you want SMP!

Refer to the CPU hotplug features in recent 2.4 kernels ;-)

Besides, if your implementation decides that you do not want to provide this
"intermediate" step between shared / not-shared PID space, that is fine.

> > IMHO, a node should be able to have "local" processes (node id
> > part of the CPID = 0). Only after it joined the cluster once
> > (and thus was assigned a node id) should processes which need
> > a valid CPID be started.
> Now what am I supposed to do with the "ps" program I wrote?
> Invisible processes? Nope, not at all OK. This is crap too:
> 
>   PID TTY          TIME CMD
>     1 ?        00:00:03 init
>     1 ?        00:00:03 init
>     1 ?        00:00:03 init
>     1 ?        00:00:03 init

You won't see this.

init is almost guaranteed to be a "local" process, and thus not visible to
other nodes.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 18:58:21 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16089AbRGLQ6O>; Thu, 12 Jul 2001 18:58:14 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:4880 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16086AbRGLQ6E>;
	Thu, 12 Jul 2001 18:58:04 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6CGvoO92965;
	Thu, 12 Jul 2001 12:57:50 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107121657.f6CGvoO92965@saturn.cs.uml.edu>
Subject: Re: Clusterwide pids
To:	lmb@suse.de (Lars Marowsky-Bree)
Date:	Thu, 12 Jul 2001 12:57:50 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan), linux-cluster@nl.linux.org
In-Reply-To: <20010712091208.C579@marowsky-bree.de> from "Lars Marowsky-Bree" at Jul 12, 2001 09:12:08 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Lars Marowsky-Br\351e writes:
>    "Albert D. Cahalan" <acahalan@cs.uml.edu> said:

>>> This leaves us with the chicken and egg problem - how do you
>>> boot a node which is - at the time of boot - unable to contact
>>> the cluster?
>>
>> You don't. It is a mistake to design for this perversion.
>
> Uh. A node not being able to join the cluster is a perfectly
> reasonable exception, and you want it to boot so that you can
> fix it over the network. It makes sense not to start any cluster
> services, that is true.

1. boot the single node without joining the cluster
2. fix the node
3. reboot to join the cluster

>> Do you, or do you not want a shared PID space? Make up your
>> mind about this. You don't run an SMP system as multiple
>> uniprocessor systems, then suddenly decide that you want SMP!
> 
> Refer to the CPU hotplug features in recent 2.4 kernels ;-)

When the CPU is added, it does not bring its own kernel state.
The CPU would generally start from reset or equivalent.
This is not a problem. It's like rebooting a node when you want
that node in the cluster.

Running a 2-way system with 2 GB of RAM and 2 copies of the
kernel, you can't get to SMP without killing some processes.
One of the CPUs has to be reset, more-or-less.

>> Now what am I supposed to do with the "ps" program I wrote?
>> Invisible processes? Nope, not at all OK. This is crap too:
>> 
>>   PID TTY          TIME CMD
>>     1 ?        00:00:03 init
>>     1 ?        00:00:03 init
>>     1 ?        00:00:03 init
>>     1 ?        00:00:03 init
>
> You won't see this.
>
> init is almost guaranteed to be a "local" process, and thus
> not visible to other nodes.

Eeew.

Share, or do not share.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 19:22:57 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16066AbRGLRWv>; Thu, 12 Jul 2001 19:22:51 +0200
Received: from cerebus.wirex.com ([216.161.55.93]:29692 "EHLO
	figure1.int.wirex.com") by humbolt.nl.linux.org with ESMTP
	id <S16012AbRGLRWi>; Thu, 12 Jul 2001 19:22:38 +0200
Received: (from chris@localhost)
	by figure1.int.wirex.com (8.11.0/8.11.0) id f6CHKpA19213
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 10:20:51 -0700
Date:	Thu, 12 Jul 2001 10:20:50 -0700
From:	Chris Wright <chris@wirex.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712102050.W25392@figure1.int.wirex.com>
References: <20010712091208.C579@marowsky-bree.de> <200107121657.f6CGvoO92965@saturn.cs.uml.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200107121657.f6CGvoO92965@saturn.cs.uml.edu>; from acahalan@cs.uml.edu on Thu, Jul 12, 2001 at 12:57:50PM -0400
X-Editor: Vim http://www.vim.org/
X-Info:	http://www.wirex.com
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

* Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> Lars Marowsky-Br\351e writes:
> >    "Albert D. Cahalan" <acahalan@cs.uml.edu> said:
> 
> >>> This leaves us with the chicken and egg problem - how do you
> >>> boot a node which is - at the time of boot - unable to contact
> >>> the cluster?
> >>
> >> You don't. It is a mistake to design for this perversion.
> >
> > Uh. A node not being able to join the cluster is a perfectly
> > reasonable exception, and you want it to boot so that you can
> > fix it over the network. It makes sense not to start any cluster
> > services, that is true.
> 
> 1. boot the single node without joining the cluster
> 2. fix the node
> 3. reboot to join the cluster

sounds like a windows soultion ;-)  seriously though, if you can boot
and run processes that are local only (no cluster yet, as you haven't
been able to join for whatever reason), how does this cause problems when
you join the cluster?  sure you join the cluster with some machine state
other than a fresh reset, but the state is relative to the resources you
can share with the cluster.  and the act of joining the cluster should
initialize any cluster specific state on the node, no?

<snip>
> >> Now what am I supposed to do with the "ps" program I wrote?
> >> Invisible processes? Nope, not at all OK. This is crap too:
> >> 
> >>   PID TTY          TIME CMD
> >>     1 ?        00:00:03 init
> >>     1 ?        00:00:03 init
> >>     1 ?        00:00:03 init
> >>     1 ?        00:00:03 init
> >
> > You won't see this.
> >
> > init is almost guaranteed to be a "local" process, and thus
> > not visible to other nodes.
> 
> Eeew.
> 
> Share, or do not share.

are you suggesting that it is not legitimate to have local-only processes?
i would expect some processes to be necessarily tied to a node.  like those
repsonsible for reporting the state of the node...

it seems useful to me to be able to distinguish local from cluster
processing.  for ps, perhaps you could use different flags as mentioned
earlier, and scan /proc/cluster/ for cluster processes.

perhaps i'm missing something...
-chris

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 19:47:34 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16067AbRGLRr1>; Thu, 12 Jul 2001 19:47:27 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:12049 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16097AbRGLRrO>;
	Thu, 12 Jul 2001 19:47:14 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6CHlC4105822;
	Thu, 12 Jul 2001 13:47:12 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107121747.f6CHlC4105822@saturn.cs.uml.edu>
Subject: Re: Clusterwide pids
To:	chris@wirex.com (Chris Wright)
Date:	Thu, 12 Jul 2001 13:47:11 -0400 (EDT)
Cc:	linux-cluster@nl.linux.org
In-Reply-To: <20010712102050.W25392@figure1.int.wirex.com> from "Chris Wright" at Jul 12, 2001 10:20:50 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Chris Wright writes:
> * Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
>> Lars Marowsky-Br\351e writes:
>>>    "Albert D. Cahalan" <acahalan@cs.uml.edu> said:
>> 
>>>>> This leaves us with the chicken and egg problem - how do you
>>>>> boot a node which is - at the time of boot - unable to contact
>>>>> the cluster?
>>>>
>>>> You don't. It is a mistake to design for this perversion.
>>>
>>> Uh. A node not being able to join the cluster is a perfectly
>>> reasonable exception, and you want it to boot so that you can
>>> fix it over the network. It makes sense not to start any cluster
>>> services, that is true.
>> 
>> 1. boot the single node without joining the cluster
>> 2. fix the node
>> 3. reboot to join the cluster
> 
> sounds like a windows soultion ;-)  seriously though, if you can boot
> and run processes that are local only (no cluster yet, as you haven't
> been able to join for whatever reason), how does this cause problems when
> you join the cluster?

Conflicting PIDs are bad.

>>> init is almost guaranteed to be a "local" process, and thus
>>> not visible to other nodes.
>> 
>> Eeew.
>> 
>> Share, or do not share.
> 
> are you suggesting that it is not legitimate to have local-only
> processes? i would expect some processes to be necessarily tied
> to a node.  like those repsonsible for reporting the state of
> the node...

Being tied to a node is fine. Just do it with a unique PID.
On an SMP system, one may support the ability to lock a process
onto a processor. This doesn't mean you should allow that PID
to be used for a different process on a different CPU.

> it seems useful to me to be able to distinguish local from cluster
> processing.  for ps, perhaps you could use different flags as
> mentioned earlier, and scan /proc/cluster/ for cluster processes.

If you're going to do that, then "do not share" is your answer.
Just run a regular cluster and forget about a shared PID space.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 20:10:35 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16130AbRGLSKT>; Thu, 12 Jul 2001 20:10:19 +0200
Received: from mail209.mail.bellsouth.net ([205.152.58.149]:6455 "EHLO
	imf09bis.bellsouth.net") by humbolt.nl.linux.org with ESMTP
	id <S16122AbRGLSKG> convert rfc822-to-8bit; Thu, 12 Jul 2001 20:10:06 +0200
Received: from taz ([208.61.65.237]) by imf09bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>;
          Thu, 12 Jul 2001 14:10:57 -0400
Date:	Thu, 12 Jul 2001 14:09:25 -0400
From:	Greg Freemyer <freemyer@NorcrossGroup.com>
Subject: re[2]: Clusterwide pids
To:	Chris Wright <chris@wirex.com>, <linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

 >>  * Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
 >>  > Lars Marowsky-Br\351e writes:
 >>  > >    "Albert D. Cahalan" <acahalan@cs.uml.edu> said:
 >>  > 
 >>  > >>> This leaves us with the chicken and egg problem - how do you
 >>  > >>> boot a node which is - at the time of boot - unable to contact
 >>  > >>> the cluster?
 >>  > >>
 >>  > >> You don't. It is a mistake to design for this perversion.
 >>  > >
 >>  > > Uh. A node not being able to join the cluster is a perfectly
 >>  > > reasonable exception, and you want it to boot so that you can
 >>  > > fix it over the network. It makes sense not to start any cluster
 >>  > > services, that is true.
 >>  > 
 >>  > 1. boot the single node without joining the cluster
 >>  > 2. fix the node
 >>  > 3. reboot to join the cluster

 >>  sounds like a windows solution ;-)  seriously though, if you can boot
 >>  and run processes that are local only (no cluster yet, as you haven't
 >>  been able to join for whatever reason), how does this cause problems when
 >>  you join the cluster?  sure you join the cluster with some machine state
 >>  other than a fresh reset, but the state is relative to the resources you
 >>  can share with the cluster.  and the act of joining the cluster should
 >>  initialize any cluster specific state on the node, no?

I think you are missing the point.

IF THE GOAL ...
                        is to get to a cluster with totally transparent IPC capability across the cluster (SSI IPC) then you have to have a unique pid for each process in the cluster.

For instance:   kill nnn   
must send a signal to just one process, not to several.

Ignoring the performance gains potentially available with SSI IPC, one of the points of SSI IPC is to allow administration of the cluster from any node.

For instance, if you have local pids for local processes and CPIDs only for cluster processes, then you have the difficult administrative task of telneting to the appropriate node to kill a runaway local process.

Yes, you can modify kill to accept a node parameter, but IPC is commonly used throughout lots of different admin tools, and it would get extremely difficult to support this mechanism.

In my opinion, it is superior to have CPIDs for all processes and thus we can start thinking seriously about SSI IPC.

If you have processes running prior to joining the cluster and you want CPIDs, then you have 2 basic choices I can think of:

1) Use the local/cluster process paradigm.
2) Have the kernel assume all pids with a node # of 0 are on the local node, and thus it puts the local node # into the cpid.

Far preferable to either of the above in my mind is to have two boot choices, standalone for maintenance, cluster for normal operation.

In the cluster boot situation there are again 2 basic choices:

1) Have a predetermined node #
2) Have the node # dynamically assigned, possibly by having a small 'cluster joining' app which is invoked prior to loading the kernel.  

In either case, if quarom is not available on the cluster at boot up, the kernel/app just sits and waits for more nodes to come alive.

Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 20:26:00 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16134AbRGLSZl>; Thu, 12 Jul 2001 20:25:41 +0200
Received: from cerebus.wirex.com ([216.161.55.93]:55802 "EHLO
	figure1.int.wirex.com") by humbolt.nl.linux.org with ESMTP
	id <S16155AbRGLSZc>; Thu, 12 Jul 2001 20:25:32 +0200
Received: (from chris@localhost)
	by figure1.int.wirex.com (8.11.0/8.11.0) id f6CINgF19265
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 11:23:42 -0700
Date:	Thu, 12 Jul 2001 11:23:42 -0700
From:	Chris Wright <chris@wirex.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712112342.X25392@figure1.int.wirex.com>
References: <20010712102050.W25392@figure1.int.wirex.com> <200107121747.f6CHlC4105822@saturn.cs.uml.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200107121747.f6CHlC4105822@saturn.cs.uml.edu>; from acahalan@cs.uml.edu on Thu, Jul 12, 2001 at 01:47:11PM -0400
X-Editor: Vim http://www.vim.org/
X-Info:	http://www.wirex.com
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

* Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> Chris Wright writes:
> > * Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> >> 
> >> 1. boot the single node without joining the cluster
> >> 2. fix the node
> >> 3. reboot to join the cluster
> > 
> > sounds like a windows soultion ;-)  seriously though, if you can boot
> > and run processes that are local only (no cluster yet, as you haven't
> > been able to join for whatever reason), how does this cause problems when
> > you join the cluster?
> 
> Conflicting PIDs are bad.

of course.

> 
> Being tied to a node is fine. Just do it with a unique PID.
> On an SMP system, one may support the ability to lock a process
> onto a processor. This doesn't mean you should allow that PID
> to be used for a different process on a different CPU.

sorry, implicit in my thinking was a node id as part of the pid.  a node
id of something like 0 being akin to local only.  a non-zero node id
being akin to a cluster process.  (analogous to 127.0.0.1 and the ip
addr of your admin interface)

-chris

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 20:31:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16150AbRGLSbX>; Thu, 12 Jul 2001 20:31:23 +0200
Received: from gw.xkey.com ([206.86.100.52]:38661 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16156AbRGLSbO>;
	Thu, 12 Jul 2001 20:31:14 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id LAA22756 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 11:31:12 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma022427; Thu Jul 12 11:25:08 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6CIRf130893
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 14:27:41 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 14:27:41 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712142741.A30889@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>; from freemyer@NorcrossGroup.com on Thu, Jul 12, 2001 at 02:09:25PM -0400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 02:09:25PM -0400, Greg Freemyer wrote:

> For instance, if you have local pids for local processes and CPIDs
> only for cluster processes, then you have the difficult
> administrative task of telneting to the appropriate node to kill a
> runaway local process.

Why do people always assume that the only alternative to their clever
plan is some horrible activity? People who run clusters that aren't
"full SSI" clusters have means of dealing with this that are better
than what you describe.

I've especially noticed this with folks who promote using a single
root disk. I have clusters with 500+ root disks, and I assure you that
I have a suite of simple tools which keep them synchronized. Yet the
single rootdisk folks will tell you, "then you have the difficult
administrative task of..." and assume that anyone not doing it their
way has no tools...

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:05:27 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16171AbRGLTFR>; Thu, 12 Jul 2001 21:05:17 +0200
Received: from 37-MADR-X33.libre.retevision.es ([62.83.4.37]:39174 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16172AbRGLTFF>;
	Thu, 12 Jul 2001 21:05:05 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP id 887FD2F62A
	for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 21:11:21 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	linux-cluster@nl.linux.org
Subject: re[2]: Clusterwide pids
Date:	Thu, 12 Jul 2001 21:11:20 +0200
X-Mailer: KMail [version 1.2]
References: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>
In-Reply-To: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>
MIME-Version: 1.0
Message-Id: <01071221112000.02292@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



Let us summarize a bit:
-most people agrees we need unique CPID even if the process is something we 
know is not going to migrate as init. 
-that CPID will be node number + local pid
- We don't want to recompile userspace apps so we do our own system calls, 
which  maybe :
	* getcpid to get the Â¿64bits? with the standard getpid returning the low 32 
bits of the CPID
	* getnode(pid) with 0 if process if local , nonzero otherwise.
Then we have a question left: Â¿How we manage to get unique CPIDs ?
What number node a machine must have, Â¿read it from /etc? Â¿read it a la HDLC?

i have missed something?


--

Jordi
   Student of Spain 
	  

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:11:48 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16170AbRGLTLb>; Thu, 12 Jul 2001 21:11:31 +0200
Received: from mail207.mail.bellsouth.net ([205.152.58.147]:35438 "EHLO
	imf07bis.bellsouth.net") by humbolt.nl.linux.org with ESMTP
	id <S16156AbRGLTLZ> convert rfc822-to-8bit; Thu, 12 Jul 2001 21:11:25 +0200
Received: from taz ([208.61.65.237]) by imf07bis.bellsouth.net
          (InterMail vM.5.01.01.01 201-252-104) with SMTP
          id <20010712191216.BPR3754.imf07bis.bellsouth.net@taz>;
          Thu, 12 Jul 2001 15:12:16 -0400
Date:	Thu, 12 Jul 2001 15:10:44 -0400
From:	Greg Freemyer <freemyer@NorcrossGroup.com>
Subject: re[2]: Clusterwide pids
To:	Greg Lindahl <lindahl@conservativecomputer.com>,
	<linux-cluster@nl.linux.org>
Mime-Version: 1.0
Organization: The Norcross Group
X-Mailer: GoldMine [5.50.10424]
Content-Type: text/plain
Content-Transfer-Encoding: 8BIT
Message-Id: <20010712191216.BPR3754.imf07bis.bellsouth.net@taz>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

 >>  On Thu, Jul 12, 2001 at 02:09:25PM -0400, Greg Freemyer wrote:

 >>  > For instance, if you have local pids for local processes and CPIDs
 >>  > only for cluster processes, then you have the difficult
 >>  > administrative task of telneting to the appropriate node to kill a
 >>  > runaway local process.

 >>  Why do people always assume that the only alternative to their clever
 >>  plan is some horrible activity? People who run clusters that aren't
 >>  "full SSI" clusters have means of dealing with this that are better
 >>  than what you describe.

And my next sentence was

>> Yes, you can modify kill to accept a node parameter ...

What I was trying to say, and said poorly, was that are 2 basic ways to manage a cluster

1) With existing tools extended via full SSI.  The goal being to modify as little of the human interface as possible and thus shorten the learning curve.  
           i.e. from the command line perspective kill works the same on a standalone machine as on a cluster.  Yes, you know it is more complicated internally, but the administrator doesn't have to learn any new options.

2) With a new set of tools, or with administrator visible extensions to existing tools.

           i.e. adding a --node parameter to kill.  (One more learning curve item for supporting clusters vs. standalone.)


It is my belief that any true server today is a potential future cluster.  

If we keep the future administrators learning curve issues in mind, then I believe that we will make a lot of normal Linux administrators jobs a lot easier, than if we ignore the learning curve issues.

I also suspect that SSI IPC will allow many non-cluster aware applications to suddenly be able to take advantage of the cluster.  For instance if you are migrating a set of applications from one node to another which use IPC amongst the set, then SSI IPC would allow the processes to be moved one at a time.

Obviously any high-performance applications which are gong to take advantage of the parallel processing features of the cluster, will require special coding regardless of the presence, or lack thereof, of SSI IPC.


Greg Freemyer
Internet Engineer
Deployment and Integration Specialist
The Norcross Group
www.NorcrossGroup.com


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:28:03 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16097AbRGLT1z>; Thu, 12 Jul 2001 21:27:55 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:12819 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16094AbRGLT1n>;
	Thu, 12 Jul 2001 21:27:43 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6CJOC6113746;
	Thu, 12 Jul 2001 15:24:12 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107121924.f6CJOC6113746@saturn.cs.uml.edu>
Subject: Re: re[2]: Clusterwide pids
To:	mumismo@wanadoo.es (Jordi Polo)
Date:	Thu, 12 Jul 2001 15:24:12 -0400 (EDT)
Cc:	linux-cluster@nl.linux.org
In-Reply-To: <01071221112000.02292@mioooldpc> from "Jordi Polo" at Jul 12, 2001 09:11:20 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Jordi Polo writes:

> Let us summarize a bit:
> -most people agrees we need unique CPID even if the process is something we 
> know is not going to migrate as init. 

OK, except just call it the PID.

> -that CPID will be node number + local pid

No. You could run out of PIDs this way, especially if you support
process migration. It is better to pass out chunks of PID space or
lists of free PIDs. When a node uses half of its allocation, it can
send a request for more PID space.

We have most of a 31-bit PID space available. We shouldn't waste
this. It would be nice to keep things compact on the smaller
clusters for "ps -efj" being sane, while allowing a higher limit
on the really big clusters.

> - We don't want to recompile userspace apps so we do our own system calls, 
> which  maybe :
> 	* getcpid to get the \302\27764bits? with the standard getpid
> returning the low 32 bits of the CPID
> 	* getnode(pid) with 0 if process if local , nonzero otherwise.
> Then we have a question left: \302\277How we manage to get unique CPIDs ?
> What number node a machine must have, \302\277read it from /etc?
> \302\277read it a la HDLC?

Hell no. Use the regular system call for getpid().

Linus already said the PID will not go past 32 bits.

Assign node number via the kernel command line, or assign it
when the cluster is joined.

Maybe you don't really want a shared PID space. It's not for
everyone. Let's keep this clean: share, or do not share.
The half-assed solutions will haunt us later.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:28:43 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16153AbRGLT20>; Thu, 12 Jul 2001 21:28:26 +0200
Received: from gw.xkey.com ([206.86.100.52]:38919 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16094AbRGLT2U>;
	Thu, 12 Jul 2001 21:28:20 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id MAA27382 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 12:28:18 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma027341; Thu Jul 12 12:26:44 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6CJTHF31025
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 15:29:17 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 15:29:17 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712152917.B31016@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010712191216.BPR3754.imf07bis.bellsouth.net@taz>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010712191216.BPR3754.imf07bis.bellsouth.net@taz>; from freemyer@NorcrossGroup.com on Thu, Jul 12, 2001 at 03:10:44PM -0400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 03:10:44PM -0400, Greg Freemyer wrote:

> If we keep the future administrators learning curve issues in mind,
> then I believe that we will make a lot of normal Linux
> administrators jobs a lot easier, than if we ignore the learning
> curve issues.

I wasn't ignoring the learning curve issues, so I don't know what "we"
you had in mind.

> I also suspect that SSI IPC will allow many non-cluster aware
> applications to suddenly be able to take advantage of the cluster.
> For instance if you are migrating a set of applications from one
> node to another which use IPC amongst the set, then SSI IPC would
> allow the processes to be moved one at a time.

What applications did you have in mind? In my many years of experience
with distributed and clustered computing in both the enterprise and
scientific arenas, I've run into relatively few applications that use
IPC amongst a set of processes, in a way that would mesh nicely with
SSI IPC. Of course my experience is only with a subset of the
universe, so I'd love to hear examples.

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:37:37 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16178AbRGLTha>; Thu, 12 Jul 2001 21:37:30 +0200
Received: from suntan.tandem.com ([192.216.221.8]:19861 "EHLO
	suntan.tandem.com") by humbolt.nl.linux.org with ESMTP
	id <S16172AbRGLThP>; Thu, 12 Jul 2001 21:37:15 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id MAA07564
	for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 12:37:07 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f6CJFPj09389; Thu, 12 Jul 2001 12:15:25 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107121915.f6CJFPj09389@kahuna.cag.cpqcorp.net>
Subject: Re: re[2]: Clusterwide pids
In-Reply-To: <01071221112000.02292@mioooldpc> from Jordi Polo at "Jul 12, 2001 09:11:20 pm"
To:	mumismo@wanadoo.es (Jordi Polo)
Date:	Thu, 12 Jul 2001 12:15:25 -0700 (PDT)
Cc:	linux-cluster@nl.linux.org
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=UNKNOWN-8BIT
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

> 
> Let us summarize a bit:
> -most people agrees we need unique CPID even if the process is something we 
> know is not going to migrate as init. 
> -that CPID will be node number + local pid

process creation node number + process creation uniquifier

> - We don't want to recompile userspace apps so we do our own system calls, 
> which  maybe :
> 	* getcpid to get the Â¿64bits? with the standard getpid returning the low 32 
> bits of the CPID
> 	* getnode(pid) with 0 if process if local , nonzero otherwise.

I certainly don't agree with this.  pid's are 32 bits so having both node number
and uniquifier is quite feasible (have been doing it for a dozen years).
This way it is all transparent, which, in the case of SSI clustering, is key.

> Then we have a question left: Â¿How we manage to get unique CPIDs ?
> What number node a machine must have, Â¿read it from /etc? Â¿read it a la HDLC?

The issue, as I mentioned earlier, is when/how does a node get a node number.
The DLCP idea is interesting.  Another option is something in lilo or ram disk.  
Specifying it must be in /etc is bad for those of us with a single root.
I like the idea of a "cluster" system call.  One of the subcommands of it would
be set your node number.  Different cluster implementations could gather that
number from different places.

bruce

> 
> i have missed something?
> 
> 
> --
> 
> Jordi
>    Student of Spain 
> 	  
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/
> 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:38:59 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16197AbRGLTil>; Thu, 12 Jul 2001 21:38:41 +0200
Received: from mhro1.mayo.edu ([129.176.212.21]:24042 "EHLO mhro1.mayo.edu")
	by humbolt.nl.linux.org with ESMTP id <S16176AbRGLTiS>;
	Thu, 12 Jul 2001 21:38:18 +0200
Received: from [172.23.52.30] by mhro1.mayo.edu with ESMTP; Thu, 12 Jul 2001 14:37:56 -0500
Message-Id: <3B4DFC94.E4B823D1@mayo.edu>
Date:	Thu, 12 Jul 2001 14:37:56 -0500
From:	Patrick Spinler <spinler.patrick@mayo.edu>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.18pre21 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Lindahl <lindahl@conservativecomputer.com>
CC:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <20010712191216.BPR3754.imf07bis.bellsouth.net@taz> <20010712152917.B31016@wumpus>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Lindahl wrote:
> 
> I've run into relatively few applications that use
> IPC amongst a set of processes, in a way that would mesh nicely with
> SSI IPC. Of course my experience is only with a subset of the
> universe, so I'd love to hear examples.

DBMS server ?
Distributed object application service(s) ?

-- Pat

-- 
      This message does not represent the policies or positions
	     of the Mayo Foundation or its subsidiaries.
  Patrick Spinler			email:	Spinler.Patrick@Mayo.EDU
  Mayo Foundation			phone:	507/284-9485

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:43:19 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16195AbRGLTnL>; Thu, 12 Jul 2001 21:43:11 +0200
Received: from gw.xkey.com ([206.86.100.52]:11784 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16198AbRGLTm5>;
	Thu, 12 Jul 2001 21:42:57 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id MAA29329 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 12:42:55 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma029325; Thu Jul 12 12:42:54 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6CJjWr31101
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 15:45:32 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 15:45:32 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712154532.A31069@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010712191216.BPR3754.imf07bis.bellsouth.net@taz> <20010712152917.B31016@wumpus> <3B4DFC94.E4B823D1@mayo.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3B4DFC94.E4B823D1@mayo.edu>; from spinler.patrick@mayo.edu on Thu, Jul 12, 2001 at 02:37:56PM -0500
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 02:37:56PM -0500, Patrick Spinler wrote:
> Greg Lindahl wrote:

> > I've run into relatively few applications that use
> > IPC amongst a set of processes, in a way that would mesh nicely with
> > SSI IPC. Of course my experience is only with a subset of the
> > universe, so I'd love to hear examples.
> 
> DBMS server ?

Which one? Oracle uses a distributed lock manager that doesn't use
normal IPC, and direct access to the storage from all the nodes, which
doesn't use normal IPC. I think DB2 for clusters is similar in
design. Single node databases are generally threaded and use a shared
memory model, not IPC.

> Distributed object application service(s) ?

Generally use things other than normal IPC: CORBA is one example.

g


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 21:52:51 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16194AbRGLTwn>; Thu, 12 Jul 2001 21:52:43 +0200
Received: from mhro1.mayo.edu ([129.176.212.21]:40944 "EHLO mhro1.mayo.edu")
	by humbolt.nl.linux.org with ESMTP id <S16185AbRGLTwd>;
	Thu, 12 Jul 2001 21:52:33 +0200
Received: from [172.23.52.30] by mhro1.mayo.edu with ESMTP; Thu, 12 Jul 2001 14:52:26 -0500
Message-Id: <3B4DFFFA.190AEB6A@mayo.edu>
Date:	Thu, 12 Jul 2001 14:52:26 -0500
From:	Patrick Spinler <spinler.patrick@mayo.edu>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.18pre21 i686)
X-Accept-Language: en
MIME-Version: 1.0
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>,
	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <200107121924.f6CJOC6113746@saturn.cs.uml.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"Albert D. Cahalan" wrote:
> 
> It is better to pass out chunks of PID space or
> lists of free PIDs. When a node uses half of its allocation, it can
> send a request for more PID space.

(snip)

> Assign node number via the kernel command line, or assign it
> when the cluster is joined.
> 

This scheme is quite straightforward, which is good. But in my limited
understanding it seems to place a cluster architechure in one of two
possibly difficult to resolve scenarios:

a) cluster_id's and pid ranges are hardcoded at boot time
   -) potentially difficult to administer large or highly dynamic
cluster(s)

b) node has to join cluster and request pid range's & cluster id before
starting any processes
   -) all key cluster membership management & control must be kernel
level
      (this may be necessary anyway)
   -) still have to assume certain processes & pid's are local node
only, 
      e.g. init is assumed to be pid 0 in many places, correct ?

How might you address these problems ?  Have I misunderstood ?

-- Pat

-- 
      This message does not represent the policies or positions
	     of the Mayo Foundation or its subsidiaries.
  Patrick Spinler			email:	Spinler.Patrick@Mayo.EDU
  Mayo Foundation			phone:	507/284-9485

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 22:11:14 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16182AbRGLULG>; Thu, 12 Jul 2001 22:11:06 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:47372 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16174AbRGLUKz>;
	Thu, 12 Jul 2001 22:10:55 +0200
Received: (qmail 213749 invoked from network); 12 Jul 2001 20:07:04 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 12 Jul 2001 20:07:04 -0000
Message-ID: <3B4E0296.CE76CDEA@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 15:03:34 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Freemyer <freemyer-ml@NorcrossGroup.com>
CC:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <20010710205903.RSCI797.imf15bis.bellsouth.net@taz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



Greg Freemyer wrote:
> 
> SInce this mailing list is dedicated to sharing cluster component technologies
> amongst the various Open Source cluster projects, Bruce Walker's comments got
> me thinking:
> 
> Definition:
> CPID - Clusterwide Process ID.  Guaranteed to be unique for each process across the entire cluster.
> 
> (please advise if there is a pre-existing Acronym for this.)
> 
> Assumptions:
> CPIDs are a useful concept and one that is used by several existing cluster solutions.
> 
> Sophisticated features, such as transparent clusterwide IPC, seem to require CPIDs


Without reading all the other posts in this thread, here's what I have to
say on the subject (yes, how arrogant.)

if per-machine PIDs  (LPID) remain as they are now, and CPID is created by
combining
the LPID with the node number, a CPID then becomes a structure of two
identifiers.

A process could have one or more CPIDs over its life, and might have
different
CPIDs in two different clusters that its node is part of.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 22:16:59 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16189AbRGLUQk>; Thu, 12 Jul 2001 22:16:40 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:64524 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16176AbRGLUQh>;
	Thu, 12 Jul 2001 22:16:37 +0200
Received: (qmail 213866 invoked from network); 12 Jul 2001 20:12:48 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 12 Jul 2001 20:12:48 -0000
Message-ID: <3B4E03EE.8DDBA268@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 15:09:18 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Freemyer <freemyer@NorcrossGroup.com>
CC:	irbis@orcero.org, Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Freemyer wrote:

> I was not trying to imply that one node could be simultaneously a member of a PVM cluster and a SSI for Linux Cluster.  That seems way beyond the current scope of things.  I guess it is conceivable that with enough common infrastructure it could be done, but I would not consider that a design goal.

I consider that a design goal.  A possible way to have simultaneous
operations of
incompatible systems sharing the same hardware might be to run them each in
their
own UML (User Mode Linux) sandboxes.


> What I was concerned about is that each of the membership algorithms and implementations has its own idiosyncrasies and we would need to insure that a CPID module would not have to know which membership algorithm it was working with.

Agreed.  the "what node is this?" function needs to be fully abstracted, to
the point where it would be possible to have one node be in several
differnt
clusters (with different numbers for itself) and it will give the correct
CPID depending on who is asking -- since the asking entity will provide the
context for determining the node-ID part of the CPID.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 22:28:38 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16190AbRGLU2a>; Thu, 12 Jul 2001 22:28:30 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:33549 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16179AbRGLU2T>;
	Thu, 12 Jul 2001 22:28:19 +0200
Received: (qmail 214004 invoked from network); 12 Jul 2001 20:24:30 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 12 Jul 2001 20:24:30 -0000
Message-ID: <3B4E06AB.32622387@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 15:20:59 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Freemyer <freemyer@NorcrossGroup.com>
CC:	irbis@orcero.org, linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <20010710235649.WZCR22093.imf16bis.bellsouth.net@taz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Freemyer wrote:
> 
> I guess that I am such a cluster convert, that I simply accept that full SSI type clusters
> are the future.  For IPC to ever have a SSI view of the cluster will require CPIDs, so I
> take it as a given that eventually, we will have to have a CPID solution.  My hope is
> that we have just one, not one per cluster technology.

Translation tables work fine.  There has to be a way for the OS to figure
out where
to deliver the signal and deliver it there. Mucking around with the Local
PID to
implement Cluster PID may be overkill.  So what if ps on node seven doesn't
show everything running on node nine?




> 
> Given the Alpha and the imminent arrival of the Itanium, I would prefer to see a 64 bit CPID with the first 32 bits for the node and the last 32 bits for the local pid.
> 
> I don't have Alpha Linux installed on anything right now, so I'm not sure what size pid it uses, but it seems reasonable to me for the 64-bit processors to use a 64-bit CPID.

if the LPID is kept separate and untouched by the clustering code, and the
CPID made
a two-part structure of NodeIdentifier and LPID, local code is left alone
and
clustering code gets the appropriate size CPID depending on the limits of
the clustering scheme (since everyone has node ids of some kind.)


 
>  >>  The main idea would be doing mapping, as PID does. If you ask for a
>  >>  16-bit PID, you reference the local PID. If you ask for a 32-bit PID, it
>  >>  would give to you the global PID. This would mean that all the old
>  >>  application will run, but it also means that we will need more kernel
>  >>  calls. As an example, we keep:

Any bit-counting limits future expansion and portability.  What if I want
to embed the structure with my 4-bit project, just for fun?  Keeping
LPID and CPID as abstract as possible (but no abstracter) allows full
flexibility.


> This would allow a simple recompile would give people a chance to know about the new calls and a fairly easy way to stick with the current pids.

"a simple recomlile"  -- so much for transparency and binary compabability!

 
> In addition, it might be nice to actually say getcpid() and getpcpid() instead of getpid32() and getppid32().  This would make developers more aware of the availability of the cluster infrastructure as opposed to thinking it was a simple pid expansion.

I Agree.  Define them both in terms of a (NodeID, LPID) structure and you
have, FWIW, my full support.


-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 22:40:06 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16211AbRGLUjs>; Thu, 12 Jul 2001 22:39:48 +0200
Received: from 37-MADR-X33.libre.retevision.es ([62.83.4.37]:26119 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16179AbRGLUja>;
	Thu, 12 Jul 2001 22:39:30 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id E6B612F4BE; Thu, 12 Jul 2001 22:45:38 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Subject: Re: re[2]: Clusterwide pids
Date:	Thu, 12 Jul 2001 22:45:38 +0200
X-Mailer: KMail [version 1.2]
References: <200107121915.f6CJFPj09389@kahuna.cag.cpqcorp.net>
In-Reply-To: <200107121915.f6CJFPj09389@kahuna.cag.cpqcorp.net>
Cc:	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071222453803.02292@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

El Jueves 12 Julio 2001 21:15, escribiste:
> > Let us summarize a bit:
> > -most people agrees we need unique CPID even if the process is something
> > we know is not going to migrate as init.
> > -that CPID will be node number + local pid
>
> process creation node number + process creation uniquifier
that's what i wanted to say 

>
> > - We don't want to recompile userspace apps so we do our own system
> > calls, which  maybe :
> > 	* getcpid to get the Â¿64bits? with the standard getpid returning the low
> > 32 bits of the CPID
> > 	* getnode(pid) with 0 if process if local , nonzero otherwise.
>
> I certainly don't agree with this.  pid's are 32 bits so having both node
> number and uniquifier is quite feasible (have been doing it for a dozen
> years). This way it is all transparent, which, in the case of SSI
> clustering, is key.
I think we are talking about the same, i gave 2 posibilities , it seems  you 
think the second is right . if so i agree with you. 
We have to think the our changes in the kernel are better if they are less 
intrusive and i think changing pid size is not the way , we just add a 
structure in the task struct where one of the fields is the node number.
More fields must be discussed as soon as we decide something about this 
issue. 


> > Then we have a question left: Â¿How we manage to get unique CPIDs ?
> > What number node a machine must have, Â¿read it from /etc? Â¿read it a la
> > HDLC?
>
> The issue, as I mentioned earlier, is when/how does a node get a node
> number. The DLCP idea is interesting.  Another option is something in lilo
> or ram disk. Specifying it must be in /etc is bad for those of us with a
> single root. I like the idea of a "cluster" system call.  One of the
> subcommands of it would be set your node number.  Different cluster
> implementations could gather that number from different places.

even with a single root you're having diferent configurations for every 
machine (ip, inittab ...) so just add a new file with the node id. If you are 
managing all the cluster nodes, you can just  give every node a different 
number, the problem comes when you just want to connect your laptop to the 
cluster and you have no idea about what node numbers are being used.  

--
Jordi  
  Student of Spain 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 22:40:25 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16217AbRGLUkS>; Thu, 12 Jul 2001 22:40:18 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:270 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16205AbRGLUkA>;
	Thu, 12 Jul 2001 22:40:00 +0200
Received: (qmail 214227 invoked from network); 12 Jul 2001 20:36:11 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 12 Jul 2001 20:36:11 -0000
Message-ID: <3B4E0968.2678BF43@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 15:32:40 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Lars Marowsky-Bree <lmb@suse.de>
CC:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz> <200107102347.f6ANlrY09648@kahuna.cag.cpqcorp.net> <20010711103138.D1904@marowsky-bree.de>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Lars Marowsky-Bree wrote:

> 
> Hierarchial clusters are hard enough, being a member of two clusters at once
> is an interesting problem - what possible advantages do you see here?

Since it was part of the early posts to this list, I'll try to answer
this one:

Multiple clusters at once simplifies hierarchical clusters.  In fact
it may be considered a prerequisite for it.

Say you've got ten universites, each with ten labs, each with
ten machines.  Each lab is administered seperately, but as part
of an Internet2 initiative all hundred labs have received a grant
to implement a process migration cluster.

If this was to be done using COTS MOSIX, all thousand machines would
get listed in a map file which is shared by all thousand.

With multiple clusters, each lab can form one cluster, one machine
in each lab can be designated as the representative to the U. cluster,
and one of these ten (or maybe a designated server) can be the
designated representative to the wide area cluster.

Not only is administration greatly simplified, since you no longer
need to touch the map files in Iowa City when a DHCP lease
expires in San Bernadino, but the migration decisions at the
leaf node level are eased: the leaf nodes do not need to even
consider machines outside of their immediate bailiwick.

A dually connected machine might advertise as available resources
the other clusters connected to it.

and so forth.


Still waiting for that grant by the way -- I guess I'd better get
on writing the application!


-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 22:47:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16221AbRGLUrX>; Thu, 12 Jul 2001 22:47:23 +0200
Received: from gw.xkey.com ([206.86.100.52]:64266 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16223AbRGLUrK>;
	Thu, 12 Jul 2001 22:47:10 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id NAA00467 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 13:47:08 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma000459; Thu Jul 12 13:47:03 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6CKnfm31318
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 16:49:41 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 16:49:41 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712164941.A31314@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010710225557.KBMP10148.imf08bis.bellsouth.net@taz> <200107102347.f6ANlrY09648@kahuna.cag.cpqcorp.net> <20010711103138.D1904@marowsky-bree.de> <3B4E0968.2678BF43@cstp.umkc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3B4E0968.2678BF43@cstp.umkc.edu>; from dnicol@cstp.umkc.edu on Thu, Jul 12, 2001 at 03:32:40PM -0500
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 03:32:40PM -0500, David L. Nicol wrote:

> Say you've got ten universites, each with ten labs, each with
> ten machines.  Each lab is administered seperately, but as part
> of an Internet2 initiative all hundred labs have received a grant
> to implement a process migration cluster.

So you want to do something that Condor already does well without
using a cluster, by using a cluster? Sounds like using a screwdriver
to pound in a nail. Depends on the exact details, of course, but I
frequently see people using a "cluster" screwdriver on a nail.

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 23:18:52 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16218AbRGLVSf>; Thu, 12 Jul 2001 23:18:35 +0200
Received: from web9207.mail.yahoo.com ([216.136.129.40]:51467 "HELO
	web9207.mail.yahoo.com") by humbolt.nl.linux.org with SMTP
	id <S16208AbRGLVSY>; Thu, 12 Jul 2001 23:18:24 +0200
Message-ID: <20010712211808.53179.qmail@web9207.mail.yahoo.com>
Received: from [192.148.12.85] by web9207.mail.yahoo.com via HTTP; Thu, 12 Jul 2001 14:18:08 PDT
Date:	Thu, 12 Jul 2001 14:18:08 -0700 (PDT)
From:	Peter Badovinatz <tabmowzo@yahoo.com>
Subject: Re: Clusterwide pids
To:	linux-cluster@nl.linux.org
Cc:	Greg Lindahl <lindahl@conservativecomputer.com>
In-Reply-To: <20010712154532.A31069@wumpus>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


--- Greg Lindahl <lindahl@conservativecomputer.com> wrote:
> On Thu, Jul 12, 2001 at 02:37:56PM -0500, Patrick Spinler wrote:
> > Greg Lindahl wrote:
> 
> > > I've run into relatively few applications that use
> > > IPC amongst a set of processes, in a way that would mesh nicely with
> > > SSI IPC. Of course my experience is only with a subset of the
> > > universe, so I'd love to hear examples.
> > 
> > DBMS server ?
> 
> Which one? Oracle uses a distributed lock manager that doesn't use
> normal IPC, and direct access to the storage from all the nodes, which
> doesn't use normal IPC. I think DB2 for clusters is similar in
> design. Single node databases are generally threaded and use a shared
> memory model, not IPC.

Clustered DB2 expects each "instance" to have direct access to its storage for
its partition of the database.  A single physical node can host multiple
instances, although for performance reasons this isn't usually done (except in
failover cases.)  Each instance expects to have exlusive access to the storage
hosting its partition of data, failover requires the storage to be accessible
from multiple nodes, but there is no concurrent access of the storage.

Although a single instance is made up of many processes, and I do believe that
IPC is used a little bit among these processes, there isn't much need or reason
to migrate them off of the physical node one at a time.  The perfomance loss as
these processes use IPC to communicate across a network instead of local memory
would eat up whatever time you think you're saving.  To move an instance to
another node you currently have to stop it, move over the storage access, then
restart the instance processes. 

Using existing failover clustering techniques on Linux or other UNICES this
takes a few seconds.  If a node fails, then you do 'standard' recovery, which
differs not all in scope between an SSI or a non-SSI cluster -- in both cases
the physical node failed, you have to start up the processes somewhere else
where you have the proper storage access.

In any case, migrating the process means that you still have a "blackout"
period where the process instance can't accept/answer requests.  Not that this
says that SSI is or isn't valid, but for this scenario non-SSI clustering
techniques are just as valid and offer the same level of function.
> 
> > Distributed object application service(s) ?
> 
> Generally use things other than normal IPC: CORBA is one example.
> 
> g

Peter


=====
These have been the opinions of:
Peter R. Badovinatz -- (503)578-5530 (TL 775)
wombat@us.ibm.com/tabmowzo@yahoo.com
and in no way should be construed as official opinion of 
IBM, Corp.

__________________________________________________
Do You Yahoo!?
Get personalized email addresses from Yahoo! Mail
http://personal.mail.yahoo.com/

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Thu Jul 12 23:42:17 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16226AbRGLVmC>; Thu, 12 Jul 2001 23:42:02 +0200
Received: from inet-mail3.oracle.com ([148.87.2.203]:34785 "EHLO
	inet-mail3.oracle.com") by humbolt.nl.linux.org with ESMTP
	id <S16208AbRGLVll>; Thu, 12 Jul 2001 23:41:41 +0200
Received: from gmgw01.us.oracle.com (gmgw01.us.oracle.com [130.35.249.115])
	by inet-mail3.oracle.com (Switch-2.1.3/Switch-2.1.0) with ESMTP id f6CLbX909448;
	Thu, 12 Jul 2001 14:37:33 -0700 (PDT)
Received: from oracle.com (dbrower-sun.us.oracle.com [130.35.180.64])
	by gmgw01.us.oracle.com (Switch-2.1.1/Switch-2.1.0) with ESMTP id f6CLfTj20853;
	Thu, 12 Jul 2001 14:41:29 -0700 (PDT)
Message-ID: <3B4E1989.6FAD51C4@oracle.com>
Date:	Thu, 12 Jul 2001 14:41:29 -0700
From:	David Brower <David.Brower@oracle.com>
Organization: Oracle Corporation
X-Mailer: Mozilla 4.7 [en] (X11; U; SunOS 5.6 sun4u)
X-Accept-Language: en
MIME-Version: 1.0
To:	Peter Badovinatz <tabmowzo@yahoo.com>
CC:	linux-cluster@nl.linux.org,
	Greg Lindahl <lindahl@conservativecomputer.com>
Subject: Re: Clusterwide pids
References: <20010712211808.53179.qmail@web9207.mail.yahoo.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

As a database guy, I've been underinspired by the SSI cluster with
process/IPC migration for about the same reasons.  Applications that
are really concerned about performance and h/a will be and are written
in a cluster-aware manner.  They take very careful account of the
distinctions between nodes, and appropriate use of shared memory,
IPC and network access.   Where fault tolerance is needed, they
use their own mechanisms.  It seems to me that an application that
is having processes migrated around, dragging IPC connections
along is one that is going to have a host of performance issues.

For instance, imagine a web server running an apache with JVMs
to run servelets.  Does it really make sense to migrate some
of those processes to another machine without moving them all?
Is this something you're going to really want to do during normal
operation, transparently?  Aren't you just as well to have enabled
the service on the other node, and then gracefully shut this one
down?

I guess I'd like to see a convincing example of an application that
really needs SSI IPC semantics, and wouldn't be better served
by knowing what was going on itself.

-dB   

Peter Badovinatz wrote:
> 
> --- Greg Lindahl <lindahl@conservativecomputer.com> wrote:
> > On Thu, Jul 12, 2001 at 02:37:56PM -0500, Patrick Spinler wrote:
> > > Greg Lindahl wrote:
> >
> > > > I've run into relatively few applications that use
> > > > IPC amongst a set of processes, in a way that would mesh nicely with
> > > > SSI IPC. Of course my experience is only with a subset of the
> > > > universe, so I'd love to hear examples.
> > >
> > > DBMS server ?
> >
> > Which one? Oracle uses a distributed lock manager that doesn't use
> > normal IPC, and direct access to the storage from all the nodes, which
> > doesn't use normal IPC. I think DB2 for clusters is similar in
> > design. Single node databases are generally threaded and use a shared
> > memory model, not IPC.
> 
> Clustered DB2 expects each "instance" to have direct access to its storage for
> its partition of the database.  A single physical node can host multiple
> instances, although for performance reasons this isn't usually done (except in
> failover cases.)  Each instance expects to have exlusive access to the storage
> hosting its partition of data, failover requires the storage to be accessible
> from multiple nodes, but there is no concurrent access of the storage.
> 
> Although a single instance is made up of many processes, and I do believe that
> IPC is used a little bit among these processes, there isn't much need or reason
> to migrate them off of the physical node one at a time.  The perfomance loss as
> these processes use IPC to communicate across a network instead of local memory
> would eat up whatever time you think you're saving.  To move an instance to
> another node you currently have to stop it, move over the storage access, then
> restart the instance processes.
> 
> Using existing failover clustering techniques on Linux or other UNICES this
> takes a few seconds.  If a node fails, then you do 'standard' recovery, which
> differs not all in scope between an SSI or a non-SSI cluster -- in both cases
> the physical node failed, you have to start up the processes somewhere else
> where you have the proper storage access.
> 
> In any case, migrating the process means that you still have a "blackout"
> period where the process instance can't accept/answer requests.  Not that this
> says that SSI is or isn't valid, but for this scenario non-SSI clustering
> techniques are just as valid and offer the same level of function.
> >
> > > Distributed object application service(s) ?
> >
> > Generally use things other than normal IPC: CORBA is one example.
> >
> > g
> 
> Peter

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 00:06:15 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16246AbRGLWGF>; Fri, 13 Jul 2001 00:06:05 +0200
Received: from jalon.able.es ([212.97.163.2]:38828 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16239AbRGLWFz>;
	Fri, 13 Jul 2001 00:05:55 +0200
Received: from werewolf.able.es ([212.97.168.128]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGDS2P00.Q2J; Fri, 13 Jul 2001 00:06:25 +0200 
Date:	Fri, 13 Jul 2001 00:07:45 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	Chris Wright <chris@wirex.com>
Cc:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010713000745.A30317@werewolf.able.es>
References: <20010712102050.W25392@figure1.int.wirex.com> <200107121747.f6CHlC4105822@saturn.cs.uml.edu> <20010712112342.X25392@figure1.int.wirex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <20010712112342.X25392@figure1.int.wirex.com>; from chris@wirex.com on Thu, Jul 12, 2001 at 20:23:42 +0200
X-Mailer: Balsa 1.1.6
Content-Length:	767
Lines:	24
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010712 Chris Wright wrote:
>* Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
>> Chris Wright writes:
>> > * Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
>> >> 
>> >> 1. boot the single node without joining the cluster
>> >> 2. fix the node
>> >> 3. reboot to join the cluster
>> > 

Sure it has been talked about before, but I do not remember any conclussion.
Which is the problem with booting like

linux nodeid=47

and let the cluster admin be sure there is no duplicate id ?
Then node just boots and joins the cluster in user-space.

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac2 #1 SMP Sun Jul 8 23:57:11 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 00:33:07 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16243AbRGLWc7>; Fri, 13 Jul 2001 00:32:59 +0200
Received: from 234-MADR-X55.libre.retevision.es ([62.83.16.234]:50951 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16247AbRGLWcp>;
	Fri, 13 Jul 2001 00:32:45 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id 300692F4BE; Fri, 13 Jul 2001 00:39:00 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"J . A . Magallon" <jamagallon@able.es>
Subject: Re: Clusterwide pids
Date:	Fri, 13 Jul 2001 00:38:58 +0200
X-Mailer: KMail [version 1.2]
References: <20010712102050.W25392@figure1.int.wirex.com> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es>
In-Reply-To: <20010713000745.A30317@werewolf.able.es>
Cc:	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071300385805.02292@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

El Viernes 13 Julio 2001 00:07, escribiste:
> On 20010712 Chris Wright wrote:
> >* Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> >> Chris Wright writes:
> >> > * Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> >> >> 1. boot the single node without joining the cluster
> >> >> 2. fix the node
> >> >> 3. reboot to join the cluster
>
> Sure it has been talked about before, but I do not remember any
> conclussion. Which is the problem with booting like
>
> linux nodeid=47
>
> and let the cluster admin be sure there is no duplicate id ?
> Then node just boots and joins the cluster in user-space.

The problem comes when you just want to connect your laptop to the 
cluster and you have no idea about what node numbers are being used.  
You want also that your machines already started join the cluster.

Your way surely works but what about machines that select a unique node id 
automagically? If we want something like that the best moment to begin 
developing it is now, later it will be much harder make a more flexible 
aproximation than a boot option.

--
Jordi
  Student of Spain 
 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 00:46:29 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16249AbRGLWqW>; Fri, 13 Jul 2001 00:46:22 +0200
Received: from mercury.mv.net ([199.125.85.40]:12811 "EHLO mercury.mv.net")
	by humbolt.nl.linux.org with ESMTP id <S16228AbRGLWqI>;
	Fri, 13 Jul 2001 00:46:08 +0200
Received: from filesrus (bnh-6-05.mv.com [199.125.98.69]) by mercury.mv.net (8.8.8/mem-971025) with SMTP id SAA14301; Thu, 12 Jul 2001 18:46:01 -0400 (EDT)
Message-ID: <03c101c10b25$007ef3e0$45627dc7@filesrus>
From:	"Bill Todd" <billtodd@foo.mv.com>
To:	"David Brower" <David.Brower@oracle.com>,
	"Peter Badovinatz" <tabmowzo@yahoo.com>
Cc:	<linux-cluster@nl.linux.org>,
	"Greg Lindahl" <lindahl@conservativecomputer.com>
References: <20010712211808.53179.qmail@web9207.mail.yahoo.com> <3B4E1989.6FAD51C4@oracle.com>
Subject: Re: Clusterwide pids
Date:	Thu, 12 Jul 2001 18:50:07 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


----- Original Message -----
From: "David Brower" <David.Brower@oracle.com>
To: "Peter Badovinatz" <tabmowzo@yahoo.com>
Cc: <linux-cluster@nl.linux.org>; "Greg Lindahl"
<lindahl@conservativecomputer.com>
Sent: Thursday, July 12, 2001 5:41 PM
Subject: Re: Clusterwide pids


> As a database guy, I've been underinspired by the SSI cluster with
> process/IPC migration for about the same reasons.  Applications that
> are really concerned about performance and h/a will be and are written
> in a cluster-aware manner.  They take very careful account of the
> distinctions between nodes, and appropriate use of shared memory,
> IPC and network access.   Where fault tolerance is needed, they
> use their own mechanisms.  It seems to me that an application that
> is having processes migrated around, dragging IPC connections
> along is one that is going to have a host of performance issues.
>
> For instance, imagine a web server running an apache with JVMs
> to run servelets.  Does it really make sense to migrate some
> of those processes to another machine without moving them all?
> Is this something you're going to really want to do during normal
> operation, transparently?  Aren't you just as well to have enabled
> the service on the other node, and then gracefully shut this one
> down?
>
> I guess I'd like to see a convincing example of an application that
> really needs SSI IPC semantics, and wouldn't be better served
> by knowing what was going on itself.

How about any multi-process application running on a large, partitioned
SMP/NUMA system that communicates via shared-memory mechanisms and migrates
processes for load-balancing among the partitions (which may also be
configured to provide high-availability in the case of a failure)?  That's
structured pretty much as a 'normal' cluster is, but lacks the negative
communication characteristics you assume above.

Whether such a configuration should be considered an important target is a
separate issue.

- bill



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 01:02:16 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16260AbRGLXCB>; Fri, 13 Jul 2001 01:02:01 +0200
Received: from jalon.able.es ([212.97.163.2]:35246 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16252AbRGLXBq>;
	Fri, 13 Jul 2001 01:01:46 +0200
Received: from werewolf.able.es ([212.97.168.128]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGDUNP00.424; Fri, 13 Jul 2001 01:02:13 +0200 
Date:	Fri, 13 Jul 2001 01:03:35 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	Jordi Polo <mumismo@wanadoo.es>
Cc:	linux-cluster@nl.linux.org
Subject: Re: re[2]: Clusterwide pids
Message-ID: <20010713010335.A30345@werewolf.able.es>
References: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz> <01071221112000.02292@mioooldpc>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <01071221112000.02292@mioooldpc>; from mumismo@wanadoo.es on Thu, Jul 12, 2001 at 21:11:20 +0200
X-Mailer: Balsa 1.1.6
Content-Length:	998
Lines:	25
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010712 Jordi Polo wrote:
>- We don't want to recompile userspace apps so we do our own system calls, 
>which  maybe :
>	* getcpid to get the Â¿64bits? with the standard getpid returning the low 32 
>bits of the CPID
>	* getnode(pid) with 0 if process if local , nonzero otherwise.
>Then we have a question left: Â¿How we manage to get unique CPIDs ?
>What number node a machine must have, Â¿read it from /etc? Â¿read it a la HDLC?
>

Really you want (will use nid for symmetry with pid):
- getlnid(): get local node id
- getlpid(): get local pid
- getcpid(): get full cpid (simple way is just (nid << 16)|pid)
- getpnid(): parent node id (where did I start?)

- getpid(), getppid(): ??? returning lpid or cpid ??? cpid allows standard user tools
	to work over cluster.

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac2 #1 SMP Sun Jul 8 23:57:11 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 01:16:10 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16069AbRGLXQC>; Fri, 13 Jul 2001 01:16:02 +0200
Received: from jalon.able.es ([212.97.163.2]:61614 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16060AbRGLXPx>;
	Fri, 13 Jul 2001 01:15:53 +0200
Received: from werewolf.able.es ([212.97.168.128]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGDVBC00.U2U; Fri, 13 Jul 2001 01:16:24 +0200 
Date:	Fri, 13 Jul 2001 01:17:45 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	Jordi Polo <mumismo@wanadoo.es>
Cc:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010713011745.C30345@werewolf.able.es>
References: <20010712102050.W25392@figure1.int.wirex.com> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es> <01071300385805.02292@mioooldpc>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <01071300385805.02292@mioooldpc>; from mumismo@wanadoo.es on Fri, Jul 13, 2001 at 00:38:58 +0200
X-Mailer: Balsa 1.1.6
Content-Length:	1354
Lines:	37
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010713 Jordi Polo wrote:
>> conclussion. Which is the problem with booting like
>>
>> linux nodeid=47
>>
>> and let the cluster admin be sure there is no duplicate id ?
>> Then node just boots and joins the cluster in user-space.
>
>The problem comes when you just want to connect your laptop to the 
>cluster and you have no idea about what node numbers are being used.  
>You want also that your machines already started join the cluster.
>

Well, what is a cluster ? Why would you need to plug a laptop on a cluster ?
What is the problem with rebooting to change a node id ?
I suppose you want to run a cluster because you have a HA application or a HP one
that needs days-weeks-months to complete. You see a cluster like a net,
and I see it more like a crossbar to plug CPUs. You do not play plugging
and unpluging your pentiums on a 2 way box.

>Your way surely works but what about machines that select a unique node id 
>automagically?

What for ?

>If we want something like that the best moment to begin 
>developing it is now, later it will be much harder make a more flexible 
>aproximation than a boot option.
>


-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac2 #1 SMP Sun Jul 8 23:57:11 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 01:48:18 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16139AbRGLXsK>; Fri, 13 Jul 2001 01:48:10 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:43013 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16093AbRGLXrw>;
	Fri, 13 Jul 2001 01:47:52 +0200
Received: (qmail 217185 invoked from network); 12 Jul 2001 23:44:03 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 12 Jul 2001 23:44:02 -0000
Message-ID: <3B4E356F.72005037@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 18:40:31 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	"J . A . Magallon" <jamagallon@able.es>
CC:	Jordi Polo <mumismo@wanadoo.es>, linux-cluster@nl.linux.org
Subject: cluster vs. grid
References: <20010712102050.W25392@figure1.int.wirex.com> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es> <01071300385805.02292@mioooldpc> <20010713011745.C30345@werewolf.able.es>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"J . A . Magallon" wrote:
> 
> Well, what is a cluster ? Why would you need to plug a laptop on a cluster ?
> What is the problem with rebooting to change a node id ?

Maybe we should start calling it a "grid" instead of a cluster :)

Using the adaptive pool or half-duplex pool architechtures as 
described at www.mosix.org, with mosix 1.0.* installed on your
laptop, you could plug your laptop into a network jack anywhere
at a mosix-enabled site, get a DHCP address, get a local mosix.map
file from the cluster administrator, and demonstrate a massively
parallel application headquartersed in your laptop which would
use the lab machines as compute nodes.

That's today.

In the future, after our work in this WG is done, we'll be able to
let you do that 

	--> without inconveniencing anyone else if you run rabbits

	--> without needing to change your map file


Rebooting to reconfigure a service?  What do you think this is,
Microsoft?



-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:02:49 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16242AbRGMACl>; Fri, 13 Jul 2001 02:02:41 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:8966 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16092AbRGMACX>;
	Fri, 13 Jul 2001 02:02:23 +0200
Received: (qmail 217348 invoked from network); 12 Jul 2001 23:58:34 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 12 Jul 2001 23:58:34 -0000
Message-ID: <3B4E38D7.242EF502@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 18:55:03 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Patrick Spinler <spinler.patrick@mayo.edu>
CC:	"Albert D. Cahalan" <acahalan@cs.uml.edu>,
	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <200107121924.f6CJOC6113746@saturn.cs.uml.edu> <3B4DFFFA.190AEB6A@mayo.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Patrick Spinler wrote:
> 
> "Albert D. Cahalan" wrote:
> >
> > It is better to pass out chunks of PID space or
> > lists of free PIDs. 
...

> This scheme is quite straightforward, which is good. But in my limited
> understanding it seems to place a cluster architechure in one of two
> possibly difficult to resolve scenarios:
> 
> a) cluster_id's and pid ranges are hardcoded at boot time
>    -) potentially difficult to administer large or highly dynamic
> cluster(s)
> 
> b) node has to join cluster and request pid range's & cluster id before
> starting any processes
>    -) all key cluster membership management & control must be kernel
> level
>       (this may be necessary anyway)
>    -) still have to assume certain processes & pid's are local node
> only,
>       e.g. init is assumed to be pid 0 in many places, correct ?
> 
> How might you address these problems ?  Have I misunderstood ?
> 
> -- Pat

Reserve a range (<8K, for instance) for nonmigratable on any machine,
then divide up the higher PID numbers.  complex machines still might
need translation tables, but trivial clusters wouldn't any more.  
B*-like division of PID space could then be static or dynamic.

Early Beowulf (bproc?) used hard PID division, then later switched
to translation tables, as I vaguely understand it.  Anyone reading
this care to explain why, or post an URL to the discussion?

-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:13:17 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16250AbRGMANJ>; Fri, 13 Jul 2001 02:13:09 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:34566 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16092AbRGMANA>;
	Fri, 13 Jul 2001 02:13:00 +0200
Received: (qmail 217534 invoked from network); 13 Jul 2001 00:09:10 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 13 Jul 2001 00:09:10 -0000
Message-ID: <3B4E3B52.44C093A7@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 19:05:38 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
CC:	Jordi Polo <mumismo@wanadoo.es>, linux-cluster@nl.linux.org
Subject: the "cluster" system call
References: <200107121915.f6CJFPj09389@kahuna.cag.cpqcorp.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Bruce Walker wrote:
 
> The issue, as I mentioned earlier, is when/how does a node get a node number.
> The DLCP idea is interesting.  Another option is something in lilo or ram disk.
> Specifying it must be in /etc is bad for those of us with a single root.
> I like the idea of a "cluster" system call.  One of the subcommands of it would
> be set your node number.  Different cluster implementations could gather that
> number from different places.
> 
> bruce


As I mentioned earlier (months ago) :)
it could be a mountable fake file system.

mount special-file mount-point -t cluster-type

The mount-point would be a directory where all the 
control interfaces, including a standard subset and
whatever extensions the particular system adds on, will
live.

The special-file would contain the configuration info
for this cluster membership.

the cluster-type would be the clustering discipline to 
give the special-file to, to set itself up.  Mount might
be able to figure out what kind it is on its own.

I was about to write a completely user-mode system based
on unix-domain sockets this spring but got distracted.

-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:14:07 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16137AbRGMAOA>; Fri, 13 Jul 2001 02:14:00 +0200
Received: from 104-MADR-X46.libre.retevision.es ([62.83.25.104]:10500 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16252AbRGMANx>;
	Fri, 13 Jul 2001 02:13:53 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP id 6C8F22FABC
	for <linux-cluster@nl.linux.org>; Fri, 13 Jul 2001 02:20:11 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	linux-cluster@nl.linux.org
Subject: Re: cluster vs. grid
Date:	Fri, 13 Jul 2001 02:20:11 +0200
X-Mailer: KMail [version 1.2]
MIME-Version: 1.0
Message-Id: <01071302201101.00494@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list




El Viernes 13 Julio 2001 01:40, escribiste:
> "J . A . Magallon" wrote:
> > Well, what is a cluster ? Why would you need to plug a laptop on a
> > cluster ? What is the problem with rebooting to change a node id ?
>
> Maybe we should start calling it a "grid" instead of a cluster :)
>
> Using the adaptive pool or half-duplex pool architechtures as
> described at www.mosix.org, with mosix 1.0.* installed on your
> laptop, you could plug your laptop into a network jack anywhere
> at a mosix-enabled site, get a DHCP address, get a local mosix.map
> file from the cluster administrator, and demonstrate a massively
> parallel application headquartersed in your laptop which would
> use the lab machines as compute nodes.

MOSIX was exactly what i had in mind , about getting the mosix.map (node
number) i see 3 aproach:
-local: in /etc/ and boot time, i don't like it because you need to know all
the other node number to avoid conflicts. Easy to implement.
-Centralized : we have a DHCP server that knows every configuration and cares
about unique node numbers. If you have your own configuration that server can
be a node-number server . Easy to implement
-Distributed: we don't need a centralized server (we can add machines if the
server is down) so more fault tolerance but harder to implement as we need
have a lot of care to add a machine to the cluster . I think this approach is
the right one . We need a little protocol to add a machine (for instance we
reserve a node number for machines that wants to enter the cluster). I don't
mean a substitute of DHCP, this approach is only to give a node number to the
new machine .

> That's today.
>
> In the future, after our work in this WG is done, we'll be able to
> let you do that
>
> 	--> without inconveniencing anyone else if you run rabbits
>
> 	--> without needing to change your map file
>
>
> Rebooting to reconfigure a service?  What do you think this is,
> Microsoft?

--
Jordi
  Student of Spain

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:17:20 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16270AbRGMARK>; Fri, 13 Jul 2001 02:17:10 +0200
Received: from 104-MADR-X46.libre.retevision.es ([62.83.25.104]:11268 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16267AbRGMARC>;
	Fri, 13 Jul 2001 02:17:02 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id B397A2FABC; Fri, 13 Jul 2001 02:23:21 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"J . A . Magallon" <jamagallon@able.es>
Subject: Re: re[2]: Clusterwide pids
Date:	Fri, 13 Jul 2001 02:23:21 +0200
X-Mailer: KMail [version 1.2]
References: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz> <01071221112000.02292@mioooldpc> <20010713010335.A30345@werewolf.able.es>
In-Reply-To: <20010713010335.A30345@werewolf.able.es>
Cc:	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071302232102.00494@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

El Viernes 13 Julio 2001 01:03, escribiste:
> On 20010712 Jordi Polo wrote:
> >- We don't want to recompile userspace apps so we do our own system calls,
> >which  maybe :
> >	* getcpid to get the Â¿64bits? with the standard getpid returning the low
> > 32 bits of the CPID
> >	* getnode(pid) with 0 if process if local , nonzero otherwise.
> >Then we have a question left: Â¿How we manage to get unique CPIDs ?
> >What number node a machine must have, Â¿read it from /etc? Â¿read it a la
> > HDLC?
>
> Really you want (will use nid for symmetry with pid):
> - getlnid(): get local node id
> - getlpid(): get local pid
> - getcpid(): get full cpid (simple way is just (nid << 16)|pid)
> - getpnid(): parent node id (where did I start?)
>
> - getpid(), getppid(): ??? returning lpid or cpid ??? cpid allows standard
> user tools to work over cluster.

i think your aproach is that cpid is 32 bits and userspace expect a 15 bits 
pid, so i don't think changing getpid() is a good thing     

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:24:30 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16274AbRGMAYN>; Fri, 13 Jul 2001 02:24:13 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:61446 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16273AbRGMAXt>;
	Fri, 13 Jul 2001 02:23:49 +0200
Received: (qmail 217683 invoked from network); 13 Jul 2001 00:19:59 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 13 Jul 2001 00:19:59 -0000
Message-ID: <3B4E3DDB.7AF5C297@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 19:16:27 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Freemyer <freemyer@NorcrossGroup.com>
CC:	Chris Wright <chris@wirex.com>, linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
References: <20010712181057.CDSZ12972.imf09bis.bellsouth.net@taz>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Freemyer wrote:
> 
> I think you are missing the point.
> 
> IF THE GOAL ...
>                         is to get to a cluster with totally transparent IPC capability across the cluster (SSI IPC) then you have to have a unique pid for each process in the cluster.
> 
> For instance:   kill nnn
> must send a signal to just one process, not to several.

If the husk a process leaves behind when it migrates away sets up signal
handlers
that call the process back and delivers the process, you're set.


> Ignoring the performance gains potentially available with SSI IPC, one of the points of SSI IPC is to allow administration of the cluster from any node.
> 
> For instance, if you have local pids for local processes and CPIDs only for cluster processes, then you have the difficult administrative task of telneting to the appropriate node to kill a runaway local process.

rsh appropriate kill -9 2644

 
> Yes, you can modify kill to accept a node parameter, but IPC is commonly used throughout lots of different admin tools, and it would get extremely difficult to support this mechanism.

see above

 
> In my opinion, it is superior to have CPIDs for all processes and thus we can start thinking seriously about SSI IPC.
 
> If you have processes running prior to joining the cluster and you want CPIDs, then you have 2 basic choices I can think of:
> 
> 1) Use the local/cluster process paradigm.
> 2) Have the kernel assume all pids with a node # of 0 are on the local node, and thus it puts the local node # into the cpid.
> 
> Far preferable to either of the above in my mind is to have two boot choices, standalone for maintenance, cluster for normal operation.
> 
> In the cluster boot situation there are again 2 basic choices:
> 
> 1) Have a predetermined node #
> 2) Have the node # dynamically assigned, possibly by having a small 'cluster joining' app which is invoked prior to loading the kernel.
> 
> In either case, if quarom is not available on the cluster at boot up, the kernel/app just sits and waits for more nodes to come alive.

does not handle the situation of, my laptop has 20000 AI neurons running on
it (very slowly), and I want to plug it in to the network and have it
use the available machines, and have them come back home when I leave and
take my laptop with me.

Also, 

	rsh other kill PID
is very nice if you are root everywhere, but if all the machines in the
club are not yours? You don't want someone across campus killing your
jobs so his will run better.

-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:31:04 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16277AbRGMAaz>; Fri, 13 Jul 2001 02:30:55 +0200
Received: from gw.xkey.com ([206.86.100.52]:59140 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16273AbRGMAak>;
	Fri, 13 Jul 2001 02:30:40 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id RAA29854 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 17:30:38 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma029844; Thu Jul 12 17:30:36 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6D0XCJ32028
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 20:33:12 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 20:33:12 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: cluster vs. grid
Message-ID: <20010712203312.A32024@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010712102050.W25392@figure1.int.wirex.com> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es> <01071300385805.02292@mioooldpc> <20010713011745.C30345@werewolf.able.es> <3B4E356F.72005037@cstp.umkc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3B4E356F.72005037@cstp.umkc.edu>; from dnicol@cstp.umkc.edu on Thu, Jul 12, 2001 at 06:40:31PM -0500
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 06:40:31PM -0500, David L. Nicol wrote:

> Maybe we should start calling it a "grid" instead of a cluster :)

Grids and clusters are different things. If you want to build grids,
go build grids.

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:34:57 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16279AbRGMAej>; Fri, 13 Jul 2001 02:34:39 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:55304 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16276AbRGMAe1>;
	Fri, 13 Jul 2001 02:34:27 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6D0YMD131974;
	Thu, 12 Jul 2001 20:34:22 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107130034.f6D0YMD131974@saturn.cs.uml.edu>
Subject: Re: Clusterwide pids
To:	dnicol@cstp.umkc.edu (David L. Nicol)
Date:	Thu, 12 Jul 2001 20:34:21 -0400 (EDT)
Cc:	freemyer@NorcrossGroup.com (Greg Freemyer),
	chris@wirex.com (Chris Wright), linux-cluster@nl.linux.org
In-Reply-To: <3B4E3DDB.7AF5C297@cstp.umkc.edu> from "David L. Nicol" at Jul 12, 2001 07:16:27 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David L. Nicol writes:
> Greg Freemyer wrote:

>> I think you are missing the point.
>>
>> IF THE GOAL ...
>> is to get to a cluster with totally transparent IPC capability
>> across the cluster (SSI IPC) then you have to have a unique pid
>> for each process in the cluster.
>> 
>> For instance:   kill nnn
>> must send a signal to just one process, not to several.
> 
> If the husk a process leaves behind when it migrates away sets
> up signal handlers that call the process back and delivers the
> process, you're set.

Arrrgh!

Problem right here: "husk a process leaves behind when it migrates"

This is not a reliable solution. It is very ugly too. This is one
of the things that sucks about Mosix.

>> Ignoring the performance gains potentially available with SSI IPC,
>> one of the points of SSI IPC is to allow administration of the
>> cluster from any node.
>>
>> For instance, if you have local pids for local processes and CPIDs
>> only for cluster processes, then you have the difficult
>> administrative task of telneting to the appropriate node to kill
>> a runaway local process.
>
> rsh appropriate kill -9 2644

OK. You have no need for a shared PID space. SSI is not for you.
Maybe you don't really want a cluster; you want a network of
workstations running an app that starts compute jobs at night.

> does not handle the situation of, my laptop has 20000 AI neurons
> running on it (very slowly), and I want to plug it in to the network
> and have it use the available machines, and have them come back home
> when I leave and take my laptop with me.
> 
> Also, 
> 
> 	rsh other kill PID
> is very nice if you are root everywhere, but if all the machines in
> the club are not yours? You don't want someone across campus killing
> your jobs so his will run better.

This is NOT a cluster. Go away.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:49:11 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16203AbRGMAtD>; Fri, 13 Jul 2001 02:49:03 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:3593 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16206AbRGMAsx>;
	Fri, 13 Jul 2001 02:48:53 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6D0mh0132652;
	Thu, 12 Jul 2001 20:48:43 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107130048.f6D0mh0132652@saturn.cs.uml.edu>
Subject: Re: Clusterwide pids
To:	lindahl@conservativecomputer.com (Greg Lindahl)
Date:	Thu, 12 Jul 2001 20:48:42 -0400 (EDT)
Cc:	linux-cluster@nl.linux.org
In-Reply-To: <20010712152917.B31016@wumpus> from "Greg Lindahl" at Jul 12, 2001 03:29:17 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Lindahl writes:
> On Thu, Jul 12, 2001 at 03:10:44PM -0400, Greg Freemyer wrote:

>> I also suspect that SSI IPC will allow many non-cluster aware
>> applications to suddenly be able to take advantage of the cluster.
>> For instance if you are migrating a set of applications from one
>> node to another which use IPC amongst the set, then SSI IPC would
>> allow the processes to be moved one at a time.
>
> What applications did you have in mind? In my many years of experience
> with distributed and clustered computing in both the enterprise and
> scientific arenas, I've run into relatively few applications that use
> IPC amongst a set of processes, in a way that would mesh nicely with
> SSI IPC. Of course my experience is only with a subset of the
> universe, so I'd love to hear examples.

Mercury customers love their POSIX semaphores and POSIX signals.
The IPC can be used directly from hardware, or from a driver.
So lets see: radar, video, MRI, digital X-ray, sonar, weird
classified projects, software radio...

Data arrives somewhere.
DMA moves it.
DMA fires off a signal or adjusts a semaphore.

Nice, isn't it?


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:56:06 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16281AbRGMAzq>; Fri, 13 Jul 2001 02:55:46 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:8969 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16206AbRGMAz3>;
	Fri, 13 Jul 2001 02:55:29 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6D0tN6132722;
	Thu, 12 Jul 2001 20:55:23 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107130055.f6D0tN6132722@saturn.cs.uml.edu>
Subject: Re: re[2]: Clusterwide pids
To:	mumismo@wanadoo.es (Jordi Polo)
Date:	Thu, 12 Jul 2001 20:55:23 -0400 (EDT)
Cc:	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	linux-cluster@nl.linux.org
In-Reply-To: <01071222453803.02292@mioooldpc> from "Jordi Polo" at Jul 12, 2001 10:45:38 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Jordi Polo writes:
> El Jueves 12 Julio 2001 21:15, escribiste:

>> The issue, as I mentioned earlier, is when/how does a node get a node
>> number. The DLCP idea is interesting.  Another option is something in lilo
>> or ram disk. Specifying it must be in /etc is bad for those of us with a
>> single root. I like the idea of a "cluster" system call.  One of the
>> subcommands of it would be set your node number.  Different cluster
>> implementations could gather that number from different places.
>
> even with a single root you're having diferent configurations for
> every machine (ip, inittab ...) so just add a new file with the
> node id. If you are managing all the cluster nodes, you can just
> give every node a different number,

Huh? If you add a new file, it appears everywhere. If you edit
that file, it changes everywhere. Single root means just that.
("root" being the filesystem root, not the user root)

> the problem comes when you just want to connect your laptop to the 
> cluster and you have no idea about what node numbers are being used.  

You can't do that really, and you wouldn't want to. This type of
cluster appears to be a single machine.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 02:57:35 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16285AbRGMA5R>; Fri, 13 Jul 2001 02:57:17 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:8200 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16287AbRGMA5F>;
	Fri, 13 Jul 2001 02:57:05 +0200
Received: (qmail 218145 invoked from network); 13 Jul 2001 00:53:15 -0000
Received: from nicol6.umkc.edu (HELO cstp.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 13 Jul 2001 00:53:15 -0000
Message-ID: <3B4E45A7.137D30A8@cstp.umkc.edu>
Date:	Thu, 12 Jul 2001 19:49:43 -0500
From:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Organization: University of Missouri - Kansas City   supercomputing infrastructure
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	mosix list <mosix-list@cs.huji.ac.il>,
	"linux-cluster@nl.linux.org" <linux-cluster@nl.linux.org>
Subject: Re: Saveing processes to Disk?
References: <E15Kii0-0004j6-00@mos223.cs.huji.ac.il>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Amnon Shiloh wrote:

> You are really looking into checkpointing, not migration -
> checkpoints have been around for ages, since the days of mainframes,
> and still are.  It is done in user-mode.
> 
> Perhaps someone can recommend a good checkpointing package on this list.
> 
> Amnon Shiloh -- the HUJI MOSIX group.


Freshmeat offers:
http://freshmeat.net/projects/oneworld

I have heard of something called "freeze" but cannot find it; I think
it is a system of doing something like a laptop sleep mode.



-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 03:04:30 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16283AbRGMBEL>; Fri, 13 Jul 2001 03:04:11 +0200
Received: from gw.xkey.com ([206.86.100.52]:64007 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16280AbRGMBD5>;
	Fri, 13 Jul 2001 03:03:57 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id SAA31747 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 18:03:51 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma031741; Thu Jul 12 18:03:45 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6D16MH32154
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 21:06:22 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 21:06:22 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712210622.A32141@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010712152917.B31016@wumpus> <200107130048.f6D0mh0132652@saturn.cs.uml.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200107130048.f6D0mh0132652@saturn.cs.uml.edu>; from acahalan@cs.uml.edu on Thu, Jul 12, 2001 at 08:48:42PM -0400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 08:48:42PM -0400, Albert D. Cahalan wrote:

> Mercury customers love their POSIX semaphores and POSIX signals.
> The IPC can be used directly from hardware, or from a driver.
> So lets see: radar, video, MRI, digital X-ray, sonar, weird
> classified projects, software radio...
> 
> Data arrives somewhere.
> DMA moves it.
> DMA fires off a signal or adjusts a semaphore.
> 
> Nice, isn't it?

What you've described is a custom programming model, not standard IPCs
which could be made transparant by what we're talking about now. Sure,
control info is being transmitted by standard semaphores and signals,
but not the data.

So no, that's not an example of an application that would be helped by
this. Mercury's competition, btw, mostly uses a message passing model,
which I think fits this kind of processing much better.

g


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 03:06:41 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16287AbRGMBGb>; Fri, 13 Jul 2001 03:06:31 +0200
Received: from perdita.cs.wisc.edu ([128.105.165.34]:25868 "EHLO
	perdita.cs.wisc.edu") by humbolt.nl.linux.org with ESMTP
	id <S16282AbRGMBGT>; Fri, 13 Jul 2001 03:06:19 +0200
Received: (from epaulson@localhost)
	by perdita.cs.wisc.edu (8.9.2/8.9.2) id UAA14909;
	Thu, 12 Jul 2001 20:06:13 -0500 (CDT)
Date:	Thu, 12 Jul 2001 20:06:13 -0500
From:	Erik Paulson <epaulson@cs.wisc.edu>
To:	"David L. Nicol" <dnicol@cstp.umkc.edu>
Cc:	mosix list <mosix-list@cs.huji.ac.il>,
	"linux-cluster@nl.linux.org" <linux-cluster@nl.linux.org>
Subject: Re: Saveing processes to Disk?
Message-ID: <20010712200613.C12431@perdita.cs.wisc.edu>
References: <E15Kii0-0004j6-00@mos223.cs.huji.ac.il> <3B4E45A7.137D30A8@cstp.umkc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3B4E45A7.137D30A8@cstp.umkc.edu>; from dnicol@cstp.umkc.edu on Thu, Jul 12, 2001 at 07:49:43PM -0500
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 07:49:43PM -0500, David L. Nicol wrote:
> Amnon Shiloh wrote:
> > 
> > Perhaps someone can recommend a good checkpointing package on this list.

http://www.cs.wisc.edu/condor/ (my personal favorite, but I'm a bit biased :)
http://mtckpt.sourceforge.net/
http://www.checkpointing.org/

Also, libckp from AT&T, and then there are a couple for NT - check a couple
of USENIX NT conferences from a few years back...

-Erik


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 03:08:50 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16305AbRGMBIl>; Fri, 13 Jul 2001 03:08:41 +0200
Received: from 104-MADR-X46.libre.retevision.es ([62.83.25.104]:29444 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16306AbRGMBI1>;
	Fri, 13 Jul 2001 03:08:27 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id DACDA2FAB6; Fri, 13 Jul 2001 03:14:45 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Subject: Re: re[2]: Clusterwide pids
Date:	Fri, 13 Jul 2001 03:14:44 +0200
X-Mailer: KMail [version 1.2]
References: <200107130055.f6D0tN6132722@saturn.cs.uml.edu>
In-Reply-To: <200107130055.f6D0tN6132722@saturn.cs.uml.edu>
Cc:	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071303144403.00494@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

> >> The issue, as I mentioned earlier, is when/how does a node get a node
> >> number. The DLCP idea is interesting.  Another option is something in
> >> lilo or ram disk. Specifying it must be in /etc is bad for those of us
> >> with a single root. I like the idea of a "cluster" system call.  One of
> >> the subcommands of it would be set your node number.  Different cluster
> >> implementations could gather that number from different places.
> >
> > even with a single root you're having diferent configurations for
> > every machine (ip, inittab ...) so just add a new file with the
> > node id. If you are managing all the cluster nodes, you can just
> > give every node a different number,
>
> Huh? If you add a new file, it appears everywhere. If you edit
> that file, it changes everywhere. Single root means just that.
> ("root" being the filesystem root, not the user root)

If you are able to give every machine a different IP address , you can as 
well give then a different node number. In the same way , DHCP or whatever. 

> > the problem comes when you just want to connect your laptop to the
> > cluster and you have no idea about what node numbers are being used.
>
> You can't do that really, and you wouldn't want to. This type of
> cluster appears to be a single machine.

Some of the people here really seems to want that.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 03:38:01 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16304AbRGMBhx>; Fri, 13 Jul 2001 03:37:53 +0200
Received: from cerebus.wirex.com ([216.161.55.93]:45553 "EHLO
	figure1.int.wirex.com") by humbolt.nl.linux.org with ESMTP
	id <S16300AbRGMBhj>; Fri, 13 Jul 2001 03:37:39 +0200
Received: (from chris@localhost)
	by figure1.int.wirex.com (8.11.0/8.11.0) id f6D1ZtT19606
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 18:35:55 -0700
Date:	Thu, 12 Jul 2001 18:35:55 -0700
From:	Chris Wright <chris@wirex.com>
To:	linux-cluster@nl.linux.org
Subject: Re: Clusterwide pids
Message-ID: <20010712183555.M25392@figure1.int.wirex.com>
References: <20010712102050.W25392@figure1.int.wirex.com> <200107121747.f6CHlC4105822@saturn.cs.uml.edu> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010713000745.A30317@werewolf.able.es>; from jamagallon@able.es on Fri, Jul 13, 2001 at 12:07:45AM +0200
X-Editor: Vim http://www.vim.org/
X-Info:	http://www.wirex.com
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

* J . A . Magallon (jamagallon@able.es) wrote:
> 
> On 20010712 Chris Wright wrote:
> >* Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> >> Chris Wright writes:
> >> > * Albert D. Cahalan (acahalan@cs.uml.edu) wrote:
> >> >> 
> >> >> 1. boot the single node without joining the cluster
> >> >> 2. fix the node
> >> >> 3. reboot to join the cluster
> >> > 
> 
> Sure it has been talked about before, but I do not remember any conclussion.
> Which is the problem with booting like
> 
> linux nodeid=47
> 
> and let the cluster admin be sure there is no duplicate id ?
> Then node just boots and joins the cluster in user-space.

perhaps i'm misunderstanding.  albert mentioned booting into two separate
modes, a cluster mode and a non-cluster mode.  in the cluster mode,
you could certainly hardcode the node id as you mentioned.  of course,
some admins may prefer the transparency of getting the node id assigned
when the node joins the cluster, but that's another issue.  either way,
this does not seem to address the possbility of maintaining a single
boot mode that allows a machine to run local only processes (init, fsck
local volumes, etc.) without obtaining a cluster-wide unique pid from
the cluster.

i was simply asserting that there are some processes (even in an SSI
cluster) who should never run anywhere but the local node (hence the node
id 0, 127.0.0.1 analogy).

if i understand your suggestion, you are saying that the node should be
booted and able to run processes before it joins the cluster (otherwise
how would it join the cluster in user-space?).

i think the question is what does a common process space look like in
a cluster?  can it be segmented such that non-cluster processes and
cluster processes can co-exist uniquely in the pid space.  i assert this
is possible if you include the node id as part of the pid, and make a
special case (node_id == 0) for local only processes.

-chris

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 03:49:15 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16310AbRGMBtH>; Fri, 13 Jul 2001 03:49:07 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:3594 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16313AbRGMBs7>;
	Fri, 13 Jul 2001 03:48:59 +0200
Received: (qmail 218742 invoked from network); 13 Jul 2001 01:45:10 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 13 Jul 2001 01:45:10 -0000
Message-ID: <3B4E51D1.FA30AAE6@kasey.umkc.edu>
Date:	Thu, 12 Jul 2001 20:41:37 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Greg Lindahl <lindahl@conservativecomputer.com>
CC:	linux-cluster@nl.linux.org
Subject: Re: cluster vs. grid
References: <20010712102050.W25392@figure1.int.wirex.com> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es> <01071300385805.02292@mioooldpc> <20010713011745.C30345@werewolf.able.es> <3B4E356F.72005037@cstp.umkc.edu> <20010712203312.A32024@wumpus>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Greg Lindahl wrote:
> 
> On Thu, Jul 12, 2001 at 06:40:31PM -0500, David L. Nicol wrote:
> 
> > Maybe we should start calling it a "grid" instead of a cluster :)
> 
> Grids and clusters are different things. If you want to build grids,
> go build grids.
> 
> g

So you see "cluster" as a narrow term, including single system image,
not including transparent [de|at]tachment of nodes, and all nodes
under common administratice authority; not encompassing "grid" which
is a larger collection with looser coupling.

Condor?  grid.

RSH realm? grid.

bproc? cluster.

MOSIX? both.

MPI/PVM? ?


Others see "cluster" as a wider term, including both, leaving us without
a term for the small, tightly coupled groups.

MOSIX? cluster.
Condor? wide cluster.
bproc? cluster.
RSH? cluster.
MPI/PVM? ?

I have been told, "Condor is not a clustering architecture."  I suppose
whoever said that thinks rsh is not a clustering architecture either.

Do we have authoritative definitions yet?

Who's in charge?

-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 04:02:20 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16336AbRGMCCL>; Fri, 13 Jul 2001 04:02:11 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:30730 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16334AbRGMCB7>;
	Fri, 13 Jul 2001 04:01:59 +0200
Received: (qmail 218928 invoked from network); 13 Jul 2001 01:58:09 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 13 Jul 2001 01:58:09 -0000
Message-ID: <3B4E54DD.E5324730@kasey.umkc.edu>
Date:	Thu, 12 Jul 2001 20:54:37 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
CC:	linux-cluster@nl.linux.org
Subject: Cahalan: This is NOT a cluster. Go away.
References: <200107130034.f6D0YMD131974@saturn.cs.uml.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"Albert D. Cahalan" wrote:

> > David Nicol wrote:
> >
> >       rsh other kill PID
> > is very nice if you are root everywhere, but if all the machines in
> > the club are not yours? You don't want someone across campus killing
> > your jobs so his will run better.
> 
> This is NOT a cluster. Go away.

Go away?  Go work on grids?  

Gentlemen, I have been on this list since before it was set up in reponse
to a thread on incorporating MOSIX into the mainstream linux kernel, and
I am not going to "go away" because a contingent devoted to a single system
image architecture has been dominating discussion lately.

if we get a consensus that "cluster" is a narrow word and anything 
involving transient membership and flexible boundries of resource pools
is a "grid" that is fine with me, the pie-in-sky "standard linux cluster
architecture" will have to change to "standard linux grid architecture"
and Dr. Shiloh will have to change the title of his book in the new 
editions to keep up with the current terminology.

That's all very nice.  But go away?


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 04:05:22 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16333AbRGMCFO>; Fri, 13 Jul 2001 04:05:14 +0200
Received: from gw.xkey.com ([206.86.100.52]:28682 "EHLO happy.xkey.com")
	by humbolt.nl.linux.org with ESMTP id <S16332AbRGMCFE>;
	Fri, 13 Jul 2001 04:05:04 +0200
Received: (from smtp@localhost) by happy.xkey.com
	id TAA01303 for <linux-cluster@nl.linux.org>; Thu, 12 Jul 2001 19:04:59 -0700
Received: from happy(127.0.0.1) by happy.xkey.com via smtp (V1.3)
	id sma001295; Thu Jul 12 19:04:52 2001
Received: (from lindahl@localhost)
	by localhost.hpti.com (8.11.0/8.11.0) id f6D27TQ32344
	for linux-cluster@nl.linux.org; Thu, 12 Jul 2001 22:07:29 -0400
X-Authentication-Warning: localhost.hpti.com: lindahl set sender to lindahl@conservativecomputer.com using -f
Date:	Thu, 12 Jul 2001 22:07:29 -0400
From:	Greg Lindahl <lindahl@conservativecomputer.com>
To:	linux-cluster@nl.linux.org
Subject: Re: cluster vs. grid
Message-ID: <20010712220729.A32328@wumpus>
Mail-Followup-To: linux-cluster@nl.linux.org
References: <20010712102050.W25392@figure1.int.wirex.com> <20010712112342.X25392@figure1.int.wirex.com> <20010713000745.A30317@werewolf.able.es> <01071300385805.02292@mioooldpc> <20010713011745.C30345@werewolf.able.es> <3B4E356F.72005037@cstp.umkc.edu> <20010712203312.A32024@wumpus> <3B4E51D1.FA30AAE6@kasey.umkc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <3B4E51D1.FA30AAE6@kasey.umkc.edu>; from david@kasey.umkc.edu on Thu, Jul 12, 2001 at 08:41:37PM -0500
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, Jul 12, 2001 at 08:41:37PM -0500, David L. Nicol wrote:

> So you see "cluster" as a narrow term, including single system image,
> not including transparent [de|at]tachment of nodes, and all nodes
> under common administratice authority; not encompassing "grid" which
> is a larger collection with looser coupling.

That's not mine, that's the one that's used widely within the cluster
community. 

> MPI/PVM? ?

MPI and PVM can be used in either clusters or distributed
computing. When I worked on Legion, I headed the application group,
and we implemented both MPI and PVM over Legion, which is clearly not
a cluster operating system, since it can span different administrative
domains.

> I have been told, "Condor is not a clustering architecture."  I suppose
> whoever said that thinks rsh is not a clustering architecture either.

Why don't you ask Miron Livney what he thinks Condor is? Hint: not a
cluster.

As for what's an authority, the IEEE Task Force on Cluster Computing
is the kind of body that most people consider authoritative. We had a
"what's a cluster" discussion once, and it was quite interesting.

g

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 04:26:17 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16114AbRGMC0J>; Fri, 13 Jul 2001 04:26:09 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:13066 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16082AbRGMCZ4>;
	Fri, 13 Jul 2001 04:25:56 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6D2Oct136804;
	Thu, 12 Jul 2001 22:24:38 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107130224.f6D2Oct136804@saturn.cs.uml.edu>
Subject: Re: Cahalan: This is NOT a cluster. Go away.
To:	david@kasey.umkc.edu (David L. Nicol)
Date:	Thu, 12 Jul 2001 22:24:37 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan), linux-cluster@nl.linux.org
In-Reply-To: <3B4E54DD.E5324730@kasey.umkc.edu> from "David L. Nicol" at Jul 12, 2001 08:54:37 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David L. Nicol writes:
> "Albert D. Cahalan" wrote:
>> David Nicol wrote:

>>>       rsh other kill PID
>>> is very nice if you are root everywhere, but if all the machines in
>>> the club are not yours? You don't want someone across campus killing
>>> your jobs so his will run better.
>>
>> This is NOT a cluster. Go away.
>
> Go away?  Go work on grids?  

Whatever makes you happy. You sure don't want a cluster.

> Gentlemen, I have been on this list since before it was set up in reponse
> to a thread on incorporating MOSIX into the mainstream linux kernel, and
> I am not going to "go away" because a contingent devoted to a single system
> image architecture has been dominating discussion lately.

It's fine to not want a single system image architecture.
You can have some other type of cluster.

It's not at all OK to expect to plug a random pre-booted computer
into the network, ssh around, and start running stuff. It's not at
all OK to have a mess of multiple security and administrative domains.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 04:34:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16240AbRGMCeY>; Fri, 13 Jul 2001 04:34:24 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:44555 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16213AbRGMCeJ>;
	Fri, 13 Jul 2001 04:34:09 +0200
Received: (qmail 219487 invoked from network); 13 Jul 2001 02:30:20 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 13 Jul 2001 02:30:20 -0000
Message-ID: <3B4E5C67.340C4848@kasey.umkc.edu>
Date:	Thu, 12 Jul 2001 21:26:47 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
CC:	linux-cluster@nl.linux.org
Subject: Re: Cahalan: This is NOT a cluster. Go away.
References: <200107130224.f6D2Oct136804@saturn.cs.uml.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"Albert D. Cahalan" wrote:
 
> It's not at all OK to expect to plug a random pre-booted computer
> into the network, ssh around, and start running stuff.

I can do this today.  Bring your laptop with MOSIX-1.0.* into
my office and plug it into my hub and processes running on it can
migrate to my workstation, and a server accross the hall.

There are a couple of MOSIX build parameters that have to be
set the same on all the systems, but if your system had booted up
last week with a congruent kernel, it would work.

Don't tell me not to expect something I can demonstrate right now.


> It's not at
> all OK to have a mess of multiple security and administrative domains.

If you can sandbox crashme, you can sandbox a guest process.




-- 
                                           David Nicol 816.235.1187
                      Irish Government Warning: SMOKERS DIE YOUNGER


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 06:38:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16393AbRGMEiO>; Fri, 13 Jul 2001 06:38:14 +0200
Received: from out-mta3.plasa.com ([202.134.0.195]:4708 "EHLO
	out-mta3.plasa.com") by humbolt.nl.linux.org with ESMTP
	id <S16370AbRGMEiE>; Fri, 13 Jul 2001 06:38:04 +0200
Received: out-mta3.plasa.com; Fri, 13 Jul 2001 11:37:45 +0700
Received: out-mta2.plasa.com; Fri, 13 Jul 2001 11:37:44 +0700
Message-ID: <018d01c10b56$5a05b120$1268053d@thunder>
From:	"mulyadi" <a_mulyadi@telkom.net>
To:	<linux-cluster@nl.linux.org>
Cc:	"jordi" <mumismo@wanadoo.es>
Subject: Some "crash and burn" idea about clusterwide pid
Date:	Fri, 13 Jul 2001 11:42:33 +0700
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4133.2400
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Hello all (especially Jordi)

Watching long threads about cluster pid, make me eager to deliver some
ideas, here they are :

1. Since SSI made some perception to the running apps that they are still
running on "non SSI" system, so system call related to pid's retrieval must
be made more "smarter". I mean, if the node which called it was joined into
a cluster system, then it return cluster wide pid, which I believe everyone
was agree that is origin node ID + local PID. But, if somehow the node is
temporarily disconnected from cluster, of course there is no need to use
cluster wide pid, therefore it just use local PID.

    The point is, we want to minimize the effort and overhead for some
system call related to pid. In other word, there is no need to implement
cluster wide PID's rule for every process, only for process that migrate to
another node. For this manner, SSI kernel must maintain some additional
process table that maintain the list which process that has been migrated

2. How to get node number?? Well, i think we must have some "DNS" for this,
in centralized manner if everyone agree. Then, each node kept some
"ARP"-like table that maintain a temporary table which include some node
number and its IP that have most number of communication which itself. So,
if node A talk frequently with node B and C, then it list B and C in its
table. This way, we can reduce traffic and get easy administration of
cluster node numbering. Any comment or critics??

Mulyadi Santosa





Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Fri Jul 13 09:13:39 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16563AbRGMHNX>; Fri, 13 Jul 2001 09:13:23 +0200
Received: from [213.98.27.110] ([213.98.27.110]:30212 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16495AbRGMHNP>;
	Fri, 13 Jul 2001 09:13:15 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6D9KVN17224;
	Fri, 13 Jul 2001 09:20:31 GMT
Date:	Fri, 13 Jul 2001 09:20:31 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"David L. Nicol" <dnicol@cstp.umkc.edu>
cc:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, <linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call
In-Reply-To: <3B4E3B52.44C093A7@cstp.umkc.edu>
Message-ID: <Pine.LNX.4.30.0107130858590.16666-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, David!

> As I mentioned earlier (months ago) :)
> it could be a mountable fake file system.

 Better focused, something like a /proc filesystem. :-)


 I thing that in that way the Mosix aproachment is a very good idea. Since
all the information is on a subdirectory of /proc, is a really
non-intrusive way to get information and to put orders to the cluster.
(non-intrusiveness is inmportant. I hardly can imagine myself rewriting
some of these molecular dynamics dinnosaurs, and, what is worse,
explaining to the HPB why I am doing this.

 Anyway, you have here a good idea, since:

>
> mount special-file mount-point -t cluster-type
>

 Could be an EXCELENT way to control dinamically the membership of a
machine. If you do:

mount /proc/cluster -t cluster

 you enter on the cluster, and with:


umount /proc/cluster

 you get away the node of the cluster. To enter automatically, is only
having the line:

none /proc/cluster cluster defaults 0 0

 On the fstab.


 (in fact, I always have thought that the most of the code of Mosix could
be reused, and we could work on parching the weakness of Mosix, only for
not to reinvent the weel; this was before the "using XML for
transmitting control information" and "broadcast on ethernet do not hurts
network performance" threads)


> The mount-point would be a directory where all the
> control interfaces, including a standard subset and
> whatever extensions the particular system adds on, will
> live.

 Same as Mosix... I think the same.

> the cluster-type would be the clustering discipline to
> give the special-file to, to set itself up.  Mount might
> be able to figure out what kind it is on its own.

 Maybe, more if we take into account that hp and ha have completly
different objectives. We can keep the same infraestructure -CPIDs, node
tables, and so on- but in the mounting type we choose if we want HP or HA.
Some of the things that HP people says -as the laptop stuff- are anthem
for HA people, and some of the things that HA people says -"broadcasting
constantly XML messages to keep on date membership information" are
somewhat hard to eat by HP people. Thus, we could redefine the mount
before as:

mount /proc/cluster -t HPcluster

 and

mount /proc/cluster -t HAcluster

and with this we enable/disable the features proposed for HA people, or
more Mosix-like features.

> I was about to write a completely user-mode system based
> on unix-domain sockets this spring but got distracted.

 Well, this already exists (PVM), and work fine. Anyway, PVM has its own
leaks, please share with us the fresh ideas that you were going to use on
your user-mode system.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sat Jul 14 05:00:47 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16054AbRGNDAf>; Sat, 14 Jul 2001 05:00:35 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:24837 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16101AbRGNDAO>;
	Sat, 14 Jul 2001 05:00:14 +0200
Received: (qmail 240848 invoked from network); 14 Jul 2001 02:55:49 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 14 Jul 2001 02:55:49 -0000
Message-ID: <3B4FB3DA.47803245@kasey.umkc.edu>
Date:	Fri, 13 Jul 2001 21:52:10 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	irbis@orcero.org
CC:	"David L. Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, linux-cluster@nl.linux.org
Subject: Re: the "cluster" system call (and file system type)
References: <Pine.LNX.4.30.0107130858590.16666-100000@hermes.orcero.org>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org wrote:
> 
>  Hello, David!
> 
> > As I mentioned earlier (months ago) :)
> > it could be a mountable fake file system.
> 
>  Better focused, something like a /proc filesystem. :-)

Part of the objection to including mosix as-is into the kernel
is all the mucking it does in /proc.  This point was made
early on by very possibly Alan himself.   So by mounting
the cluster system interfaces outside of /proc, the definitions
of them are independent of everything that happens in /proc,
and there is also very nice independence -- you could have one
machine join two completely different clusters by mounting them
on two points, and the distinction is clean.



>  I thing that in that way the Mosix aproachment is a very good idea. Since
> all the information is on a subdirectory of /proc, is a really
> non-intrusive way to get information and to put orders to the cluster.
> (non-intrusiveness is inmportant. I hardly can imagine myself rewriting
> some of these molecular dynamics dinnosaurs, and, what is worse,
> explaining to the HPB why I am doing this.

it's non-intrusive to the user, but it is very intrusive to the kernel.
When /proc gets modified in the kernel, they have to rewrite the patch.
Bolt-on fake file systems were more difficult when they ported MOSIX
to Linux, they're easier now (there's even a howto document about it
and a standard fake-file-system interface: you don't need to pretend
to be a local NFS server any more.)


 
>  Anyway, you have here a good idea, since:
> 
> >
> > mount special-file mount-point -t cluster-type
> >
> 
>  Could be an EXCELENT way to control dinamically the membership of a
> machine. If you do:
> 
> mount /proc/cluster -t cluster
> 
>  you enter on the cluster, and with:
> 
> umount /proc/cluster
> 
>  you get away the node of the cluster. To enter automatically, is only
> having the line:
> 
> none /proc/cluster cluster defaults 0 0
> 
>  On the fstab.

You'd need an awfully robust cluster system to be able to have
no configuration file at all!  I placed the configuration file
in the device-special slot in mount syntax.  That would make
mounting a mosix-1.0 emulator from fstab look something like:

/etc/mosix.map /proc/mosix cluster mosix 0 0

>  (in fact, I always have thought that the most of the code of Mosix could
> be reused, and we could work on parching the weakness of Mosix, only for
> not to reinvent the weel; this was before the "using XML for
> transmitting control information" and "broadcast on ethernet do not hurts
> network performance" threads)

Breaking it into individual components may be lots of projects. I don't
know how interrelated the things are; how hard it would be to apply
Shiloh Migration Algorithms to bproc migration, for instance.


> > The mount-point would be a directory where all the
> > control interfaces, including a standard subset and
> > whatever extensions the particular system adds on, will
> > live.
> 
>  Same as Mosix... I think the same.

Yes, except not guaranteed to live at /proc/mosix.


 
> > the cluster-type would be the clustering discipline to
> > give the special-file to, to set itself up.  Mount might
> > be able to figure out what kind it is on its own.
> 
>  Maybe, more if we take into account that hp and ha have completly
> different objectives. We can keep the same infraestructure -CPIDs, node
> tables, and so on- but in the mounting type we choose if we want HP or HA.
> Some of the things that HP people says -as the laptop stuff- are anthem
> for HA people, and some of the things that HA people says -"broadcasting
> constantly XML messages to keep on date membership information" are
> somewhat hard to eat by HP people. Thus, we could redefine the mount
> before as:
> 
> mount /proc/cluster -t HPcluster
> 
>  and
> 
> mount /proc/cluster -t HAcluster
> 
> and with this we enable/disable the features proposed for HA people, or
> more Mosix-like features.

The way mount works, you _ALWAYS_ have

	mount special directory

and type, which is declared by -t, is optional but may be required if
mount cannot determine it, for instance

	mount -t nfs humdinger.redhat.com:/pub/current/ /installtree

and then options are provided with -o switches:

	mount -t nfs humdinger.redhat.com:/pub/current/ /installtree -o soft -o
tcp


That's how mount works.

So to make the cluster file system type "cluster" all the
options would get specified with -o switches.  -o HP or -o HA
might turn on whole families of options.


 
> > I was about to write a completely user-mode system based
> > on unix-domain sockets this spring but got distracted.
> 
>  Well, this already exists (PVM), and work fine. Anyway, PVM has its own
> leaks, please share with us the fresh ideas that you were going to use on
> your user-mode system.

I meant a channel system where node-node communication is established
with a single stream connection that persists, and all other communication
between those two nodes is multiplexed over that channel.

This would remove the IP-address <--> node 1-1 mapping limit that MOSIX 
has, allowing nodes behind a NAT to peer with nodes outside for instance.

That in place, a system where you can open a new channel to a peer node
at will, the next step was to re-implement process migration over these
multiplexed streams, with file handles getting implemented as channels
so that your basic IO can occur remotely by the various handles getting
hidden behind an abstraction.

I would have to do some homework to determine efficiencies and stuff.

I think I posted a summary proposal of it earlier this year.

-- 
                                           David Nicol 816.235.1187


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sat Jul 14 09:50:12 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16139AbRGNHtz>; Sat, 14 Jul 2001 09:49:55 +0200
Received: from 140-MADR-X46.libre.retevision.es ([62.83.25.140]:12292 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16114AbRGNHth>;
	Sat, 14 Jul 2001 09:49:37 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id 6613F2FAB9; Sat, 14 Jul 2001 09:55:56 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"mulyadi" <a_mulyadi@telkom.net>
Subject: Re: Some "crash and burn" idea about clusterwide pid
Date:	Sat, 14 Jul 2001 09:55:54 +0200
X-Mailer: KMail [version 1.2]
References: <018d01c10b56$5a05b120$1268053d@thunder>
In-Reply-To: <018d01c10b56$5a05b120$1268053d@thunder>
Cc:	<linux-cluster@nl.linux.org>
MIME-Version: 1.0
Message-Id: <01071409555400.00479@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


> Hello all (especially Jordi)

Hello Mulyadi

> Watching long threads about cluster pid, make me eager to deliver some
> ideas, here they are :
>
> 1. Since SSI made some perception to the running apps that they are still
> running on "non SSI" system, so system call related to pid's retrieval must
> be made more "smarter". I mean, if the node which called it was joined into
> a cluster system, then it return cluster wide pid, which I believe everyone
> was agree that is origin node ID + local PID. But, if somehow the node is
> temporarily disconnected from cluster, of course there is no need to use
> cluster wide pid, therefore it just use local PID.

getpid returns a pid_t that is declared as int but only the 15 lower bits 
counts, i guess that a returned PID with 16-31 bits to 0 , means the process 
is local (even if the PID is node-pid , we can reserve node 0 to access local 
pid, so we can't use a node 0 in the system ), if 16-31 are different to 0 
then the pid is a CPID , maybe of ourn own node (i prefer all the local 
processes return node=0 if no Â¿we can check it with getnode?) or another one.
But, i don't like that idea about changing pids if we are not connected, i 
thing returning 0 for node in local processes is cleaner.      

>     The point is, we want to minimize the effort and overhead for some
> system call related to pid. In other word, there is no need to implement
> cluster wide PID's rule for every process, only for process that migrate to
> another node. For this manner, SSI kernel must maintain some additional
> process table that maintain the list which process that has been migrated

what if we hace a process that never migrates but we want to take advantage 
of the CPID IPC? 

> 2. How to get node number?? Well, i think we must have some "DNS" for this,
> in centralized manner if everyone agree. Then, each node kept some
> "ARP"-like table that maintain a temporary table which include some node
> number and its IP that have most number of communication which itself. So,
> if node A talk frequently with node B and C, then it list B and C in its
> table. This way, we can reduce traffic and get easy administration of
> cluster node numbering. Any comment or critics??
>
> Mulyadi Santosa



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sat Jul 14 10:37:12 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16232AbRGNIhD>; Sat, 14 Jul 2001 10:37:03 +0200
Received: from staff.cs.usyd.edu.au ([129.78.8.1]:3982 "helo
	staff.cs.usyd.edu.au") by humbolt.nl.linux.org with SMTP
	id <S16206AbRGNIgt>; Sat, 14 Jul 2001 10:36:49 +0200
Date:	Sat, 14 Jul 2001 18:29:01 +1100
From:	bruce@staff.cs.usyd.edu.au (Bruce Janson)
Subject: Re: the "cluster" system call (and file system type)
To:	linux-cluster@nl.linux.org
Message-Id: <20010714083658Z16206-26148+3074@humbolt.nl.linux.org>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

    ..
    Date:	Fri, 13 Jul 2001 21:52:10 -0500
    From:	"David L. Nicol" <david@kasey.umkc.edu>
    ..
    Bolt-on fake file systems [are]
    easier now (there's even a howto document about it
    and a standard fake-file-system interface: you don't need to pretend
    to be a local NFS server any more.)
    ..

David,
    That's interesting (if you mean something other than just the
Coda-based variant).
Do you have a pointer to this new "Bolt-on fake file systems" feature?

Cheers,
Bruce Janson, Basser Department of Computer     Email:  bruce@cs.usyd.edu.au
Science, F09 Madsen Building, Eastern Avenue    Phone:  +61-2-9351-3423/4
University of Sydney, N.S.W., 2006, AUSTRALIA   Fax:    +61-2-9351-3838

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sat Jul 14 16:08:00 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16302AbRGNOHy>; Sat, 14 Jul 2001 16:07:54 +0200
Received: from [213.98.27.110] ([213.98.27.110]:17156 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16284AbRGNOHi>;
	Sat, 14 Jul 2001 16:07:38 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6EGE0601055;
	Sat, 14 Jul 2001 16:14:01 GMT
Date:	Sat, 14 Jul 2001 16:14:00 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"David L. Nicol" <david@kasey.umkc.edu>
cc:	"David L. Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, <linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <3B4FB3DA.47803245@kasey.umkc.edu>
Message-ID: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Helo, David!

> >  Better focused, something like a /proc filesystem. :-)
>
> Part of the objection to including mosix as-is into the kernel
> is all the mucking it does in /proc.  This point was made
> early on by very possibly Alan himself.   So by mounting

 Well, if this is the only one problem, let it be in "/cluster". ;-)

 I thought that it would be the best putting all the kernel control stuff
on "/proc". Or would it be better "/scsi", "/net", "/ipv4", "/ipv6",
"/pcmcia" and so on?

> and there is also very nice independence -- you could have one
> machine join two completely different clusters by mounting them
> on two points, and the distinction is clean.

 All the subsystems lay on "/proc". I can not understand why clusters
subsistems should be different. I thing that the best is finnaly
integrating clustering into the kernel, and that Linux became a
cluster-capable OS. If kernel people ask to the comunity, sure they find
lots of help. Yes, the 2.5 would be REALLY unstable, but we could have
amazing features for databases, servers, and workstations.

> it's non-intrusive to the user, but it is very intrusive to the kernel.


 If we want a full clustering suport, we will have to touch LOTS of
things on the kernel. It is not so easy as putting 3 beowulf patches, HA
patches and it's all, folks.

> You'd need an awfully robust cluster system to be able to have
> no configuration file at all!  I placed the configuration file
> in the device-special slot in mount syntax.  That would make
> mounting a mosix-1.0 emulator from fstab look something like:
>
> /etc/mosix.map /proc/mosix cluster mosix 0 0

 You can configure via "/proc", as other subsystems.

> So to make the cluster file system type "cluster" all the
> options would get specified with -o switches.  -o HP or -o HA
> might turn on whole families of options.


 This is a better idea than mine, some as mounting the "/proc/cluster"
filesystem with HP or HA options.

> >  Well, this already exists (PVM), and work fine. Anyway, PVM has its own
> > leaks, please share with us the fresh ideas that you were going to use on
> > your user-mode system.
>
> I meant a channel system where node-node communication is established
> with a single stream connection that persists, and all other communication
> between those two nodes is multiplexed over that channel.

 Like CPLAN portals?

 Yours:

David

---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------




Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sat Jul 14 21:59:01 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16205AbRGNT6y>; Sat, 14 Jul 2001 21:58:54 +0200
Received: from 57-MADR-X29.libre.retevision.es ([62.83.8.57]:10765 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16175AbRGNT6o>;
	Sat, 14 Jul 2001 21:58:44 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id C04512F62A; Sat, 14 Jul 2001 22:04:57 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	<irbis@orcero.org>
Subject: Re: the "cluster" system call (and file system type)
Date:	Sat, 14 Jul 2001 22:04:56 +0200
X-Mailer: KMail [version 1.2]
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org>
In-Reply-To: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org>
Cc:	<david@kasey.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	<linux-cluster@nl.linux.org>
MIME-Version: 1.0
Message-Id: <01071422045602.01963@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

El SÃ¡bado 14 Julio 2001 18:14, escribiste:
>  Helo, David!
>
> > >  Better focused, something like a /proc filesystem. :-)
> >
> > Part of the objection to including mosix as-is into the kernel
> > is all the mucking it does in /proc.  This point was made
> > early on by very possibly Alan himself.   So by mounting
>
>  Well, if this is the only one problem, let it be in "/cluster". ;-)
>
>  I thought that it would be the best putting all the kernel control stuff
> on "/proc". Or would it be better "/scsi", "/net", "/ipv4", "/ipv6",
> "/pcmcia" and so on?
>
I lost the thread , are you talking about using /proc/cluster, /cluster or 
whatever to stats and control files or you plan to do something like mounting 
that filesystem and have 
/proc/cluster/node1/234 
/proc/cluster/node1/235  
/proc/cluster/node1/236 
/proc/cluster/node1/238
/proc/cluster/node2/23 
/proc/cluster/node2/24

A bit crazy , all the nodes need to know everything, but amazing use 
konqueror's drag'n drop to migrate a process ^_^  (i think is the first one 
anyway :P)

  
>  All the subsystems lay on "/proc". I can not understand why clusters
> subsistems should be different. I thing that the best is finnaly
> integrating clustering into the kernel, and that Linux became a
> cluster-capable OS. If kernel people ask to the comunity, sure they find
> lots of help. Yes, the 2.5 would be REALLY unstable, but we could have
> amazing features for databases, servers, and workstations.

When the need of interrupts comes the kernel takes care of it , when the need 
of  multiprocessors comes the kernel takes care of it, i just wonder why when 
the need of multicomputers comes suddenly the kernel developers decided it 
was userspace. 
Yes i know , no every one have a lan and want it . 
Well i know much more people having a lan in his house than a SMP machine. 
And several of them will love a SSI system (in fact they hardly would believe 
it ^_^ )



--
Jordi 
  Student of Spain	

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 05:30:12 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16341AbRGOD3y>; Sun, 15 Jul 2001 05:29:54 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:25103 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16331AbRGOD3j>;
	Sun, 15 Jul 2001 05:29:39 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6F3S2v267458;
	Sat, 14 Jul 2001 23:28:02 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107150328.f6F3S2v267458@saturn.cs.uml.edu>
Subject: Re: the "cluster" system call (and file system type)
To:	irbis@orcero.org
Date:	Sat, 14 Jul 2001 23:28:01 -0400 (EDT)
Cc:	david@kasey.umkc.edu (David L. Nicol),
	dnicol@cstp.umkc.edu (David L. Nicol),
	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	mumismo@wanadoo.es (Jordi Polo), linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 14, 2001 04:14:00 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org writes:

>>>  Better focused, something like a /proc filesystem. :-)
>>
>> Part of the objection to including mosix as-is into the kernel
>> is all the mucking it does in /proc.  This point was made
>> early on by very possibly Alan himself.   So by mounting

I think I've heard Alexander Viro and Linus Torvalds both
complaining about /proc. One of them called it a dumping ground.

>  Well, if this is the only one problem, let it be in "/cluster". ;-)
...
>  All the subsystems lay on "/proc". I can not understand why clusters
> subsistems should be different. I thing that the best is finnaly
> integrating clustering into the kernel, and that Linux became a

Not everything is in /proc now. These are filesystems:

/dev            if using devfs
/dev/shm        for shared memory
/dev/pty        for new-style PTY devices
/proc           yeah, it isn't going away
/proc/openprom  something for Suns that used to be part of /proc

Just forget about adding a big tree of stuff in /proc.
Adding a single file won't get you flamed very much.
The same goes in /dev.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 09:51:45 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16357AbRGOHvh>; Sun, 15 Jul 2001 09:51:37 +0200
Received: from jalon.able.es ([212.97.163.2]:27015 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16350AbRGOHvZ>;
	Sun, 15 Jul 2001 09:51:25 +0200
Received: from werewolf.able.es ([212.97.168.128]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGI8IG00.ULN; Sun, 15 Jul 2001 09:51:53 +0200 
Date:	Sun, 15 Jul 2001 09:53:49 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	"Albert D . Cahalan" <acahalan@cs.uml.edu>
Cc:	irbis@orcero.org, "David L . Nicol" <david@kasey.umkc.edu>,
	"David L . Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, linux-cluster@nl.linux.org
Subject: Re: the "cluster" system call (and file system type)
Message-ID: <20010715095349.C10337@werewolf.able.es>
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <200107150328.f6F3S2v267458@saturn.cs.uml.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <200107150328.f6F3S2v267458@saturn.cs.uml.edu>; from acahalan@cs.uml.edu on Sun, Jul 15, 2001 at 05:28:01 +0200
X-Mailer: Balsa 1.1.7
Content-Length:	1397
Lines:	32
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010715 Albert D. Cahalan wrote:
>irbis@orcero.org writes:
>
>>>>  Better focused, something like a /proc filesystem. :-)
>>>
>>> Part of the objection to including mosix as-is into the kernel
>>> is all the mucking it does in /proc.  This point was made
>>> early on by very possibly Alan himself.   So by mounting
>
>I think I've heard Alexander Viro and Linus Torvalds both
>complaining about /proc. One of them called it a dumping ground.
>
>>  Well, if this is the only one problem, let it be in "/cluster". ;-)
>...
>>  All the subsystems lay on "/proc". I can not understand why clusters
>> subsistems should be different. I thing that the best is finnaly
>> integrating clustering into the kernel, and that Linux became a
>

I think what people hate is polluting /proc in separate subtrees, like putting
things in /proc/sys/net/cluster, /proc/cpu/cluster, etc.
I thing thigs can be done in an transparent way defining a fake clusterfs,
with things like cluter_add_entry etc. that (by now) just call they equivalents
in proc_xxx, rooted at /proc/cluster. If sometine you have to move the tree,
well, just clone the proc_xxx functions to a new independent tree.

-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac3 #1 SMP Sun Jul 15 01:23:01 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 14:20:52 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16376AbRGOMUp>; Sun, 15 Jul 2001 14:20:45 +0200
Received: from [213.98.27.110] ([213.98.27.110]:29708 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16368AbRGOMU0>;
	Sun, 15 Jul 2001 14:20:26 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6FER0617324;
	Sun, 15 Jul 2001 14:27:00 GMT
Date:	Sun, 15 Jul 2001 14:27:00 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Jordi Polo <mumismo@wanadoo.es>
cc:	<david@kasey.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	<linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <01071422045602.01963@mioooldpc>
Message-ID: <Pine.LNX.4.30.0107151422210.17257-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, Jordi!

> whatever to stats and control files or you plan to do something like mounting
> that filesystem and have
> /proc/cluster/node1/234
> /proc/cluster/node1/235
> /proc/cluster/node1/236
> /proc/cluster/node1/238
> /proc/cluster/node2/23
> /proc/cluster/node2/24

 It is a virtual fs,  what means that the information is generated
dynamically, and there is not a great thechnical problem.  And it is easy
to do complex things from bash scripts, what would be great for
administrators.

> When the need of interrupts comes the kernel takes care of it , when the need
> of  multiprocessors comes the kernel takes care of it, i just wonder why when
> the need of multicomputers comes suddenly the kernel developers decided it
> was userspace.
> Yes i know , no every one have a lan and want it .

 That is why recompiling the kernel is good, and modules are great. :-)

 The most of the people does not want PCMCIA, neither the bizarre Memory
devices stuff.  I does not use the most of the options of the kernel,
neither the most of the people that I know. But this was not a problem, at
least until now.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 14:27:24 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16381AbRGOM1F>; Sun, 15 Jul 2001 14:27:05 +0200
Received: from [213.98.27.110] ([213.98.27.110]:34316 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16373AbRGOM07>;
	Sun, 15 Jul 2001 14:26:59 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6FEXT617370;
	Sun, 15 Jul 2001 14:33:29 GMT
Date:	Sun, 15 Jul 2001 14:33:29 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	"David L. Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, <linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <200107150328.f6F3S2v267458@saturn.cs.uml.edu>
Message-ID: <Pine.LNX.4.30.0107151427370.17257-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, all!

> Not everything is in /proc now. These are filesystems:
>
> /dev            if using devfs
> /dev/shm        for shared memory
> /dev/pty        for new-style PTY devices
> /proc           yeah, it isn't going away
> /proc/openprom  something for Suns that used to be part of /proc

 OK, but none of these are information of the system. These are devices.
The phylosophy is completly different. Ask yourself about having a device
on "/dev" with the process that  are running on a single machine. Strange,
isn't it? Then, change "local" for "node73". :-)

> Just forget about adding a big tree of stuff in /proc.
> Adding a single file won't get you flamed very much.

 The problem is about the amount of information.  I can not see all the
information and control of a whole cluster on a device. The control of the
device and the logic of the device would be so complex that it would be
better using PVM and forgetting any kernel "facilities".

 Yours:


David

---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------




Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 15:10:59 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16387AbRGONKu>; Sun, 15 Jul 2001 15:10:50 +0200
Received: from 109-MADR-X29.libre.retevision.es ([62.83.8.109]:31236 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16379AbRGONKf>;
	Sun, 15 Jul 2001 15:10:35 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id 4C5562F4BE; Sun, 15 Jul 2001 15:17:01 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"J . A . Magallon" <jamagallon@able.es>
Subject: Re: the "cluster" system call (and file system type)
Date:	Sun, 15 Jul 2001 15:17:00 +0200
X-Mailer: KMail [version 1.2]
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <200107150328.f6F3S2v267458@saturn.cs.uml.edu> <20010715095349.C10337@werewolf.able.es>
In-Reply-To: <20010715095349.C10337@werewolf.able.es>
Cc:	"Albert D . Cahalan" <acahalan@cs.uml.edu>, irbis@orcero.org,
	"David L . Nicol" <david@kasey.umkc.edu>,
	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071515170000.00696@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


> I think what people hate is polluting /proc in separate subtrees, like
> putting things in /proc/sys/net/cluster, /proc/cpu/cluster, etc.
> I thing thigs can be done in an transparent way defining a fake clusterfs,
> with things like cluter_add_entry etc. that (by now) just call they
> equivalents in proc_xxx, rooted at /proc/cluster. If sometine you have to
> move the tree, well, just clone the proc_xxx functions to a new independent
> tree.

As far as i can see you are going to do a /proc-like fs outside /proc so as 
david want the same but in /proc i think that we can begin to think in what 
will be inside that directory and later we can put it in /proc or /cluster or 
whatever. 
We are just discussing the minor thing we'll have to decide, when we have all 
we want to be inside that directory we can just ask linus where he wants it. 
Maybe is just a matter of taste :P 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 19:57:49 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16407AbRGOR5b>; Sun, 15 Jul 2001 19:57:31 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:54277 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16403AbRGOR5P>;
	Sun, 15 Jul 2001 19:57:15 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6FHtoF294627;
	Sun, 15 Jul 2001 13:55:50 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107151755.f6FHtoF294627@saturn.cs.uml.edu>
Subject: Re: the "cluster" system call (and file system type)
To:	irbis@orcero.org
Date:	Sun, 15 Jul 2001 13:55:50 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan),
	david@kasey.umkc.edu (David L. Nicol),
	dnicol@cstp.umkc.edu (David L. Nicol),
	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	mumismo@wanadoo.es (Jordi Polo), linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107151427370.17257-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 15, 2001 02:33:29 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org writes:

>> Not everything is in /proc now. These are filesystems:
>>
>> /dev            if using devfs
>> /dev/shm        for shared memory
>> /dev/pty        for new-style PTY devices
>> /proc           yeah, it isn't going away
>> /proc/openprom  something for Suns that used to be part of /proc
>
> OK, but none of these are information of the system. These are devices.

First of all, what is the difference? Consider /proc/kcore and /dev/mem.

Second of all, /proc/openprom is a whole tree of device information.
It used to be part of the /proc filesystem (thus the mountpoint), but
it was removed as part of an effort to clean the crud out or /proc.

> The phylosophy is completly different. Ask yourself about having a device
> on "/dev" with the process that  are running on a single machine. Strange,
> isn't it? Then, change "local" for "node73". :-)

/multi/73/ctl
/multi/73/mem
/multi/73/mailbox
...

If you do the SSI trick, then remote processes appear in /proc just
like the local processes do. Otherwise /proc is mostly untouched.

>> Just forget about adding a big tree of stuff in /proc.
>> Adding a single file won't get you flamed very much.
>
> The problem is about the amount of information.  I can not see all the
> information and control of a whole cluster on a device. The control of the
> device and the logic of the device would be so complex that it would be
> better using PVM and forgetting any kernel "facilities".

What is wrong with that?

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 20:35:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16071AbRGOSf2>; Sun, 15 Jul 2001 20:35:28 +0200
Received: from [213.98.27.110] ([213.98.27.110]:50447 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16031AbRGOSfQ>;
	Sun, 15 Jul 2001 20:35:16 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6FKfh620053;
	Sun, 15 Jul 2001 20:41:43 GMT
Date:	Sun, 15 Jul 2001 20:41:43 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	"David L. Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, <linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <200107151755.f6FHtoF294627@saturn.cs.uml.edu>
Message-ID: <Pine.LNX.4.30.0107152014140.19848-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list




 Hello, all!

> > OK, but none of these are information of the system. These are devices.
>
> First of all, what is the difference? Consider /proc/kcore and /dev/mem.

 The mem can be seen as a block device. I can find how to see the
information onf the machines of a cluster on a block device. In a char
device, it is harder.

> Second of all, /proc/openprom is a whole tree of device information.
> It used to be part of the /proc filesystem (thus the mountpoint), but
> it was removed as part of an effort to clean the crud out or /proc.

 I can not see why this is good. In fact, I can not see why is so
neccessary clean /proc, as far as it is very usefull, allow from shell
scripts to do lots of cool things that administrators likes, and give to
us a clear interface to the inner things of the system, in a way that it
is independent for the language.

> > The phylosophy is completly different. Ask yourself about having a device
> > on "/dev" with the process that  are running on a single machine. Strange,
> > isn't it? Then, change "local" for "node73". :-)
>
> /multi/73/ctl
> /multi/73/mem
> /multi/73/mailbox

 And how we do to map processes to devices? And, we will have one
interface to control local processes, and other to control remote
processes?  How we handle with we have the information of "/proc" on
your "/multi" def filesystem on a different format? Do we need to
redesing the whole apps that use "/proc" to understand the cluster? :-?

> If you do the SSI trick, then remote processes appear in /proc just
> like the local processes do. Otherwise /proc is mostly untouched.

 This is not realistics. Even on a SSI cluster, there are times that you
want to know here is running a process. As an exaple, if you want to shut
down a node for maintenance, or you want to discontect for the network, or
for debbuging purporses, or because the process is bounded to the hardware
-as X server-, or because is an important local daemon -as crond-... there
are lots of cases of this. If you perceive that one httpd fails because
the network on this node is bad, and you can not discover where is running
the httpd that is advicing you about the problem, you will enjoy on a
65000 nodes testing each one by one disconected for the cluster to
discover what is happening.

> > The problem is about the amount of information.  I can not see all the
> > information and control of a whole cluster on a device. The control of the
> > device and the logic of the device would be so complex that it would be
> > better using PVM and forgetting any kernel "facilities".
>
> What is wrong with that?

 What is wrong? Nothing, if we are talking about clustering on the kernel
for "hype" and for marketing purporses. But if we want that Linux have
some add value over other operating systems that runs PVM (as AIX,
Solaris, SCO, and the most of the flavours of Unix, and other
pseudo-OS like Windows 98 and Windows NT) we must to give to the top user,
the administrator and the programer "facilities", not "dificulties". Doing
a messy, flaky, complex and non-orthogal clustering mechanism to the
kernel will be  worst than giving nothing, because nothing -what is that
we have now- take less time, and we must center our efforts to develop
under PVM. The only  difference is that developing under PVM my software
is 100% compatible  with other Unix flavours, and if someday the whole
kernel people gets mad  and include Apache and the X server on the kernel
-don't smile, we have  yet graphical code and a HTTP daemon inside the
kernel-, I can get my softare and use other underlaying OS.

  Think from the point of view of a developer: now my software have five
users on the world, and maybe there is one or two people that could have
interest. If I migrate to a Linux-only API, I will lost the half of my
users -I and a friend will be the only users-. If you do the API complex,
nobody will port his software or will going to develop new software to the
Linux cluster infraestructure.

 Think from the point of view of a sysadm: my clusters are a completly
headache, due to strange voodoo problems that happend with networks when
you really use them. A great cluster sysadm must be strict,
self-organizated and follow literaly KISS philosophy. If the kernel
cluster is complex to administrate and to understand, there will be
difficult a system administrator that wants put this staff on a 512 node
cluster.


 I repeat myself: the logic must be simple. Orthogonal. And I must be able
to catch any information that I want about any process or any node from
any node on an easy an orthogonal way. Magagement of great clusters is a
difficult thing, and programming parallel aplications is also a difficult
thing. If you want that somebody choose Linux against PVM+anything, it
would be easier for two both.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Sun Jul 15 21:26:56 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16432AbRGOT0r>; Sun, 15 Jul 2001 21:26:47 +0200
Received: from jalon.able.es ([212.97.163.2]:60823 "EHLO jalon.able.es")
	by humbolt.nl.linux.org with ESMTP id <S16410AbRGOT0k>;
	Sun, 15 Jul 2001 21:26:40 +0200
Received: from werewolf.able.es ([212.97.168.128]) by
          jalon.able.es (Netscape Messaging Server 4.15) with ESMTP id
          GGJ4P200.0MW; Sun, 15 Jul 2001 21:27:02 +0200 
Date:	Sun, 15 Jul 2001 21:29:05 +0200
From:	"J . A . Magallon" <jamagallon@able.es>
To:	Jordi Polo <mumismo@wanadoo.es>
Cc:	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>, irbis@orcero.org,
	"David L . Nicol" <david@kasey.umkc.edu>,
	linux-cluster@nl.linux.org
Subject: Re: the "cluster" system call (and file system type)
Message-ID: <20010715212905.A11513@werewolf.able.es>
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <200107150328.f6F3S2v267458@saturn.cs.uml.edu> <20010715095349.C10337@werewolf.able.es> <01071515170000.00696@mioooldpc>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
In-Reply-To: <01071515170000.00696@mioooldpc>; from mumismo@wanadoo.es on Sun, Jul 15, 2001 at 15:17:00 +0200
X-Mailer: Balsa 1.1.7
Content-Length:	1553
Lines:	32
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


On 20010715 Jordi Polo wrote:
>
>> I think what people hate is polluting /proc in separate subtrees, like
>> putting things in /proc/sys/net/cluster, /proc/cpu/cluster, etc.
>> I thing thigs can be done in an transparent way defining a fake clusterfs,
>> with things like cluter_add_entry etc. that (by now) just call they
>> equivalents in proc_xxx, rooted at /proc/cluster. If sometine you have to
>> move the tree, well, just clone the proc_xxx functions to a new independent
>> tree.
>
>As far as i can see you are going to do a /proc-like fs outside /proc so as 
>david want the same but in /proc i think that we can begin to think in what 
>will be inside that directory and later we can put it in /proc or /cluster or 
>whatever. 
>We are just discussing the minor thing we'll have to decide, when we have all 
>we want to be inside that directory we can just ask linus where he wants it. 
>Maybe is just a matter of taste :P 
>

That's what I wanted to say, I did not make it very clear. If all the info and control
is to be in a file system, I see (I'm not a kernel hacker) two ways, devfs-like
and proc-like. I only see using proc as a fast hack to start using and designing
something that perhaps will need some special design or feature.
But using /proc to start work, people can focus on what features want there.


-- 
J.A. Magallon                           #  Let the source be with you...        
mailto:jamagallon@able.es
Mandrake Linux release 8.1 (Cooker) for i586
Linux werewolf 2.4.6-ac3 #1 SMP Sun Jul 15 01:23:01 CEST 2001 i686

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 00:57:48 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16248AbRGOW5l>; Mon, 16 Jul 2001 00:57:41 +0200
Received: from 216-MADR-X29.libre.retevision.es ([62.83.8.216]:24580 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16216AbRGOW5Z>;
	Mon, 16 Jul 2001 00:57:25 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id 6DA7D2F4BD; Mon, 16 Jul 2001 01:03:25 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"J . A . Magallon" <jamagallon@able.es>
Subject: Re: the "cluster" system call (and file system type)
Date:	Mon, 16 Jul 2001 01:03:24 +0200
X-Mailer: KMail [version 1.2]
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <01071515170000.00696@mioooldpc> <20010715212905.A11513@werewolf.able.es>
In-Reply-To: <20010715212905.A11513@werewolf.able.es>
Cc:	"Albert D . Cahalan" <acahalan@cs.uml.edu>, irbis@orcero.org,
	"David L . Nicol" <david@kasey.umkc.edu>,
	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071601032402.00502@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

El Domingo 15 Julio 2001 21:29, escribiste:
> On 20010715 Jordi Polo wrote:
> >> I think what people hate is polluting /proc in separate subtrees, like
> >> putting things in /proc/sys/net/cluster, /proc/cpu/cluster, etc.
> >> I thing thigs can be done in an transparent way defining a fake
> >> clusterfs, with things like cluter_add_entry etc. that (by now) just
> >> call they equivalents in proc_xxx, rooted at /proc/cluster. If sometine
> >> you have to move the tree, well, just clone the proc_xxx functions to a
> >> new independent tree.
> >
> >As far as i can see you are going to do a /proc-like fs outside /proc so
> > as david want the same but in /proc i think that we can begin to think in
> > what will be inside that directory and later we can put it in /proc or
> > /cluster or whatever.
> >We are just discussing the minor thing we'll have to decide, when we have
> > all we want to be inside that directory we can just ask linus where he
> > wants it. Maybe is just a matter of taste :P
>
> That's what I wanted to say, I did not make it very clear. If all the info
> and control is to be in a file system, I see (I'm not a kernel hacker) two
> ways, devfs-like and proc-like. I only see using proc as a fast hack to
> start using and designing something that perhaps will need some special
> design or feature.
> But using /proc to start work, people can focus on what features want
> there.

Perfect, i don't mind if we spend the next 3 months just discussing the 
architecture . Otherwise i think we'll realize about things too late (with a 
lot of effort coding done) , so let it be /proc if you like and let's start 
to talk about arch.

I like a lot your idea about mounting a Fs to make our node member of a 
cluster. There we can have :
1.-Info about other nodes (maybe processes but more overload) there, things 
that we'll need to make better decitions(node number -ip , cost to arrive 
there, Â¿cpu mem loads?
2.- Configuration of the cluster parameters in the /proc way, we mainly have 
to make it as automatic as possible but let the administrator control for 
fine tuning.


Sorry to be so generic, i have no time now, i'll explain longer tomorrow


Something that worries me is how will we find the processes (as they go away 
from his node) we have 4 alternatives:

1.- a central server knows all : uggh , i don't like any weak point like that.
2.- We do a broadcast (maybe multicast .....) requesting that process.
3.- The info where the process lives  (i don't thing a process will migrate a 
lot )is updated in the node where the process was born . so we can just ask 
the node that according to the cpid is the local node and there will be the 
info . If that node is down we can use 1 or 2 . I like this alternative most. 
This is like a cache, if the info is not in that node we do a broadcast that 
is a more expensive operation.      
4.- if the processes don't migrate a lot makes sense update the info in 
everynode ?? i have no idea about this, surely in a single ethernet with hub 
it makes sense (if we have choosen number 3), but i don't know about other 
configurations. 


In broadcast in lans that use routers and are connected with other lans 
members of a cluster. the router can get that info from every lan   so if 
other lan make a broadcast request it doesn't have to broadcast it to the 
other lan because he has the info. Or just leave a cache of the last request 
so it doesn't have to make a request next time. 

And we surely want a cache in everynode of the processes requested (as it may 
be requested again) similar to the arp cache but this cache is not trown 
awayas arp but when we go to a node requesting a process that is not there 
(he migrates but our cache is outdated) that node say as it migrated so we 
choose 1234 above and update our cache, if that node is down we use 1234 also 
       
Feedback really wellcome . (written very late at night , i'm not sure if it's 
all rubbish :P)     

--
Jordi
  Student of Spain 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 03:52:53 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16401AbRGPBwi>; Mon, 16 Jul 2001 03:52:38 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:27401 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16376AbRGPBwS>;
	Mon, 16 Jul 2001 03:52:18 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6G1omd314427;
	Sun, 15 Jul 2001 21:50:48 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107160150.f6G1omd314427@saturn.cs.uml.edu>
Subject: Re: the "cluster" system call (and file system type)
To:	irbis@orcero.org
Date:	Sun, 15 Jul 2001 21:50:48 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan),
	david@kasey.umkc.edu (David L. Nicol),
	dnicol@cstp.umkc.edu (David L. Nicol),
	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	mumismo@wanadoo.es (Jordi Polo), linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107152014140.19848-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 15, 2001 08:41:43 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org writes:

>>> OK, but none of these are information of the system. These are devices.
>>
>> First of all, what is the difference? Consider /proc/kcore and /dev/mem.
>
> The mem can be seen as a block device. I can find how to see the
> information onf the machines of a cluster on a block device. In a char
> device, it is harder.

Never mind what the file type is. Until just recently, /proc/self/maps
was a pipe. There was a /dev/sndstat (a character device) that acted
just like most of the files in /proc. You could make a directory
readable even, so that "cat /foo/mydir/" displays "I'm a dir!".

>> Second of all, /proc/openprom is a whole tree of device information.
>> It used to be part of the /proc filesystem (thus the mountpoint), but
>> it was removed as part of an effort to clean the crud out or /proc.
>
>  I can not see why this is good.

It would be nice if you could, but that doesn't matter. Most of
the serious kernel hackers want to clean up /proc. That certainly
includes Alexander Viro and seems to include Linus Torvalds.

> In fact, I can not see why is so
> neccessary clean /proc, as far as it is very usefull, allow from shell
> scripts to do lots of cool things that administrators likes, and give to
> us a clear interface to the inner things of the system, in a way that it
> is independent for the language.

You don't need the /proc filesystem for this. You can write your
own filesystem instead, and you can mount it somewhere else.

>>> The phylosophy is completly different. Ask yourself about having a
>>> device on "/dev" with the process that are running on a single
>>> machine. Strange, isn't it? Then, change "local" for "node73". :-)
>>
>> /multi/73/ctl
>> /multi/73/mem
>> /multi/73/mailbox
>
> And how we do to map processes to devices? And, we will have one
> interface to control local processes, and other to control remote
> processes?

First one must consider: is this even a good thing to be doing?

If you go with SSI, your processes show up in /proc. If not, use
whatever admin daemon and cluster management interface you like.

Who said anything about devices anyway? The file type does not
matter at all. Linux can support any operation on any file type.
Your filesystem just needs to set the right pointers. I say we
make everything a... how about type 0x9000, does that sound cool?
We can call this S_IFPOO, the Penguin Poo type.

> How we handle with we have the information of "/proc" on
> your "/multi" def filesystem on a different format? Do we need to
> redesing the whole apps that use "/proc" to understand the cluster? :-?

Yep, you need to do that anyway for a non-SSI cluster.
Apps that use /proc do not understand cluster node IDs
and do not understand /proc/node73 either.

>> If you do the SSI trick, then remote processes appear in /proc just
>> like the local processes do. Otherwise /proc is mostly untouched.
>
> This is not realistics. Even on a SSI cluster, there are times that you
> want to know here is running a process.

Add a line to /proc/*/status for that.

> As an exaple, if you want to shut
> down a node for maintenance, or you want to discontect for the network, or
> for debbuging purporses, or because the process is bounded to the hardware
> -as X server-,

I have to wonder why you'd be running an X server, but OK...

> or because is an important local daemon -as crond-...

If you do SSI, there should be only one crond running. It need not
be bound to anything.

> there
> are lots of cases of this. If you perceive that one httpd fails because
> the network on this node is bad, and you can not discover where is running
> the httpd that is advicing you about the problem, you will enjoy on a
> 65000 nodes testing each one by one disconected for the cluster to
> discover what is happening.

How do you do that in any case? Maybe have the server stuff some
encrypted debug info into the response headers. So you get the PID.
(if this were not SSI, get the node ID as well) Then just kill it.

>>> The problem is about the amount of information.  I can not
>>> see all the information and control of a whole cluster on a
>>> device. The control of the device and the logic of the device
>>> would be so complex that it would be better using PVM and
>>> forgetting any kernel "facilities".
>>
>> What is wrong with that?
>
> What is wrong? Nothing, if we are talking about clustering on the kernel
> for "hype" and for marketing purporses. But if we want that Linux have
> some add value over other operating systems that runs PVM (as AIX,
> Solaris, SCO, and the most of the flavours of Unix, and other
> pseudo-OS like Windows 98 and Windows NT) we must to give to the top user,
> the administrator and the programer "facilities", not "dificulties".

Throwing everything in /proc sounds like hype and marketing to me.
Without SSI, the need for kernel support is greatly reduced.

> Doing
> a messy, flaky, complex and non-orthogal clustering mechanism to the
> kernel will be  worst than giving nothing, because nothing -what is that
> we have now- take less time, and we must center our efforts to develop
> under PVM.

You could have a nice library that hides the underlying interface.
You don't need to care if the library reads /proc, reads /multi,
or connects to a daemon on another node.

> Think from the point of view of a sysadm: my clusters are a completly
> headache, due to strange voodoo problems that happend with networks when
> you really use them. A great cluster sysadm must be strict,
> self-organizated and follow literaly KISS philosophy. If the kernel
> cluster is complex to administrate and to understand, there will be
> difficult a system administrator that wants put this staff on a 512 node
> cluster.

I don't see how throwing more crud into /proc will help you with
this problem.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 04:48:10 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16450AbRGPCsB>; Mon, 16 Jul 2001 04:48:01 +0200
Received: from lmg.ahnet.net ([207.150.192.13]:36640 "EHLO lmg01.affinity.com")
	by humbolt.nl.linux.org with ESMTP id <S16434AbRGPCrw>;
	Mon, 16 Jul 2001 04:47:52 +0200
Received: from blister.redondo.realbig.com ([24.16.173.193]) by lmg.ahnet.net with ESMTP id <293779-9614>; Sun, 15 Jul 2001 19:47:49 -0700
Date:	Sun, 15 Jul 2001 19:47:32 -0700 (PDT)
From:	Andy Poling <andy@realbig.com>
To:	irbis@orcero.org
cc:	linux-cluster@nl.linux.org, "David L. Nicol" <david@kasey.umkc.edu>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <Pine.LNX.4.30.0107152014140.19848-100000@hermes.orcero.org>
Message-ID: <Pine.LNX.4.21.0107151938150.967-100000@blister.redondo.realbig.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org sez:
>  I can not see why this is good. In fact, I can not see why is so
> neccessary clean /proc, as far as it is very usefull, allow from shell
> scripts to do lots of cool things that administrators likes, and give to
> us a clear interface to the inner things of the system, in a way that it
> is independent for the language.

Remember the original purpose of /proc (as created by Bell Labs according to
the deep recesses of my memory).  It was intended to provide information and
control points for running processes - nothing more.  Thus the name "proc". :-)

The Linux community has taken that concept and extended it to also encompass
anything related to the running kernel... and even outside the kernel in some
cases.

I think it's a good idea to establish a seperate pseudo-filesystem hierarchy
for cluster-related information.  That wouldn't necessarily mean that it has
to be difficult to implement/maintain - simply that we wouldn't further
bastardize the original /proc concept...


David Nicol sez:
> Bolt-on fake file systems were more difficult when they ported MOSIX
> to Linux, they're easier now (there's even a howto document about it
> and a standard fake-file-system interface: you don't need to pretend
> to be a local NFS server any more.)

I'd definitely like to take a look at that howto doc... where can it be found?

-Andy





Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 08:59:46 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16474AbRGPG7g>; Mon, 16 Jul 2001 08:59:36 +0200
Received: from [213.98.27.110] ([213.98.27.110]:27908 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16457AbRGPG7V>;
	Mon, 16 Jul 2001 08:59:21 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6G96t625009;
	Mon, 16 Jul 2001 09:06:56 GMT
Date:	Mon, 16 Jul 2001 09:06:55 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Andy Poling <andy@realbig.com>
cc:	<linux-cluster@nl.linux.org>,
	"David L. Nicol" <david@kasey.umkc.edu>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <Pine.LNX.4.21.0107151938150.967-100000@blister.redondo.realbig.com>
Message-ID: <Pine.LNX.4.30.0107160903400.24929-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, Andy:

> Remember the original purpose of /proc (as created by Bell Labs according to
> the deep recesses of my memory).  It was intended to provide information and
> control points for running processes - nothing more.  Thus the name "proc". :-)

 Well, we are talking about process. Then, I can not find the problem. :-)

> The Linux community has taken that concept and extended it to also encompass
> anything related to the running kernel... and even outside the kernel in some
> cases.

 Personally, as system administrator and programmer, I find the "/proc"
mechanism to control nearly anything one of the strong points of the
Linux kernel.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 09:28:24 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16465AbRGPH2S>; Mon, 16 Jul 2001 09:28:18 +0200
Received: from [213.98.27.110] ([213.98.27.110]:40452 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16416AbRGPH17>;
	Mon, 16 Jul 2001 09:27:59 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6G9YH625169;
	Mon, 16 Jul 2001 09:34:17 GMT
Date:	Mon, 16 Jul 2001 09:34:17 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	"David L. Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, <linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <200107160150.f6G1omd314427@saturn.cs.uml.edu>
Message-ID: <Pine.LNX.4.30.0107160907170.24929-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, all!

> >> It used to be part of the /proc filesystem (thus the mountpoint), but
> >> it was removed as part of an effort to clean the crud out or /proc.
> >
> >  I can not see why this is good.
>
> It would be nice if you could, but that doesn't matter. Most of

 Well, it should be some rationale. "Authority" rationale could work on
middle ages with Aristoteles teaching, but for me never did work very
well. For me, it DOES matter.


> the serious kernel hackers want to clean up /proc. That certainly
> includes Alexander Viro and seems to include Linus Torvalds.

 Then, the rationale is: Linus and Viro says?

> > And how we do to map processes to devices? And, we will have one
> > interface to control local processes, and other to control remote
> > processes?
>
> First one must consider: is this even a good thing to be doing?


 No. That is what I amy trying to say on the whole thread. We have a
interface for local process, "/proc". Let it be the remote process on
"/proc", with some more information about where is really executing. Maybe
it is only good for people that does not matter what they thing -as
myself-, but it sounds logic, and orthogonal.

> > As an exaple, if you want to shut
> > down a node for maintenance, or you want to discontect for the network, or
> > for debbuging purporses, or because the process is bounded to the hardware
> > -as X server-,
>
> I have to wonder why you'd be running an X server, but OK...

 Well, it is only a CLEAR example. I always run ONE X server. Maybe you
are so inteligent that only reading a PDB file with vi can ajust the
molecules position to refine the structure of a protein. I am not SO
inteligent, and I need to see it. (I am working on the full automatization
of the process. When I get it, maybe I win the nobel prize. ;-) but now
you NEED X.

 After explaining to you why anybody want to execute a X server, If
somebody is enough inteligent for not to need to see the protein, or you
work on other think, thing on a DSP, or thing on a ASIC.

> > or because is an important local daemon -as crond-...
>
> If you do SSI, there should be only one crond running. It need not
> be bound to anything.

 No. You will have local crond. And maybe a global crond. There are
maintance task on each node of a SSI cluster, and I will NOT doing them by
hand.

> > the httpd that is advicing you about the problem, you will enjoy on a
> > 65000 nodes testing each one by one disconected for the cluster to
> > discover what is happening.
>
> How do you do that in any case? Maybe have the server stuff some
> encrypted debug info into the response headers. So you get the PID.
> (if this were not SSI, get the node ID as well) Then just kill it.

 No, I am not talking about this. I am talking about hardware problems on
one node, that you discover with the messages of the applications.
Hardware fails on the real world.

> > Solaris, SCO, and the most of the flavours of Unix, and other
> > pseudo-OS like Windows 98 and Windows NT) we must to give to the top user,
> > the administrator and the programer "facilities", not "dificulties".
>
> Throwing everything in /proc sounds like hype and marketing to me.
> Without SSI, the need for kernel support is greatly reduced.

 Without SSI, without a common "/proc", what is exactly what you want for
clustering on the kernel? The weel was invented: it is PVM. We have all at
userland now, with PVM. If you feel good with all clustering on userland,
try PVM.

> You could have a nice library that hides the underlying interface.
> You don't need to care if the library reads /proc, reads /multi,
> or connects to a daemon on another node.

 Once again, that is the state of art TWENTY years ago. We have this
twenty years ago. It is not needed so hype for your proponsal. Nearly all
OS have it. And Linux have it since sockets were included on the kernel.

> > self-organizated and follow literaly KISS philosophy. If the kernel
> > cluster is complex to administrate and to understand, there will be
> > difficult a system administrator that wants put this staff on a 512 node
> > cluster.
>
> I don't see how throwing more crud into /proc will help you with
> this problem.

 On my Mosix cluster, It helps a lot. Maybe I was doing the wrong thing
administrating it by scripts, and I should do it by hand. It is great
doing to work on weekends to do manualy the things, and wake up to strange
hours to do maintance work over the processes.

 Yours:

 David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 14:33:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16178AbRGPMdY>; Mon, 16 Jul 2001 14:33:24 +0200
Received: from gate.in-addr.de ([212.8.193.158]:14607 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16095AbRGPMdI>;
	Mon, 16 Jul 2001 14:33:08 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP
	id 0E590421E0; Mon, 16 Jul 2001 14:33:02 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id 6EEBC1AD14; Mon, 16 Jul 2001 14:33:27 +0200 (CEST)
Date:	Mon, 16 Jul 2001 14:33:27 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	David Brower <David.Brower@oracle.com>
Cc:	Peter Badovinatz <tabmowzo@yahoo.com>, linux-cluster@nl.linux.org,
	Greg Lindahl <lindahl@conservativecomputer.com>
Subject: Re: Clusterwide pids
Message-ID: <20010716143327.D1400@marowsky-bree.de>
References: <20010712211808.53179.qmail@web9207.mail.yahoo.com> <3B4E1989.6FAD51C4@oracle.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
In-Reply-To: <3B4E1989.6FAD51C4@oracle.com>; from "David Brower" on 2001-07-12T14:41:29
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On 2001-07-12T14:41:29,
   David Brower <David.Brower@oracle.com> said:

I would happen to agree.

SSI clusters are neat and all that, and I agree that "share or do not share"
is _one_ viable approach to it - but the general framework might allow for
local processes too. Oh well.

For now, I would be happy if we made progress on the issues of defining a
common API for the membership and communication layers and identify further
components (DLM, resource management, node eviction etc) and start from that.

Because even if hell freezes over you are going to need this to get SSI right
on a Unix like OS ;-) And you need a solid fundament for this.

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 14:45:50 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16352AbRGPMpp>; Mon, 16 Jul 2001 14:45:45 +0200
Received: from pentafluge.infradead.org ([195.224.55.251]:33035 "EHLO
	pentafluge.infradead.org") by humbolt.nl.linux.org with ESMTP
	id <S16250AbRGPMpN>; Mon, 16 Jul 2001 14:45:13 +0200
Received: from dclient116-28.hispeed.ch ([62.2.116.28] helo=localhost)
	by pentafluge.infradead.org with smtp (Exim 3.22 #1 (Red Hat Linux))
	id 15M7dh-0007my-00
	for <linux-cluster@nl.linux.org>; Mon, 16 Jul 2001 13:38:52 +0100
Message-ID: <6052200171169343450@e-wideweb.com>
X-EM-Version: 4, 5, 0, 0
X-EM-Registration: #30C3410514B417038530
Reply-To: mma@e-wideweb.com
From:	"Urs Stettler" <mma@e-wideweb.com>
To:	"linux-cluster@nl.linux.org" <linux-cluster@nl.linux.org>
Subject: WARNING!!! DON'T READ THIS IF YOU DON'T LIKE MONEY
Date:	Mon, 16 Jul 2001 11:3:43 +0200
MIME-Version: 1.0
Content-type: text/plain; charset=iso-8859-1
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

WOW THIS REALLY WORKS!!!

Dear Friend & Future Millionaire:

AS SEEN ON NATIONAL TV:
Making over half a million dollars every 4-5 months from your home for 
an investment of only $25 U.S. Dollars expense one time.
THANKS TO THE COMPUTER AGE AND THE INTERNET!!
============================================
BE A MILLIONAIRE LIKE THE OTHERS WITH IN A YEAR!!!
Before you say "BULL" please read the following. This is the letter you
have beeen hearing about on the news lately. Due to the populartity of
this letter on the inernet , A national weekly news program recently 
devoted an entire show to the investigation of this program described
below, to see
if it really can make people money. The show also investigated wheather 
or not the program was legal. Their findings proved once and for all that
there are absolutely NO LAWS
prohibiting the participation in the program and if people can follow the 
simple instrucyions, they are bound to make some mega bucks with only
$25 out of pocket cost". DUE TO THE RECENT INCREASE OF 
POPULARITY & RESPECT THIS PROGRAM HAS ATTANED,
IT IS CURRENTLY WORKING BETTER THAN EVER.
This is what one had to say:"Thanks to this profitable opportunity, I
was approached many times before but each time I pased on it. I am 
so glad I finally joined to see what one could expect in return for the
minimal effort and money required. To my astonishment, I recieved total
$610,470.00 in 21 weeks, with money still comming in."
Pam Hedland, Fort Lee New Jersey.
============================================
Here is another testimonial:  "This program has been around for a long
time but I never believed in it. But one day when I recieved this again 
in the mail I decided to gamble my $25 on it. I followed the simple 
instructions and walaa.....3 weeks later the money started to come in.
First month I only made $240.00 but the next two months I made 
a total of $290,000.00. So far in the past 8 months by re-entering the 
program, I have made over $710,000.00 and I am playing it again. The
key to success in this program is to follow the the simple steps and NOT 
change anything." More testimonials later but first.

=====PRINT THIS NOW FOR YOUR FUTURE REFERANCE=========
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
If you would like to make at least $500,000 every 4 to 5 months easily and
comfortably, please read the following...THEN READ IT AGAIN and 
AGAIN!!!
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
FOLLOW THE SIMPLE INSTRUCTIONS BELOW AND YOUR 
FINANCIAL
DREAMS WILL COME TRUE, GUARANTEED! 
 
INSTRUCTIONS
=====Order all 5 reports shown on the list below=====
For each report, send $5 CASH ,THE NAME & NUMBER OF THE 
REPORTYOU ARE ORDERINGand YOUR E-MAIL ADDRESS to the person 
whose name appears ON THAT LIST next to the report. MAKE SURE YOUR 
RETURN ADDRESS IS ON THE ENVELOPE TOP LEFT CORNER in case of any 
mail problems.
===When you place your order, make sure you order each of the 5 reports.
You will need all 5 reports so that you can save them on your computer
and resell them. YOUR TOTAL COST $5 X 5=$25.00.
Within a few days you will recieve, via e-mail, each of the five reports
from 
these 5 different individuals. Save them on your computer so they will be
accessible for you to send to the 1000's of people who will order them
from you. Also make a floppy of these reports and keep it on your desk in
case something happens to your computer.

IMPORTANT-DO NOT  alter the names of the peple who are listed next
to each report, or their sequence on the list, in any way, other than what
is
instructed below in step "1 through 6" or you will lose out on the majority
of your porfits, Once you understand the way this works you will also see
how it does not work if you change it. Remember, this method has been 
tested, and if you alter it, it will NOT work!!! People have tried to put
there
friends/relatives names on all five thinking they could get all the money. 
But it does not work this way. Believe us, we have all tried to be greedy
and 
then nothing happened. So DO NOT try to change anything other than what is 
instructed. Because if you do it will not work for you.
Remember, honesty reaps the reward!!!!

1....After you have ordered all 5 reports , take this advertisement and
REMOVE the name & address of the person in REPORT # 5. This person 
has made it through the cycle and is no doubt counting there fortunes.
2.... Move the name & address in REPORT #4 down To REPORT #5.
3.... MOVE the name & address in REPORT #3 down To REPORT #4.
4.... MOVE the name & address in REPORT #2 down To REPORT #3.
5.... MOVE the name & address in REPORT #1 down To REPORT #2.
6.... Insert YOUR name & address in the REPORT #1 Position. PLEASE 
MAKE SURE you copy every name & address ACCURATELY!
==============================================
**** Take this entire letter ,with the modified list of names, and save it
on 
your computer. DO NOT MAKE ANY OTHER CHANGES.
Save this on a disk as well just in case if you loose any data. To assist
you with
marketing your business on the internet, the 5 reports you purchase will 
provide you with invaluable marketing information which includes how to
send 
bulk e-mails legally, where to find thousands of free classified ads and
much 
more.
.
There are 2 Primary methods to get this venture going:
METHOD # 1: BY SENDING BULK E-MAIL LEGALLY
================================================
Let's say that you decide to start small, just to see how it goes, and we
will
assume You and those involved send out only 5,000 e-mails each. Let's
also assume that the mailing recieve only a 0.2% response (the response
could be much better but lets just say it is only 0.2% also many people
will send out hundreds of thousands of e-mails instead of only 5,000 each).
Continuing with this example , you send out only 5,000 e-mails. With a 
0.2% response, that is only 10 orders for report #1. Those 10 people
responded 
by sending out 5,000 e-mails each for a total of 50,000. Out of those 
50,000 e-mails only 0.2% responded with orders. That = 100 people responded 
and ordered Report #2.

Those 100 people mail out 5,000 e-mails for a total of 500,000 e-mails.
The 0.2% response to that is 1000 orders for Report #3.
Those 1000 people send out 5,000 e-mails each for a total of 5 million e-
mails sent out. The 0.2% response to that is 10,000 orders for Report #4.
Those 10,000 people send out 5,000 e-mails  each for a total of 
50,000,000 (50 million) e-mails. The 0.2% response to that is 100,000
orders for Report
#5 THAT'S 100,000 ORDERS TIMES $5 EACH=$555,000 ( half million).

Your total income in this example is: 1.....$50 + 2.....$500 + 3.....$5,000
+ 
4.....$50.000 + 5.....$500,000.......Grand Total=$555,550.00

NUMBERS DONT LIE. GET A PENCIL & PAPER AND FIGURE IT OUT
THE WORST POSSIBLE RESPONSES AND NO MATTER HOW YOU
CALCULATE IT, YOU WILL STILL MAKE A LOT OF MONEY!!
================================================
REMEMBER FRIEND THIS IS ASSUMING ONLY 10 PEOPLE 
ORDER OUT OF 5,000 YOU MAILED TO.
Dare to think  for a monment what would happen if everyone or half or 
even one 4th of those people 100,000 e-mails each or more? There are 
over 150million people on the internet worldwide and counting. Believe 
me, many people will do just that, and more!

METHOD #2 : BY PLACING FREE ADS ON THE INTERNET
================================================
ADVERTISING ON THE NET IS VERY VERY INEXPENSIVE and there 
are hundreds of FREE places to advertise. Placing a lot of free ads on the
internet will 
easily get a larger response. We strongly suggest you start with Method# 
1 and add Method #2 as you go along . For every $5 you recieve , all you
have to do is e-mail them the Report they ordered. That's it. Always 
provide same day service on all orders.
This will guarantee that the e-mail they send out, with your name and 
address on it, will be prompt because they can not advertise until they 
recieve the report.
===========AVAILABLE REPORTS========================
ORDER EACH REPORT BY ITS NUMBER & NAME ONLY. Notes:
Always send $5 cash (U.S. CURRENCY) for each Report. Checks NOT
accepted. Make sure the cash is concealed by rapping it in at least 2 
sheets of paper. On one of those sheets of paper, Write the NUMBER & the 
NAME of the report you are ordering, YOUR E-MAIL ADDRESS and your name
and postal address.
PLACE YOUR ORDER FOR THESE REPORTS NOW:
================================================
REPORT # 1:" the Insider's Guide to Advertising for Free on the Net"
Order Report #1 from:

U. Stettler
Langackerstr. 165
CH-8704 Herrliberg
Switzerland
____________________________________________________________
REPORT #2" The Insider's Guide to Sending Bulk e-mail on the Net"
Order Report #2 from:

B. Collins
155 East Beaver Creek Rd.
Unit 24, Suite 235
Richmond Hill, Ontario
L4B 2N1
Canada 
____________________________________________________________
REPORT #3:" Secret to Multilevel marketing on the net"
Order Report #3 from:


R.Robinson
14612 Friar st.#5
VanNuys CA, 91411
USA

_____________________________________________________________
REPORT #4:" How to Become a Millionaire Utilizing MLM & the Net"
Order Report #4 from:


Randall Williams 
401 Stocks Dairy Road 
Leesburg Georgia 31763 
USA 

____________________________________________________________
REPORT #5: "How to Send Out One Million e-mails for Free"
Order Report #5 from:

Dario Va 
16541 Blatt Blvd #206 
Weston, FL 33326 
U.S.A. 
_____________________________________________________________   
$$$$$$$$$YOUR SUCCESS GUIDELINES$$$$$$$$$$$$$$$$$
Follow these guildelines to guarantee your success:
===If you do not receive at least 10 orders for Report #1 within 2 
weeks, continue sending e-mails until you do.
===After you have recieved 10 orders, 2 to 3 weeks after that you
should receive 100 orders  or more for REPORT #2. If you did not,
continue advertising or sending e-mails untill you do.
===Once you received 100 or more orders for Report #2, YOU 
CAN RELAX, because the system is already working for you, and the 
cash will continue to roll in! THIS IS IMPORTANT TO REMEMBER:
every time your name is moved down the list, you are placed in front 
of a different report.

You can KEEP TRACK of your PROGRESS by watching which report 
people are ordering from you. IF YOU WANT TO GENERATE MORE 
INCOME SEND ANOTHER BATCH OF E-MAILS AND START
THE WHOLE PROCESS AGAIN.
There is NO LIMIT to the income you can generate from this business!!!
================================================
FOLLOWING IS A NOTE FROM THE ORIGINATOR OF THIS 
PROGRAM:
You have just received information that can give you finiancial freedom for 
the rest of your life, with NO RISK and just A LITTLE BIT OF EFFORT.you 
can make more money in the next few weeks and months than you ever
imagined. 
follow the program EXACTLY AS INSTRUCTED. Do Not change it in any way. It
works
exceedingly well as it is now. Remember to e-mail a copy of this exciting
report after you have put 
your name and address in the Report #1 and moved the others to 
#2.......#5 as instructed above. One of the peole you send this to may send
out
100,000 or more e-mails and your name will be on every one of them.
Remember though, the more you send out the more potiential customers
you will reach. So my friend, I have given you the ideas, information,
materials and 
opportunity to become financially independent ITS UP TO YOU NOW!

============MORE TESTIMONIALS=======================
"My name is Mitchell, My wife Jody ad I live in Chicago. I am an 
accountant with a major U.S. Corparation and I make pretty good money.
When I received this program I grumbled  to Jody about receiving" junk
mail". I made fun of the whole thing , spouting my knowledge of the 
population and percentages involved. I "knew" it wouldn't work. Jody
totally ignored
my supposed intelligence and a few days later she jumped in with both 
feet. I made merciless fun of her, and was ready to lay the ol "I told you
so" on 
her when the thing didnt work. Well , the laugh was on me ,Within 3 
weeks she had received 50 responses. Within the next 45 days she had 
received total $147,200.00.......all cash! I was shocked. I have joined
Jody 
in her" hobby".
Mitchell Wolf M.D., Chicago Illinois
================================================
"Not being the gambling type , it took me several weeks to make up my
mind to participate in this plan. But conservative that i am, I decided that
 the initial investment was so little that there was just no way that I
wouldn't get enough orders to at least get my money back". "I was
surprised when I found my medium size post office box crammed with
orders. I made $319,210.00 in the first 12 weeks. The nice thing about 
this deal is that it does not matter where people live. There simply isn't a
better investment with a faster return and so big"
Dan Sondstrom, Alberta Canada
================================================
"I had received this program before. I deleted it, but later I wondered
if I should have given it a try. Of course, I had no idea who to contact to
get  another copy, so I had to wait until I was e-mailed again by someone 
else......11 months passed then it luckily came again.....I did not
delete this one! I made more than $490,000 on my first try and all the
the money came within 22 weeks."
Susan De Suza, New York, N.Y.
================================================
"It really is a great opportunity to make relatively easy money with
little cost to you. I followed the simple instructions carefully and
within 10 days the money started to come in. My first month I made 
$20,560.00 and by the end of the third month my total cash count was
$362,840.00. Life is beautiful, Thanx to the internet.".
Fred Dellaca, Westport New Zealand
================================================
ORDER YUR REPORTS TODAY AND GET STARTED ON
"YOUR" ROAD TO FINANCIAL FREEDOM!
================================================
If you have any questions of the legality of this program, contact the 
Office of Associate Director for Marketing Practices, Federal Trade 
Commission
Bureau of Consumer Protection, Washington ,D.C. 





This is a one time mailing and you will not be contacted 
again. Should you chose to have your name completely eliminated
from our data-base, you can press the respond button and type 
"remove" on the subject line.




Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 19:48:35 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16231AbRGPRsT>; Mon, 16 Jul 2001 19:48:19 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:17670 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16067AbRGPRrz>;
	Mon, 16 Jul 2001 19:47:55 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GHkOQ350465;
	Mon, 16 Jul 2001 13:46:24 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107161746.f6GHkOQ350465@saturn.cs.uml.edu>
Subject: Re: the "cluster" system call (and file system type)
To:	irbis@orcero.org
Date:	Mon, 16 Jul 2001 13:46:24 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan),
	david@kasey.umkc.edu (David L. Nicol),
	dnicol@cstp.umkc.edu (David L. Nicol),
	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	mumismo@wanadoo.es (Jordi Polo), linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107160907170.24929-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 16, 2001 09:34:17 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org writes:

>>>> It used to be part of the /proc filesystem (thus the mountpoint), but
>>>> it was removed as part of an effort to clean the crud out or /proc.
>>>
>>>  I can not see why this is good.
>>
>> It would be nice if you could, but that doesn't matter. Most of
>
>  Well, it should be some rationale. "Authority" rationale could work on
> middle ages with Aristoteles teaching, but for me never did work very
> well. For me, it DOES matter.
>
>> the serious kernel hackers want to clean up /proc. That certainly
>> includes Alexander Viro and seems to include Linus Torvalds.
>
>  Then, the rationale is: Linus and Viro says?

Not really, but it ought to do. I'll try to explain again though:

There are lots of ill-behaved files in /proc. Some of them depend
on kernel config options. The content has undocumented syntax that
changes on a whim. Notice that /proc is like a kid's toy box.
Everything gets thrown into it as a disorganized heap.

The ioctl() calls and /dev are also under fire for being messy.

These messes lead to bugs, including security holes. It is
difficult to verify that every last one of these interfaces
works correctly. The code for /proc or /dev becomes difficult
to maintain when it grows in misc. odd directions.

It looks like the /proc filesystem will be split into two filesystems.
To keep old software working, /proc will be a union mount of the two
new filesystems. (union mounts don't work yet)

>>> And how we do to map processes to devices? And, we will have one
>>> interface to control local processes, and other to control remote
>>> processes?
>>
>> First one must consider: is this even a good thing to be doing?
>
>  No. That is what I amy trying to say on the whole thread.

I mean "map processes to devices". I never suggested to do that.
It is something you came up with, and I think it is a poor idea.

> We have a
> interface for local process, "/proc". Let it be the remote process on
> "/proc", with some more information about where is really executing. Maybe
> it is only good for people that does not matter what they thing -as
> myself-, but it sounds logic, and orthogonal.

This is nice for SSI clusters.

>>> As an exaple, if you want to shut
>>> down a node for maintenance, or you want to discontect for the network, or
>>> for debbuging purporses, or because the process is bounded to the hardware
>>> -as X server-,
>>
>> I have to wonder why you'd be running an X server, but OK...
>
>  Well, it is only a CLEAR example. I always run ONE X server. Maybe you
> are so inteligent that only reading a PDB file with vi can ajust the
> molecules position to refine the structure of a protein. I am not SO
> inteligent, and I need to see it. (I am working on the full automatization
> of the process. When I get it, maybe I win the nobel prize. ;-) but now
> you NEED X.
>
>  After explaining to you why anybody want to execute a X server, If
> somebody is enough inteligent for not to need to see the protein, or you
> work on other think, thing on a DSP, or thing on a ASIC.

You only need the client libraries for this. You don't need to
run the X server. Set your DISPLAY environment variable to your
desktop machine... you know about remote X usage I hope.

>>> or because is an important local daemon -as crond-...
>>
>> If you do SSI, there should be only one crond running. It need not
>> be bound to anything.
>
>  No. You will have local crond. And maybe a global crond. There are
> maintance task on each node of a SSI cluster, and I will NOT doing them by
> hand.

There should not be maintance task on each node of a SSI cluster.
That nearly a contradiction. I don't think flashing a new BIOS is
something you do from a cron job.

>>> Solaris, SCO, and the most of the flavours of Unix, and other
>>> pseudo-OS like Windows 98 and Windows NT) we must to give to the top user,
>>> the administrator and the programer "facilities", not "dificulties".
>>
>> Throwing everything in /proc sounds like hype and marketing to me.
>> Without SSI, the need for kernel support is greatly reduced.
>
>  Without SSI, without a common "/proc", what is exactly what you want for
> clustering on the kernel? The weel was invented: it is PVM. We have all at
> userland now, with PVM. If you feel good with all clustering on userland,
> try PVM.

Good idea. You don't need kernel support.

>> You could have a nice library that hides the underlying interface.
>> You don't need to care if the library reads /proc, reads /multi,
>> or connects to a daemon on another node.
>
>  Once again, that is the state of art TWENTY years ago. We have this
> twenty years ago. It is not needed so hype for your proponsal. Nearly all
> OS have it. And Linux have it since sockets were included on the kernel.

It's not broken. Why change it?

>>> self-organizated and follow literaly KISS philosophy. If the kernel
>>> cluster is complex to administrate and to understand, there will be
>>> difficult a system administrator that wants put this staff on a 512 node
>>> cluster.
>>
>> I don't see how throwing more crud into /proc will help you with
>> this problem.
>
>  On my Mosix cluster, It helps a lot. Maybe I was doing the wrong thing
> administrating it by scripts, and I should do it by hand. It is great
> doing to work on weekends to do manualy the things, and wake up to strange
> hours to do maintance work over the processes.

"not in /proc" DOES NOT MEAN "do it by hand"

With extra crud in the kernel:  foo=`cat /proc/foo`
With a lean and simple kernel:  foo=`cluster --foo`

You can script it either way.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 19:49:35 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16372AbRGPRt3>; Mon, 16 Jul 2001 19:49:29 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:21510 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16323AbRGPRtQ>;
	Mon, 16 Jul 2001 19:49:16 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GHkOQ350465;
	Mon, 16 Jul 2001 13:46:24 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107161746.f6GHkOQ350465@saturn.cs.uml.edu>
Subject: Re: the "cluster" system call (and file system type)
To:	irbis@orcero.org
Date:	Mon, 16 Jul 2001 13:46:24 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan),
	david@kasey.umkc.edu (David L. Nicol),
	dnicol@cstp.umkc.edu (David L. Nicol),
	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	mumismo@wanadoo.es (Jordi Polo), linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107160907170.24929-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 16, 2001 09:34:17 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org writes:

>>>> It used to be part of the /proc filesystem (thus the mountpoint), but
>>>> it was removed as part of an effort to clean the crud out or /proc.
>>>
>>>  I can not see why this is good.
>>
>> It would be nice if you could, but that doesn't matter. Most of
>
>  Well, it should be some rationale. "Authority" rationale could work on
> middle ages with Aristoteles teaching, but for me never did work very
> well. For me, it DOES matter.
>
>> the serious kernel hackers want to clean up /proc. That certainly
>> includes Alexander Viro and seems to include Linus Torvalds.
>
>  Then, the rationale is: Linus and Viro says?

Not really, but it ought to do. I'll try to explain again though:

There are lots of ill-behaved files in /proc. Some of them depend
on kernel config options. The content has undocumented syntax that
changes on a whim. Notice that /proc is like a kid's toy box.
Everything gets thrown into it as a disorganized heap.

The ioctl() calls and /dev are also under fire for being messy.

These messes lead to bugs, including security holes. It is
difficult to verify that every last one of these interfaces
works correctly. The code for /proc or /dev becomes difficult
to maintain when it grows in misc. odd directions.

It looks like the /proc filesystem will be split into two filesystems.
To keep old software working, /proc will be a union mount of the two
new filesystems. (union mounts don't work yet)

>>> And how we do to map processes to devices? And, we will have one
>>> interface to control local processes, and other to control remote
>>> processes?
>>
>> First one must consider: is this even a good thing to be doing?
>
>  No. That is what I amy trying to say on the whole thread.

I mean "map processes to devices". I never suggested to do that.
It is something you came up with, and I think it is a poor idea.

> We have a
> interface for local process, "/proc". Let it be the remote process on
> "/proc", with some more information about where is really executing. Maybe
> it is only good for people that does not matter what they thing -as
> myself-, but it sounds logic, and orthogonal.

This is nice for SSI clusters.

>>> As an exaple, if you want to shut
>>> down a node for maintenance, or you want to discontect for the network, or
>>> for debbuging purporses, or because the process is bounded to the hardware
>>> -as X server-,
>>
>> I have to wonder why you'd be running an X server, but OK...
>
>  Well, it is only a CLEAR example. I always run ONE X server. Maybe you
> are so inteligent that only reading a PDB file with vi can ajust the
> molecules position to refine the structure of a protein. I am not SO
> inteligent, and I need to see it. (I am working on the full automatization
> of the process. When I get it, maybe I win the nobel prize. ;-) but now
> you NEED X.
>
>  After explaining to you why anybody want to execute a X server, If
> somebody is enough inteligent for not to need to see the protein, or you
> work on other think, thing on a DSP, or thing on a ASIC.

You only need the client libraries for this. You don't need to
run the X server. Set your DISPLAY environment variable to your
desktop machine... you know about remote X usage I hope.

>>> or because is an important local daemon -as crond-...
>>
>> If you do SSI, there should be only one crond running. It need not
>> be bound to anything.
>
>  No. You will have local crond. And maybe a global crond. There are
> maintance task on each node of a SSI cluster, and I will NOT doing them by
> hand.

There should not be maintance task on each node of a SSI cluster.
That nearly a contradiction. I don't think flashing a new BIOS is
something you do from a cron job.

>>> Solaris, SCO, and the most of the flavours of Unix, and other
>>> pseudo-OS like Windows 98 and Windows NT) we must to give to the top user,
>>> the administrator and the programer "facilities", not "dificulties".
>>
>> Throwing everything in /proc sounds like hype and marketing to me.
>> Without SSI, the need for kernel support is greatly reduced.
>
>  Without SSI, without a common "/proc", what is exactly what you want for
> clustering on the kernel? The weel was invented: it is PVM. We have all at
> userland now, with PVM. If you feel good with all clustering on userland,
> try PVM.

Good idea. You don't need kernel support.

>> You could have a nice library that hides the underlying interface.
>> You don't need to care if the library reads /proc, reads /multi,
>> or connects to a daemon on another node.
>
>  Once again, that is the state of art TWENTY years ago. We have this
> twenty years ago. It is not needed so hype for your proponsal. Nearly all
> OS have it. And Linux have it since sockets were included on the kernel.

It's not broken. Why change it?

>>> self-organizated and follow literaly KISS philosophy. If the kernel
>>> cluster is complex to administrate and to understand, there will be
>>> difficult a system administrator that wants put this staff on a 512 node
>>> cluster.
>>
>> I don't see how throwing more crud into /proc will help you with
>> this problem.
>
>  On my Mosix cluster, It helps a lot. Maybe I was doing the wrong thing
> administrating it by scripts, and I should do it by hand. It is great
> doing to work on weekends to do manualy the things, and wake up to strange
> hours to do maintance work over the processes.

"not in /proc" DOES NOT MEAN "do it by hand"

With extra crud in the kernel:  foo=`cat /proc/foo`
With a lean and simple kernel:  foo=`cluster --foo`

You can script it either way.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 19:59:14 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16272AbRGPR65>; Mon, 16 Jul 2001 19:58:57 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:12804 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16224AbRGPR6u>;
	Mon, 16 Jul 2001 19:58:50 +0200
Received: (qmail 285295 invoked from network); 16 Jul 2001 17:54:51 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 16 Jul 2001 17:54:51 -0000
Message-ID: <3B532994.CD53DBEC@kasey.umkc.edu>
Date:	Mon, 16 Jul 2001 12:51:16 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Jordi Polo <mumismo@wanadoo.es>
CC:	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>, irbis@orcero.org,
	linux-cluster@nl.linux.org
Subject: another laundry or shopping list
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <01071515170000.00696@mioooldpc> <20010715212905.A11513@werewolf.able.es> <01071601032402.00502@mioooldpc>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Jordi Polo wrote:

> > But using /proc to start work, people can focus on what features want
> > there.

Lets mount it, by default and for discussion purposes, at /proc/cluster


> Sorry to be so generic, i have no time now, i'll explain longer tomorrow
> 
> Something that worries me is how will we find the processes (as they go away
> from his node) we have 4 alternatives:


> 3.- The info where the process lives  (i don't thing a process will migrate a
> lot )is updated in the node where the process was born . so we can just ask
> the node that according to the cpid is the local node and there will be the
> info . If that node is down we can use 1 or 2 . I like this alternative most.
> This is like a cache, if the info is not in that node we do a broadcast that
> is a more expensive operation.


I like MOSIX's home-node paradigm.  A processes home node is where it 
started, and the home node is responsible for keeping track of the
process, wherever it goes, so it can send it signals and such, so when
a process moves from one non-home node to another, it has to register
this with its home node.

Meaning in CPID terms, that a signal for a cpid would get redirected
by the home node, which is obvious from the CPID.

That's for process migration.





If all the different pieces operate independently

	remote swap space

	remote storage

	remote CPU

	remote IO

	remote signal delivery

	remote whatever-is-left

and we have standards for them, we can have heterogeneous clusters,
since the only thing you need CPU congruity for is remote CPU.

There could be, say, a Central Swap Server that provides swap space
for all the machines, and stores it in its own /dev/shm, and
we could have cluster-wide absolute memory addressing, towards
a standard cluster-wide memory sharing protocol.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 20:39:46 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16224AbRGPSji>; Mon, 16 Jul 2001 20:39:38 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:36103 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16067AbRGPSjT>;
	Mon, 16 Jul 2001 20:39:19 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GIbqL338950;
	Mon, 16 Jul 2001 14:37:52 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107161837.f6GIbqL338950@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	david@kasey.umkc.edu (David L. Nicol)
Date:	Mon, 16 Jul 2001 14:37:52 -0400 (EDT)
Cc:	mumismo@wanadoo.es (Jordi Polo),
	jamagallon@able.es (J . A . Magallon),
	acahalan@cs.uml.edu (Albert D . Cahalan), irbis@orcero.org,
	linux-cluster@nl.linux.org
In-Reply-To: <3B532994.CD53DBEC@kasey.umkc.edu> from "David L. Nicol" at Jul 16, 2001 12:51:16 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David L. Nicol writes:
> Jordi Polo wrote:

>>> But using /proc to start work, people can focus on what features want
>>> there.
>
> Lets mount it, by default and for discussion purposes, at /proc/cluster

That means you need a mount point, which means more junk in /proc.
Forget about using /proc. You can have something similar to /proc,
but with separate code and a completely unrelated mount point.
Here is a Tru64 5.0 system:

$ ls -l /cluster/
total 3
drwxr-xr-x   3 root     system       512 Apr  5  2000 admin
drwxr-xr-x   3 root     system       512 Aug  4  2000 auth
drwxr-xr-x   3 root     system       512 Apr  5  2000 members
lrwxr-xr-x   1 root     system        14 Aug  4  2000 usr -> ../usr/cluster

That works for me.

> I like MOSIX's home-node paradigm.  A processes home node is where it 
> started, and the home node is responsible for keeping track of the
> process, wherever it goes, so it can send it signals and such, so when
> a process moves from one non-home node to another, it has to register
> this with its home node.
>
> Meaning in CPID terms, that a signal for a cpid would get redirected
> by the home node, which is obvious from the CPID.
>
> That's for process migration.

This is so... wrong. Now you have two nodes that must work OK,
increasing the chance of failure. You increase network traffic
and reduce performance.

There are two good ways to do process migration. The first way
is with SSI. The second is totally userspace. Doesn't Condor
support this? Obviously the userspace solution has limits, but
it ought to be the best performing.

> If all the different pieces operate independently
> 
> 	remote swap space

Remote swap space is already supported more or less. There might
still be some deadlocks. Shared swap space is more difficult.

Future directions seem to be toward using a filesystem-like swap
space, with anonymous files as backing store. This might help.

Peer-to-peer memory sharing would be cool.

> 	remote storage

Again, remote is easy. (the network block device) Shared is hard.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 21:09:10 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16307AbRGPTIx>; Mon, 16 Jul 2001 21:08:53 +0200
Received: from 26-MADR-X110.libre.retevision.es ([62.83.45.90]:13573 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16069AbRGPTIo>;
	Mon, 16 Jul 2001 21:08:44 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id E0BDE2F4BA; Mon, 16 Jul 2001 21:15:16 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"David L. Nicol" <david@kasey.umkc.edu>
Subject: Re: another laundry or shopping list
Date:	Mon, 16 Jul 2001 21:15:15 +0200
X-Mailer: KMail [version 1.2]
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <01071601032402.00502@mioooldpc> <3B532994.CD53DBEC@kasey.umkc.edu>
In-Reply-To: <3B532994.CD53DBEC@kasey.umkc.edu>
Cc:	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>, irbis@orcero.org,
	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071621151500.01423@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



> > > But using /proc to start work, people can focus on what features want
> > > there.
>
> Lets mount it, by default and for discussion purposes, at /proc/cluster
>
Perfect.
David and Albert, please consider this convention in your discussion. 


> > Sorry to be so generic, i have no time now, i'll explain longer tomorrow
> >
> > Something that worries me is how will we find the processes (as they go
> > away from his node) we have 4 alternatives:
> >
> >
> > 3.- The info where the process lives  (i don't thing a process will
> > migrate a lot )is updated in the node where the process was born . so we
> > can just ask the node that according to the cpid is the local node and
> > there will be the info . If that node is down we can use 1 or 2 . I like
> > this alternative most. This is like a cache, if the info is not in that
> > node we do a broadcast that is a more expensive operation.
>
> I like MOSIX's home-node paradigm.  A processes home node is where it
> started, and the home node is responsible for keeping track of the
> process, wherever it goes, so it can send it signals and such, so when
> a process moves from one non-home node to another, it has to register
> this with its home node.
>
> Meaning in CPID terms, that a signal for a cpid would get redirected
> by the home node, which is obvious from the CPID.
>
I like MOSIX approach also, in fact i think that sometimes is just necesary , 
but i would migrate more info and let in the home node just the minimum so 
i'd break the depyty and migrate part of it . So if i have to send a signal i 
can send it directly to the node where the process is. That implies the need 
of the cache and all that thing of my last mail but eliminates  the need to 
send 10 signals 3 lans away when the node is just next to me.
  
Another little idea: what about not moving but copying the process and sleep 
it in the home node and a little heartbeat so if the other node fails you can 
run again the process in other node.  This obviusly is a higher lever, very 
optional feature. 

> That's for process migration.
>
>
>
>
>
> If all the different pieces operate independently
>
> 	remote swap space
>
> 	remote storage
>
> 	remote CPU
>
> 	remote IO
>
> 	remote signal delivery
>
> 	remote whatever-is-left
>
> and we have standards for them, we can have heterogeneous clusters,
> since the only thing you need CPU congruity for is remote CPU.
can you be more verbose about this?,i think i don't catch well what you mean. 

> There could be, say, a Central Swap Server that provides swap space
> for all the machines, and stores it in its own /dev/shm, and
> we could have cluster-wide absolute memory addressing, towards
> a standard cluster-wide memory sharing protocol.

Here i don't like the idea of the Central Swap Server, in fact i don't like 
any idea that implies a central server if:
a) that server can't foresee with 2 minutes if someone is going to run into 
the power cable, of course with cheap hardw.
b) you can't demostrate me it's the only way possible 
 
 

i think i didn't catch the topic either.
--
Jordi
  Student of Spain

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 21:26:01 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16084AbRGPTZx>; Mon, 16 Jul 2001 21:25:53 +0200
Received: from 26-MADR-X110.libre.retevision.es ([62.83.45.90]:16133 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16069AbRGPTZn>;
	Mon, 16 Jul 2001 21:25:43 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id 18FFC2F4BA; Mon, 16 Jul 2001 21:32:02 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Subject: Re: another laundry or shopping list
Date:	Mon, 16 Jul 2001 21:32:01 +0200
X-Mailer: KMail [version 1.2]
References: <200107161837.f6GIbqL338950@saturn.cs.uml.edu>
In-Reply-To: <200107161837.f6GIbqL338950@saturn.cs.uml.edu>
Cc:	david@kasey.umkc.edu, jamagallon@able.es, irbis@orcero.org,
	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071621320101.01423@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


> >
> > Meaning in CPID terms, that a signal for a cpid would get redirected
> > by the home node, which is obvious from the CPID.
> >
> > That's for process migration.
>
> This is so... wrong. Now you have two nodes that must work OK,
> increasing the chance of failure. You increase network traffic
> and reduce performance.
>
> There are two good ways to do process migration. The first way
> is with SSI. The second is totally userspace. Doesn't Condor
> support this? Obviously the userspace solution has limits, but
> it ought to be the best performing.

I agree we double the chance of fail but this about issues you can't take 
away home node. You need some of the home node syscalls. Think in a 5 nodes 
cluster,all of them connected with his own modem to internet.  2 of them with 
telnet and ftp open, as the processes has that sockets open you need a little 
of it being bound to that node. 
will be all the machines in the cluster have the same time ? maybe you don't 
want it and you want the time of your home node not others.
If you find a way to avoid this (not in userspace) i'll be really happy .  


--
Jordi 
  Student of Spain 

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 21:52:20 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16188AbRGPTwE>; Mon, 16 Jul 2001 21:52:04 +0200
Received: from [192.216.221.8] ([192.216.221.8]:35223 "EHLO suntan.tandem.com")
	by humbolt.nl.linux.org with ESMTP id <S16221AbRGPTv6>;
	Mon, 16 Jul 2001 21:51:58 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id MAA19886
	for <linux-cluster@nl.linux.org>; Mon, 16 Jul 2001 12:51:51 -0700 (PDT)
Received: from kahuna.cag.cpqcorp.net (thanos.cag.cpqcorp.net [16.61.168.101]) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) with ESMTP id f6GJXld07368; Mon, 16 Jul 2001 12:33:47 -0700 (PDT)
Message-ID: <3B53419A.B5A9B421@kahuna.cag.cpqcorp.net>
Date:	Mon, 16 Jul 2001 12:33:46 -0700
From:	John Byrne <jbyrne@kahuna.cag.cpqcorp.net>
Reply-To: John.L.Byrne@compaq.com
X-Mailer: Mozilla 4.61 [en] (X11; I; UnixWare 5 i386)
MIME-Version: 1.0
To:	"David L. Nicol" <david@kasey.umkc.edu>
CC:	ssic-linux-devel@opensource.compaq.com, linux-cluster@nl.linux.org
Subject: Re: another laundry or shopping list
References: <3B532994.CD53DBEC@kasey.umkc.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"David L. Nicol" wrote:
> I like MOSIX's home-node paradigm.  A processes home node is where it
> started, and the home node is responsible for keeping track of the
> process, wherever it goes, so it can send it signals and such, so when
> a process moves from one non-home node to another, it has to register
> this with its home node.
> 
> Meaning in CPID terms, that a signal for a cpid would get redirected
> by the home node, which is obvious from the CPID.
> 
> That's for process migration.

A problem with the MOSIX home-node paradigm is that if the home node
goes away, a remote process has to die because context gets lost.
Ideally, once a process (or a group of processes sharing resources such
as shared memory, etc.) gets migrated away from a node, no dependencies
should be left on the home node. Depending on the resources in use by a
process, this can range from relatively trivial to impossible. 

If you define SSI to mean everything on all systems are visible to all
processes, one of the biggest design issues to face is what happens to
all the various namespaces like /dev (both at the filesystem naming
level and the dev_t level) and /proc (most everything except the PIDs).
These namespaces were not designed with clustering in mind and are not
easily extensible without changes to the applications. The home-node
paradigm can simplify this tremendously, because the default namespaces
an application sees are the home node's. This allows an unmodified
application to run just fine, but it may be blind to other resources in
the cluster. A cluster-aware application (or user) can look in
/proc/cluster (or /dev/cluster?) to find resources on other nodes and
use them.

I often wish K&R had designed the original Unix with clustering in mind
and that all applications had been developed with it in mind as well. Is
there a Linux Time-Machine project anywhere?


> 
> If all the different pieces operate independently
> 
>         remote swap space

Remote swapping can be used to allow processes to run that cannot be run
otherwise; however, it also introduces failure semantics that can cause
processes to die that wouldn't die if you didn't have it: if you need to
load a page from a node that has gone down, the process is toast.

Remote swap space is less important in Linux than other Unixes. In
UnixWare, a page of memory couldn't be allocated to a process unless a
swap page was available to back it up. (Physical memory was counted as
swap for this purpose.) Linux allows available swap to be overcommitted
and only allocates swap pages when a dirty anon page is deactivated; if
you run out, the OOM killer starts freeing pages by killing processes.
Under UnixWare, a process couldn't allocate large sparse arrays without
having swap to back up the entire array. Under Linux, only the actually
used memory needs swap. I mostly like the Linux way better, although the
OOM killer's actions can be annoying if it chooses poorly.

Also, the additional overhead of going remote is not something swapping
really needs.

I'd make remote swap a low-priority item.

> 
>         remote storage

You have to define exactly what filesystem semantics you plan to offer
and what performance tradeoffs you want to make. If you have to go
remote on each I/O, it performs poorly. If you add caching on nodes
remote from the actual filesystem, then there are problems with data and
attribute coherency. There is a range of solutions here that will depend
on the type of cluster and hardware and the application's requirements.

Maintaining the  filesystem namespace for a cluster is probably more
easily abstracted and we could do this first. 

> 
>         remote CPU
> 
>         remote IO

The acutal I/O is straightforward. To handle ioctls, ou have to have
hooks to to allow a driver to remotely read a process's address space or
device-specific support code. Some device types may have special
requirements: I believe the generic tty structure is expected to be
directly accessible from the process and the low-level driver. 
 
> 
>         remote signal delivery
> 
>         remote whatever-is-left
> 
> and we have standards for them, we can have heterogeneous clusters,
> since the only thing you need CPU congruity for is remote CPU.
> 
> There could be, say, a Central Swap Server that provides swap space
> for all the machines, and stores it in its own /dev/shm, and
> we could have cluster-wide absolute memory addressing, towards
> a standard cluster-wide memory sharing protocol.
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Mon Jul 16 22:01:55 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16253AbRGPUBr>; Mon, 16 Jul 2001 22:01:47 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:41737 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16201AbRGPUBl>;
	Mon, 16 Jul 2001 22:01:41 +0200
Received: (qmail 287465 invoked from network); 16 Jul 2001 19:57:45 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 16 Jul 2001 19:57:45 -0000
Message-ID: <3B534663.935BAEB@kasey.umkc.edu>
Date:	Mon, 16 Jul 2001 14:54:11 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	Jordi Polo <mumismo@wanadoo.es>
CC:	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>, irbis@orcero.org,
	linux-cluster@nl.linux.org
Subject: Re: another laundry or shopping list
References: <Pine.LNX.4.30.0107141603260.911-100000@hermes.orcero.org> <01071601032402.00502@mioooldpc> <3B532994.CD53DBEC@kasey.umkc.edu> <01071621151500.01423@mioooldpc>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Jordi Polo wrote:
> 
> the only thing you need CPU congruity for is remote CPU.
> can you be more verbose about this?,i think i don't catch well what you mean.

I have several kinds of systems.  I cannot put my alpha systems into
my MOSIX cluster since MOSIX only works on *86.  But if I could
integrate everything but process migration (remote CPU) into the
other-OS system (as I can with PVM, where distribution is by source
code of RPC and they don't care what kind of hardware is running)
with my different kinds of machines, that would make the differences
between nodes less.

Here's the continuum from tight to loose clusters as I currently see it:


(tight)						(loose)
SSI --- bproc --- MOSIX --- PVM/MPI --- r*/condor ---
distributed.net/entropoa


Is that pretty much right? Some will say that anything to the right of
bproc is "wrong, not a cluster" but on the other end,  the "Entropia
Grid" hypes itself as "The world's largest supercomputer."

The primary assumption of SSI, which is a single common administration,
cannot be held, for grids (I am going to call any resource sharing
system where root one one machine might not be root on all of them a 
grid, is that okay with everyone?  This makes most MOSIX configurations
grids instead of clusters -- or grid-clusters -- or we need a new word,
like calling anything from PVM to the left a "cluster" and calling
the SSI systems a "tight cluster" for instance, and keeping grids
for condor and distributed.net/entropia.

Right now, for instance, grids do _not_ do process migration of unfinished
work, at least not in the context of the gridding protocol. What I mean 
is, for example, if I have mprime running on a machine in a MOSIX cluster,
that mprime process might hike all over the cluster, but primenet does
not know that it is not running at its home node all the time.

Which leads to a running-of-process model where giving a process a time
slice can some how be abstracted to loosen the binding between processor
and process.  

Can we do SSI-like things, sharing more resources with less structure,
while still having the nodes under enforcedly separate management?


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 00:08:14 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16151AbRGPWH4>; Tue, 17 Jul 2001 00:07:56 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:33036 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16097AbRGPWHm>;
	Tue, 17 Jul 2001 00:07:42 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GM6CX373197;
	Mon, 16 Jul 2001 18:06:12 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107162206.f6GM6CX373197@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	mumismo@wanadoo.es (Jordi Polo)
Date:	Mon, 16 Jul 2001 18:06:12 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan), david@kasey.umkc.edu,
	jamagallon@able.es, irbis@orcero.org, linux-cluster@nl.linux.org
In-Reply-To: <01071621320101.01423@mioooldpc> from "Jordi Polo" at Jul 16, 2001 09:32:01 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Jordi Polo writes:

>>> Meaning in CPID terms, that a signal for a cpid would get redirected
>>> by the home node, which is obvious from the CPID.
>>>
>>> That's for process migration.
>>
>> This is so... wrong. Now you have two nodes that must work OK,
>> increasing the chance of failure. You increase network traffic
>> and reduce performance.
>>
>> There are two good ways to do process migration. The first way
>> is with SSI. The second is totally userspace. Doesn't Condor
>> support this? Obviously the userspace solution has limits, but
>> it ought to be the best performing.
>
> I agree we double the chance of fail but this about issues you can't
> take away home node. You need some of the home node syscalls. Think
> in a 5 nodes cluster,all of them connected with his own modem to
> internet.

OK, that would be: /dev/ttyS0, /dev/ttyS1, /dev/ttyS2...
Any modem can be used from any node.

> 2 of them with telnet and ftp open,

OK. Note that they have the same local IP address.

> as the processes has
> that sockets open you need a little of it being bound to that node.

No, although migration away from IO might be a dumb idea.

> will be all the machines in the cluster have the same time ? maybe
> you don't want it and you want the time of your home node not
> others.  If you find a way to avoid this (not in userspace) i'll be
> really happy .

All nodes follow POSIX time, which is similar to UTC.
(time zones are done with libc and environment variables)

Time skew causes trouble even with plain NFS, so what
else is new? Add delays as desired to avoid time warps.



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 00:18:58 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16176AbRGPWSu>; Tue, 17 Jul 2001 00:18:50 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:48396 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16128AbRGPWSg>;
	Tue, 17 Jul 2001 00:18:36 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GMHCU372439;
	Mon, 16 Jul 2001 18:17:12 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107162217.f6GMHCU372439@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	John.L.Byrne@compaq.com
Date:	Mon, 16 Jul 2001 18:17:12 -0400 (EDT)
Cc:	david@kasey.umkc.edu (David L. Nicol),
	ssic-linux-devel@opensource.compaq.com, linux-cluster@nl.linux.org
In-Reply-To: <3B53419A.B5A9B421@kahuna.cag.cpqcorp.net> from "John Byrne" at Jul 16, 2001 12:33:46 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

John Byrne writes:
> "David L. Nicol" wrote:

> If you define SSI to mean everything on all systems are visible to all
> processes, one of the biggest design issues to face is what happens to
> all the various namespaces like /dev (both at the filesystem naming
> level and the dev_t level) and /proc (most everything except the PIDs).
> These namespaces were not designed with clustering in mind and are not
> easily extensible without changes to the applications. The home-node

Some devices, such as /dev/null, get replicated.
Other devices, such as /dev/sda, require forwarding.

So /dev/sda is on node 6, /dev/sdb is on node 23, etc.

> I often wish K&R had designed the original Unix with clustering in mind
> and that all applications had been developed with it in mind as well. Is
> there a Linux Time-Machine project anywhere?

Plan 9

>>         remote swap space
>
> Remote swapping can be used to allow processes to run that cannot be run
> otherwise; however, it also introduces failure semantics that can cause
> processes to die that wouldn't die if you didn't have it: if you need to
> load a page from a node that has gone down, the process is toast.

That would depend on the cluster hardware. If the node-to-node and
node-to-disk interconnects are one and the same, swapping to a peer
is no worse than swapping to disk. I have such a system. You can build
one with Cisco's SCSI-over-IP disk server.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 00:19:48 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16154AbRGPWTa>; Tue, 17 Jul 2001 00:19:30 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:50444 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16179AbRGPWTM>;
	Tue, 17 Jul 2001 00:19:12 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GMHCU372439;
	Mon, 16 Jul 2001 18:17:12 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107162217.f6GMHCU372439@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	John.L.Byrne@compaq.com
Date:	Mon, 16 Jul 2001 18:17:12 -0400 (EDT)
Cc:	david@kasey.umkc.edu (David L. Nicol),
	ssic-linux-devel@opensource.compaq.com, linux-cluster@nl.linux.org
In-Reply-To: <3B53419A.B5A9B421@kahuna.cag.cpqcorp.net> from "John Byrne" at Jul 16, 2001 12:33:46 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

John Byrne writes:
> "David L. Nicol" wrote:

> If you define SSI to mean everything on all systems are visible to all
> processes, one of the biggest design issues to face is what happens to
> all the various namespaces like /dev (both at the filesystem naming
> level and the dev_t level) and /proc (most everything except the PIDs).
> These namespaces were not designed with clustering in mind and are not
> easily extensible without changes to the applications. The home-node

Some devices, such as /dev/null, get replicated.
Other devices, such as /dev/sda, require forwarding.

So /dev/sda is on node 6, /dev/sdb is on node 23, etc.

> I often wish K&R had designed the original Unix with clustering in mind
> and that all applications had been developed with it in mind as well. Is
> there a Linux Time-Machine project anywhere?

Plan 9

>>         remote swap space
>
> Remote swapping can be used to allow processes to run that cannot be run
> otherwise; however, it also introduces failure semantics that can cause
> processes to die that wouldn't die if you didn't have it: if you need to
> load a page from a node that has gone down, the process is toast.

That would depend on the cluster hardware. If the node-to-node and
node-to-disk interconnects are one and the same, swapping to a peer
is no worse than swapping to disk. I have such a system. You can build
one with Cisco's SCSI-over-IP disk server.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 00:36:09 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16233AbRGPWfv>; Tue, 17 Jul 2001 00:35:51 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:5133 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16225AbRGPWff>;
	Tue, 17 Jul 2001 00:35:35 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6GMYCS334936;
	Mon, 16 Jul 2001 18:34:12 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107162234.f6GMYCS334936@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	david@kasey.umkc.edu (David L. Nicol)
Date:	Mon, 16 Jul 2001 18:34:11 -0400 (EDT)
Cc:	mumismo@wanadoo.es (Jordi Polo),
	jamagallon@able.es (J . A . Magallon),
	acahalan@cs.uml.edu (Albert D . Cahalan), irbis@orcero.org,
	linux-cluster@nl.linux.org
In-Reply-To: <3B534663.935BAEB@kasey.umkc.edu> from "David L. Nicol" at Jul 16, 2001 02:54:11 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David L. Nicol writes:
> Jordi Polo wrote:

> (tight)						(loose)
> SSI --- bproc --- MOSIX --- PVM/MPI --- r*/condor ---
> distributed.net/entropoa
>
> Is that pretty much right? Some will say that anything to the right of
> bproc is "wrong, not a cluster" but on the other end,  the "Entropia
> Grid" hypes itself as "The world's largest supercomputer."

Condor isn't bad. MPI isn't bad. It's the half-way hacks that
are gross. Besides, we're not going to have 42 different types
of cluster support in the kernel. Even 2 is likely too many.

> Right now, for instance, grids do _not_ do process migration of
> unfinished work, at least not in the context of the gridding
> protocol. What I mean is, for example, if I have mprime running on a
> machine in a MOSIX cluster, that mprime process might hike all over
> the cluster, but primenet does not know that it is not running at
> its home node all the time.

Is automatic process migration any more than an academic curiosity?
I'm sure it has produced a few doctorates, but that isn't enough
reason to make it the Linux standard.

It's much better give every CPU one single-threaded compute task.
If this is web serving, pass a socket around.

> Can we do SSI-like things, sharing more resources with less structure,
> while still having the nodes under enforcedly separate management?

Sure. Change that to "Should we..." and the answer is NO.
Linux needs to stay clean, maintainable, secure, and fast.
You can't get code into Linux using "ain't it cool" as the
primary reason.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 00:49:29 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16256AbRGPWtT>; Tue, 17 Jul 2001 00:49:19 +0200
Received: from hilbert.umkc.edu ([134.193.4.60]:64528 "HELO tesla.umkc.edu")
	by humbolt.nl.linux.org with SMTP id <S16225AbRGPWtE>;
	Tue, 17 Jul 2001 00:49:04 +0200
Received: (qmail 290469 invoked from network); 16 Jul 2001 22:45:09 -0000
Received: from nicol6.umkc.edu (HELO kasey.umkc.edu) (david@134.193.4.67)
  by hilbert.umkc.edu with SMTP; 16 Jul 2001 22:45:09 -0000
Message-ID: <3B536D9E.A038DF6B@kasey.umkc.edu>
Date:	Mon, 16 Jul 2001 17:41:34 -0500
From:	"David L. Nicol" <david@kasey.umkc.edu>
Organization: UMKC Information Services Central Systems
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.5 i586)
X-Accept-Language: en
MIME-Version: 1.0
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
CC:	John.L.Byrne@compaq.com, ssic-linux-devel@opensource.compaq.com,
	linux-cluster@nl.linux.org
Subject: Re: another laundry or shopping list
References: <200107162217.f6GMHCU372439@saturn.cs.uml.edu>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

"Albert D. Cahalan" wrote:

> Some devices, such as /dev/null, get replicated.
> Other devices, such as /dev/sda, require forwarding.
> 
> So /dev/sda is on node 6, /dev/sdb is on node 23, etc.

This can be done now with network block device, yes?  What
file systems are able to handle multiple simultaneous access
at device level?  (it would not be hard to add, at design
level, but it is so far from the standard paradigm of one system
gets to be the Single Definitive Point, as in NFS)


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 01:24:17 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16262AbRGPXYL>; Tue, 17 Jul 2001 01:24:11 +0200
Received: from mercury.mv.net ([199.125.85.40]:35588 "EHLO mercury.mv.net")
	by humbolt.nl.linux.org with ESMTP id <S16156AbRGPXXy>;
	Tue, 17 Jul 2001 01:23:54 +0200
Received: from filesrus (bnh-3-02.mv.com [199.125.99.130]) by mercury.mv.net (8.8.8/mem-971025) with SMTP id TAA25891 for <linux-cluster@nl.linux.org>; Mon, 16 Jul 2001 19:23:50 -0400 (EDT)
Message-ID: <032a01c10e4e$f6c79f00$85637dc7@filesrus>
From:	"Bill Todd" <billtodd@foo.mv.com>
To:	<linux-cluster@nl.linux.org>
References: <200107161837.f6GIbqL338950@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
Date:	Mon, 16 Jul 2001 19:28:03 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


----- Original Message -----
From: "Albert D. Cahalan" <acahalan@cs.uml.edu>
To: "David L. Nicol" <david@kasey.umkc.edu>
Cc: "Jordi Polo" <mumismo@wanadoo.es>; "J . A . Magallon"
<jamagallon@able.es>; "Albert D . Cahalan" <acahalan@cs.uml.edu>;
<irbis@orcero.org>; <linux-cluster@nl.linux.org>
Sent: Monday, July 16, 2001 2:37 PM
Subject: Re: another laundry or shopping list


> David L. Nicol writes:

...

> > I like MOSIX's home-node paradigm.  A processes home node is where it
> > started, and the home node is responsible for keeping track of the
> > process, wherever it goes, so it can send it signals and such, so when
> > a process moves from one non-home node to another, it has to register
> > this with its home node.
> >
> > Meaning in CPID terms, that a signal for a cpid would get redirected
> > by the home node, which is obvious from the CPID.
> >
> > That's for process migration.
>
> This is so... wrong. Now you have two nodes that must work OK,
> increasing the chance of failure. You increase network traffic
> and reduce performance.

This sounds like a fairly standard distributed-system issue, and it should
be amenable to a standard solution.

If one assumes that process migrations are too common (and possibly that
there are too many total processes in the cluster) to have all nodes track
the locations of all (even just all migrated) processes, then some kind of
process location directory mechanism is required.  The home-node mechanism
(assuming that the home node is implicit in the CPID) both avoids the need
for creating a more elaborate directory service and promotes reasonable
distribution of forwarding loads for migrated processes (it also saves
messages, though exactly which messages are saved depends on the details of
the mechanism it's competing with).

When a node fails, one can select an alternate node as the surrogate home,
broadcast this selection to the remaining members of the cluster, and update
the selected node with the current locations of all still-extant processes
migrated from the failed node (if the failed node recovers, it can then
reassume responsibility for this location list).

When any node needs to communicate with a process for the first time, it can
target the process's home node (as determined from the CPID), and either
reach the process there or be forwarded to its actual location.  If the home
node is known to be dead, a surrogate should already be known (if not, the
appropriate mechanism should be started).  And whenever a node is
communicating with a process for *other than* the first time, it should
usually be able to find the process's last-known location in a local cache
it can maintain (assuming that communication occurs much more frequently
than process migration, this will be a win - and if that assumption is
false, then the occasional additional message to a stale location will pale
in comparison to the migration overhead anyway).

That's a moderate amount of mechanism to create (including some moderately
tricky synchronization during state transitions), but that's the cost of
creating a solution that's both high-performance and scalable (if those are
goals).

- bill



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 01:37:40 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16278AbRGPXhd>; Tue, 17 Jul 2001 01:37:33 +0200
Received: from mercury.mv.net ([199.125.85.40]:13064 "EHLO mercury.mv.net")
	by humbolt.nl.linux.org with ESMTP id <S16271AbRGPXhQ>;
	Tue, 17 Jul 2001 01:37:16 +0200
Received: from filesrus (bnh-3-02.mv.com [199.125.99.130]) by mercury.mv.net (8.8.8/mem-971025) with SMTP id TAA29187 for <linux-cluster@nl.linux.org>; Mon, 16 Jul 2001 19:37:13 -0400 (EDT)
Message-ID: <033b01c10e50$d5b58460$85637dc7@filesrus>
From:	"Bill Todd" <billtodd@foo.mv.com>
To:	<linux-cluster@nl.linux.org>
References: <200107162217.f6GMHCU372439@saturn.cs.uml.edu> <3B536D9E.A038DF6B@kasey.umkc.edu>
Subject: Re: another laundry or shopping list
Date:	Mon, 16 Jul 2001 19:41:26 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


----- Original Message -----
From: "David L. Nicol" <david@kasey.umkc.edu>
To: "Albert D. Cahalan" <acahalan@cs.uml.edu>
Cc: <John.L.Byrne@compaq.com>; <ssic-linux-devel@opensource.compaq.com>;
<linux-cluster@nl.linux.org>
Sent: Monday, July 16, 2001 6:41 PM
Subject: Re: another laundry or shopping list


> "Albert D. Cahalan" wrote:
>
> > Some devices, such as /dev/null, get replicated.
> > Other devices, such as /dev/sda, require forwarding.
> >
> > So /dev/sda is on node 6, /dev/sdb is on node 23, etc.
>
> This can be done now with network block device, yes?  What
> file systems are able to handle multiple simultaneous access
> at device level?  (it would not be hard to add, at design
> level, but it is so far from the standard paradigm of one system
> gets to be the Single Definitive Point, as in NFS)

On Linux, GFS.  I don't know about XFS, and think JFS does not.

Elsewhere:  Oracle Parallel Server, similar products from DB2 and Informix
(yes, they're not file systems, but they might be of interest to Linux),
various file-access mechanisms in S/390 Parallel Sysplex that are likely not
of interest here, ODS-2 on VMS, Tru64's cluster file system (reportedly,
anyway, though I won't vouch for that), Tivoli's SANergy (acquired from
Mercury Computer), GPFS on AIX (using 'virtual shared disks'), EMC's Celerra
(recently extended to do so), the HP file system acquired from Transoft
Networks, ADIC's file system acquired from MountainGate, Avid's file system
acquired from Polybus (these last five were all developed largely to
optimize streaming-media access by multiple clients at high rates), ...

- bill

>
>
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/
>


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 01:43:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16277AbRGPXnX>; Tue, 17 Jul 2001 01:43:23 +0200
Received: from ns.caldera.de ([212.34.180.1]:47767 "EHLO ns.caldera.de")
	by humbolt.nl.linux.org with ESMTP id <S16156AbRGPXnN>;
	Tue, 17 Jul 2001 01:43:13 +0200
Received: (from hch@localhost)
	by ns.caldera.de (8.11.1/8.11.1) id f6GNgcL15005;
	Tue, 17 Jul 2001 01:42:38 +0200
Date:	Tue, 17 Jul 2001 01:42:38 +0200
From:	Christoph Hellwig <hch@caldera.de>
To:	Bill Todd <billtodd@foo.mv.com>
Cc:	linux-cluster@nl.linux.org
Subject: Re: another laundry or shopping list
Message-ID: <20010717014238.A14844@caldera.de>
References: <200107162217.f6GMHCU372439@saturn.cs.uml.edu> <3B536D9E.A038DF6B@kasey.umkc.edu> <033b01c10e50$d5b58460$85637dc7@filesrus>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <033b01c10e50$d5b58460$85637dc7@filesrus>; from billtodd@foo.mv.com on Mon, Jul 16, 2001 at 07:41:26PM -0400
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Mon, Jul 16, 2001 at 07:41:26PM -0400, Bill Todd wrote:
> 
> Elsewhere:  Oracle Parallel Server, similar products from DB2 and Informix
> (yes, they're not file systems, but they might be of interest to Linux),
> various file-access mechanisms in S/390 Parallel Sysplex that are likely not
> of interest here, ODS-2 on VMS, Tru64's cluster file system (reportedly,
> anyway, though I won't vouch for that), Tivoli's SANergy (acquired from
> Mercury Computer), GPFS on AIX (using 'virtual shared disks'),

It's an open secret that IBM is porting GPFS to linux..

	Christoph

-- 
Whip me.  Beat me.  Make me maintain AIX.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 01:44:02 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16302AbRGPXnm>; Tue, 17 Jul 2001 01:43:42 +0200
Received: from mercury.mv.net ([199.125.85.40]:22537 "EHLO mercury.mv.net")
	by humbolt.nl.linux.org with ESMTP id <S16231AbRGPXne>;
	Tue, 17 Jul 2001 01:43:34 +0200
Received: from filesrus (bnh-3-02.mv.com [199.125.99.130]) by mercury.mv.net (8.8.8/mem-971025) with SMTP id TAA01117 for <linux-cluster@nl.linux.org>; Mon, 16 Jul 2001 19:43:32 -0400 (EDT)
Message-ID: <036e01c10e51$b76872a0$85637dc7@filesrus>
From:	"Bill Todd" <billtodd@foo.mv.com>
To:	<linux-cluster@nl.linux.org>
References: <200107162217.f6GMHCU372439@saturn.cs.uml.edu> <3B536D9E.A038DF6B@kasey.umkc.edu> <033b01c10e50$d5b58460$85637dc7@filesrus>
Subject: Re: another laundry or shopping list
Date:	Mon, 16 Jul 2001 19:47:46 -0400
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.50.4522.1200
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

> ... (these last five 

Duh - six.

- bill



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 02:21:24 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16323AbRGQAVT>; Tue, 17 Jul 2001 02:21:19 +0200
Received: from suntan.tandem.com ([192.216.221.8]:7073 "EHLO suntan.tandem.com")
	by humbolt.nl.linux.org with ESMTP id <S16306AbRGQAU6>;
	Tue, 17 Jul 2001 02:20:58 +0200
Received: from kahuna.cag.cpqcorp.net (kahuna.cag.cpqcorp.net [16.61.168.50])
	by suntan.tandem.com (8.9.3/2.0.1) with ESMTP id RAA26796
	for <linux-cluster@nl.linux.org>; Mon, 16 Jul 2001 17:20:55 -0700 (PDT)
Received: (from bruce@localhost) by kahuna.cag.cpqcorp.net (8.10.1/UW7.1.1-NSC) id f6H05pi12548; Mon, 16 Jul 2001 17:05:51 -0700 (PDT)
From:	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>
Message-Id: <200107170005.f6H05pi12548@kahuna.cag.cpqcorp.net>
Subject: Re: process tracking
In-Reply-To: <032a01c10e4e$f6c79f00$85637dc7@filesrus> from Bill Todd at "Jul 16, 2001 04:28:03 pm"
To:	billtodd@foo.mv.com (Bill Todd)
Date:	Mon, 16 Jul 2001 17:05:51 -0700 (PDT)
Cc:	linux-cluster@nl.linux.org, ssic-linux-devel@opensource.compaq.com
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Bill,
  I added the SSI mailing list to this response since there
is a lot of SSI in your note.

You have, to a large extent, described what we already do
in the SSI code which we will release later this month 
(see www.opensource.compaq.com for the SSI project if you are
interested), and, as you point out, there are some tricky aspects.

In that system, 
   - all processes except the single init have a node encoded pid
	(node part is the node where the process was created)
   - the node where a process is created is responsible for tracking
	where the process is currently executing but, unlike the
	Mosix implementation, does not execute any of the system code
	for the process.  All system calls are executed locally.
   - having the creation node track the process is handy to make sure
	the pid is not reused;
   - any system call (like kill) executed by any process on any node
	can query the creation node to determine where the process
	currently is (dealing of course with a process who was
	migrating at that very instant)
   - if the creation node fails (we actually called it the origin node),
	tracking of the process is taken over by the "surrogate origin"
	node (note that in Mosix, it is necessary for all processes
	that started from the origin node to abort since all, or almost
	all, their system calls are executed back on the home node).
   - when a node reboot and rejoins the cluster, it regains the tracking
	of old processes so the "finding a process" algorithm is not
	complicated and so that pid is not reused.
   - the algorithm for finding a process is simply:
       a - look locally
       b - if the creation node is up, ask him whether the process exists
	and if so, where
       c - if the creation node is down, ask the well known surrogate
	origin node that same question
   - by the way, if the surrogate origin node dies, it's data is automatically
	rebuilt on the well known surrogate takeover node.
   - there are no single points of failure for cluster data or the cluster
	itself.

I'm sorry you can't all try it out yet and help us improve on the SSI
capability.  I'll send mail when the download is available.

bruce walker
Open SSI Cluster Architect
Linux Technology Office
Compaq Computers.

> Bill Todd wrote:
> 
> This sounds like a fairly standard distributed-system issue, and it
> should
> be amenable to a standard solution.
> 
> If one assumes that process migrations are too common (and possibly that
> there are too many total processes in the cluster) to have all nodes
> track
> the locations of all (even just all migrated) processes, then some kind
> of
> process location directory mechanism is required.  The home-node
> mechanism
> (assuming that the home node is implicit in the CPID) both avoids the
> need
> for creating a more elaborate directory service and promotes reasonable
> distribution of forwarding loads for migrated processes (it also saves
> messages, though exactly which messages are saved depends on the details
> of
> the mechanism it's competing with).
> 
> When a node fails, one can select an alternate node as the surrogate
> home,
> broadcast this selection to the remaining members of the cluster, and
> update
> the selected node with the current locations of all still-extant
> processes
> migrated from the failed node (if the failed node recovers, it can then
> reassume responsibility for this location list).
> 
> When any node needs to communicate with a process for the first time, it
> can
> target the process's home node (as determined from the CPID), and either
> reach the process there or be forwarded to its actual location.  If the
> home
> node is known to be dead, a surrogate should already be known (if not,
> the
> appropriate mechanism should be started).  And whenever a node is
> communicating with a process for *other than* the first time, it should
> usually be able to find the process's last-known location in a local
> cache
> it can maintain (assuming that communication occurs much more frequently
> than process migration, this will be a win - and if that assumption is
> false, then the occasional additional message to a stale location will
> pale
> in comparison to the migration overhead anyway).
> 
> That's a moderate amount of mechanism to create (including some
> moderately
> tricky synchronization during state transitions), but that's the cost of
> creating a solution that's both high-performance and scalable (if those
> are
> goals).
> 
> - bill
> 
> 
> 
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 02:21:34 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16320AbRGQAVW>; Tue, 17 Jul 2001 02:21:22 +0200
Received: from 40-MADR-X34.libre.retevision.es ([62.83.3.40]:40174 "EHLO
	carlos") by humbolt.nl.linux.org with ESMTP id <S16301AbRGQAU6>;
	Tue, 17 Jul 2001 02:20:58 +0200
Received: from carlos by carlos with local (Exim 3.12 #1 (Debian))
	id 15MIdr-0000ef-00; Tue, 17 Jul 2001 02:23:11 +0200
Subject: Re: another laundry or shopping list
From:	carlos <manaha@wanadoo.es>
To:	Jordi Polo <mumismo@wanadoo.es>,
	linux-cluster <linux-cluster@nl.linux.org>
In-Reply-To: <01071621320101.01423@mioooldpc>
References: <200107161837.f6GIbqL338950@saturn.cs.uml.edu>  <01071621320101.01423@mioooldpc>
Content-Type: text/plain
X-Mailer: Evolution 0.5.1 (Developer Preview)
Mime-Version: 1.0
X-Evolution: 0000000f-0010
X-Mailer: Evolution 0.5.1 (Developer Preview)
Date:	16 Jul 2001 23:23:11 -0100
Message-Id: <E15MIdr-0000ef-00@carlos>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

Jordi Polo Wrote:

> I like MOSIX approach also, in fact i think that sometimes is just necesary , 
> but i would migrate more info and let in the home node just the minimum so 
> i'd break the depyty and migrate part of it . So if i have to send a signal i 
> can send it directly to the node where the process is. That implies the need 
> of the cache and all that thing of my last mail but eliminates  the need to 
> send 10 signals 3 lans away when the node is just next to me.
>   
> Another little idea: what about not moving but copying the process and sleep 
> it in the home node and a little heartbeat so if the other node fails you can 
> run again the process in other node.  This obviusly is a higher lever, very 
> optional feature. 

I think it's a good idea to include it as an optional issue in the
kernel. In that way people not only would choose between HA, or HP
cluster. People with slow lans will not choose that behaviour, but may
be many other people will be interested in the reliability of their
process. I like mosix approach but i think there are many systems that
will appreciate this effort. On the other hand , the problem is not as
easy to solve as make a heartbeat and send context and DATA dirty pages,
first of all we should decide things like how to manage the clock of the
system and other things like that.
I`am thinking in that part of code as an optional code in the cluster
system, I do not think a lot of people interested in waste a lots of
resources in that behaviour. 
> --
> Jordi
>   Student of Spain

Carlos

        the other Student of spain

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 07:30:54 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16171AbRGQFag>; Tue, 17 Jul 2001 07:30:36 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:17427 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16069AbRGQFab>;
	Tue, 17 Jul 2001 07:30:31 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6H5T8b390268;
	Tue, 17 Jul 2001 01:29:08 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107170529.f6H5T8b390268@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	david@kasey.umkc.edu (David L. Nicol)
Date:	Tue, 17 Jul 2001 01:29:08 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan), John.L.Byrne@compaq.com,
	ssic-linux-devel@opensource.compaq.com, linux-cluster@nl.linux.org
In-Reply-To: <3B536D9E.A038DF6B@kasey.umkc.edu> from "David L. Nicol" at Jul 16, 2001 05:41:34 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David L. Nicol writes:
> "Albert D. Cahalan" wrote:

>> Some devices, such as /dev/null, get replicated.
>> Other devices, such as /dev/sda, require forwarding.
>>
>> So /dev/sda is on node 6, /dev/sdb is on node 23, etc.
>
> This can be done now with network block device, yes?

Not really. That would be /dev/nb0, /dev/nb1...
What I mean is that "/dev/sdb is on node 23" can
really be /dev/sdb (major 8, minor 16), can be
visible to all nodes, and refer to a disk attached
to node 23.

> What
> file systems are able to handle multiple simultaneous access
> at device level?

GFS, available from www.sistina.com as regular GPL kernel code.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 09:13:42 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16197AbRGQHNX>; Tue, 17 Jul 2001 09:13:23 +0200
Received: from [213.98.27.110] ([213.98.27.110]:7177 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16087AbRGQHNC>;
	Tue, 17 Jul 2001 09:13:02 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6H9JV603681;
	Tue, 17 Jul 2001 09:19:31 GMT
Date:	Tue, 17 Jul 2001 09:19:31 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"David L. Nicol" <david@kasey.umkc.edu>
cc:	<linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <3B532046.2652155A@kasey.umkc.edu>
Message-ID: <Pine.LNX.4.30.0107170914330.3408-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


 Hello, all!

> >  Like CPLAN portals?
>
>
> what is a CPLAN portal?


 Sorry, I misspelled it. It is <A
HREF="http://www.extremelinux.org/activities/usenix99/docs/cplant/talk/">
CPLANT</A>.


 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 09:53:04 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16194AbRGQHww>; Tue, 17 Jul 2001 09:52:52 +0200
Received: from [213.98.27.110] ([213.98.27.110]:23305 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16171AbRGQHwg>;
	Tue, 17 Jul 2001 09:52:36 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6H9wp603918;
	Tue, 17 Jul 2001 09:58:51 GMT
Date:	Tue, 17 Jul 2001 09:58:51 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	"David L. Nicol" <dnicol@cstp.umkc.edu>,
	Bruce Walker <bruce@kahuna.cag.cpqcorp.net>,
	Jordi Polo <mumismo@wanadoo.es>, <linux-cluster@nl.linux.org>
Subject: Re: the "cluster" system call (and file system type)
In-Reply-To: <200107161746.f6GHkOQ350465@saturn.cs.uml.edu>
Message-ID: <Pine.LNX.4.30.0107170920280.3408-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, all!

On Mon, 16 Jul 2001, Albert D. Cahalan wrote:

> >> the serious kernel hackers want to clean up /proc. That certainly
> >> includes Alexander Viro and seems to include Linus Torvalds.
> >
> >  Then, the rationale is: Linus and Viro says?
>
> Not really, but it ought to do. I'll try to explain again though:

 No, it oughtn't. All the tech documents that I used to read must have a
rationale backwards. Then, explain the rationale for the first time -not
again-:

> There are lots of ill-behaved files in /proc. Some of them depend
> on kernel config options. The content has undocumented syntax that
> changes on a whim. Notice that /proc is like a kid's toy box.

 Well, what I am saying does not colide with this. You can weep /proc of
ill-behaved files. What I am saying is that cluster's processes
information should be there. If you are going to take out the processes
information of "/proc", directly umount it. It is its most important
feature.


 The only thing is that the processes on a cluster have another piece of
information: where is running. "/proc" must have information of each
process, and information about here is running each processes. And that
information should be wherever the process information would be. Only for
being orthogonal.

> These messes lead to bugs, including security holes. It is
> difficult to verify that every last one of these interfaces
> works correctly. The code for /proc or /dev becomes difficult
> to maintain when it grows in misc. odd directions.

 Well, then you propose that some processes where on one directory, with
scsi, pcmcia and other information, and other processes where in other
directory, with part of the information of the processes of the first
directory?

> >> First one must consider: is this even a good thing to be doing?
> >
> >  No. That is what I amy trying to say on the whole thread.
>
> I mean "map processes to devices". I never suggested to do that.
> It is something you came up with, and I think it is a poor idea.

 Then, I can not understand you. I am talking about work with cluster
processes like the local processes, but with the where-is-running info;
and I imagined that you were talking about put all the cluster processes
information on "/dev". Maybe I can not understand what are you talking
about, or what do you wanted to say when you talked about put it on
"/dev", and put some examples of things that were moved from "/proc" to
"/dev".

> > it is only good for people that does not matter what they thing -as
> > myself-, but it sounds logic, and orthogonal.
>
> This is nice for SSI clusters.

 that is what I am talking about on this thread.

> >> I have to wonder why you'd be running an X server, but OK...
> >
> >
> >  After explaining to you why anybody want to execute a X server, If
> > somebody is enough inteligent for not to need to see the protein, or you
> > work on other think, thing on a DSP, or thing on a ASIC.
>
> You only need the client libraries for this. You don't need to
> run the X server. Set your DISPLAY environment variable to your
> desktop machine... you know about remote X usage I hope.

 Really, I can not understand you. I said:

 Some processes must run locally, and must not be affected by a SSI
semantics to its dependence to the hardware of a particular node, as X.


 You said to me:

  I have to wonder why you'd be running an X server, but OK...


 I tried to explain to you why could be interesting running a X, and try
to explain to you that there are times that it is used an ASIC on some
machines to help calculations -some Lenard-Jonnes calculus are
done sometimes on ASICs, it is very cheap and boost performance-. I have
read that some astrophysics people use ASIC for gravitational forces.

 And you say to me some things that I know about DISPLAY environment
variable.

 Well, maintaining a mail conversation with you is somewhat strange. We
are trying to find a good model for clustering under Linux, not trying to
demostrate that we are superior of the others. I DO know to use a X
server. If you respond to my  mail, please respond what I am saying, not
take appart one phrase and say  some strange thing about the phrase not
related to the topic to try to show us how wise you are. You are wise, you
have not to show it to the others.

> >  No. You will have local crond. And maybe a global crond. There are
> > maintance task on each node of a SSI cluster, and I will NOT doing them by
> > hand.
>
> There should not be maintance task on each node of a SSI cluster.

 Well, this would be great on a ideal SSI cluster. But doing a completly
SSI-ideal clustering will mean to modify the whole arquitecture of the
kernel. That is why we should find a middle term, with SSI semantics to
the user.

> That nearly a contradiction. I don't think flashing a new BIOS is
> something you do from a cron job.

 I suppose that the clusters that you manage have not filesystems, that
should be checked. I suppose that your hardware never fails, thus you does
not to check the state of the network cards. And lots of more things.

> >  Without SSI, without a common "/proc", what is exactly what you want for
> > clustering on the kernel? The weel was invented: it is PVM. We have all at
> > userland now, with PVM. If you feel good with all clustering on userland,
> > try PVM.
>
> Good idea. You don't need kernel support.

 Other time your discusion methodology. Please read the phrase. It's YOU
who is talking about that it is not needed a SSI cluster without a common
"/proc". It is you who, as YOU are saying to me, does not need the kernel
suport. I find great having a SSI cluster with a common "/proc". In fact,
I use Mosix.

> >  Once again, that is the state of art TWENTY years ago. We have this
> > twenty years ago. It is not needed so hype for your proponsal. Nearly all
> > OS have it. And Linux have it since sockets were included on the kernel.
>
> It's not broken. Why change it?

 That is your point of view. I would like to have SSI. For me, PVM it is
not broken, but is a twenty-years-old thechnology that should be improved.
If you are happy with PVM and no more, it's your life.

> "not in /proc" DOES NOT MEAN "do it by hand"
>
> With extra crud in the kernel:  foo=`cat /proc/foo`
> With a lean and simple kernel:  foo=`cluster --foo`
>
> You can script it either way.

 Yes, it would be great a script that works in "/proc" for the local
processes, and with a userland aplication for remote processes. On the
real thing, the work for the use it is orders of magnitude greater than a
orthogonal solution.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 10:39:12 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16151AbRGQIjE>; Tue, 17 Jul 2001 10:39:04 +0200
Received: from gate.in-addr.de ([212.8.193.158]:44549 "EHLO mx.in-addr.de")
	by humbolt.nl.linux.org with ESMTP id <S16137AbRGQIip>;
	Tue, 17 Jul 2001 10:38:45 +0200
Received: from hermes.marowsky-bree.de (localhost [127.0.0.1])
	by mx.in-addr.de (mail.in-addr.de) with ESMTP
	id 060B742646; Tue, 17 Jul 2001 10:38:03 +0200 (CEST)
Received: by hermes.marowsky-bree.de (Postfix, from userid 500)
	id B07D81AD3A; Tue, 17 Jul 2001 10:38:11 +0200 (CEST)
Date:	Tue, 17 Jul 2001 10:38:11 +0200
From:	Lars Marowsky-Bree <lmb@suse.de>
To:	Jordi Polo <mumismo@wanadoo.es>
Cc:	"Albert D. Cahalan" <acahalan@cs.uml.edu>, david@kasey.umkc.edu,
	jamagallon@able.es, irbis@orcero.org, linux-cluster@nl.linux.org
Subject: Re: another laundry or shopping list
Message-ID: <20010717103811.F584@marowsky-bree.de>
References: <200107161837.f6GIbqL338950@saturn.cs.uml.edu> <01071621320101.01423@mioooldpc>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: Mutt/1.3.16i
In-Reply-To: <01071621320101.01423@mioooldpc>; from "Jordi Polo" on 2001-07-16T21:32:01
X-Ctuhulu: HASTUR
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On 2001-07-16T21:32:01,
   Jordi Polo <mumismo@wanadoo.es> said:

> I agree we double the chance of fail but this about issues you can't take 
> away home node. You need some of the home node syscalls. Think in a 5 nodes 
> cluster,all of them connected with his own modem to internet.  2 of them with 
> telnet and ftp open, as the processes has that sockets open you need a little 
> of it being bound to that node. 

You cannot migrate a process which is tied to a node because it is utilising
node-local (ie, non-shared, not-cluster-aware) resources like a serial line.

That is quite a sensible restriction.

Or you would migrate it away, and _if_ the node doing the serial IO failed, it
would receive IO errors on that file, but the application would continue to
run despite this.

> will be all the machines in the cluster have the same time ? maybe you don't 
> want it and you want the time of your home node not others.

Un-synchronized time is really evil. You want synchronized time. (Note that
this is totally different from the _timezone_ setting)

xntpd is a good starting point.

Eddie/Erlang apparently has a time mechanism which guarantees cluster-wide
synchronized time, which is monotonously increasing. (They compensate clock
offset to the reference clock by making it tick a _little_ bit slower or
faster - really neat)

Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Perfection is our goal, excellence will be tolerated. -- J. Yahl


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 11:00:04 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16169AbRGQI77>; Tue, 17 Jul 2001 10:59:59 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:50437 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16142AbRGQI7k>;
	Tue, 17 Jul 2001 10:59:40 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6H8wFZ393169;
	Tue, 17 Jul 2001 04:58:15 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107170858.f6H8wFZ393169@saturn.cs.uml.edu>
Subject: Re: the "cluster" system call (and file system type)
To:	irbis@orcero.org
Date:	Tue, 17 Jul 2001 04:58:15 -0400 (EDT)
Cc:	acahalan@cs.uml.edu (Albert D. Cahalan),
	david@kasey.umkc.edu (David L. Nicol),
	dnicol@cstp.umkc.edu (David L. Nicol),
	bruce@kahuna.cag.cpqcorp.net (Bruce Walker),
	mumismo@wanadoo.es (Jordi Polo), linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107170920280.3408-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 17, 2001 09:58:51 AM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.or writes:

> The only thing is that the processes on a cluster have another piece of
> information: where is running. "/proc" must have information of each
> process, and information about here is running each processes. And that
> information should be wherever the process information would be. Only for
> being orthogonal.

Think of /proc as the raw interface. If you want an app to
handle a non-SSI cluster, you use a library that gets process
information for non-local processes by some other method.

>> These messes lead to bugs, including security holes. It is
>> difficult to verify that every last one of these interfaces
>> works correctly. The code for /proc or /dev becomes difficult
>> to maintain when it grows in misc. odd directions.
>
>  Well, then you propose that some processes where on one directory, with
> scsi, pcmcia and other information, and other processes where in other
> directory, with part of the information of the processes of the first
> directory?

No, more like this:
rsh non-ssi-node-39 cat /proc/1/status

One would substitute a better protocol and add a nice library.

>> I mean "map processes to devices". I never suggested to do that.
>> It is something you came up with, and I think it is a poor idea.
>
> Then, I can not understand you. I am talking about work with cluster
> processes like the local processes, but with the where-is-running info;
> and I imagined that you were talking about put all the cluster processes
> information on "/dev". Maybe I can not understand what are you talking
> about, or what do you wanted to say when you talked about put it on
> "/dev", and put some examples of things that were moved from "/proc" to
> "/dev".

The point is that things are being moved out of /proc whenever
possible. Not everything get moved to /dev, and anyway not
everything in /dev is a block or character device.

> to explain to you that there are times that it is used an ASIC on some
> machines to help calculations -some Lenard-Jonnes calculus are
> done sometimes on ASICs, it is very cheap and boost performance-. I have
> read that some astrophysics people use ASIC for gravitational forces.

I suppose the ASIC gets a device file like any other device.
Just as with a serial port, it is good (but not required) to
have the process be near the hardware.

> I suppose that the clusters that you manage have not filesystems, that
> should be checked. I suppose that your hardware never fails, thus you does
> not to check the state of the network cards. And lots of more things.

That is pretty much true for me. Compute nodes have a CPU, some memory,
one multi-purpose chip, and maybe some L2 or L3 cache. There isn't much
to fail, and the node is pretty much dead if something does fail.
Disks are on the network, not local to any particular node.

Even on regular PC hardware, you could lock a process onto one node
without making it local-only. (stays on one node, but seen from all)

>>> try PVM.
>>
>> Good idea. You don't need kernel support.
>
> Other time your discusion methodology. Please read the phrase. It's YOU
> who is talking about that it is not needed a SSI cluster without a common
> "/proc". It is you who, as YOU are saying to me, does not need the kernel
> suport. I find great having a SSI cluster with a common "/proc". In fact,
> I use Mosix.

SSI must have a common /proc.

For non-SSI, why bother? You already have to handle the lack of a
shared PID space, filesystem name space, etc.

I don't think Mosix is SSI. You have separate filesystems.

>> "not in /proc" DOES NOT MEAN "do it by hand"
>>
>> With extra crud in the kernel:  foo=`cat /proc/foo`
>> With a lean and simple kernel:  foo=`cluster --foo`
>>
>> You can script it either way.
>
>  Yes, it would be great a script that works in "/proc" for the local
> processes, and with a userland aplication for remote processes. On the
> real thing, the work for the use it is orders of magnitude greater
> than a orthogonal solution.

You can call the data collection app for both local and remote
processes. The data collection app is a wrapper around a library.
Only the library needs to care about process location.

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 13:44:23 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16198AbRGQLoP>; Tue, 17 Jul 2001 13:44:15 +0200
Received: from [213.98.27.110] ([213.98.27.110]:41482 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16054AbRGQLoE>;
	Tue, 17 Jul 2001 13:44:04 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6HDoT605354;
	Tue, 17 Jul 2001 13:50:29 GMT
Date:	Tue, 17 Jul 2001 13:50:29 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Jordi Polo <mumismo@wanadoo.es>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>,
	<linux-cluster@nl.linux.org>
Subject: Re: another laundry or shopping list
In-Reply-To: <01071621151500.01423@mioooldpc>
Message-ID: <Pine.LNX.4.30.0107171345580.5047-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Helo, all!

> > I like MOSIX's home-node paradigm.  A processes home node is where it
> > started, and the home node is responsible for keeping track of the


 I thing basicaly the same than you both on Mosix as a good starting
point. It scales really well, due to its P2P migration mechanism, but...

> > for all the machines, and stores it in its own /dev/shm, and
> > we could have cluster-wide absolute memory addressing, towards
> > a standard cluster-wide memory sharing protocol.
>
> Here i don't like the idea of the Central Swap Server, in fact i don't like
> any idea that implies a central server if:

 I must agree with Jordi. If we have a page fault at the same time than
network is disconected or overloaded, this would fail. We would also have
strongs problems on slow networks, and a very low scalability -thing on
broadcasting each page fault to the whole cluster-.

 Yours:


 David

---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 14:12:04 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16241AbRGQML4>; Tue, 17 Jul 2001 14:11:56 +0200
Received: from [213.98.27.110] ([213.98.27.110]:56074 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16231AbRGQMLm>;
	Tue, 17 Jul 2001 14:11:42 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6HEI3605557;
	Tue, 17 Jul 2001 14:18:04 GMT
Date:	Tue, 17 Jul 2001 14:18:03 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	<John.L.Byrne@compaq.com>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	<ssic-linux-devel@opensource.compaq.com>,
	<linux-cluster@nl.linux.org>
Subject: Re: another laundry or shopping list
In-Reply-To: <3B53419A.B5A9B421@kahuna.cag.cpqcorp.net>
Message-ID: <Pine.LNX.4.30.0107171407570.5047-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, all!

On Mon, 16 Jul 2001, John Byrne wrote:

> "David L. Nicol" wrote:

> A problem with the MOSIX home-node paradigm is that if the home node
> goes away, a remote process has to die because context gets lost.


 No. It is because the kernel-part of the process does not migrate with
the user-part of the process. On the firts versions of Mosix, it migrated
the whole process. I do not know the rationale for such variation. Any
Mosix core developer could explain to us the problem of migrating the
whole process?

 The most important concept that we can get is that there is not a central
authority, the nodes negociate the migration between themself with a good
cost function. This allows high scalability on a way that is independent
fron the quality of the network. On the worst case, we can get this
concept and the cost function.

> I often wish K&R had designed the original Unix with clustering in mind
> and that all applications had been developed with it in mind as well. Is
> there a Linux Time-Machine project anywhere?

 They did it. It is Plan 9. ;-) It has some really cool ideas.

> I'd make remote swap a low-priority item.

 I will not do remote swap. ;-)

>
> >         remote storage


 We have some interesting works: GFS and PVFS. Maybe we could learn from
both. Personaly, I consider that a decentraliced GFS would do a great
storage system for clusters.

> Maintaining the  filesystem namespace for a cluster is probably more
> easily abstracted and we could do this first.

 We have this at this time, not as powerfull as in Plan9, but it works.

 Yours:

David



---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 16:57:48 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16309AbRGQO5k>; Tue, 17 Jul 2001 16:57:40 +0200
Received: from cs.huji.ac.il ([132.65.16.10]:65276 "EHLO cs.huji.ac.il")
	by humbolt.nl.linux.org with ESMTP id <S16307AbRGQO5W>;
	Tue, 17 Jul 2001 16:57:22 +0200
Received: from mos218.cs.huji.ac.il ([132.65.173.218] ident=mail)
	by cs.huji.ac.il with esmtp (Exim 3.30 #1)
	id 15MWHp-0000ps-00
	for linux-cluster@nl.linux.org; Tue, 17 Jul 2001 17:57:21 +0300
Received: from amnons by mos218.cs.huji.ac.il with local (Exim 3.16 #1)
	id 15MWHp-00032T-00
	for linux-cluster@nl.linux.org; Tue, 17 Jul 2001 17:57:21 +0300
Subject: Re: another laundry or shopping list
To:	linux-cluster@nl.linux.org
Date:	Tue, 17 Jul 2001 17:57:21 +0300 (IDT)
X-Mailer: ELM [version 2.5 PL3]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <E15MWHp-00032T-00@mos218.cs.huji.ac.il>
From:	Amnon Shiloh <amnons@cs.huji.ac.il>
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

David <irbis@orcero.org> Wrote:

> On Mon, 16 Jul 2001, John Byrne wrote:
> 
> > "David L. Nicol" wrote:
> 
> > A problem with the MOSIX home-node paradigm is that if the home node
> > goes away, a remote process has to die because context gets lost.
> 
> 
>  No. It is because the kernel-part of the process does not migrate with
> the user-part of the process. On the firts versions of Mosix, it migrated

> the whole process. I do not know the rationale for such variation. Any
> Mosix core developer could explain to us the problem of migrating the
> whole process?

The first versions of MOSIX (actually, they were called just "MOS")
were true SSI, but because of that we could not keep the user-interface 100%
the same, only 99.5%.  The devil was in the details and the differences were
minor, but all user-utilities and libraries had to be at least recompiled and
a few had to even be slightly modified.

This was reasonable at the time under Unix-5.2, where the whole user-sources
were less than 20MB... try to do it now on Linux, with TeraBytes of user-code
already around...

Eventually, we had to sacrifice those nice features for the sake of 100%
compatibility.  Suppose we went along today with "Plan 9" or anything similar
with SSI in mind - we would have great ideas and a better operating-system,
but would enough people be there to re-write/review all the user-programs ever
written for Linux and store both sources and binaries carefully as separate
copies from Linux all over the web and Linux-equivalent mirrors?

Amnon Shiloh -- the HUJI MOSIX group.


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 18:03:19 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16277AbRGQQDL>; Tue, 17 Jul 2001 18:03:11 +0200
Received: from garrincha.netbank.com.br ([200.203.199.88]:62734 "EHLO
	netbank.com.br") by humbolt.nl.linux.org with ESMTP
	id <S16262AbRGQQDA>; Tue, 17 Jul 2001 18:03:00 +0200
Received: from surriel.ddts.net (1-183.ctame701-1.telepar.net.br [200.181.137.183])
	by netbank.com.br (Postfix) with ESMTP
	id 0509946809; Tue, 17 Jul 2001 13:02:14 -0300 (BRST)
Received: from localhost (bvatay@localhost [127.0.0.1])
	by surriel.ddts.net (8.11.4/8.11.2) with ESMTP id f6HG2uF05997;
	Tue, 17 Jul 2001 13:02:56 -0300
Date:	Tue, 17 Jul 2001 13:02:56 -0300 (BRST)
From:	Rik van Riel <riel@conectiva.com.br>
X-X-Sender:  <riel@imladris.rielhome.conectiva>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	<linux-cluster@nl.linux.org>
Subject: Re: Cahalan: This is NOT a cluster. Go away.
In-Reply-To: <200107130224.f6D2Oct136804@saturn.cs.uml.edu>
Message-ID: <Pine.LNX.4.33L.0107171301340.10870-100000@imladris.rielhome.conectiva>
X-spambait: aardvark@kernelnewbies.org
X-spammeplease:	aardvark@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Thu, 12 Jul 2001, Albert D. Cahalan wrote:
> David L. Nicol writes:
> > "Albert D. Cahalan" wrote:
> >>
> >> This is NOT a cluster. Go away.
> >
> > Go away?  Go work on grids?
>
> Whatever makes you happy. You sure don't want a cluster.

Albert,

I have NEVER seen you do anything else but insult
people on mailing lists. Please don't do this on
any of the list I manage or I will blacklist you.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 18:19:32 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16302AbRGQQTM>; Tue, 17 Jul 2001 18:19:12 +0200
Received: from 212-MADR-X50.libre.retevision.es ([62.83.21.212]:2564 "EHLO
	mioooldpc") by humbolt.nl.linux.org with ESMTP id <S16270AbRGQQS4>;
	Tue, 17 Jul 2001 18:18:56 +0200
Received: from mioooldpc (mioooldpc [127.0.0.1])
	by mioooldpc (Postfix) with SMTP
	id D386730568; Tue, 17 Jul 2001 18:25:34 +0200 (CEST)
Content-Type: text/plain;
  charset="utf-8"
From:	Jordi Polo <mumismo@wanadoo.es>
Organization: Echoff
To:	Amnon Shiloh <amnons@cs.huji.ac.il>
Subject: Re: another laundry or shopping list
Date:	Tue, 17 Jul 2001 18:25:33 +0200
X-Mailer: KMail [version 1.2]
References: <E15MWHp-00032T-00@mos218.cs.huji.ac.il>
In-Reply-To: <E15MWHp-00032T-00@mos218.cs.huji.ac.il>
Cc:	linux-cluster@nl.linux.org
MIME-Version: 1.0
Message-Id: <01071718253300.01293@mioooldpc>
Content-Transfer-Encoding: 8bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list


> The first versions of MOSIX (actually, they were called just "MOS")
> were true SSI, but because of that we could not keep the user-interface
> 100% the same, only 99.5%.  The devil was in the details and the
> differences were minor, but all user-utilities and libraries had to be at
> least recompiled and a few had to even be slightly modified.
>
> This was reasonable at the time under Unix-5.2, where the whole
> user-sources were less than 20MB... try to do it now on Linux, with
> TeraBytes of user-code already around...
>
> Eventually, we had to sacrifice those nice features for the sake of 100%
> compatibility.  Suppose we went along today with "Plan 9" or anything
> similar with SSI in mind - we would have great ideas and a better
> operating-system, but would enough people be there to re-write/review all
> the user-programs ever written for Linux and store both sources and
> binaries carefully as separate copies from Linux all over the web and
> Linux-equivalent mirrors?
>
> Amnon Shiloh -- the HUJI MOSIX group.
>


why not ?, if the changes are really very few, it's better do it now that 
never. If we have to recompile user-space apps, we have the source code to be 
able to do that, the distributions will have a good job with that. 
We'll have a transition and apps that works in kernel 3.0 and apps previous 
3.0 but as far as i see the community will migrate to 3.0 very fast.
As only a few apps will be slightly changed most of the people will just 
compile with no worries. I mean maybe you have to recompile glibc but i dont 
think any gnome app will see any change , so even it will quite transparent 
to users, but a nightmare to distros.
Of course then we must have a very clear squeme about what to implement in 
order to convence linus and the rest of the community . 

it's time for 2.9 

--
Jordi
  Student of Spain

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 18:37:56 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16270AbRGQQhs>; Tue, 17 Jul 2001 18:37:48 +0200
Received: from inet-mail3.oracle.com ([148.87.2.203]:10120 "EHLO
	inet-mail3.oracle.com") by humbolt.nl.linux.org with ESMTP
	id <S16066AbRGQQhg>; Tue, 17 Jul 2001 18:37:36 +0200
Received: from gmgw01.us.oracle.com (gmgw01.us.oracle.com [130.35.249.115])
	by inet-mail3.oracle.com (Switch-2.1.3/Switch-2.1.0) with ESMTP id f6HGX4923599;
	Tue, 17 Jul 2001 09:33:04 -0700 (PDT)
Received: from oracle.com ([152.68.53.75])
	by gmgw01.us.oracle.com (Switch-2.1.1/Switch-2.1.0) with ESMTP id f6HGb4j17012;
	Tue, 17 Jul 2001 09:37:04 -0700 (PDT)
Message-ID: <3B54699A.72612AC3@oracle.com>
Date:	Tue, 17 Jul 2001 09:36:42 -0700
From:	David Brower <david.brower@oracle.com>
Organization: Oracle Corporation
X-Mailer: Mozilla 4.73 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To:	Amnon Shiloh <amnons@cs.huji.ac.il>
CC:	linux-cluster@nl.linux.org
Subject: Re: another laundry or shopping list
References: <E15MWHp-00032T-00@mos218.cs.huji.ac.il>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

I, for one, would be most interested to see the results of
a Mosix person looking at the compaq SSI, and  Bruce's taking
a closer look at Mosix, in the form of "compare and contrast".
It would be illuminating.

thanks,
-dB

Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 20:36:39 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16093AbRGQSgb>; Tue, 17 Jul 2001 20:36:31 +0200
Received: from [213.98.27.110] ([213.98.27.110]:1037 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16143AbRGQSgW>;
	Tue, 17 Jul 2001 20:36:22 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6HKgf608157;
	Tue, 17 Jul 2001 20:42:42 GMT
Date:	Tue, 17 Jul 2001 20:42:41 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	"David L. Nicol" <david@kasey.umkc.edu>
cc:	Jordi Polo <mumismo@wanadoo.es>,
	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>,
	<linux-cluster@nl.linux.org>
Subject: Re: another laundry or shopping list
In-Reply-To: <3B532994.CD53DBEC@kasey.umkc.edu>
Message-ID: <Pine.LNX.4.30.0107172033310.8078-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, David!

> I like MOSIX's home-node paradigm.  A processes home node is where it


 I do like also their paradigm. In fact, they have a deep experience on
doing SSI-like facilities over non-SSI OS. I think that we must learn of
their knowledge, to use their KNOW-HOW, now matter the final code where
based on Mosix or rebuilt from scratch. They have really inteligent ideas,
that suppose a step forward respecting to PVM, what I think that it is
other of the important hits to learn about.

> If all the different pieces operate independently
> 	remote swap space
> 	remote storage
> 	remote CPU
> 	remote IO
> 	remote signal delivery
> 	remote whatever-is-left

 Personaly, I thing that remote swap space is not a good idea due to
scalability reasons and network failure response, but I agree with you
that it is better the subsystems be independent.

 In fact, we will find users that want a common parallel file system, but
does not want a process migration, or people that want process migration,
but that does not want a parallel filesystem, or both. I can find
scenarios of lots of combinations, and it would be interesting not to have
a bunch package "all or nothing".

 You are doing defense of a common swap area? What would be on your
opinion the good points of that mechanism?

 Yours:

David

---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 20:40:59 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16150AbRGQSkv>; Tue, 17 Jul 2001 20:40:51 +0200
Received: from [213.98.27.110] ([213.98.27.110]:5133 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16143AbRGQSkc>;
	Tue, 17 Jul 2001 20:40:32 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6HKks608188;
	Tue, 17 Jul 2001 20:46:54 GMT
Date:	Tue, 17 Jul 2001 20:46:54 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Jordi Polo <mumismo@wanadoo.es>
cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	"J . A . Magallon" <jamagallon@able.es>,
	"Albert D . Cahalan" <acahalan@cs.uml.edu>,
	<linux-cluster@nl.linux.org>
Subject: Re: another laundry or shopping list
In-Reply-To: <01071621151500.01423@mioooldpc>
Message-ID: <Pine.LNX.4.30.0107172043480.8078-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, Jordi!

> Another little idea: what about not moving but copying the process and sleep
> it in the home node and a little heartbeat so if the other node fails you can
> run again the process in other node.  This obviusly is a higher lever, very
> optional feature.

 It is useless for my personal use of clusters, but I have heared lots of
people talking about "checkpoints" in a cluster. It would be great for
fault tolerance. Since I am not an expert on fault tolerance, my opinion
should not be heared too much, but I think that it is a cool idea for
checkpoints, and that HA people might give their opinion here about this
stuff, only for contining this interesting brainstorming.

 Yours:

David


---------------------------
     irbis@orcero.org
http://www.orcero.org/irbis
---------------------------


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 20:44:29 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16158AbRGQSoL>; Tue, 17 Jul 2001 20:44:11 +0200
Received: from saturn.cs.uml.edu ([129.63.8.2]:19472 "EHLO saturn.cs.uml.edu")
	by humbolt.nl.linux.org with ESMTP id <S16145AbRGQSoD>;
	Tue, 17 Jul 2001 20:44:03 +0200
Received: (from acahalan@localhost)
	by saturn.cs.uml.edu (8.11.0/8.11.2) id f6HIgdA424763;
	Tue, 17 Jul 2001 14:42:39 -0400 (EDT)
From:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Message-Id: <200107171842.f6HIgdA424763@saturn.cs.uml.edu>
Subject: Re: another laundry or shopping list
To:	irbis@orcero.org
Date:	Tue, 17 Jul 2001 14:42:39 -0400 (EDT)
Cc:	david@kasey.umkc.edu (David L. Nicol),
	mumismo@wanadoo.es (Jordi Polo),
	jamagallon@able.es (J . A . Magallon),
	acahalan@cs.uml.edu (Albert D . Cahalan),
	linux-cluster@nl.linux.org
In-Reply-To: <Pine.LNX.4.30.0107172033310.8078-100000@hermes.orcero.org> from "irbis@orcero.org" at Jul 17, 2001 08:42:41 PM
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

irbis@orcero.org writes:

> Personaly, I thing that remote swap space is not a good idea due to
> scalability reasons and network failure response, but I agree with you
> that it is better the subsystems be independent.

If you are going to do process migration, you might as well
do remote swap space. Won't you start running a migrated process
before all the pages are transferred? That's paging in from a
remote node. Couldn't you transfer read-only pages before the
real migration event occurs? That's paging out to a remote node.
Paging in and out to a remote node sure looks like remote swap.



Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 21:20:27 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16192AbRGQTUK>; Tue, 17 Jul 2001 21:20:10 +0200
Received: from garrincha.netbank.com.br ([200.203.199.88]:33290 "EHLO
	netbank.com.br") by humbolt.nl.linux.org with ESMTP
	id <S16153AbRGQTTs>; Tue, 17 Jul 2001 21:19:48 +0200
Received: from surriel.ddts.net (1-183.ctame701-1.telepar.net.br [200.181.137.183])
	by netbank.com.br (Postfix) with ESMTP
	id DB23A46803; Tue, 17 Jul 2001 16:18:54 -0300 (BRST)
Received: from localhost (xwetja@localhost [127.0.0.1])
	by surriel.ddts.net (8.11.4/8.11.2) with ESMTP id f6HJJcF29851;
	Tue, 17 Jul 2001 16:19:38 -0300
Date:	Tue, 17 Jul 2001 16:19:38 -0300 (BRST)
From:	Rik van Riel <riel@conectiva.com.br>
X-X-Sender:  <riel@imladris.rielhome.conectiva>
To:	"Albert D. Cahalan" <acahalan@cs.uml.edu>
Cc:	"David L. Nicol" <david@kasey.umkc.edu>,
	Jordi Polo <mumismo@wanadoo.es>,
	"J . A . Magallon" <jamagallon@able.es>, <irbis@orcero.org>,
	<linux-cluster@nl.linux.org>
Subject: Re: another laundry or shopping list
In-Reply-To: <200107162234.f6GMYCS334936@saturn.cs.uml.edu>
Message-ID: <Pine.LNX.4.33L.0107171615080.10870-100000@imladris.rielhome.conectiva>
X-spambait: aardvark@kernelnewbies.org
X-spammeplease:	aardvark@nl.linux.org
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list

On Mon, 16 Jul 2001, Albert D. Cahalan wrote:

> Is automatic process migration any more than an academic curiosity?

Yes. It's a perfect feature when you want to use
computing power from mostly-idle workstations but
you want to stop loading the machine when the user
goes back to typing/web browsing/...

Process migration is no less useful than eg.
preemptive multitasking...

> > Can we do SSI-like things, sharing more resources with less structure,
> > while still having the nodes under enforcedly separate management?
>
> Sure. Change that to "Should we..." and the answer is NO.

That's YOUR answer, not "the answer" (if such a thing
exists).

> Linux needs to stay clean, maintainable, secure, and fast.
> You can't get code into Linux using "ain't it cool" as the
> primary reason.

Hmm, wasn't Linux started as a "cool project" in
the first place? ;))

But indeed, if the coolness factor was the only
reason it wouldn't go in, but please listen to
what the other folks have to say before you start
flaming.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)


Linux-cluster: generic cluster infrastructure for Linux
Archive:       http://mail.nl.linux.org/linux-cluster/

From owner-linux-cluster@nl.linux.org Tue Jul 17 21:35:14 2001
Received: (root@humbolt.nl.linux.org) by humbolt.nl.linux.org
	id <S16277AbRGQTfG>; Tue, 17 Jul 2001 21:35:06 +0200
Received: from [213.98.27.110] ([213.98.27.110]:26125 "EHLO hermes.orcero.org")
	by humbolt.nl.linux.org with ESMTP id <S16189AbRGQTe4>;
	Tue, 17 Jul 2001 21:34:56 +0200
Received: from localhost (localhost.localdomain [127.0.0.1])
	by hermes.orcero.org (8.11.0/8.11.0) with ESMTP id f6HLgO608574;
	Tue, 17 Jul 2001 21:42:24 GMT
Date:	Tue, 17 Jul 2001 21:42:24 +0000 (/etc/localtime)
From:	<irbis@orcero.org>
To:	Amnon Shiloh <amnons@cs.huji.ac.il>
cc:	<linux-cluster@nl.linux.org>
Subject: Re: another laundry or shopping list
In-Reply-To: <E15MWHp-00032T-00@mos218.cs.huji.ac.il>
Message-ID: <Pine.LNX.4.30.0107172132490.8078-100000@hermes.orcero.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender:	owner-linux-cluster@nl.linux.org
Precedence: bulk
Return-Path: <owner-linux-cluster@nl.linux.org>
X-Envelope-To: <"| /bin/marchive -a -m -f /home/majordomo/public_html/linux-cluster/folders/linux-cluster"> (uid 0)
X-Orcpt: rfc822;linux-cluster-list



 Hello, Amnon!

> > the whole process. I do not know the rationale for such variation. Any
> > Mosix core developer could explain to us the problem of migrating the
> > whole process?
>
> The first versions of MOSIX (actually, they were called just "MOS")
> were true SSI, but because of that we could not keep the user-interface 100%
> the same, only 99.5%.  The devil was in the details and the differences were
> minor, but all user-utilities and libraries had to be at least recompiled and
> a few had to even be slightly modified.

 Well... this is NO good. And we will find this problem here, since some
people on the list think on a full SSI implementation -I do not agree
100%, my opinion is that it is better enchancing Mosix, but all this is
ok-. Where did you find the problems of backward compatibility? Since you
walked this way on the past, your insights will be unvaluable. :-)

> compatibility.  Suppose we went along today with "Plan 9" or anything similar
> with SSI in mind - we would have great ideas and a better operating-system,
> but would 