
Re: ETCP Project



There are huge cache consistency problems with this unless
you lock out concurrent writers of the same data on
each node.  Who has -the- authoritative data if/when two
copies are different?  How do you resolve them?  You've
stumbled into a classic problem space: synchronization
of replicas, and resolution when they fall out of sync.

"The only thing interesting in h/a is what happens during and
after the failures."

For your consideration, the TruCluster CFS is a layer that resolves
the coherency issues, running on top of whatever filesystems
are available underneath.  If I understand correctly, it will
turn a mounted FAT partition into a correct CFS (but not
a very high performance one).

The way this sort of thing is usually done by CFSs that aren't
symmetric access (GFS is symmetric) is that one node really
mounts the FS, a layer handles the concurrent access, and the other
nodes do remote ops to the mounting node.  The mounting node
is made HA so that there is always one available.  Then
the FS is built on logical volumes that mirror the storage in
different places, such as LVM using a local disk and maybe
a drbd-like device.  These layers then have problems at mirror
divergence time.  (The process usually involves picking a winning
side and "resilvering" the mirror with its contents, which is bad if
both sides have changes that should be kept.)
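
To make the resilvering problem concrete, here is a rough sketch, not
how LVM or drbd actually do it.  The two mirror halves are plain dicts
of block number -> contents, and the dirty sets and the resilver()
helper are names I made up for illustration.

```python
def resilver(winner, loser, winner_dirty, loser_dirty):
    """Overwrite the loser with the winner's contents.

    Returns the block numbers whose loser-side changes are thrown away.
    """
    lost = sorted(loser_dirty)
    # Only blocks touched on either side since the split need copying.
    for blk in winner_dirty | loser_dirty:
        if blk in winner:
            loser[blk] = winner[blk]
        else:
            loser.pop(blk, None)
    return lost


# Both halves start identical, then the link between them drops.
side_a = {0: b"super", 1: b"inode", 2: b"data"}
side_b = dict(side_a)

side_a[1] = b"inode-A"   # change made on A during the split
a_dirty = {1}
side_b[2] = b"data-B"    # change made on B during the split
b_dirty = {2}

# Pick A as the winner: B's update to block 2 is silently discarded.
lost = resilver(side_a, side_b, a_dirty, b_dirty)
print("discarded blocks on loser:", lost)
print("mirror consistent again:", side_a == side_b)
```

The mirror ends up consistent, but only because one side's work was
dropped.  Nothing at this layer knows whether block 2 mattered.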

What you are proposing is to create a similar layer that maps two
truly different file systems and coordinates their access.  Without
one point of truth defining "the" buffer in question somehow, I don't
know how you are going to make it work consistently and correctly.
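
A toy illustration of why the single point of truth matters.  This is
only a sketch under my own assumptions: the two underlying filesystems
are dicts, and apply(), unordered() and ordered() are hypothetical
helpers, not part of any real mirroring layer.

```python
def apply(fs, path, data):
    """Stand-in for a write system call against one underlying FS."""
    fs[path] = data


def unordered(fs_a, fs_b):
    # Each node does the local half of its write first; the remote
    # half arrives at the other node a little later.
    apply(fs_a, "f", "from A")   # node A, local half
    apply(fs_b, "f", "from B")   # node B, local half
    apply(fs_b, "f", "from A")   # node A's write lands on B
    apply(fs_a, "f", "from B")   # node B's write lands on A


def ordered(fs_a, fs_b):
    # One owner (assumed here) decides the order; both copies follow it.
    for data in ["from A", "from B"]:
        apply(fs_a, "f", data)
        apply(fs_b, "f", data)


a1, b1 = {}, {}
unordered(a1, b1)
a2, b2 = {}, {}
ordered(a2, b2)

print("no ordering, copies agree:", a1 == b1)
print("one owner,   copies agree:", a2 == b2)
```

With no ordering each copy ends up holding the *other* node's data, and
every write "succeeded" on both sides.  With one owner picking the order
the copies agree, which is more or less the mounting-node design above.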

The big problem case to work out is:  processes on each node
writing to the end of a log file.  You must get all lines from all processes
in correct time order, without losing anything.  This is a highly contended
block with concurrent writes, followed by file extension.
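
The log-append case can be sketched like this.  It is a simplification
under stated assumptions: the log is a list of lines, an "offset" is a
list index, and append_racy() and append_serialized() are illustrative
names only.

```python
def append_racy(log, lines):
    """Every writer samples end-of-file before any write has landed."""
    offsets = [len(log) for _ in lines]      # all see the same EOF
    for off, line in zip(offsets, lines):
        log[off:off + 1] = [line]            # write at the stale EOF
    return log


def append_serialized(log, lines):
    """One owner extends the file, so each line gets its own slot."""
    for line in lines:
        log.append(line)
    return log


writes = ["nodeA: disk ok", "nodeB: link down"]

racy = append_racy(["boot"], writes)
safe = append_serialized(["boot"], writes)

print(racy)
print(safe)
```

In the racy version nodeA's line is overwritten by nodeB's, because both
writers extended the file from the same end-of-file.  Serializing through
one owner keeps every line, at the cost of a round trip per append.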

cheers,
-dB

Greg Lindahl wrote:

> > > It wouldn't be hard to have system calls which read execute only on
> > > the local node, and system calls which write get executed on the local
> > > and remote node. Voila, it's a HA component.
> >
> > I don't understand.  Read what execute only on the local system?
> > If it's by system call, then there's no local data to read locally.  I'm
> > confused.
>
> I didn't explain it fully.
>
> The basic ForwardFS is forwarding system calls to another system, so
> it can have any filesystem on the remote system, and there is no local
> data on disk.
>
> The HigherAvailabilityMirrorFS I described is a filesystem that can
> execute a given system call in 2 places: (1) against a local
> underlying filesystem (of any kind), and (2) against a remote
> underlying filesystem (of any kind). PVFS is actually forwarding the
> call up to a user-level process. In order to keep the 2 underlying
> filesystems synchronized, you need to do all writes against both, but
> reads only need to go against 1. (Screw atime.)
>
> -- g
>
> Linux-cluster: generic cluster infrastructure for Linux
> Archive:       http://mail.nl.linux.org/linux-cluster/

