From Linux NFS
Distribute NFSv4 state via File System or NFSD
Whether to distribute NFSv4 stateids via NFSD or through the file system
is an ongoing debate. I would say it started
when I implemented the first pNFS prototype in 2004.
I'm sure this conversation will continue until someone actually
spends the time to implement an nfsd-to-nfsd stateid mechanism and compares
its performance and functionality with doing the distribution via the file
system. Until then, this page tries to capture the issues.
I think the ideal compromise would have NFSD offer a default server-to-server
protocol that does only the most basic things. File systems that need more
control can implement their own mechanism and hook into it via a standard
interface (via export ops or something similar).
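The compromise above can be sketched as a default handler plus an optional per-export hook. This is only an illustration: `Export`, `distribute_state`, `default_nfsd_distribute`, and `distribute_stateid` are hypothetical names, not existing NFSD or export-ops symbols, and the "protocol" is reduced to returning which data servers were contacted.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signature for a stateid distribution hook:
# (stateid, data_servers) -> {data_server: delivered?}
Distributor = Callable[[str, List[str]], Dict[str, bool]]


def default_nfsd_distribute(stateid: str, data_servers: List[str]) -> Dict[str, bool]:
    """Stand-in for a basic NFSD server-to-server protocol.

    It simply contacts every data server it is given, one by one.
    """
    return {ds: True for ds in data_servers}


class Export:
    """An exported file system with an optional override hook (assumed design)."""

    def __init__(self, name: str, distribute_state: Optional[Distributor] = None):
        self.name = name
        self.distribute_state = distribute_state


def distribute_stateid(export: Export, stateid: str, data_servers: List[str]) -> Dict[str, bool]:
    # Use the file system's own mechanism if it registered one,
    # otherwise fall back to the default NFSD path.
    handler = export.distribute_state or default_nfsd_distribute
    return handler(stateid, data_servers)
```

A file system with its own inter-server protocol would pass its handler as `distribute_state`; a simple file system would pass nothing and get the default behaviour.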
Outside file system (via NFSD or something)
Pro: Simplifies work of FS
Con:
a) NFSD MUST be exactly in sync with the FS about which data servers are active and
which client has accessed data on which data server. (Note the number of export
ops it would take to maintain this coherence.)
NFSD must know which devices to distribute state to or revoke it from. This means that
it must be in sync with regard to the active data servers. Ideally, it would
also track which data servers hold which stateids, so that
it can avoid calling data servers that have no relevant state.
In the laissez-faire case, it would even know which clients accessed which data
servers, so that revokes can be even more specific. In short, it seems to me
that the only place that really understands the whole picture is the FS.
b) The FS already has a protocol to communicate between servers; why duplicate it?
c) Scalability is an issue. Many file systems use sophisticated communication algorithms
to spread information among a large number of servers (tree broadcast, etc.).
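To make the bookkeeping in point (a) concrete, here is a minimal sketch of the state NFSD would have to maintain to issue targeted revokes. The class and method names are hypothetical, and the sketch ignores the hard part, which is keeping these tables coherent with what the FS actually does.

```python
from collections import defaultdict


class StateTracker:
    """Hypothetical NFSD-side record of where stateids and clients have been."""

    def __init__(self):
        self._servers_by_stateid = defaultdict(set)  # stateid -> data servers holding it
        self._clients_by_server = defaultdict(set)   # data server -> clients that used it

    def record_distribution(self, stateid, data_server):
        self._servers_by_stateid[stateid].add(data_server)

    def record_access(self, client, data_server):
        self._clients_by_server[data_server].add(client)

    def revoke_targets(self, stateid, client=None):
        """Return the data servers that need a revoke for this stateid.

        If a client is given, narrow the list to servers that client
        actually accessed (the more specific revoke described above).
        """
        servers = self._servers_by_stateid.get(stateid, set())
        if client is not None:
            servers = {
                ds for ds in servers
                if client in self._clients_by_server.get(ds, set())
            }
        return sorted(servers)
```

Every `record_*` call here corresponds to information that originates in the FS, which is why each one would likely need its own export op or callback.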
Inside file system
Pro:
a) Leaves the problem of performance and scalability to the file system
(which has already solved the problem)
b) The FS has all the required information to optimize distribution and
revocation of all state information. No sync with NFSD layer required.
Con: Each FS must implement extra export ops
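The scalability argument made for both options refers to techniques such as tree broadcast. As a rough illustration of why that matters, the sketch below computes a fan-out schedule in which every server that already has the state forwards it to up to `fanout` others per round. The function name and the round-based model are assumptions for illustration; real cluster file systems use their own, more elaborate schemes.

```python
def tree_broadcast_rounds(servers, fanout=2):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    servers[0] is treated as the origin that already holds the state.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    informed = [servers[0]] if servers else []
    pending = list(servers[1:])
    rounds = []
    while pending:
        sends = []
        # Iterate over a snapshot: servers informed in this round
        # only start forwarding in the next round.
        for sender in list(informed):
            for _ in range(fanout):
                if not pending:
                    break
                receiver = pending.pop(0)
                sends.append((sender, receiver))
                informed.append(receiver)
        rounds.append(sends)
    return rounds
```

With a fan-out of 1, eight servers are covered in three rounds, whereas a single server contacting the other seven sequentially needs seven sends in a row. A default NFSD protocol would have to reimplement something like this to match what many file systems already provide.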