Cluster coherent lock recovery
NFSv4 servers are permitted to forget which clients hold which locks across a reboot, and instead depend on clients to replay their locking state after the reboot.
On startup they enter a "grace period" (90 seconds by default on Linux knfsd) during which clients are allowed to reclaim old locks, but are not permitted to acquire new ones.
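As a rough illustration (this is not knfsd code; all names below are made up), the behavior of a single server during and after its grace period amounts to something like:

 /*
  * Illustrative sketch only -- not knfsd code; all names are made up.
  * It just models how a single NFSv4 server treats lock requests
  * during and after its grace period.
  */
 #include <stdbool.h>
 #include <stdio.h>
 #include <time.h>
 
 #define GRACE_SECONDS 90                /* knfsd's default grace period */
 
 enum lock_status { LOCK_GRANTED, LOCK_DENIED_GRACE, LOCK_DENIED_NO_GRACE };
 
 struct server_state {
         time_t boot_time;               /* when the server (re)started */
 };
 
 static bool in_grace(const struct server_state *srv)
 {
         return time(NULL) < srv->boot_time + GRACE_SECONDS;
 }
 
 /*
  * During grace, only reclaims of previously held locks are granted;
  * after grace, only new locks are granted and late reclaims are
  * refused (the client would see the equivalent of NFS4ERR_NO_GRACE).
  */
 static enum lock_status handle_lock_request(const struct server_state *srv,
                                             bool is_reclaim)
 {
         if (in_grace(srv))
                 return is_reclaim ? LOCK_GRANTED : LOCK_DENIED_GRACE;
         return is_reclaim ? LOCK_DENIED_NO_GRACE : LOCK_GRANTED;
 }
 
 int main(void)
 {
         struct server_state srv = { .boot_time = time(NULL) };
 
         printf("reclaim during grace:  %d\n", handle_lock_request(&srv, true));
         printf("new lock during grace: %d\n", handle_lock_request(&srv, false));
         return 0;
 }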
If the filesystem is shared by users other than knfsd's clients, those users also need to be prevented from acquiring locks during the grace period.
Normally we accomplish this by stopping any apps using the filesystem before stopping the nfs server, and starting the nfs server on reboot before starting any other applications.
This is tricky to get right. And on a cluster filesystem, it would require shutting down applications on *other* nodes using the same filesystem. The simplest approach in that case would probably be to reboot the entire cluster, but that would require careful timing to coordinate grace periods; otherwise one node could resume normal locking while another is still accepting reclaim locks.
So, we need some kind of logic to enforce consistent lock recovery across a cluster.
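One possible shape for that logic, purely as a sketch with made-up names standing in for whatever shared state the cluster stack actually provides: each node marks itself as being in grace when it starts reclaim, clears the mark when reclaim is done, and no node grants ordinary locks while any node is still marked.

 /*
  * Rough sketch of one possible scheme (not an existing implementation;
  * names are hypothetical): each node marks itself "in grace" in shared
  * cluster state when it starts reclaim and clears the mark when reclaim
  * is done, and no node grants ordinary locks while any node is marked.
  * The array below stands in for whatever shared state the cluster
  * stack (e.g. the DLM or shared storage) would actually provide.
  */
 #include <stdbool.h>
 #include <stdio.h>
 
 #define MAX_NODES 16
 
 static bool node_in_grace[MAX_NODES];   /* stand-in for shared cluster state */
 
 static void node_enter_grace(int node)
 {
         node_in_grace[node] = true;     /* announced before serving reclaims */
 }
 
 static void node_exit_grace(int node)
 {
         node_in_grace[node] = false;    /* reclaim finished (or timed out) */
 }
 
 static bool cluster_in_grace(void)
 {
         for (int i = 0; i < MAX_NODES; i++)
                 if (node_in_grace[i])
                         return true;
         return false;
 }
 
 /* Every node consults the cluster-wide state before granting a new lock. */
 static bool may_grant_new_lock(void)
 {
         return !cluster_in_grace();
 }
 
 int main(void)
 {
         node_enter_grace(0);            /* node 0 reboots and starts reclaim */
         printf("new locks allowed: %d\n", may_grant_new_lock());
         node_exit_grace(0);             /* node 0's grace period ends */
         printf("new locks allowed: %d\n", may_grant_new_lock());
         return 0;
 }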
We'd like this logic to also allow for nfsd migration and failover, so that locks acquired on one node can be reclaimed on another.
It should also block locks by users other than knfsd during the grace period, so that we can safely run other applications on an nfs-exported cluster filesystem.
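Continuing the sketch above (hypothetical names again), that means the cluster filesystem's lock path would consult the same cluster-wide grace state for every lock request, letting nfsd reclaims through and holding everyone else off:

 /*
  * Sketch only, hypothetical names again: the cluster filesystem's lock
  * path applies the same cluster-wide grace check to every lock request,
  * not just those arriving via knfsd, so local applications on any node
  * are also held off until reclaim has finished everywhere.
  */
 #include <errno.h>
 #include <stdbool.h>
 #include <stdio.h>
 
 /* Stub standing in for a query of the shared cluster grace state. */
 static bool cluster_in_grace(void) { return true; }
 
 struct lock_request {
         bool from_nfsd;         /* request arrived via knfsd */
         bool is_reclaim;        /* knfsd marked it as a reclaim of an old lock */
 };
 
 /* 0: lock may proceed; -EAGAIN: still in grace, caller must wait or retry. */
 static int fs_check_lock_allowed(const struct lock_request *req)
 {
         if (!cluster_in_grace())
                 return 0;               /* normal operation */
         if (req->from_nfsd && req->is_reclaim)
                 return 0;               /* reclaims are what grace is for */
         return -EAGAIN;                 /* everyone else waits out grace */
 }
 
 int main(void)
 {
         struct lock_request reclaim = { .from_nfsd = true,  .is_reclaim = true  };
         struct lock_request local   = { .from_nfsd = false, .is_reclaim = false };
 
         printf("nfsd reclaim:   %d\n", fs_check_lock_allowed(&reclaim));
         printf("local app lock: %d\n", fs_check_lock_allowed(&local));
         return 0;
 }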
GFS2 is probably our first target.
It might also be interesting (at least for prototyping) to look at exports of filesystems shared across containers. One obstacle is incomplete support for reboot recovery in containers.