Cluster Coherent NFSv4 and Delegations

From Linux NFS

Revision as of 22:30, 10 October 2006 by Peterhoneyman

Background

NFSv4 adds a new protocol feature: delegations. RFC 3530 explains:

"The major addition to NFS version 4 in the area of caching is the ability of the server to delegate certain responsibilities to the client. When the server grants a delegation for a file to a client, the client is guaranteed certain semantics with respect to the sharing of that file with other clients. At OPEN, the server may provide the client either a read or write delegation for the file. If the client is granted a read delegation, it is assured that no other client has the ability to write to the file for the duration of the delegation. If the client is granted a write delegation, the client is assured that no other client has read or write access to the file.
Delegations can be recalled by the server. If another client requests access to the file in such a way that the access conflicts with the granted delegation, the server is able to notify the initial client and recall the delegation. This requires that a callback path exist between the server and client. If this callback path does not exist, then delegations can not be granted. The essence of a delegation is that it allows the client to locally service operations such as OPEN, CLOSE, LOCK, LOCKU, READ, WRITE without immediate interaction with the server."

Linux NFSv4 Delegation Support for Cluster Filesystems

To coordinate NFSv4 delegations with local access, we implement delegations with the lease extension to the VFS lock subsystem. User space sets and queries a lease with fcntl() (F_SETLEASE and F_GETLEASE). To allow a lease to be recalled, e.g., on account of a conflicting open, the VFS layer provides a break_lease() function.

* The break_lease call needs to be added to the VFS rename and unlink implementations.

When the NFS server's open() method is invoked, it may issue or recall a delegation. A delegation can be issued if it does not conflict with an existing delegation. Issuing a delegation is optional. A delegation can be recalled at any time. Recalling a delegation is mandatory if a conflicting open is received.
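
The conflict rules quoted from RFC 3530 can be written down as two small predicates. This is only a sketch: the type and function names (deleg_type, open_conflicts, deleg_compatible) and the ACCESS_* bits are made up for illustration and are not taken from nfsd or the VFS.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's or nfsd's types. */
enum deleg_type { DELEG_NONE, DELEG_READ, DELEG_WRITE };

#define ACCESS_READ  0x1
#define ACCESS_WRITE 0x2

/* True if an open by ANOTHER client with the given access bits conflicts
 * with a delegation that is already held. */
bool open_conflicts(enum deleg_type held, unsigned access)
{
    switch (held) {
    case DELEG_READ:   /* readers may share; any writer conflicts */
        return (access & ACCESS_WRITE) != 0;
    case DELEG_WRITE:  /* exclusive: any other access conflicts */
        return (access & (ACCESS_READ | ACCESS_WRITE)) != 0;
    default:           /* no delegation held, nothing to conflict with */
        return false;
    }
}

/* True if a delegation of type 'wanted' can be issued while another
 * client holds 'held'. Only read delegations can coexist. */
bool deleg_compatible(enum deleg_type held, enum deleg_type wanted)
{
    if (held == DELEG_NONE || wanted == DELEG_NONE)
        return true;
    return held == DELEG_READ && wanted == DELEG_READ;
}
```

When open_conflicts() is true the recall is mandatory; when deleg_compatible() is true the server may still decline to issue a delegation, since issuing is optional.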

A conflicting open can come from a variety of sources: local access, NFS access, Samba access, etc. Every call to VFS open must check for conflict with an existing delegation and recall it if necessary. NFSD may wait for the delegation recall to complete, or may respond to the OPEN request with NFS4ERR_DELAY.

* Is that true?
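
A minimal sketch of the second option, in which NFSD starts the recall and answers the OPEN with a delay error (NFS4ERR_DELAY in RFC 3530) so that the client retries. The structure and function names (file_state, start_recall, delegation_returned, nfsd_open) are assumptions made for this example; start_recall() merely stands in for break_lease() and the resulting CB_RECALL.

```c
#include <stdbool.h>

#define NFS4_OK        0
#define NFS4ERR_DELAY  10008   /* error value defined in RFC 3530 */

/* Hypothetical per-file state, for illustration only. */
struct file_state {
    bool deleg_conflict;   /* a conflicting delegation is outstanding */
    bool recall_sent;      /* a recall has already been started */
};

/* Stand-in for break_lease() / CB_RECALL. */
void start_recall(struct file_state *f)
{
    f->recall_sent = true;
}

/* Called when the client has returned the delegation. */
void delegation_returned(struct file_state *f)
{
    f->deleg_conflict = false;
    f->recall_sent = false;
}

/* OPEN handling: never blocks; asks the client to retry while a
 * conflicting delegation is still outstanding. */
int nfsd_open(struct file_state *f)
{
    if (f->deleg_conflict) {
        if (!f->recall_sent)
            start_recall(f);   /* start the recall only once */
        return NFS4ERR_DELAY;
    }
    return NFS4_OK;
}
```

The waiting variant would instead block inside nfsd_open() until delegation_returned() has run, at the cost of tying up a server thread for the duration of the recall.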

If an OPEN request forces a delegation recall, the NFS server issues a CB_RECALL request to all clients holding the conflicting delegation. This is implemented with the VFS layer break_lease() call, which notifies lease holders that a conflicting OPEN has occurred. The VFS layer makes this determination without consulting the underlying file system.

Once the recall of conflicting delegations is complete, NFSD can proceed with its pending OPEN request. In order to determine whether it can issue a delegation for the request, NFSD needs information that lives on the other side of the VFS layer. The VFS lease subsystem can make the determination by examining the entry for the file in the open inode table: if there are no writers, then a READ delegation can be issued; if there are no readers or writers, then a WRITE delegation can be issued. NFSD must obtain the result of this determination from the VFS layer.
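
The rule in the preceding paragraph is a three-way decision on the open counts. A sketch, assuming the counts cover all other opens of the file and exclude the request being processed; open_counts and best_delegation are hypothetical names, not existing VFS interfaces.

```c
/* Hypothetical names for illustration only. */
enum deleg_type { DELEG_NONE, DELEG_READ, DELEG_WRITE };

/* Other opens of the file, as recorded in the open inode table
 * (assumed to exclude the OPEN currently being processed). */
struct open_counts {
    unsigned readers;
    unsigned writers;
};

/* Strongest delegation the lease subsystem could allow. NFSD is still
 * free to grant a weaker one, or none at all. */
enum deleg_type best_delegation(const struct open_counts *c)
{
    if (c->readers == 0 && c->writers == 0)
        return DELEG_WRITE;   /* nobody else has the file open */
    if (c->writers == 0)
        return DELEG_READ;    /* only readers: sharing is safe */
    return DELEG_NONE;        /* a writer exists: no delegation */
}
```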

If NFSD elects to grant a delegation, it must inform the underlying file system so that the file system can later demand that NFSD recall the delegation.

Tasks

  • Ask file system to check for delegation recall in progress prior to granting an OPEN, granting a delegation, or initiating a recall.
  • Set up a callback from the file system to notify an NFSv4 server to perform a CB_RECALL upon a conflicting OPEN from another node.
  • Ask the file system if a delegation can be granted.
  • Tell the file system that the VFS on a node has detected a lease conflict (rename, unlink, etc.) and that any delegations should be recalled.
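
The second task, a callback from the file system to the NFSv4 server, might look roughly like the following. Everything here is hypothetical (cluster_fs, fs_set_recall_cb, fs_conflicting_open, nfsd_recall); the sketch only shows the registration and notification pattern, with a counter standing in for sending CB_RECALL.

```c
#include <stddef.h>

/* Callback type the file system uses to ask the local NFS server to
 * recall delegations on an inode. Hypothetical, for illustration. */
typedef void (*recall_cb)(unsigned long ino, void *data);

struct cluster_fs {
    recall_cb recall;   /* NULL until the NFS server registers */
    void *data;         /* opaque pointer handed back to the callback */
};

void fs_set_recall_cb(struct cluster_fs *fs, recall_cb cb, void *data)
{
    fs->recall = cb;
    fs->data = data;
}

/* Invoked by the file system when another node reports a conflicting
 * OPEN. Returns 1 if the local server was notified, 0 otherwise. */
int fs_conflicting_open(struct cluster_fs *fs, unsigned long ino)
{
    if (fs->recall == NULL)
        return 0;
    fs->recall(ino, fs->data);
    return 1;
}

/* Toy NFS server side: records what it was asked to recall. */
struct nfsd_state {
    unsigned long last_ino;
    int recalls;
};

void nfsd_recall(unsigned long ino, void *data)
{
    struct nfsd_state *s = data;
    s->last_ino = ino;
    s->recalls++;        /* a real server would send CB_RECALL here */
}
```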

Proposed Implementation

Extend the setlease, getlease, and break_lease interfaces to serve cluster file systems. The extensions will resemble the POSIX locking extensions (callbacks, etc.).

We probably need new inode operations:

  • break_lease(inode, mode)
  • setlease(filp, mode)
  • getlease(filp, &mode)

Here mode is one of read, write, or unlock. Should we also allow the mode to be or'ed with a nonblocking flag?
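
As a sketch of how the three operations and the mode argument could fit together, here is an operations table with a toy single-node implementation behind it. All names and values are assumptions (lease_operations, the LEASE_* constants, the toy_* functions, and the stripped-down struct inode and struct file); they mirror the signatures listed above rather than any existing kernel interface. For break_lease, mode is taken to be the access being attempted.

```c
#include <errno.h>

/* Hypothetical mode values and flag; not existing kernel constants. */
enum { LEASE_UNLOCK = 0, LEASE_READ = 1, LEASE_WRITE = 2 };
#define LEASE_MODE_MASK 0x0f
#define LEASE_NONBLOCK  0x10   /* assumed flag, or'ed into mode */

struct inode { int lease; };             /* toy inode: current lease mode */
struct file  { struct inode *inode; };   /* toy open file */

/* The three operations proposed in the text. */
struct lease_operations {
    int (*break_lease)(struct inode *inode, int mode);
    int (*setlease)(struct file *filp, int mode);
    int (*getlease)(struct file *filp, int *mode);
};

int toy_setlease(struct file *filp, int mode)
{
    filp->inode->lease = mode & LEASE_MODE_MASK;
    return 0;
}

int toy_getlease(struct file *filp, int *mode)
{
    *mode = filp->inode->lease;
    return 0;
}

int toy_break_lease(struct inode *inode, int mode)
{
    int want = mode & LEASE_MODE_MASK;
    /* A write access conflicts with any lease; a read access
     * conflicts only with a write lease. */
    int conflict = (want == LEASE_WRITE) ? inode->lease != LEASE_UNLOCK
                                         : inode->lease == LEASE_WRITE;
    if (!conflict)
        return 0;
    if (mode & LEASE_NONBLOCK)
        return -EWOULDBLOCK;       /* caller asked not to wait */
    inode->lease = LEASE_UNLOCK;   /* stands in for waiting for the holder */
    return 0;
}

const struct lease_operations toy_lease_ops = {
    .break_lease = toy_break_lease,
    .setlease    = toy_setlease,
    .getlease    = toy_getlease,
};
```

A cluster file system would supply its own table, and its break_lease would have to consult the other nodes before answering.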

The VFS lease subsystem includes a series of lock manager callbacks. Will these be sufficient for the cluster filesystem case?

Actually, the current setlease and getlease functions take a struct file_lock instead of (or in addition to) the mode. Do we need that?

Also, setlease and getlease could be file operations instead of inode operations. This is probably a fairly arbitrary choice.

break_lease, setlease, getlease, etc. might block even in the absence of contention. To handle this, we might allow an -EINPROGRESS return to be followed by a callback, e.g. break_lease_result(inode, stat), where stat is -EAGAIN (we are waiting for the lease to be broken) or OK (the lease was broken immediately, or there never was one).
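
A small simulation of that calling convention: break_lease() returns -EINPROGRESS at once, and the result arrives later through break_lease_result(). The struct inode fields and cluster_answer(), which plays the part of the cluster file system finishing its query, are invented for this sketch, and OK is represented as 0.

```c
#include <errno.h>

/* Toy inode carrying just enough state for the simulation. */
struct inode {
    int lease_held;    /* nonzero while some node holds a lease */
    int pending;       /* a break request awaits the cluster's answer */
    int have_result;   /* break_lease_result() has been called */
    int result;        /* stat delivered by the callback */
};

/* Callback named in the text: delivers the deferred result. */
void break_lease_result(struct inode *inode, int stat)
{
    inode->result = stat;
    inode->have_result = 1;
}

/* Never blocks: queues the request and reports that the answer will
 * follow through the callback. */
int break_lease(struct inode *inode)
{
    inode->pending = 1;
    inode->have_result = 0;
    return -EINPROGRESS;
}

/* Simulates the cluster file system completing its query. */
void cluster_answer(struct inode *inode)
{
    if (!inode->pending)
        return;
    inode->pending = 0;
    /* -EAGAIN: a lease exists and is still being broken.
     * 0 (OK): there is no lease left to wait for. */
    break_lease_result(inode, inode->lease_held ? -EAGAIN : 0);
}
```

In this model a caller that receives -EAGAIN still has to learn when the lease is finally broken; whether that takes a second callback or a retry is left open, as it is in the text.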

Status
