Cluster Coherent NFSv4 and Delegations

From Linux NFS


Latest revision as of 16:10, 8 January 2008


Background

NFSv4 adds a new protocol feature: delegations. RFC 3530 explains:

The major addition to NFS version 4 in the area of caching is the ability of the server to delegate certain responsibilities to the client. When the server grants a delegation for a file to a client, the client is guaranteed certain semantics with respect to the sharing of that file with other clients. At OPEN, the server may provide the client either a read or write delegation for the file. If the client is granted a read delegation, it is assured that no other client has the ability to write to the file for the duration of the delegation. If the client is granted a write delegation, the client is assured that no other client has read or write access to the file.
Delegations can be recalled by the server. If another client requests access to the file in such a way that the access conflicts with the granted delegation, the server is able to notify the initial client and recall the delegation. This requires that a callback path exist between the server and client. If this callback path does not exist, then delegations can not be granted. The essence of a delegation is that it allows the client to locally service operations such as OPEN, CLOSE, LOCK, LOCKU, READ, WRITE without immediate interaction with the server.
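The RFC's two guarantees can be read as a small conflict rule between an outstanding delegation and another client's OPEN. The following C sketch encodes that rule; the enum and function names (deleg_type, deleg_conflicts, ACCESS_*) are hypothetical and are not taken from the Linux NFS server.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the delegation a client may hold. */
enum deleg_type { DELEG_NONE, DELEG_READ, DELEG_WRITE };

/* Access bits requested by another client's OPEN (hypothetical). */
enum { ACCESS_READ = 1, ACCESS_WRITE = 2 };

/*
 * True if an OPEN from a *different* client with access bits `want`
 * would break the promise made to the delegation holder:
 *   read delegation  -> no other client may write;
 *   write delegation -> no other client may read or write.
 */
static bool deleg_conflicts(enum deleg_type held, int want)
{
    assert((want & ~(ACCESS_READ | ACCESS_WRITE)) == 0);
    switch (held) {
    case DELEG_READ:
        return (want & ACCESS_WRITE) != 0;
    case DELEG_WRITE:
        return want != 0;
    case DELEG_NONE:
    default:
        return false;
    }
}
```

A conflict is exactly the case in which the server must recall the delegation before letting the other client's OPEN proceed.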

Linux NFSv4 Delegation Support for Cluster Filesystems

To coordinate NFSv4 delegations with local access, we implement delegations with the lease extensions to the VFS lock subsystem, so that a lease corresponds to a delegation. User space sets and queries leases through fcntl() (F_SETLEASE and F_GETLEASE). To allow a lease to be recalled, e.g., because of a conflicting open, the VFS layer provides a break_lease() function.

* The break_lease call needs to be added to the VFS rename and unlink implementations.
* dmr: this is done in the 2.6.16-based directory delegations kernel; that code needs to be broken into patches, moved forward, and tested.
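For reference, a user-space program reaches the same lease machinery through fcntl(). The sketch below is a minimal example assuming a Linux system with leases enabled (/proc/sys/fs/leases-enable): it takes a read lease with F_SETLEASE, reads it back with F_GETLEASE, and drops it with F_UNLCK. The helper name probe_read_lease() is made up for illustration. A read lease needs a read-only descriptor, and the caller must own the file or hold CAP_LEASE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* exposes F_SETLEASE / F_GETLEASE */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Linux-specific fcntl commands; fallback values from <linux/fcntl.h>
 * in case another header was included before _GNU_SOURCE took effect. */
#ifndef F_SETLEASE
#define F_SETLEASE 1024
#define F_GETLEASE 1025
#endif

/*
 * Take a read lease on `path`, ask the kernel which lease we hold, then
 * drop it. Returns the F_GETLEASE result (F_RDLCK on success) or -errno.
 */
static int probe_read_lease(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);         /* read leases need O_RDONLY */
    if (fd < 0)
        return -errno;

    int result;
    if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) {
        result = -errno;                    /* e.g. EAGAIN: file open for write */
    } else {
        result = fcntl(fd, F_GETLEASE);     /* F_RDLCK, F_WRLCK or F_UNLCK */
        if (result < 0)
            result = -errno;
        (void)fcntl(fd, F_SETLEASE, F_UNLCK);   /* release the lease */
    }
    close(fd);
    return result;
}
```

Closing the descriptor also releases the lease; the explicit F_UNLCK only makes the release visible. While the lease is held, a conflicting open by another process makes the kernel signal the holder (SIGIO by default), which is the local analogue of a delegation recall.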

When the NFS server's open() method is invoked, it may issue or recall a delegation. A delegation can be issued if it does not conflict with an existing delegation. Issuing a delegation is optional. A delegation can be recalled at any time. Recalling a delegation is mandatory if a conflicting open is received.
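The rule that issuing is optional but recalling is mandatory can be sketched as a small decision function. This is a hypothetical model, not NFSD code: the request is reduced to read or write access plus whether the requester already holds the delegation, and the choice to delegate is left to a policy flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an NFSD open handler's choice. */
enum deleg { D_NONE, D_READ, D_WRITE };
enum action { ACT_OPEN, ACT_OPEN_AND_DELEGATE, ACT_RECALL_FIRST };

struct open_req {
    bool write;          /* OPEN asks for write access */
    bool same_client;    /* requester is the current delegation holder */
};

/*
 * Recall is mandatory when the OPEN conflicts with a delegation held by
 * another client; handing out a new delegation is optional and is left
 * to a server policy flag.
 */
static enum action nfsd_open_decision(enum deleg held, struct open_req req,
                                      bool policy_wants_delegation)
{
    assert(held == D_NONE || held == D_READ || held == D_WRITE);
    bool conflict = !req.same_client &&
                    (held == D_WRITE || (held == D_READ && req.write));
    if (conflict)
        return ACT_RECALL_FIRST;          /* not optional */
    if (policy_wants_delegation)
        return ACT_OPEN_AND_DELEGATE;     /* optional */
    return ACT_OPEN;
}
```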

A conflicting open can come from a variety of sources: local access, NFS access, Samba access, etc. Every invocation of the VFS open method must check for conflict with an existing delegation and recall it if necessary. NFSD may wait for the delegation recall to complete, or may respond to the OPEN request with NFSERR_DELAY.

* Is that last bit true?
* dmr: if you mean the last sentence, yes, it's true in that the server could do that (or quickly stall for a period less than a client's retry interval), but all it does now is respond with NFSERR_DELAY.

If an OPEN request forces a delegation recall, NFSD issues a CB_RECALL request to each client holding a conflicting delegation. On the server, this is triggered by the VFS layer break_lease() call, which notifies the lease holder (NFSD) that a conflicting open has occurred; NFSD then sends the CB_RECALL. The VFS layer makes this determination without consulting the underlying file system.

* Now I'm confused.
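Because several clients can hold read delegations on the same file, a single conflicting OPEN may require more than one CB_RECALL. The following hypothetical sketch walks a per-file delegation list and sends one recall to each conflicting holder, skipping the opener's own delegation and any holder already recalled. The send hook stands in for the real callback RPC, and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

enum deleg { D_READ = 1, D_WRITE = 2 };

/* Hypothetical per-file record of one client's delegation. */
struct delegation {
    int client_id;
    enum deleg type;
    int recall_sent;          /* set once CB_RECALL has been issued */
};

/* Hypothetical transport hook: sends CB_RECALL over the callback path. */
typedef void (*send_cb_recall_fn)(int client_id);

/*
 * Issue CB_RECALL to every holder whose delegation conflicts with an OPEN
 * from client `opener` (write access if `want_write`). Returns the number
 * of recalls sent by this call.
 */
static size_t recall_conflicting(struct delegation *d, size_t n, int opener,
                                 int want_write, send_cb_recall_fn send)
{
    size_t sent = 0;
    assert(d != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int conflicts = d[i].client_id != opener &&
                        (d[i].type == D_WRITE || want_write);
        if (conflicts && !d[i].recall_sent) {
            send(d[i].client_id);
            d[i].recall_sent = 1;
            sent++;
        }
    }
    return sent;
}

/* Demo-only hook that just counts recalls. */
static int recalls_seen;
static void count_recall(int client_id) { (void)client_id; recalls_seen++; }
```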

Once the recall of conflicting delegations is complete, NFSD can proceed with its pending OPEN request. To determine whether it can issue a delegation for the request, NFSD needs information that lives on the other side of the VFS layer. The VFS lease subsystem can make the determination by examining the entry for the file in the open inode table: if there are no writers, then a READ delegation can be issued; if there are no readers or writers, then a WRITE delegation can be issued. For a cluster file system, the local open inode table does not reflect opens on other nodes, so the underlying file system must be consulted. NFSD must obtain the result of this determination from the VFS layer.

* I think I got this wrong.
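The grant test described above fits in a few lines. In this hypothetical sketch the counts exclude the OPEN being processed; a read open is offered a READ delegation when there are no writers, and a write open is offered a WRITE delegation only when nobody else has the file open. In a cluster the counts would have to come from the file system rather than the local VFS.

```c
#include <assert.h>

enum deleg { D_NONE, D_READ, D_WRITE };

/* Hypothetical snapshot of other opens on the file (excluding this OPEN). */
struct open_counts { int readers; int writers; };

/*
 * Strongest delegation compatible with the other opens:
 *   writers present           -> none;
 *   write open, no readers    -> WRITE;
 *   read open, no writers     -> READ.
 */
static enum deleg best_delegation(struct open_counts c, int want_write)
{
    assert(c.readers >= 0 && c.writers >= 0);
    if (c.writers > 0)
        return D_NONE;
    if (want_write)
        return c.readers == 0 ? D_WRITE : D_NONE;
    return D_READ;
}
```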

If NFSD elects to grant a delegation, it must inform the underlying file system.
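One way to inform the file system is for NFSD to hand it a recall callback at grant time, which the file system invokes when another node makes a conflicting access. The sketch below assumes a hypothetical interface (cfs_delegation_granted, cfs_remote_conflict) and, for brevity, at most one delegation per inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: NFSD's entry point for "please recall this delegation". */
typedef void (*deleg_recall_fn)(void *nfsd_cookie);

/* Hypothetical cluster-fs per-inode state. */
struct cfs_inode {
    deleg_recall_fn recall;
    void *cookie;
};

/* NFSD tells the file system it has granted a delegation on `ino`. */
static void cfs_delegation_granted(struct cfs_inode *ino,
                                   deleg_recall_fn recall, void *cookie)
{
    assert(ino != NULL && recall != NULL);
    ino->recall = recall;
    ino->cookie = cookie;
}

/* The file system saw a conflicting access from another node. */
static int cfs_remote_conflict(struct cfs_inode *ino)
{
    if (ino->recall == NULL)
        return 0;                   /* nothing delegated on this node */
    ino->recall(ino->cookie);       /* NFSD will send CB_RECALL */
    ino->recall = NULL;             /* one recall per grant */
    ino->cookie = NULL;
    return 1;
}

/* Demo-only NFSD side: counts recall requests. */
static void demo_recall(void *cookie) { (*(int *)cookie)++; }
```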

Tasks

  • VFS OPEN must ask the file system to check for delegation recall in progress prior to granting an OPEN, granting a delegation, or initiating a recall.
* Does "delegation recall in progress" cover (a) recalls initiated as a result of the current open and (b) recalls from other opens that have not completed?
  • A delegation can be issued only if the NFS client has set up a callback path for a potential CB_RECALL request from the server.
  • NFSD must ask the file system if a delegation can be granted.
  • The VFS must tell the file system of a lease conflict (rename, unlink, etc.) and compel it to recall any delegations.
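The first task needs some per-file record of recalls that have been sent but not yet answered. This hypothetical sketch keeps a simple counter and treats any outstanding recall as in progress, whichever open triggered it; that is only one possible answer to the question in the list above, not a settled design.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file state kept by the cluster file system. */
struct file_recall_state {
    int recalls_outstanding;   /* CB_RECALLs sent but not yet answered */
};

static void recall_started(struct file_recall_state *s)
{
    s->recalls_outstanding++;
}

static void recall_finished(struct file_recall_state *s)
{
    assert(s->recalls_outstanding > 0);
    s->recalls_outstanding--;
}

/*
 * Checked by OPEN before granting the open, granting a delegation, or
 * initiating a new recall. Counts every outstanding recall.
 */
static bool recall_in_progress(const struct file_recall_state *s)
{
    return s->recalls_outstanding > 0;
}
```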

Proposed Implementation

Extend the setlease/getlease/break_lease interfaces to service cluster file systems. The extensions will resemble the POSIX locking extensions (callbacks, etc.).

What we probably need are new inode operations:

  • break_lease(inode, mode)
  • setlease(filp, mode)
  • getlease(filp, &mode)

Where mode can be one of read, write, or unlock. We'd also allow the mode to be or'ed with a nonblocking flag?
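A rough C rendering of the proposed operations might look like the following. Everything here is hypothetical: the LEASE_* values, the struct name cluster_lease_ops, and the do_setlease() dispatcher, which calls the file system's hook when one is provided and otherwise falls back to a generic (local) lease implementation. The demo_* functions exist only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lease modes; LEASE_NONBLOCK may be OR'ed into any mode. */
enum { LEASE_READ = 1, LEASE_WRITE = 2, LEASE_UNLOCK = 3,
       LEASE_MODE_MASK = 3, LEASE_NONBLOCK = 0x100 };

struct inode;   /* opaque in this sketch */
struct file;

/* Proposed hooks a cluster file system could implement. */
struct cluster_lease_ops {
    int (*break_lease)(struct inode *inode, int mode);
    int (*setlease)(struct file *filp, int mode);
    int (*getlease)(struct file *filp, int *mode);
};

static int lease_mode(int mode)     { return mode & LEASE_MODE_MASK; }
static int lease_nonblock(int mode) { return (mode & LEASE_NONBLOCK) != 0; }

/* Use the file system's setlease if it has one, else the generic code. */
static int do_setlease(const struct cluster_lease_ops *ops, struct file *filp,
                       int mode, int (*generic)(struct file *, int))
{
    int m = lease_mode(mode);
    assert(m == LEASE_READ || m == LEASE_WRITE || m == LEASE_UNLOCK);
    if (ops != NULL && ops->setlease != NULL)
        return ops->setlease(filp, mode);
    return generic(filp, mode);
}

/* Demo-only implementations that just report which path ran. */
static int demo_generic(struct file *f, int mode) { (void)f; return -lease_mode(mode); }
static int demo_fs(struct file *f, int mode) { (void)f; return lease_nonblock(mode) ? 100 : 200; }
```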

The VFS lease subsystem includes a series of lock manager callbacks. Will these be sufficient for the cluster filesystem case?

Actually, the current setlease and getlease functions use a struct file_lock instead of (or in addition to) the mode. Do we need that?

Also, setlease and getlease could be file operations instead of inode operations. This is probably a fairly arbitrary choice.

To handle the possibility that break_lease, setlease, getlease, etc. might block, even in the absence of contention, we might want to allow an -EINPROGRESS return to be followed by a callback e.g. break_lease_result(inode, stat); where stat might be -EAGAIN (we're waiting for the lease to be broken) or OK (it was immediately broken, or there never was one).
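The asynchronous variant could work as sketched below: break_lease returns 0 at once when there is no lease, otherwise -EINPROGRESS, and the file system later reports -EAGAIN (still waiting) or 0 (broken) through a break_lease_result callback. The async_break structure, the cfs_* names, and the single in-flight break are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct inode;   /* opaque in this sketch */

/* Hypothetical completion callback from the proposal. */
typedef void (*break_lease_result_fn)(struct inode *inode, int stat);

/* Toy cluster-fs state: at most one asynchronous break in flight. */
struct async_break {
    struct inode *inode;
    break_lease_result_fn done;
    int pending;
};

/*
 * Start breaking a lease. Returns 0 if there is no lease to break, or
 * -EINPROGRESS after remembering the callback to run later.
 */
static int cfs_break_lease(struct async_break *ab, struct inode *inode,
                           int lease_held, break_lease_result_fn done)
{
    assert(ab != NULL && done != NULL);
    if (!lease_held)
        return 0;
    ab->inode = inode;
    ab->done = done;
    ab->pending = 1;
    return -EINPROGRESS;
}

/* The cluster lock manager answered: stat is 0 (broken) or -EAGAIN (waiting). */
static void cfs_break_answer(struct async_break *ab, int stat)
{
    assert(stat == 0 || stat == -EAGAIN);
    if (!ab->pending)
        return;
    if (stat == 0)
        ab->pending = 0;        /* lease broken: no further callbacks */
    ab->done(ab->inode, stat);
}

/* Demo-only result sink. */
static int last_stat = 1;
static void record_result(struct inode *inode, int stat) { (void)inode; last_stat = stat; }
```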

Status

Implementation awaits progress in resolving the above issues.
