Linux NFS - User contributions [en] (Atom feed, 2024-03-29T09:12:14Z, MediaWiki 1.16.5)
Feed URL: http://www.linux-nfs.org/wiki/index.php?title=Special:Contributions&feed=atom&limit=20&target=Peterhoneyman

PNFS prototype design (2010-04-01T17:12:25Z, Peterhoneyman: /* Current Issues */)
<hr />
<div>= pNFS =<br />
<br />
'''pNFS''' is part of the first NFSv4 minor version. This space is used to track and share Linux pNFS implementation ideas and issues.<br />
<br />
== General Information ==<br />
<br />
* [http://www.citi.umich.edu/projects/asci/pnfs/linux/ Linux pNFS Implementation Homepage]<br />
<br />
* [[pNFS Setup Instructions]] - Basic pNFS setup instructions.<br />
<br />
* [[GFS2 Setup Notes - cluster3, 2.6.27 kernel]]<br />
<br />
* [[Older GFS2 Setup Notes - first pass, in VMWare, and upgrading from cluster2 to cluster3]]<br />
<br />
* [[pNFS Block Server Setup Instructions]] - Basic pNFS Block Server setup instructions.<br />
<br />
==== Filing Bugs ====<br />
*[http://bugzilla.linux-nfs.org linux-nfs.org Bugzilla] - Read/write access for "NFSv4.1 related bugs" group members<br />
** Use the keywords "NFSv4.1" and "pNFS".<br />
** The "NFSv4.1 related bugs" group is used to track our bugs. You'll need a user account on [http://bugzilla.linux-nfs.org bugzilla]; once you have one, send an email to Trond to ask to be added to the group.<br />
<br />
== Development Resources ==<br />
<br />
* [[pNFS Development Git tree|pNFS Development Git tree]]<br />
<br />
* [[pNFS Git tree recipies|pNFS Git tree recipes]]<br />
<br />
* [[Wireshark Patches|Wireshark Patches]]<br />
<br />
== Current Issues ==<br />
* [[Client_sessions_Implementation_Issues|Client Sessions Implementation Issues]]<br />
<br />
* [[Client pNFS Requirements]]<br />
**[[pNFS Client Review for Kernel Submission]] - Review and redesign of pNFS client for submission to the Kernel.<br />
<br />
* [[pNFS Todo List|pNFS Todo List]] (last updated July 2009)<br />
<br />
* [[pNFS Implementation Issues|pNFS Implementation Issues]] (last updated April 2008)<br />
<br />
* [[Bakeathon 2007 Issues List|Bakeathon 2007 Issues List]]<br />
<br />
* [[pNFS Development Road Map]]<br />
<br />
* [[pNFS File-based Stateid Distribution]]<br />
<br />
== Old Issues ==<br />
* [[Cthon06 Meeting Notes|Connectathon 2006 Linux pNFS Implementation Meeting Notes]]<br />
<br />
* [[linux pnfs client rewrite may 2006|Linux pNFS Client Internal Reorg patches May 2006 - For Display Purposes Only - Do Not Use]]<br />
<br />
* [[pNFS todo List 2007|pNFS todo List July 2007]]</div>
User:Peterhoneyman/sandbox (2010-04-01T17:08:45Z, Peterhoneyman: purple -> orange)
<hr />
<div>This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="orange">Issues labeled in orange can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font><br />
* <font color="red">Review impact to struct nfs_server</font> Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font><br />
** <font color="red">Black list the layout module so that capability is not available</font><br />
** <font color="orange">Disable pNFS per mount</font><br />
** <font color="green">Define I/O threshold to override attributes and other policy on the client</font><br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font><br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font><br />
** <font color="green">Multiple layouts per filesystem</font><br />
* <font color="red">Data should survive data server filehandle invalidation</font><br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font><br />
* <font color="orange">Support Direct I/O</font><br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font><br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font><br />
*** <font color="orange">Reuse DS clientid/session if we already have one</font><br />
* <font color="red">Remove PNFS_CONFIG Flag</font><br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
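The layout-driver items above (automatic loading, one layout type per filesystem) boil down to a small registry mapping a layout type to a driver. A minimal sketch with invented names - the in-kernel client has its own module machinery, and a real lookup miss would trigger `request_module()`:

```c
#include <stddef.h>

/* Hypothetical layout driver registry: one driver per layout type,
 * mirroring the "one layout type per filesystem" rule above.
 * Names are illustrative, not the kernel's actual symbols. */

enum layout_type { LAYOUT_NFSV4_1_FILES = 1, LAYOUT_OSD2_OBJECTS = 2, LAYOUT_BLOCK_VOLUME = 3 };

struct layout_driver {
    enum layout_type type;
    const char *name;
};

#define MAX_DRIVERS 8
static struct layout_driver *registry[MAX_DRIVERS];

int register_layout_driver(struct layout_driver *drv)
{
    for (int i = 0; i < MAX_DRIVERS; i++) {
        if (registry[i] && registry[i]->type == drv->type)
            return -1;              /* that type is already registered */
        if (!registry[i]) {
            registry[i] = drv;
            return 0;
        }
    }
    return -1;                      /* table full */
}

/* Lookup by the fs_layout_type attribute returned by the server;
 * a real client would call request_module() here on a miss. */
struct layout_driver *find_layout_driver(enum layout_type type)
{
    for (int i = 0; i < MAX_DRIVERS; i++)
        if (registry[i] && registry[i]->type == type)
            return registry[i];
    return NULL;
}

static struct layout_driver files_driver = { LAYOUT_NFSV4_1_FILES, "files" };
```

Blacklisting a layout module (the "tell client not to use pNFS" item) then falls out naturally: if the driver is never registered, `find_layout_driver()` fails and the client stays on plain NFSv4.1 I/O.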
<br />
=== DeviceID Management ===<br />
* <font color="red">Add, Remove, Locate</font><br />
** <font color="orange">Policy to prune unused device info (elevate?)</font><br />
** <font color="red">Umount should clean device table</font><br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** <font color="red">Careful handling of lease renewals</font><br />
* <font color="red">DeviceInfo Mappings</font><br />
* <font color="orange">Multipath support for each DS</font><br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** <font color="red">Give up and I/O through MDS</font><br />
*** <font color="orange">Reattempt through DS?</font><br />
**** Revisit when generic support for replicated servers is implemented<br />
* Recalls (See callbacks)<br />
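The Add/Remove/Locate items above can be sketched as a deviceid cache keyed by (clientid, layout type, deviceid), following the scoping note that a deviceID belongs to the clientID/layout type rather than to a filesystem. Sizes and function names are illustrative only:

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

#define NFS4_DEVICEID4_SIZE 16
#define CACHE_SLOTS 32

struct deviceid_node {
    int in_use;
    uint64_t clientid;
    uint32_t layout_type;
    unsigned char deviceid[NFS4_DEVICEID4_SIZE];
};

static struct deviceid_node cache[CACHE_SLOTS];

struct deviceid_node *deviceid_add(uint64_t clientid, uint32_t type,
                                   const unsigned char *id)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!cache[i].in_use) {
            cache[i].in_use = 1;
            cache[i].clientid = clientid;
            cache[i].layout_type = type;
            memcpy(cache[i].deviceid, id, NFS4_DEVICEID4_SIZE);
            return &cache[i];
        }
    }
    return NULL;                    /* cache full; caller falls back */
}

struct deviceid_node *deviceid_find(uint64_t clientid, uint32_t type,
                                    const unsigned char *id)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].in_use && cache[i].clientid == clientid &&
            cache[i].layout_type == type &&
            memcmp(cache[i].deviceid, id, NFS4_DEVICEID4_SIZE) == 0)
            return &cache[i];
    return NULL;
}

/* Drop every entry for one client, e.g. when the client is destroyed;
 * per-umount cleanup is the open question flagged above. */
void deviceid_purge_client(uint64_t clientid)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].in_use && cache[i].clientid == clientid)
            cache[i].in_use = 0;
}
```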
<br />
=== State/connection management ===<br />
* <font color="red">Discuss with server implementers about need for state renewal daemon on DS</font><br />
** Is there really a need to keep the lease alive? Can we get away without renewal per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** <font color="orange">Return layouts if they have not been used within certain time to avoid running out of state on server</font><br />
* <font color="orange">Caching beyond CLOSE</font><br />
* <font color="red">Whole file layouts</font><br />
* <font color="orange">Segment layouts</font><br />
** <font color="orange">Merge Overlapping Layouts</font><br />
*** Revisit when we study the layout design<br />
* <font color="red">Should allow layouts of differing iomode for the same range</font><br />
* Stateid/Seqid management<br />
** <font color="red">OLD and BAD stateid error handling in layout operations</font><br />
* <font color="red">Check current Referring Tuple Handling works with pNFS callbacks</font><br />
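The "layouts of differing iomode for the same range" item implies a matching rule when a cached segment is considered for a new I/O: the byte range must be covered, and an RW layout can serve a READ but not the reverse. An illustrative check (names invented, not the kernel's):

```c
#include <stdint.h>

enum iomode { IOMODE_READ = 1, IOMODE_RW = 2 };

#define NFS4_UINT64_MAX 0xffffffffffffffffULL

struct layout_segment {
    enum iomode iomode;
    uint64_t offset;
    uint64_t length;        /* NFS4_UINT64_MAX means "to end of file" */
};

int segment_matches(const struct layout_segment *seg,
                    enum iomode mode, uint64_t offset, uint64_t length)
{
    if (seg->iomode != IOMODE_RW && seg->iomode != mode)
        return 0;                   /* a READ layout cannot serve RW */
    if (offset < seg->offset)
        return 0;
    if (seg->length == NFS4_UINT64_MAX)
        return 1;                   /* whole-file layout covers the rest */
    return offset + length <= seg->offset + seg->length;
}
```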
<br />
=== <font color="red">Interaction with Delegations</font>===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* <font color="red">Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations</font><br />
** <font color="red">If client doesn't specify pNFS and server does, client needs to not do it</font><br />
* Remember server response to determine:<br />
** <font color="red">If we need to send GETATTR asking for layout type</font><br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* <font color="green">EXCHGID4_FLAG4_BIND_PRINC_STATEID</font><br />
* <font color="red">Separate nfs_client for MDS/DS dual personality</font><br />
** Make sure the client owner is different for each<br />
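The flag-combination handling above can be sketched as a classifier over the EXCHANGE_ID reply flags. The flag values are the ones defined in RFC 5661; the decision logic and role names are an illustrative simplification (for instance, NON_PNFS together with PNFS_DS is legal and is treated here simply as a DS):

```c
#include <stdint.h>

#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000

enum pnfs_role { ROLE_INVALID, ROLE_NON_PNFS, ROLE_MDS, ROLE_DS };

enum pnfs_role classify_server(uint32_t eir_flags)
{
    int non = !!(eir_flags & EXCHGID4_FLAG_USE_NON_PNFS);
    int mds = !!(eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS);
    int ds  = !!(eir_flags & EXCHGID4_FLAG_USE_PNFS_DS);

    if (non && mds)
        return ROLE_INVALID;    /* mutually exclusive per RFC 5661 */
    if (mds)
        return ROLE_MDS;        /* may also be a DS (dual personality) */
    if (ds)
        return ROLE_DS;
    return ROLE_NON_PNFS;       /* fall back to plain NFSv4.1 I/O */
}
```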
<br />
=== <font color="red">GETDEVICEINFO</font>===<br />
* <font color="orange">Request Device notifications</font><br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* <font color="red">Determine best GETDEVICEINFO_ARGS gdia_maxcount limits</font><br />
** <font color="red">XDR across page boundaries is problematic today but should be addressed</font><br />
* <font color="red">Handle NFS4ERR_TOOSMALL</font><br />
** <font color="red">Turn off pNFS</font><br />
* Determine where to invoke it<br />
** <font color="red">Invoke from the state manager</font><br />
<br />
=== <font color="green">GETDEVICELIST (Opt)</font>===<br />
<br />
=== <font color="red">LAYOUTGET</font>===<br />
* <font color="red">Determine where to invoke it</font><br />
** Acquire layout as close to the actual I/O?<br />
** For the files layout, acquiring the layout at open makes sense - good enough reason to have it as well? <br />
** <font color="red">Minimize sprinkling pNFS calls throughout the call</font><br />
** <font color="red">Minimize number of layout reference/ dereference (number of layout gets per I/O)</font><br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** <font color="red">Specify smart minimum and a reasonable size</font><br />
** <font color="orange">nfs_wait_on_sequence to serialize the gets, returns, and recalls</font><br />
* <font color="red">Support layout range that does not match request</font><br />
* <font color="red">Forgetful Model (12.5.5.1)</font><br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** <font color="red">I/O through MDS</font><br />
** <font color="orange">Timer to retry layout</font><br />
** <font color="orange">Mark inode to not request layout until all dirty pages are flushed</font><br />
* Handle NFS4ERR_RECALLCONFLICT AND NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFS4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* <font color="red">Obey stripe unit size and commit through MDS bits</font><br />
* FileHandle Determination (13.3)<br />
** <font color="red">DS Filehandle same as MDS</font><br />
** <font color="red">Same DS Filehandle for every data server</font><br />
*** Not sure if we handle it<br />
** <font color="red">Unique Filehandle for each data server</font><br />
* <font color="red">Specify intended IO Mode in Layout</font><br />
* <font color="orange">More than one striping pattern: logr_layout array > 1</font><br />
* <font color="red">Able to handle different iomode from what was requested</font><br />
* <font color="red">Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3)</font><br />
* <font color="red">Obey logr_return_on_close</font> XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** <font color="red">What's the implication on the forgetful model</font><br />
* <font color="orange">Layout read(write)-ahead</font><br />
** <font color="red">Files Layout will request entire file</font><br />
This makes it impossible (or infeasible) to extend files in the block layout.<br />
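The error-handling bullets above amount to a policy table: each LAYOUTGET failure maps to an action (retry later, fall back to I/O through the MDS, or give up on pNFS for this server). A sketch of one such mapping; the symbolic names stand in for the NFS4ERR_* codes and the exact policy choices are illustrative, not settled:

```c
enum lg_error {
    LG_OK, LG_RECALLCONFLICT, LG_RETURNCONFLICT, LG_GRACE, LG_TRYLATER,
    LG_INVAL, LG_TOOSMALL, LG_LAYOUTUNAVAILABLE, LG_UNKNOWN_LAYOUTTYPE,
    LG_BADIOMODE, LG_LOCKED
};

enum lg_action {
    ACT_USE_LAYOUT,       /* proceed with pNFS I/O */
    ACT_RETRY_LATER,      /* transient: set a timer and retry */
    ACT_IO_THROUGH_MDS,   /* send this I/O through the MDS */
    ACT_DISABLE_PNFS      /* stop asking this server for layouts */
};

enum lg_action layoutget_action(enum lg_error err)
{
    switch (err) {
    case LG_OK:
        return ACT_USE_LAYOUT;
    case LG_RECALLCONFLICT:
    case LG_RETURNCONFLICT:
    case LG_GRACE:
    case LG_TRYLATER:
    case LG_LOCKED:
        return ACT_RETRY_LATER;      /* transient conditions */
    case LG_LAYOUTUNAVAILABLE:
    case LG_BADIOMODE:
    case LG_INVAL:
    case LG_TOOSMALL:
        return ACT_IO_THROUGH_MDS;   /* fall back for this file */
    case LG_UNKNOWN_LAYOUTTYPE:
        return ACT_DISABLE_PNFS;     /* no driver can handle this server */
    }
    return ACT_IO_THROUGH_MDS;       /* be conservative by default */
}
```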
<br />
=== <font color="red">LAYOUTCOMMIT</font>===<br />
* <font color="red">Include last_write_offset, offset, length</font><br />
* <font color="green">Include mtime</font><br />
** <font color="red">getattr after LAYOUTCOMMIT to update cached attributes</font><br />
* Keep layoutcommit data until return value is received so that you can reissue request in case of GRACE for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* <font color="red">Determine where to invoke it</font><br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* <font color="red">Support sub-range layouts</font><br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* <font color="red">Recover from MDS reboot</font><br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** <font color="red">Check we have a layout and correct I/O mode before issuing layoutcommit</font><br />
** <font color="orange">Fred's bug of hole in the layout range</font> Subset of layout segments<br />
* <font color="red">Handle NFS4ERR_RECLAIM_BAD</font><br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
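The last_write_offset item above hinges on tracking the highest byte written through the layout; the commit can be skipped entirely when nothing was written through a DS. A minimal sketch with invented names, consistent with the note about keeping state until the reply arrives so the commit can be reissued on NFS4ERR_GRACE:

```c
#include <stdint.h>

struct layout_write_state {
    int      dirty;              /* wrote through a DS since last commit */
    uint64_t last_write_offset;  /* highest offset of any byte written */
};

void note_ds_write(struct layout_write_state *st,
                   uint64_t offset, uint64_t count)
{
    uint64_t last;

    if (count == 0)
        return;
    last = offset + count - 1;
    if (!st->dirty || last > st->last_write_offset)
        st->last_write_offset = last;
    st->dirty = 1;
}

/* Returns 1 and fills *off when a LAYOUTCOMMIT should be sent; the
 * caller clears the state only after a successful reply. */
int need_layoutcommit(struct layout_write_state *st, uint64_t *off)
{
    if (!st->dirty)
        return 0;
    *off = st->last_write_offset;
    return 1;
}
```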
<br />
=== <font color="red">LAYOUTRETURN</font>===<br />
* <font color="red">Forgetful Model</font><br />
* <font color="red">On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
* <font color="red">On CB_RECALL_ANY return LAYOUTRETURN4_ALL</font><br />
* <font color="green">Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1)</font><br />
* <font color="green">Return full range specified by the layout recall (12.5.5.1)</font><br />
* <font color="green">Ability to return chunks of layouts for huge files to show progress</font><br />
* <font color="green">Return entire range layout as final LAYOUTRETURN</font><br />
* <font color="green">Return NFS4ERR_NOMATCHING_LAYOUT if none is found</font><br />
* <font color="green">Bulk Return</font><br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** <font color="green">sync with nfs_wait_on_sequence()</font><br />
*** The seqid affinity is associated with the filehandle<br />
* <font color="green">Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4)</font><br />
** <font color="red">Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
** <font color="green">Serialization later</font><br />
** <font color="orange">Return NFS4ERR_DELAY?</font><br />
* <font color="red">Error Recovery</font><br />
** Handle NFS4ERR_OLD_STATEID<br />
** <font color="green">Handle NFS4ERR_BAD_STATEID</font> stateid's seqid()<br />
** Handle NFS4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* <font color="red">Error fallback on I/O error</font><br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== <font color="green">SECINFO_NO_NAME (Req)</font> === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* <font color="green">LayoutHint attribute</font><br />
** <font color="green">Need to define a user/programmable interface?</font><br />
* <font color="green">GETATTR follows OPEN to determine layout type</font><br />
* <font color="red">Support GUARDED during create</font><br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* <font color="orange">Compare commit verifier to each of the DS write verifiers</font> XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* <font color="red">Keep data until return value is received so that you can reissue request in case error</font><br />
<br />
== Callback Service Operations ==<br />
=== <font color="red">CB_LAYOUTRECALL</font>===<br />
* <font color="red">Forgetful client behavior</font><br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** <font color="orange">LAYOUTRECALL4_FSID</font><br />
** <font color="orange">LAYOUTRECALL4_ALL</font><br />
<br />
=== <font color="red">CB_RECALL_ANY (Req)</font>===<br />
* <font color="red">Client issues LAYOUTRETURN(ALL) due to forgetful client model</font><br />
<br />
=== <font color="green">CB_RECALLABLE_OBJ_AVAIL</font>===<br />
* <font color="red">Set loga_signal_layout_avail on LAYOUTGET to FALSE</font><br />
<br />
=== <font color="green">CB_NOTIFY_DEVICEID (Opt)</font>===<br />
* <font color="red">Indicate no interest in notification</font><br />
* <font color="orange">Detect race with GETDEVICEINFO</font><br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== <font color="green">CB_WANTS_CANCELLED (Req)</font>===<br />
* <font color="red">Specify no interest if needed</font><br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== <font color="green">SECINFO_NO_NAME</font>===<br />
<br />
=== I/O ===<br />
* <font color="red">Review Data distribution algorithm: (which DS, offset, length)</font><br />
* <font color="red">Sparse</font><br />
* <font color="green">Dense</font><br />
** <font color="red">Stash existing code</font><br />
* WRITE<br />
** <font color="red">Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data</font><br />
*** How is it that the files layout does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** <font color="red">Zero byte & EOF handling on reads with holes handled locally (13.10)</font><br />
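The data distribution item above - which DS, at what offset, for what length - follows the files-layout striping pattern: the stripe-unit index selects the data server, and the on-the-wire offset depends on whether the layout uses sparse addressing (the DS sees the real file offset) or dense addressing (the stripes are packed per DS). An illustrative sketch, not the kernel's code:

```c
#include <stdint.h>

struct file_layout {
    uint64_t stripe_unit;   /* bytes per stripe unit */
    uint32_t stripe_count;  /* number of data servers in the pattern */
};

/* Which data server holds this byte of the file. */
uint32_t ds_index(const struct file_layout *l, uint64_t file_offset)
{
    return (uint32_t)((file_offset / l->stripe_unit) % l->stripe_count);
}

/* Sparse addressing: the DS file mirrors the MDS file byte-for-byte. */
uint64_t sparse_ds_offset(const struct file_layout *l, uint64_t file_offset)
{
    (void)l;
    return file_offset;
}

/* Dense addressing: each DS stores only its own stripe units, packed
 * back to back, so the offset shrinks by a factor of stripe_count. */
uint64_t dense_ds_offset(const struct file_layout *l, uint64_t file_offset)
{
    uint64_t full_stripe = l->stripe_unit * l->stripe_count;
    uint64_t stripe_no   = file_offset / full_stripe;  /* full rows above */

    return stripe_no * l->stripe_unit + file_offset % l->stripe_unit;
}
```

For example, with a 4 KB stripe unit across 4 data servers, file offset 16384 starts the second full stripe: it lands on DS 0, at wire offset 16384 (sparse) or 4096 (dense).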
<br />
=== COMMIT ===<br />
* <font color="red">Commit through MDS</font><br />
* <font color="red">Commit through DS</font><br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** <font color="green">layout_hint</font><br />
** <font color="orange">layout_type</font><br />
** <font color="orange">mdsthreshold</font><br />
** <font color="red">fs_layout_type</font><br />
** <font color="orange">layout_alignment</font><br />
** <font color="orange">layout_blksize</font><br />
<br />
== Locking ==<br />
* <font color="orange">Mandatory Locking</font> <br />
** Use Lock StateID <br />
** <font color="orange">Handle NFS4ERR_LOCKED</font> Check with Windows (Tom Talpey) to see if there's a server in the future<br />
<br />
== Error Handling ==<br />
* <font color="red">Handle I/O errors due to fencing</font><br />
* <font color="red">Due to Layout Revocation</font><br />
* <font color="red">NFS4ERR_GRACE handling</font><br />
* <font color="red">State recovery through the State Manager only</font><br />
** Recover state and mark as I/O for MDS for example<br />
* When do we retry again to the DS<br />
** <font color="red">Retry pNFS on remount</font><br />
** <font color="orange">Timer?</font><br />
** <font color="orange">Clear error state once there are no more dirty pages?</font><br />
** <font color="red">Fail to MDS on first error - keep it simple</font><br />
** <font color="orange">Retry pNFS after X condition/time</font><br />
<br />
== Security ==<br />
* <font color="red">DS ACL related errors?</font><br />
<br />
== Multiple Layout Type Support ==<br />
* <font color="green">Different Layout types for different files</font><br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** <font color="red">Write through MDS</font><br />
** <font color="orange">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
=== <font color="green">Lease Move (11.7.7.1) (Low Priority)</font>===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* <font color="red">Handle fencing error</font><br />
<br />
=== Metadata Server Restart ===<br />
* <font color="red">SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BAD_SESSION/ NFS4_STALE_CLIENTID</font><br />
* Server out of Grace<br />
** <font color="red">I/O through MDS</font><br />
** <font color="orange">Redo Session/Layout setup, reissue I/O to DSs</font><br />
* Server in Grace<br />
** <font color="red">LAYOUT_COMMIT in reclaim mode</font><br />
** <font color="orange">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
== Data Server Multipathing (13.5) ==<br />
* <font color="orange">Bandwidth Scaling</font><br />
** <font color="green">Session Trunking</font><br />
* Higher Availability<br />
** <font color="orange">multipath_list4</font><br />
** <font color="orange">Replacement DeviceID-to-Device address mapping</font><br />
* <font color="orange">Replacement DeviceID</font><br />
<br />
== IPv6 ==</div>
User:Peterhoneyman (2010-04-01T17:00:45Z, Peterhoneyman)
<hr />
<div>'''Peter Honeyman''' has been a member of the University of Michigan faculty for over 20 years.<br />
He holds the B.G.S. (with distinction) from the University<br />
of Michigan and the M.S.E., M.A., and Ph.D. degrees from Princeton<br />
University. <br />
<br />
After completing doctoral studies in relational database<br />
theory under the supervision of J.D. Ullman, he joined Bell Labs<br />
as a Member of Technical Staff in computer systems research, then<br />
returned to Princeton as Assistant Professor of Computer Science.<br />
<br />
At the University of Michigan, Honeyman holds the following appointments:<br />
* Research Professor of Information<br />
* Scientific Director of the Center for Information Technology Integration<br />
* Adjunct Professor of Electrical Engineering and Computer Science.<br />
<br />
As an experimental computer scientist, Honeyman builds middleware<br />
for file systems, security, and mobile computing. He has been<br />
instrumental in software projects including Honey DanBer UUCP,<br />
PathAlias, MacNFS, Disconnected AFS, and WebCard, the first Internet<br />
smart card. Current work centers on CITI's open source reference<br />
implementation of NFSv4 and its extensions for high end computing.<br />
<br />
Honeyman is a member of USENIX, IFIP WGs 6.1 and 8.8, AAAS, and EFF.<br />
<br />
I have a public [[User:Peterhoneyman/sandbox|sandbox]].</div>
Client pNFS Requirements (2010-04-01T16:55:59Z, Peterhoneyman: page rename)
<hr />
<div>This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* An (A) indicates the issue needs to be addressed as part of the minimum pNFS functionality patches<br />
* A (B) indicates the issue can be deferred for a subsequent wave of patches<br />
* A (C) indicates the issue can be indefinitely deferred as there is no clear requirement for it<br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* Review impact to struct nfs_client (A) Batsakis<br />
** Ensure layouts are cleaned-up in the right order when the client is destroyed (A)<br />
* Review impact to struct nfs_server (A) Batsakis<br />
* Review impact to struct nfs4_session (A) Batsakis<br />
* Determine if there is a need for the DS to have a struct nfs_server (A) Batsakis<br />
* Ability to tell client not to use pNFS against a server which may support it (A)<br />
** Black list the layout module so that capability is not available (A)<br />
** Disable pNFS per mount (B)<br />
** Define I/O threshold to override attributes and other policy on the client (C)<br />
* Layout Drivers should be automatically loaded (Using request module call) (A)<br />
* Ability to have multiple layouts loaded<br />
** One layout type per filesystem (A)<br />
** Multiple layouts per filesystem (C-)<br />
* Data should survive data server filehandle invalidation (A)<br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1) (A)<br />
* Support Direct I/O (B?)<br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* Support Buffered I/O (Page based) (A)<br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** Each personality with its own clientid and session (A)<br />
*** Reuse DS clientid/session if we already have one (B)<br />
* Remove PNFS_CONFIG Flag (A)<br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* Add, Remove, Locate (A)<br />
** Policy to prune unused device info (B+)<br />
** Umount should clean device table (A)<br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** Careful handling of lease renewals (A)<br />
* DeviceInfo Mappings (A)<br />
* Multipath support for each DS (B)<br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** Give up and I/O through MDS (A)<br />
*** Reattempt through DS? (B)<br />
**** Revisit when generic support for replicated servers is implemented<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* Discuss with server implementers about need for state renewal daemon on DS (A)<br />
** Is there really a need to keep the lease alive? Can we get away without renewal per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** Return layouts if they have not been used within certain time to avoid running out of state on server (B)<br />
* Caching beyond CLOSE (B)<br />
* Whole file layouts (A)<br />
* Segment layouts (B?)<br />
** Merge Overlapping Layouts (B)<br />
*** Revisit when we study the layout design<br />
* Should allow layouts of differing iomode for the same range (A)<br />
* Stateid/Seqid management<br />
** OLD and BAD stateid error handling in layout operations (A)<br />
* Check current Referring Tuple Handling works with pNFS callbacks (A)<br />
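The segment and merge items above turn on layout range arithmetic; a sketch of the intersection check, assuming the RFC 5661 convention that a length of NFS4_UINT64_MAX means "to end of file" (names are illustrative, not the kernel's):<br />

```c
#include <stdint.h>

#define NFS4_MAX_UINT64 UINT64_MAX   /* length meaning "to EOF" */

/* Illustrative layout segment: byte range within a file. */
struct lo_seg {
    uint64_t offset;
    uint64_t length;   /* NFS4_MAX_UINT64 == to end of file */
};

/* Exclusive end offset; clamp on "to EOF" and on arithmetic overflow. */
static uint64_t seg_end(const struct lo_seg *s)
{
    if (s->length == NFS4_MAX_UINT64 ||
        s->offset + s->length < s->offset)
        return NFS4_MAX_UINT64;
    return s->offset + s->length;
}

/* Two segments intersect when each starts before the other ends. */
static int seg_intersect(const struct lo_seg *a, const struct lo_seg *b)
{
    return a->offset < seg_end(b) && b->offset < seg_end(a);
}
```

Merging overlapping segments and matching a cached layout against an I/O request both reduce to this predicate plus min/max of the endpoints.<br />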
<br />
=== Interaction with Delegations (A)===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
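The stateid-selection order the three bullets above describe can be sketched as below. The delegation-first ordering follows the bullets; whether the lock stateid outranks anything else is exactly the open "(Priority?)" question, so treat this as one possible reading, not settled behavior.<br />

```c
/* Illustrative stateid source chooser for LAYOUTGET. */
enum stateid_src { SRC_OPEN, SRC_LOCK, SRC_DELEG };

static enum stateid_src layoutget_stateid(int have_delegation,
                                          int mandatory_locking)
{
    if (have_delegation)
        return SRC_DELEG;      /* delegation stateid when held */
    if (mandatory_locking)
        return SRC_LOCK;       /* lock stateid under mandatory locking */
    return SRC_OPEN;           /* otherwise the open stateid */
}
```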
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations (A)<br />
** If the client doesn't ask for pNFS and the server offers it, the client must not use pNFS (A)<br />
* Remember server response to determine:<br />
** If we need to send GETATTR asking for layout type (A)<br />
** If we should specify a layout hint during create (Priority?)<br />
* EXCHGID4_FLAG_BIND_PRINC_STATEID (C)<br />
* Separate nfs_client for MDS/DS dual personality (A)<br />
** Make sure the client owner is different for each<br />
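A sketch of validating the server's pNFS role flags from EXCHANGE_ID. The flag values are from RFC 5661 (§18.35); the validity rule encoded here — USE_NON_PNFS and USE_PNFS_MDS are mutually exclusive, and a pNFS-aware server must report at least one role — is our reading of §13.1 and should be double-checked against the spec.<br />

```c
#include <stdint.h>

/* Flag values per RFC 5661 sec. 18.35. */
#define EXCHGID4_FLAG_USE_NON_PNFS  0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS  0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS   0x00040000

/* Returns 1 if the server's role-flag combination is acceptable. */
static int pnfs_role_flags_valid(uint32_t flags)
{
    uint32_t role = flags & (EXCHGID4_FLAG_USE_NON_PNFS |
                             EXCHGID4_FLAG_USE_PNFS_MDS |
                             EXCHGID4_FLAG_USE_PNFS_DS);
    if (role == 0)
        return 0;                        /* no role reported at all */
    if ((role & EXCHGID4_FLAG_USE_NON_PNFS) &&
        (role & EXCHGID4_FLAG_USE_PNFS_MDS))
        return 0;                        /* contradictory roles */
    return 1;
}
```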
<br />
=== GETDEVICEINFO (A)===<br />
* Request Device notifications (B)<br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* Determine best GETDEVICEINFO_ARGS gdia_maxcount limits (A)<br />
** XDR across page boundaries is problematic today but should be addressed (A?)<br />
* Handle NFS4ERR_TOOSMALL (A)<br />
** Turn off pNFS (A)<br />
* Determine where to invoke it<br />
** Invoke from the state manager (A)<br />
<br />
=== GETDEVICELIST (Opt) (C)===<br />
<br />
=== LAYOUTGET (A)===<br />
* Determine where to invoke it (A)<br />
** Acquire the layout as close as possible to the actual I/O?<br />
** For the files layout, a layoutget at open makes sense - good enough reason to have it as well? <br />
** Minimize sprinkling pNFS calls throughout the call path (A)<br />
** Minimize number of layout reference/ dereference (number of layout gets per I/O) (A)<br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** Specify smart minimum and a reasonable size (A)<br />
** nfs_wait_on_sequence to serialize the gets, returns, and recalls (B)<br />
* Support layout range that does not match request (A)<br />
* Forgetful Model (12.5.5.1) (A)<br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** I/O through MDS (A)<br />
** Timer to retry layout (B?)<br />
** Mark inode to not request layout until all dirty pages are flushed (B?)<br />
* Handle NFS4ERR_RECALLCONFLICT and NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFS4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* Obey stripe unit size and commit through MDS bits (A)<br />
* FileHandle Determination (13.3)<br />
** DS Filehandle same as MDS (A)<br />
** Same DS Filehandle for every data server (A)<br />
*** Not sure if we handle it<br />
** Unique Filehandle for each data server (A)<br />
* Specify intended IO Mode in Layout (A)<br />
* More than one striping pattern: logr_layout array > 1 (B)<br />
* Able to handle an iomode different from what was requested (A)<br />
* Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3) (A)<br />
* Obey logr_return_on_close (A?) XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** What's the implication on the forgetful model (A)<br />
* Layout read(write)-ahead (B)<br />
** Files Layout will request entire file (A)<br />
<br />
=== LAYOUTCOMMIT (A)===<br />
* Include last_write_offset, offset, length (A)<br />
* Include mtime (C)<br />
** getattr after LAYOUTCOMMIT to update cached attributes (A)<br />
* Keep layoutcommit data until return value is received so that you can reissue request in case of GRACE for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* Determine where to invoke it (A?)<br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* Support sub-range layouts (A)<br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* Recover from MDS reboot (A)<br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** Check we have a layout and correct I/O mode before issuing layoutcommit (A)<br />
** Fred’s bug: a hole in the layout range (B) - subset of layout segments<br />
* Handle NFS4ERR_RECLAIM_BAD (A)<br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== LAYOUTRETURN (A)===<br />
* Forgetful Model (A)<br />
* On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
* On CB_RECALL_ANY return LAYOUTRETURN4_ALL (A)<br />
* Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1) (C)<br />
* Return full range specified by the layout recall (12.5.5.1) (C)<br />
* Ability to return chunks of layouts for huge files to show progress (C)<br />
* Return entire range layout as final LAYOUTRETURN (C)<br />
* Return NFS4ERR_NOMATCHING_LAYOUT if none is found (C)<br />
* Bulk Return (C)<br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** sync with nfs_wait_on_sequence() (C)<br />
*** The seqid affinity is associated with the filehandle<br />
* Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4) (C)<br />
** Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
** Serialization later (C)<br />
** Return NFS4ERR_DELAY?(B)<br />
* Error Recovery (A)<br />
** Handle NFS4ERR_OLD_STATEID<br />
** Handle NFS4ERR_BAD_STATEID (C) stateid's seqid()<br />
** Handle NFS4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* Error fallback on I/O error (A)<br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== SECINFO_NO_NAME (Req) (C) === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* LayoutHint attribute (C)<br />
** Need to define a user/programmable interface? (C)<br />
* GETATTR follows OPEN to determine layout type (C)<br />
* Support GUARDED during create (A)<br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* Compare commit verifier to each of the DS write verifiers (B) XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* Keep data until return value is received so that you can reissue request in case error (A)<br />
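The per-page verifier bullet above amounts to the following check after a COMMIT reply: a page may only move to the committed state if the verifier the server returned matches the one saved when the page was written; a mismatch (server reboot between WRITE and COMMIT) forces a resend. Structure and names below are illustrative, not the actual nfs_page code; see section 13.7 for the DS-specific rules.<br />

```c
#include <stdint.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8   /* write verifier is 8 opaque bytes */

/* Illustrative stand-in for a dirty page's write bookkeeping. */
struct page_req {
    unsigned char wb_verf[NFS4_VERIFIER_SIZE]; /* saved at WRITE time */
    int needs_resend;
};

/* Compare the COMMIT verifier against each page's saved write verifier;
 * returns how many pages must be rewritten. */
static int check_commit_verifier(struct page_req *pages, int npages,
                                 const unsigned char *commit_verf)
{
    int i, resend = 0;
    for (i = 0; i < npages; i++) {
        if (memcmp(pages[i].wb_verf, commit_verf,
                   NFS4_VERIFIER_SIZE) != 0) {
            pages[i].needs_resend = 1;  /* verifier changed: resend WRITE */
            resend++;
        }
    }
    return resend;
}
```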
<br />
== Callback Service Operations ==<br />
=== CB_LAYOUTRECALL (A)===<br />
* Forgetful client behavior (A)<br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** LAYOUTRECALL4_FSID (B)<br />
** LAYOUTRECALL4_ALL (B)<br />
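The forgetful-client behavior above is deliberately trivial: on CB_LAYOUTRECALL the client discards whatever matching layout state it holds and answers NFS4ERR_NOMATCHING_LAYOUT, so no LAYOUTRETURN is ever owed (12.5.5.1). The error value is from RFC 5661; the cache structure is a placeholder.<br />

```c
/* Error value per RFC 5661. */
#define NFS4ERR_NOMATCHING_LAYOUT 10060

/* Placeholder for the client's cached layout state on an inode. */
struct layout_cache { int nlayouts; };

/* Forgetful-model CB_LAYOUTRECALL handler: forget, then deny. */
static int cb_layoutrecall_forgetful(struct layout_cache *lc)
{
    lc->nlayouts = 0;   /* discard cached layouts for the recalled range */
    return NFS4ERR_NOMATCHING_LAYOUT;
}
```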
<br />
=== CB_RECALL_ANY (Req) (A)===<br />
* Client issues LAYOUTRETURN(ALL) due to forgetful client model (A)<br />
<br />
=== CB_RECALLABLE_OBJ_AVAIL (C)===<br />
* Set loga_signal_layout_avail on LAYOUTGET to FALSE (A)<br />
<br />
=== CB_NOTIFY_DEVICEID (Opt) (C)===<br />
* Indicate no interest in notification (A)<br />
* Detect race with GETDEVICEINFO (B)<br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== CB_WANTS_CANCELLED (Req) (C)===<br />
* Specify no interest if needed (A)<br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== SECINFO_NO_NAME (C)===<br />
<br />
=== I/O ===<br />
* Review Data distribution algorithm: (which DS, offset, length) (A)<br />
* Sparse (A)<br />
* Dense (C)<br />
** Stash existing code (A)<br />
* WRITE<br />
** Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data (A)<br />
*** How is it that the files layout does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** Zero byte & EOF handling on reads with holes handled locally (13.10) (A)<br />
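The data distribution item above reduces to the files-layout striping math of RFC 5661 §13.4: which stripe (hence which DS) a file offset lands on, and where the byte lives in the DS file under sparse vs dense packing. The sketch ignores first_stripe_index for brevity; function names are ours.<br />

```c
#include <stdint.h>

/* Which stripe (and thus which DS slot) an offset falls in. */
static uint32_t stripe_index(uint64_t offset, uint32_t stripe_unit,
                             uint32_t stripe_count)
{
    return (uint32_t)((offset / stripe_unit) % stripe_count);
}

/* Sparse addressing: DS file offsets mirror the MDS file offsets. */
static uint64_t ds_offset_sparse(uint64_t offset)
{
    return offset;
}

/* Dense addressing: each DS file packs only its own stripe units,
 * so skip the stripe units stored on the other data servers. */
static uint64_t ds_offset_dense(uint64_t offset, uint32_t stripe_unit,
                                uint32_t stripe_count)
{
    uint64_t full_stripes = offset / ((uint64_t)stripe_unit * stripe_count);
    return full_stripes * stripe_unit + offset % stripe_unit;
}
```

For example, with a 4 KB stripe unit across 4 data servers, file offset 20480 (the sixth stripe unit) lands on stripe index 1, at DS offset 20480 under sparse packing and 4096 under dense packing.<br />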
<br />
=== COMMIT ===<br />
* Commit through MDS (A)<br />
* Commit through DS (A)<br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** layout_hint (C)<br />
** layout_type (B)<br />
** mdsthreshold (B)<br />
** fs_layout_type (A)<br />
** layout_alignment (B)<br />
** layout_blksize (B)<br />
<br />
== Locking ==<br />
* Mandatory Locking (B) <br />
** Use Lock StateID <br />
** Handle NFS4ERR_LOCKED (B) Check with Windows (Tom Talpey) to see if there’s a server in the future<br />
<br />
== Error Handling ==<br />
* Handle I/O errors due to fencing (A)<br />
* Due to Layout Revocation (A)<br />
* NFS4ERR_GRACE handling (A)<br />
* State recovery through the State Manager only (A)<br />
** Recover state and mark as I/O for MDS for example<br />
* When do we retry the DS<br />
** Retry pNFS on remount (A)<br />
** Timer? (B)<br />
** Clear error state once there are no more dirty pages? (B)<br />
** Fail to MDS on first error - keep it simple (A)<br />
** Retry pNFS after X condition/time (B)<br />
<br />
== Security ==<br />
* DS ACL related errors? (A)<br />
<br />
== Multiple Layout Type Support ==<br />
* Different Layout types for different files (C)<br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** Write through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BADSESSION/ NFS4ERR_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUT_COMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/Client_pnfs_deliverablesClient pnfs deliverables2010-04-01T16:53:06Z<p>Peterhoneyman: moved Client pnfs deliverables to Client pNFS Requirements:&#32;just following orders</p>
<hr />
<div>#REDIRECT [[Client pNFS Requirements]]</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/Client_pNFS_RequirementsClient pNFS Requirements2010-04-01T16:53:05Z<p>Peterhoneyman: moved Client pnfs deliverables to Client pNFS Requirements:&#32;just following orders</p>
<hr />
<div>Client pNFS Deliverables</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/User:Peterhoneyman/sandboxUser:Peterhoneyman/sandbox2010-03-05T18:51:36Z<p>Peterhoneyman: /* LAYOUTGET */</p>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="purple">Issues labeled in purple can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font><br />
* <font color="red">Review impact to struct nfs_server</font> Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font><br />
** <font color="red">Black list the layout module so that capability is not available</font><br />
** <font color="purple">Disable pNFS per mount</font><br />
** <font color="green">Define I/O threshold to override attributes and other policy on the client</font><br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font><br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font><br />
** <font color="green">Multiple layouts per filesystem</font><br />
* <font color="red">Data should survive data server filehandle invalidation</font><br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font><br />
* <font color="purple">Support Direct I/O</font><br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font><br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font><br />
*** <font color="purple">Reuse DS clientid/session if we already have one</font><br />
* <font color="red">Remove PNFS_CONFIG Flag</font><br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* <font color="red">Add, Remove, Locate</font><br />
** <font color="purple">Policy to prune unused device info (elevate?)</font><br />
** <font color="red">Umount should clean device table</font><br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** <font color="red">Careful handling of lease renewals</font><br />
* <font color="red">DeviceInfo Mappings</font><br />
* <font color="purple">Multipath support for each DS</font><br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** <font color="red">Give up and I/O through MDS</font><br />
*** <font color="purple">Reattempt through DS?</font><br />
**** Revisit when generic support for replicated servers is implemented<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* <font color="red">Discuss with server implementers about need for state renewal daemon on DS</font><br />
** Is there really a need to keep the lease alive? Can we get away without renewing it per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** <font color="purple">Return layouts if they have not been used within certain time to avoid running out of state on server</font><br />
* <font color="purple">Caching beyond CLOSE</font><br />
* <font color="red">Whole file layouts</font><br />
* <font color="purple">Segment layouts</font><br />
** <font color="purple">Merge Overlapping Layouts</font><br />
*** Revisit when we study the layout design<br />
* <font color="red">Should allow layouts of differing iomode for the same range</font><br />
* Stateid/Seqid management<br />
** <font color="red">OLD and BAD stateid error handling in layout operations</font><br />
* <font color="red">Check current Referring Tuple Handling works with pNFS callbacks</font><br />
<br />
=== <font color="red">Interaction with Delegations</font>===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* <font color="red">Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations</font><br />
** <font color="red">If the client doesn't request pNFS but the server asserts it, the client must not use pNFS</font><br />
* Remember server response to determine:<br />
** <font color="red">If we need to send GETATTR asking for layout type</font><br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* <font color="green">EXCHGID4_FLAG_BIND_PRINC_STATEID</font><br />
* <font color="red">Separate nfs_client for MDS/DS dual personality</font><br />
** Make sure the client owner is different for each<br />
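The flag handling above can be checked along these lines; the flag values are from RFC 5661, while the policy helper itself is a hypothetical sketch of the "client must not do pNFS unless it asked for it" rule.<br />

```c
#include <stdint.h>

/* pNFS role flags in EXCHANGE_ID (values from RFC 5661) */
#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000

/* Sketch: decide whether the client may use pNFS against this server.
 * If the client did not request pNFS, it must not use it even when the
 * server advertises MDS capability. */
int may_use_pnfs(uint32_t client_flags, uint32_t server_flags)
{
    if (!(client_flags & EXCHGID4_FLAG_USE_PNFS_MDS))
        return 0;                       /* client opted out */
    if (server_flags & EXCHGID4_FLAG_USE_NON_PNFS)
        return 0;                       /* server says no pNFS here */
    return !!(server_flags & EXCHGID4_FLAG_USE_PNFS_MDS);
}
```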
<br />
=== <font color="red">GETDEVICEINFO</font>===<br />
* <font color="purple">Request Device notifications</font><br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* <font color="red">Determine best GETDEVICEINFO_ARGS gdia_maxcount limits</font><br />
** <font color="red">XDR across page boundaries is problematic today but should be addressed</font><br />
* <font color="red">Handle NFS4ERR_TOOSMALL</font><br />
** <font color="red">Turn off pNFS</font><br />
* Determine where to invoke it<br />
** <font color="red">Invoke from the state manager</font><br />
<br />
=== <font color="green">GETDEVICELIST (Opt)</font>===<br />
<br />
=== <font color="red">LAYOUTGET</font>===<br />
* <font color="red">Determine where to invoke it</font><br />
** Acquire layout as close to the actual I/O?<br />
** For the files layout, acquiring the layout at open makes sense - is that a good enough reason to have it there as well?<br />
** <font color="red">Minimize sprinkling pNFS calls throughout the code paths</font><br />
** <font color="red">Minimize number of layout reference/ dereference (number of layout gets per I/O)</font><br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** <font color="red">Specify a smart minimum and a reasonable size</font><br />
** <font color="purple">nfs_wait_on_sequence to serialize the gets, returns, and recalls</font><br />
* <font color="red">Support layout range that does not match request</font><br />
* <font color="red">Forgetful Model (12.5.5.1)</font><br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** <font color="red">I/O through MDS</font><br />
** <font color="purple">Timer to retry layout</font><br />
** <font color="purple">Mark inode to not request layout until all dirty pages are flushed</font><br />
* Handle NFS4ERR_RECALLCONFLICT and NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFS4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* <font color="red">Obey stripe unit size and commit through MDS bits</font><br />
* FileHandle Determination (13.3)<br />
** <font color="red">DS Filehandle same as MDS</font><br />
** <font color="red">Same DS Filehandle for every data server</font><br />
*** Not sure if we handle it<br />
** <font color="red">Unique Filehandle for each data server</font><br />
* <font color="red">Specify intended IO Mode in Layout</font><br />
* <font color="purple">More than one striping pattern: logr_layout array > 1</font><br />
* <font color="red">Able to handle different iomode from what was requested</font><br />
* <font color="red">Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3)</font><br />
* <font color="red">Obey logr_return_on_close</font> XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** <font color="red">What's the implication on the forgetful model</font><br />
* <font color="purple">Layout read(write)-ahead</font><br />
** <font color="red">Files Layout will request entire file</font><br />
This makes it impossible (or at least infeasible) to extend files in the block layout<br />
<br />
=== <font color="red">LAYOUTCOMMIT</font>===<br />
* <font color="red">Include last_write_offset, offset, length</font><br />
* <font color="green">Include mtime</font><br />
** <font color="red">getattr after LAYOUTCOMMIT to update cached attributes</font><br />
* Keep layoutcommit data until a return value is received so that the request can be reissued, e.g., on NFS4ERR_GRACE<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* <font color="red">Determine where to invoke it</font><br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* <font color="red">Support sub-range layouts</font><br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* <font color="red">Recover from MDS reboot</font><br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** <font color="red">Check we have a layout and correct I/O mode before issuing layoutcommit</font><br />
** <font color="purple">Fred's bug of hole in the layout range</font> Subset of layout segments<br />
* <font color="red">Handle NFS4ERR_RECLAIM_BAD</font><br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== <font color="red">LAYOUTRETURN</font>===<br />
* <font color="red">Forgetful Model</font><br />
* <font color="red">On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
* <font color="red">On CB_RECALL_ANY return LAYOUTRETURN4_ALL</font><br />
* <font color="green">Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1)</font><br />
* <font color="green">Return full range specified by the layout recall (12.5.5.1)</font><br />
* <font color="green">Ability to return chunks of layouts for huge files to show progress</font><br />
* <font color="green">Return entire range layout as final LAYOUTRETURN</font><br />
* <font color="green">Return NFS4ERR_NOMATCHING_LAYOUT if none is found</font><br />
* <font color="green">Bulk Return</font><br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** <font color="green">sync with nfs_wait_on_sequence()</font><br />
*** The seqid affinity is associated with the filehandle<br />
* <font color="green">Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4)</font><br />
** <font color="red">Forgetful model always returns NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
** <font color="green">Serialization later</font><br />
** <font color="purple">Return NFS4ERR_DELAY?</font><br />
* <font color="red">Error Recovery</font><br />
** Handle NFS4ERR_OLD_STATEID<br />
** <font color="green">Handle NFS4ERR_BAD_STATEID</font> stateid's seqid()<br />
** Handle NFS4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* <font color="red">Error fallback on I/O error</font><br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== <font color="green">SECINFO_NO_NAME (Req)</font> === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* <font color="green">LayoutHint attribute</font><br />
** <font color="green">Need to define a user/programmable interface?</font><br />
* <font color="green">GETATTR follows OPEN to determine layout type</font><br />
* <font color="red">Support GUARDED during create</font><br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* <font color="purple">Compare commit verifier to each of the DS write verifiers</font> XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* <font color="red">Keep data until return value is received so that you can reissue the request in case of error</font><br />
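The per-page verifier comparison above might look like this sketch (hypothetical names; the real client keeps the verifier with each page's write state): any page whose WRITE verifier differs from the COMMIT verifier must be rewritten, since the server lost its unstable data.<br />

```c
#include <string.h>

#define NFS4_VERIFIER_SIZE 8

/* Hypothetical sketch of per-page commit-verifier checking. */
struct page_state {
    unsigned char wcc_verf[NFS4_VERIFIER_SIZE];  /* from WRITE reply */
    int needs_rewrite;
};

/* Compare the COMMIT verifier against each page's WRITE verifier;
 * returns the number of pages that must be resent. */
size_t check_commit_verifier(struct page_state *pages, size_t n,
                             const unsigned char commit_verf[NFS4_VERIFIER_SIZE])
{
    size_t resend = 0;
    for (size_t i = 0; i < n; i++) {
        if (memcmp(pages[i].wcc_verf, commit_verf,
                   NFS4_VERIFIER_SIZE) != 0) {
            pages[i].needs_rewrite = 1;
            resend++;
        }
    }
    return resend;
}
```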
<br />
== Callback Service Operations ==<br />
=== <font color="red">CB_LAYOUTRECALL</font>===<br />
* <font color="red">Forgetful client behavior</font><br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** <font color="purple">LAYOUTRECALL4_FSID</font><br />
** <font color="purple">LAYOUTRECALL4_ALL</font><br />
<br />
=== <font color="red">CB_RECALL_ANY (Req)</font>===<br />
* <font color="red">Client issues LAYOUTRETURN(ALL) due to forgetful client model</font><br />
<br />
=== <font color="green">CB_RECALLABLE_OBJ_AVAIL</font>===<br />
* <font color="red">Set loga_signal_layout_avail on LAYOUTGET to FALSE</font><br />
<br />
=== <font color="green">CB_NOTIFY_DEVICEID (Opt)</font>===<br />
* <font color="red">Indicate no interest in notification</font><br />
* <font color="purple">Detect race with GETDEVICEINFO</font><br />
** If layouts are using the deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== <font color="green">CB_WANTS_CANCELLED (Req)</font>===<br />
* <font color="red">Specify no interest if needed</font><br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== <font color="green">SECINFO_NO_NAME</font>===<br />
<br />
=== I/O ===<br />
* <font color="red">Review Data distribution algorithm: (which DS, offset, length)</font><br />
* <font color="red">Sparse</font><br />
* <font color="green">Dense</font><br />
** <font color="red">Stash existing code</font><br />
* WRITE<br />
** <font color="red">Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data</font><br />
*** How is it that the files layout does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** <font color="red">Zero byte & EOF handling on reads with holes handled locally (13.10)</font><br />
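The sparse/dense distinction above follows the files-layout striping arithmetic of RFC 5661 §13.4; as a sketch (hypothetical helper, assuming a simple round-robin stripe list), mapping a file offset to a data server and a per-DS file offset:<br />

```c
#include <stdint.h>

/* Sketch of RFC 5661 §13.4 striping: which DS holds a file offset,
 * and at what offset in the per-DS file for sparse vs. dense packing. */
struct stripe_loc {
    uint32_t ds_index;    /* index into the device's stripe list */
    uint64_t ds_offset;   /* offset within the data-server file */
};

struct stripe_loc map_offset(uint64_t off, uint32_t stripe_unit,
                             uint32_t stripe_count, int dense)
{
    uint64_t su = off / stripe_unit;          /* stripe unit number */
    struct stripe_loc loc;
    loc.ds_index = su % stripe_count;
    if (dense)
        /* dense: each DS file packs only its own stripe units */
        loc.ds_offset = (su / stripe_count) * stripe_unit
                      + off % stripe_unit;
    else
        /* sparse: DS file offset equals the file offset */
        loc.ds_offset = off;
    return loc;
}
```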
<br />
=== COMMIT ===<br />
* <font color="red">Commit through MDS</font><br />
* <font color="red">Commit through DS</font><br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** <font color="green">layout_hint</font><br />
** <font color="purple">layout_type</font><br />
** <font color="purple">mdsthreshold</font><br />
** <font color="red">fs_layout_type</font><br />
** <font color="purple">layout_alignment</font><br />
** <font color="purple">layout_blksize</font><br />
<br />
== Locking ==<br />
* <font color="purple">Mandatory Locking</font> <br />
** Use Lock StateID <br />
** <font color="purple">Handle NFS4ERR_LOCKED</font> Check with Windows (Tom Talpey) to see if there's a server in the future<br />
<br />
== Error Handling ==<br />
* <font color="red">Handle I/O errors due to fencing</font><br />
* <font color="red">Due to Layout Revocation</font><br />
* <font color="red">NFS4ERR_GRACE handling</font><br />
* <font color="red">State recovery through the State Manager only</font><br />
** Recover state and, for example, mark for I/O through the MDS<br />
* When do we retry again to the DS<br />
** <font color="red">Retry pNFS on remount</font><br />
** <font color="purple">Timer?</font><br />
** <font color="purple">Clear error state once there are no more dirty pages?</font><br />
** <font color="red">Fail to MDS on first error - keep it simple</font><br />
** <font color="purple">Retry pNFS after X condition/time</font><br />
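The retry options above can be sketched as a small policy (all names and the retry window are hypothetical): fail to the MDS on the first DS error, and retry pNFS only after a timer expires and no dirty pages remain.<br />

```c
#include <time.h>

/* Hypothetical sketch of the "fail to MDS on first error,
 * retry pNFS later" policy. */
struct pnfs_io_state {
    int use_mds;            /* set on first DS error */
    time_t failed_at;
    int dirty_pages;
};

#define PNFS_RETRY_SECS 120 /* illustrative retry window */

void pnfs_ds_io_failed(struct pnfs_io_state *s, time_t now)
{
    s->use_mds = 1;         /* keep it simple: all I/O via MDS */
    s->failed_at = now;
}

/* Retry pNFS only after the window elapses and no dirty pages remain. */
int pnfs_should_use_ds(struct pnfs_io_state *s, time_t now)
{
    if (!s->use_mds)
        return 1;
    if (s->dirty_pages == 0 && now - s->failed_at >= PNFS_RETRY_SECS) {
        s->use_mds = 0;     /* clear error state, retry pNFS */
        return 1;
    }
    return 0;
}
```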
<br />
== Security ==<br />
* <font color="red">DS ACL related errors?</font><br />
<br />
== Multiple Layout Type Support ==<br />
* <font color="green">Different Layout types for different files</font><br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** <font color="red">Write through MDS</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
=== <font color="green">Lease Move (11.7.7.1) (Low Priority)</font>===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* <font color="red">Handle fencing error</font><br />
<br />
=== Metadata Server Restart ===<br />
* <font color="red">SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BADSESSION/ NFS4ERR_STALE_CLIENTID</font><br />
* Server out of Grace<br />
** <font color="red">I/O through MDS</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
* Server in Grace<br />
** <font color="red">LAYOUT_COMMIT in reclaim mode</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
== Data Server Multipathing (13.5) ==<br />
* <font color="purple">Bandwidth Scaling</font><br />
** <font color="green">Session Trunking</font><br />
* Higher Availability<br />
** <font color="purple">multipath_list4</font><br />
** <font color="purple">Replacement DeviceID-to-Device address mapping</font><br />
* <font color="purple">Replacement DeviceID</font><br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/User:Peterhoneyman/sandboxUser:Peterhoneyman/sandbox2010-03-05T18:51:17Z<p>Peterhoneyman: /* LAYOUTGET */</p>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="purple">Issues labeled in purple can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font><br />
* <font color="red">Review impact to struct nfs_server</font> Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font><br />
** <font color="red">Black list the layout module so that capability is not available</font><br />
** <font color="purple">Disable pNFS per mount</font><br />
** <font color="green">Define I/O threshold to override attributes and other policy on the client</font><br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font><br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font><br />
** <font color="green">Multiple layouts per filesystem</font><br />
* <font color="red">Data should survive data server filehandle invalidation</font><br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font><br />
* <font color="purple">Support Direct I/O</font><br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font><br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font><br />
*** <font color="purple">Reuse DS clientid/session if we already have one</font><br />
* <font color="red">Remove PNFS_CONFIG Flag</font><br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* <font color="red">Add, Remove, Locate</font><br />
** <font color="purple">Policy to prune unused device info (elevate?)</font><br />
** <font color="red">Umount should clean device table</font><br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** <font color="red">Careful handling of lease renewals</font><br />
* <font color="red">DeviceInfo Mappings</font><br />
* <font color="purple">Multipath support for each DS</font><br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** <font color="red">Give up and I/O through MDS</font><br />
*** <font color="purple">Reattempt through DS?</font><br />
**** Revisit when generic support fort replicated server<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* <font color="red">Discuss with server implementers about need for state renewal daemon on DS</font><br />
** Is there really a need to keep the lease alive? Can we get away without renewed per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** <font color="purple">Return layouts if they have not been used within certain time to avoid running out of state on server</font><br />
* <font color="purple">Caching beyond CLOSE</font><br />
* <font color="red">Whole file layouts</font><br />
* <font color="purple">Segment layouts</font><br />
** <font color="purple">Merge Overlapping Layouts</font><br />
*** Revisit when we study the layout design<br />
* <font color="red">Should allow layouts of differing iomode for the same range</font><br />
* Stateid/Seqid management<br />
** <font color="red">OLD and BAD stateid error handling in layout operations</font><br />
* <font color="red">Check current Referring Tuple Handling works with pNFS callbacks</font><br />
<br />
=== <font color="red">Interaction with Delegations</font>===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* <font color="red">Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations</font><br />
** <font color="red">If client doesn't specify pNFS and server does, client needs to not do it</font><br />
* Remember server response to determine:<br />
** <font color="red">If we need to send GETATTR asking for layout type</font><br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* <font color="green">EXCHGID4_FLAG4_BIND_PRINC_STATEID</font><br />
* <font color="red">Separate nfs_client for MDS/DS dual personality</font><br />
** Make sure the client owner is different for each<br />
<br />
=== <font color="red">GETDEVICEINFO</font>===<br />
* <font color="purple">Request Device notifications</font><br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* <font color="red">Determine best GETDEVICEINFO_ARGS gdia_maxcount limits</font><br />
** <font color="red">XDR across page boundaries is problematic today but should be addressed</font><br />
* <font color="red">Handle NFS4ERR_TOOSMALL</font><br />
** <font color="red">Turn off pNFS</font><br />
* Determine where to invoke it<br />
** <font color="red">Invoke from the state manager</font><br />
<br />
=== <font color="green">GETDEVICELIST (Opt)</font>===<br />
<br />
=== <font color="red">LAYOUTGET</font>===<br />
* <font color="red">Determine where to invoke it</font><br />
** Acquire layout as close to the actual I/O?<br />
** For files layout layout at open makes sense - good enough reason to have it as well? <br />
** <font color="red">Minimize sprinkling pNFS calls throughout the call</font><br />
** <font color="red">Minimize number of layout reference/ dereference (number of layout gets per I/O)</font><br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** <font color="red">Specify smart minimum and a reasonable size</font><br />
** <font color="purple">nfs_wait_on_sequence to serialize the gets, returns, and recalls</font><br />
* <font color="red">Support layout range that does not match request</font><br />
* <font color="red">Forgetful Model (12.5.5.1)</font><br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** <font color="red">I/O through MDS</font><br />
** <font color="purple">Timer to retry layout</font><br />
** <font color="purple">Mark inode to not request layout until all dirty pages are flushed</font><br />
* Handle NFS4ERR_RECALLCONFLICT AND NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFs4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* <font color="red">Obey stripe unit size and commit through MDS bits</font><br />
* FileHandle Determination (13.3)<br />
** <font color="red">DS Filehandle same as MDS</font><br />
** <font color="red">Same DS Filehandle for every data server</font><br />
*** Not sure if we handle it<br />
** <font color="red">Unique Filehandle for each data server</font><br />
* <font color="red">Specify intended IO Mode in Layout</font><br />
* <font color="purple">More than one striping pattern: logr_layout array > 1</font><br />
* <font color="red">Able to handle different iomode from what was requested</font><br />
* <font color="red">Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3)</font><br />
* <font color="red">Obey logr_return_on_close</font> XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** <font color="red">What's the implication on the forgetful model</font><br />
* <font color="purple">Layout read(write)-ahead</font><br />
** <font color="red">Files Layout will request entire file</font><br />
This makes it impossible (at most unfeasible) to extend files in block layout<br />
<br />
=== <font color="red">LAYOUTCOMMIT</font>===<br />
* <font color="red">Include last_write_offset, offset, length</font><br />
* <font color="green">Include mtime</font><br />
** <font color="red">getattr after LAYOUTCOMMIT to update cached attributes</font><br />
* Keep layoutcommit data until return value is received so that you can reissue request in case of GRACE for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* <font color="red">Determine where to invoke it</font><br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* <font color="red">Support sub-range layouts</font><br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* <font color="red">Recover from MDS reboot</font><br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** <font color="red">Check we have a layout and correct I/O mode before issuing layoutcommit</font><br />
** <font color="purple">Fred's bug of hole in the layout range</font> Subset of layout segments<br />
* <font color="red">Handle NFS4ERR_RECLAIM_BAD</font><br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== <font color="red">LAYOUTRETURN</font>===<br />
* <font color="red">Forgetful Model</font><br />
* <font color="red">On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
* <font color="red">On CB_RECALL_ANY return LAYOUTRETURN4_ALL</font><br />
* <font color="green">Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1)</font><br />
* <font color="green">Return full range specified by the layout recall (12.5.5.1)</font><br />
* <font color="green">Ability to return chunks of layouts for huge files to show progress</font><br />
* <font color="green">Return entire range layout as final LAYOUTRETURN</font><br />
* <font color="green">Return NFS4ERR_NOMATCHING_LAYOUT if none is found</font><br />
* <font color="green">Bulk Return</font><br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** <font color="green">sync with nfs_wait_on_sequence()</font><br />
*** The seqid affinity is associated with the filehandle<br />
* <font color="green">Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4)</font><br />
** <font color="red">Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
** <font color="green">Serialization later</font><br />
** <font color="purple">Return NFS4ERR_DELAY?</font><br />
* <font color="red">Error Recovery</font><br />
** Handle NFS4ERR_OLD_STATEID<br />
** <font color="green">Handle NFS4ERR_BAD_STATEID</font> stateid's seqid()<br />
** Handle NFs4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* <font color="red">Error fallback on I/O error</font><br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== <font color="green">SECINFO_NO_NAME (Req)</font> === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* <font color="green">LayoutHint attribute</font><br />
** <font color="green">Need to define a user/programmable interface?</font><br />
* <font color="green">GETATTR follows OPEN to determine layout type</font><br />
* <font color="red">Support GUARDED during create</font><br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* <font color="purple">Compare commit verifier to each of the DS write verifiers</font> XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* <font color="red">Keep data until return value is received so that you can reissue request in case error</font><br />
<br />
== Callback Service Operations ==<br />
=== <font color="red">CB_LAYOUTRECALL</font>===<br />
* <font color="red">Forgetful client behavior</font><br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** <font color="purple">LAYOUTRECALL4_FSID</font><br />
** <font color="purple">LAYOTURECALL4_ALL</font><br />
<br />
=== <font color="red">CB_RECALL_ANY (Req)</font>===<br />
* <font color="red">Client issues LAYOUTRETURN(ALL) due to forgetful client model</font><br />
<br />
=== <font color="green">CB_RECALLABLE_OBJ_AVAIL</font>===<br />
* <font color="red">Set loga_signal_layout_avail on LAYOUTGET to FALSE</font><br />
<br />
=== <font color="green">CB_NOTIFY_DEVICEID (Opt)</font>===<br />
* <font color="red">Indicate no interest in notification</font><br />
* <font color="purple">Detect race with GETDEVICE_INFO</font><br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== <font color="green">CB_WANTS_CANCELLED (Req)</font>===<br />
* <font color="red">Specify no interest if needed</font><br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== <font color="green">SECINFO_NO_NAME</font>===<br />
<br />
=== I/O ===<br />
* <font color="red">Review Data distribution algorithm: (which DS, offset, length)</font><br />
* <font color="red">Sparse</font><br />
* <font color="green">Dense</font><br />
** <font color="red">Stash existing code</font><br />
* WRITE<br />
** <font color="red">Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data</font><br />
*** How is it that files does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** <font color="red">Zero byte & EOF handling on reads with holes handled locally (13.10)</font><br />
<br />
=== COMMIT ===<br />
* <font color="red">Commit through MDS</font><br />
* <font color="red">Commit through DS</font><br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** <font color="green">layout_hint</font><br />
** <font color="purple">layout_type</font><br />
** <font color="purple">mdsthreshold</font><br />
** <font color="red">fs_layout_type</font><br />
** <font color="purple">layout_alignment</font><br />
** <font color="purple">layout_blksize</font><br />
<br />
== Locking ==<br />
* <font color="purple">Mandatory Locking</font> <br />
** Use Lock StateID <br />
** <font color="purple">Handle NFS4ERR_LOCKED</font> Check with Windows (Tom Talpey) to see if there's a server in the future<br />
<br />
== Error Handling ==<br />
* <font color="red">Handle I/O errors due to fencing</font><br />
* <font color="red">Due to Layout Revocation</font><br />
* <font color="red">NFS4ERR_GRACE handling</font><br />
* <font color="red">State recovery through the State Manager only</font><br />
** Recover state and mark as I/O for MDS for example<br />
* When do we retry again to the DS<br />
** <font color="red">Retry pNFS on remount</font><br />
** <font color="purple">Timer?</font><br />
** <font color="purple">Clear error state once there are no more dirty pages?</font><br />
** <font color="red">Fail to MDS on first error - keep it simple</font><br />
** <font color="purple">Retry pNFS after X condition/time</font><br />
<br />
== Security ==<br />
* <font color="red">DS ACL related errors?</font><br />
<br />
== Multiple Layout Type Support ==<br />
* <font color="green">Different Layout types for different files</font><br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** <font color="red">Write through MDS</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
=== <font color="green">Lease Move (11.7.7.1) (Low Priority)</font>===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* <font color="red">Handle fencing error</font><br />
<br />
=== Metadata Server Restart ===<br />
* <font color="red">SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BAD_SESSION/ NFS4_STALE_CLIENTID</font><br />
* Server out of Grace<br />
** <font color="red">I/O through MDS</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
* Server in Grace<br />
** <font color="red">LAYOUT_COMMIT in reclaim mode</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
== Data Server Multipathing (13.5) ==<br />
* <font color="purple">Bandwidth Scaling</font><br />
** <font color="green">Session Trunking</font><br />
* Higher Availability<br />
** <font color="purple">multipath_list4</font><br />
** <font color="purple">Replacement DeviceID-to-Device address mapping</font><br />
* <font color="purple">Replacement DeviceID</font><br />
<br />
== IPv6 ==</div>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="purple">Issues labeled in purple can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font><br />
* <font color="red">Review impact to struct nfs_server</font> Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font><br />
** <font color="red">Blacklist the layout module so that the capability is not available</font><br />
** <font color="purple">Disable pNFS per mount</font><br />
** <font color="green">Define I/O threshold to override attributes and other policy on the client</font><br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font><br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font><br />
** <font color="green">Multiple layouts per filesystem</font><br />
* <font color="red">Data should survive data server filehandle invalidation</font><br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font><br />
* <font color="purple">Support Direct I/O</font><br />
** Consult with the list: is there customer demand, or can this be held off until after the first integration?<br />
** Dean can volunteer to implement. Shares the same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font><br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font><br />
*** <font color="purple">Reuse DS clientid/session if we already have one</font><br />
* <font color="red">Remove PNFS_CONFIG Flag</font><br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* <font color="red">Add, Remove, Locate</font><br />
** <font color="purple">Policy to prune unused device info (elevate?)</font><br />
** <font color="red">Umount should clean device table</font><br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** <font color="red">Careful handling of lease renewals</font><br />
* <font color="red">DeviceInfo Mappings</font><br />
* <font color="purple">Multipath support for each DS</font><br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** <font color="red">Give up and I/O through MDS</font><br />
*** <font color="purple">Reattempt through DS?</font><br />
**** Revisit when generic support for replicated servers is implemented<br />
* Recalls (See callbacks)<br />
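The add/remove/locate operations above amount to a small table of device entries scoped to the clientID/layout type rather than the filesystem. A minimal C sketch of such a table (the structure names and fixed table size are illustrative, not the kernel's actual data structures; only the 16-byte deviceid4 size is from the protocol):<br />

```c
#include <assert.h>
#include <string.h>

#define DEVICEID_SIZE 16   /* deviceid4 is 16 bytes in NFSv4.1 */
#define MAX_DEVS 32

struct pnfs_dev {
    unsigned char id[DEVICEID_SIZE];
    int in_use;               /* slot occupied */
    void *info;               /* decoded GETDEVICEINFO payload */
};

struct pnfs_dev_cache {
    struct pnfs_dev slot[MAX_DEVS];
};

/* Locate: linear scan; a real implementation would hash the deviceID. */
static struct pnfs_dev *dev_find(struct pnfs_dev_cache *c,
                                 const unsigned char *id)
{
    for (int i = 0; i < MAX_DEVS; i++)
        if (c->slot[i].in_use &&
            memcmp(c->slot[i].id, id, DEVICEID_SIZE) == 0)
            return &c->slot[i];
    return 0;
}

/* Add: idempotent; returns the cached entry, or 0 if the table is full. */
static struct pnfs_dev *dev_add(struct pnfs_dev_cache *c,
                                const unsigned char *id, void *info)
{
    struct pnfs_dev *d = dev_find(c, id);
    if (d)
        return d;
    for (int i = 0; i < MAX_DEVS; i++) {
        if (!c->slot[i].in_use) {
            memcpy(c->slot[i].id, id, DEVICEID_SIZE);
            c->slot[i].info = info;
            c->slot[i].in_use = 1;
            return &c->slot[i];
        }
    }
    return 0;
}

/* Remove: e.g. on NOTIFY_DEVICEID4_DELETE or during unmount cleanup. */
static void dev_remove(struct pnfs_dev_cache *c, const unsigned char *id)
{
    struct pnfs_dev *d = dev_find(c, id);
    if (d)
        d->in_use = 0;
}
```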
<br />
=== State/connection management ===<br />
* <font color="red">Discuss with server implementers about need for state renewal daemon on DS</font><br />
** Is there really a need to keep the lease alive? Can we get away without renewal per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** <font color="purple">Return layouts if they have not been used within a certain time to avoid running out of state on the server</font><br />
* <font color="purple">Caching beyond CLOSE</font><br />
* <font color="red">Whole file layouts</font><br />
* <font color="purple">Segment layouts</font><br />
** <font color="purple">Merge Overlapping Layouts</font><br />
*** Revisit when we study the layout design<br />
* <font color="red">Should allow layouts of differing iomode for the same range</font><br />
* Stateid/Seqid management<br />
** <font color="red">OLD and BAD stateid error handling in layout operations</font><br />
* <font color="red">Check current Referring Tuple Handling works with pNFS callbacks</font><br />
<br />
=== <font color="red">Interaction with Delegations</font>===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
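The stateid precedence above can be sketched as a small helper (illustrative only, not kernel code; it encodes the ordering listed above: delegation stateid first, lock stateid under mandatory locking, open stateid otherwise):<br />

```c
#include <assert.h>

enum stateid_kind { STATEID_OPEN, STATEID_DELEG, STATEID_LOCK };

/* Choose which stateid to present on LAYOUTGET, per the list above. */
static enum stateid_kind layoutget_stateid(int have_deleg, int mandatory_lock)
{
    if (have_deleg)
        return STATEID_DELEG;   /* delegation stateid wins when held */
    if (mandatory_lock)
        return STATEID_LOCK;    /* lock stateid under mandatory locking */
    return STATEID_OPEN;        /* otherwise the open stateid */
}
```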
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* <font color="red">Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations</font><br />
** <font color="red">If the client doesn't specify pNFS and the server does, the client must not use pNFS</font><br />
* Remember server response to determine:<br />
** <font color="red">If we need to send GETATTR asking for layout type</font><br />
** If we should specify a layout hint during create (Priority?)<br />
* <font color="green">EXCHGID4_FLAG4_BIND_PRINC_STATEID</font><br />
* <font color="red">Separate nfs_client for MDS/DS dual personality</font><br />
** Make sure the client owner is different for each<br />
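A minimal sketch of the flag handling above, using the EXCHGID4 flag values from RFC 5661 (the helper name and its boolean arguments are illustrative):<br />

```c
#include <assert.h>

#define EXCHGID4_FLAG_USE_NON_PNFS 0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS 0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS  0x00040000

/* Returns 1 if the client may use pNFS against this server. */
static int server_supports_pnfs(unsigned int eir_flags, int client_wants_pnfs)
{
    /* If the client didn't ask for pNFS, never use it even if offered. */
    if (!client_wants_pnfs)
        return 0;
    /* Server must advertise the MDS role; NON_PNFS alone means no pNFS. */
    return (eir_flags & EXCHGID4_FLAG_USE_PNFS_MDS) != 0;
}
```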
<br />
=== <font color="red">GETDEVICEINFO</font>===<br />
* <font color="purple">Request Device notifications</font><br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* <font color="red">Determine best GETDEVICEINFO_ARGS gdia_maxcount limits</font><br />
** <font color="red">XDR across page boundaries is problematic today but should be addressed</font><br />
* <font color="red">Handle NFS4ERR_TOOSMALL</font><br />
** <font color="red">Turn off pNFS</font><br />
* Determine where to invoke it<br />
** <font color="red">Invoke from the state manager</font><br />
<br />
=== <font color="green">GETDEVICELIST (Opt)</font>===<br />
<br />
=== <font color="red">LAYOUTGET</font>===<br />
* <font color="red">Determine where to invoke it</font><br />
** Acquire the layout as close as possible to the actual I/O?<br />
** For the files layout, acquiring the layout at open makes sense - good enough reason to have it as well? <br />
** <font color="red">Minimize sprinkling of pNFS calls throughout the call path</font><br />
** <font color="red">Minimize number of layout reference/ dereference (number of layout gets per I/O)</font><br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** <font color="red">Specify smart minimum and a reasonable size</font><br />
** <font color="purple">nfs_wait_on_sequence to serialize the gets, returns, and recalls</font><br />
* <font color="red">Support layout range that does not match request</font><br />
* <font color="red">Forgetful Model (12.5.5.1)</font><br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** <font color="red">I/O through MDS</font><br />
** <font color="purple">Timer to retry layout</font><br />
** <font color="purple">Mark inode to not request layout until all dirty pages are flushed</font><br />
* Handle NFS4ERR_RECALLCONFLICT AND NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFS4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* <font color="red">Obey stripe unit size and commit through MDS bits</font><br />
* FileHandle Determination (13.3)<br />
** <font color="red">DS Filehandle same as MDS</font><br />
** <font color="red">Same DS Filehandle for every data server</font><br />
*** Not sure if we handle it<br />
** <font color="red">Unique Filehandle for each data server</font><br />
* <font color="red">Specify intended IO Mode in Layout</font><br />
* <font color="purple">More than one striping pattern: logr_layout array > 1</font><br />
* <font color="red">Able to handle different iomode from what was requested</font><br />
* <font color="red">Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3)</font><br />
* <font color="red">Obey logr_return_on_close</font> XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** <font color="red">What's the implication on the forgetful model</font><br />
* <font color="purple">Layout read(write)-ahead</font><br />
** <font color="red">Files Layout will request entire file</font><br />
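Supporting a layout range that differs from the request, including the NFS4_UINT64_MAX length convention (18.43.3), reduces to a range-coverage check. An illustrative sketch (helper names are assumptions):<br />

```c
#include <assert.h>
#include <stdint.h>

#define NFS4_UINT64_MAX 0xffffffffffffffffULL

/* End offset (exclusive) of a layout segment; length NFS4_UINT64_MAX
 * means "from offset to end of file". */
static uint64_t lo_end(uint64_t offset, uint64_t length)
{
    if (length == NFS4_UINT64_MAX)
        return NFS4_UINT64_MAX;
    return offset + length;
}

/* Does the granted segment [lo_off, lo_off+lo_len) cover the requested
 * range? The server may return a larger or smaller range than asked. */
static int segment_covers(uint64_t lo_off, uint64_t lo_len,
                          uint64_t req_off, uint64_t req_len)
{
    return lo_off <= req_off &&
           lo_end(lo_off, lo_len) >= lo_end(req_off, req_len);
}
```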
<br />
=== <font color="red">LAYOUTCOMMIT</font>===<br />
* <font color="red">Include last_write_offset, offset, length</font><br />
* <font color="green">Include mtime</font><br />
** <font color="red">getattr after LAYOUTCOMMIT to update cached attributes</font><br />
* Keep layoutcommit data until the return value is received so that the request can be reissued in case of NFS4ERR_GRACE, for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* <font color="red">Determine where to invoke it</font><br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* <font color="red">Support sub-range layouts</font><br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* <font color="red">Recover from MDS reboot</font><br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** <font color="red">Check we have a layout and correct I/O mode before issuing layoutcommit</font><br />
** <font color="purple">Fred's bug of hole in the layout range</font> Subset of layout segments<br />
* <font color="red">Handle NFS4ERR_RECLAIM_BAD</font><br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== <font color="red">LAYOUTRETURN</font>===<br />
* <font color="red">Forgetful Model</font><br />
* <font color="red">On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
* <font color="red">On CB_RECALL_ANY return LAYOUTRETURN4_ALL</font><br />
* <font color="green">Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1)</font><br />
* <font color="green">Return full range specified by the layout recall (12.5.5.1)</font><br />
* <font color="green">Ability to return chunks of layouts for huge files to show progress</font><br />
* <font color="green">Return entire range layout as final LAYOUTRETURN</font><br />
* <font color="green">Return NFS4ERR_NOMATCHING_LAYOUT if none is found</font><br />
* <font color="green">Bulk Return</font><br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** <font color="green">sync with nfs_wait_on_sequence()</font><br />
*** The seqid affinity is associated with the filehandle<br />
* <font color="green">Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4)</font><br />
** <font color="red">Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1)</font><br />
** <font color="green">Serialization later</font><br />
** <font color="purple">Return NFS4ERR_DELAY?</font><br />
* <font color="red">Error Recovery</font><br />
** Handle NFS4ERR_OLD_STATEID<br />
** <font color="green">Handle NFS4ERR_BAD_STATEID</font> stateid's seqid()<br />
** Handle NFS4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* <font color="red">Error fallback on I/O error</font><br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== <font color="green">SECINFO_NO_NAME (Req)</font> === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* <font color="green">LayoutHint attribute</font><br />
** <font color="green">Need to define a user/programmable interface?</font><br />
* <font color="green">GETATTR follows OPEN to determine layout type</font><br />
* <font color="red">Support GUARDED during create</font><br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* <font color="purple">Compare commit verifier to each of the DS write verifiers</font> XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* <font color="red">Keep data until the return value is received so that the request can be reissued in case of error</font><br />
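The per-page verifier comparison above can be sketched as follows (illustrative structures, not the kernel's; only the 8-byte verifier size is from the protocol). Pages whose remembered write verifier differs from the verifier returned by COMMIT must be rewritten, since the server may have rebooted between WRITE and COMMIT:<br />

```c
#include <assert.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8

/* A page written with UNSTABLE stability remembers the verifier it saw. */
struct pnfs_page {
    unsigned char verf[NFS4_VERIFIER_SIZE];
};

/* Count pages whose remembered verifier mismatches the COMMIT verifier;
 * those pages must be resent. */
static int pages_needing_rewrite(const struct pnfs_page *pages, int npages,
                                 const unsigned char *commit_verf)
{
    int dirty = 0;
    for (int i = 0; i < npages; i++)
        if (memcmp(pages[i].verf, commit_verf, NFS4_VERIFIER_SIZE) != 0)
            dirty++;
    return dirty;
}
```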
<br />
== Callback Service Operations ==<br />
=== <font color="red">CB_LAYOUTRECALL</font>===<br />
* <font color="red">Forgetful client behavior</font><br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** <font color="purple">LAYOUTRECALL4_FSID</font><br />
** <font color="purple">LAYOUTRECALL4_ALL</font><br />
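A sketch of the forgetful-model recall behavior above (the error value is per RFC 5661; the handler shape and the have_layout flag are illustrative):<br />

```c
#include <assert.h>

#define NFS4ERR_NOMATCHING_LAYOUT 10060

/* Forgetful-model CB_LAYOUTRECALL: drop any cached layout state locally
 * and tell the server there is nothing to return (12.5.5.1), so no
 * explicit LAYOUTRETURN is needed. */
static int cb_layoutrecall_forgetful(int *have_layout)
{
    *have_layout = 0;                 /* discard local layout state */
    return NFS4ERR_NOMATCHING_LAYOUT; /* always, per the forgetful model */
}
```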
<br />
=== <font color="red">CB_RECALL_ANY (Req)</font>===<br />
* <font color="red">Client issues LAYOUTRETURN(ALL) due to forgetful client model</font><br />
<br />
=== <font color="green">CB_RECALLABLE_OBJ_AVAIL</font>===<br />
* <font color="red">Set loga_signal_layout_avail on LAYOUTGET to FALSE</font><br />
<br />
=== <font color="green">CB_NOTIFY_DEVICEID (Opt)</font>===<br />
* <font color="red">Indicate no interest in notification</font><br />
* <font color="purple">Detect race with GETDEVICEINFO</font><br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== <font color="green">CB_WANTS_CANCELLED (Req)</font>===<br />
* <font color="red">Specify no interest if needed</font><br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== <font color="green">SECINFO_NO_NAME</font>===<br />
<br />
=== I/O ===<br />
* <font color="red">Review Data distribution algorithm: (which DS, offset, length)</font><br />
* <font color="red">Sparse</font><br />
* <font color="green">Dense</font><br />
** <font color="red">Stash existing code</font><br />
* WRITE<br />
** <font color="red">Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data</font><br />
*** How is it that the files layout does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** <font color="red">Zero byte & EOF handling on reads with holes handled locally (13.10)</font><br />
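The data distribution algorithm for the files layout maps a file offset to a stripe index (which DS) and a per-DS offset; sparse and dense addressing differ only in the per-DS offset. A sketch following RFC 5661 section 13.4 (function names are illustrative):<br />

```c
#include <assert.h>
#include <stdint.h>

/* Which DS (index into the device's stripe list) serves this offset. */
static uint32_t stripe_index(uint64_t offset, uint32_t stripe_unit,
                             uint32_t stripe_count)
{
    return (uint32_t)((offset / stripe_unit) % stripe_count);
}

/* Sparse addressing: the DS file is addressed with the file's offset. */
static uint64_t ds_offset_sparse(uint64_t offset)
{
    return offset;
}

/* Dense addressing: each DS file is compacted, skipping other stripes. */
static uint64_t ds_offset_dense(uint64_t offset, uint32_t stripe_unit,
                                uint32_t stripe_count)
{
    uint64_t stripe_no = offset / stripe_unit;
    return (stripe_no / stripe_count) * stripe_unit + offset % stripe_unit;
}
```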
<br />
=== COMMIT ===<br />
* <font color="red">Commit through MDS</font><br />
* <font color="red">Commit through DS</font><br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** <font color="green">layout_hint</font><br />
** <font color="purple">layout_type</font><br />
** <font color="purple">mdsthreshold</font><br />
** <font color="red">fs_layout_type</font><br />
** <font color="purple">layout_alignment</font><br />
** <font color="purple">layout_blksize</font><br />
<br />
== Locking ==<br />
* <font color="purple">Mandatory Locking</font> <br />
** Use Lock StateID <br />
** <font color="purple">Handle NFS4ERR_LOCKED</font> Check with Windows (Tom Talpey) to see if there's a server in the future<br />
<br />
== Error Handling ==<br />
* <font color="red">Handle I/O errors due to fencing</font><br />
* <font color="red">Due to Layout Revocation</font><br />
* <font color="red">NFS4ERR_GRACE handling</font><br />
* <font color="red">State recovery through the State Manager only</font><br />
** Recover state and mark for I/O through the MDS, for example<br />
* When do we retry again to the DS<br />
** <font color="red">Retry pNFS on remount</font><br />
** <font color="purple">Timer?</font><br />
** <font color="purple">Clear error state once there are no more dirty pages?</font><br />
** <font color="red">Fail to MDS on first error - keep it simple</font><br />
** <font color="purple">Retry pNFS after X condition/time</font><br />
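The "keep it simple" option above - fail over to the MDS on the first DS error and retry pNFS only on remount - can be sketched as (illustrative only, not kernel code):<br />

```c
#include <assert.h>

/* Hypothetical per-mount pNFS I/O routing state. */
struct pnfs_io_state {
    int use_mds;   /* 1 = route all I/O through the MDS */
};

/* On the first DS error, fall back to the MDS for good; only a fresh
 * mount (which starts with use_mds == 0) retries pNFS. */
static int choose_path_mds(struct pnfs_io_state *s, int ds_error)
{
    if (ds_error)
        s->use_mds = 1;
    return s->use_mds;
}
```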
<br />
== Security ==<br />
* <font color="red">DS ACL related errors?</font><br />
<br />
== Multiple Layout Type Support ==<br />
* <font color="green">Different Layout types for different files</font><br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** <font color="red">Write through MDS</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
=== <font color="green">Lease Move (11.7.7.1) (Low Priority)</font>===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* <font color="red">Handle fencing error</font><br />
<br />
=== Metadata Server Restart ===<br />
* <font color="red">SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BADSESSION/ NFS4ERR_STALE_CLIENTID</font><br />
* Server out of Grace<br />
** <font color="red">I/O through MDS</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
* Server in Grace<br />
** <font color="red">LAYOUTCOMMIT in reclaim mode</font><br />
** <font color="purple">Redo Session/Layout setup, reissue I/O to DSs</font><br />
<br />
== Data Server Multipathing (13.5) ==<br />
* <font color="purple">Bandwidth Scaling</font><br />
** <font color="green">Session Trunking</font><br />
* Higher Availability<br />
** <font color="purple">multipath_list4</font><br />
** <font color="purple">Replacement DeviceID-to-Device address mapping</font><br />
* <font color="purple">Replacement DeviceID</font><br />
<br />
== IPv6 ==</div>
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BAD_SESSION/ NFS4_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUT_COMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/User:Peterhoneyman/sandboxUser:Peterhoneyman/sandbox2010-03-04T14:35:20Z<p>Peterhoneyman: /* DeviceID Management */</p>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="purple">Issues labeled in purple can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font><br />
* <font color="red">Review impact to struct nfs_server</font> Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font><br />
** <font color="red">Black list the layout module so that capability is not available</font><br />
** <font color="purple">Disable pNFS per mount</font><br />
** <font color="green">Define I/O threshold to override attributes and other policy on the client</font><br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font><br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font><br />
** <font color="green">Multiple layouts per filesystem</font><br />
* <font color="red">Data should survive data server filehandle invalidation</font><br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font><br />
* <font color="purple">Support Direct I/O</font><br />
** Consult with the list: is there customer demand, or can this be held off until after the first integration?<br />
** Dean can volunteer to implement. It shares the same RPC calls as buffered I/O - the callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font><br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font><br />
*** <font color="purple">Reuse DS clientid/session if we already have one</font><br />
* <font color="red">Remove PNFS_CONFIG Flag</font><br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* <font color="red">Add, Remove, Locate</font><br />
** <font color="purple">Policy to prune unused device info (elevate?)</font><br />
** <font color="red">Umount should clean device table</font><br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** <font color="red">Careful handling of lease renewals</font><br />
* <font color="red">DeviceInfo Mappings</font><br />
* <font color="purple">Multipath support for each DS</font><br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** <font color="red">Give up and I/O through MDS</font><br />
*** <font color="purple">Reattempt through DS?</font><br />
**** Revisit when generic support for replicated servers is implemented<br />
* Recalls (See callbacks)<br />
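A minimal user-space sketch of the add/remove/locate operations above. All names here (struct ds_device, device_add, and so on) are hypothetical, not the kernel's; a real implementation would hash and reference-count entries, and - per the note above - scope them to the clientid/layout type rather than flushing per filesystem:<br />

```c
#include <string.h>
#include <stdlib.h>

#define NFS4_DEVICEID4_SIZE 16

struct ds_device {
    unsigned char id[NFS4_DEVICEID4_SIZE]; /* opaque deviceid4 */
    struct ds_device *next;
};

static struct ds_device *dev_table;      /* single list; a real table would hash */

/* Locate a cached device by deviceid; returns NULL on a miss. */
struct ds_device *device_locate(const unsigned char *id)
{
    struct ds_device *d;
    for (d = dev_table; d; d = d->next)
        if (memcmp(d->id, id, NFS4_DEVICEID4_SIZE) == 0)
            return d;
    return NULL;
}

/* Add a device (idempotent: returns the existing entry on a hit). */
struct ds_device *device_add(const unsigned char *id)
{
    struct ds_device *d = device_locate(id);
    if (d)
        return d;
    d = calloc(1, sizeof(*d));
    memcpy(d->id, id, NFS4_DEVICEID4_SIZE);
    d->next = dev_table;
    dev_table = d;
    return d;
}

/* Remove an entry, e.g. on NOTIFY_DEVICEID4_DELETE. */
void device_remove(const unsigned char *id)
{
    struct ds_device **p = &dev_table;
    while (*p) {
        if (memcmp((*p)->id, id, NFS4_DEVICEID4_SIZE) == 0) {
            struct ds_device *d = *p;
            *p = d->next;
            free(d);
            return;
        }
        p = &(*p)->next;
    }
}
```

A pruning policy for unused entries would layer on top of locate/remove.<br />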
<br />
=== State/connection management ===<br />
* Discuss with server implementers about need for state renewal daemon on DS (A)<br />
** Is there really a need to keep the lease alive? Can we get away without renewing it per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** Return layouts if they have not been used within certain time to avoid running out of state on server (B)<br />
* Caching beyond CLOSE (B)<br />
* Whole file layouts (A)<br />
* Segment layouts (B?)<br />
** Merge Overlapping Layouts (B)<br />
*** Revisit when we study the layout design<br />
* Should allow layouts of differing iomode for the same range (A)<br />
* Stateid/Seqid management<br />
** OLD and BAD stateid error handling in layout operations (A)<br />
* Check current Referring Tuple Handling works with pNFS callbacks (A)<br />
<br />
=== Interaction with Delegations (A)===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations (A)<br />
** If the client doesn't specify pNFS and the server does, the client must not use it (A)<br />
* Remember the server response to determine:<br />
** Whether we need to send GETATTR asking for the layout type (A)<br />
** Whether to specify a layout hint during create (Priority?)<br />
* EXCHGID4_FLAG4_BIND_PRINC_STATEID (C)<br />
* Separate nfs_client for MDS/DS dual personality (A)<br />
** Make sure the client owner is different for each<br />
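The flag-combination handling above can be sketched as a small validity check. The flag values are the ones assigned in RFC 5661; the helper name and the choice to treat only EXCHGID4_FLAG_USE_PNFS_MDS as "client may do pNFS I/O" are this sketch's assumptions:<br />

```c
#include <stdint.h>

#define EXCHGID4_FLAG_USE_NON_PNFS  0x00010000
#define EXCHGID4_FLAG_USE_PNFS_MDS  0x00020000
#define EXCHGID4_FLAG_USE_PNFS_DS   0x00040000

/* Returns 1 if the server's EXCHANGE_ID role flags are a legal
 * combination, and sets *use_pnfs when the client may request layouts
 * from this server.  Returns 0 for illegal combinations. */
int check_pnfs_roles(uint32_t flags, int *use_pnfs)
{
    uint32_t roles = flags & (EXCHGID4_FLAG_USE_NON_PNFS |
                              EXCHGID4_FLAG_USE_PNFS_MDS |
                              EXCHGID4_FLAG_USE_PNFS_DS);

    if (roles == 0)
        return 0;                       /* server must claim some role */
    if ((roles & EXCHGID4_FLAG_USE_NON_PNFS) &&
        (roles & EXCHGID4_FLAG_USE_PNFS_MDS))
        return 0;                       /* mutually exclusive roles */

    /* Layouts only make sense when the server acts as an MDS. */
    *use_pnfs = (roles & EXCHGID4_FLAG_USE_PNFS_MDS) != 0;
    return 1;
}
```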
<br />
=== GETDEVICEINFO (A)===<br />
* Request Device notifications (B)<br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* Determine best GETDEVICEINFO_ARGS gdia_maxcount limits (A)<br />
** XDR across page boundaries is problematic today but should be addressed (A?)<br />
* Handle NFS4ERR_TOOSMALL (A)<br />
** Turn off pNFS (A)<br />
* Determine where to invoke it<br />
** Invoke from the state manager (A)<br />
<br />
=== GETDEVICELIST (Opt) (C)===<br />
<br />
=== LAYOUTGET (A)===<br />
* Determine where to invoke it (A)<br />
** Acquire the layout as close as possible to the actual I/O?<br />
** For the files layout, getting the layout at open makes sense - good enough reason to have it there as well?<br />
** Minimize sprinkling pNFS calls throughout the code path (A)<br />
** Minimize the number of layout references/dereferences (layout gets per I/O) (A)<br />
** read, write, mmap, splice_read, splice_write?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** Specify a smart minimum and a reasonable size (A)<br />
** nfs_wait_on_sequence to serialize the gets, returns, and recalls (B)<br />
* Support layout range that does not match request (A)<br />
* Forgetful Model (12.5.5.1) (A)<br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** I/O through MDS (A)<br />
** Timer to retry layout (B?)<br />
** Mark inode to not request layout until all dirty pages are flushed (B?)<br />
* Handle NFS4ERR_RECALLCONFLICT and NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFS4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* Obey stripe unit size and commit through MDS bits (A)<br />
* FileHandle Determination (13.3)<br />
** DS Filehandle same as MDS (A)<br />
** Same DS Filehandle for every data server (A)<br />
*** Not sure if we handle it<br />
** Unique Filehandle for each data server (A)<br />
* Specify intended IO Mode in Layout (A)<br />
* More than one striping pattern: logr_layout array > 1 (B)<br />
* Able to handle different iomode from what was requested (A)<br />
* Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3) (A)<br />
* Obey logr_return_on_close (A?) XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** What's the implication on the forgetful model (A)<br />
* Layout read(write)-ahead (B)<br />
** Files Layout will request entire file (A)<br />
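One way to organize the error handling listed above is a single triage function mapping LAYOUTGET errors to client actions. The enum values and the exact mapping below are illustrative placeholders (this page leaves several priorities open), not the kernel's constants:<br />

```c
/* Client-side disposition of a failed LAYOUTGET. */
enum lg_action {
    LG_IO_THROUGH_MDS,   /* fall back to the MDS for this I/O */
    LG_RETRY_LATER,      /* transient condition: retry, e.g. on a timer */
    LG_DISABLE_PNFS,     /* stop requesting layouts on this inode/fs */
};

/* Placeholder stand-ins for the NFS4ERR_* codes named above. */
enum lg_error {
    LGERR_RECALLCONFLICT,
    LGERR_GRACE,
    LGERR_LAYOUTTRYLATER,
    LGERR_TOOSMALL,
    LGERR_LAYOUTUNAVAILABLE,
    LGERR_UNKNOWN_LAYOUTTYPE,
    LGERR_BADIOMODE,
    LGERR_OTHER,
};

enum lg_action layoutget_triage(enum lg_error err)
{
    switch (err) {
    case LGERR_RECALLCONFLICT:
    case LGERR_GRACE:
    case LGERR_LAYOUTTRYLATER:
        return LG_RETRY_LATER;      /* server says try again later */
    case LGERR_LAYOUTUNAVAILABLE:
    case LGERR_UNKNOWN_LAYOUTTYPE:
        return LG_DISABLE_PNFS;     /* server will not give us layouts */
    default:
        return LG_IO_THROUGH_MDS;   /* keep it simple: go to the MDS */
    }
}
```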
<br />
=== LAYOUTCOMMIT (A)===<br />
* Include last_write_offset, offset, length (A)<br />
* Include mtime (C)<br />
** getattr after LAYOUTCOMMIT to update cached attributes (A)<br />
* Keep layoutcommit data until the return value is received so that the request can be reissued, e.g. in case of NFS4ERR_GRACE<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* Determine where to invoke it (A?)<br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* Support sub-range layouts (A)<br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* Recover from MDS reboot (A)<br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** Check we have a layout and correct I/O mode before issuing layoutcommit (A)<br />
** Fred’s bug: a hole in the layout range (B) - subset of layout segments<br />
* Handle NFS4ERR_RECLAIM_BAD (A)<br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== LAYOUTRETURN (A)===<br />
* Forgetful Model (A)<br />
* On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
* On CB_RECALL_ANY return LAYOUTRETURN4_ALL (A)<br />
* Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1) (C)<br />
* Return full range specified by the layout recall (12.5.5.1) (C)<br />
* Ability to return chunks of layouts for huge files to show progress (C)<br />
* Return entire range layout as final LAYOUTRETURN (C)<br />
* Return NFS4ERR_NOMATCHING_LAYOUT if none is found (C)<br />
* Bulk Return (C)<br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** sync with nfs_wait_on_sequence() (C)<br />
*** The seqid affinity is associated with the filehandle<br />
* Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4) (C)<br />
** Forgetful model: always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
** Serialization later (C)<br />
** Return NFS4ERR_DELAY? (B)<br />
* Error Recovery (A)<br />
** Handle NFS4ERR_OLD_STATEID<br />
** Handle NFS4ERR_BAD_STATEID (C) stateid's seqid()<br />
** Handle NFS4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* Error fallback on I/O error (A)<br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== SECINFO_NO_NAME (Req) (C) === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* LayoutHint attribute (C)<br />
** Need to define a user/programmable interface? (C)<br />
* GETATTR follows OPEN to determine layout type (C)<br />
* Support GUARDED during create (A)<br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* Compare commit verifier to each of the DS write verifiers (B) XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* Keep data until the return value is received so that the request can be reissued in case of error (A)<br />
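A sketch of the per-page verifier comparison: each page remembers the verifier from the DS WRITE reply that covered it, and the COMMIT reply's verifier is compared against each one; a mismatch means the DS rebooted between WRITE and COMMIT, so that page must be resent. The struct layout is hypothetical, not the kernel's page tracking:<br />

```c
#include <string.h>

#define NFS4_VERIFIER_SIZE 8

struct page_state {
    unsigned char write_verf[NFS4_VERIFIER_SIZE]; /* from the DS WRITE reply */
    int needs_resend;
};

/* Compare the COMMIT verifier to each page's WRITE verifier; mark
 * mismatches for resend and return how many pages must be rewritten. */
int commit_check(struct page_state *pages, int npages,
                 const unsigned char *commit_verf)
{
    int i, resend = 0;
    for (i = 0; i < npages; i++) {
        if (memcmp(pages[i].write_verf, commit_verf,
                   NFS4_VERIFIER_SIZE) != 0) {
            pages[i].needs_resend = 1;
            resend++;
        }
    }
    return resend;
}
```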
<br />
== Callback Service Operations ==<br />
=== CB_LAYOUTRECALL (A)===<br />
* Forgetful client behavior (A)<br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** LAYOUTRECALL4_FSID (B)<br />
** LAYOUTRECALL4_ALL (B)<br />
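The forgetful-client reply can be sketched in a few lines: invalidate the local layout (if any) and always answer NFS4ERR_NOMATCHING_LAYOUT (value per RFC 5661), so no LAYOUTRETURN is ever owed. The struct layout type here is a stand-in for the real layout cache:<br />

```c
#define NFS4ERR_NOMATCHING_LAYOUT 10060  /* per RFC 5661 */

struct layout { int in_use; };           /* stand-in for the layout cache */

/* Forgetful-client CB_LAYOUTRECALL handler (12.5.5.1). */
int cb_layoutrecall(struct layout *lo)
{
    if (lo)
        lo->in_use = 0;  /* forget it: mark invalid so no new I/O uses it */

    /* Whether or not we held a matching layout, answer NOMATCHING so the
     * server can reclaim its state without waiting for a LAYOUTRETURN. */
    return NFS4ERR_NOMATCHING_LAYOUT;
}
```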
<br />
=== CB_RECALL_ANY (Req) (A)===<br />
* Client issues LAYOUTRETURN(ALL) due to forgetful client model (A)<br />
<br />
=== CB_RECALLABLE_OBJ_AVAIL (C)===<br />
* Set loga_signal_layout_avail on LAYOUTGET to FALSE (A)<br />
<br />
=== CB_NOTIFY_DEVICEID (Opt) (C)===<br />
* Indicate no interest in notification (A)<br />
* Detect race with GETDEVICEINFO (B)<br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== CB_WANTS_CANCELLED (Req) (C)===<br />
* Specify no interest if needed (A)<br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== SECINFO_NO_NAME (C)===<br />
<br />
=== I/O ===<br />
* Review Data distribution algorithm: (which DS, offset, length) (A)<br />
* Sparse (A)<br />
* Dense (C)<br />
** Stash existing code (A)<br />
* WRITE<br />
** Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data (A)<br />
*** How is it that the files layout does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** Zero byte & EOF handling on reads with holes handled locally (13.10) (A)<br />
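The sparse/dense data-distribution math to review above follows RFC 5661 section 13.4.4; this sketch ignores first_stripe_index and the pattern offset for brevity:<br />

```c
#include <stdint.h>

/* Which stripe (and hence which DS slot) a file offset falls in. */
uint32_t stripe_index(uint64_t offset, uint32_t unit, uint32_t count)
{
    return (uint32_t)((offset / unit) % count);
}

/* Sparse packing: the DS file is addressed with the MDS file offset. */
uint64_t sparse_ds_offset(uint64_t offset)
{
    return offset;
}

/* Dense packing: each DS file holds only its own stripes, packed
 * contiguously, so the file offset must be compressed. */
uint64_t dense_ds_offset(uint64_t offset, uint32_t unit, uint32_t count)
{
    return unit * (offset / ((uint64_t)unit * count)) + offset % unit;
}
```

For example, with a 4 KB stripe unit over 4 DSs, file offset 16384 lands back on stripe index 0, at DS offset 16384 under sparse packing but 4096 under dense packing.<br />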
<br />
=== COMMIT ===<br />
* Commit through MDS (A)<br />
* Commit through DS (A)<br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** layout_hint (C)<br />
** layout_type (B)<br />
** mdsthreshold (B)<br />
** fs_layout_type (A)<br />
** layout_alignment (B)<br />
** layout_blksize (B)<br />
<br />
== Locking ==<br />
* Mandatory Locking (B) <br />
** Use Lock StateID <br />
** Handle NFS4ERR_LOCKED (B) Check with Windows (Tom Talpey) to see if there’s a server in the future<br />
<br />
== Error Handling ==<br />
* Handle I/O errors due to fencing (A)<br />
* Due to Layout Revocation (A)<br />
* NFS4ERR_GRACE handling (A)<br />
* State recovery through the State Manager only (A)<br />
** Recover state and, for example, mark the inode for I/O through the MDS<br />
* When do we retry again to the DS<br />
** Retry pNFS on remount (A)<br />
** Timer? (B)<br />
** Clear error state once there are no more dirty pages? (B)<br />
** Fail to MDS on first error - keep it simple (A)<br />
** Retry pNFS after X condition/time (B)<br />
<br />
== Security ==<br />
* DS ACL related errors? (A)<br />
<br />
== Multiple Layout Type Support ==<br />
* Different Layout types for different files (C)<br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** Write through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BADSESSION/ NFS4ERR_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUTCOMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/User:Peterhoneyman/sandboxUser:Peterhoneyman/sandbox2010-03-04T14:32:47Z<p>Peterhoneyman: /* Data Structure Integration */</p>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="purple">Issues labeled in purple can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font><br />
* <font color="red">Review impact to struct nfs_server</font> Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font><br />
** <font color="red">Black list the layout module so that capability is not available</font><br />
** <font color="purple">Disable pNFS per mount</font><br />
** <font color="green">Define I/O threshold to override attributes and other policy on the client</font><br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font><br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font><br />
** <font color="green">Multiple layouts per filesystem</font><br />
* <font color="red">Data should survive data server filehandle invalidation</font><br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font><br />
* <font color="purple">Support Direct I/O</font><br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font><br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font><br />
*** <font color="purple">Reuse DS clientid/session if we already have one</font><br />
* <font color="red">Remove PNFS_CONFIG Flag</font><br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* Add, Remove, Locate (A)<br />
** Policy to prune unused device info (B+)<br />
** Umount should clean device table (A)<br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** Careful handling of lease renewals (A)<br />
* DeviceInfo Mappings (A)<br />
* Multipath support for each DS (B)<br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** Give up and I/O through MDS (A)<br />
*** Reattempt through DS? (B)<br />
**** Revisit when generic support fort replicated server<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* Discuss with server implementers about need for state renewal daemon on DS (A)<br />
** Is there really a need to keep the lease alive? Can we get away without renewed per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** Return layouts if they have not been used within certain time to avoid running out of state on server (B)<br />
* Caching beyond CLOSE (B)<br />
* Whole file layouts (A)<br />
* Segment layouts (B?)<br />
** Merge Overlapping Layouts (B)<br />
*** Revisit when we study the layout design<br />
* Should allow layouts of differing iomode for the same range (A)<br />
* Stateid/Seqid management<br />
** OLD and BAD stateid error handling in layout operations (A)<br />
* Check current Referring Tuple Handling works with pNFS callbacks (A)<br />
<br />
=== Interaction with Delegations (A)===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations (A)<br />
** If client doesn't specify pNFS and server does, client needs to not do it (A)<br />
* Remember server response to determine:<br />
** If we need to send GETATTR asking for layout type (A)<br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* EXCHGID4_FLAG4_BIND_PRINC_STATEID (C)<br />
* Separate nfs_client for MDS/DS dual personality (A)<br />
** Make sure the client owner is different for each<br />
<br />
=== GETDEVICEINFO (A)===<br />
* Request Device notifications (B)<br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* Determine best GETDEVICEINFO_ARGS gdia_maxcount limits (A)<br />
** XDR across page boundaries is problematic today but should be addressed (A?)<br />
* Handle NFS4ERR_TOOSMALL (A)<br />
** Turn off pNFS (A)<br />
* Determine where to invoke it<br />
** Invoke from the state manager (A)<br />
<br />
=== GETDEVICELIST (Opt) (C)===<br />
<br />
=== LAYOUTGET (A)===<br />
* Determine where to invoke it (A)<br />
** Acquire layout as close to the actual I/O?<br />
** For files layout layout at open makes sense - good enough reason to have it as well? <br />
** Minimize sprinkling pNFS calls throughout the call (A)<br />
** Minimize number of layout reference/ dereference (number of layout gets per I/O) (A)<br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** Specify smart minimum and a reasonable size (A)<br />
** nfs_wait_on_sequence to serialize the gets, returns, and recalls (B)<br />
* Support layout range that does not match request (A)<br />
* Forgetful Model (12.5.5.1) (A)<br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** I/O through MDS (A)<br />
** Timer to retry layout (B?)<br />
** Mark inode to not request layout until all dirty pages are flushed (B?)<br />
* Handle NFS4ERR_RECALLCONFLICT AND NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFs4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* Obey stripe unit size and commit through MDS bits (A)<br />
* FileHandle Determination (13.3)<br />
** DS Filehandle same as MDS (A)<br />
** Same DS Filehandle for every data server (A)<br />
*** Not sure if we handle it<br />
** Unique Filehandle for each data server (A)<br />
* Specify intended IO Mode in Layout (A)<br />
* More than one striping pattern: logr_layout array > 1 (B)<br />
* Able to handle different iomode from what was requested (A)<br />
* Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3) (A)<br />
* Obey logr_return_on_close (A?) XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** What's the implication on the forgetful model (A)<br />
* Layout read(write)-ahead (B)<br />
** Files Layout will request entire file (A)<br />
<br />
=== LAYOUTCOMMIT (A)===<br />
* Include last_write_offset, offset, length (A)<br />
* Include mtime (C)<br />
** getattr after LAYOUTCOMMIT to update cached attributes (A)<br />
* Keep layoutcommit data until return value is received so that you can reissue request in case of GRACE for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* Determine where to invoke it (A?)<br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* Support sub-range layouts (A)<br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* Recover from MDS reboot (A)<br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** Check we have a layout and correct I/O mode before issuing layoutcommit (A)<br />
** Fred’s bug of hole in the layout range (B) Subset of layout segments<br />
* Handle NFS4ERR_RECLAIM_BAD (A)<br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== LAYOUTRETURN (A)===<br />
* Forgetful Model (A)<br />
* On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
* On CB_RECALL_ANY return LAYOUTRETURN4_ALL (A)<br />
* Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1) (C)<br />
* Return full range specified by the layout recall (12.5.5.1) (C)<br />
* Ability to return chunks of layouts for huge files to show progress (C)<br />
* Return entire range layout as final LAYOUTRETURN (C)<br />
* Return NFS4ERR_NOMATCHING_LAYOUT if none is found (C)<br />
* Bulk Return (C)<br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** sync with nfs_wait_on_sequence() (C)<br />
*** The seqid affinity is associated with the filehandle<br />
* Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4) (C)<br />
** Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
** Serialization later (C)<br />
** Return NFS4ERR_DELAY?(B)<br />
* Error Recovery (A)<br />
** Handle NFS4ERR_OLD_STATEID<br />
** Handle NFS4ERR_BAD_STATEID (C) stateid's seqid()<br />
** Handle NFs4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* Error fallback on I/O error (A)<br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== SECINFO_NO_NAME (Req) (C) === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* LayoutHint attribute (C)<br />
** Need to define a user/programmable interface? (C)<br />
* GETATTR follows OPEN to determine layout type (C)<br />
* Support GUARDED during create (A)<br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* Compare commit verifier to each of the DS write verifiers (B) XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* Keep data until return value is received so that you can reissue request in case error (A)<br />
<br />
== Callback Service Operations ==<br />
=== CB_LAYOUTRECALL (A)===<br />
* Forgetful client behavior (A)<br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** LAYOUTRECALL4_FSID (B)<br />
** LAYOTURECALL4_ALL (B)<br />
<br />
=== CB_RECALL_ANY (Req) (A)===<br />
* Client issues LAYOUTRETURN(ALL) due to forgetful client model (A)<br />
<br />
=== CB_RECALLABLE_OBJ_AVAIL (C)===<br />
* Set loga_signal_layout_avail on LAYOUTGET to FALSE (A)<br />
<br />
=== CB_NOTIFY_DEVICEID (Opt) (C)===<br />
* Indicate no interest in notification (A)<br />
* Detect race with GETDEVICE_INFO (B)<br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== CB_WANTS_CANCELLED (Req) (C)===<br />
* Specify no interest if needed (A)<br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== SECINFO_NO_NAME (C)===<br />
<br />
=== I/O ===<br />
* Review Data distribution algorithm: (which DS, offset, length) (A)<br />
* Sparse (A)<br />
* Dense (C)<br />
** Stash existing code (A)<br />
* WRITE<br />
** Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data (A)<br />
*** How is it that files does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** Zero byte & EOF handling on reads with holes handled locally (13.10) (A)<br />
<br />
=== COMMIT ===<br />
* Commit through MDS (A)<br />
* Commit through DS (A)<br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** layout_hint (C)<br />
** layout_type (B)<br />
** mdsthreshold (B)<br />
** fs_layout_type (A)<br />
** layout_alignment (B)<br />
** layout_blksize (B)<br />
<br />
== Locking ==<br />
* Mandatory Locking (B) <br />
** Use Lock StateID <br />
** Handle NFS4ERR_LOCKED (B); check with Tom Talpey whether a future Windows server will return it<br />
<br />
== Error Handling ==<br />
* Handle I/O errors due to fencing (A)<br />
* Due to Layout Revocation (A)<br />
* NFS4ERR_GRACE handling (A)<br />
* State recovery through the State Manager only (A)<br />
** Recover state and, for example, mark the inode for I/O through the MDS<br />
* When do we retry I/O to the DS?<br />
** Retry pNFS on remount (A)<br />
** Timer? (B)<br />
** Clear error state once there are no more dirty pages? (B)<br />
** Fail to MDS on first error - keep it simple (A)<br />
** Retry pNFS after X condition/time (B)<br />
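The "keep it simple" policy above (fail to MDS on first error, retry pNFS on remount) can be sketched as a per-inode flag; the names are illustrative, not kernel symbols:<br />

```c
#include <assert.h>

struct pnfs_inode { int io_through_mds; };

/* On the first DS I/O error, route all further I/O for this inode
 * through the MDS and stay there. */
static void on_ds_io_error(struct pnfs_inode *ino)
{
    ino->io_through_mds = 1;
}

/* The flag is only cleared on remount, giving pNFS another chance. */
static void on_remount(struct pnfs_inode *ino)
{
    ino->io_through_mds = 0;
}

static int use_pnfs(const struct pnfs_inode *ino)
{
    return !ino->io_through_mds;
}
```

The timer-based and dirty-page-based retries listed as (B) would replace on_remount() with a different clearing condition.<br />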
<br />
== Security ==<br />
* DS ACL related errors? (A)<br />
<br />
== Multiple Layout Type Support ==<br />
* Different Layout types for different files (C)<br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** Write through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BADSESSION/ NFS4ERR_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUTCOMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/User:Peterhoneyman/sandboxUser:Peterhoneyman/sandbox2010-03-04T14:28:01Z<p>Peterhoneyman: /* Legend */</p>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* <font color="purple">Issues labeled in purple can be deferred for now</font><br />
* <font color="green">Issues labeled in green can be deferred indefinitely</font><br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> (A) Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font> (A)<br />
* <font color="red">Review impact to struct nfs_server</font> (A) Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> (A) Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> (A) Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font> (A)<br />
** <font color="red">Black list the layout module so that capability is not available (A)</font><br />
** Disable pNFS per mount (B)<br />
** Define I/O threshold to override attributes and other policy on the client (C)<br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font> (A)<br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font> (A)<br />
** Multiple layouts per filesystem (C-)<br />
* <font color="red">Data should survive data server filehandle invalidation</font> (A)<br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font> (A)<br />
* Support Direct I/O (B?)<br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font> (A)<br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font> (A)<br />
*** Reuse DS clientid/session if we already have one (B)<br />
* <font color="red">Remove PNFS_CONFIG Flag</font> (A)<br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* Add, Remove, Locate (A)<br />
** Policy to prune unused device info (B+)<br />
** Umount should clean device table (A)<br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** Careful handling of lease renewals (A)<br />
* DeviceInfo Mappings (A)<br />
* Multipath support for each DS (B)<br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** Give up and I/O through MDS (A)<br />
*** Reattempt through DS? (B)<br />
**** Revisit when generic support fort replicated server<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* Discuss with server implementers about need for state renewal daemon on DS (A)<br />
** Is there really a need to keep the lease alive? Can we get away without renewed per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** Return layouts if they have not been used within certain time to avoid running out of state on server (B)<br />
* Caching beyond CLOSE (B)<br />
* Whole file layouts (A)<br />
* Segment layouts (B?)<br />
** Merge Overlapping Layouts (B)<br />
*** Revisit when we study the layout design<br />
* Should allow layouts of differing iomode for the same range (A)<br />
* Stateid/Seqid management<br />
** OLD and BAD stateid error handling in layout operations (A)<br />
* Check current Referring Tuple Handling works with pNFS callbacks (A)<br />
<br />
=== Interaction with Delegations (A)===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations (A)<br />
** If client doesn't specify pNFS and server does, client needs to not do it (A)<br />
* Remember server response to determine:<br />
** If we need to send GETATTR asking for layout type (A)<br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* EXCHGID4_FLAG4_BIND_PRINC_STATEID (C)<br />
* Separate nfs_client for MDS/DS dual personality (A)<br />
** Make sure the client owner is different for each<br />
<br />
=== GETDEVICEINFO (A)===<br />
* Request Device notifications (B)<br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* Determine best GETDEVICEINFO_ARGS gdia_maxcount limits (A)<br />
** XDR across page boundaries is problematic today but should be addressed (A?)<br />
* Handle NFS4ERR_TOOSMALL (A)<br />
** Turn off pNFS (A)<br />
* Determine where to invoke it<br />
** Invoke from the state manager (A)<br />
<br />
=== GETDEVICELIST (Opt) (C)===<br />
<br />
=== LAYOUTGET (A)===<br />
* Determine where to invoke it (A)<br />
** Acquire layout as close to the actual I/O?<br />
** For files layout layout at open makes sense - good enough reason to have it as well? <br />
** Minimize sprinkling pNFS calls throughout the call (A)<br />
** Minimize number of layout reference/ dereference (number of layout gets per I/O) (A)<br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** Specify smart minimum and a reasonable size (A)<br />
** nfs_wait_on_sequence to serialize the gets, returns, and recalls (B)<br />
* Support layout range that does not match request (A)<br />
* Forgetful Model (12.5.5.1) (A)<br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** I/O through MDS (A)<br />
** Timer to retry layout (B?)<br />
** Mark inode to not request layout until all dirty pages are flushed (B?)<br />
* Handle NFS4ERR_RECALLCONFLICT AND NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFs4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* Obey stripe unit size and commit through MDS bits (A)<br />
* FileHandle Determination (13.3)<br />
** DS Filehandle same as MDS (A)<br />
** Same DS Filehandle for every data server (A)<br />
*** Not sure if we handle it<br />
** Unique Filehandle for each data server (A)<br />
* Specify intended IO Mode in Layout (A)<br />
* More than one striping pattern: logr_layout array > 1 (B)<br />
* Able to handle different iomode from what was requested (A)<br />
* Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3) (A)<br />
* Obey logr_return_on_close (A?) XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** What's the implication on the forgetful model (A)<br />
* Layout read(write)-ahead (B)<br />
** Files Layout will request entire file (A)<br />
<br />
=== LAYOUTCOMMIT (A)===<br />
* Include last_write_offset, offset, length (A)<br />
* Include mtime (C)<br />
** getattr after LAYOUTCOMMIT to update cached attributes (A)<br />
* Keep layoutcommit data until return value is received so that you can reissue request in case of GRACE for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* Determine where to invoke it (A?)<br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* Support sub-range layouts (A)<br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* Recover from MDS reboot (A)<br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** Check we have a layout and correct I/O mode before issuing layoutcommit (A)<br />
** Fred’s bug of hole in the layout range (B) Subset of layout segments<br />
* Handle NFS4ERR_RECLAIM_BAD (A)<br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== LAYOUTRETURN (A)===<br />
* Forgetful Model (A)<br />
* On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
* On CB_RECALL_ANY return LAYOUTRETURN4_ALL (A)<br />
* Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1) (C)<br />
* Return full range specified by the layout recall (12.5.5.1) (C)<br />
* Ability to return chunks of layouts for huge files to show progress (C)<br />
* Return entire range layout as final LAYOUTRETURN (C)<br />
* Return NFS4ERR_NOMATCHING_LAYOUT if none is found (C)<br />
* Bulk Return (C)<br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** sync with nfs_wait_on_sequence() (C)<br />
*** The seqid affinity is associated with the filehandle<br />
* Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4) (C)<br />
** Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
** Serialization later (C)<br />
** Return NFS4ERR_DELAY?(B)<br />
* Error Recovery (A)<br />
** Handle NFS4ERR_OLD_STATEID<br />
** Handle NFS4ERR_BAD_STATEID (C) stateid's seqid()<br />
** Handle NFs4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* Error fallback on I/O error (A)<br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== SECINFO_NO_NAME (Req) (C) === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* LayoutHint attribute (C)<br />
** Need to define a user/programmable interface? (C)<br />
* GETATTR follows OPEN to determine layout type (C)<br />
* Support GUARDED during create (A)<br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* Compare commit verifier to each of the DS write verifiers (B) XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* Keep data until return value is received so that you can reissue request in case error (A)<br />
<br />
== Callback Service Operations ==<br />
=== CB_LAYOUTRECALL (A)===<br />
* Forgetful client behavior (A)<br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** LAYOUTRECALL4_FSID (B)<br />
** LAYOTURECALL4_ALL (B)<br />
<br />
=== CB_RECALL_ANY (Req) (A)===<br />
* Client issues LAYOUTRETURN(ALL) due to forgetful client model (A)<br />
<br />
=== CB_RECALLABLE_OBJ_AVAIL (C)===<br />
* Set loga_signal_layout_avail on LAYOUTGET to FALSE (A)<br />
<br />
=== CB_NOTIFY_DEVICEID (Opt) (C)===<br />
* Indicate no interest in notification (A)<br />
* Detect race with GETDEVICE_INFO (B)<br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== CB_WANTS_CANCELLED (Req) (C)===<br />
* Specify no interest if needed (A)<br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== SECINFO_NO_NAME (C)===<br />
<br />
=== I/O ===<br />
* Review Data distribution algorithm: (which DS, offset, length) (A)<br />
* Sparse (A)<br />
* Dense (C)<br />
** Stash existing code (A)<br />
* WRITE<br />
** Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data (A)<br />
*** How is it that files does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** Zero byte & EOF handling on reads with holes handled locally (13.10) (A)<br />
<br />
=== COMMIT ===<br />
* Commit through MDS (A)<br />
* Commit through DS (A)<br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** layout_hint (C)<br />
** layout_type (B)<br />
** mdsthreshold (B)<br />
** fs_layout_type (A)<br />
** layout_alignment (B)<br />
** layout_blksize (B)<br />
<br />
== Locking ==<br />
* Mandatory Locking (B) <br />
** Use Lock StateID <br />
** Handle NFS4ERR_LOCKED (B) Check with Windows (Tom Talpey) to see if there’s a server in the future<br />
<br />
== Error Handling ==<br />
* Handle I/O errors due to fencing (A)<br />
* Due to Layout Revocation (A)<br />
* NFS4ERR_GRACE handling (A)<br />
* State recovery through the State Manager only (A)<br />
** Recover state and mark as I/O for MDS for example<br />
* When do we retry again to the DS<br />
** Retry pNFS on remount (A)<br />
** Timer? (B)<br />
** Clear error state once there are no more dirty pages? (B)<br />
** Fail to MDS on first error - keep it simple (A)<br />
** Retry pNFS after X condition/time (B)<br />
<br />
== Security ==<br />
* DS ACL related errors? (A)<br />
<br />
== Multiple Layout Type Support ==<br />
* Different Layout types for different files (C)<br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** Write through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BAD_SESSION/ NFS4_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUT_COMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/User:Peterhoneyman/sandboxUser:Peterhoneyman/sandbox2010-03-04T14:25:03Z<p>Peterhoneyman: /* Legend */</p>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* <font color="red">Issues labeled in red need to be addressed as part of the minimum pNFS functionality patches</font><br />
* A (B) indicates the issue can be deferred for a subsequent wave of patches<br />
* A (C) indicates the issue can be indefinitely deferred as there is no clear requirement for it<br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* <font color="red">Review impact to struct nfs_client</font> (A) Batsakis<br />
** <font color="red">Ensure layouts are cleaned-up in the right order when the client is destroyed</font> (A)<br />
* <font color="red">Review impact to struct nfs_server</font> (A) Batsakis<br />
* <font color="red">Review impact to struct nfs4_session</font> (A) Batsakis<br />
* <font color="red">Determine if there is a need for the DS to have a struct nfs_server</font> (A) Batsakis<br />
* <font color="red">Ability to tell client not to use pNFS against a server which may support it</font> (A)<br />
** <font color="red">Black list the layout module so that capability is not available (A)</font><br />
** Disable pNFS per mount (B)<br />
** Define I/O threshold to override attributes and other policy on the client (C)<br />
* <font color="red">Layout Drivers should be automatically loaded (Using request module call)</font> (A)<br />
* Ability to have multiple layouts loaded<br />
** <font color="red">One layout type per filesystem</font> (A)<br />
** Multiple layouts per filesystem (C-)<br />
* <font color="red">Data should survive data server filehandle invalidation</font> (A)<br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** <font color="red">EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1)</font> (A)<br />
* Support Direct I/O (B?)<br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* <font color="red">Support Buffered I/O (Page based)</font> (A)<br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** <font color="red">Each personality with its own clientid and session</font> (A)<br />
*** Reuse DS clientid/session if we already have one (B)<br />
* <font color="red">Remove PNFS_CONFIG Flag</font> (A)<br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* Add, Remove, Locate (A)<br />
** Policy to prune unused device info (B+)<br />
** Umount should clean device table (A)<br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** Careful handling of lease renewals (A)<br />
* DeviceInfo Mappings (A)<br />
* Multipath support for each DS (B)<br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** Give up and I/O through MDS (A)<br />
*** Reattempt through DS? (B)<br />
**** Revisit when generic support fort replicated server<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* Discuss with server implementers about need for state renewal daemon on DS (A)<br />
** Is there really a need to keep the lease alive? Can we get away without renewed per DS?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** Return layouts if they have not been used within certain time to avoid running out of state on server (B)<br />
* Caching beyond CLOSE (B)<br />
* Whole file layouts (A)<br />
* Segment layouts (B?)<br />
** Merge Overlapping Layouts (B)<br />
*** Revisit when we study the layout design<br />
* Should allow layouts of differing iomode for the same range (A)<br />
* Stateid/Seqid management<br />
** OLD and BAD stateid error handling in layout operations (A)<br />
* Check current Referring Tuple Handling works with pNFS callbacks (A)<br />
<br />
=== Interaction with Delegations (A)===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations (A)<br />
** If client doesn't specify pNFS and server does, client needs to not do it (A)<br />
* Remember server response to determine:<br />
** If we need to send GETATTR asking for layout type (A)<br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* EXCHGID4_FLAG4_BIND_PRINC_STATEID (C)<br />
* Separate nfs_client for MDS/DS dual personality (A)<br />
** Make sure the client owner is different for each<br />
<br />
=== GETDEVICEINFO (A)===<br />
* Request Device notifications (B)<br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* Determine best GETDEVICEINFO_ARGS gdia_maxcount limits (A)<br />
** XDR across page boundaries is problematic today but should be addressed (A?)<br />
* Handle NFS4ERR_TOOSMALL (A)<br />
** Turn off pNFS (A)<br />
* Determine where to invoke it<br />
** Invoke from the state manager (A)<br />
<br />
=== GETDEVICELIST (Opt) (C)===<br />
<br />
=== LAYOUTGET (A)===<br />
* Determine where to invoke it (A)<br />
** Acquire layout as close to the actual I/O?<br />
** For files layout layout at open makes sense - good enough reason to have it as well? <br />
** Minimize sprinkling pNFS calls throughout the call (A)<br />
** Minimize number of layout reference/ dereference (number of layout gets per I/O) (A)<br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** Specify smart minimum and a reasonable size (A)<br />
** nfs_wait_on_sequence to serialize the gets, returns, and recalls (B)<br />
* Support layout range that does not match request (A)<br />
* Forgetful Model (12.5.5.1) (A)<br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** I/O through MDS (A)<br />
** Timer to retry layout (B?)<br />
** Mark inode to not request layout until all dirty pages are flushed (B?)<br />
* Handle NFS4ERR_RECALLCONFLICT AND NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFs4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* Obey stripe unit size and commit through MDS bits (A)<br />
* FileHandle Determination (13.3)<br />
** DS Filehandle same as MDS (A)<br />
** Same DS Filehandle for every data server (A)<br />
*** Not sure if we handle it<br />
** Unique Filehandle for each data server (A)<br />
* Specify intended IO Mode in Layout (A)<br />
* More than one striping pattern: logr_layout array > 1 (B)<br />
* Able to handle different iomode from what was requested (A)<br />
* Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3) (A)<br />
* Obey logr_return_on_close (A?) XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** What's the implication on the forgetful model (A)<br />
* Layout read(write)-ahead (B)<br />
** Files Layout will request entire file (A)<br />
<br />
=== LAYOUTCOMMIT (A)===<br />
* Include last_write_offset, offset, length (A)<br />
* Include mtime (C)<br />
** getattr after LAYOUTCOMMIT to update cached attributes (A)<br />
* Keep layoutcommit data until return value is received so that you can reissue request in case of GRACE for example<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* Determine where to invoke it (A?)<br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* Support sub-range layouts (A)<br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* Recover from MDS reboot (A)<br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** Check we have a layout and correct I/O mode before issuing layoutcommit (A)<br />
** Fred’s bug of hole in the layout range (B) Subset of layout segments<br />
* Handle NFS4ERR_RECLAIM_BAD (A)<br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== LAYOUTRETURN (A)===<br />
* Forgetful Model (A)<br />
* On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
* On CB_RECALL_ANY return LAYOUTRETURN4_ALL (A)<br />
* Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1) (C)<br />
* Return full range specified by the layout recall (12.5.5.1) (C)<br />
* Ability to return chunks of layouts for huge files to show progress (C)<br />
* Return entire range layout as final LAYOUTRETURN (C)<br />
* Return NFS4ERR_NOMATCHING_LAYOUT if none is found (C)<br />
* Bulk Return (C)<br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** sync with nfs_wait_on_sequence() (C)<br />
*** The seqid affinity is associated with the filehandle<br />
* Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4) (C)<br />
** Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
** Serialization later (C)<br />
** Return NFS4ERR_DELAY?(B)<br />
* Error Recovery (A)<br />
** Handle NFS4ERR_OLD_STATEID<br />
** Handle NFS4ERR_BAD_STATEID (C) stateid's seqid()<br />
** Handle NFs4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* Error fallback on I/O error (A)<br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== SECINFO_NO_NAME (Req) (C) === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* LayoutHint attribute (C)<br />
** Need to define a user/programmable interface? (C)<br />
* GETATTR follows OPEN to determine layout type (C)<br />
* Support GUARDED during create (A)<br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* Compare commit verifier to each of the DS write verifiers (B) XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* Keep data until return value is received so that you can reissue request in case error (A)<br />
<br />
== Callback Service Operations ==<br />
=== CB_LAYOUTRECALL (A)===<br />
* Forgetful client behavior (A)<br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** LAYOUTRECALL4_FSID (B)<br />
** LAYOTURECALL4_ALL (B)<br />
<br />
=== CB_RECALL_ANY (Req) (A)===<br />
* Client issues LAYOUTRETURN(ALL) due to forgetful client model (A)<br />
<br />
=== CB_RECALLABLE_OBJ_AVAIL (C)===<br />
* Set loga_signal_layout_avail on LAYOUTGET to FALSE (A)<br />
<br />
=== CB_NOTIFY_DEVICEID (Opt) (C)===<br />
* Indicate no interest in notification (A)<br />
* Detect race with GETDEVICE_INFO (B)<br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== CB_WANTS_CANCELLED (Req) (C)===<br />
* Specify no interest if needed (A)<br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== SECINFO_NO_NAME (C)===<br />
<br />
=== I/O ===<br />
* Review Data distribution algorithm: (which DS, offset, length) (A)<br />
* Sparse (A)<br />
* Dense (C)<br />
** Stash existing code (A)<br />
* WRITE<br />
** Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data (A)<br />
*** How is it that files does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** Zero byte & EOF handling on reads with holes handled locally (13.10) (A)<br />
<br />
=== COMMIT ===<br />
* Commit through MDS (A)<br />
* Commit through DS (A)<br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** layout_hint (C)<br />
** layout_type (B)<br />
** mdsthreshold (B)<br />
** fs_layout_type (A)<br />
** layout_alignment (B)<br />
** layout_blksize (B)<br />
<br />
== Locking ==<br />
* Mandatory Locking (B) <br />
** Use Lock StateID <br />
** Handle NFS4ERR_LOCKED (B) Check with Windows (Tom Talpey) to see if there’s a server in the future<br />
<br />
== Error Handling ==<br />
* Handle I/O errors due to fencing (A)<br />
* Due to Layout Revocation (A)<br />
* NFS4ERR_GRACE handling (A)<br />
* State recovery through the State Manager only (A)<br />
** Recover state and mark as I/O for MDS for example<br />
* When do we retry again to the DS<br />
** Retry pNFS on remount (A)<br />
** Timer? (B)<br />
** Clear error state once there are no more dirty pages? (B)<br />
** Fail to MDS on first error - keep it simple (A)<br />
** Retry pNFS after X condition/time (B)<br />
<br />
== Security ==<br />
* DS ACL related errors? (A)<br />
<br />
== Multiple Layout Type Support ==<br />
* Different Layout types for different files (C)<br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** Write through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BADSESSION/ NFS4ERR_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUTCOMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>
<hr />
<div>Client pNFS Deliverables<br />
<br />
NOTE: Need to rename this page to: Client pNFS Requirements<br />
<br />
This document enumerates the pNFS functionality targeted for integration into the upstream Linux kernel. The first wave of patches will implement the minimum set of functionality required to support the Files Layout. These items are denoted as Priority A. Subsequent waves of patches will address functionality that builds on top of the minimum required set as well as implement additional Layout Types.<br />
<br />
== Legend ==<br />
Note: The labeling still needs to be reviewed by the v4.1 Linux community.<br />
* An (A) indicates the issue needs to be addressed as part of the minimum pNFS functionality patches<br />
* A (B) indicates the issue can be deferred for a subsequent wave of patches<br />
* A (C) indicates the issue can be indefinitely deferred as there is no clear requirement for it<br />
The priority list was initially reviewed during Connectathon 2010.<br />
<br />
== General ==<br />
=== Data Structure Integration ===<br />
* Review impact to struct nfs_client (A) Batsakis<br />
** Ensure layouts are cleaned-up in the right order when the client is destroyed (A)<br />
* Review impact to struct nfs_server (A) Batsakis<br />
* Review impact to struct nfs4_session (A) Batsakis<br />
* Determine if there is a need for the DS to have a struct nfs_server (A) Batsakis<br />
* Ability to tell client not to use pNFS against a server which may support it (A)<br />
** Black list the layout module so that capability is not available (A)<br />
** Disable pNFS per mount (B)<br />
** Define I/O threshold to override attributes and other policy on the client (C)<br />
* Layout Drivers should be automatically loaded (using the request_module() call) (A)<br />
* Ability to have multiple layouts loaded<br />
** One layout type per filesystem (A)<br />
** Multiple layouts per filesystem (C-)<br />
* Data should survive data server filehandle invalidation (A)<br />
** Client cache maps DS filehandle to MDS filehandle, and the MDS filehandle to cached data (13.1)<br />
* Lease timeout determination<br />
** EXCHGID4_FLAG_USE_PNFS_DS vs MDS or PNFS (13.1.1) (A)<br />
* Support Direct I/O (B?)<br />
** Consult with list, is there customer demand for holding off the first integration?<br />
** Dean can volunteer to implement. Shares same RPC calls as buffered I/O - callbacks are slightly different<br />
** Determine when to trigger the layoutget<br />
* Support Buffered I/O (Page based) (A)<br />
* Session Implications<br />
** Support dual DS/MDS Personality (13.1)<br />
*** Each personality with its own clientid and session (A)<br />
*** Reuse DS clientid/session if we already have one (B)<br />
* Remove PNFS_CONFIG Flag (A)<br />
** Check with Fedora<br />
*** As long as there is a way to specifically prevent the use of pNFS<br />
<br />
=== DeviceID Management ===<br />
* Add, Remove, Locate (A)<br />
** Policy to prune unused device info (B+)<br />
** Umount should clean device table (A)<br />
*** XXX Not sure this is correct, since the scope of a deviceID is the clientID/layouttype - not the filesystem<br />
*** Careful handling of lease renewals (A)<br />
* DeviceInfo Mappings (A)<br />
* Multipath support for each DS (B)<br />
** How does the MDS represent a DS with IPv4 and IPv6 addresses?<br />
** Revisit when generic support for replicated servers is implemented<br />
* Policy<br />
** What happens if the device is down?<br />
*** Give up and I/O through MDS (A)<br />
*** Reattempt through DS? (B)<br />
**** Revisit when generic support for replicated servers is implemented<br />
* Recalls (See callbacks)<br />
<br />
=== State/connection management ===<br />
* Discuss with server implementers about need for state renewal daemon on DS (A)<br />
** Is there really a need to keep the lease alive? Can we get away without per-DS renewal?<br />
<br />
=== Layout Management ===<br />
* Layout Driver (See above)<br />
* Add, Remove, Locate<br />
** Return layouts if they have not been used within certain time to avoid running out of state on server (B)<br />
* Caching beyond CLOSE (B)<br />
* Whole file layouts (A)<br />
* Segment layouts (B?)<br />
** Merge Overlapping Layouts (B)<br />
*** Revisit when we study the layout design<br />
* Should allow layouts of differing iomode for the same range (A)<br />
* Stateid/Seqid management<br />
** OLD and BAD stateid error handling in layout operations (A)<br />
* Check current Referring Tuple Handling works with pNFS callbacks (A)<br />
<br />
=== Interaction with Delegations (A)===<br />
* Verify proper use of delegation stateid on layoutget<br />
* If no delegation use open stateid<br />
* If mandatory locking then use lock stateid (Priority?)<br />
<br />
== Metadata Server Operations ==<br />
=== EXCHANGE_ID ===<br />
* Handle EXCHGID4_FLAG_USE_NON_PNFS/ EXCHGID4_FLAG_USE_PNFS_MDS/ EXCHGID4_FLAG_USE_PNFS_DS combinations (A)<br />
** If client doesn't specify pNFS and server does, client needs to not do it (A)<br />
* Remember server response to determine:<br />
** If we need to send GETATTR asking for layout type (A)<br />
** To determine if we should specify a layout hint during create (Priority?)<br />
* EXCHGID4_FLAG_BIND_PRINC_STATEID (C)<br />
* Separate nfs_client for MDS/DS dual personality (A)<br />
** Make sure the client owner is different for each<br />
<br />
=== GETDEVICEINFO (A)===<br />
* Request Device notifications (B)<br />
** NOTIFY_DEVICEID4_CHANGE <br />
** NOTIFY_DEVICEID4_DELETE<br />
* Determine best GETDEVICEINFO_ARGS gdia_maxcount limits (A)<br />
** XDR across page boundaries is problematic today but should be addressed (A?)<br />
* Handle NFS4ERR_TOOSMALL (A)<br />
** Turn off pNFS (A)<br />
* Determine where to invoke it<br />
** Invoke from the state manager (A)<br />
<br />
=== GETDEVICELIST (Opt) (C)===<br />
<br />
=== LAYOUTGET (A)===<br />
* Determine where to invoke it (A)<br />
** Acquire layout as close to the actual I/O?<br />
** For the files layout, getting the layout at open makes sense - is that a good enough reason to have it there as well?<br />
** Minimize sprinkling pNFS calls throughout the call path (A)<br />
** Minimize number of layout reference/ dereference (number of layout gets per I/O) (A)<br />
** read, write, mmap, splice_read, splice_write ?<br />
** readpages, writepages error recovery (invoke the state manager?)<br />
** Specify smart minimum and a reasonable size (A)<br />
** nfs_wait_on_sequence to serialize the gets, returns, and recalls (B)<br />
* Support layout range that does not match request (A)<br />
* Forgetful Model (12.5.5.1) (A)<br />
** Makes the layoutreturn/ cb_recall simpler<br />
* Error handling<br />
** I/O through MDS (A)<br />
** Timer to retry layout (B?)<br />
** Mark inode to not request layout until all dirty pages are flushed (B?)<br />
* Handle NFS4ERR_RECALLCONFLICT and NFS4ERR_RETURNCONFLICT (12.5.5.2)<br />
* Handle NFS4ERR_GRACE<br />
* Handle NFS4ERR_LAYOUTTRYLATER<br />
* Handle NFS4ERR_INVAL<br />
* Handle NFS4ERR_TOOSMALL<br />
* Handle NFS4ERR_LAYOUTUNAVAILABLE<br />
* Handle NFS4ERR_UNKNOWN_LAYOUTTYPE<br />
* Handle NFS4ERR_BADIOMODE<br />
* Handle NFS4ERR_LOCKED<br />
* Obey stripe unit size and commit through MDS bits (A)<br />
* FileHandle Determination (13.3)<br />
** DS Filehandle same as MDS (A)<br />
** Same DS Filehandle for every data server (A)<br />
*** Not sure if we handle it<br />
** Unique Filehandle for each data server (A)<br />
* Specify intended IO Mode in Layout (A)<br />
* More than one striping pattern: logr_layout array > 1 (B)<br />
* Able to handle different iomode from what was requested (A)<br />
* Handle layouts of length NFS4_UINT64_MAX (various rules) (18.43.3) (A)<br />
* Obey logr_return_on_close (A?) XXX Study XXX<br />
** What if you have multiple opens on the same file?<br />
** What's the implication on the forgetful model (A)<br />
* Layout read(write)-ahead (B)<br />
** Files Layout will request entire file (A)<br />
<br />
=== LAYOUTCOMMIT (A)===<br />
* Include last_write_offset, offset, length (A)<br />
* Include mtime (C)<br />
** getattr after LAYOUTCOMMIT to update cached attributes (A)<br />
* Keep layoutcommit data until the return value is received so that the request can be reissued, for example in case of NFS4ERR_GRACE<br />
XXX What about FILE_SYNC vs DATA_SYNC? Trond had some questions XXX<br />
* Determine where to invoke it (A?)<br />
** Issue layoutcommit in write_inode() and nfs_revalidate_inode()<br />
** Issue layoutcommit before data commits<br />
* Support sub-range layouts (A)<br />
** Do we really know any servers that will do this at this time?<br />
** Belongs in the layout opaque structure? XXX Need to review XXX<br />
* Recover from MDS reboot (A)<br />
** Issue layout_commit with reclaim bit set<br />
** Handle NFS4ERR_NO_GRACE<br />
* Handle NFS4ERR_BADLAYOUT <br />
** Check we have a layout and correct I/O mode before issuing layoutcommit (A)<br />
** Fred’s bug of hole in the layout range (B) Subset of layout segments<br />
* Handle NFS4ERR_RECLAIM_BAD (A)<br />
* Attribute caching: loca_time_modify specified - follow with GETATTR<br />
<br />
=== LAYOUTRETURN (A)===<br />
* Forgetful Model (A)<br />
* On CB_LAYOUTRECALL always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
* On CB_RECALL_ANY return LAYOUTRETURN4_ALL (A)<br />
* Return all subfile ranges on CB_RECALL of entire file layout (12.5.5.1) (C)<br />
* Return full range specified by the layout recall (12.5.5.1) (C)<br />
* Ability to return chunks of layouts for huge files to show progress (C)<br />
* Return entire range layout as final LAYOUTRETURN (C)<br />
* Return NFS4ERR_NOMATCHING_LAYOUT if none is found (C)<br />
* Bulk Return (C)<br />
** LAYOUTRETURN4_FSID<br />
** LAYOUTRETURN4_ALL<br />
** sync with nfs_wait_on_sequence() (C)<br />
*** The seqid affinity is associated with the filehandle<br />
* Serialize operations resulting from intersecting CB_LAYOUTRECALLs (18.44.4) (C)<br />
** Forgetful model always return NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) (A)<br />
** Serialization later (C)<br />
** Return NFS4ERR_DELAY?(B)<br />
* Error Recovery (A)<br />
** Handle NFS4ERR_OLD_STATEID<br />
** Handle NFS4ERR_BAD_STATEID (C) stateid's seqid()<br />
** Handle NFS4ERR_NO_GRACE<br />
** Handle NFS4ERR_INVAL<br />
<br />
=== I/O through the MDS ===<br />
* Error fallback on I/O error (A)<br />
** Including NFS4ERR_BAD_STATEID as returned by DS resulting from DS fencing the I/O after a recall of the layout<br />
<br />
=== SECINFO_NO_NAME (Req) (C) === <br />
* Required only for the server<br />
<br />
=== OPEN ===<br />
* LayoutHint attribute (C)<br />
** Need to define a user/programmable interface? (C)<br />
* GETATTR follows OPEN to determine layout type (C)<br />
* Support GUARDED during create (A)<br />
<br />
=== SETATTR ===<br />
* Changing size may trigger server to recall layout<br />
** No impact on Forgetful client since there is nothing to return<br />
** Same applies to open with truncate<br />
<br />
=== COMMIT ===<br />
* Compare commit verifier to each of the DS write verifiers (B) XXX Review section 13.7 XXX<br />
* We keep the commit verifier per page<br />
* Keep data until the return value is received so that the request can be reissued in case of error (A)<br />
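<br />
The per-page verifier item above can be sketched as a small model (illustrative Python, not kernel code; the function name is invented): each unstable WRITE records the DS's write verifier per page, and when a COMMIT reply arrives, any page whose recorded verifier differs from the committed one must be resent.<br />

```python
def pages_to_rewrite(page_verifiers, committed_verifier):
    """page_verifiers: dict page_index -> verifier recorded at WRITE time.

    Returns the page indexes whose data must be resent because the
    server's verifier changed (e.g. it rebooted and lost unstable data).
    """
    return sorted(idx for idx, v in page_verifiers.items()
                  if v != committed_verifier)
```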
<br />
== Callback Service Operations ==<br />
=== CB_LAYOUTRECALL (A)===<br />
* Forgetful client behavior (A)<br />
** NFS4ERR_NOMATCHING_LAYOUT (12.5.5.1) <br />
* Bulk Recall<br />
** LAYOUTRECALL4_FSID (B)<br />
** LAYOUTRECALL4_ALL (B)<br />
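<br />
The forgetful-model behavior listed above can be pictured with a minimal sketch (illustrative Python, not the kernel implementation; data structures are invented): the client forgets any cached layout segments that intersect the recalled range and always answers the recall with NFS4ERR_NOMATCHING_LAYOUT, so it never has to send a LAYOUTRETURN.<br />

```python
# Error name per RFC 5661; kept as a symbolic constant here.
NFS4ERR_NOMATCHING_LAYOUT = "NFS4ERR_NOMATCHING_LAYOUT"

def cb_layoutrecall(layout_cache, fh, recall_range):
    """layout_cache: dict fh -> list of (offset, length) cached segments.

    Forget any segment intersecting the recalled range and reply that no
    matching layout exists (forgetful model, RFC 5661 12.5.5.1).
    """
    off, length = recall_range
    end = off + length
    layout_cache[fh] = [
        (s_off, s_len) for (s_off, s_len) in layout_cache.get(fh, [])
        if s_off + s_len <= off or end <= s_off   # keep only non-intersecting
    ]
    return NFS4ERR_NOMATCHING_LAYOUT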
<br />
=== CB_RECALL_ANY (Req) (A)===<br />
* Client issues LAYOUTRETURN(ALL) due to forgetful client model (A)<br />
<br />
=== CB_RECALLABLE_OBJ_AVAIL (C)===<br />
* Set loga_signal_layout_avail on LAYOUTGET to FALSE (A)<br />
<br />
=== CB_NOTIFY_DEVICEID (Opt) (C)===<br />
* Indicate no interest in notification (A)<br />
* Detect race with GETDEVICE_INFO (B)<br />
** If layouts using deviceID, then issue TEST_STATEID<br />
*** If valid layout in use, then issue GETDEVICEINFO<br />
<br />
=== CB_WANTS_CANCELLED (Req) (C)===<br />
* Specify no interest if needed (A)<br />
<br />
== Data Server Operations ==<br />
<br />
=== EXCHANGE_ID ===<br />
<br />
=== SECINFO_NO_NAME (C)===<br />
<br />
=== I/O ===<br />
* Review Data distribution algorithm: (which DS, offset, length) (A)<br />
* Sparse (A)<br />
* Dense (C)<br />
** Stash existing code (A)<br />
* WRITE<br />
** Cache all data in range until successful LAYOUTCOMMIT(1st) and COMMIT (2nd) for unstable data (A)<br />
*** How is it that files does not need this for proper recovery? (12.7.4, top of page 306)<br />
* READ<br />
** Zero byte & EOF handling on reads with holes handled locally (13.10) (A)<br />
<br />
=== COMMIT ===<br />
* Commit through MDS (A)<br />
* Commit through DS (A)<br />
<br />
== Metadata/ Attribute Handling ==<br />
* pNFS related attributes<br />
** layout_hint (C)<br />
** layout_type (B)<br />
** mdsthreshold (B)<br />
** fs_layout_type (A)<br />
** layout_alignment (B)<br />
** layout_blksize (B)<br />
<br />
== Locking ==<br />
* Mandatory Locking (B) <br />
** Use Lock StateID <br />
** Handle NFS4ERR_LOCKED (B) Check with Windows (Tom Talpey) to see if there’s a server in the future<br />
<br />
== Error Handling ==<br />
* Handle I/O errors due to fencing (A)<br />
* Due to Layout Revocation (A)<br />
* NFS4ERR_GRACE handling (A)<br />
* State recovery through the State Manager only (A)<br />
** Recover state and mark as I/O for MDS for example<br />
* When do we retry again to the DS<br />
** Retry pNFS on remount (A)<br />
** Timer? (B)<br />
** Clear error state once there are no more dirty pages? (B)<br />
** Fail to MDS on first error - keep it simple (A)<br />
** Retry pNFS after X condition/time (B)<br />
<br />
== Security ==<br />
* DS ACL related errors? (A)<br />
<br />
== Multiple Layout Type Support ==<br />
* Different Layout types for different files (C)<br />
<br />
== Recovery ==<br />
* DS Lease Expiration on the Client (12.7.2) (SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED, SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED)<br />
** Write through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
=== Lease Move (11.7.7.1) (Low Priority) (C)===<br />
<br />
=== Loss of Layout State on Metadata Server ===<br />
* Handle fencing error (A)<br />
<br />
=== Metadata Server Restart ===<br />
* SEQ4_STATUS_RESTART_RECLAIM_NEEDED, NFS4ERR_BAD_SESSION/ NFS4_STALE_CLIENTID (A?)<br />
* Server out of Grace<br />
** I/O through MDS (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
* Server in Grace<br />
** LAYOUT_COMMIT in reclaim mode (A)<br />
** Redo Session/Layout setup, reissue I/O to DSs (B)<br />
<br />
== Data Server Multipathing (13.5) ==<br />
* Bandwidth Scaling (B)<br />
** Session Trunking (C)<br />
* Higher Availability<br />
** multipath_list4 (B?)<br />
** Replacement DeviceID-to-Device address mapping (B?)<br />
* Replacement DeviceID (B?)<br />
<br />
== IPv6 ==</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/PNFS_prototype_designPNFS prototype design2008-01-16T21:49:12Z<p>Peterhoneyman: bruce made me do it</p>
<hr />
<div>= pNFS =<br />
<br />
'''pNFS''' is part of the first NFSv4 minor version. This space is used to track and share Linux pNFS implementation ideas and issues.<br />
<br />
* [http://www.citi.umich.edu/projects/asci/pnfs/linux/ Linux pNFS Implementation Homepage]<br />
<br />
* [[Cthon06 Meeting Notes|Connectathon 2006 Linux pNFS Implementation Meeting Notes]]<br />
<br />
* [[linux pnfs client rewrite may 2006|Linux pNFS Client Internal Reorg patches May 2006 - For Display Purposes Only - Do Not Use]]<br />
<br />
* [[pNFS Implementation Issues|pNFS Implementation Issues]]<br />
<br />
* [[pNFS todo List|pNFS todo List]]<br />
<br />
* [[Wireshark Patches|Wireshark Patches]]<br />
<br />
* [[Bakeathon 2007 Issues List|Bakeathon 2007 Issues List]]<br />
<br />
* [[pNFS Development Road Map]]<br />
<br />
* [http://spreadsheets.google.com/pub?key=pGVvgce8dC-WWbowI9TSmEg Linux pNFS Development Gantt Chart]<br />
<br />
* [[pNFS Git tree recipies|pNFS Git tree recipies]]<br />
<br />
* [[pNFS Development Git tree|pNFS Development Git tree]]</div>Peterhoneymanhttp://www.linux-nfs.org/wiki/index.php/PNFS_Development_Road_MapPNFS Development Road Map2008-01-16T21:48:33Z<p>Peterhoneyman: PNFS Developers Road Map moved to PNFS Development Road Map: cuz bruce fields sez so</p>
<hr />
<div>Completing pNFS for Linux requires fighting three battles: IETF specification, Linux implementation, and integration into the Linux kernel.<br />
<br />
Section 1 describes the status of NFSv4.1 specification, based on the IETF meeting that just ended.<br />
<br />
Section II describes the plan for implementation and integration.<br />
<br />
==IETF Road Map==<br />
<br />
NFSv4.1 extends NFSv4 with two major components: sessions and pNFS. As of the 70th IETF Meeting in Vancouver (December 2007), the specification of sessions in [http://www3.ietf.org/internet-drafts/draft-ietf-nfsv4-minorversion1-17.txt draft-ietf-nfsv4-minorversion1-17.txt] appears to be complete. [http://www1.ietf.org/mail-archive/web/nfsv4/current/msg05155.html pNFS discussions] centered on device ID mappings, layout range accounting, sparse files, persistent sessions, and recall processing.<br />
<br />
Draft 18 is anticipated to be released on December 21, 2007. The major change is device mappings, which allow a device ID to be recalled without affecting the layout. Draft 18 issues will be tested at the Austin Bakeathon in February 2008.<br />
<br />
Draft 19 is expected to follow the Austin Bakeathon and be issued as an RFC following the 71st IETF Meeting in Philadelphia (March 2008). This will freeze the specification of sessions, generic pNFS protocol issues, and pNFS file layout. Specification of block layout, currently [http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-block draft-ietf-nfsv4-pnfs-block-05.txt], and object layout, currently [http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj draft-ietf-nfsv4-pnfs-obj-04.txt], may also be ready to move forward in Philadelphia; otherwise they will wait until the 72nd IETF Meeting in Europe (July/August 2008).<br />
<br />
==Linux pNFS Road Map==<br />
<br />
The Linux pNFS road map entails fighting three battles <br />
<br />
===Rebase the implementation on the latest Linux kernel===<br />
<br />
The current version of Linux pNFS is implemented on the 2.6.18.3 kernel. The Linux pNFS developers group is rebasing the code to the latest kernel, 2.6.24 at this writing. <br />
<br />
Along the way to the current kernel, the NFS client and RPC layer saw major changes, complicating a direct port pNFS and sessions code. This led to two efforts to rebase the code:<br />
<br />
* Patch forward<br />
<br />
:Benny Halevy (Panasas) has rebased the 2.6.18.3 sessions code through the multiple kernels along the path to the latest level. <br />
<br />
Benny says: "I've completed rebasing our patches in the linux-pnfs-2.6 over 2.6.24-rc5." I'm confused<br />
<br />
* Rewrite<br />
<br />
:A team at Network Appliance led by Ricardo Labiaga rewrote sessions for the latest Linux kernel and submitted patches to the Linux pNFS developers group for review. The forward channel code was added to linux-pnfs-2.6-latest, a git tree based on the latest kernel. <br />
<br />
:Andy Adamson rewrote the pNFS I/O path. READ I/O patches to the latest kernel are under review by the Linux pNFS developers. WRITE I/O is being factored into patches of manageable size.<br />
<br />
===Fully implement the final specification===<br />
<br />
The current code implements draft-ietf-nfsv4-minorversion1-13. The current IETF specification at this writing is [http://www.nfsv4-editor.org/draft-17/draft-ietf-nfsv4-minorversion1-17.txt draft 17]; draft 18 is anticipated by December 21, 2007, and draft 19 is anticipated before the next IETF meeting. Effort is required to bring the current code forward to draft 18 for the Austin Bakeathon in February 2008, and then to complete the implementation. This is detailed below.<br />
<br />
===Organize and submit a sequence of patches to the Linux maintainers===<br />
<br />
Once the code is ported to a git tree based on Linus’ kernel and brought forward to the final NFSv4.1 draft, a “ready to submit” branch can be made available to Linux kernel maintainers and pNFS developers for review, performance testing, and error testing. <br />
<br />
The pNFS and sessions patches for 2.6.18.3 tree are huge and lack a patch history suitable for submission. Benny is creating small patches from the 2.6.18.3 code base and applying them to successive kernels, with the hope that at the end of the process, he will have preserved functionality and created a patch history useful for submitting to kernel review.<br />
<br />
Ricardo and Andy approach the problem from the other direction. After rewriting pNFS and sessions for the latest kernel, they factor the code into small patches that they can submit for review by Linux kernel maintainers.<br />
<br />
==Components and dependencies==<br />
<br />
===Switch on minor version ===<br />
<br />
Provide the unified framework for minor versions in the NFSv4 client and server.<br />
<br />
===Minimal sessions, forward channel===<br />
<br />
Set up the minimal NFSv4.1 session over a forward channel, including session slot and sequence number management.<br />
<br />
Client and server negotiate a session, place an OP_SEQUENCE as the first operation of every compound, and recover from session loss due to lease expiration. <br />
<br />
Implement session keep-alive.<br />
<br />
The client and server operations to be implemented for this step:<br />
<br />
* OP_EXCHANGE_ID<br />
* OP_CREATE_SESSION<br />
* OP_SEQUENCE<br />
* OP_DESTROY_SESSION<br />
* Add OP_SEQUENCE to each compound<br />
* State renewal<br />
* RPC layer errors<br />
<br />
''Depends on minor version switch''<br />
<br />
===Minimal sessions, back channel===<br />
<br />
* Set up a minimal NFSv4.1 session over a back channel negotiated between the client and the server. <br />
* Use the forward channel code for session slot and sequence number management.<br />
* Client and server will create back channel(s) and place a CB_SEQUENCE as the first operation on all CB_COMPOUND RPCs.<br />
<br />
Client and server operations to be implemented:<br />
<br />
* OP_CREATE_SESSION<br />
* OP_CB_SEQUENCE<br />
* OP_CB_RECALL_SLOT<br />
<br />
''Depends on minor version switch and minimal sessions forward channel''<br />
<br />
===pNFS I/O READ and WRITE===<br />
<br />
The pNFS generic client supports two I/O paths that use the NFS page cache:<br />
<br />
* an RPC based I/O path, used by the file layout module, and<br />
*a non-RPC path, used by the block layout and object layout modules.<br />
<br />
Implmentation steps:<br />
<br />
* Negotiate pNFS layout type common to the pNFS client and server<br />
*Client and server perform I/O over the file layout type<br />
*Client returns layout on unmount<br />
<br />
Client implementation:<br />
<br />
*Generic pNFS client and layout API<br />
* File layout, using the layout API<br />
<br />
Server implementation:<br />
<br />
* pNFS interface for NFSD, implemented as an extended export operations API, for exporting a pNFS capable file system.<br />
<br />
The server API is used by the following prototypes:<br />
<br />
* IBM GPFS file layout server,<br />
* Panasas object layout server, and<br />
* Network Appliance Linux MDS file layout server. <br />
<br />
The Network Appliance Linux MDS prototype is not released at this writing. <br />
<br />
Client and server operations to be implemented:<br />
<br />
* OP_EXCHANGE_ID<br />
* pNFS-specific OP_GETATTR attributes<br />
* OP_GETDEVICELIST <br />
* OP_GETDEVICEINFO<br />
* OP_LAYOUTGET <br />
* OP_LAYOUTCOMMIT<br />
* OP_LAYOUTRETURN<br />
<br />
''Depends on minor version switch and minimal sessions forward channel''<br />
<br />
===pNFS layout recall===<br />
<br />
* Enable the pNFS server to recall layouts using the minimal sessions back channel.<br />
* Enable the pNFS server to inform the client that a previously denied LAYOUGET is now available.<br />
* When complete, the pNFS client and server will be able to setup and manage layout caches.<br />
<br />
Client and server operations to be implemented:<br />
<br />
* OP_CB_SEQUENCE<br />
* OP_CB_LAYOUTRECALL<br />
* OP_CB_RECALLABLE_OBJ_AVAIL<br />
* OP_LAYOUTGET<br />
* OP_LAYOUTRETURN<br />
<br />
''Depends on minor version switch, and minimal sessions forward and back channels, and pNFS I/O''<br />
<br />
===Exactly once semantics===<br />
<br />
* Revisit forward channel attributes on client and server.<br />
* Implement server replay cache <br />
<br />
''Depends on minor version switch and minimal sessions forward channel''<br />
<br />
===pNFS reboot recovery===<br />
<br />
* Implement pNFS I/O failover to the metadata server using NFSv4.1 RPC.<br />
* Implement grace period recovery.<br />
<br />
''Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O''<br />
<br />
===Full sessions forward channel===<br />
<br />
* Implement the mandatory session forward channel features:<br />
* Trunking<br />
* OP_BIND_CONN_TO_SESSION<br />
* Kerberos and X509 machine credentials at mount for EXCHANGE_ID<br />
* SSV<br />
* Secure forward channel<br />
<br />
''Depends on minor version switch and minimal sessions forward channel''<br />
<br />
===Full sessions back channel===<br />
<br />
* Implement the mandatory session forward channel features:<br />
* OP_BACKCHANNEL_CTL<br />
* SSV (secret state verifier)<br />
* Secure back channel<br />
* OP_CB_SEQUENCE<br />
<br />
''Depends on minor version switch and minimal sessions forward and back channels''<br />
<br />
===pNFS device recall===<br />
<br />
* Implement the Draft 18 pNFS device recall feature.<br />
<br />
''Depends on minor version switch and minimal sessions forward and back channels''<br />
<br />
===Back channel replay cache===<br />
<br />
* Implement the NFSv4.1 server replay cache required for exactly once semantics.<br />
<br />
''Depends on minor version switch and minimal sessions forward and back channels''<br />
<br />
===pNFS O_DIRECT I/O path===<br />
<br />
When O_DIRECT is specified, READ and WRITE I/O bypass the NFS page cache. <br />
<br />
* Add pNFS I/O callouts to fs/nfs/direct.c to get a layout.<br />
* Perform pNFS I/O.<br />
<br />
==Status==<br />
<br />
The linux-pnfs-2.6-latest tree has the following functionality tested in 2.6.24-based kernels<br />
<br />
* Step 1: switch on minor version<br />
* Step 2: minimal sessions forward channel<br />
* Step 3: minimal sessions back channel<br />
* Step 4: pNFS I/O<br />
:* READ for file, block, and object layouts<br />
:* WRITE for file, block, and object layouts is working, but needs patches factored for review</div>Peterhoneyman