PNFS Development Road Map

From Linux NFS

Latest revision as of 21:48, 16 January 2008

Completing pNFS for Linux requires fighting three battles: IETF specification, Linux implementation, and integration into the Linux kernel.

Section 1 describes the status of the NFSv4.1 specification, based on the IETF meeting that just ended.

Section 2 describes the plan for implementation and integration.


IETF Road Map

NFSv4.1 extends NFSv4 with two major components: sessions and pNFS. As of the 70th IETF Meeting in Vancouver (December 2007), the specification of sessions in draft-ietf-nfsv4-minorversion1-17.txt appears to be complete. pNFS discussions centered on device ID mappings, layout range accounting, sparse files, persistent sessions, and recall processing.

Draft 18 is anticipated to be released on December 21, 2007. The major change is device mappings, which allow a device ID to be recalled without affecting the layout. Draft 18 issues will be tested at the Austin Bakeathon in February 2008.

Draft 19 is expected to follow the Austin Bakeathon and be issued as an RFC following the 71st IETF Meeting in Philadelphia (March 2008). This will freeze the specification of sessions, generic pNFS protocol issues, and pNFS file layout. Specification of block layout, currently draft-ietf-nfsv4-pnfs-block-05.txt, and object layout, currently draft-ietf-nfsv4-pnfs-obj-04.txt, may also be ready to move forward in Philadelphia; otherwise they will wait until the 72nd IETF Meeting in Europe (July/August 2008).

Linux pNFS Road Map

The Linux pNFS road map entails fighting three battles:

Rebase the implementation on the latest Linux kernel

The current version of Linux pNFS is implemented on the 2.6.18.3 kernel. The Linux pNFS developers group is rebasing the code to the latest kernel, 2.6.24 at this writing.

Along the way to the current kernel, the NFS client and RPC layer saw major changes, complicating a direct port of the pNFS and sessions code. This led to two efforts to rebase the code:

  • Patch forward
Benny Halevy (Panasas) has rebased the 2.6.18.3 sessions code through the multiple kernels along the path to the latest level.
Benny reports: "I've completed rebasing our patches in the linux-pnfs-2.6 over 2.6.24-rc5."
  • Rewrite
A team at Network Appliance led by Ricardo Labiaga rewrote sessions for the latest Linux kernel and submitted patches to the Linux pNFS developers group for review. The forward channel code was added to linux-pnfs-2.6-latest, a git tree based on the latest kernel.
Andy Adamson rewrote the pNFS I/O path. READ I/O patches to the latest kernel are under review by the Linux pNFS developers. WRITE I/O is being factored into patches of manageable size.

Fully implement the final specification

The current code implements draft-ietf-nfsv4-minorversion1-13. The current IETF specification at this writing is draft 17; draft 18 is anticipated by December 21, 2007, and draft 19 is anticipated before the next IETF meeting. Effort is required to bring the current code forward to draft 18 for the Austin Bakeathon in February 2008, and then to complete the implementation. This is detailed below.

Organize and submit a sequence of patches to the Linux maintainers

Once the code is ported to a git tree based on Linus’ kernel and brought forward to the final NFSv4.1 draft, a “ready to submit” branch can be made available to Linux kernel maintainers and pNFS developers for review, performance testing, and error testing.

The pNFS and sessions patches for the 2.6.18.3 tree are huge and lack a patch history suitable for submission. Benny is creating small patches from the 2.6.18.3 code base and applying them to successive kernels, with the hope that at the end of the process, he will have preserved functionality and created a patch history useful for submitting to kernel review.

Ricardo and Andy approach the problem from the other direction. After rewriting pNFS and sessions for the latest kernel, they factor the code into small patches that they can submit for review by Linux kernel maintainers.

Components and dependencies

Switch on minor version

Provide the unified framework for minor versions in the NFSv4 client and server.

Minimal sessions, forward channel

Set up the minimal NFSv4.1 session over a forward channel, including session slot and sequence number management.

Client and server negotiate a session, place an OP_SEQUENCE as the first operation of every compound, and recover from session loss due to lease expiration.

Implement session keep-alive.

The client and server operations to be implemented for this step:

  • OP_EXCHANGE_ID
  • OP_CREATE_SESSION
  • OP_SEQUENCE
  • OP_DESTROY_SESSION
  • Add OP_SEQUENCE to each compound
  • State renewal
  • RPC layer errors

Depends on minor version switch
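The slot and sequence number management described in this step can be sketched in a few lines. This is a minimal illustration, not the Linux client code: the names `Slot`, `SlotTable`, and `build_compound` are hypothetical, operations are represented as plain tuples, and the sketch assumes that the first request on a slot carries sequence ID 1 and ignores sequence ID wraparound.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Slot:
    seqid: int = 0       # sequence ID of the last request answered on this slot
    busy: bool = False   # True while a request is outstanding on this slot


@dataclass
class SlotTable:
    """Hypothetical client-side slot table for one session."""
    size: int
    slots: List[Slot] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.slots = [Slot() for _ in range(self.size)]

    def acquire(self) -> Optional[Tuple[int, int]]:
        """Claim the lowest free slot; return (slot_id, seqid) or None if all are busy."""
        for slot_id, slot in enumerate(self.slots):
            if not slot.busy:
                slot.busy = True
                return slot_id, slot.seqid + 1
        return None

    def release(self, slot_id: int, reply_received: bool) -> None:
        """Free a slot; advance its sequence ID only if the reply arrived,
        so that a retry after a lost reply reuses the same sequence ID."""
        slot = self.slots[slot_id]
        if reply_received:
            slot.seqid += 1
        slot.busy = False


def build_compound(table: SlotTable, session_id: bytes,
                   ops: List[str]) -> Optional[List[tuple]]:
    """Place SEQUENCE first in the compound, or return None if no slot is free."""
    claimed = table.acquire()
    if claimed is None:
        return None
    slot_id, seqid = claimed
    return [("SEQUENCE", session_id, slot_id, seqid)] + [(op,) for op in ops]
```

In this sketch the slot table bounds the number of outstanding requests per session: a caller that receives `None` must wait until a slot is released.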

Minimal sessions, back channel

  • Set up a minimal NFSv4.1 session over a back channel negotiated between the client and the server.
  • Use the forward channel code for session slot and sequence number management.
  • Client and server will create back channel(s) and place a CB_SEQUENCE as the first operation on all CB_COMPOUND RPCs.

Client and server operations to be implemented:

  • OP_CREATE_SESSION
  • OP_CB_SEQUENCE
  • OP_CB_RECALL_SLOT

Depends on minor version switch and minimal sessions forward channel
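As a rough illustration of the back channel rules above, the following sketch shows a client rejecting a callback compound that does not start with CB_SEQUENCE and shrinking its slot usage on CB_RECALL_SLOT. The function name, the tuple encoding of operations, and the status strings are all hypothetical; the sketch also assumes that CB_RECALL_SLOT carries a target highest slot ID.

```python
from typing import List, Tuple


def handle_cb_compound(ops: List[tuple], max_slots: int) -> Tuple[str, int]:
    """Process a callback compound; return (status, new maximum slot count)."""
    # CB_SEQUENCE must be the first operation of every CB_COMPOUND.
    if not ops or ops[0][0] != "CB_SEQUENCE":
        return "REJECTED", max_slots
    for op in ops[1:]:
        if op[0] == "CB_RECALL_SLOT":
            target_highest_slotid = op[1]
            # Slot IDs start at 0, so the slot count is the highest ID plus one.
            # A recall never grows the table.
            max_slots = min(max_slots, target_highest_slotid + 1)
    return "OK", max_slots
```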

pNFS I/O READ and WRITE

The pNFS generic client supports two I/O paths that use the NFS page cache:

  • an RPC based I/O path, used by the file layout module, and
  • a non-RPC path, used by the block layout and object layout modules.

Implementation steps:

  • Negotiate pNFS layout type common to the pNFS client and server
  • Client and server perform I/O over the file layout type
  • Client returns layout on unmount

Client implementation:

  • Generic pNFS client and layout API
  • File layout, using the layout API

Server implementation:

  • pNFS interface for NFSD, implemented as an extended export operations API, for exporting a pNFS capable file system.

The server API is used by the following prototypes:

  • IBM GPFS file layout server,
  • Panasas object layout server, and
  • Network Appliance Linux MDS file layout server.

The Network Appliance Linux MDS prototype is not released at this writing.

Client and server operations to be implemented:

  • OP_EXCHANGE_ID
  • pNFS-specific OP_GETATTR attributes
  • OP_GETDEVICELIST
  • OP_GETDEVICEINFO
  • OP_LAYOUTGET
  • OP_LAYOUTCOMMIT
  • OP_LAYOUTRETURN

Depends on minor version switch and minimal sessions forward channel
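The layout type negotiation and the choice between the two I/O paths can be sketched as follows. This is an illustration only: the layout type names are plain strings rather than protocol constants, the function names are hypothetical, and the fallback to ordinary NFSv4.1 I/O through the metadata server when no common layout type exists is an assumption of the sketch.

```python
from typing import Iterable, Optional, Sequence

# I/O path used by each layout module, following the two paths described above.
IO_PATH = {"file": "rpc", "block": "non-rpc", "object": "non-rpc"}


def negotiate_layout_type(client_modules: Sequence[str],
                          server_types: Iterable[str]) -> Optional[str]:
    """Return the first client-preferred layout type that the server also offers."""
    offered = set(server_types)
    for layout_type in client_modules:
        if layout_type in offered:
            return layout_type
    return None


def select_io_path(layout_type: Optional[str]) -> str:
    """Pick the I/O path for the negotiated layout type.

    With no common layout type, fall back to I/O through the metadata
    server (an assumption of this sketch).
    """
    if layout_type is None:
        return "mds"
    return IO_PATH[layout_type]
```

The client's preference order decides ties, so a client that loads both the file and object layout modules picks whichever it lists first among the types the server exports.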

pNFS layout recall

  • Enable the pNFS server to recall layouts using the minimal sessions back channel.
  • Enable the pNFS server to inform the client that a previously denied LAYOUTGET is now available.
  • When complete, the pNFS client and server will be able to set up and manage layout caches.

Client and server operations to be implemented:

  • OP_CB_SEQUENCE
  • OP_CB_LAYOUTRECALL
  • OP_CB_RECALLABLE_OBJ_AVAIL
  • OP_LAYOUTGET
  • OP_LAYOUTRETURN

Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O
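A client-side layout cache with recall handling might look like the sketch below. It is a simplified illustration with hypothetical names (`Segment`, `LayoutCache`): layouts are tracked as byte ranges per file handle, and a recall drops every overlapping segment whole rather than splitting it, which a real implementation may handle differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Segment:
    """A cached layout segment covering [offset, offset + length)."""
    offset: int
    length: int

    def overlaps(self, offset: int, length: int) -> bool:
        return self.offset < offset + length and offset < self.offset + self.length

    def covers(self, offset: int, length: int) -> bool:
        return self.offset <= offset and offset + length <= self.offset + self.length


class LayoutCache:
    """Hypothetical per-client cache of layout segments, keyed by file handle."""

    def __init__(self) -> None:
        self._segments: Dict[bytes, List[Segment]] = {}

    def add(self, fh: bytes, segment: Segment) -> None:
        """Record a segment granted by LAYOUTGET."""
        self._segments.setdefault(fh, []).append(segment)

    def lookup(self, fh: bytes, offset: int, length: int) -> Optional[Segment]:
        """Return a cached segment covering the whole range, if any."""
        for segment in self._segments.get(fh, []):
            if segment.covers(offset, length):
                return segment
        return None

    def recall(self, fh: bytes, offset: int, length: int) -> List[Segment]:
        """Handle a layout recall: drop every overlapping segment and return
        the dropped segments so the caller can send LAYOUTRETURN for them."""
        kept: List[Segment] = []
        returned: List[Segment] = []
        for segment in self._segments.get(fh, []):
            (returned if segment.overlaps(offset, length) else kept).append(segment)
        self._segments[fh] = kept
        return returned
```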

Exactly once semantics

  • Revisit forward channel attributes on client and server.
  • Implement server replay cache.

Depends on minor version switch and minimal sessions forward channel
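The idea behind a per-slot server replay cache can be sketched as follows: a request with the next sequence ID is executed and its reply cached, a request repeating the last sequence ID gets the cached reply without being executed again, and anything else is rejected. The class names and status strings are hypothetical, and the sketch ignores sequence ID wraparound and cache size limits.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class ServerSlot:
    seqid: int = 0            # sequence ID of the last executed request
    cached_reply: Any = None  # reply to that request, kept for retransmissions


class ReplayCache:
    """Hypothetical per-session replay cache with one entry per slot."""

    def __init__(self, nslots: int) -> None:
        self.slots: List[ServerSlot] = [ServerSlot() for _ in range(nslots)]

    def process(self, slot_id: int, seqid: int,
                execute: Callable[[], Any]) -> Tuple[str, Any]:
        if not 0 <= slot_id < len(self.slots):
            return "BADSLOT", None
        slot = self.slots[slot_id]
        if seqid == slot.seqid + 1:
            # New request: execute it once and remember the reply.
            reply = execute()
            slot.seqid = seqid
            slot.cached_reply = reply
            return "OK", reply
        if seqid == slot.seqid and slot.seqid != 0:
            # Retransmission: return the cached reply, do not execute again.
            return "REPLAY", slot.cached_reply
        return "SEQ_MISORDERED", None
```

Because a retransmitted request never reaches `execute` a second time, a non-idempotent operation takes effect exactly once even if its reply is lost.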

pNFS reboot recovery

  • Implement pNFS I/O failover to the metadata server using NFSv4.1 RPC.
  • Implement grace period recovery.

Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O
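The failover item above amounts to retrying a failed data server I/O through the metadata server. A minimal sketch, assuming the two read paths are passed in as callables and that a data server failure surfaces as `ConnectionError` (both are assumptions of this example, not the client's actual interfaces):

```python
from typing import Callable

ReadFn = Callable[[int, int], bytes]


def read_with_failover(offset: int, length: int,
                       data_server_read: ReadFn, mds_read: ReadFn) -> bytes:
    """Try the pNFS data path first; on failure, redo the I/O through the MDS."""
    try:
        return data_server_read(offset, length)
    except ConnectionError:
        # Data server unreachable: fall back to NFSv4.1 READ via the metadata server.
        return mds_read(offset, length)
```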

Full sessions forward channel

Implement the mandatory session forward channel features:

  • Trunking
  • OP_BIND_CONN_TO_SESSION
  • Kerberos and X.509 machine credentials at mount for EXCHANGE_ID
  • SSV
  • Secure forward channel

Depends on minor version switch and minimal sessions forward channel

Full sessions back channel

Implement the mandatory session back channel features:

  • OP_BACKCHANNEL_CTL
  • SSV (secret state verifier)
  • Secure back channel
  • OP_CB_SEQUENCE

Depends on minor version switch and minimal sessions forward and back channels

pNFS device recall

  • Implement the Draft 18 pNFS device recall feature.

Depends on minor version switch and minimal sessions forward and back channels

Back channel replay cache

  • Implement the NFSv4.1 server replay cache required for exactly once semantics.

Depends on minor version switch and minimal sessions forward and back channels

pNFS O_DIRECT I/O path

When O_DIRECT is specified, READ and WRITE I/O bypass the NFS page cache.

  • Add pNFS I/O callouts to fs/nfs/direct.c to get a layout.
  • Perform pNFS I/O.
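The "Depends on" lines in the component sections above form a dependency graph, and a valid implementation order can be derived from it with a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); the short component labels are chosen for this example, and the O_DIRECT I/O path is left out because no dependencies are stated for it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

MINOR = "minor version switch"
FWD = "minimal sessions forward channel"
BACK = "minimal sessions back channel"
PNFS_IO = "pNFS I/O"

# Each component maps to the set of components it depends on.
DEPENDS_ON = {
    MINOR: set(),
    FWD: {MINOR},
    BACK: {MINOR, FWD},
    PNFS_IO: {MINOR, FWD},
    "pNFS layout recall": {MINOR, FWD, BACK, PNFS_IO},
    "exactly once semantics": {MINOR, FWD},
    "pNFS reboot recovery": {MINOR, FWD, BACK, PNFS_IO},
    "full sessions forward channel": {MINOR, FWD},
    "full sessions back channel": {MINOR, FWD, BACK},
    "pNFS device recall": {MINOR, FWD, BACK},
    "back channel replay cache": {MINOR, FWD, BACK},
}


def implementation_order(depends_on):
    """Return the components in an order that respects every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, component in enumerate(implementation_order(DEPENDS_ON), start=1):
        print(f"{step}. {component}")
```

Several orders satisfy the constraints; any of them starts with the minor version switch followed by the minimal sessions forward channel, which matches Steps 1 and 2 in the status list.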

Status

The linux-pnfs-2.6-latest tree has the following functionality tested in 2.6.24-based kernels:

  • Step 1: switch on minor version
  • Step 2: minimal sessions forward channel
  • Step 3: minimal sessions back channel
  • Step 4: pNFS I/O
      • READ for file, block, and object layouts
      • WRITE for file, block, and object layouts is working, but needs patches factored for review