PNFS Development Road Map

From Linux NFS

Revision as of 19:48, 10 December 2007 by Peterhoneyman (Talk | contribs)

Completing pNFS for Linux requires fighting three battles: IETF specification, Linux implementation, and integration into the Linux kernel.

Section 1 describes the status of the NFSv4.1 specification, based on the IETF meeting that just ended.

Section 2 describes the plan for implementation and integration.


IETF Road Map

Will follow.

Linux pNFS Road Map

The Linux pNFS road map entails fighting three battles:

Rebase the implementation on the latest Linux kernel

The current version of Linux pNFS is implemented on the kernel. The Linux pNFS developers group is rebasing the code to the latest kernel, 2.6.24 at this writing.

Along the way to the current kernel, the NFS client and RPC layer saw major changes, complicating a direct port of the pNFS and sessions code. This led to two efforts to rebase the code:

  • Patch forward
Benny Halevy (Panasas) is rebasing the sessions code through the multiple kernels along the path to the latest level.
  • Rewrite
A team at Network Appliance led by Ricardo Labiaga rewrote sessions for the latest Linux kernel and submitted patches to the Linux pNFS developers group for review. The forward channel code was added to linux-pnfs-2.6-latest, a git tree based on the latest kernel.
Andy Adamson rewrote the pNFS I/O path. READ I/O patches to the latest kernel are under review by the Linux pNFS developers. WRITE I/O is being factored into patches of manageable size.

Fully implement the final specification

The current code implements draft-ietf-nfsv4-minorversion1-13. The current IETF specification is draft 18; draft 19 is anticipated before the next IETF meeting. Effort is required to bring the current code forward to draft 18 for the Austin Bakeathon in February 2008, and then to complete the implementation. This is detailed below.

Organize and submit a sequence of patches to the Linux maintainers

Once the code is ported to a git tree based on Linus’ kernel and brought forward to the final NFSv4.1 draft, a “ready to submit” branch can be made available to Linux kernel maintainers and pNFS developers for review, performance testing, and error testing.

The pNFS and sessions patches for tree are huge and lack a patch history suitable for submission. Benny is creating small patches from the code base and applying them to successive kernels, with the hope that at the end of the process, he will have preserved functionality and created a patch history useful for submitting to kernel review.

Ricardo and Andy approach the problem from the other direction. After rewriting pNFS and sessions for the latest kernel, they factor the code into small patches that they can submit for review by Linux kernel maintainers.

Components and dependencies

Switch on minor version

Provide the unified framework for minor versions in the NFSv4 client and server.
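
A minor version switch can be pictured as a dispatch table keyed on the negotiated minor version. The sketch below is only an illustration of that idea, not the Linux client code: the function names and the list-of-strings representation of a compound are hypothetical.

```python
from typing import Callable, Dict, List

def build_compound_v40(ops: List[str]) -> List[str]:
    """NFSv4.0: the compound carries the caller's operations unchanged."""
    return list(ops)

def build_compound_v41(ops: List[str]) -> List[str]:
    """NFSv4.1: every compound starts with a SEQUENCE operation."""
    return ["SEQUENCE"] + list(ops)

# Hypothetical per-minor-version table; a real client would hold many more
# entry points (state renewal, recovery, and so on) per version.
MINOR_VERSION_OPS: Dict[int, Callable[[List[str]], List[str]]] = {
    0: build_compound_v40,
    1: build_compound_v41,
}

def build_compound(minorversion: int, ops: List[str]) -> List[str]:
    """Select the compound builder for the given minor version."""
    try:
        builder = MINOR_VERSION_OPS[minorversion]
    except KeyError:
        raise ValueError(f"unsupported NFSv4 minor version: {minorversion}")
    return builder(ops)
```

Keeping version-specific behavior behind one table means the shared NFSv4 code paths do not need to test the minor version at each call site.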

Minimal sessions, forward channel

Set up the minimal NFSv4.1 session over a forward channel, including session slot and sequence number management.

Client and server negotiate a session, place an OP_SEQUENCE as the first operation of every compound, and recover from session loss due to lease expiration.

Implement session keep-alive.

The client and server operations to be implemented for this step:

  • Add OP_SEQUENCE to each compound
  • State renewal
  • RPC layer errors

Depends on minor version switch
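
The slot and sequence number management described above can be sketched from the client side as follows. This is a minimal illustration under stated assumptions, not the linux-pnfs code: the class names, the dictionary form of an operation, and the convention that an unused slot holds sequence ID 0 (so the first request carries 1) are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class Slot:
    seqid: int = 0      # last sequence ID completed on this slot (assumed to start at 0)
    busy: bool = False  # True while a request is outstanding on the slot

class SlotTable:
    """Client-side view of the forward channel slot table (hypothetical)."""

    def __init__(self, nslots: int) -> None:
        self.slots: List[Slot] = [Slot() for _ in range(nslots)]

    def acquire(self) -> Optional[Tuple[int, int]]:
        """Reserve the lowest free slot; return (slotid, sequence ID to send)."""
        for slotid, slot in enumerate(self.slots):
            if not slot.busy:
                slot.busy = True
                return slotid, slot.seqid + 1
        return None  # all slots in use; the caller must wait

    def release(self, slotid: int, completed: bool) -> None:
        """Free a slot; advance its sequence ID only if the server replied."""
        slot = self.slots[slotid]
        if completed:
            slot.seqid += 1
        slot.busy = False

def build_v41_compound(table: SlotTable, sessionid: str,
                       ops: List[str]) -> Tuple[int, List[Dict]]:
    """Build a compound whose first operation is SEQUENCE."""
    acquired = table.acquire()
    if acquired is None:
        raise RuntimeError("no free session slot")
    slotid, seqid = acquired
    sequence = {"op": "SEQUENCE", "sessionid": sessionid,
                "slotid": slotid, "sequenceid": seqid}
    return slotid, [sequence] + [{"op": name} for name in ops]
```

A caller would build the compound, send it, and then call `release(slotid, completed=True)` once the reply arrives, so the next request on that slot carries the next sequence ID.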

Minimal sessions, back channel

  • Set up a minimal NFSv4.1 session over a back channel negotiated between the client and the server.
  • Use the forward channel code for session slot and sequence number management.
  • Client and server will create back channel(s) and place a CB_SEQUENCE as the first operation on all CB_COMPOUND RPCs.

Client and server operations to be implemented:


Depends on minor version switch and minimal sessions forward channel


pNFS I/O

The pNFS generic client supports two I/O paths that use the NFS page cache:

  • an RPC based I/O path, used by the file layout module, and
  • a non-RPC path, used by the block layout and object layout modules.

  • Negotiate pNFS layout type common to the pNFS client and server
  • Client and server perform I/O over the file layout type
  • Client returns layout on unmount

Client implementation:

  • Generic pNFS client and layout API
  • File layout, using the layout API

Server implementation:

• pNFS interface for NFSD, implemented as an extended export operations API, for exporting a pNFS capable file system.

The server API is used by the following prototypes:

  • IBM GPFS file layout server,
  • Panasas object layout server, and
  • Network Appliance Linux MDS file layout server.

The Network Appliance Linux MDS prototype is not released at this writing.

Client and server operations to be implemented:


Depends on Steps 1 and 2.

pNFS layout recall

  • Enable the pNFS server to recall layouts using the minimal sessions back channel.
  • Enable the pNFS server to inform the client that a previously denied LAYOUTGET is now available.
  • When complete, the pNFS client and server will be able to set up and manage layout caches.

Client and server operations to be implemented:


Depends on Steps 1–4.
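
A client layout cache that supports recall can be sketched as a per-file map. This is a simplified illustration, not the generic pNFS client: the class, its method names, and the use of a file handle string as the key are hypothetical, and recalls covering a whole file system or all layouts are left out.

```python
from typing import Callable, Dict, List, Optional

class LayoutCache:
    """Hypothetical client-side cache of layouts, keyed by file handle."""

    def __init__(self) -> None:
        self._layouts: Dict[str, dict] = {}

    def get(self, filehandle: str, layoutget: Callable[[str], dict]) -> dict:
        """Return the cached layout, issuing LAYOUTGET only on a miss."""
        layout = self._layouts.get(filehandle)
        if layout is None:
            layout = layoutget(filehandle)
            self._layouts[filehandle] = layout
        return layout

    def recall(self, filehandle: str) -> Optional[dict]:
        """Handle a server recall: drop the layout so it can be returned."""
        return self._layouts.pop(filehandle, None)

    def return_all(self) -> List[str]:
        """Drop every layout, for example on unmount; return affected handles."""
        handles = list(self._layouts)
        self._layouts.clear()
        return handles
```

After `recall` returns a layout, the client would send the corresponding layout return to the server; a later `get` for the same file handle goes back to the server for a fresh layout.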

Exactly once semantics

  • Revisit forward channel attributes on client and server.
  • Implement server replay cache

Depends on Steps 1 and 2.
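
A server replay cache for exactly once semantics can be sketched per slot: a retransmitted request gets the cached reply instead of being executed again. The code below is a minimal illustration, not the Linux NFSD implementation. The class names and the exception are hypothetical, the initial slot sequence ID of 0 is an assumption, and sequence ID wraparound is ignored.

```python
from typing import Any, Callable, List

class SeqMisordered(Exception):
    """Raised when a request's sequence ID is neither new nor a replay."""

class ServerSlot:
    def __init__(self) -> None:
        self.seqid = 0            # last sequence ID executed on this slot (assumed start)
        self.has_reply = False    # True once a reply has been cached
        self.cached_reply: Any = None

class ReplayCache:
    """Hypothetical per-session replay cache, one entry per slot."""

    def __init__(self, nslots: int) -> None:
        self.slots: List[ServerSlot] = [ServerSlot() for _ in range(nslots)]

    def handle(self, slotid: int, seqid: int, execute: Callable[[], Any]) -> Any:
        slot = self.slots[slotid]
        if seqid == slot.seqid + 1:
            # New request: execute once, then cache the reply for replays.
            reply = execute()
            slot.seqid = seqid
            slot.cached_reply = reply
            slot.has_reply = True
            return reply
        if seqid == slot.seqid and slot.has_reply:
            # Retransmission: return the cached reply without re-executing.
            return slot.cached_reply
        raise SeqMisordered(f"slot {slotid}: got {seqid}, last was {slot.seqid}")
```

The cache needs only one reply per slot because the client does not reuse a slot until it has received the reply to the previous request on it.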

pNFS reboot recovery

  • Implement pNFS I/O failover to the metadata server using NFSv4.1 RPC.
  • Implement grace period recovery.

Depends on Steps 1–4.

Full sessions forward channel

Implement the mandatory session forward channel features:

  • Trunking
  • OP_BIND_CONN_TO_SESSION
  • Kerberos and X509 machine credentials at mount for EXCHANGE_ID
  • SSV
  • Secure forward channel

Depends on Steps 1 and 2.

Full sessions back channel

Implement the mandatory session back channel features:

  • OP_BACKCHANNEL_CTL
  • SSV (secret state verifier)
  • Secure back channel
  • OP_CB_SEQUENCE

Depends on Steps 1–3.

pNFS device recall

• Implement the Draft 18 pNFS device recall feature.

Depends on Steps 1–3.

Back channel replay cache

• Implement the NFSv4.1 server replay cache required for exactly once semantics.

Depends on Steps 1–3.
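
The "Depends on" notes for the components can be collected into a graph and ordered mechanically. The sketch below is one way to do that with Python's standard `graphlib` module (Python 3.9 or later). The short component names are labels chosen for this example, and the numbering assumes Step 1 is the minor version switch, Step 2 the minimal forward channel, Step 3 the minimal back channel, and Step 4 pNFS I/O.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

STEP1 = "minor version switch"
STEP2 = "minimal sessions forward channel"
STEP3 = "minimal sessions back channel"
STEP4 = "pNFS I/O"

# Each component maps to the set of components it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    STEP1: set(),
    STEP2: {STEP1},
    STEP3: {STEP1, STEP2},
    STEP4: {STEP1, STEP2},
    "pNFS layout recall": {STEP1, STEP2, STEP3, STEP4},
    "exactly once semantics": {STEP1, STEP2},
    "pNFS reboot recovery": {STEP1, STEP2, STEP3, STEP4},
    "full sessions forward channel": {STEP1, STEP2},
    "full sessions back channel": {STEP1, STEP2, STEP3},
    "pNFS device recall": {STEP1, STEP2, STEP3},
    "back channel replay cache": {STEP1, STEP2, STEP3},
}

def submission_order(graph: Dict[str, Set[str]]) -> List[str]:
    """Return one ordering in which every component follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())

if __name__ == "__main__":
    for position, step in enumerate(submission_order(DEPENDS_ON), start=1):
        print(position, step)
```

Several orderings satisfy the constraints; the sorter returns one of them, and raises `graphlib.CycleError` if the dependencies ever become circular.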


When O_DIRECT is specified, READ and WRITE I/O bypass the NFS page cache.

  • Add pNFS I/O callouts to fs/nfs/direct.c to get a layout.
  • Perform pNFS I/O.


The linux-pnfs-2.6-latest tree has the following functionality tested in 2.6.24-based kernels:

  • Step 1: switch on minor version
  • Step 2: minimal sessions forward channel
  • Step 3: minimal sessions back channel
  • Step 4: pNFS I/O
      • READ for file, block, and object layouts
      • WRITE for file, block, and object layouts is working, but needs patches factored for review
