PNFS Development Road Map
From Linux NFS
Revision as of 19:55, 10 December 2007
Completing pNFS for Linux requires fighting three battles: IETF specification, Linux implementation, and integration into the Linux kernel.
Section I describes the status of the NFSv4.1 specification, based on the IETF meeting that just ended.
Section II describes the plan for implementation and integration.
IETF Road Map
Will follow.
Linux pNFS Road Map
The Linux pNFS road map entails fighting three battles:
Rebase the implementation on the latest Linux kernel
The current version of Linux pNFS is implemented on the 2.6.18.3 kernel. The Linux pNFS developers group is rebasing the code to the latest kernel, 2.6.24 at this writing.
Along the way to the current kernel, the NFS client and RPC layer saw major changes, complicating a direct port of the pNFS and sessions code. This led to two efforts to rebase the code:
- Patch forward
- Benny Halevy (Panasas) is rebasing the 2.6.18.3 sessions code through the multiple kernels along the path to the latest level.
- Rewrite
- A team at Network Appliance led by Ricardo Labiaga rewrote sessions for the latest Linux kernel and submitted patches to the Linux pNFS developers group for review. The forward channel code was added to linux-pnfs-2.6-latest, a git tree based on the latest kernel.
- Andy Adamson rewrote the pNFS I/O path. READ I/O patches to the latest kernel are under review by the Linux pNFS developers. WRITE I/O is being factored into patches of manageable size.
Fully implement the final specification
The current code implements draft-ietf-nfsv4-minorversion1-13. The current IETF specification is draft 18; draft 19 is anticipated before the next IETF meeting. Effort is required to bring the current code forward to draft 18 for the Austin Bakeathon in February 2008, and then to complete the implementation. This is detailed below.
Organize and submit a sequence of patches to the Linux maintainers
Once the code is ported to a git tree based on Linus’ kernel and brought forward to the final NFSv4.1 draft, a “ready to submit” branch can be made available to Linux kernel maintainers and pNFS developers for review, performance testing, and error testing.
The pNFS and sessions patches for the 2.6.18.3 tree are huge and lack a patch history suitable for submission. Benny is creating small patches from the 2.6.18.3 code base and applying them to successive kernels, with the hope that at the end of the process he will have preserved functionality and created a patch history suitable for submission to kernel review.
Ricardo and Andy approach the problem from the other direction. After rewriting pNFS and sessions for the latest kernel, they factor the code into small patches that they can submit for review by Linux kernel maintainers.
Components and dependencies
Switch on minor version
Provide the unified framework for minor versions in the NFSv4 client and server.
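One way to picture such a framework is a per-minor-version operation table that is consulted before an operation is encoded or dispatched. The sketch below is illustrative only and is not taken from the Linux code: the names `OPS_BY_MINOR_VERSION` and `is_supported` are hypothetical, and the operation sets are deliberately incomplete.

```python
# Illustrative sketch of a minor-version switch; not the Linux implementation.
# OPS_BY_MINOR_VERSION and is_supported are hypothetical names.

# Operations shared by both minor versions (deliberately incomplete).
COMMON_OPS = {"PUTFH", "GETATTR", "READ", "WRITE"}

OPS_BY_MINOR_VERSION = {
    # NFSv4.0 establishes and renews client state with SETCLIENTID and RENEW.
    0: COMMON_OPS | {"SETCLIENTID", "SETCLIENTID_CONFIRM", "RENEW"},
    # NFSv4.1 uses the session operations listed in this road map instead.
    1: COMMON_OPS | {"EXCHANGE_ID", "CREATE_SESSION", "SEQUENCE", "DESTROY_SESSION"},
}


def is_supported(minorversion, op):
    """Return True if `op` may be used with the given minor version."""
    try:
        return op in OPS_BY_MINOR_VERSION[minorversion]
    except KeyError:
        raise ValueError(f"unsupported minor version {minorversion}") from None
```

Keeping the version-specific knowledge in one table means the common client and server code paths only need to ask which minor version a mount or request uses.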
Minimal sessions, forward channel
Set up the minimal NFSv4.1 session over a forward channel, including session slot and sequence number management.
Client and server negotiate a session, place an OP_SEQUENCE as the first operation of every compound, and recover from session loss due to lease expiration.
Implement session keep-alive.
The client and server operations to be implemented for this step:
- OP_EXCHANGE_ID
- OP_CREATE_SESSION
- OP_SEQUENCE
- OP_DESTROY_SESSION
- Add OP_SEQUENCE to each compound
- State renewal
- RPC layer errors
Depends on minor version switch
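Session slot and sequence number management on the client side can be sketched as a small table that hands out a free slot together with the sequence ID to place in OP_SEQUENCE. This is a minimal sketch, not the Linux code: the class `SlotTable` and its methods are hypothetical, and it assumes that the first request on a slot carries sequence ID 1 and that the ID advances only after a reply is received.

```python
# Illustrative client-side slot table; SlotTable is a hypothetical name.
# Assumption: the first request on a slot uses sequence ID 1, and the ID
# advances only once the reply to the previous request has been received.

class SlotTable:
    def __init__(self, num_slots):
        self.last_seqid = [0] * num_slots  # last acknowledged sequence ID per slot
        self.busy = [False] * num_slots    # True while a request is outstanding

    def acquire(self):
        """Return (slot, seqid) for the next request, or None if all slots are busy."""
        for slot, in_use in enumerate(self.busy):
            if not in_use:
                self.busy[slot] = True
                return slot, self.last_seqid[slot] + 1
        return None

    def release(self, slot, reply_received=True):
        """Free the slot; advance its sequence ID only if a reply arrived."""
        if reply_received:
            self.last_seqid[slot] += 1
        self.busy[slot] = False
```

If a request times out without a reply, releasing the slot with `reply_received=False` makes the next request on that slot reuse the same sequence ID, so the server can recognize it as a retransmission.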
Minimal sessions, back channel
- Set up a minimal NFSv4.1 session over a back channel negotiated between the client and the server.
- Use the forward channel code for session slot and sequence number management.
- Client and server will create back channel(s) and place a CB_SEQUENCE as the first operation on all CB_COMPOUND RPCs.
Client and server operations to be implemented:
- OP_CREATE_SESSION
- OP_CB_SEQUENCE
- OP_CB_RECALL_SLOT
Depends on minor version switch and minimal sessions forward channel
pNFS I/O READ and WRITE
The pNFS generic client supports two I/O paths that use the NFS page cache:
- an RPC based I/O path, used by the file layout module, and
- a non-RPC path, used by the block layout and object layout modules.
Implementation steps:
- Negotiate pNFS layout type common to the pNFS client and server
- Client and server perform I/O over the file layout type
- Client returns layout on unmount
Client implementation:
- Generic pNFS client and layout API
- File layout, using the layout API
Server implementation:
- pNFS interface for NFSD, implemented as an extended export operations API, for exporting a pNFS capable file system.
The server API is used by the following prototypes:
- IBM GPFS file layout server,
- Panasas object layout server, and
- Network Appliance Linux MDS file layout server.
The Network Appliance Linux MDS prototype is not released at this writing.
Client and server operations to be implemented:
- OP_EXCHANGE_ID
- pNFS-specific OP_GETATTR attributes
- OP_GETDEVICELIST
- OP_GETDEVICEINFO
- OP_LAYOUTGET
- OP_LAYOUTCOMMIT
- OP_LAYOUTRETURN
Depends on minor version switch and minimal sessions forward channel
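The layout type negotiation step can be sketched as picking the first type the server advertises for which the client has a layout module loaded. The function `negotiate_layout_type` is a hypothetical helper, not an API of the Linux client; the numeric constants follow the published NFSv4.1 specification and should be treated as an assumption for earlier drafts.

```python
# Illustrative layout type negotiation; negotiate_layout_type is hypothetical.
# Constant values follow the published NFSv4.1 specification (an assumption
# for earlier drafts).
LAYOUT4_NFSV4_1_FILES = 1
LAYOUT4_OSD2_OBJECTS = 2
LAYOUT4_BLOCK_VOLUME = 3


def negotiate_layout_type(server_types, client_modules):
    """Return the first server-advertised type the client supports, else None.

    server_types:   layout types reported by the server, in preference order
    client_modules: set of layout types with a layout module on the client
    """
    for layout_type in server_types:
        if layout_type in client_modules:
            return layout_type
    return None  # no common type: do ordinary NFSv4.1 I/O through the server
```

A return value of `None` models the case where client and server share no layout type and all I/O goes through the metadata server.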
pNFS layout recall
- Enable the pNFS server to recall layouts using the minimal sessions back channel.
- Enable the pNFS server to inform the client that a previously denied LAYOUTGET is now available.
- When complete, the pNFS client and server will be able to set up and manage layout caches.
Client and server operations to be implemented:
- OP_CB_SEQUENCE
- OP_CB_LAYOUTRECALL
- OP_CB_RECALLABLE_OBJ_AVAIL
- OP_LAYOUTGET
- OP_LAYOUTRETURN
Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O
Exactly once semantics
- Revisit forward channel attributes on client and server.
- Implement server replay cache
Depends on minor version switch and minimal sessions forward channel
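A server replay cache keyed by session slot can be sketched as follows: a request with the next sequence ID is executed and its reply cached, a retransmission with the same sequence ID gets the cached reply without being executed again, and anything else is rejected. The names `ServerSlot`, `handle_request`, and `SeqMisordered` are hypothetical; the exception stands in for the NFS4ERR_SEQ_MISORDERED error.

```python
# Illustrative per-slot replay cache; all names here are hypothetical.

class SeqMisordered(Exception):
    """Stands in for the NFS4ERR_SEQ_MISORDERED error."""


class ServerSlot:
    def __init__(self):
        self.seqid = 0            # sequence ID of the last executed request
        self.cached_reply = None  # reply sent for that request


def handle_request(slot, seqid, execute):
    """Run `execute` at most once per sequence ID on this slot."""
    if seqid == slot.seqid + 1:
        # New request: execute it and remember the reply.
        reply = execute()
        slot.seqid = seqid
        slot.cached_reply = reply
        return reply
    if seqid == slot.seqid and slot.seqid > 0:
        # Retransmission: replay the cached reply without re-executing.
        return slot.cached_reply
    raise SeqMisordered(f"got seqid {seqid}, slot is at {slot.seqid}")
```

Because only one reply per slot is kept, the cache size is bounded by the number of slots negotiated for the session.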
pNFS reboot recovery
- Implement pNFS I/O failover to the metadata server using NFSv4.1 RPC.
- Implement grace period recovery.
Depends on minor version switch, minimal sessions forward and back channels, and pNFS I/O
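The I/O failover item can be pictured as a wrapper that tries the pNFS data path first and falls back to the metadata server when that path fails. This is only a sketch under stated assumptions: `pnfs_read` and the two callables passed to it are hypothetical, and `OSError` stands in for whatever error the real data path would report.

```python
# Illustrative failover wrapper; pnfs_read and its callables are hypothetical.
# Assumption: a failed data server I/O surfaces as OSError (which includes
# ConnectionError and TimeoutError in Python).

def pnfs_read(offset, length, read_from_data_server, read_from_mds):
    """Read via the pNFS data path, falling back to the metadata server."""
    try:
        return read_from_data_server(offset, length)
    except OSError:
        # Data path unavailable: redo the I/O through the metadata server
        # using ordinary NFSv4.1 RPC.
        return read_from_mds(offset, length)
```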
Full sessions forward channel
- Implement the mandatory session forward channel features:
  - Trunking
  - OP_BIND_CONN_TO_SESSION
  - Kerberos and X509 machine credentials at mount for EXCHANGE_ID
  - SSV
  - Secure forward channel
Depends on minor version switch and minimal sessions forward channel
Full sessions back channel
- Implement the mandatory session back channel features:
  - OP_BACKCHANNEL_CTL
  - SSV (secret state verifier)
  - Secure back channel
  - OP_CB_SEQUENCE
Depends on Steps 1–3.
pNFS device recall
- Implement the draft 18 pNFS device recall feature.
Depends on Steps 1–3.
Back channel replay cache
- Implement the NFSv4.1 server replay cache required for exactly once semantics.
Depends on Steps 1–3.
pNFS O_DIRECT I/O path
When O_DIRECT is specified, READ and WRITE I/O bypass the NFS page cache.
- Add pNFS I/O callouts to fs/nfs/direct.c to get a layout.
- Perform pNFS I/O.
Status
The linux-pnfs-2.6-latest tree has the following functionality tested in 2.6.24-based kernels:
- Step 1: switch on minor version
- Step 2: minimal sessions forward channel
- Step 3: minimal sessions back channel
- Step 4: pNFS I/O
  - READ for file, block, and object layouts
  - WRITE for file, block, and object layouts is working, but needs patches factored for review