Linux pnfs client rewrite may 2006

From Linux NFS

Revision as of 19:39, 3 May 2006 by Andros (Talk | contribs)

pNFS Client Rewrite May 2006

The current pNFS 2.6.16 CVS kernel client code mixes pNFS processing into the NFSv4 code path, resulting in many #ifdefs and pNFS-specific switches. The main purpose of this rewrite is to separate the pNFS code path from the NFSv2/v3/v4 code path. Following the method used to separate the NFSv2/v3/v4 code paths from each other, I created a new set of rpc_ops for pNFS and moved all pNFS processing into the appropriate routines. New rpc_ops were added where necessary, and existing NFS functions were split where necessary.

The rpc_ops are set at mount. The NFSv4 client uses the nfs_v4_clientops rpc_ops as usual. If a server supports pNFS and a layout driver has been negotiated and initialized, the nfs_v4_clientops are replaced with the new pnfs_v4_clientops (see set_pnfs_layoutdriver()). The pnfs_v4_clientops also contain a reference to the new pnfs_file_operations, which are now set via the rpc_ops.
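The mount-time selection described above can be sketched in user-space C. This is only an illustration, not the kernel code: every type and function carrying a `_sketch` suffix is a hypothetical stand-in, and the two flags on the server structure are assumptions made to model "pNFS supported" and "layout driver negotiated and initialized".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct file_ops_sketch {
    const char *name;
};

struct rpc_ops_sketch {
    const char *name;
    const struct file_ops_sketch *file_ops; /* file operations reached via the ops table */
};

struct server_sketch {
    int pnfs_supported;      /* assumption: server advertises pNFS */
    int layout_driver_ready; /* assumption: driver negotiated and initialized */
    const struct rpc_ops_sketch *rpc_ops;
};

static const struct file_ops_sketch nfs_file_ops_sketch  = { "nfs_file_operations" };
static const struct file_ops_sketch pnfs_file_ops_sketch = { "pnfs_file_operations" };

static const struct rpc_ops_sketch nfs_v4_ops_sketch  = { "nfs_v4_clientops",  &nfs_file_ops_sketch };
static const struct rpc_ops_sketch pnfs_v4_ops_sketch = { "pnfs_v4_clientops", &pnfs_file_ops_sketch };

/* Mount-time selection: start with the NFSv4 table, then swap in the
 * pNFS table only when both conditions hold. */
static void select_rpc_ops_sketch(struct server_sketch *server)
{
    assert(server != NULL);
    server->rpc_ops = &nfs_v4_ops_sketch;
    if (server->pnfs_supported && server->layout_driver_ready)
        server->rpc_ops = &pnfs_v4_ops_sketch;
}
```

Because callers only ever dereference `server->rpc_ops`, the rest of the code path needs no pNFS-specific conditionals once the table has been chosen.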

Moving pNFS processing into its own rpc_ops allows the routines that call the rpc_ops to return errors that the normal NFS code path ignores but that pNFS requires. In the RPC based NFS read path, for example, the only error is -ENOMEM from allocating pages. All other errors are detected in the RPC path. pNFS has other possible errors, such as LAYOUTGET failing, or non-RPC based I/O failing.
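A minimal sketch of the difference in error surface, assuming hypothetical function names and integer status arguments that stand in for the real allocation, LAYOUTGET, and I/O steps:

```c
#include <assert.h>
#include <errno.h>

/* Normal NFS read setup (hypothetical): the only local failure is a
 * page allocation failure; everything else surfaces in the RPC path. */
static int nfs_read_setup_sketch(int alloc_ok)
{
    return alloc_ok ? 0 : -ENOMEM;
}

/* pNFS read setup (hypothetical): the same allocation failure, plus
 * failures that never go through the RPC path. Status arguments are
 * 0 on success or a negative errno value. */
static int pnfs_read_setup_sketch(int alloc_ok, int layoutget_status,
                                  int io_status)
{
    assert(layoutget_status <= 0 && io_status <= 0);
    if (!alloc_ok)
        return -ENOMEM;
    if (layoutget_status < 0)
        return layoutget_status; /* LAYOUTGET failed */
    if (io_status < 0)
        return io_status;        /* non-RPC based I/O failed */
    return 0;
}
```

With separate ops tables, only the caller of the pNFS variant has to handle the extra return values.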

Four new rpc_ops allow pNFS to switch between the normal server read/write sizes (server->rsize, rpages, wsize, wpages) and the data server read/write sizes (server->ds_rsize, ds_rpages, ds_wsize, ds_wpages) without if statements in the normal NFS code path. Note that there is still a chicken-and-egg problem because this decision is made prior to the request size being calculated.

   rsize(struct inode *, struct nfs_read_data *)
   wsize(struct inode *, struct nfs_write_data *)
   rpages(struct inode *, unsigned int *)
   wpages(struct inode *, unsigned int *)
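How such size ops remove the conditionals from the common path can be sketched as follows. The signatures are deliberately simplified (a single inode argument and an `unsigned int` return value are assumptions, as the return types are not given above), and the `has_layout` flag used to choose the data server sizes is a hypothetical stand-in for whatever the pNFS client actually checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the server and inode structures. */
struct server_sketch {
    unsigned int rsize, wsize;       /* normal server sizes */
    unsigned int ds_rsize, ds_wsize; /* data server sizes */
};

struct inode_sketch {
    const struct server_sketch *server;
    int has_layout; /* assumption: nonzero when I/O goes to a data server */
};

struct size_ops_sketch {
    unsigned int (*rsize)(const struct inode_sketch *);
    unsigned int (*wsize)(const struct inode_sketch *);
};

/* Normal NFS: always the server sizes. */
static unsigned int nfs_rsize_sketch(const struct inode_sketch *ino)
{
    return ino->server->rsize;
}

static unsigned int nfs_wsize_sketch(const struct inode_sketch *ino)
{
    return ino->server->wsize;
}

/* pNFS: data server sizes when a layout is in use, else the server sizes. */
static unsigned int pnfs_rsize_sketch(const struct inode_sketch *ino)
{
    return ino->has_layout ? ino->server->ds_rsize : ino->server->rsize;
}

static unsigned int pnfs_wsize_sketch(const struct inode_sketch *ino)
{
    return ino->has_layout ? ino->server->ds_wsize : ino->server->wsize;
}

static const struct size_ops_sketch nfs_size_ops_sketch  = { nfs_rsize_sketch,  nfs_wsize_sketch };
static const struct size_ops_sketch pnfs_size_ops_sketch = { pnfs_rsize_sketch, pnfs_wsize_sketch };

/* Common code path: no pNFS-specific branch, only an indirect call. */
static unsigned int read_request_size_sketch(const struct size_ops_sketch *ops,
                                             const struct inode_sketch *ino)
{
    assert(ops != NULL && ino != NULL);
    return ops->rsize(ino);
}
```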

Two new rpc_ops isolate pNFS processing from normal NFS processing in the paging setup for read and write.

   pagein_one(struct list_head *, struct inode *)
   flush_one(struct inode *, struct list_head *, int, int)

Finally, non-RPC based I/O drivers can use the page setup routines. At the conclusion of I/O, the pages need to be returned. This code exists in the RPC callback routines, which were called in the old pNFS client code. This resulted in many if(pnfs_XXX) switches around the portions of the callback code that are RPC related. I split the callbacks into functions and created new pNFS_xxx_norpc() callbacks that call only the portions of the RPC callbacks that apply.
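The callback split can be illustrated with a small sketch. The names and the counters below are hypothetical; the point is only that the RPC callback and the norpc callback are composed from the same helper functions, so no `if(pnfs_XXX)` test is needed inside either one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request state; counters record which steps ran. */
struct read_data_sketch {
    int rpc_status_checked; /* RPC-only step */
    int pages_released;     /* step needed by every I/O path */
};

/* RPC-specific portion of the original callback. */
static void read_check_rpc_status_sketch(struct read_data_sketch *data)
{
    data->rpc_status_checked++;
}

/* Portion shared by RPC and non-RPC I/O: return the pages. */
static void read_release_pages_sketch(struct read_data_sketch *data)
{
    data->pages_released++;
}

/* Callback for RPC based I/O: runs both portions. */
static void read_done_rpc_sketch(struct read_data_sketch *data)
{
    assert(data != NULL);
    read_check_rpc_status_sketch(data);
    read_release_pages_sketch(data);
}

/* Callback for non-RPC based I/O: runs only the portion that applies. */
static void read_done_norpc_sketch(struct read_data_sketch *data)
{
    assert(data != NULL);
    read_release_pages_sketch(data);
}
```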

I have applied the above to the read/write/commit code paths. I have yet to look at the direct I/O code path.

