PNFS Server Filesystem API Design

From Linux NFS

Revision as of 14:02, 6 December 2012 by BennyHalevy (Talk | contribs)



The pNFS server extends the filesystem VFS API so that nfsd can call into a filesystem that is exported over pNFS. The filesystem is called when the server receives the corresponding pNFS protocol operations, such as LAYOUTGET and GETDEVICEINFO. In turn, the filesystem may use a set of layout-type library calls, implemented in the exportfs kernel module, for layout-type specific encoding and decoding, so that filesystems that share the same layout type can share this common code.

Callbacks such as CB_LAYOUTRECALL and CB_NOTIFY_DEVICEID can be generated by the filesystem using the pnfsd callback operations vector provided by the nfsd module.

pNFS Export Operations

The pNFS export operations are defined in the following header file:


struct pnfs_export_operations;

The filesystem must implement the following methods for basic functionality:

layout_type Returns the supported pnfs_layouttype4
get_device_info Encode device info onto the xdr stream
layout_get Retrieve and encode a layout for inode onto the xdr stream
set_device_notify Implement device notification negotiation
get_device_iter Retrieve all available devices via an iterator
layout_commit Commit changes to layout
  • layout-type specific arguments to layout commit should be handled here
layout_return Returns the layout.
  • this method may also be called internally by pnfsd, e.g. when a layout recall returns NFS4ERR_NOMATCHING_LAYOUT or upon client expiration
  • layout-type specific arguments to layout return should be handled here
can_merge_layouts Policy call. Can layout segments be merged for this layout type?
The following methods apply to the files layout only:
get_verifier Get the write verifier for DS (called on MDS only)
get_state Call fs on DS only
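
As a rough illustration of how such an operations vector is typically wired up, the following is a minimal userspace sketch, not the kernel definition. The struct demo_pnfs_export_ops, every demo_* name, and both method signatures are hypothetical; the real struct pnfs_export_operations may declare these methods differently. Only the layout type value 1 for the files layout is taken from the NFSv4.1 protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Files layout type value as assigned by the NFSv4.1 protocol. */
enum { DEMO_LAYOUT4_NFSV4_1_FILES = 1 };

/* Hypothetical stand-in for struct pnfs_export_operations.
 * Signatures are assumptions made for this sketch only. */
struct demo_pnfs_export_ops {
    uint32_t (*layout_type)(void);                   /* supported layout type */
    int (*can_merge_layouts)(uint32_t layout_type);  /* policy call */
};

/* A hypothetical filesystem that exports the files layout. */
static uint32_t demo_fs_layout_type(void)
{
    return DEMO_LAYOUT4_NFSV4_1_FILES;
}

static int demo_fs_can_merge_layouts(uint32_t layout_type)
{
    /* Policy chosen for this sketch: merge segments for files layouts only. */
    return layout_type == DEMO_LAYOUT4_NFSV4_1_FILES;
}

static const struct demo_pnfs_export_ops demo_fs_ops = {
    .layout_type       = demo_fs_layout_type,
    .can_merge_layouts = demo_fs_can_merge_layouts,
};

/* Server side: ask the filesystem which layout type it supports.
 * Returns 0 when the filesystem is not exported over pNFS (no vector). */
uint32_t demo_query_layout_type(const struct demo_pnfs_export_ops *ops)
{
    if (ops == NULL)
        return 0;
    assert(ops->layout_type != NULL); /* treated as mandatory in this sketch */
    return ops->layout_type();
}

/* Server side: policy query, defaulting to "do not merge" if unimplemented. */
int demo_query_can_merge(const struct demo_pnfs_export_ops *ops)
{
    if (ops == NULL || ops->can_merge_layouts == NULL)
        return 0;
    return ops->can_merge_layouts(demo_query_layout_type(ops));
}
```

The point of the indirection is that the server never needs to know which filesystem it is talking to; it only calls through the vector the filesystem supplied.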

pNFSd Callback Operations

The pNFSd callback operations are defined in the following header files:


struct pnfsd_cb_operations;


To use the pnfsd callback operations, the filesystem module must call pnfsd_get_cb_op() to get a reference on the global vector that the nfsd module provides. Taking the reference makes the filesystem depend on the nfsd module, so nfsd cannot go down while the filesystem is up; without it, the filesystem could end up calling a callback function into thin air. The filesystem calls pnfsd_put_cb_op() to release the reference.
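
The get/put pattern described above can be modeled in userspace as follows. This is only a sketch of the idea under stated assumptions: the demo_* functions, the plain integer reference count, and the unload check are all hypothetical and do not reflect how pnfsd_get_cb_op() and pnfsd_put_cb_op() are actually implemented in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnfsd_cb_operations. */
struct demo_cb_ops {
    int (*cb_layout_recall)(unsigned long clientid);
};

static int demo_cb_layout_recall(unsigned long clientid)
{
    (void)clientid;
    return 0; /* pretend the recall was queued */
}

static const struct demo_cb_ops demo_vector = { demo_cb_layout_recall };
static int demo_refs;             /* outstanding references on the vector */
static int demo_provider_up = 1;  /* models "the nfsd module is loaded" */

/* Take a reference; returns NULL if the provider is already gone. */
const struct demo_cb_ops *demo_get_cb_op(void)
{
    if (!demo_provider_up)
        return NULL;
    demo_refs++;
    return &demo_vector;
}

/* Release a reference previously obtained with demo_get_cb_op(). */
void demo_put_cb_op(void)
{
    assert(demo_refs > 0);
    demo_refs--;
}

/* The provider may only go down when nobody holds a reference.
 * Returns 0 on success, -1 if the vector is still in use. */
int demo_provider_unload(void)
{
    if (demo_refs > 0)
        return -1;
    demo_provider_up = 0;
    return 0;
}
```

In this model, a filesystem that holds a reference blocks the unload, which is the property the paragraph above relies on.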

cb_layout_recall Recall layout(s).

nfsd4_pnfs_cb_layout is used to specify the recall scope (per-file, fsid, or all layouts) and which client to recall the layout from (by providing a non-zero clientid).

cb_device_notify Notify clients that device IDs have changed or been deleted
The following callbacks apply to the files layout only:
cb_get_state Callback from fs on MDS only
cb_change_state Callback from fs on DS only

Locking Design

The nfsd state lock is never held while calling into the file system to avoid potential deadlocks that may be caused by the following scenario:

  • State lock is held by nfsd
  • The fs is being called (e.g. close())
  • The fs generates a callback to the client (e.g. cb_layout_recall)
  • The client sends a LAYOUTRETURN synchronously with the callback, before replying to it.
  • To serve the LAYOUTRETURN, nfsd needs to acquire the state lock, which is still held by the original call into the fs, so neither side can make progress.
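
The scenario above can be replayed in a single-threaded toy model. This is a sketch under heavy simplification: the lock is a plain flag rather than a real mutex, the "would block forever" case is reported as a return value instead of actually hanging, and all demo_* names are hypothetical. It only illustrates why dropping the state lock before calling into the filesystem avoids the cycle.

```c
#include <assert.h>

/* Toy non-recursive lock: a flag instead of a real mutex. */
struct demo_state_lock { int held; };

static void demo_lock(struct demo_state_lock *l)
{
    assert(!l->held); /* the caller must not already hold it */
    l->held = 1;
}

static int demo_try_lock(struct demo_state_lock *l)
{
    if (l->held)
        return 0; /* a real lock would block here */
    l->held = 1;
    return 1;
}

static void demo_unlock(struct demo_state_lock *l)
{
    l->held = 0;
}

/* Serving LAYOUTRETURN needs the state lock.
 * Returns 1 if served, 0 if it would block on the lock. */
static int demo_serve_layoutreturn(struct demo_state_lock *l)
{
    if (!demo_try_lock(l))
        return 0;
    demo_unlock(l);
    return 1;
}

/* The fs close() path recalls a layout; the client answers with a
 * LAYOUTRETURN before replying to the callback. */
static int demo_fs_close(struct demo_state_lock *l)
{
    return demo_serve_layoutreturn(l);
}

/* Calls into the fs with the state lock held.
 * Returns 1 if it completes, 0 if it would deadlock. */
int demo_close_holding_lock(struct demo_state_lock *l)
{
    int ok;

    demo_lock(l);
    ok = demo_fs_close(l);
    demo_unlock(l);
    return ok;
}

/* Drops the state lock before calling into the fs. */
int demo_close_dropping_lock(struct demo_state_lock *l)
{
    demo_lock(l);
    /* ... update nfsd state here ... */
    demo_unlock(l);
    return demo_fs_close(l);
}
```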