Configuring pNFS/spnfsd

From Linux NFS

Revision as of 07:28, 15 August 2011 by BennyHalevy


What is pNFS ?

pNFS (Parallel NFS) is a feature introduced in NFSv4.1. It extends the Network File System version 4 (NFSv4) to allow clients to access file data directly on the storage used by the NFSv4 server. Bypassing the server for data access can increase both performance and parallelism, but it requires additional client functionality for data access, some of which depends on the class of storage used.

Parallel NFS provides several ways of accessing the data directly. For the moment, three such "layouts" have been defined:

  • LAYOUT4_FILE, which stripes data across multiple NFS servers
  • LAYOUT4_BLOCK_VOLUME, which allows the client to access data as stored on a block device
  • LAYOUT4_OSD2_OBJECTS, which is based on the OSD2 protocol

NFSv4.1 and pNFS are described by the following RFCs:

  • RFC5661 : Network File System (NFS) Version 4 Minor Version 1 Protocol
  • RFC5662 : Network File System (NFS) Version 4 Minor Version 1, External Data Representation Standard (XDR) Description
  • RFC5663 : Parallel NFS (pNFS) Block/Volume Layout
  • RFC5664 : Object-Based Parallel NFS (pNFS) Operations

What is spNFS ?

spNFS is a simple pNFS LAYOUT4_FILE server implementation that uses independent NFS servers as data servers and puts most of the MDS logic in a userspace daemon. As of early 2011 it is mostly unmaintained, and we no longer recommend its use. The following directions remain available if you want to try spNFS anyway, but you may be better off with a different server implementation (see

Content of this document

This document describes how 3 machines were set up to build a basic pNFS/LAYOUT4_FILE test configuration using spNFS on the server side.

(WARNING: as of February 2011, the spNFS code is mostly unmaintained; we no longer recommend it.)

The machines I used are:

  • nfsmds, IP addr = XX.YY.ZZ.A, used as Metadata Server
  • nfsds, IP addr = XX.YY.ZZ.B, used as Data Server
  • nfsclient, IP addr = XX.YY.ZZ.C, used as client

Where is the source code?

The first step is to build a kernel and an nfs-utils distribution that are compatible with each other. I used those from Benny Halevy's git repositories:

 # Get kernel repository
 git clone git://
 # Get nfs-utils repository

For this document, I used the following repository versions:

  • pnfs-nfs-utils: commit id = 2b5373db8615a52c47dbcf3ab968fad7cdcc6fed (pnfs-nfs-utils-1-2-2)
  • kernel linux-pnfs: commit id = cbd09e0fb2b160a06a44aad1c21786b99401823f (pnfs-all-2.6.33-2010-03-09)

Now let's move on to the configuration.

Building the pNFS kernel

The kernel builds without problems. Just make sure that the right options are set in .config:

       # CONFIG_PNFSD_LOCAL_EXPORT is not set
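Because one wrongly enabled option is easy to miss in a long .config, a quick check before building can help. The following is a minimal sketch: check_no_local_export is a hypothetical helper name, it assumes the .config sits in the current directory unless a path is given, and it only tests the one option shown above.

```shell
# Hypothetical helper (not part of the kernel tree): fail if a kernel
# .config enables CONFIG_PNFSD_LOCAL_EXPORT, which must stay unset here.
check_no_local_export() {
    cfg=${1:-.config}    # path to the .config file (assumed default: ./.config)
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    # An enabled option appears as CONFIG_NAME=y or CONFIG_NAME=m;
    # a disabled one appears as "# CONFIG_NAME is not set".
    if grep -q '^CONFIG_PNFSD_LOCAL_EXPORT=' "$cfg"; then
        echo "CONFIG_PNFSD_LOCAL_EXPORT is set in $cfg; disable it" >&2
        return 1
    fi
    return 0
}
```

For example, `check_no_local_export .config && make` would stop before building if the option is enabled.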

With kernel 2.6.34 or higher, also add (with the same value as CONFIG_NFS_FS):


Building nfs-utils

pnfs-nfs-utils is built as follows:

 # autoreconf --install
 # ./configure --prefix=/usr && make && make install

Before building, make sure that the following packages are installed (all nodes ran Fedora 12):

  • libtirpc + libtirpc-devel
  • tcp_wrappers + tcp_wrappers-libs + tcp_wrappers-devel
  • libblkid + libblkid-devel
  • libevent + libevent-devel
  • libnfsidmap
  • device-mapper-devel (from Fedora 15 on)

All of them are available as RPM packages except libnfsidmap. For that one, get the latest version, then compile and install it (do not forget to specify "./configure --prefix=/usr"). It is also available in nfs-utils-lib-devel-1.1.4-8 or higher.

Basically, a command like the following should do all the required work (example for Fedora 15):

 # yum install libtirpc{,-devel} tcp_wrappers{,-devel} libevent{,-devel} libnfsidmap{,-devel} openldap-devel \
               libgssglue{,-devel} krb5-devel libblkid{,-devel} device-mapper-devel libcap{,-devel}

Configuring the test bed to use pNFS over LAYOUT4_FILE

In this configuration, the client (nfsclient) mounts the MDS (nfsmds). The client loads a specific kernel module, the layout driver, to connect to the DS. All metadata traffic goes through the MDS, but data traffic flows directly between the DS and the client.

The MDS must be able to mount the DS and have root access to it. It runs a userspace daemon, spnfsd (part of nfs-utils), which uses this mount point to get information from the DS.

Configuring the spNFS Data Server

The Data Server is just a regular NFSv4.1 server. It is important that the Metadata Server has root access to it, to avoid odd behaviour caused by EPERM errors.

The Data Server's /etc/exports will look like this on nfsds:

 /export/spnfs  *(rw,sync,fsid=0,insecure,no_subtree_check,pnfs,no_root_squash)

Configuring the spNFS Metadata Server

The MDS is a client of the DS and runs spnfsd. It is also an NFSv4.1 server with pNFS enabled.

The spnfsd configuration is done in two steps:

  • Configuring the MDS as a client of the DS
  • Writing the /etc/spnfsd.conf file

On the MDS, the /etc/fstab should contain this line:

 nfsds:/       /spnfs/XX.YY.ZZ.B   nfs4    minorversion=1        0 0

The mount must use NFSv4 with minorversion set to 1.
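When the MDS has to mount several data servers, writing these lines by hand gets repetitive. The sketch below prints one such fstab entry. It assumes, following the pattern of the example, that each DS is mounted under a common directory (/spnfs here, which is also the DS-Mount-Directory value in /etc/spnfsd.conf) in a subdirectory named after the DS IP address; spnfs_fstab_line is a hypothetical name.

```shell
# Hypothetical helper: print the /etc/fstab line the MDS needs for one
# data server. Assumed layout: <mount dir>/<DS IP address>, as in the example.
spnfs_fstab_line() {
    ds_host=$1                  # DS host name, e.g. nfsds
    ds_ip=$2                    # DS IP address, used as the directory name
    mount_dir=${3:-/spnfs}      # should match DS-Mount-Directory in spnfsd.conf
    # NFSv4 with minorversion=1 is mandatory for this mount.
    printf '%s:/\t%s/%s\tnfs4\tminorversion=1\t0 0\n' \
        "$ds_host" "$mount_dir" "$ds_ip"
}
```

For instance, `spnfs_fstab_line nfsds XX.YY.ZZ.B >> /etc/fstab`, followed by `mkdir -p /spnfs/XX.YY.ZZ.B` and `mount /spnfs/XX.YY.ZZ.B`, would add and mount the DS. The fields are tab-separated; fstab accepts any whitespace between fields.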

Its /etc/spnfsd.conf looks like this (single-DS configuration):

 Verbosity = 1
 Stripe-size = 8192
 Dense-striping = 0
 Pipefs-Directory = /var/lib/nfs/rpc_pipefs
 DS-Mount-Directory = /spnfs
 NumDS = 1
 DS1_PORT = 2049
 DS1_ROOT = /
 DS1_ID = 1
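With more than one data server, it is easy to end up with a NumDS value that does not match the per-server entries. The following minimal sketch checks for that. check_spnfsd_conf is hypothetical, and it assumes that each data server n needs DSn_PORT, DSn_ROOT and DSn_ID keys, by analogy with the DS1_* entries above; it checks only the keys shown here.

```shell
# Hypothetical sanity check for /etc/spnfsd.conf (not part of nfs-utils).
# Assumption: data server n is described by DSn_PORT, DSn_ROOT and DSn_ID,
# as in the single-DS example, and NumDS gives the number of data servers.
check_spnfsd_conf() {
    conf=${1:-/etc/spnfsd.conf}
    awk -F'=' '
        # Record "Key = Value" lines, trimming blanks around key and value.
        NF >= 2 {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            seen[key] = val
        }
        END {
            n = seen["NumDS"] + 0
            if (n < 1) { print "NumDS missing or < 1" > "/dev/stderr"; exit 1 }
            split("PORT ROOT ID", fields, " ")
            bad = 0
            for (i = 1; i <= n; i++)
                for (f = 1; f <= 3; f++) {
                    k = "DS" i "_" fields[f]
                    if (!(k in seen)) { print "missing " k > "/dev/stderr"; bad = 1 }
                }
            exit bad
        }
    ' "$conf"
}
```

Running `check_spnfsd_conf /etc/spnfsd.conf` before starting spnfsd would print lines such as `missing DS2_PORT` if NumDS = 2 but only DS1 is described.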

Finally, its /etc/exports looks like this:

 /export  *(rw,sync,pnfs,fsid=0,insecure,no_subtree_check,no_root_squash)

Note the pnfs token in the export options.
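Both /etc/exports files depend on specific options: no_root_squash on the DS, so that the MDS has root access, and pnfs on the MDS. A small check such as the following can catch a missing one. It is a sketch that assumes one "path client(options)" entry per line, as in the examples above; export_has_opts is a hypothetical helper.

```shell
# Hypothetical helper: succeed if an /etc/exports line carries every option
# given after it. Assumes one "path client(opts)" entry on the line.
export_has_opts() {
    line=$1; shift
    # Keep only the text between the first "(" and the following ")".
    opts=${line#*\(}
    opts=${opts%%\)*}
    for want in "$@"; do
        case ",$opts," in
            *",$want,"*) ;;                                  # option present
            *) echo "missing option: $want" >&2; return 1 ;;
        esac
    done
    return 0
}
```

For example, on the MDS: `export_has_opts "$(grep '^/export ' /etc/exports)" pnfs no_root_squash`. After editing /etc/exports on either server, `exportfs -ra` re-exports all entries.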

Configuring the client

The client is used as a regular NFSv4.1 client. The only extra step is to make sure that the layout driver kernel module is loaded:

 # modprobe nfs_layout_nfsv41_files

(previously called nfslayoutdriver in pre-2.6.26 kernels)
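To confirm that the layout driver is actually loaded before mounting, you can look it up in /proc/modules, where the first field of each line is a module name. A minimal sketch follows; module_loaded is a hypothetical name, and its optional second argument exists only to make it testable.

```shell
# Hypothetical helper: succeed if the named module is listed in /proc/modules
# (or in the file given as second argument).
module_loaded() {
    name=$1
    modfile=${2:-/proc/modules}
    # Field 1 is the module name; exit 0 only if a line matched.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$modfile"
}
```

Usage: `module_loaded nfs_layout_nfsv41_files || modprobe nfs_layout_nfsv41_files`. lsmod shows the same information in tabular form.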

Then you can mount the MDS on the client:

 # mount -t nfs4 -o minorversion=1 nfsmds:/ /mnt

Warning: before doing any read or write operations, make sure that the NFSv4 grace period has elapsed. It usually lasts 90 s after the NFS service starts.
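Rather than always waiting a fixed 90 s, a script can wait only for what is left of the grace period. This is a minimal sketch, assuming you record the time (in seconds since the epoch) at which the NFS service was started and that the grace period is the usual 90 s; grace_remaining is a hypothetical helper.

```shell
# Hypothetical helper: print how many seconds remain in a grace period that
# began at START (epoch seconds). NOW defaults to the current time and
# GRACE to the usual 90 s; adjust GRACE if your server differs.
grace_remaining() {
    start=$1
    now=${2:-$(date +%s)}
    grace=${3:-90}
    left=$(( start + grace - now ))
    [ "$left" -lt 0 ] && left=0     # grace period already over
    echo "$left"
}
```

For example, record `start=$(date +%s)` right after starting the NFS server, then run `sleep "$(grace_remaining "$start")"` before the first read or write. If the two steps run on different machines, their clocks must agree.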

Basic test

The first test is pretty simple: on the client, I write 50 bytes to a file:

 # echo "jljlkjljjhkjhkhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhkjhk" > ./myfile
 # ls -i ./myfile
 330246 myfile

On the DS, I should see a new file in the root of what the DS exports to the MDS, whose name contains the fileid of myfile:

 # ls -l /export/spnfs/330246*
 -rwxrwxrwx 1 root root 50 Mar 24 10:49 /export/spnfs/330246.2343187478
 # cat /export/spnfs/330246.2343187478

As you can see, this file on the DS contains the data written by the client.
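To repeat this check for other files, the lookup by fileid can be scripted. The following is a sketch that assumes, from the listing above, that the DS stores the data as <fileid>.<suffix> in the root of its export (/export/spnfs here); it matches any suffix after the dot. ds_files_for is a hypothetical name.

```shell
# Hypothetical helper: list the files a DS holds for a given fileid (the
# inode number shown by "ls -i" on the client). Assumed naming, taken from
# the example above: "<fileid>.<suffix>" in the export root.
ds_files_for() {
    fileid=$1
    dir=${2:-/export/spnfs}     # DS export root, as in the example
    found=1
    for f in "$dir/$fileid".*; do
        # If nothing matched, the unexpanded pattern is skipped here.
        if [ -e "$f" ]; then echo "$f"; found=0; fi
    done
    return $found
}
```

On the DS, `ds_files_for 330246 | xargs cat` would then print the data written by the client.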

On the MDS, when viewed locally (outside NFS), the file has the right size but no allocated blocks: it contains no data.

 # cd /export
 # stat myfile
 File: `myfile'
 Size: 50              Blocks: 0            IO Block: 4096   regular file
 Device: fd00h/64768d    Inode: 330246      Links: 1
 Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
 Access: 2010-03-24 12:56:02.331151053 +0100
 Modify: 2010-03-24 10:49:08.997150735 +0100
 Change: 2010-03-24 10:49:08.997150735 +0100
 # cat myfile
 (no output, the file is empty)
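The same observation can be checked in a script on the MDS: the file has a non-zero size but zero allocated blocks. This is a minimal sketch, assuming GNU stat (whose -c format uses %s for the size in bytes and %b for the number of allocated blocks); is_hollow is a hypothetical helper.

```shell
# Hypothetical helper: succeed if FILE has a non-zero size but no allocated
# blocks, as myfile does above on the MDS. Requires GNU stat.
is_hollow() {
    info=$(stat -c '%s %b' "$1") || return 2   # "<size> <blocks>"
    size=${info% *}
    blocks=${info#* }
    [ "$size" -gt 0 ] && [ "$blocks" -eq 0 ]
}
```

Usage on the MDS: `is_hollow /export/myfile && echo "no local data on the MDS"`.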

-- Philippe Deniel 2010-04-07
