NFS Stale File Handle - RESOLUTION/WORKAROUND

rob.collins · May 23, 2017, 12:32am

Presentation:
Containers won’t start and report various errors about ‘stale’ NFS mounts. Running df -h in the rancher-nfs container exec shell shows mount points with ‘stale NFS file handle.’
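
A minimal sketch of how the check above could be filtered when there are many mounts. The helper name find_stale is hypothetical (it is not part of rancher-nfs), and it only assumes that the affected entries contain the word "stale" somewhere in the df output or error text:

```shell
# Hypothetical helper: keep only the lines that mention a stale handle.
# Reads text on stdin, so it works on df's stdout and stderr alike.
find_stale() {
    grep -i 'stale'
}
```

From the rancher-nfs container exec shell it could be used as: df -h 2>&1 | find_stale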

Cause:
In our case, we did a live migration of NFS mounts on the NFS server that resulted in a 1-3 second export interruption. Any interruption in communication between rancher-nfs and your network storage could have the same effect. No other services or applications within our platform (outside of rancher/docker) were bothered by this specific transition.

Explanation Definitions:
Volume - This is the name of the volume that represents the NFS mount point in Rancher. For the example below, we’ll call this ‘DemoVolume’
NFS Target - This is the remote NFS server export that the driver and volume combination points to. For the example below, we’ll call this ‘nfsServer01.localdomain.com:/RaidVol1/Docker/Rancher/DemoConfig’
Local Mount Path - This is the local filesystem path inside the container where the remote NFS Target is mounted. For the example below, we’ll call this ‘/home/user3/config’
Rancher-NFS Mount Path - This is the path on the rancher-nfs container where the NFS Target is mounted and the Rancher Volume is created. For the example below, we’ll call this ‘/var/lib/rancher/volumes/rancher-nfs/DemoVolume’
Volume Mount - This is the combination of Volume and Local Mount Path that you provide to Rancher during container creation.

Resolution:

From the rancher-nfs container exec shell, run umount -l /var/lib/rancher/volumes/rancher-nfs/DemoVolume where the path is the Rancher-NFS Mount Path of the affected volume.
From the Rancher management GUI (or CLI, I guess) create a new dummy container with the affected volume, using any Local Mount Path for the Volume Mount: ‘DemoVolume:/home/user3/config’
Start the container.
From the rancher-nfs container exec shell, run df -h and note that the NFS mount appears again and is healthy. A restart of the real/production containers using the affected volume(s) may be necessary. We saw some images pick up the restored volume(s) and some not.
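
For anyone with more than one affected volume, here is a rough sketch of the first step as a shell helper. It only covers the lazy unmount; creating and starting the dummy container still happens in Rancher. The names VOLUME_ROOT, DRY_RUN, volume_path and lazy_unmount are hypothetical, and the default path is taken from the DemoVolume example above, so adjust it for your environment:

```shell
# Assumed root of the Rancher-NFS Mount Paths (from the example above).
VOLUME_ROOT=${VOLUME_ROOT:-/var/lib/rancher/volumes/rancher-nfs}

# Hypothetical helper: build the Rancher-NFS Mount Path for a volume name.
volume_path() {
    printf '%s/%s\n' "$VOLUME_ROOT" "$1"
}

# Hypothetical wrapper around the umount -l step.
# With DRY_RUN=1 it prints the command instead of running it.
lazy_unmount() {
    target=$(volume_path "$1")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "umount -l $target"
    else
        umount -l "$target"
    fi
}
```

Running it once with DRY_RUN=1 first (for example DRY_RUN=1; lazy_unmount DemoVolume) shows the exact path that would be unmounted before anything is touched.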

If you have any questions, leave them here. I couldn’t find anything on any forum about how to fix this issue, so this hack is better than nothing for the time being. Hopefully, as it did in our case, it will save you a reboot of all the hosts in your environment.

orangewolf · October 25, 2017, 5:52am

Ran into the same problem, though we were able to solve it more simply. In each instance of the rancher-nfs container exec shell, run umount -l /var/lib/rancher/volumes/rancher-nfs; mount. This will remount the files and you can simply restart instances (ours restarted automatically).
