Output: mount.nfs: Protocol not supported

Hi folks,

Any help would be really appreciated.

I have the following environment:

kubectl version

Client Version: v1.28.6
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.5+k3s1

4 nodes: 2 dedicated to generic pods, 2 dedicated to Longhorn volumes.

I have a Longhorn volume created as ReadWriteMany.

I have one deployment with pods running across the 2 generic nodes, all accessing the ReadWriteMany volume. Everything works as intended.

The installation was done with the default configuration.
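
For reference, the claim side looks roughly like this (a minimal sketch reconstructed from the PV pasted further down; the name, namespace, access mode, size and storage class match what is shown there):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-data
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: longhorn-static
  volumeName: longhorn-data
  resources:
    requests:
      storage: 400Gi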

Now for the tricky part that makes everything so confusing: as soon as I add a new node, no pod running on that node is able to mount the volume (see the error message below). All other pods running on the existing nodes still work fine.

Mounting arguments: mount -t nfs -o vers=4.1,noresvport,intr,hard 10.43.40.29:/longhorn-data /var/lib/kubelet/plugins/kubernetes.io/csi/driver.longhorn.io/edf8a47b178e53d854331a01716c090cf6bf6f2cc5ce2fc0e99d057589d1b542/globalmount
Output: mount.nfs: Protocol not supported
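
For what it's worth, here is how NFSv4.1 client support can be verified on the new node (a sketch; the package check assumes a Debian/Ubuntu based node image):

# NFS client utilities installed?
dpkg -l nfs-common
# kernel NFSv4 client modules available?
modprobe nfs && modprobe nfsv4 && lsmod | grep nfs
# kernel built with NFSv4.1 support?
grep NFS_V4_1 /boot/config-$(uname -r)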

Each node reports the following information:

node01 - nfsstat -s

Server rpc stats:
calls      badcalls   badfmt     badauth    badclnt
0          0          0          0          0       

Client rpc stats:
calls      retrans    authrefrsh
131095257   2          131113216

Client nfs v4:
null             read             write            commit           open             
10        0%     3549325   2%     17609021 13%     9141      0%     3933223   3%   

node02 - nfsstat -s

Server rpc stats:
calls      badcalls   badfmt     badauth    badclnt
0          0          0          0          0       

Client rpc stats:
calls      retrans    authrefrsh
18727523   0          18729915

Client nfs v4:
null             read             write            commit           open             
2         0%     11        0%     3244833  17%     240       0%     954546    5%  

node03 - nfsstat -s (the node with the problem)

Client rpc stats:
calls      retrans    authrefrsh
49         0          49      

Client nfs v4:
null             read             write            commit           open             
7        14%     0         0%     0         0%     0         0%     0         0%
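
For completeness, basic reachability of the share-manager service from node03 can be checked like this (assuming nc is available on the node; 10.43.40.29 is the service IP from the mount arguments above and 2049 is the standard NFS port):

nc -vz 10.43.40.29 2049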

node00 - systemctl status iscsid.service

● iscsid.service - iSCSI initiator daemon (iscsid)
     Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-02-15 18:42:58 PST; 11s ago
TriggeredBy: ● iscsid.socket
       Docs: man:iscsid(8)
    Process: 3302780 ExecStartPre=/lib/open-iscsi/startup-checks.sh (code=exited, status=0/SUCCESS)
    Process: 3302793 ExecStart=/sbin/iscsid (code=exited, status=0/SUCCESS)
   Main PID: 3302795 (iscsid)
      Tasks: 2 (limit: 9269)
     Memory: 3.2M
     CGroup: /system.slice/iscsid.service
             ├─3302794 /sbin/iscsid
             └─3302795 /sbin/iscsid

Feb 15 18:42:58 chromebox00 systemd[1]: Starting iSCSI initiator daemon (iscsid)...
Feb 15 18:42:58 chromebox00 iscsid[3302793]: iSCSI logger with pid=3302794 started!
Feb 15 18:42:58 chromebox00 systemd[1]: iscsid.service: Failed to parse PID from file /run/iscsid.pid: Invalid argument
Feb 15 18:42:58 chromebox00 systemd[1]: Started iSCSI initiator daemon (iscsid).
Feb 15 18:42:59 chromebox00 iscsid[3302794]: iSCSI daemon with pid=3302795 started!

node01 - systemctl status iscsid.service

● iscsid.service - iSCSI initiator daemon (iscsid)
     Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; vendor preset: enabled)
     Active: active (running) since Thu 2024-02-08 10:13:38 PST; 1 weeks 0 days ago
TriggeredBy: ● iscsid.socket
       Docs: man:iscsid(8)
   Main PID: 4161 (iscsid)
      Tasks: 2 (limit: 9269)
     Memory: 3.6M
     CGroup: /system.slice/iscsid.service
             ├─4160 /sbin/iscsid
             └─4161 /sbin/iscsid

node02 - systemctl status iscsid.service

● iscsid.service - iSCSI initiator daemon (iscsid)
     Loaded: loaded (/lib/systemd/system/iscsid.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-02-16 02:44:06 UTC; 2s ago
TriggeredBy: ● iscsid.socket
       Docs: man:iscsid(8)
    Process: 23691 ExecStartPre=/lib/open-iscsi/startup-checks.sh (code=exited, status=0/SUCCESS)
    Process: 23694 ExecStart=/sbin/iscsid (code=exited, status=0/SUCCESS)
   Main PID: 23696 (iscsid)
      Tasks: 2 (limit: 37645)
     Memory: 2.8M
        CPU: 8ms
     CGroup: /system.slice/iscsid.service
             ├─23695 /sbin/iscsid
             └─23696 /sbin/iscsid

Feb 16 02:44:06 chromebox02 systemd[1]: Starting iSCSI initiator daemon (iscsid)...
Feb 16 02:44:06 chromebox02 iscsid[23694]: iSCSI logger with pid=23695 started!
Feb 16 02:44:06 chromebox02 systemd[1]: Started iSCSI initiator daemon (iscsid).
Feb 16 02:44:07 chromebox02 iscsid[23695]: iSCSI daemon with pid=23696 started!

My PersistentVolume:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: longhorn-data
  uid: 21be760f-f2f3-4dde-ae09-a179568c7bd4
  resourceVersion: '246513'
  creationTimestamp: '2024-02-04T20:12:19Z'
  annotations:
    longhorn.io/volume-scheduling-error: ''
    pv.kubernetes.io/bound-by-controller: 'yes'
  finalizers:
    - kubernetes.io/pv-protection
    - external-attacher/driver-longhorn-io
  managedFields:
    - manager: k3s
      operation: Update
      apiVersion: v1
      time: '2024-02-04T20:12:20Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:pv.kubernetes.io/bound-by-controller: {}
        f:spec:
          f:claimRef:
            .: {}
            f:apiVersion: {}
            f:kind: {}
            f:name: {}
            f:namespace: {}
            f:resourceVersion: {}
            f:uid: {}
    - manager: k3s
      operation: Update
      apiVersion: v1
      time: '2024-02-04T20:12:20Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          f:phase: {}
      subresource: status
    - manager: csi-attacher
      operation: Update
      apiVersion: v1
      time: '2024-02-04T20:23:40Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:finalizers:
            v:"external-attacher/driver-longhorn-io": {}
    - manager: longhorn-manager
      operation: Update
      apiVersion: v1
      time: '2024-02-05T00:13:49Z'
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:longhorn.io/volume-scheduling-error: {}
        f:spec:
          f:accessModes: {}
          f:capacity:
            .: {}
            f:storage: {}
          f:csi:
            .: {}
            f:driver: {}
            f:fsType: {}
            f:volumeAttributes:
              .: {}
              f:diskSelector: {}
              f:migratable: {}
              f:nodeSelector: {}
              f:numberOfReplicas: {}
              f:staleReplicaTimeout: {}
            f:volumeHandle: {}
          f:persistentVolumeReclaimPolicy: {}
          f:storageClassName: {}
          f:volumeMode: {}
spec:
  capacity:
    storage: 400Gi
  csi:
    driver: driver.longhorn.io
    volumeHandle: longhorn-data
    fsType: ext4
    volumeAttributes:
      diskSelector: ''
      migratable: 'false'
      nodeSelector: ''
      numberOfReplicas: '3'
      staleReplicaTimeout: '20'
  accessModes:
    - ReadWriteMany
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: longhorn-data
    uid: af5bfd17-b1b1-437a-a3b1-a9c90ba724f7
    apiVersion: v1
    resourceVersion: '221539'
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-static
  volumeMode: Filesystem
status:
  phase: Bound

My Volume specs:

State: Attached
Health: Healthy
Ready for workload: Ready
Conditions: scheduled
Frontend: Block Device
Backend Data Engine: v1
Offline Replica Rebuilding: disabled
Attached Node & Endpoint:
diskserver00
/dev/longhorn/longhorn-data
Size: 400 Gi
Actual Size: 262 Gi
Data Locality: disabled
Access Mode: ReadWriteMany
Engine Image: longhornio/longhorn-engine:v1.5.1
Created: 11 days ago
Encrypted: False
Node Tags:
Disk Tags:
Last Backup:
Last Backup At:
Replicas Auto Balance: ignored
Snapshot Data Integrity: ignored
Instance Manager: instance-manager-6fcf05c270e6a8475ad7300181f1b8cb
Allow snapshots removal during trim: ignored
Replica Soft Anti Affinity: ignored
Replica Zone Soft Anti Affinity: ignored
Namespace: default
PVC Name: longhorn-data
PV Name: longhorn-data
PV Status: Bound
Revision Counter Disabled: False

Pod Name: crawler-master-6bbd6776f5-bdqwz
Pod Status: Running
Workload Name: crawler-master-6bbd6776f5
Workload Type: ReplicaSet

Pod Name: crawler-worker-7b9f6b4785-2g697
Pod Status: Pending
Workload Name: crawler-worker-7b9f6b4785
Workload Type: ReplicaSet

Pod Name: crawler-worker-7b9f6b4785-7xgjz
Pod Status: Running
Workload Name: crawler-worker-7b9f6b4785
Workload Type: ReplicaSet

Pod Name: crawler-worker-7b9f6b4785-bs26x
Pod Status: Running
Workload Name: crawler-worker-7b9f6b4785
Workload Type: ReplicaSet

Pod Name: crawler-worker-7b9f6b4785-vd6tc
Pod Status: Running
Workload Name: crawler-worker-7b9f6b4785
Workload Type: ReplicaSet
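
For reference, the crawler deployments mount the volume in the usual way, roughly like this (a sketch; the container name, image and mount path are placeholders, not the real values):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: crawler-worker
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: crawler-worker
  template:
    metadata:
      labels:
        app: crawler-worker
    spec:
      containers:
        - name: worker                   # placeholder
          image: example/crawler:latest  # placeholder
          volumeMounts:
            - name: data
              mountPath: /data           # placeholder
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: longhorn-data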