Error after migration

Hello, some time ago, after one of the upgrade migrations (I’m currently using 1.2.2), the Backup tab stopped working, showing an error:

error listing backup volume names: Failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.2/longhorn [backup ls --volume-only nfs://omv:/backup/longhorn], output Failed to execute: ls [-1 /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes], output ls: cannot access '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes': Permission denied , error exit status 2 , stderr, time="2021-10-14T16:29:34Z" level=warning msg="failed to list first level dirs for path: backupstore/volumes reason: Failed to execute: ls [-1 /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes], output ls: cannot access '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes': Permission denied\n, error exit status 2" pkg=backupstore time="2021-10-14T16:29:34Z" level=error msg="Failed to execute: ls [-1 /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes], output ls: cannot access '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes': Permission denied\n, error exit status 2" , error exit status 1

Besides that, everything works as expected.

Here’s my deployment: https://github.com/Marx2/homelab/blob/main/cluster/core/longhorn/helm-release.yaml

Can you help me fix this problem?

Edit: I’m using 3 nodes.
The folder /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes exists on only one of them, and even as root I can’t access it:

root@longhorn-manager-chksw:/# ls -al /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/
ls: cannot open directory '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/': Permission denied
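
A common cause of `Permission denied` for root over NFS is `root_squash` on the export, which maps the client’s root to `nobody`. A hedged way to check this (the directory path is from the error above; the `/etc/exports` location on the server is an assumption, since OMV normally manages exports through its UI):

```shell
# On the node where the mount exists: check ownership and mode as the NFS
# client sees them. With root_squash active, root is mapped to nobody and
# is denied on directories that are not world-readable.
stat -c '%U:%G %a %n' /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore

# On the NFS server: look for root_squash vs no_root_squash on the export
# (file location is an assumption; adjust for your server).
grep longhorn /etc/exports
```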

Could you please go to the node and try to unmount the NFS mount point manually? Then Longhorn will remount it when it needs to access the remote backup target.
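
For reference, that step might look like this on the affected node (the mount path is taken from the error output in this thread):

```shell
# Find the Longhorn backupstore mount(s) on this node.
mount | grep longhorn-backupstore-mounts

# Unmount it; Longhorn remounts on the next backup-target access.
umount /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn

# If the mount point is busy, a lazy unmount detaches it once idle:
# umount -l /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn
```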

Hi, I did it.
The mount is:

/dev/mapper/pve-root on /var/lib/longhorn-setting type ext4 (ro,relatime,errors=remount-ro)
omv:/backup/longhorn on /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.42.1.61,local_lock=none,addr=192.168.1.230)

After unmounting, I went to the Dashboard and it was remounted automatically like this:

/dev/mapper/pve-root on /var/lib/longhorn-setting type ext4 (ro,relatime,errors=remount-ro)
omv:/backup/longhorn on /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.42.1.61,local_lock=none,addr=192.168.1.230)

Unfortunately, going into the Backup tab still shows the same error. It’s also visible in the manager’s logs:

time="2021-10-15T05:59:34Z" level=error msg="Error listing backup volumes from backup target" controller=longhorn-backup-target cred= error="error listing backup volume names: Failed to execute: /var/lib/longhorn/engine-binaries/longhornio-longhorn-engine-v1.2.2/longhorn [backup ls --volume-only nfs://omv:/backup/longhorn], output Failed to execute: ls [-1 /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes], output ls: cannot access '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes': Permission denied\n, error exit status 2\n, stderr, time=\"2021-10-15T05:59:34Z\" level=warning msg=\"failed to list first level dirs for path: backupstore/volumes reason: Failed to execute: ls [-1 /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes], output ls: cannot access '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes': Permission denied\\n, error exit status 2\" pkg=backupstore\ntime=\"2021-10-15T05:59:34Z\" level=error msg=\"Failed to execute: ls [-1 /var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes], output ls: cannot access '/var/lib/longhorn-backupstore-mounts/omv/backup/longhorn/backupstore/volumes': Permission denied\\n, error exit status 2\"\n, error exit status 1" interval=5m0s node=wezyr url="nfs://omv:/backup/longhorn"

Could this somehow be connected with this bug?

I can’t test it, because I installed Longhorn from the Helm chart, and I don’t know how to pass the flag:

    - --default-fstype=ext4

I’ve also made this change in the Helm chart, but it doesn’t seem to work:

    values:
      persistence:
        defaultFsType: ext4
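
If the chart were installed with plain Helm rather than through a Flux HelmRelease, the value could be set directly; a minimal sketch, assuming the release is named `longhorn` in the `longhorn-system` namespace (both names are assumptions):

```shell
# Re-apply the chart with only this value changed, keeping existing values.
helm upgrade longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --reuse-values \
  --set persistence.defaultFsType=ext4

# Verify what the deployed release actually received:
helm get values longhorn --namespace longhorn-system
```

With a Flux HelmRelease, the equivalent is to put the same keys under `spec.values` and let the controller reconcile.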

I guess the NFS server is inside the Kubernetes cluster.

  • Can you check how your NFS was deployed?
  • Was the NFS server accessible by other in-cluster Pods after the upgrade?
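
One way to check both points is a quick reachability test; the commands below are illustrative (`omv` is the server name from this thread, and the pod name and image are arbitrary):

```shell
# From a cluster node (showmount ships with the nfs-common package):
# confirm the export is visible and lists the expected path.
showmount -e omv

# From inside the cluster: confirm the NFS hostname resolves for pods too.
kubectl run dns-check --rm -it --restart=Never --image=busybox -- nslookup omv
```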

The bug [BUG] Longhorn 1.2.0 - wrong volume permissions inside container / broken fsGroup · Issue #2964 · longhorn/longhorn · GitHub is not related to this issue.
That bug is fixed in Longhorn v1.2.2, which is the version you are using.

Maybe it’s not fixed? Or maybe it is, but I need to do something myself to make it work (e.g. reset, restart, or reconfigure something)? How can I check?
I suppose reinstalling would fix it, but since I have no working backups, I can’t do that.

Sorry for the late response. The fix only works for newly provisioned PVs.
