Hello
I had been using Harvester for a few months until a power failure hit our inverter and left the supposedly HA cluster (only 2 nodes, admittedly) unusable: the nodes start, but RKE2 never comes up.
Fortunately I can still SSH into the nodes, so I was able to locate the Longhorn volume of the VM I wanted and copy out its replica together with the corresponding backing image. After a few struggles with WSL (it doesn't work for this), I managed to spin up the simple-longhorn Docker container on another server. After installing open-iscsi (it would be nice if this were mentioned somewhere), I finally got the block device to appear in /dev/longhorn/.
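For reference, the host-side prep was roughly this (Ubuntu package name; other distros may differ, and I'm assuming the tgt-blockdev frontend needs iscsiadm on the host):

sudo apt-get install -y open-iscsi   # provides iscsiadm on the host
sudo modprobe iscsi_tcp              # iSCSI transport kernel module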
Unfortunately I still can't mount this volume:
sudo mount -o loop /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
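I suspect the replica actually holds a whole VM disk image (partition table, maybe LVM inside) rather than a bare filesystem, which would explain the bad-superblock error when mounting the device directly. This is roughly what I planned to try next, assuming the device maps to /dev/sdb as the log below suggests and that a first partition exists (both assumptions on my part):

# inspect what the exported block device actually contains
sudo blkid /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f
sudo fdisk -l /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f
# if a partition table shows up, mount a partition of the underlying SCSI disk instead
sudo partprobe /dev/sdb
sudo mount /dev/sdb1 /mnt

Is that the right direction, or am I missing something?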
simple-longhorn log
ubuntu@panel:~$ sudo docker run -it -v /home/ubuntu/longhorn/backing-images/default-image-knhwc-8c95e140:/host/var/lib/harvester/defaultdisk/backing-images/default-image-knhwc-8c95e140 -v /dev:/host/dev -v /proc:/host/proc -v /home/ubuntu/longhorn/replicas/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f:/volume --privileged longhornio/longhorn-engine:v1.1.1 launch-simple-longhorn pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f 21474836480
+ set -e
+ mount --rbind /host/dev /dev
+ volume=pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f
+ size=21474836480
+ frontend=
+ '[' -z pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f ']'
+ '[' -z 21474836480 ']'
+ '[' -z ']'
+ echo Use default frontend TGT block device
Use default frontend TGT block device
+ frontend=tgt-blockdev
+ exec longhorn-instance-manager daemon
+ start
+ set +e
+ true
+ /usr/local/bin/grpc_health_probe -addr localhost:8500
INFO[0000] Storing process logs at path: /var/log/instances
[longhorn-instance-manager] INFO[0000] Instance Manager listening to localhost:8500
timeout: failed to connect service "localhost:8500" within 1s
+ [[ 2 -eq 0 ]]
+ sleep 1
+ true
+ /usr/local/bin/grpc_health_probe -addr localhost:8500
status: SERVING
+ [[ 0 -eq 0 ]]
+ echo longhorn instance manager is ready
longhorn instance manager is ready
+ break
+ set -e
+ tgtd -f
+ longhorn-instance-manager process create --name pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r --binary /usr/local/bin/longhorn --port-count 15 --port-args --listen,localhost: -- replica /volume/ --size 21474836480
+ tee /var/log/tgtd.log
tgtd: iser_ib_init(3431) Failed to initialize RDMA; load kernel modules?
tgtd: work_timer_start(146) use timer_fd based scheduler
tgtd: bs_init(387) use signalfd notification
[longhorn-instance-manager] INFO[0002] Process Manager: prepare to create process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r
[longhorn-instance-manager] INFO[0002] Process Manager: created process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r
{
"name": "pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r",
"binary": "/usr/local/bin/longhorn",
"args": [
"replica",
"/volume/",
"--size",
"21474836480",
"--listen",
"localhost:10000"
],
"portCount": 15,
"portArgs": [
"--listen,localhost:"
],
"processStatus": {
"state": "starting",
"errorMsg": "",
"portStart": 10000,
"portEnd": 10014
},
"deleted": false
}
+ sleep 5
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:10Z" level=info msg="Listening on data server localhost:10001"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:10Z" level=info msg="Listening on sync agent server localhost:10002"
time="2023-12-27T20:44:10Z" level=info msg="Listening on gRPC Replica server localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:10Z" level=info msg="Listening on sync localhost:10002"
[longhorn-instance-manager] INFO[0002] Process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r has started at localhost:10000
+ longhorn-instance-manager process create --name pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e --binary /usr/local/bin/longhorn --port-count 1 --port-args --listen,localhost: -- controller pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f --frontend tgt-blockdev --replica tcp://localhost:10000
[longhorn-instance-manager] INFO[0007] Process Manager: prepare to create process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e
[longhorn-instance-manager] INFO[0007] Process Manager: created process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e
{
"name": "pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e",
"binary": "/usr/local/bin/longhorn",
"args": [
"controller",
"pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f",
"--frontend",
"tgt-blockdev",
"--replica",
"tcp://localhost:10000",
"--listen",
"localhost:10015"
],
"portCount": 1,
"portArgs": [
"--listen,localhost:"
],
"processStatus": {
"state": "starting",
"errorMsg": "",
"portStart": 10015,
"portEnd": 10015
},
"deleted": false
}
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Starting with replicas [\"tcp://localhost:10000\"]"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Connecting to remote: localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:15Z" level=info msg="New connection from: 127.0.0.1:52128"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Opening: localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:15Z" level=info msg="Opening volume /volume/, size 21474836480/512"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Adding backend: tcp://localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Start monitoring tcp://localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Get backend tcp://localhost:10000 revision counter 35907865"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="device pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f: SCSI device /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f shutdown"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] go-iscsi-helper: tgtd is already running
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:16Z" level=info msg="go-iscsi-helper: found available target id 1"
tgtd: device_mgmt(246) sz:118 params:path=/var/run/longhorn-pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f.sock,bstype=longhorn,bsopts=size=21474836480
tgtd: bs_thread_open(409) 16
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:16Z" level=info msg="New data socket connection established"
[longhorn-instance-manager] INFO[0008] wait for gRPC service of process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e to start at localhost:10015
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:16Z" level=info msg="default: automatically rescan all LUNs of all iscis sessions"
[longhorn-instance-manager] INFO[0009] wait for gRPC service of process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e to start at localhost:10015
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:17Z" level=info msg="Creating device /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f 8:16"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:17Z" level=info msg="device pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f: SCSI device sdb created"
time="2023-12-27T20:44:17Z" level=info msg="Listening on gRPC Controller server: localhost:10015"
[longhorn-instance-manager] INFO[0010] wait for gRPC service of process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e to start at localhost:10015
[longhorn-instance-manager] INFO[0010] Process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e has started at localhost:10015
Can I ask for some help recovering the config files inside the VM?
Thanks
PS: I have already gone through the following issues/threads, but they don't address this problem, or not completely:
- [QUESTION] How to recover from a catastrophic failure? · Issue #2714 · longhorn/longhorn · GitHub
- Unable to mount volume directly! help longhorn is holding my data ransom - #13 by Jade_Baxter
- github.com/longhorn/longhorn/issues/469
- github.com/longhorn/longhorn/issues/31
- github.com/longhorn/longhorn/issues/3817