Hello
I had been using Harvester for a few months until a power failure hit our inverter and left the supposedly HA cluster (only 2 nodes, admittedly) unusable: the nodes start, but RKE2 never comes up.
Fortunately I can still SSH into the nodes, so I was able to locate the Longhorn volume of the VM I wanted and copy out its replica together with the corresponding backing image. After a few struggles with WSL (it doesn't work for this), I managed to spin up the simple-longhorn Docker container on another server. After installing open-iscsi (it would be nice if this were mentioned somewhere), I finally got the block device to appear in /dev/longhorn/.
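For reference, the host-side prep was roughly this (Ubuntu package name; other distros may differ, and I'm assuming the tgt-blockdev frontend needs iscsiadm on the host):

sudo apt-get install -y open-iscsi   # provides iscsiadm on the host
sudo modprobe iscsi_tcp              # iSCSI transport kernel module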
Unfortunately I still can't mount this volume:
sudo mount -o loop /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop0, missing codepage or helper program, or other error.
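I suspect the replica actually holds a whole VM disk image (partition table, maybe LVM inside) rather than a bare filesystem, which would explain the bad-superblock error when mounting the device directly. This is roughly what I planned to try next, assuming the device maps to /dev/sdb as the log below suggests and that a first partition exists (both assumptions on my part):

# inspect what the exported block device actually contains
sudo blkid /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f
sudo fdisk -l /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f
# if a partition table shows up, mount a partition of the underlying SCSI disk instead
sudo partprobe /dev/sdb
sudo mount /dev/sdb1 /mnt

Is that the right direction, or am I missing something?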
simple-longhorn log
ubuntu@panel:~$ sudo docker run -it -v /home/ubuntu/longhorn/backing-images/default-image-knhwc-8c95e140:/host/var/lib/harvester/defaultdisk/backing-images/default-image-knhwc-8c95e140 -v /dev:/host/dev -v /proc:/host/proc -v /home/ubuntu/longhorn/replicas/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f:/volume --privileged longhornio/longhorn-engine:v1.1.1 launch-simple-longhorn pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f 21474836480
+ set -e
+ mount --rbind /host/dev /dev
+ volume=pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f
+ size=21474836480
+ frontend=
+ '[' -z pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f ']'
+ '[' -z 21474836480 ']'
+ '[' -z ']'
+ echo Use default frontend TGT block device
Use default frontend TGT block device
+ frontend=tgt-blockdev
+ exec longhorn-instance-manager daemon
+ start
+ set +e
+ true
+ /usr/local/bin/grpc_health_probe -addr localhost:8500
INFO[0000] Storing process logs at path: /var/log/instances
[longhorn-instance-manager] INFO[0000] Instance Manager listening to localhost:8500
timeout: failed to connect service "localhost:8500" within 1s
+ [[ 2 -eq 0 ]]
+ sleep 1
+ true
+ /usr/local/bin/grpc_health_probe -addr localhost:8500
status: SERVING
+ [[ 0 -eq 0 ]]
+ echo longhorn instance manager is ready
longhorn instance manager is ready
+ break
+ set -e
+ tgtd -f
+ longhorn-instance-manager process create --name pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r --binary /usr/local/bin/longhorn --port-count 15 --port-args --listen,localhost: -- replica /volume/ --size 21474836480
+ tee /var/log/tgtd.log
tgtd: iser_ib_init(3431) Failed to initialize RDMA; load kernel modules?
tgtd: work_timer_start(146) use timer_fd based scheduler
tgtd: bs_init(387) use signalfd notification
[longhorn-instance-manager] INFO[0002] Process Manager: prepare to create process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r
[longhorn-instance-manager] INFO[0002] Process Manager: created process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r
{
"name": "pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r",
"binary": "/usr/local/bin/longhorn",
"args": [
"replica",
"/volume/",
"--size",
"21474836480",
"--listen",
"localhost:10000"
],
"portCount": 15,
"portArgs": [
"--listen,localhost:"
],
"processStatus": {
"state": "starting",
"errorMsg": "",
"portStart": 10000,
"portEnd": 10014
},
"deleted": false
}
+ sleep 5
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:10Z" level=info msg="Listening on data server localhost:10001"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:10Z" level=info msg="Listening on sync agent server localhost:10002"
time="2023-12-27T20:44:10Z" level=info msg="Listening on gRPC Replica server localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:10Z" level=info msg="Listening on sync localhost:10002"
[longhorn-instance-manager] INFO[0002] Process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r has started at localhost:10000
+ longhorn-instance-manager process create --name pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e --binary /usr/local/bin/longhorn --port-count 1 --port-args --listen,localhost: -- controller pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f --frontend tgt-blockdev --replica tcp://localhost:10000
[longhorn-instance-manager] INFO[0007] Process Manager: prepare to create process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e
[longhorn-instance-manager] INFO[0007] Process Manager: created process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e
{
"name": "pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e",
"binary": "/usr/local/bin/longhorn",
"args": [
"controller",
"pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f",
"--frontend",
"tgt-blockdev",
"--replica",
"tcp://localhost:10000",
"--listen",
"localhost:10015"
],
"portCount": 1,
"portArgs": [
"--listen,localhost:"
],
"processStatus": {
"state": "starting",
"errorMsg": "",
"portStart": 10015,
"portEnd": 10015
},
"deleted": false
}
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Starting with replicas [\"tcp://localhost:10000\"]"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Connecting to remote: localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:15Z" level=info msg="New connection from: 127.0.0.1:52128"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Opening: localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-r] time="2023-12-27T20:44:15Z" level=info msg="Opening volume /volume/, size 21474836480/512"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Adding backend: tcp://localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Start monitoring tcp://localhost:10000"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="Get backend tcp://localhost:10000 revision counter 35907865"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:15Z" level=info msg="device pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f: SCSI device /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f shutdown"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] go-iscsi-helper: tgtd is already running
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:16Z" level=info msg="go-iscsi-helper: found available target id 1"
tgtd: device_mgmt(246) sz:118 params:path=/var/run/longhorn-pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f.sock,bstype=longhorn,bsopts=size=21474836480
tgtd: bs_thread_open(409) 16
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:16Z" level=info msg="New data socket connection established"
[longhorn-instance-manager] INFO[0008] wait for gRPC service of process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e to start at localhost:10015
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:16Z" level=info msg="default: automatically rescan all LUNs of all iscis sessions"
[longhorn-instance-manager] INFO[0009] wait for gRPC service of process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e to start at localhost:10015
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:17Z" level=info msg="Creating device /dev/longhorn/pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f 8:16"
[pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e] time="2023-12-27T20:44:17Z" level=info msg="device pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f: SCSI device sdb created"
time="2023-12-27T20:44:17Z" level=info msg="Listening on gRPC Controller server: localhost:10015"
[longhorn-instance-manager] INFO[0010] wait for gRPC service of process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e to start at localhost:10015
[longhorn-instance-manager] INFO[0010] Process pvc-c69c17b1-c61a-405b-8036-f8d2efae3bb7-37126f0f-e has started at localhost:10015
Can I ask for some help recovering the config files inside the VM?
Thanks
PS: I have already gone through the following issues/threads, but they don't address this problem, or not completely:
- [QUESTION] How to recover from a catastrophic failure? · Issue #2714 · longhorn/longhorn · GitHub
- Unable to mount volume directly! help longhorn is holding my data ransom - #13 by Jade_Baxter
- github.com/longhorn/longhorn/issues/469
- github.com/longhorn/longhorn/issues/31
- github.com/longhorn/longhorn/issues/3817