I have tried installing Longhorn on a 4-node k3s cluster: one x86 PowerEdge (control plane) and three Raspberry Pi 4s. Longhorn storage on the PowerEdge is an SSD; each Pi uses its own attached, powered HDD. All Longhorn pods are running with no errors or restarts, but every volume reports as degraded, and any replica not on the control-plane node quickly errors out, is recreated, and errors out again:
$ kubectl get replicas.longhorn.io -n longhorn-system
NAME STATE NODE DISK INSTANCEMANAGER IMAGE AGE
pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-6bb2c2a1 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 14h
pvc-f7fda1f2-897b-4310-8548-c1ad64040ea1-r-adad8378 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 4d3h
pvc-acd10539-0474-49b6-837b-b985c65b8925-r-08d9fd13 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 4d3h
pvc-47c0f6df-3b36-456c-a650-991d7131fa82-r-047c7b26 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 4d4h
pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-57e62375 stopped 8m9s
pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-f4cbbf91 stopped rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 8m9s
pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-6b053823 stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 8m9s
pvc-f7fda1f2-897b-4310-8548-c1ad64040ea1-r-538a3bb8 stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 3s
pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-4e1c8d50 stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 2s
pvc-acd10539-0474-49b6-837b-b985c65b8925-r-9f12f6ee stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 2s
pvc-47c0f6df-3b36-456c-a650-991d7131fa82-r-ab2b9023 stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 2s
pvc-f7fda1f2-897b-4310-8548-c1ad64040ea1-r-fea692f8 stopped rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 2s
pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-18bd95b9 starting rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 instance-manager-754ee9c224195750355753939287ef17 2s
pvc-acd10539-0474-49b6-837b-b985c65b8925-r-1a9ef538 starting rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 instance-manager-754ee9c224195750355753939287ef17 3s
pvc-47c0f6df-3b36-456c-a650-991d7131fa82-r-44a185a6 error rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 instance-manager-754ee9c224195750355753939287ef17 2s
[...wait a few seconds...]
$ kubectl get replicas.longhorn.io -n longhorn-system
NAME STATE NODE DISK INSTANCEMANAGER IMAGE AGE
pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-6bb2c2a1 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 14h
pvc-f7fda1f2-897b-4310-8548-c1ad64040ea1-r-adad8378 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 4d3h
pvc-acd10539-0474-49b6-837b-b985c65b8925-r-08d9fd13 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 4d3h
pvc-47c0f6df-3b36-456c-a650-991d7131fa82-r-047c7b26 running epsilon 16a8c5dc-4b44-4100-94f6-0995c0a85b8b instance-manager-c8e888869e140a5618feabb01783baaa longhornio/longhorn-engine:v1.5.0 4d4h
pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-57e62375 stopped 8m51s
pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-f4cbbf91 stopped rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 8m51s
pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-6b053823 stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 8m51s
pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-64b001c3 stopped 1s
pvc-47c0f6df-3b36-456c-a650-991d7131fa82-r-4755af1d error rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 instance-manager-754ee9c224195750355753939287ef17 2s
pvc-acd10539-0474-49b6-837b-b985c65b8925-r-f4e63118 stopped rasnu1 085c5398-ff0d-4aae-a3ee-2a9133e1c564 2s
pvc-f7fda1f2-897b-4310-8548-c1ad64040ea1-r-48f0d951 stopped 1s
pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-b9e06832 error rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b instance-manager-c0b889266e751abfd03088e8529cacb0 3s
pvc-f7fda1f2-897b-4310-8548-c1ad64040ea1-r-734c1ea0 error rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b instance-manager-c0b889266e751abfd03088e8529cacb0 3s
pvc-47c0f6df-3b36-456c-a650-991d7131fa82-r-f0cc49c8 error rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b instance-manager-c0b889266e751abfd03088e8529cacb0 3s
pvc-acd10539-0474-49b6-837b-b985c65b8925-r-887183c1 stopped rassigma c5f7d455-1486-4015-8e3f-2f6d5112d11b 1s
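In case it helps, this is how I'm watching the churn and grabbing the full state of an erroring replica (the exact replica names change constantly as they're recreated, so <replica-name> below is a placeholder):
$ # watch replicas being created, erroring, and getting recreated in real time
$ kubectl -n longhorn-system get replicas.longhorn.io -w
$ # dump everything Longhorn records about one failing replica
$ kubectl -n longhorn-system get replicas.longhorn.io <replica-name> -o yaml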
The disk on one of the Pis is nearly full (with non-Longhorn data), so I've disabled it for scheduling, but there's plenty of room on the others.
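For context, this is how I'm checking per-disk capacity and schedulability from the CLI (a rough check; I'm assuming the per-disk details live under status.diskStatus, which may vary between Longhorn versions):
$ # high-level schedulability per node
$ kubectl -n longhorn-system get nodes.longhorn.io
$ # per-disk capacity, reservation, and scheduled-storage details for one Pi
$ kubectl -n longhorn-system get nodes.longhorn.io rasnu1 -o jsonpath='{.status.diskStatus}'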
Logs from the longhorn-manager pod do show some recurring errors, but I'm not sure how to act on either of them:
time="2023-07-21T07:07:22Z" level=debug msg="Requeue volume due to error <nil> or Operation cannot be fulfilled on replicas.longhorn.io \"pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7-r-e645e973\": the object has been modified; please apply your changes to the latest version and try again" accessMode=rwo controller=longhorn-volume frontend=blockdev migratable=false node=epsilon owner=epsilon state=attached volume=pvc-58e326c8-3777-42ad-89cf-9eacf0ba0fb7
...
time="2023-07-21T07:07:22Z" level=error msg="There's no available disk for replica pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-57e62375, size 536870912000"
time="2023-07-21T07:07:22Z" level=warning msg="Failed to schedule replica" accessMode=rwx controller=longhorn-volume frontend=blockdev migratable=false node=epsilon owner=epsilon replica=pvc-747fcde6-1e35-48de-9b30-74d601870c43-r-57e62375 shareEndpoint= shareState=stopped state=detached volume=pvc-747fcde6-1e35-48de-9b30-74d601870c43
Similar lines recur for the other replica IDs.
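For what it's worth, 536870912000 bytes is exactly 500 GiB, and per the warning above the unschedulable replica belongs to the RWX volume pvc-747fcde6-1e35-48de-9b30-74d601870c43, the one whose replicas never get a node assignment in the tables above. This is how I'm checking the scheduling-related settings (assuming the standard Longhorn setting names; adjust if yours differ):
$ # how far Longhorn may over-provision each disk relative to its actual capacity
$ kubectl -n longhorn-system get settings.longhorn.io storage-over-provisioning-percentage
$ # minimum free space a disk must keep to remain schedulable for new replicas
$ kubectl -n longhorn-system get settings.longhorn.io storage-minimal-available-percentage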
I’ll generate a Support Bundle now.
EDIT: I tried uploading the Support Bundle (~600M) to Git LFS, but it was apparently too large. Any suggestions for alternative upload locations are welcome.
EDIT2: GitHub issue opened here.