Longhorn-UI in CrashLoopBackOff and errors in Longhorn-Manager

RockD · December 19, 2024, 8:43pm

Hello

I’m having problems with my installation of Longhorn in a cluster. The longhorn-ui pod gets stuck in CrashLoopBackOff with exit code 127 and doesn’t generate any logs.

While searching for debugging options, I came across someone saying that when the UI fails this way, it may be due to problems in the longhorn-manager pods. Analyzing their logs, I found two types of errors:

The first type is:

time="2024-12-18T17:58:00Z" level=warning msg="Failed to get filesystem device type of /var/lib/longhorn/" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:2136" controller=longhorn-setting error="lstat /sys/class/block/VolGroup00-var: no such file or directory" node=plsnrewrkapp003
time="2024-12-18T17:57:57Z" level=warning msg="Failed to get filesystem device type of /app" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:2136" controller=longhorn-setting error="lstat /sys/class/block/vg_app-lv_app: no such file or directory" node=plsnrewrkapp004

I know these are only warnings, but the other thing that got my attention is the following:

time="2024-12-18T18:44:54Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
10.5.16.141 - - [18/Dec/2024:18:44:56 +0000] "GET /v1/volumes/pcs-opengateway-volume HTTP/1.1" 200 12247 "" "Go-http-client/1.1"
time="2024-12-18T18:45:24Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
time="2024-12-18T18:45:54Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt

This happens not only with this volume but with several PVs.
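To get the full list of affected volumes instead of reading the log by eye, the manager output can be summarized with a small script. This is only a minimal sketch: it assumes the log lines keep the key="value" layout shown above, and the names parse_fields and missing_volumes are my own, not part of Longhorn.

```python
import re
from collections import defaultdict

# Matches key=value pairs where the value is either a quoted string
# (possibly containing escaped quotes) or a bare token without spaces.
FIELD = re.compile(r'(\w+)=(?:"((?:[^"\\]|\\.)*)"|(\S+))')

# Matches the error text seen in the "Cannot auto-delete Pod" warnings.
MISSING = re.compile(r'volume\.longhorn\.io "([^"]+)" not found')


def parse_fields(line):
    """Return the key=value fields of one manager log line as a dict."""
    fields = {}
    for key, quoted, bare in FIELD.findall(line):
        # Quoted values still carry \" escapes; bare values are used as-is.
        fields[key] = quoted.replace('\\"', '"') if quoted else bare
    return fields


def missing_volumes(lines):
    """Map each volume name reported as not found to the pods that reference it."""
    result = defaultdict(set)
    for line in lines:
        fields = parse_fields(line)
        match = MISSING.search(fields.get("error", ""))
        if match:
            result[match.group(1)].add(fields.get("pod", ""))
    return dict(result)


# Sample line abbreviated from the log above (func and file fields omitted).
sample = [
    r'time="2024-12-18T18:44:54Z" level=warning '
    r'msg="Cannot auto-delete Pod when the associated Volume is not found" '
    r'controller=longhorn-kubernetes-pod '
    r'error="volume.longhorn.io \"pcs-opengateway-pv\" not found" '
    r'node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt',
]

for volume, pods in sorted(missing_volumes(sample).items()):
    print(volume, sorted(pods))
```

In practice the lines would come from the saved output of kubectl logs for each longhorn-manager pod, passed to missing_volumes as a list of strings. Lines that are not in key="value" form, such as the HTTP access log entries, are simply ignored.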

Those PVs are currently claimed and working perfectly in their respective deployments, as shown below:

NAME                                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                 STORAGECLASS      REASON   AGE
pcs-opengateway-alloy-prod-pv        4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-alloy-prod-pvc        longhorn-static            30d
pcs-opengateway-grafana-prod-pv      1Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-grafana-prod-pvc      longhorn-static            29d
pcs-opengateway-prod-loki-pv         4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-prod-loki-pvc         longhorn-static            38d
pcs-opengateway-prometheus-prod-pv   4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-prometheus-prod-pvc   longhorn-static            24d
pcs-opengateway-pv                   4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-pvc                   longhorn-static            65d
pcs-techcore-workshop-pv             32Gi       RWX            Retain           Bound      pcs-techcore-workshop/pcs-techcore-workshop-pvc       longhorn-static            49d
prueba-pv                            2Gi        RWX            Retain           Released   pcs-opengateway/prueba-pvc                            longhorn-static            41d

OS: Red Hat Enterprise Linux release 8.9 (Ootpa)
Kernel version: 4.18.0-513.5.1.el8_9.x86_64
CPU per Node: 16vCPU
RAM per Node: 32GB
Kubernetes version: v1.28.14 (K8s)
Longhorn version: 1.7.1

This problem appeared from one day to the next, in a week when there were no deployments.

If anyone knows what is happening or how to debug this problem further, I would appreciate it.

Thanks in advance
