Hello
I’m having problems with my Longhorn installation in a cluster: the longhorn-ui pod gets stuck in CrashLoopBackOff with exit code 127 and doesn’t generate any logs.
While searching for possible debugging options, I came across someone saying that the UI can fail this way because of problems in the longhorn-manager pods. Analyzing the longhorn-manager logs, I found two types of errors.
The first one is:
time="2024-12-18T17:58:00Z" level=warning msg="Failed to get filesystem device type of /var/lib/longhorn/" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:2136" controller=longhorn-setting error="lstat /sys/class/block/VolGroup00-var: no such file or directory" node=plsnrewrkapp003
time="2024-12-18T17:57:57Z" level=warning msg="Failed to get filesystem device type of /app" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:2136" controller=longhorn-setting error="lstat /sys/class/block/vg_app-lv_app: no such file or directory" node=plsnrewrkapp004
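The names in these warnings (VolGroup00-var, vg_app-lv_app) look like LVM device-mapper names. As far as I know, device-mapper devices normally appear in sysfs as /sys/class/block/dm-N, with the mapper name stored in /sys/class/block/dm-N/dm/name, so an lstat on /sys/class/block/<mapper-name> would be expected to fail. A minimal sketch to check that mapping on a node, assuming the standard sysfs layout (the function name is hypothetical, not part of Longhorn):

```python
import os


def dm_name_to_block_device(mapper_name, sys_block="/sys/class/block"):
    """Return the kernel block device name (e.g. 'dm-3') whose dm/name
    matches mapper_name, or None if no such device is found.

    Assumes the usual sysfs layout where device-mapper devices expose
    their mapper name in <sys_block>/dm-N/dm/name.
    """
    try:
        entries = os.listdir(sys_block)
    except FileNotFoundError:
        return None
    for entry in sorted(entries):
        name_file = os.path.join(sys_block, entry, "dm", "name")
        try:
            with open(name_file) as f:
                if f.read().strip() == mapper_name:
                    return entry
        except OSError:
            # Not a device-mapper device (no dm/name file); skip it.
            continue
    return None
```

Run on plsnrewrkapp003, dm_name_to_block_device("VolGroup00-var") should return a dm-N name if the volume group exists there, which would suggest the path Longhorn tries is just the wrong name for the device rather than a missing disk.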
I know that is only a warning, but what caught my attention are the following:
time="2024-12-18T18:44:54Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
10.5.16.141 - - [18/Dec/2024:18:44:56 +0000] "GET /v1/volumes/pcs-opengateway-volume HTTP/1.1" 200 12247 "" "Go-http-client/1.1"
time="2024-12-18T18:45:24Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
time="2024-12-18T18:45:54Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
This happens not only with this volume but with several PVs.
Those PVs are currently bound and working perfectly in their respective deployments, as shown below:
NAME                                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                 STORAGECLASS      REASON   AGE
pcs-opengateway-alloy-prod-pv        4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-alloy-prod-pvc        longhorn-static            30d
pcs-opengateway-grafana-prod-pv      1Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-grafana-prod-pvc      longhorn-static            29d
pcs-opengateway-prod-loki-pv         4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-prod-loki-pvc         longhorn-static            38d
pcs-opengateway-prometheus-prod-pv   4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-prometheus-prod-pvc   longhorn-static            24d
pcs-opengateway-pv                   4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-pvc                   longhorn-static            65d
pcs-techcore-workshop-pv             32Gi       RWX            Retain           Bound      pcs-techcore-workshop/pcs-techcore-workshop-pvc       longhorn-static            49d
prueba-pv                            2Gi        RWX            Retain           Released   pcs-opengateway/prueba-pvc                            longhorn-static            41d
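One thing I noticed: the warning says volume.longhorn.io "pcs-opengateway-pv" not found, while the request logged right after it is for /v1/volumes/pcs-opengateway-volume, so the PV name and the Longhorn volume name seem to differ. In Longhorn's static-provisioning setup, spec.csi.volumeHandle holds the Longhorn volume name. If the pod controller looks the volume up by PV name (an assumption I have not verified in the Longhorn source), every static PV whose name differs from its volumeHandle would produce this warning. A minimal sketch to list those PVs, assuming the CSI driver name driver.longhorn.io (the function name is hypothetical):

```python
import json

# Assumed CSI driver name registered by Longhorn.
LONGHORN_CSI_DRIVER = "driver.longhorn.io"


def mismatched_longhorn_pvs(pv_list_json):
    """Given the JSON text of `kubectl get pv -o json`, return a list of
    (pv_name, volume_handle) for Longhorn CSI PVs whose name differs
    from their spec.csi.volumeHandle."""
    data = json.loads(pv_list_json)
    mismatches = []
    for pv in data.get("items", []):
        csi = pv.get("spec", {}).get("csi")
        # Skip PVs that are not provisioned through the Longhorn CSI driver.
        if not csi or csi.get("driver") != LONGHORN_CSI_DRIVER:
            continue
        name = pv["metadata"]["name"]
        handle = csi.get("volumeHandle")
        if handle != name:
            mismatches.append((name, handle))
    return mismatches
```

To run it against the cluster, pass it the output of subprocess.run(["kubectl", "get", "pv", "-o", "json"], capture_output=True, text=True).stdout. If the PVs that appear in the warnings are exactly the ones returned, the warning is probably a naming mismatch rather than a missing volume.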
OS: Red Hat Enterprise Linux release 8.9 (Ootpa)
Kernel version: 4.18.0-513.5.1.el8_9.x86_64
CPU per Node: 16vCPU
RAM per Node: 32GB
Kubernetes version: v1.28.14 (K8s)
Longhorn version: 1.7.1
This problem appeared from one day to the next, during a week with no deployments.
If anyone knows what is happening or how to debug this problem further, I would appreciate the help.
Thanks in advance