Longhorn-UI in CrashLoopBackOff and errors in Longhorn-Manager

Hello

I'm having problems with my installation of Longhorn in a cluster: the longhorn-ui pod gets stuck in CrashLoopBackOff with exit code 127 and doesn't generate any logs.

While searching for possible debugging options, I came across someone saying that the UI failing in this way may be caused by problems in the longhorn-manager pods. Analyzing their logs, I found two types of errors:

The first one is:

time="2024-12-18T17:58:00Z" level=warning msg="Failed to get filesystem device type of /var/lib/longhorn/" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:2136" controller=longhorn-setting error="lstat /sys/class/block/VolGroup00-var: no such file or directory" node=plsnrewrkapp003
time="2024-12-18T17:57:57Z" level=warning msg="Failed to get filesystem device type of /app" func="controller.(*ClusterInfo).collectNodeDiskCount" file="setting_controller.go:2136" controller=longhorn-setting error="lstat /sys/class/block/vg_app-lv_app: no such file or directory" node=plsnrewrkapp004

I know these are only warnings, but the other thing that caught my attention is the following:

time="2024-12-18T18:44:54Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
10.5.16.141 - - [18/Dec/2024:18:44:56 +0000] "GET /v1/volumes/pcs-opengateway-volume HTTP/1.1" 200 12247 "" "Go-http-client/1.1"
time="2024-12-18T18:45:24Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt
time="2024-12-18T18:45:54Z" level=warning msg="Cannot auto-delete Pod when the associated Volume is not found" func="controller.(*KubernetesPodController).getAssociatedVolumes" file="kubernetes_pod_controller.go:634" controller=longhorn-kubernetes-pod error="volume.longhorn.io \"pcs-opengateway-pv\" not found" node=plsnrewrkapp003 pod=pcs-opengateway-backend-01-pod-bb54f877c-zbzwt

This happens not only with this volume but with several PVs.
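To see which volumes are affected and how often, the warnings can be tallied from a saved copy of the longhorn-manager log. This is only a minimal sketch, assuming the log lines keep the format shown above; the function name `count_missing_volumes` and the regular expression are illustrative and not part of Longhorn.

```python
import re
from collections import Counter

# Matches the escaped volume name in lines such as:
#   error="volume.longhorn.io \"pcs-opengateway-pv\" not found"
# Assumption: the name is always wrapped in backslash-escaped quotes.
NOT_FOUND = re.compile(r'volume\.longhorn\.io \\"([^"\\]+)\\" not found')


def count_missing_volumes(log_text):
    """Return a Counter mapping volume name -> number of 'not found' warnings."""
    counts = Counter()
    for line in log_text.splitlines():
        match = NOT_FOUND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, with the manager log saved to a file (here a hypothetical `manager.log`), `count_missing_volumes(open("manager.log").read()).most_common()` lists the reported volume names from most to least frequent.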

Those PVs are currently claimed and working perfectly in their respective deployments, as shown below:

NAME                                 CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                                                 STORAGECLASS      REASON   AGE
pcs-opengateway-alloy-prod-pv        4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-alloy-prod-pvc        longhorn-static            30d
pcs-opengateway-grafana-prod-pv      1Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-grafana-prod-pvc      longhorn-static            29d
pcs-opengateway-prod-loki-pv         4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-prod-loki-pvc         longhorn-static            38d
pcs-opengateway-prometheus-prod-pv   4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-prometheus-prod-pvc   longhorn-static            24d
pcs-opengateway-pv                   4Gi        RWX            Retain           Bound      pcs-opengateway/pcs-opengateway-pvc                   longhorn-static            65d
pcs-techcore-workshop-pv             32Gi       RWX            Retain           Bound      pcs-techcore-workshop/pcs-techcore-workshop-pvc       longhorn-static            49d
prueba-pv                            2Gi        RWX            Retain           Released   pcs-opengateway/prueba-pvc                            longhorn-static            41d

OS: Red Hat Enterprise Linux release 8.9 (Ootpa)
Kernel version: 4.18.0-513.5.1.el8_9.x86_64
CPU per Node: 16vCPU
RAM per Node: 32GB
Kubernetes version: v1.28.14 (K8s)
Longhorn version: 1.7.1

This problem appeared from one day to the next, in a week when there were no deployments.

If someone knows what is happening or how to debug this problem further, I would appreciate it.

Thanks in advance