I have a Longhorn volume that the UI shows in the “deleting” state.
Repeated attempts to delete it have no effect; it stays stuck in this state.
I have already removed the corresponding PV/PVC from Kubernetes, but the Longhorn volume is still listed as “deleting” in the UI.
Background info regarding this volume:
Initially the volume was a valid PV/PVC following the standard naming convention “pvc-*”.
At some point we had DiskPressure issues on a node, which broke the volume because rescheduling the Longhorn replicas failed due to insufficient resources on the other nodes.
I restored the broken volume from a backup using the Longhorn UI, but under a different volume name that does not follow the “pvc-*” pattern.
Afterwards that Longhorn volume (PV/PVC) was usable from the corresponding pod again.
The DiskPressure issue then reappeared, and the Longhorn volume broke again due to the resulting pod eviction.
After that, the Longhorn volume was shown in the UI as “Detached”, even though the rescheduled pod successfully accessed the PV/PVC.
The name of the stuck Longhorn volume is “sfpl-prod-file-service”.
The “instance-manager-r-af5c615a” pod mentioned in the logs is no longer running; I guess it was evicted when the DiskPressure happened.
The longhorn-manager logs start with the snippet below, and the last errors repeat over and over:
time="2020-02-24T16:16:55Z" level=info msg="Start overwriting built-in settings with customized values"
time="2020-02-24T16:16:55Z" level=debug msg="Engine image longhornio/longhorn-engine:v0.7.0 is ready"
time="2020-02-24T16:16:55Z" level=info msg="Listening on 10.42.4.140:9500"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn node controller"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn replica controller"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn engine controller"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn websocket controller"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn volume controller"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn Engine Image controller"
time="2020-02-24T16:16:55Z" level=info msg="Starting Longhorn instance manager controller"
time="2020-02-24T16:16:55Z" level=info msg="Start Longhorn Setting controller"
time="2020-02-24T16:16:55Z" level=info msg="Start kubernetes controller"
time="2020-02-24T16:16:56Z" level=debug msg="Start monitoring pvc-1a6b2d60-2c86-437f-bad4-6c3e01abab4b-e-38079260"
time="2020-02-24T16:16:56Z" level=debug msg="Start monitoring instance manager instance-manager-r-33fd6639"
time="2020-02-24T16:16:56Z" level=warning msg="Error syncing Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
time="2020-02-24T16:16:56Z" level=debug msg="Start monitoring pvc-e580989d-01e3-4b58-bf8b-6fee72b63762-e-15b99770"
time="2020-02-24T16:16:56Z" level=debug msg="Start backup store monitoring for s3://longhorn-backup@us-east-1/"
time="2020-02-24T16:16:56Z" level=debug msg="Start monitoring instance manager instance-manager-e-5f078c81"
time="2020-02-24T16:16:56Z" level=warning msg="Error syncing Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
E0224 16:16:56.073734 1 replica_controller.go:178] fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io "instance-manager-r-af5c615a" not found
time="2020-02-24T16:16:56Z" level=warning msg="Dropping Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022 out of the queue: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
time="2020-02-24T16:17:25Z" level=warning msg="Error syncing Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
time="2020-02-24T16:17:25Z" level=warning msg="Error syncing Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
E0224 16:17:25.795593 1 replica_controller.go:178] fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io "instance-manager-r-af5c615a" not found
time="2020-02-24T16:17:25Z" level=warning msg="Dropping Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022 out of the queue: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
.....
E0225 07:23:56.705452 1 replica_controller.go:178] fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io "instance-manager-r-af5c615a" not found
time="2020-02-25T07:23:56Z" level=warning msg="Dropping Longhorn replica longhorn-system/sfpl-prod-file-service-r-720aa022 out of the queue: fail to sync replica for longhorn-system/sfpl-prod-file-service-r-720aa022: failed to cleanup the related replica process before deleting replica sfpl-prod-file-service-r-720aa022: instancemanager.longhorn.io \"instance-manager-r-af5c615a\" not found"
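
To make the repetition in the log above easier to see, here is a minimal Python sketch that counts how often each replica fails on each missing resource. It relies only on the message format visible in the snippet ("before deleting replica <name>: <kind> "<resource>" not found"). The function name `summarize` and both regular expressions are my own assumptions for illustration; they are not part of Longhorn.

```python
import re
from collections import Counter

# Matches the resource reported as missing, with or without escaped quotes,
# e.g.  instancemanager.longhorn.io "instance-manager-r-af5c615a" not found
NOT_FOUND = re.compile(r'(\S+)\s+\\?"([^"\\]+)\\?"\s+not found')

# Matches the replica named in "before deleting replica <name>:"
REPLICA = re.compile(r'before deleting replica (\S+?):')


def summarize(lines):
    """Count (replica, missing kind, missing name) triples in log lines.

    Hypothetical helper: it only understands the message format shown in
    the log snippet above and ignores every other line.
    """
    counts = Counter()
    for line in lines:
        missing = NOT_FOUND.search(line)
        replica = REPLICA.search(line)
        if missing and replica:
            key = (replica.group(1), missing.group(1), missing.group(2))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Read log lines from stdin and print one summary row per triple.
    for (replica, kind, name), n in summarize(sys.stdin).most_common():
        print(f"{n:6d}  replica={replica}  missing {kind}/{name}")
```

If the script were saved under a name of your choosing, piping the longhorn-manager log into it on stdin would print one row per replica and missing resource. For the snippet above, every matching line should collapse into a single row for sfpl-prod-file-service-r-720aa022 and instance-manager-r-af5c615a.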