Longhorn-system stuck Terminating

Was experimenting with Longhorn. It is a cool technology, but I need some help uninstalling it. The namespace is currently stuck in the “Terminating” state.

Found the uninstall steps in the GitHub instructions, but they don’t work at this point because they expect the namespace to be in a normal state.

Any suggestions?

Output of kubectl get namespace longhorn-system -o json:

{
    "apiVersion": "v1",
    "kind": "Namespace",
    "metadata": {
        "annotations": {
            "cattle.io/status": "{\"Conditions\":[{\"Type\":\"InitialRolesPopulated\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2019-01-10T17:18:05Z\"},{\"Type\":\"ResourceQuotaInit\",\"Status\":\"True\",\"Message\":\"\",\"LastUpdateTime\":\"2019-01-10T17:18:04Z\"}]}",
            "field.cattle.io/creatorId": "user-sw4mg",
            "field.cattle.io/projectId": "c-gkz6s:p-48tst",
            "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{\"cattle.io/status\":\"{\\\"Conditions\\\":[{\\\"Type\\\":\\\"InitialRolesPopulated\\\",\\\"Status\\\":\\\"True\\\",\\\"Message\\\":\\\"\\\",\\\"LastUpdateTime\\\":\\\"2019-01-10T17:18:05Z\\\"},{\\\"Type\\\":\\\"ResourceQuotaInit\\\",\\\"Status\\\":\\\"True\\\",\\\"Message\\\":\\\"\\\",\\\"LastUpdateTime\\\":\\\"2019-01-10T17:18:04Z\\\"}]}\",\"field.cattle.io/creatorId\":\"user-sw4mg\",\"field.cattle.io/projectId\":\"c-gkz6s:p-48tst\",\"lifecycle.cattle.io/create.namespace-auth\":\"true\"},\"creationTimestamp\":\"2019-01-10T17:17:20Z\",\"deletionTimestamp\":\"2019-02-22T12:30:14Z\",\"labels\":{\"cattle.io/creator\":\"norman\",\"field.cattle.io/projectId\":\"p-48tst\"},\"name\":\"longhorn-system\",\"resourceVersion\":\"10512979\",\"selfLink\":\"/api/v1/namespaces/longhorn-system\",\"uid\":\"9673789f-14fb-11e9-ba68-005056b171b1\"},\"spec\":{\"finalizers\":[]},\"status\":{\"phase\":\"Terminating\"}}\n",
            "lifecycle.cattle.io/create.namespace-auth": "true"
        },
        "creationTimestamp": "2019-01-10T17:17:20Z",
        "deletionTimestamp": "2019-02-22T12:30:14Z",
        "labels": {
            "cattle.io/creator": "norman",
            "field.cattle.io/projectId": "p-48tst"
        },
        "name": "longhorn-system",
        "resourceVersion": "15206257",
        "selfLink": "/api/v1/namespaces/longhorn-system",
        "uid": "9673789f-14fb-11e9-ba68-005056b171b1"
    },
    "spec": {
        "finalizers": [
            "kubernetes"
        ]
    },
    "status": {
        "phase": "Terminating"
    }
}

Hi @jtstepan

You can try following the v0.3 uninstallation instructions here: https://github.com/rancher/longhorn/tree/v0.3#uninstall-longhorn

The cleanup.sh script should still work. The reason for the hang is that some resources inside the namespace have built-in finalizers and need to be removed properly. If you delete the manager first and then the resources, the resources are left over and cannot be deleted by Kubernetes, because nothing remains to clear their finalizers.

cleanup.sh should remove all the finalizers for you.
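
To see which objects are still holding finalizers before running anything destructive, something like the following may help. This is only a sketch: `show_finalizers` is a hypothetical helper name (not part of Longhorn or kubectl), and it assumes kubectl is already pointed at the affected cluster.

```shell
# show_finalizers is a hypothetical helper, not part of Longhorn or kubectl.
# Prints "<name><TAB><finalizers>" for every object of the given resource type.
show_finalizers() {
    local namespace=$1 resource=$2
    kubectl -n "$namespace" get "$resource" \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'
}
```

For example: show_finalizers longhorn-system engineimages.longhorn.rancher.io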

Looks like that script is not in master anymore, but I found it here: https://raw.githubusercontent.com/rancher/longhorn-manager/revert-196-issue-273/deploy/scripts/cleanup.sh

Gets stuck removing the “engineimages”:

# bash -x ./cleanup.sh
+ NAMESPACE=longhorn-system
+ remove_crd_instances
+ remove_and_wait volumes.longhorn.rancher.io
+ local crd=volumes.longhorn.rancher.io
++ kubectl -n longhorn-system delete volumes.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "volumes"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"volumes"'
error: the server doesn't have a resource type "volumes"
+ return
+ remove_and_wait engines.longhorn.rancher.io
+ local crd=engines.longhorn.rancher.io
++ kubectl -n longhorn-system delete engines.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "engines"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"engines"'
error: the server doesn't have a resource type "engines"
+ return
+ remove_and_wait replicas.longhorn.rancher.io
+ local crd=replicas.longhorn.rancher.io
++ kubectl -n longhorn-system delete replicas.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "replicas"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"replicas"'
error: the server doesn't have a resource type "replicas"
+ return
+ remove_and_wait engineimages.longhorn.rancher.io
+ local crd=engineimages.longhorn.rancher.io
++ kubectl -n longhorn-system delete engineimages.longhorn.rancher.io --all

Never goes any further.

You can use
kubectl -n longhorn-system edit lhei

Then manually remove the finalizer field in the entries. It should let the script continue.
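
If editing each entry by hand is tedious, the same change can probably be made non-interactively with kubectl patch. A minimal sketch, assuming the controller that would normally clear the finalizers is already gone; `strip_finalizers` is a hypothetical helper name, not something shipped with Longhorn.

```shell
# strip_finalizers is a hypothetical helper, not part of Longhorn or kubectl.
# Clears metadata.finalizers on every object of one resource type.
strip_finalizers() {
    local namespace=$1 resource=$2 name
    # "-o name" prints one "<resource>/<name>" entry per line.
    kubectl -n "$namespace" get "$resource" -o name | while read -r name; do
        # A merge patch that sets finalizers to null removes the field.
        kubectl -n "$namespace" patch "$name" --type=merge \
            -p '{"metadata":{"finalizers":null}}'
    done
}
```

For example: strip_finalizers longhorn-system engineimages.longhorn.rancher.io. Removing finalizers skips whatever cleanup they were guarding, so this is only appropriate for resources that are being thrown away anyway.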


It did get past the engineimages, but now it is hanging on the nodes removal:

# bash -x ./cleanup.sh
+ NAMESPACE=longhorn-system
+ remove_crd_instances
+ remove_and_wait volumes.longhorn.rancher.io
+ local crd=volumes.longhorn.rancher.io
++ kubectl -n longhorn-system delete volumes.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "volumes"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"volumes"'
error: the server doesn't have a resource type "volumes"
+ return
+ remove_and_wait engines.longhorn.rancher.io
+ local crd=engines.longhorn.rancher.io
++ kubectl -n longhorn-system delete engines.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "engines"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"engines"'
error: the server doesn't have a resource type "engines"
+ return
+ remove_and_wait replicas.longhorn.rancher.io
+ local crd=replicas.longhorn.rancher.io
++ kubectl -n longhorn-system delete replicas.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "replicas"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"replicas"'
error: the server doesn't have a resource type "replicas"
+ return
+ remove_and_wait engineimages.longhorn.rancher.io
+ local crd=engineimages.longhorn.rancher.io
++ kubectl -n longhorn-system delete engineimages.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "engineimages"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"engineimages"'
error: the server doesn't have a resource type "engineimages"
+ return
+ remove_and_wait settings.longhorn.rancher.io
+ local crd=settings.longhorn.rancher.io
++ kubectl -n longhorn-system delete settings.longhorn.rancher.io --all
+ out='error: the server doesn'\''t have a resource type "settings"'
+ '[' 1 -ne 0 ']'
+ echo error: the server 'doesn'\''t' have a resource type '"settings"'
error: the server doesn't have a resource type "settings"
+ return
+ remove_and_wait nodes.longhorn.rancher.io
+ local crd=nodes.longhorn.rancher.io
++ kubectl -n longhorn-system delete nodes.longhorn.rancher.io --all

Apply the same method to nodes.longhorn.rancher.io as well.

E.g.

kubectl -n longhorn-system edit nodes.longhorn.rancher.io

That worked, thank you! The script ran to completion and the namespace is gone.

For the record, the hung namespace is really easy to cause. I simply used the Rancher UI to add Longhorn, then realized I hadn’t put the namespace in a project (and it wouldn’t let me move it). So I tried to delete the namespace, and it immediately hung in “Terminating”. I spent about an hour trying to find the dangling records that the namespace was waiting on. The main issue was CRDs with finalizers. I am guessing some records were deleted in the wrong order and the finalizers could not respond. Either way, please do fix this if you can. It’s quite a pain.


Hi @jhughes2112

Can you check if https://github.com/longhorn/longhorn#uninstall-longhorn helps? It’s designed to clean things up even when the namespace is terminating. Also, if you installed Longhorn using the Rancher catalog app, it’s recommended to uninstall using the catalog app too. That will take care of the cleanup correctly as well.


Oh, that’s an interesting solution. I didn’t find that while searching for answers. I did it manually last night. Thanks for the response. Also, I wanted to thank you for working on Longhorn. I just got it set up after I wrote the above, mounted a separate disk per node and unscheduled the boot drive, and it just worked. The web UI is excellent!

This worked for me:

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/uninstall/uninstall.yaml

Thanks @jhughes2112


None of the above methods worked for my situation. I used this script to delete the longhorn-system namespace by calling the native Kubernetes API directly. (The cluster is managed by Rancher.)
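#!/bin/bash
# Usage: pass the stuck namespace as the first argument. Requires kubectl, jq and curl.

NAMESPACE=$1
K8S_SERVER=https://localk8s.com/k8s/clusters/c-qftg9
TOKEN=kubeconfig-user-kxhls.c-qftg9:xdhrv62fcvzfrdv9qlr8w674q2mt77lzjts62xn7ch58dvb7qc5xdb
# Dump the namespace object with its spec.finalizers list emptied.
kubectl get namespace "$NAMESPACE" -o json | jq '.spec = {"finalizers":[]}' > tmp.json
# PUT the edited object to the namespace's finalize subresource.
curl -X PUT --data-binary @tmp.json "$K8S_SERVER/api/v1/namespaces/$NAMESPACE/finalize" -H "Content-Type: application/json" --header "Authorization: Bearer $TOKEN" --insecure
rm tmp.json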
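
A variant of the same idea that avoids handling the API URL and bearer token by hand is to let kubectl send the request with the credentials from the current kubeconfig. This is a sketch under assumptions, not the uninstaller's method: `force_finalize_namespace` is a hypothetical helper name, and it needs kubectl and jq on the PATH.

```shell
# force_finalize_namespace is a hypothetical helper name.
# Requires kubectl and jq, like the script above.
force_finalize_namespace() {
    local namespace=$1
    # Empty spec.finalizers, then send the edited object to the namespace's
    # /finalize subresource using kubectl's own credentials.
    kubectl get namespace "$namespace" -o json \
        | jq '.spec.finalizers = []' \
        | kubectl replace --raw "/api/v1/namespaces/$namespace/finalize" -f -
}
```

For example: force_finalize_namespace longhorn-system. Either version only forces the namespace itself to finish terminating; anything that was still blocking it may be left behind without cleanup, so it is best kept as a last resort.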

That’s weird. The uninstaller should take care of the situation automatically. If that’s not the case, can you file a bug?

I had to run the following command to find which resources were causing the ns delete to hang. I hope this helps someone.

kubectl api-resources --verbs=list --namespaced -o name   | xargs -n 1 kubectl get --show-kind --ignore-not-found -n longhorn-system
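
A variation on the command above that prints only the resource types which still have objects in the namespace, with a count for each, might look like this. It is a sketch only; `list_remaining` is a hypothetical helper name.

```shell
# list_remaining is a hypothetical helper name.
# Prints "<resource type><TAB><object count>" for each namespaced resource
# type that still has at least one object in the given namespace.
list_remaining() {
    local namespace=$1 resource count
    for resource in $(kubectl api-resources --verbs=list --namespaced -o name); do
        count=$(kubectl -n "$namespace" get "$resource" --ignore-not-found -o name 2>/dev/null \
            | wc -l | tr -d ' ')
        if [ "$count" -gt 0 ]; then
            printf '%s\t%s\n' "$resource" "$count"
        fi
    done
}
```

For example: list_remaining longhorn-system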

Thanks for posting this. It solved my problem with a longhorn namespace I couldn’t delete.