I've got several pods that are stuck in the Removing state… How do I debug and remove them?
Can you use kubectl describe on the pods to get some details on them?
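Along those lines, if you want a quick overview of everything that is stuck (just a grep sketch, since Terminating isn't a phase you can filter on directly):
kubectl get pods --all-namespaces | grep Terminating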
./rancher kubectl describe pod hub-857dc8f9f4-j9nqt --namespace platform
Start Time: Sat, 09 Jun 2018 09:28:01 -0400
Status: Terminating (lasts 4d)
Termination Grace Period: 30s
Controlled By: ReplicaSet/hub-857dc8f9f4
Exit Code: 0
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Restart Count: 0
Environment Variables from:
staging-secrets Secret Optional: false
/etc/northpage from staging-secrets (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c74p6 (ro)
Type: Secret (a volume populated by a Secret)
Type: Secret (a volume populated by a Secret)
QoS Class: BestEffort
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
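Since the describe output above shows the pod has been Terminating for days, it might also be worth checking which node it was scheduled on and whether that node (and its kubelet/Docker) is still healthy. A sketch, reusing the same pod name and namespace as above:
./rancher kubectl get pod hub-857dc8f9f4-j9nqt --namespace platform -o wide
./rancher kubectl get nodes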
Was there ever any resolution to this? I’m having a similar issue; I have two pods that are stuck in the “Terminating” state and I can’t figure out how to get rid of them. The one thing they both have in common is that they both tried (and failed) to mount a secret that didn’t exist as a volume; I made a typo when creating the secret and didn’t notice it. Here’s the status for one of them:
Name:                      catbot-tunnel-7d65bcbc4f-2v5dt
Namespace:                 default
Node:                      ares/184.108.40.206
Start Time:                Mon, 02 Jul 2018 14:30:15 -0500
Labels:                    io.kompose.service=catbot-tunnel
                           pod-template-hash=3821676709
Annotations:               <none>
Status:                    Terminating (lasts 2h)
Termination Grace Period:  0s
IP:
Controlled By:             ReplicaSet/catbot-tunnel-7d65bcbc4f
Containers:
  catbot-tunnel:
    Container ID:
    Image:          jnovack/autossh
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
    Environment:
      SSH_HOSTUSER:  omitted
    Mounts:
      /id_rsa from key (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4pc7s (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  key
    Optional:    false
  default-token-4pc7s:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4pc7s
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
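For anyone hitting the same missing-secret situation, it might be worth confirming whether the secret the volume references actually exists (the SecretName above is key, so something like):
kubectl get secret key --namespace default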
No resolution, so I just moved on. If I remember correctly I was playing with secrets as well. Sorry I couldn't be more help.
Just in case anybody else comes along, I was able to remove them by first force-removing the pods:
kubectl delete pods/catbot-tunnel-2-798675b7c5-dnw8r --grace-period=0 --force
This did not actually remove them, but instead it caused them to get stuck waiting on a foregroundDeletion event. I used kubectl edit to remove this from their definitions:
finalizers:
- foregroundDeletion
Then they finally died (and hopefully haven’t left any open resources in the system that I’m not aware of).
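As an aside, if you'd rather not open an editor, the same finalizer removal should also be possible with a patch. Just a sketch, using the pod name from above:
kubectl patch pod catbot-tunnel-2-798675b7c5-dnw8r -p '{"metadata":{"finalizers":null}}'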
We're experiencing the same issue. What I've found so far is that it seems to be related to Docker; after we downgraded from 17.03.2 to 1.12.6, the situation became much more stable.
Same issue here, but also appearing with 17.03.2-ce (we too did a downgrade, from 17.12.1 to 17.03.2-ce to be precise). Rancher 2.0.8, Docker 17.12.1-0ubuntu1
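If anyone else wants to compare Docker versions across their nodes, kubectl can show the container runtime per node (a sketch; look at the CONTAINER-RUNTIME column):
kubectl get nodes -o wide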
The container/pod has been in “Removing” state for the past days. When I use kubectl to get the current status:
$ kubectl describe pod importer-8bf85dcc9-r5rtn --namespace gamma --insecure-skip-tls-verify=true
Name:                      importer-8bf85dcc9-r5rtn
Namespace:                 gamma
Node:                      redacted/redacted
Start Time:                Tue, 18 Sep 2018 16:03:59 +0200
Labels:                    pod-template-hash=469418775
                           workload.user.cattle.io/workloadselector=deployment-gamma-importer
Annotations:               cni.projectcalico.org/podIP=10.42.1.39/32
Status:                    Terminating (expires Tue, 18 Sep 2018 16:09:11 +0200)
Termination Grace Period:  30s
IP:                        10.42.1.39
Controllers:               <none>
Containers:
  importer:
    Container ID:   docker://05b93ed9018854067b5ec63ef4929b512cd9f9f2306f9e0ff67ea6ee06478c1b
    Image:          redacted/importer:stage-461
    Image ID:       docker-pullable://redacted/importer@sha256:fa93b2f3359ce7b72823292bbfc2bfb493e912cbf28d87b976ecdeadb6ba3ca7
    Port:
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
Note the Status: Terminating (expires Tue, 18 Sep 2018 16:09:11 +0200). It’s September 20th today…
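To see what the API server still records for the pod's deletion, a jsonpath query along these lines (a sketch, reusing the pod and namespace from above) should print the deletion timestamp and grace period:
kubectl get pod importer-8bf85dcc9-r5rtn --namespace gamma --insecure-skip-tls-verify=true -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.deletionGracePeriodSeconds}{"\n"}'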
I tried to force a deletion using kubectl like @minneyar described:
$ kubectl delete pod importer-8bf85dcc9-r5rtn --now --force --namespace gamma --insecure-skip-tls-verify=true
pod "importer-8bf85dcc9-r5rtn" deleted
But the pod is still shown in Rancher2 UI (Removing) and the pod still shows up when I use the same kubectl describe command from above…
I could not find that information about “finalizers”. What exactly did you edit, @minneyar?
I feel like what I edited was in the pod metadata, but I don’t exactly remember and I actually haven’t used rancher in a while now. Here’s a little bit of documentation on that metadata, though: https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#controlling-how-the-garbage-collector-deletes-dependents
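For what it's worth, the finalizers (if any) live under the pod's metadata, so something like this should show them (a sketch with placeholder names):
kubectl get pod <pod-name> --namespace <namespace> -o jsonpath='{.metadata.finalizers}'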
OK I will take a look at that.
In the meantime I took advantage of the pod being stuck in the "Removing" state to extend the check_rancher2 monitoring plugin (found here: https://github.com/Napsty/check_rancher2). It will now raise an alert when a pod in a non-running state is found within a project:
./check_rancher2.sh -H myrancher2.example.com -U token-XXXXX -P ootaefomai7eeseyoopeeghooxoor1iuvie0Ohvahph5ahrui5Ailee -S -t pod -p c-xxxxx:p-xxxxx
CHECK_RANCHER2 CRITICAL - Pod "importer-8bf85dcc9-r5rtn" is removing -|'pods_total'=8;;;; 'pods_errors'=1;;;;
Thank you @minneyar
kubectl delete pods/podname --grace-period=0 --force
This now worked for me too! It seems --grace-period=0 did the trick (I had tried it with --now before).
The monitoring plugin now returns OK:
$ ./check_rancher2.sh -H myrancher2.example.com -U token-XXXXX -P ootaefomai7eeseyoopeeghooxoor1iuvie0Ohvahph5ahrui5Ailee -S -t pod -p c-xxxxx:p-xxxxx
CHECK_RANCHER2 OK - All pods (7) in project c-xxxxx:p-xxxxx are running|'pods_total'=7;;;; 'pod_errors'=0;;;;