I've got several pods that are stuck in the Removing state… How do I debug and remove them?
Can you use kubectl describe on the pods to get some details on them?
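Along those lines, if you want a quick overview of everything that is stuck (just a grep sketch, since Terminating isn't a phase you can filter on directly):
kubectl get pods --all-namespaces | grep Terminating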
./rancher kubectl describe pod hub-857dc8f9f4-j9nqt --namespace platform
Start Time: Sat, 09 Jun 2018 09:28:01 -0400
Status: Terminating (lasts 4d)
Termination Grace Period: 30s
Controlled By: ReplicaSet/hub-857dc8f9f4
Exit Code: 0
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Mon, 01 Jan 0001 00:00:00 +0000
Restart Count: 0
Environment Variables from:
staging-secrets Secret Optional: false
/etc/northpage from staging-secrets (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-c74p6 (ro)
Type: Secret (a volume populated by a Secret)
Type: Secret (a volume populated by a Secret)
QoS Class: BestEffort
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
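Since the describe output above shows the pod has been Terminating for days, it might also be worth checking which node it was scheduled on and whether that node (and its kubelet/Docker) is still healthy. A sketch, reusing the same pod name and namespace as above:
./rancher kubectl get pod hub-857dc8f9f4-j9nqt --namespace platform -o wide
./rancher kubectl get nodes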
Was there ever any resolution to this? I’m having a similar issue; I have two pods that are stuck in the “Terminating” state and I can’t figure out how to get rid of them. The one thing they both have in common is that they both tried (and failed) to mount a secret that didn’t exist as a volume; I made a typo when creating the secret and didn’t notice it. Here’s the status for one of them:
Name:                      catbot-tunnel-7d65bcbc4f-2v5dt
Namespace:                 default
Node:                      ares/184.108.40.206
Start Time:                Mon, 02 Jul 2018 14:30:15 -0500
Labels:                    io.kompose.service=catbot-tunnel
                           pod-template-hash=3821676709
Annotations:               <none>
Status:                    Terminating (lasts 2h)
Termination Grace Period:  0s
IP:
Controlled By:             ReplicaSet/catbot-tunnel-7d65bcbc4f
Containers:
  catbot-tunnel:
    Container ID:
    Image:          jnovack/autossh
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
    Environment:
      SSH_HOSTUSER:  omitted
    Mounts:
      /id_rsa from key (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-4pc7s (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  key:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  key
    Optional:    false
  default-token-4pc7s:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-4pc7s
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
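For anyone hitting the same missing-secret situation, it might be worth confirming whether the secret the volume references actually exists (the SecretName above is key, so something like):
kubectl get secret key --namespace default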
No resolution, so I just moved on. If I remember correctly I was playing with secrets as well. Sorry I couldn't be more help.
Just in case anybody else comes along, I was able to remove them by first force-removing the pods:
kubectl delete pods/catbot-tunnel-2-798675b7c5-dnw8r --grace-period=0 --force
This did not actually remove them, but instead it caused them to get stuck waiting on a foregroundDeletion event. I used kubectl edit to remove this from their definitions:
finalizers:
- foregroundDeletion
Then they finally died (and hopefully haven’t left any open resources in the system that I’m not aware of).
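As an aside, if you'd rather not open an editor, the same finalizer removal should also be possible with a patch. Just a sketch, using the pod name from above:
kubectl patch pod catbot-tunnel-2-798675b7c5-dnw8r -p '{"metadata":{"finalizers":null}}'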
We're experiencing the same issue. What I've found so far is that it seems to be related to Docker; after we downgraded from 17.03.2 to 1.12.6, the situation became much more stable.
Same issue here, but also appearing with 17.03.2-ce (we too did a downgrade, from 17.12.1 to 17.03.2-ce to be precise). Rancher 2.0.8, Docker 17.12.1-0ubuntu1
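If anyone else wants to compare Docker versions across their nodes, kubectl can show the container runtime per node (a sketch; look at the CONTAINER-RUNTIME column):
kubectl get nodes -o wide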
The container/pod has been in “Removing” state for the past days. When I use kubectl to get the current status:
$ kubectl describe pod importer-8bf85dcc9-r5rtn --namespace gamma --insecure-skip-tls-verify=true
Name:                      importer-8bf85dcc9-r5rtn
Namespace:                 gamma
Node:                      redacted/redacted
Start Time:                Tue, 18 Sep 2018 16:03:59 +0200
Labels:                    pod-template-hash=469418775
                           workload.user.cattle.io/workloadselector=deployment-gamma-importer
Annotations:               cni.projectcalico.org/podIP=10.42.1.39/32
Status:                    Terminating (expires Tue, 18 Sep 2018 16:09:11 +0200)
Termination Grace Period:  30s
IP:                        10.42.1.39
Controllers:               <none>
Containers:
  importer:
    Container ID:   docker://05b93ed9018854067b5ec63ef4929b512cd9f9f2306f9e0ff67ea6ee06478c1b
    Image:          redacted/importer:stage-461
    Image ID:       docker-pullable://redacted/importer@sha256:fa93b2f3359ce7b72823292bbfc2bfb493e912cbf28d87b976ecdeadb6ba3ca7
    Port:
    State:          Terminated
      Exit Code:    0
      Started:      Mon, 01 Jan 0001 00:00:00 +0000
      Finished:     Mon, 01 Jan 0001 00:00:00 +0000
    Ready:          False
    Restart Count:  0
Note the Status: Terminating (expires Tue, 18 Sep 2018 16:09:11 +0200). It’s September 20th today…
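To see what the API server still records for the pod's deletion, a jsonpath query along these lines (a sketch, reusing the pod and namespace from above) should print the deletion timestamp and grace period:
kubectl get pod importer-8bf85dcc9-r5rtn --namespace gamma --insecure-skip-tls-verify=true -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.deletionGracePeriodSeconds}{"\n"}'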
I tried to force a deletion using kubectl like @minneyar described:
$ kubectl delete pod importer-8bf85dcc9-r5rtn --now --force --namespace gamma --insecure-skip-tls-verify=true
pod "importer-8bf85dcc9-r5rtn" deleted
But the pod is still shown in Rancher2 UI (Removing) and the pod still shows up when I use the same kubectl describe command from above…
I could not find that information about “finalizers”. What exactly did you edit, @minneyar?
I feel like what I edited was in the pod metadata, but I don’t exactly remember and I actually haven’t used rancher in a while now. Here’s a little bit of documentation on that metadata, though: https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/#controlling-how-the-garbage-collector-deletes-dependents
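For what it's worth, the finalizers (if any) live under the pod's metadata, so something like this should show them (a sketch with placeholder names):
kubectl get pod <pod-name> --namespace <namespace> -o jsonpath='{.metadata.finalizers}'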
OK I will take a look at that.
In the meantime I took advantage of the pod being stuck in the "Removing" state to extend the check_rancher2 monitoring plugin (found here: https://github.com/Napsty/check_rancher2). It will now raise an alert when a pod in a non-running state is found within a project:
./check_rancher2.sh -H myrancher2.example.com -U token-XXXXX -P ootaefomai7eeseyoopeeghooxoor1iuvie0Ohvahph5ahrui5Ailee -S -t pod -p c-xxxxx:p-xxxxx
CHECK_RANCHER2 CRITICAL - Pod "importer-8bf85dcc9-r5rtn" is removing -|'pods_total'=8;;;; 'pods_errors'=1;;;;
Thank you @minneyar
kubectl delete pods/podname --grace-period=0 --force
This now worked for me too! It seems --grace-period=0 did the trick (I had tried it with --now before).
The monitoring plugin now returns OK:
$ ./check_rancher2.sh -H myrancher2.example.com -U token-XXXXX -P ootaefomai7eeseyoopeeghooxoor1iuvie0Ohvahph5ahrui5Ailee -S -t pod -p c-xxxxx:p-xxxxx
CHECK_RANCHER2 OK - All pods (7) in project c-xxxxx:p-xxxxx are running|'pods_total'=7;;;; 'pod_errors'=0;;;;