Rancher in Docker - helm-operation error

Hi,

we have a strange problem here: I was given the task of upgrading our lab Rancher installation, which currently runs via Docker on a VM:

docker run -d --restart=unless-stopped --network=host \
  -v /data/certs/cacerts.pem:/etc/rancher/ssl/cacerts.pem \
  -v /data/rancher-data:/var/lib/rancher \
  -v /data/registries.yaml:/etc/rancher/k3s/registries.yaml \
  --privileged our.obfuscated.private.repo/sys/rancher:v2.6.3-patch1
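
For context, upgrading this kind of setup is basically just swapping the container image, since all Rancher state lives in /data/rancher-data on the host. Roughly like this (the container name is a placeholder, so treat it as a sketch rather than an official recipe):

# stop and remove the old container; the bind-mounted data stays on the host
docker stop <rancher-container> && docker rm <rancher-container>
# start the new version against the same bind mounts
docker run -d --restart=unless-stopped --network=host \
  -v /data/certs/cacerts.pem:/etc/rancher/ssl/cacerts.pem \
  -v /data/rancher-data:/var/lib/rancher \
  -v /data/registries.yaml:/etc/rancher/k3s/registries.yaml \
  --privileged our.obfuscated.private.repo/sys/rancher:v2.6.4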

Before I started, I created a snapshot of the VM in case something went wrong with the upgrade to v2.6.4 - and as feared, something did go wrong. After the upgrade to v2.6.4 I saw that in the “local” Kubernetes cluster the helm-operation pods somehow kept failing. To be precise, the pod logs show:

Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

Every 5 minutes another pod is created and fails. Everything else in Rancher works as expected, but I don’t feel good about those helm-operation pods, and they also keep creating new ConfigMaps named impersonation-helm-op-admin-kubeconfig... and impersonation-helm-op-user-kubeconfig...
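
They are easy to spot from a kubectl shell against the local cluster - plain kubectl, nothing Rancher-specific, e.g.:

> kubectl -n cattle-system get pods | grep helm-operation
> kubectl -n cattle-system get configmaps | grep impersonation-helm-op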

So I rolled the VM back to the snapshot and saw that we already had the same issue with Rancher v2.6.3-patch1. After digging a little further, I suspect the issue already existed from when a colleague updated from v2.6.3 to v2.6.3-patch1 and missed those errors.

How can I fix this manually now? Is it even fixable, or will I have to set everything up again - including the one (and thankfully only) RKE cluster we manage via this Rancher installation?

Here are some entries from the k3s.log that repeat approximately every 5 minutes:

I0504 10:50:04.937820     384 topology_manager.go:187] "Topology Admit Handler"
I0504 10:50:05.025043     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/0aff4470-947e-470e-a1f9-b4c2ae4416f5-data\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.025105     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/0aff4470-947e-470e-a1f9-b4c2ae4416f5-admin-kubeconfig\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.025151     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/0aff4470-947e-470e-a1f9-b4c2ae4416f5-user-kubeconfig\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.025195     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pod-impersonation-helm-op-tv69v-token-v89bh\" (UniqueName: \"kubernetes.io/secret/0aff4470-947e-470e-a1f9-b4c2ae4416f5-pod-impersonation-helm-op-tv69v-token-v89bh\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.729940     384 scope.go:111] "RemoveContainer" containerID="08a18c6a9dcf524e083b95cd8ecc4ba578a286c54d8c8e7e922080b16d60ed2e"
I0504 10:50:06.532431     384 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="4c2036b896f63a4ab0fbd43315a02f35b81b662843fb8dd700543c6a6b248c44"
I0504 10:50:07.536574     384 scope.go:111] "RemoveContainer" containerID="c5d282c066b4e5ef7af1d19a140edf403a88bacecf8e643245dc9bbc1f2ab4ff"
I0504 10:50:07.766246     384 topology_manager.go:187] "Topology Admit Handler"
E0504 10:50:07.771131     384 reflector.go:138] object-"cattle-system"/"helm-operation-x27kx": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "helm-operation-x27kx" is forbidden: User "system:node:local-node" cannot list resource "secrets" in API group "" in the namespace "cattle-system": no relationship found between node 'local-node' and this object
I0504 10:50:07.850987     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-admin-kubeconfig\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:07.851020     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-user-kubeconfig\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:07.851050     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pod-impersonation-helm-op-cw62z-token-p4nll\" (UniqueName: \"kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-pod-impersonation-helm-op-cw62z-token-p4nll\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:07.851071     384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-data\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:08.153040     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-data\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.153086     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"pod-impersonation-helm-op-ccrqs-token-dgfkx\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-pod-impersonation-helm-op-ccrqs-token-dgfkx\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.153125     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-admin-kubeconfig\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.153155     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-user-kubeconfig\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.158642     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-data" (OuterVolumeSpecName: "data") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "data". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0504 10:50:08.158676     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-pod-impersonation-helm-op-ccrqs-token-dgfkx" (OuterVolumeSpecName: "pod-impersonation-helm-op-ccrqs-token-dgfkx") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "pod-impersonation-helm-op-ccrqs-token-dgfkx". PluginName "kubernetes.io/secret", VolumeGidValue ""
W0504 10:50:08.162151     384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volume-subpaths/admin-kubeconfig/proxy/0" is not a mountpoint, deleting
W0504 10:50:08.162331     384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes/kubernetes.io~configmap/admin-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:08.162480     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-admin-kubeconfig" (OuterVolumeSpecName: "admin-kubeconfig") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "admin-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
W0504 10:50:08.163029     384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volume-subpaths/user-kubeconfig/helm/1" is not a mountpoint, deleting
W0504 10:50:08.163206     384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes/kubernetes.io~configmap/user-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:08.163341     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-user-kubeconfig" (OuterVolumeSpecName: "user-kubeconfig") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "user-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
I0504 10:50:08.253739     384 reconciler.go:319] "Volume detached for volume \"pod-impersonation-helm-op-ccrqs-token-dgfkx\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-pod-impersonation-helm-op-ccrqs-token-dgfkx\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.253777     384 reconciler.go:319] "Volume detached for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-admin-kubeconfig\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.253796     384 reconciler.go:319] "Volume detached for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-user-kubeconfig\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.253810     384 reconciler.go:319] "Volume detached for volume \"data\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-data\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.538708     384 scope.go:111] "RemoveContainer" containerID="c5d282c066b4e5ef7af1d19a140edf403a88bacecf8e643245dc9bbc1f2ab4ff"
I0504 10:50:08.728797     384 scope.go:111] "RemoveContainer" containerID="a8d715771516a94efe3e133eb5a1fa2e6e4b500469991d84eea1fb09a4c33bfb"
E0504 10:50:08.963085     384 secret.go:195] Couldn't get secret cattle-system/helm-operation-x27kx: failed to sync secret cache: timed out waiting for the condition
E0504 10:50:08.963175     384 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-data podName:c1e5f377-3f12-4f55-8ad3-fbeb1e36661f nodeName:}" failed. No retries permitted until 2022-05-04 10:50:09.463145133 +0000 UTC m=+4563.113500603 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"data\" (UniqueName: \"kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-data\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") : failed to sync secret cache: timed out waiting for the condition"
E0504 10:50:09.220083     384 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0\": failed to delete \"eth0\": no such device" podSandboxID="1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0"
E0504 10:50:09.220128     384 kuberuntime_manager.go:958] "Failed to stop sandbox" podSandboxID={Type:containerd ID:1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0}
E0504 10:50:09.220168     384 kubelet_pods.go:1288] "Failed killing the pod" err="failed to \"KillPodSandbox\" for \"27214cba-0e55-4bce-80e8-93a99e5febc0\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0\\\": failed to delete \\\"eth0\\\": no such device\"" podName="helm-operation-r6lhv"
I0504 10:50:09.541249     384 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0"
I0504 10:50:09.733994     384 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=983816b1-6d2b-4a07-a436-048c066cb0c4 path="/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes"
I0504 10:50:09.734306     384 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes"
I0504 10:50:10.171123     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-admin-kubeconfig\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.171166     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-user-kubeconfig\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.171202     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"pod-impersonation-helm-op-6hv5p-token-c7qgn\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-pod-impersonation-helm-op-6hv5p-token-c7qgn\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.171228     384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-data\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.183335     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-data" (OuterVolumeSpecName: "data") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "data". PluginName "kubernetes.io/secret", VolumeGidValue ""
W0504 10:50:10.185340     384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volume-subpaths/admin-kubeconfig/proxy/0" is not a mountpoint, deleting
W0504 10:50:10.186158     384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes/kubernetes.io~configmap/admin-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:10.186302     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-admin-kubeconfig" (OuterVolumeSpecName: "admin-kubeconfig") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "admin-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
W0504 10:50:10.191100     384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volume-subpaths/user-kubeconfig/helm/1" is not a mountpoint, deleting
W0504 10:50:10.191282     384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes/kubernetes.io~configmap/user-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:10.191414     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-user-kubeconfig" (OuterVolumeSpecName: "user-kubeconfig") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "user-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
I0504 10:50:10.196683     384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-pod-impersonation-helm-op-6hv5p-token-c7qgn" (OuterVolumeSpecName: "pod-impersonation-helm-op-6hv5p-token-c7qgn") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "pod-impersonation-helm-op-6hv5p-token-c7qgn". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0504 10:50:10.271519     384 reconciler.go:319] "Volume detached for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-user-kubeconfig\") on node \"local-node\" DevicePath \"\""
I0504 10:50:10.271562     384 reconciler.go:319] "Volume detached for volume \"pod-impersonation-helm-op-6hv5p-token-c7qgn\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-pod-impersonation-helm-op-6hv5p-token-c7qgn\") on node \"local-node\" DevicePath \"\""
I0504 10:50:10.271576     384 reconciler.go:319] "Volume detached for volume \"data\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-data\") on node \"local-node\" DevicePath \"\""
I0504 10:50:10.271588     384 reconciler.go:319] "Volume detached for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-admin-kubeconfig\") on node \"local-node\" DevicePath \"\""
2022-05-04 10:50:11.418293 I | etcdserver/api/etcdhttp: /health OK (status code 200)
I0504 10:50:11.547327     384 scope.go:111] "RemoveContainer" containerID="fc05f0c3194a5ed3e7a7c2b6eac28771d4fcca4fcac75e4606aee3b4294020be"
I0504 10:50:11.729244     384 scope.go:111] "RemoveContainer" containerID="2153688c9bc4c69e1aac1577502cd200781f9fad53c1997fdbaf710e9f839cb6"
I0504 10:50:11.734755     384 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=27214cba-0e55-4bce-80e8-93a99e5febc0 path="/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes"
I0504 10:50:11.735083     384 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes"
I0504 10:50:12.549663     384 scope.go:111] "RemoveContainer" containerID="fc05f0c3194a5ed3e7a7c2b6eac28771d4fcca4fcac75e4606aee3b4294020be"

Judging from this log line, I can only guess that something is wrong with a ServiceAccount and/or its RoleBindings:

...
E0504 10:50:07.771131     384 reflector.go:138] object-"cattle-system"/"helm-operation-x27kx": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "helm-operation-x27kx" is forbidden: User "system:node:local-node" cannot list resource "secrets" in API group "" in the namespace "cattle-system": no relationship found between node 'local-node' and this object
...
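
I haven’t verified whether that RBAC message is the actual cause or just noise from a pod being torn down, but the referenced objects can at least be checked quickly (the names below are the ones from my log, yours will differ):

> kubectl -n cattle-system get secrets | grep helm-operation
> kubectl -n cattle-system get serviceaccounts | grep pod-impersonation-helm-op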

Thanks in advance for any tips on how to tackle this problem.

It seems I found a solution after digging a little further into the Helm side of things :expressionless: The error simply means that a Helm release in the cluster is stuck in a pending state, so every new operation on it gets refused.

I opened a kubectl shell via the Rancher UI and executed the following commands:

> helm --namespace cattle-system list --all
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
rancher-webhook cattle-system   96              2022-02-17 21:12:54.311285925 +0000 UTC pending-upgrade rancher-webhook-1.0.3+up0.2.4   0.2.4
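
(Side note: the stuck release can also be found directly with Helm’s pending filter across all namespaces:)

> helm list --pending --all-namespaces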

> helm history rancher-webhook --namespace cattle-system
REVISION        UPDATED                         STATUS          CHART                           APP VERSION     DESCRIPTION                                                          
2               Wed Dec 29 13:22:18 2021        deployed        rancher-webhook-1.0.2+up0.2.2   0.2.2           Upgrade complete                                                     
93              Thu Feb 17 20:52:51 2022        failed          rancher-webhook-1.0.3+up0.2.4   0.2.4           Upgrade "rancher-webhook" failed: timed out waiting for the condition
94              Thu Feb 17 20:57:54 2022        failed          rancher-webhook-1.0.3+up0.2.4   0.2.4           Upgrade "rancher-webhook" failed: timed out waiting for the condition
95              Thu Feb 17 21:07:51 2022        failed          rancher-webhook-1.0.3+up0.2.4   0.2.4           Upgrade "rancher-webhook" failed: timed out waiting for the condition
96              Thu Feb 17 21:12:54 2022        pending-upgrade rancher-webhook-1.0.3+up0.2.4   0.2.4           Preparing upgrade

> helm rollback rancher-webhook 95 --namespace cattle-system
checking 6 resources for changes
Looks like there are no changes for ServiceAccount "rancher-webhook"
Looks like there are no changes for ClusterRoleBinding "rancher-webhook"
Looks like there are no changes for Service "webhook-service"
Looks like there are no changes for Service "rancher-webhook"
Rollback was a success! Happy Helming!

> helm --namespace cattle-system list --all
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
rancher-webhook cattle-system   97              2022-05-05 06:50:24.067130778 +0000 UTC deployed        rancher-webhook-1.0.3+up0.2.4   0.2.4      

> helm history rancher-webhook --namespace cattle-system
REVISION        UPDATED                         STATUS          CHART                           APP VERSION     DESCRIPTION                                                          
2               Wed Dec 29 13:22:18 2021        superseded      rancher-webhook-1.0.2+up0.2.2   0.2.2           Upgrade complete                                                     
93              Thu Feb 17 20:52:51 2022        failed          rancher-webhook-1.0.3+up0.2.4   0.2.4           Upgrade "rancher-webhook" failed: timed out waiting for the condition
94              Thu Feb 17 20:57:54 2022        failed          rancher-webhook-1.0.3+up0.2.4   0.2.4           Upgrade "rancher-webhook" failed: timed out waiting for the condition
95              Thu Feb 17 21:07:51 2022        failed          rancher-webhook-1.0.3+up0.2.4   0.2.4           Upgrade "rancher-webhook" failed: timed out waiting for the condition
96              Thu Feb 17 21:12:54 2022        pending-upgrade rancher-webhook-1.0.3+up0.2.4   0.2.4           Preparing upgrade                                                    
97              Thu May  5 06:50:24 2022        deployed        rancher-webhook-1.0.3+up0.2.4   0.2.4           Rollback to 95

After that I waited again for about 30-60 minutes to see whether those helm-operation pods would pop up at 5-minute intervals again. But it stopped - the logs also looked OK.
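
(If you want to check the same thing without staring at the UI, watching the namespace works as well, e.g.:)

> kubectl -n cattle-system get pods --watch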

Then I tried the upgrade to v2.6.4 again and it worked without any problems.


Thank you very much for this suggestion! This problem happened to us when upgrading from 2.6.8 → 2.6.9, and the memory usage of the kube-apiserver and the Rancher servers spiked and consumed all RAM on the node. It took us a while to recover.

We are using VMs and RKE (version 1).