Hi,
we have here a strange problem - I got the task to upgrade our Lab-Rancher installation which currently runs via docker
on VM
docker run -d --restart=unless-stopped --network=host -v /data/certs/cacerts.pem:/etc/rancher/ssl/cacerts.pem -v /data/rancher-data:/var/lib/rancher -v /data/registries.yaml:/etc/rancher/k3s/registries.yaml --privileged our.obfuscated.private.repo/sys/rancher:v2.6.3-patch1
Before I started I created a snapshot of the VM in case something goes wrong with upgrading to v2.6.4 - and as feared something went wrong. In 2.6.4 I saw that in the “local”-Kubernetes cluster the helm-operation
pod somehow always failed partially to be precise it writes in the pod-logs:
Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress
Every 5 minutes another pod is being created and failing. Everything else works in Rancher as expected, but I don’t feel right about those helm-operation
pods and they also create some new configmaps called impersonation-helm-op-admin-kubeconfig...
and impersonation-helm-op-user-kubeconfig...
So I rolled back the VM’s snapshot and saw that we already had the same issue within Rancher v2.6.3-patch1. So after digging a little bit in further I guess there was already some issue, when a colleague tried to update from v2.6.3 to v2.6.3-patch1 and missed those errors.
How can I fix this now manually? Is it event fixable or will I have to resetup everything - including the one and thankfully only RKE cluster we manage via this Rancher installation?
Here are some logs from the k3s.log
that repeat approximately every 5 min
I0504 10:50:04.937820 384 topology_manager.go:187] "Topology Admit Handler"
I0504 10:50:05.025043 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/0aff4470-947e-470e-a1f9-b4c2ae4416f5-data\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.025105 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/0aff4470-947e-470e-a1f9-b4c2ae4416f5-admin-kubeconfig\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.025151 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/0aff4470-947e-470e-a1f9-b4c2ae4416f5-user-kubeconfig\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.025195 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pod-impersonation-helm-op-tv69v-token-v89bh\" (UniqueName: \"kubernetes.io/secret/0aff4470-947e-470e-a1f9-b4c2ae4416f5-pod-impersonation-helm-op-tv69v-token-v89bh\") pod \"helm-operation-52rpg\" (UID: \"0aff4470-947e-470e-a1f9-b4c2ae4416f5\") "
I0504 10:50:05.729940 384 scope.go:111] "RemoveContainer" containerID="08a18c6a9dcf524e083b95cd8ecc4ba578a286c54d8c8e7e922080b16d60ed2e"
I0504 10:50:06.532431 384 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="4c2036b896f63a4ab0fbd43315a02f35b81b662843fb8dd700543c6a6b248c44"
I0504 10:50:07.536574 384 scope.go:111] "RemoveContainer" containerID="c5d282c066b4e5ef7af1d19a140edf403a88bacecf8e643245dc9bbc1f2ab4ff"
I0504 10:50:07.766246 384 topology_manager.go:187] "Topology Admit Handler"
E0504 10:50:07.771131 384 reflector.go:138] object-"cattle-system"/"helm-operation-x27kx": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "helm-operation-x27kx" is forbidden: User "system:node:local-node" cannot list resource "secrets" in API group "" in the namespace "cattle-system": no relationship found between node 'local-node' and this object
I0504 10:50:07.850987 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-admin-kubeconfig\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:07.851020 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-user-kubeconfig\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:07.851050 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"pod-impersonation-helm-op-cw62z-token-p4nll\" (UniqueName: \"kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-pod-impersonation-helm-op-cw62z-token-p4nll\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:07.851071 384 reconciler.go:224] "operationExecutor.VerifyControllerAttachedVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-data\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") "
I0504 10:50:08.153040 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-data\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.153086 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"pod-impersonation-helm-op-ccrqs-token-dgfkx\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-pod-impersonation-helm-op-ccrqs-token-dgfkx\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.153125 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-admin-kubeconfig\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.153155 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-user-kubeconfig\") pod \"983816b1-6d2b-4a07-a436-048c066cb0c4\" (UID: \"983816b1-6d2b-4a07-a436-048c066cb0c4\") "
I0504 10:50:08.158642 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-data" (OuterVolumeSpecName: "data") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "data". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0504 10:50:08.158676 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-pod-impersonation-helm-op-ccrqs-token-dgfkx" (OuterVolumeSpecName: "pod-impersonation-helm-op-ccrqs-token-dgfkx") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "pod-impersonation-helm-op-ccrqs-token-dgfkx". PluginName "kubernetes.io/secret", VolumeGidValue ""
W0504 10:50:08.162151 384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volume-subpaths/admin-kubeconfig/proxy/0" is not a mountpoint, deleting
W0504 10:50:08.162331 384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes/kubernetes.io~configmap/admin-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:08.162480 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-admin-kubeconfig" (OuterVolumeSpecName: "admin-kubeconfig") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "admin-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
W0504 10:50:08.163029 384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volume-subpaths/user-kubeconfig/helm/1" is not a mountpoint, deleting
W0504 10:50:08.163206 384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes/kubernetes.io~configmap/user-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:08.163341 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-user-kubeconfig" (OuterVolumeSpecName: "user-kubeconfig") pod "983816b1-6d2b-4a07-a436-048c066cb0c4" (UID: "983816b1-6d2b-4a07-a436-048c066cb0c4"). InnerVolumeSpecName "user-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
I0504 10:50:08.253739 384 reconciler.go:319] "Volume detached for volume \"pod-impersonation-helm-op-ccrqs-token-dgfkx\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-pod-impersonation-helm-op-ccrqs-token-dgfkx\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.253777 384 reconciler.go:319] "Volume detached for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-admin-kubeconfig\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.253796 384 reconciler.go:319] "Volume detached for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/983816b1-6d2b-4a07-a436-048c066cb0c4-user-kubeconfig\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.253810 384 reconciler.go:319] "Volume detached for volume \"data\" (UniqueName: \"kubernetes.io/secret/983816b1-6d2b-4a07-a436-048c066cb0c4-data\") on node \"local-node\" DevicePath \"\""
I0504 10:50:08.538708 384 scope.go:111] "RemoveContainer" containerID="c5d282c066b4e5ef7af1d19a140edf403a88bacecf8e643245dc9bbc1f2ab4ff"
I0504 10:50:08.728797 384 scope.go:111] "RemoveContainer" containerID="a8d715771516a94efe3e133eb5a1fa2e6e4b500469991d84eea1fb09a4c33bfb"
E0504 10:50:08.963085 384 secret.go:195] Couldn't get secret cattle-system/helm-operation-x27kx: failed to sync secret cache: timed out waiting for the condition
E0504 10:50:08.963175 384 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-data podName:c1e5f377-3f12-4f55-8ad3-fbeb1e36661f nodeName:}" failed. No retries permitted until 2022-05-04 10:50:09.463145133 +0000 UTC m=+4563.113500603 (durationBeforeRetry 500ms). Error: "MountVolume.SetUp failed for volume \"data\" (UniqueName: \"kubernetes.io/secret/c1e5f377-3f12-4f55-8ad3-fbeb1e36661f-data\") pod \"helm-operation-z67tt\" (UID: \"c1e5f377-3f12-4f55-8ad3-fbeb1e36661f\") : failed to sync secret cache: timed out waiting for the condition"
E0504 10:50:09.220083 384 remote_runtime.go:144] "StopPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to destroy network for sandbox \"1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0\": failed to delete \"eth0\": no such device" podSandboxID="1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0"
E0504 10:50:09.220128 384 kuberuntime_manager.go:958] "Failed to stop sandbox" podSandboxID={Type:containerd ID:1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0}
E0504 10:50:09.220168 384 kubelet_pods.go:1288] "Failed killing the pod" err="failed to \"KillPodSandbox\" for \"27214cba-0e55-4bce-80e8-93a99e5febc0\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to destroy network for sandbox \\\"1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0\\\": failed to delete \\\"eth0\\\": no such device\"" podName="helm-operation-r6lhv"
I0504 10:50:09.541249 384 pod_container_deletor.go:79] "Container not found in pod's containers" containerID="1a1e0f596761dab9c861f9dd488979a662e1e91f1381d94c6aab6bf8d67944d0"
I0504 10:50:09.733994 384 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=983816b1-6d2b-4a07-a436-048c066cb0c4 path="/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes"
I0504 10:50:09.734306 384 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/983816b1-6d2b-4a07-a436-048c066cb0c4/volumes"
I0504 10:50:10.171123 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-admin-kubeconfig\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.171166 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-user-kubeconfig\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.171202 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"pod-impersonation-helm-op-6hv5p-token-c7qgn\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-pod-impersonation-helm-op-6hv5p-token-c7qgn\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.171228 384 reconciler.go:196] "operationExecutor.UnmountVolume started for volume \"data\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-data\") pod \"27214cba-0e55-4bce-80e8-93a99e5febc0\" (UID: \"27214cba-0e55-4bce-80e8-93a99e5febc0\") "
I0504 10:50:10.183335 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-data" (OuterVolumeSpecName: "data") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "data". PluginName "kubernetes.io/secret", VolumeGidValue ""
W0504 10:50:10.185340 384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volume-subpaths/admin-kubeconfig/proxy/0" is not a mountpoint, deleting
W0504 10:50:10.186158 384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes/kubernetes.io~configmap/admin-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:10.186302 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-admin-kubeconfig" (OuterVolumeSpecName: "admin-kubeconfig") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "admin-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
W0504 10:50:10.191100 384 mount_helper_common.go:133] Warning: "/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volume-subpaths/user-kubeconfig/helm/1" is not a mountpoint, deleting
W0504 10:50:10.191282 384 empty_dir.go:520] Warning: Failed to clear quota on /var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes/kubernetes.io~configmap/user-kubeconfig: clearQuota called, but quotas disabled
I0504 10:50:10.191414 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-user-kubeconfig" (OuterVolumeSpecName: "user-kubeconfig") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "user-kubeconfig". PluginName "kubernetes.io/configmap", VolumeGidValue ""
I0504 10:50:10.196683 384 operation_generator.go:830] UnmountVolume.TearDown succeeded for volume "kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-pod-impersonation-helm-op-6hv5p-token-c7qgn" (OuterVolumeSpecName: "pod-impersonation-helm-op-6hv5p-token-c7qgn") pod "27214cba-0e55-4bce-80e8-93a99e5febc0" (UID: "27214cba-0e55-4bce-80e8-93a99e5febc0"). InnerVolumeSpecName "pod-impersonation-helm-op-6hv5p-token-c7qgn". PluginName "kubernetes.io/secret", VolumeGidValue ""
I0504 10:50:10.271519 384 reconciler.go:319] "Volume detached for volume \"user-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-user-kubeconfig\") on node \"local-node\" DevicePath \"\""
I0504 10:50:10.271562 384 reconciler.go:319] "Volume detached for volume \"pod-impersonation-helm-op-6hv5p-token-c7qgn\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-pod-impersonation-helm-op-6hv5p-token-c7qgn\") on node \"local-node\" DevicePath \"\""
I0504 10:50:10.271576 384 reconciler.go:319] "Volume detached for volume \"data\" (UniqueName: \"kubernetes.io/secret/27214cba-0e55-4bce-80e8-93a99e5febc0-data\") on node \"local-node\" DevicePath \"\""
I0504 10:50:10.271588 384 reconciler.go:319] "Volume detached for volume \"admin-kubeconfig\" (UniqueName: \"kubernetes.io/configmap/27214cba-0e55-4bce-80e8-93a99e5febc0-admin-kubeconfig\") on node \"local-node\" DevicePath \"\""
2022-05-04 10:50:11.418293 I | etcdserver/api/etcdhttp: /health OK (status code 200)
I0504 10:50:11.547327 384 scope.go:111] "RemoveContainer" containerID="fc05f0c3194a5ed3e7a7c2b6eac28771d4fcca4fcac75e4606aee3b4294020be"
I0504 10:50:11.729244 384 scope.go:111] "RemoveContainer" containerID="2153688c9bc4c69e1aac1577502cd200781f9fad53c1997fdbaf710e9f839cb6"
I0504 10:50:11.734755 384 kubelet_volumes.go:140] "Cleaned up orphaned pod volumes dir" podUID=27214cba-0e55-4bce-80e8-93a99e5febc0 path="/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes"
I0504 10:50:11.735083 384 kubelet_getters.go:300] "Path does not exist" path="/var/lib/kubelet/pods/27214cba-0e55-4bce-80e8-93a99e5febc0/volumes"
I0504 10:50:12.549663 384 scope.go:111] "RemoveContainer" containerID="fc05f0c3194a5ed3e7a7c2b6eac28771d4fcca4fcac75e4606aee3b4294020be"
I only can guess that something is wrong with some serviceaccount and/or its rolebindings or so judging from the log-line
...
E0504 10:50:07.771131 384 reflector.go:138] object-"cattle-system"/"helm-operation-x27kx": Failed to watch *v1.Secret: failed to list *v1.Secret: secrets "helm-operation-x27kx" is forbidden: User "system:node:local-node" cannot list resource "secrets" in API group "" in the namespace "cattle-system": no relationship found between node 'local-node' and this object
...
Thanks for any tips about how to tackle that problem in advance.