Our local Rancher cluster was deployed with RKE v1.3.15 and runs Kubernetes v1.23.6 with Rancher v2.6.12. Our downstream clusters are on Kubernetes v1.24.13.
While attempting to upgrade Rancher from v2.6.12 to v2.6.14 or v2.7.x, we encountered recurring pod failures in the "cattle-system" namespace. These failures cause both Rancher and the downstream clusters to flap between available and unavailable states.
We applied the workaround from GitHub issue #26082 (removing the rancher-token-* secrets), but the issue persists. The Rancher pod logs frequently show connection and sync errors such as:
Failed to connect to peer wss:///v3/connect [local ID=]: websocket: bad handshake
2024/10/24 16:38:48 [ERROR] error syncing 'c-r8stq/m-1be734bafd08': handler node-controller: job already exists for c-r8stq/m-1be734bafd08, requeuing
2024/10/24 16:38:49 [ERROR] error syncing 'c-r8stq/m-1be734bafd08': handler node-controller: job already exists for c-r8stq/m-1be734bafd08, requeuing
2024/10/24 16:38:50 [ERROR] [rkebootstrap] fleet-default/custom-553fed3fb05c: error getting machine by owner reference no matching controller owner ref
2024/10/24 16:38:50 [ERROR] error syncing 'fleet-default/custom-553fed3fb05c': handler rke-bootstrap: no matching controller owner ref, requeuing
2024/10/24 16:38:50 [ERROR] [rkebootstrap] fleet-default/custom-8fcd833ca169: error getting machine by owner reference no matching controller owner ref
2024/10/24 16:38:50 [ERROR] error syncing 'fleet-default/custom-8fcd833ca169': handler rke-bootstrap: no matching controller owner ref, requeuing
2024/10/24 16:38:50 [ERROR] [rkebootstrap] fleet-default/custom-9f5a365127b8: error getting machine by owner reference no matching controller owner ref
2024/10/24 16:38:50 [ERROR] error syncing 'fleet-default/custom-9f5a365127b8': handler rke-bootstrap: no matching controller owner ref, requeuing
2024/10/24 16:38:50 [ERROR] [rkebootstrap] fleet-default/custom-c57175487998: error getting machine by owner reference no matching controller owner ref
2024/10/24 16:38:50 [ERROR] error syncing 'fleet-default/custom-c57175487998': handler rke-bootstrap: no matching controller owner ref, requeuing
2024/10/24 16:38:50 [ERROR] available chart version (100.2.3+up0.5.3) for fleet is less than the min version (102.2.0+up0.8.0)
2024/10/24 16:38:50 [ERROR] Failed to find system chart fleet will try again in 5 seconds: no chart name found
2024/10/24 16:38:50 [ERROR] error syncing 'fleet-default/sandbox-managed-system-upgrade-controller': handler mcc-bundle: no chart version found for system-upgrade-controller-102.1.0+up0.5.0, requeuing
panic: indexer conflict: map[byPod:{}]
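
Because many of the lines above repeat the same error for different objects, it may help to group them before digging further. The following is only a rough sketch in Python, not a Rancher tool: the function name summarize, both regular expressions, and the "<object>" placeholder are assumptions made for illustration, and the line format is inferred solely from the excerpt above. It counts [ERROR] lines after replacing namespace/name identifiers with a placeholder; lines without a timestamp (such as the websocket and panic lines) are skipped.

```python
import re
from collections import Counter

# Assumed line shape, inferred from the excerpt above:
#   2024/10/24 16:38:48 [ERROR] <message>
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")

# Identifiers of the form namespace/name, e.g. c-r8stq/m-1be734bafd08
# or fleet-default/custom-553fed3fb05c (hypothetical pattern).
OBJECT_RE = re.compile(r"\b[\w-]+/[\w-]+\b")


def summarize(lines):
    """Count ERROR messages, treating different object names as the same error."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # No timestamp/level prefix (e.g. the panic line): skip it.
            continue
        level, message = match.group(2), match.group(3)
        if level != "ERROR":
            continue
        # Replace every namespace/name identifier with a fixed placeholder
        # so repeated errors collapse into one bucket.
        normalized = OBJECT_RE.sub("<object>", message)
        counts[normalized] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_logs.py rancher.log
    with open(sys.argv[1], encoding="utf-8") as handle:
        for message, count in summarize(handle).most_common():
            print(f"{count:5d}  {message}")
```

Run against a saved copy of the pod log, this prints each distinct error once with its count, which makes it easier to tell the owner-reference errors apart from the fleet chart version errors.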