Failed to connect to peer wss://10.42.0.22/v3/connect [local ID=10.42.1.15]: websocket: bad handshake

Hello,

I have installed a Rancher HA cluster on K3s (with embedded etcd).
The Rancher installation was successful. Note that I use cert-manager to manage the certificates.
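For context, the setup roughly followed the documented pattern below; the K3s join details and chart values here are placeholders rather than my exact commands:

# first K3s server with embedded etcd
curl -sfL https://get.k3s.io | sh -s - server --cluster-init
# additional servers join the first one (token from /var/lib/rancher/k3s/server/node-token)
curl -sfL https://get.k3s.io | K3S_TOKEN=<node-token> sh -s - server --server https://<first-server-ip>:6443

# cert-manager, then Rancher, via Helm
helm repo add jetstack https://charts.jetstack.io
helm repo add rancher-latest https://releases.rancher.com/server-charts/latest
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set installCRDs=true
helm install rancher rancher-latest/rancher --namespace cattle-system --create-namespace --set hostname=admin-k8s.harvest.fr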

However, the logs of the Rancher pods show the following errors:

[ERROR] Failed to connect to peer wss://10.42.2.19/v3/connect [local ID=10.42.0.22]: websocket: bad handshake

And when I create a cluster and want to add a node, the node registration doesn’t work: the node tries to make the wss connection but gets no further.
It gets stuck at this step:

level=info msg="Connecting to proxy" url="wss://admin-k8s.harvest.fr/v3/connect"
level=info msg="Starting plan monitor, checking every 15 seconds"

Thank you in advance for your help.

Théo VILLEMINOT

Hi @tvilleminot were you able to solve this? We are in the exact same situation right now on a newly installed Rancher 2.5.8.

best

Hi @spatialy,
I solved the problem by using K3s v1.20+k3s1 instead of v1.21+k3s1.
I think Rancher does not yet fully support K3s v1.21+k3s1, so for the moment it is necessary to stay on v1.20+k3s1. Moreover, Rancher still does not offer Kubernetes 1.21 when creating clusters.
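If it helps, pinning the K3s version is just a matter of setting INSTALL_K3S_VERSION (or INSTALL_K3S_CHANNEL) before the install script runs on each node; the exact patch release below is only an example:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.20.7+k3s1" sh -s - server --cluster-init
# or pin the channel instead of an exact release
curl -sfL https://get.k3s.io | INSTALL_K3S_CHANNEL="v1.20" sh -s - server --cluster-init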

Hi @tvilleminot, we were able to get a functioning cluster, but after that, when we try to add a node to any new cluster, we get some weird messages, presumably related to K3s using Traefik as the ingress controller and some sort of issue routing calls to wss:// … do you get this too?

Best

Hi @spatialy,
I had this problem with K3s v1.21+k3s1, but it no longer occurs with v1.20+k3s1.
Are you now on version 1.20?

Hi @tvilleminot, yes, we downgraded and this error disappeared, but another one arose.

Hi @spatialy,
I noticed that if I tried to add only an etcd/control plane node to the cluster, I still had a problem with the wss communication. To work around it, I also had to add a worker node, and then the nodes were added correctly.
I don’t know if it’s the same problem for you?
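For reference, the registration command Rancher generates for a custom cluster node lets you include all three roles at once; something along these lines, where the token and checksum are placeholders copied from the Rancher UI:

sudo docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.5.8 \
  --server https://admin-k8s.harvest.fr \
  --token <registration-token> --ca-checksum <ca-checksum> \
  --etcd --controlplane --worker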

Hi @tvilleminot you are right … the last time we deployed Rancher from scratch was when v2 first came out, and since then we have only upgraded to new versions … now the cluster is not initialized until you get all roles registered in Rancher.

Best

For what it is worth, I got the same error with Rancher’s RKE2 on Kubernetes 1.21+. Just make sure to set your INSTALL_RKE2_CHANNEL env variable to v1.20 (vs stable or latest).
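A minimal sketch of what that looks like when installing an RKE2 server node (the channel value is the important part; the rest is the standard install script invocation):

curl -sfL https://get.rke2.io | INSTALL_RKE2_CHANNEL="v1.20" sh -
systemctl enable --now rke2-server.service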

This also happens with RKE1 and Rancher 2.5. We ran into this exact error but didn’t notice it right away because everything appeared fine at first. It ended up corrupting one of our clusters pretty badly.

Just for the record, Rancher 2.5 does not support Kubernetes 1.21 according to the Rancher 2.5 Support Matrix.