Running v1.20.7+k3s1 in HA mode with an external Postgres DB.
Nodes are running Ubuntu 20.04 LTS.
Issue:
We're retiring some of our nodes due to aging hardware and replacing them with newer machines.
To maintain HA, this also means adding new server (control-plane) nodes.
When we add a new server node to the cluster, the existing nodes report certificate verification errors, resulting in downtime.
The new server has a valid kubeconfig and kubectl commands work as expected, but all other nodes show a NotReady state.
We've narrowed it down to the CA hash in the K3S_TOKEN changing after the new server is added. Updating this value in the systemd environment files (/etc/systemd/system/k3s.service.env or /etc/systemd/system/k3s-agent.service.env) on the other nodes restores their connection to the cluster, but isn't practical.
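For reference, the per-node workaround looks roughly like this (a minimal sketch; NEW_TOKEN stands in for the updated token read from /var/lib/rancher/k3s/server/node-token on a server, and on server nodes the file is k3s.service.env with a restart of the k3s service instead):
# Sketch of the manual workaround applied on each affected node.
# NEW_TOKEN is assumed to hold the updated token from a server's node-token file.
sudo sed -i "s|^K3S_TOKEN=.*|K3S_TOKEN='${NEW_TOKEN}'|" /etc/systemd/system/k3s-agent.service.env
sudo systemctl restart k3s-agent   # server nodes: edit k3s.service.env and restart k3s instead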
A number of apps, including CoreDNS, also fail because their service account tokens are no longer valid. This is easily resolved by deleting the tokens and then the pods so that new tokens are generated, but again, it isn't practical.
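The clean-up for the affected apps is roughly the following (a sketch assuming a default k3s install, with CoreDNS as the example; secret names and pod labels may differ for other apps):
# Delete the stale CoreDNS service account token secret so a new one is issued ...
kubectl -n kube-system get secret -o name | grep coredns-token | xargs kubectl -n kube-system delete
# ... then delete the CoreDNS pods so they restart and mount the regenerated token.
kubectl -n kube-system delete pod -l k8s-app=kube-dns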
Is there a reason the K3S_TOKEN changes with each additional server, or is there a specific process we should follow when adding new server nodes to an HA setup?
Script used for adding new server nodes:
#!/bin/bash
TOKEN="<token>"                   # Token from the initial server.
REG_URL="https://hostname:6443"   # URL of an existing server node.
K3S_VERSION="v1.20.7+k3s1"        # Must match the existing servers.

if [[ -z $REG_URL || -z $K3S_VERSION || -z $TOKEN ]]; then
    echo "Error: One or more variables are undefined."
    exit 1
fi

curl -sfL https://get.k3s.io | K3S_URL="$REG_URL" K3S_TOKEN="$TOKEN" INSTALL_K3S_VERSION="$K3S_VERSION" sh -