K3s server join failed - HA with embedded etcd

I have two fresh Ubuntu VMs:

  • VM-1 (65.0.54.158)
  • VM-2 (65.2.136.2)

I am trying to set up an HA k3s cluster with embedded etcd, following the official documentation.

Here is what I executed on VM-1:

curl -sfL https://get.k3s.io | K3S_TOKEN=AtJMEyWR8pE3HR4RWgT6IsqglOkBm0sLC4n0aDBkng9VE1uqyNevR6oCMNCqQNaF sh -s - server --cluster-init

Here is the output on VM-1:

curl -sfL https://get.k3s.io | K3S_TOKEN=AtJMEyWR8pE3HR4RWgT6IsqglOkBm0sLC4n0aDBkng9VE1uqyNevR6oCMNCqQNaF sh -s - server --cluster-init
[INFO]  Finding release for channel stable
[INFO]  Using v1.24.4+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.24.4+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.24.4+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s

Additionally, I checked the node status:

sudo kubectl get nodes

and this worked as expected:

NAME              STATUS   ROLES                       AGE   VERSION
ip-172-31-41-34   Ready    control-plane,etcd,master   18m   v1.24.4+k3s1

Next, I SSH into VM-2 and try to join it as a second server to the cluster running on VM-1:

curl -sfL https://get.k3s.io | K3S_TOKEN=AtJMEyWR8pE3HR4RWgT6IsqglOkBm0sLC4n0aDBkng9VE1uqyNevR6oCMNCqQNaF sh -s - server --server https://65.0.54.158:6443

Output on VM-2:

[INFO]  Finding release for channel stable
[INFO]  Using v1.24.4+k3s1 as release
[INFO]  Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.24.4+k3s1/sha256sum-amd64.txt
[INFO]  Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.24.4+k3s1/k3s
[INFO]  Verifying binary download
[INFO]  Installing k3s to /usr/local/bin/k3s
[INFO]  Skipping installation of SELinux RPM
[INFO]  Creating /usr/local/bin/kubectl symlink to k3s
[INFO]  Creating /usr/local/bin/crictl symlink to k3s
[INFO]  Creating /usr/local/bin/ctr symlink to k3s
[INFO]  Creating killall script /usr/local/bin/k3s-killall.sh
[INFO]  Creating uninstall script /usr/local/bin/k3s-uninstall.sh
[INFO]  env: Creating environment file /etc/systemd/system/k3s.service.env
[INFO]  systemd: Creating service file /etc/systemd/system/k3s.service
[INFO]  systemd: Enabling k3s unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s.service → /etc/systemd/system/k3s.service.
[INFO]  systemd: Starting k3s
Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xe" for details

Here are the relevant contents of /var/log/syslog on VM-2:

Sep  6 19:10:00 ip-172-31-46-114 systemd[1]: Starting Lightweight Kubernetes...
Sep  6 19:10:00 ip-172-31-46-114 sh[9516]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Sep  6 19:10:00 ip-172-31-46-114 sh[9517]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Sep  6 19:10:00 ip-172-31-46-114 k3s[9520]: time="2022-09-06T19:10:00Z" level=info msg="Acquiring lock file /var/lib/rancher/k3s/data/.lock"
Sep  6 19:10:00 ip-172-31-46-114 k3s[9520]: time="2022-09-06T19:10:00Z" level=info msg="Preparing data dir /var/lib/rancher/k3s/data/577968fa3d58539cc4265245941b7be688833e6bf5ad7869fa2afe02f15f1cd2"
Sep  6 19:10:02 ip-172-31-46-114 k3s[9520]: time="2022-09-06T19:10:02Z" level=info msg="Starting k3s v1.24.4+k3s1 (c3f830e9)"
Sep  6 19:10:22 ip-172-31-46-114 k3s[9520]: time="2022-09-06T19:10:22Z" level=fatal msg="starting kubernetes: preparing server: failed to get CA certs: Get \"https://65.0.54.158:6443/cacerts\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
Sep  6 19:10:22 ip-172-31-46-114 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Sep  6 19:10:22 ip-172-31-46-114 systemd[1]: k3s.service: Failed with result 'exit-code'.
Sep  6 19:10:22 ip-172-31-46-114 systemd[1]: Failed to start Lightweight Kubernetes.

I have been stuck on this for two days. I would really appreciate some help setting it up. Thank you.

Hello,

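Do you have a firewall (or a cloud security group) between the nodes? The log shows a timeout while fetching https://65.0.54.158:6443/cacerts, so it seems that VM-2 cannot reach VM-1 on port 6443.
Please test from VM-2 with curl: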

curl -vk https://65.0.54.158:6443

or with nc:

nc -v 65.0.54.158 6443
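
If you want to check all the relevant TCP ports in one go, here is a minimal sketch. It assumes bash and the coreutils `timeout` command are available on VM-2. The helper names check_tcp and check_k3s_server are made up for this example and are not part of k3s. The port list (6443 for the API server, 2379 and 2380 for embedded etcd between servers) is taken from the K3s networking requirements as I understand them.

```shell
#!/bin/sh
# Sketch: test whether the TCP ports a joining K3s server needs are reachable.
# check_tcp and check_k3s_server are hypothetical helper names, not k3s commands.

# Return 0 if host ($1) accepts a TCP connection on port ($2) within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so bash must be installed.
check_tcp() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print one status line per port for the server given as $1.
# 6443 = Kubernetes API / k3s supervisor, 2379 and 2380 = embedded etcd.
check_k3s_server() {
  for port in 6443 2379 2380; do
    if check_tcp "$1" "$port"; then
      echo "$port/tcp reachable on $1"
    else
      echo "$port/tcp NOT reachable on $1"
    fi
  done
}
```

Run it from VM-2 as `check_k3s_server 65.0.54.158`. If 6443 is reported as not reachable, the join fails exactly as in your log. If only 2379 or 2380 is blocked, a plain agent join can still work while a server join with embedded etcd does not. The sketch does not cover UDP ports such as 8472, which the default flannel VXLAN backend uses.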

Additionally, can you post the result of the following command, run on VM-1:

kubectl get nodes -o wide

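It also looks like each VM has two addresses: a public one (65.x.x.x) and a private one (the node names ip-172-31-41-34 and ip-172-31-46-114 suggest 172.31.x.x). Is that correct?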
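
If both VMs sit in the same private network, one thing worth trying is joining over the private address instead of the public one. The sketch below only prints the join command so you can review it before running it. It assumes that VM-1's private IP is 172.31.41.34, which I am inferring from the node name ip-172-31-41-34 and have not verified. build_join_cmd is a made-up helper name, and the command it prints is the same one you ran, with only the server address changed.

```shell
#!/bin/sh
# Sketch: print the k3s server join command for a given token and server IP.
# build_join_cmd is a hypothetical helper; it prints the command, it does not run it.
build_join_cmd() {
  # $1 = cluster token, $2 = IP address of the first server (VM-1)
  printf 'curl -sfL https://get.k3s.io | K3S_TOKEN=%s sh -s - server --server https://%s:6443\n' "$1" "$2"
}
```

For example, `build_join_cmd "$K3S_TOKEN" 172.31.41.34` prints the command to run on VM-2, where `K3S_TOKEN` holds the same token you used on VM-1. The firewall or security group still has to allow traffic between the two private addresses on the ports mentioned above.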

Was there a resolution to this issue, @fidays? I have exactly the same issue.

When I join the nodes without the embedded etcd HA options, it works just fine, but the vanilla embedded etcd HA commands from the docs fail every time.

@fidays @wfaler
Was there a resolution to this issue? I have exactly the same issue.

I have tried downgrading to "v1.22.2+k3s1" and also tried putting the servers behind a load balancer, but neither worked.
Could it be related to the fact that I am using Ubuntu 22.04?