K3s Agent on Azure node unable to join cluster

Hi there :wave: I was able to join an AWS node with a GCP node using K3s and Kilo, and it made my day. Honestly. I had never before worked this deep down in the network layer (configuring WireGuard to establish a VPN), and I'm pretty proud of myself. The current state is committed to the repository.

Only the Azure node is making trouble. I read that `--node-external-ip` must be provided as an argument to the primary K3s node, which I did (the steps are documented here).
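For reference, roughly what passing that flag on the server looks like; the installer variable and the placeholder IP below are illustrative, not the exact command from my setup docs:

```
# Sketch: install the K3s server and advertise the node's public IP to agents.
# <SERVER_PUBLIC_IP> is a placeholder for the AWS node's public address.
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --node-external-ip <SERVER_PUBLIC_IP>" sh -
```

Still, the K3s setup on the Azure node fails with this error: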

```
[root@az-japaneast-node azureuser]$ curl -sfL https://get.k3s.io | K3S_URL=https://10.0.0.1:6443 K3S_TOKEN=$K3S_TOKEN sh -
[INFO] Finding release for channel stable
[INFO] Using v1.25.6+k3s1 as release
[INFO] Downloading hash https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/sha256sum-amd64.txt
[INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.25.6+k3s1/k3s
[INFO] Verifying binary download
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Skipping installation of SELinux RPM
[INFO] Creating /usr/local/bin/kubectl symlink to k3s
[INFO] Creating /usr/local/bin/crictl symlink to k3s
[INFO] Creating /usr/local/bin/ctr symlink to k3s
[INFO] Creating killall script /usr/local/bin/k3s-killall.sh
[INFO] Creating uninstall script /usr/local/bin/k3s-agent-uninstall.sh
[INFO] env: Creating environment file /etc/systemd/system/k3s-agent.service.env
[INFO] systemd: Creating service file /etc/systemd/system/k3s-agent.service
[INFO] systemd: Enabling k3s-agent unit
Created symlink /etc/systemd/system/multi-user.target.wants/k3s-agent.service → /etc/systemd/system/k3s-agent.service.
[INFO] systemd: Starting k3s-agent
Job for k3s-agent.service failed because the control process exited with error code.
See "systemctl status k3s-agent.service" and "journalctl -xe" for details.
```

The systemd status is as follows:

```
[root@az-japaneast-node azureuser]$ systemctl status k3s-agent.service
● k3s-agent.service - Lightweight Kubernetes
      Loaded: loaded (/etc/systemd/system/k3s-agent.service; enabled; vendor preset: enabled)
      Active: activating (auto-restart) (Result: exit-code) since Fri 2023-02-03 19:08:40 UTC; 845ms ago
      Docs: https://k3s.io
     Process: 21049 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, status=0/SUCCESS)
     Process: 21051 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
     Process: 21052 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
     Process: 21053 ExecStart=/usr/local/bin/k3s agent (code=exited, status=1/FAILURE)
    Main PID: 21053 (code=exited, status=1/FAILURE)
         CPU: 101ms

Feb 03 19:08:40 az-japaneast-node systemd[1]: k3s-agent.service: Main process exited, code=exited, status=1/FAILURE
Feb 03 19:08:40 az-japaneast-node systemd[1]: k3s-agent.service: Failed with result 'exit-code'.
Feb 03 19:08:40 az-japaneast-node systemd[1]: Failed to start Lightweight Kubernetes.
```
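The status output alone doesn't contain the actual error from the agent, so the journal is the next stop; this is standard systemd tooling, nothing K3s-specific:

```
# Show the last 100 log lines of the failing agent unit
journalctl -u k3s-agent.service --no-pager -n 100
```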

Does anyone have an idea how to resolve this? The other node on GCP joined the primary node (running on AWS) without any trouble, though.

Found the issue in the meantime.

> Cgroup v2
>
> Cgroup v1 and Hybrid v1/v2 are not supported; only pure Cgroup v2 is supported. If K3s fails to start due to missing cgroups when running rootless, it is likely that your node is in Hybrid mode, and the “missing” cgroups are still bound to a v1 controller.

Source
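A quick way to tell which mode a node is in (my own addition, not from the K3s docs): check the filesystem type mounted at /sys/fs/cgroup.

```
# Prints "cgroup2fs" on pure cgroup v2; "tmpfs" indicates v1 or hybrid mode
stat -fc %T /sys/fs/cgroup/
```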

For the Azure node, cgroup v2 had to be enabled by modifying the GRUB kernel command line, as described here.
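For anyone hitting the same problem, here is a sketch of the kind of change involved. The exact file and regeneration command depend on the distro and image (this assumes a GRUB2-based setup), so treat it as illustrative rather than copy-paste ready:

```
# Force the unified (pure v2) cgroup hierarchy via the kernel command line
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&systemd.unified_cgroup_hierarchy=1 /' /etc/default/grub

# Regenerate the GRUB config (use update-grub on Debian/Ubuntu images) and reboot
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot
```

After the reboot the node should come up in pure cgroup v2 mode, which is what the agent needed here.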