Rancher RKE2 cluster airgap installation failing

Hello Everyone

We have deployed our Rancher on Docker and are trying to create an RKE2 cluster through the UI.

As this is an airgap installation, we have created a private registry and configured containerd to pull images from it via registries.yaml.
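For context, a minimal sketch of what an RKE2 node-level registries.yaml along those lines might look like (the registry hostname and TLS setting here are placeholders, not taken from the post; adjust to your private registry):

```yaml
# /etc/rancher/rke2/registries.yaml -- node-level containerd registry mirror config
# "registry.example.internal:5000" is a placeholder for your private registry.
mirrors:
  docker.io:
    endpoint:
      - "https://registry.example.internal:5000"
configs:
  "registry.example.internal:5000":
    tls:
      insecure_skip_verify: true  # only if the registry uses a self-signed cert
```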

We have noticed the cluster is not coming up; it stays in Provisioning status.

The provisioning log says
[INFO ] provisioning bootstrap node(s) : waiting for agent to check in and apply initial plan

Let me know if anyone has faced this issue before and resolved it.

Thanks in advance.

I can’t say I know exactly, but my understanding is that installing the cluster through the UI means you’re using the tar install method rather than the RPM method.

If the error you see in the UI says it’s waiting for the agent to check in, that means the node has pinged Rancher enough for Rancher to be aware of it, but its helper service hasn’t reported that it completed (if the node isn’t in the Rancher UI at all, it never managed to talk to Rancher). Either way, you’ll want to check the services on the node you deployed to for one named rancher-system-agent (if you’re stuck with systemd like most of us, run systemctl, type /rancher, and if something else pops up first, press ‘/’ and Enter again until it either takes you to the unit or tells you nothing more is found). Then check the logs for that service (with systemd, that’s journalctl -u rancher-system-agent) and see if you find anything to help.
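A quick sketch of those checks as a script you could run on the node (the service name is from the thread; the `|| echo` guards are just so the script keeps going on a machine where the unit is missing):

```shell
#!/bin/sh
# Check whether the rancher-system-agent unit exists and capture its status.
# If the unit or systemctl itself is absent, fall back to a diagnostic message.
status=$(systemctl status rancher-system-agent --no-pager 2>&1 \
  || echo "rancher-system-agent not found or not running")
echo "$status"

# Pull the most recent log lines for the unit, if journalctl is available.
journalctl -u rancher-system-agent --no-pager -n 100 2>/dev/null || true
```

If the service exists but is failing, the journalctl output is where the useful errors usually show up.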

Hopefully that’ll get you to the answer or at least the next step. As another note, this forum doesn’t have a large footprint by Rancher employees but their Slack does, so if you aren’t getting the answers you need here you might sign up for that and ask there too.

Thank you @wcoateRR for responding.

I went through the journalctl logs and the error says it received a secret that was nil:

rancher-system-agent[2889731]: time="2022-03-05T13:37:29+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:37:34+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:37:39+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:37:44+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:37:49+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:37:54+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:37:59+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:38:04+04:00" level=error msg="[K8s] Received secret that was nil"
rancher-system-agent[2889731]: time="2022-03-05T13:38:09+04:00" level=error msg="[K8s] Received secret that was nil"

I haven’t created any secrets so far, and this is the master node (deploying all components: control plane, etcd, and worker).

Now what you can do here depends on how far the RKE2 install got.

If it didn’t get to deploying containers, then all you can do is look higher up in that journalctl output for other odd entries, or poke through Rancher to see if there are errors on its side (you might want to do this anyway).

If it did get to the point of installing the RKE2 binaries and started trying to launch containers, then you can export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml and use the /var/lib/rancher/rke2/bin/crictl command to do the same sorts of things you’d do with the docker command (ps to list running containers, exec to run a shell in one, I think logs to get container logs, etc.). That might also get you some info.
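Roughly, that crictl workflow looks like this (the container ID is a placeholder you’d take from the `ps` output, and the binary only exists once RKE2 has actually installed, hence the existence check):

```shell
#!/bin/sh
# Point crictl at RKE2's containerd configuration.
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
CRICTL=/var/lib/rancher/rke2/bin/crictl

# Only run the commands if the RKE2 install actually laid the binary down.
if [ -x "$CRICTL" ]; then
    "$CRICTL" ps                                 # list running containers (like `docker ps`)
    "$CRICTL" logs some-container-id             # container logs; ID is a placeholder
    "$CRICTL" exec -it some-container-id sh      # open a shell inside a container
else
    echo "crictl not found at $CRICTL; RKE2 may not have installed its binaries yet"
fi
```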

I started typing up some options with kubectl but you’re at a point where that won’t work yet.