New install of Rancher v2.5.1, cattle-cluster-agent can't resolve the Rancher host

Note: new users can’t put more than two links in a post, so wherever you see “redacted”, presume it’s a URL of the form reda.cted.org.

I’m trying to build a new K8s cluster using Rancher, but the cattle-cluster-agent can’t seem to resolve the host that Rancher is running on.

Rancher is on 10.x.x.101, and it started a node on 10.x.x.102. From the command prompt on 102, dc01.reda.cted.org resolves to 10.x.x.101, but the container can’t resolve it. Can someone tell me what I’m missing?

The cattle-cluster-agent is always exiting with the following error message:

INFO: Environment: CATTLE_ADDRESS=10.42.0.4 CATTLE_CA_CHECKSUM=5expurgated0 CATTLE_CLUSTER=true CATTLE_FEATURES= CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=true CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-84f5865486-k9gl5 CATTLE_SERVER=https://dc01.reda.cted.org
INFO: Using resolv.conf: nameserver 10.43.0.10 search cattle-system.svc.cluster.local svc.cluster.local cluster.local redacted options ndots:5
ERROR: dc01.redacted/ping is not accessible (Could not resolve host: dc01.redacted)
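For anyone reading the resolv.conf line above: with ndots:5, a name with fewer than five dots is normally tried against each search domain before being tried as-is, and every one of those queries goes to the cluster DNS at 10.43.0.10. Here is a rough sketch of that ordering, assuming glibc-style resolver behaviour. The function name search_candidates is made up for illustration, and I left out the redacted fourth search domain.

```shell
# Print the names a glibc-style resolver would try, in order.
# $1 = query name, $2 = ndots value, remaining args = search domains.
# (Illustrative helper only; names that already end in "." are not handled.)
search_candidates() {
  name=$1
  ndots=$2
  shift 2
  # Count the dots in the query name.
  only_dots=$(printf '%s' "$name" | tr -cd '.')
  dots=${#only_dots}
  # A name with at least ndots dots is tried as-is first.
  if [ "$dots" -ge "$ndots" ]; then
    echo "$name."
  fi
  # Then each search domain is appended in turn.
  for domain in "$@"; do
    echo "$name.$domain."
  done
  # A name with fewer than ndots dots is tried as-is last.
  if [ "$dots" -lt "$ndots" ]; then
    echo "$name."
  fi
}

search_candidates dc01.reda.cted.org 5 \
  cattle-system.svc.cluster.local svc.cluster.local cluster.local
```

The as-is lookup of dc01.reda.cted.org comes last, and it only succeeds if the cluster DNS can forward it to an upstream server that knows the name.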

I nuked-and-paved it into Rancher v2.5.3, but that didn’t help.

I did find that the coredns container was failing to connect to my DNS provider:

[ERROR] plugin/errors: 2 4274169997662007610.2777994250524695476. HINFO: read udp 10.42.0.5:38998->10.x.x.x:53: read: no route to host

Is it possible that this is because the default internal network is part of the default external network’s address space? I would have thought that masquerading would handle that.
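One way to check the overlap question is to test whether the pod address from the log falls inside the LAN's CIDR block. This is only a sketch: the helper names ip_to_int and ip_in_cidr are made up, and 10.0.0.0/8 is an assumed mask for the redacted 10.x.x.x network, so substitute the real one.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Build the netmask from the prefix length (4294967295 = 0xFFFFFFFF).
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# 10.42.0.5 is the CoreDNS pod address from the error above;
# 10.0.0.0/8 is an ASSUMED mask for the LAN, not taken from the post.
if ip_in_cidr 10.42.0.5 10.0.0.0/8; then
  echo "pod address is inside the LAN range"
else
  echo "pod address is outside the LAN range"
fi
```

If the LAN really uses a narrower mask, the pod range may not overlap at all, in which case the overlap theory would not explain the error.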

Apparently this is a known issue with CentOS iptables. I ran this:

systemctl stop docker
iptables --flush
iptables -t nat --flush
systemctl start docker

And that allowed the node to connect.

On which nodes did you run this? The main Rancher management node, or the subsequent nodes? I’m using AlmaLinux/Rocky Linux and I think I’m having the SAME issue.