Kube DNS failing?

G’Day,

My rancher journey is going super-well – massive props to the team. This is a great product!

I’ve set up an HA cluster using the Rancher agent. This worked well, but I’ve since done two things which seem to have broken DNS in particular:

  1. I had started Rancher on one of the nodes, and since I wanted that node to also be a master, I needed to move Rancher off it. I’ve migrated Rancher elsewhere, re-run the agent on the machines, and things seem happy aside from DNS.
  2. I’ve rebooted one of the machines (but the other two have been up as-is).

My services are still available, but none of them can resolve domain names.

Any ideas of what to look for, or is this a kube question and I should be asking there?

Dave.

Hi,

What OS are you using on your hosts?
Ubuntu 17.10 and 18.04 have problems with DNS, as kube-dns/CoreDNS fail on systemd-resolved-based systems: /etc/resolv.conf there points at the local stub resolver (127.0.0.53), which pods cannot reach.
See this Rancher issue: https://github.com/rancher/rancher/issues/13504
See this kubeadm issue: kubernetes/kubeadm#273
Workaround in this issue: kubernetes/kubeadm#787
The workaround on the hosts is:
sudo rm /etc/resolv.conf
sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf

That helped me on hosts running Ubuntu 18.04.
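
If you want to check whether a host is affected before replacing the symlink, here is a rough sketch (my own, not taken from the linked issues). It assumes the problem case is a resolv.conf whose only nameservers are loopback addresses such as 127.0.0.53; uses_loopback_resolver is just a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if every "nameserver" entry in the
# given resolv.conf-style file is a loopback address, e.g. the
# systemd-resolved stub at 127.0.0.53. Fails if the file is unreadable,
# lists no nameservers, or lists at least one non-loopback nameserver.
uses_loopback_resolver() {
    file="$1"
    [ -r "$file" ] || return 1
    # Collect the address from every "nameserver" line.
    servers=$(awk '$1 == "nameserver" { print $2 }' "$file")
    [ -n "$servers" ] || return 1
    for s in $servers; do
        case "$s" in
            127.*|::1) ;;      # loopback, keep checking
            *) return 1 ;;     # a real upstream resolver is present
        esac
    done
    return 0
}

# Example: check the host's own file.
if uses_loopback_resolver /etc/resolv.conf; then
    echo "only loopback nameservers: pods will likely fail to resolve"
else
    echo "a non-loopback nameserver is listed (or none could be read)"
fi
```

On an affected Ubuntu host I would expect the first message before the workaround and the second one after it, since /run/systemd/resolve/resolv.conf lists the real upstream servers.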

Thanks for the reply! I’m using CentOS 7 as the underlying OS for the cluster. I logged into the kube-dns containers to look at the logs and didn’t see anything worrying.

Eventually I had to re-create the cluster entirely, and I now have it operational again. Fortunately this was early in my Kube/Rancher journey, so it only took me a day or so. Later on, though, this would be pretty bad.

I would have thought that Kube would have checks to ensure that its DNS service was operating correctly. I wonder whether there was an underlying issue with the network that was created by moving rancher off, or rebooting, or…?

Either way, for me this is resolved, but it’s something that should get attention and resolution.