Hi,
I am exclusively using Ubuntu 18.04 on hosts for a cluster. On occasion, I see certain hosts that have chronic DNS resolution issues. I’ve poked around quite a bit and it sounds like it may be related to the kubelet/systemd issue mentioned here.
If I understand this issue (and the relevant commit) correctly, it intends to solve the problem as follows:
- if systemd-resolved is detected, it adds --resolv-conf=/run/systemd/resolve/resolv.conf to kubelet.
@superseb Based on that fix, should I reasonably expect to always see --resolv-conf=/run/systemd/resolve/resolv.conf as a kubelet arg on all my Ubuntu hosts?
I’ve taken a look at the kubelet args on several hosts with docker inspect, and in all cases I’m seeing:
"--resolv-conf=/etc/resolv.conf",
Does this mean the fix isn’t behaving as expected, or is there something else here I’m not considering?
To handle this seamlessly, the flag is appended in the entrypoint of the container, here: https://github.com/rancher/rke-tools/blob/master/entrypoint.sh#L53
So checking the running kubelet process, rather than the args the container was created with, should be enough to validate it.
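For illustration, the detection amounts to something along these lines (a simplified sketch, not the actual rke-tools code):
# Simplified sketch (not the actual entrypoint.sh): if /etc/resolv.conf points
# at the systemd-resolved stub resolver (127.0.0.53), append the path to the
# real upstream resolv.conf to the kubelet arguments before starting it.
if grep -q "nameserver 127.0.0.53" /etc/resolv.conf; then
    set -- "$@" --resolv-conf=/run/systemd/resolve/resolv.conf
fi
exec "$@"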
Also, if this were the issue, DNS would not work at all rather than fail sporadically. If you have DNS issues, please create an issue at https://github.com/rancher/rancher/issues/new and describe your setup and exactly what errors you see. We have troubleshooting steps available at https://rancher.com/docs/rancher/v2.x/en/troubleshooting/ that might help as a first attempt to narrow down the cause.
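As a quick first check (a generic example, not specific to your setup), running a lookup from inside a pod usually tells you whether cluster DNS works at all:
# Run a throwaway pod and resolve the in-cluster kubernetes service;
# busybox:1.28 ships an nslookup that behaves well with cluster DNS.
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default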
Thanks for the reply.
Are you saying my assumptions are correct, namely that if I see the kubelet arg "--resolv-conf=/etc/resolv.conf" on an Ubuntu 18.04 host that’s a member of a cluster, then Rancher isn’t properly detecting and/or accommodating systemd-resolved when it launches kubelet?
Your point that if this were the issue, DNS would be failing consistently is an interesting one; I had the same thought. The challenge has been that I’m not seeing this problem consistently: most nodes behave properly, but on occasion I get nodes with chronic DNS issues from within Kubernetes deployments (the containers within the pods). I’ll try to pull together good data and repro steps to write up an issue.
The detection can be checked using:
ps -ef | grep kubelet | grep resolv-conf
which should result in a line ending in:
--resolv-conf=/etc/resolv.conf --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --cgroup-driver=cgroupfs --resolv-conf=/run/systemd/resolve/resolv.conf
where you can see the added --resolv-conf=/run/systemd/resolve/resolv.conf at the end, which is appended when systemd-resolved is detected (the later --resolv-conf value is the one kubelet uses).
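To confirm on the host side whether systemd-resolved is actually in play (a generic check, not specific to RKE), you can compare the stub resolv.conf with the upstream one:
# On Ubuntu 18.04, systemd-resolved is normally active and /etc/resolv.conf
# points at the 127.0.0.53 stub; the real upstream nameservers live in
# /run/systemd/resolve/resolv.conf, which is what kubelet should be handed.
systemctl is-active systemd-resolved
grep nameserver /etc/resolv.conf
grep nameserver /run/systemd/resolve/resolv.conf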