DNS broken for single workload: can't resolve '(null)'


While upgrading my stacks from Rancher 1.6 to 2.1, I ran into a strange issue.

I deployed nextcloud:13-fpm-alpine as a workload with a sidecar for the cron job, plus all the other services needed (Redis, MariaDB, …), and everything looked good.

However, external DNS is broken inside the Nextcloud container only. The Redis and MariaDB containers resolve rancher.com just fine, but the Nextcloud container and its sidecar fail to resolve anything that isn’t a discoverable service.

The output inside of the Nextcloud container looks like this:

/var/www/html # nslookup rancher.com
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'rancher.com': Try again

Removing options ndots:5 from resolv.conf produces a result, but the lookup takes 5-10 seconds:

/var/www/html # nslookup rancher.com
nslookup: can't resolve '(null)': Name does not resolve

Name:      rancher.com
Address 1:
Address 2:
Address 3: 2606:4700:20::6818:1033
Address 4: 2606:4700:20::6818:1133

This is the only workload where this happens. The differences are that it has a sidecar and that it uses the Nextcloud image. The same stack works fine in Rancher 1.6, and every other workload in this project/namespace resolves names correctly.

Any ideas what could cause this? Does anybody else see this with the image nextcloud:13-fpm-alpine?

Switching to nextcloud:13-fpm solves the issue. Any idea why?

Alpine uses a different libc (musl) than most other base images. Its DNS resolver behaves differently from the usual one, and a number of features are not implemented.

A delay of 5-10 seconds suggests one or more timeouts while trying to contact a resolver, though.
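To make the ndots:5 setting mentioned above a bit more concrete, here is a rough sketch of how a stub resolver typically expands a name before querying. It is a simplified model, not the actual musl or glibc code; the function name candidate_names and the search list are made up for illustration (the real list comes from the pod's resolv.conf). A name with fewer dots than ndots is tried against every search domain first and only then as-is, so rancher.com turns into four candidate queries, and each one that goes unanswered can add its own timeout.

```python
def candidate_names(name, search, ndots):
    """Return the query names a typical stub resolver would try, in order.

    Simplified model: real resolvers differ in details.
    """
    # A trailing dot marks the name as fully qualified: no expansion.
    if name.endswith("."):
        return [name]

    absolute = name + "."
    expanded = [f"{name}.{domain}" for domain in search]

    if name.count(".") >= ndots:
        # glibc-style order: try the name as-is first. Some resolvers
        # (musl, as far as I know) skip the search list entirely here.
        return [absolute] + expanded

    # Fewer dots than ndots: search domains first, the bare name last.
    return expanded + [absolute]


# Hypothetical search list, modeled on a pod in the "default" namespace.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("rancher.com", search, ndots=5):
    print(candidate)
```

With the default ndots of 1, the same name would be tried as rancher.com. first, which may be why removing the option changes the behavior.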

I just tried more Alpine containers, and that is indeed the cause.

Is there a way to resolve this? Since there is just a single nameserver, the only timeout that could happen is against the Kubernetes internal DNS, unless Alpine uses something outside of resolv.conf for DNS resolution.
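One way to double-check the "single nameserver" assumption is to dump what resolv.conf actually contains in the affected container and compare it with a working one. The following is only a minimal sketch: parse_resolv_conf is a hypothetical helper, the sample contents (including the 10.43.0.10 address) are invented for illustration, and it only handles the three keywords discussed in this thread.

```python
def parse_resolv_conf(text):
    """Collect nameserver, search and options entries from resolv.conf text."""
    conf = {"nameserver": [], "search": [], "options": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            conf["nameserver"].append(values[0])
        elif keyword == "search":
            # When several search lines exist, the last one wins.
            conf["search"] = values
        elif keyword == "options":
            conf["options"].extend(values)
    return conf


# Invented sample contents; not taken from the cluster in question.
sample = """\
nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

conf = parse_resolv_conf(sample)
print(conf["nameserver"])
print(conf["search"])
print(conf["options"])
```

Against a real file, the call would be parse_resolv_conf(open("/etc/resolv.conf").read()). If both containers show the same single nameserver and search list, the difference most likely lies in how each libc uses those values rather than in the file itself.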