Ingresses, External DNS and fault tolerance


With Rancher 1.6 we were using the external-dns route53 service to sync the IPs of our load-balancers with our DNS records stored on AWS. It worked well: for instance, if one of the nodes running a load-balancer had a critical failure, its IP would automatically be removed from the external DNS records and clients would stop trying to reach it.

We are trying to do the same kind of thing on Rancher 2.x. We tried using kubernetes-incubator/external-dns. It works (which is great already :p), but when a node fails its IP stays listed in the DNS records anyway.
I guess kubernetes-incubator/external-dns doesn’t do any health checking; it just reads the ingress configuration and external IPs from Kubernetes and syncs them, nothing more.
Since the failed node’s IP remains in our DNS records, clients will keep trying to connect to it.

Are we doing something wrong with kubernetes-incubator/external-dns?
Or is there another way to achieve the same goal: automatically removing a node’s external IP from the DNS records when the node is unresponsive or in a failed state?

Reading the docs, it seems that external/cloud load-balancers are the way to go. Should we go that route and stop spending time on DNS record synchronization? Are most Rancher 2.x deployments configured that way?


I’m talking to myself here, but maybe this can help someone else:
In the end we went with the ‘health check’ option on Route53. Route53 checks the /healthz URL on our nodes every 10 seconds, and if the result is not HTTP 200, the failed node’s IP address is removed from the DNS entries. It then takes up to another minute for clients to stop trying to reach the failed node, since that’s the TTL of our DNS entries.
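For anyone wanting to script this instead of clicking through the console, the record/health-check association can be sketched with boto3. This is just an illustration of the approach, not our actual setup: every name, IP and health-check ID below is hypothetical, and it assumes multivalue-answer routing (one A record set per node, each tied to its own health check) so Route53 withdraws an IP from answers when its check fails.

```python
# Sketch: one Route53 multivalue A record per node, tied to a health check.
# All names, IPs and IDs are made up for illustration.

def healthz_record_change(name, node_ip, health_check_id, ttl=60):
    """Build a Route53 UPSERT that withdraws node_ip from DNS answers
    whenever its associated health check (e.g. HTTP /healthz) fails."""
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            "SetIdentifier": node_ip,     # one record set per node
            "MultiValueAnswer": True,     # Route53 drops unhealthy answers
            "TTL": ttl,                   # clients forget the IP within ~1 min
            "ResourceRecords": [{"Value": node_ip}],
            "HealthCheckId": health_check_id,
        },
    }

change = healthz_record_change("app.example.com.", "203.0.113.10", "hc-node-1")
# Applying it would look like (requires boto3 and AWS credentials):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="ZEXAMPLE123",
#     ChangeBatch={"Changes": [change]},
# )
```

The health checks themselves are plain HTTP checks on /healthz with the 10-second request interval (the fastest Route53 supports); they can be created the same way via `create_health_check`.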