Service discovery stops working after upgrading a pod?!

Hi,

We recently upgraded from Rancher 1.6 (Cattle) to Rancher 2.1.5 and are now facing issues with Kubernetes service discovery. Unfortunately, I do not know what we did wrong or why this happens.

Our current setup consists of several web server pods, all writing to two replicating database pods. The web servers reach these databases through a service discovery record. Everything works fine until I upgrade one of the database pods. After that, none of the web server pods can resolve the upgraded database pod, even though it is fully up and working.
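One way to check what the service discovery record actually returns from inside a web server pod, before and after the database upgrade, is a small lookup loop. This is a minimal Python sketch; the service name passed to it (e.g. `database`) is a hypothetical placeholder for the real record name. It only shows what the pod's resolver returns at that moment, not which address an already-open database connection or connection pool is still using.

```python
import socket
import time


def resolve(name: str) -> list[str]:
    """Return the sorted, de-duplicated IP addresses `name` currently resolves to."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # info[4] is the sockaddr tuple; its first element is the IP address string.
    return sorted({info[4][0] for info in infos})


def watch(name: str, rounds: int = 12, interval: float = 5.0) -> None:
    """Print the resolved addresses every `interval` seconds, `rounds` times."""
    for _ in range(rounds):
        stamp = time.strftime("%H:%M:%S")
        try:
            print(stamp, resolve(name))
        except socket.gaierror as err:
            # Lookup failures are printed instead of aborting the loop.
            print(stamp, "lookup failed:", err)
        time.sleep(interval)
```

For example, running `watch("database", rounds=12, interval=5.0)` in a web server pod while one database pod is being upgraded would show whether the addresses behind the record change, which helps tell a DNS problem apart from the application holding on to an old address.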

The only way I can get it working again is to upgrade and restart all the web server pods as well. That is far from ideal, because it means several minutes of downtime for us. What are we doing wrong?

In Rancher 1.6, everything worked as expected with zero downtime: every container could reach the newly upgraded database container directly, without having to be upgraded itself.

Thank you in advance.