I have a deployment containing 3 pods of my microservice, which hosts a web API. Scaling this deployment is currently giving me very weird behavior, and I cannot work out what is causing it, but I believe it has only been occurring since an upgrade in the last few months.
When I scale the deployment up, it creates a new 4th pod, which then fails when resolving DNS names and calling other endpoints. If I scale back down to 3 and then scale back up to 4 BEFORE the old 4th pod has been fully removed, the new 4th pod works completely fine. However, if I wait for the old 4th pod to be removed before creating a new 4th pod, the new one exhibits the same odd networking behavior.
It also sometimes seems to get into a state where the pods ONLY work while there is a pod in the ‘removing’ state. Once that pod has been removed and is gone from the list, the network problems occur again.
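One way to pin down the DNS part of the symptom is to run a small resolution check inside both a working pod and a failing pod and compare the results. The sketch below is only an illustration: it assumes Python 3 is available in the container image, and the hostnames in it are placeholders that should be replaced with the names the service actually calls.

```python
import socket
import time


def check_dns(hostnames, resolve=socket.getaddrinfo):
    """Try to resolve each hostname.

    Returns {hostname: (ok, detail, seconds)}, where detail is a sorted list
    of addresses on success or the error message on failure.
    """
    results = {}
    for name in hostnames:
        start = time.monotonic()
        try:
            infos = resolve(name, None)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the resolved IP address.
            addresses = sorted({info[4][0] for info in infos})
            results[name] = (True, addresses, time.monotonic() - start)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[name] = (False, str(exc), time.monotonic() - start)
    return results


if __name__ == "__main__":
    # Placeholder names (assumptions): substitute the real dependencies.
    names = ["kubernetes.default.svc.cluster.local", "example.com"]
    for name, (ok, detail, seconds) in check_dns(names).items():
        status = "OK  " if ok else "FAIL"
        print(f"{status} {name} {detail} ({seconds:.3f}s)")
```

Running this in the new 4th pod both while an old pod is still being removed and after it has gone would show whether resolution itself fails or only becomes slow, and whether internal cluster names and external names behave differently.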
Does anyone know any useful tips on diagnosing this issue?