I have noticed that while upgrading, all containers still seem to be part of the load balancer's target list.
For example, if I hammer my load-balanced service with HTTP GET requests, there is sometimes huge latency when the load balancer hits a container that is currently being upgraded.
Even with a big batch interval for the upgrade, whenever a specific instance is being upgraded and the balancer hits it, there is some latency.
Wouldn't it be possible to have Rancher notify the load balancer before it upgrades a container?
For example, with 3 instances of the container: if instance #1 is about to be upgraded, the load balancer could first remove it from its target list, thus eliminating any quirky latency or bad requests, and add it back to the target list once the container is healthy again.
I might have misinterpreted some of the behavior, but what I observed above sure looks like how things currently work.
Even if you add health checks to a service, those checks are not fine-grained enough to ensure complete responsiveness during an upgrade.
Is this the current state or am I misunderstanding something?
I have done some benchmarking on this topic.
I used an HTTP service running at scale 3 with a load balancer in front of it, and then tried upgrading it with different settings while hitting it with a tool that logs each request's response time (roughly the kind of loop sketched at the end of this post):
Here is a latency graph from the run with "start before stopping" enabled.
In the latter case, there are 3 spikes of 5000 milliseconds each while the upgrade runs.
This behavior is consistent between runs, so these are not one-off issues; the same thing happens every time I run each scenario.
Is this due to start-up overhead in my services, or is the load balancer hitting containers that are being shut down?
Is there anything I can do to mitigate this?
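For reference, here is the kind of request-timing loop I mean. This is only a minimal sketch: the URL is a placeholder for the load balancer endpoint, and the exact tool does not matter.

```bash
# Hammer the load-balanced endpoint and append the total time of each
# request (in seconds) to a log file.
# http://lb.example.com/ is a placeholder for the real LB address.
while true; do
  curl -o /dev/null -s -w '%{time_total}\n' http://lb.example.com/ >> latency.log
  sleep 0.1
done
```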
@rogeralsing - I had something similar - is your service actually ready to process requests when it shows as Running in Rancher? In my case it was a Spring-based Java application. The container was up and running, but Tomcat was still starting up, deploying the Spring application, doing the initial Spring bootstrap, etc. That would take 15 seconds after the container was marked as Running, so the load balancer would start sending traffic to it too early. I solved that by adding a health check. Now the container stays in the Initializing state until the web application has actually fully started and responds to web requests, and the load balancer only sends it traffic once it's done initializing.
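For reference, the health check in our rancher-compose.yml looks roughly like this; the service name, port, path and thresholds below are only illustrative, not the exact values we run with:

```yaml
# rancher-compose.yml (excerpt) - illustrative values only
web:
  scale: 3
  health_check:
    port: 8080                          # port Tomcat listens on inside the container
    request_line: GET /health HTTP/1.0  # endpoint that only answers once the Spring app is fully up
    interval: 2000                      # ms between checks
    response_timeout: 2000              # ms before a check counts as failed
    healthy_threshold: 2                # consecutive successes before the container is healthy
    unhealthy_threshold: 3              # consecutive failures before it is unhealthy
    strategy: recreate                  # what Rancher does with containers that stay unhealthy
```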
I have tested this using a simple nginx & php-fpm stack (4 containers, 2 per host) behind the Rancher internal load balancer. If I run something like apachebench (ab) and perform a batched upgrade (1 container at a time), it will indeed close connections that are already open (i.e. it does not seem to inform HAProxy that a container is shutting down)…
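For what it's worth, the load during the upgrade was generated with something along these lines; the URL and request counts are placeholders, not the exact values from my test:

```bash
# Keep sustained, concurrent load on the Rancher LB while the upgrade
# batch runs; dropped/reset connections show up as "Failed requests"
# in ab's summary.
ab -n 20000 -c 10 http://lb.example.com/
```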