LB-HAproxy loses restarted containers


#1

GitHub issue: https://github.com/rancher/rancher/issues/17964
Hi! I found this problem with rancher/lb-service-haproxy:v0.9.3 and v0.9.6, tested on Rancher 1.6.20 and 1.6.26.
When containers in a service are stopped, a drain timeout applies, and the LB stops sending new connections to the container that is being stopped. But once the container is started again, the LB configuration is not reloaded, and the running container is skipped for new incoming connections. If all containers in the service are restarted, then from the LB's perspective no container is left that can accept new connections, and the service is left in a zombie state: alive but dead at the same time.

I noticed that the LB does sometimes reload its config if there is a small delay between the container being stopped and being started again. However, this happens very infrequently.


#2

Logs showing the behaviour that was present in the 0.7.15 or 0.7.17 LBs:
2/6/2019 8:50:28 AM time="2019-02-06T06:50:28Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: 466b4379-5eef-4171-9ad8-07e1aafbecda, resourceID 1i349872"
2/6/2019 8:50:28 AM time="2019-02-06T06:50:28Z" level=info msg="[Endpoint IP: 10.42.162.23, name: 1516362c18a417e74e830f2313719dec917ffae6] Result: Drain not needed"
2/6/2019 8:50:28 AM time="2019-02-06T06:50:28Z" level=info msg="New reply: &client.Publish{Resource:client.Resource{Id:…
2/6/2019 8:50:29 AM time="2019-02-06T06:50:29Z" level=info msg=" – reloading haproxy config with the new config changes\n * Reloading haproxy haproxy…

Here is the log record that shows the config being reloaded: " – reloading haproxy config with the new config changes"
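
For anyone who wants to check a saved LB log for the same pattern, here is a minimal Python sketch. It only relies on the two message texts quoted in this thread ("Received target.drain IP: ..." and "reloading haproxy config"). The function name `drains_without_reload` and the 30-second window are assumptions made for illustration, not anything defined by Rancher or HAProxy, and the sketch treats "a reload line shortly after a drain event" as the healthy case only because that is what the 0.7.x log above shows.

```python
import re
from datetime import datetime, timedelta

# Patterns based only on the log lines quoted in this thread.
# \W? tolerates either a straight or a curly quote after "time=".
TIME_RE = re.compile(r'time=\W?(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z')
DRAIN_RE = re.compile(r'Received target\.drain IP: (\d+\.\d+\.\d+\.\d+)')
RELOAD_MARKER = 'reloading haproxy config'


def drains_without_reload(lines, window_seconds=30):
    """Return (timestamp, ip) for each drain event with no reload after it.

    window_seconds is an arbitrary assumption, not a Rancher setting.
    """
    drains = []   # (timestamp, ip) of every target.drain event
    reloads = []  # timestamps of every haproxy reload message
    for line in lines:
        m = TIME_RE.search(line)
        if not m:
            continue  # skip lines without a parsable time= field
        ts = datetime.strptime(m.group(1), '%Y-%m-%dT%H:%M:%S')
        d = DRAIN_RE.search(line)
        if d:
            drains.append((ts, d.group(1)))
        elif RELOAD_MARKER in line:
            reloads.append(ts)

    window = timedelta(seconds=window_seconds)
    # Keep only drains that have no reload inside [ts, ts + window].
    return [(ts, ip) for ts, ip in drains
            if not any(ts <= r <= ts + window for r in reloads)]
```

Calling `drains_without_reload(open("lb.log"))` on a saved log (the file name is hypothetical) returns one entry per drain event that was not followed by a reload, so an empty list would correspond to the 0.7.x behaviour.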


#3

Here is the output from LB v0.9.3.
I restarted the same container 2 times; no reload happened in either case, and the container is not accessible through the LB.
Restart 1:
2/6/2019 8:51:01 AM time="2019-02-06T06:51:01Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: b1d1ce0c-3d3d-44d0-bfa7-e17346653891, resourceID 1i349872"
2/6/2019 8:51:01 AM time="2019-02-06T06:51:01Z" level=info msg="[Endpoint IP: 10.42.162.23, name: 1516362c18a417e74e830f2313719dec917ffae6] Result: Drain not needed"
2/6/2019 8:51:01 AM time="2019-02-06T06:51:01Z" level=info msg="New reply: &client.Publish{R"
Restart 2:
2/6/2019 8:55:43 AM time="2019-02-06T06:55:43Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: 7e5a2904-9771-48cf-a48b-c921c0174c84, resourceID 1i349872"
2/6/2019 8:55:43 AM time="2019-02-06T06:55:43Z" level=info msg="[Endpoint IP: 10.42.162.23, name: 1516362c18a417e74e830f2313719dec917ffae6] Result: Drain not needed"
2/6/2019 8:55:43 AM time="2019-02-06T06:55:43Z
Restart 3 (starting):
2/6/2019 8:56:10 AM time="2019-02-06T06:56:10Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: d3b89076-beee-4237-958b-d41f0d3a6de5, resourceID 1i349872"