LB-HAproxy loses restarted containers

Fermiy · February 8, 2019, 8:18am

GitHub issue: https://github.com/rancher/rancher/issues/17964
Hi! I found a problem with rancher/lb-service-haproxy v0.9.3 and v0.9.6 (tested on Rancher 1.6.20 and 1.6.26).
When containers in a service are stopped, a drain timeout applies and the LB stops sending new connections to the container that is being stopped. But once the container is started again, the LB configuration is not reloaded, and the running container is skipped for new incoming connections. If all containers in the service were restarted, there is no container left that can accept new connections from the LB's perspective. The service is left in a zombie state: it is alive but dead at the same time.

I noticed that the LB config reload does sometimes happen if there is a small delay between the container being stopped and being started again. However, this happens very infrequently.

Fermiy · February 8, 2019, 8:20am

Logs showing the behaviour that was present in the 0.7.15 or 0.7.17 LBs:
2/6/2019 8:50:28 AM time="2019-02-06T06:50:28Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: 466b4379-5eef-4171-9ad8-07e1aafbecda, resourceID 1i349872"
2/6/2019 8:50:28 AM time="2019-02-06T06:50:28Z" level=info msg="[Endpoint IP: 10.42.162.23, name: 1516362c18a417e74e830f2313719dec917ffae6] Result: Drain not needed"
2/6/2019 8:50:28 AM time="2019-02-06T06:50:28Z" level=info msg="New reply: &client.Publish{Resource:client.Resource{Id:…
2/6/2019 8:50:29 AM time="2019-02-06T06:50:29Z" level=info msg=" – reloading haproxy config with the new config changes\n * Reloading haproxy haproxy…

Here is the log record that shows the config being reloaded: "– reloading haproxy config with the new config changes"

Fermiy · February 8, 2019, 8:24am

Here is the output from LB v0.9.3:
I restarted the same container 2 times; no reload happened in either case, and the container is not accessible from the LB.
1st restart
2/6/2019 8:51:01 AM time="2019-02-06T06:51:01Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: b1d1ce0c-3d3d-44d0-bfa7-e17346653891, resourceID 1i349872"
2/6/2019 8:51:01 AM time="2019-02-06T06:51:01Z" level=info msg="[Endpoint IP: 10.42.162.23, name: 1516362c18a417e74e830f2313719dec917ffae6] Result: Drain not needed"
2/6/2019 8:51:01 AM time="2019-02-06T06:51:01Z" level=info msg="New reply: &client.Publish{R"
2nd restart
2/6/2019 8:55:43 AM time="2019-02-06T06:55:43Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: 7e5a2904-9771-48cf-a48b-c921c0174c84, resourceID 1i349872"
2/6/2019 8:55:43 AM time="2019-02-06T06:55:43Z" level=info msg="[Endpoint IP: 10.42.162.23, name: 1516362c18a417e74e830f2313719dec917ffae6] Result: Drain not needed"
2/6/2019 8:55:43 AM time="2019-02-06T06:55:43Z
3rd restart starting
2/6/2019 8:56:10 AM time="2019-02-06T06:56:10Z" level=info msg="Received target.drain IP: 10.42.162.23, drainTimeout: 10000, eventID: d3b89076-beee-4237-958b-d41f0d3a6de5, resourceID 1i349872"
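
For anyone who wants to check their own LB logs for the same pattern, here is a minimal Python sketch. It assumes the log lines look like the excerpts in this thread: a `time="..."` field, a `Received target.drain IP: ...` message, and a `reloading haproxy config` message when a reload happens. The function name `drains_without_reload` and the 5-second window are my own assumptions for illustration, not anything taken from Rancher; in the 0.7.x excerpt the reload came about 1 second after the drain event.

```python
import re
from datetime import datetime, timedelta

# Timestamp inside time="..."; accepts straight or curly quotes,
# since pasted forum logs often have the quotes converted.
TIME_RE = re.compile(r'time=["“”](\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z')
DRAIN_RE = re.compile(r'Received target\.drain IP: ([\d.]+)')
RELOAD_MARKER = "reloading haproxy config"


def drains_without_reload(lines, window_seconds=5):
    """Return (timestamp, ip) for each target.drain event that is not
    followed by a reload line within window_seconds (assumed window)."""
    drains = []   # (timestamp, ip) of every drain event seen
    reloads = []  # timestamps of every reload line seen
    for line in lines:
        m = TIME_RE.search(line)
        if not m:
            continue  # skip lines without a parseable timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S")
        d = DRAIN_RE.search(line)
        if d:
            drains.append((ts, d.group(1)))
        elif RELOAD_MARKER in line:
            reloads.append(ts)
    window = timedelta(seconds=window_seconds)
    return [(ts, ip) for ts, ip in drains
            if not any(ts <= r <= ts + window for r in reloads)]
```

To use it, save the LB container's log output to a file and pass the lines in, e.g. `drains_without_reload(open("lb.log", encoding="utf-8"))` (the file name `lb.log` is just a placeholder). Every entry it returns is a drain event with no reload seen shortly afterwards, which matches what the v0.9.3 output above shows.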
