AWS HA Rancher 1.6 Server ASG shows disconnected Rancher hosts after termination

After building a new HA Rancher v1.6.14 cluster and adding K8s hosts (with plane isolation), we terminated one of the Rancher server nodes in the ASG in order to roll out a new AMI. The replacement Rancher server came up fine, but the K8s hosts were listed in a disconnected state.

The replacement Rancher server nodes have new IP addresses, and the docker logs of the Rancher server container show the expected "Cluster membership changed" message. However, the K8s hosts remain disconnected in the GUI, even though it shows some containers running on them.

After rebooting the K8s hosts, they show up as connected, though the K8s cluster performs in a degraded state (kubectl timeouts). Interestingly, rancher-kubernetes-agent remains in an unhealthy state, and we see the following in the Rancher server container logs:

2018-03-01 00:24:22,720 ERROR [:] [] [] [] [TaskScheduler-1] [ ] Failed to get ping from agent [2] count [3]

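The failing agent id can be pulled out of those lines and matched against the host list in the Rancher API. A small sketch against the log line above (in practice, pipe `docker logs <server-container> 2>&1` through the same grep/sed on the server node; the container name depends on your setup):

```shell
# Example Rancher server log line showing an agent ping failure:
line='2018-03-01 00:24:22,720 ERROR [:] [] [] [] [TaskScheduler-1] [ ] Failed to get ping from agent [2] count [3]'

# Extract the agent id so it can be matched to a host in the Rancher API.
# In practice: docker logs <server-container> 2>&1 | grep 'Failed to get ping from agent'
agent_id=$(printf '%s\n' "$line" | sed -n 's/.*from agent \[\([0-9]*\)\].*/\1/p')
echo "$agent_id"   # → 2
```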
From the K8s nodes, after the new Rancher server host comes online, the docker ps -a output is even more interesting:

CONTAINER ID        IMAGE                            COMMAND                  CREATED             STATUS                    PORTS               NAMES
5e6b23bfbf34        rancher/k8s:v1.7.7-rancher1      "/usr/bin/ ku"   4 hours ago         Dead                                          r-kubernetes-kubelet-3-c6e2b297
a0367533999e        rancher/k8s:v1.7.7-rancher1      "/usr/bin/ ku"   42 hours ago        Up 16 hours                                   r-kubernetes-proxy-8-b8736b50
b71180afff6b        rancher/healthcheck:v0.3.3       "/.r/r /rancher-entry"   42 hours ago        Up 42 hours                                   r-healthcheck-healthcheck-8-17e4d795
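The Dead kubelet container normally has to be removed before the agent can bring up a replacement. A sketch of filtering a listing like the one above down to Dead container ids (the command in the comment is what you would actually run on the host; a sample of the listing is embedded here for illustration):

```shell
# In practice, on the K8s host:
#   docker ps -a --filter status=dead -q | xargs -r docker rm
# Below, the same filtering shown against a sample of the listing above.
ps_out='5e6b23bfbf34  rancher/k8s:v1.7.7-rancher1  Dead         r-kubernetes-kubelet-3-c6e2b297
a0367533999e  rancher/k8s:v1.7.7-rancher1  Up 16 hours  r-kubernetes-proxy-8-b8736b50'

printf '%s\n' "$ps_out" | awk '$3 == "Dead" {print $1}'   # → 5e6b23bfbf34
```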

Afterward, I tested an in-place upgrade of the Rancher server (same --advertise-address IP) and node agents, following the upgrade documentation. Again, I experienced a similar issue: I was able to get K8s running on the cluster, but it showed degraded performance.

However, after building a completely new Rancher server with a clean database, everything works as expected. I checked the AWS ELB: we are using a Classic load balancer with no stickiness. So far I have been unable to do a complete in-place upgrade, or replace the Rancher server nodes, without affecting the stability of the cluster.