I am running several Rancher clusters on the latest stable CoreOS (currently 1353.7.0) on bare metal, with Rancher v1.5.6 on 2-4 nodes. All systems run stably throughout the day, but by the morning all services in all stacks (user and infrastructure) are no longer healthy and restart continuously. Only rebooting all nodes in parallel resolves the situation.
What I observed is that on the master node (the one running the Rancher server), the IP address changes during the night from the physical address (172.30.14.52) to a virtual IP address (172.17.0.1). From that point on, the healthcheck and other services seem to lose their connection and restart continuously. All other agent nodes keep their physical address, as shown on the Infrastructure → Hosts page.
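For context, 172.17.0.1 is the gateway address of Docker's default bridge network (docker0, 172.17.0.0/16), not an address on the physical network. Below is a minimal Python sketch that flags such an address. The helper name is hypothetical, and it assumes the default bridge subnet has not been reconfigured on the host.

```python
import ipaddress

# Assumption: Docker's default bridge (docker0) uses 172.17.0.0/16.
# If the daemon was started with a custom bridge IP, adjust this value.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def is_docker_bridge_address(ip: str) -> bool:
    """Return True if `ip` lies inside Docker's default bridge subnet."""
    return ipaddress.ip_address(ip) in DOCKER_DEFAULT_BRIDGE


# The two addresses seen for the master node on the Hosts page.
for ip in ("172.30.14.52", "172.17.0.1"):
    kind = "docker bridge" if is_docker_bridge_address(ip) else "not docker bridge"
    print(f"{ip}: {kind}")
```

A check like this could be run against the host IP the server reports, to confirm that the master node is being registered with the bridge address rather than the physical one.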
Our first Rancher system, still running v1.4.1 with 3 nodes, has been stable for weeks!