I’m using Rancher on AWS. Right now I have three m4.large instances: two in one AZ and one in another. Most of my services use global scheduling, and I’ve found that memory usage on the instance running Rancher server is higher than I’d like compared to the other two nodes. I’m thinking about how best to rearchitect things so that the server container doesn’t affect the hosts serving my containers. Should I scale that instance up to a 2xlarge, or add a fourth node that runs only Rancher server? Advice welcome.
We use an HA setup for Rancher Server and completely separate hosts (agents) for running containers. This lets us run a somewhat smaller instance type for the Rancher Servers, since they are primarily responsible for API-driven processes. We add and remove hosts in environments as needed via Terraform and API calls to Rancher Server. Eventually we plan to scale automatically based on metrics, but for now it is semi-manual.
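To make the "scale based on metrics" idea concrete, here is a rough sketch of the sizing decision only. It is not our actual tooling and it does not call Rancher or Terraform. The function name, the 70% memory target, and the minimum of two hosts are all assumptions for illustration; the 8 GiB figure is the memory of an m4.large.

```python
import math


def desired_host_count(mem_used_gib, mem_per_host_gib=8.0,
                       target_utilization=0.7, min_hosts=2):
    """Return how many agent hosts are needed to keep average memory
    use at or below the target (hypothetical helper, illustration only).

    mem_used_gib: memory currently used on each agent host, in GiB.
    mem_per_host_gib: memory per host (8 GiB for an m4.large).
    target_utilization: assumed ceiling for average memory use (0..1).
    min_hosts: assumed floor so a small cluster is never scaled to one host.
    """
    total_used = sum(mem_used_gib)
    usable_per_host = mem_per_host_gib * target_utilization
    needed = math.ceil(total_used / usable_per_host)
    return max(min_hosts, needed)


if __name__ == "__main__":
    # Three agent hosts with made-up memory readings.
    print(desired_host_count([6.0, 5.5, 4.0]))
```

The result would then be fed into whatever adds or removes hosts (in our case Terraform), which is the part that stays semi-manual for now.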
We spread hosts across multiple AZs, but often use labels and scheduling rules to keep containers within a single AZ when network latency is more critical. If latency is more critical than that, you’re probably not running in containers anyway.
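As a rough illustration of the label idea (this is a toy model, not Rancher's scheduler or API), the sketch below filters hosts by label so a latency-sensitive service only lands on hosts in one AZ. The `az` label key, the host names, and the AZ names are hypothetical.

```python
def eligible_hosts(hosts, required_labels):
    """Return names of hosts whose labels match every required key/value.

    hosts: list of dicts with "name" and "labels" keys (hypothetical shape).
    required_labels: dict of label key -> value a host must carry.
    """
    return [
        host["name"]
        for host in hosts
        if all(host["labels"].get(key) == value
               for key, value in required_labels.items())
    ]


# Hypothetical inventory: two hosts in one AZ, one in another.
HOSTS = [
    {"name": "host-1", "labels": {"az": "us-east-1a"}},
    {"name": "host-2", "labels": {"az": "us-east-1a"}},
    {"name": "host-3", "labels": {"az": "us-east-1b"}},
]

if __name__ == "__main__":
    # Latency-sensitive service pinned to a single AZ.
    print(eligible_hosts(HOSTS, {"az": "us-east-1a"}))
    # No label requirement: every host is eligible.
    print(eligible_hosts(HOSTS, {}))
```

In practice you would put the label on the host when it registers and express the rule on the service, rather than filtering hosts yourself.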
I hope this helps.