I am testing Rancher out and am curious how far people have pushed it: how many nodes it can scale to, how many containers it can handle, and whether the network overlay ever has issues.
Basically, I’d like to know what breaks first and what bottlenecks exist at scale. Thanks.
The honest answer is we haven’t done a ton of load or scale testing lately, so there are likely pieces that could use improvement. There will probably be a good round of that towards the end of the year. But nobody I know of has come to us on fire yet.
There are definitely some limits at which things need to be scaled out or modified. For example, the managed networks are a /16 by default, and there is a connection per host using an ephemeral port on the API server.
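To put rough numbers on those two limits, here is a back-of-the-envelope sketch in Python. It is not taken from Rancher itself: the subnet `10.42.0.0/16` is only an illustrative /16, the port range is the usual Linux default for `net.ipv4.ip_local_port_range` (check your own server), and the helper `max_containers_per_host` is a made-up name that assumes every container takes exactly one overlay address.

```python
import ipaddress

# Assumption: an illustrative /16; the post only says "a /16 by default".
managed_net = ipaddress.ip_network("10.42.0.0/16")

# A /16 leaves 16 host bits, so 2**16 addresses in total.
total_addresses = managed_net.num_addresses
# Conventionally the network and broadcast addresses are not handed out.
usable_addresses = total_addresses - 2


def max_containers_per_host(hosts: int) -> int:
    """Hypothetical helper: average containers per host before the /16 runs out,
    assuming one overlay address per container and nothing else using the range."""
    return usable_addresses // hosts


# Assumption: common Linux default ephemeral port range (32768-60999).
# Read the real value with `sysctl net.ipv4.ip_local_port_range`.
EPHEMERAL_LOW, EPHEMERAL_HIGH = 32768, 60999
ephemeral_ports = EPHEMERAL_HIGH - EPHEMERAL_LOW + 1

if __name__ == "__main__":
    print("addresses in the /16:", total_addresses)
    print("usable addresses:", usable_addresses)
    print("containers per host at 200 hosts:", max_containers_per_host(200))
    print("ephemeral ports available:", ephemeral_ports)
```

Under these assumptions the address space, not the host count, is what you would hit first: a few hundred hosts at a few hundred containers each already fills a /16, while one connection per host stays far below the ephemeral port ceiling.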
In general, everything is designed to work out of the box with the one docker run ... rancher/server, which covers the needs of most people, and various bits can be swapped out and scaled independently for large or HA deployments. Over time those bits will become fully managed, so you can do something like click a button to enable HA or scale out.
The best public example is probably actually from before we had a real company, when it was still called Stampede: http://youtu.be/fmYqm7TC7GI (starting around 13m). The demo launches ~128,000 containers on ~200 hosts provided by DigitalOcean (around 45min). In between is a lot of relevant material from Darren talking about the architecture, and some screens of the primitive UI I put together for it in a few weekends.