I’ve set up a new Rancher cluster: three t2.small RancherOS nodes in AWS, with an ELB directing traffic to port 80 across the three nodes. Port 80 is served by the Rancher HAProxy LB, which is configured to direct requests to the application containers (WordPress, in my case).
I did some testing using SwitchyOmega in Chrome with a PAC file to route traffic for the same site to the ELB, to HAProxy, or directly to the container. I have the Chrome developer tools open to the timeline, and I manually refresh after adjusting the PAC file to switch the target, so I can see the load time for each path. I’m seeing some pretty unhappy results:
ELB - 4.1 min
HAProxy - 1.8 min
Container - 1.97 s
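For anyone who wants to reproduce this without the browser, here is a rough sketch of how the same comparison could be scripted. It only times the fetch of the HTML document, not the full page load that the Chrome timeline shows, so the absolute numbers will differ. The hostnames and the site name in `targets` are placeholders I made up, not my real endpoints, and `time_request` and `slowdown` are just illustrative helper names.

```python
import time
import urllib.request


def time_request(url, host_header, timeout=300):
    """Return the seconds taken to fetch the body at url while sending host_header.

    Overriding the Host header lets the same site be requested through
    different front ends, similar to what the PAC file does in the browser.
    """
    req = urllib.request.Request(url, headers={"Host": host_header})
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()  # pull the whole body so transfer time is included
    return time.perf_counter() - start


def slowdown(slow_seconds, fast_seconds):
    """Return how many times slower the slow path is than the fast path."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    site = "blog.example.com"  # hypothetical site name
    targets = {  # hypothetical endpoints, replace with your own
        "ELB": "http://my-elb.example.com/",
        "HAProxy": "http://rancher-node.example.com/",
        "Container": "http://container-host.example.com:8080/",
    }
    timings = {name: time_request(url, site) for name, url in targets.items()}
    for name, seconds in timings.items():
        ratio = slowdown(seconds, timings["Container"])
        print(f"{name}: {seconds:.2f}s ({ratio:.1f}x the direct container)")
```

Running each target a few times and taking the median would help rule out a single slow request skewing the comparison.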
I can accept that, since this is a testing setup, the ELB will underperform, as I know it scales behind the scenes to handle the kind of traffic that it’s getting. But a 54x speed difference between going through the HAProxy container and hitting the actual Nginx WordPress container directly? That’s amazingly bad. I had thought that the slow performance I was seeing was due to downsizing the new cluster to t2.small nodes, but I discussed it with vincent99 on IRC and confirmed that my credit balance was fine on the t2 nodes, so CPU shouldn’t be an issue. On my other stack, I have a virtual F5 EC2 instance that directs traffic straight to the containers, which bypasses both the ELB and HAProxy components of the new setup.
Any suggestions are welcome, as I would really like to be able to shut down that F5 instance and not have it show up on next month’s bill. But for right now, I’m thinking I need to keep it in place so that I can shut down the old m3.medium node that is on a Rancher 1.2.0 setup that seems to have choked during the upgrade. It’s several days later and I still see the “Upgrading environment” banner and other oddities.
I’m checking out Traefik now.