I don’t have much experience with large clusters, because we run small ones and don’t have much deployment churn.
However, I suspect m4.large may be larger than you need for the master nodes. “Too large” probably won’t cause anything to crash, but you might see substantial cost savings with smaller nodes. List price on those m4.large EC2 nodes is $159.48/month (less if you set up a prepaid option). We have been using t2.medium master nodes, which list at $36.51/month, and things run just fine. The t2.small would be $18.94/month, but we haven’t tried that and probably won’t, because it drops to a single vCPU.
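To put rough numbers on that, here is a minimal sketch of the arithmetic using the list prices quoted above. The count of three master nodes is purely an assumption for illustration (I don't know how many you run), and `monthly_savings` is just a hypothetical helper name, so adjust both to fit your setup.

```python
from decimal import Decimal

# Monthly list prices quoted above, in USD per node.
MONTHLY_LIST_PRICE = {
    "m4.large": Decimal("159.48"),
    "t2.medium": Decimal("36.51"),
    "t2.small": Decimal("18.94"),
}

# ASSUMPTION: three master nodes. Replace with your real count.
MASTER_COUNT = 3


def monthly_savings(current, candidate, node_count):
    """Total monthly list-price savings from swapping instance types."""
    per_node = MONTHLY_LIST_PRICE[current] - MONTHLY_LIST_PRICE[candidate]
    return per_node * node_count


if __name__ == "__main__":
    for candidate in ("t2.medium", "t2.small"):
        saved = monthly_savings("m4.large", candidate, MASTER_COUNT)
        print(f"m4.large -> {candidate}: ${saved}/month for {MASTER_COUNT} masters")
```

This only compares list prices. It ignores prepaid discounts and says nothing about whether the smaller instance can actually handle your load.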
You are still running rancher-v1.5.6 (2017-04-24). If you are using Kubernetes orchestration, that means you are also still on kubernetes-v1.5.4. Upgrading to rancher-v1.6.10 (2017-09-20) would also move you to kubernetes-v1.7.4. We made that upgrade and it has been very stable for us. We’re quite happy with it, but the upgrade did involve downtime and a rebuild, as we expected. We didn’t attempt the Docker upgrade because of the Kubernetes dependency.