Failing to Communicate with Kubernetes API server after Load Testing Cluster

I’m new to Rancher and Kubernetes. I ran a load test on a service running on my server, and it caused Rancher and my deployments running on the cluster to stop responding. After a period of time my deployments became accessible again and are running fine. I can also log in to Rancher, but Rancher cannot connect to the Kubernetes API server.

Error in Rancher:

This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready.

Failed to communicate with API server: Get net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Running Cluster on DigitalOcean. Nodes Deployed using DigitalOcean Node Driver.

Any suggestions on how to get my Kubernetes API server functioning again? It’s been a few hours since it went down, so I don’t think it’s going to spin back up on its own.


This starts with how your cluster was set up (what droplet sizes did you use for Rancher and for the cluster nodes, how many cluster nodes did you use, and what role did each node have?)

If the performance testing was done with pods that don’t have any resource limits, the nodes have probably been going down, and you can only survive that with the right number of nodes and role assignments.
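If you want to check whether the API server is actually dead on the node, you could SSH in and inspect the component containers directly. This is a sketch assuming an RKE-provisioned node (which the DigitalOcean node driver uses), where the Kubernetes components run as Docker containers with these names; your setup may differ:

```
# Is the kube-apiserver container running, exited, or crash-looping?
docker ps -a --filter name=kube-apiserver

# Check its recent logs for OOM kills or etcd connection errors
docker logs --tail 100 kube-apiserver

# etcd problems will also take the API server down
docker logs --tail 100 etcd

# Did the kernel OOM killer fire during the load test?
dmesg | grep -i oom

# If the containers are stopped, try restarting them
docker restart etcd kube-apiserver
```

If etcd was OOM-killed or its data got corrupted under load, restarting may not be enough and you’d be looking at restoring from an etcd snapshot.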

Thanks for your response!

The “cluster” is composed of a single 4GB (2 vCPUs) droplet. I do not have any resource limits on my pods.

It is a development setup, but I’m trying to learn from it. For the sake of learning, is there a way to salvage the cluster and bring it back?

In a production setup, I’m planning on having:

  • 1 - 4GB (2 vCPUs) droplet - etcd & Control Plane.
  • 2 - 4GB (2 vCPUs) droplets - etcd & Control Plane & Worker.
  • 2 - 4GB (2 vCPUs) droplets - Worker.

Is this a proper setup? Should I also have resource limits on my containers, even though certain containers have peak load times?

Are you running rancher/rancher on that single droplet and have the same node added to a cluster? If so, that will be close to running out of resources before the load test has even started.

I would stay away from mixing roles. In a smaller setup you can possibly mix etcd and controlplane, but keeping every role separate is the best option.

Yes, putting limits on your pods is good practice.
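As a concrete sketch, requests and limits go on each container in the pod spec; the names and values below are placeholders, not a recommendation:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example/app:latest   # placeholder image
          resources:
            requests:        # what the scheduler reserves on the node
              cpu: 250m
              memory: 256Mi
            limits:          # hard cap; exceeding the memory limit OOM-kills the container
              cpu: 500m
              memory: 512Mi
```

For containers with peak load times, a common approach is to set requests near the baseline usage and limits at the expected peak, so the scheduler can pack nodes sensibly while a runaway container still gets capped before it starves the node.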

rancher/rancher is running on a separate 4GB (2 vCPUs) droplet from the Kubernetes cluster.

What size droplet do you suggest for running etcd and controlplane together on a node? Would a 3-node cluster of 2GB (1 vCPU) droplets running etcd & Control Plane be a good option for a small setup? I can’t find any good info on proper hardware sizing for the roles.

For Example:

  • 3 - 2GB (1 vCPU) droplets - etcd & Control Plane.
  • 3 - 4GB (2 vCPUs) droplets - Worker.

Thanks so much for your help.

Nice, another quick question then.

I recently separated my etcd and control plane nodes from my workers to avoid the Kubernetes API being unavailable.

If I have a resource issue in the future that makes the cattle-system namespace inaccessible, what’s going to be the best way to kill off the containers that are causing the problems?
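The only approach I can think of so far (assuming SSH access to the worker node and an RKE setup where workloads run under Docker) is to act on the containers directly when the API is unreachable, but I’m not sure it’s the right one:

```
# Find the containers eating CPU/memory on the node
docker stats --no-stream

# Stop a specific runaway container by ID
docker stop <container-id>

# Or, if the API server still responds, delete the pod through kubectl
# (namespace and pod name are placeholders)
kubectl -n <namespace> delete pod <pod-name> --grace-period=0 --force
```

Is there a better way?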