Failing to Communicate with Kubernetes API server after Load Testing Cluster

calebmcelroy · September 9, 2018, 12:15am

I’m new to Rancher and Kubernetes. I ran a load test against a service running on my cluster, and it caused Rancher and the deployments running on the cluster to stop responding. After a period of time my deployments became accessible again and are running fine. I can also log in to Rancher, but Rancher cannot connect to the Kubernetes API server.

Error in Rancher:

This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready.

Failed to communicate with API server: Get https://206.81.13.161:6443/api/v1/componentstatuses: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

The cluster is running on DigitalOcean. The nodes were deployed using the DigitalOcean node driver.

Any suggestions on how to get my Kubernetes API server functioning again? It has been down for a few hours, so I don’t think it is going to come back up on its own.

Thanks!

superseb · September 10, 2018, 6:16am

This starts with how your cluster was set up: what droplet sizes did you use for Rancher and for the cluster nodes, how many cluster nodes did you use, and what role did each node have?

If the load test ran against pods without any resource limits, nodes have probably been going down, and you can only survive that if you have enough nodes with the right roles.

calebmcelroy · September 15, 2018, 12:46pm

Thanks for your response!

The “cluster” is composed of a single 4GB (2 vCPUs) droplet. I do not have any resource limits on my pods.

It is a development setup, but I’m trying to learn from it. For the sake of learning, is there a way to salvage the cluster and bring it back?

In a production setup, I’m planning on having:

1 - 4GB (2 vCPUs) droplet - etcd & Control Plane.
2 - 4GB (2 vCPUs) droplet - etcd & Control Plane & Worker.
2 - 4GB (2 vCPUs) droplet - Worker.

Is this a proper setup? Should I have resource limits on my containers as well, even though certain containers have peak load times?

superseb · September 17, 2018, 8:08am

Are you running rancher/rancher on that single droplet and have the same node added to a cluster? That alone would be close to running out of resources before the load test has even started.

I would stay away from mixing roles. In a smaller setup you can possibly combine etcd and controlplane, but keeping every role separate is the best option.
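As background for the etcd node counts proposed in this thread: etcd needs a strict majority of its members to be reachable in order to keep working, so the number of etcd nodes determines how many of them can be lost. The following is a minimal sketch of that arithmetic only. The function names are made up for illustration and it does not query a real cluster.

```python
def etcd_quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def etcd_fault_tolerance(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - etcd_quorum(members)


if __name__ == "__main__":
    # Compare a few cluster sizes, including the single node and the
    # three etcd nodes mentioned in this thread.
    for n in (1, 2, 3, 5):
        print(f"{n} etcd member(s): quorum {etcd_quorum(n)}, "
              f"tolerates {etcd_fault_tolerance(n)} failure(s)")
```

By this arithmetic a single etcd node tolerates no failures, two nodes are no better than one, and three nodes tolerate the loss of one.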

Putting limits on your pods is good practice, yes.
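To illustrate what putting limits on pods can look like, here is a hedged sketch that builds a Deployment manifest with a `resources` stanza and prints it as JSON. The workload name `web`, the image `nginx:1.15`, and all CPU and memory values are placeholder assumptions, not figures from this thread; suitable values depend on the actual workload and its peak load.

```python
import json


def container_with_limits(name, image, cpu_request, mem_request,
                          cpu_limit, mem_limit):
    """Return a container spec with Kubernetes resource requests and limits."""
    return {
        "name": name,
        "image": image,
        "resources": {
            # requests: what the scheduler reserves for the container
            "requests": {"cpu": cpu_request, "memory": mem_request},
            # limits: the ceiling the container is not allowed to exceed
            "limits": {"cpu": cpu_limit, "memory": mem_limit},
        },
    }


def deployment_manifest(name, container, replicas=1):
    """Wrap a single container spec in an apps/v1 Deployment manifest."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    container = container_with_limits(
        "web", "nginx:1.15", "100m", "128Mi", "500m", "256Mi")
    print(json.dumps(deployment_manifest("web", container), indent=2))
```

Since kubectl accepts JSON manifests as well as YAML, the output could be piped to `kubectl apply -f -`, assuming kubectl is configured for the target cluster.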

calebmcelroy · September 18, 2018, 12:31pm

rancher/rancher is running on a separate 4GB (2 vCPUs) droplet from the Kubernetes cluster.

What size droplet do you suggest for running etcd and controlplane together on a node? Would a 3 node cluster of 2GB (1 vCPU) droplets running etcd & control plane be a good option for a small setup? I can’t find any good info on proper hardware sizes for the roles.

For Example:

3 - 2GB (1 vCPU) droplet - etcd & Control Plane.
3 - 4GB (2 vCPUs) droplet - Worker.

Thanks so much for your help.

Josh_H · September 19, 2018, 3:55am

Nice, another quick question then.

I recently separated my etcd and control plane from my workers to avoid the Kubernetes API becoming unavailable.

If I had a resource issue in the future that made the cattle system inaccessible, what would be the best way to kill off the containers that are causing the problem?
