It all started a few days ago: for some reason my deployments weren’t available any more, so I had a look at the Rancher dashboard and it told me that Controller Manager and Scheduler were both unhealthy. So I first connected to the VM where both are running, and I noticed very high CPU usage: the docker process took about 300–400% of CPU. I decided to restart the VM, and a few minutes later everything was fine again.
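For anyone seeing the same thing, this is roughly how I spot it on the VM itself (generic Linux/procps commands, nothing Rancher-specific assumed):

```shell
# Show the CPU share of the docker daemon. The exact process name is an
# assumption -- on some installs it may be "docker" instead of "dockerd".
PID=$(pgrep -x dockerd | head -n1)
if [ -n "$PID" ]; then
  ps -o pid,pcpu,etime,comm -p "$PID"
else
  echo "dockerd not running"
fi
```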
But since that day, I experience almost daily problems with the cluster. After a few hours, the VM on which Controller Manager and Etcd are running gets high on CPU: the docker process again takes up 200–300% of CPU, and over time more and more runc processes are started. If I don’t restart the VM, a few hours later it becomes unusable because of the high CPU load. Furthermore, deployments are nearly impossible, because pods get stuck on a Creating container message and eventually all pods turn into Unknown status. The cluster overview in Rancher shows problems with one or both VMs most of the time.
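When it starts, I can watch the runc processes pile up with plain procps commands like the ones below (the kubectl lines are what I run from my workstation and are shown as comments only):

```shell
# Commands I use to see how far the problem has progressed:
#   kubectl get pods --all-namespaces | grep -v Running   # list stuck pods
#   kubectl get componentstatuses                         # scheduler / controller-manager health
RUNC_COUNT=$(ps -eo comm= | grep -c '^runc$' || true)
echo "runc processes: ${RUNC_COUNT}"
# Top five CPU consumers; dockerd usually sits at the top when it happens.
ps -eo pid,pcpu,comm --sort=-pcpu | head -n 6
```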
Both VMs have 16 GB of RAM and 4-core CPUs. I didn’t change anything in the cluster configuration over the last months, and the cluster has been running fine for almost two years now.
Has anyone else experienced these problems? Honestly, I don’t know what to do.
Kubernetes Version: v1.16.3
Rancher Version: 2.3.3
Docker Version: 18.09.6 (main VM), 19.03.8 (worker VM)
This is the recent event log: