Hi,
We’ve installed Rancher on a RHEL cluster, with the Rancher server on a machine separate from all the K8S nodes. We’ve successfully set up the cluster, deployed test workloads using nginx, and got the ingress controller working. The K8S nodes have all been running happily for several days. However, the machine running the Rancher server develops problems after a couple of days: the Rancher server UI becomes unreachable, and sshd stops responding to connection attempts. Later, sshd mysteriously started working again, but any Docker command I run on that machine hangs and never responds.
We’ve seen this behavior on two different RHEL VMs running the Rancher server. I want to think that nothing happening inside a Docker container could possibly impact OS-level resources like sshd, but this is looking like a mighty strong coincidence.
Has anyone else seen this?