Hi All,
I’m running Rancher v2.2.2 on a standalone VM with two Rancher-built on-prem clusters running Kubernetes v1.13.5. The VM runs Ubuntu 18.04.3 LTS with Docker v18.09.7, build 2d0083d.
The web UI regularly reports that it has lost connection to the clusters: each one shows a red “Unavailable” button, and clicking on a cluster shows a header reading “This cluster is currently Unavailable; areas that interact directly with it will not be available until the API is ready.”
The VM (4 cores) consistently runs at about 50% CPU utilization. I had to add RAM a few times because usage kept creeping up to the limit, which made me suspect a leak. I finally gave it 32GB, and it now seems to top out at 8GB used.
Running “docker logs rancher | grep -i error” on the VM shows a lot of this:
E0827 20:57:49.614914 6 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
E0827 20:57:49.615253 6 streamwatcher.go:109] Unable to decode an event from the watch stream: net/http: request canceled (Client.Timeout exceeded while reading body)
2019-08-27 20:57:55.622514 I | http: TLS handshake error from 127.0.0.1:46340: EOF
E0827 20:58:07.972251 6 request.go:853] Unexpected error when reading response body: &http.httpError{err:"net/http: request canceled (Client.Timeout exceeded while reading body)", timeout:true}
E0827 20:58:07.972411 6 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ConfigMap: Unexpected error &http.httpError{err:"net/http: request canceled (Client.Timeout exceeded while reading body)", timeout:true} when reading response body. Please retry.
2019/08/27 20:58:07 [ERROR] ClusterController c-9b748 [user-controllers-controller] failed with : failed to start user controllers for cluster c-9b748: timeout syncing controllers
2019/08/27 20:58:19 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.50:44508: i/o timeout
2019/08/27 20:58:19 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.51:50416: i/o timeout
2019/08/27 20:58:19 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.53:49654: i/o timeout
2019-08-27 20:58:20.605208 I | http: TLS handshake error from 127.0.0.1:46518: EOF
2019/08/27 20:58:19 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.52:35168: i/o timeout
2019/08/27 20:58:19 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.54:38440: i/o timeout
2019-08-27 20:58:28.998570 I | http: TLS handshake error from 127.0.0.1:46566: EOF
2019-08-27 20:58:28.998729 I | http: TLS handshake error from 127.0.0.1:46568: EOF
2019-08-27 20:58:29.001544 I | http: TLS handshake error from 127.0.0.1:46572: EOF
E0827 20:58:38.066194 6 request.go:853] Unexpected error when reading response body: &http.httpError{err:"net/http: request canceled (Client.Timeout exceeded while reading body)", timeout:true}
E0827 20:58:38.066366 6 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ConfigMap: Unexpected error &http.httpError{err:"net/http: request canceled (Client.Timeout exceeded while reading body)", timeout:true} when reading response body. Please retry.
2019/08/27 20:58:53 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.52:35268: i/o timeout
2019-08-27 20:59:02.449195 I | http: TLS handshake error from 127.0.0.1:46740: EOF
2019/08/27 20:58:53 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.51:50490: i/o timeout
2019-08-27 20:59:02.471820 I | http: TLS handshake error from 127.0.0.1:46756: EOF
2019-08-27 20:59:02.472121 I | http: TLS handshake error from 127.0.0.1:46758: EOF
2019/08/27 20:58:53 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.53:49764: i/o timeout
E0827 20:59:08.147494 6 request.go:853] Unexpected error when reading response body: &http.httpError{err:"context deadline exceeded (Client.Timeout exceeded while reading body)", timeout:true}
2019/08/27 20:58:53 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.54:38514: i/o timeout
E0827 20:59:08.147662 6 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v1.ConfigMap: Unexpected error &http.httpError{err:"context deadline exceeded (Client.Timeout exceeded while reading body)", timeout:true} when reading response body. Please retry.
2019/08/27 20:58:53 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.50:44578: i/o timeout
2019/08/27 20:59:08 [ERROR] ClusterController c-9b748 [user-controllers-controller] failed with : failed to start user controllers for cluster c-9b748: timeout syncing controllers
2019-08-27 20:59:13.932063 I | mvcc: store.index: compact 25150803
2019-08-27 20:59:13.938430 I | mvcc: finished scheduled compaction at 25150803 (took 4.060551ms)
2019/08/27 20:59:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.50:44656: i/o timeout
2019/08/27 20:59:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.54:38584: i/o timeout
2019/08/27 20:59:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.51:50560: i/o timeout
2019/08/27 20:59:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.52:35358: i/o timeout
2019/08/27 20:59:26 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.53:49866: i/o timeout
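In case it helps with triage, here is a minimal sketch in Python that tallies log lines like the ones above by message type, to show which kind of timeout dominates. The category names and the `tally` helper are my own hypothetical labels, not anything from Rancher, and the patterns are based only on the lines shown in this post.

```python
import re
from collections import Counter

# Hypothetical category labels; the patterns are taken from the log excerpt above.
# Order matters: the first pattern that matches a line wins.
PATTERNS = [
    ("remotedialer_io_timeout", re.compile(r"error in remotedialer server .*i/o timeout")),
    ("tls_handshake_eof", re.compile(r"TLS handshake error from .*: EOF")),
    ("controller_sync_timeout", re.compile(r"timeout syncing controllers")),
    ("list_failed", re.compile(r"Failed to list ")),
    ("watch_stream_decode", re.compile(r"Unable to decode an event from the watch stream")),
    ("response_body_timeout", re.compile(r"Unexpected error when reading response body")),
]


def tally(lines):
    """Count lines per category; anything unmatched is counted as 'other'."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS:
            if pattern.search(line):
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts


# A few lines copied from the excerpt above, used here as sample input.
sample = [
    "2019/08/27 20:58:19 [INFO] error in remotedialer server [400]: read tcp 172.17.0.2:443->140.107.117.50:44508: i/o timeout",
    "2019-08-27 20:58:20.605208 I | http: TLS handshake error from 127.0.0.1:46518: EOF",
    "2019/08/27 20:58:07 [ERROR] ClusterController c-9b748 [user-controllers-controller] failed with : failed to start user controllers for cluster c-9b748: timeout syncing controllers",
    "2019-08-27 20:59:13.932063 I | mvcc: store.index: compact 25150803",
]

for name, count in tally(sample).most_common():
    print(name, count)
```

To run it against the real output, `tally` could be fed `sys.stdin` with `docker logs rancher 2>&1` piped in. The `2>&1` is there because, as far as I know, `docker logs` replays the container's stderr on its own stderr, so a plain pipe only passes stdout along.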
Any guidance would be appreciated!