Rancher 2.2.8 Admin server in restart loop

ER144 · February 27, 2020, 7:58pm

We have a Docker-based Rancher admin server (single node, non-clustered) that started behaving very badly last week. The host has 2 cores and 16 GB of memory; we also tried 4 cores and 20 GB with the same results.

The CPU goes to 100% utilization (shown as 200% on the 2-core host) and memory keeps creeping up until it runs out. The instance is then killed and restarts automatically. This happens about every 90 seconds.

The same configuration had been running fine for months, then started having issues on a Sunday morning, with no apparent changes.

The log files show no sign of errors.
There are painfully few diagnostic tools for this application.
Adding debug to the logs shows nothing but a loop, with no errors that seem to indicate the issue. The debug output is turned off again every 90 seconds on each restart.

I really would appreciate any input on this or how to diagnose this issue.

Also, we use Dynatrace on the server, but Dynatrace cannot monitor a statically linked Go application. Bummer!

Snippet from the logs (the bad certificate errors were already there before the problem started):

2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/high-memmory
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
I0227 12:12:32.588229 7 trace.go:76] Trace[787931110]: "List /apis/project.cattle.io/v3/apprevisions" (started: 2020-02-27 12:12:31.801440445 +0000 UTC m=+30.655743469) (total time: 786.761999ms):
Trace[787931110]: [769.141486ms] [769.102334ms] Listing from storage done
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/management.cattle.io/v3/c-hx8mj/clusteralertrules
2020/02/27 12:12:32 [DEBUG] REST DELETE apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/etcd-system-service
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler project-alert-rule-controller p-hbpb9/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler project-alert-rule-controller p-ftb44/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler project-alert-rule-controller p-tp8zd/memory-close-to-resource-limited
2020/02/27 12:12:32 [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/deployment-event-alert
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler pod-target-alert-watcher p-hbpb9/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/management.cattle.io/v3/c-hx8mj/clusteralertrules
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler pod-target-alert-watcher p-tp8zd/memory-close-to-resource-limited
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler pod-target-alert-watcher p-ftb44/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51186: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51184: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51182: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50618: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50616: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50614: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50072: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51188: remote error: tls: bad certificate
2020-02-27 12:12:33.563562 W | etcdserver: apply entries took too long [794.631734ms for 1 entries]
2020-02-27 12:12:33.563602 W | etcdserver: avoid queries with large range/delete range!
I0227 12:12:33.570020 7 trace.go:76] Trace[1307881769]: "Create /apis/management.cattle.io/v3/namespaces/c-hx8mj/clusteralertrules" (started: 2020-02-27 12:12:32.677334704 +0000 UTC m=+31.531637795) (total time: 892.651606ms):
Trace[1307881769]: [892.651606ms] [892.468828ms] END
2020/02/27 12:12:33 [DEBUG] REST CREATE apis/management.cattle.io/v3/c-hx8mj/clusteralertgroups
I0227 12:12:33.589782 7 trace.go:76] Trace[1443316789]: "Create /apis/project.cattle.io/v3/namespaces/p-hbpb9/pipelinesettings" (started: 2020-02-27 12:12:32.757108178 +0000 UTC m=+31.611411263) (total time: 832.646477ms):
Trace[1443316789]: [832.646477ms] [832.565626ms] END
I0227 12:12:33.590153 7 trace.go:76] Trace[2033875104]: "Create /apis/project.cattle.io/v3/namespaces/p-lv5lf/pipelinesettings" (started: 2020-02-27 12:12:32.756801589 +0000 UTC m=+31.611104678) (total time: 833.336847ms):
Trace[2033875104]: [833.336847ms] [833.246113ms] END
I0227 12:12:33.590363 7 trace.go:76] Trace[960459642]: "Create /apis/project.cattle.io/v3/namespaces/p-tp8zd/pipelinesettings" (started: 2020-02-27 12:12:32.756272483 +0000 UTC m=+31.610575611) (total time: 834.070593ms):
Trace[960459642]: [834.070593ms] [833.896484ms] END
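
Since the debug output only shows a loop, one way to see what is looping is to tally the REST lines by verb and path. In the snippet above, the same pipelinesettings CREATE repeats several times within a single second. The following is a rough sketch, not a Rancher tool: the function name `tally_rest_calls` and the regular expression are my own assumptions, based only on the `[DEBUG] REST <VERB> <path>` line format shown in the snippet.

```python
import re
import sys
from collections import Counter

# Assumed line format, taken from the snippet above, e.g.:
# 2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
REST_LINE = re.compile(r"\[DEBUG\] REST (?P<verb>[A-Z]+) (?P<path>\S+)")


def tally_rest_calls(lines):
    """Count how often each (verb, path) pair appears in the debug log lines."""
    counts = Counter()
    for line in lines:
        match = REST_LINE.search(line)
        if match:
            counts[(match.group("verb"), match.group("path"))] += 1
    return counts


if __name__ == "__main__":
    # Read log lines from stdin and print the ten most frequent REST calls.
    for (verb, path), n in tally_rest_calls(sys.stdin).most_common(10):
        print(f"{n:6d}  {verb:6s} {path}")
```

Saved under a hypothetical name such as tally.py, it can be fed the container output with `docker logs <container> 2>&1 | python tally.py` (the container name is a placeholder). A path that dominates the counts is a candidate for the object being recreated in the loop, though this does not by itself explain the cause.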

Paul1 · June 26, 2021, 6:02pm

Did you ever solve this? Seems like no help offered.
