We have a Docker-based Rancher admin server (single node, non-clustered) that started behaving very badly last week. It has 2 cores and 16 GB of memory; we also tried 4 cores and 20 GB of memory, with the same results.
The CPU goes to 100% utilization (on 2 cores it shows as 200%) and memory keeps creeping up until the instance runs out of memory, is killed, and automatically restarts. This happens about every 90 seconds.
The same configuration had been running fine for months, then on a Sunday morning it just started having issues, with no apparent changes.
The log files show no sign of errors.
There are painfully few (if any) diagnostic tools for this application.
Adding debug output to the logs shows nothing but a loop, with no errors that seem to indicate the issue. The debug output is turned off again every 90 seconds, on each restart.
I really would appreciate any input on this or how to diagnose this issue.
Also, we use Dynatrace on the server, but a statically linked Go application cannot be monitored by Dynatrace. Bummer!
Snippet from the logs (the bad certificate errors were already there before the problem started):
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/high-memmory
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
I0227 12:12:32.588229 7 trace.go:76] Trace[787931110]: "List /apis/project.cattle.io/v3/apprevisions" (started: 2020-02-27 12:12:31.801440445 +0000 UTC m=+30.655743469) (total time: 786.761999ms):
Trace[787931110]: [769.141486ms] [769.102334ms] Listing from storage done
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/management.cattle.io/v3/c-hx8mj/clusteralertrules
2020/02/27 12:12:32 [DEBUG] REST DELETE apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/etcd-system-service
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler project-alert-rule-controller p-hbpb9/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler project-alert-rule-controller p-ftb44/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler project-alert-rule-controller p-tp8zd/memory-close-to-resource-limited
2020/02/27 12:12:32 [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/deployment-event-alert
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler pod-target-alert-watcher p-hbpb9/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/management.cattle.io/v3/c-hx8mj/clusteralertrules
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler pod-target-alert-watcher p-tp8zd/memory-close-to-resource-limited
2020/02/27 12:12:32 [DEBUG] REST GET apis/project.cattle.io/v3/p-lv5lf/apps/cluster-alerting
2020/02/27 12:12:32 [DEBUG] ProjectAlertRuleController calling handler pod-target-alert-watcher p-ftb44/less-than-half-workload-available
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-ftb44/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-hbpb9/pipelinesettings
2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-tp8zd/pipelinesettings
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51186: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51184: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51182: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50618: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50616: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50614: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.186:50072: remote error: tls: bad certificate
2020/02/27 12:12:33 [DEBUG] 2020/02/27 12:12:33 http: TLS handshake error from 10.30.7.187:51188: remote error: tls: bad certificate
2020-02-27 12:12:33.563562 W | etcdserver: apply entries took too long [794.631734ms for 1 entries]
2020-02-27 12:12:33.563602 W | etcdserver: avoid queries with large range/delete range!
I0227 12:12:33.570020 7 trace.go:76] Trace[1307881769]: "Create /apis/management.cattle.io/v3/namespaces/c-hx8mj/clusteralertrules" (started: 2020-02-27 12:12:32.677334704 +0000 UTC m=+31.531637795) (total time: 892.651606ms):
Trace[1307881769]: [892.651606ms] [892.468828ms] END
2020/02/27 12:12:33 [DEBUG] REST CREATE apis/management.cattle.io/v3/c-hx8mj/clusteralertgroups
I0227 12:12:33.589782 7 trace.go:76] Trace[1443316789]: "Create /apis/project.cattle.io/v3/namespaces/p-hbpb9/pipelinesettings" (started: 2020-02-27 12:12:32.757108178 +0000 UTC m=+31.611411263) (total time: 832.646477ms):
Trace[1443316789]: [832.646477ms] [832.565626ms] END
I0227 12:12:33.590153 7 trace.go:76] Trace[2033875104]: "Create /apis/project.cattle.io/v3/namespaces/p-lv5lf/pipelinesettings" (started: 2020-02-27 12:12:32.756801589 +0000 UTC m=+31.611104678) (total time: 833.336847ms):
Trace[2033875104]: [833.336847ms] [833.246113ms] END
I0227 12:12:33.590363 7 trace.go:76] Trace[960459642]: "Create /apis/project.cattle.io/v3/namespaces/p-tp8zd/pipelinesettings" (started: 2020-02-27 12:12:32.756272483 +0000 UTC m=+31.610575611) (total time: 834.070593ms):
Trace[960459642]: [834.070593ms] [833.896484ms] END
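
Since the debug output is mostly one repeating loop, it may help to tally which REST calls and controller handlers dominate it rather than reading it line by line. Below is a minimal sketch in Python, assuming the two line shapes seen in the snippet above (`[DEBUG] REST <VERB> <path>` and `[DEBUG] <Controller> calling handler <name> <key>`). The function name `summarize` and both regular expressions are my own illustration and not part of Rancher; other Rancher versions may log in a different format.

```python
import re
from collections import Counter

# Assumed line shape (taken from the snippet above):
#   2020/02/27 12:12:32 [DEBUG] REST CREATE apis/project.cattle.io/v3/p-lv5lf/pipelinesettings
REST_RE = re.compile(r"\[DEBUG\] REST (?P<verb>[A-Z]+) (?P<path>\S+)")

# Assumed line shape (taken from the snippet above):
#   ... [DEBUG] ClusterAlertRuleController calling handler cluster-alert-rule-controller c-6kxbq/high-memmory
HANDLER_RE = re.compile(r"\[DEBUG\] (?P<ctrl>\w+) calling handler (?P<handler>\S+)")


def summarize(lines):
    """Count REST calls by (verb, path) and handler calls by (controller, handler).

    Lines matching neither pattern (TLS errors, etcd warnings, traces) are ignored.
    """
    rest = Counter()
    handlers = Counter()
    for line in lines:
        m = REST_RE.search(line)
        if m:
            rest[(m["verb"], m["path"])] += 1
            continue
        m = HANDLER_RE.search(line)
        if m:
            handlers[(m["ctrl"], m["handler"])] += 1
    return rest, handlers


def print_top(rest, handlers, n=10):
    """Print the n most frequent REST calls and handler invocations."""
    for (verb, path), count in rest.most_common(n):
        print(f"{count:6d}  {verb:7s} {path}")
    for (ctrl, handler), count in handlers.most_common(n):
        print(f"{count:6d}  {ctrl} -> {handler}")
```

To use it, save the container's debug output to a file (the name `rancher-debug.log` here is just a placeholder) and run `print_top(*summarize(open("rancher-debug.log")))`. If one or two paths account for most of the lines, for example the repeated `CREATE .../pipelinesettings` calls, that is probably the place to start looking.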