Kube-apiserver restarting every few seconds

Hi there,
we are not sure what happened, but our cluster suddenly stopped responding. It is a three-node cluster: one controlplane/etcd node and two workers. node1, the controlplane/etcd node, seems to be the problem.

Cluster health check failed: Failed to communicate with API server: Get https://MI_IP:6443/api/v1/namespaces/kube-system?timeout=30s
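
For what it's worth, this is how I can probe the secure port directly from node1 (just a sketch; MI_IP stands in for the node's real IP, and since we run with --anonymous-auth=false this would return 401 even on a healthy cluster, so it only shows whether the port answers at all):

# check whether anything is listening and answering TLS on the secure port
curl -vk https://MI_IP:6443/healthz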

Looking at this node, the kube-apiserver container keeps restarting. Its log:

+ '[' kube-apiserver = kubelet ']'
+ exec kube-apiserver --requestheader-group-headers=X-Remote-Group --etcd-keyfile=/etc/kubernetes/ssl/kube-node-key.pem --etcd-prefix=/registry --proxy-client-cert-file=/etc/kubernetes/ssl/kube-apiserver-proxy-client.pem --service-node-port-range=30000-32767 --bind-address=0.0.0.0 --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --authorization-mode=Node,RBAC --cloud-provider= --kubelet-client-certificate=/etc/kubernetes/ssl/kube-apiserver.pem --proxy-client-key-file=/etc/kubernetes/ssl/kube-apiserver-proxy-client-key.pem --authentication-token-webhook-cache-ttl=5s --etcd-cafile=/etc/kubernetes/ssl/kube-ca.pem --audit-log-maxbackup=10 --insecure-port=0 --profiling=false --audit-log-path=/var/log/kube-audit/audit-log.json --audit-log-maxsize=100 --requestheader-allowed-names=kube-apiserver-proxy-client --client-ca-file=/etc/kubernetes/ssl/kube-ca.pem --tls-cert-file=/etc/kubernetes/ssl/kube-apiserver.pem --tls-private-key-file=/etc/kubernetes/ssl/kube-apiserver-key.pem --audit-policy-file=/etc/kubernetes/audit-policy.yaml --authentication-token-webhook-config-file=/etc/kubernetes/kube-api-authn-webhook.yaml --anonymous-auth=false --enable-admission-plugins=NamespaceLifecycle,LimitRanger,ServiceAccount,DefaultStorageClass,DefaultTolerationSeconds,MutatingAdmissionWebhook,ValidatingAdmissionWebhook,ResourceQuota,NodeRestriction,Priority,TaintNodesByCondition,PersistentVolumeClaimResize --service-account-key-file=/etc/kubernetes/ssl/kube-service-account-token-key.pem --requestheader-username-headers=X-Remote-User --service-account-lookup=true --audit-log-maxage=30 --audit-log-format=json --etcd-servers=https://MI_IP:2379 --kubelet-client-key=/etc/kubernetes/ssl/kube-apiserver-key.pem --runtime-config=authorization.k8s.io/v1beta1=true --advertise-address=MI_IP --allow-privileged=true --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --requestheader-extra-headers-prefix=X-Remote-Extra- --secure-port=6443 --etcd-certfile=/etc/kubernetes/ssl/kube-node.pem --requestheader-client-ca-file=/etc/kubernetes/ssl/kube-apiserver-requestheader-ca.pem --service-cluster-ip-range=10.43.0.0/16 --storage-backend=etcd3
Flag --insecure-port has been deprecated, This flag will be removed in a future version.
I1116 13:01:05.507115       1 server.go:618] external host was not specified, using MI_IP
I1116 13:01:05.507626       1 server.go:148] Version: v1.18.6
W1116 13:01:05.883250       1 authentication.go:420] the webhook cache ttl of 5s is shorter than the overall cache ttl of 10s for successful token authentication attempts.
I1116 13:01:05.890773       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1116 13:01:05.890840       1 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I1116 13:01:05.891976       1 plugins.go:158] Loaded 12 mutating admission controller(s) successfully in the following order: NamespaceLifecycle,LimitRanger,ServiceAccount,NodeRestriction,TaintNodesByCondition,Priority,DefaultTolerationSeconds,DefaultStorageClass,StorageObjectInUseProtection,RuntimeClass,DefaultIngressClass,MutatingAdmissionWebhook.
I1116 13:01:05.892031       1 plugins.go:161] Loaded 10 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,RuntimeClass,CertificateApproval,CertificateSigning,CertificateSubjectRestriction,ValidatingAdmissionWebhook,ResourceQuota.
I1116 13:01:05.893311       1 client.go:361] parsed scheme: "endpoint"
I1116 13:01:05.893431       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://MI_IP:2379  <nil> 0 <nil>}]
I1116 13:01:06.883459       1 client.go:361] parsed scheme: "endpoint"
I1116 13:01:06.883684       1 endpoint.go:68] ccResolverWrapper: sending new addresses to cc: [{https://MI_IP:2379  <nil> 0 <nil>}]
panic: context deadline exceeded

goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/apiextensions-apiserver/pkg/registry/customresourcedefinition.NewREST(0xc0006331f0, 0x50e7a40, 0xc0002b7c20, 0xc0002b7e48)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiextensions-apiserver/pkg/registry/customresourcedefinition/etcd.go:56 +0x3e7
k8s.io/kubernetes/vendor/k8s.io/apiextensions-apiserver/pkg/apiserver.completedConfig.New(0xc0003adc20, 0xc000a17e88, 0x51a63e0, 0x77457d8, 0x10, 0x0, 0x0)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiextensions-apiserver/pkg/apiserver/apiserver.go:145 +0x14ef
k8s.io/kubernetes/cmd/kube-apiserver/app.createAPIExtensionsServer(0xc000a17e80, 0x51a63e0, 0x77457d8, 0x0, 0x50e75a0, 0xc000d9a100)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/apiextensions.go:102 +0x59
k8s.io/kubernetes/cmd/kube-apiserver/app.CreateServerChain(0xc000a7b080, 0xc0002cd020, 0x455c0f4, 0xc, 0xc0006bfc48)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:181 +0x2b8
k8s.io/kubernetes/cmd/kube-apiserver/app.Run(0xc000a7b080, 0xc0002cd020, 0x0, 0x0)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:150 +0x101
k8s.io/kubernetes/cmd/kube-apiserver/app.NewAPIServerCommand.func1(0xc0009ab900, 0xc000a7b340, 0x0, 0x2b, 0x0, 0x0)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-apiserver/app/server.go:117 +0x104
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc0009ab900, 0xc0000d0010, 0x2b, 0x2b, 0xc0009ab900, 0xc0000d0010)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826 +0x460
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0009ab900, 0x1647fe8b16232239, 0x7727600, 0xc000068750)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914 +0x2fb
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
	/workspace/anago-v1.18.6-rc.0.48+a9f7208b601483/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
main.main()
	_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-apiserver/apiserver.go:43 +0xcd
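
If I read the stack trace right, the panic ("context deadline exceeded") happens in createAPIExtensionsServer, i.e. on the apiserver's very first round-trip to etcd at https://MI_IP:2379, so I suspect etcd on node1 is the real problem. This is what I was going to check next on node1 — a rough sketch, assuming an RKE-style setup where etcd runs as a Docker container named "etcd" and where the client certs are the same ones the apiserver gets via --etcd-cafile / --etcd-certfile / --etcd-keyfile above:

# is the etcd container running at all, and what does it log?
# ("etcd" is the assumed container name; adjust if yours differs)
docker ps -a | grep etcd
docker logs --tail 100 etcd

# probe etcd's health endpoint with the same client cert/key the apiserver uses
curl --cacert /etc/kubernetes/ssl/kube-ca.pem \
     --cert /etc/kubernetes/ssl/kube-node.pem \
     --key /etc/kubernetes/ssl/kube-node-key.pem \
     https://MI_IP:2379/health

# same check via etcdctl, if the binary is available on the node
ETCDCTL_API=3 etcdctl \
  --endpoints=https://MI_IP:2379 \
  --cacert=/etc/kubernetes/ssl/kube-ca.pem \
  --cert=/etc/kubernetes/ssl/kube-node.pem \
  --key=/etc/kubernetes/ssl/kube-node-key.pem \
  endpoint health

# etcd also falls over when its disk is full (assuming the default /var/lib/etcd data dir)
df -h /var/lib/etcd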


What could I try? :frowning: