Problem: rancher-agent restarts repeatedly, and the imported cluster frequently shows "Cluster agent is not connected"
Environment: AWS, availability zone us-east-1b
Versions: Rancher 2.7.2, EKS 1.25
Logs:
INFO: Environment: CATTLE_ADDRESS=10.5.62.43 CATTLE_CA_CHECKSUM=f8e4ed6f390c7bdf2d92d7940cc44b931c538a36fc777e11f67b970ee5915846 CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://172.20.124.24:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://172.20.143_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://172.20.124.24:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=172.20.124.24 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=172ERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_FEATURES=embedded-cluster-api=false,fleet=false,monitoringv1=false,multi-cluster-management=false,multi-cluster-management-agent=true,provisioningv2=false,rke2=false CATTLE_INGRESS_IP_DOMAIN=sslip.io CATT_NAME=cattle-cluster-agent-57c968cf6b-52hvd CATTLE_SERVER=https://10.5.61.167:443 CATTLE_SERVER_VERSION=v2.7.2
INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local ec2.internal nameserver 172.20.0.10 options ndots:5
INFO: https://10.5.61.167:443/ping is accessible
INFO: Value from https://10.5.61.167:443/v3/settings/cacerts is an x509 certificate
time="2023-07-13T09:06:29Z" level=info msg="Listening on /tmp/log.sock"
time="2023-07-13T09:06:30Z" level=info msg="Rancher agent version v2.7.2 is starting"
time="2023-07-13T09:06:36Z" level=info msg="Connecting to wss://10.5.61.167:443/v3/connect/register with token starting with t9xsqt9slknx6h6dn6tt77cs56m"
time="2023-07-13T09:06:36Z" level=info msg="Connecting to proxy" url="wss://10.5.61.167:443/v3/connect/register"
time="2023-07-13T09:06:39Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-07-13T09:07:07Z" level=info msg="Starting /v1, Kind=Service controller"
time="2023-07-13T09:07:15Z" level=info msg="Running in single server mode, will not peer connections"
time="2023-07-13T09:08:13Z" level=info msg="Applying CRD features.management.cattle.io"
time="2023-07-13T09:09:52Z" level=info msg="Applying CRD navlinks.ui.cattle.io"
time="2023-07-13T09:09:57Z" level=info msg="Applying CRD podsecurityadmissionconfigurationtemplates.management.cattle.io"
time="2023-07-13T09:10:01Z" level=info msg="Applying CRD clusters.management.cattle.io"
time="2023-07-13T09:10:03Z" level=info msg="Applying CRD apiservices.management.cattle.io"
time="2023-07-13T09:10:05Z" level=info msg="Applying CRD clusterregistrationtokens.management.cattle.io"
time="2023-07-13T09:10:11Z" level=info msg="Applying CRD settings.management.cattle.io"
time="2023-07-13T09:10:14Z" level=info msg="Applying CRD preferences.management.cattle.io"
time="2023-07-13T09:10:20Z" level=info msg="Applying CRD features.management.cattle.io"
time="2023-07-13T09:10:24Z" level=info msg="Applying CRD clusterrepos.catalog.cattle.io"
time="2023-07-13T09:10:32Z" level=info msg="Applying CRD operations.catalog.cattle.io"
time="2023-07-13T09:10:35Z" level=info msg="Applying CRD apps.catalog.cattle.io"
time="2023-07-13T09:13:35Z" level=info msg="Starting API controllers"
I0713 09:14:10.754456 52 trace.go:205] Trace[292897603]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:13:36.772) (total time: 33982ms):
Trace[292897603]: ---"Objects listed" error:<nil> 33982ms (09:14:10.754)
Trace[292897603]: [33.982354473s] [33.982354473s] END
time="2023-07-13T09:14:18Z" level=info msg="Starting management.cattle.io/v3, Kind=GroupMember controller"
time="2023-07-13T09:14:19Z" level=info msg="Starting management.cattle.io/v3, Kind=Group controller"
time="2023-07-13T09:14:19Z" level=info msg="Starting management.cattle.io/v3, Kind=Cluster controller"
time="2023-07-13T09:14:19Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-07-13T09:14:19Z" level=info msg="Starting management.cattle.io/v3, Kind=User controller"
time="2023-07-13T09:14:19Z" level=info msg="Starting management.cattle.io/v3, Kind=Token controller"
time="2023-07-13T09:14:19Z" level=info msg="Starting management.cattle.io/v3, Kind=UserAttribute controller"
I0713 09:14:35.158344 52 trace.go:205] Trace[1789677789]: "DeltaFIFO Pop Process" ID:v1.admissionregistration.k8s.io,Depth:30,Reason:slow event handlers blocking the queue (13-Jul-2023 09:14:34.114) (total time: 1043ms):
Trace[1789677789]: [1.043931475s] [1.043931475s] END
I0713 09:15:08.165978 52 trace.go:205] Trace[1841500955]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:14:27.783) (total time: 40382ms):
Trace[1841500955]: ---"Objects listed" error:<nil> 40382ms (09:15:08.165)
Trace[1841500955]: [40.382412239s] [40.382412239s] END
I0713 09:15:10.299119 52 trace.go:205] Trace[1391728421]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:14:27.788) (total time: 42510ms):
Trace[1391728421]: ---"Objects listed" error:<nil> 42510ms (09:15:10.298)
Trace[1391728421]: [42.510556207s] [42.510556207s] END
I0713 09:15:33.002428 52 trace.go:205] Trace[488148478]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:14:27.784) (total time: 65218ms):
Trace[488148478]: ---"Objects listed" error:<nil> 65218ms (09:15:33.002)
Trace[488148478]: [1m5.218203725s] [1m5.218203725s] END
time="2023-07-13T09:15:33Z" level=info msg="Starting apiregistration.k8s.io/v1, Kind=APIService controller"
time="2023-07-13T09:15:33Z" level=info msg="Starting apiextensions.k8s.io/v1, Kind=CustomResourceDefinition controller"
time="2023-07-13T09:15:34Z" level=info msg="Starting management.cattle.io/v3, Kind=Feature controller"
time="2023-07-13T09:15:34Z" level=info msg="Starting management.cattle.io/v3, Kind=Cluster controller"
time="2023-07-13T09:15:34Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2023-07-13T09:15:34Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=Role controller"
time="2023-07-13T09:15:34Z" level=info msg="Starting /v1, Kind=ServiceAccount controller"
time="2023-07-13T09:15:51Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller"
time="2023-07-13T09:15:52Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller"
I0713 09:15:54.407507 52 leaderelection.go:248] attempting to acquire leader lease kube-system/cattle-controllers...
time="2023-07-13T09:15:56Z" level=info msg="Starting steve aggregation client"
time="2023-07-13T09:15:57Z" level=info msg="Starting management.cattle.io/v3, Kind=Preference controller"
time="2023-07-13T09:15:58Z" level=info msg="Starting catalog.cattle.io/v1, Kind=ClusterRepo controller"
time="2023-07-13T09:16:01Z" level=info msg="Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller"
time="2023-07-13T09:16:03Z" level=info msg="Starting management.cattle.io/v3, Kind=Setting controller"
time="2023-07-13T09:16:03Z" level=info msg="Starting management.cattle.io/v3, Kind=ClusterRegistrationToken controller"
time="2023-07-13T09:16:03Z" level=info msg="Starting /v1, Kind=ConfigMap controller"
time="2023-07-13T09:16:04Z" level=info msg="Active TLS secret cattle-system/serving-cert (ver=8734872) (count 9): map[listener.cattle.io/cn-10.5.50.64:10.5.50.64 listener.cattle.io/cn-10.5.52.197:10.5.52.197 listener.cattle.io/cn-10.5.52.68:10.5.52.68 listener.cattle.io/cn-10.5.53.116:10.5.53.116 listener.cattle.io/cn-10.5.59.59:10.5.59.59 listener.cattle.io/cn-10.5.62.43:10.5.62.43 listener.cattle.io/cn-127.0.0.1:127.0.0.1 listener.cattle.io/cn-localhost:localhost listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system listener.cattle.io/fingerprint:SHA1=CC1CEFC4321F6022FCE4E6D6D40210EFAED1B10F]"
time="2023-07-13T09:16:00Z" level=info msg="Starting management.cattle.io/v3, Kind=APIService controller"
time="2023-07-13T09:16:12Z" level=info msg="Listening on :443"
time="2023-07-13T09:16:12Z" level=info msg="Listening on :80"
I0713 09:16:18.052632 52 leaderelection.go:258] successfully acquired lease kube-system/cattle-controllers
time="2023-07-13T09:16:23Z" level=info msg="Listening on :444"
time="2023-07-13T09:16:28Z" level=warning msg="dynamiclistener [::]:444: no cached certificate available for preload - deferring certificate load until storage initialization or first client request"
time="2023-07-13T09:16:47Z" level=error msg="error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 fetch origin -- 2e69bb64a4c45dc8c98c3b3733985288f007fb73 error: exit status 128, detail: fatal: unable to access 'https://git.rancher.io/partner-charts/': The requested URL returned error: 502\n, requeuing"
time="2023-07-13T09:17:44Z" level=info msg="Steve auth startup complete"
time="2023-07-13T09:17:57Z" level=error msg="error syncing 'rancher-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 fetch origin -- 049c4b643fac957fba0a7f5834a9a9e06319ed39 error: exit status 128, detail: error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502\nfatal: the remote end hung up unexpectedly\n, requeuing"
time="2023-07-13T09:18:14Z" level=info msg="Registering namespaceHandler for adding labels "
time="2023-07-13T09:18:31Z" level=error msg="error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 fetch origin -- 2e69bb64a4c45dc8c98c3b3733985288f007fb73 error: exit status 128, detail: error: RPC failed; HTTP 502 curl 22 The requested URL returned error: 502\nfatal: the remote end hung up unexpectedly\n, requeuing"
I0713 09:18:35.446540 52 trace.go:205] Trace[1881921681]: "DeltaFIFO Pop Process" ID:cattle-dashboards,Depth:13,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:32.818) (total time: 2628ms):
Trace[1881921681]: [2.628158491s] [2.628158491s] END
I0713 09:18:36.514460 52 trace.go:205] Trace[288480800]: "DeltaFIFO Pop Process" ID:cattle-monitoring-system/prometheus-operated,Depth:24,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:32.818) (total time: 3695ms):
Trace[288480800]: [3.695998672s] [3.695998672s] END
I0713 09:18:37.226408 52 trace.go:205] Trace[1751402228]: "DeltaFIFO Pop Process" ID:cattle-monitoring-system/prometheus-operated,Depth:24,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:32.818) (total time: 4407ms):
Trace[1751402228]: [4.407818079s] [4.407818079s] END
I0713 09:18:37.554927 52 trace.go:205] Trace[204009722]: "DeltaFIFO Pop Process" ID:cattle-fleet-system/fleet-agent-74784f4cf6,Depth:24,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:35.446) (total time: 2108ms):
Trace[204009722]: [2.108574538s] [2.108574538s] END
I0713 09:18:37.555008 52 trace.go:205] Trace[1659639646]: "DeltaFIFO Pop Process" ID:cattle-impersonation-system,Depth:11,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:36.514) (total time: 1040ms):
Trace[1659639646]: [1.040739548s] [1.040739548s] END
I0713 09:18:37.555090 52 trace.go:205] Trace[1312257642]: "DeltaFIFO Pop Process" ID:cattle-monitoring-system/rancher-monitoring-grafana,Depth:22,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:37.226) (total time: 328ms):
Trace[1312257642]: [328.746992ms] [328.746992ms] END
I0713 09:18:38.123459 52 trace.go:205] Trace[446519050]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:18:27.062) (total time: 11060ms):
Trace[446519050]: ---"Objects listed" error:<nil> 11060ms (09:18:38.123)
Trace[446519050]: [11.060891224s] [11.060891224s] END
I0713 09:18:42.496070 52 trace.go:205] Trace[1386512547]: "DeltaFIFO Pop Process" ID:cattle-monitoring-system/rancher-monitoring-grafana-5f86d88688,Depth:23,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:41.962) (total time: 533ms):
Trace[1386512547]: [533.786553ms] [533.786553ms] END
I0713 09:18:54.004112 52 trace.go:205] Trace[1907182927]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:18:23.586) (total time: 30417ms):
Trace[1907182927]: ---"Objects listed" error:<nil> 30417ms (09:18:54.003)
Trace[1907182927]: [30.417903886s] [30.417903886s] END
E0713 09:18:55.385638 52 leaderelection.go:367] Failed to update lock: resource name may not be empty
time="2023-07-13T09:19:04Z" level=error msg="error syncing 'rancher-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40cac650031b74776e87c1a726b0484d0877c3ec137da0872547ff9b73a721 fetch origin -- 049c4b643fac957fba0a7f5834a9a9e06319ed39 error: exit status 128, detail: fatal: unable to access 'https://git.rancher.io/charts/': The requested URL returned error: 502\n, requeuing"
I0713 09:19:06.590862 52 leaderelection.go:283] failed to renew lease kube-system/cattle-controllers: timed out waiting for the condition
I0713 09:18:56.689166 52 trace.go:205] Trace[1203368975]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:18:23.584) (total time: 33104ms):
Trace[1203368975]: ---"Objects listed" error:<nil> 33104ms (09:18:56.688)
Trace[1203368975]: [33.104439401s] [33.104439401s] END
I0713 09:19:04.530310 52 trace.go:205] Trace[622929508]: "DeltaFIFO Pop Process" ID:kube-system/ebs-csi-controller-5459c886d9-5qlmv,Depth:19,Reason:slow event handlers blocking the queue (13-Jul-2023 09:18:56.690) (total time: 7840ms):
Trace[622929508]: [7.840093692s] [7.840093692s] END
I0713 09:19:09.895954 52 trace.go:205] Trace[564063009]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 (13-Jul-2023 09:18:23.584) (total time: 46311ms):
Trace[564063009]: ---"Objects listed" error:<nil> 46311ms (09:19:09.895)
Trace[564063009]: [46.311880669s] [46.311880669s] END
I0713 09:19:11.258368 52 trace.go:205] Trace[443366797]: "DeltaFIFO Pop Process" ID:kube-system/ebs-csi-node-hqx82,Depth:17,Reason:slow event handlers blocking the queue (13-Jul-2023 09:19:10.394) (total time: 863ms):
Trace[443366797]: [863.658079ms] [863.658079ms] END
I0713 09:19:14.938203 52 trace.go:205] Trace[1440101658]: "DeltaFIFO Pop Process" ID:kube-system/ebs-csi-node-rpvnm,Depth:15,Reason:slow event handlers blocking the queue (13-Jul-2023 09:19:13.746) (total time: 567ms):
Trace[1440101658]: [567.744513ms] [567.744513ms] END
time="2023-07-13T09:19:28Z" level=fatal msg="leaderelection lost for cattle-controllers"
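
To get a quick overview of how slow the API calls in the log above are, the trace headers can be summarized per trace name. The following is only a rough sketch: the function name `summarize_traces` and the regular expression are my own assumptions based on the line format seen in this log, not anything provided by Rancher or client-go.

```python
import re
from collections import defaultdict

# Matches klog trace header lines of the form seen in the log above, e.g.
#   ... Trace[292897603]: "Reflector ListAndWatch" name:... (total time: 33982ms):
# Continuation lines (---"Objects listed" ..., [33.98s] ... END) do not match.
TRACE_RE = re.compile(r'Trace\[\d+\]: "([^"]+)".*\(total time: (\d+)ms\)')


def summarize_traces(lines):
    """Return {trace name: (count, max ms, mean ms)} for trace header lines."""
    durations = defaultdict(list)
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            durations[match.group(1)].append(int(match.group(2)))
    return {
        name: (len(values), max(values), sum(values) / len(values))
        for name, values in durations.items()
    }


# Two header lines copied from the log above, used as sample input.
sample = [
    'I0713 09:14:10.754456 52 trace.go:205] Trace[292897603]: "Reflector ListAndWatch" '
    'name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 '
    '(13-Jul-2023 09:13:36.772) (total time: 33982ms):',
    'I0713 09:15:08.165978 52 trace.go:205] Trace[1841500955]: "Reflector ListAndWatch" '
    'name:pkg/mod/github.com/rancher/client-go@v1.25.4-rancher1/tools/cache/reflector.go:170 '
    '(13-Jul-2023 09:14:27.783) (total time: 40382ms):',
]

for name, (count, worst, mean) in summarize_traces(sample).items():
    print(f"{name}: count={count} max={worst}ms mean={mean:.0f}ms")
```

The same function can be fed the full agent log, for example by saving the output of `kubectl logs` for the cattle-cluster-agent pod to a file and passing the opened file to `summarize_traces`. Comparing the "Reflector ListAndWatch" durations with the time at which the lease renewal failed may help show whether slow API server responses line up with the lost leader election.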