Reinstalling Rancher HA via Helm fails on a Kubernetes cluster (deployed by kubespray)

Kuo_Hugo · October 15, 2021, 4:05am

Hi Rancher team,

I installed Rancher v2.6.0 via Helm and then uninstalled it. Afterward, I tried to install v2.5.9 but ran into problems, so I cleaned up the Rancher deployment using system-tools together with this script: https://raw.githubusercontent.com/kingsd041/some_script/master/remove-rancher-ha/remove_r_ha.sh

After that, I ran helm install again, but the Rancher server pods never become ready. Any pointers on how to troubleshoot this? Thanks.

hugo@hugok-mlt  % kubectl get pods -n cattle-system
NAME                       READY   STATUS    RESTARTS   AGE
rancher-786f49f5dc-dx9hw   0/1     Running   5          12m
rancher-786f49f5dc-fnvlh   0/1     Running   1          5m15s
rancher-786f49f5dc-vzcvq   0/1     Running   5          12m

The /healthz checks (both the liveness and readiness probes) always fail:

Name:         rancher-786f49f5dc-vzcvq
Namespace:    cattle-system
Priority:     0
Node:         k8s-node-02/100.65.16.9
Start Time:   Fri, 15 Oct 2021 11:46:23 +0800
Labels:       app=rancher
              pod-template-hash=786f49f5dc
              release=rancher
Annotations:  cni.projectcalico.org/podIP: 10.233.74.102/32
              cni.projectcalico.org/podIPs: 10.233.74.102/32
Status:       Running
IP:           10.233.74.102
IPs:
  IP:           10.233.74.102
Controlled By:  ReplicaSet/rancher-786f49f5dc
Containers:
  rancher:
    Container ID:  docker://f78bf829e40038cdb62aa82cfa05dbb6c52166c5120a67f4bbb1aa79bdc9ee90
    Image:         docker.test.com/rancher/rancher:v2.5.9
    Image ID:      docker-pullable://docker.test.com/rancher/rancher@sha256:10e938f788e725d1d2ed7bc909bae8c7a83b756c520fb2596bf559e44e13587d
    Port:          80/TCP
    Host Port:     0/TCP
    Args:
      --no-cacerts
      --http-listen-port=80
      --https-listen-port=443
      --add-local=true
    State:          Running
      Started:      Fri, 15 Oct 2021 11:58:38 +0800
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Fri, 15 Oct 2021 11:56:08 +0800
      Finished:     Fri, 15 Oct 2021 11:58:37 +0800
    Ready:          False
    Restart Count:  5
    Liveness:       http-get http://:80/healthz delay=60s timeout=1s period=30s #success=1 #failure=3
    Readiness:      http-get http://:80/healthz delay=5s timeout=1s period=30s #success=1 #failure=3
    Environment:
      CATTLE_NAMESPACE:           cattle-system
      CATTLE_PEER_SERVICE:        rancher
      CATTLE_BOOTSTRAP_PASSWORD:  <set to the key 'bootstrapPassword' in secret 'bootstrap-secret'>  Optional: false
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from rancher-token-jmsph (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  rancher-token-jmsph:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rancher-token-jmsph
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     cattle.io/os=linux:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  13m                   default-scheduler  Successfully assigned cattle-system/rancher-786f49f5dc-vzcvq to k8s-node-02
  Normal   Pulled     9m14s (x3 over 13m)   kubelet            Container image "docker.addpchina.com/rancher/rancher:v2.5.9" already present on machine
  Warning  Unhealthy  9m14s (x6 over 12m)   kubelet            Liveness probe failed: Get "http://10.233.74.102:80/healthz": dial tcp 10.233.74.102:80: connect: connection refused
  Normal   Killing    9m14s (x2 over 11m)   kubelet            Container rancher failed liveness probe, will be restarted
  Normal   Created    9m13s (x3 over 13m)   kubelet            Created container rancher
  Normal   Started    9m13s (x3 over 13m)   kubelet            Started container rancher
  Warning  Unhealthy  3m42s (x18 over 13m)  kubelet            Readiness probe failed: Get "http://10.233.74.102:80/healthz": dial tcp 10.233.74.102:80: connect: connection refused
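
For reference, here is a rough sketch of the timing implied by the liveness probe settings shown above (delay=60s, period=30s, #failure=3), compared with the Started/Finished timestamps of the last terminated container. The helper name `liveness_failure_window` is made up for illustration, and the formula is an approximation: the kubelet adds jitter to probe scheduling, so real timings will differ by some seconds.

```python
from datetime import datetime


def liveness_failure_window(initial_delay_s: int, period_s: int, failure_threshold: int) -> int:
    """Approximate seconds after container start at which the last of
    `failure_threshold` consecutive liveness failures can be recorded,
    assuming the first probe runs right after the initial delay."""
    return initial_delay_s + period_s * (failure_threshold - 1)


# Values copied from the "Liveness:" line of the describe output.
window = liveness_failure_window(initial_delay_s=60, period_s=30, failure_threshold=3)

# "Last State" timestamps copied from the describe output (both +0800).
started = datetime(2021, 10, 15, 11, 56, 8)
finished = datetime(2021, 10, 15, 11, 58, 37)
observed = int((finished - started).total_seconds())

print(f"liveness failure window: ~{window}s, observed container lifetime: {observed}s")
```

If this estimate is about right, the container has roughly two minutes to start answering on port 80 before it is restarted. The observed lifetime is somewhat longer than the estimate, which I assume is probe jitter plus shutdown time, but I have not verified that.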

The pod logs show that Rancher keeps waiting for initial data to be populated:

2021/10/15 04:02:23 [INFO] APIVersion metrics.k8s.io/v1beta1 Kind NodeMetrics
2021/10/15 04:02:23 [INFO] APIVersion metrics.k8s.io/v1beta1 Kind PodMetrics
W1015 04:02:24.010406       7 warnings.go:80] apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
2021/10/15 04:02:25 [INFO] Waiting for initial data to be populated
2021/10/15 04:02:27 [INFO] Waiting for initial data to be populated
