How to troubleshoot Custom RKE2 Cluster creation

Hello everyone,

I am using Rancher v2.8.5 with 4 Oracle Linux Server 9.4 nodes, and I would like to create a Kubernetes cluster with 1 master and 3 workers.

Options selected in the Rancher UI

  • Kubernetes: v1.28.15+rke2r1
  • Cloud Provider: Default - RKE2 Embedded
  • System Services: CoreDNS, Metrics Server

At the step of executing the Registration Command on the nodes

  • etcd, Control Plane: for master node
    • firewall rules to open ports:
      rule family="ipv4" source address="each worker IP" port port="10250" protocol="tcp" accept
      rule family="ipv4" source address="each worker IP" port port="6443" protocol="tcp" accept
      rule family="ipv4" source address="each worker IP" port port="9345" protocol="tcp" accept
      
    • rancher-system-agent.service is running
      • suspicious logs:
        level=info msg="[Applyinator] Command sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null] finished with err: <nil> and exit code: 0"
        level=info msg="[K8s] updated plan secret fleet-default/custom-45efbb5ab4e0-machine-plan with feedback"
        
    • deploy/cattle-cluster-agent is running
      • last error log:
        error syncing 'rancher-charts': handler helm-clusterrepo-download: update failure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-charts/4b40...a721 fetch origin -- release-v2.8 error: exit status 128, detail: fatal: unable to access 'https://git.rancher.io/charts/': The requested URL returned error: 502\n, requeuing
        
    • pod/rancher-webhook-fd7… is Pending with Event message:
      0/1 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
      
  • only worker: for worker nodes
    • firewall rules to open ports:
      rule family="ipv4" source address="master IP" port port="9345" protocol="tcp" accept
      rule family="ipv4" source address="master IP" port port="10250" protocol="tcp" accept
      
    • rancher-system-agent.service is running
      • logs:
        systemd[1]: Started Rancher System Agent.
        rancher-system-agent[3358]: time="2024-12-18T08:55:34+01:00" level=info msg="Rancher System Agent version v0.3.6 (41c07d0) is starting"
        rancher-system-agent[3358]: time="2024-12-18T08:55:34+01:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
        rancher-system-agent[3358]: time="2024-12-18T08:55:34+01:00" level=info msg="Starting remote watch of plans"
        rancher-system-agent[3358]: time="2024-12-18T08:55:34+01:00" level=info msg="Starting /v1, Kind=Secret controller"
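
For reference, rich rules like the ones listed above can be added with firewalld roughly as follows. This is only a sketch: the IP addresses are placeholders, the default zone is assumed, and the script just prints the firewall-cmd commands instead of running them.

```shell
#!/bin/sh
# Sketch only: prints the firewall-cmd invocations instead of executing them.
# The addresses below are placeholders (documentation range), not the real nodes.
MASTER_IP="192.0.2.10"
WORKER_IPS="192.0.2.11 192.0.2.12 192.0.2.13"

# Build one rich rule in the same form as the rules listed above.
rich_rule() {
  printf 'rule family="ipv4" source address="%s" port port="%s" protocol="tcp" accept' "$1" "$2"
}

# Master node: allow each worker on ports 10250, 6443 and 9345.
master_rules() {
  for ip in $WORKER_IPS; do
    for port in 10250 6443 9345; do
      printf "firewall-cmd --permanent --add-rich-rule='%s'\n" "$(rich_rule "$ip" "$port")"
    done
  done
}

# Worker nodes: allow the master on ports 9345 and 10250.
worker_rules() {
  for port in 9345 10250; do
    printf "firewall-cmd --permanent --add-rich-rule='%s'\n" "$(rich_rule "$MASTER_IP" "$port")"
  done
}

echo "# on the master node"
master_rules
echo "# on each worker node"
worker_rules
echo "# on every node, to activate the permanent rules"
echo "firewall-cmd --reload"
```

No --zone option is given, so the printed commands would apply to the default zone (assumed here).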
        

Rancher UI

  • All machines have State “Waiting for Node Ref”
  • Provisioning Log:
    [INFO ] waiting for at least one control plane, etcd, and worker node to be registered
    [INFO ] non-ready bootstrap machine(s) custom-45efbb5ab4e0 and join url to be available on bootstrap node
    

How can I troubleshoot this problem? Is there any aspect that I have missed?

Thank you in advance!