Rancher DigitalOcean node provisioning extremely unstable

When provisioning nodes for downstream clusters with HA Rancher v2.5.9 and v2.7.1 using the built-in DigitalOcean node driver, we observe very unstable behavior with the following errors:

Error creating machine: Error running "sudo apt-get update": ssh command error: command: sudo apt-get update err: exit status 100 output: Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease

[ERROR] handler node-controller: Error creating machine: Error installing Docker: , requeuing

[ERROR] handler node-controller: Error creating machine: Error running "sudo apt-get update": ssh command error: command: sudo apt-get update err: exit status 100 output: Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB], requeuing

Tested with upstream Rancher v2.5.9 on RKE v1.20.11 with downstream RKE v1.20.11, and with
upstream Rancher v2.7.1 on RKE v1.24.4 with downstream RKE v1.24.13, each with different Ubuntu versions.

With Rancher v2.5.9, nearly every third provisioning attempt fails; with Rancher v2.7.1 it has become much worse.
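
To put numbers on the failure rate per version, the error lines could be tallied by failing step. This is only a minimal sketch: the regular expression is an assumption derived from the three log lines quoted above, not from any documented Rancher log format, and `tally_failures` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed pattern, based only on the sample log lines in this post:
# either 'Error running "<command>"' or some other step such as
# 'Error installing Docker', following 'Error creating machine: '.
ERROR_RE = re.compile(
    r'Error creating machine: '
    r'(?:Error running "(?P<cmd>[^"]+)"|(?P<other>[^:,]+))'
)

def tally_failures(lines):
    """Count provisioning failures, grouped by the failing command or step."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match is None:
            continue  # not a machine-creation error line
        key = match.group("cmd") or match.group("other").strip()
        counts[key] += 1
    return counts
```

Running this over the Rancher logs saved from each version, for example `tally_failures(open("rancher.log"))` with a hypothetical file name, would show whether `sudo apt-get update` accounts for most failures or whether the Docker install step fails equally often.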

Does anyone have an idea whether this is an issue on the DigitalOcean side or the Rancher side? Or are there other ways to debug the problem besides looking into the Rancher debug logs?
