Edit 2: Figured it out. The Terraform resource rancher2_cluster**_v2** (as opposed to rancher2_cluster) DOES provide the correct registration commands. That took me almost a day to figure out. Oh well.
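For anyone landing here later, a minimal sketch of how the switch might look. The resource name `lab`, the cluster name and the Kubernetes version are placeholders and not taken from my setup. I'm also assuming the `cluster_registration_token` attribute is exposed as a one-element list and that its `node_command` comes without role flags, so the flags are appended per node type:

```hcl
# Sketch only: "lab" and the Kubernetes version are hypothetical placeholders.
resource "rancher2_cluster_v2" "lab" {
  name               = "lab"
  kubernetes_version = "v1.30.2+rke2r1" # assumed; pick a version your Rancher server offers
}

locals {
  # Base registration command (the curl ... system-agent-install.sh variant).
  # Assumption: cluster_registration_token is a list with a single element.
  registration_command = rancher2_cluster_v2.lab.cluster_registration_token[0].node_command

  # Append the role flags for each node type.
  controlplane_command = "${local.registration_command} --etcd --controlplane"
  worker_command       = "${local.registration_command} --worker"
}
```

The registration token is marked sensitive by the provider, so any output that exposes these locals will likely need `sensitive = true`.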
I have deployed Rancher and a bunch of nodes using Terraform on a Proxmox hypervisor. I’ve used Terraform to get the node_command from the Rancher server, and the command is run automatically on the new nodes. The command is:
For controlplane/etcd nodes:
sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.9.0 --server https://1.2.3.4 --token <some token> --ca-checksum <some checksum> --etcd --controlplane
For worker nodes:
sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.9.0 --server https://1.2.3.4 --token <some token> --ca-checksum <some checksum> --worker
The docker containers start normally, and in Rancher I see the nodes added as machines. However, the Rancher cluster status is stuck at “Waiting for etcd, controlplane and worker nodes to be registered”.
The docker logs on the nodes have the same message. I’ve turned on debug logging on the Rancher server, but the docker logs don’t mention anything about nodes trying to register.
Any help is appreciated.
Edit: did some more investigating. I found that if I manually create a custom cluster, I get a very different registration command that looks like this (tokens don’t really matter, I’m in a lab setup atm):
curl -fL https://1.2.3.4/system-agent-install.sh | sudo sh -s - --server https://1.2.3.4 --label 'cattle.io/os=linux' --token pnwwq4sxmjjrp85q4s6xggbvsv22zhnlkbpgqcp5hrqw6sx5tt4pqv --ca-checksum 5e304c0fcce9ca6c8d95c80eca2c14aff68d2d86be4cb3072bd07ec0fe7bf7be --etcd --controlplane --worker
This command works. It doesn’t use docker, but uses this script to install the system agent: https://github.com/rancher/system-agent/blob/main/install.sh
My problem is now that I use the rancher2_cluster Terraform resource to create the cluster, but it looks like that resource cannot supply the non-docker command. This is what I get from that resource:
{
"annotations" = tomap({})
"cluster_id" = "c-lh6sv"
"command" = "kubectl apply -f https://10.93.74.10/v3/import/xkvxzcgnm7v8zpdx68hcksvj844m62sbrg94svtbh2cjlzfgdpmjmn_c-lh6sv.yaml"
"id" = "c-lh6sv:default-token"
"insecure_command" = "curl --insecure -sfL https://1.2.3.4/v3/import/xkvxzcgnm7v8zpdx68hcksvj844m62sbrg94svtbh2cjlzfgdpmjmn_c-lh6sv.yaml | kubectl apply -f -"
"insecure_node_command" = ""
"insecure_windows_node_command" = ""
"labels" = tomap({})
"manifest_url" = "https://1.2.3.4/v3/import/xkvxzcgnm7v8zpdx68hcksvj844m62sbrg94svtbh2cjlzfgdpmjmn_c-lh6sv.yaml"
"name" = "default-token"
"node_command" = "sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.9.0 --server https://1.2.3.4 --token xkvxzcgnm7v8zpdx68hcksvj844m62sbrg94svtbh2cjlzfgdpmjmn --ca-checksum 5e304c0fcce9ca6c8d95c80eca2c14aff68d2d86be4cb3072bd07ec0fe7bf7be"
"token" = "xkvxzcgnm7v8zpdx68hcksvj844m62sbrg94svtbh2cjlzfgdpmjmn"
"windows_node_command" = "PowerShell -NoLogo -NonInteractive -Command \"& {docker run -v c:\\:c:\\host rancher/rancher-agent:v2.9.0 bootstrap --server https://1.2.3.4 --token xkvxzcgnm7v8zpdx68hcksvj844m62sbrg94svtbh2cjlzfgdpmjmn --ca-checksum 5e304c0fcce9ca6c8d95c80eca2c14aff68d2d86be4cb3072bd07ec0fe7bf7be | iex}\""
},
])