Provisioning AWS EC2 cluster on Rancher 2.3.5: "Waiting for SSH to be available"

I’m having an issue setting up an AWS EC2 cluster on Rancher 2.3.5. I already have an existing EC2 cluster running on Rancher 2.2.3, and I’m attempting to set up the new cluster the exact same way; I launched a new Ubuntu 18.04 instance, installed Docker 19.03.5 (using the rancher/install-docker script), installed the Rancher server 2.3.5, and am launching the EC2 cluster with the exact same node definitions and security groups as my existing working cluster. The EC2 instances launch fine, but the Rancher server never connects to them via SSH; the nodes stay in provisioning mode with the message “Waiting for SSH to be available”.
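For reference, the setup steps described above roughly correspond to the following commands (a sketch based on the standard Rancher install flow; versions as stated, and these require root and network access on the instance):

```shell
# Install Docker 19.03 on the Ubuntu 18.04 host using Rancher's
# version-pinned install script
curl -fsSL https://releases.rancher.com/install-docker/19.03.sh | sh

# Run Rancher server 2.3.5 as a single Docker container
sudo docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  rancher/rancher:v2.3.5
```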

I can SSH manually into the Rancher server as well as the nodes that Rancher launched, and I can connect via SSH from the Rancher server to the nodes, so it’s not a VPC permissions issue. I’ve triple-checked the cluster settings, the node definitions, the AWS profile permissions… and I can’t see anything wrong. I’ve tried different versions of Docker on the Rancher server, different AMIs for the nodes (both Ubuntu and RancherOS, and yes, I’ve checked that the usernames in the node definitions are correct for each, “ubuntu” and “rancher” respectively)… but the nodes always hang on the same message.

Has anything changed in this regard from 2.2 to 2.3? It should work the exact same way as the cluster I already have working, right? I’m pulling my hair out trying to figure out what might be wrong, so if anyone has any suggestions of things to try, it’d be much appreciated.

Oof, I finally figured this out. I didn’t have “Use only private IP address” checked in the node template, so the Rancher server was trying to SSH from its public/external IP to the public/external IPs of the nodes, and my security group had no rule allowing that. My security group did have a rule allowing SSH from the public IP of the previous Rancher server to the nodes in my previous cluster, which is why the old cluster worked and the new one didn’t.
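For anyone hitting the same wall: once “Use only private IP address” is checked, the nodes’ security group still needs an inbound SSH rule from the Rancher server. A sketch with the AWS CLI (the group IDs below are placeholders; substitute your own):

```shell
# Allow inbound SSH (TCP 22) to the nodes' security group from the
# security group the Rancher server belongs to, so provisioning over
# private IPs can connect.
# sg-0123456789abcdef0 (nodes) and sg-0fedcba9876543210 (Rancher server)
# are hypothetical IDs.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 \
  --source-group sg-0fedcba9876543210
```

Referencing the server’s security group rather than its IP keeps the rule valid even if the server’s address changes.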

I was facing the same issue and checked the “Use only private IP address” box, but with that set, the newly provisioned node doesn’t get a public IP; as a result it can’t download and install Docker, and provisioning fails. Do you have any idea on this?