I’m having an issue setting up an AWS EC2 cluster on Rancher 2.3.5. I already have an existing EC2 cluster running on Rancher 2.2.3, and I’m attempting to set up the new cluster the same way: I launched a new Ubuntu 18.04 instance, installed Docker 19.03.5 (using the rancher/install-docker script), installed Rancher server 2.3.5, and am launching the EC2 cluster with the exact same node definitions and security groups as my existing working cluster. The EC2 instances launch fine, but the Rancher server never connects to them via SSH; the nodes stay stuck in provisioning with the message “Waiting for SSH to be available”.
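For reference, this is roughly the setup I ran on the fresh Ubuntu 18.04 instance (a sketch from memory, not a transcript; the ports are just the defaults):

```bash
# Install Docker 19.03 using Rancher's install-docker convenience script
curl https://releases.rancher.com/install-docker/19.03.sh | sh

# Run the Rancher 2.3.5 server with the default ports
sudo docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  rancher/rancher:v2.3.5
```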
I can SSH manually into the Rancher server as well as into the nodes that Rancher launched, and I can connect via SSH from the Rancher server to the nodes, so it doesn’t look like a VPC or security group issue. I’ve triple-checked the cluster settings, the node definitions, and the AWS profile permissions, and I can’t see anything wrong. I’ve also tried different versions of Docker on the Rancher server and different AMIs for the nodes (both Ubuntu and RancherOS, and yes, I’ve checked that the SSH usernames in the node definitions are correct for each: “ubuntu” and “rancher” respectively), but the nodes always hang on the same message.
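In case it helps, here’s roughly how I verified connectivity and what I’m watching while a node sits in provisioning (the key path and node IP are placeholders):

```bash
# Manual SSH from the Rancher server to one of the launched nodes succeeds
ssh -i /path/to/key.pem ubuntu@<node-ip>

# Tail the Rancher server container logs while the node provisions
sudo docker logs -f $(sudo docker ps -q --filter ancestor=rancher/rancher:v2.3.5)
```

The logs don’t show anything beyond the node waiting on SSH, which is what makes this so confusing.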
Has anything changed in this regard between 2.2 and 2.3? It should work the same way as the cluster I already have running, right? I’m pulling my hair out trying to figure out what might be wrong, so if anyone has suggestions of things to try, it’d be much appreciated.