The cluster appears to be up and running. The web UI works and I can manage the cluster, set up GitHub Auth, etc.
However, when I check the logs of the cattle pod, I see nothing but errors like the ones below:
2018/10/09 21:26:04 [ERROR] Failed to connect to peer wss://10.42.1.3/v3/connect [local ID=10.42.2.2]: websocket: bad handshake
2018/10/09 21:26:07 [ERROR] Failed to connect to peer wss://10.42.2.3/v3/connect [local ID=10.42.2.2]: websocket: bad handshake
2018/10/09 21:26:07 [ERROR] Failed to connect to peer wss://10.42.0.7/v3/connect [local ID=10.42.2.2]: websocket: bad handshake
Also, the AWS ALB needs to be set up with two target groups: one for HTTP on port 80 forwarding to port 80, and one for HTTPS on port 443 forwarding to port 443 (sketched below). This worked with the TLS cert set on the HTTPS 443 listener.
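In case it helps anyone following along, here is a rough boto3 sketch of that layout. All names, subnet/VPC/security-group IDs, and the ACM certificate ARN are placeholders, and note that on an ALB the certificate attaches to the HTTPS listener rather than to the target group:

import boto3

elbv2 = boto3.client("elbv2")

# One target group per backend port on the Rancher nodes.
tg_http = elbv2.create_target_group(
    Name="rancher-http", Protocol="HTTP", Port=80, VpcId="vpc-xxxxxxxx"
)["TargetGroups"][0]["TargetGroupArn"]
tg_https = elbv2.create_target_group(
    Name="rancher-https", Protocol="HTTPS", Port=443, VpcId="vpc-xxxxxxxx"
)["TargetGroups"][0]["TargetGroupArn"]

alb = elbv2.create_load_balancer(
    Name="rancher-alb",
    Subnets=["subnet-aaaaaaaa", "subnet-bbbbbbbb"],
    SecurityGroups=["sg-alb00000"],
    Scheme="internet-facing",
    Type="application",
)["LoadBalancers"][0]["LoadBalancerArn"]

# Listener on 80 forwards to the port-80 target group.
elbv2.create_listener(
    LoadBalancerArn=alb, Protocol="HTTP", Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_http}],
)
# Listener on 443 carries the ACM wildcard cert and forwards to the port-443 target group.
elbv2.create_listener(
    LoadBalancerArn=alb, Protocol="HTTPS", Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/xxxx"}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_https}],
)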
You can see from the links above that there are also docs for using an AWS ALB. And using SSL certs on the ingress controller doesn't make much sense when you already have wildcard certs set up in AWS for the domain. It's just a lot nicer to have the AWS load balancer handle the certs.
Thankfully, I got it working and everything is great now.
Yes, I see that an ALB can be configured, although even those docs recommend using a layer 4 load balancer. I'm not 100% sure why that is, but perhaps because of the extremely low overhead (and thus higher scalability) and because encryption is maintained further upstream. One thing I dislike about AWS NLBs is that they do not implement security groups, so it's harder to write rules that constrain the origin of traffic, whereas with an ALB you can reference the ALB's security group from the instances' security group so traffic can only arrive via the ALB and with a valid cert (see the sketch below).
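Concretely, the pattern looks roughly like this with boto3 (the security group IDs are placeholders): the node hosts' security group only accepts 80/443 when the source is the ALB's security group, so nothing reaches the nodes except through the ALB.

import boto3

ec2 = boto3.client("ec2")
for port in (80, 443):
    ec2.authorize_security_group_ingress(
        GroupId="sg-nodes0000",  # SG attached to the node instances
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # ALB's SG as the allowed source, instead of an IP range
            "UserIdGroupPairs": [{"GroupId": "sg-alb00000"}],
        }],
    )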
We too use wildcard certs for hosted zones and, at least up until now, have terminated SSL at the ALB.
TBH I’m a bit unsure which way to go with this now. Time to have a chat with my Rancher tech support guy.
Very interested to know what you found out, if you don't mind sharing. That's exactly why we gave up trying to use the suggested NLBs. There doesn't seem to be a way to get the equivalent of that security-group setup with NLBs (an SG on the LB, an SG on the node hosts, and rules allowing traffic from one to the other). If your node hosts are in private subnets, then you're pretty much forced to accept everything from the entire CIDR block of the VPC, as in the sketch below.
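For contrast, this is roughly the rule we were left with when trying the NLB route, since there is no load-balancer security group to reference (placeholder ID and CIDR):

import boto3

ec2 = boto3.client("ec2")
ec2.authorize_security_group_ingress(
    GroupId="sg-nodes0000",
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        # the whole VPC CIDR, because there is no source SG to constrain to
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],
    }],
)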