Production Rancher HA on AWS

I’m trying to set up a production Rancher cluster in AWS with no luck. I’ve seen some of the problem posts and am using 1.0.1 both for the script generation instance and when running the generated script, but no go. The system never fully comes up. Watching the logs on the rancher-ha container, it’s just the following over and over again:

time="2016-05-20T18:56:43Z" level=info msg="Waiting for server to be available" component=cert
time="2016-05-20T18:56:43Z" level=info msg="Can not launch agent right now: Server not available at" component=service

I know to expect additional problems using an ELB but I’m not even to that point yet. I can’t even get the first HA system up and available to respond to requests.

Any ideas?

Have you verified that all of the required ports are open in your hosts’ security group?
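One rough way to check this from another machine in the same VPC is to attempt a TCP connection to each port. The sketch below is only an illustration: `check_ports` is a hypothetical helper, and the host and port values in the usage comment are placeholders, not the ports Rancher HA requires (take the real list from the HA documentation). It only tests TCP, so any UDP ports would need a different check.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return {port: bool}, True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or no route
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example usage (placeholder host and ports, substitute your own):
#   for port, ok in check_ports("10.0.0.11", [22, 80]).items():
#       print(port, "open" if ok else "blocked or closed")
```

A False result means either the security group is blocking the port or nothing is listening on it yet, so it is worth running this once the containers are supposed to be up.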

You know what, I bet that’s it. I missed that part in the docs and that could easily be an issue. I’ll check into it Monday.

Thank you.

So, still little luck with Rancher HA. Every time I get past one problem I run into the next, and I don’t think I’m trying to do anything unusual. I’m following the instructions to a T on standard Ubuntu 14.04 AMIs in AWS. I can now get the 3 servers to stand up, but in the “management” stack the go-machine-service and rancher-compose-executor never fully come up (always cycling between “initializing” and “degraded”), and there are many errors in the logs related to not being able to remove filesystems for containers and such. For the containers that are present and running, neither logs nor shell access (from the UI) seems to work. There may be more, but this is where I gave up.

A serious question and not a troll… is Rancher HA ready for general availability? It’s perfectly OK if it’s not. We can use just a single-instance server until HA is ready (as long as there’s an upgrade path), but right now, instead of learning Rancher and getting our services and applications deployed, I’m spending days fighting the HA installation. After our evaluation of several competing products, we had landed on Rancher as the solution, and we still like it a lot, but I admit to being a little frustrated right now.

@Matt_Welch What version were you using? There were some known issues with AWS/ELB/SSL.

We just made some fixes to the AWS/ELB/SSL setups in our most recent stable (v1.0.2) and latest (v1.1.0-dev3) releases. Our initial testing was done on GCE and DO, so we missed that test case.

Feel free to use this for help re: your HA setup. I can ask @aemneina to take a look, as he’s been the one testing it.

@Matt_Welch Is your registration URL pointing to the ELB? If so, can you share your ELB policy config?
aws elb describe-load-balancers should get you that output; feel free to scrub any sensitive info.
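If the full output is too noisy to share, something like the following could boil it down to the listener and policy details. This is a sketch, assuming the JSON was saved first (for example with aws elb describe-load-balancers --output json > elb.json); `summarize_listeners` is a hypothetical helper name, and it reads only the standard `LoadBalancerDescriptions`, `ListenerDescriptions`, `Listener`, and `PolicyNames` fields of that output.

```python
import json


def summarize_listeners(raw_json):
    """Extract one row per listener from `aws elb describe-load-balancers` JSON."""
    doc = json.loads(raw_json)
    rows = []
    for lb in doc.get("LoadBalancerDescriptions", []):
        for desc in lb.get("ListenerDescriptions", []):
            listener = desc.get("Listener", {})
            rows.append({
                "name": lb.get("LoadBalancerName"),
                "protocol": listener.get("Protocol"),
                "lb_port": listener.get("LoadBalancerPort"),
                "instance_protocol": listener.get("InstanceProtocol"),
                "instance_port": listener.get("InstancePort"),
                "policies": desc.get("PolicyNames", []),
            })
    return rows


# Example usage (elb.json is the saved CLI output):
#   with open("elb.json") as f:
#       for row in summarize_listeners(f.read()):
#           print(row)
```

The protocol columns are worth a look, since HTTP/HTTPS listeners on a classic ELB do not pass WebSocket upgrades while TCP/SSL listeners do. Whether that is the cause here is a guess until the actual config is posted.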

I went through the process to get this up and running. There were a few switches on the AWS side that needed to be flipped, but I now have a solid setup.

Using tips gleaned from the following two tickets, I was able to get my HA setup running:

The Rancher HA architecture involves quite a few moving pieces, so I understand the potential complexity. I’m sure a smoother path to success will emerge over time. In the meantime, my team can now get down to the real work of setting up our services and applications.

BTW, what’s the process for contributing to the documentation? In my case, I think there are things that I missed simply because I was going ONLY through the HA documentation. Some of what I needed was only linked or referred to in the non-HA docs. I think with a few additional references and some clarifications, it would be a smoother process. I would be happy to take a stab at adding that information.

I see that the doc pages are in GitHub, but the edit link there seems to offer only to fork the project. Is that the appropriate way to contribute back to the docs?

Yes, the typical GitHub workflow is to fork the project into your own account, edit, and then submit a pull request to the parent project.

I now see that the main page of the docs project in GitHub goes into exhaustive detail on how to contribute. I’ll see what I can do. Thanks.