Rancher is feeling like a mistake. Help!

William_Flanagan · December 16, 2016, 2:04pm

Neither rc4 nor rc3 boots.

They both exhibit the same behavior.

Logs in Gist: https://gist.github.com/wflanagan/9475b0c77cf38f3e75646eca8b6c865d

But the net is that the front end is not responsive, with the browser returning ERR_CONNECTION_REFUSED.

So, now what?

William_Flanagan · December 16, 2016, 5:01pm

@vincent So, in talking with my board… I guess my question is this as well… How much would it cost to get support to get this working? If a support contract will get these things resolved faster, I’m very willing to talk about it.

This has GOT to get out of our critical path.

William_Flanagan · December 16, 2016, 6:06pm

FYI… the 1.2.1 production version does not work either and exhibits the same behavior. The UI does not boot, and the logs look like the ones above (no point in posting new ones).

vincent · December 16, 2016, 7:13pm

Continued on IRC, ending in:

InnoDB was corrupted on the MySQL… we repaired the db and now it’s working

The quote button on the right half of Support | Rancher will get you in touch with the right person for your region.

William_Flanagan · December 16, 2016, 10:32pm

Marginally optimistic. On 1.2.1 we are still up. Big push coming in about an hour, and I’ll update after that.

Phillip_Ulberg · December 20, 2016, 4:11am

@William_Flanagan how are things now?

vincent · December 20, 2016, 6:34pm

Haven’t heard from them since Friday, so I’m assuming a case of

People that have no problems don’t come to github or the forum and tell you how awesome things are. There are many, many more installations (even just among the opt-ins I can see) than people commenting on issues/forum posts.

(Marketing + messaging around significant 1.2 upgrade maybe incorrect? · Issue #6961 · rancher/rancher · GitHub)

William_Flanagan · December 20, 2016, 6:55pm

Hey guys,

I’ll update. Sorry. Yes, we have been able to do 1 upgrade successfully on
the new system (1.2.1 with upgraded hardware). We are doing another one
today, and if that goes, it does look like it’s solved.

Sorry for the delay in response. I came down with the flu and was out a
couple of days.

William_Flanagan · January 4, 2017, 2:18am

So, the net of this is that with a solo machine that has 64GB of RAM, on 1.2.1, we were able to deploy a few times successfully. However, for the last few days everything got VERY erratic, and now Rancher is not accepting connections.

This is the machine. I think it’s safe to say this is not a resource problem on the machine itself:

So far, we have restarted the Docker daemon and rebooted the entire machine. Chrome is getting an ERR_CONNECTION_REFUSED error.
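For anyone hitting the same symptom: ERR_CONNECTION_REFUSED generally means nothing is accepting TCP connections on the port, which is different from a timeout or a DNS failure. Below is a minimal Python sketch that tells those cases apart from the machine you are browsing from. The `probe` helper is made up for illustration, and the host name and port 8080 in the usage note are assumptions (8080 being the port the Rancher 1.x server is usually published on), not details from this thread.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', or 'unreachable'."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on that port.
        return "refused"
    except OSError:
        # Covers timeouts, DNS resolution failures, and routing errors.
        return "unreachable"
```

Calling `probe("rancher.example.com", 8080)` (hypothetical host) and getting `"refused"` would suggest the container is running but the server process inside it never started listening, which fits logs that stop moving.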

The logs for the Rancher container in docker are this (identical to last time):

They haven’t moved from there.

And we are down again, because this has brought down at least one of our processes… not necessarily that it’s down, but it seems it’s not reachable via DNS.

I would appreciate any advice at this point. The “failure” was slower… but still there… just like previous times…

William_Flanagan · January 4, 2017, 9:50pm

OK. Update.

We have rebuilt this from scratch again, and are coming back up.

The thing we realized, though, is that while the MySQL database is 5.2GB, the binlog is producing about 150GB per day. So, we modified our MySQL configuration to reduce that…

Hopefully this was the problem, and not the server itself.
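For anyone wanting to check whether they are in the same situation, here is a rough Python sketch that adds up the binlog files in a MySQL data directory and estimates how long the disk would last at a given growth rate. The function names, the `mysql-bin` basename, and the directory path in the usage note are assumptions for illustration. The post does not say which MySQL settings were changed; in MySQL 5.x, binlog retention is typically controlled with `expire_logs_days`.

```python
import re
from pathlib import Path

# Binlog files are conventionally named <basename>.<numeric sequence>,
# e.g. mysql-bin.000001. The accompanying .index file is skipped because
# its suffix is not numeric.
BINLOG_PATTERN = re.compile(r"^(?P<base>.+)\.(?P<seq>\d+)$")


def binlog_bytes(directory, basename="mysql-bin"):
    """Return the total size in bytes of binlog files directly under `directory`.

    `basename` is an assumed default; it should match your log_bin setting.
    """
    total = 0
    for path in Path(directory).iterdir():
        match = BINLOG_PATTERN.match(path.name)
        if path.is_file() and match and match.group("base") == basename:
            total += path.stat().st_size
    return total


def days_until_full(free_bytes, bytes_per_day):
    """Estimate days before the disk fills at a constant binlog growth rate."""
    if bytes_per_day <= 0:
        return float("inf")
    return free_bytes / bytes_per_day
```

Running `binlog_bytes("/var/lib/mysql")` (an assumed path) on two consecutive days gives the daily growth. At the 150GB per day mentioned above, `days_until_full(500 * 10**9, 150 * 10**9)` says a disk with a hypothetical 500GB free would fill in a little over three days.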

mister2d · January 18, 2017, 12:41am

How are things going? Does your problem still exist?

William_Flanagan · January 23, 2017, 4:32pm

Hey @mister2d,

Things are stable right now. So, so far, looking better.

Note that we made a BIG change to drop our image sizes to below 1G. Delete all your .git repo stuff (big size shrink), all extraneous libraries, use an Alpine-based image, ensure logs are deleted, etc.

I’ll update in a few days. We’ve been actually doing some development with the devops env stable, and we haven’t really wired it all back in. So, I’ll update when we start regularly pushing again. But, for the pushes we do… seems to be working.

mister2d · January 24, 2017, 6:31pm

@William_Flanagan

Nice. I’m glad things are stable now. Gives me some confidence going forward.

SoreGums · February 20, 2017, 3:00pm

Basically, this stuff is not simple and requires a thorough understanding. “She’ll be right mate” is not going to cut it.

This has been my biggest takeaway from getting involved in the “container way”.

It might be easy to get started; however, once real work becomes a thing, unless you are actually familiar with everything (of which there is always something else to figure out), you’re gonna have a bad time.

My biggest focus right now is sorting out the monitoring of hosts, containers and apps as well as associated outputs that end up on disk, especially log files. Monitoring and Persistent data are massive topics on their own and become essential once containers get involved.

A WordPress blog that has zero posts or traffic is easy to stand up and run. However, running a WordPress blog that gets significant traffic, and that must stay online because it is the source of all revenue that pays the 10+ people employed, well, that is something completely different.

Glad you guys have been able to narrow things down and seem to be having success.
