Rancher is feeling like a mistake. Help!

Neither rc4 nor rc3 boots.

They both exhibit the same behavior.

Logs in Gist: https://gist.github.com/wflanagan/9475b0c77cf38f3e75646eca8b6c865d

But the net is that the front end is not responsive, with the browser returning ERR_CONNECTION_REFUSED.

So, now what?

@vincent So, in talking with my board… I guess my question is this as well… How much would it cost to get support to get this working? If a support contract will get these things resolved faster, I’m very willing to talk about it.

This has GOT to get out of our critical path.

FYI… 1.2.1 production version does not work either and exhibits the same behavior. The UI does not boot, and the logs look like the ones above (no point in posting new ones).

Continued on IRC, ending in:

InnoDB was corrupted on the MySQL… we repaired the db and now it’s working

The quote button on the right half of Support | Rancher will get you in touch with the right person for your region.

Marginally optimistic. On 1.2.1 we are still up. Big push coming in about an hour, and I’ll update after that.

@William_Flanagan how are things now?

Haven’t heard from them since Friday, so I’m assuming a case of:

People that have no problems don’t come to github or the forum and tell you how awesome things are. There are many, many more installations (even just among the opt-ins I can see) than people commenting on issues/forum posts.

(Marketing + messaging around significant 1.2 upgrade maybe incorrect? · Issue #6961 · rancher/rancher · GitHub)

Hey guys,

I’ll update. Sorry. Yes, we have been able to do 1 upgrade successfully on
the new system (1.2.1 with upgraded hardware). We are doing another one
today, and if that goes well, it does look like it’s solved.

Sorry for the delay in response. I came down with the flu and was out a
couple of days.

So, the net of this is that with a solo machine with 64GB RAM… and on 1.2.1, we were able to deploy a few times successfully. However, for the last few days, everything got VERY erratic, and now Rancher is not accepting connections.

This is the machine. I think it’s safe to say it is not a resource problem on the machine itself:

So far, we have restarted the Docker daemon and rebooted the entire machine. Chrome is getting an ERR_CONNECTION_REFUSED error.

The logs for the Rancher container in Docker are these (identical to last time):

They haven’t moved from there.

And we are down again, because this has brought down at least one of our processes… not necessarily that it’s down, but it seems it’s not reachable via DNS.

I would appreciate any advice at this point. The “failure” was slower… but still there… just like previous times…

OK. Update.

We have rebuilt this from scratch again, and are coming back up.

The thing we realized about MySQL, though, is that while the MySQL database is 5.2GB, the binlog is producing about 150GB per day. So, we modified our MySQL config to reduce that…
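For anyone wanting to check whether they are in the same situation, here is a minimal sketch that adds up the binlog files in a MySQL data directory. It is only an illustration: the `mysql-bin.NNNNNN` file naming and the `/var/lib/mysql` default path are assumptions (both depend on your `log_bin` and `datadir` settings), and `binlog_usage` and `format_gb` are hypothetical helper names, not anything from Rancher or MySQL.

```python
import os
import re
import sys

# Assumption: binlog files are named like "mysql-bin.000123".
# The base name depends on the log_bin setting of your server.
BINLOG_PATTERN = re.compile(r"^mysql-bin\.\d+$")


def binlog_usage(data_dir, pattern=BINLOG_PATTERN):
    """Return (file_count, total_bytes) for binlog files directly in data_dir."""
    count = 0
    total = 0
    for entry in os.scandir(data_dir):
        # The index file (mysql-bin.index) and table data do not match the pattern.
        if entry.is_file() and pattern.match(entry.name):
            count += 1
            total += entry.stat().st_size
    return count, total


def format_gb(num_bytes):
    """Format a byte count as gigabytes with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GB"


if __name__ == "__main__":
    # Assumption: default data directory; pass another path as the first argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/mysql"
    count, total = binlog_usage(data_dir)
    print(f"{count} binlog files, {format_gb(total)}")
```

Running this a day apart gives a rough growth rate. The post above does not say which setting was changed; binlog retention options such as `expire_logs_days` on MySQL 5.x are one place to look.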

Hopefully this was the problem, and not the server itself.

How are things going? Does your problem still exist?

Hey @mister2d,

Things are stable right now. So, so far, looking better.

Note that we made a BIG change to drop our image sizes to below 1GB. Delete all your .git repo stuff (big size shrink), remove all extraneous libraries, use an Alpine-based image, ensure logs are deleted, etc.
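As a rough way to see what is bloating an image before it is built, the sketch below lists the largest top-level entries in a build context directory, so things like `.git` or leftover logs stand out. It is a minimal illustration under stated assumptions: the build context is a plain local directory, and `dir_size` and `largest_entries` are hypothetical helper names.

```python
import os


def dir_size(path):
    """Total size in bytes of regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def largest_entries(context_dir, top=10):
    """Return up to `top` (name, bytes) pairs for top-level entries, largest first."""
    sizes = []
    for entry in os.scandir(context_dir):
        if entry.is_dir(follow_symlinks=False):
            size = dir_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            size = entry.stat().st_size
        else:
            continue  # skip symlinks and special files
        sizes.append((entry.name, size))
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]


if __name__ == "__main__":
    # Assumption: run from the directory used as the build context.
    for name, size in largest_entries("."):
        print(f"{size / 1024**2:10.1f} MB  {name}")
```

Anything large that the running container does not need is a candidate for exclusion from the build context.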

I’ll update in a few days. We’ve actually been doing some development while the devops env is stable, and we haven’t really wired it all back in. So, I’ll update when we start regularly pushing again. But for the pushes we do… it seems to be working.

@William_Flanagan

Nice. I’m glad things are stable now. Gives me some confidence going forward.

Basically, this stuff is not simple and requires a thorough understanding. “She’ll be right mate” is not going to cut it.

This has been my biggest takeaway from getting involved in the “container way”.

It might be easy to get started; however, once real work becomes a thing, unless you are actually familiar with everything (of which there is always something else to figure out), you’re gonna have a bad time.

My biggest focus right now is sorting out the monitoring of hosts, containers and apps, as well as the associated outputs that end up on disk, especially log files. Monitoring and persistent data are massive topics on their own and become essential once containers get involved.

Standing up a WordPress blog that has zero posts or traffic is easy. However, running a WordPress blog that gets significant traffic, and that has to stay online because it is the source of all the revenue that pays the 10+ people employed, well, that is something completely different.

Glad you guys have been able to narrow things down, and seem to be having success :smiley: