Unable to remove a host from my environment

Hey.

I have one host in my environment where I set up a custom stack. After I attempted to remove this stack, the only container within it has been stuck on ‘in progress’. When I click delete I get an internal server error, but I don’t know where to find more details on this error.

I am unable to delete this stack or container. If I run ‘docker ps -a’ on the host, the container doesn’t exist, nor does the image for it.

I’ve since been making some changes to my environment and have removed this host from the system. I’ve attempted to remove the host from my environment within Rancher, but it won’t remove either, as it still appears to be waiting for that stack to be removed.

Any help on how to manually remove this host from my environment would be greatly appreciated.

Thanks,

Nathan

Are you still having issues with this host? I have never had issues with removing a host.

Hi Denise.

Yup, still having issues. This host still won’t disappear.

Nathan

Denise, after many, MANY painful hours of getting our docker back up after OpenSuSE pushed a docker update that broke our agent connection due to a Go incompatibility, we now have a working docker installation at 1.11.1 versus the 1.9.1 that OpenSuSE refuses to upgrade from.

Now, we cannot connect the host back to the server (they are one and the same: agent is running on the host). I cannot delete the original host and I cannot add a new one. Doing a docker logs on the new agent returns nothing.

There is a process that I cannot kill because it does not exist in Docker, but Rancher thinks it’s still there on the original host config. It will not go away, no matter what I’ve tried (rm -rf of /var/lib/rancher/state, restarting docker, etc.).

We can’t do any work until this gets resolved and we’ve already lost two days trying to get this fixed.

So, here we are, 5 days later. No reply from the Rancher guys (all too typical) and I’m still without a working Rancher.

rancher-catalog/templates> docker ps -a
CONTAINER ID        IMAGE                                COMMAND                  CREATED             STATUS                    PORTS                     NAMES
e61b78e8d286        rancher/agent:v1.0.2                 "/run.sh http://axioh"   2 minutes ago       Up 21 seconds                                       tiny_thompson
734443046c3d        rancher/server:latest                "/usr/bin/s6-svscan /"   About an hour ago   Created                                             trusting_shaw

rancher-catalog/templates> docker logs --tail 10 e61b
rancher-catalog/templates> docker logs --tail 10 7344

So no idea what is going on with either the server or the agent. No host. It’s been about two hours since I tried to add the host back.

It seems like you have some issues with your docker daemon as typically the docker ps would look something like this. Note how the rancher/server shows Up instead of Created.

CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS                              NAMES
0af496faf96d        rancher/server:v1.2.0-pre2   "/usr/bin/s6-svscan /"   6 days ago          Up 6 days           3306/tcp, 0.0.0.0:8080->8080/tcp   adoring_hodgkin

Have you tried launching other containers that expose ports like rancher/server? Can you show me the output of those containers?

If your rancher/server isn’t up and running, how did you register rancher/agent? Where did you get the command for it?
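For anyone following along, the Up versus Created distinction above can be checked mechanically. This is only a rough sketch: it assumes you feed it the output of `docker ps -a --format '{{.Names}}\t{{.Status}}'`, and the function names `parse_ps` and `not_running` are made up for illustration. The sample data is taken from the listing posted earlier in this thread.

```python
from typing import Dict, List


def parse_ps(output: str) -> Dict[str, str]:
    """Map container name -> status from tab-separated `name<TAB>status` lines."""
    statuses: Dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        statuses[name.strip()] = status.strip()
    return statuses


def not_running(statuses: Dict[str, str]) -> List[str]:
    """Return names whose status does not start with 'Up' (e.g. Created, Exited)."""
    return sorted(
        name for name, status in statuses.items() if not status.startswith("Up")
    )


if __name__ == "__main__":
    # Names and statuses copied from the `docker ps -a` output earlier in the thread.
    sample = "tiny_thompson\tUp 21 seconds\ntrusting_shaw\tCreated\n"
    print(not_running(parse_ps(sample)))  # prints ['trusting_shaw']
```

A container that shows Created was never started, which would also explain why `docker logs` prints nothing for it.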

Since we hadn’t heard from you in several days, we’ve been trying to fix it ourselves. Our docker installation is working fine as we can launch containers by hand from the command line (there were several there that I did not show you in the docker ps listing). Right now, the problem we’re encountering is that the server cannot start mysqld. Since I cannot do a docker exec -it xyz1 bash to the container, I cannot see what the problem is. I’m getting really tired of recreating everyone’s accounts and recreating all of the settings we had set every time there is an issue with rancher.

Containers running with a state that says they’re not running, with no ports, no logs, and no exec do not sound like a Docker that is “working fine”. You continue to choose to run on a distribution that is not well tested or supported in the Docker ecosystem against our advice (with a daemon you compiled yourself now?) and you’re going to continue to have self-inflicted problems because of it.

Wow! How typical of you guys. You wait EONS before replying, forcing me to try whatever I can to fix your stuff. Then, when you do reply, you both make assumptions and snipe/troll at me. You guys are great. Really helpful.

Our docker installation is indeed working. I didn’t post all of the output to try and keep the post brief. Here’s a whole listing of what’s there:

rancher-catalog/templates> docker ps
CONTAINER ID        IMAGE                      COMMAND                  CREATED             STATUS              PORTS                     NAMES
02e02bc00ba5        735936678121               "/bin/sh -c 'apt-get "   About an hour ago   Up About an hour    3838/tcp                  admiring_ride
601f3f8f1bbf        60edac958e73               "/bin/sh -c 'bash /ap"   About an hour ago   Up About an hour    3000/tcp                  sharp_mcclintock
99eef307d4af        39d4364418eb               "/init"                  8 weeks ago         Up 7 days           0.0.0.0:59873->8787/tcp   r-Sequencing-EllisRJR_sequencing_1
817ae93f5ccf        dnadave/rtg-core:may2016   "/init"                  8 weeks ago         Up 7 days           0.0.0.0:60938->3000/tcp   r-RTGCore-C34001Takeda_RTGCore_1
38ca830ad8bd        dnadave/rtg-core:may2016   "/init"                  11 weeks ago        Up 7 days           0.0.0.0:58805->3000/tcp   r-RTGCore-SS_RTGCore_2
9d86e8632f72        dnadave/wettyr:apr2016     "/init"                  11 weeks ago        Up 7 days           0.0.0.0:60533->3000/tcp   r-Wetty_wetty_1
a8c1453867d5        69ca9c4d2639               "/init"                  11 weeks ago        Up 7 days           0.0.0.0:50586->3000/tcp   r-RTGCore-Ellis_RTGCore_1
9aa4292394be        e1fc2098433a               "/init"                  12 weeks ago        Up 7 days           0.0.0.0:55566->8787/tcp   r-MicroarrayRJR_microarray_1
2e20dffbf9e8        e1fc2098433a               "/init"                  12 weeks ago        Up 7 days           0.0.0.0:53515->8787/tcp   r-MicroarrayDave_microarray_4

So, you can see that there are containers running with ports. It’s just your stuff that doesn’t report any logs. Everything else works fine. Everything. It’s just your stuff that isn’t working.

I did not compile the docker myself. You assumed that. But, you are VERY wrong. I’m using one of the other installation sources for docker from OpenSuSE. You can inspect their compilation logs on their website, if you wish.

As I said before, it gets REALLY OLD recreating ALL of our settings EVERY TIME there’s an issue with your stuff. It does not create trust in your user base.

Now, how do I fix this without deleting my rancher-data container???

The docker ps you posted, into a 2 month old topic, showed the server container not running, with no logs and no working exec. The only reasonable thing one can conclude from that, after you say you spent many hours fighting Docker, is that something is still very wrong with your Docker daemon. Compiling it yourself was a question.

Posting a different ps that doesn’t show a server or agent container at all doesn’t tell me anything new. The agent container exercises many more corners of the docker API than running a few basic containers with one port.

You are using a free open-source software project. Nobody is under any obligation to answer you immediately, nor at all. The core engineers offer their time here, often after-hours, to help community users, but when you are consistently rude and demanding to our employees it gives us little incentive to want to help you.

And the problems you’ve had have repeatedly been a result of misconfigured Docker or networking on OpenSUSE and resolved by you changing something external, not an actual bug that resulted in an actual issue with an actual fix into a Rancher repo, so there is usually little we can do to help you.

I’ll say this. I didn’t start out rude. I ended up there because it would be DAYS before you would reply to an issue. I use a lot of open source software in my work and I use a lot of forums for support. Yours is routinely several orders of magnitude longer to get a response from. Most forums I get an answer in minutes to maybe a few hours. Yours can be days to weeks. As far as I can tell, you never responded to the original poster of this question. Not cool.

Then, your continual proposed solution to my problem is to wipe my server and install your preferred distribution. I’m not using some obscure distribution, I’m using one that’s been around since the mid 1990s. It has a large user base. What you ask me to do is not some minor inconvenience, it is a major shift in our IT infrastructure.

So yes, I get very frustrated with you and your team. You ask a lot of me as a user: wiping a server is not a small task. I’m only asking that your software work and for help if it doesn’t.

You are technically correct, it wasn’t until your 4th issue where you started to feel entitled to yell at us for not replying to you for 16 entire hours starting at 6pm Pacific.

Anyway, we can still tell you nothing from what you’ve posted. If the server and agent are running, there are logs for them.

Hello,

I tried out Rancher over the past week and it doesn’t look like it will be a good solution for me. I have a standard Ubuntu 16.04 server on Digital Ocean, and I’m trying to completely remove Rancher and Docker. It took some digging on the internet to figure out how to do this, and I’ve finally got it whittled down, but now I can’t finish removing /var/lib/rancher and /var/lib/docker. Here are the outputs I get:

```
rm: cannot remove 'rancher/volumes': Device or resource busy
```

I read that using this command might help track down the running processes so they can be killed, but no dice:
```
lsof +D ./
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/c24324d8b667
      Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/default
      Output information may be incomplete.
COMMAND   PID       USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
bash    16667 blakers757  cwd    DIR  253,1     4096 267276 .
lsof    27938 blakers757  cwd    DIR  253,1     4096 267276 .
lsof    27939 blakers757  cwd    DIR  253,1     4096 267276 .
```

When I try to kill the processes by pid, it fails. 

docker ps shows no running containers:
```
docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
```

When I try to remove /var/lib/docker, I get the following:
```
sudo rm -rf /var/lib/docker
rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy
```

Whatever is running inside this `overlay2` folder seems to be to blame.

Just wondering if you all have any ideas, thanks.

/var/lib/docker is where all state and volumes for Docker are stored. If you’re trying to remove Docker, remove it first so that the files are no longer in use by the daemon. If you just want to delete the volumes, run docker volume ls and then docker volume rm on them. Or, on a new enough daemon, docker system prune.
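If you would rather script the volume cleanup than type it by hand, here is a minimal sketch. It assumes the `docker` CLI is on the PATH and only wraps the two commands mentioned above (`docker volume ls -q` and `docker volume rm`). The helper names `list_volumes`, `remove_volumes` and `docker` are hypothetical, not part of any Docker or Rancher tooling.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def docker(argv: List[str]) -> str:
    """Run a command and return its stdout; raises if the command fails."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


def list_volumes(run: Runner) -> List[str]:
    """Return volume names, one per line of `docker volume ls -q` output."""
    out = run(["docker", "volume", "ls", "-q"])
    return [line.strip() for line in out.splitlines() if line.strip()]


def remove_volumes(names: List[str], run: Runner) -> None:
    """Remove the given volumes in one `docker volume rm` call; no-op if empty."""
    if names:
        run(["docker", "volume", "rm", *names])
```

Calling `remove_volumes(list_volumes(docker), docker)` deletes every volume the daemon reports, so check what `list_volumes(docker)` returns before running it. The runner is passed in as a parameter so the logic can be tried with a fake runner first.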

Thanks! Unfortunately, the recommended docker uninstall command is failing:

```
rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy
```

It seems that something inside the `overlay2/` directory is causing the issue, but I can't stop or kill it.

I’m getting a similar kind of error when attempting to remove /var/lib/rancher
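One possible cause, not confirmed anywhere in this thread: `rm` reports "Device or resource busy" when a directory is still an active mount point, and leftover mounts do not show up as open files in `lsof`. Assuming that is what is happening here, the sketch below reads `/proc/mounts` and prints the `umount` commands for anything still mounted under the two directories, deepest paths first. The function names are made up for illustration, and the script only prints commands rather than running them.

```python
import os
from typing import List


def mounts_under(mounts_text: str, root: str) -> List[str]:
    """Return mount points at or below `root`, deepest first, from /proc/mounts text."""
    root = root.rstrip("/")
    found: List[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # The second field is the mount point; /proc/mounts escapes spaces as \040.
        mount_point = fields[1].replace("\\040", " ")
        if mount_point == root or mount_point.startswith(root + "/"):
            found.append(mount_point)
    # Unmount nested mounts before their parents.
    return sorted(found, key=lambda path: path.count("/"), reverse=True)


def print_umount_commands(mounts_file: str = "/proc/mounts") -> None:
    """Print (but do not run) umount commands for leftover Docker/Rancher mounts."""
    with open(mounts_file) as handle:
        text = handle.read()
    for root in ("/var/lib/docker", "/var/lib/rancher"):
        for mount_point in mounts_under(text, root):
            print(f"sudo umount {mount_point}")


if __name__ == "__main__":
    # /proc/mounts only exists on Linux.
    if os.path.exists("/proc/mounts"):
        print_umount_commands()
```

If this prints nothing, the directories are not mount points and the cause is something else. If it does print paths, unmounting them (after stopping the Docker daemon) and then retrying the `rm -rf` would be the next thing to try.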