I have one host in my environment where I set up a custom stack. After I attempted to remove this stack, the only container within it has been stuck on ‘in progress’. When I click delete I get an internal server error, but I don’t know where to find more details on this error.
I am unable to delete this stack or container. If I run ‘docker ps -a’ on the host, the container doesn’t exist, nor does the image for it.
I’ve since been making some changes to my environment and have removed this host from the system. I’ve attempted to remove the host from my environment within Rancher, but it won’t remove either, as it still appears to be waiting for that stack to be removed.
Any help on how to manually remove this host from my environment would be greatly appreciated.
Denise, after many, MANY painful hours of getting our docker back up after OpenSuSE pushed a docker update that broke our agent connection due to a Go incompatibility, we now have a working docker installation at 1.11.1 versus the 1.9.1 that OpenSuSE refuses to upgrade from.
Now, we cannot connect the host back to the server (they are one and the same: agent is running on the host). I cannot delete the original host and I cannot add a new one. Doing a docker logs on the new agent returns nothing.
There is a process that I cannot kill because it does not exist in Docker, but Rancher thinks it’s still there on the original host config. It will not go away, no matter what I’ve tried (rm -rf of /var/lib/rancher/state, restarting docker, etc.).
We can’t do any work until this gets resolved and we’ve already lost two days trying to get this fixed.
rancher-catalog/templates> docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
e61b78e8d286 rancher/agent:v1.0.2 "/run.sh http://axioh" 2 minutes ago Up 21 seconds tiny_thompson
734443046c3d rancher/server:latest "/usr/bin/s6-svscan /" About an hour ago Created trusting_shaw
rancher-catalog/templates> docker logs --tail 10 e61b
rancher-catalog/templates> docker logs --tail 10 7344
So no idea what is going on with either the server or the agent. No host. It’s been about two hours since I tried to add the host back.
It seems like you have some issues with your Docker daemon, as the docker ps output would typically look something like this. Note how rancher/server shows Up instead of Created.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0af496faf96d rancher/server:v1.2.0-pre2 "/usr/bin/s6-svscan /" 6 days ago Up 6 days 3306/tcp, 0.0.0.0:8080->8080/tcp adoring_hodgkin
Have you tried launching other containers that expose ports like rancher/server? Can you show me the output of those containers?
If your rancher/server isn’t up and running, how did you register rancher/agent? Where did you get the command for it?
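For anyone comparing their own host against the listing above, here is a minimal sketch that prints only the containers whose status does not start with "Up". The function name `list_not_running` is made up for illustration; it assumes the `docker` CLI is on the PATH and relies on `docker ps --format` with the `.Names` and `.Status` fields.

```shell
# Hypothetical helper: list containers that are not in an "Up ..." state.
# Each output line has the form "<name>: <status>".
list_not_running() {
    # Ask docker for "name<TAB>status" per container, including stopped ones.
    docker ps -a --format '{{.Names}}\t{{.Status}}' |
        # Keep only rows whose status column does not begin with "Up ".
        awk -F '\t' '$2 !~ /^Up / { print $1 ": " $2 }'
}
```

Running `list_not_running` on a host like the one earlier in this thread should surface the rancher/server container sitting in Created, which is the symptom being asked about.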
Since we hadn’t heard from you in several days, we’ve been trying to fix it ourselves. Our docker installation is working fine, as we can launch containers by hand from the command line (there were several there that I did not show you in the docker ps listing). Right now, the problem we’re encountering is that the server cannot start mysqld. Since I cannot do a docker exec -it xyz1 bash to the container, I cannot see what the problem is. I’m getting really tired of recreating everyone’s accounts and recreating all of the settings we had set every time there is an issue with rancher.
Containers running with a state that says they’re not running, with no ports, no logs, and no exec does not sound like a Docker that is “working fine”. You continue to choose to run on a distribution that is not well tested or supported in the Docker ecosystem against our advice (with a daemon you compiled yourself now?) and you’re going to continue to have self-inflicted problems because of it.
Wow! How typical of you guys. You wait EONS before replying, forcing me to try whatever I can to fix your stuff. Then, when you do reply, you both make assumptions and snipe/troll at me. You guys are great. Really helpful.
Our docker installation is indeed working. I didn’t post all of the output to try and keep the post brief. Here’s a whole listing of what’s there:
rancher-catalog/templates> docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
02e02bc00ba5 735936678121 "/bin/sh -c 'apt-get " About an hour ago Up About an hour 3838/tcp admiring_ride
601f3f8f1bbf 60edac958e73 "/bin/sh -c 'bash /ap" About an hour ago Up About an hour 3000/tcp sharp_mcclintock
99eef307d4af 39d4364418eb "/init" 8 weeks ago Up 7 days 0.0.0.0:59873->8787/tcp r-Sequencing-EllisRJR_sequencing_1
817ae93f5ccf dnadave/rtg-core:may2016 "/init" 8 weeks ago Up 7 days 0.0.0.0:60938->3000/tcp r-RTGCore-C34001Takeda_RTGCore_1
38ca830ad8bd dnadave/rtg-core:may2016 "/init" 11 weeks ago Up 7 days 0.0.0.0:58805->3000/tcp r-RTGCore-SS_RTGCore_2
9d86e8632f72 dnadave/wettyr:apr2016 "/init" 11 weeks ago Up 7 days 0.0.0.0:60533->3000/tcp r-Wetty_wetty_1
a8c1453867d5 69ca9c4d2639 "/init" 11 weeks ago Up 7 days 0.0.0.0:50586->3000/tcp r-RTGCore-Ellis_RTGCore_1
9aa4292394be e1fc2098433a "/init" 12 weeks ago Up 7 days 0.0.0.0:55566->8787/tcp r-MicroarrayRJR_microarray_1
2e20dffbf9e8 e1fc2098433a "/init" 12 weeks ago Up 7 days 0.0.0.0:53515->8787/tcp r-MicroarrayDave_microarray_4
So, you can see that there are containers running with ports. It’s just your stuff that doesn’t report any logs. Everything else works fine. Everything. It’s just your stuff that isn’t working.
I did not compile the docker myself. You assumed that. But, you are VERY wrong. I’m using one of the other installation sources for docker from OpenSuSE. You can inspect their compilation logs on their website, if you wish.
As I said before, it gets REALLY OLD recreating ALL of our settings EVERY TIME there’s an issue with your stuff. It does not create trust in your user base.
Now, how do I fix this without deleting my rancher-data container???
The docker ps you posted, into a 2 month old topic, showed the server container not running, with no logs and no working exec. The only reasonable thing one can conclude from that, after you say you spent many hours fighting Docker, is that something is still very wrong with your Docker daemon. Compiling it yourself was a question.
Posting a different ps that doesn’t show a server or agent container at all doesn’t tell me anything new. The agent container exercises many more corners of the docker API than running a few basic containers with one port.
You are using a free open-source software project. Nobody is under any obligation to answer you immediately, nor at all. The core engineers offer their time here, often after-hours, to help community users, but when you are consistently rude and demanding to our employees it gives us little incentive to want to help you.
And the problems you’ve had have repeatedly been a result of misconfigured Docker or networking on OpenSUSE and resolved by you changing something external, not an actual bug that resulted in an actual issue with an actual fix into a Rancher repo, so there is usually little we can do to help you.
I’ll say this. I didn’t start out rude. I ended up there because it would be DAYS before you would reply to an issue. I use a lot of open source software in my work and I use a lot of forums for support. Yours is routinely several orders of magnitude longer to get a response from. Most forums I get an answer in minutes to maybe a few hours. Yours can be days to weeks. As far as I can tell, you never responded to the original poster of this question. Not cool.
Then, your continual proposed solution to my problem is to wipe my server and install your preferred distribution. I’m not using some obscure distribution, I’m using one that’s been around since the mid 1990s. It has a large user base. What you ask me to do is not some minor inconvenience, it is a major shift in our IT infrastructure.
So yes, I get very frustrated with you and your team. You ask a lot of me as a user: wiping a server is not a small task. I’m only asking that your software work and for help if it doesn’t.
You are technically correct, it wasn’t until your 4th issue where you started to feel entitled to yell at us for not replying to you for 16 entire hours starting at 6pm Pacific.
Anyway, we can still tell you nothing from what you’ve posted. If the server and agent are running, there are logs for them.
I tried out Rancher over the past week and it doesn’t look like it will be a good solution for me. I have a standard Ubuntu 16.04 server on Digital Ocean, and I’m trying to completely remove rancher and docker. It took some digging on the internet to figure out how to do this, and I’ve finally got it whittled down, but now I can’t finish removing /var/lib/rancher and /var/lib/docker. Here are the outputs I get:
```
rm: cannot remove 'rancher/volumes': Device or resource busy
```
I read that using this command might help track down the running processes so they can be killed, but no dice:
```
lsof +D ./
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/c24324d8b667
Output information may be incomplete.
lsof: WARNING: can't stat() nsfs file system /run/docker/netns/default
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 16667 blakers757 cwd DIR 253,1 4096 267276 .
lsof 27938 blakers757 cwd DIR 253,1 4096 267276 .
lsof 27939 blakers757 cwd DIR 253,1 4096 267276 .
```
When I try to kill the processes by pid, it fails.
docker ps shows no running containers:
```
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
```
When I try to remove /var/lib/docker, I get the following:
```
sudo rm -rf /var/lib/docker
rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy
```
Whatever is running inside this `overlay2` folder seems to be to blame.
Just wondering if you all have any ideas, thanks.
/var/lib/docker is where all state and volumes for Docker are stored. If you’re trying to remove Docker, uninstall it first so that the files are no longer in use by the daemon. If you just want to delete the volumes, run `docker volume ls` and then `docker volume rm` on it/them. Or, on a new enough daemon, use `docker system prune`.
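The volume cleanup described above could be scripted roughly as follows. This is only a sketch: the function name `remove_unused_volumes` is invented here, it assumes the daemon is still running, and it limits itself to dangling volumes (those no container references) rather than everything `docker volume ls` shows.

```shell
# Hypothetical helper: remove every volume that no container references.
remove_unused_volumes() {
    # -q prints names only; the dangling=true filter skips volumes still in use.
    docker volume ls -q -f dangling=true | while read -r vol; do
        docker volume rm "$vol"
    done
}

# On daemons that have it, "docker system prune" is the broader alternative
# mentioned above; it is left as a comment here because it asks for confirmation.
```

Call `remove_unused_volumes` before uninstalling Docker; once the daemon is gone, the `docker volume` commands have nothing to talk to.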
Thanks! Unfortunately, the recommended docker uninstall command is failing however:
```
rm: cannot remove '/var/lib/docker/overlay': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/5b04c89cac02bfebc6de9355808c905e149dd7cb2f324952750b49aa93393ef4/merged': Device or resource busy
rm: cannot remove '/var/lib/docker/overlay2/4a17da45150a3e24ecef6babb933872f9aa403f3a072d5d37aff3b71b9eb936a/merged': Device or resource busy
```
It seems that something inside the `overlay2/` directory is causing the issue, but I can't stop or kill it.
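One possible explanation, not confirmed anywhere in this thread, is that those `merged` directories are still mounted rather than held open by a process, which would also explain why `lsof` and `kill` turn up nothing. A minimal sketch for checking that is below. The function name `list_mounts_under` is made up for illustration, and it assumes a Linux host where the mount table can be read from /proc/mounts.

```shell
# Hypothetical helper: print mount points at or below a directory, deepest first.
# $1 = directory to check, $2 = mount table to read (defaults to /proc/mounts).
list_mounts_under() {
    prefix=$1
    table=${2:-/proc/mounts}
    # Field 2 of each mount table line is the mount point. Match the directory
    # itself or anything beneath it, then reverse-sort so nested mounts come first.
    awk -v p="$prefix" '$2 == p || index($2, p "/") == 1 { print $2 }' "$table" |
        LC_ALL=C sort -r
}
```

If `list_mounts_under /var/lib/docker` prints anything, each path it lists would need a `sudo umount <path>` before `rm -rf` can succeed. If it prints nothing, this guess is wrong and something else is keeping the directories busy.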