[Solved] Can't remove dead Rancher Infra containers due to `Unable to remove filesystem for ... device or resource busy` errors

I noticed that my Rancher cluster has a number of failed Infrastructure Containers, alongside a number of similarly-named Infrastructure Containers which are working. I deleted all of these failed containers yesterday, but they have reappeared today.

Should I be expecting to see Infrastructure Containers respawning on occasion? Is this normal?

See this screenshot:

The logs for each of these containers are blank (under Infrastructure > Containers, click the ‘…’ menu for a container, then ‘View Logs’). I was expecting some output, or at least an error. However, “View Logs” shows nothing, almost as if the Rancher Server cannot read the logs of these remote containers.


Ah, I cannot remove the containers because Docker fails with `Error response from daemon: Unable to remove filesystem for ... device or resource busy`:

# docker rm e2fc2f7cf1d4
Error response from daemon: Unable to remove filesystem for e2fc2f7cf1d4605756d42f2ffbd9cb01e29c3ba67012905bc7ae06273ca2d720: remove /var/lib/docker/containers/e2fc2f7cf1d4605756d42f2ffbd9cb01e29c3ba67012905bc7ae06273ca2d720/shm: device or resource busy
#

Rancher did try to remove them when I hit the “Delete” button for each of these dead containers, and they disappeared from the GUI even though they were not actually removed from the host. They then reappeared in the GUI about 20 minutes later.

Has anyone else had any luck resolving the “device or resource busy” error?

I was able to remove these containers with:

  1. Find the Dead containers and force-remove them ($CONTAINER below stands for the container ID in the first column). Note that Docker reports an error, but the container is actually removed a moment later.

     node1:~ # docker ps -a | grep rancher | grep Dead
     5a13396e0c0d        rancher/healthcheck:v0.1.0          "/.r/r /tini -- healt"   15 hours ago        Dead                                                         r-healthcheck-healthcheck-1-9ff9fa3c
     node1:~ # docker rm -f $CONTAINER
     Error response from daemon: Unable to remove filesystem for 5a13396e0c0df23eace402a26e2a7744f3ef06c0a3fb9207b908c502425b5d52: remove /var/lib/docker/containers/5a13396e0c0df23eace402a26e2a7744f3ef06c0a3fb9207b908c502425b5d52/shm: device or resource busy
    
  2. Wait. If you run tail -f /var/log/messages in the meantime, it prints several more messages like
     Error response from daemon: Unable to remove filesystem for 5a13396e0c0df23eace4...

  3. Check again, and the container is actually gone!

     node1:~ # docker ps -a | grep rancher | grep Dead
     node1:~ #
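
The three steps above can be sketched as a small script. This is only a rough sketch under assumptions: the function names list_dead_rancher and remove_dead_rancher and the WAIT_SECONDS variable are hypothetical helpers I made up for illustration, and the 30 second default wait is a guess, since all I observed was that the container disappeared "in a moment".

```shell
#!/bin/sh
# Sketch only. list_dead_rancher, remove_dead_rancher and WAIT_SECONDS are
# hypothetical names for illustration, not part of Docker or Rancher.

# Print the IDs of Dead containers whose `docker ps -a` line mentions rancher.
# Same filtering as: docker ps -a | grep rancher | grep Dead
list_dead_rancher() {
    docker ps -a | grep rancher | grep Dead | awk '{print $1}'
}

# Force-remove each Dead container, wait, then check whether any remain.
remove_dead_rancher() {
    for id in $(list_dead_rancher); do
        # docker rm -f may report "device or resource busy" here;
        # ignore the exit status and re-check after the wait instead.
        docker rm -f "$id" || true
    done

    # Assumed wait time; adjust to taste.
    sleep "${WAIT_SECONDS:-30}"

    remaining=$(list_dead_rancher)
    if [ -n "$remaining" ]; then
        echo "still Dead: $remaining"
        return 1
    fi
    echo "no Dead rancher containers left"
}

# Usage (run on the affected host):
#   remove_dead_rancher
```

The script deliberately ignores the exit status of docker rm -f and relies on the second listing to decide whether the cleanup worked, which mirrors what I did by hand.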
    

I couldn’t find any mountpoints for these containers using mount, so hopefully there aren’t any stale files leftover.
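
One caveat on that check: mount only shows the mount table of the shell's own mount namespace, so a container's shm mount that is still held in another process's namespace would not appear there. A hedged sketch for looking further is to search each process's mountinfo file for the container ID. The function name find_mount_holders is hypothetical, and the optional second argument (an alternative proc root) exists only to make the function easy to try out; I have not confirmed that this was the cause on my hosts.

```shell
#!/bin/sh
# Sketch only. find_mount_holders is a hypothetical helper name.

# Print the PIDs whose mount table mentions the given container ID.
#   $1: container ID (full or partial)
#   $2: proc root to search, defaults to /proc (assumed parameter for testing)
find_mount_holders() {
    id=$1
    proc_root=${2:-/proc}
    # grep -l prints the name of each mountinfo file that contains the ID;
    # awk then extracts the PID directory name from that path.
    grep -l "$id" "$proc_root"/[0-9]*/mountinfo 2>/dev/null |
        awk -F/ '{print $(NF-1)}'
}

# Usage (as root on the affected host):
#   find_mount_holders e2fc2f7cf1d4
```

If this prints any PIDs, those processes still reference a mount under the container's directory, which could explain the "device or resource busy" message.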

All in all, this looked like a complicated problem, but the workaround turned out to be easy, although I’m never a fan of using a --force flag for routine tasks.

For the record, I’m using Docker 1.12 on CentOS 7. /var/lib/docker is backed by LVM in direct-lvm mode (not loop-lvm mode), and my backing filesystem is XFS. The whole system is fairly vanilla.

This issue is likely outside of Rancher. According to the Docker issue tracker, there are many possible causes of `Unable to remove filesystem for ... device or resource busy` errors. Here’s one such ticket: https://github.com/docker/docker/issues/27381
