I seem to be hitting the same problem that aschrijver described in this thread. The difference is that I cannot easily just “nuke it and start over”, because the server is deployed at a remote location. (I chose RancherOS on purpose because it looked like one of the systems least likely to “randomly break”; apparently that was not a good assumption.)
A very quick summary of the problem: a few minutes after a reboot, the following commands produce the output below. Pay special attention to the “docker” container, which has been restarting over and over for the last 10 minutes, with completely uninformative log files.
[rancher@rancher ~]$ docker container ls -a
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.38/containers/json?all=1: dial unix /var/run/docker.sock: connect: permission denied
[rancher@rancher ~]$ sudo docker container ls -a
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
[rancher@rancher ~]$ sudo system-docker container ls -a
CONTAINER ID   IMAGE                              COMMAND                  CREATED          STATUS                          PORTS   NAMES
f5b1b2a44316   rancher/os-console:v1.5.3          "/usr/bin/ros entr..."   10 minutes ago   Up 10 minutes                           console
c531010aeef9   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   10 minutes ago   Created                                 all-volumes
a58cd29a685e   rancher/os-console:v1.5.3          "/usr/bin/ros entr..."   10 minutes ago   Exited (255) 10 minutes ago             reboot
683c7cd8bbcf   rancher/os-docker:18.06.3          "ros user-docker"        20 minutes ago   Restarting (1) 26 seconds ago           docker
b0c20bc40821   rancher/os:v1.5.3                  "/bin/ros install ..."   21 minutes ago   Exited (255) 21 minutes ago             once_os-upgrade_1
99471dfe162e   rancher/os-acpid:v1.5.3            "/usr/bin/ros entr..."   3 weeks ago      Exited (255) 3 weeks ago                poweroff
e2c152223480   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Exited (0) 9 minutes ago                preload-user-images
3553495a024a   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Exited (1) 10 minutes ago               subscriber
9a302ec5e077   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Exited (0) 10 minutes ago               cloud-init-execute
a18f9bf8b9da   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Up 10 minutes                           ntp
9657d19916b2   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Up 10 minutes                           network
1d3e662bcea8   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Up 10 minutes                           udev
1fb30afb259d   rancher/container-crontab:v0.4.0   "container-crontab"      4 weeks ago      Up 10 minutes                           system-cron
e4a7cd582059   rancher/os-logrotate:v1.5.3        "/usr/bin/entrypoi..."   4 weeks ago      Exited (0) About an hour ago            logrotate
2c9cc048d474   rancher/os-syslog:v1.5.3           "/usr/bin/entrypoi..."   4 weeks ago      Up 9 minutes                            syslog
6197f00d9655   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Exited (0) 10 minutes ago               udev-cold
1398048da474   rancher/os-acpid:v1.5.3            "/usr/bin/ros entr..."   4 weeks ago      Up 10 minutes                           acpid
fb53ba4934f4   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Created                                 user-volumes
9ff391db54d5   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Created                                 system-volumes
00c89d888a7e   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Created                                 media-volumes
66150408f9a3   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Created                                 container-data-volumes
ce43b7535809   rancher/os-base:v1.5.3             "/usr/bin/ros entr..."   4 weeks ago      Created                                 command-volumes
[rancher@rancher ~]$
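For anyone who wants to reproduce the check without reading through the whole listing, here is a minimal sketch of how the restarting service can be picked out. The helper name restarting_containers and the "|" separator are my own choices for illustration, and the commands in the trailing comments are only suggestions that I am assuming behave on system-docker the same way they do on a regular Docker CLI.

```shell
#!/bin/sh
# Print the names of containers whose status starts with "Restarting".
# Expects "name|status" lines on stdin (hypothetical helper, not part of RancherOS).
restarting_containers() {
    awk -F '|' '$2 ~ /^Restarting/ { print $1 }'
}

# Possible usage on the affected host (assumption: system-docker accepts the
# same --format, logs and inspect options as the regular docker CLI):
#   sudo system-docker ps -a --format '{{.Names}}|{{.Status}}' | restarting_containers
#   sudo system-docker logs --tail 50 docker
#   sudo system-docker inspect docker
```

On the listing above this would report only the docker service, which matches what I see by eye.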
I tried using ros engine to toggle Docker to docker-18.09.8 and back to docker-18.06.3-ce, but that didn’t fix it.
How can I debug the user-docker service and, if nothing else, reset it to factory settings? (Running sudo rm -rfv /var/lib/rancher/cache deleted some files, but did not solve the problem.)