Can you describe in more detail what "stopped working" means? Is Rancher crashing, with the container continuously starting -> stopping -> starting etc? Can you post the log to a gist or some other service that lets you paste logs (pastebin/0bin etc)?
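If it helps to answer the restart question, here is a rough sketch that pulls the restart-related fields out of `docker inspect`. It assumes the Docker CLI is on the PATH, and the container name "rancher" is only a placeholder for whatever yours is called.

```python
import json
import subprocess


def summarize_inspect(inspect_entry):
    """Pull the restart-related fields out of one `docker inspect` entry."""
    state = inspect_entry.get("State", {})
    return {
        "status": state.get("Status"),
        "restarting": state.get("Restarting", False),
        "exit_code": state.get("ExitCode"),
        "oom_killed": state.get("OOMKilled", False),
        "restart_count": inspect_entry.get("RestartCount", 0),
    }


def inspect_container(name):
    """Run `docker inspect <name>` and summarize the first entry it returns."""
    out = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return summarize_inspect(json.loads(out)[0])


if __name__ == "__main__":
    # "rancher" is a placeholder container name (assumption, not from the thread).
    print(inspect_container("rancher"))
```

A restart count that keeps climbing, together with a non-zero exit code, would point to a crash loop rather than a one-off stop.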
Well, Rancher was working until last Friday. I changed nothing on it (I know that is the default end-user answer), and I tried accessing it again this weekend. That is when I saw that I was getting a DNS_NOT_FOUND error in the browser.
I restored Rancher from Friday's backup, but I still got the same error. Then I returned to the last running state and updated Rancher to the latest stable Docker image. Again, the same result.
After the startup of the container, and only if I’m fast, I can access the login page of my rancher. But then it suddenly restarts.
In the meantime I checked the certificates, and they are still valid until December. I also checked the permissions of the folders, as these are mounted to the host system. They have not changed compared with when everything was working. I’m guessing something happened inside the /opt/rancher/ folder, but I cannot find what.
When I run docker logs on the container, I get more information than what is actually in the logs I find when I look inside the container with docker exec.
If the name you’re pointing at is not resolving, then no amount of tinkering with Rancher is going to fix that. The DNS record you’re trying to reach needs to resolve to the IP(s) the server containers are running on.
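A quick way to check this from the machine where the browser fails is a small resolver test like the sketch below. The hostname and IP in the usage part are placeholders I made up, so swap in your own Rancher URL and server address.

```python
import socket


def resolve(hostname, resolver=socket.getaddrinfo):
    """Return the sorted list of IPs the hostname resolves to, or [] on failure."""
    try:
        infos = resolver(hostname, None)
    except socket.gaierror:
        # The name does not resolve at all (the DNS_NOT_FOUND case).
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def points_at(hostname, server_ips, resolver=socket.getaddrinfo):
    """True if the name resolves and every resolved IP is one of the server IPs."""
    resolved = resolve(hostname, resolver)
    return bool(resolved) and set(resolved) <= set(server_ips)


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with your own name and server IP.
    name = "rancher.example.com"
    print(resolve(name))
    print(points_at(name, ["203.0.113.10"]))
```

An empty list means the record is missing or the resolver cannot be reached, which has to be fixed on the DNS side before looking at Rancher itself.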
I did some more testing today and found out that the problem is inside my etcd. The strange thing, however, is that I do not understand why it still fails when I restore a previous version.
The easiest solution for me at the moment would be to set up a new Rancher, but in that case I need to be able to reconnect the existing clusters, which were created with Rancher, to the new Rancher without losing the containers already running on these clusters.