Hi all,
I tried to set up an HA configuration of Rancher, but I have not yet been able to reach a stable deployment. Each time I struggle with different errors and outcomes. I first installed a MariaDB Galera cluster and created a launch script by following the official docs. After that I copied the script to three different servers and ran it. I tried Docker versions 1.10, 1.11 and 1.12, but the outcomes are similar. I get many of these errors when I set up the cluster for the first time:
Error response from daemon: Unable to remove filesystem for 34ec156e02466420d5781d72e44ec036a9da2e5e37a58547be8886195bc2904a: remove /var/lib/docker/containers/34ec156e02466420d5781d72e44ec036a9da2e5e37a58547be8886195bc2904a/shm: device or resource busy
I also see these messages, but I think they mean that the HA cluster is waiting for all nodes to start. (It is not clear to me how to tell when a node is ready so that I can start the next one, or whether I should start all of them together the first time I set up HA.)
Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping
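To tell when a node is ready, one thing I could do is poll the ping URL from that message before starting the next node. Below is a minimal sketch, assuming that an HTTP 200 response from that URL means the server on the node is up (that is my guess, not something I found in the docs). The function name wait_for_ping and the timeout values are made up for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_for_ping(url, timeout=300.0, interval=5.0,
                  opener=urllib.request.urlopen,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass.

    Returns True if the endpoint answered with 200, False on timeout.
    `opener`, `sleep` and `clock` are injectable so the loop can be tested
    without a real server.
    """
    deadline = clock() + timeout
    while True:
        try:
            with opener(url, timeout=interval) as response:
                # Assumption: 200 means the server is ready.
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, timeout, HTTP error: treat as "not ready yet".
            pass
        if clock() >= deadline:
            return False
        sleep(interval)


# Example (URL taken from the log message above):
# ready = wait_for_ping("http://172.17.0.1:18080/ping", timeout=600)
```

With something like this I could start the nodes one at a time and only move on once the previous one responds, if that is the intended procedure.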
Finally, the only time I was able to set up the cluster, HA failed after a couple of hours due to deadlocks:
Failed to do cluster check in" component=manager err="Error 1180: Got error 35 “Resource deadlock avoided” during COMMIT
And then I wasn’t able to restore the connectivity of the cluster.
Is there a specific configuration required for the database cluster (for example, should all Rancher nodes write to the same database node)? Also, has anyone else run into the "Unable to remove filesystem" error? I am using Red Hat 7 with devicemapper. It looks like a node keeps deleting and recreating containers indefinitely.
I am asking here for help before giving up on HA, since I cannot reach a stable deployment after many tries (each time I drop the database and clean up the Rancher containers and files to be sure I start from a clean environment).
Regards