Rancher HA unstable

dave_s · September 27, 2016, 6:16pm

Hi all,
I tried to set up an HA configuration of Rancher, but I have not yet been able to reach a stable deployment. Each time I struggle with different errors and outcomes. I first installed a MariaDB Galera cluster and created a script by following the official docs. I then deployed the script on three different servers and ran it. I tried different Docker versions (1.10, 1.11 and 1.12), but the outcomes are similar: I get many of these errors when I set up the cluster for the first time:

Error response from daemon: Unable to remove filesystem for 34ec156e02466420d5781d72e44ec036a9da2e5e37a58547be8886195bc2904a: remove /var/lib/docker/containers/34ec156e02466420d5781d72e44ec036a9da2e5e37a58547be8886195bc2904a/shm: device or resource busy

I also see the message below, which I think means that the HA cluster is waiting for all nodes to start. It is not clear when a node is ready so that I can start the next one, or whether I should start all of them together the first time I set up the HA:

Can not launch agent right now: Server not available at http://172.17.0.1:18080/ping
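
To make the "when is a node ready" question concrete: one way to check would be to poll the ping URL from the message above before starting the next node. This is only a minimal sketch in Python. It assumes the endpoint answers with HTTP 200 once the server is up, which I have not confirmed is how Rancher signals readiness, and the function name wait_for_ping and its retry parameters are hypothetical.

```python
import time
import urllib.error
import urllib.request


def wait_for_ping(url, attempts=30, delay=5.0, fetch=None, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    Returns True if the server answered in time, False otherwise.
    `fetch` and `sleep` can be replaced so the loop is testable offline.
    """
    if fetch is None:
        def fetch(u):
            # Assumption: a plain GET that returns 200 means the server is up.
            with urllib.request.urlopen(u, timeout=5) as resp:
                return resp.status

    for attempt in range(1, attempts + 1):
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet, try again
        if attempt < attempts:
            sleep(delay)
    return False
```

Calling wait_for_ping("http://172.17.0.1:18080/ping") on a node would block until the server answers or the attempts are used up, and only then would I start the next node.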

And finally, the only time I was able to set up the cluster, the HA failed after a couple of hours due to deadlocks:

Failed to do cluster check in" component=manager err="Error 1180: Got error 35 “Resource deadlock avoided” during COMMIT

And then I wasn’t able to restore the connectivity of the cluster.

Is there a specific configuration required for the database cluster (for example, should all Rancher nodes write to the same database node)? Also, has anyone run into the "Unable to remove filesystem" error? I am using Red Hat 7 with devicemapper. It looks like a node keeps deleting and recreating containers indefinitely.

I am asking here for help before giving up on HA, since I cannot reach a stable deployment after many tries (every time, I drop the database and clean up the Rancher containers and files to be sure I start from a clean environment).

Regards
