Hi,
Rancher server relies on a MySQL database.
The documentation seems to implicitly advise relying on an external MySQL database, perhaps Amazon RDS, to get the HA feature.
Another option is to have a MySQL container running on the Rancher cluster.
Why rely on a single MySQL database, which seems to be a single point of failure?
Why not (time and knowledge aside) rely on another, more failure-resilient database such as Cassandra (at the cost of sacrificing consistency)?
Or do you plan to couple the MySQL instance with your upcoming GlusterFS/Convoy/volume management feature?
While not fashionable, MySQL is stable, easy to install, backup, restore, replicate, migrate, ad-hoc query, etc. It is comparatively easy for sysadmins to manage and debug, and available as-a-service virtually everywhere for those that don’t want to.
Cassandra and most other options offer few to none of those, and that is before getting into the coding challenges of fitting highly relational data, searchable and sortable by almost everything, into their data models and eventual consistency.
You can use multi-master (e.g. Percona XtraDB Cluster) or standard master/slave with manual or automated failover to handle the loss of a database server. We will probably manage it ourselves as part of self-managed scaling of Rancher.
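To make the automated failover idea a bit more concrete, here is a minimal sketch in Python. It is not how Rancher or Percona does it; the names `tcp_probe` and `pick_database` and the host names are made up for illustration, and a real master/slave failover would also have to promote the replica, which is left out here.

```python
import socket


def tcp_probe(host, port=3306, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (3306 is MySQL's default port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_database(candidates, is_healthy=tcp_probe):
    """Return the first candidate that passes the health check.

    `candidates` is an ordered list of hypothetical database hosts,
    preferred server first.
    """
    for host in candidates:
        if is_healthy(host):
            return host
    raise RuntimeError("no reachable database server")


if __name__ == "__main__":
    # Hypothetical host names; the fake check pretends the first one is down.
    servers = ["db-primary.example.com", "db-replica.example.com"]
    print(pick_database(servers, is_healthy=lambda h: h != "db-primary.example.com"))
```

With a multi-master cluster any healthy node can take writes, so a selection step like this may be all the client side needs. With master/slave it only covers the "where do I connect now" part.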
Thanks for this answer.
I understand your position on MySQL.
I hope Rancher Labs will later provide an out-of-the-box solution to this SPOF problem.
I understand that a failure in the Rancher server blocks deployments and changes to stacks. But does it stop agent networking or containers from continuing to work?
One reason I am looking to use Rancher over Kubernetes is its operational simplicity. Backing up the MySQL directory or an external MySQL database is very simple and familiar these days. Pair that with the fact that starting a Rancher server is literally one Docker run command and it is really hard to beat the disaster recovery scenario.
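As a rough illustration of how simple the backup side can be, here is a small Python sketch that builds a `mysqldump` command line. The host, user, database name and output file are placeholders (the database name `cattle` is an assumption, check your own setup), and credentials are assumed to come from a MySQL option file such as `~/.my.cnf` rather than the command line.

```python
import shlex


def build_backup_command(host, user, database, out_file):
    """Return the argv list for a mysqldump of one database."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking them
        f"--result-file={out_file}",
        database,
    ]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    cmd = build_backup_command("db.example.com", "rancher", "cattle", "rancher-backup.sql")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Restoring is the reverse: feed the resulting SQL file to the `mysql` client against the new database server.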
That said, I tested this recently (stopping the Rancher server and moving it to a new host) and it does still have some kinks to be worked out. The main issue I had was reconnecting the agents to the new server. It was a manual process for each agent and ended up creating a new agent in the UI rather than re-using the previous one.
First off, it is possible to do a complete HA setup, but it is fairly involved. We will make this simpler eventually. But the single container is “good enough” for many uses given the odds of failure, the impact while it’s down and the ease of recovery (especially if using a database-as-a-service provider like RDS).
While the rancher/server container(s) is/are down or unreachable:
Existing containers on existing hosts continue to run
They can continue to communicate across the managed network (it is point-to-point between Network Agents on each host, the management server is not in the datapath)
Health checks will not be reported up (since nobody is listening…) and failing containers will not be rescheduled or proactively pulled out of DNS/load balancers. HAProxy will still detect failed backends and stop sending traffic to them though.
New containers can be added to existing hosts through the native docker run directly on the host, but (I think? should test this) will not fully work with the managed network since other hosts won’t know where to find the new container without a config update from the server.
When it comes back up:
Agents will reconcile their state, adding any containers that exist on the host but not in Rancher and removing vice versa.
Containers with failed health checks, and containers on failed hosts, will be rescheduled as usual
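The reconciliation step described above boils down to two set differences. The following Python sketch shows the idea only; it is not the agent's actual code, and the function name and container IDs are invented for the example.

```python
def reconcile(on_host, in_rancher):
    """Compare what runs on a host with what the server has on record.

    Returns (to_add, to_remove):
      to_add    - containers present on the host but unknown to the server
      to_remove - containers the server knows about that are gone from the host
    """
    on_host, in_rancher = set(on_host), set(in_rancher)
    to_add = sorted(on_host - in_rancher)
    to_remove = sorted(in_rancher - on_host)
    return to_add, to_remove


if __name__ == "__main__":
    # Hypothetical container IDs: "c3" was started with docker run while the
    # server was down, "c1" died in the meantime.
    to_add, to_remove = reconcile(on_host={"c2", "c3"}, in_rancher={"c1", "c2"})
    print("add to Rancher:", to_add)
    print("remove from Rancher:", to_remove)
```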
If you expect rancher/server to change IPs (rather than coming back up with the same IP after e.g. a failure of the hardware it runs on) it would be much better to use a DNS entry for it that can be updated in one place. If the agents are registered with an IP there’s no way for them to figure out the new IP when it changes. You can configure the registration URL (for future hosts) in the Admin tab -> Host Registration.
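Here is a small Python sketch of why the DNS name helps: if the registration URL holds a host name, the address is looked up again on every connection attempt, so updating the DNS record is enough. The URL, the function name and the fake resolver are assumptions for the example and say nothing about how the agent is really implemented.

```python
import socket
from urllib.parse import urlsplit


def resolve_server(registration_url, resolver=socket.gethostbyname):
    """Return (ip, port) for the server named in the registration URL.

    The lookup happens on every call, so a changed DNS record is picked up
    without touching the agent's configuration.
    """
    parts = urlsplit(registration_url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return resolver(parts.hostname), port


if __name__ == "__main__":
    # Fake DNS table standing in for a real resolver; all values are hypothetical.
    dns = {"rancher.example.com": "10.0.0.5"}
    url = "http://rancher.example.com:8080"
    print(resolve_server(url, resolver=dns.__getitem__))
    dns["rancher.example.com"] = "10.0.0.9"  # server moved, only DNS was updated
    print(resolve_server(url, resolver=dns.__getitem__))
```

An agent registered with a bare IP has nothing to look up, which is why it cannot follow the server to a new address.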
I totally agree with clescot’s viewpoint. Originally I was very impressed by the Docker OS concept that RancherOS provides… until I saw MySQL. I have had too many nightmares with MySQL. I won’t roll out Rancher to production until MySQL is gone. At this moment, I prefer etcd.
Come on guys… MySQL is clearly the right choice here. AWS, Google Cloud, Rackspace, etc. all have a PaaS MySQL offering that makes deploying a reliable Rancher server extremely simple. The fact that you haven’t used MySQL much shouldn’t drive the architecture of Rancher.