How to find out the current leader in an HA cluster

How can I find out which server is the current leader in an HA cluster? A plain text response would be best, so HAProxy or something similar can check it.

We are planning to add a metadata service that containers can use to query runtime metadata about your service. One piece of that information would be the “leader” of a service. It is scheduled to be added in our 1.0 release (https://github.com/rancher/rancher/issues/1274).

I’m not sure if we’re talking about the same thing.

Just to clarify, I would like to be able to hit some sort of status page for an HA cluster of “rancher server” instances and find out which one is currently leading.

As an example, marathon uses this approach: https://mesosphere.github.io/marathon/docs/rest-api.html#get-/v2/leader
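
For reference, something like this is roughly what I have in mind (a sketch against Marathon’s API; the hostnames are made up):

```
# Ask Marathon which node currently leads; the response is JSON like {"leader":"marathon-1.example.com:8080"}
curl -s http://marathon-1.example.com:8080/v2/leader

# A proxy health check could then compare that answer to the node being probed
curl -s http://marathon-1.example.com:8080/v2/leader | grep -q 'marathon-1.example.com:8080' && echo "this node is the leader"
```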

Ah, I see. You want to know which node is the leader of a Rancher/Server multi-node setup. Rancher actually does not employ a leader or master/slave architecture where the non-leaders would proxy requests to the leader. Any Rancher server instance will accept incoming API requests and queue them in the MySQL DB. A “ProcessHandler” then picks up the request and executes it. We currently rely on Redis (event publishing) and Zookeeper (txn handling) as separate microservices that are needed for a multi-node setup, but we are looking into removing Zookeeper to keep installation simpler.
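
For a load-balancer health check, that means you can treat any node that answers the API as healthy; a minimal sketch (assuming the default port 8080, the /v1 API root, and access control disabled):

```
# There is no leader to single out: every Rancher server that returns 200 can take traffic.
# Hostnames and port here are placeholders for your own setup.
for host in rancher-1 rancher-2 rancher-3; do
  curl -s -o /dev/null -w "$host: %{http_code}\n" "http://$host:8080/v1"
done
```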

Interesting. It seemed that I was only able to get anything to work when I pointed the proxy at the current master. The interface came up, but was not fully responsive. I’ll have to try that again.

Should Redis be set up clustered, or standalone on each machine? The docs don’t say specifically, so I assumed a standalone instance on each machine.

Zookeeper is a pain point of the installation. It’s difficult to get right, and it took me many tries when I was evaluating Mesos. Luckily, by the time I started using Rancher I already had my cloud-init files ready to go for a cluster! :smile:

We just recently documented this process, but again, we are in the middle of making our multi-node setup a lot easier (i.e. removing Zookeeper). If you have issues hitting the other nodes, then it’s definitely a problem on our side and we’ll look into it.

@willchan, the trick here isn’t to add/remove Zookeeper, or Redis, or MySQL. The trick is to roll all of those into the Rancher server image, so all I need to do is start multiple Rancher server containers, pass config that tells them how to find the others, and they all do the right thing.

Look at how etcd works: I just tell it about other etcd servers, and it picks it up from there.
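
To illustrate what I mean, this is roughly how an etcd cluster is bootstrapped (a sketch; names and addresses are just examples):

```
# Each member is handed the full membership up front via --initial-cluster;
# etcd takes care of leader election and coordination from there.
etcd --name node1 \
  --initial-advertise-peer-urls http://10.0.2.5:2380 \
  --listen-peer-urls http://10.0.2.5:2380 \
  --listen-client-urls http://10.0.2.5:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.2.5:2379 \
  --initial-cluster node1=http://10.0.2.5:2380,node2=http://10.0.2.6:2380,node3=http://10.0.2.7:2380 \
  --initial-cluster-state new
```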

Any reason not to put Redis and MySQL and Zookeeper (if you still need it) into the Rancher server container, and then pass it something like -e rancher_servers=10.0.2.5,10.0.2.6,10.0.2.7 to tell it how to find all of the servers?
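
In other words, something like this; completely hypothetical, since rancher_servers is just the flag I’m proposing, not something that exists today:

```
# Hypothetical: run the exact same command on each of the three hosts, with MySQL,
# Redis and (if still needed) Zookeeper bundled inside the rancher/server image.
docker run -d --restart=always -p 8080:8080 \
  -e rancher_servers=10.0.2.5,10.0.2.6,10.0.2.7 \
  rancher/server
```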

@deitch, we are planning to do what you’ve described. We will fold the basic services like Redis, MySQL (we’ve done this already), and Zookeeper (we are planning to remove it) into a single Rancher container for easier installation.

Great. Definitely on the same page. Then all you need to do is make sure the servers know about each other and have the right tokens. Is there a sense of ETA / version target?

Is there a page anywhere that describes how these services are used? You have the web service (OK, makes sense) and MySQL (the database), but also Redis and ZK, and the HA doc describes using websockets-proxy as well.

What is the purpose of each component? How are they connected? And how does it function when there is just one server vs. multiple?

@deitch, we do need to put together a diagram that shows all of the moving parts. We have been holding off in anticipation of it changing. To answer your question, though: there are the main Rancher instances, which run the Java processes and provide the APIs to drive everything else. There is the MySQL database, which is the centralized datastore; all state is stored in the database. Then there are the websocket-proxy and go-machine-service. The websocket-proxy provides the endpoint that compute nodes reach over websocket connections to deliver logs, stats, etc. The go-machine-service is the lightweight endpoint that wraps docker-machine for server provisioning.

In a single-container setup, we run Rancher, MySQL, go-machine-service, and websocket-proxy inside the container. We manage Rancher and MySQL with the s6 process manager, and Rancher starts the other two services. Rancher in this setup also uses an in-memory lock manager and pub-sub mechanism.

In a multi-node setup we need to break the services up a bit, for various reasons. Because we no longer share memory between the Rancher instances, we need Redis for pub-sub and Zookeeper for lock management. The websocket-proxy needs to be pulled out of the container because it doesn’t function properly behind a load balancer. Go-machine-service also gets pulled out of the container for similar reasons.

@cloudnautique thanks for the detailed explanation. Some thoughts and feedback:

First, please please do not hold off publishing diagrams in anticipation of things changing. Things will always be changing. Even more, having those text descriptions and diagrams helps your users understand how you work; it not only allows them (that’s us) to be more productive, but lets us give input in ways you might not have thought of. The Cathedral and the Bazaar, right?

To get into more detail: an HA structure that has two Rancher servers but a single MySQL just pushes the problem further down the line. Sure, I can then do all sorts of clustering/HA things with MySQL (log shipping, DRBD, whatever), but that is yet another problem to manage. Better that you keep it in your image (as with the single server) and have the rancher-server container automatically handle all of the replication/clustering/HA/failover.

I would question whether MySQL is the best solution at all, but I haven’t looked deeply into your source code; it depends on how you are using it and on your structured data. But you already have a replicated in-memory data store, albeit one used as a pub-sub mechanism (i.e. Redis), and it can be given on-disk persistence. It would also make backups far easier in multiple ways.
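
(For what it’s worth, turning on persistence for Redis is a one-liner; a sketch, with the intervals and paths picked arbitrarily:)

```
# Append-only file gives durable, replayable writes; the snapshot rule keeps a
# periodic RDB dump as well. The values here are arbitrary examples.
redis-server --appendonly yes --appendfsync everysec --save 900 1 --dir /data
```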

I took a good look today at how Docker Swarm works. They basically use Consul/ZK/etcd for sync and coordination between the Swarm managers… and that is it. fleetd does the same thing. I don’t love their “run it on the actual host” design, as opposed to inside a container, but I bet I could get it to work that way as well.
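
For the curious, this is more or less all it takes with standalone Swarm; a sketch from memory, so treat the addresses and exact flags as approximate:

```
# Each manager is only pointed at the discovery backend (Consul here); the managers
# elect a primary among themselves through that backend.
docker run -d -p 4000:4000 swarm manage -H :4000 \
  --replication --advertise 10.0.2.5:4000 \
  consul://10.0.2.9:8500
```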

If I had time (something about consulting clients actually asking for the deliverables they pay for), I would take a stab at this myself. Happy to be part of the discussion, though (as I am right now).

As it stands now, I am recommending to people not to bother with HA. Back up the database regularly (as in every few minutes) and monitor the Rancher server well. If the container fails, Docker will restart it; only if the host fails is there an issue.
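
Concretely, something along these lines (a sketch; the container name, database name, credentials and schedule are all assumptions about the image):

```
# Run the server with a restart policy so Docker brings it back if the container dies
docker run -d --restart=always --name rancher-server -p 8080:8080 rancher/server

# Cron entry: dump the embedded MySQL database every five minutes
# ("cattle" as the database name and passwordless root are assumptions)
*/5 * * * * docker exec rancher-server mysqldump -u root cattle > /backups/cattle-$(date +\%H\%M).sql
```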

I kinda agree with deitch: you should not need that whole stack (RDBMS + Redis + ZK).
Why can’t you just leverage etcd, for example? A distributed key-value store should be all you need.

The “R” in RDBMS is fairly useful for relational data. Storing everything in etcd is ridiculously impractical to build an API and UI around.

Maybe it’s usable now, but we tried etcd before and it did not go well. Redis or Hazelcast is used for internal messaging, Zookeeper will be removed, and everything required will eventually be managed by the Rancher image instead of requiring external installation.

Hi Vincent,

If etcd is not practical enough for your needs, have you looked into RethinkDB, for example? I believe you could leverage its system tables to elect your master (essentially, the RethinkDB master node would be the Rancher master node, all based on the Raft protocol), you would have a built-in pub-sub system as well (https://rethinkdb.com/docs/publish-subscribe/javascript/), and it’s an excellent DB with joins for your API/UI needs.

With RethinkDB alone, you might be able to simplify the stack and remove everything else…
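
Just to make the clustering side concrete, bringing up a RethinkDB cluster is about this simple (a sketch; addresses are examples):

```
# First node
rethinkdb --bind all

# Additional nodes just join the first; RethinkDB handles Raft-based membership,
# and its system tables (e.g. rethinkdb.server_status) expose the cluster state
rethinkdb --bind all --join 10.0.2.5:29015
```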

When Zookeeper is gone and Redis is used only internally, I don’t see any need to rework what has already been done in Rancher regarding the DB backend. I don’t think they’ll switch DBs lightly unless the tooling used for it supports the alternative… and in my opinion, they benefit from sticking with something that has been around a long time and is widely used, as it is a component you may want to host outside of Rancher itself for persistent storage.