Is there a way to schedule containers so that no two containers in a service are within the same failure domain?
For example, I have 3 physical hosts: A, B, and C. Each host has 2 VMs (A1, A2, B1, etc.) that are added as Rancher hosts.
Can I create a service with 3 containers and a schedule that guarantees that no two containers are located on VMs running on the same physical host (e.g. A1, B2, C1)?
What I mean is that if I create a service with 3 containers, and two of them happen to be on VMs that reside on the same physical host (A1 & A2), that host is a single point of failure that can take down 2/3 of the containers. The same can be said of network switches, server chassis where multiple servers share the same power & control plane, or server racks that share the same power circuit. It would be great if there were a way to spread out the containers in a service across multiple failure domains so that no single hardware failure can take down a large portion of the containers.
So on your VMs A1 & A2 you define the same host label and value (e.g. XY=1), and then set the scheduler to run 1 instance on each host with label XY, so you can guarantee that only 1 container spins up per physical host.
That's my understanding of the scheduler affinity; I haven't tried it myself, but I think it is worth trying out.
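Something like this in docker-compose.yml would be the rough idea (untested sketch; XY=1 is just a placeholder host label):

```yaml
# docker-compose.yml -- untested sketch; XY=1 is a placeholder host label
myservice:
  image: nginx
  labels:
    # global service: Rancher starts one container on every host that
    # satisfies the scheduling rules below
    io.rancher.scheduler.global: 'true'
    # only consider hosts carrying the host label XY=1
    io.rancher.scheduler.affinity:host_label: XY=1
```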
I think the host affinity is the area that is most relevant, but that still wouldn’t solve the problem. My example was a very basic one.
Let's say I have 5 server chassis, and each one has 4 physical servers in it, for 20 hosts total. I only want 3 containers in the service, but I want each of those containers to be located on a server in a different chassis. Using your method, I would get 5 containers (one per labelled chassis) instead of 3.
I think what is needed is a “unique” condition. On each of the physical servers in a chassis, I can add the same chassis=number label. Then in the scheduling rule, I say that the values of the host label “chassis” should/must be unique across all containers in the service.
This way you can have as many failure domains as you want (server, switch, rack, room, datacenter): you add the appropriate labels to each host, and the containers of a service get spread across many different failure domains.
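For reference, host labels like that can be attached when the Rancher agent is registered (or edited later in the UI). Roughly, with the server URL, registration token, and agent version as placeholders:

```bash
# registering a host with failure-domain labels -- URL/token/version are placeholders
sudo docker run -d --privileged \
  -e CATTLE_HOST_LABELS='chassis=2&rack=7' \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/rancher:/var/lib/rancher \
  rancher/agent:<version> https://<rancher-server>/v1/scripts/<registration-token>
```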
AFAIK you can use the scale parameter (set in rancher-compose.yml) and set it to 3, so Rancher will bring up only 3 containers on hosts meeting your scheduler affinity rules.
And because of the scale + scheduler affinity, even if 2 containers end up on the same host, as soon as that host isn't reachable anymore Rancher should automatically spin the containers up on available hosts meeting the criteria. That's my understanding of it.
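Roughly, the combination could look like this (untested sketch; note the soft anti-affinity only keeps containers off the same Rancher host/VM, not off the same physical chassis):

```yaml
# docker-compose.yml -- untested sketch
web:
  image: nginx
  labels:
    # prefer not to co-locate containers of this service on the same host
    io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}

# rancher-compose.yml -- untested sketch
web:
  scale: 3
```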