I deployed a workload in a cluster of 5 pods and exposed an HTTP port using the Ingress L7 load balancer. It generated an xip.io address pointing to a node IP, and that worked pretty well.
But when I stop the node associated with the xip.io address, which is the one running the Ingress LB, my HTTP service stops working. Is this normal behavior? How can I achieve high availability for my running container? I want the service to keep operating if a node of the cluster goes down.
xip.io is just a DNS server that answers a request for a-b-c-d.xip.io with the IP a.b.c.d, which here is the IP of a single node. To actually point at multiple servers you need a load balancer or a DNS name with multiple answer IPs that point to the nodes.
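To make the mapping concrete, here is a rough sketch of the translation described above. It is not the real xip.io implementation: the function name `xip_answer` is made up for illustration, and ignoring any labels in front of the encoded address is an assumption.

```python
import ipaddress


def xip_answer(hostname: str, suffix: str = ".xip.io") -> str:
    """Return the single IP encoded in a name like 'a-b-c-d.xip.io'.

    Hypothetical helper for illustration only; the real service is a DNS server.
    """
    if not hostname.endswith(suffix):
        raise ValueError(f"{hostname!r} is not under {suffix}")
    labels = hostname[: -len(suffix)]
    # Assumption: only the last label before the suffix carries the address,
    # so a prefix such as 'myapp.' is ignored.
    encoded = labels.split(".")[-1]
    # Validate the result; raises ValueError if it is not an IPv4 address.
    return str(ipaddress.IPv4Address(encoded.replace("-", ".")))


print(xip_answer("myapp.10-0-0-5.xip.io"))
```

Whatever the name looks like, the answer is always exactly one address, so when that node is stopped there is nothing else for clients to fall back to.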
Thank you, Vincent, for your answer. So Rancher does not have a ready-made solution for load balancing and high availability across the nodes, and I need to set up Nginx myself?
There is no magic general solution we can just provide that works in every environment.
The basic strategies are: keep one or more fixed IPs and migrate them onto a healthy node when one dies, so that there’s always someone to answer; update the set of IPs a DNS record points to, removing the IPs of dead nodes; or pay someone else to deal with it.
Updating DNS records requires a provider to publish the record to in the first place, credentials to create and update entries, and so on. This is what external-dns does. You need to set a low TTL on the records for changes to take effect quickly, so there is some performance cost to clients, which have to resolve the name more often (and sometimes a dollar cost to you, depending on the provider). Some clients/ISPs/networks also ignore the TTL or have a floor on how low a TTL they’ll honor, so they may see broken requests for longer than the TTL you set.
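As a rough sketch of the DNS approach (this is not external-dns or any provider's API), the core is a health check per node plus a filter that decides which IPs the record should currently contain. All function names here are hypothetical, the TCP check on port 80 is an assumption, and actually publishing the record is left out because it depends entirely on your provider.

```python
import socket
from typing import Callable, Iterable, List


def tcp_alive(ip: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Assumed health check: the node counts as alive if the port accepts a TCP connection."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def desired_records(node_ips: Iterable[str],
                    is_alive: Callable[[str], bool] = tcp_alive) -> List[str]:
    """Return the IPs the DNS record should point to right now (dead nodes removed)."""
    return sorted(ip for ip in node_ips if is_alive(ip))


def rough_outage_seconds(check_interval_s: int, ttl_s: int) -> int:
    """Rough upper bound for clients that honor the TTL: time to notice plus time to expire.

    Ignores provider propagation delay and resolvers that enforce a TTL floor.
    """
    return check_interval_s + ttl_s
```

For example, with a 10 second check interval and a 30 second TTL, a well-behaved client could keep hitting the dead node for up to about 40 seconds, which is the trade-off behind choosing a low TTL.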
If you control the network the machines are running on, you can use ARP broadcasts or BGP to “move” a floating IP from a node that dies to another one. This is what MetalLB does. This approach does not have the disadvantages of dealing with DNS TTLs, because the target IPs stay the same. (There is also anycast, which makes the same IP available on multiple machines simultaneously, but this is beyond the scope of the average small user.)
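The floating-IP idea can be modeled as a small decision: which node should answer for the IP right now? The sketch below is a toy model only, not MetalLB's actual election logic. The function name, the node names, and the "keep the current owner while it is healthy" rule are assumptions, and the ARP or BGP announcement itself is not shown.

```python
from typing import Dict, Optional


def floating_ip_owner(health: Dict[str, bool],
                      current: Optional[str] = None) -> Optional[str]:
    """Pick the node that should hold the floating IP, or None if no node is healthy."""
    # Keep the current owner while it is healthy to avoid needless moves.
    if current is not None and health.get(current, False):
        return current
    # Otherwise fall back to the first healthy node in name order (arbitrary but stable).
    healthy = sorted(name for name, ok in health.items() if ok)
    return healthy[0] if healthy else None


owner = floating_ip_owner({"node1": True, "node2": True})
# node1 dies: the same IP is now announced from another node.
owner = floating_ip_owner({"node1": False, "node2": True}, current=owner)
print(owner)
```

Clients keep using the same address throughout; only the machine answering for it changes, which is why no DNS TTL is involved.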
Or, hosted providers usually have an LB-as-a-service, which does some combination of the above so that you don’t have to deal with it.