Networking goes randomly down

Networking between the hosts randomly goes down for no apparent reason. When I reboot the host where the Rancher server is located, everything goes back to normal and I can ping all the other hosts.
I use 3 hosts that are all on the same subnet with private IPs.

This has been an issue for years. I just stopped using the Rancher managed network.

What do you suggest I use instead?

I’m on Google Cloud so I just use my own private network for internal services (redis, mysql, influxdb, beanstalk etc.)

I launch each internal service (network=host) with each container exposing the necessary port to the private network only.

I then launch my other containers (network=host) and hardcode all private IPs using environment variables. This has allowed me to run hundreds of containers without the constant network failures I was getting with Rancher’s managed network.
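
For anyone who wants to try the same setup, here is a minimal sketch of what that could look like. The IP addresses, the variable names (REDIS_HOST, MYSQL_HOST) and the image name myorg/worker are placeholders I made up for illustration, not values from the setup described above. The function only prints the docker command so you can review it before running it.

```shell
#!/bin/sh
# Hypothetical private IPs of the hosts running the internal services;
# substitute the addresses from your own private network.
REDIS_IP=10.128.0.2
MYSQL_IP=10.128.0.3

# Print (rather than run) a docker command that starts an application
# container on the host network, with the service IPs passed in as
# environment variables instead of DNS names.
app_cmd() {
  echo docker run -d --net=host \
    -e "REDIS_HOST=$REDIS_IP" \
    -e "MYSQL_HOST=$MYSQL_IP" \
    "$1"
}

# "myorg/worker" is a placeholder image name.
app_cmd myorg/worker
```

Once the printed command looks right, it can be piped to a shell (app_cmd myorg/worker | sh) to actually start the container.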

Regardless, at the container velocity I’m dealing with, you have to use IPs to sidestep DNS; otherwise it becomes another point of failure.

@Dr_No and @bacheson can you please check if you are hitting this:

docker exec -it $(docker ps | grep ipsec-router | awk '{print $1}') bash
cat /proc/net/xfrm_stat

and check for XfrmInStateSeqError.
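
If you would rather not read the whole file by eye, a small helper along these lines should pull out just that counter. This is only a sketch: the function name seq_errors is made up, and it assumes the usual "name value" layout of /proc/net/xfrm_stat.

```shell
#!/bin/sh
# Read xfrm_stat-formatted text on stdin and print the value of the
# XfrmInStateSeqError counter (prints 0 if the counter is not listed).
seq_errors() {
  awk '$1 == "XfrmInStateSeqError" { print $2; found = 1 }
       END { if (!found) print 0 }'
}

# Possible usage from the host, without opening an interactive shell:
#   id=$(docker ps | grep ipsec-router | awk '{print $1}')
#   docker exec "$id" cat /proc/net/xfrm_stat | seq_errors
```

A value that is non-zero, and especially one that keeps growing between runs, would suggest you are hitting the sequence error in question.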