IPsec not connecting in a 3-host setup

Hi Guys,

I really like Rancher and all its nice concepts, so I upgraded my VM setup right away :slight_smile:

I want to use Rancher 1.5.1 with 3 hosts:
A (Ubuntu), also running Rancher Server
B (Debian)
C (Ubuntu)

Setup was easy and everything seemed to work, but as I went further, containers sometimes stayed in "initializing" state. Digging deeper, I found an IPsec problem: while hosts B and C are connected to each other, A refuses to connect to either B or C.

All hosts have a private datacenter IP, and I can ping every host from every other host.

Exec'ing into the ipsec-router container, I used "swanctl --list-conns" to check whether the connections are configured, and each host has its 2 connections configured:
A: conn to B and C
B: conn to A and C
C: conn to A and B

This should be fine (at least I guess so :P)

Using "swanctl --list-sas" shows (as I understand it) the active connections (security associations). B and C have connections to each other, while A has no connections at all.
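To run the same check on each host without reading the output by eye, here is a minimal bash/awk sketch that prints the IKE SAs that "swanctl --list-sas" reports as ESTABLISHED. It assumes the top-level lines have the form `<name>: #<id>, <STATE>, IKEv2, ...` with the CHILD_SA details indented underneath; the function name `established_ike_sas` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the name of every IKE SA that "swanctl --list-sas" reports as ESTABLISHED.
# Assumption: top-level lines look like "<name>: #<id>, <STATE>, IKEv2, ..."
# and CHILD_SA details are indented underneath, so indented lines are skipped.
established_ike_sas() {
  awk '
    /^[^ \t].*: #[0-9]+, ESTABLISHED, / {
      name = $0
      # Strip everything from ": #<id>, ESTABLISHED, ..." onward, keeping the SA name.
      sub(/: #[0-9]+, ESTABLISHED, .*$/, "", name)
      print name
    }
  '
}
```

Usage would be something like `docker exec <ipsec-router-container> swanctl --list-sas | established_ike_sas`, where the container name is a placeholder for whatever Rancher named the ipsec-router on that host. On a host in A's state, the expected output is empty.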

When I throw away the ipsec, ipsec-cni and ipsec-router containers on A, Rancher automatically recreates them, so I captured the logs of setting up the connections, which might show what causes the issue. Since I am not an IPsec expert, I am not sure what they mean, so I am posting them here in the hope that someone can point me in the right direction to resolve this issue:

Log from the ipsec-router of host A after recreation (the IPs of B and C are replaced with "Host B" and "Host C"):

```
16.3.2017 09:12:25 12[CFG] loaded IKE shared key for: '%any'
16.3.2017 09:12:25 04[CFG] loaded IKE shared key for: 'Host C'
16.3.2017 09:12:25 06[CFG] added vici connection: conn-Host C
16.3.2017 09:12:25 06[CFG] initiating 'child-Host C'
16.3.2017 09:12:25 06[IKE] initiating IKE_SA conn-Host C[1] to Host C
16.3.2017 09:12:25 06[ENC] generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) N(REDIR_SUP) ]
16.3.2017 09:12:25 06[NET] sending packet: from 10.42.194.174[500] to Host C[500] (500 bytes)
16.3.2017 09:12:25 07[CFG] loaded IKE shared key for: '10.42.66.175'
16.3.2017 09:12:25 05[CFG] loaded IKE shared key for: 'Host B'
16.3.2017 09:12:25 13[CFG] added vici connection: conn-Host B
16.3.2017 09:12:25 13[CFG] initiating 'child-Host B'
16.3.2017 09:12:25 13[IKE] initiating IKE_SA conn-Host B[2] to Host B
16.3.2017 09:12:25 13[ENC] generating IKE_SA_INIT request 0 [ SA KE No N(NATD_S_IP) N(NATD_D_IP) N(HASH_ALG) N(REDIR_SUP) ]
16.3.2017 09:12:25 13[NET] sending packet: from 10.42.194.174[500] to Host B[500] (500 bytes)
16.3.2017 09:12:25 14[CFG] loaded IKE shared key for: '10.42.152.222'
16.3.2017 09:12:25 10[NET] received packet: from Host B[500] to 10.42.194.174[500] (36 bytes)
16.3.2017 09:12:25 10[ENC] parsed IKE_SA_INIT response 0 [ N(NO_PROP) ]
16.3.2017 09:12:25 10[IKE] received NO_PROPOSAL_CHOSEN notify error
16.3.2017 09:12:25 07[CFG] loaded IKE shared key for: '10.42.194.174'
16.3.2017 09:12:25 09[NET] received packet: from Host C[500] to 10.42.194.174[500] (36 bytes)
16.3.2017 09:12:25 09[ENC] parsed IKE_SA_INIT response 0 [ N(NO_PROP) ]
16.3.2017 09:12:25 09[IKE] received NO_PROPOSAL_CHOSEN notify error
16.3.2017 09:12:25 07[KNL] creating acquire job for policy 10.42.139.69/32[6/41552] === 10.42.87.175/32[6/42] with reqid {1234}
16.3.2017 09:12:25 07[CFG] trap not found, unable to acquire reqid 1234
16.3.2017 09:12:25 04[KNL] creating acquire job for policy 10.42.139.69/32[6/59110] === 10.42.107.184/32[6/42] with reqid {1234}
16.3.2017 09:12:25 08[CFG] trap not found, unable to acquire reqid 1234
16.3.2017 09:12:25 10[KNL] creating delete job for CHILD_SA ESP/0x00000000/Host B
16.3.2017 09:12:25 06[JOB] CHILD_SA ESP/0x00000000/Host B not found for delete
16.3.2017 09:12:25 07[KNL] creating delete job for CHILD_SA ESP/0x00000000/Host C
16.3.2017 09:12:25 14[JOB] CHILD_SA ESP/0x00000000/Host C not found for delete
```
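The key lines are the NO_PROPOSAL_CHOSEN notifies: both B and C answered A's IKE_SA_INIT request with that error, which in IKEv2 means the responder found none of the offered proposals acceptable. To pull these out of a longer log, here is a heuristic bash/awk sketch. It assumes each line carries a charon thread tag such as `10[NET]` and attributes each NO_PROPOSAL_CHOSEN to the peer that the same thread last received a packet from. Thread IDs are reused across jobs, so this is a hint rather than proof; the function name `no_proposal_peers` is hypothetical.

```shell
#!/usr/bin/env bash
# Print the peer behind each "received NO_PROPOSAL_CHOSEN" line of a charon log.
# Heuristic: match the error to the last "received packet: from <peer>[port]"
# logged by the same thread ID (the digits before "[NET]", "[IKE]", ...).
no_proposal_peers() {
  awk '
    {
      # Skip lines without a thread tag like "10[NET]".
      if (!match($0, /[0-9]+\[[A-Z]+\]/)) next
      tag = substr($0, RSTART, RLENGTH)
      tid = substr(tag, 1, index(tag, "[") - 1)
    }
    /received packet: from / {
      # Remember which peer this thread last received a packet from.
      peer = $0
      sub(/.*received packet: from /, "", peer)
      sub(/\[.*/, "", peer)
      last_peer[tid] = peer
    }
    /received NO_PROPOSAL_CHOSEN/ {
      print ((tid in last_peer) ? last_peer[tid] : "unknown peer")
    }
  '
}
```

Fed the log above (for example via `docker logs <ipsec-router-container> 2>&1 | no_proposal_peers`, with a placeholder container name), it should print "Host B" and then "Host C".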

Thanks in advance!

Cheers,

Martin

Make sure that UDP ports 500 and 4500 are open between the hosts, and also that the hosts register with the Rancher server using their internal IP; you can control this with the CATTLE_AGENT_IP variable at registration. Whenever my hosts failed to get a healthy ipsec and health check, it was always one of those two things.

For example, on AWS:

```shell
sudo docker run -d \
  -e CATTLE_AGENT_IP="$(wget -qO- http://169.254.169.254/latest/meta-data/local-ipv4)" \
  -e CATTLE_HOST_LABELS="az=$(wget -qO- http://169.254.169.254/latest/meta-data/placement/availability-zone)" \
  --privileged -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:v1.2.0 REGISTRATION_URL
```
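As a quick sanity check of the value passed in CATTLE_AGENT_IP, here is a small bash sketch that tests whether an IPv4 address lies in one of the RFC 1918 private ranges. Reading "internal IP" as "RFC 1918 address" is an assumption: what actually matters is that the hosts can reach each other on the registered address, so a private address is a hint, not a guarantee. The helper name `is_private_ipv4` is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if $1 is a dotted-quad IPv4 address in an RFC 1918 range.
is_private_ipv4() {
  local ip=$1
  # Require exactly four dot-separated decimal octets.
  [[ $ip =~ ^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$ ]] || return 1
  local a=${BASH_REMATCH[1]} b=${BASH_REMATCH[2]} o
  for o in "${BASH_REMATCH[@]:1}"; do
    (( 10#$o <= 255 )) || return 1   # 10# avoids octal parsing of e.g. "08"
  done
  (( 10#$a == 10 )) && return 0                                  # 10.0.0.0/8
  (( 10#$a == 172 && 10#$b >= 16 && 10#$b <= 31 )) && return 0   # 172.16.0.0/12
  (( 10#$a == 192 && 10#$b == 168 )) && return 0                 # 192.168.0.0/16
  return 1
}
```

For example, before registering: `is_private_ipv4 "$CATTLE_AGENT_IP" || echo "warning: $CATTLE_AGENT_IP is not an RFC 1918 address" >&2`.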

Hi @sra,

thanks for following up on this. I double-checked the settings and internal IPs. The server is configured with its internal IP, and all agents have CATTLE_AGENT_IP set to their host machine's internal IP. Unfortunately, it's still not working. Thank you for your thoughts!

@Martin please find me tomorrow on https://slack.rancher.io; I would like to collect some information/logs from your setup.

Also, if you are using CentOS/RHEL, please share the steps you used to set up the hosts (Docker installation, storage, etc.).

Hi @leodotcloud,

thanks for your help. Really appreciated!

As promised, here is the description of my situation and what we discussed: https://www.atroo.de/a-normal-day-on-the-cattle/?utm_source=rancher_forum&utm_campaign=blog_post


@Martin Glad we could solve your problem. Thank you for the post!