Racoon fails in Network Agent 0.4.1 on RancherOS 0.33

Hello,

When I try to “connect” two containers (a load balancer and an application portal) through the managed network, it doesn’t work.

In the network agent logs I get this error message:

25 September 2015 16:54:07 UTC+2 racoon - IKE keying daemon will not be started as /proc/net/pfkey is not
25 September 2015 16:54:07 UTC+2          available or a suitable 2.6 (or 2.4 with IPSEC backport)
25 September 2015 16:54:07 UTC+2          kernel with af_key.[k]o module installed.
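
The message above says racoon needs /proc/net/pfkey, which the kernel exposes when the af_key module is present. A minimal sketch for checking both on a host follows. The function name `pfkey_status` and its `proc_root` parameter are made up for illustration and are not part of Rancher or ipsec-tools.

```python
import os


def pfkey_status(proc_root="/proc"):
    """Report whether the PF_KEY interface and the af_key module are visible.

    proc_root is a hypothetical parameter so the check can be pointed at a
    test directory instead of the real /proc.
    """
    pfkey_path = os.path.join(proc_root, "net", "pfkey")
    modules_path = os.path.join(proc_root, "modules")

    module_loaded = False
    if os.path.isfile(modules_path):
        with open(modules_path) as fh:
            # Each line of /proc/modules starts with the module name.
            module_loaded = any(line.split()[:1] == ["af_key"] for line in fh)

    return {
        "pfkey_available": os.path.exists(pfkey_path),
        "af_key_loaded": module_loaded,
    }


if __name__ == "__main__":
    print(pfkey_status())
```

If af_key is compiled into the kernel rather than built as a module, it will not appear in /proc/modules even though /proc/net/pfkey exists, so `pfkey_available` is the more telling of the two values.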

In the network agent container (rancher/agent-instance:v0.4.1), /var/log/racoon.log shows:

2015-09-25 14:54:08: INFO: @(#)ipsec-tools 0.8.0 (http://ipsec-tools.sourceforge.net)
2015-09-25 14:54:08: INFO: @(#)This product linked OpenSSL 1.0.1f 6 Jan 2014 (http://www.openssl.org/)
2015-09-25 14:54:08: INFO: Reading configuration from "/etc/racoon/racoon.conf"
2015-09-25 14:54:56: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:55:16: ERROR: phase1 negotiation failed due to time up. 7c03d4505c30deb9:0000000000000000
2015-09-25 14:55:27: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:55:59: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:56:19: ERROR: phase1 negotiation failed due to time up. b32b533497d4013d:0000000000000000
2015-09-25 14:56:31: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:57:03: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:57:23: ERROR: phase1 negotiation failed due to time up. 1b1afb9331db2048:0000000000000000
2015-09-25 14:57:35: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:58:08: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:58:28: ERROR: phase1 negotiation failed due to time up. 6599af15985e6d81:0000000000000000
2015-09-25 14:58:40: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:59:12: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 14:59:32: ERROR: phase1 negotiation failed due to time up. 285376db3913c48a:0000000000000000
2015-09-25 14:59:44: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-09-25 15:00:16: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.

So the managed network does not seem to work because the IPsec tunnel can’t be established.
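
For anyone comparing logs, here is a rough sketch that summarizes a racoon.log like the one above: it counts entries per level and collects the cookie pairs from failed phase 1 negotiations. The function name `summarize_racoon_log` and the regular expression are my own, based only on the line format shown in this post, not on any ipsec-tools specification.

```python
import re
from collections import Counter

# Matches lines such as "2015-09-25 14:55:16: ERROR: phase1 negotiation ..."
LINE_RE = re.compile(r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (\w+): (.*)$")


def summarize_racoon_log(lines):
    """Return (count of entries per level, cookie pairs of failed phase 1 attempts)."""
    levels = Counter()
    failed_cookies = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank or wrapped lines that do not fit the format
        _, level, message = match.groups()
        levels[level] += 1
        if message.startswith("phase1 negotiation failed"):
            # The cookie pair is the last whitespace-separated token.
            failed_cookies.append(message.rsplit(" ", 1)[-1])
    return levels, failed_cookies


if __name__ == "__main__":
    sample = [
        '2015-09-25 14:54:08: INFO: Reading configuration from "/etc/racoon/racoon.conf"',
        "2015-09-25 14:54:56: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.",
        "2015-09-25 14:55:16: ERROR: phase1 negotiation failed due to time up. 7c03d4505c30deb9:0000000000000000",
    ]
    levels, cookies = summarize_racoon_log(sample)
    print(dict(levels))
    print(cookies)
```

In the log above, the second half of every cookie pair is all zeros, which usually suggests that the remote peer never answered the first IKE packet.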

Hi.

I saw this as well. I suspect the issue was that my EC2 instances were using the same NAT instance (they are on a private subnet) and hence showed up with identical IPs in Rancher. Specifying their private IP with CATTLE_AGENT_IP solved it for me. YMMV.
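
To illustrate the collision described above, here is a small sketch that groups hosts by the IP they are registered with and reports any address shared by more than one host. The host names and addresses are invented, and `find_ip_collisions` is a hypothetical helper, not something Rancher provides.

```python
from collections import defaultdict


def find_ip_collisions(hosts):
    """Return {ip: [host names]} for every IP reported by more than one host."""
    by_ip = defaultdict(list)
    for name, ip in hosts.items():
        by_ip[ip].append(name)
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical hosts: two sit behind the same NAT address.
    registered = {
        "host-a": "203.0.113.10",
        "host-b": "203.0.113.10",
        "host-c": "203.0.113.27",
    }
    print(find_ip_collisions(registered))
```

Any host that shows up in the result is a candidate for setting CATTLE_AGENT_IP to its private address.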

Thanks Sdlarsen, but no change for me.
My setup uses local instances.

I just upgraded to Rancher 0.40 and tested again: same issue:

2015-10-08 09:39:26: INFO: @(#)ipsec-tools 0.8.0 (http://ipsec-tools.sourceforge.net)
2015-10-08 09:39:26: INFO: @(#)This product linked OpenSSL 1.0.1f 6 Jan 2014 (http://www.openssl.org/)
2015-10-08 09:39:26: INFO: Reading configuration from "/etc/racoon/racoon.conf"
2015-10-08 09:40:38: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.
2015-10-08 09:40:58: ERROR: phase1 negotiation failed due to time up. b9b8b8a04807f1ca:0000000000000000
2015-10-08 09:41:08: WARNING: PF_KEY EXPIRE message received from kernel for SA being negotiated. Stopping negotiation.

I wasn’t able to reproduce this on a setup using RancherOS as a host in v0.42.0. Since rancher/agent and rancher/agent-instance (the network agent) haven’t changed in the last couple of releases, I’m not sure what the RancherOS-related issue could be.

Were you upgrading your Rancher setups from previous versions?

This sounds similar to https://github.com/rancher/rancher/issues/725, but I’m not sure if it’s the same issue.

I will try with 0.42 after reinstalling the RancherOS VMs, and let you know.