Requests Timing Out Using Rancher Internal Network

Hey there, we’re using a Eureka service to register and load-balance our microservices on the latest Rancher enterprise build. One thing we’ve run into is requests timing out when services register against the Eureka node using internal DNS:

Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to registry.ereg-micro-services.rancher.internal:8080 timed out
        at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:119)
        at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:177)
        at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:144)
        at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:131)
        at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:611)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:446)
        at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:117)
        at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
        at com.sun.jersey.client.apache4.ApacheHttpClient4Handler.handle(ApacheHttpClient4Handler.java:173)
        ... 27 common frames omitted

Here, registry is the name of the Eureka service in Rancher and ereg-micro-services is the name of the stack; the rest is the default rancher.internal domain.

This problem doesn’t exist if we put an HAProxy LB in front of the registry microservice and point the other services at it. What could cause this behavior? We’d prefer to avoid the extra LB and use only the internal network layer. All services have no problem resolving one another and the service registry from the CLI using the internal long and short DNS names. Any ideas?

Did you do any other debugging by any chance, like checking whether the IP is correct when the node tries to register? I’m trying to find a reason why an LB would always work and DNS wouldn’t. How many containers are there in the registry service? And I assume you’re running IPsec?
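If it helps with the IP check, here is a rough sketch of a probe you could run from inside one of the client containers. It resolves the registry name the same way the JVM would, then tries a plain TCP connect to every address that comes back, so you can see whether a particular IP is the one timing out. The host and port are taken from the stack trace above; the class name, the `probe` helper, and the 3-second timeout are my own assumptions, not anything Rancher or Eureka provides.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic helper, not part of Eureka or Rancher.
class RegistryConnectCheck {

    // Resolves the host, then attempts a TCP connect to each resolved address.
    // Returns a map of IP address -> whether the connect succeeded within timeoutMs.
    static Map<String, Boolean> probe(String host, int port, int timeoutMs)
            throws UnknownHostException {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            boolean connected;
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(addr, port), timeoutMs);
                connected = true;
            } catch (IOException e) {
                // Covers connect timeouts as well as refused connections.
                connected = false;
            }
            results.put(addr.getHostAddress(), connected);
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        // Defaults come from the stack trace; override them on the command line.
        String host = args.length > 0 ? args[0]
                : "registry.ereg-micro-services.rancher.internal";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8080;
        int timeoutMs = 3000; // assumed value, adjust to match your client settings

        probe(host, port, timeoutMs).forEach((ip, ok) ->
                System.out.println(ip + ":" + port + " -> " + (ok ? "connected" : "FAILED")));
    }
}
```

If the name resolves to several addresses and only some of them fail, that would point at a specific container or host rather than at DNS itself. Comparing the printed IPs with the ones shown for the registry containers in the Rancher UI would also tell you whether the name is resolving to a stale address.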