NGINX can't resolve linked container

I have a stack that has been running fine for several months.
I have a brand new Rancher build that I set up in a different region.
I am using Rancher 1.1.2 and Docker 1.12.1.

I have successfully launched several stacks on my new Rancher infrastructure. However, for this one particular stack, the NGINX component fails.

The stack is for Portus.

The problem is that the sslproxy container just goes into permanent restart. The error is:

nginx: [emerg] host not found in upstream "portus" in /etc/nginx/conf.d/portus.conf:9

This is my docker-compose.yml:

sslproxy:
  labels:
    io.rancher.scheduler.affinity:container_label_soft: registry.portus.db=1
    io.rancher.scheduler.affinity:host_label: name=rancherpool-pvt
  tty: true
  image: nginx:1.9.9
  links:
  - portus:portus
  volumes:
  - /efs/data/registry/certs:/etc/nginx/certs:ro
  - /efs/data/registry/proxy:/etc/nginx/conf.d:ro  
  stdin_open: true
lb:
  ports:
  - 5000:5000/tcp
  - 443:443/tcp
  labels:
    io.rancher.scheduler.global: 'false'
    io.rancher.loadbalancer.target.sslproxy: 443=443
    io.rancher.scheduler.affinity:host_label: name=rancherpool-pvt
    io.rancher.loadbalancer.target.registry: 5000=5000
  tty: true
  image: rancher/load-balancer-service
  links:
  - registry:registry
  - sslproxy:sslproxy
  stdin_open: true
registry:
  environment:
    REGISTRY_AUTH: token
    REGISTRY_AUTH_TOKEN_ISSUER: reg.domain.com
    REGISTRY_AUTH_TOKEN_REALM: https://reg.domain.com:443/v2/token
    REGISTRY_AUTH_TOKEN_ROOTCERTBUNDLE: /certs/registry.crt
    REGISTRY_AUTH_TOKEN_SERVICE: reg.domain.com:5000
    REGISTRY_HTTP_SECRET: httpsecret
    REGISTRY_HTTP_TLS_CERTIFICATE: /certs/registry.crt
    REGISTRY_HTTP_TLS_KEY: /certs/registry.key
    REGISTRY_LOG_LEVEL: warn
    REGISTRY_NOTIFICATIONS_ENDPOINTS: |-
      - name: portus
        url: http://portus:3000/v2/webhooks/events
        timeout: 500
        threshold: 5
        backoff: 1
    REGISTRY_STORAGE_DELETE_ENABLED: 'true'
  log_driver: ''
  labels:
    io.rancher.scheduler.affinity:host_label: name=rancherpool-pvt
  log_opt: {}
  image: registry:2.3.1
  links:
  - portus:portus
  volumes:
  - /efs/data/registry/certs:/certs:rw
  - /efs/data/registry/data:/var/lib/registry:rw
portus:
  environment:
    PORTUS_CHECK_SSL_USAGE_ENABLED: 'true'
    PORTUS_GRAVATAR_ENABLED: 'true'
    PORTUS_KEY_PATH: /certs/registry.key
    PORTUS_MACHINE_FQDN: reg.domain.com
    PORTUS_PASSWORD: password$$
    PORTUS_PORT: '443'
    PORTUS_PRODUCTION_DATABASE: portus
    PORTUS_PRODUCTION_HOST: aws.database.rds.amazonaws.com
    PORTUS_PRODUCTION_PASSWORD: password$$
    PORTUS_PRODUCTION_USERNAME: username
    PORTUS_SECRET_KEY_BASE: password$$
    PORTUS_SMTP_ENABLED: 'false'
    REGISTRY_HOSTNAME: reg.domain.com
    REGISTRY_NAME: Registry
    REGISTRY_PORT: '5000'
    REGISTRY_SSL_ENABLED: 'true'
  log_driver: ''
  labels:
    registry.portus.app: '1'
    io.rancher.container.pull_image: always
    io.rancher.scheduler.affinity:host_label: name=rancherpool-pvt
  log_opt: {}
  image: sshipway/portus:2.0.5
  volumes:
  - /efs/data/registry/certs:/certs:rw
  - /efs/data/registry/proxy:/etc/nginx/conf.d:rw

This is my nginx config file (portus.conf):

server {
  listen 443 ssl;
  ssl_certificate     certs/registry.crt;
  ssl_certificate_key certs/registry.key;
  location / {
    proxy_set_header Host reg.domain.com;
    proxy_set_header X-Forwarded-Proto https;
    proxy_set_header X-Forwarded-Host reg.domain.com:443;
    proxy_pass http://portus:3000/;
    proxy_http_version 1.1;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 900s;
  }
}

Any help would be appreciated. I have tried forcing the containers to run on the same host, but that did not work, and I did not have to do that in my previous environment. I have other stacks that use container linking just fine.

I’m not sure if this will actually solve your issue, but it is related and worth a try… The normal behavior of nginx is to resolve any hostnames exactly once on startup and cache that value forever. To make it actually update you need to give it a resolver:

location / {
  ...
  resolver 169.254.169.250 valid=5s ipv6=off;
  set $target "portus";
  proxy_pass http://$target:3000;
}

That’s copied from https://github.com/rancher/rancher/issues/3987#issuecomment-198220826. I’m not sure whether defining a variable and then using it in proxy_pass accomplishes anything over just using the name directly, and ipv6=off shouldn’t really be needed anymore.

(P.S. I edited your post, you missed replacing domain.com in one place…)

The config file gets regenerated every time the stack is launched, so I need to figure out how to pass that in as an environment variable.

I can’t pass that as a variable. The person who manages that build says I will have to customize the startup.sh file, which I am reluctant to do since it will make future upgrades difficult.

Just to check, I launched some servers running Docker 1.10.3, which is the same version I used in the previous environment. Still the same issue. Could this be a problem between Rancher 1.1.2 and NGINX? I do notice that NGINX is no longer in the Rancher Catalog.

I checked on GitHub for any related issues but did not find any.

I was able to test it with "resolver 169.254.169.250 valid=5s ipv6=off;" added to portus.conf, but that did not work. Same error. I also tried backing down to Docker 1.10.3, but got the same error.

There has never been an nginx in the catalog that I’m aware of.

Have you tried exec-ing into the container and resolving portus?

It doesn’t stay up. It just restarts and restarts…

Change the entrypoint to something like bash that won’t die the way nginx is doing.
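
For illustration, a temporary override of the sslproxy service from the compose file above might look roughly like this. It is only a sketch: it assumes the nginx:1.9.9 image ships /bin/bash and that Rancher passes entrypoint through to Docker unchanged. Everything other than the entrypoint line is copied from the original service definition.

```yaml
sslproxy:
  labels:
    io.rancher.scheduler.affinity:container_label_soft: registry.portus.db=1
    io.rancher.scheduler.affinity:host_label: name=rancherpool-pvt
  image: nginx:1.9.9
  # Debugging only (assumption: bash exists in the image). Run a shell
  # instead of nginx so the container stays up even while "portus" is
  # unresolvable. Remove this line again once the problem is found.
  entrypoint: /bin/bash
  # Already set in the original file; an interactive bash needs both to
  # stay alive instead of exiting immediately.
  tty: true
  stdin_open: true
  links:
  - portus:portus
  volumes:
  - /efs/data/registry/certs:/etc/nginx/certs:ro
  - /efs/data/registry/proxy:/etc/nginx/conf.d:ro
```

With the container up, you can exec into it and check whether the name resolves at all, for example with "getent hosts portus", and look at /etc/resolv.conf to see which DNS server the container is actually using.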