Rancher Graphite metrics issues

We have our Rancher servers set up to export metrics to a Graphite server, but it appears that when there is a network issue and the metrics can’t be written to the server, the publisher stops sending metrics and never restarts.
We are on Rancher 1.1.3, running in Rancher HA mode.
Restarting the rancher-ha-cattle container seems to recover the process, but this isn’t really a viable workaround.


Actually, this appears to be an issue with my network configuration.
I have the following:

Rancher HA Servers -> AWS ELB (2003/TCP) -> Rancher Load Balancer (host 2003/TCP) -> Telegraf Agent

It appears that when I restart the Telegraf agent, the existing connections no longer work.
I tried restarting the Rancher LBs and that didn’t work.
I then removed the 2003/TCP listener on the ELB and added it back, and data started flowing again.
So my guess is that this has to do with a persistent connection through the ELB, and that restarting the Rancher LB somehow doesn’t trigger a reconnect.
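For anyone who wants to test this kind of path independently of Rancher, here is a minimal sketch of a Graphite plaintext sender that drops its socket and reconnects whenever a write fails. It is not how Rancher's publisher works internally (I haven't looked at that); the class name `ReconnectingGraphiteSender` and its parameters are made up for illustration. The only thing taken from Graphite itself is the plaintext line format, `path value timestamp\n`, sent over TCP (2003 by default).

```python
import socket
import time


def format_metric(path, value, timestamp):
    """Encode one sample as a Graphite plaintext line: 'path value timestamp\\n'."""
    return f"{path} {value} {int(timestamp)}\n".encode("ascii")


class ReconnectingGraphiteSender:
    """Hypothetical helper: sends metrics and reconnects after a failed write."""

    def __init__(self, host, port=2003, timeout=5.0, max_attempts=3, retry_delay=1.0):
        self.host = host
        self.port = port
        self.timeout = timeout
        self.max_attempts = max_attempts
        self.retry_delay = retry_delay
        self._sock = None

    def _connect(self):
        sock = socket.create_connection((self.host, self.port), timeout=self.timeout)
        # Ask the OS to probe idle connections so a dead peer is eventually noticed.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        self._sock = sock

    def close(self):
        if self._sock is not None:
            try:
                self._sock.close()
            finally:
                self._sock = None

    def send(self, path, value, timestamp=None):
        """Return True if the line was handed to the socket, False if every attempt failed."""
        ts = time.time() if timestamp is None else timestamp
        payload = format_metric(path, value, ts)
        for _ in range(self.max_attempts):
            try:
                if self._sock is None:
                    self._connect()
                self._sock.sendall(payload)
                return True
            except OSError:
                # Discard the broken socket so the next attempt opens a fresh connection.
                self.close()
                time.sleep(self.retry_delay)
        return False
```

Usage would be something like `ReconnectingGraphiteSender("graphite.example.com").send("rancher.test", 1)`, where the hostname is a placeholder. One caveat: a write to a connection that has silently gone stale behind a load balancer can still succeed locally, because the data just sits in the kernel's send buffer, so reconnect-on-error alone may not catch the situation described above. Keepalives or periodically re-opening the connection would probably be needed as well.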
