Service being spammed with Healthchecks

I’ve been checking the logs on some of my services and noticed that they are basically being DDoS’d by the health check service.

See below for logs from one service. Only one thing calls “health check” on that service, and that’s the Rancher health check (there is no external load balancer).

Why so many? The requests appear to come from several origins:

  1. Every HAProxy service (I have 6 hosts currently)
  2. The health check service on that host

So that should be about 7 requests every 2 seconds, but as you can see from the log below, there are more. I’m not quite sure why HAProxy runs its own health check when there is already a health check service. That seems like an odd setup.
I would expect the health check service to test the services and then update the HAProxy config (via etcd or similar). What happens when I have 10, 20, or 30 nodes? This doesn’t seem scalable.

(RancherOS 0.8.1, Rancher Server 1.5.3, Cattle stack, everything running the latest updates.)

3/28/2017 4:18:28 PM10.42.167.221 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:28 PM10.42.45.119 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 21
3/28/2017 4:18:28 PM10.42.195.86 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  22
3/28/2017 4:18:28 PM10.42.45.119 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 26
3/28/2017 4:18:28 PM10.42.197.10 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  26
3/28/2017 4:18:28 PM10.42.45.119 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 17
3/28/2017 4:18:28 PM10.42.43.37 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:29 PM10.42.45.119 - - [28/Mar/2017:15:18:29 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 16
3/28/2017 4:18:29 PM10.42.163.10 - - [28/Mar/2017:15:18:29 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:29 PM10.42.45.119 - - [28/Mar/2017:15:18:29 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 17
3/28/2017 4:18:29 PM10.42.161.170 - - [28/Mar/2017:15:18:29 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  20
3/28/2017 4:18:29 PM10.42.45.119 - - [28/Mar/2017:15:18:29 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 22
3/28/2017 4:18:29 PM10.42.39.7 - - [28/Mar/2017:15:18:29 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  24
3/28/2017 4:18:30 PM10.42.45.119 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 18
3/28/2017 4:18:30 PM10.42.167.221 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  21
3/28/2017 4:18:30 PM10.42.45.119 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 20
3/28/2017 4:18:30 PM10.42.195.86 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  22
3/28/2017 4:18:30 PM10.42.45.119 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 26
3/28/2017 4:18:30 PM10.42.197.10 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  27
3/28/2017 4:18:30 PM10.42.45.119 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 16
3/28/2017 4:18:30 PM10.42.43.37 - - [28/Mar/2017:15:18:30 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:31 PM10.42.45.119 - - [28/Mar/2017:15:18:31 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 17
3/28/2017 4:18:31 PM10.42.163.10 - - [28/Mar/2017:15:18:31 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:31 PM10.42.45.119 - - [28/Mar/2017:15:18:31 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 15
3/28/2017 4:18:31 PM10.42.161.170 - - [28/Mar/2017:15:18:31 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  17
3/28/2017 4:18:31 PM10.42.45.119 - - [28/Mar/2017:15:18:31 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 20
3/28/2017 4:18:31 PM10.42.39.7 - - [28/Mar/2017:15:18:31 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  21
3/28/2017 4:18:32 PM10.42.45.119 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 21
3/28/2017 4:18:32 PM10.42.167.221 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  23
3/28/2017 4:18:32 PM10.42.45.119 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 29
3/28/2017 4:18:32 PM10.42.195.86 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  30
3/28/2017 4:18:32 PM10.42.45.119 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 58
3/28/2017 4:18:32 PM10.42.197.10 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  60
3/28/2017 4:18:32 PM10.42.45.119 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 62
3/28/2017 4:18:32 PM10.42.43.37 - - [28/Mar/2017:15:18:32 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  63
3/28/2017 4:18:33 PM10.42.45.119 - - [28/Mar/2017:15:18:33 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 45
3/28/2017 4:18:33 PM10.42.163.10 - - [28/Mar/2017:15:18:33 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  50
3/28/2017 4:18:33 PM10.42.45.119 - - [28/Mar/2017:15:18:33 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 24
3/28/2017 4:18:33 PM10.42.161.170 - - [28/Mar/2017:15:18:33 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  26
3/28/2017 4:18:33 PM10.42.45.119 - - [28/Mar/2017:15:18:33 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 23
3/28/2017 4:18:33 PM10.42.39.7 - - [28/Mar/2017:15:18:33 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  24
3/28/2017 4:18:34 PM10.42.45.119 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 64
3/28/2017 4:18:34 PM10.42.167.221 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  66
3/28/2017 4:18:34 PM10.42.45.119 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 22
3/28/2017 4:18:34 PM10.42.195.86 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  25
3/28/2017 4:18:34 PM10.42.45.119 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 28
3/28/2017 4:18:34 PM10.42.197.10 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  32
3/28/2017 4:18:34 PM10.42.45.119 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 16
3/28/2017 4:18:34 PM10.42.43.37 - - [28/Mar/2017:15:18:34 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:35 PM10.42.45.119 - - [28/Mar/2017:15:18:35 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 17
3/28/2017 4:18:35 PM10.42.163.10 - - [28/Mar/2017:15:18:35 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:35 PM10.42.45.119 - - [28/Mar/2017:15:18:35 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 16
3/28/2017 4:18:35 PM10.42.161.170 - - [28/Mar/2017:15:18:35 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  18
3/28/2017 4:18:35 PM10.42.45.119 - - [28/Mar/2017:15:18:35 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 15
3/28/2017 4:18:35 PM10.42.39.7 - - [28/Mar/2017:15:18:35 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  18
3/28/2017 4:18:36 PM10.42.45.119 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 15
3/28/2017 4:18:36 PM10.42.167.221 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  17
3/28/2017 4:18:36 PM10.42.45.119 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 18
3/28/2017 4:18:36 PM10.42.195.86 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  20
3/28/2017 4:18:36 PM10.42.45.119 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 15
3/28/2017 4:18:36 PM10.42.197.10 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  18
3/28/2017 4:18:36 PM10.42.45.119 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 16
3/28/2017 4:18:36 PM10.42.43.37 - - [28/Mar/2017:15:18:36 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  17
3/28/2017 4:18:37 PM10.42.45.119 - - [28/Mar/2017:15:18:37 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 16
3/28/2017 4:18:37 PM10.42.163.10 - - [28/Mar/2017:15:18:37 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:37 PM10.42.45.119 - - [28/Mar/2017:15:18:37 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 17
3/28/2017 4:18:37 PM10.42.161.170 - - [28/Mar/2017:15:18:37 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19
3/28/2017 4:18:37 PM10.42.45.119 - - [28/Mar/2017:15:18:37 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1

For n >= 4 hosts, the health check services on 3 hosts (other than the one the target container is on) are selected to test each container. This updates the state in the Rancher db/api/ui/metadata, which then updates things like DNS entries. So that is a fixed 3 checks every 2 seconds.

When a service is used as a target for a load balancer, the health check of the target service is configured in HAProxy too. This allows the balancer to keep working even if the rancher/server container is completely down or unreachable. That adds one check every 2 seconds for each balancer container that targets the service. You would not generally scale the number of balancer containers up forever.
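As a rough sanity check on that arithmetic, here is a minimal sketch. The function name and parameters are made up for illustration and are not part of Rancher. It assumes the 4-or-more-hosts case with 3 checkers and a 2-second interval, as described above, and uses a hypothetical count of 4 balancer containers.

```python
def expected_checks_per_second(balancer_containers, checkers=3, interval_s=2.0):
    """Estimate health check requests per second hitting ONE target container.

    checkers: health check services probing the container (fixed at 3 when
              there are 4 or more hosts, per the explanation above).
    balancer_containers: balancer containers that have this service as a
              target; each one adds its own probe.
    interval_s: probe interval in seconds (2 seconds in this thread).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (checkers + balancer_containers) / interval_s


# Hypothetical example: 3 checkers plus 4 balancer containers.
rate = expected_checks_per_second(balancer_containers=4)
print(rate)       # 3.5 requests per second
print(rate * 10)  # 35.0 requests in a 10-second window
```

The point of the sketch is that the host count does not appear in the formula once there are at least 4 hosts. Only the number of balancer containers grows the probe rate.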

You have 10 seconds of data there… so if you sort your log by IP you’ll see the expected 5 entries (or 4…) for each of:

  • 10.42.161.170
  • 10.42.163.10
  • 10.42.167.221
  • 10.42.195.86
  • 10.42.197.10
  • 10.42.39.7
  • 10.42.43.37

Presumably you’ll find 3 of those IPs are health checks and 4 are balancers.

The majority of the lines are from 10.42.45.119, where Java code (Jersey/2.23.1) is making 3-4 requests/second, which I don’t think has anything to do with us.
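If sorting by hand is tedious, a small script can do the grouping. This is a minimal sketch, assuming the log lines look like the ones pasted in the question. The regular expression and the `summarize` helper are written for this example only, and the sample holds just the first four lines of the log.

```python
import re
from collections import Counter

# Matches the access-log part of each line: client IP, timestamp, request,
# status, size, referer, and the start of the user agent. The leading
# "3/28/2017 4:18:28 PM" prefix is skipped because search() is used.
LINE_RE = re.compile(
    r'(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) - - \[[^\]]+\] '
    r'"(?P<request>[^"]*)" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)'
)


def summarize(lines):
    """Return (hits per client IP, first user agent seen per client IP)."""
    hits = Counter()
    agents = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not look like access-log entries
        ip = m.group("ip")
        hits[ip] += 1
        agents.setdefault(ip, m.group("agent"))
    return hits, agents


sample = [
    '3/28/2017 4:18:28 PM10.42.167.221 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  19',
    '3/28/2017 4:18:28 PM10.42.45.119 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 21',
    '3/28/2017 4:18:28 PM10.42.195.86 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.0" 200 82 "-" "-"  22',
    '3/28/2017 4:18:28 PM10.42.45.119 - - [28/Mar/2017:15:18:28 +0000] "GET /healthcheck HTTP/1.1" 200 - "-" "Jersey/2.23.1 (HttpUrlConnection 1.8.0_121)" 26',
]

hits, agents = summarize(sample)
for ip, count in hits.most_common():
    print(ip, count, agents[ip])
```

Run against the full log, the IPs with an empty user agent ("-") and HTTP/1.0 requests are the candidates for health checkers and balancers, while the Jersey client stands out as a separate caller.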
