After upgrading to Rancher v2.4.8, CIS 1.5 scans will not complete. I am able to start them, but they never finish; they just show as “Running”. When I click on the running scan, it shows “There are no tests”. I can download the report, but it is empty. The scans were working for me in Rancher v2.4.5.
I could not find any known issues on GitHub about this, or any previous reports in the forums.
Has anybody else experienced this problem and found a solution?
Where can I find more information about the CIS scan process and why it isn’t completing? I understand the CIS scans are performed with kube-bench, but I do not know where that process runs or where to find the output of the process.
My scans are stopping after running for around two hours and showing the following:
“error running sonobuoy”
Searching for the message yields more relevant results than I was finding earlier. These issues look similar:
I tried refreshing the driver and catalog metadata, as suggested in a response to issue 26371.
This did not resolve the issue.
My clusters are custom Amazon EC2.
I am not sure what to try next. Any help is appreciated.
In my case, the problem was that communication was being blocked by a security group that appeared to be defined correctly.
The documentation shows that UDP port 8472 should be allowed, with members of the security group as the source. The rule was configured as documented, but it was not working.
I added a rule to the security group for my nodes to allow traffic on UDP port 8472 sourced from the VPC CIDR block and the scans started working again.
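For anyone who wants to script the same rule, here is a minimal sketch using boto3's `authorize_security_group_ingress`. UDP 8472 is the port Rancher's documentation lists for Canal/Flannel VXLAN overlay traffic. The group ID and CIDR passed in are assumptions that you would replace with your own values, and the helper names `vxlan_ingress_rule` and `apply_rule` are made up for illustration.

```python
import ipaddress


def vxlan_ingress_rule(vpc_cidr: str) -> dict:
    """Build an IpPermissions entry allowing UDP 8472 from the given CIDR."""
    # Raises ValueError if the CIDR is malformed (e.g. host bits set).
    ipaddress.ip_network(vpc_cidr)
    return {
        "IpProtocol": "udp",
        "FromPort": 8472,
        "ToPort": 8472,
        "IpRanges": [
            {"CidrIp": vpc_cidr, "Description": "VXLAN overlay from VPC CIDR"}
        ],
    }


def apply_rule(group_id: str, vpc_cidr: str) -> None:
    """Add the rule to a security group. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[vxlan_ingress_rule(vpc_cidr)],
    )
```

Allowing the whole VPC CIDR is broader than the documented rule sourced from the group itself, so it is best treated as a workaround until the group membership is sorted out.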
I got to this point by watching the logs in the Rancher UI for the sonobuoy containers that get created in the security-scan namespace when the scans are run. There were timeout errors like the following:
level=error msg="error entry for attempt: 3, verb: PUT, time: 2020-10-05 19:39:07.508550253 +0000 UTC m=+409.160503394, URL: https://cis-1601922433857806630-rancher-cis-benchmark/api/v1/results/by-node/ip-x-x-x-x.us-east-2.compute.internal/rancher-kube-bench: Put https://cis-1601922433857806630-rancher-cis-benchmark/api/v1/results/by-node/ip-x-x-x-x.us-east-2.compute.internal/rancher-kube-bench: dial tcp 10.x.x.x:443: connect: connection timed out"
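The timeout in that log can be reproduced outside of a scan with a plain TCP connection test. This is only a sketch: the function name `can_connect` is my own, and it assumes you run it from a pod on one node against the address and port shown in the error (port 443 here), since that address is only reachable over the cluster network.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False
```

If the check succeeds between pods on some nodes but times out between others, that points at node-to-node traffic being dropped rather than at the scan itself.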
Any ideas on why the security group did not work the way it is documented?
I had two nodes that were not in the same security group, so a rule sourced from members of the group did not cover their traffic.
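A quick way to catch this kind of mismatch is to list which nodes lack the group that the rule is sourced from. The sketch below works on a plain mapping of node name to security group IDs; the node names and group IDs are hypothetical, and how you collect the mapping (console, CLI, or SDK) is up to you.

```python
from typing import Dict, List, Set


def nodes_missing_group(node_groups: Dict[str, Set[str]], required_group: str) -> List[str]:
    """Return the sorted names of nodes that are not in required_group."""
    return sorted(
        node for node, groups in node_groups.items() if required_group not in groups
    )


# Hypothetical inventory: node name -> security group IDs attached to it.
inventory = {
    "node-a": {"sg-nodes"},
    "node-b": {"sg-other"},
    "node-c": {"sg-nodes", "sg-other"},
}
missing = nodes_missing_group(inventory, "sg-nodes")
```

Any node that shows up in `missing` will not match a rule that uses the group itself as its source.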