Greetings,
I’ve deployed elasticsearch via this helm chart onto a cluster; I’m then attempting to direct cluster logging into the elasticsearch cluster, but I keep hitting what appear to be memory problems. The cluster has 9 nodes and runs roughly 300 pods.
First iteration, I just deployed elasticsearch with all the defaults in values.yaml; when I hit ‘Test’ on the Rancher logging page, it fails and causes the elasticsearch cluster to blow up with this message:
Data too large, data for [<transport_request>] would be (ca 0.9gb) which is larger than the limit of (0.8gb)
OK, sounds like a memory issue, so I doubled the heap size and the container memory limit in values.yaml and tried again… same story.
I’ve iterated several times and now have the heap size up to 12g and the container memory limit up to 24Gi, and the elasticsearch cluster still blows up with the same memory error (just with higher values) when I hit ‘Test’ on the Rancher logging setup page.
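For reference, this is roughly what my values.yaml overrides look like at this point (a sketch; the key names follow the elastic/elasticsearch chart and may differ in other charts or chart versions):

```yaml
# Approximate values.yaml overrides -- key names assume the
# elastic/elasticsearch chart; adjust for your chart version.
esJavaOpts: "-Xms12g -Xmx12g"    # JVM heap, up from the 1g default

resources:
  requests:
    cpu: "1000m"
    memory: "24Gi"
  limits:
    cpu: "1000m"
    memory: "24Gi"               # container memory limit, up from the 2Gi default
```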
At this point I’m questioning whether this is really a memory problem, since it seems strange that this much memory would be needed just to support whatever Rancher does to verify that it can successfully use the elasticsearch endpoint. If there were a genuine memory issue here I’d expect it to show up later, once ES has started storing a substantial amount of data and the indexes have grown large.
Anyone have thoughts on:
- what exactly is Rancher doing when running ‘Test’ on the logging setup page, and what minimum requirements should the ES cluster meet for it?
- proper elasticsearch sizing for capturing cluster logging
- useful approaches to deploying elasticsearch; is the helm deployment I’m using good, or are there better ways?
- could the memory error when the elasticsearch cluster blows up be a red herring? Could there be other issues at play here that I’m missing?
Thanks in advance for any help.