I have a k3s cluster with embedded etcd on Ubuntu VMs. It has three server (master) nodes and three agent nodes, each with 6 GB of RAM.
It’s pretty much a fresh install; I added Logging, Longhorn, Monitoring, and Rancher Backups from the Rancher Helm charts.
After that, once I configured alerting, I started getting KubeAPIErrorBudgetBurn ("The API server is burning too much error budget").
To be honest, this is the first time I’m seeing this alert and I’m not sure what to look for.
These are my API server graphs; if I understand them correctly, the error budget is somehow terrible?
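For context on what the alert actually measures: KubeAPIErrorBudgetBurn comes from the kube-prometheus/kubernetes-mixin SLO rules. It fires when the API server's error rate (5xx responses plus requests slower than the SLO latency threshold) burns through a 99% availability budget faster than a multi-window threshold allows. A simplified sketch of the fastest-burning window pair (the exact expressions live in the kubernetes-mixin and vary by version, so treat this as illustrative):

```yaml
# Simplified sketch of the critical KubeAPIErrorBudgetBurn rule from
# the kubernetes-mixin; the real ruleset uses several window pairs
# (1h/5m, 6h/30m, 1d/2h, 3d/6h) with different burn-rate multipliers.
- alert: KubeAPIErrorBudgetBurn
  expr: |
    sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
    and
    sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
  for: 2m
  labels:
    severity: critical
```

The `burnrate` recording rules count both 5xx responses and too-slow read/write requests as errors, which is why sustained API-server latency (often etcd or disk pressure on the control-plane nodes) can burn budget even when nothing is returning outright errors.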
I found out that one of my nodes’ exporters was not reporting correctly for some reason. I don’t quite understand why, but I went into the node and checked the firewall, which looked correct, so I disabled the firewall to confirm, and that node’s exporters went green in Prometheus…
I then re-enabled the firewall with no changes, and somehow it stayed green…
Since then the graphs have gradually been improving, but the error budget is still weirdly negative:
@aszmyd We moved the control plane nodes to faster machines (better disk and network), and the problem went away. It also goes without saying that running any workload on the same node will make the issue more likely to occur: we had a cluster with a single control plane node that also ran ingress on that node and exhibited the same behaviour. Running the ingress on a worker node solved the issue there.
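The faster-disk fix matches what usually drives this alert on k3s with embedded etcd: slow fsyncs on the control-plane disks make every API write slow, and slow requests count against the error budget. Before moving hardware, you can check whether etcd disk latency is the culprit with queries like the following. The metric names are standard etcd metrics; the 10 ms / 25 ms targets are etcd's own rough guidance, so treat the thresholds as approximate:

```yaml
# Rough disk-latency checks for embedded etcd on the server nodes.
# etcd guidance: p99 WAL fsync should stay under ~10ms and p99
# backend commit under ~25ms on healthy hardware.
- record: etcd:wal_fsync_p99
  expr: |
    histogram_quantile(0.99,
      rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
- record: etcd:backend_commit_p99
  expr: |
    histogram_quantile(0.99,
      rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))
```

If those p99s sit well above the targets, faster disks (or keeping Longhorn and other I/O-heavy workloads off the server nodes) is the right fix.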