jmlp1
February 3, 2024, 8:01pm
1
Rancher v2.7.5 and v2.7.6 both have the same issue: notifications via Slack or Email receivers are not sent.
Followed the instructions here: How to set up Alertmanager configs in Monitoring V2 in Rancher | Support | SUSE
There are also open issues here:
GitHub issue (opened 04:49PM - 17 May 23 UTC; labels: kind/bug, area/monitoring, team/observability&backup):
**Rancher Server Setup**
- Rancher version: 2.7.3 - GKE: v1.24.10-gke.2300
**Information about the Cluster**
- Kubernetes version: GKE: v1.24.10-gke.2300
**Describe the bug**
After the creation of the AlertmanagerConfig nothing happens and the alerts/notifications are not sent to the Slack channel.
**To Reproduce**
- Create a secret containing the webhook URL for the Slack channel:
```bash
kubectl create secret generic my-secret-slack -n cattle-monitoring-system --from-literal=key="https://hooks.slack.com/services/T9VQ2SN9K/B0542USLL8J/XXXXXXXXXXXXXXXXXXXX"
```
- PrometheusRule:
<img width="1266" alt="Screen Shot 2023-05-17 at 14 03 38" src="https://github.com/rancher/rancher/assets/26934246/33f59259-9ebc-419c-8e2b-4f5d99fc5659">
<img width="1266" alt="Screen Shot 2023-05-17 at 14 02 52" src="https://github.com/rancher/rancher/assets/26934246/aef76c7d-c031-4438-9745-ae32c936b785">
- Create an AlertmanagerConfig in the UI, setting Slack as a receiver (a YAML sketch of this step is shown right after this list):
<img width="1266" alt="Screen Shot 2023-05-17 at 13 37 55" src="https://github.com/rancher/rancher/assets/26934246/00a6ff7b-e650-4164-97af-ea2baa40c53a">
<img width="1266" alt="Screen Shot 2023-05-17 at 13 38 46" src="https://github.com/rancher/rancher/assets/26934246/f5ff4760-3abd-4b76-8bc4-1ce3bec9b4c3">
<img width="1266" alt="Screen Shot 2023-05-17 at 13 39 30" src="https://github.com/rancher/rancher/assets/26934246/62a0e98d-6510-4538-9918-b7621032382d">
<img width="1266" alt="Screen Shot 2023-05-17 at 13 46 24" src="https://github.com/rancher/rancher/assets/26934246/f5d5d955-6252-44eb-8c71-ebf5b55f5c87">
The alert is in the UI, but no messages are arriving in the Slack channel:
<img width="1266" alt="Screen Shot 2023-05-17 at 14 02 24" src="https://github.com/rancher/rancher/assets/26934246/b945ab8b-7496-4a52-8c7b-ef3197b48cc6">
<img width="1266" alt="Screen Shot 2023-05-17 at 14 02 52" src="https://github.com/rancher/rancher/assets/26934246/66612b99-144f-4d20-9d18-47552b3f8e63">
Using `curl` on the command line, the message arrives normally in the Slack channel:
[Screenshot: a manual `curl` POST to the Slack webhook delivering a message successfully]
**Result**
No alerts are sent to the Slack channel.
**Expected Result**
Slack channel receives the alerts.
**Logs**
I believe there is nothing unusual in the logs either:
```bash
kubectl logs pod/alertmanager-rancher-monitoring-alertmanager-0 -n cattle-monitoring-system -c alertmanager
...
...
...
ts=2023-05-17T15:43:24.425Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T15:43:24.426Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T15:49:24.624Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T15:49:24.624Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T16:37:25.641Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2023-05-17T16:37:25.641Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
```
```bash
kubectl logs -f prometheus-rancher-monitoring-prometheus-0 -n cattle-monitoring-system -c prometheus
...
...
...
ts=2023-05-17T14:27:03.394Z caller=main.go:1181 level=info msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
ts=2023-05-17T14:27:03.401Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.401Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.401Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.402Z caller=kubernetes.go:326 level=info component="discovery manager scrape" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.402Z caller=kubernetes.go:326 level=info component="discovery manager notify" discovery=kubernetes msg="Using pod service account via in-cluster config"
ts=2023-05-17T14:27:03.518Z caller=main.go:1218 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=124.398203ms db_storage=1.668µs remote_storage=3.097µs web_handler=1.809µs query_engine=1.742µs scrape=2.261779ms scrape_sd=1.441285ms notify=29.135µs notify_sd=337.088µs rules=115.691019ms tracing=8.936µs
ts=2023-05-17T15:00:02.173Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1684324800180 maxt=1684332000000 ulid=01H0N3K9FG9W8715R904QF5W71 duration=1.036452537s
ts=2023-05-17T15:00:02.213Z caller=head.go:844 level=info component=tsdb msg="Head GC completed" duration=37.948353ms
ts=2023-05-17T15:00:02.215Z caller=checkpoint.go:100 level=info component=tsdb msg="Creating checkpoint" from_segment=6 to_segment=7 mint=1684332000000
ts=2023-05-17T15:00:02.768Z caller=head.go:1013 level=info component=tsdb msg="WAL checkpoint complete" first=6 last=7 duration=553.32545ms
ts=2023-05-17T17:00:02.170Z caller=compact.go:519 level=info component=tsdb msg="write block" mint=1684332000023 maxt=1684339200000 ulid=01H0NAF0QGSQKQMCKTCG7WK47X duration=1.03365409s
ts=2023-05-17T17:00:02.214Z caller=head.go:844 level=info component=tsdb msg="Head GC completed" duration=41.518447ms
ts=2023-05-17T17:00:04.062Z caller=compact.go:460 level=info component=tsdb msg="compact blocks" count=3 mint=1684303200173 maxt=1684324800000 ulid=01H0NAF1S8GA622JRW4F53383X sources="[01H0MF03VY371H8TW0K9TF1DYC 01H0MNVV3Y9HX4BXAPF7CPRDZX 01H0MWQJBZTY1ES1GJ4ZDBDB3G]" duration=1.84544325s
ts=2023-05-17T17:00:04.070Z caller=db.go:1294 level=info component=tsdb msg="Deleting obsolete block" block=01H0MF03VY371H8TW0K9TF1DYC
ts=2023-05-17T17:00:04.076Z caller=db.go:1294 level=info component=tsdb msg="Deleting obsolete block" block=01H0MWQJBZTY1ES1GJ4ZDBDB3G
ts=2023-05-17T17:00:04.082Z caller=db.go:1294 level=info component=tsdb msg="Deleting obsolete block" block=01H0MNVV3Y9HX4BXAPF7CPRDZX
```
**Additional context**
I've double-checked this procedure before opening this GitHub issue: https://www.suse.com/support/kb/doc/?id=000020737
Any help would be appreciated!
Thank you
GitHub issue (opened 06:16AM - 25 Apr 23 UTC; labels: kind/bug, area/alerting, team/infracloud, team/opni):
I use Rancher 2.7.1. I have set an alert rule; I can see the rule is active and the alert is fired, as shown in the following images:
[Screenshot: the alert rule active and firing in the Rancher UI]
I also set the AlertmanagerConfig as follows:
[Screenshots: the AlertmanagerConfig email receiver and route settings]
I set an email receiver and a route for the new alert rule, but I do not receive any alert emails. Can someone help me? Thank you!
I tried Robusta in an AKS environment and it just worked; its Alertmanager sent notifications to Slack and email based on Prometheus alerts.
At the moment we are not receiving alerts from production Rancher clusters or even from test ones; clusters on different clouds all behave the same.
It would be great if someone could have a look into this.
Hi, in the Alertmanager GUI → Status section, can you see your Alertmanager config present there? We need to check whether the config is being reflected in Alertmanager. If the Alertmanager configuration is not getting updated, then look for the alertmanager-monitoring-rancher-monitor-alertmanager secret in the cattle-monitoring-system namespace. You need to delete this secret, since it contains the Alertmanager data; after deleting it, the new config will be applied. Wait for some time and, if the config is correct, it will be updated.
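Roughly the same checks from the command line would look like the sketch below. The pod name and config path are taken from the logs earlier in the thread; the secret name varies, so list the secrets first and double-check before deleting anything.
```bash
# Show the rendered configuration that Alertmanager actually loaded
# (pod name and file path taken from the logs above).
kubectl exec -n cattle-monitoring-system alertmanager-rancher-monitoring-alertmanager-0 \
  -c alertmanager -- cat /etc/alertmanager/config_out/alertmanager.env.yaml

# Find the secret holding the Alertmanager configuration, then delete it so the
# operator regenerates it; verify the exact name in your cluster first.
kubectl get secrets -n cattle-monitoring-system | grep -i alertmanager
kubectl delete secret <alertmanager-config-secret-name> -n cattle-monitoring-system
```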
jmlp1
February 17, 2024, 10:15am
3
Hi @vaishnav,
Tried your steps, easy to follow, and indeed the config was wrong. However, after deleting the secret it still showed the wrong config: it should contain Slack and email receivers instead of PagerDuty, which we don't use.
In addition, the Status appears disabled. I have tried reading a few internet articles and the SUSE documentation, but it is not really clear how to proceed to enable it.
This is the confusing part:
Routes and Receivers appear as deprecated; the receivers are empty, but the routes have a default route that matches the above, and this “default null” apparently cannot be deleted.
My configuration is set in the AlertmanagerConfigs section but does not appear to have been applied.
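(For what it's worth, one way to check whether the operator is selecting the AlertmanagerConfig at all is to compare the object's namespace and labels against the selectors on the Alertmanager resource. A sketch is below; the resource name is assumed from the pod name in the logs above and may differ.)
```bash
# List AlertmanagerConfig objects across namespaces.
kubectl get alertmanagerconfigs -A

# Show which AlertmanagerConfigs the Alertmanager resource is configured to pick up
# (resource name assumed from the pod name alertmanager-rancher-monitoring-alertmanager-0).
kubectl get alertmanager rancher-monitoring-alertmanager -n cattle-monitoring-system \
  -o jsonpath='{.spec.alertmanagerConfigSelector}{"\n"}{.spec.alertmanagerConfigNamespaceSelector}{"\n"}'
```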
If it's showing disabled, it's okay. Can you create an AlertmanagerConfig instead of using Routes and Receivers? You can configure the routes and receivers in the AlertmanagerConfig as well.
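As a sketch of that suggestion, an AlertmanagerConfig carrying both a route and an email receiver could look roughly like the following. Every name, address, and SMTP setting here is a placeholder, not a value from this thread.
```bash
# Illustrative sketch only -- every name, address and SMTP setting is a placeholder.
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: email-alerts                  # hypothetical name
  namespace: cattle-monitoring-system
spec:
  route:
    receiver: email-receiver
    groupBy: ['alertname']
    matchers:
      - name: severity
        value: critical
  receivers:
    - name: email-receiver
      emailConfigs:
        - to: alerts@example.com            # placeholder address
          from: alertmanager@example.com    # placeholder sender
          smarthost: smtp.example.com:587   # placeholder SMTP server
          authUsername: alertmanager@example.com
          authPassword:
            name: smtp-credentials          # hypothetical secret
            key: password
EOF
```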