Hello all,
I'm struggling to expose the Thanos sidecar (deployed with the monitoring chart) via gRPC so that a Thanos querier in another cluster can reach it. All clusters are managed by Rancher with its default nginx-ingress-controller, and we're using Cloudflare for DNS and the default ingress certificates.
This is the error I see in the ingress-controller logs when I try to access the ingress host address via a browser:
2021/11/12 17:06:22 [error] upstream rejected request with error 2 while reading response header from upstream, client: , server: thanos-sidecar-clusterXYZ.domain.com, request: "GET / HTTP/2.0", upstream: "grpc://10.42.5.138:10901", host: "thanos-sidecar-clusterXYZ.domain.com"
12/Nov/2021:17:06:22 +0000 [source IP: ], server: thanos-sidecar-clusterXYZ.domain.com, method: GET, uri: /, request_filename: /usr/local/nginx/html/, bytes_sent: 659, request_time: 0.001, status: 502, request_proto: HTTP/2.0, duration: 0.001, http_user_agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
Manifests of the cluster with the sidecar:
apiVersion: v1
kind: Service
metadata:
  annotations:
    meta.helm.sh/release-name: rancher-monitoring
    meta.helm.sh/release-namespace: cattle-monitoring-system
spec:
  clusterIP: 10.43.202.158
  clusterIPs:
  - 10.43.202.158
  ports:
  - name: grpc
    port: 10901
    protocol: TCP
    targetPort: 10901
  selector:
    app.kubernetes.io/name: prometheus
    prometheus: rancher-monitoring-prometheus
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    meta.helm.sh/release-name: rancher-monitoring
    meta.helm.sh/release-namespace: cattle-monitoring-system
    nginx.ingress.kubernetes.io/backend-protocol: grpc
    rancher.io/globalDNS.hostname: thanos-sidecar-clusterXYZ.domain.com
spec:
  rules:
  - host: thanos-sidecar-clusterXYZ.domain.com
    http:
      paths:
      - backend:
          serviceName: rancher-monitoring-thanos-external
          servicePort: 10901
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - thanos-sidecar-clusterXYZ.domain.com
The querier in the Thanos querier cluster is configured as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    meta.helm.sh/release-name: thanos
    meta.helm.sh/release-namespace: thanos
spec:
  template:
    spec:
      containers:
      - args:
        - query
        - --log.level=debug
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --store=dnssrv+_grpc._tcp.rancher-monitoring-thanos-discovery.cattle-monitoring-system.svc.cluster.local
        - --store=dnssrv+_grpc._tcp.thanos-storegateway.thanos.svc.cluster.local
        - --store=thanos-sidecar-clusterXYZ.domain.com:10901
        name: query
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        - containerPort: 10901
          name: grpc
          protocol: TCP
On the querier side, only this warning shows up in the logs:
level=warn caller=endpointset.go:525 component=endpointset msg="update of node failed" err="getting metadata: fallback fetching info from thanos-sidecar-clusterXYZ.domain.com:10901: rpc error: code = Unavailable desc = connection closed" address=thanos-sidecar-clusterXYZ.domain.com:10901
Can someone point me in the right direction or give me any hints on how to set up the connection properly? Or does anybody have general thoughts or experience on how to handle cross-cluster gRPC communication with Rancher + Cloudflare + its certificates?
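For comparison, a querier configuration that dials the sidecar through the ingress TLS port could look like the sketch below. This is an untested assumption, not a confirmed fix. It assumes the nginx ingress terminates TLS on port 443 (the Ingress above has a tls section) and forwards gRPC to the sidecar on 10901, and that the certificate presented by the ingress is trusted by the querier. The `--grpc-client-tls-secure` and `--grpc-client-server-name` flags are existing `thanos query` options. Only the port and the two TLS flags differ from the config above.

```yaml
# Sketch only, under the assumptions stated above; not a verified setup.
containers:
- name: query
  args:
  - query
  - --grpc-address=0.0.0.0:10901
  - --http-address=0.0.0.0:10902
  # Assumption: dial the ingress TLS port (443) instead of the sidecar port 10901.
  - --store=thanos-sidecar-clusterXYZ.domain.com:443
  # Use TLS on the client side of the gRPC connection.
  - --grpc-client-tls-secure
  # Server name used to verify the certificate returned by the ingress.
  - --grpc-client-server-name=thanos-sidecar-clusterXYZ.domain.com
```

If the default ingress certificate is not signed by a CA the querier trusts, `--grpc-client-tls-ca` would presumably be needed as well. A browser GET is also not a valid gRPC call, so the 502 above may not say much about whether a real gRPC client can get through.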