Rancher 2.6.1 on RKE2 using RKE2's nginx ingress controller failing

Trying to create a proof-of-concept Rancher install with HA. Using CentOS 7.9, with three Rancher nodes running an RKE2 cluster on Kubernetes 1.21 and an external HAProxy to forward traffic to them as required.

Setup seems to go OK, but connecting to Rancher at the end via the external HAProxy gets me a blank page, served with a certificate whose DNS name is rke2-ingress-nginx-controller-admission.kube-system.svc. Rancher did install an Ingress, but it doesn't mention that name.

Errors in the ingress controller logs are:

TLS handshake error from ${EXT_HAPROXY_IP}:51888: remote error: tls: unknown certificate
"Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"
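
In case it's useful, a quick way to confirm which certificate is actually being served on 443 (same placeholder variables as above) is something like:

echo | openssl s_client -connect ${EXT_HAPROXY_HOSTNAME}:443 -servername ${EXT_HAPROXY_HOSTNAME} 2>/dev/null | openssl x509 -noout -subject -issuer
echo | openssl s_client -connect ${EXT_HAPROXY_HOSTNAME}:443 -servername ${EXT_HAPROXY_HOSTNAME} 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

In my case the SAN that comes back is the rke2-ingress-nginx-controller-admission.kube-system.svc name mentioned above, not my Rancher hostname.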

Rancher install commands:

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm search repo rancher-stable/rancher --versions | grep 2.6
kubectl create namespace cattle-system
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.5.1 --set installCRDs=true
cp /nfs/admin/certs/local.ca.crt ./ca-additional.pem
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem=./ca-additional.pem
helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=${EXT_HAPROXY_HOSTNAME} --version 2.6.1 --set additionalTrustedCAs=true --set auditLog.level=1
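
For what it's worth, something like this should confirm that cert-manager and Rancher rolled out cleanly (namespace and release names as in the commands above):

kubectl -n cert-manager get pods
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods,ingress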

Contents of ingress installed by Rancher:

[root@gdrch004v ~]# kubectl -n cattle-system describe ingress
Name:             rancher
Namespace:        cattle-system
Address:          ${RANCHER_RKE_NODE1_IP},${RANCHER_RKE_NODE2_IP},${RANCHER_RKE_NODE3_IP}
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  tls-rancher-ingress terminates ${EXT_HAPROXY_HOSTNAME}
Rules:
  Host                        Path  Backends
  ----                        ----  --------
  ${EXT_HAPROXY_HOSTNAME}  
                                 rancher:80 (10.42.0.10:80,10.42.1.4:80,10.42.2.4:80)
Annotations:                  cert-manager.io/issuer: rancher
                              cert-manager.io/issuer-kind: Issuer
                              field.cattle.io/publicEndpoints:
                                [{"addresses":["${RANCHER_RKE_NODE1_IP}","${RANCHER_RKE_NODE1_IP}","${RANCHER_RKE_NODE1_IP}"],"port":443,"protocol":"HTTPS","serviceName":"cattle-system:rancher","ingressN...
                              meta.helm.sh/release-name: rancher
                              meta.helm.sh/release-namespace: cattle-system
                              nginx.ingress.kubernetes.io/proxy-connect-timeout: 30
                              nginx.ingress.kubernetes.io/proxy-read-timeout: 1800
                              nginx.ingress.kubernetes.io/proxy-send-timeout: 1800
Events:                       <none>

(Note: I replaced the IPs and hostnames from my environment with bash-style variable names describing what they are.)

I'm not sure if I need to do something to the RKE2 ingress controller to make it happy with incoming SSL (maybe register my local self-signed CA), or if the Ingress isn't connected to the ingress controller (the only one in the RKE2 cluster, deployed as a DaemonSet per the default). The documentation has pages for RKE, but I'm not finding RKE2 equivalents, and the RKE docs seem different enough that a lot of them talk about things that RKE2 automates.
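
If it helps, I think something like the following should show whether the rancher Ingress actually references the RKE2 controller's IngressClass (the Ingress name is from the describe output above; I'm not sure what the class is called in RKE2, hence the grep):

kubectl get ingressclass
kubectl -n cattle-system get ingress rancher -o yaml | grep -iE 'ingressclassname|kubernetes.io/ingress.class'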

Thanks,
Bill

A little more debugging info.

Running curl -k -v https://${EXT_HAPROXY_HOSTNAME} I get the following:

* About to connect() to ${EXT_HAPROXY_HOSTNAME} port 443 (#0)
*   Trying ${EXT_HAPROXY_IP}...
* Connected to ${EXT_HAPROXY_HOSTNAME} (${EXT_HAPROXY_IP}) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* 	subject: O=nil2
* 	start date: Oct 19 15:14:11 2021 GMT
* 	expire date: Sep 25 15:14:11 2121 GMT
* 	common name: (nil)
* 	issuer: O=nil1
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: ${EXT_HAPROXY_HOSTNAME}
> Accept: */*
> 
< HTTP/1.1 400 Bad Request
< Date: Thu, 21 Oct 2021 20:49:29 GMT
< Content-Length: 0
< 
* Connection #0 to host ${EXT_HAPROXY_HOSTNAME} left intact

Following kubectl -n kube-system logs -f rke2-ingress-nginx-controller-_____ on all three instances at once, I saw that the curl above produced this log message:

E1021 20:49:29.891546       8 server.go:77] "Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"

So I tried adding -d '{}' to the curl command, and the log message changed to:

E1021 20:50:28.368491       7 server.go:77] "Failed to decode request body" err="Object 'Kind' is missing in '{}'"

So to me this looks like the Ingress installed by Rancher is not being served by the RKE2 ingress controller, even though that is the only installed ingress controller, which the general Kubernetes docs say should make it the default.
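
For anyone following along, something like this should help tell the admission webhook apart from the actual ingress data path (the service and DaemonSet names are my best guess from the cert and pod names above):

kubectl -n kube-system get svc rke2-ingress-nginx-controller-admission
kubectl -n kube-system get ds rke2-ingress-nginx-controller -o wide
sudo ss -tlnp | grep -E ':(80|443|8443) '   # run on one of the RKE2 nodes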

I’m brand new to Kubernetes, so possibly I’m missing something here?

Got a reply from Rancher employee @brandond on the Slack RKE2 channel. The JSON parsing errors are from traffic going to the admission webhook, not to the ingress. I needed to swap the 8080 & 8443 in my load balancer to 80 & 443, so the load balancer also wasn't really necessary.

I'd originally done this because, when checking on the RKE2 hosts, the process listening on ports 80 & 443 was labeled nginx while the one on 8443 was named nginx-ingress-controller, and also because 8080 & 8443 were listed as Rancher UI ports on the Rancher Docs "Port Requirements" page of required ports for running a Rancher cluster. But the traffic is actually supposed to go to 80 & 443.
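
In case it helps anyone else, here's a minimal sketch of what the corrected load-balancer side looks like, assuming plain TCP passthrough and the node IP placeholders from above (the frontend/backend names are made up, and port 80 gets an equivalent pair):

# /etc/haproxy/haproxy.cfg (sketch) - pass 443 straight through to the ingress controller's hostPort on the nodes
frontend rancher_https
    bind *:443
    mode tcp
    option tcplog
    default_backend rancher_rke2_https

backend rancher_rke2_https
    mode tcp
    balance roundrobin
    server node1 ${RANCHER_RKE_NODE1_IP}:443 check
    server node2 ${RANCHER_RKE_NODE2_IP}:443 check
    server node3 ${RANCHER_RKE_NODE3_IP}:443 check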

I'm going to have things to go back to later, since the 8443 setup had fixed an issue I hit when installing RKE2 in the first place, but that'll be a different problem.

So this one appears fixed.