Rancher 2.6.1 on RKE2 using RKE2's nginx ingress controller failing

Trying to create a proof-of-concept Rancher install with HA. Using CentOS 7.9, with three Rancher nodes running an RKE2 cluster on Kubernetes 1.21 and an external HAProxy to forward traffic to them as required.

Setup seems to go OK, but connecting to Rancher at the end via the external HAProxy gets me a blank page, served with a certificate whose DNS name is rke2-ingress-nginx-controller-admission.kube-system.svc. Rancher did install an Ingress, but it doesn't mention that name.

Errors in the ingress controller logs are:

TLS handshake error from ${EXT_HAPROXY_IP}:51888: remote error: tls: unknown certificate
"Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"
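
In case it's useful, a quick way to confirm which certificate is actually being served on 443 (same placeholder variables as above) is something like:

echo | openssl s_client -connect ${EXT_HAPROXY_HOSTNAME}:443 -servername ${EXT_HAPROXY_HOSTNAME} 2>/dev/null | openssl x509 -noout -subject -issuer
echo | openssl s_client -connect ${EXT_HAPROXY_HOSTNAME}:443 -servername ${EXT_HAPROXY_HOSTNAME} 2>/dev/null | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

In my case the SAN that comes back is the rke2-ingress-nginx-controller-admission.kube-system.svc name mentioned above, not my Rancher hostname.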

Rancher install commands:

helm repo add rancher-stable https://releases.rancher.com/server-charts/stable
helm search repo rancher-stable/rancher --versions | grep 2.6
kubectl create namespace cattle-system
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.5.1 --set installCRDs=true
cp /nfs/admin/certs/local.ca.crt ./ca-additional.pem
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem=./ca-additional.pem
helm install rancher rancher-stable/rancher --namespace cattle-system --set hostname=${EXT_HAPROXY_HOSTNAME} --version 2.6.1 --set additionalTrustedCAs=true --set auditLog.level=1
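
For what it's worth, something like this should confirm that cert-manager and Rancher rolled out cleanly (namespace and release names as in the commands above):

kubectl -n cert-manager get pods
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pods,ingress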

Contents of ingress installed by Rancher:

[root@gdrch004v ~]# kubectl -n cattle-system describe ingress
Name:             rancher
Namespace:        cattle-system
Address:          ${RANCHER_RKE_NODE1_IP},${RANCHER_RKE_NODE2_IP},${RANCHER_RKE_NODE3_IP}
Default backend:  default-http-backend:80 (<error: endpoints "default-http-backend" not found>)
TLS:
  tls-rancher-ingress terminates ${EXT_HAPROXY_HOSTNAME}
Rules:
  Host                        Path  Backends
  ----                        ----  --------
  ${EXT_HAPROXY_HOSTNAME}  
                                 rancher:80 (10.42.0.10:80,10.42.1.4:80,10.42.2.4:80)
Annotations:                  cert-manager.io/issuer: rancher
                              cert-manager.io/issuer-kind: Issuer
                              field.cattle.io/publicEndpoints:
                                [{"addresses":["${RANCHER_RKE_NODE1_IP}","${RANCHER_RKE_NODE1_IP}","${RANCHER_RKE_NODE1_IP}"],"port":443,"protocol":"HTTPS","serviceName":"cattle-system:rancher","ingressN...
                              meta.helm.sh/release-name: rancher
                              meta.helm.sh/release-namespace: cattle-system
                              nginx.ingress.kubernetes.io/proxy-connect-timeout: 30
                              nginx.ingress.kubernetes.io/proxy-read-timeout: 1800
                              nginx.ingress.kubernetes.io/proxy-send-timeout: 1800
Events:                       <none>

(Note: I replaced the IPs and hostnames from my environment with bash-style variable names describing what they are.)

I'm not sure if I need to do something to the RKE2 ingress controller to make it happy with incoming SSL (maybe register my local self-signed CA), or if the Ingress isn't connected to the ingress controller (the only one in the RKE2 cluster, deployed as a DaemonSet per the default). The documentation has pages for RKE, but I'm not finding RKE2 equivalents, and the RKE docs seem different enough that a lot of them talk about things that RKE2 automates.
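
If it helps, I think something like the following should show whether the rancher Ingress actually references the RKE2 controller's IngressClass (the Ingress name is from the describe output above; I'm not sure what the class is called in RKE2, hence the grep):

kubectl get ingressclass
kubectl -n cattle-system get ingress rancher -o yaml | grep -iE 'ingressclassname|kubernetes.io/ingress.class'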

Thanks,
Bill

A little more debugging info.

Running curl -k -v https://${EXT_HAPROXY_HOSTNAME} I get the following:

* About to connect() to ${EXT_HAPROXY_HOSTNAME} port 443 (#0)
*   Trying ${EXT_HAPROXY_IP}...
* Connected to ${EXT_HAPROXY_HOSTNAME} (${EXT_HAPROXY_IP}) port 443 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
* skipping SSL peer certificate verification
* SSL connection using TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
* Server certificate:
* 	subject: O=nil2
* 	start date: Oct 19 15:14:11 2021 GMT
* 	expire date: Sep 25 15:14:11 2121 GMT
* 	common name: (nil)
* 	issuer: O=nil1
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: ${EXT_HAPROXY_HOSTNAME}
> Accept: */*
> 
< HTTP/1.1 400 Bad Request
< Date: Thu, 21 Oct 2021 20:49:29 GMT
< Content-Length: 0
< 
* Connection #0 to host ${EXT_HAPROXY_HOSTNAME} left intact

Following kubectl -n kube-system logs -f rke2-ingress-nginx-controller-_____ on all three instances at once, I saw that the curl above produced this log message:

E1021 20:49:29.891546       8 server.go:77] "Failed to decode request body" err="couldn't get version/kind; json parse error: unexpected end of JSON input"

So I tried adding -d '{}' to the curl command, and the log message changed to:

E1021 20:50:28.368491       7 server.go:77] "Failed to decode request body" err="Object 'Kind' is missing in '{}'"

So to me this looks like the Ingress installed by Rancher is not being served by the RKE2 ingress controller, even though that is the only installed ingress controller, which the general Kubernetes docs say should make it the default.
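
For anyone following along, something like this should help tell the admission webhook apart from the actual ingress data path (the service and DaemonSet names are my best guess from the cert and pod names above):

kubectl -n kube-system get svc rke2-ingress-nginx-controller-admission
kubectl -n kube-system get ds rke2-ingress-nginx-controller -o wide
sudo ss -tlnp | grep -E ':(80|443|8443) '   # run on one of the RKE2 nodes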

I’m brand new to Kubernetes, so possibly I’m missing something here?

Got a reply from Rancher employee @brandond on the Slack RKE2 channel. The JSON parsing errors are from traffic going to the admission webhook, not to the ingress. I needed to swap the 8080 & 8443 in my load balancer to 80 & 443, so the load balancer also wasn't really necessary.

I'd originally done this because, when checking on the RKE2 hosts, the process listening on ports 80 & 443 was labeled nginx while the one on 8443 was named nginx-ingress-controller, and also because 8080 & 8443 were listed as Rancher UI ports on the Rancher Docs "Port Requirements" page of required ports for running a Rancher cluster. But the traffic is actually supposed to go to 80 & 443.
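
In case it helps anyone else, here's a minimal sketch of what the corrected load-balancer side looks like, assuming plain TCP passthrough and the node IP placeholders from above (the frontend/backend names are made up, and port 80 gets an equivalent pair):

# /etc/haproxy/haproxy.cfg (sketch) - pass 443 straight through to the ingress controller's hostPort on the nodes
frontend rancher_https
    bind *:443
    mode tcp
    option tcplog
    default_backend rancher_rke2_https

backend rancher_rke2_https
    mode tcp
    balance roundrobin
    server node1 ${RANCHER_RKE_NODE1_IP}:443 check
    server node2 ${RANCHER_RKE_NODE2_IP}:443 check
    server node3 ${RANCHER_RKE_NODE3_IP}:443 check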

I'm going to have things to go back to later, since the 8443 setup had fixed an issue I hit when installing RKE2 in the first place, but that'll be a different problem.

So this one appears fixed.