Problem adding AKS cluster to Rancher

We are trying to add an AKS cluster to Rancher and are running into a problem. Here is the Slack post with the full video recording that reproduces it in real time: Slack

As you can see, the Rancher pod crashes within several minutes after we try to add the AKS cluster through the Rancher UI. If Rancher is deployed in HA mode, all pods eventually die and end up in a crash loop.

We can provide any info on demand to help resolve this. Any pointers on where to look or zoom in would be greatly appreciated.

If we are doing anything wrong/stupid here (we have never used an AKS cluster with Rancher running in AKS before), please let us know.

We initially missed the error in the aks-config-operator logs. It looks like this:

Doing /etc/rancher/ssl
W0712 18:30:13.902837      10 loader.go:221] Config not found: /home/aks-operator/.kube/config
time="2022-07-12T18:30:14Z" level=info msg="Starting, Kind=AKSClusterConfig controller"
time="2022-07-12T18:30:14Z" level=info msg="Starting /v1, Kind=Secret controller"
time="2022-07-12T18:30:14Z" level=info msg="Checking configuration for cluster [aksnonprod]"
E0712 18:30:15.575463      10 runtime.go:78] Observed a panic: runtime.boundsError{x:1, y:0, signed:true, code:0x0} (runtime error: index out of range [1] with length 0)
goroutine 252 [running]:, 0xc0008a6090)
	/go/pkg/mod/ +0x95, 0x0, 0x0)
	/go/pkg/mod/ +0x86
panic(0x14c27a0, 0xc0008a6090)
	/usr/local/go/src/runtime/panic.go:965 +0x1b9, 0xc000875e40, 0x17609d8, 0xc0003ce0c0, 0x7f25ec64eff0, 0xc000345f80, 0xc000396118, 0xd18c2e2800, 0x3, 0x6fc23ac00)
	/go/src/ +0x131f*Handler).checkAndUpdate(0xc0003271c0, 0xc000396000, 0x0, 0x0, 0x0)
	/go/src/ +0x55c*Handler).OnAksConfigChanged(0xc0003271c0, 0xc000040200, 0x1a, 0xc000396000, 0xc0002b08a0, 0xc0ab8d4d8b16b31f, 0x11b0ace1)
	/go/src/ +0x9d*Handler).recordError.func1(0xc000040200, 0x1a, 0xc000396000, 0x14eb180, 0xeca0c0, 0x12cb0c0)
	/go/src/ +0x67, 0x1a, 0x1739d68, 0xc000396000, 0xe0, 0xf, 0x14eb180, 0x40a84c)
	/go/src/ +0x6b, 0xc000040200, 0x1a, 0x1739d68, 0xc000396000, 0x1a, 0xc000026c01, 0x40a56c, 0x2030288)
	/go/pkg/mod/ +0x4e*SharedHandler).OnChange(0xc000327140, 0xc000040200, 0x1a, 0x1739d68, 0xc000396000, 0xc000915d01, 0x0)
	/go/pkg/mod/ +0x14c*controller).syncHandler(0xc0000e09a0, 0xc000040200, 0x1a, 0xc000915e58, 0x4)
	/go/pkg/mod/ +0xd1*controller).processSingleItem(0xc0000e09a0, 0x13ce1e0, 0xc00085c200, 0x0, 0x0)
	/go/pkg/mod/ +0xe7*controller).processNextWorkItem(0xc0000e09a0, 0x203000)
	/go/pkg/mod/ +0x54*controller).runWorker(...)
	/go/pkg/mod/ +0x5f, 0x1734000, 0xc0002b0930, 0x1, 0xc00008b080)
	/go/pkg/mod/ +0x9b, 0x3b9aca00, 0x0, 0xc00051fc01, 0xc00008b080)
	/go/pkg/mod/ +0x98, 0x3b9aca00, 0xc00008b080)
	/go/pkg/mod/ +0x4d
created by*controller).run
	/go/pkg/mod/ +0x33b
panic: runtime error: index out of range [1] with length 0 [recovered]
	panic: runtime error: index out of range [1] with length 0

It looks like /home/aks-operator/.kube/config is not in place. Is this a bug (i.e. the config should be created automatically), or are we missing something?
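
For what it's worth, the final panic message in the log is the text the Go runtime prints when code reads index 1 of an empty slice, so some list the operator expected to be populated may have been empty. The sketch below only reproduces that message. It is not the operator's code, and the helper names secondElement and recoverMessage are hypothetical.

```go
package main

import "fmt"

// secondElement is a hypothetical helper that reads index 1
// without first checking the slice length.
func secondElement(items []string) string {
	return items[1]
}

// recoverMessage calls secondElement and returns the text of the
// panic it raises, or an empty string if no panic occurred.
func recoverMessage(items []string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	secondElement(items)
	return ""
}

func main() {
	// An empty (nil) slice triggers the bounds-check panic.
	fmt.Println(recoverMessage(nil))
}
```

A length check before indexing (for example, `if len(items) > 1`) is the usual guard against this kind of panic.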

I created an issue: "Rancher 2.2.6 aks-config-operator /home/aks-operator/.kube/config not found" (Issue #38285 in rancher/rancher on GitHub). No response yet.

At this point I’m wondering if anybody has successfully added an Azure AKS cluster to Rancher recently. What versions are working for you?

Other people are seeing the same crash stack trace:

Something seems to be broken. Any help or workaround from the Rancher team would be appreciated.