I’ve recently configured a launch template for a node group in EKS, which is managed by Rancher.
Getting the nodes set up the way I wanted took a few iterations, each of which meant creating a new node group.
After cleaning up the old node groups and updating the Rancher user with the necessary permissions, the cluster is now stuck in an updating state with the following error:
Controller.FailureMessage{ClusterName:"", Message_:"Launch template could not be found : Could not find the specified version 3 for the launch template with ID lt-xxxx.", NodegroupName:""}
The launch template still exists, but version 3 was deleted. That version was attached to a node group which no longer exists.
Anyone come across this before? Or any ideas on how to solve this?
EDIT: Looks like the ‘eks-config-operator’ is the culprit:
time="2024-04-16T13:56:59Z" level=error msg="error syncing 'cattle-global-data/c-94tkb': handler eks-controller: error creating nodegroup: InvalidParameterException: Launch template could not be found : Could not find the specified version 3 for the launch template with ID lt-xxxxxxxx.\n{\n RespMetadata: {\n StatusCode: 400,\n RequestID: "e324c1b9-16ec-4788-af55-c3581719fe15"\n },\n Message_: "Launch template could not be found : Could not find the specified version 3 for the launch template with ID lt-xxxxxxxx."\n}, requeuing"
It looks like the controller is infinitely requeuing the job. Restarting the operator doesn’t help. Not sure if there’s a way to clear this.
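For anyone wanting to confirm the same state, here is a rough Python sketch. It pulls the template ID and version out of the error text, then lists the versions that still exist for that template. It assumes boto3 is installed and AWS credentials are configured. The function names and the region are placeholders for illustration, not anything from Rancher or the operator. It only diagnoses the mismatch; it does not clear the requeue loop.

```python
import re

# Matches the EKS error text quoted in the post above.
MISSING_VERSION = re.compile(
    r"Could not find the specified version (\d+) "
    r"for the launch template with ID (lt-[0-9A-Za-z]+)"
)


def parse_missing_version(message):
    """Return (template_id, version) from the error text, or None if absent."""
    match = MISSING_VERSION.search(message)
    if match is None:
        return None
    return match.group(2), int(match.group(1))


def existing_versions(template_id, region):
    """List the version numbers that still exist for a launch template.

    Assumption: boto3 is available and credentials are set up. The import
    is done here so the parsing helper above works without it.
    """
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_launch_template_versions")
    versions = []
    for page in paginator.paginate(LaunchTemplateId=template_id):
        versions.extend(v["VersionNumber"] for v in page["LaunchTemplateVersions"])
    return sorted(versions)


if __name__ == "__main__":
    error = (
        "Launch template could not be found : Could not find the specified "
        "version 3 for the launch template with ID lt-xxxx."
    )
    print(parse_missing_version(error))
    # With a real template ID and region (placeholders here):
    # print(existing_versions("lt-xxxx", "eu-west-1"))
```

If the version from the error is missing from the list returned by `existing_versions`, something on the Rancher side is still pointing at the deleted version.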