I’ve recently configured a launch template for a node group in EKS, which is managed by Rancher.
This went through some iterations which required creating a few different node groups in order to get the nodes set up as I wanted.
After cleaning up the old node groups and updating the Rancher user with the necessary permissions, the cluster is now stuck in an updating state with the following error:
Controller.FailureMessage{ClusterName:"", Message_:"Launch template could not be found : Could not find the specified version 3 for the launch template with ID lt-xxxx.", NodegroupName:""}
The launch template exists, but the version was removed. This version was also connected to a node group which doesn’t exist.
Anyone come across this before? Or any ideas on how to solve this?
EDIT: Looks like the 'eks-config-operator' is the culprit:
time="2024-04-16T13:56:59Z" level=error msg="error syncing 'cattle-global-data/c-94tkb': handler eks-controller: error creating nodegroup: InvalidParameterException: Launch template could not be found : Could not find the specified version 3 for the launch template with ID lt-xxxxxxxx.\n{\n  RespMetadata: {\n    StatusCode: 400,\n    RequestID: \"e324c1b9-16ec-4788-af55-c3581719fe15\"\n  },\n  Message_: \"Launch template could not be found : Could not find the specified version 3 for the launch template with ID lt-xxxxxxxx.\"\n}, requeuing"
It looks like the controller is infinitely requeuing the job. Restarting the operator doesn’t help. Not sure if there’s a way to clear this.
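Before touching anything in Rancher, it may help to confirm which versions of the launch template actually still exist in AWS. A sketch using the AWS CLI, assuming credentials and region are configured for the right account (lt-xxxxxxxx is the placeholder ID from the error above):

```shell
# List the version numbers that still exist for the launch template
# referenced in the error message.
aws ec2 describe-launch-template-versions \
  --launch-template-id lt-xxxxxxxx \
  --query 'LaunchTemplateVersions[].VersionNumber'
```

If the version Rancher is asking for (3 in this case) is missing from the output, the controller's spec is referencing a deleted version and will keep requeuing until the spec is corrected.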
I am facing a similar issue.
I recently upgraded EKS from 1.26 to 1.27, and a new launch template was created. Rancher is still looking for the old template.
ERROR: Controller.FailureMessage{ClusterName:"", Message_:"Launch template could not be found : The specified launch template, with template ID lt-xxxx, does not exist.", NodegroupName:""}
Also, when I checked Cluster Management, the Kubernetes version was grayed out.
On Rancher 2.7.9, I solved this by manually patching the clusters.management.cattle.io object related to the downstream cluster on the Rancher local instance (namespace fleet-default).
I replaced the inconsistent data under
spec:
  eksConfig:
    nodeGroups:
with the correct data coming from the status field of the object:
status:
  appliedSpec:
    eksConfig:
      nodeGroups:
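A sketch of the steps above with kubectl, against the Rancher local cluster. `<cluster-id>` is a placeholder for the downstream cluster's ID (the c-xxxxx name from the logs); the namespace is fleet-default as described, though on some Rancher versions the object may live elsewhere (e.g. cattle-global-data) or be cluster-scoped:

```shell
# 1. Inspect the node groups Rancher last applied successfully:
kubectl -n fleet-default get clusters.management.cattle.io <cluster-id> \
  -o jsonpath='{.status.appliedSpec.eksConfig.nodeGroups}'

# 2. Open the object and copy that list over spec.eksConfig.nodeGroups:
kubectl -n fleet-default edit clusters.management.cattle.io <cluster-id>
```

Once spec.eksConfig.nodeGroups matches what actually exists in EKS, the controller should stop requeuing the failed reconcile.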
Thanks! That worked like a charm.
Note for anyone else making these changes: ensure that all changes have been moved from status.appliedSpec.eksConfig.nodeGroups to spec.eksConfig.nodeGroups
Any changes made to node groups after the initial error will be reverted if the spec isn’t up-to-date, which can result in node groups being deleted from EKS.
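One way to reduce that risk is to diff the two fields before patching, so nothing in the desired spec is silently lost. A sketch assuming jq is installed; `<cluster-id>` is again a placeholder:

```shell
# Dump both views of the node groups and compare them before copying anything.
kubectl -n fleet-default get clusters.management.cattle.io <cluster-id> -o json \
  | jq '.spec.eksConfig.nodeGroups' > spec-nodegroups.json
kubectl -n fleet-default get clusters.management.cattle.io <cluster-id> -o json \
  | jq '.status.appliedSpec.eksConfig.nodeGroups' > applied-nodegroups.json
diff spec-nodegroups.json applied-nodegroups.json
```

Any entries present only in spec-nodegroups.json are changes Rancher has not yet applied; carry them over deliberately rather than overwriting the whole list.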
We have the same issue here; it has now happened three times, each time after we created a new node group and deleted the old one. So for us it looks like an error that will always happen.
Does someone know a way to fix this other than editing clusters.management.cattle.io every time, with the fear of losing node groups because something was overlooked?
Using Rancher v2.8.4
It seems that Rancher 2.8.5 fixes this:
- Fixed an issue where custom secrets encryption configurations were being stored in plaintext under the cluster's AppliedSpec. This was also causing clusters to continuously reconcile, as the AppliedSpec would never match the desired cluster Spec. The information stored here contains the encryption configuration for secrets within etcd, and could potentially expose sensitive data if the etcd database was exposed directly. For more information, see [#45800] and [CVE-2024-22032].
Rancher Release v2.8.5 - Announcements - Rancher Labs