Rancher Release v2.7.10

Release v2.7.10

It is important to review the Install/Upgrade Notes below before upgrading to any Rancher version.

Rancher v2.7.10 is a security release to address the following issues:

Security Fixes for Rancher Vulnerabilities

This release addresses the following Rancher security issues:

For more details, see the Security Advisories and CVEs page in Rancher’s documentation page or in Rancher’s GitHub repo.

Known Issues

  • When you upgrade to Rancher v2.7.9 with provisioned RKE2/K3s clusters in an unhealthy state, you may encounter the error message, implausible joined server for entry. This requires manually marking the nodes in the cluster with a joined server. See #42856.
  • Scaling down etcd node pools on RKE2/K3s machine-provisioned clusters may result in unexpected behavior. To avoid this problem, define multiple machine pools for etcd nodes, each with a quantity of one. You can then scale down simply by deleting machine pools. As a further mitigation, have a robust backup strategy and store your etcd snapshots in a safe location. Restoring from an etcd snapshot allows you to return the cluster back to an operational state if you’re affected by unexpected behaviors while scaling down etcd node pools. See #42582 and #43263.

Install/Upgrade Notes

Upgrade Requirements

  • Creating backups: We strongly recommend creating a backup before upgrading Rancher. To roll back Rancher after an upgrade, you must back up and restore Rancher to the previous Rancher version. Because Rancher will be restored to its state when a backup was created, any changes post upgrade will not be included after the restore. For more information, see the documentation on backing up Rancher.
  • Helm version: Rancher install or upgrade must occur with Helm 3.2.x+ due to the changes with the latest cert-manager release. See #29213.
  • CNI requirements:
    • For Kubernetes v1.19 and newer, we recommend disabling firewalld as it’s incompatible with various CNI plugins. See #28840.
    • If upgrading or installing a Linux distribution which uses nf_tables as the backend packet filter (such as SLES 15, RHEL 8, Ubuntu 20.10, Debian 10, or later), upgrade to RKE v1.19.2 or later to get Flannel version v0.13.0, which supports nf_tables. See Flannel #1317.
  • Requirements for air gapped environments:
    • When installing or upgrading Rancher in an air gapped environment, add the flag --no-hooks to the helm template command, to skip rendering files for Helm’s hooks. See #3226.
    • If using a proxy in front of an air-gapped Rancher instance, you must pass additional parameters to NO_PROXY. See the documentation and related issue #2725.
  • Requirements for Docker installs:
    • When starting the Rancher Docker container, you must use the privileged flag. See documentation.
    • When installing in an air-gapped environment, you must supply a custom registries.yaml file to the docker run command, as shown in the K3s documentation. If the registry has certificates, then you’ll also need to supply those. See #28969.
    • When upgrading a Docker installation, a panic may occur in the container, which causes it to restart. After restarting, the container will come up and work as expected. See #33685.

Reverting the Active Directory Migration Delivered in v2.7.5

The following information only applies if you are upgrading from Rancher v2.7.5. It does not apply if you are upgrading directly to the latest Rancher version from v2.7.4 or earlier, or if you are upgrading to the latest Rancher version from v2.7.6.

Rancher v2.7.6 and later contain a reverse migration utility that runs at startup. Data migration is only triggered if you have been on Rancher v2.7.5.

Users will be corrected from the v2.7.5 data migration, which updated personalIDs to use GUIDs instead of Distinguished Names (DNs). Rancher v2.7.6 and later fix bugs related to the inability to login for various reasons. See #41985 and #42120.

Important: If you disabled AD authentication while on v2.7.5, don’t enable it after upgrading until after the utility is run. Doing so will cause the reverse migration to fail to clean up the remaining bad data.

We strongly recommend that you directly upgrade to the latest version of Rancher v2.7.x, especially if you’re on a broken or partially downgraded Rancher setup after upgrading to v2.7.5. Allow the startup utility to revert the Active Directory changes to restore functionality to your setup.

Even if you’re currently on Rancher v2.7.5 and your setup wasn’t broken by the Active Directory changes, you should still upgrade to v2.7.6 or later and allow the startup utility to revert the migration.

The reverse migration startup utility saves all relevant changes to Rancher if it finds GUID-based users in Active Directory. The users’ data (including the user object, all bindings, and tokens) return a Distinguished Name as the principalID. If the LDAP connection permanently fails during execution of the utility, Rancher automatically retries the utility several times with exponential backoff. Missing users are left behind and reported to the local admin for manual review.

If you need to clean up any missing users following an upgrade to the latest Rancher version, contact support.

Long-standing Rancher Behavior Changes

  • The cluster-api core provider controllers are now run in a pod in the cattle-provisioning-cattle-system namespace, within the local cluster. These controllers are installed with a Helm chart. Previously, Rancher ran cluster-api controllers in an embedded fashion. This change makes it easier to maintain cluster-api versioning. See #41094.
  • Changed the token hashing algorithm to generate new tokens using SHA3. Existing tokens will not be re-hashed. This change affects ClusterAuthTokens (the downstream synced version of tokens for ACE) and Tokens (only when token hashing is enabled). SHA3 tokens should work with ACE and Token Hashing. Existing tokens which don’t use SHA3 may not work when ACE and token hashing are used in combination. If, after upgrading to Rancher v2.7.7, you experience issues with ACE while Token Hashing is enabled, you should re-generate any applicable tokens. See #42062.
  • If you use a version of backup restore older than v102.0.2+up3.1.2 to take a backup of Rancher v2.7.7, the migration will encounter a capi-webhook error. Make sure that the chart version used for backups is v102.0.2+up3.1.2, which has cluster.x-k8s.io/v1alpha4 resources removed from the resourceSet. If you can’t use v102.0.2+up3.1.2 for backups, delete all cluster.x-k8s.io/v1alpha4 resources from the backup tar before using it. See #382.
  • Rancher installs the same pinned version of the rancher-webhook chart not only in the local cluster but also in all downstream clusters. Note that restoring Rancher from v2.7.5 to an earlier version will result in downstream clusters’ webhooks being at the version set by Rancher v2.7.5, which might cause incompatibility issues. Local and downstream webhook versions ideally need to be in sync. See #41730 and #41917.
  • The mutating webhook configuration for secrets is no longer active in downstream clusters. See #41613.
  • You must manually change the psp.enabled value in the chart install yaml when you install or upgrade v102.x.y charts on hardened RKE2 clusters. Instructions for updating the value are available. See #41018.
  • The Helm Controller in RKE2/K3s now respects the managedBy annotation. Project Monitoring V2 required a workaround in its initial release to set helmProjectOperator.helmController.enabled: false since the Helm Controller operated on a cluster-wide level and ignored the managedBy annotation. See #39724.
  • Rancher might retain resources from a disabled auth provider configuration in the local cluster, even after you configure another auth provider. To manually trigger cleanup for a disabled auth provider, add the management.cattle.io/auth-provider-cleanup annotation with the unlocked value to its auth config. See #40378.
  • Privilege escalation is now disabled by default when creating deployments. See #7165.
  • Rancher maintains a /v1/counts endpoint that the UI uses to display resource counts. The UI subscribes to changes to the counts for all resources through a websocket to receive the new counts for resources.
    • Rancher now aggregates the changed counts and only send a message every 5 seconds. This, in turn, requires the UI to update the counts at most once every 5 seconds, improving UI performance. Previously, Rancher would send a message each time the resource counts changed for a resource type. This lead to the UI needing to constantly stop other areas of processing to update the resource counts. See #36682.
    • Rancher now only sends back a count for a resource type if the count has changed from the previously known number, improving UI performance. Previously, each message from this socket would include all counts for every resource type in the cluster, even if the counts only changed for one specific resource type. This would cause the UI to need to re-update resource counts for every resource type at a high frequency, causing significant performance impact. See #36681.
  • When provisioning downstream clusters, the cluster name must now conform to RFC-1123. Previously, characters that did not follow the specification, such as ., were permitted and would result in clusters being provisioned without the necessary Fleet components. See #39248.
  • In previous versions, pods critical to running Rancher didn’t use a priority class. This could cause a cluster with limited resources to evict Rancher pods before other noncritical pods. A configurable priorityClass has been added to the Rancher pod and its feature charts. See #37927.
  • Rancher now defaults to using the bci-micro image for sidecar audit logging, instead of Busybox. See #35587.
  • Rancher no longer validates an app registration’s permissions to use Microsoft Graph on endpoint updates or initial setup. You should add Directory.Read.All permissions of type Application. If you configure a different set of permissions, Rancher may not have sufficient privileges to perform some necessary actions within Azure AD. This will cause errors.
  • Previously, only the global container registry was used when installing or upgrading an official Rancher Helm chart app for RKE2/K3s node driver clusters. The default behavior has been changed, so that if a private registry exists in the cluster configuration, that registry will be used for pulling images. If no cluster-scoped registry is found, the global container registry will be used. A custom default registry can be specified during the Helm chart install and upgrade workflows.

Versions

Please refer to the README for latest and stable versions.

Please review our version documentation for more details on versioning and tagging conventions.

Images

  • rancher/rancher:v2.7.10

Tools

Kubernetes Versions for RKE

  • v1.26.11 (Default)
  • v1.25.16
  • v1.24.17
  • v1.23.16

Kubernetes Versions for RKE2/K3s

  • v1.26.9 (Default)
  • v1.25.14
  • v1.24.17
  • v1.23.17

Rancher Helm Chart Versions

Starting in 2.6.0, many of the Rancher Helm charts available in the Apps & Marketplace will start with a major version of 100. This was done to avoid simultaneous upstream changes and Rancher changes from causing conflicting version increments. This also brings us into compliance with semver, which is a requirement for newer versions of Helm. You can now see the upstream version of a chart in the build metadata, for example: 100.0.0+up2.1.0. See #32294.

Other Notes

Experimental Features

  • Dual-stack and IPv6-only support for RKE1 clusters using the Flannel CNI will be experimental starting in v1.23.x. See the upstream Kubernetes docs. Dual-stack is not currently supported on Windows. See #165.

Deprecated Upstream Projects

  • Microsoft has deprecated the Azure AD Graph API that Rancher had been using for authentication via Azure AD. A configuration update is necessary to make sure users can still use Rancher with Azure AD. See the docs and #29306 for details.

Removed Legacy Features

The following legacy features have been removed as of Rancher v2.7.0. The deprecation and removal of these features were announced in previous releases. See #6864.

UI and Backend

  • CIS Scans v1 (Cluster)
  • Pipelines (Project)
  • Istio v1 (Project)
  • Logging v1 (Project)
  • RancherD

UI

  • Multiclusterapps (Global) - Apps within Multicluster Apps section

Long-standing Known Major Issues

  • The SAML authentication pop-up throws a 404 error on high-availability RKE installations. Single node Docker installations aren’t affected. If you refresh the browser window and select Resend, the authentication request will succeed, and you will be able to log in. See #31163.
  • Due to the backoff logic in various components, downstream provisioned K3s and RKE2 clusters may take longer to re-achieve Active status after a migration. If you see that a downstream cluster is still updating or in an error state immediately after a migration, please let it attempt to resolve itself. This might take up to an hour to complete. See #34518 and #42834.
  • Users on certain LDAP setups don’t have permission to search LDAP. When they attempt to perform a search, they receive the error message, Result Code 32 "No Such Object". See #35259.
  • Target customization can produce custom resources that exceed the Rancher API’s maximum bundle size. This results in Request entity too large errors when attempting to add a GitHub repo. Only target customizations that modify the Helm chart URL or version are affected. As a workaround, use multiple paths or GitHub repos instead of target customization. See #1650.
  • Kubernetes Cluster Distributions:
    • Starting with Rancher v2.7.2, a webhook will now be installed in all downstream clusters. There are several issues that users may encounter with this functionality:
      • If you rollback from a version of Rancher >= v2.7.2 to a version < v2.7.2, you’ll experience an issue where the webhooks will remain in downstream clusters. Since the webhook is designed to be 1:1 compatible with specific versions of Rancher, this can cause unexpected behaviors to occur downstream. The Rancher team has developed a script which should be used after rollback is complete (meaning after Rancher version < v2.7.2 is running) to remove the webhook from affected downstream clusters. See #40816.
    • Any distribution - Windows:
      • Downstream Windows clusters get stuck after a Rancher upgrade. Windows nodes become stuck in an unavailable state, with an failed to list *v1.ConfigMap: configmaps "kube-root-ca.crt" is forbidden error message. As a workaround, reboot the node. See #42426.
    • RKE:
      • Rotating encryption keys with a custom encryption provider is not supported. See #30539.
    • RKE and RKE2:
      • When running CIS scans on RKE and RKE2 clusters on Kubernetes v1.25, some tests will fail if the rke-profile-hardened-1.23 or the rke2-profile-hardened-1.23 profile is used. These RKE and RKE2 test cases failing is expected as they rely on PSPs, which have been removed in Kubernetes v1.25. See #39851.
    • RKE2:
      • Amazon ECR Private Registries are not functional. See #33920.
      • When provisioning using an RKE2 cluster template, the rootSize for AWS EC2 provisioners does not currently take an integer when it should, and an error is thrown. To work around this issue, wrap the EC2 rootSize in quotes. See #40128.
      • The communication between the ingress controller and the pods doesn’t work when you create an RKE2 cluster with Cilium as the CNI and activate project network isolation. See documentation and #34275.
    • RKE2 - Windows:
      • CSI Proxy for Windows will now work in an air-gapped environment.
      • NodePorts do not work on Windows Server 2022 in RKE2 clusters due to a Windows kernel bug. See #159.
      • When upgrading Windows nodes in RKE2 clusters via the Rancher UI, Windows worker nodes will require a reboot after the upgrade is completed. See #37645.
    • RKE2 and K3s:
      • Provisioning RKE2/K3s clusters with added (not built-in) custom node drivers causes provisioning to fail. However, there is a workaround for fixing these node drivers after activating. See #37074.
      • Scaling down etcd nodes on K3s/RKE2 machine-provisioned clusters may inadvertently delete all etcd nodes in the pool. This is linked to an upstream cluster-api bug that causes the controllers to delete more than the desired quantity of etcd nodes when reconciling an RKE Machine Pool. This issue affects etcd node scale-down operations on K3s/RKE2 machine-provisioned clusters. To help mitigate the issue, have a robust backup strategy and store your etcd snapshots in a safe location. See #42582.
      • Deleting a control plane node results in worker nodes also reconciling. See #39021.
      • Deleting nodes from custom RKE2/K3s clusters in Rancher v2.7.2 can cause unexpected behavior, if the underlying infrastructure isn’t thoroughly cleaned. When deleting a custom node from your cluster, ensure that you delete the underlying infrastructure for it, or run the corresponding uninstall script for the Kubernetes distribution installed on the node. See #41034:
    • K3s:
      • Clusters are in an Updating state even when it contains nodes in an Error state. See #39164.
    • AKS:
      • Imported Azure Kubernetes Service (AKS) clusters don’t display workload level metrics. This bug affects Monitoring V1. A workaround is available. See #4658.
      • When editing or upgrading the AKS cluster, do not make changes from the Azure console or CLI at the same time. These actions must be done separately. See #33561.
      • Windows node pools are not currently supported. See #32586.
      • Azure Container Registry-based Helm charts cannot be added in Cluster Explorer, but do work in the Apps feature of Cluster Manager. Note that when using a Helm chart repository, the disableSameOriginCheck setting controls when credentials are attached to requests. See documentation and #34584 for more information.
    • GKE:
      • Basic authentication must be explicitly disabled in GCP before upgrading a GKE cluster to 1.19+ in Rancher. See #32312.
    • EKS:
      • EKS clusters on Kubernetes v1.21 or below on Rancher v2.7 cannot be upgraded. To see more detail about this issue and the workaround, please see this comment.
  • Harvester:
    • If you’re using Rancher v2.7.2 with Harvester v1.1.1 clusters, you won’t be able to select the Harvester cloud provider when deploying or updating guest clusters. The Harvester release notes contain instructions on how to resolve this. See #3750.
    • Upgrades from Harvester v0.3.0 are not supported.
    • Deploying Fleet to Harvester clusters is not yet supported. Clusters, whether Harvester or non-Harvester, imported using the Virtualization Management page will result in the cluster not being listed on the Continuous Delivery page. See #35049.
  • Cluster Tools:
    • Authentication:
      • Rancher might retain resources from a disabled auth provider configuration in the local cluster, even after configuring another auth provider. To manually trigger cleanup for a disabled auth provider, add the management.cattle.io/auth-provider-cleanup annotation with the unlocked value to its auth config. See #40378.
    • Fleet:
      • When updating Rancher, sometimes Fleet is not upgraded to the latest version. See #1590. To ensure you upgrade Fleet:
        1. Refresh the rancher-charts catalog resource.
        2. Go to Apps, select All Workspaces, and note the versions of fleet-crd and fleet-chart.
        3. Select the dot menu to select Upgrade to check if there is a newer Fleet version listed.
        4. Update the fleet chart. Fleet will automatically update the agents.
      • Multiple fleet-agent pods may be created and deleted during initial downstream agent deployment; rather than just one. This resolves itself quickly, but is unintentional behavior. See #33293.
    • Hardened clusters:
      • Not all cluster tools can currently be installed on a hardened cluster.
    • Rancher Backup:
      • When migrating to a cluster with the Rancher Backup feature, the server-url cannot be changed to a different location. It must continue to use the same URL.
      • Because Kubernetes v1.22 drops the apiVersion apiextensions.k8s.io/v1beta1, trying to restore an existing backup file into a v1.22+ cluster will fail because the backup file contains CRDs with the apiVersion v1beta1. There are two options to work around this issue: update the default resourceSet to collect the CRDs with the apiVersion v1, or update the default resourceSet and the client to use the new APIs internally. See documentation and #34154.
    • Monitoring:
      • Deploying Monitoring on a Windows cluster with win_prefix_path set requires users to deploy Rancher Wins Upgrader to restart wins on the hosts to start collecting metrics in Prometheus. See #32535.
    • Logging:
      • Windows nodeAgents are not deleted when performing helm upgrade after disabling Windows logging on a Windows cluster. See #32325.
    • Istio Versions:
      • Istio 1.12 and below do not work on Kubernetes 1.23 clusters. To use the Istio charts, please do not update to Kubernetes 1.23 until the next charts’ release.
      • Deprecated resources are not automatically removed and will cause errors during upgrades. Manual steps must be taken to migrate and/or cleanup resources before an upgrade is performed. See #34699.
      • Applications injecting Istio sidecars, fail on SELinux RHEL 8.4 enabled clusters. A temporary workaround for this issue is to run the following command on each cluster node before creating a cluster: mkdir -p /var/run/istio-cni && semanage fcontext -a -t container_file_t /var/run/istio-cni && restorecon -v /var/run/istio-cni. See #33291.
  • Docker Installations:
    • UI issues may occur due to a longer startup time. User will receive an error message when launching Docker for the first time #28800, and user is directed to username/password screen when accessing the UI after a Docker install of Rancher. See #28798.
    • On a Docker install upgrade and rollback, Rancher logs will repeatedly display the messages “Updating workload ingress-nginx/nginx-ingress-controller” and “Updating service frontend with public endpoints”. Ingresses and clusters are functional and active, and logs resolve eventually. See #35798.
    • Rancher single node wont start on Apple M1 devices with Docker Desktop 4.3.0 or newer. See #35930.
  • Rancher UI:
    • The Cluster page shows the Registration tab when updating or upgrading a hosted cluster. See #8524.
    • When viewing or editing the YAML configuration of downstream RKE2 clusters through the UI, spec.rkeConfig.machineGlobalConfig.profile is set to null, which is an invalid configuration. See #8480.
    • When you upgrade your Kubernetes cluster, you might see the following error: Cluster health check failed. During an upgrade, this is a benign error and will self-resolve. It’s caused by the Kubernetes API server becoming temporarily unavailable as it is being upgraded within your cluster. See #41012.
      • When enabling some custom node drivers, the Cloud Credential creation page does not show the correct default fields and has an uneditable foo key. See #8563.
    • You need to force-refresh the Rancher UI after initial Rancher setup, to trigger the prompt to accept the self-signed certificate. As a workaround, visit the Rancher portal, accept the self-signed certificate, and go through the setup process. Once done, go to the address bar of your browser and click the lock icon. Select the option to allow you to receive certificate errors for the Rancher website. You’ll then be prompted again to accept the new certificate. See #7867.
    • Once you configure a setting with an environmental variable, it can’t be updated through the Rancher API or the UI. It can only be updated through changing the value of the environmental variable. Setting the environmental variable to “” (the empty string) changes the value in the Rancher API but not in Kubernetes. As a workaround, run kubectl edit setting <setting-name>, then set the value and source fields to "", and re-deploy Rancher. See #37998.
    • After installing an app from a partner chart repo, the partner chart will upgrade to feature charts if the chart also exists in the feature charts default repo. See #5655.
    • In some instances under Users and Authentication, no users are listed and selectinging Create to create a new user does not display the entire form. To work around this when encountered, perform a hard refresh to be able to log back in. See #37531.
    • Deployment securityContext section is missing when a new workload is created. This prevents pods from starting when Pod Security Policy Support is enabled. See #4815.
    • Removed legacy feature multi-cluster app. See #39525.
  • Legacy UI:
    • When using the Rancher UI to add a new port of type ClusterIP to an existing Deployment created using the legacy UI, the new port will not be created upon saving. To work around this issue, repeat the procedure to add the port again. Users will notice the Service Type field will display as Do not create a service. Change this to ClusterIP and upon saving, the new port will be created successfully during this subsequent attempt. See #4280.