Rancher Release v2.7.2

Jono-SUSE-Rancher · April 12, 2023, 4:22am

Release v2.7.2

It is important to review the Install/Upgrade Notes below before upgrading to any Rancher version.

Warning: v2.7.2 contains a critical vulnerability, CVE-2023-22651. Customers currently on v2.7.2 should upgrade to v2.7.3 as soon as possible. v2.7.3 resolves this issue but is otherwise a mirror release of v2.7.2. Please see the advisory for more information.

Features and Enhancements

Kubernetes v1.25 Support

Support for Kubernetes v1.25 has been added. See #38701

Pod Security Policies, Pod Security Admissions, and Pod Security Standards

Kubernetes v1.25 removes Pod Security Policies (PSPs). Rancher v2.7.2 adds support for their replacements, Pod Security Admissions (PSAs) and Pod Security Standards (PSSs). To facilitate their usage, Rancher v2.7.2 also adds a new custom resource definition (CRD): PSA configuration templates. These templates are pre-defined security configurations that you can apply to RKE and RKE2/K3s clusters.

Rancher Webhook Now Deployed Downstream

Starting with Rancher v2.7.2, a webhook is now installed in all downstream clusters, to validate and enforce certain boundaries. There is a known issue with behavior during rollbacks, as well as with downstream GKE private clusters; see the Known Issues section of these release notes for advice and instructions. See #40816 and #41142.

New in Rancher

Added support for provisioning downstream K3s clusters on ARM64. See #34658.
Added a new global.cattle.psp.enabled value to the Rancher Helm chart, to allow disabling of PSPs from within Rancher. See #40327.
The Rancher chart now exposes the startupProbe value, along with its failureThreshold and periodSeconds sub-values. These values are off by default. If the sub-values are not set, failureThreshold defaults to 1 and periodSeconds defaults to 30. See #38177 and #39841.
Validated compatibility with cert-manager v1.11.0. See #40880.
If you trigger an upgrade to Kubernetes v1.25 on a cluster with existing PSP resources but the PSP feature is disabled at the cluster level, the upgrade will still complete and the PSP resources will no longer be active or accessible on the cluster.
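
As a rough illustration of what the startupProbe defaults mentioned above mean in practice: under standard Kubernetes probe semantics, a container has roughly failureThreshold × periodSeconds seconds to start before it is restarted. The helper below is a hypothetical sketch (the function name is not part of the Rancher chart) and ignores any initial delay or probe timeout.

```python
def startup_window_seconds(failure_threshold: int = 1, period_seconds: int = 30) -> int:
    """Approximate time a container gets to start before the startup probe gives up.

    Defaults mirror the chart defaults stated in these notes
    (failureThreshold=1, periodSeconds=30). initialDelaySeconds and
    timeoutSeconds are deliberately left out of this sketch.
    """
    if failure_threshold < 1 or period_seconds < 1:
        raise ValueError("both values must be positive integers")
    return failure_threshold * period_seconds


if __name__ == "__main__":
    # With the defaults, Rancher would get about one probe period to start.
    print(startup_window_seconds())
    # A slower environment might raise failureThreshold to allow more time.
    print(startup_window_seconds(failure_threshold=10))
```

If Rancher takes longer than this window to start in your environment, raising failureThreshold is the usual lever.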

New in the Rancher UI

Added RBAC for Fleet users. See #7315.
Updated the Azure driver to allow for more advanced configuration of Azure Node Driver Clusters for RKE2/K3s. Important note: These fields can only be configured on newly added node pools. See #40609 (comment):
- Tags can be added to Azure nodes through the Rancher UI. See #26955.
- Accelerated Networking can be enabled on Azure nodes. This adds a dedicated NIC to each VM created by Rancher. See #37847.
- Availability Zones can be selected when deploying nodes to Azure. Proper use of Availability Zones can result in a more resilient cluster. Each node can only exist in one Availability Zone. See #39458.
SSH and HTTPS protocols are now allowed for repos in Fleet. See #7518.
Added support for the phoenixNAP (pnap) machine driver for node provisioning. See #6695.
You can deploy node templates with Availability Zones in Azure, for both RKE and RKE2. See #7180 and #7753.
You can assign custom instance roles for nodes in EKS groups. See #4568.
You’re prompted to select a namespace when there are a large number of resources. This is intended to help with performance. See #6483.
The Service detail page always shows the external IP of the Load Balancer Service. See #6245.

Major Bug Fixes

Fixed an issue where etcd nodes can’t be removed from RKE2/K3s clusters with dedicated nodes for control plane and etcd roles. Note: The downstream RKE2/K3s clusters must be upgraded to Kubernetes 1.23.7/1.24.11/1.25.7 or later for the fix to work. See #40301.
Fixed an issue where an RKE2/K3s cluster remains in an Updating state when performing a restore to Kubernetes v1.24, after it’s been upgraded from Kubernetes v1.24 to Kubernetes v1.25. See #40843.
Fixed an issue where provisioning clusters with --node-name would fail with a node not found error if set to a value that doesn’t match the default hostname. See #40147.
Fixed an issue where downstream K3s node driver clusters get stuck in an Updating state, when etcd and control plane roles share a node pool. The stuck clusters report, Configuring bootstrap node(s): waiting for cluster agent to connect. See #40016.
Fixed an issue where provisioning a downstream cluster with a . in its name prevents required Fleet components from deploying to the cluster. See #39248.
Fixed an issue where provisioning a hardened RKE2 cluster with the CIS 1.23 profile fails, if the cluster is on Kubernetes v1.25. See #39148.
Fixed an issue with unreachable nodes on downstream RKE2/K3s clusters. Setting Auto Replace didn’t result in the clusters being deleted and replaced. See #39139.
Fixed an issue with faulty formation of Keycloak search URLs. Rancher used to form a search URL for Keycloak by splitting the string based on the presence of /auth/. Newer Keycloak versions using the Quarkus Distribution didn’t include this value, causing a panic when users attempted to integrate Keycloak with Rancher. Rancher was updated to properly form this search URL for Quarkus Based distributions. Thank you to @jamhed for developing a fix for this issue. See #38683.
Fixed an issue where removing a resource quota limit from a project didn’t remove the corresponding limit from the project’s namespaces. See #38010.
Fixed an issue where high CPU usage was observed on RKE v1.24.x clusters. See #38816.
Fixed an issue where groups weren’t being evaluated when determining a user’s permission to Manage Cluster Members and Project Members. See #196.
Fixed an issue where tokens were created with an invalid GroupPrincipals value, which prevents the token from being updated. See #39107.
Fixed an issue that prevented pod_security_policy_template_id from being removed using the Rancher Terraform provider. See #40494.
Fixed an issue that prevented authentication providers from being enabled again after they were previously disabled outside of the Rancher UI. See #40878.
Fixed an issue where provisioning new RKE clusters with a password-protected private registry may fail. See #40321 for a workaround and more information on the issue.
Fixed an issue in RKE clusters on Kubernetes v1.24, where cAdvisor container labels and metric series would break Monitoring V2 dashboards. See #38934.
Fixed an issue where encryption keys may fail to rotate when there are a large number of secrets. See #38283.
Fixed an issue with downstream K3s clusters not working in certain proxied environments. See #39284.
Fixed an issue preventing the creation of Digital Ocean Node Templates. See #8476.
Project owners can now select Services for Ingress. See #7194.
The UI properly displays the TLS expiration date on certificates. See #7081.
Cloned namespaces are now created and put in the same project. See #7231.
Adding a label to a pod through the Rancher UI no longer triggers an error. See #8074.
Changing the vSphere CPU value no longer resets the CPU value to the default, when other values are changed. See #7783.
Browsers no longer run out of memory when tailing log files. See #7156.
The UI now shows the correct number of active pods, and the active pod percentage won’t ever be erroneously displayed as being over 100%. See #7504.

Rancher Behavior Changes

You must manually change the psp.enabled value in the chart install yaml when you install or upgrade v102.x.y charts on hardened RKE2 clusters. Instructions for updating the value are available. See #41018.
The Helm Controller in RKE2/K3s now respects the managedBy annotation. Project Monitoring V2 required a workaround in its initial release to set helmProjectOperator.helmController.enabled: false since the Helm Controller operated on a cluster-wide level and ignored the managedBy annotation. See #39724.
Rancher might retain resources from a disabled auth provider configuration in the local cluster, even after you configure another auth provider. To manually trigger cleanup for a disabled auth provider, add the management.cattle.io/auth-provider-cleanup annotation with the unlocked value to its auth config. See #40378.
Privilege escalation is now disabled by default when creating deployments. See #7165.
Rancher maintains a /v1/counts endpoint that the UI uses to display resource counts. The UI subscribes to changes to the counts for all resources through a websocket to receive the new counts for resources.
- Rancher now aggregates the changed counts and only sends a message every 5 seconds. This, in turn, requires the UI to update the counts at most once every 5 seconds, improving UI performance. Previously, Rancher would send a message each time the resource counts changed for a resource type. This led to the UI needing to constantly stop other areas of processing to update the resource counts. See #36682.
- Rancher now only sends back a count for a resource type if the count has changed from the previously known number, improving UI performance. Previously, each message from this socket would include all counts for every resource type in the cluster, even if the counts only changed for one specific resource type. This would cause the UI to need to re-update resource counts for every resource type at a high frequency, causing significant performance impact. See #36681.
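
The two /v1/counts changes described above (batching updates into one message per 5 seconds, and sending only the counts that changed) can be pictured with a small sketch. This is not Rancher's implementation; the class and method names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, Optional


class CountAggregator:
    """Buffers count changes and releases at most one message per interval."""

    def __init__(self, interval_seconds: float = 5.0) -> None:
        self.interval_seconds = interval_seconds
        self._last_sent: Dict[str, int] = {}  # counts the client already knows
        self._pending: Dict[str, int] = {}    # newest unsent count per resource type
        self._last_flush: Optional[float] = None

    def record(self, resource_type: str, count: int) -> None:
        """Keep only the newest count; drop it if the client already has it."""
        if self._last_sent.get(resource_type) == count:
            self._pending.pop(resource_type, None)
        else:
            self._pending[resource_type] = count

    def flush(self, now: float) -> Optional[Dict[str, int]]:
        """Return one message holding only changed counts, or None if nothing is due."""
        if not self._pending:
            return None
        if self._last_flush is not None and now - self._last_flush < self.interval_seconds:
            return None
        message = dict(self._pending)
        self._last_sent.update(message)
        self._pending.clear()
        self._last_flush = now
        return message


if __name__ == "__main__":
    agg = CountAggregator()
    agg.record("pods", 3)
    agg.record("pods", 4)       # overwrites the earlier pending value
    agg.record("secrets", 10)
    print(agg.flush(now=0.0))   # one message with both changed types
    agg.record("pods", 5)
    print(agg.flush(now=2.0))   # too soon, nothing is sent
    print(agg.flush(now=5.0))   # only the changed "pods" count is sent
```

The point of the sketch is that rapid changes to one resource type collapse into a single entry, and unchanged types never appear in a message.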
When provisioning downstream clusters, the cluster name must now conform to RFC-1123. Previously, characters that did not follow the specification, such as ., were permitted and would result in clusters being provisioned without the necessary Fleet components. See #39248.
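
To check a cluster name before provisioning, a minimal sketch follows. It assumes the RFC 1123 label form used for Kubernetes object names (lowercase alphanumerics and -, starting and ending with an alphanumeric, at most 63 characters), which is consistent with . being rejected. Rancher's own validation may apply further limits, and the function name here is hypothetical.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', alphanumeric at both ends.
_RFC1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LENGTH = 63


def is_rfc1123_label(name: str) -> bool:
    """Return True if name is a valid RFC 1123 label (assumed rule, see lead-in)."""
    return len(name) <= MAX_LABEL_LENGTH and _RFC1123_LABEL.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("prod-cluster-1", "prod.cluster", "-edge", "Staging"):
        print(candidate, is_rfc1123_label(candidate))
```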

Known Issues

Starting with Rancher v2.7.2, a webhook will now be installed in all downstream clusters. There are several issues that users may encounter with this functionality:
- If you rollback from a version of Rancher >= v2.7.2 to a version < v2.7.2, you’ll experience an issue where the webhooks will remain in downstream clusters. Since the webhook is designed to be 1:1 compatible with specific versions of Rancher, this can cause unexpected behaviors to occur downstream. The Rancher team has developed a script which should be used after rollback is complete (meaning after Rancher version < v2.7.2 is running) to remove the webhook from affected downstream clusters. See #40816.
- If you have downstream private GKE clusters, you might experience issues when interacting with the resources that the webhook validates, such as namespaces. This can cause problems with activities where Rancher needs to interact with those resources, such as when you install charts. As a workaround, add a firewall setting to allow traffic to the webhook. See #41142.
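
The rollback condition above (from a version >= v2.7.2 to a version < v2.7.2) can be expressed as a simple version comparison. The sketch below only decides whether the webhook cleanup step applies; it does not perform the cleanup, and the function names are hypothetical. It handles plain vMAJOR.MINOR.PATCH strings only, not pre-release suffixes.

```python
from typing import Tuple

# First Rancher version that installs the webhook downstream, per these notes.
WEBHOOK_INTRODUCED = (2, 7, 2)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'v2.7.2' or '2.7.2' into a comparable tuple."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def needs_webhook_cleanup(from_version: str, to_version: str) -> bool:
    """True when rolling back across the v2.7.2 boundary."""
    return parse_version(from_version) >= WEBHOOK_INTRODUCED > parse_version(to_version)


if __name__ == "__main__":
    print(needs_webhook_cleanup("v2.7.2", "v2.7.1"))  # crosses the boundary
    print(needs_webhook_cleanup("v2.7.3", "v2.7.2"))  # stays at or above v2.7.2
```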
When running CIS scans on RKE and RKE2 clusters on Kubernetes v1.25, some tests will fail if the rke-profile-hardened-1.23 or the rke2-profile-hardened-1.23 profile is used. These RKE and RKE2 test cases failing is expected as they rely on PSPs, which have been removed in Kubernetes v1.25. See #39851.
If you’re using Rancher v2.7.2 with Harvester v1.1.1 clusters, you won’t be able to select the Harvester cloud provider when deploying or updating guest clusters. The Harvester release notes contain instructions on how to resolve this. See #3750.
Deleting nodes from custom RKE2/K3s clusters in Rancher v2.7.2 can cause unexpected behavior, if the underlying infrastructure isn’t thoroughly cleaned. When deleting a custom node from your cluster, ensure that you delete the underlying infrastructure for it, or run the corresponding uninstall script for the Kubernetes distribution installed on the node. See #41034:
- RKE2 uninstall script
- K3s uninstall script
When you upgrade your Kubernetes cluster, you might see the following error: Cluster health check failed. During an upgrade, this is a benign error and will self-resolve. It’s caused by the Kubernetes API server becoming temporarily unavailable as it is being upgraded within your cluster. See #41012.
You need to force-refresh the Rancher UI after initial Rancher setup, to trigger the prompt to accept the self-signed certificate. As a workaround, visit the Rancher portal, accept the self-signed certificate, and go through the setup process. Once done, go to the address bar of your browser and click the lock icon. Select the option to allow you to receive certificate errors for the Rancher website. You’ll then be prompted again to accept the new certificate. See #7867.
Once you configure a setting with an environmental variable, it can’t be updated through the Rancher API or the UI. It can only be updated through changing the value of the environmental variable. Setting the environmental variable to “” (the empty string) changes the value in the Rancher API but not in Kubernetes. As a workaround, run kubectl edit setting <setting-name>, then set the value and source fields to "", and re-deploy Rancher. See #37998.
Imported Azure Kubernetes Service (AKS) clusters don’t display workload level metrics. This bug affects Monitoring V1. A workaround is available. See #4658.
Rancher might retain resources from a disabled auth provider configuration in the local cluster, even after configuring another auth provider. To manually trigger cleanup for a disabled auth provider, add the management.cattle.io/auth-provider-cleanup annotation with the unlocked value to its auth config. See #40378.
When enabling some custom node drivers, the Cloud Credential creation page does not show the correct default fields and has an uneditable foo key. See #8563.

Install/Upgrade Notes

If you’re installing Rancher for the first time, your environment must fulfill the installation requirements.

Upgrade Requirements

Creating backups: We strongly recommend creating a backup before upgrading Rancher. To roll back Rancher after an upgrade, you must back up and restore Rancher to the previous Rancher version. Because Rancher will be restored to its state when a backup was created, any changes post upgrade will not be included after the restore. For more information, see the documentation on backing up Rancher.
Helm version: Rancher install or upgrade must occur with Helm 3.2.x+ due to the changes with the latest cert-manager release. See #29213.
CNI requirements:
- For Kubernetes v1.19 and newer, we recommend disabling firewalld as it’s incompatible with various CNI plugins. See #28840.
- If upgrading or installing a Linux distribution which uses nf_tables as the backend packet filter (such as SLES 15, RHEL 8, Ubuntu 20.10, Debian 10, or later), upgrade to RKE v1.19.2 or later to get Flannel version v0.13.0, which supports nf_tables. See Flannel #1317.
Requirements for air gapped environments:
- When installing or upgrading Rancher in an air gapped environment, add the flag --no-hooks to the helm template command, to skip rendering files for Helm’s hooks. See #3226.
- If using a proxy in front of an air-gapped Rancher instance, you must pass additional parameters to NO_PROXY. See the documentation and related issue #2725.
Requirements for Docker installs:
- When starting the Rancher Docker container, you must use the privileged flag. See documentation.
- When installing in an air-gapped environment, you must supply a custom registries.yaml file to the docker run command, as shown in the K3s documentation. If the registry has certificates, then you’ll also need to supply those. See #28969.
- When upgrading a Docker installation, a panic may occur in the container, which causes it to restart. After restarting, the container will come up and work as expected. See #33685.

Rancher Behavior Changes

In previous versions, pods critical to running Rancher didn’t use a priority class. This could cause a cluster with limited resources to evict Rancher pods before other noncritical pods. A configurable priorityClass has been added to the Rancher pod and its feature charts. See #37927.
Rancher now defaults to using the bci-micro image for sidecar audit logging, instead of Busybox. See #35587.
Rancher no longer validates an app registration’s permissions to use Microsoft Graph on endpoint updates or initial setup. You should add Directory.Read.All permissions of type Application. If you configure a different set of permissions, Rancher may not have sufficient privileges to perform some necessary actions within Azure AD. This will cause errors.
Previously, only the global container registry was used when installing or upgrading an official Rancher Helm chart app for RKE2/K3s node driver clusters. The default behavior has been changed, so that if a private registry exists in the cluster configuration, that registry will be used for pulling images. If no cluster-scoped registry is found, the global container registry will be used. A custom default registry can be specified during the Helm chart install and upgrade workflows.

Versions

Please refer to the README for latest and stable versions.

Please review our version documentation for more details on versioning and tagging conventions.

Images

rancher/rancher:v2.7.2

Tools

CLI - v2.7.0
RKE - v1.4.4

Kubernetes Versions

In this Rancher release, Kubernetes versions differ between RKE and RKE2/K3s.

Kubernetes Versions for RKE

v1.25.6 (Default)
v1.24.10
v1.23.16

Kubernetes Versions for RKE2/K3s

v1.25.7 (Default)
v1.24.11
v1.23.17

Rancher Helm Chart Versions

Starting in 2.6.0, many of the Rancher Helm charts available in the Apps & Marketplace will start with a major version of 100. This was done to avoid simultaneous upstream changes and Rancher changes from causing conflicting version increments. This also brings us into compliance with semver, which is a requirement for newer versions of Helm. You can now see the upstream version of a chart in the build metadata, for example: 100.0.0+up2.1.0. See #32294.
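
To read the upstream version out of a chart version string such as 100.0.0+up2.1.0, you can split on the semver build-metadata separator (+). The sketch below assumes the up prefix shown in the example above; the function name is hypothetical.

```python
from typing import Optional, Tuple


def split_chart_version(version: str) -> Tuple[str, Optional[str]]:
    """Split 'X.Y.Z+upA.B.C' into (rancher_chart_version, upstream_version).

    The upstream version is None when there is no '+up...' build metadata.
    """
    rancher_version, _, build = version.partition("+")
    upstream = build[len("up"):] if build.startswith("up") else None
    return rancher_version, upstream


if __name__ == "__main__":
    print(split_chart_version("100.0.0+up2.1.0"))
    print(split_chart_version("100.0.0"))
```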

Other Notes

Experimental Features

Dual-stack and IPv6-only support for RKE1 clusters using the Flannel CNI will be experimental starting in v1.23.x. See the upstream Kubernetes docs. Dual-stack is not currently supported on Windows. See #165.

Deprecated Upstream Projects

Microsoft has deprecated the Azure AD Graph API that Rancher had been using for authentication via Azure AD. A configuration update is necessary to make sure users can still use Rancher with Azure AD. See the docs and #29306 for details.

Removed Legacy Features

The following legacy features have been removed as of Rancher v2.7.0. The deprecation and removal of these features were announced in previous releases. See #6864.

UI and Backend

CIS Scans v1 (Cluster)
Pipelines (Project)
Istio v1 (Project)
Logging v1 (Project)
RancherD

UI

Multiclusterapps (Global) - Apps within Multicluster Apps section

Known Major Issues

Kubernetes Cluster Distributions:
- RKE:
  - Rotating encryption keys with a custom encryption provider is not supported. See #30539.
- RKE2:
  - Amazon ECR Private Registries are not functional. See #33920.
  - When provisioning using an RKE2 cluster template, the rootSize for AWS EC2 provisioners does not currently take an integer when it should, and an error is thrown. To work around this issue, wrap the EC2 rootSize in quotes. See Dashboard #3689.
  - The communication between the ingress controller and the pods doesn’t work when you create an RKE2 cluster with Cilium as the CNI and activate project network isolation. See documentation and #34275.
  - The system-upgrade-controller Deployment may fail after Monitoring is enabled on an RKE2 v1.23 or v1.24 cluster with Windows nodes. See #38646.
- RKE2 - Windows:
  - CSI Proxy for Windows will now work in an air-gapped environment.
  - NodePorts do not work on Windows Server 2022 in RKE2 clusters due to a Windows kernel bug. See #159.
  - When upgrading Windows nodes in RKE2 clusters via the Rancher UI, Windows worker nodes will require a reboot after the upgrade is completed. See #37645.
  - The fleet-agent pod fails to deploy on an upgraded RKE2 Windows Custom cluster. See #993.
- RKE2 and K3s:
  - Deleting a control plane node results in worker nodes also reconciling. See #39021.
- K3s:
  - Clusters are in an Updating state even when they contain nodes in an Error state. See #39164.
- AKS:
  - When editing or upgrading the AKS cluster, do not make changes from the Azure console or CLI at the same time. These actions must be done separately. See #33561.
  - Windows node pools are not currently supported. See #32586.
  - Azure Container Registry-based Helm charts cannot be added in Cluster Explorer, but do work in the Apps feature of Cluster Manager. Note that when using a Helm chart repository, the disableSameOriginCheck setting controls when credentials are attached to requests. See documentation and #34584 for more information.
- GKE:
  - Basic authentication must be explicitly disabled in GCP before upgrading a GKE cluster to 1.19+ in Rancher. See #32312.
- EKS:
  - EKS clusters on Kubernetes v1.21 or below on Rancher v2.7 cannot be upgraded. To see more detail about this issue and the workaround, please see this comment.
Infrastructures:
- vSphere:
  - PersistentVolumes are unable to mount to custom vSphere hardened clusters using CSI charts. See #35173.
Harvester:
- Upgrades from Harvester v0.3.0 are not supported.
- Deploying Fleet to Harvester clusters is not yet supported. Clusters, whether Harvester or non-Harvester, imported using the Virtualization Management page will result in the cluster not being listed on the Continuous Delivery page. See #35049.
- When upgrading RKE2 in a Harvester cluster, the result may be that the first node will be upgraded while the remaining server nodes’ scheduling is disabled. See #39167.
Cluster Tools:
- Fleet:
  - Multiple fleet-agent pods may be created and deleted during initial downstream agent deployment, rather than just one. This resolves itself quickly, but is unintentional behavior. See #33293.
- Hardened clusters:
  - Not all cluster tools can currently be installed on a hardened cluster.
- Rancher Backup:
  - When migrating to a cluster with the Rancher Backup feature, the server-url cannot be changed to a different location. It must continue to use the same URL.
  - Because Kubernetes v1.22 drops the apiVersion apiextensions.k8s.io/v1beta1, trying to restore an existing backup file into a v1.22+ cluster will fail because the backup file contains CRDs with the apiVersion v1beta1. There are two options to work around this issue: update the default resourceSet to collect the CRDs with the apiVersion v1, or update the default resourceSet and the client to use the new APIs internally. See documentation and #34154.
- Monitoring:
  - Deploying Monitoring on a Windows cluster with win_prefix_path set requires users to deploy Rancher Wins Upgrader to restart wins on the hosts to start collecting metrics in Prometheus. See #32535.
- Logging:
  - Windows nodeAgents are not deleted when performing helm upgrade after disabling Windows logging on a Windows cluster. See #32325.
- Istio Versions:
  - Istio 1.12 and below do not work on Kubernetes 1.23 clusters. To use the Istio charts, please do not update to Kubernetes 1.23 until the next charts’ release.
  - Deprecated resources are not automatically removed and will cause errors during upgrades. Manual steps must be taken to migrate and/or cleanup resources before an upgrade is performed. See #34699.
  - Applications injecting Istio sidecars fail on SELinux-enabled RHEL 8.4 clusters. A temporary workaround for this issue is to run the following command on each cluster node before creating a cluster: mkdir -p /var/run/istio-cni && semanage fcontext -a -t container_file_t /var/run/istio-cni && restorecon -v /var/run/istio-cni. See #33291.
Docker Installations:
- UI issues may occur due to a longer startup time. Users will receive an error message when launching Docker for the first time (see #28800), and are directed to the username/password screen when accessing the UI after a Docker install of Rancher. See #28798.
- On a Docker install upgrade and rollback, Rancher logs will repeatedly display the messages “Updating workload ingress-nginx/nginx-ingress-controller” and “Updating service frontend with public endpoints”. Ingresses and clusters are functional and active, and logs resolve eventually. See #35798.
- Rancher single node won’t start on Apple M1 devices with Docker Desktop 4.3.0 or newer. See #35930.
Rancher UI:
- After installing an app from a partner chart repo, the partner chart will upgrade to feature charts if the chart also exists in the feature charts default repo. See #5655.
- In some instances under Users and Authentication, no users are listed and clicking Create to create a new user does not display the entire form. To work around this when encountered, perform a hard refresh to be able to log back in. See #37531.
- Deployment securityContext section is missing when a new workload is created. This prevents pods from starting when Pod Security Policy Support is enabled. See #4815.
- Remove legacy feature multi-cluster app. See #39525.
Legacy UI:
- When using the Rancher UI to add a new port of type ClusterIP to an existing Deployment created using the legacy UI, the new port will not be created upon saving. To work around this issue, repeat the procedure to add the port again. Users will notice the Service Type field will display as Do not create a service. Change this to ClusterIP and upon saving, the new port will be created successfully during this subsequent attempt. See #4280.
