Release v2.6.4
It is important to review the Install/Upgrade Notes below before upgrading to any Rancher version.
In Rancher v2.6.4, the cluster-api module has been upgraded from v0.4.4 to v1.0.2, which upgrades the apiVersion of the CAPI CRDs from `cluster.x-k8s.io/v1alpha4` to `cluster.x-k8s.io/v1beta1`. As a result, rollbacks from Rancher v2.6.4 to any previous version of Rancher v2.6.x will fail because the `v1alpha4` CRDs required by the previous version are no longer available. To avoid this, run the Rancher resource cleanup script before attempting the restore or rollback. The script can be found in the rancherlabs/support-tools repo, and its usage is described in the backup-restore operator docs. See also #36803 for more details.
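As a hedged illustration only, preparing for such a rollback might look like the sketch below; the script location is a placeholder, so use the exact script and usage documented in rancherlabs/support-tools and the backup-restore operator docs.

```bash
# Hedged sketch: run the Rancher resource cleanup script against the local
# (Rancher management) cluster before restoring a backup or rolling back from
# v2.6.4. The script path below is a placeholder -- use the script and usage
# documented in rancherlabs/support-tools and the backup-restore operator docs.
git clone https://github.com/rancherlabs/support-tools.git
cd support-tools
CLEANUP_SCRIPT="./path/to/rancher-cleanup.sh"   # placeholder: see the repo docs
bash "$CLEANUP_SCRIPT"
```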
Features and Enhancements
New in Rancher
- Kubernetes v1.22 is no longer experimental and is now supported. Kubernetes v1.23 is experimental.
- Kubernetes v1.22 and v1.23 are available as Kubernetes version options when provisioning clusters as well as upgrading imported RKE2/k3s clusters.
- Rancher on IBM Z is now in tech preview.
New in Cert-Manager
- Rancher now supports cert-manager versions 1.6.2 and 1.7.1. We recommend v1.7.x because v1.6.x will reach end-of-life on March 30, 2022. To read more, see the documentation. (A hedged Helm upgrade sketch follows this list.)
- When upgrading Rancher and cert-manager, you will need to use Option B: Reinstalling Rancher and cert-manager from the Rancher docs.
- There are several versions of cert-manager which, due to their backwards incompatibility, are not recommended for use with Rancher. You can read more about which versions are affected by this issue in the cert-manager docs. As a result, only versions 1.6.2 and 1.7.1 are recommended for use at this time.
- For instructions on upgrading cert-manager from version 1.5 to 1.6, see the relevant cert-manager docs.
- For instructions on upgrading cert-manager from version 1.6 to 1.7, see the relevant cert-manager docs.
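As mentioned above, cert-manager upgrades are performed with Helm. The following is a minimal, hedged sketch of upgrading to v1.7.1 using the upstream jetstack chart; the release name and namespace are assumptions, and the cert-manager and Rancher docs (including Option B above) remain the authoritative procedure.

```bash
# Hedged sketch: upgrade cert-manager to a Rancher-supported version (v1.7.1)
# using the upstream jetstack Helm chart. Consult the cert-manager upgrade
# docs for version-specific steps before running this.
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm upgrade --install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.7.1 \
  --set installCRDs=true
```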
New in RKE2 - Tech Preview
- UI Enhancements in RKE2 Cluster Provisioning
- The UI now provides an option to rotate certificates for RKE2 provisioned clusters, including an option to rotate certificates for an individual component. See Dashboard #4485.
- S3 configuration support has been re-enabled for RKE2 snapshots; this is required for RKE2 provisioning parity with RKE1. See Dashboard #4551.
- When restoring RKE2 cluster etcd snapshots, more restore options are available. See Dashboard #4539.
- “Auto Replace” option support enabled for RKE2 machine pools. See Dashboard #4449.
- Ability to scale down individual nodes for RKE2-provisioned clusters added. See Dashboard #4446.
- Enhanced RKE2 cluster provisioning on Linode. See Dashboard #3262.
- Added user-configurable OS field for vSphere machine pools. See Dashboard #4859.
- “Drain Before Delete” support for RKE2 machine pools has been added. Note that when editing nodes and enabling the “Drain Before Delete” option, the existing control plane and worker nodes are deleted and new nodes are created; this behavior does not apply to etcd nodes. See #35274 and Dashboard #4448.
- Health checks were added to support self-healing RKE2 machine pools. See #35275.
- Bug Fixes
- Known Issues in Windows
- Experimental Support for RKE2 Provisioning tech preview for Windows will only work on v1.22 and up of RKE2. End users should not use v1.21.x of RKE2 for any RKE2 cluster that will have Windows worker nodes. This is due to an upstream Calico bug that was not backported to the minor version of Calico (3.19.x) that is present in v1.21.x of RKE2. See #131.
- When upgrading Windows nodes in RKE2 clusters via the Rancher UI, Windows worker nodes will require a reboot after the upgrade is completed.
- CSI Proxy for Windows will now work in an air-gapped environment.
- NodePorts do not work on Windows Server 2022 in RKE2 clusters due to a Windows kernel bug. See #159.
- Other Known Issues
- RKE2 node driver cluster gets stuck in provisioning state after an upgrade to v2.6.4 and rollback to v2.6.3. See #36859.
- RKE2 node driver cluster has its nodes redeployed when upgrading Rancher from v2.6.3 to v2.6.4. See #36627.
- RKE2 node driver cluster gets stuck in provisioning state in Kubernetes v1.23.x. See #36939.
- The communication between the ingress controller and the pods doesn’t work when you create an RKE2 cluster with Cilium as the CNI and activate project network isolation. See documentation and #34275.
- Cluster state changes to `Provisioning` when a worker node is deleted in an RKE2 cluster. See #36689.
- Cluster state changes to `Provisioning` when a snapshot is taken in an RKE2 cluster. See #36504.
UI Enhancements in Fleet
- Added a Bundles tab to the GitRepo detail view. See Dashboard #4794.
- Added a Detail view for the Fleet Bundle resource. See Dashboard #4793.
- Fleet controller logs are now viewable on cluster dashboard. See Dashboard #3668.
- In the Continuous Delivery dashboard, badge colors have been updated to feature counts that convey health status information. See Dashboard #5232.
- On the Git Repos detail page, error notifications were added to the Conditions tab. See Dashboard #5218.
- Added a new dashboard view to Fleet at the top of the navigation list; this is the default page that the user will land on. See Dashboard #5048.
- In the Git Repos detail page, a warning icon now displays when the GitRepo does not apply to any clusters. See Dashboard #4929.
Other UI Enhancements
- Nvidia GPU Reservation can be edited on the workloads page. See Dashboard #5005.
- The UI now uses a nested structure to determine whether to display global or cluster-scoped resources in the role creation forms, and which API group should be auto-populated when a resource is selected. See Dashboard #4888.
- An edit and detail UI has been added for NetworkPolicies. See Dashboard #4729.
- On the namespace detail page, a new tab was added that shows the workloads. See Dashboard #5115.
- In Apps and Marketplace, existing charts have been updated with annotations such that users will better understand what occurs in their mixed Linux/Windows workload cluster if they deploy a chart. See Dashboard #5137.
- If a custom consent banner is enabled, after logging into Rancher, a user must now accept or deny consent. By default, consent is required to use Rancher, but can be configured to be optional. See Dashboard #4719.
- Improvements added to the Rollback Diff view: ability to switch between the side-by-side and inline diff and the ability to collapse (or hide) the fields that are non-standard. See Dashboard #4636.
- Users can now create a ConfigMap at the project scope. See Dashboard #4571.
Security Enhancements
- New `restricted-noroot` PSP created to prevent pods from running as a privileged user. See the documentation, #35191 and security advisory GHSA-hwm2-4ph6-w6m5.
- CIS Hardening Guide updated for Rancher 2.6. See the documentation and #35735.
Other New Features
- Users can now provision node driver clusters from an air gapped cluster configured to use a proxy for outbound connections. See the documentation and #28411.
- The “rancher/install-docker” script now supports the Linux distributions `SLES / OpenSUSE / RHEL / Rocky Linux` for the Docker versions `20.10.8 / 20.10.9 / 20.10.10`. See #34615. (A hedged install sketch follows this list.)
- Users can now configure the `Readiness Check` and `Liveness Check` of `coredns-autoscaler`. See #24939.
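As a hedged illustration of the install-docker item above, the script is typically fetched from releases.rancher.com and piped to a shell; verify that the script URL exists for your target Docker version before running it.

```bash
# Hedged sketch: install a supported Docker version (20.10.10 here) on a
# supported distribution using the rancher/install-docker script. The URL
# follows the releases.rancher.com naming convention; confirm it exists for
# your Docker version before piping it to sh.
curl -fsSL https://releases.rancher.com/install-docker/20.10.10.sh | sh
```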
Behavior Changes
- If you provide your own security group for an EC2 instance, Rancher will not modify it; therefore, you must set the security group to allow the necessary ports for Rancher to provision the instance. See documentation and #3832.
Major Bug Fixes
UI
- Fixed an issue in which restricted admins couldn’t create git repos from the continuous delivery menu. See Dashboard #4909.
- Cluster Dashboard resource gauges now show correct max values. See Dashboard #2892.
- Can now add or edit cluster labels via the UI. See Dashboard #3086.
- Add support for cluster icons for drivers that are not built in. See Dashboard #3124.
- Editing Deployment image names via “Edit Config” results in failure. See Dashboard #3827.
- Templates are listing from vSphere content library as expected. See Dashboard #4302.
- “Restore snapshot” action is not available in Cluster management page. See Dashboard #4606.
- Rollback option on workload does not work. See Dashboard #4664.
- Group list not sorted alphabetically when attempting to add role grant for SAML group. See Dashboard #4685.
- Namespace resource settings for quota override now display in UI. See Dashboard #4704.
- Fixed an issue in which there were differences in behavior of conditional logic in questions.yaml between older Cluster Manager Apps and the Cluster Explorer in Apps & Marketplace. See Dashboard #4706.
- UI now shows network-attachment-definitions correctly. See Dashboard #4748.
- Added form field for `ssl_verify` and `ssl_version` when enabling an HTTPS flow. See Dashboard #4753.
- Fixed issue in which advanced selector settings for GitRepo get lost when editing. See Dashboard #4788.
- Changing the number of replicas via the UI is now possible. See Dashboard #4828.
- Form elements are now displayed as expected in Continuous Delivery Git Repo create/edit form. See Dashboard #4830.
- Added back Pod selection checkboxes and delete button to workload views in Cluster Explorer. See Dashboard #4831.
- Removed Rancher OS AMI link from EC2 provisioning UI. See Dashboard #4833.
- Restricted admins can now see Assign Global Roles button in Users & Authentication → Groups. See Dashboard #4834.
- Project/Cluster Members with the Project Monitoring role granted on the System Project are now able to view the workload/node metrics tabs in the Explorer UI. See Dashboard #4835.
- Init containers are now visible in the Cluster Explorer UI. See Dashboard #4840.
- Users can now create CronJobs in the Rancher UI as expected. See Dashboard #4842.
- API Key Expired date displays correctly now at first login as standard user. See Dashboard #4863.
- Secrets can be used as Environment Variables in Deployments. See Dashboard #4866.
- Dashboard now shows the API endpoint URL in the API Keys page. See Dashboard #4869.
- Unnecessary “%” that was displayed in Maximum Worker Nodes Unavailable screen has been removed. See Dashboard #4870.
- Events table in cluster dashboard has been replaced with monitoring alerts table. See Dashboard #4911.
- Kubernetes Job now displays in Rancher UI when job is created from CronJob with kubectl 1.19+ version. See Dashboard #4963.
- Resolved Rancher error when browsing clusters and refreshing the page. See Dashboard #4967.
- New User Default Role Toggle now functions as expected in Rancher UI. See Dashboard #4992.
- Fixed login error `Cannot read properties of null (reading 'startsWith')` on the Home page. See Dashboard #5004.
- Users can now change the default role via the UI after configuring ADFS authentication. See Dashboard #5066.
- Namespace view now provides detail for the summary pills. See Dashboard #5074.
- Dropdown added so that viewing previous container logs is easier. See Dashboard #5075.
- Helm UI now displays logs of a deployment. See Dashboard #5079.
- Component Status pills now show on cluster dashboard view. See Dashboard #5085.
- Fixed an issue in which config map view was inconsistent for binary data. See Dashboard #5311.
Rancher
- Fixed an issue in which standard users received an error when creating a cluster from the `Edit as YAML` and `Edit as a Form` buttons. See #35868.
- If the `rancher-node` security group is used, the existing security group is not modified, as expected. See #24337.
- For namespaces created inside projects via `kubectl`, when a resource limit exceeds the remaining amount in the project, Rancher no longer assigns an all-restrictive quota limit. Instead, a zero limit is set for the exceeding resource only. See #35647.
- When deploying a cluster on v1.22, monitoring installation works as expected. See #35571.
- Fixed an issue in which Calico probes failed on Windows nodes. See #36910.
- Enabling `Cluster Monitoring` through the API or the UI will set the same memory limits. See #25103.
- When creating an RKE cluster with the Rancher CLI, the values from a config file are not applied. See #25416.
- The `Container Default Resource Limits` are now applied when a new namespace is created in a project via `kubectl` (bypassing the UI). See #27750.
- Deploying clusters using the Azure node driver causes a `Pod Predicate NodeAffinity failed` error on the `default-http-backend` pod. See #29882.
- Hairpin rules are not added when using IPVS with a cloud provider enabled. See #30363.
- Cluster namespaces remain after deleting a cluster. See #31546.
- Upgrading Kubernetes version on downstream clusters causes a memory consumption increase. See #31640.
- Configuring Keycloak (SAML) authentication fails with the decoding error `Unknown error: SAML: cannot initialize saml SP, cannot decode IDP Metadata content from the config`. See #33709.
- Intermittent health check failures cause unreachable downstream clusters. See #34819.
- Backups taking longer than 5 minutes will cause Rancher to start a new backup and delete the one that is currently running, generating a backup loop. See #34890.
- The K3s Docker image is now used in the Dockerfile instead of downloading the K3s binary. See #35101.
- When Logging v1 and v2 are both installed, the v1 `fluentd` pods get stuck with the `crashlooping` error. See #35125.
- Syslog output can now be sent through UDP in Logging v2. See #35197.
- The `cattle-node-cleanup` job will be deleted after it times out. See #35334.
- If a new namespace does not fit the project quota limits, a `zero quota` limit will be created only for the new namespace’s resources. See #35647.
- When upgrading an RKE cluster, the status is seen as active even though the add-ons are still being updated. See #35750.
- The `rancher-webhook` certificate renewal workaround has been updated. See #35860.
- The default value for `ingressNginx.serviceMonitor.interval` is set to `30s` in the `rancher-monitoring` charts. See #36070.
- KEv2 clusters imported with the Rancher client now have their config correctly rewritten. See #36128.
- The `rancher-monitoring` ports for `node-exporter` and `push-proxy-clients` are no longer opened on the host. See #36140.
- DNS redirect iptables rules are now correctly created by `istio-cni`. See #36159.
- When using Fleet to apply the configuration, only the final configMap entry is applied. See #36242.
- When creating a cluster, RKE will panic if `rotate_encryption_key` is enabled and `secrets_encryption_config` is disabled. See #36333.
Security
- Role bindings are now created with `owner refs` referring to the `global role bindings` which caused their creation. See #934.
- When creating cluster role bindings and role bindings, several parts of Rancher assumed that the subject (i.e., the user/group that the role is being bound to) was a user. This caused errors (and an inability to create the binding) when a group was bound to these roles instead. See #974.
- Fixed the `Error syncing handler grb-cluster-sync: RoleBinding.rbac.authorization.k8s.io` error. See #975.
Install/Upgrade Notes
- If you are installing Rancher for the first time, your environment must fulfill the installation requirements.
- The namespace where the local Fleet agent runs has been changed to `cattle-fleet-local-system`. This change does not impact GitOps workflows.
Upgrade Requirements
- Creating backups: We strongly recommend creating a backup before upgrading Rancher. To roll back Rancher after an upgrade, you must back up and restore Rancher to the previous Rancher version. Because Rancher will be restored to the state it was in when the backup was created, any changes made after the upgrade will not be included after the restore. For more information, see the documentation on backing up Rancher.
- Helm version: Rancher install or upgrade must occur with Helm 3.2.x+ due to the changes with the latest cert-manager release. See #29213.
- Kubernetes version:
- The local Kubernetes cluster for the Rancher server should be upgraded to Kubernetes 1.18+ before installing Rancher 2.6+.
- When using Kubernetes v1.21 with Windows Server 20H2 Standard Core, the patch “2019-08 Servicing Stack Update for Windows Server” must be installed on the node. See #72.
- CNI requirements:
- For Kubernetes v1.19 and newer, we recommend disabling firewalld as it has been found to be incompatible with various CNI plugins. See #28840.
- If upgrading or installing to a Linux distribution which uses nf_tables as the backend packet filter, such as SLES 15, RHEL 8, Ubuntu 20.10, Debian 10, or newer, users should upgrade to RKE1 v1.19.2 or later to get Flannel version v0.13.0 that supports nf_tables. See Flannel #1317.
- For users upgrading from `>=v2.4.4` to `v2.5.x` with clusters where ACI CNI is enabled, note that upgrading Rancher will result in automatic cluster reconciliation. This is applicable for Kubernetes versions `v1.17.16-rancher1-1`, `v1.17.17-rancher1-1`, `v1.17.17-rancher2-1`, `v1.18.14-rancher1-1`, `v1.18.15-rancher1-1`, `v1.18.16-rancher1-1`, and `v1.18.17-rancher1-1`. Please refer to the workaround BEFORE upgrading to `v2.5.x`. See #32002.
- Requirements for air gapped environments:
- For installing or upgrading Rancher in an air gapped environment, please add the flag `--no-hooks` to the `helm template` command to skip rendering files for Helm’s hooks (a hedged `helm template` sketch follows these notes). See #3226.
- If using a proxy in front of an air gapped Rancher, you must pass additional parameters to `NO_PROXY`. See the documentation and related issue #2725.
- Cert-manager version requirements: Recent changes to cert-manager require an upgrade if you have a high-availability install of Rancher using self-signed certificates. If you are using cert-manager older than v0.9.1, please see the documentation on how to upgrade cert-manager. See documentation.
- Requirements for Docker installs:
- When starting the Rancher Docker container, the privileged flag must be used. See documentation.
- When installing in an air gapped environment, you must supply a custom `registries.yaml` file to the `docker run` command as shown in the K3s documentation. If the registry has certificates, then you will need to also supply those. See #28969.
- When upgrading a Docker installation, a panic may occur in the container, which causes it to restart. After restarting, the container comes up and is working as expected. See #33685.
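Related to the air gapped note above, the following is a hedged sketch of rendering the Rancher chart with Helm hooks skipped. The chart archive name, namespace, hostname, and registry values are placeholders; the Rancher air gap documentation remains the authoritative reference.

```bash
# Hedged sketch: render the Rancher chart for an air-gapped install while
# skipping Helm hooks (--no-hooks), per the note above. Chart archive name,
# hostname, and registry values are placeholders.
helm template rancher ./rancher-2.6.4.tgz \
  --output-dir . \
  --no-hooks \
  --namespace cattle-system \
  --set hostname=rancher.example.com \
  --set rancherImage=registry.example.com/rancher/rancher \
  --set systemDefaultRegistry=registry.example.com \
  --set useBundledSystemChart=true
```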
Rancher Behavior Changes
- Legacy features are gated behind a feature flag. Users upgrading from Rancher <=v2.5.x will automatically have the `--legacy` feature flag enabled. New installations that require legacy features need to enable the flag on install or through the UI.
- Users must manually remove legacy services. When workloads created using the legacy UI are deleted, the corresponding services are not automatically deleted. Users will need to manually remove these services. A message will be displayed notifying the user to manually delete the associated services when such a workload is deleted. See #34639.
- Charts from library and helm3-library catalogs can no longer be launched. Users will no longer be able to launch charts from the library and helm3-library catalogs, which are available through the legacy apps and multi-cluster-apps pages. Any existing legacy app that was deployed from a previous Rancher version will continue to be able to edit its currently deployed chart. Note that the Longhorn app will still be available from the library for new installs but will be removed in the next Rancher version. All users are recommended to deploy Longhorn from the Apps & Marketplace section of the Rancher UI instead of through the Legacy Apps pages.
- The local cluster can no longer be turned off. In older Rancher versions, the local cluster could be hidden to restrict admin access to the Rancher server’s local Kubernetes cluster, but that feature has been deprecated. The local Kubernetes cluster can no longer be hidden and all admins will have access to the local cluster. If you would like to restrict permissions to the local cluster, there is a new restricted-admin role that must be used. Access to the local cluster can now be disabled by setting `hide_local_cluster` to true from the v3/settings API (a hedged example follows this list). See the documentation and #29325. For more information on upgrading from Rancher with a hidden local cluster, see the documentation.
- Users must log in again. After upgrading to `v2.6+`, users will be automatically logged out of the old Rancher UI and must log in again to access Rancher and the new UI. See #34004.
- Fleet is now always enabled. For users upgrading from `v2.5.x` to `v2.6.x`, note that Fleet will be enabled by default as it is required for operation in `v2.6+`. This will occur even if Fleet was disabled in `v2.5.x`. During the upgrade process, users will observe restarts of the `rancher` pods, which is expected. See #31044 and #32688.
- The Fleet agent in the local cluster now lives in `cattle-fleet-local-system`. Starting with Rancher v2.6.1, Fleet allows for two agents in the local cluster for scenarios where “Fleet is managing Fleet”. The true local agent runs in the new `cattle-fleet-local-system` namespace. The agent downstream from another Fleet management cluster runs in `cattle-fleet-system`, similar to the agents in pure downstream clusters. See #34716 and #531.
- Editing and saving clusters can result in cluster reconciliation. For users upgrading from `<=v2.4.8 (<= RKE v1.1.6)` to `v2.4.12+ (RKE v1.1.13+)` / `v2.5.0+ (RKE v1.2.0+)`, please note that editing and saving a cluster (even with no changes or a trivial change like the cluster name) will result in cluster reconciliation and an upgrade of `kube-proxy` on all nodes because of a change in `kube-proxy` binds. This only happens on the first edit; later edits shouldn’t affect the cluster. See #32216.
- The EKS cluster refresh interval setting changed. There was previously a setting allowing users to configure the length of refresh time in cron format: `eks-refresh-cron`. That setting is now deprecated and has been migrated to a standard seconds format in a new setting: `eks-refresh`. If previously set, the migration will happen automatically. See #31789.
- System components will restart. Please be aware that upon an upgrade to v2.3.0+, any edits to a Rancher-launched Kubernetes cluster will cause all system components to restart due to added tolerations to Kubernetes system components. Plan accordingly.
- New GKE and AKS clusters will use Rancher’s new lifecycle management features. Existing GKE and AKS clusters and imported clusters will continue to operate as-is. Only new creations and registered clusters will use the new full lifecycle management.
- New steps for rolling back Rancher. The process to roll back Rancher has been updated for versions v2.5.0 and above. New steps require scaling Rancher down to 0 replica before restoring the backup. Please refer to the documentation for the new instructions.
- RBAC differences around Manage Nodes for RKE2 clusters. Due to the change of the provisioning framework, the `Manage Nodes` role will no longer be able to scale machine pools up or down. The user would need the ability to edit the cluster to manage the machine pools. See #34474.
- New procedure to set up Azure cloud provider for RKE2. For RKE2, the process to set up an Azure cloud provider is different than for RKE1 clusters. Users should refer to the documentation for the new instructions. See #34367 for the original issue.
- Machines vs Kube Nodes. In previous versions, Rancher only displayed Nodes, but with v2.6, there are the concepts of `machines` and `kube nodes`. Kube nodes are the Kubernetes node objects and are only accessible if the Kubernetes API server is running and the cluster is active. Machines are the cluster’s machine objects, which define what the cluster should be running.
- Rancher’s External IP Webhook chart is no longer supported in v1.22. In v1.22, upstream Kubernetes has enabled the admission controller to reject usage of external IPs. As such, the `rancher-external-ip-webhook` chart that was created as a workaround is no longer needed, and support for it is now capped to Kubernetes v1.21 and below. See #33893.
- Increased memory limit for Legacy Monitoring. The default value of the Prometheus memory limit in the legacy Rancher UI is now 2000Mi to prevent the pod from restarting due to an OOMKill. See #34850.
- Increased memory limit for Monitoring. The default value of the Prometheus memory limit in the new Rancher UI is now 3000Mi to prevent the pod from restarting due to an OOMKill. See #34850.
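The following is a minimal, hedged sketch of disabling access to the local cluster via the v3/settings API, as described in the list above. The server URL and API token are placeholders, and the exact setting key on your installation should be confirmed by listing `/v3/settings`.

```bash
# Hedged sketch: hide the local cluster via the v3/settings API.
# RANCHER_URL and TOKEN are placeholders. Confirm the exact setting key
# (referred to as hide_local_cluster in these notes) with:
#   curl -sk -H "Authorization: Bearer $TOKEN" "$RANCHER_URL/v3/settings"
RANCHER_URL="https://rancher.example.com"   # placeholder
TOKEN="token-xxxxx:<secret>"                # placeholder API token

curl -sk -X PUT \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "true"}' \
  "$RANCHER_URL/v3/settings/hide-local-cluster"
```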
Versions
Please refer to the README for latest and stable versions.
Please review our version documentation for more details on versioning and tagging conventions.
Images
- rancher/rancher:v2.6.4
Tools
Kubernetes Versions
- v1.23.4 (Experimental)
- v1.22.7 (Default)
- v1.21.10
- v1.20.15
- v1.19.16
- v1.18.20
Rancher Helm Chart Versions
Starting in 2.6.0, many of the Rancher Helm charts available in the Apps & Marketplace will start with a major version of 100. This was done to avoid simultaneous upstream changes and Rancher changes from causing conflicting version increments. This also brings us into compliance with semver, which is a requirement for newer versions of Helm. You can now see the upstream version of a chart in the build metadata, for example: `100.0.0+up2.1.0`. See #32294.
Other Notes
Feature Flags
Feature flags introduced in 2.6.0 and the Harvester feature flag introduced in 2.6.1 are listed below for reference:
| Feature Flag | Default Value | Description |
| --- | --- | --- |
| `harvester` | `true` | Used to manage access to the Harvester list page where users can navigate directly to Harvester host clusters and have the ability to import them. |
| `fleet` | `true` | The previous `fleet` feature flag is now required to be enabled as the fleet capabilities are leveraged within the new provisioning framework. If you had this feature flag disabled in earlier versions, upon upgrading to Rancher, the flag will automatically be enabled. |
| `gitops` | `true` | If you want to hide the “Continuous Delivery” feature from your users, then please use the newly introduced `gitops` feature flag, which hides the ability to leverage Continuous Delivery. |
| `rke2` | `true` | We have introduced the ability to provision RKE2 clusters as a tech preview. By default, this feature flag is enabled, which allows users to attempt to provision these types of clusters. |
| `legacy` | `false` for new installs, `true` for upgrades | There are a set of features from previous versions that are slowly being phased out of Rancher for newer iterations of the feature. This is a mix of deprecated features as well as features that will eventually be moved to newer variations in Rancher. By default, this feature flag is disabled for new installations. If you are upgrading from a previous version, this feature flag would be enabled. |
| `token-hashing` | `false` | Used to enable the new token-hashing feature. Once enabled, existing tokens will be hashed and all new tokens will be hashed automatically using the SHA256 algorithm. Once a token is hashed it cannot be undone. Once this feature flag is enabled it cannot be disabled. |
Experimental Features
RancherD was introduced in 2.5 as an easy-to-use installation binary. With the introduction of RKE2 provisioning, this project is being re-written and will be available at a later time. See #33423.
Legacy Features
Legacy features are hidden behind the `legacy` feature flag; they are various features and functionality of Rancher that were available in previous releases. These are features that Rancher doesn’t intend for new users to consume, but if you have been using past versions of Rancher, you’ll still want to use this functionality.
When you first start 2.6, there is a card on the Home page that outlines where these features are now located.
The deprecated features from v2.5 are now behind the `legacy` feature flag. Please review our deprecation policy for questions.
The following legacy features are no longer supported on Kubernetes v1.21+ clusters:
- Logging
- CIS Scans
- Istio 1.5
- Pipelines
The following legacy feature is no longer supported on clusters running Kubernetes versions newer than v1.21:
- Monitoring V1
Known Major Issues
- Kubernetes Cluster Distributions:
- RKE:
- Rotating encryption keys with a custom encryption provider is not supported. See #30539.
- Actions that can’t be performed by a cluster member, `Download KubeConfig` and `Take Snapshot`, are being shown as available on RKE1 clusters in a Rancher setup. See #35828.
- On the Cluster Management page, snapshot-related actions such as create/restore and `rotate certificate` are not available for a standard user in RKE1. See Dashboard #5011.
- Dual stack is not supported for RKE1 Windows clusters, specifically v1.23.4 with dual stack networking becoming GA in upstream Kubernetes. See Windows #165.
- RKE2 - Tech Preview: There are several known issues as this feature is in tech preview, but here are some major issues to consider before using RKE2.
- Amazon ECR Private Registries are not functional. See #33920.
- When provisioning using a RKE2 cluster template, the `rootSize` for AWS EC2 provisioners does not currently take an integer when it should, and an error is thrown. To work around this issue, wrap the EC2 `rootSize` in quotes. See Dashboard #3689.
- RKE2 - Windows:
- AKS:
- When editing or upgrading the AKS cluster, do not make changes from the Azure console or CLI at the same time. These actions must be done separately. See #33561.
- Windows node pools are not currently supported. See #32586.
- Azure Container Registry-based Helm charts cannot be added in Cluster Explorer, but do work in the Apps feature of Cluster Manager. Note that when using a Helm chart repository, the `disableSameOriginCheck` setting controls when credentials are attached to requests. See documentation and #34584 for more information.
- GKE:
- Basic authentication must be explicitly disabled in GCP before upgrading a GKE cluster to 1.19+ in Rancher. See #32312.
- AWS:
- In an HA Rancher server on Kubernetes v1.20, ingresses on AWS EC2 node driver clusters do not go through and result in a `failed calling webhook` error. Please refer to the workaround. See #35754.
- On RHEL 8.4 SELinux in AWS AMI, Kubernetes v1.22 fails to provision on AWS. As Rancher will not install RPMs on the nodes, users may work around this issue either by using an AMI with the rancher-selinux package already installed, or by installing the package via cloud-init. Users will encounter this issue on upgrade to v1.22 as well. When upgrading to 1.22, users must manually upgrade/install the rancher-selinux package on all the nodes in the cluster, then upgrade the Kubernetes version. See #36509.
- Infrastructures:
- vSphere:
- The vSphere CSI Driver does not support Kubernetes v1.22 due to unsupported v1beta1 CRD APIs. Support will be added in a later release, but in the meantime users with the `CSIMigrationvSphere` feature enabled should not upgrade to Kubernetes v1.22. See #33848.
- `PersistentVolumes` are unable to mount to custom vSphere hardened clusters using CSI charts. See #35173.
- Harvester:
- Upgrades from Harvester v0.3.0 are not supported.
- Deploying Fleet to Harvester clusters is not yet supported. Clusters, whether Harvester or non-Harvester, imported using the Virtualization Management page will result in the cluster not being listed on the Continuous Delivery page. See #35049.
- Cluster Tools:
- Fleet:
- Multiple `fleet-agent` pods may be created and deleted during initial downstream agent deployment, rather than just one. This resolves itself quickly, but is unintentional behavior. See #33293.
- Hardened clusters:
- Not all cluster tools can currently be installed on a hardened cluster.
- Rancher Backup:
- When migrating to a cluster with the Rancher Backup feature, the server-url cannot be changed to a different location. It must continue to use the same URL.
- When running a newer version of the rancher-backup app to restore a backup made with an older version of the app, the `resourceSet` named `rancher-resource-set` will be restored to an older version that might be different from the one defined in the currently running rancher-backup app. The workaround is to edit the rancher-backup app to trigger a reconciliation. See #34495.
- Because Kubernetes v1.22 drops the apiVersion `apiextensions.k8s.io/v1beta1`, trying to restore an existing backup file into a v1.22 cluster will fail because the backup file contains CRDs with the apiVersion v1beta1. There are two options to work around this issue: update the default `resourceSet` to collect the CRDs with the apiVersion v1, or update the default `resourceSet` and the client to use the new APIs internally. See documentation and #34154.
- When performing a backup/restore using Helm, the command will fail if Let’s Encrypt is used. See #37060.
- Monitoring:
- Deploying Monitoring on a Windows cluster with win_prefix_path set requires users to deploy Rancher Wins Upgrader to restart wins on the hosts to start collecting metrics in Prometheus. See #32535.
- Monitoring fails to upgrade when the CRD is in a failed state. To work around this issue, use Helm to install the `rancher-monitoring` chart into the cluster directly, rather than using the Rancher UI. In order to set nodeSelector or tolerations on the `rancher-monitoring-crd` chart, you need to install the `rancher-monitoring-crd` and `rancher-monitoring` charts by using the Helm command via the command line. The Rancher UI will add support soon. See #35744.
- Logging:
- Windows nodeAgents are not deleted when performing helm upgrade after disabling Windows logging on a Windows cluster. See #32325.
- Istio Versions:
- Istio 1.5 is not supported in air gapped environments. Please note that the Istio project has ended support for Istio 1.5.
- Istio 1.9 support ended on October 8th, 2021.
- The Kiali dashboard bundled with 100.0.0+up1.10.2 errors on a page refresh. Instead of refreshing the page when needed, simply access Kiali using the dashboard link again. Everything else works in Kiali as expected, including the graph auto-refresh. See #33739.
- In Istio v1.10.4, Kubernetes IP service is set to default IP, which does not work for all environments. To work around this issue, install Istio version 100.1.0+up1.11.4 in the downstream cluster, and installation will complete successfully. Note that the new install will not include the Kiali CRD. See #35339.
- As part of the upgrade to Istio 1.11.4, Kiali was upgraded to 1.41, which removed the CRD installation. If you upgraded from a previous version of `rancher-istio`, you will need to manually delete the `rancher-kiali-server-crd` found on the installed apps page, since it is no longer in use after the upgrade completes. See #35686.
- A `failed calling webhook "validation.istio.io"` error will occur in air gapped environments if the `istiod-istio-system` `ValidatingWebhookConfiguration` exists and you attempt a fresh install of Istio 1.11.x or higher. To work around this issue, run the command `kubectl delete validatingwebhookconfiguration istiod-istio-system` and attempt your install again. See #35742.
- Deprecated resources are not automatically removed and will cause errors during upgrades. Manual steps must be taken to migrate and/or clean up resources before an upgrade is performed. See #34699.
- Applications injecting Istio sidecars fail on SELinux-enabled RHEL 8.4 clusters. A temporary workaround for this issue is to run the following command on each cluster node before creating a cluster: `mkdir -p /var/run/istio-cni && semanage fcontext -a -t container_file_t /var/run/istio-cni && restorecon -v /var/run/istio-cni`. See #33291.
- Legacy Monitoring:
- The Grafana instance inside Cluster Manager’s Monitoring is not compatible with Kubernetes v1.21. To work around this issue, disable the `BoundServiceAccountTokenVolume` feature in Kubernetes v1.21 and above. Note that this workaround will be deprecated in Kubernetes v1.22. See #33465.
- In air gapped setups, the generated `rancher-images.txt` that is used to mirror images on private registries does not contain the images required to run Legacy Monitoring, which is compatible with Kubernetes v1.15 clusters. If you are running Kubernetes v1.15 clusters in an air gapped environment and you want to either install Legacy Monitoring or upgrade Legacy Monitoring to the latest version offered by Rancher for Kubernetes v1.15 clusters, you will need to take one of the following actions:
  - Upgrade the Kubernetes version so that you can use v0.2.x of the Monitoring application Helm chart
  - Manually import the necessary images into your private registry for the Monitoring application to use
- When deploying any downstream cluster, Rancher logs errors that seem to be related to Monitoring even when Monitoring is not installed onto either cluster; specifically, Rancher logs that it `failed on subscribe` to the Prometheus CRs in the cluster because it is unable to get the resource `prometheus.meta.k8s.io`. These logs appear in a similar fashion for other Prometheus CRs (namely Alertmanager, ServiceMonitors, and PrometheusRules), but do not seem to cause any other major impact in functionality. See #32978.
- Legacy Monitoring does not support Kubernetes v1.22 due to the `feature-gates` flag no longer being supported. See #35574.
- After performing an upgrade to Rancher v2.6.3 from v2.6.2, the Legacy Monitoring custom metric endpoint stops working. To work around this issue, delete the service that is being targeted by the servicemonitor and allow it to be recreated; this will reload the pods that need to be targeted on a service sync. See #35790.
- Docker Installations:
- UI issues may occur due to a longer startup time. Users will receive an error message when launching Docker for the first time (#28800), and users are directed to the username/password screen when accessing the UI after a Docker install of Rancher. See #28798.
- On a Docker install upgrade and rollback, Rancher logs will repeatedly display the messages “Updating workload `ingress-nginx/nginx-ingress-controller`” and “Updating service `frontend` with public endpoints”. Ingresses and clusters are functional and active, and the logs resolve eventually. See #35798.
- Rancher single node won’t start on Apple M1 devices with Docker Desktop 4.3.0 or newer. See #35930.
- Login to Rancher using Active Directory with TLS:
- Upon an upgrade to v2.6.0, authenticating via Rancher against an Active Directory server using TLS can fail if the certificates on the AD server do not support SAN attributes. This is a check enabled by default in Go v1.15. See #34325.
- The error received is “Error creating SSL connection: LDAP Result Code 200 “Network Error”: x509: certificate relies on legacy Common Name field, use SANs or temporarily enable Common Name matching with GODEBUG=x509ignoreCN=0”
- To resolve this, the certificates on the AD server should be updated or replaced with new ones that support the SAN attribute. Alternatively, this error can be ignored by setting `GODEBUG=x509ignoreCN=0` as an environment variable to the Rancher server container.
- Rancher UI:
- In some instances under Users and Authentication, no users are listed and clicking Create to create a new user does not display the entire form. To work around this when encountered, perform a hard refresh to be able to log back in. See Dashboard #5336.
- Deployment securityContext section is missing when a new workload is created. This prevents pods from starting when Pod Security Policy Support is enabled. See #4815.
- Legacy UI:
- When using the Rancher v2.6 UI to add a new port of type ClusterIP to an existing Deployment created using the legacy UI, the new port will not be created upon saving. To work around this issue, repeat the procedure to add the port again. Users will notice the Service Type field will display as `Do not create a service`. Change this to ClusterIP and upon saving, the new port will be created successfully during this subsequent attempt. See #4280.