Installation issue with RKE2 / Rancher 2.6.x on SUSE Linux ES 15 SP3 on VMware vSphere Hypervisor (ESXi) 7.0.x

Description

There is an integration and compatibility issue between SUSE Linux ES 15 SP3 / SP4 running Kubernetes/Rancher and virtual machines hosted on the VMware ESXi 7.0.x hypervisor. The issue affects the integration of the virtualized network layers Level0 – Level1 – Level2. Our Managed Service & Infrastructure Delivery team detected the problem in a customer environment and reproduced it in Almaviva Labs.

We found that the VXLAN (Flannel) network used as the Kubernetes overlay network does not work properly. In particular, packet encapsulation between the different network levels is not handled correctly by the Linux kernel on some GNU/Linux distributions, such as Ubuntu 18.04, Ubuntu 20.04, openSUSE Leap 15.3, and SUSE Linux ES 15 SP3 / SP4.
This compatibility issue in the virtualized network layer (VXLAN encapsulation and checksum offloading) is also documented in the official RKE2 documentation: Known Issues and Limitations - RKE2 - Rancher’s Next Generation Kubernetes Distribution
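
Since the problem involves checksum offloading on the VXLAN interface, it can help to inspect the offload state on an affected node. The following is a minimal diagnostic sketch, not part of the team's procedure: it parses the output of `ethtool -k` and reports whether `tx-checksum-ip-generic` is enabled. The interface name `flannel.1` is an assumption (it is the usual name of the Flannel VXLAN device and may differ in your cluster), and `ethtool` must be installed on the node.

```python
import subprocess
from typing import Dict


def parse_offload_features(ethtool_output: str) -> Dict[str, bool]:
    """Parse the text printed by `ethtool -k <iface>` into {feature: enabled}."""
    features: Dict[str, bool] = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for <iface>:" header and blank lines
        state = value.split()[0]  # "on" or "off"; a trailing "[fixed]" is ignored
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def vxlan_tx_offload_enabled(iface: str = "flannel.1") -> bool:
    """Return True if TX checksum offload is enabled on the given interface.

    `flannel.1` is an assumed default; pass the real VXLAN interface name.
    """
    out = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    ).stdout
    return parse_offload_features(out).get("tx-checksum-ip-generic", False)
```

Calling `vxlan_tx_offload_enabled()` on a node returns `True` when the offload feature is on, which is the condition under which the encapsulation problem described above has been observed.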

Solutions

The team identified several possible solutions, but some of them are not applicable in a production environment:

  1. Downgrade the hypervisor, for example to VMware vSphere (ESXi) 6.x.x. This is not recommended, because a downgrade means running older software with fewer features and without the latest security patches.

  2. Use a different GNU/Linux distribution, such as Red Hat 8.5 or 9.x, Ubuntu 22.04.x LTS, Rocky Linux 8.5-9.x, or AlmaLinux 8.5-9.0.

  3. Change the Kubernetes CNI plug-in. This requires significant configuration and testing effort to determine whether it introduces limitations or issues.

  4. Change the virtual network adapter in VMware vSphere Hypervisor (ESXi) 7.0.x, which forces the hypervisor to use different modules and virtualization connectors between its internal stacks so that the overlay network works properly.

The team decided to apply solution 4. After the setting was changed in the hypervisor, the issue was resolved and pods in Kubernetes communicate properly over the overlay network.
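
After changing the virtual network adapter, one way to confirm from inside the guest that the new adapter type is in use is to check which kernel driver is bound to the NIC. The sketch below reads the driver symlink from sysfs. It is an illustration under stated assumptions: the interface name `ens192` is hypothetical, and this note does not record which adapter type the team switched from or to. For orientation, the VMware E1000, E1000E and VMXNET3 adapter types appear in Linux as the `e1000`, `e1000e` and `vmxnet3` drivers.

```python
import os
from typing import Optional


def nic_driver(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[str]:
    """Return the kernel driver bound to a network interface, or None.

    Virtual interfaces (for example a VXLAN device) have no backing
    device entry in sysfs, so None is returned for them.
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        return None
    # The symlink points at the driver directory; its last path
    # component is the driver name (e.g. "vmxnet3").
    return os.path.basename(os.path.realpath(link))


if __name__ == "__main__":
    # "ens192" is a hypothetical interface name; replace it with the
    # name of the VM's primary network interface.
    print(nic_driver("ens192"))
```

Running the script before and after the adapter change should show a different driver name if the hypervisor setting took effect.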

Created by
Almaviva
Managed Service & Infrastructure Delivery