Rancher with IPv6

Hi all

I’m trying to enable IPv6 on our setup and struggling to get it working.

Maybe I’ll need @superseb’s eyes on this to figure out what I’m missing.

I’m following the instructions from:

Merging those two, I installed the cluster with the config from the Rancher docs and tried to verify the IPv6 stack with @superseb’s CLI from the GitHub link.

But I think I’m missing something here, as IPv6 shows up in the annotations but not in the pod’s IP field:

Running ip a s inside the pod returns the IPv6 address:

But I’m not able to achieve IPv6 communication between two pods…

Also, I followed the instructions to enable IPv6 at the infrastructure layer, so IPv6 addresses are being assigned to the nodes; confirmed with ip a s.

What am I missing here?

Thanks in advance!

Can you please share the relevant part of the config so we can check if it’s all correctly configured? Please also include the RKE version used and the exact Kubernetes version (vX.X.X-rancherX-X).

It also helps if you can run through the Validate parts of Validate IPv4/IPv6 dual-stack | Kubernetes.

You’re right, @superseb, shame on me for not sharing those upfront.

answers: {}
description: Custom cluster using RKE
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
fleet_workspace_name: fleet-default
local_cluster_auth_endpoint:
  enabled: false
name: mycluster-14
rancher_kubernetes_engine_config:
  addon_job_timeout: 45
  authentication:
    strategy: x509
  authorization: {}
  bastion_host:
    ignore_proxy_env_vars: false
    ssh_agent_auth: false
  cloud_provider:
    name: gce
  cluster_name: mycluster-14
  enable_cri_dockerd: true
  ignore_docker_version: true
  ingress:
    default_backend: true
    default_ingress_class: true
    http_port: 0
    https_port: 0
    node_selector:
      role: tooling
    provider: nginx
  kubernetes_version: v1.24.10-rancher4-1
  monitoring:
    node_selector:
      role: tooling
    provider: metrics-server
    replicas: 1
  network:
    mtu: 0
    plugin: calico
  restore:
    restore: false
  rotate_encryption_key: false
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
        timeout: 300
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube-api:
      always_pull_images: false
      pod_security_policy: false
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
      service_node_port_range: 30000-32767
    kube-controller:
      cluster_cidr: '10.42.0.0/16,fd01::/64'
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
    kubelet:
      fail_swap_on: false
      generate_serving_certificate: false
    kubeproxy: {}
    scheduler: {}
  ssh_agent_auth: false
  upgrade_strategy:
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
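
For completeness, whether those CIDRs actually reached the control plane can be checked directly on the controlplane node. RKE runs the components as plain Docker containers, so the container name below assumes RKE’s defaults:

# on the controlplane node; expect --cluster-cidr and
# --service-cluster-ip-range to show both the IPv4 and IPv6
# ranges from the config above
> docker inspect kube-controller-manager --format '{{range .Args}}{{println .}}{{end}}' | grep -E 'cidr|service-cluster-ip-range'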

I also went through the Validate IPv4/IPv6 dual-stack | Kubernetes document and noticed some unexpected behaviour:

# my control node
> kubectl get nodes mycluster-14-dsc-node-master-0 -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.0.0/24
fd01::/64

# my worker node
> kubectl get nodes mycluster-14-dsw-node-worker-1jfg -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
> 

# a CIDR error
> kubectl get events | grep CIDRNotAvailable
2m7s        Normal   CIDRNotAvailable          node/mycluster-14-dsw-node-worker-1jfg   Node mycluster-14-dsw-node-worker-1jfg status is now: CIDRNotAvailable
37s         Normal   CIDRNotAvailable          node/mycluster-14-dsw-tooling-node-worker-jmx3   Node mycluster-14-dsw-tooling-node-worker-jmx3 status is now: CIDRNotAvailable
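
When the event message alone isn’t enough, the allocator’s error should also appear in the controller-manager logs on the controlplane node (again assuming RKE’s default container name):

# on the controlplane node
> docker logs kube-controller-manager 2>&1 | grep -i cidr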

Since this is a playground that I’m using to test IPv6, I can delete it as many times as needed.

We are trying this on top of GCP (I know the instructions mention only AWS); nevertheless, all the bullet points related to IPv6 config are set up on the GCP side.

But I fully understand if you’d rather not go further.

I guess it makes sense, but I don’t have the full story yet on why this worked when it was being developed/tested. Since the controller-manager’s per-node IPv6 mask defaults to /64, a fd01::/64 cluster CIDR only has room for a single node allocation, which would explain the CIDRNotAvailable events. Can you try changing the cluster_cidr to fd01::/48? You probably need to build the cluster from scratch to make sure no leftover state causes issues.
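
In the cluster YAML that’s just the kube-controller section from above with a wider IPv6 block; a /48 leaves room for 65,536 per-node /64 allocations:

kube-controller:
  cluster_cidr: '10.42.0.0/16,fd01::/48'
  service_cluster_ip_range: '10.43.0.0/16,fd98::/108'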

Hi @superseb, changing that value cleared out the CIDR error.

Using the k8s docs page to validate, I’m still not seeing the IPv6 address assigned to the nodes’ InternalIP field:

> kubectl get nodes mycluster-13-dsc-node-master-0 -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.0.0/24
fd01::/64
> 
> kubectl get nodes mycluster-13-dsw-node-worker-nfxn -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.1.0/24
fd01:0:0:1::/64
> 
> kubectl get nodes mycluster-13-dsw-tooling-node-worker-hncc -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.2.0/24
fd01:0:0:2::/64
> 


> kubectl get nodes mycluster-13-dsc-node-master-0 -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 10.40.9.92
InternalDNS: mycluster-13-dsc-node-master-0.asia-south1-a.c.mycluster-poc.internal
Hostname: mycluster-13-dsc-node-master-0
> 
> kubectl get nodes mycluster-13-dsw-node-worker-nfxn -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 10.40.9.94
ExternalIP: <redacted>
InternalDNS: mycluster-13-dsw-node-worker-nfxn.asia-south1-a.c.mycluster-poc.internal
Hostname: mycluster-13-dsw-node-worker-nfxn
>
> kubectl get nodes mycluster-13-dsw-tooling-node-worker-hncc -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 10.40.9.93
InternalDNS: mycluster-13-dsw-tooling-node-worker-hncc.asia-south1-a.c.mycluster-poc.internal
Hostname: mycluster-13-dsw-tooling-node-worker-hncc
> 
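
A side thought, purely my assumption at this point: with the in-tree GCE cloud provider set, the node addresses come from GCE, which may be why the IPv6 never makes it into InternalIP. When testing without a cloud provider, kubelet should accept a dual-stack node-ip pair; in RKE YAML that would look roughly like the sketch below (note that extra_args is cluster-wide in RKE, so this only makes sense as a single-node sanity check, and the IPv6 value is a placeholder):

kubelet:
  extra_args:
    # dual-stack node-ip requires no in-tree cloud provider
    node-ip: '10.40.9.92,<node-ipv6-address>'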

Reusing the CLI from GitHub to validate, I can confirm that the annotations list an IPv6 address but .status.podIPs doesn’t reflect it:

> kubectl create deployment nginx --image=nginx --replicas=1

> kubectl get pods nginx-8f458dc5b-km2kp -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 14b591bd5bee0382c628ff95df8fe084e55eecf0e5ca642f7627335317197986
    cni.projectcalico.org/podIP: 10.42.239.75/32
    cni.projectcalico.org/podIPs: 10.42.239.75/32,fd01::3932:47c6:c4b4:356a:ef4a/128
...
status:
  conditions:
...
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-06-22T08:41:03Z"
  hostIP: 10.40.9.93
  phase: Running
  podIP: 10.42.239.75
  podIPs:
  - ip: 10.42.239.75
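
As an extra data point, the address from the Calico annotation can be probed directly even though it never shows up in .status.podIPs (throwaway busybox pod; the address is the one from the annotation above):

> kubectl run tmp --rm -it --restart=Never --image=busybox -- ping6 -c 3 fd01::3932:47c6:c4b4:356a:ef4a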

Hope this helps!

Okay, I need some time to reproduce this. If you are still working with disposable environments (I assume this was an environment built from scratch), I would advise testing:

  • With cloud_provider: name: gce removed from the configuration

Can you share what Rancher version you are using (given that you are provisioning the clusters from Rancher)?

And I assume the nodes have IPv6 addresses and connectivity has been verified (outside of k8s, so on host level)?

Hi @superseb, I’m working with Rancher Manager version 2.7.1.

My config YAML without cloud_provider settings:

answers: {}
description: Custom cluster using RKE
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
fleet_workspace_name: fleet-default
local_cluster_auth_endpoint:
  enabled: false
name: mycluster-13
rancher_kubernetes_engine_config:
  addon_job_timeout: 45
  authentication:
    strategy: x509
  authorization: {}
  bastion_host:
    ignore_proxy_env_vars: false
    ssh_agent_auth: false
  cloud_provider: {}
  cluster_name: mycluster-13
  enable_cri_dockerd: true
  ignore_docker_version: true
  ingress:
    default_backend: true
    default_ingress_class: true
    http_port: 0
    https_port: 0
    node_selector:
      role: tooling
    provider: nginx
  kubernetes_version: v1.24.10-rancher4-1
  monitoring:
    node_selector:
      role: tooling
    provider: metrics-server
    replicas: 1
  network:
    mtu: 0
    plugin: calico
  restore:
    restore: false
  rotate_encryption_key: false
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
        timeout: 300
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube-api:
      always_pull_images: false
      pod_security_policy: false
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
      service_node_port_range: 30000-32767
    kube-controller:
      cluster_cidr: '10.42.0.0/16,fd01::/48'
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
    kubelet:
      fail_swap_on: false
      generate_serving_certificate: false
    kubeproxy: {}
    scheduler: {}
  ssh_agent_auth: false
  upgrade_strategy:
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120

Of course, without that setting I lose most of the labels provided by GCP and will lose the features provided by the autoscaler pod.
But let’s leave that for later.

Findings: I repeated the previous steps and the behaviour is the same:

  • the IPv6 address is not present in .status
  • the cni.projectcalico.org annotations list an IPv6 address
  • the same situation applies to both pods and nodes

At the OS level, I’m able to use nc between hosts to reach any open IPv6 port (e.g. 6443).
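
Roughly like this, for reference (the address is a placeholder and the flags assume OpenBSD netcat):

# from one node to another node's IPv6 address
> nc -6 -zv <node-ipv6-address> 6443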

All my tests will be done from scratch, so yes, the environments are disposable for now.

This will be followed up in cri-dockerd needs IPv6 dual stack flag enabled when more than 1 CIDR is configured · Issue #3271 · rancher/rke · GitHub