js-max
June 21, 2023, 3:28pm
1
Hi all
I’m trying to enable IPv6 on our setup and struggling to do it.
Maybe I will need @superseb's eyes on it to figure out what I am missing.
I’m following instructions from:
Merging those two, I installed the cluster with the config from the Rancher docs and tried to verify the IPv6 stack with @superseb's CLI from the GitHub link.
But I think I’m missing something here, as IPv6 shows up in the annotations but not in the pod's IP field:
Running ip a s inside the pod returns the IPv6 address, but I’m not able to achieve IPv6 communication between two pods…
Also, I followed the instructions to enable IPv6 at the infrastructure layer, so IPv6 is being assigned to the nodes, confirmed with ip a s.
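For reference, the kind of pod-to-pod test that fails looks like this (a minimal sketch; the pod names and the fd01:: address are placeholders from my environment, and it assumes ping is available in the images):

# IPv6 address as seen inside the target pod (placeholder pod names)
> kubectl exec pod-b -- ip -6 addr show dev eth0

# Try to reach that address from another pod (assumes the image ships ping)
> kubectl exec pod-a -- ping -6 -c 3 fd01::aaaa:bbbb:cccc:dddd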
What am I missing here?
Thanks in advance!
Can you please share the relevant part of the config so we can check if it's all correctly configured? Please also include the RKE version used and the exact Kubernetes version (vX.X.X-rancherX-X).
It also helps if you can run through the Validate parts of Validate IPv4/IPv6 dual-stack | Kubernetes.
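For reference, the checks on that page boil down to go-template queries along these lines (node name to be replaced with yours):

# Pod CIDRs allocated to a node; a dual-stack node should list one IPv4 and one IPv6 range
> kubectl get nodes <node-name> -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'

# Node addresses; on a dual-stack node InternalIP should include an IPv6 entry
> kubectl get nodes <node-name> -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'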
js-max
June 21, 2023, 4:44pm
3
You're right @superseb, shame on me for not sharing those upfront.
answers: {}
description: Custom cluster using RKE
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
fleet_workspace_name: fleet-default
local_cluster_auth_endpoint:
  enabled: false
name: mycluster-14
rancher_kubernetes_engine_config:
  addon_job_timeout: 45
  authentication:
    strategy: x509
  authorization: {}
  bastion_host:
    ignore_proxy_env_vars: false
    ssh_agent_auth: false
  cloud_provider:
    name: gce
  cluster_name: mycluster-14
  enable_cri_dockerd: true
  ignore_docker_version: true
  ingress:
    default_backend: true
    default_ingress_class: true
    http_port: 0
    https_port: 0
    node_selector:
      role: tooling
    provider: nginx
  kubernetes_version: v1.24.10-rancher4-1
  monitoring:
    node_selector:
      role: tooling
    provider: metrics-server
    replicas: 1
  network:
    mtu: 0
    plugin: calico
  restore:
    restore: false
  rotate_encryption_key: false
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
        timeout: 300
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube-api:
      always_pull_images: false
      pod_security_policy: false
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
      service_node_port_range: 30000-32767
    kube-controller:
      cluster_cidr: '10.42.0.0/16,fd01::/64'
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
    kubelet:
      fail_swap_on: false
      generate_serving_certificate: false
    kubeproxy: {}
    scheduler: {}
  ssh_agent_auth: false
  upgrade_strategy:
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
I also read the Validate IPv4/IPv6 dual-stack | Kubernetes document and noticed some unexpected behaviour:
# my control node
> kubectl get nodes mycluster-14-dsc-node-master-0 -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.0.0/24
fd01::/64
# my worker node
> kubectl get nodes mycluster-14-dsw-node-worker-1jfg -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
>
# a CIDR error
> kubectl get events | grep CIDRNotAvailable
2m7s Normal CIDRNotAvailable node/mycluster-14-dsw-node-worker-1jfg Node mycluster-14-dsw-node-worker-1jfg status is now: CIDRNotAvailable
37s Normal CIDRNotAvailable node/mycluster-14-dsw-tooling-node-worker-jmx3 Node mycluster-14-dsw-tooling-node-worker-jmx3 status is now: CIDRNotAvailable
Since this is a playground that I'm using to test IPv6, I can delete it as many times as needed.
We are trying on top of GCP (I know the instructions only mention AWS); nevertheless, all the bullet points related to IPv6 config have been applied on the GCP side.
But I fully understand if you prefer not to go further because of that.
I guess it makes sense, but I don't have the full story yet on why this worked when it was being developed/tested. Can you try changing the IPv6 part of cluster_cidr to fd01::/48? You probably need to build the cluster from scratch to make sure no leftover state causes issues.
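For context, and this is my assumption of what is going on: the controller manager carves a per-node IPv6 pod CIDR out of the cluster CIDR, and the default per-node mask is /64 (node-cidr-mask-size-ipv6), so a fd01::/64 cluster CIDR only has room for a single node. That would explain why only the first node got fd01::/64 and the other nodes reported CIDRNotAvailable. The kube-controller section would then look roughly like this:

services:
  kube-controller:
    # IPv6 half widened from /64 to /48 so every node can get its own /64 pod range
    cluster_cidr: '10.42.0.0/16,fd01::/48'
    service_cluster_ip_range: '10.43.0.0/16,fd98::/108'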
js-max
June 22, 2023, 8:50am
5
Hi @superseb, changing that value cleared out the CIDR error.
Using the k8s docs page to validate, I'm not seeing, at the node layer, an IPv6 address assigned to the InternalIP field:
> kubectl get nodes mycluster-13-dsc-node-master-0 -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.0.0/24
fd01::/64
>
> kubectl get nodes mycluster-13-dsw-node-worker-nfxn -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.1.0/24
fd01:0:0:1::/64
>
> kubectl get nodes mycluster-13-dsw-tooling-node-worker-hncc -o go-template --template='{{range .spec.podCIDRs}}{{printf "%s\n" .}}{{end}}'
10.42.2.0/24
fd01:0:0:2::/64
>
> kubectl get nodes mycluster-13-dsc-node-master-0 -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 10.40.9.92
InternalDNS: mycluster-13-dsc-node-master-0.asia-south1-a.c.mycluster-poc.internal
Hostname: mycluster-13-dsc-node-master-0
>
> kubectl get nodes mycluster-13-dsw-node-worker-nfxn -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 10.40.9.94
ExternalIP: <redacted>
InternalDNS: mycluster-13-dsw-node-worker-nfxn.asia-south1-a.c.mycluster-poc.internal
Hostname: mycluster-13-dsw-node-worker-nfxn
>
> kubectl get nodes mycluster-13-dsw-tooling-node-worker-hncc -o go-template --template='{{range .status.addresses}}{{printf "%s: %s\n" .type .address}}{{end}}'
InternalIP: 10.40.9.93
InternalDNS: mycluster-13-dsw-tooling-node-worker-hncc.asia-south1-a.c.mycluster-poc.internal
Hostname: mycluster-13-dsw-tooling-node-worker-hncc
>
Reusing the CLI from GitHub to validate, I can confirm that the annotations list an IPv6 address but .status.podIPs doesn't reflect it:
> kubectl create deployment nginx --image=nginx --replicas=1
> kubectl get pods nginx-8f458dc5b-km2kp -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 14b591bd5bee0382c628ff95df8fe084e55eecf0e5ca642f7627335317197986
    cni.projectcalico.org/podIP: 10.42.239.75/32
    cni.projectcalico.org/podIPs: 10.42.239.75/32,fd01::3932:47c6:c4b4:356a:ef4a/128
  ...
status:
  conditions:
  ...
    name: nginx
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-06-22T08:41:03Z"
  hostIP: 10.40.9.93
  phase: Running
  podIP: 10.42.239.75
  podIPs:
  - ip: 10.42.239.75
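A quicker way to check just that field, in the same go-template style as the node checks, only returns the IPv4 address here:

> kubectl get pods nginx-8f458dc5b-km2kp -o go-template --template='{{range .status.podIPs}}{{printf "%s\n" .ip}}{{end}}'
10.42.239.75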
Hope that this helps.
Okay, I need some time to reproduce this. If you are still working with disposable environments (I assume this was an environment from scratch), I would advise testing with cloud_provider: name: gce removed from the configuration.
Can you share what Rancher version you are using (given that you are provisioning the clusters from Rancher)?
And I assume the nodes have IPv6 addresses and connectivity has been verified (outside of k8s, so on the host level)?
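For the host-level check, something along these lines on each node should be enough (interface and peer address to be adjusted for your environment):

# Confirm a global (non link-local) IPv6 address is assigned on the node
ip -6 addr show scope global

# Check IPv6 reachability towards another node (substitute its global IPv6 address)
ping -6 -c 3 <other-node-ipv6>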
js-max
June 22, 2023, 10:47am
7
Hi @superseb, I'm working with Rancher Manager version 2.7.1.
My config YAML without the cloud_provider settings:
answers: {}
description: Custom cluster using RKE
docker_root_dir: /var/lib/docker
enable_cluster_alerting: false
enable_cluster_monitoring: false
enable_network_policy: false
fleet_workspace_name: fleet-default
local_cluster_auth_endpoint:
  enabled: false
name: mycluster-13
rancher_kubernetes_engine_config:
  addon_job_timeout: 45
  authentication:
    strategy: x509
  authorization: {}
  bastion_host:
    ignore_proxy_env_vars: false
    ssh_agent_auth: false
  cloud_provider: {}
  cluster_name: mycluster-13
  enable_cri_dockerd: true
  ignore_docker_version: true
  ingress:
    default_backend: true
    default_ingress_class: true
    http_port: 0
    https_port: 0
    node_selector:
      role: tooling
    provider: nginx
  kubernetes_version: v1.24.10-rancher4-1
  monitoring:
    node_selector:
      role: tooling
    provider: metrics-server
    replicas: 1
  network:
    mtu: 0
    plugin: calico
  restore:
    restore: false
  rotate_encryption_key: false
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 12
        retention: 6
        safe_timestamp: false
        timeout: 300
      creation: 12h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube-api:
      always_pull_images: false
      pod_security_policy: false
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
      service_node_port_range: 30000-32767
    kube-controller:
      cluster_cidr: '10.42.0.0/16,fd01::/48'
      service_cluster_ip_range: '10.43.0.0/16,fd98::/108'
    kubelet:
      fail_swap_on: false
      generate_serving_certificate: false
    kubeproxy: {}
    scheduler: {}
  ssh_agent_auth: false
  upgrade_strategy:
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: false
      force: false
      grace_period: -1
      ignore_daemon_sets: true
      timeout: 120
Of course, without that setting I lose most of the labels provided by GCP and will lose the features provided by the autoscaler pod, but let's leave that for later.
Findings: I repeated the previous steps as before and still see the same behaviour:
- IPv6 is not present in .status
- the cni.projectcalico.org annotations list one IPv6 address
- the same situation happens with both pods and nodes
- at the OS level, I'm able to use nc and reach any open IPv6 port (e.g. 6443) between hosts; see the example below
All my tests will be done from scratch, so yes, they are disposable for now.
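For completeness, the host-level check mentioned above looks roughly like this (OpenBSD-style nc flags; the peer address is redacted):

# From one node, test an open IPv6 port on another node, e.g. the kube-apiserver
nc -6 -zv <other-node-global-ipv6> 6443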