Env info:
K3s version: v1.27.3+k3s1
Node info: Linux shk3s-3 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster info: 3 servers
❯ k get no
NAME      STATUS   ROLES                       AGE    VERSION
shk3s-1   Ready    control-plane,etcd,master   2d3h   v1.27.3+k3s1
shk3s-2   Ready    control-plane,etcd,master   2d3h   v1.27.3+k3s1
shk3s-3   Ready    control-plane,etcd,master   2d3h   v1.27.3+k3s1
Description:
Run a 3-replica app (Redis in this case) in the cluster, with a Service that normally has 3 endpoints, like below:
❯ kubectl describe endpoints/redis
Name:         redis
Namespace:    dbs
Labels:       app.kubernetes.io/component=node
              app.kubernetes.io/instance=redis
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redis
              helm.sh/chart=redis-18.0.2
Annotations:  <none>
Subsets:
  Addresses:          10.42.0.23,10.42.1.37,10.42.2.22
  NotReadyAddresses:  <none>
  Ports:
    Name          Port   Protocol
    ----          ----   --------
    tcp-redis     6379   TCP
    tcp-sentinel  26379  TCP
Events:  <none>
When using kubectl delete po/redis-node-0, the endpoint for that pod is removed from the list quickly. But when a node is shut down, watching the endpoints list shows the endpoint takes about 30s to 1min to be removed; during this period, traffic is still routed to the failed pod, causing call errors.
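To measure the delay precisely, a timestamped watch loop helps (a minimal sketch; it assumes the redis Service in the dbs namespace shown above):

# Print a timestamped snapshot of the ready endpoint IPs once per second;
# after shutting down a node, the gap until its pod IP disappears is the
# removal delay being described.
while true; do
  ips=$(kubectl -n dbs get endpoints redis \
          -o jsonpath='{.subsets[*].addresses[*].ip}')
  echo "$(date +%H:%M:%S) ready: ${ips:-<none>}"
  sleep 1
done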
Below are the k3s server start parameters:
k3s server --prefer-bundled-bin \
--disable local-storage \
--write-kubeconfig-mode 644 \
--kube-apiserver-arg default-not-ready-toleration-seconds=30 \
--kube-apiserver-arg default-unreachable-toleration-seconds=30 \
--kube-controller-manager-arg node-monitor-grace-period=10s
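For context, the observed 30s to 1min roughly matches node-monitor-grace-period (10s for the node to be marked NotReady) plus default-unreachable-toleration-seconds (30s before the pod is evicted and dropped from the endpoints). Those two apiserver flags only set defaults via the DefaultTolerationSeconds admission plugin; a pod that declares its own tolerations overrides them, so the values can also be tightened per workload. A minimal sketch (hypothetical pod name and image; the 5s values are examples, and very low values risk evictions during brief network blips):

# Per-pod override of the eviction tolerations; these take precedence over
# the apiserver defaults set above. Pod name/image are illustrative only.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
  - name: app
    image: redis:7
  tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 5
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 5
EOF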
So, any tips to reduce the time it takes to remove the endpoint when a node goes down?