Env info:
K3s version: v1.27.3+k3s1
Node info: Linux shk3s-3 5.4.0-153-generic #170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Cluster info: 3 servers
❯ k get no
NAME      STATUS   ROLES                       AGE    VERSION
shk3s-1   Ready    control-plane,etcd,master   2d3h   v1.27.3+k3s1
shk3s-2   Ready    control-plane,etcd,master   2d3h   v1.27.3+k3s1
shk3s-3   Ready    control-plane,etcd,master   2d3h   v1.27.3+k3s1
Description:
Run a 3-replica app (Redis in this case) in the cluster, with a Service that normally has 3 endpoints, like below:
❯ kubectl describe endpoints/redis
Name:         redis
Namespace:    dbs
Labels:       app.kubernetes.io/component=node
              app.kubernetes.io/instance=redis
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=redis
              helm.sh/chart=redis-18.0.2
Annotations:  <none>
Subsets:
  Addresses:          10.42.0.23,10.42.1.37,10.42.2.22
  NotReadyAddresses:  <none>
  Ports:
    Name          Port   Protocol
    ----          ----   --------
    tcp-redis     6379   TCP
    tcp-sentinel  26379  TCP
Events:  <none>
When using kubectl delete po/redis-node-0, the endpoint for that pod is removed from the list quickly. But when a node is shut down, watching the endpoints list shows the endpoint takes about 30s to 1min to be removed; during this period, traffic is still routed to the failed pod, causing call errors.
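To measure the delay precisely, a timestamped watch loop helps (a minimal sketch; it assumes the redis Service in the dbs namespace shown above):

# Print a timestamped snapshot of the ready endpoint IPs once per second;
# after shutting down a node, the gap until its pod IP disappears is the
# removal delay being described.
while true; do
  ips=$(kubectl -n dbs get endpoints redis \
          -o jsonpath='{.subsets[*].addresses[*].ip}')
  echo "$(date +%H:%M:%S) ready: ${ips:-<none>}"
  sleep 1
done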
Below are the k3s server start parameters:
k3s server --prefer-bundled-bin \
--disable local-storage \
--write-kubeconfig-mode 644 \
--kube-apiserver-arg default-not-ready-toleration-seconds=30 \
--kube-apiserver-arg default-unreachable-toleration-seconds=30 \
--kube-controller-manager-arg node-monitor-grace-period=10s
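For context, the observed 30s to 1min roughly matches node-monitor-grace-period (10s for the node to be marked NotReady) plus default-unreachable-toleration-seconds (30s before the pod is evicted and dropped from the endpoints). Those two apiserver flags only set defaults via the DefaultTolerationSeconds admission plugin; a pod that declares its own tolerations overrides them, so the values can also be tightened per workload. A minimal sketch (hypothetical pod name and image; the 5s values are examples, and very low values risk evictions during brief network blips):

# Per-pod override of the eviction tolerations; these take precedence over
# the apiserver defaults set above. Pod name/image are illustrative only.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo
spec:
  containers:
  - name: app
    image: redis:7
  tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 5
  - key: node.kubernetes.io/not-ready
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 5
EOF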
So, any tips to reduce the time it takes to remove the endpoint when a node goes down?