RKE complains about 1 of 3 nodes

Harri · March 26, 2020, 11:58am

I am trying to update my Rancher cluster using “rke up”, as recommended on https://rancher.com/docs/rke/latest/en/upgrades/. Problem is, it complains about 1 of 3 nodes:

:
INFO[0189] Waiting for [kubelet] container to exit on host [r03.example.com] 
INFO[0189] Renaming container [kubelet] to [old-kubelet] on host [r03.example.com], try #1 
INFO[0189] Starting container [kubelet] on host [r03.example.com], try #1 
INFO[0189] Waiting for [kubelet] container to exit on host [r02.example.com] 
INFO[0189] Renaming container [kubelet] to [old-kubelet] on host [r02.example.com], try #1 
INFO[0189] Waiting for [kubelet] container to exit on host [r01.example.com] 
INFO[0189] Renaming container [kubelet] to [old-kubelet] on host [r01.example.com], try #1 
INFO[0189] Starting container [kubelet] on host [r02.example.com], try #1 
INFO[0189] [worker] Successfully updated [kubelet] container on host [r03.example.com] 
INFO[0189] Removing container [old-kubelet] on host [r03.example.com], try #1 
INFO[0189] Starting container [kubelet] on host [r01.example.com], try #1 
INFO[0189] [healthcheck] Start Healthcheck on service [kubelet] on host [r03.example.com] 
INFO[0190] [worker] Successfully updated [kubelet] container on host [r02.example.com] 
INFO[0190] Removing container [old-kubelet] on host [r02.example.com], try #1 
INFO[0190] [worker] Successfully updated [kubelet] container on host [r01.example.com] 
INFO[0190] Removing container [old-kubelet] on host [r01.example.com], try #1 
INFO[0190] [healthcheck] Start Healthcheck on service [kubelet] on host [r02.example.com] 
INFO[0190] [healthcheck] Start Healthcheck on service [kubelet] on host [r01.example.com] 
FATA[0241] [workerPlane] Failed to bring up Worker Plane: [Failed to verify healthcheck: Failed to check http://localhost:10248/healthz for service [kubelet] on host [r03.example.com]: Get http://localhost:10248/healthz: Unable to access the service on localhost:10248. The service might be still starting up. Error: ssh: rejected: connect failed (Connection refused), log: I0326 06:52:19.557779   15320 balancer_conn_wrappers.go:127] pickfirstBalancer: HandleSubConnStateChange: 0xc0003d4040, READY]

AFAICT r03 is not starting up, so I wonder what could be done to fix this problem?
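To check whether the kubelet on r03 is answering at all, one option is to probe the same endpoint that RKE checks (`http://localhost:10248/healthz`, taken from the error above) directly on that node. The following is only a minimal sketch: the function name `probe_healthz` and the timeout are illustrative assumptions, and it has to run on the affected host itself because the endpoint listens on localhost.

```python
import urllib.error
import urllib.request


def probe_healthz(url="http://localhost:10248/healthz", timeout=5.0):
    """Return (ok, detail) for a single probe of the kubelet health endpoint.

    The URL is the one shown in the RKE error message; the function name
    and the timeout are illustrative choices, not part of RKE.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace").strip()
            return resp.status == 200, f"HTTP {resp.status}: {body}"
    except urllib.error.HTTPError as exc:
        # The kubelet answered, but with a non-2xx status code.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # Nothing is listening (connection refused) or the probe timed out.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    ok, detail = probe_healthz()
    print("kubelet healthy" if ok else "kubelet NOT healthy", "-", detail)
```

An "unreachable" result would match the "Connection refused" in the RKE output and suggests that the kubelet process exits or never binds the port, which is something only the container logs can explain.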

superseb · March 26, 2020, 1:13pm

Does this happen consistently when trying to upgrade? We’ll need the full logs of the kubelet container on that node to see why it can’t start properly.
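As a rough sketch of how those logs could be collected: RKE runs the kubelet as a Docker container named `kubelet` (the name appears in the output above), so `docker logs` on the affected node should return them. The helper names, the output file name, and the optional `tail` limit below are illustrative assumptions. The script needs to run on r03 as a user who can talk to the Docker daemon.

```python
import gzip
import subprocess


def build_logs_command(container="kubelet", tail=None):
    """Build the `docker logs` command line for the given container."""
    cmd = ["docker", "logs", "--timestamps"]
    if tail is not None:
        # Limit the output to the last N lines to keep the file small.
        cmd += ["--tail", str(tail)]
    cmd.append(container)
    return cmd


def save_container_logs(path="kubelet.log.gz", container="kubelet", tail=None):
    """Write the container's combined stdout/stderr to a gzip file.

    Returns the number of uncompressed bytes that were captured.
    """
    result = subprocess.run(
        build_logs_command(container, tail),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams into one capture
        check=True,                # raise if docker itself reports an error
    )
    with gzip.open(path, "wb") as fh:
        fh.write(result.stdout)
    return len(result.stdout)


if __name__ == "__main__":
    size = save_container_logs()
    print(f"wrote {size} bytes of kubelet log output to kubelet.log.gz")
```

If the full log is too large to share, passing something like `tail=2000` keeps only the most recent lines, although the first startup errors may then be cut off.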

Harri · March 27, 2020, 6:19am

That's a pretty large file: 170K compressed. Is there some way to attach files here?

Harri · March 27, 2020, 6:26am

https://afaics.de/download/kubelet.log.gz
