Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

Nick50 · July 4, 2019, 6:48am

Hello, I am new to Rancher. I tried it out and ran into this problem:
I installed Rancher v2.2.4 on my local Docker machine (Docker v18.09.5, build e8ff056dbc) with this command:
docker run -d --restart=unless-stopped -p 2804:80 -p 2805:443 rancher/rancher

My goal for now is to create one cluster with one node.
After installing, I successfully opened the settings, logged in, and started adding a new cluster.
I selected these settings:
Custom - From my own existing nodes
Kubernetes version - v1.13.5-rancher1-3
Network provider - Canal
Cloud provider - none

These settings should be right, but I also tried other combinations and it still didn't work.
When adding nodes, I selected all roles (etcd, Control Plane, Worker). Again, I tried other combinations, with no result.

I successfully ran the provided command to start the required containers, and the cluster started provisioning.
Then I got this error:
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system.

Then I noticed that at some point Rancher kept creating these two containers, which couldn't start:

Name: k8s_rke-network-plugin-pod_rke-network-plugin-deploy-job-g9mcn_kube-system_8fe433df-9e26-11e9-b1a6-0216caa1f797_0
Image: 5731d7826aae

Name: k8s_POD_rke-network-plugin-deploy-job-g9mcn_kube-system_8fe433df-9e26-11e9-b1a6-0216caa1f797_0
Image: rancher/pause:3.1
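The two names above follow the scheme the kubelet uses when it runs pods on Docker: k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>, where the pod's infrastructure ("pause") container is always named POD. The sketch below splits such a name into its fields, which makes it easy to see that both containers belong to the same pod of the rke-network-plugin-deploy-job job. The helper name parse_k8s_name is hypothetical and only for illustration.

```shell
# Split a kubelet-managed Docker container name into its fields.
# Assumed layout: k8s_<container>_<pod>_<namespace>_<pod-uid>_<attempt>
# Kubernetes object names cannot contain "_", so "_" is a safe separator.
parse_k8s_name() {
  printf '%s\n' "$1" | awk -F_ '{
    printf "container=%s pod=%s namespace=%s uid=%s attempt=%s\n", $2, $3, $4, $5, $6
  }'
}

# Example with the pause container name from the post; prints:
# container=POD pod=rke-network-plugin-deploy-job-g9mcn namespace=kube-system uid=8fe433df-9e26-11e9-b1a6-0216caa1f797 attempt=0
parse_k8s_name k8s_POD_rke-network-plugin-deploy-job-g9mcn_kube-system_8fe433df-9e26-11e9-b1a6-0216caa1f797_0
```

A rising attempt number across the recreated containers would indicate the kubelet restarting the same pod, while a new pod name suffix (g9mcn here) would indicate the job creating a new pod.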

On other forums I found that people solved this by editing cluster.yml (changing localhost to the actual IP of the node), but I can't find this file anywhere.

Thanks for any help.

superseb · July 9, 2019, 2:29pm

Can you provide the output of docker info from the node you are trying to add, and the logs of one of the non-pause containers?
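One way to gather this on the node is to list every container together with its image and skip the pause containers, since those only hold the pod's namespaces and have no useful logs. This is a sketch that relies on the docker ps --format template fields {{.Names}} and {{.Image}}; the helper name non_pause_containers is hypothetical.

```shell
# Print the names of containers whose image is not a pause image.
# Input: one "NAME IMAGE" pair per line, for example from:
#   docker ps -a --format '{{.Names}} {{.Image}}'
non_pause_containers() {
  awk '$2 !~ /pause/ { print $1 }'
}

# Usage on the node (requires access to the Docker daemon):
#   docker info
#   docker ps -a --format '{{.Names}} {{.Image}}' | non_pause_containers
#   docker logs <name-from-the-list>
```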

Nick50 · July 11, 2019, 7:59am

Docker info:
Containers: 21
 Running: 18
 Paused: 0
 Stopped: 3
Images: 19
Server Version: 18.09.5
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 39.3GiB
Name: docker3
ID: LREU:JZWY:YILP:VFEV:4A43:V7LS:HPS2:6X5E:C45D:QDPY:4ACD:ROQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: aiphee
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Every container listed below uses the image rancher/rancher-agent:v2.2.4.
The log messages were repeating, so I copied a small sample.
Docker logs cranky_ramanujan:
time="2019-07-11T07:48:53Z" level=info msg="Connecting to wss://my_url.com:2805/v3/connect with token x2sj9pts4x8pmlnh858hmcvm8hnll4ct9fbn4jdkv9prkktkn52ns9"

time="2019-07-11T07:48:53Z" level=info msg="Connecting to proxy" url="wss://my_url.com:2805/v3/connect"

time="2019-07-11T07:48:53Z" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"

Docker logs eager_haibt:
time="2019-07-11T07:50:41Z" level=info msg="Error while getting agent config: invalid response 500: cluster not found"

Docker logs share-mnt:
Error response from daemon: {"message":"No such container: kubelet"} Error: failed to start containers: kubelet

+ sleep 2

Docker logs elegant_austin:
time="2019-07-11T07:53:49Z" level=info msg="Error while getting agent config: invalid response 500: cluster not found"

Thanks for any help.

3lin · October 13, 2019, 10:26am

Excuse me, has this problem been solved? I encountered the same error.

KetoStheno · October 30, 2019, 3:32am

I had the same issue. My nodes had only about 25% free disk space left. Once I deleted old data and had about 75% free disk space on all three of my nodes, the rke-network-plugin was deployed successfully.
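To see how close a node is to the situation described here, you can check the free space on the filesystem that holds Docker's data. This is a sketch that assumes the default Docker Root Dir /var/lib/docker (the value shown in the docker info output earlier in this thread); the helper name free_pct is hypothetical. Keep in mind that the kubelet applies its own disk-pressure eviction thresholds, so the exact percentage at which deployments start failing depends on the node's configuration.

```shell
# Print the approximate free space (in %) of the filesystem containing a path.
# Assumption: Docker data lives in /var/lib/docker unless another path is given.
free_pct() {
  path="${1:-/var/lib/docker}"
  # df -P prints POSIX columns; column 5 ("Capacity") is the used percentage.
  df -P "$path" | awk 'NR == 2 { used = $5; sub(/%/, "", used); print 100 - used }'
}

# Examples:
#   free_pct       # checks /var/lib/docker
#   free_pct /     # checks the root filesystem
```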
