Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system

Hello, I am new to Rancher. I tried it out and ran into this problem:
I installed Rancher v2.2.4 on my local Docker machine (v18.09.5, build e8ff056dbc) with this command:
docker run -d --restart=unless-stopped -p 2804:80 -p 2805:443 rancher/rancher

My goal for now is to make 1 cluster with 1 node.
After the install, I got into the settings, logged in, and started adding a new cluster.
I selected these settings:
Custom - From my own existing nodes
Kubernetes version - v1.13.5-rancher1-3
Network provider - Canal
Cloud provider - none

These settings should be right. I also tried other combinations, but it still didn't work.
When adding the node, I selected all roles (etcd, Control Plane, Worker). Again, I tried other combinations, with no result.

I successfully installed the required containers with the provided command, and the cluster started provisioning.
And then I got this error:
Failed to get job complete status for job rke-network-plugin-deploy-job in namespace kube-system.

Then I noticed that, at some point, Rancher kept creating these 2 containers, which couldn't start:

Name: k8s_rke-network-plugin-pod_rke-network-plugin-deploy-job-g9mcn_kube-system_8fe433df-9e26-11e9-b1a6-0216caa1f797_0
Image: 5731d7826aae

Name: k8s_POD_rke-network-plugin-deploy-job-g9mcn_kube-system_8fe433df-9e26-11e9-b1a6-0216caa1f797_0
Image: rancher/pause:3.1
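
For anyone trying to work out which pod these containers belong to: the names appear to follow the layout the kubelet uses with the Docker runtime, k8s_<container>_<pod>_<namespace>_<pod UID>_<attempt>. Here is a minimal Python sketch that splits such a name into its parts. The function and class names are made up for illustration, and the field layout is an assumption based on the two names above.

```python
from typing import NamedTuple


class KubeletContainerName(NamedTuple):
    container: str   # container name inside the pod ("POD" is the pause container)
    pod: str         # pod name
    namespace: str   # Kubernetes namespace
    pod_uid: str     # pod UID
    attempt: int     # restart attempt counter


def parse_kubelet_container_name(name: str) -> KubeletContainerName:
    """Split a kubelet-created Docker container name into its fields.

    Assumes the layout k8s_<container>_<pod>_<namespace>_<uid>_<attempt>,
    where none of the fields contains an underscore.
    """
    parts = name.lstrip("/").split("_")
    if len(parts) != 6 or parts[0] != "k8s":
        raise ValueError(f"not a kubelet-managed container name: {name!r}")
    _, container, pod, namespace, pod_uid, attempt = parts
    return KubeletContainerName(container, pod, namespace, pod_uid, int(attempt))


example = parse_kubelet_container_name(
    "k8s_rke-network-plugin-pod_rke-network-plugin-deploy-job-g9mcn_"
    "kube-system_8fe433df-9e26-11e9-b1a6-0216caa1f797_0"
)
print(example.pod, example.namespace, example.attempt)
```

The pod name and namespace recovered this way are what you would look up when asking for the job's logs, and the container name itself can be passed to docker logs on the node.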

On other forums I found that people fixed this by editing cluster.yml (changing localhost to the actual IP of the node), but I can't find this file anywhere.

Thanks for any help.

Can you provide the output of docker info from the node you are trying to add? And the logs of one of the non-pause containers?

Docker info:
Containers: 21
Running: 18
Paused: 0
Stopped: 3
Images: 19
Server Version: 18.09.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.0-8-amd64
Operating System: Debian GNU/Linux 9 (stretch)
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 39.3GiB
Name: docker3
ID: LREU:JZWY:YILP:VFEV:4A43:V7LS:HPS2:6X5E:C45D:QDPY:4ACD:ROQQ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Username: aiphee
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine

Every container listed below uses the image rancher/rancher-agent:v2.2.4.
The log messages were repeating, so I copied a short sample.
Docker logs cranky_ramanujan:
time="2019-07-11T07:48:53Z" level=info msg="Connecting to wss://my_url.com:2805/v3/connect with token x2sj9pts4x8pmlnh858hmcvm8hnll4ct9fbn4jdkv9prkktkn52ns9"

time="2019-07-11T07:48:53Z" level=info msg="Connecting to proxy" url="wss://my_url.com:2805/v3/connect"

time="2019-07-11T07:48:53Z" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"

time="2019-07-11T07:48:53Z" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"

Docker logs eager_haibt:
time="2019-07-11T07:50:41Z" level=info msg="Error while getting agent config: invalid response 500: cluster not found"

Docker logs share-mnt:
Error response from daemon: {"message":"No such container: kubelet"} Error: failed to start containers: kubelet

+ sleep 2

Docker logs elegant_austin:
time="2019-07-11T07:53:49Z" level=info msg="Error while getting agent config: invalid response 500: cluster not found"

Thanks for any help.
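
Since the agent logs repeat the same few messages, it can help to collapse them into counts before posting. Below is a rough Python sketch that groups lines by level and message. It assumes the key="value" line format seen in the sample above; the regular expression and the summarize function are illustrative and not part of Rancher.

```python
import re
from collections import Counter

# Matches the level and msg fields of lines such as:
#   time="..." level=error msg="Failed to connect to proxy" error="..."
LINE_RE = re.compile(r'level=(?P<level>\w+) msg="(?P<msg>(?:[^"\\]|\\.)*)"')


def summarize(lines):
    """Count how often each (level, message) pair occurs."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # lines in another format (e.g. shell traces) are skipped
            counts[(match.group("level"), match.group("msg"))] += 1
    return counts


sample = [
    'time="2019-07-11T07:48:53Z" level=info msg="Connecting to proxy" url="wss://my_url.com:2805/v3/connect"',
    'time="2019-07-11T07:48:53Z" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"',
    'time="2019-07-11T07:48:53Z" level=error msg="Failed to connect to proxy" error="websocket: bad handshake"',
    "+ sleep 2",
]

summary = summarize(sample)
for (level, msg), count in summary.most_common():
    print(f"{count:3d}  {level:5s}  {msg}")
```

To run it against a real container, the output of docker logs could be saved to a file and its lines passed to summarize.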

Excuse me, has this problem been solved? I encountered the same error.

I had the same issue. My nodes had only about 25% free disk space left. Once I deleted old data and had about 75% free disk space on all 3 of my nodes, the rke-network-plugin was deployed successfully.
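
If you want to check this on your own nodes, here is a small Python sketch that reports paths with little free space. The path list and the 30% threshold are assumptions chosen for illustration: /var/lib/docker is the Docker Root Dir from the docker info output earlier in the thread, and /var/lib/kubelet is only a guess at where the kubelet keeps its data. Neither value comes from Rancher documentation.

```python
import shutil


def free_percent(path: str = "/") -> float:
    """Return the free space on the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def low_space_paths(paths, min_free_percent):
    """Return (path, free %) for every existing path below the threshold."""
    low = []
    for path in paths:
        try:
            percent = free_percent(path)
        except FileNotFoundError:
            continue  # path does not exist on this node, skip it
        if percent < min_free_percent:
            low.append((path, percent))
    return low


if __name__ == "__main__":
    # Hypothetical paths and threshold; adjust them for your own nodes.
    candidates = ["/", "/var/lib/docker", "/var/lib/kubelet"]
    for path, percent in low_space_paths(candidates, min_free_percent=30.0):
        print(f"{path}: only {percent:.1f}% free")
```

Running it on each node prints nothing when every checked path is above the threshold.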