So each node has the exact same problem. I SSH'd into each one and ran the requested commands, and they all returned identical information.
When I ran the following command:
docker pull rancher/hyperkube:v1.20.6-rancher1
Each node tells me it's out of space, which is odd, as I've never had to provision extra space before. I checked the node template and it allocates a little under 20 GB. I can increase that, but I'd prefer to find out why 20 GB is already full. So, from the root of the drive, I ran:
sudo su -
cd /
du -sh *
This produced the following for all nodes (identical):
[root@sg-master1 /]# du -sh *
0 bin
0 dev
7.5M etc
8.0K home
0 host
0 lib
0 lib64
0 media
0 mnt
122.2M opt
0 proc
0 root
948.0K run
0 sbin
0 sys
4.0K tmp
214.9M usr
1.7G var
[root@sg-master1 /]#
As you can see, there is nowhere near 20 GB in use. What's odd is that the docker pull gets to the last megabyte of the download and then craps out.
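For reference, here is roughly what else I can run on a node to compare the numbers, i.e. free space and inodes as the filesystem sees them plus Docker's own accounting of where the space goes (all standard commands, nothing specific to this cluster):
df -h                                        # free space per mounted filesystem, to compare against du
df -i                                        # inode usage, in case "no space left" is really inode exhaustion
docker system df                             # Docker's own view of image/container/volume usage
docker info --format '{{.DockerRootDir}}'    # the directory where this daemon actually stores its data
Happy to post any of that output if it would help.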
I extended my node template to 25 GB just to test, switched to a single node, and let it build. Same issue.
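To check whether the larger disk is even visible inside the node, this is the kind of thing I can run (just a sketch; device names will differ, and I'm not certain which tools the RancherOS console ships with):
cat /proc/partitions       # block devices and partition sizes as the kernel sees them
df -h /var/lib/docker      # the filesystem that actually backs Docker's data directory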
So I've done some additional troubleshooting:
- Ran a prune on Docker images and containers, which reclaimed 159 MB. That got me through pulling all but the last two lines of the screenshot above, but extraction failed.
- I did further research, thinking it was a Docker problem rather than a Rancher problem. I found some fixes involving files in the Docker directory, but when I went to work on them, the files weren't there. Another set of steps was to change the allotted file size in daemon.json, but that file was missing too. I confirmed that the container itself is at 25% space used on 25 GB. That led me back to the research mentioned above, but since none of the config files are where the Docker documentation says they should be, I figured it may be related to RancherOS specifically (see the config sketch after this list).
- I then remembered that my previous config (before it blew up) was using RancherOS image 1.5.6 and my new cluster was not, so I made node templates using 1.5.6 and spun up a cluster. Unfortunately, it ended just as abruptly, with the exact same behavior described above.
- I ran the following command:
docker system prune --all --force --volumes
This allowed me to get farther along in the pull.
Here is the current system-wide storage capacity (output of 'df -h'):
Filesystem Size Used Available Use% Mounted on
overlay 1.9G 489.5M 1.4G 25% /
tmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
tmpfs 1.9G 0 1.9G 0% /media
none 1.9G 928.0K 1.9G 0% /run
tmpfs 1.9G 0 1.9G 0% /mnt
none 1.9G 928.0K 1.9G 0% /var/run
devtmpfs 1.9G 0 1.9G 0% /host/dev
shm 64.0M 0 64.0M 0% /host/dev/shm
tmpfs 1.9G 489.5M 1.4G 25% /etc/hostname
shm 64.0M 0 64.0M 0% /dev/shm
devtmpfs 1.9G 0 1.9G 0% /dev
shm 64.0M 0 64.0M 0% /dev/shm
overlay 1.9G 1.2G 676.7M 65% /var/lib/docker/overlay2/b1388a705ae818c5993ae98af360d815a22185e1d48c21bab4b64f58bdbaa243/merged
overlay 1.9G 1.2G 676.7M 65% /var/lib/docker/overlay2/1b5eb117e063cd17898b7d990818af49a5a2ffb5a4732107ff8a6d956db7c0c3/merged
shm 64.0M 0 64.0M 0% /var/lib/docker/containers/661c72bedf38d29f4f2e0d9574d61448131c9a395c6c4ec4a1013ec9684a528b/mounts/shm
- (Ultimately grasping at straws…) I then spun up a new Linux VM and created a new Rancher instance on the last version my previous setup was using before it blew up on me (2.5.5). The results of that test were identical to the above.
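On the missing daemon.json mentioned in the list above: as far as I can tell from the RancherOS docs, the Docker daemon there is configured through 'ros config' / cloud-config rather than /etc/docker/daemon.json, so something along these lines is what I would expect to work (I have not verified these exact keys on my nodes, so treat it as a sketch):
sudo ros config export                                         # dump the full cloud-config, including the rancher.docker section
sudo ros config set rancher.docker.storage_driver overlay2     # example of setting a user-docker daemon option
sudo system-docker restart docker                              # restart the user Docker daemon so it picks up the change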
All of the above makes me think it has something to do with a config somewhere. I’m using the same version of Rancher, RancherOS image, credentials to my hosting infrastructure, etc.
Please let me know if there is anything further I can do on my end as far as information or testing goes. I’m at a complete loss for what I am missing!
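In the meantime, here is the kind of environment info I can grab from a node on request (Docker and RancherOS commands as I understand them; let me know which output would be useful):
docker version           # client/daemon versions on the node
sudo ros os version      # RancherOS release the node is running
uname -a                 # kernel details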