Hi.
I’m in charge of setting up a test Ceph cluster at our institution. The hardware it has to fit on consists of six single-HDD computers (adding or moving disks around is absolutely not an option).
I therefore partitioned each disk to hold the operating system (openSUSE Leap 15) plus a free (unmounted) partition for testing Ceph, as follows:
#######
# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 465,8G  0 disk
├─sda1   8:1    0   500M  0 part /boot/efi
├─sda2   8:2    0    16G  0 part [SWAP]
├─sda3   8:3    0  49,3G  0 part /
└─sda4   8:4    0   400G  0 part
sr0     11:0    1   3,7G  0 rom
#######
Our intention is for the sda4 partition on each machine to be used by the Ceph cluster, so that the data is spread across the six machines, accessible from all of them, and stored with some level of redundancy (e.g. losing two computers should still keep the data available).
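To make that redundancy goal a bit more concrete: what I have in mind (not run yet; the pool name "test_pool" and the PG count of 128 are just placeholders) is a replicated pool with three copies spread across the hosts, roughly:
#######
# ceph osd pool create test_pool 128
# ceph osd pool set test_pool size 3
# ceph osd pool set test_pool min_size 2
#######
As far as I understand, with size=3 and min_size=2 a placement group that loses two of its copies at once pauses I/O until recovery, so we might end up using size=4 if staying fully available through a double host failure turns out to matter.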
Among the difficulties I faced in the different stages of the deployment, I hit the same issue with empty profile-default/stack/default/ceph/minions/node*.yml files that vazaari reported on https://forums.suse.com/showthread.php?11788-state-orch-ceph-stage-discovery-does-not-collect-HDD-info, and I solved it in a similar fashion with hand-edited files:
#######
# salt -I 'roles:storage' pillar.get ceph
node02:
    ----------
    storage:
        ----------
        osds:
            ----------
            /dev/sda4:
                ----------
                format:
                    bluestore
                standalone:
                    True
(and so on for all 6 machines)
#######
Please note that the device entry points straight to the sda4 partition, rather than to a whole device such as sdb as in vazaari’s thread.
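For completeness, each hand-edited profile-default/stack/default/ceph/minions/node*.yml now looks roughly like this (written out here from the pillar output above, so the exact indentation may not match my files character for character):
#######
ceph:
  storage:
    osds:
      /dev/sda4:
        format: bluestore
        standalone: True
#######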
After this, everything goes smoothly until I run:
#######
# salt-run state.orch ceph.stage.deploy
(...)
[14/71] ceph.sysctl on
node01....................................... ✓ (0.5s)
node02........................................ ✓ (0.7s)
node03....................................... ✓ (0.6s)
node04......................................... ✓ (0.5s)
node05....................................... ✓ (0.6s)
node06.......................................... ✓ (0.5s)
[15/71] ceph.osd on
node01...................................... ✗ (0.7s)
node02........................................ ✗ (0.7s)
node03....................................... ✗ (0.7s)
node04......................................... ✗ (0.6s)
node05....................................... ✗ (0.6s)
node06.......................................... ✗ (0.7s)
Ended stage: ceph.stage.deploy succeeded=14/71 failed=1/71 time=624.7s
Failures summary:
ceph.osd (/srv/salt/ceph/osd):
node02:
deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node02 for cephdisks.list
node03:
deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node03 for cephdisks.list
node01:
deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node01 for cephdisks.list
node04:
deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node04 for cephdisks.list
node05:
deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node05 for cephdisks.list
node06:
deploy OSDs: Module function osd.deploy threw an exception. Exception: Mine on node06 for cephdisks.list
#######
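In case it helps narrow things down: would it make sense to refresh and inspect the Salt mine directly with something like the commands below? I am only guessing from the Salt documentation here (and from the cephdisks.list name in the error message), so please tell me if these are not the right calls:
#######
# salt '*' mine.update
# salt -I 'roles:storage' mine.get '*' cephdisks.list
# salt -I 'roles:storage' cephdisks.list
#######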
This is where I got stuck. For additional information, here is our “policy.cfg” file:
#########
cluster-ceph/cluster/*.sls
profile-default/cluster/*.sls
profile-default/stack/default/ceph/minions/*yml
config/stack/default/global.yml
config/stack/default/ceph/cluster.yml
role-master/cluster/node01.sls
role-admin/cluster/*.sls
role-mon/cluster/*.sls
role-mgr/cluster/*.sls
role-mds/cluster/*.sls
role-ganesha/cluster/*.sls
role-client-nfs/cluster/*.sls
role-client-cephfs/cluster/*.sls
##########
Can anybody help me with this issue? Where, most probably, am I messing things up? What should I look for? Should I have left the space unpartitioned instead of creating sda4, so that Ceph could find it by itself? Or is this setup simply not feasible with openSUSE Leap + Ceph?
Thanks a lot in advance for any help provided!
Sincerely yours,
Jones