state.orch ceph.stage.discovery does not collect HDD info

Hi,

I have four nodes (1x admin, 3x OSD/MON, …) prepared for the Ceph cluster.
For some reason the discovery stage does not create templates for the OSD nodes.

stage.0 completes successfully.
stage.1 completes successfully, but the OSD templates are missing:

[CODE]ls -la profile-default/stack/default/ceph/minions/
total 0
drwxr-xr-x 1 salt salt 0 Mar 13 16:03 .
drwxr-xr-x 1 salt salt 14 Mar 13 16:03 …[/CODE]

[CODE]ls -la profile-default/cluster/
total 0
drwxr-xr-x 1 salt salt 0 Mar 13 16:03 .
drwxr-xr-x 1 salt salt 38 Mar 13 16:03 …[/CODE]

DeepSea monitor output for stage.1:

[CODE]Starting stage: ceph.stage.1
Parsing ceph.stage.1 steps... ✓


Stage initialization output:
salt-api                 : valid
deepsea_minions          : valid
master_minion            : valid
ceph_version             : valid

[1/4]     minions.ready(timeout=300)................................. ✓ (0.4s)

[2/4]     ceph.refresh on
          tw-ceph-admin.............................................. ✓ (0.3s)

[3/4]     populate.proposals......................................... ✓ (5s)

[4/4]     proposal.populate.......................................... ✓ (1s)

Ended stage: ceph.stage.1 succeeded=4/4 time=28.3s[/CODE]

Master can see all minions:
[CODE]# salt-key -L
Accepted Keys:
tw-ceph-admin
tw-ceph-node1
tw-ceph-node2
tw-ceph-node3
Denied Keys:
Unaccepted Keys:
Rejected Keys:[/CODE]

All of the (would-be) OSD nodes have 3 unformatted HDDs each:
[CODE]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    1 136.6G  0 disk 
├─sda1   8:1    1     7M  0 part 
├─sda2   8:2    1     2G  0 part [SWAP]
└─sda3   8:3    1 134.6G  0 part /
sdb      8:16   1 136.6G  0 disk 
sdc      8:32   1 136.6G  0 disk 
sr0     11:0    1  1024M  0 rom  [/CODE]
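
As a quick check of what DeepSea's disk detection actually reports (the discovery proposals should be built from this data), the cephdisks module can be called directly from the master; a minimal sketch, assuming the DeepSea modules have already been synced to the minions by stage.0:

[CODE]# Ask every OSD candidate what DeepSea considers a usable disk;
# an empty reply here means the proposal runner has nothing to work with.
salt 'tw-ceph-node*' cephdisks.list

# Refresh the Salt mine as well (DeepSea reads cephdisks data from it in later stages),
# then re-run discovery.
salt 'tw-ceph-node*' mine.update
salt-run state.orch ceph.stage.1[/CODE]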

Please help me figure out what I am doing wrong.

Additional info:
[CODE]# lsb_release -a
LSB Version:	n/a
Distributor ID:	SUSE
Description:	SUSE Linux Enterprise Server 12 SP3
Release:	12.3
Codename:	n/a[/CODE]

[CODE]zypper se -s --installed-only | grep ses-release
i+ | ses-release    | package | 5-1.54 | x86_64 | SES5
i+ | ses-release    | package | 5-1.54 | x86_64 | SUSE-Enterprise-Storage-5-Pool
i  | ses-release-cd | package | 5-1.54 | x86_64 | SES5[/CODE]

Weird warning:

salt-master[28198]: [WARNING ] Although 'dmidecode' was found in path, the current user cannot execute it. Grains output might not be accurate.
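
That warning just means the salt-master process (running as the salt user) cannot execute dmidecode, so some hardware grains may be incomplete; a quick way to confirm, sketched here assuming dmidecode lives in /usr/sbin:

[CODE]ls -l /usr/sbin/dmidecode
# dmidecode needs root to read the DMI tables, so this is expected to fail as the salt user:
sudo -u salt /usr/sbin/dmidecode -s system-product-name[/CODE]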

Firewall stopped, apparmor disabled.

Best regards,
Serhiy.

Stage 0

[CODE]Starting stage: ceph.stage.0
Parsing ceph.stage.0 steps… ✓

Stage initialization output:
deepsea_minions : valid
master_minion : valid
ceph_version : valid

[1/14] ceph.salt-api on
tw-ceph-admin… ✓ (5s)

[2/14] ceph.sync on
tw-ceph-admin… ✓ (1s)

[3/14] ceph.repo on
tw-ceph-admin… ✓ (0.8s)

[4/14] ceph.updates on
tw-ceph-admin… ✓ (10s)

[5/14] filequeue.remove(item=lock)… ✓ (0.0s)

[6/14] ceph.updates.restart on
tw-ceph-admin… ✓ (2s)

[7/14] filequeue.add(item=complete)… ✓ (0.0s)

[8/14] minions.ready(timeout=300)… ✓ (0.4s)

[9/14] ceph.repo on
tw-ceph-node2… ✓ (0.3s)
tw-ceph-node3… ✓ (0.3s)
tw-ceph-node1… ✓ (0.3s)
tw-ceph-admin… ✓ (0.3s)

[10/14] ceph.packages.common on
tw-ceph-node2… ✓ (2s)
tw-ceph-node3… ✓ (2s)
tw-ceph-node1… ✓ (2s)
tw-ceph-admin… ✓ (3s)

[11/14] ceph.sync on
tw-ceph-node2… ✓ (1.0s)
tw-ceph-node3… ✓ (1s)
tw-ceph-node1… ✓ (1s)
tw-ceph-admin… ✓ (1s)

[12/14] ceph.mines on
tw-ceph-node2… ✓ (2s)
tw-ceph-node3… ✓ (2s)
tw-ceph-node1… ✓ (2s)
tw-ceph-admin… ✓ (2s)

[13/14] ceph.updates on
tw-ceph-node2… ✓ (19s)
tw-ceph-node3… ✓ (16s)
tw-ceph-node1… ✓ (21s)
tw-ceph-admin… ✓ (11s)

[14/14] ceph.updates.restart on
tw-ceph-node2… ✓ (3s)
tw-ceph-node3… ✓ (3s)
tw-ceph-node1… ✓ (3s)
tw-ceph-admin… ✓ (3s)

Ended stage: ceph.stage.0 succeeded=14/14 time=92.7s[/CODE]

It seems the problem is the lack of storage nodes and OSDs.

So I can generate templates with the following command:

salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'

but it generates an empty osds list:

[CODE]# cat profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
ceph:
  storage:
    osds: {}[/CODE]

Could you provide a sample .yml for the storage role?

[QUOTE=vazaari;51528]Seems the problem is the lack of storage nodes and osds.

So I can generate templates with the following command:

salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'

but it generates an empty osds list:

[CODE]# cat profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
ceph:
  storage:
    osds: {}[/CODE]

Could you provide a sample .yml for the storage role?[/QUOTE]

Here is a sample:

ceph:
storage:
osds:
/dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
db_size: 500m
format: bluestore
wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
wal_size: 500m
…and so on for the next OSDs.
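
To find the stable by-id paths to put in such a profile, they can be listed on each node; a trivial sketch:

[CODE]# Map the by-id symlinks back to the kernel device names (sdb, sdc, nvme…),
# skipping the per-partition entries.
ls -l /dev/disk/by-id/ | grep -v part[/CODE]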

Thomas

[CODE]ceph:
  storage:
    osds:
      /dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
        db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        db_size: 500m
        format: bluestore
        wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        wal_size: 500m[/CODE]

Reposting with better “layout”

Thomas

Thomas, thanks a lot.

I’ve created the yml with the following content:

[CODE]ceph:
  storage:
    osds:
      /dev/sdb:
        format: bluestore
        standalone: true
      /dev/sdc:
        format: bluestore
        standalone: true[/CODE]
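
Once stage.2 has run, the profile should end up in the node's pillar; a quick sanity check, sketched with the standard Salt pillar module:

[CODE]# The osds dictionary from the profile should be visible here after ceph.stage.2.
salt 'tw-ceph-node1' pillar.get ceph:storage[/CODE]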

stage.2 passed successfully.
stage.3 ends with errors:

Module function osd.deploy threw an exception. Exception: Mine on tw-ceph-node1 for cephdisks.list

The HDDs are still not recognized:

[CODE]# salt 'tw-ceph-node*' cephdisks.list
tw-ceph-node2:
tw-ceph-node3:
tw-ceph-node1:[/CODE]
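
Since the osd.deploy exception above complains about the Salt mine, it may also be worth refreshing and querying the mine directly; a sketch using the standard mine module:

[CODE]# Rebuild the mine data on the OSD candidates…
salt 'tw-ceph-node*' mine.update

# …and read back what the mine currently holds for cephdisks.list
# (mine.get runs on a minion, here the admin node).
salt 'tw-ceph-admin' mine.get 'tw-ceph-node*' cephdisks.list[/CODE]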

The nodes themselves see the disks, e.g.:

[CODE]# hwinfo --disk | egrep 'sdb|sdc'
  SysFS ID: /class/block/sdb
  Device File: /dev/sdb (/dev/sg1)
  Device Files: /dev/sdb, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_4022A45B, /dev/disk/by-id/scsi-25ba4224000d00000, /dev/disk/by-id/scsi-SServeRA_disk1_4022A45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:1:0
  SysFS ID: /class/block/sdc
  Device File: /dev/sdc (/dev/sg2)
  Device Files: /dev/sdc, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_354EA45B, /dev/disk/by-id/scsi-25ba44e3500d00000, /dev/disk/by-id/scsi-SServeRA_disk2_354EA45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:2:0[/CODE]

It seems that DeepSea does not like HDDs behind RAID controllers, or at least some of them.
In my case the RAID controller does not support JBOD mode, so I have to create volumes, which the host recognises as ordinary HDDs.

I’ve tried to deploy Ceph with ‘ceph-deploy’ and it also failed to create OSDs…
But after I created one partition on each HDD, the OSDs were created successfully with

# ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX1

After that I played around a bit with destroy, purge, … and got it working with an unpartitioned disk:

  • on the OSD node side:
    zap the disk and reboot the host (see the zap sketch after this list)
    ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

  • on the ceph-deploy node:
    ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX
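
For the zap step, something along these lines should do; a sketch, where /dev/sdX stands for the disk being reused:

[CODE]# Remove any leftover data/filesystem signatures that ceph-volume recognises.
ceph-volume lvm zap /dev/sdX

# Generic fallback: wipe the partition table and signatures by hand.
sgdisk --zap-all /dev/sdX
wipefs --all /dev/sdX[/CODE]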

Now I’m trying to ‘migrate’ from ceph-deploy to DeepSea (as in the ses[3,4] to ses5 case).
I have successfully passed stages 0, 1 and 2, edited /srv/modules/runners/validate.py to bypass the ‘4 nodes’ requirement, and got stuck on stage 3:

[CODE][13/44] ceph.osd.auth on
tw-ceph-admin… ✗ (2s)

Ended stage: ceph.stage.3 succeeded=12/44 failed=1/44 time=85.1s

Failures summary:

ceph.osd.auth (/srv/salt/ceph/osd/auth):
tw-ceph-admin:
auth /srv/salt/ceph/osd/cache/bootstrap.keyring: Command “ceph auth add client.bootstrap-osd -i /srv/salt/ceph/osd/cache/bootstrap.keyring” run
stdout:
stderr: Error EINVAL: entity client.bootstrap-osd exists but caps do not match[/CODE]

The following trick helps to solve the ‘caps do not match’ error and pass stage 3:

# ceph auth caps client.bootstrap-osd mgr "allow r" mon "allow profile bootstrap-osd"
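
To see whether the caps now match what DeepSea wants to import, the stored entity can be compared against the cached keyring from the error message; a small sketch:

[CODE]# Caps currently stored in the cluster for the bootstrap-osd entity…
ceph auth get client.bootstrap-osd

# …versus the keyring that ceph.osd.auth tries to import.
cat /srv/salt/ceph/osd/cache/bootstrap.keyring[/CODE]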

Closer and closer…
Faced the igw issue with stage 4:

[CODE][6/16] ceph.igw on
tw-ceph-node1… ✗ (4s)

Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s

Failures summary:

ceph.igw (/srv/salt/ceph/igw):
tw-ceph-node1:
reload lrbd: Module function service.restart executed
enable lrbd: Service lrbd has been enabled, and is dead
[/CODE]
Not yet solved.
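
"Enabled, and is dead" only tells us that systemd could not keep lrbd running; the actual failure reason should be in the unit's status and journal on the gateway node. A minimal sketch of what to look at:

[CODE]# On tw-ceph-node1: why did lrbd exit?
systemctl status lrbd
journalctl -u lrbd --no-pager -n 50[/CODE]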

[QUOTE=vazaari;51561]Closer and closer…
Faced the igw issue with stage 4:

[CODE][6/16] ceph.igw on
tw-ceph-node1… ✗ (4s)

Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s

Failures summary:

ceph.igw (/srv/salt/ceph/igw):
tw-ceph-node1:
reload lrbd: Module function service.restart executed
enable lrbd: Service lrbd has been enabled, and is dead
[/CODE]
Not yet solved.[/QUOTE]

Maybe this is what you encountered: https://www.novell.com/support/kb/doc.php?id=7018668

Thomas

[QUOTE=thsundel;51566]Maybe this is what you encountered: https://www.novell.com/support/kb/doc.php?id=7018668

Thomas[/QUOTE]

I’ve recreated the RBD images & iSCSI targets from openATTIC.