state.orch ceph.stage.discovery does not collect HDD info

Hi,

I have four nodes (1x admin, 3x OSD/MON, …) prepared for the Ceph cluster.
For some reason the discovery stage does not create templates for the OSD nodes.

stage.0 completes successfully.
stage.1 completes successfully, but the OSD templates are missing:

[CODE]ls -la profile-default/stack/default/ceph/minions/
total 0
drwxr-xr-x 1 salt salt 0 Mar 13 16:03 .
drwxr-xr-x 1 salt salt 14 Mar 13 16:03 …[/CODE]

[CODE]ls -la profile-default/cluster/
total 0
drwxr-xr-x 1 salt salt 0 Mar 13 16:03 .
drwxr-xr-x 1 salt salt 38 Mar 13 16:03 …[/CODE]

DeepSea monitor output for stage.1:

[CODE]Starting stage: ceph.stage.1
Parsing ceph.stage.1 steps... ✓


Stage initialization output:
salt-api                 : valid
deepsea_minions          : valid
master_minion            : valid
ceph_version             : valid

[1/4]     minions.ready(timeout=300)................................. ✓ (0.4s)

[2/4]     ceph.refresh on
          tw-ceph-admin.............................................. ✓ (0.3s)

[3/4]     populate.proposals......................................... ✓ (5s)

[4/4]     proposal.populate.......................................... ✓ (1s)

Ended stage: ceph.stage.1 succeeded=4/4 time=28.3s[/CODE]

Master can see all minions:
[CODE]# salt-key -L
Accepted Keys:
tw-ceph-admin
tw-ceph-node1
tw-ceph-node2
tw-ceph-node3
Denied Keys:
Unaccepted Keys:
Rejected Keys:[/CODE]

All of the (would-be) OSD nodes have 3 unformatted HDDs each:
[CODE]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    1 136.6G  0 disk 
├─sda1   8:1    1     7M  0 part 
├─sda2   8:2    1     2G  0 part [SWAP]
└─sda3   8:3    1 134.6G  0 part /
sdb      8:16   1 136.6G  0 disk 
sdc      8:32   1 136.6G  0 disk 
sr0     11:0    1  1024M  0 rom  [/CODE]
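
As a quick check of what DeepSea's disk detection actually reports (the discovery proposals should be built from this data), the cephdisks module can be called directly from the master; a minimal sketch, assuming the DeepSea modules have already been synced to the minions by stage.0:

[CODE]# Ask every OSD candidate what DeepSea considers a usable disk;
# an empty reply here means the proposal runner has nothing to work with.
salt 'tw-ceph-node*' cephdisks.list

# Refresh the Salt mine as well (DeepSea reads cephdisks data from it in later stages),
# then re-run discovery.
salt 'tw-ceph-node*' mine.update
salt-run state.orch ceph.stage.1[/CODE]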

Please help me figure out what I am doing wrong.

Additional info:
[CODE]# lsb_release -a
LSB Version:	n/a
Distributor ID:	SUSE
Description:	SUSE Linux Enterprise Server 12 SP3
Release:	12.3
Codename:	n/a[/CODE]

[CODE]zypper se -s --installed-only | grep ses-release
i+ | ses-release    | package | 5-1.54 | x86_64 | SES5
i+ | ses-release    | package | 5-1.54 | x86_64 | SUSE-Enterprise-Storage-5-Pool
i  | ses-release-cd | package | 5-1.54 | x86_64 | SES5[/CODE]

Weird warning:

salt-master[28198]: [WARNING ] Although 'dmidecode' was found in path, the current user cannot execute it. Grains output might not be accurate.
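
That warning just means the salt-master process (running as the salt user) cannot execute dmidecode, so some hardware grains may be incomplete; a quick way to confirm, sketched here assuming dmidecode lives in /usr/sbin:

[CODE]ls -l /usr/sbin/dmidecode
# dmidecode needs root to read the DMI tables, so this is expected to fail as the salt user:
sudo -u salt /usr/sbin/dmidecode -s system-product-name[/CODE]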

Firewall stopped, apparmor disabled.

Best regards,
Serhiy.

Stage 0

[CODE]Starting stage: ceph.stage.0
Parsing ceph.stage.0 steps… ✓

Stage initialization output:
deepsea_minions : valid
master_minion : valid
ceph_version : valid

[1/14] ceph.salt-api on
tw-ceph-admin… ✓ (5s)

[2/14] ceph.sync on
tw-ceph-admin… ✓ (1s)

[3/14] ceph.repo on
tw-ceph-admin… ✓ (0.8s)

[4/14] ceph.updates on
tw-ceph-admin… ✓ (10s)

[5/14] filequeue.remove(item=lock)… ✓ (0.0s)

[6/14] ceph.updates.restart on
tw-ceph-admin… ✓ (2s)

[7/14] filequeue.add(item=complete)… ✓ (0.0s)

[8/14] minions.ready(timeout=300)… ✓ (0.4s)

[9/14] ceph.repo on
tw-ceph-node2… ✓ (0.3s)
tw-ceph-node3… ✓ (0.3s)
tw-ceph-node1… ✓ (0.3s)
tw-ceph-admin… ✓ (0.3s)

[10/14] ceph.packages.common on
tw-ceph-node2… ✓ (2s)
tw-ceph-node3… ✓ (2s)
tw-ceph-node1… ✓ (2s)
tw-ceph-admin… ✓ (3s)

[11/14] ceph.sync on
tw-ceph-node2… ✓ (1.0s)
tw-ceph-node3… ✓ (1s)
tw-ceph-node1… ✓ (1s)
tw-ceph-admin… ✓ (1s)

[12/14] ceph.mines on
tw-ceph-node2… ✓ (2s)
tw-ceph-node3… ✓ (2s)
tw-ceph-node1… ✓ (2s)
tw-ceph-admin… ✓ (2s)

[13/14] ceph.updates on
tw-ceph-node2… ✓ (19s)
tw-ceph-node3… ✓ (16s)
tw-ceph-node1… ✓ (21s)
tw-ceph-admin… ✓ (11s)

[14/14] ceph.updates.restart on
tw-ceph-node2… ✓ (3s)
tw-ceph-node3… ✓ (3s)
tw-ceph-node1… ✓ (3s)
tw-ceph-admin… ✓ (3s)

Ended stage: ceph.stage.0 succeeded=14/14 time=92.7s[/CODE]

It seems the problem is the lack of storage nodes and OSDs.

So I can generate templates with the following command:

salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'

but it generates an empty osds list:

[CODE]# cat profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
ceph:
  storage:
    osds: {}[/CODE]

Could you provide a sample .yml for the storage role?

[QUOTE=vazaari;51528]Seems the problem is the lack of storage nodes and osds.

So I can generate templates with the following command:

salt-run proposal.populate leftovers=True standalone=True target='tw-ceph-node*'

but it generates an empty osds list:

[CODE]# cat profile-default/stack/default/ceph/minions/tw-ceph-node1.yml
ceph:
  storage:
    osds: {}[/CODE]

Could you provide a sample .yml for the storage role?[/QUOTE]

Here is a sample:

ceph:
storage:
osds:
/dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
db_size: 500m
format: bluestore
wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
wal_size: 500m
…and so on for the next OSDs.
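
To find the stable by-id paths to put in such a profile, they can be listed on each node; a trivial sketch:

[CODE]# Map the by-id symlinks back to the kernel device names (sdb, sdc, nvme…),
# skipping the per-partition entries.
ls -l /dev/disk/by-id/ | grep -v part[/CODE]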

Thomas

[CODE]ceph:
  storage:
    osds:
      /dev/disk/by-id/ata-ST4000VN0001-1SF178_Z4F0PS49:
        db: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        db_size: 500m
        format: bluestore
        wal: /dev/disk/by-id/nvme-20000000001000000e4d25c0bf42f4c00
        wal_size: 500m[/CODE]

Reposting with better “layout”

Thomas

Thomas, thanks a lot.

I’ve created the yml with the following content:

[CODE]ceph:
  storage:
    osds:
      /dev/sdb:
        format: bluestore
        standalone: true
      /dev/sdc:
        format: bluestore
        standalone: true[/CODE]
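
Once stage.2 has run, the profile should end up in the node's pillar; a quick sanity check, sketched with the standard Salt pillar module:

[CODE]# The osds dictionary from the profile should be visible here after ceph.stage.2.
salt 'tw-ceph-node1' pillar.get ceph:storage[/CODE]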

stage.2 passed successfully.
stage.3 ends with errors:

Module function osd.deploy threw an exception. Exception: Mine on tw-ceph-node1 for cephdisks.list

The HDDs are still not recognized:

[CODE]# salt 'tw-ceph-node*' cephdisks.list
tw-ceph-node2:
tw-ceph-node3:
tw-ceph-node1:[/CODE]
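
Since the osd.deploy exception above complains about the Salt mine, it may also be worth refreshing and querying the mine directly; a sketch using the standard mine module:

[CODE]# Rebuild the mine data on the OSD candidates…
salt 'tw-ceph-node*' mine.update

# …and read back what the mine currently holds for cephdisks.list
# (mine.get runs on a minion, here the admin node).
salt 'tw-ceph-admin' mine.get 'tw-ceph-node*' cephdisks.list[/CODE]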

The nodes themselves see the disks, e.g.:

[CODE]# hwinfo --disk | egrep 'sdb|sdc'
  SysFS ID: /class/block/sdb
  Device File: /dev/sdb (/dev/sg1)
  Device Files: /dev/sdb, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_4022A45B, /dev/disk/by-id/scsi-25ba4224000d00000, /dev/disk/by-id/scsi-SServeRA_disk1_4022A45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:1:0
  SysFS ID: /class/block/sdc
  Device File: /dev/sdc (/dev/sg2)
  Device Files: /dev/sdc, /dev/disk/by-id/scsi-1ADAPTEC_ARRAY_354EA45B, /dev/disk/by-id/scsi-25ba44e3500d00000, /dev/disk/by-id/scsi-SServeRA_disk2_354EA45B, /dev/disk/by-path/pci-0000:04:00.0-scsi-0:0:2:0[/CODE]

It seems that DeepSea does not like HDDs behind RAID controllers, or at least some of them.
In my case the RAID controller does not support JBOD mode, so I have to create volumes, which the host recognises as ordinary HDDs.

I’ve tried to deploy Ceph with ‘ceph-deploy’ and it also failed to create OSDs…
But after I created one partition on each HDD, the OSDs were created successfully with

# ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX1

After that I played around a bit with destroy, purge, … and got it working with an unpartitioned disk:

  • on the OSD node side:
    zap the disk and reboot the host (see the zap sketch after this list)
    ceph-volume lvm prepare --osd-id {id} --data /dev/sdX

  • on the ceph-deploy node:
    ceph-deploy osd create tw-ceph-nodeY --data /dev/sdX
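
For the zap step, something along these lines should do; a sketch, where /dev/sdX stands for the disk being reused:

[CODE]# Remove any leftover data/filesystem signatures that ceph-volume recognises.
ceph-volume lvm zap /dev/sdX

# Generic fallback: wipe the partition table and signatures by hand.
sgdisk --zap-all /dev/sdX
wipefs --all /dev/sdX[/CODE]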

Now I’m trying to ‘migrate’ from ceph-deploy to DeepSea (as in the ses[3,4] to ses5 case).
I have successfully passed stages 0, 1 and 2, edited /srv/modules/runners/validate.py to bypass the ‘4 nodes’ requirement, and got stuck on stage 3:

[CODE][13/44] ceph.osd.auth on
tw-ceph-admin… ✗ (2s)

Ended stage: ceph.stage.3 succeeded=12/44 failed=1/44 time=85.1s

Failures summary:

ceph.osd.auth (/srv/salt/ceph/osd/auth):
tw-ceph-admin:
auth /srv/salt/ceph/osd/cache/bootstrap.keyring: Command “ceph auth add client.bootstrap-osd -i /srv/salt/ceph/osd/cache/bootstrap.keyring” run
stdout:
stderr: Error EINVAL: entity client.bootstrap-osd exists but caps do not match[/CODE]

The following trick helps to solve the ‘caps do not match’ error and pass stage 3:

# ceph auth caps client.bootstrap-osd mgr "allow r" mon "allow profile bootstrap-osd"
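
To see whether the caps now match what DeepSea wants to import, the stored entity can be compared against the cached keyring from the error message; a small sketch:

[CODE]# Caps currently stored in the cluster for the bootstrap-osd entity…
ceph auth get client.bootstrap-osd

# …versus the keyring that ceph.osd.auth tries to import.
cat /srv/salt/ceph/osd/cache/bootstrap.keyring[/CODE]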

Closer and closer…
Faced the igw issue with stage 4:

[CODE][6/16] ceph.igw on
tw-ceph-node1… ✗ (4s)

Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s

Failures summary:

ceph.igw (/srv/salt/ceph/igw):
tw-ceph-node1:
reload lrbd: Module function service.restart executed
enable lrbd: Service lrbd has been enabled, and is dead
[/CODE]
Not yet solved.
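
"Enabled, and is dead" only tells us that systemd could not keep lrbd running; the actual failure reason should be in the unit's status and journal on the gateway node. A minimal sketch of what to look at:

[CODE]# On tw-ceph-node1: why did lrbd exit?
systemctl status lrbd
journalctl -u lrbd --no-pager -n 50[/CODE]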

[QUOTE=vazaari;51561]Closer and closer…
Faced the igw issue with stage 4:

[CODE][6/16] ceph.igw on
tw-ceph-node1… ✗ (4s)

Ended stage: ceph.stage.4 succeeded=15/16 failed=1/16 time=547.9s

Failures summary:

ceph.igw (/srv/salt/ceph/igw):
tw-ceph-node1:
reload lrbd: Module function service.restart executed
enable lrbd: Service lrbd has been enabled, and is dead
[/CODE]
Not yet solved.[/QUOTE]

Maybe this is what you encountered: https://www.novell.com/support/kb/doc.php?id=7018668

Thomas

[QUOTE=thsundel;51566]Maybe this is what you encountered: https://www.novell.com/support/kb/doc.php?id=7018668

Thomas[/QUOTE]

I’ve recreated the RBD images & iSCSI targets from openATTIC.