SES 4 issues

OK, let’s investigate and have a play with a 3-node cluster, plus a monitoring node. Documentation is here.
CephFS is officially production-ready.
SES 4 is still based on the 10.x.x Ceph core, not the current “Kraken” (11.x.x), which claims stress-test stability for the BlueStore ‘engine’ (a detailed presentation is here).
The Calamari management project looks dead, as it is stuck on the 2014.7 version of Salt, which is completely outdated. The OpenATTIC package came instead. It installs easily, operates fine and looks nice - great choice, SUSE! Hope it lasts for a long time.

The first issue I found is coupled with CephFS (the preliminary steps work like a charm). I tried to mount CephFS via the kernel client during boot by putting the corresponding line into /etc/fstab.
I got the emergency-mode prompt because of a “libceph -101 connect error”. The ifconfig command shows no external interfaces at this stage. Commenting the line out, booting the system, then uncommenting the line and mounting manually - success.
Thus, the possibilities are:

  • The network mount via libceph is improperly marked as vital for boot, which causes the system to interrupt the boot process.
  • The system improperly tries to perform the mount before the NIC drivers are loaded, which causes the fault.
  • The system improperly refuses to skip the failed network mount. Instead, it should ignore it and retry the mount after the hardware and the network layer are completely initialized.
    Something seems to be wrong with libceph or the SLES 12 SP2 kernel.
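One way to confirm the ordering problem after the fact (assuming a persistent systemd journal; the unit name is my guess, derived from the /mnt/cephfs mount point) is to look at the previous boot's log:

[CODE]# messages from the previous boot around the CephFS mount attempt
journalctl -b -1 -u mnt-cephfs.mount
journalctl -b -1 | grep -i libceph[/CODE]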

I found a workaround. The following modified line in /etc/fstab works fine:

172.18.66.61:6789,172.18.66.62,172.18.66.63:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/secret.key,[B]auto,x-systemd.automount[/B],noatime 0 [B]0[/B]
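To see how systemd-fstab-generator translated the entry, something like the following can be checked (the unit names are my assumption, derived from the /mnt/cephfs mount point):

[CODE]# show the generated units and what pulls the mount in during boot
systemctl cat mnt-cephfs.automount mnt-cephfs.mount
systemctl list-dependencies --reverse mnt-cephfs.mount[/CODE]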

Possibly, the SES 4 documentation should be revised.

Hi,

it seems that systemd (systemd-fstab-generator) is not yet able to recognize CephFS as a network file system. So instead of creating an automount (which I believe may cause errors if accessed during the early boot stage, before networking is up), you may want to add the mount option

x-systemd.requires=network-online.target

See “man systemd.mount”, sections on automatic dependencies and fstab entries. YMMV, I currently have no test bed to try before posting.
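For illustration (untested), the fstab entry from your first post might then look like:

[CODE]172.18.66.61:6789,172.18.66.62,172.18.66.63:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/secret.key,x-systemd.requires=network-online.target,noatime 0 0[/CODE]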

Regards,
J

Greetings!

Tried it. No luck: the “x-systemd.requires=network-online.target” option causes the OS to boot without external network interfaces; only the loopback interface is visible.
However, the “requires-mounts-for=…” option looks helpful for clusters with multiple-protocol access to Ceph data (CIFS + NFS).
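As a sketch of what I mean (the drop-in path and unit name are assumptions, not something I have tested yet): a gateway service that re-exports the CephFS mount could be ordered after the mount like this,

[CODE]# /etc/systemd/system/nfs-server.service.d/cephfs.conf  (assumed drop-in)
[Unit]
RequiresMountsFor=/mnt/cephfs[/CODE]

or, for fstab-defined dependents, via the x-systemd.requires-mounts-for=/mnt/cephfs mount option.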

Have a nice weekend!

Another issue found. Steps I took:

  • Created a ~176 GiB filesystem on the SES cluster;
  • Mounted it at the /mnt/cephfs mount point;
  • Made the /mnt/cephfs/NFS directory;
  • Exported /mnt/cephfs/NFS via the SES 4 NFS server functionality;
  • Added the export as a VMware datastore;
  • Deployed a VM with a virtual drive (/dev/sdb) on the NFS-mounted datastore;
  • Loaded /dev/sdb with fio-generated I/O (a sample job is sketched after this list).
  • After a night of load the cluster health went to ERROR because the OSDs filled up (>90%) and the filesystem got stuck on write operations. OK.
    [CODE]ceph02admin:/mnt/cephfs/NFS # ceph -s
    cluster bd8aa69c-f316-4aa6-9128-6225c80024f6
     health HEALTH_ERR
            30 pgs backfill_toofull
            30 pgs stuck unclean
            recovery 862/49712 objects degraded (1.734%)
            recovery 13382/49712 objects misplaced (26.919%)
            1 full osd(s)
            2 near full osd(s)
            nearfull,full,sortbitwise,require_jewel_osds flag(s) set
     monmap e1: 3 mons at {ceph02node01=172.18.66.61:6789/0,ceph02node02=172.18.66.62:6789/0,ceph02node03=172.18.66.63:6789/0}
            election epoch 10, quorum 0,1,2 ceph02node01,ceph02node02,ceph02node03
      fsmap e11: 1/1/1 up {0=fsgw01=up:active}
     osdmap e124: 3 osds: 3 up, 3 in; 30 remapped pgs
            flags nearfull,full,sortbitwise,require_jewel_osds
      pgmap v169762: 250 pgs, 13 pools, 84846 MB data, 21423 objects
            165 GB used, 11207 MB / 176 GB avail
            862/49712 objects degraded (1.734%)
            13382/49712 objects misplaced (26.919%)
                 220 active+clean
                  30 active+remapped+backfill_toofull
  client io 1125 B/s rd, 1 op/s rd, 0 op/s wr[/CODE]
  • Stopped the VM and deleted the virtual drive from the datastore. The OSDs remained full, as the filesystem space was not released. Not OK.
    [CODE]ceph02admin:/mnt/cephfs/NFS # ceph df detail
GLOBAL:
    SIZE     AVAIL      RAW USED     %RAW USED     OBJECTS
    176G     11207M     165G         93.81         21423
POOLS:
    NAME                        ID     CATEGORY     QUOTA OBJECTS     QUOTA BYTES     USED       %USED     MAX AVAIL     OBJECTS     DIRTY     READ      WRITE     RAW USED
    rbd                         0      -            N/A               N/A             0          0         3010M         0           0         0         0         0
    iscsipool                   3      -            N/A               N/A             5174       0         4516M         4           4         2299k     25        10348
    .rgw.root                   4      -            N/A               N/A             1588       0         3010M         4           4         204       4         4764
    default.rgw.control         5      -            N/A               N/A             0          0         3010M         8           8         0         0         0
    default.rgw.data.root       6      -            N/A               N/A             0          0         3010M         0           0         0         0         0
    default.rgw.gc              7      -            N/A               N/A             0          0         3010M         32          32        47710     31808     0
    default.rgw.log             8      -            N/A               N/A             0          0         3010M         127         127       1105k     736k      0
    default.rgw.users.uid       9      -            N/A               N/A             551        0         3010M         1           1         1         5         1653
    default.rgw.users.email     10     -            N/A               N/A             14         0         3010M         1           1         0         1         42
    default.rgw.users.keys      11     -            N/A               N/A             14         0         3010M         1           1         0         1         42
    default.rgw.users.swift     12     -            N/A               N/A             14         0         3010M         1           1         0         1         42
    cephfs_data                 13     -            N/A               N/A             84804M     94.86     4516M         21205       21205     651       821k      162G
    cephfs_metadata             14     -            N/A               N/A             42965k     0.92      4516M         39          39        83513     275k      85930k[/CODE]
  • Found a lot of strays in the statistics. Not OK.
    [CODE]ceph02node01:~ # ceph daemon mds.fsgw01 perf dump | grep stray
        "num_strays": 12913,
        "num_strays_purging": 0,
        "num_strays_delayed": 0,
        "strays_created": 19913,
        "strays_purged": 7000,
        "strays_reintegrated": 0,
        "strays_migrated": 0[/CODE]
  • Tried to clean up the filesystem data by flushing the MDS journal.
    [CODE]ceph02node01:~ # ceph daemon mds.fsgw01 flush journal
{
    "message": "",
    "return_code": 0
}[/CODE]
  • The Ceph cluster came back to the OK state.
    [CODE]ceph02node01:~ # ceph -s
    cluster bd8aa69c-f316-4aa6-9128-6225c80024f6
     health HEALTH_OK
     monmap e1: 3 mons at {ceph02node01=172.18.66.61:6789/0,ceph02node02=172.18.66.62:6789/0,ceph02node03=172.18.66.63:6789/0}
            election epoch 10, quorum 0,1,2 ceph02node01,ceph02node02,ceph02node03
      fsmap e11: 1/1/1 up {0=fsgw01=up:active}
     osdmap e158: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v170084: 250 pgs, 13 pools, 2039 kB data, 210 objects
            199 MB used, 176 GB / 176 GB avail
                 250 active+clean
  client io 1015 B/s rd, 2 op/s rd, 0 op/s wr[/CODE]
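The fio load mentioned in the list above was roughly along these lines (I do not have the exact job at hand any more, so treat it as an approximate sketch, not the literal command):

[CODE]# approximate fio job that hammered the VM's second disk overnight
fio --name=cephfs-stress --filename=/dev/sdb --rw=randwrite --bs=64k \
    --iodepth=16 --numjobs=4 --direct=1 \
    --time_based --runtime=43200 --group_reporting[/CODE]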

The question remained:

  • What should be tweaked, and how, to make the SES filesystem purge all released data immediately?
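As far as I understand, stray purging on a Jewel-based MDS is throttled by options such as mds_max_purge_files and mds_max_purge_ops; the values below are only an experiment of mine, not a SUSE recommendation:

[CODE]# raise the purge throttles on the active MDS (assumed option names for the Jewel code base)
ceph tell mds.fsgw01 injectargs '--mds_max_purge_files 256 --mds_max_purge_ops 32768'
# then watch the stray counters drain
ceph daemon mds.fsgw01 perf dump | grep stray[/CODE]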

To say more:
I just loaded the chain “VMDK → VMware Datastore → NFS → CephFS” using fio on a single VM for 3 days.
What is THAT, how does IT happen, and how do I prevent THAT in the future? :cool:
The SES cluster is still in OK health; it still has 3 nodes with 3 monitors and 3 OSDs of ~64 GB each.

The SUSE-customized OpenATTIC does not allow CRUSH map and ruleset changes. Yes, the SES 4 online manual says view mode only, but the upstream software is much more functional.
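Until the GUI allows editing, the usual CLI round trip still works (this is the standard Ceph workflow, nothing SES-specific assumed):

[CODE]# decompile, edit and re-inject the CRUSH map from an admin node
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# ... edit crushmap.txt (buckets, rulesets) ...
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new[/CODE]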

Two more issues, which are possibly coupled:

  • The SUSE-customized OpenATTIC does not display “Nodes” information (“No matching records found”).
  • The OpenATTIC log file has a lot of error messages about the Ceph keyring, Salt and the performance data file.

[CODE]2017-02-24 15:21:24,421 - INFO - openattic_systemd#loggedfunc - Calling /ceph_deployment::invoke_salt_key(dbus.Array([dbus.String(u'-L')], signature=dbus.Signature('s')))
2017-02-24 15:21:25,119 - ERROR - ceph.models#set_performance_data_options - Set performance_data_options failed: XML file '/var/lib/pnp4nagios/perfdata/ceph02admin/Check_CephRbd_bd8aa69c-f316-4aa6-9128-6225c80024f6_rbd_iscsi.xml' could not be found.
2017-02-24 15:22:30,870 - ERROR - ceph.librados#__init__ - No usable keyring
2017-02-24 15:22:31,086 - ERROR - ceph.librados#__init__ - No usable keyring
2017-02-24 15:22:37,875 - ERROR - ceph.librados#__init__ - No usable keyring[/CODE]
An OpenATTIC reinstall/re-initialization was in vain, even though it reports success:

Database openattic exists, owned by openattic
Creating tables ...
Installing custom SQL ...
Installing indexes ...
Installed 70 object(s) from 2 fixture(s)
We have an admin already, not creating default user.
Found lo
Found eth0
Found eth1
The authentication token for 'openattic' does already exist.
checking disk /dev/sda serial not found
Checking Ceph cluster ceph (bd8aa69c-f316-4aa6-9128-6225c80024f6)... [U]known[/U]
Checking Ceph OSD 0... known
Checking Ceph OSD 1... known
Checking Ceph OSD 2... known
Checking Ceph pool rbd... known
Checking Ceph pool iscsipool... known
Checking Ceph pool .rgw.root... known
Checking Ceph pool default.rgw.control... known
Checking Ceph pool default.rgw.data.root... known
Checking Ceph pool default.rgw.gc... known
Checking Ceph pool default.rgw.log... known
Checking Ceph pool default.rgw.users.uid... known
Checking Ceph pool default.rgw.users.email... known
Checking Ceph pool default.rgw.users.keys... known
Checking Ceph pool default.rgw.users.swift... known
Checking Ceph pool cephfs_data... known
Checking Ceph pool cephfs_metadata... known
Checking Ceph mds fsgw01... skipped
Checking Ceph mon ceph02node01... skipped
Checking Ceph mon ceph02node02... skipped
Checking Ceph mon ceph02node03... skipped
Checking Ceph auth entity mds.fsgw01... found
Checking Ceph auth entity osd.0... found
Checking Ceph auth entity osd.1... found
Checking Ceph auth entity osd.2... found
Checking Ceph auth entity client.admin... [U]found[/U]
Checking Ceph auth entity client.bootstrap-mds... [U]found[/U]
Checking Ceph auth entity client.bootstrap-osd... [U]found[/U]
Checking Ceph auth entity client.bootstrap-rgw... [U]found[/U]
Checking Ceph auth entity client.rgw.rgw01... [U]found[/U]
Checking Ceph auth entity client.rgw.rgw02... [U]found[/U]
Checking Ceph auth entity client.rgw.rgw03... [U]found[/U]
Updating Nagios configs: adding detected Ceph clusters
/etc/openattic/cli.conf already exists
Completed successfully.
I tried to set the group and access rights for the keyring files (as mentioned in the OpenATTIC documentation). In vain as well.

ceph02admin:/etc/ceph # ls -la
total 256
drwxr-xr-x 1 root root 360 Feb 6 18:02 .
drwxr-xr-x 1 root root 5576 Feb 24 14:12 ..
-rw-r--r-- 1 cephadm root 1077 Feb 3 18:13 .cephdeploy.conf
-rw-rw-rw- 1 cephadm users 221326 Feb 24 13:59 ceph-deploy-ceph.log
-rw-r----- 1 cephadm [U]openattic[/U] 71 Feb 3 18:43 [U]ceph.bootstrap-mds.keyring[/U]
-rw-r----- 1 cephadm [U]openattic[/U] 71 Feb 3 18:43 [U]ceph.bootstrap-osd.keyring[/U]
-rw-r----- 1 cephadm [U]openattic[/U] 71 Feb 3 18:43 [U]ceph.bootstrap-rgw.keyring[/U]
-rw-r----- 1 cephadm [U]openattic[/U] 63 Feb 3 18:43 [U]ceph.client.admin.keyring[/U]
-rw-r----- 1 cephadm openattic 632 Feb 6 13:08 ceph.conf
-rw-r----- 1 cephadm [U]openattic[/U] 73 Feb 3 18:20 [U]ceph.mon.keyring[/U]
-rwxr-xr-x 1 root root 92 Dec 9 16:28 rbdmap
-rw-r----- 1 root root 42 Feb 6 18:02 secret.key
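A quick way to check whether the openattic system user can actually use that keyring (my own test, not something from the OpenATTIC documentation) is to run a cluster command under that account:

[CODE]# does librados work when run as the openattic user?
sudo -u openattic ceph -n client.admin --keyring /etc/ceph/ceph.client.admin.keyring -s[/CODE]

If this also complains about the keyring, it is a file-permission problem; if it works, the problem sits somewhere in the OpenATTIC/Salt layer.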
However, the cluster is fresh and simple: 3x MON/OSD/RGW nodes plus an admin node with OpenATTIC. The ceph.conf is also simple, and the cluster ID is the same as mentioned above:

[CODE][global]
fsid = bd8aa69c-f316-4aa6-9128-6225c80024f6
public_network = 172.18.64.0/22
cluster_network = 192.168.124.0/24
mon_initial_members = ceph02node01, ceph02node02, ceph02node03
mon_host = 172.18.66.61,172.18.66.62,172.18.66.63
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

[client.rgw.rgw01]
host = ceph02node01
rgw_dns_name = ceph02node01
rgw_frontends = civetweb port=7480

[client.rgw.rgw02]
host = ceph02node02
rgw_dns_name = ceph02node02
rgw_frontends = civetweb port=7480

[client.rgw.rgw03]
host = ceph02node03
rgw_dns_name = ceph02node03
rgw_frontends = civetweb port=7480[/CODE]
The keyring file has the default user registered:

[CODE]ceph02admin:/etc/ceph # ceph-authtool -l ./ceph.client.admin.keyring
[client.admin]
        key = AQBar5RYl3mcARAA+axYUM0Y7nJazb2HEbsfIA==[/CODE]

Additional information: list of entities

[CODE]ceph02admin:/etc/ceph # ceph auth list
installed auth entries:

mds.fsgw01
key: AQAkhJhYld6SBRAAYh2z8BsseBNlIC7qCStInQ==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx
osd.0
key: AQD0s5RYOrMoKxAAvPgCsrM++8K3eGZNvczdkw==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: AQAGtJRYNCOUMBAA/fGe0gMJjFrlqKWpv+M6tA==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.2
key: AQAYtJRYAxdNHxAA4x77lNo8dKbZ6oH9MFYWdg==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: AQBar5RYl3mcARAA+axYUM0Y7nJazb2HEbsfIA==
caps: [mds] allow *
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: AQBar5RYuOGhGhAAR6n05tlzpuBwkb2Ik4oO8A==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
key: AQBar5RYwwHjMhAAzJ2GE+6oVhYxlrzPsp/qEg==
caps: [mon] allow profile bootstrap-osd
client.bootstrap-rgw
key: AQBbr5RYsfGEERAA5i3pu79Kl6w9CtCjZRsFdA==
caps: [mon] allow profile bootstrap-rgw
client.rgw.rgw01
key: AQCLWJhYJiFxEBAAKmTSDNu02/ySvY4AvMVD0g==
caps: [mon] allow rw
caps: [osd] allow rwx
client.rgw.rgw02
key: AQATWZhYIT0JGhAAjIRx5VfvmtdgZLsUrEyJCg==
caps: [mon] allow rw
caps: [osd] allow rwx
client.rgw.rgw03
key: AQAVWZhY/7mjLhAAueAmh9bh3U3L4tD+62glbA==
caps: [mon] allow rw
caps: [osd] allow rwx
[/CODE]