OSDs are down; I restarted the processes, but they went down again later...

I had an issue where “ceph -s” hung for a long time before returning an error, so I shut down all data* nodes and then the mon* nodes one by one. Then I started the mon* nodes one by one, followed by the data* nodes.

I then noticed that ceph osd tree showed 4 OSDs as down (although all 9 OSD processes were running), so I left the cluster running for some time. When I came back, I found 2 OSDs down with the corresponding OSD processes dead, so I restarted them. However, the OSDs are still down…

(Sorry, I cannot cut and paste from the lab.) Now osd.4 and osd.8 are down, and the process for osd.8 is gone too. I tried to restart it, but systemctl restart failed, and systemctl status ceph-osd@8 showed “start request repeated too quickly. Failed to start Ceph object storage daemon osd.8”.

/var/log/ceph/ceph-osd.8.log shows “wait_auth_rotating time out after 30”, “unable to obtain rotating service keys; retrying”, and “init wait_auth_rotating time out”.

Now the process for osd.4 is also gone and cannot be restarted. It gives the same error as osd.8 above.
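
The “start request repeated too quickly” message comes from systemd's start rate limiting: after several failed starts in a short window, systemd refuses further start requests for that unit for a while. systemctl reset-failed clears that counter. Below is a minimal sketch of clearing it and retrying, using the unit name ceph-osd@8 from the post; retry_osd and DRY_RUN are hypothetical helper names, not part of Ceph. It only clears the rate limit and does not address the wait_auth_rotating timeout, so the OSD will probably fail again until the underlying cause is fixed.

```shell
# retry_osd ID: clear systemd's failure state for one OSD unit, then start it.
# Set DRY_RUN=1 to print the commands instead of running them.
retry_osd() {
  id="$1"
  run="${DRY_RUN:+echo}"   # expands to "echo" only when DRY_RUN is set and non-empty
  $run systemctl reset-failed "ceph-osd@${id}"   # also resets the start rate limit counter
  $run systemctl start "ceph-osd@${id}"
  $run systemctl --no-pager status "ceph-osd@${id}"
}

# Example (run as root on the data node that hosts osd.8):
#   retry_osd 8
```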

There are some messages about clock skew (check_auth_rotating possible clock skew, rotating key expired way too early), but I verified that all clocks are the same (salt \* cmd.run cmd=date)…

Please advise how to solve this…
Thank you very much,
Theeraphong T.

For the clock issue, run systemctl restart chronyd on the machine that has the issue. I will look into how to fix the OSDs.
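
Comparing plain date output by eye only resolves whole seconds, and the command does not run on every node at the same instant, so a sub-second offset can go unnoticed. A rough sketch of measuring the spread instead is shown here, assuming you have collected one “host epoch_seconds” pair per node (for example from date +%s.%N on each node); max_skew is a hypothetical helper name. For an authoritative view of one node's offset, chronyc tracking on that node is the better tool.

```shell
# max_skew: read "host epoch_seconds" lines on stdin and print the
# difference between the largest and smallest timestamp, in seconds.
max_skew() {
  awk 'NR == 1 { min = $2; max = $2 }
       { if ($2 < min) min = $2; if ($2 > max) max = $2 }
       END { if (NR > 0) printf "%.3f\n", max - min }'
}

# Example with timestamps gathered by hand into a file, one node per line:
#   max_skew < timestamps.txt
```

The result includes the time it took to collect the samples, so treat it as an upper bound rather than an exact skew.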

The virtual machine host is probably resource starved. I would reboot the KVM host system.
Then slowly, deliberately, start each of the cluster VMs, starting exclusively with the MONs. Start each MON and let each start up completely before moving to the next.
After the MONs are all started, then start the Admin. When the Admin is started, go to the bash console and run “ceph -s” to ensure that the MONs are responding. The cluster won’t be up because there are no OSDs, but at least “ceph -s” should respond.
After “ceph -s” shows that the MONs are responding, start powering on the data hosts, slowly and deliberately, one at a time. After each is fully up and running, run “ceph -s” and “ceph osd tree” to determine the health of the OSDs on that data host. Don’t start another data host until the OSDs on this host are up/in from a Ceph perspective.
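
The check between data hosts can be sketched as a small filter over ceph osd tree output. This assumes the status word (up or down) appears in the column right after the osd.N name, which may differ between Ceph releases; down_osds is a hypothetical helper name.

```shell
# down_osds: read `ceph osd tree` output on stdin and print the name of
# every OSD whose status column is anything other than "up".
down_osds() {
  awk '{ for (i = 1; i < NF; i++)
           if ($i ~ /^osd\.[0-9]+$/ && $(i + 1) != "up") print $i }'
}

# Example (on the admin node, after a data host has finished booting):
#   ceph osd tree | down_osds
```

OSDs on data hosts that have not been powered on yet will also be listed, so compare the output against the OSDs you expect on the host you just started.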

I followed the recommended procedure and managed to resolve the OSD down issue…
I have a few questions on which I’d like your advice:

  • After shutting everything down and rebooting the KVM hosts, I started the MONs one by one, but ceph -s did not show anything. I then noticed that there were no ceph-mon or ceph-mgr processes on any MON node!! I tried to start them manually via systemctl start ceph-mon@x / ceph-mgr@x, but it did not work. (I had used the same command to start ceph-osd@n and it worked…)
  1. What is the correct way to start ceph-mon and ceph-mgr?
    So I tried rebooting the MONs one by one; now all mon/mgr processes are there, and ceph -s works.
  2. I notice that ps -ef shows ceph-mon / ceph-mgr, but systemctl status ceph-mon@x (ceph-mgr@x) shows inactive (dead). Is this normal?
    (I ask because on the data* nodes, ps -ef shows the OSD processes and systemctl status ceph-osd@n also shows active (running), so I expected the same on the mon* nodes…)
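
One possible explanation for questions 1 and 2, which cannot be confirmed from the thread: the instance name after the @ must match the daemon's id exactly. If the running daemon belongs to a different instance than the one queried, systemd reports the queried, never-started instance as inactive (dead). The sketch below derives the unit names from the running processes, assuming the daemons were started with an --id argument on their command line; units_from_ps is a hypothetical helper name.

```shell
# units_from_ps: read process command lines on stdin and print the systemd
# unit instance (e.g. ceph-mon@ID) for each ceph-mon, ceph-mgr or ceph-osd
# process that carries an "--id ID" argument.
units_from_ps() {
  awk '{ daemon = ""; id = ""
         for (i = 1; i <= NF; i++) {
           if ($i ~ /(^|\/)ceph-(mon|mgr|osd)$/) { n = split($i, p, "/"); daemon = p[n] }
           if ($i == "--id" && i < NF) id = $(i + 1)
         }
         if (daemon != "" && id != "") print daemon "@" id }'
}

# Example (on a mon* node):
#   ps -eo args | units_from_ps
#   systemctl status "$(ps -eo args | units_from_ps | sed -n 1p)"
```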

When ceph -s worked, ceph osd tree showed that osd.0 and osd.7 (on data1) were up!?!
3. How come?? I had not started any data* node yet!
(I noticed that these same OSDs were also shown as up after I shut down all data* nodes! I waited for some time for the status to refresh to down, but after more than 10 minutes the status was still up, so I proceeded to shut down the mon* nodes…)
4. So do the mon* nodes remember the previous status of the OSDs? (I think they should just detect the latest status…)

Then I started data1, waited until all its OSDs came up, and then started data2. However, not all OSDs came up, and I noticed that the one that did not come up had no corresponding process in ps -ef. So I used systemctl restart ceph-osd@n to restart it, and it came back up.
Then I started data3, which had the same issue as data2 (one OSD did not come up and had to be started manually)…
5. ceph-osd is under systemctl control, so why don’t we have systemd restart it automatically if it dies?
Thank you very much for your advice,
Theeraphong T.
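
On question 5: systemd restarts a service automatically only if the unit's Restart= setting says so, and even then it stops trying once the start rate limit is hit, which is what “start request repeated too quickly” indicates. The unit shipped with your Ceph packages may already define a restart policy; systemctl cat ceph-osd@.service shows what is in effect. If it does not, a drop-in along these lines is one option. This is only a sketch: the file name restart.conf, the 20-second delay, and the helper name print_restart_dropin are assumptions, not values from this thread.

```shell
# print_restart_dropin: emit a systemd drop-in that restarts the service
# after it exits with a failure, waiting 20 seconds between attempts.
print_restart_dropin() {
  cat <<'EOF'
[Service]
Restart=on-failure
RestartSec=20
EOF
}

# Example (as root; applies to every ceph-osd@N instance on this node):
#   mkdir -p /etc/systemd/system/ceph-osd@.service.d
#   print_restart_dropin > /etc/systemd/system/ceph-osd@.service.d/restart.conf
#   systemctl daemon-reload
```

An automatic restart does not help when the daemon fails for the same reason on every attempt, as with the rotating key timeout earlier in this thread.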

Without being able to see your lab environment, it is very difficult to troubleshoot remotely. Are all of your MONs and OSDs up now?

Yes → I followed the recommended procedure and managed to solve the issue of OSD down…

Could you help advise on the questions above (#1 to #5)?
Thank you.