Maximum HDDs per OSD node

Is there a technical limit on the number of HDDs per OSD node? If so, what is it?
Suppose the customer's application goal forces us to shift the balance between cluster size and the impact of a node failure toward size. Today it is not hard to build a server with 4 to 6 SAS controllers, each supporting up to 256 target devices. It is also possible to daisy-chain SAS enclosures that hold up to 90 HDDs each. So 1000+ physical HDDs in a single Ceph node is not a dream. RAM is not a problem either: a dual-socket Xeon v4 server can take up to 3 TB of RAM, which is enough for up to 1500 OSD processes at 2 GB each.
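As a rough check of the figures above, here is a minimal Python sketch of the arithmetic. The constants are just the numbers quoted in this post (256 targets per controller, 90-bay enclosures, 3 TB of RAM, 2 GB per OSD); the helper names are hypothetical, and the sketch ignores CPU, network bandwidth and any other per-OSD overhead.

```python
# Back-of-the-envelope sizing for one very large OSD node, using the
# figures quoted in the post (not limits verified against real hardware).
SAS_TARGETS_PER_CONTROLLER = 256
HDDS_PER_ENCLOSURE = 90
RAM_PER_NODE_GB = 3000   # "up to 3 TB", decimal units
RAM_PER_OSD_GB = 2       # per-OSD memory budget assumed in the post


def max_sas_targets(controllers: int) -> int:
    """Upper bound on attachable devices for a given number of SAS controllers."""
    return controllers * SAS_TARGETS_PER_CONTROLLER


def enclosures_needed(hdds: int) -> int:
    """Number of 90-bay enclosures needed for `hdds` drives (integer ceiling division)."""
    return -(-hdds // HDDS_PER_ENCLOSURE)


def osds_supported_by_ram(ram_gb: int = RAM_PER_NODE_GB,
                          per_osd_gb: int = RAM_PER_OSD_GB) -> int:
    """How many OSD processes fit in RAM if each one needs `per_osd_gb` GB."""
    return ram_gb // per_osd_gb


for controllers in (4, 6):
    print(controllers, "controllers ->", max_sas_targets(controllers), "targets")
print("enclosures for 1000 HDDs:", enclosures_needed(1000))
print("OSDs that fit in RAM:", osds_supported_by_ram())
```

With these numbers, 4 to 6 controllers give 1024 to 1536 targets, 1000 HDDs need 12 enclosures, and 3 TB of RAM covers 1500 OSDs, so the controllers, not the RAM, set the first ceiling.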

On the other hand, look at the document. The device numbers for SCSI (sd) disks are statically allocated: major numbers 8, 65-71 and 128-135, i.e. 16 majors in total. Each major has 256 minor numbers (0-255), and each disk gets 16 consecutive minors: the first represents the entire disk and the remaining 15 its partitions. So the number of addressable drives is 16 × (256 / 16) = 256.
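The device-number arithmetic can be sketched in Python as below. This is a hypothetical model of the static allocation only: it assumes the 16 SCSI disk majors are 8, 65-71 and 128-135 (the two ranges 65-71 and 128-135 alone give 15 majors; the sixteenth is the original sd major, 8), with 256 minors per major and 16 minors per disk. The names `SD_MAJORS` and `disk_device_numbers` are illustrative, not a kernel API.

```python
# Hypothetical model of the static SCSI-disk device-number layout:
# 16 majors, 256 minors each, 16 minors per disk
# (offset 0 = whole disk, offsets 1..15 = partitions).
SD_MAJORS = [8, *range(65, 72), *range(128, 136)]
MINORS_PER_MAJOR = 256
MINORS_PER_DISK = 16

DISKS_PER_MAJOR = MINORS_PER_MAJOR // MINORS_PER_DISK  # 16 disks per major
MAX_DISKS = len(SD_MAJORS) * DISKS_PER_MAJOR            # 16 * 16 = 256


def disk_device_numbers(index: int) -> tuple[int, int]:
    """Return (major, first minor) of the whole-disk node for disk number `index`."""
    if not 0 <= index < MAX_DISKS:
        raise ValueError(f"disk index {index} outside static range 0..{MAX_DISKS - 1}")
    major = SD_MAJORS[index // DISKS_PER_MAJOR]
    minor = (index % DISKS_PER_MAJOR) * MINORS_PER_DISK
    return major, minor


print("disks addressable in the static range:", MAX_DISKS)
print("last disk:", disk_device_numbers(MAX_DISKS - 1))
```

For example, `disk_device_numbers(16)` gives `(65, 0)`, the first disk under the second major, and `disk_device_numbers(255)` gives `(135, 240)`, the last disk the static scheme can address.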
If SES (Ceph) uses a GUID-based approach to build the cluster, it is not clear how to manage this in the ceph.conf file and on the command line, for example to configure journal drives and data drives separately.

Will SES 3/4 be able to recognize, access and manage such a large number of drives?

I do not know the technical limitations, but doubt you’ll hit them even
with the numbers you specify.

On the other hand, are you planning for the situation where an OSD node
dies? Motherboard, power supply, etc.? When that happens, the system
will need to rebalance your thousands of OSDs’ content, and while that is
fine, it implies you have several of these really big systems running
happily, each with enough free capacity to absorb TBs or PBs of data at once during the rebalance.
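A minimal sketch of that load, assuming N equal-sized nodes, host as the failure domain, and data spread evenly so that the failed node's contents are re-created evenly on the N-1 survivors. The function name and the example figures are hypothetical.

```python
def rebalance_load(nodes: int, used_tb_per_node: float, capacity_tb_per_node: float):
    """Estimate the effect of losing one of `nodes` equal nodes.

    Returns (data to re-create, extra data per surviving node,
    fill ratio of each survivor afterwards). Assumes perfectly even placement.
    """
    if nodes < 2:
        raise ValueError("need at least two nodes to rebalance onto")
    lost = used_tb_per_node                    # everything on the dead node must be re-created
    extra_per_survivor = lost / (nodes - 1)    # spread evenly over the remaining nodes
    new_fill = (used_tb_per_node + extra_per_survivor) / capacity_tb_per_node
    return lost, extra_per_survivor, new_fill


# Hypothetical cluster: 4 nodes, each 8000 TB raw with 4800 TB used.
print(rebalance_load(4, 4800, 8000))
```

Here `rebalance_load(4, 4800, 8000)` returns `(4800, 1600.0, 0.8)`: each survivor must take 1600 TB and ends up 80% full, so the fewer and bigger the nodes, the more headroom every node has to keep in reserve.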

Good luck.


[QUOTE=ab;35706]On the other hand, are you planning for the situation where an OSD node
dies? Motherboard, power supply, etc.?[/QUOTE]
Yes, the document at the first link discusses this. Strictly speaking, though, we have to choose between two options:

  1. Rebalance the cluster.
  2. Stop the cluster for maintenance.
    The second option is not unusual: most backup systems, for example, can be stopped during production hours without a direct impact on business availability.
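To put a number on the first option, here is a rough estimate of how long re-creating a failed node's data takes at a given sustained aggregate recovery rate. The rate is a hypothetical input (real rates depend on disks, network and recovery throttling), and the function name is illustrative. If the result is much longer than the time needed to repair the node, the second option may be the more practical one.

```python
def recovery_hours(data_tb: float, aggregate_gb_per_s: float) -> float:
    """Hours needed to re-create `data_tb` TB at a sustained aggregate recovery
    rate of `aggregate_gb_per_s` GB/s (decimal units: 1 TB = 1000 GB)."""
    if aggregate_gb_per_s <= 0:
        raise ValueError("recovery rate must be positive")
    seconds = data_tb * 1000 / aggregate_gb_per_s
    return seconds / 3600


# Hypothetical example: 4800 TB to re-create at 10 GB/s across the whole cluster.
print(f"{recovery_hours(4800, 10):.1f} h")
```

With these example inputs it prints `133.3 h`, i.e. between five and six days of degraded operation.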