As far as I can see, Ceph reports the amount of used/free storage space as ‘raw’ figures, without taking replication or erasure coding into account.
This is acceptable for object storage, but it is a terribly wrong approach for CephFS.
Please look at the following example.
- The cluster has 3 OSDs, ~33 GB each.
- The two pools (data and metadata) for CephFS were created with a replication factor = 3 rule.
- CephFS was used to store an NFS export serving as a VMWare datastore.
- A 25 GB .VMDK was placed on that datastore.
- Ceph reports the usage as “78250 MB used, 22822 MB / 101072 MB avail”. This is correct for raw storage.
- VMWare reports the same raw numbers for the datastore (~22 GB available).
- Does that mean another ~22 GB .VMDK could be added to this datastore? No! With 3× replication, only ~7 GB of usable space actually remains.
- Are those the right numbers for administering VMWare? No!
- Which numbers would be clear to a VM admin? “~25 GB used, ~7 GB / ~33 GB avail” (see the arithmetic sketched right after this list).
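For clarity, here is the arithmetic behind those numbers; the only inputs are Ceph’s raw report above and the pool’s replication factor of 3:

    # Raw figures as reported by Ceph (in MB) and the replication factor.
    RAW_USED_MB = 78250
    RAW_AVAIL_MB = 22822
    RAW_TOTAL_MB = 101072
    FACTOR = 3

    # Dividing the raw statistics by the replication factor gives the
    # usable figures a VM admin would expect.
    print(f"used:  {RAW_USED_MB / FACTOR / 1024:.1f} GB")   # ~25.5 GB
    print(f"avail: {RAW_AVAIL_MB / FACTOR / 1024:.1f} GB")  # ~7.4 GB
    print(f"total: {RAW_TOTAL_MB / FACTOR / 1024:.1f} GB")  # ~32.9 GB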
CephFS uses a particular data pool, which is coupled to a particular rule, and that rule defines the replication (or redundancy) factor unambiguously. All the data needed to do this calculation on the fly is already present.
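For illustration, a minimal sketch of that on-the-fly calculation, shelling out to the ceph CLI from Python. The pool name cephfs_data and the exact JSON field names are assumptions here; they vary between releases:

    # Minimal sketch: divide Ceph's raw cluster stats by the pool's
    # replication factor to get the usable figures.
    # Assumes a replicated pool named "cephfs_data" and the JSON field
    # names of recent `ceph df` / `ceph osd pool get` output.
    import json
    import subprocess

    POOL = "cephfs_data"  # assumed CephFS data pool name

    def ceph_json(*args):
        """Run a ceph CLI subcommand and parse its JSON output."""
        out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
        return json.loads(out)

    # Replication factor defined by the pool's rule ("size").
    factor = ceph_json("osd", "pool", "get", POOL, "size")["size"]

    # Raw cluster-wide statistics, divided by the replication factor.
    stats = ceph_json("df")["stats"]
    for key in ("total_used_bytes", "total_avail_bytes", "total_bytes"):
        print(f"{key}: {stats[key] / factor / 2**30:.1f} GB usable")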
Does anybody know of a reason not to divide the raw statistics by that factor, which would give the usable result?