Chapter 6. Using the ceph-volume Utility to Deploy OSDs
The ceph-volume utility is a single-purpose command-line tool to deploy logical volumes as OSDs. It uses a plugin-type framework for deploying OSDs with different device technologies. The ceph-volume utility follows a workflow similar to that of the ceph-disk utility for deploying OSDs, with a predictable and robust way of preparing, activating, and starting OSDs. Currently, the ceph-volume utility only supports the lvm plugin, with the plan to support other technologies in the future.
The ceph-disk command is deprecated.
6.1. Using the ceph-volume LVM Plugin
By making use of LVM tags, the lvm subcommand is able to store and later re-discover, by querying devices, the logical volumes associated with OSDs so that they can be activated. This includes support for lvm-based technologies like dm-cache as well.
When using ceph-volume, the use of dm-cache is transparent, and ceph-volume treats a dm-cache device like any other logical volume. The performance gains and losses when using dm-cache depend on the specific workload. Generally, random and sequential reads see an increase in performance at smaller block sizes, while random and sequential writes see a decrease in performance at larger block sizes.
To use the LVM plugin, add lvm as a subcommand to the ceph-volume command:
ceph-volume lvm
There are three subcommands to the lvm subcommand, as follows:
- prepare
- activate
- create
Using the create subcommand combines the prepare and activate subcommands into one subcommand. See the create subcommand section for more details.
6.1.1. Preparing OSDs
The prepare subcommand prepares an OSD backend object store and consumes logical volumes for both the OSD data and journal. There is no default object storage type; either the --filestore or --bluestore option must be set at preparation time. Starting with Red Hat Ceph Storage 3.2, support for the BlueStore object storage type is available. The prepare subcommand does not create or modify the logical volumes, except for adding some extra metadata using LVM tags.
LVM tags make volumes easier to discover later, and help identify them as part of a Ceph system and what role they have. The ceph-volume lvm prepare command adds the following list of LVM tags:
- cluster_fsid
- data_device
- journal_device
- encrypted
- osd_fsid
- osd_id
- journal_uuid
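You can view the tags that ceph-volume has added to a volume with the lvs command; the volume group and logical volume names below are examples:
# lvs -o lv_name,lv_tags example_vg/data_lv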
The prepare process is very strict: it requires two logical volumes that are ready for use and that meet the minimum size requirements for the OSD data and journal. The journal device can be either a logical volume or a partition.
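For example, the data and journal logical volumes can be created ahead of time with standard LVM commands. This is only a sketch; the device name, volume group name, and sizes are placeholders:
# pvcreate /dev/sdb
# vgcreate example_vg /dev/sdb
# lvcreate -n journal_lv -L 10G example_vg
# lvcreate -n data_lv -l 100%FREE example_vg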
Here is the prepare workflow process:
- Accept logical volumes for data and journal
- Generate a UUID for the OSD
- Ask the Ceph Monitor to get an OSD identifier reusing the generated UUID
- OSD data directory is created and data volume mounted
- Journal is symlinked from data volume to journal location
- The monmap is fetched for activation
- Device is mounted and the data directory is populated by ceph-osd
- LVM tags are assigned to the OSD data and journal volumes
Do the following step on an OSD node, and as the root user, to prepare a simple OSD deployment using LVM:
ceph-volume lvm prepare --bluestore --data $VG_NAME/$LV_NAME
For example:
# ceph-volume lvm prepare --bluestore --data example_vg/data_lv
For BlueStore, you can also specify the --block.db and --block.wal options if you want to place the RocksDB database and write-ahead log on separate devices.
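For example, assuming db_lv and wal_lv are logical volumes created on faster devices:
# ceph-volume lvm prepare --bluestore --data example_vg/data_lv --block.db example_vg/db_lv --block.wal example_vg/wal_lv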
Here is an example of using FileStore with a partition as a journal device:
# ceph-volume lvm prepare --filestore --data example_vg/data_lv --journal /dev/sdc1
When using a partition, it must contain a PARTUUID discoverable by the blkid command, so that it can be identified correctly regardless of the device name or path.
The ceph-volume LVM plugin does not create partitions on a raw disk device. Creating the partition has to be done before using it as the OSD journal device.
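For example, a GPT partition can be created on a blank device with parted and then checked for a PARTUUID with blkid; the device name here is a placeholder:
# parted --script /dev/sdc mklabel gpt mkpart primary 1MiB 100%
# blkid /dev/sdc1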
6.1.2. Activating OSDs
Once the prepare process is done, the OSD is ready to be activated. The activation process enables a systemd unit at boot time, which allows the correct OSD identifier and its UUID to be enabled and mounted.
Here is the activate workflow process:
- Requires both OSD id and OSD uuid
- Enable the systemd unit with matching OSD id and OSD uuid
- The systemd unit will ensure all devices are ready and mounted
- The matching ceph-osd systemd unit will get started
Do the following step on an OSD node, and as the root user, to activate an OSD:
ceph-volume lvm activate --filestore $OSD_ID $OSD_UUID
For example:
# ceph-volume lvm activate --filestore 0 0263644D-0BF1-4D6D-BC34-28BD98AE3BC8
There are no side-effects when running this command multiple times.
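If you do not know the OSD identifier and UUID of a prepared volume, you can report them with the lvm list subcommand:
# ceph-volume lvm list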
6.1.3. Creating OSDs
The create subcommand wraps the two-step process of deploying a new OSD, calling the prepare subcommand and then the activate subcommand, into a single subcommand. The reason to use prepare and then activate separately is to gradually introduce new OSDs into a storage cluster, avoiding large amounts of data being rebalanced. There is nothing different about the process, except the OSD will become up and in immediately after completion.
Do the following step, for FileStore, on an OSD node, and as the root user:
ceph-volume lvm create --filestore --data $VG_NAME/$LV_NAME --journal $JOURNAL_DEVICE
For example:
# ceph-volume lvm create --filestore --data example_vg/data_lv --journal example_vg/journal_lv
Do the following step, for BlueStore, on an OSD node, and as the root user:
# ceph-volume lvm create --bluestore --data <device>
For example:
# ceph-volume lvm create --bluestore --data /dev/sda
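The create subcommand accepts the same BlueStore options as prepare, so you can also pass a logical volume and a separate block.db device; the names below are examples:
# ceph-volume lvm create --bluestore --data example_vg/data_lv --block.db example_vg/db_lv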
6.1.4. Using batch mode
The batch subcommand automates the creation of multiple OSDs when single devices are provided. The ceph-volume command decides the best method for creating the OSDs based on drive type. This best method depends on the object store format, BlueStore or FileStore.
BlueStore is the default object store type for OSDs. When using BlueStore, OSD optimization depends on three different scenarios based on the devices being used. If all devices are traditional hard drives, then one OSD per device is created. If all devices are solid state drives, then two OSDs per device are created. If there is a mix of traditional hard drives and solid state drives, then data is put on the traditional hard drives, and the block.db is created as large as possible on the solid state drive.
The batch subcommand does not support creating a separate logical volume for the write-ahead-log (block.wal) device.
BlueStore example
# ceph-volume lvm batch --bluestore /dev/sda /dev/sdb /dev/nvme0n1
When using FileStore, OSD optimization depends on two different scenarios based on the devices being used. If all devices are traditional hard drives or all are solid state drives, then one OSD per device is created, collocating the journal on the same device. If there is a mix of traditional hard drives and solid state drives, then data is put on the traditional hard drives, and the journal is created on the solid state drive using the sizing parameters specified in the Ceph configuration file, by default ceph.conf, with a default journal size of 5 GB.
FileStore example
# ceph-volume lvm batch --filestore /dev/sda /dev/sdb
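To preview what the batch subcommand would create without making any changes, you can add the --report option; for example:
# ceph-volume lvm batch --report --filestore /dev/sda /dev/sdb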