Chapter 5. Customizing the Ceph Storage cluster
Director deploys containerized Red Hat Ceph Storage using a default configuration. You can customize Ceph Storage by overriding the default settings.
Prerequisites
To deploy containerized Ceph Storage, you must include the /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml environment file during overcloud deployment. This environment file defines the following resources:
- CephAnsibleDisksConfig - This resource maps the Ceph Storage node disk layout. For more information, see Section 5.2, “Mapping the Ceph Storage node disk layout”.
- CephConfigOverrides - This resource applies all other custom settings to your Ceph Storage cluster.
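For example, a deployment command that includes this environment file, together with a custom environment file such as the /home/stack/templates/ceph-config.yaml file created later in this chapter, might look like the following sketch; any additional templates and environment files are placeholders for the options that your deployment already uses:
$ openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/templates/ceph-config.yaml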
Procedure
Enable the Red Hat Ceph Storage 3 Tools repository:
$ sudo subscription-manager repos --enable=rhel-7-server-rhceph-3-tools-rpms
Install the ceph-ansible package on your undercloud:
$ sudo yum install ceph-ansible
To customize your Ceph Storage cluster, define custom parameters in a new environment file, for example, /home/stack/templates/ceph-config.yaml. You can apply Ceph Storage cluster settings with the following syntax in the parameter_defaults section of your environment file:
parameter_defaults:
  section:
    KEY:VALUE
Note: You can apply the CephConfigOverrides parameter to the [global] section of the ceph.conf file, as well as any other section, such as [osd], [mon], and [client]. If you specify a section, the key:value data goes into the specified section. If you do not specify a section, the data goes into the [global] section by default. For information about Ceph Storage configuration, customization, and supported parameters, see the Red Hat Ceph Storage Configuration Guide.
Replace KEY and VALUE with the Ceph cluster settings that you want to apply. For example, in the global section, max_open_files is the KEY and 131072 is the corresponding VALUE:
parameter_defaults:
  CephConfigOverrides:
    global:
      max_open_files: 131072
    osd:
      osd_scrub_during_recovery: false
This configuration results in the following settings defined in the configuration file of your Ceph cluster:
[global]
max_open_files = 131072
[osd]
osd_scrub_during_recovery = false
5.1. Setting ceph-ansible group variables
The ceph-ansible tool is a playbook used to install and manage Ceph Storage clusters. The playbook's group_vars directory defines configuration options and their default settings. For information about the group_vars directory, see 3.2. Installing a Red Hat Ceph Storage Cluster in the Installation Guide for Red Hat Enterprise Linux.
To change the variable defaults in director, use the CephAnsibleExtraConfig parameter to pass the new values in heat environment files. For example, to set the ceph-ansible group variable journal_size to 40960, create an environment file with the following journal_size definition:
parameter_defaults:
  CephAnsibleExtraConfig:
    journal_size: 40960
Change ceph-ansible group variables with the override parameters; do not edit group variables directly in the /usr/share/ceph-ansible directory on the undercloud.
5.2. Mapping the Ceph Storage node disk layout
When you deploy containerized Ceph Storage, you must map the disk layout and specify dedicated block devices for the Ceph OSD service. You can perform this mapping in the environment file you created earlier to define your custom Ceph parameters: /home/stack/templates/ceph-config.yaml.
Use the CephAnsibleDisksConfig resource in parameter_defaults to map your disk layout. This resource uses the following variables:
Variable | Required? | Default value (if unset) | Description
---|---|---|---
osd_scenario | Yes | lvm (NOTE: for new deployments using Ceph 3.2 and later) | With Ceph 3.2, lvm creates OSDs with ceph-volume, which supports BlueStore. With Ceph 3.1, the values set the journaling scenario, such as whether OSDs must be created with journals that are either co-located on the same device for collocated, or stored on dedicated devices for non-collocated.
devices | Yes | NONE. Variable must be set. | A list of block devices to be used on the node for OSDs.
dedicated_devices | Yes (only if osd_scenario is non-collocated) | devices | A list of block devices that maps each entry under devices to a dedicated journaling block device. Use this variable only when osd_scenario=non-collocated.
dmcrypt | No | false | Sets whether data stored on OSDs is encrypted (true) or not (false).
osd_objectstore | No | bluestore (NOTE: for new deployments using Ceph 3.2 and later) | Sets the storage back end used by Ceph.
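For example, the following parameter_defaults snippet sets all four variables for a FileStore deployment with dedicated journals; the device paths shown are placeholders for illustration only:
parameter_defaults:
  CephAnsibleDisksConfig:
    osd_scenario: non-collocated
    osd_objectstore: filestore
    dmcrypt: false
    devices:
      - /dev/sdb
      - /dev/sdc
    dedicated_devices:
      - /dev/sdd
      - /dev/sdd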
If you deployed your Ceph cluster with a version of ceph-ansible older than 3.3 and osd_scenario is set to collocated or non-collocated, OSD reboot failure can occur due to a device naming discrepancy. For more information about this fault, see https://bugzilla.redhat.com/show_bug.cgi?id=1670734. For information about a workaround, see https://access.redhat.com/solutions/3702681.
5.2.1. Using BlueStore in Ceph 3.2 and later
New deployments of OpenStack Platform 13 must use bluestore. Current deployments that use filestore must continue using filestore, as described in Using FileStore in Ceph 3.1 and earlier. Migrations from filestore to bluestore are not supported by default in RHCS 3.x.
Procedure
To specify the block devices to be used as Ceph OSDs, use a variation of the following:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
      - /dev/nvme0n1
    osd_scenario: lvm
    osd_objectstore: bluestore
Because /dev/nvme0n1 is in a higher performing device class (it is an SSD and the other devices are HDDs), the example parameter defaults produce three OSDs that run on /dev/sdb, /dev/sdc, and /dev/sdd. The three OSDs use /dev/nvme0n1 as a BlueStore WAL device. The ceph-volume tool does this by using the batch subcommand. The same configuration is duplicated for each Ceph Storage node and assumes uniform hardware. If the BlueStore WAL data resides on the same disks as the OSDs, then change the parameter defaults in the following way:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/sdb
      - /dev/sdc
      - /dev/sdd
    osd_scenario: lvm
    osd_objectstore: bluestore
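For reference, in the first example (with /dev/nvme0n1 as the dedicated WAL device), ceph-ansible drives OSD creation on each node through the ceph-volume batch subcommand with an invocation roughly equivalent to the following sketch; the exact command that ceph-ansible generates depends on its version and runs inside the OSD container:
# Sketch only: ceph-ansible generates the equivalent call; the device paths match the example above
ceph-volume lvm batch --bluestore /dev/sdb /dev/sdc /dev/sdd /dev/nvme0n1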
5.2.2. Using FileStore in Ceph 3.1 and earlier
The default journaling scenario is set to osd_scenario=collocated, which has lower hardware requirements consistent with most testing environments. In a typical production environment, however, journals are stored on dedicated devices, osd_scenario=non-collocated, to accommodate heavier I/O workloads. For more information, see Identifying a Performance Use Case in the Red Hat Ceph Storage Hardware Selection Guide.
Procedure
List each block device to be used by the OSDs as a simple list under the devices variable, for example:
devices:
  - /dev/sda
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
Optional: If osd_scenario=non-collocated, you must also map each entry in devices to a corresponding entry in dedicated_devices. For example, the following snippet in /home/stack/templates/ceph-config.yaml:
osd_scenario: non-collocated
devices:
  - /dev/sda
  - /dev/sdb
  - /dev/sdc
  - /dev/sdd
dedicated_devices:
  - /dev/sdf
  - /dev/sdf
  - /dev/sdg
  - /dev/sdg
Result
Each Ceph Storage node in the resulting Ceph cluster has the following characteristics:
- /dev/sda has /dev/sdf1 as its journal
- /dev/sdb has /dev/sdf2 as its journal
- /dev/sdc has /dev/sdg1 as its journal
- /dev/sdd has /dev/sdg2 as its journal
5.2.3. Referring to devices with persistent names
Procedure
In some nodes, disk paths such as /dev/sdb and /dev/sdc might not point to the same block device during reboots. If this is the case with your CephStorage nodes, specify each disk with the /dev/disk/by-path/ symlink to ensure that the block device mapping is consistent throughout deployments:
parameter_defaults:
  CephAnsibleDisksConfig:
    devices:
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:10:0
      - /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:11:0
    dedicated_devices:
      - /dev/nvme0n1
      - /dev/nvme0n1
Optional: Because you must set the list of OSD devices before overcloud deployment, it might not be possible to identify and set the PCI path of disk devices. In this case, gather the /dev/disk/by-path/ symlink data for block devices during introspection.
In the following example, run the first command to download the introspection data from the undercloud Object Storage service (swift) for the server b08-h03-r620-hci, and save the data in a file called b08-h03-r620-hci.json. Run the second command to grep for "by-path". The output of this command contains the unique /dev/disk/by-path values that you can use to identify disks.
(undercloud) [stack@b08-h02-r620 ironic]$ openstack baremetal introspection data save b08-h03-r620-hci | jq . > b08-h03-r620-hci.json
(undercloud) [stack@b08-h02-r620 ironic]$ grep by-path b08-h03-r620-hci.json
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:1:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:3:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:4:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:5:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:6:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:7:0",
    "by_path": "/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0",
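Alternatively, assuming the saved introspection data follows the standard Bare Metal service inventory layout, with the disk list under .inventory.disks (an assumption based on the keys visible in the grep output), you can extract only the by_path values with jq:
(undercloud) [stack@b08-h02-r620 ironic]$ jq -r '.inventory.disks[].by_path' b08-h03-r620-hci.json
/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:0:0
/dev/disk/by-path/pci-0000:02:00.0-scsi-0:2:1:0
...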
For more information about naming conventions for storage devices, see Persistent Naming in the Red Hat Enterprise Linux (RHEL) Managing storage devices guide.
osd_scenario: lvm is used in the example to default new deployments to bluestore as configured by ceph-volume; this is only available with ceph-ansible 3.2 or later and Ceph Luminous or later. The parameters to support filestore with ceph-ansible 3.2 are backwards compatible. Therefore, in existing FileStore deployments, do not change the osd_objectstore or osd_scenario parameters.
5.2.4. Creating a valid JSON file automatically from Bare Metal service introspection data
When you customize devices in a Ceph Storage deployment by manually including node-specific overrides, you can inadvertently introduce errors. The director tools directory contains a utility named make_ceph_disk_list.py that you can use to create a valid JSON environment file automatically from Bare Metal service (ironic) introspection data.
Procedure
Export the introspection data from the Bare Metal service database for the Ceph Storage nodes you want to deploy:
openstack baremetal introspection data save oc0-ceph-0 > ceph0.json
openstack baremetal introspection data save oc0-ceph-1 > ceph1.json
...
Copy the utility to the stack user’s home directory on the undercloud, and then use it to generate a node_data_lookup.json file that you can pass to the openstack overcloud deploy command:
./make_ceph_disk_list.py -i ceph*.json -o node_data_lookup.json -k by_path
- The -i option can take an expression such as *.json or a list of files as input.
- The -k option defines the key of the ironic disk data structure used to identify the OSD disks.
Note: Red Hat does not recommend using name because it produces a list of devices such as /dev/sdd, which may not always point to the same device on reboot. Instead, Red Hat recommends that you use by_path, which is the default option if -k is not specified.
Note: You can only define NodeDataLookup once during a deployment, so pass the utility the introspection data files for all nodes that host Ceph OSDs. The Bare Metal service reserves one of the available disks on the system as the root disk. The utility always excludes the root disk from the list of generated devices.
- Run the ./make_ceph_disk_list.py --help command to see other available options.
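For example, pass the generated file to the deployment command with the -e option, alongside the Ceph environment files described earlier in this chapter; this is a sketch, so adjust the file list to match your deployment:
$ openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /home/stack/templates/ceph-config.yaml \
  -e node_data_lookup.json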
5.2.5. Mapping the disk layout to non-homogeneous Ceph Storage nodes
Non-homogeneous Ceph Storage nodes can cause performance issues, such as unpredictable performance loss. Although you can configure non-homogeneous Ceph Storage nodes in your Red Hat OpenStack Platform environment, Red Hat does not recommend it.
By default, all nodes that host Ceph OSDs use the global devices and dedicated_devices lists that you set in Section 5.2, “Mapping the Ceph Storage node disk layout”.
This default configuration is appropriate when all Ceph OSD nodes have homogeneous hardware. However, if a subset of these servers does not have homogeneous hardware, then you must define a node-specific disk configuration in the director.
To identify nodes that host Ceph OSDs, inspect the roles_data.yaml file and identify all roles that include the OS::TripleO::Services::CephOSD service.
To define a node-specific configuration, create a custom environment file that identifies each server and includes a list of local variables that override the global variables, and then include the environment file in the openstack overcloud deploy command. For example, create a node-specific configuration file called node-spec-overrides.yaml.
You can extract the machine unique UUID from each individual server or from the Ironic database.
To locate the UUID for an individual server, log in to the server and run the following command:
dmidecode -s system-uuid
To extract the UUID from the Ironic database, run the following command on the undercloud:
openstack baremetal introspection data save NODE-ID | jq .extra.system.product.uuid
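For example, using one of the Ceph nodes from earlier in this chapter, the command and its output might look like the following; the node name and UUID shown are illustrative:
(undercloud) $ openstack baremetal introspection data save oc0-ceph-0 | jq .extra.system.product.uuid
"32E87B4C-C4A7-418E-865B-191684A6883B"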
If the undercloud.conf file does not have inspection_extras = true prior to undercloud installation or upgrade and introspection, then the machine unique UUID will not be in the Ironic database.
The machine unique UUID is not the Ironic UUID.
A valid node-spec-overrides.yaml file may look like the following:
parameter_defaults:
  NodeDataLookup: |
    {"32E87B4C-C4A7-418E-865B-191684A6883B": {"devices": ["/dev/sdc"]}}
All lines after the first two lines must be valid JSON. An easy way to verify that the JSON is valid is to use the jq command. For example:
- Remove the first two lines (parameter_defaults: and NodeDataLookup: |) from the file temporarily.
- Run cat node-spec-overrides.yaml | jq .
As the node-spec-overrides.yaml file grows, you can also use jq to ensure that the embedded JSON is valid. For example, because the devices and dedicated_devices lists should be the same length, use the following commands to verify that they are the same length before starting the deployment.
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$ cat node-spec-c05-h17-h21-h25-6048r.yaml | jq '.[] | .dedicated_devices | length'
33
30
33
(undercloud) [stack@b08-h02-r620 tht]$
In this example, the node-spec-c05-h17-h21-h25-6048r.yaml file describes three servers in rack c05 in which slots h17, h21, and h25 are missing disks. A more complicated example is included at the end of this section.
After you validate the JSON syntax, ensure that you repopulate the first two lines of the environment file and use the -e option to include the file in the deployment command.
In the following example, the updated environment file uses NodeDataLookup for Ceph deployment. All of the servers have a devices list of 35 disks, except one server that is missing a disk.
Use the following example environment file to override the default devices list for the node that has only 34 disks, so that the node uses that list instead of the global list.
parameter_defaults:
  # c05-h01-6048r is missing scsi-0:2:35:0 (00000000-0000-0000-0000-0CC47A6EFD0C)
  NodeDataLookup: |
    {
      "00000000-0000-0000-0000-0CC47A6EFD0C": {
        "devices": [
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:1:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:32:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:2:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:3:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:4:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:5:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:6:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:33:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:7:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:8:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:34:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:9:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:10:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:11:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:12:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:13:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:14:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:15:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:16:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:17:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:18:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:19:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:20:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:21:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:22:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:23:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:24:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:25:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:26:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:27:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:28:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:29:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:30:0",
          "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:2:31:0"
        ],
        "dedicated_devices": [
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:81:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1",
          "/dev/disk/by-path/pci-0000:84:00.0-nvme-1"
        ]
      }
    }
5.3. Controlling resources that are available to Ceph Storage containers
When you colocate Ceph Storage containers and Red Hat OpenStack Platform containers on the same server, the containers can compete for memory and CPU resources.
To control the amount of memory or CPU that Ceph Storage containers can use, define the CPU and memory limits as shown in the following example:
parameter_defaults:
  CephAnsibleExtraConfig:
    ceph_mds_docker_cpu_limit: 4
    ceph_mgr_docker_cpu_limit: 1
    ceph_mon_docker_cpu_limit: 1
    ceph_osd_docker_cpu_limit: 4
    ceph_mds_docker_memory_limit: 64438m
    ceph_mgr_docker_memory_limit: 64438m
    ceph_mon_docker_memory_limit: 64438m
The limits shown are for example only. Actual values can vary based on your environment.
The default value for all of the memory limits specified in the example is the total host memory on the system. For example, ceph-ansible uses "{{ ansible_memtotal_mb }}m".
The ceph_osd_docker_memory_limit parameter is intentionally excluded from the example. Do not use the ceph_osd_docker_memory_limit parameter. For more information, see Reserving Memory Resources for Ceph in the Hyper-Converged Infrastructure Guide.
If the server on which the containers are colocated does not have sufficient memory or CPU, or if your design requires physical isolation, you can use composable services to deploy Ceph Storage containers to additional nodes. For more information, see Composable Services and Custom Roles in the Advanced Overcloud Customization guide.
5.4. Overriding Ansible environment variables
The Red Hat OpenStack Platform Workflow service (mistral) uses Ansible to configure Ceph Storage, but you can customize the Ansible environment by using Ansible environment variables.
Procedure
To override an ANSIBLE_* environment variable, use the CephAnsibleEnvironmentVariables heat template parameter.
This example configuration increases the number of forks and SSH retries:
parameter_defaults:
  CephAnsibleEnvironmentVariables:
    ANSIBLE_SSH_RETRIES: '6'
    DEFAULT_FORKS: '35'
For more information about Ansible environment variables, see Ansible Configuration Settings.
For more information about how to customize your Ceph Storage cluster, see Customizing the Ceph Storage cluster.