Chapter 4. Upgrading a Red Hat Ceph Storage Cluster
This section describes how to upgrade to a new major or minor version of Red Hat Ceph Storage.
Previously, Red Hat did not provide the ceph-ansible package for Ubuntu. In Red Hat Ceph Storage version 3 and later, you can use the Ansible automation application to upgrade a Ceph cluster from an Ubuntu node.
- To upgrade a storage cluster, see Section 4.1, “Upgrading the Storage Cluster”.
Use the Ansible rolling_update.yml playbook located in the /usr/share/ceph-ansible/infrastructure-playbooks/ directory from the administration node to upgrade between two major or minor versions of Red Hat Ceph Storage, or to apply asynchronous updates.
Ansible upgrades the Ceph nodes in the following order:
- Monitor nodes
- MGR nodes
- OSD nodes
- MDS nodes
- Ceph Object Gateway nodes
- All other Ceph client nodes
Red Hat Ceph Storage 3 introduces several changes in Ansible configuration files located in the /usr/share/ceph-ansible/group_vars/ directory; certain parameters were renamed or removed. Therefore, make backup copies of the all.yml and osds.yml files before creating new copies from the all.yml.sample and osds.yml.sample files after upgrading to version 3. For more details about the changes, see Appendix H, Changes in Ansible Variables Between Version 2 and 3.
Red Hat Ceph Storage 3.1 and later introduces new Ansible playbooks to optimize storage for performance when using Object Gateway and high-speed NVMe-based SSDs (and SATA SSDs). The playbooks do this by placing journals and bucket indexes together on SSDs, which can increase performance compared to having all journals on one device. These playbooks are designed to be used when installing Ceph. Existing OSDs continue to work and need no extra steps during an upgrade. There is no way to upgrade a Ceph cluster while simultaneously reconfiguring OSDs to optimize storage in this way. To use different devices for journals or bucket indexes requires reprovisioning the OSDs. For more information, see Using NVMe with LVM optimally in Ceph Object Gateway for Production.
The rolling_update.yml playbook includes the serial variable, which adjusts the number of nodes to be updated simultaneously. Red Hat strongly recommends using the default value (1), which ensures that Ansible upgrades the cluster nodes one by one.
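For reference, serial is a standard Ansible play keyword; a minimal sketch of how it might appear in a play header, assuming the default of one node at a time, looks like this. The exact plays and group names in the shipped playbook may differ:
- hosts: osds
  serial: 1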
If the upgrade fails at any point, check the cluster status with the ceph status command to determine the reason for the failure. If you are not sure of the failure reason or how to resolve it, contact Red Hat Support for assistance.
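For example, to inspect the cluster from a Monitor node (the host name shown is hypothetical); the ceph health detail command can provide additional information about any warning or error states:
[root@monitor ~]# ceph status
[root@monitor ~]# ceph health detail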
When using the rolling_update.yml playbook to upgrade to any Red Hat Ceph Storage 3.x version, users who use the Ceph File System (CephFS) must manually update the Metadata Server (MDS) cluster. This is due to a known issue.
Comment out the MDS hosts in /etc/ansible/hosts before upgrading the entire cluster using the ceph-ansible rolling_update.yml playbook, and then upgrade the MDS nodes manually. In the /etc/ansible/hosts file:
#[mdss]
#host-abc
For more details about this known issue, including how to update the MDS cluster, refer to the Red Hat Ceph Storage 3.0 Release Notes.
When upgrading a Red Hat Ceph Storage cluster from a previous version to 3.2, the Ceph Ansible configuration defaults the object store type to BlueStore. If you still want to use FileStore as the OSD object store, explicitly set the Ceph Ansible configuration to FileStore. This ensures that newly deployed and replaced OSDs use FileStore.
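For example, the relevant setting in the group_vars/all.yml file, shown again later in the upgrade procedure, is:
osd_objectstore: filestore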
When using the rolling_update.yml playbook to upgrade to any Red Hat Ceph Storage 3.x version, if you are using a multisite Ceph Object Gateway configuration, you do not have to manually update the all.yml file to specify the multisite configuration.
Prerequisites
- If the Ceph nodes are not connected to the Red Hat Content Delivery Network (CDN) and you used an ISO image to install Red Hat Ceph Storage, update the local repository with the latest version of Red Hat Ceph Storage. See Section 2.4, “Enabling the Red Hat Ceph Storage Repositories” for details.
If upgrading from Red Hat Ceph Storage 2.x to 3.x, on the Ansible administration node and the RBD mirroring node, enable the Red Hat Ceph Storage 3 Tools repository:
[root@admin ~]$ sudo bash -c 'umask 0077; echo deb https://customername:customerpasswd@rhcs.download.redhat.com/3-updates/Tools $(lsb_release -sc) main | tee /etc/apt/sources.list.d/Tools.list'
[root@admin ~]$ sudo bash -c 'wget -O - https://www.redhat.com/security/fd431d51.txt | apt-key add -'
[root@admin ~]$ sudo apt-get update
If upgrading from RHCS 2.x to 3.x, or from RHCS 3.x to the latest version, on the Ansible administration node, ensure the latest version of the ceph-ansible package is installed:
[root@admin ~]$ sudo apt-get install ceph-ansible
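As a quick check, not part of the official procedure, you can query the package manager to confirm which ceph-ansible version is installed and which candidate version is available:
[root@admin ~]$ sudo apt-cache policy ceph-ansible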
In the rolling_update.yml playbook, change the health_osd_check_retries and health_osd_check_delay values to 50 and 30, respectively:
health_osd_check_retries: 50
health_osd_check_delay: 30
With these values set, Ansible waits up to 25 minutes for each OSD node, checking the storage cluster health every 30 seconds before continuing the upgrade process.
Note: Adjust the health_osd_check_retries option value up or down based on the used storage capacity of the storage cluster. For example, if you are using 218 TB out of 436 TB, that is, 50% of the storage capacity, then set the health_osd_check_retries option to 50.

If the cluster you want to upgrade contains Ceph Block Device images that use the exclusive-lock feature, ensure that all Ceph Block Device users have permissions to blacklist clients:
ceph auth caps client.<ID> mon 'allow r, allow command "osd blacklist"' osd '<existing-OSD-user-capabilities>'
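For example, for a hypothetical client.cinder user whose existing OSD capabilities are profile rbd pool=volumes, the command might look like the following; adjust the OSD capabilities to match what ceph auth get client.cinder reports for your cluster:
ceph auth caps client.cinder mon 'allow r, allow command "osd blacklist"' osd 'profile rbd pool=volumes'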
4.1. Upgrading the Storage Cluster
Procedure
Use the following commands from the Ansible administration node.
As the root user, navigate to the /usr/share/ceph-ansible/ directory:
[root@admin ~]# cd /usr/share/ceph-ansible/
Skip this step when upgrading from Red Hat Ceph Storage version 3.x to the latest version. Back up the group_vars/all.yml, group_vars/osds.yml, and group_vars/clients.yml files:
[root@admin ceph-ansible]# cp group_vars/all.yml group_vars/all_old.yml
[root@admin ceph-ansible]# cp group_vars/osds.yml group_vars/osds_old.yml
[root@admin ceph-ansible]# cp group_vars/clients.yml group_vars/clients_old.yml
Skip this step when upgrading from Red Hat Ceph Storage version 3.x to the latest version. When upgrading from Red Hat Ceph Storage 2.x to 3.x, create new copies of the group_vars/all.yml.sample, group_vars/osds.yml.sample, and group_vars/clients.yml.sample files, and rename them to group_vars/all.yml, group_vars/osds.yml, and group_vars/clients.yml, respectively. Open and edit them accordingly. For details, see Appendix H, Changes in Ansible Variables Between Version 2 and 3 and Section 3.2, “Installing a Red Hat Ceph Storage Cluster”.
[root@admin ceph-ansible]# cp group_vars/all.yml.sample group_vars/all.yml
[root@admin ceph-ansible]# cp group_vars/osds.yml.sample group_vars/osds.yml
[root@admin ceph-ansible]# cp group_vars/clients.yml.sample group_vars/clients.yml
Skip this step when upgrading from Red Hat Ceph Storage version 3.x to the latest version. When upgrading from Red Hat Ceph Storage 2.x to 3.x, open the group_vars/clients.yml file, and uncomment the following lines:
keys:
  - { name: client.test, caps: { mon: "allow r", osd: "allow class-read object_prefix rbd_children, allow rwx pool=test" }, mode: "{{ ceph_keyring_permissions }}" }
Replace client.test with the real client name, and add the client key to the client definition line, for example:
key: "ADD-KEYRING-HERE=="
Now the whole line example would look similar to this:
- { name: client.test, key: "AQAin8tUMICVFBAALRHNrV0Z4MXupRw4v9JQ6Q==", caps: { mon: "allow r", osd: "allow class-read object_prefix rbd_children, allow rwx pool=test" }, mode: "{{ ceph_keyring_permissions }}" }
Note: To get the client key, run the ceph auth get-or-create command to view the key for the named client.
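For example, to display the keyring for the client.test user shown above (a placeholder client name), you could run:
[root@monitor ~]# ceph auth get-or-create client.test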
In the group_vars/all.yml file, uncomment the upgrade_ceph_packages option and set it to True:
upgrade_ceph_packages: True
Add the fetch_directory parameter to the group_vars/all.yml file:
fetch_directory: <full_directory_path>
Replace:
- <full_directory_path> with a writable location, such as the Ansible user’s home directory. Provide the existing path that was used for the initial storage cluster installation.
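For example, assuming the keys were originally fetched under the Ansible user’s home directory (a hypothetical path):
fetch_directory: /home/ansible/ceph-ansible-keys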
If the existing path is lost or missing, then do the following first:
Add the following options to the existing group_vars/all.yml file:
fsid: <add_the_fsid>
generate_fsid: false
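If you need to look up the cluster fsid, one way is to query it from a Monitor node, for example:
[root@monitor ~]# ceph fsid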
Run the take-over-existing-cluster.yml Ansible playbook:
[user@admin ceph-ansible]$ cp infrastructure-playbooks/take-over-existing-cluster.yml .
[user@admin ceph-ansible]$ ansible-playbook take-over-existing-cluster.yml
If the cluster you want to upgrade contains any Ceph Object Gateway nodes, add the radosgw_interface parameter to the group_vars/all.yml file:
radosgw_interface: <interface>
Replace:
- <interface> with the interface that the Ceph Object Gateway nodes listen to.
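For example, if the Ceph Object Gateway nodes listen on a hypothetical interface named eth0:
radosgw_interface: eth0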
Starting with Red Hat Ceph Storage 3.2, the default OSD object store is BlueStore. To keep the traditional OSD object store, you must explicitly set the osd_objectstore option to filestore in the group_vars/all.yml file:
osd_objectstore: filestore
Note: With the osd_objectstore option set to filestore, replacing an OSD will use FileStore instead of BlueStore.

In the Ansible inventory file located at /etc/ansible/hosts, add the Ceph Manager (ceph-mgr) nodes under the [mgrs] section. Colocate the Ceph Manager daemon with the Monitor nodes. Skip this step when upgrading from version 3.x to the latest version.
[mgrs]
<monitor-host-name>
<monitor-host-name>
<monitor-host-name>
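For example, with three hypothetical Monitor host names:
[mgrs]
monitor01
monitor02
monitor03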
Copy rolling_update.yml from the infrastructure-playbooks directory to the current directory:
[root@admin ceph-ansible]# cp infrastructure-playbooks/rolling_update.yml .
Important: Do not use the limit Ansible option with the rolling_update.yml playbook.

Create the /var/log/ansible/ directory and assign the appropriate permissions for the ansible user:
[root@admin ceph-ansible]# mkdir /var/log/ansible
[root@admin ceph-ansible]# chown ansible:ansible /var/log/ansible
[root@admin ceph-ansible]# chmod 755 /var/log/ansible
Edit the /usr/share/ceph-ansible/ansible.cfg file, updating the log_path value as follows:
log_path = /var/log/ansible/ansible.log
As the Ansible user, run the playbook:
[user@admin ceph-ansible]$ ansible-playbook rolling_update.yml
While logged in as the root user on the RBD mirroring daemon node, upgrade rbd-mirror manually:
$ sudo apt-get upgrade rbd-mirror
Restart the daemon:
# systemctl restart ceph-rbd-mirror@<client-id>
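For example, if the rbd-mirror daemon runs with a hypothetical client ID of admin:
# systemctl restart ceph-rbd-mirror@admin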
Verify that the cluster health is OK. Log in to a monitor node as the root user and run the ceph status command:
[root@monitor ~]# ceph -s
If working in an OpenStack environment, update all the cephx users to use the RBD profile for pools. The following commands must be run as the root user:

Glance users
ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=<glance-pool-name>'
Example
[root@monitor ~]# ceph auth caps client.glance mon 'profile rbd' osd 'profile rbd pool=images'
Cinder users
ceph auth caps client.cinder mon 'profile rbd' osd 'profile rbd pool=<cinder-volume-pool-name>, profile rbd pool=<nova-pool-name>, profile rbd-read-only pool=<glance-pool-name>'
Example
[root@monitor ~]# ceph auth caps client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
OpenStack general users
ceph auth caps client.openstack mon 'profile rbd' osd 'profile rbd-read-only pool=<cinder-volume-pool-name>, profile rbd pool=<nova-pool-name>, profile rbd-read-only pool=<glance-pool-name>'
Example
[root@monitor ~]# ceph auth caps client.openstack mon 'profile rbd' osd 'profile rbd-read-only pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
Important: Do these CAPS updates before performing any live client migrations. This allows clients to use the new libraries running in memory, causing the old CAPS settings to drop from cache and the new RBD profile settings to be applied.