Chapter 12. Cephadm operations
As a storage administrator, you can carry out Cephadm operations in the Red Hat Ceph Storage cluster.
12.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
12.2. Monitor cephadm log messages
Cephadm logs to the cephadm cluster log channel so you can monitor progress in real time.
To monitor progress in realtime, run the following command:
Example
[ceph: root@host01 /]# ceph -W cephadm
Example
2022-06-10T17:51:36.335728+0000 mgr.Ceph5-1.nqikfh [INF] refreshing Ceph5-adm facts 2022-06-10T17:51:37.170982+0000 mgr.Ceph5-1.nqikfh [INF] deploying 1 monitor(s) instead of 2 so monitors may achieve consensus 2022-06-10T17:51:37.173487+0000 mgr.Ceph5-1.nqikfh [ERR] It is NOT safe to stop ['mon.Ceph5-adm']: not enough monitors would be available (Ceph5-2) after stopping mons [Ceph5-adm] 2022-06-10T17:51:37.174415+0000 mgr.Ceph5-1.nqikfh [INF] Checking pool "nfs-ganesha" exists for service nfs.foo 2022-06-10T17:51:37.176389+0000 mgr.Ceph5-1.nqikfh [ERR] Failed to apply nfs.foo spec NFSServiceSpec({'placement': PlacementSpec(count=1), 'service_type': 'nfs', 'service_id': 'foo', 'unmanaged': False, 'preview_only': False, 'pool': 'nfs-ganesha', 'namespace': 'nfs-ns'}): Cannot find pool "nfs-ganesha" for service nfs.foo Traceback (most recent call last): File "/usr/share/ceph/mgr/cephadm/serve.py", line 408, in _apply_all_services if self._apply_service(spec): File "/usr/share/ceph/mgr/cephadm/serve.py", line 509, in _apply_service config_func(spec) File "/usr/share/ceph/mgr/cephadm/services/nfs.py", line 23, in config self.mgr._check_pool_exists(spec.pool, spec.service_name()) File "/usr/share/ceph/mgr/cephadm/module.py", line 1840, in _check_pool_exists raise OrchestratorError(f'Cannot find pool "{pool}" for ' orchestrator._interface.OrchestratorError: Cannot find pool "nfs-ganesha" for service nfs.foo 2022-06-10T17:51:37.179658+0000 mgr.Ceph5-1.nqikfh [INF] Found osd claims -> {} 2022-06-10T17:51:37.180116+0000 mgr.Ceph5-1.nqikfh [INF] Found osd claims for drivegroup all-available-devices -> {} 2022-06-10T17:51:37.182138+0000 mgr.Ceph5-1.nqikfh [INF] Applying all-available-devices on host Ceph5-adm... 2022-06-10T17:51:37.182987+0000 mgr.Ceph5-1.nqikfh [INF] Applying all-available-devices on host Ceph5-1... 2022-06-10T17:51:37.183395+0000 mgr.Ceph5-1.nqikfh [INF] Applying all-available-devices on host Ceph5-2... 2022-06-10T17:51:43.373570+0000 mgr.Ceph5-1.nqikfh [INF] Reconfiguring node-exporter.Ceph5-1 (unknown last config time)... 2022-06-10T17:51:43.373840+0000 mgr.Ceph5-1.nqikfh [INF] Reconfiguring daemon node-exporter.Ceph5-1 on Ceph5-1
By default, the log displays info-level events and above. To see the debug-level messages, run the following commands:
Example
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/log_to_cluster_level debug [ceph: root@host01 /]# ceph -W cephadm --watch-debug [ceph: root@host01 /]# ceph -W cephadm --verbose
Return debugging level to default
info
:Example
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/log_to_cluster_level info
To see the recent events, run the following command:
Example
[ceph: root@host01 /]# ceph log last cephadm
Theses events are also logged to ceph.cephadm.log
file on the monitor hosts and to the monitor daemon’s stderr
12.3. Ceph daemon logs
You can view the Ceph daemon logs through stderr
or files.
Logging to stdout
Traditionally, Ceph daemons have logged to /var/log/ceph
. By default, Cephadm daemons log to stderr
and the logs are captured by the container runtime environment. For most systems, by default, these logs are sent to journald
and accessible through the journalctl
command.
For example, to view the logs for the daemon on host01 for a storage cluster with ID 5c5a50ae-272a-455d-99e9-32c6a013e694:
Example
[ceph: root@host01 /]# journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@host01
This works well for normal Cephadm operations when logging levels are low.
To disable logging to
stderr
, set the following values:Example
[ceph: root@host01 /]# ceph config set global log_to_stderr false [ceph: root@host01 /]# ceph config set global mon_cluster_log_to_stderr false
Logging to files
You can also configure Ceph daemons to log to files instead of stderr
. When logging to files, Ceph logs are located in /var/log/ceph/CLUSTER_FSID
.
To enable logging to files, set the follwing values:
Example
[ceph: root@host01 /]# ceph config set global log_to_file true [ceph: root@host01 /]# ceph config set global mon_cluster_log_to_file true
Red Hat recommends disabling logging to stderr
to avoid double logs.
Currently log rotation to a non-default path is not supported.
By default, Cephadm sets up log rotation on each host to rotate these files. You can configure the logging retention schedule by modifying /etc/logrotate.d/ceph.CLUSTER_FSID
.
12.4. Data location
Cephadm daemon data and logs are located in slightly different locations than the older versions of Ceph:
-
/var/log/ceph/CLUSTER_FSID
contains all the storage cluster logs. Note that by default Cephadm logs throughstderr
and the container runtime, so these logs are usually not present. -
/var/lib/ceph/CLUSTER_FSID
contains all the cluster daemon data, besides logs. -
var/lib/ceph/CLUSTER_FSID/DAEMON_NAME
contains all the data for an specific daemon. -
/var/lib/ceph/CLUSTER_FSID/crash
contains the crash reports for the storage cluster. -
/var/lib/ceph/CLUSTER_FSID/removed
contains old daemon data directories for the stateful daemons, for example monitor or Prometheus, that have been removed by Cephadm.
Disk usage
A few Ceph daemons may store a significant amount of data in /var/lib/ceph
, notably the monitors and Prometheus daemon, hence Red Hat recommends moving this directory to its own disk, partition, or logical volume so that the root file system is not filled up.
12.5. Cephadm health checks
As a storage administrator, you can monitor the Red Hat Ceph Storage cluster with the additional healthchecks provided by the Cephadm module. This is supplementary to the default healthchecks provided by the storage cluster.
12.5.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
12.5.2. Cephadm operations health checks
Healthchecks are executed when the Cephadm module is active. You can get the following health warnings:
CEPHADM_PAUSED
Cephadm background work is paused with the ceph orch pause
command. Cephadm continues to perform passive monitoring activities such as checking the host and daemon status, but it does not make any changes like deploying or removing daemons. You can resume Cephadm work with the ceph orch resume
command.
CEPHADM_STRAY_HOST
One or more hosts have running Ceph daemons but are not registered as hosts managed by the Cephadm module. This means that those services are not currently managed by Cephadm, for example, a restart and upgrade that is included in the ceph orch ps
command. You can manage the host(s) with the ceph orch host add HOST_NAME
command but ensure that SSH access to the remote hosts is configured. Alternatively, you can manually connect to the host and ensure that services on that host are removed or migrated to a host that is managed by Cephadm. You can also disable this warning with the setting ceph config set mgr mgr/cephadm/warn_on_stray_hosts false
CEPHADM_STRAY_DAEMON
One or more Ceph daemons are running but are not managed by the Cephadm module. This might be because they were deployed using a different tool, or because they were started manually. Those services are not currently managed by Cephadm, for example, a restart and upgrade that is included in the ceph orch ps
command.
If the daemon is a stateful one that is a monitor or OSD daemon, these daemons should be adopted by Cephadm. For stateless daemons, you can provision a new daemon with the ceph orch apply
command and then stop the unmanaged daemon.
You can disable this health warning with the setting ceph config set mgr mgr/cephadm/warn_on_stray_daemons false
.
CEPHADM_HOST_CHECK_FAILED
One or more hosts have failed the basic Cephadm host check, which verifies that:name: value
- The host is reachable and you can execute Cephadm.
- The host meets the basic prerequisites, like a working container runtime that is Podman , and working time synchronization. If this test fails, Cephadm wont be able to manage the services on that host.
You can manually run this check with the ceph cephadm check-host HOST_NAME
command. You can remove a broken host from management with the ceph orch host rm HOST_NAME
command. You can disable this health warning with the setting ceph config set mgr mgr/cephadm/warn_on_failed_host_check false
.
12.5.3. Cephadm configuration health checks
Cephadm periodically scans each of the hosts in the storage cluster, to understand the state of the OS, disks, and NICs . These facts are analyzed for consistency across the hosts in the storage cluster to identify any configuration anomalies. The configuration checks are an optional feature.
You can enable this feature with the following command:
Example
[ceph: root@host01 /]# ceph config set mgr mgr/cephadm/config_checks_enabled true
The configuration checks are triggered after each host scan, which is for a duration of one minute.
The
ceph -W cephadm
command shows log entries of the current state and outcome of the configuration checks as follows:Disabled state
Example
ALL cephadm checks are disabled, use 'ceph config set mgr mgr/cephadm/config_checks_enabled true' to enable
Enabled state
Example
CEPHADM 8/8 checks enabled and executed (0 bypassed, 0 disabled). No issues detected
The configuration checks themselves are managed through several
cephadm
subcommands.To determine whether the configuration checks are enabled, run the following command:
Example
[ceph: root@host01 /]# ceph cephadm config-check status
This command returns the status of the configuration checker as either Enabled or Disabled.
To list all the configuration checks and their current state, run the following command:
Example
[ceph: root@host01 /]# ceph cephadm config-check ls NAME HEALTHCHECK STATUS DESCRIPTION kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_netork osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active) kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent
Each configuration check is described as follows:
CEPHADM_CHECK_KERNEL_LSM
Each host within the storage cluster is expected to operate within the same Linux Security Module (LSM) state. For example, if the majority of the hosts are running with SELINUX in enforcing
mode, any host not running in this mode would be flagged as an anomaly and a healthcheck with a warning state is raised.
CEPHADM_CHECK_SUBSCRIPTION
This check relates to the status of the vendor subscription. This check is only performed for hosts using Red Hat Enterprise Linux, but helps to confirm that all the hosts are covered by an active subscription so that patches and updates are available.
CEPHADM_CHECK_PUBLIC_MEMBERSHIP
All members of the cluster should have NICs configured on at least one of the public network subnets. Hosts that are not on the public network will rely on routing which may affect performance.
CEPHADM_CHECK_MTU
The maximum transmission unit (MTU) of the NICs on OSDs can be a key factor in consistent performance. This check examines hosts that are running OSD services to ensure that the MTU is configured consistently within the cluster. This is determined by establishing the MTU setting that the majority of hosts are using, with any anomalies resulting in a Ceph healthcheck.
CEPHADM_CHECK_LINKSPEED
Similar to the MTU check, linkspeed consistency is also a factor in consistent cluster performance. This check determines the linkspeed shared by the majority of the OSD hosts, resulting in a healthcheck for any hosts that are set at a lower linkspeed rate.
CEPHADM_CHECK_NETWORK_MISSING
The public_network
and cluster_network
settings support subnet definitions for IPv4 and IPv6. If these settings are not found on any host in the storage cluster a healthcheck is raised.
CEPHADM_CHECK_CEPH_RELEASE
Under normal operations, the Ceph cluster should be running daemons under the same Ceph release, for example all Red Hat Ceph Storage cluster 5 releases. This check looks at the active release for each daemon, and reports any anomalies as a healthcheck. This check is bypassed if an upgrade process is active within the cluster.
CEPHADM_CHECK_KERNEL_VERSION
The OS kernel version is checked for consistency across the hosts. Once again, the majority of the hosts is used as the basis of identifying anomalies.