Chapter 11. Contacting Red Hat support for service
If the information in this guide did not help you to solve the problem, this chapter explains how to contact the Red Hat support service.
11.1. Prerequisites
- A Red Hat support account.
11.2. Providing information to Red Hat Support engineers
If you are unable to fix problems related to Red Hat Ceph Storage, contact the Red Hat Support Service and provide enough information to help the support engineers troubleshoot the problem you encounter more quickly.
Prerequisites
- Root-level access to the node.
- A Red Hat support account.
Procedure
- Open a support ticket on the Red Hat Customer Portal.
- Ideally, attach an sosreport to the ticket. See the What is a sosreport and how to create one in Red Hat Enterprise Linux 4.6 and later? solution for details.
- If the Ceph daemons fail with a segmentation fault, consider generating a human-readable core dump file. See Generating readable core dump files for details.
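As a sketch, an sosreport can typically be generated non-interactively as follows. The --batch and --case-id options assume a reasonably recent sos package; replace CASE_NUMBER with your support case number:

```
# Red Hat Enterprise Linux 7:
[root@mon ~]# sosreport --batch --case-id CASE_NUMBER

# Red Hat Enterprise Linux 8:
[root@mon ~]# sos report --batch --case-id CASE_NUMBER
```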
11.3. Generating readable core dump files
When a Ceph daemon terminates unexpectedly with a segmentation fault, gather information about the failure and provide it to the Red Hat Support engineers.
Such information speeds up the initial investigation. Also, the Support engineers can compare the information from the core dump files with known issues of Red Hat Ceph Storage clusters.
11.3.1. Prerequisites
- Install the ceph-debuginfo package if it is not installed already:

  Enable the repository containing the ceph-debuginfo package:

  Red Hat Enterprise Linux 7:

  subscription-manager repos --enable=rhel-7-server-rhceph-4-DAEMON-debug-rpms

  Replace DAEMON with osd or mon, depending on the type of Ceph node.

  Red Hat Enterprise Linux 8:

  subscription-manager repos --enable=rhceph-4-tools-for-rhel-8-x86_64-debug-rpms

  Install the ceph-debuginfo package:

  [root@mon ~]# yum install ceph-debuginfo

- Ensure that the gdb package is installed and, if it is not, install it:

  [root@mon ~]# yum install gdb
Continue with the procedure based on the type of your deployment:
11.3.2. Generating readable core dump files on bare-metal deployments
Follow this procedure to generate a core dump file if you use Red Hat Ceph Storage on bare metal.
Procedure
- Enable generating core dump files for Ceph:

  Set the proper ulimits for the core dump files by adding the following parameter to the /etc/systemd/system.conf file:

  DefaultLimitCORE=infinity

  Comment out the PrivateTmp=true parameter in the Ceph daemon service file, located by default at /lib/systemd/system/CLUSTER_NAME-DAEMON@.service:

  # PrivateTmp=true

  Set the suid_dumpable flag to 2 to allow the Ceph daemons to generate core dump files:

  [root@mon ~]# sysctl fs.suid_dumpable=2

  Adjust the core dump files location:

  [root@mon ~]# sysctl kernel.core_pattern=/tmp/core

  Modify the /etc/systemd/coredump.conf file by adding the following lines under the [Coredump] section:

  ProcessSizeMax=8G
  ExternalSizeMax=8G
  JournalSizeMax=8G

  Reload the systemd service for the changes to take effect:

  [root@mon ~]# systemctl daemon-reload

  Restart the Ceph daemon for the changes to take effect:

  [root@mon ~]# systemctl restart ceph-DAEMON@ID

  Specify the daemon type (osd or mon) and its ID (numbers for OSDs, or short host names for Monitors), for example:

  [root@mon ~]# systemctl restart ceph-osd@1
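Note that values set with sysctl on the command line do not persist across reboots. If reproducing the failure may involve a reboot, the same values can be kept in a drop-in file, for example (the file name below is illustrative):

```
# /etc/sysctl.d/90-ceph-coredump.conf (example file name)
fs.suid_dumpable = 2
kernel.core_pattern = /tmp/core
```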
- Reproduce the failure, for example, by trying to start the daemon again.
- Use the GNU Debugger (GDB) to generate a readable backtrace from an application core dump file:

  gdb /usr/bin/ceph-DAEMON /tmp/core.PID

  Specify the daemon type and the PID of the failed process, for example:

  $ gdb /usr/bin/ceph-osd /tmp/core.123456

  At the GDB command prompt, disable paging and enable logging to a file by entering the commands set pag off and set log on:

  (gdb) set pag off
  (gdb) set log on

  Apply the backtrace command to all threads of the process by entering thr a a bt full:

  (gdb) thr a a bt full

  After the backtrace is generated, turn off logging by entering set log off:

  (gdb) set log off
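The interactive GDB session above can also be run non-interactively; a sketch using GDB's batch mode (replace DAEMON and PID as before; the output file name backtrace.txt is an example):

```
gdb --batch -ex 'set pagination off' -ex 'thread apply all bt full' /usr/bin/ceph-DAEMON /tmp/core.PID > backtrace.txt
```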
- Transfer the log file gdb.txt to the system from which you access the Red Hat Customer Portal, and attach it to a support ticket.
11.3.3. Generating readable core dump files in containerized deployments
Follow this procedure to generate a core dump file if you use Red Hat Ceph Storage in containers. The procedure involves two scenarios of capturing the core dump file:
- When a Ceph process terminates unexpectedly due to a SIGILL, SIGTRAP, SIGABRT, or SIGSEGV error.
- Manually, for example when debugging issues such as Ceph processes consuming high CPU cycles or not responding.
Prerequisites
- Root-level access to the container node running the Ceph containers.
- Installation of the appropriate debugging packages.
- Installation of the GNU Project Debugger (gdb) package.
Procedure
If a Ceph process terminates unexpectedly due to a SIGILL, SIGTRAP, SIGABRT, or SIGSEGV error:

- Set the core pattern to the systemd-coredump service on the node where the container with the failed Ceph process is running, for example:

  [root@mon]# echo "| /usr/lib/systemd/systemd-coredump %P %u %g %s %t %e" > /proc/sys/kernel/core_pattern

- Watch for the next container failure due to a Ceph process, and search for the core dump file in the /var/lib/systemd/coredump/ directory, for example:

  [root@mon]# ls -ltr /var/lib/systemd/coredump
  total 8232
  -rw-r-----. 1 root root 8427548 Jan 22 19:24 core.ceph-osd.167.5ede29340b6c4fe4845147f847514c12.15622.1584573794000000.xz
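The systemd-coredump file name itself encodes the executable name and the PID, which helps when several core files accumulate. A minimal shell sketch, assuming the conventional systemd-coredump naming pattern:

```shell
# Decode the systemd-coredump file name shown above; fields are
# dot-separated: core.<executable>.<uid>.<boot-id>.<pid>.<timestamp-us>.xz
name='core.ceph-osd.167.5ede29340b6c4fe4845147f847514c12.15622.1584573794000000.xz'
exe=$(echo "$name" | cut -d. -f2)
pid=$(echo "$name" | cut -d. -f5)
echo "executable=$exe pid=$pid"
```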
To manually capture a core dump file for the Ceph Monitors and Ceph Managers:
- Get the ceph-mon package details of the Ceph daemon from the container:

  Red Hat Enterprise Linux 7:
  [root@mon]# docker exec -it NAME /bin/bash
  [root@mon]# rpm -qa | grep ceph

  Red Hat Enterprise Linux 8:
  [root@mon]# podman exec -it NAME /bin/bash
  [root@mon]# rpm -qa | grep ceph

  Replace NAME with the name of the Ceph container.
- Make a backup copy of the ceph-mon@.service file and open it for editing:

  [root@mon]# cp /etc/systemd/system/ceph-mon@.service /etc/systemd/system/ceph-mon@.service.orig

- In the ceph-mon@.service file, add these three options to the [Service] section, each on a separate line:

  --pid=host \
  --ipc=host \
  --cap-add=SYS_PTRACE \

  Example

  [Unit]
  Description=Ceph Monitor
  After=docker.service

  [Service]
  EnvironmentFile=-/etc/environment
  ExecStartPre=-/usr/bin/docker rm ceph-mon-%i
  ExecStartPre=/bin/sh -c '"$(command -v mkdir)" -p /etc/ceph /var/lib/ceph/mon'
  ExecStart=/usr/bin/docker run --rm --name ceph-mon-%i \
  --memory=924m \
  --cpu-quota=100000 \
  -v /var/lib/ceph:/var/lib/ceph:z \
  -v /etc/ceph:/etc/ceph:z \
  -v /var/run/ceph:/var/run/ceph:z \
  -v /etc/localtime:/etc/localtime:ro \
  --net=host \
  --privileged=true \
  --ipc=host \
  --pid=host \
  --cap-add=SYS_PTRACE \
  -e IP_VERSION=4 \
  -e MON_IP=10.74.131.17 \
  -e CLUSTER=ceph \
  -e FSID=9448efca-b1a1-45a3-bf7b-b55cba696a6e \
  -e CEPH_PUBLIC_NETWORK=10.74.131.0/24 \
  -e CEPH_DAEMON=MON \
  \
  registry.access.redhat.com/rhceph/rhceph-3-rhel7:latest
  ExecStop=-/usr/bin/docker stop ceph-mon-%i
  ExecStopPost=-/bin/rm -f /var/run/ceph/ceph-mon.pd-cephcontainer-mon01.asok
  Restart=always
  RestartSec=10s
  TimeoutStartSec=120
  TimeoutStopSec=15

  [Install]
  WantedBy=multi-user.target
- Restart the Ceph Monitor daemon:

  Syntax

  systemctl restart ceph-mon@MONITOR_ID

  Replace MONITOR_ID with the ID number of the Ceph Monitor.

  Example

  [root@mon]# systemctl restart ceph-mon@1
- Install the gdb package inside the Ceph Monitor container:

  Red Hat Enterprise Linux 7:
  [root@mon]# docker exec -it ceph-mon-MONITOR_ID /bin/bash
  sh $ yum install gdb

  Red Hat Enterprise Linux 8:
  [root@mon]# podman exec -it ceph-mon-MONITOR_ID /bin/bash
  sh $ yum install gdb

  Replace MONITOR_ID with the ID number of the Ceph Monitor.
- Find the process ID:

  Syntax

  ps -aef | grep PROCESS | grep -v run

  Replace PROCESS with the name of the failed process, for example ceph-mon.

  Example

  [root@mon]# ps -aef | grep ceph-mon | grep -v run
  ceph     15390 15266  0 18:54 ?        00:00:29 /usr/bin/ceph-mon --cluster ceph --setroot ceph --setgroup ceph -d -i 5
  ceph     18110 17985  1 19:40 ?        00:00:08 /usr/bin/ceph-mon --cluster ceph --setroot ceph --setgroup ceph -d -i 2
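When scripting this step, the process ID is the second column of the ps -aef output; a small sketch (the sample line reproduces the example output above):

```shell
# Extract the PID (second whitespace-separated column) from a
# ps -aef line; the sample mirrors the example output above.
ps_line='ceph     18110 17985  1 19:40 ?        00:00:08 /usr/bin/ceph-mon --cluster ceph --setroot ceph --setgroup ceph -d -i 2'
pid=$(echo "$ps_line" | awk '{print $2}')
echo "$pid"
```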
- Generate the core dump file:

  Syntax

  gcore ID

  Replace ID with the ID of the failed process that you got from the previous step, for example 18110:

  Example

  [root@mon]# gcore 18110
  warning: target file /proc/18110/cmdline contained unexpected null characters
  Saved corefile core.18110
- Verify that the core dump file has been generated correctly:

  Example

  [root@mon]# ls -ltr
  total 709772
  -rw-r--r--. 1 root root 726799544 Mar 18 19:46 core.18110
- Copy the core dump file outside of the Ceph Monitor container:

  Red Hat Enterprise Linux 7:
  [root@mon]# docker cp ceph-mon-MONITOR_ID:/tmp/mon.core.MONITOR_PID /tmp

  Red Hat Enterprise Linux 8:
  [root@mon]# podman cp ceph-mon-MONITOR_ID:/tmp/mon.core.MONITOR_PID /tmp

  Replace MONITOR_ID with the ID number of the Ceph Monitor, and replace MONITOR_PID with the process ID number.
- Restore the backup copy of the ceph-mon@.service file:

  [root@mon]# cp /etc/systemd/system/ceph-mon@.service.orig /etc/systemd/system/ceph-mon@.service
- Restart the Ceph Monitor daemon:

  Syntax

  systemctl restart ceph-mon@MONITOR_ID

  Replace MONITOR_ID with the ID number of the Ceph Monitor.

  Example

  [root@mon]# systemctl restart ceph-mon@1
- Upload the core dump file for analysis by Red Hat support; see step 4.
To manually capture a core dump file for Ceph OSDs:
- Get the ceph-osd package details of the Ceph daemon from the container:

  Red Hat Enterprise Linux 7:
  [root@osd]# docker exec -it NAME /bin/bash
  [root@osd]# rpm -qa | grep ceph

  Red Hat Enterprise Linux 8:
  [root@osd]# podman exec -it NAME /bin/bash
  [root@osd]# rpm -qa | grep ceph

  Replace NAME with the name of the Ceph container.
- Install the same version of the ceph-osd package on the node where the Ceph containers are running:

  Red Hat Enterprise Linux 7:
  [root@osd]# yum install ceph-osd

  Red Hat Enterprise Linux 8:
  [root@osd]# dnf install ceph-osd

  If needed, enable the appropriate repository first. See the Enabling the Red Hat Ceph Storage repositories section in the Installation Guide for details.
- Find the ID of the process that has failed:

  ps -aef | grep PROCESS | grep -v run

  Replace PROCESS with the name of the failed process, for example ceph-osd.

  [root@osd]# ps -aef | grep ceph-osd | grep -v run
  ceph     15390 15266  0 18:54 ?        00:00:29 /usr/bin/ceph-osd --cluster ceph --setroot ceph --setgroup ceph -d -i 5
  ceph     18110 17985  1 19:40 ?        00:00:08 /usr/bin/ceph-osd --cluster ceph --setroot ceph --setgroup ceph -d -i 2
- Generate the core dump file:

  gcore ID

  Replace ID with the ID of the failed process that you got from the previous step, for example 18110:

  [root@osd]# gcore 18110
  warning: target file /proc/18110/cmdline contained unexpected null characters
  Saved corefile core.18110
- Verify that the core dump file has been generated correctly:

  [root@osd]# ls -ltr
  total 709772
  -rw-r--r--. 1 root root 726799544 Mar 18 19:46 core.18110
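Core dump files can be large (about 700 MB in the example above), so compressing them before upload shortens the transfer. A sketch, using a stand-in file in place of the real core dump:

```shell
# Illustration only: a 1 MB stand-in file is created here in place
# of the real core.18110 from the previous steps, then compressed
# with xz before it is attached to the support case.
dd if=/dev/zero of=core.18110 bs=1024 count=1024 status=none
xz -f core.18110           # replaces core.18110 with core.18110.xz
ls -l core.18110.xz
```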
- Upload the core dump file for analysis by Red Hat support; see the next step.
- Upload the core dump file for analysis to a Red Hat support case. See Providing information to Red Hat Support engineers for details.
11.3.4. Additional Resources
- The How to use gdb to generate a readable backtrace from an application core solution on the Red Hat Customer Portal
- The How to enable core file dumps when an application crashes or segmentation faults solution on the Red Hat Customer Portal