Chapter 3. Troubleshooting networking issues
This chapter lists basic troubleshooting procedures related to networking and the Network Time Protocol (NTP).
3.1. Prerequisites
- A running Red Hat Ceph Storage cluster.
3.2. Basic networking troubleshooting
Red Hat Ceph Storage depends heavily on a reliable network connection. Red Hat Ceph Storage nodes use the network to communicate with each other. Networking issues can cause many problems with Ceph OSDs, such as OSDs flapping or being incorrectly reported as down. Networking issues can also cause the Ceph Monitor's clock skew errors. In addition, packet loss, high latency, or limited bandwidth can impact cluster performance and stability.
Prerequisites
- Root-level access to the node.
Procedure
Install the net-tools and telnet packages. They can help when troubleshooting network issues that can occur in a Ceph storage cluster:

Red Hat Enterprise Linux 7

[root@mon ~]# yum install net-tools
[root@mon ~]# yum install telnet

Red Hat Enterprise Linux 8

[root@mon ~]# dnf install net-tools
[root@mon ~]# dnf install telnet
Verify that the cluster_network and public_network parameters in the Ceph configuration file include the correct values:

Example

[root@mon ~]# cat /etc/ceph/ceph.conf | grep net
cluster_network = 192.168.1.0/24
public_network = 192.168.0.0/24
Verify that the network interfaces are up:

Example

[root@mon ~]# ip link list
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp22s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 40:f2:e9:b8:a0:48 brd ff:ff:ff:ff:ff:ff
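The same check can be scripted across many interfaces. A minimal sketch that flags any interface that `ip link` reports in state DOWN; the helper name and the sample lines are illustrative, not captured from a real node:

```shell
# flag_down: read `ip link` output on stdin and report interfaces in state DOWN.
flag_down() {
    awk -F': ' '/^[0-9]+: / && / state DOWN / { print $2 " is DOWN" }'
}

# Illustrative sample input (made up for demonstration):
printf '%s\n' \
  '1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT' \
  '2: enp22s0f0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT' \
  | flag_down
# prints: enp22s0f0 is DOWN
```

On a node, you could pipe live output through the helper, for example `ip link | flag_down`.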
Verify that the Ceph nodes are able to reach each other using their short host names. Verify this on each node in the storage cluster:

Syntax

ping SHORT_HOST_NAME

Example

[root@mon ~]# ping osd01
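To repeat the reachability check across many hosts, a small wrapper can help. A minimal sketch; the host names are placeholders for your own nodes, and the PING_CMD variable is an override hook added here for testability:

```shell
# check_hosts: ping each short host name once and print OK/FAIL per host.
# PING_CMD can be overridden (for example in tests); it defaults to a single
# ping with a 2-second timeout.
check_hosts() {
    local ping_cmd="${PING_CMD:-ping -c 1 -W 2}"
    local host
    for host in "$@"; do
        if $ping_cmd "$host" > /dev/null 2>&1; then
            echo "OK: $host"
        else
            echo "FAIL: $host"
        fi
    done
}

# Hypothetical usage on a monitor node:
#   check_hosts mon01 osd01 osd02
```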
If you use a firewall, ensure that Ceph nodes are able to reach each other on their appropriate ports. The firewall-cmd tool validates the firewall zone configuration, and the telnet tool checks whether a port is open:

Syntax

firewall-cmd --info-zone=ZONE
telnet IP_ADDRESS PORT
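A quick TCP-level check can also be scripted without telnet. A minimal sketch using the bash /dev/tcp pseudo-device; the helper name is illustrative, and 3300 and 6789 are the default Ceph Monitor ports:

```shell
# check_port: report whether a TCP connection to HOST:PORT succeeds within
# 2 seconds. Uses the bash /dev/tcp pseudo-device, so it requires bash.
check_port() {
    local host="$1" port="$2"
    if timeout 2 bash -c "exec 3<> /dev/tcp/$host/$port" 2>/dev/null; then
        echo "$host:$port open"
    else
        echo "$host:$port closed"
    fi
}

# Hypothetical usage (Ceph Monitors listen on 3300 and 6789 by default):
#   check_port mon01 6789
```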
Verify that there are no errors on the interface counters. Verify that the network connectivity between nodes has the expected latency, and that there is no packet loss.

Using the ethtool command:

Syntax

ethtool -S INTERFACE
Example

[root@mon ~]# ethtool -S enp22s0f0

Using the ifconfig command:

Example

[root@mon ~]# ifconfig

Using the netstat command:

Example

[root@mon ~]# netstat -i
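To scan interface statistics for problems, a small filter over `ethtool -S` style output can help. A minimal sketch; the helper name and the sample counters are illustrative, not captured from a real node:

```shell
# flag_errors: read interface statistics (for example from `ethtool -S`) on
# stdin and print any error/drop counter that is non-zero.
flag_errors() {
    awk -F': *' '/errors|dropped|fifo|collisions/ {
        gsub(/^[ \t]+/, "", $1)
        if ($2 + 0 > 0) print $1 "=" $2
    }'
}

# Illustrative sample input (made up for demonstration):
printf '%s\n' '     rx_errors: 0' '     tx_errors: 3' '     rx_dropped: 0' | flag_errors
# prints: tx_errors=3
```

On a node, you could pipe live output through the helper, for example `ethtool -S enp22s0f0 | flag_errors`.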
For performance issues, in addition to the latency checks, use the iperf3 tool to verify the network bandwidth between all nodes of the storage cluster. The iperf3 tool does a simple point-to-point network bandwidth test between a server and a client.

Install the iperf3 package on the Red Hat Ceph Storage nodes on which you want to check the bandwidth:

Red Hat Enterprise Linux 7

[root@mon ~]# yum install iperf3

Red Hat Enterprise Linux 8

[root@mon ~]# dnf install iperf3
On a Red Hat Ceph Storage node, start the iperf3 server:

Example

[root@mon ~]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------

Note: The default port is 5201, but it can be set using the -p command argument.

On a different Red Hat Ceph Storage node, start the iperf3 client:

Syntax

iperf3 -c SERVER_HOST_NAME
The client output reports the network bandwidth between the Red Hat Ceph Storage nodes, for example 1.1 Gbits/second, along with any retransmissions (Retr) during the test.

Red Hat recommends that you validate the network bandwidth between all the nodes in the storage cluster.
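When validating bandwidth between many node pairs, it can help to pull just the summary numbers out of each run. A minimal sketch that extracts the bitrate and retransmission count from the "sender" line; the helper name and the sample line are illustrative, and the field positions assume iperf3's default human-readable report format:

```shell
# iperf3_summary: extract bitrate and retransmission count from the "sender"
# summary line of iperf3 client output on stdin.
iperf3_summary() {
    awk '/ sender/ { print "bitrate=" $7 " " $8 " retr=" $9 }'
}

# Illustrative sample line (numbers are made up):
echo '[  5]   0.00-10.00  sec  1.27 GBytes  1.09 Gbits/sec    0   sender' | iperf3_summary
# prints: bitrate=1.09 Gbits/sec retr=0
```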
Ensure that all nodes have the same network interconnect speed. Slower attached nodes might slow down the faster connected ones. Also, ensure that the inter-switch links can handle the aggregated bandwidth of the attached nodes:

Syntax

ethtool INTERFACE

Example

[root@mon ~]# ethtool enp22s0f0

Check the Speed: field in the output.
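After collecting the ethtool link speed from each node, a quick uniformity check can catch a slower-attached node. A minimal sketch, assuming you have gathered "host speed" pairs into a list; the helper name and sample data are illustrative:

```shell
# speed_mismatch: read "HOST SPEED" pairs on stdin and warn when the nodes
# do not all share the same link speed.
speed_mismatch() {
    awk '!seen[$2]++ { n++ }
         END {
             if (n > 1) print "WARNING: mixed link speeds"
             else print "OK: uniform link speed"
         }'
}

# Illustrative sample input (made up for demonstration):
printf '%s\n' 'mon01 10000Mb/s' 'osd01 10000Mb/s' 'osd02 1000Mb/s' | speed_mismatch
# prints: WARNING: mixed link speeds
```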
Additional Resources
- See the Basic Network troubleshooting solution on the Customer Portal for details.
- See the Verifying and configuring the MTU value section in the Red Hat Ceph Storage Configuration Guide.
- See the Configuring Firewall section in the Red Hat Ceph Storage Installation Guide.
- See the What is the "ethtool" command and how can I use it to obtain information about my network devices and interfaces solution on the Customer Portal for details.
- See the RHEL network interface dropping packets solutions on the Customer Portal for details.
- See the What are the performance benchmarking tools available for Red Hat Ceph Storage? solution on the Customer Portal for details.
- See the Networking Guide for Red Hat Enterprise Linux 7.
- See Knowledgebase articles and solutions related to troubleshooting networking issues on the Customer Portal.
3.3. Basic chrony NTP troubleshooting
This section includes basic chrony troubleshooting steps.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph Monitor node.
Procedure
Verify that the chronyd daemon is running on the Ceph Monitor hosts:

Example

[root@mon ~]# systemctl status chronyd
If chronyd is not running, enable and start it:

Example

[root@mon ~]# systemctl enable chronyd
[root@mon ~]# systemctl start chronyd
Ensure that chronyd is synchronizing the clocks correctly:

Example

[root@mon ~]# chronyc sources
[root@mon ~]# chronyc sourcestats
[root@mon ~]# chronyc tracking
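Clock-skew warnings relate the measured offset to the monitor's allowed drift. A minimal sketch that compares the "System time" offset reported by `chronyc tracking` against Ceph's default 0.05-second clock drift allowance (the mon_clock_drift_allowed option); the helper name and sample line are illustrative:

```shell
# check_offset: read `chronyc tracking` output on stdin and compare the
# "System time" offset against Ceph's default 0.05-second drift allowance.
check_offset() {
    awk -F': *' '/^System time/ {
        split($2, a, " ")
        if (a[1] + 0 > 0.05) print "WARN: offset " a[1] "s exceeds 0.05s"
        else print "OK: offset " a[1] "s"
    }'
}

# Illustrative sample line (not captured from a real node):
echo 'System time     : 0.000123 seconds fast of NTP time' | check_offset
# prints: OK: offset 0.000123s
```

On a monitor host, you could pipe live output through the helper, for example `chronyc tracking | check_offset`.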
3.4. Basic NTP troubleshooting
This section includes basic NTP troubleshooting steps.
Prerequisites
- A running Red Hat Ceph Storage cluster.
- Root-level access to the Ceph Monitor node.
Procedure
Verify that the ntpd daemon is running on the Ceph Monitor hosts:

Example

[root@mon ~]# systemctl status ntpd
If ntpd is not running, enable and start it:

Example

[root@mon ~]# systemctl enable ntpd
[root@mon ~]# systemctl start ntpd
Ensure that ntpd is synchronizing the clocks correctly:

Example

[root@mon ~]# ntpq -p
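In `ntpq -p` output, the selected peer is marked with a leading asterisk, and the offset column is reported in milliseconds. A minimal sketch that pulls out that peer's offset; the helper name and the sample line are illustrative, and the column position assumes the standard ntpq peer-table layout:

```shell
# ntp_offset: read `ntpq -p` output on stdin and print the offset (in
# milliseconds, column 9) of the selected peer, marked with a leading '*'.
ntp_offset() {
    awk '/^\*/ { print "selected peer " $1 " offset " $9 " ms" }'
}

# Illustrative sample line (numbers are made up):
echo '*time.example.com 10.0.0.1  2 u   37   64  377   0.123   -1.234   0.456' | ntp_offset
# prints: selected peer *time.example.com offset -1.234 ms
```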