Dieser Inhalt ist in der von Ihnen ausgewählten Sprache nicht verfügbar.
Chapter 8. Managing database replication with Galera
Red Hat OpenStack Platform uses the MariaDB Galera Cluster to manage database replication. Pacemaker runs the Galera service as a bundle set resource that manages the database master/slave status. You can use Galera to test and verify different aspects of the database cluster, such as hostname resolution, cluster integrity, node integrity, and database replication performance.
When you investigate database cluster integrity, each node must meet the following criteria:
- The node is a part of the correct cluster.
- The node can write to the cluster.
- The node can receive queries and write commands from the cluster.
- The node is connected to other nodes in the cluster.
- The node is replicating write-sets to tables in the local database.
8.1. Verifying hostname resolution in a MariaDB cluster Link kopierenLink in die Zwischenablage kopiert!
To troubleshoot the MariaDB Galera cluster, first eliminate any hostname resolution problems and then check the write-set replication status on the database of each Controller node. To access the MySQL database, use the password set by director during the overcloud deployment.
By default, director binds the Galera resource to a hostname instead of an IP address. Therefore, any problems that prevent hostname resolution, such as misconfigured or failed DNS, might cause Pacemaker to incorrectly manage the Galera resource.
Procedure
From a Controller node, get the MariaDB database root password by running the
hiera
command.sudo hiera -c /etc/puppet/hiera.yaml "mysql::server::root_password"
$ sudo hiera -c /etc/puppet/hiera.yaml "mysql::server::root_password" *[MYSQL-HIERA-PASSWORD]*
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the name of the MariaDB container that runs on the node.
sudo podman ps | grep -i galera
$ sudo podman ps | grep -i galera a403d96c5026 undercloud.ctlplane.localdomain:8787/rhosp-rhel8/openstack-mariadb:16.0-106 /bin/bash /usr/lo... 3 hours ago Up 3 hours ago galera-bundle-podman-0
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Get the write-set replication information from the MariaDB database on each node.
Copy to Clipboard Copied! Toggle word wrap Toggle overflow Each relevant variable uses the prefix
wsrep
.- Verify the health and integrity of the MariaDB Galera cluster by checking that the cluster is reporting the correct number of nodes.
8.2. Checking MariaDB cluster integrity Link kopierenLink in die Zwischenablage kopiert!
To investigate problems with the MariaDB Galera Cluster, check the integrity of the whole cluster by checking specific wsrep
database variables on each Controller node.
Procedure
Run the following command and replace
<variable>
with thewsrep
database variable that you want to check:sudo podman exec galera-bundle-podman-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW GLOBAL STATUS LIKE <variable;"
$ sudo podman exec galera-bundle-podman-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW GLOBAL STATUS LIKE <variable;"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following example shows how to view the cluster state UUID of the node:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following table lists the
wsrep
database variables that you can use to check cluster integrity.Expand Table 8.1. Database variables to check for cluster integrity Variable Summary Description wsrep_cluster_state_uuid
Cluster state UUID
ID of the cluster to which the node belongs. All nodes must have an identical cluster ID. A node with a different ID is not connected to the cluster.
wsrep_cluster_size
Number of nodes in the cluster
You can check this on any node. If the value is less than the actual number of nodes, then some nodes either failed or lost connectivity.
wsrep_cluster_conf_id
Total number of cluster changes
Determines whether the cluster was split to several components, or partitions. Partitioning is usually caused by a network failure. All nodes must have an identical value.
In case some nodes report a different
wsrep_cluster_conf_id
, check thewsrep_cluster_status
value to see if the nodes can still write to the cluster (Primary
).wsrep_cluster_status
Primary component status
Determines whether the node can write to the cluster. If the node can write to the cluster, the
wsrep_cluster_status
value isPrimary
. Any other value indicates that the node is part of a non-operational partition.
8.3. Checking database node integrity in a MariaDB cluster Link kopierenLink in die Zwischenablage kopiert!
To investigate problems with a specific Controller node in the MariaDB Galera Cluster, check the integrity of the node by checking specific wsrep
database variables.
Procedure
Run the following command and replace
<variable>
with thewsrep
database variable that you want to check:sudo podman exec galera-bundle-podman-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW GLOBAL STATUS LIKE <variable>;"
$ sudo podman exec galera-bundle-podman-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW GLOBAL STATUS LIKE <variable>;"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following table lists the
wsrep
database variables that you can use to check node integrity.Expand Table 8.2. Database variables to check for node integrity Variable Summary Description wsrep_ready
Node ability to accept queries
States whether the node can accept write-sets from the cluster. If so, then
wsrep_ready
isON
.wsrep_connected
Node network connectivity
States whether the node can connect to other nodes on the network. If so, then
wsrep_connected
isON
.wsrep_local_state_comment
Node state
Summarizes the node state. If the node can write to the cluster, then typical values for
wsrep_local_state_comment
can beJoining
,Waiting on SST
,Joined
,Synced
, orDonor
.If the node is part of a non-operational component, then the value of
wsrep_local_state_comment
isInitialized
.Note-
The
wsrep_connected
value can beON
even if the node is connected only to a subset of nodes in the cluster. For example, in case of a cluster partition, the node might be part of a component that cannot write to the cluster. For more information about checking cluster integrity, see Section 8.2, “Checking MariaDB cluster integrity”. -
If the
wsrep_connected
value isOFF
, then the node is not connected to any cluster components.
-
The
8.4. Testing database replication performance in a MariaDB cluster Link kopierenLink in die Zwischenablage kopiert!
To check the performance of the MariaDB Galera Cluster, run benchmark tests on the replication throughput of the cluster by checking specific wsrep
database variables.
Every time you query one of these variables, a FLUSH STATUS
command resets the variable value. To run benchmark tests, you must run multiple queries and analyze the variances. These variances can help you determine how much Flow Control is affecting the cluster performance.
Flow Control is a mechanism that the cluster uses to manage replication. When the local receive queue exceeds a certain threshold, Flow Control pauses the replication until the queue size goes down. For more information about Flow Control, see Flow Control on the Galera Cluster website.
Procedure
Run the following command and replace
<variable>
with thewsrep
database variable that you want to check:sudo podman exec galera-bundle-podman-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW STATUS LIKE <variable>;"
$ sudo podman exec galera-bundle-podman-0 sudo mysql -B --password="[MYSQL-HIERA-PASSWORD]" -e "SHOW STATUS LIKE <variable>;"
Copy to Clipboard Copied! Toggle word wrap Toggle overflow The following table lists the
wsrep
database variables that you can use to test database replication performance.Expand Table 8.3. Database variables to check for database replication performance Variable Summary Usage wsrep_local_recv_queue_avg
Average size of the local received write-set queue after the last query.
A value higher than 0.0 indicates that the node cannot apply write-sets as quickly as it receives write-sets, which triggers replication throttling. Check
wsrep_local_recv_queue_min
andwsrep_local_recv_queue_max
for a detailed look at this benchmark.wsrep_local_send_queue_avg
Average send queue length after the last query.
A value higher than 0.0 indicates a higher likelihood of replication throttling and network throughput problems.
wsrep_local_recv_queue_min
andwsrep_local_recv_queue_max
Minimum and maximum size of the local receive queue after the last query.
If the value of
wsrep_local_recv_queue_avg
is higher than 0.0, you can check these variables to determine the scope of the queue size.wsrep_flow_control_paused
Fraction of the time that Flow Control paused the node after the last query.
A value higher than 0.0 indicates that Flow Control paused the node. To determine the duration of the pause, multiply the
wsrep_flow_control_paused
value with the number of seconds between the queries. The optimal value is as close to 0.0 as possible.For example:
-
If the value of
wsrep_flow_control_paused
is 0.50 one minute after the last query, then Flow Control paused the node for 30 seconds. -
If the value of
wsrep_flow_control_paused
is 1.0 one minute after the last query, then Flow Control paused the node for the entire minute.
wsrep_cert_deps_distance
Average difference between the lowest and highest sequence number (
seqno
) value that can be applied in parallelIn case of throttling and pausing, this variable indicates how many write-sets on average can be applied in parallel. Compare the value with the
wsrep_slave_threads
variable to see how many write-sets can actually be applied simultaneously.wsrep_slave_threads
Number of threads that can be applied simultaneously
You can increase the value of this variable to apply more threads simultaneously, which also increases the value of
wsrep_cert_deps_distance
. The value ofwsrep_slave_threads
must not be higher than the number of CPU cores in the node.For example, if the
wsrep_cert_deps_distance
value is20
, you can increase the value ofwsrep_slave_threads
from2
to4
to increase the amount of write-sets that the node can apply.If a problematic node already has an optimal
wsrep_slave_threads
value, you can exclude the node from the cluster while you investigate possible connectivity issues.-
If the value of