Chapter 11. Replacing Controller Nodes


In certain circumstances a Controller node in a high availability cluster might fail. In these situations, you must remove the node from the cluster and replace it with a new Controller node.

Complete the steps in this section to replace a Controller node. The Controller node replacement process involves running the openstack overcloud deploy command to update the overcloud with a request to replace a Controller node.

Important

The following procedure applies only to high availability environments. Do not use this procedure if your environment uses only one Controller node.

11.1. Preparing for Controller replacement

Before attempting to replace an overcloud Controller node, it is important to check the current state of your Red Hat OpenStack Platform environment. Checking the current state can help avoid complications during the Controller replacement process. Use the following list of preliminary checks to determine if it is safe to perform a Controller node replacement. Run all commands for these checks on the undercloud.

Procedure

  1. Check the current status of the overcloud stack on the undercloud:

    $ source stackrc
    (undercloud) $ openstack stack list --nested

    The overcloud stack and its subsequent child stacks should have a status of either CREATE_COMPLETE or UPDATE_COMPLETE.
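
    If any stack reports a *_FAILED status, you can inspect the failure details before you continue. This is an optional diagnostic step; it assumes the parent stack is named overcloud:

    (undercloud) $ openstack stack failures list overcloud --long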

  2. Install the database client tools:

    (undercloud) $ sudo yum -y install mariadb
  3. Configure root user access to the database:

    (undercloud) $ sudo cp /var/lib/config-data/puppet-generated/mysql/root/.my.cnf /root/.
  4. Perform a backup of the undercloud databases:

    (undercloud) $ mkdir /home/stack/backup
    (undercloud) $ sudo mysqldump --all-databases --quick --single-transaction | gzip > /home/stack/backup/dump_db_undercloud.sql.gz
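
    Optionally, verify that the backup archive is intact before you continue. The gzip -t option tests the compressed file without extracting it:

    (undercloud) $ gzip -t /home/stack/backup/dump_db_undercloud.sql.gz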
  5. Check that your undercloud has 10 GB of free storage to accommodate image caching and conversion when provisioning the new node:

    (undercloud) $ df -h
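
    To check a specific filesystem rather than all mounts, pass its mount point to df. The following sketch assumes that the image cache resides under /var; confirm the location for your deployment:

    (undercloud) $ df -h /var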
  6. Check the status of Pacemaker on the running Controller nodes. For example, if 192.168.0.47 is the IP address of a running Controller node, use the following command to get the Pacemaker status:

    (undercloud) $ ssh heat-admin@192.168.0.47 'sudo pcs status'

    The output should show all services running on the existing nodes and stopped on the failed node.

  7. Check the following parameters on each node of the overcloud MariaDB cluster:

    • wsrep_local_state_comment: Synced
    • wsrep_cluster_size: 2

      Use the following command to check these parameters on each running Controller node. In this example, the Controller node IP addresses are 192.168.0.47 and 192.168.0.46:

      (undercloud) $ for i in 192.168.0.47 192.168.0.46 ; do echo "*** $i ***" ; ssh heat-admin@$i "sudo mysql -p\$(sudo hiera -c /etc/puppet/hiera.yaml mysql::server::root_password) --execute=\"SHOW STATUS LIKE 'wsrep_local_state_comment'; SHOW STATUS LIKE 'wsrep_cluster_size';\""; done
  8. Check the RabbitMQ status. For example, if 192.168.0.47 is the IP address of a running Controller node, use the following command to get the status:

    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo docker exec \$(sudo docker ps -f name=rabbitmq-bundle -q) rabbitmqctl cluster_status"

    The running_nodes key should only show the two available nodes and not the failed node.

  9. Disable fencing, if enabled. For example, if 192.168.0.47 is the IP address of a running Controller node, use the following command to check the status of fencing:

    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo pcs property show stonith-enabled"

    Run the following command to disable fencing:

    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo pcs property set stonith-enabled=false"
  10. Check that the Compute services are active on the director node:

    (undercloud) $ openstack hypervisor list

    The output should show all non-maintenance mode nodes as up.

  11. Ensure all undercloud containers are running:

    (undercloud) $ sudo docker ps
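
    To surface containers that have stopped unexpectedly, you can filter for exited containers. This is an optional convenience check:

    (undercloud) $ sudo docker ps -a --filter "status=exited"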

11.2. Removing a Ceph Monitor daemon

If your Controller node runs a Ceph monitor service, complete the following steps to remove the ceph-mon daemon from the storage cluster. This procedure assumes that the Controller node is reachable.

Note

Adding a new Controller to the cluster also adds a new Ceph monitor daemon automatically.

Procedure

  1. Connect to the Controller you want to replace and become root:

    # ssh heat-admin@192.168.0.47
    # sudo su -
    Note

    If the Controller node is unreachable, skip steps 1 and 2 and continue the procedure at step 3 on any working Controller node.

  2. As root, stop the monitor:

    # systemctl stop ceph-mon@<monitor_hostname>

    For example:

    # systemctl stop ceph-mon@overcloud-controller-1
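
    You can confirm that the monitor service stopped. This optional check uses systemctl is-active, which prints inactive for a stopped unit:

    # systemctl is-active ceph-mon@overcloud-controller-1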
  3. Disconnect from the Controller node that you want to replace.
  4. Connect to one of the existing Controller nodes:

    # ssh heat-admin@192.168.0.46
    # sudo su -
  5. Remove the monitor from the cluster:

    # ceph mon remove <mon_id>
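
    To verify that the monitor was removed, you can print a summary of the monitor map; the removed monitor should no longer appear. This is an optional check:

    # ceph mon stat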
  6. On all Controller nodes, remove the monitor entry from /etc/ceph/ceph.conf. For example, if you remove controller-1, then remove the IP and hostname for controller-1.

    Before:

    mon host = 172.18.0.21,172.18.0.22,172.18.0.24
    mon initial members = overcloud-controller-2,overcloud-controller-1,overcloud-controller-0

    After:

    mon host = 172.18.0.22,172.18.0.24
    mon initial members = overcloud-controller-2,overcloud-controller-0
    Note

    The director updates the ceph.conf file on the relevant overcloud nodes when you add the replacement Controller node. Normally, director manages this configuration file exclusively and you should not edit the file manually. However, you can edit the file manually to ensure consistency if the other nodes restart before you add the new node.

  7. Optionally, archive the monitor data and save the archive on another server:

    # mv /var/lib/ceph/mon/<cluster>-<daemon_id> /var/lib/ceph/mon/removed-<cluster>-<daemon_id>
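
    One way to copy the archived data to another server is to compress the directory and transfer it over SSH. The destination host and path in this sketch are placeholders:

    # tar -czf removed-mon-data.tar.gz /var/lib/ceph/mon/removed-<cluster>-<daemon_id>
    # scp removed-mon-data.tar.gz <backup_server>:<archive_path>/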

11.3. Preparing the cluster for Controller replacement

Before replacing the old node, you must ensure that Pacemaker is no longer running on the node and then remove that node from the Pacemaker cluster.

Procedure

  1. Get a list of IP addresses for the Controller nodes:

    (undercloud) $ openstack server list -c Name -c Networks
    +------------------------+-----------------------+
    | Name                   | Networks              |
    +------------------------+-----------------------+
    | overcloud-compute-0    | ctlplane=192.168.0.44 |
    | overcloud-controller-0 | ctlplane=192.168.0.47 |
    | overcloud-controller-1 | ctlplane=192.168.0.45 |
    | overcloud-controller-2 | ctlplane=192.168.0.46 |
    +------------------------+-----------------------+
  2. If the old node is still reachable, log in to one of the remaining nodes and stop pacemaker on the old node. For this example, stop pacemaker on overcloud-controller-1:

    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo pcs status | grep -w Online | grep -w overcloud-controller-1"
    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo pcs cluster stop overcloud-controller-1"
    Note

    If the old node is physically unavailable or powered off, you do not need to perform the previous operation, because Pacemaker is already stopped on that node.

  3. After stopping Pacemaker on the old node, delete the old node from the Corosync configuration on each node and restart Corosync. To check the status of Pacemaker on the old node, run the pcs status command and verify that the status is Stopped.

    The following example command logs in to overcloud-controller-0 and overcloud-controller-2 to remove overcloud-controller-1:

    (undercloud) $ for NAME in overcloud-controller-0 overcloud-controller-2; do IP=$(openstack server list -c Networks -f value --name $NAME | cut -d "=" -f 2) ; ssh heat-admin@$IP "sudo pcs cluster node remove overcloud-controller-1; sudo pcs cluster reload corosync"; done
  4. Log in to one of the remaining nodes and delete the node from the cluster with the crm_node command:

    (undercloud) $ ssh heat-admin@192.168.0.47
    [heat-admin@overcloud-controller-0 ~]$ sudo crm_node -R overcloud-controller-1 --force
  5. The overcloud database must continue to run during the replacement procedure. To ensure Pacemaker does not stop Galera during this procedure, select a running Controller node and run the following command on the undercloud using the Controller node’s IP address:

    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo pcs resource unmanage galera-bundle"
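
    To confirm that Pacemaker no longer manages the resource, check the cluster status; the galera-bundle entry should be flagged as unmanaged. This is an optional verification:

    (undercloud) $ ssh heat-admin@192.168.0.47 "sudo pcs status | grep -A2 galera-bundle"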

11.4. Replacing a Controller node

To replace a Controller node, identify the index of the node that you want to replace.

  • If the node is a virtual node, identify the node that contains the failed disk and restore the disk from a backup. Ensure that the MAC address of the NIC used for PXE boot on the failed server remains the same after disk replacement.
  • If the node is a bare metal node, replace the disk, prepare the new disk with your overcloud configuration, and perform a node introspection on the new hardware.

Complete the following example steps to replace the overcloud-controller-1 node with the overcloud-controller-3 node. The overcloud-controller-3 node has the ID 75b25e9a-948d-424a-9b3b-f0ef70a6eacf.

Important

To replace the node with an existing ironic node, enable maintenance mode on the outgoing node so that the director does not automatically reprovision the node.

Procedure

  1. Source the stackrc file:

    $ source ~/stackrc
  2. Identify the instance ID of the overcloud-controller-1 node:

    $ INSTANCE=$(openstack server list --name overcloud-controller-1 -f value -c ID)
  3. Identify the bare metal node associated with the instance:

    $ NODE=$(openstack baremetal node list -f csv --quote minimal | grep $INSTANCE | cut -f1 -d,)
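
    Optionally, confirm that both variables resolved to the expected UUIDs before you continue:

    $ echo "Instance: $INSTANCE Node: $NODE"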
  4. Set the node to maintenance mode:

    $ openstack baremetal node maintenance set $NODE
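
    You can verify that the node entered maintenance mode; the maintenance field should show True:

    $ openstack baremetal node show $NODE -c maintenance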
  5. If the Controller node is a virtual node, run the following command on the Controller host to replace the virtual disk from a backup:

    $ cp <VIRTUAL_DISK_BACKUP> /var/lib/libvirt/images/<VIRTUAL_DISK>

    Replace <VIRTUAL_DISK_BACKUP> with the path to the backup of the failed virtual disk, and replace <VIRTUAL_DISK> with the name of the virtual disk that you want to replace.

    If you do not have a backup of the outgoing node, you must use a new virtualized node.

    If the Controller node is a bare metal node, complete the following steps to replace the disk with a new bare metal disk:

    1. Replace the physical hard drive or solid state drive.
    2. Prepare the node with the same configuration as the failed node.
  6. List unassociated nodes and identify the ID of the new node:

    $ openstack baremetal node list --unassociated
  7. Tag the new node with the control profile:

    (undercloud) $ openstack baremetal node set --property capabilities='profile:control,boot_option:local' 75b25e9a-948d-424a-9b3b-f0ef70a6eacf
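
    To confirm that the profile was applied, inspect the node properties; the capabilities string should contain profile:control:

    (undercloud) $ openstack baremetal node show 75b25e9a-948d-424a-9b3b-f0ef70a6eacf -f value -c properties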

11.5. Triggering the Controller node replacement

Complete the following steps to remove the old Controller node and replace it with a new Controller node.

Procedure

  1. Create an environment file (~/templates/remove-controller.yaml) that defines the node index to remove. The index corresponds to the numeric suffix of the node name, so index 1 removes overcloud-controller-1:

    parameters:
      ControllerRemovalPolicies:
        [{'resource_list': ['1']}]
  2. Run your overcloud deployment command, including the remove-controller.yaml environment file along with any other environment files relevant to your environment:

    (undercloud) $ openstack overcloud deploy --templates \
        -e /home/stack/templates/remove-controller.yaml \
        -e /home/stack/templates/node-info.yaml \
        [OTHER OPTIONS]
    Note

    Include -e ~/templates/remove-controller.yaml only for this instance of the deployment command. Remove this environment file from subsequent deployment operations.

  3. The director removes the old node, creates a new one, and updates the overcloud stack. You can check the status of the overcloud stack with the following command:

    (undercloud) $ openstack stack list --nested
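
    To monitor progress continuously, you can wrap the status check in watch. This is an optional convenience; any stack that is not yet complete remains visible in the output:

    (undercloud) $ watch -n 30 "openstack stack list --nested | grep -v COMPLETE"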
  4. When the deployment command completes, confirm that the director replaced the old node with the new node:

    (undercloud) $ openstack server list -c Name -c Networks
    +------------------------+-----------------------+
    | Name                   | Networks              |
    +------------------------+-----------------------+
    | overcloud-compute-0    | ctlplane=192.168.0.44 |
    | overcloud-controller-0 | ctlplane=192.168.0.47 |
    | overcloud-controller-2 | ctlplane=192.168.0.46 |
    | overcloud-controller-3 | ctlplane=192.168.0.48 |
    +------------------------+-----------------------+

    The new node now hosts running control plane services.

11.6. Cleaning up after Controller node replacement

After completing the node replacement, complete the following steps to finalize the Controller cluster.

Procedure

  1. Log in to a Controller node.
  2. Enable Pacemaker management of the Galera cluster and start Galera on the new node:

    [heat-admin@overcloud-controller-0 ~]$ sudo pcs resource refresh galera-bundle
    [heat-admin@overcloud-controller-0 ~]$ sudo pcs resource manage galera-bundle
  3. Perform a final status check to make sure services are running correctly:

    [heat-admin@overcloud-controller-0 ~]$ sudo pcs status
    Note

    If any services have failed, use the pcs resource refresh command to resolve and restart the failed services.

  4. Exit the Controller node and return to the director:

    [heat-admin@overcloud-controller-0 ~]$ exit
  5. Source the overcloudrc file so that you can interact with the overcloud:

    $ source ~/overcloudrc
  6. Check the network agents in your overcloud environment:

    (overcloud) $ openstack network agent list
  7. If any agents appear for the old node, remove them:

    (overcloud) $ for AGENT in $(openstack network agent list --host overcloud-controller-1.localdomain -c ID -f value) ; do openstack network agent delete $AGENT ; done
  8. If necessary, add your router to the L3 agent host on the new node. Use the following example command to add a router named r1 to the L3 agent using the UUID 2d1c1dc1-d9d4-4fa9-b2c8-f29cd1a649d4:

    (overcloud) $ openstack network agent add router --l3 2d1c1dc1-d9d4-4fa9-b2c8-f29cd1a649d4 r1
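
    The example above uses a fixed agent UUID. To look up the UUID of the L3 agent on the new node, you can filter the agent list by host and agent type; the hostname below assumes the default localdomain suffix:

    (overcloud) $ openstack network agent list --agent-type l3 --host overcloud-controller-3.localdomain -c ID -f value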
  9. Compute services for the removed node still exist in the overcloud and require removal. Check the Compute services for the removed node:

    [stack@director ~]$ source ~/overcloudrc
    (overcloud) $ openstack compute service list --host overcloud-controller-1.localdomain
  10. Remove the compute services for the removed node:

    (overcloud) $ for SERVICE in $(openstack compute service list --host overcloud-controller-1.localdomain -c ID -f value ) ; do openstack compute service delete $SERVICE ; done
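
    To confirm that no Compute services remain for the removed node, run the list command again; the output should be empty:

    (overcloud) $ openstack compute service list --host overcloud-controller-1.localdomain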