Chapter 1. OpenShift Data Foundation deployed using dynamic devices
1.1. OpenShift Data Foundation deployed on AWS
To replace an operational node, see:
- Section 1.1.1, "Replacing an operational AWS node on user-provisioned infrastructure"
- Section 1.1.2, "Replacing an operational AWS node on installer-provisioned infrastructure"
To replace a failed node, see:
- Section 1.1.3, "Replacing a failed AWS node on user-provisioned infrastructure"
- Section 1.1.4, "Replacing a failed AWS node on installer-provisioned infrastructure"
1.1.1. Replacing an operational AWS node on user-provisioned infrastructure
Prerequisites
- Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
- You must be logged into the OpenShift Container Platform cluster.
Procedure
- Identify the node that you need to replace.
- Mark the node as unschedulable:
$ oc adm cordon <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Drain the node:
$ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Delete the node:
$ oc delete nodes <node_name>
- Create a new Amazon Web Service (AWS) machine instance with the required infrastructure. See Platform requirements.
- Create a new OpenShift Container Platform node using the new AWS machine instance.
- Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:
$ oc get csr
- Approve all the required OpenShift Container Platform CSRs for the new node:
$ oc adm certificate approve <certificate_name>
<certificate_name>
- Specify the name of the CSR.
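If several CSRs are pending, you can optionally approve them in one pass instead of approving each certificate by name. The following one-liner is only a generic shortcut, not part of the documented procedure; review the output of oc get csr first so that you approve only the certificates you expect:
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve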
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
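As an alternative to the double egrep, you can ask the API server to filter pods by node directly; this uses standard oc field selectors, where <new_node_name> is the node that you just labeled:
$ oc get pods -n openshift-storage -o wide --field-selector spec.nodeName=<new_node_name> | grep osd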
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names. An optional filter for this check is shown below.
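If the lsblk output is long, you can narrow it down to the encrypted devices only; in an encrypted deployment each ocs-deviceset device is expected to have a child device of TYPE crypt:
$ lsblk -o NAME,TYPE | grep crypt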
- If the verification steps fail, contact Red Hat Support.
1.1.2. Replacing an operational AWS node on installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the node that you need to replace. Take a note of its Machine Name.
- Mark the node as unschedulable:
$ oc adm cordon <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Drain the node:
$ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Machines. Search for the required machine.
- Beside the required machine, click Action menu (⋮) → Delete Machine.
- Click Delete to confirm that the machine is deleted. A new machine is automatically created.
- Wait for the new machine to start and transition into Running state.
Important: This activity might take 5 - 10 minutes or more.
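If you prefer the command line over the web console, the machine can also be deleted with oc. This sketch assumes the default installer-provisioned setup, where Machine objects live in the openshift-machine-api namespace, and <machine_name> is the Machine Name that you noted earlier:
$ oc delete machine <machine_name> -n openshift-machine-api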
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.1.3. Replacing a failed AWS node on user-provisioned infrastructure
Prerequisites
- Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
- You must be logged into the OpenShift Container Platform cluster.
Procedure
- Identify the Amazon Web Service (AWS) machine instance of the node that you need to replace.
- Log in to AWS, and terminate the AWS machine instance that you identified.
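If you work with the AWS CLI rather than the console, the instance can be terminated as follows; <instance_id> is the EC2 instance ID of the node that you identified, and your AWS credentials and region must already be configured:
$ aws ec2 terminate-instances --instance-ids <instance_id>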
- Create a new AWS machine instance with the required infrastructure. See Platform requirements.
- Create a new OpenShift Container Platform node using the new AWS machine instance.
- Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:
$ oc get csr
- Approve all the required OpenShift Container Platform CSRs for the new node:
$ oc adm certificate approve <certificate_name>
<certificate_name>
- Specify the name of the CSR.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.1.4. Replacing a failed AWS node on installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the faulty node, and click on its Machine Name.
- Click Actions → Edit Annotations, and click Add More.
- Add machine.openshift.io/exclude-node-draining, and click Save.
- Click Actions → Delete Machine, and click Delete. A new machine is automatically created. Wait for the new machine to start.
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
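The annotation and the machine deletion can also be done from the command line. This is only an equivalent sketch, assuming the default installer-provisioned setup with Machine objects in the openshift-machine-api namespace; <machine_name> is the Machine Name that you noted:
$ oc annotate machine <machine_name> -n openshift-machine-api machine.openshift.io/exclude-node-draining=""
$ oc delete machine <machine_name> -n openshift-machine-api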
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
- Optional: If the failed Amazon Web Service (AWS) instance is not removed automatically, terminate the instance from the AWS console.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.2. OpenShift Data Foundation deployed on VMware
To replace an operational node, see:
- Section 1.2.1, "Replacing an operational VMware node on user-provisioned infrastructure"
- Section 1.2.2, "Replacing an operational VMware node on installer-provisioned infrastructure"
To replace a failed node, see:
- Section 1.2.3, "Replacing a failed VMware node on user-provisioned infrastructure"
- Section 1.2.4, "Replacing a failed VMware node on installer-provisioned infrastructure"
1.2.1. Replacing an operational VMware node on user-provisioned infrastructure
Prerequisites
- Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
- You must be logged into the OpenShift Container Platform cluster.
Procedure
- Identify the node and its Virtual Machine (VM) that you need to replace.
- Mark the node as unschedulable:
$ oc adm cordon <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Drain the node:
$ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Delete the node:
$ oc delete nodes <node_name>
- Log in to VMware vSphere, and terminate the VM that you identified.
Important: Delete the VM only from the inventory and not from the disk.
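If you manage vSphere from the command line, the govc utility can power off the VM and remove it from the inventory without deleting its disks. This is only an optional alternative to the vSphere client, and <vm_name> is a placeholder for the VM that you identified; govc must already be configured for your vCenter:
$ govc vm.power -off <vm_name>
$ govc vm.unregister <vm_name>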
- Create a new VM on VMware vSphere with the required infrastructure. See Platform requirements.
- Create a new OpenShift Container Platform worker node using the new VM.
- Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:
$ oc get csr
- Approve all the required OpenShift Container Platform CSRs for the new node:
$ oc adm certificate approve <certificate_name>
<certificate_name>
- Specify the name of the CSR.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.2.2. Replacing an operational VMware node on installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the node that you need to replace. Take a note of its Machine Name.
- Mark the node as unschedulable:
$ oc adm cordon <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Drain the node:
$ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Machines. Search for the required machine.
- Beside the required machine, click Action menu (⋮) → Delete Machine.
- Click Delete to confirm that the machine is deleted. A new machine is automatically created.
- Wait for the new machine to start and transition into Running state.
Important: This activity might take 5 - 10 minutes or more.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.2.3. Replacing a failed VMware node on user-provisioned infrastructure
Prerequisites
- Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
- You must be logged into the OpenShift Container Platform cluster.
Procedure
- Identify the node and its Virtual Machine (VM) that you need to replace.
- Delete the node:
$ oc delete nodes <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Log in to VMware vSphere, and terminate the VM that you identified.
Important: Delete the VM only from the inventory and not from the disk.
- Create a new VM on VMware vSphere with the required infrastructure. See Platform requirements.
- Create a new OpenShift Container Platform worker node using the new VM.
- Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:
$ oc get csr
- Approve all the required OpenShift Container Platform CSRs for the new node:
$ oc adm certificate approve <certificate_name>
<certificate_name>
- Specify the name of the CSR.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.2.4. Replacing a failed VMware node on installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the faulty node, and click on its Machine Name.
- Click Actions → Edit Annotations, and click Add More.
- Add machine.openshift.io/exclude-node-draining, and click Save.
- Click Actions → Delete Machine, and click Delete. A new machine is automatically created. Wait for the new machine to start.
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
- Optional: If the failed Virtual Machine (VM) is not removed automatically, terminate the VM from VMware vSphere.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.3. OpenShift Data Foundation deployed on Red Hat Virtualization
1.3.1. Replacing an operational Red Hat Virtualization node on installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the node that you need to replace. Take a note of its Machine Name.
- Mark the node as unschedulable:
$ oc adm cordon <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Drain the node:
$ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Machines. Search for the required machine.
- Beside the required machine, click Action menu (⋮) → Delete Machine.
- Click Delete to confirm that the machine is deleted. A new machine is automatically created.
- Wait for the new machine to start and transition into Running state.
Important: This activity might take 5 - 10 minutes or more.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.3.2. Replacing a failed Red Hat Virtualization node on installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the faulty node. Take a note of its Machine Name.
- Ensure that the disks are not deleted when you delete the Virtual Machine (VM) instance. Log in to the Red Hat Virtualization Administration Portal, and remove the virtual disks associated with the monitor pod and Object Storage Devices (OSDs) from the failed VM.
Important: Do not select the Remove Permanently option when you remove the disks.
- In the OpenShift Web Console, click Compute → Machines. Search for the required machine.
- Click Actions → Edit Annotations, and click Add More.
- Add machine.openshift.io/exclude-node-draining, and click Save.
- Click Actions → Delete Machine, and click Delete. A new machine is automatically created. Wait for the new machine to start.
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
- Optional: If the failed VM is not removed automatically, remove the VM from the Red Hat Virtualization Administration Portal.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.4. OpenShift Data Foundation deployed on Microsoft Azure
1.4.1. Replacing operational nodes on Azure installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the node that you need to replace. Take a note of its Machine Name.
- Mark the node as unschedulable:
$ oc adm cordon <node_name>
<node_name>
- Specify the name of the node that you need to replace.
- Drain the node:
$ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Machines. Search for the required machine.
- Beside the required machine, click the Action menu (⋮) → Delete Machine.
- Click Delete to confirm that the machine is deleted. A new machine is automatically created.
- Wait for the new machine to start and transition into Running state.
Important: This activity might take 5 - 10 minutes or more.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.
1.4.2. Replacing failed nodes on Azure installer-provisioned infrastructure
Procedure
- Log in to the OpenShift Web Console, and click Compute → Nodes.
- Identify the faulty node, and click on its Machine Name.
- Click Actions → Edit Annotations, and click Add More.
- Add machine.openshift.io/exclude-node-draining, and click Save.
- Click Actions → Delete Machine, and click Delete. A new machine is automatically created. Wait for the new machine to start.
Important: This activity might take 5 - 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.
- Click Compute → Nodes. Confirm that the new node is in Ready state. Apply the OpenShift Data Foundation label to the new node using any one of the following:
- From the user interface
- For the new node, click Action Menu (⋮) → Edit Labels.
- Add cluster.ocs.openshift.io/openshift-storage, and click Save.
- From the command-line interface
- Apply the OpenShift Data Foundation label to the new node:
$ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
<new_node_name>
- Specify the name of the new node.
- Optional: If the failed Azure instance is not removed automatically, terminate the instance from the Azure console.
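If the instance must be removed manually and you use the Azure CLI instead of the console, a delete similar to the following can be used; <resource_group> and <vm_name> are placeholders for your cluster's resource group and the failed instance:
$ az vm delete --resource-group <resource_group> --name <vm_name> --yes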
Verification steps
- Verify that the new node is present in the output:
$ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
- Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:
- csi-cephfsplugin-*
- csi-rbdplugin-*
- Verify that all the other required OpenShift Data Foundation pods are in Running state.
- Verify that the new Object Storage Device (OSD) pods are running on the replacement node:
$ oc get pods -o wide -n openshift-storage | egrep -i <new_node_name> | egrep osd
- Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted. For each of the new nodes identified in the previous step, do the following:
- Create a debug pod and open a chroot environment for the one or more selected hosts:
$ oc debug node/<node_name>
$ chroot /host
- Display the list of available block devices:
$ lsblk
- Check for the crypt keyword beside the one or more ocs-deviceset names.
- If the verification steps fail, contact Red Hat Support.