Chapter 1. OpenShift Data Foundation deployed using dynamic devices

1.1. OpenShift Data Foundation deployed on AWS

1.1.1. Replacing an operational AWS node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged into the OpenShift Container Platform cluster.

Procedure

  1. Identify the node that you need to replace.
  2. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  3. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  4. Delete the node:

    $ oc delete nodes <node_name>
  5. Create a new Amazon Web Services (AWS) machine instance with the required infrastructure. See Platform requirements.
  6. Create a new OpenShift Container Platform node using the new AWS machine instance.
  7. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  8. Approve all the required OpenShift Container Platform CSRs for the new node (a sketch for approving every pending CSR at once follows this procedure):

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  9. Click Compute Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
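
The following is a minimal sketch of how steps 7 and 8 can be combined to approve every CSR that is still pending, instead of approving each certificate by name. It assumes the standard oc client; review the pending CSRs before approving them in bulk.

    $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve

New nodes typically generate a client CSR first and a serving CSR after the client CSR is approved, so you might need to run the command more than once.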

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the ocs-deviceset names. A sketch that filters this output follows these verification steps.

  6. If the verification steps fail, contact Red Hat Support.
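
As a convenience, the lsblk check in the last verification step can be narrowed to the encrypted devices only. This is a sketch, assuming you are still inside the chroot environment opened with oc debug; the exact device names vary by deployment.

    $ lsblk --output NAME,TYPE,SIZE | grep -i crypt

Each encrypted OSD device appears with the crypt type, and its name typically includes the corresponding ocs-deviceset.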

1.1.2. Replacing an operational AWS node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name.
  3. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  5. Click Compute Machines. Search for the required machine.
  6. Beside the required machine, click Action menu (⋮) Delete Machine.
  7. Click Delete to confirm that the machine is deleted. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state (a CLI sketch for watching this follows this procedure).

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
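
If you prefer to follow the replacement from the command line instead of the console, the following sketch watches the machine and node objects. The machine and node names are placeholders, and the timeout value is only an example.

    $ oc get machines -n openshift-machine-api -w
    $ oc wait node/<new_node_name> --for=condition=Ready --timeout=15m

The first command shows the new machine transitioning to the Running phase; the second blocks until the corresponding node reports Ready.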

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state (a sketch for listing any pods that are not yet Running follows these verification steps).
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.
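
A quick way to spot OpenShift Data Foundation pods that are not yet Running is to filter on the pod phase. This is a sketch; pods from completed jobs report the Succeeded phase and can be ignored in this output.

    $ oc get pods -n openshift-storage --field-selector=status.phase!=Running

If the command returns nothing apart from completed pods, the storage pods have settled.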

1.1.3. Replacing a failed AWS node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged into the OpenShift Container Platform cluster.

Procedure

  1. Identify the Amazon Web Services (AWS) machine instance of the node that you need to replace (a sketch for reading the instance ID from the node object follows this procedure).
  2. Log in to AWS, and terminate the AWS machine instance that you identified.
  3. Create a new AWS machine instance with the required infrastructure. See Platform requirements.
  4. Create a new OpenShift Container Platform node using the new AWS machine instance.
  5. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  6. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  7. Click Compute Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Execute the following command to apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
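
To identify the AWS machine instance from step 1 without leaving the command line, you can read the provider ID stored on the node object. This is a sketch; on AWS the value typically has the form aws:///<availability_zone>/<instance_id>, and the instance ID portion is what you terminate in the AWS console.

    $ oc get node <node_name> -o jsonpath='{.spec.providerID}{"\n"}'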

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.1.4. Replacing a failed AWS node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the faulty node, and click on its Machine Name.
  3. Click Actions Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start. For a CLI alternative to steps 3 through 5, see the sketch after this procedure.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  7. Click Compute Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Amazon Web Services (AWS) instance is not removed automatically, terminate the instance from the AWS console.
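
Steps 3 through 5 can also be performed with the oc client. The following is a sketch, assuming the machine name identified in step 2; the console step only requires the annotation key to be present, so an empty value is used here.

    $ oc annotate machine <machine_name> -n openshift-machine-api machine.openshift.io/exclude-node-draining=""
    $ oc delete machine <machine_name> -n openshift-machine-api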

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node (a label-based alternative to this command follows these verification steps):

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.
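
As an alternative to the egrep command in verification step 4, you can select the OSD pods by label and read the NODE column of the wide output. This is a sketch that assumes the usual app=rook-ceph-osd label applied by the Rook operator.

    $ oc get pods -n openshift-storage -l app=rook-ceph-osd -o wide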

1.2. OpenShift Data Foundation deployed on VMware

1.2.1. Replacing an operational VMware node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged into the OpenShift Container Platform cluster.

Procedure

  1. Identify the node and its Virtual Machine (VM) that you need to replace.
  2. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  3. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  4. Delete the node (a sketch for first confirming that no OpenShift Data Foundation pods remain on it follows this procedure):

    $ oc delete nodes <node_name>
  5. Log in to VMware vSphere, and terminate the VM that you identified:

    Important

    Delete the VM only from the inventory and not from the disk.

  6. Create a new VM on VMware vSphere with the required infrastructure. See Platform requirements.
  7. Create a new OpenShift Container Platform worker node using the new VM.
  8. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  9. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  10. Click Compute Nodes. Confirm that the new node is in Ready state.
  11. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
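
Before deleting a drained node, you can confirm that no OpenShift Data Foundation workloads remain on it. This is a sketch; pods managed by daemon sets, such as the csi-*plugin pods, are expected to remain because the drain command ignores them.

    $ oc get pods -n openshift-storage -o wide --field-selector spec.nodeName=<node_name>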

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.2.2. Replacing an operational VMware node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name (a CLI sketch for mapping the node to its machine follows this procedure).
  3. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  5. Click Compute Machines. Search for the required machine.
  6. Beside the required machine, click Action menu (⋮) Delete Machine.
  7. Click Delete to confirm the machine is deleted. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
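
If you want to map the node to its machine from the command line rather than searching the console, the wide machine listing includes a NODE column. This is a sketch using the standard machine-api namespace.

    $ oc get machines -n openshift-machine-api -o wide | grep <node_name>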

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.2.3. Replacing a failed VMware node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged into the OpenShift Container Platform cluster.

Procedure

  1. Identify the node and its Virtual Machine (VM) that you need to replace.
  2. Delete the node:

    $ oc delete nodes <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  3. Log in to VMware vSphere and terminate the VM that you identified.

    Important

    Delete the VM only from the inventory and not from the disk.

  4. Create a new VM on VMware vSphere with the required infrastructure. See Platform requirements.
  5. Create a new OpenShift Container Platform worker node using the new VM.
  6. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  7. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  8. Click Compute Nodes. Confirm that the new node is in Ready state.
  9. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.2.4. Replacing a failed VMware node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the faulty node, and click on its Machine Name.
  3. Click Actions Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  7. Click Compute Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Virtual Machine (VM) is not removed automatically, terminate the VM from VMware vSphere.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.3. OpenShift Data Foundation deployed on Red Hat Virtualization

1.3.1. Replacing an operational Red Hat Virtualization node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name.
  3. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  5. Click Compute Machines. Search for the required machine.
  6. Beside the required machine, click Action menu (⋮) Delete Machine.
  7. Click Delete to confirm the machine is deleted. A new machine is automatically created. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  8. Click Compute Nodes. Confirm that the new node is in Ready state.
  9. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.3.2. Replacing a failed Red Hat Virtualization node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the faulty node. Take a note of its Machine Name.
  3. Ensure that the disks are not deleted when you delete the Virtual Machine (VM) instance.

    • Log in to the Red Hat Virtualization Administration Portal, and remove the virtual disks associated with the monitor pod and Object Storage Devices (OSDs) from the failed VM. To identify which monitor and OSD pods were on the failed node, see the sketch after this procedure.

      Important

      Do not select the Remove Permanently option when you remove the one or more disks.

  4. In the OpenShift Web Console, click Compute Machines. Search for the required machine.
  5. Click Actions Edit Annotations, and click Add More.
  6. Add machine.openshift.io/exclude-node-draining, and click Save.
  7. Click Actions Delete Machine, and click Delete.

    A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  8. Click Compute Nodes. Confirm that the new node is in Ready state.
  9. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  10. Optional: If the failed VM is not removed automatically, remove the VM from the Red Hat Virtualization Administration Portal.
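
To identify which monitor and OSD pods were running on the failed node, and therefore which virtual disks to remove in step 3, you can list the storage pods by node. This is a sketch; on a failed node the pods might show a Terminating or Unknown status, but the wide output still reports the node name.

    $ oc get pods -n openshift-storage -o wide | grep <failed_node_name> | egrep 'rook-ceph-(mon|osd)'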

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.4. OpenShift Data Foundation deployed on Microsoft Azure

1.4.1. Replacing operational nodes on Azure installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name.
  3. Mark the node as unschedulable (a sketch for confirming the cordon follows this procedure):

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  5. Click Compute Machines. Search for the required machine.
  6. Beside the required machine, click the Action menu (⋮) Delete Machine.
  7. Click Delete to confirm the machine is deleted. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Execute the following command to apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
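
You can confirm that the cordon from step 3 took effect before draining: the node status should include SchedulingDisabled. This is a minimal sketch.

    $ oc get node <node_name>

The STATUS column shows Ready,SchedulingDisabled for a cordoned node.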

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.

1.4.2. Replacing failed nodes on Azure installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute Nodes.
  2. Identify the faulty node, and click on its Machine Name.
  3. Click Actions Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when you label the new node and it becomes functional.

  7. Click Compute Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Azure instance is not removed automatically, terminate the instance from the Azure console. An Azure CLI sketch for this follows this procedure.
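
If you use the Azure CLI instead of the Azure console for the optional cleanup in step 9, the following sketch removes the failed virtual machine. The resource group and VM names are placeholders; verify that you are deleting the correct instance before running the command.

    $ az vm delete --resource-group <resource_group> --name <vm_name> --yes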

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the one or more selected hosts:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside the one or more ocs-deviceset names.

  6. If the verification steps fail, contact Red Hat Support.