Chapter 1. OpenShift Data Foundation deployed using dynamic devices

1.1. OpenShift Data Foundation deployed on AWS

1.1.1. Replacing an operational AWS node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged in to the OpenShift Container Platform cluster.
Note

When replacing an AWS node on user-provisioned infrastructure, create the new node in the same AWS zone as the original node.

Procedure

  1. Identify the node that you need to replace.
  2. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  3. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  4. Delete the node:

    $ oc delete nodes <node_name>
  5. Create a new Amazon Web Services (AWS) machine instance with the required infrastructure. See Platform requirements.
  6. Create a new OpenShift Container Platform node using the new AWS machine instance.
  7. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  8. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  9. Click Compute → Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
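
When many CSRs are listed, picking out the pending ones by eye is error-prone. The following is a minimal sketch, not part of the official procedure, that filters the `oc get csr` output for requests in Pending state. It assumes the default table layout, in which NAME is the first column and CONDITION is the last one; `pending_csr_names` is a hypothetical helper name.

```shell
# Hypothetical helper: print the names of CSRs whose CONDITION is Pending.
# Assumption: default `oc get csr` table, NAME first and CONDITION last.
pending_csr_names() {
  # Skip the header row, then keep rows whose last field is exactly "Pending".
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Example usage (requires a logged-in oc session):
#   oc get csr | pending_csr_names
```

Review each printed name before you approve it with `oc adm certificate approve <certificate_name>`.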

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.
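
The encryption check in step 5 can be narrowed to the relevant rows. The sketch below is an illustration only: it assumes you run `lsblk -l -o NAME,TYPE` inside the chroot environment so that the output has exactly two columns, and `encrypted_devicesets` is a hypothetical helper name.

```shell
# Hypothetical helper: list ocs-deviceset block devices whose TYPE is "crypt".
# Assumption: input is the two-column output of `lsblk -l -o NAME,TYPE`.
encrypted_devicesets() {
  # Skip the header row; keep rows named after a device set with type crypt.
  awk 'NR > 1 && $1 ~ /ocs-deviceset/ && $2 == "crypt" { print $1 }'
}

# Example usage inside the debug pod, after `chroot /host`:
#   lsblk -l -o NAME,TYPE | encrypted_devicesets
```

An empty result on a node that hosts OSDs suggests that no encrypted device set was found there.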

1.1.2. Replacing an operational AWS node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name.
  3. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  5. Click Compute → Machines. Search for the required machine.
  6. Beside the required machine, click Action menu (⋮) → Delete Machine.
  7. Click Delete to confirm the machine deletion. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute → Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
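
The Ready check in step 9 can also be done from the command line. This is a minimal sketch under the assumption that `oc get nodes` prints its default table with NAME in the first column and STATUS in the second; `node_is_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the named node reports Ready.
# Assumption: input is the default `oc get nodes` table (NAME, STATUS, ...).
# A STATUS such as "Ready,SchedulingDisabled" still counts as Ready.
node_is_ready() {
  awk -v node="$1" '
    $1 == node {
      n = split($2, parts, ",")
      for (i = 1; i <= n; i++) if (parts[i] == "Ready") ok = 1
    }
    END { exit (ok ? 0 : 1) }'
}

# Example usage (requires a logged-in oc session):
#   oc get nodes | node_is_ready <new_node_name> && echo "node is Ready"
```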

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.
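
As an alternative to the `grep` and `cut` pipeline in verification step 1, a label selector can ask the API server for the labeled nodes directly. The sketch below assumes the output of `oc get nodes -l cluster.ocs.openshift.io/openshift-storage -o name`, which prints one `node/<name>` entry per line; `has_storage_label` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the named node appears in the list of
# nodes that carry the OpenShift Data Foundation label.
# Assumption: input lines have the form "node/<name>" (output of -o name).
has_storage_label() {
  # -x matches the whole line, -F treats dots in node names literally.
  grep -qxF "node/$1"
}

# Example usage (requires a logged-in oc session):
#   oc get nodes -l cluster.ocs.openshift.io/openshift-storage -o name \
#     | has_storage_label <new_node_name> && echo "label present"
```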

1.1.3. Replacing a failed AWS node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged in to the OpenShift Container Platform cluster.

Procedure

  1. Identify the Amazon Web Services (AWS) machine instance of the node that you need to replace.
  2. Log in to AWS, and terminate the AWS machine instance that you identified.
  3. Create a new AWS machine instance with the required infrastructure. See Platform requirements.
  4. Create a new OpenShift Container Platform node using the new AWS machine instance.
  5. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  6. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  7. Click Compute → Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Execute the following command to apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.1.4. Replacing a failed AWS node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the faulty node, and click its Machine Name.
  3. Click Actions → Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions → Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  7. Click Compute → Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Amazon Web Services (AWS) instance is not removed automatically, terminate the instance from the AWS console.
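
The annotation and deletion in steps 3 to 5 use the web console. A command-line equivalent might look like the following sketch. It is not part of the documented procedure and rests on two assumptions: that the Machine objects live in the `openshift-machine-api` namespace, and that an empty annotation value is sufficient. `delete_failed_machine` is a hypothetical helper name.

```shell
# Hypothetical helper: annotate a failed machine so that it is deleted
# without draining its node, then delete the machine.
# Assumptions: Machine objects are in the openshift-machine-api namespace,
# and the annotation only needs to be present (empty value).
delete_failed_machine() {
  oc annotate machine "$1" -n openshift-machine-api \
    machine.openshift.io/exclude-node-draining="" &&
  oc delete machine "$1" -n openshift-machine-api
}

# Example usage (destructive, requires a logged-in oc session):
#   delete_failed_machine <machine_name>
```

Deleting a machine is destructive, so confirm the machine name against the faulty node before you run anything like this.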

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.2. OpenShift Data Foundation deployed on VMware

1.2.1. Replacing an operational VMware node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged in to the OpenShift Container Platform cluster.

Procedure

  1. Identify the node and its Virtual Machine (VM) that you need to replace.
  2. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  3. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  4. Delete the node:

    $ oc delete nodes <node_name>
  5. Log in to VMware vSphere, and terminate the VM that you identified.

    Important

    Delete the VM only from the inventory and not from the disk.

  6. Create a new VM on VMware vSphere with the required infrastructure. See Platform requirements.
  7. Create a new OpenShift Container Platform worker node using the new VM.
  8. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  9. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  10. Click Compute → Nodes. Confirm that the new node is in Ready state.
  11. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.
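
The `egrep` pipeline in verification step 4 matches any pod whose row mentions the node name and "osd", including OSD prepare pods. The sketch below is one possible stricter filter. It assumes the input comes from `oc get pods -n openshift-storage --no-headers -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PHASE:.status.phase`, and that OSD daemon pods are named `rook-ceph-osd-<id>-...`; both the naming pattern and the helper name `osd_pods_on_node` are assumptions.

```shell
# Hypothetical helper: print "<pod> <phase>" for OSD daemon pods on a node.
# Assumptions: three input columns (pod name, node name, phase) and OSD pod
# names that start with "rook-ceph-osd-<number>-".
osd_pods_on_node() {
  awk -v node="$1" '$2 == node && $1 ~ /^rook-ceph-osd-[0-9]+-/ { print $1, $3 }'
}

# Example usage (requires a logged-in oc session):
#   oc get pods -n openshift-storage --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,PHASE:.status.phase \
#     | osd_pods_on_node <new_node_name>
```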

1.2.2. Replacing an operational VMware node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name.
  3. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  5. Click Compute → Machines. Search for the required machine.
  6. Beside the required machine, click Action menu (⋮) → Delete Machine.
  7. Click Delete to confirm the machine deletion. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute → Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
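
Step 8 waits for the replacement machine to reach Running state. From the command line, the phase can be read from the machine list. This sketch assumes that `oc get machines -n openshift-machine-api` prints its default table with NAME in the first column and PHASE in the second; `machine_phase` is a hypothetical helper name.

```shell
# Hypothetical helper: print the PHASE of the named machine.
# Assumption: input is the default `oc get machines` table (NAME, PHASE, ...).
machine_phase() {
  # Skip the header row and print the second column of the matching row.
  awk -v m="$1" 'NR > 1 && $1 == m { print $2 }'
}

# Example usage (requires a logged-in oc session):
#   oc get machines -n openshift-machine-api | machine_phase <machine_name>
```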

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.2.3. Replacing a failed VMware node on user-provisioned infrastructure

Prerequisites

  • Ensure that the replacement nodes are configured with similar infrastructure and resources to the node that you replace.
  • You must be logged in to the OpenShift Container Platform cluster.

Procedure

  1. Identify the node and its Virtual Machine (VM) that you need to replace.
  2. Delete the node:

    $ oc delete nodes <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  3. Log in to VMware vSphere and terminate the VM that you identified.

    Important

    Delete the VM only from the inventory and not from the disk.

  4. Create a new VM on VMware vSphere with the required infrastructure. See Platform requirements.
  5. Create a new OpenShift Container Platform worker node using the new VM.
  6. Check for the Certificate Signing Requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  7. Approve all the required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <certificate_name>
    <certificate_name>
    Specify the name of the CSR.
  8. Click Compute → Nodes. Confirm that the new node is in Ready state.
  9. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.2.4. Replacing a failed VMware node on installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the faulty node, and click its Machine Name.
  3. Click Actions → Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions → Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  7. Click Compute → Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Virtual Machine (VM) is not removed automatically, terminate the VM from VMware vSphere.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.3. OpenShift Data Foundation deployed on Microsoft Azure

1.3.1. Replacing operational nodes on Azure installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the node that you need to replace. Take a note of its Machine Name.
  3. Mark the node as unschedulable:

    $ oc adm cordon <node_name>
    <node_name>
    Specify the name of the node that you need to replace.
  4. Drain the node:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  5. Click Compute → Machines. Search for the required machine.
  6. Beside the required machine, click the Action menu (⋮) → Delete Machine.
  7. Click Delete to confirm the machine deletion. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute → Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Execute the following command to apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.3.2. Replacing failed nodes on Azure installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the faulty node, and click its Machine Name.
  3. Click Actions → Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions → Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved after you label the new node and it is functional.

  7. Click Compute → Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Azure instance is not removed automatically, terminate the instance from the Azure console.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.4. OpenShift Data Foundation deployed on Google Cloud

1.4.1. Replacing operational nodes on Google Cloud installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the node that needs to be replaced. Take a note of its Machine Name.
  3. Mark the node as unschedulable using the following command:

    $ oc adm cordon <node_name>
  4. Drain the node using the following command:

    $ oc adm drain <node_name> --force --delete-emptydir-data=true --ignore-daemonsets
    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  5. Click Compute → Machines. Search for the required machine.
  6. Beside the required machine, click the Action menu (⋮) → Delete Machine.
  7. Click Delete to confirm the machine deletion. A new machine is automatically created.
  8. Wait for the new machine to start and transition into Running state.

    Important

    This activity might take 5 to 10 minutes or more.

  9. Click Compute → Nodes. Confirm that the new node is in Ready state.
  10. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Execute the following command to apply the OpenShift Data Foundation label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.

1.4.2. Replacing failed nodes on Google Cloud installer-provisioned infrastructure

Procedure

  1. Log in to the OpenShift Web Console, and click Compute → Nodes.
  2. Identify the faulty node, and click its Machine Name.
  3. Click Actions → Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining, and click Save.
  5. Click Actions → Delete Machine, and click Delete.
  6. A new machine is automatically created. Wait for the new machine to start.

    Important

    This activity might take 5 to 10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  7. Click Compute → Nodes. Confirm that the new node is in Ready state.
  8. Apply the OpenShift Data Foundation label to the new node using any one of the following:

    From the web user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage, and click Save.
    From the command-line interface
    • Apply the OpenShift Data Foundation label to the new node:
    $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
    <new_node_name>
    Specify the name of the new node.
  9. Optional: If the failed Google Cloud instance is not removed automatically, terminate the instance from the Google Cloud console.

Verification steps

  1. Verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= |cut -d' ' -f1
  2. Click Workloads → Pods. Confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all the other required OpenShift Data Foundation pods are in Running state.
  4. Verify that the new Object Storage Device (OSD) pods are running on the replacement node:

    $ oc get pods -o wide -n openshift-storage| egrep -i <new_node_name> | egrep osd
  5. Optional: If cluster-wide encryption is enabled on the cluster, verify that the new OSD devices are encrypted.

    For each of the new nodes identified in the previous step, do the following:

    1. Create a debug pod and open a chroot environment for the selected host:

      $ oc debug node/<node_name>
      $ chroot /host
    2. Display the list of available block devices:

      $ lsblk

      Check for the crypt keyword beside each ocs-deviceset name.

  6. If the verification steps fail, contact Red Hat Support.