
Chapter 4. Replacing storage nodes for OpenShift Container Storage


For OpenShift Container Storage 4.2, node replacement can be performed proactively for an operational node or reactively for a failed node in the following deployments:

  • For Amazon Web Services (AWS)

    • User-provisioned infrastructure
    • Installer-provisioned infrastructure
  • For VMware

    • User-provisioned infrastructure

4.1. OpenShift Container Storage deployed on AWS

Perform this procedure to replace an operational node on AWS user-provisioned infrastructure (UPI).

Procedure

  1. Identify the node that needs to be replaced.
  2. Mark the node as unschedulable using the following command:

    $ oc adm cordon <node_name>
  3. Drain the node using the following command:

    $ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
    Important

    This activity may take 5-10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  4. Delete the node using the following command:

    $ oc delete nodes <node_name>
  5. Create a new AWS machine instance with the required infrastructure. See Infrastructure requirements.
  6. Create a new OpenShift Container Platform node using the new AWS machine instance.
  7. Check for certificate signing requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  8. Approve all required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <Certificate_Name>
  9. Click Compute → Nodes and confirm that the new node is in Ready state.
  10. Apply the OpenShift Container Storage label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  11. Restart the mgr pod to update OpenShift Container Storage with the new hostname.

    $ oc delete pod rook-ceph-mgr-xxxx
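
    The exact name of the mgr pod varies from cluster to cluster. The following is a minimal sketch, assuming the default openshift-storage namespace and Rook's app=rook-ceph-mgr pod label, for locating the pod and restarting it; the same approach applies to the other replacement procedures in this chapter:

      # Find the current mgr pod name (openshift-storage namespace assumed)
      $ oc get pods -n openshift-storage -l app=rook-ceph-mgr
      # Delete it; the operator recreates it and the new pod picks up the new hostname
      $ oc delete pod -n openshift-storage -l app=rook-ceph-mgr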

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state (a command line alternative is sketched after these verification steps):

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. If verification steps fail, contact Red Hat Support.
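
As a command line alternative to checking pods in the web console, the following sketch (assuming the default openshift-storage namespace) lists the OpenShift Container Storage pods scheduled on the new node, so that you can confirm the csi-cephfsplugin-* and csi-rbdplugin-* pods are Running:

    $ oc get pods -n openshift-storage -o wide --field-selector spec.nodeName=<new_node_name>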

Perform this procedure to replace an operational node on AWS installer-provisioned infrastructure (IPI).

Procedure

  1. Log in to OpenShift Web Console and click Compute → Nodes.
  2. Identify the node that needs to be replaced. Take note of its Machine Name.
  3. Mark the node as unschedulable using the following command:

    $ oc adm cordon <node_name>
  4. Drain the node using the following command:

    $ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
    Important

    This activity may take 5-10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  5. Click Compute → Machines and search for the required machine.
  6. Beside the required machine, click the Action menu (⋮) → Delete Machine (a command line alternative is sketched after this procedure).
  7. Click Delete to confirm the machine deletion. A new machine is automatically created.
  8. Wait for the new machine to start and transition into the Running state.

    Important

    This activity may take 5-10 minutes or more.

  9. Click Compute → Nodes and confirm that the new node is in Ready state.
  10. Apply the OpenShift Container Storage label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  11. Restart the mgr pod to update OpenShift Container Storage with the new hostname.

    $ oc delete pod rook-ceph-mgr-xxxx
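
The machine deletion described in steps 5 to 7 can also be done from the command line. This is only a sketch, assuming the machines live in the default openshift-machine-api namespace:

    # Find the machine that backs the node being replaced
    $ oc get machines -n openshift-machine-api -o wide | grep <node_name>
    # Deleting the machine causes the machine API to provision a replacement
    $ oc delete machine <machine_name> -n openshift-machine-api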

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. If verification steps fail, contact Red Hat Support.

Perform this procedure to replace a failed node that is not operational on AWS user-provisioned infrastructure (UPI) for OpenShift Container Storage 4.2.

Procedure

  1. Identify the AWS machine instance of the node that needs to be replaced (one way to find the instance ID is sketched after this procedure).
  2. Log in to AWS and terminate the identified AWS machine instance.
  3. Create a new AWS machine instance with the required infrastructure. See Infrastructure requirements.
  4. Create a new OpenShift Container Platform node using the new AWS machine instance.
  5. Check for certificate signing requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  6. Approve all required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <Certificate_Name>
  7. Click Compute → Nodes and confirm that the new node is in Ready state.
  8. Apply the OpenShift Container Storage label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  9. Restart the mgr pod to update OpenShift Container Storage with the new hostname.

    $ oc delete pod rook-ceph-mgr-xxxx
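
For step 1, one way to find the AWS instance behind a node is to read the node's providerID, which embeds the instance ID (for example, aws:///us-east-1a/i-0123456789abcdef0). This is a sketch rather than part of the official procedure:

    $ oc get node <node_name> -o jsonpath='{.spec.providerID}{"\n"}'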

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. If verification steps fail, contact Red Hat Support.

Perform this procedure to replace a failed node that is not operational on AWS installer-provisioned infrastructure (IPI) for OpenShift Container Storage 4.2.

Procedure

  1. Log in to OpenShift Web Console and click Compute → Nodes.
  2. Identify the faulty node and click its Machine Name.
  3. Click Actions → Edit Annotations, and click Add More.
  4. Add machine.openshift.io/exclude-node-draining and click Save.
  5. Click Actions → Delete Machine, and click Delete (a command line alternative for steps 3 to 5 is sketched after this procedure).
  6. A new machine is automatically created; wait for the new machine to start.

    Important

    This activity may take 5-10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  7. Click Compute → Nodes and confirm that the new node is in Ready state.
  8. Apply the OpenShift Container Storage label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  9. [Optional]: If the failed AWS instance is not removed automatically, terminate the instance from the AWS console.
  10. Restart the mgr pod to update OpenShift Container Storage with the new hostname.

    $ oc delete pod rook-ceph-mgr-xxxx
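
Steps 3 to 5 can also be performed from the command line. A minimal sketch, assuming the machine lives in the default openshift-machine-api namespace:

    # Annotate the machine so that the failed node is not drained before deletion
    $ oc annotate machine <machine_name> machine.openshift.io/exclude-node-draining= -n openshift-machine-api
    # Delete the machine; a replacement is created automatically
    $ oc delete machine <machine_name> -n openshift-machine-api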

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. If verification steps fail, contact Red Hat Support.

4.2. OpenShift Container Storage deployed on VMware

Perform this procedure to replace an operational node on VMware user-provisioned infrastructure (UPI).

Procedure

  1. Identify the node and its VM that need to be replaced.
  2. Mark the node as unschedulable using the following command:

    $ oc adm cordon <node_name>
  3. Drain the node using the following command:

    $ oc adm drain <node_name> --force --delete-local-data --ignore-daemonsets
    Important

    This activity may take 5-10 minutes or more. Ceph errors generated during this period are temporary and are automatically resolved when the new node is labeled and functional.

  4. Delete the node using the following command:

    $ oc delete nodes <node_name>
  5. Log in to vSphere and terminate the identified VM.

    Important

    The VM should be deleted only from the inventory and not from the disk.

  6. Create a new VM on vSphere with the required infrastructure. See Infrastructure requirements.
  7. Create a new OpenShift Container Platform worker node using the new VM.
  8. Check for certificate signing requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  7. Approve all required OpenShift Container Platform CSRs for the new node (a shortcut for approving all pending CSRs at once is sketched after this procedure):

    $ oc adm certificate approve <Certificate_Name>
  10. Click Compute → Nodes and confirm that the new node is in Ready state.
  11. Apply the OpenShift Container Storage label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  12. Restart the mgr pod to update OpenShift Container Storage with the new hostname.

    $ oc delete pod rook-ceph-mgr-xxxx
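
When several CSRs are pending for the new node, approving them one at a time can be tedious. One possible shortcut, sketched here rather than taken from the official procedure, approves every CSR that does not yet have a status:

    $ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve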

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. If verification steps fail, contact Red Hat Support.

Perform this procedure to replace a failed node on VMware user-provisioned infrastructure (UPI).

Procedure

  1. Identify the node and its VM that need to be replaced (one way to match a node to its VM is sketched after this procedure).
  2. Delete the node using the following command:

    $ oc delete nodes <node_name>
  3. Log in to vSphere and terminate the identified VM.

    Important

    The VM should be deleted only from the inventory and not from the disk.

  4. Create a new VM on vSphere with the required infrastructure. See Infrastructure requirements.
  5. Create a new OpenShift Container Platform worker node using the new VM.
  6. Check for certificate signing requests (CSRs) related to OpenShift Container Platform that are in Pending state:

    $ oc get csr
  7. Approve all required OpenShift Container Platform CSRs for the new node:

    $ oc adm certificate approve <Certificate_Name>
  8. Click Compute → Nodes and confirm that the new node is in Ready state.
  9. Apply the OpenShift Container Storage label to the new node using one of the following:

    From the user interface
    1. For the new node, click Action Menu (⋮) → Edit Labels.
    2. Add cluster.ocs.openshift.io/openshift-storage and click Save.
    From the command line interface
    • Execute the following command to apply the OpenShift Container Storage label to the new node:

      $ oc label node <new_node_name> cluster.ocs.openshift.io/openshift-storage=""
  10. Restart the mgr pod to update OpenShift Container Storage with the new hostname.

    $ oc delete pod rook-ceph-mgr-xxxx
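
For step 1, the node's reported addresses and hostname can help match it to the corresponding VM in the vSphere inventory. A minimal sketch:

    $ oc get node <node_name> -o wide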

Verification steps

  1. Execute the following command and verify that the new node is present in the output:

    $ oc get nodes --show-labels | grep cluster.ocs.openshift.io/openshift-storage= | cut -d' ' -f1
  2. Click Workloads → Pods and confirm that at least the following pods on the new node are in Running state:

    • csi-cephfsplugin-*
    • csi-rbdplugin-*
  3. Verify that all other required OpenShift Container Storage pods are in Running state.
  4. If verification steps fail, contact Red Hat Support.