Chapter 4. Using Machine Deletion Remediation
You can use the Machine Deletion Remediation Operator, in conjunction with the Node Health Check Operator, to reprovision unhealthy nodes by using the Machine API.
4.1. About the Machine Deletion Remediation Operator
The Machine Deletion Remediation (MDR) Operator works with the NodeHealthCheck controller to reprovision unhealthy nodes by using the Machine API. MDR follows the annotation on the node to the associated machine object, confirms that the machine has an owning controller (for example, MachineSetController), and deletes the machine. Once the machine CR is deleted, the owning controller creates a replacement.
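The lookup described above can be sketched with the two objects that MDR inspects. This sketch assumes an OpenShift cluster, where a node is linked to its machine through the machine.openshift.io/machine annotation with a value of the form <namespace>/<machine-name>, and where an owning controller appears as an owner reference with controller: true. The node, machine, and machine set names and the UID are hypothetical, and unrelated fields are omitted.

```yaml
# Node: the annotation points to the Machine that backs this node.
apiVersion: v1
kind: Node
metadata:
  name: worker-0                                   # hypothetical node name
  annotations:
    machine.openshift.io/machine: openshift-machine-api/mycluster-worker-a-abcde   # hypothetical machine
---
# Machine: the controller owner reference identifies the owning MachineSet,
# which creates a replacement machine after this one is deleted.
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: mycluster-worker-a-abcde                   # hypothetical machine name
  namespace: openshift-machine-api
  ownerReferences:
    - apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      name: mycluster-worker-a                     # hypothetical machine set name
      uid: 00000000-0000-0000-0000-000000000000    # placeholder UID
      controller: true                             # marks the MachineSet as the owning controller
```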
The prerequisites for MDR include:
- a Machine API-based cluster that is able to programmatically destroy and create cluster nodes,
- nodes that are associated with machines, and
- declaratively managed machines.
You can then modify the NodeHealthCheck CR to use MDR as its remediator. An example MDR template object and NodeHealthCheck configuration are provided in the documentation.
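A NodeHealthCheck CR that selects worker nodes and uses MDR as its remediator might resemble the following sketch. It assumes the NodeHealthCheck API version remediation.medik8s.io/v1alpha1, the MDR API group machine-deletion-remediation.medik8s.io/v1alpha1, and a MachineDeletionRemediationTemplate with the hypothetical name mdr-template-sample in the openshift-workload-availability namespace. The selector, threshold, and durations are illustrative values, not recommendations.

```yaml
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nhc-worker-mdr                   # hypothetical name; NodeHealthCheck is cluster-scoped
spec:
  # Only nodes that match this selector are checked.
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  # NHC remediates only while at least this share of the selected nodes is healthy.
  minHealthy: 51%
  # A node is considered unhealthy when one of these conditions lasts for the given duration.
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: 300s
    - type: Ready
      status: Unknown
      duration: 300s
  # Points NHC at the MDR template; NHC creates MachineDeletionRemediation CRs from it.
  remediationTemplate:
    apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
    kind: MachineDeletionRemediationTemplate
    name: mdr-template-sample            # hypothetical template name
    namespace: openshift-workload-availability
```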
The MDR process works as follows:
- The Node Health Check Operator detects an unhealthy node and creates an MDR CR.
- The MDR Operator watches for the MDR CR associated with the unhealthy node and, if the node's machine has an owning controller, deletes that machine.
- When the node is healthy again, the NodeHealthCheck controller deletes the MDR CR.
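For orientation, a MachineDeletionRemediation CR created by the NodeHealthCheck controller during this process might look roughly like the following. This is a sketch: it assumes the API group machine-deletion-remediation.medik8s.io/v1alpha1, a hypothetical unhealthy node named worker-0, and that the CR is created in the namespace of the remediation template. Fields that the controllers populate, such as owner references and status, are omitted.

```yaml
apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediation
metadata:
  # The CR is named after the unhealthy node that it remediates.
  name: worker-0                         # hypothetical node name
  namespace: openshift-workload-availability   # assumed: same namespace as the template
```

You can list such CRs across all namespaces by running oc get mdr -A.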
4.2. Installing the Machine Deletion Remediation Operator by using the web console
You can use the Red Hat OpenShift web console to install the Machine Deletion Remediation Operator.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Machine Deletion Remediation Operator, or MDR, from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator is installed in the openshift-workload-availability namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-workload-availability namespace and that its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs of the pods in the openshift-workload-availability project for any reported issues.
4.3. Installing the Machine Deletion Remediation Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Machine Deletion Remediation Operator.
You can install the Machine Deletion Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a Namespace custom resource (CR) for the Machine Deletion Remediation Operator:

Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-workload-availability
```

To create the Namespace CR, run the following command:

```console
$ oc create -f workload-availability-namespace.yaml
```
Create an OperatorGroup CR:

Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: workload-availability-operator-group
  namespace: openshift-workload-availability
```

To create the OperatorGroup CR, run the following command:

```console
$ oc create -f workload-availability-operator-group.yaml
```
Create a Subscription CR:

Define the Subscription CR and save the YAML file, for example, machine-deletion-remediation-subscription.yaml:

(1) Specify the Namespace where you want to install the Machine Deletion Remediation Operator. When you install the Machine Deletion Remediation Operator in the openshift-workload-availability namespace specified in the Subscription CR, the Namespace and OperatorGroup CRs already exist.
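The Subscription CR for this step might resemble the following sketch. The subscription name is hypothetical, and the package name machine-deletion-remediation (inferred from the CSV name in the verification output), the channel stable, and the catalog source redhat-operators are assumptions. You can check the available package and its channels by running oc get packagemanifests -n openshift-marketplace before you use this file.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: machine-deletion-remediation-operator   # hypothetical subscription name
  namespace: openshift-workload-availability    # (1) namespace in which to install the Operator
spec:
  channel: stable                               # assumed channel name
  name: machine-deletion-remediation            # assumed package name
  source: redhat-operators                      # assumed catalog source
  sourceNamespace: openshift-marketplace
```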
To create the Subscription CR, run the following command:

```console
$ oc create -f machine-deletion-remediation-subscription.yaml
```
Verification
Verify that the installation succeeded by inspecting the CSV resource:
```console
$ oc get csv -n openshift-workload-availability
```

Example output

```
NAME                                  DISPLAY                                 VERSION   REPLACES                              PHASE
machine-deletion-remediation.v0.3.0   Machine Deletion Remediation Operator   0.3.0     machine-deletion-remediation.v0.2.1   Succeeded
```
4.4. Configuring the Machine Deletion Remediation Operator
You can use the Machine Deletion Remediation Operator with the Node Health Check Operator by creating a MachineDeletionRemediationTemplate custom resource (CR). This CR defines the remediation strategy for the nodes.
The MachineDeletionRemediationTemplate CR resembles the following YAML file:
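The following is a minimal sketch of such a template. It assumes the API group machine-deletion-remediation.medik8s.io/v1alpha1 and uses a hypothetical template name; compare it with the CRD installed in your cluster, for example by running oc explain mdrt.spec, before you use it.

```yaml
apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediationTemplate
metadata:
  name: mdr-template-sample              # hypothetical name; referenced from the NodeHealthCheck CR
  namespace: openshift-workload-availability
spec:
  # Assumed medik8s template pattern: this inner spec is used for each
  # MachineDeletionRemediation CR that NHC creates. It is left empty here.
  template:
    spec: {}
```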
4.5. Troubleshooting the Machine Deletion Remediation Operator
4.5.1. General troubleshooting
- Issue
- You want to troubleshoot issues with the Machine Deletion Remediation Operator.
- Resolution
Check the Operator logs.

```console
$ oc logs <machine-deletion-remediation-controller-manager-name> -c manager -n <namespace-name>
```
4.5.2. Unsuccessful remediation
- Issue
- An unhealthy node was not remediated.
- Resolution
Verify that the MachineDeletionRemediation CR was created by running the following command:

```console
$ oc get mdr -A
```

If the NodeHealthCheck controller did not create the MachineDeletionRemediation CR when the node turned unhealthy, check the logs of the NodeHealthCheck controller. Additionally, ensure that the NodeHealthCheck CR references the remediation template, for example, in its spec.remediationTemplate field.

If the MachineDeletionRemediation CR was created, ensure that its name matches the name of the unhealthy node object.
4.5.3. Machine Deletion Remediation Operator resources exist even after uninstalling the Operator
- Issue
- The Machine Deletion Remediation Operator resources, such as the remediation CR and the remediation template CR, exist even after uninstalling the Operator.
- Resolution
To remove the Machine Deletion Remediation Operator resources, you can select the Delete all operand instances for this operator checkbox before uninstalling the Operator. This checkbox is available only in Red Hat OpenShift 4.13 and later. For all versions of Red Hat OpenShift, you can delete the resources by running the relevant command for each resource type:

```console
$ oc delete mdr <machine-deletion-remediation> -n <namespace>
```

```console
$ oc delete mdrt <machine-deletion-remediation-template> -n <namespace>
```

The remediation CR mdr must be created and deleted by the same entity, for example, NHC. If the remediation CR mdr is still present, it is deleted together with the MDR Operator.

The remediation template CR mdrt exists only if you use MDR with NHC. When the MDR Operator is deleted by using the web console, the remediation template CR mdrt is also deleted.
4.6. Gathering data about the Machine Deletion Remediation Operator
To collect debugging information about the Machine Deletion Remediation Operator, use the must-gather tool. For information about the must-gather image for the Machine Deletion Remediation Operator, see Gathering data about specific features.