Chapter 4. Using Machine Deletion Remediation
You can use the Machine Deletion Remediation Operator to reprovision unhealthy nodes using the Machine API. You can use the Machine Deletion Remediation Operator in conjunction with the Node Health Check Operator.
4.1. About the Machine Deletion Remediation Operator
The Machine Deletion Remediation (MDR) Operator works with the NodeHealthCheck controller to reprovision unhealthy nodes by using the Machine API. MDR follows the annotation on the node to the associated machine object, confirms that the machine has an owning controller (for example, MachineSetController), and deletes the machine. After the machine CR is deleted, the owning controller creates a replacement.
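The lookup described above can be pictured with two trimmed resource fragments. This is only a sketch: the node, machine, and machine set names are hypothetical, and the `machine.openshift.io/machine` annotation key is assumed from the Machine API convention of recording a `<namespace>/<machine-name>` reference on each node.

```yaml
# Node fragment: the annotation points at the backing Machine.
# All names here are hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: worker-0.example.com
  annotations:
    machine.openshift.io/machine: openshift-machine-api/example-worker-abcde
---
# Machine fragment: an ownerReference with controller: true indicates
# that an owning controller (here a MachineSet) can create a replacement
# after the Machine is deleted. Other fields, such as uid, are omitted.
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  name: example-worker-abcde
  namespace: openshift-machine-api
  ownerReferences:
    - apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      name: example-worker
      controller: true
```

A machine without such an owner reference would not be recreated after deletion, which is why MDR checks for it first.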
The prerequisites for MDR include:
- a Machine API-based cluster that is able to programmatically destroy and create cluster nodes,
- nodes that are associated with machines, and
- declaratively managed machines.
You can then modify the NodeHealthCheck CR to use MDR as its remediator. An example MDR template object and NodeHealthCheck configuration are provided in the documentation.
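As an illustration, a NodeHealthCheck CR that points at an MDR template might look like the following sketch. The CR name, the template name, the selector, and the durations are assumptions chosen for the example, not values taken from this chapter; adjust them to your cluster.

```yaml
# Sketch of a NodeHealthCheck CR that uses MDR as its remediator.
# Names, selector, and durations are illustrative assumptions.
apiVersion: remediation.medik8s.io/v1alpha1
kind: NodeHealthCheck
metadata:
  name: nodehealthcheck-sample
spec:
  minHealthy: 51%
  selector:
    matchExpressions:
      - key: node-role.kubernetes.io/worker
        operator: Exists
  remediationTemplate:
    apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
    kind: MachineDeletionRemediationTemplate
    name: machinedeletionremediationtemplate-sample   # hypothetical template name
    namespace: openshift-workload-availability
  unhealthyConditions:
    - type: Ready
      status: "False"
      duration: 300s
    - type: Ready
      status: Unknown
      duration: 300s
```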
The MDR process works as follows:
- The Node Health Check Operator detects an unhealthy node and creates an MDR CR.
- The MDR Operator watches for the MDR CR associated with the unhealthy node and, if the associated machine has an owning controller, deletes the machine.
- When the node is healthy again, the NodeHealthCheck controller deletes the MDR CR.
4.2. Installing the Machine Deletion Remediation Operator by using the web console
You can use the Red Hat OpenShift web console to install the Machine Deletion Remediation Operator.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the Red Hat OpenShift web console, navigate to Operators → OperatorHub.
- Select the Machine Deletion Remediation Operator, or MDR, from the list of available Operators, and then click Install.
- Keep the default selection of Installation mode and namespace to ensure that the Operator is installed to the openshift-workload-availability namespace.
- Click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-workload-availability namespace and that its status is Succeeded.
If the Operator is not installed successfully:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the log of the pod in the openshift-workload-availability project for any reported issues.
4.3. Installing the Machine Deletion Remediation Operator by using the CLI
You can use the OpenShift CLI (oc) to install the Machine Deletion Remediation Operator.
You can install the Machine Deletion Remediation Operator in your own namespace or in the openshift-workload-availability namespace.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
- Create a Namespace custom resource (CR) for the Machine Deletion Remediation Operator:
  - Define the Namespace CR and save the YAML file, for example, workload-availability-namespace.yaml:

        apiVersion: v1
        kind: Namespace
        metadata:
          name: openshift-workload-availability

  - To create the Namespace CR, run the following command:

        $ oc create -f workload-availability-namespace.yaml

- Create an OperatorGroup CR:
  - Define the OperatorGroup CR and save the YAML file, for example, workload-availability-operator-group.yaml:

        apiVersion: operators.coreos.com/v1
        kind: OperatorGroup
        metadata:
          name: workload-availability-operator-group
          namespace: openshift-workload-availability

  - To create the OperatorGroup CR, run the following command:

        $ oc create -f workload-availability-operator-group.yaml

- Create a Subscription CR:
  - Define the Subscription CR and save the YAML file, for example, machine-deletion-remediation-subscription.yaml. In the metadata.namespace field, specify the namespace where you want to install the Machine Deletion Remediation Operator. If you install the Operator in the openshift-workload-availability namespace, the Namespace and OperatorGroup CRs already exist from the previous steps.
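A Subscription CR for this step might look like the following sketch. The subscription name, channel, catalog source, and package name are assumptions based on how Red Hat Operators are commonly published; confirm them against the catalog in your cluster before applying the file.

```yaml
# Sketch of a Subscription CR for the Machine Deletion Remediation Operator.
# channel, source, and name (the package) are assumptions; verify them
# in your cluster's catalog.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: machine-deletion-remediation-operator
  namespace: openshift-workload-availability   # namespace where the Operator is installed
spec:
  channel: stable
  name: machine-deletion-remediation-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```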
  - To create the Subscription CR, run the following command:

        $ oc create -f machine-deletion-remediation-subscription.yaml

Verification
Verify that the installation succeeded by inspecting the CSV resource:
    $ oc get csv -n openshift-workload-availability

Example output

    NAME                                  DISPLAY                                 VERSION   REPLACES                              PHASE
    machine-deletion-remediation.v0.3.0   Machine Deletion Remediation Operator   0.3.0     machine-deletion-remediation.v0.2.1   Succeeded

4.4. Configuring the Machine Deletion Remediation Operator
You can use the Machine Deletion Remediation Operator, with the Node Health Check Operator, to create the MachineDeletionRemediationTemplate Custom Resource (CR). This CR defines the remediation strategy for the nodes.
The MachineDeletionRemediationTemplate CR resembles the following YAML file:
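A minimal sketch of such a template follows. The template name is hypothetical, and the API group and version are assumed from the Operator's upstream project; the empty inner `spec` reflects the assumption that MDR takes no strategy-specific parameters.

```yaml
# Sketch of a MachineDeletionRemediationTemplate CR.
# The name is hypothetical; apiVersion is an assumption to verify
# against the CRD installed in your cluster.
apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediationTemplate
metadata:
  name: machinedeletionremediationtemplate-sample
  namespace: openshift-workload-availability
spec:
  template:
    spec: {}
```

The NodeHealthCheck CR refers to this template by name and namespace in its remediation template reference.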
4.5. Troubleshooting the Machine Deletion Remediation Operator
4.5.1. General troubleshooting
- Issue: You want to troubleshoot issues with the Machine Deletion Remediation Operator.
- Resolution: Check the Operator logs:

      $ oc logs <machine-deletion-remediation-controller-manager-name> -c manager -n <namespace-name>

4.5.2. Unsuccessful remediation
- Issue: An unhealthy node was not remediated.
- Resolution: Verify that the MachineDeletionRemediation CR was created by running the following command:

      $ oc get mdr -A

  If the NodeHealthCheck controller did not create the MachineDeletionRemediation CR when the node turned unhealthy, check the logs of the NodeHealthCheck controller. Additionally, ensure that the NodeHealthCheck CR includes the required specification to use the remediation template.

  If the MachineDeletionRemediation CR was created, ensure that its name matches the unhealthy node object.
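For orientation when comparing names, a MachineDeletionRemediation CR created for an unhealthy node might resemble the following sketch. The node name is hypothetical, and the API version and the empty spec are assumptions; inspect the actual object with `oc get mdr -A -o yaml` rather than relying on this shape.

```yaml
# Sketch of an MDR CR as created by the NodeHealthCheck controller.
# metadata.name is expected to equal the unhealthy node's name;
# the node name below is hypothetical.
apiVersion: machine-deletion-remediation.medik8s.io/v1alpha1
kind: MachineDeletionRemediation
metadata:
  name: worker-0.example.com
  namespace: openshift-workload-availability
spec: {}
```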
4.5.3. Machine Deletion Remediation Operator resources exist even after uninstalling the Operator
- Issue: The Machine Deletion Remediation Operator resources, such as the remediation CR and the remediation template CR, exist even after uninstalling the Operator.
- Resolution: To remove the Machine Deletion Remediation Operator resources, select the Delete all operand instances for this operator checkbox before uninstalling. This checkbox is available only in Red Hat OpenShift 4.13 and later. For all versions of Red Hat OpenShift, you can delete the resources by running the relevant command for each resource type:

      $ oc delete mdr <machine-deletion-remediation> -n <namespace>

      $ oc delete mdrt <machine-deletion-remediation-template> -n <namespace>

  The remediation CR mdr must be created and deleted by the same entity, for example, NHC. If the remediation CR mdr is still present, it is deleted together with the MDR Operator.

  The remediation template CR mdrt exists only if you use MDR with NHC. When the MDR Operator is deleted by using the web console, the remediation template CR mdrt is also deleted.
4.6. Gathering data about the Machine Deletion Remediation Operator
To collect debugging information about the Machine Deletion Remediation Operator, use the must-gather tool. For information about the must-gather image for the Machine Deletion Remediation Operator, see Gathering data about specific features.