Chapter 2. Deploying and configuring the high availability for Compute instances service


The Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service is managed by the infra-operator, which RHOSO installs by default.

You must deploy an Instance HA service to automate the process of monitoring which Compute nodes have failed and, if necessary, to evacuate instances from the failed Compute nodes. For more information, see Deploying the Instance HA service.

Warning

You must not use the Instance HA service to evacuate Compute nodes that host your storage in a RHOSO hyperconverged infrastructure (HCI) environment. In HCI environments, you must tag only the subset of your Compute nodes that do not host the Red Hat Ceph Storage services. For more information, see Tag images, flavors, or host aggregates for evacuation.

2.1. Deploying the Instance HA service

You must deploy a Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service to automate the process of monitoring failed Compute nodes and, if necessary, to evacuate instances from the failed Compute nodes.

Note

If you have multiple clouds defined, you can create a separate Instance HA service pod to monitor each cloud. For more information, see Configuring the Instance HA service pod specification.

Procedure

  1. Create a YAML Instance HA service manifest file, for example, Instance-HA-service-0.yaml. For more information, see Configuring the Instance HA service pod specification.

  2. Apply the Instance HA service manifest and the fencingSecret files:

    $ oc apply -f fencing-0.yaml
    $ oc apply -f Instance-HA-service-0.yaml
  3. Verify that the Instance HA service pod Message field displays Setup complete before continuing:

    $ oc get instanceha -w
    NAME           STATUS   MESSAGE
    instanceha-0   True     Setup complete

    Note

    A unique string is appended to the .metadata.name that you specified in the manifest file.

  4. Determine the fully qualified name and status of your deployed Instance HA service pod:

    $ oc get pods | grep instanceha
    instanceha-0-54f865b6dd-w6h4t                                   1/1     Running     0       10h

    In this example, the fully qualified name is instanceha-0-54f865b6dd-w6h4t.

    Warning

    A new, unique fully qualified name is created every time the Instance HA service pod restarts. All log entries associated with the previous name are removed. For more information, see Troubleshooting the Instance HA service.
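
Because the pod name changes on every restart and the earlier log entries are removed, it can help to look up the current name and save the log before you make changes. The following is a minimal sketch, not part of the product: the function names and the log file naming scheme are assumptions, and it relies only on the standard oc get pods -o name and oc logs commands.

```shell
# Print the fully qualified name of the first pod whose name contains the
# given Instance HA service name (default: instanceha-0, as in this example).
instanceha_pod_name() {
    oc get pods -o name | grep "${1:-instanceha-0}" | head -n 1 | sed 's|^pod/||'
}

# Save the current pod log to a local file named after the pod, so that the
# entries survive a pod restart. The file naming scheme is an assumption.
save_instanceha_logs() {
    pod=$(instanceha_pod_name "${1:-instanceha-0}")
    [ -n "$pod" ] || { echo "no Instance HA pod found" >&2; return 1; }
    oc logs "$pod" > "${pod}.log" && echo "${pod}.log"
}
```

For example, running save_instanceha_logs instanceha-0 writes the log to a file in the current directory and prints the file name.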

2.1.1. Configuring the Instance HA service pod specification

When you deploy the Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service, you must create a YAML Instance HA service manifest file to define the specification (.spec) of your Instance HA service pod.

In this example, the YAML Instance HA service manifest file is Instance-HA-service-0.yaml.

$ cat Instance-HA-service-0.yaml
---
apiVersion: instanceha.openstack.org/v1beta1
kind: InstanceHa
metadata:
  name: instanceha-0
spec:
  caBundleSecretName: combined-ca-bundle
  fencingSecret: fencing-0
  #instanceHaConfigMap:
  #networkAttachments: ['internalapi']
  #instanceHaKdumpPort:
  #openStackCloud: "default"
  #openStackConfigMap:
  #openStackConfigSecret:
  #nodeSelector:
  • Use .spec.caBundleSecretName to specify the name of the secret containing the CA Certificate Bundle that was used during the deployment of RHOSO. By default, this parameter is set to combined-ca-bundle, but this value might change if you implement custom TLS certificates. For more information, see Adding custom TLS certificates for Red Hat OpenStack Services on OpenShift in Configuring security services.
  • Use .spec.fencingSecret to specify the name of the secret that contains the configured fencing agents of all the Compute nodes that can be evacuated. In this example, this secret is called fencing-0 and is defined in the fencing-0.yaml file. For more information, see Configuring the fencing of Compute nodes.

    Note

    All the other values for defining the specification of your Instance HA service pod are optional, which is why they are commented out in this example.

  • Optional: You can create and name a ConfigMap that provides your configured Instance HA service parameters. In this case, you must use .spec.instanceHaConfigMap to specify the name of this ConfigMap. If you do not create one, a ConfigMap called instanceha-config is created automatically when the Instance HA service is installed, providing the default values of the Instance HA service parameters.
  • Optional: If you configure the Instance HA service to detect if a Compute node is capturing a kernel dump, then:

    • You must use .spec.networkAttachments to specify the network that receives the kdump notifications from the kdump service.
    • If you do not use the default UDP port of 7410, you must use .spec.instanceHaKdumpPort to specify the UDP port that receives the kdump notifications from the kdump service. For more information, see Detecting if a Compute node is capturing a kernel dump.
  • Optional: If you have multiple clouds defined, you can create a separate Instance HA service pod to monitor each cloud. In this case, you can use the following settings to specify the required authentication details for each cloud:

    • You can use .spec.openStackCloud to specify the name of the cloud detailed in your clouds.yaml file. If you do not specify a value, then default is used.
    • You can use .spec.openStackConfigMap to specify the name of the ConfigMap containing your clouds.yaml file.
    • You can use .spec.openStackConfigSecret to specify the name of the secret containing the admin password.
  • Optional: You can use .spec.nodeSelector to specify the label of the Red Hat OpenShift Container Platform (RHOCP) worker nodes that you need the Instance HA service pod to run on. For more information, see Placing pods on specific nodes using node selectors in RHOCP Nodes.
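
As an illustration of the multiple-cloud case described above, the following sketch generates a manifest for a second Instance HA service pod. The function name and the example values (instanceha-1, cloud-b, fencing-1) are hypothetical; only the field names come from the specification above. If your second cloud needs its own clouds.yaml or admin password, you would also add .spec.openStackConfigMap and .spec.openStackConfigSecret.

```shell
# Write <name>.yaml for an Instance HA service pod that monitors one cloud.
# Usage: write_instanceha_manifest NAME CLOUD FENCING_SECRET
# All three argument values are chosen by you; none are product defaults.
write_instanceha_manifest() {
    name=$1 cloud=$2 fencing_secret=$3
    cat > "${name}.yaml" <<EOF
---
apiVersion: instanceha.openstack.org/v1beta1
kind: InstanceHa
metadata:
  name: ${name}
spec:
  caBundleSecretName: combined-ca-bundle
  fencingSecret: ${fencing_secret}
  openStackCloud: "${cloud}"
EOF
}
```

For example, write_instanceha_manifest instanceha-1 cloud-b fencing-1 creates instanceha-1.yaml in the current directory, which you can review before applying it.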

2.1.2. Configuring the fencing of Compute nodes

You must fence each Compute node that is eligible for evacuation. Configure their fencing agents in the fencingSecret YAML file that you specify when deploying the Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service pod.

Note

You cannot evacuate a Compute node unless it has a configured fencing agent.

The supported fencing agents are IPMI, Redfish, and BareMetalHost (BMH), which is the fencing agent for Metal³.

You can use the FENCING_TIMEOUT parameter to specify the expected timeout for a fencing operation to be performed, in seconds. The default value is 30 seconds and the maximum configurable timeout is 120 seconds. For more information, see Editing the Instance HA service parameters.

The following is an example of a fencingSecret YAML file called fencing-0.yaml, which provides an example configuration of each of the three supported fencing agents.

Note

You must use the Compute service (nova) hostname to identify each Compute node, for example, compute-0. You can use the following command to obtain these hostnames: $ openstack compute service list.

$ cat fencing-0.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: fencing-0
stringData:
  fencing.yaml: |
    FencingConfig:
      compute-0:
        agent: ipmi
        ipaddr: 192.168.111.9
        ipport: 443
        login: admin
        passwd: password
      compute-1:
        agent: redfish
        ipaddr: 192.158.12.3
        ipport: 8000
        tls: 'true'
        login: admin
        passwd: password
        uuid: b7d32e6b-edbc-477d-80bf-4cda77ada8cb
      compute-2:
        agent: bmh
        host: edpm-compute-0
        namespace: openstack-edpm-ipam
        token: $2a$10$yc9Q.eHLiQmCdS0/LzxJ5.V5/lrmx8JxwFbU5X4Hdr1albfDl7wtm
  • You must provide each IPMI fencing agent (agent: ipmi) with the IP connection and user authentication details of the Intelligent Platform Management Interface (IPMI).
  • You must provide each Redfish fencing agent (agent: redfish) with the IP connection and user authentication details of the Redfish Host Interface.

    You must specify the ipport parameter when your Redfish Host Interface does not use the default port, 443. You must specify the value of the tls parameter in quotes as 'true'. The uuid parameter is optional for standard servers. If you omit it, the Instance HA service uses the default value of System.Embedded.1 as the Redfish node UUID.

  • You must provide each BareMetalHost (BMH) fencing agent (agent: bmh) with the details of the associated BMH resource.

    You can use this command to obtain the host and namespace of the BMH resource:

    $ oc get bmh
    NAME             STATE         CONSUMER              ONLINE   ERROR   AGE
    edpm-compute-0   provisioned   openstack-edpm-ipam   true             17h
    edpm-compute-1   provisioned   openstack-edpm-ipam   true             17h

    • The NAME column provides the BMH resource host, for example, edpm-compute-0.
    • The CONSUMER column provides the BMH resource namespace, for example, openstack-edpm-ipam.

    If you already have a user that has the necessary privileges to power the BMH resource on and off, then you can provide their authentication token as the BMH resource token. If not, then you must create a dedicated Red Hat OpenShift Container Platform (RHOCP) service account and provide its authentication token. For more information, see RHOCP Authentication and authorization.
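
Because a Compute node cannot be evacuated unless it has a configured fencing agent, a quick cross-check of the fencingSecret file against your Compute hostnames can catch omissions before you apply it. The following is a rough sketch under stated assumptions: the function name is hypothetical, and it uses a simple text match on indented "<hostname>:" keys rather than a YAML parser.

```shell
# Print each Compute hostname that has no entry in the fencing file and
# return non-zero if any are missing.
# Usage: missing_fencing_entries FILE HOST...
missing_fencing_entries() {
    file=$1; shift
    status=0
    for host in "$@"; do
        # Match an indented "<host>:" key that stands alone on its line.
        if ! grep -Eq "^[[:space:]]+${host}:[[:space:]]*$" "$file"; then
            echo "$host"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_fencing_entries fencing-0.yaml compute-0 compute-1 compute-2 prints nothing when all three hosts are configured. You can take the hostnames from the output of openstack compute service list.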

2.2. Instance HA service parameters

The Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service provides a number of parameters that allow you to customize the process of evacuating instances from your failed Compute nodes. For information about editing these parameter values, see Editing the Instance HA service parameters.

Parameter / Default / Description

DELTA

30

You must specify how often you want the status of the enabled Compute nodes to be queried, in seconds. This parameter reduces the time taken by the Instance HA service to detect failed Compute nodes. For more information, see How the Instance HA service evacuates failed Compute nodes.

POLL

45

You must specify how often you want the Instance HA service to poll the Compute service (nova) database, in seconds. For more information, see How the Instance HA service evacuates failed Compute nodes.

THRESHOLD

50

You must specify the percentage of the total number of Compute nodes that are eligible for evacuation that can fail before the evacuation process becomes impractical. The Instance HA service stops evacuating the Compute nodes when this percentage is exceeded. For more information, see How the Instance HA service evacuates failed Compute nodes.

Note

When the TAGGED_AGGREGATES parameter is true, the THRESHOLD parameter is calculated based on the total number of Compute nodes that are tagged by using the EVACUABLE_TAG parameter.

LOGLEVEL

info

You must specify the amount of detail you want the Instance HA service log file messages to provide. When the LOGLEVEL parameter is set to info, the Instance HA service log file provides minimal log messages. When you are troubleshooting the Instance HA service, you can change the LOGLEVEL parameter to debug to increase the number of log messages. For more information, see Troubleshooting the Instance HA service.

EVACUABLE_TAG

evacuable

Optional: When you tag flavors, images, or host aggregates, you must specify the text that you use to tag their metadata. For more information, see Tag images, flavors, or host aggregates for evacuation.

TAGGED_AGGREGATES

true

Optional: You can specify whether you want the Instance HA service to check for tagged host aggregates when deciding which Compute nodes to evacuate. For more information, see Tag images, flavors, or host aggregates for evacuation.

TAGGED_FLAVORS

true

Optional: You can specify whether you want the Instance HA service to check for tagged flavors when deciding which Compute nodes to evacuate. For more information, see Tag images, flavors, or host aggregates for evacuation.

TAGGED_IMAGES

true

Optional: You can specify whether you want the Instance HA service to check for tagged images when deciding which Compute nodes to evacuate. For more information, see Tag images, flavors, or host aggregates for evacuation.

SMART_EVACUATION

false

Optional: You can configure the Instance HA service to use SMART_EVACUATION to monitor and, if necessary, restart the evacuation process of instances from failed Compute nodes up to 5 times. For more information, see How the Instance HA service evacuates failed Compute nodes.

WORKERS

4

Optional: When SMART_EVACUATION is set to true, you can specify the number of instances that the Instance HA service can evacuate at the same time.

DELAY

0

Optional: You can specify the time to wait before fencing a Compute node, in seconds. For more information, see How the Instance HA service evacuates failed Compute nodes.

FENCING_TIMEOUT

30

Optional: You can specify the expected timeout for a fencing operation to be performed, in seconds. The maximum configurable timeout is 120 seconds.

RESERVED_HOSTS

false

Optional: You can reserve healthy Compute nodes to evacuate the instances of failed Compute nodes. For more information, see Reserving healthy Compute nodes.

LEAVE_DISABLED

false

Optional: You can configure the Instance HA service to leave the fenced Compute nodes disabled after they have been evacuated. For more information, see How the Instance HA service evacuates failed Compute nodes.

FORCE_ENABLE

false

Optional: You can configure the Instance HA service to enable a Compute node even when the instances have not been successfully evacuated. For more information, see How the Instance HA service evacuates failed Compute nodes.

CHECK_KDUMP

false

Optional: You can configure the Instance HA service to detect if a Compute node is capturing a kernel dump before fencing and evacuating the Compute node. For more information, see Detecting if a Compute node is capturing a kernel dump.

DISABLED

false

Optional: You can configure the Instance HA service to not evacuate failed Compute nodes. For more information, see How the Instance HA service evacuates failed Compute nodes.

FORCE_RESERVED_HOST_EVACUATION

false

Optional: You can configure the Instance HA service to force the evacuation to the reserved host that was enabled to replace the failed Compute node.

Warning

This evacuation might fail if the destination Compute node does not have sufficient capacity.
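
To make the THRESHOLD parameter concrete, the following sketch works through the arithmetic: with the default of 50, three failed nodes out of five eligible nodes is 60 percent, which exceeds the threshold, while two out of four is exactly 50 percent, which does not. The function name is hypothetical, and this only illustrates the percentage comparison described in the table. It is not the service's own implementation.

```shell
# Return 0 (true) when the share of failed Compute nodes exceeds the
# threshold percentage, and 1 (false) otherwise.
# Usage: threshold_exceeded FAILED TOTAL THRESHOLD_PERCENT
threshold_exceeded() {
    failed=$1 total=$2 threshold=$3
    # Integer form of: failed / total * 100 > threshold
    [ $((failed * 100)) -gt $((threshold * total)) ]
}
```

For example, threshold_exceeded 3 5 50 succeeds, whereas threshold_exceeded 2 4 50 does not.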

2.2.1. Editing the Instance HA service parameters

The parameters of the Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service pod are stored as strings within a YAML ConfigMap file. For more information about the supported Instance HA service parameters, see Instance HA service parameters.

Warning

When you edit the value of an Instance HA service parameter, all the log file entries are lost when the Instance HA service pod restarts. For more information, see Troubleshooting the Instance HA service.

Note

You must enclose all the parameter values in double quotes (").

The name of this YAML ConfigMap file depends upon how you have chosen to create it:

  • You can create and name a YAML file containing a ConfigMap that provides your configured Instance HA service parameters. In this case, you must use .spec.instanceHaConfigMap to specify the name of this YAML file, when you create the Instance HA service manifest file. For more information, see Configuring the Instance HA service pod specification.
  • You can choose to let the infra-operator create this YAML ConfigMap when the Instance HA service pod is deployed. This is called instanceha-config and contains the default values of the Instance HA service parameters that you can modify as needed.

You can use the following command to edit your Instance HA service parameters:

$ oc edit cm <config_map_name>
  • Replace <config_map_name> with the name of your YAML ConfigMap file; for example, instanceha-config.

You can use the following command to display the current configuration of your Instance HA service parameters:

$ oc get cm <config_map_name> -o yaml

The following example displays the default values of the parameters configured in the instanceha-config file when the Instance HA service pod is deployed.

$ oc get cm instanceha-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    config:
      EVACUABLE_TAG: "evacuable"
      TAGGED_IMAGES: "true"
      TAGGED_FLAVORS: "true"
      DELTA: "30"
      DELAY: "0"
      POLL: "45"
      THRESHOLD: "50"
      WORKERS: "4"
      SMART_EVACUATION: "false"
      RESERVED_HOSTS: "false"
      LEAVE_DISABLED: "false"
      FORCE_ENABLE: "false"
      CHECK_KDUMP: "false"
      LOGLEVEL: "info"
      DISABLED: "false"
kind: ConfigMap
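
If you prefer to prepare a change in a local copy of config.yaml before you edit the ConfigMap, the following sketch sets one parameter and keeps the value enclosed in double quotes, as required. The function name is hypothetical, and the sketch assumes that each parameter sits on its own line in the KEY: "value" form shown above.

```shell
# Set KEY to VALUE in a local copy of config.yaml, keeping the value in
# double quotes. Fails if the file has no line for KEY.
# Usage: set_instanceha_param FILE KEY VALUE
set_instanceha_param() {
    file=$1 key=$2 value=$3
    grep -Eq "^[[:space:]]*${key}:" "$file" || {
        echo "unknown parameter: $key" >&2
        return 1
    }
    # Keep the original indentation and key, and replace only the value.
    sed "s|^\([[:space:]]*${key}:\).*|\1 \"${value}\"|" "$file" > "${file}.tmp" &&
        mv "${file}.tmp" "$file"
}
```

For example, set_instanceha_param config.yaml LOGLEVEL debug changes only the LOGLEVEL line. The edited content still has to be applied to the ConfigMap, for example with oc edit cm.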

2.3. Removing the Instance HA service

If you want to completely remove the Red Hat OpenStack Services on OpenShift (RHOSO) high availability for Compute instances (Instance HA) service, in addition to removing the Instance HA service, you must remove the ConfigMap containing the Instance HA service parameters and the fencing secret containing the fencing configuration of the Compute nodes that can be evacuated.

Prerequisites

  • You must know the name of your deployed Instance HA service, which is the .metadata.name that you specified in the manifest file. You can run this command to obtain this name: $ oc get instanceha.
  • You must know the name of the ConfigMap containing the Instance HA service parameters.
  • You must know the name of the fencingSecret YAML file containing the fencing configuration of the Compute nodes that can be evacuated.

Procedure

  1. Delete the Instance HA service:

    $ oc delete instanceha/<instanceha_service_name>
    • Replace <instanceha_service_name> with the name of your deployed Instance HA service, for example, instanceha-0.
  2. Delete the ConfigMap containing the Instance HA service parameters:

    $ oc delete cm/<config_map_name>
    • Replace <config_map_name> with the name of the ConfigMap. For example, if you use the default ConfigMap, the name is instanceha-config.
  3. Delete the fencing secret containing the fencing configuration of the Compute nodes that can be evacuated:

    $ oc delete secret/<fencing_secret_name>
    • Replace <fencing_secret_name> with the name of the fencing secret that you specified when defining the specification of your Instance HA service pod, for example, fencing-0.
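
The three deletions in this procedure can be wrapped in one helper, as in the following sketch. The function name is hypothetical. It runs the same oc delete commands in the same order and stops at the first failure.

```shell
# Delete the Instance HA service, then its ConfigMap, then its fencing secret.
# Usage: remove_instanceha SERVICE_NAME CONFIG_MAP_NAME FENCING_SECRET_NAME
remove_instanceha() {
    oc delete "instanceha/$1" &&
    oc delete "cm/$2" &&
    oc delete "secret/$3"
}
```

For example, remove_instanceha instanceha-0 instanceha-config fencing-0 removes the resources used in the examples in this chapter.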