
Chapter 2. Considerations for implementing the Optimize service


Review software requirements, supported features, and optimization strategies before deploying the Optimize service (watcher) in your Red Hat OpenStack Services on OpenShift (RHOSO) environment. Understanding these requirements helps you configure the service correctly and choose the right strategies for your infrastructure goals.


2.1. Optimize service (watcher) supported features

Use only the documented features and strategies when implementing the Optimize service (watcher) in your production environment. Limiting your implementation to supported features keeps your deployment supportable and ensures that the service functions reliably.

2.2. Optimize service software requirements

Verify that your Red Hat OpenStack Services on OpenShift (RHOSO) environment meets the necessary prerequisites before deploying the Optimize service (watcher). Ensuring you have the required services and the required version helps prevent deployment issues and ensures the Optimize service functions correctly.

The Red Hat OpenStack Services on OpenShift (RHOSO) Optimize service (watcher) requires an existing deployment of RHOSO 18.0.6 or later that includes at least the following components:

  • Compute service (nova)
  • Identity service (keystone)
  • Image service (glance)
  • MariaDB
  • RabbitMQ
  • Telemetry service, including the Prometheus metrics store

The Dashboard service (horizon) is not required. However, if it is enabled in the OpenStack cloud, the Optimize service dashboard is available in the Dashboard service post deployment.

2.3. Improving Optimize service accuracy with OpenStack service notifications

Enable service notifications in Red Hat OpenStack Services on OpenShift (RHOSO) to keep the Optimize service (watcher) internal model synchronized with your infrastructure in real time. Configuring notifications prevents the service from working with outdated cluster information when creating action plans for large environments.

Prerequisites

  • You are logged on to a workstation that has access to the RHOSO control plane as a user with cluster-admin privileges.
  • You have the oc command line tool installed on your workstation.

The Optimize service (watcher) has an internal model of your cluster that it uses to create action plans. By default, the Optimize service queries for updates every 15 minutes. In large clusters, the internal model might be out of date when an action plan is created.

You can ensure that the internal model used by the Optimize service (watcher) is always up to date by enabling notifications from the Compute service (nova) and the Block Storage service (cinder), and then enabling the Optimize service to receive those notifications.

Procedure

  1. On your workstation, open your OpenStackControlPlane custom resource (CR) file called openstack_control_plane.yaml.
  2. Add a new messaging bus instance for your cluster:

    spec:
      rabbitmq:
        enabled: true
        templates:
          ...
          <rabbitmq_notification_server>:
            delayStartSeconds: 30
            override:
              service:
                metadata:
                  annotations:
                    metallb.universe.tf/address-pool: internalapi
                    metallb.universe.tf/loadBalancerIPs: <ip_address>
                spec:
                  type: LoadBalancer
    • Replace <rabbitmq_notification_server> with the name of your notification server, for example, rabbitmq-notifications.
    • Replace <ip_address> with the appropriate IP address based on your networking plan and configuration.
  3. Enable the notifications in your cluster. This change enables all services supporting notifications to send or consume notifications:

    spec:
      notificationsBusInstance: <rabbitmq_bus_name>
    • Replace <rabbitmq_bus_name> with the name of your notification bus, for example, rabbitmq-notifications.
  4. Save your changes to the openstack_control_plane.yaml CR file.
  5. Update the control plane:

    $ oc apply -f openstack_control_plane.yaml -n openstack
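After the update, you can optionally confirm that the new messaging bus instance was created and is ready. The instance name rabbitmq-notifications and the RabbitmqCluster resource type are assumptions based on the example above; adjust them to match your deployment:

```shell
# List the RabbitMQ cluster instances in the openstack namespace;
# the notification bus from the example above should report Ready.
oc get rabbitmqclusters -n openstack

# Confirm that the control plane has reconciled the change.
oc get openstackcontrolplane -n openstack
```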

2.4. Optimize service strategies

Select the appropriate strategy to accomplish your optimization goals, such as consolidating workloads or balancing resource usage. Knowing the available strategies and their parameters helps you create effective audits that address your operational needs.

In RHOSO 18.0.10, the Optimize service supports the following strategies:

2.4.1. Host maintenance strategy

Prepare Compute nodes for scheduled maintenance by migrating all instances to other nodes without interrupting users. This strategy helps you perform hardware updates or repairs while maintaining service availability.

When you provide a backup node, the strategy moves all instances to the backup node by using the Compute service (nova) migration feature. When there is no backup node, the strategy relies on nova-scheduler to migrate all instances. The host maintenance strategy sets the status of the Compute node that no longer hosts running instances to disabled.

For a demonstration of how to use this strategy, see Preparing Compute nodes for planned maintenance.

  • Requirements:

    • A minimum of two physical hosts which serve as Compute nodes.
    • Ability for the Compute service (nova) to perform live and cold migrations.
  • User-supplied parameters:

    • -p maintenance_node=<compute_node_name> (required) - the name of the Compute node that requires maintenance.
    • -p backup_node=<compute_node_name> (optional) - the name of the backup Compute node to which instances are live migrated. When there is no backup node, the strategy relies on nova-scheduler to migrate all instances.
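The linked demonstration covers this procedure end to end. As a brief sketch, creating and running an audit with this strategy might look as follows; the template name, the goal name cluster_maintaining, and the node names are illustrative assumptions, so verify the registered names in your environment with `openstack optimize goal list` and `openstack optimize strategy list`:

```shell
# Create an audit template that pairs a goal with the host
# maintenance strategy (names shown here are examples).
openstack optimize audittemplate create maintenance_template \
    cluster_maintaining --strategy host_maintenance

# Run an audit that drains compute-0, using compute-1 as the backup node.
openstack optimize audit create -a maintenance_template \
    -p maintenance_node=compute-0 \
    -p backup_node=compute-1
```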

2.4.2. Node resource consolidation strategy

Consolidate workloads onto fewer Compute nodes to free up hardware resources. This strategy moves instances between nodes while keeping all nodes active and available for future workload placement.

The node resource consolidation strategy moves instances between a source and a destination Compute node in the cluster. It uses the Compute service (nova) live migration feature to consolidate resource usage. The node resource strategy does not change the status of the source Compute node.

For a demonstration of how to use this strategy, see Consolidating node resources.

  • Requirements:

    • A minimum of two Compute nodes that use the same CPU and RAM hardware.
    • Ability for the Compute service to perform live migrations to any active Compute nodes.
  • User-supplied parameters:

    • -p host_choice=<auto | specify> (optional) - the method used to select the server migration destination node. The value auto causes the Compute service (nova) scheduler to select the migration destination node. The value specify causes the strategy to select the migration destination node. When host_choice is not specified, the strategy defaults to auto.

2.4.3. VM workload consolidation strategy

Use the VM workload consolidation strategy to consolidate VM instances so that you can disable underutilized nodes. This strategy helps you reduce power consumption by concentrating workloads and freeing up nodes that can be powered down.

This strategy uses the Compute service (nova) live migration feature and sets the status of the source Compute node to disabled.

For a demonstration of how to use this strategy, see Consolidating VM instances.

  • Requirements:

    • A minimum of two physical hosts which serve as Compute nodes.
    • Ability for the Compute service to perform migrations to any active Compute nodes.
  • User-supplied parameters:

    • -p period=<seconds> (optional) - time interval in seconds for getting statistic aggregation from the metric data source. When period is not specified, the strategy defaults to one hour (3600 seconds).

2.4.4. Workload balance migration strategy

Balance workload distribution across Compute nodes by migrating individual instances when resource utilization exceeds configured thresholds. This strategy helps prevent performance bottlenecks by ensuring that no single node becomes overloaded while others remain underutilized.

For a demonstration of how to use this strategy, see Balancing single instance workloads.

  • Requirements:

    • A minimum of two Compute nodes that use the same CPU and RAM hardware.
    • Ability for the Compute service to perform live migrations to any active Compute nodes.
  • User-supplied parameters:

    All parameters are optional. When the user does not supply a parameter, the strategy uses the default value.

    • -p metrics=instance_cpu_usage|instance_ram_usage - the type of workload balancing desired: based on CPU or RAM utilization. The default is instance_cpu_usage.
    • -p threshold=<percentage> - a decimal number (n.n) in the range 0.0 - 100.0 that sets the threshold for CPU or memory usage on the Compute hosts. When either the CPU usage or the RAM usage exceeds the threshold percentage, the Optimize service searches for an instance live migration that improves the balance of Compute node resource usage. The threshold value applies to both the source and the destination node involved in the migration. The default is 25.0.
    • -p period=<seconds> - the time interval in seconds over which the threshold is evaluated. The default is five minutes (300 seconds).
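As a rough illustration of the threshold check described above (a simplified model, not the service's actual implementation), a host becomes a migration candidate only when its average usage over the evaluation period is strictly above the threshold percentage:

```python
def exceeds_threshold(usage_samples, threshold=25.0):
    """Return True when the average usage (percent) collected over
    one evaluation period is strictly above the balancing threshold."""
    average = sum(usage_samples) / len(usage_samples)
    return average > threshold

# CPU usage samples (percent) gathered over one 300-second period:
print(exceeds_threshold([40.0, 55.0, 60.0]))  # True: overloaded host
print(exceeds_threshold([10.0, 12.0, 8.0]))   # False: lightly loaded host
```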

2.4.5. Workload stabilization strategy

Stabilize cluster performance by redistributing VM instances when workload distribution becomes uneven across Compute nodes. This strategy uses standard deviation to identify and correct resource imbalances.

For a demonstration of how to use this strategy, see Stabilizing multiple workloads.

  • Requirements:

    • A minimum of two Compute nodes that use the same CPU and RAM hardware.
    • Ability for the Compute service to perform live migrations to any active Compute nodes.
  • User-supplied parameters:

    All parameters are optional. When the user does not supply a parameter, the strategy uses the default value.

    • -p metrics='["instance_cpu_usage","instance_ram_usage"]' - the metric or metrics the strategy uses to evaluate cluster workloads. The default is to evaluate both CPU usage and RAM usage; if you specify only one metric, only that metric is evaluated.
    • -p thresholds='{"instance_cpu_usage": <trigger_value>, "instance_ram_usage": <trigger_value>}' - numbers in the range 0.0 - 0.5 that represent a value for the standard deviation of the normalized CPU or RAM usage, where the value 0.0 is a perfectly balanced cluster and the value 0.5 is a totally unbalanced cluster. When a threshold value is exceeded, the strategy is triggered to look for an action plan. The default threshold value for both usage types is 0.2.
    • -p weights='{"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}' - decimal numbers that are used to calculate the common standard deviation. The default weight value for both usage types is 1.0.
    • -p instance_metrics='{"instance_cpu_usage":"host_cpu_usage","instance_ram_usage":"host_ram_usage"}' - a mapping that the strategy uses to get hardware statistics using instance metrics for CPU and RAM usage. Do not change these parameters or their values.
    • -p host_choice='cycle|retry|fullsearch' - the method that the strategy uses to select the destination host for each live migration. Valid values are: cycle, which queries hosts in an iteration; retry, which queries random hosts the number of times set by the retry_count parameter; fullsearch, which queries each host from a list. The default value is retry.
    • -p retry_count='<number>' - a number used when -p host_choice='retry' is specified for the number of random queries used. The default value is 1.
    • -p periods='{"instance":<seconds>,"node":<seconds>}' - repeating intervals of time, in seconds, into which the instance and host samples are grouped for aggregation. The Optimize service (watcher) uses only the last period. The default periods for instance and node are 720 and 600 seconds, respectively.
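To make the thresholds and weights parameters more concrete, the following sketch (an illustrative calculation, not the strategy's exact algorithm) combines the standard deviation of each normalized metric into one weighted score that can be compared against the 0.2 default threshold:

```python
import statistics

def weighted_common_stdev(usage_by_metric, weights):
    """Weighted average of the population standard deviation of each
    normalized metric (values between 0.0 and 1.0, one per host)."""
    weighted_sum = sum(
        weights.get(metric, 1.0) * statistics.pstdev(values)
        for metric, values in usage_by_metric.items()
    )
    total_weight = sum(weights.get(metric, 1.0) for metric in usage_by_metric)
    return weighted_sum / total_weight

# Normalized usage on three hosts: CPU is uneven, RAM is perfectly even.
usage = {
    "instance_cpu_usage": [0.9, 0.1, 0.2],
    "instance_ram_usage": [0.4, 0.4, 0.4],
}
score = weighted_common_stdev(
    usage, {"instance_cpu_usage": 1.0, "instance_ram_usage": 1.0}
)
# Compare score against the threshold (0.2 by default) to decide
# whether the cluster is unbalanced enough to build an action plan.
```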

2.4.6. Zone migration strategy

Use the zone migration strategy to migrate instances and volumes between user-defined zones of Compute nodes and storage pools to prepare for infrastructure maintenance.

For a demonstration of how to use this strategy, see Streamlining workload migrations.

Note

The term zone in the zone migration strategy refers to a user-defined set of Compute nodes and storage pools. Zone does not refer to OpenStack availability zones.

  • Prerequisites for instance zone migration:

    • A minimum of two physical hosts which serve as Compute nodes.
    • Ability for the Compute service (nova) to perform live and cold migrations.
  • Prerequisites for volume zone migration:

    • A minimum of two cinder pools that you can migrate volumes between.
  • User-supplied parameters:

    • -p compute_nodes='[{<compute_nodes_array_elements>}]' - the Compute nodes to migrate. See "compute_nodes array elements" later in this document.
    • -p storage_pools='[{<storage_pools_array_elements>}]' - the Storage pools to migrate. See "storage_pools array elements" later in this document.
    • -p parallel_total - the total number of actions that run in parallel. The default is 6.
    • -p parallel_per_node - the number of actions that run in parallel per compute node in one action plan. The default is 2.
    • -p parallel_per_pool - the number of actions that run in parallel per storage pool. The default is 2.
    • -p priority - a list that prioritizes instances and volumes.
    • -p with_attached_volume - controls instance migration order relative to attached volumes.

      • False (default): Instances migrate after all volumes migrate.
      • True: An instance migrates after its attached volumes migrate.
  • compute_nodes array elements:

    `-p compute_nodes='[{"src_node":"<compute_node_name>",
    "dst_node":"<compute_node_name>"}]'`
    • "src_node":"<compute_node_name>" (required) - the name of the Compute node from which instances migrate.
    • "dst_node":"<compute_node_name>" - the name of the Compute node to which instances migrate. If you do not specify the destination node, then it is determined through nova-scheduler.
  • storage_pools array elements:

    `-p storage_pools='[{"src_pool":"<storage_pool_name>",
    "src_type":"<volume_type>", "dst_pool":"<storage_pool_name>",
    "dst_type":"<volume_type>"}]'`
    • "src_pool":"<storage_pool_name>" - the storage pool from which volumes migrate.
    • "dst_pool":"<storage_pool_name>" - the storage pool to which volumes migrate.
    • "src_type":"<volume_type>" - the source volume type.
    • "dst_type":"<volume_type>" - the destination volume type.
  • priority object elements (compute):

    `-p priority='[{
    "project":"<project_name>,...",
    "compute_nodes='[{<compute_node_array_elements>}]'",
    "compute='["vcpu_num", "mem_size", "disk_size", "created_at"]'"
    }]'`
    • "project":"<project_name>,..." - the project (tenant) names that contain the prioritized Compute nodes and volumes.
    • "compute_nodes='[{<compute_node_array_elements>}]'" - the Compute node names that you want to prioritize. See "compute_nodes array elements" earlier in this document.
    • "compute='["vcpu_num", "mem_size", "disk_size", "created_at"]'" - attributes of the instances that you want to prioritize.
  • priority object elements (storage):

    `-p priority='[{
    "project":"<project_name>,...",
    "storage_pool='[{<storage_pool_array_elements>}]'",
    "storage='["size", "created_at"]'"
    }]'`
    • "project":"<project_name>,..." - the project (tenant) names that contain the prioritized Compute nodes and volumes.
    • "storage_pool='[{<storage_pool_array_elements>}]'" - the Storage pool names that you want to prioritize. See "storage_pools array elements" earlier in this document.
    • "storage='["size", "created_at"]'" - attributes of the volumes that you want to prioritize.
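Putting the array elements above together, a zone migration audit might be created as follows. The goal name hardware_maintenance, the node names, the pool names, and the volume types are illustrative assumptions; verify the registered names with `openstack optimize goal list` and `openstack optimize strategy list`:

```shell
# Create an audit template pairing a goal with the zone migration strategy.
openstack optimize audittemplate create zone_template \
    hardware_maintenance --strategy zone_migration

# Migrate instances from compute-0 to compute-1 and volumes
# between two example storage pools.
openstack optimize audit create -a zone_template \
    -p compute_nodes='[{"src_node":"compute-0","dst_node":"compute-1"}]' \
    -p storage_pools='[{"src_pool":"pool-a","src_type":"iscsi","dst_pool":"pool-b","dst_type":"iscsi"}]'
```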

2.5. Verifying your RHOSO environment for the Optimize service

Confirm that your Red Hat OpenStack Services on OpenShift (RHOSO) environment has met all requirements before deploying the Optimize service (watcher). Verifying prerequisites prevents deployment failures and ensures the Optimize service can function properly.

Prerequisites

  • A functional RHOSO 18.0.6 or later deployment that contains two or more Compute nodes.
  • The Compute service (nova) live migration feature is operational.
  • You have the oc command line tool installed on your workstation.
  • You are logged on to a workstation that has access to the RHOSO control plane as a user with cluster-admin privileges.

Procedure

  1. Verify that the service endpoints are available:

    $ oc rsh openstackclient openstack endpoint list -c 'ID' -c 'Service Name' -c 'Enabled'
    Sample output
    +----------------------------------+--------------+---------+
    | ID                               | Service Name | Enabled |
    +----------------------------------+--------------+---------+
    | 0bada656064a4d409bc5fed610654edd | neutron      | True    |
    | 17453066f8dc40bfa0f8584007cffc9a | cinderv3     | True    |
    | 22768bf3e9a34fefa57b96c20d405cfe | keystone     | True    |
    | 54e3d48cdda84263b7f1c65c924f3e3a | glance       | True    |
    | 74345a18262740eb952d2b6b7220ceeb | keystone     | True    |
    | 789a2d6048174b849a7c7243421675b4 | placement    | True    |
    | 9b7d8f26834343a59108a4225e0e574a | nova         | True    |
    | a836d134394846ff88f2f3dd8d96de34 | nova         | True    |
    | af1bf23e62c148d3b7f6c47f8f071739 | placement    | True    |
    | ce0489dfeff64afb859338e480397f90 | glance       | True    |
    | db69cc22117344b796f97e8dd3dc67e5 | neutron      | True    |
    | fa48dc132b524915b4d1ca963c50a653 | cinderv3     | True    |
    +----------------------------------+--------------+---------+
  2. Verify that the Telemetry Operator with Prometheus metric storage is ready:

    $ oc get telemetry
    Sample output
    NAME        STATUS   MESSAGE
    telemetry   True     Setup complete
    $ oc get metricstorage
    Sample output
    NAME             STATUS   MESSAGE
    metric-storage   True     Setup complete