Chapter 2. Considerations for implementing the Optimize service
Review software requirements, supported features, and optimization strategies before deploying the Optimize service (watcher) in your Red Hat OpenStack Services on OpenShift (RHOSO) environment. Understanding these requirements helps you configure the service correctly and choose the right strategies for your infrastructure goals.
2.1. Optimize service (watcher) supported features
Use only the documented features and strategies when implementing the Optimize service (watcher) in your production environment. Limiting your implementation to supported features ensures supportability and reliable service operation.
2.2. Optimize service software requirements
Verify that your Red Hat OpenStack Services on OpenShift (RHOSO) environment meets the necessary prerequisites before deploying the Optimize service (watcher). Confirming that you have the required services and versions helps prevent deployment issues and ensures that the Optimize service functions correctly.
The Red Hat OpenStack Services on OpenShift (RHOSO) Optimize service (watcher) requires an existing deployment of RHOSO 18.0.6 or later that includes at least the following components:
- Compute service (nova)
- Identity service (keystone)
- Image service (glance)
- MariaDB
- RabbitMQ
- Telemetry service, including the Prometheus metrics store
The Dashboard service (horizon) is not required. However, if it is enabled in your OpenStack cloud, the Optimize service dashboard becomes available in the Dashboard service after deployment.
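As a quick way to confirm the minimum version, you can inspect the `OpenStackVersion` resource on the control plane. This is a minimal sketch, assuming the control plane runs in the default `openstack` namespace:

```
$ oc get openstackversion -n openstack
```

The deployed version reported by the resource should be 18.0.6 or later.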
2.3. Improving Optimize service accuracy with OpenStack service notifications
Enable service notifications in Red Hat OpenStack Services on OpenShift (RHOSO) to keep the Optimize service (watcher) internal model synchronized with your infrastructure in real time. Configuring notifications prevents the service from working with outdated cluster information when creating action plans for large environments.
Prerequisites
- You are logged on to a workstation that has access to the RHOSO control plane as a user with `cluster-admin` privileges.
- You have the `oc` command line tool installed on your workstation.
The Optimize service (watcher) has an internal model of your cluster that it uses to create action plans. By default, the Optimize service queries for updates every 15 minutes. In large clusters, the internal model might be out of date when an action plan is created.
You can ensure that the internal model used by the Optimize service (watcher) is always up to date by enabling notifications from the Compute service (nova) and the Block Storage service (cinder), and then configuring the Optimize service to consume those notifications.
Procedure
1. On your workstation, open your `OpenStackControlPlane` custom resource (CR) file, `openstack_control_plane.yaml`, and add a new messaging bus instance for your cluster:

   ```yaml
   spec:
     rabbitmq:
       enabled: true
       templates:
         ...
         <rabbitmq_notification_server>:
           delayStartSeconds: 30
           override:
             service:
               metadata:
                 annotations:
                   metallb.universe.tf/address-pool: internalapi
                   metallb.universe.tf/loadBalancerIPs: <ip_address>
               spec:
                 type: LoadBalancer
   ```

   - Replace `<rabbitmq_notification_server>` with the name of your notification server, for example, `rabbitmq-notifications`.
   - Replace `<ip_address>` with the appropriate IP address based on your networking plan and configuration.
2. Enable the notifications in your cluster. This change enables all services that support notifications to send or consume them:

   ```yaml
   spec:
     notificationsBusInstance: <rabbitmq_bus_name>
   ```

   - Replace `<rabbitmq_bus_name>` with the name of your notification bus, for example, `rabbitmq-notifications`.
3. Save your changes to the `openstack_control_plane.yaml` CR file, then update the control plane:

   ```
   $ oc apply -f openstack_control_plane.yaml -n openstack
   ```
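To confirm that the new messaging bus instance is running after the control plane update, you can check the RabbitMQ pods. This is a minimal sketch, assuming the default `openstack` namespace and a notification server named `rabbitmq-notifications`:

```
$ oc get pods -n openstack | grep rabbitmq
```

Both the original RabbitMQ cluster pods and the new notification server pods should report a `Running` status.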
2.4. Optimize service strategies
Select the appropriate strategy to accomplish your optimization goals, such as consolidating workloads or balancing resource usage. Knowing the available strategies and their parameters helps you create effective audits that address your operational needs.
In RHOSO 18.0.10, the Optimize service supports the following strategies:
2.4.1. Host maintenance strategy
Prepare Compute nodes for scheduled maintenance by migrating all instances to other nodes without interrupting users. This strategy helps you perform hardware updates or repairs while maintaining service availability.
When a backup node is provided, the strategy moves all instances to the backup node by using the Compute service (nova) migration feature. When there is no backup node, the strategy relies on `nova-scheduler` to migrate all instances. The host maintenance strategy sets the status of the Compute node that no longer hosts running instances to disabled.
For a demonstration of how to use this strategy, see Preparing Compute nodes for planned maintenance.
Requirements:
- A minimum of two physical hosts which serve as Compute nodes.
- Ability for the Compute service (nova) to perform live and cold migrations.
User-supplied parameters:
- `-p maintenance_node=<compute_node_name>` (required) - the name of the Compute node that needs maintenance.
- `-p backup_node=<compute_node_name>` (optional) - the name of the backup Compute node where instances are live migrated. When there is no backup node, the strategy relies on `nova-scheduler` to migrate all instances.
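A possible way to run this strategy is to create an audit template with the `cluster_maintaining` goal and then create an audit against it. The following is a minimal sketch, assuming the watcher CLI plugin is available in the `openstackclient` pod; the template name `maintenance-template` and the node names `compute-0` and `compute-1` are hypothetical:

```
$ oc rsh openstackclient openstack optimize audittemplate create \
    maintenance-template cluster_maintaining --strategy host_maintenance
$ oc rsh openstackclient openstack optimize audit create \
    -a maintenance-template -p maintenance_node=compute-0 -p backup_node=compute-1
```

After the audit runs, you can review the proposed migrations with `openstack optimize actionplan list` before any actions are applied.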
2.4.2. Node resource consolidation strategy
Consolidate workloads onto fewer Compute nodes to free up hardware resources. This strategy moves instances between nodes while keeping all nodes active and available for future workload placement.
The node resource consolidation strategy moves instances between a source and a destination Compute node in the cluster. It uses the Compute service (nova) live migration feature to consolidate resource usage. The node resource strategy does not change the status of the source Compute node.
For a demonstration of how to use this strategy, see Consolidating node resources.
Requirements:
- A minimum of two Compute nodes that use the same CPU and RAM hardware.
- Ability for the Compute service to perform live migrations to any active Compute nodes.
User-supplied parameters:
- `-p host_choice=<auto | specify>` (optional) - the method used to select the server migration destination node. The value `auto` causes the Compute service (nova) scheduler to select the migration destination node. The value `specify` causes the strategy to select the migration destination node. When `host_choice` is not specified, the strategy defaults to `auto`.
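For example, the following is a minimal sketch of an audit that uses this strategy, assuming the watcher CLI plugin is available in the `openstackclient` pod; the template name `consolidation-template` is hypothetical:

```
$ oc rsh openstackclient openstack optimize audittemplate create \
    consolidation-template server_consolidation --strategy node_resource_consolidation
$ oc rsh openstackclient openstack optimize audit create \
    -a consolidation-template -p host_choice=auto
```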
2.4.3. VM workload consolidation strategy
Use the VM workload consolidation strategy to consolidate VM instances so that you can disable underutilized nodes. This strategy helps you reduce power consumption by concentrating workloads and freeing up nodes that can be powered down.
This strategy uses the Compute service (nova) live migration feature and sets the status of the source Compute node to disabled.
For a demonstration of how to use this strategy, see Consolidating VM instances.
Requirements:
- A minimum of two physical hosts which serve as Compute nodes.
- Ability for the Compute service to perform migrations to any active Compute nodes.
User-supplied parameters:
- `-p period=<seconds>` (optional) - the time interval, in seconds, for getting statistic aggregation from the metric data source. When `period` is not specified, the strategy defaults to one hour (`3600` seconds).
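A minimal sketch of an audit that uses this strategy with a two-hour aggregation period, assuming the watcher CLI plugin is available in the `openstackclient` pod; the template name `vm-consolidation-template` is hypothetical:

```
$ oc rsh openstackclient openstack optimize audittemplate create \
    vm-consolidation-template server_consolidation --strategy vm_workload_consolidation
$ oc rsh openstackclient openstack optimize audit create \
    -a vm-consolidation-template -p period=7200
```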
2.4.4. Workload balance migration strategy
Balance workload distribution across Compute nodes by migrating individual instances when resource utilization exceeds configured thresholds. This strategy helps prevent performance bottlenecks by ensuring that no single node becomes overloaded while others remain underutilized.
For a demonstration of how to use this strategy, see Balancing single instance workloads.
Requirements:
- A minimum of two Compute nodes that use the same CPU and RAM hardware.
- Ability for the Compute service to perform live migrations to any active Compute nodes.
User-supplied parameters:
All parameters are optional. When the user does not supply a parameter, the strategy uses the default value.
- `-p metrics=instance_cpu_usage|instance_ram_usage` - the type of workload balancing desired: based on CPU or RAM utilization. The default is `instance_cpu_usage`.
- `-p threshold=<percentage>` - a percentage, expressed as a decimal number (n.n) in the range `0.0`-`100.0`, that is the threshold for CPU or memory usage on the Compute hosts. When either the CPU usage or the RAM usage exceeds the threshold percentage, the Optimize service searches for an instance live migration that improves the balance of Compute node resource usage. The `threshold` value applies to both the source and the destination node involved in the migration. The default is `25.0`.
- `-p period=<seconds>` - the time interval, in seconds, over which the threshold is evaluated. The default is five minutes (`300` seconds).
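A minimal sketch of an audit that balances on RAM usage with a custom threshold, assuming the watcher CLI plugin is available in the `openstackclient` pod; the template name `balance-template` and the parameter values are hypothetical:

```
$ oc rsh openstackclient openstack optimize audittemplate create \
    balance-template workload_balancing --strategy workload_balance
$ oc rsh openstackclient openstack optimize audit create \
    -a balance-template -p metrics=instance_ram_usage -p threshold=30.0 -p period=600
```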
2.4.5. Workload stabilization strategy
Stabilize cluster performance by redistributing VM instances when workload distribution becomes uneven across Compute nodes. This strategy uses standard deviation to identify and correct resource imbalances.
For a demonstration of how to use this strategy, see Stabilizing multiple workloads.
Requirements:
- A minimum of two Compute nodes that use the same CPU and RAM hardware.
- Ability for the Compute service to perform live migrations to any active Compute nodes.
User-supplied parameters:
All parameters are optional. When the user does not supply a parameter, the strategy uses the default value.
- `-p metrics='["instance_cpu_usage","instance_ram_usage"]'` - the metric or metrics that the strategy uses to evaluate cluster workloads. The default is to evaluate both CPU usage and RAM usage; if you specify only one metric, the strategy evaluates only that metric.
- `-p thresholds='{"instance_cpu_usage": <trigger_value>, "instance_ram_usage": <trigger_value>}'` - numbers in the range `0.0`-`0.5` that represent the standard deviation of the normalized CPU or RAM usage, where `0.0` is a perfectly balanced cluster and `0.5` is a totally unbalanced cluster. When a `thresholds` value is exceeded, the strategy is triggered to look for an action plan. The default threshold value for both usage types is `0.2`.
- `-p weights='{"instance_cpu_usage_weight": 1.0, "instance_ram_usage_weight": 1.0}'` - decimal numbers that are used to calculate the common standard deviation. The default weight value for both usage types is `1.0`.
- `-p instance_metrics='{"instance_cpu_usage":"host_cpu_usage","instance_ram_usage":"host_ram_usage"}'` - a mapping that the strategy uses to get hardware statistics from instance metrics for CPU and RAM usage. Do not change these parameters or their values.
- `-p host_choice='cycle|retry|fullsearch'` - the method that the strategy uses to select the destination host for each live migration. Valid values are: `cycle`, which queries hosts in an iteration; `retry`, which queries random hosts using the `retry_count` parameter; and `fullsearch`, which queries each host from a list. The default value is `retry`.
- `-p retry_count='<number>'` - the number of random queries to perform when `-p host_choice='retry'` is specified. The default value is `1`.
- `-p periods='{"instance":<seconds>,"node":<seconds>}'` - repeating intervals of time, in seconds, into which the instance and host samples are grouped for aggregation. The Optimize service (watcher) uses only the last period. The default periods for instance and node are `720` and `600` seconds, respectively.
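A minimal sketch of an audit that uses this strategy with tightened thresholds, assuming the watcher CLI plugin is available in the `openstackclient` pod; the template name `stabilization-template` and the threshold values are hypothetical:

```
$ oc rsh openstackclient openstack optimize audittemplate create \
    stabilization-template workload_balancing --strategy workload_stabilization
$ oc rsh openstackclient openstack optimize audit create \
    -a stabilization-template \
    -p thresholds='{"instance_cpu_usage": 0.1, "instance_ram_usage": 0.1}'
```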
2.4.6. Zone migration strategy
Use the zone migration strategy to migrate instances and volumes between user-defined zones of Compute nodes and storage pools to prepare for infrastructure maintenance.
For a demonstration of how to use this strategy, see Streamlining workload migrations.
The term zone in the zone migration strategy refers to a user-defined set of Compute nodes and storage pools. Zone does not refer to OpenStack availability zones.
Prerequisites for instance zone migration:
- A minimum of two physical hosts which serve as Compute nodes.
- Ability for the Compute service (nova) to perform live and cold migrations.
Prerequisites for volume zone migration:
- A minimum of two cinder pools that you can migrate volumes between.
User-supplied parameters:
- `-p compute_nodes='[{<compute_nodes_array_elements>}]'` - the Compute nodes to migrate. See "`compute_nodes` array elements" later in this document.
- `-p storage_pools='[{<storage_pools_array_elements>}]'` - the storage pools to migrate. See "`storage_pools` array elements" later in this document.
- `-p parallel_total` - the total number of actions that run in parallel. The default is `6`.
- `-p parallel_per_node` - the number of actions that run in parallel per Compute node in one action plan. The default is `2`.
- `-p parallel_per_pool` - the number of actions that run in parallel per storage pool. The default is `2`.
- `-p priority` - a list that prioritizes instances and volumes.
- `-p with_attached_volume` - controls instance migration order relative to attached volumes:
  - `False` (default): instances migrate after all volumes migrate.
  - `True`: an instance migrates after its attached volumes migrate.
`compute_nodes` array elements:

`-p compute_nodes='[{"src_node":"<compute_node_name>", "dst_node":"<compute_node_name>"}]'`

- `"src_node":"<compute_node_name>"` (required) - the name of the Compute node from which instances migrate.
- `"dst_node":"<compute_node_name>"` - the name of the Compute node to which instances migrate. If you do not specify the destination node, it is determined through `nova-scheduler`.
`storage_pools` array elements:

`-p storage_pools='[{"src_pool":"<storage_pool_name>", "src_type":"<volume_type>", "dst_pool":"<storage_pool_name>", "dst_type":"<volume_type>"}]'`

- `"src_pool":"<storage_pool_name>"` - the storage pool from which volumes migrate.
- `"dst_pool":"<storage_pool_name>"` - the storage pool to which volumes migrate.
- `"src_type":"<volume_type>"` - the source volume type.
- `"dst_type":"<volume_type>"` - the destination volume type.
`priority` object elements (compute):

`-p priority='[{"project":"<project_name>,...", "compute_nodes":[{<compute_node_array_elements>}], "compute":["vcpu_num", "mem_size", "disk_size", "created_at"]}]'`

- `"project":"<project_name>,..."` - the project (tenant) names that contain the prioritized Compute nodes and volumes.
- `"compute_nodes":[{<compute_node_array_elements>}]` - the Compute node names that you want to prioritize. See "`compute_nodes` array elements" earlier in this document.
- `"compute":["vcpu_num", "mem_size", "disk_size", "created_at"]` - attributes of the instances that you want to prioritize.
`priority` object elements (storage):

`-p priority='[{"project":"<project_name>,...", "storage_pool":[{<storage_pool_array_elements>}], "storage":["size", "created_at"]}]'`

- `"project":"<project_name>,..."` - the project (tenant) names that contain the prioritized Compute nodes and volumes.
- `"storage_pool":[{<storage_pool_array_elements>}]` - the storage pool names that you want to prioritize.
- `"storage":["size", "created_at"]` - attributes of the volumes that you want to prioritize.
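A minimal sketch of an audit that migrates instances between two zones of Compute nodes, assuming the watcher CLI plugin is available in the `openstackclient` pod; the template name `zone-migration-template` and the node names are hypothetical:

```
$ oc rsh openstackclient openstack optimize audittemplate create \
    zone-migration-template hardware_maintenance --strategy zone_migration
$ oc rsh openstackclient openstack optimize audit create \
    -a zone-migration-template \
    -p compute_nodes='[{"src_node":"compute-0", "dst_node":"compute-1"}]'
```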
2.5. Verifying your RHOSO environment for the Optimize service
Confirm that your Red Hat OpenStack Services on OpenShift (RHOSO) environment has met all requirements before deploying the Optimize service (watcher). Verifying prerequisites prevents deployment failures and ensures the Optimize service can function properly.
Prerequisites
- A functional RHOSO 18.0.6 or later deployment that contains two or more Compute nodes.
- The Compute service (nova) live migration feature is operational.
- You have the `oc` command line tool installed on your workstation.
- You are logged on to a workstation that has access to the RHOSO control plane as a user with `cluster-admin` privileges.
Procedure
1. Verify that the service endpoints are available:

   ```
   $ oc rsh openstackclient openstack endpoint list -c 'ID' -c 'Service Name' -c 'Enabled'
   ```

   Sample output:

   ```
   +----------------------------------+--------------+---------+
   | ID                               | Service Name | Enabled |
   +----------------------------------+--------------+---------+
   | 0bada656064a4d409bc5fed610654edd | neutron      | True    |
   | 17453066f8dc40bfa0f8584007cffc9a | cinderv3     | True    |
   | 22768bf3e9a34fefa57b96c20d405cfe | keystone     | True    |
   | 54e3d48cdda84263b7f1c65c924f3e3a | glance       | True    |
   | 74345a18262740eb952d2b6b7220ceeb | keystone     | True    |
   | 789a2d6048174b849a7c7243421675b4 | placement    | True    |
   | 9b7d8f26834343a59108a4225e0e574a | nova         | True    |
   | a836d134394846ff88f2f3dd8d96de34 | nova         | True    |
   | af1bf23e62c148d3b7f6c47f8f071739 | placement    | True    |
   | ce0489dfeff64afb859338e480397f90 | glance       | True    |
   | db69cc22117344b796f97e8dd3dc67e5 | neutron      | True    |
   | fa48dc132b524915b4d1ca963c50a653 | cinderv3     | True    |
   +----------------------------------+--------------+---------+
   ```
2. Verify that the Telemetry Operator with Prometheus metric storage is ready:

   ```
   $ oc get telemetry
   ```

   Sample output:

   ```
   NAME        STATUS   MESSAGE
   telemetry   True     Setup complete
   ```

   ```
   $ oc get metricstorage
   ```

   Sample output:

   ```
   NAME             STATUS   MESSAGE
   metric-storage   True     Setup complete
   ```
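To confirm that the live migration requirement is met, you can live migrate a test instance and check which host it lands on. This is a minimal sketch; the server name `test-vm` is hypothetical, and the commands assume admin credentials in the `openstackclient` pod:

```
$ oc rsh openstackclient openstack server migrate --live-migration test-vm
$ oc rsh openstackclient openstack server show test-vm -c 'OS-EXT-SRV-ATTR:host' -c status
```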