Chapter 8. Working with hardware profiles
Hardware profiles are currently available in Red Hat OpenShift AI as a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
In Red Hat OpenShift AI, you can use hardware profiles to manage and allocate specific hardware resources, such as hardware accelerators, specialized memory, or CPU-only nodes for data science, machine learning, and generative AI workloads.
Hardware profiles are custom resources (CRs) for targeted scheduling that allow you to specify the exact resources you need for workloads such as workbenches and model serving. To create a hardware profile with a particular hardware configuration in OpenShift AI, go to Settings → Hardware profiles.
By default, this option is hidden from the dashboard navigation menu. To show the Settings → Hardware profiles option, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift. For more information about setting dashboard configuration options, see Customizing the dashboard.
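As a minimal sketch of the configuration change described above, the following Python snippet builds a JSON merge patch body that sets disableHardwareProfiles to false. The nesting under spec.dashboardConfig is an assumption about the layout of the OdhDashboardConfig resource and is not stated in this chapter; confirm the field path against the resource in your cluster before applying anything.

```python
import json


def build_dashboard_patch(disable_hardware_profiles: bool) -> dict:
    """Return a JSON merge patch body for the OdhDashboardConfig CR.

    Assumption: the flag lives under spec.dashboardConfig. Check the
    resource in your cluster to confirm the actual field path.
    """
    return {
        "spec": {
            "dashboardConfig": {
                "disableHardwareProfiles": disable_hardware_profiles,
            }
        }
    }


# Setting the flag to False makes the hardware profiles option visible.
patch = build_dashboard_patch(False)
patch_json = json.dumps(patch)
print(patch_json)
```

The printed JSON could then be supplied as the body of a merge patch through whichever tool you normally use to edit custom resources.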
These profiles offer fine-grained control over resource allocation by defining specifications that include:
- Hardware identifiers
- Explicit resource limits (such as CPU, memory, and accelerators)
- Tolerations
- Node selectors
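To make the four parts of a profile concrete, here is a small in-memory model in Python. It is an illustration only: the class and field names are hypothetical and do not mirror the schema of the HardwareProfile custom resource, and the example values (resource identifier, label, taint key) are placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ResourceIdentifier:
    # Hypothetical field names, chosen for illustration only.
    label: str
    identifier: str
    default: int
    minimum: int
    maximum: Optional[int] = None  # None models "no maximum limit"


@dataclass
class Toleration:
    key: str
    operator: str = "Equal"
    value: str = ""
    effect: str = ""


@dataclass
class HardwareProfileSketch:
    name: str
    identifiers: List[ResourceIdentifier] = field(default_factory=list)
    tolerations: List[Toleration] = field(default_factory=list)
    node_selector: Dict[str, str] = field(default_factory=dict)


# Example values are placeholders, not recommendations.
profile = HardwareProfileSketch(
    name="example-gpu-profile",
    identifiers=[ResourceIdentifier("GPU", "nvidia.com/gpu", 1, 1, 4)],
    tolerations=[
        Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")
    ],
    node_selector={"example.com/gpu-node": "true"},
)
summary = asdict(profile)
print(summary)
```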
To get started, contact your cluster administrator to identify the available hardware resources in your cluster.
8.1. Creating a hardware profile
To configure specific hardware configurations for your data scientists to use in OpenShift AI, you must create an associated hardware profile.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
Note: If you need a hardware profile and do not have OpenShift AI administrator privileges, contact your OpenShift AI administrator to request assistance.
- You have installed the relevant hardware and confirmed that it is detected in your environment.
- You have verified your desired GPU type and vRAM size.
Procedure
From the OpenShift AI dashboard, click Settings → Hardware profiles.
The Hardware profiles page opens, displaying existing hardware profiles. To enable or disable an existing hardware profile, on the row containing the relevant hardware profile, click the toggle in the Enabled column.
Click Create hardware profile.
The Create hardware profile page opens.
- In the Name field, enter a name for the hardware profile.
- Optional: To change the default name of your Kubernetes resource, click Edit resource name and enter a name in the Resource name field. The resource name cannot be edited after creation.
- Optional: In the Description field, enter a description for the hardware profile.
In the Visibility section, set the hardware profile visibility level to indicate where the hardware profile can be used in workbenches and model deployment:
- To access the hardware profile in all areas of OpenShift AI, leave the Visible everywhere radio button selected.
- To limit the areas of OpenShift AI where your data scientists can use the hardware profile, select the Limited visibility radio button.
Optional: Configure resource requests and limits:
Click Add resource.
The Add resource dialog opens. For more information about custom resources, see Managing resources from custom resource definitions in the Additional Resources section.
- In the Resource label field, enter a unique resource label.
- In the Resource identifier field, enter a unique resource identifier.
- From the Resource type field, select a resource type from the list.
- In the Default field, enter the default resource request limit. This value must be equal to or between the minimum and maximum limits.
- In the Minimum allowed field, enter the minimum number of resources that users can request.
In the Maximum allowed field, enter the maximum number of resources that users can request:
- To set a specific maximum request limit, click the Set maximum limit radio button and enter a value.
- To set no maximum request limit, click the No maximum limit radio button.
- Click Add.
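The relationship between the Default, Minimum allowed, and Maximum allowed fields can be sketched as a small check. This is an illustrative helper, not the dashboard's own validation; the function name is hypothetical, and a maximum of None stands for the No maximum limit option.

```python
from typing import List, Optional


def validate_resource_limits(
    default: int, minimum: int, maximum: Optional[int]
) -> List[str]:
    """Return a list of problems; an empty list means the values are consistent."""
    problems = []
    # A maximum below the minimum would leave no valid default at all.
    if maximum is not None and maximum < minimum:
        problems.append("maximum must be greater than or equal to minimum")
    # The default must be equal to or between the minimum and maximum.
    if default < minimum:
        problems.append("default must be greater than or equal to minimum")
    if maximum is not None and default > maximum:
        problems.append("default must be less than or equal to maximum")
    return problems


print(validate_resource_limits(default=2, minimum=1, maximum=4))
print(validate_resource_limits(default=8, minimum=1, maximum=None))
print(validate_resource_limits(default=5, minimum=1, maximum=4))
```

Returning a list of messages rather than raising on the first problem lets a caller report every inconsistency at once.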
In the Resource allocation section, select a Workload allocation strategy to configure how workloads are assigned to nodes:
- Local queue
- To use Kueue to automatically queue jobs and manage resources based on workload priority, select Local queue. This option is available only if your cluster is configured to manage workloads with Kueue.
In the Local queue field, enter the name of the LocalQueue that this hardware profile will use.
Note: For globally scoped profiles, use a LocalQueue name that exists in all user projects, such as the default LocalQueue created by the OpenShift AI Operator.
- Optional: From the Workload priority list, select a priority for jobs that use this profile. Higher-priority workloads are admitted before lower-priority workloads when resources are limited.
- Node selectors and tolerations
- To manually add node selectors and tolerations, select Node selectors and tolerations. For more information about taints and tolerations, see Understanding taints and tolerations in the Additional Resources section.
Optional: Add a node selector to schedule pods on nodes with matching labels.
Click Add node selector.
The Add node selector dialog opens.
- In the Key field, enter a node selection key. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
- In the Value field, enter a node selection value. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
- Click Add.
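The character rule stated for the Key and Value fields can be expressed as a regular expression. The sketch below encodes only the rule as worded in this procedure; the dashboard may apply additional checks (for example, on length or on prefixed keys), and the function name is hypothetical.

```python
import re

# Begins with a letter or number, followed by any mix of letters,
# numbers, hyphens, dots, and underscores.
_SELECTOR_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_selector_part(text: str) -> bool:
    """Check a key or value against the rule described in the procedure."""
    return _SELECTOR_PATTERN.fullmatch(text) is not None


for candidate in ["gpu-node.example_1", "-gpu", "gpu node", ""]:
    print(repr(candidate), is_valid_selector_part(candidate))
```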
Optional: Add a toleration to allow pods to be scheduled on nodes with matching taints.
Click Add toleration.
The Add toleration dialog opens.
From the Operator list, select one of the following options:
- Equal - The key/value/effect parameters must match. This is the default.
- Exists - The key/effect parameters must match. Leave the value parameter blank; a blank value matches any value.
From the Effect list, select one of the following options:
- None
- NoSchedule - New pods that do not tolerate the taint are not scheduled onto that node. Existing pods on the node remain.
- PreferNoSchedule - New pods that do not tolerate the taint might be scheduled onto that node, but the scheduler tries to avoid it. Existing pods on the node remain.
- NoExecute - New pods that do not tolerate the taint cannot be scheduled onto that node. Existing pods on the node that do not have a matching toleration are removed.
- In the Key field, enter a toleration key. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
- In the Value field, enter a toleration value. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
In the Toleration Seconds section, select one of the following options to specify how long a pod stays bound to a node that has a node condition:
- Forever - Pods stay permanently bound to a node.
- Custom value - Enter a value, in seconds, to define how long pods stay bound to a node that has a node condition.
- Click Add.
- Click Create hardware profile.
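The Operator and Effect options in the Add toleration dialog follow the usual Kubernetes matching rules, which can be sketched as below. This is a simplified model for illustration, not scheduler code. It assumes that the None effect in the dashboard corresponds to an empty effect, which matches every taint effect; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # assumption: empty string models the "None" choice


def tolerates(toleration: Toleration, taint: Taint) -> bool:
    """Return True if the toleration matches the taint."""
    # An empty effect is treated as matching every taint effect.
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.key != taint.key:
        return False
    if toleration.operator == "Exists":
        return True  # only key and effect are compared
    if toleration.operator == "Equal":
        return toleration.value == taint.value
    raise ValueError(f"unknown operator: {toleration.operator}")


taint = Taint(key="nvidia.com/gpu", value="true", effect="NoSchedule")
print(tolerates(Toleration("nvidia.com/gpu", "Equal", "true", "NoSchedule"), taint))
print(tolerates(Toleration("nvidia.com/gpu", "Exists"), taint))
print(tolerates(Toleration("nvidia.com/gpu", "Equal", "false", "NoSchedule"), taint))
```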
Verification
- The hardware profile is displayed on the Hardware profiles page.
- The hardware profile is displayed in the Hardware profiles list on the Create workbench page.
- The hardware profile is displayed on the Instances tab on the details page for the HardwareProfile custom resource definition (CRD).
8.2. Updating a hardware profile
You can update the existing hardware profiles in your deployment. You can change important identifying information, such as the display name, the identifier, or the description.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- The hardware profile exists in your deployment.
Procedure
From the OpenShift AI dashboard, click Settings → Hardware profiles.
The Hardware profiles page opens, displaying existing hardware profiles. To enable or disable a hardware profile, on the row containing the relevant hardware profile, click the toggle in the Enabled column.
Click the action menu (⋮) and select Edit from the list.
The Edit hardware profile dialog opens.
- Make your changes.
- Click Update hardware profile.
Verification
- If your hardware profile has new identifying information, this information is displayed in the Hardware profiles list on the Create workbench page.
8.3. Deleting a hardware profile
To discard hardware profiles that you no longer require, you can delete them so that they do not appear on the dashboard.
Prerequisites
- You have logged in to OpenShift AI as a user with OpenShift AI administrator privileges.
- The hardware profile that you want to delete exists in your deployment.
Procedure
From the OpenShift AI dashboard, click Settings → Hardware profiles.
The Hardware profiles page opens, displaying existing hardware profiles.
Click the action menu (⋮) beside the hardware profile that you want to delete and click Delete.
The Delete hardware profile dialog opens.
- Enter the name of the hardware profile in the text field to confirm that you intend to delete it.
- Click Delete.
Verification
- The hardware profile is no longer displayed on the Hardware profiles page.