Chapter 8. Managing workloads with Kueue
As a cluster administrator, you can manage AI and machine learning workloads at scale by integrating the Red Hat build of Kueue with Red Hat OpenShift AI. This integration provides capabilities for quota management, resource allocation, and prioritized job scheduling.
Starting with OpenShift AI 2.24, the embedded Kueue component for managing distributed workloads is deprecated. Kueue is now provided through Red Hat build of Kueue, which is installed and managed by the Red Hat build of Kueue Operator. You cannot install both the embedded Kueue and the Red Hat build of Kueue Operator on the same cluster because this creates conflicting controllers that manage the same resources.
OpenShift AI does not automatically migrate existing workloads. To ensure your workloads continue using queue management after upgrading, cluster administrators must manually migrate from the embedded Kueue to the Red Hat build of Kueue Operator. For more information, see Migrating to the Red Hat build of Kueue Operator.
8.1. Overview of managing workloads with Kueue
You can use Kueue in OpenShift AI to manage AI and machine learning workloads at scale. Kueue controls how cluster resources are allocated and shared through hierarchical quota management, dynamic resource allocation, and prioritized job scheduling. These capabilities help prevent cluster contention, ensure fair access across teams, and optimize the use of heterogeneous compute resources, such as hardware accelerators.
Kueue lets you schedule diverse workloads, including distributed training jobs (RayJob, RayCluster, PyTorchJob), workbenches (Notebook), and model serving (InferenceService). Kueue validation and queue enforcement apply only to workloads in namespaces with the kueue.openshift.io/managed=true label.
Using Kueue in OpenShift AI provides these benefits:
- Prevents resource conflicts and prioritizes workload processing
- Manages quotas across teams and projects
- Ensures consistent scheduling for all workload types
- Maximizes GPU and other specialized hardware utilization
8.1.1. Kueue management states
You configure how OpenShift AI interacts with Kueue by setting the managementState field in the DataScienceCluster object.
Unmanaged
- This state is supported for using Kueue with OpenShift AI. In the Unmanaged state, OpenShift AI integrates with an existing Kueue installation managed by the Red Hat build of Kueue Operator. You must have the Red Hat build of Kueue Operator installed and running on the cluster. When you enable Unmanaged mode, the OpenShift AI Operator creates a default Kueue custom resource (CR) if one does not already exist. This prompts the Red Hat build of Kueue Operator to activate Kueue on the cluster.
Managed
- This state is deprecated. Previously, OpenShift AI deployed and managed an embedded Kueue distribution. Managed mode is not compatible with the Red Hat build of Kueue Operator. If both are installed, OpenShift AI stops reconciliation to avoid conflicts. You must migrate any environment using the Managed state to the Unmanaged state to ensure continued support.
Removed
- This state disables Kueue in OpenShift AI. If the state was previously Managed, OpenShift AI uninstalls the embedded Kueue distribution. If the state was previously Unmanaged, OpenShift AI stops checking for the external Kueue integration but does not uninstall the Red Hat build of Kueue Operator. An empty managementState value also functions as Removed.
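For reference, a DataScienceCluster object configured for the supported Unmanaged state looks similar to the following sketch (the default-dsc name is the default used elsewhere in this chapter; verify the apiVersion against your installed Operator):

```yaml
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    kueue:
      managementState: Unmanaged   # integrate with the Red Hat build of Kueue Operator
```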
8.1.2. Queue enforcement for projects
To ensure workloads do not bypass the queuing system, a validating webhook automatically enforces queuing rules on any project that is enabled for Kueue management. You enable a project for Kueue management by applying the kueue.openshift.io/managed=true label to the project namespace.
This validating webhook enforcement method replaces the Validating Admission Policy that was used with the deprecated embedded Kueue component. The system also supports the legacy kueue-managed label for backward compatibility, but kueue.openshift.io/managed=true is the recommended label going forward.
After a project is enabled for Kueue management, the webhook requires that any new or updated workload has the kueue.x-k8s.io/queue-name label. If this label is missing, the webhook prevents the workload from being created or updated.
OpenShift AI creates a default, cluster-scoped ClusterQueue (if one does not already exist) and a namespace-scoped LocalQueue for that namespace (if one does not already exist). These default resources are created with the opendatahub.io/managed=false annotation, so they are not managed after creation. Cluster administrators can change or delete them.
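For illustration, a default LocalQueue created this way would look similar to the following sketch (the namespace is a placeholder; the opendatahub.io/managed=false annotation is what exempts the resource from further management):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default
  namespace: <project-namespace>
  annotations:
    opendatahub.io/managed: "false"   # not reconciled after creation; admins can edit or delete
spec:
  clusterQueue: default   # points at the default cluster-scoped ClusterQueue
```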
The webhook enforces this rule on the create and update operations for the following resource types:
- InferenceService
- Notebook
- PyTorchJob
- RayCluster
- RayJob
You can apply hardware profiles to other workload types, but the validation webhook enforces the kueue.x-k8s.io/queue-name label requirement only for these specific resource types.
8.1.3. Restrictions for managing workloads with Kueue
When you use Kueue to manage workloads in OpenShift AI, the following restrictions apply:
- Namespaces must be labeled with kueue.openshift.io/managed=true to enable Kueue validation and queue enforcement.
- All workloads that you create from the OpenShift AI dashboard, such as workbenches and model servers, must use a hardware profile that specifies a local queue.
- When you specify a local queue in a hardware profile, OpenShift AI automatically applies the corresponding kueue.x-k8s.io/queue-name label to workloads that use that profile.
- You cannot use hardware profiles that contain node selectors or tolerations for node placement. To direct workloads to specific nodes, use a hardware profile that specifies a local queue that is associated with a cluster queue configured with the appropriate resource flavors.
- You cannot use accelerator profiles with Kueue. You must migrate any existing accelerator profiles to hardware profiles.
- Because workbenches are not suspendable workloads, you can only assign them to a local queue that is associated with a non-preemptive cluster queue. The default cluster queue that OpenShift AI creates is non-preemptive.
8.1.4. Kueue workflow
Managing workloads with Kueue in OpenShift AI involves tasks for OpenShift cluster administrators, OpenShift AI administrators, and machine learning (ML) engineers or data scientists:
Cluster administrator
Installs and configures Kueue:
- Installs the Red Hat build of Kueue Operator on the cluster, as described in the Red Hat build of Kueue documentation.
- Activates the Kueue integration by setting the managementState to Unmanaged in the DataScienceCluster custom resource, as described in Configuring workload management with Kueue.
- Configures quotas to optimize resource allocation for user workloads, as described in the Red Hat build of Kueue documentation.
- Enables Kueue in the dashboard by setting disableKueue to false in the OdhDashboardConfig custom resource, as described in Enabling Kueue in the dashboard.

Note: When Kueue is enabled in the dashboard, OpenShift AI automatically enables Kueue management for all new projects created from the dashboard. For existing projects, or for projects created by using the command-line interface, you must enable Kueue management manually by applying the kueue.openshift.io/managed=true label to the project namespace.
OpenShift AI administrator
Prepares the OpenShift AI environment:
- Creates Kueue-enabled hardware profiles so that users can submit workloads from the OpenShift AI dashboard, as described in Working with hardware profiles.
ML Engineer or data scientist
Submits workloads to the queuing system:
- For workloads created from the OpenShift AI dashboard, such as workbenches and model servers, selects a Kueue-enabled hardware profile during creation.
- For workloads created by using a command-line interface or an SDK, such as distributed training jobs, adds the kueue.x-k8s.io/queue-name label to the workload’s YAML manifest and sets its value to the target LocalQueue name.
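For example, a PyTorchJob submitted with the CLI might carry the label as follows (a minimal sketch; the sample-queue name, namespace, image, and command are placeholders):

```yaml
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-train
  namespace: <project-namespace>
  labels:
    kueue.x-k8s.io/queue-name: sample-queue   # must match a LocalQueue in this namespace
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      template:
        spec:
          containers:
          - name: pytorch
            image: <training-image>
            command: ["python", "train.py"]
```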
8.2. Configuring workload management with Kueue
To use workload queuing in OpenShift AI, install the Red Hat build of Kueue Operator and activate the Kueue integration in OpenShift AI.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You are using OpenShift 4.18 or later.
- You have installed and configured the cert-manager Operator for Red Hat OpenShift for your cluster.
- You have installed the OpenShift command-line interface (CLI). See Installing the OpenShift CLI.
Procedure
In a terminal window, log in to the OpenShift CLI as shown in the following example:

$ oc login <openshift_cluster_url> -u <admin_username> -p <password>

- Install the Red Hat build of Kueue Operator on your OpenShift cluster as described in the Red Hat build of Kueue documentation.
Activate the Kueue integration. You can use the predefined names for the default cluster queue and default local queue, or specify custom names.

To use the predefined queue names (default), run the following command. Replace <operator-namespace> with your operator namespace. The default operator namespace is redhat-ods-operator.

$ oc patch datasciencecluster default-dsc --type='merge' -p '{"spec":{"components":{"kueue":{"managementState":"Unmanaged"}}}}' -n <operator-namespace>

To specify custom queue names, run the following command. Replace <example-cluster-queue> and <example-local-queue> with your custom queue names, and replace <operator-namespace> with your operator namespace. The default operator namespace is redhat-ods-operator.

$ oc patch datasciencecluster default-dsc --type='merge' -p '{"spec":{"components":{"kueue":{"managementState":"Unmanaged","defaultClusterQueueName":"<example-cluster-queue>","defaultLocalQueueName":"<example-local-queue>"}}}}' -n <operator-namespace>
Verification
Verify that the Red Hat build of Kueue pods are running:

$ oc get pods -n openshift-kueue-operator

You should see output similar to the following example:

kueue-controller-manager-d9fc745df-ph77w    1/1   Running
openshift-kueue-operator-69cfbf45cf-lwtpm   1/1   Running

Verify that the default ClusterQueue was created:

$ oc get clusterqueues
Next steps
- Configure quotas by creating and modifying ResourceFlavor, ClusterQueue, and LocalQueue objects. For details, see the Red Hat build of Kueue documentation.
- Enable Kueue in the dashboard so that users can select Kueue-enabled options when creating workloads. When you enable Kueue, you also enable Kueue management for all new projects created from the dashboard. See Enabling Kueue in the dashboard.
- Cluster administrators and OpenShift AI administrators can create hardware profiles so that users can submit workloads from the OpenShift AI dashboard. See Working with hardware profiles.
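As a starting point, a minimal quota configuration consists of one ResourceFlavor, one ClusterQueue, and one LocalQueue, similar to the following sketch (all names, namespaces, and quota values are illustrative; see the Red Hat build of Kueue documentation for the full schema):

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-cluster-queue
spec:
  namespaceSelector: {}            # admit workloads from all Kueue-managed namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 32
      - name: memory
        nominalQuota: 128Gi
      - name: nvidia.com/gpu
        nominalQuota: 4
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-local-queue
  namespace: <project-namespace>
spec:
  clusterQueue: team-cluster-queue
```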
8.2.1. Enabling Kueue in the dashboard
Enable Kueue in the OpenShift AI dashboard so that users can select Kueue-enabled options when creating workloads.
When you enable Kueue in the dashboard, OpenShift AI automatically enables Kueue management for all new projects created from the dashboard. For these projects, OpenShift AI applies the kueue.openshift.io/managed=true label to the namespace and creates a LocalQueue object if one does not already exist. The LocalQueue object is created with the opendatahub.io/managed=false annotation, so it is not managed after creation. Cluster administrators can modify or delete it as needed. A validating webhook then enforces that any new or updated workload resource in a Kueue-enabled project includes the kueue.x-k8s.io/queue-name label.
For existing projects, or for projects created by using the command-line interface, you must enable Kueue management manually by applying the kueue.openshift.io/managed=true label to the project namespace:

$ oc label namespace <project-namespace> kueue.openshift.io/managed=true --overwrite
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You are using OpenShift 4.18 or later.
- You have installed and activated the Red Hat build of Kueue Operator, as described in Configuring workload management with Kueue.
- You have configured quotas, as described in the Red Hat build of Kueue documentation.
Procedure
In a terminal window, log in to the OpenShift CLI as shown in the following example:

$ oc login <openshift_cluster_url> -u <admin_username> -p <password>

Update the odh-dashboard-config custom resource in the OpenShift AI applications namespace. Replace <applications-namespace> with your OpenShift AI applications namespace. The default is redhat-ods-applications.

$ oc patch odhdashboardconfig odh-dashboard-config \
    -n <applications-namespace> \
    --type merge \
    -p '{"spec":{"dashboardConfig":{"disableHardwareProfiles":false,"disableKueue":false}}}'
Verification
- From the OpenShift AI dashboard, create a new project.
Verify that the project namespace is labeled for Kueue management:

$ oc get ns <project-namespace> -o jsonpath='{.metadata.labels.kueue\.openshift\.io/managed}{"\n"}'

The output should be true.

Confirm that a default LocalQueue exists for the project namespace:

$ oc get localqueues -n <project-namespace>

- Create a test workload (for example, a Notebook) and verify that it includes the kueue.x-k8s.io/queue-name label.
Next step
- Cluster administrators and OpenShift AI administrators can create hardware profiles so that users can submit workloads from the OpenShift AI dashboard. See Working with hardware profiles.
8.3. Troubleshooting common problems with Kueue
If your users are experiencing errors in Red Hat OpenShift AI relating to Kueue workloads, read this section to understand what could be causing the problem and how to resolve it.
If the problem is not documented here or in the release notes, contact Red Hat Support.
8.3.1. A user receives a "failed to call webhook" error message for Kueue
Problem
After the user runs the cluster.apply() command, the following error is shown:

ApiException: (500)
Reason: Internal Server Error
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Internal error occurred: failed calling webhook \"mraycluster.kb.io\": failed to call webhook: Post \"https://kueue-webhook-service.redhat-ods-applications.svc:443/mutate-ray-io-v1-raycluster?timeout=10s\": no endpoints available for service \"kueue-webhook-service\"","reason":"InternalError","details":{"causes":[{"message":"failed calling webhook \"mraycluster.kb.io\": failed to call webhook: Post \"https://kueue-webhook-service.redhat-ods-applications.svc:443/mutate-ray-io-v1-raycluster?timeout=10s\": no endpoints available for service \"kueue-webhook-service\""}]},"code":500}
Diagnosis
The Kueue pod might not be running.
Resolution
- In the OpenShift console, select the user’s project from the Project list.
- Click Workloads → Pods.
- Verify that the Kueue pod is running. If necessary, restart the Kueue pod.
- Review the logs for the Kueue pod to verify that the webhook server is serving, as shown in the following example:

  {"level":"info","ts":"2024-06-24T14:36:24.255137871Z","logger":"controller-runtime.webhook","caller":"webhook/server.go:242","msg":"Serving webhook server","host":"","port":9443}
8.3.2. A user receives a "Default Local Queue … not found" error message
Problem
After the user runs the cluster.apply() command, the following error is shown:

Default Local Queue with kueue.x-k8s.io/default-queue: true annotation not found please create a default Local Queue or provide the local_queue name in Cluster Configuration.
Diagnosis
No default local queue is defined, and a local queue is not specified in the cluster configuration.
Resolution
Check whether a local queue exists in the user’s project, as follows:
- In the OpenShift console, select the user’s project from the Project list.
- Click Home → Search, and from the Resources list, select LocalQueue.
- If no local queues are found, create a local queue.
- Provide the user with the details of the local queues in their project, and advise them to add a local queue to their cluster configuration.
- Define a default local queue.
For information about creating a local queue and defining a default local queue, see Configuring quota management for distributed workloads.
8.3.3. A user receives a "local_queue provided does not exist" error message
Problem
After the user runs the cluster.apply() command, the following error is shown:

local_queue provided does not exist or is not in this namespace. Please provide the correct local_queue name in Cluster Configuration.
Diagnosis
An incorrect value is specified for the local queue in the cluster configuration, or an incorrect default local queue is defined. The specified local queue either does not exist, or exists in a different namespace.
Resolution
- In the OpenShift console, select the user’s project from the Project list.
- Click Search, and from the Resources list, select LocalQueue.
Resolve the problem in one of the following ways:
- If no local queues are found, create a local queue.
- If one or more local queues are found, provide the user with the details of the local queues in their project. Advise the user to ensure that they spelled the local queue name correctly in their cluster configuration, and that the namespace value in the cluster configuration matches their project name.
Define a default local queue.
For information about creating a local queue and defining a default local queue, see Configuring quota management for distributed workloads.
8.3.4. The pod provisioned by Kueue is terminated before the image is pulled
Problem
Kueue waits for a period of time for all of the workload pods to become provisioned and running before marking the workload as ready. By default, Kueue waits for 5 minutes. If the pod image is very large and is still being pulled after the 5-minute waiting period elapses, Kueue fails the workload and terminates the related pods.
Diagnosis
- In the OpenShift console, select the user’s project from the Project list.
- Click Workloads → Pods.
Pods. - Click the user’s pod name to open the pod details page.
- Click the Events tab, and review the pod events to check whether the image pull completed successfully.
Resolution
If the pod takes more than 5 minutes to pull the image, resolve the problem in one of the following ways:
- Add an OnFailure restart policy for resources that are managed by Kueue.
- Configure a custom timeout for the waitForPodsReady property in the Kueue custom resource (CR). The CR is installed in the openshift-kueue-operator namespace by the Red Hat build of Kueue Operator.
For more information about this configuration option, see Enabling waitForPodsReady in the Kueue documentation.
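In upstream Kueue, the corresponding configuration fragment looks similar to the following sketch (the 10m timeout is an illustrative value; consult the Red Hat build of Kueue documentation for where this fragment belongs in the Kueue CR):

```yaml
waitForPodsReady:
  enable: true     # gate admission on pods becoming ready
  timeout: 10m     # extend beyond the 5-minute default for large images
```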
8.4. Migrating to the Red Hat build of Kueue Operator
Starting with OpenShift AI 2.24, the embedded Kueue component for managing distributed workloads is deprecated. You must migrate to the Red Hat build of Kueue Operator. You cannot install both the embedded Kueue and the Red Hat build of Kueue Operator on the same cluster because this creates conflicting controllers that manage the same resources.
OpenShift AI does not automatically migrate existing workloads to Red Hat build of Kueue. Cluster administrators must manually migrate from the embedded Kueue to the Red Hat build of Kueue Operator to ensure workloads continue using queue management after upgrading.
Prerequisites
- You have cluster administrator privileges for your OpenShift cluster.
- You are using OpenShift 4.18 or later.
- You have installed and configured the cert-manager Operator for Red Hat OpenShift for your cluster.
- The embedded Kueue component is enabled (that is, the spec.components.kueue.managementState field in the DataScienceCluster object is set to Managed).
Procedure
Optional: When you migrate from the embedded Kueue to Red Hat build of Kueue, the OpenShift AI Operator automatically moves your existing Kueue configuration from the kueue-manager-config ConfigMap to the Kueue custom resource (CR).

To retain the kueue-manager-config ConfigMap, run the following command. Replace <applications-namespace> with your OpenShift AI applications namespace. The default is redhat-ods-applications.

$ oc annotate configmap kueue-manager-config -n <applications-namespace> opendatahub.io/managed=false

- Log in to the OpenShift web console as a cluster administrator.
Optional (recommended): To avoid potential configuration conflicts, uninstall the embedded Kueue component before installing Red Hat build of Kueue:

- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- Click the default-dsc object.
- Click the YAML tab.
- Set spec.components.kueue.managementState to Removed as shown:

  spec:
    components:
      kueue:
        managementState: Removed

- Click Save.
- Wait for the OpenShift AI Operator to reconcile, and then verify that the embedded Kueue was removed:
  - On the Details tab of the default-dsc object, check that the KueueReady condition has a Status of False and a Reason of Removed.
  - Go to Workloads → Deployments, select the project where OpenShift AI is installed (for example, redhat-ods-applications), and confirm that Kueue-related deployments (for example, kueue-controller-manager) are no longer present.
Install the Red Hat build of Kueue Operator on your OpenShift cluster:
- Follow the steps to install the Red Hat build of Kueue Operator, as described in the Red Hat build of Kueue documentation.
- Go to Operators → Installed Operators and confirm that the Red Hat build of Kueue Operator is listed with Status as Succeeded.
Activate the Red Hat build of Kueue Operator in OpenShift AI:
- In the web console, click Operators → Installed Operators and then click the Red Hat OpenShift AI Operator.
- Click the Data Science Cluster tab.
- Click the default-dsc object.
- Click the YAML tab.
- Set spec.components.kueue.managementState to Unmanaged. You can either use the predefined names (default) for the default cluster queue and default local queue, or specify custom names, as shown in the following examples.

  To use the predefined queue names, apply the following configuration:

  spec:
    components:
      kueue:
        managementState: Unmanaged

  To specify custom queue names, apply the following configuration, replacing <example-cluster-queue> and <example-local-queue> with your custom values:

  spec:
    components:
      kueue:
        managementState: Unmanaged
        defaultClusterQueueName: <example-cluster-queue>
        defaultLocalQueueName: <example-local-queue>

- Click Save.
Enable Kueue management for existing projects by applying the kueue.openshift.io/managed=true label to each project namespace. Replace <project-namespace> with the name of your project:

$ oc label namespace <project-namespace> kueue.openshift.io/managed=true --overwrite

Note: Kueue validation and queue enforcement apply only to workloads in namespaces with the kueue.openshift.io/managed=true label.
Verification
- Verify that the embedded Kueue is removed.
- Verify that the DataScienceCluster resource shows a healthy Unmanaged status for Kueue.
- Verify that existing workloads in the queue continue to be processed by the new operator-managed Kueue controllers. Submit a new test workload to confirm functionality.
Next steps
- Configure quotas by creating and modifying ResourceFlavor, ClusterQueue, and LocalQueue objects. For details, see the Red Hat build of Kueue documentation.
- Enable Kueue in the dashboard so that users can select Kueue-enabled options when creating workloads. When you enable Kueue, you also enable Kueue management for all new projects created from the dashboard. See Enabling Kueue in the dashboard.
- Cluster administrators and OpenShift AI administrators can create hardware profiles so that users can submit workloads from the OpenShift AI dashboard. See Working with hardware profiles.