Chapter 2. Red Hat build of Kueue
2.1. Introduction to Red Hat build of Kueue
Red Hat build of Kueue is a Kubernetes-native system that manages access to resources for jobs. Red Hat build of Kueue can determine when a job waits, is admitted to start by creating pods, or should be preempted, meaning that active pods for that job are deleted.
In the context of Red Hat build of Kueue, a job can be defined as a one-time or on-demand task that runs to completion.
Red Hat build of Kueue is based on the Kueue open source project.
Red Hat build of Kueue is compatible with environments that use heterogeneous, elastic resources. This means that the environment has many different resource types, and those resources are capable of dynamic scaling.
Red Hat build of Kueue does not replace any existing components in a Kubernetes cluster, but instead integrates with the existing Kubernetes API server, scheduler, and cluster autoscaler components.
Red Hat build of Kueue supports all-or-nothing semantics. This means that either an entire job with all of its components is admitted to the cluster, or the entire job is rejected if it does not fit on the cluster.
2.1.1. Personas
Different personas exist in a Red Hat build of Kueue workflow.
- Batch administrators
- Batch administrators manage the cluster infrastructure and establish quotas and queues.
- Batch users
- Batch users run jobs on the cluster. Examples of batch users might be researchers, AI/ML engineers, or data scientists.
- Serving users
- Serving users run jobs on the cluster. For example, a serving user might expose a trained AI/ML model for inference.
- Platform developers
- Platform developers integrate Red Hat build of Kueue with other software. They might also contribute to the Kueue open source project.
2.1.2. Workflow overview
The Red Hat build of Kueue workflow can be described at a high level as follows:
- Batch administrators create and configure ResourceFlavor, LocalQueue, and ClusterQueue resources.
- User personas create jobs on the cluster.
- The Kubernetes API server validates and accepts job data.
- Red Hat build of Kueue admits jobs based on configured options, such as order or quota. It injects affinity into the job by using resource flavors, and creates a Workload object that corresponds to each job.
- The applicable controller for the job type creates pods.
- The Kubernetes scheduler assigns pods to a node in the cluster.
- The Kubernetes cluster autoscaler provisions more nodes as required.
2.2. Release notes
Red Hat build of Kueue is released as an Operator that is supported on OpenShift Container Platform.
2.2.1. Compatible environments
Before you install Red Hat build of Kueue, review this section to ensure that your cluster meets the requirements.
2.2.1.1. Supported architectures
Red Hat build of Kueue version 1.1 and later is supported on the following architectures:
- ARM64
- 64-bit x86
- ppc64le (IBM Power®)
- s390x (IBM Z®)
2.2.1.2. Supported platforms
Red Hat build of Kueue version 1.1 and later is supported on the following platforms:
- OpenShift Container Platform
- Hosted control planes for OpenShift Container Platform
Currently, Red Hat build of Kueue is not supported on Red Hat build of MicroShift (MicroShift).
2.2.2. Release notes for Red Hat build of Kueue version 1.1
Red Hat build of Kueue version 1.1 is a generally available release that is supported on OpenShift Container Platform versions 4.18 and later. Red Hat build of Kueue version 1.1 uses Kueue version 0.12.
If you have a previously installed version of Red Hat build of Kueue on your cluster, you must uninstall the Operator and manually install version 1.1. For more information, see Upgrading Red Hat build of Kueue.
2.2.2.1. New features and enhancements
- Configure a default local queue
A default local queue serves as the local queue for newly created jobs that do not have the kueue.x-k8s.io/queue-name label. After you create a default local queue, any new jobs created in the namespace without a kueue.x-k8s.io/queue-name label are automatically updated to have the kueue.x-k8s.io/queue-name: default label. (RFE-7615)
- Multi-architecture and Hosted control planes support
With this release, Red Hat build of Kueue is supported on multiple different architectures, including ARM64, 64-bit x86, ppc64le (IBM Power®), and s390x (IBM Z®), as well as on Hosted control planes for OpenShift Container Platform.
2.2.2.2. Fixed issues
- You can create a Kueue custom resource by using the OpenShift Container Platform web console
Before this update, if you tried to use the OpenShift Container Platform web console to create a Kueue custom resource (CR) by using the form view, the web console showed an error and the resource could not be created. With this release, the default namespace was removed from the Kueue CR template. As a result, you can use the OpenShift Container Platform web console to create a Kueue CR by using the form view.
2.2.2.3. Known issues
- Kueue CR description reads as "Not available" in the OpenShift Container Platform web console
After you install Red Hat build of Kueue, in the Operator details view, the description for the Kueue CR reads as "Not available". This issue does not affect or degrade the Red Hat build of Kueue Operator functionality.
- Custom resources are not deleted properly when you uninstall Red Hat build of Kueue
After you uninstall the Red Hat Build of Kueue Operator by using the Delete all operand instances for this operator option in the OpenShift Container Platform web console, some Red Hat build of Kueue custom resources are not fully deleted. These resources can be viewed in the Installed Operators view with the status Resource is being deleted. As a workaround, you can manually delete the resource finalizers to remove them fully.
2.2.3. Release notes for Red Hat build of Kueue version 1.0.1
Red Hat build of Kueue version 1.0.1 is a patch release that is supported on OpenShift Container Platform versions 4.18 and 4.19 on the 64-bit x86 architecture.
Red Hat build of Kueue version 1.0.1 uses Kueue version 0.11.
2.2.3.1. Bug fixes in Red Hat build of Kueue version 1.0.1
- Previously, leader election for Red Hat build of Kueue was not configured to tolerate disruption, which resulted in frequent crashing. With this release, the leader election values for Red Hat build of Kueue have been updated to match the durations recommended for OpenShift Container Platform. (OCPBUGS-58496)
- Previously, the ReadyReplicas count was not set in the reconciler, which meant that the Red Hat build of Kueue Operator status would report that there were no replicas ready. With this release, the ReadyReplicas count is based on the number of ready replicas for the deployment, which ensures that the Operator shows as ready in the OpenShift Container Platform console when the kueue-controller-manager pods are ready. (OCPBUGS-59261)
- Previously, when the Kueue custom resource (CR) was deleted from the openshift-kueue-operator namespace, the kueue-manager-config config map was not deleted automatically and could remain in the namespace. With this release, the kueue-manager-config config map, kueue-webhook-server-cert secret, and metrics-server-cert secret are deleted automatically when the Kueue CR is deleted. (OCPBUGS-57960)
2.2.4. Release notes for Red Hat build of Kueue version 1.0
Red Hat build of Kueue version 1.0 is a generally available release that is supported on OpenShift Container Platform versions 4.18 and 4.19 on the 64-bit x86 architecture. Red Hat build of Kueue version 1.0 uses Kueue version 0.11.
2.2.4.1. New features and enhancements
- Role-based access control (RBAC)
- Role-based access control (RBAC) enables you to control which types of users can create which types of Red Hat build of Kueue resources.
- Configure resource quotas
- Configuring resource quotas by creating cluster queues, resource flavors, and local queues enables you to control the amount of resources used by user-submitted jobs and workloads.
- Control job and workload management
- Labeling namespaces and configuring label policies enable you to control which jobs and workloads are managed by Red Hat build of Kueue.
- Share borrowable resources between queues
- Configuring cohorts, fair sharing, and gang scheduling settings enables you to share unused, borrowable resources between queues.
2.2.4.2. Known issues
- Jobs in all namespaces are reconciled if they have the kueue.x-k8s.io/queue-name label
Red Hat build of Kueue uses the managedJobsNamespaceSelector configuration field, so that administrators can configure which namespaces opt in to be managed by Red Hat build of Kueue. Because namespaces must be manually configured to opt in to being managed by Red Hat build of Kueue, resources in system or third-party namespaces are not impacted or managed by Red Hat build of Kueue.
The behavior in Red Hat build of Kueue 1.0 allows reconciliation of Job resources that have the kueue.x-k8s.io/queue-name label, even if these resources are in namespaces that are not configured to opt in to being managed by Red Hat build of Kueue. This is inconsistent with the behavior for other core integrations like pods, deployments, and stateful sets, which are only reconciled if they are in namespaces that have been configured to opt in to being managed by Red Hat build of Kueue.
- You cannot create a Kueue custom resource by using the OpenShift Container Platform web console
If you try to use the OpenShift Container Platform web console to create a Kueue custom resource (CR) by using the form view, the web console shows an error and the resource cannot be created. As a workaround, use the YAML view to create a Kueue CR instead.
2.3. Installing Red Hat build of Kueue
You can install Red Hat build of Kueue by using the Red Hat Build of Kueue Operator in OperatorHub.
2.3.1. Compatible environments
Before you install Red Hat build of Kueue, review this section to ensure that your cluster meets the requirements.
2.3.1.1. Supported architectures
Red Hat build of Kueue version 1.1 and later is supported on the following architectures:
- ARM64
- 64-bit x86
- ppc64le (IBM Power®)
- s390x (IBM Z®)
2.3.1.2. Supported platforms
Red Hat build of Kueue version 1.1 and later is supported on the following platforms:
- OpenShift Container Platform
- Hosted control planes for OpenShift Container Platform
Currently, Red Hat build of Kueue is not supported on Red Hat build of MicroShift (MicroShift).
2.3.2. Installing the Red Hat Build of Kueue Operator
You can install the Red Hat Build of Kueue Operator on an OpenShift Container Platform cluster by using the OperatorHub in the web console.
Prerequisites
- You have administrator permissions on an OpenShift Container Platform cluster.
- You have access to the OpenShift Container Platform web console.
- You have installed and configured the cert-manager Operator for Red Hat OpenShift for your cluster.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.
Verification
- Go to Operators → Installed Operators and confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.
2.3.3. Upgrading Red Hat build of Kueue
If you have previously installed Red Hat build of Kueue, you must manually upgrade your deployment to the latest version to use the latest bug fixes and feature enhancements.
Prerequisites
- You have installed a previous version of Red Hat build of Kueue.
- You are logged in to the OpenShift Container Platform web console with cluster administrator permissions.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators, then select Red Hat build of Kueue from the list.
- From the Actions drop-down menu, select Uninstall Operator.
- The Uninstall Operator? dialog box opens. Click Uninstall.
  Important: Selecting the Delete all operand instances for this operator checkbox before clicking Uninstall deletes all existing resources from the cluster, including:
  - The Kueue CR
  - Any cluster queues, local queues, or resource flavors that you have created
  Leave this box unchecked when upgrading your cluster to retain your created resources.
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.
Verification
- Go to Operators → Installed Operators.
- Confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.
- Confirm that the version shown under the Operator name in the list is the latest version.
2.3.4. Creating a Kueue custom resource
After you have installed the Red Hat Build of Kueue Operator, you must create a Kueue custom resource (CR) to configure your installation.
Prerequisites
Ensure that you have completed the following prerequisites:
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions and the kueue-batch-admin-role role.
- You have access to the OpenShift Container Platform web console.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators.
- In the Provided APIs table column, click Kueue. This takes you to the Kueue tab of the Operator details page.
- Click Create Kueue. This takes you to the Create Kueue YAML view.
- Enter the details for your Kueue CR, taking the following points into account:
  - The name of the Kueue CR must be cluster.
  - If you want to configure Red Hat build of Kueue for use with other workload types, add those types to the list of workload types in the CR. For the default configuration, only the BatchJob type is recommended and supported.
  - Optional: If you want to configure fair sharing for Red Hat build of Kueue, set the preemptionPolicy value to FairSharing. The default setting in the Kueue CR is Classical preemption.
- Click Create.
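As an illustration of the points in the procedure above, the following is a minimal sketch of a Kueue CR, not a definitive template. The apiVersion value and the layout of the spec fields (managementState, config.integrations.frameworks, and config.preemption.preemptionPolicy) are assumptions; check them against the CRD installed on your cluster, for example with oc explain kueue, before you use them.

```yaml
# Sketch only: the apiVersion and spec layout are assumptions.
# Verify them against the Kueue CRD installed on your cluster.
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster                    # The name of the Kueue CR must be cluster.
spec:
  managementState: Managed
  config:
    integrations:
      frameworks:
      - BatchJob                   # Add other workload types here if you need them.
    preemption:
      preemptionPolicy: Classical  # Set to FairSharing to configure fair sharing.
```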
Verification
- After you create the Kueue CR, the web console brings you to the Operator details page, where you can see the CR in the list of Kueues.
- Optional: If you have the OpenShift CLI (oc) installed, you can run the following command and observe the output to confirm that your Kueue CR has been created successfully:
$ oc get kueue
Example output
NAME      AGE
cluster   4m
2.3.5. Labeling namespaces to allow Red Hat build of Kueue to manage jobs
The Red Hat build of Kueue Operator uses an opt-in webhook mechanism to ensure that policies are only enforced for the jobs and namespaces that it is expected to target.
You must label the namespaces where you want Red Hat build of Kueue to manage jobs with the kueue.openshift.io/managed=true label.
Prerequisites
- You have cluster administrator permissions.
- The Red Hat build of Kueue Operator is installed on your cluster, and you have created a Kueue custom resource (CR).
- You have installed the OpenShift CLI (oc).
Procedure
- Add the kueue.openshift.io/managed=true label to a namespace by running the following command:
$ oc label namespace <namespace> kueue.openshift.io/managed=true
When you add this label, you instruct the Red Hat build of Kueue Operator that the namespace is managed by its webhook admission controllers. As a result, any Red Hat build of Kueue resources within that namespace are properly validated and mutated.
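If you manage namespaces declaratively, the same label can be set in the namespace manifest instead of with oc label. The following is a minimal sketch; the namespace name team-a is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a   # hypothetical namespace name
  labels:
    # Label values must be strings, so the value true is quoted.
    kueue.openshift.io/managed: "true"
```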
2.4. Installing Red Hat build of Kueue in a disconnected environment
Before you can install Red Hat build of Kueue on a disconnected OpenShift Container Platform cluster, you must enable Operator Lifecycle Manager (OLM) in disconnected environments by completing the following steps:
- Disable the default remote OperatorHub sources for OLM.
- Use a workstation with full internet access to create and push local mirrors of the OperatorHub content to a mirror registry.
- Configure OLM to install and manage Operators from local sources on the mirror registry instead of the default remote sources.
After enabling OLM in a disconnected environment, you can continue to use your unrestricted workstation to keep your local OperatorHub sources updated as newer versions of Operators are released.
For full documentation on completing these steps, see the OpenShift Container Platform documentation on Using Operator Lifecycle Manager in disconnected environments.
2.4.1. Compatible environments
Before you install Red Hat build of Kueue, review this section to ensure that your cluster meets the requirements.
2.4.1.1. Supported architectures
Red Hat build of Kueue version 1.1 and later is supported on the following architectures:
- ARM64
- 64-bit x86
- ppc64le (IBM Power®)
- s390x (IBM Z®)
2.4.1.2. Supported platforms
Red Hat build of Kueue version 1.1 and later is supported on the following platforms:
- OpenShift Container Platform
- Hosted control planes for OpenShift Container Platform
Currently, Red Hat build of Kueue is not supported on Red Hat build of MicroShift (MicroShift).
2.4.2. Installing the Red Hat Build of Kueue Operator
You can install the Red Hat Build of Kueue Operator on an OpenShift Container Platform cluster by using the OperatorHub in the web console.
Prerequisites
- You have administrator permissions on an OpenShift Container Platform cluster.
- You have access to the OpenShift Container Platform web console.
- You have installed and configured the cert-manager Operator for Red Hat OpenShift for your cluster.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.
Verification
- Go to Operators → Installed Operators and confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.
2.4.3. Upgrading Red Hat build of Kueue
If you have previously installed Red Hat build of Kueue, you must manually upgrade your deployment to the latest version to use the latest bug fixes and feature enhancements.
Prerequisites
- You have installed a previous version of Red Hat build of Kueue.
- You are logged in to the OpenShift Container Platform web console with cluster administrator permissions.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators, then select Red Hat build of Kueue from the list.
- From the Actions drop-down menu, select Uninstall Operator.
- The Uninstall Operator? dialog box opens. Click Uninstall.
  Important: Selecting the Delete all operand instances for this operator checkbox before clicking Uninstall deletes all existing resources from the cluster, including:
  - The Kueue CR
  - Any cluster queues, local queues, or resource flavors that you have created
  Leave this box unchecked when upgrading your cluster to retain your created resources.
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.
Verification
- Go to Operators → Installed Operators.
- Confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.
- Confirm that the version shown under the Operator name in the list is the latest version.
2.4.4. Creating a Kueue custom resource
After you have installed the Red Hat Build of Kueue Operator, you must create a Kueue custom resource (CR) to configure your installation.
Prerequisites
Ensure that you have completed the following prerequisites:
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions and the kueue-batch-admin-role role.
- You have access to the OpenShift Container Platform web console.
Procedure
- In the OpenShift Container Platform web console, click Operators → Installed Operators.
- In the Provided APIs table column, click Kueue. This takes you to the Kueue tab of the Operator details page.
- Click Create Kueue. This takes you to the Create Kueue YAML view.
- Enter the details for your Kueue CR, taking the following points into account:
  - The name of the Kueue CR must be cluster.
  - If you want to configure Red Hat build of Kueue for use with other workload types, add those types to the list of workload types in the CR. For the default configuration, only the BatchJob type is recommended and supported.
  - Optional: If you want to configure fair sharing for Red Hat build of Kueue, set the preemptionPolicy value to FairSharing. The default setting in the Kueue CR is Classical preemption.
- Click Create.
Verification
- After you create the Kueue CR, the web console brings you to the Operator details page, where you can see the CR in the list of Kueues.
- Optional: If you have the OpenShift CLI (oc) installed, you can run the following command and observe the output to confirm that your Kueue CR has been created successfully:
$ oc get kueue
Example output
NAME      AGE
cluster   4m
2.4.5. Labeling namespaces to allow Red Hat build of Kueue to manage jobs
The Red Hat build of Kueue Operator uses an opt-in webhook mechanism to ensure that policies are only enforced for the jobs and namespaces that it is expected to target.
You must label the namespaces where you want Red Hat build of Kueue to manage jobs with the kueue.openshift.io/managed=true label.
Prerequisites
- You have cluster administrator permissions.
- The Red Hat build of Kueue Operator is installed on your cluster, and you have created a Kueue custom resource (CR).
- You have installed the OpenShift CLI (oc).
Procedure
- Add the kueue.openshift.io/managed=true label to a namespace by running the following command:
$ oc label namespace <namespace> kueue.openshift.io/managed=true
When you add this label, you instruct the Red Hat build of Kueue Operator that the namespace is managed by its webhook admission controllers. As a result, any Red Hat build of Kueue resources within that namespace are properly validated and mutated.
2.5. Configuring role-based permissions
The following procedures provide information about how you can configure role-based access control (RBAC) for your Red Hat build of Kueue deployment. These RBAC permissions determine which types of users can create which types of Red Hat build of Kueue objects.
2.5.1. Cluster roles
The Red Hat build of Kueue Operator deploys the kueue-batch-admin-role and kueue-batch-user-role cluster roles by default.
- kueue-batch-admin-role
- This cluster role includes the permissions to manage cluster queues, local queues, workloads, and resource flavors.
- kueue-batch-user-role
- This cluster role includes the permissions to manage jobs and to view local queues and workloads.
2.5.2. Configuring permissions for batch administrators
You can configure permissions for batch administrators by binding the kueue-batch-admin-role cluster role to a user or group of users.
Prerequisites
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions.
- You have installed the OpenShift CLI (oc).
Procedure
- Create a ClusterRoleBinding object as a YAML file.
- Apply the ClusterRoleBinding object:
$ oc apply -f <filename>.yaml
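A ClusterRoleBinding object for this procedure might look like the following minimal sketch. The binding name kueue-admins and the user admin@example.com are hypothetical placeholders; only the kueue-batch-admin-role cluster role name comes from this document.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kueue-admins               # hypothetical binding name
subjects:
- kind: User                       # use kind: Group to bind a group of users instead
  name: admin@example.com          # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kueue-batch-admin-role     # cluster role deployed by the Operator
  apiGroup: rbac.authorization.k8s.io
```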
Verification
- You can verify that the ClusterRoleBinding object was applied correctly by running the following command and verifying that the output contains the correct information for the kueue-batch-admin-role cluster role:
$ oc describe clusterrolebinding.rbac
2.5.3. Configuring permissions for users
You can configure permissions for Red Hat build of Kueue users by binding the kueue-batch-user-role cluster role to a user or group of users.
Prerequisites
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions.
- You have installed the OpenShift CLI (oc).
Procedure
- Create a RoleBinding object as a YAML file.
- Apply the RoleBinding object:
$ oc apply -f <filename>.yaml
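A RoleBinding object for this procedure might look like the following minimal sketch. The binding name kueue-users, the namespace team-a, and the group team-a@example.com are hypothetical placeholders. A RoleBinding is namespaced, so it grants the kueue-batch-user-role permissions only within the namespace named in its metadata.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kueue-users                # hypothetical binding name
  namespace: team-a                # hypothetical namespace where users run jobs
subjects:
- kind: Group                      # use kind: User to bind a single user instead
  name: team-a@example.com         # hypothetical group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: kueue-batch-user-role      # cluster role deployed by the Operator
  apiGroup: rbac.authorization.k8s.io
```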
Verification
- You can verify that the RoleBinding object was applied correctly by running the following command and verifying that the output contains the correct information for the kueue-batch-user-role cluster role:
$ oc describe rolebinding.rbac
2.6. Configuring quotas
As an administrator, you can use Red Hat build of Kueue to configure quotas to optimize resource allocation and system throughput for user workloads. You can configure quotas for compute resources such as CPU, memory, pods, and GPU.
You can configure quotas in Red Hat build of Kueue by completing the following steps:
- Configure a cluster queue.
- Configure a resource flavor.
- Configure a local queue.
Users can then submit their workloads to the local queue.
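To illustrate what submitting a workload to a local queue can look like, the following is a minimal sketch of a Kubernetes Job that carries the kueue.x-k8s.io/queue-name label. The namespace team-a, the queue name user-queue, the container image, and the resource requests are hypothetical placeholders chosen for illustration.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: team-a                          # hypothetical namespace labeled for management
  labels:
    kueue.x-k8s.io/queue-name: user-queue    # hypothetical local queue name
spec:
  parallelism: 3
  completions: 3
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sample
        image: registry.access.redhat.com/ubi9/ubi-minimal:latest   # assumed image
        command: ["sleep", "30"]
        resources:
          requests:                          # requests are counted against the queue quota
            cpu: "1"
            memory: 200Mi
```

Because the manifest uses generateName, it would be created with oc create -f <filename>.yaml rather than oc apply.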
2.6.1. Configuring a cluster queue
A cluster queue is a cluster-scoped resource, represented by a ClusterQueue object, that governs a pool of resources such as CPU, memory, and pods. Cluster queues can be used to define usage limits, quotas for resource flavors, order of consumption, and fair sharing rules.
The cluster queue is not ready for use until a ResourceFlavor object has also been configured.
Prerequisites
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have installed the OpenShift CLI (oc).
Procedure
- Create a ClusterQueue object as a YAML file. In the example of a basic ClusterQueue object that uses a single resource flavor, the object defines the following:
  - Which namespaces can use the resources governed by this cluster queue. An empty namespaceSelector, as in the example, means that all namespaces can use these resources.
  - The resource types governed by the cluster queue. The example ClusterQueue object governs CPU, memory, pod, and GPU resources.
  - The resource flavor that is applied to the resource types listed. In the example, the default-flavor resource flavor is applied to CPU, memory, pod, and GPU resources.
  - The resource requirements for admitting jobs. The example cluster queue only admits jobs if the following conditions are met:
    - The sum of the CPU requests is less than or equal to 9.
    - The sum of the memory requests is less than or equal to 36Gi.
    - The total number of pods is less than or equal to 5.
    - The sum of the GPU requests is less than or equal to 100.
- Apply the ClusterQueue object by running the following command:
$ oc apply -f <filename>.yaml
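A basic ClusterQueue object that matches the quotas described in the procedure above could be sketched as follows. The object name cluster-queue and the GPU resource name nvidia.com/gpu are assumptions for illustration; substitute the extended resource name that your GPU device plugin actually exposes.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue              # hypothetical name
spec:
  namespaceSelector: {}            # empty selector: all namespaces can use these resources
  resourceGroups:
  - coveredResources: ["cpu", "memory", "pods", "nvidia.com/gpu"]   # GPU resource name is an assumption
    flavors:
    - name: default-flavor         # resource flavor applied to the covered resources
      resources:
      - name: cpu
        nominalQuota: 9            # sum of CPU requests must be <= 9
      - name: memory
        nominalQuota: 36Gi         # sum of memory requests must be <= 36Gi
      - name: pods
        nominalQuota: 5            # total number of pods must be <= 5
      - name: nvidia.com/gpu
        nominalQuota: 100          # sum of GPU requests must be <= 100
```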
2.6.2. Configuring a resource flavor
After you have configured a ClusterQueue object, you can configure a ResourceFlavor object.
Resources in a cluster are typically not homogeneous. If the resources in your cluster are homogeneous, you can use an empty ResourceFlavor instead of adding labels to custom resource flavors.
You can use a custom ResourceFlavor object to represent different resource variations that are associated with cluster nodes through labels, taints, and tolerations. You can then associate workloads with specific node types to enable fine-grained resource management.
Prerequisites
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have installed the OpenShift CLI (oc).
Procedure
- Create a ResourceFlavor object as a YAML file:
Example of an empty ResourceFlavor object
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
- Apply the ResourceFlavor object by running the following command:
$ oc apply -f <filename>.yaml
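For clusters with heterogeneous nodes, a custom ResourceFlavor object could be sketched as follows. The flavor name x86 and the node label cpu-arch: x86 are hypothetical; use a label that is actually present on the nodes you want this flavor to represent.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: x86                # hypothetical flavor name
spec:
  nodeLabels:
    cpu-arch: x86          # hypothetical node label that selects matching nodes
```

A cluster queue would then reference this flavor by name in its list of flavors, in the same way that the empty default-flavor is referenced.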
2.6.3. Configuring a local queue
A local queue is a namespaced object, represented by a LocalQueue object, that groups closely related workloads that belong to a single namespace.
As an administrator, you can configure a LocalQueue object to point to a cluster queue. This allocates resources from the cluster queue to workloads in the namespace specified in the LocalQueue object.
Prerequisites
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have installed the OpenShift CLI (oc).
- You have created a ClusterQueue object.
Procedure
- Create a LocalQueue object as a YAML file.
- Apply the LocalQueue object by running the following command:
$ oc apply -f <filename>.yaml
2.6.4. Configuring a default local queue
As a cluster administrator, you can improve quota enforcement in your cluster by managing all jobs in selected namespaces without needing to explicitly label each job. You can do this by creating a default local queue.
A default local queue serves as the local queue for newly created jobs that do not have the kueue.x-k8s.io/queue-name label. After you create a default local queue, any new jobs created in the namespace without a kueue.x-k8s.io/queue-name label are automatically updated to have the kueue.x-k8s.io/queue-name: default label.
Preexisting jobs in a namespace are not affected when you create a default local queue. If jobs already exist in the namespace before you create the default local queue, you must label those jobs explicitly to assign them to a queue.
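A default local queue is an ordinary LocalQueue object whose name is default. In the following sketch, the namespace team-namespace and the cluster queue name cluster-queue are hypothetical placeholders.

```yaml
# The object must be named "default" to act as the default local queue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-namespace     # hypothetical namespace
  name: default
spec:
  clusterQueue: cluster-queue   # hypothetical existing ClusterQueue
```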
Prerequisites
- You have installed Red Hat build of Kueue version 1.1 on your cluster.
- You have cluster administrator permissions or the kueue-batch-admin-role role.
- You have installed the OpenShift CLI (oc).
- You have created a ClusterQueue object.
Procedure
- Create a LocalQueue object named default as a YAML file:
  Example of a default LocalQueue object
- Apply the LocalQueue object by running the following command:
    $ oc apply -f <filename>.yaml
Verification
- Create a job in the same namespace as the default local queue.
- Observe that the job is updated with the kueue.x-k8s.io/queue-name: default label.
2.7. Managing jobs and workloads
Red Hat build of Kueue does not directly manipulate jobs that are created by users. Instead, Kueue manages Workload objects that represent the resource requirements of a job. Red Hat build of Kueue automatically creates a workload for each job, and syncs any decisions and statuses between the two objects.
2.7.1. Labeling namespaces to allow Red Hat build of Kueue to manage jobs
The Red Hat build of Kueue Operator uses an opt-in webhook mechanism to ensure that policies are only enforced for the jobs and namespaces that it is expected to target.
You must label the namespaces where you want Red Hat build of Kueue to manage jobs with the kueue.openshift.io/managed=true label.
Prerequisites
- You have cluster administrator permissions.
- The Red Hat build of Kueue Operator is installed on your cluster, and you have created a Kueue custom resource (CR).
- You have installed the OpenShift CLI (oc).
Procedure
- Add the kueue.openshift.io/managed=true label to a namespace by running the following command:
    $ oc label namespace <namespace> kueue.openshift.io/managed=true
When you add this label, you instruct the Red Hat build of Kueue Operator that the namespace is managed by its webhook admission controllers. As a result, any Red Hat build of Kueue resources within that namespace are properly validated and mutated.
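If you manage namespaces declaratively, the same label can be set in the Namespace manifest instead of with oc label. The following is a minimal sketch that assumes a hypothetical namespace named team-namespace.

```yaml
# Declarative equivalent of the oc label command above.
apiVersion: v1
kind: Namespace
metadata:
  name: team-namespace                   # hypothetical namespace name
  labels:
    kueue.openshift.io/managed: "true"   # label values are strings, so quote true
```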
2.7.2. Configuring label policies for jobs
The spec.config.workloadManagement.labelPolicy field in the Kueue custom resource (CR) is optional and controls how Red Hat build of Kueue decides whether to manage or ignore different jobs. The allowed values are QueueName, None, and empty ("").
If the labelPolicy setting is omitted or empty (""), the default policy is that Red Hat build of Kueue manages jobs that have a kueue.x-k8s.io/queue-name label, and ignores jobs that do not have the kueue.x-k8s.io/queue-name label. This is the same behavior as when labelPolicy is set to QueueName.
If the labelPolicy setting is set to None, jobs are managed by Red Hat build of Kueue even if they do not have the kueue.x-k8s.io/queue-name label.
Example workloadManagement spec configuration
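The following sketch shows where the labelPolicy field sits, based on the spec.config.workloadManagement.labelPolicy path given above. The apiVersion, the object name cluster, the openshift-kueue-operator namespace, and the managementState and integrations fields are assumptions about the Operator's CR layout, so verify them against the CRD installed on your cluster.

```yaml
# Sketch only: everything except the workloadManagement path is an assumption.
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  namespace: openshift-kueue-operator
spec:
  managementState: Managed
  config:
    integrations:
      frameworks:
      - BatchJob
    workloadManagement:
      labelPolicy: QueueName   # allowed values: QueueName, None, or ""
```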
Example user-created Job object containing the kueue.x-k8s.io/queue-name label
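A job that opts in to management under the QueueName policy carries the label in its metadata. In this minimal sketch, the job name, the container image, and the local queue name user-queue are hypothetical placeholders.

```yaml
# Hypothetical job: only the label key is taken from this document.
apiVersion: batch/v1
kind: Job
metadata:
  name: labeled-job
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # local queue that manages this job
spec:
  template:
    spec:
      containers:
      - name: main
        image: registry.access.redhat.com/ubi9/ubi-minimal
        command: ["sleep", "30"]
      restartPolicy: Never
```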
2.8. Using cohorts
You can use cohorts to group cluster queues and determine which cluster queues are able to share borrowable resources with each other. Borrowable resources are defined as the unused nominal quota of all the cluster queues in a cohort.
Using cohorts can help to optimize resource utilization by preventing under-utilization and enabling fair sharing configurations. Cohorts can also help to simplify resource management and allocation between teams, because you can group cluster queues for related workloads or for each team. You can also use cohorts to set resource quotas at a group level to define the limits for resources that a group of cluster queues can consume.
2.8.1. Configuring cohorts within a cluster queue spec
You can add a cluster queue to a cohort by specifying the name of the cohort in the .spec.cohort field of the ClusterQueue object, as shown in the following example:
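The sketch below is a hypothetical ClusterQueue that joins a cohort. The cohort name example-cohort, the flavor name default-flavor, and the quota values are placeholders. The resourceGroups layout follows the upstream Kueue v1beta1 API.

```yaml
# Hypothetical cluster queue that joins the cohort "example-cohort".
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  cohort: example-cohort    # queues with the same value share borrowable quota
  namespaceSelector: {}     # an empty selector matches all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor  # must name an existing ResourceFlavor
      resources:
      - name: cpu
        nominalQuota: 9
      - name: memory
        nominalQuota: 36Gi
```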
All cluster queues that have a matching spec.cohort are part of the same cohort.
If the spec.cohort field is omitted, the cluster queue does not belong to any cohort and cannot access borrowable resources.
2.9. Configuring fair sharing
Fair sharing is a preemption strategy that is used to achieve an equal or weighted share of borrowable resources between the tenants of a cohort. Borrowable resources are the unused nominal quota of all the cluster queues in a cohort.
You can configure fair sharing by setting the preemptionPolicy value in the Kueue custom resource (CR) to FairSharing.
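As a rough sketch, the setting might look like the following. The nesting of preemptionPolicy under spec.config.preemption, along with the apiVersion, object name, namespace, and the managementState and integrations fields, is an assumption about the Operator's CR layout and should be checked against the CRD on your cluster.

```yaml
# Sketch only: the position of preemptionPolicy in the spec is an assumption.
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  namespace: openshift-kueue-operator
spec:
  managementState: Managed
  config:
    integrations:
      frameworks:
      - BatchJob
    preemption:
      preemptionPolicy: FairSharing
```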
2.10. Gang scheduling
Gang scheduling ensures that a group, or gang, of related jobs starts only when all required resources are available. Red Hat build of Kueue enables gang scheduling by suspending jobs until the OpenShift Container Platform cluster can guarantee the capacity to start and execute all of the related jobs in the gang together. This is also known as all-or-nothing scheduling.
Gang scheduling is important if you are working with expensive, limited resources, such as GPUs. Gang scheduling can prevent jobs from claiming but not using GPUs, which can improve GPU utilization and can reduce running costs. Gang scheduling can also help to prevent issues like resource segmentation and deadlocking.
2.10.1. Configuring gang scheduling
As a cluster administrator, you can configure gang scheduling by modifying the gangScheduling spec in the Kueue custom resource (CR).
Example Kueue CR with gang scheduling configured
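The following is a sketch of such a CR. The gangScheduling, policy, and admission names come from this section. The byWorkload wrapper around admission, the apiVersion, the object name and namespace, and the managementState and integrations fields are assumptions to verify against the installed CRD. The comments mark the fields that callouts 1 and 2 describe.

```yaml
# Sketch only: field nesting is an assumption based on the callout text.
apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  namespace: openshift-kueue-operator
spec:
  managementState: Managed
  config:
    integrations:
      frameworks:
      - BatchJob
    gangScheduling:
      policy: ByWorkload      # callout 1: ByWorkload, None, or ""
      byWorkload:
        admission: Parallel   # callout 2: Parallel, Sequential, or ""
```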
- 1
- You can set the policy value to enable or disable gang scheduling. The possible values are ByWorkload, None, or empty ("").
  - ByWorkload
  - When the policy value is set to ByWorkload, each job is processed and considered for admission as a single unit. If the job does not become ready within the specified time, the entire job is evicted and retried at a later time.
  - None
  - When the policy value is set to None, gang scheduling is disabled.
  - Empty ("")
  - When the policy value is empty or set to "", the Red Hat build of Kueue Operator determines settings for gang scheduling. Currently, gang scheduling is disabled by default.
- 2
- If the policy value is set to ByWorkload, you must configure job admission settings. The possible values for the admission spec are Parallel, Sequential, or empty ("").
  - Parallel
  - When the admission value is set to Parallel, pods from any job can be admitted at any time. This can cause a deadlock, where jobs are in contention for cluster capacity. When a deadlock occurs, the successful scheduling of pods from another job can prevent the scheduling of pods from the current job.
  - Sequential
  - When the admission value is set to Sequential, only pods from the currently processing job are admitted. After all of the pods from the current job have been admitted and are ready, Red Hat build of Kueue processes the next job. Sequential processing can slow down admission when the cluster has sufficient capacity for multiple jobs, but provides a higher likelihood that all of the pods for a job are scheduled together successfully.
  - Empty ("")
  - When the admission value is empty or set to "", the Red Hat build of Kueue Operator determines job admission settings. Currently, the admission value is set to Parallel by default.
2.11. Running jobs with quota limits
You can run Kubernetes jobs with Red Hat build of Kueue enabled to manage resource allocation within defined quota limits. This can help to ensure predictable resource availability, cluster stability, and optimized performance.
2.11.1. Identifying available local queues
Before you can submit a job to a queue, you must find the name of the local queue.
Prerequisites
- A cluster administrator has installed and configured Red Hat build of Kueue on your OpenShift Container Platform cluster.
- A cluster administrator has assigned you the kueue-batch-user-role cluster role.
- You have installed the OpenShift CLI (oc).
Procedure
- Run the following command to list available local queues in your namespace:
    $ oc -n <namespace> get localqueues
  Example output
    NAME         CLUSTERQUEUE    PENDING WORKLOADS
    user-queue   cluster-queue   3
2.11.2. Defining a job to run with Red Hat build of Kueue
When you are defining a job to run with Red Hat build of Kueue, ensure that it meets the following criteria:
- Specify the local queue to submit the job to, by using the kueue.x-k8s.io/queue-name label.
- Include the resource requests for each job pod.
Red Hat build of Kueue suspends the job, and then starts it when resources are available. Red Hat build of Kueue creates a corresponding workload, represented as a Workload object with a name that matches the job.
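A job that meets both criteria might look like the following sketch. The namespace, container name, image, command, and request sizes are hypothetical placeholders. The generateName prefix sample-job- and the queue name user-queue are chosen to match the example output in this section.

```yaml
# Hypothetical job: carries the queue label and per-pod resource requests.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-   # requires oc create, which assigns a unique suffix
  namespace: team-namespace
  labels:
    kueue.x-k8s.io/queue-name: user-queue   # local queue to submit the job to
spec:
  parallelism: 1
  completions: 1
  template:
    spec:
      containers:
      - name: main
        image: registry.access.redhat.com/ubi9/ubi-minimal
        command: ["sleep", "30"]
        resources:
          requests:           # requests are counted against the queue's quota
            cpu: "1"
            memory: 200Mi
      restartPolicy: Never
```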
Prerequisites
- A cluster administrator has installed and configured Red Hat build of Kueue on your OpenShift Container Platform cluster.
- A cluster administrator has assigned you the kueue-batch-user-role cluster role.
- You have installed the OpenShift CLI (oc).
- You have identified the name of the local queue that you want to submit jobs to.
Procedure
- Create a Job object.
  Example job
- Run the job by running the following command:
    $ oc create -f <filename>.yaml
Verification
- Verify that pods are running for the job you have created, by running the following command and observing the output:
    $ oc get job <job-name>
  Example output
    NAME               STATUS      COMPLETIONS   DURATION   AGE
    sample-job-sk42x   Suspended   0/1                      2m12s
- Verify that a workload has been created in your namespace for the job, by running the following command and observing the output:
    $ oc -n <namespace> get workloads
  Example output
    NAME                         QUEUE        RESERVED IN   ADMITTED   FINISHED   AGE
    job-sample-job-sk42x-77c03   user-queue                                       3m8s
2.12. Getting support
If you experience difficulty with a procedure described in this documentation, or with Red Hat build of Kueue in general, visit the Red Hat Customer Portal.
From the Customer Portal, you can:
- Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.
- Submit a support case to Red Hat Support.
- Access other product documentation.
2.12.1. About the Red Hat Knowledgebase
The Red Hat Knowledgebase provides rich content aimed at helping you make the most of Red Hat’s products and technologies. The Red Hat Knowledgebase consists of articles, product documentation, and videos outlining best practices on installing, configuring, and using Red Hat products. In addition, you can search for solutions to known issues, each providing concise root cause descriptions and remedial steps.
2.12.2. Collecting data for Red Hat Support
You can use the oc adm must-gather CLI command to collect the information about your Red Hat build of Kueue instance that is most likely needed for debugging issues, including:
- Red Hat build of Kueue custom resources, such as workloads, cluster queues, local queues, resource flavors, admission checks, and their corresponding custom resource definitions (CRDs)
- Services
- Endpoints
- Webhook configurations
- Logs from the openshift-kueue-operator namespace and kueue-controller-manager pods
Collected data is written into a new directory named must-gather/ in the current working directory by default.
Prerequisites
- The Red Hat build of Kueue Operator is installed on your cluster.
- You have installed the OpenShift CLI (oc).
Procedure
- Navigate to the directory where you want to store the must-gather data.
- Collect must-gather data by running the following command:
    $ oc adm must-gather \
        --image=registry.redhat.io/kueue/kueue-must-gather-rhel9:<version>
  Where <version> is your current version of Red Hat build of Kueue.
- Create a compressed file from the must-gather directory that was just created in your working directory. Make sure you provide the date and cluster ID for the unique must-gather data. For more information about how to find the cluster ID, see How to find the cluster-id or name on OpenShift cluster.
- Attach the compressed file to your support case on the Customer Support page of the Red Hat Customer Portal.