
Chapter 2. Red Hat build of Kueue


2.1. Introduction to Red Hat build of Kueue

Red Hat build of Kueue is a Kubernetes-native system that manages access to resources for jobs. Red Hat build of Kueue can determine when a job waits, is admitted to start by creating pods, or should be preempted, meaning that active pods for that job are deleted.

Note

In the context of Red Hat build of Kueue, a job can be defined as a one-time or on-demand task that runs to completion.

Red Hat build of Kueue is based on the Kueue open source project.

Red Hat build of Kueue is compatible with environments that use heterogeneous, elastic resources. This means that the environment has many different resource types, and those resources are capable of dynamic scaling.

Red Hat build of Kueue does not replace any existing components in a Kubernetes cluster, but instead integrates with the existing Kubernetes API server, scheduler, and cluster autoscaler components.

Red Hat build of Kueue supports all-or-nothing semantics. This means that either an entire job with all of its components is admitted to the cluster, or the entire job is rejected if it does not fit on the cluster.

2.1.1. Personas

Different personas exist in a Red Hat build of Kueue workflow.

Batch administrators
Batch administrators manage the cluster infrastructure and establish quotas and queues.
Batch users
Batch users run jobs on the cluster. Examples of batch users might be researchers, AI/ML engineers, or data scientists.
Serving users
Serving users run jobs on the cluster, for example to expose a trained AI/ML model for inference.
Platform developers
Platform developers integrate Red Hat build of Kueue with other software. They might also contribute to the Kueue open source project.

2.1.2. Workflow overview

The Red Hat build of Kueue workflow can be described at a high level as follows:

  1. Batch administrators create and configure ResourceFlavor, LocalQueue, and ClusterQueue resources.
  2. User personas create jobs on the cluster.
  3. The Kubernetes API server validates and accepts job data.
  4. Red Hat build of Kueue admits jobs based on configured options, such as order or quota. It injects affinity into the job by using resource flavors, and creates a Workload object that corresponds to each job.
  5. The applicable controller for the job type creates pods.
  6. The Kubernetes scheduler assigns pods to a node in the cluster.
  7. The Kubernetes cluster autoscaler provisions more nodes as required.

2.2. Release notes

Red Hat build of Kueue is released as an Operator that is supported on OpenShift Container Platform.

2.2.1. Compatible environments

Before you install Red Hat build of Kueue, review this section to ensure that your cluster meets the requirements.

2.2.1.1. Supported architectures

Red Hat build of Kueue version 1.1 and later is supported on the following architectures:

  • ARM64
  • 64-bit x86
  • ppc64le (IBM Power®)
  • s390x (IBM Z®)

2.2.1.2. Supported platforms

Red Hat build of Kueue version 1.1 and later is supported on the following platforms:

  • OpenShift Container Platform
  • Hosted control planes for OpenShift Container Platform
Important

Currently, Red Hat build of Kueue is not supported on Red Hat build of MicroShift (MicroShift).

2.2.2. Release notes for Red Hat build of Kueue version 1.1

Red Hat build of Kueue version 1.1 is a generally available release that is supported on OpenShift Container Platform versions 4.18 and later. Red Hat build of Kueue version 1.1 uses Kueue version 0.12.

Important

If you have a previously installed version of Red Hat build of Kueue on your cluster, you must uninstall the Operator and manually install version 1.1. For more information, see Upgrading Red Hat build of Kueue.

2.2.2.1. New features and enhancements

Configure a default local queue

A default local queue serves as the local queue for newly created jobs that do not have the kueue.x-k8s.io/queue-name label. After you create a default local queue, any new jobs created in the namespace without a kueue.x-k8s.io/queue-name label automatically update to have the kueue.x-k8s.io/queue-name: default label.

(RFE-7615)

Multi-architecture and Hosted control planes support

With this release, Red Hat build of Kueue is supported on multiple different architectures, including ARM64, 64-bit x86, ppc64le (IBM Power®), and s390x (IBM Z®), as well as on Hosted control planes for OpenShift Container Platform.

(OCPSTRAT-2103)

(OCPSTRAT-2106)

2.2.2.2. Fixed issues

You can create a Kueue custom resource by using the OpenShift Container Platform web console

Before this update, if you tried to use the OpenShift Container Platform web console to create a Kueue custom resource (CR) by using the form view, the web console showed an error and the resource could not be created. With this release, the default namespace was removed from the Kueue CR template. As a result, you can use the OpenShift Container Platform web console to create a Kueue CR by using the form view.

(OCPBUGS-58118)

2.2.2.3. Known issues

Kueue CR description reads as "Not available" in the OpenShift Container Platform web console

After you install Red Hat build of Kueue, in the Operator details view, the description for the Kueue CR reads as "Not available". This issue does not affect or degrade the Red Hat build of Kueue Operator functionality.

(OCPBUGS-62185)

Custom resources are not deleted properly when you uninstall Red Hat build of Kueue

After you uninstall the Red Hat Build of Kueue Operator by using the Delete all operand instances for this operator option in the OpenShift Container Platform web console, some Red Hat build of Kueue custom resources are not fully deleted. These resources can be viewed in the Installed Operators view with the status Resource is being deleted. As a workaround, you can manually delete the resource finalizers to remove them fully.

(OCPBUGS-62254)

2.2.3. Release notes for Red Hat build of Kueue version 1.0.1

Red Hat build of Kueue version 1.0.1 is a patch release that is supported on OpenShift Container Platform versions 4.18 and 4.19 on the 64-bit x86 architecture.

Red Hat build of Kueue version 1.0.1 uses Kueue version 0.11.

2.2.3.1. Bug fixes in Red Hat build of Kueue version 1.0.1

  • Previously, leader election for Red Hat build of Kueue was not configured to tolerate disruption, which resulted in frequent crashing. With this release, the leader election values for Red Hat build of Kueue have been updated to match the durations recommended for OpenShift Container Platform. (OCPBUGS-58496)
  • Previously, the ReadyReplicas count was not set in the reconciler, which meant that the Red Hat build of Kueue Operator status would report that there were no replicas ready. With this release, the ReadyReplicas count is based on the number of ready replicas for the deployment, which ensures that the Operator shows as ready in the OpenShift Container Platform console when the kueue-controller-manager pods are ready. (OCPBUGS-59261)
  • Previously, when the Kueue custom resource (CR) was deleted from the openshift-kueue-operator namespace, the kueue-manager-config config map was not deleted automatically and could remain in the namespace. With this release, the kueue-manager-config config map, kueue-webhook-server-cert secret, and metrics-server-cert secret are deleted automatically when the Kueue CR is deleted. (OCPBUGS-57960)

2.2.4. Release notes for Red Hat build of Kueue version 1.0

Red Hat build of Kueue version 1.0 is a generally available release that is supported on OpenShift Container Platform versions 4.18 and 4.19 on the 64-bit x86 architecture. Red Hat build of Kueue version 1.0 uses Kueue version 0.11.

2.2.4.1. New features and enhancements

Role-based access control (RBAC)
Role-based access control (RBAC) enables you to control which types of users can create which types of Red Hat build of Kueue resources.
Configure resource quotas
Configuring resource quotas by creating cluster queues, resource flavors, and local queues enables you to control the amount of resources used by user-submitted jobs and workloads.
Control job and workload management
Labeling namespaces and configuring label policies enable you to control which jobs and workloads are managed by Red Hat build of Kueue.
Share borrowable resources between queues
Configuring cohorts, fair sharing, and gang scheduling settings enables you to share unused, borrowable resources between queues.

2.2.4.2. Known issues

Jobs in all namespaces are reconciled if they have the kueue.x-k8s.io/queue-name label

Red Hat build of Kueue uses the managedJobsNamespaceSelector configuration field, so that administrators can configure which namespaces opt in to being managed by Red Hat build of Kueue. Because namespaces must be manually configured to opt in to being managed by Red Hat build of Kueue, resources in system or third-party namespaces are not impacted or managed by Red Hat build of Kueue.

The behavior in Red Hat build of Kueue 1.0 allows reconciliation of Job resources that have the kueue.x-k8s.io/queue-name label, even if these resources are in namespaces that are not configured to opt in to being managed by Red Hat build of Kueue. This is inconsistent with the behavior for other core integrations like pods, deployments, and stateful sets, which are only reconciled if they are in namespaces that have been configured to opt in to being managed by Red Hat build of Kueue.

(OCPBUGS-58205)

You cannot create a Kueue custom resource by using the OpenShift Container Platform web console

If you try to use the OpenShift Container Platform web console to create a Kueue custom resource (CR) by using the form view, the web console shows an error and the resource cannot be created. As a workaround, use the YAML view to create a Kueue CR instead.

(OCPBUGS-58118)

2.3. Installing Red Hat build of Kueue

You can install Red Hat build of Kueue by using the Red Hat Build of Kueue Operator in OperatorHub.

2.3.1. Compatible environments

Before you install Red Hat build of Kueue, review this section to ensure that your cluster meets the requirements.

2.3.1.1. Supported architectures

Red Hat build of Kueue version 1.1 and later is supported on the following architectures:

  • ARM64
  • 64-bit x86
  • ppc64le (IBM Power®)
  • s390x (IBM Z®)

2.3.1.2. Supported platforms

Red Hat build of Kueue version 1.1 and later is supported on the following platforms:

  • OpenShift Container Platform
  • Hosted control planes for OpenShift Container Platform
Important

Currently, Red Hat build of Kueue is not supported on Red Hat build of MicroShift (MicroShift).

2.3.2. Installing the Red Hat Build of Kueue Operator

You can install the Red Hat Build of Kueue Operator on an OpenShift Container Platform cluster by using the OperatorHub in the web console.

Prerequisites

  • You have administrator permissions on an OpenShift Container Platform cluster.
  • You have access to the OpenShift Container Platform web console.
  • You have installed and configured the cert-manager Operator for Red Hat OpenShift for your cluster.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
  2. Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.

Verification

  • Go to Operators → Installed Operators and confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.

2.3.3. Upgrading Red Hat build of Kueue

If you have previously installed Red Hat build of Kueue, you must manually upgrade your deployment to the latest version to use the latest bug fixes and feature enhancements.

Prerequisites

  • You have installed a previous version of Red Hat build of Kueue.
  • You are logged in to the OpenShift Container Platform web console with cluster administrator permissions.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators, then select Red Hat build of Kueue from the list.
  2. From the Actions drop-down menu, select Uninstall Operator.
  3. The Uninstall Operator? dialog box opens. Click Uninstall.

    Important

    Selecting the Delete all operand instances for this operator checkbox before clicking Uninstall deletes all existing resources from the cluster, including:

    • The Kueue CR
    • Any cluster queues, local queues, or resource flavors that you have created

    Leave this checkbox unchecked when you upgrade Red Hat build of Kueue so that your created resources are retained.

  4. In the OpenShift Container Platform web console, click Operators → OperatorHub.
  5. Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.

Verification

  1. Go to Operators → Installed Operators.
  2. Confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.
  3. Confirm that the version shown under the Operator name in the list is the latest version.

2.3.4. Creating a Kueue custom resource

After you have installed the Red Hat Build of Kueue Operator, you must create a Kueue custom resource (CR) to configure your installation.

Prerequisites

Ensure that you have completed the following prerequisites:

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions and the kueue-batch-admin-role role.
  • You have access to the OpenShift Container Platform web console.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators.
  2. In the Provided APIs column of the table, click Kueue. This takes you to the Kueue tab of the Operator details page.
  3. Click Create Kueue. This takes you to the Create Kueue YAML view.
  4. Enter the details for your Kueue CR.

    Example Kueue CR

    apiVersion: kueue.openshift.io/v1
    kind: Kueue
    metadata:
      labels:
        app.kubernetes.io/name: kueue-operator
        app.kubernetes.io/managed-by: kustomize
      name: cluster 1
      namespace: openshift-kueue-operator
    spec:
      managementState: Managed
      config:
        integrations:
          frameworks: 2
          - BatchJob
        preemption:
          preemptionPolicy: Classical 3
    # ...

    1 The name of the Kueue CR must be cluster.
    2 If you want to configure Red Hat build of Kueue for use with other workload types, add those types here. For the default configuration, only the BatchJob type is recommended and supported.
    3 Optional: If you want to configure fair sharing for Red Hat build of Kueue, set the preemptionPolicy value to FairSharing. The default setting in the Kueue CR is Classical preemption.
  5. Click Create.

Verification

  • After you create the Kueue CR, the web console brings you to the Operator details page, where you can see the CR in the list of Kueues.
  • Optional: If you have the OpenShift CLI (oc) installed, you can run the following command and observe the output to confirm that your Kueue CR has been created successfully:

    $ oc get kueue

    Example output

    NAME      	AGE
    cluster   	4m

2.3.5. Labeling namespaces for job management

The Red Hat build of Kueue Operator uses an opt-in webhook mechanism to ensure that policies are only enforced for the jobs and namespaces that it is expected to target.

You must label the namespaces where you want Red Hat build of Kueue to manage jobs with the kueue.openshift.io/managed=true label.

Prerequisites

  • You have cluster administrator permissions.
  • The Red Hat build of Kueue Operator is installed on your cluster, and you have created a Kueue custom resource (CR).
  • You have installed the OpenShift CLI (oc).

Procedure

  • Add the kueue.openshift.io/managed=true label to a namespace by running the following command:

    $ oc label namespace <namespace> kueue.openshift.io/managed=true

When you add this label, you instruct the Red Hat build of Kueue Operator that the namespace is managed by its webhook admission controllers. As a result, any Red Hat build of Kueue resources within that namespace are properly validated and mutated.

2.4. Installing Red Hat build of Kueue in disconnected environments

Before you can install Red Hat build of Kueue on a disconnected OpenShift Container Platform cluster, you must enable Operator Lifecycle Manager (OLM) in disconnected environments by completing the following steps:

  • Disable the default remote OperatorHub sources for OLM.
  • Use a workstation with full internet access to create and push local mirrors of the OperatorHub content to a mirror registry.
  • Configure OLM to install and manage Operators from local sources on the mirror registry instead of the default remote sources.

After enabling OLM in a disconnected environment, you can continue to use your unrestricted workstation to keep your local OperatorHub sources updated as newer versions of Operators are released.

For full documentation on completing these steps, see the OpenShift Container Platform documentation on Using Operator Lifecycle Manager in disconnected environments.
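For reference, the following command shows the first of these steps, disabling the default remote OperatorHub sources. It is taken from the general OpenShift Container Platform disconnected-environment workflow and assumes that you have cluster-admin access; the mirroring and CatalogSource configuration steps are described in the linked documentation:

$ oc patch OperatorHub cluster --type json \
    -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'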

2.4.1. Compatible environments

Before you install Red Hat build of Kueue, review this section to ensure that your cluster meets the requirements.

2.4.1.1. Supported architectures

Red Hat build of Kueue version 1.1 and later is supported on the following architectures:

  • ARM64
  • 64-bit x86
  • ppc64le (IBM Power®)
  • s390x (IBM Z®)

2.4.1.2. Supported platforms

Red Hat build of Kueue version 1.1 and later is supported on the following platforms:

  • OpenShift Container Platform
  • Hosted control planes for OpenShift Container Platform
Important

Currently, Red Hat build of Kueue is not supported on Red Hat build of MicroShift (MicroShift).

2.4.2. Installing the Red Hat Build of Kueue Operator

You can install the Red Hat Build of Kueue Operator on an OpenShift Container Platform cluster by using the OperatorHub in the web console.

Prerequisites

  • You have administrator permissions on an OpenShift Container Platform cluster.
  • You have access to the OpenShift Container Platform web console.
  • You have installed and configured the cert-manager Operator for Red Hat OpenShift for your cluster.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
  2. Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.

Verification

  • Go to Operators → Installed Operators and confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.

2.4.3. Upgrading Red Hat build of Kueue

If you have previously installed Red Hat build of Kueue, you must manually upgrade your deployment to the latest version to use the latest bug fixes and feature enhancements.

Prerequisites

  • You have installed a previous version of Red Hat build of Kueue.
  • You are logged in to the OpenShift Container Platform web console with cluster administrator permissions.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators, then select Red Hat build of Kueue from the list.
  2. From the Actions drop-down menu, select Uninstall Operator.
  3. The Uninstall Operator? dialog box opens. Click Uninstall.

    Important

    Selecting the Delete all operand instances for this operator checkbox before clicking Uninstall deletes all existing resources from the cluster, including:

    • The Kueue CR
    • Any cluster queues, local queues, or resource flavors that you have created

    Leave this checkbox unchecked when you upgrade Red Hat build of Kueue so that your created resources are retained.

  4. In the OpenShift Container Platform web console, click Operators → OperatorHub.
  5. Choose Red Hat Build of Kueue Operator from the list of available Operators, and click Install.

Verification

  1. Go to Operators → Installed Operators.
  2. Confirm that the Red Hat Build of Kueue Operator is listed with Status as Succeeded.
  3. Confirm that the version shown under the Operator name in the list is the latest version.

2.4.4. Creating a Kueue custom resource

After you have installed the Red Hat Build of Kueue Operator, you must create a Kueue custom resource (CR) to configure your installation.

Prerequisites

Ensure that you have completed the following prerequisites:

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions and the kueue-batch-admin-role role.
  • You have access to the OpenShift Container Platform web console.

Procedure

  1. In the OpenShift Container Platform web console, click Operators → Installed Operators.
  2. In the Provided APIs column of the table, click Kueue. This takes you to the Kueue tab of the Operator details page.
  3. Click Create Kueue. This takes you to the Create Kueue YAML view.
  4. Enter the details for your Kueue CR.

    Example Kueue CR

    apiVersion: kueue.openshift.io/v1
    kind: Kueue
    metadata:
      labels:
        app.kubernetes.io/name: kueue-operator
        app.kubernetes.io/managed-by: kustomize
      name: cluster 1
      namespace: openshift-kueue-operator
    spec:
      managementState: Managed
      config:
        integrations:
          frameworks: 2
          - BatchJob
        preemption:
          preemptionPolicy: Classical 3
    # ...

    1 The name of the Kueue CR must be cluster.
    2 If you want to configure Red Hat build of Kueue for use with other workload types, add those types here. For the default configuration, only the BatchJob type is recommended and supported.
    3 Optional: If you want to configure fair sharing for Red Hat build of Kueue, set the preemptionPolicy value to FairSharing. The default setting in the Kueue CR is Classical preemption.
  5. Click Create.

Verification

  • After you create the Kueue CR, the web console brings you to the Operator details page, where you can see the CR in the list of Kueues.
  • Optional: If you have the OpenShift CLI (oc) installed, you can run the following command and observe the output to confirm that your Kueue CR has been created successfully:

    $ oc get kueue

    Example output

    NAME      	AGE
    cluster   	4m

2.4.5. Labeling namespaces for job management

The Red Hat build of Kueue Operator uses an opt-in webhook mechanism to ensure that policies are only enforced for the jobs and namespaces that it is expected to target.

You must label the namespaces where you want Red Hat build of Kueue to manage jobs with the kueue.openshift.io/managed=true label.

Prerequisites

  • You have cluster administrator permissions.
  • The Red Hat build of Kueue Operator is installed on your cluster, and you have created a Kueue custom resource (CR).
  • You have installed the OpenShift CLI (oc).

Procedure

  • Add the kueue.openshift.io/managed=true label to a namespace by running the following command:

    $ oc label namespace <namespace> kueue.openshift.io/managed=true

When you add this label, you instruct the Red Hat build of Kueue Operator that the namespace is managed by its webhook admission controllers. As a result, any Red Hat build of Kueue resources within that namespace are properly validated and mutated.

2.5. Configuring role-based permissions

The following procedures provide information about how you can configure role-based access control (RBAC) for your Red Hat build of Kueue deployment. These RBAC permissions determine which types of users can create which types of Red Hat build of Kueue objects.

2.5.1. Cluster roles

The Red Hat build of Kueue Operator deploys kueue-batch-admin-role and kueue-batch-user-role cluster roles by default.

kueue-batch-admin-role
This cluster role includes the permissions to manage cluster queues, local queues, workloads, and resource flavors.
kueue-batch-user-role
This cluster role includes the permissions to manage jobs and to view local queues and workloads.
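If you want to review the exact permissions that these default cluster roles grant on your cluster, you can inspect them with the OpenShift CLI (oc). This is an optional check rather than a required step:

$ oc describe clusterrole kueue-batch-admin-role
$ oc describe clusterrole kueue-batch-user-role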

2.5.2. Configuring permissions for batch administrators

You can configure permissions for batch administrators by binding the kueue-batch-admin-role cluster role to a user or group of users.

Prerequisites

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a ClusterRoleBinding object as a YAML file:

    Example ClusterRoleBinding object

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: kueue-admins 1
    subjects: 2
    - kind: User
      name: admin@example.com
      apiGroup: rbac.authorization.k8s.io
    roleRef: 3
      kind: ClusterRole
      name: kueue-batch-admin-role
      apiGroup: rbac.authorization.k8s.io

    1 Provide a name for the ClusterRoleBinding object.
    2 Add details about which user or group of users you want to provide user permissions for.
    3 Add details about the kueue-batch-admin-role cluster role.
  2. Apply the ClusterRoleBinding object:

    $ oc apply -f <filename>.yaml

Verification

  • You can verify that the ClusterRoleBinding object was applied correctly by running the following command and verifying that the output contains the correct information for the kueue-batch-admin-role cluster role:

    $ oc describe clusterrolebinding.rbac

    Example output

    ...
    Name:         kueue-batch-admin-role
    Labels:       app.kubernetes.io/name=kueue
    Annotations:  <none>
    Role:
      Kind:  ClusterRole
      Name:  kueue-batch-admin-role
    Subjects:
      Kind            Name                      Namespace
      ----            ----                      ---------
      User            admin@example.com         admin-namespace
    ...

2.5.3. Configuring permissions for users

You can configure permissions for Red Hat build of Kueue users by binding the kueue-batch-user-role cluster role to a user or group of users.

Prerequisites

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a RoleBinding object as a YAML file:

    Example RoleBinding object

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: kueue-users 1
      namespace: user-namespace 2
    subjects: 3
    - kind: Group
      name: team-a@example.com
      apiGroup: rbac.authorization.k8s.io
    roleRef: 4
      kind: ClusterRole
      name: kueue-batch-user-role
      apiGroup: rbac.authorization.k8s.io

    1 Provide a name for the RoleBinding object.
    2 Add details about which namespace the RoleBinding object applies to.
    3 Add details about which user or group of users you want to provide user permissions for.
    4 Add details about the kueue-batch-user-role cluster role.
  2. Apply the RoleBinding object:

    $ oc apply -f <filename>.yaml

Verification

  • You can verify that the RoleBinding object was applied correctly by running the following command and verifying that the output contains the correct information for the kueue-batch-user-role cluster role:

    $ oc describe rolebinding.rbac

    Example output

    ...
    Name:         kueue-users
    Labels:       app.kubernetes.io/name=kueue
    Annotations:  <none>
    Role:
      Kind:  ClusterRole
      Name:  kueue-batch-user-role
    Subjects:
      Kind            Name                      Namespace
      ----            ----                      ---------
      Group           team-a@example.com        user-namespace
    ...

2.6. Configuring quotas

As an administrator, you can use Red Hat build of Kueue to configure quotas to optimize resource allocation and system throughput for user workloads. You can configure quotas for compute resources such as CPU, memory, pods, and GPU.

You can configure quotas in Red Hat build of Kueue by completing the following steps:

  1. Configure a cluster queue.
  2. Configure a resource flavor.
  3. Configure a local queue.

Users can then submit their workloads to the local queue.

2.6.1. Configuring a cluster queue

A cluster queue is a cluster-scoped resource, represented by a ClusterQueue object, that governs a pool of resources such as CPU, memory, and pods. Cluster queues can be used to define usage limits, quotas for resource flavors, order of consumption, and fair sharing rules.

Note

The cluster queue is not ready for use until a ResourceFlavor object has also been configured.

Prerequisites

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions or the kueue-batch-admin-role role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a ClusterQueue object as a YAML file:

    Example of a basic ClusterQueue object using a single resource flavor

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {} 1
      resourceGroups:
      - coveredResources: ["cpu", "memory", "pods", "foo.com/gpu"] 2
        flavors:
        - name: "default-flavor" 3
          resources: 4
          - name: "cpu"
            nominalQuota: 9
          - name: "memory"
            nominalQuota: 36Gi
          - name: "pods"
            nominalQuota: 5
          - name: "foo.com/gpu"
            nominalQuota: 100

    1 Defines which namespaces can use the resources governed by this cluster queue. An empty namespaceSelector as shown in the example means that all namespaces can use these resources.
    2 Defines the resource types governed by the cluster queue. This example ClusterQueue object governs CPU, memory, pod, and GPU resources.
    3 Defines the resource flavor that is applied to the resource types listed. In this example, the default-flavor resource flavor is applied to CPU, memory, pod, and GPU resources.
    4 Defines the resource requirements for admitting jobs. This example cluster queue only admits jobs if the following conditions are met:
    • The sum of the CPU requests is less than or equal to 9.
    • The sum of the memory requests is less than or equal to 36Gi.
    • The total number of pods is less than or equal to 5.
    • The sum of the GPU requests is less than or equal to 100.
  2. Apply the ClusterQueue object by running the following command:

    $ oc apply -f <filename>.yaml

Next steps

The cluster queue is not ready for use until a ResourceFlavor object has also been configured.

2.6.2. Configuring a resource flavor

After you have configured a ClusterQueue object, you can configure a ResourceFlavor object.

Resources in a cluster are typically not homogeneous. If the resources in your cluster are homogeneous, you can use an empty ResourceFlavor instead of adding labels to custom resource flavors.

You can use a custom ResourceFlavor object to represent different resource variations that are associated with cluster nodes through labels, taints, and tolerations. You can then associate workloads with specific node types to enable fine-grained resource management.

Prerequisites

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions or the kueue-batch-admin-role role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a ResourceFlavor object as a YAML file:

    Example of an empty ResourceFlavor object

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: default-flavor

    Example of a custom ResourceFlavor object

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: ResourceFlavor
    metadata:
      name: "x86"
    spec:
      nodeLabels:
        cpu-arch: x86

  2. Apply the ResourceFlavor object by running the following command:

    $ oc apply -f <filename>.yaml
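
In addition to node labels, a custom ResourceFlavor object can also reference node taints and matching tolerations, as mentioned earlier in this section. The following sketch is illustrative only; the flavor name, taint key, and values are assumptions for this example:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "gpu-flavor"
spec:
  nodeLabels:
    cpu-arch: x86
  nodeTaints:
  - key: "nvidia.com/gpu"
    value: "present"
    effect: NoSchedule
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Exists"
    effect: "NoSchedule"

Workloads that are admitted through a flavor like this receive the listed toleration so that their pods can be scheduled onto the tainted nodes.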

2.6.3. Configuring a local queue

A local queue is a namespaced object, represented by a LocalQueue object, that groups closely related workloads that belong to a single namespace.

As an administrator, you can configure a LocalQueue object to point to a cluster queue. This allocates resources from the cluster queue to workloads in the namespace specified in the LocalQueue object.

Prerequisites

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have cluster administrator permissions or the kueue-batch-admin-role role.
  • You have installed the OpenShift CLI (oc).
  • You have created a ClusterQueue object.

Procedure

  1. Create a LocalQueue object as a YAML file:

    Example of a basic LocalQueue object

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      namespace: team-namespace
      name: user-queue
    spec:
      clusterQueue: cluster-queue

  2. Apply the LocalQueue object by running the following command:

    $ oc apply -f <filename>.yaml

2.6.4. Configuring a default local queue

As a cluster administrator, you can improve quota enforcement in your cluster by managing all jobs in selected namespaces without needing to explicitly label each job. You can do this by creating a default local queue.

A default local queue serves as the local queue for newly created jobs that do not have the kueue.x-k8s.io/queue-name label. After you create a default local queue, any new jobs created in the namespace without a kueue.x-k8s.io/queue-name label automatically update to have the kueue.x-k8s.io/queue-name: default label.

Important

Preexisting jobs in a namespace are not affected when you create a default local queue. If jobs already exist in the namespace before you create the default local queue, you must label those jobs explicitly to assign them to a queue.
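
As a minimal sketch, you can add the label to an existing job by using the OpenShift CLI (oc). The namespace and job name shown here are placeholders; depending on the state of the job, Red Hat build of Kueue might reject changes to the queue-name label, for example if the job has already been admitted through another queue:

$ oc -n team-namespace label job <job-name> kueue.x-k8s.io/queue-name=default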

Prerequisites

  • You have installed Red Hat build of Kueue version 1.1 on your cluster.
  • You have cluster administrator permissions or the kueue-batch-admin-role role.
  • You have installed the OpenShift CLI (oc).
  • You have created a ClusterQueue object.

Procedure

  1. Create a LocalQueue object named default as a YAML file:

    Example of a default LocalQueue object

    apiVersion: kueue.x-k8s.io/v1beta1
    kind: LocalQueue
    metadata:
      namespace: team-namespace
      name: default
    spec:
      clusterQueue: cluster-queue

  2. Apply the LocalQueue object by running the following command:

    $ oc apply -f <filename>.yaml

Verification

  1. Create a job in the same namespace as the default local queue.
  2. Observe that the job updates with the kueue.x-k8s.io/queue-name: default label.
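
For example, assuming the default local queue exists in the team-namespace namespace, you can list jobs with their labels and confirm that newly created jobs carry the kueue.x-k8s.io/queue-name=default label:

$ oc -n team-namespace get jobs --show-labels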

2.7. Managing jobs and workloads

Red Hat build of Kueue does not directly manipulate jobs that are created by users. Instead, Kueue manages Workload objects that represent the resource requirements of a job. Red Hat build of Kueue automatically creates a workload for each job, and syncs any decisions and statuses between the two objects.

2.7.1. Labeling namespaces for job management

The Red Hat build of Kueue Operator uses an opt-in webhook mechanism to ensure that policies are only enforced for the jobs and namespaces that it is expected to target.

You must label the namespaces where you want Red Hat build of Kueue to manage jobs with the kueue.openshift.io/managed=true label.

Prerequisites

  • You have cluster administrator permissions.
  • The Red Hat build of Kueue Operator is installed on your cluster, and you have created a Kueue custom resource (CR).
  • You have installed the OpenShift CLI (oc).

Procedure

  • Add the kueue.openshift.io/managed=true label to a namespace by running the following command:

    $ oc label namespace <namespace> kueue.openshift.io/managed=true

When you add this label, you instruct the Red Hat build of Kueue Operator that the namespace is managed by its webhook admission controllers. As a result, any Red Hat build of Kueue resources within that namespace are properly validated and mutated.
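
As a quick check, you can confirm that the label is present on a namespace by running the following command and looking for kueue.openshift.io/managed=true in the output:

$ oc get namespace <namespace> --show-labels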

2.7.2. Configuring label policies for jobs

The spec.config.workloadManagement.labelPolicy spec in the Kueue custom resource (CR) is an optional field that controls how Red Hat build of Kueue decides whether to manage or ignore different jobs. The allowed values are QueueName, None, and empty ("").

If the labelPolicy setting is omitted or empty (""), the default policy is that Red Hat build of Kueue manages jobs that have a kueue.x-k8s.io/queue-name label, and ignores jobs that do not have the kueue.x-k8s.io/queue-name label. This is the same workflow as if the labelPolicy is set to QueueName.

If the labelPolicy setting is set to None, jobs are managed by Red Hat build of Kueue even if they do not have the kueue.x-k8s.io/queue-name label.

Example workloadManagement spec configuration

apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  labels:
    app.kubernetes.io/name: kueue-operator
    app.kubernetes.io/managed-by: kustomize
  name: cluster
  namespace: openshift-kueue-operator
spec:
  config:
    workloadManagement:
      labelPolicy: QueueName
# ...

Example user-created Job object containing the kueue.x-k8s.io/queue-name label

apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-job-
  namespace: my-namespace
  labels:
    kueue.x-k8s.io/queue-name: user-queue
spec:
# ...

2.8. Using cohorts

You can use cohorts to group cluster queues and determine which cluster queues are able to share borrowable resources with each other. Borrowable resources are defined as the unused nominal quota of all the cluster queues in a cohort.

Using cohorts can help to optimize resource utilization by preventing under-utilization and enabling fair sharing configurations. Cohorts can also help to simplify resource management and allocation between teams, because you can group cluster queues for related workloads or for each team. You can also use cohorts to set resource quotas at a group level to define the limits for resources that a group of cluster queues can consume.

2.8.1. Configuring cohorts within a cluster queue spec

You can add a cluster queue to a cohort by specifying the name of the cohort in the .spec.cohort field of the ClusterQueue object, as shown in the following example:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
# ...
  cohort: example-cohort
# ...

All cluster queues that have a matching spec.cohort are part of the same cohort.

If the spec.cohort field is omitted, the cluster queue does not belong to any cohort and cannot access borrowable resources.
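
For illustration, the following sketch shows a second cluster queue that joins the same cohort and can therefore borrow unused quota from other members of the cohort. The queue name, quota values, and the optional borrowingLimit value, which caps how much quota this queue can borrow, are assumptions for this example:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-b-queue
spec:
  namespaceSelector: {}
  cohort: example-cohort
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 6
        borrowingLimit: 3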

2.9. Configuring fair sharing

Fair sharing is a preemption strategy that is used to achieve an equal or weighted share of borrowable resources between the tenants of a cohort. Borrowable resources are the unused nominal quota of all the cluster queues in a cohort.

You can configure fair sharing by setting the preemptionPolicy value in the Kueue custom resource (CR) to FairSharing.
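
For example, building on the Kueue CR shown earlier in this chapter, the fair sharing setting looks similar to the following sketch:

apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  namespace: openshift-kueue-operator
spec:
  managementState: Managed
  config:
    preemption:
      preemptionPolicy: FairSharing
# ...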

2.9.1. Cluster queue weights

After you have enabled fair sharing, you must set share values for each cluster queue before fair sharing can take place. Share values are represented as the weight value in a ClusterQueue object.

Share values are important because they allow administrators to prioritize specific job types or teams. Critical applications or high-priority teams can be configured with a weighted value so that they receive a proportionally larger share of the available resources. Configuring weights ensures that unused resources are distributed according to defined organizational or project priorities rather than on a first-come, first-served basis.

The weight value, or share value, defines a comparative advantage for the cluster queue when competing for borrowable resources. Generally, Red Hat build of Kueue admits jobs with a lower share value first. Jobs with a higher share value are more likely to be preempted before those with lower share values.

Example cluster queue with a fair sharing weight configured

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["cpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: cpu
        nominalQuota: 9
  cohort: example-cohort
  fairSharing:
    weight: 2

2.9.1.1. Zero weight

A weight value of 0 represents an infinite share value. This means that the cluster queue is always at a disadvantage compared to others, and its workloads are always the first to be preempted when fair sharing is enabled.

2.10. Gang scheduling

Gang scheduling ensures that a group or gang of related jobs only start when all required resources are available. Red Hat build of Kueue enables gang scheduling by suspending jobs until the OpenShift Container Platform cluster can guarantee the capacity to start and execute all of the related jobs in the gang together. This is also known as all-or-nothing scheduling.

Gang scheduling is important if you are working with expensive, limited resources, such as GPUs. Gang scheduling can prevent jobs from claiming but not using GPUs, which can improve GPU utilization and can reduce running costs. Gang scheduling can also help to prevent issues like resource segmentation and deadlocking.

2.10.1. Configuring gang scheduling

As a cluster administrator, you can configure gang scheduling by modifying the gangScheduling spec in the Kueue custom resource (CR).

Example Kueue CR with gang scheduling configured

apiVersion: kueue.openshift.io/v1
kind: Kueue
metadata:
  name: cluster
  labels:
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: kueue-operator
  namespace: openshift-kueue-operator
spec:
  config:
    gangScheduling:
      policy: ByWorkload 1
      byWorkload:
        admission: Parallel 2
# ...

1 You can set the policy value to enable or disable gang scheduling. The possible values are ByWorkload, None, or empty ("").
ByWorkload
When the policy value is set to ByWorkload, each job is processed and considered for admission as a single unit. If the job does not become ready within the specified time, the entire job is evicted and retried at a later time.
None
When the policy value is set to None, gang scheduling is disabled.
Empty ("")
When the policy value is empty or set to "", the Red Hat build of Kueue Operator determines settings for gang scheduling. Currently, gang scheduling is disabled by default.
2 If the policy value is set to ByWorkload, you must configure job admission settings. The possible values for the admission spec are Parallel, Sequential, or empty ("").
Parallel
When the admission value is set to Parallel, pods from any job can be admitted at any time. This can cause a deadlock, where jobs are in contention for cluster capacity. When a deadlock occurs, the successful scheduling of pods from another job can prevent the scheduling of pods from the current job.
Sequential
When the admission value is set to Sequential, only pods from the currently processing job are admitted. After all of the pods from the current job have been admitted and are ready, Red Hat build of Kueue processes the next job. Sequential processing can slow down admission when the cluster has sufficient capacity for multiple jobs, but provides a higher likelihood that all of the pods for a job are scheduled together successfully.
Empty ("")
When the admission value is empty or set to "", the Red Hat build of Kueue Operator determines job admission settings. Currently, the admission value is set to Parallel by default.

2.11. Running jobs with quota limits

You can run Kubernetes jobs with Red Hat build of Kueue enabled to manage resource allocation within defined quota limits. This can help to ensure predictable resource availability, cluster stability, and optimized performance.

2.11.1. Identifying available local queues

Before you can submit a job to a queue, you must find the name of the local queue.

Prerequisites

  • A cluster administrator has installed and configured Red Hat build of Kueue on your OpenShift Container Platform cluster.
  • A cluster administrator has assigned you the kueue-batch-user-role cluster role.
  • You have installed the OpenShift CLI (oc).

Procedure

  • Run the following command to list available local queues in your namespace:

    $ oc -n <namespace> get localqueues

    Example output

    NAME         CLUSTERQUEUE    PENDING WORKLOADS
    user-queue   cluster-queue   3

2.11.2. Defining a job to run with Red Hat build of Kueue

When you are defining a job to run with Red Hat build of Kueue, ensure that it meets the following criteria:

  • Specify the local queue to submit the job to by using the kueue.x-k8s.io/queue-name label.
  • Include the resource requests for each job pod.

Red Hat build of Kueue suspends the job, and then starts it when resources are available. Red Hat build of Kueue creates a corresponding workload, represented as a Workload object with a name that matches the job.

Prerequisites

  • A cluster administrator has installed and configured Red Hat build of Kueue on your OpenShift Container Platform cluster.
  • A cluster administrator has assigned you the kueue-batch-user-role cluster role.
  • You have installed the OpenShift CLI (oc).
  • You have identified the name of the local queue that you want to submit jobs to.

Procedure

  1. Create a Job object.

    Example job

    apiVersion: batch/v1
    kind: Job 1
    metadata:
      generateName: sample-job- 2
      namespace: my-namespace
      labels:
        kueue.x-k8s.io/queue-name: user-queue 3
    spec:
      parallelism: 3
      completions: 3
      template:
        spec:
          containers:
          - name: dummy-job
            image: registry.k8s.io/e2e-test-images/agnhost:2.53
            args: ["entrypoint-tester", "hello", "world"]
            resources: 4
              requests:
                cpu: 1
                memory: "200Mi"
          restartPolicy: Never

    1 Defines the resource type as a Job object, which represents a batch computation task.
    2 Provides a prefix for generating a unique name for the job.
    3 Identifies the queue to send the job to.
    4 Defines the resource requests for each pod.
  2. Create the job by running the following command:

    $ oc create -f <filename>.yaml

Verification

  • Check the status of the job that you have created by running the following command and observing the output. While Red Hat build of Kueue waits for quota to become available, the job shows a Suspended status:

    $ oc get job <job-name>

    Example output

    NAME               STATUS      COMPLETIONS   DURATION   AGE
    sample-job-sk42x   Suspended   0/1                      2m12s

  • Verify that a workload has been created in your namespace for the job, by running the following command and observing the output:

    $ oc -n <namespace> get workloads

    Example output

    NAME                         QUEUE          RESERVED IN   ADMITTED   FINISHED   AGE
    job-sample-job-sk42x-77c03   user-queue                                         3m8s
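
  • Optional: If a workload stays pending, you can inspect its conditions to understand why it has not been admitted. Replace the namespace and workload name with the values from the previous output:

    $ oc -n <namespace> describe workload <workload-name>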

2.12. Getting support

If you experience difficulty with a procedure described in this documentation, or with Red Hat build of Kueue in general, visit the Red Hat Customer Portal.

From the Customer Portal, you can:

  • Search or browse through the Red Hat Knowledgebase of articles and solutions relating to Red Hat products.
  • Submit a support case to Red Hat Support.
  • Access other product documentation.

2.12.1. About the Red Hat Knowledgebase

The Red Hat Knowledgebase provides rich content aimed at helping you make the most of Red Hat’s products and technologies. The Red Hat Knowledgebase consists of articles, product documentation, and videos outlining best practices on installing, configuring, and using Red Hat products. In addition, you can search for solutions to known issues, each providing concise root cause descriptions and remedial steps.

2.12.2. Collecting data for Red Hat Support

You can use the oc adm must-gather CLI command to collect the information about your Red Hat build of Kueue instance that is most likely needed for debugging issues, including:

  • Red Hat build of Kueue custom resources, such as workloads, cluster queues, local queues, resource flavors, admission checks, and their corresponding custom resource definitions (CRDs)
  • Services
  • Endpoints
  • Webhook configurations
  • Logs from the openshift-kueue-operator namespace and kueue-controller-manager pods

Collected data is written into a new directory named must-gather/ in the current working directory by default.

Prerequisites

  • The Red Hat build of Kueue Operator is installed on your cluster.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Navigate to the directory where you want to store the must-gather data.
  2. Collect must-gather data by running the following command:

    $ oc adm must-gather \
      --image=registry.redhat.io/kueue/kueue-must-gather-rhel9:<version>

    Where <version> is your current version of Red Hat build of Kueue.

  3. Create a compressed file from the must-gather directory that was just created in your working directory. Include the date and cluster ID in the file name so that the must-gather data is uniquely identified, as shown in the example after this procedure. For more information about how to find the cluster ID, see How to find the cluster-id or name on OpenShift cluster.
  4. Attach the compressed file to your support case on the Customer Support page of the Red Hat Customer Portal.
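
The following is a minimal sketch of the compression step, assuming a standard tar command and using placeholder values for the date and cluster ID:

$ tar -czf must-gather-<date>-<cluster-id>.tar.gz must-gather/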