Chapter 4. JobSet Operator


4.1. JobSet Operator overview

Use the JobSet Operator on OpenShift Container Platform to manage and run large-scale, coordinated workloads like high-performance computing (HPC) and AI training. Features like multi-template job support and stable networking can help you recover quickly and use resources efficiently.

4.1.1. About the JobSet Operator

Use the JobSet Operator on OpenShift Container Platform to manage large, distributed, and coordinated computing workloads, such as high-performance computing (HPC) or artificial intelligence (AI) training, and gain automatic stability, coordination, and failure recovery.

The JobSet Operator is based on the JobSet open source project.

The JobSet Operator is designed to manage a group of jobs as a single, coordinated unit. This is especially useful in fields like HPC and large-scale AI model training, where a team of machines must run together for hours or days.

You can use the JobSet Operator to solve problems that are too big or too complex for a standard OpenShift Container Platform job. The JobSet Operator provides coordination, stability, and recovery.

The JobSet Operator automatically creates a stable headless service so that each worker pod gets a predictable DNS name and workers can find and communicate with each other, even after a failure and restart. The Operator also provides automatic failure recovery: if one small part of a large training job fails, the Operator can be configured to restart the entire group of workers from a saved checkpoint, which saves time and computing costs.
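For example, the following minimal sketch enables failure recovery by allowing the whole group of child jobs to be recreated up to three times. The resource name, image, and command are placeholders; the failurePolicy and replicatedJobs fields are the same ones used in the configuration examples later in this chapter.

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: resilient-example
spec:
  failurePolicy:
    maxRestarts: 3            # recreate all child jobs up to three times after a failure
  replicatedJobs:
  - name: workers
    replicas: 1
    template:
      spec:
        parallelism: 2
        completions: 2
        backoffLimit: 0
        template:
          spec:
            containers:
            - name: worker
              image: docker.io/bash:latest         # placeholder image
              command: ["bash", "-c", "sleep 300"]

Each pod in the group also gets a stable DNS name in the format <jobset_name>-<replicated_job_name>-<job_index>-<pod_index>.<jobset_name>, for example resilient-example-workers-0-0.resilient-example, which is how workers can address each other across restarts.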

The JobSet Operator also offers startup control, allowing you to define a specific startup sequence to ensure that dependencies are met, for example, that the leader is running before any workers attempt to connect.
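The following sketch illustrates this kind of ordering with the startupPolicy field from the upstream JobSet API. The startupPolicy field names are an assumption based on the upstream project rather than on the procedures in this chapter, and the resource name, images, and commands are placeholders.

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: ordered-startup-example
spec:
  startupPolicy:
    startupPolicyOrder: InOrder    # start replicated jobs in the order they are listed
  replicatedJobs:
  - name: leader                   # started first
    template:
      spec:
        parallelism: 1
        completions: 1
        template:
          spec:
            containers:
            - name: leader
              image: docker.io/bash:latest       # placeholder image
              command: ["bash", "-c", "sleep 600"]
  - name: workers                  # created only after the leader job is ready
    template:
      spec:
        parallelism: 2
        completions: 2
        template:
          spec:
            containers:
            - name: worker
              image: docker.io/bash:latest       # placeholder image
              command: ["bash", "-c", "sleep 600"]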

The JobSet Operator makes managing large, distributed, and coordinated computing tasks on OpenShift Container Platform easier by turning many individual components into one resilient and manageable system.

4.2. JobSet Operator release notes

Track the development, features, and fixes for the JobSet Operator, which manages coordinated, large-scale computing workloads on OpenShift Container Platform.

For more information, see About the JobSet Operator.

4.2.1. Release notes for JobSet Operator 1.0

Review the new features and advisories for the initial release of JobSet Operator 1.0.

Issued: 12 February 2026

The following advisories are available for the JobSet Operator 1.0:

4.2.1.1. New features and enhancements

  • This is the initial Generally Available release of the JobSet Operator.

4.3. Installing the JobSet Operator

Install the JobSet Operator on OpenShift Container Platform to enable management of large-scale, coordinated computing workloads, giving your applications a unified API and failure recovery.

4.3.1. Installing the JobSet Operator

Install the JobSet Operator on OpenShift Container Platform using the web console to begin managing large-scale, coordinated computing workloads.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the cert-manager Operator for Red Hat OpenShift.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Verify that the cert-manager Operator for Red Hat OpenShift is installed.
  3. Install the JobSet Operator.

    1. Navigate to Ecosystem → Software Catalog.
    2. Search for and select the openshift-operators project.
    3. Enter JobSet Operator into the filter box.
    4. Select the JobSet Operator and click Install.
    5. On the Install Operator page:

      1. The Update channel is set to stable-v1.0, which installs the latest stable release of JobSet Operator.
      2. Under Installation mode, select A specific namespace on the cluster.
      3. Under Installed Namespace, select Operator recommended Namespace: openshift-jobset-operator.
      4. Under Update approval, select one of the following update strategies:

        • The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
        • The Manual strategy requires a user with appropriate credentials to approve the Operator update.
      5. Click Install.
  4. Create the custom resource (CR) for the JobSet Operator:

    1. Navigate to Installed Operators → JobSet Operator.
    2. Under Provided APIs, click Create instance in the JobSetOperator pane.
    3. Set the name to cluster.
    4. Set the managementState to Managed.
    5. Click Create.

Verification

  • Check that the JobSet Operator and operand pods are running by entering the following command:

    $ oc get pod -n openshift-jobset-operator

    Example output

    NAME                                        READY   STATUS    RESTARTS   AGE
    jobset-controller-manager-5595547fb-b4g2x   1/1     Running   0          48s
    jobset-operator-596cb848c6-q2dmp            1/1     Running   0          2m33s
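  • As an optional additional check, confirm that the ClusterServiceVersion (CSV) for the JobSet Operator reports the Succeeded phase by running the following command:

    $ oc get csv -n openshift-jobset-operator

    The PHASE column of the output should show Succeeded for the JobSet Operator.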

4.4. Managing workloads with the JobSet Operator

Use the JobSet Operator on OpenShift Container Platform to deploy and manage JobSet resources for large-scale, coordinated workloads such as high-performance computing (HPC) and AI training. Features like coordinators, failure policies, and shared persistent volume claims help you recover quickly and use resources efficiently.

4.4.1. Deploying a JobSet

You can use the JobSet Operator to deploy a JobSet to manage and run large-scale, coordinated workloads.

Prerequisites

  • You have installed the JobSet Operator.
  • You have a cluster with available NVIDIA GPUs.

Procedure

  1. Create a new project by running the following command:

    $ oc new-project <my_namespace>
  2. Create a file named jobset.yaml:

    apiVersion: jobset.x-k8s.io/v1alpha2
    kind: JobSet
    metadata:
      name: pytorch
    spec:
      replicatedJobs:
      - name: workers
        template:
          spec:
            parallelism: <pods_running_number>
            completions: <pods_finish_number>
            backoffLimit: 0
            template:
              spec:
                imagePullSecrets:
                  - name: my-registry-secret
                initContainers:
                  - name: prepare
                    image: docker.io/alpine/git:v2.52.0
                    args: ['clone', 'https://github.com/pytorch/examples']
                    volumeMounts:
                      - name: workdir
                        mountPath: /git
                containers:
                  - name: pytorch
                    image: docker.io/pytorch/pytorch:2.10.0-cuda13.0-cudnn9-runtime
                    resources:
                      limits:
                        nvidia.com/gpu: "1"
                      requests:
                        nvidia.com/gpu: "1"
                    ports:
                    - containerPort: 4321
                    env:
                    - name: MASTER_ADDR
                      value: "pytorch-workers-0-0.pytorch"
                    - name: MASTER_PORT
                      value: "4321"
                    - name: RANK
                      valueFrom:
                        fieldRef:
                          fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
                    - name: PYTHONUNBUFFERED
                      value: "0"
                    command:
                    - /bin/sh
                    - -c
                    - |
                      cd examples/distributed/ddp-tutorial-series
                      torchrun --nproc_per_node=1 --nnodes=3 --rdzv_id=100 --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT multinode.py 1000 100
                    volumeMounts:
                      - name: workdir
                        mountPath: /workspace
                volumes:
                  - name: workdir
                    emptyDir: {}

    where:

    <pods_running_number>
    Specifies the number of pods running at the same time.
    <pods_finish_number>
    Specifies the total number of pods that must finish successfully for the job to be marked complete.
  3. Apply the JobSet configuration by running the following command:

    $ oc apply -f jobset.yaml

Verification

  • Verify that pods were started by running the following command:

    $ oc get pods -n <my_namespace>

    Example output

    NAME                        READY   STATUS    RESTARTS   AGE
    pytorch-workers-0-0-2lzwt   1/1     Running   0          2m17s
    pytorch-workers-0-1-g2lrv   1/1     Running   0          2m17s
    pytorch-workers-0-2-dpljq   1/1     Running   0          2m17s
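  • You can also check the overall status of the JobSet resource by running the following command. The name pytorch matches the metadata.name value in the jobset.yaml file:

    $ oc get jobset pytorch -n <my_namespace>

    The output lists the JobSet and its current status.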

4.4.2. Specifying a JobSet coordinator

To manage communication between JobSet pods, you can assign a specific JobSet coordinator pod. This ensures that your distributed workloads can reference a stable network endpoint as a central point of coordination for task synchronization and data exchange.

Prerequisites

  • You have installed the JobSet Operator.

Procedure

  1. Create a new project by running the following command:

    $ oc new-project <new_namespace>
  2. Create a YAML file called jobset-coordinator.yaml:

    Example YAML file

    apiVersion: jobset.x-k8s.io/v1alpha2
    kind: JobSet
    metadata:
      name: coordinator
    spec:
      coordinator:
        replicatedJob: driver
        jobIndex: 0
        podIndex: 0
      replicatedJobs:
      - name: workers
        template:
          spec:
            parallelism: <pods_running_number>
            completions: <pods_finish_number>
            backoffLimit: 0
            template:
              spec:
                containers:
                - name: worker
                  env:
                    - name: COORDINATOR_ENDPOINT
                      valueFrom:
                        fieldRef:
                          fieldPath: metadata.labels['jobset.sigs.k8s.io/coordinator']
                  image: quay.io/nginx/nginx-unprivileged:1.29-alpine
                  command: [ "/bin/sh", "-c" ]
                  args:
                    - |
                      while ! curl -s "${COORDINATOR_ENDPOINT}:8080" | grep Welcome; do
                        sleep 3
                      done
                      sleep 100
      - name: driver
        template:
          spec:
            parallelism: <pods_running_number>
            completions: <pods_finish_number>
            backoffLimit: 0
            template:
              spec:
                containers:
                - name: driver
                  image: quay.io/nginx/nginx-unprivileged:1.29-alpine
                  ports:
                  - containerPort: 8080
                    protocol: TCP

    where:

    <pods_running_number>
    Specifies the number of pods running at the same time.
    <pods_finish_number>
    Specifies the total number of pods that must finish successfully for the job to be marked complete.
  3. Apply the jobset-coordinator.yaml file by running the following command:

    $ oc apply -f jobset-coordinator.yaml

Verification

  • Verify that pods were created by running the following command:

    $ oc get pods -n <new_namespace>

    Example output

    NAME                            READY   STATUS              RESTARTS   AGE
    coordinator-driver-0-0-svgk7    1/1     Running             0          67s
    coordinator-workers-0-0-57jvg   1/1     Running             0          67s
    coordinator-workers-0-1-mghvx   1/1     Running             0          67s
    coordinator-workers-0-2-7cnvv   1/1     Running             0          67s
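  • Optionally, you can display the coordinator endpoint that is recorded in the jobset.sigs.k8s.io/coordinator label on each pod, which is the same value that the worker containers read through the COORDINATOR_ENDPOINT environment variable. Replace <worker_pod_name> with one of the worker pod names from the previous output:

    $ oc get pod <worker_pod_name> -n <new_namespace> -o jsonpath='{.metadata.labels.jobset\.sigs\.k8s\.io/coordinator}'

    The value is the stable DNS name of the driver pod, for example coordinator-driver-0-0.coordinator.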

4.4.3. Configuring a JobSet failure policy

To control workload behavior in response to child job failures, you can configure a JobSet failure policy. This enables you to define specific actions, such as restarting or failing the entire JobSet, based on the failure reason or the specific replicated job affected.

4.4.3.1. Failure policy actions

These actions are available when a job failure matches a defined rule.


FailJobSet

Marks the entire JobSet as failed immediately.

RestartJobSet

Restarts the JobSet by recreating all child jobs. This action counts toward the maxRestarts limit. This is the default action if no rules match.

RestartJobSetAndIgnoreMaxRestarts

Restarts the JobSet without counting toward the maxRestarts limit.

4.4.3.2. Rule-targeting attributes

Use the following attributes to define failure rules.


targetReplicatedJobs

Specifies which replicated jobs trigger the rule.

onJobFailureReasons

Triggers the rule based on the specific job failure reason. Valid values include BackoffLimitExceeded, DeadlineExceeded, and PodFailurePolicy.
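For example, the following sketch combines both attributes in a single rule so that the JobSet restarts only when the workers replicated job fails because its backoff limit is exceeded. The resource name is a placeholder and the replicatedJobs section is abbreviated; a complete manifest is shown in the configuration example that follows.

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: rule-targeting-example
spec:
  failurePolicy:
    maxRestarts: 3
    rules:
      - action: RestartJobSet
        onJobFailureReasons:
        - BackoffLimitExceeded       # match only this failure reason
        targetReplicatedJobs:
        - workers                    # match failures from the workers job only
  replicatedJobs:
  - name: workers
#...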

4.4.3.3. Configuration example

This configuration marks the JobSet as failed if the leader job fails.

Example YAML file that marks the JobSet as failed if the leader job fails

apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: failjobset-action-example
spec:
  failurePolicy:
    maxRestarts: 3
    rules:
      - action: FailJobSet
        targetReplicatedJobs:
        - leader
  replicatedJobs:
  - name: leader
    replicas: 1
    template:
      spec:
        backoffLimit: 0
        completions: 2
        parallelism: 2
        template:
          spec:
            containers:
            - name: leader
              image: docker.io/bash:latest
              command:
              - bash
              - -xc
              - |
                echo "JOB_COMPLETION_INDEX=$JOB_COMPLETION_INDEX"
                if [[ "$JOB_COMPLETION_INDEX" == "0" ]]; then
                  for i in $(seq 10 -1 1)
                  do
                    echo "Sleeping in $i"
                    sleep 1
                  done
                  exit 1
                fi
                for i in $(seq 1 1000)
                do
                  echo "$i"
                  sleep 1
                done
  - name: workers
    replicas: 1
    template:
      spec:
        backoffLimit: 0
        completions: 2
        parallelism: 2
        template:
          spec:
            containers:
            - name: worker
              image: docker.io/bash:latest
              command:
              - bash
              - -xc
              - |
                sleep 1000

4.4.4. Configuring shared persistent volume claims for a JobSet

You can configure a JobSet to automatically create and manage shared persistent volume claims (PVCs) across multiple replicated jobs. This is useful for workloads that require shared access to datasets, models, or checkpoints.

Prerequisites

  • You have the JobSet Operator installed in your cluster.
  • You have set a default storage class or chosen a storage class for your workload.

Procedure

  1. Define the volume templates in the spec.volumeClaimPolicies section of your JobSet YAML file.

    apiVersion: jobset.x-k8s.io/v1alpha2
    kind: JobSet
    metadata:
      name: <job_name>
    spec:
      volumeClaimPolicies:
        - templates:
            - metadata:
                name: <persistent_volume_claim_name>
              spec:
                accessModes: ["ReadWriteOnce"]
                storageClassName: mystorageclass
                resources:
                  requests:
                    storage: 1Gi
          retentionPolicy:
            whenDeleted: Retain

    where:

    <job_name>
    Specifies a unique identifier for the JobSet within your namespace.
    <persistent_volume_claim_name>
    Specifies the name for the PVC. The name that you specify here is also used as the volumeMounts name. A volume is automatically added to each pod, which mounts a PVC created with a name in the format <persistent_volume_claim_name>-<job_name>.
    whenDeleted
    Specifies the retention policy that applies when the JobSet is deleted. Optionally, you can keep the data after the JobSet is deleted by setting this value to Retain.
  2. In your replicatedJobs configuration, add a volumeMount that matches the template name you defined.

    apiVersion: jobset.x-k8s.io/v1alpha2
    kind: JobSet
    metadata:
      name: <job_name>
    spec:
      replicatedJobs:
      - name: workers
        template:
          spec:
            parallelism: 2
            completions: 2
            backoffLimit: 0
            template:
              spec:
                imagePullSecrets:
                  - name: my-registry-secret
                initContainers:
                  - name: prepare
                    image: docker.io/alpine/git:v2.52.0
                    args: ['clone', 'https://github.com/pytorch/examples']
                    volumeMounts:
                      - name: <persistent_volume_claim_name>
                        mountPath: /git/checkpoint
    #...
  3. Apply the JobSet configuration by running the following command:

    $ oc apply -f <jobset_yaml>

Verification

  • Verify that the PVCs were created with names in the format <persistent_volume_claim_name>-<job_name> by running the following command:

    $ oc get pvc

    Example output

    NAME          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    pvc-1       Bound    pvc-385996a0-70af-4791-aa8e-9e6459e6b123   3Gi        RWO            file-storage   3d
    pvc-2       Bound    pvc-8aeddd4d-aad5-4039-8d04-640a71c9a72d   12Gi       RWO            file-storage   3d
    pvc-3       Bound    pvc-0050144d-940c-4c4e-a23a-2a660a5490eb   12Gi       RWO            file-storage   3d
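  • Optionally, confirm that the shared volume is mounted in one of the JobSet pods. Replace <pod_name> with a pod name from your JobSet:

    $ oc describe pod <pod_name> | grep -A 5 'Mounts:'

    The output includes the mount path that you configured, such as /git/checkpoint in the example above.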

4.5. Uninstalling the JobSet Operator

Uninstall the JobSet Operator by using the OpenShift Container Platform web console to remove the Operator instance and its resources from your cluster.

4.5.1. Uninstalling the JobSet Operator

Uninstall the JobSet Operator by using the OpenShift Container Platform web console to remove the Operator instance.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the JobSet Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Navigate to Operators → Installed Operators.
  3. Select openshift-jobset-operator from the Project dropdown list.
  4. Delete the JobSetOperator instance.

    1. Click JobSet Operator and select the JobSetOperator tab.
    2. Click the Options menu next to the cluster entry and select Delete JobSetOperator.
    3. In the confirmation dialog, click Delete.
  5. Uninstall the JobSet Operator.

    1. Navigate to Operators → Installed Operators.
    2. Click the Options menu next to the JobSet Operator entry and click Uninstall Operator.
    3. In the confirmation dialog, click Uninstall.

4.5.2. Uninstalling JobSet Operator resources

Optionally, after uninstalling the JobSet Operator, you can remove its related resources from your cluster.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have uninstalled the JobSet Operator.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Remove CRDs that were created when the JobSet Operator was installed:

    1. Navigate to Administration → CustomResourceDefinitions.
    2. Enter JobSetOperator in the Name field to filter the CRDs.
    3. Click the Options menu next to the JobSetOperator CRD and select Delete CustomResourceDefinition.
    4. In the confirmation dialog, click Delete.
  3. Delete the openshift-jobset-operator namespace.

    1. Navigate to Administration → Namespaces.
    2. Find openshift-jobset-operator in the list of namespaces.
    3. Click the Options menu next to the openshift-jobset-operator entry and select Delete Namespace.
    4. In the confirmation dialog, enter openshift-jobset-operator and click Delete.