Chapter 4. JobSet Operator
4.1. JobSet Operator overview
Use the JobSet Operator on OpenShift Container Platform to easily manage and run large-scale, coordinated workloads like high-performance computing (HPC) and AI training. The JobSet Operator can help you gain fast recovery and efficient resource use through features like multi-template job support and stable networking.
JobSet Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
4.1.1. About the JobSet Operator
Use the JobSet Operator on OpenShift Container Platform to manage large, distributed, and coordinated computing workloads, such as high-performance computing (HPC) or artificial intelligence (AI) training, and gain automatic stability, coordination, and failure recovery.
The JobSet Operator is based on the JobSet open source project.
The JobSet Operator is designed to manage a group of jobs as a single, coordinated unit. This is especially useful in fields such as HPC and large-scale AI model training, where a team of machines must run together for hours or days.
You can use the JobSet Operator to solve problems that are too big or too complex for a single standard OpenShift Container Platform job. The JobSet Operator provides the coordination, stability, and recovery that such workloads need.
The JobSet Operator automatically sets up a stable headless service that gives each pod a predictable DNS hostname, so workers can find and communicate with each other, even after a failure and restart. It also provides automatic failure recovery. If one small part of a large training job fails, the Operator can be configured to restart the entire group of workers so that training resumes from a saved checkpoint. This saves time and computing costs.
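The networking and recovery behavior described above can be sketched as a JobSet manifest. This is a minimal sketch that assumes the upstream JobSet API (`jobset.x-k8s.io/v1alpha2`) on which the Operator is based; the JobSet name `training`, the replicated job `workers`, the image reference, and the `jobset_manifest` and `worker_fqdn` helper functions are hypothetical. `network.enableDNSHostnames` requests the headless service, and `failurePolicy.maxRestarts` caps how many times the whole group is recreated. Resuming from a checkpoint is assumed to be handled by the training code itself.

```shell
#!/bin/sh
# Minimal sketch, assuming the upstream JobSet API (jobset.x-k8s.io/v1alpha2).
# Names ("training", "workers") and the image are hypothetical placeholders.

# Print a JobSet manifest with stable DNS names and whole-group restarts.
jobset_manifest() {
  cat <<'EOF'
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: training
spec:
  network:
    enableDNSHostnames: true      # headless service, stable pod hostnames
  failurePolicy:
    maxRestarts: 3                # recreate all child jobs up to 3 times
  replicatedJobs:
  - name: workers
    replicas: 1
    template:
      spec:
        completionMode: Indexed   # pod index becomes part of the hostname
        parallelism: 2
        completions: 2
        backoffLimit: 0           # one failed pod fails the job
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: worker
              image: registry.example.com/trainer:latest
EOF
}

# Hypothetical helper: the DNS name of one worker pod, following the upstream
# pattern <jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>,
# where the subdomain defaults to the JobSet name.
worker_fqdn() {
  echo "$1-$2-$3-$4.$1"
}

# On a cluster (not run here):
#   jobset_manifest | oc apply -f -
```

With `backoffLimit: 0`, a single failed pod fails its job, and the JobSet then recreates every child job rather than only the failed one, which is what lets all workers restart together.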
The JobSet Operator offers startup control, allowing you to define a specific startup sequence to ensure that dependencies are met. For example, you can make sure that the leader is running before any workers attempt to connect.
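A leader-first startup sequence like this can be expressed with the `startupPolicy` field. The sketch again assumes the upstream `jobset.x-k8s.io/v1alpha2` API; the JobSet and replicated job names, the images, and the `leader_worker_manifest` and `startup_order` helpers are hypothetical. With `startupPolicyOrder: InOrder`, replicated jobs are started in the order they are listed, and upstream JobSet waits for the pods of one replicated job to be ready before creating the next.

```shell
#!/bin/sh
# Minimal sketch, assuming the upstream JobSet API (jobset.x-k8s.io/v1alpha2).
# Names and images are hypothetical placeholders.

# Print a JobSet whose "leader" replicated job starts before "workers".
leader_worker_manifest() {
  cat <<'EOF'
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: leader-worker
spec:
  startupPolicy:
    startupPolicyOrder: InOrder   # start replicated jobs in list order
  replicatedJobs:
  - name: leader                  # listed first, so started first
    replicas: 1
    template:
      spec:
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: leader
              image: registry.example.com/leader:latest
  - name: workers                 # created once the leader pods are ready
    replicas: 1
    template:
      spec:
        parallelism: 4
        completions: 4
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: worker
              image: registry.example.com/worker:latest
EOF
}

# Print the replicated job names in the order InOrder would start them.
startup_order() {
  leader_worker_manifest | sed -n 's/^  - name: \([a-z-]*\).*/\1/p'
}

# On a cluster (not run here):
#   leader_worker_manifest | oc apply -f -
```

Because the order comes from the position in `replicatedJobs`, reordering the list is enough to change which group starts first.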
The JobSet Operator makes large, distributed, and coordinated computing tasks on OpenShift Container Platform easier to manage, turning many individual components into one resilient, manageable system.
4.2. Installing the JobSet Operator
Install the JobSet Operator on OpenShift Container Platform to enable management of large-scale, coordinated computing workloads, giving your applications a unified API and failure recovery.
4.2.1. Installing the JobSet Operator
Install the JobSet Operator on OpenShift Container Platform using the web console to begin managing large-scale, coordinated computing workloads.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
- You have installed the cert-manager Operator for Red Hat OpenShift.
Procedure
- Log in to the OpenShift Container Platform web console.
- Verify that the cert-manager Operator for Red Hat OpenShift is installed.
- Install the JobSet Operator:
  - Navigate to Ecosystem → Software Catalog.
  - Search for and select the openshift-operators project.
  - Enter JobSet Operator into the filter box.
  - Select the JobSet Operator and click Install.
  - On the Install Operator page:
    - The Update channel is set to tech-preview-v0.1, which installs the latest stable release of JobSet Operator 0.1.
    - Under Installation mode, select A specific namespace on the cluster.
    - Under Installed Namespace, select Operator recommended Namespace: openshift-jobset-operator.
    - Under Update approval, select one of the following update strategies:
      - The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
      - The Manual strategy requires a user with appropriate credentials to approve the Operator update.
  - Click Install.
  - Navigate to Ecosystem.
- Create the custom resource (CR) for the JobSet Operator:
  - Navigate to Installed Operators → JobSet Operator.
  - Under Provided APIs, click Create instance in the JobSetOperator pane to open the Create JobSetOperator page.
  - Set the name to cluster.
  - Set the managementState to Managed.
  - Click Create.
  - Navigate to Installed Operators.
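If you prefer the CLI to the web console, the same CR can be sketched as a manifest. The kind `JobSetOperator`, the name `cluster`, and `managementState: Managed` come from the steps above; the `apiVersion` default and the `jobset_operator_cr` helper are assumptions, so confirm the API group and version on your cluster with `oc api-resources | grep -i jobsetoperator` before applying.

```shell
#!/bin/sh
# ASSUMPTION: this apiVersion is a guess; confirm it before applying with
#   oc api-resources | grep -i jobsetoperator
JOBSET_OPERATOR_API_VERSION="${JOBSET_OPERATOR_API_VERSION:-operator.openshift.io/v1}"

# Print the JobSetOperator CR that the web console steps create.
jobset_operator_cr() {
  cat <<EOF
apiVersion: ${JOBSET_OPERATOR_API_VERSION}
kind: JobSetOperator
metadata:
  name: cluster            # name used in the console procedure
spec:
  managementState: Managed
EOF
}

# On a cluster (not run here):
#   jobset_operator_cr | oc apply -f -
```

Overriding `JOBSET_OPERATOR_API_VERSION` in the environment lets you substitute the version that `oc api-resources` reports without editing the function.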
Verification
Check that the JobSet Operator and operand pods are running by entering the following command:
$ oc get pod -n openshift-jobset-operator

Example output

NAME                                        READY   STATUS    RESTARTS   AGE
jobset-controller-manager-5595547fb-b4g2x   1/1     Running   0          48s
jobset-operator-596cb848c6-q2dmp            1/1     Running   0          2m33s
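To script this verification step, a small helper can parse the `oc get pod` table. `all_pods_ready` is a hypothetical function, not part of `oc`; it succeeds only when at least one pod is listed and every pod has `Running` status with all of its containers ready.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every pod in an `oc get pod` table
# (read from stdin) is Running with all of its containers ready.
all_pods_ready() {
  awk '
    NR == 1 { next }                     # skip the header row
    {
      split($2, ready, "/")              # READY column, for example 1/1
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
      pods++
    }
    END { exit (bad || pods == 0) }      # also fail when no pods are listed
  '
}

# On a cluster (not run here):
#   oc get pod -n openshift-jobset-operator | all_pods_ready && echo "JobSet Operator pods are ready"
```

Alternatively, `oc wait --for=condition=Ready pod --all -n openshift-jobset-operator --timeout=180s` blocks until the pods report ready or the timeout expires.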
4.3. JobSet Operator release notes
Track the development, features, and fixes for the JobSet Operator, which manages coordinated, large-scale computing workloads on OpenShift Container Platform.
For more information, see About the JobSet Operator.
4.3.1. Release notes for JobSet Operator 0.1.0
Review the new features and advisories for the initial Technology Preview release of JobSet Operator 0.1.0.
Issued: 4 November 2025
The following advisories are available for the JobSet Operator 0.1.0:
4.3.1.1. New features and enhancements
- This is the initial Technology Preview release of the JobSet Operator.