Chapter 4. JobSet Operator


4.1. JobSet Operator overview

Use the JobSet Operator on OpenShift Container Platform to manage and run large-scale, coordinated workloads like high-performance computing (HPC) and AI training. Features like multi-template job support and stable networking can help you recover quickly and use resources efficiently.

Important

The JobSet Operator is not currently available for OpenShift Container Platform 4.21. The Operator is planned to be released in the near future.

4.1.1. About the JobSet Operator

Use the JobSet Operator on OpenShift Container Platform to manage large, distributed, and coordinated computing workloads, such as high-performance computing (HPC) or artificial intelligence (AI) training, and gain automatic stability, coordination, and failure recovery.

The JobSet Operator is based on the JobSet open source project.

The JobSet Operator is designed to manage a group of jobs as a single, coordinated unit. This is especially useful in fields such as HPC and the training of very large AI models, where a group of machines must run together for hours or days.

You can use the JobSet Operator to solve problems that are too big or too complex for a standard OpenShift Container Platform job. The JobSet Operator provides coordination, stability, and recovery.

The JobSet Operator automatically sets up a stable headless service so that workers can look up each other's IP addresses and communicate, even after a failure and restart. It also provides automatic failure recovery: if one small part of a large training job fails, you can configure the Operator to restart the entire group of workers from a saved checkpoint. This saves time and computing costs.
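As a rough illustration of the networking and recovery behavior described above, the following shell sketch writes a hypothetical JobSet manifest to a local file. It assumes the upstream `jobset.x-k8s.io/v1alpha2` API, which might not match the version that the Operator installs. The resource name, image, and command are placeholders, and resuming from a checkpoint is assumed to be handled by the training application itself.

```shell
# Write a hypothetical JobSet manifest to a local file.
# Nothing is sent to a cluster by this snippet.
cat > jobset-training-example.yaml <<'EOF'
# Assumption: upstream JobSet API version; confirm what your cluster serves.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: training-example              # hypothetical name
spec:
  network:
    enableDNSHostnames: true          # stable hostnames through a headless service
  failurePolicy:
    maxRestarts: 3                    # restart the whole group at most 3 times
  replicatedJobs:
  - name: workers
    replicas: 1
    template:
      spec:
        completionMode: Indexed
        parallelism: 4
        completions: 4
        backoffLimit: 0
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: trainer
              image: registry.example.com/trainer:latest   # placeholder image
              # Hypothetical entry point; checkpoint loading is up to the application.
              command: ["python", "train.py", "--resume-from-checkpoint"]
EOF
```

If the manifest matches the API available on your cluster, it could be submitted with `oc apply -f jobset-training-example.yaml`.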

The JobSet Operator offers startup control: you can define a specific startup sequence to ensure that dependencies are met, for example by making sure that the leader is running before any workers attempt to connect.
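A leader-before-workers sequence might be expressed as in the following sketch, which writes a hypothetical manifest to a local file. It assumes the `startupPolicy` field of the upstream `jobset.x-k8s.io/v1alpha2` API; the field names available in your installed version might differ, and all resource names and images are placeholders.

```shell
# Write a hypothetical JobSet manifest with an ordered startup to a local file.
# Nothing is sent to a cluster by this snippet.
cat > jobset-startup-order-example.yaml <<'EOF'
# Assumption: upstream JobSet API version; confirm what your cluster serves.
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: startup-order-example         # hypothetical name
spec:
  startupPolicy:
    startupPolicyOrder: InOrder       # start replicated jobs in the order listed below
  replicatedJobs:
  - name: leader                      # listed first, so it starts first
    replicas: 1
    template:
      spec:
        parallelism: 1
        completions: 1
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: leader
              image: registry.example.com/leader:latest    # placeholder image
  - name: workers                     # started after the leader
    replicas: 1
    template:
      spec:
        parallelism: 4
        completions: 4
        template:
          spec:
            restartPolicy: Never
            containers:
            - name: worker
              image: registry.example.com/worker:latest    # placeholder image
EOF
```

The order of the entries under `replicatedJobs` carries the dependency in this sketch, so the leader entry is placed before the workers entry.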

The JobSet Operator makes it easier to manage large, distributed, and coordinated computing tasks on OpenShift Container Platform by turning many individual components into one resilient and manageable system.

4.2. Installing the JobSet Operator

Install the JobSet Operator on OpenShift Container Platform to enable management of large-scale, coordinated computing workloads, giving your applications a unified API and failure recovery.

4.2.1. Installing the JobSet Operator

Install the JobSet Operator on OpenShift Container Platform using the web console to begin managing large-scale, coordinated computing workloads.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.
  • You have installed the cert-manager Operator for Red Hat OpenShift.

Procedure

  1. Log in to the OpenShift Container Platform web console.
  2. Verify that the cert-manager Operator for Red Hat OpenShift is installed.
  3. Install the JobSet Operator.

    1. Navigate to Ecosystem → Software Catalog.
    2. Search for and select the openshift-operators project.
    3. Enter JobSet Operator into the filter box.
    4. Select the JobSet Operator and click Install.
    5. On the Install Operator page:

      1. The Update channel is set to tech-preview-v0.1, which installs the latest stable release of JobSet Operator 0.1.
      2. Under Installation mode, select A specific namespace on the cluster.
      3. Under Installed Namespace, select Operator recommended Namespace: openshift-jobset-operator.
      4. Under Update approval, select one of the following update strategies:

        • The Automatic strategy allows Operator Lifecycle Manager (OLM) to automatically update the Operator when a new version is available.
        • The Manual strategy requires a user with appropriate credentials to approve the Operator update.
      5. Click Install.
  4. Create the custom resource (CR) for the JobSet Operator:

    1. Navigate to Installed Operators → JobSet Operator.
    2. Under Provided APIs, click Create instance in the JobSetOperator pane.
    3. Set the name to cluster.
    4. Set the managementState to Managed.
    5. Click Create.
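For reference, the custom resource that the preceding web console steps create might look like the manifest in the following sketch. Only the kind `JobSetOperator`, the name `cluster`, and the `managementState` value come from this procedure; the `apiVersion` is an assumption and should be confirmed on your cluster before use.

```shell
# Write a hypothetical JobSetOperator manifest to a local file.
# Nothing is sent to a cluster by this snippet.
cat > jobsetoperator-cluster.yaml <<'EOF'
# Assumption: the API group and version are not stated in this document.
# Confirm them first, for example with: oc api-resources | grep -i jobset
apiVersion: operator.openshift.io/v1
kind: JobSetOperator
metadata:
  name: cluster
spec:
  managementState: Managed
EOF
```

If the API version matches, the file could be applied with `oc apply -f jobsetoperator-cluster.yaml` as an alternative to creating the instance in the web console.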

Verification

  • Check that the JobSet Operator and operand pods are running by entering the following command:

    $ oc get pod -n openshift-jobset-operator

    Example output

    NAME                                        READY   STATUS    RESTARTS   AGE
    jobset-controller-manager-5595547fb-b4g2x   1/1     Running   0          48s
    jobset-operator-596cb848c6-q2dmp            1/1     Running   0          2m33s
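To script this check rather than read the table by eye, a small helper along the following lines could be used. This is a minimal sketch: the function name `all_pods_running` is hypothetical, and it assumes the default table layout shown in the example output, with STATUS in the third column.

```shell
# Succeed only if the table has at least one pod row and every row
# shows "Running" in the third (STATUS) column. Reads the table on stdin.
all_pods_running() {
  awk 'NR > 1 && NF > 0 { seen = 1; if ($3 != "Running") bad = 1 }
       END { exit ((seen && !bad) ? 0 : 1) }'
}

# Sample rows copied from the example output in this section.
sample_output='NAME                                        READY   STATUS    RESTARTS   AGE
jobset-controller-manager-5595547fb-b4g2x   1/1     Running   0          48s
jobset-operator-596cb848c6-q2dmp            1/1     Running   0          2m33s'

if printf '%s\n' "$sample_output" | all_pods_running; then
  echo "all pods running"
else
  echo "some pods are not running"
fi
```

Against a live cluster, the same helper could be fed directly: `oc get pod -n openshift-jobset-operator | all_pods_running`.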

4.3. JobSet Operator release notes

Track the development, features, and fixes for the JobSet Operator, which manages coordinated, large-scale computing workloads on OpenShift Container Platform.

For more information, see About the JobSet Operator.