
Chapter 5. Installing the distributed workloads components


To use the distributed workloads feature in OpenShift AI, you must install several components.

Prerequisites

  • You have logged in to OpenShift with the cluster-admin role and you can access the data science cluster.
  • You have installed Red Hat OpenShift AI.
  • You have installed the Red Hat build of Kueue Operator on your OpenShift cluster, as described in the Red Hat build of Kueue documentation.
  • You have sufficient resources. In addition to the minimum OpenShift AI resources described in Installing and deploying OpenShift AI (for disconnected environments, see Deploying OpenShift AI in a disconnected environment), you need 1.6 vCPU and 2 GiB of memory to deploy the distributed workloads infrastructure.
  • You have removed any previously installed instances of the CodeFlare Operator, as described in the Knowledgebase solution How to migrate from a separately installed CodeFlare Operator in your data science cluster.
  • If you want to use graphics processing units (GPUs), you have enabled GPU support in OpenShift AI. If you use NVIDIA GPUs, see Enabling NVIDIA GPUs. If you use AMD GPUs, see AMD GPU integration.

    Note

    In OpenShift AI, Red Hat supports the use of accelerators within the same cluster only.

    Starting with Red Hat OpenShift AI 2.19, Red Hat supports remote direct memory access (RDMA) for NVIDIA GPUs only. RDMA enables the GPUs to communicate directly with each other by using NVIDIA GPUDirect RDMA across either Ethernet or InfiniBand networks.

  • If you want to use self-signed certificates, you have added them to a central Certificate Authority (CA) bundle as described in Working with certificates (for disconnected environments, see Working with certificates). No additional configuration is necessary to use those certificates with distributed workloads. The centrally configured self-signed certificates are automatically available in the workload pods at the following mount points:

    • Cluster-wide CA bundle:

      /etc/pki/tls/certs/odh-trusted-ca-bundle.crt
      /etc/ssl/certs/odh-trusted-ca-bundle.crt
    • Custom CA bundle:

      /etc/pki/tls/certs/odh-ca-bundle.crt
      /etc/ssl/certs/odh-ca-bundle.crt
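
    The following Python sketch shows one way a workload might pick up the mounted bundles when it opens its own TLS connections. It assumes only Python's standard ssl module; the helper names existing_ca_bundles and make_ssl_context are hypothetical, and adding the bundles on top of the system trust store (rather than replacing it) is a design choice of this sketch, not a documented requirement.

```python
import os
import ssl
from typing import Callable, Iterable, List

# Mount points listed above for the centrally configured CA bundles.
# Each bundle appears at two locations; loading a duplicate is harmless.
CA_BUNDLE_CANDIDATES = (
    "/etc/pki/tls/certs/odh-trusted-ca-bundle.crt",
    "/etc/ssl/certs/odh-trusted-ca-bundle.crt",
    "/etc/pki/tls/certs/odh-ca-bundle.crt",
    "/etc/ssl/certs/odh-ca-bundle.crt",
)


def existing_ca_bundles(
    candidates: Iterable[str] = CA_BUNDLE_CANDIDATES,
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return the candidate bundle paths that are present, in order."""
    return [path for path in candidates if exists(path)]


def make_ssl_context(
    candidates: Iterable[str] = CA_BUNDLE_CANDIDATES,
) -> ssl.SSLContext:
    """Build a client TLS context that trusts system CAs plus mounted bundles."""
    # create_default_context() loads the system trust store.
    context = ssl.create_default_context()
    # Add each mounted bundle so self-signed certificates are also trusted.
    for path in existing_ca_bundles(candidates):
        context.load_verify_locations(cafile=path)
    return context
```

    For example, urllib.request.urlopen(url, context=make_ssl_context()) would then accept servers signed either by a public CA or by a certificate in the mounted bundles.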

Procedure

  1. In the OpenShift console, click Operators → Installed Operators.
  2. Search for the Red Hat OpenShift AI Operator, and then click the Operator name to open the Operator details page.
  3. Click the Data Science Cluster tab.
  4. Click the default instance name (for example, default-dsc) to open the instance details page.
  5. Click the YAML tab to show the instance specifications.
  6. Enable the required distributed workloads components. In the spec.components section, set the managementState field of each required component as follows:

    • Set kueue to Unmanaged to allow the Red Hat build of Kueue Operator to manage Kueue.
    • If you want to use the CodeFlare framework to tune models, set codeflare and ray to Managed.
    • If you want to use the Kubeflow Training Operator to tune models, set trainingoperator to Managed.
    • The required components and their states depend on whether you run the distributed workload from a pipeline, a workbench, or both, as shown in the following table.
    Table 5.1. Components required for distributed workloads

    Component               Pipelines only   Workbenches only   Pipelines and workbenches
    codeflare               Managed          Managed            Managed
    dashboard               Managed          Managed            Managed
    datasciencepipelines    Managed          Removed            Managed
    kueue                   Unmanaged        Unmanaged          Unmanaged
    ray                     Managed          Managed            Managed
    trainingoperator        Managed          Managed            Managed
    workbenches             Removed          Managed            Managed

  7. Click Save. After a short time, the components with a Managed state are ready.
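
    As a cross-check of step 6, the following Python sketch encodes Table 5.1 as data and builds the matching spec.components fragment for one scenario. The helper names and the scenario labels pipelines, workbenches, and both are hypothetical, chosen for this sketch only.

```python
import json
from typing import Dict

# Table 5.1 as data: component -> (pipelines only, workbenches only, both).
COMPONENT_STATES = {
    "codeflare": ("Managed", "Managed", "Managed"),
    "dashboard": ("Managed", "Managed", "Managed"),
    "datasciencepipelines": ("Managed", "Removed", "Managed"),
    "kueue": ("Unmanaged", "Unmanaged", "Unmanaged"),
    "ray": ("Managed", "Managed", "Managed"),
    "trainingoperator": ("Managed", "Managed", "Managed"),
    "workbenches": ("Removed", "Managed", "Managed"),
}

# Column order matches the tuples above.
SCENARIOS = ("pipelines", "workbenches", "both")


def component_states(scenario: str) -> Dict[str, str]:
    """Return the managementState for every component in one scenario."""
    if scenario not in SCENARIOS:
        raise ValueError(f"scenario must be one of {SCENARIOS}, got {scenario!r}")
    column = SCENARIOS.index(scenario)
    return {name: states[column] for name, states in COMPONENT_STATES.items()}


def dsc_components_patch(scenario: str) -> dict:
    """Build a fragment that sets spec.components.<name>.managementState."""
    return {
        "spec": {
            "components": {
                name: {"managementState": state}
                for name, state in component_states(scenario).items()
            }
        }
    }


if __name__ == "__main__":
    # JSON is also valid YAML, so this can be compared with the YAML tab.
    print(json.dumps(dsc_components_patch("both"), indent=2))
```

    If you prefer the CLI to the console, a merge patch such as oc patch datasciencecluster default-dsc --type=merge -p "$(python dsc_patch.py)" should have the same effect. The file name dsc_patch.py is hypothetical, and you should confirm the instance name on your cluster first.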

Verification

Check the status of the codeflare-operator-manager, kubeflow-training-operator, kuberay-operator, kueue-controller-manager, and openshift-kueue-operator pods, as follows:

  1. In the OpenShift console, click Workloads → Deployments.
  2. In the Search by name field, enter the following search strings:

    • In the redhat-ods-applications project, search for codeflare-operator-manager, kubeflow-training-operator, and kuberay-operator.
    • In the openshift-kueue-operator project, search for kueue-controller-manager and openshift-kueue-operator.
  3. In each case, check the status as follows:

    1. Click the deployment name to open the deployment details page.
    2. Click the Pods tab.
    3. Check the pod status.

      When the status of the pods is Running, the pods are ready to use.

    4. To see more information about each pod, click the pod name to open the pod details page, and then click the Logs tab.
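
    The console checks above can also be scripted. The following Python sketch reads the output of oc get pods -n <project> -o json and reports which of the listed deployments do not yet have all pods in the Running phase with ready containers. Matching pods to deployments by the name prefix "<deployment>-" is a heuristic that relies on default Deployment pod naming, and the helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, List

# Deployments to check, per project, as listed in the verification steps.
EXPECTED = {
    "redhat-ods-applications": [
        "codeflare-operator-manager",
        "kubeflow-training-operator",
        "kuberay-operator",
    ],
    "openshift-kueue-operator": [
        "kueue-controller-manager",
        "openshift-kueue-operator",
    ],
}


def pod_readiness(pod_list_json: str) -> Dict[str, bool]:
    """Map pod name -> True if the pod is Running and all containers are ready."""
    data = json.loads(pod_list_json)
    result = {}
    for item in data.get("items", []):
        status = item.get("status", {})
        containers = status.get("containerStatuses", [])
        result[item["metadata"]["name"]] = (
            status.get("phase") == "Running"
            and bool(containers)
            and all(c.get("ready") for c in containers)
        )
    return result


def not_ready(deployments: Iterable[str], readiness: Dict[str, bool]) -> List[str]:
    """Return deployments that have no matching pods or any pod not ready."""
    problems = []
    for deployment in deployments:
        matched = [ok for name, ok in readiness.items()
                   if name.startswith(deployment + "-")]
        if not matched or not all(matched):
            problems.append(deployment)
    return problems
```

    For example, pass the JSON from oc get pods -n redhat-ods-applications -o json to pod_readiness, and then call not_ready(EXPECTED["redhat-ods-applications"], ...) on the result. An empty list means that every listed deployment in that project is ready.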

Next Step

Configure the distributed workloads feature as described in Managing distributed workloads.

© 2026 Red Hat