Chapter 3. AMD GPU Operator


AMD Instinct GPU accelerators, combined with the AMD GPU Operator in your OpenShift Container Platform cluster, let you seamlessly harness computing capabilities for machine learning, generative AI, and GPU-accelerated applications.

This documentation provides the information you need to enable, configure, and test the AMD GPU Operator. For more information, see AMD Instinct™ Accelerators.

3.1. About the AMD GPU Operator

The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers using Red Hat OpenShift AI to create artificial intelligence and machine learning (AI/ML) applications. Offloading specific GPU functions can minimize CPU processing and memory usage, improving overall application speed and easing memory and bandwidth constraints.

3.2. Installing the AMD GPU Operator

As a cluster administrator, you can install the AMD GPU Operator by using either the OpenShift CLI or the web console. This is a multi-step procedure that requires installing the Node Feature Discovery Operator, the Kernel Module Management Operator, and then the AMD GPU Operator. Complete the following steps in order to install the AMD community release of the Operator.

Next steps

  1. Install the Node Feature Discovery Operator.
  2. Install the Kernel Module Management Operator.
  3. Install and configure the AMD GPU Operator.
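If you install from the CLI, each Operator install can be expressed as an OLM Subscription manifest. The following is a minimal sketch for the AMD GPU Operator step only; the channel, catalog source, and namespace values shown are assumptions and must be verified against the catalog entry available in your cluster.

```yaml
# Hypothetical OLM Subscription for the AMD community GPU Operator.
# The channel, source, and namespace values are assumptions; check them
# against the operator's catalog entry in your cluster before applying.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: amd-gpu-operator
  namespace: openshift-operators
spec:
  channel: alpha
  name: amd-gpu-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
```

Apply the manifest with `oc apply -f <file>` and confirm the install plan completes before moving to the next Operator.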

3.3. Testing the AMD GPU Operator

Use the following procedure to run the rocminfo tool and view the logs for an AMD MI210 GPU.

Procedure

  1. Create a YAML file that runs rocminfo:

    $ cat << EOF > rocminfo.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: rocminfo
    spec:
      containers:
      - image: docker.io/rocm/pytorch:latest
        name: rocminfo
        command: ["/bin/sh","-c"]
        args: ["rocminfo"]
        resources:
          limits:
            amd.com/gpu: 1
          requests:
            amd.com/gpu: 1
      restartPolicy: Never
    EOF
  2. Create the rocminfo pod:

    $ oc create -f rocminfo.yaml

    Example output

    pod/rocminfo created

  3. Check the rocminfo log with one MI210 GPU:

    $ oc logs rocminfo | grep -A5 "Agent"

    Example output

    HSA Agents
    ==========
    *******
    Agent 1
    *******
      Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
      Uuid:                    CPU-XX
      Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
      Vendor Name:             CPU
    --
    Agent 2
    *******
      Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
      Uuid:                    CPU-XX
      Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
      Vendor Name:             CPU
    --
    Agent 3
    *******
      Name:                    gfx90a
      Uuid:                    GPU-024b776f768a638b
      Marketing Name:          AMD Instinct MI210
      Vendor Name:             AMD
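When several agents are listed, you can narrow the output to just the AMD GPU entry. The sketch below runs the filter against an abridged inline sample of the output shown above; on a live cluster you would pipe `oc logs rocminfo` into the same `grep` instead of the heredoc.

```shell
# Keep only the AMD GPU agent block from rocminfo output.
# The heredoc stands in for: oc logs rocminfo
cat << 'EOF' | grep -B 3 "Vendor Name:.*AMD"
  Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Vendor Name:             CPU
  Name:                    gfx90a
  Uuid:                    GPU-024b776f768a638b
  Marketing Name:          AMD Instinct MI210
  Vendor Name:             AMD
EOF
```

The CPU agents' `Vendor Name: CPU` lines do not match the pattern, so only the four lines describing the GPU agent are printed.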

  4. Delete the pod:

    $ oc delete -f rocminfo.yaml

    Example output

    pod "rocminfo" deleted
