Chapter 3. AMD GPU Operator

AMD Instinct GPU accelerators, combined with the AMD GPU Operator in your OpenShift Container Platform cluster, let you harness GPU computing capabilities for machine learning, generative AI, and GPU-accelerated applications.

This documentation provides the information you need to enable, configure, and test the AMD GPU Operator. For more information, see AMD Instinct™ Accelerators.

3.1. About the AMD GPU Operator

The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers who use Red Hat OpenShift AI to create artificial intelligence and machine learning (AI/ML) applications. Accelerating specific areas of GPU functions can minimize CPU processing and memory usage, which improves overall application speed and reduces memory consumption and bandwidth restrictions.

3.2. Installing the AMD GPU Operator

As a cluster administrator, you can install the AMD GPU Operator by using the OpenShift CLI and the web console. This is a multi-step procedure that requires the installation of the Node Feature Discovery Operator, the Kernel Module Management Operator, and then the AMD GPU Operator. Use the following steps in succession to install the AMD community release of the Operator.

Next steps

Install the Node Feature Discovery Operator.
Install the Kernel Module Management Operator.
Install and configure the AMD GPU Operator.
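
Once the three Operators are installed, one way to confirm that the cluster is ready is to check whether nodes advertise the `amd.com/gpu` resource that GPU workloads request. The following is a minimal sketch and not part of the official procedure; the function name `list_amd_gpu_nodes` and the column headings are illustrative assumptions.

```shell
# List every node with its allocatable amd.com/gpu count.
# Nodes that do not expose the resource show <none> in the AMD_GPUS column.
# The backslash escapes the dot inside the resource name for the JSONPath parser.
list_amd_gpu_nodes() {
  oc get nodes \
    -o custom-columns='NODE:.metadata.name,AMD_GPUS:.status.allocatable.amd\.com/gpu'
}
```

Run `list_amd_gpu_nodes` after the AMD GPU Operator finishes its configuration. If no node shows a count, revisit the installation steps before you continue.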

3.3. Testing the AMD GPU Operator

Use the following procedure to test the ROCmInfo installation and view the logs for the AMD MI210 GPU.

Procedure

Create a YAML file that tests ROCmInfo:

```
$ cat << EOF > rocminfo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: rocminfo
spec:
  containers:
  - image: docker.io/rocm/pytorch:latest
    name: rocminfo
    command: ["/bin/sh","-c"]
    args: ["rocminfo"]
    resources:
      limits:
        amd.com/gpu: 1
      requests:
        amd.com/gpu: 1
  restartPolicy: Never
EOF
```

Create the rocminfo pod:
```
$ oc create -f rocminfo.yaml
```
Example output
```
pod/rocminfo created
```
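
The pod runs `rocminfo` once and then exits, so its log is complete only after the pod finishes, and pulling the image can take some time. The following is a minimal sketch of how you might wait for completion before reading the log. It assumes an `oc` client recent enough to support `--for=jsonpath` in `oc wait`; the function name `wait_for_rocminfo` and the 10-minute default timeout are illustrative assumptions, not values from this procedure.

```shell
# Block until the rocminfo pod reports phase Succeeded, or give up when the
# timeout expires. The first argument overrides the assumed 10-minute default.
wait_for_rocminfo() {
  oc wait pod/rocminfo \
    --for=jsonpath='{.status.phase}'=Succeeded \
    --timeout="${1:-10m}"
}
```

For example, `wait_for_rocminfo 5m` waits up to five minutes. If the pod fails instead of succeeding, the command times out, and `oc describe pod rocminfo` can help you find the cause.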

Check the rocminfo log with one MI210 GPU:

```
$ oc logs rocminfo | grep -A5 "Agent"
```

Example output

```
HSA Agents
==========
*******
Agent 1
*******
  Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Vendor Name:             CPU
--
Agent 2
*******
  Name:                    Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Uuid:                    CPU-XX
  Marketing Name:          Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
  Vendor Name:             CPU
--
Agent 3
*******
  Name:                    gfx90a
  Uuid:                    GPU-024b776f768a638b
  Marketing Name:          AMD Instinct MI210
  Vendor Name:             AMD
```
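
To check the result in a script rather than by eye, you could count the agents whose vendor is AMD. This is a minimal sketch that assumes the log uses the `Vendor Name:` layout shown in the example output; the function name `count_amd_agents` is an illustrative assumption.

```shell
# Read rocminfo output on stdin and print the number of agents whose
# "Vendor Name" field is AMD.
count_amd_agents() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # function usable in scripts that run under "set -e".
  grep -c 'Vendor Name:[[:space:]]*AMD[[:space:]]*$' || true
}
```

Pipe the log into the function, for example `oc logs rocminfo | count_amd_agents`. For the example output with one MI210 GPU, the two CPU agents are ignored and the function prints 1.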

Delete the pod:
```
$ oc delete -f rocminfo.yaml
```
Example output
```
pod "rocminfo" deleted
```
