9.17.11.2. About using the NVIDIA GPU Operator

You can use the NVIDIA GPU Operator with OpenShift Virtualization to rapidly provision worker nodes for running GPU-enabled virtual machines (VMs). The NVIDIA GPU Operator manages NVIDIA GPU resources in an OpenShift Container Platform cluster and automates tasks that are required when preparing nodes for GPU workloads.

Before you can deploy application workloads to a GPU resource, you must install several components: the NVIDIA drivers that enable Compute Unified Device Architecture (CUDA), the Kubernetes device plugin, the container runtime, and supporting features such as automatic node labeling and monitoring. Because the NVIDIA GPU Operator automates these tasks, you can quickly scale the GPU capacity of your infrastructure. The Operator is especially useful for provisioning complex artificial intelligence and machine learning (AI/ML) workloads.

9.17.11.2.1. Options for configuring mediated devices

There are two methods for configuring mediated devices when using the NVIDIA GPU Operator. The method that Red Hat tests uses OpenShift Virtualization features to schedule mediated devices, while the NVIDIA method uses only the GPU Operator.

Using the NVIDIA GPU Operator to configure mediated devices

This method exclusively uses the NVIDIA GPU Operator to configure mediated devices. To use this method, refer to NVIDIA GPU Operator with OpenShift Virtualization in the NVIDIA documentation.

Using OpenShift Virtualization to configure mediated devices

This method, which is tested by Red Hat, uses OpenShift Virtualization’s capabilities to configure mediated devices. In this case, the NVIDIA GPU Operator is only used for installing drivers with the NVIDIA vGPU Manager. The GPU Operator does not configure mediated devices.

When using the OpenShift Virtualization method, you still configure the GPU Operator by following the NVIDIA documentation. However, this method differs from the NVIDIA documentation in the following ways:

You must not overwrite the default disableMDEVConfiguration: false setting in the HyperConverged custom resource (CR).
Important
Setting this feature gate to true, as described in the NVIDIA documentation, prevents OpenShift Virtualization from configuring mediated devices.

You must configure your ClusterPolicy manifest so that it matches the following example:

kind: ClusterPolicy
apiVersion: nvidia.com/v1
metadata:
  name: gpu-cluster-policy
spec:
  operator:
    defaultRuntime: crio
    use_ocp_driver_toolkit: true
    initContainer: {}
  sandboxWorkloads:
    enabled: true
    defaultWorkload: vm-vgpu
  driver:
    enabled: false
  dcgmExporter: {}
  dcgm:
    enabled: true
  daemonsets: {}
  devicePlugin: {}
  gfd: {}
  migManager:
    enabled: true
  nodeStatusExporter:
    enabled: true
  mig:
    strategy: single
  toolkit:
    enabled: true
  validator:
    plugin:
      env:
        - name: WITH_WORKLOAD
          value: "true"
  vgpuManager:
    enabled: true
    repository: <vgpu_container_registry>
    image: <vgpu_image_name>
    version: <nvidia_vgpu_manager_version>
  vgpuDeviceManager:
    enabled: false
  sandboxDevicePlugin:
    enabled: false
  vfioManager:
    enabled: false

spec.driver.enabled is set to false. The driver is not required for VMs.
spec.vgpuManager.enabled is set to true. This is required if you want to use vGPUs with VMs.
spec.vgpuManager.repository is set to your registry value.
spec.vgpuManager.version is set to the version of the vGPU driver you have downloaded from the NVIDIA website and used to build the image.
spec.vgpuDeviceManager.enabled is set to false to allow OpenShift Virtualization to configure mediated devices instead of the NVIDIA GPU Operator.
spec.sandboxDevicePlugin.enabled is set to false to prevent discovery and advertising of the vGPU devices to the kubelet.
spec.vfioManager.enabled is set to false to prevent loading the vfio-pci driver. Instead, follow the OpenShift Virtualization documentation to configure PCI passthrough.
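
The requirements above can be checked mechanically before you apply the manifests. The following is a minimal sketch, not part of the Red Hat or NVIDIA tooling: the function names `check_cluster_policy` and `mdev_configuration_disabled` are hypothetical, the manifests are assumed to be already parsed into Python dictionaries, and the location of the feature gate under `spec.featureGates` in the HyperConverged CR is an assumption. The gate name is matched case-insensitively so that the check does not depend on the exact casing of the key.

```python
from typing import Any

# Values the text above lists for the OpenShift Virtualization method.
REQUIRED_CLUSTER_POLICY = {
    ("driver", "enabled"): False,
    ("vgpuManager", "enabled"): True,
    ("vgpuDeviceManager", "enabled"): False,
    ("sandboxDevicePlugin", "enabled"): False,
    ("vfioManager", "enabled"): False,
}


def lookup(obj: Any, path: tuple) -> Any:
    """Walk nested dictionaries; return None if any key on the path is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def check_cluster_policy(policy: dict) -> list:
    """Return a list of problems; an empty list means the spec matches."""
    spec = policy.get("spec", {})
    problems = []
    for path, expected in REQUIRED_CLUSTER_POLICY.items():
        actual = lookup(spec, path)
        if actual is not expected:
            problems.append(
                f"spec.{'.'.join(path)} is {actual!r}, expected {expected!r}"
            )
    # The vGPU Manager image coordinates must be filled in, not left as
    # <placeholder> values copied from the example manifest.
    for field in ("repository", "image", "version"):
        value = lookup(spec, ("vgpuManager", field))
        if not value or str(value).startswith("<"):
            problems.append(f"spec.vgpuManager.{field} is not set")
    return problems


def mdev_configuration_disabled(hyperconverged: dict) -> bool:
    """True if the CR overrides the default and disables mediated device setup.

    ASSUMPTION: the gate lives under spec.featureGates.
    """
    gates = lookup(hyperconverged, ("spec", "featureGates")) or {}
    return any(
        key.lower() == "disablemdevconfiguration" and value is True
        for key, value in gates.items()
    )
```

A dictionary for either resource can be produced by parsing the JSON form of the object, for example the output of `oc get <resource> -o json`, with `json.loads`. An empty list from `check_cluster_policy` together with `False` from `mdev_configuration_disabled` indicates that the settings agree with the method described in this section.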
