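8.2. Example Kueue resource configurations
These examples show how to configure Kueue resource flavors and cluster queues.
Note
In OpenShift AI 2.17, Red Hat does not support shared cohorts.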
8.2.1. NVIDIA GPUs without shared cohort
8.2.1.1. NVIDIA RTX A400 GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "A400-node"
spec:
  # Nodes that carry this label belong to the flavor.
  nodeLabels:
    instance-type: nvidia-a400-node
  # Tolerate the taint that marks GPU nodes.
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.1.2. NVIDIA RTX A1000 GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "A1000-node"
spec:
  # Nodes that carry this label belong to the flavor.
  nodeLabels:
    instance-type: nvidia-a1000-node
  # Tolerate the taint that marks GPU nodes.
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.1.3. NVIDIA RTX A400 GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "A400-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "A400-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
8.2.1.4. NVIDIA RTX A1000 GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "A1000-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "A1000-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
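A cluster queue is cluster-scoped, so workloads in a namespace usually reach it through a LocalQueue. The following is a minimal sketch, not part of the original examples: the namespace team-a and the queue name a1000-local-queue are hypothetical, and only the clusterQueue value is taken from the example above.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a            # hypothetical namespace
  name: a1000-local-queue      # hypothetical local queue name
spec:
  # Points at the cluster queue defined in the preceding example.
  clusterQueue: "A1000-queue"
```

Because the cluster queue uses an empty namespaceSelector, a local queue in any namespace should be able to reference it.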
8.2.2. NVIDIA GPUs and AMD GPUs without shared cohort
8.2.2.1. AMD GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "amd-node"
spec:
  # Nodes that carry this label belong to the flavor.
  nodeLabels:
    instance-type: amd-node
  # Tolerate the taint that marks GPU nodes.
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.2.2. NVIDIA GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "nvidia-node"
spec:
  # Nodes that carry this label belong to the flavor.
  nodeLabels:
    instance-type: nvidia-node
  # Tolerate the taint that marks GPU nodes.
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.2.3. AMD GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-amd-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "amd.com/gpu"]
    flavors:
    - name: "amd-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "amd.com/gpu"
        nominalQuota: 2 # assumed to match the NVIDIA queue; the source omits this value
8.2.2.4. NVIDIA GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-nvidia-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "nvidia-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
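To illustrate how a workload might consume this quota, the following sketch submits a batch Job through a local queue. It is an assumption-laden illustration rather than part of the original examples: the namespace team-a, the LocalQueue name team-a-nvidia-local (assumed to point at team-a-nvidia-queue), and the container image are all hypothetical, and it assumes that the batch Job integration is enabled in your Kueue deployment.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-gpu-job-
  namespace: team-a                                   # hypothetical namespace
  labels:
    # Hypothetical LocalQueue that references team-a-nvidia-queue.
    kueue.x-k8s.io/queue-name: team-a-nvidia-local
spec:
  suspend: true            # Kueue unsuspends the Job once quota is available
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example.com/sample/cuda-job:latest   # hypothetical image
        resources:
          requests:
            cpu: 4
            memory: 16Gi
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
```

The requests (4 CPUs, 16Gi of memory, 1 GPU) fit within the nominal quota of 16 CPUs, 64Gi, and 2 GPUs, so a single Job like this should be admissible when the queue is otherwise idle. On admission, Kueue is expected to add the node labels and tolerations of the nvidia-node flavor to the pods, which is why the Job does not declare them itself.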
8.2.3. Additional resources
- Resource Flavor in the Kueue documentation
- Cluster Queue in the Kueue documentation