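8.2. Example Kueue resource configurations
These examples show how to configure Kueue resource flavors and cluster queues.
Note
In OpenShift AI 2.16, Red Hat does not support shared cohorts.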
8.2.1. NVIDIA GPUs without shared cohort
8.2.1.1. NVIDIA RTX A400 GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "A400-node"
spec:
  nodeLabels:
    instance-type: nvidia-a400-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.1.2. NVIDIA RTX A1000 GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "A1000-node"
spec:
  nodeLabels:
    instance-type: nvidia-a1000-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.1.3. NVIDIA RTX A400 GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "A400-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "A400-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
8.2.1.4. NVIDIA RTX A1000 GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "A1000-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "A1000-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
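Workloads in a namespace usually reach a cluster queue through a LocalQueue. The following is a minimal sketch, not part of the original examples: the namespace team-a and the LocalQueue names are hypothetical, and only the clusterQueue values are taken from the A400-queue and A1000-queue definitions in this section.

```yaml
# Hypothetical LocalQueue objects; the namespace and names are assumptions for illustration.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a            # assumed namespace
  name: a400-local-queue       # assumed name
spec:
  clusterQueue: "A400-queue"   # cluster queue from this section
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a            # assumed namespace
  name: a1000-local-queue      # assumed name
spec:
  clusterQueue: "A1000-queue"  # cluster queue from this section
```

Because the two cluster queues do not share a cohort, a workload submitted to one LocalQueue cannot borrow unused quota from the other queue.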
8.2.2. NVIDIA GPUs and AMD GPUs without shared cohort
8.2.2.1. AMD GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "amd-node"
spec:
  nodeLabels:
    instance-type: amd-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.2.2. NVIDIA GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "nvidia-node"
spec:
  nodeLabels:
    instance-type: nvidia-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.2.3. AMD GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-amd-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "amd.com/gpu"]
    flavors:
    - name: "amd-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "amd.com/gpu"
        nominalQuota: 2 # assumed value; the source listing ends before this field, and the other queues use 2
8.2.2.4. NVIDIA GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-nvidia-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "nvidia-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
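To illustrate how a workload might consume the quota of team-a-nvidia-queue, the following sketch pairs a LocalQueue with a batch Job. It is an illustration under stated assumptions rather than part of the original examples: the namespace team-a, the LocalQueue name, the Job name prefix, the container image, and the resource requests are all hypothetical. The kueue.x-k8s.io/queue-name label is how a Job selects its LocalQueue.

```yaml
# Hypothetical LocalQueue that points at the NVIDIA cluster queue from this section.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a                 # assumed namespace
  name: nvidia-local-queue          # assumed name
spec:
  clusterQueue: "team-a-nvidia-queue"
---
# Hypothetical Job submitted through that LocalQueue.
apiVersion: batch/v1
kind: Job
metadata:
  generateName: sample-gpu-job-     # assumed name prefix
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: nvidia-local-queue
spec:
  suspend: true                     # Kueue unsuspends the Job once it is admitted
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example.com/sample/gpu-app:latest  # placeholder image
        resources:
          requests:
            cpu: "1"
            memory: 4Gi
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"
```

Because the Job uses generateName, it would be submitted with a create operation rather than an apply. On admission, Kueue is expected to add the node selector and tolerations from the nvidia-node resource flavor to the pod template, so the Job itself does not repeat them.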