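8.2. Example Kueue resource configurations
These examples show how to configure Kueue resource flavors and cluster queues.
Note
In OpenShift AI 2.22, Red Hat does not support shared cohorts.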
8.2.1. NVIDIA GPUs without a shared cohort
8.2.1.1. NVIDIA RTX A400 GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "A400-node"
spec:
  nodeLabels:
    instance-type: nvidia-a400-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
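To make the effect of the flavor's nodeLabels and tolerations fields concrete, the following fragment sketches a node that this flavor would select. It is an illustration only: the node name gpu-worker-0 is hypothetical, and the sketch assumes that GPU nodes in the cluster carry the instance-type label and the HasGPU taint used in the flavor above. Kueue adds the flavor's node labels and tolerations to the pods of admitted workloads, so those pods can land only on matching nodes and are allowed past the taint.

```yaml
# Sketch of a node that the "A400-node" flavor would match.
# The node name is hypothetical; the label and taint mirror the flavor above.
apiVersion: v1
kind: Node
metadata:
  name: gpu-worker-0                   # hypothetical node name
  labels:
    instance-type: nvidia-a400-node    # matches spec.nodeLabels in the flavor
spec:
  taints:
  - key: "HasGPU"                      # tolerated through operator "Exists" in the flavor
    effect: "NoSchedule"
```

In practice, labels and taints are usually applied to existing nodes rather than declared in a Node manifest; the fragment only shows what the resulting node metadata would look like.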
8.2.1.2. NVIDIA RTX A1000 GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "A1000-node"
spec:
  nodeLabels:
    instance-type: nvidia-a1000-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.1.3. NVIDIA RTX A400 GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "A400-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "A400-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
8.2.1.4. NVIDIA RTX A1000 GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "A1000-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "A1000-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
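A cluster queue is cluster-scoped, so workloads in a project normally reach it through a namespaced LocalQueue. The following is a minimal sketch of a local queue that points at the A1000-queue cluster queue. The namespace team-a and the name a1000-local-queue are hypothetical and are not part of the examples above.

```yaml
# Sketch: a namespaced LocalQueue that forwards workloads to "A1000-queue".
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a              # hypothetical project namespace
  name: a1000-local-queue        # hypothetical local queue name
spec:
  clusterQueue: "A1000-queue"    # cluster queue defined in this section
```

Because the cluster queue uses namespaceSelector: {}, a local queue in any namespace can reference it.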
8.2.2. NVIDIA GPUs and AMD GPUs without a shared cohort
8.2.2.1. AMD GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "amd-node"
spec:
  nodeLabels:
    instance-type: amd-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.2.2. NVIDIA GPU resource flavor
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "nvidia-node"
spec:
  nodeLabels:
    instance-type: nvidia-node
  tolerations:
  - key: "HasGPU"
    operator: "Exists"
    effect: "NoSchedule"
8.2.2.3. AMD GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-amd-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "amd.com/gpu"]
    flavors:
    - name: "amd-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "amd.com/gpu"
        nominalQuota: 2 # assumed value; the source omits it, and this mirrors the NVIDIA queue
8.2.2.4. NVIDIA GPU cluster queue
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "team-a-nvidia-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: "nvidia-node"
      resources:
      - name: "cpu"
        nominalQuota: 16
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 2
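To show how quota from a cluster queue such as team-a-nvidia-queue is consumed, the following sketch submits a batch Job through Kueue. It assumes a LocalQueue named team-a-nvidia-local in a namespace named team-a that references team-a-nvidia-queue; both names, the job name prefix, and the container image are hypothetical placeholders. The kueue.x-k8s.io/queue-name label is what ties the job to the local queue.

```yaml
# Sketch: a Job that requests one NVIDIA GPU through Kueue.
# Create with "create" rather than "apply", because the manifest uses generateName.
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a                                  # hypothetical namespace
  generateName: gpu-sample-                          # hypothetical name prefix
  labels:
    kueue.x-k8s.io/queue-name: team-a-nvidia-local   # hypothetical LocalQueue name
spec:
  suspend: true          # the job stays suspended until Kueue admits it
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.example.com/sample/gpu-job:latest   # placeholder image
        resources:
          requests:
            cpu: 4
            memory: 16Gi
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1    # extended resources need matching requests and limits
```

With the nominal quotas shown above (16 CPUs, 64Gi of memory, 2 GPUs), two jobs of this size would fit in the queue at the same time, and a third would wait until quota is released.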