5.9. 创建工作负载 pod
使用本节中的步骤为共享和主机设备创建工作负载 pod。
5.9.2. 在 RoCE 上创建主机设备 RDMA 复制链接链接已复制到粘贴板!
为 NVIDIA Network Operator 为主机设备 Remote Direct Memory Access (RDMA) 创建工作负载 pod,并测试 pod 配置。
先决条件
- 确保 Operator 正在运行。
-
删除
NicClusterPolicy自定义资源 (CR) (如果存在)。
流程
生成一个新的主机设备
NicClusterPolicy(CR),如下所示:$ cat <<EOF > network-hostdev-nic-cluster-policy.yaml apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 24.10-0.7.0.0-0 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 env: - name: UNLOAD_STORAGE_MODULES value: "true" - name: RESTORE_DRIVER_ON_POD_TERMINATION value: "true" - name: CREATE_IFNAMES_UDEV value: "true" sriovDevicePlugin: image: sriov-network-device-plugin repository: ghcr.io/k8snetworkplumbingwg version: v3.7.0 config: | { "resourceList": [ { "resourcePrefix": "nvidia.com", "resourceName": "hostdev", "selectors": { "vendors": ["15b3"], "isRdma": true } } ] } EOF使用以下命令在集群中创建
NicClusterPolicyCR:$ oc create -f network-hostdev-nic-cluster-policy.yaml输出示例
nicclusterpolicy.mellanox.com/nic-cluster-policy created在 DOCA/MOFED 容器中使用以下命令验证主机设备
NicClusterPolicyCR:$ oc get pods -n nvidia-network-operator输出示例
NAME READY STATUS RESTARTS AGE mofed-rhcos4.16-696886fcb4-ds-9sgvd 2/2 Running 0 2m37s mofed-rhcos4.16-696886fcb4-ds-lkjd4 2/2 Running 0 2m37s nvidia-network-operator-controller-manager-68d547dbbd-qsdkf 1/1 Running 0 141m sriov-device-plugin-6v2nz 1/1 Running 0 2m14s sriov-device-plugin-hc4t8 1/1 Running 0 2m14s使用以下命令确认资源出现在集群
oc describe node部分中:$ oc describe node -l node-role.kubernetes.io/worker=| grep -E 'Capacity:|Allocatable:' -A7输出示例
Capacity: cpu: 128 ephemeral-storage: 1561525616Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 263596708Ki nvidia.com/hostdev: 2 pods: 250 Allocatable: cpu: 127500m ephemeral-storage: 1438028263499 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 262445732Ki nvidia.com/hostdev: 2 pods: 250 -- Capacity: cpu: 128 ephemeral-storage: 1561525616Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 263596704Ki nvidia.com/hostdev: 2 pods: 250 Allocatable: cpu: 127500m ephemeral-storage: 1438028263499 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 262445728Ki nvidia.com/hostdev: 2 pods: 250创建
HostDeviceNetworkCR 文件:$ cat <<EOF > hostdev-network.yaml apiVersion: mellanox.com/v1alpha1 kind: HostDeviceNetwork metadata: name: hostdev-net spec: networkNamespace: "default" resourceName: "hostdev" ipam: | { "type": "whereabouts", "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ] } EOF使用以下命令在集群中创建
HostDeviceNetwork资源:$ oc create -f hostdev-network.yaml输出示例
hostdevicenetwork.mellanox.com/hostdev-net created使用以下命令确认资源出现在集群
oc describe node部分中:$ oc describe node -l node-role.kubernetes.io/worker=| grep -E 'Capacity:|Allocatable:' -A8输出示例
Capacity: cpu: 128 ephemeral-storage: 1561525616Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 263596708Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 2 pods: 250 Allocatable: cpu: 127500m ephemeral-storage: 1438028263499 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 262445732Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 2 pods: 250 -- Capacity: cpu: 128 ephemeral-storage: 1561525616Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 263596680Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 2 pods: 250 Allocatable: cpu: 127500m ephemeral-storage: 1438028263499 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 262445704Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 2 pods: 250
5.9.3. 在 RoCE 上创建 SR-IOV 旧模式 RDMA 复制链接链接已复制到粘贴板!
在 RoCE 上配置单根 I/O 虚拟化 (SR-IOV) 旧模式主机设备 RDMA。
流程
生成一个新的主机设备
NicClusterPolicy自定义资源 (CR):$ cat <<EOF > network-sriovleg-nic-cluster-policy.yaml apiVersion: mellanox.com/v1alpha1 kind: NicClusterPolicy metadata: name: nic-cluster-policy spec: ofedDriver: image: doca-driver repository: nvcr.io/nvidia/mellanox version: 24.10-0.7.0.0-0 startupProbe: initialDelaySeconds: 10 periodSeconds: 20 livenessProbe: initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: initialDelaySeconds: 10 periodSeconds: 30 env: - name: UNLOAD_STORAGE_MODULES value: "true" - name: RESTORE_DRIVER_ON_POD_TERMINATION value: "true" - name: CREATE_IFNAMES_UDEV value: "true" EOF使用以下命令在集群中创建策略:
$ oc create -f network-sriovleg-nic-cluster-policy.yaml输出示例
nicclusterpolicy.mellanox.com/nic-cluster-policy created在 DOCA/MOFED 容器中使用以下命令验证 pod:
$ oc get pods -n nvidia-network-operator输出示例
NAME READY STATUS RESTARTS AGE mofed-rhcos4.16-696886fcb4-ds-4mb42 2/2 Running 0 40s mofed-rhcos4.16-696886fcb4-ds-8knwq 2/2 Running 0 40s nvidia-network-operator-controller-manager-68d547dbbd-qsdkf 1/1 Running 13 (4d ago) 4d21h为您要在 SR-IOV 传统模式下运行的设备生成虚拟功能 (VF) 的
SriovNetworkNodePolicyCR。请参见以下示例:$ cat <<EOF > sriov-network-node-policy.yaml apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-legacy-policy namespace: openshift-sriov-network-operator spec: deviceType: netdevice mtu: 1500 nicSelector: vendor: "15b3" pfNames: ["ens8f0np0#0-7"] nodeSelector: feature.node.kubernetes.io/pci-15b3.present: "true" numVfs: 8 priority: 90 isRdma: true resourceName: sriovlegacy EOF使用以下命令在集群中创建 CR:
注意确保启用了 SR-IOV 全局启用。如需更多信息,请参阅无法启用 SR-IOV,并在 Red Hat Enterprise Linux 中接收消息 "not enough MMIO resources for SR-IOV "。
$ oc create -f sriov-network-node-policy.yaml输出示例
sriovnetworknodepolicy.sriovnetwork.openshift.io/sriov-legacy-policy created每个节点都被禁用调度。节点重新引导以应用配置。您可以使用以下命令查看节点:
$ oc get nodes输出示例
NAME STATUS ROLES AGE VERSION edge-19.edge.lab.eng.rdu2.redhat.com Ready control-plane,master,worker 5d v1.29.8+632b078 nvd-srv-32.nvidia.eng.rdu2.dc.redhat.com Ready worker 4d22h v1.29.8+632b078 nvd-srv-33.nvidia.eng.rdu2.dc.redhat.com NotReady,SchedulingDisabled worker 4d22h v1.29.8+632b078节点重启后,通过在每个节点中打开 debug pod 来验证 VF 接口是否存在。运行以下命令:
a$ oc debug node/nvd-srv-33.nvidia.eng.rdu2.dc.redhat.com输出示例
Starting pod/nvd-srv-33nvidiaengrdu2dcredhatcom-debug-cqfjz ... To use host binaries, run `chroot /host` Pod IP: 10.6.135.12 If you don't see a command prompt, try pressing enter. sh-5.1# chroot /host sh-5.1# ip link show | grep ens8 26: ens8f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 42: ens8f0v0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 43: ens8f0v1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 44: ens8f0v2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 45: ens8f0v3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 46: ens8f0v4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 47: ens8f0v5: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 48: ens8f0v6: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 49: ens8f0v7: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000- 如有必要,在第二个节点上重复前面的步骤。
可选:使用以下命令确认资源出现在集群
oc describe node部分中:$ oc describe node -l node-role.kubernetes.io/worker=| grep -E 'Capacity:|Allocatable:' -A8输出示例
Capacity: cpu: 128 ephemeral-storage: 1561525616Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 263596692Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 0 openshift.io/sriovlegacy: 8 -- Allocatable: cpu: 127500m ephemeral-storage: 1438028263499 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 262445716Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 0 openshift.io/sriovlegacy: 8 -- Capacity: cpu: 128 ephemeral-storage: 1561525616Ki hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 263596688Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 0 openshift.io/sriovlegacy: 8 -- Allocatable: cpu: 127500m ephemeral-storage: 1438028263499 hugepages-1Gi: 0 hugepages-2Mi: 0 memory: 262445712Ki nvidia.com/gpu: 2 nvidia.com/hostdev: 0 openshift.io/sriovlegacy: 8SR-IOV 传统模式的 VF 就绪后,生成
SriovNetworkCR 文件。请参见以下示例:$ cat <<EOF > sriov-network.yaml apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-network namespace: openshift-sriov-network-operator spec: vlan: 0 networkNamespace: "default" resourceName: "sriovlegacy" ipam: | { "type": "whereabouts", "range": "192.168.3.225/28", "exclude": [ "192.168.3.229/30", "192.168.3.236/32" ] } EOF使用以下命令在集群中创建自定义资源:
$ oc create -f sriov-network.yaml输出示例
sriovnetwork.sriovnetwork.openshift.io/sriov-network created