5.10. Verifying RDMA connectivity


Confirm that Remote Direct Memory Access (RDMA) connectivity is working correctly between systems, specifically for legacy Single Root I/O Virtualization (SR-IOV) Ethernet.

Procedure

  1. Connect to each rdma-workload-client pod by using the following command:

    $ oc rsh -n default rdma-sriov-32-workload

    Example output

    sh-5.1#
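
    If you are not sure of the workload pod names, you can list them first. The following is a minimal sketch, assuming the pods run in the default namespace used throughout this example:

    $ oc get pods -n default | grep workload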

  2. Check the IP address assigned to the first workload pod by using the following command. In this example, the first workload pod is the RDMA test server.

    sh-5.1# ip a

    Example output

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0@if3970: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        link/ether 0a:58:0a:80:02:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.128.2.167/23 brd 10.128.3.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::858:aff:fe80:2a7/64 scope link
           valid_lft forever preferred_lft forever
    3843: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 26:34:fd:53:a6:ec brd ff:ff:ff:ff:ff:ff
        altname enp55s0f0v5
        inet 192.168.4.225/28 brd 192.168.4.239 scope global net1
           valid_lft forever preferred_lft forever
        inet6 fe80::2434:fdff:fe53:a6ec/64 scope link
           valid_lft forever preferred_lft forever
    sh-5.1#

    The RDMA server IP address assigned to this pod is on the net1 interface. In this example, the IP address is 192.168.4.225.
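
    To extract only the net1 IPv4 address instead of scanning the full ip a listing, you can use a one-line sketch built on standard iproute2 and awk output parsing; this pipeline is illustrative and not part of the documented procedure:

    sh-5.1# ip -4 -o addr show net1 | awk '{print $4}' | cut -d/ -f1

    This prints 192.168.4.225 in this example; drop the final cut stage if you want the address with its /28 prefix length.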

  3. Run the ibstatus command to get the link_layer type, Ethernet or InfiniBand, associated with each RDMA device mlx5_x. The output also shows the status of all of the RDMA devices in the state field, which shows either ACTIVE or DOWN.

    sh-5.1# ibstatus

    Example output

    Infiniband device 'mlx5_0' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_1' port 1 status:
    	default gid:	 fe80:0000:0000:0000:e8eb:d303:0072:1415
    	base lid:	 0xc
    	sm lid:		 0x1
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 InfiniBand
    
    Infiniband device 'mlx5_2' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_3' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_4' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_5' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_6' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_7' port 1 status:
    	default gid:	 fe80:0000:0000:0000:2434:fdff:fe53:a6ec
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_8' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_9' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    sh-5.1#
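
    To condense this listing to just the device names and their port states, you can filter the ibstatus output, as in this minimal sketch:

    sh-5.1# ibstatus | grep -E "Infiniband device|state:"

    Note that the state: pattern also matches the phys state: lines, so each device prints both its logical and physical state.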

  4. To get the link_layer for each RDMA mlx5 device on the worker node, run the ibstat command:

    sh-5.1# ibstat | egrep "Port|Base|Link"

    Example output

    Port 1:
    		Physical state: LinkUp
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Physical state: LinkUp
    		Base lid: 12
    		Port GUID: 0xe8ebd30300721415
    		Link layer: InfiniBand
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Physical state: LinkUp
    		Base lid: 0
    		Port GUID: 0x2434fdfffe53a6ec
    		Link layer: Ethernet
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    	Port 1:
    		Base lid: 0
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    sh-5.1#
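
    The filter above does not show which CA each port belongs to. A slightly extended sketch that also matches the CA 'mlx5_x' header lines makes it easier to map ports to devices:

    sh-5.1# ibstat | egrep "CA '|Port|Base|Link"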

  5. For RDMA shared device or host device workload pods, the RDMA device named mlx5_x is already known and is typically mlx5_0 or mlx5_1. For RDMA legacy SR-IOV workload pods, you need to determine which RDMA device is associated with which Virtual Function (VF) subinterface. Use the following command to provide this information:

    sh-5.1# rdma link show

    Example output

    link mlx5_0/1 state ACTIVE physical_state LINK_UP
    link mlx5_1/1 subnet_prefix fe80:0000:0000:0000 lid 12 sm_lid 1 lmc 0 state ACTIVE physical_state LINK_UP
    link mlx5_2/1 state DOWN physical_state DISABLED
    link mlx5_3/1 state DOWN physical_state DISABLED
    link mlx5_4/1 state DOWN physical_state DISABLED
    link mlx5_5/1 state DOWN physical_state DISABLED
    link mlx5_6/1 state DOWN physical_state DISABLED
    link mlx5_7/1 state ACTIVE physical_state LINK_UP netdev net1
    link mlx5_8/1 state DOWN physical_state DISABLED
    link mlx5_9/1 state DOWN physical_state DISABLED

    In this example, the RDMA device name mlx5_7 is associated with the net1 interface. This output is used in the next command to perform the RDMA bandwidth test, which also verifies the RDMA connection between worker nodes.
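
    If you prefer to capture the device name programmatically, a minimal sketch that selects the link whose netdev is net1 and strips the port suffix is:

    sh-5.1# rdma link show | awk '$NF == "net1" {print $2}' | cut -d/ -f1

    This prints mlx5_7 in this example, which is the value to pass to the -d switch in the next step.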

  6. Run the following ib_write_bw RDMA bandwidth test command:

    sh-5.1# /root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60  -d mlx5_7 -p 10000 --source_ip  192.168.4.225 --use_cuda=0 --use_cuda_dmabuf

    where:

    • The mlx5_7 RDMA device is passed in the -d switch.
    • The source IP address 192.168.4.225 is used to start the RDMA server.
    • The --use_cuda=0 and --use_cuda_dmabuf switches indicate that GPUDirect RDMA is used.

    Example output

    WARNING: BW peak won't be measured in this run.
    Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
    
    ************************************
    * Waiting for client to connect... *
    ************************************

  7. Open another terminal window and run the oc rsh command on the second workload pod, which is the RDMA test client pod:

    $ oc rsh -n default rdma-sriov-33-workload

    Example output

    sh-5.1#

  8. Get the RDMA test client pod IP address from the net1 interface by using the following command:

    sh-5.1# ip a

    Example output

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: eth0@if4139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
        link/ether 0a:58:0a:83:01:d5 brd ff:ff:ff:ff:ff:ff link-netnsid 0
        inet 10.131.1.213/23 brd 10.131.1.255 scope global eth0
           valid_lft forever preferred_lft forever
        inet6 fe80::858:aff:fe83:1d5/64 scope link
           valid_lft forever preferred_lft forever
    4076: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether 56:6c:59:41:ae:4a brd ff:ff:ff:ff:ff:ff
        altname enp55s0f0v0
        inet 192.168.4.226/28 brd 192.168.4.239 scope global net1
           valid_lft forever preferred_lft forever
        inet6 fe80::546c:59ff:fe41:ae4a/64 scope link
           valid_lft forever preferred_lft forever
    sh-5.1#

  9. Get the link_layer type associated with each RDMA device mlx5_x by using the following command:

    sh-5.1# ibstatus

    Example output

    Infiniband device 'mlx5_0' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_1' port 1 status:
    	default gid:	 fe80:0000:0000:0000:e8eb:d303:0072:09f5
    	base lid:	 0xd
    	sm lid:		 0x1
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 InfiniBand
    
    Infiniband device 'mlx5_2' port 1 status:
    	default gid:	 fe80:0000:0000:0000:546c:59ff:fe41:ae4a
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 4: ACTIVE
    	phys state:	 5: LinkUp
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_3' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_4' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_5' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_6' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_7' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_8' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet
    
    Infiniband device 'mlx5_9' port 1 status:
    	default gid:	 0000:0000:0000:0000:0000:0000:0000:0000
    	base lid:	 0x0
    	sm lid:		 0x0
    	state:		 1: DOWN
    	phys state:	 3: Disabled
    	rate:		 200 Gb/sec (4X HDR)
    	link_layer:	 Ethernet

  10. Optional: Obtain the firmware version of the Mellanox cards by using the ibstat command:

    sh-5.1# ibstat

    Example output

    CA 'mlx5_0'
    	CA type: MT4123
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0xe8ebd303007209f4
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_1'
    	CA type: MT4123
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0xe8ebd303007209f5
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 200
    		Base lid: 13
    		LMC: 0
    		SM lid: 1
    		Capability mask: 0xa651e848
    		Port GUID: 0xe8ebd303007209f5
    		Link layer: InfiniBand
    CA 'mlx5_2'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0x566c59fffe41ae4a
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Active
    		Physical state: LinkUp
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x546c59fffe41ae4a
    		Link layer: Ethernet
    CA 'mlx5_3'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0xb2ae4bfffe8f3d02
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_4'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0x2a9967fffe8bf272
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_5'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0x5aff2ffffe2e17e8
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_6'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0x121bf1fffe074419
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_7'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0xb22b16fffed03dd7
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_8'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0x523800fffe16d105
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    CA 'mlx5_9'
    	CA type: MT4124
    	Number of ports: 1
    	Firmware version: 20.43.1014
    	Hardware version: 0
    	Node GUID: 0xd2b4a1fffebdc4a9
    	System image GUID: 0xe8ebd303007209f4
    	Port 1:
    		State: Down
    		Physical state: Disabled
    		Rate: 200
    		Base lid: 0
    		LMC: 0
    		SM lid: 0
    		Capability mask: 0x00010000
    		Port GUID: 0x0000000000000000
    		Link layer: Ethernet
    sh-5.1#
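
    To report only the firmware version of each card instead of the full ibstat output, a short sketch is:

    sh-5.1# ibstat | grep -E "^CA '|Firmware version"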

  11. To determine which RDMA device is associated with the Virtual Function subinterface used by the client workload pod, run the following command. In this example, the net1 interface uses the RDMA device mlx5_2.

    sh-5.1# rdma link show

    Example output

    link mlx5_0/1 state ACTIVE physical_state LINK_UP
    link mlx5_1/1 subnet_prefix fe80:0000:0000:0000 lid 13 sm_lid 1 lmc 0 state ACTIVE physical_state LINK_UP
    link mlx5_2/1 state ACTIVE physical_state LINK_UP netdev net1
    link mlx5_3/1 state DOWN physical_state DISABLED
    link mlx5_4/1 state DOWN physical_state DISABLED
    link mlx5_5/1 state DOWN physical_state DISABLED
    link mlx5_6/1 state DOWN physical_state DISABLED
    link mlx5_7/1 state DOWN physical_state DISABLED
    link mlx5_8/1 state DOWN physical_state DISABLED
    link mlx5_9/1 state DOWN physical_state DISABLED
    sh-5.1#

  12. Run the following ib_write_bw RDMA bandwidth test command:

    sh-5.1# /root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60  -d mlx5_2 -p 10000 --source_ip  192.168.4.226 --use_cuda=0 --use_cuda_dmabuf 192.168.4.225

    where:

    • The mlx5_2 RDMA device is passed in the -d switch.
    • The source IP address is 192.168.4.226 and the destination IP address of the RDMA server is 192.168.4.225.
    • The --use_cuda=0 and --use_cuda_dmabuf switches indicate that GPUDirect RDMA is used.

      Example output

      WARNING: BW peak won't be measured in this run.
      Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
      Requested mtu is higher than active mtu
      Changing to active mtu - 3
      initializing CUDA
      Listing all CUDA devices in system:
      CUDA device 0: PCIe address is 61:00
      
      Picking device No. 0
      [pid = 8909, dev = 0] device name = [NVIDIA A40]
      creating CUDA Ctx
      making it the current CUDA Ctx
      CUDA device integrated: 0
      using DMA-BUF for GPU buffer address at 0x7f8738600000 aligned at 0x7f8738600000 with aligned size 2097152
      allocated GPU buffer of a 2097152 address at 0x23a7420 for type CUDA_MEM_DEVICE
      Calling ibv_reg_dmabuf_mr(offset=0, size=2097152, addr=0x7f8738600000, fd=40) for QP #0
      ---------------------------------------------------------------------------------------
                          RDMA_Write BW Test
       Dual-port       : OFF		Device         : mlx5_2
       Number of qps   : 16		Transport type : IB
       Connection type : RC		Using SRQ      : OFF
       PCIe relax order: ON		Lock-free      : OFF
       ibv_wr* API     : ON		Using DDP      : OFF
       TX depth        : 128
       CQ Moderation   : 1
       CQE Poll Batch  : 16
       Mtu             : 1024[B]
       Link type       : Ethernet
       GID index       : 3
       Max inline data : 0[B]
       rdma_cm QPs	 : ON
       Data ex. method : rdma_cm 	TOS    : 41
      ---------------------------------------------------------------------------------------
       local address: LID 0000 QPN 0x012d PSN 0x3cb6d7
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x012e PSN 0x90e0ac
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x012f PSN 0x153f50
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0130 PSN 0x5e0128
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0131 PSN 0xd89752
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0132 PSN 0xe5fc16
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0133 PSN 0x236787
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0134 PSN 0xd9273e
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0135 PSN 0x37cfd4
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0136 PSN 0x3bff8f
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0137 PSN 0x81f2bd
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0138 PSN 0x575c43
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x0139 PSN 0x6cf53d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x013a PSN 0xcaaf6f
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x013b PSN 0x346437
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       local address: LID 0000 QPN 0x013c PSN 0xcc5865
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x026d PSN 0x359409
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x026e PSN 0xe387bf
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x026f PSN 0x5be79d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0270 PSN 0x1b4b28
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0271 PSN 0x76a61b
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0272 PSN 0x3d50e1
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0273 PSN 0x1b572c
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0274 PSN 0x4ae1b5
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0275 PSN 0x5591b5
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0276 PSN 0xfa2593
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0277 PSN 0xd9473b
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0278 PSN 0x2116b2
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x0279 PSN 0x9b83b6
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x027a PSN 0xa0822b
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x027b PSN 0x6d930d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x027c PSN 0xb1a4d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
      ---------------------------------------------------------------------------------------
       #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
       65536      10329004         0.00               180.47 		     0.344228
      ---------------------------------------------------------------------------------------
      deallocating GPU buffer 00007f8738600000
      destroying current CUDA Ctx
      sh-5.1#

      A successful test shows the expected BW average and MsgRate in Mpps.
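
      As a sanity check, the reported MsgRate follows directly from the average bandwidth divided by the message size in bits. For the 65536-byte messages used here, a quick awk calculation (illustrative only) reproduces the reported value up to rounding of the printed bandwidth:

      sh-5.1# awk 'BEGIN { printf "%.6f Mpps\n", 180.47e9 / (65536 * 8) / 1e6 }'
      0.344219 Mpps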

      Upon completion of the ib_write_bw command, the server-side output also appears in the server pod. See the following example:

      Example output

      WARNING: BW peak won't be measured in this run.
      Perftest doesn't supports CUDA tests with inline messages: inline size set to 0
      
      ************************************
      * Waiting for client to connect... *
      ************************************
      Requested mtu is higher than active mtu
      Changing to active mtu - 3
      initializing CUDA
      Listing all CUDA devices in system:
      CUDA device 0: PCIe address is 61:00
      
      Picking device No. 0
      [pid = 9226, dev = 0] device name = [NVIDIA A40]
      creating CUDA Ctx
      making it the current CUDA Ctx
      CUDA device integrated: 0
      using DMA-BUF for GPU buffer address at 0x7f447a600000 aligned at 0x7f447a600000 with aligned size 2097152
      allocated GPU buffer of a 2097152 address at 0x2406400 for type CUDA_MEM_DEVICE
      Calling ibv_reg_dmabuf_mr(offset=0, size=2097152, addr=0x7f447a600000, fd=40) for QP #0
      ---------------------------------------------------------------------------------------
                          RDMA_Write BW Test
       Dual-port       : OFF		Device         : mlx5_7
       Number of qps   : 16		Transport type : IB
       Connection type : RC		Using SRQ      : OFF
       PCIe relax order: ON		Lock-free      : OFF
       ibv_wr* API     : ON		Using DDP      : OFF
       CQ Moderation   : 1
       CQE Poll Batch  : 16
       Mtu             : 1024[B]
       Link type       : Ethernet
       GID index       : 3
       Max inline data : 0[B]
       rdma_cm QPs	 : ON
       Data ex. method : rdma_cm 	TOS    : 41
      ---------------------------------------------------------------------------------------
       Waiting for client rdma_cm QP to connect
       Please run the same command with the IB/RoCE interface IP
      ---------------------------------------------------------------------------------------
       local address: LID 0000 QPN 0x026d PSN 0x359409
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x026e PSN 0xe387bf
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x026f PSN 0x5be79d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0270 PSN 0x1b4b28
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0271 PSN 0x76a61b
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0272 PSN 0x3d50e1
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0273 PSN 0x1b572c
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0274 PSN 0x4ae1b5
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0275 PSN 0x5591b5
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0276 PSN 0xfa2593
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0277 PSN 0xd9473b
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0278 PSN 0x2116b2
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x0279 PSN 0x9b83b6
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x027a PSN 0xa0822b
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x027b PSN 0x6d930d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       local address: LID 0000 QPN 0x027c PSN 0xb1a4d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
       remote address: LID 0000 QPN 0x012d PSN 0x3cb6d7
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x012e PSN 0x90e0ac
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x012f PSN 0x153f50
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0130 PSN 0x5e0128
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0131 PSN 0xd89752
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0132 PSN 0xe5fc16
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0133 PSN 0x236787
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0134 PSN 0xd9273e
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0135 PSN 0x37cfd4
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0136 PSN 0x3bff8f
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0137 PSN 0x81f2bd
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0138 PSN 0x575c43
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x0139 PSN 0x6cf53d
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x013a PSN 0xcaaf6f
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x013b PSN 0x346437
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
       remote address: LID 0000 QPN 0x013c PSN 0xcc5865
       GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
      ---------------------------------------------------------------------------------------
       #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
       65536      10329004         0.00               180.47 		     0.344228
      ---------------------------------------------------------------------------------------
      deallocating GPU buffer 00007f447a600000
      destroying current CUDA Ctx
      Copy to Clipboard Toggle word wrap
