5.10. Verifying RDMA connectivity
Verify that Remote Direct Memory Access (RDMA) connectivity is working between the systems, particularly for legacy Single Root I/O Virtualization (SR-IOV) Ethernet.
Procedure
Connect to each rdma-workload-client pod by using the following command:
$ oc rsh -n default rdma-sriov-32-workload
Example output
sh-5.1#
Check the IP address assigned to the first workload pod by using the following command. In this example, the first workload pod is the RDMA test server.
sh-5.1# ip a
Example output
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0@if3970: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default link/ether 0a:58:0a:80:02:a7 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.128.2.167/23 brd 10.128.3.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::858:aff:fe80:2a7/64 scope link valid_lft forever preferred_lft forever 3843: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 26:34:fd:53:a6:ec brd ff:ff:ff:ff:ff:ff altname enp55s0f0v5 inet 192.168.4.225/28 brd 192.168.4.239 scope global net1 valid_lft forever preferred_lft forever inet6 fe80::2434:fdff:fe53:a6ec/64 scope link valid_lft forever preferred_lft forever sh-5.1#
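The server address can also be extracted non-interactively instead of reading the ip a output by eye. A minimal sketch, assuming awk is available in the pod image; the sample line stands in for the live output shown above:

```shell
# Sample line standing in for live `ip a` output inside the server pod.
sample='inet 192.168.4.225/28 brd 192.168.4.239 scope global net1'

# Keep only the global IPv4 address on net1 and strip the /28 prefix length.
server_ip=$(printf '%s\n' "$sample" | awk '/scope global net1/ {split($2, a, "/"); print a[1]}')
echo "$server_ip"
```

Against the live pod, the same filter would be piped from ip a rather than from the sample variable.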
The IP address of the RDMA server assigned to this pod is on the net1 interface. In this example, the IP address is 192.168.4.225.
Run the ibstatus command to get the link_layer type, Ethernet or InfiniBand, associated with each RDMA device mlx5_x. The output also shows the status of all of the RDMA devices in the state field, which shows either ACTIVE or DOWN.
sh-5.1# ibstatus
Example output
Infiniband device 'mlx5_0' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 4: ACTIVE phys state: 5: LinkUp rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_1' port 1 status: default gid: fe80:0000:0000:0000:e8eb:d303:0072:1415 base lid: 0xc sm lid: 0x1 state: 4: ACTIVE phys state: 5: LinkUp rate: 200 Gb/sec (4X HDR) link_layer: InfiniBand Infiniband device 'mlx5_2' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_3' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_4' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_5' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_6' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_7' port 1 status: default gid: fe80:0000:0000:0000:2434:fdff:fe53:a6ec base lid: 0x0 sm lid: 0x0 state: 4: ACTIVE phys state: 5: LinkUp rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_8' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_9' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: 
DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet sh-5.1#
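The per-device state and link_layer values can be summarized with a short filter rather than scanned by eye. A sketch over a two-device sample fragment, assuming awk and tr are available in the pod image:

```shell
# Sample fragment standing in for live `ibstatus` output.
sample="Infiniband device 'mlx5_1' port 1 status:
  state: 4: ACTIVE
  link_layer: InfiniBand
Infiniband device 'mlx5_2' port 1 status:
  state: 1: DOWN
  link_layer: Ethernet"

# Print one "device state link_layer" line per RDMA device.
summary=$(printf '%s\n' "$sample" | tr -d "'" | awk '
  /Infiniband device/ { dev = $3 }
  /state: [0-9]/      { state = $NF }
  /link_layer:/       { print dev, state, $2 }')
echo "$summary"
```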
To get the link_layer for each RDMA mlx5 device on the worker node, run the ibstat command:
sh-5.1# ibstat | egrep "Port|Base|Link"
Example output
Port 1: Physical state: LinkUp Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Physical state: LinkUp Base lid: 12 Port GUID: 0xe8ebd30300721415 Link layer: InfiniBand Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Physical state: LinkUp Base lid: 0 Port GUID: 0x2434fdfffe53a6ec Link layer: Ethernet Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet Port 1: Base lid: 0 Port GUID: 0x0000000000000000 Link layer: Ethernet sh-5.1#
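In the filtered ibstat output, only ports that report a Physical state of LinkUp carry traffic. A hypothetical awk pass over a sample fragment that lists the Port GUIDs of the LinkUp ports (awk assumed available in the pod image):

```shell
# Sample fragment standing in for the filtered `ibstat` output above.
sample='Port 1:
Physical state: LinkUp
Base lid: 0
Port GUID: 0x2434fdfffe53a6ec
Link layer: Ethernet
Port 1:
Base lid: 0
Port GUID: 0x0000000000000000
Link layer: Ethernet'

# Reset the flag at each port header; print the GUID only for LinkUp ports.
up_guids=$(printf '%s\n' "$sample" | awk '
  /^Port 1:/               { up = 0 }
  /Physical state: LinkUp/ { up = 1 }
  /Port GUID:/ && up       { print $3 }')
echo "$up_guids"
```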
For RDMA shared device or host device workload pods, the RDMA device named mlx5_x is already known and is typically mlx5_0 or mlx5_1. For RDMA legacy SR-IOV workload pods, you need to determine which RDMA device is associated with which virtual function (VF) subinterface. Use the following command to provide this information:
sh-5.1# rdma link show
Example output
link mlx5_0/1 state ACTIVE physical_state LINK_UP link mlx5_1/1 subnet_prefix fe80:0000:0000:0000 lid 12 sm_lid 1 lmc 0 state ACTIVE physical_state LINK_UP link mlx5_2/1 state DOWN physical_state DISABLED link mlx5_3/1 state DOWN physical_state DISABLED link mlx5_4/1 state DOWN physical_state DISABLED link mlx5_5/1 state DOWN physical_state DISABLED link mlx5_6/1 state DOWN physical_state DISABLED link mlx5_7/1 state ACTIVE physical_state LINK_UP netdev net1 link mlx5_8/1 state DOWN physical_state DISABLED link mlx5_9/1 state DOWN physical_state DISABLED
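The device-to-netdev mapping can also be extracted programmatically: in rdma link show output, only lines that end with "netdev <interface>" are bound to a network interface. A sketch over sample lines, assuming awk is available in the pod image:

```shell
# Sample lines standing in for live `rdma link show` output.
sample='link mlx5_1/1 subnet_prefix fe80:0000:0000:0000 lid 12 sm_lid 1 lmc 0 state ACTIVE physical_state LINK_UP
link mlx5_7/1 state ACTIVE physical_state LINK_UP netdev net1'

# Print the RDMA device whose line ends with "netdev net1",
# stripping the trailing "/1" port suffix from the device name.
rdma_dev=$(printf '%s\n' "$sample" | awk '
  $(NF-1) == "netdev" && $NF == "net1" { split($2, a, "/"); print a[1] }')
echo "$rdma_dev"
```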
In this example, the RDMA device name mlx5_7 is associated with the net1 interface. This output is used in the next command to perform the RDMA bandwidth test, which also verifies RDMA connectivity between the worker nodes.
Run the following ib_write_bw RDMA bandwidth test command:
sh-5.1# /root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60 -d mlx5_7 -p 10000 --source_ip 192.168.4.225 --use_cuda=0 --use_cuda_dmabuf
where:
- The mlx5_7 RDMA device is passed in the -d switch.
- The source IP address for starting the RDMA server is 192.168.4.225.
- The --use_cuda=0 and --use_cuda_dmabuf switches indicate that GPUDirect RDMA is used.
Example output
WARNING: BW peak won't be measured in this run. Perftest doesn't supports CUDA tests with inline messages: inline size set to 0 ************************************ * Waiting for client to connect... * ************************************
Open another terminal window and run the oc rsh command in the second workload pod, which acts as the RDMA test client pod:
$ oc rsh -n default rdma-sriov-33-workload
Example output
sh-5.1#
Get the RDMA test client pod IP address on the net1 interface by using the following command:
sh-5.1# ip a
Example output
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0@if4139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default link/ether 0a:58:0a:83:01:d5 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.131.1.213/23 brd 10.131.1.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::858:aff:fe83:1d5/64 scope link valid_lft forever preferred_lft forever 4076: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000 link/ether 56:6c:59:41:ae:4a brd ff:ff:ff:ff:ff:ff altname enp55s0f0v0 inet 192.168.4.226/28 brd 192.168.4.239 scope global net1 valid_lft forever preferred_lft forever inet6 fe80::546c:59ff:fe41:ae4a/64 scope link valid_lft forever preferred_lft forever sh-5.1#
Get the link_layer type associated with each RDMA device mlx5_x by using the following command:
sh-5.1# ibstatus
Example output
Infiniband device 'mlx5_0' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 4: ACTIVE phys state: 5: LinkUp rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_1' port 1 status: default gid: fe80:0000:0000:0000:e8eb:d303:0072:09f5 base lid: 0xd sm lid: 0x1 state: 4: ACTIVE phys state: 5: LinkUp rate: 200 Gb/sec (4X HDR) link_layer: InfiniBand Infiniband device 'mlx5_2' port 1 status: default gid: fe80:0000:0000:0000:546c:59ff:fe41:ae4a base lid: 0x0 sm lid: 0x0 state: 4: ACTIVE phys state: 5: LinkUp rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_3' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_4' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_5' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_6' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_7' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_8' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet Infiniband device 'mlx5_9' port 1 status: default gid: 0000:0000:0000:0000:0000:0000:0000:0000 base lid: 0x0 sm lid: 0x0 state: 1: 
DOWN phys state: 3: Disabled rate: 200 Gb/sec (4X HDR) link_layer: Ethernet
Optional: Get the firmware version of the Mellanox cards by using the ibstat command:
sh-5.1# ibstat
Example output
CA 'mlx5_0' CA type: MT4123 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0xe8ebd303007209f4 System image GUID: 0xe8ebd303007209f4 Port 1: State: Active Physical state: LinkUp Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_1' CA type: MT4123 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0xe8ebd303007209f5 System image GUID: 0xe8ebd303007209f4 Port 1: State: Active Physical state: LinkUp Rate: 200 Base lid: 13 LMC: 0 SM lid: 1 Capability mask: 0xa651e848 Port GUID: 0xe8ebd303007209f5 Link layer: InfiniBand CA 'mlx5_2' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0x566c59fffe41ae4a System image GUID: 0xe8ebd303007209f4 Port 1: State: Active Physical state: LinkUp Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x546c59fffe41ae4a Link layer: Ethernet CA 'mlx5_3' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0xb2ae4bfffe8f3d02 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_4' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0x2a9967fffe8bf272 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_5' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0x5aff2ffffe2e17e8 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_6' CA type: MT4124 Number 
of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0x121bf1fffe074419 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_7' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0xb22b16fffed03dd7 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_8' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0x523800fffe16d105 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet CA 'mlx5_9' CA type: MT4124 Number of ports: 1 Firmware version: 20.43.1014 Hardware version: 0 Node GUID: 0xd2b4a1fffebdc4a9 System image GUID: 0xe8ebd303007209f4 Port 1: State: Down Physical state: Disabled Rate: 200 Base lid: 0 LMC: 0 SM lid: 0 Capability mask: 0x00010000 Port GUID: 0x0000000000000000 Link layer: Ethernet sh-5.1#
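The firmware version can be pulled out of the ibstat output with a one-line filter rather than read from the full dump. A sketch over a sample fragment, assuming awk is available in the pod image:

```shell
# Sample fragment standing in for live `ibstat` output.
sample="CA 'mlx5_0'
  CA type: MT4123
  Firmware version: 20.43.1014"

# Print the third field of the "Firmware version:" line.
fw=$(printf '%s\n' "$sample" | awk '/Firmware version:/ {print $3}')
echo "$fw"
```

Note that on a card with several mlx5_x devices this prints one version line per device; in the example output above, all devices report the same firmware.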
To determine the RDMA device associated with the virtual function subinterface that the client workload pod uses, run the following command. In this example, the net1 interface is using the RDMA device mlx5_2.
sh-5.1# rdma link show
Example output
link mlx5_0/1 state ACTIVE physical_state LINK_UP link mlx5_1/1 subnet_prefix fe80:0000:0000:0000 lid 13 sm_lid 1 lmc 0 state ACTIVE physical_state LINK_UP link mlx5_2/1 state ACTIVE physical_state LINK_UP netdev net1 link mlx5_3/1 state DOWN physical_state DISABLED link mlx5_4/1 state DOWN physical_state DISABLED link mlx5_5/1 state DOWN physical_state DISABLED link mlx5_6/1 state DOWN physical_state DISABLED link mlx5_7/1 state DOWN physical_state DISABLED link mlx5_8/1 state DOWN physical_state DISABLED link mlx5_9/1 state DOWN physical_state DISABLED sh-5.1#
Run the following ib_write_bw RDMA bandwidth test command:
sh-5.1# /root/perftest/ib_write_bw -R -T 41 -s 65536 -F -x 3 -m 4096 --report_gbits -q 16 -D 60 -d mlx5_2 -p 10000 --source_ip 192.168.4.226 --use_cuda=0 --use_cuda_dmabuf 192.168.4.225
where:
- The mlx5_2 RDMA device is passed in the -d switch.
- The source IP address of the client is 192.168.4.226, and the destination IP address of the RDMA server is 192.168.4.225.
- The --use_cuda=0 and --use_cuda_dmabuf switches indicate that GPUDirect RDMA is used.
Example output
WARNING: BW peak won't be measured in this run. Perftest doesn't supports CUDA tests with inline messages: inline size set to 0 Requested mtu is higher than active mtu Changing to active mtu - 3 initializing CUDA Listing all CUDA devices in system: CUDA device 0: PCIe address is 61:00 Picking device No. 0 [pid = 8909, dev = 0] device name = [NVIDIA A40] creating CUDA Ctx making it the current CUDA Ctx CUDA device integrated: 0 using DMA-BUF for GPU buffer address at 0x7f8738600000 aligned at 0x7f8738600000 with aligned size 2097152 allocated GPU buffer of a 2097152 address at 0x23a7420 for type CUDA_MEM_DEVICE Calling ibv_reg_dmabuf_mr(offset=0, size=2097152, addr=0x7f8738600000, fd=40) for QP #0 --------------------------------------------------------------------------------------- RDMA_Write BW Test Dual-port : OFF Device : mlx5_2 Number of qps : 16 Transport type : IB Connection type : RC Using SRQ : OFF PCIe relax order: ON Lock-free : OFF ibv_wr* API : ON Using DDP : OFF TX depth : 128 CQ Moderation : 1 CQE Poll Batch : 16 Mtu : 1024[B] Link type : Ethernet GID index : 3 Max inline data : 0[B] rdma_cm QPs : ON Data ex. 
method : rdma_cm TOS : 41 --------------------------------------------------------------------------------------- local address: LID 0000 QPN 0x012d PSN 0x3cb6d7 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x012e PSN 0x90e0ac GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x012f PSN 0x153f50 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0130 PSN 0x5e0128 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0131 PSN 0xd89752 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0132 PSN 0xe5fc16 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0133 PSN 0x236787 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0134 PSN 0xd9273e GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0135 PSN 0x37cfd4 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0136 PSN 0x3bff8f GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0137 PSN 0x81f2bd GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0138 PSN 0x575c43 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x0139 PSN 0x6cf53d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x013a PSN 0xcaaf6f GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x013b PSN 0x346437 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 local address: LID 0000 QPN 0x013c PSN 0xcc5865 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226 remote address: LID 0000 QPN 0x026d PSN 0x359409 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x026e PSN 0xe387bf GID: 
00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x026f PSN 0x5be79d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0270 PSN 0x1b4b28 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0271 PSN 0x76a61b GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0272 PSN 0x3d50e1 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0273 PSN 0x1b572c GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0274 PSN 0x4ae1b5 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0275 PSN 0x5591b5 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0276 PSN 0xfa2593 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0277 PSN 0xd9473b GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0278 PSN 0x2116b2 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x0279 PSN 0x9b83b6 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x027a PSN 0xa0822b GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x027b PSN 0x6d930d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 remote address: LID 0000 QPN 0x027c PSN 0xb1a4d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225 --------------------------------------------------------------------------------------- #bytes #iterations BW peak[Gb/sec] BW average[Gb/sec] MsgRate[Mpps] 65536 10329004 0.00 180.47 0.344228 --------------------------------------------------------------------------------------- deallocating GPU buffer 00007f8738600000 destroying current CUDA Ctx sh-5.1#
A positive test shows the expected BW average and the MsgRate value in Mpps.
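As a sketch, you can check a positive result automatically by parsing the results row of the ib_write_bw output. The line below is a stub standing in for the real command output; the column positions follow the #bytes / #iterations / BW peak / BW average / MsgRate header shown in the example above, and the variable names are illustrative only.

```shell
# Stand-in for the results row of the real ib_write_bw output; in the
# workload pod you would capture the command output instead.
results_line='65536      10329004         0.00               180.47             0.344228'

# Column 4 is BW average[Gb/sec], column 5 is MsgRate[Mpps].
bw_avg=$(echo "$results_line" | awk '{print $4}')
msg_rate=$(echo "$results_line" | awk '{print $5}')

# A positive test: the average bandwidth is greater than zero.
if awk -v bw="$bw_avg" 'BEGIN { exit (bw > 0) ? 0 : 1 }'; then
    echo "PASS: BW average ${bw_avg} Gb/sec, MsgRate ${msg_rate} Mpps"
else
    echo "FAIL: no bandwidth measured"
fi
```

A zero or missing BW average usually means the RDMA path between the pods did not come up, even if the command itself exited cleanly.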
When the ib_write_bw command completes, the server-side output also appears in the server pod. See the following example.

Example output
 WARNING: BW peak won't be measured in this run.
 Perftest doesn't supports CUDA tests with inline messages: inline size set to 0

************************************
* Waiting for client to connect... *
************************************
Requested mtu is higher than active mtu
Changing to active mtu - 3
initializing CUDA
Listing all CUDA devices in system:
CUDA device 0: PCIe address is 61:00

Picking device No. 0
[pid = 9226, dev = 0] device name = [NVIDIA A40]
creating CUDA Ctx
making it the current CUDA Ctx
CUDA device integrated: 0
using DMA-BUF for GPU buffer address at 0x7f447a600000 aligned at 0x7f447a600000 with aligned size 2097152
allocated GPU buffer of a 2097152 address at 0x2406400 for type CUDA_MEM_DEVICE
Calling ibv_reg_dmabuf_mr(offset=0, size=2097152, addr=0x7f447a600000, fd=40) for QP #0
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_7
 Number of qps   : 16           Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON           Lock-free      : OFF
 ibv_wr* API     : ON           Using DDP      : OFF
 CQ Moderation   : 1            CQE Poll Batch : 16
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : ON
 Data ex. method : rdma_cm      TOS            : 41
---------------------------------------------------------------------------------------
 Waiting for client rdma_cm QP to connect
 Please run the same command with the IB/RoCE interface IP
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0x026d PSN 0x359409 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x026e PSN 0xe387bf GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x026f PSN 0x5be79d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0270 PSN 0x1b4b28 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0271 PSN 0x76a61b GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0272 PSN 0x3d50e1 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0273 PSN 0x1b572c GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0274 PSN 0x4ae1b5 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0275 PSN 0x5591b5 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0276 PSN 0xfa2593 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0277 PSN 0xd9473b GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0278 PSN 0x2116b2 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x0279 PSN 0x9b83b6 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x027a PSN 0xa0822b GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x027b PSN 0x6d930d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 local address: LID 0000 QPN 0x027c PSN 0xb1a4d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:225
 remote address: LID 0000 QPN 0x012d PSN 0x3cb6d7 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x012e PSN 0x90e0ac GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x012f PSN 0x153f50 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0130 PSN 0x5e0128 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0131 PSN 0xd89752 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0132 PSN 0xe5fc16 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0133 PSN 0x236787 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0134 PSN 0xd9273e GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0135 PSN 0x37cfd4 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0136 PSN 0x3bff8f GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0137 PSN 0x81f2bd GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0138 PSN 0x575c43 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x0139 PSN 0x6cf53d GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x013a PSN 0xcaaf6f GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x013b PSN 0x346437 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
 remote address: LID 0000 QPN 0x013c PSN 0xcc5865 GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:04:226
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 65536      10329004         0.00               180.47             0.344228
---------------------------------------------------------------------------------------
deallocating GPU buffer 00007f447a600000
destroying current CUDA Ctx