11.2. Diagnosis
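Without additional settings, nova does not know that other processes are using a certain amount of hugepage memory. By default, nova assumes that all hugepage memory is available for instances. Nova fills NUMA node 0 first if it believes that this NUMA node still has free pCPUs and free hugepage memory. This issue can occur for the following reasons:
- The requested pCPUs still fit into NUMA 0.
- The combined memory of all existing instances, plus the memory of the instance to be spawned, still fits into NUMA node 0.
- Another process, such as OVS, holds a certain amount of hugepage memory on NUMA node 0.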
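The causes listed above can be illustrated with a small model. The following Python sketch is not nova's scheduler code. It is a simplified, hypothetical model (the names `NodeHugepages`, `pages_for`, and `try_place` are invented for illustration) that assumes a placement decision based only on pages handed to instances, while the real allocation also has to compete with pages held by other processes.

```python
from dataclasses import dataclass


@dataclass
class NodeHugepages:
    """Hypothetical per-NUMA-node hugepage bookkeeping (illustration only)."""
    total: int                  # HugePages_Total for the node
    held_by_others: int         # pages held by non-nova processes such as OVS
    used_by_instances: int = 0  # pages already given to instances

    def free_as_assumed(self) -> int:
        # Simplified view: every page not used by an instance counts as free.
        return self.total - self.used_by_instances

    def free_actual(self) -> int:
        # What the kernel can really hand out on this node.
        return self.total - self.held_by_others - self.used_by_instances


def pages_for(ram_mb: int, page_size_kb: int = 2048) -> int:
    """Number of huge pages needed to back ram_mb of guest memory."""
    return (ram_mb * 1024) // page_size_kb


def try_place(node: NodeHugepages, pages: int) -> str:
    """Place an instance on the node according to the simplified model."""
    if pages > node.free_as_assumed():
        return "rejected"
    if pages > node.free_actual():
        # The model accepted the node, but the pages are not really there.
        return "accepted, then fails to allocate"
    node.used_by_instances += pages
    return "ok"


# Illustrative numbers: 1024 pages in total, 512 of them held by other processes.
node0 = NodeHugepages(total=1024, held_by_others=512)
for _ in range(3):
    print(try_place(node0, pages_for(512)))
```

In this model the node is still accepted after its real free pages are gone, which is the kind of mismatch this section diagnoses.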
Ensure that the RAM amount you allocate in the flavor is a multiple of the huge page size, to avoid the [Errno 12] Cannot allocate memory error.
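As a quick way to check this rule before creating a flavor, the following Python sketch tests whether a RAM amount is a whole multiple of the huge page size and, if not, rounds it up. The function names are hypothetical, and the sketch assumes the page size is given in kB as a multiple of 1024 (for example 2048 for 2MB pages or 1048576 for 1GB pages).

```python
def is_hugepage_multiple(ram_mb: int, page_size_kb: int) -> bool:
    """Return True if ram_mb is a whole number of huge pages."""
    return (ram_mb * 1024) % page_size_kb == 0


def round_up_to_hugepage(ram_mb: int, page_size_kb: int) -> int:
    """Round ram_mb up to the next multiple of the huge page size, in MB."""
    ram_kb = ram_mb * 1024
    pages = -(-ram_kb // page_size_kb)  # ceiling division
    return (pages * page_size_kb) // 1024


# The m1.tiny flavor in this section uses 512 MB, checked here against 2MB pages.
print(is_hugepage_multiple(512, 2048))
print(round_up_to_hugepage(513, 2048))
```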
11.2.1. Diagnostic Steps
Check meminfo. The following output shows a hypervisor with 2MB hugepages and 512 free hugepages per NUMA node:
[root@overcloud-compute-1 ~]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 512
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
Check the NUMA architecture:
[root@overcloud-compute-1 nova]# lscpu | grep -i NUMA
NUMA node(s): 2
NUMA node0 CPU(s): 0-3
NUMA node1 CPU(s): 4-7
Check the huge pages reserved by OVS. In the following output, OVS reserves 512MB of huge pages per NUMA node:
[root@overcloud-compute-1 virt]# ovs-vsctl list Open_vSwitch | grep mem
other_config : {dpdk-init="true", dpdk-lcore-mask="3", dpdk-socket-mem="512,512", pmd-cpu-mask="1e"}
Deploy instances with the following flavor (1 vCPU and 512 MB of memory):
[stack@undercloud-4 ~]$ nova flavor-show m1.tiny
+----------------------------+-------------------------------------------------------------+
| Property                   | Value                                                       |
+----------------------------+-------------------------------------------------------------+
| OS-FLV-DISABLED:disabled   | False                                                       |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                           |
| disk                       | 8                                                           |
| extra_specs                | {"hw:cpu_policy": "dedicated", "hw:mem_page_size": "large"} |
| id                         | 49debbdb-c12e-4435-97ef-f575990b352f                        |
| name                       | m1.tiny                                                     |
| os-flavor-access:is_public | True                                                        |
| ram                        | 512                                                         |
| rxtx_factor                | 1.0                                                         |
| swap                       |                                                             |
| vcpus                      | 1                                                           |
+----------------------------+-------------------------------------------------------------+
The new instance boots and uses memory from NUMA 1:
[stack@undercloud-4 ~]$ nova list | grep d98772d1-119e-48fa-b1d9-8a68411cba0b
| d98772d1-119e-48fa-b1d9-8a68411cba0b | cirros-test0 | ACTIVE | - | Running | provider1=2000:10::f816:3eff:fe8d:a6ef, 10.0.0.102 |
[root@overcloud-compute-1 nova]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 256
Node 1 HugePages_Surp: 0
nova boot --nic net-id=$NETID --image cirros --flavor m1.tiny --key-name id_rsa cirros-test0
This instance fails to boot:
[stack@undercloud-4 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+-----------------------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks                                |
+--------------------------------------+--------------+--------+------------+-------------+-----------------------------------------+
| 1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc | cirros-test0 | ERROR  | -          | NOSTATE     |                                         |
| a44c43ca-49ad-43c5-b8a1-543ed8ab80ad | cirros-test0 | ACTIVE | -          | Running     | provider1=2000:10::f816:3eff:fe0f:565b, 10.0.0.105 |
| e21ba401-6161-45e6-8a04-6c45cef4aa3e | cirros-test0 | ACTIVE | -          | Running     | provider1=2000:10::f816:3eff:fe69:18bd, 10.0.0.111 |
+--------------------------------------+--------------+--------+------------+-------------+-----------------------------------------+
From the compute node, check that the free huge pages on NUMA node 0 are exhausted. NUMA node 1, however, still has enough space:
[root@overcloud-compute-1 qemu]# cat /sys/devices/system/node/node*/meminfo | grep -i huge
Node 0 AnonHugePages: 2048 kB
Node 0 HugePages_Total: 1024
Node 0 HugePages_Free: 0
Node 0 HugePages_Surp: 0
Node 1 AnonHugePages: 2048 kB
Node 1 HugePages_Total: 1024
Node 1 HugePages_Free: 512
Node 1 HugePages_Surp: 0
The information in /var/log/containers/nova/nova-compute.log shows that the instance CPU is pinned to NUMA node 0:
<name>instance-00000006</name>
<uuid>1b72e7a1-c298-4c92-8d2c-0a9fe886e9bc</uuid>
<metadata>
  <nova:instance xmlns:nova="http://openstack.org/xmlns/libvirt/nova/1.0">
    <nova:package version="14.0.8-5.el7ost"/>
    <nova:name>cirros-test0</nova:name>
    <nova:creationTime>2017-11-23 19:53:00</nova:creationTime>
    <nova:flavor name="m1.tiny">
      <nova:memory>512</nova:memory>
      <nova:disk>8</nova:disk>
      <nova:swap>0</nova:swap>
      <nova:ephemeral>0</nova:ephemeral>
      <nova:vcpus>1</nova:vcpus>
    </nova:flavor>
    <nova:owner>
      <nova:user uuid="5d1785ee87294a6fad5e2bdddd91cc20">admin</nova:user>
      <nova:project uuid="8c307c08d2234b339c504bfdd896c13e">admin</nova:project>
    </nova:owner>
    <nova:root type="image" uuid="6350211f-5a11-4e02-a21a-cb1c0d543214"/>
  </nova:instance>
</metadata>
<memory unit='KiB'>524288</memory>
<currentMemory unit='KiB'>524288</currentMemory>
<memoryBacking>
  <hugepages>
    <page size='2048' unit='KiB' nodeset='0'/>
  </hugepages>
</memoryBacking>
<vcpu placement='static'>1</vcpu>
<cputune>
  <shares>1024</shares>
  <vcpupin vcpu='0' cpuset='2'/>
  <emulatorpin cpuset='2'/>
</cputune>
<numatune>
  <memory mode='strict' nodeset='0'/>
  <memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
In the numatune section, nodeset="0" indicates that the memory will be claimed from NUMA 0.
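When many instances have to be checked, the same information can be pulled out of the domain XML programmatically. The following Python sketch uses only the standard library. The function name `memory_nodesets` is hypothetical, and the sample assumes the fragment from the log has been wrapped in a `<domain>` root element, which the excerpt in this section does not show.

```python
import xml.etree.ElementTree as ET


def memory_nodesets(domain_xml: str) -> dict:
    """Extract the NUMA nodesets used for guest memory from libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    memory = root.find("./numatune/memory")
    pages = root.findall("./memoryBacking/hugepages/page")
    return {
        # Node(s) from which guest memory is claimed, and the allocation mode.
        "numatune_nodeset": memory.get("nodeset") if memory is not None else None,
        "numatune_mode": memory.get("mode") if memory is not None else None,
        # One (size, unit, nodeset) tuple per configured huge page size.
        "hugepages": [(p.get("size"), p.get("unit"), p.get("nodeset")) for p in pages],
    }


# Trimmed copy of the XML above; the <domain> wrapper is an assumption.
SAMPLE = """
<domain type='kvm'>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>
</domain>
"""

print(memory_nodesets(SAMPLE))
```

With mode='strict', the memory has to come from the listed node, so an exhausted NUMA node 0 cannot be compensated by free pages on NUMA node 1.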