Chapter 13. Managing memory devices
As a system administrator, you can configure how Red Hat Enterprise Linux (RHEL) manages newly added memory devices. By default, RHEL uses udev rules to automatically online hot-plugged memory. You can disable this behavior if you require manual control over memory onlining.
13.1. Automatic memory onlining with udev
Red Hat Enterprise Linux (RHEL) uses udev rules to automatically online newly added memory devices, including Compute Express Link (CXL) memory. This mechanism ensures that hot-plugged memory blocks are immediately available to the operating system without manual intervention.
The default udev rule for memory hot-plug operations is located at /usr/lib/udev/rules.d/40-redhat-hotplug.rules. This file contains the configuration that triggers automatic memory onlining and, where applicable, assigns memory blocks to ZONE_MOVABLE.
On bare-metal RHEL systems, this default udev rule configures ZONE_MOVABLE automatically: it onlines hot-added memory blocks with the online_movable state, which places them in ZONE_MOVABLE. Because the kernel and drivers do not use this zone for their own data, the memory can be safely removed later if needed. No manual configuration is required for hot-plugged memory devices.
13.2. Disabling automatic memory onlining
To prevent newly added memory devices from being immediately available to the operating system, you can disable automatic memory onlining in Red Hat Enterprise Linux. This action is performed by modifying the relevant udev rules.
Prerequisites
- You have root privileges on your system.
Procedure
Copy the default rule to the local override directory:
cp /usr/lib/udev/rules.d/40-redhat-hotplug.rules /etc/udev/rules.d/
Edit the /etc/udev/rules.d/40-redhat-hotplug.rules file in a text editor and comment out the lines that handle memory hot-plug operations:
# Memory hotadd request
#SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
#ACTION!="add", GOTO="memory_hotplug_end"
#CONST{arch}=="s390*", GOTO="memory_hotplug_end"
#CONST{arch}=="ppc64*", GOTO="memory_hotplug_end"
#ENV{.state}="online"
#CONST{virt}=="none", ENV{.state}="online_movable"
#ATTR{state}=="offline", ATTR{state}="$env{.state}"
#LABEL="memory_hotplug_end"
Reload udev rules to apply the changes:
udevadm control --reload-rules
Reboot the system, or remove the device and add it back to the system to ensure the new configuration takes effect.
Verification
- Hot-add a memory device and list the memory blocks with the lsmem command. With automatic onlining disabled, newly added memory blocks remain offline until you manually online them.
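The edit in the procedure above can also be scripted. The following is a minimal sketch, assuming GNU sed and assuming that the memory section of your rules file begins with the SUBSYSTEM!="memory" line and ends with the LABEL="memory_hotplug_end" line, as in the listing above. The function name comment_out_memory_rules is hypothetical and is not part of RHEL.

```shell
# Comment out the memory hot-plug section of a udev rules file in place.
# Assumes GNU sed and the section layout shown in the procedure.
comment_out_memory_rules() {
    rules_file="$1"
    # Prefix every line from the SUBSYSTEM!="memory" match through the
    # closing LABEL line with "#". Lines that are already commented do
    # not match the start pattern, so running this twice is harmless.
    sed -i '/^SUBSYSTEM!="memory"/,/^LABEL="memory_hotplug_end"/ s/^/#/' "$rules_file"
}

# Example usage (as root), after copying the default rule to /etc/udev/rules.d/:
# comment_out_memory_rules /etc/udev/rules.d/40-redhat-hotplug.rules
# udevadm control --reload-rules
```

Check the resulting file before reloading the rules, because the layout of the packaged rules file might differ between RHEL releases.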
13.3. Manually onlining memory
If you have disabled automatic memory onlining, newly added memory blocks, such as those from a hot-plugged CXL device, remain offline by default. You can manually online these memory blocks by using the sysfs interface.
Prerequisites
- You have root privileges on your system.
Procedure
Identify the offline memory block identifier by using the lsmem command:
# lsmem
RANGE                                  SIZE   STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G  online       yes   0-1
0x0000000100000000-0x000000087fffffff   30G  online       yes  4-33
0x0000000880000000-0x00000008ffffffff    2G offline           34-41

Memory block size:       256M
Total online memory:      32G
Total offline memory:      2G
In this example, the offline memory blocks are 34-41.
Online a specific memory block by writing to its state file:
echo online > /sys/devices/system/memory/memory<_identifier_>/state
Replace memory<_identifier_> with the identifier of the memory block, for example, memory34.
Verification
Verify the memory block is online:
cat /sys/devices/system/memory/memory<_identifier_>/state
online
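When lsmem reports a range of offline blocks, such as 34-41, you can repeat the sysfs write for each block in a loop. The following is a minimal sketch; the function name online_block_range and its optional third argument are hypothetical and exist only for illustration.

```shell
# Online every offline block in an inclusive range of block identifiers.
# The third argument overrides the sysfs directory; it exists only to make
# the function easy to try against a scratch directory.
online_block_range() {
    first="$1"
    last="$2"
    sysfs_root="${3:-/sys/devices/system/memory}"
    i="$first"
    while [ "$i" -le "$last" ]; do
        state_file="$sysfs_root/memory$i/state"
        # Write only to blocks that are currently offline.
        if [ "$(cat "$state_file")" = "offline" ]; then
            echo online > "$state_file"
        fi
        i=$((i + 1))
    done
}

# Example usage (as root) for the blocks reported as offline by lsmem:
# online_block_range 34 41
```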
13.4. Managing CXL memory devices
CXL v2 memory hot-plugging enables you to add and remove memory resources dynamically in Red Hat Enterprise Linux (RHEL) environments, supporting flexibility and scalability for data center operations. By using CXL memory devices and external switches, you can create shared memory pools that are not limited by traditional system hardware constraints, provided that your system supports CXL v2.
CXL v2 introduces enhanced memory management and hot-plug capabilities, and extends support to bare-metal systems. These enhancements help reduce the risks associated with manual memory node removal, thereby improving system stability and reliability.
Before configuring CXL v2 memory hot-plugging, verify that your system firmware and hardware are compatible with CXL v2.
13.4.1. CXL memory device status levels
A Compute Express Link (CXL) memory device transitions through several states in response to specific system events. The terms "Level 0" through "Level 4" describe the state transitions of a CXL memory device during hot-plug operations.
In RHEL, CXL memory is managed in units called memory regions. A memory region can correspond to a single CXL device or span multiple devices by specifying interleave during creation. Memory incorporated into memory regions must be brought "online" before it is available to the operating system.
You must modify or configure memory regions manually in several scenarios. For example, during a hot-add event, the driver detects the hardware connection but does not automatically configure memory regions or bring the memory online. Similarly, you must manually create memory regions when you want to interleave memory across multiple CXL devices to improve performance, as the driver cannot automatically determine your desired interleave topology.
- Level 0
The CXL driver does not detect the CXL device.
At this stage, the device is not recognized by the kernel.
- Level 1
The CXL driver detects the hardware and establishes a connection.
The device is recognized on system boot, or an interrupt is raised for hot-adding a CXL device. This level indicates the device is connected but unconfigured.
- Level 2
A memory region is created for the CXL device.
A specific memory region has been defined and allocated for the device, but it is not yet enabled.
- Level 3
The memory region is enabled but not yet online for the operating system.
The CXL memory region is active, but the memory blocks have not been transitioned to the "online" state for kernel use.
- Level 4
The OS can use the hot-added CXL memory device.
At this point, memory blocks are online and available to the operating system. If memory blocks are taken offline, the device returns to Level 3.
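The level definitions above form a simple decision chain, where each level requires everything the previous level requires. The following sketch encodes that chain in a hypothetical helper named cxl_level. It is not a RHEL or cxl tool and it does not probe hardware; it only maps four yes or no observations that you supply to the level numbers used in this section.

```shell
# Hypothetical helper: map four yes/no observations to a level number.
# Arguments, in order: device detected by the CXL driver, memory region
# created, memory region enabled, memory blocks online.
cxl_level() {
    detected="$1"
    region_created="$2"
    region_enabled="$3"
    blocks_online="$4"
    if [ "$detected" != "yes" ]; then
        echo 0
    elif [ "$region_created" != "yes" ]; then
        echo 1
    elif [ "$region_enabled" != "yes" ]; then
        echo 2
    elif [ "$blocks_online" != "yes" ]; then
        echo 3
    else
        echo 4
    fi
}

# A device with an enabled region whose blocks were taken offline:
cxl_level yes yes yes no   # prints 3
```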
13.4.2. Safe offline of hot-removable memory
In Red Hat Enterprise Linux (RHEL), memory used by the kernel or drivers cannot be set to an offline state or hot-removed. RHEL cannot migrate data from these areas, which can block hot-remove operations.
By default, the kernel and drivers can use all memory areas. This behavior prevents other memory devices, such as Compute Express Link (CXL), from being hot-plugged or hot-removed. To avoid this limitation, RHEL allows you to create a ZONE_MOVABLE. This memory area is not used by the kernel or drivers, ensuring that it can be safely hot-removed.
For bare metal systems, RHEL automatically configures ZONE_MOVABLE through the default udev rules for hot-plugged memory. This configuration ensures safer hot-removal of devices like CXL without manual configuration.
13.4.3. Preventing triggering the OOM Killer when hot-adding memory
The kernel requires a certain amount of memory capacity to manage hot-added memory, known as the memmap area. This area uses approximately 1.6% of the memory size, for example, 16GB for 1TB of memory. By default, this area is allocated from ZONE_NORMAL. If ZONE_NORMAL has an insufficient amount of free memory, the allocation could trigger the Out of Memory (OOM) Killer.
To prevent this, you can allocate the memmap area from the hot-added memory itself. Specific devices, such as Compute Express Link (CXL) devices and standard Dual Inline Memory Modules (DIMMs), support this feature.
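The 1.6% figure follows from the kernel keeping one descriptor for every page of memory. The following sketch reproduces the arithmetic under two assumptions that are not stated in this chapter: a page size of 4 KiB and a descriptor size of 64 bytes per page, which is typical for x86_64 but can differ on other architectures or kernel configurations. The function name memmap_gib is hypothetical.

```shell
# Estimate the memmap size in GiB for a given amount of hot-added memory.
# Assumptions: 4 KiB pages and 64 bytes of metadata per page.
memmap_gib() {
    mem_gib="$1"
    page_size=4096
    descriptor_size=64
    # bytes of memory / page size = number of pages;
    # pages * descriptor size = memmap bytes; then convert to GiB.
    echo $(( mem_gib * 1024 * 1024 * 1024 / page_size * descriptor_size / 1024 / 1024 / 1024 ))
}

memmap_gib 1024   # 1 TiB of memory, prints 16
```

With these assumptions the ratio is 64/4096, or about 1.56%, which matches the approximate value quoted above.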
Prerequisites
- You have root privileges.
- Your memory hardware supports allocating memmap on the device itself. Refer to your hardware manufacturer’s documentation for more information.
Procedure
Enable the memmap_on_memory feature by adding the following kernel boot option:
memory_hotplug.memmap_on_memory=1
Reboot the system for the change to take effect.
Verification
Verify the setting is enabled:
# cat /sys/module/memory_hotplug/parameters/memmap_on_memory
Y
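One way to add the kernel boot option is with the grubby utility. The following is a sketch under the assumption that grubby manages the boot loader entries on your system; the wrapper name enable_memmap_on_memory is hypothetical.

```shell
# Append the memmap_on_memory option to the boot entries of all installed
# kernels. Assumes the grubby tool manages your boot loader entries.
enable_memmap_on_memory() {
    grubby --update-kernel=ALL --args="memory_hotplug.memmap_on_memory=1"
}

# Example usage (as root), followed by a reboot:
# enable_memmap_on_memory
```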
13.4.4. Hot-plugging CXL memory devices
To dynamically increase system memory, you can hot-plug memory devices, such as Compute Express Link (CXL) modules or Dual Inline Memory Modules (DIMMs), in Red Hat Enterprise Linux.
Prerequisites
- You are running Red Hat Enterprise Linux 8 or later.
- You have root privileges on your system.
- Your system firmware, such as BIOS or UEFI, and your hardware components, such as a CXL switch or backplane, support memory hot-plug events.
Procedure
Insert the supported memory device into the appropriate slot according to your hardware vendor’s instructions.
The hot-plug event can originate from a physical installation of a memory module into a slot, or from a logical allocation, such as dynamically assigning memory from a CXL shared memory pool.
List the online status of memory blocks and confirm whether the newly added memory was automatically onlined:
# lsmem
RANGE                                  SIZE   STATE REMOVABLE BLOCK
0x0000000000000000-0x000000007fffffff    2G  online       yes   0-1
0x0000000100000000-0x000000087fffffff   30G  online       yes  4-33
0x0000000880000000-0x00000008ffffffff    2G offline           34-41

Memory block size:       256M
Total online memory:      32G
Total offline memory:      2G
Note: If the memory remains offline, verify whether automatic memory onlining is disabled. For more information, see Disabling automatic memory onlining. You must manually online the memory blocks if the operation did not complete automatically.
Verification
Check the system memory and NUMA topology to ensure the new memory is fully integrated:
# free -h
# numactl -H
If you encounter issues, review hardware documentation and ensure all prerequisites are met.
Troubleshooting
Optional: Review the relevant logs for errors or warnings:
# journalctl -k | grep -i memory
# dmesg | tail
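To check from a script whether any memory blocks stayed offline after a hot-plug event, you can filter the lsmem listing on its STATE column. The following is a minimal sketch that parses the default lsmem output format shown in this section; the function name offline_block_ranges is hypothetical.

```shell
# Print the BLOCK column of every range that lsmem reports as offline.
# Reads lsmem output on standard input. The block range is taken as the
# last field, whether or not the REMOVABLE column is filled in.
offline_block_ranges() {
    awk '$3 == "offline" { print $NF }'
}

# Example usage:
# lsmem | offline_block_ranges
```

An empty result means that no range is reported as offline.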
13.4.5. Hot-removing CXL memory devices
To safely hot-remove a Compute Express Link (CXL) memory device, you must ensure that the memory is no longer in use by the kernel. You can safely hot-remove a CXL memory device by taking the memory offline and disabling the associated memory regions.
Prerequisites
- You have root privileges on your system.
Procedure
Set the memory to offline.
This allows the kernel to migrate data away from the memory area being removed, ensuring it is no longer in use.
Disable the CXL memory region if it is still active:
# cxl disable-region
Note: The cxl disable-region command also automatically takes the memory offline if it is still online.
Destroy the CXL memory region:
# cxl destroy-region
Disable the memory device:
# cxl disable-memdev
After all commands complete successfully, physically remove the CXL device or disconnect it by using your hardware’s management interface.
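The three cxl commands in this procedure must run in order, and a later step should not run if an earlier one fails. The following sketch chains them for a single region and memory device. The wrapper name cxl_hot_remove is hypothetical, and region0 and mem0 are placeholder names; use the region and memory device names that your system reports, for example in the output of the cxl list command.

```shell
# Tear down one CXL region and its memory device in the order described
# above. Stops at the first command that fails.
# region0 and mem0 in the usage line are placeholder names.
cxl_hot_remove() {
    region="$1"
    memdev="$2"
    cxl disable-region "$region" &&
    cxl destroy-region "$region" &&
    cxl disable-memdev "$memdev"
}

# Example usage (as root):
# cxl_hot_remove region0 mem0
```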
13.4.6. Verifying CXL memory status
You can verify the status and configuration of Compute Express Link (CXL) memory devices by using various command-line tools.
Prerequisites
- You have root privileges.
- The pciutils package is installed (for lspci).
- The numactl package is installed.
Procedure
Check the CXL device connection by using the lspci command. Because CXL is an extension of PCIe, you can use lspci to view device details.
# lspci -vvv
01:00.0 CXL: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX [CXL Memory Device (CXL 2.x)]
        Subsystem: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        ...
        NUMA node: 0
        ...
Check the CXL memory status as NUMA nodes. CXL memory is treated as CPU-less NUMA nodes. Use the numactl command to check the status.
# numactl -H --cpu-compress
available: 4 nodes (0-3)
node 0 cpus: 0-95, 192-287 (192)
node 0 size: 15219 MB
...
node 2 cpus: (0)
node 2 size: 258048 MB
...
Verify BIOS/EFI configuration support by checking specifically for a "soft reserved" area in the BIOS-e820 output by using dmesg:
[ 0.000000] BIOS-e820: [mem 0x0000001070000000-0x000000506fffffff] soft reserved
Support for configuring CXL memory devices depends on the EFI_MEMORY_SP BIOS feature. If this setting is not enabled, Red Hat Enterprise Linux (RHEL) treats the CXL memory as ordinary DDR DRAM. Refer to your platform’s firmware setup manual for instructions.
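The check for a "soft reserved" entry can be scripted by filtering the kernel log. The following is a minimal sketch that matches the line format shown above; the function name has_soft_reserved is hypothetical.

```shell
# Succeeds if the kernel log text on standard input contains a
# "soft reserved" BIOS-e820 entry.
has_soft_reserved() {
    grep -q 'BIOS-e820:.*soft reserved'
}

# Example usage (as root):
# dmesg | has_soft_reserved && echo "soft reserved range found"
```

Early boot messages can be rotated out of the kernel ring buffer on long-running systems, so a missing match is not conclusive on its own.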
13.4.8. Considerations for hot-unplugging memory devices
When planning to hot-remove memory devices, consider the following potential issues and limitations.
- Migration failure messages
- When offlining memory, you can see migration failure messages such as:
[ 6381.169189] page dumped because: migration failure
[ 6381.181353] migrating pfn 8842600 failed ret:2
...
This indicates a failure to move data from the memory targeted for removal. The kernel retries memory migration, and it usually completes successfully. However, hot removal can still fail under the following system conditions:
- Insufficient memory after removal
- If memory capacity is insufficient after removal, memory migration might fail. Ensure enough free memory remains before starting the hot removal.
- Using Transparent Hugepages
- If Transparent Hugepages are used, there might not be enough free memory for 2MB pages, which can cause migration delays or failures. Consider disabling Transparent Hugepages or ensuring that sufficient free memory is available.
- Using Hugetlb Pages
- 1GB hugepages: Hot removal is not supported.
2MB hugepages: Requires contiguous free memory for migration. If the memory is not available, hot removal fails.
- Out of Memory (OOM) Killer risks
- If a process is bound to a Non-Uniform Memory Access (NUMA) node through mbind(MPOL_BIND) and you hot-remove that node, the kernel triggers the OOM Killer due to insufficient memory. Stop these processes before performing the hot removal.
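Before a hot removal, it can help to check whether Transparent Hugepages are active. The following is a minimal sketch that reads the standard sysfs file for the Transparent Hugepages mode, in which the active value appears in square brackets. The function name thp_mode and its optional path argument are hypothetical.

```shell
# Print the active Transparent Hugepages mode, that is, the value shown in
# square brackets in the sysfs file. The optional argument overrides the
# file path so the function can be tried against a scratch file.
thp_mode() {
    thp_file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    sed 's/.*\[\(.*\)\].*/\1/' "$thp_file"
}

# Example usage:
# thp_mode
# To disable Transparent Hugepages until the next reboot (as root):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
```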