Chapter 25. Configuring an operating system to optimize memory access
You can configure the operating system to optimize memory access across workloads with the tools that are included in RHEL.
25.1. Tools for monitoring and diagnosing system memory issues
The following tools are available in Red Hat Enterprise Linux for monitoring system performance and diagnosing performance issues related to system memory:
- The vmstat tool, included in the procps-ng package, displays reports of a system's processes, memory, paging, block I/O, traps, disks, and CPU activity. It generates an instant report containing the average of these events since the machine was last booted, or since the previous report.
- The valgrind framework provides instrumentation for user-space binaries. This framework includes several tools that you can use to profile and analyze program performance, such as:
  - The memcheck tool is the default tool in valgrind. It detects and reports several memory errors that can be difficult to detect and diagnose, such as:
    - Invalid memory access
    - Use of undefined or uninitialized values
    - Incorrectly freed heap memory
    - Pointer overlap (buffer overlap)
    - Memory leaks
    Note: memcheck can only report these errors; it cannot prevent them from occurring. However, memcheck logs an error message immediately before the error occurs.
  - The cachegrind tool simulates how an application interacts with a system's cache hierarchy and branch predictor. It gathers statistics for the duration of the application's execution and displays a summary to the console.
  - The massif tool measures the heap space used by a specified application. It measures both useful space and any additional space allocated for bookkeeping and alignment purposes.

For more information, see the /usr/share/doc/valgrind-version/valgrind_manual.pdf file, and the vmstat(8) and valgrind(1) man pages on your system.
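As a quick sketch of how these tools are invoked (assuming vmstat and valgrind are installed; ./myapp is a hypothetical placeholder for your own binary):

```shell
# Guarded sketch: sample memory counters with vmstat, then run memcheck
# (the default valgrind tool) on a hypothetical binary ./myapp.
if command -v vmstat >/dev/null 2>&1; then
    # Five reports, one per second; the first row averages activity since boot.
    vmstat 1 5
    vmstat_rc=$?
else
    vmstat_rc=0   # tool not installed; nothing to report
fi
if command -v valgrind >/dev/null 2>&1 && [ -x ./myapp ]; then
    valgrind --tool=memcheck --leak-check=full ./myapp
fi
```

The guards make the sketch safe to paste on hosts where either tool is absent.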
25.2. Overview of system memory
The Linux kernel is designed to maximize the utilization of system memory (RAM). Because of these design characteristics, kernel operations dedicate most system memory to the workload, leaving only a small portion free. This free memory is reserved for special system allocations and for other low- or high-priority system services. The rest of the system memory is dedicated to the workload itself and is divided into the following two categories:

- File memory: Pages in this category represent parts of files in permanent storage. These pages, from the page cache, can be mapped or unmapped in the address space of an application. Applications can map files into their address space by using the mmap system call, or operate on files by using the buffered I/O read and write system calls. Buffered I/O system calls, as well as applications that map pages directly, can re-use unmapped pages. As a result, the kernel caches these pages to avoid repeated, slow I/O operations, particularly during periods of low memory usage.
- Anonymous memory: Pages in this category are used by dynamic processes and have no related files in permanent storage. These pages back the in-memory control structures of each task, such as the application stack and heap.
25.3. Virtual memory parameters
The virtual memory parameters are listed in the /proc/sys/vm directory.

The following are the available virtual memory parameters:

- vm.dirty_ratio: A percentage value. When this percentage of total system memory is modified, the system begins writing the modifications to disk. The default value is 20 percent.
- vm.dirty_background_ratio: A percentage value. When this percentage of total system memory is modified, the system begins writing the modifications to disk in the background. The default value is 10 percent.
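As an illustration, the percentage can be converted into an absolute byte figure for a given host. This is a sketch only: it reads MemTotal from /proc/meminfo, while the kernel actually computes the threshold against dirtyable memory, which is somewhat less than MemTotal.

```shell
# Sketch: translate vm.dirty_ratio (a percentage of total memory) into the
# approximate number of dirty bytes at which writeback to disk begins.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
dirty_ratio=$(cat /proc/sys/vm/dirty_ratio)
dirty_threshold_bytes=$(( mem_total_kb * 1024 * dirty_ratio / 100 ))
echo "writeback starts near ${dirty_threshold_bytes} bytes of dirty memory"
```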
- vm.overcommit_memory: Defines the conditions that determine whether a large memory request is accepted or denied. The default value is 0.

  Setting the overcommit_memory parameter's value:
  - When set to 0, the kernel grants memory requests using a heuristic based on the total available RAM and swap, rejecting only obviously excessive allocations. Otherwise virtual memory allocations are granted, which allows memory overcommitment.
  - When set to 1, the kernel performs no memory overcommit handling. This increases the possibility of memory overload, but improves performance for memory-intensive tasks.
  - When set to 2, the kernel denies allocations that would exceed the total swap space plus the percentage of physical RAM defined in overcommit_ratio. This setting reduces the risk of overcommitting memory. Use it only for systems with a swap partition larger than physical memory.
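For mode 2, the resulting ceiling can be sketched from values already exposed in /proc (an approximation that ignores hugetlb reservations):

```shell
# Sketch: the mode-2 ceiling is roughly SwapTotal plus overcommit_ratio
# percent of MemTotal (both read from /proc/meminfo).
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
ratio=$(cat /proc/sys/vm/overcommit_ratio)
commit_limit_kb=$(( swap_total_kb + mem_total_kb * ratio / 100 ))
echo "with overcommit_memory=2, allocations are capped near ${commit_limit_kb} kB"
```

The kernel's own computed figure appears as CommitLimit in /proc/meminfo.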
- vm.overcommit_ratio: Specifies the percentage of physical RAM considered when overcommit_memory is set to 2. The default value is 50.
- vm.max_map_count: Defines the maximum number of memory map areas that a process can use. The default value is 65530. Increase this value if your application needs more memory map areas.
- vm.min_free_kbytes: Sets the minimum number of kilobytes to keep free across the system, which determines the size of the reserved free-pages pool. From this value, the kernel derives the min_page, low_page, and high_page watermarks that govern the behavior of the page reclaim algorithms. A specific value is calculated for each low-memory zone, and each zone is assigned a number of reserved free pages in proportion to its size.

  Setting the vm.min_free_kbytes parameter's value:
  - Increasing the value effectively reduces the memory usable by the application working set. Therefore, use it only for kernel-driven workloads, where driver buffers must be allocated in atomic contexts.
  - Decreasing the value might render the kernel unable to service system requests if memory becomes heavily contended in the system.

  Warning: Extreme values in either direction can be detrimental to system performance. Setting vm.min_free_kbytes extremely low prevents the system from reclaiming memory effectively, which can result in system crashes and failure to service interrupts or other kernel services. Setting vm.min_free_kbytes too high considerably increases system reclaim activity, causing allocation latency due to a false direct-reclaim state; this might cause the system to immediately enter an out-of-memory state.

- /proc/PID/oom_adj: When a system runs out of memory and the panic_on_oom parameter is set to 0, the oom_killer function kills processes, starting with the process that has the highest oom_score, until the system recovers. The oom_adj parameter determines the oom_score of a process, and is set per process identifier. A value of -17 disables the oom_killer for that process. Other valid values range from -16 to 15.

  Note: Processes created by an adjusted process inherit the oom_score of that process.
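To see this bookkeeping for a live process, the per-process files can be read directly (a sketch using the current shell, /proc/self, as the target):

```shell
# Sketch: inspect the OOM-killer state of the current process.
# oom_adj is the per-process tuning knob; oom_score is the value the
# oom_killer function compares when choosing a victim.
oom_adj=$(cat /proc/self/oom_adj)
oom_score=$(cat /proc/self/oom_score)
echo "oom_adj=${oom_adj} oom_score=${oom_score}"
```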
- vm.swappiness: The swappiness value, ranging from 0 to 200, controls whether the system prioritizes reclaiming memory from anonymous pages or from the page cache. The default value is 60.

  Setting the swappiness parameter's value:
  - Higher values favor file-mapped workloads, swapping out the less actively accessed anonymous memory of processes. This is useful for file servers and streaming applications that keep data in memory to reduce I/O latency.
  - Lower values favor anonymous-mapped workloads, reclaiming the page cache (file-mapped memory) instead. This setting is useful for applications that do not depend heavily on file system data and use dynamically allocated private memory, such as mathematical and number-crunching applications and some hardware virtualization supervisors such as QEMU.

  Warning: Setting vm.swappiness to 0 aggressively avoids swapping anonymous memory out to disk. This increases the likelihood that the oom_killer function kills processes during memory- or I/O-intensive workloads.
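A small sketch of inspecting the running value and persisting a change (the drop-in file name is a hypothetical example, and applying it requires root):

```shell
# Sketch: read the current swappiness from /proc, then show how a lower
# value could be persisted for an anonymous-memory-heavy workload.
swappiness=$(cat /proc/sys/vm/swappiness)
echo "vm.swappiness is currently ${swappiness}"

# Persisting a change (requires root; the file name is illustrative):
#   echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf
#   sysctl -p /etc/sysctl.d/90-swappiness.conf
```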
25.4. File system parameters
The file system parameters are listed in the /proc/sys/fs directory. The following are the available file system parameters:

- aio-max-nr: Defines the maximum number of events allowed in all active asynchronous input/output contexts. The default value is 65536. Modifying this value does not pre-allocate or resize any kernel data structures.
- file-max: Determines the maximum number of file handles for the entire system. On Red Hat Enterprise Linux 10, the default value is 9223372036854775807.

  Note: The default value is set by systemd and corresponds to the configurable maximum.
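Both limits, along with current file-handle usage, can be queried directly (a sketch assuming a Linux /proc filesystem):

```shell
# Sketch: query the async-I/O and file-handle ceilings from /proc/sys/fs.
aio_max=$(cat /proc/sys/fs/aio-max-nr)
file_max=$(cat /proc/sys/fs/file-max)
echo "aio-max-nr=${aio_max} file-max=${file_max}"
# file-nr reports allocated handles, unused handles, and the file-max ceiling.
cat /proc/sys/fs/file-nr
```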
25.5. Kernel parameters
The default values for the kernel parameters are located in the /proc/sys/kernel/ directory. These are default values provided by the kernel, or values specified by a user by using sysctl. The following kernel parameters configure limits for the msg* and shm* System V IPC (sysvipc) system calls:

- msgmax: Defines the maximum allowed size, in bytes, of any single message in a message queue. This value must not exceed the size of the queue (msgmnb). Use the sysctl kernel.msgmax command to determine the current msgmax value on your system.
- msgmnb: Defines the maximum size, in bytes, of a single message queue. Use the sysctl kernel.msgmnb command to determine the current msgmnb value on your system.
- msgmni: Defines the maximum number of message queue identifiers, and therefore the maximum number of queues. Use the sysctl kernel.msgmni command to determine the current msgmni value on your system.
- shmall: Defines the total number of shared memory pages that can be used on the system at one time. For example, a page is 4096 bytes on the AMD64 and Intel 64 architectures. Use the sysctl kernel.shmall command to determine the current shmall value on your system.
- shmmax: Defines the maximum size, in bytes, of a single shared memory segment allowed by the kernel. Shared memory segments up to 1 GB are supported in the kernel. Use the sysctl kernel.shmmax command to determine the current shmmax value on your system.
- shmmni: Defines the system-wide maximum number of shared memory segments. The default value is 4096 on all systems.
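The message-queue limits above can also be read together straight from /proc (a sketch; the values printed are whatever your kernel currently uses):

```shell
# Sketch: read the SysV message-queue limits from /proc/sys/kernel.
msgmax=$(cat /proc/sys/kernel/msgmax)   # largest single message, in bytes
msgmnb=$(cat /proc/sys/kernel/msgmnb)   # largest single queue, in bytes
msgmni=$(cat /proc/sys/kernel/msgmni)   # maximum number of queues
echo "message <= ${msgmax} B, queue <= ${msgmnb} B, up to ${msgmni} queues"
```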