Chapter 35. Configuring an operating system to optimize memory access
You can configure the operating system to optimize memory access across workloads with the tools that are included in RHEL.
35.1. Tools for monitoring and diagnosing system memory issues
The following tools are available in Red Hat Enterprise Linux 8 for monitoring system performance and diagnosing performance problems related to system memory:
vmstat
The vmstat tool, provided by the procps-ng package, displays reports of a system's processes, memory, paging, block I/O, traps, disks, and CPU activity. It provides an instantaneous report of the average of these events since the machine was last turned on, or since the previous report.
valgrind
The valgrind framework provides instrumentation to user-space binaries. Install this tool using the yum install valgrind command. It includes a number of tools that you can use to profile and analyze program performance (example invocations are shown after this list), such as:
memcheck
The default valgrind tool. It detects and reports on a number of memory errors that can be difficult to detect and diagnose, such as:
- Memory access that should not occur
- Undefined or uninitialized value use
- Incorrectly freed heap memory
- Pointer overlap
- Memory leaks
Note: memcheck can only report these errors; it cannot prevent them from occurring. However, memcheck logs an error message immediately before the error occurs.
cachegrind
Simulates application interaction with a system's cache hierarchy and branch predictor. It gathers statistics for the duration of the application's execution and outputs a summary to the console.
massif
Measures the heap space used by a specified application. It measures both useful space and any additional space allocated for bookkeeping and alignment purposes.
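For example, a minimal sketch of typical invocations, assuming a locally built test binary named ./myapp (the binary name and the massif output file suffix are placeholders):
$ valgrind --tool=memcheck --leak-check=full ./myapp
$ valgrind --tool=cachegrind ./myapp
$ valgrind --tool=massif ./myapp
$ ms_print massif.out.<PID>
Each tool slows the program down considerably while it runs, so these commands are intended for test environments rather than production workloads.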
35.2. Overview of a system’s memory
The Linux Kernel is designed to maximize the utilization of a system’s memory resources (RAM). Due to these design characteristics, and depending on the memory requirements of the workload, part of the system’s memory is in use within the kernel on behalf of the workload, while a small part of the memory is free. This free memory is reserved for special system allocations, and for other low or high priority system services.
The rest of the system’s memory is dedicated to the workload itself, and divided into the following two categories:
File memory
Pages in this category represent parts of files in permanent storage. These pages, from the page cache, can be mapped or unmapped in an application’s address space. Applications can map files into their address space using the mmap system calls, or can operate on files via the buffered I/O read or write system calls.
Buffered I/O system calls, as well as applications that map pages directly, can re-utilize unmapped pages. As a result, the kernel keeps these pages in the cache, especially when the system is not running any memory intensive tasks, to avoid re-issuing costly I/O operations over the same set of pages.
Anonymous memory
Pages in this category are dynamically allocated by a process and are not related to files in permanent storage. These pages back the in-memory control structures of each task, such as the application stack and heap areas.
Figure 35.1. Memory usage patterns
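To see how this split looks on a running system, you can compare the file-backed and anonymous pools reported by /proc/meminfo; the Cached field roughly corresponds to file memory and AnonPages to anonymous memory (the values below are illustrative):
$ grep -E '^(MemTotal|MemFree|Cached|AnonPages):' /proc/meminfo
MemTotal:       16222816 kB
MemFree:         1204512 kB
Cached:          9833200 kB
AnonPages:       4456816 kB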
35.3. Virtual memory parameters
The virtual memory parameters are listed in the /proc/sys/vm directory.
The following are the available virtual memory parameters:
vm.dirty_ratio
A percentage value. When this percentage of the total system memory is modified, the system begins writing the modifications to the disk. The default value is 20 percent.
vm.dirty_background_ratio
A percentage value. When this percentage of total system memory is modified, the system begins writing the modifications to the disk in the background. The default value is 10 percent.
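For example, a minimal sketch of checking the current thresholds and lowering them persistently; the 15 and 5 percent values and the drop-in file name are illustrative, not recommendations:
# sysctl vm.dirty_ratio vm.dirty_background_ratio
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
# printf 'vm.dirty_ratio = 15\nvm.dirty_background_ratio = 5\n' > /etc/sysctl.d/90-dirty.conf
# sysctl -p /etc/sysctl.d/90-dirty.conf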
vm.overcommit_memory
Defines the conditions that determine whether a large memory request is accepted or denied. The default value is 0.
By default, the kernel checks whether a virtual memory allocation request fits into the present amount of memory (total + swap) and rejects only large requests. Otherwise virtual memory allocations are granted, which means memory can be overcommitted.
Setting the overcommit_memory parameter’s value:
- When this parameter is set to 0, the default, the kernel applies the heuristic overcommit handling described above.
- When this parameter is set to 1, the kernel performs no memory overcommit handling. This increases the possibility of memory overload, but improves performance for memory-intensive tasks.
- When this parameter is set to 2, the kernel denies requests for memory equal to or larger than the sum of the total available swap space and the percentage of physical RAM specified in overcommit_ratio. This reduces the risk of overcommitting memory, but is recommended only for systems with swap areas larger than their physical memory.
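You can check the current mode and the commit accounting that mode 2 enforces through sysctl and /proc/meminfo (the values below are illustrative); CommitLimit is derived from the swap space plus the overcommit_ratio share of physical RAM, and is only enforced when overcommit_memory is set to 2:
# sysctl vm.overcommit_memory vm.overcommit_ratio
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
# grep -E '^(CommitLimit|Committed_AS):' /proc/meminfo
CommitLimit:    12334556 kB
Committed_AS:    8123456 kB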
vm.overcommit_ratio
Specifies the percentage of physical RAM considered when overcommit_memory is set to 2. The default value is 50.
vm.max_map_count
Defines the maximum number of memory map areas that a process can use. The default value is 65530. Increase this value if your application needs more memory map areas.
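A rough way to see how close a process is to this limit is to count its current mappings; 1234 is a hypothetical PID, and each line in /proc/PID/maps approximately corresponds to one map area:
# sysctl vm.max_map_count
vm.max_map_count = 65530
# wc -l < /proc/1234/maps
1318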
vm.min_free_kbytes
Sets the size of the reserved free pages pool. It is also responsible for setting the min_page, low_page, and high_page thresholds that govern the behavior of the Linux kernel’s page reclaim algorithms. It also specifies the minimum number of kilobytes to keep free across the system. This calculates a specific value for each low memory zone, each of which is assigned a number of reserved free pages in proportion to their size.
Setting the vm.min_free_kbytes parameter’s value:
- Increasing the parameter value effectively reduces the memory usable by the application working set. Therefore, you might want to use it only for kernel-driven workloads, where driver buffers need to be allocated in atomic contexts.
- Decreasing the parameter value might render the kernel unable to service system requests, if memory becomes heavily contended in the system.
Warning: Extreme values can be detrimental to the system’s performance. Setting vm.min_free_kbytes to an extremely low value prevents the system from reclaiming memory effectively, which can result in system crashes and failure to service interrupts or other kernel services. However, setting vm.min_free_kbytes too high considerably increases system reclaim activity, causing allocation latency due to a false direct reclaim state. This might cause the system to enter an out-of-memory state immediately.
The vm.min_free_kbytes parameter also sets a page reclaim watermark, called min_pages. This watermark is used as a factor when determining the two other memory watermarks, low_pages and high_pages, that govern page reclaim algorithms.
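To relate the parameter to the per-zone watermarks it drives, you can compare its value with the min, low, and high page counts reported in /proc/zoneinfo (output trimmed; the numbers are illustrative):
# cat /proc/sys/vm/min_free_kbytes
67584
# grep -E '^Node|^ +(min|low|high) ' /proc/zoneinfo
Node 0, zone      DMA
        min      68
        low      85
        high     102
...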
/proc/PID/oom_adj
In the event that a system runs out of memory, and the panic_on_oom parameter is set to 0, the oom_killer function kills processes, starting with the process that has the highest oom_score, until the system recovers.
The oom_adj parameter determines the oom_score of a process. This parameter is set per process identifier. A value of -17 disables the oom_killer for that process. Other valid values range from -16 to 15.
Processes created by an adjusted process inherit the oom_score of that process.
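For example, to exempt a critical process from the OOM killer (1234 is a hypothetical PID):
# echo -17 > /proc/1234/oom_adj
# cat /proc/1234/oom_adj
-17
On current kernels, writes to this legacy interface are translated internally to the newer oom_score_adj value.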
vm.swappiness
The swappiness value, ranging from 0 to 200, controls the degree to which the system favors reclaiming memory from the anonymous memory pool or the page cache memory pool.
Setting the swappiness parameter’s value:
- Higher values favor file-mapped driven workloads, while swapping out the less actively accessed anonymous mapped memory of processes. This is useful for file servers or streaming applications that depend on data from files in storage residing in memory to reduce I/O latency for service requests.
- Lower values favor anonymous-mapped driven workloads, while reclaiming the page cache (file mapped memory). This setting is useful for applications that do not depend heavily on file system information and heavily utilize dynamically allocated and private memory, such as mathematical and number crunching applications, and a few hardware virtualization supervisors like QEMU.
The default value of the vm.swappiness parameter is 60.
Warning:
- Setting vm.swappiness to 0 aggressively avoids swapping anonymous memory out to a disk. This increases the risk of processes being killed by the oom_killer function under memory or I/O intensive workloads.
- If you are using cgroups V1, the per-cgroup swappiness value exclusive to cgroups V1 results in the system-wide swappiness configured by the vm.swappiness parameter having little to no effect on the swap behavior of the system. This issue might lead to unexpected and inconsistent swap behavior. In such cases, consider using the vm.force_cgroup_v2_swappiness parameter. For more information, see the Red Hat Knowledgebase solution Premature swapping with swappiness=0 while there is still plenty of pagecache to be reclaimed.
force_cgroup_v2_swappiness
This control is used to deprecate the per-cgroup swappiness value, which is available only in cgroups V1. Most system and user processes run within a cgroup, and cgroup swappiness values default to 60. This can lead to situations where the system-wide swappiness value has little effect on the swap behavior of the system. If you do not need the per-cgroup swappiness feature, configure the system with force_cgroup_v2_swappiness=1 to get more consistent swappiness behavior across the whole system.
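A minimal sketch of comparing the system-wide value with a per-cgroup value under cgroups V1, and enabling the override; the system.slice path is illustrative and exists only when the V1 memory controller is in use, and force_cgroup_v2_swappiness requires a kernel that provides it:
# sysctl vm.swappiness
vm.swappiness = 60
# cat /sys/fs/cgroup/memory/system.slice/memory.swappiness
60
# sysctl -w vm.force_cgroup_v2_swappiness=1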
35.4. File system parameters
The file system parameters are listed in the /proc/sys/fs directory. The following are the available file system parameters:
aio-max-nr
Defines the maximum allowed number of events in all active asynchronous input/output contexts. The default value is 65536. Modifying this value does not pre-allocate or resize any kernel data structures.
file-max
Determines the maximum number of file handles for the entire system. The default value on Red Hat Enterprise Linux 8 is either 8192 or one tenth of the free memory pages available at the time the kernel starts, whichever is higher. Raising this value can resolve errors caused by a lack of available file handles.
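To check the system-wide limits against current usage, compare the sysctl values with the counters in /proc/sys/fs/file-nr, which lists allocated handles, unused handles, and the maximum (the values below are illustrative):
# sysctl fs.aio-max-nr fs.file-max
fs.aio-max-nr = 65536
fs.file-max = 1621504
# cat /proc/sys/fs/file-nr
12704   0       1621504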
35.5. Kernel parameters
The default values for the kernel parameters are located in the /proc/sys/kernel/ directory. These are default values provided by the kernel, or values specified by a user via sysctl.
The following are the available kernel parameters used to set up limits for the msg* and shm* System V IPC (sysvipc) system calls:
msgmax
Defines the maximum allowed size in bytes of any single message in a message queue. This value must not exceed the size of the queue (msgmnb). Use the sysctl kernel.msgmax command to determine the current msgmax value on your system.
msgmnb
Defines the maximum size in bytes of a single message queue. Use the sysctl kernel.msgmnb command to determine the current msgmnb value on your system.
msgmni
Defines the maximum number of message queue identifiers, and therefore the maximum number of queues. Use the sysctl kernel.msgmni command to determine the current msgmni value on your system.
shmall
Defines the total amount of shared memory pages that can be used on the system at one time. For example, a page is 4096 bytes on the AMD64 and Intel 64 architectures. Use the sysctl kernel.shmall command to determine the current shmall value on your system.
shmmax
Defines the maximum size in bytes of a single shared memory segment allowed by the kernel. Shared memory segments up to 1 GB are now supported in the kernel. Use the sysctl kernel.shmmax command to determine the current shmmax value on your system.
shmmni
Defines the system-wide maximum number of shared memory segments. The default value is 4096 on all systems.
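You can review all of these System V IPC limits at once with sysctl, or with the ipcs utility (the values below are typical defaults and may differ on your system):
# sysctl kernel.msgmax kernel.msgmnb kernel.msgmni kernel.shmall kernel.shmmax kernel.shmmni
kernel.msgmax = 8192
kernel.msgmnb = 16384
kernel.msgmni = 32000
kernel.shmall = 18446744073692774399
kernel.shmmax = 18446744073692774399
kernel.shmmni = 4096
# ipcs -l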