Chapter 35. Configuring an operating system to optimize memory access
You can configure the operating system to optimize memory access across workloads with the tools that are included in RHEL.
35.1. Tools for monitoring and diagnosing system memory issues
The following tools are available in Red Hat Enterprise Linux 8 for monitoring system performance and diagnosing performance problems related to system memory:
- vmstat
  Provided by the procps-ng package, the vmstat tool displays reports of a system's processes, memory, paging, block I/O, traps, disks, and CPU activity. It provides an instantaneous report of the average of these events since the machine was last turned on, or since the previous report.
- valgrind
  A framework that provides instrumentation to user-space binaries. Install it using the yum install valgrind command. It includes a number of tools that you can use to profile and analyze program performance (example invocations follow this list), such as:
  - memcheck
    The default valgrind tool. It detects and reports on a number of memory errors that can be difficult to detect and diagnose, such as:
    - Memory access that should not occur
    - Use of undefined or uninitialized values
    - Incorrectly freed heap memory
    - Pointer overlap
    - Memory leaks
    Note: memcheck can only report these errors; it cannot prevent them from occurring. However, memcheck logs an error message immediately before the error occurs.
  - cachegrind
    Simulates application interaction with a system's cache hierarchy and branch predictor. It gathers statistics for the duration of the application's execution and outputs a summary to the console.
  - massif
    Measures the heap space used by a specified application. It measures both useful space and any additional space allocated for bookkeeping and alignment purposes.
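For illustration, assuming a locally built program named ./myapp (a hypothetical binary used here only as an example), typical invocations of these tools might look like the following:

    vmstat 1 5                            # five reports, one per second
    valgrind --tool=memcheck ./myapp      # default tool; detects memory errors at run time
    valgrind --tool=cachegrind ./myapp    # simulates the cache hierarchy and branch predictor
    valgrind --tool=massif ./myapp        # profiles heap usage

cachegrind and massif also write per-run output files (cachegrind.out.<pid> and massif.out.<pid>) that you can inspect with the cg_annotate and ms_print commands.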
Additional resources
- vmstat(8) and valgrind(1) man pages on your system
- /usr/share/doc/valgrind-version/valgrind_manual.pdf file
35.2. Overview of a system’s memory
The Linux kernel is designed to maximize the utilization of a system’s memory resources (RAM). Due to these design characteristics, and depending on the memory requirements of the workload, part of the system’s memory is in use within the kernel on behalf of the workload, while a small part of the memory is free. This free memory is reserved for special system allocations, and for other low or high priority system services.
The rest of the system’s memory is dedicated to the workload itself, and divided into the following two categories:
File memory
- Pages in this category represent parts of files in permanent storage. These pages, from the page cache, can be mapped or unmapped in an application’s address space. Applications can map files into their address space using the mmap system call, or operate on files through the buffered I/O read or write system calls.
  Buffered I/O system calls, as well as applications that map pages directly, can re-utilize unmapped pages. As a result, the kernel keeps these pages in the cache, especially when the system is not running any memory-intensive tasks, to avoid re-issuing costly I/O operations over the same set of pages.
Anonymous memory
- Pages in this category are allocated dynamically by a process, or are not related to files in permanent storage. These pages back the in-memory control structures of each task, such as the application stack and heap areas.
Figure 35.1. Memory usage patterns
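To get a rough sense of this split on a running system, you can read /proc/meminfo, where the Cached field approximates file (page cache) memory and AnonPages approximates anonymous memory. The field names are standard, but the values you see depend entirely on the workload:

    grep -E '^(MemTotal|MemFree|Cached|AnonPages):' /proc/meminfo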
35.3. Virtual memory parameters
The virtual memory parameters are listed in the /proc/sys/vm directory.
The following are the available virtual memory parameters:
vm.dirty_ratio
- A percentage value. When this percentage of the total system memory is modified, the system begins writing the modifications to disk. The default value is 20 percent.
vm.dirty_background_ratio
- A percentage value. When this percentage of total system memory is modified, the system begins writing the modifications to disk in the background. The default value is 10 percent.
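You can check the current writeback thresholds with sysctl; the output below simply reflects the defaults described above:

    sysctl vm.dirty_ratio vm.dirty_background_ratio
    # vm.dirty_ratio = 20
    # vm.dirty_background_ratio = 10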
vm.overcommit_memory
- Defines the conditions that determine whether a large memory request is accepted or denied. The default value is 0.
  By default, the kernel checks whether a virtual memory allocation request fits into the present amount of memory (total + swap) and rejects only large requests. Otherwise virtual memory allocations are granted, which means memory can be overcommitted.
  Setting the overcommit_memory parameter’s value:
  - When this parameter is set to 1, the kernel performs no memory overcommit handling. This increases the possibility of memory overload, but improves performance for memory-intensive tasks.
  - When this parameter is set to 2, the kernel denies requests for memory equal to or larger than the sum of the total available swap space and the percentage of physical RAM specified in overcommit_ratio. This reduces the risk of overcommitting memory, but is recommended only for systems with swap areas larger than their physical memory.
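As a minimal sketch, assuming you want to try strict overcommit accounting on a system whose swap is larger than its RAM, you can switch modes at runtime and revert afterwards; the change is not persistent across reboots:

    sysctl -w vm.overcommit_memory=2    # strict accounting, used together with vm.overcommit_ratio
    sysctl -w vm.overcommit_memory=0    # revert to the default heuristic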
vm.overcommit_ratio
- Specifies the percentage of physical RAM considered when overcommit_memory is set to 2. The default value is 50.
vm.max_map_count
- Defines the maximum number of memory map areas that a process can use. The default value is 65530. Increase this value if your application needs more memory map areas.
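To make settings like these persistent across reboots, the usual approach on RHEL is a drop-in file under /etc/sysctl.d. The file name and the values below are only an example, not recommendations:

    # /etc/sysctl.d/90-memory-tuning.conf (hypothetical file name)
    vm.overcommit_ratio = 80
    vm.max_map_count = 262144

Load the file without rebooting with sysctl -p /etc/sysctl.d/90-memory-tuning.conf.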
vm.min_free_kbytes
- Sets the size of the reserved free pages pool. It is also responsible for setting the min_pages, low_pages, and high_pages thresholds that govern the behavior of the Linux kernel’s page reclaim algorithms. It also specifies the minimum number of kilobytes to keep free across the system. From this, the kernel calculates a specific value for each low memory zone, and assigns each zone a number of reserved free pages in proportion to its size.
  Setting the vm.min_free_kbytes parameter’s value:
  - Increasing the parameter value effectively reduces the memory available to the application working set. Therefore, you might want to use it only for kernel-driven workloads, where driver buffers need to be allocated in atomic contexts.
  - Decreasing the parameter value might render the kernel unable to service system requests if memory becomes heavily contended in the system.
  Warning: Extreme values can be detrimental to the system’s performance. Setting vm.min_free_kbytes to an extremely low value prevents the system from reclaiming memory effectively, which can result in system crashes and failure to service interrupts or other kernel services. However, setting vm.min_free_kbytes too high considerably increases system reclaim activity, causing allocation latency due to a false direct reclaim state. This might cause the system to enter an out-of-memory state immediately.
  The vm.min_free_kbytes parameter also sets a page reclaim watermark, called min_pages. This watermark is used as a factor when determining the two other memory watermarks, low_pages and high_pages, that govern page reclaim algorithms.
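To see the current reserve and how it maps onto the per-zone watermarks, you can read the parameter and inspect /proc/zoneinfo, where the min, low, and high fields of each zone correspond to the watermarks discussed above:

    sysctl vm.min_free_kbytes    # system-wide reserve, in kibibytes
    less /proc/zoneinfo          # per-zone min, low, and high watermarks, in pages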
/proc/PID/oom_adj
- In the event that a system runs out of memory, and the panic_on_oom parameter is set to 0, the oom_killer function kills processes, starting with the process that has the highest oom_score, until the system recovers.
  The oom_adj parameter determines the oom_score of a process. This parameter is set per process identifier. A value of -17 disables the oom_killer for that process. Other valid values range from -16 to 15.
  Processes created by an adjusted process inherit the oom_score of that process.
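For example, to exempt a critical process from the oom_killer, you can write -17 to its oom_adj file as root. The PID 1234 below is purely illustrative; on current kernels oom_adj is a legacy interface that the kernel maps onto the newer oom_score_adj value:

    cat /proc/1234/oom_score       # current score used by the oom_killer to select victims
    echo -17 > /proc/1234/oom_adj  # disable the oom_killer for this process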
vm.swappiness
- The swappiness value, ranging from 0 to 200, controls the degree to which the system favors reclaiming memory from the anonymous memory pool or from the page cache memory pool.
  Setting the swappiness parameter’s value:
  - Higher values favor file-mapped workloads and swap out the less actively accessed anonymous memory of processes. This is useful for file servers or streaming applications that depend on data from files in storage residing in memory to reduce I/O latency for service requests.
  - Lower values favor anonymous-memory workloads and instead reclaim the page cache (file-mapped memory). This setting is useful for applications that do not depend heavily on file system data and that make heavy use of dynamically allocated and private memory, such as mathematical and number-crunching applications, and some hardware virtualization hypervisors such as QEMU.
  The default value of the vm.swappiness parameter is 60.
  Warning:
  - Setting vm.swappiness to 0 aggressively avoids swapping anonymous memory out to disk, which increases the risk of processes being killed by the oom_killer function under memory-intensive or I/O-intensive workloads.
  - If you are using cgroupsV1, the per-cgroup swappiness value exclusive to cgroupsV1 results in the system-wide swappiness configured by the vm.swappiness parameter having little to no effect on the swap behavior of the system. This issue might lead to unexpected and inconsistent swap behavior.
    In such cases, consider using the vm.force_cgroup_v2_swappiness parameter.
    For more information, see the Premature swapping with swappiness=0 while there is still plenty of pagecache to be reclaimed KCS solution.
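As a sketch, to bias reclaim away from anonymous memory at runtime you can lower the value with sysctl; the value 10 is illustrative, not a recommendation:

    sysctl vm.swappiness           # show the current value (60 by default)
    sysctl -w vm.swappiness=10     # prefer reclaiming page cache over swapping out anonymous memory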
force_cgroup_v2_swappiness
- This control deprecates the per-cgroup swappiness value that is available only in cgroupsV1. Most system and user processes run within a cgroup, and cgroup swappiness values default to 60. This can lead to situations where the system-wide swappiness value has little effect on the swap behavior of the system. If you do not need the per-cgroup swappiness feature, you can configure the system with force_cgroup_v2_swappiness=1 to get more consistent swappiness behavior across the whole system.
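A minimal sketch of that setting, applied with sysctl and assuming your kernel exposes this control as described above:

    sysctl -w vm.force_cgroup_v2_swappiness=1    # ignore per-cgroup (cgroupsV1) swappiness values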
Additional resources
- sysctl(8) man page on your system
- Setting memory-related kernel parameters
35.4. File system parameters
The file system parameters are listed in the /proc/sys/fs directory. The following are the available file system parameters:
aio-max-nr
- Defines the maximum allowed number of events in all active asynchronous input/output contexts. The default value is 65536. Modifying this value does not pre-allocate or resize any kernel data structures.
file-max
- Determines the maximum number of file handles for the entire system. The default value on Red Hat Enterprise Linux 8 is either 8192 or one tenth of the free memory pages available at the time the kernel starts, whichever is higher.
  Raising this value can resolve errors caused by a lack of available file handles.
Additional resources
- sysctl(8) man page on your system
35.5. Kernel parameters
The default values for the kernel parameters are located in the /proc/sys/kernel/ directory. These are either default values provided by the kernel or values specified by a user via sysctl.
The following are the available kernel parameters used to set up limits for the msg* and shm* System V IPC (sysvipc) system calls:
msgmax
- Defines the maximum allowed size in bytes of any single message in a message queue. This value must not exceed the size of the queue (msgmnb). Use the sysctl kernel.msgmax command to determine the current msgmax value on your system.
msgmnb
- Defines the maximum size in bytes of a single message queue. Use the sysctl kernel.msgmnb command to determine the current msgmnb value on your system.
msgmni
- Defines the maximum number of message queue identifiers, and therefore the maximum number of queues. Use the sysctl kernel.msgmni command to determine the current msgmni value on your system.
shmall
- Defines the total amount of shared memory pages that can be used on the system at one time. For example, a page is 4096 bytes on the AMD64 and Intel 64 architecture. Use the sysctl kernel.shmall command to determine the current shmall value on your system.
shmmax
- Defines the maximum size in bytes of a single shared memory segment allowed by the kernel. Shared memory segments up to 1 GB are now supported in the kernel. Use the sysctl kernel.shmmax command to determine the current shmmax value on your system.
shmmni
- Defines the system-wide maximum number of shared memory segments. The default value is 4096 on all systems.
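To review all of these System V IPC limits at once, query the kernel sysctl namespace or use ipcs; actual values vary by system:

    sysctl kernel.msgmax kernel.msgmnb kernel.msgmni
    sysctl kernel.shmall kernel.shmmax kernel.shmmni
    ipcs -l    # human-readable summary of the current System V IPC limits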
Additional resources
- sysvipc(7) and sysctl(8) man pages on your system