Chapter 2. Memory management on RHEL for Real Time

Real-time systems use a virtual memory system, where an address referenced by a user-space application translates into a physical address. The translation occurs through a combination of page tables and address translation hardware in the underlying computing system. An advantage of having the translation mechanism between a program and the actual memory is that the operating system can swap pages in and out when required or on demand from the CPU.

When the operating system swaps a page out of primary memory to storage, the corresponding page table entry is marked as invalid. As a result, even under normal memory pressure, the operating system can retrieve pages from one application and give them to another. This can cause unpredictable system behavior.

Memory allocation implementations include the demand paging mechanism and the memory lock (mlock()) system calls.

Note

Sharing data between CPUs in different cache and NUMA domains might cause traffic problems and bottlenecks.

When writing a multithreaded application, consider the machine topology before designing the data decomposition. Topology is the memory hierarchy, and includes CPU caches and Non-Uniform Memory Access (NUMA) nodes.

2.1. Demand paging

Demand paging is similar to a paging system with page swapping. The system loads pages stored in secondary memory only when they are required, that is, on CPU demand. All memory addresses generated by a program pass through an address translation mechanism in the processor. The addresses are then converted from process-specific virtual addresses to physical memory addresses. This is referred to as virtual memory. The two main components in the translation mechanism are page tables and translation lookaside buffers (TLBs).

Page tables

Page tables are multi-level tables in physical memory that contain mappings for virtual to physical memory. These mappings are readable by the virtual memory translation hardware in the processor.

The set of page table entries with assigned physical addresses is referred to as the resident working set. When the operating system needs to free memory for other processes, it can swap out pages from the resident working set. When a page has been swapped out, any reference to a virtual address within that page creates a page fault and causes the page to be reallocated.

When the system is extremely low on physical memory, the swapping process starts to thrash: it constantly steals pages from processes and never allows any process to complete. You can monitor the virtual memory statistics by looking for the pgfault value in the /proc/vmstat file.

Translation lookaside buffers

Translation Lookaside Buffers (TLBs) are hardware caches of virtual memory translations. Any processor core with a TLB checks the TLB in parallel with initiating a memory read of a page table entry. If the TLB entry for a virtual address is valid, the memory read is aborted and the value in the TLB is used for the address translation.

A TLB operates on the principle of locality of reference. This means that if code stays in one region of memory for a significant period of time (such as loops or call-related functions), then address translations are served from the TLB and avoid reading page tables from main memory. This can significantly speed up processing times.

When writing deterministic and fast code, use functions that maintain locality of reference. This can mean using loops rather than recursion. If recursion is unavoidable, place the recursive call at the end of the function. This is called tail recursion, which makes the code work in a relatively small region of memory and avoids fetching page table translations from main memory.
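
For illustration, the following is a minimal C sketch of both styles; the function names are hypothetical, and tail-call elimination depends on the compiler and optimization level:

#include <stdio.h>
#include <stddef.h>

/* Iterative form: the loop body stays in one small region of code and
   data, so its TLB entries remain hot. */
long sum_loop(const long *values, size_t len)
{
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += values[i];
    return total;
}

/* Tail-recursive form: the recursive call is the last operation, so an
   optimizing compiler can reuse the same stack frame, keeping the code
   and stack in a relatively small region of memory. */
static long sum_tail(const long *values, size_t len, long total)
{
    if (len == 0)
        return total;
    return sum_tail(values + 1, len - 1, total + values[0]);
}

int main(void)
{
    long v[] = { 1, 2, 3, 4 };
    printf("%ld %ld\n", sum_loop(v, 4), sum_tail(v, 4, 0));
    return 0;
}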

2.2. Major and minor page faults

RHEL for Real Time allocates memory by breaking physical memory into chunks called pages and then mapping them into virtual memory. Faults occur when a process needs a specific page that is not mapped or is no longer available in memory. A fault therefore essentially means that a page is unavailable when the CPU requires it. When a process encounters a page fault, all of its threads freeze until the kernel handles the fault. There are several ways to address this problem, but the best solution can be to adjust the source code to avoid page faults.

Minor page faults

Minor page faults occur when a process attempts to access a portion of memory before it has been initialized. In such scenarios, the system performs operations to fill the memory maps or other management structures. The severity of a minor page fault can depend on system load and other factors, but minor faults are usually short and have a negligible impact.

Major page faults

Major page faults occur when the system has to either synchronize memory buffers with the disk, swap memory pages belonging to other processes, or undertake other input/output (I/O) activity to free memory. This happens when the processor references a virtual memory address that does not have a physical page allocated to it. The reference to an unmapped page causes the processor to raise a fault and instructs the kernel code to allocate a page, which increases latency dramatically.

When an application shows a performance drop, it is beneficial to check the process information related to page faults in the /proc/ directory. For a specific process identifier (PID), using the cat command, you can view the /proc/PID/stat file for the following relevant entries:

  • Field 2: the executable file name.
  • Field 10: the number of minor page faults.
  • Field 12: the number of major page faults.

The following example demonstrates viewing page faults with the cat command and a pipe to return only the second, tenth, and twelfth fields of the /proc/PID/stat file:

# cat /proc/3366/stat | cut -d\ -f2,10,12
  (bash) 5389 0

In the example output, the process with PID 3366 is bash and it has 5389 minor page faults and no major page faults.
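
A process can also query its own fault counters programmatically. The following is a minimal C sketch using the standard getrusage() system call, whose ru_minflt and ru_majflt fields report the minor and major fault counts for the calling process:

#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage usage;

    /* RUSAGE_SELF reports statistics for the calling process. */
    if (getrusage(RUSAGE_SELF, &usage) == -1) {
        perror("getrusage");
        return 1;
    }

    /* ru_minflt counts minor page faults; ru_majflt counts major page
       faults, that is, faults that required I/O. */
    printf("minor faults: %ld, major faults: %ld\n",
           usage.ru_minflt, usage.ru_majflt);
    return 0;
}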

Additional resources

  • Linux System Programming by Robert Love

2.3. mlock() system calls

The memory lock (mlock()) system calls enable calling processes to lock or unlock a specified range of the address space and prevent Linux from paging the locked memory to swap space. After a physical page is allocated to the page table entry, references to that page are relatively fast. The memory lock system calls fall into two categories: mlock() and munlock().

The mlock() and munlock() system calls lock and unlock a specified range of process address pages. When successful, the pages in the specified range remain resident in the memory until the munlock() system call unlocks the pages.

The mlock() and munlock() system calls take the following parameters:

  • addr: specifies the start of an address range.
  • len: specifies the length of the address space in bytes.

When successful, the mlock() and munlock() system calls return 0. In case of an error, they return -1 and set errno to indicate the error.

The mlockall() and munlockall() system calls lock or unlock all of the program address space.
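
The following is a minimal C sketch of both categories, assuming a hypothetical 64 KiB buffer: mlock() and munlock() operate on a specific address range, while mlockall() with the MCL_CURRENT and MCL_FUTURE flags locks all pages that are currently mapped and all pages mapped in the future. Locking memory may require the CAP_IPC_LOCK capability or a sufficient RLIMIT_MEMLOCK resource limit:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define BUF_SIZE (64 * 1024)   /* illustrative buffer size */

int main(void)
{
    char *buf = malloc(BUF_SIZE);
    if (buf == NULL)
        return 1;

    /* Lock a specific range: buf is the start address (addr) and
       BUF_SIZE is the length in bytes (len). Returns 0 on success. */
    if (mlock(buf, BUF_SIZE) == -1) {
        perror("mlock");
        return 1;
    }

    memset(buf, 0, BUF_SIZE);   /* touch the pages while locked */

    /* Unlock the same range. */
    if (munlock(buf, BUF_SIZE) == -1) {
        perror("munlock");
        return 1;
    }
    free(buf);

    /* Lock everything mapped now (MCL_CURRENT) and any mapping created
       later (MCL_FUTURE), a common pattern in real-time programs. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall");
        return 1;
    }

    /* ... time-critical work runs here without paging to disk ... */

    munlockall();
    return 0;
}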

Note

The mlock() system call does not ensure that the program experiences no page I/O. It ensures that the data stays in memory, but cannot ensure it stays in the same physical page. Other functions, such as move_pages, and memory compactors can move data around regardless of the use of mlock().

Memory locks are applied on a per-page basis and do not stack. If two dynamically allocated memory segments share the same page, and that page is locked twice by mlock() or mlockall(), a single munlock() or munlockall() system call unlocks it. As such, it is important to be aware of the pages that the application unlocks to avoid double-locking or single-unlocking problems.

The following are the two most common workarounds to mitigate double-lock or single-unlock problems:

  • Tracking the allocated and locked memory areas and creating a wrapper function that verifies the number of page allocations before unlocking a page. This is the resource counting principle used in device drivers.
  • Making memory allocations based on page size and alignment to avoid double-locking a page, as in the sketch following this list.
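
The following is a minimal C sketch of the second workaround: posix_memalign() returns page-aligned memory, and rounding the requested length up to a whole number of pages ensures that the locked range never shares a page with another allocation. The buffer size is illustrative:

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
    size_t request = 10000;     /* illustrative size in bytes */

    /* Round the request up to a whole number of pages so the allocation
       never shares a page with another locked region. */
    size_t len = ((request + page_size - 1) / page_size) * page_size;

    void *buf;
    /* posix_memalign() returns 0 on success and stores page-aligned
       memory in buf. */
    if (posix_memalign(&buf, page_size, len) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    if (mlock(buf, len) == -1) {
        perror("mlock");
        return 1;
    }

    /* ... use the locked buffer ... */

    munlock(buf, len);
    free(buf);
    return 0;
}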

Additional resources

  • capabilities(7) man page
  • mlock(2) man page
  • mlock(3) man page
  • mlockall(2) man page
  • mmap(2) man page
  • move_pages(2) man page
  • posix_memalign(3) man page
  • posix_memalign(3p) man page

2.4. Shared libraries

The RHEL for Real Time shared libraries are called dynamic shared objects (DSOs) and are collections of pre-compiled code blocks called functions. These functions are reusable across multiple programs, and they are loaded at run time or linked in at compile time.

Linux supports the following two library classes:

  • Dynamic or shared libraries: exist as separate files outside of the executable file. These files are loaded into memory and mapped at run time.
  • Static libraries: are files linked into a program statically at compile time.

The ld.so dynamic linker loads the shared libraries required by a program and then executes the code. A DSO is loaded into memory once, and multiple processes can then reference its objects by mapping it into their address spaces. You can configure the dynamic linker to resolve all symbols at program startup, rather than on first use, by setting the LD_BIND_NOW environment variable before running the program.

Resolving symbols before program initialization can improve performance, because resolving them at application run time can cause latency if the required memory pages are located on an external disk.

Additional resources

  • ld.so(8) man page

2.5. Shared memory

In RHEL for Real Time, shared memory is a memory space shared between multiple processes. All threads created within one process context share the same address space, which makes all data structures accessible to them. With POSIX shared memory calls, you can configure processes to share a part of the address space.

You can use the following supported POSIX shared memory calls:

  • shm_open(): creates and opens a new POSIX shared memory object, or opens an existing one.
  • shm_unlink(): unlinks POSIX shared memory objects.
  • mmap(): creates a new mapping in the virtual address space of the calling process.

Note

The mechanism for sharing a memory region between two processes by using the System V IPC shmem() set of calls is deprecated and is no longer supported on RHEL for Real Time.
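
The following is a minimal C sketch of these calls working together: the process creates a shared memory object, sizes it with ftruncate(), and maps it into its address space. The object name /rt_example and the size are illustrative; on older glibc versions, link with -lrt:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define SHM_NAME "/rt_example"   /* illustrative object name */
#define SHM_SIZE 4096            /* illustrative object size */

int main(void)
{
    /* Create the shared memory object, or open it if it exists. */
    int fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600);
    if (fd == -1) {
        perror("shm_open");
        return 1;
    }

    /* Size the object before mapping it. */
    if (ftruncate(fd, SHM_SIZE) == -1) {
        perror("ftruncate");
        return 1;
    }

    /* Map the object into this process's address space. Another process
       can shm_open() the same name and mmap() it to share the data. */
    char *region = mmap(NULL, SHM_SIZE, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    strcpy(region, "hello from shared memory");

    munmap(region, SHM_SIZE);
    close(fd);
    shm_unlink(SHM_NAME);   /* remove the object name when done */
    return 0;
}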

Additional resources

  • shm_open(3) man page
  • shm_overview(7) man page
  • mmap(2) man page