Chapter 10. Kernel
10.1. Resource control
10.1.1. Control group v2 available as a Technology Preview in RHEL 8
The control group v2 mechanism is a unified hierarchy control group. Control group v2 organizes processes hierarchically and distributes system resources along the hierarchy in a controlled and configurable manner.
Unlike the previous version, control group v2 has only a single hierarchy. This single hierarchy enables the Linux kernel to:
- Categorize processes based on the role of their owner.
- Eliminate issues with conflicting policies of multiple hierarchies.
Control group v2 supports numerous controllers:
- CPU controller regulates the distribution of CPU cycles. This controller implements:
  - Weight and absolute bandwidth limit models for the normal scheduling policy.
  - Absolute bandwidth allocation model for the real-time scheduling policy.
- Memory controller regulates the memory distribution. Currently, the following types of memory usages are tracked:
  - Userland memory - page cache and anonymous memory.
  - Kernel data structures such as dentries and inodes.
  - TCP socket buffers.
- I/O controller regulates the distribution of I/O resources.
- Remote Direct Memory Access (RDMA) controller limits RDMA/IB-specific resources that certain processes can use. These processes are grouped through the RDMA controller.
- Process number controller enables the control group to stop any new tasks from being fork()'d or clone()'d after a certain limit.
- Writeback controller acts as a mechanism, which balances conflicts between the I/O and memory controllers.
The information above is based on the cgroups-v2 online documentation, which you can refer to for more information about particular control group v2 controllers.
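To make the unified hierarchy concrete, the following is a minimal sketch of driving cgroup v2 directly through its file interface. It assumes the v2 hierarchy is mounted at /sys/fs/cgroup (on RHEL 8 this requires booting with systemd.unified_cgroup_hierarchy=1), that the script runs as root, and that the child group name "demo" is purely illustrative.

```python
import os
from pathlib import Path

# Assumes cgroup v2 is mounted at /sys/fs/cgroup and the script runs as root.
root = Path("/sys/fs/cgroup")
demo = root / "demo"  # hypothetical child group name

# Allow child groups to use the memory and pids controllers.
(root / "cgroup.subtree_control").write_text("+memory +pids")

# Create a child group and configure its controller interfaces.
demo.mkdir(exist_ok=True)
(demo / "memory.max").write_text("268435456")  # 256 MiB hard limit
(demo / "pids.max").write_text("100")          # cap fork()/clone() at 100 tasks

# Move the current process into the new group.
(demo / "cgroup.procs").write_text(str(os.getpid()))
```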
10.2. Memory management
10.2.1. 52-bit PA for 64-bit ARM available
With this update, support for 52-bit physical addressing (PA) for the 64-bit ARM architecture is available. This provides a larger physical address space than the previous 48-bit PA.
10.2.2. 5-level page tables for x86_64
In RHEL 7, the memory bus had 48/46-bit virtual/physical memory addressing capacity, and the Linux kernel implemented 4 levels of page tables to map virtual addresses to physical addresses. The physical address lines put the upper limit of physical memory capacity at 64 TB.
These limits have been extended to 57/52-bit virtual/physical memory addressing, with 128 PiB of virtual address space (64 PiB user/64 PiB kernel) and 4 PiB of physical memory capacity.
With the extended address range, memory management in RHEL 8 adds support for a 5-level page table implementation, which is able to handle the expanded address range of up to 128 PiB of virtual address space and 4 PiB of physical address space.
The 5-level page tables are enabled by default on hardware that supports this feature, even if the installed physical memory is less than 64 TiB. For systems with less than 64 TiB of memory, there is a small overhead increase in walking the 5-level page tables. To avoid this overhead, users can disable the 5-level page tables by using the no5lvl kernel command-line parameter to force the use of 4-level page tables.
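As a small, hedged sketch (assuming an x86_64 host), the following checks whether the CPU advertises the la57 flag that 5-level page tables require, and whether no5lvl was passed on the kernel command line:

```python
# Check CPU support for 57-bit linear addressing and the no5lvl override.
with open("/proc/cpuinfo") as f:
    flags = next((line.split(":", 1)[1].split()
                  for line in f if line.startswith("flags")), [])
with open("/proc/cmdline") as f:
    cmdline = f.read().split()

print("CPU supports 5-level paging (la57):", "la57" in flags)
print("no5lvl forced on the command line:", "no5lvl" in cmdline)
```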
10.2.3. Swapping algorithm changes
In RHEL 8, there is an update to the virtual memory management algorithms of the Linux kernel. This might cause anonymous memory pages (process data) to be swapped to disk more often compared to earlier versions. These changes were made because the performance penalty of swapping is significantly lower on modern solid-state drives (SSDs) than on spinning disks. To ensure optimal system performance when migrating to RHEL 8, it is recommended that users evaluate and, if necessary, adjust the swappiness settings of the system.
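For example, the following sketch inspects and lowers vm.swappiness at runtime; the value 10 is purely illustrative, writing requires root, and a persistent change would normally be placed in a file under /etc/sysctl.d/ instead:

```python
from pathlib import Path

swappiness = Path("/proc/sys/vm/swappiness")

# Inspect the current value before changing anything.
print("current vm.swappiness:", swappiness.read_text().strip())

# Illustrative only: make the kernel less eager to swap anonymous pages.
# Evaluate the workload before committing to a particular value.
swappiness.write_text("10")
```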
10.3. Performance analysis and observability tools
10.3.1. bpftool added to kernel
The bpftool utility, which serves for inspection and simple manipulation of programs and maps based on the extended Berkeley Packet Filter (eBPF), has been added to the Linux kernel. bpftool is part of the kernel source tree and is provided by the bpftool package, which is included as a sub-package of the kernel package.
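As a quick illustration (not taken from the release notes), the following sketch simply invokes bpftool to list the eBPF programs and maps currently loaded in the kernel; it assumes the bpftool package is installed and typically needs root privileges:

```python
import subprocess

# List loaded eBPF programs and maps via the bpftool CLI.
for args in (["bpftool", "prog", "show"], ["bpftool", "map", "show"]):
    print("$", " ".join(args))
    subprocess.run(args, check=False)
```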
10.3.2. eBPF available as a Technology Preview
The extended Berkeley Packet Filter (eBPF) feature is available as a Technology Preview for both networking and tracing. eBPF enables user space to attach custom programs to a variety of points (sockets, tracepoints, packet reception) to receive and process data. The feature includes a new system call, bpf(), which supports creating various types of maps and inserting various types of programs into the kernel. Note that the bpf() system call can be successfully used only by a user with the CAP_SYS_ADMIN capability, such as the root user. See the bpf(2) man page for more information.
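To illustrate the interface, the following is a hedged sketch that calls bpf() through ctypes to create a small hash map. The syscall number 321 is an x86_64-specific assumption, and the packed structure covers only the leading BPF_MAP_CREATE fields of union bpf_attr; without CAP_SYS_ADMIN the call is expected to fail with a permission error.

```python
import ctypes
import ctypes.util
import os
import struct

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
SYS_BPF = 321          # x86_64 only; other architectures use different numbers
BPF_MAP_CREATE = 0
BPF_MAP_TYPE_HASH = 1

# Leading bpf_attr fields for BPF_MAP_CREATE:
# map_type, key_size, value_size, max_entries, map_flags (all __u32).
attr = struct.pack("5I", BPF_MAP_TYPE_HASH, 4, 8, 64, 0)

fd = libc.syscall(SYS_BPF, BPF_MAP_CREATE, attr, len(attr))
if fd < 0:
    # Without CAP_SYS_ADMIN this typically fails with EPERM.
    print("bpf(BPF_MAP_CREATE) failed:", os.strerror(ctypes.get_errno()))
else:
    print("created eBPF hash map, fd =", fd)
    os.close(fd)
```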
10.3.3. BCC is available as a Technology Preview
BPF Compiler Collection (BCC) is a user space toolkit for creating efficient kernel tracing and manipulation programs that is available as a Technology Preview in RHEL 8. BCC provides tools for I/O analysis, networking, and monitoring of Linux operating systems using the extended Berkeley Packet Filter (eBPF).
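The following minimal Python sketch, modeled on the upstream BCC "hello world" tutorial, shows the kind of tracing program BCC enables: it attaches a tiny eBPF program to the clone() system call entry and prints a trace line each time it fires. It assumes the BCC Python bindings are installed (package names vary) and that it runs as root.

```python
from bcc import BPF

# Compile and load a minimal eBPF program that logs every clone() entry.
b = BPF(text=r"""
int hello(void *ctx) {
    bpf_trace_printk("clone() entered\n");
    return 0;
}
""")

# Attach the program to the architecture-specific clone syscall symbol
# and stream the trace pipe to stdout until interrupted.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="hello")
b.trace_print()
```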
10.4. Booting process
10.4.1. How to install and boot custom kernels in RHEL
The Boot Loader Specification (BLS) defines a scheme and file format to manage boot loader configurations for each boot option in a drop-in directory, without the need to manipulate the main boot loader configuration files. This approach is particularly relevant in RHEL 8 because not all architectures use the same boot loader:
- x86_64, aarch64 and ppc64le with open firmware use GRUB.
- ppc64le with Open Power Abstraction Layer (OPAL) uses Petitboot.
- s390x uses zipl.
Each boot loader has a different configuration file and format that has to be modified when a new kernel is installed or removed. In previous versions of RHEL, the component that permitted this work was the grubby utility. However, in RHEL 8 the boot loader configuration was standardized by implementing the BLS file format, where grubby works as a thin wrapper around the BLS operations.
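To illustrate the drop-in format that grubby wraps, the following sketch reads the BLS entries under /boot/loader/entries/, where each *.conf file describes one boot option as plain "key value" lines (title, version, linux, initrd, options):

```python
from pathlib import Path

# Parse each BLS drop-in entry into a simple key/value dictionary.
for conf in sorted(Path("/boot/loader/entries").glob("*.conf")):
    fields = {}
    for line in conf.read_text().splitlines():
        if line.strip() and not line.startswith("#"):
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    print(conf.name, "->", fields.get("title"), fields.get("linux"))
```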
10.4.2. Early kdump support in RHEL
Previously, the kdump service started too late to capture kernel crashes that occurred in the early stages of the boot process. As a result, the crash information, together with the chance for troubleshooting, was lost.
To address this problem, RHEL 8 introduced early kdump support. To learn more about this mechanism, see the /usr/share/doc/kexec-tools/early-kdump-howto.txt file. Also, see the Red Hat Knowledgebase solution What is early kdump support and how do I configure it?
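As a small, hedged check (the rd.earlykdump parameter name is taken from early-kdump-howto.txt; verify it against that document), the following sketch reports whether the running kernel was booted with early kdump enabled:

```python
# Look for the early kdump option on the running kernel's command line.
with open("/proc/cmdline") as f:
    cmdline = f.read().split()
print("early kdump enabled on this boot:", "rd.earlykdump" in cmdline)
```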