Chapter 4. Kernel features
This chapter explains the purpose and use of kernel features that enable many user space tools and includes resources for further investigation of those tools.
4.1. Control groups
4.1.1. What is a control group?
Control Group Namespaces are a Technology Preview in Red Hat Enterprise Linux 7.5.
Linux Control Groups (cgroups) enable limits on the use of system hardware, ensuring that an individual process running inside a cgroup only uses as much of a resource as the cgroup's configuration allows.
Control Groups restrict how much of a resource a process can use, where access to that resource has been enabled by a namespace. For example, where the network namespace gives a process access to a particular network card, a cgroup can ensure that the process does not exceed 50% usage of that card, leaving bandwidth available for other processes.
Control Group Namespaces provide a virtualized view of individual cgroups. A process's cgroup namespace is identified by the /proc/self/ns/cgroup link, and inside the namespace, cgroup paths such as those listed in /proc/self/cgroup are shown relative to the namespace's root cgroup.
The purpose is to prevent leakage of privileged data from the global namespace into the cgroup, and to enable other features, such as container migration.
Because it is now much easier to associate a container with a single cgroup, containers have a much more coherent cgroup view. Cgroup namespaces also give tasks inside the container a virtualized view of the cgroup they belong to.
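To see the effect of a cgroup namespace, compare a process's entries in /proc/PID/cgroup as read from inside and from outside the namespace: inside, the paths are relative to the namespace's root cgroup, so a process at that root typically sees "/". The following bash sketch extracts the path for one controller. It assumes the hierarchy-ID:controller-list:cgroup-path line format used by /proc/PID/cgroup, and the function name cgroup_path is hypothetical.

```shell
# cgroup_path CONTROLLER [FILE]
# Print the cgroup path listed for CONTROLLER in FILE (default: /proc/self/cgroup).
# Each line of the file has the form  hierarchy-ID:controller-list:cgroup-path.
# Pass an empty CONTROLLER ("") to select the cgroup v2 unified hierarchy line,
# whose controller list is empty. (cgroup_path is a hypothetical helper name.)
cgroup_path() {
    local controller=$1 file=${2:-/proc/self/cgroup}
    awk -v c="$controller" '
    {
        ctls = $0
        sub(/^[^:]*:/, "", ctls)      # drop the hierarchy ID
        path = ctls
        sub(/:.*/, "", ctls)          # keep only the controller list
        sub(/^[^:]*:/, "", path)      # keep only the path (which may contain ":")
        n = split(ctls, list, ",")
        for (i = 1; i <= n; i++)
            if (list[i] == c) print path
        if (c == "" && ctls == "") print path
    }' "$file"
}
```

For example, running `cgroup_path memory` on the host and inside a container shows the full path and the namespace-relative path respectively, while `readlink /proc/self/ns/cgroup` prints the namespace identifier in the form cgroup:[INODE].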
4.1.2. What is a namespace?
Namespaces are a kernel feature that allows a virtual view of isolated system resources. By isolating a process from system resources, you can specify and control what a process is able to interact with. Namespaces work together with Control Groups: a namespace controls what a process can see, while a cgroup controls how much of a resource it can use.
4.1.3. Supported namespaces
The following namespaces are supported in Red Hat Enterprise Linux 7.5 and later:
Mount
- The mount namespace isolates file system mount points, enabling each process to have a distinct file system space within which to operate.
UTS
- Hostname and NIS domain name
IPC
- System V IPC, POSIX message queues
PID
- Process IDs
Network
- Network devices, stacks, ports, etc.
User
- User and group IDs
Control Groups
- Cgroup root directory
For information on using Control Groups, see the Resource Management Guide.
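Each namespace a process belongs to is exposed as a link under /proc/PID/ns/, so two processes share a namespace exactly when the corresponding links resolve to the same value. The bash sketch below uses that to compare two processes; the function name same_namespace is hypothetical, and reading the links of another user's process normally requires root.

```shell
# same_namespace TYPE PID1 PID2
# Succeed (exit 0) if both processes are in the same namespace of TYPE,
# fail (exit 1) if they are not, and return 2 if a link cannot be read.
# TYPE is an entry name under /proc/PID/ns: mnt, uts, ipc, pid, net, user or cgroup.
# (same_namespace is a hypothetical helper name.)
same_namespace() {
    local type=$1 a b
    # Each link resolves to a string such as "net:[INODE]"; equal strings
    # mean the two processes share that namespace.
    a=$(readlink "/proc/$2/ns/$type") || return 2
    b=$(readlink "/proc/$3/ns/$type") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_namespace net 1 $$ && echo "same network namespace as PID 1"` reports whether the current shell shares the network namespace of the init process.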
4.2. Kernel source checker
The Linux Kernel Module Source Checker (ksc) is a tool to check for non-whitelisted symbols in a given kernel module. Red Hat Partners can also use the tool to request review of a symbol for whitelist inclusion by filing a bug in the Red Hat Bugzilla database.
4.2.1. Usage
The tool accepts the path to a module with the "-k" option:
# ksc -k e1000e.ko
Checking against architecture x86_64
Total symbol usage: 165
Total Non white list symbol usage: 74
# ksc -k /path/to/module
Output is saved in $HOME/ksc-result.txt. If review of the symbols for whitelist addition is requested, then the usage description for each non-whitelisted symbol must be added to the ksc-result.txt file. The request bug can then be filed by running ksc with the "-p" option.
KSC currently does not support xz compression
The ksc tool is unable to process the xz compression method and reports the following error:
Invalid architecture, (Only kernel object files are supported)
Until this limitation is resolved, system administrators need to manually uncompress any third-party modules that use xz compression before running the ksc tool.
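The manual decompression step can be scripted. The following bash sketch writes an uncompressed copy of a .ko.xz module to a separate directory and prints its path, which can then be passed to ksc -k. It assumes the xz utility is installed; the function name decompress_module is hypothetical.

```shell
# decompress_module SRC DESTDIR
# Write an uncompressed copy of the xz-compressed module SRC (*.ko.xz) into
# DESTDIR and print the path of the resulting .ko file. The original file is
# left untouched. (decompress_module is a hypothetical helper name.)
decompress_module() {
    local src=$1 destdir=$2 out
    case $src in
        *.ko.xz) ;;
        *) echo "decompress_module: $src is not a .ko.xz file" >&2; return 1 ;;
    esac
    mkdir -p -- "$destdir" || return 1
    out="$destdir/$(basename -- "${src%.xz}")"
    # -d: decompress, -c: write to standard output instead of replacing SRC
    xz -dc -- "$src" > "$out" || { rm -f -- "$out"; return 1; }
    printf '%s\n' "$out"
}
```

Usage: `mod=$(decompress_module /path/to/module.ko.xz /tmp/ksc-modules) && ksc -k "$mod"`. Writing to a separate directory keeps the installed module tree unchanged.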
4.3. Direct access for files (DAX)
Direct Access for files, known as 'file system dax', or 'fs dax', enables applications to read and write data on a dax-capable storage device without using the page cache to buffer access to the device.
This functionality is available when using the 'ext4' or 'xfs' file system, and is enabled either by mounting the file system with -o dax or by adding dax to the options section for the mount entry in /etc/fstab.
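The sketch below shows both ways of enabling DAX as comments, followed by a bash check of whether a mounted file system actually has the option in effect, based on the options column of /proc/mounts. The device /dev/pmem0, the mount point /mnt/pmem, and the function name mount_has_dax are hypothetical, and accepting "dax=always" is an assumption about how some newer kernels report the option.

```shell
# Example /etc/fstab entry (hypothetical device and mount point):
#   /dev/pmem0  /mnt/pmem  xfs  defaults,dax  0 0
# or, for a one-off mount:
#   mount -o dax /dev/pmem0 /mnt/pmem
#
# mount_has_dax MOUNTPOINT [MOUNTS_FILE]
# Succeed if MOUNTPOINT appears in MOUNTS_FILE (default: /proc/mounts) with a
# dax option in its comma-separated option list (field 4). Both "dax" and
# "dax=always" are accepted, since the reported form may vary by kernel.
# (mount_has_dax is a hypothetical helper name.)
mount_has_dax() {
    local mp=$1 mounts=${2:-/proc/mounts}
    awk -v mp="$mp" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "dax" || opts[i] == "dax=always") found = 1
        }
        END { exit !found }
    ' "$mounts"
}
```

Usage: `mount_has_dax /mnt/pmem && echo "DAX is active on /mnt/pmem"`.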
Further information, including code examples, can be found in the kernel-doc package and is stored at /usr/share/doc/kernel-doc-<version>/Documentation/filesystems/dax.txt, where '<version>' is the corresponding kernel version number.
4.4. Memory protection keys for userspace (also known as PKU, or PKEYS)
Memory Protection Keys provide a mechanism for enforcing page-based protections, but without requiring modification of the page tables when an application changes protection domains. It works by dedicating 4 previously ignored bits in each page table entry to a "protection key", giving 16 possible keys.
Memory Protection Keys are a hardware feature of some Intel CPUs. To determine if your processor supports this feature, check for the presence of pku in /proc/cpuinfo:
$ grep pku /proc/cpuinfo
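The grep command above matches pku anywhere in the file. A slightly stricter bash sketch restricts the match to the flags lines and to whole words. The function name has_cpu_flag is hypothetical. Where present, the related ospke flag typically indicates that the running kernel has also enabled the feature.

```shell
# has_cpu_flag FLAG [CPUINFO]
# Succeed if FLAG is listed on a "flags" line of CPUINFO (default: /proc/cpuinfo).
# Only the flags lines are inspected, and -w requires a whole-word match, so
# searching for "pku" cannot match part of a longer flag name.
# (has_cpu_flag is a hypothetical helper name.)
has_cpu_flag() {
    local flag=$1 cpuinfo=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$cpuinfo" | grep -qw -- "$flag"
}
```

Usage: `has_cpu_flag pku && has_cpu_flag ospke && echo "protection keys available"`.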
To support this feature, the CPUs provide a new user-accessible register (PKRU) with two separate bits (Access Disable and Write Disable) for each key. Two new instructions (RDPKRU and WRPKRU) exist for reading and writing to the new register.
Further documentation, including programming examples, can be found in /usr/share/doc/kernel-doc-*/Documentation/x86/protection-keys.txt, which is provided by the kernel-doc package.
4.5. Kernel address space layout randomization
Kernel Address Space Layout Randomization (KASLR) consists of two parts which work together to enhance the security of the Linux kernel:
- kernel text KASLR
- memory management KASLR
The physical address and the virtual address of the kernel text are randomized independently of each other. The physical address of the kernel can be anywhere below 64 TB, while the virtual address of the kernel is restricted to the 1 GB range [0xffffffff80000000, 0xffffffffc0000000].
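The size of the virtual window follows from its bounds: 0xffffffffc0000000 - 0xffffffff80000000 = 0x40000000 = 2^30 bytes, that is 1 GB. The bash sketch below checks whether a kernel text address lies in that window, treating the upper bound as the end of the range. It assumes the address is given as bare hex digits, as in the first column of /proc/kallsyms, which may show all-zero addresses to unprivileged users. The function name in_kernel_text_window is hypothetical.

```shell
# in_kernel_text_window HEXADDR
# Succeed if HEXADDR (hex digits, no 0x prefix) lies in the kernel text
# window [0xffffffff80000000, 0xffffffffc0000000); return 2 on malformed input.
# Bash arithmetic is signed 64-bit, so these addresses wrap to negative values.
# The comparison still works because the unsigned-to-signed mapping preserves
# ordering within the upper half of the address space, where the window lies.
# (in_kernel_text_window is a hypothetical helper name.)
in_kernel_text_window() {
    [[ $1 =~ ^[0-9a-fA-F]{1,16}$ ]] || return 2
    local addr=$(( 0x$1 ))
    local lo=$(( 0xffffffff80000000 )) hi=$(( 0xffffffffc0000000 ))
    (( addr >= lo && addr < hi ))
}
```

Usage, as root: `in_kernel_text_window "$(awk '$3 == "_text" {print $1}' /proc/kallsyms)"`. With KASLR enabled, the _text address checked this way normally differs from boot to boot while staying inside the window.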
Memory management KASLR has three sections whose starting addresses are randomized within specific areas. KASLR can thus help prevent attacks that insert malicious code and redirect kernel execution to it, when that code relies on knowing where symbols of interest are located in the kernel address space.
Memory management KASLR sections are:
- direct mapping section
- vmalloc section
- vmemmap section
KASLR code is now compiled into the Linux kernel, and it is enabled by default. To disable it explicitly, add the nokaslr kernel option to the kernel command line.
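To confirm which setting the running kernel was booted with, check /proc/cmdline for the nokaslr option. The bash sketch below does this as a whole-word match, so a longer option containing the same letters is not mistaken for it. The function name kaslr_disabled is hypothetical, and the grubby command in the comment is one way of adding the option to all installed kernels.

```shell
# kaslr_disabled [CMDLINE_FILE]
# Succeed if the kernel command line in CMDLINE_FILE (default: /proc/cmdline)
# contains the nokaslr option as a whole word; fail otherwise, and return 2 if
# the file cannot be read. (kaslr_disabled is a hypothetical helper name.)
#
# To add the option to all installed kernels, one approach is:
#   grubby --update-kernel=ALL --args="nokaslr"    (then reboot)
kaslr_disabled() {
    local file=${1:-/proc/cmdline} word
    local -a words=()
    [ -r "$file" ] || return 2
    # /proc/cmdline is a single line of space-separated options.
    read -r -a words < "$file" || true
    for word in "${words[@]}"; do
        [ "$word" = nokaslr ] && return 0
    done
    return 1
}
```

Usage: `kaslr_disabled && echo "KASLR disabled" || echo "KASLR enabled"`.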
4.6. Advanced Error Reporting (AER)
4.6.1. What is AER?
Advanced Error Reporting (AER) is a kernel feature that provides enhanced error reporting for Peripheral Component Interconnect Express (PCIe) devices. The AER kernel driver attaches to root ports that support the PCIe AER capability in order to:
- Gather comprehensive error information when errors occur
- Report errors to users
- Perform error recovery actions
Example 4.1. Example AER output
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Corrected error received: id=ae00
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: device [8086:2030] error status/mask=000000c0/00002000
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: [ 6] Bad TLP
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: [ 7] Bad DLLP
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: AER: Multiple Corrected error received: id=ae00
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=0000(Receiver ID)
Feb 5 15:41:33 hostname kernel: pcieport 10003:00:00.0: device [8086:2030] error status/mask=00000040/00002000
When AER captures an error, it sends an error message to the console. If the error is correctable, the console output is a warning.
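When many messages like those in Example 4.1 accumulate, a per-device tally can be easier to read. The bash and awk sketch below counts "PCIe Bus Error:" lines by device, severity, and error type. It assumes the message layout shown in the example; the function name aer_summary is hypothetical, and it does not replace the rasdaemon tooling described in the next section.

```shell
# aer_summary
# Read kernel log lines on standard input and print one line per distinct
# device, severity and error type found in "PCIe Bus Error:" messages:
#   COUNT DEVICE SEVERITY (TYPE)
# (aer_summary is a hypothetical helper name.)
aer_summary() {
    awk '
    {
        marker = ": PCIe Bus Error: "
        pos = index($0, marker)
        if (pos == 0) next
        # The PCI device address is the last word before the marker.
        n = split(substr($0, 1, pos - 1), words, " ")
        dev = words[n]
        # The rest looks like: severity=Corrected, type=Data Link Layer, id=...
        n = split(substr($0, pos + length(marker)), fields, ", ")
        sev = ""; etype = ""
        for (i = 1; i <= n; i++) {
            if (fields[i] ~ /^severity=/) sev = substr(fields[i], 10)
            else if (fields[i] ~ /^type=/) etype = substr(fields[i], 6)
        }
        count[dev " " sev " (" etype ")"]++
    }
    END { for (k in count) print count[k], k }
    ' | sort
}
```

Usage: `journalctl -k | aer_summary`. For the two "PCIe Bus Error" lines in Example 4.1, this prints `2 10003:00:00.0 Corrected (Data Link Layer)`.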
4.6.2. Collecting and displaying AER messages
To collect and display AER messages, use the rasdaemon program.
Procedure
1. Install the rasdaemon package:
~]# yum install rasdaemon
2. Enable and start the rasdaemon service:
~]# systemctl enable --now rasdaemon
3. Run the ras-mc-ctl command to display a summary of the logged errors (the --summary option) or the errors stored in the error database (the --errors option):
~]# ras-mc-ctl --summary
~]# ras-mc-ctl --errors
Additional resources
- For more information on the rasdaemon service, see the rasdaemon(8) manual page.
- For more information on the ras-mc-ctl command, see the ras-mc-ctl(8) manual page.