Chapter 3. Important Changes to External Kernel Parameters
This chapter provides system administrators with a summary of significant changes in the kernel shipped with Red Hat Enterprise Linux 8.1. These changes include added or updated proc entries, sysctl, and sysfs default values, boot parameters, kernel configuration options, or any noticeable behavior changes.
New kernel parameters
- perf_v4_pmi = [X86,INTEL]
This parameter disables the Intel PMU counter freezing feature.
The feature only exists starting from Arch Perfmon v4 (Skylake and newer).
Format: <bool>
- hv_nopvspin [X86,HYPER_V]
This parameter disables the paravirtual spinlock optimizations which allow the hypervisor to 'idle' the guest on lock contention.
- ipcmni_extend [KNL]
This parameter extends the maximum number of unique System V IPC identifiers from 32,768 to 16,777,216.
- kpti = [ARM64]
This parameter controls the page table isolation of user and kernel address spaces.
The options are:
  - Default: enabled on cores which need mitigation.
  - 0: force disabled
  - 1: force enabled
- mds = [X86,INTEL]
This parameter controls the mitigation for the Micro-architectural Data Sampling (MDS) vulnerability.
Certain CPUs are vulnerable to an exploit against CPU internal buffers which can forward information to a disclosure gadget under certain conditions. In vulnerable processors, the speculatively forwarded data can be used in a cache side channel attack, to access data to which the attacker does not have direct access.
The options are:
  - full - Enable MDS mitigation on vulnerable CPUs.
  - full,nosmt - Enable MDS mitigation and disable Simultaneous Multi Threading (SMT) on vulnerable CPUs.
  - off - Unconditionally disable MDS mitigation.
Not specifying this parameter is equivalent to mds=full.
For details see the upstream kernel documentation.
- mitigations = [X86,PPC,S390,ARM64]
This parameter controls the optional mitigations for CPU vulnerabilities. This is a set of curated, arch-independent options, each of which is an aggregation of existing arch-specific options.
The options are:
  - off - Disable all optional CPU mitigations. This improves the system performance, but it may also expose users to several CPU vulnerabilities.
    Equivalent to:
    - nopti [X86,PPC]
    - kpti=0 [ARM64]
    - nospectre_v1 [X86,PPC]
    - nobp=0 [S390]
    - nospectre_v2 [X86,PPC,S390,ARM64]
    - spectre_v2_user=off [X86]
    - spec_store_bypass_disable=off [X86,PPC]
    - ssbd=force-off [ARM64]
    - l1tf=off [X86]
    - mds=off [X86]
  - auto (default) - Mitigate all CPU vulnerabilities, but leave Simultaneous Multi Threading (SMT) enabled, even if it is vulnerable. This option is for users who do not want to be surprised by SMT getting disabled across kernel upgrades, or who have other ways of avoiding SMT-based attacks.
    Equivalent to:
    - (default behavior)
  - auto,nosmt - Mitigate all CPU vulnerabilities, disabling Simultaneous Multi Threading (SMT) if needed. This option is for users who always want to be fully mitigated, even if it means losing SMT.
    Equivalent to:
    - l1tf=flush,nosmt [X86]
    - mds=full,nosmt [X86]
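The aggregation that the mitigations parameter performs can be pictured as a lookup table. The following Python sketch only transcribes the equivalences listed above; the names MITIGATIONS_EQUIVALENTS and expand_mitigations are hypothetical and chosen for illustration, and the code does not query or reflect what the kernel itself does at boot.

```python
# Equivalences transcribed from the option list above.
# Each entry is (arch-specific option, architectures it applies to).
MITIGATIONS_EQUIVALENTS = {
    "off": [
        ("nopti", {"X86", "PPC"}),
        ("kpti=0", {"ARM64"}),
        ("nospectre_v1", {"X86", "PPC"}),
        ("nobp=0", {"S390"}),
        ("nospectre_v2", {"X86", "PPC", "S390", "ARM64"}),
        ("spectre_v2_user=off", {"X86"}),
        ("spec_store_bypass_disable=off", {"X86", "PPC"}),
        ("ssbd=force-off", {"ARM64"}),
        ("l1tf=off", {"X86"}),
        ("mds=off", {"X86"}),
    ],
    # "auto" is the default behavior, so it adds no extra options.
    "auto": [],
    "auto,nosmt": [
        ("l1tf=flush,nosmt", {"X86"}),
        ("mds=full,nosmt", {"X86"}),
    ],
}


def expand_mitigations(value, arch):
    """Return the arch-specific options that mitigations=<value> stands for.

    Hypothetical helper: `arch` is one of the tags used in this chapter
    ("X86", "PPC", "S390", "ARM64").
    """
    if value not in MITIGATIONS_EQUIVALENTS:
        raise ValueError(f"unknown mitigations value: {value!r}")
    return [opt for opt, archs in MITIGATIONS_EQUIVALENTS[value] if arch in archs]


# Usage: show what mitigations=off aggregates on each architecture.
for arch in ("X86", "PPC", "S390", "ARM64"):
    print(arch, expand_mitigations("off", arch))
```

Keeping the architecture tags next to each option makes it easy to see that a single mitigations=off setting maps to a different set of options on each platform.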
- novmcoredd [KNL,KDUMP]
This parameter disables device dump.
The device dump allows drivers to append dump data to vmcore so that you can collect driver-specified debug information. Drivers can append the data without any limit, and this data is stored in memory, so this may cause significant memory stress.
Disabling the device dump can help save memory, but the driver debug data will no longer be available.
This parameter is only available when the CONFIG_PROC_VMCORE_DEVICE_DUMP kernel configuration is set.
- nospectre_v1 [X86]
This parameter disables mitigations for Spectre Variant 1 (bounds check bypass).
With this option, data leaks are possible in the system.
- psi = [KNL]
This parameter enables or disables pressure stall information tracking.
Format: <bool>
- random.trust_cpu={on,off} [KNL]
This parameter enables or disables trusting the use of the CPU’s random number generator (if available) to fully seed the kernel’s Cryptographic Random Number Generator (CRNG). The default is controlled by the CONFIG_RANDOM_TRUST_CPU kernel config.
- vm_debug[=options] [KNL]
Available with CONFIG_DEBUG_VM=y.
Enabling this parameter may slow down the system boot speed, especially on systems with a large amount of memory.
All options are enabled by default, and this interface is meant to allow for selectively enabling or disabling specific virtual memory debugging features.
The options are:
  - P - Enable page structure init time poisoning.
  - - (dash) - Disable all of the above options.
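The boot parameters listed in this section are passed on the kernel command line. As a rough sketch of how an administrator might check which of them are set, the following Python code splits a command line into names and values. It assumes that the running kernel's command line is readable from /proc/cmdline (not stated in this chapter), it does not handle quoted values that contain spaces, and parse_cmdline and read_cmdline are hypothetical helper names.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}.

    Parameters without "=" (for example hv_nopvspin) map to True.
    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else True
    return params


def read_cmdline(path="/proc/cmdline"):
    """Parse the command line of the running kernel (assumed path)."""
    with open(path) as f:
        return parse_cmdline(f.read())


# Usage with a made-up command line, so the sketch runs on any system.
sample = "ro quiet mds=full,nosmt psi=1 hv_nopvspin random.trust_cpu=on"
params = parse_cmdline(sample)
print(params.get("mds", "not set"))
print("hv_nopvspin" in params)
```

A parameter that is absent from the result is simply not set, and the kernel then applies its default, such as the mds=full behavior described above.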
Updated kernel parameters
- cgroup_no_v1 = [KNL]
This parameter disables cgroup controllers and named hierarchies in version 1 (v1).
The parameter is like the cgroup_disable kernel parameter, but only applies to cgroup v1. The blacklisted controllers remain available in cgroup2. The "all" option blacklists all controllers and the "named" option disables the named mounts. Specifying both "all" and "named" disables all v1 hierarchies.
Format: { { controller | "all" | "named" } [,{ controller | "all" | "named" }…] }
- crashkernel = size[KMG][@offset[KMG]] [KNL]
The kexec system call allows Linux to switch to a 'crash kernel' upon panic. This parameter reserves the physical memory region [offset, offset + size] for that kernel image. If @offset is omitted, then a suitable offset is selected automatically.
[KNL, x86_64] Select a region under 4G first, and fall back to reserving a region above 4G when @offset has not been specified.
For more information, see the upstream kdump documentation.
- l1tf = [X86]
This parameter controls the mitigation of the L1 Terminal Fault (L1TF) vulnerability on the affected CPUs.
The options are:
  - off - Disables hypervisor mitigations and does not emit any warnings. It also drops the swap size and available RAM limit restriction on both hypervisor and bare metal.
  - flush - Is the default.
For details see the upstream kernel documentation.
- nospectre_v2 [X86,PPC_FSL_BOOK3E,ARM64]
This parameter disables all mitigations for the Spectre variant 2 (indirect branch prediction) vulnerability.
The system may allow data leaks with this parameter.
- pci=option[,option…] [PCI]
Various PCI subsystem options.
The options are:
  - force_floating [S390] - Force usage of floating interrupts.
  - nomio [S390] - Do not use memory input/output (MIO) instructions.
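To make the crashkernel syntax described above concrete, the following Python sketch parses a size[KMG][@offset[KMG]] value and computes the region [offset, offset + size]. It covers only the form shown in this chapter, and it assumes that K, M, and G are binary multiples (1024, 1024^2, 1024^3 bytes). The names parse_crashkernel and reserved_region are hypothetical.

```python
import re

# Assumption: the suffixes are binary multiples of bytes.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

# Matches only the size[KMG][@offset[KMG]] form shown in this chapter.
_PATTERN = re.compile(r"^(\d+)([KMG]?)(?:@(\d+)([KMG]?))?$")


def parse_crashkernel(value):
    """Return (size_bytes, offset_bytes); offset is None when @offset is omitted."""
    m = _PATTERN.match(value)
    if m is None:
        raise ValueError(f"unsupported crashkernel value: {value!r}")
    size = int(m.group(1)) * _UNITS[m.group(2)]
    if m.group(3) is None:
        return size, None
    offset = int(m.group(3)) * _UNITS[m.group(4)]
    return size, offset


def reserved_region(value):
    """Return (start, end) of the requested region, or None if the offset is automatic."""
    size, offset = parse_crashkernel(value)
    if offset is None:
        return None
    return offset, offset + size


# Usage: an explicit offset versus an automatically selected one.
print(reserved_region("256M@16M"))
print(reserved_region("512M"))
```

When @offset is omitted, the sketch returns None for the region because the kernel selects the offset itself, as described above.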
New /proc/sys/kernel parameters
- hyperv_record_panic_msg
This parameter controls whether the panic kernel message (kmsg) data is reported to Hyper-V or not.
The values are:
  - 0 - Do not report the panic kmsg data.
  - 1 - Report the panic kmsg data. This is the default behavior.
New /proc/sys/net parameters
- bpf_jit_limit
This parameter enforces a global limit for memory allocations to the Berkeley Packet Filter Just-in-Time (BPF JIT) compiler in order to reject the unprivileged JIT requests once it has been surpassed.
The bpf_jit_limit parameter contains the value of the global limit in bytes.
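Because the limit is expressed in bytes, a small helper can make the value easier to read. The following Python sketch assumes that the parameter is exposed at /proc/sys/net/core/bpf_jit_limit (the exact path below /proc/sys/net is not given in this chapter); parse_bpf_jit_limit, read_bpf_jit_limit, and format_mib are hypothetical helper names, and the sample value is made up.

```python
def parse_bpf_jit_limit(text):
    """Convert the file content (a single integer, in bytes) to int."""
    return int(text.strip())


def read_bpf_jit_limit(path="/proc/sys/net/core/bpf_jit_limit"):
    """Read the current limit in bytes from the assumed procfs path."""
    with open(path) as f:
        return parse_bpf_jit_limit(f.read())


def format_mib(num_bytes):
    """Render a byte count in MiB for display."""
    return f"{num_bytes / 2**20:.1f} MiB"


# Usage with a made-up value, so the sketch runs without the procfs file.
limit = parse_bpf_jit_limit("264241152\n")
print(limit, "bytes =", format_mib(limit))
```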
Updated /proc/sys/fs parameters
- dentry-state
Dentries are dynamically allocated and deallocated.
The user is able to retrieve the following values from reading the /proc/sys/fs/dentry-state file:
  - nr_dentry - Shows the total number of dentries allocated (active + unused).
  - nr_unused - Shows the number of dentries that are not actively used, but are saved in the least recently used (LRU) list for future reuse.
  - age_limit - Shows the age in seconds after which the dcache entries can be reclaimed when the memory is short.
  - want_pages - Is nonzero when the shrink_dcache_pages() function has been called and the dcache is not pruned yet.
  - nr_negative - Shows the number of unused dentries that are also negative dentries which do not map to any files. Instead, they help speed up rejection of non-existing files provided by the users.
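The fields above can be read programmatically. The following Python sketch assumes that the file holds whitespace-separated integers in the order listed above and that any further values on the line can be ignored; neither assumption is spelled out in this chapter. The helper names and the sample numbers are hypothetical.

```python
# Field order as listed above; extra trailing values are ignored (assumption).
DENTRY_STATE_FIELDS = (
    "nr_dentry",
    "nr_unused",
    "age_limit",
    "want_pages",
    "nr_negative",
)


def parse_dentry_state(text):
    """Map the leading integers of a dentry-state line to the field names."""
    values = [int(v) for v in text.split()]
    if len(values) < len(DENTRY_STATE_FIELDS):
        raise ValueError("expected at least %d values" % len(DENTRY_STATE_FIELDS))
    # zip() stops at the shorter sequence, so extra values are dropped.
    return dict(zip(DENTRY_STATE_FIELDS, values))


def read_dentry_state(path="/proc/sys/fs/dentry-state"):
    """Read and parse the file on a running system."""
    with open(path) as f:
        return parse_dentry_state(f.read())


# Usage with made-up numbers, so the sketch runs without the procfs file.
state = parse_dentry_state("81932 61420 45 0 1024 0\n")
# nr_dentry is active + unused, so the active count is the difference.
active = state["nr_dentry"] - state["nr_unused"]
print(state, "active:", active)
```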
Updated /proc/sys/kernel parameters
- msg_next_id, sem_next_id, and shm_next_id
Notes:
  - The kernel does not guarantee that a new object will have the desired ID. It is up to userspace to decide how to handle an object with the "wrong" ID.
  - A toggle with a non-default value is set back to -1 by the kernel after a successful Inter-process Communication (IPC) object allocation. If the IPC object allocation syscall fails, it is undefined whether the value remains unmodified or is reset to -1.