Chapter 5. Important changes to external kernel parameters
This chapter provides system administrators with a summary of significant changes in the kernel shipped with Red Hat Enterprise Linux 8.7. These changes could include, for example, added or updated proc entries, sysctl and sysfs default values, boot parameters, kernel configuration options, or any noticeable behavior changes.
New kernel parameters
- idxd.tc_override = [HW]
With this parameter in the <bool> format you can allow override of the default traffic class configuration for the device.
The default value is set to false (0).
- kvm.eager_page_split = [KVM,X86]
With this parameter you can control whether or not KVM proactively splits all huge pages during dirty logging. Eager page splitting reduces interruptions to vCPU execution by eliminating the write-protection faults and Memory Management Unit (MMU) lock contention that are otherwise required to split huge pages lazily.
VM workloads that rarely perform writes or that write only to a small region of VM memory can benefit from disabling eager page splitting to allow huge pages to still be used for reads.
The behavior of eager page splitting depends on whether the KVM_DIRTY_LOG_INITIALLY_SET option is enabled or disabled.
  - If disabled, all huge pages in a memslot are eagerly split when dirty logging is enabled on that memslot.
  - If enabled, eager page splitting is performed during the KVM_CLEAR_DIRTY ioctl() system call, and only for the pages being cleared.
Eager page splitting currently only supports splitting huge pages mapped by the two-dimensional paging (TDP) MMU.
The default value is set to Y (on).
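To check how a module parameter such as kvm.eager_page_split is currently set, you can read it back from sysfs. The following is a minimal sketch, assuming the parameter is exported readable under /sys/module/<module>/parameters/; the helper names read_module_param and is_enabled are hypothetical and not part of any Red Hat tool.

```python
from pathlib import Path
from typing import Optional


def read_module_param(module: str, param: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the raw value of a module parameter, or None if it cannot be read."""
    # Assumption: the parameter is exported as a readable sysfs file.
    path = Path(sysfs_root) / "module" / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


def is_enabled(value: Optional[str]) -> bool:
    """Interpret the Y/N (or 1/0) spelling used for boolean parameters."""
    return value is not None and value.upper() in {"Y", "1"}


# Prints the current setting, or None when the parameter is not available.
print(read_module_param("kvm", "eager_page_split"))
```

The sysfs_root argument exists only so that the helper can be pointed at a test directory instead of the live system.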
- kvm.nx_huge_pages_recovery_period_ms = [KVM]
With this parameter you can control the time period at which KVM zaps 4 KiB pages back to huge pages.
  - If the value is a non-zero N, KVM zaps a portion of the pages every N milliseconds.
  - If the value is 0, KVM picks a period based on the ratio, such that a page is zapped after 1 hour on average.
The default value is set to 0.
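The relationship between the period and the ratio can be made concrete with a little arithmetic. The sketch below assumes that the ratio is the value of kvm.nx_huge_pages_recovery_ratio (1/N of the pages zapped per period) and that the automatically chosen period is simply one hour divided by that ratio; the function name is hypothetical and the real kernel logic may apply additional limits.

```python
from typing import Optional

ONE_HOUR_MS = 60 * 60 * 1000


def effective_recovery_period_ms(period_ms: int, ratio: int) -> Optional[int]:
    """Return the time between zapping passes, or None when recovery is disabled."""
    if ratio == 0:
        # A ratio of 0 disables the recovery altogether.
        return None
    if period_ms != 0:
        # An explicit non-zero period is used as given.
        return period_ms
    # Assumption: zapping 1/ratio of the pages per period means a page waits
    # about `ratio` periods on average, so one hour / ratio gives the period.
    return ONE_HOUR_MS // ratio


print(effective_recovery_period_ms(0, 60))
```

With both defaults (period 0, ratio 60), this model yields a period of 60000 ms, that is, one pass per minute.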
- mmio_stale_data = [X86,INTEL]
With this parameter you can control mitigation for the Processor Memory-mapped I/O (MMIO) Stale Data vulnerabilities.
Processor MMIO Stale Data is a class of vulnerabilities that can expose data after an MMIO operation. Exposed data could originate or end in the same CPU buffers as those affected by Microarchitectural Data Sampling (MDS) and TSX Asynchronous Abort (TAA). Therefore, similar to MDS and TAA, the mitigation is to clear the affected CPU buffers.
The available options are:
  - full: enable mitigation on vulnerable CPUs.
  - full,nosmt: enable mitigation and disable SMT on vulnerable CPUs.
  - off: unconditionally disable mitigation.
On MDS or TAA affected machines, mmio_stale_data=off can be prevented by an active MDS or TAA mitigation as these vulnerabilities are mitigated with the same mechanism. Thus, in order to disable this mitigation, you need to specify mds=off and tsx_async_abort=off, too.
Not specifying this option is equivalent to mmio_stale_data=full.
For more information, see Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst.
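Because mmio_stale_data=off can be overridden by an active MDS or TAA mitigation, it can help to check a boot command line for all three settings at once. The following sketch is deliberately conservative: it treats the mitigation as disabled only when mmio_stale_data, mds, and tsx_async_abort are all set to off, which is the case that matters on MDS or TAA affected machines. The function names are hypothetical, and the tokenizer is simplified (it does not handle quoted values).

```python
from typing import Dict


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Map each boot parameter to its value; flags without '=' map to ''."""
    params: Dict[str, str] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else ""  # a later duplicate overrides an earlier one
    return params


def requests_mmio_mitigation_off(cmdline: str) -> bool:
    """True only if all three related mitigations are switched off."""
    params = parse_cmdline(cmdline)
    needed = ("mmio_stale_data", "mds", "tsx_async_abort")
    return all(params.get(name) == "off" for name in needed)


print(requests_mmio_mitigation_off("ro quiet mmio_stale_data=off"))
```

On a live system, the string passed in would typically be the contents of /proc/cmdline.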
- rcutree.rcu_delay_page_cache_fill_msec = [KNL]
With this parameter you can set the page-cache refill delay in milliseconds in response to low-memory conditions. The range of permitted values is 0:100000.
- rcuscale.kfree_rcu_test_double = [KNL]
With this parameter you can test the double-argument variant of the kfree_rcu() function. If this parameter has the same value as rcuscale.kfree_rcu_test_single, both the single- and double-argument variants are tested.
- rcuscale.kfree_rcu_test_single = [KNL]
With this parameter you can test the single-argument variant of the kfree_rcu() function. If this parameter has the same value as rcuscale.kfree_rcu_test_double, both the single- and double-argument variants are tested.
- retbleed = [X86]
With this parameter you can control mitigation of the Arbitrary Speculative Code Execution with Return Instructions (RETBleed) vulnerability. The available options are:
  - off: no mitigation.
  - auto: automatically select a mitigation.
  - auto,nosmt: automatically select a mitigation, disabling SMT if necessary for the full mitigation (only on Zen1 and older without STIBP).
  - ibpb: mitigate short speculation windows on basic block boundaries too. Safe, highest performance impact.
  - unret: force enable untrained return thunks, only effective on AMD f15h-f17h based systems.
  - unret,nosmt: like the unret option, but also disables SMT when STIBP is not available.
Selecting the auto option chooses a mitigation method at run time according to the CPU.
Not specifying this option is equivalent to retbleed=auto.
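The option list above can be captured in a small validation helper, for example in a script that generates boot entries. This is a sketch only: the names RETBLEED_OPTIONS, effective_retbleed_option, and may_disable_smt are hypothetical, and the helper encodes nothing beyond the options and the default described in this section.

```python
from typing import Optional

# Options exactly as listed for the retbleed parameter in this section.
RETBLEED_OPTIONS = ("off", "auto", "auto,nosmt", "ibpb", "unret", "unret,nosmt")


def effective_retbleed_option(value: Optional[str] = None) -> str:
    """Return the option in effect; an unspecified parameter behaves like 'auto'."""
    if value is None:
        return "auto"
    if value not in RETBLEED_OPTIONS:
        raise ValueError(f"unknown retbleed option: {value!r}")
    return value


def may_disable_smt(value: Optional[str] = None) -> bool:
    """True for the ',nosmt' variants, which can turn SMT off when needed."""
    return effective_retbleed_option(value).endswith(",nosmt")


print(effective_retbleed_option(), may_disable_smt("unret,nosmt"))
```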
- s390_iommu_aperture = [KNL,S390]
With this parameter you can specify the size of the per-device DMA address space accessible through the DMA and IOMMU APIs as a decimal factor of the size of main memory.
  - The default value is set to 1, which means that you can concurrently use as many DMA addresses as there is physical memory installed, if supported by hardware, and thus map all of memory at once.
  - With a value of 2 you can map all of memory twice.
  - The value of 0 imposes no restrictions other than those given by hardware, at the cost of significant additional memory use for tables.
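As a worked example of the s390_iommu_aperture factor, the per-device DMA address space can be estimated by multiplying the factor by the size of main memory. The sketch below assumes nothing beyond that multiplication; the function name is hypothetical, and hardware limits, which the text says can further restrict the result, are not modeled.

```python
from typing import Optional

GIB = 1 << 30


def dma_aperture_bytes(main_memory_bytes: int, factor: int = 1) -> Optional[int]:
    """Return the requested aperture size, or None when only hardware limits apply."""
    if factor < 0:
        raise ValueError("the aperture factor must not be negative")
    if factor == 0:
        # 0 means no software restriction; the hardware limit is not modeled here.
        return None
    return factor * main_memory_bytes


# Example: a machine with 8 GiB of main memory and a factor of 2.
print(dma_aperture_bytes(8 * GIB, 2) // GIB)
```

For a machine with 8 GiB of main memory, the default factor of 1 corresponds to an 8 GiB aperture and a factor of 2 to 16 GiB.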
Updated kernel parameters
- acpi_sleep = [HW,ACPI]
Format: { s3_bios, s3_mode, s3_beep, s4_hwsig, s4_nohwsig, old_ordering, nonvs, sci_force_enable, nobl }
  - For more information on s3_bios and s3_mode, see Documentation/power/video.rst.
  - s3_beep is for debugging; it makes the PC’s speaker beep as soon as the kernel real-mode entry point is called.
  - s4_hwsig causes the kernel to check the ACPI hardware signature during resume from hibernation, and gracefully refuse to resume if it has changed. The default behavior is to allow resume and simply warn when the signature changes, unless the s4_hwsig option is enabled.
  - s4_nohwsig prevents the ACPI hardware signature from being used, or even warned about, during resume.
  - old_ordering causes the ACPI 1.0 ordering of the _PTS control method, with respect to putting devices into low power states, to be enforced. The ACPI 2.0 ordering of _PTS is used by default.
  - nonvs prevents the kernel from saving and restoring the ACPI NVS memory during suspend, hibernation, and resume.
  - sci_force_enable causes the kernel to set SCI_EN directly on resume from S1/S3. Even though this behavior is contrary to the ACPI specifications, some corrupted systems do not work without it.
  - nobl causes the internal denylist of systems known to behave incorrectly in some ways with respect to system suspend and resume to be ignored. Use this option wisely.
For more information, see Documentation/power/video.rst.
- crashkernel=size[KMG],high = [KNL, X86-64, ARM64]
With this parameter you can allocate a physical memory region from the top of memory as follows:
  - If the system has more than 4 GB RAM installed, the physical memory region can exceed 4 GB.
  - If the system has less than 4 GB RAM installed, the physical memory region will be allocated below 4 GB, if available.
This parameter is ignored if the crashkernel=X parameter is specified.
- crashkernel=size[KMG],low = [KNL, X86-64]
When you pass crashkernel=X,high, the kernel can allocate a physical memory region above 4 GB. This can cause the second kernel to crash on systems that require some amount of low memory (for example, swiotlb requires at least 64M+32K of low memory), because enough extra low memory is also needed to make sure that DMA buffers for 32-bit devices are not exhausted. The kernel tries to allocate at least 256 MB below 4 GB automatically. With this parameter you can specify the low range under 4 GB for the second kernel instead.
  - 0: disables low allocation.
This parameter is ignored when crashkernel=X,high is not used or when the reserved memory is below 4 GB.
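When choosing a value for crashkernel=size[KMG],low, it can be useful to compare it against the 64M+32K that swiotlb needs. The following sketch parses the size[KMG] notation, assuming that K, M, and G are binary multiples (1024-based), and checks a candidate value against that minimum. The function names are hypothetical, and the check covers only the swiotlb figure quoted above, not the extra low memory needed for 32-bit DMA buffers.

```python
# Assumption: K, M, and G are binary multiples.
_UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text: str) -> int:
    """Convert a size[KMG] string such as '256M' to a number of bytes."""
    text = text.strip()
    suffix = text[-1:].upper()
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)  # a bare number is taken as bytes


# Minimum low memory that swiotlb requires according to the text: 64M+32K.
SWIOTLB_MIN_LOW = parse_size("64M") + parse_size("32K")


def low_reservation_covers_swiotlb(low: str) -> bool:
    """True if the given ,low size is at least 64M+32K."""
    return parse_size(low) >= SWIOTLB_MIN_LOW


print(low_reservation_covers_swiotlb("64M"), low_reservation_covers_swiotlb("256M"))
```

Note that exactly 64M falls short of the requirement by 32K, which is why the helper compares against the sum.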
- kvm.nx_huge_pages_recovery_ratio = [KVM]
With this parameter you can control how many 4 KiB pages are periodically zapped back to huge pages:
  - 0 disables the recovery.
  - N: KVM will zap 1/Nth of the 4 KiB pages every period.
The default is set to 60.
- pci=norid = [S390]
With this parameter you can ignore the RID field and force the use of one PCI domain per PCI function.
- rcu_nocbs[=cpu-list] = [KNL]
The optional argument is a CPU list.
In kernels built with CONFIG_RCU_NOCB_CPU=y, you can enable the no-callback CPU mode, which prevents such CPUs' callbacks from being invoked in softirq context. Invocation of such CPUs' RCU callbacks will instead be offloaded to rcuox/N kthreads created for that purpose, where x is p for RCU-preempt, s for RCU-sched, and g for the kthreads that mediate grace periods; and N is the CPU number. This reduces OS jitter on the offloaded CPUs, which can be useful for HPC and real-time workloads. It can also improve energy efficiency for asymmetric multiprocessors.
  - If a cpulist is passed as an argument, the specified list of CPUs is set to no-callback mode from boot.
  - If the = sign and the cpulist arguments are omitted, no CPU will be set to no-callback mode from boot but you can toggle the mode at runtime using cpusets.
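To illustrate the rcuox/N naming scheme, the sketch below expands a CPU list such as 1-3,7 and derives the kthread names that would correspond to those CPUs. It is a simplified model: the parser handles only single CPU numbers and plain ranges, not any further syntax the kernel may accept, and the function names are hypothetical.

```python
from typing import List


def parse_cpu_list(cpu_list: str) -> List[int]:
    """Expand a list such as '1-3,7' into sorted CPU numbers."""
    cpus = set()
    for part in cpu_list.split(","):
        part = part.strip()
        if not part:
            continue
        first, sep, last = part.partition("-")
        start = int(first)
        end = int(last) if sep else start
        if end < start:
            raise ValueError(f"invalid CPU range: {part!r}")
        cpus.update(range(start, end + 1))
    return sorted(cpus)


def offload_kthread_names(cpu_list: str, flavor: str = "p") -> List[str]:
    """Build rcuox/N names, where x is 'p', 's', or 'g' as described in the text."""
    if flavor not in ("p", "s", "g"):
        raise ValueError(f"unknown flavor: {flavor!r}")
    return [f"rcuo{flavor}/{cpu}" for cpu in parse_cpu_list(cpu_list)]


print(offload_kthread_names("1-3,7"))
```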
- spectre_v2_user = [X86]
With this parameter you can control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability between user space tasks.
  - auto: kernel selects the mitigation depending on the available CPU features and vulnerability.
  - The default mitigation is set to prctl.
  - Not specifying this option is equivalent to spectre_v2_user=auto.
- spec_store_bypass_disable = [X86]
With this parameter you can control whether the Speculative Store Bypass (SSB) optimization is used or is disabled to mitigate the SSB vulnerability.
  - Not specifying this option is equivalent to spec_store_bypass_disable=auto.
  - The default mitigation is set to prctl.
New sysctl parameters
- perf_user_access = [ARM64]
With this parameter you can control user space access for reading performance event counters.
  - When set to 1, user space can read performance monitor counter registers directly. The default is set to 0, which means access disabled.
For more information, see Documentation/arm64/perf.rst.
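A sysctl value such as perf_user_access can be read back through the /proc/sys tree, where the dotted sysctl name maps to a file path. The following sketch assumes the parameter is exposed as kernel.perf_user_access on ARM64 systems; the helper names are hypothetical, and the proc_root argument exists only so that the code can be exercised against a test directory.

```python
from pathlib import Path
from typing import Optional


def sysctl_path(name: str, proc_root: str = "/proc") -> Path:
    """Translate a dotted sysctl name into its file under <proc_root>/sys."""
    return Path(proc_root) / "sys" / Path(*name.split("."))


def read_sysctl(name: str, proc_root: str = "/proc") -> Optional[str]:
    """Return the sysctl value as text, or None if it does not exist here."""
    try:
        return sysctl_path(name, proc_root).read_text().strip()
    except OSError:
        return None


def user_counter_access_enabled(proc_root: str = "/proc") -> bool:
    """True when perf_user_access is 1; a missing file counts as disabled."""
    # Assumption: the sysctl lives in the 'kernel' namespace.
    return read_sysctl("kernel.perf_user_access", proc_root) == "1"


print(user_counter_access_enabled())
```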
- force_cgroup_v2_swappiness
With this parameter you can deprecate the per-cgroup swappiness value available only in cgroupsV1. Due to a systemd design choice, most system and user processes run within a cgroup. Furthermore, these cgroup swappiness values default to 60. This can lead to undesirable effects where the system's swappiness value has little effect on the swap behavior of the system.
If you do not need the per-cgroup swappiness feature, you can configure the system with force_cgroup_v2_swappiness=1 to have more consistent swappiness behavior across the whole system.
Note that this is a RHEL specific feature.