Chapter 5. Important changes to external kernel parameters
This chapter provides system administrators with a summary of significant changes in the kernel distributed with Red Hat Enterprise Linux 9.5. These changes could include, for example, added or updated proc entries, sysctl, and sysfs default values, boot parameters, kernel configuration options, or any noticeable behavior changes.
New kernel parameters
numa_cma=<node>:nn[MG][,<node>:nn[MG]]
[KNL,CMA]
Sets the size of the kernel NUMA memory area for contiguous memory allocations. It reserves a CMA area for each specified node.
With NUMA CMA enabled, DMA users on node nid first try to allocate buffers from the NUMA area located in node nid. If the allocation fails, they fall back to the global default memory area.
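The numa_cma= value is a comma-separated list of <node>:<size> pairs. The following Python sketch parses a value into a per-node size in bytes, purely to illustrate that format. It is not the kernel's parser. It accepts only the M and G suffixes shown above, and it treats a bare number as bytes, which is an assumption of this sketch.

```python
import re

# Multipliers for the optional [MG] suffix in the documented format.
_SUFFIX = {"": 1, "M": 1 << 20, "G": 1 << 30}
_ENTRY = re.compile(r"^(\d+):(\d+)([MG]?)$")


def parse_numa_cma(value: str) -> dict[int, int]:
    """Parse '<node>:nn[MG][,<node>:nn[MG]]' into {node: size_in_bytes}."""
    areas: dict[int, int] = {}
    for entry in value.split(","):
        match = _ENTRY.match(entry.strip())
        if match is None:
            raise ValueError(f"malformed numa_cma entry: {entry!r}")
        node, size, suffix = match.groups()
        areas[int(node)] = int(size) * _SUFFIX[suffix]
    return areas
```

For example, parse_numa_cma("0:64M,1:1G") requests a 64 MiB area on node 0 and a 1 GiB area on node 1.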
reg_file_data_sampling=
[x86]
Controls mitigation for Register File Data Sampling (RFDS) vulnerability. RFDS is a CPU vulnerability which may allow userspace to infer kernel data values previously stored in floating point registers, vector registers, or integer registers. RFDS only affects Intel Atom processors.
Values:
on
- Turns ON the mitigation.
off
- Turns OFF the mitigation.
This parameter overrides the compile-time default set by CONFIG_MITIGATION_RFDS. The mitigation cannot be disabled while other VERW-based mitigations (like MDS) are enabled. To disable the RFDS mitigation, all VERW-based mitigations need to be disabled.
For details see: Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
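You can check a command line against the rule that RFDS stays mitigated while any other VERW-based mitigation is enabled. The following Python sketch assumes that the VERW-based set consists of MDS, TSX Async Abort, MMIO Stale Data, and RFDS, controlled by mds=, tsx_async_abort=, mmio_stale_data=, and reg_file_data_sampling=. Confirm this set against the hw-vuln documentation for your kernel. The sketch models only the command line. It ignores the compile-time default and whether the CPU is affected.

```python
# Assumption: boot parameters of the VERW-based mitigations on x86.
VERW_PARAMS = ("mds", "tsx_async_abort", "mmio_stale_data", "reg_file_data_sampling")


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params: dict[str, str] = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value  # a later occurrence overrides an earlier one
    return params


def rfds_mitigation_off(cmdline: str) -> bool:
    """True if the command line turns off every VERW-based mitigation."""
    params = parse_cmdline(cmdline)
    # mitigations=off implies reg_file_data_sampling=off and the others.
    if params.get("mitigations") == "off":
        return True
    return all(params.get(name) == "off" for name in VERW_PARAMS)


if __name__ == "__main__":
    # MDS and the others are still enabled, so RFDS stays mitigated.
    print(rfds_mitigation_off("quiet reg_file_data_sampling=off"))  # False
```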
locktorture.acq_writer_lim=
[KNL]
Set the time limit in jiffies for a lock acquisition. Acquisitions exceeding this limit will result in a splat once they do complete.
locktorture.bind_readers=
[KNL]
Specify the list of CPUs to which the readers are to be bound.
locktorture.bind_writers=
[KNL]
Specify the list of CPUs to which the writers are to be bound.
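Both parameters take a CPU list. The following Python sketch expands such a list so you can check, for example, which CPUs the readers and writers would share. It assumes the common kernel cpulist syntax of comma-separated CPU numbers and inclusive ranges such as 0-3,8. The kernel also accepts forms, such as stride groups, that this sketch does not handle.

```python
def parse_cpulist(cpulist: str) -> set[int]:
    """Expand a cpulist such as '0-3,8' into {0, 1, 2, 3, 8}."""
    cpus: set[int] = set()
    for chunk in cpulist.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        if "-" in chunk:
            first, last = (int(part) for part in chunk.split("-", 1))
            if first > last:
                raise ValueError(f"descending range: {chunk!r}")
            cpus.update(range(first, last + 1))  # ranges are inclusive
        else:
            cpus.add(int(chunk))
    return cpus


if __name__ == "__main__":
    readers = parse_cpulist("0-3")  # locktorture.bind_readers=0-3
    writers = parse_cpulist("2-5")  # locktorture.bind_writers=2-5
    print(sorted(readers & writers))  # [2, 3]
```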
locktorture.call_rcu_chains=
[KNL]
Specify the number of self-propagating call_rcu() chains to set up. These are used to ensure that there is a high probability of an RCU grace period in progress at any given time. Defaults to 0, which disables these call_rcu() chains.
locktorture.long_hold=
[KNL]
Specify the duration in milliseconds for the occasional long-duration lock hold time. Defaults to 100 milliseconds. Select 0 to disable.
locktorture.nested_locks=
[KNL]
Specify the maximum lock nesting depth that locktorture is to exercise, up to a limit of 8 (MAX_NESTED_LOCKS). Specify zero to disable. Note that this parameter is ineffective on types of locks that do not support nested acquisition.
locktorture.rt_boost=
[KNL]
Do periodic testing of real-time lock priority boosting. Select 0 to disable, 1 to boost only rt_mutex, and 2 to boost unconditionally. Defaults to 2, which might seem to be an odd choice, but which should be harmless for non-real-time spinlocks, due to their disabling of preemption. Note that non-realtime mutexes disable boosting.
locktorture.writer_fifo=
[KNL]
Run the write-side locktorture kthreads at sched_set_fifo() real-time priority.
locktorture.rt_boost_factor=
[KNL]
Number that determines how often and for how long priority boosting is exercised. This is scaled down by the number of writers, so that the number of boosts per unit time remains roughly constant as the number of writers increases. On the other hand, the duration of each boost increases with the number of writers.
microcode.force_minrev=
[X86]
Format: <bool>
Enable or disable the microcode minimal revision enforcement for the runtime microcode loader.
module.async_probe=<bool>
[KNL]
When set to true, modules use async probing by default. To enable or disable async probing for a specific module, use the module-specific control that is documented under <module>.async_probe. When both module.async_probe and <module>.async_probe are specified, <module>.async_probe takes precedence for the specific module.
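The precedence rule can be written as a small Python function. This sketch assumes that both settings have already been parsed into booleans and that module.async_probe defaults to false when it is not given. mymod in the usage note is a hypothetical module name.

```python
from typing import Optional


def async_probe_enabled(global_default: bool, per_module: Optional[bool]) -> bool:
    """Effective async probing for one module.

    global_default: value of module.async_probe (assumed False when absent).
    per_module: value of <module>.async_probe, or None when it is not given.
    """
    # A per-module setting, when present, takes precedence over the global one.
    if per_module is not None:
        return per_module
    return global_default
```

For example, with module.async_probe=1 and mymod.async_probe=0 on the command line, mymod probes synchronously: async_probe_enabled(True, False) is False. Every module without its own setting probes asynchronously: async_probe_enabled(True, None) is True.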
module.enable_dups_trace
[KNL]
When CONFIG_MODULE_DEBUG_AUTOLOAD_DUPS is set, this means that duplicate request_module() calls will trigger a WARN_ON() instead of a pr_warn(). Note that if MODULE_DEBUG_AUTOLOAD_DUPS_TRACE is set, WARN_ON() will always be issued and this option does nothing.
nfs.delay_retrans=
[NFS]
Specifies the number of times the NFSv4 client retries the request before returning an EAGAIN error, after a reply of NFS4ERR_DELAY from the server. Only applies if the softerr mount option is enabled, and the specified value is >= 0.
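The retry budget can be modeled as a hypothetical Python simulation. It is not NFS client code. The replies argument stands in for the server's answer to each attempt, and the return values 0 and -errno.EAGAIN stand in for the result passed back to the caller. When softerr is off or the value is negative, the sketch keeps retrying for as long as the server answers NFS4ERR_DELAY.

```python
import errno
from typing import Iterable

NFS4ERR_DELAY = "NFS4ERR_DELAY"


def send_request(replies: Iterable[str], delay_retrans: int, softerr: bool) -> int:
    """Return 0 on success or -errno.EAGAIN once the retry budget is used up."""
    limited = softerr and delay_retrans >= 0
    retries = 0
    for reply in replies:
        if reply != NFS4ERR_DELAY:
            return 0  # any other reply is treated as success in this model
        if limited and retries >= delay_retrans:
            return -errno.EAGAIN
        retries += 1  # retry the request after the delay
    raise RuntimeError("simulation ran out of server replies")
```

With nfs.delay_retrans=2 on a softerr mount, three consecutive NFS4ERR_DELAY replies (the initial attempt plus two retries) produce -EAGAIN.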
rcutree.do_rcu_barrier=
[KNL]
Request a call to rcu_barrier(). This is throttled so that userspace tests can safely hammer on the sysfs variable if they so choose. If triggered before the RCU grace-period machinery is fully active, this will error out with EAGAIN.
rcuscale.minruntime=
[KNL]
Set the minimum test run time in seconds. This does not affect the data-collection interval, but instead allows better measurement of things like CPU consumption.
rcuscale.writer_holdoff_jiffies=
[KNL]
Additional write-side holdoff between grace periods, but in jiffies. The default of zero says no holdoff.
rcupdate.rcu_cpu_stall_notifiers=
[KNL]
Provide RCU CPU stall notifiers, but see the warnings in the RCU_CPU_STALL_NOTIFIER Kconfig option’s help text. TL;DR: You almost certainly do not want rcupdate.rcu_cpu_stall_notifiers.
rcupdate.rcu_task_lazy_lim=
[KNL]
Number of callbacks on a given CPU that will cancel laziness on that CPU. Use -1 to disable cancellation of laziness, but be advised that doing so increases the danger of OOM due to callback flooding.
rcupdate.rcu_tasks_lazy_ms=
[KNL]
Set the timeout in milliseconds for RCU Tasks asynchronous callback batching for call_rcu_tasks(). A negative value will take the default. A value of zero will disable batching. Batching is always disabled for synchronize_rcu_tasks().
rcupdate.rcu_tasks_rude_lazy_ms=
[KNL]
Set the timeout in milliseconds for RCU Tasks Rude asynchronous callback batching for call_rcu_tasks_rude(). A negative value will take the default. A value of zero will disable batching. Batching is always disabled for synchronize_rcu_tasks_rude().
rcupdate.rcu_tasks_trace_lazy_ms=
[KNL]
Set the timeout in milliseconds for RCU Tasks Trace asynchronous callback batching for call_rcu_tasks_trace(). A negative value will take the default. A value of zero will disable batching. Batching is always disabled for synchronize_rcu_tasks_trace().
spectre_bhi=
[X86]
Control mitigation of Branch History Injection (BHI) vulnerability. This setting affects the deployment of the HW BHI control and the SW BHB clearing sequence.
Values:
on
- (default) Enable the HW or SW mitigation as needed.
off
- Disable the mitigation.
unwind_debug
[X86-64]
Enable unwinder debug output. This can be useful for debugging certain unwinder error conditions, including corrupt stacks and bad/missing unwinder metadata.
workqueue.cpu_intensive_thresh_us=
Per-cpu work items which run for longer than this threshold are automatically considered CPU intensive and excluded from concurrency management to prevent them from noticeably delaying other per-cpu work items. Default is 10000 (10ms).
If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel will report the work functions which violate this threshold repeatedly. They are likely good candidates for using WQ_UNBOUND workqueues instead.
workqueue.cpu_intensive_warning_thresh=<uint>
If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel will report the work functions which violate the workqueue.cpu_intensive_thresh_us threshold repeatedly. To prevent spurious warnings, the kernel starts printing only after a work function has violated that threshold this number of times.
The default is 4 times. 0 disables the warning.
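The interaction of the two workqueue thresholds can be illustrated with a toy model. This Python sketch assumes that each execution of a work function has a single measured runtime in microseconds. Once the warning threshold is reached, it reports every further violation. The kernel may rate-limit repeated reports further. The work function names in the tests are hypothetical.

```python
from collections import Counter


class CpuIntensiveReporter:
    """Toy model of cpu_intensive_thresh_us and cpu_intensive_warning_thresh."""

    def __init__(self, thresh_us: int = 10_000, warning_thresh: int = 4) -> None:
        self.thresh_us = thresh_us
        self.warning_thresh = warning_thresh
        self.violations = Counter()  # work function name -> violation count

    def record(self, func: str, runtime_us: int) -> bool:
        """Record one execution; return True if this model would print a report."""
        if runtime_us <= self.thresh_us:
            return False  # within the threshold: not CPU intensive
        self.violations[func] += 1
        if self.warning_thresh == 0:
            return False  # 0 disables the warning
        return self.violations[func] >= self.warning_thresh
```

With the defaults, a work function that runs for 12 ms each time is reported from its fourth violation onward.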
workqueue.default_affinity_scope=
Select the default affinity scope to use for unbound workqueues. Can be one of "cpu", "smt", "cache", "numa" and "system". Default is "cache". For more information, see the Affinity Scopes section in Documentation/core-api/workqueue.rst.
This can be changed after boot by writing to the matching /sys/module/workqueue/parameters file. All workqueues with the "default" affinity scope will be updated accordingly.
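The following Python sketch changes the default scope at runtime. It assumes that the matching file is named default_affinity_scope under /sys/module/workqueue/parameters, as is usual for module parameters, and that the caller has root privileges. The param_dir argument exists only so that the function can be exercised against a temporary directory.

```python
from pathlib import Path

VALID_SCOPES = ("cpu", "smt", "cache", "numa", "system")
PARAM_DIR = Path("/sys/module/workqueue/parameters")


def set_default_affinity_scope(scope: str, param_dir: Path = PARAM_DIR) -> Path:
    """Write a new default affinity scope for unbound workqueues."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"unknown affinity scope {scope!r}; expected one of {VALID_SCOPES}")
    target = param_dir / "default_affinity_scope"
    # The kernel updates workqueues that use the "default" scope on write.
    target.write_text(scope)
    return target
```

For example, running set_default_affinity_scope("numa") as root has the same effect as writing numa to that file.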
xen_msr_safe=
[X86,XEN]
Format: <bool>
Select whether to always use non-faulting (safe) MSR access functions when running as a Xen PV guest. The default value is controlled by CONFIG_XEN_PV_MSR_SAFE.
Updated kernel parameters
clearcpuid=
X[,X...] [X86]
Disable CPUID feature X for the kernel. See numbers X.
Note the Linux-specific bits are not necessarily stable over kernel options, but the vendor-specific ones should be. X can also be a string as appearing in the flags: line in /proc/cpuinfo which does not have the above instability issue. However, not all features have names in /proc/cpuinfo. Note that using this option will taint your kernel.
Also note that user programs calling CPUID directly or using the feature without checking anything will still see it. This just prevents it from being used by the kernel or shown in /proc/cpuinfo. Also note the kernel might malfunction if you disable some critical bits.
cma_pernuma=nn[MG]
[KNL,CMA]
Sets the size of the kernel per-NUMA memory area for contiguous memory allocations. A value of 0 disables per-NUMA CMA altogether. If this option is not specified, the default value is 0. With per-NUMA CMA enabled, DMA users on node nid first try to allocate buffers from the per-NUMA area located in node nid. If the allocation fails, they fall back to the global default memory area.
csdlock_debug=
[KNL]
Enable debug add-ons of cross-CPU function call handling. When switched on, additional debug data is printed to the console in case a hanging CPU is detected, and that CPU is pinged again in order to try to resolve the hang situation. The default value of this option depends on the CSD_LOCK_WAIT_DEBUG_DEFAULT Kconfig option.
<module>.async_probe[=<bool>]
[KNL]
If no <bool> value is specified or if the value specified is not a valid <bool>, enable asynchronous probe on this module. Otherwise, enable/disable asynchronous probe on this module as indicated by the <bool> value. See also: module.async_probe.
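The value handling can be sketched in Python. The sketch approximates the kernel's kstrtobool() by examining the first one or two characters: y, t, 1, and on mean true, while n, f, 0, and off mean false. Treat that exact set as an assumption and check it against your kernel version.

```python
from typing import Optional


def kstrtobool_like(value: Optional[str]) -> Optional[bool]:
    """Approximate kstrtobool(); return None if value is not a valid bool."""
    if not value:
        return None
    first = value[0].lower()
    if first in ("y", "t", "1"):
        return True
    if first in ("n", "f", "0"):
        return False
    if first == "o" and len(value) > 1:
        second = value[1].lower()
        if second == "n":
            return True  # "on"
        if second == "f":
            return False  # "off"
    return None


def module_async_probe(value: Optional[str]) -> bool:
    """Effective <module>.async_probe; value is None when given as a bare flag."""
    parsed = kstrtobool_like(value)
    # No value, or a value that is not a valid bool, enables async probing.
    return True if parsed is None else parsed
```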
earlycon=
[KNL]
Output early console device and options.
When used with no options, the early console is determined by stdout-path property in device tree’s chosen node or the ACPI SPCR table if supported by the platform.
cdns,<addr>[,options]
Start an early, polled-mode console on a Cadence (xuartps) serial port at the specified address. The only supported option is the baud rate. If the baud rate is not specified, the serial port must already be set up and configured.
uart[8250],io,<addr>[,options[,uartclk]]
uart[8250],mmio,<addr>[,options[,uartclk]]
uart[8250],mmio32,<addr>[,options[,uartclk]]
uart[8250],mmio32be,<addr>[,options[,uartclk]]
uart[8250],0x<addr>[,options]
Start an early, polled-mode console on the 8250/16550 UART at the specified I/O port or MMIO address. MMIO inter-register address stride is either 8-bit (mmio) or 32-bit (mmio32 or mmio32be). If none of [io|mmio|mmio32|mmio32be] is given, <addr> is assumed to be equivalent to 'mmio'. 'options' are specified in the same format described for "console=ttyS<n>"; if unspecified, the h/w is not initialized. 'uartclk' is the uart clock frequency; if unspecified, it is set to 'BASE_BAUD' * 16.
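The following Python sketch splits the uart/uart8250 forms listed above into their fields. It does not validate options or reproduce the kernel's own parsing, and the addresses in the tests are hypothetical. When uartclk is omitted, the sketch records None and leaves the BASE_BAUD * 16 default to the kernel.

```python
from dataclasses import dataclass
from typing import Optional

IOTYPES = ("io", "mmio", "mmio32", "mmio32be")


@dataclass
class EarlyconUart:
    iotype: str
    addr: int
    options: Optional[str] = None
    uartclk: Optional[int] = None  # None: the kernel uses BASE_BAUD * 16


def parse_earlycon_uart(value: str) -> EarlyconUart:
    """Parse the uart/uart8250 forms of an earlycon= value."""
    fields = value.split(",")
    if fields[0] not in ("uart", "uart8250") or len(fields) < 2:
        raise ValueError(f"not a uart earlycon value: {value!r}")
    if fields[1] in IOTYPES:
        iotype, rest = fields[1], fields[2:]
        if not rest:
            raise ValueError("missing <addr>")
        uartclk = int(rest[2], 0) if len(rest) > 2 else None
    else:
        # Without an explicit type, <addr> is treated as 'mmio' and the
        # documented form has no uartclk field.
        iotype, rest, uartclk = "mmio", fields[1:], None
        if len(rest) > 2:
            raise ValueError("uartclk requires an explicit io/mmio type")
    addr = int(rest[0], 0)  # base 0 accepts the 0x prefix
    options = rest[1] if len(rest) > 1 else None
    return EarlyconUart(iotype, addr, options, uartclk)
```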
earlyprintk=
[X86,SH,ARM,M68k,S390]
earlyprintk=vga
earlyprintk=sclp
earlyprintk=xen
earlyprintk=serial[,ttySn[,baudrate]]
earlyprintk=serial[,0x…[,baudrate]]
earlyprintk=ttySn[,baudrate]
earlyprintk=dbgp[debugController#]
earlyprintk=pciserial[,force],bus:device.function[,baudrate]
earlyprintk=xdbc[xhciController#]
earlyprintk is useful when the kernel crashes before the normal console is initialized. It is not enabled by default because it has some cosmetic problems.
Append ",keep" to not disable it when the real console takes over.
Only one of vga, efi, serial, or usb debug port can be used at a time.
Currently only ttyS0 and ttyS1 may be specified by name. Other I/O ports may be explicitly specified on some architectures (x86 and arm at least) by replacing ttySn with an I/O port address, like this:
earlyprintk=serial,0x1008,115200
You can find the port for a given device in /proc/tty/driver/serial:
2: uart:ST16650V2 port:00001008 irq:18
Interaction with the standard serial driver is not very good.
The VGA and EFI output is eventually overwritten by the real console.
The xen option can only be used in Xen domains.
The sclp output can only be used on s390.
The optional "force" to "pciserial" enables use of a PCI device even when its classcode is not of the UART class.
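The following Python sketch turns a /proc/tty/driver/serial entry into an earlyprintk= option, as in the 0x1008 example above. It assumes that the port: field holds a hexadecimal I/O port address, and it defaults the baud rate to 115200. It rejects entries without a port: field.

```python
import re

_PORT = re.compile(r"\bport:([0-9A-Fa-f]+)\b")


def earlyprintk_from_serial_line(line: str, baudrate: int = 115200) -> str:
    """Build earlyprintk=serial,<port>,<baud> from a /proc/tty/driver/serial line."""
    match = _PORT.search(line)
    if match is None:
        raise ValueError(f"no port: field in {line!r}")
    port = int(match.group(1), 16)  # the field is printed as zero-padded hex
    return f"earlyprintk=serial,{port:#x},{baudrate}"


if __name__ == "__main__":
    print(earlyprintk_from_serial_line("2: uart:ST16650V2 port:00001008 irq:18"))
    # earlyprintk=serial,0x1008,115200
```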
iommu.strict=
[ARM64, X86, S390]
Configure TLB invalidation behaviour.
Format: { "0" | "1" }
0
- Lazy mode. Request that DMA unmap operations use deferred invalidation of hardware TLBs, for increased throughput at the cost of reduced device isolation. Will fall back to strict mode if not supported by the relevant IOMMU driver.
1
- Strict mode. DMA unmap operations invalidate IOMMU hardware TLBs synchronously.
unset
- Use value of CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT}.
On x86, strict mode specified via one of the legacy driver-specific options takes precedence.
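The selection rules can be summarized as a Python function. This sketch covers only the documented behavior. The config_default_strict argument stands for whichever of CONFIG_IOMMU_DEFAULT_DMA_{LAZY,STRICT} was chosen, and driver_supports_lazy is a hypothetical input. The x86 legacy driver-specific options, which take precedence, are not modeled.

```python
from typing import Optional


def iommu_flush_mode(strict_param: Optional[str], config_default_strict: bool,
                     driver_supports_lazy: bool = True) -> str:
    """Return 'strict' or 'lazy' for a given iommu.strict= setting."""
    if strict_param is None:
        requested_strict = config_default_strict  # "unset": use the Kconfig choice
    elif strict_param in ("0", "1"):
        requested_strict = strict_param == "1"
    else:
        raise ValueError(f"iommu.strict expects 0 or 1, got {strict_param!r}")
    if not requested_strict and not driver_supports_lazy:
        return "strict"  # lazy mode falls back to strict if unsupported
    return "strict" if requested_strict else "lazy"
```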
mem_encrypt=
[x86_64]
AMD Secure Memory Encryption (SME) control.
Valid arguments: on, off
Default: off
mem_encrypt=on: Activate SME
mem_encrypt=off: Do not activate SME
mitigations=
[X86,PPC,S390,ARM64]
Control optional mitigations for CPU vulnerabilities. This is a set of curated, arch-independent options, each of which is an aggregation of existing arch-specific options.
Value:
off
- Disable all optional CPU mitigations. This improves system performance, but it may also expose users to several CPU vulnerabilities. Equivalent to:
if nokaslr then kpti=0 [ARM64]
gather_data_sampling=off [X86]
kvm.nx_huge_pages=off [X86]
l1tf=off [X86]
mds=off [X86]
mmio_stale_data=off [X86]
no_entry_flush [PPC]
no_uaccess_flush [PPC]
nobp=0 [S390]
nopti [X86,PPC]
nospectre_bhb [ARM64]
nospectre_v1 [X86,PPC]
nospectre_v2 [X86,PPC,S390,ARM64]
reg_file_data_sampling=off [X86]
retbleed=off [X86]
spec_rstack_overflow=off [X86]
spec_store_bypass_disable=off [X86,PPC]
spectre_bhi=off [X86]
spectre_v2_user=off [X86]
srbds=off [X86,INTEL]
ssbd=force-off [ARM64]
tsx_async_abort=off [X86]
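The equivalence list can be turned into a per-architecture lookup. In this Python sketch, the table is transcribed from the list above. The [X86,INTEL] qualifier on srbds= is folded into X86, and kpti=0 is added only for ARM64 when nokaslr is set. Architecture names use the same spelling as the list.

```python
# (option, architectures) pairs transcribed from the mitigations=off list.
MITIGATIONS_OFF = [
    ("gather_data_sampling=off", {"X86"}),
    ("kvm.nx_huge_pages=off", {"X86"}),
    ("l1tf=off", {"X86"}),
    ("mds=off", {"X86"}),
    ("mmio_stale_data=off", {"X86"}),
    ("no_entry_flush", {"PPC"}),
    ("no_uaccess_flush", {"PPC"}),
    ("nobp=0", {"S390"}),
    ("nopti", {"X86", "PPC"}),
    ("nospectre_bhb", {"ARM64"}),
    ("nospectre_v1", {"X86", "PPC"}),
    ("nospectre_v2", {"X86", "PPC", "S390", "ARM64"}),
    ("reg_file_data_sampling=off", {"X86"}),
    ("retbleed=off", {"X86"}),
    ("spec_rstack_overflow=off", {"X86"}),
    ("spec_store_bypass_disable=off", {"X86", "PPC"}),
    ("spectre_bhi=off", {"X86"}),
    ("spectre_v2_user=off", {"X86"}),
    ("srbds=off", {"X86"}),  # listed as [X86,INTEL]
    ("ssbd=force-off", {"ARM64"}),
    ("tsx_async_abort=off", {"X86"}),
]


def expand_mitigations_off(arch: str, nokaslr: bool = False) -> list[str]:
    """List the options that mitigations=off is equivalent to on one arch."""
    options = [name for name, arches in MITIGATIONS_OFF if arch in arches]
    if arch == "ARM64" and nokaslr:
        options.insert(0, "kpti=0")  # conditional entry: only with nokaslr
    return options
```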
nosmap
[PPC]
Disable SMAP (Supervisor Mode Access Prevention) even if it is supported by the processor.
nosmep
[PPC64s]
Disable SMEP (Supervisor Mode Execution Prevention) even if it is supported by the processor.
nox2apic
[x86_64,APIC]
Do not enable x2APIC mode.
This parameter will be ignored on systems with the LEGACY_XAPIC_DISABLED bit set in the IA32_XAPIC_DISABLE_STATUS MSR.
panic_print=
Bitmask for printing system info when a panic happens. Users can choose a combination of the following bits:
- bit 0: print all tasks info
- bit 1: print system memory info
- bit 2: print timer info
- bit 3: print locks info if CONFIG_LOCKDEP is on
- bit 4: print ftrace buffer
- bit 5: print all printk messages in buffer
- bit 6: print all CPUs backtrace (if available in the arch)
This option may print a lot of lines, so there is a risk of losing older messages in the log. Use this option carefully; it may be worth setting up a bigger log buffer with "log_buf_len" along with it.
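Because panic_print= is a bitmask, its value is the sum of 1 << bit over the selected items. The following Python sketch computes that value. The item names are labels invented for this sketch. Only the bit numbers come from the list above.

```python
# Hypothetical labels mapped to the documented bit numbers.
PANIC_PRINT_BITS = {
    "tasks": 0,
    "memory": 1,
    "timers": 2,
    "locks": 3,  # only effective with CONFIG_LOCKDEP
    "ftrace": 4,
    "printk_buffer": 5,
    "all_cpu_backtrace": 6,
}


def panic_print_mask(*items: str) -> int:
    """Combine the named items into a panic_print bitmask."""
    mask = 0
    for item in items:
        mask |= 1 << PANIC_PRINT_BITS[item]
    return mask
```

For example, panic_print_mask("memory", "timers", "all_cpu_backtrace") returns 70 (2 + 4 + 64), so the boot parameter would be panic_print=70.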
pcie_aspm=
[PCIE]
Forcibly enable or ignore PCIe Active State Power Management.
Value:
off
- Don’t touch ASPM configuration at all. Leave any configuration done by firmware unchanged.
force
- Enable ASPM even on devices that claim not to support it.
Forcing ASPM on may cause system lockups.
s390_iommu=
[HW,S390]
Set s390 IOTLB flushing mode.
Value:
strict
- With strict flushing, every unmap operation results in an IOTLB flush. The default is lazy flushing before reuse, which is faster. This setting is deprecated and equivalent to iommu.strict=1.
spectre_v2=
[x86]
Control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability. The default operation protects the kernel from user space attacks.
Value:
on
- unconditionally enable, implies spectre_v2_user=on.
off
- unconditionally disable, implies spectre_v2_user=off.
auto
- kernel detects whether your CPU model is vulnerable.
Selecting 'on' will, and 'auto' may, choose a mitigation method at run time according to the CPU, the available microcode, the setting of the CONFIG_MITIGATION_RETPOLINE configuration option, and the compiler with which the kernel was built.
usbcore.quirks=
[USB]
A list of quirk entries to augment the built-in usb core quirk list. List entries are separated by commas. Each entry has the form VendorID:ProductID:Flags. The IDs are 4-digit hex numbers and Flags is a set of letters. Each letter will change the built-in quirk; setting it if it is clear and clearing it if it is set. The letters have the following meanings:
- a = USB_QUIRK_STRING_FETCH_255 (string descriptors must not be fetched using a 255-byte read);
- b = USB_QUIRK_RESET_RESUME (device can’t resume correctly so reset it instead);
- c = USB_QUIRK_NO_SET_INTF (device can’t handle Set-Interface requests);
- d = USB_QUIRK_CONFIG_INTF_STRINGS (device can’t handle its Configuration or Interface strings);
- e = USB_QUIRK_RESET (device can’t be reset (e.g. morph devices), don’t use reset);
- f = USB_QUIRK_HONOR_BNUMINTERFACES (device has more interface descriptions than the bNumInterfaces count, and can’t handle talking to these interfaces);
- g = USB_QUIRK_DELAY_INIT (device needs a pause during initialization, after we read the device descriptor);
- h = USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL (For high speed and super speed interrupt endpoints, the USB 2.0 and USB 3.0 spec require the interval in microframes (1 microframe = 125 microseconds) to be calculated as interval = 2 ^ (bInterval-1). Devices with this quirk report their bInterval as the result of this calculation instead of the exponent variable used in the calculation);
- i = USB_QUIRK_DEVICE_QUALIFIER (device can’t handle device_qualifier descriptor requests);
- j = USB_QUIRK_IGNORE_REMOTE_WAKEUP (device generates spurious wakeup, ignore remote wakeup capability);
- k = USB_QUIRK_NO_LPM (device can’t handle Link Power Management);
- l = USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL (Device reports its bInterval as linear frames instead of the USB 2.0 calculation);
- m = USB_QUIRK_DISCONNECT_SUSPEND (Device needs to be disconnected before suspend to prevent spurious wakeup);
- n = USB_QUIRK_DELAY_CTRL_MSG (Device needs a pause after every control message);
- o = USB_QUIRK_HUB_SLOW_RESET (Hub needs extra delay after resetting its port);
- p = USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT (Reduce timeout of the SET_ADDRESS request from 5000 ms to 500 ms);
Example: quirks=0781:5580:bk,0a5c:5834:gij
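The following Python sketch illustrates the entry format and the toggle semantics. It parses a usbcore.quirks value into vendor ID, product ID, and quirk names. It then applies the letters to a device's built-in quirk set by symmetric difference, which matches the set-if-clear, clear-if-set rule above. Rejecting unknown letters is a choice made for this sketch and may differ from the kernel's behavior.

```python
QUIRK_LETTERS = {
    "a": "USB_QUIRK_STRING_FETCH_255",
    "b": "USB_QUIRK_RESET_RESUME",
    "c": "USB_QUIRK_NO_SET_INTF",
    "d": "USB_QUIRK_CONFIG_INTF_STRINGS",
    "e": "USB_QUIRK_RESET",
    "f": "USB_QUIRK_HONOR_BNUMINTERFACES",
    "g": "USB_QUIRK_DELAY_INIT",
    "h": "USB_QUIRK_LINEAR_UFRAME_INTR_BINTERVAL",
    "i": "USB_QUIRK_DEVICE_QUALIFIER",
    "j": "USB_QUIRK_IGNORE_REMOTE_WAKEUP",
    "k": "USB_QUIRK_NO_LPM",
    "l": "USB_QUIRK_LINEAR_FRAME_INTR_BINTERVAL",
    "m": "USB_QUIRK_DISCONNECT_SUSPEND",
    "n": "USB_QUIRK_DELAY_CTRL_MSG",
    "o": "USB_QUIRK_HUB_SLOW_RESET",
    "p": "USB_QUIRK_SHORT_SET_ADDRESS_REQ_TIMEOUT",
}


def parse_usbcore_quirks(value: str) -> list[tuple[int, int, list[str]]]:
    """Parse 'VID:PID:Flags[,...]' into (vendor_id, product_id, quirk names)."""
    entries = []
    for entry in value.split(","):
        vid, pid, flags = entry.split(":")  # ValueError unless exactly 3 fields
        if len(vid) != 4 or len(pid) != 4:
            raise ValueError(f"IDs must be 4-digit hex numbers: {entry!r}")
        unknown = set(flags) - set(QUIRK_LETTERS)
        if unknown:
            raise ValueError(f"unknown quirk letters {sorted(unknown)} in {entry!r}")
        names = [QUIRK_LETTERS[letter] for letter in flags]
        entries.append((int(vid, 16), int(pid, 16), names))
    return entries


def apply_quirks(builtin: set[str], toggles: list[str]) -> set[str]:
    """Each listed quirk is set if clear and cleared if set."""
    return builtin ^ set(toggles)
```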
Removed kernel parameters
- noclflush [BUGS=X86]: Don’t use the CLFLUSH instruction.
- workqueue.disable_numa
- noexec [X86]
- nosep [BUGS=X86-32]: Disables x86 SYSENTER/SYSEXIT support.
- nordrand [X86]: Disable kernel use of the RDRAND instruction.
- thermal.nocrt
New sysctl parameters
oops_limit
Number of kernel oopses after which the kernel should panic when panic_on_oops is not set. Setting this to 0 disables checking the count. Setting this to 1 has the same effect as setting panic_on_oops=1. The default value is 10000.
warn_limit
Number of kernel warnings after which the kernel should panic when panic_on_warn is not set. Setting this to 0 disables checking the warning count. Setting this to 1 has the same effect as setting panic_on_warn=1. The default value is 0.
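Both limits follow the same rule, sketched here in Python. The sketch assumes that count is the number of oopses (or warnings) seen since boot, including the current one. It shows why a limit of 1 behaves like panic_on_oops=1 (or panic_on_warn=1) and why 0 disables the check.

```python
def should_panic(count: int, limit: int, panic_on_flag: bool = False) -> bool:
    """Decide whether the count-th oops (or warning) triggers a panic."""
    if panic_on_flag:
        return True  # panic_on_oops / panic_on_warn already panics every time
    # A limit of 0 disables the count check entirely.
    return limit != 0 and count >= limit
```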
kexec_load_limit_panic
This parameter specifies a limit to the number of times the syscalls kexec_load and kexec_file_load can be called with a crash image. It can only be set with a more restrictive value than the current one.
Value:
-1
- Unlimited calls to kexec. This is the default setting.
N
- Number of calls left.
kexec_load_limit_reboot
Similar to kexec_load_limit_panic, but for a normal image.
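The one-way behavior of these limits can be modeled with a small Python class. This models the documented rules and is not kernel code. A write is accepted only if it is strictly more restrictive than the current value; while the current value is -1, any non-negative count qualifies. Each successful load consumes one call. The class and method names are hypothetical.

```python
class KexecLoadLimit:
    """Toy model of kexec_load_limit_panic / kexec_load_limit_reboot."""

    UNLIMITED = -1

    def __init__(self) -> None:
        self.remaining = self.UNLIMITED  # default: unlimited calls

    def set(self, value: int) -> None:
        """Mimic a sysctl write: only a more restrictive value is accepted."""
        if value < 0:
            raise ValueError("only a non-negative count can be written")
        if self.remaining != self.UNLIMITED and value >= self.remaining:
            raise ValueError("the limit can only be made more restrictive")
        self.remaining = value

    def try_load(self) -> bool:
        """Consume one call; return False once the limit is exhausted."""
        if self.remaining == self.UNLIMITED:
            return True
        if self.remaining == 0:
            return False
        self.remaining -= 1
        return True
```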
numa_balancing_promote_rate_limit_MBps
Excessively high promotion or demotion throughput between different memory types might hurt application latency. You can use this parameter to rate-limit the promotion throughput. The per-node maximum promotion throughput in MB/s is limited to no more than the set value.
A rule of thumb is to set this to less than 1/10 of the PMEM node write bandwidth.
Updated sysctl parameters
kexec_load_disabled
A toggle indicating if the syscalls kexec_load and kexec_file_load have been disabled. This value defaults to 0 (false: kexec_*load enabled), but can be set to 1 (true: kexec_*load disabled).
Once true, kexec can no longer be used, and the toggle cannot be set back to false. This allows a kexec image to be loaded before disabling the syscall, allowing a system to set up (and later use) an image without it being altered. Generally used together with the modules_disabled sysctl.
panic_print
Bitmask for printing system info when a panic happens. Users can choose a combination of the following bits:
- bit 0: print all tasks info
- bit 1: print system memory info
- bit 2: print timer info
- bit 3: print locks info if CONFIG_LOCKDEP is on
- bit 4: print ftrace buffer
- bit 5: print all printk messages in buffer
- bit 6: print all CPUs backtrace (if available in the arch)
sched_energy_aware
Enables/disables Energy Aware Scheduling (EAS). EAS starts automatically on platforms where it can run (that is, platforms with asymmetric CPU topologies and having an Energy Model available). If your platform happens to meet the requirements for EAS but you do not want to use it, change this value to 0. On non-EAS platforms, write operations fail and reads return nothing.