Chapter 11. Keeping kernel panic parameters disabled in virtualized environments
When configuring a virtual machine in RHEL 8, do not enable the softlockup_panic and nmi_watchdog kernel parameters. A virtual machine might report a spurious soft lockup, and a spurious soft lockup should not cause a kernel panic.
Find the reasons behind this advice in the following sections.
11.1. What is a soft lockup
A soft lockup occurs when a task executes in kernel space without rescheduling, preventing other tasks from running on that CPU. This issue, often caused by a bug, triggers a warning on the system console to alert users.
11.2. Parameters controlling kernel panic
The following kernel parameters can be set to control a system’s behavior when a soft lockup is detected.
softlockup_panic
Controls whether or not the kernel will panic when a soft lockup is detected.

Type      Value   Effect
Integer   0       kernel does not panic on soft lockup
Integer   1       kernel panics on soft lockup

By default, on RHEL 8, this value is 0.
The system needs to detect a hard lockup first to be able to panic. The detection is controlled by the nmi_watchdog parameter.
nmi_watchdog
Controls whether lockup detection mechanisms (watchdogs) are active or not. This parameter is of integer type.

Value   Effect
0       disables lockup detector
1       enables lockup detector

The hard lockup detector monitors each CPU for its ability to respond to interrupts.
watchdog_thresh
Controls frequency of watchdog hrtimer, NMI events, and soft or hard lockup thresholds.

Default threshold   Soft lockup threshold
10 seconds          2 * watchdog_thresh

Setting this parameter to zero disables lockup detection altogether.
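The current values of these parameters can be inspected from a running system. The following is a minimal sketch, assuming the parameters are exposed as files under /proc/sys/kernel (the usual location for kernel.* sysctl values on Linux). The function names read_watchdog_params and soft_lockup_threshold are hypothetical helpers introduced for illustration, not part of any RHEL tool.

```python
from pathlib import Path

# Parameter names as described in this section.
PARAMS = ("softlockup_panic", "nmi_watchdog", "watchdog_thresh")


def read_watchdog_params(base="/proc/sys/kernel"):
    """Return {name: int or None}; None means the file is absent or unreadable."""
    values = {}
    for name in PARAMS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


def soft_lockup_threshold(watchdog_thresh):
    """Soft lockup threshold in seconds: 2 * watchdog_thresh."""
    return 2 * watchdog_thresh


if __name__ == "__main__":
    params = read_watchdog_params()
    for name, value in params.items():
        print(f"{name} = {value}")
    thresh = params["watchdog_thresh"]
    if thresh:  # skip when the value is missing or 0 (detection disabled)
        print(f"soft lockup threshold = {soft_lockup_threshold(thresh)} seconds")
```

With the default watchdog_thresh of 10 seconds, soft_lockup_threshold returns 20, which is the 20-second figure used when discussing spurious soft lockups in guests.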
11.3. Spurious soft lockups in virtualized environments
Soft lockup warnings on guest operating systems can be false alarms caused by host workload or resource contention. On a physical host, such a warning usually indicates a bug. In a virtualized environment, the same warning might appear only because the host scheduled out the guest CPU for an extended period.
Heavy workload on a host or high contention over a specific resource, such as memory, can cause a spurious soft lockup warning, because the host might schedule out the guest CPU for longer than 20 seconds, which is the default soft lockup threshold (2 * watchdog_thresh). When the guest CPU is scheduled to run on the host again, it experiences a time jump that triggers all timers that came due in the meantime. These timers include the hrtimer watchdog, which can then report a soft lockup on the guest CPU.
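The effect of the time jump can be illustrated with a deliberately simplified model. This sketch is not the kernel's watchdog implementation: it only assumes that a soft lockup is reported when the time since the CPU last recorded progress exceeds 2 * watchdog_thresh seconds. The function name soft_lockup_reported and the timestamps are hypothetical.

```python
def soft_lockup_reported(last_progress, now, watchdog_thresh=10):
    """Simplified model: report a soft lockup when the gap since the last
    recorded progress exceeds 2 * watchdog_thresh seconds."""
    return (now - last_progress) > 2 * watchdog_thresh


# The guest CPU last recorded progress at t = 100 s on the guest clock.
last_progress = 100.0

# Case 1: the guest keeps running and the watchdog timer fires 4 s later.
print(soft_lockup_reported(last_progress, now=104.0))  # False

# Case 2: the host schedules out the guest CPU for 25 s. No guest task was
# stuck, but when the guest resumes, its clock has jumped past the threshold.
print(soft_lockup_reported(last_progress, now=125.0))  # True
```

In the second case no guest task misbehaved. The report is produced only by the gap that the host introduced, which is why it is considered spurious.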
A soft lockup reported in a virtualized environment can be a false alarm. Do not enable kernel parameters that trigger a system panic when a soft lockup is reported on a guest CPU.
To understand soft lockups in guests, it is essential to know that the host schedules the guest as a task, and the guest then schedules its own tasks.
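To check a guest for settings that contradict this advice, one option is to scan sysctl-style configuration text for the two parameters. The sketch below assumes the settings appear as simple `key = value` lines under the sysctl names kernel.softlockup_panic and kernel.nmi_watchdog. The function find_enabled_panic_params and the sample text are hypothetical. Parameters set another way, such as on the kernel command line, are not covered.

```python
# sysctl names of the parameters that should stay disabled in a guest.
PANIC_KEYS = ("kernel.softlockup_panic", "kernel.nmi_watchdog")


def find_enabled_panic_params(conf_text):
    """Return the keys from PANIC_KEYS whose last assignment is not 0."""
    last_value = {}
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in PANIC_KEYS:
            last_value[key] = value  # a later line overrides an earlier one
    return [key for key in PANIC_KEYS if last_value.get(key, "0") != "0"]


# Hypothetical configuration content, for illustration only.
sample = """
# example drop-in file
kernel.softlockup_panic = 1
kernel.nmi_watchdog=0
"""
print(find_enabled_panic_params(sample))  # ['kernel.softlockup_panic']
```

An empty result means that the scanned text does not enable either parameter. It does not show what the running kernel currently uses.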