Chapter 2. Working with sysctl and kernel tunables
2.1. What is a kernel tunable?
Kernel tunables are used to customize the behavior of Red Hat Enterprise Linux at boot, or on demand while the system is running. Some hardware parameters are specified at boot time only and cannot be altered once the system is running. Most, however, can be altered as required and set permanently for the next boot.
2.2. How to work with kernel tunables
There are three ways to modify kernel tunables.
- Using the sysctl command
- By manually modifying configuration files in the /etc/sysctl.d/ directory
- Through a shell, interacting with the virtual file system mounted at /proc/sys
Not all boot-time parameters are under the control of the sysctl subsystem. Some hardware-specific options must be set on the kernel command line; the Kernel Parameters section of this guide addresses those options.
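The sysctl names and the files under /proc/sys refer to the same tunables: each dot in a name corresponds to a directory separator in the path. The following is a minimal sketch of that mapping, not part of any Red Hat tooling. The function names tunable_path and read_tunable are hypothetical, and the optional root-directory argument is an assumption added so the helpers can be tried against a scratch directory. The simple translation does not handle interface names that themselves contain a dot.

```shell
#!/bin/sh
# Hypothetical helper: map a dotted tunable name to its file under /proc/sys.
# Usage: tunable_path <name> [root]
tunable_path() {
    name=$1
    root=${2:-/proc/sys}
    # Dots in the sysctl name correspond to slashes in the path.
    printf '%s/%s\n' "$root" "$(printf '%s' "$name" | tr . /)"
}

# Hypothetical helper: read a tunable through the virtual file system.
# Usage: read_tunable <name> [root]
read_tunable() {
    cat "$(tunable_path "$1" "${2:-/proc/sys}")"
}
```

For example, read_tunable kernel.version reads /proc/sys/kernel/version, which holds the same value that sysctl kernel.version reports.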
2.2.1. Using the sysctl command
The sysctl command is used to list, read, and set kernel tunables. It can filter tunables when listing or reading, and set tunables temporarily or permanently.
Listing variables
# sysctl -a
Reading variables
# sysctl kernel.version
kernel.version = #1 SMP Fri Jan 19 13:19:54 UTC 2018
Writing variables temporarily
# sysctl <tunable class>.<tunable>=<value>
Writing variables permanently
# sysctl -w <tunable class>.<tunable>=<value> >> /etc/sysctl.conf
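Because the permanent form appends to /etc/sysctl.conf, running it more than once leaves several lines for the same tunable in the file. The sketch below shows one way to keep a single entry per tunable. It is an illustration only: persist_tunable is a hypothetical helper, it edits the file but does not change the running value, and the optional file argument is an assumption added so it can be tried on a scratch file first.

```shell
#!/bin/sh
# Hypothetical helper: record "key = value" in a configuration file exactly once.
# Usage: persist_tunable <key> <value> [file]
persist_tunable() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of the same key.
    if [ -f "$file" ]; then
        awk -v k="$key" '
            { line = $0; sub(/^[ \t]+/, "", line); split(line, parts, /[ \t]*=/) }
            parts[1] != k { print }
        ' "$file" > "$tmp"
    fi
    # Append the new assignment in the form sysctl itself prints.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the ownership and mode of the file are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Comment lines and assignments of other tunables are copied unchanged; only earlier assignments of the same key are dropped.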
2.2.2. Modifying files in /etc/sysctl.d
To override a default at boot, you can also manually populate files in /etc/sysctl.d.
- Create a new file in /etc/sysctl.d
# vim /etc/sysctl.d/99-custom.conf
- Include the variables you wish to set, one per line, in the following form
<tunable class>.<tunable> = <value>
<tunable class>.<tunable> = <value>
- Save the file
- Either reboot the machine to make the changes take effect, or execute sysctl -p /etc/sysctl.d/99-custom.conf to apply the changes without rebooting
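After applying a file, it can be useful to confirm that the running values match what the file requests. The following sketch compares each line of a file in the form described above with the corresponding file under /proc/sys. The helper name check_sysctl_file and its optional root argument are hypothetical. The comparison is a plain string match, so tunables that hold several whitespace-separated values may be reported as different even when they are equivalent.

```shell
#!/bin/sh
# Hypothetical helper: report tunables whose live value differs from a file
# in "<tunable class>.<tunable> = <value>" form.
# Usage: check_sysctl_file <file> [root]
check_sysctl_file() {
    file=$1
    root=${2:-/proc/sys}
    rc=0
    while IFS= read -r line || [ -n "$line" ]; do
        # Strip leading whitespace, then skip blank lines and comments.
        line=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//')
        case $line in
            ''|'#'*|';'*) continue ;;
        esac
        key=$(printf '%s\n' "${line%%=*}" | sed 's/[[:space:]]*$//')
        want=$(printf '%s\n' "${line#*=}" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        target=$root/$(printf '%s' "$key" | tr . /)
        if ! have=$(cat "$target" 2>/dev/null); then
            echo "$key: not found"
            rc=1
            continue
        fi
        if [ "$have" != "$want" ]; then
            echo "$key: expected $want, found $have"
            rc=1
        fi
    done < "$file"
    return $rc
}
```

For example, check_sysctl_file /etc/sysctl.d/99-custom.conf prints nothing and returns 0 when every listed tunable already has the requested value.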
2.3. What tunables can be controlled?
Tunables are divided into groups by kernel subsystem. A Red Hat Enterprise Linux system has the following classes of tunables:
Class | Subsystem |
---|---|
abi | Execution domains and personalities |
crypto | Cryptographic interfaces |
debug | Kernel debugging interfaces |
dev | Device specific information |
fs | Global and specific filesystem tunables |
kernel | Global kernel tunables |
net | Network tunables |
sunrpc | Sun Remote Procedure Call (NFS) |
user | User Namespace limits |
vm | Tuning and management of memory, buffer, and cache |
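On a running system, the classes in the table appear as the top-level directories under /proc/sys. Which of them are present can vary with the kernel configuration and loaded modules. As a small sketch, the hypothetical helper list_tunable_classes below prints the classes found; the optional root argument is an assumption for trying it against another directory.

```shell
#!/bin/sh
# Hypothetical helper: list the tunable classes, that is, the top-level
# directories under /proc/sys.
# Usage: list_tunable_classes [root]
list_tunable_classes() {
    root=${1:-/proc/sys}
    for dir in "$root"/*/; do
        # The glob stays literal when nothing matches, so test each entry.
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
}
```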
2.3.1. Network interface tunables
System administrators are able to adjust the network configuration on a running system through the networking tunables.
Networking tunables are included in the /proc/sys/net directory, which contains multiple subdirectories for various networking topics. To adjust the network configuration, system administrators need to modify the files within such subdirectories.
The most frequently used directories are:
- /proc/sys/net/core/
- /proc/sys/net/ipv4/
The /proc/sys/net/core/ directory contains a variety of settings that control the interaction between the kernel and networking layers. By adjusting some of those tunables, you can improve the performance of a system, for example by increasing the size of a receive queue, the maximum number of connections, or the memory dedicated to network interfaces. Note that the effect on performance depends on the individual system and on the issue being addressed.
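Before changing any of these settings, it helps to record the current values. The sketch below prints every tunable file directly inside one directory in the same key = value form that sysctl uses. The helper name dump_tunables and its optional root argument are hypothetical, and subdirectories are skipped rather than descended into.

```shell
#!/bin/sh
# Hypothetical helper: print the tunables stored directly in one directory.
# Usage: dump_tunables <relative directory, for example net/core> [root]
dump_tunables() {
    class=$1
    root=${2:-/proc/sys}
    for file in "$root/$class"/*; do
        # Skip subdirectories and files that cannot be read.
        [ -f "$file" ] && [ -r "$file" ] || continue
        # Turn the path below the root back into a dotted name.
        key=$(printf '%s' "${file#"$root"/}" | tr / .)
        printf '%s = %s\n' "$key" "$(cat "$file" 2>/dev/null)"
    done
}
```

Saving the output of dump_tunables net/core to a file before tuning gives a baseline to compare against or to restore from.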
The /proc/sys/net/ipv4/ directory contains additional networking settings, which are useful when preventing attacks on the system or when using the system to act as a router. The directory contains both IP and TCP variables. For a detailed explanation of those variables, see /usr/share/doc/kernel-doc-<version>/Documentation/networking/ip-sysctl.txt.
Other directories within the /proc/sys/net/ipv4/ directory cover different aspects of the network stack:
- /proc/sys/net/ipv4/conf/ - allows you to configure each system interface in different ways, including the use of default settings for unconfigured devices and settings that override all special configurations
- /proc/sys/net/ipv4/neigh/ - contains settings for communicating with a host directly connected to the system and also contains different settings for systems more than one step away
- /proc/sys/net/ipv4/route/ - contains specifications that apply to routing with any interfaces on the system
The following network tunables are relevant to IPv4 interfaces and are accessible from the /proc/sys/net/ipv4/conf/{all,<interface_name>}/ directory.
Descriptions of the following parameters have been adopted from the kernel documentation sites.[1]
- log_martians
Log packets with impossible addresses to kernel log.
Type | Default |
---|---|
Boolean | 0 |
Enabled if one or more of conf/{all,interface}/log_martians is set to TRUE
- accept_redirects
Accept ICMP redirect messages.
Type | Default |
---|---|
Boolean | 1 |
accept_redirects for the interface is enabled under the following conditions:
- Both conf/{all,interface}/accept_redirects are TRUE (when forwarding for the interface is enabled)
- At least one of conf/{all,interface}/accept_redirects is TRUE (forwarding for the interface is disabled)
For more information refer to How to enable or disable ICMP redirects
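The two conditions for accept_redirects can be written out as a short check. This is a sketch of the rule as stated in this section, not kernel code: effective_accept_redirects is a hypothetical helper, eth0 in the usage note is a placeholder interface name, and the optional root argument is an assumption for trying it against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print 1 if ICMP redirects are accepted on an interface
# according to the rule described in this section, otherwise 0.
# Usage: effective_accept_redirects <interface> [root]
effective_accept_redirects() {
    iface=$1
    conf=${2:-/proc/sys}/net/ipv4/conf
    all=$(cat "$conf/all/accept_redirects") || return 1
    own=$(cat "$conf/$iface/accept_redirects") || return 1
    fwd=$(cat "$conf/$iface/forwarding") || return 1
    if [ "$fwd" -ne 0 ]; then
        # Forwarding enabled: both values must be TRUE.
        if [ "$all" -ne 0 ] && [ "$own" -ne 0 ]; then echo 1; else echo 0; fi
    else
        # Forwarding disabled: one TRUE value is enough.
        if [ "$all" -ne 0 ] || [ "$own" -ne 0 ]; then echo 1; else echo 0; fi
    fi
}
```

For example, effective_accept_redirects eth0 reports the combined result for an interface named eth0.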
- forwarding
Enable IP forwarding on an interface.
Type | Default |
---|---|
Boolean | 0 |
- mc_forwarding
Do multicast routing.
Type | Default |
---|---|
Boolean | 0 |
- Read only value
- A multicast routing daemon is required.
- conf/all/mc_forwarding must also be set to TRUE to enable multicast routing for the interface
Further Resources
- For an explanation of the read only behavior, see Why system reports "permission denied on key" while setting the kernel parameter "net.ipv4.conf.all.mc_forwarding"?
- medium_id
Arbitrary value used to differentiate the devices by the medium they are attached to.
Type | Default |
---|---|
Integer | 0 |
Notes
- Two devices on the same medium can have different id values when the broadcast packets are received only on one of them.
- The default value 0 means that the device is the only interface to its medium.
- A value of -1 means that the medium is not known.
- Currently, it is used to change the proxy_arp behavior: the proxy_arp feature is enabled for packets forwarded between two devices attached to different media.
Further Resources
- For examples, see Using the "medium_id" feature in Linux 2.2 and 2.4
- proxy_arp
Do proxy arp.
Type | Default |
---|---|
Boolean | 0 |
proxy_arp for the interface is enabled if at least one of conf/{all,interface}/proxy_arp is set to TRUE, otherwise it is disabled
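Several tunables in this section share the same "at least one of conf/{all,interface}" rule. A minimal sketch of that rule follows. The helper name effective_any is hypothetical, as is the optional root argument; it only restates the rule described here and should be used solely for tunables documented with that rule, such as proxy_arp or log_martians.

```shell
#!/bin/sh
# Hypothetical helper: print 1 if a Boolean tunable is TRUE in conf/all or in
# the configuration of the given interface, otherwise 0.
# Usage: effective_any <tunable> <interface> [root]
effective_any() {
    tunable=$1
    iface=$2
    conf=${3:-/proc/sys}/net/ipv4/conf
    all=$(cat "$conf/all/$tunable") || return 1
    own=$(cat "$conf/$iface/$tunable") || return 1
    if [ "$all" -ne 0 ] || [ "$own" -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}
```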
- proxy_arp_pvlan
Private VLAN proxy arp.
Type | Default |
---|---|
Boolean | 0 |
Allow proxy arp replies back to the same interface, to support features like RFC 3069
- secure_redirects
Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list.
Type | Default |
---|---|
Boolean | 1 |
Notes
- Even if disabled, RFC1122 redirect rules still apply.
- Overridden by shared_media.
- secure_redirects for the interface is enabled if at least one of conf/{all,interface}/secure_redirects is set to TRUE
- send_redirects
Send redirects, if router.
Type | Default |
---|---|
Boolean | 1 |
Notes
- send_redirects for the interface is enabled if at least one of conf/{all,interface}/send_redirects is set to TRUE
- bootp_relay
Accept packets with source address 0.b.c.d destined not to this host as local ones.
Type | Default |
---|---|
Boolean | 0 |
Notes
- A BOOTP daemon must be enabled to manage these packets
- conf/all/bootp_relay must also be set to TRUE to enable BOOTP relay for the interface
- Not implemented, see DHCP Relay Agent in the Red Hat Enterprise Linux Networking Guide
- accept_source_route
Accept packets with SRR option.
Type | Default |
---|---|
Boolean | 1 |
Notes
- conf/all/accept_source_route must also be set to TRUE to accept packets with SRR option on the interface
- accept_local
Accept packets with local source addresses.
Type | Default |
---|---|
Boolean | 0 |
Notes
- In combination with suitable routing, this can be used to direct packets between two local interfaces over the wire and have them accepted properly.
- rp_filter must be set to a non-zero value in order for accept_local to have an effect.
- route_localnet
Do not consider loopback addresses as martian source or destination while routing.
Type | Default |
---|---|
Boolean | 0 |
Notes
- This enables the use of 127/8 for local routing purposes.
- rp_filter
Enable source validation.
Type | Default |
---|---|
Integer | 0 |

Value | Effect |
---|---|
0 | No source validation. |
1 | Strict mode as defined in RFC3704 Strict Reverse Path |
2 | Loose mode as defined in RFC3704 Loose Reverse Path |
Notes
- Current recommended practice in RFC3704 is to enable strict mode to prevent IP spoofing from DDoS attacks.
- If using asymmetric routing or other complicated routing, then loose mode is recommended.
- The highest value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}
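Because the highest of the two values applies, reading only the per-interface file can be misleading. The sketch below reads both files and prints the mode that takes effect according to the rule above. The helper name effective_rp_filter and the optional root argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the rp_filter mode in effect for an interface,
# taking the highest of conf/all and the interface's own setting.
# Usage: effective_rp_filter <interface> [root]
effective_rp_filter() {
    iface=$1
    conf=${2:-/proc/sys}/net/ipv4/conf
    all=$(cat "$conf/all/rp_filter") || return 1
    own=$(cat "$conf/$iface/rp_filter") || return 1
    if [ "$all" -gt "$own" ]; then mode=$all; else mode=$own; fi
    case $mode in
        0) echo "0 (no source validation)" ;;
        1) echo "1 (strict mode)" ;;
        2) echo "2 (loose mode)" ;;
        *) echo "$mode (unknown)" ;;
    esac
}
```

With this rule, setting conf/all/rp_filter to 2 puts an interface in loose mode even when the interface's own value is 1.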
- arp_filter
Type | Default |
---|---|
Boolean | 0 |

Value | Effect |
---|---|
0 | (default) The kernel can respond to arp requests with addresses from other interfaces. It usually makes sense, because it increases the chance of successful communication. |
1 | Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP’d IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of cards (usually 1) that respond to an arp request. |
Notes
- IP addresses are owned by the complete host on Linux, not by particular interfaces. This behavior causes problems only in more complex setups, such as load balancing.
- arp_filter for the interface is enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE
- arp_announce
Define different restriction levels for announcing the local source IP address from IP packets in ARP requests sent on interface
Type | Default |
---|---|
Integer | 0 |

Value | Effect |
---|---|
0 | (default) Use any local address, configured on any interface |
1 | Try to avoid local addresses that are not in the target’s subnet for this interface. This mode is useful when target hosts reachable via this interface require the source IP address in ARP requests to be part of their logical network configured on the receiving interface. When we generate the request we check all our subnets that include the target IP and preserve the source address if it is from such subnet. If there is no such subnet we select source address according to the rules for level 2. |
2 | Always use the best local address for this target. In this mode we ignore the source address in the IP packet and try to select local address that we prefer for talks with the target host. Such local address is selected by looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP address. If no suitable local address is found we select the first local address we have on the outgoing interface or on all other interfaces, with the hope we receive reply for our request and even sometimes no matter the source IP address we announce. |
Notes
- The highest value from conf/{all,interface}/arp_announce is used.
- Increasing the restriction level gives more chance for receiving answer from the resolved target while decreasing the level announces more valid sender’s information.
- arp_ignore
Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses
Type | Default |
---|---|
Integer | 0 |

Value | Effect |
---|---|
0 | (default) reply for any local target IP address, configured on any interface |
1 | reply only if the target IP address is a local address configured on the incoming interface |
2 | reply only if the target IP address is a local address configured on the incoming interface and both it and the sender’s IP address are part of the same subnet on this interface |
3 | do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied |
4-7 | reserved |
8 | do not reply for all local addresses |
Notes
- The max value from conf/{all,interface}/arp_ignore is used when ARP request is received on the {interface}
- arp_notify
Define mode for notification of address and device changes.
Type | Default |
---|---|
Boolean | 0 |

Value | Effect |
---|---|
0 | do nothing |
1 | Generate gratuitous arp requests when device is brought up or hardware address changes. |
- arp_accept
Define behavior for gratuitous ARP frames whose IP is not already present in the ARP table
Type | Default |
---|---|
Boolean | 0 |

Value | Effect |
---|---|
0 | do not create new entries in the ARP table |
1 | create new entries in the ARP table. |
Notes
- Both reply-type and request-type gratuitous ARP frames trigger an update of the ARP table if this setting is on. If the ARP table already contains the IP address of the gratuitous ARP frame, the ARP table is updated regardless of whether this setting is on or off.
- app_solicit
The maximum number of probes to send to the user space ARP daemon via netlink before dropping back to multicast probes (see mcast_solicit).
Type | Default |
---|---|
Integer | 0 |
Notes
See mcast_solicit
- disable_policy
Disable IPSEC policy (SPD) for this interface
Type | Default |
---|---|
Boolean | 0 |
- disable_xfrm
Disable IPSEC encryption on this interface, whatever the policy
Type | Default |
---|---|
Boolean | 0 |
- igmpv2_unsolicited_report_interval
The interval in milliseconds in which the next unsolicited IGMPv1 or IGMPv2 report retransmit takes place.
Type | Default |
---|---|
Integer | 10000 |
Notes
Milliseconds
- igmpv3_unsolicited_report_interval
The interval in milliseconds in which the next unsolicited IGMPv3 report retransmit takes place.
Type | Default |
---|---|
Integer | 1000 |
Notes
Milliseconds
- tag
Allows you to write a number, which can be used as required.
Type | Default |
---|---|
Integer | 0 |
- xfrm4_gc_thresh
The threshold at which we start garbage collecting for IPv4 destination cache entries.
Type | Default |
---|---|
Integer | 1 |
Notes
At twice this value the system refuses new allocations.
2.3.2. Global kernel tunables
System administrators are able to configure and monitor general settings on a running system through the global kernel tunables.
Global kernel tunables are included in the /proc/sys/kernel/ directory either directly as named control files or grouped in further subdirectories for various configuration topics. To adjust the global kernel tunables, system administrators need to modify the control files.
Descriptions of the following parameters have been adopted from the kernel documentation sites.[2]
- dmesg_restrict
Indicates whether unprivileged users are prevented from using the dmesg command to view messages from the kernel’s log buffer.
For further information, see Kernel sysctl documentation.
- core_pattern
Specifies a core dumpfile pattern name.
Max length | Default |
---|---|
128 characters | "core" |
For further information, see Kernel sysctl documentation.
- hardlockup_panic
Controls the kernel panic when a hard lockup is detected.
Type | Value | Effect |
---|---|---|
Integer | 0 | kernel does not panic on hard lockup |
Integer | 1 | kernel panics on hard lockup |
In order to panic, the system needs to detect a hard lockup first. The detection is controlled by the nmi_watchdog parameter.
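Because hardlockup_panic only has an effect when hard lockups are detected, the two tunables are best checked together. The following is a small sketch of such a check. The helper name hard_lockup_panic_status and the optional root argument are hypothetical, and the messages are illustrative wording rather than kernel output.

```shell
#!/bin/sh
# Hypothetical helper: summarize whether a hard lockup leads to a panic,
# taking the nmi_watchdog setting into account.
# Usage: hard_lockup_panic_status [root]
hard_lockup_panic_status() {
    dir=${1:-/proc/sys}/kernel
    panic=$(cat "$dir/hardlockup_panic") || return 1
    nmi=$(cat "$dir/nmi_watchdog") || return 1
    if [ "$panic" -eq 0 ]; then
        echo "no panic on hard lockup"
    elif [ "$nmi" -eq 0 ]; then
        # The panic setting is on, but nothing detects hard lockups.
        echo "panic requested, but the hard lockup detector is disabled"
    else
        echo "panic on hard lockup"
    fi
}
```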
- softlockup_panic
Controls the kernel panic when a soft lockup is detected.
Type | Value | Effect |
---|---|---|
Integer | 0 | kernel does not panic on soft lockup |
Integer | 1 | kernel panics on soft lockup |
By default, on RHEL7 this value is 0.
For more information about softlockup_panic, see kernel_parameters.
- kptr_restrict
Indicates whether restrictions are placed on exposing kernel addresses via /proc and other interfaces.
Type | Default |
---|---|
Integer | 0 |

Value | Effect |
---|---|
0 | hashes the kernel address before printing |
1 | replaces printed kernel pointers with 0’s under certain conditions |
2 | replaces printed kernel pointers with 0’s unconditionally |
To learn more, see Kernel sysctl documentation.
- nmi_watchdog
Controls the hard lockup detector on x86 systems.
Type | Default |
---|---|
Integer | 0 |

Value | Effect |
---|---|
0 | disables the lockup detector |
1 | enables the lockup detector |
The hard lockup detector monitors each CPU for its ability to respond to interrupts.
For more details, see Kernel sysctl documentation.
- watchdog_thresh
Controls frequency of watchdog hrtimer, NMI events, and soft/hard lockup thresholds.
Default threshold | Soft lockup threshold |
---|---|
10 seconds | 2 * watchdog_thresh |
Setting this tunable to zero disables lockup detection altogether.
For more info, consult Kernel sysctl documentation.
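The soft lockup threshold is derived from watchdog_thresh rather than stored separately. As a worked example, the hypothetical helper below reads watchdog_thresh and prints the resulting soft lockup threshold, or a message when the value is zero; the optional root argument is an assumption for trying it against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: print the soft lockup threshold implied by
# watchdog_thresh (2 * watchdog_thresh), or note that detection is off.
# Usage: soft_lockup_threshold [root]
soft_lockup_threshold() {
    thresh=$(cat "${1:-/proc/sys}/kernel/watchdog_thresh") || return 1
    if [ "$thresh" -eq 0 ]; then
        echo "lockup detection disabled"
    else
        echo "$((2 * thresh)) seconds"
    fi
}
```

With the default watchdog_thresh of 10 seconds, the helper prints 20 seconds.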
- panic, panic_on_oops, panic_on_stackoverflow, panic_on_unrecovered_nmi, panic_on_warn, panic_on_rcu_stall, hung_task_panic
These tunables specify under what circumstances the kernel should panic.
To see more details about a group of panic parameters, see Kernel sysctl documentation.
- printk, printk_delay, printk_ratelimit, printk_ratelimit_burst, printk_devkmsg
These tunables control logging or printing of kernel error messages.
For more details about a group of printk parameters, see Kernel sysctl documentation.
- shmall, shmmax, shm_rmid_forced
These tunables control limits for shared memory.
For more information about a group of shm parameters, see Kernel sysctl documentation.
- threads-max
Controls the maximum number of threads created by the fork() system call.
Min value | Max value |
---|---|
20 | Given by FUTEX_TID_MASK (0x3fffffff) |
The threads-max value is checked against the available RAM pages. If the thread structures occupy too much of the available RAM pages, threads-max is reduced accordingly.
For more details, see Kernel sysctl documentation.
- pid_max
PID allocation wrap value.
To see more information, refer to Kernel sysctl documentation.
- numa_balancing
This parameter enables or disables automatic NUMA memory balancing. On NUMA machines, there is a performance penalty if remote memory is accessed by a CPU.
For more details, see Kernel sysctl documentation.
- numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
These tunables detect if pages are properly placed or if the data should be migrated to a memory node local to where the task is running.
For more details about a group of numa_balancing_scan parameters, see Kernel sysctl documentation.