Chapter 24. Using systemd to manage resources used by applications
RHEL 9 moves the resource management settings from the process level to the application level by binding the system of cgroup hierarchies with the systemd unit tree. Therefore, you can manage the system resources with the systemctl command, or by modifying the systemd unit files.
To achieve this, systemd takes various configuration options from the unit files or directly via the systemctl command. Then systemd applies those options to specific process groups by using Linux kernel system calls and features like cgroups and namespaces.
You can review the full set of configuration options for systemd in the following manual pages:
- systemd.resource-control(5)
- systemd.exec(5)
24.1. Role of systemd in resource management
The core function of systemd is service management and supervision. The systemd system and service manager:
- ensures that managed services start at the right time and in the correct order during the boot process.
- ensures that managed services run smoothly to use the underlying hardware platform optimally.
- provides capabilities to define resource management policies.
- provides capabilities to tune various options, which can improve the performance of the service.
In general, Red Hat recommends that you use systemd for controlling the usage of system resources. You should manually configure the cgroups virtual file system only in special cases, for example, when you need to use cgroup-v1 controllers that have no equivalents in the cgroup-v2 hierarchy.
24.2. Distribution models of system resources
To modify the distribution of system resources, you can apply one or more of the following distribution models:
- Weights
You can distribute the resource by adding up the weights of all sub-groups and giving each sub-group a fraction of the resource that matches its ratio against the sum.
For example, if you have 10 cgroups, each with a weight of 100, the sum is 1000, and each cgroup receives one tenth of the resource.
Weights are usually used to distribute stateless resources. For example, the CPUWeight= option is an implementation of this resource distribution model.
- Limits
A cgroup can consume up to the configured amount of a resource. The sum of sub-group limits can exceed the limit of the parent cgroup. Therefore, it is possible to overcommit resources in this model.
For example, the MemoryMax= option is an implementation of this resource distribution model.
- Protections
You can set up a protected amount of a resource for a cgroup. If the resource usage is below the protection boundary, the kernel tries not to penalize this cgroup in favor of other cgroups that compete for the same resource. An overcommit is also possible.
For example, the MemoryLow= option is an implementation of this resource distribution model.
- Allocations
Exclusive allocations of an absolute amount of a finite resource. An overcommit is not possible. An example of this resource type in Linux is the real-time budget.
- unit file option
A setting for resource control configuration.
For example, you can configure CPU resources with options like CPUAccounting= or CPUQuota=. Similarly, you can configure memory or I/O resources with options like AllowedMemoryNodes= and IOAccounting=.
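For illustration, the following drop-in sketch combines the weight, limit, and protection models for one service. The unit name example.service, the drop-in path, and the concrete values are assumptions, not recommendations:
# /etc/systemd/system/example.service.d/resources.conf (hypothetical)
[Service]
# Weight model: relative CPU share compared to sibling cgroups.
CPUWeight=200
# Limit model: hard ceiling that the processes cannot exceed.
MemoryMax=1G
# Protection model: memory below this amount is reclaimed last.
MemoryLow=256M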
24.3. Allocating system resources using systemd
Allocating system resources by using systemd involves creating and managing systemd services and units. You can configure units to start, stop, or restart at specific times or in response to certain system events.
Procedure
To change the required value of a unit file option of your service, adjust the value in the unit file, or use the systemctl command:
Check the assigned values for the service of your choice:
# systemctl show --property <unit file option> <service name>
Set the required value of the unit file option:
# systemctl set-property <service name> <unit file option>=<value>
Verification
Check the newly assigned values for the service of your choice.
# systemctl show --property <unit file option> <service name>
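For example, a minimal sketch that assumes a hypothetical example.service and uses the MemoryMax= option; the printed values depend on your system:
# systemctl show --property MemoryMax example.service
MemoryMax=infinity
# systemctl set-property example.service MemoryMax=1G
# systemctl show --property MemoryMax example.service
MemoryMax=1073741824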
Additional resources
- systemd.resource-control(5) and systemd.exec(5) man pages on your system
24.4. Overview of systemd hierarchy for cgroups
On the back end, the systemd system and service manager uses the slice, the scope, and the service units to organize and structure processes in the control groups. You can further modify this hierarchy by creating custom unit files or by using the systemctl command. Also, systemd automatically mounts hierarchies for important kernel resource controllers in the /sys/fs/cgroup/ directory.
For resource control, you can use the following three systemd unit types:
- Service
A process or a group of processes that systemd started according to a unit configuration file. Services encapsulate the specified processes so that they can be started and stopped as one set. Services are named in the following way:
<name>.service
- Scope
A group of externally created processes. Scopes encapsulate processes that are started and stopped by arbitrary processes through the fork() function and then registered by systemd at runtime. For example, user sessions, containers, and virtual machines are treated as scopes. Scopes are named as follows:
<name>.scope
- Slice
A group of hierarchically organized units. Slices organize a hierarchy in which scopes and services are placed. The actual processes are contained in scopes or in services. Every name of a slice unit corresponds to the path to a location in the hierarchy. The dash (-) character acts as a separator of the path components to a slice from the -.slice root slice. In the <parent-name>.slice example, parent-name.slice is a sub-slice of parent.slice, which is a sub-slice of the -.slice root slice. parent-name.slice can have its own sub-slice named parent-name-name2.slice, and so on.
The service, the scope, and the slice units directly map to objects in the control group hierarchy. When these units are activated, they map directly to control group paths built from the unit names.
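As a sketch of this mapping on the unified hierarchy mounted at /sys/fs/cgroup/, using the hypothetical unit names above:
parent.slice       /sys/fs/cgroup/parent.slice/
parent-name.slice  /sys/fs/cgroup/parent.slice/parent-name.slice/
example.service    /sys/fs/cgroup/parent.slice/parent-name.slice/example.service/ (assuming the unit sets Slice=parent-name.slice)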
The following is an abbreviated example of a control group hierarchy:
Control group /:
-.slice
├─user.slice
│ ├─user-42.slice
│ │ ├─session-c1.scope
│ │ │ ├─ 967 gdm-session-worker [pam/gdm-launch-environment]
│ │ │ ├─1035 /usr/libexec/gdm-x-session gnome-session --autostart /usr/share/gdm/greeter/autostart
│ │ │ ├─1054 /usr/libexec/Xorg vt1 -displayfd 3 -auth /run/user/42/gdm/Xauthority -background none -noreset -keeptty -verbose 3
│ │ │ ├─1212 /usr/libexec/gnome-session-binary --autostart /usr/share/gdm/greeter/autostart
│ │ │ ├─1369 /usr/bin/gnome-shell
│ │ │ ├─1732 ibus-daemon --xim --panel disable
│ │ │ ├─1752 /usr/libexec/ibus-dconf
│ │ │ ├─1762 /usr/libexec/ibus-x11 --kill-daemon
│ │ │ ├─1912 /usr/libexec/gsd-xsettings
│ │ │ ├─1917 /usr/libexec/gsd-a11y-settings
│ │ │ ├─1920 /usr/libexec/gsd-clipboard
…
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
└─system.slice
  ├─rngd.service
  │ └─800 /sbin/rngd -f
  ├─systemd-udevd.service
  │ └─659 /usr/lib/systemd/systemd-udevd
  ├─chronyd.service
  │ └─823 /usr/sbin/chronyd
  ├─auditd.service
  │ ├─761 /sbin/auditd
  │ └─763 /usr/sbin/sedispatch
  ├─accounts-daemon.service
  │ └─876 /usr/libexec/accounts-daemon
  ├─example.service
  │ ├─ 929 /bin/bash /home/jdoe/example.sh
  │ └─4902 sleep 1
  …
The example above shows that services and scopes contain processes and are placed in slices that do not contain processes of their own.
Additional resources
- Managing system services with systemctl in Red Hat Enterprise Linux
- What are kernel resource controllers
- The systemd.resource-control(5), systemd.exec(5), cgroups(7), and fork(2) manual pages
- Understanding cgroups
24.5. Listing systemd units
Use the systemd system and service manager to list its units.
Procedure
List all active units on the system with the systemctl utility. The terminal returns an output similar to the following example:
# systemctl
UNIT                                          LOAD   ACTIVE SUB     DESCRIPTION
…
init.scope                                    loaded active running System and Service Manager
session-2.scope                               loaded active running Session 2 of user jdoe
abrt-ccpp.service                             loaded active exited  Install ABRT coredump hook
abrt-oops.service                             loaded active running ABRT kernel log watcher
abrt-vmcore.service                           loaded active exited  Harvest vmcores for ABRT
abrt-xorg.service                             loaded active running ABRT Xorg log watcher
…
-.slice                                       loaded active active  Root Slice
machine.slice                                 loaded active active  Virtual Machine and Container Slice
system-getty.slice                            loaded active active  system-getty.slice
system-lvm2\x2dpvscan.slice                   loaded active active  system-lvm2\x2dpvscan.slice
system-sshd\x2dkeygen.slice                   loaded active active  system-sshd\x2dkeygen.slice
system-systemd\x2dhibernate\x2dresume.slice   loaded active active  system-systemd\x2dhibernate\x2dresume>
system-user\x2druntime\x2ddir.slice           loaded active active  system-user\x2druntime\x2ddir.slice
system.slice                                  loaded active active  System Slice
user-1000.slice                               loaded active active  User Slice of UID 1000
user-42.slice                                 loaded active active  User Slice of UID 42
user.slice                                    loaded active active  User and Session Slice
…
UNIT
- A name of a unit that also reflects the unit position in a control group hierarchy. The units relevant for resource control are slices, scopes, and services.
LOAD
- Indicates whether the unit configuration file was properly loaded. If the unit file failed to load, the field provides the state error instead of loaded. Other unit load states are: stub, merged, and masked.
ACTIVE
- The high-level unit activation state, which is a generalization of SUB.
SUB
- The low-level unit activation state. The range of possible values depends on the unit type.
DESCRIPTION
- The description of the unit content and functionality.
List all active and inactive units:
# systemctl --all
Limit the amount of information in the output:
# systemctl --type service,masked
The --type option requires a comma-separated list of unit types, such as service and slice, or unit load states, such as loaded and masked.
Additional resources
- Managing system services with systemctl in RHEL
- The systemd.resource-control(5) and systemd.exec(5) manual pages
24.6. Viewing systemd cgroups hierarchy
Display the control groups (cgroups) hierarchy and the processes running in specific cgroups.
Procedure
Display the whole cgroups hierarchy on your system with the systemd-cgls command:
# systemd-cgls
Control group /:
-.slice
├─user.slice
│ ├─user-42.slice
│ │ ├─session-c1.scope
│ │ │ ├─ 965 gdm-session-worker [pam/gdm-launch-environment]
│ │ │ ├─1040 /usr/libexec/gdm-x-session gnome-session --autostart /usr/share/gdm/greeter/autostart
…
├─init.scope
│ └─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
└─system.slice
  …
  ├─example.service
  │ ├─6882 /bin/bash /home/jdoe/example.sh
  │ └─6902 sleep 1
  ├─systemd-journald.service
  │ └─629 /usr/lib/systemd/systemd-journald
…
The example output returns the entire cgroups hierarchy, where the highest level is formed by slices.
Display the cgroups hierarchy filtered by a resource controller with the systemd-cgls <resource_controller> command:
# systemd-cgls memory
Controller memory; Control group /:
├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 18
├─user.slice
│ ├─user-42.slice
│ │ ├─session-c1.scope
│ │ │ ├─ 965 gdm-session-worker [pam/gdm-launch-environment]
…
└─system.slice
  …
  ├─chronyd.service
  │ └─844 /usr/sbin/chronyd
  ├─example.service
  │ ├─8914 /bin/bash /home/jdoe/example.sh
  │ └─8916 sleep 1
…
The example output lists the services that interact with the selected controller.
Display detailed information about a certain unit and its part of the cgroups hierarchy with the systemctl status <system_unit> command:
# systemctl status example.service
● example.service - My example service
   Loaded: loaded (/usr/lib/systemd/system/example.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-04-16 12:12:39 CEST; 3s ago
 Main PID: 17737 (bash)
    Tasks: 2 (limit: 11522)
   Memory: 496.0K (limit: 1.5M)
   CGroup: /system.slice/example.service
           ├─17737 /bin/bash /home/jdoe/example.sh
           └─17743 sleep 1

Apr 16 12:12:39 redhat systemd[1]: Started My example service.
Apr 16 12:12:39 redhat bash[17737]: The current time is Tue Apr 16 12:12:39 CEST 2019
Apr 16 12:12:40 redhat bash[17737]: The current time is Tue Apr 16 12:12:40 CEST 2019
Additional resources
- systemd.resource-control(5) and cgroups(7) man pages on your system
24.7. Viewing cgroups of processes
You can learn which control group (cgroup) a process belongs to. Then you can check the cgroup to find which controllers and controller-specific configurations it uses.
Procedure
To view which cgroup a process belongs to, run the cat /proc/<PID>/cgroup command:
# cat /proc/2467/cgroup
0::/system.slice/example.service
The example output relates to a process of interest. In this case, it is a process identified by PID 2467, which belongs to the example.service unit. You can determine whether the process was placed in the correct control group as defined by the systemd unit file specifications.
To display which controllers and respective configuration files the cgroup uses, check the cgroup directory:
# cat /sys/fs/cgroup/system.slice/example.service/cgroup.controllers
memory pids
# ls /sys/fs/cgroup/system.slice/example.service/
cgroup.controllers  cgroup.events
…
cpu.pressure  cpu.stat  io.pressure
memory.current  memory.events
…
pids.current  pids.events  pids.max
The version 1 hierarchy of cgroups uses a per-controller model. Therefore, the output from the /proc/<PID>/cgroup file shows which cgroups under each controller the PID belongs to. You can find the respective cgroups under the controller directories at /sys/fs/cgroup/<controller_name>/.
Additional resources
- The cgroups(7) manual page
- What are kernel resource controllers
- Documentation in the /usr/share/doc/kernel-doc-<kernel_version>/Documentation/admin-guide/cgroup-v2.rst file (after installing the kernel-doc package)
24.8. Monitoring resource consumption
View a list of currently running control groups (cgroups) and their resource consumption in real time.
Procedure
Display a dynamic account of currently running cgroups with the systemd-cgtop command:
# systemd-cgtop
Control Group                                Tasks   %CPU   Memory  Input/s Output/s
/                                              607   29.8     1.5G        -        -
/system.slice                                  125      -   428.7M        -        -
/system.slice/ModemManager.service               3      -     8.6M        -        -
/system.slice/NetworkManager.service             3      -    12.8M        -        -
/system.slice/accounts-daemon.service            3      -     1.8M        -        -
/system.slice/boot.mount                         -      -    48.0K        -        -
/system.slice/chronyd.service                    1      -     2.0M        -        -
/system.slice/cockpit.socket                     -      -     1.3M        -        -
/system.slice/colord.service                     3      -     3.5M        -        -
/system.slice/crond.service                      1      -     1.8M        -        -
/system.slice/cups.service                       1      -     3.1M        -        -
/system.slice/dev-hugepages.mount                -      -   244.0K        -        -
/system.slice/dev-mapper-rhel\x2dswap.swap       -      -   912.0K        -        -
/system.slice/dev-mqueue.mount                   -      -    48.0K        -        -
/system.slice/example.service                    2      -     2.0M        -        -
/system.slice/firewalld.service                  2      -    28.8M        -        -
...
The example output displays currently running cgroups ordered by their resource usage (CPU, memory, disk I/O load). The list refreshes every second by default. Therefore, it offers a dynamic insight into the actual resource usage of each control group.
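You can also change the sort order or take a single snapshot; as a small sketch using options documented in the systemd-cgtop(1) manual page:
# systemd-cgtop --order=memory --iterations=1
The command sorts the control groups by memory consumption and exits after one iteration, which is convenient for capturing a snapshot in scripts.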
Additional resources
- The systemd-cgtop(1) manual page
24.9. Using systemd unit files to set limits for applications
The systemd service manager supervises each existing or running unit and creates control groups for them. The units have configuration files in the /usr/lib/systemd/system/ directory.
You can manually modify the unit files to:
- set limits.
- prioritize.
- control access to hardware resources for groups of processes.
Prerequisites
- You have root privileges.
Procedure
Edit the /usr/lib/systemd/system/example.service file to limit the memory usage of the service:
…
[Service]
MemoryMax=1500K
…
The configuration sets a limit on the maximum memory that the processes in the control group cannot exceed. The example.service service is part of such a control group, which has the imposed limitations. You can use the suffixes K, M, G, or T to identify kilobytes, megabytes, gigabytes, or terabytes as the unit of measurement.
Reload all unit configuration files:
# systemctl daemon-reload
Restart the service:
# systemctl restart example.service
Verification
Check that the changes took effect:
# cat /sys/fs/cgroup/system.slice/example.service/memory.max
1536000
The example output shows that the memory consumption was limited to around 1,500 KB.
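For context, a minimal sketch of what the example.service unit used throughout this chapter might look like; the description and script path match the earlier outputs, while the rest is an assumption:
[Unit]
Description=My example service

[Service]
ExecStart=/bin/bash /home/jdoe/example.sh
MemoryMax=1500K

[Install]
WantedBy=multi-user.target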
Additional resources
- Understanding cgroups
- Managing system services with systemctl in Red Hat Enterprise Linux
- systemd.resource-control(5), systemd.exec(5), and cgroups(7) man pages on your system
24.10. Using systemctl command to set limits to applications
CPU affinity settings help you restrict the access of a particular process to some CPUs. Effectively, the CPU scheduler never schedules the process to run on a CPU that is not in the affinity mask of the process.
The default CPU affinity mask applies to all services managed by systemd.
To configure a CPU affinity mask for a particular systemd service, systemd provides CPUAffinity= both as:
- a unit file option.
- a configuration option in the [Manager] section of the /etc/systemd/system.conf file.
The CPUAffinity= unit file option sets a list of CPUs or CPU ranges that are merged and used as the affinity mask.
Procedure
To set the CPU affinity mask for a particular systemd service by using the CPUAffinity= unit file option:
Check the values of the CPUAffinity= unit file option in the service of your choice:
$ systemctl show --property <CPU affinity configuration option> <service name>
As the root user, set the required value of the CPUAffinity= unit file option for the CPU ranges used as the affinity mask:
# systemctl set-property <service name> CPUAffinity=<value>
Restart the service to apply the changes:
# systemctl restart <service name>
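As a minimal sketch, pinning a hypothetical example.service to the first four CPUs; the service name and the CPU range are assumptions:
# systemctl set-property example.service CPUAffinity=0-3
# systemctl restart example.service
# systemctl show --property CPUAffinity example.service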
Additional resources
- systemd.resource-control(5), systemd.exec(5), and cgroups(7) man pages on your system
24.11. Setting global default CPU affinity through manager configuration
The CPUAffinity= option in the /etc/systemd/system.conf file defines an affinity mask for the process identification number (PID) 1 and all processes forked off of PID 1. You can then override the CPUAffinity= setting on a per-service basis.
To set the default CPU affinity mask for all systemd services by using the /etc/systemd/system.conf file:
- Set the CPU numbers for the CPUAffinity= option in the [Manager] section of the /etc/systemd/system.conf file.
- Save the edited file and reload the systemd service:
# systemctl daemon-reload
- Reboot the server to apply the changes.
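A sketch of the relevant part of the /etc/systemd/system.conf file, assuming you want to restrict PID 1 and all forked services to CPUs 0 and 1:
[Manager]
# Hypothetical default affinity mask for PID 1 and all forked processes.
CPUAffinity=0 1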
Additional resources
- The systemd.resource-control(5) and systemd.exec(5) man pages
24.12. Configuring NUMA policies using systemd
Non-uniform memory access (NUMA) is a computer memory subsystem design, in which the memory access time depends on the physical memory location relative to the processor.
Memory close to the CPU has lower latency (local memory) than memory that is local for a different CPU (foreign memory) or is shared between a set of CPUs.
In terms of the Linux kernel, NUMA policy governs where (for example, on which NUMA nodes) the kernel allocates physical memory pages for the process.
systemd provides the NUMAPolicy= and NUMAMask= unit file options to control memory allocation policies for services.
Procedure
To set the NUMA memory policy through the NUMAPolicy= unit file option:
Check the values of the NUMAPolicy= unit file option in the service of your choice:
$ systemctl show --property <NUMA policy configuration option> <service name>
As the root user, set the required policy type of the NUMAPolicy= unit file option:
# systemctl set-property <service name> NUMAPolicy=<value>
Restart the service to apply the changes:
# systemctl restart <service name>
To set a global NUMAPolicy= setting by using the [Manager] configuration option:
- Search the /etc/systemd/system.conf file for the NUMAPolicy= option in the [Manager] section of the file.
- Edit the policy type and save the file.
- Reload the systemd configuration:
# systemctl daemon-reload
- Reboot the server.
When you configure a strict NUMA policy, for example bind, make sure that you also appropriately set the CPUAffinity= unit file option.
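For illustration, a hypothetical [Service] section that binds memory allocations to NUMA node 0 and pins the processes to CPUs assumed to be local to that node; the CPU numbering is hardware dependent:
[Service]
# Strict policy: allocate memory only from NUMA node 0.
NUMAPolicy=bind
NUMAMask=0
# Assumed CPUs local to node 0; verify with 'lscpu' or 'numactl --hardware'.
CPUAffinity=0-3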
Additional resources
- Using systemctl command to set limits to applications
- The systemd.resource-control(5), systemd.exec(5), and set_mempolicy(2) man pages
24.13. NUMA policy configuration options for systemd
systemd provides the following options to configure the NUMA policy:
NUMAPolicy=
Controls the NUMA memory policy of the executed processes. You can use these policy types:
- default
- preferred
- bind
- interleave
- local
NUMAMask=
Controls the NUMA node list that is associated with the selected NUMA policy.
Note that you do not have to specify the NUMAMask= option for the following policies:
- default
- local
For the preferred policy, the list specifies only a single NUMA node.
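As a small sketch, a hypothetical unit that prefers a single node but can fall back to other nodes:
[Service]
# Non-strict policy: prefer NUMA node 1, fall back elsewhere if needed.
NUMAPolicy=preferred
NUMAMask=1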
Additional resources
- systemd.resource-control(5), systemd.exec(5), and set_mempolicy(2) man pages on your system
24.14. Creating transient cgroups using systemd-run command
Transient cgroups set limits on resources consumed by a unit (service or scope) during its runtime.
Procedure
To create a transient control group, use the systemd-run command in the following format:
# systemd-run --unit=<name> --slice=<name>.slice <command>
This command creates and starts a transient service or a scope unit and runs a custom command in such a unit.
- The --unit=<name> option gives a name to the unit. If --unit is not specified, the name is generated automatically.
- The --slice=<name>.slice option makes your service or scope unit a member of a specified slice. Replace <name>.slice with the name of an existing slice (as shown in the output of systemctl -t slice), or create a new slice by passing a unique name. By default, services and scopes are created as members of the system.slice.
- Replace <command> with the command you want to run in the service or the scope unit.
A message similar to the following is displayed to confirm that you created and started the service or the scope successfully:
# Running as unit <name>.service
Optional: Keep the unit running after its processes have finished to collect runtime information:
# systemd-run --unit=<name> --slice=<name>.slice --remain-after-exit <command>
The command creates and starts a transient service unit and runs a custom command in the unit. The --remain-after-exit option ensures that the service keeps running after its processes have finished.
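As a minimal sketch, assuming a hypothetical unit name toptest and an existing test.slice, and running the top utility in batch mode:
# systemd-run --unit=toptest --slice=test.slice --remain-after-exit top -b
Running as unit toptest.service
# systemctl status toptest.service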
Additional resources
- The systemd-run(1) manual page
24.15. Removing transient control groups
You can use the systemd system and service manager to remove transient control groups (cgroups) if you no longer need to limit, prioritize, or control access to hardware resources for groups of processes.
Transient cgroups are automatically released when all the processes that a service or a scope unit contains finish.
Procedure
To stop the service unit with all its processes, enter:
# systemctl stop <name>.service
To terminate one or more of the unit processes, enter:
# systemctl kill <name>.service --kill-who=PID,… --signal=<signal>
The command uses the --kill-who option to select the process(es) from the control group that you want to terminate. To kill multiple processes at the same time, pass a comma-separated list of PIDs. The --signal option determines the type of POSIX signal to be sent to the specified processes. The default signal is SIGTERM.
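For example, continuing the hypothetical toptest.service from the previous section, stop it cleanly or, if it does not terminate, send SIGKILL to all of its processes:
# systemctl stop toptest.service
# systemctl kill toptest.service --signal=SIGKILL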
Additional resources
- What are control groups
- What are kernel resource controllers
- systemd.resource-control(5) and cgroups(7) man pages on your system
- Understanding control groups
- Managing systemd in RHEL