Chapter 22. Allocating Node Resources

22.1. Purpose for Allocating Node Resources

To provide more reliable scheduling and minimize node resource overcommitment, reserve a portion of the CPU and memory resources for use by the underlying node components such as kubelet, kube-proxy, and the container engine. The resources that you reserve are also used by the remaining system components such as sshd, NetworkManager, and so on. Specifying the resources to reserve provides the scheduler with more information about the remaining memory and CPU resources that a node has available for use by pods.

22.2. Configuring Nodes for Allocated Resources

Resources are reserved for node components and system components in OpenShift Container Platform by configuring the system-reserved node setting.

OpenShift Container Platform does not use the kube-reserved setting. Documentation for Kubernetes and some cloud vendors that provide a Kubernetes environment might suggest configuring kube-reserved. That information does not apply to an OpenShift Container Platform cluster.

Use caution when you tune your cluster with resource limits and enforce those limits with evictions. Enforcing system-reserved limits can prevent critical system services from receiving CPU time, or can cause those services to be terminated when memory resources run low.

In most cases, tuning resource allocation is performed by making an adjustment and then monitoring the cluster performance with a production-like workload. That process is repeated until the cluster is stable and meets service-level agreements.

For more information on the effects of these settings, see Computing Allocated Resources.

Setting	Description
`kube-reserved`	This setting is not used with OpenShift Container Platform. Add the CPU and memory resources that you planned to reserve to the `system-reserved` setting.
`system-reserved`	Resources that are reserved for the node components and system components. Default is none.

View the services that are controlled by system-reserved with a tool such as lscgroup by running the following commands:

# yum install libcgroup-tools

$ lscgroup memory:/system.slice

Reserve resources in the kubeletArguments section of the node configuration map by adding a set of <resource_type>=<resource_quantity> pairs. For example, cpu=500m,memory=1Gi reserves 500 millicores of CPU and one gigabyte of memory.

Example 22.1. Node-Allocatable Resources Settings

kubeletArguments:
  system-reserved:
    - "cpu=500m,memory=1Gi"

Add the system-reserved field if it does not exist.

Note

Do not edit the node-config.yaml file directly.
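
The <resource_type>=<resource_quantity> format described above can be checked before a value is placed in the configuration map. The following is a minimal sketch, not part of OpenShift Container Platform; the function name parse_reserved is hypothetical, and the sketch only splits the string into pairs without interpreting the quantities.

```python
def parse_reserved(pairs):
    """Split a string such as 'cpu=500m,memory=1Gi' into a dict of
    resource type -> quantity string. Raises ValueError on a malformed pair."""
    reserved = {}
    for pair in pairs.split(","):
        resource_type, sep, quantity = pair.strip().partition("=")
        if not sep or not resource_type or not quantity:
            raise ValueError(
                "expected <resource_type>=<resource_quantity>, got %r" % pair
            )
        reserved[resource_type] = quantity
    return reserved


if __name__ == "__main__":
    print(parse_reserved("cpu=500m,memory=1Gi"))
```

A value with a missing quantity, such as "cpu=", raises ValueError instead of being silently accepted.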

To determine appropriate values for these settings, view the resource usage of a node by using the node summary API. For more information, see System Resources Reported by Node.

After you set system-reserved:

Monitor the memory usage of a node for high-water marks:

$ ps aux | grep <service-name>

For example:

$ ps aux | grep atomic-openshift-node

USER       PID   %CPU  %MEM  VSZ     RSS  TTY    STAT  START  TIME  COMMAND
root       11089 11.5  0.3   112712  996  pts/1  R+    16:23  0:00  grep --color=auto atomic-openshift-node

If this value is close to your system-reserved mark, you can increase the system-reserved value.

Monitor the memory usage of system services with a tool such as cgget by running the following commands:

```
# yum install libcgroup-tools
```

```
$ cgget -g memory /system.slice | grep memory.usage_in_bytes
```

If this value is close to your system-reserved mark, you can increase the system-reserved value.

Use the OpenShift Container Platform cluster loader to measure performance metrics of your deployment at various cluster states.
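
The comparison between memory.usage_in_bytes and the system-reserved mark can be scripted. The sketch below assumes that the cgget output line has the form "memory.usage_in_bytes: <bytes>"; the function name usage_fraction and the sample usage value are hypothetical and serve only as an illustration.

```python
def usage_fraction(cgget_line, reserved_bytes):
    """Return memory usage as a fraction of the reserved amount.

    cgget_line is assumed to look like 'memory.usage_in_bytes: 912261120'.
    """
    name, _, value = cgget_line.partition(":")
    if name.strip() != "memory.usage_in_bytes":
        raise ValueError("unexpected line: %r" % cgget_line)
    return int(value.strip()) / reserved_bytes


# memory=1Gi, as in the example system-reserved setting
RESERVED_BYTES = 1 * 1024 ** 3

if __name__ == "__main__":
    sample = "memory.usage_in_bytes: 912261120"  # hypothetical reading (870 MiB)
    fraction = usage_fraction(sample, RESERVED_BYTES)
    print("{:.0%} of system-reserved memory in use".format(fraction))
```

What counts as "close" to the mark is a judgment call for your cluster; the sketch reports the fraction and leaves the threshold to you.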

22.3. Computing Allocated Resources

An allocated amount of a resource is computed based on the following formula:

[Allocatable] = [Node Capacity] - [system-reserved] - [Hard-Eviction-Thresholds]

Note

The withholding of Hard-Eviction-Thresholds from allocatable improves system reliability because the value for allocatable is enforced for pods at the node level. The experimental-allocatable-ignore-eviction setting is available to preserve legacy behavior, but it will be deprecated in a future release.

If [Allocatable] is negative, it is set to 0.
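
As an illustration of the formula and the clamp to 0, the following sketch computes allocatable memory from values in a single unit. The function name and the sample figures (an 8192 MiB node, 1024 MiB reserved, a 100 MiB hard eviction threshold) are assumptions chosen for the example, not values taken from a real node.

```python
def allocatable(capacity, system_reserved, hard_eviction_threshold):
    """[Allocatable] = [Node Capacity] - [system-reserved]
    - [Hard-Eviction-Thresholds], set to 0 if the result is negative.
    All arguments must use the same unit."""
    return max(0, capacity - system_reserved - hard_eviction_threshold)


if __name__ == "__main__":
    # Hypothetical node, all values in MiB.
    print(allocatable(8192, 1024, 100))  # 7068
    # Reservations larger than capacity yield 0, not a negative value.
    print(allocatable(512, 1024, 100))   # 0
```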

22.4. Viewing Node-Allocatable Resources and Capacity

To view the current capacity and allocatable resources for a node, run the following command:

$ oc get node/<node_name> -o yaml

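In the following partial output, the allocatable values are less than the capacity. The difference is expected: the CPU gap of 500m matches a cpu=500m,memory=1Gi resource allocation for system-reserved, and the memory gap covers the 1Gi reservation plus the hard eviction threshold, as described in Computing Allocated Resources.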

status:
...
  allocatable:
    cpu: "3500m"
    memory: 6857952Ki
    pods: "110"
  capacity:
    cpu: "4"
    memory: 8010948Ki
    pods: "110"
...

The scheduler uses the values for allocatable to decide if a node is a candidate for pod scheduling.
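
To see how much is being withheld from pods, the capacity and allocatable values in the output above can be subtracted. The sketch below is a simplified illustration: the helper names are hypothetical, and the converters handle only the quantity suffixes that appear in this chapter (m for CPU; Ki, Mi, and Gi for memory), not the full set of Kubernetes quantity formats.

```python
def cpu_millicores(quantity):
    """Convert '3500m' or '4' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_kib(quantity):
    """Convert a quantity with a Ki, Mi, or Gi suffix to KiB."""
    factors = {"Ki": 1, "Mi": 1024, "Gi": 1024 ** 2}
    suffix = quantity[-2:]
    if suffix not in factors:
        raise ValueError("unsupported memory quantity: %r" % quantity)
    return int(quantity[:-2]) * factors[suffix]


if __name__ == "__main__":
    cpu_gap = cpu_millicores("4") - cpu_millicores("3500m")
    mem_gap = memory_kib("8010948Ki") - memory_kib("6857952Ki")
    print(cpu_gap)                      # 500 millicores
    print(mem_gap)                      # 1152996 KiB
    print(mem_gap - memory_kib("1Gi"))  # 104420 KiB beyond the 1Gi reservation
```

The memory left over after subtracting the 1Gi reservation is consistent with the formula in Computing Allocated Resources, which also withholds the hard eviction threshold.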

22.5. System Resources Reported by Node

Each node reports the system resources that are used by the container runtime and kubelet. To simplify configuring system-reserved, view the resource usage for the node by using the node summary API. The node summary is available at <master>/api/v1/nodes/<node>/proxy/stats/summary.

For instance, to access the resources from the cluster.node22 node, run the following command:

$ curl <certificate details> https://<master>/api/v1/nodes/cluster.node22/proxy/stats/summary

The response includes information that is similar to the following:

{
    "node": {
        "nodeName": "cluster.node22",
        "systemContainers": [
            {
                "cpu": {
                    "usageCoreNanoSeconds": 929684480915,
                    "usageNanoCores": 190998084
                },
                "memory": {
                    "rssBytes": 176726016,
                    "usageBytes": 1397895168,
                    "workingSetBytes": 1050509312
                },
                "name": "kubelet"
            },
            {
                "cpu": {
                    "usageCoreNanoSeconds": 128521955903,
                    "usageNanoCores": 5928600
                },
                "memory": {
                    "rssBytes": 35958784,
                    "usageBytes": 129671168,
                    "workingSetBytes": 102416384
                },
                "name": "runtime"
            }
        ]
    }
}

See REST API Overview for more information about the certificate details.
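
When sizing system-reserved, it can help to total the usage that the summary reports for the system containers. The following sketch works on a saved copy of the response shown above, trimmed to the fields it reads; the function name system_usage is hypothetical, and summing the kubelet and runtime entries is only one possible starting point, not a documented sizing rule.

```python
import json

# Trimmed copy of the example response from the node summary API.
SUMMARY_JSON = """
{
  "node": {
    "nodeName": "cluster.node22",
    "systemContainers": [
      {"name": "kubelet",
       "cpu": {"usageNanoCores": 190998084},
       "memory": {"workingSetBytes": 1050509312}},
      {"name": "runtime",
       "cpu": {"usageNanoCores": 5928600},
       "memory": {"workingSetBytes": 102416384}}
    ]
  }
}
"""


def system_usage(summary):
    """Return (total usageNanoCores, total workingSetBytes) across
    the systemContainers entries of a node summary."""
    containers = summary["node"]["systemContainers"]
    cpu = sum(c["cpu"]["usageNanoCores"] for c in containers)
    memory = sum(c["memory"]["workingSetBytes"] for c in containers)
    return cpu, memory


if __name__ == "__main__":
    nano_cores, working_set = system_usage(json.loads(SUMMARY_JSON))
    # 1 millicore = 1,000,000 nanocores
    print("cpu: {:.0f}m".format(nano_cores / 1e6))
    print("memory: {} bytes".format(working_set))
```

The summary reflects usage at one point in time, so repeat the measurement under a production-like workload before settling on a value.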

22.6. Node Enforcement

The node is able to limit the total amount of resources that pods can consume based on the configured allocatable value. This feature significantly improves the reliability of the node by preventing pods from using CPU and memory resources that are needed by system services such as the container runtime and node agent. To improve node reliability, administrators should reserve resources based on a target for resource use.

The node enforces resource constraints using a new cgroup hierarchy that enforces quality of service. All pods are launched in a dedicated cgroup hierarchy that is separate from system daemons.

To configure node enforcement, use the following parameters in the appropriate node configuration map.

Example 22.2. Node Cgroup Settings

kubeletArguments:
  cgroups-per-qos:
    - "true" # 1
  cgroup-driver:
    - "systemd" # 2
  enforce-node-allocatable:
    - "pods" # 3

1: Enable or disable a cgroup hierarchy for each quality of service. The cgroups are managed by the node. Any change of this setting requires a full drain of the node. This flag must be true to enable the node to enforce the node-allocatable resource constraints. The default value is true and Red Hat does not recommend that customers change this value.
2: The cgroup driver that is used by the node to manage the cgroup hierarchies. This value must match the driver that is associated with the container runtime. Valid values are systemd and cgroupfs, but Red Hat supports systemd only.
3: A comma-delimited list of scopes for where the node should enforce node resource constraints. The default value is pods and Red Hat supports pods only.

Administrators should treat system daemons similarly to pods that have a guaranteed quality of service. System daemons can burst within their bounding control groups, and this behavior must be managed as part of cluster deployments. Reserve CPU and memory resources for system daemons by specifying the resources in system-reserved, as shown in the section Configuring Nodes for Allocated Resources.

To view the cgroup driver that is set, run the following command:

$ systemctl status atomic-openshift-node -l | grep cgroup-driver=

The output includes a response that is similar to the following:

--cgroup-driver=systemd

For more information on managing and troubleshooting cgroup drivers, see Introduction to Control Groups (Cgroups).
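
If the driver check is part of a script, the flag value can be extracted from the captured command output and compared with the supported value. This is a minimal sketch; the function name cgroup_driver is hypothetical, and the input is assumed to be text that contains the --cgroup-driver= flag as shown above.

```python
import re


def cgroup_driver(text):
    """Return the value of the --cgroup-driver flag found in text,
    or None if the flag is absent."""
    match = re.search(r"--cgroup-driver=(\S+)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    driver = cgroup_driver("--cgroup-driver=systemd")
    print(driver)
    if driver != "systemd":
        print("warning: only the systemd driver is supported")
```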

22.7. Eviction Thresholds

If a node is under memory pressure, the entire node and all pods running on it can be affected. If a system daemon uses more than its reserved amount of memory, an out-of-memory event can occur that affects the entire node and all pods running on it. To avoid or reduce the probability of system out-of-memory events, the node provides out-of-resource handling.
