OpenShift Container Platform 4.17: Scalability and performance
Chapter 12. Optimizing memory management for workloads by using huge pages

To optimize memory management for specific workloads, configure huge pages. By using these Linux-based system page sizes, you can maintain manual control over memory allocation and override automatic system behaviors.

12.1. What huge pages do

To optimize memory mapping efficiency, understand the function of huge pages. Unlike standard 4Ki blocks, huge pages are larger memory segments that reduce the tracking load on the translation lookaside buffer (TLB) hardware cache.

Memory is managed in blocks known as pages. On most systems, a page is 4Ki; 1Mi of memory is equal to 256 pages; 1Gi of memory is 262,144 pages, and so on. CPUs have a built-in memory management unit that manages a list of these pages in hardware. The translation lookaside buffer (TLB) is a small hardware cache of virtual-to-physical page mappings. If the virtual address passed in a hardware instruction can be found in the TLB, the mapping can be determined quickly. If not, a TLB miss occurs, and the system falls back to slower, software-based address translation, resulting in performance issues. Because the size of the TLB is fixed, the only way to reduce the chance of a TLB miss is to increase the page size.

A huge page is a memory page that is larger than 4Ki. On x86_64 architectures, there are two common huge page sizes: 2Mi and 1Gi. Sizes vary on other architectures. To use huge pages, code must be written so that applications are aware of them. Transparent huge pages (THP) attempt to automate the management of huge pages without application knowledge, but they have limitations. In particular, they are limited to 2Mi page sizes. THP can also degrade performance on nodes with high memory utilization or fragmentation, because the defragmentation work that THP triggers can lock memory pages. For this reason, some applications are designed to use, or recommend using, pre-allocated huge pages instead of THP.
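
The effect of page size on the number of mappings can be checked with a little arithmetic. The following is a minimal sketch, not part of the product tooling; the pages_needed helper is a hypothetical name introduced only for illustration. It shows how many pages are needed to map the same amount of memory at each page size mentioned above.

```shell
# pages_needed BYTES PAGE_SIZE_BYTES
# Prints how many pages of the given size are needed to map BYTES,
# rounding up so that a partial page still counts as one page.
# (Hypothetical helper, for illustration only.)
pages_needed() {
  bytes=$1
  page=$2
  echo $(( (bytes + page - 1) / page ))
}

KI=1024
MI=$((1024 * KI))
GI=$((1024 * MI))

echo "1Mi with 4Ki pages: $(pages_needed "$MI" $((4 * KI)))"
echo "1Gi with 4Ki pages: $(pages_needed "$GI" $((4 * KI)))"
echo "1Gi with 2Mi pages: $(pages_needed "$GI" $((2 * MI)))"
echo "1Gi with 1Gi pages: $(pages_needed "$GI" "$GI")"
```

Mapping 1Gi takes 262144 pages at 4Ki, 512 pages at 2Mi, and a single page at 1Gi, which is why larger pages put less pressure on a fixed-size TLB.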

In OpenShift Container Platform, applications in a pod can allocate and consume pre-allocated huge pages.

12.2. How huge pages are consumed by apps

To enable applications to consume huge pages, nodes must pre-allocate these memory segments to report capacity. Because a node can only pre-allocate huge pages for a single size, you must align this configuration with your specific workload requirements.

Huge pages can be consumed through container-level resource requirements by using the resource name hugepages-<size>, where <size> is the most compact binary notation, using integer values, that is supported on a particular node. For example, if a node supports 2048 KiB page sizes, the node exposes a schedulable resource hugepages-2Mi. Unlike CPU or memory, huge pages do not support over-commitment.

apiVersion: v1
kind: Pod
metadata:
  generateName: hugepages-volume-
spec:
  containers:
  - securityContext:
      privileged: true
    image: rhel7:latest
    command:
    - sleep
    - inf
    name: example
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        hugepages-2Mi: 100Mi
        memory: "1Gi"
        cpu: "1"
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

spec.containers.resources.limits.hugepages-2Mi: Specifies the amount of memory for hugepages as the exact amount to be allocated.
Important
Do not convert this value into a number of pages or multiply it by the page size. For example, given a huge page size of 2Mi, if you want to use 100Mi of huge-page-backed RAM for your application, 50 huge pages are needed. OpenShift Container Platform handles the math for you, so, as in the preceding example, you specify 100Mi directly.
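
The naming rule and the page arithmetic described in this section can be sketched in a few lines of shell. This is an illustration under stated assumptions, not how the platform implements it: both function names are hypothetical, and treating a limit that is not a whole multiple of the page size as an error is an assumption of this sketch.

```shell
# hugepages_resource_name PAGE_SIZE_KIB
# Prints the resource name in the most compact binary notation that
# still uses an integer value. (Hypothetical helper.)
hugepages_resource_name() {
  kib=$1
  if [ $((kib % 1048576)) -eq 0 ]; then
    echo "hugepages-$((kib / 1048576))Gi"
  elif [ $((kib % 1024)) -eq 0 ]; then
    echo "hugepages-$((kib / 1024))Mi"
  else
    echo "hugepages-${kib}Ki"
  fi
}

# hugepages_count LIMIT_MIB PAGE_SIZE_MIB
# Prints how many huge pages back a limit given in Mi.
# Assumption: a limit that is not a whole multiple of the page size
# is rejected here. (Hypothetical helper.)
hugepages_count() {
  limit_mib=$1
  page_mib=$2
  if [ $((limit_mib % page_mib)) -ne 0 ]; then
    echo "limit must be a whole multiple of the page size" >&2
    return 1
  fi
  echo $((limit_mib / page_mib))
}

hugepages_resource_name 2048
hugepages_count 100 2
```

For a 2048 KiB page size the first call prints hugepages-2Mi, and the second call prints 50, the number of pages behind the 100Mi limit in the pod specification.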

12.2.1. Allocating huge pages of a specific size

Some platforms support multiple huge page sizes. To allocate huge pages of a specific size, precede the huge pages boot command parameters with a huge page size selection parameter hugepagesz=<size>. The <size> value must be specified in bytes with an optional scale suffix [kKmMgG]. The default huge page size can be defined with the default_hugepagesz=<size> boot parameter.
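
As an illustration of the ordering described above, the following sketch assembles the boot parameters so that each hugepagesz=<size> selection precedes the hugepages=<count> value it applies to. The hugepages_boot_args function is a hypothetical helper, and the sizes and counts are example values only.

```shell
# hugepages_boot_args SIZE COUNT [SIZE COUNT ...]
# Prints kernel boot parameters in which every hugepagesz=<size>
# comes directly before the hugepages=<count> it applies to.
# (Hypothetical helper, for illustration only.)
hugepages_boot_args() {
  args=""
  while [ "$#" -ge 2 ]; do
    args="${args}${args:+ }hugepagesz=$1 hugepages=$2"
    shift 2
  done
  echo "$args"
}

hugepages_boot_args 2M 50
hugepages_boot_args 1G 4 2M 512
```

The first call prints hugepagesz=2M hugepages=50. The second prints one size and count pair after another, which is the form to use on platforms that support more than one huge page size. A default_hugepagesz=<size> parameter can be added separately to set the default size.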

12.2.2. Huge page requirements

Huge page requests must equal the limits. This is the default if limits are specified, but requests are not.
Huge pages are isolated at a pod scope. Container isolation is planned in a future iteration.
EmptyDir volumes backed by huge pages must not consume more huge page memory than the pod request.
Applications that consume huge pages via shmget() with SHM_HUGETLB must run with a supplemental group that matches /proc/sys/vm/hugetlb_shm_group.
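
The supplemental group requirement can be checked from inside a container with standard tools. The following is a rough sketch: the in_group function is a hypothetical helper, and the check assumes that /proc/sys/vm/hugetlb_shm_group is readable from where the script runs. If the file is not readable, the script prints nothing.

```shell
# in_group TARGET_GID GID...
# Succeeds if TARGET_GID appears among the remaining arguments.
# (Hypothetical helper, for illustration only.)
in_group() {
  target=$1
  shift
  for gid in "$@"; do
    [ "$gid" = "$target" ] && return 0
  done
  return 1
}

shm_group_file=/proc/sys/vm/hugetlb_shm_group
if [ -r "$shm_group_file" ]; then
  # id -G lists every group ID of the current process, including
  # supplemental groups; it is left unquoted so each ID is one argument.
  if in_group "$(cat "$shm_group_file")" $(id -G); then
    echo "current process is in the hugetlb shm group"
  else
    echo "current process is NOT in the hugetlb shm group"
  fi
fi
```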

12.3. Consuming huge pages resources using the Downward API

To inject information about the huge pages resources consumed by a container, use the Downward API. This configuration enables applications to retrieve and use their own huge pages allocation data directly.

You can inject the resource allocation as environment variables, a volume plugin, or both. Applications that you develop and run in the container can determine the resources that are available by reading the environment variables or files in the specified volumes.

Procedure

Create a hugepages-volume-pod.yaml file that is similar to the following example:

apiVersion: v1
kind: Pod
metadata:
  generateName: hugepages-volume-
  labels:
    app: hugepages-example
spec:
  containers:
  - securityContext:
      capabilities:
        add: [ "IPC_LOCK" ]
    image: rhel7:latest
    command:
    - sleep
    - inf
    name: example
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    - mountPath: /etc/podinfo
      name: podinfo
    resources:
      limits:
        hugepages-1Gi: 2Gi
        memory: "1Gi"
        cpu: "1"
      requests:
        hugepages-1Gi: 2Gi
    env:
    - name: REQUESTS_HUGEPAGES_1GI
      valueFrom:
        resourceFieldRef:
          containerName: example
          resource: requests.hugepages-1Gi
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
  - name: podinfo
    downwardAPI:
      items:
        - path: "hugepages_1G_request"
          resourceFieldRef:
            containerName: example
            resource: requests.hugepages-1Gi
            divisor: 1Gi

where:

spec.containers.env.name: Reads the value of requests.hugepages-1Gi and exposes it as the REQUESTS_HUGEPAGES_1GI environment variable.
spec.volumes.downwardAPI.items.path: Reads the value of requests.hugepages-1Gi and exposes it as the file /etc/podinfo/hugepages_1G_request.

Create the pod from the hugepages-volume-pod.yaml file by entering the following command:
```
oc create -f hugepages-volume-pod.yaml
```

Verification

Check the value of the REQUESTS_HUGEPAGES_1GI environment variable:

oc exec -it $(oc get pods -l app=hugepages-example -o jsonpath='{.items[0].metadata.name}') \
     -- env | grep REQUESTS_HUGEPAGES_1GI

Example output

REQUESTS_HUGEPAGES_1GI=2147483648
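
The environment variable holds the request in bytes because no divisor is set for it. An application that wants the number of 1Gi pages can convert the value itself, as in the following sketch. The bytes_to_gib function is a hypothetical helper, and the fallback to the example value is only there so that the snippet also runs outside the pod.

```shell
# bytes_to_gib BYTES
# Converts a byte count to whole Gi (1Gi = 1024 * 1024 * 1024 bytes).
# (Hypothetical helper, for illustration only.)
bytes_to_gib() {
  echo $(( $1 / (1024 * 1024 * 1024) ))
}

# Use the injected variable when present; otherwise fall back to the
# example value shown above.
requested_bytes=${REQUESTS_HUGEPAGES_1GI:-2147483648}
echo "1Gi huge pages requested: $(bytes_to_gib "$requested_bytes")"
```

With the example value, the conversion yields 2, which matches the 2Gi request in the pod specification.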

Check the value of the /etc/podinfo/hugepages_1G_request file:

oc exec -it $(oc get pods -l app=hugepages-example -o jsonpath='{.items[0].metadata.name}') \
     -- cat /etc/podinfo/hugepages_1G_request

Example output

2

12.4. Configuring huge pages at boot time

To ensure nodes in your OpenShift Container Platform cluster pre-allocate memory for specific workloads, reserve huge pages at boot time. This configuration sets aside memory resources during system startup, offering a distinct alternative to run-time allocation.

There are two ways of reserving huge pages: at boot time and at run time. Reserving at boot time increases the possibility of success because the memory has not yet been significantly fragmented. The Node Tuning Operator currently supports boot-time allocation of huge pages on specific nodes.

Note

The TuneD boot-loader plugin only supports Red Hat Enterprise Linux CoreOS (RHCOS) compute nodes.

Procedure

Label all nodes that need the same huge pages setting by entering the following command:
```
oc label node <node_using_hugepages> node-role.kubernetes.io/worker-hp=
```

Create a file with the following content and name it hugepages-tuned-boottime.yaml:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: hugepages
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Boot time configuration for hugepages
      include=openshift-node
      [bootloader]
      cmdline_openshift_node_hugepages=hugepagesz=2M hugepages=50
    name: openshift-node-hugepages

  recommend:
  - machineConfigLabels:
      machineconfiguration.openshift.io/role: "worker-hp"
    priority: 30
    profile: openshift-node-hugepages
# ...

where:

metadata.name: Sets the name of the Tuned resource to hugepages.
spec.profile: Specifies the profile section to allocate huge pages.
spec.profile.data: Contains the boot parameters. Their order is important because some platforms support huge pages of various sizes.
spec.recommend.machineConfigLabels: Enables matching that is based on machine config pools.

Create the Tuned hugepages object by entering the following command:
```
oc create -f hugepages-tuned-boottime.yaml
```

Create a file with the following content and name it hugepages-mcp.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-hp
  labels:
    worker-hp: ""
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,worker-hp]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-hp: ""

Create the machine config pool by entering the following command:
```
oc create -f hugepages-mcp.yaml
```

Verification

To check that enough non-fragmented memory exists and that all the nodes in the worker-hp machine config pool now have 50 2Mi huge pages allocated, enter the following command:
```
oc get node <node_using_hugepages> -o jsonpath="{.status.allocatable.hugepages-2Mi}"
```
Example output
```
100Mi
```
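
The same figure can be cross-checked on the node itself, because 50 pages of 2Mi each add up to 100Mi. The following sketch assumes shell access to the node and reads text in the format of /proc/meminfo, which reports the page count and size for the default huge page size only. The hugepages_total_mib function is a hypothetical helper, and the sample input stands in for a real node.

```shell
# hugepages_total_mib
# Reads /proc/meminfo-formatted text on standard input and prints the
# total pre-allocated huge page memory in Mi.
# (Hypothetical helper, for illustration only.)
hugepages_total_mib() {
  awk '
    /^HugePages_Total:/ { pages = $2 }     # number of pre-allocated pages
    /^Hugepagesize:/    { size_kb = $2 }   # size of one page in kB
    END { printf "%dMi\n", pages * size_kb / 1024 }
  '
}

# On a node, the input would be: hugepages_total_mib < /proc/meminfo
# Sample input matching the configuration in this procedure:
printf 'HugePages_Total:      50\nHugepagesize:       2048 kB\n' | hugepages_total_mib
```

For the sample input, the function prints 100Mi.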

12.5. Disabling transparent huge pages

If your application can handle huge pages on its own, you can disable transparent huge pages (THP). Because THP cannot handle huge pages optimally for every type of workload, disabling it avoids the performance regressions that THP can cause.

Disabling THP stops the kernel from attempting to automate most aspects of creating, managing, and using huge pages. You can disable THP by using the Node Tuning Operator (NTO).

Procedure

Create a file with the following content and name it thp-disable-tuned.yaml:

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: thp-workers-profile
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
  - data: |
      [main]
      summary=Custom tuned profile for OpenShift to turn off THP on worker nodes
      include=openshift-node

      [vm]
      transparent_hugepages=never
    name: openshift-thp-never-worker

  recommend:
  - match:
    - label: node-role.kubernetes.io/worker
    priority: 25
    profile: openshift-thp-never-worker
# ...

Create the Tuned object by entering the following command:
```
oc create -f thp-disable-tuned.yaml
```
Check the list of active profiles by entering the following command:
```
oc get profile -n openshift-cluster-node-tuning-operator
```

Verification

Log in to one of the nodes and check the THP setting to verify that the node applied the profile successfully:
```
cat /sys/kernel/mm/transparent_hugepage/enabled
```
Example output
```
always madvise [never]
```
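
In this output, the active mode is the value in square brackets. For scripted checks, the bracketed value can be extracted as in the following sketch. The thp_active_mode function is a hypothetical helper, and the sample string stands in for the contents of the file on a node.

```shell
# thp_active_mode
# Reads the contents of the transparent_hugepage "enabled" file on
# standard input and prints the mode shown in square brackets.
# (Hypothetical helper, for illustration only.)
thp_active_mode() {
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# On a node, the input would be:
#   thp_active_mode < /sys/kernel/mm/transparent_hugepage/enabled
echo 'always madvise [never]' | thp_active_mode
```

For the sample string, the function prints never, which confirms that THP is disabled.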
