Chapter 2. Deploying OpenShift sandboxed containers on bare metal


You can deploy OpenShift sandboxed containers on an on-premise bare-metal cluster with Red Hat Enterprise Linux CoreOS (RHCOS) installed on the worker nodes.

Note
  • RHEL nodes are not supported.
  • Nested virtualization is not supported.

You can use any installation method including user-provisioned, installer-provisioned, or Assisted Installer to deploy your cluster.

You can also install OpenShift sandboxed containers on Amazon Web Services (AWS) bare-metal instances. Bare-metal instances offered by other cloud providers are not supported.

Cluster requirements

  • You have installed Red Hat OpenShift Container Platform 4.14 or later on the cluster where you are installing the OpenShift sandboxed containers Operator.
  • Your cluster has at least one worker node.

2.1. OpenShift sandboxed containers resource requirements

You must ensure that your cluster has sufficient resources.

OpenShift sandboxed containers lets users run workloads on their OpenShift Container Platform clusters inside a sandboxed runtime (Kata). Each pod is represented by a virtual machine (VM). Each VM runs in a QEMU process and hosts a kata-agent process that acts as a supervisor for managing container workloads, and the processes running in those containers. Two additional processes add more overhead:

  • containerd-shim-kata-v2 is used to communicate with the pod.
  • virtiofsd handles host file system access on behalf of the guest.

Each VM is configured with a default amount of memory. Additional memory is hot-plugged into the VM for containers that explicitly request memory.

A container running without a memory resource consumes free memory until the total memory used by the VM reaches the default allocation. The guest and its I/O buffers also consume memory.

If a container is given a specific amount of memory, then that memory is hot-plugged into the VM before the container starts.

When a memory limit is specified, the workload is terminated if it consumes more memory than the limit. If no memory limit is specified, the kernel running on the VM might run out of memory. If the kernel runs out of memory, it might terminate other processes on the VM.

Default memory sizes

The following table lists some the default values for resource allocation.

ResourceValue

Memory allocated by default to a virtual machine

2Gi

Guest Linux kernel memory usage at boot

~110Mi

Memory used by the QEMU process (excluding VM memory)

~30Mi

Memory used by the virtiofsd process (excluding VM I/O buffers)

~10Mi

Memory used by the containerd-shim-kata-v2 process

~20Mi

File buffer cache data after running dnf install on Fedora

~300Mi* [1]

File buffers appear and are accounted for in multiple locations:

  • In the guest where it appears as file buffer cache.
  • In the virtiofsd daemon that maps allowed user-space file I/O operations.
  • In the QEMU process as guest memory.
Note

Total memory usage is properly accounted for by the memory utilization metrics, which only count that memory once.

Pod overhead describes the amount of system resources that a pod on a node uses. You can get the current pod overhead for the Kata runtime by using oc describe runtimeclass kata as shown below.

Example

$ oc describe runtimeclass kata

Example output

kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
  name: kata
overhead:
  podFixed:
    memory: "500Mi"
    cpu: "500m"

You can change the pod overhead by changing the spec.overhead field for a RuntimeClass. For example, if the configuration that you run for your containers consumes more than 350Mi of memory for the QEMU process and guest kernel data, you can alter the RuntimeClass overhead to suit your needs.

Note

The specified default overhead values are supported by Red Hat. Changing default overhead values is not supported and can result in technical issues.

When performing any kind of file system I/O in the guest, file buffers are allocated in the guest kernel. The file buffers are also mapped in the QEMU process on the host, as well as in the virtiofsd process.

For example, if you use 300Mi of file buffer cache in the guest, both QEMU and virtiofsd appear to use 300Mi additional memory. However, the same memory is used in all three cases. Therefore, the total memory usage is only 300Mi, mapped in three different places. This is correctly accounted for when reporting the memory utilization metrics.

2.2. Deploying OpenShift sandboxed containers by using the web console

You can deploy OpenShift sandboxed containers on bare metal by using the OpenShift Container Platform web console to perform the following tasks:

  1. Install the OpenShift sandboxed containers Operator.
  2. Optional: Install the Node Feature Discovery (NFD) Operator to configure node eligibility checks. For more information, see node eligibility checks and the NFD Operator documentation.
  3. Create the KataConfig custom resource.
  4. Configure the OpenShift sandboxed containers workload objects.

2.2.1. Installing the OpenShift sandboxed containers Operator

You can install the OpenShift sandboxed containers Operator by using the OpenShift Container Platform web console.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. In the web console, navigate to Operators OperatorHub.
  2. In the Filter by keyword field, type OpenShift sandboxed containers.
  3. Select the OpenShift sandboxed containers Operator tile and click Install.
  4. On the Install Operator page, select stable from the list of available Update Channel options.
  5. Verify that Operator recommended Namespace is selected for Installed Namespace. This installs the Operator in the mandatory openshift-sandboxed-containers-operator namespace. If this namespace does not yet exist, it is automatically created.

    Note

    Attempting to install the OpenShift sandboxed containers Operator in a namespace other than openshift-sandboxed-containers-operator causes the installation to fail.

  6. Verify that Automatic is selected for Approval Strategy. Automatic is the default value, and enables automatic updates to OpenShift sandboxed containers when a new z-stream release is available.
  7. Click Install.
  8. Navigate to Operators Installed Operators to verify that the Operator is installed.

2.2.2. Creating the KataConfig custom resource

You must create the KataConfig custom resource (CR) to install kata as a RuntimeClass on your worker nodes.

The kata runtime class is installed on all worker nodes by default. If you want to install kata on specific nodes, you can add labels to those nodes and then define the label in the KataConfig CR.

OpenShift sandboxed containers installs kata as a secondary, optional runtime on the cluster and not as the primary runtime.

Important

Creating the KataConfig CR automatically reboots the worker nodes. The reboot can take from 10 to more than 60 minutes. The following factors might increase the reboot time:

  • A larger OpenShift Container Platform deployment with a greater number of worker nodes.
  • Activation of the BIOS and Diagnostics utility.
  • Deployment on a hard disk drive rather than an SSD.
  • Deployment on physical nodes such as bare metal, rather than on virtual nodes.
  • A slow CPU and network.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • Optional: You have installed the Node Feature Discovery Operator if you want to enable node eligibility checks.

Procedure

  1. In the OpenShift Container Platform web console, navigate to Operators Installed Operators.
  2. Select the OpenShift sandboxed containers Operator.
  3. On the KataConfig tab, click Create KataConfig.
  4. Enter the following details:

    • Name: Optional: The default name is example-kataconfig.
    • Labels: Optional: Enter any relevant, identifying attributes to the KataConfig resource. Each label represents a key-value pair.
    • checkNodeEligibility: Optional: Select to use the Node Feature Discovery Operator (NFD) to detect node eligibility.
    • kataConfigPoolSelector. Optional: To install kata on selected nodes, add a match expression for the labels on the selected nodes:

      1. Expand the kataConfigPoolSelector area.
      2. In the kataConfigPoolSelector area, expand matchExpressions. This is a list of label selector requirements.
      3. Click Add matchExpressions.
      4. In the Key field, enter the label key the selector applies to.
      5. In the Operator field, enter the key’s relationship to the label values. Valid operators are In, NotIn, Exists, and DoesNotExist.
      6. Expand the Values area and then click Add value.
      7. In the Value field, enter true or false for key label value.
    • logLevel: Define the level of log data retrieved for nodes with the kata runtime class.
  5. Click Create. The KataConfig CR is created and installs the kata runtime class on the worker nodes.

    Wait for the kata installation to complete and the worker nodes to reboot before verifying the installation.

Verification

  1. On the KataConfig tab, click the KataConfig CR to view its details.
  2. Click the YAML tab to view the status stanza.

    The status stanza contains the conditions and kataNodes keys. The value of status.kataNodes is an array of nodes, each of which lists nodes in a particular state of kata installation. A message appears each time there is an update.

  3. Click Reload to refresh the YAML.

    When all workers in the status.kataNodes array display the values installed and conditions.InProgress: False with no specified reason, the kata is installed on the cluster.

Additional resources

2.2.3. Configuring workload objects

You must configure OpenShift sandboxed containers workload objects by setting kata as the runtime class for the following pod-templated objects:

  • Pod objects
  • ReplicaSet objects
  • ReplicationController objects
  • StatefulSet objects
  • Deployment objects
  • DeploymentConfig objects
Important

Do not deploy workloads in an Operator namespace. Create a dedicated namespace for these resources.

Prerequisites

  • You have created the KataConfig custom resource (CR).

Procedure

  1. In the OpenShift Container Platform web console, navigate to Workloads workload type, for example, Pods.
  2. On the workload type page, click an object to view its details.
  3. Click the YAML tab.
  4. Add spec.runtimeClassName: kata to the manifest of each pod-templated workload object as in the following example:

    apiVersion: v1
    kind: <object>
    # ...
    spec:
      runtimeClassName: kata
    # ...

    OpenShift Container Platform creates the workload object and begins scheduling it.

Verification

  • Inspect the spec.runtimeClassName field of a pod-templated object. If the value is kata, then the workload is running on OpenShift sandboxed containers, using peer pods.

2.3. Deploying OpenShift sandboxed containers by using the command line

You can deploy OpenShift sandboxed containers on bare metal by using the command line interface (CLI) to perform the following tasks:

  1. Install the OpenShift sandboxed containers Operator.
  2. After installing the Operator, you can configure the following options:

    • Configure a block storage device.
    • Install the Node Feature Discovery (NFD) Operator to configure node eligibility checks. For more information, see node eligibility checks and the NFD Operator documentation.

      • Create a NodeFeatureDiscovery custom resource.
  3. Create the KataConfig custom resource.
  4. Optional: Modify the pod overhead.
  5. Configure the OpenShift sandboxed containers workload objects.

2.3.1. Installing the OpenShift sandboxed containers Operator

You can install the OpenShift sandboxed containers Operator by using the CLI.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create an osc-namespace.yaml manifest file:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sandboxed-containers-operator
  2. Create the namespace by running the following command:

    $ oc apply -f osc-namespace.yaml
  3. Create an osc-operatorgroup.yaml manifest file:

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sandboxed-containers-operator-group
      namespace: openshift-sandboxed-containers-operator
    spec:
      targetNamespaces:
      - openshift-sandboxed-containers-operator
  4. Create the operator group by running the following command:

    $ oc apply -f osc-operatorgroup.yaml
  5. Create an osc-subscription.yaml manifest file:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: sandboxed-containers-operator
      namespace: openshift-sandboxed-containers-operator
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: sandboxed-containers-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      startingCSV: sandboxed-containers-operator.v1.8.0
  6. Create the subscription by running the following command:

    $ oc apply -f osc-subscription.yaml
  7. Verify that the Operator is correctly installed by running the following command:

    $ oc get csv -n openshift-sandboxed-containers-operator

    This command can take several minutes to complete.

  8. Watch the process by running the following command:

    $ watch oc get csv -n openshift-sandboxed-containers-operator

    Example output

    NAME                             DISPLAY                                  VERSION             REPLACES                   PHASE
    openshift-sandboxed-containers   openshift-sandboxed-containers-operator  1.8.0    1.7.0        Succeeded

2.3.2. Optional configurations

You can configure the following options after you install the OpenShift sandboxed containers Operator.

2.3.2.1. Provisioning local block volumes

You can use local block volumes with OpenShift sandboxed containers. You must first provision the local block volumes by using the Local Storage Operator (LSO). Then you must enable the nodes with the local block volumes to run OpenShift sandboxed containers workloads.

You can provision local block volumes for OpenShift sandboxed containers by using the Local Storage Operator (LSO). The local volume provisioner looks for any block volume devices at the paths specified in the defined resource.

Prerequisites

  • You have installed the Local Storage Operator.
  • You have a local disk that meets the following conditions:

    • It is attached to a node.
    • It is not mounted.
    • It does not contain partitions.

Procedure

  1. Create the local volume resource. This resource must define the nodes and paths to the local volumes.

    Note

    Do not use different storage class names for the same device. Doing so creates multiple persistent volumes (PVs).

    Example: Block

    apiVersion: "local.storage.openshift.io/v1"
    kind: "LocalVolume"
    metadata:
      name: "local-disks"
      namespace: "openshift-local-storage" 1
    spec:
      nodeSelector: 2
        nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              - ip-10-0-136-143
              - ip-10-0-140-255
              - ip-10-0-144-180
      storageClassDevices:
        - storageClassName: "local-sc" 3
          forceWipeDevicesAndDestroyAllData: false 4
          volumeMode: Block
          devicePaths: 5
            - /path/to/device 6

    1
    The namespace where the Local Storage Operator is installed.
    2
    Optional: A node selector containing a list of nodes where the local storage volumes are attached. This example uses the node hostnames, obtained from oc get node. If a value is not defined, then the Local Storage Operator will attempt to find matching disks on all available nodes.
    3
    The name of the storage class to use when creating persistent volume objects.
    4
    This setting defines whether or not to call wipefs, which removes partition table signatures (magic strings) making the disk ready to use for Local Storage Operator provisioning. No other data besides signatures is erased. The default is "false" (wipefs is not invoked). Setting forceWipeDevicesAndDestroyAllData to "true" can be useful in scenarios where previous data can remain on disks that need to be re-used. In these scenarios, setting this field to true eliminates the need for administrators to erase the disks manually.
    5
    The path containing a list of local storage devices to choose from. You must use this path when enabling a node with a local block device to run OpenShift sandboxed containers workloads.
    6
    Replace this value with the filepath to your LocalVolume resource by-id, such as /dev/disk/by-id/wwn. PVs are created for these local disks when the provisioner is deployed successfully.
  2. Create the local volume resource in your OpenShift Container Platform cluster. Specify the file you just created:

    $ oc apply -f <local-volume>.yaml
  3. Verify that the provisioner was created and that the corresponding daemon sets were created:

    $ oc get all -n openshift-local-storage

    Example output

    NAME                                          READY   STATUS    RESTARTS   AGE
    pod/diskmaker-manager-9wzms                   1/1     Running   0          5m43s
    pod/diskmaker-manager-jgvjp                   1/1     Running   0          5m43s
    pod/diskmaker-manager-tbdsj                   1/1     Running   0          5m43s
    pod/local-storage-operator-7db4bd9f79-t6k87   1/1     Running   0          14m
    
    NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    service/local-storage-operator-metrics   ClusterIP   172.30.135.36   <none>        8383/TCP,8686/TCP   14m
    
    NAME                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/diskmaker-manager   3         3         3       3            3           <none>          5m43s
    
    NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/local-storage-operator   1/1     1            1           14m
    
    NAME                                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/local-storage-operator-7db4bd9f79   1         1         1       14m

    Note the desired and current number of daemon set processes. A desired count of 0 indicates that the label selectors were invalid.

  4. Verify that the persistent volumes were created:

    $ oc get pv

    Example output

    NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
    local-pv-1cec77cf   100Gi      RWO            Delete           Available           local-sc                88m
    local-pv-2ef7cd2a   100Gi      RWO            Delete           Available           local-sc                82m
    local-pv-3fa1c73    100Gi      RWO            Delete           Available           local-sc                48m

Important

Editing the LocalVolume object does not change existing persistent volumes because doing so might result in a destructive operation.

2.3.2.2. Enabling nodes to use a local block device

You can configure nodes with a local block device to run OpenShift sandboxed containers workloads at the paths specified in the defined volume resource.

Prerequisites

  • You provisioned a block device using the Local Storage Operator (LSO).

Procedure

  • Enable each node with a local block device to run OpenShift sandboxed containers workloads by running the following command:

    $ oc debug node/worker-0 -- chcon -vt container_file_t /host/path/to/device

    The /path/to/device must be the same path you defined when creating the local storage resource.

    Example output

    system_u:object_r:container_file_t:s0 /host/path/to/device

2.3.2.3. Creating a NodeFeatureDiscovery custom resource

You create a NodeFeatureDiscovery custom resource (CR) to define the configuration parameters that the Node Feature Discovery (NFD) Operator checks to determine that the worker nodes can support OpenShift sandboxed containers.

Note

To install the kata runtime on only selected worker nodes that you know are eligible, apply the feature.node.kubernetes.io/runtime.kata=true label to the selected nodes and set checkNodeEligibility: true in the KataConfig CR.

To install the kata runtime on all worker nodes, set checkNodeEligibility: false in the KataConfig CR.

In both these scenarios, you do not need to create the NodeFeatureDiscovery CR. You should only apply the feature.node.kubernetes.io/runtime.kata=true label manually if you are sure that the node is eligible to run OpenShift sandboxed containers.

The following procedure applies the feature.node.kubernetes.io/runtime.kata=true label to all eligible nodes and configures the KataConfig resource to check for node eligibility.

Prerequisites

  • You have installed the NFD Operator.

Procedure

  1. Create an nfd.yaml manifest file according to the following example:

    apiVersion: nfd.openshift.io/v1
    kind: NodeFeatureDiscovery
    metadata:
      name: nfd-kata
      namespace: openshift-nfd
    spec:
      workerConfig:
        configData: |
          sources:
            custom:
              - name: "feature.node.kubernetes.io/runtime.kata"
                matchOn:
                  - cpuId: ["SSE4", "VMX"]
                    loadedKMod: ["kvm", "kvm_intel"]
                  - cpuId: ["SSE4", "SVM"]
                    loadedKMod: ["kvm", "kvm_amd"]
    # ...
  2. Create the NodeFeatureDiscovery CR:

    $ oc create -f nfd.yaml

    The NodeFeatureDiscovery CR applies the feature.node.kubernetes.io/runtime.kata=true label to all qualifying worker nodes.

  1. Create a kata-config.yaml manifest file according to the following example:

    apiVersion: kataconfiguration.openshift.io/v1
    kind: KataConfig
    metadata:
      name: example-kataconfig
    spec:
      checkNodeEligibility: true
  2. Create the KataConfig CR:

    $ oc create -f kata-config.yaml

Verification

  • Verify that qualifying nodes in the cluster have the correct label applied:

    $ oc get nodes --selector='feature.node.kubernetes.io/runtime.kata=true'

    Example output

    NAME                           STATUS                     ROLES    AGE     VERSION
    compute-3.example.com          Ready                      worker   4h38m   v1.25.0
    compute-2.example.com          Ready                      worker   4h35m   v1.25.0

2.3.3. Creating the KataConfig custom resource

You must create the KataConfig custom resource (CR) to install kata as a runtime class on your worker nodes.

Creating the KataConfig CR triggers the OpenShift sandboxed containers Operator to do the following:

  • Install the needed RHCOS extensions, such as QEMU and kata-containers, on your RHCOS node.
  • Ensure that the CRI-O runtime is configured with the correct runtime handlers.
  • Create a RuntimeClass CR named kata with a default configuration. This enables users to configure workloads to use kata as the runtime by referencing the CR in the RuntimeClassName field. This CR also specifies the resource overhead for the runtime.

OpenShift sandboxed containers installs kata as a secondary, optional runtime on the cluster and not as the primary runtime.

Important

Creating the KataConfig CR automatically reboots the worker nodes. The reboot can take from 10 to more than 60 minutes. Factors that impede reboot time are as follows:

  • A larger OpenShift Container Platform deployment with a greater number of worker nodes.
  • Activation of the BIOS and Diagnostics utility.
  • Deployment on a hard disk drive rather than an SSD.
  • Deployment on physical nodes such as bare metal, rather than on virtual nodes.
  • A slow CPU and network.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • Optional: You have installed the Node Feature Discovery Operator if you want to enable node eligibility checks.

Procedure

  1. Create an example-kataconfig.yaml manifest file according to the following example:

    apiVersion: kataconfiguration.openshift.io/v1
    kind: KataConfig
    metadata:
      name: example-kataconfig
    spec:
      checkNodeEligibility: false 1
      logLevel: info
    #  kataConfigPoolSelector:
    #    matchLabels:
    #      <label_key>: '<label_value>' 2
    1
    Optional: Set`checkNodeEligibility` to true to run node eligibility checks.
    2
    Optional: If you have applied node labels to install OpenShift sandboxed containers on specific nodes, specify the key and value.
  2. Create the KataConfig CR by running the following command:

    $ oc apply -f example-kataconfig.yaml

    The new KataConfig CR is created and installs kata as a runtime class on the worker nodes.

    Wait for the kata installation to complete and the worker nodes to reboot before verifying the installation.

  3. Monitor the installation progress by running the following command:

    $ watch "oc describe kataconfig | sed -n /^Status:/,/^Events/p"

    When the status of all workers under kataNodes is installed and the condition InProgress is False without specifying a reason, the kata is installed on the cluster.

2.3.4. Modifying pod overhead

Pod overhead describes the amount of system resources that a pod on a node uses. You can modify the pod overhead by changing the spec.overhead field for a RuntimeClass custom resource. For example, if the configuration that you run for your containers consumes more than 350Mi of memory for the QEMU process and guest kernel data, you can alter the RuntimeClass overhead to suit your needs.

When performing any kind of file system I/O in the guest, file buffers are allocated in the guest kernel. The file buffers are also mapped in the QEMU process on the host, as well as in the virtiofsd process.

For example, if you use 300Mi of file buffer cache in the guest, both QEMU and virtiofsd appear to use 300Mi additional memory. However, the same memory is being used in all three cases. Therefore, the total memory usage is only 300Mi, mapped in three different places. This is correctly accounted for when reporting the memory utilization metrics.

Note

The default values are supported by Red Hat. Changing default overhead values is not supported and can result in technical issues.

Procedure

  1. Obtain the RuntimeClass object by running the following command:

    $ oc describe runtimeclass kata
  2. Update the overhead.podFixed.memory and cpu values and save as the file as runtimeclass.yaml:

    kind: RuntimeClass
    apiVersion: node.k8s.io/v1
    metadata:
      name: kata
    overhead:
      podFixed:
        memory: "500Mi"
        cpu: "500m"
  3. Apply the changes by running the following command:

    $ oc apply -f runtimeclass.yaml

2.3.5. Configuring workload objects

You must configure OpenShift sandboxed containers workload objects by setting kata as the runtime class for the following pod-templated objects:

  • Pod objects
  • ReplicaSet objects
  • ReplicationController objects
  • StatefulSet objects
  • Deployment objects
  • DeploymentConfig objects
Important

Do not deploy workloads in an Operator namespace. Create a dedicated namespace for these resources.

Prerequisites

  • You have created the KataConfig custom resource (CR).

Procedure

  1. Add spec.runtimeClassName: kata to the manifest of each pod-templated workload object as in the following example:

    apiVersion: v1
    kind: <object>
    # ...
    spec:
      runtimeClassName: kata
    # ...

    OpenShift Container Platform creates the workload object and begins scheduling it.

Verification

  • Inspect the spec.runtimeClassName field of a pod-templated object. If the value is kata, then the workload is running on OpenShift sandboxed containers, using peer pods.
Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.