Chapter 2. Deploying OpenShift sandboxed containers on bare metal
You can deploy OpenShift sandboxed containers on an on-premise bare-metal cluster with Red Hat Enterprise Linux CoreOS (RHCOS) installed on the worker nodes.
- RHEL nodes are not supported.
- Nested virtualization is not supported.
You can use any installation method, including user-provisioned installation, installer-provisioned installation, or the Assisted Installer, to deploy your cluster.
You can also install OpenShift sandboxed containers on Amazon Web Services (AWS) bare-metal instances. Bare-metal instances offered by other cloud providers are not supported.
Cluster requirements
- You have installed Red Hat OpenShift Container Platform 4.14 or later on the cluster where you are installing the OpenShift sandboxed containers Operator.
- Your cluster has at least one worker node.
2.1. OpenShift sandboxed containers resource requirements
You must ensure that your cluster has sufficient resources.
OpenShift sandboxed containers lets users run workloads on their OpenShift Container Platform clusters inside a sandboxed runtime (Kata). Each pod is represented by a virtual machine (VM). Each VM runs in a QEMU process and hosts a kata-agent process that acts as a supervisor for managing container workloads and the processes running in those containers. Two additional processes add more overhead:
- containerd-shim-kata-v2 is used to communicate with the pod.
- virtiofsd handles host file system access on behalf of the guest.
Each VM is configured with a default amount of memory. Additional memory is hot-plugged into the VM for containers that explicitly request memory.
A container running without a memory resource consumes free memory until the total memory used by the VM reaches the default allocation. The guest and its I/O buffers also consume memory.
If a container is given a specific amount of memory, then that memory is hot-plugged into the VM before the container starts.
When a memory limit is specified, the workload is terminated if it consumes more memory than the limit. If no memory limit is specified, the kernel running on the VM might run out of memory. If the kernel runs out of memory, it might terminate other processes on the VM.
Default memory sizes
The following table lists some of the default values for resource allocation.
Resource | Value |
---|---|
Memory allocated by default to a virtual machine | 2Gi |
Guest Linux kernel memory usage at boot | ~110Mi |
Memory used by the QEMU process (excluding VM memory) | ~30Mi |
Memory used by the | ~10Mi |
Memory used by the | ~20Mi |
File buffer cache data after running | ~300Mi* [1] |
File buffers appear and are accounted for in multiple locations:
- In the guest where it appears as file buffer cache.
- In the virtiofsd daemon that maps allowed user-space file I/O operations.
- In the QEMU process as guest memory.
Total memory usage is properly accounted for by the memory utilization metrics, which only count that memory once.
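The table values can be combined into a rough per-pod estimate. The following Python sketch is illustrative arithmetic only. It assumes the approximate defaults from the table and sums the two helper-process rows (~10Mi and ~20Mi) without attributing them to a specific process. It also counts the file buffer cache once, as the note above describes. The function names are hypothetical.

```python
# Rough memory arithmetic for one kata pod, using the approximate defaults
# from the "Default memory sizes" table. All values are in MiB.
DEFAULT_VM_MEMORY_MI = 2 * 1024   # memory allocated by default to the VM (2Gi)
GUEST_KERNEL_MI = 110             # guest Linux kernel usage at boot
QEMU_PROCESS_MI = 30              # QEMU process, excluding VM memory
HELPER_PROCESSES_MI = 10 + 20     # the two helper-process rows, summed


def host_process_overhead_mi() -> int:
    """Host memory used outside the VM allocation (QEMU plus helper processes)."""
    return QEMU_PROCESS_MI + HELPER_PROCESSES_MI


def guest_memory_left_mi(file_cache_mi: int = 0) -> int:
    """Default VM memory left for containers that do not request memory.

    The file buffer cache is subtracted once: although it is visible in the
    guest, in QEMU, and in virtiofsd, it is the same memory.
    """
    return DEFAULT_VM_MEMORY_MI - GUEST_KERNEL_MI - file_cache_mi
```

With 300Mi of file buffer cache, `guest_memory_left_mi(300)` leaves roughly 1638Mi of the default 2Gi for container processes. Subtracting the cache three times would understate this by 600Mi.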
Pod overhead describes the amount of system resources that a pod on a node uses. You can get the current pod overhead for the Kata runtime by viewing the kata RuntimeClass object, as shown below.
Example
$ oc get runtimeclass kata -o yaml
Example output
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
  name: kata
overhead:
  podFixed:
    memory: "500Mi"
    cpu: "500m"
You can change the pod overhead by changing the overhead field of a RuntimeClass object. For example, if the configuration that you run for your containers consumes more than 350Mi of memory for the QEMU process and guest kernel data, you can alter the RuntimeClass overhead to suit your needs.
The specified default overhead values are supported by Red Hat. Changing default overhead values is not supported and can result in technical issues.
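In Kubernetes, the RuntimeClass overhead.podFixed values are added to the sum of the container requests when a pod that uses that runtime class is scheduled. The following Python sketch shows that arithmetic for the default kata overhead. It is a simplified model that ignores init containers. The quantity parser is a hypothetical helper that only understands plain numbers, the CPU suffix m, and the binary memory suffixes Ki, Mi, and Gi.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "500Mi" to bytes (subset of suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "1" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def scheduled_resources(container_requests: list, pod_fixed: dict) -> tuple:
    """Return (memory in bytes, CPU in millicores) that the scheduler accounts for:
    the sum of all container requests plus the RuntimeClass podFixed overhead."""
    memory = sum(memory_bytes(r.get("memory", "0")) for r in container_requests)
    cpu = sum(cpu_millicores(r.get("cpu", "0")) for r in container_requests)
    return (memory + memory_bytes(pod_fixed["memory"]),
            cpu + cpu_millicores(pod_fixed["cpu"]))


# Example: one container requesting 1Gi and 250m, with the default kata overhead.
mem, cpu = scheduled_resources([{"memory": "1Gi", "cpu": "250m"}],
                               {"memory": "500Mi", "cpu": "500m"})
print(mem // 2**20, "Mi,", cpu, "m")  # 1524 Mi, 750 m
```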
When performing any kind of file system I/O in the guest, file buffers are allocated in the guest kernel. The file buffers are also mapped in the QEMU process on the host, as well as in the virtiofsd process.
For example, if you use 300Mi of file buffer cache in the guest, both QEMU and virtiofsd appear to use 300Mi additional memory. However, the same memory is used in all three cases. Therefore, the total memory usage is only 300Mi, mapped in three different places. This is correctly accounted for when reporting the memory utilization metrics.
2.2. Deploying OpenShift sandboxed containers by using the web console
You can deploy OpenShift sandboxed containers on bare metal by using the OpenShift Container Platform web console to perform the following tasks:
- Install the OpenShift sandboxed containers Operator.
- Optional: Install the Node Feature Discovery (NFD) Operator to configure node eligibility checks. For more information, see node eligibility checks and the NFD Operator documentation.
- Create the KataConfig custom resource.
- Configure the OpenShift sandboxed containers workload objects.
2.2.1. Installing the OpenShift sandboxed containers Operator
You can install the OpenShift sandboxed containers Operator by using the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
Procedure
- In the web console, navigate to Operators → OperatorHub.
- In the Filter by keyword field, type OpenShift sandboxed containers.
- Select the OpenShift sandboxed containers Operator tile and click Install.
- On the Install Operator page, select stable from the list of available Update Channel options.
- Verify that Operator recommended Namespace is selected for Installed Namespace. This installs the Operator in the mandatory openshift-sandboxed-containers-operator namespace. If this namespace does not yet exist, it is automatically created.
  Note: Attempting to install the OpenShift sandboxed containers Operator in a namespace other than openshift-sandboxed-containers-operator causes the installation to fail.
- Verify that Automatic is selected for Approval Strategy. Automatic is the default value and enables automatic updates to OpenShift sandboxed containers when a new z-stream release is available.
- Click Install.
- Navigate to Operators → Installed Operators to verify that the Operator is installed.
Additional resources
- Using Operator Lifecycle Manager on restricted networks.
- Configuring proxy support in Operator Lifecycle Manager for disconnected environments.
2.2.2. Creating the KataConfig custom resource
You must create the KataConfig custom resource (CR) to install kata as a RuntimeClass on your worker nodes.
The kata runtime class is installed on all worker nodes by default. If you want to install kata on specific nodes, you can add labels to those nodes and then define the label in the KataConfig CR.
OpenShift sandboxed containers installs kata as a secondary, optional runtime on the cluster and not as the primary runtime.
Creating the KataConfig CR automatically reboots the worker nodes. The reboot can take from 10 to more than 60 minutes. The following factors might increase the reboot time:
- A larger OpenShift Container Platform deployment with a greater number of worker nodes.
- Activation of the BIOS and Diagnostics utility.
- Deployment on a hard disk drive rather than an SSD.
- Deployment on physical nodes such as bare metal, rather than on virtual nodes.
- A slow CPU and network.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- Optional: You have installed the Node Feature Discovery Operator if you want to enable node eligibility checks.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → Installed Operators.
- Select the OpenShift sandboxed containers Operator.
- On the KataConfig tab, click Create KataConfig.
- Enter the following details:
  - Name: Optional: The default name is example-kataconfig.
  - Labels: Optional: Enter any relevant, identifying attributes to the KataConfig resource. Each label represents a key-value pair.
  - checkNodeEligibility: Optional: Select to use the Node Feature Discovery Operator (NFD) to detect node eligibility.
  - kataConfigPoolSelector: Optional: To install kata on selected nodes, add a match expression for the labels on the selected nodes:
    - Expand the kataConfigPoolSelector area.
    - In the kataConfigPoolSelector area, expand matchExpressions. This is a list of label selector requirements.
    - Click Add matchExpressions.
    - In the Key field, enter the label key the selector applies to.
    - In the Operator field, enter the key's relationship to the label values. Valid operators are In, NotIn, Exists, and DoesNotExist.
    - Expand the Values area and then click Add value.
    - In the Value field, enter true or false for the key label value.
  - logLevel: Define the level of log data retrieved for nodes with the kata runtime class.
- Click Create. The KataConfig CR is created and installs the kata runtime class on the worker nodes.
  Wait for the kata installation to complete and the worker nodes to reboot before verifying the installation.
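The kataConfigPoolSelector is assumed to follow standard Kubernetes label selector semantics, in which a node is selected only when every match expression is satisfied. The following Python sketch models the four operators listed above. You can use it to check which nodes a selector would pick before you create the KataConfig CR. The label key example.com/kata in the usage example is hypothetical.

```python
def requirement_matches(labels: dict, requirement: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key = requirement["key"]
    operator = requirement["operator"]
    values = requirement.get("values", [])
    if operator == "In":
        return key in labels and labels[key] in values
    if operator == "NotIn":
        return key not in labels or labels[key] not in values
    if operator == "Exists":
        return key in labels
    if operator == "DoesNotExist":
        return key not in labels
    raise ValueError(f"unsupported operator: {operator}")


def selector_matches(labels: dict, selector: dict) -> bool:
    """A node matches when all matchLabels and all matchExpressions match."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    return all(requirement_matches(labels, r)
               for r in selector.get("matchExpressions", []))


# Hypothetical selector: install kata only on nodes labeled example.com/kata=true.
pool_selector = {"matchExpressions": [
    {"key": "example.com/kata", "operator": "In", "values": ["true"]}]}
nodes = {
    "worker-0": {"example.com/kata": "true"},
    "worker-1": {"example.com/kata": "false"},
    "worker-2": {},
}
print([name for name, labels in nodes.items() if selector_matches(labels, pool_selector)])
# ['worker-0']
```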
Verification
- On the KataConfig tab, click the KataConfig CR to view its details.
- Click the YAML tab to view the status stanza.
  The status stanza contains the conditions and kataNodes keys. The value of status.kataNodes is an array of nodes, each of which lists nodes in a particular state of kata installation. A message appears each time there is an update.
- Click Reload to refresh the YAML.
  When all workers in the status.kataNodes array display the value installed and conditions.InProgress: False with no specified reason, kata is installed on the cluster.
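To script the same check, you can read the CR with oc get kataconfig <name> -o json and evaluate its status. The following Python sketch assumes a status shape in which status.kataNodes.installed is a list of node names and status.conditions contains an entry of type InProgress. Verify these field names against the YAML tab on your cluster before relying on them.

```python
import json


def kata_installed(kataconfig: dict, workers: list) -> bool:
    """Return True when every worker is listed as installed and the InProgress
    condition is False with no reason. The status layout is an assumption."""
    status = kataconfig.get("status", {})
    installed = set(status.get("kataNodes", {}).get("installed", []))
    in_progress = next((c for c in status.get("conditions", [])
                        if c.get("type") == "InProgress"), None)
    return (set(workers) <= installed
            and in_progress is not None
            and in_progress.get("status") == "False"
            and not in_progress.get("reason"))


# Example with a hand-written status stanza (assumed shape).
sample = json.loads("""
{"status": {"conditions": [{"type": "InProgress", "status": "False", "reason": ""}],
            "kataNodes": {"installed": ["worker-0", "worker-1"]}}}
""")
print(kata_installed(sample, ["worker-0", "worker-1"]))  # True
```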
Additional resources
2.2.3. Configuring workload objects
You must configure OpenShift sandboxed containers workload objects by setting kata as the runtime class for the following pod-templated objects:
- Pod objects
- ReplicaSet objects
- ReplicationController objects
- StatefulSet objects
- Deployment objects
- DeploymentConfig objects
Do not deploy workloads in an Operator namespace. Create a dedicated namespace for these resources.
Prerequisites
- You have created the KataConfig custom resource (CR).
Procedure
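- In the OpenShift Container Platform web console, navigate to Workloads → workload type, for example, Pods.
- On the workload type page, click an object to view its details.
- Click the YAML tab.
- Add runtimeClassName: kata to the pod specification of each pod-templated workload object. For a Pod object, the field is spec.runtimeClassName, as in the following example. For objects that use a pod template, such as Deployment or StatefulSet objects, the field is spec.template.spec.runtimeClassName.
apiVersion: v1
kind: <object>
# ...
spec:
  runtimeClassName: kata
# ...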
OpenShift Container Platform creates the workload object and begins scheduling it.
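The runtimeClassName field belongs to the pod specification, so its location depends on the object kind. It is under spec for a Pod object and under spec.template.spec for the pod-templated kinds listed above. The following Python sketch adds the field at the right place in a manifest that has been loaded as a dictionary, for example from oc get <kind> <name> -o json. The helper is hypothetical and not part of any OpenShift tooling.

```python
import copy

# Kinds whose pod specification sits under spec.template.spec.
POD_TEMPLATED_KINDS = {
    "ReplicaSet", "ReplicationController", "StatefulSet",
    "Deployment", "DeploymentConfig",
}


def set_kata_runtime(manifest: dict) -> dict:
    """Return a copy of the manifest with runtimeClassName: kata in its pod spec."""
    result = copy.deepcopy(manifest)
    kind = result.get("kind")
    if kind == "Pod":
        pod_spec = result.setdefault("spec", {})
    elif kind in POD_TEMPLATED_KINDS:
        pod_spec = (result.setdefault("spec", {})
                          .setdefault("template", {})
                          .setdefault("spec", {}))
    else:
        raise ValueError(f"{kind} is not a pod or pod-templated object")
    pod_spec["runtimeClassName"] = "kata"
    return result


deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example"},
    "spec": {"template": {"spec": {"containers": [{"name": "app", "image": "<image>"}]}}},
}
updated = set_kata_runtime(deployment)
print(updated["spec"]["template"]["spec"]["runtimeClassName"])  # kata
```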
Verification
- Inspect the runtimeClassName field in the pod specification of a workload object (spec.runtimeClassName for a Pod object). If the value is kata, then the workload is running on OpenShift sandboxed containers.
2.3. Deploying OpenShift sandboxed containers by using the command line
You can deploy OpenShift sandboxed containers on bare metal by using the command line interface (CLI) to perform the following tasks:
- Install the OpenShift sandboxed containers Operator.
- After installing the Operator, you can configure the following options:
  - Configure a block storage device.
  - Install the Node Feature Discovery (NFD) Operator to configure node eligibility checks. For more information, see node eligibility checks and the NFD Operator documentation.
    - Create a NodeFeatureDiscovery custom resource.
    - Create a
- Create the KataConfig custom resource.
- Optional: Modify the pod overhead.
- Configure the OpenShift sandboxed containers workload objects.
2.3.1. Installing the OpenShift sandboxed containers Operator
You can install the OpenShift sandboxed containers Operator by using the CLI.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster as a user with the cluster-admin role.
Procedure
Create an osc-namespace.yaml manifest file:
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sandboxed-containers-operator
Create the namespace by running the following command:
$ oc apply -f osc-namespace.yaml
Create an osc-operatorgroup.yaml manifest file:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sandboxed-containers-operator-group
  namespace: openshift-sandboxed-containers-operator
spec:
  targetNamespaces:
  - openshift-sandboxed-containers-operator
Create the operator group by running the following command:
$ oc apply -f osc-operatorgroup.yaml
Create an osc-subscription.yaml manifest file:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sandboxed-containers-operator
  namespace: openshift-sandboxed-containers-operator
spec:
  channel: stable
  installPlanApproval: Automatic
  name: sandboxed-containers-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: sandboxed-containers-operator.v1.8.0
Create the subscription by running the following command:
$ oc apply -f osc-subscription.yaml
Verify that the Operator is correctly installed by running the following command:
$ oc get csv -n openshift-sandboxed-containers-operator
This command can take several minutes to complete.
Watch the process by running the following command:
$ watch oc get csv -n openshift-sandboxed-containers-operator
Example output
NAME                             DISPLAY                                   VERSION   REPLACES   PHASE
openshift-sandboxed-containers   openshift-sandboxed-containers-operator   1.8.0     1.7.0      Succeeded
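To wait for the Operator in a script rather than with watch, you can check the PHASE column of the oc get csv output. The following Python sketch parses the plain-text output shown above. It assumes that the first column is the CSV name and the last column is the phase, which matches the default output format shown here. The function names are hypothetical.

```python
def csv_phases(output: str) -> dict:
    """Map each CSV name (first column) to its phase (last column)."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    return {row[0]: row[-1] for row in rows[1:]}  # rows[0] is the header


def operator_ready(output: str) -> bool:
    """True when at least one CSV is listed and every listed CSV has succeeded."""
    phases = csv_phases(output)
    return bool(phases) and all(phase == "Succeeded" for phase in phases.values())


EXAMPLE = """\
NAME                             DISPLAY                                   VERSION   REPLACES   PHASE
openshift-sandboxed-containers   openshift-sandboxed-containers-operator   1.8.0     1.7.0      Succeeded
"""
print(operator_ready(EXAMPLE))  # True
```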
Additional resources
- Using Operator Lifecycle Manager on restricted networks.
- Configuring proxy support in Operator Lifecycle Manager for disconnected environments.
2.3.2. Optional configurations
You can configure the following options after you install the OpenShift sandboxed containers Operator.
2.3.2.1. Provisioning local block volumes
You can use local block volumes with OpenShift sandboxed containers. You must first provision the local block volumes by using the Local Storage Operator (LSO). Then you must enable the nodes with the local block volumes to run OpenShift sandboxed containers workloads.
You can provision local block volumes for OpenShift sandboxed containers by using the Local Storage Operator (LSO). The local volume provisioner looks for any block volume devices at the paths specified in the defined resource.
Prerequisites
- You have installed the Local Storage Operator.
You have a local disk that meets the following conditions:
- It is attached to a node.
- It is not mounted.
- It does not contain partitions.
Procedure
Create the local volume resource. This resource must define the nodes and paths to the local volumes.
Note: Do not use different storage class names for the same device. Doing so creates multiple persistent volumes (PVs).
Example: Block
apiVersion: "local.storage.openshift.io/v1"
kind: "LocalVolume"
metadata:
  name: "local-disks"
  namespace: "openshift-local-storage" # 1
spec:
  nodeSelector: # 2
    nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - ip-10-0-136-143
          - ip-10-0-140-255
          - ip-10-0-144-180
  storageClassDevices:
    - storageClassName: "local-sc" # 3
      forceWipeDevicesAndDestroyAllData: false # 4
      volumeMode: Block
      devicePaths: # 5
        - /path/to/device # 6
1. The namespace where the Local Storage Operator is installed.
2. Optional: A node selector containing a list of nodes where the local storage volumes are attached. This example uses the node hostnames, obtained from oc get node. If a value is not defined, then the Local Storage Operator will attempt to find matching disks on all available nodes.
3. The name of the storage class to use when creating persistent volume objects.
4. This setting defines whether or not to call wipefs, which removes partition table signatures (magic strings), making the disk ready to use for Local Storage Operator provisioning. No other data besides signatures is erased. The default is "false" (wipefs is not invoked). Setting forceWipeDevicesAndDestroyAllData to "true" can be useful in scenarios where previous data can remain on disks that need to be reused. In these scenarios, setting this field to true eliminates the need for administrators to erase the disks manually.
5. The path containing a list of local storage devices to choose from. You must use this path when enabling a node with a local block device to run OpenShift sandboxed containers workloads.
6. Replace this value with the filepath to your LocalVolume resource by-id, such as /dev/disk/by-id/wwn. PVs are created for these local disks when the provisioner is deployed successfully.
Create the local volume resource in your OpenShift Container Platform cluster. Specify the file you just created:
$ oc apply -f <local-volume>.yaml
Verify that the provisioner was created and that the corresponding daemon sets were created:
$ oc get all -n openshift-local-storage
Example output
NAME                                          READY   STATUS    RESTARTS   AGE
pod/diskmaker-manager-9wzms                   1/1     Running   0          5m43s
pod/diskmaker-manager-jgvjp                   1/1     Running   0          5m43s
pod/diskmaker-manager-tbdsj                   1/1     Running   0          5m43s
pod/local-storage-operator-7db4bd9f79-t6k87   1/1     Running   0          14m

NAME                                     TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/local-storage-operator-metrics   ClusterIP   172.30.135.36   <none>        8383/TCP,8686/TCP   14m

NAME                               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/diskmaker-manager   3         3         3       3            3           <none>          5m43s

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/local-storage-operator   1/1     1            1           14m

NAME                                                DESIRED   CURRENT   READY   AGE
replicaset.apps/local-storage-operator-7db4bd9f79   1         1         1       14m
Note the desired and current number of daemon set processes. A desired count of 0 indicates that the label selectors were invalid.
Verify that the persistent volumes were created:
$ oc get pv
Example output
NAME                CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
local-pv-1cec77cf   100Gi      RWO            Delete           Available           local-sc                88m
local-pv-2ef7cd2a   100Gi      RWO            Delete           Available           local-sc                82m
local-pv-3fa1c73    100Gi      RWO            Delete           Available           local-sc                48m
Editing the LocalVolume object does not change existing persistent volumes because doing so might result in a destructive operation.
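The LocalVolume manifest and the PV check can also be scripted. The following Python sketch does three things:

- It builds a LocalVolume manifest with the same fields as the example above.
- It flags any device path that is listed under more than one storage class, which the note above warns against.
- It filters oc get pv -o json output for available block PVs.

The function names are illustrative assumptions. oc apply -f accepts JSON manifests as well as YAML.

```python
import json


def local_volume(hostnames, storage_class, device_paths, name="local-disks"):
    """Build a LocalVolume manifest that mirrors the example above."""
    return {
        "apiVersion": "local.storage.openshift.io/v1",
        "kind": "LocalVolume",
        "metadata": {"name": name, "namespace": "openshift-local-storage"},
        "spec": {
            "nodeSelector": {"nodeSelectorTerms": [{"matchExpressions": [{
                "key": "kubernetes.io/hostname", "operator": "In",
                "values": list(hostnames)}]}]},
            "storageClassDevices": [{
                "storageClassName": storage_class,
                "forceWipeDevicesAndDestroyAllData": False,
                "volumeMode": "Block",
                "devicePaths": list(device_paths),
            }],
        },
    }


def devices_in_multiple_classes(manifest) -> set:
    """Device paths that appear under more than one storageClassDevices entry."""
    seen, duplicates = set(), set()
    for entry in manifest["spec"]["storageClassDevices"]:
        for path in set(entry["devicePaths"]):
            if path in seen:
                duplicates.add(path)
            seen.add(path)
    return duplicates


def available_block_pvs(pv_list: dict, storage_class: str) -> list:
    """Names of Available PVs in Block mode for a storage class (oc get pv -o json)."""
    return sorted(
        pv["metadata"]["name"]
        for pv in pv_list.get("items", [])
        if pv.get("spec", {}).get("storageClassName") == storage_class
        and pv.get("spec", {}).get("volumeMode") == "Block"
        and pv.get("status", {}).get("phase") == "Available"
    )


manifest = local_volume(["ip-10-0-136-143"], "local-sc", ["/path/to/device"])
print(json.dumps(manifest, indent=2))  # save to a file, then: oc apply -f <file>.json
```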
2.3.2.2. Enabling nodes to use a local block device
You can configure nodes with a local block device to run OpenShift sandboxed containers workloads at the paths specified in the defined volume resource.
Prerequisites
- You provisioned a block device using the Local Storage Operator (LSO).
Procedure
Enable each node with a local block device to run OpenShift sandboxed containers workloads by running the following command:
$ oc debug node/worker-0 -- chcon -vt container_file_t /host/path/to/device
The /path/to/device must be the same path you defined when creating the local storage resource.
Example output
system_u:object_r:container_file_t:s0 /host/path/to/device
2.3.2.3. Creating a NodeFeatureDiscovery custom resource
You create a NodeFeatureDiscovery custom resource (CR) to define the configuration parameters that the Node Feature Discovery (NFD) Operator checks to determine that the worker nodes can support OpenShift sandboxed containers.
To install the kata runtime on only selected worker nodes that you know are eligible, apply the feature.node.kubernetes.io/runtime.kata=true label to the selected nodes and set checkNodeEligibility: true in the KataConfig CR.
To install the kata runtime on all worker nodes, set checkNodeEligibility: false in the KataConfig CR.
In both these scenarios, you do not need to create the NodeFeatureDiscovery CR. You should only apply the feature.node.kubernetes.io/runtime.kata=true label manually if you are sure that the node is eligible to run OpenShift sandboxed containers.
The following procedure applies the feature.node.kubernetes.io/runtime.kata=true label to all eligible nodes and configures the KataConfig resource to check for node eligibility.
Prerequisites
- You have installed the NFD Operator.
Procedure
Create an nfd.yaml manifest file according to the following example:
apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-kata
  namespace: openshift-nfd
spec:
  workerConfig:
    configData: |
      sources:
        custom:
          - name: "feature.node.kubernetes.io/runtime.kata"
            matchOn:
              - cpuId: ["SSE4", "VMX"]
                loadedKMod: ["kvm", "kvm_intel"]
              - cpuId: ["SSE4", "SVM"]
                loadedKMod: ["kvm", "kvm_amd"]
# ...
Create the NodeFeatureDiscovery CR:
$ oc create -f nfd.yaml
The NodeFeatureDiscovery CR applies the feature.node.kubernetes.io/runtime.kata=true label to all qualifying worker nodes.
Create a kata-config.yaml manifest file according to the following example:
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  checkNodeEligibility: true
Create the KataConfig CR:
$ oc create -f kata-config.yaml
Verification
Verify that qualifying nodes in the cluster have the correct label applied:
$ oc get nodes --selector='feature.node.kubernetes.io/runtime.kata=true'
Example output
NAME                    STATUS   ROLES    AGE     VERSION
compute-3.example.com   Ready    worker   4h38m   v1.25.0
compute-2.example.com   Ready    worker   4h35m   v1.25.0
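The custom rule in nfd.yaml labels a node when any matchOn entry matches, and an entry matches only when all of its listed CPU flags and kernel modules are present. The following Python sketch is a simplified model of that logic. You can use it to reason about why a node did or did not receive the label. The real evaluation is done by NFD on each node, and the input sets below are hypothetical.

```python
# The two matchOn entries from the example nfd.yaml.
KATA_RULES = [
    {"cpuId": ["SSE4", "VMX"], "loadedKMod": ["kvm", "kvm_intel"]},  # Intel VT-x
    {"cpuId": ["SSE4", "SVM"], "loadedKMod": ["kvm", "kvm_amd"]},    # AMD-V
]


def node_eligible(cpu_flags: set, loaded_modules: set, rules=KATA_RULES) -> bool:
    """Entries are ORed; within an entry, every listed feature must be present."""
    return any(
        set(rule.get("cpuId", [])) <= cpu_flags
        and set(rule.get("loadedKMod", [])) <= loaded_modules
        for rule in rules
    )


print(node_eligible({"SSE4", "VMX", "AVX2"}, {"kvm", "kvm_intel"}))  # True
print(node_eligible({"SSE4", "VMX"}, {"kvm"}))                       # False
```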
2.3.3. Creating the KataConfig custom resource
You must create the KataConfig custom resource (CR) to install kata as a runtime class on your worker nodes.
Creating the KataConfig CR triggers the OpenShift sandboxed containers Operator to do the following:
- Install the needed RHCOS extensions, such as QEMU and kata-containers, on your RHCOS node.
- Ensure that the CRI-O runtime is configured with the correct runtime handlers.
- Create a RuntimeClass CR named kata with a default configuration. This enables users to configure workloads to use kata as the runtime by referencing the CR in the runtimeClassName field. This CR also specifies the resource overhead for the runtime.
OpenShift sandboxed containers installs kata as a secondary, optional runtime on the cluster and not as the primary runtime.
Creating the KataConfig CR automatically reboots the worker nodes. The reboot can take from 10 to more than 60 minutes. The following factors might increase the reboot time:
- A larger OpenShift Container Platform deployment with a greater number of worker nodes.
- Activation of the BIOS and Diagnostics utility.
- Deployment on a hard disk drive rather than an SSD.
- Deployment on physical nodes such as bare metal, rather than on virtual nodes.
- A slow CPU and network.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
- Optional: You have installed the Node Feature Discovery Operator if you want to enable node eligibility checks.
Procedure
Create an example-kataconfig.yaml manifest file according to the following example:
apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  checkNodeEligibility: false # true: install kata only on nodes labeled feature.node.kubernetes.io/runtime.kata=true
  logLevel: info
  # kataConfigPoolSelector:
  #   matchLabels:
  #     <label_key>: '<label_value>' # optional: install kata only on nodes that have this label
Create the KataConfig CR by running the following command:
$ oc apply -f example-kataconfig.yaml
The new KataConfig CR is created and installs kata as a runtime class on the worker nodes.
Wait for the kata installation to complete and the worker nodes to reboot before verifying the installation.
Monitor the installation progress by running the following command:
$ watch "oc describe kataconfig | sed -n /^Status:/,/^Events/p"
When the status of all workers under kataNodes is installed and the condition InProgress is False without specifying a reason, kata is installed on the cluster.
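If you generate the KataConfig manifest from a script, the optional kataConfigPoolSelector only needs to be present when you restrict kata to labeled nodes. The following Python sketch uses a hypothetical helper name. It builds the same manifest as example-kataconfig.yaml and adds the selector only when labels are passed. Because oc apply -f also accepts JSON manifests, you can save the printed output to a file and apply it.

```python
import json


def kata_config(name="example-kataconfig", check_node_eligibility=False,
                log_level="info", pool_labels=None) -> dict:
    """Build a KataConfig manifest; pool_labels maps label keys to values."""
    spec = {"checkNodeEligibility": check_node_eligibility, "logLevel": log_level}
    if pool_labels:
        spec["kataConfigPoolSelector"] = {"matchLabels": dict(pool_labels)}
    return {
        "apiVersion": "kataconfiguration.openshift.io/v1",
        "kind": "KataConfig",
        "metadata": {"name": name},
        "spec": spec,
    }


# Hypothetical label: restrict kata to nodes labeled example.com/kata=true.
print(json.dumps(kata_config(pool_labels={"example.com/kata": "true"}), indent=2))
```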
2.3.4. Modifying pod overhead
Pod overhead describes the amount of system resources that a pod on a node uses. You can modify the pod overhead by changing the overhead field of a RuntimeClass object. For example, if the configuration that you run for your containers consumes more than 350Mi of memory for the QEMU process and guest kernel data, you can alter the RuntimeClass overhead to suit your needs.
When performing any kind of file system I/O in the guest, file buffers are allocated in the guest kernel. The file buffers are also mapped in the QEMU process on the host, as well as in the virtiofsd process.
For example, if you use 300Mi of file buffer cache in the guest, both QEMU and virtiofsd appear to use 300Mi additional memory. However, the same memory is being used in all three cases. Therefore, the total memory usage is only 300Mi, mapped in three different places. This is correctly accounted for when reporting the memory utilization metrics.
The default values are supported by Red Hat. Changing default overhead values is not supported and can result in technical issues.
Procedure
Obtain the RuntimeClass object by running the following command:
$ oc get runtimeclass kata -o yaml
Update the overhead.podFixed.memory and cpu values and save the file as runtimeclass.yaml:
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
  name: kata
overhead:
  podFixed:
    memory: "500Mi"
    cpu: "500m"
Apply the changes by running the following command:
$ oc apply -f runtimeclass.yaml
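Instead of editing the file by hand, you can load the object with oc get runtimeclass kata -o json, change only the overhead, and apply the result. The following Python sketch shows that edit with a hypothetical helper name. It also removes server-populated metadata fields, on the assumption that you do not want the saved file to carry a stale resourceVersion.

```python
import copy


def with_overhead(runtime_class: dict, memory: str, cpu: str) -> dict:
    """Return a copy of a RuntimeClass with new overhead.podFixed values."""
    updated = copy.deepcopy(runtime_class)
    metadata = updated.get("metadata", {})
    for field in ("resourceVersion", "uid", "creationTimestamp", "managedFields"):
        metadata.pop(field, None)  # server-populated; not needed when re-applying
    updated["overhead"] = {"podFixed": {"memory": memory, "cpu": cpu}}
    return updated


current = {
    "apiVersion": "node.k8s.io/v1",
    "kind": "RuntimeClass",
    "metadata": {"name": "kata", "uid": "<uid>", "resourceVersion": "<version>"},
    "handler": "kata",
    "overhead": {"podFixed": {"memory": "500Mi", "cpu": "500m"}},
}
print(with_overhead(current, "600Mi", "500m")["overhead"])
```

Copying the live object keeps every other field, such as handler, unchanged. Only the overhead values differ from what the cluster returned.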
2.3.5. Configuring workload objects
You must configure OpenShift sandboxed containers workload objects by setting kata as the runtime class for the following pod-templated objects:
- Pod objects
- ReplicaSet objects
- ReplicationController objects
- StatefulSet objects
- Deployment objects
- DeploymentConfig objects
Do not deploy workloads in an Operator namespace. Create a dedicated namespace for these resources.
Prerequisites
- You have created the KataConfig custom resource (CR).
Procedure
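Add runtimeClassName: kata to the pod specification of each pod-templated workload object. For a Pod object, the field is spec.runtimeClassName, as in the following example. For objects that use a pod template, such as Deployment or StatefulSet objects, the field is spec.template.spec.runtimeClassName.
apiVersion: v1
kind: <object>
# ...
spec:
  runtimeClassName: kata
# ...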
OpenShift Container Platform creates the workload object and begins scheduling it.
Verification
- Inspect the runtimeClassName field in the pod specification of a workload object (spec.runtimeClassName for a Pod object). If the value is kata, then the workload is running on OpenShift sandboxed containers.
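Pods created from a pod template carry the template's runtimeClassName in their own spec. To check many pods at once, list them with oc get pods -n <namespace> -o json and look for any pod whose specification does not name the kata runtime class. The following Python sketch, a hypothetical helper, does that filtering on the parsed JSON.

```python
import json


def pods_not_on_kata(pod_list: dict) -> list:
    """Names of pods whose spec.runtimeClassName is not kata."""
    return sorted(
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if pod.get("spec", {}).get("runtimeClassName") != "kata"
    )


sample = json.loads("""
{"items": [
  {"metadata": {"name": "web-1"}, "spec": {"runtimeClassName": "kata"}},
  {"metadata": {"name": "web-2"}, "spec": {}}
]}
""")
print(pods_not_on_kata(sample))  # ['web-2']
```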