Deploying confidential containers on bare-metal servers

OpenShift sandboxed containers 1.12

Protecting containers and data by leveraging trusted execution environments

Red Hat Customer Content Services

Abstract

Confidential containers provide a confidential computing environment to protect containers and data by leveraging trusted execution environments. You install the OpenShift sandboxed containers Operator on an OpenShift Container Platform cluster for your confidential containers workload after configuring an attestation service such as Red Hat build of Trustee in a trusted environment.

Preface

Providing feedback on Red Hat documentation

You can provide feedback or report an error by submitting the Create Issue form in Jira.

Procedure

Ensure that you are logged in to Jira. If you do not have a Jira account, you must create a Red Hat Jira account.
Launch the Create Issue form.
Enter a descriptive title in the Summary field.
In the Description field, include the documentation URL, chapter or section number, and a detailed description of the issue.
Enter your Jira user ID in the Reporter field.
Click Create.

Chapter 1. Discover

You can deploy confidential containers workloads on a Red Hat OpenShift Container Platform cluster running on bare-metal servers with the Intel® Trust Domain Extensions (TDX) and AMD SEV-SNP Trusted Execution Environments (TEEs). Confidential containers provides a confidential computing environment to protect containers and data by leveraging hardware-based Trusted Execution Environments (TEEs).

Important

Confidential containers on bare-metal servers is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

1.1. Compatibility with OpenShift Container Platform

You must ensure that your Red Hat OpenShift Container Platform version supports the features you require.

The required functionality for OpenShift Container Platform is supported by two main components:

Kata runtime: The Kata runtime is included with Red Hat Enterprise Linux CoreOS (RHCOS) and receives updates with every OpenShift Container Platform release. When enabling peer pods with the Kata runtime, the OpenShift sandboxed containers Operator requires external network connectivity to pull the necessary image components and helper utilities to create the pod virtual machine (VM) image.
OpenShift sandboxed containers Operator: The OpenShift sandboxed containers Operator is a Rolling Stream Operator, which means the latest version is the only supported version. It works with all currently supported versions of OpenShift Container Platform.

The Operator depends on the features that come with the RHCOS host and the environment it runs in.

Note

You must install RHCOS on the worker nodes. Red Hat Enterprise Linux (RHEL) nodes are not supported.

The following compatibility matrix for OpenShift sandboxed containers and OpenShift Container Platform releases identifies compatible features and environments.

Table 1.1. Supported architectures
Architecture	OpenShift Container Platform version (without GPU)	OpenShift Container Platform version (with GPU)
x86_64	4.19.28+	4.21.9+
s390x	4.19.28+	—

There are two ways to deploy the Kata containers runtime:

Bare metal
Peer pods

You can deploy OpenShift sandboxed containers by using peer pods on Microsoft Azure, Amazon Web Services (AWS), or Google Cloud. With the release of OpenShift sandboxed containers 1.12.0, the OpenShift sandboxed containers Operator requires OpenShift Container Platform version 4.19.28 or later for deployments without GPU support.

The following table describes OpenShift Container Platform versions and features with the following support levels:

GA: General Availability
TP: Technology Preview
DP: Developer Preview

Note

The version numbers in the table represent the minimum supported version. For example, "4.21.9+" means version 4.21.9 or any later version.

Table 1.2. Feature availability by OpenShift Container Platform version
Platform	TEE	GPU	4.19.28+	4.20.18+	4.21.9+
Bare metal	Intel® TDX or AMD SEV-SNP	No	GA	GA	GA
Bare metal	Intel® TDX or AMD SEV-SNP	NVIDIA H100	—	—	TP
IBM Z bare metal	IBM SE for Linux	No	GA	GA	GA
IBM Z peer pods	IBM SE for Linux	No	GA	GA	GA
Azure	Intel® TDX or AMD SEV-SNP	No	GA	GA	GA
Azure	Intel® TDX or AMD SEV-SNP	NVIDIA H100	—	—	DP

Note

Confidential containers includes Red Hat build of Trustee.

GPU support with Kata requires the KubeletPodResourcesGet feature gate to be enabled. This is enabled only in OpenShift Container Platform 4.21 and later.
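
The minimum-version notation used in the tables above (for example, 4.19.28+) can be checked from a shell before you install the Operator. The following is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available. The function name `version_at_least` and the sample version are illustrative and not part of any Red Hat tooling.

```shell
#!/usr/bin/env bash
# Illustrative helper (not part of the product): succeeds when version $1
# is greater than or equal to the minimum version $2.
# Assumes GNU sort, which provides -V for version-aware ordering.
version_at_least() {
  local current="$1" minimum="$2"
  # After a version sort, the smaller version comes first. The check passes
  # when that smallest entry is the required minimum.
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n 1)" = "$minimum" ]
}

# Sample value for demonstration. On a live cluster, the version could be read with:
#   oc get clusterversion version -o jsonpath='{.status.desired.version}'
cluster_version="4.20.18"

if version_at_least "$cluster_version" "4.19.28"; then
  echo "Cluster version $cluster_version meets the 4.19.28+ minimum"
else
  echo "Cluster version $cluster_version is below the 4.19.28+ minimum"
fi
```

A plain string comparison would rank 4.9.30 above 4.19.28, which is why the sketch relies on version-aware sorting.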

1.2. Common terms

The following terms are used throughout the documentation.

Attestation

The process of verifying the integrity and trustworthiness of a Trusted Execution Environment (TEE) and the confidential containers workloads running within it, ensuring that only trusted code and data are executed. Red Hat build of Trustee performs this function.

Confidential containers

A technology that provides a confidential computing environment to protect containers and data by leveraging Trusted Execution Environments.

Initdata

A specification used to securely initialize a pod with workload-specific data (such as certificates, cryptographic keys, or an optional Kata Agent policy) at runtime, preventing the need to embed this data directly in the virtual machine (VM) image.

Kata Agent

A component within the pod Virtual Machine (VM) that enforces runtime policies and manages the lifecycle of the containers running inside the VM. Its policy controls API requests for peer pods.

Kata containers

Kata containers is a core upstream project that is used to build OpenShift sandboxed containers. OpenShift sandboxed containers integrates Kata containers with OpenShift Container Platform.

kata runtime

The optional runtime installed by the OpenShift sandboxed containers Operator when configuring bare metal deployments.

kata-cc runtime

The runtime class used specifically for confidential containers deployments on bare-metal servers.

kata-remote runtime

The runtime class used for peer pod deployments on cloud platforms or remote hypervisors.

KataConfig

A custom resource used to configure and launch OpenShift sandboxed containers.

TrusteeConfig

A custom resource used to configure and launch Red Hat build of Trustee.

OpenShift sandboxed containers

OpenShift sandboxed containers integrates Kata containers as an optional runtime to provide enhanced security and isolation for container workloads by running applications in lightweight virtual machines.

OpenShift sandboxed containers Operator

The OpenShift sandboxed containers Operator manages the lifecycle of OpenShift sandboxed containers and confidential containers on a cluster.

Peer pod

A peer pod in OpenShift sandboxed containers extends the concept of a standard pod. Unlike a standard sandboxed container, where the virtual machine is created on the worker node itself, in a peer pod, the virtual machine is created through a remote hypervisor using any supported hypervisor or cloud provider API.

The peer pod acts as a regular pod on the worker node, with its corresponding VM running elsewhere. The remote location of the VM is transparent to the user and is specified by the runtime class in the pod specification. The peer pod design circumvents the need for nested virtualization.

Pod

A pod is a construct that is inherited from Kubernetes and OpenShift Container Platform. It represents resources where containers can be deployed. Containers run inside pods, and pods are used to specify resources that can be shared between multiple containers.

In the context of OpenShift sandboxed containers, a pod is implemented as a virtual machine. Several containers can run in the same pod on the same virtual machine.

Red Hat build of Trustee

Red Hat build of Trustee is an attestation service that verifies the trustworthiness of the location where you plan to run your workload or where you plan to send confidential information. Red Hat build of Trustee includes components deployed on a trusted side and used to verify whether the remote workload is running in a Trusted Execution Environment (TEE).

Red Hat build of Trustee Operator

The Red Hat build of Trustee Operator manages the installation, lifecycle, and configuration of Red Hat build of Trustee.

Runtime class

An object that describes the specific runtime configuration used to execute a workload.

Sandbox

A sandbox is an isolated environment where programs can run. In a sandbox, you can run untested or untrusted programs without risking harm to the host machine or the operating system.

In the context of OpenShift sandboxed containers, sandboxing is achieved by running workloads in a different kernel using virtualization, providing enhanced control over the interactions between multiple workloads that run on the same host.

Trusted Execution Environment (TEE)

Hardware-based security technology leveraged by confidential containers to protect containers and data. Examples: Intel® TDX, AMD SEV-SNP.

1.3. Initrd images

An initial ramdisk (initrd) is a compressed file system used in a virtual machine (VM) boot process. In a confidential containers environment, an initrd is essential for booting the confidential virtual machine (CVM) and is a critical "link in the chain of trust".

Before a pod initializes in the CVM, hardware, such as AMD SEV-SNP or Intel® Trust Domain Extensions (TDX), evaluates the initrd contents.

Important

For confidential containers use cases, you must build initrd in a secure, isolated environment and add its hash to the reference values in Red Hat build of Trustee. Do not build initrd on a standard worker node at runtime. A compromised worker node could modify initrd during the build process, rendering the hardware measurement untrustworthy. initrd images provide a verified, static starting point for your confidential containers workloads.

initrd images provide the following benefits:

Established root of trust: initrd images from a trusted vendor contain a known Measurement Hash. You can hard-code the hash value into your Red Hat build of Trustee (Attestation Service) policy.
Operational simplicity: Using initrd images eliminates the need to maintain a private build pipeline and manage the dependencies required to create the correct initrd for a specific kernel.

The following Red Hat initrd image variants are available:

Standard (kata-cc.initrd): Includes a minimal root file system (RHEL 10 based), kernel drivers, kata-agent, confidential-data-hub, attestation-agent, and default restrictive kata-agent policy.
GPU (kata-nvidia-gpu-cc.initrd): Includes a minimal root file system (RHEL 10 based), kernel drivers (RHEL 10 based), kata-agent, confidential-data-hub, attestation-agent, default restrictive kata-agent policy, NVIDIA GPU driver (version 580.105.08), and nvidia-container toolkit.

The OpenShift sandboxed containers setup process installs the initrd images. You do not need to take any specific action.

1.4. FIPS compliance

OpenShift Container Platform is designed for Federal Information Processing Standards (FIPS) 140-2 and 140-3. When running Red Hat Enterprise Linux (RHEL) or Red Hat Enterprise Linux CoreOS (RHCOS) booted in FIPS mode, OpenShift Container Platform core components use the RHEL cryptographic libraries that have been submitted to NIST for FIPS 140-2/140-3 Validation on only the x86_64, ppc64le, and s390x architectures.

For more information about the NIST validation program, see Cryptographic Module Validation Program. For the latest NIST status for the individual versions of RHEL cryptographic libraries that have been submitted for validation, see Compliance Activities and Government Standards.

You can use OpenShift sandboxed containers on FIPS-enabled clusters.

When running in FIPS mode, OpenShift sandboxed containers components, VMs, and VM images are adapted to comply with FIPS.

Note

FIPS compliance for OpenShift sandboxed containers only applies to the kata runtime class. The peer pod runtime class, kata-remote, is not yet fully supported and has not been tested for FIPS compliance.

FIPS compliance is one of the most critical components required in highly secure environments, to ensure that only supported cryptographic technologies are allowed on nodes.

Important

The use of FIPS Validated / Modules in Process cryptographic libraries is only supported on OpenShift Container Platform deployments on the x86_64 architecture.

To understand Red Hat’s view of OpenShift Container Platform compliance frameworks, refer to the Risk Management and Regulatory Readiness chapter of the OpenShift Security Guide Book.

Chapter 2. Install

You install confidential containers on bare-metal servers with the Intel® Trust Domain Extensions (TDX) and AMD SEV-SNP Trusted Execution Environments (TEEs) by configuring your environment and installing the OpenShift sandboxed containers Operator.

Perform the following steps:

Intel® TDX: Create a machine config for your cluster.
Install the OpenShift sandboxed containers Operator.

2.1. Prerequisites

Review the following prerequisites before deploying confidential containers.

You have installed the latest version of Red Hat OpenShift Container Platform on the cluster where you are running your confidential containers workload.

Important

Check the Compatibility with OpenShift Container Platform section for the minimum version required for confidential containers in release 1.12, because confidential containers requires a later version than OpenShift sandboxed containers.

You have deployed Red Hat build of Trustee on an OpenShift Container Platform cluster in a trusted environment. For more information, see Deploying Red Hat build of Trustee.

2.2. Creating an Intel TDX machine config

Enable Intel® Trust Domain Extensions (TDX) support by creating a MachineConfig object that configures the required kernel parameters and modules on your cluster nodes.

Procedure

Create a tdx-machine-config.yaml manifest file according to the following example:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: <role>
  name: 99-enable-intel-tdx
spec:
  kernelArguments:
  - kvm_intel.tdx=1
  - nohibernate
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/modules-load.d/vsock.conf
          mode: 0644
          contents:
            source: data:text/plain;charset=utf-8;base64,dnNvY2stbG9vcGJhY2sK

<role>: Specify master for single-node OpenShift or kata-oc for a multi-node cluster.

Create the machine config by running the following command:
```
$ oc create -f tdx-machine-config.yaml
```
Creating or updating the machine config triggers a node reboot.
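
The contents.source value in the manifest above is a Base64-encoded data URL. As a quick check, assuming a `base64` command that supports the `-d` option, you can decode the payload to see what the machine config writes to /etc/modules-load.d/vsock.conf. The variable names are illustrative.

```shell
# Payload copied from the data URL in the manifest above.
payload='dnNvY2stbG9vcGJhY2sK'

# Decode the payload to see the module name that is loaded at boot.
decoded="$(printf '%s' "$payload" | base64 -d)"
echo "$decoded"

# Re-encode the module name (with its trailing newline) to confirm the round trip.
encoded="$(printf 'vsock-loopback\n' | base64)"
echo "$encoded"
```

The payload decodes to the single line `vsock-loopback`, so the file instructs the node to load that kernel module.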

Verification

Verify that the machine config is correctly configured by running the following command:
```
$ oc get machineconfig 99-enable-intel-tdx
```
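
After the nodes reboot, you might also want to confirm that the kernel arguments from the manifest are active. The following sketch checks a kernel command line string for the required arguments. The helper name `has_kernel_arg` and the sample command line are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Illustrative helper: succeeds when the kernel command line in $1
# contains the exact argument $2 as a whitespace-separated word.
has_kernel_arg() {
  local cmdline="$1" wanted="$2" arg
  for arg in $cmdline; do
    [ "$arg" = "$wanted" ] && return 0
  done
  return 1
}

# Sample command line for demonstration only. On a node, the real value
# could be read with:
#   oc debug node/<node_name> -- chroot /host cat /proc/cmdline
sample_cmdline="BOOT_IMAGE=/vmlinuz root=/dev/sda4 kvm_intel.tdx=1 nohibernate"

for required in kvm_intel.tdx=1 nohibernate; do
  if has_kernel_arg "$sample_cmdline" "$required"; then
    echo "found: $required"
  else
    echo "missing: $required"
  fi
done
```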

2.3. Installing the OpenShift sandboxed containers Operator

You can install the OpenShift sandboxed containers Operator by using the command line interface (CLI).

Prerequisites

You have access to the cluster as a user with the cluster-admin role.

Procedure

Create an osc-namespace.yaml manifest file:

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-sandboxed-containers-operator

Create the namespace by running the following command:
```
$ oc create -f osc-namespace.yaml
```

Create an osc-operatorgroup.yaml manifest file:

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: sandboxed-containers-operator-group
  namespace: openshift-sandboxed-containers-operator
spec:
  targetNamespaces:
  - openshift-sandboxed-containers-operator

Create the operator group by running the following command:
```
$ oc create -f osc-operatorgroup.yaml
```

Create an osc-subscription.yaml manifest file:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sandboxed-containers-operator
  namespace: openshift-sandboxed-containers-operator
spec:
  channel: stable
  installPlanApproval: Automatic
  name: sandboxed-containers-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: sandboxed-containers-operator.v1.12.0

Create the subscription by running the following command:
```
$ oc create -f osc-subscription.yaml
```
Verify that the Operator is correctly installed by running the following command:
```
$ oc get csv -n openshift-sandboxed-containers-operator
```
This command can take several minutes to complete.

Watch the process by running the following command:

$ watch oc get csv -n openshift-sandboxed-containers-operator

Example output

NAME                             DISPLAY                                  VERSION         PHASE
openshift-sandboxed-containers   openshift-sandboxed-containers-operator  1.12.0          Succeeded
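
If you script the installation, you can test the PHASE column instead of watching it. The following is a minimal sketch that parses the table output of `oc get csv`. The helper name `all_csvs_succeeded` is hypothetical, and the sample input is the example output above.

```shell
#!/usr/bin/env bash
# Illustrative helper: reads `oc get csv` table output on stdin and succeeds
# only if at least one row exists and the last column (PHASE) of every row
# is "Succeeded". The first line is treated as the header.
all_csvs_succeeded() {
  awk 'NR > 1 { rows++; if ($NF != "Succeeded") bad++ }
       END { exit !(rows > 0 && bad == 0) }'
}

# Sample input copied from the example output above.
sample_output='NAME                             DISPLAY                                  VERSION         PHASE
openshift-sandboxed-containers   openshift-sandboxed-containers-operator  1.12.0          Succeeded'

if printf '%s\n' "$sample_output" | all_csvs_succeeded; then
  echo "Operator installation succeeded"
else
  echo "Operator installation is not complete"
fi
```

On a cluster, you could pipe the output of `oc get csv -n openshift-sandboxed-containers-operator` into the same function.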

Chapter 3. Configure

You can configure confidential containers on bare-metal servers with the Intel® Trust Domain Extensions (TDX) and AMD SEV-SNP Trusted Execution Environments (TEEs).

Perform the following steps:

Configure worker nodes so that TEEs are automatically detected.
Intel® TDX: Configure the remote attestation infrastructure.
Enable confidential containers.
Create initdata to initialize a pod with sensitive or workload-specific data at runtime.
Important
Do not use the default permissive Kata Agent policy in a production environment. You must configure a restrictive policy, preferably by creating initdata.
As a minimum requirement, you must disable ExecProcessRequest to prevent a cluster administrator from accessing sensitive data by running the oc exec command on a confidential containers pod.
Add initdata to a pod manifest.
Create the KataConfig CR.
Verify the attestation process.
Configure your workload for confidential containers.

3.1. Auto-detecting TEEs

You must label your worker nodes so that the OpenShift sandboxed containers Operator can detect the Trusted Execution Environments (TEEs).

You label the nodes by installing and configuring the Node Feature Discovery (NFD) Operator.

3.1.1. Creating a NodeFeatureDiscovery custom resource

You create a NodeFeatureDiscovery custom resource (CR) to define the configuration parameters that the Node Feature Discovery (NFD) Operator checks to automatically detect your TEE.

Prerequisites

You have installed the NFD Operator. For more information, see Node Feature Discovery Operator in the OpenShift Container Platform documentation.

Procedure

Create a my-nfd.yaml manifest file according to the following example:

apiVersion: nfd.openshift.io/v1
kind: NodeFeatureDiscovery
metadata:
  name: nfd-instance
  namespace: openshift-nfd
spec:
  operand:
    image: registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.21
    imagePullPolicy: Always
    servicePort: 12000
  workerConfig:
    configData: |

Create the NodeFeatureDiscovery CR:
```
$ oc create -f my-nfd.yaml
```

3.1.2. Creating the NodeFeatureRule custom resource

Create a NodeFeatureRule custom resource for your Trusted Execution Environment (TEE).

Procedure

Create a custom resource manifest named my-nodefeaturerule.yaml:

apiVersion: nfd.openshift.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: consolidated-hardware-features
  namespace: openshift-nfd
spec:
  rules:
    - name: "runtime.kata"
      labels:
        feature.node.kubernetes.io/runtime.kata: "true"
      matchAny:
        - matchFeatures:
            - feature: cpu.cpuid
              matchExpressions:
                SSE42: { op: Exists }
                VMX: { op: Exists }
            - feature: kernel.loadedmodule
              matchExpressions:
                kvm: { op: Exists }
                kvm_intel: { op: Exists }
        - matchFeatures:
            - feature: cpu.cpuid
              matchExpressions:
                SSE42: { op: Exists }
                SVM: { op: Exists }
            - feature: kernel.loadedmodule
              matchExpressions:
                kvm: { op: Exists }
                kvm_amd: { op: Exists }
    - name: "amd.sev-snp"
      labels:
        amd.feature.node.kubernetes.io/snp: "true"
      extendedResources:
        sev-snp.amd.com/esids: "@cpu.security.sev.encrypted_state_ids"
      matchFeatures:
        - feature: cpu.cpuid
          matchExpressions:
            SVM: { op: Exists }
        - feature: cpu.security
          matchExpressions:
            sev.snp.enabled: { op: Exists }

    - name: "intel.sgx"
      labels:
        intel.feature.node.kubernetes.io/sgx: "true"
      extendedResources:
        sgx.intel.com/epc: "@cpu.security.sgx.epc"
      matchFeatures:
        - feature: cpu.cpuid
          matchExpressions:
            SGX: { op: Exists }
            SGXLC: { op: Exists }
        - feature: cpu.security
          matchExpressions:
            sgx.enabled: { op: IsTrue }
        - feature: kernel.config
          matchExpressions:
            X86_SGX: { op: Exists }

    - name: "intel.tdx"
      labels:
        intel.feature.node.kubernetes.io/tdx: "true"
      extendedResources:
        tdx.intel.com/keys: "@cpu.security.tdx.total_keys"
      matchFeatures:
        - feature: cpu.cpuid
          matchExpressions:
            VMX: { op: Exists }
        - feature: cpu.security
          matchExpressions:
            tdx.enabled: { op: Exists }

Create the NodeFeatureRule CR by running the following command:
```
$ oc create -f my-nodefeaturerule.yaml
```
Note
A relabeling delay of up to 1 minute might occur.
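
After the relabeling delay, you can list the nodes that received a TEE label. The following sketch maps a short TEE name to the label that the NodeFeatureRule above applies. The function name `tee_label_selector` and the short names `tdx`, `snp`, and `sgx` are illustrative choices, not product identifiers.

```shell
#!/usr/bin/env bash
# Illustrative helper: maps a short TEE name to the node label that the
# NodeFeatureRule above sets when the TEE is detected.
tee_label_selector() {
  case "$1" in
    tdx) echo 'intel.feature.node.kubernetes.io/tdx=true' ;;
    snp) echo 'amd.feature.node.kubernetes.io/snp=true' ;;
    sgx) echo 'intel.feature.node.kubernetes.io/sgx=true' ;;
    *)   echo "unknown TEE: $1" >&2; return 1 ;;
  esac
}

# Print the command that lists nodes labeled for Intel TDX.
echo "oc get nodes -l $(tee_label_selector tdx)"
```

An empty node list after the delay suggests that the rule did not match, for example because the TEE is disabled in the firmware settings.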

3.2. Deploying Intel TDX remote attestation

Set up the Intel® remote attestation infrastructure to enable quote generation and attestation for Intel® Trust Domain Extensions (TDX) pod virtual machines. This infrastructure includes an in-cluster Provisioning Certificate Caching Service (PCCS), automatic per-node platform registration and re-registration based on the Provisioning Certification Key (PCK) Cert ID Retrieval Tool, and a per-node Quote Generation Service (QGS).

Note

The system does not back up the PCCS database automatically. Cluster administrators must implement a manual backup strategy for the database file located at /var/cache/pccs/ on the deployment node, typically a control plane node. If you do not have a valid backup, you must trigger an SGX Factory Reset in the BIOS to re-provision the required platform manifests.

Prerequisites

You must deploy the Intel® remote attestation infrastructure to enable quote generation for Intel® Trust Domain Extensions (TDX) pod virtual machines.
You have installed the Intel® device plugins Operator and created an instance of the Intel® Software Guard Extensions device plugin. For details, see Installing from the software catalog by using the web console in the OpenShift Container Platform documentation.
The node on which you deploy PCCS has Internet access.

Procedure

Configure the remote attestation project:
1. Create the intel-dcap namespace by running the following command:
  $ oc create namespace intel-dcap
2. Switch to the intel-dcap project by running the following command:
  $ oc project intel-dcap
3. Create dedicated service accounts for PCCS and QGS by running the following commands:
  $ oc create serviceaccount pccs-sa -n intel-dcap
  $ oc create serviceaccount qgs-sa -n intel-dcap
4. Grant the privileged Security Context Constraint to the service accounts by running the following commands:
  $ oc adm policy add-scc-to-user privileged -z pccs-sa -n intel-dcap
  $ oc adm policy add-scc-to-user privileged -z qgs-sa -n intel-dcap
Switch to the default project by running the following command:
```
$ oc project default
```
Set the PCCS variables by running the following commands:
```
$ export PCCS_API_KEY="<API_KEY_VALUE>"
```
To obtain the API key for the Intel® Software Guard Extensions and Intel® TDX Provisioning Certification Service, navigate to the Intel Trusted Services API portal, sign in, and subscribe to the Provisioning Certification Service. The API key is displayed on the Manage Subscriptions page.
```
$ export PCCS_USER_TOKEN="${PCCS_USER_TOKEN:-mytoken}"
```
For details about PCCS tokens, see the Design Guide for Intel® SGX Provisioning Certificate Caching Service (Intel® SGX PCCS).
```
$ export PCCS_ADMIN_TOKEN="${PCCS_ADMIN_TOKEN:-mytoken}"
```
```
$ export PCCS_NODE=$(oc get nodes \
  -l 'node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=' \
  -o jsonpath='{.items[0].metadata.name}')
```

Set the cluster proxy variables by running the following commands:

$ export CLUSTER_HTTPS_PROXY="$(oc get proxy/cluster \
  -o jsonpath={.spec.httpsProxy})"

$ export CLUSTER_NO_PROXY="$(oc get proxy/cluster \
  -o jsonpath={.spec.noProxy})"

Create the PCCS secrets:

Set the PCCS secrets variables by running the following commands:

$ export PCCS_USER_TOKEN_HASH=$(echo -n "$PCCS_USER_TOKEN" | sha512sum | tr -d '[:space:]-')

$ export PCCS_ADMIN_TOKEN_HASH=$(echo -n "$PCCS_ADMIN_TOKEN" | sha512sum | tr -d '[:space:]-')

$ export PCCS_PEM_CERT_PATH=$(mktemp -d)

Note

This directory is automatically deleted at reboot. To re-use the PCCS certificate and key, you must create a persistent directory.
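
The token hash commands in this step pipe the token through `sha512sum` and then remove whitespace and the trailing hyphen. The following sketch applies the same pipeline to a sample token and checks the format of the result. The variable names are illustrative, and `mytoken` is only the default value shown earlier, not a recommended token.

```shell
# Same pipeline as the hash commands in this step, applied to a sample token.
sample_token='mytoken'
sample_hash="$(printf '%s' "$sample_token" | sha512sum | tr -d '[:space:]-')"

# A SHA-512 digest is 64 bytes, so the cleaned-up value has 128 hexadecimal characters.
echo "hash length: ${#sample_hash}"

# Report whether the value consists of exactly 128 lowercase hex characters.
if printf '%s' "$sample_hash" | grep -Eq '^[0-9a-f]{128}$'; then
  echo "hash format looks valid"
else
  echo "unexpected hash format"
fi
```

A value with a different length usually means that the `tr` cleanup step was skipped or that the variable was empty when the hash was computed.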

Generate an RSA private key and a self-signed PCCS certificate by running the following command:

$ openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 \
  -keyout $PCCS_PEM_CERT_PATH/private.pem \
  -out $PCCS_PEM_CERT_PATH/certificate.pem \
  -subj "/C=US/ST=Denial/L=Springfield/O=Dis/CN=www.example.com"

Set the PCCS certificate variables by running the following commands:

$ export PCCS_PEM=$(cat "$PCCS_PEM_CERT_PATH"/private.pem | base64 | tr -d '\n')

$ export PCCS_CERT=$(cat "$PCCS_PEM_CERT_PATH"/certificate.pem | base64 | tr -d '\n')
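
The `tr -d '\n'` stage in the commands above matters because some `base64` implementations, such as GNU coreutils, wrap long output across several lines. The following sketch, assuming a `base64` command that supports `-d`, shows that the stripped value stays on one line and still decodes to the original data. The sample data and variable names are illustrative.

```shell
# Build a 200-byte sample so that the encoded form is long enough to be wrapped.
sample="$(head -c 200 /dev/zero | tr '\0' 'A')"

# Encode and strip any line breaks, as in the commands above.
single_line="$(printf '%s' "$sample" | base64 | tr -d '\n')"

# 200 input bytes encode to 268 Base64 characters.
echo "encoded length: ${#single_line}"

# Decoding the single-line value returns the original data.
restored="$(printf '%s' "$single_line" | base64 -d)"
if [ "$restored" = "$sample" ]; then
  echo "round trip OK"
fi
```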

Create the PCCS secrets by running the following command:

$ oc create secret generic pccs-secrets \
    --namespace intel-dcap \
    --from-literal=PCCS_API_KEY="$PCCS_API_KEY" \
    --from-literal=PCCS_USER_TOKEN_HASH="$PCCS_USER_TOKEN_HASH" \
    --from-literal=USER_TOKEN="$PCCS_USER_TOKEN" \
    --from-literal=PCCS_ADMIN_TOKEN_HASH="$PCCS_ADMIN_TOKEN_HASH"

Create the PCCS by running the following command:

$ oc apply -f <(curl -sSf https://github.com/openshift/sandboxed-containers-operator/blob/v{osc-version-z}/scripts/install-helpers/baremetal-coco/intel-dcap/pccs.yaml.in|envsubst)

Configure the PCCS deployment to use the dedicated service account by running the following command:
```
$ oc set serviceaccount deployment/pccs pccs-sa -n intel-dcap
```

Create the QGS by running the following command:

$ oc apply -f https://github.com/openshift/sandboxed-containers-operator/blob/v{osc-version-z}/scripts/install-helpers/baremetal-coco/intel-dcap/qgs.yaml

Configure the QGS DaemonSet to use the dedicated service account by running the following command:
```
$ oc set serviceaccount daemonset/tdx-qgs qgs-sa -n intel-dcap
```

3.3. Enabling confidential containers

You enable confidential containers and specify the deployment mode by creating an osc-feature-gates config map.

The deployment mode determines how the Operator installs and configures the Kata runtime. This flexibility allows the Operator to work consistently in clusters with or without the Machine Config Operator (MCO).

MachineConfig: For clusters that use the Machine Config Operator (MCO). If the deploymentMode key is missing in the config map, the Operator defaults to MachineConfig for backward compatibility.
DaemonSet: For clusters without the MCO. The Operator uses a DaemonSet to install kata-containers RPMs and manage CRI-O configuration by using host drop-in files. Installation progress is tracked through node labels (for example, installing, installed).
DaemonSetFallback: Enables conditional deployment based on the cluster environment. When set, the Operator checks for the presence of the MCO. It uses DaemonSet if the MCO is unavailable and MachineConfig otherwise.
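
The selection rules for the three deployment modes can be summarized as a small decision function. The following is an illustrative model of the behavior described above, not Operator code. The function name and its arguments are assumptions.

```shell
#!/usr/bin/env bash
# Illustrative model of the selection rules described above.
# $1: value of the deploymentMode key (empty string if the key is missing)
# $2: "true" if the Machine Config Operator is present, any other value otherwise
effective_deployment_mode() {
  local requested="$1" mco_present="$2"
  case "$requested" in
    ""|MachineConfig)
      echo "MachineConfig" ;;
    DaemonSet)
      echo "DaemonSet" ;;
    DaemonSetFallback)
      if [ "$mco_present" = "true" ]; then
        echo "MachineConfig"
      else
        echo "DaemonSet"
      fi ;;
    *)
      echo "unsupported deploymentMode: $requested" >&2
      return 1 ;;
  esac
}

effective_deployment_mode "" true                  # key missing
effective_deployment_mode DaemonSetFallback false  # cluster without the MCO
```

The first call prints `MachineConfig` and the second prints `DaemonSet`.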

Procedure

Create a my-feature-gate.yaml manifest file:

apiVersion: v1
kind: ConfigMap
metadata:
  name: osc-feature-gates
  namespace: openshift-sandboxed-containers-operator
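data:
  confidential: "true"
  deploymentMode: <deployment_mode>

<deployment_mode>: Specify MachineConfig, DaemonSet, or DaemonSetFallback. If you omit the deploymentMode key, the Operator defaults to MachineConfig.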

Create the config map by running the following command:
```
$ oc create -f my-feature-gate.yaml
```
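
If you generate the manifest from a script, you can reject an unsupported deployment mode before you create the config map. The following sketch writes the same manifest as the example above. The helper name `write_feature_gate_manifest` and the temporary file are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative helper: writes the osc-feature-gates manifest to file $2,
# accepting only the three deployment modes listed in this section.
write_feature_gate_manifest() {
  local mode="$1" out="$2"
  case "$mode" in
    MachineConfig|DaemonSet|DaemonSetFallback) ;;
    *) echo "invalid deploymentMode: $mode" >&2; return 1 ;;
  esac
  cat > "$out" <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: osc-feature-gates
  namespace: openshift-sandboxed-containers-operator
data:
  confidential: "true"
  deploymentMode: $mode
EOF
}

# Write the manifest to a temporary file and display it.
manifest="$(mktemp)"
write_feature_gate_manifest MachineConfig "$manifest" && cat "$manifest"
```

You could then pass the generated file to `oc create -f` in place of my-feature-gate.yaml.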

3.4. Initializing pods at runtime by using initdata

You can initialize a pod with workload-specific data at runtime by creating and applying initdata.

This approach enhances security by reducing the exposure of confidential information and improves flexibility by eliminating custom image builds. For example, initdata can include three configuration settings:

An X.509 certificate for secure communication.
A cryptographic key for authentication.
An optional Kata Agent policy.rego file to enforce runtime behavior when overriding the default Kata Agent policy.

The initdata content configures the following components:

Attestation Agent (AA), which verifies the trustworthiness of the pod by sending evidence for attestation.
Confidential Data Hub (CDH), which manages secrets and secure data access within the pod VM.
Kata Agent, which enforces runtime policies and manages the lifecycle of the containers inside the pod VM.

You create an initdata.toml file and convert it to a gzip-format Base64-encoded string.

You apply initdata to a confidential containers pod by adding an annotation to the pod manifest.
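
The conversion to a gzip-format Base64-encoded string can be sketched as follows, assuming `gzip`, `gunzip`, and a `base64` command that supports `-d` are available. The file content is a placeholder only. A real initdata.toml follows the initdata specification, and the variable names are illustrative.

```shell
# Work in a temporary directory so that the example leaves no files behind.
workdir="$(mktemp -d)"

# Placeholder content only, not a valid initdata document.
printf '%s\n' '# placeholder initdata content for demonstration' > "$workdir/initdata.toml"

# Compress with gzip, then Base64-encode onto a single line.
initdata_b64="$(gzip -c "$workdir/initdata.toml" | base64 | tr -d '\n')"
echo "$initdata_b64"

# Reverse the steps to confirm that the string decodes to the original file.
if printf '%s' "$initdata_b64" | base64 -d | gunzip -c | diff - "$workdir/initdata.toml"; then
  echo "round trip OK"
fi

rm -rf "$workdir"
```

The resulting single-line string is the value that you add to the pod manifest annotation.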

3.5. Create the KataConfig custom resource

You must create the KataConfig custom resource (CR) to install kata-cc as a runtime class on your worker nodes.

Prerequisites

Be aware that creating the KataConfig CR automatically reboots the worker nodes. The reboot can take from 10 to more than 60 minutes depending on your deployment size, hardware type, and other factors.

Procedure

Create an example-kataconfig.yaml manifest file according to the following example:

apiVersion: kataconfiguration.openshift.io/v1
kind: KataConfig
metadata:
  name: example-kataconfig
spec:
  enablePeerPods: false
  checkNodeEligibility: true
  logLevel: info
#  kataConfigPoolSelector:
#    matchLabels:
#      <label_key>: '<label_value>'

<label_key>: '<label_value>': Optional: If you have applied node labels to install kata-cc on specific nodes, specify the key and value, for example, kata-cc: 'true'.

Create the KataConfig CR by running the following command:
```
$ oc create -f example-kataconfig.yaml
```
The new KataConfig CR is created and installs kata-cc as a runtime class on the worker nodes.
Wait for the kata-cc installation to complete and the worker nodes to reboot before verifying the installation.
Monitor the installation progress by running the following command:
```
$ watch "oc describe kataconfig | sed -n /^Status:/,/^Events/p"
```
The kata-cc runtime is installed on the cluster when the status of all workers under kataNodes is installed and the condition InProgress is False with no reason specified.
Verify the runtime classes by running the following command:
```
$ oc get runtimeclass
```
Example output
```
NAME      HANDLER    AGE
kata      kata       34m
kata-cc   kata-tdx   152m
```
You can also see the default kata runtime class in addition to kata-cc.
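
A workload selects the confidential containers runtime through the runtime class in its pod specification. The following sketch writes a minimal pod manifest that requests the kata-cc runtime class. The pod name, container name, and image are hypothetical, and the manifest omits the initdata annotation and other settings that a real confidential containers workload needs.

```shell
# Write a minimal pod manifest to a temporary file.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: example-cc-pod            # hypothetical name
spec:
  runtimeClassName: kata-cc       # runtime class installed by the KataConfig CR
  containers:
    - name: app                   # hypothetical container name
      image: registry.example.com/my-app:latest   # hypothetical image
EOF

# Show the line that selects the runtime class.
grep -n 'runtimeClassName' "$manifest"
```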

3.5.1. The checkNodeEligibility parameter
Copy link

To manage node selection for your workloads, configure the checkNodeEligibility parameter in the KataConfig resource. This parameter determines whether runtime classes are created based on hardware labels or unconditionally. Starting with OpenShift sandboxed containers 1.12.0, it applies to all standard and confidential container (CC) runtime classes.

When checkNodeEligibility is set to true

The Operator performs the following actions:

Node eligibility verification: The Operator verifies that nodes have the required hardware capabilities by using node labels before installing the Kata runtime.
Conditional runtime class creation: The Operator creates runtime classes only if nodes with the required labels exist in the cluster:
- Standard runtime classes: The kata or kata-nvidia-gpu runtime classes are created only if nodes with the required base and GPU labels exist.
- Confidential container runtime classes: The kata-cc or kata-cc-nvidia-gpu runtime classes are created only if nodes with the required Trusted Execution Environment (TEE) labels (such as Intel® Trust Domain Extensions (TDX) or AMD SEV-SNP) and the corresponding confidential containers and GPU labels exist.
Dynamic runtime class management: If no nodes match the required labels, the corresponding runtime class is not created. This prevents workload scheduling failures by ensuring users cannot select a runtime that the cluster cannot support.

When checkNodeEligibility is set to false (default)

The Operator performs the following actions:

Unconditional creation for standard runtimes: The Operator always creates the kata and kata-nvidia-gpu runtime classes, regardless of whether nodes currently have the required hardware labels.
Identification-based creation for CC runtimes: For the kata-cc and kata-cc-nvidia-gpu runtime classes, the Operator still depends on the TEE label for identification, but it does not verify the base or GPU labels during the installation phase.
Manual scheduling: The Operator skips the detailed node label check during installation. The cluster will only schedule pods using these runtime classes if a node eventually matches the nodeSelector defined in the runtime class.
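
The following Python sketch models the decision described above. It is an illustration only, not the Operator's implementation: the label group names (base, gpu, cc_gpu, tee) and the mapping of runtime classes to groups are assumptions made for this example, and each node is reduced to the set of label groups it satisfies.

```python
from typing import Dict, List, Set

# Assumed label groups per runtime class (hypothetical, for illustration).
REQUIRED_GROUPS: Dict[str, Set[str]] = {
    "kata": {"base"},
    "kata-nvidia-gpu": {"base", "gpu"},
    "kata-cc": {"base", "tee"},
    "kata-cc-nvidia-gpu": {"base", "gpu", "cc_gpu", "tee"},
}


def runtime_classes_to_create(nodes: List[Set[str]],
                              check_node_eligibility: bool) -> List[str]:
    """Return the runtime classes that this simplified model would create.

    nodes: one set per node, naming the label groups that the node satisfies.
    """
    created = []
    for name, groups in REQUIRED_GROUPS.items():
        is_confidential = "tee" in groups
        if check_node_eligibility:
            # true: every required label group must be present on some node.
            needed = groups
        elif is_confidential:
            # false: CC runtimes still depend on the TEE label only.
            needed = {"tee"}
        else:
            # false: standard runtimes are created unconditionally.
            needed = set()
        if not needed or any(needed <= node for node in nodes):
            created.append(name)
    return created
```

For example, with one plain Kata node and one TDX-capable node, runtime_classes_to_create([{"base"}, {"base", "tee"}], True) yields only kata and kata-cc in this model, whereas passing False yields all four runtime classes.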

Additional resources

Node Feature Discovery Operator

3.6. Create initdata

You create initdata to securely initialize a pod with sensitive or workload-specific data at runtime, thus avoiding the need to embed this data in a virtual machine image. This approach provides additional security by reducing the risk of exposure of confidential information and eliminates the need for custom image builds.

Prerequisites

You have installed the Node Feature Discovery (NFD) Operator. For more information, see Node Feature Discovery Operator in the OpenShift Container Platform documentation.
You have deleted the kbs_cert setting if you configure insecure_http = true in the kbs-config config map for Red Hat build of Trustee.

Procedure

Obtain the Red Hat build of Trustee URL by running the following command:

$ TRUSTEE_URL=$(oc get route kbs-service \
  -n trustee-operator-system -o jsonpath='{.spec.host}') \
  && echo $TRUSTEE_URL

Create the initdata.toml file:

algorithm = "<algorithm>"
version = "0.1.0"
[data]
"aa.toml" = '''
[token_configs]
[token_configs.coco_as]

url = '<trustee_url>'

[token_configs.kbs]
url = '<trustee_url>'
'''
"cdh.toml" = '''
socket = 'unix:///run/confidential-containers/cdh.sock'
credentials = []

[kbc]
name = 'cc_kbc'
url = '<trustee_url>'
kbs_cert = """
-----BEGIN CERTIFICATE-----
<kbs_certificate>
-----END CERTIFICATE-----
"""
[image]
image_security_policy_uri = 'kbs:///default/<secret-policy-name>/<key>'
'''

"policy.rego" = '''
package agent_policy

default AddARPNeighborsRequest := true
default AddSwapRequest := true
default CloseStdinRequest := true
default CopyFileRequest := true
default CreateContainerRequest := true
default CreateSandboxRequest := true
default DestroySandboxRequest := true
default GetMetricsRequest := true
default GetOOMEventRequest := true
default GuestDetailsRequest := true
default ListInterfacesRequest := true
default ListRoutesRequest := true
default MemHotplugByProbeRequest := true
default OnlineCPUMemRequest := true
default PauseContainerRequest := true
default PullImageRequest := true
default ReadStreamRequest := false
default RemoveContainerRequest := true
default RemoveStaleVirtiofsShareMountsRequest := true
default ReseedRandomDevRequest := true
default ResumeContainerRequest := true
default SetGuestDateTimeRequest := true
default SignalProcessRequest := true
default StartContainerRequest := true
default StartTracingRequest := true
default StatsContainerRequest := true
default StopTracingRequest := true
default TtyWinResizeRequest := true
default UpdateContainerRequest := true
default UpdateEphemeralMountsRequest := true
default UpdateInterfaceRequest := true
default UpdateRoutesRequest := true
default WaitProcessRequest := true
default ExecProcessRequest := false
default SetPolicyRequest := false
default WriteStreamRequest := false

'''

algorithm: Specify sha256, sha384, or sha512.
url: Specify the Red Hat build of Trustee URL.
<kbs_certificate>: Specify the Base64-encoded TLS certificate for the attestation agent.
kbs_cert: Delete the kbs_cert setting if you configure insecure_http = true in the kbs-config config map for Red Hat build of Trustee.
image_security_policy_uri: Optional: Specify this setting only if you enabled the container image signature verification policy. Replace <secret-policy-name> and <key> with the secret name and key, respectively, that you specified in Creating the KbsConfig custom resource.

Convert the initdata.toml file to a gzipped, Base64-encoded string in a text file by running the following command:
```
$ cat initdata.toml | gzip | base64 -w0 > initdata.txt
```
Record this string to use in the pod manifest.
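Calculate the hash of the initdata.toml file and assign its value to the hash variable by running the following command. Replace <algorithm> with the hashing command that matches the algorithm value in initdata.toml, for example, sha256sum: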
```
$ hash=$(<algorithm> initdata.toml | cut -d' ' -f1)
```
Assign 32 bytes of 0s to the initial_pcr variable by running the following command:
```
$ initial_pcr=0000000000000000000000000000000000000000000000000000000000000000
```
Calculate the SHA-256 hash of hash and initial_pcr and assign its value to the PCR8_HASH variable by running the following command:
```
$ PCR8_HASH=$(echo -n "$initial_pcr$hash" | xxd -r -p | sha256sum | cut -d' ' -f1) && echo $PCR8_HASH
```
Record the PCR8_HASH value for the RVPS config map.
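
The encoding and hashing steps in this procedure can also be sketched in Python. This is a minimal illustration that assumes only the Python standard library; the function names are hypothetical and are not part of OpenShift sandboxed containers or Red Hat build of Trustee.

```python
import base64
import gzip
import hashlib


def encode_initdata(toml_bytes: bytes) -> str:
    """Gzip and Base64-encode the initdata, like `gzip | base64 -w0`."""
    # mtime=0 keeps the gzip header stable between runs.
    return base64.b64encode(gzip.compress(toml_bytes, mtime=0)).decode("ascii")


def decode_initdata(encoded: str) -> bytes:
    """Reverse encode_initdata, for example to inspect an annotation value."""
    return gzip.decompress(base64.b64decode(encoded))


def pcr8_hash(toml_bytes: bytes, algorithm: str = "sha256") -> str:
    """Extend a zeroed 32-byte register with the initdata digest.

    Mirrors the shell steps: hash the file with the chosen algorithm,
    prepend 32 zero bytes, and take the SHA-256 of the concatenation.
    """
    digest = hashlib.new(algorithm, toml_bytes).digest()
    initial_pcr = bytes(32)  # 32 bytes of zeros
    return hashlib.sha256(initial_pcr + digest).hexdigest()
```

The Base64 string produced here can differ byte for byte from the output of the gzip command because the gzip header fields differ, but both decode to the same initdata.toml content. The value returned by pcr8_hash depends only on the file content and the algorithm.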

3.7. Applying initdata to a pod

Prerequisite

You have created an initdata string.

Procedure

Add the initdata string to the pod manifest and save the file as my-pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: ocp-cc-pod
  labels:
    app: ocp-cc-pod
  annotations:
    io.katacontainers.config.hypervisor.cc_init_data: <initdata_string>
spec:
  runtimeClassName: kata-cc
  containers:
  - name: <container_name>
    image: registry.access.redhat.com/ubi9/ubi:latest
    command:
    - sleep
    - "36000"
    securityContext:
      privileged: false
      seccompProfile:
        type: RuntimeDefault

where

<initdata_string>: Specify the gzipped, Base64-encoded initdata value in a pod annotation to override the global INITDATA setting in the peer pods config map.
<container_name>: Specify a container name.

Create the pod by running the following command:
```
$ oc create -f my-pod.yaml
```

3.8. Verifying attestation

You can verify the attestation process by creating a test pod to retrieve a specific resource from Red Hat build of Trustee.

Important

This procedure is an example to verify that attestation is working. Do not write sensitive data to standard I/O, because the data can be captured by using a memory dump. Only data written to memory is encrypted.

Procedure

Create a test-pod.yaml manifest file:

apiVersion: v1
kind: Pod
metadata:
  name: ocp-cc-pod
  labels:
    app: ocp-cc-pod
  annotations:
    io.katacontainers.config.hypervisor.cc_init_data: "<initdata_string>"
spec:
  runtimeClassName: kata-cc
  containers:
    - name: skr-openshift
      image: registry.access.redhat.com/ubi9/ubi:latest
      command:
        - sleep
        - "36000"
      securityContext:
        privileged: false
        seccompProfile:
          type: RuntimeDefault

where:

io.katacontainers.config.hypervisor.cc_init_data: Optional: Specifies initdata in a pod annotation, which overrides the global INITDATA setting in the peer pods config map.

Create the pod by running the following command:
```
$ oc create -f test-pod.yaml
```
Log in to the pod by running the following command:
```
$ oc exec -it ocp-cc-pod -- bash
```
Fetch the Red Hat build of Trustee resource by running the following command:
```
$ curl http://127.0.0.1:8006/cdh/resource/default/attestation-status/status
```
Example output
```
success #/
```

3.9. Configuring your workload

You configure your workload for confidential containers by setting kata-cc as the runtime class for the following pod-templated objects:

Pod objects
ReplicaSet objects
ReplicationController objects
StatefulSet objects
Deployment objects
DeploymentConfig objects

Important

Do not deploy workloads in an Operator namespace. Create a dedicated namespace for these resources.

Prerequisites

You have created the KataConfig custom resource (CR).

Procedure

Add spec.runtimeClassName: kata-cc to the manifest of each pod-templated workload object as in the following example:
```
apiVersion: v1
kind: <object>
# ...
spec:
  runtimeClassName: kata-cc
# ...
```
Apply the changes to the workload object by running the following command:
```
$ oc apply -f <object.yaml>
```
OpenShift Container Platform creates the workload object and begins scheduling it.

Verification

Inspect the spec.runtimeClassName field of a pod-templated object. If the value is kata-cc, then the workload is running on confidential containers.
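
As an illustration of where the field lives, the following Python sketch sets the runtime class on a manifest that is already loaded as a dictionary. It relies on the standard Kubernetes layout, in which a Pod carries the pod spec directly under spec and controller objects carry it under spec.template.spec. The function name and the list of kinds are assumptions made for this example.

```python
import copy

# Kinds that wrap a pod template, taken from the list in this section.
POD_TEMPLATED_KINDS = {
    "ReplicaSet",
    "ReplicationController",
    "StatefulSet",
    "Deployment",
    "DeploymentConfig",
}


def set_runtime_class(manifest: dict, runtime_class: str = "kata-cc") -> dict:
    """Return a copy of the manifest with runtimeClassName set on the pod spec."""
    result = copy.deepcopy(manifest)
    kind = result.get("kind")
    if kind == "Pod":
        pod_spec = result.setdefault("spec", {})
    elif kind in POD_TEMPLATED_KINDS:
        template = result.setdefault("spec", {}).setdefault("template", {})
        pod_spec = template.setdefault("spec", {})
    else:
        raise ValueError(f"unsupported kind: {kind!r}")
    pod_spec["runtimeClassName"] = runtime_class
    return result
```

The function copies the input so that the original manifest stays unchanged, which makes it easy to compare the two versions before applying the result.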

3.9.1. Encrypt the block volumes

You must encrypt volumes inside the TEE to ensure data stays private. Rather than relying on host-level CSI drivers, you attach raw blocks, use an init container for Linux Unified Key Setup (LUKS) formatting, and mount to your app by using shared namespaces and hooks. This keeps data secure in use, in memory, and at rest.

Prerequisites

You have installed the Container Storage Interface (CSI) driver configured for raw block volumes. For more information, see Understanding persistent storage.
You have installed OpenShift sandboxed containers on a bare-metal server.
You have configured an attestation service, such as Red Hat build of Trustee, to provide secrets like the encryption passphrase.

Procedure

Create a storage-encrypted.yaml manifest file for the PersistentVolumeClaim object with the volumeMode parameter set to Block:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: storage-encrypted
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  resources:
    requests:
      storage: <size>

Create the PersistentVolumeClaim object by running the following command:
```
$ oc create -f storage-encrypted.yaml
```

Create an encrypted-pod.yaml manifest file with the complete pod specification:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    io.katacontainers.config.hypervisor.cc_init_data: <init_data>
  name: storage-encrypted
  labels:
    app: storage-encrypted
spec:
  runtimeClassName: kata-cc
  shareProcessNamespace: true
  initContainers:
  - name: format-disk
    image: quay.io/redhat-user-workloads/ose-osc-tenant/osc-storage-helper:on-pr-3fe822e41e1bd31cd2cbfd9468ab087abc58d9a1-linux-x86-64
    command: ["/usr/local/bin/luks-helper", "format-disk"]
    securityContext:
      privileged: true
    restartPolicy: Always
    env:
      - name: PASS
        valueFrom:
          secretKeyRef:
            name: <my_sealed_secret>
            key: <secret_key>
    volumeMounts:
      - name: storage-ipc
        mountPath: /dev/shm
    volumeDevices:
      - name: luks-block
        devicePath: /dev/block-device
  - name: check-ready
    image: quay.io/redhat-user-workloads/ose-osc-tenant/osc-storage-helper:on-pr-3fe822e41e1bd31cd2cbfd9468ab087abc58d9a1-linux-x86-64
    command: ["/usr/local/bin/luks-helper", "wait-ready"]
    securityContext:
      privileged: true
    volumeMounts:
      - name: storage-ipc
        mountPath: /dev/shm
  containers:
  - name: <container_name>
    image: <image_name>
    ports:
      - containerPort: 8888
    env:
      - name: DATA_DIR
        value: <mount_point>
    lifecycle:
      postStart:
        exec:
          command:
            - /bin/sh
            - -c
            - |
              PID=$(cat /dev/shm/luks-helper.pid)
              chmod ug+w "$(dirname "$DATA_DIR")"
              ln -sfn "/proc/$PID/root/mnt/storage" "$DATA_DIR"
    securityContext:
      privileged: true
    volumeMounts:
      - name: storage-ipc
        mountPath: /dev/shm
  volumes:
  - name: luks-block
    persistentVolumeClaim:
      claimName: storage-encrypted
  - name: storage-ipc
    emptyDir:
      medium: Memory

where:

<init_data>: Specifies the initdata for the runtime configuration.
<my_sealed_secret>: Specifies the name of the sealed secret that contains the LUKS encryption passphrase.
<secret_key>: Specifies the key within the sealed secret that contains the encryption passphrase.
<container_name>: Specifies the container name for your application container.
<image_name>: Specifies the image name.
<mount_point>: Specifies the mount point for encrypted storage inside your application container.

Create the pod by running the following command:
```
$ oc create -f encrypted-pod.yaml
```

3.10. Configuring confidential containers for NVIDIA GPUs

Configure confidential containers to use NVIDIA graphics processing units (GPUs). By configuring the required Operators and custom resources, you can provision both regular and confidential GPUs for your sandboxed workloads.

3.10.1. NVIDIA GPUs as trusted execution environments

Use NVIDIA graphics processing units (GPUs) as a trusted execution environment (TEE) to provide hardware-based isolation for your confidential workloads. Leveraging NVIDIA GPUs within a TEE protects data and code in memory from unauthorized access or tampering, even from privileged users or the host operating system.

When you deploy confidential containers on bare-metal servers with NVIDIA GPU support, you must manually configure the MachineConfig with the required kernel arguments for GPU integration. After configuring the MachineConfig, verify that the kernel arguments are correctly applied to the machine config pool where Kata containers and GPU support are configured to run.

3.10.2. Create a MachineConfig for NVIDIA GPUs

Enable Input-Output Memory Management Unit (IOMMU) kernel parameters on your worker nodes. This configuration helps you support GPU pass-through for your sandboxed containers.

Prerequisites

You have access to the cluster as a user with the cluster-admin role.

Procedure

Create a gpu-machine-config.yaml manifest file according to the following example:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 100-iommu-kernel-args
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
  - amd_iommu=on
  - intel_iommu=on

Note

If using Single Node OpenShift (SNO), replace worker with master in the machineconfiguration.openshift.io/role label.

The nodes will reboot after applying this configuration.

Create the MachineConfig object by running the following command:
```
$ oc create -f gpu-machine-config.yaml
```

Verification

Verify that the kernel parameters are set by starting a debug session on a node and then checking the kernel command line from within that session:
```
$ oc debug node/<node_name>
```
```
$ cat /proc/cmdline | grep iommu
```
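
If you want to check the result programmatically, the following Python sketch parses a kernel command line string, such as the content of /proc/cmdline, and reports the two IOMMU arguments set by the MachineConfig example. The function name is hypothetical.

```python
def iommu_args(cmdline: str) -> dict:
    """Return the amd_iommu and intel_iommu arguments found in a command line."""
    found = {}
    for token in cmdline.split():
        # Kernel arguments have the form key=value or a bare key.
        key, _, value = token.partition("=")
        if key in ("amd_iommu", "intel_iommu"):
            found[key] = value
    return found
```

A node configured as in the example is expected to report the value on for both keys. An empty dictionary means that neither argument is present on the command line.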

3.10.3. Install the Node Feature Discovery Operator

Install the Node Feature Discovery (NFD) Operator to detect hardware features and system configurations on your cluster nodes. This tool enables automatic labeling based on the detected features.

Prerequisites

You have installed the OpenShift sandboxed containers Operator.
You have access to the cluster as a user with the cluster-admin role.

Procedure

Install the Node Feature Discovery (NFD) Operator by following the OpenShift Container Platform documentation.

Verification

Verify the NFD Operator is active by running the following command:

$ oc get pods -n openshift-nfd

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
nfd-controller-manager-5d8d9d9f8b-abcde   2/2     Running   0          2m

3.10.4. Create a node feature rule for NVIDIA GPUs

Create a NodeFeatureRule custom resource to match NVIDIA kernel modules on your cluster. This custom resource enables the automatic labeling of nodes with compatible NVIDIA graphics processing units.

Prerequisites

You have created the NodeFeatureDiscovery custom resource. For more information, see Create the NodeFeatureDiscovery custom resource.
You have access to the cluster as a user with the cluster-admin role.

Procedure

Create a my-nfd-gpu.yaml manifest file for a NodeFeatureRule custom resource that matches NVIDIA kernel modules, according to the following example:

apiVersion: nfd.openshift.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: nvidia-kernel-modules
spec:
  rules:
    - name: kernel-module-gdrdrv
      labels:
        nvidia.com/gdrcopy.capable: "true"
      matchFeatures:
        - feature: kernel.loadedmodule
          matchExpressions:
            gdrdrv:
              op: Exists
    - name: kernel-module-nvidia_fs
      labels:
        nvidia.com/gds.capable: "true"
      matchFeatures:
        - feature: kernel.loadedmodule
          matchExpressions:
            nvidia_fs:
              op: Exists
    - name: kernel-module-nvidia_peermem
      labels:
        nvidia.com/peermem.capable: "true"
      matchFeatures:
        - feature: kernel.loadedmodule
          matchExpressions:
            nvidia_peermem:
              op: Exists

Create the NodeFeatureRule CR by running the following command:
```
$ oc create -f my-nfd-gpu.yaml
```

3.10.5. Install the NVIDIA GPU Operator

You must install the NVIDIA GPU Operator to manage GPU resources in your cluster.

Prerequisites

You have created the NodeFeatureRule custom resource for NVIDIA GPUs.
You have access to the cluster as a user with the cluster-admin role.

Procedure

Install the NVIDIA GPU Operator version 26.3.0. For detailed installation instructions, see the NVIDIA GPU Operator documentation.

Verification

Verify that the GPU Operator pods are running by running the following command:

$ oc get pods -n nvidia-gpu-operator

Example output

NAME                                       READY   STATUS    RESTARTS   AGE
gpu-operator-1234567890-abcde              1/1     Running   0          10m

3.10.6. Create the ClusterPolicy CR for NVIDIA GPUs

Create a ClusterPolicy custom resource to configure the NVIDIA GPU Operator. This policy helps you correctly set up and manage the Operator for use with OpenShift sandboxed containers.

Prerequisites

You have installed the NVIDIA GPU Operator.
You have access to the cluster as a user with the cluster-admin role.

Procedure

Create a my-cluster-policy-gpu.yaml manifest file according to the following example:

apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  ccManager:
    defaultMode: "on"
    enabled: true
  cdi:
    default: false
    enabled: true
    nriPluginEnabled: false
  daemonsets:
    rollingUpdate:
      maxUnavailable: '1'
    updateStrategy: RollingUpdate
  dcgm:
    enabled: false
  dcgmExporter:
    config:
      name: ''
    enabled: false
    serviceMonitor:
      enabled: true
  devicePlugin:
    config:
      default: ''
      name: ''
    enabled: false
    mps:
      root: /run/nvidia/mps
  driver:
    certConfig:
      name: ''
    enabled: false
    kernelModuleConfig:
      name: ''
    kernelModuleType: auto
    licensingConfig:
      configMapName: ''
      nlsEnabled: true
    repoConfig:
      configMapName: ''
    upgradePolicy:
      autoUpgrade: true
      drain:
        deleteEmptyDir: false
        enable: false
        force: false
        timeoutSeconds: 300
      maxParallelUpgrades: 1
      maxUnavailable: 25%
      podDeletion:
        deleteEmptyDir: false
        force: false
        timeoutSeconds: 300
      waitForCompletion:
        timeoutSeconds: 0
    useNvidiaDriverCRD: false
    useOpenKernelModules: false
    virtualTopology:
      config: ''
  gdrcopy:
    enabled: false
  gds:
    enabled: false
  gfd:
    enabled: true
  kataManager:
    enabled: false
  mig:
    strategy: single
  migManager:
    enabled: false
  nodeStatusExporter:
    enabled: true
  operator:
    defaultRuntime: crio
    initContainer: {}
    runtimeClass: nvidia
    use_ocp_driver_toolkit: true
  kataSandboxDevicePlugin:
    enabled: true
    env:
      - name: P_GPU_ALIAS
        value: pgpu
      - name: NVSWITCH_ALIAS
        value: nvswitch
  sandboxWorkloads:
    defaultWorkload: vm-passthrough
    enabled: true
    mode: kata
  toolkit:
    enabled: false
    installDir: /usr/local/nvidia
  validator:
    plugin:
      env:
        - name: WITH_WORKLOAD
          value: 'false'
  vfioManager:
    enabled: true
    env:
      - name: BIND_NVSWITCHES
        value: 'true'
  vgpuDeviceManager:
    enabled: false
  vgpuManager:
    enabled: false

Create the ClusterPolicy CR by running the following command:
```
$ oc create -f my-cluster-policy-gpu.yaml
```

Verification

Verify that the required labels are present on your worker nodes by running the following command:

$ oc get nodes -o json | jq '.items[].metadata.labels | with_entries(select(.key | startswith("nvidia.com")))'

"nvidia.com/cc.mode.state"
"nvidia.com/cc.ready.state"
"nvidia.com/gpu.deploy.cc-manager"
"nvidia.com/gpu.deploy.kata-manager"
"nvidia.com/gpu.deploy.kata-sandbox-device-plugin"
"nvidia.com/gpu.deploy.sandbox-validator"
"nvidia.com/gpu.deploy.vfio-manager"
"nvidia.com/gpu.present"

Verify the GPU Operator setup by running the following command:

$ oc get pods -n nvidia-gpu-operator

Example output

NAME                                                READY   STATUS    RESTARTS      AGE
gpu-operator-cb99f5757-djl7k                        1/1     Running   2             16h
nvidia-cc-manager-hjd6t                             1/1     Running   5 (42m ago)   16h
nvidia-kata-sandbox-device-plugin-daemonset-wn6bc   1/1     Running   2             16h
nvidia-sandbox-validator-7cvx5                      1/1     Running   0             70m
nvidia-vfio-manager-zsmqn                           1/1     Running   2             16h

Verify the CC Manager DaemonSet by running the following command:

$ oc get daemonset -n nvidia-gpu-operator | grep cc-manager

Example output

nvidia-cc-manager     1         1         1       1            1           nvidia.com/gpu.deploy.cc-manager=true   7m43s

Create a sample pod to test the GPU allocation.

Note

You must create the KataConfig custom resource before creating the GPU sample pod. See Creating the KataConfig custom resource.

For confidential GPUs, create the following pod:

apiVersion: v1
kind: Pod
metadata:
  name: sample-gpu-pod
  annotations:
    io.katacontainers.config.hypervisor.default_memory: "32768"
    io.katacontainers.config.hypervisor.cc_init_data: "H4sIAAAAAAAAA6WUS2/TQBCA7/4Vli85xQ1IoKpSDyUJUIkQy07JoYqiyXpir7IPszsONb++YyIQINauxMGHnfk875kzOi+tiW/jZJa+SmdJBKqyTlKte5mv4fWbt0n0WALBLkoaq6ToUoeVTVg/mUyiBsQJKoz5M7S/AFFU4hFaRfFdWd7l2WeUVX2wzuf4tUVP8c1tTK7F37HiGzQh9VxZjwWV0gQJ23TvpcKg3iEQzq0hkAbdMFaAKQ/2KQQtWOhsN0Itn1Bkzgr0waw/IK2QnBRDxHq9Wp65tEGkly6Q81JBM5+kp3tD6I7A8QxRuW0pTKxQf7TUqLZ613Fuh2C510ZxmefZA/8RYjJo/XhHslape83DFQJyhLIg7lvQUY7ansc9XbCCQOEX6Ujaoy9qcLiyraFgSXL0iGXOs2D1As8DWKvHYyiQLv3kKdxIHUyauezHpgUBWRlQI/PHuToaj6mnNg6ENNUAQ/4FlmwzYmhD3bbfci+/B5N/aMqX7PIFWzY1anSghtt4gX/tyDA2vCRbkDRS+C0fWPz30PYnNQFIyWr188Q+kj2h2QtrjrLyu7/e6enAstap/mDXRM3N1RWLph7dWQpM2awnxKltuAxk3dR3/NY317NrPve9/SgRZf2nx9NB7CIDGnujQuz5nfyXj2f3zZjpaQYAAA=="
spec:
  runtimeClassName: kata-cc-nvidia-gpu
  restartPolicy: OnFailure
  containers:
    - name: gpu-cc-verifier
      image: quay.io/openshift_sandboxed_containers/gpu-verifier:ubi9
      imagePullPolicy: IfNotPresent
      command: ["/bin/bash"]
      args:
        - -c
        - |
          /opt/cuda-samples/Samples/0_Introduction/vectorAdd/build/vectorAdd
          sleep 36000
      resources:
        limits:
          nvidia.com/pgpu: 1
      securityContext:
        privileged: false

Note

The io.katacontainers.config.hypervisor.cc_init_data annotation embeds a kata-agent policy that is permissive for verification purposes but disables the "exec" and "log" APIs. The initdata does not include a Key Broker Service (KBS) URL, which avoids issues in environments where a preset KBS URL would not match the actual deployment.

Verify the sample pod is running successfully by running the following command:

$ oc get pods

Example output

NAME               READY   STATUS    RESTARTS   AGE
sample-gpu-pod     1/1     Running   0          2m

Check the pod logs to verify GPU functionality by running the following command:

$ oc logs sample-gpu-pod

Example output

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

3.10.7. Required node labels for GPU runtime classes

Apply specific labels to your worker nodes so you can use NVIDIA GPUs with OpenShift sandboxed containers. The NVIDIA GPU Operator typically adds these labels automatically when it detects compatible hardware configured for VFIO passthrough mode.

The required labels depend on whether you are deploying confidential GPUs.

Labels for confidential GPUs

For confidential GPU workloads using the kata-cc-nvidia-gpu runtime class, nodes must have the base Kata and GPU labels, plus additional labels for confidential computing and the Trusted Execution Environment (TEE). Nodes must have the following labels:

Base Kata label:
- feature.node.kubernetes.io/runtime.kata: "true"
Base GPU labels:
- nvidia.com/gpu.present: "true"
- nvidia.com/gpu.deploy.vfio-manager: "true"
- nvidia.com/gpu.deploy.kata-sandbox-device-plugin: "true"
Confidential computing GPU labels:
- nvidia.com/cc.mode.state: "on"
- nvidia.com/cc.ready.state: "true"
- nvidia.com/gpu.deploy.cc-manager: "true"
TEE label (one of the following):
- intel.feature.node.kubernetes.io/tdx: "true"
- amd.feature.node.kubernetes.io/snp: "true"
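
The label requirements above can be expressed as a small check. The following Python sketch is an illustration only: it takes the label keys and values from the list above, and the function name is hypothetical. It reports which requirements a node's labels do not meet for the kata-cc-nvidia-gpu runtime class.

```python
# Labels that must all be present with exactly these values.
REQUIRED_LABELS = {
    "feature.node.kubernetes.io/runtime.kata": "true",
    "nvidia.com/gpu.present": "true",
    "nvidia.com/gpu.deploy.vfio-manager": "true",
    "nvidia.com/gpu.deploy.kata-sandbox-device-plugin": "true",
    "nvidia.com/cc.mode.state": "on",
    "nvidia.com/cc.ready.state": "true",
    "nvidia.com/gpu.deploy.cc-manager": "true",
}

# At least one TEE label must be set to "true".
TEE_LABELS = (
    "intel.feature.node.kubernetes.io/tdx",
    "amd.feature.node.kubernetes.io/snp",
)


def missing_cc_gpu_labels(labels: dict) -> list:
    """Return the unmet label requirements for a confidential GPU node."""
    missing = [key for key, value in REQUIRED_LABELS.items()
               if labels.get(key) != value]
    if not any(labels.get(key) == "true" for key in TEE_LABELS):
        missing.append("one of: " + ", ".join(TEE_LABELS))
    return missing
```

You could pass the metadata.labels dictionary of a node, for example from the JSON output of oc get node, to this function. An empty list means that the node carries every label listed in this section.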

Chapter 4. Update

You update confidential containers by updating the OpenShift Container Platform cluster and the OpenShift sandboxed containers Operator.

You must perform the following steps:

Update your OpenShift Container Platform cluster to update the Kata runtime and its dependencies.
The sandboxed containers RHCOS extension contains the components required to run OpenShift sandboxed containers, such as the Kata Containers runtime, the QEMU hypervisor, and other dependencies. You update the extension by updating the cluster to a new release of OpenShift Container Platform.
Update the OpenShift sandboxed containers Operator.

4.1. Updating the OpenShift sandboxed containers Operator

You can update the OpenShift sandboxed containers Operator by using the command line interface (CLI).

Procedure

Create an osc-subscription.yaml manifest file:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: sandboxed-containers-operator
  namespace: openshift-sandboxed-containers-operator
spec:
  channel: stable
  installPlanApproval: Automatic
  name: sandboxed-containers-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: sandboxed-containers-operator.v1.12.0

Create the subscription by running the following command:
```
$ oc create -f osc-subscription.yaml
```
Verify that the Operator is correctly installed by running the following command:
```
$ oc get csv -n openshift-sandboxed-containers-operator
```
This command can take several minutes to complete.

Watch the process by running the following command:

$ watch oc get csv -n openshift-sandboxed-containers-operator

Example output

NAME                             DISPLAY                                  VERSION   REPLACES    PHASE
openshift-sandboxed-containers   openshift-sandboxed-containers-operator  1.12.0    1.11.1      Succeeded

Chapter 5. Uninstall

You uninstall confidential containers by deleting the workload pods, uninstalling the OpenShift sandboxed containers Operator, and deleting its resources.

You perform the following tasks:

Delete pods that use the kata-cc runtime class.
Important
You must delete the workload pods before you delete the KataConfig CR. The pod names usually have the prefix podvm and custom tags, if provided.
Delete the KataConfig custom resource (CR).
Uninstall the OpenShift sandboxed containers Operator.
Delete the KataConfig custom resource definition (CRD).

5.1. Deleting workload pods

You must delete your workload pods. The pod names usually have the prefix podvm and custom tags, if provided.

Prerequisites

You have installed the jq utility.

Procedure

Search for the pods by running the following command:

$ oc get pods -A -o json | jq -r '.items[] |
  select(.spec.runtimeClassName == "kata-cc").metadata.name'

Delete each pod by running the following command:
```
$ oc delete pod <pod>
```
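
Because the search covers all namespaces, it can help to list each pod together with its namespace. The following Python sketch applies the same filter to a pod list that is already parsed from JSON, for example the output of oc get pods -A -o json. The function name is hypothetical.

```python
def kata_cc_pods(pod_list: dict, runtime_class: str = "kata-cc") -> list:
    """Return (namespace, name) pairs for pods that use the given runtime class."""
    matches = []
    for item in pod_list.get("items", []):
        if item.get("spec", {}).get("runtimeClassName") == runtime_class:
            metadata = item["metadata"]
            matches.append((metadata["namespace"], metadata["name"]))
    return matches
```

Each returned pair identifies one pod to delete. Pods that do not set a runtime class are skipped.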

5.2. Deleting the KataConfig custom resource

You must delete the KataConfig custom resource (CR).

Deleting the KataConfig CR automatically reboots the worker nodes. The reboot can take from 10 to more than 60 minutes. The following factors can increase the reboot time:

A larger OpenShift Container Platform deployment with a greater number of worker nodes.
Activation of the BIOS and Diagnostics utility.
Deployment on a hard drive rather than an SSD.
Deployment on physical nodes such as bare metal, rather than on virtual nodes.
A slow CPU and network.

Prerequisites

You have deleted all pods that use the kata-cc runtime class.

Procedure

Delete the KataConfig CR by running the following command:
```
$ oc delete kataconfig example-kataconfig
```
The OpenShift sandboxed containers Operator removes all resources that were initially created to enable the runtime on your cluster.
Important
When you delete the KataConfig CR, the CLI stops responding until all worker nodes reboot. You must wait for the deletion process to complete before performing the verification.

Verify the CR removal by running the following command:

$ oc get kataconfig example-kataconfig

Example output

No example-kataconfig instances exist

5.3. Uninstalling the OpenShift sandboxed containers Operator

You uninstall the OpenShift sandboxed containers Operator by using the command line.

Prerequisites

You have deleted all pods with the kata-cc runtime class.
You have deleted the KataConfig custom resource.

Procedure

Delete the subscription by running the following command:

$ oc delete subscription sandboxed-containers-operator -n openshift-sandboxed-containers-operator

Delete the namespace by running the following command:

$ oc delete namespace openshift-sandboxed-containers-operator

5.4. Deleting the KataConfig CRD

You must delete the KataConfig custom resource definition (CRD).

Prerequisites

You have deleted the KataConfig custom resource.
You have uninstalled the OpenShift sandboxed containers Operator.

Procedure

Delete the KataConfig CRD by running the following command:

$ oc delete crd kataconfigs.kataconfiguration.openshift.io

Verify that the CRD was deleted by running the following command:

$ oc get crd kataconfigs.kataconfiguration.openshift.io

Example output

Unknown CRD kataconfigs.kataconfiguration.openshift.io

Chapter 6. Observe

You can monitor the health of your confidential containers environment.

The following tools are available:

OpenShift Container Platform web console. Administrators can access and query raw metrics through Prometheus.
Logging

6.1. Metrics

You can monitor system health by querying metrics displayed in the OpenShift Container Platform web console.

You can access the following metrics:

Kata agent metrics: Kata agent metrics display information about the kata agent process running in the VM embedded in your sandboxed containers. These metrics include data from /proc/<pid>/[io, stat, status].
Kata guest operating system metrics: Kata guest operating system metrics display data from the guest operating system running in your sandboxed containers. These metrics include data from /proc/[stat, diskstats, meminfo, vmstat] and /proc/net/dev.
Hypervisor metrics: Hypervisor metrics display data regarding the hypervisor running the VM embedded in your sandboxed containers. These metrics mainly include data from /proc/<pid>/[io, stat, status].
Kata monitor metrics: Kata monitor is the process that gathers metric data and makes it available to Prometheus. The kata monitor metrics display detailed information about the resource usage of the kata-monitor process itself. These metrics also include counters from Prometheus data collection.
Kata containerd shim v2 metrics: Kata containerd shim v2 metrics display detailed information about the kata shim process. These metrics include data from /proc/<pid>/[io, stat, status] and detailed resource usage metrics.

6.2. Viewing metrics

You can access the metrics for OpenShift sandboxed containers on the Metrics page in the OpenShift Container Platform web console.

Prerequisites

You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.

Procedure

In the OpenShift Container Platform web console, navigate to Observe → Metrics.
In the input field, enter the query for the metric you want to observe.
All kata-related metrics begin with kata. Typing kata displays a list of all available kata metrics.

The metrics from your query are visualized on the page.

6.3. Enabling debug logs for CRI-O runtime

You can enable debug logs by updating the logLevel field in the KataConfig CR. This changes the log level in the CRI-O runtime for the worker nodes running OpenShift sandboxed containers.

Prerequisites

You have installed the OpenShift CLI (oc).
You have access to the cluster as a user with the cluster-admin role.

Procedure

Change the logLevel field in your existing KataConfig CR to debug:

$ oc patch kataconfig <kataconfig> --type merge --patch '{"spec":{"logLevel":"debug"}}'

Monitor the kata-oc machine config pool until the value of UPDATED is True, indicating that all worker nodes are updated:

$ oc get mcp kata-oc

Example output

NAME     CONFIG                 UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
kata-oc  rendered-kata-oc-169   False    True      False     3             1                  1                    0                     9h
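
You can script the check of the UPDATED column. The following is a minimal sketch that assumes the tabular output format shown in the example output, with no empty columns. The function name mcp_updated is hypothetical. It reads the output of oc get mcp on standard input and prints the UPDATED value of the first pool listed.

```shell
# Hypothetical helper: read `oc get mcp <pool>` output on standard input and
# print the value of the UPDATED column for the first data row.
mcp_updated() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "UPDATED") col = i; next }
    col     { print $col; exit }
  '
}

# Example usage (polls every 30 seconds until the pool reports True):
#   until [ "$(oc get mcp kata-oc | mcp_updated)" = "True" ]; do sleep 30; done
```

For the example output above, the helper prints False, which indicates that the update is still in progress.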

Verification

Start a debug session with a node in the machine config pool:
```
$ oc debug node/<node_name>
```
Change the root directory to /host:
```
# chroot /host
```

Verify the changes in the crio.conf file:

# crio config | egrep 'log_level'

Example output

log_level = "debug"

6.4. Viewing debug logs for components

Cluster administrators can use the debug logs to troubleshoot issues. The logs for each node are printed to the node journal.

You can review the logs for the following OpenShift sandboxed containers components:

Kata agent
Kata runtime (containerd-shim-kata-v2)
virtiofsd

QEMU only generates warning and error logs. These warnings and errors print to the node journal in both the Kata runtime logs and the CRI-O logs with an extra qemuPid field.

Example of QEMU logs

Mar 11 11:57:28 openshift-worker-0 kata[2241647]: time="2023-03-11T11:57:28.587116986Z" level=info msg="Start logging QEMU (qemuPid=2241693)" name=containerd-shim-v2 pid=2241647 sandbox=d1d4d68efc35e5ccb4331af73da459c13f46269b512774aa6bde7da34db48987 source=virtcontainers/hypervisor subsystem=qemu

Mar 11 11:57:28 openshift-worker-0 kata[2241647]: time="2023-03-11T11:57:28.607339014Z" level=error msg="qemu-kvm: -machine q35,accel=kvm,kernel_irqchip=split,foo: Expected '=' after parameter 'foo'" name=containerd-shim-v2 pid=2241647 qemuPid=2241693 sandbox=d1d4d68efc35e5ccb4331af73da459c13f46269b512774aa6bde7da34db48987 source=virtcontainers/hypervisor subsystem=qemu

Mar 11 11:57:28 openshift-worker-0 kata[2241647]: time="2023-03-11T11:57:28.60890737Z" level=info msg="Stop logging QEMU (qemuPid=2241693)" name=containerd-shim-v2 pid=2241647 sandbox=d1d4d68efc35e5ccb4331af73da459c13f46269b512774aa6bde7da34db48987 source=virtcontainers/hypervisor subsystem=qemu

The Kata runtime prints Start logging QEMU when QEMU starts, and Stop logging QEMU when QEMU stops. The error appears between these two log messages and carries the qemuPid field. The actual error message from QEMU is the msg value of the level=error entry.
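
To isolate the QEMU error entries from the surrounding Start logging QEMU and Stop logging QEMU messages, you can filter the journal output. The following is a minimal sketch that relies only on the log format shown in the example. The function name qemu_errors is hypothetical.

```shell
# Hypothetical helper: read Kata runtime journal lines on standard input and
# print only QEMU entries (lines carrying a qemuPid value) logged at error level.
qemu_errors() {
  grep -E 'qemuPid=[0-9]+' | grep 'level=error'
}

# Example usage:
#   oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata | qemu_errors
```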

The console of the QEMU guest is printed to the node journal as well. You can view the guest console logs together with the Kata agent logs.

Prerequisites

You have installed the OpenShift CLI (oc).
You have access to the cluster as a user with the cluster-admin role.

Procedure

To review the Kata agent logs and guest console logs, run the following command:

$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata -g "reading guest console"

To review the Kata runtime logs, run the following command:

$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata

To review the virtiofsd logs, run the following command:

$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t virtiofsd

To review the QEMU logs, run the following command:

$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata -g "qemuPid=\d+"
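
If you want to keep the logs for later review, you can redirect the journal output to files. The following is a minimal sketch that reuses the journalctl invocations shown above. The function name collect_kata_logs and the output file naming are hypothetical.

```shell
# Hypothetical helper: save the Kata runtime and virtiofsd journals from one
# node into <outdir>/<node>-<tag>.log. The output directory defaults to ".".
collect_kata_logs() {
  node=$1
  outdir=${2:-.}
  mkdir -p "$outdir" || return 1
  for tag in kata virtiofsd; do
    oc debug "node/$node" -- journalctl -D /host/var/log/journal -t "$tag" \
      > "$outdir/$node-$tag.log" || return 1
  done
}

# Example usage:
#   collect_kata_logs <nodename> ./kata-logs
```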

Chapter 7. Troubleshoot

You can open a Red Hat support case and provide debugging information by using must-gather. The must-gather tool collects diagnostic information about your OpenShift Container Platform cluster, including virtual machines and other data.

7.1. Using must-gather

The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, including:

Resource definitions
Service logs

By default, the oc adm must-gather command uses the default plugin image and writes into ./must-gather.local.

Alternatively, you can collect specific information by running the command with the appropriate arguments as described in the following sections:

To collect data related to one or more specific features, use the --image argument with an image, as listed in a following section.
For example:
```
$ oc adm must-gather --image=registry.redhat.io/openshift-sandboxed-containers/osc-must-gather-rhel9:1.12.0
```
To collect the audit logs, use the -- /usr/bin/gather_audit_logs argument, as described in a following section.
For example:
```
$ oc adm must-gather -- /usr/bin/gather_audit_logs
```
Note
Audit logs are not collected as part of the default set of information to reduce the size of the files.

When you run oc adm must-gather, a new pod with a random name is created in a new project on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.

For example:

NAMESPACE                      NAME                 READY   STATUS      RESTARTS      AGE
...
openshift-must-gather-5drcj    must-gather-bklx4    2/2     Running     0             72s
openshift-must-gather-5drcj    must-gather-s8sdh    2/2     Running     0             72s
...

Optionally, you can run the oc adm must-gather command in a specific namespace by using the --run-namespace option.

For example:

$ oc adm must-gather --run-namespace <namespace> --image=registry.redhat.io/openshift-sandboxed-containers/osc-must-gather-rhel9:1.12.0
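
Before you attach the collected data to a support case, you might want to compress the output directory. The following is a minimal sketch that assumes the directory was created in the current working directory with a name that starts with must-gather.local, as described above. The function name archive_must_gather is hypothetical.

```shell
# Hypothetical helper: compress the most recently modified must-gather.local*
# directory in the current working directory and print the archive name.
archive_must_gather() {
  dir=$(ls -1dt must-gather.local*/ 2>/dev/null | head -n 1)
  [ -n "$dir" ] || { echo "no must-gather.local* directory found" >&2; return 1; }
  dir=${dir%/}
  tar -czf "$dir.tar.gz" "$dir" && echo "$dir.tar.gz"
}
```

The trailing slash in the glob pattern restricts the match to directories, so an archive from an earlier run is not picked up.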

Chapter 8. Reference

8.1. KataConfig status messages

The following table displays the status messages for the KataConfig custom resource (CR) for a cluster with two worker nodes.

Table 8.1. KataConfig status messages
Initial installation

When a KataConfig CR is created and starts installing kata-cc on both workers, the following status is displayed for a few seconds.

 conditions:
    message: Performing initial installation of kata-cc on cluster
    reason: Installing
    status: 'True'
    type: InProgress
 kataNodes:
   nodeCount: 0
   readyNodeCount: 0

Installing

Within a few seconds the status changes.

 kataNodes:
   nodeCount: 2
   readyNodeCount: 0
   waitingToInstall:
   - worker-0
   - worker-1

Installing (Worker-1 installation starting)

For a short period of time, the status changes, signifying that one node has initiated the installation of kata-cc, while the other is in a waiting state. This is because only one node can be unavailable at any given time. The nodeCount remains at 2 because both nodes will eventually receive kata-cc, but the readyNodeCount is currently 0 as neither of them has reached that state yet.

 kataNodes:
   installing:
   - worker-1
   nodeCount: 2
   readyNodeCount: 0
   waitingToInstall:
   - worker-0

Installing (Worker-1 installed, worker-0 installation started)

After some time, worker-1 will complete its installation, causing a change in the status. The readyNodeCount is updated to 1, indicating that worker-1 is now prepared to execute kata-cc workloads. You cannot schedule or run kata-cc workloads until the runtime class is created at the end of the installation process.

 kataNodes:
   installed:
   - worker-1
   installing:
   - worker-0
   nodeCount: 2
   readyNodeCount: 1

Installed

When installed, both workers are listed as installed, and the InProgress condition transitions to False without specifying a reason, indicating the successful installation of kata-cc on the cluster.

 conditions:
    message: ""
    reason: ""
    status: 'False'
    type: InProgress
 kataNodes:
   installed:
   - worker-0
   - worker-1
   nodeCount: 2
   readyNodeCount: 2

Initial uninstall

If kata-cc is installed on both workers, and you delete the KataConfig to remove kata-cc from the cluster, both workers briefly enter a waiting state for a few seconds.

 conditions:
    message: Removing kata-cc from cluster
    reason: Uninstalling
    status: 'True'
    type: InProgress
 kataNodes:
   nodeCount: 0
   readyNodeCount: 0
   waitingToUninstall:
   - worker-0
   - worker-1

Uninstalling

After a few seconds, one of the workers starts uninstalling.

 kataNodes:
   nodeCount: 0
   readyNodeCount: 0
   uninstalling:
   - worker-1
   waitingToUninstall:
   - worker-0

Uninstalling

Worker-1 finishes and worker-0 starts uninstalling.

 kataNodes:
   nodeCount: 0
   readyNodeCount: 0
   uninstalling:
   - worker-0

Note

The reason field can also report the following causes:

Failed: This is reported if the node cannot finish its transition. The status reports True and the message is Node <node_name> Degraded: <error_message_from_the_node>.
BlockedByExistingKataPods: This is reported if there are pods running on a cluster that use the kata-cc runtime while kata-cc is being uninstalled. The status field is False and the message is Existing pods using "kata-cc" RuntimeClass found. Please delete the pods manually for KataConfig deletion to proceed. There could also be a technical error message reported like Failed to list kata pods: <error_message> if communication with the cluster control plane fails.
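
You can compare the node counts from the status to decide whether the installation has finished. The following is a minimal sketch. The function name kataconfig_ready is hypothetical, and the sketch assumes that the kataNodes fields shown in this section are located under .status in the KataConfig CR.

```shell
# Hypothetical helper: succeed when every node counted in kataNodes is ready,
# that is, nodeCount is greater than 0 and equals readyNodeCount.
# Assumption: the counters are located under .status.kataNodes.
kataconfig_ready() {
  counts=$(oc get kataconfig "$1" \
    -o jsonpath='{.status.kataNodes.nodeCount} {.status.kataNodes.readyNodeCount}') || return 1
  set -- $counts
  [ "$#" -eq 2 ] && [ "$1" -gt 0 ] && [ "$1" -eq "$2" ]
}

# Example usage:
#   kataconfig_ready example-kataconfig && echo "kata-cc is installed on all nodes"
```

The check for nodeCount greater than 0 prevents the initial installation and uninstall states, where both counters are 0, from being reported as ready.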

Legal Notice

Except as otherwise noted below, the text of and illustrations in this documentation are licensed by Red Hat under the Creative Commons Attribution–Share Alike 3.0 Unported license. If you distribute this document or an adaptation of it, you must provide the URL for the original version.

Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, the Red Hat logo, JBoss, Hibernate, and RHCE are trademarks or registered trademarks of Red Hat, LLC. or its subsidiaries in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

XFS is a trademark or registered trademark of Hewlett Packard Enterprise Development LP or its subsidiaries in the United States and other countries.

The OpenStack® Word Mark and OpenStack logo are trademarks or registered trademarks of the Linux Foundation, used under license.

All other trademarks are the property of their respective owners.
