Chapter 7. Collecting OpenShift sandboxed containers data
When troubleshooting OpenShift sandboxed containers, you can open a support case and provide debugging information using the must-gather tool.
If you are a cluster administrator, you can also review the logs yourself and enable a more detailed log level.
7.1. Collecting OpenShift sandboxed containers data for Red Hat Support
When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.
The must-gather tool enables you to collect diagnostic information about your Red Hat OpenShift cluster, including virtual machines and other data related to OpenShift sandboxed containers.
For prompt support, supply diagnostic information for both Red Hat OpenShift and OpenShift sandboxed containers.
7.1.1. About the must-gather tool
The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, including:
- Resource definitions
- Service logs
By default, the oc adm must-gather command uses the default plugin image and writes into ./must-gather.local.
Alternatively, you can collect specific information by running the command with the appropriate arguments as described in the following sections:
To collect data related to one or more specific features, use the --image argument with an image, as listed in a following section. For example:
$ oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.13.0
To collect the audit logs, use the -- /usr/bin/gather_audit_logs argument, as described in a following section. For example:
$ oc adm must-gather -- /usr/bin/gather_audit_logs
Note: To reduce the size of the files, audit logs are not collected as part of the default set of information.
When you run oc adm must-gather, a new pod with a random name is created in a new project on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.
For example:
NAMESPACE                     NAME                READY   STATUS    RESTARTS   AGE
...
openshift-must-gather-5drcj   must-gather-bklx4   2/2     Running   0          72s
openshift-must-gather-5drcj   must-gather-s8sdh   2/2     Running   0          72s
...
Optionally, you can run the oc adm must-gather command in a specific namespace by using the --run-namespace option.
For example:
$ oc adm must-gather --run-namespace <namespace> --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.13.0
To collect OpenShift sandboxed containers data with must-gather, you must specify the OpenShift sandboxed containers image:
--image=registry.redhat.io/openshift-sandboxed-containers/osc-must-gather-rhel8:1.4.0
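Because prompt support depends on supplying data for both Red Hat OpenShift and OpenShift sandboxed containers, the two collections can be combined into a single run. The following is a minimal sketch rather than an official procedure. It assumes that your oc version provides the --image-stream and --dest-dir options of oc adm must-gather; the function name build_must_gather_cmd and the destination directory are hypothetical. The function only prints the command so that you can review it before running it.

```shell
#!/usr/bin/env bash
# Image taken from the text above; adjust the tag to match your installed release.
OSC_IMAGE="registry.redhat.io/openshift-sandboxed-containers/osc-must-gather-rhel8:1.4.0"

# Print a must-gather command that collects the default cluster data
# (--image-stream=openshift/must-gather) together with the OpenShift
# sandboxed containers data (--image=...).
# $1: destination directory (hypothetical; any writable path works).
build_must_gather_cmd() {
  local dest_dir="$1"
  printf 'oc adm must-gather --image-stream=openshift/must-gather --image=%s --dest-dir=%s\n' \
    "$OSC_IMAGE" "$dest_dir"
}

# Example (commented out so that nothing runs against a cluster by accident):
# build_must_gather_cmd ./osc-must-gather
```

After reviewing the printed command, you can copy it into your terminal. Writing to a dedicated directory makes the result easier to archive and attach to a support case.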
7.2. About OpenShift sandboxed containers log data
When you collect log data about your cluster, the following features and objects are associated with OpenShift sandboxed containers:
- All namespaces and their child objects that belong to any OpenShift sandboxed containers resources
- All OpenShift sandboxed containers custom resource definitions (CRDs)
The following OpenShift sandboxed containers component logs are collected for each pod running with the kata runtime:
- Kata agent logs
- Kata runtime logs
- QEMU logs
- Audit logs
- CRI-O logs
7.3. Enabling debug logs for OpenShift sandboxed containers
As a cluster administrator, you can collect a more detailed level of logs for OpenShift sandboxed containers. You can also enhance logging by changing the logLevel field in the KataConfig CR. This changes the log_level in the CRI-O runtime for the worker nodes running OpenShift sandboxed containers.
Procedure
- Change the logLevel field in your existing KataConfig CR to debug:
$ oc patch kataconfig <name_of_kataconfig_file> --type merge --patch '{"spec":{"logLevel":"debug"}}'
When running this command, reference the name of your KataConfig CR. This is the name you used to create the CR when setting up OpenShift sandboxed containers.
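If you change the log level more than once, for example to raise it during troubleshooting and lower it afterwards, a small helper can generate the merge patch. This is only a sketch: the function name kataconfig_loglevel_patch is hypothetical, and debug is the only value confirmed by the procedure above. Any other value you pass is an assumption that you should check against your KataConfig CRD.

```shell
#!/usr/bin/env bash
# Print a JSON merge patch that sets spec.logLevel in a KataConfig CR.
# $1: log level (defaults to "debug", the value used in the procedure above).
kataconfig_loglevel_patch() {
  local level="${1:-debug}"
  printf '{"spec":{"logLevel":"%s"}}\n' "$level"
}

# Example (commented out; replace the placeholder with your KataConfig CR name):
# oc patch kataconfig <name_of_kataconfig_file> --type merge \
#   --patch "$(kataconfig_loglevel_patch debug)"
```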
Verification
Monitor the kata-oc machine config pool until the UPDATED field appears as True, meaning all worker nodes are updated:
$ oc get mcp kata-oc
Example output
NAME      CONFIG                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
kata-oc   rendered-kata-oc-169   False     True       False      3              1                   1                     0                      9h
Verify that the log_level was updated in CRI-O:
Open an oc debug session to a node in the machine config pool and run chroot /host.
$ oc debug node/<node_name>
sh-4.4# chroot /host
Verify the changes in the crio.conf file:
sh-4.4# crio config | egrep 'log_level'
Example output
log_level = "debug"
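The machine config pool check in the verification steps above can be scripted instead of being repeated by hand. The following sketch parses the table printed by oc get mcp kata-oc and succeeds only when the UPDATED column reads True. The function name mcp_is_updated and the 30-second polling interval are hypothetical choices, and the parsing assumes the column layout shown in the example output.

```shell
#!/usr/bin/env bash
# Read `oc get mcp <pool>` output on stdin.
# Exit status 0 if the UPDATED column is True for a listed pool, 1 otherwise.
mcp_is_updated() {
  awk '
    # Header row: remember which column is named UPDATED.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "UPDATED") col = i; next }
    # Data rows: note whether the UPDATED column reads True.
    col && $col == "True" { ok = 1 }
    END { exit ok ? 0 : 1 }
  '
}

# Example polling loop (commented out; requires access to a cluster):
# until oc get mcp kata-oc | mcp_is_updated; do
#   sleep 30
# done
```

With the example output shown earlier, where UPDATED is False, the function returns a nonzero status, so the loop keeps waiting.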
7.3.1. Viewing debug logs for OpenShift sandboxed containers
Cluster administrators can use the enhanced debug logs for OpenShift sandboxed containers to troubleshoot issues. The logs for each node are printed to the node journal.
You can review the logs for the following OpenShift sandboxed containers components:
- Kata agent
- Kata runtime (containerd-shim-kata-v2)
- virtiofsd
QEMU only generates warning and error logs. These warnings and errors print to the node journal in both the Kata runtime logs and the CRI-O logs with an extra qemuPid field.
Example of QEMU logs
Mar 11 11:57:28 openshift-worker-0 kata[2241647]: time="2023-03-11T11:57:28.587116986Z" level=info msg="Start logging QEMU (qemuPid=2241693)" name=containerd-shim-v2 pid=2241647 sandbox=d1d4d68efc35e5ccb4331af73da459c13f46269b512774aa6bde7da34db48987 source=virtcontainers/hypervisor subsystem=qemu
Mar 11 11:57:28 openshift-worker-0 kata[2241647]: time="2023-03-11T11:57:28.607339014Z" level=error msg="qemu-kvm: -machine q35,accel=kvm,kernel_irqchip=split,foo: Expected '=' after parameter 'foo'" name=containerd-shim-v2 pid=2241647 qemuPid=2241693 sandbox=d1d4d68efc35e5ccb4331af73da459c13f46269b512774aa6bde7da34db48987 source=virtcontainers/hypervisor subsystem=qemu
Mar 11 11:57:28 openshift-worker-0 kata[2241647]: time="2023-03-11T11:57:28.60890737Z" level=info msg="Stop logging QEMU (qemuPid=2241693)" name=containerd-shim-v2 pid=2241647 sandbox=d1d4d68efc35e5ccb4331af73da459c13f46269b512774aa6bde7da34db48987 source=virtcontainers/hypervisor subsystem=qemu
The Kata runtime prints Start logging QEMU when QEMU starts, and Stop logging QEMU when QEMU stops. The error appears in between these two log messages with the qemuPid field. The actual error message from QEMU is the msg value of the level=error entry.
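To pull only the QEMU error text out of journal output like the example above, you can filter on the error level and the qemuPid field. The following is a sketch under the assumption that entries keep the key=value layout shown in the example, with msg="..." immediately followed by name=. The function name qemu_errors is hypothetical.

```shell
#!/usr/bin/env bash
# Read Kata runtime journal lines on stdin and print only the QEMU error
# messages, that is, the msg="..." value of level=error entries that carry
# a qemuPid field.
qemu_errors() {
  grep -E 'level=error .*qemuPid=[0-9]+' \
    | sed -E 's/.* msg="(.*)" name=.*/\1/'
}

# Example (commented out; requires access to a cluster):
# oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata | qemu_errors
```

The Start logging QEMU and Stop logging QEMU entries are logged at the info level, so the filter skips them even though they mention qemuPid.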
The console of the QEMU guest is printed to the node journal as well. You can view the guest console logs together with the Kata agent logs.
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster as a user with the cluster-admin role.
Procedure
To review the Kata agent logs and guest console logs, run:
$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata -g "reading guest console"
To review the Kata runtime logs, run:
$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata
To review the virtiofsd logs, run:
$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t virtiofsd
To review the QEMU logs, run:
$ oc debug node/<nodename> -- journalctl -D /host/var/log/journal -t kata -g "qemuPid=\d+"
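The four commands above differ only in their journalctl filter, so they can be generated from one helper. This is a convenience sketch, not part of the product: the function name kata_log_cmd and the component labels agent, runtime, virtiofsd, and qemu are hypothetical, while the filters are copied from the procedure. The function prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Print the `oc debug` command that shows one component's logs on a node.
# $1: node name
# $2: component label: agent, runtime, virtiofsd, or qemu
kata_log_cmd() {
  local node="$1" component="$2" filter
  case "$component" in
    agent)     filter='-t kata -g "reading guest console"' ;;
    runtime)   filter='-t kata' ;;
    virtiofsd) filter='-t virtiofsd' ;;
    qemu)      filter='-t kata -g "qemuPid=\d+"' ;;
    *) echo "unknown component: $component" >&2; return 1 ;;
  esac
  printf 'oc debug node/%s -- journalctl -D /host/var/log/journal %s\n' "$node" "$filter"
}

# Example (commented out): print the command for each component of one node.
# for c in agent runtime virtiofsd qemu; do kata_log_cmd <nodename> "$c"; done
```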
7.4. Additional resources
- For more information about gathering data for support, see Gathering data about your cluster.