Chapter 13. Support

13.1. Support overview

You can collect data about your environment, monitor the health of your cluster and virtual machines (VMs), and troubleshoot OpenShift Virtualization resources with the following tools.

13.1.1. Web console

The OpenShift Container Platform web console displays resource usage, alerts, events, and trends for your cluster and for OpenShift Virtualization components and resources.

Table 13.1. Web console pages for monitoring and troubleshooting
Page	Description
Overview page	Cluster details, status, alerts, inventory, and resource usage
Virtualization → Overview tab	OpenShift Virtualization resources, usage, alerts, and status
Virtualization → Top consumers tab	Top consumers of CPU, memory, and storage
Virtualization → Migrations tab	Progress of live migrations
VirtualMachines → VirtualMachine → VirtualMachine details → Metrics tab	VM resource usage, storage, network, and migration
VirtualMachines → VirtualMachine → VirtualMachine details → Events tab	List of VM events
VirtualMachines → VirtualMachine → VirtualMachine details → Diagnostics tab	VM status conditions and volume snapshot status

13.1.2. Collecting data for Red Hat Support

When you submit a support case to Red Hat Support, it is helpful to provide debugging information. You can gather debugging information by performing the following steps:

Collecting data about your environment: Configure Prometheus and Alertmanager and collect must-gather data for OpenShift Container Platform and OpenShift Virtualization.
Collecting data about VMs: Collect must-gather data and memory dumps from VMs.
must-gather tool for OpenShift Virtualization: Configure and use the must-gather tool.

13.1.3. Troubleshooting

Troubleshoot OpenShift Virtualization components and VMs and resolve issues that trigger alerts in the web console.

Events: View important life-cycle information for VMs, namespaces, and resources.
Logs: View and configure logs for OpenShift Virtualization components and VMs.
Troubleshooting data volumes: Troubleshoot data volumes by analyzing conditions and events.

13.2. Collecting data for Red Hat Support

When you submit a support case to Red Hat Support, it is helpful to provide debugging information for OpenShift Container Platform and OpenShift Virtualization by using the following tools:

must-gather tool: The must-gather tool collects diagnostic information, including resource definitions and service logs.
Prometheus: Prometheus is a time-series database and a rule evaluation engine for metrics. Prometheus sends alerts to Alertmanager for processing.
Alertmanager: The Alertmanager service handles alerts received from Prometheus and sends them to external notification systems.

For information about the OpenShift Container Platform monitoring stack, see About OpenShift Container Platform monitoring.

13.2.1. Collecting data about your environment

Collecting data about your environment minimizes the time required to analyze and determine the root cause.

Prerequisites

Set the retention time for Prometheus metrics data to a minimum of seven days.
Configure the Alertmanager to capture relevant alerts and to send alert notifications to a dedicated mailbox so that they can be viewed and persisted outside the cluster.
Record the exact number of affected nodes and virtual machines.
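
The retention prerequisite can be met through the cluster monitoring ConfigMap. The following is a minimal sketch, assuming the default monitoring stack and that no cluster-monitoring-config ConfigMap exists yet. If one already exists, merge the retention key into it instead of replacing it. The file name is arbitrary.

```shell
# Write a ConfigMap manifest that keeps Prometheus metrics for seven days.
# Assumption: no cluster-monitoring-config ConfigMap exists yet; applying
# this file over an existing one would replace its other settings.
cat > cluster-monitoring-config.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 7d
EOF

# Review the file, then apply it as a cluster administrator:
#   oc apply -f cluster-monitoring-config.yaml
```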

Procedure

13.2.2. Collecting data about virtual machines

Collecting data about malfunctioning virtual machines (VMs) minimizes the time required to analyze and determine the root cause.

Prerequisites

Linux VMs: Install the latest QEMU guest agent.
Windows VMs:
- Record the Windows patch update details.
- Install the latest VirtIO drivers.
- Install the latest QEMU guest agent.
- If Remote Desktop Protocol (RDP) is enabled, connect by using the desktop viewer to determine whether there is a problem with the connection software.

Procedure

Collect must-gather data for the VMs using the /usr/bin/gather script.
Collect screenshots of VMs that have crashed before you restart them.
Collect memory dumps from VMs before remediation attempts.
Record factors that the malfunctioning VMs have in common. For example, the VMs have the same host or network.
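
The memory dump step can be sketched with the virtctl client. This is an illustration only: it assumes that virtctl is installed and logged in to the cluster, and the helper name and the PVC naming scheme are hypothetical.

```shell
# Hypothetical helper: request a memory dump of a VM into a newly created PVC.
# Assumptions: virtctl is installed and authenticated; the PVC name
# "<vm_name>-memory-dump" is illustrative only.
dump_vm_memory() {
  vm_name="$1"
  namespace="$2"
  virtctl memory-dump get "$vm_name" -n "$namespace" \
    --claim-name="${vm_name}-memory-dump" --create-claim
}

# Usage: dump_vm_memory my-vm mynamespace
```

Run the helper before you restart or otherwise remediate the VM so that the dump reflects the failed state.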

13.2.3. Using the must-gather tool for OpenShift Virtualization

You can collect data about OpenShift Virtualization resources by running the must-gather command with the OpenShift Virtualization image.

The default data collection includes information about the following resources:

OpenShift Virtualization Operator namespaces, including child objects
OpenShift Virtualization custom resource definitions
Namespaces that contain virtual machines
Basic virtual machine definitions

Instance types information is not collected by default, but you can run a command to collect it.

Procedure

Run the following command to collect data about OpenShift Virtualization:

$ oc adm must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11 \
  -- /usr/bin/gather

13.2.3.1. must-gather tool options

You can specify a combination of scripts and environment variables for the following options:

Collecting detailed virtual machine (VM) information from a namespace
Collecting detailed information about specified VMs
Collecting image, image-stream, and image-stream-tags information
Limiting the maximum number of parallel processes used by the must-gather tool

13.2.3.1.1. Parameters

Environment variables

You can specify environment variables for a compatible script.

NS=<namespace_name>: Collect virtual machine information, including virt-launcher pod details, from the namespace that you specify. The VirtualMachine and VirtualMachineInstance CR data is collected for all namespaces.
VM=<vm_name>: Collect details about a particular virtual machine. To use this option, you must also specify a namespace by using the NS environment variable.
PROS=<number_of_processes>: Modify the maximum number of parallel processes that the must-gather tool uses. The default value is 5.
Important
Using too many parallel processes can cause performance issues. Increasing the maximum number of parallel processes is not recommended.

Scripts

Each script is compatible only with certain environment variable combinations.

/usr/bin/gather: Use the default must-gather script, which collects cluster data from all namespaces and includes only basic VM information. This script is compatible only with the PROS variable.
/usr/bin/gather --vms_details: Collect VM log files, VM definitions, control-plane logs, and namespaces that belong to OpenShift Virtualization resources. Specifying namespaces includes their child objects. If you use this parameter without specifying a namespace or VM, the must-gather tool collects this data for all VMs in the cluster. This script is compatible with all environment variables, but you must specify a namespace if you use the VM variable.
/usr/bin/gather --images: Collect image, image-stream, and image-stream-tags custom resource information. This script is compatible only with the PROS variable.
/usr/bin/gather --instancetypes: Collect instance types information. This information is not collected by default.

13.2.3.1.2. Usage and examples

Environment variables are optional. You can run a script by itself or with one or more compatible environment variables.

Table 13.2. Compatible parameters
Script	Compatible environment variable
`/usr/bin/gather`	`PROS=<number_of_processes>`
`/usr/bin/gather --vms_details`	For a namespace: `NS=<namespace_name>`; for a VM: `VM=<vm_name> NS=<namespace_name>`; `PROS=<number_of_processes>`
`/usr/bin/gather --images`	`PROS=<number_of_processes>`

Syntax

$ oc adm must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11 \
  -- <environment_variable_1> <environment_variable_2> <script_name>

Default data collection parallel processes

By default, five processes run in parallel.

$ oc adm must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11 \
  -- PROS=5 /usr/bin/gather

You can modify the number of parallel processes by changing the PROS value.

Detailed VM information

The following command collects detailed VM information for the my-vm VM in the mynamespace namespace:

$ oc adm must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11 \
  -- NS=mynamespace VM=my-vm /usr/bin/gather --vms_details

The NS environment variable is mandatory if you use the VM environment variable.

Image, image-stream, and image-stream-tags information

The following command collects image, image-stream, and image-stream-tags information from the cluster:

$ oc adm must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11 \
  -- /usr/bin/gather --images

Instance types information

The following command collects instance types information from the cluster:

$ oc adm must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11 \
  -- /usr/bin/gather --instancetypes
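
The compatibility rules for the --vms_details script can be sketched as a small helper that prints, but does not run, the must-gather command. The function and variable names are hypothetical; the image reference is the one used in the examples above.

```shell
# Hypothetical helper: print a must-gather command for detailed VM
# collection. It enforces the rule that the VM variable requires NS.
must_gather_image="registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel9:v4.14.11"

vms_details_cmd() {
  ns="$1"   # namespace name, may be empty
  vm="$2"   # VM name, may be empty
  if [ -n "$vm" ] && [ -z "$ns" ]; then
    echo "error: VM requires NS" >&2
    return 1
  fi
  env_vars=""
  if [ -n "$ns" ]; then env_vars="NS=$ns "; fi
  if [ -n "$vm" ]; then env_vars="${env_vars}VM=$vm "; fi
  printf 'oc adm must-gather --image=%s -- %s/usr/bin/gather --vms_details\n' \
    "$must_gather_image" "$env_vars"
}

# Usage: vms_details_cmd mynamespace          (all VMs in one namespace)
#        vms_details_cmd mynamespace my-vm    (one VM)
```

Printing the command first makes it easy to review the environment variables before you start a collection that can take a long time on large clusters.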

13.3. Troubleshooting

OpenShift Virtualization provides tools and logs for troubleshooting virtual machines and virtualization components.

You can troubleshoot OpenShift Virtualization components by using the tools provided in the web console or by using the oc CLI tool.

13.3.1. Events

OpenShift Container Platform events are records of important life-cycle information and are useful for monitoring and troubleshooting virtual machine, namespace, and resource issues.

VM events: Navigate to the Events tab of the VirtualMachine details page in the web console.
Namespace events
You can view namespace events by running the following command:
```
$ oc get events -n <namespace>
```
See the list of events for details about specific events.
Resource events
You can view resource events by running the following command:
```
$ oc describe <resource> <resource_name>
```
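
When a namespace produces many events, it can help to narrow the list. The following sketch, with a hypothetical helper name, shows only Warning events sorted by their last timestamp. It assumes that the events carry a lastTimestamp field.

```shell
# Hypothetical helper: list only Warning events in a namespace,
# sorted by the time they were last observed.
list_warning_events() {
  oc get events -n "$1" --field-selector type=Warning --sort-by=.lastTimestamp
}

# Usage: list_warning_events <namespace>
```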

13.3.2. Logs

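You can review the following logs for troubleshooting:

Virtual machine logs, viewed with the web console
OpenShift Virtualization pod logs
Aggregated OpenShift Virtualization logs, viewed with the LokiStack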

13.3.2.1. Viewing virtual machine logs with the web console

You can view virtual machine logs with the OpenShift Container Platform web console.

Procedure

Navigate to Virtualization → VirtualMachines.
Select a virtual machine to open the VirtualMachine details page.
On the Details tab, click the pod name to open the Pod details page.
Click the Logs tab to view the logs.

13.3.2.2. Viewing OpenShift Virtualization pod logs

You can view logs for OpenShift Virtualization pods by using the oc CLI tool.

You can configure the verbosity level of the logs by editing the HyperConverged custom resource (CR).

13.3.2.2.1. Viewing OpenShift Virtualization pod logs with the CLI

You can view logs for the OpenShift Virtualization pods by using the oc CLI tool.

Procedure

View a list of pods in the OpenShift Virtualization namespace by running the following command:

$ oc get pods -n openshift-cnv

Example 13.1. Example output

NAME                               READY   STATUS    RESTARTS   AGE
disks-images-provider-7gqbc        1/1     Running   0          32m
disks-images-provider-vg4kx        1/1     Running   0          32m
virt-api-57fcc4497b-7qfmc          1/1     Running   0          31m
virt-api-57fcc4497b-tx9nc          1/1     Running   0          31m
virt-controller-76c784655f-7fp6m   1/1     Running   0          30m
virt-controller-76c784655f-f4pbd   1/1     Running   0          30m
virt-handler-2m86x                 1/1     Running   0          30m
virt-handler-9qs6z                 1/1     Running   0          30m
virt-operator-7ccfdbf65f-q5snk     1/1     Running   0          32m
virt-operator-7ccfdbf65f-vllz8     1/1     Running   0          32m

View the pod log by running the following command:

$ oc logs -n openshift-cnv <pod_name>

Note

If a pod fails to start, you can use the --previous option to view logs from the last attempt.

To monitor log output in real time, use the -f option.

Example 13.2. Example output

{"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:453","timestamp":"2022-04-17T08:58:37.373695Z"}
{"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:453","timestamp":"2022-04-17T08:58:37.373726Z"}
{"component":"virt-handler","level":"info","msg":"setting rate limiter to 5 QPS and 10 Burst","pos":"virt-handler.go:462","timestamp":"2022-04-17T08:58:37.373782Z"}
{"component":"virt-handler","level":"info","msg":"CPU features of a minimum baseline CPU model: map[apic:true clflush:true cmov:true cx16:true cx8:true de:true fpu:true fxsr:true lahf_lm:true lm:true mca:true mce:true mmx:true msr:true mtrr:true nx:true pae:true pat:true pge:true pni:true pse:true pse36:true sep:true sse:true sse2:true sse4.1:true ssse3:true syscall:true tsc:true]","pos":"cpu_plugin.go:96","timestamp":"2022-04-17T08:58:37.390221Z"}
{"component":"virt-handler","level":"warning","msg":"host model mode is expected to contain only one model","pos":"cpu_plugin.go:103","timestamp":"2022-04-17T08:58:37.390263Z"}
{"component":"virt-handler","level":"info","msg":"node-labeller is running","pos":"node_labeller.go:94","timestamp":"2022-04-17T08:58:37.391011Z"}
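
Because each log line is a JSON object with a level field, you can filter the output for problems. The following is a minimal sketch with a hypothetical helper name. It assumes the compact JSON format shown in the example output and uses grep rather than a JSON parser.

```shell
# Hypothetical filter: keep only warning and error lines from JSON logs.
# Assumption: each line is compact JSON with a "level" field, as in the
# example output; pretty-printed JSON would not match.
filter_problem_lines() {
  grep -E '"level":"(warning|error)"'
}

# Usage: oc logs -n openshift-cnv <pod_name> | filter_problem_lines
```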

13.3.2.2.2. Configuring OpenShift Virtualization pod log verbosity

You can configure the verbosity level of OpenShift Virtualization pod logs by editing the HyperConverged custom resource (CR).

Procedure

To set log verbosity for specific components, open the HyperConverged CR in your default text editor by running the following command:
```
$ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv
```
Set the log level for one or more components by editing the spec.logVerbosityConfig stanza. For example:
```
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged
spec:
  logVerbosityConfig:
    kubevirt:
      virtAPI: 5
      virtController: 4
      virtHandler: 3
      virtLauncher: 2
      virtOperator: 6
```
The log verbosity value must be an integer in the range 1–9, where a higher number indicates a more detailed log. In this example, the virtAPI component logs are exposed if their priority level is 5 or higher.
Apply your changes by saving and exiting the editor.
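
As an alternative to the interactive editor, the same field can be set with a merge patch. The following is a sketch under the assumption that the HyperConverged CR has the default name and namespace used above. The helper name is hypothetical.

```shell
# Hypothetical helper: set the log verbosity of one KubeVirt component
# with a merge patch instead of an interactive edit.
set_kubevirt_verbosity() {
  component="$1"   # for example: virtAPI, virtHandler
  level="$2"       # integer in the range 1-9
  case "$level" in
    [1-9]) ;;
    *) echo "error: verbosity must be an integer from 1 to 9" >&2; return 1 ;;
  esac
  oc patch hyperconverged kubevirt-hyperconverged -n openshift-cnv \
    --type=merge \
    -p "{\"spec\":{\"logVerbosityConfig\":{\"kubevirt\":{\"$component\":$level}}}}"
}

# Usage: set_kubevirt_verbosity virtHandler 6
```

A merge patch changes only the named component and leaves the other entries in spec.logVerbosityConfig untouched.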

13.3.2.2.3. Common error messages

The following error messages might appear in OpenShift Virtualization logs:

ErrImagePull or ImagePullBackOff: Indicates an incorrect deployment configuration or problems with the images that are referenced.

13.3.2.3. Viewing aggregated OpenShift Virtualization logs with the LokiStack

You can view aggregated logs for OpenShift Virtualization pods and containers by using the LokiStack in the web console.

Prerequisites

You deployed the LokiStack.

Procedure

Navigate to Observe → Logs in the web console.
Select application, for virt-launcher pod logs, or infrastructure, for OpenShift Virtualization control plane pods and containers, from the log type list.
Click Show Query to display the query field.
Enter the LogQL query in the query field and click Run Query to display the filtered logs.

13.3.2.3.1. OpenShift Virtualization LogQL queries

You can view and filter aggregated logs for OpenShift Virtualization components by running Loki Query Language (LogQL) queries on the Observe → Logs page in the web console.

The default log type is infrastructure. The virt-launcher log type is application.

Optional: You can include or exclude strings or regular expressions by using line filter expressions.

Note

If the query matches a large number of logs, the query might time out.

Table 13.3. OpenShift Virtualization LogQL example queries
Component	LogQL query
All	{log_type=~".+"}|json |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
`cdi-apiserver` `cdi-deployment` `cdi-operator`	{log_type=~".+"}|json |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster" |kubernetes_labels_app_kubernetes_io_component="storage"
`hco-operator`	{log_type=~".+"}|json |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster" |kubernetes_labels_app_kubernetes_io_component="deployment"
`kubemacpool`	{log_type=~".+"}|json |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster" |kubernetes_labels_app_kubernetes_io_component="network"
`virt-api` `virt-controller` `virt-handler` `virt-operator`	{log_type=~".+"}|json |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster" |kubernetes_labels_app_kubernetes_io_component="compute"
`ssp-operator`	{log_type=~".+"}|json |kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster" |kubernetes_labels_app_kubernetes_io_component="schedule"
Container	{log_type=~".+",kubernetes_container_name=~"<container>|<container>"} |json|kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
`virt-launcher`	{log_type=~".+", kubernetes_container_name="compute"}|json |!= "custom-ga-command"

Container query: Specify one or more containers separated by a pipe (`|`).
virt-launcher query: You must select application from the log type list before running this query. `|!= "custom-ga-command"` excludes libvirt logs that contain the string `custom-ga-command`. (BZ#2177684)

You can filter log lines to include or exclude strings or regular expressions by using line filter expressions.

Table 13.4. Line filter expressions
Line filter expression	Description
`|= "<string>"`	Log line contains string
`!= "<string>"`	Log line does not contain string
`|~ "<regex>"`	Log line contains regular expression
`!~ "<regex>"`	Log line does not contain regular expression

Example line filter expression

{log_type=~".+"}|json
|kubernetes_labels_app_kubernetes_io_part_of="hyperconverged-cluster"
|= "error" != "timeout"
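
The parts of such a query (stream selector, JSON parser, label filters, and line filter) can be assembled programmatically. The following sketch uses a hypothetical helper that prints a query for one component label value from Table 13.3, with an optional "contains" line filter.

```shell
# Hypothetical helper: print a LogQL query for one component label value
# (for example storage, network, compute) with an optional line filter.
logql_for_component() {
  component="$1"   # value of the app.kubernetes.io/component label
  contains="$2"    # optional string that log lines must contain
  query='{log_type=~".+"}|json'
  query="$query |kubernetes_labels_app_kubernetes_io_part_of=\"hyperconverged-cluster\""
  query="$query |kubernetes_labels_app_kubernetes_io_component=\"$component\""
  if [ -n "$contains" ]; then
    query="$query |= \"$contains\""
  fi
  printf '%s\n' "$query"
}

# Usage: logql_for_component storage error
```

Paste the printed query into the query field on the Observe → Logs page.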

Additional resources for LokiStack and LogQL

About log storage
Deploying the LokiStack
LogQL log queries in the Grafana documentation

13.3.3. Troubleshooting data volumes

You can check the Conditions and Events sections of the DataVolume object to analyze and resolve issues.

13.3.3.1. About data volume conditions and events

You can diagnose data volume issues by examining the output of the Conditions and Events sections generated by the command:

$ oc describe dv <DataVolume>

The Conditions section displays the following Types:

Bound
Running
Ready

The Events section provides the following additional information:

Type of event
Reason for logging
Source of the event
Message containing additional diagnostic information.

The output from oc describe does not always contain Events.

An event is generated when the Status, Reason, or Message changes. Both conditions and events react to changes in the state of the data volume.

For example, if you misspell the URL during an import operation, the import generates a 404 message. That message change generates an event with a reason. The output in the Conditions section is updated as well.
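
For a compact view of the conditions alone, you can query the status of the data volume directly. The following is a sketch with a hypothetical helper name. It assumes that the conditions are stored under .status.conditions, as the describe output suggests.

```shell
# Hypothetical helper: print each condition of a data volume on one line
# as "<Type>=<Status> (<Reason>)". The reason is empty if none is set.
dv_conditions() {
  oc get dv "$1" -n "$2" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
}

# Usage: dv_conditions <DataVolume> <namespace>
```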

13.3.3.2. Analyzing data volume conditions and events

By inspecting the Conditions and Events sections generated by the describe command, you can determine the state of the data volume in relation to persistent volume claims (PVCs), and whether an operation is actively running or completed. You might also receive messages that offer specific details about the status of the data volume and how it came to be in its current state.

There are many different combinations of conditions. Each must be evaluated in its unique context.

Examples of various combinations follow.

Bound - This example shows a successfully bound PVC.
Note that the Type is Bound and the Status is True. If the PVC is not bound, the Status is False.
When the PVC is bound, an event is generated stating that the PVC is bound. In this case, the Reason is Bound and the Status is True. The Message indicates which PVC owns the data volume.
In the Events section, the Message provides further details, including how long the PVC has been bound (Age) and by what resource (From), in this case datavolume-controller:
Example output
```
Status:
  Conditions:
    Last Heart Beat Time:  2020-07-15T03:58:24Z
    Last Transition Time:  2020-07-15T03:58:24Z
    Message:               PVC win10-rootdisk Bound
    Reason:                Bound
    Status:                True
    Type:                  Bound
...
  Events:
    Type     Reason     Age    From                   Message
    ----     ------     ----   ----                   -------
    Normal   Bound      24s    datavolume-controller  PVC example-dv Bound
```
Running - In this case, note that the Type is Running and the Status is False, indicating that an event occurred that caused an attempted operation to fail, changing the Status from True to False.
However, note that the Reason is Completed and the Message field indicates Import Complete.
In the Events section, the Reason and Message contain additional troubleshooting information about the failed operation. In this example, the Message reports an inability to connect because of a 404 error, listed in the first Warning of the Events section.
From this information, you can conclude that an import operation was running, creating contention for other operations that are attempting to access the data volume:
Example output
```
Status:
  Conditions:
    Last Heart Beat Time:  2020-07-15T04:31:39Z
    Last Transition Time:  2020-07-15T04:31:39Z
    Message:               Import Complete
    Reason:                Completed
    Status:                False
    Type:                  Running
...
  Events:
    Type     Reason       Age                From                   Message
    ----     ------       ----               ----                   -------
    Warning  Error        12s (x2 over 14s)  datavolume-controller  Unable to connect to http data source: expected status code 200, got 404. Status: 404 Not Found
```
Ready - If the Type is Ready and the Status is True, the data volume is ready to be used, as in the following example. If the data volume is not ready to be used, the Status is False:
Example output
```
Status:
  Conditions:
    Last Heart Beat Time:  2020-07-15T04:31:39Z
    Last Transition Time:  2020-07-15T04:31:39Z
    Status:                True
    Type:                  Ready
```
