Chapter 15. Logging, events, and monitoring
15.1. Virtualization Overview page
The Virtualization Overview page provides a comprehensive view of virtualization resources, details, status, and top consumers:
- The Overview tab displays Getting started resources, details, inventory, alerts, and other information about your OpenShift Virtualization environment.
- The Top consumers tab displays high utilization of a specific resource by projects, virtual machines, or nodes.
- The Migrations tab displays the status of live migrations.
- The Settings tab displays cluster-wide settings, including live migration settings and user permissions.
By gaining insight into the overall health of OpenShift Virtualization, you can determine if intervention is required to resolve specific issues identified by examining the data.
15.1.1. Reviewing top consumers
You can view the top consumers of resources for a selected project, virtual machine, or node on the Top consumers tab of the Virtualization Overview page.
Prerequisites
- You must have access to the cluster as a user with the cluster-admin role.
- To use the vCPU wait metric on the Top consumers tab, you must apply the schedstats=enable kernel argument to the MachineConfig object.
Procedure
- In the Administrator perspective in the OpenShift Container Platform web console, navigate to Virtualization → Overview.
- Click the Top consumers tab.
- Optional: You can filter the results by selecting a time period or by selecting the 5 or 10 top consumers.
15.2. Viewing OpenShift Virtualization logs
You can view logs for OpenShift Virtualization components and virtual machines by using the web console or the oc CLI. You can retrieve virtual machine logs from the virt-launcher pod. To control log verbosity, edit the HyperConverged custom resource.
15.2.1. Viewing OpenShift Virtualization logs with the CLI
Configure log verbosity for OpenShift Virtualization components by editing the HyperConverged custom resource (CR) with the oc CLI.
Procedure
- To set log verbosity for specific components, open the HyperConverged CR in your default text editor by running the following command:

  $ oc edit hyperconverged kubevirt-hyperconverged -n openshift-cnv

- Set the log level for one or more components by editing the spec.logVerbosityConfig stanza. For example:

  apiVersion: hco.kubevirt.io/v1beta1
  kind: HyperConverged
  metadata:
    name: kubevirt-hyperconverged
  spec:
    logVerbosityConfig:
      kubevirt:
        virtAPI: 5 (1)
        virtController: 4
        virtHandler: 3
        virtLauncher: 2
        virtOperator: 6

  (1) The log verbosity value must be an integer in the range 1–9, where a higher number indicates a more detailed log. In this example, the virtAPI component logs are exposed if their priority level is 5 or higher.

- Apply your changes by saving and exiting the editor.
View a list of pods in the OpenShift Virtualization namespace by running the following command:
$ oc get pods -n openshift-cnv

Example 15.1. Example output

NAME                               READY   STATUS    RESTARTS   AGE
disks-images-provider-7gqbc        1/1     Running   0          32m
disks-images-provider-vg4kx        1/1     Running   0          32m
virt-api-57fcc4497b-7qfmc          1/1     Running   0          31m
virt-api-57fcc4497b-tx9nc          1/1     Running   0          31m
virt-controller-76c784655f-7fp6m   1/1     Running   0          30m
virt-controller-76c784655f-f4pbd   1/1     Running   0          30m
virt-handler-2m86x                 1/1     Running   0          30m
virt-handler-9qs6z                 1/1     Running   0          30m
virt-operator-7ccfdbf65f-q5snk     1/1     Running   0          32m
virt-operator-7ccfdbf65f-vllz8     1/1     Running   0          32m

To view logs for a component pod, run the following command:
$ oc logs -n openshift-cnv <pod_name>

For example:

$ oc logs -n openshift-cnv virt-handler-2m86x

Note: If a pod fails to start, you can use the --previous option to view logs from the last attempt.

To monitor log output in real time, use the -f option.

Example 15.2. Example output
{"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:453","timestamp":"2022-04-17T08:58:37.373695Z"}
{"component":"virt-handler","level":"info","msg":"set verbosity to 2","pos":"virt-handler.go:453","timestamp":"2022-04-17T08:58:37.373726Z"}
{"component":"virt-handler","level":"info","msg":"setting rate limiter to 5 QPS and 10 Burst","pos":"virt-handler.go:462","timestamp":"2022-04-17T08:58:37.373782Z"}
{"component":"virt-handler","level":"info","msg":"CPU features of a minimum baseline CPU model: map[apic:true clflush:true cmov:true cx16:true cx8:true de:true fpu:true fxsr:true lahf_lm:true lm:true mca:true mce:true mmx:true msr:true mtrr:true nx:true pae:true pat:true pge:true pni:true pse:true pse36:true sep:true sse:true sse2:true sse4.1:true ssse3:true syscall:true tsc:true]","pos":"cpu_plugin.go:96","timestamp":"2022-04-17T08:58:37.390221Z"}
{"component":"virt-handler","level":"warning","msg":"host model mode is expected to contain only one model","pos":"cpu_plugin.go:103","timestamp":"2022-04-17T08:58:37.390263Z"}
{"component":"virt-handler","level":"info","msg":"node-labeller is running","pos":"node_labeller.go:94","timestamp":"2022-04-17T08:58:37.391011Z"}
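Because the component logs are structured JSON, one record per line, they are easy to post-process after retrieval. The following Python sketch is an illustration (not part of the product tooling) that filters piped log output by severity, assuming the fields shown in the example output above:

```python
import json

def filter_logs(lines, min_level="warning"):
    """Yield parsed KubeVirt structured-log records at or above min_level.

    Assumes the JSON fields shown in the example output above:
    "component", "level", "msg", "pos", "timestamp".
    """
    severity = {"debug": 0, "info": 1, "warning": 2, "error": 3, "fatal": 4}
    threshold = severity[min_level]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON lines mixed into the stream
        if severity.get(record.get("level"), -1) >= threshold:
            yield record

# Two records shaped like the virt-handler output above:
sample = [
    '{"component":"virt-handler","level":"info","msg":"set verbosity to 2"}',
    '{"component":"virt-handler","level":"warning","msg":"host model mode is expected to contain only one model"}',
]
warnings = list(filter_logs(sample))
```

In practice you would feed the generator from standard input, for example by piping `oc logs -n openshift-cnv <pod_name>` into the script.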
15.2.2. Viewing virtual machine logs in the web console
Get virtual machine logs from the associated virtual machine launcher pod.
Procedure
- In the OpenShift Container Platform console, click Virtualization → VirtualMachines from the side menu.
- Select a virtual machine to open the VirtualMachine details page.
- Click the Details tab.
- Click the virt-launcher-<name> pod in the Pod section to open the Pod details page.
- Click the Logs tab to view the pod logs.
15.2.3. Common error messages
The following error messages might appear in OpenShift Virtualization logs:
ErrImagePull or ImagePullBackOff - Indicates an incorrect deployment configuration or problems with the images that are referenced.
15.3. Viewing events
15.3.1. About virtual machine events
OpenShift Container Platform events are records of important life-cycle information in a namespace and are useful for monitoring and troubleshooting resource scheduling, creation, and deletion issues.
OpenShift Virtualization adds events for virtual machines and virtual machine instances. These can be viewed from either the web console or the CLI.
See also: Viewing system event information in an OpenShift Container Platform cluster.
15.3.2. Viewing the events for a virtual machine in the web console
You can view streaming events for a running virtual machine on the VirtualMachine details page of the web console.
Procedure
- Click Virtualization → VirtualMachines from the side menu.
- Select a virtual machine to open the VirtualMachine details page.
- Click the Events tab to view streaming events for the virtual machine.
- The ▮▮ button pauses the events stream.
- The ▶ button resumes a paused events stream.
15.3.3. Viewing namespace events in the CLI
Use the OpenShift Container Platform client to get the events for a namespace.
Procedure
In the namespace, use the oc get command:

$ oc get events
15.3.4. Viewing resource events in the CLI
Events are included in the resource description, which you can get using the OpenShift Container Platform client.
Procedure
In the namespace, use the oc describe command. The following example shows how to get the events for a virtual machine, a virtual machine instance, and the virt-launcher pod for a virtual machine:

$ oc describe vm <vm>

$ oc describe vmi <vmi>

$ oc describe pod virt-launcher-<name>
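For scripted triage, the same events are available as JSON through oc get events -o json. The following Python sketch is illustrative only (not part of the oc tooling) and counts events by reason, a quick way to spot recurring failures in a namespace:

```python
from collections import Counter

def count_event_reasons(event_list):
    """Count events by reason from an `oc get events -o json` item list."""
    return Counter(item.get("reason", "<unknown>") for item in event_list)

# Shape of `items` in the JSON output (reduced to the fields used here);
# the messages below are hypothetical examples:
items = [
    {"reason": "Started", "message": "VirtualMachineInstance started."},
    {"reason": "SuccessfulCreate", "message": "Created virtual machine pod"},
    {"reason": "Started", "message": "VirtualMachineInstance started."},
]
counts = count_event_reasons(items)
```

A reason that appears many times in a short window usually deserves a closer look with oc describe on the affected resource.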
15.4. Monitoring live migration
You can monitor the progress of live migration from either the web console or the CLI.
15.4.1. Monitoring live migration by using the web console
You can monitor the progress of all live migrations on the Overview → Migrations tab in the web console.

You can view the migration metrics of a virtual machine on the VirtualMachine details → Metrics tab in the web console.
15.4.2. Monitoring live migration of a virtual machine instance in the CLI
The status of the virtual machine migration is stored in the Status component of the VirtualMachineInstance configuration.
Procedure
Use the oc describe command on the migrating virtual machine instance:

$ oc describe vmi vmi-fedora

Example output
...
Status:
  Conditions:
    Last Probe Time:       <nil>
    Last Transition Time:  <nil>
    Status:                True
    Type:                  LiveMigratable
  Migration Method:  LiveMigration
  Migration State:
    Completed:                    true
    End Timestamp:                2018-12-24T06:19:42Z
    Migration UID:                d78c8962-0743-11e9-a540-fa163e0c69f1
    Source Node:                  node2.example.com
    Start Timestamp:              2018-12-24T06:19:35Z
    Target Node:                  node1.example.com
    Target Node Address:          10.9.0.18:43891
    Target Node Domain Detected:  true
15.4.3. Metrics
You can use Prometheus queries to monitor live migration.
15.4.3.1. Live migration metrics
The following metrics can be queried to show live migration status:
kubevirt_migrate_vmi_data_processed_bytes - The amount of guest operating system (OS) data that has migrated to the new virtual machine (VM). Type: Gauge.
kubevirt_migrate_vmi_data_remaining_bytes - The amount of guest OS data that remains to be migrated. Type: Gauge.
kubevirt_migrate_vmi_dirty_memory_rate_bytes - The rate at which memory is becoming dirty in the guest OS. Dirty memory is data that has been changed but not yet written to disk. Type: Gauge.
kubevirt_migrate_vmi_pending_count - The number of pending migrations. Type: Gauge.
kubevirt_migrate_vmi_scheduling_count - The number of scheduling migrations. Type: Gauge.
kubevirt_migrate_vmi_running_count - The number of running migrations. Type: Gauge.
kubevirt_migrate_vmi_succeeded - The number of successfully completed migrations. Type: Gauge.
kubevirt_migrate_vmi_failed - The number of failed migrations. Type: Gauge.
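These metric names can be combined in ordinary PromQL expressions. As an illustrative sketch (these derived queries are not from the product documentation), a migration progress ratio can be computed from the processed and remaining byte gauges:

```
# Fraction of guest OS data already transferred for each migrating VMI
kubevirt_migrate_vmi_data_processed_bytes
  / (kubevirt_migrate_vmi_data_processed_bytes + kubevirt_migrate_vmi_data_remaining_bytes)

# Total migrations currently in flight across the cluster
sum(kubevirt_migrate_vmi_running_count)
```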
15.5. Diagnosing data volumes using events and conditions
Use the oc describe command to analyze data volume conditions and events.
15.5.1. About conditions and events
Diagnose data volume issues by examining the output of the Conditions and Events sections generated by the following command:

$ oc describe dv <DataVolume>
There are three Types in the Conditions section that display:

- Bound
- Running
- Ready
The Events section provides additional information about each event:

- Type of event
- Reason for logging
- Source of the event
- Message containing additional diagnostic information.
The output from oc describe does not always contain Events.
An event is generated when either Status, Reason, or Message changes.
For example, if you misspell the URL during an import operation, the import generates a 404 message. That message change generates an event with a reason. The output in the Conditions section is updated as well.
15.5.2. Analyzing data volumes using conditions and events
By inspecting the Conditions and Events sections generated by the describe command, you determine the state of the data volume in relation to persistent volume claims (PVCs), and whether or not an operation is actively running or completed.
There are many different combinations of conditions. Each must be evaluated in its unique context.
Examples of various combinations follow.
- Bound – A successfully bound PVC displays in this example.

  Note that the Type is Bound, so the Status is True. If the PVC is not bound, the Status is False.

  When the PVC is bound, an event is generated stating that the PVC is bound. In this case, the Reason is Bound and the Status is True. The Message indicates which PVC owns the data volume.

  Message, in the Events section, provides further details including how long the PVC has been bound (Age) and by what resource (From), in this case datavolume-controller:

  Example output

  Status:
    Conditions:
      Last Heart Beat Time:  2020-07-15T03:58:24Z
      Last Transition Time:  2020-07-15T03:58:24Z
      Message:               PVC win10-rootdisk Bound
      Reason:                Bound
      Status:                True
      Type:                  Bound
  Events:
    Type    Reason  Age  From                   Message
    ----    ------  ---  ----                   -------
    Normal  Bound   24s  datavolume-controller  PVC example-dv Bound

- Running – In this case, note that Type is Running and Status is False, indicating that an event has occurred that caused an attempted operation to fail, changing the Status from True to False.

  However, note that Reason is Completed and the Message field indicates Import Complete.

  In the Events section, the Reason and Message contain additional troubleshooting information about the failed operation. In this example, the Message displays an inability to connect due to a 404, listed in the Events section's first Warning.

  From this information, you conclude that an import operation was running, creating contention for other operations that are attempting to access the data volume:

  Example output

  Status:
    Conditions:
      Last Heart Beat Time:  2020-07-15T04:31:39Z
      Last Transition Time:  2020-07-15T04:31:39Z
      Message:               Import Complete
      Reason:                Completed
      Status:                False
      Type:                  Running
  Events:
    Type     Reason  Age                From                   Message
    ----     ------  ----               ----                   -------
    Warning  Error   12s (x2 over 14s)  datavolume-controller  Unable to connect to http data source: expected status code 200, got 404. Status: 404 Not Found

- Ready – If Type is Ready and Status is True, then the data volume is ready to be used, as in the following example. If the data volume is not ready to be used, the Status is False:

  Example output

  Status:
    Conditions:
      Last Heart Beat Time:  2020-07-15T04:31:39Z
      Last Transition Time:  2020-07-15T04:31:39Z
      Status:                True
      Type:                  Ready
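The per-condition logic described above can also be checked programmatically from the machine-readable counterpart of oc describe, namely oc get dv <name> -o json. The following Python sketch is an illustration (not part of the product tooling) that evaluates a data volume's conditions according to the rules above:

```python
def summarize_dv(status):
    """Summarize a DataVolume's Bound/Running/Ready conditions.

    `status` is the `status` object from `oc get dv <name> -o json`.
    Mirrors the rules above: Ready=True means usable; Running=False with
    Reason=Completed means an operation finished; otherwise inspect Message.
    """
    conditions = {c["type"]: c for c in status.get("conditions", [])}
    ready = conditions.get("Ready", {}).get("status") == "True"
    running = conditions.get("Running", {})
    if ready:
        return "data volume is ready to use"
    if running.get("status") == "False" and running.get("reason") == "Completed":
        return "operation completed; data volume not yet ready"
    return "check message: " + running.get("message", "<none>")

# Condition list matching the Running example output above:
status = {"conditions": [
    {"type": "Bound", "status": "True", "reason": "Bound"},
    {"type": "Running", "status": "False", "reason": "Completed",
     "message": "Import Complete"},
]}
summary = summarize_dv(status)
```

The same dictionary lookup generalizes to any condition-bearing Kubernetes resource, since conditions share the type/status/reason/message shape.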
15.6. Viewing information about virtual machine workloads
You can view high-level information about your virtual machines by using the Virtual Machines dashboard in the OpenShift Container Platform web console.
15.6.1. The Virtual Machines dashboard
Access virtual machines (VMs) from the OpenShift Container Platform web console by navigating to the Virtualization → VirtualMachines page.
The Overview tab displays the following cards:
Details provides identifying information about the virtual machine, including:
- Name
- Status
- Date of creation
- Operating system
- CPU and memory
- Hostname
- Template
If the VM is running, there is an active VNC preview window and a link to open the VNC web console. The Options menu on the Details card provides options to stop or pause the VM, and to copy the ssh over nodeport command for SSH tunneling.

Alerts lists VM alerts with three severity levels:
- Critical
- Warning
- Info
Snapshots provides information about VM snapshots and the ability to take a snapshot. For each snapshot listed, the Snapshots card includes:
- A visual indicator of the status of the snapshot: whether it is successfully created, is still in progress, or has failed.
- An Options menu with options to restore or delete the snapshot
Network interfaces provides information about the network interfaces of the VM, including:
- Name (Network and Type)
- IP address, with the ability to copy the IP address to the clipboard
Disks lists VM disks details, including:
- Name
- Drive
- Size
Utilization includes charts that display usage data for:
- CPU
- Memory
- Storage
- Network transfer
Note: Use the drop-down list to choose a duration for the utilization data. The available options are 5 minutes, 1 hour, 6 hours, and 24 hours.
Hardware Devices provides information about GPU and host devices, including:
- Resource name
- Hardware device name
15.7. Monitoring virtual machine health
A virtual machine instance (VMI) can become unhealthy due to transient issues such as connectivity loss, deadlocks, or problems with external dependencies. A health check periodically performs diagnostics on a VMI by using any combination of the readiness and liveness probes.
15.7.1. About readiness and liveness probes
Use readiness and liveness probes to detect and handle unhealthy virtual machine instances (VMIs). You can include one or more probes in the specification of the VMI to ensure that traffic does not reach a VMI that is not ready for it and that a new instance is created when a VMI becomes unresponsive.
A readiness probe determines whether a VMI is ready to accept service requests. If the probe fails, the VMI is removed from the list of available endpoints until the VMI is ready.
A liveness probe determines whether a VMI is responsive. If the probe fails, the VMI is deleted and a new instance is created to restore responsiveness.
You can configure readiness and liveness probes by setting the spec.readinessProbe and the spec.livenessProbe fields of the VirtualMachineInstance (VMI) configuration.
- HTTP GET
- The probe determines the health of the VMI by using a web hook. The test is successful if the HTTP response code is between 200 and 399. You can use an HTTP GET test with applications that return HTTP status codes when they are completely initialized.
- TCP socket
- The probe attempts to open a socket to the VMI. The VMI is only considered healthy if the probe can establish a connection. You can use a TCP socket test with applications that do not start listening until initialization is complete.
- Guest agent ping
- The probe uses the guest-ping command to determine if the QEMU guest agent is running on the virtual machine.
15.7.2. Defining an HTTP readiness probe
Define an HTTP readiness probe by setting the spec.readinessProbe.httpGet field of the virtual machine instance (VMI) configuration.
Procedure
Include details of the readiness probe in the VMI configuration file.
Sample readiness probe with an HTTP GET test
# ...
spec:
  readinessProbe:
    httpGet: (1)
      port: 1500 (2)
      path: /healthz (3)
      httpHeaders:
      - name: Custom-Header
        value: Awesome
    initialDelaySeconds: 120 (4)
    periodSeconds: 20 (5)
    timeoutSeconds: 10 (6)
    failureThreshold: 3 (7)
    successThreshold: 3 (8)
# ...

(1) The HTTP GET request to perform to connect to the VMI.
(2) The port of the VMI that the probe queries. In the above example, the probe queries port 1500.
(3) The path to access on the HTTP server. In the above example, if the handler for the server's /healthz path returns a success code, the VMI is considered to be healthy. If the handler returns a failure code, the VMI is removed from the list of available endpoints.
(4) The time, in seconds, after the VMI starts before the readiness probe is initiated.
(5) The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
(6) The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
(7) The number of times that the probe is allowed to fail. The default is 3. After the specified number of attempts, the pod is marked Unready.
(8) The number of times that the probe must report success, after a failure, to be considered successful. The default is 1.
Create the VMI by running the following command:
$ oc create -f <file_name>.yaml
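The callouts above impose ordering constraints on the timing fields: periodSeconds must exceed timeoutSeconds. The following Python sketch is a hypothetical helper, not part of the oc tooling, that validates these constraints on a probe stanza before you apply a VMI manifest:

```python
def validate_probe(probe):
    """Check the timing constraints stated in the callouts above.

    `probe` is the dict under spec.readinessProbe (or spec.livenessProbe).
    Returns a list of problems; an empty list means the timings are consistent.
    """
    period = probe.get("periodSeconds", 10)   # default delay: 10 seconds
    timeout = probe.get("timeoutSeconds", 1)  # default timeout: 1 second
    problems = []
    if period <= timeout:
        problems.append(
            f"periodSeconds ({period}) must be greater than timeoutSeconds ({timeout})"
        )
    if probe.get("initialDelaySeconds", 0) < 0:
        problems.append("initialDelaySeconds must not be negative")
    return problems

# The timing values from the sample readiness probe in this section:
probe = {"initialDelaySeconds": 120, "periodSeconds": 20,
         "timeoutSeconds": 10, "failureThreshold": 3, "successThreshold": 3}
issues = validate_probe(probe)
```

The same check applies to the TCP readiness, HTTP liveness, and guest agent ping probes in the following sections, which use identical timing fields.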
15.7.3. Defining a TCP readiness probe
Define a TCP readiness probe by setting the spec.readinessProbe.tcpSocket field of the VMI configuration.
Procedure
Include details of the TCP readiness probe in the VMI configuration file.
Sample readiness probe with a TCP socket test
...
spec:
  readinessProbe:
    initialDelaySeconds: 120 (1)
    periodSeconds: 20 (2)
    tcpSocket: (3)
      port: 1500 (4)
    timeoutSeconds: 10 (5)
...

(1) The time, in seconds, after the VMI starts before the readiness probe is initiated.
(2) The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
(3) The TCP action to perform.
(4) The port of the VMI that the probe queries.
(5) The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
Create the VMI by running the following command:
$ oc create -f <file_name>.yaml
15.7.4. Defining an HTTP liveness probe
Define an HTTP liveness probe by setting the spec.livenessProbe.httpGet field of the VMI configuration.
Procedure
Include details of the HTTP liveness probe in the VMI configuration file.
Sample liveness probe with an HTTP GET test
# ...
spec:
  livenessProbe:
    initialDelaySeconds: 120 (1)
    periodSeconds: 20 (2)
    httpGet: (3)
      port: 1500 (4)
      path: /healthz (5)
      httpHeaders:
      - name: Custom-Header
        value: Awesome
    timeoutSeconds: 10 (6)
# ...

(1) The time, in seconds, after the VMI starts before the liveness probe is initiated.
(2) The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
(3) The HTTP GET request to perform to connect to the VMI.
(4) The port of the VMI that the probe queries. In the above example, the probe queries port 1500. The VMI installs and runs a minimal HTTP server on port 1500 via cloud-init.
(5) The path to access on the HTTP server. In the above example, if the handler for the server's /healthz path returns a success code, the VMI is considered to be healthy. If the handler returns a failure code, the VMI is deleted and a new instance is created.
(6) The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
Create the VMI by running the following command:
$ oc create -f <file_name>.yaml
15.7.5. Defining a guest agent ping probe
Define a guest agent ping probe by setting the spec.readinessProbe.guestAgentPing field of the VMI configuration.
The guest agent ping probe is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Prerequisites
- The QEMU guest agent must be installed and enabled on the virtual machine.
Procedure
Include details of the guest agent ping probe in the VMI configuration file. For example:
Sample guest agent ping probe
# ...
spec:
  readinessProbe:
    guestAgentPing: {} (1)
    initialDelaySeconds: 120 (2)
    periodSeconds: 20 (3)
    timeoutSeconds: 10 (4)
    failureThreshold: 3 (5)
    successThreshold: 3 (6)
# ...

(1) The guest agent ping probe to connect to the VMI.
(2) Optional: The time, in seconds, after the VMI starts before the guest agent probe is initiated.
(3) Optional: The delay, in seconds, between performing probes. The default delay is 10 seconds. This value must be greater than timeoutSeconds.
(4) Optional: The number of seconds of inactivity after which the probe times out and the VMI is assumed to have failed. The default value is 1. This value must be lower than periodSeconds.
(5) Optional: The number of times that the probe is allowed to fail. The default is 3. After the specified number of attempts, the pod is marked Unready.
(6) Optional: The number of times that the probe must report success, after a failure, to be considered successful. The default is 1.
Create the VMI by running the following command:
$ oc create -f <file_name>.yaml
15.7.6. Template: Virtual machine configuration file for defining health checks
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
labels:
special: vm-fedora
name: vm-fedora
spec:
template:
metadata:
labels:
special: vm-fedora
spec:
domain:
devices:
disks:
- disk:
bus: virtio
name: containerdisk
- disk:
bus: virtio
name: cloudinitdisk
resources:
requests:
memory: 1024M
readinessProbe:
httpGet:
port: 1500
initialDelaySeconds: 120
periodSeconds: 20
timeoutSeconds: 10
failureThreshold: 3
successThreshold: 3
terminationGracePeriodSeconds: 180
volumes:
- name: containerdisk
containerDisk:
image: kubevirt/fedora-cloud-registry-disk-demo
- cloudInitNoCloud:
userData: |-
#cloud-config
password: fedora
chpasswd: { expire: False }
bootcmd:
- setenforce 0
- dnf install -y nmap-ncat
- systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
name: cloudinitdisk
15.8. Using the OpenShift Container Platform dashboard to get cluster information
Access the OpenShift Container Platform dashboard, which captures high-level information about the cluster, by clicking Home > Dashboards > Overview from the OpenShift Container Platform web console.
The OpenShift Container Platform dashboard provides various cluster information, captured in individual dashboard cards.
15.8.1. About the OpenShift Container Platform dashboards page
Access the OpenShift Container Platform dashboard, which captures high-level information about the cluster, by navigating to Home → Dashboards → Overview from the OpenShift Container Platform web console.
The OpenShift Container Platform dashboard provides various cluster information, captured in individual dashboard cards.
The OpenShift Container Platform dashboard consists of the following cards:
Details provides a brief overview of informational cluster details.
Status includes ok, error, warning, in progress, and unknown. Resources can add custom status names.
- Cluster ID
- Provider
- Version
Cluster Inventory details the number of resources and associated statuses. It is helpful when intervention is required to resolve problems, including information about:
- Number of nodes
- Number of pods
- Persistent storage volume claims
- Virtual machines (available if OpenShift Virtualization is installed)
- Bare metal hosts in the cluster, listed according to their state (only available in metal3 environment).
Cluster Health summarizes the current health of the cluster as a whole, including relevant alerts and descriptions. If OpenShift Virtualization is installed, the overall health of OpenShift Virtualization is diagnosed as well. If more than one subsystem is present, click See All to view the status of each subsystem.
Status helps administrators understand how cluster resources are consumed. Click on a resource to jump to a detailed page listing pods and nodes that consume the largest amount of the specified cluster resource (CPU, memory, or storage).
Cluster Utilization shows the capacity of various resources over a specified period of time, to help administrators understand the scale and frequency of high resource consumption, including information about:
- CPU time
- Memory allocation
- Storage consumed
- Network resources consumed
- Pod count
Activity lists messages related to recent activity in the cluster, such as pod creation or virtual machine migration to another host.
15.9. Reviewing resource usage by virtual machines
Dashboards in the OpenShift Container Platform web console provide visual representations of cluster metrics to help you to quickly understand the state of your cluster. Dashboards belong to the Monitoring overview that provides monitoring for core platform components.
The OpenShift Virtualization dashboard provides data on resource consumption for virtual machines and associated pods. The visualization metrics displayed in the OpenShift Virtualization dashboard are based on Prometheus Query Language (PromQL) queries.
A monitoring role is required to monitor user-defined namespaces in the OpenShift Virtualization dashboard.
You can view resource usage for a specific virtual machine on the VirtualMachine details page of the web console.
15.9.1. About reviewing top consumers
In the OpenShift Virtualization dashboard, you can select a specific time period and view the top consumers of resources within that time period. Top consumers are virtual machines or virt-launcher pods that are consuming the highest amount of resources.
The following table shows resources monitored in the dashboard and describes the metrics associated with each resource for top consumers.
| Monitored resources | Description |
| Memory swap traffic | Virtual machines consuming the most memory pressure when swapping memory. |
| vCPU wait | Virtual machines experiencing the maximum wait time (in seconds) for their vCPUs. |
| CPU usage by pod | The virt-launcher pods with the highest CPU usage. |
| Network traffic | Virtual machines that are saturating the network by receiving the most amount of network traffic (in bytes). |
| Storage traffic | Virtual machines with the highest amount (in bytes) of storage-related traffic. |
| Storage IOPS | Virtual machines with the highest amount of I/O operations per second over a time period. |
| Memory usage | The virt-launcher pods with the highest memory usage. |
Viewing the maximum resource consumption is limited to the top five consumers.
15.9.2. Reviewing top consumers
In the Administrator perspective, you can view the OpenShift Virtualization dashboard where top consumers of resources are displayed.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role.
Procedure
- In the Administrator perspective in the OpenShift Container Platform web console, navigate to Observe → Dashboards.
- Select the KubeVirt/Infrastructure Resources/Top Consumers dashboard from the Dashboard list.
- Select a predefined time period from the drop-down menu for Period. You can review the data for top consumers in the tables.
- Optional: Click Inspect to view or edit the Prometheus Query Language (PromQL) query associated with the top consumers for a table.
15.10. OpenShift Container Platform cluster monitoring, logging, and Telemetry
OpenShift Container Platform provides various resources for monitoring at the cluster level.
15.10.1. About OpenShift Container Platform monitoring
OpenShift Container Platform includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. OpenShift Container Platform delivers monitoring best practices out of the box. A set of alerts are included by default that immediately notify cluster administrators about issues with a cluster. Default dashboards in the OpenShift Container Platform web console include visual representations of cluster metrics to help you to quickly understand the state of your cluster.
After installing OpenShift Container Platform 4.12, cluster administrators can optionally enable monitoring for user-defined projects. By using this feature, cluster administrators, developers, and other users can specify how services and pods are monitored in their own projects. You can then query metrics, review dashboards, and manage alerting rules and silences for your own projects in the OpenShift Container Platform web console.
Cluster administrators can grant developers and other users permission to monitor their own projects. Privileges are granted by assigning one of the predefined monitoring roles.
15.10.2. Logging architecture
The major components of the logging subsystem are:
- Collector
The collector is a daemonset that deploys pods to each OpenShift Container Platform node. It collects log data from each node, transforms the data, and forwards it to configured outputs. You can use the Vector collector or the legacy Fluentd collector.
Note: Fluentd is deprecated and is planned to be removed in a future release. Red Hat provides bug fixes and support for this feature during the current release lifecycle, but this feature no longer receives enhancements. As an alternative to Fluentd, you can use Vector instead.
- Log store
The log store stores log data for analysis and is the default output for the log forwarder. You can use the default LokiStack log store, the legacy Elasticsearch log store, or forward logs to additional external log stores.
Note: The Logging 5.9 release does not contain an updated version of the OpenShift Elasticsearch Operator. If you currently use the OpenShift Elasticsearch Operator released with Logging 5.8, it will continue to work with Logging until the EOL of Logging 5.8. As an alternative to using the OpenShift Elasticsearch Operator to manage the default log storage, you can use the Loki Operator. For more information on the Logging lifecycle dates, see Platform Agnostic Operators.
- Visualization
You can use a UI component to view a visual representation of your log data. The UI provides a graphical interface to search, query, and view stored logs. The OpenShift Container Platform web console UI is provided by enabling the OpenShift Container Platform console plugin.
Note: The Kibana web console is now deprecated and is planned to be removed in a future logging release.
Logging collects container logs and node logs. These are categorized into types:
- Application logs
- Container logs generated by user applications running in the cluster, except infrastructure container applications.
- Infrastructure logs
- Container logs generated by infrastructure namespaces: openshift*, kube*, or default, as well as journald messages from nodes.
- Audit logs
- Logs generated by auditd, the node audit system, which are stored in the /var/log/audit/audit.log file, and logs from the auditd, kube-apiserver, and openshift-apiserver services, as well as the ovn project if enabled.
For more information on OpenShift Logging, see the OpenShift Logging documentation.
15.10.3. About Telemetry
Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. The Telemeter Client fetches the metrics values every four minutes and thirty seconds and uploads the data to Red Hat. These metrics are described in this document.
This stream of data is used by Red Hat to monitor the clusters in real-time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OpenShift Container Platform upgrades to customers to minimize service impact and continuously improve the upgrade experience.
This debugging information is available to Red Hat Support and Engineering teams with the same restrictions as accessing data reported through support cases. All connected cluster information is used by Red Hat to help make OpenShift Container Platform better and more intuitive to use.
15.10.3.1. Information collected by Telemetry
The following information is collected by Telemetry:
15.10.3.1.1. System information
- Version information, including the OpenShift Container Platform cluster version and installed update details that are used to determine update version availability
- Update information, including the number of updates available per cluster, the channel and image repository used for an update, update progress information, and the number of errors that occur in an update
- The unique random identifier that is generated during an installation
- Configuration details that help Red Hat Support to provide beneficial support for customers, including node configuration at the cloud infrastructure level, hostnames, IP addresses, Kubernetes pod names, namespaces, and services
- The OpenShift Container Platform framework components installed in a cluster and their condition and status
- Events for all namespaces listed as "related objects" for a degraded Operator
- Information about degraded software
- Information about the validity of certificates
- The name of the provider platform that OpenShift Container Platform is deployed on and the data center location
15.10.3.1.2. Sizing Information
- Sizing information about clusters, machine types, and machines, including the number of CPU cores and the amount of RAM used for each
- The number of running virtual machine instances in a cluster
- The number of etcd members and the number of objects stored in the etcd cluster
- Number of application builds by build strategy type
15.10.3.1.3. Usage information
- Usage information about components, features, and extensions
- Usage details about Technology Previews and unsupported configurations
Telemetry does not collect identifying information such as usernames or passwords. Red Hat does not intend to collect personal information. If Red Hat discovers that personal information has been inadvertently received, Red Hat will delete such information. To the extent that any telemetry data constitutes personal data, please refer to the Red Hat Privacy Statement for more information about Red Hat’s privacy practices.
15.10.4. CLI troubleshooting and debugging commands
For a list of the oc client troubleshooting and debugging commands, see the OpenShift Container Platform CLI tools documentation.
15.11. Running cluster checkups
OpenShift Virtualization includes predefined checkups that can be used for cluster maintenance and troubleshooting.
The OpenShift Container Platform cluster checkup framework is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
15.11.1. About the OpenShift Container Platform cluster checkup framework
A checkup is an automated test workload that allows you to verify if a specific cluster functionality works as expected. The cluster checkup framework uses native Kubernetes resources to configure and execute the checkup.
By using predefined checkups, cluster administrators and developers can improve cluster maintainability, troubleshoot unexpected behavior, minimize errors, and save time. They can also review the results of the checkup and share them with experts for further analysis. Vendors can write and publish checkups for features or services that they provide and verify that their customer environments are configured correctly.
Running a predefined checkup in an existing namespace involves setting up a service account for the checkup, and creating the Role and RoleBinding objects for the service account.
You must always:
- Verify that the checkup image is from a trustworthy source before applying it.
- Review the checkup permissions before creating the Role and RoleBinding objects.
15.11.2. Checking network connectivity and latency for virtual machines on a secondary network
You use a predefined checkup to verify network connectivity and measure latency between two virtual machines (VMs) that are attached to a secondary network interface.
To run a checkup for the first time, follow the steps in the procedure.
If you have previously run a checkup, skip to step 5 of the procedure because the steps to install the framework and enable permissions for the checkup are not required.
Prerequisites
- You installed the OpenShift CLI (oc).
- The cluster has at least two worker nodes.
- The Multus Container Network Interface (CNI) plugin is installed on the cluster.
- You configured a network attachment definition for a namespace.
Procedure
Create a manifest file that contains the ServiceAccount, Role, and RoleBinding objects with permissions that the checkup requires for cluster access:
Example role manifest file:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vm-latency-checkup-sa
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kubevirt-vm-latency-checker
rules:
- apiGroups: ["kubevirt.io"]
  resources: ["virtualmachineinstances"]
  verbs: ["get", "create", "delete"]
- apiGroups: ["subresources.kubevirt.io"]
  resources: ["virtualmachineinstances/console"]
  verbs: ["get"]
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubevirt-vm-latency-checker
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kubevirt-vm-latency-checker
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kiagnose-configmap-access
rules:
- apiGroups: [ "" ]
  resources: [ "configmaps" ]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kiagnose-configmap-access
subjects:
- kind: ServiceAccount
  name: vm-latency-checkup-sa
roleRef:
  kind: Role
  name: kiagnose-configmap-access
  apiGroup: rbac.authorization.k8s.io
Apply the checkup roles manifest:
$ oc apply -n <target_namespace> -f <latency_roles>.yaml
where:
<target_namespace>
- Specifies the namespace where the checkup is to be run. This must be an existing namespace where the NetworkAttachmentDefinition object resides.
Create a ConfigMap manifest that contains the input parameters for the checkup. The config map provides the input for the framework to run the checkup and also stores the results of the checkup.
Example input config map:
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: <target_namespace>
  spec.param.network_attachment_definition_name: "blue-network"
  spec.param.max_desired_latency_milliseconds: "10"
  spec.param.sample_duration_seconds: "5"
  spec.param.source_node: "worker1"
  spec.param.target_node: "worker2"
where:
spec.param.network_attachment_definition_name
- Specifies the name of the NetworkAttachmentDefinition object.
spec.param.max_desired_latency_milliseconds
- Optional: Specifies the maximum desired latency, in milliseconds, between the virtual machines. If the measured latency exceeds this value, the checkup fails.
spec.param.sample_duration_seconds
- Optional: Specifies the duration of the latency check, in seconds.
spec.param.source_node
- Optional: When specified, latency is measured from this node to the target node. If the source node is specified, the spec.param.target_node field cannot be empty.
spec.param.target_node
- Optional: When specified, latency is measured from the source node to this node.
Apply the config map manifest in the target namespace:
$ oc apply -n <target_namespace> -f <latency_config_map>.yaml
Create a Job object to run the checkup:
Example job manifest:
apiVersion: batch/v1
kind: Job
metadata:
  name: kubevirt-vm-latency-checkup
spec:
  backoffLimit: 0
  template:
    spec:
      serviceAccountName: vm-latency-checkup-sa
      restartPolicy: Never
      containers:
      - name: vm-latency-checkup
        image: registry.redhat.io/container-native-virtualization/vm-network-latency-checkup:v4.12.0
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
          runAsNonRoot: true
          seccompProfile:
            type: "RuntimeDefault"
        env:
        - name: CONFIGMAP_NAMESPACE
          value: <target_namespace>
        - name: CONFIGMAP_NAME
          value: kubevirt-vm-latency-checkup-config
manifest. The checkup uses the ping utility to verify connectivity and measure latency.Job$ oc apply -n <target_namespace> -f <latency_job>.yamlWait for the job to complete:
$ oc wait job kubevirt-vm-latency-checkup -n <target_namespace> --for condition=complete --timeout 6m
Review the results of the latency checkup by running the following command. If the maximum measured latency is greater than the value of the spec.param.max_desired_latency_milliseconds attribute, the checkup fails and returns an error.
$ oc get configmap kubevirt-vm-latency-checkup-config -n <target_namespace> -o yaml
Example output config map (success):
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubevirt-vm-latency-checkup-config
  namespace: <target_namespace>
data:
  spec.timeout: 5m
  spec.param.network_attachment_definition_namespace: <target_namespace>
  spec.param.network_attachment_definition_name: "blue-network"
  spec.param.max_desired_latency_milliseconds: "10"
  spec.param.sample_duration_seconds: "5"
  spec.param.source_node: "worker1"
  spec.param.target_node: "worker2"
  status.succeeded: "true"
  status.failureReason: ""
  status.startTimestamp: "2022-01-01T09:00:00Z"
  status.completionTimestamp: "2022-01-01T09:00:07Z"
  status.result.avgLatencyNanoSec: "177000"
  status.result.maxLatencyNanoSec: "244000"
  status.result.measurementDurationSec: "5"
  status.result.minLatencyNanoSec: "135000"
  status.result.sourceNode: "worker1"
  status.result.targetNode: "worker2"
where:
status.result.maxLatencyNanoSec
- Specifies the maximum measured latency in nanoseconds.
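The checkup's pass/fail decision can be reproduced from the reported values. The following sketch (the numbers are the sample values from the example output above, not live data) converts the nanosecond result to milliseconds and compares it against the 10 ms threshold:

```shell
# Hypothetical local re-check of the checkup result: convert the reported
# maximum latency (nanoseconds) to milliseconds and compare it with the
# spec.param.max_desired_latency_milliseconds threshold.
max_latency_nanosec=244000   # status.result.maxLatencyNanoSec from the example
threshold_ms=10              # spec.param.max_desired_latency_milliseconds
max_latency_ms=$(awk -v ns="$max_latency_nanosec" 'BEGIN { printf "%.3f", ns / 1000000 }')
echo "max latency: ${max_latency_ms} ms"
if awk -v ms="$max_latency_ms" -v t="$threshold_ms" 'BEGIN { exit !(ms <= t) }'; then
  echo "checkup would succeed"
else
  echo "checkup would fail"
fi
```

Here 244000 ns is 0.244 ms, well under the 10 ms limit, so the checkup reports status.succeeded: "true".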
Optional: To view the detailed job log in case of checkup failure, use the following command:
$ oc logs job.batch/kubevirt-vm-latency-checkup -n <target_namespace>
Delete the job and config map resources that you previously created by running the following commands:
$ oc delete job -n <target_namespace> kubevirt-vm-latency-checkup
$ oc delete configmap -n <target_namespace> kubevirt-vm-latency-checkup-config
Optional: If you do not plan to run another checkup, delete the checkup role and framework manifest files.
$ oc delete -f <file_name>.yaml
15.12. Prometheus queries for virtual resources
OpenShift Virtualization provides metrics that you can use to monitor the consumption of cluster infrastructure resources, including vCPU, network, storage, and guest memory swapping. You can also use metrics to query live migration status.
Use the OpenShift Container Platform monitoring dashboard to query virtualization metrics.
15.12.1. Prerequisites
- To use the vCPU metric, the schedstats=enable kernel argument must be applied to the MachineConfig object. This kernel argument enables scheduler statistics used for debugging and performance tuning and adds a minor additional load to the scheduler. See the OpenShift Container Platform machine configuration tasks documentation for more information on applying a kernel argument.
- For guest memory swapping queries to return data, memory swapping must be enabled on the virtual guests.
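As a rough sketch of what applying the kernel argument looks like, a MachineConfig object along the following lines could be used (the object name 99-worker-schedstats and the worker role are illustrative assumptions, not part of this procedure; applying a MachineConfig triggers a rolling reboot of the affected nodes):

```yaml
# Illustrative sketch: apply schedstats=enable to worker nodes.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-schedstats   # hypothetical name
spec:
  kernelArguments:
    - schedstats=enable
```

See the machine configuration tasks documentation for the supported procedure.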
15.12.2. About querying metrics
The OpenShift Container Platform monitoring dashboard enables you to run Prometheus Query Language (PromQL) queries to examine metrics visualized on a plot. This functionality provides information about the state of a cluster and any user-defined workloads that you are monitoring.
As a cluster administrator, you can query metrics for all core OpenShift Container Platform and user-defined projects.
As a developer, you must specify a project name when querying metrics. You must have the required privileges to view metrics for the selected project.
15.12.2.1. Querying metrics for all projects as a cluster administrator
As a cluster administrator or as a user with view permissions for all projects, you can access metrics for all default OpenShift Container Platform and user-defined projects in the Metrics UI.
Prerequisites
- You have access to the cluster as a user with the cluster-admin cluster role or with view permissions for all projects.
- You have installed the OpenShift CLI (oc).
Procedure
- From the Administrator perspective of the OpenShift Container Platform web console, go to Observe → Metrics.
To add one or more queries, perform any of the following actions:
Create a custom query.
- Add your Prometheus Query Language (PromQL) query to the Expression field. As you type a PromQL expression, autocomplete suggestions are displayed in a list. These suggestions include functions, metrics, labels, and time tokens. You can use the keyboard arrows to select one of these suggested items and then press Enter to add the item to your expression. You can also move your mouse pointer over a suggested item to view a brief description of that item.
Add multiple queries.
- Click Add query.
Duplicate an existing query.
- Click the Options menu next to the query and select Duplicate query.
Delete a query.
- Click the Options menu next to the query and select Delete query.
Disable a query from being run.
- Click the Options menu next to the query and select Disable query.
To run queries that you created, click Run queries. The metrics from the queries are visualized on the plot. If a query is invalid, the UI shows an error message.
Note: Queries that operate on large amounts of data might time out or overload the browser when drawing time series graphs. To avoid this, click Hide graph and calibrate your query by using the metrics table. After finding a feasible query, enable the plot to draw the graphs.
- Optional: The page URL now contains the queries you ran. To use this set of queries again in the future, save this URL.
15.12.2.2. Querying metrics for user-defined projects as a developer
You can access metrics for a user-defined project as a developer or as a user with view permissions for the project.
In the Developer perspective, the Metrics UI includes some predefined CPU, memory, bandwidth, and network packet queries for the selected project. You can also run custom Prometheus Query Language (PromQL) queries for CPU, memory, bandwidth, network packet and application metrics for the project.
Developers can only use the Developer perspective and not the Administrator perspective. As a developer, you can only query metrics for one project at a time in the Observe → Metrics page in the web console for your user-defined project.
Prerequisites
- You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.
- You have enabled monitoring for user-defined projects.
- You have deployed a service in a user-defined project.
- You have created a ServiceMonitor custom resource definition (CRD) for the service to define how the service is monitored.
Procedure
- Select the Developer perspective in the OpenShift Container Platform web console.
- Select Observe → Metrics.
- Select the project that you want to view metrics for in the Project: list.
- Select a query from the Select query list, or create a custom PromQL query based on the selected query by selecting Show PromQL.
Optional: Select Custom query from the Select query list to enter a new query. As you type, autocomplete suggestions appear in a drop-down list. These suggestions include functions and metrics. Click a suggested item to select it.
Note: In the Developer perspective, you can only run one query at a time.
15.12.3. Virtualization metrics
The following metric descriptions include example Prometheus Query Language (PromQL) queries. These metrics are not an API and might change between versions.
The following examples use topk queries that specify a time period.
15.12.3.1. vCPU metrics
The following query can identify virtual machines that are waiting for Input/Output (I/O):
kubevirt_vmi_vcpu_wait_seconds- Returns the wait time (in seconds) for a virtual machine’s vCPU. Type: Counter.
A value above 0 means that the vCPU wants to run, but the host scheduler cannot run it yet. This inability to run indicates that there is an issue with I/O.
To query the vCPU metric, the schedstats=enable kernel argument must be applied to the MachineConfig object.
Example vCPU wait time query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_vcpu_wait_seconds[6m]))) > 0
This query returns the top 3 VMs waiting for I/O at every given moment over a six-minute time period.
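Because kubevirt_vmi_vcpu_wait_seconds is a counter of seconds, its rate over a window can be read as the fraction of time the vCPU spent waiting. A small sketch with a hypothetical sample value:

```shell
# A rate of 0.12 wait-seconds per second (hypothetical sample) means the vCPU
# was ready to run but unscheduled for 12% of the sampled window.
wait_rate=0.12
pct=$(awk -v r="$wait_rate" 'BEGIN { printf "%.0f", r * 100 }')
echo "vCPU waited ${pct}% of the time"
```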
15.12.3.2. Network metrics
The following queries can identify virtual machines that are saturating the network:
kubevirt_vmi_network_receive_bytes_total- Returns the total amount of traffic received (in bytes) on the virtual machine’s network. Type: Counter.
kubevirt_vmi_network_transmit_bytes_total- Returns the total amount of traffic transmitted (in bytes) on the virtual machine’s network. Type: Counter.
Example network traffic query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_network_receive_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_network_transmit_bytes_total[6m]))) > 0
This query returns the top 3 VMs transmitting the most network traffic at every given moment over a six-minute time period.
15.12.3.3. Storage metrics
15.12.3.3.1. Storage-related traffic
The following queries can identify VMs that are writing large amounts of data:
kubevirt_vmi_storage_read_traffic_bytes_total- Returns the total amount (in bytes) of the virtual machine’s storage-related traffic. Type: Counter.
kubevirt_vmi_storage_write_traffic_bytes_total- Returns the total amount of storage writes (in bytes) of the virtual machine’s storage-related traffic. Type: Counter.
Example storage-related traffic query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_read_traffic_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_write_traffic_bytes_total[6m]))) > 0
This query returns the top 3 VMs performing the most storage traffic at every given moment over a six-minute time period.
15.12.3.3.2. Storage snapshot data
kubevirt_vmsnapshot_disks_restored_from_source_total- Returns the total number of virtual machine disks restored from the source virtual machine. Type: Gauge.
kubevirt_vmsnapshot_disks_restored_from_source_bytes- Returns the amount of space in bytes restored from the source virtual machine. Type: Gauge.
Examples of storage snapshot data queries
kubevirt_vmsnapshot_disks_restored_from_source_total{vm_name="simple-vm", vm_namespace="default"}
This query returns the total number of virtual machine disks restored from the source virtual machine.
kubevirt_vmsnapshot_disks_restored_from_source_bytes{vm_name="simple-vm", vm_namespace="default"}
This query returns the amount of space in bytes restored from the source virtual machine.
15.12.3.3.3. I/O performance
The following queries can determine the I/O performance of storage devices:
kubevirt_vmi_storage_iops_read_total- Returns the amount of read I/O operations the virtual machine is performing per second. Type: Counter.
kubevirt_vmi_storage_iops_write_total- Returns the amount of write I/O operations the virtual machine is performing per second. Type: Counter.
Example I/O performance query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_read_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_storage_iops_write_total[6m]))) > 0
This query returns the top 3 VMs performing the most I/O operations per second at every given moment over a six-minute time period.
15.12.3.4. Guest memory swapping metrics
The following queries can identify which swap-enabled guests are performing the most memory swapping:
kubevirt_vmi_memory_swap_in_traffic_bytes_total- Returns the total amount (in bytes) of memory the virtual guest is swapping in. Type: Gauge.
kubevirt_vmi_memory_swap_out_traffic_bytes_total- Returns the total amount (in bytes) of memory the virtual guest is swapping out. Type: Gauge.
Example memory swapping query
topk(3, sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_in_traffic_bytes_total[6m])) + sum by (name, namespace) (rate(kubevirt_vmi_memory_swap_out_traffic_bytes_total[6m]))) > 0
This query returns the top 3 VMs where the guest is performing the most memory swapping at every given moment over a six-minute time period.
Memory swapping indicates that the virtual machine is under memory pressure. Increasing the memory allocation of the virtual machine can mitigate this issue.
15.12.4. Live migration metrics
The following metrics can be queried to show live migration status:
kubevirt_migrate_vmi_data_processed_bytes- The amount of guest operating system (OS) data that has migrated to the new virtual machine (VM). Type: Gauge.
kubevirt_migrate_vmi_data_remaining_bytes- The amount of guest OS data that remains to be migrated. Type: Gauge.
kubevirt_migrate_vmi_dirty_memory_rate_bytes- The rate at which memory is becoming dirty in the guest OS. Dirty memory is data that has been changed but not yet written to disk. Type: Gauge.
kubevirt_migrate_vmi_pending_count- The number of pending migrations. Type: Gauge.
kubevirt_migrate_vmi_scheduling_count- The number of scheduling migrations. Type: Gauge.
kubevirt_migrate_vmi_running_count- The number of running migrations. Type: Gauge.
kubevirt_migrate_vmi_succeeded- The number of successfully completed migrations. Type: Gauge.
kubevirt_migrate_vmi_failed- The number of failed migrations. Type: Gauge.
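The processed and remaining gauges can be combined into a rough progress figure. A minimal sketch with hypothetical sample values (not from the source):

```shell
# Hypothetical samples of the two gauges for one in-flight migration.
processed_bytes=7516192768   # kubevirt_migrate_vmi_data_processed_bytes
remaining_bytes=1073741824   # kubevirt_migrate_vmi_data_remaining_bytes
progress=$(awk -v p="$processed_bytes" -v r="$remaining_bytes" \
  'BEGIN { printf "%.1f", 100 * p / (p + r) }')
echo "migration progress: ${progress}%"
```

Note that a high dirty-memory rate can cause the remaining figure to grow again, so progress is not guaranteed to be monotonic.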
15.13. Exposing custom metrics for virtual machines
OpenShift Container Platform includes a preconfigured, preinstalled, and self-updating monitoring stack that provides monitoring for core platform components. This monitoring stack is based on the Prometheus monitoring system. Prometheus is a time-series database and a rule evaluation engine for metrics.
In addition to using the OpenShift Container Platform monitoring stack, you can enable monitoring for user-defined projects by using the CLI and query custom metrics that are exposed for virtual machines through the node-exporter service.
15.13.1. Configuring the node exporter service
The node-exporter agent is deployed on every virtual machine in the cluster from which you want to collect metrics. Configure the node-exporter agent as a service to expose internal metrics and processes that are associated with virtual machines.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the cluster as a user with cluster-admin privileges.
- Create the cluster-monitoring-config ConfigMap object in the openshift-monitoring project.
- Configure the user-workload-monitoring-config ConfigMap object in the openshift-user-workload-monitoring project by setting enableUserWorkload to true.
Procedure
Create the Service YAML file. In the following example, the file is called node-exporter-service.yaml.

kind: Service
apiVersion: v1
metadata:
  name: node-exporter-service # 1
  namespace: dynamation # 2
  labels:
    servicetype: metrics # 3
spec:
  ports:
    - name: exmet # 4
      protocol: TCP
      port: 9100 # 5
      targetPort: 9100 # 6
  type: ClusterIP
  selector:
    monitor: metrics # 7

1. The node-exporter service that exposes the metrics from the virtual machines.
2. The namespace where the service is created.
3. The label for the service. The ServiceMonitor uses this label to match this service.
4. The name given to the port that exposes metrics on port 9100 for the ClusterIP service.
5. The target port used by node-exporter-service to listen for requests.
6. The TCP port number of the virtual machine that is configured with the monitor label.
7. The label used to match the virtual machine’s pods. In this example, any virtual machine’s pod with the label monitor and a value of metrics will be matched.
Create the node-exporter service:
$ oc create -f node-exporter-service.yaml
15.13.2. Configuring a virtual machine with the node exporter service
Download the node-exporter agent and run it as a systemd service on the virtual machines that you want to collect metrics from.
Prerequisites
- The pods for the component are running in the openshift-user-workload-monitoring project.
- Grant the monitoring-edit role to users who need to monitor this user-defined project.
Procedure
- Log on to the virtual machine.
Download the node-exporter file on to the virtual machine by using the directory path that applies to the version of the node-exporter file.
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
directory./usr/bin$ sudo tar xvf node_exporter-1.3.1.linux-amd64.tar.gz \ --directory /usr/bin --strip 1 "*/node_exporter"Create a
file in this directory path:node_exporter.service. This/etc/systemd/systemservice file runs the node-exporter service when the virtual machine reboots.systemd[Unit] Description=Prometheus Metrics Exporter After=network.target StartLimitIntervalSec=0 [Service] Type=simple Restart=always RestartSec=1 User=root ExecStart=/usr/bin/node_exporter [Install] WantedBy=multi-user.targetEnable and start the
service.systemd$ sudo systemctl enable node_exporter.service $ sudo systemctl start node_exporter.service
Verification
Verify that the node-exporter agent is reporting metrics from the virtual machine.
$ curl http://localhost:9100/metrics
Example output
go_gc_duration_seconds{quantile="0"} 1.5244e-05 go_gc_duration_seconds{quantile="0.25"} 3.0449e-05 go_gc_duration_seconds{quantile="0.5"} 3.7913e-05
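Once the endpoint responds, individual series can be pulled out of a scrape with standard text tools. A small sketch over the sample lines above (the input is embedded here rather than fetched with curl, so it can run anywhere):

```shell
# Extract one quantile from a saved node-exporter scrape with awk.
# The input mirrors the example output above.
metrics='go_gc_duration_seconds{quantile="0"} 1.5244e-05
go_gc_duration_seconds{quantile="0.25"} 3.0449e-05
go_gc_duration_seconds{quantile="0.5"} 3.7913e-05'
median=$(printf '%s\n' "$metrics" | awk '$1 ~ /quantile="0.5"/ { print $2 }')
echo "median GC pause: ${median}s"
```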
15.13.3. Creating a custom monitoring label for virtual machines
To enable queries to multiple virtual machines from a single service, add a custom label in the virtual machine’s YAML file.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in as a user with cluster-admin privileges.
- Access to the web console to stop and restart a virtual machine.
Procedure
Edit the template spec of your virtual machine configuration file. In this example, the label monitor has the value metrics.

spec:
  template:
    metadata:
      labels:
        monitor: metrics
- Stop and restart the virtual machine to create a new pod with the label name given to the monitor label.
15.13.3.1. Querying the node-exporter service for metrics
Metrics are exposed for virtual machines through an HTTP service endpoint under the /metrics canonical name.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.
Procedure
Obtain the HTTP service endpoint by specifying the namespace for the service:
$ oc get service -n <namespace> <node-exporter-service>
To list all available metrics for the node-exporter service, query the metrics resource.
$ curl http://<172.30.226.162:9100>/metrics | grep -vE "^#|^$"
Example output
node_arp_entries{device="eth0"} 1
node_boot_time_seconds 1.643153218e+09
node_context_switches_total 4.4938158e+07
node_cooling_device_cur_state{name="0",type="Processor"} 0
node_cooling_device_max_state{name="0",type="Processor"} 0
node_cpu_guest_seconds_total{cpu="0",mode="nice"} 0
node_cpu_guest_seconds_total{cpu="0",mode="user"} 0
node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 37.61
node_cpu_seconds_total{cpu="0",mode="irq"} 233.91
node_cpu_seconds_total{cpu="0",mode="nice"} 551.47
node_cpu_seconds_total{cpu="0",mode="softirq"} 87.3
node_cpu_seconds_total{cpu="0",mode="steal"} 86.12
node_cpu_seconds_total{cpu="0",mode="system"} 464.15
node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
node_disk_discard_time_seconds_total{device="vda"} 0
node_disk_discard_time_seconds_total{device="vdb"} 0
node_disk_discarded_sectors_total{device="vda"} 0
node_disk_discarded_sectors_total{device="vdb"} 0
node_disk_discards_completed_total{device="vda"} 0
node_disk_discards_completed_total{device="vdb"} 0
node_disk_discards_merged_total{device="vda"} 0
node_disk_discards_merged_total{device="vdb"} 0
node_disk_info{device="vda",major="252",minor="0"} 1
node_disk_info{device="vdb",major="252",minor="16"} 1
node_disk_io_now{device="vda"} 0
node_disk_io_now{device="vdb"} 0
node_disk_io_time_seconds_total{device="vda"} 174
node_disk_io_time_seconds_total{device="vdb"} 0.054
node_disk_io_time_weighted_seconds_total{device="vda"} 259.79200000000003
node_disk_io_time_weighted_seconds_total{device="vdb"} 0.039
node_disk_read_bytes_total{device="vda"} 3.71867136e+08
node_disk_read_bytes_total{device="vdb"} 366592
node_disk_read_time_seconds_total{device="vda"} 19.128
node_disk_read_time_seconds_total{device="vdb"} 0.039
node_disk_reads_completed_total{device="vda"} 5619
node_disk_reads_completed_total{device="vdb"} 96
node_disk_reads_merged_total{device="vda"} 5
node_disk_reads_merged_total{device="vdb"} 0
node_disk_write_time_seconds_total{device="vda"} 240.66400000000002
node_disk_write_time_seconds_total{device="vdb"} 0
node_disk_writes_completed_total{device="vda"} 71584
node_disk_writes_completed_total{device="vdb"} 0
node_disk_writes_merged_total{device="vda"} 19761
node_disk_writes_merged_total{device="vdb"} 0
node_disk_written_bytes_total{device="vda"} 2.007924224e+09
node_disk_written_bytes_total{device="vdb"} 0
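The Prometheus exposition format is line-oriented, so the scraped output slices cleanly with standard text tools. As a sketch, the following filters a saved sample of the output above down to a single metric family (the file path and sample lines are illustrative):

```shell
# Save a few sample lines of node-exporter output (copied from the example
# above) and filter them down to the CPU time counters only.
cat <<'EOF' > /tmp/node_metrics.txt
node_arp_entries{device="eth0"} 1
node_cpu_seconds_total{cpu="0",mode="idle"} 1.10586485e+06
node_cpu_seconds_total{cpu="0",mode="user"} 1075.2
node_disk_io_now{device="vda"} 0
EOF

# Keep only the node_cpu_seconds_total samples.
grep '^node_cpu_seconds_total' /tmp/node_metrics.txt
```

In practice you would pipe the `curl …/metrics` output directly into `grep` rather than saving it to a file first.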
15.13.4. Creating a ServiceMonitor resource for the node exporter service
You can use a Prometheus client library and scrape metrics from the /metrics endpoint to access and view the node-exporter metrics. Use a ServiceMonitor custom resource definition (CRD) to monitor the node exporter service.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.
- You have enabled monitoring for the user-defined project by configuring the node-exporter service.
Procedure
Create a YAML file for the ServiceMonitor resource configuration. In this example, the service monitor matches any service with the label metrics and queries the exmet port every 30 seconds.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: node-exporter-metrics-monitor
  name: node-exporter-metrics-monitor
  namespace: dynamation
spec:
  endpoints:
  - interval: 30s
    port: exmet
    scheme: http
  selector:
    matchLabels:
      servicetype: metrics

Create the ServiceMonitor configuration for the node-exporter service:

$ oc create -f node-exporter-metrics-monitor.yaml
15.13.4.1. Accessing the node exporter service outside the cluster
You can access the node-exporter service outside the cluster and view the exposed metrics.
Prerequisites
- You have access to the cluster as a user with cluster-admin privileges or the monitoring-edit role.
- You have enabled monitoring for the user-defined project by configuring the node-exporter service.
Procedure
Expose the node-exporter service:

$ oc expose service -n <namespace> <node_exporter_service_name>

Obtain the FQDN (Fully Qualified Domain Name) for the route:

$ oc get route -o=custom-columns=NAME:.metadata.name,DNS:.spec.host

Example output

NAME                    DNS
node-exporter-service   node-exporter-service-dynamation.apps.cluster.example.org

Use the curl command to display metrics for the node-exporter service:

$ curl -s http://node-exporter-service-dynamation.apps.cluster.example.org/metrics

Example output
go_gc_duration_seconds{quantile="0"} 1.5382e-05
go_gc_duration_seconds{quantile="0.25"} 3.1163e-05
go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
go_gc_duration_seconds{quantile="0.75"} 4.9139e-05
go_gc_duration_seconds{quantile="1"} 0.000189423
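Each go_gc_duration_seconds sample carries a quantile label, so a specific quantile can be pulled out with a one-liner. A sketch against a saved sample of the output above (the file path is illustrative):

```shell
# Extract the median (quantile="0.5") GC pause duration from sample
# node-exporter output (lines copied from the example above).
cat <<'EOF' > /tmp/gc_metrics.txt
go_gc_duration_seconds{quantile="0"} 1.5382e-05
go_gc_duration_seconds{quantile="0.5"} 3.8546e-05
go_gc_duration_seconds{quantile="1"} 0.000189423
EOF

# The value is the second whitespace-separated field on the matching line.
awk '/quantile="0.5"/ {print $2}' /tmp/gc_metrics.txt
# → 3.8546e-05
```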
15.14. OpenShift Virtualization runbooks

Runbooks for the OpenShift Virtualization Operator are maintained in the openshift/runbooks Git repository, and you can view them on GitHub. To diagnose and resolve issues that trigger OpenShift Virtualization alerts, follow the procedures in the runbooks.

OpenShift Virtualization alerts are displayed on the Virtualization → Overview page of the web console.
15.14.1. CDIDataImportCronOutdated
- View the runbook for the CDIDataImportCronOutdated alert.

15.14.2. CDIDataVolumeUnusualRestartCount
- View the runbook for the CDIDataVolumeUnusualRestartCount alert.

15.14.3. CDIDefaultStorageClassDegraded
- View the runbook for the CDIDefaultStorageClassDegraded alert.

15.14.4. CDIMultipleDefaultVirtStorageClasses
- View the runbook for the CDIMultipleDefaultVirtStorageClasses alert.

15.14.5. CDINoDefaultStorageClass
- View the runbook for the CDINoDefaultStorageClass alert.

15.14.6. CDINotReady
- View the runbook for the CDINotReady alert.

15.14.7. CDIOperatorDown
- View the runbook for the CDIOperatorDown alert.

15.14.8. CDIStorageProfilesIncomplete
- View the runbook for the CDIStorageProfilesIncomplete alert.

15.14.9. CnaoDown
- View the runbook for the CnaoDown alert.

15.14.10. CnaoNMstateMigration
- View the runbook for the CnaoNMstateMigration alert.

15.14.11. HCOInstallationIncomplete
- View the runbook for the HCOInstallationIncomplete alert.

15.14.12. HPPNotReady
- View the runbook for the HPPNotReady alert.

15.14.13. HPPOperatorDown
- View the runbook for the HPPOperatorDown alert.

15.14.14. HPPSharingPoolPathWithOS
- View the runbook for the HPPSharingPoolPathWithOS alert.

15.14.15. KubemacpoolDown
- View the runbook for the KubemacpoolDown alert.

15.14.16. KubeMacPoolDuplicateMacsFound
- The KubeMacPoolDuplicateMacsFound alert is deprecated.

15.14.17. KubeVirtComponentExceedsRequestedCPU
- The KubeVirtComponentExceedsRequestedCPU alert is deprecated.

15.14.18. KubeVirtComponentExceedsRequestedMemory
- The KubeVirtComponentExceedsRequestedMemory alert is deprecated.

15.14.19. KubeVirtCRModified
- View the runbook for the KubeVirtCRModified alert.

15.14.20. KubeVirtDeprecatedAPIRequested
- View the runbook for the KubeVirtDeprecatedAPIRequested alert.

15.14.21. KubeVirtNoAvailableNodesToRunVMs
- View the runbook for the KubeVirtNoAvailableNodesToRunVMs alert.

15.14.22. KubevirtVmHighMemoryUsage
- The KubevirtVmHighMemoryUsage alert is deprecated.

15.14.23. KubeVirtVMIExcessiveMigrations
- View the runbook for the KubeVirtVMIExcessiveMigrations alert.

15.14.24. LowKVMNodesCount
- View the runbook for the LowKVMNodesCount alert.

15.14.25. LowReadyVirtControllersCount
- View the runbook for the LowReadyVirtControllersCount alert.

15.14.26. LowReadyVirtOperatorsCount
- View the runbook for the LowReadyVirtOperatorsCount alert.

15.14.27. LowVirtAPICount
- View the runbook for the LowVirtAPICount alert.

15.14.28. LowVirtControllersCount
- View the runbook for the LowVirtControllersCount alert.

15.14.29. LowVirtOperatorCount
- View the runbook for the LowVirtOperatorCount alert.

15.14.30. NetworkAddonsConfigNotReady
- View the runbook for the NetworkAddonsConfigNotReady alert.

15.14.31. NoLeadingVirtOperator
- View the runbook for the NoLeadingVirtOperator alert.

15.14.32. NoReadyVirtController
- View the runbook for the NoReadyVirtController alert.

15.14.33. NoReadyVirtOperator
- View the runbook for the NoReadyVirtOperator alert.

15.14.34. OrphanedVirtualMachineInstances
- View the runbook for the OrphanedVirtualMachineInstances alert.

15.14.35. OutdatedVirtualMachineInstanceWorkloads
- View the runbook for the OutdatedVirtualMachineInstanceWorkloads alert.

15.14.36. SingleStackIPv6Unsupported
- The SingleStackIPv6Unsupported alert is deprecated.

15.14.37. SSPCommonTemplatesModificationReverted
- View the runbook for the SSPCommonTemplatesModificationReverted alert.

15.14.38. SSPDown
- View the runbook for the SSPDown alert.

15.14.39. SSPFailingToReconcile
- View the runbook for the SSPFailingToReconcile alert.

15.14.40. SSPHighRateRejectedVms
- View the runbook for the SSPHighRateRejectedVms alert.

15.14.41. SSPTemplateValidatorDown
- View the runbook for the SSPTemplateValidatorDown alert.

15.14.42. UnsupportedHCOModification
- View the runbook for the UnsupportedHCOModification alert.

15.14.43. VirtAPIDown
- View the runbook for the VirtAPIDown alert.

15.14.44. VirtApiRESTErrorsBurst
- View the runbook for the VirtApiRESTErrorsBurst alert.

15.14.45. VirtApiRESTErrorsHigh
- The VirtApiRESTErrorsHigh alert is deprecated.

15.14.46. VirtControllerDown
- View the runbook for the VirtControllerDown alert.

15.14.47. VirtControllerRESTErrorsBurst
- View the runbook for the VirtControllerRESTErrorsBurst alert.

15.14.48. VirtControllerRESTErrorsHigh
- The VirtControllerRESTErrorsHigh alert is deprecated.

15.14.49. VirtHandlerDaemonSetRolloutFailing
- View the runbook for the VirtHandlerDaemonSetRolloutFailing alert.

15.14.50. VirtHandlerRESTErrorsBurst
- View the runbook for the VirtHandlerRESTErrorsBurst alert.

15.14.51. VirtHandlerRESTErrorsHigh
- The VirtHandlerRESTErrorsHigh alert is deprecated.

15.14.52. VirtOperatorDown
- View the runbook for the VirtOperatorDown alert.

15.14.53. VirtOperatorRESTErrorsBurst
- View the runbook for the VirtOperatorRESTErrorsBurst alert.

15.14.54. VirtOperatorRESTErrorsHigh
- The VirtOperatorRESTErrorsHigh alert is deprecated.

15.14.55. VirtualMachineCRCErrors
The runbook for the VirtualMachineCRCErrors alert is deprecated because the alert was renamed to VMStorageClassWarning.
- View the runbook for the VMStorageClassWarning alert.

15.14.56. VMCannotBeEvicted
- View the runbook for the VMCannotBeEvicted alert.

15.14.57. VMStorageClassWarning
- View the runbook for the VMStorageClassWarning alert.
15.15. Collecting data for Red Hat Support
When you submit a support case to Red Hat Support, it is helpful to provide debugging information for OpenShift Container Platform and OpenShift Virtualization by using the following tools:
- must-gather tool: The must-gather tool collects diagnostic information, including resource definitions and service logs.
- Prometheus: Prometheus is a time-series database and a rule evaluation engine for metrics. Prometheus sends alerts to Alertmanager for processing.
- Alertmanager: The Alertmanager service handles alerts received from Prometheus. Alertmanager is also responsible for sending the alerts to external notification systems.
15.15.1. Collecting data about your environment
Collecting data about your environment minimizes the time required to analyze and determine the root cause.
Prerequisites
- Set the retention time for Prometheus metrics data to a minimum of seven days.
- Configure the Alertmanager to capture relevant alerts and to send them to a dedicated mailbox so that they can be viewed and persisted outside the cluster.
- Record the exact number of affected nodes and virtual machines.
Procedure
- Collect must-gather data for the cluster by using the default must-gather image.
- Collect must-gather data for Red Hat OpenShift Data Foundation, if necessary.
- Collect must-gather data for OpenShift Virtualization by using the OpenShift Virtualization must-gather image.
- Collect Prometheus metrics for the cluster.
15.15.2. Collecting data about virtual machines
Collecting data about malfunctioning virtual machines (VMs) minimizes the time required to analyze and determine the root cause.
Prerequisites
Windows VMs:
- Record the Windows patch update details for Red Hat Support.
- Install the latest version of the VirtIO drivers. The VirtIO drivers include the QEMU guest agent.
- If Remote Desktop Protocol (RDP) is enabled, try to connect to the VMs with RDP to determine whether there is a problem with the connection software.
Procedure
- Collect detailed must-gather data about the malfunctioning VMs.
- Collect screenshots of VMs that have crashed before you restart them.
- Record factors that the malfunctioning VMs have in common. For example, the VMs have the same host or network.
15.15.3. Using the must-gather tool for OpenShift Virtualization
You can collect data about OpenShift Virtualization resources by running the must-gather tool.
The default data collection includes information about the following resources:
- OpenShift Virtualization Operator namespaces, including child objects
- OpenShift Virtualization custom resource definitions
- Namespaces that contain virtual machines
- Basic virtual machine definitions
Procedure
Run the following command to collect data about OpenShift Virtualization:
$ oc adm must-gather --image-stream=openshift/must-gather \
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.12.22
15.15.3.1. must-gather tool options
You can specify a combination of scripts and environment variables for the following options:
- Collecting detailed virtual machine (VM) information from a namespace
- Collecting detailed information about specified VMs
- Collecting image, image-stream, and image-stream-tags information
- Limiting the maximum number of parallel processes used by the must-gather tool
15.15.3.1.1. Parameters
Environment variables
You can specify environment variables for a compatible script.
NS=<namespace_name>
- Collect virtual machine information, including virt-launcher pod details, from the namespace that you specify. The VirtualMachine and VirtualMachineInstance CR data is collected for all namespaces.

VM=<vm_name>
- Collect details about a particular virtual machine. To use this option, you must also specify a namespace by using the NS environment variable.

PROS=<number_of_processes>
- Modify the maximum number of parallel processes that the must-gather tool uses. The default value is 5.

Important: Using too many parallel processes can cause performance issues. Increasing the maximum number of parallel processes is not recommended.
Scripts
Each script is compatible only with certain environment variable combinations.
/usr/bin/gather
- Use the default must-gather script, which collects cluster data from all namespaces and includes only basic VM information. This script is compatible only with the PROS variable.

/usr/bin/gather --vms_details
- Collect VM log files, VM definitions, control-plane logs, and namespaces that belong to OpenShift Virtualization resources. Specifying namespaces includes their child objects. If you use this parameter without specifying a namespace or VM, the must-gather tool collects this data for all VMs in the cluster. This script is compatible with all environment variables, but you must specify a namespace if you use the VM variable.

/usr/bin/gather --images
- Collect image, image-stream, and image-stream-tags custom resource information. This script is compatible only with the PROS variable.
15.15.3.1.2. Usage and examples
Environment variables are optional. You can run a script by itself or with one or more compatible environment variables.
| Script | Compatible environment variable |
|---|---|
| /usr/bin/gather | PROS=<number_of_processes> |
| /usr/bin/gather --vms_details | PROS=<number_of_processes>, NS=<namespace_name>, VM=<vm_name> |
| /usr/bin/gather --images | PROS=<number_of_processes> |
Syntax
$ oc adm must-gather \
--image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.12.22 \
-- <environment_variable_1> <environment_variable_2> <script_name>
Default data collection parallel processes
By default, five processes run in parallel.
$ oc adm must-gather \
--image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.12.22 \
-- PROS=5 /usr/bin/gather
1. You can modify the number of parallel processes by changing the default.
Detailed VM information
The following command collects detailed VM information for the my-vm virtual machine in the mynamespace namespace:
$ oc adm must-gather \
--image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.12.22 \
-- NS=mynamespace VM=my-vm /usr/bin/gather --vms_details
1. The NS environment variable is mandatory if you use the VM environment variable.
Image, image-stream, and image-stream-tags information
The following command collects image, image-stream, and image-stream-tags information from the cluster:
$ oc adm must-gather \
--image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8:v4.12.22 \
-- /usr/bin/gather --images
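The `-- <environment_variables> <script>` portion of these commands uses ordinary shell semantics inside the must-gather pod: `NS=… VM=… <script>` places the variables in the environment of the gather script's process. A minimal local sketch with a stand-in script (`/tmp/gather` here is purely illustrative, not the real /usr/bin/gather) shows the mechanics:

```shell
# Stand-in script that reads the same environment variables the real
# must-gather scripts use (PROS, NS, VM), with their documented defaults.
cat <<'EOF' > /tmp/gather
#!/bin/sh
echo "processes=${PROS:-5} namespace=${NS:-all} vm=${VM:-all}"
EOF
chmod +x /tmp/gather

# Variables are assigned the same way as in `oc adm must-gather -- NS=... VM=... <script>`:
NS=mynamespace VM=my-vm /tmp/gather
# → processes=5 namespace=mynamespace vm=all is NOT printed; VM is set, so:
# → processes=5 namespace=mynamespace vm=my-vm
```

Unset variables fall back to their defaults, which mirrors why PROS defaults to 5 when you omit it.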