Chapter 11. Logging, events, and monitoring
11.1. Viewing virtual machine logs
11.1.1. Understanding virtual machine logs
Logs are collected for OpenShift Container Platform Builds, Deployments, and pods. In container-native virtualization, virtual machine logs can be retrieved from the virtual machine launcher Pod in either the web console or the CLI.
The -f option follows the log output in real time, which is useful for monitoring progress and error checking.
If the launcher Pod is failing to start, use the --previous option to see the logs of the last attempt.
ErrImagePull and ImagePullBackOff errors can be caused by an incorrect Deployment configuration or problems with the images that are referenced.
11.1.2. Viewing virtual machine logs in the CLI
Get virtual machine logs from the virtual machine launcher pod.
Procedure
Use the following command:
$ oc logs <virt-launcher-name>
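For example, the following commands (a sketch that combines the options described above with the same placeholder Pod name) follow the log output in real time and show the logs of the previous launcher Pod attempt:
$ oc logs -f <virt-launcher-name>
$ oc logs --previous <virt-launcher-name>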
11.1.3. Viewing virtual machine logs in the web console
Get virtual machine logs from the associated virtual machine launcher Pod.
Procedure
- In the container-native virtualization console, click Workloads → Virtual Machines.
- Click the virtual machine to open the Virtual Machine Details panel.
- In the Overview tab, click the virt-launcher-<name> Pod in the POD section.
- Click Logs.
11.2. Viewing events
11.2.1. Understanding virtual machine events
OpenShift Container Platform events are records of important life-cycle information in a namespace and are useful for monitoring and troubleshooting resource scheduling, creation, and deletion issues.
Container-native virtualization adds events for virtual machines and virtual machine instances. These can be viewed from either the web console or the CLI.
See also: Viewing system event information in an OpenShift Container Platform cluster.
11.2.2. Viewing the events for a virtual machine in the web console
You can view the stream of events for a running virtual machine from the Virtual Machine Details panel of the web console.
The ▮▮ button pauses the events stream.
The ▶ button continues a paused events stream.
Procedure
- Click Workloads → Virtual Machines from the side menu.
- Select a Virtual Machine.
- Click Events to view all events for the virtual machine.
11.2.3. Viewing namespace events in the CLI
Use the OpenShift Container Platform client to get the events for a namespace.
Procedure
In the namespace, use the oc get command:
$ oc get events
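If the namespace contains many events, sorting or filtering the output can help. The following commands are a sketch that uses standard oc get options; the field selector value assumes that you want to filter for VirtualMachineInstance events:
$ oc get events --sort-by='.metadata.creationTimestamp'
$ oc get events --field-selector involvedObject.kind=VirtualMachineInstance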
11.2.4. Viewing resource events in the CLI
Events are included in the resource description, which you can get using the OpenShift Container Platform client.
Procedure
In the namespace, use the oc describe command. The following example shows how to get the events for a virtual machine, a virtual machine instance, and the virt-launcher Pod for a virtual machine:
$ oc describe vm <vm>
$ oc describe vmi <vmi>
$ oc describe pod virt-launcher-<name>
11.3. Viewing information about virtual machine workloads
You can view high-level information about your virtual machines by using the Virtual Machines dashboard in the OpenShift Container Platform web console.
11.3.1. About the Virtual Machines dashboard
Access the Virtual Machines dashboard from the OpenShift Container Platform web console by navigating to the Workloads → Virtual Machines page and selecting a virtual machine.
The dashboard includes the following cards:
Details provides identifying information about the virtual machine, including:
- Name
- Namespace
- Date of creation
- Node name
- IP address
Inventory lists the virtual machine’s resources, including:
- Network interface controllers (NICs)
- Disks
Status includes:
- The current status of the virtual machine
- A note indicating whether or not the QEMU guest agent is installed on the virtual machine
Utilization includes charts that display usage data for:
- CPU
- Memory
- Filesystem
- Network transfer
Use the drop-down list to choose a duration for the utilization data. The available options are 1 Hour, 6 Hours, and 24 Hours.
Events lists messages about virtual machine activity over the past hour. To view additional events, click View all.
11.4. Monitoring virtual machine health
Use this procedure to create liveness and readiness probes to monitor virtual machine health.
11.4.1. About liveness and readiness probes
When a VirtualMachineInstance (VMI) fails its liveness probe, the VMI is stopped. Controllers such as VirtualMachine then spawn new VMIs to restore virtual machine responsiveness.
Readiness probes tell services and endpoints that the VirtualMachineInstance is ready to receive traffic. If a readiness probe fails, the VirtualMachineInstance is removed from applicable endpoints until the probe recovers.
11.4.2. Define an HTTP liveness probe
This procedure provides an example configuration file for defining HTTP liveness probes.
Procedure
Customize a YAML configuration file to create an HTTP liveness probe, using the following code block as an example. In this example:
- You configure a probe using spec.livenessProbe.httpGet, which queries port 1500 of the VirtualMachineInstance after an initial delay of 120 seconds.
- The VirtualMachineInstance installs and runs a minimal HTTP server on port 1500 using cloud-init.

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
    resources:
      requests:
        memory: 1024M
  livenessProbe:
    initialDelaySeconds: 120
    periodSeconds: 20
    httpGet:
      port: 1500
    timeoutSeconds: 10
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    registryDisk:
      image: kubevirt/fedora-cloud-registry-disk-demo
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
        bootcmd:
          - setenforce 0
          - dnf install -y nmap-ncat
          - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
    name: cloudinitdisk
Create the VirtualMachineInstance by running the following command:
$ oc create -f <file name>.yaml
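Optionally, verify that the probe is behaving as expected. Failed liveness probe attempts are reported as events on the VirtualMachineInstance and its virt-launcher Pod, so the commands from the events section can be reused here; the vmi-fedora name comes from the example above:
$ oc get vmi vmi-fedora
$ oc describe vmi vmi-fedora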
11.4.3. Define a TCP liveness probe
This procedure provides an example configuration file for defining TCP liveness probes.
Procedure
Customize a YAML configuration file to create a TCP liveness probe, using this code block as an example. In this example:
- You configure a probe using spec.livenessProbe.tcpSocket, which queries port 1500 of the VirtualMachineInstance after an initial delay of 120 seconds.
- The VirtualMachineInstance installs and runs a minimal HTTP server on port 1500 using cloud-init.

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
    resources:
      requests:
        memory: 1024M
  livenessProbe:
    initialDelaySeconds: 120
    periodSeconds: 20
    tcpSocket:
      port: 1500
    timeoutSeconds: 10
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    registryDisk:
      image: kubevirt/fedora-cloud-registry-disk-demo
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
        bootcmd:
          - setenforce 0
          - dnf install -y nmap-ncat
          - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
    name: cloudinitdisk
Create the VirtualMachineInstance by running the following command:
$ oc create -f <file name>.yaml
11.4.4. Define a readiness probe
This procedure provides an example configuration file for defining readiness probes.
Procedure
Customize a YAML configuration file to create a readiness probe. Readiness probes are configured in a similar manner to liveness probes. However, note the following differences in this example:
- Readiness probes are saved using a different spec name. For example, you create a readiness probe as spec.readinessProbe instead of as spec.livenessProbe.<type-of-probe>.
- When creating a readiness probe, you optionally set a failureThreshold and a successThreshold to switch between ready and non-ready states, should the probe succeed or fail multiple times.

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
    resources:
      requests:
        memory: 1024M
  readinessProbe:
    httpGet:
      port: 1500
    initialDelaySeconds: 120
    periodSeconds: 20
    timeoutSeconds: 10
    failureThreshold: 3
    successThreshold: 3
  terminationGracePeriodSeconds: 0
  volumes:
  - name: containerdisk
    registryDisk:
      image: kubevirt/fedora-cloud-registry-disk-demo
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
        bootcmd:
          - setenforce 0
          - dnf install -y nmap-ncat
          - systemd-run --unit=httpserver nc -klp 1500 -e '/usr/bin/echo -e HTTP/1.1 200 OK\\n\\nHello World!'
    name: cloudinitdisk
Create the VirtualMachineInstance by running the following command:
$ oc create -f <file name>.yaml
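Optionally, check the readiness state that the probe controls. The following commands are a sketch that assumes the vmi-fedora example above and that the probe result is reflected in the Ready condition of the VirtualMachineInstance:
$ oc get vmi vmi-fedora
$ oc wait vmi/vmi-fedora --for=condition=Ready --timeout=5m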
11.5. Using the OpenShift Container Platform dashboard to get cluster information
Access the OpenShift Container Platform dashboard, which captures high-level information about the cluster, by clicking Home > Dashboards > Overview from the OpenShift Container Platform web console.
The OpenShift Container Platform dashboard provides various cluster information, captured in individual dashboard cards.
11.5.1. About the OpenShift Container Platform dashboards page
The OpenShift Container Platform dashboard consists of the following cards:
Details provides a brief overview of informational cluster details.
Status includes ok, error, warning, in progress, and unknown. Resources can add custom status names.
- Cluster ID
- Provider
- Version
Cluster Inventory details the number of resources and associated statuses. It is helpful when intervention is required to resolve problems, and includes information about:
- Number of nodes
- Number of pods
- Persistent storage volume claims
- Virtual machines (available if container-native virtualization is installed)
- Bare metal hosts in the cluster, listed according to their state (only available in a metal3 environment).
Cluster Health summarizes the current health of the cluster as a whole, including relevant alerts and descriptions. If container-native virtualization is installed, the overall health of container-native virtualization is diagnosed as well. If more than one subsystem is present, click See All to view the status of each subsystem.
Cluster Capacity charts help administrators understand when additional resources are required in the cluster. The charts contain an inner ring that displays current consumption, while an outer ring displays thresholds configured for the resource, including information about:
- CPU time
- Memory allocation
- Storage consumed
- Network resources consumed
Cluster Utilization shows the capacity of various resources over a specified period of time, to help administrators understand the scale and frequency of high resource consumption.
Events lists messages related to recent activity in the cluster, such as pod creation or virtual machine migration to another host.
Top Consumers helps administrators understand how cluster resources are consumed. Click on a resource to jump to a detailed page listing pods and nodes that consume the largest amount of the specified cluster resource (CPU, memory, or storage).
11.6. OpenShift Container Platform cluster monitoring, logging, and Telemetry
OpenShift Container Platform provides various resources for monitoring at the cluster level.
11.6.1. About OpenShift Container Platform cluster monitoring
OpenShift Container Platform includes a pre-configured, pre-installed, and self-updating monitoring stack that is based on the Prometheus open source project and its wider ecosystem. It provides monitoring of cluster components and includes a set of alerts to immediately notify the cluster administrator about any occurring problems, as well as a set of Grafana dashboards. The cluster monitoring stack is only supported for monitoring OpenShift Container Platform clusters.
To ensure compatibility with future OpenShift Container Platform updates, configuring only the specified monitoring stack options is supported.
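For example, supported options are set in the cluster-monitoring-config ConfigMap in the openshift-monitoring namespace. The following minimal sketch adjusts only the Prometheus retention period; the 24h value is an arbitrary example:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h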
11.6.2. Cluster logging components
The cluster logging components are based upon Elasticsearch, Fluentd, and Kibana (EFK). The collector, Fluentd, is deployed to each node in the OpenShift Container Platform cluster. It collects all node and container logs and writes them to Elasticsearch (ES). Kibana is the centralized, web UI where users and administrators can create rich visualizations and dashboards with the aggregated data.
There are the following types of cluster logging components (a configuration sketch follows this list):
- logStore - This is where the logs will be stored. The current implementation is Elasticsearch.
- collection - This is the component that collects logs from the node, formats them, and stores them in the logStore. The current implementation is Fluentd.
- visualization - This is the UI component used to view logs, graphs, charts, and so forth. The current implementation is Kibana.
- curation - This is the component that trims logs by age. The current implementation is Curator.
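These component types correspond to sections of the ClusterLogging custom resource. The following minimal sketch shows how the logStore, visualization, curation, and collection components might be declared; the node count, redundancy policy, replica count, and schedule values are illustrative only:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    elasticsearch:
      nodeCount: 3
      redundancyPolicy: SingleRedundancy
  visualization:
    type: kibana
    kibana:
      replicas: 1
  curation:
    type: curator
    curator:
      schedule: "30 3 * * *"
  collection:
    logs:
      type: fluentd
      fluentd: {}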
For more information on cluster logging, see the OpenShift Container Platform cluster logging documentation.
11.6.3. About Telemetry
Telemetry sends a carefully chosen subset of the cluster monitoring metrics to Red Hat. These metrics are sent continuously and describe:
- The size of an OpenShift Container Platform cluster
- The health and status of OpenShift Container Platform components
- The health and status of any upgrade being performed
- Limited usage information about OpenShift Container Platform components and features
- Summary info about alerts reported by the cluster monitoring component
This continuous stream of data is used by Red Hat to monitor the health of clusters in real time and to react as necessary to problems that impact our customers. It also allows Red Hat to roll out OpenShift Container Platform upgrades to customers so as to minimize service impact and continuously improve the upgrade experience.
This debugging information is available to Red Hat Support and engineering teams with the same restrictions as accessing data reported via support cases. All connected cluster information is used by Red Hat to help make OpenShift Container Platform better and more intuitive to use. None of the information is shared with third parties.
11.6.3.1. Information collected by Telemetry
Primary information collected by Telemetry includes:
- The number of updates available per cluster
- Channel and image repository used for an update
- The number of errors that occurred during an update
- Progress information of running updates
- The number of machines per cluster
- The number of CPU cores and size of RAM of the machines
- The number of members in the etcd cluster and number of objects currently stored in the etcd cluster
- The number of CPU cores and RAM used per machine type - infra or master
- The number of CPU cores and RAM used per cluster
- The number of running virtual machine instances in the cluster
- Use of OpenShift Container Platform framework components per cluster
- The version of the OpenShift Container Platform cluster
- Health, condition, and status for any OpenShift Container Platform framework component that is installed on the cluster, for example Cluster Version Operator, Cluster Monitoring, Image Registry, and Elasticsearch for Logging
- A unique random identifier that is generated during installation
- The name of the platform that OpenShift Container Platform is deployed on, such as Amazon Web Services
Telemetry does not collect identifying information such as user names, passwords, or the names or addresses of user resources.
11.6.4. CLI troubleshooting and debugging commands
For a list of the oc client troubleshooting and debugging commands, see the OpenShift Container Platform CLI tools documentation.
11.7. Collecting container-native virtualization data for Red Hat Support
When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.
The must-gather tool enables you to collect diagnostic information about your OpenShift Container Platform cluster, including virtual machines and other data related to container-native virtualization.
For prompt support, supply diagnostic information for both OpenShift Container Platform and container-native virtualization.
Container-native virtualization is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see https://access.redhat.com/support/offerings/techpreview/.
11.7.1. About the must-gather tool
The oc adm must-gather CLI command collects the information from your cluster that is most likely needed for debugging issues, such as:
- Resource definitions
- Audit logs
- Service logs
You can specify one or more images when you run the command by including the --image argument. When you specify an image, the tool collects data related to that feature or product.
When you run oc adm must-gather, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in the current working directory.
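In its simplest form, run the command without additional arguments to collect the default cluster data:
$ oc adm must-gather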
11.7.2. About collecting container-native virtualization data
You can use the oc adm must-gather CLI command to collect information about your cluster, including features and objects associated with container-native virtualization:
- The Hyperconverged Cluster Operator namespaces (and child objects)
- All namespaces (and their child objects) that belong to any container-native virtualization resources
- All container-native virtualization Custom Resource Definitions (CRDs)
- All namespaces that contain virtual machines
- All virtual machine definitions
To collect container-native virtualization data with must-gather, you must specify the container-native virtualization image: --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8.
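For example, to collect only the container-native virtualization data, you can run the command with that image alone; the combined invocation with the default image is shown in the next section:
$ oc adm must-gather --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8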
11.7.3. Gathering data about specific features
You can gather debugging information about specific features by using the oc adm must-gather CLI command with the --image or --image-stream argument. The must-gather tool supports multiple images, so you can gather data about more than one feature by running a single command.
To collect the default must-gather data in addition to specific feature data, add the --image-stream=openshift/must-gather argument.
Prerequisites
- Access to the cluster as a user with the cluster-admin role.
- The OpenShift Container Platform CLI (oc) installed.
Procedure
- Navigate to the directory where you want to store the must-gather data.
- Run the oc adm must-gather command with one or more --image or --image-stream arguments. For example, the following command gathers both the default cluster data and information specific to container-native virtualization:
$ oc adm must-gather \
  --image-stream=openshift/must-gather \ 1
  --image=registry.redhat.io/container-native-virtualization/cnv-must-gather-rhel8 2
- 1 - The default OpenShift Container Platform must-gather image.
- 2 - The container-native virtualization must-gather image.
You can use the must-gather tool with additional arguments to gather data that is specifically related to cluster logging and the Cluster Logging Operator in your cluster. For cluster logging, run the following command:
$ oc adm must-gather --image=$(oc -n openshift-logging get deployment.apps/cluster-logging-operator \
  -o jsonpath='{.spec.template.spec.containers[?(@.name == "cluster-logging-operator")].image}')
Example 11.1. Example must-gather output for cluster logging
├── cluster-logging
│   ├── clo
│   │   ├── cluster-logging-operator-74dd5994f-6ttgt
│   │   ├── clusterlogforwarder_cr
│   │   ├── cr
│   │   ├── csv
│   │   ├── deployment
│   │   └── logforwarding_cr
│   ├── collector
│   │   ├── fluentd-2tr64
│   ├── curator
│   │   └── curator-1596028500-zkz4s
│   ├── eo
│   │   ├── csv
│   │   ├── deployment
│   │   └── elasticsearch-operator-7dc7d97b9d-jb4r4
│   ├── es
│   │   ├── cluster-elasticsearch
│   │   │   ├── aliases
│   │   │   ├── health
│   │   │   ├── indices
│   │   │   ├── latest_documents.json
│   │   │   ├── nodes
│   │   │   ├── nodes_stats.json
│   │   │   └── thread_pool
│   │   ├── cr
│   │   ├── elasticsearch-cdm-lp8l38m0-1-794d6dd989-4jxms
│   │   └── logs
│   │       ├── elasticsearch-cdm-lp8l38m0-1-794d6dd989-4jxms
│   ├── install
│   │   ├── co_logs
│   │   ├── install_plan
│   │   ├── olmo_logs
│   │   └── subscription
│   └── kibana
│       ├── cr
│       ├── kibana-9d69668d4-2rkvz
├── cluster-scoped-resources
│   └── core
│       ├── nodes
│       │   ├── ip-10-0-146-180.eu-west-1.compute.internal.yaml
│       └── persistentvolumes
│           ├── pvc-0a8d65d9-54aa-4c44-9ecc-33d9381e41c1.yaml
├── event-filter.html
├── gather-debug.log
└── namespaces
    ├── openshift-logging
    │   ├── apps
    │   │   ├── daemonsets.yaml
    │   │   ├── deployments.yaml
    │   │   ├── replicasets.yaml
    │   │   └── statefulsets.yaml
    │   ├── batch
    │   │   ├── cronjobs.yaml
    │   │   └── jobs.yaml
    │   ├── core
    │   │   ├── configmaps.yaml
    │   │   ├── endpoints.yaml
    │   │   ├── events
    │   │   │   ├── curator-1596021300-wn2ks.162634ebf0055a94.yaml
    │   │   │   ├── curator.162638330681bee2.yaml
    │   │   │   ├── elasticsearch-delete-app-1596020400-gm6nl.1626341a296c16a1.yaml
    │   │   │   ├── elasticsearch-delete-audit-1596020400-9l9n4.1626341a2af81bbd.yaml
    │   │   │   ├── elasticsearch-delete-infra-1596020400-v98tk.1626341a2d821069.yaml
    │   │   │   ├── elasticsearch-rollover-app-1596020400-cc5vc.1626341a3019b238.yaml
    │   │   │   ├── elasticsearch-rollover-audit-1596020400-s8d5s.1626341a31f7b315.yaml
    │   │   │   ├── elasticsearch-rollover-infra-1596020400-7mgv8.1626341a35ea59ed.yaml
    │   │   ├── events.yaml
    │   │   ├── persistentvolumeclaims.yaml
    │   │   ├── pods.yaml
    │   │   ├── replicationcontrollers.yaml
    │   │   ├── secrets.yaml
    │   │   └── services.yaml
    │   ├── openshift-logging.yaml
    │   ├── pods
    │   │   ├── cluster-logging-operator-74dd5994f-6ttgt
    │   │   │   ├── cluster-logging-operator
    │   │   │   │   └── cluster-logging-operator
    │   │   │   │       └── logs
    │   │   │   │           ├── current.log
    │   │   │   │           ├── previous.insecure.log
    │   │   │   │           └── previous.log
    │   │   │   └── cluster-logging-operator-74dd5994f-6ttgt.yaml
    │   │   ├── cluster-logging-operator-registry-6df49d7d4-mxxff
    │   │   │   ├── cluster-logging-operator-registry
    │   │   │   │   └── cluster-logging-operator-registry
    │   │   │   │       └── logs
    │   │   │   │           ├── current.log
    │   │   │   │           ├── previous.insecure.log
    │   │   │   │           └── previous.log
    │   │   │   ├── cluster-logging-operator-registry-6df49d7d4-mxxff.yaml
    │   │   │   └── mutate-csv-and-generate-sqlite-db
    │   │   │       └── mutate-csv-and-generate-sqlite-db
    │   │   │           └── logs
    │   │   │               ├── current.log
    │   │   │               ├── previous.insecure.log
    │   │   │               └── previous.log
    │   │   ├── curator-1596028500-zkz4s
    │   │   ├── elasticsearch-cdm-lp8l38m0-1-794d6dd989-4jxms
    │   │   ├── elasticsearch-delete-app-1596030300-bpgcx
    │   │   │   ├── elasticsearch-delete-app-1596030300-bpgcx.yaml
    │   │   │   └── indexmanagement
    │   │   │       └── indexmanagement
    │   │   │           └── logs
    │   │   │               ├── current.log
    │   │   │               ├── previous.insecure.log
    │   │   │               └── previous.log
    │   │   ├── fluentd-2tr64
    │   │   │   ├── fluentd
    │   │   │   │   └── fluentd
    │   │   │   │       └── logs
    │   │   │   │           ├── current.log
    │   │   │   │           ├── previous.insecure.log
    │   │   │   │           └── previous.log
    │   │   │   ├── fluentd-2tr64.yaml
    │   │   │   └── fluentd-init
    │   │   │       └── fluentd-init
    │   │   │           └── logs
    │   │   │               ├── current.log
    │   │   │               ├── previous.insecure.log
    │   │   │               └── previous.log
    │   │   ├── kibana-9d69668d4-2rkvz
    │   │   │   ├── kibana
    │   │   │   │   └── kibana
    │   │   │   │       └── logs
    │   │   │   │           ├── current.log
    │   │   │   │           ├── previous.insecure.log
    │   │   │   │           └── previous.log
    │   │   │   ├── kibana-9d69668d4-2rkvz.yaml
    │   │   │   └── kibana-proxy
    │   │   │       └── kibana-proxy
    │   │   │           └── logs
    │   │   │               ├── current.log
    │   │   │               ├── previous.insecure.log
    │   │   │               └── previous.log
    │   └── route.openshift.io
    │       └── routes.yaml
    └── openshift-operators-redhat
        ├── ...
Create a compressed file from the must-gather directory that was just created in your working directory. For example, on a computer that uses a Linux operating system, run the following command:
$ tar cvaf must-gather.tar.gz must-gather.local.5421342344627712289/ 1
- 1 - Make sure to replace must-gather.local.5421342344627712289/ with the actual directory name.
- Attach the compressed file to your support case on the Red Hat Customer Portal.