Chapter 17. Debugging low latency node tuning status
Use the PerformanceProfile
custom resource (CR) status fields for reporting tuning status and debugging latency issues in the cluster node.
17.1. Debugging low latency CNF tuning status
The PerformanceProfile
custom resource (CR) contains status fields for reporting tuning status and debugging latency degradation issues. These fields report on conditions that describe the state of the operator’s reconciliation functionality.
A typical issue can arise when the status of machine config pools that are attached to the performance profile are in a degraded state, causing the PerformanceProfile
status to degrade. In this case, the machine config pool issues a failure message.
The Node Tuning Operator contains the performanceProfile.spec.status.Conditions
status field:
Status: Conditions: Last Heartbeat Time: 2020-06-02T10:01:24Z Last Transition Time: 2020-06-02T10:01:24Z Status: True Type: Available Last Heartbeat Time: 2020-06-02T10:01:24Z Last Transition Time: 2020-06-02T10:01:24Z Status: True Type: Upgradeable Last Heartbeat Time: 2020-06-02T10:01:24Z Last Transition Time: 2020-06-02T10:01:24Z Status: False Type: Progressing Last Heartbeat Time: 2020-06-02T10:01:24Z Last Transition Time: 2020-06-02T10:01:24Z Status: False Type: Degraded
The Status
field contains Conditions
that specify Type
values that indicate the status of the performance profile:
Available
- All machine configs and Tuned profiles have been created successfully and are available for cluster components are responsible to process them (NTO, MCO, Kubelet).
Upgradeable
- Indicates whether the resources maintained by the Operator are in a state that is safe to upgrade.
Progressing
- Indicates that the deployment process from the performance profile has started.
Degraded
Indicates an error if:
- Validation of the performance profile has failed.
- Creation of all relevant components did not complete successfully.
Each of these types contain the following fields:
Status
-
The state for the specific type (
true
orfalse
). Timestamp
- The transaction timestamp.
Reason string
- The machine readable reason.
Message string
- The human readable reason describing the state and error details, if any.
17.1.1. Machine config pools
A performance profile and its created products are applied to a node according to an associated machine config pool (MCP). The MCP holds valuable information about the progress of applying the machine configurations created by performance profiles that encompass kernel args, kube config, huge pages allocation, and deployment of rt-kernel. The Performance Profile controller monitors changes in the MCP and updates the performance profile status accordingly.
The only conditions returned by the MCP to the performance profile status is when the MCP is Degraded
, which leads to performanceProfile.status.condition.Degraded = true
.
Example
The following example is for a performance profile with an associated machine config pool (worker-cnf
) that was created for it:
The associated machine config pool is in a degraded state:
# oc get mcp
Example output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-2ee57a93fa6c9181b546ca46e1571d2d True False False 3 3 3 0 2d21h worker rendered-worker-d6b2bdc07d9f5a59a6b68950acf25e5f True False False 2 2 2 0 2d21h worker-cnf rendered-worker-cnf-6c838641b8a08fff08dbd8b02fb63f7c False True True 2 1 1 1 2d20h
The
describe
section of the MCP shows the reason:# oc describe mcp worker-cnf
Example output
Message: Node node-worker-cnf is reporting: "prepping update: machineconfig.machineconfiguration.openshift.io \"rendered-worker-cnf-40b9996919c08e335f3ff230ce1d170\" not found" Reason: 1 nodes are reporting degraded status on sync
The degraded state should also appear under the performance profile
status
field marked asdegraded = true
:# oc describe performanceprofiles performance
Example output
Message: Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync. Machine config pool worker-cnf Degraded Message: Node yquinn-q8s5v-w-b-z5lqn.c.openshift-gce-devel.internal is reporting: "prepping update: machineconfig.machineconfiguration.openshift.io \"rendered-worker-cnf-40b9996919c08e335f3ff230ce1d170\" not found". Reason: MCPDegraded Status: True Type: Degraded
17.2. Collecting low latency tuning debugging data for Red Hat Support
When opening a support case, it is helpful to provide debugging information about your cluster to Red Hat Support.
The must-gather
tool enables you to collect diagnostic information about your OpenShift Container Platform cluster, including node tuning, NUMA topology, and other information needed to debug issues with low latency setup.
For prompt support, supply diagnostic information for both OpenShift Container Platform and low latency tuning.
17.2.1. About the must-gather tool
The oc adm must-gather
CLI command collects the information from your cluster that is most likely needed for debugging issues, such as:
- Resource definitions
- Audit logs
- Service logs
You can specify one or more images when you run the command by including the --image
argument. When you specify an image, the tool collects data related to that feature or product. When you run oc adm must-gather
, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local
. This directory is created in your current working directory.
17.2.2. Gathering low latency tuning data
Use the oc adm must-gather
CLI command to collect information about your cluster, including features and objects associated with low latency tuning, including:
- The Node Tuning Operator namespaces and child objects.
-
MachineConfigPool
and associatedMachineConfig
objects. - The Node Tuning Operator and associated Tuned objects.
- Linux kernel command line options.
- CPU and NUMA topology
- Basic PCI device information and NUMA locality.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role. - The OpenShift Container Platform CLI (oc) installed.
Procedure
-
Navigate to the directory where you want to store the
must-gather
data. Collect debugging information by running the following command:
$ oc adm must-gather
Example output
[must-gather ] OUT Using must-gather plug-in image: quay.io/openshift-release When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information: ClusterID: 829er0fa-1ad8-4e59-a46e-2644921b7eb6 ClusterVersion: Stable at "<cluster_version>" ClusterOperators: All healthy and stable [must-gather ] OUT namespace/openshift-must-gather-8fh4x created [must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-rhlgc created [must-gather-5564g] POD 2023-07-17T10:17:37.610340849Z Gathering data for ns/openshift-cluster-version... [must-gather-5564g] POD 2023-07-17T10:17:38.786591298Z Gathering data for ns/default... [must-gather-5564g] POD 2023-07-17T10:17:39.117418660Z Gathering data for ns/openshift... [must-gather-5564g] POD 2023-07-17T10:17:39.447592859Z Gathering data for ns/kube-system... [must-gather-5564g] POD 2023-07-17T10:17:39.803381143Z Gathering data for ns/openshift-etcd... ... Reprinting Cluster State: When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information: ClusterID: 829er0fa-1ad8-4e59-a46e-2644921b7eb6 ClusterVersion: Stable at "<cluster_version>" ClusterOperators: All healthy and stable
Create a compressed file from the
must-gather
directory that was created in your working directory. For example, on a computer that uses a Linux operating system, run the following command:$ tar cvaf must-gather.tar.gz must-gather-local.54213423446277122891
- 1
- Replace
must-gather-local.5421342344627712289//
with the directory name created by themust-gather
tool.
NoteCreate a compressed file to attach the data to a support case or to use with the Performance Profile Creator wrapper script when you create a performance profile.
- Attach the compressed file to your support case on the Red Hat Customer Portal.