Chapter 17. Debugging low latency node tuning status


Use the PerformanceProfile custom resource (CR) status fields for reporting tuning status and debugging latency issues in a cluster node.

17.1. Debugging low latency CNF tuning status

To report tuning status and debug latency degradation issues, use the status fields in the PerformanceProfile custom resource (CR). These fields describe the conditions of the reconciliation functionality of an Operator, helping you verify the state of your configuration.

A typical issue arises when machine config pools that are attached to the performance profile are in a degraded state, which causes the PerformanceProfile status to degrade. In this case, the machine config pool issues a failure message.

The Node Tuning Operator reports these conditions in the performanceProfile.status.conditions field:

Status:
  Conditions:
    Last Heartbeat Time:   2020-06-02T10:01:24Z
    Last Transition Time:  2020-06-02T10:01:24Z
    Status:                True
    Type:                  Available
    Last Heartbeat Time:   2020-06-02T10:01:24Z
    Last Transition Time:  2020-06-02T10:01:24Z
    Status:                True
    Type:                  Upgradeable
    Last Heartbeat Time:   2020-06-02T10:01:24Z
    Last Transition Time:  2020-06-02T10:01:24Z
    Status:                False
    Type:                  Progressing
    Last Heartbeat Time:   2020-06-02T10:01:24Z
    Last Transition Time:  2020-06-02T10:01:24Z
    Status:                False
    Type:                  Degraded


The Status field contains Conditions that specify Type values that indicate the status of the performance profile:

Available

All machine configs and Tuned profiles have been created successfully and are available to the cluster components that process them, such as the Node Tuning Operator (NTO), the Machine Config Operator (MCO), and the kubelet.

Upgradeable

Indicates whether the resources maintained by the Operator are in a state that is safe to upgrade.

Progressing

Indicates that the deployment process from the performance profile has started.

Degraded

Indicates an error if:

Validation of the performance profile has failed.
Creation of all relevant components did not complete successfully.

Each of these types contains the following fields:

Status: The state for the specific type (True or False).
Timestamp: The transaction timestamp.
Reason string: The machine readable reason.
Message string: The human readable reason describing the state and error details, if any.
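
The condition types and fields above can be checked programmatically. The following is a minimal sketch, not part of the product: it assumes that you have the conditions as a JSON list (for example, from the JSON form of the PerformanceProfile CR) and that the keys are named `type`, `status`, `reason`, and `message`, following the usual Kubernetes condition convention. The `summarize` helper and the sample data are hypothetical.

```python
import json

# Hypothetical sample shaped like the conditions list of a PerformanceProfile.
# The key names (type, status, reason, message) are an assumption based on the
# usual Kubernetes condition convention.
SAMPLE = json.loads("""
[
  {"type": "Available",   "status": "True"},
  {"type": "Upgradeable", "status": "True"},
  {"type": "Progressing", "status": "False"},
  {"type": "Degraded",    "status": "False"}
]
""")


def summarize(conditions):
    """Return (healthy, problems) for a list of condition dictionaries."""
    by_type = {c["type"]: c for c in conditions}
    problems = []

    # A Degraded condition with status True signals a validation or creation error.
    degraded = by_type.get("Degraded", {})
    if degraded.get("status") == "True":
        detail = f"{degraded.get('reason', '')} {degraded.get('message', '')}".strip()
        problems.append(f"Degraded: {detail}")

    # The profile is considered usable only when Available is True.
    if by_type.get("Available", {}).get("status") != "True":
        problems.append("Available is not True")

    return (not problems, problems)


healthy, problems = summarize(SAMPLE)
print("healthy" if healthy else "\n".join(problems))
```

With the sample above, which mirrors the example status, the profile is reported as healthy. A Degraded condition with status True is returned together with its reason and message.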

17.2. Machine config pools

To apply performance profiles to specific nodes, associate them with a machine config pool (MCP). The MCP tracks the status of tuning updates, such as kernel arguments, huge pages, and real-time kernels, ensuring your cluster configurations are applied correctly.

The Performance Profile controller monitors changes in the MCP and updates the performance profile status accordingly.

The MCP returns only one condition to the performance profile status: when the MCP is Degraded, performanceProfile.status.condition.Degraded is set to true.

Procedure

Check the state of the associated machine config pool by entering the following command. The output example shows a performance profile with an associated machine config pool (worker-cnf) that is in a degraded state.

oc get mcp


Example output

NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-2ee57a93fa6c9181b546ca46e1571d2d       True      False      False      3              3                   3                     0                      2d21h
worker       rendered-worker-d6b2bdc07d9f5a59a6b68950acf25e5f       True      False      False      2              2                   2                     0                      2d21h
worker-cnf   rendered-worker-cnf-6c838641b8a08fff08dbd8b02fb63f7c   False     True       True       2              1                   1                     1                      2d20h
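
If you save this output to a file or capture it in a script, you can pick out the degraded pools automatically. The following sketch assumes the column layout shown in the example output and that no column is empty; the `degraded_pools` helper is hypothetical. For anything beyond a quick check, structured output from the CLI is likely a more robust input than the table.

```python
# Sample copied from the example output of `oc get mcp`.
SAMPLE = """\
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-2ee57a93fa6c9181b546ca46e1571d2d       True      False      False      3              3                   3                     0                      2d21h
worker       rendered-worker-d6b2bdc07d9f5a59a6b68950acf25e5f       True      False      False      2              2                   2                     0                      2d21h
worker-cnf   rendered-worker-cnf-6c838641b8a08fff08dbd8b02fb63f7c   False     True       True       2              1                   1                     1                      2d20h
"""


def degraded_pools(table_text):
    """Return the names of pools whose DEGRADED column is True."""
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split()
    # Whitespace splitting works only because no value contains spaces
    # and no column is empty (an assumption about the input).
    rows = [dict(zip(header, line.split())) for line in lines[1:]]
    return [row["NAME"] for row in rows if row["DEGRADED"] == "True"]


print(degraded_pools(SAMPLE))
```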


To check the reason for the degraded state, enter the following command, replacing the example machine config pool name with the name of your machine config pool. The output of the describe command shows the reason.

oc describe mcp worker-cnf


Example output

  Message:               Node node-worker-cnf is reporting: "prepping update:
  machineconfig.machineconfiguration.openshift.io \"rendered-worker-cnf-40b9996919c08e335f3ff230ce1d170\" not
  found"
    Reason:                1 nodes are reporting degraded status on sync


Optional: You can also run the oc describe command against the performance profile to check its degraded status. The example output shows the performance profile Degraded condition with a status of True:

oc describe performanceprofiles performance


Example output

Message: Machine config pool worker-cnf Degraded Reason: 1 nodes are reporting degraded status on sync.
Machine config pool worker-cnf Degraded Message: Node yquinn-q8s5v-w-b-z5lqn.c.openshift-gce-devel.internal is
reporting: "prepping update: machineconfig.machineconfiguration.openshift.io
\"rendered-worker-cnf-40b9996919c08e335f3ff230ce1d170\" not found".
   Reason:  MCPDegraded
   Status:  True
   Type:    Degraded


17.3. About the must-gather tool

To debug issues in your cluster, use the oc adm must-gather CLI command. This tool collects the diagnostic information most likely needed for troubleshooting, ensuring that you have the necessary data for analysis.

The oc adm must-gather CLI command collects the following information from your cluster:

Resource definitions
Audit logs
Service logs

You can specify one or more images when you run the command by including the --image argument. When you specify an image, the tool collects data related to that feature or product. When you run oc adm must-gather, a new pod is created on the cluster. The data is collected on that pod and saved in a new directory that starts with must-gather.local. This directory is created in your current working directory.

17.4. Collecting low latency tuning debugging data for Red Hat Support

To debug low latency setup issues when opening a support case, collect diagnostic information for Red Hat Support using the must-gather tool. This command gathers essential data, such as node tuning and NUMA topology, from your OpenShift Container Platform cluster.

For prompt support, supply diagnostic information for both OpenShift Container Platform and low latency tuning.

Use the oc adm must-gather CLI command to collect the following information about your cluster, including features and objects associated with low latency tuning:

The Node Tuning Operator namespaces and child objects.
MachineConfigPool and associated MachineConfig objects.
The Node Tuning Operator and associated Tuned objects.
Linux kernel command-line options.
CPU and NUMA topology.
Basic PCI device information and NUMA locality.

Prerequisites

Access to the cluster as a user with the cluster-admin role.
The OpenShift CLI (oc) installed.

Procedure

Navigate to the directory where you want to store the must-gather data.

Collect debugging information by running the following command:

oc adm must-gather


Example output

[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-release
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 829er0fa-1ad8-4e59-a46e-2644921b7eb6
ClusterVersion: Stable at "<cluster_version>"
ClusterOperators:
	All healthy and stable


[must-gather      ] OUT namespace/openshift-must-gather-8fh4x created
[must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-rhlgc created
[must-gather-5564g] POD 2023-07-17T10:17:37.610340849Z Gathering data for ns/openshift-cluster-version...
[must-gather-5564g] POD 2023-07-17T10:17:38.786591298Z Gathering data for ns/default...
[must-gather-5564g] POD 2023-07-17T10:17:39.117418660Z Gathering data for ns/openshift...
[must-gather-5564g] POD 2023-07-17T10:17:39.447592859Z Gathering data for ns/kube-system...
[must-gather-5564g] POD 2023-07-17T10:17:39.803381143Z Gathering data for ns/openshift-etcd...

...

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 829er0fa-1ad8-4e59-a46e-2644921b7eb6
ClusterVersion: Stable at "<cluster_version>"
ClusterOperators:
	All healthy and stable


Create a compressed file from the must-gather directory that was created in your working directory. For example, on a computer that uses a Linux operating system, run the following command:
```
tar cvaf must-gather.tar.gz must-gather.local.5421342344627712289/
```
Replace must-gather.local.5421342344627712289/ with the directory name created by the must-gather tool.

Note
Create a compressed file to attach the data to a support case or to use with the Performance Profile Creator wrapper script when you create a performance profile.

Attach the compressed file to your support case on the Red Hat Customer Portal.
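
On systems where the tar command is not available, the compression step can be scripted instead. The following is a sketch under stated assumptions rather than a supported procedure: it assumes the must-gather directory name starts with must-gather.local, as described earlier in this chapter, and that you want the most recently modified one. The `compress_must_gather` function and its parameters are hypothetical names.

```python
import glob
import os
import tarfile


def compress_must_gather(workdir=".", archive_name="must-gather.tar.gz"):
    """Compress the newest must-gather.local.* directory found in workdir."""
    pattern = os.path.join(workdir, "must-gather.local.*")
    candidates = [path for path in glob.glob(pattern) if os.path.isdir(path)]
    if not candidates:
        raise FileNotFoundError(f"no must-gather.local.* directory in {workdir}")

    # Assumption: the most recently modified directory is the one to send.
    newest = max(candidates, key=os.path.getmtime)

    archive_path = os.path.join(workdir, archive_name)
    with tarfile.open(archive_path, "w:gz") as archive:
        # Store paths relative to the directory name, as the tar command above does.
        archive.add(newest, arcname=os.path.basename(newest))
    return archive_path


# Example call, run from the directory where you ran oc adm must-gather:
# print(compress_must_gather())
```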
