Home
Products
OpenShift Container Platform
4.15
Scalability and performance
Chapter 21. Using the Node Observability Operator

Chapter 21. Using the Node Observability Operator

The Node Observability Operator collects and stores CRI-O and Kubelet profiling or metrics from scripts of compute nodes.

With the Node Observability Operator, you can query the profiling data, enabling analysis of performance trends in CRI-O and Kubelet. It supports debugging performance-related issues and executing embedded scripts for network metrics by using the run field in the custom resource definition. To enable CRI-O and Kubelet profiling or scripting, you can configure the type field in the custom resource definition.

Important

The Node Observability Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

21.1. Workflow of the Node Observability Operator
Copy link

The following workflow outlines on how to query the profiling data using the Node Observability Operator:

Install the Node Observability Operator in the OpenShift Container Platform cluster.
Create a NodeObservability custom resource to enable the CRI-O profiling on the worker nodes of your choice.
Run the profiling query to generate the profiling data.

21.2. Installing the Node Observability Operator
Copy link

The Node Observability Operator is not installed in OpenShift Container Platform by default. You can install the Node Observability Operator by using the OpenShift Container Platform CLI or the web console.

21.2.1. Installing the Node Observability Operator using the CLI
Copy link

You can install the Node Observability Operator by using the OpenShift CLI (oc).

Prerequisites

You have installed the OpenShift CLI (oc).
You have access to the cluster with cluster-admin privileges.

Procedure

Confirm that the Node Observability Operator is available by running the following command:

oc get packagemanifests -n openshift-marketplace node-observability-operator

$ oc get packagemanifests -n openshift-marketplace node-observability-operator

Copy to Clipboard

Toggle word wrap

Example output

NAME                            CATALOG                AGE
node-observability-operator     Red Hat Operators      9h

NAME                            CATALOG                AGE
node-observability-operator     Red Hat Operators      9h

Copy to Clipboard

Toggle word wrap

Create the node-observability-operator namespace by running the following command:
```
oc new-project node-observability-operator
```
```
$ oc new-project node-observability-operator
```
Copy to Clipboard Toggle word wrap

Create an OperatorGroup object YAML file:

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  targetNamespaces: []
EOF

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  targetNamespaces: []
EOF

Copy to Clipboard

Toggle word wrap

Create a Subscription object YAML file to subscribe a namespace to an Operator:

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  channel: alpha
  name: node-observability-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  channel: alpha
  name: node-observability-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF

Copy to Clipboard

Toggle word wrap

Verification

View the install plan name by running the following command:

oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'

$ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'

Copy to Clipboard

Toggle word wrap

Example output

install-dt54w

install-dt54w

Copy to Clipboard

Toggle word wrap

Verify the install plan status by running the following command:
```
oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
```
```
$ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
```
Copy to Clipboard Toggle word wrap
<install_plan_name> is the install plan name that you obtained from the output of the previous command.
Example output
```
COMPLETE
```
```
COMPLETE
```
Copy to Clipboard Toggle word wrap

Verify that the Node Observability Operator is up and running:

oc get deploy -n node-observability-operator

$ oc get deploy -n node-observability-operator

Copy to Clipboard

Toggle word wrap

Example output

NAME                                            READY   UP-TO-DATE  AVAILABLE   AGE
node-observability-operator-controller-manager  1/1     1           1           40h

NAME                                            READY   UP-TO-DATE  AVAILABLE   AGE
node-observability-operator-controller-manager  1/1     1           1           40h

Copy to Clipboard

Toggle word wrap

21.2.2. Installing the Node Observability Operator using the web console
Copy link

You can install the Node Observability Operator from the OpenShift Container Platform web console.

Prerequisites

You have access to the cluster with cluster-admin privileges.
You have access to the OpenShift Container Platform web console.

Procedure

Log in to the OpenShift Container Platform web console.
In the Administrator’s navigation panel, expand Operators OperatorHub.
In the All items field, enter Node Observability Operator and select the Node Observability Operator tile.
Click Install.
On the Install Operator page, configure the following settings:
1. In the Update channel area, click alpha.
2. In the Installation mode area, click A specific namespace on the cluster.
3. From the Installed Namespace list, select node-observability-operator from the list.
4. In the Update approval area, select Automatic.
5. Click Install.

Verification

In the Administrator’s navigation panel, expand Operators Installed Operators.
Verify that the Node Observability Operator is listed in the Operators list.

21.3. Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
Copy link

Creating a Node Observability custom resource to collect CRI-O and Kubelet profiling data.

21.3.1. Creating the Node Observability custom resource
Copy link

You must create and run the NodeObservability custom resource (CR) before you run the profiling query. When you run the NodeObservability CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes matching the nodeSelector.

Important

If CRI-O profiling is not enabled on the worker nodes, the NodeObservabilityMachineConfig resource gets created. Worker nodes matching the nodeSelector specified in NodeObservability CR restarts. This might take 10 or more minutes to complete.

Note

Kubelet profiling is enabled by default.

The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the kubelet-serving-ca certificate chain is mounted on the agent pod, which allows secure communication between the agent and node’s kubelet endpoint.

Prerequisites

You have installed the Node Observability Operator.
You have installed the OpenShift CLI (oc).
You have access to the cluster with cluster-admin privileges.

Procedure

Log in to the OpenShift Container Platform CLI by running the following command:
```
oc login -u kubeadmin https://<HOSTNAME>:6443
```
```
$ oc login -u kubeadmin https://<HOSTNAME>:6443
```
Copy to Clipboard Toggle word wrap
Switch back to the node-observability-operator namespace by running the following command:
```
oc project node-observability-operator
```
```
$ oc project node-observability-operator
```
Copy to Clipboard Toggle word wrap

Create a CR file named nodeobservability.yaml that contains the following text:

    apiVersion: nodeobservability.olm.openshift.io/v1alpha2
    kind: NodeObservability
    metadata:
      name: cluster 
    spec:
      nodeSelector:
        kubernetes.io/hostname: <node_hostname> 
      type: crio-kubelet

    apiVersion: nodeobservability.olm.openshift.io/v1alpha2
    kind: NodeObservability
    metadata:
      name: cluster


    spec:
      nodeSelector:
        kubernetes.io/hostname: <node_hostname>


      type: crio-kubelet

Copy to Clipboard

Toggle word wrap

1: You must specify the name as cluster because there should be only one NodeObservability CR per cluster.
2: Specify the nodes on which the Node Observability agent must be deployed.

Run the NodeObservability CR:

oc apply -f nodeobservability.yaml

oc apply -f nodeobservability.yaml

Copy to Clipboard

Toggle word wrap

Example output

nodeobservability.olm.openshift.io/cluster created

nodeobservability.olm.openshift.io/cluster created

Copy to Clipboard

Toggle word wrap

Review the status of the NodeObservability CR by running the following command:

oc get nob/cluster -o yaml | yq '.status.conditions'

$ oc get nob/cluster -o yaml | yq '.status.conditions'

Copy to Clipboard

Toggle word wrap

Example output

conditions:
  conditions:
  - lastTransitionTime: "2022-07-05T07:33:54Z"
    message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
      ready: true'
    reason: Ready
    status: "True"
    type: Ready

conditions:
  conditions:
  - lastTransitionTime: "2022-07-05T07:33:54Z"
    message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig
      ready: true'
    reason: Ready
    status: "True"
    type: Ready

Copy to Clipboard

Toggle word wrap

NodeObservability CR run is completed when the reason is Ready and the status is True.

21.3.2. Running the profiling query
Copy link

To run the profiling query, you must create a NodeObservabilityRun resource. The profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. After the profiling query is complete, you must retrieve the profiling data inside the container file system /run/node-observability directory. The lifetime of data is bound to the agent pod through the emptyDir volume, so you can access the profiling data while the agent pod is in the running status.

Important

You can request only one profiling query at any point of time.

Prerequisites

You have installed the Node Observability Operator.
You have created the NodeObservability custom resource (CR).
You have access to the cluster with cluster-admin privileges.

Procedure

Create a NodeObservabilityRun resource file named nodeobservabilityrun.yaml that contains the following text:

apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun
spec:
  nodeObservabilityRef:
    name: cluster

apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun
spec:
  nodeObservabilityRef:
    name: cluster

Copy to Clipboard

Toggle word wrap

Trigger the profiling query by running the NodeObservabilityRun resource:
```
oc apply -f nodeobservabilityrun.yaml
```
```
$ oc apply -f nodeobservabilityrun.yaml
```
Copy to Clipboard Toggle word wrap

Review the status of the NodeObservabilityRun by running the following command:

oc get nodeobservabilityrun nodeobservabilityrun -o yaml  | yq '.status.conditions'

$ oc get nodeobservabilityrun nodeobservabilityrun -o yaml  | yq '.status.conditions'

Copy to Clipboard

Toggle word wrap

Example output

conditions:
- lastTransitionTime: "2022-07-07T14:57:34Z"
  message: Ready to start profiling
  reason: Ready
  status: "True"
  type: Ready
- lastTransitionTime: "2022-07-07T14:58:10Z"
  message: Profiling query done
  reason: Finished
  status: "True"
  type: Finished

conditions:
- lastTransitionTime: "2022-07-07T14:57:34Z"
  message: Ready to start profiling
  reason: Ready
  status: "True"
  type: Ready
- lastTransitionTime: "2022-07-07T14:58:10Z"
  message: Profiling query done
  reason: Finished
  status: "True"
  type: Finished

Copy to Clipboard

Toggle word wrap

The profiling query is complete once the status is True and type is Finished.

Retrieve the profiling data from the container’s /run/node-observability path by running the following bash script:

for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq .status.agents[].name); do
  echo "agent ${a}"
  mkdir -p "/tmp/${a}"
  for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
    f="$(basename ${p})"
    echo "copying ${f} to /tmp/${a}/${f}"
    oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
  done
done

for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq .status.agents[].name); do
  echo "agent ${a}"
  mkdir -p "/tmp/${a}"
  for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
    f="$(basename ${p})"
    echo "copying ${f} to /tmp/${a}/${f}"
    oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
  done
done

Copy to Clipboard

Toggle word wrap

21.4. Node Observability Operator scripting
Copy link

Scripting allows you to run pre-configured bash scripts, using the current Node Observability Operator and Node Observability Agent.

These scripts monitor key metrics like CPU load, memory pressure, and worker node issues. They also collect sar reports and custom performance metrics.

21.4.1. Creating the Node Observability custom resource for scripting
Copy link

You must create and run the NodeObservability custom resource (CR) before you run the scripting. When you run the NodeObservability CR, it enables the agent in scripting mode on the compute nodes matching the nodeSelector label.

Prerequisites

You have installed the Node Observability Operator.
You have installed the OpenShift CLI (oc).
You have access to the cluster with cluster-admin privileges.

Procedure

Log in to the OpenShift Container Platform cluster by running the following command:
```
oc login -u kubeadmin https://<host_name>:6443
```
```
$ oc login -u kubeadmin https://<host_name>:6443
```
Copy to Clipboard Toggle word wrap
Switch to the node-observability-operator namespace by running the following command:
```
oc project node-observability-operator
```
```
$ oc project node-observability-operator
```
Copy to Clipboard Toggle word wrap

Create a file named nodeobservability.yaml that contains the following content:

    apiVersion: nodeobservability.olm.openshift.io/v1alpha2
    kind: NodeObservability
    metadata:
      name: cluster 
    spec:
      nodeSelector:
        kubernetes.io/hostname: <node_hostname> 
      type: scripting

    apiVersion: nodeobservability.olm.openshift.io/v1alpha2
    kind: NodeObservability
    metadata:
      name: cluster


    spec:
      nodeSelector:
        kubernetes.io/hostname: <node_hostname>


      type: scripting

Copy to Clipboard

Toggle word wrap

1: You must specify the name as cluster because there should be only one NodeObservability CR per cluster.
2: Specify the nodes on which the Node Observability agent must be deployed.
3: To deploy the agent in scripting mode, you must set the type to scripting.

Create the NodeObservability CR by running the following command:
```
oc apply -f nodeobservability.yaml
```
```
$ oc apply -f nodeobservability.yaml
```
Copy to Clipboard Toggle word wrap
Example output
```
nodeobservability.olm.openshift.io/cluster created
```
```
nodeobservability.olm.openshift.io/cluster created
```
Copy to Clipboard Toggle word wrap

Review the status of the NodeObservability CR by running the following command:

oc get nob/cluster -o yaml | yq '.status.conditions'

$ oc get nob/cluster -o yaml | yq '.status.conditions'

Copy to Clipboard

Toggle word wrap

Example output

conditions:
  conditions:
  - lastTransitionTime: "2022-07-05T07:33:54Z"
    message: 'DaemonSet node-observability-ds ready: true NodeObservabilityScripting
      ready: true'
    reason: Ready
    status: "True"
    type: Ready

conditions:
  conditions:
  - lastTransitionTime: "2022-07-05T07:33:54Z"
    message: 'DaemonSet node-observability-ds ready: true NodeObservabilityScripting
      ready: true'
    reason: Ready
    status: "True"
    type: Ready

Copy to Clipboard

Toggle word wrap

The NodeObservability CR run is completed when the reason is Ready and status is "True".

21.4.2. Configuring Node Observability Operator scripting
Copy link

Prerequisites

You have installed the Node Observability Operator.
You have created the NodeObservability custom resource (CR).
You have access to the cluster with cluster-admin privileges.

Procedure

Create a file named nodeobservabilityrun-script.yaml that contains the following content:

apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun-script
  namespace: node-observability-operator
spec:
  nodeObservabilityRef:
    name: cluster
    type: scripting

apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun-script
  namespace: node-observability-operator
spec:
  nodeObservabilityRef:
    name: cluster
    type: scripting

Copy to Clipboard

Toggle word wrap

Important

You can request only the following scripts:

metrics.sh
network-metrics.sh (uses monitor.sh)

Trigger the scripting by creating the NodeObservabilityRun resource with the following command:
```
oc apply -f nodeobservabilityrun-script.yaml
```
```
$ oc apply -f nodeobservabilityrun-script.yaml
```
Copy to Clipboard Toggle word wrap

Review the status of the NodeObservabilityRun scripting by running the following command:

oc get nodeobservabilityrun nodeobservabilityrun-script -o yaml  | yq '.status.conditions'

$ oc get nodeobservabilityrun nodeobservabilityrun-script -o yaml  | yq '.status.conditions'

Copy to Clipboard

Toggle word wrap

Example output

Status:
  Agents:
    Ip:    10.128.2.252
    Name:  node-observability-agent-n2fpm
    Port:  8443
    Ip:    10.131.0.186
    Name:  node-observability-agent-wcc8p
    Port:  8443
  Conditions:
    Conditions:
      Last Transition Time:  2023-12-19T15:10:51Z
      Message:               Ready to start profiling
      Reason:                Ready
      Status:                True
      Type:                  Ready
      Last Transition Time:  2023-12-19T15:11:01Z
      Message:               Profiling query done
      Reason:                Finished
      Status:                True
      Type:                  Finished
  Finished Timestamp:        2023-12-19T15:11:01Z
  Start Timestamp:           2023-12-19T15:10:51Z

Status:
  Agents:
    Ip:    10.128.2.252
    Name:  node-observability-agent-n2fpm
    Port:  8443
    Ip:    10.131.0.186
    Name:  node-observability-agent-wcc8p
    Port:  8443
  Conditions:
    Conditions:
      Last Transition Time:  2023-12-19T15:10:51Z
      Message:               Ready to start profiling
      Reason:                Ready
      Status:                True
      Type:                  Ready
      Last Transition Time:  2023-12-19T15:11:01Z
      Message:               Profiling query done
      Reason:                Finished
      Status:                True
      Type:                  Finished
  Finished Timestamp:        2023-12-19T15:11:01Z
  Start Timestamp:           2023-12-19T15:10:51Z

Copy to Clipboard

Toggle word wrap

The scripting is complete once Status is True and Type is Finished.

Retrieve the scripting data from the root path of the container by running the following bash script:

#!/bin/bash

RUN=$(oc get nodeobservabilityrun --no-headers | awk '{print $1}')

for a in $(oc get nodeobservabilityruns.nodeobservability.olm.openshift.io/${RUN} -o json | jq .status.agents[].name); do
  echo "agent ${a}"
  agent=$(echo ${a} | tr -d "\"\'\`")
  base_dir=$(oc exec "${agent}" -c node-observability-agent -- bash -c "ls -t | grep node-observability-agent" | head -1)
  echo "${base_dir}"
  mkdir -p "/tmp/${agent}"
  for p in $(oc exec "${agent}" -c node-observability-agent -- bash -c "ls ${base_dir}"); do
    f="/${base_dir}/${p}"
    echo "copying ${f} to /tmp/${agent}/${p}"
    oc exec "${agent}" -c node-observability-agent -- cat ${f} > "/tmp/${agent}/${p}"
  done
done

#!/bin/bash

RUN=$(oc get nodeobservabilityrun --no-headers | awk '{print $1}')

for a in $(oc get nodeobservabilityruns.nodeobservability.olm.openshift.io/${RUN} -o json | jq .status.agents[].name); do
  echo "agent ${a}"
  agent=$(echo ${a} | tr -d "\"\'\`")
  base_dir=$(oc exec "${agent}" -c node-observability-agent -- bash -c "ls -t | grep node-observability-agent" | head -1)
  echo "${base_dir}"
  mkdir -p "/tmp/${agent}"
  for p in $(oc exec "${agent}" -c node-observability-agent -- bash -c "ls ${base_dir}"); do
    f="/${base_dir}/${p}"
    echo "copying ${f} to /tmp/${agent}/${p}"
    oc exec "${agent}" -c node-observability-agent -- cat ${f} > "/tmp/${agent}/${p}"
  done
done

Copy to Clipboard

Toggle word wrap

Chapter 21. Using the Node Observability Operator

21.1. Workflow of the Node Observability Operator
Copy link

21.2. Installing the Node Observability Operator
Copy link

21.2.1. Installing the Node Observability Operator using the CLI
Copy link

21.2.2. Installing the Node Observability Operator using the web console
Copy link

21.3. Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
Copy link

21.3.1. Creating the Node Observability custom resource
Copy link

21.3.2. Running the profiling query
Copy link

21.4. Node Observability Operator scripting
Copy link

21.4.1. Creating the Node Observability custom resource for scripting
Copy link

21.4.2. Configuring Node Observability Operator scripting
Copy link

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

Chapter 21. Using the Node Observability Operator

21.1. Workflow of the Node Observability OperatorCopy linkLink copied to clipboard!

21.2. Installing the Node Observability OperatorCopy linkLink copied to clipboard!

21.2.1. Installing the Node Observability Operator using the CLICopy linkLink copied to clipboard!

21.2.2. Installing the Node Observability Operator using the web consoleCopy linkLink copied to clipboard!

21.3. Requesting CRI-O and Kubelet profiling data using the Node Observability OperatorCopy linkLink copied to clipboard!

21.3.1. Creating the Node Observability custom resourceCopy linkLink copied to clipboard!

21.3.2. Running the profiling queryCopy linkLink copied to clipboard!

21.4. Node Observability Operator scriptingCopy linkLink copied to clipboard!

21.4.1. Creating the Node Observability custom resource for scriptingCopy linkLink copied to clipboard!

21.4.2. Configuring Node Observability Operator scriptingCopy linkLink copied to clipboard!

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

Making open source more inclusive

About Red Hat

Theme

Red Hat legal and privacy links

Red Hat legal and privacy links

21.1. Workflow of the Node Observability Operator
Copy link

21.2. Installing the Node Observability Operator
Copy link

21.2.1. Installing the Node Observability Operator using the CLI
Copy link

21.2.2. Installing the Node Observability Operator using the web console
Copy link

21.3. Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
Copy link

21.3.1. Creating the Node Observability custom resource
Copy link

21.3.2. Running the profiling query
Copy link

21.4. Node Observability Operator scripting
Copy link

21.4.1. Creating the Node Observability custom resource for scripting
Copy link

21.4.2. Configuring Node Observability Operator scripting
Copy link