Chapter 18. Requesting CRI-O and Kubelet profiling data by using the Node Observability Operator
The Node Observability Operator collects and stores the CRI-O and Kubelet profiling data of worker nodes. You can query the profiling data to analyze the CRI-O and Kubelet performance trends and debug the performance-related issues.
The Node Observability Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
18.1. Workflow of the Node Observability Operator
The following workflow outlines on how to query the profiling data using the Node Observability Operator:
- Install the Node Observability Operator in the OpenShift Container Platform cluster.
- Create a NodeObservability custom resource to enable the CRI-O profiling on the worker nodes of your choice.
- Run the profiling query to generate the profiling data.
18.2. Installing the Node Observability Operator
The Node Observability Operator is not installed in OpenShift Container Platform by default. You can install the Node Observability Operator by using the OpenShift Container Platform CLI or the web console.
18.2.1. Installing the Node Observability Operator using the CLI
You can install the Node Observability Operator by using the OpenShift CLI (oc).
Prerequisites
- You have installed the OpenShift CLI (oc).
-
You have access to the cluster with
cluster-admin
privileges.
Procedure
Confirm that the Node Observability Operator is available by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get packagemanifests -n openshift-marketplace node-observability-operator
$ oc get packagemanifests -n openshift-marketplace node-observability-operator
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NAME CATALOG AGE node-observability-operator Red Hat Operators 9h
NAME CATALOG AGE node-observability-operator Red Hat Operators 9h
Create the
node-observability-operator
namespace by running the following command:Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc new-project node-observability-operator
$ oc new-project node-observability-operator
Create an
OperatorGroup
object YAML file:Copy to Clipboard Copied! Toggle word wrap Toggle overflow cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: node-observability-operator namespace: node-observability-operator spec: targetNamespaces: [] EOF
cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: node-observability-operator namespace: node-observability-operator spec: targetNamespaces: [] EOF
Create a
Subscription
object YAML file to subscribe a namespace to an Operator:Copy to Clipboard Copied! Toggle word wrap Toggle overflow cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: node-observability-operator namespace: node-observability-operator spec: channel: alpha name: node-observability-operator source: redhat-operators sourceNamespace: openshift-marketplace EOF
cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: node-observability-operator namespace: node-observability-operator spec: channel: alpha name: node-observability-operator source: redhat-operators sourceNamespace: openshift-marketplace EOF
Verification
View the install plan name by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'
$ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow install-dt54w
install-dt54w
Verify the install plan status by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
$ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
<install_plan_name>
is the install plan name that you obtained from the output of the previous command.Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow COMPLETE
COMPLETE
Verify that the Node Observability Operator is up and running:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get deploy -n node-observability-operator
$ oc get deploy -n node-observability-operator
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow NAME READY UP-TO-DATE AVAILABLE AGE node-observability-operator-controller-manager 1/1 1 1 40h
NAME READY UP-TO-DATE AVAILABLE AGE node-observability-operator-controller-manager 1/1 1 1 40h
18.2.2. Installing the Node Observability Operator using the web console
You can install the Node Observability Operator from the OpenShift Container Platform web console.
Prerequisites
-
You have access to the cluster with
cluster-admin
privileges. - You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
-
In the Administrator’s navigation panel, expand Operators
OperatorHub. - In the All items field, enter Node Observability Operator and select the Node Observability Operator tile.
- Click Install.
On the Install Operator page, configure the following settings:
- In the Update channel area, click alpha.
- In the Installation mode area, click A specific namespace on the cluster.
- From the Installed Namespace list, select node-observability-operator from the list.
- In the Update approval area, select Automatic.
- Click Install.
Verification
-
In the Administrator’s navigation panel, expand Operators
Installed Operators. - Verify that the Node Observability Operator is listed in the Operators list.
18.3. Creating the Node Observability custom resource
You must create and run the NodeObservability
custom resource (CR) before you run the profiling query. When you run the NodeObservability
CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes matching the nodeSelector
.
If CRI-O profiling is not enabled on the worker nodes, the NodeObservabilityMachineConfig
resource gets created. Worker nodes matching the nodeSelector
specified in NodeObservability
CR restarts. This might take 10 or more minutes to complete.
Kubelet profiling is enabled by default.
The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the kubelet-serving-ca
certificate chain is mounted on the agent pod, which allows secure communication between the agent and node’s kubelet endpoint.
Prerequisites
- You have installed the Node Observability Operator.
- You have installed the OpenShift CLI (oc).
-
You have access to the cluster with
cluster-admin
privileges.
Procedure
Log in to the OpenShift Container Platform CLI by running the following command:
Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc login -u kubeadmin https://<HOSTNAME>:6443
$ oc login -u kubeadmin https://<HOSTNAME>:6443
Switch back to the
node-observability-operator
namespace by running the following command:Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc project node-observability-operator
$ oc project node-observability-operator
Create a CR file named
nodeobservability.yaml
that contains the following text:Copy to Clipboard Copied! Toggle word wrap Toggle overflow apiVersion: nodeobservability.olm.openshift.io/v1alpha2 kind: NodeObservability metadata: name: cluster spec: nodeSelector: kubernetes.io/hostname: <node_hostname> type: crio-kubelet
apiVersion: nodeobservability.olm.openshift.io/v1alpha2 kind: NodeObservability metadata: name: cluster
1 spec: nodeSelector: kubernetes.io/hostname: <node_hostname>
2 type: crio-kubelet
Run the
NodeObservability
CR:Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc apply -f nodeobservability.yaml
oc apply -f nodeobservability.yaml
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow nodeobservability.olm.openshift.io/cluster created
nodeobservability.olm.openshift.io/cluster created
Review the status of the
NodeObservability
CR by running the following command:Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get nob/cluster -o yaml | yq '.status.conditions'
$ oc get nob/cluster -o yaml | yq '.status.conditions'
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow conditions: conditions: - lastTransitionTime: "2022-07-05T07:33:54Z" message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig ready: true' reason: Ready status: "True" type: Ready
conditions: conditions: - lastTransitionTime: "2022-07-05T07:33:54Z" message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig ready: true' reason: Ready status: "True" type: Ready
NodeObservability
CR run is completed when the reason isReady
and the status isTrue
.
18.4. Running the profiling query
To run the profiling query, you must create a NodeObservabilityRun
resource. The profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. After the profiling query is complete, you must retrieve the profiling data inside the container file system /run/node-observability
directory. The lifetime of data is bound to the agent pod through the emptyDir
volume, so you can access the profiling data while the agent pod is in the running
status.
You can request only one profiling query at any point of time.
Prerequisites
- You have installed the Node Observability Operator.
-
You have created the
NodeObservability
custom resource (CR). -
You have access to the cluster with
cluster-admin
privileges.
Procedure
Create a
NodeObservabilityRun
resource file namednodeobservabilityrun.yaml
that contains the following text:Copy to Clipboard Copied! Toggle word wrap Toggle overflow apiVersion: nodeobservability.olm.openshift.io/v1alpha2 kind: NodeObservabilityRun metadata: name: nodeobservabilityrun spec: nodeObservabilityRef: name: cluster
apiVersion: nodeobservability.olm.openshift.io/v1alpha2 kind: NodeObservabilityRun metadata: name: nodeobservabilityrun spec: nodeObservabilityRef: name: cluster
Trigger the profiling query by running the
NodeObservabilityRun
resource:Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc apply -f nodeobservabilityrun.yaml
$ oc apply -f nodeobservabilityrun.yaml
Review the status of the
NodeObservabilityRun
by running the following command:Copy to Clipboard Copied! Toggle word wrap Toggle overflow oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.conditions'
$ oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.conditions'
Example output
Copy to Clipboard Copied! Toggle word wrap Toggle overflow conditions: - lastTransitionTime: "2022-07-07T14:57:34Z" message: Ready to start profiling reason: Ready status: "True" type: Ready - lastTransitionTime: "2022-07-07T14:58:10Z" message: Profiling query done reason: Finished status: "True" type: Finished
conditions: - lastTransitionTime: "2022-07-07T14:57:34Z" message: Ready to start profiling reason: Ready status: "True" type: Ready - lastTransitionTime: "2022-07-07T14:58:10Z" message: Profiling query done reason: Finished status: "True" type: Finished
The profiling query is complete once the status is
True
and type isFinished
.Retrieve the profiling data from the container’s
/run/node-observability
path by running the following bash script:Copy to Clipboard Copied! Toggle word wrap Toggle overflow for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq .status.agents[].name); do echo "agent ${a}" mkdir -p "/tmp/${a}" for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do f="$(basename ${p})" echo "copying ${f} to /tmp/${a}/${f}" oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}" done done
for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq .status.agents[].name); do echo "agent ${a}" mkdir -p "/tmp/${a}" for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do f="$(basename ${p})" echo "copying ${f} to /tmp/${a}/${f}" oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}" done done