Chapter 24. Using the Node Observability Operator
The Node Observability Operator collects and stores CRI-O and Kubelet profiling data, or metrics gathered by scripts, from compute nodes.
With the Node Observability Operator, you can query the profiling data, enabling analysis of performance trends in CRI-O and Kubelet. It supports debugging performance-related issues and executing embedded scripts for network metrics by using the run field in the custom resource definition. To enable CRI-O and Kubelet profiling or scripting, you can configure the type field in the custom resource definition.
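As a minimal sketch of how the type field selects the mode, the following shell function prints a NodeObservability manifest for either of the two type values that appear in this chapter (crio-kubelet and scripting). The function name render_nodeobservability and the rejection of any other value are assumptions made for illustration; they are not part of the Operator.

```shell
# Hypothetical helper: print a NodeObservability manifest for a given type
# and node hostname. Only the two type values used in this chapter are
# accepted; that restriction is an assumption of this sketch.
render_nodeobservability() {
  local mode hostname
  mode="$1"
  hostname="$2"
  case "$mode" in
    crio-kubelet|scripting) ;;
    *) echo "unsupported type: $mode" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservability
metadata:
  name: cluster
spec:
  nodeSelector:
    kubernetes.io/hostname: ${hostname}
  type: ${mode}
EOF
}

# Example (requires a logged-in oc session and an installed Operator):
#   render_nodeobservability scripting <node_hostname> | oc apply -f -
```

Generating the manifest from one function keeps the profiling and scripting variants identical except for the type value.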
The Node Observability Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
24.1. Workflow of the Node Observability Operator
The following workflow outlines how to query the profiling data by using the Node Observability Operator:
- Install the Node Observability Operator in the OpenShift Container Platform cluster.
- Create a NodeObservability custom resource to enable the CRI-O profiling on the worker nodes of your choice.
- Run the profiling query to generate the profiling data.
24.2. Installing the Node Observability Operator
The Node Observability Operator is not installed in OpenShift Container Platform by default. You can install the Node Observability Operator by using the OpenShift Container Platform CLI or the web console.
24.2.1. Installing the Node Observability Operator using the CLI
You can install the Node Observability Operator by using the OpenShift CLI (oc).
Prerequisites
- You have installed the OpenShift CLI (oc).
- You have access to the cluster with cluster-admin privileges.
Procedure
Confirm that the Node Observability Operator is available by running the following command:
$ oc get packagemanifests -n openshift-marketplace node-observability-operator
Example output
NAME                          CATALOG             AGE
node-observability-operator   Red Hat Operators   9h

Create the node-observability-operator namespace by running the following command:
$ oc new-project node-observability-operator

Create an OperatorGroup object by running the following command:
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  targetNamespaces: []
EOF

Create a Subscription object to subscribe the namespace to the Operator by running the following command:
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: node-observability-operator
  namespace: node-observability-operator
spec:
  channel: alpha
  name: node-observability-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
Verification
View the install plan name by running the following command:
$ oc -n node-observability-operator get sub node-observability-operator -o yaml | yq '.status.installplan.name'
Example output
install-dt54w

Verify the install plan status by running the following command:
$ oc -n node-observability-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
<install_plan_name> is the install plan name that you obtained from the output of the previous command.
Example output
COMPLETE

Verify that the Node Observability Operator is up and running:
$ oc get deploy -n node-observability-operator
Example output
NAME                                             READY   UP-TO-DATE   AVAILABLE   AGE
node-observability-operator-controller-manager   1/1     1            1           40h
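
The install plan can take a while to finish, so the phase check above may need to be repeated. The following is a minimal polling sketch, not part of the product: poll_until and install_plan_phase are hypothetical helper names, the attempt count and delay in the usage comment are illustrative, and the comparison ignores case on the assumption that the casing printed by a cluster might differ from the example output.

```shell
# poll_until EXPECTED ATTEMPTS DELAY CMD...
# Runs CMD repeatedly until its output equals EXPECTED (ignoring case)
# or ATTEMPTS is exhausted. Returns 0 on a match and 1 on timeout.
poll_until() {
  local expected attempts delay actual i
  expected="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  attempts="$2"
  delay="$3"
  shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    actual="$("$@" 2>/dev/null | tr '[:upper:]' '[:lower:]')"
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Hypothetical wrapper around the phase command from the verification step.
install_plan_phase() {
  oc -n node-observability-operator get ip "$1" -o yaml | yq '.status.phase'
}

# Example: check every 10 seconds, up to 30 times.
#   poll_until COMPLETE 30 10 install_plan_phase <install_plan_name>
```

The helper takes the command as arguments, so the same loop can wrap any of the status checks in this chapter that print a single value.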
24.2.2. Installing the Node Observability Operator using the web console
You can install the Node Observability Operator from the OpenShift Container Platform web console.
Prerequisites
- You have access to the cluster with cluster-admin privileges.
- You have access to the OpenShift Container Platform web console.
Procedure
- Log in to the OpenShift Container Platform web console.
- In the Administrator’s navigation panel, select Ecosystem → Software Catalog.
- In the All items field, enter Node Observability Operator and select the Node Observability Operator tile.
- Click Install.
- On the Install Operator page, configure the following settings:
  - In the Update channel area, click alpha.
  - In the Installation mode area, click A specific namespace on the cluster.
  - From the Installed Namespace list, select node-observability-operator.
  - In the Update approval area, select Automatic.
- Click Install.
Verification
- In the Administrator’s navigation panel, expand Ecosystem → Installed Operators.
- Verify that the Node Observability Operator is listed in the Operators list.
24.3. Requesting CRI-O and Kubelet profiling data using the Node Observability Operator
Create a NodeObservability custom resource to collect CRI-O and Kubelet profiling data.
24.3.1. Creating the Node Observability custom resource
You must create and run the NodeObservability custom resource (CR) before you run the profiling query. When you run the NodeObservability CR, it creates the necessary machine config and machine config pool CRs to enable the CRI-O profiling on the worker nodes matching the nodeSelector.
If CRI-O profiling is not enabled on the worker nodes, the NodeObservabilityMachineConfig resource is created, and the worker nodes matching the nodeSelector specified in the NodeObservability CR restart. This might take 10 or more minutes to complete.
Kubelet profiling is enabled by default.
The CRI-O unix socket of the node is mounted on the agent pod, which allows the agent to communicate with CRI-O to run the pprof request. Similarly, the kubelet-serving-ca certificate chain is mounted on the agent pod, which allows secure communication between the agent and node’s kubelet endpoint.
Prerequisites
- You have installed the Node Observability Operator.
- You have installed the OpenShift CLI (oc).
- You have access to the cluster with cluster-admin privileges.
Procedure
Log in to the OpenShift Container Platform CLI by running the following command:
$ oc login -u kubeadmin https://<HOSTNAME>:6443

Switch to the node-observability-operator namespace by running the following command:
$ oc project node-observability-operator

Create a CR file named nodeobservability.yaml that contains the following text:
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservability
metadata:
  name: cluster
spec:
  nodeSelector:
    kubernetes.io/hostname: <node_hostname>
  type: crio-kubelet

Run the NodeObservability CR:
$ oc apply -f nodeobservability.yaml
Example output
nodeobservability.olm.openshift.io/cluster created

Review the status of the NodeObservability CR by running the following command:
$ oc get nob/cluster -o yaml | yq '.status.conditions'
Example output
conditions:
  conditions:
  - lastTransitionTime: "2022-07-05T07:33:54Z"
    message: 'DaemonSet node-observability-ds ready: true NodeObservabilityMachineConfig ready: true'
    reason: Ready
    status: "True"
    type: Ready

The NodeObservability CR run is completed when the reason is Ready and the status is True.
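
Because the worker nodes might restart, it can be convenient to block until the Ready condition is reported instead of checking the status repeatedly. The following is a minimal sketch that wraps the oc wait command; the function name wait_for_condition and the default timeout of 15 minutes are assumptions chosen for illustration.

```shell
# wait_for_condition RESOURCE CONDITION [TIMEOUT]
# Blocks until RESOURCE reports CONDITION with status True, or until
# TIMEOUT elapses. The 15m default is an illustrative assumption.
wait_for_condition() {
  local resource condition timeout
  resource="$1"
  condition="$2"
  timeout="${3:-15m}"
  oc wait "$resource" --for="condition=${condition}" --timeout="$timeout"
}

# Example (requires a logged-in oc session):
#   wait_for_condition nob/cluster Ready 20m
```

The command exits with a non-zero status if the condition is not met before the timeout, which makes it usable as a gate in a larger script.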
24.3.2. Running the profiling query
To run the profiling query, you must create a NodeObservabilityRun resource. The profiling query is a blocking operation that fetches CRI-O and Kubelet profiling data for a duration of 30 seconds. After the profiling query is complete, you must retrieve the profiling data from the /run/node-observability directory in the container file system. The lifetime of the data is bound to the agent pod through the emptyDir volume, so you can access the profiling data only while the agent pod is in the running status.
You can request only one profiling query at a time.
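
Given the one-query-at-a-time limit, a script that creates runs might first check whether an earlier run is still in progress. The following is a minimal client-side sketch under the assumption that a run is finished once its Finished condition has the status True, as in the example output in this section. The function names unfinished_runs and can_start_run are hypothetical.

```shell
# List NodeObservabilityRun resources in the current namespace whose
# Finished condition is not "True" (including runs with no status yet).
unfinished_runs() {
  oc get nodeobservabilityrun \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Finished")].status}{"\n"}{end}' |
    awk 'NF > 0 && $2 != "True" { print $1 }'
}

# Succeeds only when no run is still in progress.
can_start_run() {
  [ -z "$(unfinished_runs)" ]
}

# Example (requires a logged-in oc session):
#   can_start_run && oc apply -f nodeobservabilityrun.yaml
```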
Prerequisites
- You have installed the Node Observability Operator.
- You have created the NodeObservability custom resource (CR).
- You have access to the cluster with cluster-admin privileges.
Procedure
Create a NodeObservabilityRun resource file named nodeobservabilityrun.yaml that contains the following text:
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun
spec:
  nodeObservabilityRef:
    name: cluster

Trigger the profiling query by running the NodeObservabilityRun resource:
$ oc apply -f nodeobservabilityrun.yaml

Review the status of the NodeObservabilityRun by running the following command:
$ oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.conditions'
Example output
conditions:
- lastTransitionTime: "2022-07-07T14:57:34Z"
  message: Ready to start profiling
  reason: Ready
  status: "True"
  type: Ready
- lastTransitionTime: "2022-07-07T14:58:10Z"
  message: Profiling query done
  reason: Finished
  status: "True"
  type: Finished

The profiling query is complete once the status is True and the type is Finished.

Retrieve the profiling data from the container’s /run/node-observability path by running the following bash script:
# Loop over every agent pod listed in the status of the run.
for a in $(oc get nodeobservabilityrun nodeobservabilityrun -o yaml | yq '.status.agents[].name'); do
  echo "agent ${a}"
  mkdir -p "/tmp/${a}"
  # Copy each .pprof file from the agent container to a local directory.
  for p in $(oc exec "${a}" -c node-observability-agent -- bash -c "ls /run/node-observability/*.pprof"); do
    f="$(basename "${p}")"
    echo "copying ${f} to /tmp/${a}/${f}"
    oc exec "${a}" -c node-observability-agent -- cat "${p}" > "/tmp/${a}/${f}"
  done
done
24.4. Node Observability Operator scripting
Scripting allows you to run pre-configured bash scripts, using the current Node Observability Operator and Node Observability Agent.
These scripts monitor key metrics like CPU load, memory pressure, and worker node issues. They also collect sar reports and custom performance metrics.
24.4.1. Creating the Node Observability custom resource for scripting
You must create and run the NodeObservability custom resource (CR) before you run the scripting. When you run the NodeObservability CR, it enables the agent in scripting mode on the compute nodes matching the nodeSelector label.
Prerequisites
- You have installed the Node Observability Operator.
- You have installed the OpenShift CLI (oc).
- You have access to the cluster with cluster-admin privileges.
Procedure
Log in to the OpenShift Container Platform cluster by running the following command:
$ oc login -u kubeadmin https://<host_name>:6443

Switch to the node-observability-operator namespace by running the following command:
$ oc project node-observability-operator

Create a file named nodeobservability.yaml that contains the following content:
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservability
metadata:
  name: cluster
spec:
  nodeSelector:
    kubernetes.io/hostname: <node_hostname>
  type: scripting

Create the NodeObservability CR by running the following command:
$ oc apply -f nodeobservability.yaml
Example output
nodeobservability.olm.openshift.io/cluster created

Review the status of the NodeObservability CR by running the following command:
$ oc get nob/cluster -o yaml | yq '.status.conditions'
Example output
conditions:
  conditions:
  - lastTransitionTime: "2022-07-05T07:33:54Z"
    message: 'DaemonSet node-observability-ds ready: true NodeObservabilityScripting ready: true'
    reason: Ready
    status: "True"
    type: Ready

The NodeObservability CR run is completed when the reason is Ready and the status is "True".
24.4.2. Configuring Node Observability Operator scripting
Prerequisites
- You have installed the Node Observability Operator.
- You have created the NodeObservability custom resource (CR).
- You have access to the cluster with cluster-admin privileges.
Procedure
Create a file named nodeobservabilityrun-script.yaml that contains the following content:
apiVersion: nodeobservability.olm.openshift.io/v1alpha2
kind: NodeObservabilityRun
metadata:
  name: nodeobservabilityrun-script
  namespace: node-observability-operator
spec:
  nodeObservabilityRef:
    name: cluster
  type: scripting

Important
You can request only the following scripts:
- metrics.sh
- network-metrics.sh (uses monitor.sh)

Trigger the scripting by creating the NodeObservabilityRun resource with the following command:
$ oc apply -f nodeobservabilityrun-script.yaml

Review the status of the NodeObservabilityRun scripting by running the following command:
$ oc get nodeobservabilityrun nodeobservabilityrun-script -o yaml | yq '.status.conditions'
Example output
Status:
  Agents:
    Ip:    10.128.2.252
    Name:  node-observability-agent-n2fpm
    Port:  8443
    Ip:    10.131.0.186
    Name:  node-observability-agent-wcc8p
    Port:  8443
  Conditions:
    Conditions:
      Last Transition Time:  2023-12-19T15:10:51Z
      Message:               Ready to start profiling
      Reason:                Ready
      Status:                True
      Type:                  Ready
      Last Transition Time:  2023-12-19T15:11:01Z
      Message:               Profiling query done
      Reason:                Finished
      Status:                True
      Type:                  Finished
  Finished Timestamp:  2023-12-19T15:11:01Z
  Start Timestamp:     2023-12-19T15:10:51Z

The scripting is complete once Status is True and Type is Finished.

Retrieve the scripting data from the root path of the container by running the following bash script:
#!/bin/bash
# Note: this lookup assumes that exactly one NodeObservabilityRun exists in the current namespace.
RUN=$(oc get nodeobservabilityrun --no-headers | awk '{print $1}')
for a in $(oc get "nodeobservabilityruns.nodeobservability.olm.openshift.io/${RUN}" -o json | jq '.status.agents[].name'); do
  echo "agent ${a}"
  # Strip the quotes that jq prints around the agent name.
  agent=$(echo ${a} | tr -d "\"\'\`")
  # Pick the most recently modified output directory in the container root.
  base_dir=$(oc exec "${agent}" -c node-observability-agent -- bash -c "ls -t | grep node-observability-agent" | head -1)
  echo "${base_dir}"
  mkdir -p "/tmp/${agent}"
  for p in $(oc exec "${agent}" -c node-observability-agent -- bash -c "ls ${base_dir}"); do
    f="/${base_dir}/${p}"
    echo "copying ${f} to /tmp/${agent}/${p}"
    oc exec "${agent}" -c node-observability-agent -- cat "${f}" > "/tmp/${agent}/${p}"
  done
done