Chapter 20. Troubleshooting network observability
Perform diagnostic actions to troubleshoot common issues related to the Network Observability Operator and its components.
20.1. Using the must-gather tool
You can use the must-gather tool to collect information about the Network Observability Operator resources and cluster-wide resources, such as pod logs, FlowCollector, and webhook configurations.
Procedure
- Navigate to the directory where you want to store the must-gather data.
Run the following command to collect cluster-wide must-gather resources:
$ oc adm must-gather --image-stream=openshift/must-gather \
  --image=quay.io/netobserv/must-gather
20.2. Configuring network traffic menu entry in the OpenShift Container Platform console
If the network traffic menu entry is not listed in the Observe menu of the OpenShift Container Platform console, you can configure the menu entry manually.
Prerequisites
- You have installed OpenShift Container Platform version 4.10 or newer.
Procedure
- Check that the spec.consolePlugin.register field is set to true by running the following command:

  $ oc -n netobserv get flowcollector cluster -o yaml

  Example output

  apiVersion: flows.netobserv.io/v1alpha1
  kind: FlowCollector
  metadata:
    name: cluster
  spec:
    consolePlugin:
      register: false

- Optional: Add the netobserv-plugin plugin by manually editing the Console Operator config:

  $ oc edit console.operator.openshift.io cluster

  Example output

  ...
  spec:
    plugins:
    - netobserv-plugin
  ...

- Optional: Set the spec.consolePlugin.register field to true by running the following command:

  $ oc -n netobserv edit flowcollector cluster -o yaml

  Example output

  apiVersion: flows.netobserv.io/v1alpha1
  kind: FlowCollector
  metadata:
    name: cluster
  spec:
    consolePlugin:
      register: true

- Ensure the status of the console pods is running by running the following command:

  $ oc get pods -n openshift-console -l app=console

- Restart the console pods by running the following command:

  $ oc delete pods -n openshift-console -l app=console

- Clear your browser cache and history.
Check the status of network observability plugin pods by running the following command:
  $ oc get pods -n netobserv -l app=netobserv-plugin

  Example output

  NAME                                READY   STATUS    RESTARTS   AGE
  netobserv-plugin-68c7bbb9bb-b69q6   1/1     Running   0          21s

- Check the logs of the network observability plugin pods by running the following command:

  $ oc logs -n netobserv -l app=netobserv-plugin

  Example output

  time="2022-12-13T12:06:49Z" level=info msg="Starting netobserv-console-plugin [build version: , build date: 2022-10-21 15:15] at log level info" module=main
  time="2022-12-13T12:06:49Z" level=info msg="listening on https://:9001" module=server
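If you need to scan the plugin logs for errors programmatically, the logfmt-style lines shown in the example output can be parsed with a short script. This is an illustrative sketch only, not part of the Operator or its tooling; the regular expression assumes the `key=value` and `key="quoted value"` format seen in the sample output above.

```python
import re

# Matches key=value pairs; the value may be double-quoted, as in the
# netobserv-console-plugin log output shown above.
PAIR_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')

def parse_logfmt(line):
    """Return a dict of key/value pairs from one logfmt-style log line."""
    fields = {}
    for key, raw, quoted in PAIR_RE.findall(line):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields

line = 'time="2022-12-13T12:06:49Z" level=info msg="listening on https://:9001" module=server'
fields = parse_logfmt(line)
print(fields["level"])  # info
print(fields["msg"])    # listening on https://:9001
```

You could pipe `oc logs -n netobserv -l app=netobserv-plugin` into such a script to filter for `level=error` lines.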
20.3. Flowlogs-Pipeline does not consume network flows after installing Kafka
If you deployed the flow collector first with deploymentModel: KAFKA and then deployed Kafka, the flow collector might not connect correctly to Kafka. Manually restart the flow-pipeline pods where Flowlogs-Pipeline does not consume network flows from Kafka.
Procedure
Delete the flow-pipeline pods to restart them by running the following command:
$ oc delete pods -n netobserv -l app=flowlogs-pipeline-transformer
20.4. Failing to see network flows from both br-int and br-ex interfaces
br-int and br-ex are virtual bridge devices operated at OSI layer 2. The eBPF agent works at the IP and TCP levels, layers 3 and 4 respectively. You can expect the eBPF agent to capture the network traffic passing through br-int and br-ex when that traffic is processed by other interfaces, such as physical host or virtual pod interfaces. If you restrict the eBPF agent network interfaces to attach only to br-int and br-ex, you do not see any network flow. Manually remove the part in the interfaces or excludeInterfaces fields that restricts the network interfaces to br-int and br-ex.
Procedure
- Remove the interfaces: [ 'br-int', 'br-ex' ] part of the configuration. This allows the agent to fetch information from all the interfaces. Alternatively, you can specify a Layer-3 interface, for example eth0. Run the following command:

  $ oc edit -n netobserv flowcollector cluster -o yaml

  Example output

  apiVersion: flows.netobserv.io/v1alpha1
  kind: FlowCollector
  metadata:
    name: cluster
  spec:
    agent:
      type: EBPF
      ebpf:
        interfaces: [ 'br-int', 'br-ex' ]

  where interfaces specifies the network interfaces.
20.5. Network observability controller manager pod runs out of memory
You can increase memory limits for the Network Observability Operator by editing the spec.config.resources.limits.memory specification of the Subscription object.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Click Network Observability and then select Subscription.
- From the Actions menu, click Edit Subscription.

  Alternatively, you can use the CLI to open the YAML configuration for the Subscription object by running the following command:

  $ oc edit subscription netobserv-operator -n openshift-netobserv-operator

- Edit the Subscription object to add the config.resources.limits.memory specification and set the value to account for your memory requirements. See the Additional resources for more information about resource considerations:

  apiVersion: operators.coreos.com/v1alpha1
  kind: Subscription
  metadata:
    name: netobserv-operator
    namespace: openshift-netobserv-operator
  spec:
    channel: stable
    config:
      resources:
        limits:
          memory: 800Mi
        requests:
          cpu: 100m
          memory: 100Mi
    installPlanApproval: Automatic
    name: netobserv-operator
    source: redhat-operators
    sourceNamespace: openshift-marketplace
    startingCSV: <network_observability_operator_latest_version>
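When choosing a memory limit, the limit must be at least as large as the request. A small sketch that parses Kubernetes binary-suffix quantities such as 800Mi and validates a request/limit pair; the helper names are assumptions for illustration and are not part of any Red Hat tooling:

```python
# Binary (power-of-two) suffixes used by Kubernetes resource quantities.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def to_bytes(quantity):
    """Convert a Kubernetes binary memory quantity such as '800Mi' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[:-len(suffix)]) * factor
    return int(quantity)  # plain bytes, no suffix

def limits_valid(request, limit):
    """A pod spec is only schedulable if the request does not exceed the limit."""
    return to_bytes(request) <= to_bytes(limit)

# The Subscription example above requests 100Mi with an 800Mi limit:
print(limits_valid("100Mi", "800Mi"))  # True
```

If the controller manager pod is still OOM-killed, raise the limit further rather than lowering the request.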
20.6. Running custom queries to Loki
For troubleshooting, you can run custom queries against Loki. The following examples show two ways to do this, which you can adapt according to your needs by replacing <api_token> with your own.
These examples use the netobserv namespace as both the Network Observability and Loki namespaces, and assume that the LokiStack is named loki. You can optionally use a different namespace and naming by adapting the examples, specifically the -n netobserv option and the loki-gateway URL.
Prerequisites
- You have installed the Loki Operator for use with the Network Observability Operator.
Procedure
- To get all available labels, run the following command:

  $ oc exec deployment/netobserv-plugin -n netobserv -- curl -G -s -H 'X-Scope-OrgID:network' -H 'Authorization: Bearer <api_token>' -k https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network/loki/api/v1/labels | jq

- To get all flows from the source namespace my-namespace, run the following command:

  $ oc exec deployment/netobserv-plugin -n netobserv -- curl -G -s -H 'X-Scope-OrgID:network' -H 'Authorization: Bearer <api_token>' -k https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network/loki/api/v1/query --data-urlencode 'query={SrcK8S_Namespace="my-namespace"}' | jq
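The curl commands above rely on --data-urlencode to percent-escape the LogQL selector. If you prefer assembling the same request from a script, a minimal sketch of the URL construction (the gateway URL and selector are taken from the examples above; this is not a NetObserv API):

```python
from urllib.parse import urlencode

# Gateway endpoint and LogQL selector from the curl example above.
base = "https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network/loki/api/v1/query"
logql = '{SrcK8S_Namespace="my-namespace"}'

# urlencode percent-escapes the braces, quotes, and '=' in the selector,
# which is what --data-urlencode does for curl.
url = f"{base}?{urlencode({'query': logql})}"
print(url)
```

You would still need to send the X-Scope-OrgID and Authorization headers shown in the curl examples when issuing the request.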
20.7. Troubleshooting Loki ResourceExhausted error
Loki may return a ResourceExhausted error when the network observability data it receives exceeds the configured maximum message size. If you are using the Red Hat Loki Operator, this maximum message size is configured to 100 MiB.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project drop-down menu.
- In the Provided APIs list, select the Network Observability Operator.
- Click the Flow Collector link, and then click the YAML view tab.
- If you are using the Loki Operator, check that the spec.loki.batchSize value does not exceed 98 MiB.
- If you are using a Loki installation method that is different from the Red Hat Loki Operator, such as Grafana Loki, verify that the grpc_server_max_recv_msg_size Grafana Loki server setting is higher than the FlowCollector resource spec.loki.batchSize value. If it is not, you must either increase the grpc_server_max_recv_msg_size value, or decrease the spec.loki.batchSize value so that it is lower than the limit.
- Click Save if you edited the FlowCollector.
20.8. Loki empty ring error
The Loki "empty ring" error results in flows not being stored in Loki and not showing up in the web console. This error might happen in various situations, and a single workaround to address them all does not exist. There are some actions you can take to investigate the logs in your Loki pods and to verify that the LokiStack is healthy and ready.
Some of the situations where this error is observed are as follows:
- After a LokiStack is uninstalled and reinstalled in the same namespace, old PVCs are not removed, which can cause this error.
  - Action: You can try removing the LokiStack again, removing the PVC, then reinstalling the LokiStack.
- After a certificate rotation, this error can prevent communication with the flowlogs-pipeline and console-plugin pods.
  - Action: You can restart the pods to restore the connectivity.
20.9. Resource troubleshooting
20.10. LokiStack rate limit errors
A rate limit placed on the Loki tenant can result in potential temporary loss of data and a 429 error:
Per stream rate limit exceeded (limit:xMB/sec) while attempting to ingest for stream
You can update the LokiStack CRD with the perStreamRateLimit and perStreamRateLimitBurst specifications to mitigate this error.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project dropdown.
- Look for Loki Operator, and select the LokiStack tab.
- Create or edit an existing LokiStack instance using the YAML view to add the perStreamRateLimit and perStreamRateLimitBurst specifications:

  apiVersion: loki.grafana.com/v1
  kind: LokiStack
  metadata:
    name: loki
    namespace: netobserv
  spec:
    limits:
      global:
        ingestion:
          perStreamRateLimit: 6
          perStreamRateLimitBurst: 30
    tenants:
      mode: openshift-network
    managementState: Managed

- Click Save.
Verification
Once you update the perStreamRateLimit and perStreamRateLimitBurst values, the pods in your cluster restart and the 429 rate-limit error no longer occurs.
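The interaction between perStreamRateLimit (sustained ingestion rate, in MB per second) and perStreamRateLimitBurst (burst allowance, in MB) follows a token-bucket pattern. The following is a toy model of that behavior for intuition only; it is not Loki's implementation:

```python
class StreamLimiter:
    """Token bucket: refills at rate_mb_per_s, holds at most burst_mb."""

    def __init__(self, rate_mb_per_s, burst_mb):
        self.rate = rate_mb_per_s
        self.capacity = burst_mb
        self.tokens = burst_mb  # bucket starts full

    def allow(self, size_mb, elapsed_s):
        # Refill for the time elapsed since the previous push, then spend.
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed_s)
        if size_mb <= self.tokens:
            self.tokens -= size_mb
            return True
        return False  # would surface as a 429 rate-limit error

limiter = StreamLimiter(rate_mb_per_s=6, burst_mb=30)
print(limiter.allow(30, 0))  # True: a full burst is accepted immediately
print(limiter.allow(10, 1))  # False: only 6 MB refilled after 1 second
```

This illustrates why raising only the burst value helps with spiky traffic, while a sustained overload also requires raising the per-stream rate.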
20.11. Running a large query results in Loki errors
When running large queries for a long time, Loki errors can occur, such as a timeout or too many outstanding requests. There is no complete corrective for this issue, but there are ways to mitigate it:
- Adapt your query to add an indexed filter
-
With Loki queries, you can query on both indexed and non-indexed fields, or labels. Queries that contain filters on labels perform better. For example, if you query for a particular pod, which is not an indexed field, you can add its namespace to the query. The list of indexed fields can be found in the "Network flows format reference", in the Loki label column.
Loki labelcolumn. - Consider querying Prometheus rather than Loki
- Prometheus is a better fit than Loki to query on large time ranges. However, whether or not you can use Prometheus instead of Loki depends on the use case. For example, queries on Prometheus are much faster than on Loki, and large time ranges do not impact performance. But Prometheus metrics do not contain as much information as flow logs in Loki. The Network Observability OpenShift web console automatically favors Prometheus over Loki if the query is compatible; otherwise, it defaults to Loki. If your query does not run against Prometheus, you can change some filters or aggregations to make the switch. In the OpenShift web console, you can force the use of Prometheus. An error message is displayed when incompatible queries fail, which can help you figure out which labels to change to make the query compatible. For example, changing a filter or an aggregation from Resource or Pods to Owner.
- Consider using the FlowMetrics API to create your own metric
- If the data that you need isn’t available as a Prometheus metric, you can use the FlowMetrics API to create your own metric. For more information, see "FlowMetrics API Reference" and "Configuring custom metrics by using FlowMetric API".
- Configure Loki to improve the query performance
- If the problem persists, you can consider configuring Loki to improve the query performance. Some options depend on the installation mode you used for Loki, such as using the Operator and LokiStack, or Monolithic mode, or Microservices mode.
  - In LokiStack or Microservices modes, try increasing the number of querier replicas.
  - Increase the query timeout. You must also increase the Network Observability read timeout to Loki in the FlowCollector spec.loki.readTimeout.
-
In