Chapter 11. Troubleshooting Network Observability
Use the following procedures to assist in troubleshooting Network Observability issues.
11.1. Using the must-gather tool
You can use the must-gather tool to collect information about the Network Observability Operator resources and cluster-wide resources, such as pod logs, FlowCollector, and webhook configurations.
Procedure
- Navigate to the directory where you want to store the must-gather data.
Run the following command to collect cluster-wide must-gather resources:
$ oc adm must-gather --image-stream=openshift/must-gather \
 --image=quay.io/netobserv/must-gather
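When the command completes, the data is collected into a new local directory. By default, oc adm must-gather writes into a directory named must-gather.local.<random_suffix> under your current working directory; the exact name varies, so the following check is only a sketch:
$ ls must-gather.local.*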
11.2. Configuring network traffic menu entry in the OpenShift Container Platform console
If the network traffic menu entry is not listed in the Observe menu of the OpenShift Container Platform console, configure it manually.
Prerequisites
- You have installed OpenShift Container Platform version 4.10 or newer.
Procedure
Check if the spec.consolePlugin.register field is set to true by running the following command:
$ oc -n netobserv get flowcollector cluster -o yaml
Example output
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    register: false
Optional: Add the netobserv-plugin plugin by manually editing the Console Operator config:
$ oc edit console.operator.openshift.io cluster
Example output
...
spec:
  plugins:
  - netobserv-plugin
...
Optional: Set the spec.consolePlugin.register field to true by running the following command:
$ oc -n netobserv edit flowcollector cluster -o yaml
Example output
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    register: true
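Alternatively, as a non-interactive sketch that assumes your FlowCollector resource is named cluster, you can set the same field with a merge patch:
$ oc -n netobserv patch flowcollector cluster --type=merge \
 -p '{"spec":{"consolePlugin":{"register":true}}}'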
Ensure the status of console pods is running by running the following command:
$ oc get pods -n openshift-console -l app=console
Restart the console pods by running the following command:
$ oc delete pods -n openshift-console -l app=console
- Clear your browser cache and history.
Check the status of Network Observability plugin pods by running the following command:
$ oc get pods -n netobserv -l app=netobserv-plugin
Example output
NAME                                READY   STATUS    RESTARTS   AGE
netobserv-plugin-68c7bbb9bb-b69q6   1/1     Running   0          21s
Check the logs of the Network Observability plugin pods by running the following command:
$ oc logs -n netobserv -l app=netobserv-plugin
Example output
time="2022-12-13T12:06:49Z" level=info msg="Starting netobserv-console-plugin [build version: , build date: 2022-10-21 15:15] at log level info" module=main time="2022-12-13T12:06:49Z" level=info msg="listening on https://:9001" module=server
11.3. Flowlogs-Pipeline does not consume network flows after installing Kafka
If you deployed the flow collector first with deploymentModel: KAFKA and then deployed Kafka, the flow collector might not connect correctly to Kafka. Manually restart the flow-pipeline pods where Flowlogs-Pipeline does not consume network flows from Kafka.
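For reference, a FlowCollector that uses Kafka looks similar to the following sketch; the address and topic values shown are example assumptions that depend on your Kafka deployment:
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: KAFKA
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv"
    topic: network-flows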
Procedure
Delete the flow-pipeline pods to restart them by running the following command:
$ oc delete pods -n netobserv -l app=flowlogs-pipeline-transformer
11.4. Failing to see network flows from both br-int and br-ex interfaces
br-int and br-ex are virtual bridge devices that operate at OSI layer 2. The eBPF agent works at the IP and TCP levels, layers 3 and 4 respectively. You can expect the eBPF agent to capture the network traffic passing through br-int and br-ex when that traffic is processed by other interfaces, such as physical host or virtual pod interfaces. If you restrict the eBPF agent network interfaces to attach only to br-int and br-ex, you do not see any network flow.
Manually remove the part of the interfaces or excludeInterfaces specification that restricts the network interfaces to br-int and br-ex.
Procedure
Remove the interfaces: [ 'br-int', 'br-ex' ] field. This allows the agent to fetch information from all the interfaces. Alternatively, you can specify a Layer-3 interface, for example eth0. Run the following command:
$ oc edit -n netobserv flowcollector cluster -o yaml
Example output
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      interfaces: [ 'br-int', 'br-ex' ] 1
1 - Specifies the network interfaces.
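For example, a corrected specification might look like the following sketch, which captures from the hypothetical Layer-3 interface eth0 instead; omitting the interfaces field entirely lets the agent fetch information from all interfaces:
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      interfaces: [ 'eth0' ]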
11.5. Network Observability controller manager pod runs out of memory
You can increase memory limits for the Network Observability Operator by editing the spec.config.resources.limits.memory specification in the Subscription object.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Click Network Observability and then select Subscription.
- From the Actions menu, click Edit Subscription.
  Alternatively, you can use the CLI to open the YAML configuration for the Subscription object by running the following command:
  $ oc edit subscription netobserv-operator -n openshift-netobserv-operator
Edit the Subscription object to add the config.resources.limits.memory specification and set the value to account for your memory requirements. See the Additional resources for more information about resource considerations:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator
  namespace: openshift-netobserv-operator
spec:
  channel: stable
  config:
    resources:
      limits:
        memory: 800Mi 1
      requests:
        cpu: 100m
        memory: 100Mi
  installPlanApproval: Automatic
  name: netobserv-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: <network_observability_operator_latest_version> 2
1 - For example, set the memory limit to 800Mi.
2 - Specifies the starting cluster service version, for example the latest version of the Network Observability Operator.
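To choose a suitable limit, you can check the current memory consumption of the controller manager pod, assuming the metrics API is available in your cluster:
$ oc adm top pod -n openshift-netobserv-operator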
Additional resources
11.6. Troubleshooting Loki ResourceExhausted error
Loki may return a ResourceExhausted error when network flow data sent by Network Observability exceeds the configured maximum message size. If you are using the Red Hat Loki Operator, this maximum message size is configured to 100 MiB.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project drop-down menu.
- In the Provided APIs list, select the Network Observability Operator.
- Click Flow Collector, and then select the YAML view tab.
- If you are using the Loki Operator, check that the spec.loki.batchSize value does not exceed 98 MiB.
- If you are using a Loki installation method that is different from the Red Hat Loki Operator, such as Grafana Loki, verify that the grpc_server_max_recv_msg_size Grafana Loki server setting is higher than the FlowCollector resource spec.loki.batchSize value. If it is not, you must either increase the grpc_server_max_recv_msg_size value, or decrease the spec.loki.batchSize value so that it is lower than the limit, as shown in the sketch after this procedure.
- Click Save if you edited the FlowCollector.
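For a Grafana Loki installation that you manage yourself, the gRPC receive limit is set in the server block of the Loki configuration file. A minimal sketch, assuming a standalone Loki configuration; the value is in bytes, and 104857600 corresponds to the 100 MiB limit mentioned above:
server:
  grpc_server_max_recv_msg_size: 104857600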
11.7. Resource troubleshooting
11.8. LokiStack rate limit errors
A rate limit placed on the Loki tenant can result in potential temporary loss of data and a 429 error: Per stream rate limit exceeded (limit:xMB/sec) while attempting to ingest for stream. You might consider having an alert set to notify you of this error. For more information, see "Creating Loki rate limit alerts for the NetObserv dashboard" in the Additional resources of this section.
You can update the LokiStack CRD with the perStreamRateLimit and perStreamRateLimitBurst specifications, as shown in the following procedure.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project drop-down menu.
- Look for Loki Operator, and select the LokiStack tab.
Create or edit an existing LokiStack instance using the YAML view to add the perStreamRateLimit and perStreamRateLimitBurst specifications:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv
spec:
  limits:
    global:
      ingestion:
        perStreamRateLimit: 6 1
        perStreamRateLimitBurst: 30 2
  tenants:
    mode: openshift-network
  managementState: Managed
1 - The default value for perStreamRateLimit is 3.
2 - The default value for perStreamRateLimitBurst is 15.
- Click Save.
Verification
Once you update the perStreamRateLimit and perStreamRateLimitBurst specifications, the pods in your cluster restart and the 429 rate-limit error no longer occurs.
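For example, you can check the Flowlogs-Pipeline logs for the rate-limit message; an empty result suggests ingestion is no longer throttled. This sketch assumes the same pod label used earlier in this chapter:
$ oc logs -n netobserv -l app=flowlogs-pipeline-transformer | grep 'Per stream rate limit'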