
Chapter 11. Monitoring the Network Observability Operator


You can use the web console to monitor alerts related to the health of the Network Observability Operator.

11.1. Health dashboards

Metrics about health and resource usage of the Network Observability Operator are located on the Observe → Dashboards page in the web console. You can view metrics about the health of the Operator in the following categories:

  • Flows per second
  • Sampling
  • Errors last minute
  • Dropped flows per second
  • Flowlogs-pipeline statistics
  • Flowlogs-pipeline statistics views
  • eBPF agent statistics views
  • Operator statistics
  • Resource usage

11.2. Health alerts

A health alert banner that directs you to the dashboard can appear on the Network Traffic and Home pages if an alert is triggered. Alerts are generated in the following cases:

  • The NetObservLokiError alert occurs if the flowlogs-pipeline workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
  • The NetObservNoFlows alert occurs if no flows are ingested for a certain amount of time.
  • The NetObservFlowsDropped alert occurs if the Network Observability eBPF agent hashmap table is full, which causes the eBPF agent to process flows with degraded performance, or when the capacity limiter is triggered.

11.3. Viewing health information

You can access metrics about health and resource usage of the Network Observability Operator from the Dashboards page in the web console.

Prerequisites

  • You have the Network Observability Operator installed.
  • You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.

Procedure

  1. From the Administrator perspective in the web console, navigate to Observe → Dashboards.
  2. From the Dashboards dropdown, select Netobserv/Health.
  3. View the metrics about the health of the Operator that are displayed on the page.

11.3.1. Disabling health alerts

You can opt out of health alerting by editing the FlowCollector resource:

  1. In the web console, navigate to Operators → Installed Operators.
  2. Under the Provided APIs heading for the Network Observability Operator, select Flow Collector.
  3. Select cluster, and then select the YAML tab.
  4. Add spec.processor.metrics.disableAlerts to disable health alerts, as in the following YAML sample:

    apiVersion: flows.netobserv.io/v1beta2
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      processor:
        metrics:
          disableAlerts: [NetObservLokiError, NetObservNoFlows] 1

    1 You can specify one or a list with both types of alerts to disable.
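
For example, to silence only the no-flows alert while keeping the Loki error alert, list a single value. This is a minimal sketch of the same field; if you later remove the disableAlerts field, or set it to an empty list, the default alerting behavior is restored:

    apiVersion: flows.netobserv.io/v1beta2
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      processor:
        metrics:
          # Disable only the NetObservNoFlows alert; NetObservLokiError remains active.
          disableAlerts: [NetObservNoFlows]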

11.4. Creating Loki rate limit alerts

You can create custom alerting rules for the Netobserv dashboard metrics to trigger alerts when Loki rate limits have been reached.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
  • You have the Network Observability Operator installed.

Procedure

  1. Create a YAML file by clicking the import icon, +.
  2. Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when Loki rate limits have been reached:

    apiVersion: monitoring.openshift.io/v1
    kind: AlertingRule
    metadata:
      name: loki-alerts
      namespace: openshift-monitoring
    spec:
      groups:
      - name: LokiRateLimitAlerts
        rules:
        - alert: LokiTenantRateLimit
          annotations:
            message: |-
              {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
            summary: "At least one request was responded to with the rate limit error code."
          expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
          for: 10s
          labels:
            severity: warning
  3. Click Create to apply the configuration file to the cluster.
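
The expr in the preceding sample computes the percentage of Loki requests that return the 429 status code and fires as soon as that percentage is greater than zero. If you also want a more severe notification for sustained rate limiting, you can append a second rule to the same group. The following sketch is only an illustration; the LokiTenantRateLimitCritical name, the 10% threshold, and the 5 minute duration are assumptions that you can adjust to your environment:

        # Hypothetical companion rule: fires only when more than 10% of requests
        # are rate limited for at least 5 minutes.
        - alert: LokiTenantRateLimitCritical
          annotations:
            message: |-
              {{ $labels.job }} {{ $labels.route }} is experiencing a sustained rate of 429 errors.
            summary: "More than 10% of requests are responded to with the rate limit error code."
          expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 10
          for: 5m
          labels:
            severity: critical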

11.5. Using the eBPF agent alert

An alert, NetObservAgentFlowsDropped, is triggered when the Network Observability eBPF agent hashmap table is full or when the capacity limiter is triggered. If you see this alert, consider increasing the spec.agent.ebpf.cacheMaxFlows value in the FlowCollector resource, as shown in the following example.

Note

Increasing the cacheMaxFlows might increase the memory usage of the eBPF agent.

Procedure

  1. In the web console, navigate to Operators → Installed Operators.
  2. Under the Provided APIs heading for the Network Observability Operator, select Flow Collector.
  3. Select cluster, and then select the YAML tab.
  4. Increase the spec.agent.ebpf.cacheMaxFlows value, as shown in the following YAML sample:

    apiVersion: flows.netobserv.io/v1beta2
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      namespace: netobserv
      deploymentModel: Direct
      agent:
        type: eBPF
        ebpf:
          cacheMaxFlows: 200000 1

    1 Increase the cacheMaxFlows value from its value at the time of the NetObservAgentFlowsDropped alert.
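
If increasing cacheMaxFlows is not enough, or if the additional memory usage is a concern, you can instead reduce the number of flows that the agent has to process by raising the sampling interval. The following is a minimal sketch that assumes the spec.agent.ebpf.sampling field of the FlowCollector resource; a higher value lowers the agent load at the cost of less complete flow data:

    apiVersion: flows.netobserv.io/v1beta2
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      agent:
        type: eBPF
        ebpf:
          # Keep roughly 1 of every 50 flows; increase this value to reduce agent load.
          sampling: 50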