OpenShift Container Platform 4.18: Network Observability

Chapter 11. Monitoring the Network Observability Operator

You can use the web console to monitor alerts related to the health of the Network Observability Operator.

11.1. Health dashboards

Metrics about the health and resource usage of the Network Observability Operator are located on the Observe → Dashboards page in the web console. You can view metrics about the health of the Operator in the following categories:

Flows per second
Sampling
Errors last minute
Dropped flows per second
Flowlogs-pipeline statistics
Flowlogs-pipeline statistics views
eBPF agent statistics views
Operator statistics
Resource usage

11.2. Health alerts

A health alert banner that directs you to the dashboard can appear on the Network Traffic and Home pages if an alert is triggered. Alerts are generated in the following cases:

The NetObservLokiError alert occurs if the flowlogs-pipeline workload is dropping flows because of Loki errors, for example, when the Loki ingestion rate limit has been reached.
The NetObservNoFlows alert occurs if no flows are ingested for a certain amount of time.
The NetObservFlowsDropped alert occurs if the Network Observability eBPF agent hashmap table is full, which causes the eBPF agent to process flows with degraded performance, or when the capacity limiter is triggered.

11.3. Viewing health information

You can access metrics about health and resource usage of the Network Observability Operator from the Dashboards page in the web console.

Prerequisites

You have the Network Observability Operator installed.
You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.

Procedure

From the Administrator perspective in the web console, navigate to Observe → Dashboards.
From the Dashboards dropdown, select Netobserv/Health.
View the metrics about the health of the Operator that are displayed on the page.

11.3.1. Disabling health alerts

You can opt out of health alerting by editing the FlowCollector resource:

In the web console, navigate to Operators → Installed Operators.
Under the Provided APIs heading for the Network Observability Operator, select Flow Collector.
Select cluster, and then select the YAML tab.

Add spec.processor.metrics.disableAlerts to disable health alerts, as in the following YAML sample:

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      disableAlerts: [NetObservLokiError, NetObservNoFlows]

For disableAlerts, you can specify one alert or a list with both types of alerts to disable.
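
As a minimal sketch of the single-alert case, the following fragment disables only NetObservLokiError and leaves NetObservNoFlows active. The choice of which alert to disable is an illustrative assumption, not a recommendation:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Illustrative choice: disable only the Loki error alert.
      # NetObservNoFlows is not listed, so it stays enabled.
      disableAlerts: [NetObservLokiError]
```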

11.4. Creating Loki rate limit alerts for the NetObserv dashboard

You can create custom alerting rules for the Netobserv dashboard metrics to trigger alerts when Loki rate limits have been reached.

Prerequisites

You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
You have the Network Observability Operator installed.

Procedure

Create a YAML file by clicking the import icon, +.

Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when Loki rate limits have been reached:

apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: loki-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: LokiRateLimitAlerts
    rules:
    - alert: LokiTenantRateLimit
      annotations:
        message: |-
          {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
        summary: "One or more requests are receiving the rate limit error code."
      expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
      for: 10s
      labels:
        severity: warning

Click Create to apply the configuration file to the cluster.
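
The sample rule fires as soon as any request returns a 429 status code. If that is too sensitive for your environment, one possible variation is to alert only when the share of rate-limited requests stays above a threshold. The following sketch is an assumption, not part of the product documentation: it uses rate over a 5m window instead of irate over 1m, a 5 percent threshold, a for duration of 5m, and the hypothetical names loki-alerts-threshold and LokiTenantRateLimitSustained. Adjust all of these values to your cluster.

```yaml
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: loki-alerts-threshold        # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
  - name: LokiRateLimitThresholdAlerts   # hypothetical name
    rules:
    - alert: LokiTenantRateLimitSustained   # hypothetical name
      annotations:
        message: |-
          {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
        summary: "More than 5% of requests are receiving the rate limit error code."
      # Percentage of requests answered with 429 over the last 5 minutes.
      # The 5m window and the "> 5" threshold are assumed values.
      expr: |-
        sum(rate(loki_request_duration_seconds_count{status_code="429"}[5m])) by (job, namespace, route)
        /
        sum(rate(loki_request_duration_seconds_count[5m])) by (job, namespace, route)
        * 100 > 5
      for: 5m
      labels:
        severity: warning
```

A longer window and a nonzero threshold trade detection speed for fewer alerts on short bursts of rate limiting.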

11.5. Using the eBPF agent alert

An alert, NetObservAgentFlowsDropped, is triggered when the Network Observability eBPF agent hashmap table is full or when the capacity limiter is triggered. If you see this alert, consider increasing the cacheMaxFlows value in the FlowCollector resource, as shown in the following example.

Note

Increasing the cacheMaxFlows value might increase the memory usage of the eBPF agent.

Procedure

In the web console, navigate to Operators → Installed Operators.
Under the Provided APIs heading for the Network Observability Operator, select Flow Collector.
Select cluster, and then select the YAML tab.
Increase the spec.agent.ebpf.cacheMaxFlows value, as shown in the following YAML sample:

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct
  agent:
    type: eBPF
    ebpf:
      cacheMaxFlows: 200000

Increase the cacheMaxFlows value from its value at the time of the NetObservAgentFlowsDropped alert.
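
Because a larger cacheMaxFlows value might increase the memory usage of the eBPF agent, you might also want to review the agent memory limit in the same edit. The following sketch assumes that the limit is set through the spec.agent.ebpf.resources field of the FlowCollector resource. The 1Gi limit is a placeholder, not a sizing recommendation; verify the field and choose a value based on the Resource usage metrics in the health dashboard.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct
  agent:
    type: eBPF
    ebpf:
      # Raised from its value at the time of the alert.
      cacheMaxFlows: 200000
      # Assumed field and placeholder value: more cached flows
      # can require more memory for the agent pods.
      resources:
        limits:
          memory: 1Gi
```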
