OpenShift Container Platform 4.21, Network Observability

Chapter 12. Using metrics with dashboards and alerts

The Network Observability Operator uses the flowlogs-pipeline component to generate metrics from flow logs. Use these metrics to set custom alerts and view dashboards for network activity analysis.

12.1. Viewing network observability metrics dashboards

View network observability metrics dashboards using the Overview tab in the OpenShift Container Platform console to monitor overall traffic flow and system health, with options to filter metrics by node, namespace, owner, pod, and service.

Procedure

In the web console, navigate to Observe → Dashboards and select the Netobserv dashboard.
View network traffic metrics in the following categories, each broken down per node, namespace, source, and destination:
- Byte rates
- Packet drops
- DNS
- RTT
Select the Netobserv/Health dashboard.
View metrics about the health of the Operator in the following categories, each broken down per node, namespace, source, and destination:
- Flows
- Flows Overhead
- Flow rates
- Agents
- Processor
- Operator
Infrastructure and Application metrics are shown in a split view for namespaces and workloads.

12.2. Network observability metrics

Review the comprehensive list of network observability metrics, prefixed by netobserv_, which you can configure in the FlowCollector resource and use to monitor traffic and create Prometheus alerts.

To add or remove metrics generated by flowlogs-pipeline, edit spec.processor.metrics.includeList in the FlowCollector custom resource.

You can also create alerts by using the includeList metrics in Prometheus rules, as shown in the example "Creating alerts".

When you look for these metrics in Prometheus, for example in the console through Observe → Metrics, or when you define alerts, all metric names are prefixed with netobserv_. For example, netobserv_namespace_flows_total. The available metric names are as follows:

includeList metrics names

Names followed by an asterisk * are enabled by default.

namespace_egress_bytes_total
namespace_egress_packets_total
namespace_ingress_bytes_total
namespace_ingress_packets_total
namespace_flows_total *
node_egress_bytes_total
node_egress_packets_total
node_ingress_bytes_total *
node_ingress_packets_total
node_flows_total
workload_egress_bytes_total
workload_egress_packets_total
workload_ingress_bytes_total *
workload_ingress_packets_total
workload_flows_total
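
As a minimal sketch of how the names above map onto the FlowCollector resource, the following fragment selects a few of the listed metrics. The selection is only illustrative, and the apiVersion shown is an assumption that you should verify against the FlowCollector version served on your cluster. It is also an assumption that an explicit includeList replaces the default selection rather than extending it, so list every metric that you want to keep.

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumption: check the API version on your cluster
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Names are written without the netobserv_ prefix.
      includeList:
      - namespace_flows_total
      - node_ingress_bytes_total
      - workload_ingress_bytes_total
      - workload_egress_bytes_total   # illustrative addition, not marked as a default in the list above
```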

PacketDrop metrics names

When the PacketDrop feature is enabled in spec.agent.ebpf.features (with privileged mode), the following additional metrics are available:

namespace_drop_bytes_total
namespace_drop_packets_total *
node_drop_bytes_total
node_drop_packets_total
workload_drop_bytes_total
workload_drop_packets_total

DNS metrics names

When the DNSTracking feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:

namespace_dns_latency_seconds *
node_dns_latency_seconds
workload_dns_latency_seconds

FlowRTT metrics names

When the FlowRTT feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:

namespace_rtt_seconds *
node_rtt_seconds
workload_rtt_seconds

Network events metrics names

When the NetworkEvents feature is enabled, the following metric is available by default:

namespace_network_policy_events_total
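
The feature-dependent metrics above only appear after the corresponding features are enabled in spec.agent.ebpf.features. The following fragment is a sketch of what that might look like for PacketDrop, DNSTracking, and FlowRTT. The apiVersion and the privileged field name are assumptions to verify against your installed FlowCollector API.

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumption: check the API version on your cluster
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      privileged: true   # assumed field name; PacketDrop requires privileged mode, as noted above
      features:
      - PacketDrop       # exposes the *_drop_* metrics
      - DNSTracking      # exposes the *_dns_latency_seconds metrics
      - FlowRTT          # exposes the *_rtt_seconds metrics
```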

12.3. Creating alerts

Create custom AlertingRule resources based on Netobserv dashboard metrics to define conditions that trigger alerts in the OpenShift Container Platform console.

Prerequisites

You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
You have the Network Observability Operator installed.

Procedure

Create a YAML file by clicking the import icon, +.

Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when the cluster ingress traffic reaches a given threshold of 10 MBps per destination workload.

apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservAlerts
    rules:
    - alert: NetObservIncomingBandwidth
      annotations:
        message: |-
          {{ $labels.job }}: incoming traffic exceeding 10 MBps for 30s on {{ $labels.DstK8S_OwnerType }} {{ $labels.DstK8S_OwnerName }} ({{ $labels.DstK8S_Namespace }}).
        summary: "High incoming traffic."
      expr: sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[1m])) by (job, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType) > 10000000
      for: 30s
      labels:
        severity: warning

The netobserv_workload_ingress_bytes_total metric is enabled by default in spec.processor.metrics.includeList.

Click Create to apply the configuration file to the cluster.

12.4. Custom metrics

Define custom metrics from flowlog data using the FlowMetric API, leveraging log fields as Prometheus labels to customize dashboard information and monitor specific cluster data.

Each collected flow log contains several fields, such as the source name and destination name. You can use these fields as Prometheus labels to customize the cluster information shown on your dashboard.

12.5. Configuring custom metrics by using FlowMetric API

Configure the FlowMetric API to create custom Prometheus metrics by mapping flow log fields as labels to meet specific monitoring needs.

Procedure

In the web console, navigate to Ecosystem → Installed Operators.
In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
In the Project: dropdown list, select the project of the Network Observability Operator instance.
Click Create FlowMetric.
Configure the FlowMetric resource. See "Custom metrics configuration examples".

Verification

Once the pods refresh, navigate to Observe → Metrics.
In the Expression field, type the metric name to view the corresponding result. You can also enter an expression, such as topk(5, sum(rate(netobserv_cluster_external_ingress_bytes_total{DstK8S_Namespace="my-namespace"}[2m])) by (DstK8S_HostName, DstK8S_OwnerName, DstK8S_OwnerType))

12.5.1. Custom metrics configuration examples

To monitor specific network behaviors not covered by default metrics, such as external traffic volume or latency spikes, use the FlowMetric custom resource (CR). These examples provide the configuration needed to generate targeted Prometheus metrics from network flows.

12.5.1.1. Tracking ingress bytes from cluster external sources

To measure the volume of data entering the cluster from external networks, use the following FlowMetric configuration. This metric helps identify potential bandwidth issues or unexpected external data transfer costs.

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv
spec:
  metricName: cluster_external_ingress_bytes_total
  type: Counter
  valueField: Bytes
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence

where:

metadata.namespace: Specifies the namespace where the FlowMetric resources are created. This must match the namespace defined in the FlowCollector resource spec.namespace field, which is netobserv by default.
spec.metricName: Specifies the name of the Prometheus metric, which appears in the OpenShift Container Platform web console with the netobserv_ prefix, as netobserv_<metricName>.
spec.type: Specifies the type of metric. The Counter type is useful for counting bytes or packets.
spec.direction: Specifies the direction of traffic to capture. If not specified, both ingress and egress are captured, which can lead to duplicated counts.
spec.labels: Specifies the labels that define what the metrics look like, the relationship between the different entities, and the metrics cardinality. For example, SrcK8S_Name is a high-cardinality label.
spec.filters: Specifies the criteria that flows must match to be counted. In this example, only cluster external traffic is selected by matching flows where SrcSubnetLabel is absent. This assumes that the subnet labels feature is enabled through spec.processor.subnetLabels, which is the default.
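
The same pattern can be mirrored for traffic leaving the cluster. The following is a sketch only: the resource name and metric name are hypothetical, and the Egress direction value and the SrcK8S_OwnerName and SrcK8S_OwnerType labels are assumed by symmetry with the ingress example, so verify them against the FlowMetric API on your cluster.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-egress-traffic   # hypothetical name
  namespace: netobserv
spec:
  metricName: cluster_external_egress_bytes_total    # hypothetical metric name
  type: Counter
  valueField: Bytes
  direction: Egress                                  # assumed counterpart of Ingress
  # Label by source, because the in-cluster workload is the sender here.
  labels: [SrcK8S_HostName,SrcK8S_Namespace,SrcK8S_OwnerName,SrcK8S_OwnerType]
  filters:
  # Keep only flows whose destination has no subnet label, that is, cluster external destinations.
  - field: DstSubnetLabel
    matchType: Absence
```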

12.5.1.2. Monitoring RTT latency for cluster external ingress traffic

To analyze the performance of external connections and identify high-latency paths, use the following FlowMetric configuration. This metric converts nanoseconds to seconds to align with standard Prometheus latency dashboards.

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv
spec:
  metricName: cluster_external_ingress_rtt_seconds
  type: Histogram
  valueField: TimeFlowRttNs
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence
  - field: TimeFlowRttNs
    matchType: Presence
  divider: "1000000000"
  buckets: [".001", ".005", ".01", ".02", ".03", ".04", ".05", ".075", ".1", ".25", "1"]

where:

metadata.namespace: Specifies the namespace where the FlowMetric resources are created. This must match the namespace defined in the FlowCollector resource spec.namespace field, which is netobserv by default.
spec.type: Specifies the type of metric. The Histogram type is useful for a latency value, such as TimeFlowRttNs.
spec.divider: Specifies the value by which the metric is divided. Because the round-trip time (RTT) is provided in nanoseconds in flows, use a divider of 1,000,000,000 to convert the value into seconds, the unit recommended by Prometheus guidelines.
spec.buckets: Specifies custom buckets for RTT precision. The optimal precision ranges between 5 ms and 250 ms.

12.6. Creating metrics from nested or array fields in the Traffic flows table

Create a FlowMetric custom resource to generate metrics for nested or array fields in the Traffic flows table, such as Network events or Interfaces.

Important

OVN Observability / Viewing NetworkEvents is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Important

OVN Observability and the ability to view and track network events is available only in OpenShift Container Platform 4.17 and 4.18.

The following example shows how to generate metrics from the Network events field for network policy events.

Prerequisites

The NetworkEvents feature is enabled. See the Additional resources for instructions.
A network policy is specified.

Procedure

In the web console, navigate to Ecosystem → Installed Operators.
In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
In the Project dropdown list, select the project of the Network Observability Operator instance.
Click Create FlowMetric.
Create FlowMetric resources to add the following configurations:
Configuration counting network policy events per policy name and namespace
```
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: network-policy-events
  namespace: netobserv
spec:
  metricName: network_policy_events_total
  type: Counter
  labels: [NetworkEvents>Type, NetworkEvents>Namespace, NetworkEvents>Name, NetworkEvents>Action, NetworkEvents>Direction]
  filters:
  - field: NetworkEvents>Feature
    value: acl
  flatten: [NetworkEvents]
  remap:
    "NetworkEvents>Type": type
    "NetworkEvents>Namespace": namespace
    "NetworkEvents>Name": name
    "NetworkEvents>Direction": direction
```

where:

spec.labels: Specifies the labels that represent the nested fields for Network Events from the Traffic flows table. Each network event has a specific type, namespace, name, action, and direction. You can alternatively specify Interfaces if NetworkEvents is unavailable in your version of OpenShift Container Platform.
spec.flatten: Specifies an optional field that contains a list of items to be represented as distinct items.
spec.remap: Specifies an optional set of fields to rename in Prometheus.

Verification

In the web console, navigate to Observe → Dashboards and scroll down to the Network Policy tab.
Metrics based on the FlowMetric resource that you created and on your network policy specifications begin to appear.

12.7. Configuring custom charts using FlowMetric API

Generate custom charts for OpenShift Container Platform web console dashboards by defining the charts section of the FlowMetric custom resource.

You can view custom charts as an administrator in the Dashboard menu.

Procedure

In the web console, navigate to Ecosystem → Installed Operators.
In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
In the Project: dropdown list, select the project of the Network Observability Operator instance.
Click Create FlowMetric.
Configure the FlowMetric resource. See "Flowmetric chart configuration examples".

Verification

Once the pods refresh, navigate to Observe → Dashboards.
Search for the NetObserv / Main dashboard. View two panels under the NetObserv / Main dashboard, or optionally a dashboard name that you create:
- A textual single statistic showing the global external ingress rate summed across all dimensions
- A timeseries graph showing the same metric per destination workload

For more information about the query language, refer to the Prometheus documentation.

12.7.1. Flowmetric chart configuration examples

These FlowMetric custom resource examples demonstrate how to define charts in the OpenShift Container Platform web console for tracking external ingress traffic and round-trip time (RTT) latency.

12.7.1.1. Ingress bytes chart for cluster external sources

Use the following configuration to track the rate of ingress traffic from cluster external sources. These charts help identify bandwidth usage per workload.

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv
# ...
  charts:
  - dashboardName: Main
    title: External ingress traffic
    unit: Bps
    type: SingleStat
    queries:
    - promQL: "sum(rate($METRIC[2m]))"
      legend: ""
  - dashboardName: Main
    sectionName: External
    title: Top external ingress traffic per workload
    unit: Bps
    type: StackArea
    queries:
    - promQL: "sum(rate($METRIC{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace, DstK8S_OwnerName)"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...

where:

metadata.namespace: Specifies the namespace where the FlowMetric resources are created. This must match the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
spec.charts.dashboardName: Specifies the name of the dashboard. Using a different dashboardName creates a new dashboard that is prefixed with Netobserv. For example, Netobserv / <dashboard_name>.

12.7.1.2. RTT latency chart for cluster external ingress traffic

Use the following configuration to monitor round-trip time (RTT) for cluster external ingress traffic. These examples use the histogram_quantile function to display the 50th and 99th percentiles (p50 and p99).

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv
# ...
  charts:
  - dashboardName: Main
    title: External ingress TCP latency
    unit: seconds
    type: SingleStat
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket[2m])) by (le)) > 0"
      legend: "p99"
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, p50 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.5, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, p99 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...

where:

metadata.namespace: Specifies the namespace where the FlowMetric resources are created. This must match the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
spec.charts.dashboardName: Specifies the name of the dashboard. Using a different dashboardName creates a new dashboard that is prefixed with Netobserv. For example, Netobserv / <dashboard_name>.

12.7.1.3. Calculate histogram averages

You can show the average of a histogram by dividing $METRIC_sum by $METRIC_count; both metrics are generated automatically when you create a histogram. With the preceding example, the Prometheus query is as follows:

promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
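
For context, the averaging query could be placed in a chart entry that mirrors the p50 and p99 entries of the preceding RTT example. This is a sketch: the chart title is hypothetical, and the remaining fields are copied from the existing chart definitions.

```yaml
# ...
  charts:
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, avg (ms)"   # hypothetical title
    unit: seconds
    type: Line
    queries:
    # Average = rate of the histogram sum divided by rate of the histogram count, scaled to milliseconds.
    - promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
```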

12.8. Detecting SYN flooding using the FlowMetric API and TCP flags

Deploy a custom AlertingRule and FlowMetric configuration to monitor TCP flags, enabling real-time detection and alerting for SYN flooding attacks on the cluster.

Procedure

In the web console, navigate to Ecosystem → Installed Operators.
In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
In the Project dropdown list, select the project of the Network Observability Operator instance.
Click Create FlowMetric.

Create FlowMetric resources to add the following configurations:

Configuration counting flows per destination host and resource, with TCP flags

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-destination
spec:
  metricName: flows_with_flags_per_destination_total
  type: Counter
  labels: [SrcSubnetLabel,DstSubnetLabel,DstK8S_Name,DstK8S_Type,DstK8S_HostName,DstK8S_Namespace,Flags]

Configuration counting flows per source host and resource, with TCP flags

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-source
spec:
  metricName: flows_with_flags_per_source_total
  type: Counter
  labels: [DstSubnetLabel,SrcSubnetLabel,SrcK8S_Name,SrcK8S_Type,SrcK8S_HostName,SrcK8S_Namespace,Flags]

Deploy the following AlertingRule resource to alert for SYN flooding:

AlertingRule for SYN flooding

apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-syn-alerts
  namespace: openshift-monitoring
# ...
spec:
  groups:
  - name: NetObservSYNAlerts
    rules:
    - alert: NetObserv-SYNFlood-in
      annotations:
        message: |-
          {{ $labels.job }}: incoming SYN-flood attack suspected to Host={{ $labels.DstK8S_HostName}}, Namespace={{ $labels.DstK8S_Namespace }}, Resource={{ $labels.DstK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Incoming SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_destination_total{Flags="2"}[1m])) by (job, DstK8S_HostName, DstK8S_Namespace, DstK8S_Name) > 300
      for: 15s
      labels:
        severity: warning
        app: netobserv
    - alert: NetObserv-SYNFlood-out
      annotations:
        message: |-
          {{ $labels.job }}: outgoing SYN-flood attack suspected from Host={{ $labels.SrcK8S_HostName}}, Namespace={{ $labels.SrcK8S_Namespace }}, Resource={{ $labels.SrcK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Outgoing SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_source_total{Flags="2"}[1m])) by (job, SrcK8S_HostName, SrcK8S_Namespace, SrcK8S_Name) > 300
      for: 15s
      labels:
        severity: warning
        app: netobserv
# ...

In this example, the alert threshold is 300; adjust this value empirically for your cluster. A threshold that is too low might produce false positives, and a threshold that is too high might miss actual attacks.

Verification

In the web console, click Manage Columns in the Network Traffic table view and select TCP flags.
In the Network Traffic table view, filter on the TCP protocol and the SYN TCP flag. A large number of flows with the same byteSize indicates a SYN flood.
Go to Observe → Alerting and select the Alerting Rules tab.
Filter on the NetObserv-SYNFlood-in alert. The alert fires when SYN flooding occurs.
