Chapter 12. Using metrics with dashboards and alerts
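The Network Observability Operator uses the flowlogs-pipeline to generate metrics from flow logs. You can use these metrics to view dashboards and to set alerts.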
12.1. Viewing network observability metrics dashboards
On the Overview tab in the OpenShift Container Platform console, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by node, namespace, owner, pod, and service. You can also use filters and display options to further refine the metrics.
Procedure
- In the web console, navigate to Observe → Dashboards and select the Netobserv dashboard. View network traffic metrics in the following categories, each with a subset per node, namespace, source, and destination:
- Byte rates
- Packet drops
- DNS
- RTT
- Select the Netobserv/Health dashboard. View metrics about the health of the Operator in the following categories, each with a subset per node, namespace, source, and destination:
- Flows
- Flows Overhead
- Flow rates
- Agents
- Processor
- Operator
Infrastructure and Application metrics are shown in a split-view for namespace and workloads.
12.2. Predefined metrics
Metrics generated by the flowlogs-pipeline are configurable in spec.processor.metrics.includeList of the FlowCollector custom resource, where you can add or remove metrics.
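As an illustration of where the list lives, the following is a minimal sketch of a FlowCollector resource that sets includeList explicitly. The API version (flows.netobserv.io/v1beta2) and the resource name (cluster) are assumptions that might differ in your installed Operator version. The metric names are written without the netobserv_ prefix.

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster                          # assumed resource name
spec:
  processor:
    metrics:
      # Metric names are listed without the netobserv_ prefix.
      includeList:
      - namespace_flows_total
      - node_ingress_bytes_total
      - workload_ingress_bytes_total
      - workload_egress_bytes_total      # example of an additional metric
```

This sketch assumes that the list is treated as the full set of enabled metrics, so it includes every metric that you want to keep.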
12.3. Network observability metrics
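You can also create alerts by using the includeList metrics in Prometheus rules. When looking for these metrics in Prometheus, such as in the Console through Observe → Metrics, or when defining alerts, all the metric names are prefixed with netobserv_. For example, netobserv_namespace_flows_total. The available metric names are as follows: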
includeList metrics names
Names followed by an asterisk (*) are enabled by default.
- namespace_egress_bytes_total
- namespace_egress_packets_total
- namespace_ingress_bytes_total
- namespace_ingress_packets_total
- namespace_flows_total *
- node_egress_bytes_total
- node_egress_packets_total
- node_ingress_bytes_total *
- node_ingress_packets_total
- node_flows_total
- workload_egress_bytes_total
- workload_egress_packets_total
- workload_ingress_bytes_total *
- workload_ingress_packets_total
- workload_flows_total
PacketDrop metrics names
When the PacketDrop feature is enabled in spec.agent.ebpf.features (with privileged mode), the following additional metrics are available:
- namespace_drop_bytes_total
- namespace_drop_packets_total *
- node_drop_bytes_total
- node_drop_packets_total
- workload_drop_bytes_total
- workload_drop_packets_total
DNS metrics names
When the DNSTracking feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:
- namespace_dns_latency_seconds *
- node_dns_latency_seconds
- workload_dns_latency_seconds
FlowRTT metrics names
When the FlowRTT feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:
- namespace_rtt_seconds *
- node_rtt_seconds
- workload_rtt_seconds
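The feature-dependent metrics above only appear when the matching agent feature is enabled. The following is a minimal sketch of a FlowCollector resource that enables all three features and lists some of their metrics. The API version, the resource name, and the spec.agent.ebpf.privileged field placement are assumptions to verify against your installed Operator version.

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster                          # assumed resource name
spec:
  agent:
    ebpf:
      privileged: true                   # PacketDrop needs privileged mode
      features:
      - PacketDrop
      - DNSTracking
      - FlowRTT
  processor:
    metrics:
      includeList:
      - namespace_drop_packets_total
      - namespace_dns_latency_seconds
      - namespace_rtt_seconds
      - workload_drop_bytes_total        # not enabled by default
```

If the list is treated as the full set of enabled metrics, add any other metrics that you rely on, such as namespace_flows_total.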
12.4. Creating alerts
You can create custom alerting rules for the Netobserv dashboard metrics to trigger alerts when some defined conditions are met.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
- You have the Network Observability Operator installed.
Procedure
- Create a YAML file by clicking the import icon, +.
- Add an alerting rule configuration to the YAML file. The following YAML sample creates an alert that fires when the cluster ingress traffic exceeds a threshold of 10 MBps per destination workload.
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservAlerts
    rules:
    - alert: NetObservIncomingBandwidth
      annotations:
        message: |-
          {{ $labels.job }}: incoming traffic exceeding 10 MBps for 30s on {{ $labels.DstK8S_OwnerType }} {{ $labels.DstK8S_OwnerName }} ({{ $labels.DstK8S_Namespace }}).
        summary: "High incoming traffic."
      expr: sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[1m])) by (job, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType) > 10000000 # 1
      for: 30s
      labels:
        severity: warning
1. The netobserv_workload_ingress_bytes_total metric is enabled by default in spec.processor.metrics.includeList.
- Click Create to apply the configuration file to the cluster.
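The same pattern can be mirrored for outgoing traffic. The following sketch is illustrative only: the resource, group, and alert names are hypothetical, and it assumes that the workload egress metric carries SrcK8S_Namespace, SrcK8S_OwnerName, and SrcK8S_OwnerType labels in the same way that the ingress metric carries their Dst counterparts. The workload_egress_bytes_total metric is not marked as enabled by default, so it would first have to be added to includeList.

```yaml
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-egress-alerts            # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservEgressAlerts            # hypothetical group name
    rules:
    - alert: NetObservOutgoingBandwidth    # hypothetical alert name
      annotations:
        message: |-
          {{ $labels.job }}: outgoing traffic exceeding 10 MBps for 30s from {{ $labels.SrcK8S_OwnerType }} {{ $labels.SrcK8S_OwnerName }} ({{ $labels.SrcK8S_Namespace }}).
        summary: "High outgoing traffic."
      # 10 MBps expressed in bytes per second; label names are assumed
      expr: sum(rate(netobserv_workload_egress_bytes_total[1m])) by (job, SrcK8S_Namespace, SrcK8S_OwnerName, SrcK8S_OwnerType) > 10000000
      for: 30s
      labels:
        severity: warning
```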
12.5. Custom metrics
You can create custom metrics out of the flowlogs data by using the FlowMetric API.
12.6. Configuring custom metrics by using FlowMetric API
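You can configure the FlowMetric API to create custom metrics from flow log fields. You can create more than one FlowMetric resource in a project.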
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project: dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Configure the FlowMetric resource, similar to the following sample configurations:
Example 12.1. Generate a metric that tracks ingress bytes received from cluster external sources
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv # 1
spec:
  metricName: cluster_external_ingress_bytes_total # 2
  type: Counter # 3
  valueField: Bytes
  direction: Ingress # 4
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType] # 5
  filters: # 6
  - field: SrcSubnetLabel
    matchType: Absence
1. The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
2. The name of the Prometheus metric, which in the web console appears with the netobserv_ prefix, as netobserv_<metricName>.
3. The type specifies the type of metric. The Counter type is useful for counting bytes or packets.
4. The direction of traffic to capture. If not specified, both ingress and egress are captured, which can lead to duplicated counts.
5. Labels define what the metrics look like and the relationship between the different entities. They also determine the cardinality of the metric. For example, SrcK8S_Name is a high-cardinality label.
6. Refines results based on the listed criteria. In this example, only the cluster external traffic is selected by matching only flows where SrcSubnetLabel is absent. This assumes that the subnet labels feature is enabled (through spec.processor.subnetLabels), which is the default.
Verification
- Once the pods refresh, navigate to Observe → Metrics.
- In the Expression field, type the metric name to view the corresponding result. You can also enter an expression, such as the following:
topk(5, sum(rate(netobserv_cluster_external_ingress_bytes_total{DstK8S_Namespace="my-namespace"}[2m])) by (DstK8S_HostName, DstK8S_OwnerName, DstK8S_OwnerType))
Example 12.2. Show RTT latency for cluster external ingress traffic
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv # 1
spec:
  metricName: cluster_external_ingress_rtt_seconds
  type: Histogram # 2
  valueField: TimeFlowRttNs
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence
  - field: TimeFlowRttNs
    matchType: Presence
  divider: "1000000000" # 3
  buckets: [".001", ".005", ".01", ".02", ".03", ".04", ".05", ".075", ".1", ".25", "1"] # 4
1. The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
2. The type specifies the type of metric. The Histogram type is useful for a latency value (TimeFlowRttNs).
3. Because the round-trip time (RTT) is provided in nanoseconds in flows, use a divider of 1 billion to convert it into seconds, which is the standard unit in Prometheus guidelines.
4. The custom buckets specify precision on RTT, with optimal precision ranging between 5ms and 250ms.
Verification
- Once the pods refresh, navigate to Observe → Metrics.
- In the Expression field, you can type the metric name to view the corresponding result.
High cardinality can affect the memory usage of Prometheus. You can check whether specific labels have high cardinality in the Network Flows format reference.
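The two examples above track ingress traffic. A mirrored configuration for egress traffic toward cluster external destinations could look like the following sketch. The resource name and metric name are hypothetical, and the sketch assumes that the Src counterparts of the destination fields (SrcK8S_HostName, SrcK8S_OwnerName, SrcK8S_OwnerType) and DstSubnetLabel are available as flow fields, and that the subnet labels feature is enabled.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-egress-traffic   # hypothetical name
  namespace: netobserv                               # FlowCollector spec.namespace
spec:
  metricName: cluster_external_egress_bytes_total    # hypothetical metric name
  type: Counter
  valueField: Bytes
  direction: Egress
  # Source-side labels keep cardinality at the workload level
  labels: [SrcK8S_HostName,SrcK8S_Namespace,SrcK8S_OwnerName,SrcK8S_OwnerType]
  filters:
  # Keep only flows whose destination has no subnet label (cluster external)
  - field: DstSubnetLabel
    matchType: Absence
```

Under these assumptions, the resulting series would be queried as netobserv_cluster_external_egress_bytes_total.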
12.7. Configuring custom charts using FlowMetric API
You can generate charts for dashboards in the OpenShift Container Platform web console by defining the charts section of the FlowMetric resource. You can view these charts as an administrator in the Dashboard menu.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project: dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Configure the FlowMetric resource, similar to the following sample configurations:
Example 12.3. Chart for tracking ingress bytes received from cluster external sources
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv # 1
# ...
  charts:
  - dashboardName: Main
    title: External ingress traffic
    unit: Bps
    type: SingleStat
    queries:
    - promQL: "sum(rate($METRIC[2m]))"
      legend: ""
  - dashboardName: Main
    sectionName: External
    title: Top external ingress traffic per workload
    unit: Bps
    type: StackArea
    queries:
    - promQL: "sum(rate($METRIC{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace, DstK8S_OwnerName)"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
1. The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
Verification
- Once the pods refresh, navigate to Observe → Dashboards.
- Search for the NetObserv / Main dashboard. View two panels under the NetObserv / Main dashboard, or optionally a dashboard name that you create:
- A textual single statistic showing the global external ingress rate summed across all dimensions
- A timeseries graph showing the same metric per destination workload
For more information about the query language, refer to the Prometheus documentation.
Example 12.4. Chart for RTT latency for cluster external ingress traffic
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv
# ...
  charts:
  - dashboardName: Main
    title: External ingress TCP latency
    unit: seconds
    type: SingleStat
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket[2m])) by (le)) > 0"
      legend: "p99"
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, p50 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.5, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, p99 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
This example uses the histogram_quantile function to show p50 and p99.
You can show averages of histograms by dividing the metric, $METRIC_sum, by the metric, $METRIC_count. For the preceding example, the corresponding Prometheus query is as follows:
promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
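Placed in a chart definition, the average query could look like the following sketch. This is a fragment that would sit under the charts section of the FlowMetric resource from Example 12.4. The panel title is hypothetical, and the other fields reuse values that appear in that example.

```yaml
charts:
- dashboardName: Main
  sectionName: External
  title: "Average external ingress sRTT per workload (ms)"   # hypothetical title
  unit: seconds
  type: Line
  queries:
  # Average = rate of the histogram sum divided by rate of the histogram count
  - promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
    legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
```

Both sides of the division are grouped by the same labels so that Prometheus can match the series one to one.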
Verification
- Once the pods refresh, navigate to Observe → Dashboards.
- Search for the NetObserv / Main dashboard. View the new panel under the NetObserv / Main dashboard, or optionally a dashboard name that you create.
For more information about the query language, refer to the Prometheus documentation.
12.8. Detecting SYN flooding using the FlowMetric API and TCP flags
You can create an AlertingRule resource to alert for SYN flooding.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Create FlowMetric resources to add the following configurations:
Configuration counting flows per destination host and resource, with TCP flags
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-destination
spec:
  metricName: flows_with_flags_per_destination_total
  type: Counter
  labels: [SrcSubnetLabel,DstSubnetLabel,DstK8S_Name,DstK8S_Type,DstK8S_HostName,DstK8S_Namespace,Flags]
Configuration counting flows per source host and resource, with TCP flags
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-source
spec:
  metricName: flows_with_flags_per_source_total
  type: Counter
  labels: [DstSubnetLabel,SrcSubnetLabel,SrcK8S_Name,SrcK8S_Type,SrcK8S_HostName,SrcK8S_Namespace,Flags]
- Deploy the following AlertingRule resource to alert for SYN flooding:
AlertingRule for SYN flooding
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-syn-alerts
  namespace: openshift-monitoring
# ...
spec:
  groups:
  - name: NetObservSYNAlerts
    rules:
    - alert: NetObserv-SYNFlood-in
      annotations:
        message: |-
          {{ $labels.job }}: incoming SYN-flood attack suspected to Host={{ $labels.DstK8S_HostName}}, Namespace={{ $labels.DstK8S_Namespace }}, Resource={{ $labels.DstK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Incoming SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_destination_total{Flags="2"}[1m])) by (job, DstK8S_HostName, DstK8S_Namespace, DstK8S_Name) > 300
      for: 15s
      labels:
        severity: warning
        app: netobserv
    - alert: NetObserv-SYNFlood-out
      annotations:
        message: |-
          {{ $labels.job }}: outgoing SYN-flood attack suspected from Host={{ $labels.SrcK8S_HostName}}, Namespace={{ $labels.SrcK8S_Namespace }}, Resource={{ $labels.SrcK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Outgoing SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_source_total{Flags="2"}[1m])) by (job, SrcK8S_HostName, SrcK8S_Namespace, SrcK8S_Name) > 300
      for: 15s
      labels:
        severity: warning
        app: netobserv
# ...
Verification
- In the web console, click Manage Columns in the Network Traffic table view and click TCP flags.
- In the Network Traffic table view, filter on TCP protocol SYN TCPFlag. A large number of flows with the same byteSize indicates a SYN flood.
- Go to Observe → Alerting and select the Alerting Rules tab.
- Filter on netobserv-synflood-in alert. The alert should fire when SYN flooding occurs.