Chapter 12. Using metrics with dashboards and alerts
The Network Observability Operator uses the flowlogs-pipeline component to generate metrics from flow logs. Use these metrics to set custom alerts and view dashboards for network activity analysis.
12.1. Viewing network observability metrics dashboards
View network observability metrics dashboards using the Overview tab in the OpenShift Container Platform console to monitor overall traffic flow and system health, with options to filter metrics by node, namespace, owner, pod, and service.
Procedure
- In the web console, navigate to Observe → Dashboards and select the Netobserv dashboard.
- View network traffic metrics in the following categories, with each having the subset per node, namespace, source, and destination:
  - Byte rates
  - Packet drops
  - DNS
  - RTT
- Select the Netobserv/Health dashboard.
- View metrics about the health of the Operator in the following categories, with each having the subset per node, namespace, source, and destination:
  - Flows
  - Flows Overhead
  - Flow rates
  - Agents
  - Processor
  - Operator
Infrastructure and Application metrics are shown in a split-view for namespace and workloads.
12.2. Network observability metrics
Review the comprehensive list of network observability metrics, prefixed by netobserv_, which you can configure in the FlowCollector resource and use to monitor traffic and create Prometheus alerts.
Metrics generated by the flowlogs-pipeline are configurable in the spec.processor.metrics.includeList of the FlowCollector custom resource to add or remove metrics.
You can also create alerts by using the includeList metrics in Prometheus rules, as shown in the example "Creating alerts".
When looking for these metrics in Prometheus, such as in the console through Observe → Metrics, note that all metric names are prefixed with netobserv_. For example, netobserv_namespace_flows_total. Available metrics names are as follows:
- includeList metrics names
  Names followed by an asterisk (*) are enabled by default.
  - namespace_egress_bytes_total
  - namespace_egress_packets_total
  - namespace_ingress_bytes_total
  - namespace_ingress_packets_total
  - namespace_flows_total*
  - node_egress_bytes_total
  - node_egress_packets_total
  - node_ingress_bytes_total*
  - node_ingress_packets_total
  - node_flows_total
  - workload_egress_bytes_total
  - workload_egress_packets_total
  - workload_ingress_bytes_total*
  - workload_ingress_packets_total
  - workload_flows_total
- PacketDrop metrics names
  When the PacketDrop feature is enabled in spec.agent.ebpf.features (with privileged mode), the following additional metrics are available:
  - namespace_drop_bytes_total
  - namespace_drop_packets_total*
  - node_drop_bytes_total
  - node_drop_packets_total
  - workload_drop_bytes_total
  - workload_drop_packets_total
- DNS metrics names
  When the DNSTracking feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:
  - namespace_dns_latency_seconds*
  - node_dns_latency_seconds
  - workload_dns_latency_seconds
- FlowRTT metrics names
  When the FlowRTT feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:
  - namespace_rtt_seconds*
  - node_rtt_seconds
  - workload_rtt_seconds
- Network events metrics names
  When the NetworkEvents feature is enabled, this metric is available by default:
  - namespace_network_policy_events_total
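As an illustration of the settings named in this section, the following is a minimal sketch of a FlowCollector resource that turns on the optional eBPF features and selects a handful of metrics. The resource name cluster, the apiVersion flows.netobserv.io/v1beta2, and the particular selection of metrics are assumptions made for the example; verify them against the API version installed on your cluster. The sketch also assumes that an explicit includeList replaces the default selection, so it repeats the default metrics it wants to keep.

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster                          # assumed resource name
spec:
  agent:
    ebpf:
      privileged: true                   # PacketDrop needs privileged mode
      features:
      - PacketDrop
      - DNSTracking
      - FlowRTT
  processor:
    metrics:
      includeList:                       # names are given without the netobserv_ prefix
      - namespace_flows_total
      - node_ingress_bytes_total
      - workload_ingress_bytes_total
      - namespace_drop_packets_total
      - namespace_dns_latency_seconds
      - namespace_rtt_seconds
      - workload_egress_bytes_total      # not enabled by default
```

Each additional metric adds time series to Prometheus, so it is usually worth enabling only the granularity (node, namespace, or workload) that you plan to chart or alert on.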
12.3. Creating alerts
Create custom AlertingRule resources based on Netobserv dashboard metrics to define conditions that trigger alerts in the OpenShift Container Platform console.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
- You have the Network Observability Operator installed.
Procedure
- Create a YAML file by clicking the import icon, +.
- Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when the cluster ingress traffic reaches a given threshold of 10 MBps per destination workload.
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservAlerts
    rules:
    - alert: NetObservIncomingBandwidth
      annotations:
        message: |-
          {{ $labels.job }}: incoming traffic exceeding 10 MBps for 30s on {{ $labels.DstK8S_OwnerType }} {{ $labels.DstK8S_OwnerName }} ({{ $labels.DstK8S_Namespace }}).
        summary: "High incoming traffic."
      expr: sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[1m])) by (job, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType) > 10000000  # 1
      for: 30s
      labels:
        severity: warning
- 1: The netobserv_workload_ingress_bytes_total metric is enabled by default in spec.processor.metrics.includeList.
- Click Create to apply the configuration file to the cluster.
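The same pattern can be applied to other metrics from the list in "Network observability metrics". The following is a rough sketch of an alert on dropped packets, assuming that the PacketDrop feature is enabled and that netobserv_namespace_drop_packets_total carries a DstK8S_Namespace label. The resource, group, and alert names and the threshold of 100 packets per second are hypothetical values chosen for illustration.

```yaml
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-drop-alerts            # hypothetical name
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservDropAlerts            # hypothetical group name
    rules:
    - alert: NetObservPacketDrops        # hypothetical alert name
      annotations:
        message: |-
          {{ $labels.job }}: more than 100 dropped packets per second for 1m towards namespace {{ $labels.DstK8S_Namespace }}.
        summary: "High packet drop rate."
      # Assumes the metric exposes a DstK8S_Namespace label; 100 is an arbitrary threshold.
      expr: sum(rate(netobserv_namespace_drop_packets_total[1m])) by (job, DstK8S_Namespace) > 100
      for: 1m
      labels:
        severity: warning
```

A suitable threshold depends on the normal traffic of the cluster, so consider observing the metric for a while before settling on a value.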
12.4. Custom metrics
Define custom metrics from flowlog data using the FlowMetric API, leveraging log fields as Prometheus labels to customize dashboard information and monitor specific cluster data.
Every flow log record that is collected contains several labeled fields, such as the source name and destination name. You can use these fields as Prometheus labels to customize the cluster information shown on your dashboard.
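To make the idea concrete, here is a minimal sketch of a FlowMetric resource that uses two flow log fields as Prometheus labels. The resource name and metric name are hypothetical, and the namespace assumes the default netobserv installation namespace.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-namespace-pairs        # hypothetical name
  namespace: netobserv                    # assumes the default namespace
spec:
  metricName: namespace_pair_flows_total  # hypothetical metric name
  type: Counter                           # no valueField is set, so flows are counted
  labels: [SrcK8S_Namespace,DstK8S_Namespace]
```

With only two low-cardinality labels, this sketch produces one series per pair of source and destination namespaces.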
12.5. Configuring custom metrics by using FlowMetric API
Configure the FlowMetric API to create custom Prometheus metrics by mapping flow log fields as labels to meet specific monitoring needs.
Procedure
- In the web console, navigate to Ecosystem → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project: dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Configure the FlowMetric resource. See "Custom metrics configuration examples".
Verification
- Once the pods refresh, navigate to Observe → Metrics.
- In the Expression field, type the metric name to view the corresponding result. You can also enter an expression, such as:
topk(5, sum(rate(netobserv_cluster_external_ingress_bytes_total{DstK8S_Namespace="my-namespace"}[2m])) by (DstK8S_HostName, DstK8S_OwnerName, DstK8S_OwnerType))
12.5.1. Custom metrics configuration examples
To monitor specific network behaviors not covered by default metrics, such as external traffic volume or latency spikes, use the FlowMetric custom resource (CR). These examples provide the configuration needed to generate targeted Prometheus metrics from network flows.
12.5.1.1. Tracking ingress bytes from cluster external sources
To measure the volume of data entering the cluster from external networks, use the following FlowMetric configuration. This metric helps identify potential bandwidth issues or unexpected external data transfer costs.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv  # 1
spec:
  metricName: cluster_external_ingress_bytes_total  # 2
  type: Counter  # 3
  valueField: Bytes
  direction: Ingress  # 4
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]  # 5
  filters:  # 6
  - field: SrcSubnetLabel
    matchType: Absence
- 1: The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
- 2: The name of the Prometheus metric, which in the web console appears with the prefix netobserv_, as netobserv_<metricName>.
- 3: The type specifies the type of metric. The Counter type is useful for counting bytes or packets.
- 4: The direction of traffic to capture. If not specified, both ingress and egress are captured, which can lead to duplicated counts.
- 5: Labels define what the metrics look like and the relationship between the different entities, and also define the metrics cardinality. For example, SrcK8S_Name is a high cardinality label.
- 6: Refines results based on the listed criteria. In this example, selecting only the cluster external traffic is done by matching only flows where SrcSubnetLabel is absent. This assumes the subnet labels feature is enabled (via spec.processor.subnetLabels), which is done by default.
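The same approach can be mirrored for traffic leaving the cluster. The following sketch is an adaptation of the example above and is not taken from the product documentation: it swaps the direction to Egress, filters on the absence of DstSubnetLabel, and labels by source workload. The resource and metric names are hypothetical, and the sketch assumes that the Src equivalents of the destination labels are available on flows.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-egress-traffic   # hypothetical name
  namespace: netobserv
spec:
  metricName: cluster_external_egress_bytes_total    # hypothetical metric name
  type: Counter
  valueField: Bytes
  direction: Egress
  # Assumes source-side labels mirror the destination-side labels used above.
  labels: [SrcK8S_HostName,SrcK8S_Namespace,SrcK8S_OwnerName,SrcK8S_OwnerType]
  filters:
  - field: DstSubnetLabel     # absent destination subnet label means cluster external
    matchType: Absence
```

As with the ingress example, this relies on the subnet labels feature being enabled.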
12.5.1.2. Monitoring RTT latency for cluster external ingress traffic
To analyze the performance of external connections and identify high-latency paths, use the following FlowMetric configuration. This metric converts nanoseconds to seconds to align with standard Prometheus latency dashboards.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv  # 1
spec:
  metricName: cluster_external_ingress_rtt_seconds
  type: Histogram  # 2
  valueField: TimeFlowRttNs
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence
  - field: TimeFlowRttNs
    matchType: Presence
  divider: "1000000000"  # 3
  buckets: [".001", ".005", ".01", ".02", ".03", ".04", ".05", ".075", ".1", ".25", "1"]  # 4
- 1: The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
- 2: The type specifies the type of metric. The Histogram type is useful for a latency value (TimeFlowRttNs).
- 3: Since the round-trip time (RTT) is provided in nanoseconds in flows, use a divider of 1 billion to convert into seconds, which is standard in Prometheus guidelines.
- 4: The custom buckets specify precision on RTT, with optimal precision ranging between 5ms and 250ms.
12.6. Creating metrics from nested or array fields in the Traffic flows table
Create a FlowMetric custom resource to generate metrics for nested or array fields in the Traffic flows table, such as Network events or Interfaces.
OVN Observability / Viewing NetworkEvents is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
OVN Observability and the ability to view and track network events is available only in OpenShift Container Platform 4.17 and 4.18.
The following example shows how to generate metrics from the Network events field for network policy events.
Prerequisites
- Enable the NetworkEvents feature. See the Additional resources for how to do this.
- You have specified a network policy.
Procedure
- In the web console, navigate to Ecosystem → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Create FlowMetric resources to add the following configurations:
Configuration counting network policy events per policy name and namespace
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: network-policy-events
  namespace: netobserv
spec:
  metricName: network_policy_events_total
  type: Counter
  labels: [NetworkEvents>Type, NetworkEvents>Namespace, NetworkEvents>Name, NetworkEvents>Action, NetworkEvents>Direction]  # 1
  filters:
  - field: NetworkEvents>Feature
    value: acl
  flatten: [NetworkEvents]  # 2
  remap:  # 3
    "NetworkEvents>Type": type
    "NetworkEvents>Namespace": namespace
    "NetworkEvents>Name": name
    "NetworkEvents>Direction": direction
- 1: These labels represent the nested fields for Network Events from the Traffic flows table. Each network event has a specific type, namespace, name, action, and direction. You can alternatively specify the Interfaces if NetworkEvents is unavailable in your OpenShift Container Platform version.
- 2: Optional: You can choose to represent a field that contains a list of items as distinct items.
- 3: Optional: You can rename the fields in Prometheus.
Verification
- In the web console, navigate to Observe → Dashboards and scroll down to see the Network Policy tab.
- You should begin seeing metrics filter in based on the metric you created along with the network policy specifications.
High cardinality can affect the memory usage of Prometheus. You can check whether specific labels have high cardinality in the Network Flows format reference.
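For clusters where NetworkEvents is unavailable, the procedure mentions the Interfaces field as an alternative. The following is a hedged sketch of what such a FlowMetric might look like, assuming that Interfaces is an array field that can be used directly as a label and flattened in the same way as NetworkEvents. The resource name, metric name, and the remapped label name interface are hypothetical.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-per-interface               # hypothetical name
  namespace: netobserv
spec:
  metricName: flows_per_interface_total   # hypothetical metric name
  type: Counter
  labels: [Interfaces]
  flatten: [Interfaces]                   # assumed: one series per interface, not per list of interfaces
  remap:
    "Interfaces": interface               # hypothetical Prometheus label name
```

Interface names can be numerous on busy nodes, so check the cardinality of the resulting metric before relying on it.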
12.7. Configuring custom charts using FlowMetric API
Generate custom charts for OpenShift Container Platform web console dashboards by defining the charts section of the FlowMetric custom resource.
You can view custom charts as an administrator in the Dashboard menu.
Procedure
- In the web console, navigate to Ecosystem → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project: dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Configure the FlowMetric resource. See "Flowmetric chart configuration examples".
Verification
- Once the pods refresh, navigate to Observe → Dashboards.
- Search for the NetObserv / Main dashboard. View two panels under the NetObserv / Main dashboard, or optionally a dashboard name that you create:
  - A textual single statistic showing the global external ingress rate summed across all dimensions
  - A timeseries graph showing the same metric per destination workload
For more information about the query language, refer to the Prometheus documentation.
12.7.1. Flowmetric chart configuration examples
These FlowMetric custom resource examples demonstrate how to define charts in the OpenShift Container Platform web console for tracking external ingress traffic and round-trip time (RTT) latency.
12.7.1.1. Ingress bytes chart for cluster external sources
Use the following configuration to track the rate of ingress traffic from cluster external sources. These charts help identify bandwidth usage per workload.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv  # 1
# ...
  charts:
  - dashboardName: Main
    title: External ingress traffic
    unit: Bps
    type: SingleStat
    queries:
    - promQL: "sum(rate($METRIC[2m]))"
      legend: ""
  - dashboardName: Main
    sectionName: External
    title: Top external ingress traffic per workload
    unit: Bps
    type: StackArea
    queries:
    - promQL: "sum(rate($METRIC{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace, DstK8S_OwnerName)"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
- 1: The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
12.7.1.2. RTT latency chart for cluster external ingress traffic
Use the following configuration to monitor round-trip time (RTT) for cluster external ingress traffic. These examples use the histogram_quantile function to display the 50th and 99th percentiles (p50 and p99).
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv
# ...
  charts:
  - dashboardName: Main
    title: External ingress TCP latency
    unit: seconds
    type: SingleStat
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket[2m])) by (le)) > 0"
      legend: "p99"
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, p50 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.5, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
  - dashboardName: Main
    sectionName: External
    title: "Top external ingress sRTT per workload, p99 (ms)"
    unit: seconds
    type: Line
    queries:
    - promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
      legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
12.7.1.3. Calculate histogram averages
You can show averages of histograms by dividing the metric, $METRIC_sum, by the metric, $METRIC_count, which are automatically generated when you create a histogram. With the preceding example, the Prometheus query to do this is as follows:
promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
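To show where such a query fits, the following sketch wraps it in a chart entry modeled on the p50 and p99 panels from the RTT example. It is a fragment of the charts section of a Histogram FlowMetric, not a complete resource, and the panel title is a hypothetical choice.

```yaml
# Fragment: belongs under spec in a FlowMetric of type Histogram
charts:
- dashboardName: Main
  sectionName: External
  title: "Average external ingress sRTT per workload (ms)"   # hypothetical title
  unit: seconds
  type: Line
  queries:
  # Average = rate of the sum divided by rate of the count, scaled to milliseconds
  - promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
    legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
```

An average is more sensitive to outliers than the p50 panel, so it can be useful to display both side by side.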
12.8. Detecting SYN flooding using the FlowMetric API and TCP flags
Deploy a custom AlertingRule and FlowMetric configuration to monitor TCP flags, enabling real-time detection and alerting for SYN flooding attacks on the cluster.
Procedure
- In the web console, navigate to Ecosystem → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Create FlowMetric resources to add the following configurations:
Configuration counting flows per destination host and resource, with TCP flags
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-destination
spec:
  metricName: flows_with_flags_per_destination_total
  type: Counter
  labels: [SrcSubnetLabel,DstSubnetLabel,DstK8S_Name,DstK8S_Type,DstK8S_HostName,DstK8S_Namespace,Flags]
Configuration counting flows per source host and resource, with TCP flags
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-source
spec:
  metricName: flows_with_flags_per_source_total
  type: Counter
  labels: [DstSubnetLabel,SrcSubnetLabel,SrcK8S_Name,SrcK8S_Type,SrcK8S_HostName,SrcK8S_Namespace,Flags]
- Deploy the following AlertingRule resource to alert for SYN flooding:
AlertingRule for SYN flooding
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-syn-alerts
  namespace: openshift-monitoring
# ...
spec:
  groups:
  - name: NetObservSYNAlerts
    rules:
    - alert: NetObserv-SYNFlood-in
      annotations:
        message: |-
          {{ $labels.job }}: incoming SYN-flood attack suspected to Host={{ $labels.DstK8S_HostName}}, Namespace={{ $labels.DstK8S_Namespace }}, Resource={{ $labels.DstK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Incoming SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_destination_total{Flags="2"}[1m])) by (job, DstK8S_HostName, DstK8S_Namespace, DstK8S_Name) > 300
      for: 15s
      labels:
        severity: warning
        app: netobserv
    - alert: NetObserv-SYNFlood-out
      annotations:
        message: |-
          {{ $labels.job }}: outgoing SYN-flood attack suspected from Host={{ $labels.SrcK8S_HostName}}, Namespace={{ $labels.SrcK8S_Namespace }}, Resource={{ $labels.SrcK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Outgoing SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_source_total{Flags="2"}[1m])) by (job, SrcK8S_HostName, SrcK8S_Namespace, SrcK8S_Name) > 300
      for: 15s
      labels:
        severity: warning
        app: netobserv
# ...
Verification
- In the web console, click Manage Columns in the Network Traffic table view and click TCP flags.
- In the Network Traffic table view, filter on TCP protocol SYN TCPFlag. A large number of flows with the same byteSize indicates a SYN flood.
- Go to Observe → Alerting and select the Alerting Rules tab.
- Filter on the netobserv-synflood-in alert. The alert should fire when SYN flooding occurs.