Network Observability
Configuring and using the Network Observability Operator in OpenShift Container Platform
Abstract
Chapter 1. Network Observability Operator release notes
The Network Observability Operator enables administrators to observe and analyze network traffic flows for OpenShift Container Platform clusters.
These release notes track the development of the Network Observability Operator in the OpenShift Container Platform.
For an overview of the Network Observability Operator, see About network observability.
1.1. Network Observability Operator 1.11 advisory
You can review the advisory for the Network Observability Operator 1.11 release.
1.2. Network Observability Operator 1.11 new features and enhancements
Learn about the new features and enhancements in the Network Observability Operator 1.11 release, including hierarchical governance with the FlowCollectorSlice resource.
- Per-tenant hierarchical governance with the FlowCollectorSlice resource
This release introduces the FlowCollectorSlice API to support hierarchical governance, allowing project administrators to independently manage sampling and subnet labeling for their specific namespaces. This feature was implemented to reduce global processing overhead and provide tenant autonomy in large-scale environments where individual teams require self-service visibility without cluster-wide configuration changes. As a result, organizations can selectively collect traffic and delegate data enrichment tasks to the project level while maintaining centralized cluster control.
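As an illustration of the per-tenant model, a FlowCollectorSlice object might look like the following sketch. The field names under spec are illustrative assumptions, not a definitive schema; consult the API reference for the exact fields:

```yaml
# Hypothetical FlowCollectorSlice sketch: the spec field names are assumptions
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollectorSlice
metadata:
  name: team-a-slice
  namespace: team-a        # the slice governs only its own namespace
spec:
  sampling: 10             # per-tenant sampling rate (assumed field name)
  subnetLabels:            # per-tenant subnet labeling (assumed field name)
    - name: team-a-vpn
      cidrs:
        - 10.50.0.0/16
```

A project administrator can apply this in their own namespace without touching the cluster-wide FlowCollector configuration.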
- New Service deployment model for the FlowCollector resource
This release introduces a new Service deployment model in the FlowCollector custom resource. This model provides an intermediate option between the Direct and Kafka models. In the Service model, the eBPF agent is deployed as a daemon set, and the flowlogs-pipeline component is deployed as a scalable service. This model offers improved performance in large clusters by reducing cache duplication across component instances.
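A minimal FlowCollector excerpt selecting the new model could look like the following sketch; Direct and Kafka are the existing options, and Service is the value introduced in this release:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Service   # new in 1.11; the alternatives are Direct and Kafka
```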
- Health rules are generally available
The health alerts feature, introduced in previous versions as a Technology Preview feature, is fully supported as health rules in the Network Observability Operator 1.11 release.
Important: Network Observability health rules are available on OpenShift Container Platform 4.16 and later.
This eBPF-based system correlates network metrics with infrastructure metadata to provide proactive notifications and automated insights into cluster health, such as traffic surges or latency trends. As a result, you can use the Network Health dashboard in the OpenShift Container Platform web console to manage categorized alerts, customize thresholds, and create recording rules for improved visualization performance.
- Enhanced network traffic visualization and filtering
This release introduces enhanced visualization and filtering tools in the OpenShift Container Platform web console.
- Inline filter editing: You can now edit filter chips directly within the filter input field. This enhancement provides a more efficient method for modifying long filter values that were previously truncated, eliminating the need to manually copy and paste values. This update adopts an inline editing convention consistent with the Saved filters feature.
- External traffic quick filters: New quick filters allow you to monitor external ingress and egress traffic actively. This enhancement streamlines network management, enabling you to identify and address issues related to external network communication quickly.
- Intuitive resource iconography: The OpenShift Container Platform console now uses specific icons for Kubernetes kinds, groups, and filters. These icons provide a more intuitive and visually consistent experience, making it easier to navigate the network topology and identify applied filters at a glance.
- DNS resolution analysis
This release includes eBPF-based DNS tracking to enrich network flow records with domain names.
This feature was implemented to reduce the mean time to identify (MTTI) by allowing administrators to immediately distinguish between network routing failures and service discovery issues, such as NXDOMAIN errors.
- Integration with Gateway API
This release introduces automatic integration between the Network Observability Operator and the Gateway API when a GatewayClass resource is created. This feature provides high-level traffic attribution for cluster ingress and egress traffic without requiring manual configuration of the FlowCollector resource.
Important: Integration with Gateway API is available on OpenShift Container Platform 4.19 and later.
You can verify the automated mapping of network flows to Gateway API resources in the Observe → Network Traffic view of the OpenShift Container Platform web console. The Owner column displays the Gateway name, providing a direct link to the associated Gateway resource page.
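The integration is triggered by the presence of a GatewayClass object, for example the following standard Gateway API resource. The controllerName value is an illustrative assumption that depends on your gateway implementation:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: example-gateway-class
spec:
  # controllerName identifies the controller implementing this class;
  # the value shown here is an example, not a required value
  controllerName: example.com/gateway-controller
```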
- Improved data resilience in the Overview and Topology views
With this release, functional data remains visible in the Overview and Topology views even if some background queries fail. This enhancement ensures that the scope and group drop-down menus in the Topology view remain accessible during partial service disruptions.
Additionally, the Overview page now displays active error messages to assist with troubleshooting, providing better visibility into system health without interrupting the monitoring workflow.
- Improved categorization of unknown network flows
With this release, network flows from unknown sources are categorized into four distinct groups: external, unknown service, unknown node, and unknown pod.
This enhancement uses subnet labels to separate unknown IP subnets, providing a clearer network topology. This improved visibility helps to identify potential security threats and allows for a more targeted analysis of unknown elements within the cluster.
- Improved performance for new Network Observability installations
The default performance of the Network Observability Operator is improved for new installations. The default value for
cacheActiveTimeout is increased from 5 to 15 seconds, and the cacheMaxFlows value is increased from 100,000 to 120,000 to accommodate higher flow volumes.
Important: These new default values apply only to new installations; existing installations retain their current configurations.
These changes reduce CPU load by up to 40%.
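For reference, the new defaults correspond to the following eBPF agent settings in the FlowCollector resource; existing installations keep their current values unless you set these fields explicitly:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      cacheActiveTimeout: 15s   # previous default: 5s
      cacheMaxFlows: 120000     # previous default: 100000
```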
- Improved LokiStack status monitoring and reporting
With this release, the Network Observability Operator monitors the status of the LokiStack resource and reports errors or configuration issues. The Network Observability Operator verifies LokiStack conditions, including pending or failed pods and specific warning conditions.
This enhancement provides more actionable information in the FlowCollector status, allowing for more effective troubleshooting of the LokiStack component within network observability.
- Visual indicators for Loki indexed fields in the filter menu
This enhancement improves query performance by indicating which fields are indexed for faster data retrieval. Using indexed fields when filtering data reduces the time required to browse and analyze network flows within the console.
1.3. Network Observability Operator 1.11 known issues
The following known issues affect the Network Observability Operator 1.11 release.
- Health rules do not trigger when the sampling rate increases because of lowVolumeThreshold
Network observability alerts might not trigger when an elevated sampling rate causes the volume to fall below the lowVolumeThreshold filter. This results in fewer alerts being evaluated or displayed.
To work around this problem, adjust the lowVolumeThreshold value to align with the sampling rate to ensure consistent alert evaluation.
- DNS metrics unavailable when Loki is disabled
When the DNSTracking feature is enabled in a "Loki-less" installation, the required metrics for DNS graphs are unavailable. As a consequence, you cannot view DNS latency and response codes in the dashboard.
To work around this problem, you must either disable the DNSTracking option or enable Loki in the FlowCollector resource by setting spec.loki.enable to true.
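The workaround maps to the following FlowCollector excerpt, which keeps DNS tracking enabled and turns Loki storage on:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      features:
        - DNSTracking   # alternatively, remove this entry to disable DNS tracking
  loki:
    enable: true        # required for the DNS latency and response code graphs
```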
1.4. Network Observability Operator 1.11 fixed issues
The Network Observability Operator 1.11 release contains several fixed issues that improve performance and the user experience.
- Missing dates in charts
Before this update, the chart tooltip date was not displayed as intended due to a breaking change in a dependency. As a consequence, users experienced missing date information in the Overview tab chart of the OpenShift Container Platform web console plugin.
With this release, the chart tooltip date display is restored.
- Warning message for Direct mode not refreshed after upscaling
Before this update, cluster information was not refreshed after scaling, causing a warning message to persist in large clusters without updating.
With this release, cluster information is refreshed when it changes. As a result, the warning message for large clusters in Direct mode updates with changes in cluster size, improving user visibility.
- Unenriched OVN IPs
Before this update, some IPs declared by OVN-Kubernetes were not enriched, causing unenriched IPs such as 100.64.0.x to not appear in the Machines network. As a consequence, the unenriched IPs caused incorrect network visibility for users.
With this release, missing IPs in OVN-Kubernetes are now enriched. As a result, IPs declared by OVN-Kubernetes are correctly enriched and appear in the Machines network, improving the visibility of network traffic sources.
- Improved operator API discovery reliability
Before this update, a race condition during Network Observability Operator startup could cause API discovery to fail silently. As a consequence, the operator could fail to recognize the OpenShift Container Platform cluster, leading to missing mandatory ClusterRoleBinding resources and preventing components from functioning correctly.
With this release, the Network Observability Operator continues to check for API availability over time, and reconciliation is blocked if discovery fails. As a result, the operator correctly identifies the environment and ensures all required roles are created.
- Added missing translation fields to IPFIX exports
Before this update, some network flow fields were missing translations during the IPFIX export process. As a result, exported IPFIX data was incomplete or difficult to interpret in external collectors.
With this release, the missing translation fields (xlat) have been added to the flowlogs-pipeline IPFIX exporter. IPFIX exports now provide a complete set of translated fields for consistent network observability.
- Fixed FlowMetric form creation link and defaults
Before this update, the link to create a FlowMetric custom resource incorrectly directed users to a YAML editor instead of the intended form view. Additionally, the editor was pre-filled with incorrect default values.
With this release, the link correctly leads to the FlowMetric resource creation form with the expected default settings. As a result, users can now easily create FlowMetric resources through the user interface.
- Virtual machine resource type icon in Topology view
Before this update, virtual machine (VM) owner types incorrectly displayed a generic question mark (?) icon in the Topology view.
With this release, the user interface now includes a specific icon for VM resources. As a result, users can more easily identify and distinguish VM traffic within the network topology.
- DNS optimization and updated DNS alerts
Before this update, many DNS "NXDOMAIN" errors were returned due to ambiguous URLs being used in network observability.
With this release, these URLs have been disambiguated, resulting in a more optimal use of DNS.
Chapter 2. Network Observability Operator release notes archive
2.1. Network Observability Operator release notes archive
These release notes track past developments of the Network Observability Operator in the OpenShift Container Platform. They are for reference purposes only.
The Network Observability Operator enables administrators to observe and analyze network traffic flows for OpenShift Container Platform clusters.
2.1.1. Network Observability Operator 1.10.1 advisory
You can review the advisory for the Network Observability Operator 1.10.1 release.
2.1.2. Network Observability Operator 1.10.1 CVEs
You can review the CVEs for the Network Observability Operator 1.10.1 release.
2.1.3. Network Observability Operator 1.10.1 fixed issues
The Network Observability Operator 1.10.1 release contains several fixed issues that improve performance and the user experience.
- Warning generated for Direct mode on clusters over 15 nodes
Before this update, the recommendation against using the Direct deployment model on large clusters was only available in the documentation.
With this release, the Network Observability Operator generates a warning when the Direct deployment model is used on a cluster exceeding 15 nodes.
- Network policy deployment disabled on OpenShiftSDN
Before this update, when OpenShift SDN was the cluster network plugin, enabling the FlowCollector network policy would break communication between network observability pods. This issue does not occur with OVN-Kubernetes, which is the default supported network plugin.
With this release, the Network Observability Operator no longer attempts to deploy the network policy when OpenShift SDN is detected; a warning is displayed instead. Additionally, the default value for enabling the network policy is modified: it is now enabled by default only when OVN-Kubernetes is detected as the cluster network plugin.
- Validation added for subnet label characters
Before this update, there were no restrictions on characters allowed in the subnet labels "name" configuration, meaning users could enter text containing spaces or special characters. This generated errors in the web console plugin when users tried to apply filters, and clicking the filter icon for a subnet label often failed.
With this release, the configured subnet label name is validated immediately when configured in the FlowCollector custom resource. The validation ensures the name contains only alphanumeric characters, -, :, and _. As a result, filtering on subnet labels from the web console plugin now works as expected.
- Network Observability CLI uses unique temporary directory per run
Before this update, the Network Observability CLI created or reused a single temporary (tmp) directory in the current working directory. This could lead to conflicts or data corruption between separate runs.
With this release, the Network Observability CLI creates a unique temporary directory for each run, preventing potential conflicts and improving file management hygiene.
2.1.4. Network Observability Operator 1.10 advisory
Review the advisory that is available for the Network Observability Operator 1.10:
2.1.5. Network Observability Operator 1.10 new features and enhancements
The Network Observability Operator 1.10 release enhances security, improves performance, and introduces new CLI UI tools for better network flow management.
2.1.5.1. Network policy updates
The Network Observability Operator now supports configuring both ingress and egress network policies to control pod traffic. This enhancement improves security.
By default, the spec.networkPolicy.enable specification is set to true.
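The setting lives in the FlowCollector resource; the following excerpt shows it enabled:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  networkPolicy:
    enable: true   # deploys ingress and egress policies for the network observability pods
```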
2.1.5.2. Network Observability Operator CLI UI updates
This release brings the following new features and updates to the Network Observability Operator CLI (oc netobserv):
Table view enhancements
- Customizable columns: Click Manage Columns to select which columns to display, and tailor the table to your needs.
- Smart filtering: Live filters now include auto-suggestions, making it easier to select the right keys and values.
- Packet preview: When capturing packets, click a row to inspect the pcap content directly.
Terminal-based line charts enhancements
- Metrics visualization: Real-time graphs are rendered directly in the CLI.
- Panel selection: Choose from predefined views or customize views by using the Manage Panels pop-up menu to selectively view charts of specific metrics.
2.1.5.3. Network observability console improvements
The network observability console plugin includes a new view to configure the FlowCollector custom resource. In this view, you can:
- Configure the FlowCollector CR.
- Calculate your resource footprint.
- Gain increased awareness of issues such as configuration warnings or high metrics cardinality.
2.1.5.4. Performance improvements
Network Observability Operator 1.10 improves the performance and memory footprint of the Operator, which is especially visible on large clusters.
2.1.6. Network Observability Operator 1.10 Technology Preview features
2.1.6.1. Network Observability Operator custom alerts (Technology Preview)
This release introduces new alert functionality, and custom alert configuration. These capabilities are Technology Preview features, and must be explicitly enabled.
To view the new alerts, in the OpenShift Container Platform web console, click Observe → Alerting → Alerting rules.
2.1.6.2. Network Observability Operator Network Health dashboard (Technology Preview)
When you enable the Technology Preview alerts functionality in the Network Observability Operator, you can view a new Network Health dashboard in the OpenShift Container Platform web console by clicking Observe.
The Network Health dashboard provides a summary of triggered alerts, distinguishing between critical, warning, and minor issues, and also shows pending alerts.
2.1.7. Network Observability Operator 1.10 removed features
Review the removed features that might affect your use of the Network Observability Operator 1.10 release.
2.1.7.1. FlowCollector API version v1beta1 has been removed
The FlowCollector API version v1beta1 has been removed. You must use the v1beta2 version instead.
2.1.8. Network Observability Operator 1.10 known issues
Review the following known issues and their recommended workarounds (where available) that might affect your use of the Network Observability Operator 1.10 release.
2.1.8.1. Upgrading to 1.10 fails on OpenShift Container Platform 4.14 and earlier
Upgrading to the Network Observability Operator 1.10 on OpenShift Container Platform 4.14 and earlier can fail because of the served v1beta1 version of the FlowCollector custom resource definition (CRD).
To work around this problem, you must:
Uninstall both versions of the Network Observability Operator from the software catalog in the OpenShift Container Platform web console.
- Keep the FlowCollector CRD installed so that it doesn’t cause any disruption in the flow collection process.
Check the current version name of the FlowCollector CRD by running the following command:
$ oc get crd flowcollectors.flows.netobserv.io -o jsonpath='{.spec.versions[0].name}'
Expected output:
v1beta1
Check the current serving status of the FlowCollector CRD by running the following command:
$ oc get crd flowcollectors.flows.netobserv.io -o jsonpath='{.spec.versions[0].served}'
Expected output:
true
Set the served flag for the v1beta1 version to false by running the following command:
$ oc patch crd flowcollectors.flows.netobserv.io --type='json' -p "[{'op': 'replace', 'path': '/spec/versions/0/served', 'value': false}]"
Verify that the served flag is set to false by running the following command:
$ oc get crd flowcollectors.flows.netobserv.io -o jsonpath='{.spec.versions[0].served}'
Expected output:
false
- Install Network Observability Operator 1.10.
2.1.8.2. eBPF agent compatibility with older OpenShift Container Platform versions
The eBPF agent used in the Network Observability Command Line Interface (CLI) packet capture feature is incompatible with OpenShift Container Platform versions 4.16 and older.
This limitation prevents the eBPF-based Packet Capture Agent (PCA) from functioning correctly on those older clusters.
To work around this problem, you must manually configure PCA to use an older, compatible eBPF agent container image. For more information, see the Red Hat Knowledgebase Solution eBPF agent compatibility with older Openshift versions in Network Observability CLI 1.10+.
2.1.8.3. eBPF Agent fails to send flows with OpenShiftSDN when NetworkPolicy is enabled
When running Network Observability Operator 1.10 on OpenShift Container Platform 4.14 clusters that use the OpenShiftSDN network plugin, the eBPF agent cannot connect to the flowlogs-pipeline component if the FlowCollector resource has the NetworkPolicy enabled (spec.networkPolicy.enable: true).
As a consequence, flow data is not processed by the flowlogs-pipeline component, and the eBPF agent reports i/o timeout errors such as the following:
time="2025-10-17T13:53:44Z" level=error msg="couldn't send flow records to collector" collector="10.0.68.187:2055" component=exporter/GRPCProto error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.0.68.187:2055: i/o timeout\""
To work around this problem, set spec.networkPolicy.enable to false to disable the NetworkPolicy in the FlowCollector resource.
This allows the eBPF agent to communicate with the flowlogs-pipeline component.
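The workaround corresponds to the following FlowCollector excerpt:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  networkPolicy:
    enable: false   # required on OpenShiftSDN so the eBPF agent can reach flowlogs-pipeline
```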
2.1.9. Network Observability Operator 1.10 fixed issues
The Network Observability Operator 1.10 release contains several fixed issues that improve performance and the user experience.
2.1.9.1. MetricName and Remap fields are validated
Before this update, users could create a FlowMetric resource with invalid values, which could cause errors when the FlowMetric was processed.
With this release, the FlowMetric metricName and remap fields are validated, preventing the creation of invalid resources.
2.1.9.2. Improved html-to-image export performance
Before this update, performance issues in the underlying library caused the html-to-image export to be slow.
With this release, the performance of the html-to-image export is improved.
2.1.9.3. Improved warnings for eBPF privileged mode
Before this update, when users selected eBPF features that require privileged mode without enabling privileged mode, no warning was displayed.
With this release, a validation hook immediately warns the user if the configuration is inconsistent. This improves user understanding and prevents misconfiguration.
2.1.9.4. Subnet labels added to OpenTelemetry exporter
Before this update, the OpenTelemetry exporter did not include the SrcSubnetLabel and DstSubnetLabel fields.
With this release, these labels are now correctly provided by the exporter. They have also been renamed to source.subnet.label and destination.subnet.label to follow the OpenTelemetry naming conventions.
2.1.9.5. Reduced default tolerations for network observability components
Before this update, a default toleration was set on all network observability components to allow them to be scheduled on any node, including nodes tainted with NoSchedule.
With this release, the default toleration is only maintained for the eBPF agent, and for Flowlogs-Pipeline when the Direct deployment model is used; it is removed for Flowlogs-Pipeline when the Kafka deployment model is used.
Additionally, tolerations remain configurable in the FlowCollector resource.
2.1.10. Network Observability Operator 1.9.3 advisory
The following advisory is available for the Network Observability Operator 1.9.3:
2.1.11. Network Observability Operator 1.9.2 advisory
The following advisory is available for the Network Observability Operator 1.9.2:
2.1.12. Network observability 1.9.2 bug fixes
- Before this update, OpenShift Container Platform versions 4.15 and earlier did not support the TC_ATTACH_MODE configuration. This led to command-line interface (CLI) errors and prevented the observation of packets and flows. With this release, the Traffic Control eXtension (TCX) hook attachment mode has been adjusted for these older versions. This eliminates tcx hook errors and enables flow and packet observation.
2.1.13. Network Observability Operator 1.9.1 advisory
You can review the advisory for the Network Observability Operator 1.9.1 release.
The following advisory is available for the Network Observability Operator 1.9.1:
2.1.14. Network Observability Operator 1.9.1 fixed issues
You can review the fixed issues for the Network Observability Operator 1.9.1 release.
- Before this update, network flows were not observed on OpenShift Container Platform 4.15 due to an incorrect attach mode setting. This stopped users from monitoring network flows correctly, especially with certain catalogs. With this release, the default attach mode for OpenShift Container Platform versions older than 4.16.0 is set to tc, so flows are now observed on OpenShift Container Platform 4.15. (NETOBSERV-2333)
- Before this update, if an IPFIX collector restarted, a configured IPFIX exporter could lose its connection and stop sending network flows to the collector. With this release, the connection is restored, and network flows continue to be sent to the collector. (NETOBSERV-2315)
- Before this update, when you configured an IPFIX exporter, flows without port information (such as ICMP traffic) were ignored, which caused errors in logs. TCP flags and ICMP data were also missing from IPFIX exports. With this release, these details are now included. Missing fields (like ports) no longer cause errors and are part of the exported data. (NETOBSERV-2307)
- Before this update, the User Defined Networks (UDN) Mapping feature showed a configuration issue and warning on OpenShift Container Platform 4.18 because the OpenShift version was incorrectly set in the code. This impacted the user experience. With this release, UDN Mapping now supports OpenShift Container Platform 4.18 without warnings, making the user experience smooth. (NETOBSERV-2305)
- Before this update, the expand function on the Network Traffic page had compatibility problems with the OpenShift Container Platform console 4.19. This resulted in empty menu space when expanding and an inconsistent user interface. With this release, the compatibility problem in the NetflowTraffic part and theme hook is resolved. The side menu in the Network Traffic view is now properly managed, which improves how you interact with the user interface. (NETOBSERV-2304)
2.1.15. Network Observability Operator 1.9.0 advisory
You can review the advisory for the Network Observability Operator 1.9.0 release.
2.1.16. Network Observability Operator 1.9.0 new features and enhancements
You can review the new features and enhancements for the Network Observability Operator 1.9.0 release.
2.1.16.1. User-defined networks with network observability
With this release, the user-defined networks (UDN) feature is generally available with network observability. When the UDNMapping feature is enabled, network flows are enriched with UDN labels.
2.1.16.2. Filter flowlogs at ingestion
With this release, you can create filters to reduce the number of generated network flows and the resource usage of network observability components. The following filters can be configured:
- eBPF Agent filters
- Flowlogs-pipeline filters
2.1.16.3. IPsec support
This update brings the following enhancements to network observability when IPsec is enabled on OpenShift Container Platform:
- A new column named IPsec Status is displayed in the network observability Traffic flows view to show whether a flow was successfully IPsec-encrypted or if there was an error during encryption/decryption.
- A new dashboard showing the percentage of encrypted traffic is generated.
2.1.16.4. Network Observability CLI
The following filtering options are now available for packets, flows, and metrics capture:
- Configure the ratio of packets being sampled by using the --sampling option.
- Filter flows using a custom query by using the --query option.
- Specify interfaces to monitor by using the --interfaces option.
- Specify interfaces to exclude by using the --exclude_interfaces option.
- Specify metric names to generate by using the --include_list option.
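Combining the options above, a flow capture invocation could look like the following sketch; the interface name and query expression are illustrative assumptions, not values from this release note:

```shell
# Capture flows on eth0 only (illustrative interface), sampling 1 of every 50
# packets, keeping only flows that match the (illustrative) query expression
oc netobserv flows --sampling=50 --interfaces=eth0 --query='SrcK8S_Namespace="my-app"'
```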
For more information, see:
2.1.17. Network Observability Operator 1.9.0 notable technical changes
You can review the notable technical changes for the Network Observability Operator 1.9.0 release.
- The NetworkEvents feature in network observability 1.9 has been updated to work with the newer Linux kernel of OpenShift Container Platform 4.19. This update breaks compatibility with older kernels. As a result, the NetworkEvents feature can only be used with OpenShift Container Platform 4.19. If you are using this feature with network observability 1.8 and OpenShift Container Platform 4.18, consider avoiding a network observability upgrade, or upgrade both network observability to 1.9 and OpenShift Container Platform to 4.19.
- The netobserv-reader cluster role has been renamed to netobserv-loki-reader.
- Improved CPU performance of the eBPF agents.
2.1.18. Network Observability Operator 1.9.0 Technology Preview features
You can review the Technology Preview features for the Network Observability Operator 1.9.0 release.
Some features in this release are currently in Technology Preview. These experimental features are not intended for production use. Note the following scope of support on the Red Hat Customer Portal for these features:
Technology Preview Features Support Scope
2.1.18.1. eBPF Manager Operator with network observability
The eBPF Manager Operator reduces the attack surface and ensures compliance, security, and conflict prevention by managing all eBPF programs. Network observability can use the eBPF Manager Operator to load hooks. This eliminates the need to provide the eBPF Agent with privileged mode or additional Linux capabilities, such as CAP_BPF and CAP_PERFMON.
2.1.19. Network Observability Operator 1.9.0 CVEs
You can review the CVEs for the Network Observability Operator 1.9.0 release.
2.1.20. Network Observability Operator 1.9.0 fixed issues
You can review the fixed issues for the Network Observability Operator 1.9.0 release.
- Previously, when filtering by source or destination IP from the console plugin, using a Classless Inter-Domain Routing (CIDR) notation such as 10.128.0.0/24 did not work, returning results that should have been filtered out. With this update, it is now possible to use a CIDR notation, with the results being filtered as expected. (NETOBSERV-2276)
- Previously, network flows might have incorrectly identified the network interfaces in use, especially with a risk of mixing up eth0 and ens5. This issue only occurred when the eBPF agents were configured as Privileged. With this update, it has been fixed partially, and almost all network interfaces are correctly identified. Refer to the known issues below for more details. (NETOBSERV-2257)
- Previously, when the Operator checked for available Kubernetes APIs in order to adapt its behavior, if there was a stale API, this resulted in an error that prevented the Operator from starting normally. With this update, the Operator ignores errors on unrelated APIs, logs errors on related APIs, and continues to run normally. (NETOBSERV-2240)
- Previously, users could not sort flows by Bytes or Packets in the Traffic flows view of the Console plugin. With this update, users can sort flows by Bytes and Packets. (NETOBSERV-2239)
- Previously, when configuring the FlowCollector resource with an IPFIX exporter, MAC addresses in the IPFIX flows were truncated to their first 2 bytes. With this update, MAC addresses are fully represented in the IPFIX flows. (NETOBSERV-2208)
- Previously, some of the warnings sent from the Operator validation webhook lacked clarity about what needed to be done. With this update, these messages have been reviewed and amended to make them more actionable. (NETOBSERV-2178)
- Previously, it was not obvious that there was an issue when referencing a LokiStack from the FlowCollector resource, such as in the case of a typing error. With this update, the FlowCollector status clearly states that the referenced LokiStack is not found in that case. (NETOBSERV-2174)
- Previously, in the console plugin Traffic flows view, in case of text overflow, text ellipses sometimes hid much of the text to be displayed. With this update, as much text as possible is displayed. (NETOBSERV-2119)
- Previously, the console plugin for network observability 1.8.1 and earlier did not work with the OpenShift Container Platform 4.19 web console, making the Network Traffic page inaccessible. With this update, the console plugin is compatible and the Network Traffic page is accessible in network observability 1.9.0. (NETOBSERV-2046)
- Previously, when using conversation tracking (logTypes: Conversations or logTypes: All in the FlowCollector resource), the Traffic rates metrics visible in the dashboards were flawed, wrongly showing an out-of-control increase in traffic. Now, the metrics show more accurate traffic rates. However, note that in Conversations and EndedConversations modes, these metrics are still not completely accurate because they do not include long-standing connections. This information has been added to the documentation. The default mode, logTypes: Flows, is recommended to avoid these inaccuracies. (NETOBSERV-1955)
2.1.21. Network Observability Operator 1.9.0 known issues
You can review the known issues for the Network Observability Operator 1.9.0 release.
- The user-defined network (UDN) feature displays a configuration issue and a warning when used with OpenShift Container Platform 4.18, even though it is supported. This warning can be ignored. (NETOBSERV-2305)
- In some rare cases, the eBPF agent is unable to appropriately correlate flows with the involved interfaces when running in privileged mode with several network namespaces. A large part of these issues have been identified and resolved in this release, but some inconsistencies remain, especially with the ens5 interface. (NETOBSERV-2287)
2.1.22. Network Observability Operator 1.8.1 advisory
You can review the advisory for the Network Observability Operator 1.8.1 release.
2.1.23. Network Observability Operator 1.8.1 CVEs
You can review the CVEs for the Network Observability Operator 1.8.1 release.
2.1.24. Network Observability Operator 1.8.1 fixed issues
You can review the fixed issues for the Network Observability Operator 1.8.1 release.
- This fix ensures that the Observe menu appears only once in future versions of OpenShift Container Platform. (NETOBSERV-2139)
2.1.25. Network Observability Operator 1.8.0 advisory
You can review the advisory for the Network Observability Operator 1.8.0 release.
2.1.26. Network Observability Operator 1.8.0 new features and enhancements
You can review the new features and enhancements for the Network Observability Operator 1.8.0 release.
2.1.26.1. Packet translation
You can now enrich network flows with translated endpoint information, showing not only the service but also the specific backend pod, so you can see which pod served a request.
For more information, see:
2.1.26.2. OVN-Kubernetes networking events tracking
OVN-Kubernetes networking events tracking is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can now use network event tracking in network observability to gain insight into OVN-Kubernetes events, including network policies, admin network policies, and egress firewalls.
For more information, see:
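The packet translation and networking events features described above are both enabled through the eBPF agent features list in the FlowCollector resource. The following is a minimal sketch, assuming the flows.netobserv.io/v1beta2 API and the PacketTranslation and NetworkEvents feature names; verify them against the FlowCollector API reference for your Operator version:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      # Feature names assumed from this release; check the API
      # reference for the exact values supported by your version
      features:
        - PacketTranslation
        - NetworkEvents
```

The features list is additive, so other values, such as PacketDrop or DNSTracking, can be combined in the same array.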
2.1.26.3. eBPF performance improvements in 1.8
- Network observability now uses hash maps instead of per-CPU maps. Network flow data is now tracked in kernel space, and new packets are aggregated there as well. De-duplication of network flows can also occur in the kernel, which reduces the amount of data transferred between kernel and user space and yields better performance. With these eBPF performance improvements, the eBPF agent can potentially see a CPU resource reduction of between 40% and 57%.
2.1.26.4. Network Observability CLI
The following new features, options, and filters are added to the Network Observability CLI for this release:
- Capture metrics with filters enabled by running the oc netobserv metrics command.
- Run the CLI in the background by using the --background option with flows and packets capture. Run oc netobserv follow to see the progress of the background run and oc netobserv copy to download the generated logs.
- Enrich flows and metrics capture with Machines, Pods, and Services subnets by using the --get-subnets option.
- New filtering options available with packets, flows, and metrics capture:
  - eBPF filters on IPs, ports, protocol, action, TCP flags, and more
  - Custom nodes using --node-selector
  - Drops only using --drops
  - Any field using --regexes
For more information, see:
2.1.27. Network Observability Operator release notes 1.8.0 fixed issues
You can review the fixed issues for the Network Observability Operator 1.8.0 release.
- Previously, the Network Observability Operator came with a "kube-rbac-proxy" container to manage RBAC for its metrics server. Since this external component is deprecated, it was necessary to remove it. It is now replaced with direct TLS and RBAC management through Kubernetes controller-runtime, without the need for a side-car proxy. (NETOBSERV-1999)
- Previously in the OpenShift Container Platform console plugin, filtering on a key that was not equal to multiple values would not filter anything. With this fix, the expected results are returned, which is all flows not having any of the filtered values. (NETOBSERV-1990)
- Previously in the OpenShift Container Platform console plugin with disabled Loki, it was very likely to generate a "Can’t build query" error due to selecting an incompatible set of filters and aggregations. Now this error is avoided by automatically disabling incompatible filters while still making the user aware of the filter incompatibility. (NETOBSERV-1977)
- Previously, when viewing flow details from the console plugin, the ICMP info was always displayed in the side panel, showing "undefined" values for non-ICMP flows. With this fix, ICMP info is not displayed for non-ICMP flows. (NETOBSERV-1969)
- Previously, the "Export data" link from the Traffic flows view did not work as intended, generating empty CSV reports. Now, the export feature is restored, generating non-empty CSV data. (NETOBSERV-1958)
- Previously, it was possible to configure the FlowCollector with processor.logTypes set to Conversations, EndedConversations, or All, with loki.enable set to false, despite the conversation logs being only useful when Loki is enabled. This resulted in wasted resources. Now, this configuration is invalid and is rejected by the validation webhook. (NETOBSERV-1957)
- Configuring the FlowCollector with processor.logTypes set to All consumes many more resources, such as CPU, memory, and network bandwidth, than the other options. This was previously not documented. It is now documented, and triggers a warning from the validation webhook. (NETOBSERV-1956)
- Previously, under high stress, some flows generated by the eBPF agent were mistakenly dismissed, resulting in traffic bandwidth under-estimation. Now, those generated flows are not dismissed. (NETOBSERV-1954)
- Previously, when enabling the network policy in the FlowCollector configuration, the traffic to the Operator webhooks was blocked, breaking the FlowMetrics API validation. Now traffic to the webhooks is allowed. (NETOBSERV-1934)
- Previously, when deploying the default network policy, the openshift-console and openshift-monitoring namespaces were set by default in the additionalNamespaces field, resulting in duplicated rules. Now no additional namespace is set by default, which helps avoid duplicated rules. (NETOBSERV-1933)
- Previously, from the OpenShift Container Platform console plugin, filtering on TCP flags would match only flows having exactly the desired flag. Now, any flow having at least the desired flag appears in filtered flows. (NETOBSERV-1890)
- When the eBPF agent runs in privileged mode and pods are continuously added or deleted, a file descriptor (FD) leak occurs. The fix ensures proper closure of the FD when a network namespace is deleted. (NETOBSERV-2063)
- Previously, the CLI agent did not deploy on master nodes. Now, a toleration is added on the agent DaemonSet so that it schedules on every node when taints are set, and CLI agent DaemonSet pods run on all nodes. (NETOBSERV-2030)
- Previously, the Source Resource and Source Destination filters autocomplete were not working when using Prometheus storage only. Now this issue is fixed and suggestions display as expected. (NETOBSERV-1885)
- Previously, a resource using multiple IPs was displayed separately in the Topology view. Now, the resource shows as a single topology node in the view. (NETOBSERV-1818)
- Previously, the console refreshed the Network traffic table view contents when the mouse pointer hovered over the columns. Now, the display is fixed, so row height remains constant with a mouse hover. (NETOBSERV-2049)
2.1.28. Network Observability Operator release notes 1.8.0 known issues
You can review the known issues for the Network Observability Operator 1.8.0 release.
- If there is traffic that uses overlapping subnets in your cluster, there is a small risk that the eBPF agent mixes up the flows from overlapping IPs. This can happen if different connections have the exact same source and destination IPs, ports, and protocol within a 5 second time frame on the same node. This should not be possible unless you configured secondary networks or UDN, and even in that case it remains very unlikely in usual traffic, as source ports are usually a good differentiator. (NETOBSERV-2115)
- After selecting a type of exporter to configure in the FlowCollector resource spec.exporters section from the OpenShift Container Platform web console form view, the detailed configuration for that type does not show up in the form. The workaround is to configure the YAML directly. (NETOBSERV-1981)
2.1.29. Network Observability Operator 1.7.0 advisory
You can review the advisory for the Network Observability Operator 1.7.0 release.
2.1.30. Network Observability Operator 1.7.0 new features and enhancements
You can review the following new features and enhancements for the Network Observability Operator 1.7.0 release.
2.1.30.1. OpenTelemetry support
You can now export enriched network flows to a compatible OpenTelemetry endpoint, such as the Red Hat build of OpenTelemetry.
For more information, see:
2.1.30.2. Network observability Developer perspective
You can now use network observability in the Developer perspective.
For more information, see:
2.1.30.3. TCP flags filtering
You can now filter network flows by TCP flags by using the tcpFlags filter.
For more information, see:
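As a hedged sketch of TCP flags filtering in the eBPF agent, assuming the flowFilter fields (enable, tcpFlags, action) of the v1beta2 FlowCollector API; check the API reference for the exact field names and accepted flag values:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      flowFilter:
        enable: true
        # Keep only flows that carry the SYN flag (flag value assumed)
        tcpFlags: "SYN"
        action: Accept
```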
2.1.30.4. Network observability for OpenShift Virtualization
You can observe networking patterns on an OpenShift Virtualization setup by identifying eBPF-enriched network flows coming from VMs that are connected to secondary networks, such as through Open Virtual Network (OVN)-Kubernetes.
For more information, see:
2.1.30.5. Network policy deploys in the FlowCollector custom resource (CR)
With this release, you can configure the FlowCollector custom resource (CR) to deploy a network policy for network observability.
For more information, see:
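A minimal sketch of deploying the network policy from the FlowCollector CR, using the spec.networkPolicy fields (enable and additionalNamespaces) referenced elsewhere in these release notes:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  networkPolicy:
    # Install a network policy on the netobserv namespace
    enable: true
    # No extra namespaces by default; list any that must reach netobserv
    additionalNamespaces: []
```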
2.1.30.6. FIPS compliance
You can install and use the Network Observability Operator in an OpenShift Container Platform cluster running in FIPS mode.
Important: To enable FIPS mode for your cluster, you must run the installation program from a Red Hat Enterprise Linux (RHEL) computer configured to operate in FIPS mode. For more information about configuring FIPS mode on RHEL, see Switching RHEL to FIPS mode.
When running Red Hat Enterprise Linux (RHEL) or Red Hat Enterprise Linux CoreOS (RHCOS) booted in FIPS mode, OpenShift Container Platform core components use the RHEL cryptographic libraries that have been submitted to NIST for FIPS 140-2/140-3 Validation on only the x86_64, ppc64le, and s390x architectures.
2.1.30.7. eBPF agent enhancements
The following enhancements are available for the eBPF agent:
- If the DNS service maps to a different port than 53, you can specify this DNS tracking port by using spec.agent.ebpf.advanced.env.DNS_TRACKING_PORT.
- You can now use two ports for transport protocol (TCP, UDP, or SCTP) filtering rules.
- You can now filter on transport ports with a wildcard protocol by leaving the protocol field empty.
For more information, see:
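The eBPF agent enhancements above can be sketched in the FlowCollector CR as follows; the port values are illustrative only, and the flowFilter field names should be checked against the API reference:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      advanced:
        env:
          # DNS tracking when the DNS service maps to a port other
          # than 53 (example value)
          DNS_TRACKING_PORT: "5353"
      flowFilter:
        enable: true
        # Two ports in one filtering rule; leaving the protocol field
        # empty acts as a wildcard across TCP, UDP, and SCTP
        ports: "80,100"
```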
2.1.30.8. Network Observability CLI
The Network Observability CLI (oc netobserv) includes the following enhancements:
- There are now eBPF enrichment filters for packet capture similar to flow capture.
- You can now use the tcp_flags filter with both flow and packet capture.
- The auto-teardown option is available when the max-bytes or max-time limit is reached.
For more information, see:
2.1.31. Network Observability Operator 1.7.0 fixed issues
You can review the following fixed issues for the Network Observability Operator 1.7.0 release.
- Previously, when using a RHEL 9.2 real-time kernel, some of the webhooks did not work. Now, a fix is in place to check whether this RHEL 9.2 real-time kernel is being used. If it is, a warning is displayed about the features that do not work, such as packet drop and Round-trip Time, when using the s390x architecture. The fix is in OpenShift 4.16 and later. (NETOBSERV-1808)
- Previously, in the Manage panels dialog in the Overview tab, filtering on total, bar, donut, or line did not show a result. Now the available panels are correctly filtered. (NETOBSERV-1540)
- Previously, under high stress, the eBPF agents were susceptible to entering a state where they generated a high number of small, almost unaggregated flows. With this fix, the aggregation process is maintained under high stress, resulting in fewer flows being created. This fix improves the resource consumption not only in the eBPF agent but also in flowlogs-pipeline and Loki. (NETOBSERV-1564)
- Previously, when the workload_flows_total metric was enabled instead of the namespace_flows_total metric, the health dashboard stopped showing the By namespace flow charts. With this fix, the health dashboard now shows the flow charts when workload_flows_total is enabled. (NETOBSERV-1746)
- Previously, when you used the FlowMetrics API to generate a custom metric and later modified its labels, such as by adding a new label, the metric stopped populating and an error was shown in the flowlogs-pipeline logs. With this fix, you can modify the labels, and the error is no longer raised in the flowlogs-pipeline logs. (NETOBSERV-1748)
- Previously, there was an inconsistency with the default Loki WriteBatchSize configuration: it was set to 100 KB in the FlowCollector CRD default, and 10 MB in the OLM sample or default configuration. Both are now aligned to 10 MB, which generally provides better performance and a smaller resource footprint. (NETOBSERV-1766)
- Previously, the eBPF flow filter on ports was ignored if you did not specify a protocol. With this fix, you can set eBPF flow filters independently on ports and protocols. (NETOBSERV-1779)
- Previously, traffic from Pods to Services was hidden from the Topology view. Only the return traffic from Services to Pods was visible. With this fix, that traffic is correctly displayed. (NETOBSERV-1788)
- Previously, non-cluster administrator users that had access to Network Observability saw an error in the console plugin when they tried to filter for something that triggered auto-completion, such as a namespace. With this fix, no error is displayed, and the auto-completion returns the expected results. (NETOBSERV-1798)
- When the secondary interface support was added, you had to iterate multiple times to register the per network namespace with the netlink to learn about interface notifications. At the same time, unsuccessful handlers caused a leaking file descriptor because with TCX hook, unlike TC, handlers needed to be explicitly removed when the interface went down. Furthermore, when the network namespace was deleted, there was no Go close channel event to terminate the netlink goroutine socket, which caused go threads to leak. Now, there are no longer leaking file descriptors or go threads when you create or delete pods. (NETOBSERV-1805)
- Previously, the ICMP type and value were displaying 'n/a' in the Traffic flows table even when related data was available in the flow JSON. With this fix, ICMP columns display related values as expected in the flow table. (NETOBSERV-1806)
- Previously in the console plugin, it wasn’t always possible to filter for unset fields, such as unset DNS latency. With this fix, filtering on unset fields is now possible. (NETOBSERV-1816)
- Previously, when you cleared filters in the OpenShift web console plugin, sometimes the filters reappeared after you navigated to another page and returned to the page with filters. With this fix, filters do not unexpectedly reappear after they are cleared. (NETOBSERV-1733)
2.1.32. Network Observability Operator 1.7.0 known issues
You can review the following known issues for the Network Observability Operator 1.7.0 release.
- When you use the must-gather tool with network observability, logs are not collected when the cluster has FIPS enabled. (NETOBSERV-1830)
- When spec.networkPolicy is enabled in the FlowCollector, which installs a network policy on the netobserv namespace, it is impossible to use the FlowMetrics API. The network policy blocks calls to the validation webhook. As a workaround, use the following network policy:

  kind: NetworkPolicy
  apiVersion: networking.k8s.io/v1
  metadata:
    name: allow-from-hostnetwork
    namespace: netobserv
  spec:
    podSelector:
      matchLabels:
        app: netobserv-operator
    ingress:
      - from:
          - namespaceSelector:
              matchLabels:
                policy-group.network.openshift.io/host-network: ''
    policyTypes:
      - Ingress
2.1.33. Network Observability Operator release notes 1.6.2 advisory
You can review the advisory for the Network Observability Operator 1.6.2 release.
2.1.34. Network Observability Operator release notes 1.6.2 CVEs
You can review the CVEs for the Network Observability Operator 1.6.2 release.
2.1.35. Network Observability Operator release notes 1.6.2 fixed issues
You can review the fixed issues for the Network Observability Operator 1.6.2 release.
- When the secondary interface support was added, there was a need to iterate multiple times to register the per network namespace with the netlink to learn about interface notifications. At the same time, unsuccessful handlers caused a leaking file descriptor because with TCX hook, unlike TC, handlers needed to be explicitly removed when the interface went down. Now, there are no longer leaking file descriptors when creating and deleting pods. (NETOBSERV-1805)
2.1.36. Network Observability Operator release notes 1.6.2 known issues
You can review the known issues for the Network Observability Operator 1.6.2 release.
- There was a compatibility issue with console plugins that would have prevented network observability from being installed on future versions of an OpenShift Container Platform cluster. By upgrading to 1.6.2, the compatibility issue is resolved and network observability can be installed as expected. (NETOBSERV-1737)
2.1.37. Network Observability Operator release notes 1.6.1 advisory
You can review the advisory for the Network Observability Operator 1.6.1 release.
2.1.38. Network Observability Operator release notes 1.6.1 CVEs
You can review the CVEs for the Network Observability Operator 1.6.1 release.
2.1.39. Network Observability Operator release notes 1.6.1 fixed issues
You can review the fixed issues for the Network Observability Operator 1.6.1 release.
- Previously, information about packet drops, such as the cause and TCP state, was only available in the Loki datastore and not in Prometheus. For that reason, the drop statistics in the OpenShift web console plugin Overview was only available with Loki. With this fix, information about packet drops is also added to metrics, so you can view drops statistics when Loki is disabled. (NETOBSERV-1649)
- When the eBPF agent PacketDrop feature was enabled, and sampling was configured to a value greater than 1, reported dropped bytes and dropped packets ignored the sampling configuration. While this was done on purpose, so as not to miss any drops, a side effect was that the reported proportion of drops compared with non-drops became biased. For example, at a very high sampling rate, such as 1:1000, it was likely that almost all the traffic appeared to be dropped when observed from the console plugin. With this fix, the sampling configuration is honored with dropped bytes and packets. (NETOBSERV-1676)
- Previously, the SR-IOV secondary interface was not detected if the interface was created first and then the eBPF agent was deployed. It was only detected if the agent was deployed first and then the SR-IOV interface was created. With this fix, the SR-IOV secondary interface is detected no matter the sequence of the deployments. (NETOBSERV-1697)
- Previously, when Loki was disabled, the Topology view in the OpenShift web console displayed the Cluster and Zone aggregation options in the slider beside the network topology diagram, even when the related features were not enabled. With this fix, the slider now only displays options according to the enabled features. (NETOBSERV-1705)
- Previously, when Loki was disabled and the OpenShift web console was first loading, an error would occur: Request failed with status code 400 Loki is disabled. With this fix, the errors no longer occur. (NETOBSERV-1706)
- Previously, in the Topology view of the OpenShift web console, when clicking the Step into icon next to any graph node, the filters were not applied as required to set the focus to the selected graph node, resulting in a wide view of the Topology. With this fix, the filters are correctly set, effectively narrowing down the Topology. As part of this change, clicking the Step into icon on a Node now brings you to the Resource scope instead of the Namespaces scope. (NETOBSERV-1720)
- Previously, when Loki was disabled, in the Topology view of the OpenShift web console with the Scope set to Owner, clicking on the Step into icon next to any graph node would bring the Scope to Resource, which is not available without Loki, so an error message was shown. With this fix, the Step into icon is hidden in the Owner scope when Loki is disabled, so this scenario no longer occurs. (NETOBSERV-1721)
- Previously, when Loki was disabled, an error was displayed in the Topology view of the OpenShift web console when a group was set, but then the scope was changed so that the group becomes invalid. With this fix, the invalid group is removed, preventing the error. (NETOBSERV-1722)
- When creating a FlowCollector resource from the OpenShift web console Form view, as opposed to the YAML view, the following settings were incorrectly managed by the web console: agent.ebpf.metrics.enable and processor.subnetLabels.openShiftAutoDetect. These settings can only be disabled in the YAML view, not in the Form view. To avoid any confusion, these settings have been removed from the Form view. They are still accessible in the YAML view. (NETOBSERV-1731)
- Previously, the eBPF agent was unable to clean up traffic control flows installed before an ungraceful crash, for example a crash due to a SIGTERM signal. This led to the creation of multiple traffic control flow filters with the same name, since the older ones were not removed. With this fix, all previously installed traffic control flows are cleaned up when the agent starts, before installing new ones. (NETOBSERV-1732)
- Previously, when configuring custom subnet labels and keeping the OpenShift subnets auto-detection enabled, OpenShift subnets would take precedence over the custom ones, preventing the definition of custom labels for in-cluster subnets. With this fix, custom-defined subnets take precedence, allowing the definition of custom labels for in-cluster subnets. (NETOBSERV-1734)
2.1.40. Network Observability Operator release notes 1.6.0 advisory
You can review the advisory for the Network Observability Operator 1.6.0 release.
2.1.41. Network Observability Operator 1.6.0 new features and enhancements
You can review the following new features and enhancements for the Network Observability Operator 1.6.0.
2.1.41.1. Enhanced use of Network Observability Operator without Loki
You can now use Prometheus metrics and rely less on Loki for storage when using the Network Observability Operator.
For more information, see:
2.1.41.2. Custom metrics API
You can create custom metrics out of flowlogs data by using the FlowMetrics API. Flows can also carry subnet labels, exposed in the SrcSubnetLabel and DstSubnetLabel fields.
For more information, see:
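For illustration, a hypothetical FlowMetric resource that counts ingress bytes labeled by destination namespace; the field names follow the v1alpha1 API but should be checked against the FlowMetrics reference for your version:

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: cluster-ingress-bytes
  namespace: netobserv
spec:
  # Exposed to Prometheus with a netobserv_ prefix (assumed)
  metricName: cluster_ingress_bytes
  type: Counter
  # Sum the Bytes field of each ingress flow
  valueField: Bytes
  direction: Ingress
  labels: [DstK8S_Namespace]
```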
2.1.41.3. eBPF performance enhancements
Experience improved performance of the eBPF agent, in terms of CPU and memory, with the following updates:
- The eBPF agent now uses TCX hooks instead of TC hooks.
- The NetObserv / Health dashboard has a new section that shows eBPF agent metrics.
- Based on the new eBPF metrics, an alert notifies you when the eBPF agent is dropping flows.
- Loki storage demand decreases significantly now that duplicated flows are removed. Instead of having multiple, individual duplicated flows per network interface, there is one de-duplicated flow with a list of related network interfaces.
With the duplicated flows update, the Interface and Interface Direction fields in the Network Traffic table are renamed to Interfaces and Interface Directions, so any bookmarked Quick filter queries using these fields need to be updated to interfaces and ifdirections.
For more information, see:
2.1.41.4. eBPF collection rule-based filtering
You can use rule-based filtering to reduce the volume of created flows. When this option is enabled, the Netobserv / Health dashboard for eBPF agent statistics has the Filtered flows rate view.
For more information, see:
2.1.42. Network Observability Operator 1.6.0 fixed issues
You can review the following fixed issues for the Network Observability Operator 1.6.0.
- Previously, a dead link to the OpenShift Container Platform documentation was displayed in the Operator Lifecycle Manager (OLM) form for the FlowMetrics API creation. Now the link has been updated to point to a valid page. (NETOBSERV-1607)
- Previously, the Network Observability Operator description in the Operator Hub displayed a broken link to the documentation. With this fix, this link is restored. (NETOBSERV-1544)
- Previously, if Loki was disabled and the Loki Mode was set to LokiStack, or if Loki manual TLS configuration was configured, the Network Observability Operator still tried to read the Loki CA certificates. With this fix, when Loki is disabled, the Loki certificates are not read, even if there are settings in the Loki configuration. (NETOBSERV-1647)
- Previously, the oc must-gather plugin for the Network Observability Operator was only working on the amd64 architecture and failing on all others because the plugin was using the amd64 binary. Now, the Network Observability Operator oc must-gather plugin collects logs on any architecture platform.
- Previously, when filtering on IP addresses using not equal to, the Network Observability Operator would return a request error. Now, the IP filtering works in both equal and not equal to cases for IP addresses and ranges. (NETOBSERV-1630)
- Previously, when a user was not an admin, the error messages were not consistent with the selected tab of the Network Traffic view in the web console. Now, the user not admin error displays on any tab with improved display. (NETOBSERV-1621)
2.1.43. Network Observability Operator 1.6.0 known issues
You can review the following known issues for the Network Observability Operator 1.6.0.
- When the eBPF agent PacketDrop feature is enabled, and sampling is configured to a value greater than 1, reported dropped bytes and dropped packets ignore the sampling configuration. While this is done on purpose to not miss any drops, a side effect is that the reported proportion of drops compared to non-drops becomes biased. For example, at a very high sampling rate, such as 1:1000, it is likely that almost all the traffic appears to be dropped when observed from the console plugin. (NETOBSERV-1676)
- In the Manage panels window in the Overview tab, filtering on total, bar, donut, or line does not show any result. (NETOBSERV-1540)
- The SR-IOV secondary interface is not detected if the interface was created first and then the eBPF agent was deployed. It is only detected if the agent was deployed first and then the SR-IOV interface is created. (NETOBSERV-1697)
- When Loki is disabled, the Topology view in the OpenShift web console always shows the Cluster and Zone aggregation options in the slider beside the network topology diagram, even when the related features are not enabled. There is no specific workaround, besides ignoring these slider options. (NETOBSERV-1705)
- When Loki is disabled, and the OpenShift web console first loads, it might display an error: Request failed with status code 400 Loki is disabled. As a workaround, you can continue switching content on the Network Traffic page, such as clicking between the Topology and the Overview tabs. The error should disappear. (NETOBSERV-1706)
2.1.44. Network Observability Operator 1.5.0 advisory
You can view the following advisory for the Network Observability Operator 1.5 release.
2.1.45. Network Observability Operator 1.5.0 new features and enhancements
You can view the following new features and enhancements for the Network Observability Operator 1.5 release.
2.1.45.1. DNS tracking enhancements
In 1.5, the TCP protocol is now supported in addition to UDP. New dashboards are also added to the Overview view of the Network Traffic page.
For more information, see:
2.1.45.2. Round-trip time (RTT)
You can use TCP handshake Round-Trip Time (RTT) captured from the fentry/tcp_rcv_established eBPF hookpoint to analyze network flow latency.
For more information, see:
2.1.45.3. Metrics, dashboards, and alerts enhancements
The network observability metrics dashboards in Observe → Dashboards → NetObserv have new metrics types you can use to create Prometheus alerts. You can now define available metrics in the includeList specification. In the previous release, these metrics were defined in the ignoreTags specification.
For a complete list of these metrics, see:
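The includeList specification sits under the processor metrics section of the FlowCollector resource. The following sketch uses metric names from the documented metric list; verify the names available in your release:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      includeList:   # replaces the older ignoreTags mechanism
        - node_ingress_bytes_total
        - namespace_flows_total
        - workload_egress_packets_total
```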
2.1.45.4. Improvements for network observability without Loki
You can create Prometheus alerts for the Netobserv dashboard using DNS, Packet drop, and RTT metrics, even if you don’t use Loki. In the previous version of network observability, 1.4, these metrics were only available for querying and analysis in the Network Traffic, Overview, and Topology views, which are not available without Loki.
For more information, see:
2.1.45.5. Availability zones
You can configure the FlowCollector resource to collect information about the cluster availability zones. This configuration enriches the network flow data with the topology.kubernetes.io/zone label value applied to the nodes.
For more information, see:
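The zones enrichment is toggled with a single boolean in the FlowCollector processor section. A minimal sketch, assuming the flows.netobserv.io/v1beta2 layout:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    addZone: true   # enrich flows with the node's topology.kubernetes.io/zone value
```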
2.1.45.6. Notable enhancements
The 1.5 release of the Network Observability Operator adds improvements and new capabilities to the OpenShift Container Platform web console plugin and the Operator configuration.
2.1.45.7. Performance enhancements
The spec.agent.ebpf.kafkaBatchSize default is changed from 1MB to 10MB to enhance eBPF performance when using Kafka.
Important: When upgrading from an existing installation, this new value is not set automatically in the configuration. If you monitor a performance regression with the eBPF Agent memory consumption after upgrading, you might consider reducing the kafkaBatchSize to the new value.
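The kafkaBatchSize setting lives under spec.agent.ebpf in the FlowCollector resource. A minimal sketch; the numeric value assumes the field takes bytes and the deploymentModel value name may differ by API version, so verify both against your installed CRD:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka
  agent:
    ebpf:
      kafkaBatchSize: 1048576   # bytes; 1MB here, versus the 10MB default for new installs
```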
2.1.45.8. Web console enhancements:
- There are new panels added to the Overview view for DNS and RTT: Min, Max, P90, P99.
There are new panel display options added:
- Focus on one panel while keeping others viewable but with smaller focus.
- Switch graph type.
- Show Top and Overall.
- A collection latency warning is shown in the Custom time range window.
- There is enhanced visibility for the contents of the Manage panels and Manage columns pop-up windows.
- The Differentiated Services Code Point (DSCP) field for egress QoS is available for filtering QoS DSCP in the web console Network Traffic page.
2.1.45.9. Configuration enhancements:
- The LokiStack mode in the spec.loki.mode specification simplifies installation by automatically setting URLs, TLS, cluster roles and a cluster role binding, as well as the authToken value. The Manual mode allows more control over configuration of these settings.
- The API version changes from flows.netobserv.io/v1beta1 to flows.netobserv.io/v1beta2.
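The LokiStack mode is selected under spec.loki in the FlowCollector resource. A hedged sketch; the lokiStack target name is illustrative, so check your CRD and LokiStack deployment:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  loki:
    mode: LokiStack    # auto-configures URLs, TLS, cluster roles, and the role binding
    lokiStack:
      name: loki       # name of the LokiStack resource, assumed here to be "loki"
```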
2.1.46. Network Observability Operator 1.5.0 fixed issues
You can view the following fixed issues for the Network Observability Operator 1.5 release.
- Previously, it was not possible to register the console plugin manually in the web console interface if the automatic registration of the console plugin was disabled. If the spec.console.register value was set to false in the FlowCollector resource, the Operator would override and erase the plugin registration. With this fix, setting the spec.console.register value to false does not impact the console plugin registration or registration removal. As a result, the plugin can be safely registered manually. (NETOBSERV-1134)
- Previously, using the default metrics settings, the NetObserv/Health dashboard was showing an empty graph named Flows Overhead. This metric was only available by removing "namespaces-flows" and "namespaces" from the ignoreTags list. With this fix, this metric is visible when you use the default metrics setting. (NETOBSERV-1351)
- Previously, the node on which the eBPF Agent was running would not resolve with a specific cluster configuration. This resulted in cascading consequences that culminated in a failure to provide some of the traffic metrics. With this fix, the eBPF agent’s node IP is safely provided by the Operator, inferred from the pod status. Now, the missing metrics are restored. (NETOBSERV-1430)
- Previously, the 'Input size too long' error for the Loki Operator did not include additional information to troubleshoot the problem. With this fix, help is directly displayed in the web console next to the error with a direct link for more guidance. (NETOBSERV-1464)
- Previously, the console plugin read timeout was forced to 30s. With the FlowCollector v1beta2 API update, you can configure the spec.loki.readTimeout specification to update this value according to the Loki Operator queryTimeout limit. (NETOBSERV-1443)
Previously, the Operator bundle did not display some of the supported features by CSV annotations as expected, such as With this fix, these annotations are set in the CSV as expected. (NETOBSERV-1305)
features.operators.openshift.io/… -
Previously, the status sometimes oscillated between
FlowCollectorandDeploymentInProgressstates during reconciliation. With this fix, the status only becomesReadywhen all of the underlying components are fully ready. (NETOBSERV-1293)Ready
2.1.47. Network Observability Operator 1.5.0 known issues
You can view the following known issues for the Network Observability Operator 1.5 release.
- When trying to access the web console, cache issues on OCP 4.14.10 prevent access to the Observe view. The web console shows the error message: Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/. The recommended workaround is to update the cluster to the latest minor version. If this does not work, you need to apply the workarounds described in this Red Hat Knowledgebase article. (NETOBSERV-1493)
Since the 1.3.0 release of the Network Observability Operator, installing the Operator causes a warning kernel taint to appear. The reason for this error is that the network observability eBPF agent has memory constraints that prevent preallocating the entire hashmap table. The Operator eBPF agent sets the flag so that pre-allocation is disabled when the hashmap is too memory expansive.
BPF_F_NO_PREALLOC
2.1.48. Network Observability Operator 1.4.2 advisory
The following advisory is available for the Network Observability Operator 1.4.2:
2.1.49. Network Observability Operator 1.4.2 CVEs
You can review the following CVEs in the Network Observability Operator 1.4.2 release.
2.1.50. Network Observability Operator 1.4.1 advisory
You can review the following advisory for the Network Observability Operator 1.4.1.
2.1.51. Network Observability Operator release 1.4.1 CVEs
You can review the following CVEs in the Network Observability Operator 1.4.1 release.
2.1.52. Network Observability Operator release notes 1.4.1 fixed issues
You can review the following fixed issues in the Network Observability Operator 1.4.1 release.
- In 1.4, there was a known issue when sending network flow data to Kafka. The Kafka message key was ignored, causing an error with connection tracking. Now the key is used for partitioning, so each flow from the same connection is sent to the same processor. (NETOBSERV-926)
- In 1.4, the Inner flow direction was introduced to account for flows between pods running on the same node. Flows with the Inner direction were not taken into account in the generated Prometheus metrics derived from flows, resulting in under-evaluated bytes and packets rates. Now, derived metrics include flows with the Inner direction, providing correct bytes and packets rates. (NETOBSERV-1344)
2.1.53. Network observability release notes 1.4.0 advisory
You can review the following advisory for the Network Observability Operator 1.4.0 release.
2.1.54. Network observability release notes 1.4.0 new features and enhancements
You can review the following new features and enhancements in the Network Observability Operator 1.4.0 release.
2.1.54.1. Notable enhancements
The 1.4 release of the Network Observability Operator adds improvements and new capabilities to the OpenShift Container Platform web console plugin and the Operator configuration.
2.1.54.2. Web console enhancements:
- In the Query Options, the Duplicate flows checkbox is added to choose whether or not to show duplicated flows.
- You can now filter source and destination traffic with One-way, Back-and-forth, and Swap filters.
The network observability metrics dashboards in Observe → Dashboards → NetObserv and NetObserv / Health are modified as follows:
- The NetObserv dashboard shows top bytes, packets sent, packets received per nodes, namespaces, and workloads. Flow graphs are removed from this dashboard.
- The NetObserv / Health dashboard shows flows overhead as well as top flow rates per nodes, namespaces, and workloads.
- Infrastructure and Application metrics are shown in a split-view for namespaces and workloads.
For more information, see:
2.1.54.3. Configuration enhancements:
- You now have the option to specify different namespaces for any configured ConfigMap or Secret reference, such as in certificates configuration.
- The spec.processor.clusterName parameter is added so that the name of the cluster appears in the flows data. This is useful in a multi-cluster context. When using OpenShift Container Platform, leave the parameter empty so that it is determined automatically.
For more information, see:
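The cluster name setting can be sketched as a FlowCollector fragment; the name shown is illustrative:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    clusterName: my-cluster   # illustrative; leave empty on OpenShift Container Platform to auto-detect
```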
2.1.54.4. Network observability without Loki
The Network Observability Operator is now functional and usable without Loki. If Loki is not installed, it can only export flows to Kafka or IPFIX and provide metrics in the network observability metrics dashboards.
For more information, see:
2.1.54.5. DNS tracking
In 1.4, the Network Observability Operator makes use of eBPF tracepoint hooks to enable DNS tracking. You can monitor your network, conduct security analysis, and troubleshoot DNS issues in the Network Traffic and Overview pages in the web console.
For more information, see:
2.1.54.6. SR-IOV support
You can now collect traffic from a cluster with a Single Root I/O Virtualization (SR-IOV) device.
For more information, see:
2.1.54.7. IPFIX exporter support
You can now export eBPF-enriched network flows to an IPFIX collector.
For more information, see:
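An IPFIX export target is declared under spec.exporters in the FlowCollector resource. A hedged sketch; the host is a placeholder and the field layout should be verified against your CRD:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  exporters:
    - type: IPFIX
      ipfix:
        targetHost: ipfix-collector.example.com   # placeholder collector address
        targetPort: 4739                          # IANA-registered IPFIX port
        transport: TCP                            # or UDP
```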
2.1.54.8. Packet drops
In the 1.4 release of the Network Observability Operator, eBPF tracepoint hooks are used to enable packet drop tracking. You can now detect and analyze the cause for packet drops and make decisions to optimize network performance. In OpenShift Container Platform 4.14 and later, both host drops and OVS drops are detected. In OpenShift Container Platform 4.13, only host drops are detected.
For more information, see:
2.1.54.9. s390x architecture support
Network Observability Operator can now run on s390x architecture. Previously, it ran on amd64, ppc64le, or arm64.
2.1.55. Network observability release notes 1.4.0 removed features
You can review the following removed features from the Network Observability Operator 1.4.0 release.
2.1.55.1. Channel removal
You must switch your channel from v1.0.x to stable to receive the latest Operator updates. The v1.0.x channel is now removed.
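Switching the update channel is done on the Operator's Subscription resource. A minimal sketch; the package name is an assumption, so verify it with oc get packagemanifests:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator
  namespace: openshift-netobserv-operator
spec:
  channel: stable                  # switch away from the removed v1.0.x channel
  name: netobserv-operator         # assumed package name
  source: redhat-operators
  sourceNamespace: openshift-marketplace
```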
2.1.56. Network observability release notes 1.4.0 fixed issues
You can review the following fixed issues in the Network Observability Operator 1.4.0 release.
- Previously, the Prometheus metrics exported by network observability were computed out of potentially duplicated network flows. In the related dashboards, from Observe → Dashboards, this could result in potentially doubled rates. Note that dashboards from the Network Traffic view were not affected. Now, network flows are filtered to eliminate duplicates before metrics calculation, which results in correct traffic rates displayed in the dashboards. (NETOBSERV-1131)
- Previously, the Network Observability Operator agents were not able to capture traffic on network interfaces when configured with Multus or SR-IOV, non-default network namespaces. Now, all available network namespaces are recognized and used for capturing flows, allowing capturing traffic for SR-IOV. There are configurations needed for the FlowCollector and SRIOVnetwork custom resources to collect traffic. (NETOBSERV-1283)
- Previously, in the Network Observability Operator details from Operators → Installed Operators, the FlowCollector Status field might have reported incorrect information about the state of the deployment. The status field now shows the proper conditions with improved messages. The history of events is kept, ordered by event date. (NETOBSERV-1224)
- Previously, during spikes of network traffic load, certain eBPF pods were OOM-killed and went into a CrashLoopBackOff state. Now, the eBPF agent memory footprint is improved, so pods are not OOM-killed and do not enter a CrashLoopBackOff state. (NETOBSERV-975)
Previously when was set to
processor.metrics.tlsthePROVIDEDoption value was forced to beinsecureSkipVerify. Now you can settruetoinsecureSkipVerifyortrue, and provide a CA certificate if needed. (NETOBSERV-1087)false
2.1.57. Network observability release notes 1.4.0 known issues
You can review the following known issues in the Network Observability Operator 1.4.0 release.
- Since the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate change periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate change. This issue has only been observed in large-scale environments of 120 nodes or greater. (NETOBSERV-980)
Currently, when includes DNSTracking, larger DNS packets require the
spec.agent.ebpf.featuresagent to look for DNS header outside of the 1st socket buffer (SKB) segment. A neweBPFagent helper function needs to be implemented to support it. Currently, there is no workaround for this issue. (NETOBSERV-1304)eBPF -
Currently, when includes DNSTracking, DNS over TCP packets requires the
spec.agent.ebpf.featuresagent to look for DNS header outside of the 1st SKB segment. A neweBPFagent helper function needs to be implemented to support it. Currently, there is no workaround for this issue. (NETOBSERV-1245)eBPF -
Currently, when using a deployment model, if conversation tracking is configured, conversation events might be duplicated across Kafka consumers, resulting in inconsistent tracking of conversations, and incorrect volumetric data. For that reason, it is not recommended to configure conversation tracking when
KAFKAis set todeploymentModel. (NETOBSERV-926)KAFKA -
Currently, when the is configured to use a
processor.metrics.server.tls.typecertificate, the operator enters an unsteady state that might affect its performance and resource consumption. It is recommended to not use aPROVIDEDcertificate until this issue is resolved, and instead using an auto-generated certificate, settingPROVIDEDtoprocessor.metrics.server.tls.type. (NETOBSERV-1293AUTO -
Since the 1.3.0 release of the Network Observability Operator, installing the Operator causes a warning kernel taint to appear. The reason for this error is that the network observability eBPF agent has memory constraints that prevent preallocating the entire hashmap table. The Operator eBPF agent sets the flag so that pre-allocation is disabled when the hashmap is too memory expansive.
BPF_F_NO_PREALLOC
2.1.58. Network Observability Operator 1.3.0 advisory
You can review the following advisory in the Network Observability Operator 1.3.0 release.
2.1.59. Network Observability Operator 1.3.0 new features and enhancements
You can review the following new features and enhancements in the Network Observability Operator 1.3.0 release.
2.1.59.1. Multi-tenancy in network observability
- System administrators can allow and restrict individual user access, or group access, to the flows stored in Loki. For more information, see "Multi-tenancy in network observability".
2.1.59.2. Flow-based metrics dashboard
- This release adds a new dashboard, which provides an overview of the network flows in your OpenShift Container Platform cluster. For more information, see "Network observability metrics dashboards".
2.1.59.3. Troubleshooting with the must-gather tool
- Information about the Network Observability Operator can now be included in the must-gather data for troubleshooting. For more information, see "Network observability must-gather".
2.1.59.4. Multiple architectures now supported
- Network Observability Operator can now run on amd64, ppc64le, or arm64 architectures. Previously, it only ran on amd64.
2.1.60. Network Observability Operator 1.3.0 deprecated features
You can review the following deprecated features in the Network Observability Operator 1.3.0 release.
2.1.60.1. Channel deprecation
You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in a following release.
2.1.60.2. Deprecated configuration parameter setting
The release of Network Observability Operator 1.3 deprecates the spec.Loki.authToken HOST setting. When using the Loki Operator, you must now only use the FORWARD setting.
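With the HOST setting deprecated, Loki Operator users configure token forwarding in the FlowCollector resource. A sketch against the v1beta1 API of that release; verify the field against your installed CRD:

```yaml
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  loki:
    authToken: FORWARD   # forwards the user token to Loki; HOST is deprecated in 1.3
```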
2.1.61. Network Observability Operator 1.3.0 fixed issues
You can review the following fixed issues in the Network Observability Operator 1.3.0 release.
- Previously, when the Operator was installed from the CLI, the Role and RoleBinding that are necessary for the Cluster Monitoring Operator to read the metrics were not installed as expected. The issue did not occur when the operator was installed from the web console. Now, either way of installing the Operator installs the required Role and RoleBinding. (NETOBSERV-1003)
Since version 1.2, the Network Observability Operator can raise alerts when a problem occurs with the flows collection. Previously, due to a bug, the related configuration to disable alerts, was not working as expected and sometimes ineffectual. Now, this configuration is fixed so that it is possible to disable the alerts. (NETOBSERV-976)
spec.processor.metrics.disableAlerts -
Previously, when network observability was configured with set to
spec.loki.authToken, only aDISABLEDcluster administrator was able to view network flows. Other types of cluster administrators received authorization failure. Now, any cluster administrator is able to view network flows. (NETOBSERV-972)kubeadmin -
Previously, a bug prevented users from setting to
spec.consolePlugin.portNaming.enable. Now, this setting can be set tofalseto disable port-to-service name translation. (NETOBSERV-971)false - Previously, the metrics exposed by the console plugin were not collected by the Cluster Monitoring Operator (Prometheus), due to an incorrect configuration. Now the configuration has been fixed so that the console plugin metrics are correctly collected and accessible from the OpenShift Container Platform web console. (NETOBSERV-765)
- Previously, when processor.metrics.tls was set to AUTO in the FlowCollector, the flowlogs-pipeline servicemonitor did not adapt the appropriate TLS scheme, and metrics were not visible in the web console. Now the issue is fixed for AUTO mode. (NETOBSERV-1070)
Previously, certificate configuration, such as used for Kafka and Loki, did not allow specifying a namespace field, implying that the certificates had to be in the same namespace where network observability is deployed. Moreover, when using Kafka with TLS/mTLS, the user had to manually copy the certificate(s) to the privileged namespace where the agent pods are deployed and manually manage certificate updates, such as in the case of certificate rotation. Now, network observability setup is simplified by adding a namespace field for certificates in the
eBPFresource. As a result, users can now install Loki or Kafka in different namespaces without needing to manually copy their certificates in the network observability namespace. The original certificates are watched so that the copies are automatically updated when needed. (NETOBSERV-773)FlowCollector - Previously, the SCTP, ICMPv4 and ICMPv6 protocols were not covered by the network observability agents, resulting in a less comprehensive network flows coverage. These protocols are now recognized to improve the flows coverage. (NETOBSERV-934)
2.1.62. Network Observability Operator 1.3.0 known issues
You can review the following issues and their workarounds, if available, to troubleshoot issues with the Network Observability Operator 1.3.0 release.
- When processor.metrics.tls is set to PROVIDED in the FlowCollector, the flowlogs-pipeline servicemonitor is not adapted to the TLS scheme. (NETOBSERV-1087)
- Since the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate change periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate change. This issue has only been observed in large-scale environments of 120 nodes or greater. (NETOBSERV-980)
When you install the Operator, a warning kernel taint can appear. The reason for this error is that the network observability eBPF agent has memory constraints that prevent preallocating the entire hashmap table. The Operator eBPF agent sets the flag so that pre-allocation is disabled when the hashmap is too memory expansive.
BPF_F_NO_PREALLOC
2.1.63. Network observability release notes 1.2.0 preparing for the next update
Switch the Network Observability Operator’s update channel from the deprecated v1.0.x channel to stable to receive future updates.
The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. Until the 1.2 release of the Network Observability Operator, the only channel available was v1.0.x. The 1.2 release introduces the stable update channel. You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in a following release.
2.1.64. Network Observability Operator 1.2.0 advisory
You can view the following advisory for the Network Observability Operator 1.2.0 release.
2.1.65. Network Observability Operator 1.2.0 new features and enhancements
You can view the following new features and enhancements for the Network Observability Operator 1.2.0 release.
2.1.65.1. Histogram in Traffic Flows view
You can now choose to show a histogram of flows over time. The histogram enables you to visualize the history of flows without hitting the Loki query limit. For more information, see "Using the histogram".
2.1.65.2. Conversation tracking
You can now query flows by Log Type, which enables grouping network flows that are part of the same conversation. For more information, see "Working with conversations".
2.1.65.3. Network observability health alerts
The Network Observability Operator now creates automatic alerts if the flowlogs-pipeline is dropping flows.
2.1.66. Network Observability Operator 1.2.0 bug fixes
You can view the following fixed issues for the Network Observability Operator 1.2.0 release.
- Previously, after changing the namespace value in the FlowCollector spec, eBPF agent pods running in the previous namespace were not appropriately deleted. Now, the pods running in the previous namespace are appropriately deleted. (NETOBSERV-774)
- Previously, after changing the caCert.name value in the FlowCollector spec (such as in the Loki section), FlowLogs-Pipeline pods and Console plug-in pods were not restarted, therefore they were unaware of the configuration change. Now, the pods are restarted, so they get the configuration change. (NETOBSERV-772)
caCert.name - Previously, network flows between pods running on different nodes were sometimes not correctly identified as being duplicates because they are captured by different network interfaces. This resulted in over-estimated metrics displayed in the console plug-in. Now, flows are correctly identified as duplicates, and the console plug-in displays accurate metrics. (NETOBSERV-755)
- The "reporter" option in the console plug-in is used to filter flows based on the observation point of either source node or destination node. Previously, this option mixed the flows regardless of the node observation point. This was due to network flows being incorrectly reported as Ingress or Egress at the node level. Now, the network flow direction reporting is correct. The "reporter" option filters for source observation point, or destination observation point, as expected. (NETOBSERV-696)
- Previously, for agents configured to send flows directly to the processor as gRPC+protobuf requests, the submitted payload could be too large and was rejected by the processor's gRPC server. This occurred under very-high-load scenarios and with only some configurations of the agent. The agent logged an error message, such as: grpc: received message larger than max. As a consequence, there was information loss about those flows. Now, the gRPC payload is split into several messages when the size exceeds a threshold. As a result, the server maintains connectivity. (NETOBSERV-617)
2.1.67. Network Observability Operator 1.2.0 known issues
You can review the following issues and their workarounds, if available, to troubleshoot issues with the Network Observability Operator 1.2.0 release.
- In the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate transition periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate transition. (NETOBSERV-980)
2.1.68. Network Observability Operator 1.2.0 notable technical changes
The Network Observability Operator 1.2.0 release requires installation in the openshift-netobserv-operator namespace.
Previously, you could install the Network Observability Operator using a custom namespace. This release introduces the conversion webhook, which changes the ClusterServiceVersion. Because of this change, namespaces that are shared with other Operators, such as openshift-operators, can no longer be used.
Now, the Operator must be installed in the openshift-netobserv-operator namespace.
You cannot automatically upgrade to the new Operator version if you previously installed the Network Observability Operator using a custom namespace. If you previously installed the Operator using a custom namespace, you must delete the instance of the Operator that was installed and re-install your operator in the openshift-netobserv-operator namespace. Components remain deployed in the netobserv namespace, and the FlowCollector resource is unchanged with this process.
2.1.69. Network Observability Operator 1.1.0 enhancements
You can view the following advisory for the Network Observability Operator 1.1.0:
The Network Observability Operator is now stable and the release channel is upgraded to v1.1.0.
2.1.70. Network Observability Operator 1.1.0 fixed issues
You can view the following fixed issues for the Network Observability Operator 1.1.0 release.
- Previously, unless the Loki authToken configuration was set to FORWARD mode, authentication was not enforced, allowing unauthorized users to retrieve flows. Now, regardless of the Loki authToken mode, only cluster administrators can retrieve flows. (BZ#2169468)
Chapter 3. About network observability
Use the Network Observability Operator to observe network traffic via eBPF agents that generate network flows, which are then enriched and stored.
You can view and analyze this stored information in the OpenShift Container Platform console for further insight and troubleshooting.
3.1. Network Observability Operator
The Network Observability Operator provides the cluster-scoped FlowCollector API custom resource definition. A FlowCollector instance enables configuration of network flow collection for the cluster. The eBPF agent, which generates the network flows, is deployed as a daemonset.
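The pieces described above can be sketched as a minimal FlowCollector instance. Field values here are illustrative defaults, not a definitive configuration; verify them against the flows.netobserv.io CRD installed in your cluster:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster          # the singleton instance is conventionally named "cluster"
spec:
  namespace: netobserv   # where the collection components are deployed
  agent:
    type: eBPF           # the agent runs as a daemonset on each node
```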
3.2. Optional dependencies of the Network Observability Operator
Integrate the Network Observability Operator with optional dependencies, such as the Loki Operator for flow storage and AMQ Streams (Kafka) for resilient, large-scale data handling and scalability.
Supported optional dependencies include the Loki Operator for flow storage, and AMQ Streams for large-scale data handling with Kafka.
- Loki Operator
- You can use Loki as the backend to store all collected flows with a maximum level of detail. It is recommended to use the Red Hat supported Loki Operator to install Loki. You can also choose to use network observability without Loki, but you need to consider some factors. For more information, see "Network observability without Loki".
- AMQ Streams Operator
Kafka provides scalability, resiliency, and high availability in the OpenShift Container Platform cluster for large scale deployments.
Note: If you choose to use Kafka, it is recommended to use the Red Hat supported AMQ Streams Operator.
3.3. OpenShift Container Platform console integration
The Network Observability Operator integrates with the OpenShift Container Platform console, providing an overview, topology view, and traffic flow tables.
The Network observability metrics dashboards in Observe → Dashboards are available only to users with administrator access.
To enable multi-tenancy for developer access and for administrators with limited access to namespaces, you must specify permissions by defining roles. For more information, see "Enabling multi-tenancy in network observability".
3.3.1. Network observability metrics dashboards
Review the network observability metrics dashboards in the OpenShift Container Platform console, which provide overall traffic flow aggregation, filtering options, and dedicated dashboards for monitoring operator health.
In the OpenShift Container Platform console on the Overview tab, you can view the overall aggregated metrics of the network traffic flow on the cluster. You can choose to display the information by cluster, node, namespace, owner, pod, and service. Filters and display options can further refine the metrics. For more information, see "Observing the network traffic from the Overview view".
In Observe → Dashboards, the Netobserv dashboards provide a quick overview of the network flows in your OpenShift Container Platform cluster. The Netobserv/Health dashboard provides metrics about the health of the Operator. For more information, see "Network observability metrics" and "Viewing health information".
3.3.2. Network observability topology views
The network observability topology view in the OpenShift Container Platform console displays a graphical representation of traffic flow between components, which you can refine using various filters and display options.
The OpenShift Container Platform console offers the Topology tab which represents traffic between the OpenShift Container Platform components as a network graph. You can refine the graph by using the filters and display options. You can access the information for cluster, zone, udn, node, namespace, owner, pod, and service.
3.3.3. Traffic flow tables
The Traffic flow tables in the OpenShift Container Platform web console provide a detailed view of raw network flows, offering powerful filtering options and configurable columns for in-depth analysis.
The Traffic flows tab in the OpenShift Container Platform web console displays the data of the network flows and the amount of traffic.
3.4. Network Observability CLI
The Network Observability CLI (`oc netobserv`) is a flow and packet visualization tool that relies on eBPF agents to stream collected data to an ephemeral collector pod. It requires no persistent storage during the capture. After the run, the output is transferred to your local machine. This enables quick, live insight into packets and flow data without installing the Network Observability Operator.
Chapter 4. Installing the Network Observability Operator
Installing the Loki Operator is recommended before using the Network Observability Operator. You can use network observability without Loki, but special considerations apply if you only need metrics or external exporters.
The Loki Operator integrates a gateway that implements multi-tenancy and authentication with Loki for data flow storage. The `LokiStack` resource manages Loki with scalable, highly-available, multi-tenant storage.
4.1. Network observability without Loki
Compare the features available with network observability with and without installing the Loki Operator.
If you only want to export flows to a Kafka consumer or IPFIX collector, or you only need dashboard metrics, then you do not need to install Loki or provide storage for Loki. The following table compares available features with and without Loki.
| | With Loki | Without Loki |
|---|---|---|
| Exporters | X | X |
| Multi-tenancy | X | X |
| Complete filtering and aggregations capabilities [1] | X | |
| Partial filtering and aggregations capabilities [2] | X | X |
| Flow-based metrics and dashboards | X | X |
| Traffic flows view overview [3] | X | X |
| Traffic flows view table | X | |
| Topology view | X | X |
| OpenShift Container Platform console Network Traffic tab integration | X | X |
1. Such as per pod.
2. Such as per workload or namespace.
3. Statistics on packet drops are only available with Loki.
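The Loki-free mode compared above maps to a single switch in the `FlowCollector` resource. The following is a minimal sketch, assuming the `flows.netobserv.io/v1beta2` API shown later in this document; with `spec.loki.enable` set to `false`, flows feed only Prometheus metrics and any configured exporters:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  loki:
    enable: false  # do not store flows in Loki; rely on Prometheus metrics
  exporters:       # optional: still forward raw flows to an external consumer
  - type: Kafka    # illustrative endpoint names; adjust to your environment
    kafka:
      address: "kafka-cluster-kafka-bootstrap.netobserv"
      topic: netobserv-flows-export
```

With this configuration, the Traffic flows view table and complete filtering capabilities listed in the table are unavailable.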
4.2. Installing the Loki Operator
Install the supported Loki Operator version from the software catalog to enable the secure `LokiStack` tenant configuration. The Loki Operator versions 6.0+ are the supported Loki Operator versions for network observability; these versions provide the ability to create a `LokiStack` instance with the `openshift-network` tenant configuration mode, which provides in-cluster authentication and authorization for network observability.
Prerequisites
- You have administrator permissions.
- You have access to the OpenShift Container Platform web console.
- You have access to a supported object store. For example: AWS S3, Google Cloud Storage, Azure, Swift, Minio, or OpenShift Data Foundation.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Loki Operator from the list of available Operators, and click Install.
- Under Installation Mode, select All namespaces on the cluster.
Verification
- Verify that you installed the Loki Operator. Visit the Operators → Installed Operators page and look for Loki Operator.
- Verify that Loki Operator is listed with Status as Succeeded in all the projects.
To uninstall Loki, refer to the uninstallation process that corresponds with the method you used to install Loki. You might have remaining `ClusterRoles` and `ClusterRoleBindings` that you must remove manually.
4.2.1. Creating a secret for Loki storage
Create a secret with cloud storage credentials, such as for Amazon Web Services (AWS), to allow the Loki Operator to access the necessary object store for log persistence.
The Loki Operator supports a few log storage options, such as AWS S3, Google Cloud Storage, Azure, Swift, Minio, and OpenShift Data Foundation. The following example shows how to create a secret for AWS S3 storage. The secret created in this example, `loki-s3`, is referenced in "Creating a LokiStack custom resource".
Procedure
- Using the web console, navigate to the Project → All Projects dropdown and select Create Project.
- Name the project `netobserv` and click Create.
- Navigate to the Import icon, +, in the top right corner, and paste your YAML file into the editor.
The following shows an example secret YAML file for S3 storage:
apiVersion: v1
kind: Secret
metadata:
  name: loki-s3
  namespace: netobserv 1
stringData:
  access_key_id: QUtJQUlPU0ZPRE5ON0VYQU1QTEUK
  access_key_secret: d0phbHJYVXRuRkVNSS9LN01ERU5HL2JQeFJmaUNZRVhBTVBMRUtFWQo=
  bucketnames: s3-bucket-name
  endpoint: https://s3.eu-central-1.amazonaws.com
  region: eu-central-1
- 1
- The installation examples in this documentation use the same namespace, `netobserv`, across all components. You can optionally use a different namespace for the different components.
Verification
- After you create the secret, you can view it listed under Workloads → Secrets in the web console.
4.2.2. Creating a LokiStack custom resource
Deploy the `LokiStack` custom resource (CR) by using the web console or the OpenShift CLI (`oc`). You can deploy a `LokiStack` CR in the same namespace as the other network observability components, for example `netobserv`.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project dropdown.
- Look for Loki Operator. In the details, under Provided APIs, select LokiStack.
- Click Create LokiStack.
Ensure the following fields are specified in either Form View or YAML view:
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv 1
spec:
  size: 1x.small 2
  storage:
    schemas:
    - version: v13
      effectiveDate: '2022-06-01'
    secret:
      name: loki-s3
      type: s3
  storageClassName: gp3 3
  tenants:
    mode: openshift-network
- 1
- The installation examples in this documentation use the same namespace, `netobserv`, across all components. You can optionally use a different namespace.
- 2
- Specify the deployment size. In the Loki Operator 5.8 and later versions, the supported size options for production instances of Loki are `1x.extra-small`, `1x.small`, or `1x.medium`. Important: It is not possible to change the number `1x` for the deployment size.
- 3
- Use a storage class name that is available on the cluster for `ReadWriteOnce` access mode. For best performance, specify a storage class that allocates block storage. You can use `oc get storageclasses` to see what is available on your cluster. Important: You must not reuse the same `LokiStack` CR that is used for logging.
- Click Create.
4.2.3. Creating a new group for the cluster-admin user role
Querying application logs for multiple namespaces as a `cluster-admin` user, where the sum total of characters of all of the namespaces in the cluster is greater than 5120, results in the error `Parse error: input size too long (XXXX > 5120)`. For better control over access to logs in LokiStack, make the `cluster-admin` user a member of the `cluster-admin` group. If the `cluster-admin` group does not exist, create it and add the desired users to it.
Use the following procedure to create a new group for users with `cluster-admin` permissions.
Procedure
Enter the following command to create a new group:

$ oc adm groups new cluster-admin

Enter the following command to add the desired user to the `cluster-admin` group:

$ oc adm groups add-users cluster-admin <username>

Enter the following command to add the `cluster-admin` user role to the group:

$ oc adm policy add-cluster-role-to-group cluster-admin cluster-admin
4.2.4. Custom admin group access
If you need to see cluster-wide logs without necessarily being an administrator, or if you already have any group defined that you want to use here, you can specify a custom group by using the `adminGroups` field of the `LokiStack` CR. Users who are members of any group specified in `adminGroups` have administrator access. Administrator users have access to all network logs across the cluster.
Example LokiStack CR
apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv
spec:
  tenants:
    mode: openshift-network
    openshift:
      adminGroups:
      - cluster-admin
      - custom-admin-group
4.2.5. Loki deployment sizing
Sizing for Loki follows the format of `1x.<size>`, where the value `1x` is the instance size and `<size>` specifies the capacity of the deployment. It is not possible to change the number `1x` for the deployment size.
| | 1x.demo | 1x.extra-small | 1x.small | 1x.medium |
|---|---|---|---|---|
| Data transfer | Demo use only | 100GB/day | 500GB/day | 2TB/day |
| Queries per second (QPS) | Demo use only | 1-25 QPS at 200ms | 25-50 QPS at 200ms | 25-75 QPS at 200ms |
| Replication factor | None | 2 | 2 | 2 |
| Total CPU requests | None | 14 vCPUs | 34 vCPUs | 54 vCPUs |
| Total memory requests | None | 31Gi | 67Gi | 139Gi |
| Total disk requests | 40Gi | 430Gi | 430Gi | 590Gi |
4.2.6. LokiStack ingestion limits and health alerts
The `LokiStack` instance comes with default ingestion and query limits according to the configured size. You might want to update the ingestion and query limits if you get Loki errors showing up in the Console plugin, or in `flowlogs-pipeline` logs.
Here is an example of configured limits:
spec:
  limits:
    global:
      ingestion:
        ingestionBurstSize: 40
        ingestionRate: 20
        maxGlobalStreamsPerTenant: 25000
      queries:
        maxChunksPerQuery: 2000000
        maxEntriesLimitPerQuery: 10000
        maxQuerySeries: 3000
For more information about these settings, see the LokiStack API reference.
4.3. Installing the Network Observability Operator
Install the Network Observability Operator and use the setup wizard to create the `FlowCollector` custom resource. You can set specifications in the web console when you create the `FlowCollector` resource.
The actual memory consumption of the Operator depends on your cluster size and the number of resources deployed. Memory consumption might need to be adjusted accordingly. For more information, refer to "Network Observability controller manager pod runs out of memory" in the "Important Flow Collector configuration considerations" section.
Prerequisites
- If you choose to use Loki, install the Loki Operator version 5.7+.
- You must have `cluster-admin` privileges.
- One of the following supported architectures is required: `amd64`, `ppc64le`, `arm64`, or `s390x`.
- Must be configured with OVN-Kubernetes as the main network plugin, and optionally using secondary interfaces with Multus and SR-IOV.
Additionally, this installation example uses the `netobserv` namespace, which is used across all components.
Procedure
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose Network Observability Operator from the list of available Operators in the OperatorHub, and click Install.
- Select the checkbox Enable Operator recommended cluster monitoring on this Namespace.
Enable Operator recommended cluster monitoring on this Namespace - Navigate to Operators → Installed Operators. Under Provided APIs for Network Observability, select the Flow Collector link.
- Follow the Network Observability FlowCollector setup wizard.
- Click Create.
Verification
To confirm the installation was successful, navigate to Observe and verify that Network Traffic is listed in the options.
In the absence of Application Traffic within the OpenShift Container Platform cluster, default filters might show that there are "No results", which results in no visual flow. Beside the filter selections, select Clear all filters to see the flow.
4.4. Enabling multi-tenancy in network observability
Enable multi-tenancy in network observability by configuring cluster roles and namespace roles to grant project administrators and developers granular, restricted access to flows and metrics in Loki and Prometheus.
Access is enabled for project administrators. Project administrators who have limited access to some namespaces can access flows for only those namespaces.
For Developers, multi-tenancy is available for both Loki and Prometheus but requires different access rights.
Prerequisite
- If you are using Loki, you have installed at least Loki Operator version 5.7.
- You must be logged in as a project administrator.
Procedure
For per-tenant access, you must have the `netobserv-loki-reader` cluster role and the `netobserv-metrics-reader` namespace role to use the developer perspective. Run the following commands for this level of access:

$ oc adm policy add-cluster-role-to-user netobserv-loki-reader <user_group_or_name>
$ oc adm policy add-role-to-user netobserv-metrics-reader <user_group_or_name> -n <namespace>

For cluster-wide access, non-cluster-administrators must have the `netobserv-loki-reader`, `cluster-monitoring-view`, and `netobserv-metrics-reader` cluster roles. In this scenario, you can use either the admin perspective or the developer perspective. Run the following commands for this level of access:

$ oc adm policy add-cluster-role-to-user netobserv-loki-reader <user_group_or_name>
$ oc adm policy add-cluster-role-to-user cluster-monitoring-view <user_group_or_name>
$ oc adm policy add-cluster-role-to-user netobserv-metrics-reader <user_group_or_name>
4.6. Installing Kafka (optional)
The Kafka Operator is supported for large-scale environments. Kafka provides high-throughput and low-latency data feeds for forwarding network flow data in a more resilient, scalable way.
You can install the Kafka Operator as Red Hat AMQ Streams from the Operator Hub, just as the Loki Operator and Network Observability Operator were installed. Refer to "Configuring the FlowCollector resource with Kafka" to configure Kafka as a storage option.
To uninstall Kafka, refer to the uninstallation process that corresponds with the method you used to install.
4.7. Uninstalling the Network Observability Operator
Uninstall the Network Observability Operator by using the OpenShift Container Platform web console, working in the Operators → Installed Operators area.
Procedure
Remove the `FlowCollector` custom resource.
- Click Flow Collector, which is next to the Network Observability Operator in the Provided APIs column.
- Click the Options menu for the cluster and select Delete FlowCollector.
Uninstall the Network Observability Operator.
- Navigate back to the Operators → Installed Operators area.
- Click the Options menu next to the Network Observability Operator and select Uninstall Operator.
- Navigate to Home → Projects and select openshift-netobserv-operator.
- Navigate to Actions and select Delete Project.
Remove the `FlowCollector` custom resource definition (CRD).
- Navigate to Administration → CustomResourceDefinitions.
- Look for FlowCollector and click the Options menu.
- Select Delete CustomResourceDefinition.
Important: The Loki Operator and Kafka remain if they were installed and must be removed separately. Additionally, you might have remaining data stored in an object store, and a persistent volume that must be removed.
Chapter 5. Network Observability Operator in OpenShift Container Platform
The Network Observability Operator for OpenShift Container Platform deploys a monitoring pipeline. This pipeline collects and enriches network traffic flows that are generated by the network observability `eBPF agent`.
5.1. Viewing statuses
View the operational status of the Network Observability Operator by using the `oc get` command to check the state of the `FlowCollector` resource and the pods of the `eBPF agent` and `flowlogs-pipeline` components.
The Network Observability Operator provides the Flow Collector API. When a Flow Collector resource is created, it deploys pods and services to create and store network flows in the Loki log store, as well as to display dashboards, metrics, and flows in the OpenShift Container Platform web console.
Procedure
Run the following command to view the state of `FlowCollector`:

$ oc get flowcollector/cluster

Example output

NAME      AGENT   SAMPLING (EBPF)   DEPLOYMENT MODEL   STATUS
cluster   EBPF    50                DIRECT             Ready

Check the status of pods running in the `netobserv` namespace by entering the following command:

$ oc get pods -n netobserv

Example output

NAME                              READY   STATUS    RESTARTS   AGE
flowlogs-pipeline-56hbp           1/1     Running   0          147m
flowlogs-pipeline-9plvv           1/1     Running   0          147m
flowlogs-pipeline-h5gkb           1/1     Running   0          147m
flowlogs-pipeline-hh6kf           1/1     Running   0          147m
flowlogs-pipeline-w7vv5           1/1     Running   0          147m
netobserv-plugin-cdd7dc6c-j8ggp   1/1     Running   0          147m

The `flowlogs-pipeline` pods collect flows, enrich the collected flows, and then send the flows to the Loki storage. The `netobserv-plugin` pods create a visualization plugin for the OpenShift Container Platform console.

Check the status of pods running in the `netobserv-privileged` namespace by entering the following command:

$ oc get pods -n netobserv-privileged

Example output

NAME                         READY   STATUS    RESTARTS   AGE
netobserv-ebpf-agent-4lpp6   1/1     Running   0          151m
netobserv-ebpf-agent-6gbrk   1/1     Running   0          151m
netobserv-ebpf-agent-klpl9   1/1     Running   0          151m
netobserv-ebpf-agent-vrcnf   1/1     Running   0          151m
netobserv-ebpf-agent-xf5jh   1/1     Running   0          151m

The `netobserv-ebpf-agent` pods monitor the network interfaces of the nodes to get flows and send them to the `flowlogs-pipeline` pods.

If you are using the Loki Operator, check the status of the pods of the `LokiStack` custom resource components in the `netobserv` namespace by entering the following command:

$ oc get pods -n netobserv

Example output

NAME                                        READY   STATUS    RESTARTS   AGE
lokistack-compactor-0                       1/1     Running   0          18h
lokistack-distributor-654f87c5bc-qhkhv      1/1     Running   0          18h
lokistack-distributor-654f87c5bc-skxgm      1/1     Running   0          18h
lokistack-gateway-796dc6ff7-c54gz           2/2     Running   0          18h
lokistack-index-gateway-0                   1/1     Running   0          18h
lokistack-index-gateway-1                   1/1     Running   0          18h
lokistack-ingester-0                        1/1     Running   0          18h
lokistack-ingester-1                        1/1     Running   0          18h
lokistack-ingester-2                        1/1     Running   0          18h
lokistack-querier-66747dc666-6vh5x          1/1     Running   0          18h
lokistack-querier-66747dc666-cjr45          1/1     Running   0          18h
lokistack-querier-66747dc666-xh8rq          1/1     Running   0          18h
lokistack-query-frontend-85c6db4fbd-b2xfb   1/1     Running   0          18h
lokistack-query-frontend-85c6db4fbd-jm94f   1/1     Running   0          18h
5.2. Network Observability Operator architecture
Review the Network Observability Operator architecture, detailing how the `FlowCollector` resource deploys and configures the `eBPF agent`, `flowlogs-pipeline`, and `netobserv-plugin` components.

The Network Observability Operator provides the `FlowCollector` API, which is instantiated at installation time and configured to reconcile the `eBPF agent`, the `flowlogs-pipeline`, and the `netobserv-plugin` components. Only a single `FlowCollector` resource per cluster is supported.

The `eBPF agent` runs on each cluster node with some privileges to collect network flows. The `flowlogs-pipeline` receives the network flows data and enriches the data with Kubernetes identifiers. If you use Loki, the `flowlogs-pipeline` sends flow logs data to Loki for storing and indexing. The `netobserv-plugin`, which is a dynamic OpenShift Container Platform web console plugin, queries Loki to fetch network flows data.
If you do not use Loki, you can generate metrics with Prometheus. Those metrics and their related dashboards are accessible in the web console. For more information, see "Network Observability without Loki".
There are three deployment model options for the Network Observability Operator.
The Network Observability Operator does not manage Loki or other data stores. You must install Loki separately by using the Loki Operator. If you use Kafka, you must install it separately by using the Kafka Operator.
- Service deployment model
When the `spec.deploymentModel` field in the `FlowCollector` resource is set to `Service`, agents are deployed per node as daemon sets. The `flowlogs-pipeline` is a standard deployment with a service. You can scale the `flowlogs-pipeline` component by using the `spec.processor.consumerReplicas` field.
-
When the
spec.deploymentModelfield is set toDirect, agents and theflowlogs-pipelineare both deployed per node as daemon sets. This model is suitable for technology assessments and small clusters. However, it is less memory-efficient in large clusters because each instance offlowlogs-pipelinecaches the same cluster information. - Kafka deployment model (optional)
If you use the Kafka option, the `eBPF agent` sends the network flow data to Kafka. You can scale the `flowlogs-pipeline` component by using the `spec.processor.consumerReplicas` field. The `flowlogs-pipeline` component reads from the Kafka topic before sending data to Loki, as shown in the following diagram.
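The Kafka model and its scaling knob can be sketched as follows; field names follow this document (`spec.processor.consumerReplicas`), so verify them against your installed `FlowCollector` API version:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka     # eBPF agents produce flows to Kafka
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv"
    topic: network-flows
  processor:
    consumerReplicas: 6      # scale flowlogs-pipeline consumers independently of node count
```

Because consumers scale independently of the node count, fewer `flowlogs-pipeline` instances cache cluster information than in the `Direct` model.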
5.3. Viewing Network Observability Operator status and configuration
Inspect the current status, configuration details, and generated resources of the Network Observability Operator by using the `oc describe flowcollector/cluster` command.
Procedure
Run the following command to view the status and configuration of the Network Observability Operator:
$ oc describe flowcollector/cluster
Chapter 6. Configuring the Network Observability Operator
Configure the Network Observability Operator by updating the cluster-wide `FlowCollector` resource. The `FlowCollector` resource is the configuration entry point for the installed components. Only a single `FlowCollector` resource, named `cluster`, is supported per cluster.
6.1. View the FlowCollector resource
View and modify the `FlowCollector` resource in the OpenShift Container Platform web console.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster then select the YAML tab. There, you can modify the `FlowCollector` resource to configure the Network Observability Operator.
6.1.1. Example of a FlowCollector resource
Review a comprehensive, annotated example of the `FlowCollector` resource, covering settings such as the `eBPF` agent sampling, conversation tracking, the Loki client, and quick filters.
6.1.1.1. Sample FlowCollector resource
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Service
  agent:
    type: eBPF 1
    ebpf:
      sampling: 50 2
      logLevel: info
      privileged: false
      resources:
        requests:
          memory: 50Mi
          cpu: 100m
        limits:
          memory: 800Mi
  processor: 3
    logLevel: info
    resources:
      requests:
        memory: 100Mi
        cpu: 100m
      limits:
        memory: 800Mi
    logTypes: Flows
    advanced:
      conversationEndTimeout: 10s
      conversationHeartbeatInterval: 30s
  loki: 4
    mode: LokiStack 5
  consolePlugin:
    register: true
    logLevel: info
    portNaming:
      enable: true
      portNames:
        "3100": loki
    quickFilters: 6
    - name: Applications
      filter:
        src_namespace!: 'openshift-,netobserv'
        dst_namespace!: 'openshift-,netobserv'
      default: true
    - name: Infrastructure
      filter:
        src_namespace: 'openshift-,netobserv'
        dst_namespace: 'openshift-,netobserv'
    - name: Pods network
      filter:
        src_kind: 'Pod'
        dst_kind: 'Pod'
      default: true
    - name: Services network
      filter:
        dst_kind: 'Service'
- 1
- The Agent specification, `spec.agent.type`, must be `EBPF`. eBPF is the only OpenShift Container Platform supported option.
- You can set the Sampling specification, `spec.agent.ebpf.sampling`, to manage resources. By default, eBPF sampling is set to `50`, so a flow has a 1 in 50 chance of being sampled. A lower sampling value requires more computational, memory, and storage resources. A value of `0` or `1` means all flows are sampled. It is recommended to start with the default value and refine it empirically to determine the optimal setting for your cluster.
- The Processor specification, `spec.processor`, can be set to enable conversation tracking. When enabled, conversation events are queryable in the web console. The `spec.processor.logTypes` values are `Flows`, `Conversations`, `EndedConversations`, or `All`. The `spec.processor.advanced` section tunes the conversation timeouts. Storage requirements are highest for `All` and lowest for `EndedConversations`.
- The Loki specification, `spec.loki`, specifies the Loki client. The default values match the Loki install paths mentioned in the Installing the Loki Operator section. If you used another installation method for Loki, specify the appropriate client information for your install.
- The `LokiStack` mode automatically sets a few configurations: `querierUrl`, `ingesterUrl`, `statusUrl`, `tenantID`, and the corresponding TLS configuration. Cluster roles and a cluster role binding are created for reading and writing logs to Loki, and `authToken` is set to `Forward`. You can set these manually by using the `Manual` mode.
- The `spec.quickFilters` specification defines filters that show up in the web console. The `Applications` filter keys, `src_namespace` and `dst_namespace`, are negated (!), so the `Applications` filter shows all traffic that does not originate from, or have a destination to, any `openshift-` or `netobserv` namespaces. For more information, see Configuring quick filters below.
6.2. Configuring the FlowCollector resource with Kafka
Configure the `FlowCollector` resource to use Kafka as a broker for forwarding network flow data in a more resilient, scalable way.
A Kafka instance needs to be running, and a Kafka topic dedicated to OpenShift Container Platform Network Observability must be created in that instance. For more information, see Kafka documentation with AMQ Streams.
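For example, if Kafka is managed by the AMQ Streams (Strimzi) Operator, the dedicated topic can be declared as a `KafkaTopic` resource. This is a sketch under assumptions: the Kafka cluster is named `kafka-cluster`, and the partition and replica counts are illustrative:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: network-flows                   # must match spec.kafka.topic in the FlowCollector
  namespace: netobserv
  labels:
    strimzi.io/cluster: kafka-cluster   # name of your Kafka cluster (assumption)
spec:
  partitions: 24                        # more partitions allow more parallel flowlogs-pipeline consumers
  replicas: 3                           # replication for resiliency
```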
Prerequisites
- Kafka is installed. Red Hat supports Kafka with AMQ Streams Operator.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the Network Observability Operator, select Flow Collector.
- Select the cluster and then click the YAML tab.
Modify the `FlowCollector` resource for OpenShift Container Platform Network Observability Operator to use Kafka, as shown in the following sample YAML:

Sample Kafka configuration in `FlowCollector` resource

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka 1
  kafka:
    address: "kafka-cluster-kafka-bootstrap.netobserv" 2
    topic: network-flows 3
    tls:
      enable: false 4
- 1
- Set `spec.deploymentModel` to `Kafka` instead of `Direct` to enable the Kafka deployment model.
- 2
- `spec.kafka.address` refers to the Kafka bootstrap server address. You can specify a port if needed, for instance `kafka-cluster-kafka-bootstrap.netobserv:9093` for using TLS on port 9093.
- 3
- `spec.kafka.topic` should match the name of a topic created in Kafka.
- 4
- `spec.kafka.tls` can be used to encrypt all communications to and from Kafka with TLS or mTLS. When enabled, the Kafka CA certificate must be available as a ConfigMap or a Secret, both in the namespace where the `flowlogs-pipeline` processor component is deployed (default: `netobserv`) and where the eBPF agents are deployed (default: `netobserv-privileged`). It must be referenced with `spec.kafka.tls.caCert`. When using mTLS, client secrets must be available in these namespaces as well (they can be generated, for instance, by using the AMQ Streams User Operator) and referenced with `spec.kafka.tls.userCert`.
6.3. Export enriched network flow data
Configure the `FlowCollector` resource to export enriched network flow data to Kafka, IPFIX, or OpenTelemetry collectors.
For Kafka or IPFIX, any processor or storage that supports those inputs, such as Splunk, Elasticsearch, or Fluentd, can consume the enriched network flow data.
For OpenTelemetry, network flow data and metrics can be exported to a compatible OpenTelemetry endpoint, such as Red Hat build of OpenTelemetry or Prometheus.
After configuration, network flows data can be sent to an available output. For more information, see "Network flows format reference".
Prerequisites
- Your Kafka, IPFIX, or OpenTelemetry collector endpoints are available from the Network Observability `flowlogs-pipeline` pods.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster and then select the YAML tab.
Edit the `FlowCollector` to configure `spec.exporters` as follows:

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  exporters:
  - type: Kafka 1
    kafka:
      address: "kafka-cluster-kafka-bootstrap.netobserv"
      topic: netobserv-flows-export 2
      tls:
        enable: false 3
  - type: IPFIX 4
    ipfix:
      targetHost: "ipfix-collector.ipfix.svc.cluster.local"
      targetPort: 4739
      transport: tcp or udp 5
  - type: OpenTelemetry 6
    openTelemetry:
      targetHost: my-otelcol-collector-headless.otlp.svc
      targetPort: 4317
      type: grpc 7
      logs: 8
        enable: true
      metrics: 9
        enable: true
        prefix: netobserv
        pushTimeInterval: 20s 10
        expiryTime: 2m
      # fieldsMapping: 11
      #   input: SrcAddr
      #   output: source.address
- 1 4 6
- You can export flows to IPFIX, OpenTelemetry, and Kafka individually or concurrently.
- 2
- The Network Observability Operator exports all flows to the configured Kafka topic.
- 3
- You can encrypt all communications to and from Kafka with SSL/TLS or mTLS. When enabled, the Kafka CA certificate must be available as a ConfigMap or a Secret, both in the namespace where the `flowlogs-pipeline` processor component is deployed (default: `netobserv`). It must be referenced with `spec.exporters.tls.caCert`. When using mTLS, client secrets must be available in these namespaces as well (they can be generated, for instance, by using the AMQ Streams User Operator) and referenced with `spec.exporters.tls.userCert`.
- You have the option to specify transport. The default value is `tcp`, but you can also specify `udp`.
- The protocol of the OpenTelemetry connection. The available options are `http` and `grpc`.
- OpenTelemetry configuration for exporting logs, which are the same as the logs created for Loki.
- 9
- OpenTelemetry configuration for exporting metrics, which are the same as the metrics created for Prometheus. These configurations are specified in the `spec.processor.metrics.includeList` parameter of the `FlowCollector` custom resource, along with any custom metrics you defined by using the `FlowMetrics` custom resource.
- The time interval that metrics are sent to the OpenTelemetry collector.
- 11
- Optional: Network Observability network flow formats are automatically renamed to an OpenTelemetry compliant format. The `fieldsMapping` specification gives you the ability to customize the OpenTelemetry format output. For example, in the YAML sample, `SrcAddr` is the Network Observability input field, and it is renamed `source.address` in the OpenTelemetry output. You can see both Network Observability and OpenTelemetry formats in the "Network flows format reference".
6.4. Updating the FlowCollector resource
As an alternative to using the web console, use the `oc patch` command to update the `flowcollector` custom resource.
Procedure
Run the following command to patch the `flowcollector` CR and update the `spec.agent.ebpf.sampling` value:

$ oc patch flowcollector cluster --type=json -p '[{"op": "replace", "path": "/spec/agent/ebpf/sampling", "value": <new value>}]' -n netobserv
6.5. Filter network flows at ingestion
Create filters to reduce the number of generated network flows. Filtering network flows can reduce the resource usage of the network observability components.
You can configure two kinds of filters:
- eBPF agent filters
- Flowlogs-pipeline filters
6.5.1. eBPF agent filters
eBPF agent filters maximize performance because they take effect at the earliest stage of the network flows collection process.
To configure eBPF agent filters with the Network Observability Operator, see "Filtering eBPF flow data using multiple rules".
6.5.2. Flowlogs-pipeline filters
Flowlogs-pipeline filters provide greater control over traffic selection because they take effect later in the network flows collection process. They are primarily used to improve data storage.
Flowlogs-pipeline filters use a simple query language to filter network flow, as shown in the following example:
(srcnamespace="netobserv" OR (srcnamespace="ingress" AND dstnamespace="netobserv")) AND srckind!="service"
The query language uses the following syntax:
| Category | Operators |
|---|---|
| Logical boolean operators (not case-sensitive) | |
| Comparison operators | |
| Unary operations | |
You can configure flowlogs-pipeline filters in the `spec.processor.filters` section of the `FlowCollector` resource, as shown in the following example.
Example YAML Flowlogs-pipeline filter
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  processor:
    filters:
    - query: |
        (SrcK8S_Namespace="netobserv" OR (SrcK8S_Namespace="openshift-ingress" AND DstK8S_Namespace="netobserv"))
      outputTarget: Loki 1
      sampling: 10 2
- 1
- Sends matching flows to a specific output, such as Loki, Prometheus, or an external system. When omitted, sends to all configured outputs.
- 2
- Optional: Applies a sampling interval to limit the number of matching flows to be stored or exported. For example, `sampling: 10` means that there is a 1 in 10 chance that a flow will be kept.
6.6. Configuring quick filters
Use the list of available source, destination, and universal filter keys to modify quick filters within the `FlowCollector` resource.

Exact matches are possible using double-quotes around values. Otherwise, partial matches are used for textual values. The bang (!) character, placed at the end of a key, means negation. See the sample `FlowCollector` resource for more context.
The filter matching types "all of" or "any of" is a UI setting that the users can modify from the query options. It is not part of this resource configuration.
Here is a list of all available filter keys:
| Universal* | Source | Destination | Description |
|---|---|---|---|
| namespace | `src_namespace` | `dst_namespace` | Filter traffic related to a specific namespace. |
| name | `src_name` | `dst_name` | Filter traffic related to a given leaf resource name, such as a specific pod, service, or node (for host-network traffic). |
| kind | `src_kind` | `dst_kind` | Filter traffic related to a given resource kind. The resource kinds include the leaf resource (Pod, Service, or Node) or the owner resource (Deployment and StatefulSet). |
| owner_name | `src_owner_name` | `dst_owner_name` | Filter traffic related to a given resource owner; that is, a workload or a set of pods. For example, it can be a Deployment name or a StatefulSet name. |
| resource | `src_resource` | `dst_resource` | Filter traffic related to a specific resource that is denoted by its canonical name, which identifies it uniquely. |
| address | `src_address` | `dst_address` | Filter traffic related to an IP address. IPv4 and IPv6 are supported. CIDR ranges are also supported. |
| mac | `src_mac` | `dst_mac` | Filter traffic related to a MAC address. |
| port | `src_port` | `dst_port` | Filter traffic related to a specific port. |
| host_address | `src_host_address` | `dst_host_address` | Filter traffic related to the host IP address where the pods are running. |
| protocol | N/A | N/A | Filter traffic related to a protocol, such as TCP or UDP. |
\* Universal keys filter for any of source or destination. For example, filtering on `name: 'my-pod'` means all traffic from `my-pod` and all traffic to `my-pod`, regardless of the matching type used, whether Match all or Match any.
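Quick filters are defined under `spec.consolePlugin.quickFilters` in the `FlowCollector` resource. As a sketch, the following entry (the filter name and namespace values are illustrative) uses the keys above, with the bang character for negation:

```yaml
spec:
  consolePlugin:
    quickFilters:
    - name: Applications          # display name shown in the console
      filter:
        src_namespace!: 'openshift-,netobserv'   # exclude sources in these namespaces
        dst_namespace!: 'openshift-,netobserv'   # exclude destinations in these namespaces
      default: true               # enabled by default in the console
```

Comma-separated values in a filter act as a partial match on any of the listed strings.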
6.7. Resource management and performance considerations
Review the key configuration settings, including eBPF sampling, feature enablement, and resource limits, necessary to manage performance criteria and optimize resource consumption for network observability.
The amount of resources required by network observability depends on the size of your cluster and your requirements for the cluster to ingest and store observability data. To manage resources and set performance criteria for your cluster, consider configuring the following settings. Configuring these settings can help you balance resource consumption against your observability needs.
The following settings can help you manage resources and performance from the outset:
- eBPF Sampling
  You can set the sampling specification, `spec.agent.ebpf.sampling`, to manage resources. By default, eBPF sampling is set to `50`, so a flow has a 1 in 50 chance of being sampled. A lower sampling interval value requires more computational, memory, and storage resources. A value of `0` or `1` means all flows are sampled. It is recommended to start with the default value and refine it empirically to determine the optimal setting for your cluster.
- eBPF features
- The more features that are enabled, the more CPU and memory are impacted. See "Observing the network traffic" for a complete list of these features.
- Without Loki
- You can reduce the amount of resources that network observability requires by not using Loki and instead relying on Prometheus. For example, when network observability is configured without Loki, the total savings of memory usage are in the 20-65% range and CPU utilization is lower by 10-30%, depending upon the sampling interval value. See "Network observability without Loki" for more information.
- Restricting or excluding interfaces
  Reduce the overall observed traffic by setting the values for `spec.agent.ebpf.interfaces` and `spec.agent.ebpf.excludeInterfaces`. By default, the agent fetches all the interfaces in the system, except the ones listed in `excludeInterfaces` and `lo` (local interface). Note that the interface names might vary according to the Container Network Interface (CNI) used.
- Performance fine-tuning
The following settings can be used to fine-tune performance after network observability has been running for a while:

- Resource requirements and limits: Adapt the resource requirements and limits to the load and memory usage you expect on your cluster by using the `spec.agent.ebpf.resources` and `spec.processor.resources` specifications. The default limits of 800MB might be sufficient for most medium-sized clusters.
- Cache max flows timeout: Control how often flows are reported by the agents by using the eBPF agent's `spec.agent.ebpf.cacheMaxFlows` and `spec.agent.ebpf.cacheActiveTimeout` specifications. A larger value results in less traffic being generated by the agents, which correlates with a lower CPU load. However, a larger value leads to a slightly higher memory consumption, and might generate more latency in the flow collection.
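As a sketch, the tuning knobs above map to `FlowCollector` fields as follows; the values shown are illustrative starting points rather than recommendations:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      sampling: 50                # default: 1 in 50 flows sampled
      cacheMaxFlows: 100000       # flush the agent cache when this many flows are held
      cacheActiveTimeout: 5s      # or after this duration, whichever comes first
      excludeInterfaces: ["lo"]   # skip the local interface
      resources:
        limits:
          memory: 800Mi
  processor:
    resources:
      limits:
        memory: 800Mi
```

Raising `cacheMaxFlows` and `cacheActiveTimeout` lowers agent CPU load at the cost of memory and collection latency, as described above.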
6.7.1. Resource considerations
The Network Observability Operator configuration can be adjusted based on the cluster workload size. Use the following baseline examples to determine the appropriate resource limits and configuration settings for the environment.
The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
The test beds used for these recommendations are:
- Extra small: 10-node cluster, 4 vCPUs and 16 GiB memory per worker, size `1x.extra-small` LokiStack, tested on AWS M6i instances.
- Small: 25-node cluster, 16 vCPUs and 64 GiB memory per worker, size `1x.small` LokiStack, tested on AWS M6i instances.
- Large: 250-node cluster, 16 vCPUs and 64 GiB memory per worker, size `1x.medium` LokiStack, tested on AWS M6i instances. In addition to the worker and controller nodes, three infrastructure nodes (size `M6i.12xlarge`) and one workload node (size `M6i.8xlarge`) were tested.
| Criterion | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) |
|---|---|---|---|
| Operator memory limit | | | |
| eBPF agent sampling interval | | | |
| eBPF agent memory limit | | | |
| eBPF agent cache size | | | |
| Processor memory limit | | | |
| Processor replicas | | | |
| Deployment model | | | |
| Kafka partitions (Kafka installation) | N/A | | |
| Kafka brokers (Kafka installation) | N/A | | |
6.7.2. Total average memory and CPU usage
Review the table detailing the total average CPU and memory usage for network observability components under two distinct traffic scenarios, Test 1 and Test 2.
The following table outlines averages of total resource usage for clusters with sampling values of `1` and `50` for Test 1 and Test 2:
- Test 1 takes into account high ingress traffic volume in addition to the total number of namespaces, pods, and services in an OpenShift Container Platform cluster, places load on the eBPF agent, and represents use cases with a high number of workloads for a given cluster size. For example, Test 1 consists of 76 namespaces, 5153 pods, and 2305 services with a network traffic scale of ~350 MB/s.
- Test 2 takes into account high ingress traffic volume in addition to the total number of namespaces, pods, and services in an OpenShift Container Platform cluster and represents use cases with a high number of workloads for a given cluster size. For example, Test 2 consists of 553 namespaces, 6998 pods, and 2508 services with a network traffic scale of ~950 MB/s.
Since different types of cluster use cases are exemplified in the different tests, the numbers in this table do not scale linearly when compared side-by-side. Instead, they are intended to be used as a benchmark for evaluating your personal cluster usage. The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
Metrics exported to Prometheus can impact the resource usage. Cardinality values for the metrics can help determine how much resources are impacted. For more information, see "Network Flows format" in the Additional resources section.
| Sampling value | Resources used | Test 1 (25 nodes) | Test 2 (250 nodes) |
|---|---|---|---|
| Sampling = 50 | Total NetObserv CPU Usage | 1.35 | 5.39 |
| | Total NetObserv RSS (Memory) Usage | 16 GB | 63 GB |
| Sampling = 1 | Total NetObserv CPU Usage | 1.82 | 11.99 |
| | Total NetObserv RSS (Memory) Usage | 22 GB | 87 GB |
Summary: This table shows average total resource usage of Network Observability, which includes Agents, FLP, Kafka, and Loki with all features enabled. For details about what features are enabled, see the features covered in "Observing the network traffic", which comprises all the features that are enabled for this testing.
Chapter 7. Network observability per-tenant model
Use the `FlowCollectorSlice` resource to allow project administrators to independently manage sampling and subnet labeling for their specific namespaces.
7.1. Per-tenant hierarchical governance and tenant autonomy
Cluster administrators can maintain global governance while allowing project administrators to manage network traffic observability within their specific namespaces.
The Network Observability Operator uses a hierarchical configuration model to support multitenancy. This architecture is beneficial for large-scale deployments and hosted control planes environments where individual teams require self-service visibility without cluster administrator intervention.
The hierarchical model consists of the following components:
- Global governance
  The cluster administrator manages the global `FlowCollector` resource. This resource defines the observability infrastructure and determines if per-tenant configuration is permitted.
- Tenant autonomy
  The project administrator manages the `FlowCollectorSlice` resource. This namespace-scoped custom resource (CR) allows teams to define specific observability settings for their workloads.
7.2. FlowCollectorSlice resource for granular flow collection
The `FlowCollectorSlice` resource provides granular, namespace-scoped control over network flow collection.
It complements the existing cluster-scoped `FlowCollector` resource rather than replacing it.
When slice-based collection is enabled, only traffic that matches at least one `FlowCollectorSlice` resource is collected.
7.2.1. Benefits of FlowCollectorSlice
By default, network flow collection applies uniformly to all traffic in the cluster. This can result in excessive data volume and limited flexibility.
Using `FlowCollectorSlice` resources provides the following benefits:
- Enables selective flow collection for specific namespaces or workloads.
- Supports multi-tenant and environment-based observability.
- Reduces storage and processing costs by filtering irrelevant traffic.
- Preserves backward compatibility through opt-in configuration.
7.2.2. Relationship between FlowCollector and FlowCollectorSlice
While the `FlowCollector` resource defines the cluster-wide observability infrastructure, the `FlowCollectorSlice` resource defines namespace-scoped collection and enrichment settings.
The `FlowCollector.spec.slicesConfig` field controls whether `FlowCollectorSlice` resources are honored and how they affect flow collection.
7.2.3. Collection modes
Slice behavior is governed by the `FlowCollector.spec.slicesConfig.collectionMode` field, which supports the following modes:
- AlwaysCollect
  - Collects network flows from all cluster namespaces.
  - Applies the subnet and sampling configurations defined in `FlowCollectorSlice` resources.
  - Ignores the namespace selection logic in `FlowCollectorSlice` resources.
  - Maintains the default collection behavior for backward compatibility.
- AllowList
  - Collects only traffic that matches at least one `FlowCollectorSlice` resource.
  - An optional namespace allow list includes selected namespaces in the collection.
7.2.4. FlowCollectorSlice status
Each `FlowCollectorSlice` resource reports a `status` field that provides the following information:
- Validation results.
- Reconciliation state.
- Whether the slice is successfully applied.
This status allows administrators to verify that slice definitions are active and functioning as expected.
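The exact status schema is defined by the installed CRD version; a typical Kubernetes-style condition block might look like the following hypothetical example:

```yaml
# Hypothetical status shape, following standard Kubernetes condition conventions;
# actual condition types and reasons depend on the installed FlowCollectorSlice CRD.
status:
  conditions:
  - type: Ready
    status: "True"
    reason: Reconciled
    message: Slice validated and applied to the flow collection pipeline
```

You can inspect the status with `oc get flowcollectorslice <name> -n <namespace> -o yaml`.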
7.3. Enable the Network Observability Operator FlowCollectorSlice
Enabling the `FlowCollectorSlice` feature requires updating the global `FlowCollector` custom resource.
Before project administrators can manage their own settings, a cluster administrator must enable the feature in the `FlowCollector` resource and define which namespaces are permitted to use `FlowCollectorSlice` resources.
Prerequisites
- The Network Observability Operator is installed.
- A `FlowCollector` custom resource exists in the cluster.
- You have `cluster-admin` privileges.
Procedure
1. Edit the `FlowCollector` custom resource by running the following command:

   ```shell
   $ oc edit flowcollector cluster
   ```

2. Configure the `spec.processor.slicesConfig` field to define which namespaces are permitted to use slices:

   ```yaml
   apiVersion: flows.netobserv.io/v1beta2
   kind: FlowCollector
   metadata:
     name: cluster
   spec:
     processor:
       slicesConfig:
         enable: true
         collectionMode: AllowList
         namespacesAllowList:
         - /openshift-.*|netobserv.*/
   ```

   where:

   `spec.processor.slicesConfig.enable`
   Specifies if the `FlowCollectorSlice` feature is enabled. If not, all resources of kind `FlowCollectorSlice` are ignored.

   `spec.processor.slicesConfig.collectionMode`
   Specifies how the `FlowCollectorSlice` custom resources impact the flow collection process. When set to `AlwaysCollect`, all flows are collected regardless of the presence of `FlowCollectorSlice` resources. When set to `AllowList`, only the flows related to namespaces where a `FlowCollectorSlice` resource is present, or configured through the global `namespacesAllowList`, are collected.

   `spec.processor.slicesConfig.namespacesAllowList`
   Specifies a list of namespaces for which flows are always collected, regardless of the presence of `FlowCollectorSlice` resources in those namespaces.

   Note: The `namespacesAllowList` field supports regular expressions, such as `/openshift-.*/`, to capture multiple namespaces, or strict equality, such as `netobserv`, to match a specific namespace.
- Save the changes and exit the editor.
Verification
- Verify that only network flows from the `netobserv` namespace and namespaces starting with `openshift-` are displayed in the Network Traffic page of the web console.
7.3.1. Disable the Network Observability Operator FlowCollectorSlice
Disable slice-based filtering in the Network Observability Operator to resume global flow collection while preserving existing `FlowCollectorSlice` resources.
Procedure
1. Edit the `FlowCollector` resource by running the following command:

   ```shell
   $ oc edit flowcollector cluster
   ```

2. Set the `spec.processor.slicesConfig.collectionMode` field to `AlwaysCollect`:

   ```yaml
   apiVersion: flows.netobserv.io/v1beta2
   kind: FlowCollector
   metadata:
     name: cluster
   spec:
     processor:
       slicesConfig:
         enable: true
         collectionMode: AlwaysCollect
   ```

3. Save the changes.

Flow collection resumes for all traffic, and existing `FlowCollectorSlice` resources remain available for future use.
7.4. Configure the FlowCollectorSlice as a project administrator
Project administrators can manage flow collection and data enrichment within their own namespaces by configuring a `FlowCollectorSlice` resource.
Prerequisites
- The Network Observability Operator is installed.
- You have `project-admin` permissions for the namespace.
Procedure
1. Create a YAML file named `flowCollectorSlice.yaml`:

   ```yaml
   apiVersion: flows.netobserv.io/v1alpha1
   kind: FlowCollectorSlice
   metadata:
     name: flowcollectorslice-sample
     namespace: my-app
   spec:
     sampling: 1
     subnetLabels:
     - name: EXT:Database
       cidrs:
       - 192.168.50.0/24
   ```

2. Apply the configuration by running the following command:

   ```shell
   $ oc apply -f flowCollectorSlice.yaml
   ```
Verification
- In the OpenShift Container Platform console, navigate to Observe → Network Traffic.
- Ensure that flows to the subnet `192.168.50.0/24` are observed with the `EXT:Database` label.
7.5. FlowCollectorSlice [flows.netobserv.io/v1alpha1]
- Description
- FlowCollectorSlice is the API that allows you to decentralize some of the FlowCollector configuration per namespace tenant.
- Type
-
object
| Property | Type | Description |
|---|---|---|
| `apiVersion` | `string` | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and might reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |
| `kind` | `string` | Kind is a string value representing the REST resource this object represents. Servers might infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |
| `metadata` | `ObjectMeta` | Standard object's metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata |
| `spec` | `object` | FlowCollectorSliceSpec defines the desired state of FlowCollectorSlice |
7.5.1. .metadata
- Description
- Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
- Type
-
object
7.5.2. .spec
- Description
- FlowCollectorSliceSpec defines the desired state of FlowCollectorSlice
- Type
-
object
| Property | Type | Description |
|---|---|---|
| `sampling` | `integer` | Sampling applied to the flows of this namespace. |
| `subnetLabels` | `array` | Beware that the subnet labels configured in FlowCollectorSlice are not limited to the flows of the related namespace: any flow in the whole cluster can be labeled using this configuration. However, subnet labels defined in the cluster-scoped FlowCollector take precedence in case of conflicting rules. |
7.5.3. .spec.subnetLabels
- Description
- `subnetLabels` allows you to customize subnets and IPs labeling, such as to identify cluster-external workloads or web services. External subnets must be labeled with the prefix `EXT:`, or not labeled at all, in order to work with the default quick filters and some of the provided metrics examples.
Beware that the subnet labels configured in FlowCollectorSlice are not limited to the flows of the related namespace: any flow in the whole cluster can be labeled using this configuration. However, subnet labels defined in the cluster-scoped FlowCollector take precedence in case of conflicting rules.
- Type
-
array
7.5.4. .spec.subnetLabels[]
- Description
- SubnetLabel allows you to label subnets and IPs, such as to identify cluster-external workloads or web services.
- Type
-
object
- Required
  - `cidrs`
  - `name`

| Property | Type | Description |
|---|---|---|
| `cidrs` | `array (string)` | List of CIDRs. |
| `name` | `string` | Label name, used to flag matching flows. External subnets must be labeled with the prefix `EXT:`. |
Chapter 8. Network Policy
As an administrator, you can create a network policy for the `netobserv` namespace to secure the network observability deployment.
8.1. Configuring network policy by using the FlowCollector custom resource
You can set up ingress and egress network policies to control pod traffic. This enhances security and collects only the network flow data you need. This reduces noise, supports compliance, and improves visibility into network communication.
You can configure the `FlowCollector` custom resource to deploy a network policy for network observability by setting the `spec.networkPolicy.enable` specification to `true`.
If you have installed Loki, Kafka or any exporter in a different namespace that also has a network policy, you must ensure that the network observability components can communicate with them. Consider the following about your setup:
- Connection to Loki (as defined in the `FlowCollector` CR `spec.loki` parameter)
- Connection to Kafka (as defined in the `FlowCollector` CR `spec.kafka` parameter)
- Connection to any exporter (as defined in the `FlowCollector` CR `spec.exporters` parameter)
- If you are using Loki and including it in the policy target, connection to an external object storage (as defined in your `LokiStack` related secret)
Procedure
- In the web console, go to Operators → Installed Operators page.
- Under the Provided APIs heading for Network Observability, select Flow Collector.
- Select cluster then select the YAML tab.
- Configure the `FlowCollector` CR. A sample configuration is as follows:

Example `FlowCollector` CR for network policy

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  networkPolicy:
    enable: true # 1
    additionalNamespaces: ["openshift-console", "openshift-monitoring"] # 2
# ...
```

1. Deploys a network policy for the `netobserv` namespace.
2. Lists additional namespaces that are allowed to communicate with the network observability components.
Chapter 9. Network observability DNS resolution analysis
Learn how DNS resolution analysis uses eBPF-based decoding to identify service discovery issues and follow the steps to enable DNS tracking in the FlowCollector resource to enrich network flow records with domain names.
9.1. Strategic benefits of DNS resolution analysis
Use DNS resolution analysis to differentiate between network transport failures and service discovery issues by enriching eBPF flow records with domain names and status codes.
Standard flow logs only show that traffic occurred on port 53. DNS resolution analysis allows you to complete the following tasks:
- Reduce mean time to identify (MTTI): Distinguish immediately between a network routing failure and a DNS resolution failure, such as an `NXDOMAIN` error.
- Measure internal service latency: Track the time it takes for CoreDNS to respond to specific internal lookups (for example, `my-service.namespace.svc.cluster.local`).
my-service.namespace.svc.cluster.local - Audit external dependencies: Audit which external APIs or third-party domains your workloads are communicating with without requiring sidecars or manual packet captures.
- Improved security posture: Detect potential data exfiltration or Command and Control (C2) activity by auditing the Fully Qualified Domain Names (FQDNs) queried by internal workloads.
9.1.1. DNS flow enrichment
When this feature is active, the eBPF agent enriches the flow records. This metadata allows you to group and filter traffic by the intent of the connection (the domain) rather than just the source IP.
Enhanced DNS decoding allows the eBPF agent to inspect UDP and TCP DNS traffic on port 53 along with the query names for the DNS request.
9.2. Configure DNS domain tracking for network observability
Enable DNS tracking in the Network Observability Operator to monitor DNS query names, response codes, and latency for network flows within the cluster.
Prerequisites
- The Network Observability Operator is installed.
- You have `cluster-admin` privileges.
- You are familiar with the `FlowCollector` custom resource.
Procedure
1. Edit the `FlowCollector` resource by running the following command:

   ```shell
   $ oc edit flowcollector cluster
   ```

2. Configure the eBPF agent to enable the DNS tracking feature:

   ```yaml
   apiVersion: flows.netobserv.io/v1beta2
   kind: FlowCollector
   metadata:
     name: cluster
   spec:
     agent:
       type: eBPF
       ebpf:
         features:
         - DNSTracking
   ```

   where:

   `spec.agent.ebpf.features`
   Specifies the list of features to enable for the eBPF agent. To enable DNS tracking, add `DNSTracking` to this list.
- Save and exit the editor.
Verification
- In the OpenShift Container Platform web console, navigate to Observe → Network Traffic.
- In the Traffic Flows view, click the Manage columns icon.
- Ensure that the DNS Query Name, DNS Response Code, and DNS Latency columns are selected.
- Filter the results by setting Port to `53`.
53 - Confirm that the flow table columns are populated with domain names and DNS metadata.
9.3. DNS flow enrichment and analysis reference
Identify metadata added to network flows, leverage DNS data for network optimization, and understand the performance and storage impacts on the cluster.
The following table describes the metadata fields added to network flows when DNS tracking is enabled.
Query names might be missing or truncated because of compression pointers or cache limitations.
| Field | Description | Example |
|---|---|---|
| | The Fully Qualified Domain Name (FQDN) being queried. | |
| | The status code returned by the DNS server. | |
| | The transaction ID used to match queries with responses. | |
9.3.1. Leverage DNS data for network optimization
Use the captured DNS metadata for the following operational outcomes:
- Audit external dependencies: Ensure workloads are not reaching out to unauthorized external APIs or high-risk domains.
- Performance tuning: Monitor DNS latency to identify if CoreDNS pods require additional scaling or if upstream DNS providers are lagging.
9.3.2. Identify misconfiguration errors
A high frequency of `NXDOMAIN` responses often indicates a misconfiguration rather than a network failure. An `NXDOMAIN` response means the queried domain name does not exist.
When `NXDOMAIN` errors occur for internal services, verify that workloads use the fully qualified service name rather than a short form such as `my-svc.my-namespace.svc`.
For example, instead of `https://my-svc.my-namespace.svc`, use `https://my-svc.my-namespace.svc.cluster.local`.
9.3.3. Loki storage considerations
DNS tracking increases the number of labels and the amount of metadata per flow. Ensure that the Loki storage is sized to accommodate the increased log volume.
Chapter 10. Observing the network traffic
As an administrator, you can observe the network traffic in the OpenShift Container Platform web console for detailed troubleshooting and analysis. This feature helps you get insights from different graphical representations of traffic flow.
10.1. Observing the network traffic from the Overview view
The Network Traffic Overview view provides aggregated flow metrics and visual insights into application communications. Administrators can use the metrics to monitor data volume, troubleshoot connectivity, and detect unusual traffic patterns across the cluster.
The Overview view shows aggregate network traffic in your OpenShift Container Platform cluster, allowing you to see which applications are communicating and the volume of data being transferred. It provides detailed insights by source, destination, and flow type, along with the top traffic flows and average byte rates.
As an administrator, you can troubleshoot connectivity issues, detect unusual traffic patterns, and optimize application performance. It provides a quick overview of network behavior, making it easier to prioritize actions and ensure efficient resource usage.
10.1.1. Working with the Overview view
Navigate to the network traffic Overview view in the OpenShift Container Platform console to see graphical representations of flow rate statistics and configure the display scope using available options.
Prerequisite
- Access to the cluster with administrator rights.
Procedure
- Navigate to Observe → Network Traffic.
- In the Network Traffic page, click the Overview tab.
- Click the menu icon to configure the scope of each flow rate data.
10.1.2. Configuring advanced options for the Overview view
Customize the network traffic Overview view by configuring advanced options, such as graph scope, label truncation, and panel management, to refine the display of flow rate statistics and traffic data.
To access the advanced options, click Show advanced options. You can configure the details in the graph by using the Display options drop-down menu. The options available are as follows:
- Scope: Select to view the components that network traffic flows between. You can set the scope to Node, Namespace, Owner, Zones, Cluster or Resource. Owner is an aggregation of resources. Resource can be a pod, service, node, in case of host-network traffic, or an unknown IP address. The default value is Namespace.
- Truncate labels: Select the required width of the label from the drop-down list. The default value is M.
10.1.2.1. Managing panels and display
You can select the required panels to be displayed, reorder them, and focus on a specific panel. To add or remove panels, click Manage panels.
The following panels are shown by default:
- Top X average bytes rates
- Top X bytes rates stacked with total
Other panels can be added in Manage panels:
- Top X average packets rates
- Top X packets rates stacked with total
Query options allows you to choose whether to show the Top 5, Top 10, or Top 15 rates.
10.1.3. Packet drop tracking
Monitor and analyze network packet loss by using eBPF-based packet drop tracking, which identifies drop locations, detects host or OVS-specific drop reasons, and provides dedicated graphical panels in the Overview view.
You can configure graphical representation of network flow records with packet loss in the Overview view. By employing eBPF tracepoint hooks, you can gain valuable insights into packet drops for TCP, UDP, SCTP, ICMPv4, and ICMPv6 protocols, which can result in the following actions:
- Identification: Pinpoint the exact locations and network paths where packet drops are occurring. Determine whether specific devices, interfaces, or routes are more prone to drops.
- Root cause analysis: Examine the data collected by the eBPF program to understand the causes of packet drops. For example, are they a result of congestion, buffer issues, or specific network events?
- Performance optimization: With a clearer picture of packet drops, you can take steps to optimize network performance, such as adjust buffer sizes, reconfigure routing paths, or implement Quality of Service (QoS) measures.
When packet drop tracking is enabled, you can see the following panels in the Overview by default:
- Top X packet dropped state stacked with total
- Top X packet dropped cause stacked with total
- Top X average dropped packets rates
- Top X dropped packets rates stacked with total
Other packet drop panels are available to add in Manage panels:
- Top X average dropped bytes rates
- Top X dropped bytes rates stacked with total
10.1.3.1. Types of packet drops
Two kinds of packet drops are detected by network observability: host drops and OVS drops. Host drops are prefixed with `SKB_DROP` and OVS drops are prefixed with `OVS_DROP`.
Examples of host drop reasons are as follows:
- `SKB_DROP_REASON_NO_SOCKET`: the packet dropped due to a missing socket.
- `SKB_DROP_REASON_TCP_CSUM`: the packet dropped due to a TCP checksum error.
Examples of OVS drop reasons are as follows:
- `OVS_DROP_LAST_ACTION`: OVS packets dropped due to an implicit drop action, for example due to a configured network policy.
- `OVS_DROP_IP_TTL`: OVS packets dropped due to an expired IP TTL.
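Packet drop tracking is enabled through the eBPF agent feature list. A minimal sketch; note that the `PacketDrop` feature requires the agent to run in privileged mode:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      privileged: true       # required for the PacketDrop feature
      features:
      - PacketDrop
```

Once applied, the drop state and drop cause panels described above appear in the Overview view.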
See the Additional resources of this section for more information about enabling and working with packet drop tracking.
10.1.4. DNS tracking
Monitor DNS activity by using eBPF-based DNS tracking to gain insights into query patterns, detect security threats, and troubleshoot latency issues through dedicated graphical panels in the Overview view.
You can configure graphical representation of Domain Name System (DNS) tracking of network flows in the Overview view. Using DNS tracking with extended Berkeley Packet Filter (eBPF) tracepoint hooks can serve various purposes:
- Network Monitoring: Gain insights into DNS queries and responses, helping network administrators identify unusual patterns, potential bottlenecks, or performance issues.
- Security Analysis: Detect suspicious DNS activities, such as domain name generation algorithms (DGA) used by malware, or identify unauthorized DNS resolutions that might indicate a security breach.
- Troubleshooting: Debug DNS-related issues by tracing DNS resolution steps, tracking latency, and identifying misconfigurations.
By default, when DNS tracking is enabled, you can see the following non-empty metrics represented in a donut or line chart in the Overview:
- Top X DNS Response Code
- Top X average DNS latencies with overall
- Top X 90th percentile DNS latencies
Other DNS tracking panels can be added in Manage panels:
- Bottom X minimum DNS latencies
- Top X maximum DNS latencies
- Top X 99th percentile DNS latencies
This feature is supported for IPv4 and IPv6 UDP and TCP protocols.
See the Additional resources in this section for more information about enabling and working with this view.
10.1.5. Round-Trip Time
Analyze network flow latencies by using TCP Round-Trip Time (RTT) metrics, which use eBPF hookpoints to identify performance bottlenecks and troubleshoot TCP-related issues through dedicated panels in the Overview view.
You can use TCP smoothed Round-Trip Time (sRTT) to analyze network flow latencies. RTT, captured from the fentry/tcp_rcv_established eBPF hookpoint that reads sRTT from the TCP socket, can help with the following:
- Network Monitoring: Gain insights into TCP latencies, helping network administrators identify unusual patterns, potential bottlenecks, or performance issues.
- Troubleshooting: Debug TCP-related issues by tracking latency and identifying misconfigurations.
By default, when RTT is enabled, you can see the following TCP RTT metrics represented in the Overview:
- Top X 90th percentile TCP Round Trip Time with overall
- Top X average TCP Round Trip Time with overall
- Bottom X minimum TCP Round Trip Time with overall
Other RTT panels can be added in Manage panels:
- Top X maximum TCP Round Trip Time with overall
- Top X 99th percentile TCP Round Trip Time with overall
See the Additional resources in this section for more information about enabling and working with this view.
10.1.6. eBPF flow rule filter
Control packet capture volume by using eBPF flow rule filtering to specify capture criteria based on ports and CIDR notation, while monitoring filter performance through dedicated health dashboards and Prometheus metrics.
You can use rule-based filtering to control the volume of packets cached in the eBPF flow table. For example, a filter can specify that only packets coming from port 100 should be captured. Then only the packets that match the filter are captured and the rest are dropped.
You can apply multiple filter rules.
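The port-100 example above can be sketched as a FlowCollector fragment like the following. This is an illustrative sketch only; the CIDR and port values are placeholders to adapt to your environment:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true
        rules:
          - action: Accept        # keep packets that match this rule
            cidr: 0.0.0.0/0       # match any IPv4 address
            ports: 100            # capture only traffic on port 100
```

Because non-matching flows are rejected by default, a single rule like this effectively drops everything except port-100 traffic.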
10.1.6.1. Ingress and egress traffic filtering
Classless Inter-Domain Routing (CIDR) notation efficiently represents IP address ranges by combining the base IP address with a prefix length. For both ingress and egress traffic, the source IP address is first used to match filter rules configured with CIDR notation. If there is a match, then the filtering proceeds. If there is no match, then the destination IP is used to match filter rules configured with CIDR notation.
After matching either the source IP or the destination IP CIDR, you can pinpoint specific endpoints by using the peerIP parameter to narrow down which IP address the packet interacted with.
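As a sketch of this two-step matching, a rule might combine a CIDR with a peerIP; the addresses below are placeholders, not a prescribed configuration:

```yaml
flowFilter:
  enable: true
  rules:
    - action: Accept
      cidr: 10.128.0.0/14   # checked against the source IP first, then the destination IP
      peerIP: 10.10.10.10   # then narrows the match to flows involving this specific endpoint
```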
10.1.6.2. Dashboard and metrics integrations
When this option is enabled, the Netobserv/Health dashboard for eBPF agent statistics now has the Filtered flows rate view. Additionally, in Observe → Metrics you can query netobserv_agent_filtered_flows_total.
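For example, a Prometheus query along these lines (a sketch; the rate window is arbitrary) charts the rate of filtered flows:

```promql
sum(rate(netobserv_agent_filtered_flows_total[2m]))
```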
10.1.6.3. Flow filter configuration parameters
Reference the required and optional parameters for configuring flow filter rules in the FlowCollector custom resource.
| Parameter | Description |
|---|---|
| enable | Set enable to true to enable the eBPF flow filtering feature. |
| cidr | Provides the IP address and CIDR mask for the flow filter rule. Supports both IPv4 and IPv6 address format. If you want to match against any IP, you can use 0.0.0.0/0 for IPv4 or ::/0 for IPv6. |
| action | Describes the action that is taken for the flow filter rule. The possible values are Accept or Reject. |
| Parameter | Description |
|---|---|
| direction | Defines the direction of the flow filter rule. Possible values are Ingress or Egress. |
| protocol | Defines the protocol of the flow filter rule. Possible values are TCP, UDP, SCTP, ICMP, and ICMPv6. |
| tcpFlags | Defines the TCP flags to filter flows. Possible values are SYN, SYN-ACK, ACK, FIN, RST, URG, ECE, CWR, FIN-ACK, and RST-ACK. |
| ports | Defines the ports to use for filtering flows. It can be used for either source or destination ports. To filter a single port, set a single port as an integer value, for example ports: 80. To filter a range of ports, use a "start-end" range, for example ports: "80-100". |
| sourcePorts | Defines the source port to use for filtering flows. To filter a single port, set a single port as an integer value, for example sourcePorts: 80. To filter a range of ports, use a "start-end" range, for example sourcePorts: "80-100". |
| destPorts | Defines the destination ports to use for filtering flows. To filter a single port, set a single port as an integer value, for example destPorts: 80. To filter a range of ports, use a "start-end" range, for example destPorts: "80-100". |
| icmpType | Defines the ICMP type to use for filtering flows. |
| icmpCode | Defines the ICMP code to use for filtering flows. |
| peerIP | Defines the IP address to use for filtering flows, for example: 10.10.10.10. |
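Putting several of these parameters together, a hypothetical rule that accepts only ingress TCP traffic to a port range on a subnet might look like the following sketch (all values are placeholders):

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true
        rules:
          - action: Accept
            cidr: 10.128.0.0/14
            direction: Ingress
            protocol: TCP
            destPorts: "8080-8090"
```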
10.1.7. User-defined networks
Understand how you can use user-defined networks (UDN) for flexible network segmentation and leverage the Network Observability Operator to monitor these segments through dedicated labels and name filters in the traffic flow table.
User-defined networks (UDN) improve the flexibility and segmentation capabilities of the default Layer 3 topology for a Kubernetes pod network by enabling custom Layer 2 and Layer 3 network segments, where all these segments are isolated by default. These segments act as primary or secondary networks for container pods and virtual machines that use the default OVN-Kubernetes CNI plugin.
UDNs enable a wide range of network architectures and topologies, enhancing network flexibility, security, and performance.
When the UDNMapping feature is enabled, you can monitor these network segments by using dedicated labels and name filters in the traffic flow table.
10.1.8. OVN-Kubernetes networking events
Use OVN-Kubernetes network event tracking to monitor and audit network policies, admin network policies, and egress firewall rules in your cluster.
OVN-Kubernetes networking events tracking is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can use the insights from tracking network events to help with the following tasks:
- Network monitoring: Monitor allowed and blocked traffic, detecting whether packets are allowed or blocked based on network policies and admin network policies.
- Network security: You can track outbound traffic and see whether it adheres to egress firewall rules. Detect unauthorized outbound connections and flag outbound traffic that violates egress rules.
See the Additional resources in this section for more information about enabling and working with this view.
10.2. Observing the network traffic from the Traffic flows view
Use the Traffic flows view to monitor real-time and historical network communication between cluster components. By analyzing granular flow data collected via eBPF, you can audit network traffic, validate network policies, and export data for external reporting and analysis.
The Traffic flows view in the Network Observability Operator provides a granular, tabular representation of network activity across an OpenShift Container Platform cluster. By leveraging eBPF technology to collect flow data, this view allows administrators to monitor real-time and historical communication between pods, services, and nodes. This visibility is essential for auditing network traffic, validating network policies, and identifying unexpected communication patterns within the cluster infrastructure.
In the Traffic flows interface, you can analyze specific connection details by interacting with individual rows to retrieve detailed flow information. The view supports advanced customization through the Display options menu, where you can adjust row density and manage columns. By selecting and reordering specific columns, you can tailor the table to highlight the most relevant data points for your environment, such as source and destination endpoints or traffic volume.
To support external analysis and reporting, the Traffic flows view includes data export capabilities. You can export the entire dataset or select specific fields to generate a targeted report of network activity. This functionality ensures that network flow data is accessible for long-term auditing or for use in third-party monitoring tools, providing a flexible way to document and analyze the network health of your OpenShift Container Platform environment.
10.2.1. Working with the Traffic flows view
View and analyze detailed network flow information by using the Traffic flows table.
As an administrator, you can navigate to the Traffic flows table to see network flow information.
Prerequisite
- You have administrator access.
Procedure
- Navigate to Observe → Network Traffic.
- In the Network Traffic page, click the Traffic flows tab.
- Click on each row to get the corresponding flow information.
10.2.2. Traffic flow display settings
The Traffic flows view contains settings to customize the display density, data columns, and data export options.
10.2.2.1. Display options
The following elements are available in the Traffic flows view:
- Show advanced options
- Specifies a menu to customize and export the current view.
- Display options drop-down
- Specifies the row size for the data table. The default value is Normal.
- Manage columns
- Specifies a dialog to select and reorder the columns displayed in the Traffic flows table.
10.2.3. Exporting traffic flow data
Export network flow data from the Traffic flows view to a CSV file for external analysis or reporting.
Procedure
- Click Export data.
- In the window, select the Export all data checkbox to export all the data, or clear the checkbox to select the required fields to be exported.
- Click Export.
10.2.4. Configuring IPsec with the FlowCollector custom resource
Enable IPsec tracking in the FlowCollector custom resource to view the encryption status of network flows.
In OpenShift Container Platform, IPsec is disabled by default. You can enable IPsec by following the instructions in "Configuring IPsec encryption".
Prerequisite
- You have enabled IPsec encryption on OpenShift Container Platform.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster then select the YAML tab.
Configure the FlowCollector custom resource for IPsec:
Example configuration of FlowCollector for IPsec
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
      features:
        - "IPSec"
Verification
When IPsec is enabled:
- A new column named IPsec Status is displayed in the network observability Traffic flows view to show whether a flow was successfully IPsec-encrypted or if there was an error during encryption/decryption.
- A new dashboard showing the percent of encrypted traffic is generated.
10.2.5. Working with conversation tracking
Configure the FlowCollector custom resource to group network flows that are part of the same conversation and to view conversation events in the web console.
As an administrator, you can group network flows that are part of the same conversation. A conversation is defined as a grouping of peers that are identified by their IP addresses, ports, and protocols, resulting in a unique Conversation Id. You can query conversation events in the web console. These events are represented in the web console as follows:
- Conversation start: This event happens when a connection is starting or a TCP flag is intercepted.
- Conversation tick: This event happens at each specified interval defined in the FlowCollector spec.processor.conversationHeartbeatInterval parameter while the connection is active.
- Conversation end: This event happens when the FlowCollector spec.processor.conversationEndTimeout parameter is reached or the TCP flag is intercepted.
- Flow: This is the network traffic flow that occurs within the specified interval.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster then select the YAML tab.
Configure the FlowCollector custom resource so that the spec.processor.logTypes, conversationEndTimeout, and conversationHeartbeatInterval parameters are set according to your observation needs. A sample configuration is as follows:
Configure FlowCollector for conversation tracking
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    logTypes: Flows # 1
    advanced:
      conversationEndTimeout: 10s # 2
      conversationHeartbeatInterval: 30s # 3
- 1
- When logTypes is set to Flows, only the Flow event is exported. If you set the value to All, both conversation and flow events are exported and visible in the Network Traffic page. To focus only on conversation events, you can specify Conversations, which exports the Conversation start, Conversation tick, and Conversation end events; or EndedConversations, which exports only the Conversation end events. Storage requirements are highest for All and lowest for EndedConversations.
- 2
- The Conversation end event represents the point when the conversationEndTimeout is reached or the TCP flag is intercepted.
- 3
- The Conversation tick event represents each specified interval defined in the FlowCollector conversationHeartbeatInterval parameter while the network connection is active.
Note: If you update the logType option, the flows from the previous selection do not clear from the console plugin. For example, if you initially set logType to Conversations for a span of time until 10 AM and then move to EndedConversations, the console plugin shows all conversation events before 10 AM and only ended conversations after 10 AM.
- Refresh the Network Traffic page on the Traffic flows tab. Notice there are two new columns, Event/Type and Conversation Id. All the Event/Type fields are Flow when Flow is the selected query option.
- Select Query Options and choose the Log Type, Conversation. Now the Event/Type shows all of the desired conversation events.
- Next you can filter on a specific conversation ID or switch between the Conversation and Flow log type options from the side panel.
10.2.6. Working with packet drops
Enable packet drop tracking in the Network Observability Operator by configuring the FlowCollector custom resource.
Packet loss occurs when one or more packets of network flow data fail to reach their destination. You can track these drops by editing the FlowCollector custom resource.
CPU and memory usage increases when this feature is enabled.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster, and then select the YAML tab.
Configure the FlowCollector custom resource for packet drops, for example:
Example FlowCollector configuration
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
      features:
        - PacketDrop # 1
      privileged: true # 2
- 1
- Reports packet drops for each network flow by adding the PacketDrop value to the spec.agent.ebpf.features list.
- 2
- Packet drop tracking requires spec.agent.ebpf.privileged to be set to true.
Verification
When you refresh the Network Traffic page, the Overview, Traffic Flow, and Topology views display new information about packet drops:
- Select new choices in Manage panels to choose which graphical visualizations of packet drops to display in the Overview.
- Select new choices in Manage columns to choose which packet drop information to display in the Traffic flows table.
- In the Traffic Flows view, you can also expand the side panel to view more information about packet drops. Host drops are prefixed with SKB_DROP and OVS drops are prefixed with OVS_DROP.
- In the Topology view, red lines are displayed where drops are present.
10.2.7. Working with DNS tracking
Configure the FlowCollector custom resource to enable DNS tracking of network flows.
You can track DNS by editing the FlowCollector custom resource.
CPU and memory usage increases are observed in the eBPF agent when this feature is enabled.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for Network Observability, select Flow Collector.
- Select cluster then select the YAML tab.
Configure the FlowCollector custom resource. A sample configuration is as follows:
Configure FlowCollector for DNS tracking
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
      features:
        - DNSTracking # 1
      sampling: 1 # 2
- 1
- You can set the spec.agent.ebpf.features parameter list to enable DNS tracking of each network flow in the web console.
- 2
- You can set sampling to a value of 1 for more accurate metrics and to capture DNS latency. For a sampling value greater than 1, you can observe flows with DNS Response Code and DNS Id, and it is unlikely that DNS Latency can be observed.
When you refresh the Network Traffic page, there are new DNS representations you can choose to view in the Overview and Traffic Flow views and new filters you can apply.
- Select new DNS choices in Manage panels to display graphical visualizations and DNS metrics in the Overview.
- Select new choices in Manage columns to add DNS columns to the Traffic Flows view.
Filter on specific DNS metrics, such as DNS Id, DNS Error, DNS Latency, and DNS Response Code, and see more information from the side panel. The DNS Latency and DNS Response Code columns are shown by default.
NoteTCP handshake packets do not have DNS headers. TCP protocol flows without DNS headers are shown in the traffic flow data with DNS Latency, ID, and Response code values of "n/a". You can filter out flow data to view only flows that have DNS headers using the Common filter "DNSError" equal to "0".
10.2.8. Working with RTT tracing
Enable Round Trip Time (RTT) tracing by configuring the FlowCollector custom resource.
You can track RTT by editing the FlowCollector custom resource.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster, and then select the YAML tab.
Configure the FlowCollector custom resource for RTT tracing, for example:
Example FlowCollector configuration
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
      features:
        - FlowRTT # 1
- 1
- You can start tracing RTT network flows by listing the FlowRTT parameter in the spec.agent.ebpf.features specification list.
Verification
After the Network Traffic page is refreshed, the Overview, Traffic flows, and Topology views display RTT information.
- In the Overview view, click Manage panels to select the RTT graphical visualizations to display.
- In the Traffic flows table, verify that the Flow RTT column is visible by default. To manage columns, click Manage columns.
In the Traffic flows view, expand the side panel to view RTT metadata:
- Filter the flow data for the TCP protocol by entering protocol=TCP in the filter search bar.
- Verify that all TCP filtered flows have FlowRTT values greater than 0.
- Filter for FlowRTT values greater than 10,000,000 nanoseconds (10 ms) by entering time_flow_rtt>=10000000 in the filter search bar.
- Remove the filters.
- In the Topology view, click the Display option drop-down menu. In the Edge labels list, select RTT.
10.2.9. Working with the eBPF Manager Operator
Integrate the eBPF Manager Operator with Network Observability to manage eBPF programs and reduce the need for privileged agent permissions.
The eBPF Manager Operator reduces the attack surface and ensures compliance, security, and conflict prevention by managing all eBPF programs. Network observability can use the eBPF Manager Operator to load hooks. As a result, you no longer need to provide the eBPF Agent with privileged mode or additional Linux capabilities such as CAP_BPF and CAP_PERFMON.
eBPF Manager Operator with network observability is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Procedure
- In the web console, navigate to Operators → Operator Hub.
- Install eBPF Manager.
- Check Workloads → Pods in the bpfman namespace to make sure they are all up and running.
- Configure the FlowCollector custom resource to use the eBPF Manager Operator:
Example FlowCollector configuration
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      features:
        - EbpfManager
Verification
- In the web console, navigate to Operators → Installed Operators.
Click eBPF Manager Operator → All instances tab.
For each node, verify that a BpfApplication named netobserv and a pair of BpfProgram objects, one for Traffic Control (TCx) ingress and another for TCx egress, exist. If you enable other eBPF Agent features, you might have more objects.
10.2.10. Using the histogram
The histogram provides a visualization of network flow logs that you can use to analyze traffic volume trends and filter flow data by specific time intervals.
You can click Show histogram to display a toolbar view for visualizing the history of flows as a bar chart. The histogram shows the number of logs over time. You can select a part of the histogram to filter the network flow data in the table that follows the toolbar.
10.2.11. Working with availability zones
Configure the FlowCollector custom resource to display availability zone information for network traffic.
You can configure the FlowCollector custom resource to collect information about the cluster availability zones. This allows you to enrich the network flow data with the topology.kubernetes.io/zone label value applied to the nodes.
Procedure
- In the web console, go to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster then select the YAML tab.
Configure the FlowCollector custom resource so that the spec.processor.addZone parameter is set to true. A sample configuration is as follows:
Configure FlowCollector for availability zones collection
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  # ...
  processor:
    addZone: true
  # ...
Verification
When you refresh the Network Traffic page, the Overview, Traffic Flow, and Topology views display new information about availability zones:
- In the Overview tab, you can see Zones as an available Scope.
- In Network Traffic → Traffic flows, Zones are viewable under the SrcK8S_Zone and DstK8S_Zone fields.
- In the Topology view, you can set Zones as Scope or Group.
10.2.12. Filtering eBPF flow data using multiple rules
Configure multiple filtering rules in the FlowCollector custom resource to control which eBPF flow data is captured.
- You cannot use duplicate Classless Inter-Domain Routing (CIDRs) in filter rules.
- When an IP address matches multiple filter rules, the rule with the most specific CIDR prefix (longest prefix) takes precedence.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for Network Observability, select Flow Collector.
- Select cluster, then select the YAML tab.
- Configure the FlowCollector custom resource.
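To illustrate the longest-prefix precedence described above, the following sketch samples the wider pod network at 1:50 while capturing every flow for a more specific subnet; the CIDRs are placeholders:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true
        rules:
          - action: Accept
            cidr: 10.128.0.0/14
            sampling: 50          # broader prefix: sample 1:50
          - action: Accept
            cidr: 10.128.2.0/24   # longer prefix takes precedence for IPs in this range
            sampling: 1           # capture all flows
```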
10.2.13. eBPF flow data filtering examples
Use these FlowCollector configuration examples to filter eBPF flow data.
10.2.13.1. Example YAML to sample all North-South traffic, and 1:50 East-West traffic
By default, all other flows are rejected.
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Service
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true # 1
        rules:
          - action: Accept # 2
            cidr: 0.0.0.0/0 # 3
            sampling: 1 # 4
          - action: Accept
            cidr: 10.128.0.0/14
            peerCIDR: 10.128.0.0/14 # 5
          - action: Accept
            cidr: 172.30.0.0/16
            peerCIDR: 10.128.0.0/14
            sampling: 50
- 1
- Enables eBPF flow filtering by setting spec.agent.ebpf.flowFilter.enable to true.
- 2
- Defines the action for the flow filter rule. Valid values are Accept or Reject.
- 3
- Defines the IP address and CIDR mask for the flow filter rule. This parameter supports both IPv4 and IPv6 address formats. Use 0.0.0.0/0 for IPv4 or ::/0 for IPv6 to match any IP address.
- 4
- Defines the sampling interval for matched flows and overrides the global sampling setting (spec.agent.ebpf.sampling).
- 5
- Filters flows by Peer IP CIDR.
10.2.13.2. Example YAML to filter flows with packet drops
By default, all other flows are rejected.
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Service
  agent:
    type: eBPF
    ebpf:
      privileged: true # 1
      features:
        - PacketDrop # 2
      flowFilter:
        enable: true # 3
        rules:
          - action: Accept # 4
            cidr: 172.30.0.0/16
            pktDrops: true # 5
- 1
- Enables packet drops reporting by setting spec.agent.ebpf.privileged to true.
- 2
- Reports packet drops for each network flow by adding the PacketDrop value to the spec.agent.ebpf.features list.
- 3
- Enables eBPF flow filtering by setting spec.agent.ebpf.flowFilter.enable to true.
- 4
- Defines the action for the flow filter rule. Valid values are Accept or Reject.
- 5
- Filters flows that contain drops by setting pktDrops to true.
10.2.14. Endpoint translation (xlat)
Endpoint translation (xlat) uses eBPF to enrich network flow logs with translated pod-level metadata, providing visibility into the specific backend pods serving traffic behind services or load balancers.
You can gain visibility into the endpoints serving traffic in a consolidated view using network observability and extended Berkeley Packet Filter (eBPF). Typically, when traffic flows through a service, egressIP, or load balancer, the traffic flow information is abstracted as it is routed to one of the available pods. If you try to get information about the traffic, you can only view service related info, such as service IP and port, and not information about the specific pod that is serving the request. Often the information for both the service traffic and the virtual service endpoint is captured as two separate flows, which complicates troubleshooting.
To solve this, endpoint xlat can help in the following ways:
- Capture the network flows at the kernel level, which has a minimal impact on performance.
- Enrich the network flows with translated endpoint information, showing not only the service but also the specific backend pod, so you can see which pod served a request.
As network packets are processed, the eBPF hook enriches flow logs with metadata about the translated endpoint that includes the following pieces of information that you can view in the Network Traffic page in a single row:
- Source Pod IP
- Source Port
- Destination Pod IP
- Destination Port
- Conntrack Zone ID
10.2.15. Working with endpoint translation (xlat)
Enable endpoint translation (xlat) in the FlowCollector custom resource to enrich network flows with translated endpoint information.
You can use network observability and eBPF to enrich network flows from a Kubernetes service with translated endpoint information, gaining insight into the endpoints serving traffic.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster, and then select the YAML tab.
Configure the FlowCollector custom resource for PacketTranslation, for example:
Example FlowCollector configuration
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  agent:
    type: eBPF
    ebpf:
      features:
        - PacketTranslation # 1
- 1
- You can start enriching network flows with translated packet information by listing the PacketTranslation parameter in the spec.agent.ebpf.features specification list.
Refresh the Network Traffic page to filter for information about translated packets:
- Filter the network flow data based on Destination kind: Service.
You can see the xlat column, which distinguishes where translated information is displayed, and the following default columns:
- Xlat Zone ID
- Xlat Src Kubernetes Object
- Xlat Dst Kubernetes Object
- You can manage the display of additional xlat columns in Manage columns.
10.2.16. Working with user-defined networks
Configure the FlowCollector custom resource to monitor traffic on user-defined networks (UDN).
You can enable user-defined networks (UDN) in network observability resources. The following example shows the configuration for the FlowCollector custom resource.
Prerequisite
- You have configured UDN in Red Hat OpenShift Networking. For more information, see "Creating a UserDefinedNetwork by using the CLI" or "Creating a UserDefinedNetwork by using the web console."
Procedure
Edit the network observability FlowCollector resource by running the following command:
$ oc edit flowcollector
Configure the ebpf section of the FlowCollector resource:
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      sampling: 1 # 1
      privileged: true
      features:
        - UDNMapping
- 1
- Recommended so all flows are observed.
Verification
Refresh the Network Traffic page to view updated UDN information in the Traffic Flow and Topology views:
- In Network Traffic → Traffic flows, you can view UDNs under the SrcK8S_NetworkName and DstK8S_NetworkName fields.
- In the Topology view, you can set Network as Scope or Group.
10.2.17. Viewing network events
Configure the FlowCollector custom resource to view OVN-Kubernetes networking events in the Network Traffic view.
OVN-Kubernetes networking events tracking is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can edit the FlowCollector custom resource to view information about network events that are triggered by the following resources:
- NetworkPolicy
- AdminNetworkPolicy
- BaselineNetworkPolicy
- EgressFirewall
- UserDefinedNetwork isolation
- Multicast ACLs
Prerequisites
- You must have enabled OVNObservability by setting the TechPreviewNoUpgrade feature set in the FeatureGate custom resource (CR) named cluster. For more information, see "Enabling feature sets using the CLI" and "Checking OVN-Kubernetes network traffic with OVS sampling using the CLI".
- You have created at least one of the following network APIs: NetworkPolicy, AdminNetworkPolicy, BaselineNetworkPolicy, UserDefinedNetwork isolation, multicast, or EgressFirewall.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster, and then select the YAML tab.
Configure the FlowCollector CR to enable viewing NetworkEvents, for example:
Example FlowCollector configuration
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      # sampling: 1 # 1
      privileged: true # 2
      features:
        - "NetworkEvents"
- 1
- Optional: The sampling parameter is set to a value of 1 so that all network events are captured. If a sampling of 1 is too resource heavy, set sampling to something more appropriate for your needs.
- 2
- The privileged parameter is set to true because the OVN observability library needs to access the local Open vSwitch (OVS) socket and OpenShift Virtual Network (OVN) databases.
Verification
- Navigate to the Network Traffic view and select the Traffic flows table.
You should see the new column, Network Events, where you can view information about impacts of one of the following network APIs you have enabled: NetworkPolicy, AdminNetworkPolicy, BaselineNetworkPolicy, UserDefinedNetwork isolation, multicast, or egress firewalls.
An example of the kind of events you could see in this column is as follows:
Example of Network Events output
<Dropped_or_Allowed> by <network_event_and_event_name>, direction <Ingress_or_Egress>
10.3. Observing the network traffic from the Topology view
The Topology view in the Network Traffic page provides a graphical representation of network flows and traffic volume across your OpenShift Container Platform cluster. As an administrator, you can use this view to monitor application traffic data and visualize the relationships between various network components.
The visualization represents network entities as nodes and traffic flows as edges. By selecting individual components within the graph, you can access a side panel containing specific metrics and health details for that resource. This interactive approach allows for rapid identification of traffic patterns and connectivity issues within the cluster.
To manage complex environments, the Topology view includes advanced configuration options that allow you to customize the layout and data density. You can adjust the Scope of the view, apply Groups to represent resource ownership, and choose different Layout algorithms to optimize the graphical display. Additionally, you can enable Edge labels to show real-time measurements, such as the average byte rate, directly on the flow lines.
For reporting or external analysis, the Topology view provides an export feature. You can download the current graphical representation as a PNG image or generate a direct link to the specific view configuration to share with other administrators. These tools ensure that network insights are both accessible and easily documented.
10.3.1. Working with the Topology view
Access the Topology view to visually inspect cluster network relationships and select individual components to view detailed traffic metrics and metadata.
As an administrator, you can navigate to the Topology view to see the details and metrics of the component.
Prerequisites
- You have administrator access.
Procedure
- Navigate to Observe → Network Traffic.
- In the Network Traffic page, click the Topology tab.
- Click each component in the Topology tab to view its details and metrics.
10.3.2. Configuring the advanced options for the Topology view
Review the available advanced options in the Topology view to customize display settings, configure component grouping and layouts, and export the network graph as an image.
You can customize and export the view by using Show advanced options. The advanced options view has the following features:
- Find in view: To search the required components in the view.
Display options: To configure the following options:
- Edge labels: To show the specified measurements as edge labels. The default is to show the Average rate in Bytes.
- Scope: To select the scope of components between which the network traffic flows. The default value is Namespace.
- Groups: To enhance the understanding of ownership by grouping the components. The default value is None.
- Layout: To select the layout of the graphical representation. The default value is ColaNoForce.
- Show: To select the details that need to be displayed. All the options are checked by default. The options available are: Edges, Edges label, and Badges.
- Truncate labels: To select the required width of the label from the drop-down list. The default value is M.
- Collapse groups: To expand or collapse the groups. The groups are expanded by default. This option is disabled if Groups has the value of None.
10.3.2.1. Exporting the topology view
To export the view, click Export topology view. The view is downloaded in PNG format.
10.4. Filtering the network traffic
Review the available query options and filtering parameters in the Network Traffic view to optimize data searches, analyze specific log types, and manage directional traffic visibility.
By default, the Network Traffic page displays the traffic flow data in the cluster based on the default filters configured in the FlowCollector resource.
Alternatively, you can access the traffic flow data in the Network Traffic tab of the Namespaces, Services, Routes, Nodes, and Workloads pages which provide the filtered data of the corresponding aggregations.
- Query Options
You can use Query Options to optimize the search results, as listed below:
- Log Type: The available options Conversation and Flows provide the ability to query flows by log type, such as flow log, new conversation, completed conversation, and a heartbeat, which is a periodic record with updates for long conversations. A conversation is an aggregation of flows between the same peers.
- Match filters: You can determine the relation between different filter parameters selected in the advanced filter. The available options are Match all and Match any. Match all provides results that match all the values, and Match any provides results that match any of the values entered. The default value is Match all.
- Datasource: You can choose the datasource to use for queries: Loki, Prometheus, or Auto. Notable performance improvements can be realized when using Prometheus as a datasource rather than Loki, but Prometheus supports a limited set of filters and aggregations. The default datasource is Auto, which uses Prometheus on supported queries or uses Loki if the query does not support Prometheus.
Drops filter: You can view different levels of dropped packets with the following query options:
- Fully dropped shows flow records with fully dropped packets.
- Containing drops shows flow records that contain drops but can be sent.
- Without drops shows records that contain sent packets.
- All shows all the aforementioned records.
- Limit: The data limit for internal backend queries. Depending on the matching and filter settings, the number of traffic flow records displayed is capped at the specified limit.
- Quick filters
- The default values in the Quick filters drop-down menu are defined in the FlowCollector configuration. You can modify the options from the console.
- Advanced filters
- You can set the advanced filters, Common, Source, or Destination, by selecting the parameter to be filtered from the dropdown list. The flow data is filtered based on the selection. To enable or disable the applied filter, you can click on the applied filter listed below the filter options.
You can toggle between One way and Back and forth filtering. The One way filter shows only Source and Destination traffic according to your filter selections. You can use Swap to change the directional view of the Source and Destination traffic. The Back and forth filter includes return traffic with the Source and Destination filters. The directional flow of network traffic is shown in the Direction column in the Traffic flows table as Ingress or Egress.
You can click Reset defaults to remove the existing filters and apply the filters defined in the FlowCollector configuration.
To understand the rules of specifying the text value, click Learn More.
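The quick filters described in this section are defined in the FlowCollector resource. The following excerpt is a minimal sketch assuming the spec.consolePlugin.quickFilters structure; the filter name and values shown are illustrative:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    quickFilters:
    # A quick filter that hides infrastructure namespaces; "!" negates the match
    - name: Applications
      filter:
        src_namespace!: 'openshift-,netobserv'
        dst_namespace!: 'openshift-,netobserv'
      default: true
```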
Chapter 11. Network observability health rules
The Network Observability Operator provides alerts by using built-in metrics and the OpenShift Container Platform monitoring stack to report cluster network health.
Network observability health alerts require OpenShift Container Platform 4.16 or later.
11.1. Network observability rules for health and performance
Network observability includes a system for managing Prometheus-based rules. Use these rules to monitor the health and performance of OpenShift Container Platform applications and infrastructure.
The Network Observability Operator converts these rules into a PrometheusRule resource, which includes the following rule types:
- Alerting rules: Specifies rules managed by the Prometheus Alertmanager to provide notification of network anomalies or infrastructure failures.
- Recording rules: Pre-computes complex Prometheus Query Language (PromQL) expressions into new time series to improve dashboard performance and visualization.
View the PrometheusRule resource in the netobserv namespace by running the following command:
$ oc get prometheusrules -n netobserv -o yaml
11.1.1. Network health monitoring and alerting rules
The Network Observability Operator includes a rule-based system to detect network anomalies and infrastructure failures. By converting configurations into alerting rules, the Operator enables automated monitoring and troubleshooting through the OpenShift Container Platform web console.
11.1.1.1. Monitoring outcomes
The Network Observability Operator surfaces network status in the following areas:
- Alerting UI
- Specific alerts appear in Observe → Alerting, where notifications are managed through the Prometheus Alertmanager.
- Network Health dashboard
- A specialized dashboard in Observe → Network Health provides a high-level summary of cluster network status.
The Network Health dashboard categorizes violations into tabs to isolate the scope of an issue:
- Global: Aggregate health of the entire cluster.
- Nodes: Violations specific to infrastructure nodes.
- Namespaces: Violations specific to individual namespaces.
- Workloads: Violations specific to resources, such as Deployments or DaemonSets.
11.1.1.2. Predefined health rules
The Network Observability Operator provides default rules for common networking scenarios. These rules are active only if the corresponding feature is enabled in the FlowCollector resource.
The following list contains a subset of available default rules:
PacketDropsByDevice
- Triggers on a high percentage of packet drops from network devices. It is based on standard node-exporter metrics and does not require the PacketDrop agent feature.
PacketDropsByKernel
- Triggers on a high percentage of packet drops by the kernel. Requires the PacketDrop agent feature.
IPsecErrors
- Triggers when IPsec encryption errors are detected. Requires the IPSec agent feature.
NetpolDenied
- Triggers when traffic denied by network policies is detected. Requires the NetworkEvents agent feature.
LatencyHighTrend
- Triggers when a significant increase in TCP latency is detected. Requires the FlowRTT agent feature.
DNSErrors
- Triggers when DNS errors are detected. Requires the DNSTracking agent feature.
Operational alerts for the Network Observability Operator:
NetObservNoFlows- Triggers when the pipeline is active but no flows are observed.
NetObservLokiError- Triggers when flows are dropped because of Loki errors.
For a complete list of rules and runbooks, see the Network Observability Operator runbooks.
11.1.1.3. Rule dependencies and feature requirements
The Network Observability Operator creates rules based on the features enabled in the FlowCollector resource. For example, packet drop-related rules are created only if the PacketDrop agent feature is enabled and the corresponding metrics are included in spec.processor.metrics.includeList in the FlowCollector resource.
11.2. Performance optimization with recording rules
For large-scale clusters, recording rules optimize how Prometheus handles network data. Recording rules improve dashboard responsiveness and reduce the computational overhead of complex queries.
11.2.1. Optimization benefits
Recording rules pre-compute complex Prometheus Query Language (PromQL) expressions and save the results as new time series. Unlike alerting rules, recording rules do not monitor thresholds.
Using recording rules provides the following advantages:
- Improved performance
- Pre-computing Prometheus queries allows dashboards to load faster by avoiding on-demand calculations for long-term trends.
- Resource efficiency
- Calculating data at fixed intervals reduces CPU load on the Prometheus server compared to recalculating data on every dashboard refresh.
- Simplified queries
-
Using short metric names, such as cluster:network_traffic:rate_5m, simplifies complex aggregate calculations in custom dashboards.
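As an illustration of this approach, a recording rule that pre-computes such a metric could be expressed as the following PrometheusRule fragment. This is a minimal sketch: the rule name and expression are hypothetical examples, not rules shipped by the Operator.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: netobserv-recording-example
  namespace: netobserv
spec:
  groups:
  - name: NetObservRecording
    rules:
    # Pre-compute the cluster-wide ingress byte rate into a short-named series
    - record: cluster:network_traffic:rate_5m
      expr: sum(rate(netobserv_workload_ingress_bytes_total[5m]))
```

Dashboards can then query cluster:network_traffic:rate_5m directly instead of recalculating the aggregation on every refresh.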
11.2.2. Comparison of rule modes
The following table compares rule modes based on the expected outcome:
| Description | Alerting rules | Recording rules |
|---|---|---|
| Goal | Issue notification. | Save history of high level metrics. |
| Data result | Generates an alerting state. | Creates a persistent metric. |
| Visibility | Alerting UI and Network Health view. | Metrics Explorer and Network Health view. |
| Notifications | Triggers Alertmanager notifications. | Does not trigger notifications. |
11.3. Network observability health rule structure and customization
Health rules in the Network Observability Operator are defined by using rule templates and variants in the spec.processor.metrics.healthRules section of the FlowCollector resource.
For each template, you can define a list of variants, each with their own thresholds and grouping configurations. For more information, see "List of default alert templates".
The following example shows an alert:
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: flow-collector
spec:
  processor:
    metrics:
      healthRules:
      - template: PacketDropsByKernel
        mode: Alert # or Recording
        variants:
        # triggered when the whole cluster traffic (no grouping) reaches 10% of drops
        - thresholds:
            critical: "10"
        # triggered when per-node traffic reaches 5% of drops, with gradual severity
        - thresholds:
            critical: "15"
            warning: "10"
            info: "5"
          groupBy: Node
where:
spec.processor.metrics.healthRules.template
- Specifies the name of the predefined rule template.
spec.processor.metrics.healthRules.mode
- Specifies whether the rule functions as an Alert or a Recording rule. This setting can be defined either per variant or for the whole template.
spec.processor.metrics.healthRules.variants.thresholds
- Specifies the numerical values that trigger the rule. You can define multiple severity levels, such as critical, warning, or info, within a single variant.
cluster-wide variant
- Specifies a variant defined without a groupBy setting. In the provided example, this variant triggers when the total cluster traffic reaches 10% drops.
spec.processor.metrics.healthRules.variants.groupBy
- Specifies the dimension used to aggregate the metric. In the provided example, the alert is evaluated independently for each Node.
Customizing a rule replaces the default configuration for that template. If you want to keep the default configurations, you must manually replicate them.
11.3.1. PromQL expressions and metadata for health rules
Learn about the base query for Prometheus Query Language (PromQL) expressions and the metadata used in health rules.
The health rule API in the network observability FlowCollector CR generates PrometheusRule resources. You can view the PrometheusRule resources in the netobserv namespace by running the following command:
$ oc get prometheusrules -n netobserv -o yaml
11.3.1.1. An example query for an alert in a surge of incoming traffic
This example provides the base PromQL query for an alert in a surge of incoming traffic:
sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m])) by (DstK8S_Namespace)
This query calculates the byte rate coming from the openshift-ingress namespace, aggregated by destination namespace.
You can customize the query, including retaining only some rates, running the query for specific time periods, and setting a final threshold.
- Filtering noise
Appending > 1000 to this query retains only the observed rates that are greater than 1 KB/s, which eliminates noise from low-bandwidth consumers:
(sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m])) by (DstK8S_Namespace) > 1000)
The byte rate is relative to the sampling interval defined in the FlowCollector custom resource (CR) configuration. If the sampling interval is 1:100, the actual traffic might be approximately 100 times higher than the reported metrics.
- Time comparison
You can run the same query for a particular period of time by using the offset modifier. For example, a query for one day earlier can be run by using offset 1d, and a query for five hours ago can be run by using offset 5h:
sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m] offset 1d)) by (DstK8S_Namespace)
You can use the formula 100 * (<query now> - <query from the previous day>) / <query from the previous day> to calculate the percentage of increase compared to the previous day. This value can be negative if the byte rate today is lower than the previous day.
- Final threshold
You can apply a final threshold to filter out increases that are lower than the desired percentage. For example, > 100 eliminates increases that are lower than 100%.
Together, the complete expression for the PrometheusRule resource is as follows:
...
  expr: |-
    (100 *
    (
    (sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m])) by (DstK8S_Namespace) > 1000)
    - sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m] offset 1d)) by (DstK8S_Namespace)
    )
    / sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m] offset 1d)) by (DstK8S_Namespace))
    > 100
11.3.1.2. Alert metadata fields
The Network Observability Operator uses components from other OpenShift Container Platform features, such as the monitoring stack, to enhance visibility into network traffic. For more information, see "Monitoring stack architecture".
Some metadata must be configured for the rule definitions. This metadata is used by Prometheus and the Alertmanager.
The following example shows an AlertingRule resource:
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservAlerts
    rules:
    - alert: NetObservIncomingBandwidth
      annotations:
        netobserv_io_network_health: '{"namespaceLabels":["DstK8S_Namespace"],"threshold":"100","unit":"%","upperBound":"500"}'
        message: |-
          NetObserv is detecting a surge of incoming traffic: current traffic to {{ $labels.DstK8S_Namespace }} has increased by more than 100% since yesterday.
        summary: "Surge in incoming traffic"
      expr: |-
        (100 *
        (
        (sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m])) by (DstK8S_Namespace) > 1000)
        - sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m] offset 1d)) by (DstK8S_Namespace)
        )
        / sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[30m] offset 1d)) by (DstK8S_Namespace))
        > 100
      for: 1m
      labels:
        app: netobserv
        netobserv: "true"
        severity: warning
where:
spec.groups.rules.alert.labels.netobserv
- Specifies the alert for the Network Health dashboard to detect when set to true.
spec.groups.rules.alert.labels.severity
- Specifies the severity of the alert. The following values are valid: critical, warning, or info.
You can leverage the output labels from the defined PromQL expression in the message annotation. For example, the DstK8S_Namespace label is referenced as {{ $labels.DstK8S_Namespace }}.
The netobserv_io_network_health annotation provides display metadata for the Network Health dashboard. The netobserv_io_network_health annotation supports the following fields:
| Field | Type | Description |
|---|---|---|
| namespaceLabels | List of strings | One or more labels that hold namespaces. When provided, the alert appears under the Namespaces tab. |
| nodeLabels | List of strings | One or more labels that hold node names. When provided, the alert appears under the Nodes tab. |
|  | List of strings | One or more labels that hold owner or workload names. When provided alongside namespace labels, the alert appears under the Workloads tab. |
|  | List of strings | One or more labels that hold owner or workload kinds. When provided alongside the owner or workload names, the alert appears under the Workloads tab. |
| threshold | String | The alert threshold, expected to match the threshold defined in the alert expression. |
| unit | String | The data unit, used only for display purposes. |
| upperBound | String | An upper bound value used to compute the score on a closed scale. Metric values exceeding this bound are clamped. |
|  | List of objects | A list of links to display contextually with the alert. |
|  | String | Information related to the link to the Network Traffic page, for URL building. Some filters are set automatically. |
The following filter options relate to the namespaceLabels and nodeLabels fields:
| Field | Description |
|---|---|
|  | Additional filter to inject (for example, a DNS response code for DNS-related alerts). |
|  | Whether the filter should include return traffic. |
|  | Whether the filter should target the destination of the traffic instead of the source. |
11.3.2. Custom health rule configuration
Use the Prometheus Query Language (PromQL) to create a custom AlertingRule resource for network health monitoring.
Prerequisites
- Familiarity with PromQL.
- You have installed OpenShift Container Platform 4.16 or later.
- You have access to the cluster as a user with the cluster-admin role.
- You have installed the Network Observability Operator.
Procedure
- Create a YAML file named custom-alert.yaml that contains your AlertingRule resource.
- Apply the custom alert rule by running the following command:
$ oc apply -f custom-alert.yaml
Verification
- Verify that the PrometheusRule resource was created in the netobserv namespace by running the following command:
$ oc get prometheusrules -n netobserv -o yaml
The output should include the netobserv-alerts rule you just created, confirming that the resource was generated correctly.
- Confirm the rule is active by checking the Network Health dashboard in the OpenShift Container Platform web console → Observe.
11.4. Disabling predefined rules
Rule templates can be disabled in the spec.processor.metrics.disableAlerts field of the FlowCollector resource. If a template is disabled and also overridden in the spec.processor.metrics.healthRules field, it remains disabled.
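For example, the following FlowCollector excerpt is a minimal sketch that disables the two operational alert templates named earlier in this chapter:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      # Disable the listed predefined rule templates
      disableAlerts: [NetObservNoFlows, NetObservLokiError]
```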
Chapter 12. Using metrics with dashboards and alerts
The Network Observability Operator uses the flowlogs-pipeline component to generate metrics from flow logs. You can use these metrics in dashboards and alerts.
12.1. Viewing network observability metrics dashboards
View network observability metrics dashboards using the Overview tab in the OpenShift Container Platform console to monitor overall traffic flow and system health, with options to filter metrics by node, namespace, owner, pod, and service.
Procedure
- In the web console, navigate to Observe → Dashboards, and select the Netobserv dashboard.
View network traffic metrics in the following categories, with each having the subset per node, namespace, source, and destination:
- Byte rates
- Packet drops
- DNS
- RTT
- Select the Netobserv/Health dashboard.
View metrics about the health of the Operator in the following categories, with each having the subset per node, namespace, source, and destination:
- Flows
- Flows Overhead
- Flow rates
- Agents
- Processor
- Operator
Infrastructure and Application metrics are shown in a split-view for namespace and workloads.
12.2. Network observability metrics
Review the comprehensive list of network observability metrics, prefixed by netobserv_, that you can enable in the FlowCollector resource.
Metrics generated by the flowlogs-pipeline component are configurable in the spec.processor.metrics.includeList field of the FlowCollector resource.
You can also create alerts by using the includeList metrics.
When looking for these metrics in Prometheus, such as in the Console through Observe → Metrics, or when defining alerts, all the metric names are prefixed with netobserv_, for example, netobserv_namespace_flows_total.
- includeList metrics names
Names followed by an asterisk (*) are enabled by default.
- namespace_egress_bytes_total
- namespace_egress_packets_total
- namespace_ingress_bytes_total
- namespace_ingress_packets_total
- namespace_flows_total *
- node_egress_bytes_total
- node_egress_packets_total
- node_ingress_bytes_total *
- node_ingress_packets_total
- node_flows_total
- workload_egress_bytes_total
- workload_egress_packets_total
- workload_ingress_bytes_total *
- workload_ingress_packets_total
- workload_flows_total
- PacketDrop metrics names
When the PacketDrop feature is enabled in spec.agent.ebpf.features (with privileged mode), the following additional metrics are available:
- namespace_drop_bytes_total
- namespace_drop_packets_total *
- node_drop_bytes_total
- node_drop_packets_total
- workload_drop_bytes_total
- workload_drop_packets_total
- DNS metrics names
When the DNSTracking feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:
- namespace_dns_latency_seconds *
- node_dns_latency_seconds
- workload_dns_latency_seconds
- FlowRTT metrics names
When the FlowRTT feature is enabled in spec.agent.ebpf.features, the following additional metrics are available:
- namespace_rtt_seconds *
- node_rtt_seconds
- workload_rtt_seconds
- Network events metrics names
When the NetworkEvents feature is enabled, this metric is available by default:
- namespace_network_policy_events_total
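As an example of querying these metrics, the following PromQL sketch returns the five namespaces receiving the most flows; the label name follows the DstK8S_Namespace convention used elsewhere in this chapter:

```promql
topk(5, sum(rate(netobserv_namespace_flows_total[2m])) by (DstK8S_Namespace))
```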
12.3. Creating alerts
Create custom AlertingRule resources based on the Netobserv metrics.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
- You have the Network Observability Operator installed.
Procedure
- Create a YAML file by clicking the import icon, +.
Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when the cluster ingress traffic reaches a given threshold of 10 MBps per destination workload.

apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: NetObservAlerts
    rules:
    - alert: NetObservIncomingBandwidth
      annotations:
        message: |-
          {{ $labels.job }}: incoming traffic exceeding 10 MBps for 30s on {{ $labels.DstK8S_OwnerType }} {{ $labels.DstK8S_OwnerName }} ({{ $labels.DstK8S_Namespace }}).
        summary: "High incoming traffic."
      expr: sum(rate(netobserv_workload_ingress_bytes_total{SrcK8S_Namespace="openshift-ingress"}[1m])) by (job, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType) > 10000000 # <1>
      for: 30s
      labels:
        severity: warning

- 1
- The netobserv_workload_ingress_bytes_total metric is enabled by default in spec.processor.metrics.includeList.
- Click Create to apply the configuration file to the cluster.
12.4. Custom metrics
Define custom metrics from flow log data by using the FlowMetric API.
In every flow log that is collected, there are several fields labeled per log, such as source name and destination name. These fields can be leveraged as Prometheus labels to enable the customization of cluster information on your dashboard.
12.5. Configuring custom metrics by using FlowMetric API
Configure the FlowMetric resource to create custom metrics by using flow log data.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project: dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Configure the FlowMetric resource. See "Custom metrics configuration examples".
Verification
- Once the pods refresh, navigate to Observe → Metrics.
- In the Expression field, type the metric name to view the corresponding result. You can also enter an expression, such as topk(5, sum(rate(netobserv_cluster_external_ingress_bytes_total{DstK8S_Namespace="my-namespace"}[2m])) by (DstK8S_HostName, DstK8S_OwnerName, DstK8S_OwnerType)).
12.5.1. Custom metrics configuration examples
To monitor specific network behaviors not covered by default metrics, such as external traffic volume or latency spikes, use the FlowMetric API as shown in the following examples.
12.5.1.1. Tracking ingress bytes from cluster external sources
To measure the volume of data entering the cluster from external networks, use the following FlowMetric resource example:
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic
  namespace: netobserv # <1>
spec:
  metricName: cluster_external_ingress_bytes_total # <2>
  type: Counter # <3>
  valueField: Bytes
  direction: Ingress # <4>
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType] # <5>
  filters: # <6>
  - field: SrcSubnetLabel
    matchType: Absence
- 1
- The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace field, which is netobserv by default.
- The name of the Prometheus metric, which in the web console appears with the prefix netobserv_<metricName>.
- The type parameter specifies the type of metric. The Counter type is useful for counting bytes or packets.
- The direction of traffic to capture. If not specified, both ingress and egress are captured, which can lead to duplicated counts.
- 5
- Labels define what the metrics look like, describe the relationship between the different entities, and define the metrics cardinality. For example, SrcK8S_Name is a high cardinality label.
- 6
- Refines results based on the listed criteria. In this example, selecting only the cluster external traffic is done by matching only flows where SrcSubnetLabel is absent. This assumes the subnet labels feature is enabled (via spec.processor.subnetLabels), which is done by default.
12.5.1.2. Monitoring RTT latency for cluster external ingress traffic
To analyze the performance of external connections and identify high-latency paths, use the following FlowMetric resource example:
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-rtt
  namespace: netobserv # <1>
spec:
  metricName: cluster_external_ingress_rtt_seconds
  type: Histogram # <2>
  valueField: TimeFlowRttNs
  direction: Ingress
  labels: [DstK8S_HostName,DstK8S_Namespace,DstK8S_OwnerName,DstK8S_OwnerType]
  filters:
  - field: SrcSubnetLabel
    matchType: Absence
  - field: TimeFlowRttNs
    matchType: Presence
  divider: "1000000000" # <3>
  buckets: [".001", ".005", ".01", ".02", ".03", ".04", ".05", ".075", ".1", ".25", "1"] # <4>
- 1
- The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace field, which is netobserv by default.
- The type parameter specifies the type of metric. The Histogram type is useful for a latency value (TimeFlowRttNs).
- Since the round-trip time (RTT) is provided in nanoseconds in flows, use a divider of 1 billion (1000000000) to convert the value into seconds, which is standard in Prometheus guidelines.
- 4
- The custom buckets specify precision on RTT, with optimal precision ranging between 5ms and 250ms.
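To read a percentile from the histogram defined above, you can apply the standard Prometheus histogram_quantile function to the generated _bucket series. This query is a sketch based on the metric name in the example:

```promql
histogram_quantile(0.9, sum(rate(netobserv_cluster_external_ingress_rtt_seconds_bucket[2m])) by (le, DstK8S_Namespace))
```

The result is the 90th percentile RTT, in seconds, per destination namespace.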
12.6. Creating metrics from nested or array fields in the Traffic flows table
Create a FlowMetric resource to generate metrics from nested or array fields in the Traffic flows table, such as the Network events field.
OVN Observability / Viewing NetworkEvents is a Technology Preview feature only.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
OVN Observability and the ability to view and track network events is available only in OpenShift Container Platform 4.17 and 4.18.
The following example shows how to generate metrics from the Network events field for network policy events.
Prerequisites
- Enable the NetworkEvents feature. See the Additional resources for how to do this.
- A network policy is specified.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
Create
resources to add the following configurations:FlowMetricConfiguration counting network policy events per policy name and namespace
apiVersion: flows.netobserv.io/v1alpha1 kind: FlowMetric metadata: name: network-policy-events namespace: netobserv spec: metricName: network_policy_events_total type: Counter labels: [NetworkEvents>Type, NetworkEvents>Namespace, NetworkEvents>Name, NetworkEvents>Action, NetworkEvents>Direction]1 filters: - field: NetworkEvents>Feature value: acl flatten: [NetworkEvents]2 remap:3 "NetworkEvents>Type": type "NetworkEvents>Namespace": namespace "NetworkEvents>Name": name "NetworkEvents>Direction": direction- 1
- These labels represent the nested fields for Network Events from the Traffic flows table. Each network event has a specific type, namespace, name, action, and direction. You can alternatively specify the
InterfacesifNetworkEventsis unavailable in your OpenShift Container Platform version. - 2
- Optional: You can choose to represent a field that contains a list of items as distinct items.
- 3
- Optional: You can rename the fields in Prometheus.
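The flatten and remap options above can be pictured with a small sketch. This is an illustrative Python model, not NetObserv source code: it shows how expanding a nested list field into per-event records and renaming the `FieldName>SubField` keys could work. The flow record and remap mapping follow the FlowMetric example above.

```python
def flatten_and_remap(flow, list_field, remap):
    """Expand each item of a nested list field into its own label set."""
    label_sets = []
    for event in flow.get(list_field, []):
        labels = {}
        for key, value in event.items():
            nested = f"{list_field}>{key}"  # e.g. "NetworkEvents>Type"
            # rename the field when a remap entry exists, else keep as-is
            labels[remap.get(nested, nested)] = value
        label_sets.append(labels)
    return label_sets

# one flow carrying two network policy events -> two metric label sets
flow = {
    "SrcAddr": "10.128.0.5",
    "NetworkEvents": [
        {"Type": "NetworkPolicy", "Namespace": "dev", "Name": "deny-all"},
        {"Type": "NetworkPolicy", "Namespace": "prod", "Name": "allow-dns"},
    ],
}
remap = {
    "NetworkEvents>Type": "type",
    "NetworkEvents>Namespace": "namespace",
    "NetworkEvents>Name": "name",
}
print(flatten_and_remap(flow, "NetworkEvents", remap))
```

Without flattening, a single flow carrying several events would be counted once; flattening counts each event separately, which is what the per-policy counter needs.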
Verification
- In the web console, navigate to Observe → Dashboards and scroll down to see the Network Policy tab.
- Metrics should begin to populate based on the metric you created, along with the network policy specifications.
High cardinality can affect the memory usage of Prometheus. You can check whether specific labels have high cardinality in the Network Flows format reference.
12.7. Configuring custom charts using FlowMetric API
Generate custom charts for OpenShift Container Platform web console dashboards by defining the charts section of the FlowMetric resource.
You can view custom charts as an administrator in the Dashboard menu.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project: dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
- Configure the FlowMetric resource. See "Flowmetric chart configuration examples".
Verification
- Once the pods refresh, navigate to Observe → Dashboards.
Search for the NetObserv / Main dashboard, or optionally a dashboard name that you created, and view the following two panels:
- A textual single statistic showing the global external ingress rate summed across all dimensions
- A timeseries graph showing the same metric per destination workload
For more information about the query language, refer to the Prometheus documentation.
12.7.1. Flowmetric chart configuration examples
These FlowMetric resource examples demonstrate chart configurations.
12.7.1.1. Ingress bytes chart for cluster external sources
Use the following configuration to track the rate of ingress traffic from cluster external sources. These charts help identify bandwidth usage per workload.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
name: flowmetric-cluster-external-ingress-traffic
namespace: netobserv
# ...
charts:
- dashboardName: Main
title: External ingress traffic
unit: Bps
type: SingleStat
queries:
- promQL: "sum(rate($METRIC[2m]))"
legend: ""
- dashboardName: Main
sectionName: External
title: Top external ingress traffic per workload
unit: Bps
type: StackArea
queries:
- promQL: "sum(rate($METRIC{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace, DstK8S_OwnerName)"
legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
- 1
- The FlowMetric resources need to be created in the namespace defined in the FlowCollector spec.namespace, which is netobserv by default.
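The `sum(rate($METRIC[2m]))` queries above rest on the counter rate: the per-second increase of a monotonically growing counter over a window. The following is a simplified Python sketch of that arithmetic, using only the first and last samples (Prometheus additionally extrapolates to the window boundaries); the sample values are made up.

```python
def simple_rate(samples, window_seconds):
    """Per-second increase of a counter over the window; samples oldest first."""
    return (samples[-1] - samples[0]) / window_seconds

# a byte counter grew by 1200 over a 120s (2m) window -> 10 bytes per second
print(simple_rate([5000, 5600, 6200], 120))  # 10.0
```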
12.7.1.2. RTT latency chart for cluster external ingress traffic
Use the following configuration to monitor round-trip time (RTT) for cluster external ingress traffic. These examples use the histogram_quantile function to show the 99th and 50th percentiles.
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
name: flowmetric-cluster-external-ingress-traffic
namespace: netobserv
# ...
charts:
- dashboardName: Main
title: External ingress TCP latency
unit: seconds
type: SingleStat
queries:
- promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket[2m])) by (le)) > 0"
legend: "p99"
- dashboardName: Main
sectionName: External
title: "Top external ingress sRTT per workload, p50 (ms)"
unit: seconds
type: Line
queries:
- promQL: "histogram_quantile(0.5, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
- dashboardName: Main
sectionName: External
title: "Top external ingress sRTT per workload, p99 (ms)"
unit: seconds
type: Line
queries:
- promQL: "histogram_quantile(0.99, sum(rate($METRIC_bucket{DstK8S_Namespace!=\"\"}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 0"
legend: "{{DstK8S_Namespace}} / {{DstK8S_OwnerName}}"
# ...
12.7.1.3. Calculate histogram averages
You can show averages of histograms by dividing the $METRIC_sum metric by the $METRIC_count metric, as in the following example:
promQL: "(sum(rate($METRIC_sum{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName) / sum(rate($METRIC_count{DstK8S_Namespace!=\"\"}[2m])) by (DstK8S_Namespace,DstK8S_OwnerName))*1000"
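The arithmetic behind the query is simple: dividing the per-second rate of the histogram's `_sum` series by the rate of its `_count` series yields the average observed value over the window, and the `*1000` converts seconds to milliseconds. A minimal sketch:

```python
def histogram_average_ms(sum_rate_seconds, count_rate):
    """Average value in ms from _sum (seconds/sec) and _count (events/sec) rates."""
    if count_rate == 0:
        return 0.0  # avoid division by zero when no events occurred
    return (sum_rate_seconds / count_rate) * 1000

# 0.5 seconds of accumulated RTT per second across 100 flows/second -> 5 ms average
print(histogram_average_ms(0.5, 100))  # 5.0
```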
12.8. Detecting SYN flooding using the FlowMetric API and TCP flags
Deploy a custom AlertingRule resource, together with FlowMetric resources that expose TCP flags, to detect and alert on SYN flooding.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- In the Provided APIs heading for the NetObserv Operator, select FlowMetric.
- In the Project dropdown list, select the project of the Network Observability Operator instance.
- Click Create FlowMetric.
Create FlowMetric resources to add the following configurations:
Configuration counting flows per destination host and resource, with TCP flags
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-destination
spec:
  metricName: flows_with_flags_per_destination_total
  type: Counter
  labels: [SrcSubnetLabel,DstSubnetLabel,DstK8S_Name,DstK8S_Type,DstK8S_HostName,DstK8S_Namespace,Flags]
Configuration counting flows per source host and resource, with TCP flags
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flows-with-flags-per-source
spec:
  metricName: flows_with_flags_per_source_total
  type: Counter
  labels: [DstSubnetLabel,SrcSubnetLabel,SrcK8S_Name,SrcK8S_Type,SrcK8S_HostName,SrcK8S_Namespace,Flags]
Deploy the following AlertingRule resource to alert for SYN flooding:
AlertingRule for SYN flooding
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: netobserv-syn-alerts
  namespace: openshift-monitoring
# ...
spec:
  groups:
  - name: NetObservSYNAlerts
    rules:
    - alert: NetObserv-SYNFlood-in
      annotations:
        message: |-
          {{ $labels.job }}: incoming SYN-flood attack suspected to Host={{ $labels.DstK8S_HostName}}, Namespace={{ $labels.DstK8S_Namespace }}, Resource={{ $labels.DstK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Incoming SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_destination_total{Flags="2"}[1m])) by (job, DstK8S_HostName, DstK8S_Namespace, DstK8S_Name) > 300 1
      for: 15s
      labels:
        severity: warning
        app: netobserv
    - alert: NetObserv-SYNFlood-out
      annotations:
        message: |-
          {{ $labels.job }}: outgoing SYN-flood attack suspected from Host={{ $labels.SrcK8S_HostName}}, Namespace={{ $labels.SrcK8S_Namespace }}, Resource={{ $labels.SrcK8S_Name }}. This is characterized by a high volume of SYN-only flows with different source IPs and/or ports.
        summary: "Outgoing SYN-flood"
      expr: sum(rate(netobserv_flows_with_flags_per_source_total{Flags="2"}[1m])) by (job, SrcK8S_HostName, SrcK8S_Namespace, SrcK8S_Name) > 300 2
      for: 15s
      labels:
        severity: warning
        app: netobserv
# ...
Verification
- In the web console, click Manage Columns in the Network Traffic table view and click TCP flags.
- In the Network Traffic table view, filter on TCP protocol SYN TCPFlag. A large number of flows with the same byteSize indicates a SYN flood.
- Go to Observe → Alerting and select the Alerting Rules tab.
- Filter on netobserv-synflood-in alert. The alert should fire when SYN flooding occurs.
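The `Flags="2"` selector in the alert expressions matches flows whose TCP flags bitmask equals SYN only, which is what makes this a SYN-flood heuristic. The flag bit values below follow the TCP header definition; the helper function is an illustrative sketch, not NetObserv code.

```python
# TCP flag bit values from the TCP header
SYN = 0x02
RST = 0x04
ACK = 0x10

def is_syn_only(flags):
    """True when the flow carried only the SYN bit (no ACK, RST, ...)."""
    return flags == SYN

print(is_syn_only(SYN))        # True: bare SYN, the kind of flow the alert counts
print(is_syn_only(SYN | ACK))  # False: SYN-ACK of a normal TCP handshake
```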
Chapter 13. Monitoring the Network Observability Operator
Use the OpenShift Container Platform web console to monitor alerts related to the Network Observability Operator’s health. This helps you maintain system stability and quickly detect operational issues.
13.1. Health dashboards
View the Network Observability Operator health dashboards in the OpenShift Container Platform web console to monitor the health status, resource usage, and internal statistics of the operator and its components.
Metrics are located in the Observe → Dashboards page in the OpenShift Container Platform web console. You can view metrics about the health of the Network Observability Operator in the following categories:
- Flows per second
- Sampling
- Errors last minute
- Dropped flows per second
- Flowlogs-pipeline statistics
- Flowlogs-pipeline statistics views
- eBPF agent statistics views
- Operator statistics
- Resource usage
13.2. Health alerts
Understand the health alerts generated by the Network Observability Operator, which trigger banners when conditions like Loki ingestion errors, zero flow ingestion, or dropped eBPF flows occur.
A health alert banner that directs you to the dashboard can appear on the Network Traffic and Home pages if an alert is triggered. Alerts are generated in the following cases:
- The NetObservLokiError alert occurs if the flowlogs-pipeline workload is dropping flows because of Loki errors, such as if the Loki ingestion rate limit has been reached.
- The NetObservNoFlows alert occurs if no flows are ingested for a certain amount of time.
- The NetObservFlowsDropped alert occurs if the Network Observability eBPF agent hashmap table is full, the eBPF agent processes flows with degraded performance, or the capacity limiter is triggered.
13.3. Viewing health information
View the Netobserv/Health dashboard within the OpenShift Container Platform web console to monitor the health status and resource usage of the Network Observability Operator and its components.
Prerequisites
- You have the Network Observability Operator installed.
- You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
Procedure
- From the Administrator perspective in the web console, navigate to Observe → Dashboards.
- From the Dashboards dropdown, select Netobserv/Health.
- View the metrics about the health of the Operator that are displayed on the page.
13.3.1. Disabling health alerts
Disable specific health alerts, such as NetObservLokiError or NetObservNoFlows, by editing the spec.processor.metrics.disableAlerts specification in the FlowCollector resource.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster then select the YAML tab.
Add spec.processor.metrics.disableAlerts to disable health alerts, as in the following YAML sample:
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      disableAlerts: [NetObservLokiError, NetObservNoFlows] 1
- 1
- You can specify one or a list with both types of alerts to disable.
13.4. Creating Loki rate limit alerts for the NetObserv dashboard
You can create a custom AlertingRule resource for the Netobserv dashboard metrics to trigger alerts when Loki rate limits have been reached.
Prerequisites
- You have access to the cluster as a user with the cluster-admin role or with view permissions for all projects.
- You have the Network Observability Operator installed.
Procedure
- Create a YAML file by clicking the import icon, +.
Add an alerting rule configuration to the YAML file. In the YAML sample that follows, an alert is created for when Loki rate limits have been reached:
apiVersion: monitoring.openshift.io/v1
kind: AlertingRule
metadata:
  name: loki-alerts
  namespace: openshift-monitoring
spec:
  groups:
  - name: LokiRateLimitAlerts
    rules:
    - alert: LokiTenantRateLimit
      annotations:
        message: |-
          {{ $labels.job }} {{ $labels.route }} is experiencing 429 errors.
        summary: "At any number of requests are responded with the rate limit error code."
      expr: sum(irate(loki_request_duration_seconds_count{status_code="429"}[1m])) by (job, namespace, route) / sum(irate(loki_request_duration_seconds_count[1m])) by (job, namespace, route) * 100 > 0
      for: 10s
      labels:
        severity: warning
- Click Create to apply the configuration file to the cluster.
13.5. Using the eBPF agent alert
Resolve the NetObservAgentFlowsDropped alert by increasing the spec.agent.ebpf.cacheMaxFlows value in the FlowCollector resource.
An alert, NetObservAgentFlowsDropped, is triggered when the cacheMaxFlows capacity is reached and the eBPF agent starts dropping flows. You can resolve it by increasing the cacheMaxFlows value in the FlowCollector resource.
Increasing the cacheMaxFlows value might increase the memory usage of the eBPF agent.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the Network Observability Operator, select Flow Collector.
- Select cluster, and then select the YAML tab.
Increase the spec.agent.ebpf.cacheMaxFlows value, as shown in the following YAML sample:
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Service
  agent:
    type: eBPF
    ebpf:
      cacheMaxFlows: 200000 1
- 1
- Increase the cacheMaxFlows value from its value at the time of the NetObservAgentFlowsDropped alert.
Chapter 14. Scheduling resources
Taints and tolerations help you control which nodes host certain pods. Use these tools, along with node selectors, to guide the placement of network observability components.
A node selector specifies a map of key/value pairs that are defined using custom labels on nodes and selectors specified in pods.
For the pod to be eligible to run on a node, the pod must have the same key/value node selector as the label on the node.
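The node-selector rule stated above can be sketched in a few lines. This is an illustrative Python model of the matching logic, not Kubernetes scheduler code; the label names are invented for the example.

```python
def node_matches(node_labels, node_selector):
    """A pod is eligible only if every selector key/value matches a node label."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())

node = {"kubernetes.io/os": "linux", "name": "app-worker-node"}
print(node_matches(node, {"name": "app-worker-node"}))  # True: label matches
print(node_matches(node, {"name": "infra-node"}))       # False: pod not eligible
```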
14.1. Network observability deployment in specific nodes
Configure the FlowCollector resource to schedule network observability components on specific nodes by using NodeSelector, Tolerations, and Affinity settings.
The following settings are available in spec.agent.ebpf.advanced.scheduling, spec.processor.advanced.scheduling, and spec.consolePlugin.advanced.scheduling:
- NodeSelector
- Tolerations
- Affinity
- PriorityClassName
Sample FlowCollector resource for spec.<component>.advanced.scheduling
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
name: cluster
spec:
# ...
advanced:
scheduling:
tolerations:
- key: "<taint key>"
operator: "Equal"
value: "<taint value>"
effect: "<taint effect>"
nodeSelector:
<key>: <value>
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: name
operator: In
values:
- app-worker-node
      priorityClassName: ""
# ...
Chapter 15. Secondary networks
You can configure the Network Observability Operator to collect and enrich network flow data from secondary networks, such as SR-IOV and OVN-Kubernetes secondary networks.
15.1. Prerequisites
- Access to an OpenShift Container Platform cluster with an additional network interface, such as a secondary interface or an L2 network.
15.2. Configuring monitoring for SR-IOV interface traffic
Configure the FlowCollector resource with spec.agent.ebpf.privileged set to true to monitor SR-IOV interface traffic.
The eBPF agent monitors other network namespaces in addition to the host network namespaces, which are monitored by default. When a pod with a virtual function (VF) interface is created, a new network namespace is created. With an SRIOVNetwork resource that has IPAM specified, the VF interface is migrated from the host network namespace to the pod network namespace.
Prerequisites
- Access to an OpenShift Container Platform cluster with a SR-IOV device.
- The SRIOVNetwork custom resource (CR) spec.ipam configuration must be set with an IP address from the range that the interface lists or from other plugins.
Procedure
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster and then select the YAML tab.
Configure the FlowCollector custom resource. A sample configuration is as follows:
Configure FlowCollector for SR-IOV monitoring
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Service
  agent:
    type: eBPF
    ebpf:
      privileged: true 1
- 1
- The spec.agent.ebpf.privileged field value must be set to true to enable SR-IOV monitoring.
15.3. Configuring virtual machine (VM) secondary network interfaces for Network Observability
Configure the FlowCollector resource with the eBPF agent in privileged mode to observe network traffic for virtual machines (VMs) that are connected to secondary network interfaces.
Network flows coming from VMs that are connected to the default internal pod network are automatically captured by network observability.
Procedure
Get information about the virtual machine launcher pod by running the following command. This information is used in Step 5:
$ oc get pod virt-launcher-<vm_name>-<suffix> -n <namespace> -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/network-status: |-
      [{
        "name": "ovn-kubernetes",
        "interface": "eth0",
        "ips": [
          "10.129.2.39"
        ],
        "mac": "0a:58:0a:81:02:27",
        "default": true,
        "dns": {}
      },
      {
        "name": "my-vms/l2-network", 1
        "interface": "podc0f69e19ba2", 2
        "ips": [ 3
          "10.10.10.15"
        ],
        "mac": "02:fb:f8:00:00:12", 4
        "dns": {}
      }]
  name: virt-launcher-fedora-aqua-fowl-13-zr2x9
  namespace: my-vms
spec:
# ...
status:
# ...
- In the web console, navigate to Operators → Installed Operators.
- Under the Provided APIs heading for the NetObserv Operator, select Flow Collector.
- Select cluster and then select the YAML tab.
Configure FlowCollector based on the information you found from the additional network investigation:
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    ebpf:
      privileged: true 1
  processor:
    advanced:
      secondaryNetworks:
      - index: 2
        - MAC 3
        name: my-vms/l2-network 4
# ...
- 1
- Ensure that the eBPF agent is in privileged mode so that flows are collected for secondary interfaces. - 2
- Define the fields to use for indexing the virtual machine launcher pods. It is recommended to use the MAC address as the indexing field to get network flow enrichment for secondary interfaces. If you have overlapping MAC addresses between pods, then additional indexing fields, such as IP and Interface, could be added for accurate enrichment. - 3
- If your additional network information has a MAC address, add MAC to the field list. - 4
- Specify the name of the network found in the k8s.v1.cni.cncf.io/network-status annotation, usually <namespace>/<network_attachment_definition_name>.
Observe VM traffic:
- Navigate to the Network Traffic page.
- Filter by Source IP using your virtual machine IP found in the k8s.v1.cni.cncf.io/network-status annotation.
- View both Source and Destination fields, which should be enriched, and identify the VM launcher pods and the VM instance as owners.
Chapter 16. Network Observability CLI
16.1. Installing the Network Observability CLI
The Network Observability CLI (oc netobserv) is deployed as a plugin for the OpenShift CLI (oc). Install it to quickly debug and troubleshoot networking issues.
16.1.1. About the Network Observability CLI
Use the Network Observability CLI (oc netobserv) to quickly debug and troubleshoot networking issues.
The Network Observability CLI is a flow and packet visualization tool that relies on eBPF agents to stream collected data to an ephemeral collector pod. It requires no persistent storage during the capture. After the run, the output is transferred to your local machine.
CLI capture is meant to run only for short durations, such as 8-10 minutes. If it runs for too long, it can be difficult to delete the running process.
16.1.2. Installing the Network Observability CLI
The Network Observability CLI gives you a lightweight way to quickly debug and troubleshoot network observability. It must be installed separately.
Installing the Network Observability CLI (oc netobserv) is a separate procedure from the Network Observability Operator installation. Users can optionally use Krew to install the netobserv CLI plugin.
Prerequisites
- You must install the OpenShift CLI (oc).
- You must have a macOS or Linux operating system.
- You must install either podman or docker.
You can use docker in place of podman; if you do, replace podman with docker in the following commands.
Procedure
Log in to the Red Hat registry by running the following command:
$ podman login registry.redhat.io
Extract the oc-netobserv file from the image by running the following commands:
$ podman create --name netobserv-cli registry.redhat.io/network-observability/network-observability-cli-rhel9:1.11
$ podman cp netobserv-cli:/oc-netobserv .
$ podman rm netobserv-cli
Move the extracted file to a directory that is on the system's PATH, such as /usr/local/bin/, by running the following command:
$ sudo mv oc-netobserv /usr/local/bin/
Verification
Verify that oc netobserv is available:
$ oc netobserv version
This command should produce output similar to the following example:
Netobserv CLI version <version>
16.2. Using the Network Observability CLI
You can visualize and filter the flows and packets data directly in the terminal to see specific usage, such as identifying who is using a specific port. The Network Observability CLI collects flows as JSON and database files or packets as a PCAP file, which you can use with third-party tools.
16.2.1. Capturing flows
Capture network flows and apply filters based on resources or zones directly in the CLI. This helps you solve complex use cases, such as visualizing the Round-Trip Time (RTT) between two different zones.
Table visualization in the CLI provides viewing and flow search capabilities.
Prerequisites
- Install the OpenShift CLI (oc).
- Install the Network Observability CLI (oc netobserv) plugin.
Procedure
Capture flows with filters enabled by running the following command:
$ oc netobserv flows --enable_filter=true --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
Add filters to the live table filter prompt in the terminal to further refine the incoming flows. For example:
live table filter: [SrcK8S_Zone:us-west-1b] press enter to match multiple regular expressions at once
- Use the PageUp and PageDown keys to toggle between None, Resource, Zone, Host, Owner and all of the above.
- To stop capturing, press Ctrl+C. The data that was captured is written to two separate files in an ./output directory located in the same path used to install the CLI.
JSON file, which contains JSON arrays of the captured data../output/flow/<capture_date_time>.jsonExample JSON file
{ "AgentIP": "10.0.1.76", "Bytes": 561, "DnsErrno": 0, "Dscp": 20, "DstAddr": "f904:ece9:ba63:6ac7:8018:1e5:7130:0", "DstMac": "0A:58:0A:80:00:37", "DstPort": 9999, "Duplicate": false, "Etype": 2048, "Flags": 16, "FlowDirection": 0, "IfDirection": 0, "Interface": "ens5", "K8S_FlowLayer": "infra", "Packets": 1, "Proto": 6, "SrcAddr": "3e06:6c10:6440:2:a80:37:b756:270f", "SrcMac": "0A:58:0A:80:00:01", "SrcPort": 46934, "TimeFlowEndMs": 1709741962111, "TimeFlowRttNs": 121000, "TimeFlowStartMs": 1709741962111, "TimeReceived": 1709741964 }You can use SQLite to inspect the
database file. For example:./output/flow/<capture_date_time>.dbOpen the file by running the following command:
$ sqlite3 ./output/flow/<capture_date_time>.dbQuery the data by running a SQLite
statement, for example:SELECTsqlite> SELECT DnsLatencyMs, DnsFlagsResponseCode, DnsId, DstAddr, DstPort, Interface, Proto, SrcAddr, SrcPort, Bytes, Packets FROM flow WHERE DnsLatencyMs >10 LIMIT 10;Example output
12|NoError|58747|10.128.0.63|57856||17|172.30.0.10|53|284|1 11|NoError|20486|10.128.0.52|56575||17|169.254.169.254|53|225|1 11|NoError|59544|10.128.0.103|51089||17|172.30.0.10|53|307|1 13|NoError|32519|10.128.0.52|55241||17|169.254.169.254|53|254|1 12|NoError|32519|10.0.0.3|55241||17|169.254.169.254|53|254|1 15|NoError|57673|10.128.0.19|59051||17|172.30.0.10|53|313|1 13|NoError|35652|10.0.0.3|46532||17|169.254.169.254|53|183|1 32|NoError|37326|10.0.0.3|52718||17|169.254.169.254|53|169|1 14|NoError|14530|10.0.0.3|58203||17|169.254.169.254|53|246|1 15|NoError|40548|10.0.0.3|45933||17|169.254.169.254|53|174|1
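The same kind of query can be scripted with Python's sqlite3 module instead of the sqlite3 shell. The schema and rows below are fabricated stand-ins for a real capture file (whose path varies per run), so this is only a sketch of the querying pattern:

```python
import sqlite3

# stand-in for a real ./output/flow/<capture_date_time>.db capture file
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flow (DnsLatencyMs INTEGER, DstAddr TEXT, DstPort INTEGER)")
conn.executemany(
    "INSERT INTO flow VALUES (?, ?, ?)",
    [(12, "10.128.0.63", 57856), (3, "10.128.0.52", 56575), (32, "10.0.0.3", 52718)],
)

# same filter as the sqlite3 shell example: DNS latency above 10 ms
rows = conn.execute(
    "SELECT DnsLatencyMs, DstAddr, DstPort FROM flow WHERE DnsLatencyMs > 10 LIMIT 10"
).fetchall()
print(rows)  # only the flows with DNS latency above 10 ms
```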
16.2.2. Capturing packets
Use the Network Observability CLI to capture network packets. You can apply filters and refine them live in the terminal for accurate, real-time debugging.
Prerequisites
- Install the OpenShift CLI (oc).
- Install the Network Observability CLI (oc netobserv) plugin.
Procedure
Run the packet capture with filters enabled:
$ oc netobserv packets --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
Add filters to the live table filter prompt in the terminal to refine the incoming packets. An example filter is as follows:
live table filter: [SrcK8S_Zone:us-west-1b] press enter to match multiple regular expressions at once
- Use the PageUp and PageDown keys to toggle between None, Resource, Zone, Host, Owner and all of the above.
- To stop capturing, press Ctrl+C.
View the captured data, which is written to a single file in an ./output/pcap directory located in the same path that was used to install the CLI:
- The ./output/pcap/<capture_date_time>.pcap file can be opened with Wireshark.
16.2.3. Capturing metrics
Generate on-demand network observability dashboards in Prometheus using a service monitor. This allows you to quickly view and analyze network metrics.
Prerequisites
- Install the OpenShift CLI (oc).
- Install the Network Observability CLI (oc netobserv) plugin.
Procedure
Capture metrics with filters enabled by running the following command:
$ oc netobserv metrics --enable_filter=true --cidr=0.0.0.0/0 --protocol=TCP --port=49051
Open the link provided in the terminal to view the NetObserv / On-Demand dashboard:
Example URL
https://console-openshift-console.apps.rosa...openshiftapps.com/monitoring/dashboards/netobserv-cli
Note: Features that are not enabled present as empty graphs.
16.2.4. Cleaning the Network Observability CLI
Use the oc netobserv cleanup command to remove the CLI components from your cluster.
Procedure
Run the following command:
$ oc netobserv cleanup
Additional resources
16.3. Network Observability CLI (oc netobserv) reference
The Network Observability CLI (oc netobserv) has most features and filtering options that are available for the Network Observability Operator.
16.3.1. Network Observability CLI usage
You can use the Network Observability CLI (oc netobserv) with the commands and options described in the following sections.
16.3.1.1. Syntax
The basic syntax for oc netobserv is as follows:
oc netobserv syntax
$ oc netobserv [<command>] [<feature_option>] [<command_options>]
- 1
- Feature options can only be used with the oc netobserv flows command. They cannot be used with the oc netobserv packets command.
16.3.1.2. Basic commands
| Command | Description |
|---|---|
| flows | Capture flows information. For subcommands, see the "Flows capture options" table. |
| packets | Capture packets data. For subcommands, see the "Packets capture options" table. |
| metrics | Capture metrics data. For subcommands, see the "Metrics capture options" table. |
| follow | Follow collector logs when running in background. |
| stop | Stop collection by removing the agent daemon set. |
| copy | Copy collector generated files locally. |
| cleanup | Remove the Network Observability CLI components. |
| version | Print the software version. |
| help | Show help. |
16.3.1.3. Flows capture options
Flows capture has mandatory commands as well as additional options, such as enabling extra features about packet drops, DNS latencies, Round-trip time, and filtering.
oc netobserv flows syntax
$ oc netobserv flows [<feature_option>] [<command_options>]
| Option | Description | Default |
|---|---|---|
| --enable_all | enable all eBPF features | false |
| --enable_dns | enable DNS tracking | false |
| --enable_ipsec | enable IPsec tracking | false |
| --enable_network_events | enable network events monitoring | false |
| --enable_pkt_translation | enable packet translation | false |
| --enable_pkt_drop | enable packet drop | false |
| --enable_rtt | enable RTT tracking | false |
| --enable_udn_mapping | enable User Defined Network mapping | false |
| --get-subnets | get subnets information | false |
| --privileged | force eBPF agent privileged mode | auto |
| --sampling | packets sampling interval | 1 |
| --background | run in background | false |
| --copy | copy the output files locally | prompt |
| --log-level | components logs | info |
| --max-time | maximum capture time | 5m |
| --max-bytes | maximum capture bytes | 50000000 = 50MB |
| --action | filter action | Accept |
| --cidr | filter CIDR | 0.0.0.0/0 |
| --direction | filter direction | - |
| --dport | filter destination port | - |
| --dport_range | filter destination port range | - |
| --dports | filter on either of two destination ports | - |
| --drops | filter flows with only dropped packets | false |
| --icmp_code | filter ICMP code | - |
| --icmp_type | filter ICMP type | - |
| --node-selector | capture on specific nodes | - |
| --peer_ip | filter peer IP | - |
| --peer_cidr | filter peer CIDR | - |
| --port_range | filter port range | - |
| --port | filter port | - |
| --ports | filter on either of two ports | - |
| --protocol | filter protocol | - |
| --query | filter flows using a custom query | - |
| --sport_range | filter source port range | - |
| --sport | filter source port | - |
| --sports | filter on either of two source ports | - |
| --tcp_flags | filter TCP flags | - |
| --interfaces | list of interfaces to monitor, comma separated | - |
| --exclude_interfaces | list of interfaces to exclude, comma separated | lo |
Example running flows capture on TCP protocol and port 49051 with PacketDrop and RTT features enabled:
$ oc netobserv flows --enable_pkt_drop --enable_rtt --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
16.3.1.4. Packets capture options
You can filter packets capture data the same as flows capture by using the filters. Certain features, such as packet drops, DNS, RTT, and network events, are only available for flows and metrics capture.
oc netobserv packets syntax
$ oc netobserv packets [<option>]
| Option | Description | Default |
|---|---|---|
| --background | run in background | false |
| --copy | copy the output files locally | prompt |
| --log-level | components logs | info |
| --max-time | maximum capture time | 5m |
| --max-bytes | maximum capture bytes | 50000000 = 50MB |
| --action | filter action | Accept |
| --cidr | filter CIDR | 0.0.0.0/0 |
| --direction | filter direction | - |
| --dport | filter destination port | - |
| --dport_range | filter destination port range | - |
| --dports | filter on either of two destination ports | - |
| --drops | filter flows with only dropped packets | false |
| --icmp_code | filter ICMP code | - |
| --icmp_type | filter ICMP type | - |
| --node-selector | capture on specific nodes | - |
| --peer_ip | filter peer IP | - |
| --peer_cidr | filter peer CIDR | - |
| --port_range | filter port range | - |
| --port | filter port | - |
| --ports | filter on either of two ports | - |
| --protocol | filter protocol | - |
| --query | filter flows using a custom query | - |
| --sport_range | filter source port range | - |
| --sport | filter source port | - |
| --sports | filter on either of two source ports | - |
| --tcp_flags | filter TCP flags | - |
Example running packets capture on TCP protocol and port 49051:
$ oc netobserv packets --action=Accept --cidr=0.0.0.0/0 --protocol=TCP --port=49051
16.3.1.5. Metrics capture options
You can enable features and use filters on metrics capture, the same as flows capture. The generated graphs fill accordingly in the dashboard.
oc netobserv metrics syntax
$ oc netobserv metrics [<option>]
| Option | Description | Default |
|---|---|---|
| --enable_all | enable all eBPF features | false |
| --enable_dns | enable DNS tracking | false |
| --enable_ipsec | enable IPsec tracking | false |
| --enable_network_events | enable network events monitoring | false |
| --enable_pkt_translation | enable packet translation | false |
| --enable_pkt_drop | enable packet drop | false |
| --enable_rtt | enable RTT tracking | false |
| --enable_udn_mapping | enable User Defined Network mapping | false |
| --get-subnets | get subnets information | false |
| --privileged | force eBPF agent privileged mode | auto |
| --sampling | packets sampling interval | 1 |
| --background | run in background | false |
| --log-level | components logs | info |
| --max-time | maximum capture time | 1h |
| --action | filter action | Accept |
| --cidr | filter CIDR | 0.0.0.0/0 |
| --direction | filter direction | - |
| --dport | filter destination port | - |
| --dport_range | filter destination port range | - |
| --dports | filter on either of two destination ports | - |
| --drops | filter flows with only dropped packets | false |
| --icmp_code | filter ICMP code | - |
| --icmp_type | filter ICMP type | - |
| --node-selector | capture on specific nodes | - |
| --peer_ip | filter peer IP | - |
| --peer_cidr | filter peer CIDR | - |
| --port_range | filter port range | - |
| --port | filter port | - |
| --ports | filter on either of two ports | - |
| --protocol | filter protocol | - |
| --query | filter flows using a custom query | - |
| --sport_range | filter source port range | - |
| --sport | filter source port | - |
| --sports | filter on either of two source ports | - |
| --tcp_flags | filter TCP flags | - |
| --include_list | list of metric names to generate, comma separated | namespace_flows_total,node_ingress_bytes_total,node_egress_bytes_total,workload_ingress_bytes_total |
| --interfaces | list of interfaces to monitor, comma separated | - |
| --exclude_interfaces | list of interfaces to exclude, comma separated | lo |
Example running a metrics capture for TCP drops:
$ oc netobserv metrics --enable_pkt_drop --protocol=TCP
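You can combine feature options with the metric selection options from the table above. For example, to enable DNS tracking while restricting the generated metrics to a specific set (the metric names shown are taken from the --include_list default and are illustrative only):
$ oc netobserv metrics --enable_dns --include_list=namespace_flows_total,node_ingress_bytes_total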
Chapter 17. FlowCollector API reference
The `FlowCollector` resource is the schema for the network flows collection API. Its specifications are described in the following sections.
17.1. FlowCollector API specifications
- Description
- `FlowCollector` is the schema for the network flows collection API, which pilots and configures the underlying deployments.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
| `apiVersion` | `string` | APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and might reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |
| `kind` | `string` | Kind is a string value representing the REST resource this object represents. Servers might infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |
| `metadata` | `object` | Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata |
| `spec` | `object` | Defines the desired state of the FlowCollector resource. *: the mention of "unsupported" or "deprecated" for a feature throughout this document means that this feature is not officially supported by Red Hat. It might have been, for example, contributed by the community and accepted without a formal agreement for maintenance. The product maintainers might provide some support for these features as a best effort only. |
17.1.1. .metadata
- Description
- Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
- Type
-
object
17.1.2. .spec
- Description
Defines the desired state of the FlowCollector resource.
*: the mention of "unsupported" or "deprecated" for a feature throughout this document means that this feature is not officially supported by Red Hat. It might have been, for example, contributed by the community and accepted without a formal agreement for maintenance. The product maintainers might provide some support for these features as a best effort only.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Agent configuration for flows extraction. |
|
|
|
|
|
|
|
-
-
-
Kafka can provide better scalability, resiliency, and high availability (for more details, see https://www.redhat.com/en/topics/integration/what-is-apache-kafka).
|
|
|
|
|
|
|
| Kafka configuration, allowing the use of Kafka as a broker as part of the flow collection pipeline. Available when `spec.deploymentModel` is `Kafka`.
|
|
|
|
|
|
|
| Namespace where Network Observability pods are deployed. |
|
|
|
|
|
|
|
|
|
|
|
|
17.1.3. .spec.agent
- Description
- Agent configuration for flows extraction.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
17.1.4. .spec.agent.ebpf
- Description
- `ebpf` describes the settings related to the eBPF-based flow reporter when `spec.agent.type` is set to `eBPF`.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
List of additional features to enable. They are all disabled by default. Enabling additional features might have performance impacts. Possible values are: -
-
-
-
-
-
-
This feature requires mounting the kernel debug filesystem, so the eBPF agent pods must run as privileged via
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Privileged mode for the eBPF Agent container. When set to
|
|
|
|
|
|
|
| Sampling interval of the eBPF probe. 100 means one packet out of every 100 is sent. 0 or 1 means all packets are sampled. |
17.1.5. .spec.agent.ebpf.advanced
- Description
- `advanced` allows setting some aspects of the internal configuration of the eBPF agent. This section is aimed mostly at debugging and fine-grained performance optimizations, such as the `GOGC` and `GOMAXPROCS` environment variables. Set these values at your own risk. You can also override the default Linux capabilities from there.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
| Linux capabilities override, when not running as privileged. Default capabilities are BPF, PERFMON and NET_ADMIN. |
|
|
|
|
|
|
| scheduling controls how the pods are scheduled on nodes. |
17.1.6. .spec.agent.ebpf.advanced.scheduling
- Description
- scheduling controls how the pods are scheduled on nodes.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| If specified, the pod’s scheduling constraints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling. |
|
|
|
|
|
|
| If specified, indicates the pod’s priority. For documentation, refer to https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#how-to-use-priority-and-preemption. If not specified, default priority is used, or zero if there is no default. |
|
|
|
|
17.1.7. .spec.agent.ebpf.advanced.scheduling.affinity
- Description
- If specified, the pod’s scheduling constraints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling.
- Type
-
object
17.1.8. .spec.agent.ebpf.advanced.scheduling.tolerations
- Description
- `tolerations` is a list of tolerations that allow the pod to schedule onto nodes with matching taints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling.
- Type
- `array`
17.1.9. .spec.agent.ebpf.flowFilter
- Description
- `flowFilter` defines the eBPF agent configuration regarding flow filtering.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Set
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
17.1.10. .spec.agent.ebpf.flowFilter.rules
- Description
- `rules` defines a list of filtering rules on the eBPF agents. When filtering is enabled, by default, flows that don’t match any rule are rejected. To change the default, you can define a rule that accepts everything: `{ action: "Accept", cidr: "0.0.0.0/0" }`, and then refine with rejecting rules.
- Type
- `array`
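The default-accept pattern described above can be sketched as follows. This is illustrative only: the `enable` field and the rejected CIDR value are assumptions, not documented defaults.
spec:
  agent:
    ebpf:
      flowFilter:
        enable: true
        rules:
          - action: Accept
            cidr: 0.0.0.0/0
          - action: Reject
            cidr: 10.0.62.0/24
Here the first rule accepts all traffic, and the second rejects flows involving a specific subnet.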
17.1.11. .spec.agent.ebpf.flowFilter.rules[]
- Description
- `EBPFFlowFilterRule` defines the desired eBPF agent configuration regarding a flow filtering rule.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
17.1.12. .spec.agent.ebpf.metrics
- Description
- `metrics` defines the eBPF agent configuration regarding metrics.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Set
|
|
|
| Metrics server endpoint configuration for the Prometheus scraper. |
17.1.13. .spec.agent.ebpf.metrics.server
- Description
- Metrics server endpoint configuration for the Prometheus scraper.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| The metrics server HTTP port. |
|
|
| TLS configuration. |
17.1.14. .spec.agent.ebpf.metrics.server.tls
- Description
- TLS configuration.
- Type
- `object`
- Required
- `type`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| TLS configuration when `type` is set to `Provided`.
|
|
|
| Reference to the CA file when `type` is set to `Provided`.
|
|
|
|
Select the type of TLS configuration: -
|
17.1.15. .spec.agent.ebpf.metrics.server.tls.provided
- Description
- TLS configuration when `type` is set to `Provided`.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.16. .spec.agent.ebpf.metrics.server.tls.providedCaFile
- Description
- Reference to the CA file when `type` is set to `Provided`.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
| File name within the config map or secret. |
|
|
| Name of the config map or secret containing the file. |
|
|
| Namespace of the config map or secret containing the file. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the file reference:
|
17.1.17. .spec.agent.ebpf.resources
- Description
- `resources` are the compute resources required by this container. For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
| Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
|
|
| Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
17.1.18. .spec.consolePlugin
- Description
- `consolePlugin` defines the settings related to the OpenShift Container Platform Console plugin, when available.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Enables the console plugin deployment. |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Deploy as a standalone console, instead of a plugin of the OpenShift Container Platform Console. This is not recommended when using with OpenShift Container Platform, as it doesn’t provide an integrated experience. [Unsupported (*)]. |
|
|
| If
|
17.1.19. .spec.consolePlugin.advanced
- Description
- `advanced` allows setting some aspects of the internal configuration of the console plugin. This section is aimed mostly at debugging and fine-grained performance optimizations, such as the `GOGC` and `GOMAXPROCS` environment variables. Set these values at your own risk.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
17.1.20. .spec.consolePlugin.advanced.scheduling
- Description
- `scheduling` controls how the pods are scheduled on nodes.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
| If specified, the pod’s scheduling constraints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling. |
|
|
|
|
|
|
| If specified, indicates the pod’s priority. For documentation, refer to https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#how-to-use-priority-and-preemption. If not specified, default priority is used, or zero if there is no default. |
|
|
|
|
17.1.21. .spec.consolePlugin.advanced.scheduling.affinity
- Description
- If specified, the pod’s scheduling constraints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling.
- Type
-
object
17.1.22. .spec.consolePlugin.advanced.scheduling.tolerations
- Description
- `tolerations` is a list of tolerations that allow the pod to schedule onto nodes with matching taints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling.
- Type
- `array`
17.1.23. .spec.consolePlugin.autoscaler
- Description
- `autoscaler` [deprecated (*)] is the spec of a horizontal pod autoscaler to set up for the plugin Deployment. Deprecation notice: the managed autoscaler will be removed in a future version. You can instead configure an autoscaler of your choice and set `spec.consolePlugin.unmanagedReplicas` to `true`. Refer to HorizontalPodAutoscaler documentation (autoscaling/v2).
- Type
- `object`
17.1.24. .spec.consolePlugin.portNaming
- Description
- `portNaming` defines the configuration of the port-to-service name translation.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
| Enable the console plugin port-to-service name translation |
|
|
|
|
17.1.25. .spec.consolePlugin.quickFilters
- Description
- `quickFilters` configures quick filter presets for the Console plugin. Filters for external traffic assume the subnet labels are configured to distinguish internal and external traffic (see `spec.processor.subnetLabels`).
- Type
- `array`
17.1.26. .spec.consolePlugin.quickFilters[]
- Description
- `QuickFilter` defines a preset configuration for the Console’s quick filters.
- Type
- `object`
- Required
- `filter`
- `name`
-
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the filter, as displayed in the Console |
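A quick filter entry pairs a display name with a set of filter keys. The following sketch is illustrative only: the filter keys and the `default` field shown are assumptions based on common usage, not taken from the table above.
spec:
  consolePlugin:
    quickFilters:
      - name: Applications
        filter:
          src_namespace!: "openshift-,netobserv"
          dst_namespace!: "openshift-,netobserv"
        default: true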
17.1.27. .spec.consolePlugin.resources
- Description
- `resources`, in terms of compute resources, required by this container. For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
| Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
|
|
| Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
17.1.28. .spec.exporters
- Description
- `exporters` defines additional optional exporters for custom consumption or storage.
- Type
- `array`
17.1.29. .spec.exporters[]
- Description
- `FlowCollectorExporter` defines an additional exporter to send enriched flows to.
- Type
- `object`
- Required
- `type`
-
| Property | Type | Description |
|---|---|---|
|
|
| IPFIX configuration, such as the IP address and port to send enriched IPFIX flows to. |
|
|
| Kafka configuration, such as the address and topic, to send enriched flows to. |
|
|
| OpenTelemetry configuration, such as the IP address and port to send enriched logs or metrics to. |
|
|
|
|
17.1.30. .spec.exporters[].ipfix
- Description
- IPFIX configuration, such as the IP address and port to send enriched IPFIX flows to.
- Type
- `object`
- Required
- `enterpriseID`
- `targetHost`
- `targetPort`
-
| Property | Type | Description |
|---|---|---|
|
|
| EnterpriseID, or Private Enterprise Number (PEN). To date, Network Observability does not own an assigned number, so it is left open for configuration. The PEN is needed to collect non-standard data, such as Kubernetes names, RTT, etc. |
|
|
| Address of the IPFIX external receiver. |
|
|
| Port for the IPFIX external receiver. |
|
|
| Transport protocol (
|
17.1.31. .spec.exporters[].kafka
- Description
- Kafka configuration, such as the address and topic, to send enriched flows to.
- Type
- `object`
- Required
- `address`
- `topic`
-
| Property | Type | Description |
|---|---|---|
|
|
| Address of the Kafka server |
|
|
| SASL authentication configuration. [Unsupported (*)]. |
|
|
| TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093. |
|
|
| Kafka topic to use. It must exist. Network Observability does not create it. |
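Putting the required address and topic fields together, an exporter entry might look like the following sketch. The broker address and topic name are illustrative assumptions, not defaults.
spec:
  exporters:
    - type: Kafka
      kafka:
        address: "kafka-cluster-kafka-bootstrap.netobserv:9093"
        topic: network-flows
        tls:
          enable: true
As noted above, the topic must already exist; Network Observability does not create it.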
17.1.32. .spec.exporters[].kafka.sasl
- Description
- SASL authentication configuration. [Unsupported (*)].
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Reference to the secret or config map containing the client ID |
|
|
| Reference to the secret or config map containing the client secret |
|
|
| Type of SASL authentication to use, or `Disabled` if SASL is not used.
|
17.1.33. .spec.exporters[].kafka.sasl.clientIDReference
- Description
- Reference to the secret or config map containing the client ID
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| File name within the config map or secret. |
|
|
| Name of the config map or secret containing the file. |
|
|
| Namespace of the config map or secret containing the file. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the file reference:
|
17.1.34. .spec.exporters[].kafka.sasl.clientSecretReference
- Description
- Reference to the secret or config map containing the client secret
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| File name within the config map or secret. |
|
|
| Name of the config map or secret containing the file. |
|
|
| Namespace of the config map or secret containing the file. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the file reference:
|
17.1.35. .spec.exporters[].kafka.tls
- Description
- TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.36. .spec.exporters[].kafka.tls.caCert
- Description
- `caCert` defines the reference of the certificate for the Certificate Authority.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.37. .spec.exporters[].kafka.tls.userCert
- Description
- `userCert` defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.38. .spec.exporters[].openTelemetry
- Description
- OpenTelemetry configuration, such as the IP address and port to send enriched logs or metrics to.
- Type
- `object`
- Required
- `targetHost`
- `targetPort`
-
| Property | Type | Description |
|---|---|---|
|
|
| Custom fields mapping to an OpenTelemetry conformant format. By default, the Network Observability format proposal is used: https://github.com/rhobs/observability-data-model/blob/main/network-observability.md#format-proposal. As there is currently no accepted standard for L3 or L4 enriched network logs, you can freely override it with your own. |
|
|
| Headers to add to messages (optional) |
|
|
| OpenTelemetry configuration for logs. |
|
|
| OpenTelemetry configuration for metrics. |
|
|
| Protocol of the OpenTelemetry connection. The available options are `http` and `grpc`.
|
|
|
| Address of the OpenTelemetry receiver. |
|
|
| Port for the OpenTelemetry receiver. |
|
|
| TLS client configuration. |
17.1.39. .spec.exporters[].openTelemetry.fieldsMapping
- Description
- Custom fields mapping to an OpenTelemetry conformant format. By default, the Network Observability format proposal is used: https://github.com/rhobs/observability-data-model/blob/main/network-observability.md#format-proposal. As there is currently no accepted standard for L3 or L4 enriched network logs, you can freely override it with your own.
- Type
-
array
17.1.40. .spec.exporters[].openTelemetry.fieldsMapping[]
- Description
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| |
|
|
| |
|
|
|
17.1.41. .spec.exporters[].openTelemetry.logs
- Description
- OpenTelemetry configuration for logs.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Set
|
17.1.42. .spec.exporters[].openTelemetry.metrics
- Description
- OpenTelemetry configuration for metrics.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Set
|
|
|
| Specify how often metrics are sent to a collector. |
17.1.43. .spec.exporters[].openTelemetry.tls
- Description
- TLS client configuration.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.44. .spec.exporters[].openTelemetry.tls.caCert
- Description
- `caCert` defines the reference of the certificate for the Certificate Authority.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.45. .spec.exporters[].openTelemetry.tls.userCert
- Description
- `userCert` defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.46. .spec.kafka
- Description
- Kafka configuration, allowing the use of Kafka as a broker as part of the flow collection pipeline. Available when `spec.deploymentModel` is `Kafka`.
- Type
- `object`
- Required
- `address`
- `topic`
-
| Property | Type | Description |
|---|---|---|
|
|
| Address of the Kafka server |
|
|
| SASL authentication configuration. [Unsupported (*)]. |
|
|
| TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093. |
|
|
| Kafka topic to use. It must exist. Network Observability does not create it. |
17.1.47. .spec.kafka.sasl
- Description
- SASL authentication configuration. [Unsupported (*)].
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Reference to the secret or config map containing the client ID |
|
|
| Reference to the secret or config map containing the client secret |
|
|
| Type of SASL authentication to use, or `Disabled` if SASL is not used.
|
17.1.48. .spec.kafka.sasl.clientIDReference
- Description
- Reference to the secret or config map containing the client ID
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| File name within the config map or secret. |
|
|
| Name of the config map or secret containing the file. |
|
|
| Namespace of the config map or secret containing the file. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the file reference:
|
17.1.49. .spec.kafka.sasl.clientSecretReference
- Description
- Reference to the secret or config map containing the client secret
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| File name within the config map or secret. |
|
|
| Name of the config map or secret containing the file. |
|
|
| Namespace of the config map or secret containing the file. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the file reference:
|
17.1.50. .spec.kafka.tls
- Description
- TLS client configuration. When using TLS, verify that the address matches the Kafka port used for TLS, generally 9093.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.51. .spec.kafka.tls.caCert
- Description
- `caCert` defines the reference of the certificate for the Certificate Authority.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.52. .spec.kafka.tls.userCert
- Description
- `userCert` defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.53. .spec.loki
- Description
- `loki`, the flow store, client settings.
- Type
- `object`
- Required
- `mode`
-
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Set
|
|
|
| Loki configuration for
|
|
|
| Loki configuration for
|
|
|
| Loki configuration for
|
|
|
|
- Use
- Use
- Use
- Use
|
|
|
| Loki configuration for
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
17.1.54. .spec.loki.advanced
- Description
- `advanced` allows setting some aspects of the internal configuration of the Loki clients. This section is aimed mostly at debugging and fine-grained performance optimizations.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
17.1.55. .spec.loki.lokiStack
- Description
- Loki configuration for `LokiStack` mode. This is useful for an easy Loki Operator configuration. It is ignored for other modes.
- Type
- `object`
- Required
- `name`
-
| Property | Type | Description |
|---|---|---|
|
|
| Name of an existing LokiStack resource to use. |
|
|
| Namespace where this `LokiStack` resource is located.
|
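For LokiStack mode, only the required name and, optionally, the namespace need to be set. The following sketch is illustrative; the resource name and namespace shown are assumptions.
spec:
  loki:
    mode: LokiStack
    lokiStack:
      name: loki
      namespace: netobserv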
17.1.56. .spec.loki.manual
- Description
- Loki configuration for `Manual` mode. This is the most flexible configuration. It is ignored for other modes.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
-
-
-
When using the Loki Operator, this must be set to
|
|
|
|
|
|
|
|
|
|
|
| TLS client configuration for Loki status URL. |
|
|
|
|
|
|
|
|
|
|
| TLS client configuration for Loki URL. |
17.1.57. .spec.loki.manual.statusTls
- Description
- TLS client configuration for Loki status URL.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.58. .spec.loki.manual.statusTls.caCert
- Description
- `caCert` defines the reference of the certificate for the Certificate Authority.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.59. .spec.loki.manual.statusTls.userCert
- Description
- `userCert` defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
- `object`
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.60. .spec.loki.manual.tls
- Description
- TLS client configuration for Loki URL.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.61. .spec.loki.manual.tls.caCert
- Description
- caCert defines the reference of the certificate for the Certificate Authority.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.62. .spec.loki.manual.tls.userCert
- Description
- userCert defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.63. .spec.loki.microservices
- Description
- Loki configuration for Microservices mode. Use this option when Loki is installed using the microservices deployment mode (https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#microservices-mode). It is ignored for other modes.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| TLS client configuration for Loki URL. |
17.1.64. .spec.loki.microservices.tls
- Description
- TLS client configuration for Loki URL.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.65. .spec.loki.microservices.tls.caCert
- Description
- caCert defines the reference of the certificate for the Certificate Authority.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.66. .spec.loki.microservices.tls.userCert
- Description
- userCert defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.67. .spec.loki.monolithic
- Description
- Loki configuration for Monolithic mode. Use this option when Loki is installed using the monolithic deployment mode (https://grafana.com/docs/loki/latest/fundamentals/architecture/deployment-modes/#monolithic-mode). It is ignored for other modes.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Set
|
|
|
|
|
|
|
| TLS client configuration for Loki URL. |
|
|
|
|
17.1.68. .spec.loki.monolithic.tls
- Description
- TLS client configuration for Loki URL.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.69. .spec.loki.monolithic.tls.caCert
- Description
- caCert defines the reference of the certificate for the Certificate Authority.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.70. .spec.loki.monolithic.tls.userCert
- Description
- userCert defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.71. .spec.networkPolicy
- Description
- networkPolicy defines network policy settings for the isolation of Network Observability components.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Deploys network policies on the namespaces used by Network Observability (main and privileged). These network policies better isolate the Network Observability components to prevent undesired connections to and from them. This option is enabled by default when using OVN-Kubernetes, and disabled otherwise (it has not been tested with other CNIs). When disabled, you can manually create the network policies for the Network Observability components. |
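As a sketch, enabling the policies described above could look like the following fragment. The additionalNamespaces property is an assumption for allowing traffic from extra namespaces; verify it against your operator version:

```yaml
spec:
  networkPolicy:
    enable: true
    # assumed property: extra namespaces allowed to reach Network Observability
    additionalNamespaces: ["openshift-console", "openshift-monitoring"]
```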
17.1.72. .spec.processor
- Description
- processor defines the settings of the component that receives the flows from the agent, enriches them, generates metrics, and forwards them to the Loki persistence layer and/or any available exporter.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
-
-
-
-
|
|
|
|
|
|
|
| Set
|
|
|
|
|
|
|
| Global configuration managing FlowCollectorSlices custom resources. |
|
|
|
|
|
|
| If
|
17.1.73. .spec.processor.advanced
- Description
- advanced allows setting some aspects of the internal configuration of the flow processor. This section is aimed mostly at debugging and fine-grained performance optimizations, such as the GOGC and GOMAXPROCS environment variables. Set these values at your own risk.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Port of the flow collector (host port). By convention, some values are forbidden. It must be greater than 1024 and different from 4500, 4789 and 6081. |
|
|
|
|
|
|
| scheduling controls how the pods are scheduled on nodes. |
|
|
| Defines secondary networks to be checked for resource identification. To guarantee correct identification, indexed values must form a unique identifier across the cluster. If the same index is used by several resources, those resources might be incorrectly labeled. |
17.1.74. .spec.processor.advanced.scheduling
- Description
- scheduling controls how the pods are scheduled on nodes.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| If specified, the pod’s scheduling constraints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling. |
|
|
|
|
|
|
| If specified, indicates the pod’s priority. For documentation, refer to https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#how-to-use-priority-and-preemption. If not specified, default priority is used, or zero if there is no default. |
|
|
|
|
17.1.75. .spec.processor.advanced.scheduling.affinity
- Description
- If specified, the pod’s scheduling constraints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling.
- Type
-
object
17.1.76. .spec.processor.advanced.scheduling.tolerations
- Description
- tolerations is a list of tolerations that allow the pod to schedule onto nodes with matching taints. For documentation, refer to https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#scheduling.
- Type
-
array
17.1.77. .spec.processor.advanced.secondaryNetworks
- Description
- Defines secondary networks to be checked for resource identification. To guarantee correct identification, indexed values must form a unique identifier across the cluster. If the same index is used by several resources, those resources might be incorrectly labeled.
- Type
-
array
17.1.78. .spec.processor.advanced.secondaryNetworks[]
- Description
- Type
-
object - Required
-
index -
name
-
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
17.1.79. .spec.processor.deduper
- Description
- deduper allows you to sample or drop flows identified as duplicates, in order to save on resource usage.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
Set the Processor de-duplication mode. It comes in addition to the Agent-based deduplication, since the Agent cannot de-duplicate the same flows reported from different nodes. - Use
- Use
- Use
|
|
|
|
|
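A minimal sketch of the deduper configuration described above, assuming mode accepts a sampling-based value and that sampling is expressed as a 1:N ratio:

```yaml
spec:
  processor:
    deduper:
      mode: Sample     # assumed mode value; alternatives may include Disabled or Drop
      sampling: 50     # keep 1 of every 50 flows identified as duplicates
```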
17.1.80. .spec.processor.filters
- Description
- filters lets you define custom filters to limit the amount of generated flows. These filters provide more flexibility than the eBPF Agent filters (in spec.agent.ebpf.flowFilter), such as the ability to filter by Kubernetes namespace, but with a smaller performance improvement.
- Type
-
array
17.1.81. .spec.processor.filters[]
- Description
- FLPFilterSet defines the desired configuration for FLP-based filtering satisfying all conditions.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| If specified, these filters target a single output:
|
|
|
| A query that selects the network flows to keep. More information about this query language in https://github.com/netobserv/flowlogs-pipeline/blob/main/docs/filtering.md. |
|
|
|
|
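A hypothetical filter following the query language linked above; the namespace value is illustrative, and the outputTarget and sampling property names are assumptions to check against your operator version:

```yaml
spec:
  processor:
    filters:
      - query: 'SrcK8S_Namespace="my-app" or DstK8S_Namespace="my-app"'  # hypothetical namespace
        outputTarget: Loki   # assumed property restricting the filter to one output
        sampling: 10         # keep 1 of every 10 matching flows
```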
17.1.82. .spec.processor.kafkaConsumerAutoscaler
- Description
- kafkaConsumerAutoscaler [deprecated (*)] is the spec of a horizontal pod autoscaler to set up for flowlogs-pipeline-transformer, which consumes Kafka messages. This setting is ignored when Kafka is disabled. Deprecation notice: the managed autoscaler will be removed in a future version. You can instead configure an autoscaler of your choice and set spec.processor.unmanagedReplicas to true. Refer to the HorizontalPodAutoscaler documentation (autoscaling/v2).
- Type
-
object
17.1.83. .spec.processor.metrics
- Description
- Metrics define the processor configuration regarding metrics.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| Metrics server endpoint configuration for Prometheus scraper |
17.1.84. .spec.processor.metrics.healthRules
- Description
- healthRules is a list of health rules to be created for Prometheus, organized by templates and variants. Each health rule can be configured to generate either alerts or recording rules, based on the mode field. More information on health rules: https://github.com/netobserv/network-observability-operator/blob/main/docs/HealthRules.md
- Type
-
array
17.1.85. .spec.processor.metrics.healthRules[]
- Description
- Type
-
object - Required
-
template -
variants
-
| Property | Type | Description |
|---|---|---|
|
|
| Mode defines whether this health rule should be generated as an alert or a recording rule. Possible values are:
|
|
|
| Health rule template name. Possible values are:
|
|
|
| A list of variants for this template |
17.1.86. .spec.processor.metrics.healthRules[].variants
- Description
- A list of variants for this template
- Type
-
array
17.1.87. .spec.processor.metrics.healthRules[].variants[]
- Description
- Type
-
object - Required
-
thresholds
-
| Property | Type | Description |
|---|---|---|
|
|
| Optional grouping criteria, possible values are:
|
|
|
| The low volume threshold allows ignoring metrics with too low a volume of traffic, in order to improve the signal-to-noise ratio. It is provided as an absolute rate (bytes per second or packets per second, depending on the context). When provided, it must be parsable as a float. |
|
|
| Mode overrides the health rule mode for this specific variant. If not specified, inherits from the parent health rule’s mode. Possible values are:
|
|
|
| Thresholds of the health rule per severity. They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats. Required for both alert and recording modes |
|
|
| For trending health rules, the duration interval for baseline comparison. For example, "2h" means comparing against a 2-hour average. Defaults to 2h. |
|
|
| For trending health rules, the time offset for baseline comparison. For example, "1d" means comparing against yesterday. Defaults to 1d. |
17.1.88. .spec.processor.metrics.healthRules[].variants[].thresholds
- Description
- Thresholds of the health rule per severity. They are expressed as a percentage of errors above which the alert is triggered. They must be parsable as floats. Required for both alert and recording modes.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Threshold for severity
|
|
|
| Threshold for severity
|
|
|
| Threshold for severity
|
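A sketch combining the template, variant, and threshold properties above. The template name and groupBy value are purely hypothetical, since the allowed values are listed elsewhere in the product documentation:

```yaml
spec:
  processor:
    metrics:
      healthRules:
        - template: DNSErrors        # hypothetical template name
          mode: Alert
          variants:
            - groupBy: Namespace     # hypothetical grouping value
              thresholds:
                warning: "10"        # trigger a warning above 10% errors
                critical: "20"       # trigger a critical alert above 20% errors
```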
17.1.89. .spec.processor.metrics.server
- Description
- Metrics server endpoint configuration for Prometheus scraper
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| The metrics server HTTP port. |
|
|
| TLS configuration. |
17.1.90. .spec.processor.metrics.server.tls
- Description
- TLS configuration.
- Type
-
object - Required
-
type
-
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| TLS configuration when
|
|
|
| Reference to the CA file when
|
|
|
|
Select the type of TLS configuration: -
|
17.1.91. .spec.processor.metrics.server.tls.provided
- Description
- TLS configuration when type is set to Provided.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.92. .spec.processor.metrics.server.tls.providedCaFile
- Description
- Reference to the CA file when type is set to Provided.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| File name within the config map or secret. |
|
|
| Name of the config map or secret containing the file. |
|
|
| Namespace of the config map or secret containing the file. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the file reference:
|
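Putting the server and Provided TLS properties together, a sketch might look as follows. The port value, secret name, and file keys are assumptions to verify against your deployment:

```yaml
spec:
  processor:
    metrics:
      server:
        port: 9401                   # example port; check your version's default
        tls:
          type: Provided
          provided:
            type: secret
            name: flp-metrics-cert   # hypothetical secret holding the certificate
            certFile: tls.crt        # assumed file key for the certificate
            certKey: tls.key         # assumed file key for the private key
```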
17.1.93. .spec.processor.resources
- Description
- resources are the compute resources required by this container. For more information, see https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Limits describes the maximum amount of compute resources allowed. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
|
|
| Requests describes the minimum amount of compute resources required. If Requests is omitted for a container, it defaults to Limits if that is explicitly specified, otherwise to an implementation-defined value. Requests cannot exceed Limits. More info: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
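Since resources follows the standard Kubernetes resource model, a typical fragment looks like this (the amounts are only examples):

```yaml
spec:
  processor:
    resources:
      requests:
        cpu: 100m       # minimum CPU guaranteed to the processor
        memory: 100Mi   # minimum memory guaranteed to the processor
      limits:
        memory: 800Mi   # maximum memory before the container is restricted
```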
17.1.94. .spec.processor.slicesConfig
- Description
- Global configuration managing FlowCollectorSlices custom resources.
- Type
-
object - Required
-
enable
-
| Property | Type | Description |
|---|---|---|
|
|
|
- When set to
- When set to
|
|
|
|
|
|
|
|
|
17.1.95. .spec.processor.subnetLabels
- Description
- subnetLabels allows you to define custom labels on subnets and IPs, or to enable automatic labeling of recognized subnets in OpenShift Container Platform, which is used to identify cluster-external traffic. When a subnet matches the source or destination IP of a flow, a corresponding field is added: SrcSubnetLabel or DstSubnetLabel.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
If
If
|
|
|
|
|
17.1.96. .spec.processor.subnetLabels.customLabels
- Description
- customLabels allows you to customize subnet and IP labeling, such as to identify cluster-external workloads or web services. External subnets must be labeled with the prefix EXT:, or not labeled at all, in order to work with the default quick filters and some of the provided metrics examples.
If openShiftAutoDetect is disabled or you are not using OpenShift Container Platform, it is recommended to manually configure labels for the cluster subnets, to distinguish internal traffic from external traffic.
If openShiftAutoDetect is enabled, customLabels overrides the detected subnets when they overlap.
- Type
-
array
17.1.97. .spec.processor.subnetLabels.customLabels[]
- Description
- SubnetLabel allows labeling subnets and IPs, such as to identify cluster-external workloads or web services.
- Type
-
object - Required
-
cidrs -
name
-
| Property | Type | Description |
|---|---|---|
|
|
| List of CIDRs, such as
|
|
|
| Label name, used to flag matching flows. External subnets must be labeled with the prefix
|
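Combining the EXT: prefix convention and the cidrs / name properties above, a sketch could be (the label and CIDR are placeholders):

```yaml
spec:
  processor:
    subnetLabels:
      openShiftAutoDetect: true
      customLabels:
        - name: "EXT:my-partner-service"   # hypothetical label; EXT: marks external subnets
          cidrs: ["198.51.100.0/24"]       # documentation-range CIDR, replace with real subnets
```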
17.1.98. .spec.prometheus
- Description
- prometheus defines Prometheus settings, such as the querier configuration used to fetch metrics from the Console plugin.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| Prometheus querying configuration, such as client settings, used in the Console plugin. |
17.1.99. .spec.prometheus.querier
- Description
- Prometheus querying configuration, such as client settings, used in the Console plugin.
- Type
-
object - Required
-
mode
-
| Property | Type | Description |
|---|---|---|
|
|
| When
|
|
|
| Prometheus configuration for
|
|
|
|
- Use
- Use
|
|
|
|
|
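A minimal sketch of the querier configuration; the enable property and the Auto mode value are assumptions to be checked against the API, since the allowed mode values are listed elsewhere:

```yaml
spec:
  prometheus:
    querier:
      enable: true    # assumed property
      mode: Auto      # assumed mode value; Manual would require the manual section below
      timeout: 30s    # client timeout for Prometheus queries
```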
17.1.100. .spec.prometheus.querier.manual
- Description
- Prometheus configuration for Manual mode.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| AlertManager configuration. This is used in the console to query silenced alerts, for displaying health information. When used in OpenShift Container Platform, it can be left empty to use the Console API instead. [Unsupported (*)]. |
|
|
| Set
|
|
|
| TLS client configuration for Prometheus URL. |
|
|
|
|
17.1.101. .spec.prometheus.querier.manual.alertManager
- Description
- AlertManager configuration. This is used in the console to query silenced alerts, for displaying health information. When used in OpenShift Container Platform, it can be left empty to use the Console API instead. [Unsupported (*)].
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| TLS client configuration for Prometheus AlertManager URL. |
|
|
|
|
17.1.102. .spec.prometheus.querier.manual.alertManager.tls
- Description
- TLS client configuration for Prometheus AlertManager URL.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.103. .spec.prometheus.querier.manual.alertManager.tls.caCert
- Description
- caCert defines the reference of the certificate for the Certificate Authority.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.104. .spec.prometheus.querier.manual.alertManager.tls.userCert
- Description
- userCert defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.105. .spec.prometheus.querier.manual.tls
- Description
- TLS client configuration for Prometheus URL.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
| Enable TLS |
|
|
|
|
|
|
|
|
17.1.106. .spec.prometheus.querier.manual.tls.caCert
- Description
- caCert defines the reference of the certificate for the Certificate Authority.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
17.1.107. .spec.prometheus.querier.manual.tls.userCert
- Description
- userCert defines the user certificate reference and is used for mTLS. When you use one-way TLS, you can ignore this property.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
| Name of the config map or secret containing certificates. |
|
|
| Namespace of the config map or secret containing certificates. If omitted, the default is to use the same namespace as where Network Observability is deployed. If the namespace is different, the config map or the secret is copied so that it can be mounted as required. |
|
|
| Type for the certificate reference:
|
Chapter 18. FlowMetric configuration parameters
The
FlowMetric
18.1. FlowMetric [flows.netobserv.io/v1alpha1]
- Description
- FlowMetric is the API that allows creating custom metrics from the collected flow logs.
- Type
-
object
| Property | Type | Description |
|---|---|---|
|
|
| APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and might reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources |
|
|
| Kind is a string value representing the REST resource this object represents. Servers might infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds |
|
|
| Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata |
|
|
|
FlowMetricSpec defines the desired state of FlowMetric. The provided API allows you to customize these metrics according to your needs.
When adding new metrics or modifying existing labels, you must carefully monitor the memory usage of Prometheus workloads, as this could potentially have a high impact. See https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#what-is-the-cardinality-of-a-metric To check the cardinality of all Network Observability metrics, run as
|
18.1.1. .metadata
- Description
- Standard object’s metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
- Type
-
object
18.1.2. .spec
- Description
FlowMetricSpec defines the desired state of FlowMetric. The provided API allows you to customize these metrics according to your needs.
When adding new metrics or modifying existing labels, you must carefully monitor the memory usage of Prometheus workloads, as this could potentially have a high impact. See https://rhobs-handbook.netlify.app/products/openshiftmonitoring/telemetry.md/#what-is-the-cardinality-of-a-metric
To check the cardinality of all Network Observability metrics, run as promql: count({__name__=~"netobserv.*"}) by (__name__)
- Type
-
object - Required
-
type
-
| Property | Type | Description |
|---|---|---|
|
|
| A list of buckets to use when
|
|
|
| Charts configuration, for the OpenShift Container Platform Console in the administrator view, Dashboards menu. |
|
|
| Filter for ingress, egress or any direction flows. When set to
|
|
|
| When nonzero, scale factor (divider) of the value. Metric value = Flow value / Divider. |
|
|
|
|
|
|
|
|
|
|
| Help text of the metric, as it appears in Prometheus. |
|
|
|
|
|
|
| Name of the metric. In Prometheus, it is automatically prefixed with "netobserv_". Leave empty to generate the name based on the
|
|
|
| Set the
|
|
|
| Metric type: "Counter", "Histogram" or "Gauge". Use "Counter" for any value that increases over time and on which you can compute a rate, such as Bytes or Packets. Use "Histogram" for any value that must be sampled independently, such as latencies. Use "Gauge" for other values that don’t necessitate accuracy over time (gauges are sampled only every N seconds when Prometheus fetches the metric). |
|
|
|
|
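As an illustrative sketch of the spec above, the following FlowMetric counts cluster-external ingress bytes. The metricName and valueField property names are assumptions based on the descriptions in the table, so verify them against the API:

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: cluster-external-ingress-traffic
  namespace: netobserv
spec:
  metricName: cluster_external_ingress_bytes_total  # exposed with the netobserv_ prefix
  type: Counter                  # a rate can be computed on this increasing value
  valueField: Bytes              # assumed name for the flow field to aggregate
  direction: Ingress
  labels: [DstK8S_HostName, DstK8S_Namespace]
  filters:
    - field: SrcSubnetLabel
      matchType: Absence         # no subnet label means the source is external
```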
18.1.3. .spec.charts
- Description
- Charts configuration, for the OpenShift Container Platform Console in the administrator view, Dashboards menu.
- Type
-
array
18.1.4. .spec.charts[]
- Description
- Configures the chart / dashboard generation associated with a metric.
- Type
-
object - Required
-
dashboardName -
queries -
title -
type
-
| Property | Type | Description |
|---|---|---|
|
|
| Name of the containing dashboard. If this name does not refer to an existing dashboard, a new dashboard is created. |
|
|
| List of queries to be displayed on this chart. If
|
|
|
| Name of the containing dashboard section. If this name does not refer to an existing section, a new section is created. If
|
|
|
| Title of the chart. |
|
|
| Type of the chart. |
|
|
| Unit of this chart. Only a few units are currently supported. Leave empty to use generic number. |
18.1.5. .spec.charts[].queries
- Description
- List of queries to be displayed on this chart. If type is SingleStat and multiple queries are provided, this chart is automatically expanded into several panels (one per query).
- Type
-
array
18.1.6. .spec.charts[].queries[]
- Description
- Configures PromQL queries
- Type
-
object - Required
-
legend -
promQL -
top
-
| Property | Type | Description |
|---|---|---|
|
|
| The query legend that applies to each timeseries represented in this chart. When multiple timeseries are displayed, you should set a legend that distinguishes each of them. It can be done with the following format:
|
|
|
| The
|
|
|
| Top N series to display per timestamp. Does not apply to
|
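A sketch of a chart with one query, using the required legend, promQL, and top properties from the table above. The $METRIC placeholder is an assumption for referring to the generated metric:

```yaml
charts:
  - dashboardName: Main
    sectionName: External
    title: Top external ingress traffic
    unit: Bps
    type: StackArea
    queries:
      - promQL: "sum(rate($METRIC[2m])) by (DstK8S_Namespace)"  # $METRIC expansion assumed
        legend: "{{DstK8S_Namespace}}"   # distinguishes each timeseries
        top: 7                           # show the 7 largest series per timestamp
```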
18.1.7. .spec.filters
- Description
- filters is a list of fields and values used to restrict which flows are taken into account. Refer to the documentation for the list of available fields: https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/network_observability/json-flows-format-reference.
- Type
-
array
18.1.8. .spec.filters[]
- Description
- Type
-
object - Required
-
field -
matchType
-
| Property | Type | Description |
|---|---|---|
|
|
| Name of the field to filter on (for example:
|
|
|
| Type of matching to apply |
|
|
| Value to filter on. When
|
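For example, restricting a FlowMetric to HTTPS traffic over TCP could use filters such as the following; the field names come from the flows format reference, while the matchType value is an assumption:

```yaml
filters:
  - field: DstPort
    matchType: Equal   # assumed match type value
    value: "443"       # HTTPS
  - field: Proto
    matchType: Equal
    value: "6"         # IP protocol number for TCP
```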
Chapter 19. Network flows format reference
Review the specifications for the network flow format, which is used internally and for exporting flow data to Kafka.
19.1. Network Flows format reference
This is the specification of the network flows format. This format is used when a Kafka exporter is configured, for Prometheus metrics labels, as well as internally for the Loki store.
The "Filter ID" column shows which related name to use when defining Quick Filters (see spec.consolePlugin.quickFilters in the FlowCollector resource).
The "Loki label" column is useful when querying Loki directly: label fields need to be selected using stream selectors.
The "Cardinality" column gives information about the implied metric cardinality if this field were to be used as a Prometheus label with the FlowMetrics API. Refer to the FlowMetrics documentation for more information about using this API.
| Name | Type | Description | Filter ID | Loki label | Cardinality | OpenTelemetry |
|---|---|---|---|---|---|---|
|
| number | Number of bytes | n/a | no | avoid | bytes |
|
| number | Error number returned from the DNS tracker eBPF hook function |
| no | fine | dns.errno |
|
| number | DNS flags for DNS record | n/a | no | fine | dns.flags |
|
| string | Parsed DNS header RCODEs name |
| no | fine | dns.responsecode |
|
| number | DNS record id |
| no | avoid | dns.id |
|
| number | Time between a DNS request and response, in milliseconds |
| no | avoid | dns.latency |
|
| string | DNS queried name |
| no | careful | n/a |
|
| number | Differentiated Services Code Point (DSCP) value |
| no | fine | dscp |
|
| string | Destination IP address (ipv4 or ipv6) |
| no | avoid | destination.address |
|
| string | Destination node IP |
| no | fine | destination.k8s.host.address |
|
| string | Destination node name |
| no | fine | destination.k8s.host.name |
|
| string | Name of the destination Kubernetes object, such as Pod name, Service name or Node name. |
| no | careful | destination.k8s.name |
|
| string | Destination namespace |
| yes | fine | destination.k8s.namespace.name |
|
| string | Destination network name |
| no | fine | n/a |
|
| string | Name of the destination owner, such as Deployment name, StatefulSet name, etc. |
| yes | fine | destination.k8s.owner.name |
|
| string | Kind of the destination owner, such as Deployment, StatefulSet, etc. |
| no | fine | destination.k8s.owner.kind |
|
| string | Kind of the destination Kubernetes object, such as Pod, Service or Node. |
| yes | fine | destination.k8s.kind |
|
| string | Destination availability zone |
| yes | fine | destination.zone |
|
| string | Destination MAC address |
| no | avoid | destination.mac |
|
| number | Destination port |
| no | careful | destination.port |
|
| string | Destination subnet label |
| no | fine | destination.subnet.label |
|
| string[] |
List of TCP flags comprised in the flow, according to RFC-9293, with additional custom flags to represent the following per-packet combinations: |
| no | careful | tcp.flags |
|
| number |
Flow interpreted direction from the node observation point. Can be one of: |
| yes | fine | host.direction |
|
| string | Status of the IPsec encryption (on egress, given by the kernel xfrm_output function) or decryption (on ingress, via xfrm_input) |
| no | fine | n/a |
|
| number | ICMP code |
| no | fine | icmp.code |
|
| number | ICMP type |
| no | fine | icmp.type |
|
| number[] |
Flow directions from the network interface observation point. Can be one of: |
| no | fine | interface.directions |
|
| string[] | Network interfaces |
| no | careful | interface.names |
|
| string | Cluster name or identifier |
| yes | fine | k8s.cluster.name |
|
| string | Flow layer: 'app' or 'infra' |
| yes | fine | k8s.layer |
|
| object[] |
Network events, such as network policy actions, composed of nested fields: |
| no | avoid | n/a |
|
| number | Number of packets | n/a | no | avoid | packets |
|
| number | Number of bytes dropped by the kernel | n/a | no | avoid | drops.bytes |
|
| string | Latest drop cause |
| no | fine | drops.latestcause |
|
| number | TCP flags on last dropped packet | n/a | no | fine | drops.latestflags |
|
| string | TCP state on last dropped packet |
| no | fine | drops.lateststate |
|
| number | Number of packets dropped by the kernel | n/a | no | avoid | drops.packets |
|
| number | L4 protocol |
| no | fine | protocol |
|
| number | Sampling interval used for this flow | n/a | no | fine | n/a |
|
| string | Source IP address (ipv4 or ipv6) |
| no | avoid | source.address |
|
| string | Source node IP |
| no | fine | source.k8s.host.address |
|
| string | Source node name |
| no | fine | source.k8s.host.name |
|
| string | Name of the source Kubernetes object, such as Pod name, Service name or Node name. |
| no | careful | source.k8s.name |
|
| string | Source namespace |
| yes | fine | source.k8s.namespace.name |
|
| string | Source network name |
| no | fine | n/a |
|
| string | Name of the source owner, such as Deployment name, StatefulSet name, etc. |
| yes | fine | source.k8s.owner.name |
|
| string | Kind of the source owner, such as Deployment, StatefulSet, etc. |
| no | fine | source.k8s.owner.kind |
|
| string | Kind of the source Kubernetes object, such as Pod, Service or Node. |
| yes | fine | source.k8s.kind |
|
| string | Source availability zone |
| yes | fine | source.zone |
|
| string | Source MAC address |
| no | avoid | source.mac |
|
| number | Source port |
| no | careful | source.port |
|
| string | Source subnet label |
| no | fine | source.subnet.label |
|
| number | End timestamp of this flow, in milliseconds | n/a | no | avoid | timeflowend |
|
| number | TCP Smoothed Round Trip Time (SRTT), in nanoseconds |
| no | avoid | tcp.rtt |
|
| number | Start timestamp of this flow, in milliseconds | n/a | no | avoid | timeflowstart |
|
| number | Timestamp when this flow was received and processed by the flow collector, in seconds | n/a | no | avoid | timereceived |
|
| string[] | List of User Defined Networks |
| no | careful | n/a |
|
| string | packet translation destination address |
| no | avoid | n/a |
|
| number | packet translation destination port |
| no | careful | n/a |
|
| string | packet translation source address |
| no | avoid | n/a |
|
| number | packet translation source port |
| no | careful | n/a |
|
| number | packet translation zone id |
| no | avoid | n/a |
|
| string | In conversation tracking, the conversation identifier |
| no | avoid | n/a |
|
| string | Type of record:
|
| yes | fine | n/a |
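The Cardinality column above matters because Prometheus creates one time series per unique combination of label values, so the series count grows multiplicatively with each label. The following toy sketch, with made-up per-label cardinalities, illustrates why "avoid" fields should not be used as metric labels:

```python
from math import prod

# Hypothetical per-label cardinalities in a mid-size cluster (illustrative only).
label_cardinality = {
    "namespace": 50,    # a "fine" label
    "node": 20,         # a "fine" label
    "ip": 10000,        # an "avoid" label
}

def series_estimate(labels: list[str]) -> int:
    """Worst-case number of Prometheus series for a metric using these labels."""
    return prod(label_cardinality[name] for name in labels)

print(series_estimate(["namespace", "node"]))        # 1000 series: manageable
print(series_estimate(["namespace", "node", "ip"]))  # 10000000 series: avoid
```

The exact numbers are not meaningful; the multiplicative growth is the point.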
Chapter 20. Troubleshooting network observability
Perform diagnostic actions to troubleshoot common issues related to the Network Observability Operator and its components.
20.1. Using the must-gather tool
Use the must-gather tool to collect diagnostic information about Network Observability Operator resources, including pod logs and configuration details, to assist in troubleshooting cluster issues.
Procedure
- Navigate to the directory where you want to store the must-gather data.
Run the following command to collect cluster-wide must-gather resources:
$ oc adm must-gather --image-stream=openshift/must-gather \
  --image=quay.io/netobserv/must-gather
20.2. Configuring network traffic menu entry in the OpenShift Container Platform console
Restore a missing network traffic menu entry in the Observe menu of the OpenShift Container Platform console by manually registering the console plugin in the FlowCollector resource.
Prerequisites
- You have installed OpenShift Container Platform version 4.10 or newer.
Procedure
Check if the spec.consolePlugin.register field is set to true by running the following command:

$ oc -n netobserv get flowcollector cluster -o yaml

Example output

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    register: false

Optional: Add the netobserv-plugin plugin by manually editing the Console Operator config:

$ oc edit console.operator.openshift.io cluster

Example output

...
spec:
  plugins:
  - netobserv-plugin
...

Optional: Set the spec.consolePlugin.register field to true by running the following command:

$ oc -n netobserv edit flowcollector cluster -o yaml

Example output

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    register: true

Ensure the status of console pods is running by running the following command:

$ oc get pods -n openshift-console -l app=console

Restart the console pods by running the following command:

$ oc delete pods -n openshift-console -l app=console

Clear your browser cache and history.
Check the status of network observability plugin pods by running the following command:
$ oc get pods -n netobserv -l app=netobserv-plugin

Example output

NAME                                READY   STATUS    RESTARTS   AGE
netobserv-plugin-68c7bbb9bb-b69q6   1/1     Running   0          21s

Check the logs of the network observability plugin pods by running the following command:

$ oc logs -n netobserv -l app=netobserv-plugin

Example output

time="2022-12-13T12:06:49Z" level=info msg="Starting netobserv-console-plugin [build version: , build date: 2022-10-21 15:15] at log level info" module=main
time="2022-12-13T12:06:49Z" level=info msg="listening on https://:9001" module=server
20.3. Flowlogs-Pipeline does not consume network flows after installing Kafka
Resolve issues where the flow-pipeline fails to consume network flows from Kafka by manually restarting the flow-pipeline pods to restore the connection between the flow collector and your Kafka deployment.
If you deployed the flow collector first with deploymentModel: KAFKA and then deployed Kafka, the flow collector might not connect correctly to Kafka. Manually restart the flow-pipeline pods.
Procedure
Delete the flow-pipeline pods to restart them by running the following command:
$ oc delete pods -n netobserv -l app=flowlogs-pipeline-transformer
20.4. Failing to see network flows from both br-int and br-ex interfaces
Resolve issues with missing network flows by removing interface restrictions on the virtual bridge devices br-int and br-ex.

br-ex and br-int are virtual bridge devices operating at OSI layer 2. The eBPF agent works at the IP and TCP levels, layers 3 and 4 respectively. You can expect that the eBPF agent captures the network traffic passing through br-ex and br-int when the traffic is processed by other interfaces, such as physical host or virtual pod interfaces. If you restrict the eBPF agent network interfaces to attach only to br-ex and br-int, you do not observe any network flow.

Manually remove the part in the interfaces or excludeInterfaces settings that restricts the network interfaces to br-int and br-ex.
Procedure
Remove the interfaces: [ 'br-int', 'br-ex' ] field. This allows the agent to fetch information from all the interfaces. Alternatively, you can specify a Layer-3 interface, for example eth0. Edit the FlowCollector resource by running the following command:

$ oc edit -n netobserv flowcollector cluster

Example output

apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      interfaces: [ 'br-int', 'br-ex' ]

The interfaces field specifies the network interfaces.
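As a sketch of the alternative approach mentioned above, you can rely on excludeInterfaces rather than an interfaces allow-list, so that the agent watches everything except the interfaces you name. The fragment below is illustrative; the lo exclusion is an example, not a requirement:

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: EBPF
    ebpf:
      # Leave "interfaces" unset so the agent captures on all interfaces,
      # and exclude only what you never want to observe:
      excludeInterfaces: [ 'lo' ]
```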
20.5. Network observability controller manager pod runs out of memory
Resolve memory issues with the Network Observability Operator by increasing the memory limits in the Subscription object.

You can increase memory limits for the Network Observability Operator by editing the spec.config.resources.limits.memory specification in the Subscription object.
Procedure
- In the web console, navigate to Operators → Installed Operators
- Click Network Observability and then select Subscription.
From the Actions menu, click Edit Subscription.
Alternatively, you can use the CLI to open the YAML configuration for the Subscription object by running the following command:

$ oc edit subscription netobserv-operator -n openshift-netobserv-operator
Edit the Subscription object to add the config.resources.limits.memory specification and set the value to account for your memory requirements. See the Additional resources for more information about resource considerations:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator
  namespace: openshift-netobserv-operator
spec:
  channel: stable
  config:
    resources:
      limits:
        memory: 800Mi
      requests:
        cpu: 100m
        memory: 100Mi
  installPlanApproval: Automatic
  name: netobserv-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  startingCSV: <network_observability_operator_latest_version>
20.6. Running custom queries to Loki
Troubleshoot network flow data by running custom Loki queries to retrieve available labels or filter logs by specific criteria, such as source namespaces, using the command-line interface.
There are two examples of ways to do this, which you can adapt according to your needs by replacing <api_token> with your own.

These examples use the netobserv namespace for both Loki and network observability, and assume that the LokiStack is named loki. You can optionally use a different namespace and naming by adapting the examples, specifically the -n netobserv option and the loki-gateway URL.
Prerequisites
- You have installed the Loki Operator for use with the Network Observability Operator.
Procedure
To get all available labels, run the following command:
$ oc exec deployment/netobserv-plugin -n netobserv -- curl -G -s -H 'X-Scope-OrgID:network' -H 'Authorization: Bearer <api_token>' -k https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network/loki/api/v1/labels | jq

To get all flows from the source namespace, my-namespace, run the following command:

$ oc exec deployment/netobserv-plugin -n netobserv -- curl -G -s -H 'X-Scope-OrgID:network' -H 'Authorization: Bearer <api_token>' -k https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network/loki/api/v1/query --data-urlencode 'query={SrcK8S_Namespace="my-namespace"}' | jq
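The query URLs above can also be composed programmatically when scripting against the gateway. A minimal Python sketch, assuming the same loki-gateway service address and network tenant as in the examples; authentication is left out:

```python
from urllib.parse import quote

# Base URL of the loki-gateway service, as used in the curl examples above.
BASE = "https://loki-gateway-http.netobserv.svc:8080/api/logs/v1/network"

def loki_query_url(logql: str) -> str:
    """Build the full Loki query URL for a LogQL expression."""
    return f"{BASE}/loki/api/v1/query?query={quote(logql, safe='')}"

# Flows whose source namespace is "my-namespace" (an indexed Loki label).
url = loki_query_url('{SrcK8S_Namespace="my-namespace"}')
print(url)
```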
20.7. Troubleshooting Loki ResourceExhausted error
Resolve Loki ResourceExhausted errors by ensuring that the batchSize value in the FlowCollector resource does not exceed the maximum message size that Loki accepts.

Loki may return a ResourceExhausted error when flow data sent by network observability exceeds the configured maximum message size. If you are using the Red Hat Loki Operator, this maximum message size is configured to 100 MiB.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project drop-down menu.
- In the Provided APIs list, select the Network Observability Operator.
Click the Flow Collector, and then select the YAML view tab.
- If you are using the Loki Operator, check that the spec.loki.batchSize value does not exceed 98 MiB.
- If you are using a Loki installation method that is different from the Red Hat Loki Operator, such as Grafana Loki, verify that the grpc_server_max_recv_msg_size Grafana Loki server setting is higher than the FlowCollector resource spec.loki.batchSize value. If it is not, you must either increase the grpc_server_max_recv_msg_size value, or decrease the spec.loki.batchSize value so that it is lower than the limit.
- Click Save if you edited the FlowCollector.
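The relationship between the two settings checked above can be expressed as a small sanity check. This is a sketch; the values below are illustrative, matching the 98 MiB and 100 MiB figures mentioned earlier:

```python
# Illustrative check: the FlowCollector batch size must stay below the
# Loki server's gRPC receive limit (100 MiB with the Red Hat Loki Operator).
MIB = 1024 * 1024

batch_size = 98 * MIB        # FlowCollector spec.loki.batchSize
grpc_max_recv = 100 * MIB    # Loki grpc_server_max_recv_msg_size

def batch_size_ok(batch: int, grpc_limit: int) -> bool:
    """The batch sent to Loki must be lower than the server's receive limit."""
    return batch < grpc_limit

print(batch_size_ok(batch_size, grpc_max_recv))  # 98 MiB fits under 100 MiB
```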
20.8. Loki empty ring error
Investigate and resolve Loki "empty ring" errors by checking pod health, clearing old persistent volume claims, or restarting pods to restore connectivity and ensure network flows are properly stored and displayed.
The Loki "empty ring" error results in flows not being stored in Loki and not showing up in the web console. This error might happen in various situations. A single workaround to address them all does not exist. There are some actions you can take to investigate the logs in your Loki pods, and verify that the LokiStack is healthy and ready.
Some of the situations where this error is observed are as follows:
- After a LokiStack is uninstalled and reinstalled in the same namespace, old PVCs are not removed, which can cause this error.
  - Action: You can try removing the LokiStack again, removing the PVC, then reinstalling the LokiStack.
- After a certificate rotation, this error can prevent communication with the flowlogs-pipeline and console-plugin pods.
  - Action: You can restart the pods to restore the connectivity.
20.9. Resource troubleshooting
20.10. LokiStack rate limit errors
Resolve Loki rate limit errors and prevent data loss by updating the LokiStack custom resource with the perStreamRateLimit and perStreamRateLimitBurst specifications.

A rate limit placed on the Loki tenant can result in potential temporary loss of data and a 429 error:

Per stream rate limit exceeded (limit:xMB/sec) while attempting to ingest for stream

You can update the LokiStack CRD with the perStreamRateLimit and perStreamRateLimitBurst specifications.
Procedure
- Navigate to Operators → Installed Operators, viewing All projects from the Project dropdown.
- Look for Loki Operator, and select the LokiStack tab.
Create or edit an existing LokiStack instance using the YAML view to add the perStreamRateLimit and perStreamRateLimitBurst specifications:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: loki
  namespace: netobserv
spec:
  limits:
    global:
      ingestion:
        perStreamRateLimit: 6
        perStreamRateLimitBurst: 30
  tenants:
    mode: openshift-network
  managementState: Managed

- Click Save.
Verification
Once you update the perStreamRateLimit and perStreamRateLimitBurst specifications, the pods in your cluster restart and the 429 rate-limit error no longer occurs.
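The interplay of the two settings can be pictured as a token bucket: the rate limit is the sustained ingestion rate per stream, and the burst is the extra allowance for short spikes. The model below is a simplified illustration, not Loki's exact implementation; the values mirror the LokiStack example above:

```python
# Simplified token-bucket model of Loki's per-stream rate limiting.
per_stream_rate_limit = 6          # sustained rate, MB/s
per_stream_rate_limit_burst = 30   # burst allowance, MB

def accepts(ingest_mb_per_s: float, duration_s: float) -> bool:
    """A stream stays under the limit while total ingestion over the window
    fits within rate * duration plus the burst allowance."""
    total = ingest_mb_per_s * duration_s
    return total <= per_stream_rate_limit * duration_s + per_stream_rate_limit_burst

print(accepts(8, 10))  # a short 8 MB/s spike over 10s fits within the burst
print(accepts(8, 60))  # a sustained 8 MB/s over 60s exceeds the limit -> 429
```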
20.11. Running a large query results in Loki errors
Understand how you can mitigate Loki timeout and request errors when running large queries by using indexed filters, leveraging Prometheus for long time ranges, creating custom metrics, or adjusting Loki and FlowCollector performance settings.
When running large queries for a long time, Loki errors can occur, such as a timeout or too many outstanding requests error. There is no complete fix for these issues, but there are several ways to mitigate them:
- Adapt your query to add an indexed filter
- With Loki queries, you can query on both indexed and non-indexed fields or labels. Queries that contain filters on labels perform better. For example, if you query for a particular Pod, which is not an indexed field, you can add its Namespace to the query. The list of indexed fields can be found in the "Network flows format reference", in the Loki label column.
Loki labelcolumn. - Consider querying Prometheus rather than Loki
- Prometheus is a better fit than Loki to query on large time ranges. However, whether or not you can use Prometheus instead of Loki depends on the use case. For example, queries on Prometheus are much faster than on Loki, and large time ranges do not impact performance. But Prometheus metrics do not contain as much information as flow logs in Loki. The Network Observability OpenShift web console automatically favors Prometheus over Loki if the query is compatible; otherwise, it defaults to Loki. If your query does not run against Prometheus, you can change some filters or aggregations to make the switch. In the OpenShift web console, you can force the use of Prometheus. An error message is displayed when incompatible queries fail, which can help you figure out which labels to change to make the query compatible. For example, changing a filter or an aggregation from Resource or Pods to Owner.
- Consider using the FlowMetrics API to create your own metric
- If the data that you need isn’t available as a Prometheus metric, you can use the FlowMetrics API to create your own metric. For more information, see "FlowMetrics API Reference" and "Configuring custom metrics by using FlowMetric API".
- Configure Loki to improve the query performance
If the problem persists, you can consider configuring Loki to improve the query performance. Some options depend on the installation mode you used for Loki: using the Operator and LokiStack, Monolithic mode, or Microservices mode.
- In LokiStack or Microservices modes, try increasing the number of querier replicas.
- Increase the query timeout. You must also increase the Network Observability read timeout to Loki in the FlowCollector spec.loki.readTimeout.
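To illustrate the indexed-filter advice above, the two LogQL queries below are hypothetical examples: both match the same flows, but the second narrows the stream selector with the indexed SrcK8S_Namespace label before applying the line filter, so Loki scans far fewer streams:

```
Without an indexed filter (scans every stream):
{app="netobserv-flowcollector"} |= `my-pod`

With an indexed filter (scans only one namespace's streams):
{app="netobserv-flowcollector", SrcK8S_Namespace="my-namespace"} |= `my-pod`
```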
Legal Notice
Copyright © Red Hat
OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
Modified versions must remove all Red Hat trademarks.
Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.
Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
Node.js® is an official trademark of the OpenJS Foundation.
The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.
All other trademarks are the property of their respective owners.