Chapter 1. Network Observability Operator release notes
The Network Observability Operator enables administrators to observe and analyze network traffic flows for OpenShift Container Platform clusters.
These release notes track the development of the Network Observability Operator in the OpenShift Container Platform.
For an overview of the Network Observability Operator, see About Network Observability Operator.
1.1. Network Observability Operator 1.7.0
The following advisory is available for the Network Observability Operator 1.7.0:
1.1.1. New features and enhancements
1.1.1.1. OpenTelemetry support
You can now export enriched network flows to a compatible OpenTelemetry endpoint, such as the Red Hat build of OpenTelemetry. For more information, see Export enriched network flow data.
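As a minimal sketch, an OpenTelemetry exporter entry in the FlowCollector resource might look like the following. The target host and port are placeholder values rather than defaults, and the complete set of exporter options should be checked against the FlowCollector API reference.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  exporters:
    - type: OpenTelemetry
      openTelemetry:
        # Placeholder endpoint: replace with your own collector service.
        targetHost: my-otelcol-collector-headless.otlp.svc
        targetPort: 4317
        logs:
          enable: true      # send enriched flows as logs
        metrics:
          enable: true      # send enriched flows as metrics
```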
1.1.1.2. Network Observability Developer perspective
You can now use Network Observability in the Developer perspective. For more information, see OpenShift Container Platform console integration.
1.1.1.3. TCP flags filtering
You can now use the tcpFlags filter to limit the volume of packets processed by the eBPF program. For more information, see Flow filter configuration parameters, eBPF flow rule filter, and Detecting SYN flooding using the FlowMetric API and TCP flags.
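The following fragment is a sketch of how the tcpFlags filter could be set, assuming that you want to keep only TCP packets carrying the SYN flag. The CIDR value is illustrative and matches all addresses.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true
        action: Accept      # keep only the flows that match the rule
        cidr: 0.0.0.0/0     # illustrative: match any address
        protocol: TCP
        tcpFlags: SYN       # only packets with the SYN flag set
```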
1.1.1.4. Network Observability for OpenShift Virtualization
You can observe networking patterns on an OpenShift Virtualization setup by identifying eBPF-enriched network flows coming from VMs that are connected to secondary networks, such as through Open Virtual Network (OVN)-Kubernetes. For more information, see Configuring virtual machine (VM) secondary network interfaces for Network Observability.
1.1.1.5. Network policy deploys in the FlowCollector custom resource (CR)
With this release, you can configure the FlowCollector CR to deploy a network policy for Network Observability. Previously, if you wanted a network policy, you had to manually create one. The option to manually create a network policy is still available. For more information, see Configuring an ingress network policy by using the FlowCollector custom resource.
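A minimal sketch of this setting follows. The namespaces listed under additionalNamespaces are examples only; which namespaces need access depends on your cluster.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  networkPolicy:
    enable: true            # let the Operator deploy the network policy
    # Example values: namespaces that are also allowed to connect.
    additionalNamespaces:
      - openshift-console
      - openshift-monitoring
```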
1.1.1.6. FIPS compliance
You can install and use the Network Observability Operator in an OpenShift Container Platform cluster running in FIPS mode.
Important: To enable FIPS mode for your cluster, you must run the installation program from a RHEL computer configured to operate in FIPS mode. For more information about configuring FIPS mode on RHEL, see Installing the system in FIPS mode.
1.1.1.7. eBPF agent enhancements
The following enhancements are available for the eBPF agent:
- If the DNS service maps to a different port than 53, you can specify this DNS tracking port using spec.agent.ebpf.advanced.env.DNS_TRACKING_PORT.
- You can now use two ports for transport protocols (TCP, UDP, or SCTP) filtering rules.
- You can now filter on transport ports with a wildcard protocol by leaving the protocol field empty.
For more information, see FlowCollector API specifications.
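These enhancements can be sketched in a single FlowCollector fragment. The port numbers below are hypothetical; the protocol field is left out on purpose to illustrate the wildcard protocol case.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      features:
        - DNSTracking
      advanced:
        env:
          DNS_TRACKING_PORT: "5353"   # hypothetical non-default DNS port
      flowFilter:
        enable: true
        action: Accept
        cidr: 0.0.0.0/0               # illustrative: match any address
        ports: "80,100"               # two ports in "port1,port2" form
        # protocol is omitted, so the rule applies to any transport protocol
```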
1.1.1.8. Network Observability CLI
The Network Observability CLI (oc netobserv) is now generally available. The following enhancements have been made since the 1.6 Technology Preview release:
- There are now eBPF enrichment filters for packet capture similar to flow capture.
- You can now use filter tcp_flags with both flow and packet capture.
- The auto-teardown option is available when max-bytes or max-time is reached.
For more information, see Network Observability CLI and Network Observability CLI 1.7.0.
1.1.2. Bug fixes
- Previously, when using a RHEL 9.2 real-time kernel, some of the webhooks did not work. Now, a fix is in place to check whether this RHEL 9.2 real-time kernel is being used. If the kernel is being used, a warning is displayed about the features that do not work, such as packet drop, or Round-trip Time when using the s390x architecture. The fix is in OpenShift 4.16 and later. (NETOBSERV-1808)
- Previously, in the Manage panels dialog in the Overview tab, filtering on total, bar, donut, or line did not show a result. Now the available panels are correctly filtered. (NETOBSERV-1540)
- Previously, under high stress, the eBPF agents were susceptible to entering a state where they generated a high number of small flows, almost not aggregated. With this fix, the aggregation process is maintained under high stress, resulting in fewer flows being created. This fix improves the resource consumption not only in the eBPF agent but also in flowlogs-pipeline and Loki. (NETOBSERV-1564)
- Previously, when the workload_flows_total metric was enabled instead of the namespace_flows_total metric, the health dashboard stopped showing By namespace flow charts. With this fix, the health dashboard now shows the flow charts when the workload_flows_total metric is enabled. (NETOBSERV-1746)
- Previously, when you used the FlowMetrics API to generate a custom metric and later modified its labels, such as by adding a new label, the metric stopped populating and an error was shown in the flowlogs-pipeline logs. With this fix, you can modify the labels, and the error is no longer raised in the flowlogs-pipeline logs. (NETOBSERV-1748)
- Previously, there was an inconsistency with the default Loki WriteBatchSize configuration: it was set to 100 KB in the FlowCollector CRD default, and 10 MB in the OLM sample or default configuration. Both are now aligned to 10 MB, which generally provides better performance and a smaller resource footprint. (NETOBSERV-1766)
- Previously, the eBPF flow filter on ports was ignored if you did not specify a protocol. With this fix, you can set eBPF flow filters independently on ports, protocols, or both. (NETOBSERV-1779)
- Previously, traffic from Pods to Services was hidden from the Topology view. Only the return traffic from Services to Pods was visible. With this fix, that traffic is correctly displayed. (NETOBSERV-1788)
- Previously, non-cluster administrator users that had access to Network Observability saw an error in the console plugin when they tried to filter for something that triggered auto-completion, such as a namespace. With this fix, no error is displayed, and the auto-completion returns the expected results. (NETOBSERV-1798)
- When the secondary interface support was added, you had to iterate multiple times to register the per network namespace with the netlink to learn about interface notifications. At the same time, unsuccessful handlers caused a leaking file descriptor because with TCX hook, unlike TC, handlers needed to be explicitly removed when the interface went down. Furthermore, when the network namespace was deleted, there was no Go close channel event to terminate the netlink goroutine socket, which caused go threads to leak. Now, there are no longer leaking file descriptors or go threads when you create or delete pods. (NETOBSERV-1805)
- Previously, the ICMP type and value were displaying 'n/a' in the Traffic flows table even when related data was available in the flow JSON. With this fix, ICMP columns display related values as expected in the flow table. (NETOBSERV-1806)
- Previously in the console plugin, it wasn’t always possible to filter for unset fields, such as unset DNS latency. With this fix, filtering on unset fields is now possible. (NETOBSERV-1816)
- Previously, when you cleared filters in the OpenShift web console plugin, sometimes the filters reappeared after you navigated to another page and returned to the page with filters. With this fix, filters do not unexpectedly reappear after they are cleared. (NETOBSERV-1733)
1.1.3. Known issues
- When you use the must-gather tool with Network Observability, logs are not collected when the cluster has FIPS enabled. (NETOBSERV-1830)
- When the spec.networkPolicy is enabled in the FlowCollector, which installs a network policy on the netobserv namespace, it is impossible to use the FlowMetrics API. The network policy blocks calls to the validation webhook. As a workaround, use the following network policy:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-from-hostnetwork
      namespace: netobserv
    spec:
      podSelector:
        matchLabels:
          app: netobserv-operator
      ingress:
        - from:
            - namespaceSelector:
                matchLabels:
                  policy-group.network.openshift.io/host-network: ''
      policyTypes:
        - Ingress
1.2. Network Observability Operator 1.6.2
The following advisory is available for the Network Observability Operator 1.6.2:
1.2.1. CVEs
1.2.2. Bug fixes
- When the secondary interface support was added, there was a need to iterate multiple times to register the per network namespace with the netlink to learn about interface notifications. At the same time, unsuccessful handlers caused a leaking file descriptor because with TCX hook, unlike TC, handlers needed to be explicitly removed when the interface went down. Now, there are no longer leaking file descriptors when creating and deleting pods. (NETOBSERV-1805)
1.2.3. Known issues
There was a compatibility issue with console plugins that would have prevented Network Observability from being installed on future versions of an OpenShift Container Platform cluster. By upgrading to 1.6.2, the compatibility issue is resolved and Network Observability can be installed as expected. (NETOBSERV-1737)
1.3. Network Observability Operator 1.6.1
The following advisory is available for the Network Observability Operator 1.6.1:
1.3.1. CVEs
1.3.2. Bug fixes
- Previously, information about packet drops, such as the cause and TCP state, was only available in the Loki datastore and not in Prometheus. For that reason, the drop statistics in the OpenShift web console plugin Overview were only available with Loki. With this fix, information about packet drops is also added to metrics, so you can view drop statistics when Loki is disabled. (NETOBSERV-1649)
- When the eBPF agent PacketDrop feature was enabled, and sampling was configured to a value greater than 1, reported dropped bytes and dropped packets ignored the sampling configuration. While this was done on purpose, so as not to miss any drops, a side effect was that the reported proportion of drops compared with non-drops became biased. For example, at a very high sampling rate, such as 1:1000, it was likely that almost all the traffic appeared to be dropped when observed from the console plugin. With this fix, the sampling configuration is honored with dropped bytes and packets. (NETOBSERV-1676)
- Previously, the SR-IOV secondary interface was not detected if the interface was created first and then the eBPF agent was deployed. It was only detected if the agent was deployed first and then the SR-IOV interface was created. With this fix, the SR-IOV secondary interface is detected no matter the sequence of the deployments. (NETOBSERV-1697)
- Previously, when Loki was disabled, the Topology view in the OpenShift web console displayed the Cluster and Zone aggregation options in the slider beside the network topology diagram, even when the related features were not enabled. With this fix, the slider now only displays options according to the enabled features. (NETOBSERV-1705)
- Previously, when Loki was disabled, and the OpenShift web console was first loading, an error would occur: Request failed with status code 400 Loki is disabled. With this fix, the errors no longer occur. (NETOBSERV-1706)
- Previously, in the Topology view of the OpenShift web console, when clicking on the Step into icon next to any graph node, the filters were not applied as required in order to set the focus to the selected graph node, resulting in showing a wide view of the Topology view in the OpenShift web console. With this fix, the filters are correctly set, effectively narrowing down the Topology. As part of this change, clicking the Step into icon on a Node now brings you to the Resource scope instead of the Namespaces scope. (NETOBSERV-1720)
- Previously, when Loki was disabled, in the Topology view of the OpenShift web console with the Scope set to Owner, clicking on the Step into icon next to any graph node would bring the Scope to Resource, which is not available without Loki, so an error message was shown. With this fix, the Step into icon is hidden in the Owner scope when Loki is disabled, so this scenario no longer occurs. (NETOBSERV-1721)
- Previously, when Loki was disabled, an error was displayed in the Topology view of the OpenShift web console when a group was set, but then the scope was changed so that the group becomes invalid. With this fix, the invalid group is removed, preventing the error. (NETOBSERV-1722)
- When creating a FlowCollector resource from the OpenShift web console Form view, as opposed to the YAML view, the following settings were incorrectly managed by the web console: agent.ebpf.metrics.enable and processor.subnetLabels.openShiftAutoDetect. These settings can only be disabled in the YAML view, not in the Form view. To avoid any confusion, these settings have been removed from the Form view. They are still accessible in the YAML view. (NETOBSERV-1731)
- Previously, the eBPF agent was unable to clean up traffic control flows installed before an ungraceful crash, for example a crash due to a SIGTERM signal. This led to the creation of multiple traffic control flow filters with the same name, since the older ones were not removed. With this fix, all previously installed traffic control flows are cleaned up when the agent starts, before installing new ones. (NETOBSERV-1732)
- Previously, when configuring custom subnet labels and keeping the OpenShift subnets auto-detection enabled, OpenShift subnets would take precedence over the custom ones, preventing the definition of custom labels for in-cluster subnets. With this fix, custom-defined subnets take precedence, allowing the definition of custom labels for in-cluster subnets. (NETOBSERV-1734)
1.4. Network Observability Operator 1.6.0
The following advisory is available for the Network Observability Operator 1.6.0:
Before upgrading to the latest version of the Network Observability Operator, you must migrate removed stored versions of the FlowCollector CRD. An automated solution to this workaround is planned with NETOBSERV-1747.
1.4.1. New features and enhancements
1.4.1.1. Enhanced use of Network Observability Operator without Loki
You can now use Prometheus metrics and rely less on Loki for storage when using the Network Observability Operator. For more information, see Network Observability without Loki.
1.4.1.2. Custom metrics API
You can create custom metrics out of flowlogs data by using the FlowMetrics API. Flowlogs data can be used with Prometheus labels to customize cluster information on your dashboards. You can add custom labels for any subnet that you want to identify in your flows and metrics. This enhancement can also be used to more easily identify external traffic by using the new labels SrcSubnetLabel and DstSubnetLabel, which exist both in flow logs and in metrics. Those fields are empty when there is external traffic, which gives a way to identify it. For more information, see Custom metrics and FlowMetric API reference.
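As an illustration of identifying external traffic through an empty subnet label, a FlowMetric resource could be sketched as follows. The resource name and metric name are hypothetical, and the chosen labels are only one possible selection.

```yaml
apiVersion: flows.netobserv.io/v1alpha1
kind: FlowMetric
metadata:
  name: flowmetric-cluster-external-ingress-traffic   # hypothetical name
  namespace: netobserv
spec:
  metricName: cluster_external_ingress_bytes_total    # hypothetical metric name
  type: Counter
  valueField: Bytes
  direction: Ingress
  labels: [DstK8S_HostName, DstK8S_Namespace, DstK8S_OwnerName, DstK8S_OwnerType]
  filters:
    # An absent SrcSubnetLabel means the source is outside the known subnets.
    - field: SrcSubnetLabel
      matchType: Absence
```

Filtering on the absence of SrcSubnetLabel counts only the bytes whose source is not covered by any labeled subnet, which is how the empty field described above serves to isolate external ingress traffic.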
1.4.1.3. eBPF performance enhancements
Experience improved performance of the eBPF agent, in terms of CPU and memory, with the following updates:
- The eBPF agent now uses TCX hooks instead of TC.
- The NetObserv / Health dashboard has a new section that shows eBPF metrics.
- Based on the new eBPF metrics, an alert notifies you when the eBPF agent is dropping flows.
- Loki storage demand decreases significantly now that duplicated flows are removed. Instead of having multiple, individual duplicated flows per network interface, there is one de-duplicated flow with a list of related network interfaces.
With the duplicated flows update, the Interface and Interface Direction fields in the Network Traffic table are renamed to Interfaces and Interface Directions, so any bookmarked Quick filter queries using these fields need to be updated to interfaces and ifdirections.
For more information, see Using the eBPF agent alert and Quick filters.
1.4.1.4. eBPF collection rule-based filtering
You can use rule-based filtering to reduce the volume of created flows. When this option is enabled, the NetObserv / Health dashboard for eBPF agent statistics has the Filtered flows rate view. For more information, see eBPF flow rule filter.
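A rule of this kind might be sketched as follows. All addresses are invented for illustration, and the exact set of supported filter fields for your Operator version should be confirmed in the eBPF flow rule filter documentation.

```yaml
spec:
  agent:
    type: eBPF
    ebpf:
      flowFilter:
        enable: true
        action: Reject            # drop matching flows instead of keeping them
        cidr: 172.210.150.1/24    # illustrative subnet
        protocol: UDP
        direction: Ingress
        peerIP: 10.10.10.10       # illustrative remote address
```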
1.4.2. Technology Preview features
Some features in this release are currently in Technology Preview. These experimental features are not intended for production use. Note the following scope of support on the Red Hat Customer Portal for these features:
Technology Preview Features Support Scope
1.4.2.1. Network Observability CLI
You can debug and troubleshoot network traffic issues without needing to install the Network Observability Operator by using the Network Observability CLI. Capture and visualize flow and packet data in real-time with no persistent storage requirement during the capture. For more information, see Network Observability CLI and Network Observability CLI 1.6.0.
1.4.3. Bug fixes
- Previously, a dead link to the OpenShift Container Platform documentation was displayed in the Operator Lifecycle Manager (OLM) form for the FlowMetrics API creation. Now the link has been updated to point to a valid page. (NETOBSERV-1607)
- Previously, the Network Observability Operator description in the Operator Hub displayed a broken link to the documentation. With this fix, this link is restored. (NETOBSERV-1544)
- Previously, if Loki was disabled and the Loki Mode was set to LokiStack, or if Loki manual TLS configuration was configured, the Network Observability Operator still tried to read the Loki CA certificates. With this fix, when Loki is disabled, the Loki certificates are not read, even if there are settings in the Loki configuration. (NETOBSERV-1647)
- Previously, the oc must-gather plugin for the Network Observability Operator was only working on the amd64 architecture and failing on all others because the plugin was using amd64 for the oc binary. Now, the Network Observability Operator oc must-gather plugin collects logs on any architecture platform.
- Previously, when filtering on IP addresses using not equal to, the Network Observability Operator would return a request error. Now, the IP filtering works in both equal and not equal to cases for IP addresses and ranges. (NETOBSERV-1630)
- Previously, when a user was not an admin, the error messages were not consistent with the selected tab of the Network Traffic view in the web console. Now, the user not admin error displays on any tab with improved display. (NETOBSERV-1621)
1.4.4. Known issues
- When the eBPF agent PacketDrop feature is enabled, and sampling is configured to a value greater than 1, reported dropped bytes and dropped packets ignore the sampling configuration. While this is done on purpose to not miss any drops, a side effect is that the reported proportion of drops compared to non-drops becomes biased. For example, at a very high sampling rate, such as 1:1000, it is likely that almost all the traffic appears to be dropped when observed from the console plugin. (NETOBSERV-1676)
- In the Manage panels pop-up window in the Overview tab, filtering on total, bar, donut, or line does not show any result. (NETOBSERV-1540)
- The SR-IOV secondary interface is not detected if the interface was created first and then the eBPF agent was deployed. It is only detected if the agent was deployed first and then the SR-IOV interface was created. (NETOBSERV-1697)
- When Loki is disabled, the Topology view in the OpenShift web console always shows the Cluster and Zone aggregation options in the slider beside the network topology diagram, even when the related features are not enabled. There is no specific workaround, besides ignoring these slider options. (NETOBSERV-1705)
- When Loki is disabled, and the OpenShift web console first loads, it might display an error: Request failed with status code 400 Loki is disabled. As a workaround, you can continue switching content on the Network Traffic page, such as clicking between the Topology and the Overview tabs. The error should disappear. (NETOBSERV-1706)
1.5. Network Observability Operator 1.5.0
The following advisory is available for the Network Observability Operator 1.5.0:
1.5.1. New features and enhancements
1.5.1.1. DNS tracking enhancements
In 1.5, the TCP protocol is now supported in addition to UDP. New dashboards are also added to the Overview view of the Network Traffic page. For more information, see Configuring DNS tracking and Working with DNS tracking.
1.5.1.2. Round-trip time (RTT)
You can use TCP handshake Round-Trip Time (RTT) captured from the fentry/tcp_rcv_established Extended Berkeley Packet Filter (eBPF) hook point to read smoothed round-trip time (SRTT) and analyze network flows. In the Overview, Network Traffic, and Topology pages in the web console, you can monitor network traffic and troubleshoot with RTT metrics, filtering, and edge labeling. For more information, see RTT Overview and Working with RTT.
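RTT collection is turned on through the eBPF agent feature list. The fragment below is a minimal sketch that shows only the relevant field.

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  agent:
    type: eBPF
    ebpf:
      features:
        - FlowRTT   # enables round-trip time measurement on TCP flows
```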
1.5.1.3. Metrics, dashboards, and alerts enhancements
The metrics shown in the Network Observability metrics dashboards in Observe are now defined in the includeList specification. In previous releases, these metrics were defined in the ignoreTags specification. For a complete list of these metrics, see Network Observability Metrics.
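As a sketch, the list of generated metrics might be declared as follows. The three metric names are examples; the authoritative list is in Network Observability Metrics.

```yaml
spec:
  processor:
    metrics:
      # Example selection: only the listed metrics are generated.
      includeList:
        - node_ingress_bytes_total
        - workload_ingress_bytes_total
        - namespace_flows_total
```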
1.5.1.4. Improvements for Network Observability without Loki
You can create Prometheus alerts for the Netobserv dashboard using DNS, Packet drop, and RTT metrics, even if you don’t use Loki. In the previous version of Network Observability, 1.4, these metrics were only available for querying and analysis in the Network Traffic, Overview, and Topology views, which are not available without Loki. For more information, see Network Observability Metrics.
1.5.1.5. Availability zones
You can configure the FlowCollector resource to collect information about the cluster availability zones. This configuration enriches the network flow data with the topology.kubernetes.io/zone label value applied to the nodes. For more information, see Working with availability zones.
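A minimal sketch of this configuration, showing only the relevant field:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    addZone: true   # enrich flows with the node's topology.kubernetes.io/zone value
```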
1.5.1.6. Notable enhancements
The 1.5 release of the Network Observability Operator adds improvements and new capabilities to the OpenShift Container Platform web console plugin and the Operator configuration.
Performance enhancements:
- The spec.agent.ebpf.kafkaBatchSize default is changed from 10MB to 1MB to enhance eBPF performance when using Kafka.
Important: When upgrading from an existing installation, this new value is not set automatically in the configuration. If you monitor a performance regression with the eBPF Agent memory consumption after upgrading, you might consider reducing the kafkaBatchSize to the new value.
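If you upgrade an existing installation and want to adopt the new default explicitly, the setting could be sketched as follows. The value is written in bytes on the assumption that the field takes a byte count; it is only relevant when flows are sent through Kafka.

```yaml
spec:
  agent:
    type: eBPF
    ebpf:
      kafkaBatchSize: 1048576   # 1 MB expressed in bytes (assumed unit)
```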
Web console enhancements:
- There are new panels added to the Overview view for DNS and RTT: Min, Max, P90, P99.
- There are new panel display options added:
  - Focus on one panel while keeping others viewable but with smaller focus.
  - Switch graph type.
  - Show Top and Overall.
- A collection latency warning is shown in the Custom time range pop-up window.
- There is enhanced visibility for the contents of the Manage panels and Manage columns pop-up windows.
- The Differentiated Services Code Point (DSCP) field for egress QoS is available for filtering QoS DSCP in the web console Network Traffic page.
Configuration enhancements:
- The LokiStack mode in the spec.loki.mode specification simplifies installation by automatically setting URLs, TLS, cluster roles and a cluster role binding, as well as the authToken value. The Manual mode allows more control over configuration of these settings.
- The API version changes from flows.netobserv.io/v1beta1 to flows.netobserv.io/v1beta2.
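Taken together, the two configuration changes above can be sketched in one resource. The LokiStack name and namespace are assumptions about an existing LokiStack installation and must match your own.

```yaml
apiVersion: flows.netobserv.io/v1beta2   # new API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  loki:
    mode: LokiStack
    lokiStack:
      name: loki             # assumed name of an existing LokiStack resource
      namespace: netobserv   # assumed namespace of that resource
```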
1.5.2. Bug fixes
- Previously, it was not possible to register the console plugin manually in the web console interface if the automatic registration of the console plugin was disabled. If the spec.console.register value was set to false in the FlowCollector resource, the Operator would override and erase the plugin registration. With this fix, setting the spec.console.register value to false does not impact the console plugin registration or registration removal. As a result, the plugin can be safely registered manually. (NETOBSERV-1134)
- Previously, using the default metrics settings, the NetObserv / Health dashboard was showing an empty graph named Flows Overhead. This metric was only available by removing "namespaces-flows" and "namespaces" from the ignoreTags list. With this fix, this metric is visible when you use the default metrics setting. (NETOBSERV-1351)
- Previously, the node on which the eBPF Agent was running would not resolve with a specific cluster configuration. This resulted in cascading consequences that culminated in a failure to provide some of the traffic metrics. With this fix, the eBPF agent’s node IP is safely provided by the Operator, inferred from the pod status. Now, the missing metrics are restored. (NETOBSERV-1430)
- Previously, the 'Input size too long' error for the Loki Operator did not include additional information to troubleshoot the problem. With this fix, help is directly displayed in the web console next to the error with a direct link for more guidance. (NETOBSERV-1464)
- Previously, the console plugin read timeout was forced to 30s. With the FlowCollector v1beta2 API update, you can configure the spec.loki.readTimeout specification to update this value according to the Loki Operator queryTimeout limit. (NETOBSERV-1443)
- Previously, the Operator bundle did not display some of the supported features by CSV annotations as expected, such as features.operators.openshift.io/… With this fix, these annotations are set in the CSV as expected. (NETOBSERV-1305)
- Previously, the FlowCollector status sometimes oscillated between DeploymentInProgress and Ready states during reconciliation. With this fix, the status only becomes Ready when all of the underlying components are fully ready. (NETOBSERV-1293)
1.5.3. Known issues
- When trying to access the web console, cache issues on OCP 4.14.10 prevent access to the Observe view. The web console shows the error message: Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/. The recommended workaround is to update the cluster to the latest minor version. If this does not work, you need to apply the workarounds described in this Red Hat Knowledgebase article. (NETOBSERV-1493)
- Since the 1.3.0 release of the Network Observability Operator, installing the Operator causes a warning kernel taint to appear. The reason for this warning is that the Network Observability eBPF agent has memory constraints that prevent preallocating the entire hashmap table. The Operator eBPF agent sets the BPF_F_NO_PREALLOC flag so that pre-allocation is disabled when the hashmap is too memory-intensive.
1.6. Network Observability Operator 1.4.2
The following advisory is available for the Network Observability Operator 1.4.2:
1.6.1. CVEs
1.7. Network Observability Operator 1.4.1
The following advisory is available for the Network Observability Operator 1.4.1:
1.7.1. CVEs
1.7.2. Bug fixes
- In 1.4, there was a known issue when sending network flow data to Kafka. The Kafka message key was ignored, causing an error with connection tracking. Now the key is used for partitioning, so each flow from the same connection is sent to the same processor. (NETOBSERV-926)
- In 1.4, the Inner flow direction was introduced to account for flows between pods running on the same node. Flows with the Inner direction were not taken into account in the generated Prometheus metrics derived from flows, resulting in underestimated byte and packet rates. Now, derived metrics include flows with the Inner direction, providing correct byte and packet rates. (NETOBSERV-1344)
1.8. Network Observability Operator 1.4.0
The following advisory is available for the Network Observability Operator 1.4.0:
1.8.1. Channel removal
You must switch your channel from v1.0.x to stable to receive the latest Operator updates. The v1.0.x channel is now removed.
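If you manage the Operator installation declaratively, the channel switch corresponds to the channel field of the OLM Subscription. The following is a sketch only: the subscription name, namespace, and catalog source are assumptions and must match your existing Subscription.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator                  # assumed subscription name
  namespace: openshift-netobserv-operator   # assumed install namespace
spec:
  channel: stable                           # previously v1.0.x
  name: netobserv-operator                  # assumed package name
  source: redhat-operators                  # assumed catalog source
  sourceNamespace: openshift-marketplace
```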
1.8.2. New features and enhancements
1.8.2.1. Notable enhancements
The 1.4 release of the Network Observability Operator adds improvements and new capabilities to the OpenShift Container Platform web console plugin and the Operator configuration.
Web console enhancements:
- In the Query Options, the Duplicate flows checkbox is added to choose whether or not to show duplicated flows.
- You can now filter source and destination traffic with One-way, Back-and-forth, and Swap filters.
- The Network Observability metrics dashboards in Observe → Dashboards, NetObserv and NetObserv / Health, are modified as follows:
  - The NetObserv dashboard shows top bytes, packets sent, packets received per nodes, namespaces, and workloads. Flow graphs are removed from this dashboard.
  - The NetObserv / Health dashboard shows flows overhead as well as top flow rates per nodes, namespaces, and workloads.
  - Infrastructure and Application metrics are shown in a split-view for namespaces and workloads.
For more information, see Network Observability metrics and Quick filters.
Configuration enhancements:
- You now have the option to specify different namespaces for any configured ConfigMap or Secret reference, such as in certificates configuration.
- The spec.processor.clusterName parameter is added so that the name of the cluster appears in the flows data. This is useful in a multi-cluster context. When using OpenShift Container Platform, leave this parameter empty so that the name is determined automatically.
For more information, see Flow Collector sample resource and Flow Collector API Reference.
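For example, the cluster name could be set as in the following fragment. The name shown is hypothetical; on OpenShift Container Platform you would typically omit the field.

```yaml
spec:
  processor:
    clusterName: my-cluster-east   # hypothetical name; leave unset on OpenShift Container Platform
```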
1.8.2.2. Network Observability without Loki
The Network Observability Operator is now functional and usable without Loki. If Loki is not installed, it can only export flows to Kafka or IPFIX format and provide metrics in the Network Observability metrics dashboards. For more information, see Network Observability without Loki.
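A Loki-less setup that forwards flows to Kafka might be sketched as follows. The Kafka address and topic are placeholders for your own Kafka deployment.

```yaml
spec:
  loki:
    enable: false   # do not store flows in Loki
  exporters:
    - type: Kafka
      kafka:
        address: "kafka-cluster-kafka-bootstrap.netobserv"   # placeholder bootstrap address
        topic: netobserv-flows-export                        # placeholder topic name
```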
1.8.2.3. DNS tracking
In 1.4, the Network Observability Operator makes use of eBPF tracepoint hooks to enable DNS tracking. You can monitor your network, conduct security analysis, and troubleshoot DNS issues in the Network Traffic and Overview pages in the web console.
For more information, see Configuring DNS tracking and Working with DNS tracking.
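DNS tracking is enabled through the eBPF agent feature list. A minimal fragment, showing only the relevant field, could look like this:

```yaml
spec:
  agent:
    type: eBPF
    ebpf:
      features:
        - DNSTracking   # enables DNS request and response tracking
```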
1.8.2.4. SR-IOV support
You can now collect traffic from a cluster with a Single Root I/O Virtualization (SR-IOV) device. For more information, see Configuring the monitoring of SR-IOV interface traffic.
1.8.2.5. IPFIX exporter support
You can now export eBPF-enriched network flows to the IPFIX collector. For more information, see Export enriched network flow data.
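An IPFIX exporter entry might be sketched as follows. The collector host, port, and transport are placeholder values for an IPFIX collector that you operate yourself.

```yaml
spec:
  exporters:
    - type: IPFIX
      ipfix:
        targetHost: "ipfix-collector.ipfix.svc.cluster.local"   # placeholder collector address
        targetPort: 4739                                        # placeholder port
        transport: tcp                                          # or udp
```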
1.8.2.6. Packet drops
In the 1.4 release of the Network Observability Operator, eBPF tracepoint hooks are used to enable packet drop tracking. You can now detect and analyze the cause for packet drops and make decisions to optimize network performance. In OpenShift Container Platform 4.14 and later, both host drops and OVS drops are detected. In OpenShift Container Platform 4.13, only host drops are detected. For more information, see Configuring packet drop tracking and Working with packet drops.
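Packet drop tracking is enabled through the eBPF agent feature list. The sketch below also sets privileged mode, which is assumed here to be required for this feature; confirm the requirement in Configuring packet drop tracking.

```yaml
spec:
  agent:
    type: eBPF
    ebpf:
      features:
        - PacketDrop   # enables packet drop tracking
      privileged: true # assumed requirement for the PacketDrop feature
```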
1.8.2.7. s390x architecture support
Network Observability Operator can now run on the s390x architecture. Previously it ran on amd64, ppc64le, or arm64.
1.8.3. Bug fixes
- Previously, the Prometheus metrics exported by Network Observability were computed out of potentially duplicated network flows. In the related dashboards, from Observe → Dashboards, this could result in potentially doubled rates. Note that dashboards from the Network Traffic view were not affected. Now, network flows are filtered to eliminate duplicates before metrics calculation, which results in correct traffic rates displayed in the dashboards. (NETOBSERV-1131)
- Previously, the Network Observability Operator agents were not able to capture traffic on network interfaces when configured with Multus or SR-IOV, non-default network namespaces. Now, all available network namespaces are recognized and used for capturing flows, allowing capturing traffic for SR-IOV. There are configurations needed for the FlowCollector and SRIOVnetwork custom resource to collect traffic. (NETOBSERV-1283)
- Previously, in the Network Observability Operator details from Operators → Installed Operators, the FlowCollector Status field might have reported incorrect information about the state of the deployment. The status field now shows the proper conditions with improved messages. The history of events is kept, ordered by event date. (NETOBSERV-1224)
- Previously, during spikes of network traffic load, certain eBPF pods were OOM-killed and went into a CrashLoopBackOff state. Now, the eBPF agent memory footprint is improved, so pods are no longer OOM-killed and do not enter a CrashLoopBackOff state. (NETOBSERV-975)
- Previously, when processor.metrics.tls was set to PROVIDED, the insecureSkipVerify option value was forced to be true. Now you can set insecureSkipVerify to true or false, and provide a CA certificate if needed. (NETOBSERV-1087)
1.8.4. Known issues
- Since the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate change periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate change. This issue has only been observed in large-scale environments of 120 nodes or greater. (NETOBSERV-980)
- Currently, when spec.agent.ebpf.features includes DNSTracking, larger DNS packets require the eBPF agent to look for the DNS header outside of the first socket buffer (SKB) segment. A new eBPF agent helper function needs to be implemented to support it. Currently, there is no workaround for this issue. (NETOBSERV-1304)
- Currently, when spec.agent.ebpf.features includes DNSTracking, DNS over TCP packets require the eBPF agent to look for the DNS header outside of the first SKB segment. A new eBPF agent helper function needs to be implemented to support it. Currently, there is no workaround for this issue. (NETOBSERV-1245)
- Currently, when using a KAFKA deployment model, if conversation tracking is configured, conversation events might be duplicated across Kafka consumers, resulting in inconsistent tracking of conversations and incorrect volumetric data. For that reason, it is not recommended to configure conversation tracking when deploymentModel is set to KAFKA. (NETOBSERV-926)
- Currently, when processor.metrics.server.tls.type is configured to use a PROVIDED certificate, the Operator enters an unsteady state that might affect its performance and resource consumption. It is recommended to not use a PROVIDED certificate until this issue is resolved, and to instead use an auto-generated certificate by setting processor.metrics.server.tls.type to AUTO. (NETOBSERV-1293)
- Since the 1.3.0 release of the Network Observability Operator, installing the Operator causes a warning kernel taint to appear. The reason for this warning is that the Network Observability eBPF agent has memory constraints that prevent preallocating the entire hashmap table. The Operator eBPF agent sets the BPF_F_NO_PREALLOC flag so that pre-allocation is disabled when the hashmap is too memory-intensive.
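The recommendation for NETOBSERV-1293 can be sketched as the following FlowCollector fragment. The processor.metrics.server.tls.type path and the AUTO value come from the known issue above. The apiVersion and the cluster resource name are assumptions, so verify them against your installed Operator version before applying anything.

```yaml
# Sketch only: apiVersion and metadata.name are assumptions, not confirmed by these notes.
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    metrics:
      server:
        tls:
          type: AUTO   # use an auto-generated certificate instead of PROVIDED
```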
1.9. Network Observability Operator 1.3.0
The following advisory is available for the Network Observability Operator 1.3.0:
1.9.1. Channel deprecation
You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in the next release.
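If you manage the Operator subscription declaratively, the channel switch might look like the following sketch of an Operator Lifecycle Manager Subscription. Only the stable channel name comes from these release notes. The subscription name, package name, namespace, and catalog source shown here are assumptions, so compare them with the Subscription that already exists in your cluster.

```yaml
# Sketch only: every value except "channel: stable" is an assumption.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator                 # assumed subscription name
  namespace: openshift-netobserv-operator  # assumed install namespace
spec:
  channel: stable                          # replaces the deprecated v1.0.x channel
  name: netobserv-operator                 # assumed package name
  source: redhat-operators                 # assumed catalog source
  sourceNamespace: openshift-marketplace   # assumed catalog namespace
```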
1.9.2. New features and enhancements
1.9.2.1. Multi-tenancy in Network Observability
- System administrators can allow and restrict individual user access, or group access, to the flows stored in Loki. For more information, see Multi-tenancy in Network Observability.
1.9.2.2. Flow-based metrics dashboard
- This release adds a new dashboard, which provides an overview of the network flows in your OpenShift Container Platform cluster. For more information, see Network Observability metrics.
1.9.2.3. Troubleshooting with the must-gather tool
- Information about the Network Observability Operator can now be included in the must-gather data for troubleshooting. For more information, see Network Observability must-gather.
1.9.2.4. Multiple architectures now supported
- Network Observability Operator can now run on amd64, ppc64le, or arm64 architectures. Previously, it only ran on amd64.
1.9.3. Deprecated features
1.9.3.1. Deprecated configuration parameter setting
The release of Network Observability Operator 1.3 deprecates the HOST setting of spec.loki.authToken. When using the Loki Operator, you must now use only the FORWARD setting.
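A minimal sketch of the FORWARD setting in a FlowCollector resource follows. The authToken field and the FORWARD value are taken from the text above. The apiVersion and the cluster resource name are assumptions to confirm against your installed Operator version.

```yaml
# Sketch only: apiVersion and metadata.name are assumptions, not confirmed by these notes.
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  loki:
    authToken: FORWARD   # HOST is deprecated as of Network Observability Operator 1.3
```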
1.9.4. Bug fixes
- Previously, when the Operator was installed from the CLI, the Role and RoleBinding that are necessary for the Cluster Monitoring Operator to read the metrics were not installed as expected. The issue did not occur when the Operator was installed from the web console. Now, either way of installing the Operator installs the required Role and RoleBinding. (NETOBSERV-1003)
- Since version 1.2, the Network Observability Operator can raise alerts when a problem occurs with the flows collection. Previously, due to a bug, the related configuration to disable alerts, spec.processor.metrics.disableAlerts, did not work as expected and was sometimes ineffectual. Now, this configuration is fixed so that it is possible to disable the alerts. (NETOBSERV-976)
- Previously, when Network Observability was configured with spec.loki.authToken set to DISABLED, only a kubeadmin cluster administrator was able to view network flows. Other types of cluster administrators received an authorization failure. Now, any cluster administrator is able to view network flows. (NETOBSERV-972)
- Previously, a bug prevented users from setting spec.consolePlugin.portNaming.enable to false. Now, this setting can be set to false to disable port-to-service name translation. (NETOBSERV-971)
- Previously, the metrics exposed by the console plugin were not collected by the Cluster Monitoring Operator (Prometheus), due to an incorrect configuration. Now, the configuration has been fixed so that the console plugin metrics are correctly collected and accessible from the OpenShift Container Platform web console. (NETOBSERV-765)
- Previously, when processor.metrics.tls was set to AUTO in the FlowCollector, the flowlogs-pipeline servicemonitor did not adapt to the appropriate TLS scheme, and metrics were not visible in the web console. Now, the issue is fixed for AUTO mode. (NETOBSERV-1070)
- Previously, certificate configuration, such as used for Kafka and Loki, did not allow specifying a namespace field, implying that the certificates had to be in the same namespace where Network Observability is deployed. Moreover, when using Kafka with TLS/mTLS, the user had to manually copy the certificates to the privileged namespace where the eBPF agent pods are deployed and manually manage certificate updates, such as in the case of certificate rotation. Now, Network Observability setup is simplified by adding a namespace field for certificates in the FlowCollector resource. As a result, users can now install Loki or Kafka in different namespaces without needing to manually copy their certificates to the Network Observability namespace. The original certificates are watched so that the copies are automatically updated when needed. (NETOBSERV-773)
- Previously, the SCTP, ICMPv4, and ICMPv6 protocols were not covered by the Network Observability agents, resulting in less comprehensive network flow coverage. These protocols are now recognized to improve the flow coverage. (NETOBSERV-934)
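For the port naming fix (NETOBSERV-971), the setting might be expressed as in the following sketch. The spec.consolePlugin.portNaming.enable path is taken from the bug fix text above. The apiVersion and the cluster resource name are assumptions to verify against your installed Operator version.

```yaml
# Sketch only: apiVersion and metadata.name are assumptions, not confirmed by these notes.
apiVersion: flows.netobserv.io/v1beta1
kind: FlowCollector
metadata:
  name: cluster
spec:
  consolePlugin:
    portNaming:
      enable: false   # disables port-to-service name translation
```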
1.9.5. Known issues
- When processor.metrics.tls is set to PROVIDED in the FlowCollector, the flowlogs-pipeline servicemonitor is not adapted to the TLS scheme. (NETOBSERV-1087)
- Since the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate change periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate change. This issue has only been observed in large-scale environments of 120 nodes or greater. (NETOBSERV-980)
- When you install the Operator, a warning kernel taint can appear. The reason for this warning is that the Network Observability eBPF agent has memory constraints that prevent preallocating the entire hashmap table. The Operator eBPF agent sets the BPF_F_NO_PREALLOC flag so that pre-allocation is disabled when the hashmap is too memory-intensive.
1.10. Network Observability Operator 1.2.0
The following advisory is available for the Network Observability Operator 1.2.0:
1.10.1. Preparing for the next update
The subscription of an installed Operator specifies an update channel that tracks and receives updates for the Operator. Until the 1.2 release of the Network Observability Operator, the only channel available was v1.0.x. The 1.2 release of the Network Observability Operator introduces the stable update channel for tracking and receiving updates. You must switch your channel from v1.0.x to stable to receive future Operator updates. The v1.0.x channel is deprecated and planned for removal in a following release.
1.10.2. New features and enhancements
1.10.2.1. Histogram in Traffic Flows view
- You can now choose to show a histogram bar chart of flows over time. The histogram enables you to visualize the history of flows without hitting the Loki query limit. For more information, see Using the histogram.
1.10.2.2. Conversation tracking
- You can now query flows by Log Type, which enables grouping network flows that are part of the same conversation. For more information, see Working with conversations.
1.10.2.3. Network Observability health alerts
- The Network Observability Operator now creates automatic alerts if the flowlogs-pipeline is dropping flows because of errors at the write stage or if the Loki ingestion rate limit has been reached. For more information, see Health dashboards.
1.10.3. Bug fixes
- Previously, after changing the namespace value in the FlowCollector spec, eBPF agent pods running in the previous namespace were not appropriately deleted. Now, the pods running in the previous namespace are appropriately deleted. (NETOBSERV-774)
- Previously, after changing the caCert.name value in the FlowCollector spec (such as in the Loki section), FlowLogs-Pipeline pods and Console plug-in pods were not restarted, so they were unaware of the configuration change. Now, the pods are restarted, so they get the configuration change. (NETOBSERV-772)
- Previously, network flows between pods running on different nodes were sometimes not correctly identified as being duplicates because they were captured by different network interfaces. This resulted in over-estimated metrics displayed in the console plug-in. Now, flows are correctly identified as duplicates, and the console plug-in displays accurate metrics. (NETOBSERV-755)
- The "reporter" option in the console plug-in is used to filter flows based on the observation point of either source node or destination node. Previously, this option mixed the flows regardless of the node observation point. This was due to network flows being incorrectly reported as Ingress or Egress at the node level. Now, the network flow direction reporting is correct. The "reporter" option filters for source observation point, or destination observation point, as expected. (NETOBSERV-696)
- Previously, for agents configured to send flows directly to the processor as gRPC+protobuf requests, the submitted payload could be too large and was rejected by the processor's gRPC server. This occurred under very-high-load scenarios and with only some configurations of the agent. The agent logged an error message, such as: grpc: received message larger than max. As a consequence, information about those flows was lost. Now, the gRPC payload is split into several messages when the size exceeds a threshold. As a result, the server maintains connectivity. (NETOBSERV-617)
1.10.4. Known issue
- In the 1.2.0 release of the Network Observability Operator, using Loki Operator 5.6, a Loki certificate transition periodically affects the flowlogs-pipeline pods and results in dropped flows rather than flows written to Loki. The problem self-corrects after some time, but it still causes temporary flow data loss during the Loki certificate transition. (NETOBSERV-980)
1.10.5. Notable technical changes
- Previously, you could install the Network Observability Operator using a custom namespace. This release introduces the conversion webhook, which changes the ClusterServiceVersion. Because of this change, all the available namespaces are no longer listed. Additionally, to enable Operator metrics collection, namespaces that are shared with other Operators, like the openshift-operators namespace, cannot be used. Now, the Operator must be installed in the openshift-netobserv-operator namespace. You cannot automatically upgrade to the new Operator version if you previously installed the Network Observability Operator using a custom namespace. In that case, you must delete the installed instance of the Operator and re-install the Operator in the openshift-netobserv-operator namespace. Note that custom namespaces, such as the commonly used netobserv namespace, are still possible for the FlowCollector, Loki, Kafka, and other plug-ins. (NETOBSERV-907)(NETOBSERV-956)
1.11. Network Observability Operator 1.1.0
The following advisory is available for the Network Observability Operator 1.1.0:
The Network Observability Operator is now stable and the release channel is upgraded to v1.1.0.
1.11.1. Bug fix
- Previously, unless the Loki authToken configuration was set to FORWARD mode, authentication was no longer enforced, allowing any user who could connect to the OpenShift Container Platform console in an OpenShift Container Platform cluster to retrieve flows without authentication. Now, regardless of the Loki authToken mode, only cluster administrators can retrieve flows. (BZ#2169468)