Networking
Configuring and managing cluster networking
Abstract
Chapter 1. About networking
Red Hat OpenShift Networking is an ecosystem of features, plugins and advanced networking capabilities that extend Kubernetes networking with the advanced networking-related features that your cluster needs to manage its network traffic for one or multiple hybrid clusters. This ecosystem of networking capabilities integrates ingress, egress, load balancing, high-performance throughput, security, inter- and intra-cluster traffic management and provides role-based observability tooling to reduce its natural complexities.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
The following list highlights some of the most commonly used Red Hat OpenShift Networking features available on your cluster:
Primary cluster network provided by either of the following Container Network Interface (CNI) plugins:
- OVN-Kubernetes network plugin, the default plugin
- OpenShift SDN network plugin
- Certified 3rd-party alternative primary network plugins
- Cluster Network Operator for network plugin management
- Ingress Operator for TLS encrypted web traffic
- DNS Operator for name assignment
- MetalLB Operator for traffic load balancing on bare metal clusters
- IP failover support for high-availability
- Additional hardware network support through multiple CNI plugins, including for macvlan, ipvlan, and SR-IOV hardware networks
- IPv4, IPv6, and dual stack addressing
- Hybrid Linux-Windows host clusters for Windows-based workloads
- Red Hat OpenShift Service Mesh for discovery, load balancing, service-to-service authentication, failure recovery, metrics, and monitoring of services
- Single-node OpenShift
- Network Observability Operator for network debugging and insights
- Submariner for inter-cluster networking
- Red Hat Service Interconnect for layer 7 inter-cluster networking
Chapter 2. Understanding networking
Cluster Administrators have several options for exposing applications that run inside a cluster to external traffic and securing network connections:
- Service types, such as node ports or load balancers
-
API resources, such as
Ingress
andRoute
By default, Kubernetes allocates each pod an internal IP address for applications running within the pod. Pods and their containers can network, but clients outside the cluster do not have networking access. When you expose your application to external traffic, giving each pod its own IP address means that pods can be treated like physical hosts or virtual machines in terms of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.
Some cloud platforms offer metadata APIs that listen on the 169.254.169.254 IP address, a link-local IP address in the IPv4 169.254.0.0/16
CIDR block.
This CIDR block is not reachable from the pod network. Pods that need access to these IP addresses must be given host network access by setting the spec.hostNetwork
field in the pod spec to true
.
If you allow a pod host network access, you grant the pod privileged access to the underlying network infrastructure.
2.1. OpenShift Container Platform DNS
If you are running multiple services, such as front-end and back-end services for use with multiple pods, environment variables are created for user names, service IPs, and more so the front-end pods can communicate with the back-end services. If the service is deleted and recreated, a new IP address can be assigned to the service, and requires the front-end pods to be recreated to pick up the updated values for the service IP environment variable. Additionally, the back-end service must be created before any of the front-end pods to ensure that the service IP is generated properly, and that it can be provided to the front-end pods as an environment variable.
For this reason, OpenShift Container Platform has a built-in DNS so that the services can be reached by the service DNS as well as the service IP/port.
2.2. OpenShift Container Platform Ingress Operator
When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController
API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.
The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route
and Kubernetes Ingress
resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy
type and internal load balancing, provide ways to publish Ingress Controller endpoints.
2.2.1. Comparing routes and Ingress
The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster. The most common way to manage Ingress traffic is with the Ingress Controller. You can scale and replicate this pod like any other regular pod. This router service is based on HAProxy, which is an open source load balancer solution.
The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.
Ingress traffic accesses services in the cluster through a route. Routes and Ingress are the main resources for handling Ingress traffic. Ingress provides features similar to a route, such as accepting external requests and delegating them based on the route. However, with Ingress you can only allow certain types of connections: HTTP/2, HTTPS and server name identification (SNI), and TLS with certificate. In OpenShift Container Platform, routes are generated to meet the conditions specified by the Ingress resource.
2.3. Glossary of common terms for OpenShift Container Platform networking
This glossary defines common terms that are used in the networking content.
- authentication
- To control access to an OpenShift Container Platform cluster, a cluster administrator can configure user authentication and ensure only approved users access the cluster. To interact with an OpenShift Container Platform cluster, you must authenticate to the OpenShift Container Platform API. You can authenticate by providing an OAuth access token or an X.509 client certificate in your requests to the OpenShift Container Platform API.
- AWS Load Balancer Operator
-
The AWS Load Balancer (ALB) Operator deploys and manages an instance of the
aws-load-balancer-controller
. - Cluster Network Operator
- The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation.
- config map
-
A config map provides a way to inject configuration data into pods. You can reference the data stored in a config map in a volume of type
ConfigMap
. Applications running in a pod can use this data. - custom resource (CR)
- A CR is extension of the Kubernetes API. You can create custom resources.
- DNS
- Cluster DNS is a DNS server which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
- DNS Operator
- The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform.
- deployment
- A Kubernetes resource object that maintains the life cycle of an application.
- domain
- Domain is a DNS name serviced by the Ingress Controller.
- egress
- The process of data sharing externally through a network’s outbound traffic from a pod.
- External DNS Operator
- The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform.
- HTTP-based route
- An HTTP-based route is an unsecured route that uses the basic HTTP routing protocol and exposes a service on an unsecured application port.
- Ingress
- The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster.
- Ingress Controller
- The Ingress Operator manages Ingress Controllers. Using an Ingress Controller is the most common way to allow external access to an OpenShift Container Platform cluster.
- installer-provisioned infrastructure
- The installation program deploys and configures the infrastructure that the cluster runs on.
- kubelet
- A primary node agent that runs on each node in the cluster to ensure that containers are running in a pod.
- Kubernetes NMState Operator
- The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across the OpenShift Container Platform cluster’s nodes with NMState.
- kube-proxy
- Kube-proxy is a proxy service which runs on each node and helps in making services available to the external host. It helps in forwarding the request to correct containers and is capable of performing primitive load balancing.
- load balancers
- OpenShift Container Platform uses load balancers for communicating from outside the cluster with services running in the cluster.
- MetalLB Operator
-
As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type
LoadBalancer
is added to the cluster, MetalLB can add an external IP address for the service. - multicast
- With IP multicast, data is broadcast to many IP addresses simultaneously.
- namespaces
- A namespace isolates specific system resources that are visible to all processes. Inside a namespace, only processes that are members of that namespace can see those resources.
- networking
- Network information of a OpenShift Container Platform cluster.
- node
- A worker machine in the OpenShift Container Platform cluster. A node is either a virtual machine (VM) or a physical machine.
- OpenShift Container Platform Ingress Operator
-
The Ingress Operator implements the
IngressController
API and is the component responsible for enabling external access to OpenShift Container Platform services. - pod
- One or more containers with shared resources, such as volume and IP addresses, running in your OpenShift Container Platform cluster. A pod is the smallest compute unit defined, deployed, and managed.
- PTP Operator
-
The PTP Operator creates and manages the
linuxptp
services. - route
- The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.
- scaling
- Increasing or decreasing the resource capacity.
- service
- Exposes a running application on a set of pods.
- Single Root I/O Virtualization (SR-IOV) Network Operator
- The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
- software-defined networking (SDN)
- OpenShift Container Platform uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Container Platform cluster.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
- Stream Control Transmission Protocol (SCTP)
- SCTP is a reliable message based protocol that runs on top of an IP network.
- taint
- Taints and tolerations ensure that pods are scheduled onto appropriate nodes. You can apply one or more taints on a node.
- toleration
- You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods with matching taints.
- web console
- A user interface (UI) to manage OpenShift Container Platform.
Chapter 3. Zero trust networking
Zero trust is an approach to designing security architectures based on the premise that every interaction begins in an untrusted state. This contrasts with traditional architectures, which might determine trustworthiness based on whether communication starts inside a firewall. More specifically, zero trust attempts to close gaps in security architectures that rely on implicit trust models and one-time authentication.
OpenShift Container Platform can add some zero trust networking capabilities to containers running on the platform without requiring changes to the containers or the software running in them. There are also several products that Red Hat offers that can further augment the zero trust networking capabilities of containers. If you have the ability to change the software running in the containers, then there are other projects that Red Hat supports that can add further capabilities.
Explore the following targeted capabilities of zero trust networking.
3.1. Root of trust
Public certificates and private keys are critical to zero trust networking. These are used to identify components to one another, authenticate, and to secure traffic. The certificates are signed by other certificates, and there is a chain of trust to a root certificate authority (CA). Everything participating in the network needs to ultimately have the public key for a root CA so that it can validate the chain of trust. For public-facing things, these are usually the set of root CAs that are globally known, and whose keys are distributed with operating systems, web browsers, and so on. However, it is possible to run a private CA for a cluster or a corporation if the certificate of the private CA is distributed to all parties.
Leverage:
- OpenShift Container Platform: OpenShift creates a cluster CA at installation that is used to secure the cluster resources. However, OpenShift Container Platform can also create and sign certificates for services in the cluster, and can inject the cluster CA bundle into a pod if requested. Service certificates created and signed by OpenShift Container Platform have a 26-month time to live (TTL) and are rotated automatically at 13 months. They can also be rotated manually if necessary.
- OpenShift cert-manager Operator: cert-manager allows you to request keys that are signed by an external root of trust. There are many configurable issuers to integrate with external issuers, along with ways to run with a delegated signing certificate. The cert-manager API can be used by other software in zero trust networking to request the necessary certificates (for example, Red Hat OpenShift Service Mesh), or can be used directly by customer software.
3.2. Traffic authentication and encryption
Ensure that all traffic on the wire is encrypted and the endpoints are identifiable. An example of this is Mutual TLS, or mTLS, which is a method for mutual authentication.
Leverage:
- OpenShift Container Platform: With transparent pod-to-pod IPsec, the source and destination of the traffic can be identified by the IP address. There is the capability for egress traffic to be encrypted using IPsec. By using the egress IP feature, the source IP address of the traffic can be used to identify the source of the traffic inside the cluster.
- Red Hat OpenShift Service Mesh: Provides powerful mTLS capabilities that can transparently augment traffic leaving a pod to provide authentication and encryption.
- OpenShift cert-manager Operator: Use custom resource definitions (CRDs) to request certificates that can be mounted for your programs to use for SSL/TLS protocols.
3.3. Identification and authentication
After you have the ability to mint certificates using a CA, you can use it to establish trust relationships by verification of the identity of the other end of a connection — either a user or a client machine. This also requires management of certificate lifecycles to limit use if compromised.
Leverage:
- OpenShift Container Platform: Cluster-signed service certificates to ensure that a client is talking to a trusted endpoint. This requires that the service uses SSL/TLS and that the client uses the cluster CA. The client identity must be provided using some other means.
- Red Hat Single Sign-On: Provides request authentication integration with enterprise user directories or third-party identity providers.
- Red Hat OpenShift Service Mesh: Transparent upgrade of connections to mTLS, auto-rotation, custom certificate expiration, and request authentication with JSON web token (JWT).
- OpenShift cert-manager Operator: Creation and management of certificates for use by your application. Certificates can be controlled by CRDs and mounted as secrets, or your application can be changed to interact directly with the cert-manager API.
3.4. Inter-service authorization
It is critical to be able to control access to services based on the identity of the requester. This is done by the platform and does not require each application to implement it. That allows better auditing and inspection of the policies.
Leverage:
-
OpenShift Container Platform: Can enforce isolation in the networking layer of the platform using the Kubernetes
NetworkPolicy
andAdminNetworkPolicy
objects. - Red Hat OpenShift Service Mesh: Sophisticated L4 and L7 control of traffic using standard Istio objects and using mTLS to identify the source and destination of traffic and then apply policies based on that information.
3.5. Transaction-level verification
In addition to the ability to identify and authenticate connections, it is also useful to control access to individual transactions. This can include rate-limiting by source, observability, and semantic validation that a transaction is well formed.
Leverage:
- Red Hat OpenShift Service Mesh: Perform L7 inspection of requests, rejecting malformed HTTP requests, transaction-level observability and reporting. Service Mesh can also provide request-based authentication using JWT.
3.6. Risk assessment
As the number of security policies in a cluster increase, visualization of what the policies allow and deny becomes increasingly important. These tools make it easier to create, visualize, and manage cluster security policies.
Leverage:
-
Red Hat OpenShift Service Mesh: Create and visualize Kubernetes
NetworkPolicy
andAdminNetworkPolicy
, and OpenShift NetworkingEgressFirewall
objects using the OpenShift web console. - Red Hat Advanced Cluster Security for Kubernetes: Advanced visualization of objects.
3.7. Site-wide policy enforcement and distribution
After deploying applications on a cluster, it becomes challenging to manage all of the objects that make up the security rules. It becomes critical to be able to apply site-wide policies and audit the deployed objects for compliance with the policies. This should allow for delegation of some permissions to users and cluster administrators within defined bounds, and should allow for exceptions to the policies if necessary.
Leverage:
- Red Hat OpenShift Service Mesh: RBAC to control policy objects and delegate control.
- Red Hat Advanced Cluster Security for Kubernetes: Policy enforcement engine.
- Red Hat Advanced Cluster Management (RHACM) for Kubernetes: Centralized policy control.
3.8. Observability for constant, and retrospective, evaluation
After you have a running cluster, you want to be able to observe the traffic and verify that the traffic comports with the defined rules. This is important for intrusion detection, forensics, and is helpful for operational load management.
Leverage:
- Network Observability Operator: Allows for inspection, monitoring, and alerting on network connections to pods and nodes in the cluster.
- Red Hat Advanced Cluster Management (RHACM) for Kubernetes: Monitors, collects, and evaluates system-level events such as process execution, network connections and flows, and privilege escalation. It can determine a baseline for a cluster, and then detect anomalous activity and alert you about it.
- Red Hat OpenShift Service Mesh: Can monitor traffic entering and leaving a pod.
- Red Hat OpenShift Distributed Tracing Platform: For suitably instrumented applications, you can see all traffic associated with a particular action as it splits into sub-requests to microservices. This allows you to identify bottlenecks within a distributed application.
3.9. Endpoint security
It is important to be able to trust that the software running the services in your cluster has not been compromised. For example, you might need to ensure that certified images are run on trusted hardware, and have policies to only allow connections to or from an endpoint based on endpoint characteristics.
Leverage:
- OpenShift Container Platform: Secureboot can ensure that the nodes in the cluster are running trusted software, so the platform itself (including the container runtime) have not been tampered with. You can configure OpenShift Container Platform to only run images that have been signed by certain signatures.
- Red Hat Trusted Artifact Signer: This can be used in a trusted build chain and produce signed container images.
3.10. Extending trust outside of the cluster
You might want to extend trust outside of the cluster by allowing a cluster to mint CAs for a subdomain. Alternatively, you might want to attest to workload identity in the cluster to a remote endpoint.
Leverage:
- OpenShift cert-manager Operator: You can use cert-manager to manage delegated CAs so that you can distribute trust across different clusters, or through your organization.
- Red Hat OpenShift Service Mesh: Can use SPIFFE to provide remote attestation of workloads to endpoints running in remote or local clusters.
Chapter 4. Accessing hosts
Learn how to create a bastion host to access OpenShift Container Platform instances and access the control plane nodes with secure shell (SSH) access.
4.1. Accessing hosts on Amazon Web Services in an installer-provisioned infrastructure cluster
The OpenShift Container Platform installer does not create any public IP addresses for any of the Amazon Elastic Compute Cloud (Amazon EC2) instances that it provisions for your OpenShift Container Platform cluster. To be able to SSH to your OpenShift Container Platform hosts, you must follow this procedure.
Procedure
-
Create a security group that allows SSH access into the virtual private cloud (VPC) created by the
openshift-install
command. - Create an Amazon EC2 instance on one of the public subnets the installer created.
Associate a public IP address with the Amazon EC2 instance that you created.
Unlike with the OpenShift Container Platform installation, you should associate the Amazon EC2 instance you created with an SSH keypair. It does not matter what operating system you choose for this instance, as it will simply serve as an SSH bastion to bridge the internet into your OpenShift Container Platform cluster’s VPC. The Amazon Machine Image (AMI) you use does matter. With Red Hat Enterprise Linux CoreOS (RHCOS), for example, you can provide keys via Ignition, like the installer does.
After you provisioned your Amazon EC2 instance and can SSH into it, you must add the SSH key that you associated with your OpenShift Container Platform installation. This key can be different from the key for the bastion instance, but does not have to be.
NoteDirect SSH access is only recommended for disaster recovery. When the Kubernetes API is responsive, run privileged pods instead.
-
Run
oc get nodes
, inspect the output, and choose one of the nodes that is a master. The hostname looks similar toip-10-0-1-163.ec2.internal
. From the bastion SSH host you manually deployed into Amazon EC2, SSH into that control plane host. Ensure that you use the same SSH key you specified during the installation:
ssh -i <ssh-key-path> core@<master-hostname>
$ ssh -i <ssh-key-path> core@<master-hostname>
Copy to Clipboard Copied!
Chapter 5. Networking Operators overview
OpenShift Container Platform supports multiple types of networking Operators. You can manage the cluster networking using these networking Operators.
5.1. Cluster Network Operator
The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation. For more information, see Cluster Network Operator in OpenShift Container Platform.
5.2. DNS Operator
The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform. For more information, see DNS Operator in OpenShift Container Platform.
5.3. Ingress Operator
When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to external clients. The Ingress Operator implements the Ingress Controller API and is responsible for enabling external access to OpenShift Container Platform cluster services. For more information, see Ingress Operator in OpenShift Container Platform.
5.4. External DNS Operator
The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform. For more information, see Understanding the External DNS Operator.
5.5. Ingress Node Firewall Operator
The Ingress Node Firewall Operator uses an extended Berkley Packet Filter (eBPF) and eXpress Data Path (XDP) plugin to process node firewall rules, update statistics and generate events for dropped traffic. The operator manages ingress node firewall resources, verifies firewall configuration, does not allow incorrectly configured rules that can prevent cluster access, and loads ingress node firewall XDP programs to the selected interfaces in the rule’s object(s). For more information, see Understanding the Ingress Node Firewall Operator
5.6. Network Observability Operator
The Network Observability Operator is an optional Operator that allows cluster administrators to observe the network traffic for OpenShift Container Platform clusters. The Network Observability Operator uses the eBPF technology to create network flows. The network flows are then enriched with OpenShift Container Platform information and stored in Loki. You can view and analyze the stored network flows information in the OpenShift Container Platform console for further insight and troubleshooting. For more information, see About Network Observability Operator.
Chapter 6. Networking dashboards
Networking metrics are viewable in dashboards within the OpenShift Container Platform web console, under Observe → Dashboards.
6.1. Network Observability Operator
If you have the Network Observability Operator installed, you can view network traffic metrics dashboards by selecting the Netobserv dashboard from the Dashboards drop-down list. For more information about metrics available in this Dashboard, see Network Observability metrics dashboards.
6.2. Networking and OVN-Kubernetes dashboard
You can view both general networking metrics as well as OVN-Kubernetes metrics from the dashboard.
To view general networking metrics, select Networking/Linux Subsystem Stats from the Dashboards drop-down list. You can view the following networking metrics from the dashboard: Network Utilisation, Network Saturation, and Network Errors.
To view OVN-Kubernetes metrics select Networking/Infrastructure from the Dashboards drop-down list. You can view the following OVN-Kuberenetes metrics: Networking Configuration, TCP Latency Probes, Control Plane Resources, and Worker Resources.
6.3. Ingress Operator dashboard
You can view networking metrics handled by the Ingress Operator from the dashboard. This includes metrics like the following:
- Incoming and outgoing bandwidth
- HTTP error rates
- HTTP server response latency
To view these Ingress metrics, select Networking/Ingress from the Dashboards drop-down list. You can view Ingress metrics for the following categories: Top 10 Per Route, Top 10 Per Namespace, and Top 10 Per Shard.
Chapter 7. Cluster Network Operator in OpenShift Container Platform
You can use the Cluster Network Operator (CNO) to deploy and manage cluster network components on an OpenShift Container Platform cluster, including the Container Network Interface (CNI) network plugin selected for the cluster during installation.
7.1. Cluster Network Operator
The Cluster Network Operator implements the network
API from the operator.openshift.io
API group. The Operator deploys the OVN-Kubernetes network plugin, or the network provider plugin that you selected during cluster installation, by using a daemon set.
Procedure
The Cluster Network Operator is deployed during installation as a Kubernetes Deployment
.
Run the following command to view the Deployment status:
oc get -n openshift-network-operator deployment/network-operator
$ oc get -n openshift-network-operator deployment/network-operator
Copy to Clipboard Copied! Example output
NAME READY UP-TO-DATE AVAILABLE AGE network-operator 1/1 1 1 56m
NAME READY UP-TO-DATE AVAILABLE AGE network-operator 1/1 1 1 56m
Copy to Clipboard Copied! Run the following command to view the state of the Cluster Network Operator:
oc get clusteroperator/network
$ oc get clusteroperator/network
Copy to Clipboard Copied! Example output
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE network 4.5.4 True False False 50m
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE network 4.5.4 True False False 50m
Copy to Clipboard Copied! The following fields provide information about the status of the operator:
AVAILABLE
,PROGRESSING
, andDEGRADED
. TheAVAILABLE
field isTrue
when the Cluster Network Operator reports an available status condition.
7.2. Viewing the cluster network configuration
Every new OpenShift Container Platform installation has a network.config
object named cluster
.
Procedure
Use the
oc describe
command to view the cluster network configuration:oc describe network.config/cluster
$ oc describe network.config/cluster
Copy to Clipboard Copied! Example output
Name: cluster Namespace: Labels: <none> Annotations: <none> API Version: config.openshift.io/v1 Kind: Network Metadata: Self Link: /apis/config.openshift.io/v1/networks/cluster Spec: Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Network Type: OpenShiftSDN Service Network: 172.30.0.0/16 Status: Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Cluster Network MTU: 8951 Network Type: OpenShiftSDN Service Network: 172.30.0.0/16 Events: <none>
Name: cluster Namespace: Labels: <none> Annotations: <none> API Version: config.openshift.io/v1 Kind: Network Metadata: Self Link: /apis/config.openshift.io/v1/networks/cluster Spec:
1 Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Network Type: OpenShiftSDN Service Network: 172.30.0.0/16 Status:
2 Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Cluster Network MTU: 8951 Network Type: OpenShiftSDN Service Network: 172.30.0.0/16 Events: <none>
Copy to Clipboard Copied!
7.3. Viewing Cluster Network Operator status
You can inspect the status and view the details of the Cluster Network Operator using the oc describe
command.
Procedure
Run the following command to view the status of the Cluster Network Operator:
oc describe clusteroperators/network
$ oc describe clusteroperators/network
Copy to Clipboard Copied!
7.4. Enabling IP forwarding globally
From OpenShift Container Platform 4.14 onward, global IP address forwarding is disabled on OVN-Kubernetes based cluster deployments to prevent undesirable effects for cluster administrators with nodes acting as routers. However, in some cases where an administrator expects traffic to be forwarded a new configuration parameter ipForwarding
is available to allow forwarding of all IP traffic.
To re-enable IP forwarding for all traffic on OVN-Kubernetes managed interfaces set the gatewayConfig.ipForwarding
specification in the Cluster Network Operator to Global
following this procedure:
Procedure
Backup the existing network configuration by running the following command:
oc get network.operator cluster -o yaml > network-config-backup.yaml
$ oc get network.operator cluster -o yaml > network-config-backup.yaml
Copy to Clipboard Copied! Run the following command to modify the existing network configuration:
oc edit network.operator cluster
$ oc edit network.operator cluster
Copy to Clipboard Copied! Add or update the following block under
spec
as illustrated in the following example:spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 serviceNetwork: - 172.30.0.0/16 networkType: OVNKubernetes clusterNetworkMTU: 8900 defaultNetwork: ovnKubernetesConfig: gatewayConfig: ipForwarding: Global
spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 serviceNetwork: - 172.30.0.0/16 networkType: OVNKubernetes clusterNetworkMTU: 8900 defaultNetwork: ovnKubernetesConfig: gatewayConfig: ipForwarding: Global
Copy to Clipboard Copied! - Save and close the file.
After applying the changes, the OpenShift Cluster Network Operator (CNO) applies the update across the cluster. You can monitor the progress by using the following command:
oc get clusteroperators network
$ oc get clusteroperators network
Copy to Clipboard Copied! The status should eventually report as
Available
,Progressing=False
, andDegraded=False
.Alternatively, you can enable IP forwarding globally by running the following command:
oc patch network.operator cluster -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding": "Global"}}}}}
$ oc patch network.operator cluster -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding": "Global"}}}}}
Copy to Clipboard Copied! NoteThe other valid option for this parameter is
Restricted
in case you want to revert this change.Restricted
is the default and with that setting global IP address forwarding is disabled.
7.5. Viewing Cluster Network Operator logs
You can view Cluster Network Operator logs by using the oc logs
command.
Procedure
Run the following command to view the logs of the Cluster Network Operator:
oc logs --namespace=openshift-network-operator deployment/network-operator
$ oc logs --namespace=openshift-network-operator deployment/network-operator
Copy to Clipboard Copied!
7.6. Cluster Network Operator configuration
The configuration for the cluster network is specified as part of the Cluster Network Operator (CNO) configuration and stored in a custom resource (CR) object that is named cluster
. The CR specifies the fields for the Network
API in the operator.openshift.io
API group.
The CNO configuration inherits the following fields during cluster installation from the Network
API in the Network.config.openshift.io
API group:
clusterNetwork
- IP address pools from which pod IP addresses are allocated.
serviceNetwork
- IP address pool for services.
defaultNetwork.type
-
Cluster network plugin.
OVNKubernetes
is the only supported plugin during installation.
After cluster installation, you can only modify the clusterNetwork
IP address range. The default network type can only be changed from OpenShift SDN to OVN-Kubernetes through migration.
You can specify the cluster network plugin configuration for your cluster by setting the fields for the defaultNetwork
object in the CNO object named cluster
.
7.6.1. Cluster Network Operator configuration object
The fields for the Cluster Network Operator (CNO) are described in the following table:
Field | Type | Description |
---|---|---|
|
|
The name of the CNO object. This name is always |
|
| A list specifying the blocks of IP addresses from which pod IP addresses are allocated and the subnet prefix length assigned to each individual node in the cluster. For example: spec: clusterNetwork: - cidr: 10.128.0.0/19 hostPrefix: 23 - cidr: 10.128.32.0/19 hostPrefix: 23
|
|
| A block of IP addresses for services. The OpenShift SDN and OVN-Kubernetes network plugins support only a single IP address block for the service network. For example: spec: serviceNetwork: - 172.30.0.0/14
This value is ready-only and inherited from the |
|
| Configures the network plugin for the cluster network. |
|
| The fields for this object specify the kube-proxy configuration. If you are using the OVN-Kubernetes cluster network plugin, the kube-proxy configuration has no effect. |
For a cluster that needs to deploy objects across multiple networks, ensure that you specify the same value for the clusterNetwork.hostPrefix
parameter for each network type that is defined in the install-config.yaml
file. Setting a different value for each clusterNetwork.hostPrefix
parameter can impact the OVN-Kubernetes network plugin, where the plugin cannot effectively route object traffic among different nodes.
defaultNetwork object configuration
The values for the defaultNetwork
object are defined in the following table:
Field | Type | Description |
---|---|---|
|
|
Note OpenShift Container Platform uses the OVN-Kubernetes network plugin by default. OpenShift SDN is no longer available as an installation choice for new clusters. |
|
| This object is only valid for the OVN-Kubernetes network plugin. |
Configuration for the OpenShift SDN network plugin
The following table describes the configuration fields for the OpenShift SDN network plugin:
Field | Type | Description |
---|---|---|
|
| The network isolation mode for OpenShift SDN. |
|
| The maximum transmission unit (MTU) for the VXLAN overlay network. This value is normally configured automatically. |
|
|
The port to use for all VXLAN packets. The default value is |
Example OpenShift SDN configuration
defaultNetwork: type: OpenShiftSDN openshiftSDNConfig: mode: NetworkPolicy mtu: 1450 vxlanPort: 4789
defaultNetwork:
type: OpenShiftSDN
openshiftSDNConfig:
mode: NetworkPolicy
mtu: 1450
vxlanPort: 4789
Configuration for the OVN-Kubernetes network plugin
The following table describes the configuration fields for the OVN-Kubernetes network plugin:
Field | Type | Description |
---|---|---|
|
| The maximum transmission unit (MTU) for the Geneve (Generic Network Virtualization Encapsulation) overlay network. This value is normally configured automatically. |
|
| The UDP port for the Geneve overlay network. |
|
| An object describing the IPsec mode for the cluster. |
|
| Specifies a configuration object for IPv4 settings. |
|
| Specifies a configuration object for IPv6 settings. |
|
| Specify a configuration object for customizing network policy audit logging. If unset, the defaults audit log settings are used. |
|
| Optional: Specify a configuration object for customizing how egress traffic is sent to the node gateway. Note While migrating egress traffic, you can expect some disruption to workloads and service traffic until the Cluster Network Operator (CNO) successfully rolls out the changes. |
Field | Type | Description |
---|---|---|
| string |
If your existing network infrastructure overlaps with the
The default value is |
| string |
If your existing network infrastructure overlaps with the
The default value is |
Field | Type | Description |
---|---|---|
| string |
If your existing network infrastructure overlaps with the
This field cannot be changed after installation. The default value is |
| string |
If your existing network infrastructure overlaps with the
The default value is |
Field | Type | Description |
---|---|---|
| integer |
The maximum number of messages to generate every second per node. The default value is |
| integer |
The maximum size for the audit log in bytes. The default value is |
| integer | The maximum number of log files that are retained. |
| string | One of the following additional audit log targets:
|
| string |
The syslog facility, such as |
Field | Type | Description |
---|---|---|
|
|
Set this field to
This field has an interaction with the Open vSwitch hardware offloading feature. If you set this field to |
|
|
You can control IP forwarding for all traffic on OVN-Kubernetes managed interfaces by using the |
|
| Optional: Specify an object to configure the internal OVN-Kubernetes masquerade address for host to service traffic for IPv4 addresses. |
|
| Optional: Specify an object to configure the internal OVN-Kubernetes masquerade address for host to service traffic for IPv6 addresses. |
Field | Type | Description |
---|---|---|
|
|
The masquerade IPv4 addresses that are used internally to enable host to service traffic. The host is configured with these IP addresses as well as the shared gateway bridge interface. The default value is |
Field | Type | Description |
---|---|---|
|
|
The masquerade IPv6 addresses that are used internally to enable host to service traffic. The host is configured with these IP addresses as well as the shared gateway bridge interface. The default value is |
Field | Type | Description |
---|---|---|
|
| Specifies the behavior of the IPsec implementation. Must be one of the following values:
|
You can only change the configuration for your cluster network plugin during cluster installation, except for the gatewayConfig
field that can be changed at runtime as a postinstallation activity.
Example OVN-Kubernetes configuration with IPSec enabled
defaultNetwork: type: OVNKubernetes ovnKubernetesConfig: mtu: 1400 genevePort: 6081 ipsecConfig: mode: Full
defaultNetwork:
type: OVNKubernetes
ovnKubernetesConfig:
mtu: 1400
genevePort: 6081
ipsecConfig:
mode: Full
Using OVNKubernetes can lead to a stack exhaustion problem on IBM Power®.
kubeProxyConfig object configuration (OpenShiftSDN container network interface only)
The values for the kubeProxyConfig
object are defined in the following table:
Field | Type | Description |
---|---|---|
|
|
The refresh period for Note
Because of performance improvements introduced in OpenShift Container Platform 4.3 and greater, adjusting the |
|
|
The minimum duration before refreshing kubeProxyConfig: proxyArguments: iptables-min-sync-period: - 0s
|
7.6.2. Cluster Network Operator example configuration
A complete CNO configuration is specified in the following example:
Example Cluster Network Operator object
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 serviceNetwork: - 172.30.0.0/16 networkType: OVNKubernetes
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
clusterNetwork:
- cidr: 10.128.0.0/14
hostPrefix: 23
serviceNetwork:
- 172.30.0.0/16
networkType: OVNKubernetes
Chapter 8. DNS Operator in OpenShift Container Platform
The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods, enabling DNS-based Kubernetes Service discovery in OpenShift Container Platform.
8.1. DNS Operator
The DNS Operator implements the dns
API from the operator.openshift.io
API group. The Operator deploys CoreDNS using a daemon set, creates a service for the daemon set, and configures the kubelet to instruct pods to use the CoreDNS service IP address for name resolution.
Procedure
The DNS Operator is deployed during installation with a Deployment
object.
Use the
oc get
command to view the deployment status:oc get -n openshift-dns-operator deployment/dns-operator
$ oc get -n openshift-dns-operator deployment/dns-operator
Copy to Clipboard Copied! Example output
NAME READY UP-TO-DATE AVAILABLE AGE dns-operator 1/1 1 1 23h
NAME READY UP-TO-DATE AVAILABLE AGE dns-operator 1/1 1 1 23h
Copy to Clipboard Copied! Use the
oc get
command to view the state of the DNS Operator:oc get clusteroperator/dns
$ oc get clusteroperator/dns
Copy to Clipboard Copied! Example output
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE dns 4.1.0-0.11 True False False 92m
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE dns 4.1.0-0.11 True False False 92m
Copy to Clipboard Copied! AVAILABLE
,PROGRESSING
andDEGRADED
provide information about the status of the operator.AVAILABLE
isTrue
when at least 1 pod from the CoreDNS daemon set reports anAvailable
status condition.
8.2. Changing the DNS Operator managementState
DNS manages the CoreDNS component to provide a name resolution service for pods and services in the cluster. The managementState
of the DNS Operator is set to Managed
by default, which means that the DNS Operator is actively managing its resources. You can change it to Unmanaged
, which means the DNS Operator is not managing its resources.
The following are use cases for changing the DNS Operator managementState
:
-
You are a developer and want to test a configuration change to see if it fixes an issue in CoreDNS. You can stop the DNS Operator from overwriting the fix by setting the
managementState
toUnmanaged
. -
You are a cluster administrator and have reported an issue with CoreDNS, but need to apply a workaround until the issue is fixed. You can set the
managementState
field of the DNS Operator toUnmanaged
to apply the workaround.
Procedure
Change
managementState
DNS Operator:oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Unmanaged"}}'
oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Unmanaged"}}'
Copy to Clipboard Copied!
8.3. Controlling DNS pod placement
The DNS Operator has two daemon sets: one for CoreDNS and one for managing the /etc/hosts
file. The daemon set for /etc/hosts
must run on every node host to add an entry for the cluster image registry to support pulling images. Security policies can prohibit communication between pairs of nodes, which prevents the daemon set for CoreDNS from running on every node.
As a cluster administrator, you can use a custom node selector to configure the daemon set for CoreDNS to run or not run on certain nodes.
Prerequisites
-
You installed the
oc
CLI. -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
To prevent communication between certain nodes, configure the
spec.nodePlacement.nodeSelector
API field:Modify the DNS Operator object named
default
:oc edit dns.operator/default
$ oc edit dns.operator/default
Copy to Clipboard Copied! Specify a node selector that includes only control plane nodes in the
spec.nodePlacement.nodeSelector
API field:spec: nodePlacement: nodeSelector: node-role.kubernetes.io/worker: ""
spec: nodePlacement: nodeSelector: node-role.kubernetes.io/worker: ""
Copy to Clipboard Copied!
To allow the daemon set for CoreDNS to run on nodes, configure a taint and toleration:
Modify the DNS Operator object named
default
:oc edit dns.operator/default
$ oc edit dns.operator/default
Copy to Clipboard Copied! Specify a taint key and a toleration for the taint:
spec: nodePlacement: tolerations: - effect: NoExecute key: "dns-only" operators: Equal value: abc tolerationSeconds: 3600
spec: nodePlacement: tolerations: - effect: NoExecute key: "dns-only" operators: Equal value: abc tolerationSeconds: 3600
1 Copy to Clipboard Copied! - 1
- If the taint is
dns-only
, it can be tolerated indefinitely. You can omittolerationSeconds
.
8.4. View the default DNS
Every new OpenShift Container Platform installation has a dns.operator
named default
.
Procedure
Use the
oc describe
command to view the defaultdns
:oc describe dns.operator/default
$ oc describe dns.operator/default
Copy to Clipboard Copied! Example output
Name: default Namespace: Labels: <none> Annotations: <none> API Version: operator.openshift.io/v1 Kind: DNS ... Status: Cluster Domain: cluster.local Cluster IP: 172.30.0.10 ...
Name: default Namespace: Labels: <none> Annotations: <none> API Version: operator.openshift.io/v1 Kind: DNS ... Status: Cluster Domain: cluster.local
1 Cluster IP: 172.30.0.10
2 ...
Copy to Clipboard Copied! To find the service CIDR of your cluster, use the
oc get
command:oc get networks.config/cluster -o jsonpath='{$.status.serviceNetwork}'
$ oc get networks.config/cluster -o jsonpath='{$.status.serviceNetwork}'
Copy to Clipboard Copied!
Example output
[172.30.0.0/16]
[172.30.0.0/16]
8.5. Using DNS forwarding
You can use DNS forwarding to override the default forwarding configuration in the /etc/resolv.conf
file in the following ways:
- Specify name servers for every zone. If the forwarded zone is the Ingress domain managed by OpenShift Container Platform, then the upstream name server must be authorized for the domain.
- Provide a list of upstream DNS servers.
- Change the default forwarding policy.
A DNS forwarding configuration for the default domain can have both the default servers specified in the /etc/resolv.conf
file and the upstream DNS servers.
Procedure
Modify the DNS Operator object named
default
:oc edit dns.operator/default
$ oc edit dns.operator/default
Copy to Clipboard Copied! After you issue the previous command, the Operator creates and updates the config map named
dns-default
with additional server configuration blocks based onServer
. If none of the servers have a zone that matches the query, then name resolution falls back to the upstream DNS servers.Configuring DNS forwarding
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: servers: - name: example-server zones: - example.com forwardPlugin: policy: Random upstreams: - 1.1.1.1 - 2.2.2.2:5353 upstreamResolvers: policy: Random upstreams: - type: SystemResolvConf - type: Network address: 1.2.3.4 port: 53
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: servers: - name: example-server
1 zones:
2 - example.com forwardPlugin: policy: Random
3 upstreams:
4 - 1.1.1.1 - 2.2.2.2:5353 upstreamResolvers:
5 policy: Random
6 upstreams:
7 - type: SystemResolvConf
8 - type: Network address: 1.2.3.4
9 port: 53
10 Copy to Clipboard Copied! - 1
- Must comply with the
rfc6335
service name syntax. - 2
- Must conform to the definition of a subdomain in the
rfc1123
service name syntax. The cluster domain,cluster.local
, is an invalid subdomain for thezones
field. - 3
- Defines the policy to select upstream resolvers. Default value is
Random
. You can also use the valuesRoundRobin
, andSequential
. - 4
- A maximum of 15
upstreams
is allowed perforwardPlugin
. - 5
- Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in
/etc/resolv.conf
. - 6
- Determines the order in which upstream servers are selected for querying. You can specify one of these values:
Random
,RoundRobin
, orSequential
. The default value isSequential
. - 7
- Optional. You can use it to provide upstream resolvers.
- 8
- You can specify two types of
upstreams
-SystemResolvConf
andNetwork
.SystemResolvConf
configures the upstream to use/etc/resolv.conf
andNetwork
defines aNetworkresolver
. You can specify one or both. - 9
- If the specified type is
Network
, you must provide an IP address. Theaddress
field must be a valid IPv4 or IPv6 address. - 10
- If the specified type is
Network
, you can optionally provide a port. Theport
field must have a value between1
and65535
. If you do not specify a port for the upstream, by default port 853 is tried.
Optional: When working in a highly regulated environment, you might need the ability to secure DNS traffic when forwarding requests to upstream resolvers so that you can ensure additional DNS traffic and data privacy. Cluster administrators can configure transport layer security (TLS) for forwarded DNS queries.
Configuring DNS forwarding with TLS
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: servers: - name: example-server zones: - example.com forwardPlugin: transportConfig: transport: TLS tls: caBundle: name: mycacert serverName: dnstls.example.com policy: Random upstreams: - 1.1.1.1 - 2.2.2.2:5353 upstreamResolvers: transportConfig: transport: TLS tls: caBundle: name: mycacert serverName: dnstls.example.com upstreams: - type: Network address: 1.2.3.4 port: 53
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: servers: - name: example-server
1 zones:
2 - example.com forwardPlugin: transportConfig: transport: TLS
3 tls: caBundle: name: mycacert serverName: dnstls.example.com
4 policy: Random
5 upstreams:
6 - 1.1.1.1 - 2.2.2.2:5353 upstreamResolvers:
7 transportConfig: transport: TLS tls: caBundle: name: mycacert serverName: dnstls.example.com upstreams: - type: Network
8 address: 1.2.3.4
9 port: 53
10 Copy to Clipboard Copied! - 1
- Must comply with the
rfc6335
service name syntax. - 2
- Must conform to the definition of a subdomain in the
rfc1123
service name syntax. The cluster domain,cluster.local
, is an invalid subdomain for thezones
field. The cluster domain,cluster.local
, is an invalidsubdomain
forzones
. - 3
- When configuring TLS for forwarded DNS queries, set the
transport
field to have the valueTLS
. By default, CoreDNS caches forwarded connections for 10 seconds. CoreDNS will hold a TCP connection open for those 10 seconds if no request is issued. With large clusters, ensure that your DNS server is aware that it might get many new connections to hold open because you can initiate a connection per node. Set up your DNS hierarchy accordingly to avoid performance issues. - 4
- When configuring TLS for forwarded DNS queries, this is a mandatory server name used as part of the server name indication (SNI) to validate the upstream TLS server certificate.
- 5
- Defines the policy to select upstream resolvers. Default value is
Random
. You can also use the valuesRoundRobin
, andSequential
. - 6
- Required. You can use it to provide upstream resolvers. A maximum of 15
upstreams
entries are allowed perforwardPlugin
entry. - 7
- Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in
/etc/resolv.conf
. - 8
Network
type indicates that this upstream resolver should handle forwarded requests separately from the upstream resolvers listed in/etc/resolv.conf
. Only theNetwork
type is allowed when using TLS and you must provide an IP address.- 9
- The
address
field must be a valid IPv4 or IPv6 address. - 10
- You can optionally provide a port. The
port
must have a value between1
and65535
. If you do not specify a port for the upstream, by default port 853 is tried.
NoteIf
servers
is undefined or invalid, the config map only contains the default server.
Verification
View the config map:
oc get configmap/dns-default -n openshift-dns -o yaml
$ oc get configmap/dns-default -n openshift-dns -o yaml
Copy to Clipboard Copied! Sample DNS ConfigMap based on previous sample DNS
apiVersion: v1 data: Corefile: | example.com:5353 { forward . 1.1.1.1 2.2.2.2:5353 } bar.com:5353 example.com:5353 { forward . 3.3.3.3 4.4.4.4:5454 } .:5353 { errors health kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure upstream fallthrough in-addr.arpa ip6.arpa } prometheus :9153 forward . /etc/resolv.conf 1.2.3.4:53 { policy Random } cache 30 reload } kind: ConfigMap metadata: labels: dns.operator.openshift.io/owning-dns: default name: dns-default namespace: openshift-dns
apiVersion: v1 data: Corefile: | example.com:5353 { forward . 1.1.1.1 2.2.2.2:5353 } bar.com:5353 example.com:5353 { forward . 3.3.3.3 4.4.4.4:5454
1 } .:5353 { errors health kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure upstream fallthrough in-addr.arpa ip6.arpa } prometheus :9153 forward . /etc/resolv.conf 1.2.3.4:53 { policy Random } cache 30 reload } kind: ConfigMap metadata: labels: dns.operator.openshift.io/owning-dns: default name: dns-default namespace: openshift-dns
Copy to Clipboard Copied! - 1
- Changes to the
forwardPlugin
triggers a rolling update of the CoreDNS daemon set.
8.6. DNS Operator status
You can inspect the status and view the details of the DNS Operator using the oc describe
command.
Procedure
View the status of the DNS Operator:
oc describe clusteroperators/dns
$ oc describe clusteroperators/dns
8.7. DNS Operator logs
You can view DNS Operator logs by using the oc logs
command.
Procedure
View the logs of the DNS Operator:
oc logs -n openshift-dns-operator deployment/dns-operator -c dns-operator
$ oc logs -n openshift-dns-operator deployment/dns-operator -c dns-operator
8.8. Setting the CoreDNS log level
You can configure the CoreDNS log level to determine the amount of detail in logged error messages. The valid values for CoreDNS log level are Normal
, Debug
, and Trace
. The default logLevel
is Normal
.
The errors plugin is always enabled. The following logLevel
settings report different error responses:
-
logLevel
:Normal
enables the "errors" class:log . { class error }
. -
logLevel
:Debug
enables the "denial" class:log . { class denial error }
. -
logLevel
:Trace
enables the "all" class:log . { class all }
.
Procedure
To set
logLevel
toDebug
, enter the following command:oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
Copy to Clipboard Copied! To set
logLevel
toTrace
, enter the following command:oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge
$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge
Copy to Clipboard Copied!
Verification
To ensure the desired log level was set, check the config map:
oc get configmap/dns-default -n openshift-dns -o yaml
$ oc get configmap/dns-default -n openshift-dns -o yaml
Copy to Clipboard Copied!
8.9. Viewing the CoreDNS logs
You can view CoreDNS logs by using the oc logs
command.
Procedure
View the logs of a specific CoreDNS pod by entering the following command:
oc -n openshift-dns logs -c dns <core_dns_pod_name>
$ oc -n openshift-dns logs -c dns <core_dns_pod_name>
Copy to Clipboard Copied! Follow the logs of all CoreDNS pods by entering the following command:
oc -n openshift-dns logs -c dns -l dns.operator.openshift.io/daemonset-dns=default -f --max-log-requests=<number>
$ oc -n openshift-dns logs -c dns -l dns.operator.openshift.io/daemonset-dns=default -f --max-log-requests=<number>
1 Copy to Clipboard Copied! - 1
- Specifies the number of DNS pods to stream logs from. The maximum is 6.
8.10. Setting the CoreDNS Operator log level
Cluster administrators can configure the Operator log level to more quickly track down OpenShift DNS issues. The valid values for operatorLogLevel
are Normal
, Debug
, and Trace
. Trace
has the most detailed information. The default operatorlogLevel
is Normal
. There are seven logging levels for issues: Trace, Debug, Info, Warning, Error, Fatal and Panic. After the logging level is set, log entries with that severity or anything above it will be logged.
-
operatorLogLevel: "Normal"
setslogrus.SetLogLevel("Info")
. -
operatorLogLevel: "Debug"
setslogrus.SetLogLevel("Debug")
. -
operatorLogLevel: "Trace"
setslogrus.SetLogLevel("Trace")
.
Procedure
To set
operatorLogLevel
toDebug
, enter the following command:oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Debug"}}' --type=merge
$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Debug"}}' --type=merge
Copy to Clipboard Copied! To set
operatorLogLevel
toTrace
, enter the following command:oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Trace"}}' --type=merge
$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Trace"}}' --type=merge
Copy to Clipboard Copied!
8.11. Tuning the CoreDNS cache
You can configure the maximum duration of both successful or unsuccessful caching, also known as positive or negative caching respectively, done by CoreDNS. Tuning the duration of caching of DNS query responses can reduce the load for any upstream DNS resolvers.
Procedure
Edit the DNS Operator object named
default
by running the following command:oc edit dns.operator.openshift.io/default
$ oc edit dns.operator.openshift.io/default
Copy to Clipboard Copied! Modify the time-to-live (TTL) caching values:
Configuring DNS caching
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: cache: positiveTTL: 1h negativeTTL: 0.5h10m
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: cache: positiveTTL: 1h
1 negativeTTL: 0.5h10m
2 Copy to Clipboard Copied! - 1
- The string value
1h
is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be0s
and the cluster uses the internal default value of900s
as a fallback. - 2
- The string value can be a combination of units such as
0.5h10m
and is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be0s
and the cluster uses the internal default value of30s
as a fallback.
WarningSetting TTL fields to low values could lead to an increased load on the cluster, any upstream resolvers, or both.
Chapter 9. Ingress Operator in OpenShift Container Platform
9.1. OpenShift Container Platform Ingress Operator
When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController
API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.
The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route
and Kubernetes Ingress
resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy
type and internal load balancing, provide ways to publish Ingress Controller endpoints.
9.2. The Ingress configuration asset
The installation program generates an asset with an Ingress
resource in the config.openshift.io
API group, cluster-ingress-02-config.yml
.
YAML Definition of the Ingress
resource
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster spec: domain: apps.openshiftdemos.com
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
name: cluster
spec:
domain: apps.openshiftdemos.com
The installation program stores this asset in the cluster-ingress-02-config.yml
file in the manifests/
directory. This Ingress
resource defines the cluster-wide configuration for Ingress. This Ingress configuration is used as follows:
- The Ingress Operator uses the domain from the cluster Ingress configuration as the domain for the default Ingress Controller.
-
The OpenShift API Server Operator uses the domain from the cluster Ingress configuration. This domain is also used when generating a default host for a
Route
resource that does not specify an explicit host.
9.3. Ingress Controller configuration parameters
The IngressController
custom resource (CR) includes optional configuration parameters that you can configure to meet specific needs for your organization.
Parameter | Description |
---|---|
|
The
If empty, the default value is |
|
|
|
For cloud environments, use the
On GCP, AWS, and Azure you can configure the following
If not set, the default value is based on
For most platforms, the
For non-cloud environments, such as a bare-metal platform, use the
If you do not set a value in one of these fields, the default value is based on binding ports specified in the
If you need to update the
|
|
The
The secret must contain the following keys and data: *
If not set, a wildcard certificate is automatically generated and used. The certificate is valid for the Ingress Controller The in-use certificate, whether generated or user-specified, is automatically integrated with OpenShift Container Platform built-in OAuth server. |
|
|
|
|
|
If not set, the defaults values are used. Note
The nodePlacement: nodeSelector: matchLabels: kubernetes.io/os: linux tolerations: - effect: NoSchedule operator: Exists
|
|
If not set, the default value is based on the
When using the
The minimum TLS version for Ingress Controllers is Note
Ciphers and the minimum TLS version of the configured security profile are reflected in the Important
The Ingress Operator converts the TLS |
|
The
The |
|
|
|
|
|
By setting the
By default, the policy is set to
By setting These adjustments are only applied to cleartext, edge-terminated, and re-encrypt routes, and only when using HTTP/1.
For request headers, these adjustments are applied only for routes that have the
|
|
|
|
|
|
For any cookie that you want to capture, the following parameters must be in your
For example: httpCaptureCookies: - matchType: Exact maxLength: 128 name: MYCOOKIE
|
|
httpCaptureHeaders: request: - maxLength: 256 name: Connection - maxLength: 128 name: User-Agent response: - maxLength: 256 name: Content-Type - maxLength: 256 name: Content-Length
|
|
|
|
The
|
|
The
These connections come from load balancer health probes or web browser speculative connections (preconnect) and can be safely ignored. However, these requests can be caused by network errors, so setting this field to |
9.3.1. Ingress Controller TLS security profiles
TLS security profiles provide a way for servers to regulate which ciphers a connecting client can use when connecting to the server.
9.3.1.1. Understanding TLS security profiles
You can use a TLS (Transport Layer Security) security profile to define which TLS ciphers are required by various OpenShift Container Platform components. The OpenShift Container Platform TLS security profiles are based on Mozilla recommended configurations.
You can specify one of the following TLS security profiles for each component:
Profile | Description |
---|---|
| This profile is intended for use with legacy clients or libraries. The profile is based on the Old backward compatibility recommended configuration.
The Note For the Ingress Controller, the minimum TLS version is converted from 1.0 to 1.1. |
| This profile is the default TLS security profile for the Ingress Controller, kubelet, and control plane. The profile is based on the Intermediate compatibility recommended configuration.
The Note This profile is the recommended configuration for the majority of clients. |
| This profile is intended for use with modern clients that have no need for backwards compatibility. This profile is based on the Modern compatibility recommended configuration.
The |
| This profile allows you to define the TLS version and ciphers to use. Warning
Use caution when using a |
When using one of the predefined profile types, the effective profile configuration is subject to change between releases. For example, given a specification to use the Intermediate profile deployed on release X.Y.Z, an upgrade to release X.Y.Z+1 might cause a new profile configuration to be applied, resulting in a rollout.
9.3.1.2. Configuring the TLS security profile for the Ingress Controller
To configure a TLS security profile for an Ingress Controller, edit the IngressController
custom resource (CR) to specify a predefined or custom TLS security profile. If a TLS security profile is not configured, the default value is based on the TLS security profile set for the API server.
Sample IngressController
CR that configures the Old
TLS security profile
apiVersion: operator.openshift.io/v1 kind: IngressController ... spec: tlsSecurityProfile: old: {} type: Old ...
apiVersion: operator.openshift.io/v1
kind: IngressController
...
spec:
tlsSecurityProfile:
old: {}
type: Old
...
The TLS security profile defines the minimum TLS version and the TLS ciphers for TLS connections for Ingress Controllers.
You can see the ciphers and the minimum TLS version of the configured TLS security profile in the IngressController
custom resource (CR) under Status.Tls Profile
and the configured TLS security profile under Spec.Tls Security Profile
. For the Custom
TLS security profile, the specific ciphers and minimum TLS version are listed under both parameters.
The HAProxy Ingress Controller image supports TLS 1.3
and the Modern
profile.
The Ingress Operator also converts the TLS 1.0
of an Old
or Custom
profile to 1.1
.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Edit the
IngressController
CR in theopenshift-ingress-operator
project to configure the TLS security profile:oc edit IngressController default -n openshift-ingress-operator
$ oc edit IngressController default -n openshift-ingress-operator
Copy to Clipboard Copied! Add the
spec.tlsSecurityProfile
field:Sample
IngressController
CR for aCustom
profileapiVersion: operator.openshift.io/v1 kind: IngressController ... spec: tlsSecurityProfile: type: Custom custom: ciphers: - ECDHE-ECDSA-CHACHA20-POLY1305 - ECDHE-RSA-CHACHA20-POLY1305 - ECDHE-RSA-AES128-GCM-SHA256 - ECDHE-ECDSA-AES128-GCM-SHA256 minTLSVersion: VersionTLS11 ...
apiVersion: operator.openshift.io/v1 kind: IngressController ... spec: tlsSecurityProfile: type: Custom
1 custom:
2 ciphers:
3 - ECDHE-ECDSA-CHACHA20-POLY1305 - ECDHE-RSA-CHACHA20-POLY1305 - ECDHE-RSA-AES128-GCM-SHA256 - ECDHE-ECDSA-AES128-GCM-SHA256 minTLSVersion: VersionTLS11 ...
Copy to Clipboard Copied! - Save the file to apply the changes.
Verification
Verify that the profile is set in the
IngressController
CR:oc describe IngressController default -n openshift-ingress-operator
$ oc describe IngressController default -n openshift-ingress-operator
Copy to Clipboard Copied! Example output
Name: default Namespace: openshift-ingress-operator Labels: <none> Annotations: <none> API Version: operator.openshift.io/v1 Kind: IngressController ... Spec: ... Tls Security Profile: Custom: Ciphers: ECDHE-ECDSA-CHACHA20-POLY1305 ECDHE-RSA-CHACHA20-POLY1305 ECDHE-RSA-AES128-GCM-SHA256 ECDHE-ECDSA-AES128-GCM-SHA256 Min TLS Version: VersionTLS11 Type: Custom ...
Name: default Namespace: openshift-ingress-operator Labels: <none> Annotations: <none> API Version: operator.openshift.io/v1 Kind: IngressController ... Spec: ... Tls Security Profile: Custom: Ciphers: ECDHE-ECDSA-CHACHA20-POLY1305 ECDHE-RSA-CHACHA20-POLY1305 ECDHE-RSA-AES128-GCM-SHA256 ECDHE-ECDSA-AES128-GCM-SHA256 Min TLS Version: VersionTLS11 Type: Custom ...
Copy to Clipboard Copied!
9.3.1.3. Configuring mutual TLS authentication
You can configure the Ingress Controller to enable mutual TLS (mTLS) authentication by setting a spec.clientTLS
value. The clientTLS
value configures the Ingress Controller to verify client certificates. This configuration includes setting a clientCA
value, which is a reference to a config map. The config map contains the PEM-encoded CA certificate bundle that is used to verify a client’s certificate. Optionally, you can also configure a list of certificate subject filters.
If the clientCA
value specifies an X509v3 certificate revocation list (CRL) distribution point, the Ingress Operator downloads and manages a CRL config map based on the HTTP URI X509v3 CRL Distribution Point
specified in each provided certificate. The Ingress Controller uses this config map during mTLS/TLS negotiation. Requests that do not provide valid certificates are rejected.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. - You have a PEM-encoded CA certificate bundle.
If your CA bundle references a CRL distribution point, you must have also included the end-entity or leaf certificate to the client CA bundle. This certificate must have included an HTTP URI under
CRL Distribution Points
, as described in RFC 5280. For example:Issuer: C=US, O=Example Inc, CN=Example Global G2 TLS RSA SHA256 2020 CA1 Subject: SOME SIGNED CERT X509v3 CRL Distribution Points: Full Name: URI:http://crl.example.com/example.crl
Issuer: C=US, O=Example Inc, CN=Example Global G2 TLS RSA SHA256 2020 CA1 Subject: SOME SIGNED CERT X509v3 CRL Distribution Points: Full Name: URI:http://crl.example.com/example.crl
Copy to Clipboard Copied!
Procedure
In the
openshift-config
namespace, create a config map from your CA bundle:oc create configmap \ router-ca-certs-default \ --from-file=ca-bundle.pem=client-ca.crt \ -n openshift-config
$ oc create configmap \ router-ca-certs-default \ --from-file=ca-bundle.pem=client-ca.crt \
1 -n openshift-config
Copy to Clipboard Copied! - 1
- The config map data key must be
ca-bundle.pem
, and the data value must be a CA certificate in PEM format.
Edit the
IngressController
resource in theopenshift-ingress-operator
project:oc edit IngressController default -n openshift-ingress-operator
$ oc edit IngressController default -n openshift-ingress-operator
Copy to Clipboard Copied! Add the
spec.clientTLS
field and subfields to configure mutual TLS:Sample
IngressController
CR for aclientTLS
profile that specifies filtering patternsapiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: clientTLS: clientCertificatePolicy: Required clientCA: name: router-ca-certs-default allowedSubjectPatterns: - "^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: clientTLS: clientCertificatePolicy: Required clientCA: name: router-ca-certs-default allowedSubjectPatterns: - "^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"
Copy to Clipboard Copied! -
Optional, get the Distinguished Name (DN) for
allowedSubjectPatterns
by entering the following command.
openssl x509 -in custom-cert.pem -noout -subject
$ openssl x509 -in custom-cert.pem -noout -subject
subject= /CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift
9.4. View the default Ingress Controller
The Ingress Operator is a core feature of OpenShift Container Platform and is enabled out of the box.
Every new OpenShift Container Platform installation has an ingresscontroller
named default. It can be supplemented with additional Ingress Controllers. If the default ingresscontroller
is deleted, the Ingress Operator will automatically recreate it within a minute.
Procedure
View the default Ingress Controller:
oc describe --namespace=openshift-ingress-operator ingresscontroller/default
$ oc describe --namespace=openshift-ingress-operator ingresscontroller/default
Copy to Clipboard Copied!
9.5. View Ingress Operator status
You can view and inspect the status of your Ingress Operator.
Procedure
View your Ingress Operator status:
oc describe clusteroperators/ingress
$ oc describe clusteroperators/ingress
Copy to Clipboard Copied!
9.6. View Ingress Controller logs
You can view your Ingress Controller logs.
Procedure
View your Ingress Controller logs:
oc logs --namespace=openshift-ingress-operator deployments/ingress-operator -c <container_name>
$ oc logs --namespace=openshift-ingress-operator deployments/ingress-operator -c <container_name>
Copy to Clipboard Copied!
9.7. View Ingress Controller status
Your can view the status of a particular Ingress Controller.
Procedure
View the status of an Ingress Controller:
oc describe --namespace=openshift-ingress-operator ingresscontroller/<name>
$ oc describe --namespace=openshift-ingress-operator ingresscontroller/<name>
Copy to Clipboard Copied!
9.8. Creating a custom Ingress Controller
As a cluster administrator, you can create a new custom Ingress Controller. Because the default Ingress Controller might change during OpenShift Container Platform updates, creating a custom Ingress Controller can be helpful when maintaining a configuration manually that persists across cluster updates.
This example provides a minimal spec for a custom Ingress Controller. To further customize your custom Ingress Controller, see "Configuring the Ingress Controller".
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a YAML file that defines the custom
IngressController
object:Example
custom-ingress-controller.yaml
fileapiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: <custom_name> namespace: openshift-ingress-operator spec: defaultCertificate: name: <custom-ingress-custom-certs> replicas: 1 domain: <custom_domain>
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: <custom_name>
1 namespace: openshift-ingress-operator spec: defaultCertificate: name: <custom-ingress-custom-certs>
2 replicas: 1
3 domain: <custom_domain>
4 Copy to Clipboard Copied! - 1
- Specify the a custom
name
for theIngressController
object. - 2
- Specify the name of the secret with the custom wildcard certificate.
- 3
- Minimum replica needs to be ONE
- 4
- Specify the domain to your domain name. The domain specified on the IngressController object and the domain used for the certificate must match. For example, if the domain value is "custom_domain.mycompany.com", then the certificate must have SAN *.custom_domain.mycompany.com (with the
*.
added to the domain).
Create the object by running the following command:
oc create -f custom-ingress-controller.yaml
$ oc create -f custom-ingress-controller.yaml
Copy to Clipboard Copied!
9.9. Configuring the Ingress Controller
9.9.1. Setting a custom default certificate
As an administrator, you can configure an Ingress Controller to use a custom certificate by creating a Secret resource and editing the IngressController
custom resource (CR).
Prerequisites
- You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority or by a private trusted certificate authority that you configured in a custom PKI.
Your certificate meets the following requirements:
- The certificate is valid for the ingress domain.
-
The certificate uses the
subjectAltName
extension to specify a wildcard domain, such as*.apps.ocp4.example.com
.
You must have an
IngressController
CR. You may use the default one:oc --namespace openshift-ingress-operator get ingresscontrollers
$ oc --namespace openshift-ingress-operator get ingresscontrollers
Copy to Clipboard Copied! Example output
NAME AGE default 10m
NAME AGE default 10m
Copy to Clipboard Copied!
If you have intermediate certificates, they must be included in the tls.crt
file of the secret containing a custom default certificate. Order matters when specifying a certificate; list your intermediate certificate(s) after any server certificate(s).
Procedure
The following assumes that the custom certificate and key pair are in the tls.crt
and tls.key
files in the current working directory. Substitute the actual path names for tls.crt
and tls.key
. You also may substitute another name for custom-certs-default
when creating the Secret resource and referencing it in the IngressController CR.
This action will cause the Ingress Controller to be redeployed, using a rolling deployment strategy.
Create a Secret resource containing the custom certificate in the
openshift-ingress
namespace using thetls.crt
andtls.key
files.oc --namespace openshift-ingress create secret tls custom-certs-default --cert=tls.crt --key=tls.key
$ oc --namespace openshift-ingress create secret tls custom-certs-default --cert=tls.crt --key=tls.key
Copy to Clipboard Copied! Update the IngressController CR to reference the new certificate secret:
oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \ --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
$ oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \ --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
Copy to Clipboard Copied! Verify the update was effective:
echo Q |\ openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null |\ openssl x509 -noout -subject -issuer -enddate
$ echo Q |\ openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null |\ openssl x509 -noout -subject -issuer -enddate
Copy to Clipboard Copied! where:
<domain>
- Specifies the base domain name for your cluster.
Example output
subject=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = *.apps.example.com issuer=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = example.com notAfter=May 10 08:32:45 2022 GM
subject=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = *.apps.example.com issuer=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = example.com notAfter=May 10 08:32:45 2022 GM
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to set a custom default certificate:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: defaultCertificate: name: custom-certs-default
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: defaultCertificate: name: custom-certs-default
Copy to Clipboard Copied! The certificate secret name should match the value used to update the CR.
Once the IngressController CR has been modified, the Ingress Operator updates the Ingress Controller’s deployment to use the custom certificate.
9.9.2. Removing a custom default certificate
As an administrator, you can remove a custom certificate that you configured an Ingress Controller to use.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. -
You have installed the OpenShift CLI (
oc
). - You previously configured a custom default certificate for the Ingress Controller.
Procedure
To remove the custom certificate and restore the certificate that ships with OpenShift Container Platform, enter the following command:
oc patch -n openshift-ingress-operator ingresscontrollers/default \ --type json -p $'- op: remove\n path: /spec/defaultCertificate'
$ oc patch -n openshift-ingress-operator ingresscontrollers/default \ --type json -p $'- op: remove\n path: /spec/defaultCertificate'
Copy to Clipboard Copied! There can be a delay while the cluster reconciles the new certificate configuration.
Verification
To confirm that the original cluster certificate is restored, enter the following command:
echo Q | \ openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null | \ openssl x509 -noout -subject -issuer -enddate
$ echo Q | \ openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null | \ openssl x509 -noout -subject -issuer -enddate
Copy to Clipboard Copied! where:
<domain>
- Specifies the base domain name for your cluster.
Example output
subject=CN = *.apps.<domain> issuer=CN = ingress-operator@1620633373 notAfter=May 10 10:44:36 2023 GMT
subject=CN = *.apps.<domain> issuer=CN = ingress-operator@1620633373 notAfter=May 10 10:44:36 2023 GMT
Copy to Clipboard Copied!
9.9.3. Autoscaling an Ingress Controller
You can automatically scale an Ingress Controller to dynamically meet routing performance or availability requirements, such as the requirement to increase throughput.
The following procedure provides an example for scaling up the default Ingress Controller.
Prerequisites
-
You have the OpenShift CLI (
oc
) installed. -
You have access to an OpenShift Container Platform cluster as a user with the
cluster-admin
role. You installed the Custom Metrics Autoscaler Operator and an associated KEDA Controller.
-
You can install the Operator by using OperatorHub on the web console. After you install the Operator, you can create an instance of
KedaController
.
-
You can install the Operator by using OperatorHub on the web console. After you install the Operator, you can create an instance of
Procedure
Create a service account to authenticate with Thanos by running the following command:
oc create -n openshift-ingress-operator serviceaccount thanos && oc describe -n openshift-ingress-operator serviceaccount thanos
$ oc create -n openshift-ingress-operator serviceaccount thanos && oc describe -n openshift-ingress-operator serviceaccount thanos
Copy to Clipboard Copied! Example output
Name: thanos Namespace: openshift-ingress-operator Labels: <none> Annotations: <none> Image pull secrets: thanos-dockercfg-kfvf2 Mountable secrets: thanos-dockercfg-kfvf2 Tokens: thanos-token-c422q Events: <none>
Name: thanos Namespace: openshift-ingress-operator Labels: <none> Annotations: <none> Image pull secrets: thanos-dockercfg-kfvf2 Mountable secrets: thanos-dockercfg-kfvf2 Tokens: thanos-token-c422q Events: <none>
Copy to Clipboard Copied! Optional: Manually create the service account secret token with the following command:
ImportantIf you disable the
ImageRegistry
capability or if you disable the integrated OpenShift image registry in the Cluster Image Registry Operator’s configuration, the image pull secret is not generated for each service account. In this situation, you must perform this step.oc apply -f - <<EOF apiVersion: v1 kind: Secret metadata: name: thanos-token namespace: openshift-ingress-operator annotations: kubernetes.io/service-account.name: thanos type: kubernetes.io/service-account-token EOF
$ oc apply -f - <<EOF apiVersion: v1 kind: Secret metadata: name: thanos-token namespace: openshift-ingress-operator annotations: kubernetes.io/service-account.name: thanos type: kubernetes.io/service-account-token EOF
Copy to Clipboard Copied! Define a
TriggerAuthentication
object within theopenshift-ingress-operator
namespace by using the service account’s token.Define the
secret
variable that contains the secret by running the following command:secret=$(oc get secret -n openshift-ingress-operator | grep thanos-token | head -n 1 | awk '{ print $1 }')
$ secret=$(oc get secret -n openshift-ingress-operator | grep thanos-token | head -n 1 | awk '{ print $1 }')
Copy to Clipboard Copied! Create the
TriggerAuthentication
object and pass the value of thesecret
variable to theTOKEN
parameter:oc process TOKEN="$secret" -f - <<EOF | oc apply -n openshift-ingress-operator -f - apiVersion: template.openshift.io/v1 kind: Template parameters: - name: TOKEN objects: - apiVersion: keda.sh/v1alpha1 kind: TriggerAuthentication metadata: name: keda-trigger-auth-prometheus spec: secretTargetRef: - parameter: bearerToken name: \${TOKEN} key: token - parameter: ca name: \${TOKEN} key: ca.crt EOF
$ oc process TOKEN="$secret" -f - <<EOF | oc apply -n openshift-ingress-operator -f - apiVersion: template.openshift.io/v1 kind: Template parameters: - name: TOKEN objects: - apiVersion: keda.sh/v1alpha1 kind: TriggerAuthentication metadata: name: keda-trigger-auth-prometheus spec: secretTargetRef: - parameter: bearerToken name: \${TOKEN} key: token - parameter: ca name: \${TOKEN} key: ca.crt EOF
Copy to Clipboard Copied!
Create and apply a role for reading metrics from Thanos:
Create a new role,
thanos-metrics-reader.yaml
, that reads metrics from pods and nodes:thanos-metrics-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: thanos-metrics-reader namespace: openshift-ingress-operator rules: - apiGroups: - "" resources: - pods - nodes verbs: - get - apiGroups: - metrics.k8s.io resources: - pods - nodes verbs: - get - list - watch - apiGroups: - "" resources: - namespaces verbs: - get
apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: thanos-metrics-reader namespace: openshift-ingress-operator rules: - apiGroups: - "" resources: - pods - nodes verbs: - get - apiGroups: - metrics.k8s.io resources: - pods - nodes verbs: - get - list - watch - apiGroups: - "" resources: - namespaces verbs: - get
Copy to Clipboard Copied! Apply the new role by running the following command:
oc apply -f thanos-metrics-reader.yaml
$ oc apply -f thanos-metrics-reader.yaml
Copy to Clipboard Copied!
Add the new role to the service account by entering the following commands:
oc adm policy -n openshift-ingress-operator add-role-to-user thanos-metrics-reader -z thanos --role-namespace=openshift-ingress-operator
$ oc adm policy -n openshift-ingress-operator add-role-to-user thanos-metrics-reader -z thanos --role-namespace=openshift-ingress-operator
Copy to Clipboard Copied! oc adm policy -n openshift-ingress-operator add-cluster-role-to-user cluster-monitoring-view -z thanos
$ oc adm policy -n openshift-ingress-operator add-cluster-role-to-user cluster-monitoring-view -z thanos
Copy to Clipboard Copied! NoteThe argument
add-cluster-role-to-user
is only required if you use cross-namespace queries. The following step uses a query from thekube-metrics
namespace which requires this argument.Create a new
ScaledObject
YAML file,ingress-autoscaler.yaml
, that targets the default Ingress Controller deployment:Example
ScaledObject
definitionapiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: ingress-scaler namespace: openshift-ingress-operator spec: scaleTargetRef: apiVersion: operator.openshift.io/v1 kind: IngressController name: default envSourceContainerName: ingress-operator minReplicaCount: 1 maxReplicaCount: 20 cooldownPeriod: 1 pollingInterval: 1 triggers: - type: prometheus metricType: AverageValue metadata: serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091 namespace: openshift-ingress-operator metricName: 'kube-node-role' threshold: '1' query: 'sum(kube_node_role{role="worker",service="kube-state-metrics"})' authModes: "bearer" authenticationRef: name: keda-trigger-auth-prometheus
apiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: ingress-scaler namespace: openshift-ingress-operator spec: scaleTargetRef:
1 apiVersion: operator.openshift.io/v1 kind: IngressController name: default envSourceContainerName: ingress-operator minReplicaCount: 1 maxReplicaCount: 20
2 cooldownPeriod: 1 pollingInterval: 1 triggers: - type: prometheus metricType: AverageValue metadata: serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
3 namespace: openshift-ingress-operator
4 metricName: 'kube-node-role' threshold: '1' query: 'sum(kube_node_role{role="worker",service="kube-state-metrics"})'
5 authModes: "bearer" authenticationRef: name: keda-trigger-auth-prometheus
Copy to Clipboard Copied! - 1
- The custom resource that you are targeting. In this case, the Ingress Controller.
- 2
- Optional: The maximum number of replicas. If you omit this field, the default maximum is set to 100 replicas.
- 3
- The Thanos service endpoint in the
openshift-monitoring
namespace. - 4
- The Ingress Operator namespace.
- 5
- This expression evaluates to however many worker nodes are present in the deployed cluster.
ImportantIf you are using cross-namespace queries, you must target port 9091 and not port 9092 in the
serverAddress
field. You also must have elevated privileges to read metrics from this port.Apply the custom resource definition by running the following command:
oc apply -f ingress-autoscaler.yaml
$ oc apply -f ingress-autoscaler.yaml
Copy to Clipboard Copied!
Verification
Verify that the default Ingress Controller is scaled out to match the value returned by the
kube-state-metrics
query by running the following commands:Use the
grep
command to search the Ingress Controller YAML file for replicas:oc get -n openshift-ingress-operator ingresscontroller/default -o yaml | grep replicas:
$ oc get -n openshift-ingress-operator ingresscontroller/default -o yaml | grep replicas:
Copy to Clipboard Copied! Example output
replicas: 3
replicas: 3
Copy to Clipboard Copied! Get the pods in the
openshift-ingress
project:oc get pods -n openshift-ingress
$ oc get pods -n openshift-ingress
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE router-default-7b5df44ff-l9pmm 2/2 Running 0 17h router-default-7b5df44ff-s5sl5 2/2 Running 0 3d22h router-default-7b5df44ff-wwsth 2/2 Running 0 66s
NAME READY STATUS RESTARTS AGE router-default-7b5df44ff-l9pmm 2/2 Running 0 17h router-default-7b5df44ff-s5sl5 2/2 Running 0 3d22h router-default-7b5df44ff-wwsth 2/2 Running 0 66s
Copy to Clipboard Copied!
9.9.4. Scaling an Ingress Controller
Manually scale an Ingress Controller to meeting routing performance or availability requirements such as the requirement to increase throughput. oc
commands are used to scale the IngressController
resource. The following procedure provides an example for scaling up the default IngressController
.
Scaling is not an immediate action, as it takes time to create the desired number of replicas.
Procedure
View the current number of available replicas for the default
IngressController
:oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
$ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
Copy to Clipboard Copied! Example output
2
2
Copy to Clipboard Copied! Scale the default
IngressController
to the desired number of replicas using theoc patch
command. The following example scales the defaultIngressController
to 3 replicas:oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 3}}' --type=merge
$ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 3}}' --type=merge
Copy to Clipboard Copied! Example output
ingresscontroller.operator.openshift.io/default patched
ingresscontroller.operator.openshift.io/default patched
Copy to Clipboard Copied! Verify that the default
IngressController
scaled to the number of replicas that you specified:oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
$ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
Copy to Clipboard Copied! Example output
3
3
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to scale an Ingress Controller to three replicas:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 3
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 3
1 Copy to Clipboard Copied! - 1
- If you need a different amount of replicas, change the
replicas
value.
9.9.5. Configuring Ingress access logging
You can configure the Ingress Controller to enable access logs. If you have clusters that do not receive much traffic, then you can log to a sidecar. If you have high traffic clusters, to avoid exceeding the capacity of the logging stack or to integrate with a logging infrastructure outside of OpenShift Container Platform, you can forward logs to a custom syslog endpoint. You can also specify the format for access logs.
Container logging is useful to enable access logs on low-traffic clusters when there is no existing Syslog logging infrastructure, or for short-term use while diagnosing problems with the Ingress Controller.
Syslog is needed for high-traffic clusters where access logs could exceed the OpenShift Logging stack’s capacity, or for environments where any logging solution needs to integrate with an existing Syslog logging infrastructure. The Syslog use-cases can overlap.
Prerequisites
-
Log in as a user with
cluster-admin
privileges.
Procedure
Configure Ingress access logging to a sidecar.
To configure Ingress access logging, you must specify a destination using
spec.logging.access.destination
. To specify logging to a sidecar container, you must specifyContainer
spec.logging.access.destination.type
. The following example is an Ingress Controller definition that logs to aContainer
destination:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Container
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Container
Copy to Clipboard Copied! When you configure the Ingress Controller to log to a sidecar, the operator creates a container named
logs
inside the Ingress Controller Pod:oc -n openshift-ingress logs deployment.apps/router-default -c logs
$ oc -n openshift-ingress logs deployment.apps/router-default -c logs
Copy to Clipboard Copied! Example output
2020-05-11T19:11:50.135710+00:00 router-default-57dfc6cd95-bpmk6 router-default-57dfc6cd95-bpmk6 haproxy[108]: 174.19.21.82:39654 [11/May/2020:19:11:50.133] public be_http:hello-openshift:hello-openshift/pod:hello-openshift:hello-openshift:10.128.2.12:8080 0/0/1/0/1 200 142 - - --NI 1/1/0/0/0 0/0 "GET / HTTP/1.1"
2020-05-11T19:11:50.135710+00:00 router-default-57dfc6cd95-bpmk6 router-default-57dfc6cd95-bpmk6 haproxy[108]: 174.19.21.82:39654 [11/May/2020:19:11:50.133] public be_http:hello-openshift:hello-openshift/pod:hello-openshift:hello-openshift:10.128.2.12:8080 0/0/1/0/1 200 142 - - --NI 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Copy to Clipboard Copied!
Configure Ingress access logging to a Syslog endpoint.
To configure Ingress access logging, you must specify a destination using
spec.logging.access.destination
. To specify logging to a Syslog endpoint destination, you must specifySyslog
forspec.logging.access.destination.type
. If the destination type isSyslog
, you must also specify a destination endpoint usingspec.logging.access.destination.syslog.endpoint
and you can specify a facility usingspec.logging.access.destination.syslog.facility
. The following example is an Ingress Controller definition that logs to aSyslog
destination:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 port: 10514
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 port: 10514
Copy to Clipboard Copied! NoteThe
syslog
destination port must be UDP.
Configure Ingress access logging with a specific log format.
You can specify
spec.logging.access.httpLogFormat
to customize the log format. The following example is an Ingress Controller definition that logs to asyslog
endpoint with IP address 1.2.3.4 and port 10514:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 port: 10514 httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 port: 10514 httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'
Copy to Clipboard Copied!
Disable Ingress access logging.
To disable Ingress access logging, leave
spec.logging
orspec.logging.access
empty:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: null
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: null
Copy to Clipboard Copied!
Allow the Ingress Controller to modify the HAProxy log length when using a sidecar.
Use
spec.logging.access.destination.syslog.maxLength
if you are usingspec.logging.access.destination.type: Syslog
.apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 maxLength: 4096 port: 10514
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 maxLength: 4096 port: 10514
Copy to Clipboard Copied! Use
spec.logging.access.destination.container.maxLength
if you are usingspec.logging.access.destination.type: Container
.apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Container container: maxLength: 8192
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Container container: maxLength: 8192
Copy to Clipboard Copied!
9.9.6. Setting Ingress Controller thread count
A cluster administrator can set the thread count to increase the amount of incoming connections a cluster can handle. You can patch an existing Ingress Controller to increase the amount of threads.
Prerequisites
- The following assumes that you already created an Ingress Controller.
Procedure
Update the Ingress Controller to increase the number of threads:
oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"threadCount": 8}}}'
$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"threadCount": 8}}}'
Copy to Clipboard Copied! NoteIf you have a node that is capable of running large amounts of resources, you can configure
spec.nodePlacement.nodeSelector
with labels that match the capacity of the intended node, and configurespec.tuningOptions.threadCount
to an appropriately high value.
9.9.7. Configuring an Ingress Controller to use an internal load balancer
When creating an Ingress Controller on cloud platforms, the Ingress Controller is published by a public cloud load balancer by default. As an administrator, you can create an Ingress Controller that uses an internal cloud load balancer.
If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.
If you want to change the scope
for an IngressController
, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope
parameter after the custom resource (CR) is created.
Figure 9.1. Diagram of LoadBalancer

The preceding graphic shows the following concepts pertaining to OpenShift Container Platform Ingress LoadBalancerService endpoint publishing strategy:
- You can load balance externally, using the cloud provider load balancer, or internally, using the OpenShift Ingress Controller Load Balancer.
- You can use the single IP address of the load balancer and more familiar ports, such as 8080 and 4200 as shown on the cluster depicted in the graphic.
- Traffic from the external load balancer is directed at the pods, and managed by the load balancer, as depicted in the instance of a down node. See the Kubernetes Services documentation for implementation details.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create an
IngressController
custom resource (CR) in a file named<name>-ingress-controller.yaml
, such as in the following example:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: <name> spec: domain: <domain> endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: Internal
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: <name>
1 spec: domain: <domain>
2 endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: Internal
3 Copy to Clipboard Copied! Create the Ingress Controller defined in the previous step by running the following command:
oc create -f <name>-ingress-controller.yaml
$ oc create -f <name>-ingress-controller.yaml
1 Copy to Clipboard Copied! - 1
- Replace
<name>
with the name of theIngressController
object.
Optional: Confirm that the Ingress Controller was created by running the following command:
oc --all-namespaces=true get ingresscontrollers
$ oc --all-namespaces=true get ingresscontrollers
Copy to Clipboard Copied!
9.9.8. Configuring global access for an Ingress Controller on GCP
An Ingress Controller created on GCP with an internal load balancer generates an internal IP address for the service. A cluster administrator can specify the global access option, which enables clients in any region within the same VPC network and compute region as the load balancer, to reach the workloads running on your cluster.
For more information, see the GCP documentation for global access.
Prerequisites
- You deployed an OpenShift Container Platform cluster on GCP infrastructure.
- You configured an Ingress Controller to use an internal load balancer.
-
You installed the OpenShift CLI (
oc
).
Procedure
Configure the Ingress Controller resource to allow global access.
NoteYou can also create an Ingress Controller and specify the global access option.
Configure the Ingress Controller resource:
oc -n openshift-ingress-operator edit ingresscontroller/default
$ oc -n openshift-ingress-operator edit ingresscontroller/default
Copy to Clipboard Copied! Edit the YAML file:
Sample
clientAccess
configuration toGlobal
spec: endpointPublishingStrategy: loadBalancer: providerParameters: gcp: clientAccess: Global type: GCP scope: Internal type: LoadBalancerService
spec: endpointPublishingStrategy: loadBalancer: providerParameters: gcp: clientAccess: Global
1 type: GCP scope: Internal type: LoadBalancerService
Copy to Clipboard Copied! - 1
- Set
gcp.clientAccess
toGlobal
.
- Save the file to apply the changes.
Run the following command to verify that the service allows global access:
oc -n openshift-ingress edit svc/router-default -o yaml
$ oc -n openshift-ingress edit svc/router-default -o yaml
Copy to Clipboard Copied! The output shows that global access is enabled for GCP with the annotation,
networking.gke.io/internal-load-balancer-allow-global-access
.
9.9.9. Setting the Ingress Controller health check interval
A cluster administrator can set the health check interval to define how long the router waits between two consecutive health checks. This value is applied globally as a default for all routes. The default value is 5 seconds.
Prerequisites
- The following assumes that you already created an Ingress Controller.
Procedure
Update the Ingress Controller to change the interval between back end health checks:
oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"healthCheckInterval": "8s"}}}'
$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"healthCheckInterval": "8s"}}}'
Copy to Clipboard Copied! NoteTo override the
healthCheckInterval
for a single route, use the route annotationrouter.openshift.io/haproxy.health.check.interval
9.9.10. Configuring the default Ingress Controller for your cluster to be internal
You can configure the default
Ingress Controller for your cluster to be internal by deleting and recreating it.
If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.
If you want to change the scope
for an IngressController
, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope
parameter after the custom resource (CR) is created.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Configure the
default
Ingress Controller for your cluster to be internal by deleting and recreating it.oc replace --force --wait --filename - <<EOF apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: default spec: endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: Internal EOF
$ oc replace --force --wait --filename - <<EOF apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: default spec: endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: Internal EOF
Copy to Clipboard Copied!
9.9.11. Configuring the route admission policy
Administrators and application developers can run applications in multiple namespaces with the same domain name. This is for organizations where multiple teams develop microservices that are exposed on the same hostname.
Allowing claims across namespaces should only be enabled for clusters with trust between namespaces, otherwise a malicious user could take over a hostname. For this reason, the default admission policy disallows hostname claims across namespaces.
Prerequisites
- Cluster administrator privileges.
Procedure
Edit the
.spec.routeAdmission
field of theingresscontroller
resource variable using the following command:oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
$ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
Copy to Clipboard Copied! Sample Ingress Controller configuration
spec: routeAdmission: namespaceOwnership: InterNamespaceAllowed ...
spec: routeAdmission: namespaceOwnership: InterNamespaceAllowed ...
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to configure the route admission policy:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: routeAdmission: namespaceOwnership: InterNamespaceAllowed
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: routeAdmission: namespaceOwnership: InterNamespaceAllowed
Copy to Clipboard Copied!
9.9.12. Using wildcard routes
The HAProxy Ingress Controller has support for wildcard routes. The Ingress Operator uses wildcardPolicy
to configure the ROUTER_ALLOW_WILDCARD_ROUTES
environment variable of the Ingress Controller.
The default behavior of the Ingress Controller is to admit routes with a wildcard policy of None
, which is backwards compatible with existing IngressController
resources.
Procedure
Configure the wildcard policy.
Use the following command to edit the
IngressController
resource:oc edit IngressController
$ oc edit IngressController
Copy to Clipboard Copied! Under
spec
, set thewildcardPolicy
field toWildcardsDisallowed
orWildcardsAllowed
:spec: routeAdmission: wildcardPolicy: WildcardsDisallowed # or WildcardsAllowed
spec: routeAdmission: wildcardPolicy: WildcardsDisallowed # or WildcardsAllowed
Copy to Clipboard Copied!
9.9.13. HTTP header configuration
OpenShift Container Platform provides different methods for working with HTTP headers. When setting or deleting headers, you can use specific fields in the Ingress Controller or an individual route to modify request and response headers. You can also set certain headers by using route annotations. The various ways of configuring headers can present challenges when working together.
You can only set or delete headers within an IngressController
or Route
CR, you cannot append them. If an HTTP header is set with a value, that value must be complete and not require appending in the future. In situations where it makes sense to append a header, such as the X-Forwarded-For header, use the spec.httpHeaders.forwardedHeaderPolicy
field, instead of spec.httpHeaders.actions
.
9.9.13.1. Order of precedence
When the same HTTP header is modified both in the Ingress Controller and in a route, HAProxy prioritizes the actions in certain ways depending on whether it is a request or response header.
- For HTTP response headers, actions specified in the Ingress Controller are executed after the actions specified in a route. This means that the actions specified in the Ingress Controller take precedence.
- For HTTP request headers, actions specified in a route are executed after the actions specified in the Ingress Controller. This means that the actions specified in the route take precedence.
For example, a cluster administrator sets the X-Frame-Options response header with the value DENY
in the Ingress Controller using the following configuration:
Example IngressController
spec
apiVersion: operator.openshift.io/v1 kind: IngressController # ... spec: httpHeaders: actions: response: - name: X-Frame-Options action: type: Set set: value: DENY
apiVersion: operator.openshift.io/v1
kind: IngressController
# ...
spec:
httpHeaders:
actions:
response:
- name: X-Frame-Options
action:
type: Set
set:
value: DENY
A route owner sets the same response header that the cluster administrator set in the Ingress Controller, but with the value SAMEORIGIN
using the following configuration:
Example Route
spec
apiVersion: route.openshift.io/v1 kind: Route # ... spec: httpHeaders: actions: response: - name: X-Frame-Options action: type: Set set: value: SAMEORIGIN
apiVersion: route.openshift.io/v1
kind: Route
# ...
spec:
httpHeaders:
actions:
response:
- name: X-Frame-Options
action:
type: Set
set:
value: SAMEORIGIN
When both the IngressController
spec and Route
spec are configuring the X-Frame-Options header, then the value set for this header at the global level in the Ingress Controller will take precedence, even if a specific route allows frames.
This prioritzation occurs because the haproxy.config
file uses the following logic, where the Ingress Controller is considered the front end and individual routes are considered the back end. The header value DENY
applied to the front end configurations overrides the same header with the value SAMEORIGIN
that is set in the back end:
frontend public http-response set-header X-Frame-Options 'DENY' frontend fe_sni http-response set-header X-Frame-Options 'DENY' frontend fe_no_sni http-response set-header X-Frame-Options 'DENY' backend be_secure:openshift-monitoring:alertmanager-main http-response set-header X-Frame-Options 'SAMEORIGIN'
frontend public
http-response set-header X-Frame-Options 'DENY'
frontend fe_sni
http-response set-header X-Frame-Options 'DENY'
frontend fe_no_sni
http-response set-header X-Frame-Options 'DENY'
backend be_secure:openshift-monitoring:alertmanager-main
http-response set-header X-Frame-Options 'SAMEORIGIN'
Additionally, any actions defined in either the Ingress Controller or a route override values set using route annotations.
9.9.13.2. Special case headers
The following headers are either prevented entirely from being set or deleted, or allowed under specific circumstances:
Header name | Configurable using IngressController spec | Configurable using Route spec | Reason for disallowment | Configurable using another method |
---|---|---|---|---|
| No | No |
The | No |
| No | Yes |
When the | No |
| No | No |
The |
Yes: the |
| No | No | The cookies that HAProxy sets are used for session tracking to map client connections to particular back-end servers. Allowing these headers to be set could interfere with HAProxy’s session affinity and restrict HAProxy’s ownership of a cookie. | Yes:
|
9.9.14. Setting or deleting HTTP request and response headers in an Ingress Controller
You can set or delete certain HTTP request and response headers for compliance purposes or other reasons. You can set or delete these headers either for all routes served by an Ingress Controller or for specific routes.
For example, you might want to migrate an application running on your cluster to use mutual TLS, which requires that your application checks for an X-Forwarded-Client-Cert request header, but the OpenShift Container Platform default Ingress Controller provides an X-SSL-Client-Der request header.
The following procedure modifies the Ingress Controller to set the X-Forwarded-Client-Cert request header, and delete the X-SSL-Client-Der request header.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have access to an OpenShift Container Platform cluster as a user with the
cluster-admin
role.
Procedure
Edit the Ingress Controller resource:
oc -n openshift-ingress-operator edit ingresscontroller/default
$ oc -n openshift-ingress-operator edit ingresscontroller/default
Copy to Clipboard Copied! Replace the X-SSL-Client-Der HTTP request header with the X-Forwarded-Client-Cert HTTP request header:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: actions: request: - name: X-Forwarded-Client-Cert action: type: Set set: value: "%{+Q}[ssl_c_der,base64]" - name: X-SSL-Client-Der action: type: Delete
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: actions:
1 request:
2 - name: X-Forwarded-Client-Cert
3 action: type: Set
4 set: value: "%{+Q}[ssl_c_der,base64]"
5 - name: X-SSL-Client-Der action: type: Delete
Copy to Clipboard Copied! - 1
- The list of actions you want to perform on the HTTP headers.
- 2
- The type of header you want to change. In this case, a request header.
- 3
- The name of the header you want to change. For a list of available headers you can set or delete, see HTTP header configuration.
- 4
- The type of action being taken on the header. This field can have the value
Set
orDelete
. - 5
- When setting HTTP headers, you must provide a
value
. The value can be a string from a list of available directives for that header, for exampleDENY
, or it can be a dynamic value that will be interpreted using HAProxy’s dynamic value syntax. In this case, a dynamic value is added.
NoteFor setting dynamic header values for HTTP responses, allowed sample fetchers are
res.hdr
andssl_c_der
. For setting dynamic header values for HTTP requests, allowed sample fetchers arereq.hdr
andssl_c_der
. Both request and response dynamic values can use thelower
andbase64
converters.- Save the file to apply the changes.
9.9.15. Using X-Forwarded headers
You configure the HAProxy Ingress Controller to specify a policy for how to handle HTTP headers including Forwarded
and X-Forwarded-For
. The Ingress Operator uses the HTTPHeaders
field to configure the ROUTER_SET_FORWARDED_HEADERS
environment variable of the Ingress Controller.
Procedure
Configure the
HTTPHeaders
field for the Ingress Controller.Use the following command to edit the
IngressController
resource:oc edit IngressController
$ oc edit IngressController
Copy to Clipboard Copied! Under
spec
, set theHTTPHeaders
policy field toAppend
,Replace
,IfNone
, orNever
:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: forwardedHeaderPolicy: Append
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: forwardedHeaderPolicy: Append
Copy to Clipboard Copied!
Example use cases
As a cluster administrator, you can:
Configure an external proxy that injects the
X-Forwarded-For
header into each request before forwarding it to an Ingress Controller.To configure the Ingress Controller to pass the header through unmodified, you specify the
never
policy. The Ingress Controller then never sets the headers, and applications receive only the headers that the external proxy provides.Configure the Ingress Controller to pass the
X-Forwarded-For
header that your external proxy sets on external cluster requests through unmodified.To configure the Ingress Controller to set the
X-Forwarded-For
header on internal cluster requests, which do not go through the external proxy, specify theif-none
policy. If an HTTP request already has the header set through the external proxy, then the Ingress Controller preserves it. If the header is absent because the request did not come through the proxy, then the Ingress Controller adds the header.
As an application developer, you can:
Configure an application-specific external proxy that injects the
X-Forwarded-For
header.To configure an Ingress Controller to pass the header through unmodified for an application’s Route, without affecting the policy for other Routes, add an annotation
haproxy.router.openshift.io/set-forwarded-headers: if-none
orhaproxy.router.openshift.io/set-forwarded-headers: never
on the Route for the application.NoteYou can set the
haproxy.router.openshift.io/set-forwarded-headers
annotation on a per route basis, independent from the globally set value for the Ingress Controller.
9.9.16. Enabling and disabling HTTP/2 on Ingress Controllers
You can enable or disable transparent end-to-end HTTP/2 connectivity in HAProxy. This allows application owners to make use of HTTP/2 protocol capabilities, including single connection, header compression, binary streams, and more.
You can enable or disable HTTP/2 connectivity for an individual Ingress Controller or for the entire cluster.
To enable the use of HTTP/2 for the connection from the client to HAProxy, a route must specify a custom certificate. A route that uses the default certificate cannot use HTTP/2. This restriction is necessary to avoid problems from connection coalescing, where the client re-uses a connection for different routes that use the same certificate.
The connection from HAProxy to the application pod can use HTTP/2 only for re-encrypt routes and not for edge-terminated or insecure routes. This restriction is because HAProxy uses Application-Level Protocol Negotiation (ALPN), which is a TLS extension, to negotiate the use of HTTP/2 with the back-end. The implication is that end-to-end HTTP/2 is possible with passthrough and re-encrypt and not with edge-terminated routes.
You can use HTTP/2 with an insecure route whether the Ingress Controller has HTTP/2 enabled or disabled.
For non-passthrough routes, the Ingress Controller negotiates its connection to the application independently of the connection from the client. This means a client may connect to the Ingress Controller and negotiate HTTP/1.1, and the Ingress Controller might then connect to the application, negotiate HTTP/2, and forward the request from the client HTTP/1.1 connection using the HTTP/2 connection to the application. This poses a problem if the client subsequently tries to upgrade its connection from HTTP/1.1 to the WebSocket protocol, because the Ingress Controller cannot forward WebSocket to HTTP/2 and cannot upgrade its HTTP/2 connection to WebSocket. Consequently, if you have an application that is intended to accept WebSocket connections, it must not allow negotiating the HTTP/2 protocol or else clients will fail to upgrade to the WebSocket protocol.
9.9.16.1. Enabling HTTP/2
You can enable HTTP/2 on a specific Ingress Controller, or you can enable HTTP/2 for the entire cluster.
Procedure
To enable HTTP/2 on a specific Ingress Controller, enter the
oc annotate
command:oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=true
$ oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=true
1 Copy to Clipboard Copied! - 1
- Replace
<ingresscontroller_name>
with the name of an Ingress Controller to enable HTTP/2.
To enable HTTP/2 for the entire cluster, enter the
oc annotate
command:oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
Copy to Clipboard Copied!
Alternatively, you can apply the following YAML code to enable HTTP/2:
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster annotations: ingress.operator.openshift.io/default-enable-http2: "true"
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
name: cluster
annotations:
ingress.operator.openshift.io/default-enable-http2: "true"
9.9.16.2. Disabling HTTP/2
You can disable HTTP/2 on a specific Ingress Controller, or you can disable HTTP/2 for the entire cluster.
Procedure
To disable HTTP/2 on a specific Ingress Controller, enter the
oc annotate
command:oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=false
$ oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=false
1 Copy to Clipboard Copied! - 1
- Replace
<ingresscontroller_name>
with the name of an Ingress Controller to disable HTTP/2.
To disable HTTP/2 for the entire cluster, enter the
oc annotate
command:oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=false
$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=false
Copy to Clipboard Copied!
Alternatively, you can apply the following YAML code to disable HTTP/2:
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster annotations: ingress.operator.openshift.io/default-enable-http2: "false"
apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
name: cluster
annotations:
ingress.operator.openshift.io/default-enable-http2: "false"
9.9.17. Configuring the PROXY protocol for an Ingress Controller
A cluster administrator can configure the PROXY protocol when an Ingress Controller uses either the HostNetwork
, NodePortService
, or Private
endpoint publishing strategy types. The PROXY protocol enables the load balancer to preserve the original client addresses for connections that the Ingress Controller receives. The original client addresses are useful for logging, filtering, and injecting HTTP headers. In the default configuration, the connections that the Ingress Controller receives only contain the source address that is associated with the load balancer.
The default Ingress Controller with installer-provisioned clusters on non-cloud platforms that use a Keepalived Ingress Virtual IP (VIP) do not support the PROXY protocol.
The PROXY protocol enables the load balancer to preserve the original client addresses for connections that the Ingress Controller receives. The original client addresses are useful for logging, filtering, and injecting HTTP headers. In the default configuration, the connections that the Ingress Controller receives contain only the source IP address that is associated with the load balancer.
For a passthrough route configuration, servers in OpenShift Container Platform clusters cannot observe the original client source IP address. If you need to know the original client source IP address, configure Ingress access logging for your Ingress Controller so that you can view the client source IP addresses.
For re-encrypt and edge routes, the OpenShift Container Platform router sets the Forwarded
and X-Forwarded-For
headers so that application workloads check the client source IP address.
For more information about Ingress access logging, see "Configuring Ingress access logging".
Configuring the PROXY protocol for an Ingress Controller is not supported when using the LoadBalancerService
endpoint publishing strategy type. This restriction is because when OpenShift Container Platform runs in a cloud platform, and an Ingress Controller specifies that a service load balancer should be used, the Ingress Operator configures the load balancer service and enables the PROXY protocol based on the platform requirement for preserving source addresses.
You must configure both OpenShift Container Platform and the external load balancer to use either the PROXY protocol or TCP.
This feature is not supported in cloud deployments. This restriction is because when OpenShift Container Platform runs in a cloud platform, and an Ingress Controller specifies that a service load balancer should be used, the Ingress Operator configures the load balancer service and enables the PROXY protocol based on the platform requirement for preserving source addresses.
You must configure both OpenShift Container Platform and the external load balancer to either use the PROXY protocol or to use Transmission Control Protocol (TCP).
Prerequisites
- You created an Ingress Controller.
Procedure
Edit the Ingress Controller resource by entering the following command in your CLI:
oc -n openshift-ingress-operator edit ingresscontroller/default
$ oc -n openshift-ingress-operator edit ingresscontroller/default
Copy to Clipboard Copied! Set the PROXY configuration:
If your Ingress Controller uses the
HostNetwork
endpoint publishing strategy type, set thespec.endpointPublishingStrategy.hostNetwork.protocol
subfield toPROXY
:Sample
hostNetwork
configuration toPROXY
# ... spec: endpointPublishingStrategy: hostNetwork: protocol: PROXY type: HostNetwork # ...
# ... spec: endpointPublishingStrategy: hostNetwork: protocol: PROXY type: HostNetwork # ...
Copy to Clipboard Copied! If your Ingress Controller uses the
NodePortService
endpoint publishing strategy type, set thespec.endpointPublishingStrategy.nodePort.protocol
subfield toPROXY
:Sample
nodePort
configuration toPROXY
# ... spec: endpointPublishingStrategy: nodePort: protocol: PROXY type: NodePortService # ...
# ... spec: endpointPublishingStrategy: nodePort: protocol: PROXY type: NodePortService # ...
Copy to Clipboard Copied! If your Ingress Controller uses the
Private
endpoint publishing strategy type, set thespec.endpointPublishingStrategy.private.protocol
subfield toPROXY
:Sample
private
configuration toPROXY
# ... spec: endpointPublishingStrategy: private: protocol: PROXY type: Private # ...
# ... spec: endpointPublishingStrategy: private: protocol: PROXY type: Private # ...
Copy to Clipboard Copied!
9.9.18. Specifying an alternative cluster domain using the appsDomain option
As a cluster administrator, you can specify an alternative to the default cluster domain for user-created routes by configuring the appsDomain
field. The appsDomain
field is an optional domain for OpenShift Container Platform to use instead of the default, which is specified in the domain
field. If you specify an alternative domain, it overrides the default cluster domain for the purpose of determining the default host for a new route.
For example, you can use the DNS domain for your company as the default domain for routes and ingresses for applications running on your cluster.
Prerequisites
- You deployed an OpenShift Container Platform cluster.
-
You installed the
oc
command-line interface.
Procedure
Configure the
appsDomain
field by specifying an alternative default domain for user-created routes.Edit the ingress
cluster
resource:oc edit ingresses.config/cluster -o yaml
$ oc edit ingresses.config/cluster -o yaml
Copy to Clipboard Copied! Edit the YAML file:
Sample
appsDomain
configuration totest.example.com
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster spec: domain: apps.example.com appsDomain: <test.example.com>
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster spec: domain: apps.example.com
1 appsDomain: <test.example.com>
2 Copy to Clipboard Copied!
Verify that an existing route contains the domain name specified in the
appsDomain
field by exposing the route and verifying the route domain change:NoteWait for the
openshift-apiserver
finish rolling updates before exposing the route.Expose the route:
oc expose service hello-openshift
$ oc expose service hello-openshift route.route.openshift.io/hello-openshift exposed
Copy to Clipboard Copied! Example output:
oc get routes
$ oc get routes NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD hello-openshift hello_openshift-<my_project>.test.example.com hello-openshift 8080-tcp None
Copy to Clipboard Copied!
9.9.19. Converting HTTP header case
HAProxy lowercases HTTP header names by default; for example, changing Host: xyz.com
to host: xyz.com
. If legacy applications are sensitive to the capitalization of HTTP header names, use the Ingress Controller spec.httpHeaders.headerNameCaseAdjustments
API field for a solution to accommodate legacy applications until they can be fixed.
OpenShift Container Platform includes HAProxy 2.6. If you want to update to this version of the web-based load balancer, ensure that you add the spec.httpHeaders.headerNameCaseAdjustments
section to your cluster’s configuration file.
As a cluster administrator, you can convert the HTTP header case by entering the oc patch
command or by setting the HeaderNameCaseAdjustments
field in the Ingress Controller YAML file.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Capitalize an HTTP header by using the
oc patch
command.Change the HTTP header from
host
toHost
by running the following command:oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"httpHeaders":{"headerNameCaseAdjustments":["Host"]}}}'
$ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"httpHeaders":{"headerNameCaseAdjustments":["Host"]}}}'
Copy to Clipboard Copied! Create a
Route
resource YAML file so that the annotation can be applied to the application.Example of a route named
my-application
apiVersion: route.openshift.io/v1 kind: Route metadata: annotations: haproxy.router.openshift.io/h1-adjust-case: true name: <application_name> namespace: <application_name> # ...
apiVersion: route.openshift.io/v1 kind: Route metadata: annotations: haproxy.router.openshift.io/h1-adjust-case: true
1 name: <application_name> namespace: <application_name> # ...
Copy to Clipboard Copied! - 1
- Set
haproxy.router.openshift.io/h1-adjust-case
so that the Ingress Controller can adjust thehost
request header as specified.
Specify adjustments by configuring the
HeaderNameCaseAdjustments
field in the Ingress Controller YAML configuration file.The following example Ingress Controller YAML file adjusts the
host
header toHost
for HTTP/1 requests to appropriately annotated routes:Example Ingress Controller YAML
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: headerNameCaseAdjustments: - Host
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: headerNameCaseAdjustments: - Host
Copy to Clipboard Copied! The following example route enables HTTP response header name case adjustments by using the
haproxy.router.openshift.io/h1-adjust-case
annotation:Example route YAML
apiVersion: route.openshift.io/v1 kind: Route metadata: annotations: haproxy.router.openshift.io/h1-adjust-case: true name: my-application namespace: my-application spec: to: kind: Service name: my-application
apiVersion: route.openshift.io/v1 kind: Route metadata: annotations: haproxy.router.openshift.io/h1-adjust-case: true
1 name: my-application namespace: my-application spec: to: kind: Service name: my-application
Copy to Clipboard Copied! - 1
- Set
haproxy.router.openshift.io/h1-adjust-case
to true.
9.9.20. Using router compression
You configure the HAProxy Ingress Controller to specify router compression globally for specific MIME types. You can use the mimeTypes
variable to define the formats of MIME types to which compression is applied. The types are: application, image, message, multipart, text, video, or a custom type prefaced by "X-". To see the full notation for MIME types and subtypes, see RFC1341.
Memory allocated for compression can affect the max connections. Additionally, compression of large buffers can cause latency, like heavy regex or long lists of regex.
Not all MIME types benefit from compression, but HAProxy still uses resources to try to compress if instructed to. Generally, text formats, such as html, css, and js, formats benefit from compression, but formats that are already compressed, such as image, audio, and video, benefit little in exchange for the time and resources spent on compression.
Procedure
Configure the
httpCompression
field for the Ingress Controller.Use the following command to edit the
IngressController
resource:oc edit -n openshift-ingress-operator ingresscontrollers/default
$ oc edit -n openshift-ingress-operator ingresscontrollers/default
Copy to Clipboard Copied! Under
spec
, set thehttpCompression
policy field tomimeTypes
and specify a list of MIME types that should have compression applied:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpCompression: mimeTypes: - "text/html" - "text/css; charset=utf-8" - "application/json" ...
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpCompression: mimeTypes: - "text/html" - "text/css; charset=utf-8" - "application/json" ...
Copy to Clipboard Copied!
9.9.21. Exposing router metrics
You can expose the HAProxy router metrics by default in Prometheus format on the default stats port, 1936. The external metrics collection and aggregation systems such as Prometheus can access the HAProxy router metrics. You can view the HAProxy router metrics in a browser in the HTML and comma separated values (CSV) format.
Prerequisites
- You configured your firewall to access the default stats port, 1936.
Procedure
Get the router pod name by running the following command:
oc get pods -n openshift-ingress
$ oc get pods -n openshift-ingress
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE router-default-76bfffb66c-46qwp 1/1 Running 0 11h
NAME READY STATUS RESTARTS AGE router-default-76bfffb66c-46qwp 1/1 Running 0 11h
Copy to Clipboard Copied! Get the router’s username and password, which the router pod stores in the
/var/lib/haproxy/conf/metrics-auth/statsUsername
and/var/lib/haproxy/conf/metrics-auth/statsPassword
files:Get the username by running the following command:
oc rsh <router_pod_name> cat metrics-auth/statsUsername
$ oc rsh <router_pod_name> cat metrics-auth/statsUsername
Copy to Clipboard Copied! Get the password by running the following command:
oc rsh <router_pod_name> cat metrics-auth/statsPassword
$ oc rsh <router_pod_name> cat metrics-auth/statsPassword
Copy to Clipboard Copied!
Get the router IP and metrics certificates by running the following command:
oc describe pod <router_pod>
$ oc describe pod <router_pod>
Copy to Clipboard Copied! Get the raw statistics in Prometheus format by running the following command:
curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
$ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
Copy to Clipboard Copied! Access the metrics securely by running the following command:
curl -u user:password https://<router_IP>:<stats_port>/metrics -k
$ curl -u user:password https://<router_IP>:<stats_port>/metrics -k
Copy to Clipboard Copied! Access the default stats port, 1936, by running the following command:
curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
$ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
Copy to Clipboard Copied! Example 9.1. Example output
... # HELP haproxy_backend_connections_total Total number of connections. # TYPE haproxy_backend_connections_total gauge haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route"} 0 haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route-alt"} 0 haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route01"} 0 ... # HELP haproxy_exporter_server_threshold Number of servers tracked and the current threshold value. # TYPE haproxy_exporter_server_threshold gauge haproxy_exporter_server_threshold{type="current"} 11 haproxy_exporter_server_threshold{type="limit"} 500 ... # HELP haproxy_frontend_bytes_in_total Current total of incoming bytes. # TYPE haproxy_frontend_bytes_in_total gauge haproxy_frontend_bytes_in_total{frontend="fe_no_sni"} 0 haproxy_frontend_bytes_in_total{frontend="fe_sni"} 0 haproxy_frontend_bytes_in_total{frontend="public"} 119070 ... # HELP haproxy_server_bytes_in_total Current total of incoming bytes. # TYPE haproxy_server_bytes_in_total gauge haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_no_sni",service=""} 0 haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_sni",service=""} 0 haproxy_server_bytes_in_total{namespace="default",pod="docker-registry-5-nk5fz",route="docker-registry",server="10.130.0.89:5000",service="docker-registry"} 0 haproxy_server_bytes_in_total{namespace="default",pod="hello-rc-vkjqx",route="hello-route",server="10.130.0.90:8080",service="hello-svc-1"} 0 ...
... # HELP haproxy_backend_connections_total Total number of connections. # TYPE haproxy_backend_connections_total gauge haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route"} 0 haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route-alt"} 0 haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route01"} 0 ... # HELP haproxy_exporter_server_threshold Number of servers tracked and the current threshold value. # TYPE haproxy_exporter_server_threshold gauge haproxy_exporter_server_threshold{type="current"} 11 haproxy_exporter_server_threshold{type="limit"} 500 ... # HELP haproxy_frontend_bytes_in_total Current total of incoming bytes. # TYPE haproxy_frontend_bytes_in_total gauge haproxy_frontend_bytes_in_total{frontend="fe_no_sni"} 0 haproxy_frontend_bytes_in_total{frontend="fe_sni"} 0 haproxy_frontend_bytes_in_total{frontend="public"} 119070 ... # HELP haproxy_server_bytes_in_total Current total of incoming bytes. # TYPE haproxy_server_bytes_in_total gauge haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_no_sni",service=""} 0 haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_sni",service=""} 0 haproxy_server_bytes_in_total{namespace="default",pod="docker-registry-5-nk5fz",route="docker-registry",server="10.130.0.89:5000",service="docker-registry"} 0 haproxy_server_bytes_in_total{namespace="default",pod="hello-rc-vkjqx",route="hello-route",server="10.130.0.90:8080",service="hello-svc-1"} 0 ...
Copy to Clipboard Copied! Launch the stats window by entering the following URL in a browser:
http://<user>:<password>@<router_IP>:<stats_port>
http://<user>:<password>@<router_IP>:<stats_port>
Copy to Clipboard Copied! Optional: Get the stats in CSV format by entering the following URL in a browser:
http://<user>:<password>@<router_ip>:1936/metrics;csv
http://<user>:<password>@<router_ip>:1936/metrics;csv
Copy to Clipboard Copied!
9.9.22. Customizing HAProxy error code response pages
As a cluster administrator, you can specify a custom error code response page for either 503, 404, or both error pages. The HAProxy router serves a 503 error page when the application pod is not running or a 404 error page when the requested URL does not exist. For example, if you customize the 503 error code response page, then the page is served when the application pod is not running, and the default 404 error code HTTP response page is served by the HAProxy router for an incorrect route or a non-existing route.
Custom error code response pages are specified in a config map then patched to the Ingress Controller. The config map keys have two available file names as follows: error-page-503.http
and error-page-404.http
.
Custom HTTP error code response pages must follow the HAProxy HTTP error page configuration guidelines. Here is an example of the default OpenShift Container Platform HAProxy router http 503 error code response page. You can use the default content as a template for creating your own custom page.
By default, the HAProxy router serves only a 503 error page when the application is not running or when the route is incorrect or non-existent. This default behavior is the same as the behavior on OpenShift Container Platform 4.8 and earlier. If a config map for the customization of an HTTP error code response is not provided, and you are using a custom HTTP error code response page, the router serves a default 404 or 503 error code response page.
If you use the OpenShift Container Platform default 503 error code page as a template for your customizations, the headers in the file require an editor that can use CRLF line endings.
Procedure
Create a config map named
my-custom-error-code-pages
in theopenshift-config
namespace:oc -n openshift-config create configmap my-custom-error-code-pages \ --from-file=error-page-503.http \ --from-file=error-page-404.http
$ oc -n openshift-config create configmap my-custom-error-code-pages \ --from-file=error-page-503.http \ --from-file=error-page-404.http
Copy to Clipboard Copied! ImportantIf you do not specify the correct format for the custom error code response page, a router pod outage occurs. To resolve this outage, you must delete or correct the config map and delete the affected router pods so they can be recreated with the correct information.
Patch the Ingress Controller to reference the
my-custom-error-code-pages
config map by name:oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
$ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
Copy to Clipboard Copied! The Ingress Operator copies the
my-custom-error-code-pages
config map from theopenshift-config
namespace to theopenshift-ingress
namespace. The Operator names the config map according to the pattern,<your_ingresscontroller_name>-errorpages
, in theopenshift-ingress
namespace.Display the copy:
oc get cm default-errorpages -n openshift-ingress
$ oc get cm default-errorpages -n openshift-ingress
Copy to Clipboard Copied! Example output
NAME DATA AGE default-errorpages 2 25s
NAME DATA AGE default-errorpages 2 25s
1 Copy to Clipboard Copied! - 1
- The example config map name is
default-errorpages
because thedefault
Ingress Controller custom resource (CR) was patched.
Confirm that the config map containing the custom error response page mounts on the router volume where the config map key is the filename that has the custom HTTP error code response:
For 503 custom HTTP custom error code response:
oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-503.http
$ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-503.http
Copy to Clipboard Copied! For 404 custom HTTP custom error code response:
oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-404.http
$ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-404.http
Copy to Clipboard Copied!
Verification
Verify your custom error code HTTP response:
Create a test project and application:
oc new-project test-ingress
$ oc new-project test-ingress
Copy to Clipboard Copied! oc new-app django-psql-example
$ oc new-app django-psql-example
Copy to Clipboard Copied! For 503 custom http error code response:
- Stop all the pods for the application.
Run the following curl command or visit the route hostname in the browser:
curl -vk <route_hostname>
$ curl -vk <route_hostname>
Copy to Clipboard Copied!
For 404 custom http error code response:
- Visit a non-existent route or an incorrect route.
Run the following curl command or visit the route hostname in the browser:
curl -vk <route_hostname>
$ curl -vk <route_hostname>
Copy to Clipboard Copied!
Check if the
errorfile
attribute is properly in thehaproxy.config
file:oc -n openshift-ingress rsh <router> cat /var/lib/haproxy/conf/haproxy.config | grep errorfile
$ oc -n openshift-ingress rsh <router> cat /var/lib/haproxy/conf/haproxy.config | grep errorfile
Copy to Clipboard Copied!
9.9.23. Setting the Ingress Controller maximum connections
A cluster administrator can set the maximum number of simultaneous connections for OpenShift router deployments. You can patch an existing Ingress Controller to increase the maximum number of connections.
Prerequisites
- The following assumes that you already created an Ingress Controller
Procedure
Update the Ingress Controller to change the maximum number of connections for HAProxy:
oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"maxConnections": 7500}}}'
$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"maxConnections": 7500}}}'
Copy to Clipboard Copied! WarningIf you set the
spec.tuningOptions.maxConnections
value greater than the current operating system limit, the HAProxy process will not start. See the table in the "Ingress Controller configuration parameters" section for more information about this parameter.
Chapter 10. Ingress Node Firewall Operator in OpenShift Container Platform
The Ingress Node Firewall Operator provides a stateless, eBPF-based firewall for managing node-level ingress traffic in OpenShift Container Platform.
10.1. Ingress Node Firewall Operator
The Ingress Node Firewall Operator provides ingress firewall rules at a node level by deploying the daemon set to nodes you specify and manage in the firewall configurations. To deploy the daemon set, you create an IngressNodeFirewallConfig
custom resource (CR). The Operator applies the IngressNodeFirewallConfig
CR to create ingress node firewall daemon set daemon
, which run on all nodes that match the nodeSelector
.
You configure rules
of the IngressNodeFirewall
CR and apply them to clusters using the nodeSelector
and setting values to "true".
The Ingress Node Firewall Operator supports only stateless firewall rules.
Network interface controllers (NICs) that do not support native XDP drivers will run at a lower performance.
For OpenShift Container Platform 4.14 or later, you must run Ingress Node Firewall Operator on RHEL 9.0 or later.
Ingress Node Firewall Operator is not supported on Amazon Web Services (AWS) with the default OpenShift installation or on Red Hat OpenShift Service on AWS (ROSA). For more information on Red Hat OpenShift Service on AWS support and ingress, see Ingress Operator in Red Hat OpenShift Service on AWS.
10.2. Installing the Ingress Node Firewall Operator
As a cluster administrator, you can install the Ingress Node Firewall Operator by using the OpenShift Container Platform CLI or the web console.
10.2.1. Installing the Ingress Node Firewall Operator using the CLI
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have an account with administrator privileges.
Procedure
To create the
openshift-ingress-node-firewall
namespace, enter the following command:cat << EOF| oc create -f - apiVersion: v1 kind: Namespace metadata: labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/enforce-version: v1.24 name: openshift-ingress-node-firewall EOF
$ cat << EOF| oc create -f - apiVersion: v1 kind: Namespace metadata: labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/enforce-version: v1.24 name: openshift-ingress-node-firewall EOF
Copy to Clipboard Copied! To create an
OperatorGroup
CR, enter the following command:cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ingress-node-firewall-operators namespace: openshift-ingress-node-firewall EOF
$ cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ingress-node-firewall-operators namespace: openshift-ingress-node-firewall EOF
Copy to Clipboard Copied! Subscribe to the Ingress Node Firewall Operator.
To create a
Subscription
CR for the Ingress Node Firewall Operator, enter the following command:cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ingress-node-firewall-sub namespace: openshift-ingress-node-firewall spec: name: ingress-node-firewall channel: stable source: redhat-operators sourceNamespace: openshift-marketplace EOF
$ cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ingress-node-firewall-sub namespace: openshift-ingress-node-firewall spec: name: ingress-node-firewall channel: stable source: redhat-operators sourceNamespace: openshift-marketplace EOF
Copy to Clipboard Copied!
To verify that the Operator is installed, enter the following command:
oc get ip -n openshift-ingress-node-firewall
$ oc get ip -n openshift-ingress-node-firewall
Copy to Clipboard Copied! Example output
NAME CSV APPROVAL APPROVED install-5cvnz ingress-node-firewall.4.15.0-202211122336 Automatic true
NAME CSV APPROVAL APPROVED install-5cvnz ingress-node-firewall.4.15.0-202211122336 Automatic true
Copy to Clipboard Copied! To verify the version of the Operator, enter the following command:
oc get csv -n openshift-ingress-node-firewall
$ oc get csv -n openshift-ingress-node-firewall
Copy to Clipboard Copied! Example output
NAME DISPLAY VERSION REPLACES PHASE ingress-node-firewall.4.15.0-202211122336 Ingress Node Firewall Operator 4.15.0-202211122336 ingress-node-firewall.4.15.0-202211102047 Succeeded
NAME DISPLAY VERSION REPLACES PHASE ingress-node-firewall.4.15.0-202211122336 Ingress Node Firewall Operator 4.15.0-202211122336 ingress-node-firewall.4.15.0-202211102047 Succeeded
Copy to Clipboard Copied!
10.2.2. Installing the Ingress Node Firewall Operator using the web console
As a cluster administrator, you can install the Operator using the web console.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have an account with administrator privileges.
Procedure
Install the Ingress Node Firewall Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Select Ingress Node Firewall Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
- Click Install.
Verify that the Ingress Node Firewall Operator is installed successfully:
- Navigate to the Operators → Installed Operators page.
Ensure that Ingress Node Firewall Operator is listed in the openshift-ingress-node-firewall project with a Status of InstallSucceeded.
NoteDuring installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not have a Status of InstallSucceeded, troubleshoot using the following steps:
- Inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.
-
Navigate to the Workloads → Pods page and check the logs for pods in the
openshift-ingress-node-firewall
project. Check the namespace of the YAML file. If the annotation is missing, you can add the annotation
workload.openshift.io/allowed=management
to the Operator namespace with the following command:oc annotate ns/openshift-ingress-node-firewall workload.openshift.io/allowed=management
$ oc annotate ns/openshift-ingress-node-firewall workload.openshift.io/allowed=management
Copy to Clipboard Copied! NoteFor single-node OpenShift clusters, the
openshift-ingress-node-firewall
namespace requires theworkload.openshift.io/allowed=management
annotation.
10.3. Deploying Ingress Node Firewall Operator
Prerequisite
- The Ingress Node Firewall Operator is installed.
Procedure
To deploy the Ingress Node Firewall Operator, create a IngressNodeFirewallConfig
custom resource that will deploy the Operator’s daemon set. You can deploy one or multiple IngressNodeFirewall
CRDs to nodes by applying firewall rules.
-
Create the
IngressNodeFirewallConfig
inside theopenshift-ingress-node-firewall
namespace namedingressnodefirewallconfig
. Run the following command to deploy Ingress Node Firewall Operator rules:
oc apply -f rule.yaml
$ oc apply -f rule.yaml
Copy to Clipboard Copied!
10.3.1. Ingress Node Firewall configuration object
The fields for the Ingress Node Firewall configuration object are described in the following table:
Field | Type | Description |
---|---|---|
|
|
The name of the CR object. The name of the firewall rules object must be |
|
|
Namespace for the Ingress Firewall Operator CR object. The |
|
| A node selection constraint used to target nodes through specified node labels. For example: spec: nodeSelector: node-role.kubernetes.io/worker: ""
Note
One label used in |
The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector
.
Ingress Node Firewall Operator example configuration
A complete Ingress Node Firewall Configuration is specified in the following example:
Example Ingress Node Firewall Configuration object
apiVersion: ingressnodefirewall.openshift.io/v1alpha1 kind: IngressNodeFirewallConfig metadata: name: ingressnodefirewallconfig namespace: openshift-ingress-node-firewall spec: nodeSelector: node-role.kubernetes.io/worker: ""
apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewallConfig
metadata:
name: ingressnodefirewallconfig
namespace: openshift-ingress-node-firewall
spec:
nodeSelector:
node-role.kubernetes.io/worker: ""
The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector
.
10.3.2. Ingress Node Firewall rules object
The fields for the Ingress Node Firewall rules object are described in the following table:
Field | Type | Description |
---|---|---|
|
| The name of the CR object. |
|
|
The fields for this object specify the interfaces to apply the firewall rules to. For example, |
|
|
You can use |
|
|
|
Ingress object configuration
The values for the ingress
object are defined in the following table:
Field | Type | Description |
---|---|---|
|
| Allows you to set the CIDR block. You can configure multiple CIDRs from different address families. Note
Different CIDRs allow you to use the same order rule. In the case that there are multiple |
|
|
Ingress firewall
Set Note Ingress firewall rules are verified using a verification webhook that blocks any invalid configuration. The verification webhook prevents you from blocking any critical cluster services such as the API server or SSH. |
Ingress Node Firewall rules object example
A complete Ingress Node Firewall configuration is specified in the following example:
Example Ingress Node Firewall configuration
apiVersion: ingressnodefirewall.openshift.io/v1alpha1 kind: IngressNodeFirewall metadata: name: ingressnodefirewall spec: interfaces: - eth0 nodeSelector: matchLabels: <ingress_firewall_label_name>: <label_value> ingress: - sourceCIDRs: - 172.16.0.0/12 rules: - order: 10 protocolConfig: protocol: ICMP icmp: icmpType: 8 #ICMP Echo request action: Deny - order: 20 protocolConfig: protocol: TCP tcp: ports: "8000-9000" action: Deny - sourceCIDRs: - fc00:f853:ccd:e793::0/64 rules: - order: 10 protocolConfig: protocol: ICMPv6 icmpv6: icmpType: 128 #ICMPV6 Echo request action: Deny
apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
name: ingressnodefirewall
spec:
interfaces:
- eth0
nodeSelector:
matchLabels:
<ingress_firewall_label_name>: <label_value>
ingress:
- sourceCIDRs:
- 172.16.0.0/12
rules:
- order: 10
protocolConfig:
protocol: ICMP
icmp:
icmpType: 8 #ICMP Echo request
action: Deny
- order: 20
protocolConfig:
protocol: TCP
tcp:
ports: "8000-9000"
action: Deny
- sourceCIDRs:
- fc00:f853:ccd:e793::0/64
rules:
- order: 10
protocolConfig:
protocol: ICMPv6
icmpv6:
icmpType: 128 #ICMPV6 Echo request
action: Deny
- 1
- A <label_name> and a <label_value> must exist on the node and must match the
nodeselector
label and value applied to the nodes you want theingressfirewallconfig
CR to run on. The <label_value> can betrue
orfalse
. By usingnodeSelector
labels, you can target separate groups of nodes to apply different rules to using theingressfirewallconfig
CR.
Zero trust Ingress Node Firewall rules object example
Zero trust Ingress Node Firewall rules can provide additional security to multi-interface clusters. For example, you can use zero trust Ingress Node Firewall rules to drop all traffic on a specific interface except for SSH.
A complete configuration of a zero trust Ingress Node Firewall rule set is specified in the following example:
Users need to add all ports their application will use to their allowlist in the following case to ensure proper functionality.
Example zero trust Ingress Node Firewall rules
apiVersion: ingressnodefirewall.openshift.io/v1alpha1 kind: IngressNodeFirewall metadata: name: ingressnodefirewall-zero-trust spec: interfaces: - eth1 nodeSelector: matchLabels: <ingress_firewall_label_name>: <label_value> ingress: - sourceCIDRs: - 0.0.0.0/0 rules: - order: 10 protocolConfig: protocol: TCP tcp: ports: 22 action: Allow - order: 20 action: Deny
apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
name: ingressnodefirewall-zero-trust
spec:
interfaces:
- eth1
nodeSelector:
matchLabels:
<ingress_firewall_label_name>: <label_value>
ingress:
- sourceCIDRs:
- 0.0.0.0/0
rules:
- order: 10
protocolConfig:
protocol: TCP
tcp:
ports: 22
action: Allow
- order: 20
action: Deny
10.4. Viewing Ingress Node Firewall Operator rules
Procedure
Run the following command to view all current rules :
oc get ingressnodefirewall
$ oc get ingressnodefirewall
Copy to Clipboard Copied! Choose one of the returned
<resource>
names and run the following command to view the rules or configs:oc get <resource> <name> -o yaml
$ oc get <resource> <name> -o yaml
Copy to Clipboard Copied!
10.5. Troubleshooting the Ingress Node Firewall Operator
Run the following command to list installed Ingress Node Firewall custom resource definitions (CRD):
oc get crds | grep ingressnodefirewall
$ oc get crds | grep ingressnodefirewall
Copy to Clipboard Copied! Example output
NAME READY UP-TO-DATE AVAILABLE AGE ingressnodefirewallconfigs.ingressnodefirewall.openshift.io 2022-08-25T10:03:01Z ingressnodefirewallnodestates.ingressnodefirewall.openshift.io 2022-08-25T10:03:00Z ingressnodefirewalls.ingressnodefirewall.openshift.io 2022-08-25T10:03:00Z
NAME READY UP-TO-DATE AVAILABLE AGE ingressnodefirewallconfigs.ingressnodefirewall.openshift.io 2022-08-25T10:03:01Z ingressnodefirewallnodestates.ingressnodefirewall.openshift.io 2022-08-25T10:03:00Z ingressnodefirewalls.ingressnodefirewall.openshift.io 2022-08-25T10:03:00Z
Copy to Clipboard Copied! Run the following command to view the state of the Ingress Node Firewall Operator:
oc get pods -n openshift-ingress-node-firewall
$ oc get pods -n openshift-ingress-node-firewall
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE ingress-node-firewall-controller-manager 2/2 Running 0 5d21h ingress-node-firewall-daemon-pqx56 3/3 Running 0 5d21h
NAME READY STATUS RESTARTS AGE ingress-node-firewall-controller-manager 2/2 Running 0 5d21h ingress-node-firewall-daemon-pqx56 3/3 Running 0 5d21h
Copy to Clipboard Copied! The following fields provide information about the status of the Operator:
READY
,STATUS
,AGE
, andRESTARTS
. TheSTATUS
field isRunning
when the Ingress Node Firewall Operator is deploying a daemon set to the assigned nodes.Run the following command to collect all ingress firewall node pods' logs:
oc adm must-gather – gather_ingress_node_firewall
$ oc adm must-gather – gather_ingress_node_firewall
Copy to Clipboard Copied! The logs are available in the sos node’s report containing eBPF
bpftool
outputs at/sos_commands/ebpf
. These reports include lookup tables used or updated as the ingress firewall XDP handles packet processing, updates statistics, and emits events.
Chapter 11. Configuring an Ingress Controller for manual DNS Management
As a cluster administrator, when you create an Ingress Controller, the Operator manages the DNS records automatically. This has some limitations when the required DNS zone is different from the cluster DNS zone or when the DNS zone is hosted outside the cloud provider.
As a cluster administrator, you can configure an Ingress Controller to stop automatic DNS management and start manual DNS management. Set dnsManagementPolicy
to specify when it should be automatically or manually managed.
When you change an Ingress Controller from Managed
to Unmanaged
DNS management policy, the Operator does not clean up the previous wildcard DNS record provisioned on the cloud. When you change an Ingress Controller from Unmanaged
to Managed
DNS management policy, the Operator attempts to create the DNS record on the cloud provider if it does not exist or updates the DNS record if it already exists.
When you set dnsManagementPolicy
to unmanaged
, you have to manually manage the lifecycle of the wildcard DNS record on the cloud provider.
11.1. Managed
DNS management policy
The Managed
DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is automatically managed by the Operator.
11.2. Unmanaged
DNS management policy
The Unmanaged
DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is not automatically managed, instead it becomes the responsibility of the cluster administrator.
On the AWS cloud platform, if the domain on the Ingress Controller does not match with dnsConfig.Spec.BaseDomain
then the DNS management policy is automatically set to Unmanaged
.
11.3. Creating a custom Ingress Controller with the Unmanaged
DNS management policy
As a cluster administrator, you can create a new custom Ingress Controller with the Unmanaged
DNS management policy.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a custom resource (CR) file named
sample-ingress.yaml
containing the following:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: <name> spec: domain: <domain> endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: External dnsManagementPolicy: Unmanaged
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: <name>
1 spec: domain: <domain>
2 endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: External
3 dnsManagementPolicy: Unmanaged
4 Copy to Clipboard Copied! - 1
- Specify the
<name>
with a name for theIngressController
object. - 2
- Specify the
domain
based on the DNS record that was created as a prerequisite. - 3
- Specify the
scope
asExternal
to expose the load balancer externally. - 4
dnsManagementPolicy
indicates if the Ingress Controller is managing the lifecycle of the wildcard DNS record associated with the load balancer. The valid values areManaged
andUnmanaged
. The default value isManaged
.
Save the file to apply the changes.
oc apply -f <name>.yaml
oc apply -f <name>.yaml
1 Copy to Clipboard Copied!
11.4. Modifying an existing Ingress Controller
As a cluster administrator, you can modify an existing Ingress Controller to manually manage the DNS record lifecycle.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Modify the chosen
IngressController
to setdnsManagementPolicy
:SCOPE=$(oc -n openshift-ingress-operator get ingresscontroller <name> -o=jsonpath="{.status.endpointPublishingStrategy.loadBalancer.scope}") oc -n openshift-ingress-operator patch ingresscontrollers/<name> --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"dnsManagementPolicy":"Unmanaged", "scope":"${SCOPE}"}}}}'
SCOPE=$(oc -n openshift-ingress-operator get ingresscontroller <name> -o=jsonpath="{.status.endpointPublishingStrategy.loadBalancer.scope}") oc -n openshift-ingress-operator patch ingresscontrollers/<name> --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"dnsManagementPolicy":"Unmanaged", "scope":"${SCOPE}"}}}}'
Copy to Clipboard Copied! - Optional: You can delete the associated DNS record in the cloud provider.
Chapter 12. Verifying connectivity to an endpoint
The Cluster Network Operator (CNO) runs a controller, the connectivity check controller, that performs a connection health check between resources within your cluster. By reviewing the results of the health checks, you can diagnose connection problems or eliminate network connectivity as the cause of an issue that you are investigating.
12.1. Connection health checks performed
To verify that cluster resources are reachable, a TCP connection is made to each of the following cluster API services:
- Kubernetes API server service
- Kubernetes API server endpoints
- OpenShift API server service
- OpenShift API server endpoints
- Load balancers
To verify that services and service endpoints are reachable on every node in the cluster, a TCP connection is made to each of the following targets:
- Health check target service
- Health check target endpoints
12.2. Implementation of connection health checks
The connectivity check controller orchestrates connection verification checks in your cluster. The results for the connection tests are stored in PodNetworkConnectivity
objects in the openshift-network-diagnostics
namespace. Connection tests are performed every minute in parallel.
The Cluster Network Operator (CNO) deploys several resources to the cluster to send and receive connectivity health checks:
- Health check source
-
This program deploys in a single pod replica set managed by a
Deployment
object. The program consumesPodNetworkConnectivity
objects and connects to thespec.targetEndpoint
specified in each object. - Health check target
- A pod deployed as part of a daemon set on every node in the cluster. The pod listens for inbound health checks. The presence of this pod on every node allows for the testing of connectivity to each node.
12.3. PodNetworkConnectivityCheck object fields
The PodNetworkConnectivityCheck
object fields are described in the following tables.
Field | Type | Description |
---|---|---|
|
|
The name of the object in the following format:
|
|
|
The namespace that the object is associated with. This value is always |
|
|
The name of the pod where the connection check originates, such as |
|
|
The target of the connection check, such as |
|
| Configuration for the TLS certificate to use. |
|
| The name of the TLS certificate used, if any. The default value is an empty string. |
|
| An object representing the condition of the connection test and logs of recent connection successes and failures. |
|
| The latest status of the connection check and any previous statuses. |
|
| Connection test logs from unsuccessful attempts. |
|
| Connect test logs covering the time periods of any outages. |
|
| Connection test logs from successful attempts. |
The following table describes the fields for objects in the status.conditions
array:
Field | Type | Description |
---|---|---|
|
| The time that the condition of the connection transitioned from one status to another. |
|
| The details about last transition in a human readable format. |
|
| The last status of the transition in a machine readable format. |
|
| The status of the condition. |
|
| The type of the condition. |
The following table describes the fields for objects in the status.conditions
array:
Field | Type | Description |
---|---|---|
|
| The timestamp from when the connection failure is resolved. |
|
| Connection log entries, including the log entry related to the successful end of the outage. |
|
| A summary of outage details in a human readable format. |
|
| The timestamp from when the connection failure is first detected. |
|
| Connection log entries, including the original failure. |
Connection log fields
The fields for a connection log entry are described in the following table. The object is used in the following fields:
-
status.failures[]
-
status.successes[]
-
status.outages[].startLogs[]
-
status.outages[].endLogs[]
Field | Type | Description |
---|---|---|
|
| Records the duration of the action. |
|
| Provides the status in a human readable format. |
|
|
Provides the reason for status in a machine readable format. The value is one of |
|
| Indicates if the log entry is a success or failure. |
|
| The start time of connection check. |
12.4. Verifying network connectivity for an endpoint
As a cluster administrator, you can verify the connectivity of an endpoint, such as an API server, load balancer, service, or pod.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role.
Procedure
To list the current
PodNetworkConnectivityCheck
objects, enter the following command:oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
$ oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
Copy to Clipboard Copied! Example output
NAME AGE network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 73m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-default-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-external 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-internal 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-c-n8mbf 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-d-4hnrz 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-service-cluster 75m
NAME AGE network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 73m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-default-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-external 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-internal 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-c-n8mbf 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-d-4hnrz 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-service-cluster 75m
Copy to Clipboard Copied! View the connection test logs:
- From the output of the previous command, identify the endpoint that you want to review the connectivity logs for.
To view the object, enter the following command:
oc get podnetworkconnectivitycheck <name> \ -n openshift-network-diagnostics -o yaml
$ oc get podnetworkconnectivitycheck <name> \ -n openshift-network-diagnostics -o yaml
Copy to Clipboard Copied! where
<name>
specifies the name of thePodNetworkConnectivityCheck
object.Example output
apiVersion: controlplane.operator.openshift.io/v1alpha1 kind: PodNetworkConnectivityCheck metadata: name: network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 namespace: openshift-network-diagnostics ... spec: sourcePod: network-check-source-7c88f6d9f-hmg2f targetEndpoint: 10.0.0.4:6443 tlsClientCert: name: "" status: conditions: - lastTransitionTime: "2021-01-13T20:11:34Z" message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnectSuccess status: "True" type: Reachable failures: - latency: 2.241775ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:10:34Z" - latency: 2.582129ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:09:34Z" - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" outages: - end: "2021-01-13T20:11:34Z" endLogs: - latency: 2.032018ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T20:11:34Z" - latency: 2.241775ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:10:34Z" - latency: 2.582129ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:09:34Z" - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" message: Connectivity restored after 2m59.999789186s start: "2021-01-13T20:08:34Z" startLogs: - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" successes: - latency: 2.845865ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:14:34Z" - latency: 2.926345ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:13:34Z" - latency: 2.895796ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:12:34Z" - latency: 2.696844ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:11:34Z" - latency: 1.502064ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:10:34Z" - latency: 1.388857ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:09:34Z" - latency: 1.906383ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:08:34Z" - latency: 2.089073ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:07:34Z" - latency: 2.156994ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:06:34Z" - latency: 1.777043ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:05:34Z"
apiVersion: controlplane.operator.openshift.io/v1alpha1 kind: PodNetworkConnectivityCheck metadata: name: network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 namespace: openshift-network-diagnostics ... spec: sourcePod: network-check-source-7c88f6d9f-hmg2f targetEndpoint: 10.0.0.4:6443 tlsClientCert: name: "" status: conditions: - lastTransitionTime: "2021-01-13T20:11:34Z" message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnectSuccess status: "True" type: Reachable failures: - latency: 2.241775ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:10:34Z" - latency: 2.582129ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:09:34Z" - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" outages: - end: "2021-01-13T20:11:34Z" endLogs: - latency: 2.032018ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T20:11:34Z" - latency: 2.241775ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:10:34Z" - latency: 2.582129ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:09:34Z" - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" message: Connectivity restored after 2m59.999789186s start: "2021-01-13T20:08:34Z" startLogs: - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" successes: - latency: 2.845865ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:14:34Z" - latency: 2.926345ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:13:34Z" - latency: 2.895796ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:12:34Z" - latency: 2.696844ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:11:34Z" - latency: 1.502064ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:10:34Z" - latency: 1.388857ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:09:34Z" - latency: 1.906383ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:08:34Z" - latency: 2.089073ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:07:34Z" - latency: 2.156994ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:06:34Z" - latency: 1.777043ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:05:34Z"
Copy to Clipboard Copied!
Chapter 13. Changing the MTU for the cluster network
As a cluster administrator, you can change the MTU for the cluster network after cluster installation. This change is disruptive as cluster nodes must be rebooted to finalize the MTU change.
You can change the MTU only for clusters that use the OVN-Kubernetes plugin or the OpenShift SDN network plugin.
13.1. About the cluster MTU
During installation the maximum transmission unit (MTU) for the cluster network is detected automatically based on the MTU of the primary network interface of nodes in the cluster. You do not usually need to override the detected MTU.
You might want to change the MTU of the cluster network for several reasons:
- The MTU detected during cluster installation is not correct for your infrastructure.
- Your cluster infrastructure now requires a different MTU, such as from the addition of nodes that need a different MTU for optimal performance.
13.1.1. Service interruption considerations
When you initiate an MTU change on your cluster the following effects might impact service availability:
- At least two rolling reboots are required to complete the migration to a new MTU. During this time, some nodes are not available as they restart.
- Specific applications deployed to the cluster with shorter timeout intervals than the absolute TCP timeout interval might experience disruption during the MTU change.
13.1.2. MTU value selection
When planning your MTU migration there are two related but distinct MTU values to consider.
- Hardware MTU: This MTU value is set based on the specifics of your network infrastructure.
-
Cluster network MTU: This MTU value is always less than your hardware MTU to account for the cluster network overlay overhead. The specific overhead is determined by your network plugin. For OVN-Kubernetes, the overhead is
100
bytes. For OpenShift SDN, the overhead is50
bytes.
If your cluster requires different MTU values for different nodes, you must subtract the overhead value for your network plugin from the lowest MTU value that is used by any node in your cluster. For example, if some nodes in your cluster have an MTU of 9001
, and some have an MTU of 1500
, you must set this value to 1400
.
To avoid selecting an MTU value that is not acceptable by a node, verify the maximum MTU value (maxmtu
) that is accepted by the network interface by using the ip -d link
command.
13.1.3. How the migration process works
The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.
User-initiated steps | OpenShift Container Platform activity |
---|---|
Set the following values in the Cluster Network Operator configuration:
| Cluster Network Operator (CNO): Confirms that each field is set to a valid value.
If the values provided are valid, the CNO writes out a new temporary configuration with the MTU for the cluster network set to the value of the Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster. |
Reconfigure the MTU of the primary network interface for the nodes on the cluster. You can use a variety of methods to accomplish this, including:
| N/A |
Set the | Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster with the new MTU configuration. |
13.2. Changing the cluster network MTU
As a cluster administrator, you can increase or decrease the maximum transmission unit (MTU) for your cluster.
You cannot roll back an MTU value for nodes during the MTU migration process, but you can roll back the value after the MTU migration process completes.
The migration is disruptive and nodes in your cluster might be temporarily unavailable as the MTU update takes effect.
The following procedure describes how to change the cluster network MTU by using either machine configs, Dynamic Host Configuration Protocol (DHCP), or an ISO image. If you use either the DHCP or ISO approaches, you must refer to configuration artifacts that you kept after installing your cluster to complete the procedure.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have access to the cluster using an account with
cluster-admin
permissions. You have identified the target MTU for your cluster.
-
The MTU for the OVN-Kubernetes network plugin must be set to
100
less than the lowest hardware MTU value in your cluster. -
The MTU for the OpenShift SDN network plugin must be set to
50
less than the lowest hardware MTU value in your cluster.
-
The MTU for the OVN-Kubernetes network plugin must be set to
- If your nodes are physical machines, ensure that the cluster network and the connected network switches support jumbo frames.
- If your nodes are virtual machines (VMs), ensure that the hypervisor and the connected network switches support jumbo frames.
Procedure
To obtain the current MTU for the cluster network, enter the following command:
oc describe network.config cluster
$ oc describe network.config cluster
Copy to Clipboard Copied! Example output
... Status: Cluster Network: Cidr: 10.217.0.0/22 Host Prefix: 23 Cluster Network MTU: 1400 Network Type: OVNKubernetes Service Network: 10.217.4.0/23 ...
... Status: Cluster Network: Cidr: 10.217.0.0/22 Host Prefix: 23 Cluster Network MTU: 1400 Network Type: OVNKubernetes Service Network: 10.217.4.0/23 ...
Copy to Clipboard Copied! Prepare your configuration for the hardware MTU:
If your hardware MTU is specified with DHCP, update your DHCP configuration such as with the following dnsmasq configuration:
dhcp-option-force=26,<mtu>
dhcp-option-force=26,<mtu>
Copy to Clipboard Copied! where:
<mtu>
- Specifies the hardware MTU for the DHCP server to advertise.
- If your hardware MTU is specified with a kernel command line with PXE, update that configuration accordingly.
If your hardware MTU is specified in a NetworkManager connection configuration, complete the following steps. This approach is the default for OpenShift Container Platform if you do not explicitly specify your network configuration with DHCP, a kernel command line, or some other method. Your cluster nodes must all use the same underlying network configuration for the following procedure to work unmodified.
Find the primary network interface:
If you are using the OpenShift SDN network plugin, enter the following command:
oc debug node/<node_name> -- chroot /host ip route list match 0.0.0.0/0 | awk '{print $5 }'
$ oc debug node/<node_name> -- chroot /host ip route list match 0.0.0.0/0 | awk '{print $5 }'
Copy to Clipboard Copied! where:
<node_name>
- Specifies the name of a node in your cluster.
If you are using the OVN-Kubernetes network plugin, enter the following command:
oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0
$ oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0
Copy to Clipboard Copied! where:
<node_name>
- Specifies the name of a node in your cluster.
Create the following NetworkManager configuration in the
<interface>-mtu.conf
file:Example NetworkManager connection configuration
[connection-<interface>-mtu] match-device=interface-name:<interface> ethernet.mtu=<mtu>
[connection-<interface>-mtu] match-device=interface-name:<interface> ethernet.mtu=<mtu>
Copy to Clipboard Copied! where:
<mtu>
- Specifies the new hardware MTU value.
<interface>
- Specifies the primary network interface name.
Create two
MachineConfig
objects, one for the control plane nodes and another for the worker nodes in your cluster:Create the following Butane config in the
control-plane-interface.bu
file:NoteThe Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in
0
. For example,4.15.0
. See "Creating machine configs with Butane" for information about Butane.variant: openshift version: 4.15.0 metadata: name: 01-control-plane-interface labels: machineconfiguration.openshift.io/role: master storage: files: - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf contents: local: <interface>-mtu.conf mode: 0600
variant: openshift version: 4.15.0 metadata: name: 01-control-plane-interface labels: machineconfiguration.openshift.io/role: master storage: files: - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf
1 contents: local: <interface>-mtu.conf
2 mode: 0600
Copy to Clipboard Copied! Create the following Butane config in the
worker-interface.bu
file:NoteThe Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in
0
. For example,4.15.0
. See "Creating machine configs with Butane" for information about Butane.variant: openshift version: 4.15.0 metadata: name: 01-worker-interface labels: machineconfiguration.openshift.io/role: worker storage: files: - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf contents: local: <interface>-mtu.conf mode: 0600
variant: openshift version: 4.15.0 metadata: name: 01-worker-interface labels: machineconfiguration.openshift.io/role: worker storage: files: - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf
1 contents: local: <interface>-mtu.conf
2 mode: 0600
Copy to Clipboard Copied! Create
MachineConfig
objects from the Butane configs by running the following command:for manifest in control-plane-interface worker-interface; do
$ for manifest in control-plane-interface worker-interface; do butane --files-dir . $manifest.bu > $manifest.yaml done
Copy to Clipboard Copied! WarningDo not apply these machine configs until explicitly instructed later in this procedure. Applying these machine configs now causes a loss of stability for the cluster.
To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.
oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'
Copy to Clipboard Copied! where:
<overlay_from>
- Specifies the current cluster network MTU value.
<overlay_to>
-
Specifies the target MTU for the cluster network. This value is set relative to the value of
<machine_to>
. For OVN-Kubernetes, this value must be100
less than the value of<machine_to>
. For OpenShift SDN, this value must be50
less than the value of<machine_to>
. <machine_to>
- Specifies the MTU for the primary network interface on the underlying host network.
Example that increases the cluster MTU
oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'
Copy to Clipboard Copied! As the Machine Config Operator updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
oc get machineconfigpools
$ oc get machineconfigpools
Copy to Clipboard Copied! A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.NoteBy default, the Machine Config Operator updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
Confirm the status of the new machine configuration on the hosts:
To list the machine configuration state and the name of the applied machine configuration, enter the following command:
oc describe node | egrep "hostname|machineconfig"
$ oc describe node | egrep "hostname|machineconfig"
Copy to Clipboard Copied! Example output
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
Copy to Clipboard Copied! Verify that the following statements are true:
-
The value of
machineconfiguration.openshift.io/state
field isDone
. -
The value of the
machineconfiguration.openshift.io/currentConfig
field is equal to the value of themachineconfiguration.openshift.io/desiredConfig
field.
-
The value of
To confirm that the machine config is correct, enter the following command:
oc get machineconfig <config_name> -o yaml | grep ExecStart
$ oc get machineconfig <config_name> -o yaml | grep ExecStart
Copy to Clipboard Copied! where
<config_name>
is the name of the machine config from themachineconfiguration.openshift.io/currentConfig
field.The machine config must include the following update to the systemd configuration:
ExecStart=/usr/local/bin/mtu-migration.sh
ExecStart=/usr/local/bin/mtu-migration.sh
Copy to Clipboard Copied!
Update the underlying network interface MTU value:
If you are specifying the new MTU with a NetworkManager connection configuration, enter the following command. The MachineConfig Operator automatically performs a rolling reboot of the nodes in your cluster.
for manifest in control-plane-interface worker-interface; do
$ for manifest in control-plane-interface worker-interface; do oc create -f $manifest.yaml done
Copy to Clipboard Copied! - If you are specifying the new MTU with a DHCP server option or a kernel command line and PXE, make the necessary changes for your infrastructure.
As the Machine Config Operator updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
oc get machineconfigpools
$ oc get machineconfigpools
Copy to Clipboard Copied! A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.NoteBy default, the Machine Config Operator updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
Confirm the status of the new machine configuration on the hosts:
To list the machine configuration state and the name of the applied machine configuration, enter the following command:
oc describe node | egrep "hostname|machineconfig"
$ oc describe node | egrep "hostname|machineconfig"
Copy to Clipboard Copied! Example output
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
Copy to Clipboard Copied! Verify that the following statements are true:
-
The value of
machineconfiguration.openshift.io/state
field isDone
. -
The value of the
machineconfiguration.openshift.io/currentConfig
field is equal to the value of themachineconfiguration.openshift.io/desiredConfig
field.
-
The value of
To confirm that the machine config is correct, enter the following command:
oc get machineconfig <config_name> -o yaml | grep path:
$ oc get machineconfig <config_name> -o yaml | grep path:
Copy to Clipboard Copied! where
<config_name>
is the name of the machine config from themachineconfiguration.openshift.io/currentConfig
field.If the machine config is successfully deployed, the previous output contains the
/etc/NetworkManager/conf.d/99-<interface>-mtu.conf
file path and theExecStart=/usr/local/bin/mtu-migration.sh
line.
Finalize the MTU migration for your plugin. In both example commands,
<mtu>
specifies the new cluster network MTU that you specified with<overlay_to>
.To finalize the MTU migration, enter the following command for the OVN-Kubernetes network plugin:
oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'
Copy to Clipboard Copied! To finalize the MTU migration, enter the following command for the OpenShift SDN network plugin:
oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": null, "defaultNetwork":{ "openshiftSDNConfig": { "mtu": <mtu> }}}}'
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": null, "defaultNetwork":{ "openshiftSDNConfig": { "mtu": <mtu> }}}}'
Copy to Clipboard Copied!
After finalizing the MTU migration, each machine config pool node is rebooted one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
oc get machineconfigpools
$ oc get machineconfigpools
Copy to Clipboard Copied! A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.
Verification
To get the current MTU for the cluster network, enter the following command:
oc describe network.config cluster
$ oc describe network.config cluster
Copy to Clipboard Copied! Get the current MTU for the primary network interface of a node:
To list the nodes in your cluster, enter the following command:
oc get nodes
$ oc get nodes
Copy to Clipboard Copied! To obtain the current MTU setting for the primary network interface on a node, enter the following command:
oc debug node/<node> -- chroot /host ip address show <interface>
$ oc debug node/<node> -- chroot /host ip address show <interface>
Copy to Clipboard Copied! where:
<node>
- Specifies a node from the output from the previous step.
<interface>
- Specifies the primary network interface name for the node.
Example output
ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051
ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051
Copy to Clipboard Copied!
Chapter 14. Configuring the node port service range
During cluster installation, you can configure the node port range to meet the requirements of your cluster. After cluster installation, only a cluster administrator can expand the range as a postinstallation task. If your cluster uses a large number of node ports, consider increasing the available port range according to the requirements of your cluster.
Before you expand a node port range, consider that Red Hat has not performed testing outside the default port range of 30000-32768
. For ranges outside the default port range, ensure that you test to verify the expanding node port range does not impact your cluster. If you expanded the range and a port allocation issue occurs, create a new cluster and set the required range for it.
If you do not set a node port range during cluster installation, the default range of 30000-32768
applies to your cluster. In this situation, you can expand the range on either side, but you must preserve 30000-32768
within your new port range.
If you expand the node port range and OpenShift CLI (oc
) stops working because of a port conflict with the OpenShift API server, you must create a new cluster. Ensure that the new node port range does not overlap with any ports already in use by host processes or pods that are configured with host networking.
14.1. Expanding the node port range
You can expand the node port range for your cluster. However, after you install your OpenShift Container Platform cluster, you cannot contract the node port range on either side.
Before you expand a node port range, consider that Red Hat has not performed testing outside the default port range of 30000-32768
. For ranges outside the default port range, ensure that you test to verify the expanding node port range does not impact your cluster. If you expanded the range and a port allocation issue occurs, create a new cluster and set the required range for it.
Prerequisites
-
Installed the OpenShift CLI (
oc
). -
Logged in to the cluster as a user with
cluster-admin
privileges. -
You ensured that your cluster infrastructure allows access to the ports that exist in the extended range. For example, if you expand the node port range to
30000-32900
, your firewall or packet filtering configuration must allow the inclusive port range of30000-32900
.
Procedure
Expand the range for the
serviceNodePortRange
parameter in thenetwork.config.openshift.io
object that your cluster uses to manage traffic for pods by entering the following command in your command-line interface (CLI):oc patch network.config.openshift.io cluster --type=merge -p \ '{
$ oc patch network.config.openshift.io cluster --type=merge -p \ '{ "spec": { "serviceNodePortRange": "<port_range>" }
1 }'
Copy to Clipboard Copied! - 1
- Where
<port_range>
is your expanded range, such as30000-32900
.
TipYou can also apply the following YAML to update the node port range:
apiVersion: config.openshift.io/v1 kind: Network metadata: name: cluster spec: serviceNodePortRange: "<port_range>" # ...
apiVersion: config.openshift.io/v1 kind: Network metadata: name: cluster spec: serviceNodePortRange: "<port_range>" # ...
Copy to Clipboard Copied! Example output
network.config.openshift.io/cluster patched
network.config.openshift.io/cluster patched
Copy to Clipboard Copied!
Verification
To confirm a successful configuration, enter the following command. The update can take several minutes to apply.
oc get configmaps -n openshift-kube-apiserver config \ -o jsonpath="{.data['config\.yaml']}" | \ grep -Eo '"service-node-port-range":["[[:digit:]]+-[[:digit:]]+"]'
$ oc get configmaps -n openshift-kube-apiserver config \ -o jsonpath="{.data['config\.yaml']}" | \ grep -Eo '"service-node-port-range":["[[:digit:]]+-[[:digit:]]+"]'
Copy to Clipboard Copied! Example output
"service-node-port-range":["30000-32900"]
"service-node-port-range":["30000-32900"]
Copy to Clipboard Copied!
Chapter 15. Configuring the cluster network range
As a cluster administrator, you can expand the cluster network range after cluster installation. You might want to expand the cluster network range if you need more IP addresses for additional nodes.
For example, if you deployed a cluster and specified 10.128.0.0/19
as the cluster network range and a host prefix of 23
, you are limited to 16 nodes. You can expand that to 510 nodes by changing the CIDR mask on a cluster to /14
.
When expanding the cluster network address range, your cluster must use the OVN-Kubernetes network plugin. Other network plugins are not supported.
The following limitations apply when modifying the cluster network IP address range:
- The CIDR mask size specified must always be smaller than the currently configured CIDR mask size, because you can only increase IP space by adding more nodes to an installed cluster
- The host prefix cannot be modified
- Pods that are configured with an overridden default gateway must be recreated after the cluster network expands
15.1. Expanding the cluster network IP address range
You can expand the IP address range for the cluster network. Because this change requires rolling out a new Operator configuration across the cluster, it can take up to 30 minutes to take effect.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges. - Ensure that the cluster uses the OVN-Kubernetes network plugin.
Procedure
To obtain the cluster network range and host prefix for your cluster, enter the following command:
oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.clusterNetwork}"
$ oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.clusterNetwork}"
Copy to Clipboard Copied! Example output
[{"cidr":"10.217.0.0/22","hostPrefix":23}]
[{"cidr":"10.217.0.0/22","hostPrefix":23}]
Copy to Clipboard Copied! To expand the cluster network IP address range, enter the following command. Use the CIDR IP address range and host prefix returned from the output of the previous command.
oc patch Network.config.openshift.io cluster --type='merge' --patch \ '{
$ oc patch Network.config.openshift.io cluster --type='merge' --patch \ '{ "spec":{ "clusterNetwork": [ {"cidr":"<network>/<cidr>","hostPrefix":<prefix>} ], "networkType": "OVNKubernetes" } }'
Copy to Clipboard Copied! where:
<network>
-
Specifies the network part of the
cidr
field that you obtained from the previous step. You cannot change this value. <cidr>
-
Specifies the network prefix length. For example,
14
. Change this value to a smaller number than the value from the output in the previous step to expand the cluster network range. <prefix>
-
Specifies the current host prefix for your cluster. This value must be the same value for the
hostPrefix
field that you obtained from the previous step.
Example command
oc patch Network.config.openshift.io cluster --type='merge' --patch \ '{
$ oc patch Network.config.openshift.io cluster --type='merge' --patch \ '{ "spec":{ "clusterNetwork": [ {"cidr":"10.217.0.0/14","hostPrefix": 23} ], "networkType": "OVNKubernetes" } }'
Copy to Clipboard Copied! Example output
network.config.openshift.io/cluster patched
network.config.openshift.io/cluster patched
Copy to Clipboard Copied! To confirm that the configuration is active, enter the following command. It can take up to 30 minutes for this change to take effect.
oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.clusterNetwork}"
$ oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.clusterNetwork}"
Copy to Clipboard Copied! Example output
[{"cidr":"10.217.0.0/14","hostPrefix":23}]
[{"cidr":"10.217.0.0/14","hostPrefix":23}]
Copy to Clipboard Copied!
Chapter 16. Configuring IP failover
This topic describes configuring IP failover for pods and services on your OpenShift Container Platform cluster.
IP failover uses Keepalived to host a set of externally accessible Virtual IP (VIP) addresses on a set of hosts. Each VIP address is only serviced by a single host at a time. Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to determine which host, from the set of hosts, services which VIP. If a host becomes unavailable, or if the service that Keepalived is watching does not respond, the VIP is switched to another host from the set. This means a VIP is always serviced as long as a host is available.
Every VIP in the set is serviced by a node selected from the set. If a single node is available, the VIPs are served. There is no way to explicitly distribute the VIPs over the nodes, so there can be nodes with no VIPs and other nodes with many VIPs. If there is only one node, all VIPs are on it.
The administrator must ensure that all of the VIP addresses meet the following requirements:
- Accessible on the configured hosts from outside the cluster.
- Not used for any other purpose within the cluster.
Keepalived on each node determines whether the needed service is running. If it is, VIPs are supported and Keepalived participates in the negotiation to determine which node serves the VIP. For a node to participate, the service must be listening on the watch port on a VIP or the check must be disabled.
Each VIP in the set might be served by a different node.
IP failover monitors a port on each VIP to determine whether the port is reachable on the node. If the port is not reachable, the VIP is not assigned to the node. If the port is set to 0
, this check is suppressed. The check script does the needed testing.
When a node running Keepalived passes the check script, the VIP on that node can enter the master
state based on its priority and the priority of the current master and as determined by the preemption strategy.
A cluster administrator can provide a script through the OPENSHIFT_HA_NOTIFY_SCRIPT
variable, and this script is called whenever the state of the VIP on the node changes. Keepalived uses the master
state when it is servicing the VIP, the backup
state when another node is servicing the VIP, or in the fault
state when the check script fails. The notify script is called with the new state whenever the state changes.
You can create an IP failover deployment configuration on OpenShift Container Platform. The IP failover deployment configuration specifies the set of VIP addresses, and the set of nodes on which to service them. A cluster can have multiple IP failover deployment configurations, with each managing its own set of unique VIP addresses. Each node in the IP failover configuration runs an IP failover pod, and this pod runs Keepalived.
When using VIPs to access a pod with host networking, the application pod runs on all nodes that are running the IP failover pods. This enables any of the IP failover nodes to become the master and service the VIPs when needed. If application pods are not running on all nodes with IP failover, either some IP failover nodes never service the VIPs or some application pods never receive any traffic. Use the same selector and replication count, for both IP failover and the application pods, to avoid this mismatch.
While using VIPs to access a service, any of the nodes can be in the IP failover set of nodes, since the service is reachable on all nodes, no matter where the application pod is running. Any of the IP failover nodes can become master at any time. The service can either use external IPs and a service port or it can use a NodePort
. Setting up a NodePort
is a privileged operation.
When using external IPs in the service definition, the VIPs are set to the external IPs, and the IP failover monitoring port is set to the service port. When using a node port, the port is open on every node in the cluster, and the service load-balances traffic from whatever node currently services the VIP. In this case, the IP failover monitoring port is set to the NodePort
in the service definition.
Even though a service VIP is highly available, performance can still be affected. Keepalived makes sure that each of the VIPs is serviced by some node in the configuration, and several VIPs can end up on the same node even when other nodes have none. Strategies that externally load-balance across a set of VIPs can be thwarted when IP failover puts multiple VIPs on the same node.
When you use ExternalIP
, you can set up IP failover to have the same VIP range as the ExternalIP
range. You can also disable the monitoring port. In this case, all of the VIPs appear on same node in the cluster. Any user can set up a service with an ExternalIP
and make it highly available.
There are a maximum of 254 VIPs in the cluster.
16.1. IP failover environment variables
The following table contains the variables used to configure IP failover.
Variable Name | Default | Description |
---|---|---|
|
|
The IP failover pod tries to open a TCP connection to this port on each Virtual IP (VIP). If connection is established, the service is considered to be running. If this port is set to |
|
The interface name that IP failover uses to send Virtual Router Redundancy Protocol (VRRP) traffic. The default value is
If your cluster uses the OVN-Kubernetes network plugin, set this value to | |
|
|
The number of replicas to create. This must match |
|
The list of IP address ranges to replicate. This must be provided. For example, | |
|
|
The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is |
|
The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the | |
| INPUT |
The name of the iptables chain, to automatically add an |
| The full path name in the pod file system of a script that is periodically run to verify the application is operating. | |
|
| The period, in seconds, that the check script is run. |
| The full path name in the pod file system of a script that is run whenever the state changes. | |
|
|
The strategy for handling a new higher priority host. The |
16.2. Configuring IP failover in your cluster
As a cluster administrator, you can configure IP failover on an entire cluster, or on a subset of nodes, as defined by the label selector. You can also configure multiple IP failover deployments in your cluster, where each one is independent of the others.
The IP failover deployment ensures that a failover pod runs on each of the nodes matching the constraints or the label used.
This pod runs Keepalived, which can monitor an endpoint and use Virtual Router Redundancy Protocol (VRRP) to fail over the virtual IP (VIP) from one node to another if the first node cannot reach the service or endpoint.
For production use, set a selector
that selects at least two nodes, and set replicas
equal to the number of selected nodes.
Prerequisites
-
You are logged in to the cluster as a user with
cluster-admin
privileges. - You created a pull secret.
Red Hat OpenStack Platform (RHOSP) only:
- You installed an RHOSP client (RHCOS documentation) on the target environment.
-
You also downloaded the RHOSP
openrc.sh
rc file (RHCOS documentation).
Procedure
Create an IP failover service account:
oc create sa ipfailover
$ oc create sa ipfailover
Copy to Clipboard Copied! Update security context constraints (SCC) for
hostNetwork
:oc adm policy add-scc-to-user privileged -z ipfailover
$ oc adm policy add-scc-to-user privileged -z ipfailover
Copy to Clipboard Copied! oc adm policy add-scc-to-user hostnetwork -z ipfailover
$ oc adm policy add-scc-to-user hostnetwork -z ipfailover
Copy to Clipboard Copied! Red Hat OpenStack Platform (RHOSP) only: Complete the following steps to make a failover VIP address reachable on RHOSP ports.
Use the RHOSP CLI to show the default RHOSP API and VIP addresses in the
allowed_address_pairs
parameter of your RHOSP cluster:openstack port show <cluster_name> -c allowed_address_pairs
$ openstack port show <cluster_name> -c allowed_address_pairs
Copy to Clipboard Copied! Output example
*Field* *Value* allowed_address_pairs ip_address='192.168.0.5', mac_address='fa:16:3e:31:f9:cb' ip_address='192.168.0.7', mac_address='fa:16:3e:31:f9:cb'
*Field* *Value* allowed_address_pairs ip_address='192.168.0.5', mac_address='fa:16:3e:31:f9:cb' ip_address='192.168.0.7', mac_address='fa:16:3e:31:f9:cb'
Copy to Clipboard Copied! Set a different VIP address for the IP failover deployment and make the address reachable on RHOSP ports by entering the following command in the RHOSP CLI. Do not set any default RHOSP API and VIP addresses as the failover VIP address for the IP failover deployment.
Example of adding the
1.1.1.1
failover IP address as an allowed address on RHOSP ports.openstack port set <cluster_name> --allowed-address ip-address=1.1.1.1,mac-address=fa:fa:16:3e:31:f9:cb
$ openstack port set <cluster_name> --allowed-address ip-address=1.1.1.1,mac-address=fa:fa:16:3e:31:f9:cb
Copy to Clipboard Copied! - Create a deployment YAML file to configure IP failover for your deployment. See "Example deployment YAML for IP failover configuration" in a later step.
Specify the following specification in the IP failover deployment so that you pass the failover VIP address to the
OPENSHIFT_HA_VIRTUAL_IPS
environment variable:Example of adding the
1.1.1.1
VIP address toOPENSHIFT_HA_VIRTUAL_IPS
apiVersion: apps/v1 kind: Deployment metadata: name: ipfailover-keepalived # ... spec: env: - name: OPENSHIFT_HA_VIRTUAL_IPS value: "1.1.1.1" # ...
apiVersion: apps/v1 kind: Deployment metadata: name: ipfailover-keepalived # ... spec: env: - name: OPENSHIFT_HA_VIRTUAL_IPS value: "1.1.1.1" # ...
Copy to Clipboard Copied!
Create a deployment YAML file to configure IP failover.
NoteFor Red Hat OpenStack Platform (RHOSP), you do not need to re-create the deployment YAML file. You already created this file as part of the earlier instructions.
Example deployment YAML for IP failover configuration
apiVersion: apps/v1 kind: Deployment metadata: name: ipfailover-keepalived labels: ipfailover: hello-openshift spec: strategy: type: Recreate replicas: 2 selector: matchLabels: ipfailover: hello-openshift template: metadata: labels: ipfailover: hello-openshift spec: serviceAccountName: ipfailover privileged: true hostNetwork: true nodeSelector: node-role.kubernetes.io/worker: "" containers: - name: openshift-ipfailover image: registry.redhat.io/openshift4/ose-keepalived-ipfailover-rhel9:v4.15 ports: - containerPort: 63000 hostPort: 63000 imagePullPolicy: IfNotPresent securityContext: privileged: true volumeMounts: - name: lib-modules mountPath: /lib/modules readOnly: true - name: host-slash mountPath: /host readOnly: true mountPropagation: HostToContainer - name: etc-sysconfig mountPath: /etc/sysconfig readOnly: true - name: config-volume mountPath: /etc/keepalive env: - name: OPENSHIFT_HA_CONFIG_NAME value: "ipfailover" - name: OPENSHIFT_HA_VIRTUAL_IPS value: "1.1.1.1-2" - name: OPENSHIFT_HA_VIP_GROUPS value: "10" - name: OPENSHIFT_HA_NETWORK_INTERFACE value: "ens3" #The host interface to assign the VIPs - name: OPENSHIFT_HA_MONITOR_PORT value: "30060" - name: OPENSHIFT_HA_VRRP_ID_OFFSET value: "0" - name: OPENSHIFT_HA_REPLICA_COUNT value: "2" #Must match the number of replicas in the deployment - name: OPENSHIFT_HA_USE_UNICAST value: "false" #- name: OPENSHIFT_HA_UNICAST_PEERS #value: "10.0.148.40,10.0.160.234,10.0.199.110" - name: OPENSHIFT_HA_IPTABLES_CHAIN value: "INPUT" #- name: OPENSHIFT_HA_NOTIFY_SCRIPT # value: /etc/keepalive/mynotifyscript.sh - name: OPENSHIFT_HA_CHECK_SCRIPT value: "/etc/keepalive/mycheckscript.sh" - name: OPENSHIFT_HA_PREEMPTION value: "preempt_delay 300" - name: OPENSHIFT_HA_CHECK_INTERVAL value: "2" livenessProbe: initialDelaySeconds: 10 exec: command: - pgrep - keepalived volumes: - name: lib-modules hostPath: path: /lib/modules - name: host-slash hostPath: path: / - name: etc-sysconfig hostPath: path: /etc/sysconfig # config-volume contains the check script # created with `oc create configmap keepalived-checkscript --from-file=mycheckscript.sh` - configMap: defaultMode: 0755 name: keepalived-checkscript name: config-volume imagePullSecrets: - name: openshift-pull-secret
apiVersion: apps/v1 kind: Deployment metadata: name: ipfailover-keepalived
1 labels: ipfailover: hello-openshift spec: strategy: type: Recreate replicas: 2 selector: matchLabels: ipfailover: hello-openshift template: metadata: labels: ipfailover: hello-openshift spec: serviceAccountName: ipfailover privileged: true hostNetwork: true nodeSelector: node-role.kubernetes.io/worker: "" containers: - name: openshift-ipfailover image: registry.redhat.io/openshift4/ose-keepalived-ipfailover-rhel9:v4.15 ports: - containerPort: 63000 hostPort: 63000 imagePullPolicy: IfNotPresent securityContext: privileged: true volumeMounts: - name: lib-modules mountPath: /lib/modules readOnly: true - name: host-slash mountPath: /host readOnly: true mountPropagation: HostToContainer - name: etc-sysconfig mountPath: /etc/sysconfig readOnly: true - name: config-volume mountPath: /etc/keepalive env: - name: OPENSHIFT_HA_CONFIG_NAME value: "ipfailover" - name: OPENSHIFT_HA_VIRTUAL_IPS
2 value: "1.1.1.1-2" - name: OPENSHIFT_HA_VIP_GROUPS
3 value: "10" - name: OPENSHIFT_HA_NETWORK_INTERFACE
4 value: "ens3" #The host interface to assign the VIPs - name: OPENSHIFT_HA_MONITOR_PORT
5 value: "30060" - name: OPENSHIFT_HA_VRRP_ID_OFFSET
6 value: "0" - name: OPENSHIFT_HA_REPLICA_COUNT
7 value: "2" #Must match the number of replicas in the deployment - name: OPENSHIFT_HA_USE_UNICAST value: "false" #- name: OPENSHIFT_HA_UNICAST_PEERS #value: "10.0.148.40,10.0.160.234,10.0.199.110" - name: OPENSHIFT_HA_IPTABLES_CHAIN
8 value: "INPUT" #- name: OPENSHIFT_HA_NOTIFY_SCRIPT
9 # value: /etc/keepalive/mynotifyscript.sh - name: OPENSHIFT_HA_CHECK_SCRIPT
10 value: "/etc/keepalive/mycheckscript.sh" - name: OPENSHIFT_HA_PREEMPTION
11 value: "preempt_delay 300" - name: OPENSHIFT_HA_CHECK_INTERVAL
12 value: "2" livenessProbe: initialDelaySeconds: 10 exec: command: - pgrep - keepalived volumes: - name: lib-modules hostPath: path: /lib/modules - name: host-slash hostPath: path: / - name: etc-sysconfig hostPath: path: /etc/sysconfig # config-volume contains the check script # created with `oc create configmap keepalived-checkscript --from-file=mycheckscript.sh` - configMap: defaultMode: 0755 name: keepalived-checkscript name: config-volume imagePullSecrets: - name: openshift-pull-secret
13 Copy to Clipboard Copied! - 1
- The name of the IP failover deployment.
- 2
- The list of IP address ranges to replicate. This must be provided. For example,
1.2.3.4-6,1.2.3.9
. - 3
- The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the
OPENSHIFT_HA_VIP_GROUPS
variable. - 4
- The interface name that IP failover uses to send VRRP traffic. By default,
eth0
is used. - 5
- The IP failover pod tries to open a TCP connection to this port on each VIP. If connection is established, the service is considered to be running. If this port is set to
0
, the test always passes. The default value is80
. - 6
- The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is
0
, and the allowed range is0
through255
. - 7
- The number of replicas to create. This must match
spec.replicas
value in IP failover deployment configuration. The default value is2
. - 8
- The name of the
iptables
chain to automatically add aniptables
rule to allow the VRRP traffic on. If the value is not set, aniptables
rule is not added. If the chain does not exist, it is not created, and Keepalived operates in unicast mode. The default isINPUT
. - 9
- The full path name in the pod file system of a script that is run whenever the state changes.
- 10
- The full path name in the pod file system of a script that is periodically run to verify the application is operating.
- 11
- The strategy for handling a new higher priority host. The default value is
preempt_delay 300
, which causes a Keepalived instance to take over a VIP after 5 minutes if a lower-priority master is holding the VIP. - 12
- The period, in seconds, that the check script is run. The default value is
2
. - 13
- Create the pull secret before creating the deployment, otherwise you will get an error when creating the deployment.
16.3. Configuring check and notify scripts
Keepalived monitors the health of the application by periodically running an optional user-supplied check script. For example, the script can test a web server by issuing a request and verifying the response. As cluster administrator, you can provide an optional notify script, which is called whenever the state changes.
The check and notify scripts run in the IP failover pod and use the pod file system, not the host file system. However, the IP failover pod makes the host file system available under the /hosts
mount path. When configuring a check or notify script, you must provide the full path to the script. The recommended approach for providing the scripts is to use a ConfigMap
object.
The full path names of the check and notify scripts are added to the Keepalived configuration file, _/etc/keepalived/keepalived.conf
, which is loaded every time Keepalived starts. The scripts can be added to the pod with a ConfigMap
object as described in the following methods.
Check script
When a check script is not provided, a simple default script is run that tests the TCP connection. This default test is suppressed when the monitor port is 0
.
Each IP failover pod manages a Keepalived daemon that manages one or more virtual IP (VIP) addresses on the node where the pod is running. The Keepalived daemon keeps the state of each VIP for that node. A particular VIP on a particular node might be in master
, backup
, or fault
state.
If the check script returns non-zero, the node enters the backup
state, and any VIPs it holds are reassigned.
Notify script
Keepalived passes the following three parameters to the notify script:
-
$1
-group
orinstance
-
$2
- Name of thegroup
orinstance
-
$3
- The new state:master
,backup
, orfault
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
Create the desired script and create a
ConfigMap
object to hold it. The script has no input arguments and must return0
forOK
and1
forfail
.The check script,
mycheckscript.sh
:#!/bin/bash # Whatever tests are needed # E.g., send request and verify response exit 0
#!/bin/bash # Whatever tests are needed # E.g., send request and verify response exit 0
Copy to Clipboard Copied! Create the
ConfigMap
object :oc create configmap mycustomcheck --from-file=mycheckscript.sh
$ oc create configmap mycustomcheck --from-file=mycheckscript.sh
Copy to Clipboard Copied! Add the script to the pod. The
defaultMode
for the mountedConfigMap
object files must able to run by usingoc
commands or by editing the deployment configuration. A value of0755
,493
decimal, is typical:oc set env deploy/ipfailover-keepalived \ OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
$ oc set env deploy/ipfailover-keepalived \ OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
Copy to Clipboard Copied! oc set volume deploy/ipfailover-keepalived --add --overwrite \ --name=config-volume \ --mount-path=/etc/keepalive \ --source='{"configMap": { "name": "mycustomcheck", "defaultMode": 493}}'
$ oc set volume deploy/ipfailover-keepalived --add --overwrite \ --name=config-volume \ --mount-path=/etc/keepalive \ --source='{"configMap": { "name": "mycustomcheck", "defaultMode": 493}}'
Copy to Clipboard Copied! NoteThe
oc set env
command is whitespace sensitive. There must be no whitespace on either side of the=
sign.TipYou can alternatively edit the
ipfailover-keepalived
deployment configuration:oc edit deploy ipfailover-keepalived
$ oc edit deploy ipfailover-keepalived
Copy to Clipboard Copied! spec: containers: - env: - name: OPENSHIFT_HA_CHECK_SCRIPT value: /etc/keepalive/mycheckscript.sh ... volumeMounts: - mountPath: /etc/keepalive name: config-volume dnsPolicy: ClusterFirst ... volumes: - configMap: defaultMode: 0755 name: customrouter name: config-volume ...
spec: containers: - env: - name: OPENSHIFT_HA_CHECK_SCRIPT
1 value: /etc/keepalive/mycheckscript.sh ... volumeMounts:
2 - mountPath: /etc/keepalive name: config-volume dnsPolicy: ClusterFirst ... volumes:
3 - configMap: defaultMode: 0755
4 name: customrouter name: config-volume ...
Copy to Clipboard Copied! - 1
- In the
spec.container.env
field, add theOPENSHIFT_HA_CHECK_SCRIPT
environment variable to point to the mounted script file. - 2
- Add the
spec.container.volumeMounts
field to create the mount point. - 3
- Add a new
spec.volumes
field to mention the config map. - 4
- This sets run permission on the files. When read back, it is displayed in decimal,
493
.
Save the changes and exit the editor. This restarts
ipfailover-keepalived
.
16.4. Configuring VRRP preemption
When a Virtual IP (VIP) on a node leaves the fault
state by passing the check script, the VIP on the node enters the backup
state if it has lower priority than the VIP on the node that is currently in the master
state. The nopreempt
strategy does not move master
from the lower priority VIP on the host to the higher priority VIP on the host. With preempt_delay 300
, the default, Keepalived waits the specified 300 seconds and moves master
to the higher priority VIP on the host.
Procedure
To specify preemption enter
oc edit deploy ipfailover-keepalived
to edit the router deployment configuration:oc edit deploy ipfailover-keepalived
$ oc edit deploy ipfailover-keepalived
Copy to Clipboard Copied! ... spec: containers: - env: - name: OPENSHIFT_HA_PREEMPTION value: preempt_delay 300 ...
... spec: containers: - env: - name: OPENSHIFT_HA_PREEMPTION
1 value: preempt_delay 300 ...
Copy to Clipboard Copied! - 1
- Set the
OPENSHIFT_HA_PREEMPTION
value:-
preempt_delay 300
: Keepalived waits the specified 300 seconds and movesmaster
to the higher priority VIP on the host. This is the default value. -
nopreempt
: does not movemaster
from the lower priority VIP on the host to the higher priority VIP on the host.
-
16.5. Deploying multiple IP failover instances
Each IP failover pod managed by the IP failover deployment configuration, 1
pod per node or replica, runs a Keepalived daemon. As more IP failover deployment configurations are configured, more pods are created and more daemons join into the common Virtual Router Redundancy Protocol (VRRP) negotiation. This negotiation is done by all the Keepalived daemons and it determines which nodes service which virtual IPs (VIP).
Internally, Keepalived assigns a unique vrrp-id
to each VIP. The negotiation uses this set of vrrp-ids
, when a decision is made, the VIP corresponding to the winning vrrp-id
is serviced on the winning node.
Therefore, for every VIP defined in the IP failover deployment configuration, the IP failover pod must assign a corresponding vrrp-id
. This is done by starting at OPENSHIFT_HA_VRRP_ID_OFFSET
and sequentially assigning the vrrp-ids
to the list of VIPs. The vrrp-ids
can have values in the range 1..255
.
When there are multiple IP failover deployment configurations, you must specify OPENSHIFT_HA_VRRP_ID_OFFSET
so that there is room to increase the number of VIPs in the deployment configuration and none of the vrrp-id
ranges overlap.
16.6. Configuring IP failover for more than 254 addresses
IP failover management is limited to 254 groups of Virtual IP (VIP) addresses. By default OpenShift Container Platform assigns one IP address to each group. You can use the OPENSHIFT_HA_VIP_GROUPS
variable to change this so multiple IP addresses are in each group and define the number of VIP groups available for each Virtual Router Redundancy Protocol (VRRP) instance when configuring IP failover.
Grouping VIPs creates a wider range of allocation of VIPs per VRRP in the case of VRRP failover events, and is useful when all hosts in the cluster have access to a service locally. For example, when a service is being exposed with an ExternalIP
.
As a rule for failover, do not limit services, such as the router, to one specific host. Instead, services should be replicated to each host so that in the case of IP failover, the services do not have to be recreated on the new host.
If you are using OpenShift Container Platform health checks, the nature of IP failover and groups means that all instances in the group are not checked. For that reason, the Kubernetes health checks must be used to ensure that services are live.
Prerequisites
-
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
To change the number of IP addresses assigned to each group, change the value for the
OPENSHIFT_HA_VIP_GROUPS
variable, for example:Example
Deployment
YAML for IP failover configuration... spec: env: - name: OPENSHIFT_HA_VIP_GROUPS value: "3" ...
... spec: env: - name: OPENSHIFT_HA_VIP_GROUPS
1 value: "3" ...
Copy to Clipboard Copied! - 1
- If
OPENSHIFT_HA_VIP_GROUPS
is set to3
in an environment with seven VIPs, it creates three groups, assigning three VIPs to the first group, and two VIPs to the two remaining groups.
If the number of groups set by OPENSHIFT_HA_VIP_GROUPS
is fewer than the number of IP addresses set to fail over, the group contains more than one IP address, and all of the addresses move as a single unit.
16.7. High availability For ExternalIP
In non-cloud clusters, IP failover and ExternalIP
to a service can be combined. The result is high availability services for users that create services using ExternalIP
.
The approach is to specify an spec.ExternalIP.autoAssignCIDRs
range of the cluster network configuration, and then use the same range in creating the IP failover configuration.
Because IP failover can support up to a maximum of 255 VIPs for the entire cluster, the spec.ExternalIP.autoAssignCIDRs
must be /24
or smaller.
16.8. Removing IP failover
When IP failover is initially configured, the worker nodes in the cluster are modified with an iptables
rule that explicitly allows multicast packets on 224.0.0.18
for Keepalived. Because of the change to the nodes, removing IP failover requires running a job to remove the iptables
rule and removing the virtual IP addresses used by Keepalived.
Procedure
Optional: Identify and delete any check and notify scripts that are stored as config maps:
Identify whether any pods for IP failover use a config map as a volume:
oc get pod -l ipfailover \ -o jsonpath="\ {range .items[?(@.spec.volumes[*].configMap)]}
$ oc get pod -l ipfailover \ -o jsonpath="\ {range .items[?(@.spec.volumes[*].configMap)]} {'Namespace: '}{.metadata.namespace} {'Pod: '}{.metadata.name} {'Volumes that use config maps:'} {range .spec.volumes[?(@.configMap)]} {'volume: '}{.name} {'configMap: '}{.configMap.name}{'\n'}{end} {end}"
Copy to Clipboard Copied! Example output
Namespace: default Pod: keepalived-worker-59df45db9c-2x9mn Volumes that use config maps: volume: config-volume configMap: mycustomcheck
Namespace: default Pod: keepalived-worker-59df45db9c-2x9mn Volumes that use config maps: volume: config-volume configMap: mycustomcheck
Copy to Clipboard Copied! If the preceding step provided the names of config maps that are used as volumes, delete the config maps:
oc delete configmap <configmap_name>
$ oc delete configmap <configmap_name>
Copy to Clipboard Copied!
Identify an existing deployment for IP failover:
oc get deployment -l ipfailover
$ oc get deployment -l ipfailover
Copy to Clipboard Copied! Example output
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE default ipfailover 2/2 2 2 105d
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE default ipfailover 2/2 2 2 105d
Copy to Clipboard Copied! Delete the deployment:
oc delete deployment <ipfailover_deployment_name>
$ oc delete deployment <ipfailover_deployment_name>
Copy to Clipboard Copied! Remove the
ipfailover
service account:oc delete sa ipfailover
$ oc delete sa ipfailover
Copy to Clipboard Copied! Run a job that removes the IP tables rule that was added when IP failover was initially configured:
Create a file such as
remove-ipfailover-job.yaml
with contents that are similar to the following example:apiVersion: batch/v1 kind: Job metadata: generateName: remove-ipfailover- labels: app: remove-ipfailover spec: template: metadata: name: remove-ipfailover spec: containers: - name: remove-ipfailover image: registry.redhat.io/openshift4/ose-keepalived-ipfailover-rhel9:v4.15 command: ["/var/lib/ipfailover/keepalived/remove-failover.sh"] nodeSelector: kubernetes.io/hostname: <host_name> restartPolicy: Never
apiVersion: batch/v1 kind: Job metadata: generateName: remove-ipfailover- labels: app: remove-ipfailover spec: template: metadata: name: remove-ipfailover spec: containers: - name: remove-ipfailover image: registry.redhat.io/openshift4/ose-keepalived-ipfailover-rhel9:v4.15 command: ["/var/lib/ipfailover/keepalived/remove-failover.sh"] nodeSelector:
1 kubernetes.io/hostname: <host_name>
2 restartPolicy: Never
Copy to Clipboard Copied! Run the job:
oc create -f remove-ipfailover-job.yaml
$ oc create -f remove-ipfailover-job.yaml
Copy to Clipboard Copied! Example output
job.batch/remove-ipfailover-2h8dm created
job.batch/remove-ipfailover-2h8dm created
Copy to Clipboard Copied!
Verification
Confirm that the job removed the initial configuration for IP failover.
oc logs job/remove-ipfailover-2h8dm
$ oc logs job/remove-ipfailover-2h8dm
Copy to Clipboard Copied! Example output
remove-failover.sh: OpenShift IP Failover service terminating. - Removing ip_vs module ... - Cleaning up ... - Releasing VIPs (interface eth0) ...
remove-failover.sh: OpenShift IP Failover service terminating. - Removing ip_vs module ... - Cleaning up ... - Releasing VIPs (interface eth0) ...
Copy to Clipboard Copied!
Chapter 17. Configuring system controls and interface attributes using the tuning plugin
In Linux, sysctl allows an administrator to modify kernel parameters at runtime. You can modify interface-level network sysctls using the tuning Container Network Interface (CNI) meta plugin. The tuning CNI meta plugin operates in a chain with a main CNI plugin as illustrated.

The main CNI plugin assigns the interface and passes this interface to the tuning CNI meta plugin at runtime. You can change some sysctls and several interface attributes such as promiscuous mode, all-multicast mode, MTU, and MAC address in the network namespace by using the tuning CNI meta plugin.
17.1. Configuring system controls by using the tuning CNI
The following procedure configures the tuning CNI to change the interface-level network net.ipv4.conf.IFNAME.accept_redirects
sysctl. This example enables accepting and sending ICMP-redirected packets. In the tuning CNI meta plugin configuration, the interface name is represented by the IFNAME
token and is replaced with the actual name of the interface at runtime.
Procedure
Create a network attachment definition, such as
tuning-example.yaml
, with the following content:apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: <name> namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "<name>", "plugins": [{ "type": "<main_CNI_plugin>" }, { "type": "tuning", "sysctl": { "net.ipv4.conf.IFNAME.accept_redirects": "1" } } ] }
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: <name>
1 namespace: default
2 spec: config: '{ "cniVersion": "0.4.0",
3 "name": "<name>",
4 "plugins": [{ "type": "<main_CNI_plugin>"
5 }, { "type": "tuning",
6 "sysctl": { "net.ipv4.conf.IFNAME.accept_redirects": "1"
7 } } ] }
Copy to Clipboard Copied! - 1
- Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
- 2
- Specifies the namespace that the object is associated with.
- 3
- Specifies the CNI specification version.
- 4
- Specifies the name for the configuration. It is recommended to match the configuration name to the name value of the network attachment definition.
- 5
- Specifies the name of the main CNI plugin to configure.
- 6
- Specifies the name of the CNI meta plugin.
- 7
- Specifies the sysctl to set. The interface name is represented by the
IFNAME
token and is replaced with the actual name of the interface at runtime.
An example YAML file is shown here:
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: tuningnad namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "tuningnad", "plugins": [{ "type": "bridge" }, { "type": "tuning", "sysctl": { "net.ipv4.conf.IFNAME.accept_redirects": "1" } } ] }'
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: tuningnad namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "tuningnad", "plugins": [{ "type": "bridge" }, { "type": "tuning", "sysctl": { "net.ipv4.conf.IFNAME.accept_redirects": "1" } } ] }'
Copy to Clipboard Copied! Apply the YAML by running the following command:
oc apply -f tuning-example.yaml
$ oc apply -f tuning-example.yaml
Copy to Clipboard Copied! Example output
networkattachmentdefinition.k8.cni.cncf.io/tuningnad created
networkattachmentdefinition.k8.cni.cncf.io/tuningnad created
Copy to Clipboard Copied! Create a pod such as
examplepod.yaml
with the network attachment definition similar to the following:apiVersion: v1 kind: Pod metadata: name: tunepod namespace: default annotations: k8s.v1.cni.cncf.io/networks: tuningnad spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
apiVersion: v1 kind: Pod metadata: name: tunepod namespace: default annotations: k8s.v1.cni.cncf.io/networks: tuningnad
1 spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000
2 runAsGroup: 3000
3 allowPrivilegeEscalation: false
4 capabilities:
5 drop: ["ALL"] securityContext: runAsNonRoot: true
6 seccompProfile:
7 type: RuntimeDefault
Copy to Clipboard Copied! - 1
- Specify the name of the configured
NetworkAttachmentDefinition
. - 2
runAsUser
controls which user ID the container is run with.- 3
runAsGroup
controls which primary group ID the containers is run with.- 4
allowPrivilegeEscalation
determines if a pod can request to allow privilege escalation. If unspecified, it defaults to true. This boolean directly controls whether theno_new_privs
flag gets set on the container process.- 5
capabilities
permit privileged actions without giving full root access. This policy ensures all capabilities are dropped from the pod.- 6
runAsNonRoot: true
requires that the container will run with a user with any UID other than 0.- 7
RuntimeDefault
enables the default seccomp profile for a pod or container workload.
Apply the yaml by running the following command:
oc apply -f examplepod.yaml
$ oc apply -f examplepod.yaml
Copy to Clipboard Copied! Verify that the pod is created by running the following command:
oc get pod
$ oc get pod
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
Copy to Clipboard Copied! Log in to the pod by running the following command:
oc rsh tunepod
$ oc rsh tunepod
Copy to Clipboard Copied! Verify the values of the configured sysctl flags. For example, find the value
net.ipv4.conf.net1.accept_redirects
by running the following command:sysctl net.ipv4.conf.net1.accept_redirects
sh-4.4# sysctl net.ipv4.conf.net1.accept_redirects
Copy to Clipboard Copied! Expected output
net.ipv4.conf.net1.accept_redirects = 1
net.ipv4.conf.net1.accept_redirects = 1
Copy to Clipboard Copied!
17.2. Enabling all-multicast mode by using the tuning CNI
You can enable all-multicast mode by using the tuning Container Network Interface (CNI) meta plugin.
The following procedure describes how to configure the tuning CNI to enable the all-multicast mode.
Procedure
Create a network attachment definition, such as
tuning-example.yaml
, with the following content:apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: <name> namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "<name>", "plugins": [{ "type": "<main_CNI_plugin>" }, { "type": "tuning", "allmulti": true } } ] }
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: <name>
1 namespace: default
2 spec: config: '{ "cniVersion": "0.4.0",
3 "name": "<name>",
4 "plugins": [{ "type": "<main_CNI_plugin>"
5 }, { "type": "tuning",
6 "allmulti": true
7 } } ] }
Copy to Clipboard Copied! - 1
- Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
- 2
- Specifies the namespace that the object is associated with.
- 3
- Specifies the CNI specification version.
- 4
- Specifies the name for the configuration. Match the configuration name to the name value of the network attachment definition.
- 5
- Specifies the name of the main CNI plugin to configure.
- 6
- Specifies the name of the CNI meta plugin.
- 7
- Changes the all-multicast mode of interface. If enabled, all multicast packets on the network will be received by the interface.
An example YAML file is shown here:
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: setallmulti namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "setallmulti", "plugins": [ { "type": "bridge" }, { "type": "tuning", "allmulti": true } ] }'
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: setallmulti namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "setallmulti", "plugins": [ { "type": "bridge" }, { "type": "tuning", "allmulti": true } ] }'
Copy to Clipboard Copied! Apply the settings specified in the YAML file by running the following command:
oc apply -f tuning-allmulti.yaml
$ oc apply -f tuning-allmulti.yaml
Copy to Clipboard Copied! Example output
networkattachmentdefinition.k8s.cni.cncf.io/setallmulti created
networkattachmentdefinition.k8s.cni.cncf.io/setallmulti created
Copy to Clipboard Copied! Create a pod with a network attachment definition similar to that specified in the following
examplepod.yaml
sample file:apiVersion: v1 kind: Pod metadata: name: allmultipod namespace: default annotations: k8s.v1.cni.cncf.io/networks: setallmulti spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
apiVersion: v1 kind: Pod metadata: name: allmultipod namespace: default annotations: k8s.v1.cni.cncf.io/networks: setallmulti
1 spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000
2 runAsGroup: 3000
3 allowPrivilegeEscalation: false
4 capabilities:
5 drop: ["ALL"] securityContext: runAsNonRoot: true
6 seccompProfile:
7 type: RuntimeDefault
Copy to Clipboard Copied! - 1
- Specifies the name of the configured
NetworkAttachmentDefinition
. - 2
- Specifies the user ID the container is run with.
- 3
- Specifies which primary group ID the containers is run with.
- 4
- Specifies if a pod can request privilege escalation. If unspecified, it defaults to
true
. This boolean directly controls whether theno_new_privs
flag gets set on the container process. - 5
- Specifies the container capabilities. The
drop: ["ALL"]
statement indicates that all Linux capabilities are dropped from the pod, providing a more restrictive security profile. - 6
- Specifies that the container will run with a user with any UID other than 0.
- 7
- Specifies the container’s seccomp profile. In this case, the type is set to
RuntimeDefault
. Seccomp is a Linux kernel feature that restricts the system calls available to a process, enhancing security by minimizing the attack surface.
Apply the settings specified in the YAML file by running the following command:
oc apply -f examplepod.yaml
$ oc apply -f examplepod.yaml
Copy to Clipboard Copied! Verify that the pod is created by running the following command:
oc get pod
$ oc get pod
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE allmultipod 1/1 Running 0 23s
NAME READY STATUS RESTARTS AGE allmultipod 1/1 Running 0 23s
Copy to Clipboard Copied! Log in to the pod by running the following command:
oc rsh allmultipod
$ oc rsh allmultipod
Copy to Clipboard Copied! List all the interfaces associated with the pod by running the following command:
ip link
sh-4.4# ip link
Copy to Clipboard Copied! Example output
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue state UP mode DEFAULT group default link/ether 0a:58:0a:83:00:10 brd ff:ff:ff:ff:ff:ff link-netnsid 0 3: net1@if24: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default link/ether ee:9b:66:a4:ec:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue state UP mode DEFAULT group default link/ether 0a:58:0a:83:00:10 brd ff:ff:ff:ff:ff:ff link-netnsid 0
1 3: net1@if24: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default link/ether ee:9b:66:a4:ec:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
2 Copy to Clipboard Copied!
Chapter 18. Using the Stream Control Transmission Protocol (SCTP)
As a cluster administrator, you can use the Stream Control Transmission Protocol (SCTP) on a bare-metal cluster.
18.1. Support for SCTP on OpenShift Container Platform
As a cluster administrator, you can enable SCTP on the hosts in the cluster. On Red Hat Enterprise Linux CoreOS (RHCOS), the SCTP module is disabled by default.
SCTP is a reliable message based protocol that runs on top of an IP network.
When enabled, you can use SCTP as a protocol with pods, services, and network policy. A Service
object must be defined with the type
parameter set to either the ClusterIP
or NodePort
value.
18.1.1. Example configurations using SCTP protocol
You can configure a pod or service to use SCTP by setting the protocol
parameter to the SCTP
value in the pod or service object.
In the following example, a pod is configured to use SCTP:
apiVersion: v1 kind: Pod metadata: namespace: project1 name: example-pod spec: containers: - name: example-pod ... ports: - containerPort: 30100 name: sctpserver protocol: SCTP
apiVersion: v1
kind: Pod
metadata:
namespace: project1
name: example-pod
spec:
containers:
- name: example-pod
...
ports:
- containerPort: 30100
name: sctpserver
protocol: SCTP
In the following example, a service is configured to use SCTP:
apiVersion: v1 kind: Service metadata: namespace: project1 name: sctpserver spec: ... ports: - name: sctpserver protocol: SCTP port: 30100 targetPort: 30100 type: ClusterIP
apiVersion: v1
kind: Service
metadata:
namespace: project1
name: sctpserver
spec:
...
ports:
- name: sctpserver
protocol: SCTP
port: 30100
targetPort: 30100
type: ClusterIP
In the following example, a NetworkPolicy
object is configured to apply to SCTP network traffic on port 80
from any pods with a specific label:
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-sctp-on-http spec: podSelector: matchLabels: role: web ingress: - ports: - protocol: SCTP port: 80
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: allow-sctp-on-http
spec:
podSelector:
matchLabels:
role: web
ingress:
- ports:
- protocol: SCTP
port: 80
18.2. Enabling Stream Control Transmission Protocol (SCTP)
As a cluster administrator, you can load and enable the blacklisted SCTP kernel module on worker nodes in your cluster.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role.
Procedure
Create a file named
load-sctp-module.yaml
that contains the following YAML definition:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: name: load-sctp-module labels: machineconfiguration.openshift.io/role: worker spec: config: ignition: version: 3.2.0 storage: files: - path: /etc/modprobe.d/sctp-blacklist.conf mode: 0644 overwrite: true contents: source: data:, - path: /etc/modules-load.d/sctp-load.conf mode: 0644 overwrite: true contents: source: data:,sctp
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: name: load-sctp-module labels: machineconfiguration.openshift.io/role: worker spec: config: ignition: version: 3.2.0 storage: files: - path: /etc/modprobe.d/sctp-blacklist.conf mode: 0644 overwrite: true contents: source: data:, - path: /etc/modules-load.d/sctp-load.conf mode: 0644 overwrite: true contents: source: data:,sctp
Copy to Clipboard Copied! To create the
MachineConfig
object, enter the following command:oc create -f load-sctp-module.yaml
$ oc create -f load-sctp-module.yaml
Copy to Clipboard Copied! Optional: To watch the status of the nodes while the MachineConfig Operator applies the configuration change, enter the following command. When the status of a node transitions to
Ready
, the configuration update is applied.oc get nodes
$ oc get nodes
Copy to Clipboard Copied!
18.3. Verifying Stream Control Transmission Protocol (SCTP) is enabled
You can verify that SCTP is working on a cluster by creating a pod with an application that listens for SCTP traffic, associating it with a service, and then connecting to the exposed service.
Prerequisites
-
Access to the internet from the cluster to install the
nc
package. -
Install the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role.
Procedure
Create a pod starts an SCTP listener:
Create a file named
sctp-server.yaml
that defines a pod with the following YAML:apiVersion: v1 kind: Pod metadata: name: sctpserver labels: app: sctpserver spec: containers: - name: sctpserver image: registry.access.redhat.com/ubi9/ubi command: ["/bin/sh", "-c"] args: ["dnf install -y nc && sleep inf"] ports: - containerPort: 30102 name: sctpserver protocol: SCTP
apiVersion: v1 kind: Pod metadata: name: sctpserver labels: app: sctpserver spec: containers: - name: sctpserver image: registry.access.redhat.com/ubi9/ubi command: ["/bin/sh", "-c"] args: ["dnf install -y nc && sleep inf"] ports: - containerPort: 30102 name: sctpserver protocol: SCTP
Copy to Clipboard Copied! Create the pod by entering the following command:
oc create -f sctp-server.yaml
$ oc create -f sctp-server.yaml
Copy to Clipboard Copied!
Create a service for the SCTP listener pod.
Create a file named
sctp-service.yaml
that defines a service with the following YAML:apiVersion: v1 kind: Service metadata: name: sctpservice labels: app: sctpserver spec: type: NodePort selector: app: sctpserver ports: - name: sctpserver protocol: SCTP port: 30102 targetPort: 30102
apiVersion: v1 kind: Service metadata: name: sctpservice labels: app: sctpserver spec: type: NodePort selector: app: sctpserver ports: - name: sctpserver protocol: SCTP port: 30102 targetPort: 30102
Copy to Clipboard Copied! To create the service, enter the following command:
oc create -f sctp-service.yaml
$ oc create -f sctp-service.yaml
Copy to Clipboard Copied!
Create a pod for the SCTP client.
Create a file named
sctp-client.yaml
with the following YAML:apiVersion: v1 kind: Pod metadata: name: sctpclient labels: app: sctpclient spec: containers: - name: sctpclient image: registry.access.redhat.com/ubi9/ubi command: ["/bin/sh", "-c"] args: ["dnf install -y nc && sleep inf"]
apiVersion: v1 kind: Pod metadata: name: sctpclient labels: app: sctpclient spec: containers: - name: sctpclient image: registry.access.redhat.com/ubi9/ubi command: ["/bin/sh", "-c"] args: ["dnf install -y nc && sleep inf"]
Copy to Clipboard Copied! To create the
Pod
object, enter the following command:oc apply -f sctp-client.yaml
$ oc apply -f sctp-client.yaml
Copy to Clipboard Copied!
Run an SCTP listener on the server.
To connect to the server pod, enter the following command:
oc rsh sctpserver
$ oc rsh sctpserver
Copy to Clipboard Copied! To start the SCTP listener, enter the following command:
nc -l 30102 --sctp
$ nc -l 30102 --sctp
Copy to Clipboard Copied!
Connect to the SCTP listener on the server.
- Open a new terminal window or tab in your terminal program.
Obtain the IP address of the
sctpservice
service. Enter the following command:oc get services sctpservice -o go-template='{{.spec.clusterIP}}{{"\n"}}'
$ oc get services sctpservice -o go-template='{{.spec.clusterIP}}{{"\n"}}'
Copy to Clipboard Copied! To connect to the client pod, enter the following command:
oc rsh sctpclient
$ oc rsh sctpclient
Copy to Clipboard Copied! To start the SCTP client, enter the following command. Replace
<cluster_IP>
with the cluster IP address of thesctpservice
service.nc <cluster_IP> 30102 --sctp
# nc <cluster_IP> 30102 --sctp
Copy to Clipboard Copied!
Chapter 19. Using Precision Time Protocol hardware
19.1. About Precision Time Protocol in OpenShift cluster nodes
Precision Time Protocol (PTP) is used to synchronize clocks in a network. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy, and is more accurate than Network Time Protocol (NTP).
If your openshift-sdn
cluster with PTP uses the User Datagram Protocol (UDP) for hardware time stamping and you migrate to the OVN-Kubernetes plugin, the hardware time stamping cannot be applied to primary interface devices, such as an Open vSwitch (OVS) bridge. As a result, UDP version 4 configurations cannot work with a br-ex
interface.
You can configure linuxptp
services and use PTP-capable hardware in OpenShift Container Platform cluster nodes.
Use the OpenShift Container Platform web console or OpenShift CLI (oc
) to install PTP by deploying the PTP Operator. The PTP Operator creates and manages the linuxptp
services and provides the following features:
- Discovery of the PTP-capable devices in the cluster.
-
Management of the configuration of
linuxptp
services. -
Notification of PTP clock events that negatively affect the performance and reliability of your application with the PTP Operator
cloud-event-proxy
sidecar.
The PTP Operator works with PTP-capable devices on clusters provisioned only on bare-metal infrastructure.
19.1.1. Elements of a PTP domain
PTP is used to synchronize multiple nodes connected in a network, with clocks for each node. The clocks synchronized by PTP are organized in a leader-follower hierarchy. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. Follower clocks are synchronized to leader clocks, and follower clocks can themselves be the source for other downstream clocks.
Figure 19.1. PTP nodes in the network

The three primary types of PTP clocks are described below.
- Grandmaster clock
- The grandmaster clock provides standard time information to other clocks across the network and ensures accurate and stable synchronisation. It writes time stamps and responds to time requests from other clocks. Grandmaster clocks synchronize to a Global Navigation Satellite System (GNSS) time source. The Grandmaster clock is the authoritative source of time in the network and is responsible for providing time synchronization to all other devices.
- Boundary clock
- The boundary clock has ports in two or more communication paths and can be a source and a destination to other destination clocks at the same time. The boundary clock works as a destination clock upstream. The destination clock receives the timing message, adjusts for delay, and then creates a new source time signal to pass down the network. The boundary clock produces a new timing packet that is still correctly synced with the source clock and can reduce the number of connected devices reporting directly to the source clock.
- Ordinary clock
- The ordinary clock has a single port connection that can play the role of source or destination clock, depending on its position in the network. The ordinary clock can read and write timestamps.
Advantages of PTP over NTP
One of the main advantages that PTP has over NTP is the hardware support present in various network interface controllers (NIC) and network switches. The specialized hardware allows PTP to account for delays in message transfer and improves the accuracy of time synchronization. To achieve the best possible accuracy, it is recommended that all networking components between PTP clocks are PTP hardware enabled.
Hardware-based PTP provides optimal accuracy, since the NIC can timestamp the PTP packets at the exact moment they are sent and received. Compare this to software-based PTP, which requires additional processing of the PTP packets by the operating system.
Before enabling PTP, ensure that NTP is disabled for the required nodes. You can disable the chrony time service (chronyd
) using a MachineConfig
custom resource. For more information, see Disabling chrony time service.
19.1.2. Using dual Intel E810 NIC hardware with PTP
OpenShift Container Platform supports single and dual NIC Intel E810 hardware for precision PTP timing in grandmaster clocks (T-GM) and boundary clocks (T-BC).
- Dual NIC grandmaster clock
You can use a cluster host that has dual NIC hardware as PTP grandmaster clock. One NIC receives timing information from the global navigation satellite system (GNSS). The second NIC receives the timing information from the first using the SMA1 Tx/Rx connections on the E810 NIC faceplate. The system clock on the cluster host is synchronized from the NIC that is connected to the GNSS satellite.
Dual NIC grandmaster clocks are a feature of distributed RAN (D-RAN) configurations where the Remote Radio Unit (RRU) and Baseband Unit (BBU) are located at the same radio cell site. D-RAN distributes radio functions across multiple sites, with backhaul connections linking them to the core network.
Figure 19.2. Dual NIC grandmaster clock
NoteIn a dual NIC T-GM configuration, a single
ts2phc
process reports as twots2phc
instances in the system.- Dual NIC boundary clock
For 5G telco networks that deliver mid-band spectrum coverage, each virtual distributed unit (vDU) requires connections to 6 radio units (RUs). To make these connections, each vDU host requires 2 NICs configured as boundary clocks.
Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate
ptp4l
instances for each NIC feeding the downstream clocks.
19.1.3. Overview of linuxptp and gpsd in OpenShift Container Platform nodes
OpenShift Container Platform uses the PTP Operator with linuxptp
and gpsd
packages for high precision network synchronization. The linuxptp
package provides tools and daemons for PTP timing in networks. Cluster hosts with Global Navigation Satellite System (GNSS) capable NICs use gpsd
to interface with GNSS clock sources.
The linuxptp
package includes the ts2phc
, pmc
, ptp4l
, and phc2sys
programs for system clock synchronization.
- ts2phc
ts2phc
synchronizes the PTP hardware clock (PHC) across PTP devices with a high degree of precision.ts2phc
is used in grandmaster clock configurations. It receives the precision timing signal a high precision clock source such as Global Navigation Satellite System (GNSS). GNSS provides an accurate and reliable source of synchronized time for use in large distributed networks. GNSS clocks typically provide time information with a precision of a few nanoseconds.The
ts2phc
system daemon sends timing information from the grandmaster clock to other PTP devices in the network by reading time information from the grandmaster clock and converting it to PHC format. PHC time is used by other devices in the network to synchronize their clocks with the grandmaster clock.- pmc
-
pmc
implements a PTP management client (pmc
) according to IEEE standard 1588.1588.pmc
provides basic management access for theptp4l
system daemon.pmc
reads from standard input and sends the output over the selected transport, printing any replies it receives. - ptp4l
ptp4l
implements the PTP boundary clock and ordinary clock and runs as a system daemon.ptp4l
does the following:- Synchronizes the PHC to the source clock with hardware time stamping
- Synchronizes the system clock to the source clock with software time stamping
- phc2sys
-
phc2sys
synchronizes the system clock to the PHC on the network interface controller (NIC). Thephc2sys
system daemon continuously monitors the PHC for timing information. When it detects a timing error, the PHC corrects the system clock.
The gpsd
package includes the ubxtool
, gspipe
, gpsd
, programs for GNSS clock synchronization with the host clock.
- ubxtool
-
ubxtool
CLI allows you to communicate with a u-blox GPS system. Theubxtool
CLI uses the u-blox binary protocol to communicate with the GPS. - gpspipe
-
gpspipe
connects togpsd
output and pipes it tostdout
. - gpsd
-
gpsd
is a service daemon that monitors one or more GPS or AIS receivers connected to the host.
19.1.4. Overview of GNSS timing for PTP grandmaster clocks
OpenShift Container Platform supports receiving precision PTP timing from Global Navigation Satellite System (GNSS) sources and grandmaster clocks (T-GM) in the cluster.
OpenShift Container Platform supports PTP timing from GNSS sources with Intel E810 Westport Channel NICs only.
Figure 19.3. Overview of Synchronization with GNSS and T-GM

- Global Navigation Satellite System (GNSS)
GNSS is a satellite-based system used to provide positioning, navigation, and timing information to receivers around the globe. In PTP, GNSS receivers are often used as a highly accurate and stable reference clock source. These receivers receive signals from multiple GNSS satellites, allowing them to calculate precise time information. The timing information obtained from GNSS is used as a reference by the PTP grandmaster clock.
By using GNSS as a reference, the grandmaster clock in the PTP network can provide highly accurate timestamps to other devices, enabling precise synchronization across the entire network.
- Digital Phase-Locked Loop (DPLL)
- DPLL provides clock synchronization between different PTP nodes in the network. DPLL compares the phase of the local system clock signal with the phase of the incoming synchronization signal, for example, PTP messages from the PTP grandmaster clock. The DPLL continuously adjusts the local clock frequency and phase to minimize the phase difference between the local clock and the reference clock.
Handling leap second events in GNSS-synced PTP grandmaster clocks
A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) to keep it synchronized with International Atomic Time (TAI). UTC leap seconds are unpredictable. Internationally agreed leap seconds are listed in leap-seconds.list. This file is regularly updated by the International Earth Rotation and Reference Systems Service (IERS). An unhandled leap second can have a significant impact on far edge RAN networks. It can cause the far edge RAN application to immediately disconnect voice calls and data sessions.
19.2. Configuring PTP devices
The PTP Operator adds the NodePtpDevice.ptp.openshift.io
custom resource definition (CRD) to OpenShift Container Platform.
When installed, the PTP Operator searches your cluster for Precision Time Protocol (PTP) capable network devices on each node. The Operator creates and updates a NodePtpDevice
custom resource (CR) object for each node that provides a compatible PTP-capable network device.
Network interface controller (NIC) hardware with built-in PTP capabilities sometimes require a device-specific configuration. You can use hardware-specific NIC features for supported hardware with the PTP Operator by configuring a plugin in the PtpConfig
custom resource (CR). The linuxptp-daemon
service uses the named parameters in the plugin
stanza to start linuxptp
processes, ptp4l
and phc2sys
, based on the specific hardware configuration.
In OpenShift Container Platform 4.15, the Intel E810 NIC is supported with a PtpConfig
plugin.
19.2.1. Installing the PTP Operator using the CLI
As a cluster administrator, you can install the Operator by using the CLI.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports PTP.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a namespace for the PTP Operator.
Save the following YAML in the
ptp-namespace.yaml
file:apiVersion: v1 kind: Namespace metadata: name: openshift-ptp annotations: workload.openshift.io/allowed: management labels: name: openshift-ptp openshift.io/cluster-monitoring: "true"
apiVersion: v1 kind: Namespace metadata: name: openshift-ptp annotations: workload.openshift.io/allowed: management labels: name: openshift-ptp openshift.io/cluster-monitoring: "true"
Copy to Clipboard Copied! Create the
Namespace
CR:oc create -f ptp-namespace.yaml
$ oc create -f ptp-namespace.yaml
Copy to Clipboard Copied!
Create an Operator group for the PTP Operator.
Save the following YAML in the
ptp-operatorgroup.yaml
file:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ptp-operators namespace: openshift-ptp spec: targetNamespaces: - openshift-ptp
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ptp-operators namespace: openshift-ptp spec: targetNamespaces: - openshift-ptp
Copy to Clipboard Copied! Create the
OperatorGroup
CR:oc create -f ptp-operatorgroup.yaml
$ oc create -f ptp-operatorgroup.yaml
Copy to Clipboard Copied!
Subscribe to the PTP Operator.
Save the following YAML in the
ptp-sub.yaml
file:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ptp-operator-subscription namespace: openshift-ptp spec: channel: "stable" name: ptp-operator source: redhat-operators sourceNamespace: openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ptp-operator-subscription namespace: openshift-ptp spec: channel: "stable" name: ptp-operator source: redhat-operators sourceNamespace: openshift-marketplace
Copy to Clipboard Copied! Create the
Subscription
CR:oc create -f ptp-sub.yaml
$ oc create -f ptp-sub.yaml
Copy to Clipboard Copied!
To verify that the Operator is installed, enter the following command:
oc get csv -n openshift-ptp -o custom-columns=Name:.metadata.name,Phase:.status.phase
$ oc get csv -n openshift-ptp -o custom-columns=Name:.metadata.name,Phase:.status.phase
Copy to Clipboard Copied! Example output
Name Phase 4.15.0-202301261535 Succeeded
Name Phase 4.15.0-202301261535 Succeeded
Copy to Clipboard Copied!
19.2.2. Installing the PTP Operator by using the web console
As a cluster administrator, you can install the PTP Operator by using the web console.
You have to create the namespace and Operator group as mentioned in the previous section.
Procedure
Install the PTP Operator using the OpenShift Container Platform web console:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose PTP Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under A specific namespace on the cluster select openshift-ptp. Then, click Install.
Optional: Verify that the PTP Operator installed successfully:
- Switch to the Operators → Installed Operators page.
Ensure that PTP Operator is listed in the openshift-ptp project with a Status of InstallSucceeded.
NoteDuring installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, to troubleshoot further:
- Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
-
Go to the Workloads → Pods page and check the logs for pods in the
openshift-ptp
project.
19.2.3. Discovering PTP-capable network devices in your cluster
Identify PTP-capable network devices that exist in your cluster so that you can configure them
Prerequisties
- You installed the PTP Operator.
Procedure
To return a complete list of PTP capable network devices in your cluster, run the following command:
oc get NodePtpDevice -n openshift-ptp -o yaml
$ oc get NodePtpDevice -n openshift-ptp -o yaml
Copy to Clipboard Copied! Example output
apiVersion: v1 items: - apiVersion: ptp.openshift.io/v1 kind: NodePtpDevice metadata: creationTimestamp: "2022-01-27T15:16:28Z" generation: 1 name: dev-worker-0 namespace: openshift-ptp resourceVersion: "6538103" uid: d42fc9ad-bcbf-4590-b6d8-b676c642781a spec: {} status: devices: - name: eno1 - name: eno2 - name: eno3 - name: eno4 - name: enp5s0f0 - name: enp5s0f1 ...
apiVersion: v1 items: - apiVersion: ptp.openshift.io/v1 kind: NodePtpDevice metadata: creationTimestamp: "2022-01-27T15:16:28Z" generation: 1 name: dev-worker-0
1 namespace: openshift-ptp resourceVersion: "6538103" uid: d42fc9ad-bcbf-4590-b6d8-b676c642781a spec: {} status: devices:
2 - name: eno1 - name: eno2 - name: eno3 - name: eno4 - name: enp5s0f0 - name: enp5s0f1 ...
Copy to Clipboard Copied!
19.2.4. Configuring linuxptp services as a grandmaster clock
You can configure the linuxptp
services (ptp4l
, phc2sys
, ts2phc
) as grandmaster clock (T-GM) by creating a PtpConfig
custom resource (CR) that configures the host NIC.
The ts2phc
utility allows you to synchronize the system clock with the PTP grandmaster clock so that the node can stream precision clock signal to downstream PTP ordinary clocks and boundary clocks.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as T-GM for an Intel Westport Channel E810-XXVDA4T network interface.
To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
- For T-GM clocks in production environments, install an Intel E810 Westport Channel NIC in the bare-metal cluster host.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the
PtpConfig
CR. For example:Depending on your requirements, use one of the following T-GM configurations for your deployment. Save the YAML in the
grandmaster-clock-ptp-config.yaml
file:Example 19.1. PTP grandmaster clock configuration for E810 NIC
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: {} spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_master -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 1500 pins: $e810_pins # "$iface_master": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "0 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport [$iface_master] ts2phc.extts_polarity rising ts2phc.extts_correction 0 ptp4lConf: | [$iface_master] masterOnly 1 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: {} spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_master -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 1500 pins: $e810_pins # "$iface_master": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "0 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport [$iface_master] ts2phc.extts_polarity rising ts2phc.extts_correction 0 ptp4lConf: | [$iface_master] masterOnly 1 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Copy to Clipboard Copied! NoteFor E810 Westport Channel NICs, set the value for
ts2phc.nmea_serialport
to/dev/gnss0
.Create the CR by running the following command:
oc create -f grandmaster-clock-ptp-config.yaml
$ oc create -f grandmaster-clock-ptp-config.yaml
Copy to Clipboard Copied!
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-74m2g 3/3 Running 3 4d15h 10.16.230.7 compute-1.example.com ptp-operator-5f4f48d7c-x7zkf 1/1 Running 1 4d15h 10.128.1.145 compute-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-74m2g 3/3 Running 3 4d15h 10.16.230.7 compute-1.example.com ptp-operator-5f4f48d7c-x7zkf 1/1 Running 1 4d15h 10.128.1.145 compute-1.example.com
Copy to Clipboard Copied! Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
$ oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
Copy to Clipboard Copied! Example output
ts2phc[94980.334]: [ts2phc.0.config] nmea delay: 98690975 ns ts2phc[94980.334]: [ts2phc.0.config] ens3f0 extts index 0 at 1676577329.999999999 corr 0 src 1676577330.901342528 diff -1 ts2phc[94980.334]: [ts2phc.0.config] ens3f0 master offset -1 s2 freq -1 ts2phc[94980.441]: [ts2phc.0.config] nmea sentence: GNRMC,195453.00,A,4233.24427,N,07126.64420,W,0.008,,160223,,,A,V phc2sys[94980.450]: [ptp4l.0.config] CLOCK_REALTIME phc offset 943 s2 freq -89604 delay 504 phc2sys[94980.512]: [ptp4l.0.config] CLOCK_REALTIME phc offset 1000 s2 freq -89264 delay 474
ts2phc[94980.334]: [ts2phc.0.config] nmea delay: 98690975 ns ts2phc[94980.334]: [ts2phc.0.config] ens3f0 extts index 0 at 1676577329.999999999 corr 0 src 1676577330.901342528 diff -1 ts2phc[94980.334]: [ts2phc.0.config] ens3f0 master offset -1 s2 freq -1 ts2phc[94980.441]: [ts2phc.0.config] nmea sentence: GNRMC,195453.00,A,4233.24427,N,07126.64420,W,0.008,,160223,,,A,V phc2sys[94980.450]: [ptp4l.0.config] CLOCK_REALTIME phc offset 943 s2 freq -89604 delay 504 phc2sys[94980.512]: [ptp4l.0.config] CLOCK_REALTIME phc offset 1000 s2 freq -89264 delay 474
Copy to Clipboard Copied!
19.2.5. Configuring linuxptp services as a grandmaster clock for dual E810 Westport Channel NICs
You can configure the linuxptp
services (ptp4l
, phc2sys
, ts2phc
) as grandmaster clock (T-GM) for dual E810 Westport Channel NICs by creating a PtpConfig
custom resource (CR) that configures the host NICs.
For distributed RAN (D-RAN) use cases, you can configure PTP for dual NICs as follows:
- NIC one is synced to the global navigation satellite system (GNSS) time source.
-
NIC two is synced to the 1PPS timing output provided by NIC one. This configuration is provided by the PTP hardware plugin in the
PtpConfig
CR.
The dual NIC PTP T-GM configuration uses a single instance of ptp4l
and one ts2phc
process reporting two ts2phc
instances, one for each NIC. The host system clock is synchronized from the NIC that is connected to the GNSS time source.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as T-GM for dual Intel Westport Channel E810-XXVDA4T network interfaces.
To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
- For T-GM clocks in production environments, install two Intel E810 Westport Channel NICs in the bare-metal cluster host.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the
PtpConfig
CR. For example:Save the following YAML in the
grandmaster-clock-ptp-config-dual-nics.yaml
file:Example 19.2. PTP grandmaster clock configuration for dual E810 NICs
# In this example two cards $iface_nic1 and $iface_nic2 are connected via # SMA1 ports by a cable and $iface_nic2 receives 1PPS signals from $iface_nic1 apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: {} spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_nic1 -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 1500 pins: $e810_pins # "$iface_nic1": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "2 1" # "$iface_nic2": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "1 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport [$iface_nic1] ts2phc.extts_polarity rising ts2phc.extts_correction 0 [$iface_nic2] ts2phc.master 0 ts2phc.extts_polarity rising #this is a measured value in nanoseconds to compensate for SMA cable delay ts2phc.extts_correction -10 ptp4lConf: | [$iface_nic1] masterOnly 1 [$iface_nic1_1] masterOnly 1 [$iface_nic1_2] masterOnly 1 [$iface_nic1_3] masterOnly 1 [$iface_nic2] masterOnly 1 [$iface_nic2_1] masterOnly 1 [$iface_nic2_2] masterOnly 1 [$iface_nic2_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 1 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
# In this example two cards $iface_nic1 and $iface_nic2 are connected via # SMA1 ports by a cable and $iface_nic2 receives 1PPS signals from $iface_nic1 apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster namespace: openshift-ptp annotations: {} spec: profile: - name: "grandmaster" ptp4lOpts: "-2 --summary_interval -4" phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_nic1 -n 24 ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" plugins: e810: enableDefaultConfig: false settings: LocalMaxHoldoverOffSet: 1500 LocalHoldoverTimeout: 14400 MaxInSpecOffset: 1500 pins: $e810_pins # "$iface_nic1": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "2 1" # "$iface_nic2": # "U.FL2": "0 2" # "U.FL1": "0 1" # "SMA2": "0 2" # "SMA1": "1 1" ublxCmds: - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" reportOutput: false - args: #ubxtool -P 29.20 -e GPS - "-P" - "29.20" - "-e" - "GPS" reportOutput: false - args: #ubxtool -P 29.20 -d Galileo - "-P" - "29.20" - "-d" - "Galileo" reportOutput: false - args: #ubxtool -P 29.20 -d GLONASS - "-P" - "29.20" - "-d" - "GLONASS" reportOutput: false - args: #ubxtool -P 29.20 -d BeiDou - "-P" - "29.20" - "-d" - "BeiDou" reportOutput: false - args: #ubxtool -P 29.20 -d SBAS - "-P" - "29.20" - "-d" - "SBAS" reportOutput: false - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000 - "-P" - "29.20" - "-t" - "-w" - "5" - "-v" - "1" - "-e" - "SURVEYIN,600,50000" reportOutput: true - args: #ubxtool -P 29.20 -p MON-HW - "-P" - "29.20" - "-p" - "MON-HW" reportOutput: true - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248" reportOutput: true ts2phcOpts: " " ts2phcConf: | [nmea] ts2phc.master 1 [global] use_syslog 0 verbose 1 logging_level 7 ts2phc.pulsewidth 100000000 #cat /dev/GNSS to find available serial port #example value of gnss_serialport is /dev/ttyGNSS_1700_0 ts2phc.nmea_serialport $gnss_serialport [$iface_nic1] ts2phc.extts_polarity rising ts2phc.extts_correction 0 [$iface_nic2] ts2phc.master 0 ts2phc.extts_polarity rising #this is a measured value in nanoseconds to compensate for SMA cable delay ts2phc.extts_correction -10 ptp4lConf: | [$iface_nic1] masterOnly 1 [$iface_nic1_1] masterOnly 1 [$iface_nic1_2] masterOnly 1 [$iface_nic1_3] masterOnly 1 [$iface_nic2] masterOnly 1 [$iface_nic2_1] masterOnly 1 [$iface_nic2_2] masterOnly 1 [$iface_nic2_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 6 clockAccuracy 0x27 offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval 0 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval -4 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 1 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0x20 recommend: - profile: "grandmaster" priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Copy to Clipboard Copied! NoteFor E810 Westport Channel NICs, set the value for
ts2phc.nmea_serialport
to/dev/gnss0
.Create the CR by running the following command:
oc create -f grandmaster-clock-ptp-config-dual-nics.yaml
$ oc create -f grandmaster-clock-ptp-config-dual-nics.yaml
Copy to Clipboard Copied!
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-74m2g 3/3 Running 3 4d15h 10.16.230.7 compute-1.example.com ptp-operator-5f4f48d7c-x7zkf 1/1 Running 1 4d15h 10.128.1.145 compute-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-74m2g 3/3 Running 3 4d15h 10.16.230.7 compute-1.example.com ptp-operator-5f4f48d7c-x7zkf 1/1 Running 1 4d15h 10.128.1.145 compute-1.example.com
Copy to Clipboard Copied! Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
$ oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
Copy to Clipboard Copied! Example output
ts2phc[509863.660]: [ts2phc.0.config] nmea delay: 347527248 ns ts2phc[509863.660]: [ts2phc.0.config] ens2f0 extts index 0 at 1705516553.000000000 corr 0 src 1705516553.652499081 diff 0 ts2phc[509863.660]: [ts2phc.0.config] ens2f0 master offset 0 s2 freq -0 I0117 18:35:16.000146 1633226 stats.go:57] state updated for ts2phc =s2 I0117 18:35:16.000163 1633226 event.go:417] dpll State s2, gnss State s2, tsphc state s2, gm state s2, ts2phc[1705516516]:[ts2phc.0.config] ens2f0 nmea_status 1 offset 0 s2 GM[1705516516]:[ts2phc.0.config] ens2f0 T-GM-STATUS s2 ts2phc[509863.677]: [ts2phc.0.config] ens7f0 extts index 0 at 1705516553.000000010 corr -10 src 1705516553.652499081 diff 0 ts2phc[509863.677]: [ts2phc.0.config] ens7f0 master offset 0 s2 freq -0 I0117 18:35:16.016597 1633226 stats.go:57] state updated for ts2phc =s2 phc2sys[509863.719]: [ptp4l.0.config] CLOCK_REALTIME phc offset -6 s2 freq +15441 delay 510 phc2sys[509863.782]: [ptp4l.0.config] CLOCK_REALTIME phc offset -7 s2 freq +15438 delay 502
ts2phc[509863.660]: [ts2phc.0.config] nmea delay: 347527248 ns ts2phc[509863.660]: [ts2phc.0.config] ens2f0 extts index 0 at 1705516553.000000000 corr 0 src 1705516553.652499081 diff 0 ts2phc[509863.660]: [ts2phc.0.config] ens2f0 master offset 0 s2 freq -0 I0117 18:35:16.000146 1633226 stats.go:57] state updated for ts2phc =s2 I0117 18:35:16.000163 1633226 event.go:417] dpll State s2, gnss State s2, tsphc state s2, gm state s2, ts2phc[1705516516]:[ts2phc.0.config] ens2f0 nmea_status 1 offset 0 s2 GM[1705516516]:[ts2phc.0.config] ens2f0 T-GM-STATUS s2 ts2phc[509863.677]: [ts2phc.0.config] ens7f0 extts index 0 at 1705516553.000000010 corr -10 src 1705516553.652499081 diff 0 ts2phc[509863.677]: [ts2phc.0.config] ens7f0 master offset 0 s2 freq -0 I0117 18:35:16.016597 1633226 stats.go:57] state updated for ts2phc =s2 phc2sys[509863.719]: [ptp4l.0.config] CLOCK_REALTIME phc offset -6 s2 freq +15441 delay 510 phc2sys[509863.782]: [ptp4l.0.config] CLOCK_REALTIME phc offset -7 s2 freq +15438 delay 502
Copy to Clipboard Copied!
19.2.5.1. Grandmaster clock PtpConfig configuration reference
The following reference information describes the configuration options for the PtpConfig
custom resource (CR) that configures the linuxptp
services (ptp4l
, phc2sys
, ts2phc
) as a grandmaster clock.
PtpConfig CR field | Description |
---|---|
|
Specify an array of
The plugin mechanism allows the PTP Operator to do automated hardware configuration. For the Intel Westport Channel NIC, when |
|
Specify system configuration options for the |
|
Specify the required configuration to start |
| Specify the maximum amount of time to wait for the transmit (TX) timestamp from the sender before discarding the data. |
| Specify the JBOD boundary clock time delay value. This value is used to correct the time values that are passed between the network time devices. |
|
Specify system config options for the Note
Ensure that the network interface listed here is configured as grandmaster and is referenced as required in the |
|
Configure the scheduling policy for |
|
Set an integer value from 1-65 to configure FIFO priority for |
|
Optional. If |
|
Sets the configuration for the
|
|
Set options for the |
|
Specify an array of one or more |
|
Specify the |
|
Specify the |
|
Specify |
|
Set |
|
Set |
19.2.5.2. Grandmaster clock class sync state reference
The following table describes the PTP grandmaster clock (T-GM) gm.ClockClass
states. Clock class states categorize T-GM clocks based on their accuracy and stability with regard to the Primary Reference Time Clock (PRTC) or other timing source.
Holdover specification is the amount of time a PTP clock can maintain synchronization without receiving updates from the primary time source.
Clock class state | Description |
---|---|
|
T-GM clock is connected to a PRTC in |
|
T-GM clock is in |
|
T-GM clock is in |
|
T-GM clock is in |
For more information, see "Phase/time traceability information", ITU-T G.8275.1/Y.1369.1 Recommendations.
19.2.5.3. Intel Westport Channel E810 hardware configuration reference
Use this information to understand how to use the Intel E810-XXVDA4T hardware plugin to configure the E810 network interface as PTP grandmaster clock. Hardware pin configuration determines how the network interface interacts with other components and devices in the system. The E810-XXVDA4T NIC has four connectors for external 1PPS signals: SMA1
, SMA2
, U.FL1
, and U.FL2
.
Hardware pin | Recommended setting | Description |
---|---|---|
|
|
Disables the |
|
|
Disables the |
|
|
Disables the |
|
|
Disables the |
SMA1
and U.FL1
connectors share channel one. SMA2
and U.FL2
connectors share channel two.
Set spec.profile.plugins.e810.ublxCmds
parameters to configure the GNSS clock in the PtpConfig
custom resource (CR).
You must configure an offset value to compensate for T-GM GPS antenna cable signal delay. To configure the optimal T-GM antenna offset value, make precise measurements of the GNSS antenna cable signal delay. Red Hat cannot assist in this measurement or provide any values for the required delay offsets.
Each of these ublxCmds
stanzas correspond to a configuration that is applied to the host NIC by using ubxtool
commands. For example:
ublxCmds: - args: - "-P" - "29.20" - "-z" - "CFG-HW-ANT_CFG_VOLTCTRL,1" - "-z" - "CFG-TP-ANT_CABLEDELAY,<antenna_delay_offset>" reportOutput: false
ublxCmds:
- args:
- "-P"
- "29.20"
- "-z"
- "CFG-HW-ANT_CFG_VOLTCTRL,1"
- "-z"
- "CFG-TP-ANT_CABLEDELAY,<antenna_delay_offset>"
reportOutput: false
- 1
- Measured T-GM antenna delay offset in nanoseconds. To get the required delay offset value, you must measure the cable delay using external test equipment.
The following table describes the equivalent ubxtool
commands:
ubxtool command | Description |
---|---|
|
Enables antenna voltage control, allows antenna status to be reported in the |
| Enables the antenna to receive GPS signals. |
| Configures the antenna to receive signal from the Galileo GPS satellite. |
| Disables the antenna from receiving signal from the GLONASS GPS satellite. |
| Disables the antenna from receiving signal from the BeiDou GPS satellite. |
| Disables the antenna from receiving signal from the SBAS GPS satellite. |
| Configures the GNSS receiver survey-in process to improve its initial position estimate. This can take up to 24 hours to achieve an optimal result. |
| Runs a single automated scan of the hardware and reports on the NIC state and configuration settings. |
19.2.5.4. Dual E810 Westport Channel NIC configuration reference
Use this information to understand how to use the Intel E810-XXVDA4T hardware plugin to configure a pair of E810 network interfaces as PTP grandmaster clock (T-GM).
Before you configure the dual NIC cluster host, you must connect the two NICs with an SMA1 cable using the 1PPS faceplace connections.
When you configure a dual NIC T-GM, you need to compensate for the 1PPS signal delay that occurs when you connect the NICs using the SMA1 connection ports. Various factors such as cable length, ambient temperature, and component and manufacturing tolerances can affect the signal delay. To compensate for the delay, you must calculate the specific value that you use to offset the signal delay.
PtpConfig field | Description |
---|---|
| Configure the E810 hardware pins using the PTP Operator E810 hardware plugin.
|
|
Use the |
|
Set the value of |
19.2.6. Configuring dynamic leap seconds handling for PTP grandmaster clocks
The PTP Operator container image includes the latest leap-seconds.list
file that is available at the time of release. You can configure the PTP Operator to automatically update the leap second file by using Global Positioning System (GPS) announcements.
Leap second information is stored in an automatically generated ConfigMap
resource named leap-configmap
in the openshift-ptp
namespace. The PTP Operator mounts the leap-configmap
resource as a volume in the linuxptp-daemon
pod that is accessible by the ts2phc
process.
If the GPS satellite broadcasts new leap second data, the PTP Operator updates the leap-configmap
resource with the new data. The ts2phc
process picks up the changes automatically.
The following procedure is provided as reference. The 4.15 version of the PTP Operator enables automatic leap second management by default.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have installed the PTP Operator and configured a PTP grandmaster clock (T-GM) in the cluster.
Procedure
Configure automatic leap second handling in the
phc2sysOpts
section of thePtpConfig
CR. Set the following options:phc2sysOpts: -r -u 0 -m -N 8 -R 16 -S 2 -s ens2f0 -n 24
phc2sysOpts: -r -u 0 -m -N 8 -R 16 -S 2 -s ens2f0 -n 24
1 Copy to Clipboard Copied! NotePreviously, the T-GM required an offset adjustment in the
phc2sys
configuration (-O -37
) to account for historical leap seconds. This is no longer needed.Configure the Intel e810 NIC to enable periodical reporting of
NAV-TIMELS
messages by the GPS receiver in thespec.profile.plugins.e810.ublxCmds
section of thePtpConfig
CR. For example:- args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248"
- args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248 - "-P" - "29.20" - "-p" - "CFG-MSG,1,38,248"
Copy to Clipboard Copied!
Verification
Validate that the configured T-GM is receiving
NAV-TIMELS
messages from the connected GPS. Run the following command:oc -n openshift-ptp -c linuxptp-daemon-container exec -it $(oc -n openshift-ptp get pods -o name | grep daemon) -- ubxtool -t -p NAV-TIMELS -P 29.20
$ oc -n openshift-ptp -c linuxptp-daemon-container exec -it $(oc -n openshift-ptp get pods -o name | grep daemon) -- ubxtool -t -p NAV-TIMELS -P 29.20
Copy to Clipboard Copied! Example output
1722509534.4417 UBX-NAV-STATUS: iTOW 384752000 gpsFix 5 flags 0xdd fixStat 0x0 flags2 0x8 ttff 18261, msss 1367642864 1722509534.4419 UBX-NAV-TIMELS: iTOW 384752000 version 0 reserved2 0 0 0 srcOfCurrLs 2 currLs 18 srcOfLsChange 2 lsChange 0 timeToLsEvent 70376866 dateOfLsGpsWn 2441 dateOfLsGpsDn 7 reserved2 0 0 0 valid x3 1722509534.4421 UBX-NAV-CLOCK: iTOW 384752000 clkB 784281 clkD 435 tAcc 3 fAcc 215 1722509535.4477 UBX-NAV-STATUS: iTOW 384753000 gpsFix 5 flags 0xdd fixStat 0x0 flags2 0x8 ttff 18261, msss 1367643864 1722509535.4479 UBX-NAV-CLOCK: iTOW 384753000 clkB 784716 clkD 435 tAcc 3 fAcc 218
1722509534.4417 UBX-NAV-STATUS: iTOW 384752000 gpsFix 5 flags 0xdd fixStat 0x0 flags2 0x8 ttff 18261, msss 1367642864 1722509534.4419 UBX-NAV-TIMELS: iTOW 384752000 version 0 reserved2 0 0 0 srcOfCurrLs 2 currLs 18 srcOfLsChange 2 lsChange 0 timeToLsEvent 70376866 dateOfLsGpsWn 2441 dateOfLsGpsDn 7 reserved2 0 0 0 valid x3 1722509534.4421 UBX-NAV-CLOCK: iTOW 384752000 clkB 784281 clkD 435 tAcc 3 fAcc 215 1722509535.4477 UBX-NAV-STATUS: iTOW 384753000 gpsFix 5 flags 0xdd fixStat 0x0 flags2 0x8 ttff 18261, msss 1367643864 1722509535.4479 UBX-NAV-CLOCK: iTOW 384753000 clkB 784716 clkD 435 tAcc 3 fAcc 218
Copy to Clipboard Copied! Validate that the
leap-configmap
resource has been successfully generated by the PTP Operator and is up to date with the latest version of the leap-seconds.list. Run the following command:oc -n openshift-ptp get configmap leap-configmap -o jsonpath='{.data.<node_name>}'
$ oc -n openshift-ptp get configmap leap-configmap -o jsonpath='{.data.<node_name>}'
1 Copy to Clipboard Copied! Example output
Do not edit This file is generated automatically by linuxptp-daemon
# Do not edit # This file is generated automatically by linuxptp-daemon #$ 3913697179 #@ 4291747200 2272060800 10 # 1 Jan 1972 2287785600 11 # 1 Jul 1972 2303683200 12 # 1 Jan 1973 2335219200 13 # 1 Jan 1974 2366755200 14 # 1 Jan 1975 2398291200 15 # 1 Jan 1976 2429913600 16 # 1 Jan 1977 2461449600 17 # 1 Jan 1978 2492985600 18 # 1 Jan 1979 2524521600 19 # 1 Jan 1980 2571782400 20 # 1 Jul 1981 2603318400 21 # 1 Jul 1982 2634854400 22 # 1 Jul 1983 2698012800 23 # 1 Jul 1985 2776982400 24 # 1 Jan 1988 2840140800 25 # 1 Jan 1990 2871676800 26 # 1 Jan 1991 2918937600 27 # 1 Jul 1992 2950473600 28 # 1 Jul 1993 2982009600 29 # 1 Jul 1994 3029443200 30 # 1 Jan 1996 3076704000 31 # 1 Jul 1997 3124137600 32 # 1 Jan 1999 3345062400 33 # 1 Jan 2006 3439756800 34 # 1 Jan 2009 3550089600 35 # 1 Jul 2012 3644697600 36 # 1 Jul 2015 3692217600 37 # 1 Jan 2017 #h e65754d4 8f39962b aa854a61 661ef546 d2af0bfa
Copy to Clipboard Copied!
19.2.7. Configuring linuxptp services as a boundary clock
You can configure the linuxptp
services (ptp4l
, phc2sys
) as boundary clock by creating a PtpConfig
custom resource (CR) object.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as the boundary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the following
PtpConfig
CR, and then save the YAML in theboundary-clock-ptp-config.yaml
file.Example PTP boundary clock configuration
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock namespace: openshift-ptp annotations: {} spec: profile: - name: boundary-clock ptp4lOpts: "-2" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | # The interface name is hardware-specific [$iface_slave] masterOnly 0 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 248 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 135 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: boundary-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock namespace: openshift-ptp annotations: {} spec: profile: - name: boundary-clock ptp4lOpts: "-2" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | # The interface name is hardware-specific [$iface_slave] masterOnly 0 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 248 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 135 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: boundary-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Copy to Clipboard Copied! Table 19.6. PTP boundary clock CR configuration options CR field Description name
The name of the
PtpConfig
CR.profile
Specify an array of one or more
profile
objects.name
Specify the name of a profile object which uniquely identifies a profile object.
ptp4lOpts
Specify system config options for the
ptp4l
service. The options should not include the network interface name-i <interface>
and service config file-f /etc/ptp4l.conf
because the network interface name and the service config file are automatically appended.ptp4lConf
Specify the required configuration to start
ptp4l
as boundary clock. For example,ens1f0
synchronizes from a grandmaster clock andens1f3
synchronizes connected devices.<interface_1>
The interface that receives the synchronization clock.
<interface_2>
The interface that sends the synchronization clock.
tx_timestamp_timeout
For Intel Columbiaville 800 Series NICs, set
tx_timestamp_timeout
to50
.boundary_clock_jbod
For Intel Columbiaville 800 Series NICs, ensure
boundary_clock_jbod
is set to0
. For Intel Fortville X710 Series NICs, ensureboundary_clock_jbod
is set to1
.phc2sysOpts
Specify system config options for the
phc2sys
service. If this field is empty, the PTP Operator does not start thephc2sys
service.ptpSchedulingPolicy
Scheduling policy for ptp4l and phc2sys processes. Default value is
SCHED_OTHER
. UseSCHED_FIFO
on systems that support FIFO scheduling.ptpSchedulingPriority
Integer value from 1-65 used to set FIFO priority for
ptp4l
andphc2sys
processes whenptpSchedulingPolicy
is set toSCHED_FIFO
. TheptpSchedulingPriority
field is not used whenptpSchedulingPolicy
is set toSCHED_OTHER
.ptpClockThreshold
Optional. If
ptpClockThreshold
is not present, default values are used for theptpClockThreshold
fields.ptpClockThreshold
configures how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.recommend
Specify an array of one or more
recommend
objects that define rules on how theprofile
should be applied to nodes..recommend.profile
Specify the
.recommend.profile
object name defined in theprofile
section..recommend.priority
Specify the
priority
with an integer value between0
and99
. A larger number gets lower priority, so a priority of99
is lower than a priority of10
. If a node can be matched with multiple profiles according to rules defined in thematch
field, the profile with the higher priority is applied to that node..recommend.match
Specify
.recommend.match
rules withnodeLabel
ornodeName
values..recommend.match.nodeLabel
Set
nodeLabel
with thekey
of thenode.Labels
field from the node object by using theoc get nodes --show-labels
command. For example,node-role.kubernetes.io/worker
..recommend.match.nodeName
Set
nodeName
with the value of thenode.Name
field from the node object by using theoc get nodes
command. For example,compute-1.example.com
.Create the CR by running the following command:
oc create -f boundary-clock-ptp-config.yaml
$ oc create -f boundary-clock-ptp-config.yaml
Copy to Clipboard Copied!
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-4xkbb 1/1 Running 0 43m 10.1.196.24 compute-0.example.com linuxptp-daemon-tdspf 1/1 Running 0 43m 10.1.196.25 compute-1.example.com ptp-operator-657bbb64c8-2f8sj 1/1 Running 0 43m 10.129.0.61 control-plane-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-4xkbb 1/1 Running 0 43m 10.1.196.24 compute-0.example.com linuxptp-daemon-tdspf 1/1 Running 0 43m 10.1.196.25 compute-1.example.com ptp-operator-657bbb64c8-2f8sj 1/1 Running 0 43m 10.129.0.61 control-plane-1.example.com
Copy to Clipboard Copied! Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
$ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
Copy to Clipboard Copied! Example output
I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to: I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------ I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1 I1115 09:41:17.117616 4143292 daemon.go:102] Interface: I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24 I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to: I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------ I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1 I1115 09:41:17.117616 4143292 daemon.go:102] Interface: I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24 I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
Copy to Clipboard Copied!
19.2.7.1. Configuring linuxptp services as boundary clocks for dual NIC hardware
You can configure the linuxptp
services (ptp4l
, phc2sys
) as boundary clocks for dual-NIC hardware by creating a PtpConfig
custom resource (CR) object for each NIC.
Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l
instances for each NIC feeding the downstream clocks.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create two separate
PtpConfig
CRs, one for each NIC, using the reference CR in "Configuring linuxptp services as a boundary clock" as the basis for each CR. For example:Create
boundary-clock-ptp-config-nic1.yaml
, specifying values forphc2sysOpts
:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock-ptp-config-nic1 namespace: openshift-ptp spec: profile: - name: "profile1" ptp4lOpts: "-2 --summary_interval -4" ptp4lConf: | [ens5f1] masterOnly 1 [ens5f0] masterOnly 0 ... phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16"
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock-ptp-config-nic1 namespace: openshift-ptp spec: profile: - name: "profile1" ptp4lOpts: "-2 --summary_interval -4" ptp4lConf: |
1 [ens5f1] masterOnly 1 [ens5f0] masterOnly 0 ... phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16"
2 Copy to Clipboard Copied! - 1
- Specify the required interfaces to start
ptp4l
as a boundary clock. For example,ens5f0
synchronizes from a grandmaster clock andens5f1
synchronizes connected devices. - 2
- Required
phc2sysOpts
values.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics.
Create
boundary-clock-ptp-config-nic2.yaml
, removing thephc2sysOpts
field altogether to disable thephc2sys
service for the second NIC:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock-ptp-config-nic2 namespace: openshift-ptp spec: profile: - name: "profile2" ptp4lOpts: "-2 --summary_interval -4" ptp4lConf: | [ens7f1] masterOnly 1 [ens7f0] masterOnly 0 ...
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock-ptp-config-nic2 namespace: openshift-ptp spec: profile: - name: "profile2" ptp4lOpts: "-2 --summary_interval -4" ptp4lConf: |
1 [ens7f1] masterOnly 1 [ens7f0] masterOnly 0 ...
Copy to Clipboard Copied! - 1
- Specify the required interfaces to start
ptp4l
as a boundary clock on the second NIC.
NoteYou must completely remove the
phc2sysOpts
field from the secondPtpConfig
CR to disable thephc2sys
service on the second NIC.
Create the dual NIC
PtpConfig
CRs by running the following commands:Create the CR that configures PTP for the first NIC:
oc create -f boundary-clock-ptp-config-nic1.yaml
$ oc create -f boundary-clock-ptp-config-nic1.yaml
Copy to Clipboard Copied! Create the CR that configures PTP for the second NIC:
oc create -f boundary-clock-ptp-config-nic2.yaml
$ oc create -f boundary-clock-ptp-config-nic2.yaml
Copy to Clipboard Copied!
Verification
Check that the PTP Operator has applied the
PtpConfig
CRs for both NICs. Examine the logs for thelinuxptp
daemon corresponding to the node that has the dual NIC hardware installed. For example, run the following command:oc logs linuxptp-daemon-cvgr6 -n openshift-ptp -c linuxptp-daemon-container
$ oc logs linuxptp-daemon-cvgr6 -n openshift-ptp -c linuxptp-daemon-container
Copy to Clipboard Copied! Example output
ptp4l[80828.335]: [ptp4l.1.config] master offset 5 s2 freq -5727 path delay 519 ptp4l[80828.343]: [ptp4l.0.config] master offset -5 s2 freq -10607 path delay 533 phc2sys[80828.390]: [ptp4l.0.config] CLOCK_REALTIME phc offset 1 s2 freq -87239 delay 539
ptp4l[80828.335]: [ptp4l.1.config] master offset 5 s2 freq -5727 path delay 519 ptp4l[80828.343]: [ptp4l.0.config] master offset -5 s2 freq -10607 path delay 533 phc2sys[80828.390]: [ptp4l.0.config] CLOCK_REALTIME phc offset 1 s2 freq -87239 delay 539
Copy to Clipboard Copied!
19.2.8. Configuring linuxptp services as an ordinary clock
You can configure linuxptp
services (ptp4l
, phc2sys
) as ordinary clock by creating a PtpConfig
custom resource (CR) object.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as an ordinary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is required only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the following
PtpConfig
CR, and then save the YAML in theordinary-clock-ptp-config.yaml
file.Example PTP ordinary clock configuration
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: ordinary-clock namespace: openshift-ptp annotations: {} spec: profile: - name: ordinary-clock # The interface name is hardware-specific interface: $interface ptp4lOpts: "-2 -s" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: ordinary-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: ordinary-clock namespace: openshift-ptp annotations: {} spec: profile: - name: ordinary-clock # The interface name is hardware-specific interface: $interface ptp4lOpts: "-2 -s" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: ordinary-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Copy to Clipboard Copied! Table 19.7. PTP ordinary clock CR configuration options CR field Description name
The name of the
PtpConfig
CR.profile
Specify an array of one or more
profile
objects. Each profile must be uniquely named.interface
Specify the network interface to be used by the
ptp4l
service, for exampleens787f1
.ptp4lOpts
Specify system config options for the
ptp4l
service, for example-2
to select the IEEE 802.3 network transport. The options should not include the network interface name-i <interface>
and service config file-f /etc/ptp4l.conf
because the network interface name and the service config file are automatically appended. Append--summary_interval -4
to use PTP fast events with this interface.phc2sysOpts
Specify system config options for the
phc2sys
service. If this field is empty, the PTP Operator does not start thephc2sys
service. For Intel Columbiaville 800 Series NICs, setphc2sysOpts
options to-a -r -m -n 24 -N 8 -R 16
.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics.ptp4lConf
Specify a string that contains the configuration to replace the default
/etc/ptp4l.conf
file. To use the default configuration, leave the field empty.tx_timestamp_timeout
For Intel Columbiaville 800 Series NICs, set
tx_timestamp_timeout
to50
.boundary_clock_jbod
For Intel Columbiaville 800 Series NICs, set
boundary_clock_jbod
to0
.ptpSchedulingPolicy
Scheduling policy for
ptp4l
andphc2sys
processes. Default value isSCHED_OTHER
. UseSCHED_FIFO
on systems that support FIFO scheduling.ptpSchedulingPriority
Integer value from 1-65 used to set FIFO priority for
ptp4l
andphc2sys
processes whenptpSchedulingPolicy
is set toSCHED_FIFO
. TheptpSchedulingPriority
field is not used whenptpSchedulingPolicy
is set toSCHED_OTHER
.ptpClockThreshold
Optional. If
ptpClockThreshold
is not present, default values are used for theptpClockThreshold
fields.ptpClockThreshold
configures how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.recommend
Specify an array of one or more
recommend
objects that define rules on how theprofile
should be applied to nodes..recommend.profile
Specify the
.recommend.profile
object name defined in theprofile
section..recommend.priority
Set
.recommend.priority
to0
for ordinary clock..recommend.match
Specify
.recommend.match
rules withnodeLabel
ornodeName
values..recommend.match.nodeLabel
Set
nodeLabel
with thekey
of thenode.Labels
field from the node object by using theoc get nodes --show-labels
command. For example,node-role.kubernetes.io/worker
..recommend.match.nodeName
Set
nodeName
with the value of thenode.Name
field from the node object by using theoc get nodes
command. For example,compute-1.example.com
.Create the
PtpConfig
CR by running the following command:oc create -f ordinary-clock-ptp-config.yaml
$ oc create -f ordinary-clock-ptp-config.yaml
Copy to Clipboard Copied!
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-4xkbb 1/1 Running 0 43m 10.1.196.24 compute-0.example.com linuxptp-daemon-tdspf 1/1 Running 0 43m 10.1.196.25 compute-1.example.com ptp-operator-657bbb64c8-2f8sj 1/1 Running 0 43m 10.129.0.61 control-plane-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-4xkbb 1/1 Running 0 43m 10.1.196.24 compute-0.example.com linuxptp-daemon-tdspf 1/1 Running 0 43m 10.1.196.25 compute-1.example.com ptp-operator-657bbb64c8-2f8sj 1/1 Running 0 43m 10.129.0.61 control-plane-1.example.com
Copy to Clipboard Copied! Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
$ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
Copy to Clipboard Copied! Example output
I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to: I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------ I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1 I1115 09:41:17.117616 4143292 daemon.go:102] Interface: ens787f1 I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 -s I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24 I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to: I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------ I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1 I1115 09:41:17.117616 4143292 daemon.go:102] Interface: ens787f1 I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 -s I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24 I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
Copy to Clipboard Copied!
19.2.8.1. Intel Columbiaville E800 series NIC as PTP ordinary clock reference
The following table describes the changes that you must make to the reference PTP configuration to use Intel Columbiaville E800 series NICs as ordinary clocks. Make the changes in a PtpConfig
custom resource (CR) that you apply to the cluster.
PTP configuration | Recommended setting |
---|---|
|
|
|
|
|
|
For phc2sysOpts
, -m
prints messages to stdout
. The linuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics.
19.2.9. Configuring FIFO priority scheduling for PTP hardware
In telco or other deployment types that require low latency performance, PTP daemon threads run in a constrained CPU footprint alongside the rest of the infrastructure components. By default, PTP threads run with the SCHED_OTHER
policy. Under high load, these threads might not get the scheduling latency they require for error-free operation.
To mitigate against potential scheduling latency errors, you can configure the PTP Operator linuxptp
services to allow threads to run with a SCHED_FIFO
policy. If SCHED_FIFO
is set for a PtpConfig
CR, then ptp4l
and phc2sys
will run in the parent container under chrt
with a priority set by the ptpSchedulingPriority
field of the PtpConfig
CR.
Setting ptpSchedulingPolicy
is optional, and is only required if you are experiencing latency errors.
Procedure
Edit the
PtpConfig
CR profile:oc edit PtpConfig -n openshift-ptp
$ oc edit PtpConfig -n openshift-ptp
Copy to Clipboard Copied! Change the
ptpSchedulingPolicy
andptpSchedulingPriority
fields:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: <ptp_config_name> namespace: openshift-ptp ... spec: profile: - name: "profile1" ... ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: <ptp_config_name> namespace: openshift-ptp ... spec: profile: - name: "profile1" ... ptpSchedulingPolicy: SCHED_FIFO
1 ptpSchedulingPriority: 10
2 Copy to Clipboard Copied! -
Save and exit to apply the changes to the
PtpConfig
CR.
Verification
Get the name of the
linuxptp-daemon
pod and corresponding node where thePtpConfig
CR has been applied:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-gmv2n 3/3 Running 0 1d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-lgm55 3/3 Running 0 1d17h 10.1.196.25 compute-1.example.com ptp-operator-3r4dcvf7f4-zndk7 1/1 Running 0 1d7h 10.129.0.61 control-plane-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-gmv2n 3/3 Running 0 1d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-lgm55 3/3 Running 0 1d17h 10.1.196.25 compute-1.example.com ptp-operator-3r4dcvf7f4-zndk7 1/1 Running 0 1d7h 10.129.0.61 control-plane-1.example.com
Copy to Clipboard Copied! Check that the
ptp4l
process is running with the updatedchrt
FIFO priority:oc -n openshift-ptp logs linuxptp-daemon-lgm55 -c linuxptp-daemon-container|grep chrt
$ oc -n openshift-ptp logs linuxptp-daemon-lgm55 -c linuxptp-daemon-container|grep chrt
Copy to Clipboard Copied! Example output
I1216 19:24:57.091872 1600715 daemon.go:285] /bin/chrt -f 65 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config -2 --summary_interval -4 -m
I1216 19:24:57.091872 1600715 daemon.go:285] /bin/chrt -f 65 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config -2 --summary_interval -4 -m
Copy to Clipboard Copied!
19.2.10. Configuring log filtering for linuxptp services
The linuxptp
daemon generates logs that you can use for debugging purposes. In telco or other deployment types that feature a limited storage capacity, these logs can add to the storage demand.
To reduce the number log messages, you can configure the PtpConfig
custom resource (CR) to exclude log messages that report the master offset
value. The master offset
log message reports the difference between the current node’s clock and the master clock in nanoseconds.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Edit the
PtpConfig
CR:oc edit PtpConfig -n openshift-ptp
$ oc edit PtpConfig -n openshift-ptp
Copy to Clipboard Copied! In
spec.profile
, add theptpSettings.logReduce
specification and set the value totrue
:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: <ptp_config_name> namespace: openshift-ptp ... spec: profile: - name: "profile1" ... ptpSettings: logReduce: "true"
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: <ptp_config_name> namespace: openshift-ptp ... spec: profile: - name: "profile1" ... ptpSettings: logReduce: "true"
Copy to Clipboard Copied! NoteFor debugging purposes, you can revert this specification to
False
to include the master offset messages.-
Save and exit to apply the changes to the
PtpConfig
CR.
Verification
Get the name of the
linuxptp-daemon
pod and corresponding node where thePtpConfig
CR has been applied:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-gmv2n 3/3 Running 0 1d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-lgm55 3/3 Running 0 1d17h 10.1.196.25 compute-1.example.com ptp-operator-3r4dcvf7f4-zndk7 1/1 Running 0 1d7h 10.129.0.61 control-plane-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-gmv2n 3/3 Running 0 1d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-lgm55 3/3 Running 0 1d17h 10.1.196.25 compute-1.example.com ptp-operator-3r4dcvf7f4-zndk7 1/1 Running 0 1d7h 10.129.0.61 control-plane-1.example.com
Copy to Clipboard Copied! Verify that master offset messages are excluded from the logs by running the following command:
oc -n openshift-ptp logs <linux_daemon_container> -c linuxptp-daemon-container | grep "master offset"
$ oc -n openshift-ptp logs <linux_daemon_container> -c linuxptp-daemon-container | grep "master offset"
1 Copy to Clipboard Copied! - 1
- <linux_daemon_container> is the name of the
linuxptp-daemon
pod, for examplelinuxptp-daemon-gmv2n
.
When you configure the
logReduce
specification, this command does not report any instances ofmaster offset
in the logs of thelinuxptp
daemon.
19.2.11. Troubleshooting common PTP Operator issues
Troubleshoot common problems with the PTP Operator by performing the following steps.
Prerequisites
-
Install the OpenShift Container Platform CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator on a bare-metal cluster with hosts that support PTP.
Procedure
Check the Operator and operands are successfully deployed in the cluster for the configured nodes.
oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-lmvgn 3/3 Running 0 4d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-qhfg7 3/3 Running 0 4d17h 10.1.196.25 compute-1.example.com ptp-operator-6b8dcbf7f4-zndk7 1/1 Running 0 5d7h 10.129.0.61 control-plane-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-lmvgn 3/3 Running 0 4d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-qhfg7 3/3 Running 0 4d17h 10.1.196.25 compute-1.example.com ptp-operator-6b8dcbf7f4-zndk7 1/1 Running 0 5d7h 10.129.0.61 control-plane-1.example.com
Copy to Clipboard Copied! NoteWhen the PTP fast event bus is enabled, the number of ready
linuxptp-daemon
pods is3/3
. If the PTP fast event bus is not enabled,2/2
is displayed.Check that supported hardware is found in the cluster.
oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io
$ oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io
Copy to Clipboard Copied! Example output
NAME AGE control-plane-0.example.com 10d control-plane-1.example.com 10d compute-0.example.com 10d compute-1.example.com 10d compute-2.example.com 10d
NAME AGE control-plane-0.example.com 10d control-plane-1.example.com 10d compute-0.example.com 10d compute-1.example.com 10d compute-2.example.com 10d
Copy to Clipboard Copied! Check the available PTP network interfaces for a node:
oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io <node_name> -o yaml
$ oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io <node_name> -o yaml
Copy to Clipboard Copied! where:
- <node_name>
Specifies the node you want to query, for example,
compute-0.example.com
.Example output
apiVersion: ptp.openshift.io/v1 kind: NodePtpDevice metadata: creationTimestamp: "2021-09-14T16:52:33Z" generation: 1 name: compute-0.example.com namespace: openshift-ptp resourceVersion: "177400" uid: 30413db0-4d8d-46da-9bef-737bacd548fd spec: {} status: devices: - name: eno1 - name: eno2 - name: eno3 - name: eno4 - name: enp5s0f0 - name: enp5s0f1
apiVersion: ptp.openshift.io/v1 kind: NodePtpDevice metadata: creationTimestamp: "2021-09-14T16:52:33Z" generation: 1 name: compute-0.example.com namespace: openshift-ptp resourceVersion: "177400" uid: 30413db0-4d8d-46da-9bef-737bacd548fd spec: {} status: devices: - name: eno1 - name: eno2 - name: eno3 - name: eno4 - name: enp5s0f0 - name: enp5s0f1
Copy to Clipboard Copied!
Check that the PTP interface is successfully synchronized to the primary clock by accessing the
linuxptp-daemon
pod for the corresponding node.Get the name of the
linuxptp-daemon
pod and corresponding node you want to troubleshoot by running the following command:oc get pods -n openshift-ptp -o wide
$ oc get pods -n openshift-ptp -o wide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-lmvgn 3/3 Running 0 4d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-qhfg7 3/3 Running 0 4d17h 10.1.196.25 compute-1.example.com ptp-operator-6b8dcbf7f4-zndk7 1/1 Running 0 5d7h 10.129.0.61 control-plane-1.example.com
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-lmvgn 3/3 Running 0 4d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-qhfg7 3/3 Running 0 4d17h 10.1.196.25 compute-1.example.com ptp-operator-6b8dcbf7f4-zndk7 1/1 Running 0 5d7h 10.129.0.61 control-plane-1.example.com
Copy to Clipboard Copied! Remote shell into the required
linuxptp-daemon
container:oc rsh -n openshift-ptp -c linuxptp-daemon-container <linux_daemon_container>
$ oc rsh -n openshift-ptp -c linuxptp-daemon-container <linux_daemon_container>
Copy to Clipboard Copied! where:
- <linux_daemon_container>
-
is the container you want to diagnose, for example
linuxptp-daemon-lmvgn
.
In the remote shell connection to the
linuxptp-daemon
container, use the PTP Management Client (pmc
) tool to diagnose the network interface. Run the followingpmc
command to check the sync status of the PTP device, for exampleptp4l
.pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'
# pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'
Copy to Clipboard Copied! Example output when the node is successfully synced to the primary clock
sending: GET PORT_DATA_SET 40a6b7.fffe.166ef0-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET portIdentity 40a6b7.fffe.166ef0-1 portState SLAVE logMinDelayReqInterval -4 peerMeanPathDelay 0 logAnnounceInterval -3 announceReceiptTimeout 3 logSyncInterval -4 delayMechanism 1 logMinPdelayReqInterval -4 versionNumber 2
sending: GET PORT_DATA_SET 40a6b7.fffe.166ef0-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET portIdentity 40a6b7.fffe.166ef0-1 portState SLAVE logMinDelayReqInterval -4 peerMeanPathDelay 0 logAnnounceInterval -3 announceReceiptTimeout 3 logSyncInterval -4 delayMechanism 1 logMinPdelayReqInterval -4 versionNumber 2
Copy to Clipboard Copied!
For GNSS-sourced grandmaster clocks, verify that the in-tree NIC ice driver is correct by running the following command, for example:
oc rsh -n openshift-ptp -c linuxptp-daemon-container linuxptp-daemon-74m2g ethtool -i ens7f0
$ oc rsh -n openshift-ptp -c linuxptp-daemon-container linuxptp-daemon-74m2g ethtool -i ens7f0
Copy to Clipboard Copied! Example output
driver: ice version: 5.14.0-356.bz2232515.el9.x86_64 firmware-version: 4.20 0x8001778b 1.3346.0
driver: ice version: 5.14.0-356.bz2232515.el9.x86_64 firmware-version: 4.20 0x8001778b 1.3346.0
Copy to Clipboard Copied! For GNSS-sourced grandmaster clocks, verify that the
linuxptp-daemon
container is receiving signal from the GNSS antenna. If the container is not receiving the GNSS signal, the/dev/gnss0
file is not populated. To verify, run the following command:oc rsh -n openshift-ptp -c linuxptp-daemon-container linuxptp-daemon-jnz6r cat /dev/gnss0
$ oc rsh -n openshift-ptp -c linuxptp-daemon-container linuxptp-daemon-jnz6r cat /dev/gnss0
Copy to Clipboard Copied! Example output
$GNRMC,125223.00,A,4233.24463,N,07126.64561,W,0.000,,300823,,,A,V*0A $GNVTG,,T,,M,0.000,N,0.000,K,A*3D $GNGGA,125223.00,4233.24463,N,07126.64561,W,1,12,99.99,98.6,M,-33.1,M,,*7E $GNGSA,A,3,25,17,19,11,12,06,05,04,09,20,,,99.99,99.99,99.99,1*37 $GPGSV,3,1,10,04,12,039,41,05,31,222,46,06,50,064,48,09,28,064,42,1*62
$GNRMC,125223.00,A,4233.24463,N,07126.64561,W,0.000,,300823,,,A,V*0A $GNVTG,,T,,M,0.000,N,0.000,K,A*3D $GNGGA,125223.00,4233.24463,N,07126.64561,W,1,12,99.99,98.6,M,-33.1,M,,*7E $GNGSA,A,3,25,17,19,11,12,06,05,04,09,20,,,99.99,99.99,99.99,1*37 $GPGSV,3,1,10,04,12,039,41,05,31,222,46,06,50,064,48,09,28,064,42,1*62
Copy to Clipboard Copied!
19.2.12. Collecting PTP Operator data
You can use the oc adm must-gather
command to collect information about your cluster, including features and objects associated with PTP Operator.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. -
You have installed the OpenShift CLI (
oc
). - You have installed the PTP Operator.
Procedure
To collect PTP Operator data with
must-gather
, you must specify the PTP Operatormust-gather
image.oc adm must-gather --image=registry.redhat.io/openshift4/ptp-must-gather-rhel8:v4.15
$ oc adm must-gather --image=registry.redhat.io/openshift4/ptp-must-gather-rhel8:v4.15
Copy to Clipboard Copied!
19.3. Using the PTP hardware fast event notifications framework
Cloud native applications such as virtual RAN (vRAN) require access to notifications about hardware timing events that are critical to the functioning of the overall network. Precision Time Protocol (PTP) clock synchronization errors can negatively affect the performance and reliability of your low-latency application, for example, a vRAN application running in a distributed unit (DU).
19.3.1. About PTP and clock synchronization error events
Loss of PTP synchronization is a critical error for a RAN network. If synchronization is lost on a node, the radio might be shut down and the network Over the Air (OTA) traffic might be shifted to another node in the wireless network. Fast event notifications mitigate against workload errors by allowing cluster nodes to communicate PTP clock sync status to the vRAN application running in the DU.
Event notifications are available to vRAN applications running on the same DU node. A publish/subscribe REST API passes events notifications to the messaging bus. Publish/subscribe messaging, or pub-sub messaging, is an asynchronous service-to-service communication architecture where any message published to a topic is immediately received by all of the subscribers to the topic.
The PTP Operator generates fast event notifications for every PTP-capable network interface. You can access the events by using a cloud-event-proxy
sidecar container over an HTTP or Advanced Message Queuing Protocol (AMQP) message bus.
PTP fast event notifications are available for network interfaces configured to use PTP ordinary clocks, PTP grandmaster clocks, or PTP boundary clocks.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
19.3.2. About the PTP fast event notifications framework
Use the Precision Time Protocol (PTP) fast event notifications framework to subscribe cluster applications to PTP events that the bare-metal cluster node generates.
The fast events notifications framework uses a REST API for communication. The REST API is based on the O-RAN O-Cloud Notification API Specification for Event Consumers 3.0 that is available from O-RAN ALLIANCE Specifications.
The framework consists of a publisher, subscriber, and an AMQ or HTTP messaging protocol to handle communications between the publisher and subscriber applications. Applications run the cloud-event-proxy
container in a sidecar pattern to subscribe to PTP events. The cloud-event-proxy
sidecar container can access the same resources as the primary application container without using any of the resources of the primary application and with no significant latency.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
Figure 19.4. Overview of PTP fast events

-
Event is generated on the cluster host
-
linuxptp-daemon
in the PTP Operator-managed pod runs as a KubernetesDaemonSet
and manages the variouslinuxptp
processes (ptp4l
,phc2sys
, and optionally for grandmaster clocks,ts2phc
). Thelinuxptp-daemon
passes the event to the UNIX domain socket. -
Event is passed to the cloud-event-proxy sidecar
-
The PTP plugin reads the event from the UNIX domain socket and passes it to the
cloud-event-proxy
sidecar in the PTP Operator-managed pod.cloud-event-proxy
delivers the event from the Kubernetes infrastructure to Cloud-Native Network Functions (CNFs) with low latency. -
Event is persisted
-
The
cloud-event-proxy
sidecar in the PTP Operator-managed pod processes the event and publishes the cloud-native event by using a REST API. -
Message is transported
-
The message transporter transports the event to the
cloud-event-proxy
sidecar in the application pod over HTTP or AMQP 1.0 QPID. -
Event is available from the REST API
-
The
cloud-event-proxy
sidecar in the Application pod processes the event and makes it available by using the REST API. -
Consumer application requests a subscription and receives the subscribed event
-
The consumer application sends an API request to the
cloud-event-proxy
sidecar in the application pod to create a PTP events subscription. Thecloud-event-proxy
sidecar creates an AMQ or HTTP messaging listener protocol for the resource specified in the subscription.
The cloud-event-proxy
sidecar in the application pod receives the event from the PTP Operator-managed pod, unwraps the cloud events object to retrieve the data, and posts the event to the consumer application. The consumer application listens to the address specified in the resource qualifier and receives and processes the PTP event.
19.3.3. Configuring the PTP fast event notifications publisher
To start using PTP fast event notifications for a network interface in your cluster, you must enable the fast event publisher in the PTP Operator PtpOperatorConfig
custom resource (CR) and configure ptpClockThreshold
values in a PtpConfig
CR that you create.
Prerequisites
-
You have installed the OpenShift Container Platform CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have installed the PTP Operator.
Procedure
Modify the default PTP Operator config to enable PTP fast events.
Save the following YAML in the
ptp-operatorconfig.yaml
file:apiVersion: ptp.openshift.io/v1 kind: PtpOperatorConfig metadata: name: default namespace: openshift-ptp spec: daemonNodeSelector: node-role.kubernetes.io/worker: "" ptpEventConfig: enableEventPublisher: true
apiVersion: ptp.openshift.io/v1 kind: PtpOperatorConfig metadata: name: default namespace: openshift-ptp spec: daemonNodeSelector: node-role.kubernetes.io/worker: "" ptpEventConfig: enableEventPublisher: true
1 Copy to Clipboard Copied! - 1
- Set
enableEventPublisher
totrue
to enable PTP fast event notifications.
NoteIn OpenShift Container Platform 4.13 or later, you do not need to set the
spec.ptpEventConfig.transportHost
field in thePtpOperatorConfig
resource when you use HTTP transport for PTP events. SettransportHost
only when you use AMQP transport for PTP events.Update the
PtpOperatorConfig
CR:oc apply -f ptp-operatorconfig.yaml
$ oc apply -f ptp-operatorconfig.yaml
Copy to Clipboard Copied!
Create a
PtpConfig
custom resource (CR) for the PTP enabled interface, and set the required values forptpClockThreshold
andptp4lOpts
. The following YAML illustrates the required values that you must set in thePtpConfig
CR:spec: profile: - name: "profile1" interface: "enp5s0f0" ptp4lOpts: "-2 -s --summary_interval -4" phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" ptp4lConf: "" ptpClockThreshold: holdOverTimeout: 5 maxOffsetThreshold: 100 minOffsetThreshold: -100
spec: profile: - name: "profile1" interface: "enp5s0f0" ptp4lOpts: "-2 -s --summary_interval -4"
1 phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16"
2 ptp4lConf: ""
3 ptpClockThreshold:
4 holdOverTimeout: 5 maxOffsetThreshold: 100 minOffsetThreshold: -100
Copy to Clipboard Copied! - 1
- Append
--summary_interval -4
to use PTP fast events. - 2
- Required
phc2sysOpts
values.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics. - 3
- Specify a string that contains the configuration to replace the default
/etc/ptp4l.conf
file. To use the default configuration, leave the field empty. - 4
- Optional. If the
ptpClockThreshold
stanza is not present, default values are used for theptpClockThreshold
fields. The stanza shows defaultptpClockThreshold
values. TheptpClockThreshold
values configure how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.
19.3.4. Migrating consumer applications to use HTTP transport for PTP or bare-metal events
If you have previously deployed PTP or bare-metal events consumer applications, you need to update the applications to use HTTP message transport.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have updated the PTP Operator or Bare Metal Event Relay to version 4.13+ which uses HTTP transport by default.
Procedure
Update your events consumer application to use HTTP transport. Set the
http-event-publishers
variable for the cloud event sidecar deployment.For example, in a cluster with PTP events configured, the following YAML snippet illustrates a cloud event sidecar deployment:
containers: - name: cloud-event-sidecar image: cloud-event-sidecar args: - "--metrics-addr=127.0.0.1:9091" - "--store-path=/store" - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043" - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043" - "--api-port=8089"
containers: - name: cloud-event-sidecar image: cloud-event-sidecar args: - "--metrics-addr=127.0.0.1:9091" - "--store-path=/store" - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043" - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
1 - "--api-port=8089"
Copy to Clipboard Copied! - 1
- The PTP Operator automatically resolves
NODE_NAME
to the host that is generating the PTP events. For example,compute-1.example.com
.
In a cluster with bare-metal events configured, set the
http-event-publishers
field tohw-event-publisher-service.openshift-bare-metal-events.svc.cluster.local:9043
in the cloud event sidecar deployment CR.Deploy the
consumer-events-subscription-service
service alongside the events consumer application. For example:apiVersion: v1 kind: Service metadata: annotations: prometheus.io/scrape: "true" service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret name: consumer-events-subscription-service namespace: cloud-events labels: app: consumer-service spec: ports: - name: sub-port port: 9043 selector: app: consumer clusterIP: None sessionAffinity: None type: ClusterIP
apiVersion: v1 kind: Service metadata: annotations: prometheus.io/scrape: "true" service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret name: consumer-events-subscription-service namespace: cloud-events labels: app: consumer-service spec: ports: - name: sub-port port: 9043 selector: app: consumer clusterIP: None sessionAffinity: None type: ClusterIP
Copy to Clipboard Copied!
19.3.5. Installing the AMQ messaging bus
To pass PTP fast event notifications between publisher and subscriber on a node, you can install and configure an AMQ messaging bus to run locally on the node. To use AMQ messaging, you must install the AMQ Interconnect Operator.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
Prerequisites
-
Install the OpenShift Container Platform CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
-
Install the AMQ Interconnect Operator to its own
amq-interconnect
namespace. See Adding the Red Hat Integration - AMQ Interconnect Operator.
Verification
Check that the AMQ Interconnect Operator is available and the required pods are running:
oc get pods -n amq-interconnect
$ oc get pods -n amq-interconnect
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE amq-interconnect-645db76c76-k8ghs 1/1 Running 0 23h interconnect-operator-5cb5fc7cc-4v7qm 1/1 Running 0 23h
NAME READY STATUS RESTARTS AGE amq-interconnect-645db76c76-k8ghs 1/1 Running 0 23h interconnect-operator-5cb5fc7cc-4v7qm 1/1 Running 0 23h
Copy to Clipboard Copied! Check that the required
linuxptp-daemon
PTP event producer pods are running in theopenshift-ptp
namespace.oc get pods -n openshift-ptp
$ oc get pods -n openshift-ptp
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE linuxptp-daemon-2t78p 3/3 Running 0 12h linuxptp-daemon-k8n88 3/3 Running 0 12h
NAME READY STATUS RESTARTS AGE linuxptp-daemon-2t78p 3/3 Running 0 12h linuxptp-daemon-k8n88 3/3 Running 0 12h
Copy to Clipboard Copied!
19.3.6. Subscribing DU applications to PTP events with the REST API
Subscribe applications to PTP events by using the resource address /cluster/node/<node_name>/ptp
, where <node_name>
is the cluster node running the DU application.
Deploy your cloud-event-consumer
DU application container and cloud-event-proxy
sidecar container in a separate DU application pod. The cloud-event-consumer
DU application subscribes to the cloud-event-proxy
container in the application pod.
Use the following API endpoints to subscribe the cloud-event-consumer
DU application to PTP events posted by the cloud-event-proxy
container at http://localhost:8089/api/ocloudNotifications/v1/
in the DU application pod:
/api/ocloudNotifications/v1/subscriptions
-
POST
: Creates a new subscription -
GET
: Retrieves a list of subscriptions -
DELETE
: Deletes all subscriptions
-
/api/ocloudNotifications/v1/subscriptions/{subscription_id}
-
GET
: Returns details for the specified subscription ID -
DELETE
: Deletes the subscription associated with the specified subscription ID
-
/api/ocloudNotifications/v1/health
-
GET
: Returns the health status ofocloudNotifications
API
-
api/ocloudNotifications/v1/publishers
-
GET
: Returns an array ofos-clock-sync-state
,ptp-clock-class-change
,lock-state
, andgnss-sync-status
messages for the cluster node
-
/api/ocloudnotifications/v1/{resource_address}/CurrentState
-
GET
: Returns the current state of one the following event types:os-clock-sync-state
,ptp-clock-class-change
,lock-state
, orgnss-state-change
events
-
9089
is the default port for the cloud-event-consumer
container deployed in the application pod. You can configure a different port for your DU application as required.
19.3.6.1. PTP events REST API reference
Use the PTP event notifications REST API to subscribe a cluster application to the PTP events that are generated on the parent node.
19.3.6.1.1. api/ocloudNotifications/v1/subscriptions
HTTP method
GET api/ocloudNotifications/v1/subscriptions
Description
Returns a list of subscriptions. If subscriptions exist, a 200 OK
status code is returned along with the list of subscriptions.
Example API response
[ { "id": "75b1ad8f-c807-4c23-acf5-56f4b7ee3826", "endpointUri": "http://localhost:9089/event", "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/75b1ad8f-c807-4c23-acf5-56f4b7ee3826", "resource": "/cluster/node/compute-1.example.com/ptp" } ]
[
{
"id": "75b1ad8f-c807-4c23-acf5-56f4b7ee3826",
"endpointUri": "http://localhost:9089/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/75b1ad8f-c807-4c23-acf5-56f4b7ee3826",
"resource": "/cluster/node/compute-1.example.com/ptp"
}
]
HTTP method
POST api/ocloudNotifications/v1/subscriptions
Description
Creates a new subscription. If a subscription is successfully created, or if it already exists, a 201 Created
status code is returned.
Parameter | Type |
---|---|
subscription | data |
Example payload
{ "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions", "resource": "/cluster/node/compute-1.example.com/ptp" }
{
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions",
"resource": "/cluster/node/compute-1.example.com/ptp"
}
HTTP method
DELETE api/ocloudNotifications/v1/subscriptions
Description
Deletes all subscriptions.
Example API response
{ "status": "deleted all subscriptions" }
{
"status": "deleted all subscriptions"
}
19.3.6.1.2. api/ocloudNotifications/v1/subscriptions/{subscription_id}
HTTP method
GET api/ocloudNotifications/v1/subscriptions/{subscription_id}
Description
Returns details for the subscription with ID subscription_id
.
Parameter | Type |
---|---|
| string |
Example API response
{ "id":"48210fb3-45be-4ce0-aa9b-41a0e58730ab", "endpointUri": "http://localhost:9089/event", "uriLocation":"http://localhost:8089/api/ocloudNotifications/v1/subscriptions/48210fb3-45be-4ce0-aa9b-41a0e58730ab", "resource":"/cluster/node/compute-1.example.com/ptp" }
{
"id":"48210fb3-45be-4ce0-aa9b-41a0e58730ab",
"endpointUri": "http://localhost:9089/event",
"uriLocation":"http://localhost:8089/api/ocloudNotifications/v1/subscriptions/48210fb3-45be-4ce0-aa9b-41a0e58730ab",
"resource":"/cluster/node/compute-1.example.com/ptp"
}
HTTP method
DELETE api/ocloudNotifications/v1/subscriptions/{subscription_id}
Description
Deletes the subscription with ID subscription_id
.
Parameter | Type |
---|---|
| string |
Example API response
{ "status": "OK" }
{
"status": "OK"
}
19.3.6.1.3. api/ocloudNotifications/v1/health
HTTP method
GET api/ocloudNotifications/v1/health/
Description
Returns the health status for the ocloudNotifications
REST API.
Example API response
OK
OK
19.3.6.1.4. api/ocloudNotifications/v1/publishers
HTTP method
GET api/ocloudNotifications/v1/publishers
Description
Returns an array of os-clock-sync-state
, ptp-clock-class-change
, lock-state
, and gnss-sync-status
details for the cluster node. The system generates notifications when the relevant equipment state changes.
-
os-clock-sync-state
notifications describe the host operating system clock synchronization state. Valid states areLOCKED
orFREERUN
. -
ptp-clock-class-change
notifications describe the current state of the PTP clock class. -
lock-state
notifications describe the current status of the PTP equipment lock state. Valid states areLOCKED
,HOLDOVER
orFREERUN
. -
gnss-sync-status
notifications describe the GPS synchronization state with regard to the external GNSS clock signal. Valid states areSYNCHRONIZED
,ANTENNA_DISCONNECTED
, orACQUIRING_SYNC
.
You can use equipment synchronization status subscriptions together to deliver a detailed view of the overall synchronization health of the system.
Example API response
[ { "id": "0fa415ae-a3cf-4299-876a-589438bacf75", "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy", "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/0fa415ae-a3cf-4299-876a-589438bacf75", "resource": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state" }, { "id": "28cd82df-8436-4f50-bbd9-7a9742828a71", "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy", "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/28cd82df-8436-4f50-bbd9-7a9742828a71", "resource": "/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change" }, { "id": "44aa480d-7347-48b0-a5b0-e0af01fa9677", "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy", "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/44aa480d-7347-48b0-a5b0-e0af01fa9677", "resource": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state" }, { "id": "778da345d-4567-67b0-a43f0-rty885a456", "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy", "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/778da345d-4567-67b0-a43f0-rty885a456", "resource": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status" } ]
[
{
"id": "0fa415ae-a3cf-4299-876a-589438bacf75",
"endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
"uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/0fa415ae-a3cf-4299-876a-589438bacf75",
"resource": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state"
},
{
"id": "28cd82df-8436-4f50-bbd9-7a9742828a71",
"endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
"uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/28cd82df-8436-4f50-bbd9-7a9742828a71",
"resource": "/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change"
},
{
"id": "44aa480d-7347-48b0-a5b0-e0af01fa9677",
"endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
"uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/44aa480d-7347-48b0-a5b0-e0af01fa9677",
"resource": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state"
},
{
"id": "778da345d-4567-67b0-a43f0-rty885a456",
"endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
"uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/778da345d-4567-67b0-a43f0-rty885a456",
"resource": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status"
}
]
You can find os-clock-sync-state
, ptp-clock-class-change
, lock-state
, and gnss-sync-status
events in the logs for the cloud-event-proxy
container. For example:
oc logs -f linuxptp-daemon-cvgr6 -n openshift-ptp -c cloud-event-proxy
$ oc logs -f linuxptp-daemon-cvgr6 -n openshift-ptp -c cloud-event-proxy
Example os-clock-sync-state event
{ "id":"c8a784d1-5f4a-4c16-9a81-a3b4313affe5", "type":"event.sync.sync-status.os-clock-sync-state-change", "source":"/cluster/compute-1.example.com/ptp/CLOCK_REALTIME", "dataContentType":"application/json", "time":"2022-05-06T15:31:23.906277159Z", "data":{ "version":"v1", "values":[ { "resource":"/sync/sync-status/os-clock-sync-state", "dataType":"notification", "valueType":"enumeration", "value":"LOCKED" }, { "resource":"/sync/sync-status/os-clock-sync-state", "dataType":"metric", "valueType":"decimal64.3", "value":"-53" } ] } }
{
"id":"c8a784d1-5f4a-4c16-9a81-a3b4313affe5",
"type":"event.sync.sync-status.os-clock-sync-state-change",
"source":"/cluster/compute-1.example.com/ptp/CLOCK_REALTIME",
"dataContentType":"application/json",
"time":"2022-05-06T15:31:23.906277159Z",
"data":{
"version":"v1",
"values":[
{
"resource":"/sync/sync-status/os-clock-sync-state",
"dataType":"notification",
"valueType":"enumeration",
"value":"LOCKED"
},
{
"resource":"/sync/sync-status/os-clock-sync-state",
"dataType":"metric",
"valueType":"decimal64.3",
"value":"-53"
}
]
}
}
Example ptp-clock-class-change event
{ "id":"69eddb52-1650-4e56-b325-86d44688d02b", "type":"event.sync.ptp-status.ptp-clock-class-change", "source":"/cluster/compute-1.example.com/ptp/ens2fx/master", "dataContentType":"application/json", "time":"2022-05-06T15:31:23.147100033Z", "data":{ "version":"v1", "values":[ { "resource":"/sync/ptp-status/ptp-clock-class-change", "dataType":"metric", "valueType":"decimal64.3", "value":"135" } ] } }
{
"id":"69eddb52-1650-4e56-b325-86d44688d02b",
"type":"event.sync.ptp-status.ptp-clock-class-change",
"source":"/cluster/compute-1.example.com/ptp/ens2fx/master",
"dataContentType":"application/json",
"time":"2022-05-06T15:31:23.147100033Z",
"data":{
"version":"v1",
"values":[
{
"resource":"/sync/ptp-status/ptp-clock-class-change",
"dataType":"metric",
"valueType":"decimal64.3",
"value":"135"
}
]
}
}
Example lock-state event
{ "id":"305ec18b-1472-47b3-aadd-8f37933249a9", "type":"event.sync.ptp-status.ptp-state-change", "source":"/cluster/compute-1.example.com/ptp/ens2fx/master", "dataContentType":"application/json", "time":"2022-05-06T15:31:23.467684081Z", "data":{ "version":"v1", "values":[ { "resource":"/sync/ptp-status/lock-state", "dataType":"notification", "valueType":"enumeration", "value":"LOCKED" }, { "resource":"/sync/ptp-status/lock-state", "dataType":"metric", "valueType":"decimal64.3", "value":"62" } ] } }
{
"id":"305ec18b-1472-47b3-aadd-8f37933249a9",
"type":"event.sync.ptp-status.ptp-state-change",
"source":"/cluster/compute-1.example.com/ptp/ens2fx/master",
"dataContentType":"application/json",
"time":"2022-05-06T15:31:23.467684081Z",
"data":{
"version":"v1",
"values":[
{
"resource":"/sync/ptp-status/lock-state",
"dataType":"notification",
"valueType":"enumeration",
"value":"LOCKED"
},
{
"resource":"/sync/ptp-status/lock-state",
"dataType":"metric",
"valueType":"decimal64.3",
"value":"62"
}
]
}
}
Example gnss-sync-status event
{ "id": "435e1f2a-6854-4555-8520-767325c087d7", "type": "event.sync.gnss-status.gnss-state-change", "source": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status", "dataContentType": "application/json", "time": "2023-09-27T19:35:33.42347206Z", "data": { "version": "v1", "values": [ { "resource": "/cluster/node/compute-1.example.com/ens2fx/master", "dataType": "notification", "valueType": "enumeration", "value": "SYNCHRONIZED" }, { "resource": "/cluster/node/compute-1.example.com/ens2fx/master", "dataType": "metric", "valueType": "decimal64.3", "value": "5" } ] } }
{
"id": "435e1f2a-6854-4555-8520-767325c087d7",
"type": "event.sync.gnss-status.gnss-state-change",
"source": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",
"dataContentType": "application/json",
"time": "2023-09-27T19:35:33.42347206Z",
"data": {
"version": "v1",
"values": [
{
"resource": "/cluster/node/compute-1.example.com/ens2fx/master",
"dataType": "notification",
"valueType": "enumeration",
"value": "SYNCHRONIZED"
},
{
"resource": "/cluster/node/compute-1.example.com/ens2fx/master",
"dataType": "metric",
"valueType": "decimal64.3",
"value": "5"
}
]
}
}
19.3.6.1.5. api/ocloudNotifications/v1/{resource_address}/CurrentState
HTTP method
GET api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/lock-state/CurrentState
GET api/ocloudNotifications/v1/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state/CurrentState
GET api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change/CurrentState
Description
Configure the CurrentState
API endpoint to return the current state of the os-clock-sync-state
, ptp-clock-class-change
, lock-state
events for the cluster node.
-
os-clock-sync-state
notifications describe the host operating system clock synchronization state. Can be inLOCKED
orFREERUN
state. -
ptp-clock-class-change
notifications describe the current state of the PTP clock class. -
lock-state
notifications describe the current status of the PTP equipment lock state. Can be inLOCKED
,HOLDOVER
orFREERUN
state.
Parameter | Type |
---|---|
| string |
Example lock-state API response
{ "id": "c1ac3aa5-1195-4786-84f8-da0ea4462921", "type": "event.sync.ptp-status.ptp-state-change", "source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state", "dataContentType": "application/json", "time": "2023-01-10T02:41:57.094981478Z", "data": { "version": "v1", "values": [ { "resource": "/cluster/node/compute-1.example.com/ens5fx/master", "dataType": "notification", "valueType": "enumeration", "value": "LOCKED" }, { "resource": "/cluster/node/compute-1.example.com/ens5fx/master", "dataType": "metric", "valueType": "decimal64.3", "value": "29" } ] } }
{
"id": "c1ac3aa5-1195-4786-84f8-da0ea4462921",
"type": "event.sync.ptp-status.ptp-state-change",
"source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
"dataContentType": "application/json",
"time": "2023-01-10T02:41:57.094981478Z",
"data": {
"version": "v1",
"values": [
{
"resource": "/cluster/node/compute-1.example.com/ens5fx/master",
"dataType": "notification",
"valueType": "enumeration",
"value": "LOCKED"
},
{
"resource": "/cluster/node/compute-1.example.com/ens5fx/master",
"dataType": "metric",
"valueType": "decimal64.3",
"value": "29"
}
]
}
}
Example os-clock-sync-state API response
{ "specversion": "0.3", "id": "4f51fe99-feaa-4e66-9112-66c5c9b9afcb", "source": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state", "type": "event.sync.sync-status.os-clock-sync-state-change", "subject": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state", "datacontenttype": "application/json", "time": "2022-11-29T17:44:22.202Z", "data": { "version": "v1", "values": [ { "resource": "/cluster/node/compute-1.example.com/CLOCK_REALTIME", "dataType": "notification", "valueType": "enumeration", "value": "LOCKED" }, { "resource": "/cluster/node/compute-1.example.com/CLOCK_REALTIME", "dataType": "metric", "valueType": "decimal64.3", "value": "27" } ] } }
{
"specversion": "0.3",
"id": "4f51fe99-feaa-4e66-9112-66c5c9b9afcb",
"source": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
"type": "event.sync.sync-status.os-clock-sync-state-change",
"subject": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
"datacontenttype": "application/json",
"time": "2022-11-29T17:44:22.202Z",
"data": {
"version": "v1",
"values": [
{
"resource": "/cluster/node/compute-1.example.com/CLOCK_REALTIME",
"dataType": "notification",
"valueType": "enumeration",
"value": "LOCKED"
},
{
"resource": "/cluster/node/compute-1.example.com/CLOCK_REALTIME",
"dataType": "metric",
"valueType": "decimal64.3",
"value": "27"
}
]
}
}
Example ptp-clock-class-change API response
{ "id": "064c9e67-5ad4-4afb-98ff-189c6aa9c205", "type": "event.sync.ptp-status.ptp-clock-class-change", "source": "/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change", "dataContentType": "application/json", "time": "2023-01-10T02:41:56.785673989Z", "data": { "version": "v1", "values": [ { "resource": "/cluster/node/compute-1.example.com/ens5fx/master", "dataType": "metric", "valueType": "decimal64.3", "value": "165" } ] } }
{
"id": "064c9e67-5ad4-4afb-98ff-189c6aa9c205",
"type": "event.sync.ptp-status.ptp-clock-class-change",
"source": "/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change",
"dataContentType": "application/json",
"time": "2023-01-10T02:41:56.785673989Z",
"data": {
"version": "v1",
"values": [
{
"resource": "/cluster/node/compute-1.example.com/ens5fx/master",
"dataType": "metric",
"valueType": "decimal64.3",
"value": "165"
}
]
}
}
19.3.7. Monitoring PTP fast event metrics
You can monitor PTP fast events metrics from cluster nodes where the linuxptp-daemon
is running. You can also monitor PTP fast event metrics in the OpenShift Container Platform web console by using the preconfigured and self-updating Prometheus monitoring stack.
Prerequisites
-
Install the OpenShift Container Platform CLI
oc
. -
Log in as a user with
cluster-admin
privileges. - Install and configure the PTP Operator on a node with PTP-capable hardware.
Procedure
Start a debug pod for the node by running the following command:
oc debug node/<node_name>
$ oc debug node/<node_name>
Copy to Clipboard Copied! Check for PTP metrics exposed by the
linuxptp-daemon
container. For example, run the following command:curl http://localhost:9091/metrics
sh-4.4# curl http://localhost:9091/metrics
Copy to Clipboard Copied! Example output
HELP cne_api_events_published Metric to get number of events published by the rest api TYPE cne_api_events_published gauge
# HELP cne_api_events_published Metric to get number of events published by the rest api # TYPE cne_api_events_published gauge cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",status="success"} 1 cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",status="success"} 94 cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change",status="success"} 18 cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",status="success"} 27
Copy to Clipboard Copied! -
To view the PTP event in the OpenShift Container Platform web console, copy the name of the PTP metric you want to query, for example,
openshift_ptp_offset_ns
. - In the OpenShift Container Platform web console, click Observe → Metrics.
- Paste the PTP metric name into the Expression field, and click Run queries.
19.3.8. PTP fast event metrics reference
The following table describes the PTP fast events metrics that are available from cluster nodes where the linuxptp-daemon
service is running.
Metric | Description | Example |
---|---|---|
|
Returns the PTP clock class for the interface. Possible values for PTP clock class are 6 ( |
|
|
Returns the current PTP clock state for the interface. Possible values for PTP clock state are |
|
| Returns the delay in nanoseconds between the primary clock sending the timing packet and the secondary clock receiving the timing packet. |
|
|
Returns the current status of the highly available system clock when there are multiple time sources on different NICs. Possible values are 0 ( |
|
|
Returns the frequency adjustment in nanoseconds between 2 PTP clocks. For example, between the upstream clock and the NIC, between the system clock and the NIC, or between the PTP hardware clock ( |
|
|
Returns the configured PTP clock role for the interface. Possible values are 0 ( |
|
|
Returns the maximum offset in nanoseconds between 2 clocks or interfaces. For example, between the upstream GNSS clock and the NIC ( |
|
| Returns the offset in nanoseconds between the DPLL clock or the GNSS clock source and the NIC hardware clock. |
|
|
Returns a count of the number of times the |
|
| Returns a status code that shows whether the PTP processes are running or not. |
|
|
Returns values for
|
|
PTP fast event metrics only when T-GM is enabled
The following table describes the PTP fast event metrics that are available only when PTP grandmaster clock (T-GM) is enabled.
Metric | Description | Example |
---|---|---|
|
Returns the current status of the digital phase-locked loop (DPLL) frequency for the NIC. Possible values are -1 ( |
|
|
Returns the current status of the NMEA connection. NMEA is the protocol that is used for 1PPS NIC connections. Possible values are 0 ( |
|
|
Returns the status of the DPLL phase for the NIC. Possible values are -1 ( |
|
|
Returns the current status of the NIC 1PPS connection. You use the 1PPS connection to synchronize timing between connected NICs. Possible values are 0 ( |
|
|
Returns the current status of the global navigation satellite system (GNSS) connection. GNSS provides satellite-based positioning, navigation, and timing services globally. Possible values are 0 ( |
|
19.4. Developing Precision Time Protocol events consumer applications
When developing consumer applications that make use of Precision Time Protocol (PTP) events on a bare-metal cluster node, you need to deploy your consumer application and a cloud-event-proxy
container in a separate application pod. The cloud-event-proxy
container receives the events from the PTP Operator pod and passes it to the consumer application. The consumer application subscribes to the events posted in the cloud-event-proxy
container by using a REST API.
For more information about deploying PTP events applications, see About the PTP fast event notifications framework.
The following information provides general guidance for developing consumer applications that use PTP events. A complete events consumer application example is outside the scope of this information.
19.4.1. PTP events consumer application reference
PTP event consumer applications require the following features:
-
A web service running with a
POST
handler to receive the cloud native PTP events JSON payload -
A
createSubscription
function to subscribe to the PTP events producer -
A
getCurrentState
function to poll the current state of the PTP events producer
The following example Go snippets illustrate these requirements:
Example PTP events consumer server function in Go
func server() { http.HandleFunc("/event", getEvent) http.ListenAndServe("localhost:8989", nil) } func getEvent(w http.ResponseWriter, req *http.Request) { defer req.Body.Close() bodyBytes, err := io.ReadAll(req.Body) if err != nil { log.Errorf("error reading event %v", err) } e := string(bodyBytes) if e != "" { processEvent(bodyBytes) log.Infof("received event %s", string(bodyBytes)) } else { w.WriteHeader(http.StatusNoContent) } }
func server() {
http.HandleFunc("/event", getEvent)
http.ListenAndServe("localhost:8989", nil)
}
func getEvent(w http.ResponseWriter, req *http.Request) {
defer req.Body.Close()
bodyBytes, err := io.ReadAll(req.Body)
if err != nil {
log.Errorf("error reading event %v", err)
}
e := string(bodyBytes)
if e != "" {
processEvent(bodyBytes)
log.Infof("received event %s", string(bodyBytes))
} else {
w.WriteHeader(http.StatusNoContent)
}
}
Example PTP events createSubscription function in Go
import ( "github.com/redhat-cne/sdk-go/pkg/pubsub" "github.com/redhat-cne/sdk-go/pkg/types" v1pubsub "github.com/redhat-cne/sdk-go/v1/pubsub" ) // Subscribe to PTP events using REST API s1,_:=createsubscription("/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state") s2,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change") s3,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/lock-state") // Create PTP event subscriptions POST func createSubscription(resourceAddress string) (sub pubsub.PubSub, err error) { var status int apiPath:= "/api/ocloudNotifications/v1/" localAPIAddr:=localhost:8989 // vDU service API address apiAddr:= "localhost:8089" // event framework API address subURL := &types.URI{URL: url.URL{Scheme: "http", Host: apiAddr Path: fmt.Sprintf("%s%s", apiPath, "subscriptions")}} endpointURL := &types.URI{URL: url.URL{Scheme: "http", Host: localAPIAddr, Path: "event"}} sub = v1pubsub.NewPubSub(endpointURL, resourceAddress) var subB []byte if subB, err = json.Marshal(&sub); err == nil { rc := restclient.New() if status, subB = rc.PostWithReturn(subURL, subB); status != http.StatusCreated { err = fmt.Errorf("error in subscription creation api at %s, returned status %d", subURL, status) } else { err = json.Unmarshal(subB, &sub) } } else { err = fmt.Errorf("failed to marshal subscription for %s", resourceAddress) } return }
import (
"github.com/redhat-cne/sdk-go/pkg/pubsub"
"github.com/redhat-cne/sdk-go/pkg/types"
v1pubsub "github.com/redhat-cne/sdk-go/v1/pubsub"
)
// Subscribe to PTP events using REST API
s1,_:=createsubscription("/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state")
s2,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change")
s3,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/lock-state")
// Create PTP event subscriptions POST
func createSubscription(resourceAddress string) (sub pubsub.PubSub, err error) {
var status int
apiPath:= "/api/ocloudNotifications/v1/"
localAPIAddr:=localhost:8989 // vDU service API address
apiAddr:= "localhost:8089" // event framework API address
subURL := &types.URI{URL: url.URL{Scheme: "http",
Host: apiAddr
Path: fmt.Sprintf("%s%s", apiPath, "subscriptions")}}
endpointURL := &types.URI{URL: url.URL{Scheme: "http",
Host: localAPIAddr,
Path: "event"}}
sub = v1pubsub.NewPubSub(endpointURL, resourceAddress)
var subB []byte
if subB, err = json.Marshal(&sub); err == nil {
rc := restclient.New()
if status, subB = rc.PostWithReturn(subURL, subB); status != http.StatusCreated {
err = fmt.Errorf("error in subscription creation api at %s, returned status %d", subURL, status)
} else {
err = json.Unmarshal(subB, &sub)
}
} else {
err = fmt.Errorf("failed to marshal subscription for %s", resourceAddress)
}
return
}
- 1
- Replace
<node_name>
with the FQDN of the node that is generating the PTP events. For example,compute-1.example.com
.
Example PTP events consumer getCurrentState function in Go
//Get PTP event state for the resource func getCurrentState(resource string) { //Create publisher url := &types.URI{URL: url.URL{Scheme: "http", Host: localhost:8989, Path: fmt.SPrintf("/api/ocloudNotifications/v1/%s/CurrentState",resource}} rc := restclient.New() status, event := rc.Get(url) if status != http.StatusOK { log.Errorf("CurrentState:error %d from url %s, %s", status, url.String(), event) } else { log.Debugf("Got CurrentState: %s ", event) } }
//Get PTP event state for the resource
func getCurrentState(resource string) {
//Create publisher
url := &types.URI{URL: url.URL{Scheme: "http",
Host: localhost:8989,
Path: fmt.SPrintf("/api/ocloudNotifications/v1/%s/CurrentState",resource}}
rc := restclient.New()
status, event := rc.Get(url)
if status != http.StatusOK {
log.Errorf("CurrentState:error %d from url %s, %s", status, url.String(), event)
} else {
log.Debugf("Got CurrentState: %s ", event)
}
}
19.4.2. Reference cloud-event-proxy deployment and service CRs
Use the following example cloud-event-proxy
deployment and subscriber service CRs as a reference when deploying your PTP events consumer application.
HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information see, Red Hat AMQ Interconnect support status.
Reference cloud-event-proxy deployment with HTTP transport
apiVersion: apps/v1 kind: Deployment metadata: name: event-consumer-deployment namespace: <namespace> labels: app: consumer spec: replicas: 1 selector: matchLabels: app: consumer template: metadata: labels: app: consumer spec: serviceAccountName: sidecar-consumer-sa containers: - name: event-subscriber image: event-subscriber-app - name: cloud-event-proxy-as-sidecar image: openshift4/ose-cloud-event-proxy args: - "--metrics-addr=127.0.0.1:9091" - "--store-path=/store" - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043" - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043" - "--api-port=8089" env: - name: NODE_NAME valueFrom: fieldRef: fieldPath: spec.nodeName - name: NODE_IP valueFrom: fieldRef: fieldPath: status.hostIP volumeMounts: - name: pubsubstore mountPath: /store ports: - name: metrics-port containerPort: 9091 - name: sub-port containerPort: 9043 volumes: - name: pubsubstore emptyDir: {}
apiVersion: apps/v1
kind: Deployment
metadata:
name: event-consumer-deployment
namespace: <namespace>
labels:
app: consumer
spec:
replicas: 1
selector:
matchLabels:
app: consumer
template:
metadata:
labels:
app: consumer
spec:
serviceAccountName: sidecar-consumer-sa
containers:
- name: event-subscriber
image: event-subscriber-app
- name: cloud-event-proxy-as-sidecar
image: openshift4/ose-cloud-event-proxy
args:
- "--metrics-addr=127.0.0.1:9091"
- "--store-path=/store"
- "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043"
- "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
- "--api-port=8089"
env:
- name: NODE_NAME
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: NODE_IP
valueFrom:
fieldRef:
fieldPath: status.hostIP
volumeMounts:
- name: pubsubstore
mountPath: /store
ports:
- name: metrics-port
containerPort: 9091
- name: sub-port
containerPort: 9043
volumes:
- name: pubsubstore
emptyDir: {}
Reference cloud-event-proxy deployment with AMQ transport
apiVersion: apps/v1 kind: Deployment metadata: name: cloud-event-proxy-sidecar namespace: cloud-events labels: app: cloud-event-proxy spec: selector: matchLabels: app: cloud-event-proxy template: metadata: labels: app: cloud-event-proxy spec: nodeSelector: node-role.kubernetes.io/worker: "" containers: - name: cloud-event-sidecar image: openshift4/ose-cloud-event-proxy args: - "--metrics-addr=127.0.0.1:9091" - "--store-path=/store" - "--transport-host=amqp://router.router.svc.cluster.local" - "--api-port=8089" env: - name: <node_name> valueFrom: fieldRef: fieldPath: spec.nodeName - name: <node_ip> valueFrom: fieldRef: fieldPath: status.hostIP volumeMounts: - name: pubsubstore mountPath: /store ports: - name: metrics-port containerPort: 9091 - name: sub-port containerPort: 9043 volumes: - name: pubsubstore emptyDir: {}
apiVersion: apps/v1
kind: Deployment
metadata:
name: cloud-event-proxy-sidecar
namespace: cloud-events
labels:
app: cloud-event-proxy
spec:
selector:
matchLabels:
app: cloud-event-proxy
template:
metadata:
labels:
app: cloud-event-proxy
spec:
nodeSelector:
node-role.kubernetes.io/worker: ""
containers:
- name: cloud-event-sidecar
image: openshift4/ose-cloud-event-proxy
args:
- "--metrics-addr=127.0.0.1:9091"
- "--store-path=/store"
- "--transport-host=amqp://router.router.svc.cluster.local"
- "--api-port=8089"
env:
- name: <node_name>
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: <node_ip>
valueFrom:
fieldRef:
fieldPath: status.hostIP
volumeMounts:
- name: pubsubstore
mountPath: /store
ports:
- name: metrics-port
containerPort: 9091
- name: sub-port
containerPort: 9043
volumes:
- name: pubsubstore
emptyDir: {}
Reference cloud-event-proxy subscriber service
apiVersion: v1 kind: Service metadata: annotations: prometheus.io/scrape: "true" service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret name: consumer-events-subscription-service namespace: cloud-events labels: app: consumer-service spec: ports: - name: sub-port port: 9043 selector: app: consumer clusterIP: None sessionAffinity: None type: ClusterIP
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret
name: consumer-events-subscription-service
namespace: cloud-events
labels:
app: consumer-service
spec:
ports:
- name: sub-port
port: 9043
selector:
app: consumer
clusterIP: None
sessionAffinity: None
type: ClusterIP
19.4.3. PTP events available from the cloud-event-proxy sidecar REST API
PTP events consumer applications can poll the PTP events producer for the following PTP timing events.
Resource URI | Description |
---|---|
|
Describes the current status of the PTP equipment lock state. Can be in |
|
Describes the host operating system clock synchronization state. Can be in |
| Describes the current state of the PTP clock class. |
19.4.4. Subscribing the consumer application to PTP events
Before the PTP events consumer application can poll for events, you need to subscribe the application to the event producer.
19.4.4.1. Subscribing to PTP lock-state events
To create a subscription for PTP lock-state
events, send a POST
action to the cloud event API at http://localhost:8081/api/ocloudNotifications/v1/subscriptions
with the following payload:
{ "endpointUri": "http://localhost:8989/event", "resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state", }
{
"endpointUri": "http://localhost:8989/event",
"resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state",
}
Example response
{ "id": "e23473d9-ba18-4f78-946e-401a0caeff90", "endpointUri": "http://localhost:8989/event", "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90", "resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state", }
{
"id": "e23473d9-ba18-4f78-946e-401a0caeff90",
"endpointUri": "http://localhost:8989/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90",
"resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state",
}
19.4.4.2. Subscribing to PTP os-clock-sync-state events
To create a subscription for PTP os-clock-sync-state
events, send a POST
action to the cloud event API at http://localhost:8081/api/ocloudNotifications/v1/subscriptions
with the following payload:
{ "endpointUri": "http://localhost:8989/event", "resource": "/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state", }
{
"endpointUri": "http://localhost:8989/event",
"resource": "/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state",
}
Example response
{ "id": "e23473d9-ba18-4f78-946e-401a0caeff90", "endpointUri": "http://localhost:8989/event", "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90", "resource": "/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state", }
{
"id": "e23473d9-ba18-4f78-946e-401a0caeff90",
"endpointUri": "http://localhost:8989/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90",
"resource": "/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state",
}
19.4.4.3. Subscribing to PTP ptp-clock-class-change events
To create a subscription for PTP ptp-clock-class-change
events, send a POST
action to the cloud event API at http://localhost:8081/api/ocloudNotifications/v1/subscriptions
with the following payload:
{ "endpointUri": "http://localhost:8989/event", "resource": "/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change", }
{
"endpointUri": "http://localhost:8989/event",
"resource": "/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change",
}
Example response
{ "id": "e23473d9-ba18-4f78-946e-401a0caeff90", "endpointUri": "http://localhost:8989/event", "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90", "resource": "/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change", }
{
"id": "e23473d9-ba18-4f78-946e-401a0caeff90",
"endpointUri": "http://localhost:8989/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90",
"resource": "/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change",
}
19.4.5. Getting the current PTP clock status
To get the current PTP status for the node, send a GET
action to one of the following event REST APIs:
-
http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/lock-state/CurrentState
-
http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state/CurrentState
-
http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change/CurrentState
The response is a cloud native event JSON object. For example:
Example lock-state API response
{ "id": "c1ac3aa5-1195-4786-84f8-da0ea4462921", "type": "event.sync.ptp-status.ptp-state-change", "source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state", "dataContentType": "application/json", "time": "2023-01-10T02:41:57.094981478Z", "data": { "version": "v1", "values": [ { "resource": "/cluster/node/compute-1.example.com/ens5fx/master", "dataType": "notification", "valueType": "enumeration", "value": "LOCKED" }, { "resource": "/cluster/node/compute-1.example.com/ens5fx/master", "dataType": "metric", "valueType": "decimal64.3", "value": "29" } ] } }
{
"id": "c1ac3aa5-1195-4786-84f8-da0ea4462921",
"type": "event.sync.ptp-status.ptp-state-change",
"source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
"dataContentType": "application/json",
"time": "2023-01-10T02:41:57.094981478Z",
"data": {
"version": "v1",
"values": [
{
"resource": "/cluster/node/compute-1.example.com/ens5fx/master",
"dataType": "notification",
"valueType": "enumeration",
"value": "LOCKED"
},
{
"resource": "/cluster/node/compute-1.example.com/ens5fx/master",
"dataType": "metric",
"valueType": "decimal64.3",
"value": "29"
}
]
}
}
19.4.6. Verifying that the PTP events consumer application is receiving events
Verify that the cloud-event-proxy
container in the application pod is receiving PTP events.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have logged in as a user with
cluster-admin
privileges. - You have installed and configured the PTP Operator.
Procedure
Get the list of active
linuxptp-daemon
pods. Run the following command:oc get pods -n openshift-ptp
$ oc get pods -n openshift-ptp
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE linuxptp-daemon-2t78p 3/3 Running 0 8h linuxptp-daemon-k8n88 3/3 Running 0 8h
NAME READY STATUS RESTARTS AGE linuxptp-daemon-2t78p 3/3 Running 0 8h linuxptp-daemon-k8n88 3/3 Running 0 8h
Copy to Clipboard Copied! Access the metrics for the required consumer-side
cloud-event-proxy
container by running the following command:oc exec -it <linuxptp-daemon> -n openshift-ptp -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics
$ oc exec -it <linuxptp-daemon> -n openshift-ptp -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics
Copy to Clipboard Copied! where:
- <linuxptp-daemon>
Specifies the pod you want to query, for example,
linuxptp-daemon-2t78p
.Example output
HELP cne_transport_connections_resets Metric to get number of connection resets TYPE cne_transport_connections_resets gauge HELP cne_transport_receiver Metric to get number of receiver created TYPE cne_transport_receiver gauge HELP cne_transport_sender Metric to get number of sender created TYPE cne_transport_sender gauge HELP cne_events_ack Metric to get number of events produced TYPE cne_events_ack gauge HELP cne_events_transport_published Metric to get number of events published by the transport TYPE cne_events_transport_published gauge HELP cne_events_transport_received Metric to get number of events received by the transport TYPE cne_events_transport_received gauge HELP cne_events_api_published Metric to get number of events published by the rest api TYPE cne_events_api_published gauge HELP cne_events_received Metric to get number of events received TYPE cne_events_received gauge HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served. TYPE promhttp_metric_handler_requests_in_flight gauge HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code. TYPE promhttp_metric_handler_requests_total counter
# HELP cne_transport_connections_resets Metric to get number of connection resets # TYPE cne_transport_connections_resets gauge cne_transport_connection_reset 1 # HELP cne_transport_receiver Metric to get number of receiver created # TYPE cne_transport_receiver gauge cne_transport_receiver{address="/cluster/node/compute-1.example.com/ptp",status="active"} 2 cne_transport_receiver{address="/cluster/node/compute-1.example.com/redfish/event",status="active"} 2 # HELP cne_transport_sender Metric to get number of sender created # TYPE cne_transport_sender gauge cne_transport_sender{address="/cluster/node/compute-1.example.com/ptp",status="active"} 1 cne_transport_sender{address="/cluster/node/compute-1.example.com/redfish/event",status="active"} 1 # HELP cne_events_ack Metric to get number of events produced # TYPE cne_events_ack gauge cne_events_ack{status="success",type="/cluster/node/compute-1.example.com/ptp"} 18 cne_events_ack{status="success",type="/cluster/node/compute-1.example.com/redfish/event"} 18 # HELP cne_events_transport_published Metric to get number of events published by the transport # TYPE cne_events_transport_published gauge cne_events_transport_published{address="/cluster/node/compute-1.example.com/ptp",status="failed"} 1 cne_events_transport_published{address="/cluster/node/compute-1.example.com/ptp",status="success"} 18 cne_events_transport_published{address="/cluster/node/compute-1.example.com/redfish/event",status="failed"} 1 cne_events_transport_published{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 18 # HELP cne_events_transport_received Metric to get number of events received by the transport # TYPE cne_events_transport_received gauge cne_events_transport_received{address="/cluster/node/compute-1.example.com/ptp",status="success"} 18 cne_events_transport_received{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 18 # HELP cne_events_api_published Metric to get number of events published by the rest api # TYPE cne_events_api_published gauge cne_events_api_published{address="/cluster/node/compute-1.example.com/ptp",status="success"} 19 cne_events_api_published{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 19 # HELP cne_events_received Metric to get number of events received # TYPE cne_events_received gauge cne_events_received{status="success",type="/cluster/node/compute-1.example.com/ptp"} 18 cne_events_received{status="success",type="/cluster/node/compute-1.example.com/redfish/event"} 18 # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served. # TYPE promhttp_metric_handler_requests_in_flight gauge promhttp_metric_handler_requests_in_flight 1 # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code. # TYPE promhttp_metric_handler_requests_total counter promhttp_metric_handler_requests_total{code="200"} 4 promhttp_metric_handler_requests_total{code="500"} 0 promhttp_metric_handler_requests_total{code="503"} 0
Copy to Clipboard Copied!
Chapter 20. External DNS Operator
20.1. External DNS Operator release notes
The External DNS Operator deploys and manages ExternalDNS
to provide name resolution for services and routes from the external DNS provider to OpenShift Container Platform.
The External DNS Operator is only supported on the x86_64
architecture.
These release notes track the development of the External DNS Operator in OpenShift Container Platform.
20.1.1. External DNS Operator 1.2.0
The following advisory is available for the External DNS Operator version 1.2.0:
20.1.1.1. New features
- The External DNS Operator now supports AWS shared VPC. For more information, see Creating DNS records in a different AWS Account using a shared VPC.
20.1.1.2. Bug fixes
-
The update strategy for the operand changed from
Rolling
toRecreate
. (OCPBUGS-3630)
20.1.2. External DNS Operator 1.1.1
The following advisory is available for the External DNS Operator version 1.1.1:
20.1.3. External DNS Operator 1.1.0
This release included a rebase of the operand from the upstream project version 0.13.1. The following advisory is available for the External DNS Operator version 1.1.0:
20.1.3.1. Bug fixes
-
Previously, the ExternalDNS Operator enforced an empty
defaultMode
value for volumes, which caused constant updates due to a conflict with the OpenShift API. Now, thedefaultMode
value is not enforced and operand deployment does not update constantly. (OCPBUGS-2793)
20.1.4. External DNS Operator 1.0.1
The following advisory is available for the External DNS Operator version 1.0.1:
20.1.5. External DNS Operator 1.0.0
The following advisory is available for the External DNS Operator version 1.0.0:
20.1.5.1. Bug fixes
- Previously, the External DNS Operator issued a warning about the violation of the restricted SCC policy during ExternalDNS operand pod deployments. This issue has been resolved. (BZ#2086408)
20.2. External DNS Operator in OpenShift Container Platform
The External DNS Operator deploys and manages ExternalDNS
to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform.
20.2.1. External DNS Operator
The External DNS Operator implements the External DNS API from the olm.openshift.io
API group. The External DNS Operator updates services, routes, and external DNS providers.
Prerequisites
-
You have installed the
yq
CLI tool.
Procedure
You can deploy the External DNS Operator on demand from the OperatorHub. Deploying the External DNS Operator creates a Subscription
object.
Check the name of an install plan by running the following command:
oc -n external-dns-operator get sub external-dns-operator -o yaml | yq '.status.installplan.name'
$ oc -n external-dns-operator get sub external-dns-operator -o yaml | yq '.status.installplan.name'
Copy to Clipboard Copied! Example output
install-zcvlr
install-zcvlr
Copy to Clipboard Copied! Check if the status of an install plan is
Complete
by running the following command:oc -n external-dns-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
$ oc -n external-dns-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
Copy to Clipboard Copied! Example output
Complete
Complete
Copy to Clipboard Copied! View the status of the
external-dns-operator
deployment by running the following command:oc get -n external-dns-operator deployment/external-dns-operator
$ oc get -n external-dns-operator deployment/external-dns-operator
Copy to Clipboard Copied! Example output
NAME READY UP-TO-DATE AVAILABLE AGE external-dns-operator 1/1 1 1 23h
NAME READY UP-TO-DATE AVAILABLE AGE external-dns-operator 1/1 1 1 23h
Copy to Clipboard Copied!
20.2.2. External DNS Operator logs
You can view External DNS Operator logs by using the oc logs
command.
Procedure
View the logs of the External DNS Operator by running the following command:
oc logs -n external-dns-operator deployment/external-dns-operator -c external-dns-operator
$ oc logs -n external-dns-operator deployment/external-dns-operator -c external-dns-operator
Copy to Clipboard Copied!
20.2.2.1. External DNS Operator domain name limitations
The External DNS Operator uses the TXT registry which adds the prefix for TXT records. This reduces the maximum length of the domain name for TXT records. A DNS record cannot be present without a corresponding TXT record, so the domain name of the DNS record must follow the same limit as the TXT records. For example, a DNS record of <domain_name_from_source>
results in a TXT record of external-dns-<record_type>-<domain_name_from_source>
.
The domain name of the DNS records generated by the External DNS Operator has the following limitations:
Record type | Number of characters |
---|---|
CNAME | 44 |
Wildcard CNAME records on AzureDNS | 42 |
A | 48 |
Wildcard A records on AzureDNS | 46 |
The following error appears in the External DNS Operator logs if the generated domain name exceeds any of the domain name limitations:
time="2022-09-02T08:53:57Z" level=error msg="Failure in zone test.example.io. [Id: /hostedzone/Z06988883Q0H0RL6UMXXX]" time="2022-09-02T08:53:57Z" level=error msg="InvalidChangeBatch: [FATAL problem: DomainLabelTooLong (Domain label is too long) encountered with 'external-dns-a-hello-openshift-aaaaaaaaaa-bbbbbbbbbb-ccccccc']\n\tstatus code: 400, request id: e54dfd5a-06c6-47b0-bcb9-a4f7c3a4e0c6"
time="2022-09-02T08:53:57Z" level=error msg="Failure in zone test.example.io. [Id: /hostedzone/Z06988883Q0H0RL6UMXXX]"
time="2022-09-02T08:53:57Z" level=error msg="InvalidChangeBatch: [FATAL problem: DomainLabelTooLong (Domain label is too long) encountered with 'external-dns-a-hello-openshift-aaaaaaaaaa-bbbbbbbbbb-ccccccc']\n\tstatus code: 400, request id: e54dfd5a-06c6-47b0-bcb9-a4f7c3a4e0c6"
20.3. Installing External DNS Operator on cloud providers
You can install the External DNS Operator on cloud providers such as AWS, Azure, and GCP.
20.3.1. Installing the External DNS Operator with OperatorHub
You can install the External DNS Operator by using the OpenShift Container Platform OperatorHub.
Procedure
- Click Operators → OperatorHub in the OpenShift Container Platform web console.
- Click External DNS Operator. You can use the Filter by keyword text box or the filter list to search for External DNS Operator from the list of Operators.
-
Select the
external-dns-operator
namespace. - On the External DNS Operator page, click Install.
On the Install Operator page, ensure that you selected the following options:
- Update the channel as stable-v1.
- Installation mode as A specific name on the cluster.
-
Installed namespace as
external-dns-operator
. If namespaceexternal-dns-operator
does not exist, it gets created during the Operator installation. - Select Approval Strategy as Automatic or Manual. Approval Strategy is set to Automatic by default.
- Click Install.
If you select Automatic updates, the Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without any intervention.
If you select Manual updates, the OLM creates an update request. As a cluster administrator, you must then manually approve that update request to have the Operator updated to the new version.
Verification
Verify that the External DNS Operator shows the Status as Succeeded on the Installed Operators dashboard.
20.3.2. Installing the External DNS Operator by using the CLI
You can install the External DNS Operator by using the CLI.
Prerequisites
-
You are logged in to the OpenShift Container Platform web console as a user with
cluster-admin
permissions. -
You are logged into the OpenShift CLI (
oc
).
Procedure
Create a
Namespace
object:Create a YAML file that defines the
Namespace
object:Example
namespace.yaml
fileapiVersion: v1 kind: Namespace metadata: name: external-dns-operator
apiVersion: v1 kind: Namespace metadata: name: external-dns-operator
Copy to Clipboard Copied! Create the
Namespace
object by running the following command:oc apply -f namespace.yaml
$ oc apply -f namespace.yaml
Copy to Clipboard Copied!
Create an
OperatorGroup
object:Create a YAML file that defines the
OperatorGroup
object:Example
operatorgroup.yaml
fileapiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: external-dns-operator namespace: external-dns-operator spec: upgradeStrategy: Default targetNamespaces: - external-dns-operator
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: external-dns-operator namespace: external-dns-operator spec: upgradeStrategy: Default targetNamespaces: - external-dns-operator
Copy to Clipboard Copied! Create the
OperatorGroup
object by running the following command:oc apply -f operatorgroup.yaml
$ oc apply -f operatorgroup.yaml
Copy to Clipboard Copied!
Create a
Subscription
object:Create a YAML file that defines the
Subscription
object:Example
subscription.yaml
fileapiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: external-dns-operator namespace: external-dns-operator spec: channel: stable-v1 installPlanApproval: Automatic name: external-dns-operator source: redhat-operators sourceNamespace: openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: external-dns-operator namespace: external-dns-operator spec: channel: stable-v1 installPlanApproval: Automatic name: external-dns-operator source: redhat-operators sourceNamespace: openshift-marketplace
Copy to Clipboard Copied! Create the
Subscription
object by running the following command:oc apply -f subscription.yaml
$ oc apply -f subscription.yaml
Copy to Clipboard Copied!
Verification
Get the name of the install plan from the subscription by running the following command:
oc -n external-dns-operator \ get subscription external-dns-operator \ --template='{{.status.installplan.name}}{{"\n"}}'
$ oc -n external-dns-operator \ get subscription external-dns-operator \ --template='{{.status.installplan.name}}{{"\n"}}'
Copy to Clipboard Copied! Verify that the status of the install plan is
Complete
by running the following command:oc -n external-dns-operator \ get ip <install_plan_name> \ --template='{{.status.phase}}{{"\n"}}'
$ oc -n external-dns-operator \ get ip <install_plan_name> \ --template='{{.status.phase}}{{"\n"}}'
Copy to Clipboard Copied! Verify that the status of the
external-dns-operator
pod isRunning
by running the following command:oc -n external-dns-operator get pod
$ oc -n external-dns-operator get pod
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE external-dns-operator-5584585fd7-5lwqm 2/2 Running 0 11m
NAME READY STATUS RESTARTS AGE external-dns-operator-5584585fd7-5lwqm 2/2 Running 0 11m
Copy to Clipboard Copied! Verify that the catalog source of the subscription is
redhat-operators
by running the following command:oc -n external-dns-operator get subscription
$ oc -n external-dns-operator get subscription
Copy to Clipboard Copied! Example output
NAME PACKAGE SOURCE CHANNEL external-dns-operator external-dns-operator redhat-operators stable-v1
NAME PACKAGE SOURCE CHANNEL external-dns-operator external-dns-operator redhat-operators stable-v1
Copy to Clipboard Copied! Check the
external-dns-operator
version by running the following command:oc -n external-dns-operator get csv
$ oc -n external-dns-operator get csv
Copy to Clipboard Copied! Example output
NAME DISPLAY VERSION REPLACES PHASE external-dns-operator.v<1.y.z> ExternalDNS Operator <1.y.z> Succeeded
NAME DISPLAY VERSION REPLACES PHASE external-dns-operator.v<1.y.z> ExternalDNS Operator <1.y.z> Succeeded
Copy to Clipboard Copied!
20.4. External DNS Operator configuration parameters
The External DNS Operator includes the following configuration parameters.
20.4.1. External DNS Operator configuration parameters
The External DNS Operator includes the following configuration parameters:
Parameter | Description |
---|---|
| Enables the type of a cloud provider. spec: provider: type: AWS aws: credentials: name: aws-access-key
|
|
Enables you to specify DNS zones by their domains. If you do not specify zones, the zones: - "myzoneid"
|
|
Enables you to specify AWS zones by their domains. If you do not specify domains, the domains: - filterType: Include matchType: Exact name: "myzonedomain1.com" - filterType: Include matchType: Pattern pattern: ".*\\.otherzonedomain\\.com"
|
|
Enables you to specify the source for the DNS records, source: type: Service service: serviceType: - LoadBalancer - ClusterIP labelFilter: matchLabels: external-dns.mydomain.org/publish: "yes" hostnameAnnotation: "Allow" fqdnTemplate: - "{{.Name}}.myzonedomain.com"
source: type: OpenShiftRoute openshiftRouteOptions: routerName: default labelFilter: matchLabels: external-dns.mydomain.org/publish: "yes"
|
20.5. Creating DNS records on AWS
You can create DNS records on AWS and AWS GovCloud by using External DNS Operator.
20.5.1. Creating DNS records on an public hosted zone for AWS by using Red Hat External DNS Operator
You can create DNS records on a public hosted zone for AWS by using the Red Hat External DNS Operator. You can use the same instructions to create DNS records on a hosted zone for AWS GovCloud.
Procedure
Check the user. The user must have access to the
kube-system
namespace. If you don’t have the credentials, as you can fetch the credentials from thekube-system
namespace to use the cloud provider client:oc whoami
$ oc whoami
Copy to Clipboard Copied! Example output
system:admin
system:admin
Copy to Clipboard Copied! Fetch the values from aws-creds secret present in
kube-system
namespace.export AWS_ACCESS_KEY_ID=$(oc get secrets aws-creds -n kube-system --template={{.data.aws_access_key_id}} | base64 -d) export AWS_SECRET_ACCESS_KEY=$(oc get secrets aws-creds -n kube-system --template={{.data.aws_secret_access_key}} | base64 -d)
$ export AWS_ACCESS_KEY_ID=$(oc get secrets aws-creds -n kube-system --template={{.data.aws_access_key_id}} | base64 -d) $ export AWS_SECRET_ACCESS_KEY=$(oc get secrets aws-creds -n kube-system --template={{.data.aws_secret_access_key}} | base64 -d)
Copy to Clipboard Copied! Get the routes to check the domain:
oc get routes --all-namespaces | grep console
$ oc get routes --all-namespaces | grep console
Copy to Clipboard Copied! Example output
openshift-console console console-openshift-console.apps.testextdnsoperator.apacshift.support console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.testextdnsoperator.apacshift.support downloads http edge/Redirect None
openshift-console console console-openshift-console.apps.testextdnsoperator.apacshift.support console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.testextdnsoperator.apacshift.support downloads http edge/Redirect None
Copy to Clipboard Copied! Get the list of dns zones to find the one which corresponds to the previously found route’s domain:
aws route53 list-hosted-zones | grep testextdnsoperator.apacshift.support
$ aws route53 list-hosted-zones | grep testextdnsoperator.apacshift.support
Copy to Clipboard Copied! Example output
HOSTEDZONES terraform /hostedzone/Z02355203TNN1XXXX1J6O testextdnsoperator.apacshift.support. 5
HOSTEDZONES terraform /hostedzone/Z02355203TNN1XXXX1J6O testextdnsoperator.apacshift.support. 5
Copy to Clipboard Copied! Create
ExternalDNS
resource forroute
source:$ cat <<EOF | oc create -f - apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-aws spec: domains: - filterType: Include matchType: Exact name: testextdnsoperator.apacshift.support provider: type: AWS source: type: OpenShiftRoute openshiftRouteOptions: routerName: default EOF
$ cat <<EOF | oc create -f - apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-aws
1 spec: domains: - filterType: Include
2 matchType: Exact
3 name: testextdnsoperator.apacshift.support
4 provider: type: AWS
5 source:
6 type: OpenShiftRoute
7 openshiftRouteOptions: routerName: default
8 EOF
Copy to Clipboard Copied! - 1
- Defines the name of external DNS resource.
- 2
- By default all hosted zones are selected as potential targets. You can include a hosted zone that you need.
- 3
- The matching of the target zone’s domain has to be exact (as opposed to regular expression match).
- 4
- Specify the exact domain of the zone you want to update. The hostname of the routes must be subdomains of the specified domain.
- 5
- Defines the
AWS Route53
DNS provider. - 6
- Defines options for the source of DNS records.
- 7
- Defines OpenShift
route
resource as the source for the DNS records which gets created in the previously specified DNS provider. - 8
- If the source is
OpenShiftRoute
, then you can pass the OpenShift Ingress Controller name. External DNS Operator selects the canonical hostname of that router as the target while creating CNAME record.
Check the records created for OCP routes using the following command:
aws route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console
$ aws route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console
Copy to Clipboard Copied!
20.5.2. Creating DNS records in a different AWS Account using a shared VPC
You can use the ExternalDNS Operator to create DNS records in a different AWS account using a shared Virtual Private Cloud (VPC). By using a shared VPC, an organization can connect resources from multiple projects to a common VPC network. Organizations can then use VPC sharing to use a single Route 53 instance across multiple AWS accounts.
Prerequisites
- You have created two Amazon AWS accounts: one with a VPC and a Route 53 private hosted zone configured (Account A), and another for installing a cluster (Account B).
- You have created an IAM Policy and IAM Role with the appropriate permissions in Account A for Account B to create DNS records in the Route 53 hosted zone of Account A.
- You have installed a cluster in Account B into the existing VPC for Account A.
- You have installed the ExternalDNS Operator in the cluster in Account B.
Procedure
Get the Role ARN of the IAM Role that you created to allow Account B to access Account A’s Route 53 hosted zone by running the following command:
aws --profile account-a iam get-role --role-name user-rol1 | head -1
$ aws --profile account-a iam get-role --role-name user-rol1 | head -1
Copy to Clipboard Copied! Example output
ROLE arn:aws:iam::1234567890123:role/user-rol1 2023-09-14T17:21:54+00:00 3600 / AROA3SGB2ZRKRT5NISNJN user-rol1
ROLE arn:aws:iam::1234567890123:role/user-rol1 2023-09-14T17:21:54+00:00 3600 / AROA3SGB2ZRKRT5NISNJN user-rol1
Copy to Clipboard Copied! Locate the private hosted zone to use with Account A’s credentials by running the following command:
aws --profile account-a route53 list-hosted-zones | grep testextdnsoperator.apacshift.support
$ aws --profile account-a route53 list-hosted-zones | grep testextdnsoperator.apacshift.support
Copy to Clipboard Copied! Example output
HOSTEDZONES terraform /hostedzone/Z02355203TNN1XXXX1J6O testextdnsoperator.apacshift.support. 5
HOSTEDZONES terraform /hostedzone/Z02355203TNN1XXXX1J6O testextdnsoperator.apacshift.support. 5
Copy to Clipboard Copied! Create the
ExternalDNS
object by running the following command:cat <<EOF | oc create -f - apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-aws spec: domains: - filterType: Include matchType: Exact name: testextdnsoperator.apacshift.support provider: type: AWS aws: assumeRole: arn: arn:aws:iam::12345678901234:role/user-rol1 source: type: OpenShiftRoute openshiftRouteOptions: routerName: default EOF
$ cat <<EOF | oc create -f - apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-aws spec: domains: - filterType: Include matchType: Exact name: testextdnsoperator.apacshift.support provider: type: AWS aws: assumeRole: arn: arn:aws:iam::12345678901234:role/user-rol1
1 source: type: OpenShiftRoute openshiftRouteOptions: routerName: default EOF
Copy to Clipboard Copied! - 1
- Specify the Role ARN to have DNS records created in Account A.
Check the records created for OpenShift Container Platform (OCP) routes by using the following command:
aws --profile account-a route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console-openshift-console
$ aws --profile account-a route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console-openshift-console
Copy to Clipboard Copied!
20.6. Creating DNS records on Azure
You can create DNS records on Azure by using the External DNS Operator.
Using the External DNS Operator on a Microsoft Entra Workload ID-enabled cluster or a cluster that runs in Microsoft Azure Government (MAG) regions is not supported.
20.6.1. Creating DNS records on an Azure public DNS zone
You can create DNS records on a public DNS zone for Azure by using the External DNS Operator.
Prerequisites
- You must have administrator privileges.
-
The
admin
user must have access to thekube-system
namespace.
Procedure
Fetch the credentials from the
kube-system
namespace to use the cloud provider client by running the following command:CLIENT_ID=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_client_id}} | base64 -d) CLIENT_SECRET=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_client_secret}} | base64 -d) RESOURCE_GROUP=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_resourcegroup}} | base64 -d) SUBSCRIPTION_ID=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_subscription_id}} | base64 -d) TENANT_ID=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_tenant_id}} | base64 -d)
$ CLIENT_ID=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_client_id}} | base64 -d) $ CLIENT_SECRET=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_client_secret}} | base64 -d) $ RESOURCE_GROUP=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_resourcegroup}} | base64 -d) $ SUBSCRIPTION_ID=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_subscription_id}} | base64 -d) $ TENANT_ID=$(oc get secrets azure-credentials -n kube-system --template={{.data.azure_tenant_id}} | base64 -d)
Copy to Clipboard Copied! Log in to Azure by running the following command:
az login --service-principal -u "${CLIENT_ID}" -p "${CLIENT_SECRET}" --tenant "${TENANT_ID}"
$ az login --service-principal -u "${CLIENT_ID}" -p "${CLIENT_SECRET}" --tenant "${TENANT_ID}"
Copy to Clipboard Copied! Get a list of routes by running the following command:
oc get routes --all-namespaces | grep console
$ oc get routes --all-namespaces | grep console
Copy to Clipboard Copied! Example output
openshift-console console console-openshift-console.apps.test.azure.example.com console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.test.azure.example.com downloads http edge/Redirect None
openshift-console console console-openshift-console.apps.test.azure.example.com console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.test.azure.example.com downloads http edge/Redirect None
Copy to Clipboard Copied! Get a list of DNS zones by running the following command:
az network dns zone list --resource-group "${RESOURCE_GROUP}"
$ az network dns zone list --resource-group "${RESOURCE_GROUP}"
Copy to Clipboard Copied! Create a YAML file, for example,
external-dns-sample-azure.yaml
, that defines theExternalDNS
object:Example
external-dns-sample-azure.yaml
fileapiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-azure spec: zones: - "/subscriptions/1234567890/resourceGroups/test-azure-xxxxx-rg/providers/Microsoft.Network/dnszones/test.azure.example.com" provider: type: Azure source: openshiftRouteOptions: routerName: default type: OpenShiftRoute
apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-azure
1 spec: zones: - "/subscriptions/1234567890/resourceGroups/test-azure-xxxxx-rg/providers/Microsoft.Network/dnszones/test.azure.example.com"
2 provider: type: Azure
3 source: openshiftRouteOptions:
4 routerName: default
5 type: OpenShiftRoute
6 Copy to Clipboard Copied! - 1
- Specifies the External DNS name.
- 2
- Defines the zone ID.
- 3
- Defines the provider type.
- 4
- You can define options for the source of DNS records.
- 5
- If the source type is
OpenShiftRoute
, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating CNAME record. - 6
- Defines the
route
resource as the source for the Azure DNS records.
Check the DNS records created for OpenShift Container Platform routes by running the following command:
az network dns record-set list -g "${RESOURCE_GROUP}" -z test.azure.example.com | grep console
$ az network dns record-set list -g "${RESOURCE_GROUP}" -z test.azure.example.com | grep console
Copy to Clipboard Copied! NoteTo create records on private hosted zones on private Azure DNS, you need to specify the private zone under the
zones
field which populates the provider type toazure-private-dns
in theExternalDNS
container arguments.
20.7. Creating DNS records on GCP
You can create DNS records on Google Cloud Platform (GCP) by using the External DNS Operator.
Using the External DNS Operator on a cluster with GCP Workload Identity enabled is not supported. For more information about the GCP Workload Identity, see GCP Workload Identity.
20.7.1. Creating DNS records on a public managed zone for GCP
You can create DNS records on a public managed zone for GCP by using the External DNS Operator.
Prerequisites
- You must have administrator privileges.
Procedure
Copy the
gcp-credentials
secret in theencoded-gcloud.json
file by running the following command:oc get secret gcp-credentials -n kube-system --template='{{$v := index .data "service_account.json"}}{{$v}}' | base64 -d - > decoded-gcloud.json
$ oc get secret gcp-credentials -n kube-system --template='{{$v := index .data "service_account.json"}}{{$v}}' | base64 -d - > decoded-gcloud.json
Copy to Clipboard Copied! Export your Google credentials by running the following command:
export GOOGLE_CREDENTIALS=decoded-gcloud.json
$ export GOOGLE_CREDENTIALS=decoded-gcloud.json
Copy to Clipboard Copied! Activate your account by using the following command:
gcloud auth activate-service-account <client_email as per decoded-gcloud.json> --key-file=decoded-gcloud.json
$ gcloud auth activate-service-account <client_email as per decoded-gcloud.json> --key-file=decoded-gcloud.json
Copy to Clipboard Copied! Set your project by running the following command:
gcloud config set project <project_id as per decoded-gcloud.json>
$ gcloud config set project <project_id as per decoded-gcloud.json>
Copy to Clipboard Copied! Get a list of routes by running the following command:
oc get routes --all-namespaces | grep console
$ oc get routes --all-namespaces | grep console
Copy to Clipboard Copied! Example output
openshift-console console console-openshift-console.apps.test.gcp.example.com console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.test.gcp.example.com downloads http edge/Redirect None
openshift-console console console-openshift-console.apps.test.gcp.example.com console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.test.gcp.example.com downloads http edge/Redirect None
Copy to Clipboard Copied! Get a list of managed zones by running the following command:
gcloud dns managed-zones list | grep test.gcp.example.com
$ gcloud dns managed-zones list | grep test.gcp.example.com
Copy to Clipboard Copied! Example output
qe-cvs4g-private-zone test.gcp.example.com
qe-cvs4g-private-zone test.gcp.example.com
Copy to Clipboard Copied! Create a YAML file, for example,
external-dns-sample-gcp.yaml
, that defines theExternalDNS
object:Example
external-dns-sample-gcp.yaml
fileapiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-gcp spec: domains: - filterType: Include matchType: Exact name: test.gcp.example.com provider: type: GCP source: openshiftRouteOptions: routerName: default type: OpenShiftRoute
apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-gcp
1 spec: domains: - filterType: Include
2 matchType: Exact
3 name: test.gcp.example.com
4 provider: type: GCP
5 source: openshiftRouteOptions:
6 routerName: default
7 type: OpenShiftRoute
8 Copy to Clipboard Copied! - 1
- Specifies the External DNS name.
- 2
- By default, all hosted zones are selected as potential targets. You can include your hosted zone.
- 3
- The domain of the target must match the string defined by the
name
key. - 4
- Specify the exact domain of the zone you want to update. The hostname of the routes must be subdomains of the specified domain.
- 5
- Defines the provider type.
- 6
- You can define options for the source of DNS records.
- 7
- If the source type is
OpenShiftRoute
, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating CNAME record. - 8
- Defines the
route
resource as the source for GCP DNS records.
Check the DNS records created for OpenShift Container Platform routes by running the following command:
gcloud dns record-sets list --zone=qe-cvs4g-private-zone | grep console
$ gcloud dns record-sets list --zone=qe-cvs4g-private-zone | grep console
Copy to Clipboard Copied!
20.8. Creating DNS records on Infoblox
You can create DNS records on Infoblox by using the External DNS Operator.
20.8.1. Creating DNS records on a public DNS zone on Infoblox
You can create DNS records on a public DNS zone on Infoblox by using the External DNS Operator.
Prerequisites
-
You have access to the OpenShift CLI (
oc
). - You have access to the Infoblox UI.
Procedure
Create a
secret
object with Infoblox credentials by running the following command:oc -n external-dns-operator create secret generic infoblox-credentials --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_USERNAME=<infoblox_username> --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_PASSWORD=<infoblox_password>
$ oc -n external-dns-operator create secret generic infoblox-credentials --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_USERNAME=<infoblox_username> --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_PASSWORD=<infoblox_password>
Copy to Clipboard Copied! Get a list of routes by running the following command:
oc get routes --all-namespaces | grep console
$ oc get routes --all-namespaces | grep console
Copy to Clipboard Copied! Example Output
openshift-console console console-openshift-console.apps.test.example.com console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.test.example.com downloads http edge/Redirect None
openshift-console console console-openshift-console.apps.test.example.com console https reencrypt/Redirect None openshift-console downloads downloads-openshift-console.apps.test.example.com downloads http edge/Redirect None
Copy to Clipboard Copied! Create a YAML file, for example,
external-dns-sample-infoblox.yaml
, that defines theExternalDNS
object:Example
external-dns-sample-infoblox.yaml
fileapiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-infoblox spec: provider: type: Infoblox infoblox: credentials: name: infoblox-credentials gridHost: ${INFOBLOX_GRID_PUBLIC_IP} wapiPort: 443 wapiVersion: "2.3.1" domains: - filterType: Include matchType: Exact name: test.example.com source: type: OpenShiftRoute openshiftRouteOptions: routerName: default
apiVersion: externaldns.olm.openshift.io/v1beta1 kind: ExternalDNS metadata: name: sample-infoblox
1 spec: provider: type: Infoblox
2 infoblox: credentials: name: infoblox-credentials gridHost: ${INFOBLOX_GRID_PUBLIC_IP} wapiPort: 443 wapiVersion: "2.3.1" domains: - filterType: Include matchType: Exact name: test.example.com source: type: OpenShiftRoute
3 openshiftRouteOptions: routerName: default
4 Copy to Clipboard Copied! - 1
- Specifies the External DNS name.
- 2
- Defines the provider type.
- 3
- You can define options for the source of DNS records.
- 4
- If the source type is
OpenShiftRoute
, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating CNAME record.
Create the
ExternalDNS
resource on Infoblox by running the following command:oc create -f external-dns-sample-infoblox.yaml
$ oc create -f external-dns-sample-infoblox.yaml
Copy to Clipboard Copied! From the Infoblox UI, check the DNS records created for
console
routes:- Click Data Management → DNS → Zones.
- Select the zone name.
20.9. Configuring the cluster-wide proxy on the External DNS Operator
After configuring the cluster-wide proxy, the Operator Lifecycle Manager (OLM) triggers automatic updates to all of the deployed Operators with the new contents of the HTTP_PROXY
, HTTPS_PROXY
, and NO_PROXY
environment variables.
20.9.1. Trusting the certificate authority of the cluster-wide proxy
You can configure the External DNS Operator to trust the certificate authority of the cluster-wide proxy.
Procedure
Create the config map to contain the CA bundle in the
external-dns-operator
namespace by running the following command:oc -n external-dns-operator create configmap trusted-ca
$ oc -n external-dns-operator create configmap trusted-ca
Copy to Clipboard Copied! To inject the trusted CA bundle into the config map, add the
config.openshift.io/inject-trusted-cabundle=true
label to the config map by running the following command:oc -n external-dns-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
$ oc -n external-dns-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
Copy to Clipboard Copied! Update the subscription of the External DNS Operator by running the following command:
oc -n external-dns-operator patch subscription external-dns-operator --type='json' -p='[{"op": "add", "path": "/spec/config", "value":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}]}}]'
$ oc -n external-dns-operator patch subscription external-dns-operator --type='json' -p='[{"op": "add", "path": "/spec/config", "value":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}]}}]'
Copy to Clipboard Copied!
Verification
After the deployment of the External DNS Operator is completed, verify that the trusted CA environment variable is added to the
external-dns-operator
deployment by running the following command:oc -n external-dns-operator exec deploy/external-dns-operator -c external-dns-operator -- printenv TRUSTED_CA_CONFIGMAP_NAME
$ oc -n external-dns-operator exec deploy/external-dns-operator -c external-dns-operator -- printenv TRUSTED_CA_CONFIGMAP_NAME
Copy to Clipboard Copied! Example output
trusted-ca
trusted-ca
Copy to Clipboard Copied!
Chapter 21. Network policy
21.1. About network policy
As a developer, you can define network policies that restrict traffic to pods in your cluster.
21.1.1. About network policy
In a cluster using a network plugin that supports Kubernetes network policy, network isolation is controlled entirely by NetworkPolicy
objects. In OpenShift Container Platform 4.15, OpenShift SDN supports using network policy in its default network isolation mode.
- A network policy does not apply to the host network namespace. Pods with host networking enabled are unaffected by network policy rules. However, pods connecting to the host-networked pods might be affected by the network policy rules.
-
Using the
namespaceSelector
field without thepodSelector
field set to{}
will not includehostNetwork
pods. You must use thepodSelector
set to{}
with thenamespaceSelector
field in order to targethostNetwork
pods when creating network policies. - Network policies cannot block traffic from localhost or from their resident nodes.
By default, all pods in a project are accessible from other pods and network endpoints. To isolate one or more pods in a project, you can create NetworkPolicy
objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy
objects within their own project.
If a pod is matched by selectors in one or more NetworkPolicy
objects, then the pod will accept only connections that are allowed by at least one of those NetworkPolicy
objects. A pod that is not selected by any NetworkPolicy
objects is fully accessible.
A network policy applies to only the Transmission Control Protocol (TCP), User Datagram Protocol (UDP), Internet Control Message Protocol (ICMP), and Stream Control Transmission Protocol (SCTP) protocols. Other protocols are not affected.
The following example NetworkPolicy
objects demonstrate supporting different scenarios:
Deny all traffic:
To make a project deny by default, add a
NetworkPolicy
object that matches all pods but accepts no traffic:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default spec: podSelector: {} ingress: []
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default spec: podSelector: {} ingress: []
Copy to Clipboard Copied! Only allow connections from the OpenShift Container Platform Ingress Controller:
To make a project allow only connections from the OpenShift Container Platform Ingress Controller, add the following
NetworkPolicy
object.apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-ingress spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: "" podSelector: {} policyTypes: - Ingress
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-ingress spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: "" podSelector: {} policyTypes: - Ingress
Copy to Clipboard Copied! Only accept connections from pods within a project:
ImportantTo allow ingress connections from
hostNetwork
pods in the same namespace, you need to apply theallow-from-hostnetwork
policy together with theallow-same-namespace
policy.To make pods accept connections from other pods in the same project, but reject all other connections from pods in other projects, add the following
NetworkPolicy
object:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-same-namespace spec: podSelector: {} ingress: - from: - podSelector: {}
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-same-namespace spec: podSelector: {} ingress: - from: - podSelector: {}
Copy to Clipboard Copied! Only allow HTTP and HTTPS traffic based on pod labels:
To enable only HTTP and HTTPS access to the pods with a specific label (
role=frontend
in following example), add aNetworkPolicy
object similar to the following:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-http-and-https spec: podSelector: matchLabels: role: frontend ingress: - ports: - protocol: TCP port: 80 - protocol: TCP port: 443
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-http-and-https spec: podSelector: matchLabels: role: frontend ingress: - ports: - protocol: TCP port: 80 - protocol: TCP port: 443
Copy to Clipboard Copied! Accept connections by using both namespace and pod selectors:
To match network traffic by combining namespace and pod selectors, you can use a
NetworkPolicy
object similar to the following:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-pod-and-namespace-both spec: podSelector: matchLabels: name: test-pods ingress: - from: - namespaceSelector: matchLabels: project: project_name podSelector: matchLabels: name: test-pods
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-pod-and-namespace-both spec: podSelector: matchLabels: name: test-pods ingress: - from: - namespaceSelector: matchLabels: project: project_name podSelector: matchLabels: name: test-pods
Copy to Clipboard Copied!
NetworkPolicy
objects are additive, which means you can combine multiple NetworkPolicy
objects together to satisfy complex network requirements.
For example, for the NetworkPolicy
objects defined in previous samples, you can define both allow-same-namespace
and allow-http-and-https
policies within the same project. Thus allowing the pods with the label role=frontend
, to accept any connection allowed by each policy. That is, connections on any port from pods in the same namespace, and connections on ports 80
and 443
from pods in any namespace.
21.1.1.1. Using the allow-from-router network policy
Use the following NetworkPolicy
to allow external traffic regardless of the router configuration:
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-router spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: "" podSelector: {} policyTypes: - Ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-from-router
spec:
ingress:
- from:
- namespaceSelector:
matchLabels:
policy-group.network.openshift.io/ingress: ""
podSelector: {}
policyTypes:
- Ingress
- 1
policy-group.network.openshift.io/ingress:""
label supports both OpenShift-SDN and OVN-Kubernetes.
21.1.1.2. Using the allow-from-hostnetwork network policy
Add the following allow-from-hostnetwork
NetworkPolicy
object to direct traffic from the host network pods.
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-hostnetwork spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/host-network: "" podSelector: {} policyTypes: - Ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-from-hostnetwork
spec:
ingress:
- from:
- namespaceSelector:
matchLabels:
policy-group.network.openshift.io/host-network: ""
podSelector: {}
policyTypes:
- Ingress
21.1.2. Optimizations for network policy with OpenShift SDN
Use a network policy to isolate pods that are differentiated from one another by labels within a namespace.
It is inefficient to apply NetworkPolicy
objects to large numbers of individual pods in a single namespace. Pod labels do not exist at the IP address level, so a network policy generates a separate Open vSwitch (OVS) flow rule for every possible link between every pod selected with a podSelector
.
For example, if the spec podSelector
and the ingress podSelector
within a NetworkPolicy
object each match 200 pods, then 40,000 (200*200) OVS flow rules are generated. This might slow down a node.
When designing your network policy, refer to the following guidelines:
Reduce the number of OVS flow rules by using namespaces to contain groups of pods that need to be isolated.
NetworkPolicy
objects that select a whole namespace, by using thenamespaceSelector
or an emptypodSelector
, generate only a single OVS flow rule that matches the VXLAN virtual network ID (VNID) of the namespace.- Keep the pods that do not need to be isolated in their original namespace, and move the pods that require isolation into one or more different namespaces.
- Create additional targeted cross-namespace network policies to allow the specific traffic that you do want to allow from the isolated pods.
21.1.3. Optimizations for network policy with OVN-Kubernetes network plugin
When designing your network policy, refer to the following guidelines:
-
For network policies with the same
spec.podSelector
spec, it is more efficient to use one network policy with multipleingress
oregress
rules, than multiple network policies with subsets ofingress
oregress
rules. Every
ingress
oregress
rule based on thepodSelector
ornamespaceSelector
spec generates the number of OVS flows proportional tonumber of pods selected by network policy + number of pods selected by ingress or egress rule
. Therefore, it is preferable to use thepodSelector
ornamespaceSelector
spec that can select as many pods as you need in one rule, instead of creating individual rules for every pod.For example, the following policy contains two rules:
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: test-network-policy spec: podSelector: {} ingress: - from: - podSelector: matchLabels: role: frontend - from: - podSelector: matchLabels: role: backend
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: test-network-policy spec: podSelector: {} ingress: - from: - podSelector: matchLabels: role: frontend - from: - podSelector: matchLabels: role: backend
Copy to Clipboard Copied! The following policy expresses those same two rules as one:
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: test-network-policy spec: podSelector: {} ingress: - from: - podSelector: matchExpressions: - {key: role, operator: In, values: [frontend, backend]}
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: test-network-policy spec: podSelector: {} ingress: - from: - podSelector: matchExpressions: - {key: role, operator: In, values: [frontend, backend]}
Copy to Clipboard Copied! The same guideline applies to the
spec.podSelector
spec. If you have the sameingress
oregress
rules for different network policies, it might be more efficient to create one network policy with a commonspec.podSelector
spec. For example, the following two policies have different rules:apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: policy1 spec: podSelector: matchLabels: role: db ingress: - from: - podSelector: matchLabels: role: frontend --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: policy2 spec: podSelector: matchLabels: role: client ingress: - from: - podSelector: matchLabels: role: frontend
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: policy1 spec: podSelector: matchLabels: role: db ingress: - from: - podSelector: matchLabels: role: frontend --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: policy2 spec: podSelector: matchLabels: role: client ingress: - from: - podSelector: matchLabels: role: frontend
Copy to Clipboard Copied! The following network policy expresses those same two rules as one:
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: policy3 spec: podSelector: matchExpressions: - {key: role, operator: In, values: [db, client]} ingress: - from: - podSelector: matchLabels: role: frontend
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: policy3 spec: podSelector: matchExpressions: - {key: role, operator: In, values: [db, client]} ingress: - from: - podSelector: matchLabels: role: frontend
Copy to Clipboard Copied! You can apply this optimization when only multiple selectors are expressed as one. In cases where selectors are based on different labels, it may not be possible to apply this optimization. In those cases, consider applying some new labels for network policy optimization specifically.
21.1.3.1. NetworkPolicy CR and external IPs in OVN-Kubernetes
In OVN-Kubernetes, the NetworkPolicy
custom resource (CR) enforces strict isolation rules. If a service is exposed using an external IP, a network policy can block access from other namespaces unless explicitly configured to allow traffic.
To allow access to external IPs across namespaces, create a NetworkPolicy
CR that explicitly permits ingress from the required namespaces and ensures traffic is allowed to the designated service ports. Without allowing traffic to the required ports, access might still be restricted.
Example output
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: annotations: name: <policy_name> namespace: openshift-ingress spec: ingress: - ports: - port: 80 protocol: TCP - ports: - port: 443 protocol: TCP - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: <my_namespace> podSelector: {} policyTypes: - Ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
annotations:
name: <policy_name>
namespace: openshift-ingress
spec:
ingress:
- ports:
- port: 80
protocol: TCP
- ports:
- port: 443
protocol: TCP
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: <my_namespace>
podSelector: {}
policyTypes:
- Ingress
where:
<policy_name>
- Specifies your name for the policy.
<my_namespace>
- Specifies the name of the namespace where the policy is deployed.
For more details, see "About network policy".
21.1.4. Next steps
21.2. Creating a network policy
As a user with the admin
role, you can create a network policy for a namespace.
21.2.1. Example NetworkPolicy object
The following annotates an example NetworkPolicy object:
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-27107 spec: podSelector: matchLabels: app: mongodb ingress: - from: - podSelector: matchLabels: app: app ports: - protocol: TCP port: 27017
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: allow-27107
spec:
podSelector:
matchLabels:
app: mongodb
ingress:
- from:
- podSelector:
matchLabels:
app: app
ports:
- protocol: TCP
port: 27017
- 1
- The name of the NetworkPolicy object.
- 2
- A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
- 3
- A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
- 4
- A list of one or more destination ports on which to accept traffic.
21.2.2. Creating a network policy using the CLI
To define granular rules describing ingress or egress network traffic allowed for namespaces in your cluster, you can create a network policy.
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace that the network policy applies to.
Procedure
Create a policy rule:
Create a
<policy_name>.yaml
file:touch <policy_name>.yaml
$ touch <policy_name>.yaml
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the network policy file name.
Define a network policy in the file that you just created, such as in the following examples:
Deny ingress from all pods in all namespaces
This is a fundamental policy, blocking all cross-pod networking other than cross-pod traffic allowed by the configuration of other Network Policies.
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default spec: podSelector: {} policyTypes: - Ingress ingress: []
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default spec: podSelector: {} policyTypes: - Ingress ingress: []
Copy to Clipboard Copied! Allow ingress from all pods in the same namespace
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-same-namespace spec: podSelector: ingress: - from: - podSelector: {}
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-same-namespace spec: podSelector: ingress: - from: - podSelector: {}
Copy to Clipboard Copied! Allow ingress traffic to one pod from a particular namespace
This policy allows traffic to pods labelled
pod-a
from pods running innamespace-y
.kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-traffic-pod spec: podSelector: matchLabels: pod: pod-a policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: namespace-y
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-traffic-pod spec: podSelector: matchLabels: pod: pod-a policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: namespace-y
Copy to Clipboard Copied!
To create the network policy object, enter the following command:
oc apply -f <policy_name>.yaml -n <namespace>
$ oc apply -f <policy_name>.yaml -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the network policy file name.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Example output
networkpolicy.networking.k8s.io/deny-by-default created
networkpolicy.networking.k8s.io/deny-by-default created
Copy to Clipboard Copied!
If you log in to the web console with cluster-admin
privileges, you have a choice of creating a network policy in any namespace in the cluster directly in YAML or from a form in the web console.
21.2.3. Creating a default deny all network policy
This is a fundamental policy, blocking all cross-pod networking other than network traffic allowed by the configuration of other deployed network policies. This procedure enforces a default deny-by-default
policy.
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace that the network policy applies to.
Procedure
Create the following YAML that defines a
deny-by-default
policy to deny ingress from all pods in all namespaces. Save the YAML in thedeny-by-default.yaml
file:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default namespace: default spec: podSelector: {} ingress: []
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default namespace: default
1 spec: podSelector: {}
2 ingress: []
3 Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f deny-by-default.yaml
$ oc apply -f deny-by-default.yaml
Copy to Clipboard Copied! Example output
networkpolicy.networking.k8s.io/deny-by-default created
networkpolicy.networking.k8s.io/deny-by-default created
Copy to Clipboard Copied!
21.2.4. Creating a network policy to allow traffic from external clients
With the deny-by-default
policy in place you can proceed to configure a policy that allows traffic from external clients to a pod with the label app=web
.
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Follow this procedure to configure a policy that allows external service from the public Internet directly or by using a Load Balancer to access the pod. Traffic is only allowed to a pod with the label app=web
.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace that the network policy applies to.
Procedure
Create a policy that allows traffic from the public Internet directly or by using a load balancer to access the pod. Save the YAML in the
web-allow-external.yaml
file:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-external namespace: default spec: policyTypes: - Ingress podSelector: matchLabels: app: web ingress: - {}
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-external namespace: default spec: policyTypes: - Ingress podSelector: matchLabels: app: web ingress: - {}
Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f web-allow-external.yaml
$ oc apply -f web-allow-external.yaml
Copy to Clipboard Copied! Example output
networkpolicy.networking.k8s.io/web-allow-external created
networkpolicy.networking.k8s.io/web-allow-external created
Copy to Clipboard Copied! This policy allows traffic from all resources, including external traffic as illustrated in the following diagram:

21.2.5. Creating a network policy allowing traffic to an application from all namespaces
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Follow this procedure to configure a policy that allows traffic from all pods in all namespaces to a particular application.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace that the network policy applies to.
Procedure
Create a policy that allows traffic from all pods in all namespaces to a particular application. Save the YAML in the
web-allow-all-namespaces.yaml
file:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-all-namespaces namespace: default spec: podSelector: matchLabels: app: web policyTypes: - Ingress ingress: - from: - namespaceSelector: {}
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-all-namespaces namespace: default spec: podSelector: matchLabels: app: web
1 policyTypes: - Ingress ingress: - from: - namespaceSelector: {}
2 Copy to Clipboard Copied! NoteBy default, if you omit specifying a
namespaceSelector
it does not select any namespaces, which means the policy allows traffic only from the namespace the network policy is deployed to.Apply the policy by entering the following command:
oc apply -f web-allow-all-namespaces.yaml
$ oc apply -f web-allow-all-namespaces.yaml
Copy to Clipboard Copied! Example output
networkpolicy.networking.k8s.io/web-allow-all-namespaces created
networkpolicy.networking.k8s.io/web-allow-all-namespaces created
Copy to Clipboard Copied!
Verification
Start a web service in the
default
namespace by entering the following command:oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
$ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in thesecondary
namespace and to start a shell:oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
$ oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! Run the following command in the shell and observe that the request is allowed:
wget -qO- --timeout=2 http://web.default
# wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
Copy to Clipboard Copied!
21.2.6. Creating a network policy allowing traffic to an application from a namespace
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Follow this procedure to configure a policy that allows traffic to a pod with the label app=web
from a particular namespace. You might want to do this to:
- Restrict traffic to a production database only to namespaces where production workloads are deployed.
- Enable monitoring tools deployed to a particular namespace to scrape metrics from the current namespace.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace that the network policy applies to.
Procedure
Create a policy that allows traffic from all pods in a particular namespaces with a label
purpose=production
. Save the YAML in theweb-allow-prod.yaml
file:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-prod namespace: default spec: podSelector: matchLabels: app: web policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: purpose: production
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-prod namespace: default spec: podSelector: matchLabels: app: web
1 policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: purpose: production
2 Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f web-allow-prod.yaml
$ oc apply -f web-allow-prod.yaml
Copy to Clipboard Copied! Example output
networkpolicy.networking.k8s.io/web-allow-prod created
networkpolicy.networking.k8s.io/web-allow-prod created
Copy to Clipboard Copied!
Verification
Start a web service in the
default
namespace by entering the following command:oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
$ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
Copy to Clipboard Copied! Run the following command to create the
prod
namespace:oc create namespace prod
$ oc create namespace prod
Copy to Clipboard Copied! Run the following command to label the
prod
namespace:oc label namespace/prod purpose=production
$ oc label namespace/prod purpose=production
Copy to Clipboard Copied! Run the following command to create the
dev
namespace:oc create namespace dev
$ oc create namespace dev
Copy to Clipboard Copied! Run the following command to label the
dev
namespace:oc label namespace/dev purpose=testing
$ oc label namespace/dev purpose=testing
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in thedev
namespace and to start a shell:oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
$ oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! Run the following command in the shell and observe that the request is blocked:
wget -qO- --timeout=2 http://web.default
# wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
wget: download timed out
wget: download timed out
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in theprod
namespace and start a shell:oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
$ oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! Run the following command in the shell and observe that the request is allowed:
wget -qO- --timeout=2 http://web.default
# wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
Copy to Clipboard Copied!
21.3. Viewing a network policy
As a user with the admin
role, you can view a network policy for a namespace.
21.3.1. Example NetworkPolicy object
The following annotates an example NetworkPolicy object:
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-27107 spec: podSelector: matchLabels: app: mongodb ingress: - from: - podSelector: matchLabels: app: app ports: - protocol: TCP port: 27017
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: allow-27107
spec:
podSelector:
matchLabels:
app: mongodb
ingress:
- from:
- podSelector:
matchLabels:
app: app
ports:
- protocol: TCP
port: 27017
- 1
- The name of the NetworkPolicy object.
- 2
- A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
- 3
- A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
- 4
- A list of one or more destination ports on which to accept traffic.
21.3.2. Viewing network policies using the CLI
You can examine the network policies in a namespace.
If you log in with a user with the cluster-admin
role, then you can view any network policy in the cluster.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace where the network policy exists.
Procedure
List network policies in a namespace:
To view network policy objects defined in a namespace, enter the following command:
oc get networkpolicy
$ oc get networkpolicy
Copy to Clipboard Copied! Optional: To examine a specific network policy, enter the following command:
oc describe networkpolicy <policy_name> -n <namespace>
$ oc describe networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the network policy to inspect.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
For example:
oc describe networkpolicy allow-same-namespace
$ oc describe networkpolicy allow-same-namespace
Copy to Clipboard Copied! Output for
oc describe
commandName: allow-same-namespace Namespace: ns1 Created on: 2021-05-24 22:28:56 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: <none> (Allowing the specific traffic to all pods in this namespace) Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: PodSelector: <none> Not affecting egress traffic Policy Types: Ingress
Name: allow-same-namespace Namespace: ns1 Created on: 2021-05-24 22:28:56 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: <none> (Allowing the specific traffic to all pods in this namespace) Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: PodSelector: <none> Not affecting egress traffic Policy Types: Ingress
Copy to Clipboard Copied!
If you log in to the web console with cluster-admin
privileges, you have a choice of viewing a network policy in any namespace in the cluster directly in YAML or from a form in the web console.
21.4. Editing a network policy
As a user with the admin
role, you can edit an existing network policy for a namespace.
21.4.1. Editing a network policy
You can edit a network policy in a namespace.
If you log in with a user with the cluster-admin
role, then you can edit a network policy in any namespace in the cluster.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace where the network policy exists.
Procedure
Optional: To list the network policy objects in a namespace, enter the following command:
oc get networkpolicy
$ oc get networkpolicy
Copy to Clipboard Copied! where:
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Edit the network policy object.
If you saved the network policy definition in a file, edit the file and make any necessary changes, and then enter the following command.
oc apply -n <namespace> -f <policy_file>.yaml
$ oc apply -n <namespace> -f <policy_file>.yaml
Copy to Clipboard Copied! where:
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
<policy_file>
- Specifies the name of the file containing the network policy.
If you need to update the network policy object directly, enter the following command:
oc edit networkpolicy <policy_name> -n <namespace>
$ oc edit networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the network policy.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Confirm that the network policy object is updated.
oc describe networkpolicy <policy_name> -n <namespace>
$ oc describe networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the network policy.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
If you log in to the web console with cluster-admin
privileges, you have a choice of editing a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.
21.4.2. Example NetworkPolicy object
The following annotates an example NetworkPolicy object:
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-27107 spec: podSelector: matchLabels: app: mongodb ingress: - from: - podSelector: matchLabels: app: app ports: - protocol: TCP port: 27017
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: allow-27107
spec:
podSelector:
matchLabels:
app: mongodb
ingress:
- from:
- podSelector:
matchLabels:
app: app
ports:
- protocol: TCP
port: 27017
- 1
- The name of the NetworkPolicy object.
- 2
- A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
- 3
- A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
- 4
- A list of one or more destination ports on which to accept traffic.
21.5. Deleting a network policy
As a user with the admin
role, you can delete a network policy from a namespace.
21.5.1. Deleting a network policy using the CLI
You can delete a network policy in a namespace.
If you log in with a user with the cluster-admin
role, then you can delete any network policy in the cluster.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges. - You are working in the namespace where the network policy exists.
Procedure
To delete a network policy object, enter the following command:
oc delete networkpolicy <policy_name> -n <namespace>
$ oc delete networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the network policy.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Example output
networkpolicy.networking.k8s.io/default-deny deleted
networkpolicy.networking.k8s.io/default-deny deleted
Copy to Clipboard Copied!
If you log in to the web console with cluster-admin
privileges, you have a choice of deleting a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.
21.6. Defining a default network policy for projects
As a cluster administrator, you can modify the new project template to automatically include network policies when you create a new project. If you do not yet have a customized template for new projects, you must first create one.
21.6.1. Modifying the template for new projects
As a cluster administrator, you can modify the default project template so that new projects are created using your custom requirements.
To create your own custom project template:
Prerequisites
-
You have access to an OpenShift Container Platform cluster using an account with
cluster-admin
permissions.
Procedure
-
Log in as a user with
cluster-admin
privileges. Generate the default project template:
oc adm create-bootstrap-project-template -o yaml > template.yaml
$ oc adm create-bootstrap-project-template -o yaml > template.yaml
Copy to Clipboard Copied! -
Use a text editor to modify the generated
template.yaml
file by adding objects or modifying existing objects. The project template must be created in the
openshift-config
namespace. Load your modified template:oc create -f template.yaml -n openshift-config
$ oc create -f template.yaml -n openshift-config
Copy to Clipboard Copied! Edit the project configuration resource using the web console or CLI.
Using the web console:
- Navigate to the Administration → Cluster Settings page.
- Click Configuration to view all configuration resources.
- Find the entry for Project and click Edit YAML.
Using the CLI:
Edit the
project.config.openshift.io/cluster
resource:oc edit project.config.openshift.io/cluster
$ oc edit project.config.openshift.io/cluster
Copy to Clipboard Copied!
Update the
spec
section to include theprojectRequestTemplate
andname
parameters, and set the name of your uploaded project template. The default name isproject-request
.Project configuration resource with custom project template
apiVersion: config.openshift.io/v1 kind: Project metadata: # ... spec: projectRequestTemplate: name: <template_name> # ...
apiVersion: config.openshift.io/v1 kind: Project metadata: # ... spec: projectRequestTemplate: name: <template_name> # ...
Copy to Clipboard Copied! - After you save your changes, create a new project to verify that your changes were successfully applied.
21.6.2. Adding network policies to the new project template
As a cluster administrator, you can add network policies to the default template for new projects. OpenShift Container Platform will automatically create all the NetworkPolicy
objects specified in the template in the project.
Prerequisites
-
Your cluster uses a default CNI network plugin that supports
NetworkPolicy
objects, such as the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You must log in to the cluster with a user with
cluster-admin
privileges. - You must have created a custom default project template for new projects.
Procedure
Edit the default template for a new project by running the following command:
oc edit template <project_template> -n openshift-config
$ oc edit template <project_template> -n openshift-config
Copy to Clipboard Copied! Replace
<project_template>
with the name of the default template that you configured for your cluster. The default template name isproject-request
.In the template, add each
NetworkPolicy
object as an element to theobjects
parameter. Theobjects
parameter accepts a collection of one or more objects.In the following example, the
objects
parameter collection includes severalNetworkPolicy
objects.objects: - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-same-namespace spec: podSelector: {} ingress: - from: - podSelector: {} - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-ingress spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: podSelector: {} policyTypes: - Ingress - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-kube-apiserver-operator spec: ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-kube-apiserver-operator podSelector: matchLabels: app: kube-apiserver-operator policyTypes: - Ingress ...
objects: - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-same-namespace spec: podSelector: {} ingress: - from: - podSelector: {} - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-ingress spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: podSelector: {} policyTypes: - Ingress - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-kube-apiserver-operator spec: ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-kube-apiserver-operator podSelector: matchLabels: app: kube-apiserver-operator policyTypes: - Ingress ...
Copy to Clipboard Copied! Optional: Create a new project to confirm that your network policy objects are created successfully by running the following commands:
Create a new project:
oc new-project <project>
$ oc new-project <project>
1 Copy to Clipboard Copied! - 1
- Replace
<project>
with the name for the project you are creating.
Confirm that the network policy objects in the new project template exist in the new project:
oc get networkpolicy
$ oc get networkpolicy NAME POD-SELECTOR AGE allow-from-openshift-ingress <none> 7s allow-from-same-namespace <none> 7s
Copy to Clipboard Copied!
21.7. Configuring multitenant isolation with network policy
As a cluster administrator, you can configure your network policies to provide multitenant network isolation.
If you are using the OpenShift SDN network plugin, configuring network policies as described in this section provides network isolation similar to multitenant mode but with network policy mode set.
21.7.1. Configuring multitenant isolation by using network policy
You can configure your project to isolate it from pods and services in other project namespaces.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
admin
privileges.
Procedure
Create the following
NetworkPolicy
objects:A policy named
allow-from-openshift-ingress
.cat << EOF| oc create -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-ingress spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: "" podSelector: {} policyTypes: - Ingress EOF
$ cat << EOF| oc create -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-ingress spec: ingress: - from: - namespaceSelector: matchLabels: policy-group.network.openshift.io/ingress: "" podSelector: {} policyTypes: - Ingress EOF
Copy to Clipboard Copied! Notepolicy-group.network.openshift.io/ingress: ""
is the preferred namespace selector label for OpenShift SDN. You can use thenetwork.openshift.io/policy-group: ingress
namespace selector label, but this is a legacy label.A policy named
allow-from-openshift-monitoring
:cat << EOF| oc create -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-monitoring spec: ingress: - from: - namespaceSelector: matchLabels: network.openshift.io/policy-group: monitoring podSelector: {} policyTypes: - Ingress EOF
$ cat << EOF| oc create -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-openshift-monitoring spec: ingress: - from: - namespaceSelector: matchLabels: network.openshift.io/policy-group: monitoring podSelector: {} policyTypes: - Ingress EOF
Copy to Clipboard Copied! A policy named
allow-same-namespace
:cat << EOF| oc create -f - kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-same-namespace spec: podSelector: ingress: - from: - podSelector: {} EOF
$ cat << EOF| oc create -f - kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-same-namespace spec: podSelector: ingress: - from: - podSelector: {} EOF
Copy to Clipboard Copied! A policy named
allow-from-kube-apiserver-operator
:cat << EOF| oc create -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-kube-apiserver-operator spec: ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-kube-apiserver-operator podSelector: matchLabels: app: kube-apiserver-operator policyTypes: - Ingress EOF
$ cat << EOF| oc create -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-kube-apiserver-operator spec: ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: openshift-kube-apiserver-operator podSelector: matchLabels: app: kube-apiserver-operator policyTypes: - Ingress EOF
Copy to Clipboard Copied! For more details, see New
kube-apiserver-operator
webhook controller validating health of webhook.
Optional: To confirm that the network policies exist in your current project, enter the following command:
oc describe networkpolicy
$ oc describe networkpolicy
Copy to Clipboard Copied! Example output
Name: allow-from-openshift-ingress Namespace: example1 Created on: 2020-06-09 00:28:17 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: <none> (Allowing the specific traffic to all pods in this namespace) Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: NamespaceSelector: policy-group.network.openshift.io/ingress: Not affecting egress traffic Policy Types: Ingress Name: allow-from-openshift-monitoring Namespace: example1 Created on: 2020-06-09 00:29:57 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: <none> (Allowing the specific traffic to all pods in this namespace) Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: NamespaceSelector: network.openshift.io/policy-group: monitoring Not affecting egress traffic Policy Types: Ingress
Name: allow-from-openshift-ingress Namespace: example1 Created on: 2020-06-09 00:28:17 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: <none> (Allowing the specific traffic to all pods in this namespace) Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: NamespaceSelector: policy-group.network.openshift.io/ingress: Not affecting egress traffic Policy Types: Ingress Name: allow-from-openshift-monitoring Namespace: example1 Created on: 2020-06-09 00:29:57 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: <none> (Allowing the specific traffic to all pods in this namespace) Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: NamespaceSelector: network.openshift.io/policy-group: monitoring Not affecting egress traffic Policy Types: Ingress
Copy to Clipboard Copied!
21.7.2. Next steps
Chapter 22. CIDR range definitions
If your cluster uses OVN-Kubernetes, you must specify non-overlapping ranges for Classless Inter-Domain Routing (CIDR) subnet ranges.
The following subnet types are mandatory for a cluster that uses OVN-Kubernetes:
- Join: Uses a join switch to connect gateway routers to distributed routers. A join switch reduces the number of IP addresses for a distributed router. For a cluster that uses the OVN-Kubernetes plugin, an IP address from a dedicated subnet is assigned to any logical port that attaches to the join switch.
- Masquerade: Prevents collisions for identical source and destination IP addresses that are sent from a node as hairpin traffic to the same node after a load balancer makes a routing decision.
- Transit: A transit switch is a type of distributed switch that spans across all nodes in the cluster. A transit switch routes traffic between different zones. For a cluster that uses the OVN-Kubernetes plugin, an IP address from a dedicated subnet is assigned to any logical port that attaches to the transit switch.
You can change the join and transit CIDR ranges for your cluster as a post-installation task.
OVN-Kubernetes, the default network provider in OpenShift Container Platform 4.14 and later versions, internally uses the following IP address subnet ranges:
-
V4JoinSubnet
:100.64.0.0/16
-
V6JoinSubnet
:fd98::/64
-
V4TransitSwitchSubnet
:100.88.0.0/16
-
V6TransitSwitchSubnet
:fd97::/64
-
defaultV4MasqueradeSubnet
:169.254.169.0/29
-
defaultV6MasqueradeSubnet
:fd69::/125
The previous list includes join, transit, and masquerade IPv4 and IPv6 address subnets. If your cluster uses OVN-Kubernetes, do not include any of these IP address subnet ranges in any other CIDR definitions in your cluster or infrastructure.
22.1. Machine CIDR
In the Machine classless inter-domain routing (CIDR) field, you must specify the IP address range for machines or cluster nodes.
Machine CIDR ranges cannot be changed after creating your cluster.
The default is 10.0.0.0/16
. This range must not conflict with any connected networks.
22.2. Service CIDR
In the Service CIDR field, you must specify the IP address range for services. The range must be large enough to accommodate your workload. The address block must not overlap with any external service accessed from within the cluster. The default is 172.30.0.0/16
.
22.3. Pod CIDR
In the pod CIDR field, you must specify the IP address range for pods.
The pod CIDR is the same as the clusterNetwork
CIDR and the cluster CIDR. The range must be large enough to accommodate your workload. The address block must not overlap with any external service accessed from within the cluster. The default is 10.128.0.0/14
. You can expand the range after cluster installation.
22.4. Host Prefix
In the Host Prefix field, you must specify the subnet prefix length assigned to pods scheduled to individual machines. The host prefix determines the pod IP address pool for each machine.
For example, if the host prefix is set to /23
, each machine is assigned a /23
subnet from the pod CIDR address range. The default is /23
, allowing 510 cluster nodes, and 510 pod IP addresses per node.
Chapter 23. AWS Load Balancer Operator
23.1. AWS Load Balancer Operator release notes
The AWS Load Balancer (ALB) Operator deploys and manages an instance of the AWSLoadBalancerController
resource.
The AWS Load Balancer (ALB) Operator is only supported on the x86_64
architecture.
These release notes track the development of the AWS Load Balancer Operator in OpenShift Container Platform.
For an overview of the AWS Load Balancer Operator, see AWS Load Balancer Operator in OpenShift Container Platform.
AWS Load Balancer Operator currently does not support AWS GovCloud.
23.1.1. AWS Load Balancer Operator 1.1.1
The following advisory is available for the AWS Load Balancer Operator version 1.1.1:
23.1.2. AWS Load Balancer Operator 1.1.0
The AWS Load Balancer Operator version 1.1.0 supports the AWS Load Balancer Controller version 2.4.4.
The following advisory is available for the AWS Load Balancer Operator version 1.1.0:
23.1.2.1. Notable changes
- This release uses the Kubernetes API version 0.27.2.
23.1.2.2. New features
- The AWS Load Balancer Operator now supports a standardized Security Token Service (STS) flow by using the Cloud Credential Operator.
23.1.2.3. Bug fixes
A FIPS-compliant cluster must use TLS version 1.2. Previously, webhooks for the AWS Load Balancer Controller only accepted TLS 1.3 as the minimum version, resulting in an error such as the following on a FIPS-compliant cluster:
remote error: tls: protocol version not supported
remote error: tls: protocol version not supported
Copy to Clipboard Copied! Now, the AWS Load Balancer Controller accepts TLS 1.2 as the minimum TLS version, resolving this issue. (OCPBUGS-14846)
23.1.3. AWS Load Balancer Operator 1.0.1
The following advisory is available for the AWS Load Balancer Operator version 1.0.1:
23.1.4. AWS Load Balancer Operator 1.0.0
The AWS Load Balancer Operator is now generally available with this release. The AWS Load Balancer Operator version 1.0.0 supports the AWS Load Balancer Controller version 2.4.4.
The following advisory is available for the AWS Load Balancer Operator version 1.0.0:
The AWS Load Balancer (ALB) Operator version 1.x.x cannot upgrade automatically from the Technology Preview version 0.x.x. To upgrade from an earlier version, you must uninstall the ALB operands and delete the aws-load-balancer-operator
namespace.
23.1.4.1. Notable changes
-
This release uses the new
v1
API version.
23.1.4.2. Bug fixes
- Previously, the controller provisioned by the AWS Load Balancer Operator did not properly use the configuration for the cluster-wide proxy. These settings are now applied appropriately to the controller. (OCPBUGS-4052, OCPBUGS-5295)
23.1.5. Earlier versions
The two earliest versions of the AWS Load Balancer Operator are available as a Technology Preview. These versions should not be used in a production cluster. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
The following advisory is available for the AWS Load Balancer Operator version 0.2.0:
The following advisory is available for the AWS Load Balancer Operator version 0.0.1:
23.2. AWS Load Balancer Operator in OpenShift Container Platform
The AWS Load Balancer Operator deploys and manages the AWS Load Balancer Controller. You can install the AWS Load Balancer Operator from OperatorHub by using OpenShift Container Platform web console or CLI.
23.2.1. AWS Load Balancer Operator considerations
Review the following limitations before installing and using the AWS Load Balancer Operator:
- The IP traffic mode only works on AWS Elastic Kubernetes Service (EKS). The AWS Load Balancer Operator disables the IP traffic mode for the AWS Load Balancer Controller. As a result of disabling the IP traffic mode, the AWS Load Balancer Controller cannot use the pod readiness gate.
-
The AWS Load Balancer Operator adds command-line flags such as
--disable-ingress-class-annotation
and--disable-ingress-group-name-annotation
to the AWS Load Balancer Controller. Therefore, the AWS Load Balancer Operator does not allow using thekubernetes.io/ingress.class
andalb.ingress.kubernetes.io/group.name
annotations in theIngress
resource. -
You have configured the AWS Load Balancer Operator so that the SVC type is
NodePort
(notLoadBalancer
orClusterIP
).
23.2.2. AWS Load Balancer Operator
The AWS Load Balancer Operator can tag the public subnets if the kubernetes.io/role/elb
tag is missing. Also, the AWS Load Balancer Operator detects the following information from the underlying AWS cloud:
- The ID of the virtual private cloud (VPC) on which the cluster hosting the Operator is deployed in.
- Public and private subnets of the discovered VPC.
The AWS Load Balancer Operator supports the Kubernetes service resource of type LoadBalancer
by using Network Load Balancer (NLB) with the instance
target type only.
Procedure
You can deploy the AWS Load Balancer Operator on demand from OperatorHub, by creating a
Subscription
object by running the following command:oc -n aws-load-balancer-operator get sub aws-load-balancer-operator --template='{{.status.installplan.name}}{{"\n"}}'
$ oc -n aws-load-balancer-operator get sub aws-load-balancer-operator --template='{{.status.installplan.name}}{{"\n"}}'
Copy to Clipboard Copied! Example output
install-zlfbt
install-zlfbt
Copy to Clipboard Copied! Check if the status of an install plan is
Complete
by running the following command:oc -n aws-load-balancer-operator get ip <install_plan_name> --template='{{.status.phase}}{{"\n"}}'
$ oc -n aws-load-balancer-operator get ip <install_plan_name> --template='{{.status.phase}}{{"\n"}}'
Copy to Clipboard Copied! Example output
Complete
Complete
Copy to Clipboard Copied! View the status of the
aws-load-balancer-operator-controller-manager
deployment by running the following command:oc get -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager
$ oc get -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager
Copy to Clipboard Copied! Example output
NAME READY UP-TO-DATE AVAILABLE AGE aws-load-balancer-operator-controller-manager 1/1 1 1 23h
NAME READY UP-TO-DATE AVAILABLE AGE aws-load-balancer-operator-controller-manager 1/1 1 1 23h
Copy to Clipboard Copied!
23.2.3. Using the AWS Load Balancer Operator in an AWS VPC cluster extended into an Outpost
You can configure the AWS Load Balancer Operator to provision an AWS Application Load Balancer in an AWS VPC cluster extended into an Outpost. AWS Outposts does not support AWS Network Load Balancers. As a result, the AWS Load Balancer Operator cannot provision Network Load Balancers in an Outpost.
You can create an AWS Application Load Balancer either in the cloud subnet or in the Outpost subnet. An Application Load Balancer in the cloud can attach to cloud-based compute nodes and an Application Load Balancer in the Outpost can attach to edge compute nodes. You must annotate Ingress resources with the Outpost subnet or the VPC subnet, but not both.
Prerequisites
- You have extended an AWS VPC cluster into an Outpost.
-
You have installed the OpenShift CLI (
oc
). - You have installed the AWS Load Balancer Operator and created the AWS Load Balancer Controller.
Procedure
Configure the
Ingress
resource to use a specified subnet:Example
Ingress
resource configurationapiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: <application_name> annotations: alb.ingress.kubernetes.io/subnets: <subnet_id> spec: ingressClassName: alb rules: - http: paths: - path: / pathType: Exact backend: service: name: <application_name> port: number: 80
apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: <application_name> annotations: alb.ingress.kubernetes.io/subnets: <subnet_id>
1 spec: ingressClassName: alb rules: - http: paths: - path: / pathType: Exact backend: service: name: <application_name> port: number: 80
Copy to Clipboard Copied! - 1
- Specifies the subnet to use.
- To use the Application Load Balancer in an Outpost, specify the Outpost subnet ID.
- To use the Application Load Balancer in the cloud, you must specify at least two subnets in different availability zones.
23.2.4. AWS Load Balancer Operator logs
You can view the AWS Load Balancer Operator logs by using the oc logs
command.
Procedure
View the logs of the AWS Load Balancer Operator by running the following command:
oc logs -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager -c manager
$ oc logs -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager -c manager
Copy to Clipboard Copied!
23.3. Installing the AWS Load Balancer Operator
The AWS Load Balancer Operator deploys and manages the AWS Load Balancer Controller. You can install the AWS Load Balancer Operator from the OperatorHub by using OpenShift Container Platform web console or CLI.
23.3.1. Installing the AWS Load Balancer Operator by using the web console
You can install the AWS Load Balancer Operator by using the web console.
Prerequisites
-
You have logged in to the OpenShift Container Platform web console as a user with
cluster-admin
permissions. - Your cluster is configured with AWS as the platform type and cloud provider.
- If you are using a security token service (STS) or user-provisioned infrastructure, follow the related preparation steps. For example, if you are using AWS Security Token Service, see "Preparing for the AWS Load Balancer Operator on a cluster using the AWS Security Token Service (STS)".
Procedure
- Navigate to Operators → OperatorHub in the OpenShift Container Platform web console.
- Select the AWS Load Balancer Operator. You can use the Filter by keyword text box or use the filter list to search for the AWS Load Balancer Operator from the list of Operators.
-
Select the
aws-load-balancer-operator
namespace. On the Install Operator page, select the following options:
- Update the channel as stable-v1.
- Installation mode as All namespaces on the cluster (default).
-
Installed Namespace as
aws-load-balancer-operator
. If theaws-load-balancer-operator
namespace does not exist, it gets created during the Operator installation. - Select Update approval as Automatic or Manual. By default, the Update approval is set to Automatic. If you select automatic updates, the Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without any intervention. If you select manual updates, the OLM creates an update request. As a cluster administrator, you must then manually approve that update request to update the Operator updated to the new version.
- Click Install.
Verification
- Verify that the AWS Load Balancer Operator shows the Status as Succeeded on the Installed Operators dashboard.
23.3.2. Installing the AWS Load Balancer Operator by using the CLI
You can install the AWS Load Balancer Operator by using the CLI.
Prerequisites
-
You are logged in to the OpenShift Container Platform web console as a user with
cluster-admin
permissions. - Your cluster is configured with AWS as the platform type and cloud provider.
-
You are logged into the OpenShift CLI (
oc
).
Procedure
Create a
Namespace
object:Create a YAML file that defines the
Namespace
object:Example
namespace.yaml
fileapiVersion: v1 kind: Namespace metadata: name: aws-load-balancer-operator
apiVersion: v1 kind: Namespace metadata: name: aws-load-balancer-operator
Copy to Clipboard Copied! Create the
Namespace
object by running the following command:oc apply -f namespace.yaml
$ oc apply -f namespace.yaml
Copy to Clipboard Copied!
Create an
OperatorGroup
object:Create a YAML file that defines the
OperatorGroup
object:Example
operatorgroup.yaml
fileapiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: aws-lb-operatorgroup namespace: aws-load-balancer-operator spec: upgradeStrategy: Default
apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: aws-lb-operatorgroup namespace: aws-load-balancer-operator spec: upgradeStrategy: Default
Copy to Clipboard Copied! Create the
OperatorGroup
object by running the following command:oc apply -f operatorgroup.yaml
$ oc apply -f operatorgroup.yaml
Copy to Clipboard Copied!
Create a
Subscription
object:Create a YAML file that defines the
Subscription
object:Example
subscription.yaml
fileapiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: aws-load-balancer-operator namespace: aws-load-balancer-operator spec: channel: stable-v1 installPlanApproval: Automatic name: aws-load-balancer-operator source: redhat-operators sourceNamespace: openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: aws-load-balancer-operator namespace: aws-load-balancer-operator spec: channel: stable-v1 installPlanApproval: Automatic name: aws-load-balancer-operator source: redhat-operators sourceNamespace: openshift-marketplace
Copy to Clipboard Copied! Create the
Subscription
object by running the following command:oc apply -f subscription.yaml
$ oc apply -f subscription.yaml
Copy to Clipboard Copied!
Verification
Get the name of the install plan from the subscription:
oc -n aws-load-balancer-operator \ get subscription aws-load-balancer-operator \ --template='{{.status.installplan.name}}{{"\n"}}'
$ oc -n aws-load-balancer-operator \ get subscription aws-load-balancer-operator \ --template='{{.status.installplan.name}}{{"\n"}}'
Copy to Clipboard Copied! Check the status of the install plan:
oc -n aws-load-balancer-operator \ get ip <install_plan_name> \ --template='{{.status.phase}}{{"\n"}}'
$ oc -n aws-load-balancer-operator \ get ip <install_plan_name> \ --template='{{.status.phase}}{{"\n"}}'
Copy to Clipboard Copied! The output must be
Complete
.
23.4. Preparing for the AWS Load Balancer Operator on a cluster using the AWS Security Token Service
You can install the AWS Load Balancer Operator on a cluster that uses STS. Follow these steps to prepare your cluster before installing the Operator.
The AWS Load Balancer Operator relies on the CredentialsRequest
object to bootstrap the Operator and the AWS Load Balancer Controller. The AWS Load Balancer Operator waits until the required secrets are created and available.
23.4.1. Creating an IAM role for the AWS Load Balancer Operator
An additional AWS Identity and Access Management (IAM) role is required to successfully install the AWS Load Balancer Operator on a cluster that uses STS. The IAM role is required to interact with subnets and Virtual Private Clouds (VPCs). The AWS Load Balancer Operator generates the CredentialsRequest
object with the IAM role to bootstrap itself.
You can create the IAM role by using the following options:
-
Using the Cloud Credential Operator utility (
ccoctl
) and a predefinedCredentialsRequest
object. - Using the AWS CLI and predefined AWS manifests.
Use the AWS CLI if your environment does not support the ccoctl
command.
23.4.1.1. Creating an AWS IAM role by using the Cloud Credential Operator utility
You can use the Cloud Credential Operator utility (ccoctl
) to create an AWS IAM role for the AWS Load Balancer Operator. An AWS IAM role is used to interact with subnets and Virtual Private Clouds (VPCs).
Prerequisites
-
You must extract and prepare the
ccoctl
binary.
Procedure
Download the
CredentialsRequest
custom resource (CR) and store it in a directory by running the following command:curl --create-dirs -o <credrequests-dir>/operator.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-credentials-request.yaml
$ curl --create-dirs -o <credrequests-dir>/operator.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-credentials-request.yaml
Copy to Clipboard Copied! Use the
ccoctl
utility to create an AWS IAM role by running the following command:ccoctl aws create-iam-roles \ --name <name> \ --region=<aws_region> \ --credentials-requests-dir=<credrequests-dir> \ --identity-provider-arn <oidc-arn>
$ ccoctl aws create-iam-roles \ --name <name> \ --region=<aws_region> \ --credentials-requests-dir=<credrequests-dir> \ --identity-provider-arn <oidc-arn>
Copy to Clipboard Copied! Example output
2023/09/12 11:38:57 Role arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-operator created 2023/09/12 11:38:57 Saved credentials configuration to: /home/user/<credrequests-dir>/manifests/aws-load-balancer-operator-aws-load-balancer-operator-credentials.yaml 2023/09/12 11:38:58 Updated Role policy for Role <name>-aws-load-balancer-operator-aws-load-balancer-operator created
2023/09/12 11:38:57 Role arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-operator created
1 2023/09/12 11:38:57 Saved credentials configuration to: /home/user/<credrequests-dir>/manifests/aws-load-balancer-operator-aws-load-balancer-operator-credentials.yaml 2023/09/12 11:38:58 Updated Role policy for Role <name>-aws-load-balancer-operator-aws-load-balancer-operator created
Copy to Clipboard Copied! - 1
- Note the Amazon Resource Name (ARN) of an AWS IAM role.
NoteThe length of an AWS IAM role name must be less than or equal to 12 characters.
23.4.1.2. Creating an AWS IAM role by using the AWS CLI
You can use the AWS Command Line Interface to create an IAM role for the AWS Load Balancer Operator. The IAM role is used to interact with subnets and Virtual Private Clouds (VPCs).
Prerequisites
-
You must have access to the AWS Command Line Interface (
aws
).
Procedure
Generate a trust policy file by using your identity provider by running the following command:
cat <<EOF > albo-operator-trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::777777777777:oidc-provider/<oidc-provider-id>" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "<oidc-provider-id>:sub": "system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-operator-controller-manager" } } } ] } EOF
$ cat <<EOF > albo-operator-trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::777777777777:oidc-provider/<oidc-provider-id>"
1 }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "<oidc-provider-id>:sub": "system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-operator-controller-manager"
2 } } } ] } EOF
Copy to Clipboard Copied! Create the IAM role with the generated trust policy by running the following command:
aws iam create-role --role-name albo-operator --assume-role-policy-document file://albo-operator-trust-policy.json
$ aws iam create-role --role-name albo-operator --assume-role-policy-document file://albo-operator-trust-policy.json
Copy to Clipboard Copied! Example output
ROLE arn:aws:iam::777777777777:role/albo-operator 2023-08-02T12:13:22Z ASSUMEROLEPOLICYDOCUMENT 2012-10-17 STATEMENT sts:AssumeRoleWithWebIdentity Allow STRINGEQUALS system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-manager PRINCIPAL arn:aws:iam:777777777777:oidc-provider/<oidc-provider-id>
ROLE arn:aws:iam::777777777777:role/albo-operator 2023-08-02T12:13:22Z
1 ASSUMEROLEPOLICYDOCUMENT 2012-10-17 STATEMENT sts:AssumeRoleWithWebIdentity Allow STRINGEQUALS system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-manager PRINCIPAL arn:aws:iam:777777777777:oidc-provider/<oidc-provider-id>
Copy to Clipboard Copied! - 1
- Note the ARN of the created IAM role.
Download the permission policy for the AWS Load Balancer Operator by running the following command:
curl -o albo-operator-permission-policy.json https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-permission-policy.json
$ curl -o albo-operator-permission-policy.json https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-permission-policy.json
Copy to Clipboard Copied! Attach the permission policy for the AWS Load Balancer Controller to the IAM role by running the following command:
aws iam put-role-policy --role-name albo-operator --policy-name perms-policy-albo-operator --policy-document file://albo-operator-permission-policy.json
$ aws iam put-role-policy --role-name albo-operator --policy-name perms-policy-albo-operator --policy-document file://albo-operator-permission-policy.json
Copy to Clipboard Copied!
23.4.2. Configuring the ARN role for the AWS Load Balancer Operator
You can configure the Amazon Resource Name (ARN) role for the AWS Load Balancer Operator as an environment variable. You can configure the ARN role by using the CLI.
Prerequisites
-
You have installed the OpenShift CLI (
oc
).
Procedure
Create the
aws-load-balancer-operator
project by running the following command:oc new-project aws-load-balancer-operator
$ oc new-project aws-load-balancer-operator
Copy to Clipboard Copied! Create the
OperatorGroup
object by running the following command:cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: aws-load-balancer-operator namespace: aws-load-balancer-operator spec: targetNamespaces: [] EOF
$ cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: aws-load-balancer-operator namespace: aws-load-balancer-operator spec: targetNamespaces: [] EOF
Copy to Clipboard Copied! Create the
Subscription
object by running the following command:cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: aws-load-balancer-operator namespace: aws-load-balancer-operator spec: channel: stable-v1 name: aws-load-balancer-operator source: redhat-operators sourceNamespace: openshift-marketplace config: env: - name: ROLEARN value: "<role-arn>" EOF
$ cat <<EOF | oc apply -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: aws-load-balancer-operator namespace: aws-load-balancer-operator spec: channel: stable-v1 name: aws-load-balancer-operator source: redhat-operators sourceNamespace: openshift-marketplace config: env: - name: ROLEARN value: "<role-arn>"
1 EOF
Copy to Clipboard Copied! - 1
- Specifies the ARN role to be used in the
CredentialsRequest
to provision the AWS credentials for the AWS Load Balancer Operator.
NoteThe AWS Load Balancer Operator waits until the secret is created before moving to the
Available
status.
23.4.3. Creating an IAM role for the AWS Load Balancer Controller
The CredentialsRequest
object for the AWS Load Balancer Controller must be set with a manually provisioned IAM role.
You can create the IAM role by using the following options:
-
Using the Cloud Credential Operator utility (
ccoctl
) and a predefinedCredentialsRequest
object. - Using the AWS CLI and predefined AWS manifests.
Use the AWS CLI if your environment does not support the ccoctl
command.
23.4.3.1. Creating an AWS IAM role for the controller by using the Cloud Credential Operator utility
You can use the Cloud Credential Operator utility (ccoctl
) to create an AWS IAM role for the AWS Load Balancer Controller. An AWS IAM role is used to interact with subnets and Virtual Private Clouds (VPCs).
Prerequisites
-
You must extract and prepare the
ccoctl
binary.
Procedure
Download the
CredentialsRequest
custom resource (CR) and store it in a directory by running the following command:curl --create-dirs -o <credrequests-dir>/controller.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/controller/controller-credentials-request.yaml
$ curl --create-dirs -o <credrequests-dir>/controller.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/controller/controller-credentials-request.yaml
Copy to Clipboard Copied! Use the
ccoctl
utility to create an AWS IAM role by running the following command:ccoctl aws create-iam-roles \ --name <name> \ --region=<aws_region> \ --credentials-requests-dir=<credrequests-dir> \ --identity-provider-arn <oidc-arn>
$ ccoctl aws create-iam-roles \ --name <name> \ --region=<aws_region> \ --credentials-requests-dir=<credrequests-dir> \ --identity-provider-arn <oidc-arn>
Copy to Clipboard Copied! Example output
2023/09/12 11:38:57 Role arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-controller created 2023/09/12 11:38:57 Saved credentials configuration to: /home/user/<credrequests-dir>/manifests/aws-load-balancer-operator-aws-load-balancer-controller-credentials.yaml 2023/09/12 11:38:58 Updated Role policy for Role <name>-aws-load-balancer-operator-aws-load-balancer-controller created
2023/09/12 11:38:57 Role arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-controller created
1 2023/09/12 11:38:57 Saved credentials configuration to: /home/user/<credrequests-dir>/manifests/aws-load-balancer-operator-aws-load-balancer-controller-credentials.yaml 2023/09/12 11:38:58 Updated Role policy for Role <name>-aws-load-balancer-operator-aws-load-balancer-controller created
Copy to Clipboard Copied! - 1
- Note the Amazon Resource Name (ARN) of an AWS IAM role.
NoteThe length of an AWS IAM role name must be less than or equal to 12 characters.
23.4.3.2. Creating an AWS IAM role for the controller by using the AWS CLI
You can use the AWS command-line interface to create an AWS IAM role for the AWS Load Balancer Controller. An AWS IAM role is used to interact with subnets and Virtual Private Clouds (VPCs).
Prerequisites
-
You must have access to the AWS command-line interface (
aws
).
Procedure
Generate a trust policy file using your identity provider by running the following command:
cat <<EOF > albo-controller-trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::777777777777:oidc-provider/<oidc-provider-id>" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "<oidc-provider-id>:sub": "system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-cluster" } } } ] } EOF
$ cat <<EOF > albo-controller-trust-policy.json { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Principal": { "Federated": "arn:aws:iam::777777777777:oidc-provider/<oidc-provider-id>"
1 }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { "<oidc-provider-id>:sub": "system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-cluster"
2 } } } ] } EOF
Copy to Clipboard Copied! Create an AWS IAM role with the generated trust policy by running the following command:
aws iam create-role --role-name albo-controller --assume-role-policy-document file://albo-controller-trust-policy.json
$ aws iam create-role --role-name albo-controller --assume-role-policy-document file://albo-controller-trust-policy.json
Copy to Clipboard Copied! Example output
ROLE arn:aws:iam::777777777777:role/albo-controller 2023-08-02T12:13:22Z ASSUMEROLEPOLICYDOCUMENT 2012-10-17 STATEMENT sts:AssumeRoleWithWebIdentity Allow STRINGEQUALS system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-cluster PRINCIPAL arn:aws:iam:777777777777:oidc-provider/<oidc-provider-id>
ROLE arn:aws:iam::777777777777:role/albo-controller 2023-08-02T12:13:22Z
1 ASSUMEROLEPOLICYDOCUMENT 2012-10-17 STATEMENT sts:AssumeRoleWithWebIdentity Allow STRINGEQUALS system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-cluster PRINCIPAL arn:aws:iam:777777777777:oidc-provider/<oidc-provider-id>
Copy to Clipboard Copied! - 1
- Note the ARN of an AWS IAM role.
Download the permission policy for the AWS Load Balancer Controller by running the following command:
curl -o albo-controller-permission-policy.json https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/assets/iam-policy.json
$ curl -o albo-controller-permission-policy.json https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/assets/iam-policy.json
Copy to Clipboard Copied! Attach the permission policy for the AWS Load Balancer Controller to an AWS IAM role by running the following command:
aws iam put-role-policy --role-name albo-controller --policy-name perms-policy-albo-controller --policy-document file://albo-controller-permission-policy.json
$ aws iam put-role-policy --role-name albo-controller --policy-name perms-policy-albo-controller --policy-document file://albo-controller-permission-policy.json
Copy to Clipboard Copied! Create a YAML file that defines the
AWSLoadBalancerController
object:Example
sample-aws-lb-manual-creds.yaml
file:apiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController metadata: name: cluster spec: credentialsRequestConfig: stsIAMRoleARN: <role-arn>
apiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController
1 metadata: name: cluster
2 spec: credentialsRequestConfig: stsIAMRoleARN: <role-arn>
3 Copy to Clipboard Copied!
23.5. Creating an instance of the AWS Load Balancer Controller
After installing the AWS Load Balancer Operator, you can create the AWS Load Balancer Controller.
23.5.1. Creating the AWS Load Balancer Controller
You can install only a single instance of the AWSLoadBalancerController
object in a cluster. You can create the AWS Load Balancer Controller by using CLI. The AWS Load Balancer Operator reconciles only the cluster
named resource.
Prerequisites
-
You have created the
echoserver
namespace. -
You have access to the OpenShift CLI (
oc
).
Procedure
Create a YAML file that defines the
AWSLoadBalancerController
object:Example
sample-aws-lb.yaml
fileapiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController metadata: name: cluster spec: subnetTagging: Auto additionalResourceTags: - key: example.org/security-scope value: staging ingressClass: alb config: replicas: 2 enabledAddons: - AWSWAFv2
apiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController
1 metadata: name: cluster
2 spec: subnetTagging: Auto
3 additionalResourceTags:
4 - key: example.org/security-scope value: staging ingressClass: alb
5 config: replicas: 2
6 enabledAddons:
7 - AWSWAFv2
8 Copy to Clipboard Copied! - 1
- Defines the
AWSLoadBalancerController
object. - 2
- Defines the AWS Load Balancer Controller name. This instance name gets added as a suffix to all related resources.
- 3
- Configures the subnet tagging method for the AWS Load Balancer Controller. The following values are valid:
-
Auto
: The AWS Load Balancer Operator determines the subnets that belong to the cluster and tags them appropriately. The Operator cannot determine the role correctly if the internal subnet tags are not present on internal subnet. -
Manual
: You manually tag the subnets that belong to the cluster with the appropriate role tags. Use this option if you installed your cluster on user-provided infrastructure.
-
- 4
- Defines the tags used by the AWS Load Balancer Controller when it provisions AWS resources.
- 5
- Defines the ingress class name. The default value is
alb
. - 6
- Specifies the number of replicas of the AWS Load Balancer Controller.
- 7
- Specifies annotations as an add-on for the AWS Load Balancer Controller.
- 8
- Enables the
alb.ingress.kubernetes.io/wafv2-acl-arn
annotation.
Create the
AWSLoadBalancerController
object by running the following command:oc create -f sample-aws-lb.yaml
$ oc create -f sample-aws-lb.yaml
Copy to Clipboard Copied! Create a YAML file that defines the
Deployment
resource:Example
sample-aws-lb.yaml
fileapiVersion: apps/v1 kind: Deployment metadata: name: <echoserver> namespace: echoserver spec: selector: matchLabels: app: echoserver replicas: 3 template: metadata: labels: app: echoserver spec: containers: - image: openshift/origin-node command: - "/bin/socat" args: - TCP4-LISTEN:8080,reuseaddr,fork - EXEC:'/bin/bash -c \"printf \\\"HTTP/1.0 200 OK\r\n\r\n\\\"; sed -e \\\"/^\r/q\\\"\"' imagePullPolicy: Always name: echoserver ports: - containerPort: 8080
apiVersion: apps/v1 kind: Deployment
1 metadata: name: <echoserver>
2 namespace: echoserver spec: selector: matchLabels: app: echoserver replicas: 3
3 template: metadata: labels: app: echoserver spec: containers: - image: openshift/origin-node command: - "/bin/socat" args: - TCP4-LISTEN:8080,reuseaddr,fork - EXEC:'/bin/bash -c \"printf \\\"HTTP/1.0 200 OK\r\n\r\n\\\"; sed -e \\\"/^\r/q\\\"\"' imagePullPolicy: Always name: echoserver ports: - containerPort: 8080
Copy to Clipboard Copied! Create a YAML file that defines the
Service
resource:Example
service-albo.yaml
file:apiVersion: v1 kind: Service metadata: name: <echoserver> namespace: echoserver spec: ports: - port: 80 targetPort: 8080 protocol: TCP type: NodePort selector: app: echoserver
apiVersion: v1 kind: Service
1 metadata: name: <echoserver>
2 namespace: echoserver spec: ports: - port: 80 targetPort: 8080 protocol: TCP type: NodePort selector: app: echoserver
Copy to Clipboard Copied! Create a YAML file that defines the
Ingress
resource:Example
ingress-albo.yaml
file:apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: <name> namespace: echoserver annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: alb rules: - http: paths: - path: / pathType: Exact backend: service: name: <echoserver> port: number: 80
apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: <name>
1 namespace: echoserver annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: alb rules: - http: paths: - path: / pathType: Exact backend: service: name: <echoserver>
2 port: number: 80
Copy to Clipboard Copied!
Verification
Save the status of the
Ingress
resource in theHOST
variable by running the following command:HOST=$(oc get ingress -n echoserver echoserver --template='{{(index .status.loadBalancer.ingress 0).hostname}}')
$ HOST=$(oc get ingress -n echoserver echoserver --template='{{(index .status.loadBalancer.ingress 0).hostname}}')
Copy to Clipboard Copied! Verify the status of the
Ingress
resource by running the following command:curl $HOST
$ curl $HOST
Copy to Clipboard Copied!
23.6. Serving multiple ingress resources through a single AWS Load Balancer
You can route the traffic to different services that are part of a single domain through a single AWS Load Balancer. Each Ingress resource provides different endpoints of the domain.
23.6.1. Creating multiple ingress resources through a single AWS Load Balancer
You can route the traffic to multiple ingress resources through a single AWS Load Balancer by using the CLI.
Prerequisites
-
You have an access to the OpenShift CLI (
oc
).
Procedure
Create an
IngressClassParams
resource YAML file, for example,sample-single-lb-params.yaml
, as follows:apiVersion: elbv2.k8s.aws/v1beta1 kind: IngressClassParams metadata: name: single-lb-params spec: group: name: single-lb
apiVersion: elbv2.k8s.aws/v1beta1
1 kind: IngressClassParams metadata: name: single-lb-params
2 spec: group: name: single-lb
3 Copy to Clipboard Copied! Create the
IngressClassParams
resource by running the following command:oc create -f sample-single-lb-params.yaml
$ oc create -f sample-single-lb-params.yaml
Copy to Clipboard Copied! Create the
IngressClass
resource YAML file, for example,sample-single-lb-class.yaml
, as follows:apiVersion: networking.k8s.io/v1 kind: IngressClass metadata: name: single-lb spec: controller: ingress.k8s.aws/alb parameters: apiGroup: elbv2.k8s.aws kind: IngressClassParams name: single-lb-params
apiVersion: networking.k8s.io/v1
1 kind: IngressClass metadata: name: single-lb
2 spec: controller: ingress.k8s.aws/alb
3 parameters: apiGroup: elbv2.k8s.aws
4 kind: IngressClassParams
5 name: single-lb-params
6 Copy to Clipboard Copied! - 1
- Defines the API group and version of the
IngressClass
resource. - 2
- Specifies the ingress class name.
- 3
- Defines the controller name. The
ingress.k8s.aws/alb
value denotes that all ingress resources of this class should be managed by the AWS Load Balancer Controller. - 4
- Defines the API group of the
IngressClassParams
resource. - 5
- Defines the resource type of the
IngressClassParams
resource. - 6
- Defines the
IngressClassParams
resource name.
Create the
IngressClass
resource by running the following command:oc create -f sample-single-lb-class.yaml
$ oc create -f sample-single-lb-class.yaml
Copy to Clipboard Copied! Create the
AWSLoadBalancerController
resource YAML file, for example,sample-single-lb.yaml
, as follows:apiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController metadata: name: cluster spec: subnetTagging: Auto ingressClass: single-lb
apiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController metadata: name: cluster spec: subnetTagging: Auto ingressClass: single-lb
1 Copy to Clipboard Copied! - 1
- Defines the name of the
IngressClass
resource.
Create the
AWSLoadBalancerController
resource by running the following command:oc create -f sample-single-lb.yaml
$ oc create -f sample-single-lb.yaml
Copy to Clipboard Copied! Create the
Ingress
resource YAML file, for example,sample-multiple-ingress.yaml
, as follows:apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-1 annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/group.order: "1" alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: single-lb rules: - host: example.com http: paths: - path: /blog pathType: Prefix backend: service: name: example-1 port: number: 80 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-2 annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/group.order: "2" alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: single-lb rules: - host: example.com http: paths: - path: /store pathType: Prefix backend: service: name: example-2 port: number: 80 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-3 annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/group.order: "3" alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: single-lb rules: - host: example.com http: paths: - path: / pathType: Prefix backend: service: name: example-3 port: number: 80
apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-1
1 annotations: alb.ingress.kubernetes.io/scheme: internet-facing
2 alb.ingress.kubernetes.io/group.order: "1"
3 alb.ingress.kubernetes.io/target-type: instance
4 spec: ingressClassName: single-lb
5 rules: - host: example.com
6 http: paths: - path: /blog
7 pathType: Prefix backend: service: name: example-1
8 port: number: 80
9 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-2 annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/group.order: "2" alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: single-lb rules: - host: example.com http: paths: - path: /store pathType: Prefix backend: service: name: example-2 port: number: 80 --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: example-3 annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/group.order: "3" alb.ingress.kubernetes.io/target-type: instance spec: ingressClassName: single-lb rules: - host: example.com http: paths: - path: / pathType: Prefix backend: service: name: example-3 port: number: 80
Copy to Clipboard Copied! - 1
- Specifies the ingress name.
- 2
- Indicates the load balancer to provision in the public subnet to access the internet.
- 3
- Specifies the order in which the rules from the multiple ingress resources are matched when the request is received at the load balancer.
- 4
- Indicates that the load balancer will target OpenShift Container Platform nodes to reach the service.
- 5
- Specifies the ingress class that belongs to this ingress.
- 6
- Defines a domain name used for request routing.
- 7
- Defines the path that must route to the service.
- 8
- Defines the service name that serves the endpoint configured in the
Ingress
resource. - 9
- Defines the port on the service that serves the endpoint.
Create the
Ingress
resource by running the following command:oc create -f sample-multiple-ingress.yaml
$ oc create -f sample-multiple-ingress.yaml
Copy to Clipboard Copied!
23.7. Adding TLS termination
You can add TLS termination on the AWS Load Balancer.
23.7.1. Adding TLS termination on the AWS Load Balancer
You can route the traffic for the domain to pods of a service and add TLS termination on the AWS Load Balancer.
Prerequisites
-
You have an access to the OpenShift CLI (
oc
).
Procedure
Create a YAML file that defines the
AWSLoadBalancerController
resource:Example
add-tls-termination-albc.yaml
fileapiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController metadata: name: cluster spec: subnetTagging: Auto ingressClass: tls-termination
apiVersion: networking.olm.openshift.io/v1 kind: AWSLoadBalancerController metadata: name: cluster spec: subnetTagging: Auto ingressClass: tls-termination
1 Copy to Clipboard Copied! - 1
- Defines the ingress class name. If the ingress class is not present in your cluster the AWS Load Balancer Controller creates one. The AWS Load Balancer Controller reconciles the additional ingress class values if
spec.controller
is set toingress.k8s.aws/alb
.
Create a YAML file that defines the
Ingress
resource:Example
add-tls-termination-ingress.yaml
fileapiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: <example> annotations: alb.ingress.kubernetes.io/scheme: internet-facing alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:xxxxx spec: ingressClassName: tls-termination rules: - host: <example.com> http: paths: - path: / pathType: Exact backend: service: name: <example-service> port: number: 80
apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: <example>
1 annotations: alb.ingress.kubernetes.io/scheme: internet-facing
2 alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:xxxxx
3 spec: ingressClassName: tls-termination
4 rules: - host: <example.com>
5 http: paths: - path: / pathType: Exact backend: service: name: <example-service>
6 port: number: 80
Copy to Clipboard Copied! - 1
- Specifies the ingress name.
- 2
- The controller provisions the load balancer for ingress in a public subnet to access the load balancer over the internet.
- 3
- The Amazon Resource Name (ARN) of the certificate that you attach to the load balancer.
- 4
- Defines the ingress class name.
- 5
- Defines the domain for traffic routing.
- 6
- Defines the service for traffic routing.
23.8. Configuring cluster-wide proxy
You can configure the cluster-wide proxy in the AWS Load Balancer Operator. After configuring the cluster-wide proxy, Operator Lifecycle Manager (OLM) automatically updates all the deployments of the Operators with the environment variables such as HTTP_PROXY
, HTTPS_PROXY
, and NO_PROXY
. These variables are populated to the managed controller by the AWS Load Balancer Operator.
23.8.1. Trusting the certificate authority of the cluster-wide proxy
Create the config map to contain the certificate authority (CA) bundle in the
aws-load-balancer-operator
namespace by running the following command:oc -n aws-load-balancer-operator create configmap trusted-ca
$ oc -n aws-load-balancer-operator create configmap trusted-ca
Copy to Clipboard Copied! To inject the trusted CA bundle into the config map, add the
config.openshift.io/inject-trusted-cabundle=true
label to the config map by running the following command:oc -n aws-load-balancer-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
$ oc -n aws-load-balancer-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
Copy to Clipboard Copied! Update the AWS Load Balancer Operator subscription to access the config map in the AWS Load Balancer Operator deployment by running the following command:
oc -n aws-load-balancer-operator patch subscription aws-load-balancer-operator --type='merge' -p '{"spec":{"config":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}],"volumes":[{"name":"trusted-ca","configMap":{"name":"trusted-ca"}}],"volumeMounts":[{"name":"trusted-ca","mountPath":"/etc/pki/tls/certs/albo-tls-ca-bundle.crt","subPath":"ca-bundle.crt"}]}}}'
$ oc -n aws-load-balancer-operator patch subscription aws-load-balancer-operator --type='merge' -p '{"spec":{"config":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}],"volumes":[{"name":"trusted-ca","configMap":{"name":"trusted-ca"}}],"volumeMounts":[{"name":"trusted-ca","mountPath":"/etc/pki/tls/certs/albo-tls-ca-bundle.crt","subPath":"ca-bundle.crt"}]}}}'
Copy to Clipboard Copied! After the AWS Load Balancer Operator is deployed, verify that the CA bundle is added to the
aws-load-balancer-operator-controller-manager
deployment by running the following command:oc -n aws-load-balancer-operator exec deploy/aws-load-balancer-operator-controller-manager -c manager -- bash -c "ls -l /etc/pki/tls/certs/albo-tls-ca-bundle.crt; printenv TRUSTED_CA_CONFIGMAP_NAME"
$ oc -n aws-load-balancer-operator exec deploy/aws-load-balancer-operator-controller-manager -c manager -- bash -c "ls -l /etc/pki/tls/certs/albo-tls-ca-bundle.crt; printenv TRUSTED_CA_CONFIGMAP_NAME"
Copy to Clipboard Copied! Example output
-rw-r--r--. 1 root 1000690000 5875 Jan 11 12:25 /etc/pki/tls/certs/albo-tls-ca-bundle.crt trusted-ca
-rw-r--r--. 1 root 1000690000 5875 Jan 11 12:25 /etc/pki/tls/certs/albo-tls-ca-bundle.crt trusted-ca
Copy to Clipboard Copied! Optional: Restart deployment of the AWS Load Balancer Operator every time the config map changes by running the following command:
oc -n aws-load-balancer-operator rollout restart deployment/aws-load-balancer-operator-controller-manager
$ oc -n aws-load-balancer-operator rollout restart deployment/aws-load-balancer-operator-controller-manager
Copy to Clipboard Copied!
Chapter 24. Multiple networks
24.1. Understanding multiple networks
In Kubernetes, container networking is delegated to networking plugins that implement the Container Network Interface (CNI).
OpenShift Container Platform uses the Multus CNI plugin to allow chaining of CNI plugins. During cluster installation, you configure your default pod network. The default network handles all ordinary network traffic for the cluster. You can define an additional network based on the available CNI plugins and attach one or more of these networks to your pods. You can define more than one additional network for your cluster, depending on your needs. This gives you flexibility when you configure pods that deliver network functionality, such as switching or routing.
24.1.1. Usage scenarios for an additional network
You can use an additional network in situations where network isolation is needed, including data plane and control plane separation. Isolating network traffic is useful for the following performance and security reasons:
- Performance
- You can send traffic on two different planes to manage how much traffic is along each plane.
- Security
- You can send sensitive traffic onto a network plane that is managed specifically for security considerations, and you can separate private data that must not be shared between tenants or customers.
All of the pods in the cluster still use the cluster-wide default network to maintain connectivity across the cluster. Every pod has an eth0
interface that is attached to the cluster-wide pod network. You can view the interfaces for a pod by using the oc exec -it <pod_name> -- ip a
command. If you add additional network interfaces that use Multus CNI, they are named net1
, net2
, …, netN
.
To attach additional network interfaces to a pod, you must create configurations that define how the interfaces are attached. You specify each interface by using a NetworkAttachmentDefinition
custom resource (CR). A CNI configuration inside each of these CRs defines how that interface is created.
24.1.2. Additional networks in OpenShift Container Platform
OpenShift Container Platform provides the following CNI plugins for creating additional networks in your cluster:
- bridge: Configure a bridge-based additional network to allow pods on the same host to communicate with each other and the host.
- host-device: Configure a host-device additional network to allow pods access to a physical Ethernet network device on the host system.
- ipvlan: Configure an ipvlan-based additional network to allow pods on a host to communicate with other hosts and pods on those hosts, similar to a macvlan-based additional network. Unlike a macvlan-based additional network, each pod shares the same MAC address as the parent physical network interface.
- vlan: Configure a vlan-based additional network to allow VLAN-based network isolation and connectivity for pods.
- macvlan: Configure a macvlan-based additional network to allow pods on a host to communicate with other hosts and pods on those hosts by using a physical network interface. Each pod that is attached to a macvlan-based additional network is provided a unique MAC address.
- tap: Configure a tap-based additional network to create a tap device inside the container namespace. A tap device enables user space programs to send and receive network packets.
- SR-IOV: Configure an SR-IOV based additional network to allow pods to attach to a virtual function (VF) interface on SR-IOV capable hardware on the host system.
-
route-override: Configure a
route-override
based additional network to allow pods to override and set routes.
24.2. Configuring an additional network
As a cluster administrator, you can configure an additional network for your cluster. The following network types are supported:
24.2.1. Approaches to managing an additional network
You can manage the lifecycle of an additional network in OpenShift Container Platform by using one of two approaches: modifying the Cluster Network Operator (CNO) configuration or applying a YAML manifest. Each approach is mutually exclusive and you can only use one approach for managing an additional network at a time. For either approach, the additional network is managed by a Container Network Interface (CNI) plugin that you configure. The two different approaches are summarized here:
-
Modifying the Cluster Network Operator (CNO) configuration: Configuring additional networks through CNO is only possible for cluster administrators. The CNO automatically creates and manages the
NetworkAttachmentDefinition
object. By using this approach, you can defineNetworkAttachmentDefinition
objects at install time through configuration of theinstall-config
. -
Applying a YAML manifest: You can manage the additional network directly by creating an
NetworkAttachmentDefinition
object. Compared to modifying the CNO configuration, this approach gives you more granular control and flexibility when it comes to configuration.
When deploying OpenShift Container Platform nodes with multiple network interfaces on Red Hat OpenStack Platform (RHOSP) with OVN Kubernetes, DNS configuration of the secondary interface might take precedence over the DNS configuration of the primary interface. In this case, remove the DNS nameservers for the subnet ID that is attached to the secondary interface:
openstack subnet set --dns-nameserver 0.0.0.0 <subnet_id>
$ openstack subnet set --dns-nameserver 0.0.0.0 <subnet_id>
24.2.2. IP address assignment for additional networks
For additional networks, IP addresses can be assigned using an IP Address Management (IPAM) CNI plugin, which supports various assignment methods, including Dynamic Host Configuration Protocol (DHCP) and static assignment.
The DHCP IPAM CNI plugin responsible for dynamic assignment of IP addresses operates with two distinct components:
- CNI Plugin: Responsible for integrating with the Kubernetes networking stack to request and release IP addresses.
- DHCP IPAM CNI Daemon: A listener for DHCP events that coordinates with existing DHCP servers in the environment to handle IP address assignment requests. This daemon is not a DHCP server itself.
For networks requiring type: dhcp
in their IPAM configuration, ensure the following:
- A DHCP server is available and running in the environment. The DHCP server is external to the cluster and is expected to be part of the customer’s existing network infrastructure.
- The DHCP server is appropriately configured to serve IP addresses to the nodes.
In cases where a DHCP server is unavailable in the environment, it is recommended to use the Whereabouts IPAM CNI plugin instead. The Whereabouts CNI provides similar IP address management capabilities without the need for an external DHCP server.
Use the Whereabouts CNI plugin when there is no external DHCP server or where static IP address management is preferred. The Whereabouts plugin includes a reconciler daemon to manage stale IP address allocations.
A DHCP lease must be periodically renewed throughout the container’s lifetime, so a separate daemon, the DHCP IPAM CNI Daemon, is required. To deploy the DHCP IPAM CNI daemon, modify the Cluster Network Operator (CNO) configuration to trigger the deployment of this daemon as part of the additional network setup.
24.2.3. Configuration for an additional network attachment
An additional network is configured by using the NetworkAttachmentDefinition
API in the k8s.cni.cncf.io
API group.
Do not store any sensitive information or a secret in the NetworkAttachmentDefinition
CRD because this information is accessible by the project administration user.
The configuration for the API is described in the following table:
Field | Type | Description |
---|---|---|
|
| The name for the additional network. |
|
| The namespace that the object is associated with. |
|
| The CNI plugin configuration in JSON format. |
24.2.3.1. Configuration of an additional network through the Cluster Network Operator
The configuration for an additional network attachment is specified as part of the Cluster Network Operator (CNO) configuration.
The following YAML describes the configuration parameters for managing an additional network with the CNO:
Cluster Network Operator configuration
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: # ... additionalNetworks: - name: <name> namespace: <namespace> rawCNIConfig: |- { ... } type: Raw
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
# ...
additionalNetworks:
- name: <name>
namespace: <namespace>
rawCNIConfig: |-
{
...
}
type: Raw
- 1
- An array of one or more additional network configurations.
- 2
- The name for the additional network attachment that you are creating. The name must be unique within the specified
namespace
. - 3
- The namespace to create the network attachment in. If you do not specify a value then the
default
namespace is used.ImportantTo prevent namespace issues for the OVN-Kubernetes network plugin, do not name your additional network attachment
default
, because this namespace is reserved for thedefault
additional network attachment. - 4
- A CNI plugin configuration in JSON format.
24.2.3.2. Configuration of an additional network from a YAML manifest
The configuration for an additional network is specified from a YAML configuration file, such as in the following example:
apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: <name> spec: config: |- { ... }
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: <name>
spec:
config: |-
{
...
}
24.2.4. Configurations for additional network types
The specific configuration fields for additional networks is described in the following sections.
24.2.4.1. Configuration for a bridge additional network
The following object describes the configuration parameters for the bridge CNI plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The |
|
|
The value for the |
|
|
The name of the CNI plugin to configure: |
|
| The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition. |
|
|
Optional: Specify the name of the virtual bridge to use. If the bridge interface does not exist on the host, it is created. The default value is |
|
|
Optional: Set to |
|
|
Optional: Set to |
|
|
Optional: Set to |
|
|
Optional: Set to |
|
|
Optional: Set to |
|
|
Optional: Set to |
|
| Optional: Specify a virtual LAN (VLAN) tag as an integer value. By default, no VLAN tag is assigned. |
|
|
Optional: Indicates whether the default vlan must be preserved on the |
|
|
Optional: Assign a VLAN trunk tag. The default value is |
|
| Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel. |
|
|
Optional: Enables duplicate address detection for the container side |
|
|
Optional: Enables mac spoof check, limiting the traffic originating from the container to the mac address of the interface. The default value is |
The VLAN parameter configures the VLAN tag on the host end of the veth
and also enables the vlan_filtering
feature on the bridge interface.
To configure uplink for a L2 network you need to allow the vlan on the uplink interface by using the following command:
bridge vlan add vid VLAN_ID dev DEV
$ bridge vlan add vid VLAN_ID dev DEV
24.2.4.1.1. bridge configuration example
The following example configures an additional network named bridge-net
:
{ "cniVersion": "0.3.1", "name": "bridge-net", "type": "bridge", "isGateway": true, "vlan": 2, "ipam": { "type": "dhcp" } }
{
"cniVersion": "0.3.1",
"name": "bridge-net",
"type": "bridge",
"isGateway": true,
"vlan": 2,
"ipam": {
"type": "dhcp"
}
}
24.2.4.2. Configuration for a host device additional network
Specify your network device by setting only one of the following parameters: device
,hwaddr
, kernelpath
, or pciBusID
.
The following object describes the configuration parameters for the host-device CNI plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The |
|
|
The value for the |
|
|
The name of the CNI plugin to configure: |
|
|
Optional: The name of the device, such as |
|
| Optional: The device hardware MAC address. |
|
|
Optional: The Linux kernel device path, such as |
|
|
Optional: The PCI address of the network device, such as |
24.2.4.2.1. host-device configuration example
The following example configures an additional network named hostdev-net
:
{ "cniVersion": "0.3.1", "name": "hostdev-net", "type": "host-device", "device": "eth1" }
{
"cniVersion": "0.3.1",
"name": "hostdev-net",
"type": "host-device",
"device": "eth1"
}
24.2.4.3. Configuration for a VLAN additional network
The following object describes the configuration parameters for the VLAN, vlan
, CNI plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The |
|
|
The value for the |
|
|
The name of the CNI plugin to configure: |
|
|
The Ethernet interface to associate with the network attachment. If a |
|
|
Set the ID of the |
|
| The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition. |
|
| Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel. |
|
| Optional: DNS information to return. For example, a priority-ordered list of DNS nameservers. |
|
|
Optional: Specifies whether the |
A NetworkAttachmentDefinition
custom resource definition (CRD) with a vlan
configuration can be used only on a single pod in a node because the CNI plugin cannot create multiple vlan
subinterfaces with the same vlanId
on the same master
interface.
24.2.4.3.1. VLAN configuration example
The following example demonstrates a vlan
configuration with an additional network that is named vlan-net
:
{ "name": "vlan-net", "cniVersion": "0.3.1", "type": "vlan", "master": "eth0", "mtu": 1500, "vlanId": 5, "linkInContainer": false, "ipam": { "type": "host-local", "subnet": "10.1.1.0/24" }, "dns": { "nameservers": [ "10.1.1.1", "8.8.8.8" ] } }
{
"name": "vlan-net",
"cniVersion": "0.3.1",
"type": "vlan",
"master": "eth0",
"mtu": 1500,
"vlanId": 5,
"linkInContainer": false,
"ipam": {
"type": "host-local",
"subnet": "10.1.1.0/24"
},
"dns": {
"nameservers": [ "10.1.1.1", "8.8.8.8" ]
}
}
24.2.4.4. Configuration for an IPVLAN additional network
The following object describes the configuration parameters for the IPVLAN, ipvlan
, CNI plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The |
|
|
The value for the |
|
|
The name of the CNI plugin to configure: |
|
| The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition. This is required unless the plugin is chained. |
|
|
Optional: The operating mode for the virtual network. The value must be |
|
|
Optional: The Ethernet interface to associate with the network attachment. If a |
|
| Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel. |
|
|
Optional: Specifies whether the |
-
The
ipvlan
object does not allow virtual interfaces to communicate with themaster
interface. Therefore the container will not be able to reach the host by using theipvlan
interface. Be sure that the container joins a network that provides connectivity to the host, such as a network supporting the Precision Time Protocol (PTP
). -
A single
master
interface cannot simultaneously be configured to use bothmacvlan
andipvlan
. -
For IP allocation schemes that cannot be interface agnostic, the
ipvlan
plugin can be chained with an earlier plugin that handles this logic. If themaster
is omitted, then the previous result must contain a single interface name for theipvlan
plugin to enslave. Ifipam
is omitted, then the previous result is used to configure theipvlan
interface.
24.2.4.4.1. ipvlan configuration example
The following example configures an additional network named ipvlan-net
:
{ "cniVersion": "0.3.1", "name": "ipvlan-net", "type": "ipvlan", "master": "eth1", "linkInContainer": false, "mode": "l3", "ipam": { "type": "static", "addresses": [ { "address": "192.168.10.10/24" } ] } }
{
"cniVersion": "0.3.1",
"name": "ipvlan-net",
"type": "ipvlan",
"master": "eth1",
"linkInContainer": false,
"mode": "l3",
"ipam": {
"type": "static",
"addresses": [
{
"address": "192.168.10.10/24"
}
]
}
}
24.2.4.5. Configuration for a MACVLAN additional network
The following object describes the configuration parameters for the MAC Virtual LAN (MACVLAN) Container Network Interface (CNI) plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The |
|
|
The value for the |
|
|
The name of the CNI plugin to configure: |
|
| The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition. |
|
|
Optional: Configures traffic visibility on the virtual network. Must be either |
|
| Optional: The host network interface to associate with the newly created macvlan interface. If a value is not specified, then the default route interface is used. |
|
| Optional: The maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel. |
|
|
Optional: Specifies whether the |
If you specify the master
key for the plugin configuration, use a different physical network interface than the one that is associated with your primary network plugin to avoid possible conflicts.
24.2.4.5.1. MACVLAN configuration example
The following example configures an additional network named macvlan-net
:
{ "cniVersion": "0.3.1", "name": "macvlan-net", "type": "macvlan", "master": "eth1", "linkInContainer": false, "mode": "bridge", "ipam": { "type": "dhcp" } }
{
"cniVersion": "0.3.1",
"name": "macvlan-net",
"type": "macvlan",
"master": "eth1",
"linkInContainer": false,
"mode": "bridge",
"ipam": {
"type": "dhcp"
}
}
24.2.4.6. Configuration for a TAP additional network
The following object describes the configuration parameters for the TAP CNI plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The |
|
|
The value for the |
|
|
The name of the CNI plugin to configure: |
|
| Optional: Request the specified MAC address for the interface. |
|
| Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel. |
|
| Optional: The SELinux context to associate with the tap device. Note
The value |
|
|
Optional: Set to |
|
| Optional: The user owning the tap device. |
|
| Optional: The group owning the tap device. |
|
| Optional: Set the tap device as a port of an already existing bridge. |
24.2.4.6.1. Tap configuration example
The following example configures an additional network named mynet
:
{ "name": "mynet", "cniVersion": "0.3.1", "type": "tap", "mac": "00:11:22:33:44:55", "mtu": 1500, "selinuxcontext": "system_u:system_r:container_t:s0", "multiQueue": true, "owner": 0, "group": 0 "bridge": "br1" }
{
"name": "mynet",
"cniVersion": "0.3.1",
"type": "tap",
"mac": "00:11:22:33:44:55",
"mtu": 1500,
"selinuxcontext": "system_u:system_r:container_t:s0",
"multiQueue": true,
"owner": 0,
"group": 0
"bridge": "br1"
}
24.2.4.6.2. Setting SELinux boolean for the TAP CNI plugin
To create the tap device with the container_t
SELinux context, enable the container_use_devices
boolean on the host by using the Machine Config Operator (MCO).
Prerequisites
-
You have installed the OpenShift CLI (
oc
).
Procedure
Create a new YAML file named, such as
setsebool-container-use-devices.yaml
, with the following details:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-worker-setsebool spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: setsebool.service contents: | [Unit] Description=Set SELinux boolean for the TAP CNI plugin Before=kubelet.service [Service] Type=oneshot ExecStart=/usr/sbin/setsebool container_use_devices=on RemainAfterExit=true [Install] WantedBy=multi-user.target graphical.target
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: worker name: 99-worker-setsebool spec: config: ignition: version: 3.2.0 systemd: units: - enabled: true name: setsebool.service contents: | [Unit] Description=Set SELinux boolean for the TAP CNI plugin Before=kubelet.service [Service] Type=oneshot ExecStart=/usr/sbin/setsebool container_use_devices=on RemainAfterExit=true [Install] WantedBy=multi-user.target graphical.target
Copy to Clipboard Copied! Create the new
MachineConfig
object by running the following command:oc apply -f setsebool-container-use-devices.yaml
$ oc apply -f setsebool-container-use-devices.yaml
Copy to Clipboard Copied! NoteApplying any changes to the
MachineConfig
object causes all affected nodes to gracefully reboot after the change is applied. This update can take some time to be applied.Verify the change is applied by running the following command:
oc get machineconfigpools
$ oc get machineconfigpools
Copy to Clipboard Copied! Expected output
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-e5e0c8e8be9194e7c5a882e047379cfa True False False 3 3 3 0 7d2h worker rendered-worker-d6c9ca107fba6cd76cdcbfcedcafa0f2 True False False 3 3 3 0 7d
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE master rendered-master-e5e0c8e8be9194e7c5a882e047379cfa True False False 3 3 3 0 7d2h worker rendered-worker-d6c9ca107fba6cd76cdcbfcedcafa0f2 True False False 3 3 3 0 7d
Copy to Clipboard Copied! NoteAll nodes should be in the updated and ready state.
24.2.4.7. Configuring routes using the route-override plugin on an additional network
The following object describes the configuration parameters for the route-override
CNI plugin:
Field | Type | Description |
---|---|---|
|
|
The name of the CNI plugin to configure: |
|
|
Optional: Set to |
|
|
Optional: Set to |
|
| Optional: Specify the list of routes to delete from the container namespace. |
|
|
Optional: Specify the list of routes to add to the container namespace. Each route is a dictionary with |
|
|
Optional: Set this to |
24.2.4.7.1. Route-override plugin configuration example
The route-override
CNI is a type of CNI that it is designed to be used when chained with a parent CNI. It does not operate independently, but relies on the parent CNI to first create the network interface and assign IP addresses before it can modify the routing rules.
The following example configures an additional network named mymacvlan
. The parent CNI creates a network interface attached to eth1
and assigns an IP address in the 192.168.1.0/24
range using host-local
IPAM. The route-override
CNI is then chained to the parent CNI and modifies the routing rules by flushing existing routes, deleting the route to 192.168.0.0/24
, and adding a new route for 192.168.0.0/24
with a custom gateway.
{ "cniVersion": "0.3.0", "name": "mymacvlan", "plugins": [ { "type": "macvlan", "master": "eth1", "mode": "bridge", "ipam": { "type": "host-local", "subnet": "192.168.1.0/24" } }, { "type": "route-override", "flushroutes": true, "delroutes": [ { "dst": "192.168.0.0/24" } ], "addroutes": [ { "dst": "192.168.0.0/24", "gw": "10.1.254.254" } ] } ] }
{
"cniVersion": "0.3.0",
"name": "mymacvlan",
"plugins": [
{
"type": "macvlan",
"master": "eth1",
"mode": "bridge",
"ipam": {
"type": "host-local",
"subnet": "192.168.1.0/24"
}
},
{
"type": "route-override",
"flushroutes": true,
"delroutes": [
{
"dst": "192.168.0.0/24"
}
],
"addroutes": [
{
"dst": "192.168.0.0/24",
"gw": "10.1.254.254"
}
]
}
]
}
24.2.4.8. Configuration for an OVN-Kubernetes additional network
The Red Hat OpenShift Networking OVN-Kubernetes network plugin allows the configuration of secondary network interfaces for pods. To configure secondary network interfaces, you must define the configurations in the NetworkAttachmentDefinition
custom resource definition (CRD).
Pod and multi-network policy creation might remain in a pending state until the OVN-Kubernetes control plane agent in the nodes processes the associated network-attachment-definition
CRD.
You can configure an OVN-Kubernetes additional network in either layer 2 or localnet topologies.
- A layer 2 topology supports east-west cluster traffic, but does not allow access to the underlying physical network.
- A localnet topology allows connections to the physical network, but requires additional configuration of the underlying Open vSwitch (OVS) bridge on cluster nodes.
The following sections provide example configurations for each of the topologies that OVN-Kubernetes currently allows for secondary networks.
Networks names must be unique. For example, creating multiple NetworkAttachmentDefinition
CRDs with different configurations that reference the same network is unsupported.
24.2.4.8.1. Supported platforms for OVN-Kubernetes additional network
You can use an OVN-Kubernetes additional network with the following supported platforms:
- Bare metal
- IBM Power®
- IBM Z®
- IBM® LinuxONE
- VMware vSphere
- Red Hat OpenStack Platform (RHOSP)
24.2.4.8.2. OVN-Kubernetes network plugin JSON configuration table
The following table describes the configuration parameters for the OVN-Kubernetes CNI network plugin:
Field | Type | Description |
---|---|---|
|
|
The CNI specification version. The required value is |
|
|
The name of the network. These networks are not namespaced. For example, a network named |
|
|
The name of the CNI plugin to configure. This value must be set to |
|
|
The topological configuration for the network. Must be one of |
|
| The subnet to use for the network across the cluster.
For When omitted, the logical switch implementing the network only provides layer 2 communication, and users must configure IP addresses for the pods. Port security only prevents MAC spoofing. |
|
|
The maximum transmission unit (MTU). The default value, |
|
|
The metadata |
|
| A comma-separated list of CIDRs and IP addresses. IP addresses are removed from the assignable IP address pool and are never passed to the pods. |
|
|
If topology is set to |
24.2.4.8.3. Compatibility with multi-network policy
The multi-network policy API, which is provided by the MultiNetworkPolicy
custom resource definition (CRD) in the k8s.cni.cncf.io
API group, is compatible with an OVN-Kubernetes secondary network. When defining a network policy, the network policy rules that can be used depend on whether the OVN-Kubernetes secondary network defines the subnets
field. Refer to the following table for details:
subnets field specified | Allowed multi-network policy selectors |
---|---|
Yes |
|
No |
|
You can use the k8s.v1.cni.cncf.io/policy-for
annotation on a MultiNetworkPolicy
object to point to a NetworkAttachmentDefinition
(NAD) custom resource (CR). The NAD CR defines the network to which the policy applies. The following example multi-network policy is valid only if the subnets
field is defined in the secondary network CNI configuration for the secondary network named blue2
:
Example multi-network policy that uses a pod selector
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: allow-same-namespace annotations: k8s.v1.cni.cncf.io/policy-for: blue2 spec: podSelector: ingress: - from: - podSelector: {}
apiVersion: k8s.cni.cncf.io/v1beta1
kind: MultiNetworkPolicy
metadata:
name: allow-same-namespace
annotations:
k8s.v1.cni.cncf.io/policy-for: blue2
spec:
podSelector:
ingress:
- from:
- podSelector: {}
The following example uses the ipBlock
network policy selector, which is always valid for an OVN-Kubernetes additional network:
Example multi-network policy that uses an IP block selector
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: ingress-ipblock annotations: k8s.v1.cni.cncf.io/policy-for: default/flatl2net spec: podSelector: matchLabels: name: access-control policyTypes: - Ingress ingress: - from: - ipBlock: cidr: 10.200.0.0/30
apiVersion: k8s.cni.cncf.io/v1beta1
kind: MultiNetworkPolicy
metadata:
name: ingress-ipblock
annotations:
k8s.v1.cni.cncf.io/policy-for: default/flatl2net
spec:
podSelector:
matchLabels:
name: access-control
policyTypes:
- Ingress
ingress:
- from:
- ipBlock:
cidr: 10.200.0.0/30
24.2.4.8.4. Configuration for a layer 2 switched topology
The switched (layer 2) topology networks interconnect the workloads through a cluster-wide logical switch. This configuration can be used for IPv6 and dual-stack deployments.
Layer 2 switched topology networks only allow for the transfer of data packets between pods within a cluster.
The following JSON example configures a switched secondary network:
{ "cniVersion": "0.3.1", "name": "l2-network", "type": "ovn-k8s-cni-overlay", "topology":"layer2", "subnets": "10.100.200.0/24", "mtu": 1300, "netAttachDefName": "ns1/l2-network", "excludeSubnets": "10.100.200.0/29" }
{
"cniVersion": "0.3.1",
"name": "l2-network",
"type": "ovn-k8s-cni-overlay",
"topology":"layer2",
"subnets": "10.100.200.0/24",
"mtu": 1300,
"netAttachDefName": "ns1/l2-network",
"excludeSubnets": "10.100.200.0/29"
}
24.2.4.8.5. Configuration for a localnet topology
The switched localnet
topology interconnects the workloads created as Network Attachment Definitions (NADs) through a cluster-wide logical switch to a physical network.
24.2.4.8.5.1. Prerequisites for configuring OVN-Kubernetes additional network
- The NMState Operator is installed. For more information, see About the Kubernetes NMState Operator.
24.2.4.8.5.2. Configuration for an OVN-Kubernetes additional network mapping
You must map an additional network to the OVN bridge to use it as an OVN-Kubernetes additional network. Bridge mappings allow network traffic to reach the physical network. A bridge mapping associates a physical network name, also known as an interface label, to a bridge created with Open vSwitch (OVS).
You can create an NodeNetworkConfigurationPolicy
(NNCP) object, part of the nmstate.io/v1
API group, to declaratively create the mapping. This API is provided by the NMState Operator. By using this API you can apply the bridge mapping to nodes that match your specified nodeSelector
expression, such as node-role.kubernetes.io/worker: ''
. With this declarative approach, the NMState Operator applies additional network configuration to all nodes specified by the node selector automatically and transparently.
When attaching an additional network, you can either use the existing br-ex
bridge or create a new bridge. Which approach to use depends on your specific network infrastructure. Consider the following approaches:
-
If your nodes include only a single network interface, you must use the existing bridge. This network interface is owned and managed by OVN-Kubernetes and you must not remove it from the
br-ex
bridge or alter the interface configuration. If you remove or alter the network interface, your cluster network will stop working correctly. - If your nodes include several network interfaces, you can attach a different network interface to a new bridge, and use that for your additional network. This approach provides for traffic isolation from your primary cluster network.
The localnet1
network is mapped to the br-ex
bridge in the following example:
Example mapping for sharing a bridge
apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: mapping spec: nodeSelector: node-role.kubernetes.io/worker: '' desiredState: ovn: bridge-mappings: - localnet: localnet1 bridge: br-ex state: present
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
name: mapping
spec:
nodeSelector:
node-role.kubernetes.io/worker: ''
desiredState:
ovn:
bridge-mappings:
- localnet: localnet1
bridge: br-ex
state: present
- 1 1
- The name for the configuration object.
- 2
- A node selector that specifies the nodes to apply the node network configuration policy to.
- 3
- The name for the additional network from which traffic is forwarded to the OVS bridge. This additional network must match the name of the
spec.config.name
field of theNetworkAttachmentDefinition
CRD that defines the OVN-Kubernetes additional network. - 4
- The name of the OVS bridge on the node. This value is required only if you specify
state: present
. - 5
- The state for the mapping. Must be either
present
to add the bridge orabsent
to remove the bridge. The default value ispresent
.The following JSON example configures a localnet secondary network that is named
localnet1
:{ "cniVersion": "0.3.1", "name": "ns1-localnet-network", "type": "ovn-k8s-cni-overlay", "topology":"localnet", "physicalNetworkName": "localnet1", "subnets": "202.10.130.112/28", "vlanID": 33, "mtu": 1500, "netAttachDefName": "ns1/localnet-network", "excludeSubnets": "10.100.200.0/29" }
{ "cniVersion": "0.3.1", "name": "ns1-localnet-network", "type": "ovn-k8s-cni-overlay", "topology":"localnet", "physicalNetworkName": "localnet1", "subnets": "202.10.130.112/28", "vlanID": 33, "mtu": 1500, "netAttachDefName": "ns1/localnet-network", "excludeSubnets": "10.100.200.0/29" }
Copy to Clipboard Copied!
In the following example, the localnet2
network interface is attached to the ovs-br1
bridge. Through this attachment, the network interface is available to the OVN-Kubernetes network plugin as an additional network.
Example mapping for nodes with multiple interfaces
apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: ovs-br1-multiple-networks spec: nodeSelector: node-role.kubernetes.io/worker: '' desiredState: interfaces: - name: ovs-br1 description: |- A dedicated OVS bridge with eth1 as a port allowing all VLANs and untagged traffic type: ovs-bridge state: up bridge: allow-extra-patch-ports: true options: stp: false port: - name: eth1 ovn: bridge-mappings: - localnet: localnet2 bridge: ovs-br1 state: present
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
name: ovs-br1-multiple-networks
spec:
nodeSelector:
node-role.kubernetes.io/worker: ''
desiredState:
interfaces:
- name: ovs-br1
description: |-
A dedicated OVS bridge with eth1 as a port
allowing all VLANs and untagged traffic
type: ovs-bridge
state: up
bridge:
allow-extra-patch-ports: true
options:
stp: false
port:
- name: eth1
ovn:
bridge-mappings:
- localnet: localnet2
bridge: ovs-br1
state: present
- 1
- The name for the configuration object.
- 2
- A node selector that specifies the nodes to apply the node network configuration policy to.
- 3
- A new OVS bridge, separate from the default bridge used by OVN-Kubernetes for all cluster traffic.
- 4
- A network device on the host system to associate with this new OVS bridge.
- 5
- The name for the additional network from which traffic is forwarded to the OVS bridge. This additional network must match the name of the
spec.config.name
field of theNetworkAttachmentDefinition
CRD that defines the OVN-Kubernetes additional network. - 6
- The name of the OVS bridge on the node. This value is required only if you specify
state: present
. - 7
- The state for the mapping. Must be either
present
to add the bridge orabsent
to remove the bridge. The default value ispresent
.The following JSON example configures a localnet secondary network that is named
localnet2
:{ "cniVersion": "0.3.1", "name": "ns1-localnet-network", "type": "ovn-k8s-cni-overlay", "topology":"localnet", "physicalNetworkName": "localnet2", "subnets": "202.10.130.112/28", "vlanID": 33, "mtu": 1500, "netAttachDefName": "ns1/localnet-network" "excludeSubnets": "10.100.200.0/29" }
{ "cniVersion": "0.3.1", "name": "ns1-localnet-network", "type": "ovn-k8s-cni-overlay", "topology":"localnet", "physicalNetworkName": "localnet2", "subnets": "202.10.130.112/28", "vlanID": 33, "mtu": 1500, "netAttachDefName": "ns1/localnet-network" "excludeSubnets": "10.100.200.0/29" }
Copy to Clipboard Copied!
24.2.4.8.6. Configuring pods for additional networks
You must specify the secondary network attachments through the k8s.v1.cni.cncf.io/networks
annotation.
The following example provisions a pod with two secondary attachments, one for each of the attachment configurations presented in this guide.
apiVersion: v1 kind: Pod metadata: annotations: k8s.v1.cni.cncf.io/networks: l2-network name: tinypod namespace: ns1 spec: containers: - args: - pause image: k8s.gcr.io/e2e-test-images/agnhost:2.36 imagePullPolicy: IfNotPresent name: agnhost-container
apiVersion: v1
kind: Pod
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: l2-network
name: tinypod
namespace: ns1
spec:
containers:
- args:
- pause
image: k8s.gcr.io/e2e-test-images/agnhost:2.36
imagePullPolicy: IfNotPresent
name: agnhost-container
24.2.4.8.7. Configuring pods with a static IP address
The following example provisions a pod with a static IP address.
- You can only specify the IP address for a pod’s secondary network attachment for layer 2 attachments.
- Specifying a static IP address for the pod is only possible when the attachment configuration does not feature subnets.
apiVersion: v1 kind: Pod metadata: annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "l2-network", "mac": "02:03:04:05:06:07", "interface": "myiface1", "ips": [ "192.0.2.20/24" ] } ]' name: tinypod namespace: ns1 spec: containers: - args: - pause image: k8s.gcr.io/e2e-test-images/agnhost:2.36 imagePullPolicy: IfNotPresent name: agnhost-container
apiVersion: v1
kind: Pod
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "l2-network",
"mac": "02:03:04:05:06:07",
"interface": "myiface1",
"ips": [
"192.0.2.20/24"
]
}
]'
name: tinypod
namespace: ns1
spec:
containers:
- args:
- pause
image: k8s.gcr.io/e2e-test-images/agnhost:2.36
imagePullPolicy: IfNotPresent
name: agnhost-container
24.2.5. Configuration of IP address assignment for an additional network
The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.
You can use the following IP address assignment types:
- Static assignment.
- Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.
- Dynamic assignment through the Whereabouts IPAM CNI plugin.
24.2.5.1. Static IP address assignment configuration
The following table describes the configuration for static IP address assignment:
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
|
| An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported. |
|
| An array of objects specifying routes to configure inside the pod. |
|
| Optional: An array of objects specifying the DNS configuration. |
The addresses
array requires objects with the following fields:
Field | Type | Description |
---|---|---|
|
|
An IP address and network prefix that you specify. For example, if you specify |
|
| The default gateway to route egress network traffic to. |
Field | Type | Description |
---|---|---|
|
|
The IP address range in CIDR format, such as |
|
| The gateway where network traffic is routed. |
Field | Type | Description |
---|---|---|
|
| An array of one or more IP addresses for to send DNS queries to. |
|
|
The default domain to append to a hostname. For example, if the domain is set to |
|
|
An array of domain names to append to an unqualified hostname, such as |
Static IP address assignment configuration example
{ "ipam": { "type": "static", "addresses": [ { "address": "191.168.1.7/24" } ] } }
{
"ipam": {
"type": "static",
"addresses": [
{
"address": "191.168.1.7/24"
}
]
}
}
24.2.5.2. Dynamic IP address (DHCP) assignment configuration
The following JSON describes the configuration for dynamic IP address address assignment with DHCP.
A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.
To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:
Example shim network attachment definition
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: - name: dhcp-shim namespace: default type: Raw rawCNIConfig: |- { "name": "dhcp-shim", "cniVersion": "0.3.1", "type": "bridge", "ipam": { "type": "dhcp" } } # ...
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
additionalNetworks:
- name: dhcp-shim
namespace: default
type: Raw
rawCNIConfig: |-
{
"name": "dhcp-shim",
"cniVersion": "0.3.1",
"type": "bridge",
"ipam": {
"type": "dhcp"
}
}
# ...
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
Dynamic IP address (DHCP) assignment configuration example
{ "ipam": { "type": "dhcp" } }
{
"ipam": {
"type": "dhcp"
}
}
24.2.5.3. Dynamic IP address assignment configuration with Whereabouts
The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.
The following table describes the configuration for dynamic IP address assignment with Whereabouts:
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
|
| An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses. |
|
| Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned. |
Dynamic IP address assignment configuration example that uses Whereabouts
{ "ipam": { "type": "whereabouts", "range": "192.0.2.192/27", "exclude": [ "192.0.2.192/30", "192.0.2.196/32" ] } }
{
"ipam": {
"type": "whereabouts",
"range": "192.0.2.192/27",
"exclude": [
"192.0.2.192/30",
"192.0.2.196/32"
]
}
}
24.2.5.4. Creating a whereabouts-reconciler daemon set
The Whereabouts reconciler is responsible for managing dynamic IP address assignments for the pods within a cluster by using the Whereabouts IP Address Management (IPAM) solution. It ensures that each pod gets a unique IP address from the specified IP address range. It also handles IP address releases when pods are deleted or scaled down.
You can also use a NetworkAttachmentDefinition
custom resource definition (CRD) for dynamic IP address assignment.
The whereabouts-reconciler
daemon set is automatically created when you configure an additional network through the Cluster Network Operator. It is not automatically created when you configure an additional network from a YAML manifest.
To trigger the deployment of the whereabouts-reconciler
daemon set, you must manually create a whereabouts-shim
network attachment by editing the Cluster Network Operator custom resource (CR) file.
Use the following procedure to deploy the whereabouts-reconciler
daemon set.
Procedure
Edit the
Network.operator.openshift.io
custom resource (CR) by running the following command:oc edit network.operator.openshift.io cluster
$ oc edit network.operator.openshift.io cluster
Copy to Clipboard Copied! Include the
additionalNetworks
section shown in this example YAML extract within thespec
definition of the custom resource (CR):apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster # ... spec: additionalNetworks: - name: whereabouts-shim namespace: default rawCNIConfig: |- { "name": "whereabouts-shim", "cniVersion": "0.3.1", "type": "bridge", "ipam": { "type": "whereabouts" } } type: Raw # ...
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster # ... spec: additionalNetworks: - name: whereabouts-shim namespace: default rawCNIConfig: |- { "name": "whereabouts-shim", "cniVersion": "0.3.1", "type": "bridge", "ipam": { "type": "whereabouts" } } type: Raw # ...
Copy to Clipboard Copied! - Save the file and exit the text editor.
Verify that the
whereabouts-reconciler
daemon set deployed successfully by running the following command:oc get all -n openshift-multus | grep whereabouts-reconciler
$ oc get all -n openshift-multus | grep whereabouts-reconciler
Copy to Clipboard Copied! Example output
pod/whereabouts-reconciler-jnp6g 1/1 Running 0 6s pod/whereabouts-reconciler-k76gg 1/1 Running 0 6s pod/whereabouts-reconciler-k86t9 1/1 Running 0 6s pod/whereabouts-reconciler-p4sxw 1/1 Running 0 6s pod/whereabouts-reconciler-rvfdv 1/1 Running 0 6s pod/whereabouts-reconciler-svzw9 1/1 Running 0 6s daemonset.apps/whereabouts-reconciler 6 6 6 6 6 kubernetes.io/os=linux 6s
pod/whereabouts-reconciler-jnp6g 1/1 Running 0 6s pod/whereabouts-reconciler-k76gg 1/1 Running 0 6s pod/whereabouts-reconciler-k86t9 1/1 Running 0 6s pod/whereabouts-reconciler-p4sxw 1/1 Running 0 6s pod/whereabouts-reconciler-rvfdv 1/1 Running 0 6s pod/whereabouts-reconciler-svzw9 1/1 Running 0 6s daemonset.apps/whereabouts-reconciler 6 6 6 6 6 kubernetes.io/os=linux 6s
Copy to Clipboard Copied!
24.2.5.5. Configuring the Whereabouts IP reconciler schedule
The Whereabouts IPAM CNI plugin runs the IP reconciler daily. This process cleans up any stranded IP allocations that might result in exhausting IPs and therefore prevent new pods from getting an IP allocated to them.
Use this procedure to change the frequency at which the IP reconciler runs.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role. -
You have deployed the
whereabouts-reconciler
daemon set, and thewhereabouts-reconciler
pods are up and running.
Procedure
Run the following command to create a
ConfigMap
object namedwhereabouts-config
in theopenshift-multus
namespace with a specific cron expression for the IP reconciler:oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/15 * * * *"
$ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/15 * * * *"
Copy to Clipboard Copied! This cron expression indicates the IP reconciler runs every 15 minutes. Adjust the expression based on your specific requirements.
NoteThe
whereabouts-reconciler
daemon set can only consume a cron expression pattern that includes five asterisks. The sixth, which is used to denote seconds, is currently not supported.Retrieve information about resources related to the
whereabouts-reconciler
daemon set and pods within theopenshift-multus
namespace by running the following command:oc get all -n openshift-multus | grep whereabouts-reconciler
$ oc get all -n openshift-multus | grep whereabouts-reconciler
Copy to Clipboard Copied! Example output
pod/whereabouts-reconciler-2p7hw 1/1 Running 0 4m14s pod/whereabouts-reconciler-76jk7 1/1 Running 0 4m14s pod/whereabouts-reconciler-94zw6 1/1 Running 0 4m14s pod/whereabouts-reconciler-mfh68 1/1 Running 0 4m14s pod/whereabouts-reconciler-pgshz 1/1 Running 0 4m14s pod/whereabouts-reconciler-xn5xz 1/1 Running 0 4m14s daemonset.apps/whereabouts-reconciler 6 6 6 6 6 kubernetes.io/os=linux 4m16s
pod/whereabouts-reconciler-2p7hw 1/1 Running 0 4m14s pod/whereabouts-reconciler-76jk7 1/1 Running 0 4m14s pod/whereabouts-reconciler-94zw6 1/1 Running 0 4m14s pod/whereabouts-reconciler-mfh68 1/1 Running 0 4m14s pod/whereabouts-reconciler-pgshz 1/1 Running 0 4m14s pod/whereabouts-reconciler-xn5xz 1/1 Running 0 4m14s daemonset.apps/whereabouts-reconciler 6 6 6 6 6 kubernetes.io/os=linux 4m16s
Copy to Clipboard Copied! Run the following command to verify that the
whereabouts-reconciler
pod runs the IP reconciler with the configured interval:oc -n openshift-multus logs whereabouts-reconciler-2p7hw
$ oc -n openshift-multus logs whereabouts-reconciler-2p7hw
Copy to Clipboard Copied! Example output
2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_33_54.1375928161": CREATE 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_33_54.1375928161": CHMOD 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..data_tmp": RENAME 2024-02-02T16:33:54Z [verbose] using expression: */15 * * * * 2024-02-02T16:33:54Z [verbose] configuration updated to file "/cron-schedule/..data". New cron expression: */15 * * * * 2024-02-02T16:33:54Z [verbose] successfully updated CRON configuration id "00c2d1c9-631d-403f-bb86-73ad104a6817" - new cron expression: */15 * * * * 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/config": CREATE 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_26_17.3874177937": REMOVE 2024-02-02T16:45:00Z [verbose] starting reconciler run 2024-02-02T16:45:00Z [debug] NewReconcileLooper - inferred connection data 2024-02-02T16:45:00Z [debug] listing IP pools 2024-02-02T16:45:00Z [debug] no IP addresses to cleanup 2024-02-02T16:45:00Z [verbose] reconciler success
2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_33_54.1375928161": CREATE 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_33_54.1375928161": CHMOD 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..data_tmp": RENAME 2024-02-02T16:33:54Z [verbose] using expression: */15 * * * * 2024-02-02T16:33:54Z [verbose] configuration updated to file "/cron-schedule/..data". New cron expression: */15 * * * * 2024-02-02T16:33:54Z [verbose] successfully updated CRON configuration id "00c2d1c9-631d-403f-bb86-73ad104a6817" - new cron expression: */15 * * * * 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/config": CREATE 2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_26_17.3874177937": REMOVE 2024-02-02T16:45:00Z [verbose] starting reconciler run 2024-02-02T16:45:00Z [debug] NewReconcileLooper - inferred connection data 2024-02-02T16:45:00Z [debug] listing IP pools 2024-02-02T16:45:00Z [debug] no IP addresses to cleanup 2024-02-02T16:45:00Z [verbose] reconciler success
Copy to Clipboard Copied!
24.2.5.6. Creating a configuration for assignment of dual-stack IP addresses dynamically
Dual-stack IP address assignment can be configured with the ipRanges
parameter for:
- IPv4 addresses
- IPv6 addresses
- multiple IP address assignment
Procedure
-
Set
type
towhereabouts
. Use
ipRanges
to allocate IP addresses as shown in the following example:cniVersion: operator.openshift.io/v1 kind: Network =metadata: name: cluster spec: additionalNetworks: - name: whereabouts-shim namespace: default type: Raw rawCNIConfig: |- { "name": "whereabouts-dual-stack", "cniVersion": "0.3.1, "type": "bridge", "ipam": { "type": "whereabouts", "ipRanges": [ {"range": "192.168.10.0/24"}, {"range": "2001:db8::/64"} ] } }
cniVersion: operator.openshift.io/v1 kind: Network =metadata: name: cluster spec: additionalNetworks: - name: whereabouts-shim namespace: default type: Raw rawCNIConfig: |- { "name": "whereabouts-dual-stack", "cniVersion": "0.3.1, "type": "bridge", "ipam": { "type": "whereabouts", "ipRanges": [ {"range": "192.168.10.0/24"}, {"range": "2001:db8::/64"} ] } }
Copy to Clipboard Copied! - Attach network to a pod. For more information, see "Adding a pod to an additional network".
- Verify that all IP addresses are assigned.
Run the following command to ensure the IP addresses are assigned as metadata.
$ oc exec -it mypod -- ip a
$ oc exec -it mypod -- ip a
Copy to Clipboard Copied!
24.2.6. Creating an additional network attachment with the Cluster Network Operator
The Cluster Network Operator (CNO) manages additional network definitions. When you specify an additional network to create, the CNO creates the NetworkAttachmentDefinition
CRD automatically.
Do not edit the NetworkAttachmentDefinition
CRDs that the Cluster Network Operator manages. Doing so might disrupt network traffic on your additional network.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Optional: Create the namespace for the additional networks:
oc create namespace <namespace_name>
$ oc create namespace <namespace_name>
Copy to Clipboard Copied! To edit the CNO configuration, enter the following command:
oc edit networks.operator.openshift.io cluster
$ oc edit networks.operator.openshift.io cluster
Copy to Clipboard Copied! Modify the CR that you are creating by adding the configuration for the additional network that you are creating, as in the following example CR.
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: # ... additionalNetworks: - name: tertiary-net namespace: namespace2 type: Raw rawCNIConfig: |- { "cniVersion": "0.3.1", "name": "tertiary-net", "type": "ipvlan", "master": "eth1", "mode": "l2", "ipam": { "type": "static", "addresses": [ { "address": "192.168.1.23/24" } ] } }
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: # ... additionalNetworks: - name: tertiary-net namespace: namespace2 type: Raw rawCNIConfig: |- { "cniVersion": "0.3.1", "name": "tertiary-net", "type": "ipvlan", "master": "eth1", "mode": "l2", "ipam": { "type": "static", "addresses": [ { "address": "192.168.1.23/24" } ] } }
Copy to Clipboard Copied! - Save your changes and quit the text editor to commit your changes.
Verification
Confirm that the CNO created the
NetworkAttachmentDefinition
CRD by running the following command. There might be a delay before the CNO creates the CRD.oc get network-attachment-definitions -n <namespace>
$ oc get network-attachment-definitions -n <namespace>
Copy to Clipboard Copied! where:
<namespace>
- Specifies the namespace for the network attachment that you added to the CNO configuration.
Example output
NAME AGE test-network-1 14m
NAME AGE test-network-1 14m
Copy to Clipboard Copied!
24.2.7. Creating an additional network attachment by applying a YAML manifest
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a YAML file with your additional network configuration, such as in the following example:
apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: next-net spec: config: |- { "cniVersion": "0.3.1", "name": "work-network", "type": "host-device", "device": "eth1", "ipam": { "type": "dhcp" } }
apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: next-net spec: config: |- { "cniVersion": "0.3.1", "name": "work-network", "type": "host-device", "device": "eth1", "ipam": { "type": "dhcp" } }
Copy to Clipboard Copied! To create the additional network, enter the following command:
oc apply -f <file>.yaml
$ oc apply -f <file>.yaml
Copy to Clipboard Copied! where:
<file>
- Specifies the name of the file contained the YAML manifest.
24.2.8. About configuring the master interface in the container network namespace
You can create a MAC-VLAN, an IP-VLAN, or a VLAN subinterface that is based on a master
interface that exists in a container namespace. You can also create a master
interface as part of the pod network configuration in a separate network attachment definition CRD.
To use a container namespace master
interface, you must specify true
for the linkInContainer
parameter that exists in the subinterface configuration of the NetworkAttachmentDefinition
CRD.
24.2.8.1. Creating multiple VLANs on SR-IOV VFs
An example use case for utilizing this feature is to create multiple VLANs based on SR-IOV VFs. To do so, begin by creating an SR-IOV network and then define the network attachments for the VLAN interfaces.
The following example shows how to configure the setup illustrated in this diagram.
Figure 24.1. Creating VLANs

Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role. - You have installed the SR-IOV Network Operator.
Procedure
Create a dedicated container namespace where you want to deploy your pod by using the following command:
oc new-project test-namespace
$ oc new-project test-namespace
Copy to Clipboard Copied! Create an SR-IOV node policy:
Create an
SriovNetworkNodePolicy
object, and then save the YAML in thesriov-node-network-policy.yaml
file:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriovnic namespace: openshift-sriov-network-operator spec: deviceType: netdevice isRdma: false needVhostNet: true nicSelector: vendor: "15b3" deviceID: "101b" rootDevices: ["00:05.0"] numVfs: 10 priority: 99 resourceName: sriovnic nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true"
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriovnic namespace: openshift-sriov-network-operator spec: deviceType: netdevice isRdma: false needVhostNet: true nicSelector: vendor: "15b3"
1 deviceID: "101b"
2 rootDevices: ["00:05.0"] numVfs: 10 priority: 99 resourceName: sriovnic nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true"
Copy to Clipboard Copied! NoteThe SR-IOV network node policy configuration example, with the setting
deviceType: netdevice
, is tailored specifically for Mellanox Network Interface Cards (NICs).Apply the YAML by running the following command:
oc apply -f sriov-node-network-policy.yaml
$ oc apply -f sriov-node-network-policy.yaml
Copy to Clipboard Copied! NoteApplying this might take some time due to the node requiring a reboot.
Create an SR-IOV network:
Create the
SriovNetwork
custom resource (CR) for the additional SR-IOV network attachment as in the following example CR. Save the YAML as the filesriov-network-attachment.yaml
:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-network namespace: openshift-sriov-network-operator spec: networkNamespace: test-namespace resourceName: sriovnic spoofChk: "off" trust: "on"
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-network namespace: openshift-sriov-network-operator spec: networkNamespace: test-namespace resourceName: sriovnic spoofChk: "off" trust: "on"
Copy to Clipboard Copied! Apply the YAML by running the following command:
oc apply -f sriov-network-attachment.yaml
$ oc apply -f sriov-network-attachment.yaml
Copy to Clipboard Copied!
Create the VLAN additional network:
Using the following YAML example, create a file named
vlan100-additional-network-configuration.yaml
:apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: vlan-100 namespace: test-namespace spec: config: | { "cniVersion": "0.4.0", "name": "vlan-100", "plugins": [ { "type": "vlan", "master": "ext0", "mtu": 1500, "vlanId": 100, "linkInContainer": true, "ipam": {"type": "whereabouts", "ipRanges": [{"range": "1.1.1.0/24"}]} } ] }
apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: vlan-100 namespace: test-namespace spec: config: | { "cniVersion": "0.4.0", "name": "vlan-100", "plugins": [ { "type": "vlan", "master": "ext0",
1 "mtu": 1500, "vlanId": 100, "linkInContainer": true,
2 "ipam": {"type": "whereabouts", "ipRanges": [{"range": "1.1.1.0/24"}]} } ] }
Copy to Clipboard Copied! Apply the YAML file by running the following command:
oc apply -f vlan100-additional-network-configuration.yaml
$ oc apply -f vlan100-additional-network-configuration.yaml
Copy to Clipboard Copied!
Create a pod definition by using the earlier specified networks:
Using the following YAML example, create a file named
pod-a.yaml
file:NoteThe manifest below includes 2 resources:
- Namespace with security labels
- Pod definition with appropriate network annotation
apiVersion: v1 kind: Namespace metadata: name: test-namespace labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/audit: privileged pod-security.kubernetes.io/warn: privileged security.openshift.io/scc.podSecurityLabelSync: "false" --- apiVersion: v1 kind: Pod metadata: name: nginx-pod namespace: test-namespace annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "sriov-network", "namespace": "test-namespace", "interface": "ext0" }, { "name": "vlan-100", "namespace": "test-namespace", "interface": "ext0.100" } ]' spec: securityContext: runAsNonRoot: true containers: - name: nginx-container image: nginxinc/nginx-unprivileged:latest securityContext: allowPrivilegeEscalation: false capabilities: drop: ["ALL"] ports: - containerPort: 80 seccompProfile: type: "RuntimeDefault"
apiVersion: v1 kind: Namespace metadata: name: test-namespace labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/audit: privileged pod-security.kubernetes.io/warn: privileged security.openshift.io/scc.podSecurityLabelSync: "false" --- apiVersion: v1 kind: Pod metadata: name: nginx-pod namespace: test-namespace annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "sriov-network", "namespace": "test-namespace", "interface": "ext0"
1 }, { "name": "vlan-100", "namespace": "test-namespace", "interface": "ext0.100" } ]' spec: securityContext: runAsNonRoot: true containers: - name: nginx-container image: nginxinc/nginx-unprivileged:latest securityContext: allowPrivilegeEscalation: false capabilities: drop: ["ALL"] ports: - containerPort: 80 seccompProfile: type: "RuntimeDefault"
Copy to Clipboard Copied! - 1
- The name to be used as the
master
for the VLAN interface.
Apply the YAML file by running the following command:
oc apply -f pod-a.yaml
$ oc apply -f pod-a.yaml
Copy to Clipboard Copied!
Get detailed information about the
nginx-pod
within thetest-namespace
by running the following command:oc describe pods nginx-pod -n test-namespace
$ oc describe pods nginx-pod -n test-namespace
Copy to Clipboard Copied! Example output
Name: nginx-pod Namespace: test-namespace Priority: 0 Node: worker-1/10.46.186.105 Start Time: Mon, 14 Aug 2023 16:23:13 -0400 Labels: <none> Annotations: k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.131.0.26/23"],"mac_address":"0a:58:0a:83:00:1a","gateway_ips":["10.131.0.1"],"routes":[{"dest":"10.128.0.0... k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.131.0.26" ], "mac": "0a:58:0a:83:00:1a", "default": true, "dns": {} },{ "name": "test-namespace/sriov-network", "interface": "ext0", "mac": "6e:a7:5e:3f:49:1b", "dns": {}, "device-info": { "type": "pci", "version": "1.0.0", "pci": { "pci-address": "0000:d8:00.2" } } },{ "name": "test-namespace/vlan-100", "interface": "ext0.100", "ips": [ "1.1.1.1" ], "mac": "6e:a7:5e:3f:49:1b", "dns": {} }] k8s.v1.cni.cncf.io/networks: [ { "name": "sriov-network", "namespace": "test-namespace", "interface": "ext0" }, { "name": "vlan-100", "namespace": "test-namespace", "i... openshift.io/scc: privileged Status: Running IP: 10.131.0.26 IPs: IP: 10.131.0.26
Name: nginx-pod Namespace: test-namespace Priority: 0 Node: worker-1/10.46.186.105 Start Time: Mon, 14 Aug 2023 16:23:13 -0400 Labels: <none> Annotations: k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.131.0.26/23"],"mac_address":"0a:58:0a:83:00:1a","gateway_ips":["10.131.0.1"],"routes":[{"dest":"10.128.0.0... k8s.v1.cni.cncf.io/network-status: [{ "name": "ovn-kubernetes", "interface": "eth0", "ips": [ "10.131.0.26" ], "mac": "0a:58:0a:83:00:1a", "default": true, "dns": {} },{ "name": "test-namespace/sriov-network", "interface": "ext0", "mac": "6e:a7:5e:3f:49:1b", "dns": {}, "device-info": { "type": "pci", "version": "1.0.0", "pci": { "pci-address": "0000:d8:00.2" } } },{ "name": "test-namespace/vlan-100", "interface": "ext0.100", "ips": [ "1.1.1.1" ], "mac": "6e:a7:5e:3f:49:1b", "dns": {} }] k8s.v1.cni.cncf.io/networks: [ { "name": "sriov-network", "namespace": "test-namespace", "interface": "ext0" }, { "name": "vlan-100", "namespace": "test-namespace", "i... openshift.io/scc: privileged Status: Running IP: 10.131.0.26 IPs: IP: 10.131.0.26
Copy to Clipboard Copied!
24.2.8.2. Creating a subinterface based on a bridge master interface in a container namespace
You can create a subinterface based on a bridge master
interface that exists in a container namespace. Creating a subinterface can be applied to other types of interfaces.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You are logged in to the OpenShift Container Platform cluster as a user with
cluster-admin
privileges.
Procedure
Create a dedicated container namespace where you want to deploy your pod by entering the following command:
oc new-project test-namespace
$ oc new-project test-namespace
Copy to Clipboard Copied! Using the following YAML example, create a bridge
NetworkAttachmentDefinition
custom resource definition (CRD) file namedbridge-nad.yaml
:apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: bridge-network spec: config: '{ "cniVersion": "0.4.0", "name": "bridge-network", "type": "bridge", "bridge": "br-001", "isGateway": true, "ipMasq": true, "hairpinMode": true, "ipam": { "type": "host-local", "subnet": "10.0.0.0/24", "routes": [{"dst": "0.0.0.0/0"}] } }'
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: bridge-network spec: config: '{ "cniVersion": "0.4.0", "name": "bridge-network", "type": "bridge", "bridge": "br-001", "isGateway": true, "ipMasq": true, "hairpinMode": true, "ipam": { "type": "host-local", "subnet": "10.0.0.0/24", "routes": [{"dst": "0.0.0.0/0"}] } }'
Copy to Clipboard Copied! Run the following command to apply the
NetworkAttachmentDefinition
CRD to your OpenShift Container Platform cluster:oc apply -f bridge-nad.yaml
$ oc apply -f bridge-nad.yaml
Copy to Clipboard Copied! Verify that you successfully created a
NetworkAttachmentDefinition
CRD by entering the following command:oc get network-attachment-definitions
$ oc get network-attachment-definitions
Copy to Clipboard Copied! Example output
NAME AGE bridge-network 15s
NAME AGE bridge-network 15s
Copy to Clipboard Copied! Using the following YAML example, create a file named
ipvlan-additional-network-configuration.yaml
for the IPVLAN additional network configuration:apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: ipvlan-net namespace: test-namespace spec: config: '{ "cniVersion": "0.3.1", "name": "ipvlan-net", "type": "ipvlan", "master": "ext0", "mode": "l3", "linkInContainer": true, "ipam": {"type": "whereabouts", "ipRanges": [{"range": "10.0.0.0/24"}]} }'
apiVersion: k8s.cni.cncf.io/v1 kind: NetworkAttachmentDefinition metadata: name: ipvlan-net namespace: test-namespace spec: config: '{ "cniVersion": "0.3.1", "name": "ipvlan-net", "type": "ipvlan", "master": "ext0",
1 "mode": "l3", "linkInContainer": true,
2 "ipam": {"type": "whereabouts", "ipRanges": [{"range": "10.0.0.0/24"}]} }'
Copy to Clipboard Copied! Apply the YAML file by running the following command:
oc apply -f ipvlan-additional-network-configuration.yaml
$ oc apply -f ipvlan-additional-network-configuration.yaml
Copy to Clipboard Copied! Verify that the
NetworkAttachmentDefinition
CRD has been created successfully by running the following command:oc get network-attachment-definitions
$ oc get network-attachment-definitions
Copy to Clipboard Copied! Example output
NAME AGE bridge-network 87s ipvlan-net 9s
NAME AGE bridge-network 87s ipvlan-net 9s
Copy to Clipboard Copied! Using the following YAML example, create a file named
pod-a.yaml
for the pod definition:apiVersion: v1 kind: Pod metadata: name: pod-a namespace: test-namespace annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "bridge-network", "interface": "ext0" }, { "name": "ipvlan-net", "interface": "ext1" } ]' spec: securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault containers: - name: test-pod image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4 securityContext: allowPrivilegeEscalation: false capabilities: drop: [ALL]
apiVersion: v1 kind: Pod metadata: name: pod-a namespace: test-namespace annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "bridge-network", "interface": "ext0"
1 }, { "name": "ipvlan-net", "interface": "ext1" } ]' spec: securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault containers: - name: test-pod image: quay.io/openshifttest/hello-sdn@sha256:c89445416459e7adea9a5a416b3365ed3d74f2491beb904d61dc8d1eb89a72a4 securityContext: allowPrivilegeEscalation: false capabilities: drop: [ALL]
Copy to Clipboard Copied! - 1
- Specifies the name to be used as the
master
for the IPVLAN interface.
Apply the YAML file by running the following command:
oc apply -f pod-a.yaml
$ oc apply -f pod-a.yaml
Copy to Clipboard Copied! Verify that the pod is running by using the following command:
oc get pod -n test-namespace
$ oc get pod -n test-namespace
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE pod-a 1/1 Running 0 2m36s
NAME READY STATUS RESTARTS AGE pod-a 1/1 Running 0 2m36s
Copy to Clipboard Copied! Show network interface information about the
pod-a
resource within thetest-namespace
by running the following command:oc exec -n test-namespace pod-a -- ip a
$ oc exec -n test-namespace pod-a -- ip a
Copy to Clipboard Copied! Example output
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 3: eth0@if105: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default link/ether 0a:58:0a:d9:00:5d brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.217.0.93/23 brd 10.217.1.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::488b:91ff:fe84:a94b/64 scope link valid_lft forever preferred_lft forever 4: ext0@if107: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether be:da:bd:7e:f4:37 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.0.0.2/24 brd 10.0.0.255 scope global ext0 valid_lft forever preferred_lft forever inet6 fe80::bcda:bdff:fe7e:f437/64 scope link valid_lft forever preferred_lft forever 5: ext1@ext0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default link/ether be:da:bd:7e:f4:37 brd ff:ff:ff:ff:ff:ff inet 10.0.0.1/24 brd 10.0.0.255 scope global ext1 valid_lft forever preferred_lft forever inet6 fe80::beda:bd00:17e:f437/64 scope link valid_lft forever preferred_lft forever
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 3: eth0@if105: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default link/ether 0a:58:0a:d9:00:5d brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.217.0.93/23 brd 10.217.1.255 scope global eth0 valid_lft forever preferred_lft forever inet6 fe80::488b:91ff:fe84:a94b/64 scope link valid_lft forever preferred_lft forever 4: ext0@if107: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether be:da:bd:7e:f4:37 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 10.0.0.2/24 brd 10.0.0.255 scope global ext0 valid_lft forever preferred_lft forever inet6 fe80::bcda:bdff:fe7e:f437/64 scope link valid_lft forever preferred_lft forever 5: ext1@ext0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default link/ether be:da:bd:7e:f4:37 brd ff:ff:ff:ff:ff:ff inet 10.0.0.1/24 brd 10.0.0.255 scope global ext1 valid_lft forever preferred_lft forever inet6 fe80::beda:bd00:17e:f437/64 scope link valid_lft forever preferred_lft forever
Copy to Clipboard Copied! This output shows that the network interface
ext1
is associated with the physical interfaceext0
.
24.3. About virtual routing and forwarding
24.3.1. About virtual routing and forwarding
Virtual routing and forwarding (VRF) devices combined with IP rules provide the ability to create virtual routing and forwarding domains. VRF reduces the number of permissions needed by CNF, and provides increased visibility of the network topology of secondary networks. VRF is used to provide multi-tenancy functionality, for example, where each tenant has its own unique routing tables and requires different default gateways.
Processes can bind a socket to the VRF device. Packets through the binded socket use the routing table associated with the VRF device. An important feature of VRF is that it impacts only OSI model layer 3 traffic and above so L2 tools, such as LLDP, are not affected. This allows higher priority IP rules such as policy based routing to take precedence over the VRF device rules directing specific traffic.
24.3.1.1. Benefits of secondary networks for pods for telecommunications operators
In telecommunications use cases, each CNF can potentially be connected to multiple different networks sharing the same address space. These secondary networks can potentially conflict with the cluster’s main network CIDR. Using the CNI VRF plugin, network functions can be connected to different customers' infrastructure using the same IP address, keeping different customers isolated. IP addresses are overlapped with OpenShift Container Platform IP space. The CNI VRF plugin also reduces the number of permissions needed by CNF and increases the visibility of network topologies of secondary networks.
24.4. Configuring multi-network policy
As a cluster administrator, you can configure a multi-network policy for a Single-Root I/O Virtualization (SR-IOV), MAC Virtual Local Area Network (MacVLAN), or OVN-Kubernetes additional networks. MacVLAN additional networks are fully supported. Other types of additional networks, such as IP Virtual Local Area Network (IPVLAN), are not supported.
Support for configuring multi-network policies for SR-IOV additional networks is only supported with kernel network interface controllers (NICs). SR-IOV is not supported for Data Plane Development Kit (DPDK) applications.
24.4.1. Differences between multi-network policy and network policy
Although the MultiNetworkPolicy
API implements the NetworkPolicy
API, there are several important differences:
You must use the
MultiNetworkPolicy
API:apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy
Copy to Clipboard Copied! -
You must use the
multi-networkpolicy
resource name when using the CLI to interact with multi-network policies. For example, you can view a multi-network policy object with theoc get multi-networkpolicy <name>
command where<name>
is the name of a multi-network policy. You can use the
k8s.v1.cni.cncf.io/policy-for
annotation on aMultiNetworkPolicy
object to point to aNetworkAttachmentDefinition
(NAD) custom resource (CR). The NAD CR defines the network to which the policy applies.Example multi-network policy that includes the
k8s.v1.cni.cncf.io/policy-for
annotationapiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name>
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name>
Copy to Clipboard Copied! where:
<namespace_name>
- Specifies the namespace name.
<network_name>
- Specifies the name of a network attachment definition.
24.4.2. Enabling multi-network policy for the cluster
As a cluster administrator, you can enable multi-network policy support on your cluster.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges.
Procedure
Create the
multinetwork-enable-patch.yaml
file with the following YAML:apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: useMultiNetworkPolicy: true
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: useMultiNetworkPolicy: true
Copy to Clipboard Copied! Configure the cluster to enable multi-network policy:
oc patch network.operator.openshift.io cluster --type=merge --patch-file=multinetwork-enable-patch.yaml
$ oc patch network.operator.openshift.io cluster --type=merge --patch-file=multinetwork-enable-patch.yaml
Copy to Clipboard Copied! Example output
network.operator.openshift.io/cluster patched
network.operator.openshift.io/cluster patched
Copy to Clipboard Copied!
24.4.3. Supporting multi-network policies in IPv6 networks
The ICMPv6 Neighbor Discovery Protocol (NDP) is a set of messages and processes that enable devices to discover and maintain information about neighboring nodes. NDP plays a crucial role in IPv6 networks, facilitating the interaction between devices on the same link.
The Cluster Network Operator (CNO) deploys the iptables implementation of multi-network policy when the useMultiNetworkPolicy
parameter is set to true
.
To support multi-network policies in IPv6 networks the Cluster Network Operator deploys the following set of rules in every pod affected by a multi-network policy:
Multi-network policy custom rules
kind: ConfigMap apiVersion: v1 metadata: name: multi-networkpolicy-custom-rules namespace: openshift-multus data: custom-v6-rules.txt: | # accept NDP -p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT -p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT # accept RA/RS -p icmpv6 --icmpv6-type router-solicitation -j ACCEPT -p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
kind: ConfigMap
apiVersion: v1
metadata:
name: multi-networkpolicy-custom-rules
namespace: openshift-multus
data:
custom-v6-rules.txt: |
# accept NDP
-p icmpv6 --icmpv6-type neighbor-solicitation -j ACCEPT
-p icmpv6 --icmpv6-type neighbor-advertisement -j ACCEPT
# accept RA/RS
-p icmpv6 --icmpv6-type router-solicitation -j ACCEPT
-p icmpv6 --icmpv6-type router-advertisement -j ACCEPT
- 1
- This rule allows incoming ICMPv6 neighbor solicitation messages, which are part of the neighbor discovery protocol (NDP). These messages help determine the link-layer addresses of neighboring nodes.
- 2
- This rule allows incoming ICMPv6 neighbor advertisement messages, which are part of NDP and provide information about the link-layer address of the sender.
- 3
- This rule permits incoming ICMPv6 router solicitation messages. Hosts use these messages to request router configuration information.
- 4
- This rule allows incoming ICMPv6 router advertisement messages, which give configuration information to hosts.
You cannot edit these predefined rules.
These rules collectively enable essential ICMPv6 traffic for correct network functioning, including address resolution and router communication in an IPv6 environment. With these rules in place and a multi-network policy denying traffic, applications are not expected to experience connectivity issues.
24.4.4. Working with multi-network policy
As a cluster administrator, you can create, edit, view, and delete multi-network policies.
24.4.4.1. Prerequisites
- You have enabled multi-network policy support for your cluster.
24.4.4.2. Creating a multi-network policy using the CLI
To define granular rules describing ingress or egress network traffic allowed for namespaces in your cluster, you can create a multi-network policy.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace that the multi-network policy applies to.
Procedure
Create a policy rule:
Create a
<policy_name>.yaml
file:touch <policy_name>.yaml
$ touch <policy_name>.yaml
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the multi-network policy file name.
Define a multi-network policy in the file that you just created, such as in the following examples:
Deny ingress from all pods in all namespaces
This is a fundamental policy, blocking all cross-pod networking other than cross-pod traffic allowed by the configuration of other Network Policies.
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: deny-by-default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: {} policyTypes: - Ingress ingress: []
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: deny-by-default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: {} policyTypes: - Ingress ingress: []
Copy to Clipboard Copied! where:
<network_name>
- Specifies the name of a network attachment definition.
Allow ingress from all pods in the same namespace
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: allow-same-namespace annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: ingress: - from: - podSelector: {}
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: allow-same-namespace annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: ingress: - from: - podSelector: {}
Copy to Clipboard Copied! where:
<network_name>
- Specifies the name of a network attachment definition.
Allow ingress traffic to one pod from a particular namespace
This policy allows traffic to pods labelled
pod-a
from pods running innamespace-y
.apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: allow-traffic-pod annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: pod: pod-a policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: namespace-y
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: allow-traffic-pod annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: pod: pod-a policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: namespace-y
Copy to Clipboard Copied! where:
<network_name>
- Specifies the name of a network attachment definition.
Restrict traffic to a service
This policy when applied ensures every pod with both labels
app=bookstore
androle=api
can only be accessed by pods with labelapp=bookstore
. In this example the application could be a REST API server, marked with labelsapp=bookstore
androle=api
.This example addresses the following use cases:
- Restricting the traffic to a service to only the other microservices that need to use it.
Restricting the connections to a database to only permit the application using it.
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: api-allow annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: app: bookstore role: api ingress: - from: - podSelector: matchLabels: app: bookstore
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: api-allow annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: app: bookstore role: api ingress: - from: - podSelector: matchLabels: app: bookstore
Copy to Clipboard Copied! where:
<network_name>
- Specifies the name of a network attachment definition.
To create the multi-network policy object, enter the following command:
oc apply -f <policy_name>.yaml -n <namespace>
$ oc apply -f <policy_name>.yaml -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the multi-network policy file name.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Example output
multinetworkpolicy.k8s.cni.cncf.io/deny-by-default created
multinetworkpolicy.k8s.cni.cncf.io/deny-by-default created
Copy to Clipboard Copied!
If you log in to the web console with cluster-admin
privileges, you have a choice of creating a network policy in any namespace in the cluster directly in YAML or from a form in the web console.
24.4.4.3. Editing a multi-network policy
You can edit a multi-network policy in a namespace.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace where the multi-network policy exists.
Procedure
Optional: To list the multi-network policy objects in a namespace, enter the following command:
oc get multi-networkpolicy
$ oc get multi-networkpolicy
Copy to Clipboard Copied! where:
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Edit the multi-network policy object.
If you saved the multi-network policy definition in a file, edit the file and make any necessary changes, and then enter the following command.
oc apply -n <namespace> -f <policy_file>.yaml
$ oc apply -n <namespace> -f <policy_file>.yaml
Copy to Clipboard Copied! where:
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
<policy_file>
- Specifies the name of the file containing the network policy.
If you need to update the multi-network policy object directly, enter the following command:
oc edit multi-networkpolicy <policy_name> -n <namespace>
$ oc edit multi-networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the network policy.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Confirm that the multi-network policy object is updated.
oc describe multi-networkpolicy <policy_name> -n <namespace>
$ oc describe multi-networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the multi-network policy.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
If you log in to the web console with cluster-admin
privileges, you have a choice of editing a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.
24.4.4.4. Viewing multi-network policies using the CLI
You can examine the multi-network policies in a namespace.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace where the multi-network policy exists.
Procedure
List multi-network policies in a namespace:
To view multi-network policy objects defined in a namespace, enter the following command:
oc get multi-networkpolicy
$ oc get multi-networkpolicy
Copy to Clipboard Copied! Optional: To examine a specific multi-network policy, enter the following command:
oc describe multi-networkpolicy <policy_name> -n <namespace>
$ oc describe multi-networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the multi-network policy to inspect.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
If you log in to the web console with cluster-admin
privileges, you have a choice of viewing a network policy in any namespace in the cluster directly in YAML or from a form in the web console.
24.4.4.5. Deleting a multi-network policy using the CLI
You can delete a multi-network policy in a namespace.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace where the multi-network policy exists.
Procedure
To delete a multi-network policy object, enter the following command:
oc delete multi-networkpolicy <policy_name> -n <namespace>
$ oc delete multi-networkpolicy <policy_name> -n <namespace>
Copy to Clipboard Copied! where:
<policy_name>
- Specifies the name of the multi-network policy.
<namespace>
- Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Example output
multinetworkpolicy.k8s.cni.cncf.io/default-deny deleted
multinetworkpolicy.k8s.cni.cncf.io/default-deny deleted
Copy to Clipboard Copied!
If you log in to the web console with cluster-admin
privileges, you have a choice of deleting a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.
24.4.4.6. Creating a default deny all multi-network policy
This is a fundamental policy, blocking all cross-pod networking other than network traffic allowed by the configuration of other deployed network policies. This procedure enforces a default deny-by-default
policy.
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace that the multi-network policy applies to.
Procedure
Create the following YAML that defines a
deny-by-default
policy to deny ingress from all pods in all namespaces. Save the YAML in thedeny-by-default.yaml
file:apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: deny-by-default namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: {} policyTypes: - Ingress ingress: []
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: deny-by-default namespace: default
1 annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name>
2 spec: podSelector: {}
3 policyTypes:
4 - Ingress
5 ingress: []
6 Copy to Clipboard Copied! - 1
- Specifies the namespace in which to deploy the policy. For example, the
my-project
namespace. - 2
- Specifies the name of namespace project followed by the network attachment definition name.
- 3
- If this field is empty, the configuration matches all the pods. Therefore, the policy applies to all pods in the
my-project
namespace. - 4
- Specifies a list of rule types that the
NetworkPolicy
relates to. - 5
- Specifies
Ingress
onlypolicyTypes
. - 6
- Specifies
ingress
rules. If not specified, all incoming traffic is dropped to all pods.
Apply the policy by entering the following command:
oc apply -f deny-by-default.yaml
$ oc apply -f deny-by-default.yaml
Copy to Clipboard Copied! Example output
multinetworkpolicy.k8s.cni.cncf.io/deny-by-default created
multinetworkpolicy.k8s.cni.cncf.io/deny-by-default created
Copy to Clipboard Copied!
24.4.4.7. Creating a multi-network policy to allow traffic from external clients
With the deny-by-default
policy in place you can proceed to configure a policy that allows traffic from external clients to a pod with the label app=web
.
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Follow this procedure to configure a policy that allows external service from the public Internet directly or by using a Load Balancer to access the pod. Traffic is only allowed to a pod with the label app=web
.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace that the multi-network policy applies to.
Procedure
Create a policy that allows traffic from the public Internet directly or by using a load balancer to access the pod. Save the YAML in the
web-allow-external.yaml
file:apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: web-allow-external namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: policyTypes: - Ingress podSelector: matchLabels: app: web ingress: - {}
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: web-allow-external namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: policyTypes: - Ingress podSelector: matchLabels: app: web ingress: - {}
Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f web-allow-external.yaml
$ oc apply -f web-allow-external.yaml
Copy to Clipboard Copied! Example output
multinetworkpolicy.k8s.cni.cncf.io/web-allow-external created
multinetworkpolicy.k8s.cni.cncf.io/web-allow-external created
Copy to Clipboard Copied! This policy allows traffic from all resources, including external traffic as illustrated in the following diagram:

24.4.4.8. Creating a multi-network policy allowing traffic to an application from all namespaces
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Follow this procedure to configure a policy that allows traffic from all pods in all namespaces to a particular application.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace that the multi-network policy applies to.
Procedure
Create a policy that allows traffic from all pods in all namespaces to a particular application. Save the YAML in the
web-allow-all-namespaces.yaml
file:apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: web-allow-all-namespaces namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: app: web policyTypes: - Ingress ingress: - from: - namespaceSelector: {}
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: web-allow-all-namespaces namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: app: web
1 policyTypes: - Ingress ingress: - from: - namespaceSelector: {}
2 Copy to Clipboard Copied! NoteBy default, if you omit specifying a
namespaceSelector
it does not select any namespaces, which means the policy allows traffic only from the namespace the network policy is deployed to.Apply the policy by entering the following command:
oc apply -f web-allow-all-namespaces.yaml
$ oc apply -f web-allow-all-namespaces.yaml
Copy to Clipboard Copied! Example output
multinetworkpolicy.k8s.cni.cncf.io/web-allow-all-namespaces created
multinetworkpolicy.k8s.cni.cncf.io/web-allow-all-namespaces created
Copy to Clipboard Copied!
Verification
Start a web service in the
default
namespace by entering the following command:oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
$ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in thesecondary
namespace and to start a shell:oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
$ oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! Run the following command in the shell and observe that the request is allowed:
wget -qO- --timeout=2 http://web.default
# wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
Copy to Clipboard Copied!
24.4.4.9. Creating a multi-network policy allowing traffic to an application from a namespace
If you log in with a user with the cluster-admin
role, then you can create a network policy in any namespace in the cluster.
Follow this procedure to configure a policy that allows traffic to a pod with the label app=web
from a particular namespace. You might want to do this to:
- Restrict traffic to a production database only to namespaces where production workloads are deployed.
- Enable monitoring tools deployed to a particular namespace to scrape metrics from the current namespace.
Prerequisites
-
Your cluster uses a network plugin that supports
NetworkPolicy
objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin withmode: NetworkPolicy
set. This mode is the default for OpenShift SDN. -
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You are working in the namespace that the multi-network policy applies to.
Procedure
Create a policy that allows traffic from all pods in a particular namespaces with a label
purpose=production
. Save the YAML in theweb-allow-prod.yaml
file:apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: web-allow-prod namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: app: web policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: purpose: production
apiVersion: k8s.cni.cncf.io/v1beta1 kind: MultiNetworkPolicy metadata: name: web-allow-prod namespace: default annotations: k8s.v1.cni.cncf.io/policy-for:<namespace_name>/<network_name> spec: podSelector: matchLabels: app: web
1 policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: purpose: production
2 Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f web-allow-prod.yaml
$ oc apply -f web-allow-prod.yaml
Copy to Clipboard Copied! Example output
multinetworkpolicy.k8s.cni.cncf.io/web-allow-prod created
multinetworkpolicy.k8s.cni.cncf.io/web-allow-prod created
Copy to Clipboard Copied!
Verification
Start a web service in the
default
namespace by entering the following command:oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
$ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
Copy to Clipboard Copied! Run the following command to create the
prod
namespace:oc create namespace prod
$ oc create namespace prod
Copy to Clipboard Copied! Run the following command to label the
prod
namespace:oc label namespace/prod purpose=production
$ oc label namespace/prod purpose=production
Copy to Clipboard Copied! Run the following command to create the
dev
namespace:oc create namespace dev
$ oc create namespace dev
Copy to Clipboard Copied! Run the following command to label the
dev
namespace:oc label namespace/dev purpose=testing
$ oc label namespace/dev purpose=testing
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in thedev
namespace and to start a shell:oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
$ oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! Run the following command in the shell and observe that the request is blocked:
wget -qO- --timeout=2 http://web.default
# wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
wget: download timed out
wget: download timed out
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in theprod
namespace and start a shell:oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
$ oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! Run the following command in the shell and observe that the request is allowed:
wget -qO- --timeout=2 http://web.default
# wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> html { color-scheme: light dark; } body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
Copy to Clipboard Copied!
24.5. Attaching a pod to an additional network
As a cluster user you can attach a pod to an additional network.
24.5.1. Adding a pod to an additional network
You can add a pod to an additional network. The pod continues to send normal cluster-related network traffic over the default network.
When a pod is created additional networks are attached to it. However, if a pod already exists, you cannot attach additional networks to it.
The pod must be in the same namespace as the additional network.
Prerequisites
-
Install the OpenShift CLI (
oc
). - Log in to the cluster.
Procedure
Add an annotation to the
Pod
object. Only one of the following annotation formats can be used:To attach an additional network without any customization, add an annotation with the following format. Replace
<network>
with the name of the additional network to associate with the pod:metadata: annotations: k8s.v1.cni.cncf.io/networks: <network>[,<network>,...]
metadata: annotations: k8s.v1.cni.cncf.io/networks: <network>[,<network>,...]
1 Copy to Clipboard Copied! - 1
- To specify more than one additional network, separate each network with a comma. Do not include whitespace between the comma. If you specify the same additional network multiple times, that pod will have multiple network interfaces attached to that network.
To attach an additional network with customizations, add an annotation with the following format:
metadata: annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "<network>", "namespace": "<namespace>", "default-route": ["<default-route>"] } ]
metadata: annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "<network>",
1 "namespace": "<namespace>",
2 "default-route": ["<default-route>"]
3 } ]
Copy to Clipboard Copied!
To create the pod, enter the following command. Replace
<name>
with the name of the pod.oc create -f <name>.yaml
$ oc create -f <name>.yaml
Copy to Clipboard Copied! Optional: To Confirm that the annotation exists in the
Pod
CR, enter the following command, replacing<name>
with the name of the pod.oc get pod <name> -o yaml
$ oc get pod <name> -o yaml
Copy to Clipboard Copied! In the following example, the
example-pod
pod is attached to thenet1
additional network:oc get pod example-pod -o yaml
$ oc get pod example-pod -o yaml apiVersion: v1 kind: Pod metadata: annotations: k8s.v1.cni.cncf.io/networks: macvlan-bridge k8s.v1.cni.cncf.io/network-status: |-
1 [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.128.2.14" ], "default": true, "dns": {} },{ "name": "macvlan-bridge", "interface": "net1", "ips": [ "20.2.2.100" ], "mac": "22:2f:60:a5:f8:00", "dns": {} }] name: example-pod namespace: default spec: ... status: ...
Copy to Clipboard Copied! - 1
- The
k8s.v1.cni.cncf.io/network-status
parameter is a JSON array of objects. Each object describes the status of an additional network attached to the pod. The annotation value is stored as a plain text value.
24.5.1.1. Specifying pod-specific addressing and routing options
When attaching a pod to an additional network, you may want to specify further properties about that network in a particular pod. This allows you to change some aspects of routing, as well as specify static IP addresses and MAC addresses. To accomplish this, you can use the JSON formatted annotations.
Prerequisites
- The pod must be in the same namespace as the additional network.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster.
Procedure
To add a pod to an additional network while specifying addressing and/or routing options, complete the following steps:
Edit the
Pod
resource definition. If you are editing an existingPod
resource, run the following command to edit its definition in the default editor. Replace<name>
with the name of thePod
resource to edit.oc edit pod <name>
$ oc edit pod <name>
Copy to Clipboard Copied! In the
Pod
resource definition, add thek8s.v1.cni.cncf.io/networks
parameter to the podmetadata
mapping. Thek8s.v1.cni.cncf.io/networks
accepts a JSON string of a list of objects that reference the name ofNetworkAttachmentDefinition
custom resource (CR) names in addition to specifying additional properties.metadata: annotations: k8s.v1.cni.cncf.io/networks: '[<network>[,<network>,...]]'
metadata: annotations: k8s.v1.cni.cncf.io/networks: '[<network>[,<network>,...]]'
1 Copy to Clipboard Copied! - 1
- Replace
<network>
with a JSON object as shown in the following examples. The single quotes are required.
In the following example the annotation specifies which network attachment will have the default route, using the
default-route
parameter.apiVersion: v1 kind: Pod metadata: name: example-pod annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "net1" }, { "name": "net2", "default-route": ["192.0.2.1"] }]' spec: containers: - name: example-pod command: ["/bin/bash", "-c", "sleep 2000000000000"] image: centos/tools
apiVersion: v1 kind: Pod metadata: name: example-pod annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "net1" }, { "name": "net2",
1 "default-route": ["192.0.2.1"]
2 }]' spec: containers: - name: example-pod command: ["/bin/bash", "-c", "sleep 2000000000000"] image: centos/tools
Copy to Clipboard Copied! - 1
- The
name
key is the name of the additional network to associate with the pod. - 2
- The
default-route
key specifies a value of a gateway for traffic to be routed over if no other routing entry is present in the routing table. If more than onedefault-route
key is specified, this will cause the pod to fail to become active.
The default route will cause any traffic that is not specified in other routes to be routed to the gateway.
Setting the default route to an interface other than the default network interface for OpenShift Container Platform may cause traffic that is anticipated for pod-to-pod traffic to be routed over another interface.
To verify the routing properties of a pod, the oc
command may be used to execute the ip
command within a pod.
oc exec -it <pod_name> -- ip route
$ oc exec -it <pod_name> -- ip route
You may also reference the pod’s k8s.v1.cni.cncf.io/network-status
to see which additional network has been assigned the default route, by the presence of the default-route
key in the JSON-formatted list of objects.
To set a static IP address or MAC address for a pod you can use the JSON formatted annotations. This requires you create networks that specifically allow for this functionality. This can be specified in a rawCNIConfig for the CNO.
Edit the CNO CR by running the following command:
oc edit networks.operator.openshift.io cluster
$ oc edit networks.operator.openshift.io cluster
Copy to Clipboard Copied!
The following YAML describes the configuration parameters for the CNO:
Cluster Network Operator YAML configuration
name: <name> namespace: <namespace> rawCNIConfig: '{ ... }' type: Raw
name: <name>
namespace: <namespace>
rawCNIConfig: '{
...
}'
type: Raw
- 1
- Specify a name for the additional network attachment that you are creating. The name must be unique within the specified
namespace
. - 2
- Specify the namespace to create the network attachment in. If you do not specify a value, then the
default
namespace is used. - 3
- Specify the CNI plugin configuration in JSON format, which is based on the following template.
The following object describes the configuration parameters for utilizing static MAC address and IP address using the macvlan CNI plugin:
macvlan CNI plugin JSON configuration object using static IP and MAC address
{ "cniVersion": "0.3.1", "name": "<name>", "plugins": [{ "type": "macvlan", "capabilities": { "ips": true }, "master": "eth0", "mode": "bridge", "ipam": { "type": "static" } }, { "capabilities": { "mac": true }, "type": "tuning" }] }
{
"cniVersion": "0.3.1",
"name": "<name>",
"plugins": [{
"type": "macvlan",
"capabilities": { "ips": true },
"master": "eth0",
"mode": "bridge",
"ipam": {
"type": "static"
}
}, {
"capabilities": { "mac": true },
"type": "tuning"
}]
}
- 1
- Specifies the name for the additional network attachment to create. The name must be unique within the specified
namespace
. - 2
- Specifies an array of CNI plugin configurations. The first object specifies a macvlan plugin configuration and the second object specifies a tuning plugin configuration.
- 3
- Specifies that a request is made to enable the static IP address functionality of the CNI plugin runtime configuration capabilities.
- 4
- Specifies the interface that the macvlan plugin uses.
- 5
- Specifies that a request is made to enable the static MAC address functionality of a CNI plugin.
The above network attachment can be referenced in a JSON formatted annotation, along with keys to specify which static IP and MAC address will be assigned to a given pod.
Edit the pod with:
oc edit pod <name>
$ oc edit pod <name>
macvlan CNI plugin JSON configuration object using static IP and MAC address
apiVersion: v1 kind: Pod metadata: name: example-pod annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "<name>", "ips": [ "192.0.2.205/24" ], "mac": "CA:FE:C0:FF:EE:00" } ]'
apiVersion: v1
kind: Pod
metadata:
name: example-pod
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "<name>",
"ips": [ "192.0.2.205/24" ],
"mac": "CA:FE:C0:FF:EE:00"
}
]'
Static IP addresses and MAC addresses do not have to be used at the same time, you may use them individually, or together.
To verify the IP address and MAC properties of a pod with additional networks, use the oc
command to execute the ip command within a pod.
oc exec -it <pod_name> -- ip a
$ oc exec -it <pod_name> -- ip a
24.6. Removing a pod from an additional network
As a cluster user you can remove a pod from an additional network.
24.6.1. Removing a pod from an additional network
You can remove a pod from an additional network only by deleting the pod.
Prerequisites
- An additional network is attached to the pod.
-
Install the OpenShift CLI (
oc
). - Log in to the cluster.
Procedure
To delete the pod, enter the following command:
oc delete pod <name> -n <namespace>
$ oc delete pod <name> -n <namespace>
Copy to Clipboard Copied! -
<name>
is the name of the pod. -
<namespace>
is the namespace that contains the pod.
-
24.7. Editing an additional network
As a cluster administrator you can modify the configuration for an existing additional network.
24.7.1. Modifying an additional network attachment definition
As a cluster administrator, you can make changes to an existing additional network. Any existing pods attached to the additional network will not be updated.
Prerequisites
- You have configured an additional network for your cluster.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
To edit an additional network for your cluster, complete the following steps:
Run the following command to edit the Cluster Network Operator (CNO) CR in your default text editor:
oc edit networks.operator.openshift.io cluster
$ oc edit networks.operator.openshift.io cluster
Copy to Clipboard Copied! -
In the
additionalNetworks
collection, update the additional network with your changes. - Save your changes and quit the text editor to commit your changes.
Optional: Confirm that the CNO updated the
NetworkAttachmentDefinition
object by running the following command. Replace<network-name>
with the name of the additional network to display. There might be a delay before the CNO updates theNetworkAttachmentDefinition
object to reflect your changes.oc get network-attachment-definitions <network-name> -o yaml
$ oc get network-attachment-definitions <network-name> -o yaml
Copy to Clipboard Copied! For example, the following console output displays a
NetworkAttachmentDefinition
object that is namednet1
:oc get network-attachment-definitions net1 -o go-template='{{printf "%s\n" .spec.config}}'
$ oc get network-attachment-definitions net1 -o go-template='{{printf "%s\n" .spec.config}}' { "cniVersion": "0.3.1", "type": "macvlan", "master": "ens5", "mode": "bridge", "ipam": {"type":"static","routes":[{"dst":"0.0.0.0/0","gw":"10.128.2.1"}],"addresses":[{"address":"10.128.2.100/23","gateway":"10.128.2.1"}],"dns":{"nameservers":["172.30.0.10"],"domain":"us-west-2.compute.internal","search":["us-west-2.compute.internal"]}} }
Copy to Clipboard Copied!
24.8. Removing an additional network
As a cluster administrator you can remove an additional network attachment.
24.8.1. Removing an additional network attachment definition
As a cluster administrator, you can remove an additional network from your OpenShift Container Platform cluster. The additional network is not removed from any pods it is attached to.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
To remove an additional network from your cluster, complete the following steps:
Edit the Cluster Network Operator (CNO) in your default text editor by running the following command:
oc edit networks.operator.openshift.io cluster
$ oc edit networks.operator.openshift.io cluster
Copy to Clipboard Copied! Modify the CR by removing the configuration from the
additionalNetworks
collection for the network attachment definition you are removing.apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: []
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: []
1 Copy to Clipboard Copied! - 1
- If you are removing the configuration mapping for the only additional network attachment definition in the
additionalNetworks
collection, you must specify an empty collection.
- Save your changes and quit the text editor to commit your changes.
Optional: Confirm that the additional network CR was deleted by running the following command:
oc get network-attachment-definition --all-namespaces
$ oc get network-attachment-definition --all-namespaces
Copy to Clipboard Copied!
24.9. Assigning a secondary network to a VRF
As a cluster administrator, you can configure an additional network for a virtual routing and forwarding (VRF) domain by using the CNI VRF plugin. The virtual network that this plugin creates is associated with the physical interface that you specify.
Using a secondary network with a VRF instance has the following advantages:
- Workload isolation
- Isolate workload traffic by configuring a VRF instance for the additional network.
- Improved security
- Enable improved security through isolated network paths in the VRF domain.
- Multi-tenancy support
- Support multi-tenancy through network segmentation with a unique routing table in the VRF domain for each tenant.
Applications that use VRFs must bind to a specific device. The common usage is to use the SO_BINDTODEVICE
option for a socket. The SO_BINDTODEVICE
option binds the socket to the device that is specified in the passed interface name, for example, eth1
. To use the SO_BINDTODEVICE
option, the application must have CAP_NET_RAW
capabilities.
Using a VRF through the ip vrf exec
command is not supported in OpenShift Container Platform pods. To use VRF, bind applications directly to the VRF interface.
24.9.1. Creating an additional network attachment with the CNI VRF plugin
The Cluster Network Operator (CNO) manages additional network definitions. When you specify an additional network to create, the CNO creates the NetworkAttachmentDefinition
custom resource (CR) automatically.
Do not edit the NetworkAttachmentDefinition
CRs that the Cluster Network Operator manages. Doing so might disrupt network traffic on your additional network.
To create an additional network attachment with the CNI VRF plugin, perform the following procedure.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the OpenShift cluster as a user with cluster-admin privileges.
Procedure
Create the
Network
custom resource (CR) for the additional network attachment and insert therawCNIConfig
configuration for the additional network, as in the following example CR. Save the YAML as the fileadditional-network-attachment.yaml
.apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: - name: test-network-1 namespace: additional-network-1 type: Raw rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "macvlan-vrf", "plugins": [ { "type": "macvlan", "master": "eth1", "ipam": { "type": "static", "addresses": [ { "address": "191.168.1.23/24" } ] } }, { "type": "vrf", "vrfname": "vrf-1", "table": 1001 }] }'
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: - name: test-network-1 namespace: additional-network-1 type: Raw rawCNIConfig: '{ "cniVersion": "0.3.1", "name": "macvlan-vrf", "plugins": [
1 { "type": "macvlan", "master": "eth1", "ipam": { "type": "static", "addresses": [ { "address": "191.168.1.23/24" } ] } }, { "type": "vrf",
2 "vrfname": "vrf-1",
3 "table": 1001
4 }] }'
Copy to Clipboard Copied! - 1
plugins
must be a list. The first item in the list must be the secondary network underpinning the VRF network. The second item in the list is the VRF plugin configuration.- 2
type
must be set tovrf
.- 3
vrfname
is the name of the VRF that the interface is assigned to. If it does not exist in the pod, it is created.- 4
- Optional.
table
is the routing table ID. By default, thetableid
parameter is used. If it is not specified, the CNI assigns a free routing table ID to the VRF.
NoteVRF functions correctly only when the resource is of type
netdevice
.Create the
Network
resource:oc create -f additional-network-attachment.yaml
$ oc create -f additional-network-attachment.yaml
Copy to Clipboard Copied! Confirm that the CNO created the
NetworkAttachmentDefinition
CR by running the following command. Replace<namespace>
with the namespace that you specified when configuring the network attachment, for example,additional-network-1
.oc get network-attachment-definitions -n <namespace>
$ oc get network-attachment-definitions -n <namespace>
Copy to Clipboard Copied! Example output
NAME AGE additional-network-1 14m
NAME AGE additional-network-1 14m
Copy to Clipboard Copied! NoteThere might be a delay before the CNO creates the CR.
Verification
Create a pod and assign it to the additional network with the VRF instance:
Create a YAML file that defines the
Pod
resource:Example
pod-additional-net.yaml
fileapiVersion: v1 kind: Pod metadata: name: pod-additional-net annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "test-network-1" } ]' spec: containers: - name: example-pod-1 command: ["/bin/bash", "-c", "sleep 9000000"] image: centos:8
apiVersion: v1 kind: Pod metadata: name: pod-additional-net annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "test-network-1"
1 } ]' spec: containers: - name: example-pod-1 command: ["/bin/bash", "-c", "sleep 9000000"] image: centos:8
Copy to Clipboard Copied! - 1
- Specify the name of the additional network with the VRF instance.
Create the
Pod
resource by running the following command:oc create -f pod-additional-net.yaml
$ oc create -f pod-additional-net.yaml
Copy to Clipboard Copied! Example output
pod/test-pod created
pod/test-pod created
Copy to Clipboard Copied!
Verify that the pod network attachment is connected to the VRF additional network. Start a remote session with the pod and run the following command:
ip vrf show
$ ip vrf show
Copy to Clipboard Copied! Example output
Name Table ----------------------- vrf-1 1001
Name Table ----------------------- vrf-1 1001
Copy to Clipboard Copied! Confirm that the VRF interface is the controller for the additional interface:
ip link
$ ip link
Copy to Clipboard Copied! Example output
5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode
5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode
Copy to Clipboard Copied!
Chapter 25. Hardware networks
25.1. About Single Root I/O Virtualization (SR-IOV) hardware networks
The Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device with multiple pods.
SR-IOV can segment a compliant network device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs). The VF is used like any other network device. The SR-IOV network device driver for the device determines how the VF is exposed in the container:
-
netdevice
driver: A regular kernel network device in thenetns
of the container -
vfio-pci
driver: A character device mounted in the container
You can use SR-IOV network devices with additional networks on your OpenShift Container Platform cluster installed on bare metal or Red Hat OpenStack Platform (RHOSP) infrastructure for applications that require high bandwidth or low latency.
You can configure multi-network policies for SR-IOV networks. The support for this is technology preview and SR-IOV additional networks are only supported with kernel NICs. They are not supported for Data Plane Development Kit (DPDK) applications.
Creating multi-network policies on SR-IOV networks might not deliver the same performance to applications compared to SR-IOV networks without a multi-network policy configured.
Multi-network policies for SR-IOV network is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can enable SR-IOV on a node by using the following command:
oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"
$ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"
25.1.1. Components that manage SR-IOV network devices
The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. It performs the following functions:
- Orchestrates discovery and management of SR-IOV network devices
-
Generates
NetworkAttachmentDefinition
custom resources for the SR-IOV Container Network Interface (CNI) - Creates and updates the configuration of the SR-IOV network device plugin
-
Creates node specific
SriovNetworkNodeState
custom resources -
Updates the
spec.interfaces
field in eachSriovNetworkNodeState
custom resource
The Operator provisions the following components:
- SR-IOV network configuration daemon
- A daemon set that is deployed on worker nodes when the SR-IOV Network Operator starts. The daemon is responsible for discovering and initializing SR-IOV network devices in the cluster.
- SR-IOV Network Operator webhook
- A dynamic admission controller webhook that validates the Operator custom resource and sets appropriate default values for unset fields.
- SR-IOV Network resources injector
-
A dynamic admission controller webhook that provides functionality for patching Kubernetes pod specifications with requests and limits for custom network resources such as SR-IOV VFs. The SR-IOV network resources injector adds the
resource
field to only the first container in a pod automatically. - SR-IOV network device plugin
- A device plugin that discovers, advertises, and allocates SR-IOV network virtual function (VF) resources. Device plugins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plugins give the Kubernetes scheduler awareness of resource availability, so that the scheduler can schedule pods on nodes with sufficient resources.
- SR-IOV CNI plugin
- A CNI plugin that attaches VF interfaces allocated from the SR-IOV network device plugin directly into a pod.
- SR-IOV InfiniBand CNI plugin
- A CNI plugin that attaches InfiniBand (IB) VF interfaces allocated from the SR-IOV network device plugin directly into a pod.
The SR-IOV Network resources injector and SR-IOV Network Operator webhook are enabled by default and can be disabled by editing the default
SriovOperatorConfig
CR. Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices.
25.1.1.1. Supported platforms
The SR-IOV Network Operator is supported on the following platforms:
- Bare metal
- Red Hat OpenStack Platform (RHOSP)
25.1.1.2. Supported devices
OpenShift Container Platform supports the following network interface controllers:
Manufacturer | Model | Vendor ID | Device ID |
---|---|---|---|
Broadcom | BCM57414 | 14e4 | 16d7 |
Broadcom | BCM57508 | 14e4 | 1750 |
Broadcom | BCM57504 | 14e4 | 1751 |
Intel | X710 | 8086 | 1572 |
Intel | X710 Backplane | 8086 | 1581 |
Intel | X710 Base T | 8086 | 15ff |
Intel | XL710 | 8086 | 1583 |
Intel | XXV710 | 8086 | 158b |
Intel | E810-CQDA2 | 8086 | 1592 |
Intel | E810-2CQDA2 | 8086 | 1592 |
Intel | E810-XXVDA2 | 8086 | 159b |
Intel | E810-XXVDA4 | 8086 | 1593 |
Intel | E810-XXVDA4T | 8086 | 1593 |
Mellanox | MT27700 Family [ConnectX‑4] | 15b3 | 1013 |
Mellanox | MT27710 Family [ConnectX‑4 Lx] | 15b3 | 1015 |
Mellanox | MT27800 Family [ConnectX‑5] | 15b3 | 1017 |
Mellanox | MT28880 Family [ConnectX‑5 Ex] | 15b3 | 1019 |
Mellanox | MT28908 Family [ConnectX‑6] | 15b3 | 101b |
Mellanox | MT2892 Family [ConnectX‑6 Dx] | 15b3 | 101d |
Mellanox | MT2894 Family [ConnectX‑6 Lx] | 15b3 | 101f |
Mellanox | Mellanox MT2910 Family [ConnectX‑7] | 15b3 | 1021 |
Mellanox | MT42822 BlueField‑2 in ConnectX‑6 NIC mode | 15b3 | a2d6 |
Pensando [1] | DSC-25 dual-port 25G distributed services card for ionic driver | 0x1dd8 | 0x1002 |
Pensando [1] | DSC-100 dual-port 100G distributed services card for ionic driver | 0x1dd8 | 0x1003 |
Silicom | STS Family | 8086 | 1591 |
- OpenShift SR-IOV is supported, but you must set a static, Virtual Function (VF) media access control (MAC) address using the SR-IOV CNI config file when using SR-IOV.
For the most up-to-date list of supported cards and compatible OpenShift Container Platform versions available, see Openshift Single Root I/O Virtualization (SR-IOV) and PTP hardware networks Support Matrix.
25.1.1.3. Automated discovery of SR-IOV network devices
The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState custom resource (CR) for each worker node that provides a compatible SR-IOV network device.
The CR is assigned the same name as the worker node. The status.interfaces
list provides information about the network devices on a node.
Do not modify a SriovNetworkNodeState
object. The Operator creates and manages these resources automatically.
25.1.1.3.1. Example SriovNetworkNodeState object
The following YAML is an example of a SriovNetworkNodeState
object created by the SR-IOV Network Operator:
An SriovNetworkNodeState object
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodeState metadata: name: node-25 namespace: openshift-sriov-network-operator ownerReferences: - apiVersion: sriovnetwork.openshift.io/v1 blockOwnerDeletion: true controller: true kind: SriovNetworkNodePolicy name: default spec: dpConfigVersion: "39824" status: interfaces: - deviceID: "1017" driver: mlx5_core mtu: 1500 name: ens785f0 pciAddress: "0000:18:00.0" totalvfs: 8 vendor: 15b3 - deviceID: "1017" driver: mlx5_core mtu: 1500 name: ens785f1 pciAddress: "0000:18:00.1" totalvfs: 8 vendor: 15b3 - deviceID: 158b driver: i40e mtu: 1500 name: ens817f0 pciAddress: 0000:81:00.0 totalvfs: 64 vendor: "8086" - deviceID: 158b driver: i40e mtu: 1500 name: ens817f1 pciAddress: 0000:81:00.1 totalvfs: 64 vendor: "8086" - deviceID: 158b driver: i40e mtu: 1500 name: ens803f0 pciAddress: 0000:86:00.0 totalvfs: 64 vendor: "8086" syncStatus: Succeeded
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
name: node-25
namespace: openshift-sriov-network-operator
ownerReferences:
- apiVersion: sriovnetwork.openshift.io/v1
blockOwnerDeletion: true
controller: true
kind: SriovNetworkNodePolicy
name: default
spec:
dpConfigVersion: "39824"
status:
interfaces:
- deviceID: "1017"
driver: mlx5_core
mtu: 1500
name: ens785f0
pciAddress: "0000:18:00.0"
totalvfs: 8
vendor: 15b3
- deviceID: "1017"
driver: mlx5_core
mtu: 1500
name: ens785f1
pciAddress: "0000:18:00.1"
totalvfs: 8
vendor: 15b3
- deviceID: 158b
driver: i40e
mtu: 1500
name: ens817f0
pciAddress: 0000:81:00.0
totalvfs: 64
vendor: "8086"
- deviceID: 158b
driver: i40e
mtu: 1500
name: ens817f1
pciAddress: 0000:81:00.1
totalvfs: 64
vendor: "8086"
- deviceID: 158b
driver: i40e
mtu: 1500
name: ens803f0
pciAddress: 0000:86:00.0
totalvfs: 64
vendor: "8086"
syncStatus: Succeeded
25.1.1.4. Example use of a virtual function in a pod
You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a pod with SR-IOV VF attached.
This example shows a pod using a virtual function (VF) in RDMA mode:
Pod
spec that uses RDMA mode
apiVersion: v1 kind: Pod metadata: name: rdma-app annotations: k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx spec: containers: - name: testpmd image: <RDMA_image> imagePullPolicy: IfNotPresent securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
name: rdma-app
annotations:
k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
containers:
- name: testpmd
image: <RDMA_image>
imagePullPolicy: IfNotPresent
securityContext:
runAsUser: 0
capabilities:
add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
command: ["sleep", "infinity"]
The following example shows a pod with a VF in DPDK mode:
Pod
spec that uses DPDK mode
apiVersion: v1 kind: Pod metadata: name: dpdk-app annotations: k8s.v1.cni.cncf.io/networks: sriov-dpdk-net spec: containers: - name: testpmd image: <DPDK_image> securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] volumeMounts: - mountPath: /dev/hugepages name: hugepage resources: limits: memory: "1Gi" cpu: "2" hugepages-1Gi: "4Gi" requests: memory: "1Gi" cpu: "2" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1
kind: Pod
metadata:
name: dpdk-app
annotations:
k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
containers:
- name: testpmd
image: <DPDK_image>
securityContext:
runAsUser: 0
capabilities:
add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
volumeMounts:
- mountPath: /dev/hugepages
name: hugepage
resources:
limits:
memory: "1Gi"
cpu: "2"
hugepages-1Gi: "4Gi"
requests:
memory: "1Gi"
cpu: "2"
hugepages-1Gi: "4Gi"
command: ["sleep", "infinity"]
volumes:
- name: hugepage
emptyDir:
medium: HugePages
25.1.1.5. DPDK library for use with container applications
An optional library, app-netutil
, provides several API methods for gathering network information about a pod from within a container running within that pod.
This library can assist with integrating SR-IOV virtual functions (VFs) in Data Plane Development Kit (DPDK) mode into the container. The library provides both a Golang API and a C API.
Currently there are three API methods implemented:
GetCPUInfo()
- This function determines which CPUs are available to the container and returns the list.
GetHugepages()
-
This function determines the amount of huge page memory requested in the
Pod
spec for each container and returns the values. GetInterfaces()
- This function determines the set of interfaces in the container and returns the list. The return value includes the interface type and type-specific data for each interface.
The repository for the library includes a sample Dockerfile to build a container image, dpdk-app-centos
. The container image can run one of the following DPDK sample applications, depending on an environment variable in the pod specification: l2fwd
, l3wd
or testpmd
. The container image provides an example of integrating the app-netutil
library into the container image itself. The library can also integrate into an init container. The init container can collect the required data and pass the data to an existing DPDK workload.
25.1.1.6. Huge pages resource injection for Downward API
When a pod specification includes a resource request or limit for huge pages, the Network Resources Injector automatically adds Downward API fields to the pod specification to provide the huge pages information to the container.
The Network Resources Injector adds a volume that is named podnetinfo
and is mounted at /etc/podnetinfo
for each container in the pod. The volume uses the Downward API and includes a file for huge pages requests and limits. The file naming convention is as follows:
-
/etc/podnetinfo/hugepages_1G_request_<container-name>
-
/etc/podnetinfo/hugepages_1G_limit_<container-name>
-
/etc/podnetinfo/hugepages_2M_request_<container-name>
-
/etc/podnetinfo/hugepages_2M_limit_<container-name>
The paths specified in the previous list are compatible with the app-netutil
library. By default, the library is configured to search for resource information in the /etc/podnetinfo
directory. If you choose to specify the Downward API path items yourself manually, the app-netutil
library searches for the following paths in addition to the paths in the previous list.
-
/etc/podnetinfo/hugepages_request
-
/etc/podnetinfo/hugepages_limit
-
/etc/podnetinfo/hugepages_1G_request
-
/etc/podnetinfo/hugepages_1G_limit
-
/etc/podnetinfo/hugepages_2M_request
-
/etc/podnetinfo/hugepages_2M_limit
As with the paths that the Network Resources Injector can create, the paths in the preceding list can optionally end with a _<container-name>
suffix.
25.1.3. Next steps
25.2. Installing the SR-IOV Network Operator
You can install the Single Root I/O Virtualization (SR-IOV) Network Operator on your cluster to manage SR-IOV network devices and network attachments.
25.2.1. Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Single Root I/O Virtualization (SR-IOV) Network Operator by using the OpenShift Container Platform CLI or the web console.
25.2.1.1. CLI: Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
-
Install the OpenShift CLI (
oc
). -
An account with
cluster-admin
privileges.
Procedure
To create the
openshift-sriov-network-operator
namespace, enter the following command:cat << EOF| oc create -f - apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator annotations: workload.openshift.io/allowed: management EOF
$ cat << EOF| oc create -f - apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator annotations: workload.openshift.io/allowed: management EOF
Copy to Clipboard Copied! To create an OperatorGroup CR, enter the following command:
cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator spec: targetNamespaces: - openshift-sriov-network-operator EOF
$ cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator spec: targetNamespaces: - openshift-sriov-network-operator EOF
Copy to Clipboard Copied! To create a Subscription CR for the SR-IOV Network Operator, enter the following command:
cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subscription namespace: openshift-sriov-network-operator spec: channel: stable name: sriov-network-operator source: redhat-operators sourceNamespace: openshift-marketplace EOF
$ cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subscription namespace: openshift-sriov-network-operator spec: channel: stable name: sriov-network-operator source: redhat-operators sourceNamespace: openshift-marketplace EOF
Copy to Clipboard Copied! To verify that the Operator is installed, enter the following command:
oc get csv -n openshift-sriov-network-operator \ -o custom-columns=Name:.metadata.name,Phase:.status.phase
$ oc get csv -n openshift-sriov-network-operator \ -o custom-columns=Name:.metadata.name,Phase:.status.phase
Copy to Clipboard Copied! Example output
Name Phase sriov-network-operator.4.15.0-202310121402 Succeeded
Name Phase sriov-network-operator.4.15.0-202310121402 Succeeded
Copy to Clipboard Copied!
25.2.1.2. Web console: Installing the SR-IOV Network Operator
As a cluster administrator, you can install the Operator using the web console.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports SR-IOV.
-
Install the OpenShift CLI (
oc
). -
An account with
cluster-admin
privileges.
Procedure
Install the SR-IOV Network Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Select SR-IOV Network Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
- Click Install.
Verify that the SR-IOV Network Operator is installed successfully:
- Navigate to the Operators → Installed Operators page.
Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.
NoteDuring installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, to troubleshoot further:
- Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
-
Navigate to the Workloads → Pods page and check the logs for pods in the
openshift-sriov-network-operator
project. Check the namespace of the YAML file. If the annotation is missing, you can add the annotation
workload.openshift.io/allowed=management
to the Operator namespace with the following command:oc annotate ns/openshift-sriov-network-operator workload.openshift.io/allowed=management
$ oc annotate ns/openshift-sriov-network-operator workload.openshift.io/allowed=management
Copy to Clipboard Copied! NoteFor single-node OpenShift clusters, the annotation
workload.openshift.io/allowed=management
is required for the namespace.
25.2.2. Next steps
- Optional: Configuring the SR-IOV Network Operator
25.3. Configuring the SR-IOV Network Operator
The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
25.3.1. Configuring the SR-IOV Network Operator
Modifying the SR-IOV Network Operator configuration is not normally necessary. The default configuration is recommended for most use cases. Complete the steps to modify the relevant configuration only if the default behavior of the Operator is not compatible with your use case.
The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io
CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource (CR) named default
in the openshift-sriov-network-operator
namespace.
The default
CR contains the SR-IOV Network Operator configuration for your cluster. To change the Operator configuration, you must modify this CR.
25.3.1.1. SR-IOV Network Operator config custom resource
The fields for the sriovoperatorconfig
custom resource are described in the following table:
Field | Type | Description |
---|---|---|
|
|
Specifies the name of the SR-IOV Network Operator instance. The default value is |
|
|
Specifies the namespace of the SR-IOV Network Operator instance. The default value is |
|
| Specifies the node selection to control scheduling the SR-IOV Network Config Daemon on selected nodes. By default, this field is not set and the Operator deploys the SR-IOV Network Config daemon set on worker nodes. |
|
|
Specifies whether to disable the node draining process or enable the node draining process when you apply a new policy to configure the NIC on a node. Setting this field to
For single-node clusters, set this field to |
|
|
Specifies whether to enable or disable the Network Resources Injector daemon set. By default, this field is set to |
|
| Specifies whether to enable or disable the Operator Admission Controller webhook daemon set. |
|
|
Specifies the log verbosity level of the Operator. By default, this field is set to |
25.3.1.2. About the Network Resources Injector
The Network Resources Injector is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:
- Mutation of resource requests and limits in a pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
-
Mutation of a pod specification with a Downward API volume to expose pod annotations, labels, and huge pages requests and limits. Containers that run in the pod can access the exposed information as files under the
/etc/podnetinfo
path.
By default, the Network Resources Injector is enabled by the SR-IOV Network Operator and runs as a daemon set on all control plane nodes. The following is an example of Network Resources Injector pods running in a cluster with three control plane nodes:
oc get pods -n openshift-sriov-network-operator
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME READY STATUS RESTARTS AGE network-resources-injector-5cz5p 1/1 Running 0 10m network-resources-injector-dwqpx 1/1 Running 0 10m network-resources-injector-lktz5 1/1 Running 0 10m
NAME READY STATUS RESTARTS AGE
network-resources-injector-5cz5p 1/1 Running 0 10m
network-resources-injector-dwqpx 1/1 Running 0 10m
network-resources-injector-lktz5 1/1 Running 0 10m
25.3.1.3. About the SR-IOV Network Operator admission controller webhook
The SR-IOV Network Operator Admission Controller webhook is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:
-
Validation of the
SriovNetworkNodePolicy
CR when it is created or updated. -
Mutation of the
SriovNetworkNodePolicy
CR by setting the default value for thepriority
anddeviceType
fields when the CR is created or updated.
By default the SR-IOV Network Operator Admission Controller webhook is enabled by the Operator and runs as a daemon set on all control plane nodes.
Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices. For information about configuring unsupported devices, see Configuring the SR-IOV Network Operator to use an unsupported NIC.
The following is an example of the Operator Admission Controller webhook pods running in a cluster with three control plane nodes:
oc get pods -n openshift-sriov-network-operator
$ oc get pods -n openshift-sriov-network-operator
Example output
NAME READY STATUS RESTARTS AGE operator-webhook-9jkw6 1/1 Running 0 16m operator-webhook-kbr5p 1/1 Running 0 16m operator-webhook-rpfrl 1/1 Running 0 16m
NAME READY STATUS RESTARTS AGE
operator-webhook-9jkw6 1/1 Running 0 16m
operator-webhook-kbr5p 1/1 Running 0 16m
operator-webhook-rpfrl 1/1 Running 0 16m
25.3.1.4. About custom node selectors
The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker
nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.
25.3.1.5. Disabling or enabling the Network Resources Injector
To disable or enable the Network Resources Injector, which is enabled by default, complete the following procedure.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - You must have installed the SR-IOV Network Operator.
Procedure
Set the
enableInjector
field. Replace<value>
withfalse
to disable the feature ortrue
to enable the feature.oc patch sriovoperatorconfig default \ --type=merge -n openshift-sriov-network-operator \ --patch '{ "spec": { "enableInjector": <value> } }'
$ oc patch sriovoperatorconfig default \ --type=merge -n openshift-sriov-network-operator \ --patch '{ "spec": { "enableInjector": <value> } }'
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: enableInjector: <value>
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: enableInjector: <value>
Copy to Clipboard Copied!
25.3.1.6. Disabling or enabling the SR-IOV Network Operator admission controller webhook
To disable or enable the admission controller webhook, which is enabled by default, complete the following procedure.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - You must have installed the SR-IOV Network Operator.
Procedure
Set the
enableOperatorWebhook
field. Replace<value>
withfalse
to disable the feature ortrue
to enable it:oc patch sriovoperatorconfig default --type=merge \ -n openshift-sriov-network-operator \ --patch '{ "spec": { "enableOperatorWebhook": <value> } }'
$ oc patch sriovoperatorconfig default --type=merge \ -n openshift-sriov-network-operator \ --patch '{ "spec": { "enableOperatorWebhook": <value> } }'
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: enableOperatorWebhook: <value>
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: enableOperatorWebhook: <value>
Copy to Clipboard Copied!
25.3.1.7. Configuring a custom NodeSelector for the SR-IOV Network Config daemon
The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker
nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.
To specify the nodes where the SR-IOV Network Config daemon is deployed, complete the following procedure.
When you update the configDaemonNodeSelector
field, the SR-IOV Network Config daemon is recreated on each selected node. While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.
Procedure
To update the node selector for the operator, enter the following command:
oc patch sriovoperatorconfig default --type=json \ -n openshift-sriov-network-operator \ --patch '[{
$ oc patch sriovoperatorconfig default --type=json \ -n openshift-sriov-network-operator \ --patch '[{ "op": "replace", "path": "/spec/configDaemonNodeSelector", "value": {<node_label>} }]'
Copy to Clipboard Copied! Replace
<node_label>
with a label to apply as in the following example:"node-role.kubernetes.io/worker": ""
.TipYou can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: configDaemonNodeSelector: <node_label>
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: configDaemonNodeSelector: <node_label>
Copy to Clipboard Copied!
25.3.1.8. Configuring the SR-IOV Network Operator for single node installations
By default, the SR-IOV Network Operator drains workloads from a node before every policy change. The Operator performs this action to ensure that there no workloads using the virtual functions before the reconfiguration.
For installations on a single node, there are no other nodes to receive the workloads. As a result, the Operator must be configured not to drain the workloads from the single node.
After performing the following procedure to disable draining workloads, you must remove any workload that uses an SR-IOV network interface before you change any SR-IOV network node policy.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - You must have installed the SR-IOV Network Operator.
Procedure
To set the
disableDrain
field totrue
, enter the following command:oc patch sriovoperatorconfig default --type=merge \ -n openshift-sriov-network-operator \ --patch '{ "spec": { "disableDrain": true } }'
$ oc patch sriovoperatorconfig default --type=merge \ -n openshift-sriov-network-operator \ --patch '{ "spec": { "disableDrain": true } }'
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to update the Operator:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: disableDrain: true
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: disableDrain: true
Copy to Clipboard Copied!
25.3.1.9. Deploying the SR-IOV Operator for hosted control planes
Hosted control planes on the AWS platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.
Prerequisites
You must configure and deploy the hosted cluster on AWS. For more information, see Configuring the hosting cluster on AWS (Technology Preview).
Procedure
Create a namespace and an Operator group:
apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator spec: targetNamespaces: - openshift-sriov-network-operator
apiVersion: v1 kind: Namespace metadata: name: openshift-sriov-network-operator --- apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: sriov-network-operators namespace: openshift-sriov-network-operator spec: targetNamespaces: - openshift-sriov-network-operator
Copy to Clipboard Copied! Create a subscription to the SR-IOV Operator:
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subsription namespace: openshift-sriov-network-operator spec: channel: stable name: sriov-network-operator config: nodeSelector: node-role.kubernetes.io/worker: "" source: s/qe-app-registry/redhat-operators sourceNamespace: openshift-marketplace
apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: sriov-network-operator-subsription namespace: openshift-sriov-network-operator spec: channel: stable name: sriov-network-operator config: nodeSelector: node-role.kubernetes.io/worker: "" source: s/qe-app-registry/redhat-operators sourceNamespace: openshift-marketplace
Copy to Clipboard Copied!
Verification
To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:
oc get csv -n openshift-sriov-network-operator
$ oc get csv -n openshift-sriov-network-operator
Copy to Clipboard Copied! Example output
NAME DISPLAY VERSION REPLACES PHASE sriov-network-operator.4.15.0-202211021237 SR-IOV Network Operator 4.15.0-202211021237 sriov-network-operator.4.15.0-202210290517 Succeeded
NAME DISPLAY VERSION REPLACES PHASE sriov-network-operator.4.15.0-202211021237 SR-IOV Network Operator 4.15.0-202211021237 sriov-network-operator.4.15.0-202210290517 Succeeded
Copy to Clipboard Copied! To verify that the SR-IOV pods are deployed, run the following command:
oc get pods -n openshift-sriov-network-operator
$ oc get pods -n openshift-sriov-network-operator
Copy to Clipboard Copied!
25.3.2. Next steps
25.4. Configuring an SR-IOV network device
You can configure a Single Root I/O Virtualization (SR-IOV) device in your cluster.
25.4.1. SR-IOV network node configuration object
You specify the SR-IOV network device configuration for a node by creating an SR-IOV network node policy. The API object for the policy is part of the sriovnetwork.openshift.io
API group.
The following YAML describes an SR-IOV network node policy:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: <name> namespace: openshift-sriov-network-operator spec: resourceName: <sriov_resource_name> nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> mtu: <mtu> needVhostNet: false numVfs: <num> externallyManaged: false nicSelector: vendor: "<vendor_code>" deviceID: "<device_id>" pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", ...] netFilter: "<filter_string>" deviceType: <device_type> isRdma: false linkType: <link_type> eSwitchMode: "switchdev" excludeTopology: false
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: <name>
namespace: openshift-sriov-network-operator
spec:
resourceName: <sriov_resource_name>
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
priority: <priority>
mtu: <mtu>
needVhostNet: false
numVfs: <num>
externallyManaged: false
nicSelector:
vendor: "<vendor_code>"
deviceID: "<device_id>"
pfNames: ["<pf_name>", ...]
rootDevices: ["<pci_bus_id>", ...]
netFilter: "<filter_string>"
deviceType: <device_type>
isRdma: false
linkType: <link_type>
eSwitchMode: "switchdev"
excludeTopology: false
- 1
- The name for the custom resource object.
- 2
- The namespace where the SR-IOV Network Operator is installed.
- 3
- The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
When specifying a name, be sure to use the accepted syntax expression
^[a-zA-Z0-9_]+$
in theresourceName
. - 4
- The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.Important
The SR-IOV Network Operator applies node network configuration policies to nodes in sequence. Before applying node network configuration policies, the SR-IOV Network Operator checks if the machine config pool (MCP) for a node is in an unhealthy state such as
Degraded
orUpdating
. If a node is in an unhealthy MCP, the process of applying node network configuration policies to all targeted nodes in the cluster pauses until the MCP returns to a healthy state.To avoid a node in an unhealthy MCP from blocking the application of node network configuration policies to other nodes, including nodes in other MCPs, you must create a separate node network configuration policy for each MCP.
- 5
- Optional: The priority is an integer value between
0
and99
. A smaller value receives higher priority. For example, a priority of10
is a higher priority than99
. The default value is99
. - 6
- Optional: The maximum transmission unit (MTU) of the physical function and all its virtual functions. The maximum MTU value can vary for different network interface controller (NIC) models.Important
If you want to create virtual function on the default network interface, ensure that the MTU is set to a value that matches the cluster MTU.
If you want to modify the MTU of a single virtual function while the function is assigned to a pod, leave the MTU value blank in the SR-IOV network node policy. Otherwise, the SR-IOV Network Operator reverts the MTU of the virtual function to the MTU value defined in the SR-IOV network node policy, which might trigger a node drain.
- 7
- Optional: Set
needVhostNet
totrue
to mount the/dev/vhost-net
device in the pod. Use the mounted/dev/vhost-net
device with Data Plane Development Kit (DPDK) to forward traffic to the kernel network stack. - 8
- The number of the virtual functions (VF) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than
127
. - 9
- The
externallyManaged
field indicates whether the SR-IOV Network Operator manages all, or only a subset of virtual functions (VFs). With the value set tofalse
the SR-IOV Network Operator manages and configures all VFs on the PF.NoteWhen
externallyManaged
is set totrue
, you must manually create the Virtual Functions (VFs) on the physical function (PF) before applying theSriovNetworkNodePolicy
resource. If the VFs are not pre-created, the SR-IOV Network Operator’s webhook will block the policy request.When
externallyManaged
is set tofalse
, the SR-IOV Network Operator automatically creates and manages the VFs, including resetting them if necessary.To use VFs on the host system, you must create them through NMState, and set
externallyManaged
totrue
. In this mode, the SR-IOV Network Operator does not modify the PF or the manually managed VFs, except for those explicitly defined in thenicSelector
field of your policy. However, the SR-IOV Network Operator continues to manage VFs that are used as pod secondary interfaces. - 10
- The NIC selector identifies the device to which this resource applies. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.
If you specify
rootDevices
, you must also specify a value forvendor
,deviceID
, orpfNames
. If you specify bothpfNames
androotDevices
at the same time, ensure that they refer to the same device. If you specify a value fornetFilter
, then you do not need to specify any other parameter because a network ID is unique. - 11
- Optional: The vendor hexadecimal vendor identifier of the SR-IOV network device. The only allowed values are
8086
(Intel) and15b3
(Mellanox). - 12
- Optional: The device hexadecimal device identifier of the SR-IOV network device. For example,
101b
is the device ID for a Mellanox ConnectX-6 device. - 13
- Optional: An array of one or more physical function (PF) names the resource must apply to.
- 14
- Optional: An array of one or more PCI bus addresses the resource must apply to. For example
0000:02:00.1
. - 15
- Optional: The platform-specific network filter. The only supported platform is Red Hat OpenStack Platform (RHOSP). Acceptable values use the following format:
openstack/NetworkID:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
. Replacexxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
with the value from the/var/config/openstack/latest/network_data.json
metadata file. This filter ensures that VFs are associated with a specific OpenStack network. The operator uses this filter to map the VFs to the appropriate network based on metadata provided by the OpenStack platform. - 16
- Optional: The driver to configure for the VFs created from this resource. The only allowed values are
netdevice
andvfio-pci
. The default value isnetdevice
.For a Mellanox NIC to work in DPDK mode on bare metal nodes, use the
netdevice
driver type and setisRdma
totrue
. - 17
- Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is
false
.If the
isRdma
parameter is set totrue
, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.Set
isRdma
totrue
and additionally setneedVhostNet
totrue
to configure a Mellanox NIC for use with Fast Datapath DPDK applications.NoteYou cannot set the
isRdma
parameter totrue
for intel NICs. - 18
- Optional: The link type for the VFs. The default value is
eth
for Ethernet. Change this value to 'ib' for InfiniBand.When
linkType
is set toib
,isRdma
is automatically set totrue
by the SR-IOV Network Operator webhook. WhenlinkType
is set toib
,deviceType
should not be set tovfio-pci
.Do not set linkType to
eth
for SriovNetworkNodePolicy, because this can lead to an incorrect number of available devices reported by the device plugin. - 19
- Optional: To enable hardware offloading, you must set the
eSwitchMode
field to"switchdev"
. For more information about hardware offloading, see "Configuring hardware offloading". - 20
- Optional: To exclude advertising an SR-IOV network resource’s NUMA node to the Topology Manager, set the value to
true
. The default value isfalse
.
25.4.1.1. SR-IOV network node configuration examples
The following example describes the configuration for an InfiniBand device:
Example configuration for an InfiniBand device
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-ib-net-1 namespace: openshift-sriov-network-operator spec: resourceName: ibnic1 nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 4 nicSelector: vendor: "15b3" deviceID: "101b" rootDevices: - "0000:19:00.0" linkType: ib isRdma: true
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-ib-net-1
namespace: openshift-sriov-network-operator
spec:
resourceName: ibnic1
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 4
nicSelector:
vendor: "15b3"
deviceID: "101b"
rootDevices:
- "0000:19:00.0"
linkType: ib
isRdma: true
The following example describes the configuration for an SR-IOV network device in a RHOSP virtual machine:
Example configuration for an SR-IOV device in a virtual machine
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-sriov-net-openstack-1 namespace: openshift-sriov-network-operator spec: resourceName: sriovnic1 nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 1 nicSelector: vendor: "15b3" deviceID: "101b" netFilter: "openstack/NetworkID:ea24bd04-8674-4f69-b0ee-fa0b3bd20509"
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-sriov-net-openstack-1
namespace: openshift-sriov-network-operator
spec:
resourceName: sriovnic1
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 1
nicSelector:
vendor: "15b3"
deviceID: "101b"
netFilter: "openstack/NetworkID:ea24bd04-8674-4f69-b0ee-fa0b3bd20509"
25.4.1.2. Virtual function (VF) partitioning for SR-IOV devices
In some cases, you might want to split virtual functions (VFs) from the same physical function (PF) into multiple resource pools. For example, you might want some of the VFs to load with the default driver and the remaining VFs load with the vfio-pci
driver. In such a deployment, the pfNames
selector in your SriovNetworkNodePolicy custom resource (CR) can be used to specify a range of VFs for a pool using the following format: <pfname>#<first_vf>-<last_vf>
.
For example, the following YAML shows the selector for an interface named netpf0
with VF 2
through 7
:
pfNames: ["netpf0#2-7"]
pfNames: ["netpf0#2-7"]
-
netpf0
is the PF interface name. -
2
is the first VF index (0-based) that is included in the range. -
7
is the last VF index (0-based) that is included in the range.
You can select VFs from the same PF by using different policy CRs if the following requirements are met:
-
The
numVfs
value must be identical for policies that select the same PF. -
The VF index must be in the range of
0
to<numVfs>-1
. For example, if you have a policy withnumVfs
set to8
, then the<first_vf>
value must not be smaller than0
, and the<last_vf>
must not be larger than7
. - The VFs ranges in different policies must not overlap.
-
The
<first_vf>
must not be larger than the<last_vf>
.
The following example illustrates NIC partitioning for an SR-IOV device.
The policy policy-net-1
defines a resource pool net-1
that contains the VF 0
of PF netpf0
with the default VF driver. The policy policy-net-1-dpdk
defines a resource pool net-1-dpdk
that contains the VF 8
to 15
of PF netpf0
with the vfio
VF driver.
Policy policy-net-1
:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-net-1 namespace: openshift-sriov-network-operator spec: resourceName: net1 nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 16 nicSelector: pfNames: ["netpf0#0-0"] deviceType: netdevice
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-net-1
namespace: openshift-sriov-network-operator
spec:
resourceName: net1
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 16
nicSelector:
pfNames: ["netpf0#0-0"]
deviceType: netdevice
Policy policy-net-1-dpdk
:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-net-1-dpdk namespace: openshift-sriov-network-operator spec: resourceName: net1dpdk nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 16 nicSelector: pfNames: ["netpf0#8-15"] deviceType: vfio-pci
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: policy-net-1-dpdk
namespace: openshift-sriov-network-operator
spec:
resourceName: net1dpdk
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: "true"
numVfs: 16
nicSelector:
pfNames: ["netpf0#8-15"]
deviceType: vfio-pci
Verifying that the interface is successfully partitioned
Confirm that the interface partitioned to virtual functions (VFs) for the SR-IOV device by running the following command.
ip link show <interface>
$ ip link show <interface>
- 1
- Replace
<interface>
with the interface that you specified when partitioning to VFs for the SR-IOV device, for example,ens3f1
.
Example output
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000 link/ether 3c:fd:fe:d1:bc:01 brd ff:ff:ff:ff:ff:ff vf 0 link/ether 5a:e7:88:25:ea:a0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off vf 1 link/ether 3e:1d:36:d7:3d:49 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off vf 2 link/ether ce:09:56:97:df:f9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off vf 3 link/ether 5e:91:cf:88:d1:38 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off vf 4 link/ether e6:06:a1:96:2f:de brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:d1:bc:01 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 5a:e7:88:25:ea:a0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 1 link/ether 3e:1d:36:d7:3d:49 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 2 link/ether ce:09:56:97:df:f9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 3 link/ether 5e:91:cf:88:d1:38 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 4 link/ether e6:06:a1:96:2f:de brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
25.4.2. Configuring SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io
CustomResourceDefinition to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).
When applying the configuration specified in a SriovNetworkNodePolicy
object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.
It might take several minutes for a configuration change to apply.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role. - You have installed the SR-IOV Network Operator.
- You have enough available nodes in your cluster to handle the evicted workload from drained nodes.
- You have not selected any control plane nodes for SR-IOV network device configuration.
Procedure
-
Create an
SriovNetworkNodePolicy
object, and then save the YAML in the<name>-sriov-node-network.yaml
file. Replace<name>
with the name for this configuration. -
Optional: Label the SR-IOV capable cluster nodes with
SriovNetworkNodePolicy.Spec.NodeSelector
if they are not already labeled. For more information about labeling nodes, see "Understanding how to update labels on nodes". Create the
SriovNetworkNodePolicy
object:oc create -f <name>-sriov-node-network.yaml
$ oc create -f <name>-sriov-node-network.yaml
Copy to Clipboard Copied! where
<name>
specifies the name for this configuration.After applying the configuration update, all the pods in
sriov-network-operator
namespace transition to theRunning
status.To verify that the SR-IOV network device is configured, enter the following command. Replace
<node_name>
with the name of a node with the SR-IOV network device that you just configured.oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Copy to Clipboard Copied!
25.4.2.1. Configuring parallel node draining during SR-IOV network policy updates
By default, the SR-IOV Network Operator drains workloads from a node before every policy change. The Operator completes this action, one node at a time, to ensure that no workloads are affected by the reconfiguration.
In large clusters, draining nodes sequentially can be time-consuming, taking hours or even days. In time-sensitive environments, you can enable parallel node draining in an SriovNetworkPoolConfig
custom resource (CR) for faster rollouts of SR-IOV network configurations.
To configure parallel draining, use the SriovNetworkPoolConfig
CR to create a node pool. You can then add nodes to the pool and define the maximum number of nodes in the pool that the Operator can drain in parallel. With this approach, you can enable parallel draining for faster reconfiguration while ensuring you still have enough nodes remaining in the pool to handle any running workloads.
A node can belong to only one SR-IOV network pool configuration. If a node is not part of a pool, it is added to a virtual, default pool that is configured to drain one node at a time only.
The node might restart during the draining process.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the SR-IOV Network Operator.
- Ensure that nodes have hardware that supports SR-IOV.
Procedure
Create a
SriovNetworkPoolConfig
resource:Create a YAML file that defines the
SriovNetworkPoolConfig
resource:Example
sriov-nw-pool.yaml
fileapiVersion: v1 kind: SriovNetworkPoolConfig metadata: name: pool-1 namespace: openshift-sriov-network-operator spec: maxUnavailable: 2 nodeSelector: matchLabels: node-role.kubernetes.io/worker: ""
apiVersion: v1 kind: SriovNetworkPoolConfig metadata: name: pool-1
1 namespace: openshift-sriov-network-operator
2 spec: maxUnavailable: 2
3 nodeSelector:
4 matchLabels: node-role.kubernetes.io/worker: ""
Copy to Clipboard Copied! - 1
- Specify the name of the
SriovNetworkPoolConfig
object. - 2
- Specify namespace where the SR-IOV Network Operator is installed.
- 3
- Specify an integer number or percentage value for nodes that can be unavailable in the pool during an update. For example, if you have 10 nodes and you set the maximum unavailable value to 2, then only 2 nodes can be drained in parallel at any time, leaving 8 nodes for handling workloads.
- 4
- Specify the nodes to add the pool by using the node selector. This example adds all nodes with the
worker
role to the pool.
Create the
SriovNetworkPoolConfig
resource by running the following command:oc create -f sriov-nw-pool.yaml
$ oc create -f sriov-nw-pool.yaml
Copy to Clipboard Copied!
Create the
sriov-test
namespace by running the following comand:oc create namespace sriov-test
$ oc create namespace sriov-test
Copy to Clipboard Copied! Create a
SriovNetworkNodePolicy
resource:Create a YAML file that defines the
SriovNetworkNodePolicy
resource:Example
sriov-node-policy.yaml
fileapiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-nic-1 namespace: openshift-sriov-network-operator spec: deviceType: netdevice nicSelector: pfNames: ["ens1"] nodeSelector: node-role.kubernetes.io/worker: "" numVfs: 5 priority: 99 resourceName: sriov_nic_1
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-nic-1 namespace: openshift-sriov-network-operator spec: deviceType: netdevice nicSelector: pfNames: ["ens1"] nodeSelector: node-role.kubernetes.io/worker: "" numVfs: 5 priority: 99 resourceName: sriov_nic_1
Copy to Clipboard Copied! Create the
SriovNetworkNodePolicy
resource by running the following command:oc create -f sriov-node-policy.yaml
$ oc create -f sriov-node-policy.yaml
Copy to Clipboard Copied!
Create a
SriovNetwork
resource:Create a YAML file that defines the
SriovNetwork
resource:Example
sriov-network.yaml
fileapiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-nic-1 namespace: openshift-sriov-network-operator spec: linkState: auto networkNamespace: sriov-test resourceName: sriov_nic_1 capabilities: '{ "mac": true, "ips": true }' ipam: '{ "type": "static" }'
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-nic-1 namespace: openshift-sriov-network-operator spec: linkState: auto networkNamespace: sriov-test resourceName: sriov_nic_1 capabilities: '{ "mac": true, "ips": true }' ipam: '{ "type": "static" }'
Copy to Clipboard Copied! Create the
SriovNetwork
resource by running the following command:oc create -f sriov-network.yaml
$ oc create -f sriov-network.yaml
Copy to Clipboard Copied!
Verification
View the node pool you created by running the following command:
oc get sriovNetworkpoolConfig -n openshift-sriov-network-operator
$ oc get sriovNetworkpoolConfig -n openshift-sriov-network-operator
Copy to Clipboard Copied! Example output
NAME AGE pool-1 67s
NAME AGE pool-1 67s
1 Copy to Clipboard Copied! - 1
- In this example,
pool-1
contains all the nodes with theworker
role.
To demonstrate the node draining process by using the example scenario from the previous procedure, complete the following steps:
Update the number of virtual functions in the
SriovNetworkNodePolicy
resource to trigger workload draining in the cluster:oc patch SriovNetworkNodePolicy sriov-nic-1 -n openshift-sriov-network-operator --type merge -p '{"spec": {"numVfs": 4}}'
$ oc patch SriovNetworkNodePolicy sriov-nic-1 -n openshift-sriov-network-operator --type merge -p '{"spec": {"numVfs": 4}}'
Copy to Clipboard Copied! Monitor the draining status on the target cluster by running the following command:
oc get sriovNetworkNodeState -n openshift-sriov-network-operator
$ oc get sriovNetworkNodeState -n openshift-sriov-network-operator
Copy to Clipboard Copied! Example output
NAMESPACE NAME SYNC STATUS DESIRED SYNC STATE CURRENT SYNC STATE AGE openshift-sriov-network-operator worker-0 InProgress Drain_Required DrainComplete 3d10h openshift-sriov-network-operator worker-1 InProgress Drain_Required DrainComplete 3d10h
NAMESPACE NAME SYNC STATUS DESIRED SYNC STATE CURRENT SYNC STATE AGE openshift-sriov-network-operator worker-0 InProgress Drain_Required DrainComplete 3d10h openshift-sriov-network-operator worker-1 InProgress Drain_Required DrainComplete 3d10h
Copy to Clipboard Copied! When the draining process is complete, the
SYNC STATUS
changes toSucceeded
, and theDESIRED SYNC STATE
andCURRENT SYNC STATE
values return toIDLE
.Example output
NAMESPACE NAME SYNC STATUS DESIRED SYNC STATE CURRENT SYNC STATE AGE openshift-sriov-network-operator worker-0 Succeeded Idle Idle 3d10h openshift-sriov-network-operator worker-1 Succeeded Idle Idle 3d10h
NAMESPACE NAME SYNC STATUS DESIRED SYNC STATE CURRENT SYNC STATE AGE openshift-sriov-network-operator worker-0 Succeeded Idle Idle 3d10h openshift-sriov-network-operator worker-1 Succeeded Idle Idle 3d10h
Copy to Clipboard Copied!
25.4.3. Troubleshooting SR-IOV configuration
After following the procedure to configure an SR-IOV network device, the following sections address some error conditions.
To display the state of nodes, run the following command:
oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name>
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name>
where: <node_name>
specifies the name of a node with an SR-IOV network device.
Error output: Cannot allocate memory
"lastSyncError": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"
"lastSyncError": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"
When a node indicates that it cannot allocate memory, check the following items:
- Confirm that global SR-IOV settings are enabled in the BIOS for the node.
- Confirm that VT-d is enabled in the BIOS for the node.
25.4.4. Assigning an SR-IOV network to a VRF
As a cluster administrator, you can assign an SR-IOV network interface to your VRF domain by using the CNI VRF plugin.
To do this, add the VRF configuration to the optional metaPlugins
parameter of the SriovNetwork
resource.
Applications that use VRFs need to bind to a specific device. The common usage is to use the SO_BINDTODEVICE
option for a socket. SO_BINDTODEVICE
binds the socket to a device that is specified in the passed interface name, for example, eth1
. To use SO_BINDTODEVICE
, the application must have CAP_NET_RAW
capabilities.
Using a VRF through the ip vrf exec
command is not supported in OpenShift Container Platform pods. To use VRF, bind applications directly to the VRF interface.
25.4.4.1. Creating an additional SR-IOV network attachment with the CNI VRF plugin
The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition
custom resource (CR) automatically.
Do not edit NetworkAttachmentDefinition
custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.
To create an additional SR-IOV network attachment with the CNI VRF plugin, perform the following procedure.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.
Procedure
Create the
SriovNetwork
custom resource (CR) for the additional SR-IOV network attachment and insert themetaPlugins
configuration, as in the following example CR. Save the YAML as the filesriov-network-attachment.yaml
.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: example-network namespace: additional-sriov-network-1 spec: ipam: | { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.56.217.1" } vlan: 0 resourceName: intelnics metaPlugins : | { "type": "vrf", "vrfname": "example-vrf-name" }
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: example-network namespace: additional-sriov-network-1 spec: ipam: | { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.56.217.1" } vlan: 0 resourceName: intelnics metaPlugins : | { "type": "vrf",
1 "vrfname": "example-vrf-name"
2 }
Copy to Clipboard Copied! Create the
SriovNetwork
resource:oc create -f sriov-network-attachment.yaml
$ oc create -f sriov-network-attachment.yaml
Copy to Clipboard Copied!
Verifying that the NetworkAttachmentDefinition
CR is successfully created
Confirm that the SR-IOV Network Operator created the
NetworkAttachmentDefinition
CR by running the following command.oc get network-attachment-definitions -n <namespace>
$ oc get network-attachment-definitions -n <namespace>
1 Copy to Clipboard Copied! - 1
- Replace
<namespace>
with the namespace that you specified when configuring the network attachment, for example,additional-sriov-network-1
.
Example output
NAME AGE additional-sriov-network-1 14m
NAME AGE additional-sriov-network-1 14m
Copy to Clipboard Copied! NoteThere might be a delay before the SR-IOV Network Operator creates the CR.
Verifying that the additional SR-IOV network attachment is successful
To verify that the VRF CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
- Create an SR-IOV network that uses the VRF CNI.
- Assign the network to a pod.
Verify that the pod network attachment is connected to the SR-IOV additional network. Remote shell into the pod and run the following command:
ip vrf show
$ ip vrf show
Copy to Clipboard Copied! Example output
Name Table ----------------------- red 10
Name Table ----------------------- red 10
Copy to Clipboard Copied! Confirm the VRF interface is master of the secondary interface:
ip link
$ ip link
Copy to Clipboard Copied! Example output
... 5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode ...
... 5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode ...
Copy to Clipboard Copied!
25.4.5. Exclude the SR-IOV network topology for NUMA-aware scheduling
You can exclude advertising the Non-Uniform Memory Access (NUMA) node for the SR-IOV network to the Topology Manager for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.
In some scenarios, it is a priority to maximize CPU and memory resources for a pod on a single NUMA node. By not providing a hint to the Topology Manager about the NUMA node for the pod’s SR-IOV network resource, the Topology Manager can deploy the SR-IOV network resource and the pod CPU and memory resources to different NUMA nodes. This can add to network latency because of the data transfer between NUMA nodes. However, it is acceptable in scenarios when workloads require optimal CPU and memory performance.
For example, consider a compute node, compute-1
, that features two NUMA nodes: numa0
and numa1
. The SR-IOV-enabled NIC is present on numa0
. The CPUs available for pod scheduling are present on numa1
only. By setting the excludeTopology
specification to true
, the Topology Manager can assign CPU and memory resources for the pod to numa1
and can assign the SR-IOV network resource for the same pod to numa0
. This is only possible when you set the excludeTopology
specification to true
. Otherwise, the Topology Manager attempts to place all resources on the same NUMA node.
25.4.5.1. Excluding the SR-IOV network topology for NUMA-aware scheduling
To exclude advertising the SR-IOV network resource’s Non-Uniform Memory Access (NUMA) node to the Topology Manager, you can configure the excludeTopology
specification in the SriovNetworkNodePolicy
custom resource. Use this configuration for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have configured the CPU Manager policy to
static
. For more information about CPU Manager, see the Additional resources section. -
You have configured the Topology Manager policy to
single-numa-node
. - You have installed the SR-IOV Network Operator.
Procedure
Create the
SriovNetworkNodePolicy
CR:Save the following YAML in the
sriov-network-node-policy.yaml
file, replacing values in the YAML to match your environment:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: <policy_name> namespace: openshift-sriov-network-operator spec: resourceName: sriovnuma0 nodeSelector: kubernetes.io/hostname: <node_name> numVfs: <number_of_Vfs> nicSelector: vendor: "<vendor_ID>" deviceID: "<device_ID>" deviceType: netdevice excludeTopology: true
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: <policy_name> namespace: openshift-sriov-network-operator spec: resourceName: sriovnuma0
1 nodeSelector: kubernetes.io/hostname: <node_name> numVfs: <number_of_Vfs> nicSelector:
2 vendor: "<vendor_ID>" deviceID: "<device_ID>" deviceType: netdevice excludeTopology: true
3 Copy to Clipboard Copied! - 1
- The resource name of the SR-IOV network device plugin. This YAML uses a sample
resourceName
value. - 2
- Identify the device for the Operator to configure by using the NIC selector.
- 3
- To exclude advertising the NUMA node for the SR-IOV network resource to the Topology Manager, set the value to
true
. The default value isfalse
.
NoteIf multiple
SriovNetworkNodePolicy
resources target the same SR-IOV network resource, theSriovNetworkNodePolicy
resources must have the same value as theexcludeTopology
specification. Otherwise, the conflicting policy is rejected.Create the
SriovNetworkNodePolicy
resource by running the following command:oc create -f sriov-network-node-policy.yaml
$ oc create -f sriov-network-node-policy.yaml
Copy to Clipboard Copied! Example output
sriovnetworknodepolicy.sriovnetwork.openshift.io/policy-for-numa-0 created
sriovnetworknodepolicy.sriovnetwork.openshift.io/policy-for-numa-0 created
Copy to Clipboard Copied!
Create the
SriovNetwork
CR:Save the following YAML in the
sriov-network.yaml
file, replacing values in the YAML to match your environment:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-numa-0-network namespace: openshift-sriov-network-operator spec: resourceName: sriovnuma0 networkNamespace: <namespace> ipam: |- { "type": "<ipam_type>", }
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-numa-0-network
1 namespace: openshift-sriov-network-operator spec: resourceName: sriovnuma0
2 networkNamespace: <namespace>
3 ipam: |-
4 { "type": "<ipam_type>", }
Copy to Clipboard Copied! - 1
- Replace
sriov-numa-0-network
with the name for the SR-IOV network resource. - 2
- Specify the resource name for the
SriovNetworkNodePolicy
CR from the previous step. This YAML uses a sampleresourceName
value. - 3
- Enter the namespace for your SR-IOV network resource.
- 4
- Enter the IP address management configuration for the SR-IOV network.
Create the
SriovNetwork
resource by running the following command:oc create -f sriov-network.yaml
$ oc create -f sriov-network.yaml
Copy to Clipboard Copied! Example output
sriovnetwork.sriovnetwork.openshift.io/sriov-numa-0-network created
sriovnetwork.sriovnetwork.openshift.io/sriov-numa-0-network created
Copy to Clipboard Copied!
Create a pod and assign the SR-IOV network resource from the previous step:
Save the following YAML in the
sriov-network-pod.yaml
file, replacing values in the YAML to match your environment:apiVersion: v1 kind: Pod metadata: name: <pod_name> annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "sriov-numa-0-network", } ] spec: containers: - name: <container_name> image: <image> imagePullPolicy: IfNotPresent command: ["sleep", "infinity"]
apiVersion: v1 kind: Pod metadata: name: <pod_name> annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "sriov-numa-0-network",
1 } ] spec: containers: - name: <container_name> image: <image> imagePullPolicy: IfNotPresent command: ["sleep", "infinity"]
Copy to Clipboard Copied! - 1
- This is the name of the
SriovNetwork
resource that uses theSriovNetworkNodePolicy
resource.
Create the
Pod
resource by running the following command:oc create -f sriov-network-pod.yaml
$ oc create -f sriov-network-pod.yaml
Copy to Clipboard Copied! Example output
pod/example-pod created
pod/example-pod created
Copy to Clipboard Copied!
Verification
Verify the status of the pod by running the following command, replacing
<pod_name>
with the name of the pod:oc get pod <pod_name>
$ oc get pod <pod_name>
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE test-deployment-sriov-76cbbf4756-k9v72 1/1 Running 0 45h
NAME READY STATUS RESTARTS AGE test-deployment-sriov-76cbbf4756-k9v72 1/1 Running 0 45h
Copy to Clipboard Copied! Open a debug session with the target pod to verify that the SR-IOV network resources are deployed to a different node than the memory and CPU resources.
Open a debug session with the pod by running the following command, replacing <pod_name> with the target pod name.
oc debug pod/<pod_name>
$ oc debug pod/<pod_name>
Copy to Clipboard Copied! Set
/host
as the root directory within the debug shell. The debug pod mounts the root file system from the host in/host
within the pod. By changing the root directory to/host
, you can run binaries from the host file system:chroot /host
$ chroot /host
Copy to Clipboard Copied! View information about the CPU allocation by running the following commands:
lscpu | grep NUMA
$ lscpu | grep NUMA
Copy to Clipboard Copied! Example output
NUMA node(s): 2 NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,... NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,...
NUMA node(s): 2 NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,... NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,...
Copy to Clipboard Copied! cat /proc/self/status | grep Cpus
$ cat /proc/self/status | grep Cpus
Copy to Clipboard Copied! Example output
Cpus_allowed: aa Cpus_allowed_list: 1,3,5,7
Cpus_allowed: aa Cpus_allowed_list: 1,3,5,7
Copy to Clipboard Copied! cat /sys/class/net/net1/device/numa_node
$ cat /sys/class/net/net1/device/numa_node
Copy to Clipboard Copied! Example output
0
0
Copy to Clipboard Copied! In this example, CPUs 1,3,5, and 7 are allocated to
NUMA node1
but the SR-IOV network resource can use the NIC inNUMA node0
.
If the excludeTopology
specification is set to True
, it is possible that the required resources exist in the same NUMA node.
25.4.6. Next steps
25.5. Configuring an SR-IOV Ethernet network attachment
You can configure an Ethernet network attachment for an Single Root I/O Virtualization (SR-IOV) device in the cluster.
25.5.1. Ethernet device configuration object
You can configure an Ethernet network device by defining an SriovNetwork
object.
The following YAML describes an SriovNetwork
object:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: <name> namespace: openshift-sriov-network-operator spec: resourceName: <sriov_resource_name> networkNamespace: <target_namespace> vlan: <vlan> spoofChk: "<spoof_check>" ipam: |- {} linkState: <link_state> maxTxRate: <max_tx_rate> minTxRate: <min_tx_rate> vlanQoS: <vlan_qos> trust: "<trust_vf>" capabilities: <capabilities>
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: <name>
namespace: openshift-sriov-network-operator
spec:
resourceName: <sriov_resource_name>
networkNamespace: <target_namespace>
vlan: <vlan>
spoofChk: "<spoof_check>"
ipam: |-
{}
linkState: <link_state>
maxTxRate: <max_tx_rate>
minTxRate: <min_tx_rate>
vlanQoS: <vlan_qos>
trust: "<trust_vf>"
capabilities: <capabilities>
- 1
- A name for the object. The SR-IOV Network Operator creates a
NetworkAttachmentDefinition
object with same name. - 2
- The namespace where the SR-IOV Network Operator is installed.
- 3
- The value for the
spec.resourceName
parameter from theSriovNetworkNodePolicy
object that defines the SR-IOV hardware for this additional network. - 4
- The target namespace for the
SriovNetwork
object. Only pods in the target namespace can attach to the additional network. - 5
- Optional: A Virtual LAN (VLAN) ID for the additional network. The integer value must be from
0
to4095
. The default value is0
. - 6
- Optional: The spoof check mode of the VF. The allowed values are the strings
"on"
and"off"
.ImportantYou must enclose the value you specify in quotes or the object is rejected by the SR-IOV Network Operator.
- 7
- A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
- 8
- Optional: The link state of virtual function (VF). Allowed value are
enable
,disable
andauto
. - 9
- Optional: A maximum transmission rate, in Mbps, for the VF.
- 10
- Optional: A minimum transmission rate, in Mbps, for the VF. This value must be less than or equal to the maximum transmission rate.Note
Intel NICs do not support the
minTxRate
parameter. For more information, see BZ#1772847. - 11
- Optional: An IEEE 802.1p priority level for the VF. The default value is
0
. - 12
- Optional: The trust mode of the VF. The allowed values are the strings
"on"
and"off"
.ImportantYou must enclose the value that you specify in quotes, or the SR-IOV Network Operator rejects the object.
- 13
- Optional: The capabilities to configure for this additional network. You can specify
'{ "ips": true }'
to enable IP address support or'{ "mac": true }'
to enable MAC address support.
25.5.1.1. Configuration of IP address assignment for an additional network
The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.
You can use the following IP address assignment types:
- Static assignment.
- Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.
- Dynamic assignment through the Whereabouts IPAM CNI plugin.
25.5.1.1.1. Static IP address assignment configuration
The following table describes the configuration for static IP address assignment:
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
|
| An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported. |
|
| An array of objects specifying routes to configure inside the pod. |
|
| Optional: An array of objects specifying the DNS configuration. |
The addresses
array requires objects with the following fields:
Field | Type | Description |
---|---|---|
|
|
An IP address and network prefix that you specify. For example, if you specify |
|
| The default gateway to route egress network traffic to. |
Field | Type | Description |
---|---|---|
|
|
The IP address range in CIDR format, such as |
|
| The gateway where network traffic is routed. |
Field | Type | Description |
---|---|---|
|
| An array of one or more IP addresses for to send DNS queries to. |
|
|
The default domain to append to a hostname. For example, if the domain is set to |
|
|
An array of domain names to append to an unqualified hostname, such as |
Static IP address assignment configuration example
{ "ipam": { "type": "static", "addresses": [ { "address": "191.168.1.7/24" } ] } }
{
"ipam": {
"type": "static",
"addresses": [
{
"address": "191.168.1.7/24"
}
]
}
}
25.5.1.1.2. Dynamic IP address (DHCP) assignment configuration
The following JSON describes the configuration for dynamic IP address address assignment with DHCP.
A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.
The SR-IOV Network Operator does not create a DHCP server deployment; The Cluster Network Operator is responsible for creating the minimal DHCP server deployment.
To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:
Example shim network attachment definition
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: - name: dhcp-shim namespace: default type: Raw rawCNIConfig: |- { "name": "dhcp-shim", "cniVersion": "0.3.1", "type": "bridge", "ipam": { "type": "dhcp" } } # ...
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
additionalNetworks:
- name: dhcp-shim
namespace: default
type: Raw
rawCNIConfig: |-
{
"name": "dhcp-shim",
"cniVersion": "0.3.1",
"type": "bridge",
"ipam": {
"type": "dhcp"
}
}
# ...
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
Dynamic IP address (DHCP) assignment configuration example
{ "ipam": { "type": "dhcp" } }
{
"ipam": {
"type": "dhcp"
}
}
25.5.1.1.3. Dynamic IP address assignment configuration with Whereabouts
The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.
The following table describes the configuration for dynamic IP address assignment with Whereabouts:
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
|
| An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses. |
|
| Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned. |
Dynamic IP address assignment configuration example that uses Whereabouts
{ "ipam": { "type": "whereabouts", "range": "192.0.2.192/27", "exclude": [ "192.0.2.192/30", "192.0.2.196/32" ] } }
{
"ipam": {
"type": "whereabouts",
"range": "192.0.2.192/27",
"exclude": [
"192.0.2.192/30",
"192.0.2.196/32"
]
}
}
25.5.1.2. Creating a configuration for assignment of dual-stack IP addresses dynamically
Dual-stack IP address assignment can be configured with the ipRanges
parameter for:
- IPv4 addresses
- IPv6 addresses
- multiple IP address assignment
Procedure
-
Set
type
towhereabouts
. Use
ipRanges
to allocate IP addresses as shown in the following example:cniVersion: operator.openshift.io/v1 kind: Network =metadata: name: cluster spec: additionalNetworks: - name: whereabouts-shim namespace: default type: Raw rawCNIConfig: |- { "name": "whereabouts-dual-stack", "cniVersion": "0.3.1, "type": "bridge", "ipam": { "type": "whereabouts", "ipRanges": [ {"range": "192.168.10.0/24"}, {"range": "2001:db8::/64"} ] } }
cniVersion: operator.openshift.io/v1 kind: Network =metadata: name: cluster spec: additionalNetworks: - name: whereabouts-shim namespace: default type: Raw rawCNIConfig: |- { "name": "whereabouts-dual-stack", "cniVersion": "0.3.1, "type": "bridge", "ipam": { "type": "whereabouts", "ipRanges": [ {"range": "192.168.10.0/24"}, {"range": "2001:db8::/64"} ] } }
Copy to Clipboard Copied! - Attach network to a pod. For more information, see "Adding a pod to an additional network".
- Verify that all IP addresses are assigned.
Run the following command to ensure the IP addresses are assigned as metadata.
$ oc exec -it mypod -- ip a
$ oc exec -it mypod -- ip a
Copy to Clipboard Copied!
25.5.2. Configuring SR-IOV additional network
You can configure an additional network that uses SR-IOV hardware by creating an SriovNetwork
object. When you create an SriovNetwork
object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition
object.
Do not modify or delete an SriovNetwork
object if it is attached to any pods in a running
state.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a
SriovNetwork
object, and then save the YAML in the<name>.yaml
file, where<name>
is a name for this additional network. The object specification might resemble the following example:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: attach1 namespace: openshift-sriov-network-operator spec: resourceName: net1 networkNamespace: project2 ipam: |- { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "gateway": "10.56.217.1" }
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: attach1 namespace: openshift-sriov-network-operator spec: resourceName: net1 networkNamespace: project2 ipam: |- { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "gateway": "10.56.217.1" }
Copy to Clipboard Copied! To create the object, enter the following command:
oc create -f <name>.yaml
$ oc create -f <name>.yaml
Copy to Clipboard Copied! where
<name>
specifies the name of the additional network.Optional: To confirm that the
NetworkAttachmentDefinition
object that is associated with theSriovNetwork
object that you created in the previous step exists, enter the following command. Replace<namespace>
with the networkNamespace you specified in theSriovNetwork
object.oc get net-attach-def -n <namespace>
$ oc get net-attach-def -n <namespace>
Copy to Clipboard Copied!
25.5.3. Next steps
25.6. Configuring an SR-IOV InfiniBand network attachment
You can configure an InfiniBand (IB) network attachment for an Single Root I/O Virtualization (SR-IOV) device in the cluster.
25.6.1. InfiniBand device configuration object
You can configure an InfiniBand (IB) network device by defining an SriovIBNetwork
object.
The following YAML describes an SriovIBNetwork
object:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovIBNetwork metadata: name: <name> namespace: openshift-sriov-network-operator spec: resourceName: <sriov_resource_name> networkNamespace: <target_namespace> ipam: |- {} linkState: <link_state> capabilities: <capabilities>
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
name: <name>
namespace: openshift-sriov-network-operator
spec:
resourceName: <sriov_resource_name>
networkNamespace: <target_namespace>
ipam: |-
{}
linkState: <link_state>
capabilities: <capabilities>
- 1
- A name for the object. The SR-IOV Network Operator creates a
NetworkAttachmentDefinition
object with same name. - 2
- The namespace where the SR-IOV Operator is installed.
- 3
- The value for the
spec.resourceName
parameter from theSriovNetworkNodePolicy
object that defines the SR-IOV hardware for this additional network. - 4
- The target namespace for the
SriovIBNetwork
object. Only pods in the target namespace can attach to the network device. - 5
- Optional: A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
- 6
- Optional: The link state of virtual function (VF). Allowed values are
enable
,disable
andauto
. - 7
- Optional: The capabilities to configure for this network. You can specify
'{ "ips": true }'
to enable IP address support or'{ "infinibandGUID": true }'
to enable IB Global Unique Identifier (GUID) support.
25.6.1.1. Configuration of IP address assignment for an additional network
The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.
You can use the following IP address assignment types:
- Static assignment.
- Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.
- Dynamic assignment through the Whereabouts IPAM CNI plugin.
25.6.1.1.1. Static IP address assignment configuration
The following table describes the configuration for static IP address assignment:
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
|
| An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported. |
|
| An array of objects specifying routes to configure inside the pod. |
|
| Optional: An array of objects specifying the DNS configuration. |
The addresses
array requires objects with the following fields:
Field | Type | Description |
---|---|---|
|
|
An IP address and network prefix that you specify. For example, if you specify |
|
| The default gateway to route egress network traffic to. |
Field | Type | Description |
---|---|---|
|
|
The IP address range in CIDR format, such as |
|
| The gateway where network traffic is routed. |
Field | Type | Description |
---|---|---|
|
| An array of one or more IP addresses for to send DNS queries to. |
|
|
The default domain to append to a hostname. For example, if the domain is set to |
|
|
An array of domain names to append to an unqualified hostname, such as |
Static IP address assignment configuration example
{ "ipam": { "type": "static", "addresses": [ { "address": "191.168.1.7/24" } ] } }
{
"ipam": {
"type": "static",
"addresses": [
{
"address": "191.168.1.7/24"
}
]
}
}
25.6.1.1.2. Dynamic IP address (DHCP) assignment configuration
The following JSON describes the configuration for dynamic IP address address assignment with DHCP.
A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.
To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:
Example shim network attachment definition
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: additionalNetworks: - name: dhcp-shim namespace: default type: Raw rawCNIConfig: |- { "name": "dhcp-shim", "cniVersion": "0.3.1", "type": "bridge", "ipam": { "type": "dhcp" } } # ...
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
additionalNetworks:
- name: dhcp-shim
namespace: default
type: Raw
rawCNIConfig: |-
{
"name": "dhcp-shim",
"cniVersion": "0.3.1",
"type": "bridge",
"ipam": {
"type": "dhcp"
}
}
# ...
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
Dynamic IP address (DHCP) assignment configuration example
{ "ipam": { "type": "dhcp" } }
{
"ipam": {
"type": "dhcp"
}
}
25.6.1.1.3. Dynamic IP address assignment configuration with Whereabouts
The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.
The following table describes the configuration for dynamic IP address assignment with Whereabouts:
Field | Type | Description |
---|---|---|
|
|
The IPAM address type. The value |
|
| An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses. |
|
| Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned. |
Dynamic IP address assignment configuration example that uses Whereabouts
{ "ipam": { "type": "whereabouts", "range": "192.0.2.192/27", "exclude": [ "192.0.2.192/30", "192.0.2.196/32" ] } }
{
"ipam": {
"type": "whereabouts",
"range": "192.0.2.192/27",
"exclude": [
"192.0.2.192/30",
"192.0.2.196/32"
]
}
}
25.6.1.2. Creating a configuration for assignment of dual-stack IP addresses dynamically
Dual-stack IP address assignment can be configured with the ipRanges
parameter for:
- IPv4 addresses
- IPv6 addresses
- multiple IP address assignment
Procedure
-
Set
type
towhereabouts
. Use
ipRanges
to allocate IP addresses as shown in the following example:cniVersion: operator.openshift.io/v1 kind: Network =metadata: name: cluster spec: additionalNetworks: - name: whereabouts-shim namespace: default type: Raw rawCNIConfig: |- { "name": "whereabouts-dual-stack", "cniVersion": "0.3.1, "type": "bridge", "ipam": { "type": "whereabouts", "ipRanges": [ {"range": "192.168.10.0/24"}, {"range": "2001:db8::/64"} ] } }
cniVersion: operator.openshift.io/v1 kind: Network =metadata: name: cluster spec: additionalNetworks: - name: whereabouts-shim namespace: default type: Raw rawCNIConfig: |- { "name": "whereabouts-dual-stack", "cniVersion": "0.3.1, "type": "bridge", "ipam": { "type": "whereabouts", "ipRanges": [ {"range": "192.168.10.0/24"}, {"range": "2001:db8::/64"} ] } }
Copy to Clipboard Copied! - Attach network to a pod. For more information, see "Adding a pod to an additional network".
- Verify that all IP addresses are assigned.
Run the following command to ensure the IP addresses are assigned as metadata.
$ oc exec -it mypod -- ip a
$ oc exec -it mypod -- ip a
Copy to Clipboard Copied!
25.6.2. Configuring SR-IOV additional network
You can configure an additional network that uses SR-IOV hardware by creating an SriovIBNetwork
object. When you create an SriovIBNetwork
object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition
object.
Do not modify or delete an SriovIBNetwork
object if it is attached to any pods in a running
state.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a
SriovIBNetwork
object, and then save the YAML in the<name>.yaml
file, where<name>
is a name for this additional network. The object specification might resemble the following example:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovIBNetwork metadata: name: attach1 namespace: openshift-sriov-network-operator spec: resourceName: net1 networkNamespace: project2 ipam: |- { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "gateway": "10.56.217.1" }
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovIBNetwork metadata: name: attach1 namespace: openshift-sriov-network-operator spec: resourceName: net1 networkNamespace: project2 ipam: |- { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "gateway": "10.56.217.1" }
Copy to Clipboard Copied! To create the object, enter the following command:
oc create -f <name>.yaml
$ oc create -f <name>.yaml
Copy to Clipboard Copied! where
<name>
specifies the name of the additional network.Optional: To confirm that the
NetworkAttachmentDefinition
object that is associated with theSriovIBNetwork
object that you created in the previous step exists, enter the following command. Replace<namespace>
with the networkNamespace you specified in theSriovIBNetwork
object.oc get net-attach-def -n <namespace>
$ oc get net-attach-def -n <namespace>
Copy to Clipboard Copied!
25.6.3. Next steps
25.7. Adding a pod to an SR-IOV additional network
You can add a pod to an existing Single Root I/O Virtualization (SR-IOV) network.
25.7.1. Runtime configuration for a network attachment
When attaching a pod to an additional network, you can specify a runtime configuration to make specific customizations for the pod. For example, you can request a specific MAC hardware address.
You specify the runtime configuration by setting an annotation in the pod specification. The annotation key is k8s.v1.cni.cncf.io/networks
, and it accepts a JSON object that describes the runtime configuration.
25.7.1.1. Runtime configuration for an Ethernet-based SR-IOV attachment
The following JSON describes the runtime configuration options for an Ethernet-based SR-IOV network attachment.
[ { "name": "<name>", "mac": "<mac_address>", "ips": ["<cidr_range>"] } ]
[
{
"name": "<name>",
"mac": "<mac_address>",
"ips": ["<cidr_range>"]
}
]
- 1
- The name of the SR-IOV network attachment definition CR.
- 2
- Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify
{ "mac": true }
in theSriovNetwork
object. - 3
- Optional: IP addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify
{ "ips": true }
in theSriovNetwork
object.
Example runtime configuration
apiVersion: v1 kind: Pod metadata: name: sample-pod annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "net1", "mac": "20:04:0f:f1:88:01", "ips": ["192.168.10.1/24", "2001::1/64"] } ] spec: containers: - name: sample-container image: <image> imagePullPolicy: IfNotPresent command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
name: sample-pod
annotations:
k8s.v1.cni.cncf.io/networks: |-
[
{
"name": "net1",
"mac": "20:04:0f:f1:88:01",
"ips": ["192.168.10.1/24", "2001::1/64"]
}
]
spec:
containers:
- name: sample-container
image: <image>
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
25.7.1.2. Runtime configuration for an InfiniBand-based SR-IOV attachment
The following JSON describes the runtime configuration options for an InfiniBand-based SR-IOV network attachment.
[ { "name": "<network_attachment>", "infiniband-guid": "<guid>", "ips": ["<cidr_range>"] } ]
[
{
"name": "<network_attachment>",
"infiniband-guid": "<guid>",
"ips": ["<cidr_range>"]
}
]
- 1
- The name of the SR-IOV network attachment definition CR.
- 2
- The InfiniBand GUID for the SR-IOV device. To use this feature, you also must specify
{ "infinibandGUID": true }
in theSriovIBNetwork
object. - 3
- The IP addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify
{ "ips": true }
in theSriovIBNetwork
object.
Example runtime configuration
apiVersion: v1 kind: Pod metadata: name: sample-pod annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "ib1", "infiniband-guid": "c2:11:22:33:44:55:66:77", "ips": ["192.168.10.1/24", "2001::1/64"] } ] spec: containers: - name: sample-container image: <image> imagePullPolicy: IfNotPresent command: ["sleep", "infinity"]
apiVersion: v1
kind: Pod
metadata:
name: sample-pod
annotations:
k8s.v1.cni.cncf.io/networks: |-
[
{
"name": "ib1",
"infiniband-guid": "c2:11:22:33:44:55:66:77",
"ips": ["192.168.10.1/24", "2001::1/64"]
}
]
spec:
containers:
- name: sample-container
image: <image>
imagePullPolicy: IfNotPresent
command: ["sleep", "infinity"]
25.7.2. Adding a pod to an additional network
You can add a pod to an additional network. The pod continues to send normal cluster-related network traffic over the default network.
When a pod is created additional networks are attached to it. However, if a pod already exists, you cannot attach additional networks to it.
The pod must be in the same namespace as the additional network.
The SR-IOV Network Resource Injector adds the resource
field to the first container in a pod automatically.
If you are using an Intel network interface controller (NIC) in Data Plane Development Kit (DPDK) mode, only the first container in your pod is configured to access the NIC. Your SR-IOV additional network is configured for DPDK mode if the deviceType
is set to vfio-pci
in the SriovNetworkNodePolicy
object.
You can work around this issue by either ensuring that the container that needs access to the NIC is the first container defined in the Pod
object or by disabling the Network Resource Injector. For more information, see BZ#1990953.
Prerequisites
-
Install the OpenShift CLI (
oc
). - Log in to the cluster.
- Install the SR-IOV Operator.
-
Create either an
SriovNetwork
object or anSriovIBNetwork
object to attach the pod to.
Procedure
Add an annotation to the
Pod
object. Only one of the following annotation formats can be used:To attach an additional network without any customization, add an annotation with the following format. Replace
<network>
with the name of the additional network to associate with the pod:metadata: annotations: k8s.v1.cni.cncf.io/networks: <network>[,<network>,...]
metadata: annotations: k8s.v1.cni.cncf.io/networks: <network>[,<network>,...]
1 Copy to Clipboard Copied! - 1
- To specify more than one additional network, separate each network with a comma. Do not include whitespace between the comma. If you specify the same additional network multiple times, that pod will have multiple network interfaces attached to that network.
To attach an additional network with customizations, add an annotation with the following format:
metadata: annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "<network>", "namespace": "<namespace>", "default-route": ["<default-route>"] } ]
metadata: annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "<network>",
1 "namespace": "<namespace>",
2 "default-route": ["<default-route>"]
3 } ]
Copy to Clipboard Copied!
To create the pod, enter the following command. Replace
<name>
with the name of the pod.oc create -f <name>.yaml
$ oc create -f <name>.yaml
Copy to Clipboard Copied! Optional: To Confirm that the annotation exists in the
Pod
CR, enter the following command, replacing<name>
with the name of the pod.oc get pod <name> -o yaml
$ oc get pod <name> -o yaml
Copy to Clipboard Copied! In the following example, the
example-pod
pod is attached to thenet1
additional network:oc get pod example-pod -o yaml
$ oc get pod example-pod -o yaml apiVersion: v1 kind: Pod metadata: annotations: k8s.v1.cni.cncf.io/networks: macvlan-bridge k8s.v1.cni.cncf.io/network-status: |-
1 [{ "name": "openshift-sdn", "interface": "eth0", "ips": [ "10.128.2.14" ], "default": true, "dns": {} },{ "name": "macvlan-bridge", "interface": "net1", "ips": [ "20.2.2.100" ], "mac": "22:2f:60:a5:f8:00", "dns": {} }] name: example-pod namespace: default spec: ... status: ...
Copy to Clipboard Copied! - 1
- The
k8s.v1.cni.cncf.io/network-status
parameter is a JSON array of objects. Each object describes the status of an additional network attached to the pod. The annotation value is stored as a plain text value.
25.7.3. Creating a non-uniform memory access (NUMA) aligned SR-IOV pod
You can create a NUMA aligned SR-IOV pod by restricting SR-IOV and the CPU resources allocated from the same NUMA node with restricted
or single-numa-node
Topology Manager polices.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have configured the CPU Manager policy to
static
. For more information on CPU Manager, see the "Additional resources" section. You have configured the Topology Manager policy to
single-numa-node
.NoteWhen
single-numa-node
is unable to satisfy the request, you can configure the Topology Manager policy torestricted
. For more flexible SR-IOV network resource scheduling, see Excluding SR-IOV network topology during NUMA-aware scheduling in the Additional resources section.
Procedure
Create the following SR-IOV pod spec, and then save the YAML in the
<name>-sriov-pod.yaml
file. Replace<name>
with a name for this pod.The following example shows an SR-IOV pod spec:
apiVersion: v1 kind: Pod metadata: name: sample-pod annotations: k8s.v1.cni.cncf.io/networks: <name> spec: containers: - name: sample-container image: <image> command: ["sleep", "infinity"] resources: limits: memory: "1Gi" cpu: "2" requests: memory: "1Gi" cpu: "2"
apiVersion: v1 kind: Pod metadata: name: sample-pod annotations: k8s.v1.cni.cncf.io/networks: <name>
1 spec: containers: - name: sample-container image: <image>
2 command: ["sleep", "infinity"] resources: limits: memory: "1Gi"
3 cpu: "2"
4 requests: memory: "1Gi" cpu: "2"
Copy to Clipboard Copied! - 1
- Replace
<name>
with the name of the SR-IOV network attachment definition CR. - 2
- Replace
<image>
with the name of thesample-pod
image. - 3
- To create the SR-IOV pod with guaranteed QoS, set
memory limits
equal tomemory requests
. - 4
- To create the SR-IOV pod with guaranteed QoS, set
cpu limits
equals tocpu requests
.
Create the sample SR-IOV pod by running the following command:
oc create -f <filename>
$ oc create -f <filename>
1 Copy to Clipboard Copied! - 1
- Replace
<filename>
with the name of the file you created in the previous step.
Confirm that the
sample-pod
is configured with guaranteed QoS.oc describe pod sample-pod
$ oc describe pod sample-pod
Copy to Clipboard Copied! Confirm that the
sample-pod
is allocated with exclusive CPUs.oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
$ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
Copy to Clipboard Copied! Confirm that the SR-IOV device and CPUs that are allocated for the
sample-pod
are on the same NUMA node.oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
$ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
Copy to Clipboard Copied!
25.7.4. A test pod template for clusters that use SR-IOV on OpenStack
The following testpmd
pod demonstrates container creation with huge pages, reserved CPUs, and the SR-IOV port.
An example testpmd
pod
apiVersion: v1 kind: Pod metadata: name: testpmd-sriov namespace: mynamespace annotations: cpu-load-balancing.crio.io: "disable" cpu-quota.crio.io: "disable" # ... spec: containers: - name: testpmd command: ["sleep", "99999"] image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9 securityContext: capabilities: add: ["IPC_LOCK","SYS_ADMIN"] privileged: true runAsUser: 0 resources: requests: memory: 1000Mi hugepages-1Gi: 1Gi cpu: '2' openshift.io/sriov1: 1 limits: hugepages-1Gi: 1Gi cpu: '2' memory: 1000Mi openshift.io/sriov1: 1 volumeMounts: - mountPath: /dev/hugepages name: hugepage readOnly: False runtimeClassName: performance-cnf-performanceprofile volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1
kind: Pod
metadata:
name: testpmd-sriov
namespace: mynamespace
annotations:
cpu-load-balancing.crio.io: "disable"
cpu-quota.crio.io: "disable"
# ...
spec:
containers:
- name: testpmd
command: ["sleep", "99999"]
image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
securityContext:
capabilities:
add: ["IPC_LOCK","SYS_ADMIN"]
privileged: true
runAsUser: 0
resources:
requests:
memory: 1000Mi
hugepages-1Gi: 1Gi
cpu: '2'
openshift.io/sriov1: 1
limits:
hugepages-1Gi: 1Gi
cpu: '2'
memory: 1000Mi
openshift.io/sriov1: 1
volumeMounts:
- mountPath: /dev/hugepages
name: hugepage
readOnly: False
runtimeClassName: performance-cnf-performanceprofile
volumes:
- name: hugepage
emptyDir:
medium: HugePages
- 1
- This example assumes that the name of the performance profile is
cnf-performance profile
.
25.8. Configuring interface-level network sysctl settings and all-multicast mode for SR-IOV networks
As a cluster administrator, you can change interface-level network sysctls and several interface attributes such as promiscuous mode, all-multicast mode, MTU, and MAC address by using the tuning Container Network Interface (CNI) meta plugin for a pod connected to a SR-IOV network device.
25.8.1. Labeling nodes with an SR-IOV enabled NIC
If you want to enable SR-IOV on only SR-IOV capable nodes there are a couple of ways to do this:
-
Install the Node Feature Discovery (NFD) Operator. NFD detects the presence of SR-IOV enabled NICs and labels the nodes with
node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = true
. Examine the
SriovNetworkNodeState
CR for each node. Theinterfaces
stanza includes a list of all of the SR-IOV devices discovered by the SR-IOV Network Operator on the worker node. Label each node withfeature.node.kubernetes.io/network-sriov.capable: "true"
by using the following command:$ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"
$ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"
Copy to Clipboard Copied! NoteYou can label the nodes with whatever name you want.
25.8.2. Setting one sysctl flag
You can set interface-level network sysctl
settings for a pod connected to a SR-IOV network device.
In this example, net.ipv4.conf.IFNAME.accept_redirects
is set to 1
on the created virtual interfaces.
The sysctl-tuning-test
is a namespace used in this example.
Use the following command to create the
sysctl-tuning-test
namespace:oc create namespace sysctl-tuning-test
$ oc create namespace sysctl-tuning-test
Copy to Clipboard Copied!
25.8.2.1. Setting one sysctl flag on nodes with SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io
custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy
custom resource (CR).
When applying the configuration specified in a SriovNetworkNodePolicy
object, the SR-IOV Operator might drain and reboot the nodes.
It can take several minutes for a configuration change to apply.
Follow this procedure to create a SriovNetworkNodePolicy
custom resource (CR).
Procedure
Create an
SriovNetworkNodePolicy
custom resource (CR). For example, save the following YAML as the filepolicyoneflag-sriov-node-network.yaml
:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policyoneflag namespace: openshift-sriov-network-operator spec: resourceName: policyoneflag nodeSelector: feature.node.kubernetes.io/network-sriov.capable="true" priority: 10 numVfs: 5 nicSelector: pfNames: ["ens5"] deviceType: "netdevice" isRdma: false
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policyoneflag
1 namespace: openshift-sriov-network-operator
2 spec: resourceName: policyoneflag
3 nodeSelector:
4 feature.node.kubernetes.io/network-sriov.capable="true" priority: 10
5 numVfs: 5
6 nicSelector:
7 pfNames: ["ens5"]
8 deviceType: "netdevice"
9 isRdma: false
10 Copy to Clipboard Copied! - 1
- The name for the custom resource object.
- 2
- The namespace where the SR-IOV Network Operator is installed.
- 3
- The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
- 4
- The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
- 5
- Optional: The priority is an integer value between
0
and99
. A smaller value receives higher priority. For example, a priority of10
is a higher priority than99
. The default value is99
. - 6
- The number of the virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than
127
. - 7
- The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify
rootDevices
, you must also specify a value forvendor
,deviceID
, orpfNames
. If you specify bothpfNames
androotDevices
at the same time, ensure that they refer to the same device. If you specify a value fornetFilter
, then you do not need to specify any other parameter because a network ID is unique. - 8
- Optional: An array of one or more physical function (PF) names for the device.
- 9
- Optional: The driver type for the virtual functions. The only allowed value is
netdevice
. For a Mellanox NIC to work in DPDK mode on bare metal nodes, setisRdma
totrue
. - 10
- Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is
false
. If theisRdma
parameter is set totrue
, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. SetisRdma
totrue
and additionally setneedVhostNet
totrue
to configure a Mellanox NIC for use with Fast Datapath DPDK applications.
NoteThe
vfio-pci
driver type is not supported.Create the
SriovNetworkNodePolicy
object:oc create -f policyoneflag-sriov-node-network.yaml
$ oc create -f policyoneflag-sriov-node-network.yaml
Copy to Clipboard Copied! After applying the configuration update, all the pods in
sriov-network-operator
namespace change to theRunning
status.To verify that the SR-IOV network device is configured, enter the following command. Replace
<node_name>
with the name of a node with the SR-IOV network device that you just configured.oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Copy to Clipboard Copied! Example output
Succeeded
Succeeded
Copy to Clipboard Copied!
25.8.2.2. Configuring sysctl on a SR-IOV network
You can set interface specific sysctl
settings on virtual interfaces created by SR-IOV by adding the tuning configuration to the optional metaPlugins
parameter of the SriovNetwork
resource.
The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition
custom resource (CR) automatically.
Do not edit NetworkAttachmentDefinition
custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.
To change the interface-level network net.ipv4.conf.IFNAME.accept_redirects
sysctl
settings, create an additional SR-IOV network with the Container Network Interface (CNI) tuning plugin.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.
Procedure
Create the
SriovNetwork
custom resource (CR) for the additional SR-IOV network attachment and insert themetaPlugins
configuration, as in the following example CR. Save the YAML as the filesriov-network-interface-sysctl.yaml
.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: onevalidflag namespace: openshift-sriov-network-operator spec: resourceName: policyoneflag networkNamespace: sysctl-tuning-test ipam: '{ "type": "static" }' capabilities: '{ "mac": true, "ips": true }' metaPlugins : | { "type": "tuning", "capabilities":{ "mac":true }, "sysctl":{ "net.ipv4.conf.IFNAME.accept_redirects": "1" } }
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: onevalidflag
1 namespace: openshift-sriov-network-operator
2 spec: resourceName: policyoneflag
3 networkNamespace: sysctl-tuning-test
4 ipam: '{ "type": "static" }'
5 capabilities: '{ "mac": true, "ips": true }'
6 metaPlugins : |
7 { "type": "tuning", "capabilities":{ "mac":true }, "sysctl":{ "net.ipv4.conf.IFNAME.accept_redirects": "1" } }
Copy to Clipboard Copied! - 1
- A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with same name.
- 2
- The namespace where the SR-IOV Network Operator is installed.
- 3
- The value for the
spec.resourceName
parameter from theSriovNetworkNodePolicy
object that defines the SR-IOV hardware for this additional network. - 4
- The target namespace for the
SriovNetwork
object. Only pods in the target namespace can attach to the additional network. - 5
- A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
- 6
- Optional: Set capabilities for the additional network. You can specify
"{ "ips": true }"
to enable IP address support or"{ "mac": true }"
to enable MAC address support. - 7
- Optional: The metaPlugins parameter is used to add additional capabilities to the device. In this use case set the
type
field totuning
. Specify the interface-level networksysctl
you want to set in thesysctl
field.
Create the
SriovNetwork
resource:oc create -f sriov-network-interface-sysctl.yaml
$ oc create -f sriov-network-interface-sysctl.yaml
Copy to Clipboard Copied!
Verifying that the NetworkAttachmentDefinition
CR is successfully created
Confirm that the SR-IOV Network Operator created the
NetworkAttachmentDefinition
CR by running the following command:oc get network-attachment-definitions -n <namespace>
$ oc get network-attachment-definitions -n <namespace>
1 Copy to Clipboard Copied! - 1
- Replace
<namespace>
with the value fornetworkNamespace
that you specified in theSriovNetwork
object. For example,sysctl-tuning-test
.
Example output
NAME AGE onevalidflag 14m
NAME AGE onevalidflag 14m
Copy to Clipboard Copied! NoteThere might be a delay before the SR-IOV Network Operator creates the CR.
Verifying that the additional SR-IOV network attachment is successful
To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create a
Pod
CR. Save the following YAML as the fileexamplepod.yaml
:apiVersion: v1 kind: Pod metadata: name: tunepod namespace: sysctl-tuning-test annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "onevalidflag", "mac": "0a:56:0a:83:04:0c", "ips": ["10.100.100.200/24"] } ] spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
apiVersion: v1 kind: Pod metadata: name: tunepod namespace: sysctl-tuning-test annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "onevalidflag",
1 "mac": "0a:56:0a:83:04:0c",
2 "ips": ["10.100.100.200/24"]
3 } ] spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
Copy to Clipboard Copied! - 1
- The name of the SR-IOV network attachment definition CR.
- 2
- Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify
{ "mac": true }
in the SriovNetwork object. - 3
- Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify
{ "ips": true }
in theSriovNetwork
object.
Create the
Pod
CR:oc apply -f examplepod.yaml
$ oc apply -f examplepod.yaml
Copy to Clipboard Copied! Verify that the pod is created by running the following command:
oc get pod -n sysctl-tuning-test
$ oc get pod -n sysctl-tuning-test
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
Copy to Clipboard Copied! Log in to the pod by running the following command:
oc rsh -n sysctl-tuning-test tunepod
$ oc rsh -n sysctl-tuning-test tunepod
Copy to Clipboard Copied! Verify the values of the configured sysctl flag. Find the value
net.ipv4.conf.IFNAME.accept_redirects
by running the following command::sysctl net.ipv4.conf.net1.accept_redirects
$ sysctl net.ipv4.conf.net1.accept_redirects
Copy to Clipboard Copied! Example output
net.ipv4.conf.net1.accept_redirects = 1
net.ipv4.conf.net1.accept_redirects = 1
Copy to Clipboard Copied!
25.8.3. Configuring sysctl settings for pods associated with bonded SR-IOV interface flag
You can set interface-level network sysctl
settings for a pod connected to a bonded SR-IOV network device.
In this example, the specific network interface-level sysctl
settings that can be configured are set on the bonded interface.
The sysctl-tuning-test
is a namespace used in this example.
Use the following command to create the
sysctl-tuning-test
namespace:oc create namespace sysctl-tuning-test
$ oc create namespace sysctl-tuning-test
Copy to Clipboard Copied!
25.8.3.1. Setting all sysctl flag on nodes with bonded SR-IOV network devices
The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io
custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy
custom resource (CR).
When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.
It might take several minutes for a configuration change to apply.
Follow this procedure to create a SriovNetworkNodePolicy
custom resource (CR).
Procedure
Create an
SriovNetworkNodePolicy
custom resource (CR). Save the following YAML as the filepolicyallflags-sriov-node-network.yaml
. Replacepolicyallflags
with the name for the configuration.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policyallflags namespace: openshift-sriov-network-operator spec: resourceName: policyallflags nodeSelector: node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = `true` priority: 10 numVfs: 5 nicSelector: pfNames: ["ens1f0"] deviceType: "netdevice" isRdma: false
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policyallflags
1 namespace: openshift-sriov-network-operator
2 spec: resourceName: policyallflags
3 nodeSelector:
4 node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = `true` priority: 10
5 numVfs: 5
6 nicSelector:
7 pfNames: ["ens1f0"]
8 deviceType: "netdevice"
9 isRdma: false
10 Copy to Clipboard Copied! - 1
- The name for the custom resource object.
- 2
- The namespace where the SR-IOV Network Operator is installed.
- 3
- The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
- 4
- The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
- 5
- Optional: The priority is an integer value between
0
and99
. A smaller value receives higher priority. For example, a priority of10
is a higher priority than99
. The default value is99
. - 6
- The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than
127
. - 7
- The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify
rootDevices
, you must also specify a value forvendor
,deviceID
, orpfNames
. If you specify bothpfNames
androotDevices
at the same time, ensure that they refer to the same device. If you specify a value fornetFilter
, then you do not need to specify any other parameter because a network ID is unique. - 8
- Optional: An array of one or more physical function (PF) names for the device.
- 9
- Optional: The driver type for the virtual functions. The only allowed value is
netdevice
. For a Mellanox NIC to work in DPDK mode on bare metal nodes, setisRdma
totrue
. - 10
- Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is
false
. If theisRdma
parameter is set totrue
, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. SetisRdma
totrue
and additionally setneedVhostNet
totrue
to configure a Mellanox NIC for use with Fast Datapath DPDK applications.
NoteThe
vfio-pci
driver type is not supported.Create the SriovNetworkNodePolicy object:
oc create -f policyallflags-sriov-node-network.yaml
$ oc create -f policyallflags-sriov-node-network.yaml
Copy to Clipboard Copied! After applying the configuration update, all the pods in sriov-network-operator namespace change to the
Running
status.To verify that the SR-IOV network device is configured, enter the following command. Replace
<node_name>
with the name of a node with the SR-IOV network device that you just configured.oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
Copy to Clipboard Copied! Example output
Succeeded
Succeeded
Copy to Clipboard Copied!
25.8.3.2. Configuring sysctl on a bonded SR-IOV network
You can set interface specific sysctl
settings on a bonded interface created from two SR-IOV interfaces. Do this by adding the tuning configuration to the optional Plugins
parameter of the bond network attachment definition.
Do not edit NetworkAttachmentDefinition
custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.
To change specific interface-level network sysctl
settings create the SriovNetwork
custom resource (CR) with the Container Network Interface (CNI) tuning plugin by using the following procedure.
Prerequisites
- Install the OpenShift Container Platform CLI (oc).
- Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.
Procedure
Create the
SriovNetwork
custom resource (CR) for the bonded interface as in the following example CR. Save the YAML as the filesriov-network-attachment.yaml
.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: allvalidflags namespace: openshift-sriov-network-operator spec: resourceName: policyallflags networkNamespace: sysctl-tuning-test capabilities: '{ "mac": true, "ips": true }'
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: allvalidflags
1 namespace: openshift-sriov-network-operator
2 spec: resourceName: policyallflags
3 networkNamespace: sysctl-tuning-test
4 capabilities: '{ "mac": true, "ips": true }'
5 Copy to Clipboard Copied! - 1
- A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with same name.
- 2
- The namespace where the SR-IOV Network Operator is installed.
- 3
- The value for the
spec.resourceName
parameter from theSriovNetworkNodePolicy
object that defines the SR-IOV hardware for this additional network. - 4
- The target namespace for the
SriovNetwork
object. Only pods in the target namespace can attach to the additional network. - 5
- Optional: The capabilities to configure for this additional network. You can specify
"{ "ips": true }"
to enable IP address support or"{ "mac": true }"
to enable MAC address support.
Create the
SriovNetwork
resource:oc create -f sriov-network-attachment.yaml
$ oc create -f sriov-network-attachment.yaml
Copy to Clipboard Copied! Create a bond network attachment definition as in the following example CR. Save the YAML as the file
sriov-bond-network-interface.yaml
.apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: bond-sysctl-network namespace: sysctl-tuning-test spec: config: '{ "cniVersion":"0.4.0", "name":"bound-net", "plugins":[ { "type":"bond", "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "links": [ {"name": "net1"}, {"name": "net2"} ], "ipam":{ "type":"static" } }, { "type":"tuning", "capabilities":{ "mac":true }, "sysctl":{ "net.ipv4.conf.IFNAME.accept_redirects": "0", "net.ipv4.conf.IFNAME.accept_source_route": "0", "net.ipv4.conf.IFNAME.disable_policy": "1", "net.ipv4.conf.IFNAME.secure_redirects": "0", "net.ipv4.conf.IFNAME.send_redirects": "0", "net.ipv6.conf.IFNAME.accept_redirects": "0", "net.ipv6.conf.IFNAME.accept_source_route": "1", "net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000", "net.ipv6.neigh.IFNAME.retrans_time_ms": "2000" } } ] }'
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: bond-sysctl-network namespace: sysctl-tuning-test spec: config: '{ "cniVersion":"0.4.0", "name":"bound-net", "plugins":[ { "type":"bond",
1 "mode": "active-backup",
2 "failOverMac": 1,
3 "linksInContainer": true,
4 "miimon": "100", "links": [
5 {"name": "net1"}, {"name": "net2"} ], "ipam":{
6 "type":"static" } }, { "type":"tuning",
7 "capabilities":{ "mac":true }, "sysctl":{ "net.ipv4.conf.IFNAME.accept_redirects": "0", "net.ipv4.conf.IFNAME.accept_source_route": "0", "net.ipv4.conf.IFNAME.disable_policy": "1", "net.ipv4.conf.IFNAME.secure_redirects": "0", "net.ipv4.conf.IFNAME.send_redirects": "0", "net.ipv6.conf.IFNAME.accept_redirects": "0", "net.ipv6.conf.IFNAME.accept_source_route": "1", "net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000", "net.ipv6.neigh.IFNAME.retrans_time_ms": "2000" } } ] }'
Copy to Clipboard Copied! - 1
- The type is
bond
. - 2
- The
mode
attribute specifies the bonding mode. The bonding modes supported are:-
balance-rr
- 0 -
active-backup
- 1 balance-xor
- 2For
balance-rr
orbalance-xor
modes, you must set thetrust
mode toon
for the SR-IOV virtual function.
-
- 3
- The
failover
attribute is mandatory for active-backup mode. - 4
- The
linksInContainer=true
flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, Bond CNI looks for these interfaces on the host which does not work for integration with SRIOV and Multus. - 5
- The
links
section defines which interfaces will be used to create the bond. By default, Multus names the attached interfaces as: "net", plus a consecutive number, starting with one. - 6
- A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition. In this pod example IP addresses are configured manually, so in this case,
ipam
is set to static. - 7
- Add additional capabilities to the device. For example, set the
type
field totuning
. Specify the interface-level networksysctl
you want to set in the sysctl field. This example sets all interface-level networksysctl
settings that can be set.
Create the bond network attachment resource:
oc create -f sriov-bond-network-interface.yaml
$ oc create -f sriov-bond-network-interface.yaml
Copy to Clipboard Copied!
Verifying that the NetworkAttachmentDefinition
CR is successfully created
Confirm that the SR-IOV Network Operator created the
NetworkAttachmentDefinition
CR by running the following command:oc get network-attachment-definitions -n <namespace>
$ oc get network-attachment-definitions -n <namespace>
1 Copy to Clipboard Copied! - 1
- Replace
<namespace>
with the networkNamespace that you specified when configuring the network attachment, for example,sysctl-tuning-test
.
Example output
NAME AGE bond-sysctl-network 22m allvalidflags 47m
NAME AGE bond-sysctl-network 22m allvalidflags 47m
Copy to Clipboard Copied! NoteThere might be a delay before the SR-IOV Network Operator creates the CR.
Verifying that the additional SR-IOV network resource is successful
To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:
Create a
Pod
CR. For example, save the following YAML as the fileexamplepod.yaml
:apiVersion: v1 kind: Pod metadata: name: tunepod namespace: sysctl-tuning-test annotations: k8s.v1.cni.cncf.io/networks: |- [ {"name": "allvalidflags"}, {"name": "allvalidflags"}, { "name": "bond-sysctl-network", "interface": "bond0", "mac": "0a:56:0a:83:04:0c", "ips": ["10.100.100.200/24"] } ] spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
apiVersion: v1 kind: Pod metadata: name: tunepod namespace: sysctl-tuning-test annotations: k8s.v1.cni.cncf.io/networks: |- [ {"name": "allvalidflags"},
1 {"name": "allvalidflags"}, { "name": "bond-sysctl-network", "interface": "bond0", "mac": "0a:56:0a:83:04:0c",
2 "ips": ["10.100.100.200/24"]
3 } ] spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
Copy to Clipboard Copied! - 1
- The name of the SR-IOV network attachment definition CR.
- 2
- Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify
{ "mac": true }
in the SriovNetwork object. - 3
- Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify
{ "ips": true }
in theSriovNetwork
object.
Apply the YAML:
oc apply -f examplepod.yaml
$ oc apply -f examplepod.yaml
Copy to Clipboard Copied! Verify that the pod is created by running the following command:
oc get pod -n sysctl-tuning-test
$ oc get pod -n sysctl-tuning-test
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
Copy to Clipboard Copied! Log in to the pod by running the following command:
oc rsh -n sysctl-tuning-test tunepod
$ oc rsh -n sysctl-tuning-test tunepod
Copy to Clipboard Copied! Verify the values of the configured
sysctl
flag. Find the valuenet.ipv6.neigh.IFNAME.base_reachable_time_ms
by running the following command::sysctl net.ipv6.neigh.bond0.base_reachable_time_ms
$ sysctl net.ipv6.neigh.bond0.base_reachable_time_ms
Copy to Clipboard Copied! Example output
net.ipv6.neigh.bond0.base_reachable_time_ms = 20000
net.ipv6.neigh.bond0.base_reachable_time_ms = 20000
Copy to Clipboard Copied!
25.8.4. About all-multicast mode
Enabling all-multicast mode, particularly in the context of rootless applications, is critical. If you do not enable this mode, you would be required to grant the NET_ADMIN
capability to the pod’s Security Context Constraints (SCC). If you were to allow the NET_ADMIN
capability to grant the pod privileges to make changes that extend beyond its specific requirements, you could potentially expose security vulnerabilities.
The tuning CNI plugin supports changing several interface attributes, including all-multicast mode. By enabling this mode, you can allow applications running on Virtual Functions (VFs) that are configured on a SR-IOV network device to receive multicast traffic from applications on other VFs, whether attached to the same or different physical functions.
25.8.4.1. Enabling the all-multicast mode on an SR-IOV network
You can enable the all-multicast mode on an SR-IOV interface by:
-
Adding the tuning configuration to the
metaPlugins
parameter of theSriovNetwork
resource Setting the
allmulti
field totrue
in the tuning configurationNoteEnsure that you create the virtual function (VF) with trust enabled.
The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition
custom resource (CR) automatically.
Do not edit NetworkAttachmentDefinition
custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.
Enable the all-multicast mode on a SR-IOV network by following this guidance.
Prerequisites
- You have installed the OpenShift Container Platform CLI (oc).
-
You are logged in to the OpenShift Container Platform cluster as a user with
cluster-admin
privileges. - You have installed the SR-IOV Network Operator.
-
You have configured an appropriate
SriovNetworkNodePolicy
object.
Procedure
Create a YAML file with the following settings that defines a
SriovNetworkNodePolicy
object for a Mellanox ConnectX-5 device. Save the YAML file assriovnetpolicy-mlx.yaml
.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriovnetpolicy-mlx namespace: openshift-sriov-network-operator spec: deviceType: netdevice nicSelector: deviceID: "1017" pfNames: - ens8f0np0#0-9 rootDevices: - 0000:d8:00.0 vendor: "15b3" nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 10 priority: 99 resourceName: resourcemlx
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriovnetpolicy-mlx namespace: openshift-sriov-network-operator spec: deviceType: netdevice nicSelector: deviceID: "1017" pfNames: - ens8f0np0#0-9 rootDevices: - 0000:d8:00.0 vendor: "15b3" nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 10 priority: 99 resourceName: resourcemlx
Copy to Clipboard Copied! -
Optional: If the SR-IOV capable cluster nodes are not already labeled, add the
SriovNetworkNodePolicy.Spec.NodeSelector
label. For more information about labeling nodes, see "Understanding how to update labels on nodes". Create the
SriovNetworkNodePolicy
object by running the following command:oc create -f sriovnetpolicy-mlx.yaml
$ oc create -f sriovnetpolicy-mlx.yaml
Copy to Clipboard Copied! After applying the configuration update, all the pods in the
sriov-network-operator
namespace automatically move to aRunning
status.Create the
enable-allmulti-test
namespace by running the following command:oc create namespace enable-allmulti-test
$ oc create namespace enable-allmulti-test
Copy to Clipboard Copied! Create the
SriovNetwork
custom resource (CR) for the additional SR-IOV network attachment and insert themetaPlugins
configuration, as in the following example CR YAML, and save the file assriov-enable-all-multicast.yaml
.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: enableallmulti namespace: openshift-sriov-network-operator spec: resourceName: enableallmulti networkNamespace: enable-allmulti-test ipam: '{ "type": "static" }' capabilities: '{ "mac": true, "ips": true }' trust: "on" metaPlugins : | { "type": "tuning", "capabilities":{ "mac":true }, "allmulti": true } }
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: enableallmulti
1 namespace: openshift-sriov-network-operator
2 spec: resourceName: enableallmulti
3 networkNamespace: enable-allmulti-test
4 ipam: '{ "type": "static" }'
5 capabilities: '{ "mac": true, "ips": true }'
6 trust: "on"
7 metaPlugins : |
8 { "type": "tuning", "capabilities":{ "mac":true }, "allmulti": true } }
Copy to Clipboard Copied! - 1
- Specify a name for the object. The SR-IOV Network Operator creates a
NetworkAttachmentDefinition
object with the same name. - 2
- Specify the namespace where the SR-IOV Network Operator is installed.
- 3
- Specify a value for the
spec.resourceName
parameter from theSriovNetworkNodePolicy
object that defines the SR-IOV hardware for this additional network. - 4
- Specify the target namespace for the
SriovNetwork
object. Only pods in the target namespace can attach to the additional network. - 5
- Specify a configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
- 6
- Optional: Set capabilities for the additional network. You can specify
"{ "ips": true }"
to enable IP address support or"{ "mac": true }"
to enable MAC address support. - 7
- Specify the trust mode of the virtual function. This must be set to "on".
- 8
- Add more capabilities to the device by using the
metaPlugins
parameter. In this use case, set thetype
field totuning
, and add theallmulti
field and set it totrue
.
Create the
SriovNetwork
resource by running the following command:oc create -f sriov-enable-all-multicast.yaml
$ oc create -f sriov-enable-all-multicast.yaml
Copy to Clipboard Copied!
Verification of the NetworkAttachmentDefinition
CR
Confirm that the SR-IOV Network Operator created the
NetworkAttachmentDefinition
CR by running the following command:oc get network-attachment-definitions -n <namespace>
$ oc get network-attachment-definitions -n <namespace>
1 Copy to Clipboard Copied! - 1
- Replace
<namespace>
with the value fornetworkNamespace
that you specified in theSriovNetwork
object. For this example, that isenable-allmulti-test
.
Example output
NAME AGE enableallmulti 14m
NAME AGE enableallmulti 14m
Copy to Clipboard Copied! NoteThere might be a delay before the SR-IOV Network Operator creates the CR.
Display information about the SR-IOV network resources by running the following command:
oc get sriovnetwork -n openshift-sriov-network-operator
$ oc get sriovnetwork -n openshift-sriov-network-operator
Copy to Clipboard Copied!
Verification of the additional SR-IOV network attachment
To verify that the tuning CNI is correctly configured and that the additional SR-IOV network attachment is attached, follow these steps:
Create a
Pod
CR. Save the following sample YAML in a file namedexamplepod.yaml
:apiVersion: v1 kind: Pod metadata: name: samplepod namespace: enable-allmulti-test annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "enableallmulti", "mac": "0a:56:0a:83:04:0c", "ips": ["10.100.100.200/24"] } ] spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
apiVersion: v1 kind: Pod metadata: name: samplepod namespace: enable-allmulti-test annotations: k8s.v1.cni.cncf.io/networks: |- [ { "name": "enableallmulti",
1 "mac": "0a:56:0a:83:04:0c",
2 "ips": ["10.100.100.200/24"]
3 } ] spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 runAsGroup: 3000 allowPrivilegeEscalation: false capabilities: drop: ["ALL"] securityContext: runAsNonRoot: true seccompProfile: type: RuntimeDefault
Copy to Clipboard Copied! - 1
- Specify the name of the SR-IOV network attachment definition CR.
- 2
- Optional: Specify the MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify
{"mac": true}
in the SriovNetwork object. - 3
- Optional: Specify the IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify
{ "ips": true }
in theSriovNetwork
object.
Create the
Pod
CR by running the following command:oc apply -f examplepod.yaml
$ oc apply -f examplepod.yaml
Copy to Clipboard Copied! Verify that the pod is created by running the following command:
oc get pod -n enable-allmulti-test
$ oc get pod -n enable-allmulti-test
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE samplepod 1/1 Running 0 47s
NAME READY STATUS RESTARTS AGE samplepod 1/1 Running 0 47s
Copy to Clipboard Copied! Log in to the pod by running the following command:
oc rsh -n enable-allmulti-test samplepod
$ oc rsh -n enable-allmulti-test samplepod
Copy to Clipboard Copied! List all the interfaces associated with the pod by running the following command:
ip link
sh-4.4# ip link
Copy to Clipboard Copied! Example output
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue state UP mode DEFAULT group default link/ether 0a:58:0a:83:00:10 brd ff:ff:ff:ff:ff:ff link-netnsid 0 3: net1@if24: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default link/ether ee:9b:66:a4:ec:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue state UP mode DEFAULT group default link/ether 0a:58:0a:83:00:10 brd ff:ff:ff:ff:ff:ff link-netnsid 0
1 3: net1@if24: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default link/ether ee:9b:66:a4:ec:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
2 Copy to Clipboard Copied!
25.9. Using high performance multicast
You can use multicast on your Single Root I/O Virtualization (SR-IOV) hardware network.
25.9.1. High performance multicast
The OpenShift SDN network plugin supports multicast between pods on the default network. This is best used for low-bandwidth coordination or service discovery, and not high-bandwidth applications. For applications such as streaming media, like Internet Protocol television (IPTV) and multipoint videoconferencing, you can utilize Single Root I/O Virtualization (SR-IOV) hardware to provide near-native performance.
When using additional SR-IOV interfaces for multicast:
- Multicast packages must be sent or received by a pod through the additional SR-IOV interface.
- The physical network which connects the SR-IOV interfaces decides the multicast routing and topology, which is not controlled by OpenShift Container Platform.
25.9.2. Configuring an SR-IOV interface for multicast
The follow procedure creates an example SR-IOV interface for multicast.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
You must log in to the cluster with a user that has the
cluster-admin
role.
Procedure
Create a
SriovNetworkNodePolicy
object:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-example namespace: openshift-sriov-network-operator spec: resourceName: example nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 4 nicSelector: vendor: "8086" pfNames: ['ens803f0'] rootDevices: ['0000:86:00.0']
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: policy-example namespace: openshift-sriov-network-operator spec: resourceName: example nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 4 nicSelector: vendor: "8086" pfNames: ['ens803f0'] rootDevices: ['0000:86:00.0']
Copy to Clipboard Copied! Create a
SriovNetwork
object:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: net-example namespace: openshift-sriov-network-operator spec: networkNamespace: default ipam: | { "type": "host-local", "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "routes": [ {"dst": "224.0.0.0/5"}, {"dst": "232.0.0.0/5"} ], "gateway": "10.56.217.1" } resourceName: example
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: net-example namespace: openshift-sriov-network-operator spec: networkNamespace: default ipam: |
1 { "type": "host-local",
2 "subnet": "10.56.217.0/24", "rangeStart": "10.56.217.171", "rangeEnd": "10.56.217.181", "routes": [ {"dst": "224.0.0.0/5"}, {"dst": "232.0.0.0/5"} ], "gateway": "10.56.217.1" } resourceName: example
Copy to Clipboard Copied! Create a pod with multicast application:
apiVersion: v1 kind: Pod metadata: name: testpmd namespace: default annotations: k8s.v1.cni.cncf.io/networks: nic1 spec: containers: - name: example image: rhel7:latest securityContext: capabilities: add: ["NET_ADMIN"] command: [ "sleep", "infinity"]
apiVersion: v1 kind: Pod metadata: name: testpmd namespace: default annotations: k8s.v1.cni.cncf.io/networks: nic1 spec: containers: - name: example image: rhel7:latest securityContext: capabilities: add: ["NET_ADMIN"]
1 command: [ "sleep", "infinity"]
Copy to Clipboard Copied! - 1
- The
NET_ADMIN
capability is required only if your application needs to assign the multicast IP address to the SR-IOV interface. Otherwise, it can be omitted.
25.10. Using DPDK and RDMA
The containerized Data Plane Development Kit (DPDK) application is supported on OpenShift Container Platform. You can use Single Root I/O Virtualization (SR-IOV) network hardware with the Data Plane Development Kit (DPDK) and with remote direct memory access (RDMA).
For information about supported devices, see Supported devices.
25.10.1. Using a virtual function in DPDK mode with an Intel NIC
Prerequisites
-
Install the OpenShift CLI (
oc
). - Install the SR-IOV Network Operator.
-
Log in as a user with
cluster-admin
privileges.
Procedure
Create the following
SriovNetworkNodePolicy
object, and then save the YAML in theintel-dpdk-node-policy.yaml
file.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: intel-dpdk-node-policy namespace: openshift-sriov-network-operator spec: resourceName: intelnics nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> numVfs: <num> nicSelector: vendor: "8086" deviceID: "158b" pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", "..."] deviceType: vfio-pci
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: intel-dpdk-node-policy namespace: openshift-sriov-network-operator spec: resourceName: intelnics nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> numVfs: <num> nicSelector: vendor: "8086" deviceID: "158b" pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", "..."] deviceType: vfio-pci
1 Copy to Clipboard Copied! - 1
- Specify the driver type for the virtual functions to
vfio-pci
.
NoteSee the
Configuring SR-IOV network devices
section for a detailed explanation on each option inSriovNetworkNodePolicy
.When applying the configuration specified in a
SriovNetworkNodePolicy
object, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.After the configuration update is applied, all the pods in
openshift-sriov-network-operator
namespace will change to aRunning
status.Create the
SriovNetworkNodePolicy
object by running the following command:oc create -f intel-dpdk-node-policy.yaml
$ oc create -f intel-dpdk-node-policy.yaml
Copy to Clipboard Copied! Create the following
SriovNetwork
object, and then save the YAML in theintel-dpdk-network.yaml
file.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: intel-dpdk-network namespace: openshift-sriov-network-operator spec: networkNamespace: <target_namespace> ipam: |- # ... vlan: <vlan> resourceName: intelnics
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: intel-dpdk-network namespace: openshift-sriov-network-operator spec: networkNamespace: <target_namespace> ipam: |- # ...
1 vlan: <vlan> resourceName: intelnics
Copy to Clipboard Copied! - 1
- Specify a configuration object for the ipam CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
NoteSee the "Configuring SR-IOV additional network" section for a detailed explanation on each option in
SriovNetwork
.An optional library, app-netutil, provides several API methods for gathering network information about a container’s parent pod.
Create the
SriovNetwork
object by running the following command:oc create -f intel-dpdk-network.yaml
$ oc create -f intel-dpdk-network.yaml
Copy to Clipboard Copied! Create the following
Pod
spec, and then save the YAML in theintel-dpdk-pod.yaml
file.apiVersion: v1 kind: Pod metadata: name: dpdk-app namespace: <target_namespace> annotations: k8s.v1.cni.cncf.io/networks: intel-dpdk-network spec: containers: - name: testpmd image: <DPDK_image> securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] volumeMounts: - mountPath: /mnt/huge name: hugepage resources: limits: openshift.io/intelnics: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" requests: openshift.io/intelnics: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1 kind: Pod metadata: name: dpdk-app namespace: <target_namespace>
1 annotations: k8s.v1.cni.cncf.io/networks: intel-dpdk-network spec: containers: - name: testpmd image: <DPDK_image>
2 securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
3 volumeMounts: - mountPath: /mnt/huge
4 name: hugepage resources: limits: openshift.io/intelnics: "1"
5 memory: "1Gi" cpu: "4"
6 hugepages-1Gi: "4Gi"
7 requests: openshift.io/intelnics: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
Copy to Clipboard Copied! - 1
- Specify the same
target_namespace
where theSriovNetwork
objectintel-dpdk-network
is created. If you would like to create the pod in a different namespace, changetarget_namespace
in both thePod
spec and theSriovNetwork
object. - 2
- Specify the DPDK image which includes your application and the DPDK library used by application.
- 3
- Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
- 4
- Mount a hugepage volume to the DPDK pod under
/mnt/huge
. The hugepage volume is backed by the emptyDir volume type with the medium beingHugepages
. - 5
- Optional: Specify the number of DPDK devices allocated to DPDK pod. This resource request and limit, if not explicitly specified, will be automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting
enableInjector
option tofalse
in the defaultSriovOperatorConfig
CR. - 6
- Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting CPU Manager policy to
static
and creating a pod withGuaranteed
QoS. - 7
- Specify hugepage size
hugepages-1Gi
orhugepages-2Mi
and the quantity of hugepages that will be allocated to the DPDK pod. Configure2Mi
and1Gi
hugepages separately. Configuring1Gi
hugepage requires adding kernel arguments to Nodes. For example, adding kernel argumentsdefault_hugepagesz=1GB
,hugepagesz=1G
andhugepages=16
will result in16*1Gi
hugepages be allocated during system boot.
Create the DPDK pod by running the following command:
oc create -f intel-dpdk-pod.yaml
$ oc create -f intel-dpdk-pod.yaml
Copy to Clipboard Copied!
25.10.2. Using a virtual function in DPDK mode with a Mellanox NIC
You can create a network node policy and create a Data Plane Development Kit (DPDK) pod using a virtual function in DPDK mode with a Mellanox NIC.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have installed the Single Root I/O Virtualization (SR-IOV) Network Operator.
-
You have logged in as a user with
cluster-admin
privileges.
Procedure
Save the following
SriovNetworkNodePolicy
YAML configuration to anmlx-dpdk-node-policy.yaml
file:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: mlx-dpdk-node-policy namespace: openshift-sriov-network-operator spec: resourceName: mlxnics nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> numVfs: <num> nicSelector: vendor: "15b3" deviceID: "1015" pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", "..."] deviceType: netdevice isRdma: true
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: mlx-dpdk-node-policy namespace: openshift-sriov-network-operator spec: resourceName: mlxnics nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> numVfs: <num> nicSelector: vendor: "15b3" deviceID: "1015"
1 pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", "..."] deviceType: netdevice
2 isRdma: true
3 Copy to Clipboard Copied! - 1
- Specify the device hex code of the SR-IOV network device.
- 2
- Specify the driver type for the virtual functions to
netdevice
. A Mellanox SR-IOV Virtual Function (VF) can work in DPDK mode without using thevfio-pci
device type. The VF device appears as a kernel network interface inside a container. - 3
- Enable Remote Direct Memory Access (RDMA) mode. This is required for Mellanox cards to work in DPDK mode.
NoteSee Configuring an SR-IOV network device for a detailed explanation of each option in the
SriovNetworkNodePolicy
object.When applying the configuration specified in an
SriovNetworkNodePolicy
object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.After the configuration update is applied, all the pods in the
openshift-sriov-network-operator
namespace will change to aRunning
status.Create the
SriovNetworkNodePolicy
object by running the following command:oc create -f mlx-dpdk-node-policy.yaml
$ oc create -f mlx-dpdk-node-policy.yaml
Copy to Clipboard Copied! Save the following
SriovNetwork
YAML configuration to anmlx-dpdk-network.yaml
file:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: mlx-dpdk-network namespace: openshift-sriov-network-operator spec: networkNamespace: <target_namespace> ipam: |- ... vlan: <vlan> resourceName: mlxnics
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: mlx-dpdk-network namespace: openshift-sriov-network-operator spec: networkNamespace: <target_namespace> ipam: |-
1 ... vlan: <vlan> resourceName: mlxnics
Copy to Clipboard Copied! - 1
- Specify a configuration object for the IP Address Management (IPAM) Container Network Interface (CNI) plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
NoteSee Configuring an SR-IOV network device for a detailed explanation on each option in the
SriovNetwork
object.The
app-netutil
option library provides several API methods for gathering network information about the parent pod of a container.Create the
SriovNetwork
object by running the following command:oc create -f mlx-dpdk-network.yaml
$ oc create -f mlx-dpdk-network.yaml
Copy to Clipboard Copied! Save the following
Pod
YAML configuration to anmlx-dpdk-pod.yaml
file:apiVersion: v1 kind: Pod metadata: name: dpdk-app namespace: <target_namespace> annotations: k8s.v1.cni.cncf.io/networks: mlx-dpdk-network spec: containers: - name: testpmd image: <DPDK_image> securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] volumeMounts: - mountPath: /mnt/huge name: hugepage resources: limits: openshift.io/mlxnics: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" requests: openshift.io/mlxnics: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1 kind: Pod metadata: name: dpdk-app namespace: <target_namespace>
1 annotations: k8s.v1.cni.cncf.io/networks: mlx-dpdk-network spec: containers: - name: testpmd image: <DPDK_image>
2 securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
3 volumeMounts: - mountPath: /mnt/huge
4 name: hugepage resources: limits: openshift.io/mlxnics: "1"
5 memory: "1Gi" cpu: "4"
6 hugepages-1Gi: "4Gi"
7 requests: openshift.io/mlxnics: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
Copy to Clipboard Copied! - 1
- Specify the same
target_namespace
whereSriovNetwork
objectmlx-dpdk-network
is created. To create the pod in a different namespace, changetarget_namespace
in both thePod
spec andSriovNetwork
object. - 2
- Specify the DPDK image which includes your application and the DPDK library used by the application.
- 3
- Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
- 4
- Mount the hugepage volume to the DPDK pod under
/mnt/huge
. The hugepage volume is backed by theemptyDir
volume type with the medium beingHugepages
. - 5
- Optional: Specify the number of DPDK devices allocated for the DPDK pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by SR-IOV Operator. It is enabled by default and can be disabled by setting the
enableInjector
option tofalse
in the defaultSriovOperatorConfig
CR. - 6
- Specify the number of CPUs. The DPDK pod usually requires that exclusive CPUs be allocated from the kubelet. To do this, set the CPU Manager policy to
static
and create a pod withGuaranteed
Quality of Service (QoS). - 7
- Specify hugepage size
hugepages-1Gi
orhugepages-2Mi
and the quantity of hugepages that will be allocated to the DPDK pod. Configure2Mi
and1Gi
hugepages separately. Configuring1Gi
hugepages requires adding kernel arguments to Nodes.
Create the DPDK pod by running the following command:
oc create -f mlx-dpdk-pod.yaml
$ oc create -f mlx-dpdk-pod.yaml
Copy to Clipboard Copied!
25.10.3. Using the TAP CNI to run a rootless DPDK workload with kernel access
DPDK applications can use virtio-user
as an exception path to inject certain types of packets, such as log messages, into the kernel for processing. For more information about this feature, see Virtio_user as Exception Path.
In OpenShift Container Platform version 4.14 and later, you can use non-privileged pods to run DPDK applications alongside the tap CNI plugin. To enable this functionality, you need to mount the vhost-net
device by setting the needVhostNet
parameter to true
within the SriovNetworkNodePolicy
object.
Figure 25.1. DPDK and TAP example configuration

Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have installed the SR-IOV Network Operator.
-
You are logged in as a user with
cluster-admin
privileges. Ensure that
setsebools container_use_devices=on
is set as root on all nodes.NoteUse the Machine Config Operator to set this SELinux boolean.
Procedure
Create a file, such as
test-namespace.yaml
, with content like the following example:apiVersion: v1 kind: Namespace metadata: name: test-namespace labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/audit: privileged pod-security.kubernetes.io/warn: privileged security.openshift.io/scc.podSecurityLabelSync: "false"
apiVersion: v1 kind: Namespace metadata: name: test-namespace labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/audit: privileged pod-security.kubernetes.io/warn: privileged security.openshift.io/scc.podSecurityLabelSync: "false"
Copy to Clipboard Copied! Create the new
Namespace
object by running the following command:oc apply -f test-namespace.yaml
$ oc apply -f test-namespace.yaml
Copy to Clipboard Copied! Create a file, such as
sriov-node-network-policy.yaml
, with content like the following example::apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriovnic namespace: openshift-sriov-network-operator spec: deviceType: netdevice isRdma: true needVhostNet: true nicSelector: vendor: "15b3" deviceID: "101b" rootDevices: ["00:05.0"] numVfs: 10 priority: 99 resourceName: sriovnic nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true"
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriovnic namespace: openshift-sriov-network-operator spec: deviceType: netdevice
1 isRdma: true
2 needVhostNet: true
3 nicSelector: vendor: "15b3"
4 deviceID: "101b"
5 rootDevices: ["00:05.0"] numVfs: 10 priority: 99 resourceName: sriovnic nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true"
Copy to Clipboard Copied! - 1
- This indicates that the profile is tailored specifically for Mellanox Network Interface Controllers (NICs).
- 2
- Setting
isRdma
totrue
is only required for a Mellanox NIC. - 3
- This mounts the
/dev/net/tun
and/dev/vhost-net
devices into the container so the application can create a tap device and connect the tap device to the DPDK workload. - 4
- The vendor hexadecimal code of the SR-IOV network device. The value 15b3 is associated with a Mellanox NIC.
- 5
- The device hexadecimal code of the SR-IOV network device.
Create the
SriovNetworkNodePolicy
object by running the following command:oc create -f sriov-node-network-policy.yaml
$ oc create -f sriov-node-network-policy.yaml
Copy to Clipboard Copied! Create the following
SriovNetwork
object, and then save the YAML in thesriov-network-attachment.yaml
file:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-network namespace: openshift-sriov-network-operator spec: networkNamespace: test-namespace resourceName: sriovnic spoofChk: "off" trust: "on"
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: sriov-network namespace: openshift-sriov-network-operator spec: networkNamespace: test-namespace resourceName: sriovnic spoofChk: "off" trust: "on"
Copy to Clipboard Copied! NoteSee the "Configuring SR-IOV additional network" section for a detailed explanation on each option in
SriovNetwork
.An optional library,
app-netutil
, provides several API methods for gathering network information about a container’s parent pod.Create the
SriovNetwork
object by running the following command:oc create -f sriov-network-attachment.yaml
$ oc create -f sriov-network-attachment.yaml
Copy to Clipboard Copied! Create a file, such as
tap-example.yaml
, that defines a network attachment definition, with content like the following example:apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: tap-one namespace: test-namespace spec: config: '{ "cniVersion": "0.4.0", "name": "tap", "plugins": [ { "type": "tap", "multiQueue": true, "selinuxcontext": "system_u:system_r:container_t:s0" }, { "type":"tuning", "capabilities":{ "mac":true } } ] }'
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: tap-one namespace: test-namespace
1 spec: config: '{ "cniVersion": "0.4.0", "name": "tap", "plugins": [ { "type": "tap", "multiQueue": true, "selinuxcontext": "system_u:system_r:container_t:s0" }, { "type":"tuning", "capabilities":{ "mac":true } } ] }'
Copy to Clipboard Copied! - 1
- Specify the same
target_namespace
where theSriovNetwork
object is created.
Create the
NetworkAttachmentDefinition
object by running the following command:oc apply -f tap-example.yaml
$ oc apply -f tap-example.yaml
Copy to Clipboard Copied! Create a file, such as
dpdk-pod-rootless.yaml
, with content like the following example:apiVersion: v1 kind: Pod metadata: name: dpdk-app namespace: test-namespace annotations: k8s.v1.cni.cncf.io/networks: '[ {"name": "sriov-network", "namespace": "test-namespace"}, {"name": "tap-one", "interface": "ext0", "namespace": "test-namespace"}]' spec: nodeSelector: kubernetes.io/hostname: "worker-0" securityContext: fsGroup: 1001 runAsGroup: 1001 seccompProfile: type: RuntimeDefault containers: - name: testpmd image: <DPDK_image> securityContext: capabilities: drop: ["ALL"] add: - IPC_LOCK - NET_RAW #for mlx only runAsUser: 1001 privileged: false allowPrivilegeEscalation: true runAsNonRoot: true volumeMounts: - mountPath: /mnt/huge name: hugepages resources: limits: openshift.io/sriovnic: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" requests: openshift.io/sriovnic: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] runtimeClassName: performance-cnf-performanceprofile volumes: - name: hugepages emptyDir: medium: HugePages
apiVersion: v1 kind: Pod metadata: name: dpdk-app namespace: test-namespace
1 annotations: k8s.v1.cni.cncf.io/networks: '[ {"name": "sriov-network", "namespace": "test-namespace"}, {"name": "tap-one", "interface": "ext0", "namespace": "test-namespace"}]' spec: nodeSelector: kubernetes.io/hostname: "worker-0" securityContext: fsGroup: 1001
2 runAsGroup: 1001
3 seccompProfile: type: RuntimeDefault containers: - name: testpmd image: <DPDK_image>
4 securityContext: capabilities: drop: ["ALL"]
5 add:
6 - IPC_LOCK - NET_RAW #for mlx only
7 runAsUser: 1001
8 privileged: false
9 allowPrivilegeEscalation: true
10 runAsNonRoot: true
11 volumeMounts: - mountPath: /mnt/huge
12 name: hugepages resources: limits: openshift.io/sriovnic: "1"
13 memory: "1Gi" cpu: "4"
14 hugepages-1Gi: "4Gi"
15 requests: openshift.io/sriovnic: "1" memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] runtimeClassName: performance-cnf-performanceprofile
16 volumes: - name: hugepages emptyDir: medium: HugePages
Copy to Clipboard Copied! - 1
- Specify the same
target_namespace
in which theSriovNetwork
object is created. If you want to create the pod in a different namespace, changetarget_namespace
in both thePod
spec and theSriovNetwork
object. - 2
- Sets the group ownership of volume-mounted directories and files created in those volumes.
- 3
- Specify the primary group ID used for running the container.
- 4
- Specify the DPDK image that contains your application and the DPDK library used by application.
- 5
- Removing all capabilities (
ALL
) from the container’s securityContext means that the container has no special privileges beyond what is necessary for normal operation. - 6
- Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access. These capabilities must also be set in the binary file by using the
setcap
command. - 7
- Mellanox network interface controller (NIC) requires the
NET_RAW
capability. - 8
- Specify the user ID used for running the container.
- 9
- This setting indicates that the container or containers within the pod should not be granted privileged access to the host system.
- 10
- This setting allows a container to escalate its privileges beyond the initial non-root privileges it might have been assigned.
- 11
- This setting ensures that the container runs with a non-root user. This helps enforce the principle of least privilege, limiting the potential impact of compromising the container and reducing the attack surface.
- 12
- Mount a hugepage volume to the DPDK pod under
/mnt/huge
. The hugepage volume is backed by the emptyDir volume type with the medium beingHugepages
. - 13
- Optional: Specify the number of DPDK devices allocated for the DPDK pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by SR-IOV Operator. It is enabled by default and can be disabled by setting the
enableInjector
option tofalse
in the defaultSriovOperatorConfig
CR. - 14
- Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting CPU Manager policy to
static
and creating a pod withGuaranteed
QoS. - 15
- Specify hugepage size
hugepages-1Gi
orhugepages-2Mi
and the quantity of hugepages that will be allocated to the DPDK pod. Configure2Mi
and1Gi
hugepages separately. Configuring1Gi
hugepage requires adding kernel arguments to Nodes. For example, adding kernel argumentsdefault_hugepagesz=1GB
,hugepagesz=1G
andhugepages=16
will result in16*1Gi
hugepages be allocated during system boot. - 16
- If your performance profile is not named
cnf-performance profile
, replace that string with the correct performance profile name.
Create the DPDK pod by running the following command:
oc create -f dpdk-pod-rootless.yaml
$ oc create -f dpdk-pod-rootless.yaml
Copy to Clipboard Copied!
25.10.4. Overview of achieving a specific DPDK line rate
To achieve a specific Data Plane Development Kit (DPDK) line rate, deploy a Node Tuning Operator and configure Single Root I/O Virtualization (SR-IOV). You must also tune the DPDK settings for the following resources:
- Isolated CPUs
- Hugepages
- The topology scheduler
In previous versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift Container Platform applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.
DPDK test environment
The following diagram shows the components of a traffic-testing environment:

- Traffic generator: An application that can generate high-volume packet traffic.
- SR-IOV-supporting NIC: A network interface card compatible with SR-IOV. The card runs a number of virtual functions on a physical interface.
- Physical Function (PF): A PCI Express (PCIe) function of a network adapter that supports the SR-IOV interface.
- Virtual Function (VF): A lightweight PCIe function on a network adapter that supports SR-IOV. The VF is associated with the PCIe PF on the network adapter. The VF represents a virtualized instance of the network adapter.
- Switch: A network switch. Nodes can also be connected back-to-back.
-
testpmd
: An example application included with DPDK. Thetestpmd
application can be used to test the DPDK in a packet-forwarding mode. Thetestpmd
application is also an example of how to build a fully-fledged application using the DPDK Software Development Kit (SDK). - worker 0 and worker 1: OpenShift Container Platform nodes.
25.10.5. Using SR-IOV and the Node Tuning Operator to achieve a DPDK line rate
You can use the Node Tuning Operator to configure isolated CPUs, hugepages, and a topology scheduler. You can then use the Node Tuning Operator with Single Root I/O Virtualization (SR-IOV) to achieve a specific Data Plane Development Kit (DPDK) line rate.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have installed the SR-IOV Network Operator.
-
You have logged in as a user with
cluster-admin
privileges. You have deployed a standalone Node Tuning Operator.
NoteIn previous versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.
Procedure
Create a
PerformanceProfile
object based on the following example:apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: performance spec: globallyDisableIrqLoadBalancing: true cpu: isolated: 21-51,73-103 reserved: 0-20,52-72 hugepages: defaultHugepagesSize: 1G pages: - count: 32 size: 1G net: userLevelNetworking: true numa: topologyPolicy: "single-numa-node" nodeSelector: node-role.kubernetes.io/worker-cnf: ""
apiVersion: performance.openshift.io/v2 kind: PerformanceProfile metadata: name: performance spec: globallyDisableIrqLoadBalancing: true cpu: isolated: 21-51,73-103
1 reserved: 0-20,52-72
2 hugepages: defaultHugepagesSize: 1G
3 pages: - count: 32 size: 1G net: userLevelNetworking: true numa: topologyPolicy: "single-numa-node" nodeSelector: node-role.kubernetes.io/worker-cnf: ""
Copy to Clipboard Copied! - 1
- If hyperthreading is enabled on the system, allocate the relevant symbolic links to the
isolated
andreserved
CPU groups. If the system contains multiple non-uniform memory access nodes (NUMAs), allocate CPUs from both NUMAs to both groups. You can also use the Performance Profile Creator for this task. For more information, see Creating a performance profile. - 2
- You can also specify a list of devices that will have their queues set to the reserved CPU count. For more information, see Reducing NIC queues using the Node Tuning Operator.
- 3
- Allocate the number and size of hugepages needed. You can specify the NUMA configuration for the hugepages. By default, the system allocates an even number to every NUMA node on the system. If needed, you can request the use of a realtime kernel for the nodes. See Provisioning a worker with real-time capabilities for more information.
-
Save the
yaml
file asmlx-dpdk-perfprofile-policy.yaml
. Apply the performance profile using the following command:
oc create -f mlx-dpdk-perfprofile-policy.yaml
$ oc create -f mlx-dpdk-perfprofile-policy.yaml
Copy to Clipboard Copied!
25.10.5.1. Example SR-IOV Network Operator for virtual functions
You can use the Single Root I/O Virtualization (SR-IOV) Network Operator to allocate and configure Virtual Functions (VFs) from SR-IOV-supporting Physical Function NICs on the nodes.
For more information on deploying the Operator, see Installing the SR-IOV Network Operator. For more information on configuring an SR-IOV network device, see Configuring an SR-IOV network device.
There are some differences between running Data Plane Development Kit (DPDK) workloads on Intel VFs and Mellanox VFs. This section provides object configuration examples for both VF types. The following is an example of an sriovNetworkNodePolicy
object used to run DPDK applications on Intel NICs:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: dpdk-nic-1 namespace: openshift-sriov-network-operator spec: deviceType: vfio-pci needVhostNet: true nicSelector: pfNames: ["ens3f0"] nodeSelector: node-role.kubernetes.io/worker-cnf: "" numVfs: 10 priority: 99 resourceName: dpdk_nic_1 --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: dpdk-nic-1 namespace: openshift-sriov-network-operator spec: deviceType: vfio-pci needVhostNet: true nicSelector: pfNames: ["ens3f1"] nodeSelector: node-role.kubernetes.io/worker-cnf: "" numVfs: 10 priority: 99 resourceName: dpdk_nic_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: dpdk-nic-1
namespace: openshift-sriov-network-operator
spec:
deviceType: vfio-pci
needVhostNet: true
nicSelector:
pfNames: ["ens3f0"]
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
numVfs: 10
priority: 99
resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: dpdk-nic-1
namespace: openshift-sriov-network-operator
spec:
deviceType: vfio-pci
needVhostNet: true
nicSelector:
pfNames: ["ens3f1"]
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
numVfs: 10
priority: 99
resourceName: dpdk_nic_2
The following is an example of an sriovNetworkNodePolicy
object for Mellanox NICs:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: dpdk-nic-1 namespace: openshift-sriov-network-operator spec: deviceType: netdevice isRdma: true nicSelector: rootDevices: - "0000:5e:00.1" nodeSelector: node-role.kubernetes.io/worker-cnf: "" numVfs: 5 priority: 99 resourceName: dpdk_nic_1 --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: dpdk-nic-2 namespace: openshift-sriov-network-operator spec: deviceType: netdevice isRdma: true nicSelector: rootDevices: - "0000:5e:00.0" nodeSelector: node-role.kubernetes.io/worker-cnf: "" numVfs: 5 priority: 99 resourceName: dpdk_nic_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: dpdk-nic-1
namespace: openshift-sriov-network-operator
spec:
deviceType: netdevice
isRdma: true
nicSelector:
rootDevices:
- "0000:5e:00.1"
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
numVfs: 5
priority: 99
resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: dpdk-nic-2
namespace: openshift-sriov-network-operator
spec:
deviceType: netdevice
isRdma: true
nicSelector:
rootDevices:
- "0000:5e:00.0"
nodeSelector:
node-role.kubernetes.io/worker-cnf: ""
numVfs: 5
priority: 99
resourceName: dpdk_nic_2
25.10.5.2. Example SR-IOV network operator
The following is an example definition of an sriovNetwork
object. In this case, Intel and Mellanox configurations are identical:
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: dpdk-network-1 namespace: openshift-sriov-network-operator spec: ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],"dataDir": "/run/my-orchestrator/container-ipam-state-1"}' networkNamespace: dpdk-test spoofChk: "off" trust: "on" resourceName: dpdk_nic_1 --- apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: dpdk-network-2 namespace: openshift-sriov-network-operator spec: ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.2.0/24"}]],"dataDir": "/run/my-orchestrator/container-ipam-state-1"}' networkNamespace: dpdk-test spoofChk: "off" trust: "on" resourceName: dpdk_nic_2
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: dpdk-network-1
namespace: openshift-sriov-network-operator
spec:
ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],"dataDir":
"/run/my-orchestrator/container-ipam-state-1"}'
networkNamespace: dpdk-test
spoofChk: "off"
trust: "on"
resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
name: dpdk-network-2
namespace: openshift-sriov-network-operator
spec:
ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.2.0/24"}]],"dataDir":
"/run/my-orchestrator/container-ipam-state-1"}'
networkNamespace: dpdk-test
spoofChk: "off"
trust: "on"
resourceName: dpdk_nic_2
- 1
- You can use a different IP Address Management (IPAM) implementation, such as Whereabouts. For more information, see Dynamic IP address assignment configuration with Whereabouts.
- 2
- You must request the
networkNamespace
where the network attachment definition will be created. You must create thesriovNetwork
CR under theopenshift-sriov-network-operator
namespace. - 3
- The
resourceName
value must match that of theresourceName
created under thesriovNetworkNodePolicy
.
25.10.5.3. Example DPDK base workload
The following is an example of a Data Plane Development Kit (DPDK) container:
apiVersion: v1 kind: Namespace metadata: name: dpdk-test --- apiVersion: v1 kind: Pod metadata: annotations: k8s.v1.cni.cncf.io/networks: '[ { "name": "dpdk-network-1", "namespace": "dpdk-test" }, { "name": "dpdk-network-2", "namespace": "dpdk-test" } ]' irq-load-balancing.crio.io: "disable" cpu-load-balancing.crio.io: "disable" cpu-quota.crio.io: "disable" labels: app: dpdk name: testpmd namespace: dpdk-test spec: runtimeClassName: performance-performance containers: - command: - /bin/bash - -c - sleep INF image: registry.redhat.io/openshift4/dpdk-base-rhel8 imagePullPolicy: Always name: dpdk resources: limits: cpu: "16" hugepages-1Gi: 8Gi memory: 2Gi requests: cpu: "16" hugepages-1Gi: 8Gi memory: 2Gi securityContext: capabilities: add: - IPC_LOCK - SYS_RESOURCE - NET_RAW - NET_ADMIN runAsUser: 0 volumeMounts: - mountPath: /mnt/huge name: hugepages terminationGracePeriodSeconds: 5 volumes: - emptyDir: medium: HugePages name: hugepages
apiVersion: v1
kind: Namespace
metadata:
name: dpdk-test
---
apiVersion: v1
kind: Pod
metadata:
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "dpdk-network-1",
"namespace": "dpdk-test"
},
{
"name": "dpdk-network-2",
"namespace": "dpdk-test"
}
]'
irq-load-balancing.crio.io: "disable"
cpu-load-balancing.crio.io: "disable"
cpu-quota.crio.io: "disable"
labels:
app: dpdk
name: testpmd
namespace: dpdk-test
spec:
runtimeClassName: performance-performance
containers:
- command:
- /bin/bash
- -c
- sleep INF
image: registry.redhat.io/openshift4/dpdk-base-rhel8
imagePullPolicy: Always
name: dpdk
resources:
limits:
cpu: "16"
hugepages-1Gi: 8Gi
memory: 2Gi
requests:
cpu: "16"
hugepages-1Gi: 8Gi
memory: 2Gi
securityContext:
capabilities:
add:
- IPC_LOCK
- SYS_RESOURCE
- NET_RAW
- NET_ADMIN
runAsUser: 0
volumeMounts:
- mountPath: /mnt/huge
name: hugepages
terminationGracePeriodSeconds: 5
volumes:
- emptyDir:
medium: HugePages
name: hugepages
- 1
- Request the SR-IOV networks you need. Resources for the devices will be injected automatically.
- 2
- Disable the CPU and IRQ load balancing base. See Disabling interrupt processing for individual pods for more information.
- 3
- Set the
runtimeClass
toperformance-performance
. Do not set theruntimeClass
toHostNetwork
orprivileged
. - 4
- Request an equal number of resources for requests and limits to start the pod with
Guaranteed
Quality of Service (QoS).
Do not start the pod with SLEEP
and then exec into the pod to start the testpmd or the DPDK workload. This can add additional interrupts as the exec
process is not pinned to any CPU.
25.10.5.4. Example testpmd script
The following is an example script for running testpmd
:
#!/bin/bash set -ex export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus) echo ${CPU} dpdk-testpmd -l ${CPU} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2} -n 4 -- -i --nb-cores=15 --rxd=4096 --txd=4096 --rxq=7 --txq=7 --forward-mode=mac --eth-peer=0,50:00:00:00:00:01 --eth-peer=1,50:00:00:00:00:02
#!/bin/bash
set -ex
export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
echo ${CPU}
dpdk-testpmd -l ${CPU} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2} -n 4 -- -i --nb-cores=15 --rxd=4096 --txd=4096 --rxq=7 --txq=7 --forward-mode=mac --eth-peer=0,50:00:00:00:00:01 --eth-peer=1,50:00:00:00:00:02
This example uses two different sriovNetwork
CRs. The environment variable contains the Virtual Function (VF) PCI address that was allocated for the pod. If you use the same network in the pod definition, you must split the pciAddress
. It is important to configure the correct MAC addresses of the traffic generator. This example uses custom MAC addresses.
25.10.6. Using a virtual function in RDMA mode with a Mellanox NIC
RDMA over Converged Ethernet (RoCE) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OpenShift Container Platform.
Prerequisites
-
Install the OpenShift CLI (
oc
). - Install the SR-IOV Network Operator.
-
Log in as a user with
cluster-admin
privileges.
Procedure
Create the following
SriovNetworkNodePolicy
object, and then save the YAML in themlx-rdma-node-policy.yaml
file.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: mlx-rdma-node-policy namespace: openshift-sriov-network-operator spec: resourceName: mlxnics nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> numVfs: <num> nicSelector: vendor: "15b3" deviceID: "1015" pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", "..."] deviceType: netdevice isRdma: true
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: mlx-rdma-node-policy namespace: openshift-sriov-network-operator spec: resourceName: mlxnics nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" priority: <priority> numVfs: <num> nicSelector: vendor: "15b3" deviceID: "1015"
1 pfNames: ["<pf_name>", ...] rootDevices: ["<pci_bus_id>", "..."] deviceType: netdevice
2 isRdma: true
3 Copy to Clipboard Copied! NoteSee the
Configuring SR-IOV network devices
section for a detailed explanation on each option inSriovNetworkNodePolicy
.When applying the configuration specified in a
SriovNetworkNodePolicy
object, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.After the configuration update is applied, all the pods in the
openshift-sriov-network-operator
namespace will change to aRunning
status.Create the
SriovNetworkNodePolicy
object by running the following command:oc create -f mlx-rdma-node-policy.yaml
$ oc create -f mlx-rdma-node-policy.yaml
Copy to Clipboard Copied! Create the following
SriovNetwork
object, and then save the YAML in themlx-rdma-network.yaml
file.apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: mlx-rdma-network namespace: openshift-sriov-network-operator spec: networkNamespace: <target_namespace> ipam: |- # ... vlan: <vlan> resourceName: mlxnics
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetwork metadata: name: mlx-rdma-network namespace: openshift-sriov-network-operator spec: networkNamespace: <target_namespace> ipam: |-
1 # ... vlan: <vlan> resourceName: mlxnics
Copy to Clipboard Copied! - 1
- Specify a configuration object for the ipam CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
NoteSee the "Configuring SR-IOV additional network" section for a detailed explanation on each option in
SriovNetwork
.An optional library, app-netutil, provides several API methods for gathering network information about a container’s parent pod.
Create the
SriovNetworkNodePolicy
object by running the following command:oc create -f mlx-rdma-network.yaml
$ oc create -f mlx-rdma-network.yaml
Copy to Clipboard Copied! Create the following
Pod
spec, and then save the YAML in themlx-rdma-pod.yaml
file.apiVersion: v1 kind: Pod metadata: name: rdma-app namespace: <target_namespace> annotations: k8s.v1.cni.cncf.io/networks: mlx-rdma-network spec: containers: - name: testpmd image: <RDMA_image> securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] volumeMounts: - mountPath: /mnt/huge name: hugepage resources: limits: memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" requests: memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1 kind: Pod metadata: name: rdma-app namespace: <target_namespace>
1 annotations: k8s.v1.cni.cncf.io/networks: mlx-rdma-network spec: containers: - name: testpmd image: <RDMA_image>
2 securityContext: runAsUser: 0 capabilities: add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
3 volumeMounts: - mountPath: /mnt/huge
4 name: hugepage resources: limits: memory: "1Gi" cpu: "4"
5 hugepages-1Gi: "4Gi"
6 requests: memory: "1Gi" cpu: "4" hugepages-1Gi: "4Gi" command: ["sleep", "infinity"] volumes: - name: hugepage emptyDir: medium: HugePages
Copy to Clipboard Copied! - 1
- Specify the same
target_namespace
whereSriovNetwork
objectmlx-rdma-network
is created. If you would like to create the pod in a different namespace, changetarget_namespace
in bothPod
spec andSriovNetwork
object. - 2
- Specify the RDMA image which includes your application and RDMA library used by application.
- 3
- Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
- 4
- Mount the hugepage volume to RDMA pod under
/mnt/huge
. The hugepage volume is backed by the emptyDir volume type with the medium beingHugepages
. - 5
- Specify number of CPUs. The RDMA pod usually requires exclusive CPUs be allocated from the kubelet. This is achieved by setting CPU Manager policy to
static
and create pod withGuaranteed
QoS. - 6
- Specify hugepage size
hugepages-1Gi
orhugepages-2Mi
and the quantity of hugepages that will be allocated to the RDMA pod. Configure2Mi
and1Gi
hugepages separately. Configuring1Gi
hugepage requires adding kernel arguments to Nodes.
Create the RDMA pod by running the following command:
oc create -f mlx-rdma-pod.yaml
$ oc create -f mlx-rdma-pod.yaml
Copy to Clipboard Copied!
25.10.7. A test pod template for clusters that use OVS-DPDK on OpenStack
The following testpmd
pod demonstrates container creation with huge pages, reserved CPUs, and the SR-IOV port.
An example testpmd
pod
apiVersion: v1 kind: Pod metadata: name: testpmd-dpdk namespace: mynamespace annotations: cpu-load-balancing.crio.io: "disable" cpu-quota.crio.io: "disable" # ... spec: containers: - name: testpmd command: ["sleep", "99999"] image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9 securityContext: capabilities: add: ["IPC_LOCK","SYS_ADMIN"] privileged: true runAsUser: 0 resources: requests: memory: 1000Mi hugepages-1Gi: 1Gi cpu: '2' openshift.io/dpdk1: 1 limits: hugepages-1Gi: 1Gi cpu: '2' memory: 1000Mi openshift.io/dpdk1: 1 volumeMounts: - mountPath: /mnt/huge name: hugepage readOnly: False runtimeClassName: performance-cnf-performanceprofile volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1
kind: Pod
metadata:
name: testpmd-dpdk
namespace: mynamespace
annotations:
cpu-load-balancing.crio.io: "disable"
cpu-quota.crio.io: "disable"
# ...
spec:
containers:
- name: testpmd
command: ["sleep", "99999"]
image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
securityContext:
capabilities:
add: ["IPC_LOCK","SYS_ADMIN"]
privileged: true
runAsUser: 0
resources:
requests:
memory: 1000Mi
hugepages-1Gi: 1Gi
cpu: '2'
openshift.io/dpdk1: 1
limits:
hugepages-1Gi: 1Gi
cpu: '2'
memory: 1000Mi
openshift.io/dpdk1: 1
volumeMounts:
- mountPath: /mnt/huge
name: hugepage
readOnly: False
runtimeClassName: performance-cnf-performanceprofile
volumes:
- name: hugepage
emptyDir:
medium: HugePages
25.10.8. A test pod template for clusters that use OVS hardware offloading on OpenStack
The following testpmd
pod demonstrates Open vSwitch (OVS) hardware offloading on Red Hat OpenStack Platform (RHOSP).
An example testpmd
pod
apiVersion: v1 kind: Pod metadata: name: testpmd-sriov namespace: mynamespace annotations: k8s.v1.cni.cncf.io/networks: hwoffload1 spec: runtimeClassName: performance-cnf-performanceprofile containers: - name: testpmd command: ["sleep", "99999"] image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9 securityContext: capabilities: add: ["IPC_LOCK","SYS_ADMIN"] privileged: true runAsUser: 0 resources: requests: memory: 1000Mi hugepages-1Gi: 1Gi cpu: '2' limits: hugepages-1Gi: 1Gi cpu: '2' memory: 1000Mi volumeMounts: - mountPath: /mnt/huge name: hugepage readOnly: False volumes: - name: hugepage emptyDir: medium: HugePages
apiVersion: v1
kind: Pod
metadata:
name: testpmd-sriov
namespace: mynamespace
annotations:
k8s.v1.cni.cncf.io/networks: hwoffload1
spec:
runtimeClassName: performance-cnf-performanceprofile
containers:
- name: testpmd
command: ["sleep", "99999"]
image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
securityContext:
capabilities:
add: ["IPC_LOCK","SYS_ADMIN"]
privileged: true
runAsUser: 0
resources:
requests:
memory: 1000Mi
hugepages-1Gi: 1Gi
cpu: '2'
limits:
hugepages-1Gi: 1Gi
cpu: '2'
memory: 1000Mi
volumeMounts:
- mountPath: /mnt/huge
name: hugepage
readOnly: False
volumes:
- name: hugepage
emptyDir:
medium: HugePages
- 1
- If your performance profile is not named
cnf-performance profile
, replace that string with the correct performance profile name.
25.11. Using pod-level bonding
Bonding at the pod level is vital to enable workloads inside pods that require high availability and more throughput. With pod-level bonding, you can create a bond interface from multiple single root I/O virtualization (SR-IOV) virtual function interfaces in a kernel mode interface. The SR-IOV virtual functions are passed into the pod and attached to a kernel driver.
One scenario where pod level bonding is required is creating a bond interface from multiple SR-IOV virtual functions on different physical functions. Creating a bond interface from two different physical functions on the host can be used to achieve high availability and throughput at pod level.
For guidance on tasks such as creating a SR-IOV network, network policies, network attachment definitions and pods, see Configuring an SR-IOV network device.
25.11.1. Configuring a bond interface from two SR-IOV interfaces
Bonding enables multiple network interfaces to be aggregated into a single logical "bonded" interface. Bond Container Network Interface (Bond-CNI) brings bond capability into containers.
Bond-CNI can be created using Single Root I/O Virtualization (SR-IOV) virtual functions and placing them in the container network namespace.
OpenShift Container Platform only supports Bond-CNI using SR-IOV virtual functions. The SR-IOV Network Operator provides the SR-IOV CNI plugin needed to manage the virtual functions. Other CNIs or types of interfaces are not supported.
Prerequisites
- The SR-IOV Network Operator must be installed and configured to obtain virtual functions in a container.
- To configure SR-IOV interfaces, an SR-IOV network and policy must be created for each interface.
- The SR-IOV Network Operator creates a network attachment definition for each SR-IOV interface, based on the SR-IOV network and policy defined.
-
The
linkState
is set to the default valueauto
for the SR-IOV virtual function.
25.11.1.1. Creating a bond network attachment definition
Now that the SR-IOV virtual functions are available, you can create a bond network attachment definition.
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: bond-net1 namespace: demo spec: config: '{ "type": "bond", "cniVersion": "0.3.1", "name": "bond-net1", "mode": "active-backup", "failOverMac": 1, "linksInContainer": true, "miimon": "100", "mtu": 1500, "links": [ {"name": "net1"}, {"name": "net2"} ], "ipam": { "type": "host-local", "subnet": "10.56.217.0/24", "routes": [{ "dst": "0.0.0.0/0" }], "gateway": "10.56.217.1" } }'
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: bond-net1
namespace: demo
spec:
config: '{
"type": "bond",
"cniVersion": "0.3.1",
"name": "bond-net1",
"mode": "active-backup",
"failOverMac": 1,
"linksInContainer": true,
"miimon": "100",
"mtu": 1500,
"links": [
{"name": "net1"},
{"name": "net2"}
],
"ipam": {
"type": "host-local",
"subnet": "10.56.217.0/24",
"routes": [{
"dst": "0.0.0.0/0"
}],
"gateway": "10.56.217.1"
}
}'
- 1
- The cni-type is always set to
bond
. - 2
- The
mode
attribute specifies the bonding mode.NoteThe bonding modes supported are:
-
balance-rr
- 0 -
active-backup
- 1 -
balance-xor
- 2
For
balance-rr
orbalance-xor
modes, you must set thetrust
mode toon
for the SR-IOV virtual function. -
- 3
- The
failover
attribute is mandatory for active-backup mode and must be set to 1. - 4
- The
linksInContainer=true
flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, Bond CNI looks for these interfaces on the host which does not work for integration with SRIOV and Multus. - 5
- The
links
section defines which interfaces will be used to create the bond. By default, Multus names the attached interfaces as: "net", plus a consecutive number, starting with one.
25.11.1.2. Creating a pod using a bond interface
Test the setup by creating a pod with a YAML file named for example
podbonding.yaml
with content similar to the following:apiVersion: v1 kind: Pod metadata: name: bondpod1 namespace: demo annotations: k8s.v1.cni.cncf.io/networks: demo/sriovnet1, demo/sriovnet2, demo/bond-net1 spec: containers: - name: podexample image: quay.io/openshift/origin-network-interface-bond-cni:4.11.0 command: ["/bin/bash", "-c", "sleep INF"]
apiVersion: v1 kind: Pod metadata: name: bondpod1 namespace: demo annotations: k8s.v1.cni.cncf.io/networks: demo/sriovnet1, demo/sriovnet2, demo/bond-net1
1 spec: containers: - name: podexample image: quay.io/openshift/origin-network-interface-bond-cni:4.11.0 command: ["/bin/bash", "-c", "sleep INF"]
Copy to Clipboard Copied! - 1
- Note the network annotation: it contains two SR-IOV network attachments, and one bond network attachment. The bond attachment uses the two SR-IOV interfaces as bonded port interfaces.
Apply the yaml by running the following command:
oc apply -f podbonding.yaml
$ oc apply -f podbonding.yaml
Copy to Clipboard Copied! Inspect the pod interfaces with the following command:
$ oc rsh -n demo bondpod1 sh-4.4# sh-4.4# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 3: eth0@if150: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP link/ether 62:b1:b5:c8:fb:7a brd ff:ff:ff:ff:ff:ff inet 10.244.1.122/24 brd 10.244.1.255 scope global eth0 valid_lft forever preferred_lft forever 4: net3: <BROADCAST,MULTICAST,UP,LOWER_UP400> mtu 1500 qdisc noqueue state UP qlen 1000 link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff inet 10.56.217.66/24 scope global bond0 valid_lft forever preferred_lft forever 43: net1: <BROADCAST,MULTICAST,UP,LOWER_UP800> mtu 1500 qdisc mq master bond0 state UP qlen 1000 link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff 44: net2: <BROADCAST,MULTICAST,UP,LOWER_UP800> mtu 1500 qdisc mq master bond0 state UP qlen 1000 link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff
$ oc rsh -n demo bondpod1 sh-4.4# sh-4.4# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever 3: eth0@if150: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP link/ether 62:b1:b5:c8:fb:7a brd ff:ff:ff:ff:ff:ff inet 10.244.1.122/24 brd 10.244.1.255 scope global eth0 valid_lft forever preferred_lft forever 4: net3: <BROADCAST,MULTICAST,UP,LOWER_UP400> mtu 1500 qdisc noqueue state UP qlen 1000 link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff
1 inet 10.56.217.66/24 scope global bond0 valid_lft forever preferred_lft forever 43: net1: <BROADCAST,MULTICAST,UP,LOWER_UP800> mtu 1500 qdisc mq master bond0 state UP qlen 1000 link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff
2 44: net2: <BROADCAST,MULTICAST,UP,LOWER_UP800> mtu 1500 qdisc mq master bond0 state UP qlen 1000 link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff
3 Copy to Clipboard Copied! NoteIf no interface names are configured in the pod annotation, interface names are assigned automatically as
net<n>
, with<n>
starting at1
.Optional: If you want to set a specific interface name for example
bond0
, edit thek8s.v1.cni.cncf.io/networks
annotation and setbond0
as the interface name as follows:annotations: k8s.v1.cni.cncf.io/networks: demo/sriovnet1, demo/sriovnet2, demo/bond-net1@bond0
annotations: k8s.v1.cni.cncf.io/networks: demo/sriovnet1, demo/sriovnet2, demo/bond-net1@bond0
Copy to Clipboard Copied!
25.12. Configuring hardware offloading
As a cluster administrator, you can configure hardware offloading on compatible nodes to increase data processing performance and reduce load on host CPUs.
25.12.1. About hardware offloading
Open vSwitch hardware offloading is a method of processing network tasks by diverting them away from the CPU and offloading them to a dedicated processor on a network interface controller. As a result, clusters can benefit from faster data transfer speeds, reduced CPU workloads, and lower computing costs.
The key element for this feature is a modern class of network interface controllers known as SmartNICs. A SmartNIC is a network interface controller that is able to handle computationally-heavy network processing tasks. In the same way that a dedicated graphics card can improve graphics performance, a SmartNIC can improve network performance. In each case, a dedicated processor improves performance for a specific type of processing task.
In OpenShift Container Platform, you can configure hardware offloading for bare metal nodes that have a compatible SmartNIC. Hardware offloading is configured and enabled by the SR-IOV Network Operator.
Hardware offloading is not compatible with all workloads or application types. Only the following two communication types are supported:
- pod-to-pod
- pod-to-service, where the service is a ClusterIP service backed by a regular pod
In all cases, hardware offloading takes place only when those pods and services are assigned to nodes that have a compatible SmartNIC. Suppose, for example, that a pod on a node with hardware offloading tries to communicate with a service on a regular node. On the regular node, all the processing takes place in the kernel, so the overall performance of the pod-to-service communication is limited to the maximum performance of that regular node. Hardware offloading is not compatible with DPDK applications.
Enabling hardware offloading on a node, but not configuring pods to use, it can result in decreased throughput performance for pod traffic. You cannot configure hardware offloading for pods that are managed by OpenShift Container Platform.
25.12.2. Supported devices
Hardware offloading is supported on the following network interface controllers:
Manufacturer | Model | Vendor ID | Device ID |
---|---|---|---|
Mellanox | MT27800 Family [ConnectX‑5] | 15b3 | 1017 |
Mellanox | MT28880 Family [ConnectX‑5 Ex] | 15b3 | 1019 |
Mellanox | MT2892 Family [ConnectX‑6 Dx] | 15b3 | 101d |
Mellanox | MT2894 Family [ConnectX-6 Lx] | 15b3 | 101f |
Mellanox | MT42822 BlueField-2 in ConnectX-6 NIC mode | 15b3 | a2d6 |
25.12.3. Prerequisites
- Your cluster has at least one bare metal machine with a network interface controller that is supported for hardware offloading.
- You installed the SR-IOV Network Operator.
- Your cluster uses the OVN-Kubernetes network plugin.
-
In your OVN-Kubernetes network plugin configuration, the
gatewayConfig.routingViaHost
field is set tofalse
.
25.12.4. Setting the SR-IOV Network Operator into systemd mode
To support hardware offloading, you must first set the SR-IOV Network Operator into systemd
mode.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user that has the
cluster-admin
role.
Procedure
Create a
SriovOperatorConfig
custom resource (CR) to deploy all the SR-IOV Operator components:Create a file named
sriovOperatorConfig.yaml
that contains the following YAML:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default namespace: openshift-sriov-network-operator spec: enableInjector: true enableOperatorWebhook: true configurationMode: "systemd" logLevel: 2
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovOperatorConfig metadata: name: default
1 namespace: openshift-sriov-network-operator spec: enableInjector: true enableOperatorWebhook: true configurationMode: "systemd"
2 logLevel: 2
Copy to Clipboard Copied! Create the resource by running the following command:
oc apply -f sriovOperatorConfig.yaml
$ oc apply -f sriovOperatorConfig.yaml
Copy to Clipboard Copied!
25.12.5. Configuring a machine config pool for hardware offloading
To enable hardware offloading, you now create a dedicated machine config pool and configure it to work with the SR-IOV Network Operator.
Prerequisites
-
SR-IOV Network Operator installed and set into
systemd
mode.
Procedure
Create a machine config pool for machines you want to use hardware offloading on.
Create a file, such as
mcp-offloading.yaml
, with content like the following example:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: mcp-offloading spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp-offloading]} nodeSelector: matchLabels: node-role.kubernetes.io/mcp-offloading: ""
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: mcp-offloading
1 spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp-offloading]}
2 nodeSelector: matchLabels: node-role.kubernetes.io/mcp-offloading: ""
3 Copy to Clipboard Copied! Apply the configuration for the machine config pool:
oc create -f mcp-offloading.yaml
$ oc create -f mcp-offloading.yaml
Copy to Clipboard Copied!
Add nodes to the machine config pool. Label each node with the node role label of your pool:
oc label node worker-2 node-role.kubernetes.io/mcp-offloading=""
$ oc label node worker-2 node-role.kubernetes.io/mcp-offloading=""
Copy to Clipboard Copied! Optional: To verify that the new pool is created, run the following command:
oc get nodes
$ oc get nodes
Copy to Clipboard Copied! Example output
NAME STATUS ROLES AGE VERSION master-0 Ready master 2d v1.28.5 master-1 Ready master 2d v1.28.5 master-2 Ready master 2d v1.28.5 worker-0 Ready worker 2d v1.28.5 worker-1 Ready worker 2d v1.28.5 worker-2 Ready mcp-offloading,worker 47h v1.28.5 worker-3 Ready mcp-offloading,worker 47h v1.28.5
NAME STATUS ROLES AGE VERSION master-0 Ready master 2d v1.28.5 master-1 Ready master 2d v1.28.5 master-2 Ready master 2d v1.28.5 worker-0 Ready worker 2d v1.28.5 worker-1 Ready worker 2d v1.28.5 worker-2 Ready mcp-offloading,worker 47h v1.28.5 worker-3 Ready mcp-offloading,worker 47h v1.28.5
Copy to Clipboard Copied! Add this machine config pool to the
SriovNetworkPoolConfig
custom resource:Create a file, such as
sriov-pool-config.yaml
, with content like the following example:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkPoolConfig metadata: name: sriovnetworkpoolconfig-offload namespace: openshift-sriov-network-operator spec: ovsHardwareOffloadConfig: name: mcp-offloading
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkPoolConfig metadata: name: sriovnetworkpoolconfig-offload namespace: openshift-sriov-network-operator spec: ovsHardwareOffloadConfig: name: mcp-offloading
1 Copy to Clipboard Copied! - 1
- The name of your machine config pool for hardware offloading.
Apply the configuration:
oc create -f <SriovNetworkPoolConfig_name>.yaml
$ oc create -f <SriovNetworkPoolConfig_name>.yaml
Copy to Clipboard Copied! NoteWhen you apply the configuration specified in a
SriovNetworkPoolConfig
object, the SR-IOV Operator drains and restarts the nodes in the machine config pool.It might take several minutes for a configuration changes to apply.
25.12.6. Configuring the SR-IOV network node policy
You can create an SR-IOV network device configuration for a node by creating an SR-IOV network node policy. To enable hardware offloading, you must define the .spec.eSwitchMode
field with the value "switchdev"
.
The following procedure creates an SR-IOV interface for a network interface controller with hardware offloading.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Create a file, such as
sriov-node-policy.yaml
, with content like the following example:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-node-policy namespace: openshift-sriov-network-operator spec: deviceType: netdevice eSwitchMode: "switchdev" nicSelector: deviceID: "1019" rootDevices: - 0000:d8:00.0 vendor: "15b3" pfNames: - ens8f0 nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 6 priority: 5 resourceName: mlxnics
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-node-policy
1 namespace: openshift-sriov-network-operator spec: deviceType: netdevice
2 eSwitchMode: "switchdev"
3 nicSelector: deviceID: "1019" rootDevices: - 0000:d8:00.0 vendor: "15b3" pfNames: - ens8f0 nodeSelector: feature.node.kubernetes.io/network-sriov.capable: "true" numVfs: 6 priority: 5 resourceName: mlxnics
Copy to Clipboard Copied! Apply the configuration for the policy:
oc create -f sriov-node-policy.yaml
$ oc create -f sriov-node-policy.yaml
Copy to Clipboard Copied! NoteWhen you apply the configuration specified in a
SriovNetworkPoolConfig
object, the SR-IOV Operator drains and restarts the nodes in the machine config pool.It might take several minutes for a configuration change to apply.
25.12.6.1. An example SR-IOV network node policy for OpenStack
The following example describes an SR-IOV interface for a network interface controller (NIC) with hardware offloading on Red Hat OpenStack Platform (RHOSP).
An SR-IOV interface for a NIC with hardware offloading on RHOSP
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: ${name} namespace: openshift-sriov-network-operator spec: deviceType: switchdev isRdma: true nicSelector: netFilter: openstack/NetworkID:${net_id} nodeSelector: feature.node.kubernetes.io/network-sriov.capable: 'true' numVfs: 1 priority: 99 resourceName: ${name}
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
name: ${name}
namespace: openshift-sriov-network-operator
spec:
deviceType: switchdev
isRdma: true
nicSelector:
netFilter: openstack/NetworkID:${net_id}
nodeSelector:
feature.node.kubernetes.io/network-sriov.capable: 'true'
numVfs: 1
priority: 99
resourceName: ${name}
25.12.7. Improving network traffic performance using a virtual function
Follow this procedure to assign a virtual function to the OVN-Kubernetes management port and increase its network traffic performance.
This procedure results in the creation of two pools: the first has a virtual function used by OVN-Kubernetes, and the second comprises the remaining virtual functions.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Add the
network.operator.openshift.io/smart-nic
label to each worker node with a SmartNIC present by running the following command:oc label node <node-name> network.operator.openshift.io/smart-nic=
$ oc label node <node-name> network.operator.openshift.io/smart-nic=
Copy to Clipboard Copied! Use the
oc get nodes
command to get a list of the available nodes.Create a policy named
sriov-node-mgmt-vf-policy.yaml
for the management port with content such as the following example:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-node-mgmt-vf-policy namespace: openshift-sriov-network-operator spec: deviceType: netdevice eSwitchMode: "switchdev" nicSelector: deviceID: "1019" rootDevices: - 0000:d8:00.0 vendor: "15b3" pfNames: - ens8f0#0-0 nodeSelector: network.operator.openshift.io/smart-nic: "" numVfs: 6 priority: 5 resourceName: mgmtvf
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-node-mgmt-vf-policy namespace: openshift-sriov-network-operator spec: deviceType: netdevice eSwitchMode: "switchdev" nicSelector: deviceID: "1019" rootDevices: - 0000:d8:00.0 vendor: "15b3" pfNames: - ens8f0#0-0
1 nodeSelector: network.operator.openshift.io/smart-nic: "" numVfs: 6
2 priority: 5 resourceName: mgmtvf
Copy to Clipboard Copied! - 1
- Replace this device with the appropriate network device for your use case. The
#0-0
part of thepfNames
value reserves a single virtual function used by OVN-Kubernetes. - 2
- The value provided here is an example. Replace this value with one that meets your requirements. For more information, see SR-IOV network node configuration object in the Additional resources section.
Create a policy named
sriov-node-policy.yaml
with content such as the following example:apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-node-policy namespace: openshift-sriov-network-operator spec: deviceType: netdevice eSwitchMode: "switchdev" nicSelector: deviceID: "1019" rootDevices: - 0000:d8:00.0 vendor: "15b3" pfNames: - ens8f0#1-5 nodeSelector: network.operator.openshift.io/smart-nic: "" numVfs: 6 priority: 5 resourceName: mlxnics
apiVersion: sriovnetwork.openshift.io/v1 kind: SriovNetworkNodePolicy metadata: name: sriov-node-policy namespace: openshift-sriov-network-operator spec: deviceType: netdevice eSwitchMode: "switchdev" nicSelector: deviceID: "1019" rootDevices: - 0000:d8:00.0 vendor: "15b3" pfNames: - ens8f0#1-5
1 nodeSelector: network.operator.openshift.io/smart-nic: "" numVfs: 6
2 priority: 5 resourceName: mlxnics
Copy to Clipboard Copied! - 1
- Replace this device with the appropriate network device for your use case.
- 2
- The value provided here is an example. Replace this value with the value specified in the
sriov-node-mgmt-vf-policy.yaml
file. For more information, see SR-IOV network node configuration object in the Additional resources section.
NoteThe
sriov-node-mgmt-vf-policy.yaml
file has different values for thepfNames
andresourceName
keys than thesriov-node-policy.yaml
file.Apply the configuration for both policies:
oc create -f sriov-node-policy.yaml
$ oc create -f sriov-node-policy.yaml
Copy to Clipboard Copied! oc create -f sriov-node-mgmt-vf-policy.yaml
$ oc create -f sriov-node-mgmt-vf-policy.yaml
Copy to Clipboard Copied! Create a Cluster Network Operator (CNO) ConfigMap in the cluster for the management configuration:
Create a ConfigMap named
hardware-offload-config.yaml
with the following contents:apiVersion: v1 kind: ConfigMap metadata: name: hardware-offload-config namespace: openshift-network-operator data: mgmt-port-resource-name: openshift.io/mgmtvf
apiVersion: v1 kind: ConfigMap metadata: name: hardware-offload-config namespace: openshift-network-operator data: mgmt-port-resource-name: openshift.io/mgmtvf
Copy to Clipboard Copied! Apply the configuration for the ConfigMap:
oc create -f hardware-offload-config.yaml
$ oc create -f hardware-offload-config.yaml
Copy to Clipboard Copied!
25.12.8. Creating a network attachment definition
After you define the machine config pool and the SR-IOV network node policy, you can create a network attachment definition for the network interface card you specified.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Create a file, such as
net-attach-def.yaml
, with content like the following example:apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: net-attach-def namespace: net-attach-def annotations: k8s.v1.cni.cncf.io/resourceName: openshift.io/mlxnics spec: config: '{"cniVersion":"0.3.1","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{}}'
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: net-attach-def
1 namespace: net-attach-def
2 annotations: k8s.v1.cni.cncf.io/resourceName: openshift.io/mlxnics
3 spec: config: '{"cniVersion":"0.3.1","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{}}'
Copy to Clipboard Copied! Apply the configuration for the network attachment definition:
oc create -f net-attach-def.yaml
$ oc create -f net-attach-def.yaml
Copy to Clipboard Copied!
Verification
Run the following command to see whether the new definition is present:
oc get net-attach-def -A
$ oc get net-attach-def -A
Copy to Clipboard Copied! Example output
NAMESPACE NAME AGE net-attach-def net-attach-def 43h
NAMESPACE NAME AGE net-attach-def net-attach-def 43h
Copy to Clipboard Copied!
25.12.9. Adding the network attachment definition to your pods
After you create the machine config pool, the SriovNetworkPoolConfig
and SriovNetworkNodePolicy
custom resources, and the network attachment definition, you can apply these configurations to your pods by adding the network attachment definition to your pod specifications.
Procedure
In the pod specification, add the
.metadata.annotations.k8s.v1.cni.cncf.io/networks
field and specify the network attachment definition you created for hardware offloading:.... metadata: annotations: v1.multus-cni.io/default-network: net-attach-def/net-attach-def
.... metadata: annotations: v1.multus-cni.io/default-network: net-attach-def/net-attach-def
1 Copy to Clipboard Copied! - 1
- The value must be the name and namespace of the network attachment definition you created for hardware offloading.
25.13. Switching Bluefield-2 from DPU to NIC
You can switch the Bluefield-2 network device from data processing unit (DPU) mode to network interface controller (NIC) mode.
25.13.1. Switching Bluefield-2 from DPU mode to NIC mode
Use the following procedure to switch Bluefield-2 from data processing units (DPU) mode to network interface controller (NIC) mode.
Currently, only switching Bluefield-2 from DPU to NIC mode is supported. Switching from NIC mode to DPU mode is unsupported.
Prerequisites
- You have installed the SR-IOV Network Operator. For more information, see "Installing SR-IOV Network Operator".
- You have updated Bluefield-2 to the latest firmware. For more information, see Firmware for NVIDIA BlueField-2.
Procedure
Add the following labels to each of your worker nodes by entering the following commands:
oc label node <example_node_name_one> node-role.kubernetes.io/sriov=
$ oc label node <example_node_name_one> node-role.kubernetes.io/sriov=
Copy to Clipboard Copied! oc label node <example_node_name_two> node-role.kubernetes.io/sriov=
$ oc label node <example_node_name_two> node-role.kubernetes.io/sriov=
Copy to Clipboard Copied! Create a machine config pool for the SR-IOV Network Operator, for example:
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: sriov spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,sriov]} nodeSelector: matchLabels: node-role.kubernetes.io/sriov: ""
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfigPool metadata: name: sriov spec: machineConfigSelector: matchExpressions: - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,sriov]} nodeSelector: matchLabels: node-role.kubernetes.io/sriov: ""
Copy to Clipboard Copied! Apply the following
machineconfig.yaml
file to the worker nodes:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: sriov name: 99-bf2-dpu spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,ZmluZF9jb250YWluZXIoKSB7CiAgY3JpY3RsIHBzIC1vIGpzb24gfCBqcSAtciAnLmNvbnRhaW5lcnNbXSB8IHNlbGVjdCgubWV0YWRhdGEubmFtZT09InNyaW92LW5ldHdvcmstY29uZmlnLWRhZW1vbiIpIHwgLmlkJwp9CnVudGlsIG91dHB1dD0kKGZpbmRfY29udGFpbmVyKTsgW1sgLW4gIiRvdXRwdXQiIF1dOyBkbwogIGVjaG8gIndhaXRpbmcgZm9yIGNvbnRhaW5lciB0byBjb21lIHVwIgogIHNsZWVwIDE7CmRvbmUKISBzdWRvIGNyaWN0bCBleGVjICRvdXRwdXQgL2JpbmRhdGEvc2NyaXB0cy9iZjItc3dpdGNoLW1vZGUuc2ggIiRAIgo= mode: 0755 overwrite: true path: /etc/default/switch_in_sriov_config_daemon.sh systemd: units: - name: dpu-switch.service enabled: true contents: | [Unit] Description=Switch BlueField2 card to NIC/DPU mode RequiresMountsFor=%t/containers Wants=network.target After=network-online.target kubelet.service [Service] SuccessExitStatus=0 120 RemainAfterExit=True ExecStart=/bin/bash -c '/etc/default/switch_in_sriov_config_daemon.sh nic || shutdown -r now' Type=oneshot [Install] WantedBy=multi-user.target
apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: labels: machineconfiguration.openshift.io/role: sriov name: 99-bf2-dpu spec: config: ignition: version: 3.2.0 storage: files: - contents: source: data:text/plain;charset=utf-8;base64,ZmluZF9jb250YWluZXIoKSB7CiAgY3JpY3RsIHBzIC1vIGpzb24gfCBqcSAtciAnLmNvbnRhaW5lcnNbXSB8IHNlbGVjdCgubWV0YWRhdGEubmFtZT09InNyaW92LW5ldHdvcmstY29uZmlnLWRhZW1vbiIpIHwgLmlkJwp9CnVudGlsIG91dHB1dD0kKGZpbmRfY29udGFpbmVyKTsgW1sgLW4gIiRvdXRwdXQiIF1dOyBkbwogIGVjaG8gIndhaXRpbmcgZm9yIGNvbnRhaW5lciB0byBjb21lIHVwIgogIHNsZWVwIDE7CmRvbmUKISBzdWRvIGNyaWN0bCBleGVjICRvdXRwdXQgL2JpbmRhdGEvc2NyaXB0cy9iZjItc3dpdGNoLW1vZGUuc2ggIiRAIgo= mode: 0755 overwrite: true path: /etc/default/switch_in_sriov_config_daemon.sh systemd: units: - name: dpu-switch.service enabled: true contents: | [Unit] Description=Switch BlueField2 card to NIC/DPU mode RequiresMountsFor=%t/containers Wants=network.target After=network-online.target kubelet.service [Service] SuccessExitStatus=0 120 RemainAfterExit=True ExecStart=/bin/bash -c '/etc/default/switch_in_sriov_config_daemon.sh nic || shutdown -r now'
1 Type=oneshot [Install] WantedBy=multi-user.target
Copy to Clipboard Copied! - 1
- Optional: The PCI address of a specific card can optionally be specified, for example
ExecStart=/bin/bash -c '/etc/default/switch_in_sriov_config_daemon.sh nic 0000:5e:00.0 || echo done'
. By default, the first device is selected. If there is more than one device, you must specify which PCI address to be used. The PCI address must be the same on all nodes that are switching Bluefield-2 from DPU mode to NIC mode.
- Wait for the worker nodes to restart. After restarting, the Bluefield-2 network device on the worker nodes is switched into NIC mode.
- Optional: You might need to restart the host hardware because most recent Bluefield-2 firmware releases require a hardware restart to switch into NIC mode.
25.14. Uninstalling the SR-IOV Network Operator
To uninstall the SR-IOV Network Operator, you must delete any running SR-IOV workloads, uninstall the Operator, and delete the webhooks that the Operator used.
25.14.1. Uninstalling the SR-IOV Network Operator
As a cluster administrator, you can uninstall the SR-IOV Network Operator.
Prerequisites
-
You have access to an OpenShift Container Platform cluster using an account with
cluster-admin
permissions. - You have the SR-IOV Network Operator installed.
Procedure
Delete all SR-IOV custom resources (CRs):
oc delete sriovnetwork -n openshift-sriov-network-operator --all
$ oc delete sriovnetwork -n openshift-sriov-network-operator --all
Copy to Clipboard Copied! oc delete sriovnetworknodepolicy -n openshift-sriov-network-operator --all
$ oc delete sriovnetworknodepolicy -n openshift-sriov-network-operator --all
Copy to Clipboard Copied! oc delete sriovibnetwork -n openshift-sriov-network-operator --all
$ oc delete sriovibnetwork -n openshift-sriov-network-operator --all
Copy to Clipboard Copied! - Follow the instructions in the "Deleting Operators from a cluster" section to remove the SR-IOV Network Operator from your cluster.
Delete the SR-IOV custom resource definitions that remain in the cluster after the SR-IOV Network Operator is uninstalled:
oc delete crd sriovibnetworks.sriovnetwork.openshift.io
$ oc delete crd sriovibnetworks.sriovnetwork.openshift.io
Copy to Clipboard Copied! oc delete crd sriovnetworknodepolicies.sriovnetwork.openshift.io
$ oc delete crd sriovnetworknodepolicies.sriovnetwork.openshift.io
Copy to Clipboard Copied! oc delete crd sriovnetworknodestates.sriovnetwork.openshift.io
$ oc delete crd sriovnetworknodestates.sriovnetwork.openshift.io
Copy to Clipboard Copied! oc delete crd sriovnetworkpoolconfigs.sriovnetwork.openshift.io
$ oc delete crd sriovnetworkpoolconfigs.sriovnetwork.openshift.io
Copy to Clipboard Copied! oc delete crd sriovnetworks.sriovnetwork.openshift.io
$ oc delete crd sriovnetworks.sriovnetwork.openshift.io
Copy to Clipboard Copied! oc delete crd sriovoperatorconfigs.sriovnetwork.openshift.io
$ oc delete crd sriovoperatorconfigs.sriovnetwork.openshift.io
Copy to Clipboard Copied! Delete the SR-IOV webhooks:
oc delete mutatingwebhookconfigurations network-resources-injector-config
$ oc delete mutatingwebhookconfigurations network-resources-injector-config
Copy to Clipboard Copied! oc delete MutatingWebhookConfiguration sriov-operator-webhook-config
$ oc delete MutatingWebhookConfiguration sriov-operator-webhook-config
Copy to Clipboard Copied! oc delete ValidatingWebhookConfiguration sriov-operator-webhook-config
$ oc delete ValidatingWebhookConfiguration sriov-operator-webhook-config
Copy to Clipboard Copied! Delete the SR-IOV Network Operator namespace:
oc delete namespace openshift-sriov-network-operator
$ oc delete namespace openshift-sriov-network-operator
Copy to Clipboard Copied!
Chapter 26. OVN-Kubernetes network plugin
26.1. About the OVN-Kubernetes network plugin
The OpenShift Container Platform cluster uses a virtualized network for pod and service networks.
Part of Red Hat OpenShift Networking, the OVN-Kubernetes network plugin is the default network provider for OpenShift Container Platform. OVN-Kubernetes is based on Open Virtual Network (OVN) and provides an overlay-based networking implementation. A cluster that uses the OVN-Kubernetes plugin also runs Open vSwitch (OVS) on each node. OVN configures OVS on each node to implement the declared network configuration.
OVN-Kubernetes is the default networking solution for OpenShift Container Platform and single-node OpenShift deployments.
OVN-Kubernetes, which arose from the OVS project, uses many of the same constructs, such as open flow rules, to determine how packets travel through the network. For more information, see the Open Virtual Network website.
OVN-Kubernetes is a series of daemons for OVS that translate virtual network configurations into OpenFlow
rules. OpenFlow
is a protocol for communicating with network switches and routers, providing a means for remotely controlling the flow of network traffic on a network device, allowing network administrators to configure, manage, and monitor the flow of network traffic.
OVN-Kubernetes provides more of the advanced functionality not available with OpenFlow
. OVN supports distributed virtual routing, distributed logical switches, access control, DHCP and DNS. OVN implements distributed virtual routing within logic flows which equate to open flows. So for example if you have a pod that sends out a DHCP request on the network, it sends out that broadcast looking for DHCP address there will be a logic flow rule that matches that packet, and it responds giving it a gateway, a DNS server an IP address and so on.
OVN-Kubernetes runs a daemon on each node. There are daemon sets for the databases and for the OVN controller that run on every node. The OVN controller programs the Open vSwitch daemon on the nodes to support the network provider features; egress IPs, firewalls, routers, hybrid networking, IPSEC encryption, IPv6, network policy, network policy logs, hardware offloading and multicast.
26.1.1. OVN-Kubernetes purpose
The OVN-Kubernetes network plugin is an open-source, fully-featured Kubernetes CNI plugin that uses Open Virtual Network (OVN) to manage network traffic flows. OVN is a community developed, vendor-agnostic network virtualization solution. The OVN-Kubernetes network plugin:
- Uses OVN (Open Virtual Network) to manage network traffic flows. OVN is a community developed, vendor-agnostic network virtualization solution.
- Implements Kubernetes network policy support, including ingress and egress rules.
- Uses the Geneve (Generic Network Virtualization Encapsulation) protocol rather than VXLAN to create an overlay network between nodes.
The OVN-Kubernetes network plugin provides the following advantages over OpenShift SDN.
- Full support for IPv6 single-stack and IPv4/IPv6 dual-stack networking on supported platforms
- Support for hybrid clusters with both Linux and Microsoft Windows workloads
- Optional IPsec encryption of intra-cluster communications
- Offload of network data processing from host CPU to compatible network cards and data processing units (DPUs)
26.1.2. Supported network plugin feature matrix
Red Hat OpenShift Networking offers two options for the network plugin, OpenShift SDN and OVN-Kubernetes, for the network plugin. The following table summarizes the current feature support for both network plugins:
Feature | OpenShift SDN | OVN-Kubernetes |
---|---|---|
Egress IPs | Supported | Supported |
Egress firewall | Supported | Supported [1] |
Egress router | Supported | Supported [2] |
Hybrid networking | Not supported | Supported |
IPsec encryption for intra-cluster communication | Not supported | Supported |
IPv4 single-stack | Supported | Supported |
IPv6 single-stack | Not supported | Supported [3] |
IPv4/IPv6 dual-stack | Not Supported | Supported [4] |
IPv6/IPv4 dual-stack | Not supported | Supported [5] |
Kubernetes network policy | Supported | Supported |
Kubernetes network policy logs | Not supported | Supported |
Hardware offloading | Not supported | Supported |
Multicast | Supported | Supported |
- Egress firewall is also known as egress network policy in OpenShift SDN. This is not the same as network policy egress.
- Egress router for OVN-Kubernetes supports only redirect mode.
- IPv6 single-stack networking on a bare-metal platform.
- IPv4/IPv6 dual-stack networking on bare-metal, VMware vSphere (installer-provisioned infrastructure installations only), IBM Power®, IBM Z®, and RHOSP platforms.
- IPv6/IPv4 dual-stack networking on bare-metal, VMware vSphere (installer-provisioned infrastructure installations only), and IBM Power® platforms.
26.1.3. OVN-Kubernetes IPv6 and dual-stack limitations
The OVN-Kubernetes network plugin has the following limitations:
For clusters configured for dual-stack networking, both IPv4 and IPv6 traffic must use the same network interface as the default gateway. If this requirement is not met, pods on the host in the
ovnkube-node
daemon set enter theCrashLoopBackOff
state. If you display a pod with a command such asoc get pod -n openshift-ovn-kubernetes -l app=ovnkube-node -o yaml
, thestatus
field contains more than one message about the default gateway, as shown in the following output:I1006 16:09:50.985852 60651 helper_linux.go:73] Found default gateway interface br-ex 192.168.127.1 I1006 16:09:50.985923 60651 helper_linux.go:73] Found default gateway interface ens4 fe80::5054:ff:febe:bcd4 F1006 16:09:50.985939 60651 ovnkube.go:130] multiple gateway interfaces detected: br-ex ens4
I1006 16:09:50.985852 60651 helper_linux.go:73] Found default gateway interface br-ex 192.168.127.1 I1006 16:09:50.985923 60651 helper_linux.go:73] Found default gateway interface ens4 fe80::5054:ff:febe:bcd4 F1006 16:09:50.985939 60651 ovnkube.go:130] multiple gateway interfaces detected: br-ex ens4
Copy to Clipboard Copied! The only resolution is to reconfigure the host networking so that both IP families use the same network interface for the default gateway.
For clusters configured for dual-stack networking, both the IPv4 and IPv6 routing tables must contain the default gateway. If this requirement is not met, pods on the host in the
ovnkube-node
daemon set enter theCrashLoopBackOff
state. If you display a pod with a command such asoc get pod -n openshift-ovn-kubernetes -l app=ovnkube-node -o yaml
, thestatus
field contains more than one message about the default gateway, as shown in the following output:I0512 19:07:17.589083 108432 helper_linux.go:74] Found default gateway interface br-ex 192.168.123.1 F0512 19:07:17.589141 108432 ovnkube.go:133] failed to get default gateway interface
I0512 19:07:17.589083 108432 helper_linux.go:74] Found default gateway interface br-ex 192.168.123.1 F0512 19:07:17.589141 108432 ovnkube.go:133] failed to get default gateway interface
Copy to Clipboard Copied! The only resolution is to reconfigure the host networking so that both IP families contain the default gateway.
26.1.4. Session affinity
Session affinity is a feature that applies to Kubernetes Service
objects. You can use session affinity if you want to ensure that each time you connect to a <service_VIP>:<Port>, the traffic is always load balanced to the same back end. For more information, including how to set session affinity based on a client’s IP address, see Session affinity.
Stickiness timeout for session affinity
The OVN-Kubernetes network plugin for OpenShift Container Platform calculates the stickiness timeout for a session from a client based on the last packet. For example, if you run a curl
command 10 times, the sticky session timer starts from the tenth packet not the first. As a result, if the client is continuously contacting the service, then the session never times out. The timeout starts when the service has not received a packet for the amount of time set by the timeoutSeconds
parameter.
26.2. OVN-Kubernetes architecture
26.2.1. Introduction to OVN-Kubernetes architecture
The following diagram shows the OVN-Kubernetes architecture.
Figure 26.1. OVK-Kubernetes architecture

The key components are:
- Cloud Management System (CMS) - A platform specific client for OVN that provides a CMS specific plugin for OVN integration. The plugin translates the cloud management system’s concept of the logical network configuration, stored in the CMS configuration database in a CMS-specific format, into an intermediate representation understood by OVN.
-
OVN Northbound database (
nbdb
) container - Stores the logical network configuration passed by the CMS plugin. -
OVN Southbound database (
sbdb
) container - Stores the physical and logical network configuration state for Open vSwitch (OVS) system on each node, including tables that bind them. -
OVN north daemon (
ovn-northd
) - This is the intermediary client betweennbdb
container andsbdb
container. It translates the logical network configuration in terms of conventional network concepts, taken from thenbdb
container, into logical data path flows in thesbdb
container. The container name forovn-northd
daemon isnorthd
and it runs in theovnkube-node
pods. -
ovn-controller - This is the OVN agent that interacts with OVS and hypervisors, for any information or update that is needed for
sbdb
container. Theovn-controller
reads logical flows from thesbdb
container, translates them intoOpenFlow
flows and sends them to the node’s OVS daemon. The container name isovn-controller
and it runs in theovnkube-node
pods.
The OVN northd, northbound database, and southbound database run on each node in the cluster and mostly contain and process information that is local to that node.
The OVN northbound database has the logical network configuration passed down to it by the cloud management system (CMS). The OVN northbound database contains the current desired state of the network, presented as a collection of logical ports, logical switches, logical routers, and more. The ovn-northd
(northd
container) connects to the OVN northbound database and the OVN southbound database. It translates the logical network configuration in terms of conventional network concepts, taken from the OVN northbound database, into logical data path flows in the OVN southbound database.
The OVN southbound database has physical and logical representations of the network and binding tables that link them together. It contains the chassis information of the node and other constructs like remote transit switch ports that are required to connect to the other nodes in the cluster. The OVN southbound database also contains all the logic flows. The logic flows are shared with the ovn-controller
process that runs on each node and the ovn-controller
turns those into OpenFlow
rules to program Open vSwitch
(OVS).
The Kubernetes control plane nodes contain two ovnkube-control-plane
pods on separate nodes, which perform the central IP address management (IPAM) allocation for each node in the cluster. At any given time, a single ovnkube-control-plane
pod is the leader.
26.2.2. Listing all resources in the OVN-Kubernetes project
Finding the resources and containers that run in the OVN-Kubernetes project is important to help you understand the OVN-Kubernetes networking implementation.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role. -
The OpenShift CLI (
oc
) installed.
Procedure
Run the following command to get all resources, endpoints, and
ConfigMaps
in the OVN-Kubernetes project:oc get all,ep,cm -n openshift-ovn-kubernetes
$ oc get all,ep,cm -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Example output
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ NAME READY STATUS RESTARTS AGE pod/ovnkube-control-plane-65c6f55656-6d55h 2/2 Running 0 114m pod/ovnkube-control-plane-65c6f55656-fd7vw 2/2 Running 2 (104m ago) 114m pod/ovnkube-node-bcvts 8/8 Running 0 113m pod/ovnkube-node-drgvv 8/8 Running 0 113m pod/ovnkube-node-f2pxt 8/8 Running 0 113m pod/ovnkube-node-frqsb 8/8 Running 0 105m pod/ovnkube-node-lbxkk 8/8 Running 0 105m pod/ovnkube-node-tt7bx 8/8 Running 1 (102m ago) 105m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/ovn-kubernetes-control-plane ClusterIP None <none> 9108/TCP 114m service/ovn-kubernetes-node ClusterIP None <none> 9103/TCP,9105/TCP 114m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/ovnkube-node 6 6 6 6 6 beta.kubernetes.io/os=linux 114m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/ovnkube-control-plane 3/3 3 3 114m NAME DESIRED CURRENT READY AGE replicaset.apps/ovnkube-control-plane-65c6f55656 3 3 3 114m NAME ENDPOINTS AGE endpoints/ovn-kubernetes-control-plane 10.0.0.3:9108,10.0.0.4:9108,10.0.0.5:9108 114m endpoints/ovn-kubernetes-node 10.0.0.3:9105,10.0.0.4:9105,10.0.0.5:9105 + 9 more... 114m NAME DATA AGE configmap/control-plane-status 1 113m configmap/kube-root-ca.crt 1 114m configmap/openshift-service-ca.crt 1 114m configmap/ovn-ca 1 114m configmap/ovnkube-config 1 114m configmap/signer-ca 1 114m
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ NAME READY STATUS RESTARTS AGE pod/ovnkube-control-plane-65c6f55656-6d55h 2/2 Running 0 114m pod/ovnkube-control-plane-65c6f55656-fd7vw 2/2 Running 2 (104m ago) 114m pod/ovnkube-node-bcvts 8/8 Running 0 113m pod/ovnkube-node-drgvv 8/8 Running 0 113m pod/ovnkube-node-f2pxt 8/8 Running 0 113m pod/ovnkube-node-frqsb 8/8 Running 0 105m pod/ovnkube-node-lbxkk 8/8 Running 0 105m pod/ovnkube-node-tt7bx 8/8 Running 1 (102m ago) 105m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/ovn-kubernetes-control-plane ClusterIP None <none> 9108/TCP 114m service/ovn-kubernetes-node ClusterIP None <none> 9103/TCP,9105/TCP 114m NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE daemonset.apps/ovnkube-node 6 6 6 6 6 beta.kubernetes.io/os=linux 114m NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/ovnkube-control-plane 3/3 3 3 114m NAME DESIRED CURRENT READY AGE replicaset.apps/ovnkube-control-plane-65c6f55656 3 3 3 114m NAME ENDPOINTS AGE endpoints/ovn-kubernetes-control-plane 10.0.0.3:9108,10.0.0.4:9108,10.0.0.5:9108 114m endpoints/ovn-kubernetes-node 10.0.0.3:9105,10.0.0.4:9105,10.0.0.5:9105 + 9 more... 114m NAME DATA AGE configmap/control-plane-status 1 113m configmap/kube-root-ca.crt 1 114m configmap/openshift-service-ca.crt 1 114m configmap/ovn-ca 1 114m configmap/ovnkube-config 1 114m configmap/signer-ca 1 114m
Copy to Clipboard Copied! There is one
ovnkube-node
pod for each node in the cluster. Theovnkube-config
config map has the OpenShift Container Platform OVN-Kubernetes configurations.List all of the containers in the
ovnkube-node
pods by running the following command:oc get pods ovnkube-node-bcvts -o jsonpath='{.spec.containers[*].name}' -n openshift-ovn-kubernetes
$ oc get pods ovnkube-node-bcvts -o jsonpath='{.spec.containers[*].name}' -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Expected output
ovn-controller ovn-acl-logging kube-rbac-proxy-node kube-rbac-proxy-ovn-metrics northd nbdb sbdb ovnkube-controller
ovn-controller ovn-acl-logging kube-rbac-proxy-node kube-rbac-proxy-ovn-metrics northd nbdb sbdb ovnkube-controller
Copy to Clipboard Copied! The
ovnkube-node
pod is made up of several containers. It is responsible for hosting the northbound database (nbdb
container), the southbound database (sbdb
container), the north daemon (northd
container),ovn-controller
and theovnkube-controller
container. Theovnkube-controller
container watches for API objects like pods, egress IPs, namespaces, services, endpoints, egress firewall, and network policies. It is also responsible for allocating pod IP from the available subnet pool for that node.List all the containers in the
ovnkube-control-plane
pods by running the following command:oc get pods ovnkube-control-plane-65c6f55656-6d55h -o jsonpath='{.spec.containers[*].name}' -n openshift-ovn-kubernetes
$ oc get pods ovnkube-control-plane-65c6f55656-6d55h -o jsonpath='{.spec.containers[*].name}' -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Expected output
kube-rbac-proxy ovnkube-cluster-manager
kube-rbac-proxy ovnkube-cluster-manager
Copy to Clipboard Copied! The
ovnkube-control-plane
pod has a container (ovnkube-cluster-manager
) that resides on each OpenShift Container Platform node. Theovnkube-cluster-manager
container allocates pod subnet, transit switch subnet IP and join switch subnet IP to each node in the cluster. Thekube-rbac-proxy
container monitors metrics for theovnkube-cluster-manager
container.
26.2.3. Listing the OVN-Kubernetes northbound database contents
Each node is controlled by the ovnkube-controller
container running in the ovnkube-node
pod on that node. To understand the OVN logical networking entities you need to examine the northbound database that is running as a container inside the ovnkube-node
pod on that node to see what objects are in the node you wish to see.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role. -
The OpenShift CLI (
oc
) installed.
To run ovn nbctl
or sbctl
commands in a cluster you must open a remote shell into the nbdb
or sbdb
containers on the relevant node
List pods by running the following command:
oc get po -n openshift-ovn-kubernetes
$ oc get po -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m ovnkube-node-55xs2 8/8 Running 0 26m ovnkube-node-7r84r 8/8 Running 0 16m ovnkube-node-bqq8p 8/8 Running 0 17m ovnkube-node-mkj4f 8/8 Running 0 26m ovnkube-node-mlr8k 8/8 Running 0 26m ovnkube-node-wqn2m 8/8 Running 0 16m
NAME READY STATUS RESTARTS AGE ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m ovnkube-node-55xs2 8/8 Running 0 26m ovnkube-node-7r84r 8/8 Running 0 16m ovnkube-node-bqq8p 8/8 Running 0 17m ovnkube-node-mkj4f 8/8 Running 0 26m ovnkube-node-mlr8k 8/8 Running 0 26m ovnkube-node-wqn2m 8/8 Running 0 16m
Copy to Clipboard Copied! Optional: To list the pods with node information, run the following command:
oc get pods -n openshift-ovn-kubernetes -owide
$ oc get pods -n openshift-ovn-kubernetes -owide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-55xs2 8/8 Running 0 26m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-7r84r 8/8 Running 0 17m 10.0.128.3 ci-ln-t487nnb-72292-mdcnq-worker-b-wbz7z <none> <none> ovnkube-node-bqq8p 8/8 Running 0 17m 10.0.128.2 ci-ln-t487nnb-72292-mdcnq-worker-a-lh7ms <none> <none> ovnkube-node-mkj4f 8/8 Running 0 27m 10.0.0.5 ci-ln-t487nnb-72292-mdcnq-master-0 <none> <none> ovnkube-node-mlr8k 8/8 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-node-wqn2m 8/8 Running 0 17m 10.0.128.4 ci-ln-t487nnb-72292-mdcnq-worker-c-przlm <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-55xs2 8/8 Running 0 26m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-7r84r 8/8 Running 0 17m 10.0.128.3 ci-ln-t487nnb-72292-mdcnq-worker-b-wbz7z <none> <none> ovnkube-node-bqq8p 8/8 Running 0 17m 10.0.128.2 ci-ln-t487nnb-72292-mdcnq-worker-a-lh7ms <none> <none> ovnkube-node-mkj4f 8/8 Running 0 27m 10.0.0.5 ci-ln-t487nnb-72292-mdcnq-master-0 <none> <none> ovnkube-node-mlr8k 8/8 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-node-wqn2m 8/8 Running 0 17m 10.0.128.4 ci-ln-t487nnb-72292-mdcnq-worker-c-przlm <none> <none>
Copy to Clipboard Copied! Navigate into a pod to look at the northbound database by running the following command:
oc rsh -c nbdb -n openshift-ovn-kubernetes ovnkube-node-55xs2
$ oc rsh -c nbdb -n openshift-ovn-kubernetes ovnkube-node-55xs2
Copy to Clipboard Copied! Run the following command to show all the objects in the northbound database:
ovn-nbctl show
$ ovn-nbctl show
Copy to Clipboard Copied! The output is too long to list here. The list includes the NAT rules, logical switches, load balancers and so on.
You can narrow down and focus on specific components by using some of the following optional commands:
Run the following command to show the list of logical routers:
oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c northd -- ovn-nbctl lr-list
$ oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c northd -- ovn-nbctl lr-list
Copy to Clipboard Copied! Example output
45339f4f-7d0b-41d0-b5f9-9fca9ce40ce6 (GR_ci-ln-t487nnb-72292-mdcnq-master-2) 96a0a0f0-e7ed-4fec-8393-3195563de1b8 (ovn_cluster_router)
45339f4f-7d0b-41d0-b5f9-9fca9ce40ce6 (GR_ci-ln-t487nnb-72292-mdcnq-master-2) 96a0a0f0-e7ed-4fec-8393-3195563de1b8 (ovn_cluster_router)
Copy to Clipboard Copied! NoteFrom this output you can see there is router on each node plus an
ovn_cluster_router
.Run the following command to show the list of logical switches:
oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c nbdb -- ovn-nbctl ls-list
$ oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c nbdb -- ovn-nbctl ls-list
Copy to Clipboard Copied! Example output
bdd7dc3d-d848-4a74-b293-cc15128ea614 (ci-ln-t487nnb-72292-mdcnq-master-2) b349292d-ee03-4914-935f-1940b6cb91e5 (ext_ci-ln-t487nnb-72292-mdcnq-master-2) 0aac0754-ea32-4e33-b086-35eeabf0a140 (join) 992509d7-2c3f-4432-88db-c179e43592e5 (transit_switch)
bdd7dc3d-d848-4a74-b293-cc15128ea614 (ci-ln-t487nnb-72292-mdcnq-master-2) b349292d-ee03-4914-935f-1940b6cb91e5 (ext_ci-ln-t487nnb-72292-mdcnq-master-2) 0aac0754-ea32-4e33-b086-35eeabf0a140 (join) 992509d7-2c3f-4432-88db-c179e43592e5 (transit_switch)
Copy to Clipboard Copied! NoteFrom this output you can see there is an ext switch for each node plus switches with the node name itself and a join switch.
Run the following command to show the list of load balancers:
oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c nbdb -- ovn-nbctl lb-list
$ oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c nbdb -- ovn-nbctl lb-list
Copy to Clipboard Copied! Example output
UUID LB PROTO VIP IPs 7c84c673-ed2a-4436-9a1f-9bc5dd181eea Service_default/ tcp 172.30.0.1:443 10.0.0.3:6443,169.254.169.2:6443,10.0.0.5:6443 4d663fd9-ddc8-4271-b333-4c0e279e20bb Service_default/ tcp 172.30.0.1:443 10.0.0.3:6443,10.0.0.4:6443,10.0.0.5:6443 292eb07f-b82f-4962-868a-4f541d250bca Service_openshif tcp 172.30.105.247:443 10.129.0.12:8443 034b5a7f-bb6a-45e9-8e6d-573a82dc5ee3 Service_openshif tcp 172.30.192.38:443 10.0.0.3:10259,10.0.0.4:10259,10.0.0.5:10259 a68bb53e-be84-48df-bd38-bdd82fcd4026 Service_openshif tcp 172.30.161.125:8443 10.129.0.32:8443 6cc21b3d-2c54-4c94-8ff5-d8e017269c2e Service_openshif tcp 172.30.3.144:443 10.129.0.22:8443 37996ffd-7268-4862-a27f-61cd62e09c32 Service_openshif tcp 172.30.181.107:443 10.129.0.18:8443 81d4da3c-f811-411f-ae0c-bc6713d0861d Service_openshif tcp 172.30.228.23:443 10.129.0.29:8443 ac5a4f3b-b6ba-4ceb-82d0-d84f2c41306e Service_openshif tcp 172.30.14.240:9443 10.129.0.36:9443 c88979fb-1ef5-414b-90ac-43b579351ac9 Service_openshif tcp 172.30.231.192:9001 10.128.0.5:9001,10.128.2.5:9001,10.129.0.5:9001,10.129.2.4:9001,10.130.0.3:9001,10.131.0.3:9001 fcb0a3fb-4a77-4230-a84a-be45dce757e8 Service_openshif tcp 172.30.189.92:443 10.130.0.17:8440 67ef3e7b-ceb9-4bf0-8d96-b43bde4c9151 Service_openshif tcp 172.30.67.218:443 10.129.0.9:8443 d0032fba-7d5e-424a-af25-4ab9b5d46e81 Service_openshif tcp 172.30.102.137:2379 10.0.0.3:2379,10.0.0.4:2379,10.0.0.5:2379 tcp 172.30.102.137:9979 10.0.0.3:9979,10.0.0.4:9979,10.0.0.5:9979 7361c537-3eec-4e6c-bc0c-0522d182abd4 Service_openshif tcp 172.30.198.215:9001 10.0.0.3:9001,10.0.0.4:9001,10.0.0.5:9001,10.0.128.2:9001,10.0.128.3:9001,10.0.128.4:9001 0296c437-1259-410b-a6fd-81c310ad0af5 Service_openshif tcp 172.30.198.215:9001 10.0.0.3:9001,169.254.169.2:9001,10.0.0.5:9001,10.0.128.2:9001,10.0.128.3:9001,10.0.128.4:9001 5d5679f5-45b8-479d-9f7c-08b123c688b8 Service_openshif tcp 172.30.38.253:17698 10.128.0.52:17698,10.129.0.84:17698,10.130.0.60:17698 2adcbab4-d1c9-447d-9573-b5dc9f2efbfa Service_openshif tcp 172.30.148.52:443 10.0.0.4:9202,10.0.0.5:9202 tcp 172.30.148.52:444 10.0.0.4:9203,10.0.0.5:9203 tcp 172.30.148.52:445 10.0.0.4:9204,10.0.0.5:9204 tcp 172.30.148.52:446 10.0.0.4:9205,10.0.0.5:9205 2a33a6d7-af1b-4892-87cc-326a380b809b Service_openshif tcp 172.30.67.219:9091 10.129.2.16:9091,10.131.0.16:9091 tcp 172.30.67.219:9092 10.129.2.16:9092,10.131.0.16:9092 tcp 172.30.67.219:9093 10.129.2.16:9093,10.131.0.16:9093 tcp 172.30.67.219:9094 10.129.2.16:9094,10.131.0.16:9094 f56f59d7-231a-4974-99b3-792e2741ec8d Service_openshif tcp 172.30.89.212:443 10.128.0.41:8443,10.129.0.68:8443,10.130.0.44:8443 08c2c6d7-d217-4b96-b5d8-c80c4e258116 Service_openshif tcp 172.30.102.137:2379 10.0.0.3:2379,169.254.169.2:2379,10.0.0.5:2379 tcp 172.30.102.137:9979 10.0.0.3:9979,169.254.169.2:9979,10.0.0.5:9979 60a69c56-fc6a-4de6-bd88-3f2af5ba5665 Service_openshif tcp 172.30.10.193:443 10.129.0.25:8443 ab1ef694-0826-4671-a22c-565fc2d282ec Service_openshif tcp 172.30.196.123:443 10.128.0.33:8443,10.129.0.64:8443,10.130.0.37:8443 b1fb34d3-0944-4770-9ee3-2683e7a630e2 Service_openshif tcp 172.30.158.93:8443 10.129.0.13:8443 95811c11-56e2-4877-be1e-c78ccb3a82a9 Service_openshif tcp 172.30.46.85:9001 10.130.0.16:9001 4baba1d1-b873-4535-884c-3f6fc07a50fd Service_openshif tcp 172.30.28.87:443 10.129.0.26:8443 6c2e1c90-f0ca-484e-8a8e-40e71442110a Service_openshif udp 172.30.0.10:53 10.128.0.13:5353,10.128.2.6:5353,10.129.0.39:5353,10.129.2.6:5353,10.130.0.11:5353,10.131.0.9:5353
UUID LB PROTO VIP IPs 7c84c673-ed2a-4436-9a1f-9bc5dd181eea Service_default/ tcp 172.30.0.1:443 10.0.0.3:6443,169.254.169.2:6443,10.0.0.5:6443 4d663fd9-ddc8-4271-b333-4c0e279e20bb Service_default/ tcp 172.30.0.1:443 10.0.0.3:6443,10.0.0.4:6443,10.0.0.5:6443 292eb07f-b82f-4962-868a-4f541d250bca Service_openshif tcp 172.30.105.247:443 10.129.0.12:8443 034b5a7f-bb6a-45e9-8e6d-573a82dc5ee3 Service_openshif tcp 172.30.192.38:443 10.0.0.3:10259,10.0.0.4:10259,10.0.0.5:10259 a68bb53e-be84-48df-bd38-bdd82fcd4026 Service_openshif tcp 172.30.161.125:8443 10.129.0.32:8443 6cc21b3d-2c54-4c94-8ff5-d8e017269c2e Service_openshif tcp 172.30.3.144:443 10.129.0.22:8443 37996ffd-7268-4862-a27f-61cd62e09c32 Service_openshif tcp 172.30.181.107:443 10.129.0.18:8443 81d4da3c-f811-411f-ae0c-bc6713d0861d Service_openshif tcp 172.30.228.23:443 10.129.0.29:8443 ac5a4f3b-b6ba-4ceb-82d0-d84f2c41306e Service_openshif tcp 172.30.14.240:9443 10.129.0.36:9443 c88979fb-1ef5-414b-90ac-43b579351ac9 Service_openshif tcp 172.30.231.192:9001 10.128.0.5:9001,10.128.2.5:9001,10.129.0.5:9001,10.129.2.4:9001,10.130.0.3:9001,10.131.0.3:9001 fcb0a3fb-4a77-4230-a84a-be45dce757e8 Service_openshif tcp 172.30.189.92:443 10.130.0.17:8440 67ef3e7b-ceb9-4bf0-8d96-b43bde4c9151 Service_openshif tcp 172.30.67.218:443 10.129.0.9:8443 d0032fba-7d5e-424a-af25-4ab9b5d46e81 Service_openshif tcp 172.30.102.137:2379 10.0.0.3:2379,10.0.0.4:2379,10.0.0.5:2379 tcp 172.30.102.137:9979 10.0.0.3:9979,10.0.0.4:9979,10.0.0.5:9979 7361c537-3eec-4e6c-bc0c-0522d182abd4 Service_openshif tcp 172.30.198.215:9001 10.0.0.3:9001,10.0.0.4:9001,10.0.0.5:9001,10.0.128.2:9001,10.0.128.3:9001,10.0.128.4:9001 0296c437-1259-410b-a6fd-81c310ad0af5 Service_openshif tcp 172.30.198.215:9001 10.0.0.3:9001,169.254.169.2:9001,10.0.0.5:9001,10.0.128.2:9001,10.0.128.3:9001,10.0.128.4:9001 5d5679f5-45b8-479d-9f7c-08b123c688b8 Service_openshif tcp 172.30.38.253:17698 10.128.0.52:17698,10.129.0.84:17698,10.130.0.60:17698 2adcbab4-d1c9-447d-9573-b5dc9f2efbfa Service_openshif tcp 172.30.148.52:443 10.0.0.4:9202,10.0.0.5:9202 tcp 172.30.148.52:444 10.0.0.4:9203,10.0.0.5:9203 tcp 172.30.148.52:445 10.0.0.4:9204,10.0.0.5:9204 tcp 172.30.148.52:446 10.0.0.4:9205,10.0.0.5:9205 2a33a6d7-af1b-4892-87cc-326a380b809b Service_openshif tcp 172.30.67.219:9091 10.129.2.16:9091,10.131.0.16:9091 tcp 172.30.67.219:9092 10.129.2.16:9092,10.131.0.16:9092 tcp 172.30.67.219:9093 10.129.2.16:9093,10.131.0.16:9093 tcp 172.30.67.219:9094 10.129.2.16:9094,10.131.0.16:9094 f56f59d7-231a-4974-99b3-792e2741ec8d Service_openshif tcp 172.30.89.212:443 10.128.0.41:8443,10.129.0.68:8443,10.130.0.44:8443 08c2c6d7-d217-4b96-b5d8-c80c4e258116 Service_openshif tcp 172.30.102.137:2379 10.0.0.3:2379,169.254.169.2:2379,10.0.0.5:2379 tcp 172.30.102.137:9979 10.0.0.3:9979,169.254.169.2:9979,10.0.0.5:9979 60a69c56-fc6a-4de6-bd88-3f2af5ba5665 Service_openshif tcp 172.30.10.193:443 10.129.0.25:8443 ab1ef694-0826-4671-a22c-565fc2d282ec Service_openshif tcp 172.30.196.123:443 10.128.0.33:8443,10.129.0.64:8443,10.130.0.37:8443 b1fb34d3-0944-4770-9ee3-2683e7a630e2 Service_openshif tcp 172.30.158.93:8443 10.129.0.13:8443 95811c11-56e2-4877-be1e-c78ccb3a82a9 Service_openshif tcp 172.30.46.85:9001 10.130.0.16:9001 4baba1d1-b873-4535-884c-3f6fc07a50fd Service_openshif tcp 172.30.28.87:443 10.129.0.26:8443 6c2e1c90-f0ca-484e-8a8e-40e71442110a Service_openshif udp 172.30.0.10:53 10.128.0.13:5353,10.128.2.6:5353,10.129.0.39:5353,10.129.2.6:5353,10.130.0.11:5353,10.131.0.9:5353
Copy to Clipboard Copied! NoteFrom this truncated output you can see there are many OVN-Kubernetes load balancers. Load balancers in OVN-Kubernetes are representations of services.
Run the following command to display the options available with the command
ovn-nbctl
:oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c nbdb ovn-nbctl --help
$ oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c nbdb ovn-nbctl --help
Copy to Clipboard Copied!
26.2.4. Command-line arguments for ovn-nbctl to examine northbound database contents
The following table describes the command-line arguments that can be used with ovn-nbctl
to examine the contents of the northbound database.
Open a remote shell in the pod you want to view the contents of and then run the ovn-nbctl
commands.
Argument | Description |
---|---|
| An overview of the northbound database contents as seen from a specific node. |
| Show the details associated with the specified switch or router. |
| Show the logical routers. |
|
Using the router information from |
| Show network address translation details for the specified router. |
| Show the logical switches |
|
Using the switch information from |
| Get the type for the logical port. |
| Show the load balancers. |
26.2.5. Listing the OVN-Kubernetes southbound database contents
Each node is controlled by the ovnkube-controller
container running in the ovnkube-node
pod on that node. To understand the OVN logical networking entities you need to examine the northbound database that is running as a container inside the ovnkube-node
pod on that node to see what objects are in the node you wish to see.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role. -
The OpenShift CLI (
oc
) installed.
To run ovn nbctl
or sbctl
commands in a cluster you must open a remote shell into the nbdb
or sbdb
containers on the relevant node
List the pods by running the following command:
oc get po -n openshift-ovn-kubernetes
$ oc get po -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m ovnkube-node-55xs2 8/8 Running 0 26m ovnkube-node-7r84r 8/8 Running 0 16m ovnkube-node-bqq8p 8/8 Running 0 17m ovnkube-node-mkj4f 8/8 Running 0 26m ovnkube-node-mlr8k 8/8 Running 0 26m ovnkube-node-wqn2m 8/8 Running 0 16m
NAME READY STATUS RESTARTS AGE ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m ovnkube-node-55xs2 8/8 Running 0 26m ovnkube-node-7r84r 8/8 Running 0 16m ovnkube-node-bqq8p 8/8 Running 0 17m ovnkube-node-mkj4f 8/8 Running 0 26m ovnkube-node-mlr8k 8/8 Running 0 26m ovnkube-node-wqn2m 8/8 Running 0 16m
Copy to Clipboard Copied! Optional: To list the pods with node information, run the following command:
oc get pods -n openshift-ovn-kubernetes -owide
$ oc get pods -n openshift-ovn-kubernetes -owide
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-55xs2 8/8 Running 0 26m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-7r84r 8/8 Running 0 17m 10.0.128.3 ci-ln-t487nnb-72292-mdcnq-worker-b-wbz7z <none> <none> ovnkube-node-bqq8p 8/8 Running 0 17m 10.0.128.2 ci-ln-t487nnb-72292-mdcnq-worker-a-lh7ms <none> <none> ovnkube-node-mkj4f 8/8 Running 0 27m 10.0.0.5 ci-ln-t487nnb-72292-mdcnq-master-0 <none> <none> ovnkube-node-mlr8k 8/8 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-node-wqn2m 8/8 Running 0 17m 10.0.128.4 ci-ln-t487nnb-72292-mdcnq-worker-c-przlm <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ovnkube-control-plane-8444dff7f9-4lh9k 2/2 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-control-plane-8444dff7f9-5rjh9 2/2 Running 0 27m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-55xs2 8/8 Running 0 26m 10.0.0.4 ci-ln-t487nnb-72292-mdcnq-master-2 <none> <none> ovnkube-node-7r84r 8/8 Running 0 17m 10.0.128.3 ci-ln-t487nnb-72292-mdcnq-worker-b-wbz7z <none> <none> ovnkube-node-bqq8p 8/8 Running 0 17m 10.0.128.2 ci-ln-t487nnb-72292-mdcnq-worker-a-lh7ms <none> <none> ovnkube-node-mkj4f 8/8 Running 0 27m 10.0.0.5 ci-ln-t487nnb-72292-mdcnq-master-0 <none> <none> ovnkube-node-mlr8k 8/8 Running 0 27m 10.0.0.3 ci-ln-t487nnb-72292-mdcnq-master-1 <none> <none> ovnkube-node-wqn2m 8/8 Running 0 17m 10.0.128.4 ci-ln-t487nnb-72292-mdcnq-worker-c-przlm <none> <none>
Copy to Clipboard Copied! Navigate into a pod to look at the southbound database:
oc rsh -c sbdb -n openshift-ovn-kubernetes ovnkube-node-55xs2
$ oc rsh -c sbdb -n openshift-ovn-kubernetes ovnkube-node-55xs2
Copy to Clipboard Copied! Run the following command to show all the objects in the southbound database:
ovn-sbctl show
$ ovn-sbctl show
Copy to Clipboard Copied! Example output
Chassis "5db31703-35e9-413b-8cdf-69e7eecb41f7" hostname: ci-ln-9gp362t-72292-v2p94-worker-a-8bmwz Encap geneve ip: "10.0.128.4" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-worker-a-8bmwz Chassis "070debed-99b7-4bce-b17d-17e720b7f8bc" hostname: ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Encap geneve ip: "10.0.128.2" options: {csum="true"} Port_Binding k8s-ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding rtoe-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding openshift-monitoring_alertmanager-main-1 Port_Binding rtoj-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding etor-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding cr-rtos-ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding openshift-e2e-loki_loki-promtail-qcrcz Port_Binding jtor-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding openshift-multus_network-metrics-daemon-mkd4t Port_Binding openshift-ingress-canary_ingress-canary-xtvj4 Port_Binding openshift-ingress_router-default-6c76cbc498-pvlqk Port_Binding openshift-dns_dns-default-zz582 Port_Binding openshift-monitoring_thanos-querier-57585899f5-lbf4f Port_Binding openshift-network-diagnostics_network-check-target-tn228 Port_Binding openshift-monitoring_prometheus-k8s-0 Port_Binding openshift-image-registry_image-registry-68899bd877-xqxjj Chassis "179ba069-0af1-401c-b044-e5ba90f60fea" hostname: ci-ln-9gp362t-72292-v2p94-master-0 Encap geneve ip: "10.0.0.5" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-master-0 Chassis "68c954f2-5a76-47be-9e84-1cb13bd9dab9" hostname: ci-ln-9gp362t-72292-v2p94-worker-c-mjf9w Encap geneve ip: "10.0.128.3" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-worker-c-mjf9w Chassis "2de65d9e-9abf-4b6e-a51d-a1e038b4d8af" hostname: ci-ln-9gp362t-72292-v2p94-master-2 Encap geneve ip: "10.0.0.4" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-master-2 Chassis "1d371cb8-5e21-44fd-9025-c4b162cc4247" hostname: ci-ln-9gp362t-72292-v2p94-master-1 Encap geneve ip: "10.0.0.3" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-master-1
Chassis "5db31703-35e9-413b-8cdf-69e7eecb41f7" hostname: ci-ln-9gp362t-72292-v2p94-worker-a-8bmwz Encap geneve ip: "10.0.128.4" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-worker-a-8bmwz Chassis "070debed-99b7-4bce-b17d-17e720b7f8bc" hostname: ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Encap geneve ip: "10.0.128.2" options: {csum="true"} Port_Binding k8s-ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding rtoe-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding openshift-monitoring_alertmanager-main-1 Port_Binding rtoj-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding etor-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding cr-rtos-ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding openshift-e2e-loki_loki-promtail-qcrcz Port_Binding jtor-GR_ci-ln-9gp362t-72292-v2p94-worker-b-svmp6 Port_Binding openshift-multus_network-metrics-daemon-mkd4t Port_Binding openshift-ingress-canary_ingress-canary-xtvj4 Port_Binding openshift-ingress_router-default-6c76cbc498-pvlqk Port_Binding openshift-dns_dns-default-zz582 Port_Binding openshift-monitoring_thanos-querier-57585899f5-lbf4f Port_Binding openshift-network-diagnostics_network-check-target-tn228 Port_Binding openshift-monitoring_prometheus-k8s-0 Port_Binding openshift-image-registry_image-registry-68899bd877-xqxjj Chassis "179ba069-0af1-401c-b044-e5ba90f60fea" hostname: ci-ln-9gp362t-72292-v2p94-master-0 Encap geneve ip: "10.0.0.5" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-master-0 Chassis "68c954f2-5a76-47be-9e84-1cb13bd9dab9" hostname: ci-ln-9gp362t-72292-v2p94-worker-c-mjf9w Encap geneve ip: "10.0.128.3" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-worker-c-mjf9w Chassis "2de65d9e-9abf-4b6e-a51d-a1e038b4d8af" hostname: ci-ln-9gp362t-72292-v2p94-master-2 Encap geneve ip: "10.0.0.4" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-master-2 Chassis "1d371cb8-5e21-44fd-9025-c4b162cc4247" hostname: ci-ln-9gp362t-72292-v2p94-master-1 Encap geneve ip: "10.0.0.3" options: {csum="true"} Port_Binding tstor-ci-ln-9gp362t-72292-v2p94-master-1
Copy to Clipboard Copied! This detailed output shows the chassis and the ports that are attached to the chassis which in this case are all of the router ports and anything that runs like host networking. Any pods communicate out to the wider network using source network address translation (SNAT). Their IP address is translated into the IP address of the node that the pod is running on and then sent out into the network.
In addition to the chassis information the southbound database has all the logic flows and those logic flows are then sent to the
ovn-controller
running on each of the nodes. Theovn-controller
translates the logic flows into open flow rules and ultimately programsOpenvSwitch
so that your pods can then follow open flow rules and make it out of the network.Run the following command to display the options available with the command
ovn-sbctl
:oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c sbdb ovn-sbctl --help
$ oc exec -n openshift-ovn-kubernetes -it ovnkube-node-55xs2 \ -c sbdb ovn-sbctl --help
Copy to Clipboard Copied!
26.2.6. Command-line arguments for ovn-sbctl to examine southbound database contents
The following table describes the command-line arguments that can be used with ovn-sbctl
to examine the contents of the southbound database.
Open a remote shell in the pod you wish to view the contents of and then run the ovn-sbctl
commands.
Argument | Description |
---|---|
| An overview of the southbound database contents as seen from a specific node. |
| List the contents of southbound database for a the specified port . |
| List the logical flows. |
26.2.7. OVN-Kubernetes logical architecture
OVN is a network virtualization solution. It creates logical switches and routers. These switches and routers are interconnected to create any network topologies. When you run ovnkube-trace
with the log level set to 2 or 5 the OVN-Kubernetes logical components are exposed. The following diagram shows how the routers and switches are connected in OpenShift Container Platform.
Figure 26.2. OVN-Kubernetes router and switch components

The key components involved in packet processing are:
- Gateway routers
-
Gateway routers sometimes called L3 gateway routers, are typically used between the distributed routers and the physical network. Gateway routers including their logical patch ports are bound to a physical location (not distributed), or chassis. The patch ports on this router are known as l3gateway ports in the ovn-southbound database (
ovn-sbdb
). - Distributed logical routers
- Distributed logical routers and the logical switches behind them, to which virtual machines and containers attach, effectively reside on each hypervisor.
- Join local switch
- Join local switches are used to connect the distributed router and gateway routers. It reduces the number of IP addresses needed on the distributed router.
- Logical switches with patch ports
- Logical switches with patch ports are used to virtualize the network stack. They connect remote logical ports through tunnels.
- Logical switches with localnet ports
- Logical switches with localnet ports are used to connect OVN to the physical network. They connect remote logical ports by bridging the packets to directly connected physical L2 segments using localnet ports.
- Patch ports
- Patch ports represent connectivity between logical switches and logical routers and between peer logical routers. A single connection has a pair of patch ports at each such point of connectivity, one on each side.
- l3gateway ports
-
l3gateway ports are the port binding entries in the
ovn-sbdb
for logical patch ports used in the gateway routers. They are called l3gateway ports rather than patch ports just to portray the fact that these ports are bound to a chassis just like the gateway router itself. - localnet ports
-
localnet ports are present on the bridged logical switches that allows a connection to a locally accessible network from each
ovn-controller
instance. This helps model the direct connectivity to the physical network from the logical switches. A logical switch can only have a single localnet port attached to it.
26.2.7.1. Installing network-tools on local host
Install network-tools
on your local host to make a collection of tools available for debugging OpenShift Container Platform cluster network issues.
Procedure
Clone the
network-tools
repository onto your workstation with the following command:git clone git@github.com:openshift/network-tools.git
$ git clone git@github.com:openshift/network-tools.git
Copy to Clipboard Copied! Change into the directory for the repository you just cloned:
cd network-tools
$ cd network-tools
Copy to Clipboard Copied! Optional: List all available commands:
./debug-scripts/network-tools -h
$ ./debug-scripts/network-tools -h
Copy to Clipboard Copied!
26.2.7.2. Running network-tools
Get information about the logical switches and routers by running network-tools
.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster as a user with
cluster-admin
privileges. -
You have installed
network-tools
on local host.
Procedure
List the routers by running the following command:
./debug-scripts/network-tools ovn-db-run-command ovn-nbctl lr-list
$ ./debug-scripts/network-tools ovn-db-run-command ovn-nbctl lr-list
Copy to Clipboard Copied! Example output
944a7b53-7948-4ad2-a494-82b55eeccf87 (GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99) 84bd4a4c-4b0b-4a47-b0cf-a2c32709fc53 (ovn_cluster_router)
944a7b53-7948-4ad2-a494-82b55eeccf87 (GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99) 84bd4a4c-4b0b-4a47-b0cf-a2c32709fc53 (ovn_cluster_router)
Copy to Clipboard Copied! List the localnet ports by running the following command:
./debug-scripts/network-tools ovn-db-run-command \ ovn-sbctl find Port_Binding type=localnet
$ ./debug-scripts/network-tools ovn-db-run-command \ ovn-sbctl find Port_Binding type=localnet
Copy to Clipboard Copied! Example output
_uuid : d05298f5-805b-4838-9224-1211afc2f199 additional_chassis : [] additional_encap : [] chassis : [] datapath : f3c2c959-743b-4037-854d-26627902597c encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : br-ex_ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : [unknown] mirror_rules : [] nat_addresses : [] options : {network_name=physnet} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 2 type : localnet up : false virtual_parent : [] [...]
_uuid : d05298f5-805b-4838-9224-1211afc2f199 additional_chassis : [] additional_encap : [] chassis : [] datapath : f3c2c959-743b-4037-854d-26627902597c encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : br-ex_ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : [unknown] mirror_rules : [] nat_addresses : [] options : {network_name=physnet} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 2 type : localnet up : false virtual_parent : [] [...]
Copy to Clipboard Copied! List the
l3gateway
ports by running the following command:./debug-scripts/network-tools ovn-db-run-command \ ovn-sbctl find Port_Binding type=l3gateway
$ ./debug-scripts/network-tools ovn-db-run-command \ ovn-sbctl find Port_Binding type=l3gateway
Copy to Clipboard Copied! Example output
_uuid : 5207a1f3-1cf3-42f1-83e9-387bbb06b03c additional_chassis : [] additional_encap : [] chassis : ca6eb600-3a10-4372-a83e-e0d957c4cd92 datapath : f3c2c959-743b-4037-854d-26627902597c encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : etor-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : ["42:01:0a:00:80:04"] mirror_rules : [] nat_addresses : ["42:01:0a:00:80:04 10.0.128.4"] options : {l3gateway-chassis="84737c36-b383-4c83-92c5-2bd5b3c7e772", peer=rtoe-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 1 type : l3gateway up : true virtual_parent : [] _uuid : 6088d647-84f2-43f2-b53f-c9d379042679 additional_chassis : [] additional_encap : [] chassis : ca6eb600-3a10-4372-a83e-e0d957c4cd92 datapath : dc9cea00-d94a-41b8-bdb0-89d42d13aa2e encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : jtor-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : [router] mirror_rules : [] nat_addresses : [] options : {l3gateway-chassis="84737c36-b383-4c83-92c5-2bd5b3c7e772", peer=rtoj-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 2 type : l3gateway up : true virtual_parent : [] [...]
_uuid : 5207a1f3-1cf3-42f1-83e9-387bbb06b03c additional_chassis : [] additional_encap : [] chassis : ca6eb600-3a10-4372-a83e-e0d957c4cd92 datapath : f3c2c959-743b-4037-854d-26627902597c encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : etor-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : ["42:01:0a:00:80:04"] mirror_rules : [] nat_addresses : ["42:01:0a:00:80:04 10.0.128.4"] options : {l3gateway-chassis="84737c36-b383-4c83-92c5-2bd5b3c7e772", peer=rtoe-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 1 type : l3gateway up : true virtual_parent : [] _uuid : 6088d647-84f2-43f2-b53f-c9d379042679 additional_chassis : [] additional_encap : [] chassis : ca6eb600-3a10-4372-a83e-e0d957c4cd92 datapath : dc9cea00-d94a-41b8-bdb0-89d42d13aa2e encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : jtor-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : [router] mirror_rules : [] nat_addresses : [] options : {l3gateway-chassis="84737c36-b383-4c83-92c5-2bd5b3c7e772", peer=rtoj-GR_ci-ln-54932yb-72292-kd676-worker-c-rzj99} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 2 type : l3gateway up : true virtual_parent : [] [...]
Copy to Clipboard Copied! List the patch ports by running the following command:
./debug-scripts/network-tools ovn-db-run-command \ ovn-sbctl find Port_Binding type=patch
$ ./debug-scripts/network-tools ovn-db-run-command \ ovn-sbctl find Port_Binding type=patch
Copy to Clipboard Copied! Example output
_uuid : 785fb8b6-ee5a-4792-a415-5b1cb855dac2 additional_chassis : [] additional_encap : [] chassis : [] datapath : f1ddd1cc-dc0d-43b4-90ca-12651305acec encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : stor-ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : [router] mirror_rules : [] nat_addresses : ["0a:58:0a:80:02:01 10.128.2.1 is_chassis_resident(\"cr-rtos-ci-ln-54932yb-72292-kd676-worker-c-rzj99\")"] options : {peer=rtos-ci-ln-54932yb-72292-kd676-worker-c-rzj99} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 1 type : patch up : false virtual_parent : [] _uuid : c01ff587-21a5-40b4-8244-4cd0425e5d9a additional_chassis : [] additional_encap : [] chassis : [] datapath : f6795586-bf92-4f84-9222-efe4ac6a7734 encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : rtoj-ovn_cluster_router mac : ["0a:58:64:40:00:01 100.64.0.1/16"] mirror_rules : [] nat_addresses : [] options : {peer=jtor-ovn_cluster_router} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 1 type : patch up : false virtual_parent : [] [...]
_uuid : 785fb8b6-ee5a-4792-a415-5b1cb855dac2 additional_chassis : [] additional_encap : [] chassis : [] datapath : f1ddd1cc-dc0d-43b4-90ca-12651305acec encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : stor-ci-ln-54932yb-72292-kd676-worker-c-rzj99 mac : [router] mirror_rules : [] nat_addresses : ["0a:58:0a:80:02:01 10.128.2.1 is_chassis_resident(\"cr-rtos-ci-ln-54932yb-72292-kd676-worker-c-rzj99\")"] options : {peer=rtos-ci-ln-54932yb-72292-kd676-worker-c-rzj99} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 1 type : patch up : false virtual_parent : [] _uuid : c01ff587-21a5-40b4-8244-4cd0425e5d9a additional_chassis : [] additional_encap : [] chassis : [] datapath : f6795586-bf92-4f84-9222-efe4ac6a7734 encap : [] external_ids : {} gateway_chassis : [] ha_chassis_group : [] logical_port : rtoj-ovn_cluster_router mac : ["0a:58:64:40:00:01 100.64.0.1/16"] mirror_rules : [] nat_addresses : [] options : {peer=jtor-ovn_cluster_router} parent_port : [] port_security : [] requested_additional_chassis: [] requested_chassis : [] tag : [] tunnel_key : 1 type : patch up : false virtual_parent : [] [...]
Copy to Clipboard Copied!
26.3. Troubleshooting OVN-Kubernetes
OVN-Kubernetes has many sources of built-in health checks and logs. Follow the instructions in these sections to examine your cluster. If a support case is necessary, follow the support guide to collect additional information through a must-gather
. Only use the -- gather_network_logs
when instructed by support.
26.3.1. Monitoring OVN-Kubernetes health by using readiness probes
The ovnkube-control-plane
and ovnkube-node
pods have containers configured with readiness probes.
Prerequisites
-
Access to the OpenShift CLI (
oc
). -
You have access to the cluster with
cluster-admin
privileges. -
You have installed
jq
.
Procedure
Review the details of the
ovnkube-node
readiness probe by running the following command:oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \ -o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'
$ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \ -o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'
Copy to Clipboard Copied! The readiness probe for the northbound and southbound database containers in the
ovnkube-node
pod checks for the health of the databases and theovnkube-controller
container.The
ovnkube-controller
container in theovnkube-node
pod has a readiness probe to verify the presence of the OVN-Kubernetes CNI configuration file, the absence of which would indicate that the pod is not running or is not ready to accept requests to configure pods.Show all events including the probe failures, for the namespace by using the following command:
oc get events -n openshift-ovn-kubernetes
$ oc get events -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Show the events for just a specific pod:
oc describe pod ovnkube-node-9lqfk -n openshift-ovn-kubernetes
$ oc describe pod ovnkube-node-9lqfk -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Show the messages and statuses from the cluster network operator:
oc get co/network -o json | jq '.status.conditions[]'
$ oc get co/network -o json | jq '.status.conditions[]'
Copy to Clipboard Copied! Show the
ready
status of each container inovnkube-node
pods by running the following script:for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \ -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); do echo === $p ===; \ oc get pods -n openshift-ovn-kubernetes $p -o json | jq '.status.containerStatuses[] | .name, .ready'; \ done
$ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \ -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); do echo === $p ===; \ oc get pods -n openshift-ovn-kubernetes $p -o json | jq '.status.containerStatuses[] | .name, .ready'; \ done
Copy to Clipboard Copied! NoteThe expectation is all container statuses are reporting as
true
. Failure of a readiness probe sets the status tofalse
.
26.3.2. Viewing OVN-Kubernetes alerts in the console
The Alerting UI provides detailed information about alerts and their governing alerting rules and silences.
Prerequisites
- You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.
Procedure (UI)
- In the Administrator perspective, select Observe → Alerting. The three main pages in the Alerting UI in this perspective are the Alerts, Silences, and Alerting Rules pages.
- View the rules for OVN-Kubernetes alerts by selecting Observe → Alerting → Alerting Rules.
26.3.3. Viewing OVN-Kubernetes alerts in the CLI
You can get information about alerts and their governing alerting rules and silences from the command line.
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role. -
The OpenShift CLI (
oc
) installed. -
You have installed
jq
.
Procedure
View active or firing alerts by running the following commands.
Set the alert manager route environment variable by running the following command:
ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring \ -o jsonpath='{@.spec.host}')
$ ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring \ -o jsonpath='{@.spec.host}')
Copy to Clipboard Copied! Issue a
curl
request to the alert manager route API by running the following command, replacing$ALERT_MANAGER
with the URL of yourAlertmanager
instance:curl -s -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$ALERT_MANAGER/api/v1/alerts | jq '.data[] | "\(.labels.severity) \(.labels.alertname) \(.labels.pod) \(.labels.container) \(.labels.endpoint) \(.labels.instance)"'
$ curl -s -k -H "Authorization: Bearer $(oc create token prometheus-k8s -n openshift-monitoring)" https://$ALERT_MANAGER/api/v1/alerts | jq '.data[] | "\(.labels.severity) \(.labels.alertname) \(.labels.pod) \(.labels.container) \(.labels.endpoint) \(.labels.instance)"'
Copy to Clipboard Copied!
View alerting rules by running the following command:
oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/rules' | jq '.data.groups[].rules[] | select(((.name|contains("ovn")) or (.name|contains("OVN")) or (.name|contains("Ovn")) or (.name|contains("North")) or (.name|contains("South"))) and .type=="alerting")'
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/rules' | jq '.data.groups[].rules[] | select(((.name|contains("ovn")) or (.name|contains("OVN")) or (.name|contains("Ovn")) or (.name|contains("North")) or (.name|contains("South"))) and .type=="alerting")'
Copy to Clipboard Copied!
26.3.4. Viewing the OVN-Kubernetes logs using the CLI
You can view the logs for each of the pods in the ovnkube-master
and ovnkube-node
pods using the OpenShift CLI (oc
).
Prerequisites
-
Access to the cluster as a user with the
cluster-admin
role. -
Access to the OpenShift CLI (
oc
). -
You have installed
jq
.
Procedure
View the log for a specific pod:
oc logs -f <pod_name> -c <container_name> -n <namespace>
$ oc logs -f <pod_name> -c <container_name> -n <namespace>
Copy to Clipboard Copied! where:
-f
- Optional: Specifies that the output follows what is being written into the logs.
<pod_name>
- Specifies the name of the pod.
<container_name>
- Optional: Specifies the name of a container. When a pod has more than one container, you must specify the container name.
<namespace>
- Specify the namespace the pod is running in.
For example:
oc logs ovnkube-node-5dx44 -n openshift-ovn-kubernetes
$ oc logs ovnkube-node-5dx44 -n openshift-ovn-kubernetes
Copy to Clipboard Copied! oc logs -f ovnkube-node-5dx44 -c ovnkube-controller -n openshift-ovn-kubernetes
$ oc logs -f ovnkube-node-5dx44 -c ovnkube-controller -n openshift-ovn-kubernetes
Copy to Clipboard Copied! The contents of log files are printed out.
Examine the most recent entries in all the containers in the
ovnkube-node
pods:for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \ -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); \ do echo === $p ===; for container in $(oc get pods -n openshift-ovn-kubernetes $p \ -o json | jq -r '.status.containerStatuses[] | .name');do echo ---$container---; \ oc logs -c $container $p -n openshift-ovn-kubernetes --tail=5; done; done
$ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \ -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); \ do echo === $p ===; for container in $(oc get pods -n openshift-ovn-kubernetes $p \ -o json | jq -r '.status.containerStatuses[] | .name');do echo ---$container---; \ oc logs -c $container $p -n openshift-ovn-kubernetes --tail=5; done; done
Copy to Clipboard Copied! View the last 5 lines of every log in every container in an
ovnkube-node
pod using the following command:oc logs -l app=ovnkube-node -n openshift-ovn-kubernetes --all-containers --tail 5
$ oc logs -l app=ovnkube-node -n openshift-ovn-kubernetes --all-containers --tail 5
Copy to Clipboard Copied!
26.3.5. Viewing the OVN-Kubernetes logs using the web console
You can view the logs for each of the pods in the ovnkube-master
and ovnkube-node
pods in the web console.
Prerequisites
-
Access to the OpenShift CLI (
oc
).
Procedure
- In the OpenShift Container Platform console, navigate to Workloads → Pods or navigate to the pod through the resource you want to investigate.
-
Select the
openshift-ovn-kubernetes
project from the drop-down menu. - Click the name of the pod you want to investigate.
-
Click Logs. By default for the
ovnkube-master
the logs associated with thenorthd
container are displayed. - Use the down-down menu to select logs for each container in turn.
26.3.5.1. Changing the OVN-Kubernetes log levels
The default log level for OVN-Kubernetes is 4. To debug OVN-Kubernetes, set the log level to 5. Follow this procedure to increase the log level of the OVN-Kubernetes to help you debug an issue.
Prerequisites
-
You have access to the cluster with
cluster-admin
privileges. - You have access to the OpenShift Container Platform web console.
Procedure
Run the following command to get detailed information for all pods in the OVN-Kubernetes project:
oc get po -o wide -n openshift-ovn-kubernetes
$ oc get po -o wide -n openshift-ovn-kubernetes
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ovnkube-control-plane-65497d4548-9ptdr 2/2 Running 2 (128m ago) 147m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none> ovnkube-control-plane-65497d4548-j6zfk 2/2 Running 0 147m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none> ovnkube-node-5dx44 8/8 Running 0 146m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none> ovnkube-node-dpfn4 8/8 Running 0 146m 10.0.0.4 ci-ln-3njdr9b-72292-5nwkp-master-1 <none> <none> ovnkube-node-kwc9l 8/8 Running 0 134m 10.0.128.2 ci-ln-3njdr9b-72292-5nwkp-worker-a-2fjcj <none> <none> ovnkube-node-mcrhl 8/8 Running 0 134m 10.0.128.4 ci-ln-3njdr9b-72292-5nwkp-worker-c-v9x5v <none> <none> ovnkube-node-nsct4 8/8 Running 0 146m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none> ovnkube-node-zrj9f 8/8 Running 0 134m 10.0.128.3 ci-ln-3njdr9b-72292-5nwkp-worker-b-v78h7 <none> <none>
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES ovnkube-control-plane-65497d4548-9ptdr 2/2 Running 2 (128m ago) 147m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none> ovnkube-control-plane-65497d4548-j6zfk 2/2 Running 0 147m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none> ovnkube-node-5dx44 8/8 Running 0 146m 10.0.0.3 ci-ln-3njdr9b-72292-5nwkp-master-0 <none> <none> ovnkube-node-dpfn4 8/8 Running 0 146m 10.0.0.4 ci-ln-3njdr9b-72292-5nwkp-master-1 <none> <none> ovnkube-node-kwc9l 8/8 Running 0 134m 10.0.128.2 ci-ln-3njdr9b-72292-5nwkp-worker-a-2fjcj <none> <none> ovnkube-node-mcrhl 8/8 Running 0 134m 10.0.128.4 ci-ln-3njdr9b-72292-5nwkp-worker-c-v9x5v <none> <none> ovnkube-node-nsct4 8/8 Running 0 146m 10.0.0.5 ci-ln-3njdr9b-72292-5nwkp-master-2 <none> <none> ovnkube-node-zrj9f 8/8 Running 0 134m 10.0.128.3 ci-ln-3njdr9b-72292-5nwkp-worker-b-v78h7 <none> <none>
Copy to Clipboard Copied! Create a
ConfigMap
file similar to the following example and use a filename such asenv-overrides.yaml
:Example
ConfigMap
filekind: ConfigMap apiVersion: v1 metadata: name: env-overrides namespace: openshift-ovn-kubernetes data: ci-ln-3njdr9b-72292-5nwkp-master-0: | # This sets the log level for the ovn-kubernetes node process: OVN_KUBE_LOG_LEVEL=5 # You might also/instead want to enable debug logging for ovn-controller: OVN_LOG_LEVEL=dbg ci-ln-3njdr9b-72292-5nwkp-master-2: | # This sets the log level for the ovn-kubernetes node process: OVN_KUBE_LOG_LEVEL=5 # You might also/instead want to enable debug logging for ovn-controller: OVN_LOG_LEVEL=dbg _master: | # This sets the log level for the ovn-kubernetes master process as well as the ovn-dbchecker: OVN_KUBE_LOG_LEVEL=5 # You might also/instead want to enable debug logging for northd, nbdb and sbdb on all masters: OVN_LOG_LEVEL=dbg
kind: ConfigMap apiVersion: v1 metadata: name: env-overrides namespace: openshift-ovn-kubernetes data: ci-ln-3njdr9b-72292-5nwkp-master-0: |
1 # This sets the log level for the ovn-kubernetes node process: OVN_KUBE_LOG_LEVEL=5 # You might also/instead want to enable debug logging for ovn-controller: OVN_LOG_LEVEL=dbg ci-ln-3njdr9b-72292-5nwkp-master-2: | # This sets the log level for the ovn-kubernetes node process: OVN_KUBE_LOG_LEVEL=5 # You might also/instead want to enable debug logging for ovn-controller: OVN_LOG_LEVEL=dbg _master: |
2 # This sets the log level for the ovn-kubernetes master process as well as the ovn-dbchecker: OVN_KUBE_LOG_LEVEL=5 # You might also/instead want to enable debug logging for northd, nbdb and sbdb on all masters: OVN_LOG_LEVEL=dbg
Copy to Clipboard Copied! Apply the
ConfigMap
file by using the following command:oc apply -n openshift-ovn-kubernetes -f env-overrides.yaml
$ oc apply -n openshift-ovn-kubernetes -f env-overrides.yaml
Copy to Clipboard Copied! Example output
configmap/env-overrides.yaml created
configmap/env-overrides.yaml created
Copy to Clipboard Copied! Restart the
ovnkube
pods to apply the new log level by using the following commands:oc delete pod -n openshift-ovn-kubernetes \ --field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-0 -l app=ovnkube-node
$ oc delete pod -n openshift-ovn-kubernetes \ --field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-0 -l app=ovnkube-node
Copy to Clipboard Copied! oc delete pod -n openshift-ovn-kubernetes \ --field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-2 -l app=ovnkube-node
$ oc delete pod -n openshift-ovn-kubernetes \ --field-selector spec.nodeName=ci-ln-3njdr9b-72292-5nwkp-master-2 -l app=ovnkube-node
Copy to Clipboard Copied! oc delete pod -n openshift-ovn-kubernetes -l app=ovnkube-node
$ oc delete pod -n openshift-ovn-kubernetes -l app=ovnkube-node
Copy to Clipboard Copied! To verify that the `ConfigMap`file has been applied to all nodes for a specific pod, run the following command:
oc logs -n openshift-ovn-kubernetes --all-containers --prefix ovnkube-node-<xxxx> | grep -E -m 10 '(Logging config:|vconsole|DBG)'
$ oc logs -n openshift-ovn-kubernetes --all-containers --prefix ovnkube-node-<xxxx> | grep -E -m 10 '(Logging config:|vconsole|DBG)'
Copy to Clipboard Copied! where:
<XXXX>
Specifies the random sequence of letters for a pod from the previous step.
Example output
[pod/ovnkube-node-2cpjc/sbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_sb_ovsdb [pod/ovnkube-node-2cpjc/ovnkube-controller] I1012 14:39:59.984506 35767 config.go:2247] Logging config: {File: CNIFile:/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:5 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} [pod/ovnkube-node-2cpjc/northd] + exec ovn-northd --no-chdir -vconsole:info -vfile:off '-vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --pidfile /var/run/ovn/ovn-northd.pid --n-threads=1 [pod/ovnkube-node-2cpjc/nbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_nb_ovsdb [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.552Z|00002|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (32 nodes total across 32 buckets) [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00003|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (64 nodes total across 64 buckets) [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00004|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (32 nodes total across 32 buckets) [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00005|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering BACKOFF [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00007|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering CONNECTING [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00008|ovsdb_cs|DBG|unix:/var/run/openvswitch/db.sock: SERVER_SCHEMA_REQUESTED -> SERVER_SCHEMA_REQUESTED at lib/ovsdb-cs.c:423
[pod/ovnkube-node-2cpjc/sbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_sb_ovsdb [pod/ovnkube-node-2cpjc/ovnkube-controller] I1012 14:39:59.984506 35767 config.go:2247] Logging config: {File: CNIFile:/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log LibovsdbFile:/var/log/ovnkube/libovsdb.log Level:5 LogFileMaxSize:100 LogFileMaxBackups:5 LogFileMaxAge:0 ACLLoggingRateLimit:20} [pod/ovnkube-node-2cpjc/northd] + exec ovn-northd --no-chdir -vconsole:info -vfile:off '-vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' --pidfile /var/run/ovn/ovn-northd.pid --n-threads=1 [pod/ovnkube-node-2cpjc/nbdb] + exec /usr/share/ovn/scripts/ovn-ctl --no-monitor '--ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D{%Y-%m-%dT%H:%M:%S.###Z}|%05N|%c%T|%p|%m' run_nb_ovsdb [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.552Z|00002|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (32 nodes total across 32 buckets) [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00003|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 6 nodes (64 nodes total across 64 buckets) [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00004|hmap|DBG|lib/shash.c:114: 1 bucket with 6+ nodes, including 1 bucket with 7 nodes (32 nodes total across 32 buckets) [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00005|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering BACKOFF [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00007|reconnect|DBG|unix:/var/run/openvswitch/db.sock: entering CONNECTING [pod/ovnkube-node-2cpjc/ovn-controller] 2023-10-12T14:39:54.553Z|00008|ovsdb_cs|DBG|unix:/var/run/openvswitch/db.sock: SERVER_SCHEMA_REQUESTED -> SERVER_SCHEMA_REQUESTED at lib/ovsdb-cs.c:423
Copy to Clipboard Copied!
Optional: Check the
ConfigMap
file has been applied by running the following command:for f in $(oc -n openshift-ovn-kubernetes get po -l 'app=ovnkube-node' --no-headers -o custom-columns=N:.metadata.name) ; do echo "---- $f ----" ; oc -n openshift-ovn-kubernetes exec -c ovnkube-controller $f -- pgrep -a -f init-ovnkube-controller | grep -P -o '^.*loglevel\s+\d' ; done
for f in $(oc -n openshift-ovn-kubernetes get po -l 'app=ovnkube-node' --no-headers -o custom-columns=N:.metadata.name) ; do echo "---- $f ----" ; oc -n openshift-ovn-kubernetes exec -c ovnkube-controller $f -- pgrep -a -f init-ovnkube-controller | grep -P -o '^.*loglevel\s+\d' ; done
Copy to Clipboard Copied! Example output
---- ovnkube-node-2dt57 ---- 60981 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-c-vmh5n.c.openshift-qe.internal --init-node xpst8-worker-c-vmh5n.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-4zznh ---- 178034 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-2.c.openshift-qe.internal --init-node xpst8-master-2.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-548sx ---- 77499 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-a-fjtnb.c.openshift-qe.internal --init-node xpst8-worker-a-fjtnb.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-6btrf ---- 73781 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-b-p8rww.c.openshift-qe.internal --init-node xpst8-worker-b-p8rww.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-fkc9r ---- 130707 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-0.c.openshift-qe.internal --init-node xpst8-master-0.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 5 ---- ovnkube-node-tk9l4 ---- 181328 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-1.c.openshift-qe.internal --init-node xpst8-master-1.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
---- ovnkube-node-2dt57 ---- 60981 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-c-vmh5n.c.openshift-qe.internal --init-node xpst8-worker-c-vmh5n.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-4zznh ---- 178034 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-2.c.openshift-qe.internal --init-node xpst8-master-2.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-548sx ---- 77499 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-a-fjtnb.c.openshift-qe.internal --init-node xpst8-worker-a-fjtnb.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-6btrf ---- 73781 /usr/bin/ovnkube --init-ovnkube-controller xpst8-worker-b-p8rww.c.openshift-qe.internal --init-node xpst8-worker-b-p8rww.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4 ---- ovnkube-node-fkc9r ---- 130707 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-0.c.openshift-qe.internal --init-node xpst8-master-0.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 5 ---- ovnkube-node-tk9l4 ---- 181328 /usr/bin/ovnkube --init-ovnkube-controller xpst8-master-1.c.openshift-qe.internal --init-node xpst8-master-1.c.openshift-qe.internal --config-file=/run/ovnkube-config/ovnkube.conf --ovn-empty-lb-events --loglevel 4
Copy to Clipboard Copied!
26.3.6. Checking the OVN-Kubernetes pod network connectivity
The connectivity check controller, in OpenShift Container Platform 4.10 and later, orchestrates connection verification checks in your cluster. These include Kubernetes API, OpenShift API and individual nodes. The results for the connection tests are stored in PodNetworkConnectivity
objects in the openshift-network-diagnostics
namespace. Connection tests are performed every minute in parallel.
Prerequisites
-
Access to the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role. -
You have installed
jq
.
Procedure
To list the current
PodNetworkConnectivityCheck
objects, enter the following command:oc get podnetworkconnectivitychecks -n openshift-network-diagnostics
$ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics
Copy to Clipboard Copied! View the most recent success for each connection object by using the following command:
oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \ -o json | jq '.items[]| .spec.targetEndpoint,.status.successes[0]'
$ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \ -o json | jq '.items[]| .spec.targetEndpoint,.status.successes[0]'
Copy to Clipboard Copied! View the most recent failures for each connection object by using the following command:
oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \ -o json | jq '.items[]| .spec.targetEndpoint,.status.failures[0]'
$ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \ -o json | jq '.items[]| .spec.targetEndpoint,.status.failures[0]'
Copy to Clipboard Copied! View the most recent outages for each connection object by using the following command:
oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \ -o json | jq '.items[]| .spec.targetEndpoint,.status.outages[0]'
$ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \ -o json | jq '.items[]| .spec.targetEndpoint,.status.outages[0]'
Copy to Clipboard Copied! The connectivity check controller also logs metrics from these checks into Prometheus.
View all the metrics by running the following command:
oc exec prometheus-k8s-0 -n openshift-monitoring -- \ promtool query instant http://localhost:9090 \ '{component="openshift-network-diagnostics"}'
$ oc exec prometheus-k8s-0 -n openshift-monitoring -- \ promtool query instant http://localhost:9090 \ '{component="openshift-network-diagnostics"}'
Copy to Clipboard Copied! View the latency between the source pod and the openshift api service for the last 5 minutes:
oc exec prometheus-k8s-0 -n openshift-monitoring -- \ promtool query instant http://localhost:9090 \ '{component="openshift-network-diagnostics"}'
$ oc exec prometheus-k8s-0 -n openshift-monitoring -- \ promtool query instant http://localhost:9090 \ '{component="openshift-network-diagnostics"}'
Copy to Clipboard Copied!
26.4. OVN-Kubernetes network policy
The AdminNetworkPolicy
resource is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
Kubernetes offers two features that users can use to enforce network security. One feature that allows users to enforce network policy is the NetworkPolicy
API that is designed mainly for application developers and namespace tenants to protect their namespaces by creating namespace-scoped policies. For more information, see About network policy.
The second feature is AdminNetworkPolicy
which is comprised of two API: the AdminNetworkPolicy
(ANP) API and the BaselineAdminNetworkPolicy
(BANP) API. ANP and BANP are designed for cluster and network administrators to protect their entire cluster by creating cluster-scoped policies. Cluster administrators can use ANPs to enforce non-overridable policies that take precedence over NetworkPolicy
objects. Administrators can use BANP to setup and enforce optional cluster-scoped network policy rules that are overridable by users using NetworkPolicy
objects if need be. When used together ANP and BANP can create multi-tenancy policy that administrators can use to secure their cluster.
OVN-Kubernetes CNI in OpenShift Container Platform implements these network policies using Access Control List (ACLs) Tiers to evaluate and apply them. ACLs are evaluated in descending order from Tier 1 to Tier 3.
Tier 1 evaluates AdminNetworkPolicy
(ANP) objects. Tier 2 evaluates NetworkPolicy
objects. Tier 3 evaluates BaselineAdminNetworkPolicy
(BANP) objects.
Figure 26.3. OVK-Kubernetes Access Control List (ACL)

If traffic matches an ANP rule, the rules in that ANP will be evaluated first. If the match is an ANP allow
or deny
rule, any existing NetworkPolicies
and BaselineAdminNetworkPolicy
(BANP) in the cluster will be intentionally skipped from evaluation. If the match is an ANP pass
rule, then evaluation moves from tier 1 of the ACLs to tier 2 where the NetworkPolicy
policy is evaluated.
26.4.1. AdminNetworkPolicy
An AdminNetworkPolicy
(ANP) is a cluster-scoped custom resource definition (CRD). As a OpenShift Container Platform administrator, you can use ANP to secure your network by creating network policies before creating namespaces. Additionally, you can create network policies on a cluster-scoped level that is non-overridable by NetworkPolicy
objects.
The key difference between AdminNetworkPolicy
and NetworkPolicy
objects are that the former is for administrators and is cluster scoped while the latter is for tenant owners and is namespace scoped.
An ANP allows administrators to specify the following:
-
A
priority
value that determines the order of its evaluation. The lower the value the higher the precedence. -
A
subject
that consists of a set of namespaces or namespace.. -
A list of ingress rules to be applied for all ingress traffic towards the
subject
. -
A list of egress rules to be applied for all egress traffic from the
subject
.
The AdminNetworkPolicy
resource is a TechnologyPreviewNoUpgrade
feature that can be enabled on test clusters that are not in production. For more information on feature gates and TechnologyPreviewNoUpgrade
features, see "Enabling features using feature gates" in the "Additional resources" of this section.
AdminNetworkPolicy example
Example 26.1. Example YAML file for an ANP
apiVersion: policy.networking.k8s.io/v1alpha1 kind: AdminNetworkPolicy metadata: name: sample-anp-deny-pass-rules spec: priority: 50 subject: namespaces: matchLabels: kubernetes.io/metadata.name: example.name ingress: - name: "deny-all-ingress-tenant-1" action: "Deny" from: - pods: namespaces: namespaceSelector: matchLabels: custom-anp: tenant-1 podSelector: matchLabels: custom-anp: tenant-1 egress: - name: "pass-all-egress-to-tenant-1" action: "Pass" to: - pods: namespaces: namespaceSelector: matchLabels: custom-anp: tenant-1 podSelector: matchLabels: custom-anp: tenant-1
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
name: sample-anp-deny-pass-rules
spec:
priority: 50
subject:
namespaces:
matchLabels:
kubernetes.io/metadata.name: example.name
ingress:
- name: "deny-all-ingress-tenant-1"
action: "Deny"
from:
- pods:
namespaces:
namespaceSelector:
matchLabels:
custom-anp: tenant-1
podSelector:
matchLabels:
custom-anp: tenant-1
egress:
- name: "pass-all-egress-to-tenant-1"
action: "Pass"
to:
- pods:
namespaces:
namespaceSelector:
matchLabels:
custom-anp: tenant-1
podSelector:
matchLabels:
custom-anp: tenant-1
- 1
- Specify a name for your ANP.
- 2
- The
spec.priority
field supports a maximum of 100 ANP in the values of 0-99 in a cluster. The lower the value the higher the precedence. CreatingAdminNetworkPolicy
with the same priority creates a nondeterministic outcome. - 3
- Specify the namespace to apply the ANP resource.
- 4
- ANP have both ingress and egress rules. ANP rules for
spec.ingress
field accepts values ofPass
,Deny
, andAllow
for theaction
field. - 5
- Specify a name for the
ingress.name
. - 6
- Specify the namespaces to select the pods from to apply the ANP resource.
- 7
- Specify
podSelector.matchLabels
name of the pods to apply the ANP resource. - 8
- ANP have both ingress and egress rules. ANP rules for
spec.egress
field accepts values ofPass
,Deny
, andAllow
for theaction
field.
Additional resources
26.4.1.1. AdminNetworkPolicy actions for rules
As an administrator, you can set Allow
, Deny
, or Pass
as the action
field for your AdminNetworkPolicy
rules. Because OVN-Kubernetes uses a tiered ACLs to evaluate network traffic rules, ANP allow you to set very strong policy rules that can only be changed by an administrator modifying them, deleting the rule, or overriding them by setting a higher priority rule.
AdminNetworkPolicy Allow example
The following ANP that is defined at priority 9 ensures all ingress traffic is allowed from the monitoring
namespace towards any tenant (all other namespaces) in the cluster.
Example 26.2. Example YAML file for a strong Allow
ANP
apiVersion: policy.networking.k8s.io/v1alpha1 kind: AdminNetworkPolicy metadata: name: allow-monitoring spec: priority: 9 subject: namespaces: {} ingress: - name: "allow-ingress-from-monitoring" action: "Allow" from: - namespaces: namespaceSelector: matchLabels: kubernetes.io/metadata.name: monitoring # ...
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
name: allow-monitoring
spec:
priority: 9
subject:
namespaces: {}
ingress:
- name: "allow-ingress-from-monitoring"
action: "Allow"
from:
- namespaces:
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
# ...
This is an example of a strong Allow
ANP because it is non-overridable by all the parties involved. No tenants can block themselves from being monitored using NetworkPolicy
objects and the monitoring tenant also has no say in what it can or cannot monitor.
AdminNetworkPolicy Deny example
The following ANP that is defined at priority 5 ensures all ingress traffic from the monitoring
namespace is blocked towards restricted tenants (namespaces that have labels security: restricted
).
Example 26.3. Example YAML file for a strong Deny
ANP
apiVersion: policy.networking.k8s.io/v1alpha1 kind: AdminNetworkPolicy metadata: name: block-monitoring spec: priority: 5 subject: namespaces: matchLabels: security: restricted ingress: - name: "deny-ingress-from-monitoring" action: "Deny" from: - namespaces: namespaceSelector: matchLabels: kubernetes.io/metadata.name: monitoring # ...
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
name: block-monitoring
spec:
priority: 5
subject:
namespaces:
matchLabels:
security: restricted
ingress:
- name: "deny-ingress-from-monitoring"
action: "Deny"
from:
- namespaces:
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
# ...
This is a strong Deny
ANP that is non-overridable by all the parties involved. The restricted tenant owners cannot authorize themselves to allow monitoring traffic, and the infrastructure’s monitoring service cannot scrape anything from these sensitive namespaces.
When combined with the strong Allow
example, the block-monitoring
ANP has a lower priority value giving it higher precedence, which ensures restricted tenants are never monitored.
AdminNetworkPolicy Pass example
TThe following ANP that is defined at priority 7 ensures all ingress traffic from the monitoring
namespace towards internal infrastructure tenants (namespaces that have labels security: internal
) are passed on to tier 2 of the ACLs and evaluated by the namespaces’ NetworkPolicy
objects.
Example 26.4. Example YAML file for a strong Pass
ANP
apiVersion: policy.networking.k8s.io/v1alpha1 kind: AdminNetworkPolicy metadata: name: pass-monitoring spec: priority: 7 subject: namespaces: matchLabels: security: internal ingress: - name: "pass-ingress-from-monitoring" action: "Pass" from: - namespaces: namespaceSelector: matchLabels: kubernetes.io/metadata.name: monitoring # ...
apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
name: pass-monitoring
spec:
priority: 7
subject:
namespaces:
matchLabels:
security: internal
ingress:
- name: "pass-ingress-from-monitoring"
action: "Pass"
from:
- namespaces:
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
# ...
This example is a strong Pass
action ANP because it delegates the decision to NetworkPolicy
objects defined by tenant owners. This pass-monitoring
ANP allows all tenant owners grouped at security level internal
to choose if their metrics should be scraped by the infrastructures' monitoring service using namespace scoped NetworkPolicy
objects.
26.4.2. BaselineAdminNetworkPolicy
BaselineAdminNetworkPolicy
(BANP) is a cluster-scoped custom resource definition (CRD). As a OpenShift Container Platform administrator, you can use BANP to setup and enforce optional baseline network policy rules that are overridable by users using NetworkPolicy
objects if need be. Rule actions for BANP are allow
or deny
.
The BaselineAdminNetworkPolicy
resource is a cluster singleton object that can be used as a guardrail policy incase a passed traffic policy does not match any NetworkPolicy
objects in the cluster. A BANP can also be used as a default security model that provides guardrails that intra-cluster traffic is blocked by default and a user will need to use NetworkPolicy
objects to allow known traffic. You must use default
as the name when creating a BANP resource.
A BANP allows administrators to specify:
-
A
subject
that consists of a set of namespaces or namespace. -
A list of ingress rules to be applied for all ingress traffic towards the
subject
. -
A list of egress rules to be applied for all egress traffic from the
subject
.
BaselineAdminNetworkPolicy
is a TechnologyPreviewNoUpgrade
feature that can be enabled on test clusters that are not in production.
BaselineAdminNetworkPolicy example
Example 26.5. Example YAML file for BANP
apiVersion: policy.networking.k8s.io/v1alpha1 kind: BaselineAdminNetworkPolicy metadata: name: default spec: subject: namespaces: matchLabels: kubernetes.io/metadata.name: example.name ingress: - name: "deny-all-ingress-from-tenant-1" action: "Deny" from: - pods: namespaces: namespaceSelector: matchLabels: custom-banp: tenant-1 podSelector: matchLabels: custom-banp: tenant-1 egress: - name: "allow-all-egress-to-tenant-1" action: "Allow" to: - pods: namespaces: namespaceSelector: matchLabels: custom-banp: tenant-1 podSelector: matchLabels: custom-banp: tenant-1
apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
name: default
spec:
subject:
namespaces:
matchLabels:
kubernetes.io/metadata.name: example.name
ingress:
- name: "deny-all-ingress-from-tenant-1"
action: "Deny"
from:
- pods:
namespaces:
namespaceSelector:
matchLabels:
custom-banp: tenant-1
podSelector:
matchLabels:
custom-banp: tenant-1
egress:
- name: "allow-all-egress-to-tenant-1"
action: "Allow"
to:
- pods:
namespaces:
namespaceSelector:
matchLabels:
custom-banp: tenant-1
podSelector:
matchLabels:
custom-banp: tenant-1
- 1
- The policy name must be
default
because BANP is a singleton object. - 2
- Specify the namespace to apply the ANP to.
- 3
- BANP have both ingress and egress rules. BANP rules for
spec.ingress
andspec.egress
fields accepts values ofDeny
andAllow
for theaction
field. - 4
- Specify a name for the
ingress.name
- 5
- Specify the namespaces to select the pods from to apply the BANP resource.
- 6
- Specify
podSelector.matchLabels
name of the pods to apply the BANP resource.
BaselineAdminNetworkPolicy Deny example
The following BANP singleton ensures that the administrator has set up a default deny policy for all ingress monitoring traffic coming into the tenants at internal
security level. When combined with the "AdminNetworkPolicy Pass example", this deny policy acts as a guardrail policy for all ingress traffic that is passed by the ANP pass-monitoring
policy.
Example 26.6. Example YAML file for a guardrail Deny
rule
apiVersion: policy.networking.k8s.io/v1alpha1 kind: BaselineAdminNetworkPolicy metadata: name: default spec: subject: namespaces: matchLabels: security: internal ingress: - name: "deny-ingress-from-monitoring" action: "Deny" from: - namespaces: namespaceSelector: matchLabels: kubernetes.io/metadata.name: monitoring # ...
apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
name: default
spec:
subject:
namespaces:
matchLabels:
security: internal
ingress:
- name: "deny-ingress-from-monitoring"
action: "Deny"
from:
- namespaces:
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
# ...
You can use an AdminNetworkPolicy
resource with a Pass
value for the action
field in conjunction with the BaselineAdminNetworkPolicy
resource to create a multi-tenant policy. This multi-tenant policy allows one tenant to collect monitoring data on their application while simultaneously not collecting data from a second tenant.
As an administrator, if you apply both the "AdminNetworkPolicy Pass
action example" and the "BaselineAdminNetwork Policy Deny
example", tenants are then left with the ability to choose to create a NetworkPolicy
resource that will be evaluated before the BANP.
For example, Tenant 1 can set up the following NetworkPolicy
resource to monitor ingress traffic:
Example 26.7. Example NetworkPolicy
apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-monitoring namespace: tenant 1 spec: podSelector: policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: monitoring # ...
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-monitoring
namespace: tenant 1
spec:
podSelector:
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: monitoring
# ...
In this scenario, Tenant 1’s policy would be evaluated after the "AdminNetworkPolicy Pass
action example" and before the "BaselineAdminNetwork Policy Deny
example", which denies all ingress monitoring traffic coming into tenants with security
level internal
. With Tenant 1’s NetworkPolicy
object in place, they will be able to collect data on their application. Tenant 2, however, who does not have any NetworkPolicy
objects in place, will not be able to collect data. As an administrator, you have not by default monitored internal tenants, but instead, you created a BANP that allows tenants to use NetworkPolicy
objects to override the default behavior of your BANP.
26.5. Tracing Openflow with ovnkube-trace
OVN and OVS traffic flows can be simulated in a single utility called ovnkube-trace
. The ovnkube-trace
utility runs ovn-trace
, ovs-appctl ofproto/trace
and ovn-detrace
and correlates that information in a single output.
You can execute the ovnkube-trace
binary from a dedicated container. For releases after OpenShift Container Platform 4.7, you can also copy the binary to a local host and execute it from that host.
26.5.1. Installing the ovnkube-trace on local host
The ovnkube-trace
tool traces packet simulations for arbitrary UDP or TCP traffic between points in an OVN-Kubernetes driven OpenShift Container Platform cluster. Copy the ovnkube-trace
binary to your local host making it available to run against the cluster.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
Create a pod variable by using the following command:
POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-control-plane -o name | head -1 | awk -F '/' '{print $NF}')
$ POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-control-plane -o name | head -1 | awk -F '/' '{print $NF}')
Copy to Clipboard Copied! Run the following command on your local host to copy the binary from the
ovnkube-control-plane
pods:oc cp -n openshift-ovn-kubernetes $POD:/usr/bin/ovnkube-trace -c ovnkube-cluster-manager ovnkube-trace
$ oc cp -n openshift-ovn-kubernetes $POD:/usr/bin/ovnkube-trace -c ovnkube-cluster-manager ovnkube-trace
Copy to Clipboard Copied! NoteIf you are using Red Hat Enterprise Linux (RHEL) 8 to run the
ovnkube-trace
tool, you must copy the file/usr/lib/rhel8/ovnkube-trace
to your local host.Make
ovnkube-trace
executable by running the following command:chmod +x ovnkube-trace
$ chmod +x ovnkube-trace
Copy to Clipboard Copied! Display the options available with
ovnkube-trace
by running the following command:./ovnkube-trace -help
$ ./ovnkube-trace -help
Copy to Clipboard Copied! Expected output
Usage of ./ovnkube-trace: -addr-family string Address family (ip4 or ip6) to be used for tracing (default "ip4") -dst string dest: destination pod name -dst-ip string destination IP address (meant for tests to external targets) -dst-namespace string k8s namespace of dest pod (default "default") -dst-port string dst-port: destination port (default "80") -kubeconfig string absolute path to the kubeconfig file -loglevel string loglevel: klog level (default "0") -ovn-config-namespace string namespace used by ovn-config itself -service string service: destination service name -skip-detrace skip ovn-detrace command -src string src: source pod name -src-namespace string k8s namespace of source pod (default "default") -tcp use tcp transport protocol -udp use udp transport protocol
Usage of ./ovnkube-trace: -addr-family string Address family (ip4 or ip6) to be used for tracing (default "ip4") -dst string dest: destination pod name -dst-ip string destination IP address (meant for tests to external targets) -dst-namespace string k8s namespace of dest pod (default "default") -dst-port string dst-port: destination port (default "80") -kubeconfig string absolute path to the kubeconfig file -loglevel string loglevel: klog level (default "0") -ovn-config-namespace string namespace used by ovn-config itself -service string service: destination service name -skip-detrace skip ovn-detrace command -src string src: source pod name -src-namespace string k8s namespace of source pod (default "default") -tcp use tcp transport protocol -udp use udp transport protocol
Copy to Clipboard Copied! The command-line arguments supported are familiar Kubernetes constructs, such as namespaces, pods, services so you do not need to find the MAC address, the IP address of the destination nodes, or the ICMP type.
The log levels are:
- 0 (minimal output)
- 2 (more verbose output showing results of trace commands)
- 5 (debug output)
26.5.2. Running ovnkube-trace
Run ovn-trace
to simulate packet forwarding within an OVN logical network.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. -
You have installed
ovnkube-trace
on local host
Example: Testing that DNS resolution works from a deployed pod
This example illustrates how to test the DNS resolution from a deployed pod to the core DNS pod that runs in the cluster.
Procedure
Start a web service in the default namespace by entering the following command:
oc run web --namespace=default --image=quay.io/openshifttest/nginx --labels="app=web" --expose --port=80
$ oc run web --namespace=default --image=quay.io/openshifttest/nginx --labels="app=web" --expose --port=80
Copy to Clipboard Copied! List the pods running in the
openshift-dns
namespace:oc get pods -n openshift-dns
oc get pods -n openshift-dns
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE dns-default-8s42x 2/2 Running 0 5h8m dns-default-mdw6r 2/2 Running 0 4h58m dns-default-p8t5h 2/2 Running 0 4h58m dns-default-rl6nk 2/2 Running 0 5h8m dns-default-xbgqx 2/2 Running 0 5h8m dns-default-zv8f6 2/2 Running 0 4h58m node-resolver-62jjb 1/1 Running 0 5h8m node-resolver-8z4cj 1/1 Running 0 4h59m node-resolver-bq244 1/1 Running 0 5h8m node-resolver-hc58n 1/1 Running 0 4h59m node-resolver-lm6z4 1/1 Running 0 5h8m node-resolver-zfx5k 1/1 Running 0 5h
NAME READY STATUS RESTARTS AGE dns-default-8s42x 2/2 Running 0 5h8m dns-default-mdw6r 2/2 Running 0 4h58m dns-default-p8t5h 2/2 Running 0 4h58m dns-default-rl6nk 2/2 Running 0 5h8m dns-default-xbgqx 2/2 Running 0 5h8m dns-default-zv8f6 2/2 Running 0 4h58m node-resolver-62jjb 1/1 Running 0 5h8m node-resolver-8z4cj 1/1 Running 0 4h59m node-resolver-bq244 1/1 Running 0 5h8m node-resolver-hc58n 1/1 Running 0 4h59m node-resolver-lm6z4 1/1 Running 0 5h8m node-resolver-zfx5k 1/1 Running 0 5h
Copy to Clipboard Copied! Run the following
ovnkube-trace
command to verify DNS resolution is working:./ovnkube-trace \ -src-namespace default \ -src web \ -dst-namespace openshift-dns \ -dst dns-default-p8t5h \ -udp -dst-port 53 \ -loglevel 0
$ ./ovnkube-trace \ -src-namespace default \
1 -src web \
2 -dst-namespace openshift-dns \
3 -dst dns-default-p8t5h \
4 -udp -dst-port 53 \
5 -loglevel 0
6 Copy to Clipboard Copied! Example output if the
src&dst
pod lands on the same node:ovn-trace source pod to destination pod indicates success from web to dns-default-p8t5h ovn-trace destination pod to source pod indicates success from dns-default-p8t5h to web ovs-appctl ofproto/trace source pod to destination pod indicates success from web to dns-default-p8t5h ovs-appctl ofproto/trace destination pod to source pod indicates success from dns-default-p8t5h to web ovn-detrace source pod to destination pod indicates success from web to dns-default-p8t5h ovn-detrace destination pod to source pod indicates success from dns-default-p8t5h to web
ovn-trace source pod to destination pod indicates success from web to dns-default-p8t5h ovn-trace destination pod to source pod indicates success from dns-default-p8t5h to web ovs-appctl ofproto/trace source pod to destination pod indicates success from web to dns-default-p8t5h ovs-appctl ofproto/trace destination pod to source pod indicates success from dns-default-p8t5h to web ovn-detrace source pod to destination pod indicates success from web to dns-default-p8t5h ovn-detrace destination pod to source pod indicates success from dns-default-p8t5h to web
Copy to Clipboard Copied! Example output if the
src&dst
pod lands on a different node:ovn-trace source pod to destination pod indicates success from web to dns-default-8s42x ovn-trace (remote) source pod to destination pod indicates success from web to dns-default-8s42x ovn-trace destination pod to source pod indicates success from dns-default-8s42x to web ovn-trace (remote) destination pod to source pod indicates success from dns-default-8s42x to web ovs-appctl ofproto/trace source pod to destination pod indicates success from web to dns-default-8s42x ovs-appctl ofproto/trace destination pod to source pod indicates success from dns-default-8s42x to web ovn-detrace source pod to destination pod indicates success from web to dns-default-8s42x ovn-detrace destination pod to source pod indicates success from dns-default-8s42x to web
ovn-trace source pod to destination pod indicates success from web to dns-default-8s42x ovn-trace (remote) source pod to destination pod indicates success from web to dns-default-8s42x ovn-trace destination pod to source pod indicates success from dns-default-8s42x to web ovn-trace (remote) destination pod to source pod indicates success from dns-default-8s42x to web ovs-appctl ofproto/trace source pod to destination pod indicates success from web to dns-default-8s42x ovs-appctl ofproto/trace destination pod to source pod indicates success from dns-default-8s42x to web ovn-detrace source pod to destination pod indicates success from web to dns-default-8s42x ovn-detrace destination pod to source pod indicates success from dns-default-8s42x to web
Copy to Clipboard Copied! The ouput indicates success from the deployed pod to the DNS port and also indicates that it is successful going back in the other direction. So you know bi-directional traffic is supported on UDP port 53 if my web pod wants to do dns resolution from core DNS.
If for example that did not work and you wanted to get the ovn-trace
, the ovs-appctl
of proto/trace
and ovn-detrace
, and more debug type information increase the log level to 2 and run the command again as follows:
./ovnkube-trace \ -src-namespace default \ -src web \ -dst-namespace openshift-dns \ -dst dns-default-467qw \ -udp -dst-port 53 \ -loglevel 2
$ ./ovnkube-trace \
-src-namespace default \
-src web \
-dst-namespace openshift-dns \
-dst dns-default-467qw \
-udp -dst-port 53 \
-loglevel 2
The output from this increased log level is too much to list here. In a failure situation the output of this command shows which flow is dropping that traffic. For example an egress or ingress network policy may be configured on the cluster that does not allow that traffic.
Example: Verifying by using debug output a configured default deny
This example illustrates how to identify by using the debug output that an ingress default deny policy blocks traffic.
Procedure
Create the following YAML that defines a
deny-by-default
policy to deny ingress from all pods in all namespaces. Save the YAML in thedeny-by-default.yaml
file:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default namespace: default spec: podSelector: {} ingress: []
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: deny-by-default namespace: default spec: podSelector: {} ingress: []
Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f deny-by-default.yaml
$ oc apply -f deny-by-default.yaml
Copy to Clipboard Copied! Example output
networkpolicy.networking.k8s.io/deny-by-default created
networkpolicy.networking.k8s.io/deny-by-default created
Copy to Clipboard Copied! Start a web service in the
default
namespace by entering the following command:oc run web --namespace=default --image=quay.io/openshifttest/nginx --labels="app=web" --expose --port=80
$ oc run web --namespace=default --image=quay.io/openshifttest/nginx --labels="app=web" --expose --port=80
Copy to Clipboard Copied! Run the following command to create the
prod
namespace:oc create namespace prod
$ oc create namespace prod
Copy to Clipboard Copied! Run the following command to label the
prod
namespace:oc label namespace/prod purpose=production
$ oc label namespace/prod purpose=production
Copy to Clipboard Copied! Run the following command to deploy an
alpine
image in theprod
namespace and start a shell:oc run test-6459 --namespace=prod --rm -i -t --image=alpine -- sh
$ oc run test-6459 --namespace=prod --rm -i -t --image=alpine -- sh
Copy to Clipboard Copied! - Open another terminal session.
In this new terminal session run
ovn-trace
to verify the failure in communication between the source podtest-6459
running in namespaceprod
and destination pod running in thedefault
namespace:./ovnkube-trace \ -src-namespace prod \ -src test-6459 \ -dst-namespace default \ -dst web \ -tcp -dst-port 80 \ -loglevel 0
$ ./ovnkube-trace \ -src-namespace prod \ -src test-6459 \ -dst-namespace default \ -dst web \ -tcp -dst-port 80 \ -loglevel 0
Copy to Clipboard Copied! Example output
ovn-trace source pod to destination pod indicates failure from test-6459 to web
ovn-trace source pod to destination pod indicates failure from test-6459 to web
Copy to Clipboard Copied! Increase the log level to 2 to expose the reason for the failure by running the following command:
./ovnkube-trace \ -src-namespace prod \ -src test-6459 \ -dst-namespace default \ -dst web \ -tcp -dst-port 80 \ -loglevel 2
$ ./ovnkube-trace \ -src-namespace prod \ -src test-6459 \ -dst-namespace default \ -dst web \ -tcp -dst-port 80 \ -loglevel 2
Copy to Clipboard Copied! Example output
... ------------------------------------------------ 3. ls_out_acl_hint (northd.c:7454): !ct.new && ct.est && !ct.rpl && ct_mark.blocked == 0, priority 4, uuid 12efc456 reg0[8] = 1; reg0[10] = 1; next; 5. ls_out_acl_action (northd.c:7835): reg8[30..31] == 0, priority 500, uuid 69372c5d reg8[30..31] = 1; next(4); 5. ls_out_acl_action (northd.c:7835): reg8[30..31] == 1, priority 500, uuid 2fa0af89 reg8[30..31] = 2; next(4); 4. ls_out_acl_eval (northd.c:7691): reg8[30..31] == 2 && reg0[10] == 1 && (outport == @a16982411286042166782_ingressDefaultDeny), priority 2000, uuid 447d0dab reg8[17] = 1; ct_commit { ct_mark.blocked = 1; }; next; ...
... ------------------------------------------------ 3. ls_out_acl_hint (northd.c:7454): !ct.new && ct.est && !ct.rpl && ct_mark.blocked == 0, priority 4, uuid 12efc456 reg0[8] = 1; reg0[10] = 1; next; 5. ls_out_acl_action (northd.c:7835): reg8[30..31] == 0, priority 500, uuid 69372c5d reg8[30..31] = 1; next(4); 5. ls_out_acl_action (northd.c:7835): reg8[30..31] == 1, priority 500, uuid 2fa0af89 reg8[30..31] = 2; next(4); 4. ls_out_acl_eval (northd.c:7691): reg8[30..31] == 2 && reg0[10] == 1 && (outport == @a16982411286042166782_ingressDefaultDeny), priority 2000, uuid 447d0dab reg8[17] = 1; ct_commit { ct_mark.blocked = 1; };
1 next; ...
Copy to Clipboard Copied! - 1
- Ingress traffic is blocked due to the default deny policy being in place.
Create a policy that allows traffic from all pods in a particular namespaces with a label
purpose=production
. Save the YAML in theweb-allow-prod.yaml
file:kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-prod namespace: default spec: podSelector: matchLabels: app: web policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: purpose: production
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: web-allow-prod namespace: default spec: podSelector: matchLabels: app: web policyTypes: - Ingress ingress: - from: - namespaceSelector: matchLabels: purpose: production
Copy to Clipboard Copied! Apply the policy by entering the following command:
oc apply -f web-allow-prod.yaml
$ oc apply -f web-allow-prod.yaml
Copy to Clipboard Copied! Run
ovnkube-trace
to verify that traffic is now allowed by entering the following command:./ovnkube-trace \ -src-namespace prod \ -src test-6459 \ -dst-namespace default \ -dst web \ -tcp -dst-port 80 \ -loglevel 0
$ ./ovnkube-trace \ -src-namespace prod \ -src test-6459 \ -dst-namespace default \ -dst web \ -tcp -dst-port 80 \ -loglevel 0
Copy to Clipboard Copied! Expected output
ovn-trace source pod to destination pod indicates success from test-6459 to web ovn-trace destination pod to source pod indicates success from web to test-6459 ovs-appctl ofproto/trace source pod to destination pod indicates success from test-6459 to web ovs-appctl ofproto/trace destination pod to source pod indicates success from web to test-6459 ovn-detrace source pod to destination pod indicates success from test-6459 to web ovn-detrace destination pod to source pod indicates success from web to test-6459
ovn-trace source pod to destination pod indicates success from test-6459 to web ovn-trace destination pod to source pod indicates success from web to test-6459 ovs-appctl ofproto/trace source pod to destination pod indicates success from test-6459 to web ovs-appctl ofproto/trace destination pod to source pod indicates success from web to test-6459 ovn-detrace source pod to destination pod indicates success from test-6459 to web ovn-detrace destination pod to source pod indicates success from web to test-6459
Copy to Clipboard Copied! Run the following command in the shell that was opened in step six to connect nginx to the web-server:
wget -qO- --timeout=2 http://web.default
wget -qO- --timeout=2 http://web.default
Copy to Clipboard Copied! Expected output
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
<!DOCTYPE html> <html> <head> <title>Welcome to nginx!</title> <style> body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; } </style> </head> <body> <h1>Welcome to nginx!</h1> <p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p> <p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/> Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p> <p><em>Thank you for using nginx.</em></p> </body> </html>
Copy to Clipboard Copied!
26.6. Migrating from the OpenShift SDN network plugin
As a cluster administrator, you can migrate to the OVN-Kubernetes network plugin from the OpenShift software-defined networking (SDN) plugin.
The following methods exist for migrating from the OpenShift SDN network plugin to the OVN-Kubernetes plugin:
- Ansible playbook
- The Ansible playbook method automates the offline migration method steps. This method has the same usage scenarios as the manual offline migration method.
- Offline migration
- This is a manual process that includes some downtime. This method is primarily used for self-managed OpenShift Container Platform deployments, and consider using this method when you cannot perform a limited live migration to the OVN-Kubernetes network plugin.
- Limited live migration (Preferred method)
- This is an automated process that migrates your cluster from OpenShift SDN to OVN-Kubernetes.
For the limited live migration method only, do not automate the migration from OpenShift SDN to OVN-Kubernetes with a script or another tool such as Red Hat Ansible Automation Platform. This might cause outages or crash your OpenShift Container Platform cluster.
26.6.1. Limited live migration to the OVN-Kubernetes network plugin overview
The limited live migration method is the process in which the OpenShift SDN network plugin and its network configurations, connections, and associated resources, are migrated to the OVN-Kubernetes network plugin without service interruption. For OpenShift Container Platform 4.15, it is available for versions 4.15.31 and later. It is the preferred method for migrating from OpenShift SDN to OVN-Kubernetes. In the event that you cannot perform a limited live migration, you can use the offline migration method.
Before you migrate your OpenShift Container Platform cluster to use the OVN-Kubernetes network plugin, update your cluster to the latest z-stream release so that all the latest bug fixes apply to your cluster.
It is not available for hosted control plane deployment types. This migration method is valuable for deployment types that require constant service availability and offers the following benefits:
- Continuous service availability
- Minimized downtime
- Automatic node rebooting
- Seamless transition from the OpenShift SDN network plugin to the OVN-Kubernetes network plugin
Although a rollback procedure is provided, the limited live migration is intended to be a one-way process.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
The following sections provide more information about the limited live migration method.
26.6.1.1. Supported platforms when using the limited live migration method
The following table provides information about the supported platforms for the limited live migration type.
Platform | Limited Live Migration |
---|---|
Bare-metal hardware | ✓ |
Amazon Web Services (AWS) | ✓ |
Google Cloud Platform (GCP) | ✓ |
IBM Cloud® | ✓ |
Microsoft Azure | ✓ |
Red Hat OpenStack Platform (RHOSP) | ✓ |
VMware vSphere | ✓ |
Nutanix | ✓ |
Each listed platform supports installing an OpenShift Container Platform cluster on installer-provisioned infrastructure and user-provisioned infrastructure.
26.6.1.2. Best practices for limited live migration to the OVN-Kubernetes network plugin
For a list of best practices when migrating to the OVN-Kubernetes network plugin with the limited live migration method, see Limited Live Migration from OpenShift SDN to OVN-Kubernetes.
26.6.1.3. Considerations for limited live migration to the OVN-Kubernetes network plugin
Before using the limited live migration method to the OVN-Kubernetes network plugin, cluster administrators should consider the following information:
- The limited live migration procedure is unsupported for clusters with OpenShift SDN multitenant mode enabled.
- Egress router pods block the limited live migration process. They must be removed before beginning the limited live migration process.
- During the migration, when the cluster is running with both OVN-Kubernetes and OpenShift SDN, multicast and egress IP addresses are temporarily disabled for both CNIs. Egress firewalls remains functional.
- The migration is intended to be a one-way process. However, for users that want to rollback to OpenShift-SDN, migration from OpenShift-SDN to OVN-Kubernetes must have succeeded. Users can follow the same procedure below to migrate to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin.
- The limited live migration is not supported on HyperShift clusters.
- OpenShift SDN does not support IPsec. After the migration, cluster administrators can enable IPsec.
- OpenShift SDN does not support IPv6. After the migration, cluster administrators can enable dual-stack.
-
The OpenShift SDN plugin allows application of the
NodeNetworkConfigurationPolicy
(NNCP) custom resource (CR) to the primary interface on a node. The OVN-Kubernetes network plugin does not have this capability. The cluster MTU is the MTU value for pod interfaces. It is always less than your hardware MTU to account for the cluster network overlay overhead. The overhead is 100 bytes for OVN-Kubernetes and 50 bytes for OpenShift SDN.
During the limited live migration, both OVN-Kubernetes and OpenShift SDN run in parallel. OVN-Kubernetes manages the cluster network of some nodes, while OpenShift SDN manages the cluster network of others. To ensure that cross-CNI traffic remains functional, the Cluster Network Operator updates the routable MTU to ensure that both CNIs share the same overlay MTU. As a result, after the migration has completed, the cluster MTU is 50 bytes less.
-
OVN-Kubernetes reserves the
100.64.0.0/16
and100.88.0.0/16
IP address ranges. These subnets cannot be overlapped with any other internal or external network. If these IP addresses have been used by OpenShift SDN or any external networks that might communicate with this cluster, you must patch them to use a different IP address range before starting the limited live migration. See "Patching OVN-Kubernetes address ranges" for more information. -
If your
openshift-sdn
cluster with Precision Time Protocol (PTP) uses the User Datagram Protocol (UDP) for hardware time stamping and you migrate to the OVN-Kubernetes plugin, the hardware time stamping cannot be applied to primary interface devices, such as an Open vSwitch (OVS) bridge. As a result, UDP version 4 configurations cannot work with abr-ex
interface. - In most cases, the limited live migration is independent of the secondary interfaces of pods created by the Multus CNI plugin. However, if these secondary interfaces were set up on the default network interface controller (NIC) of the host, for example, using MACVLAN, IPVLAN, SR-IOV, or bridge interfaces with the default NIC as the control node, OVN-Kubernetes might encounter malfunctions. Users should remove such configurations before proceeding with the limited live migration.
- When there are multiple NICs inside of the host, and the default route is not on the interface that has the Kubernetes NodeIP, you must use the offline migration instead.
-
All
DaemonSet
objects in theopenshift-sdn
namespace, which are not managed by the Cluster Network Operator (CNO), must be removed before initiating the limited live migration. These unmanaged daemon sets can cause the migration status to remain incomplete if not properly handled. -
If you run an Operator or you have configured any application with the pod disruption budget, you might experience an interruption during the update process. If
minAvailable
is set to 1 inPodDisruptionBudget
, the nodes are drained to apply pending machine configs which might block the eviction process. If several nodes are rebooted, all the pods might run on only one node, and thePodDisruptionBudget
field can prevent the node drain. Like OpenShift SDN, OVN-Kubernetes resources such as
EgressFirewall
resources requireClusterAdmin
privileges. Migrating from OpenShift SDN to OVN-Kubernetes does not automatically update role-base access control (RBAC) resources. OpenShift SDN resources granted to a project administrator through theaggregate-to-admin
ClusterRole
must be manually reviewed and adjusted, as these changes are not included in the migration process.After migration, manual verification of RBAC resources is required. For information about setting the
aggregate-to-admin
ClusterRole after migration, see the example in How to allow project admins to manage Egressfirewall resources in RHOCP4.-
When a cluster depends on static routes or routing policies in the host network so that pods can reach some destinations, users should set
routingViaHost
spec totrue
andipForwarding
toGlobal
in thegatewayConfig
object during migration. This will offload routing decision to host kernel. For more information, see Recommended practice to follow before Openshift SDN network plugin migration to OVNKubernetes plugin (Red Hat Knowledgebase) and, see step five in "Checking cluster resources before initiating the limited live migration".
26.6.1.4. How the limited live migration process works
The following table summarizes the limited live migration process by segmenting between the user-initiated steps in the process and the actions that the migration script performs in response.
User-initiated steps | Migration activity |
---|---|
Patch the cluster-level networking configuration by changing the |
|
26.6.1.5. Migrating to the OVN-Kubernetes network plugin by using the limited live migration method
Migrating to the OVN-Kubernetes network plugin by using the limited live migration method is a multiple step process that requires users to check the behavior of egress IP resources, egress firewall resources, and multicast enabled namespaces. Administrators must also review any network policies in their deployment and remove egress router resources before initiating the limited live migration process. The following procedures should be used in succession.
26.6.1.5.1. Checking cluster resources before initiating the limited live migration
Before migrating to OVN-Kubernetes by using the limited live migration, you should check for egress IP resources, egress firewall resources, and multicast-enabled namespaces on your OpenShift SDN deployment. You should also review any network policies in your deployment. If you find that your cluster has these resources before migration, you should check their behavior after migration to ensure that they are working as intended.
The following procedure shows you how to check for egress IP resources, egress firewall resources, multicast-enabled namespaces, network policies, and an NNCP. No action is necessary after checking for these resources.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
As an OpenShift Container Platform cluster administrator, check for egress firewall resources. You can do this by using the
oc
CLI, or by using the OpenShift Container Platform web console.To check for egress firewall resource by using the
oc
CLI tool:To check for egress firewall resources, enter the following command:
oc get egressnetworkpolicies.network.openshift.io -A
$ oc get egressnetworkpolicies.network.openshift.io -A
Copy to Clipboard Copied! Example output
NAMESPACE NAME AGE <namespace> <example_egressfirewall> 5d
NAMESPACE NAME AGE <namespace> <example_egressfirewall> 5d
Copy to Clipboard Copied! You can check the intended behavior of an egress firewall resource by using the
-o yaml
flag. For example:oc get egressnetworkpolicy <example_egressfirewall> -n <namespace> -o yaml
$ oc get egressnetworkpolicy <example_egressfirewall> -n <namespace> -o yaml
Copy to Clipboard Copied! Example output
apiVersion: network.openshift.io/v1 kind: EgressNetworkPolicy metadata: name: <example_egress_policy> namespace: <namespace> spec: egress: - type: Allow to: cidrSelector: 0.0.0.0/0 - type: Deny to: cidrSelector: 10.0.0.0/8
apiVersion: network.openshift.io/v1 kind: EgressNetworkPolicy metadata: name: <example_egress_policy> namespace: <namespace> spec: egress: - type: Allow to: cidrSelector: 0.0.0.0/0 - type: Deny to: cidrSelector: 10.0.0.0/8
Copy to Clipboard Copied!
To check for egress firewall resources by using the OpenShift Container Platform web console:
- On the OpenShift Container Platform web console, click Observe → Metrics.
-
In the Expression box, type
sdn_controller_num_egress_firewalls
and click Run queries. If you have egress firewall resources, they are returned in the Expression box.
Check your cluster for egress IP resources. You can do this by using the
oc
CLI, or by using the OpenShift Container Platform web console.To check for egress IPs by using the
oc
CLI tool:To list namespaces with egress IP resources, enter the following command
oc get netnamespace -A | awk '$3 != ""'
$ oc get netnamespace -A | awk '$3 != ""'
Copy to Clipboard Copied! Example output
NAME NETID EGRESS IPS namespace1 14173093 ["10.0.158.173"] namespace2 14173020 ["10.0.158.173"]
NAME NETID EGRESS IPS namespace1 14173093 ["10.0.158.173"] namespace2 14173020 ["10.0.158.173"]
Copy to Clipboard Copied!
To check for egress IPs by using the OpenShift Container Platform web console:
- On the OpenShift Container Platform web console, click Observe → Metrics.
-
In the Expression box, type
sdn_controller_num_egress_ips
and click Run queries. If you have egress firewall resources, they are returned in the Expression box.
Check your cluster for multicast enabled namespaces. You can do this by using the
oc
CLI, or by using the OpenShift Container Platform web console.To check for multicast enabled namespaces by using the
oc
CLI tool:To locate namespaces with multicast enabled, enter the following command:
oc get netnamespace -o json | jq -r '.items[] | select(.metadata.annotations."netnamespace.network.openshift.io/multicast-enabled" == "true") | .metadata.name'
$ oc get netnamespace -o json | jq -r '.items[] | select(.metadata.annotations."netnamespace.network.openshift.io/multicast-enabled" == "true") | .metadata.name'
Copy to Clipboard Copied! Example output
namespace1 namespace3
namespace1 namespace3
Copy to Clipboard Copied!
To check for multicast enabled namespaces by using the OpenShift Container Platform web console:
- On the OpenShift Container Platform web console, click Observe → Metrics.
-
In the Expression box, type
sdn_controller_num_multicast_enabled_namespaces
and click Run queries. If you have multicast enabled namespaces, they are returned in the Expression box.
Check your cluster for any network policies. You can do this by using the
oc
CLI.To check for network policies by using the
oc
CLI tool, enter the following command:oc get networkpolicy -n <namespace>
$ oc get networkpolicy -n <namespace>
Copy to Clipboard Copied! Example output
NAME POD-SELECTOR AGE allow-multicast app=my-app 11m
NAME POD-SELECTOR AGE allow-multicast app=my-app 11m
Copy to Clipboard Copied!
Optional: If your cluster uses static routes or routing policies in the host network, set
routingViaHost
spec totrue
and theipForwarding
spec toGlobal
in thegatewayConfig
object during migration.oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{ "spec":{ "defaultNetwork":{ "ovnKubernetesConfig":{ "gatewayConfig": { "ipForwarding": "Global", "routingViaHost": true }}}}}'
Copy to Clipboard Copied! Verify that the
ipForwarding
spec has been set toGlobal
and theroutingViaHost
spec totrue
by running the following command:oc get networks.operator.openshift.io cluster -o yaml | grep -A 5 "gatewayConfig"
$ oc get networks.operator.openshift.io cluster -o yaml | grep -A 5 "gatewayConfig"
Copy to Clipboard Copied! Example output
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster # ... gatewayConfig: ipForwarding: Global ipv4: {} ipv6: {} routingViaHost: true genevePort: 6081 # ...
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster # ... gatewayConfig: ipForwarding: Global ipv4: {} ipv6: {} routingViaHost: true genevePort: 6081 # ...
Copy to Clipboard Copied!
26.6.1.5.2. Removing egress router pods before initiating the limited live migration
Before initiating the limited live migration, you must check for, and remove, any egress router pods. If there is an egress router pod on the cluster when performing a limited live migration, the Network Operator blocks the migration and returns the following error:
The cluster configuration is invalid (network type limited live migration is not supported for pods with `pod.network.openshift.io/assign-macvlan` annotation. Please remove all egress router pods). Use `oc edit network.config.openshift.io cluster` to fix.
The cluster configuration is invalid (network type limited live migration is not supported for pods with `pod.network.openshift.io/assign-macvlan` annotation.
Please remove all egress router pods). Use `oc edit network.config.openshift.io cluster` to fix.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
To locate egress router pods on your cluster, enter the following command:
oc get pods --all-namespaces -o json | jq '.items[] | select(.metadata.annotations."pod.network.openshift.io/assign-macvlan" == "true") | {name: .metadata.name, namespace: .metadata.namespace}'
$ oc get pods --all-namespaces -o json | jq '.items[] | select(.metadata.annotations."pod.network.openshift.io/assign-macvlan" == "true") | {name: .metadata.name, namespace: .metadata.namespace}'
Copy to Clipboard Copied! Example output
{ "name": "egress-multi", "namespace": "egress-router-project" }
{ "name": "egress-multi", "namespace": "egress-router-project" }
Copy to Clipboard Copied! Alternatively, you can query metrics on the OpenShift Container Platform web console.
- On the OpenShift Container Platform web console, click Observe → Metrics.
-
In the Expression box, enter
network_attachment_definition_instances{networks="egress-router"}
. Then, click Add.
To remove an egress router pod, enter the following command:
oc delete pod <egress_pod_name> -n <egress_router_project>
$ oc delete pod <egress_pod_name> -n <egress_router_project>
Copy to Clipboard Copied!
26.6.1.5.3. Initiating the limited live migration process
After you have checked the behavior of egress IP resources, egress firewall resources, and multicast enabled namespaces, and removed any egress router resources, you can initiate the limited live migration process.
Prerequisites
- A cluster has been configured with the OpenShift SDN CNI network plugin in the network policy isolation mode.
-
You have installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role. - You have created a recent backup of the etcd database.
- The cluster is in a known good state without any errors.
-
Before migration to OVN-Kubernetes, a security group rule must be in place to allow UDP packets on port
6081
for all nodes on all cloud platforms. -
If the
100.64.0.0/16
and100.88.0.0/16
address ranges were previously in use by OpenShift-SDN, you have patched them. A step in the procedure checks whether these address ranges are in use. If they are in use, see "Patching OVN-Kubernetes address ranges". - You have checked for egress IP resources, egress firewall resources, and multicast enabled namespaces.
- You have removed any egress router pods before beginning the limited live migration. For more information about egress router pods, see "Deploying an egress router pod in redirect mode".
- You have reviewed the "Considerations for limited live migration to the OVN-Kubernetes network plugin" section of this document.
Procedure
To patch the cluster-level networking configuration and initiate the migration from OpenShift SDN to OVN-Kubernetes, enter the following command:
oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
Copy to Clipboard Copied! After running this command, the migration process begins. During this process, the Machine Config Operator reboots the nodes in your cluster twice. The migration takes approximately twice as long as a cluster upgrade.
ImportantThis
oc patch
command checks for overlapping CIDRs in use by OpenShift SDN. If overlapping CIDRs are detected, you must patch them before the limited live migration process can start. For more information, see "Patching OVN-Kubernetes address ranges".Optional: To ensure that the migration process has completed, and to check the status of the
network.config
, you can enter the following commands:oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}'
$ oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}'
Copy to Clipboard Copied! oc get network.config cluster -o=jsonpath='{.status.conditions}' | jq .
$ oc get network.config cluster -o=jsonpath='{.status.conditions}' | jq .
Copy to Clipboard Copied! You can check limited live migration metrics for troubleshooting issues. For more information, see "Checking limited live migration metrics".
After a successful migration operation, remove the
network.openshift.io/network-type-migration-
annotation from thenetwork.config
custom resource by entering the following command:oc annotate network.config cluster network.openshift.io/network-type-migration-
$ oc annotate network.config cluster network.openshift.io/network-type-migration-
Copy to Clipboard Copied!
26.6.1.5.4. Patching OVN-Kubernetes address ranges
OVN-Kubernetes reserves the following IP address ranges:
-
100.64.0.0/16
. This IP address range is used for theinternalJoinSubnet
parameter of OVN-Kubernetes by default. -
100.88.0.0/16
. This IP address range is used for theinternalTransSwitchSubnet
parameter of OVN-Kubernetes by default.
If these IP addresses have been used by OpenShift SDN or any external networks that might communicate with this cluster, you must patch them to use a different IP address range before initiating the limited live migration.
The following procedure can be used to patch CIDR ranges that are in use by OpenShift SDN if the migration was initially blocked.
This is an optional procedure and must only be used if the migration was blocked after using the oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
command "Initiating the limited live migration process".
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
If the
100.64.0.0/16
IP address range is already in use, enter the following command to patch it to a different range. The following example uses100.70.0.0/16
.oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.70.0.0/16"}}}}}'
$ oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.70.0.0/16"}}}}}'
Copy to Clipboard Copied! If the
100.88.0.0/16
IP address range is already in use, enter the following command to patch it to a different range. The following example uses100.99.0.0/16
.oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet": "100.99.0.0/16"}}}}}'
$ oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet": "100.99.0.0/16"}}}}}'
Copy to Clipboard Copied!
After patching the 100.64.0.0/16
and 100.88.0.0/16
IP address ranges, you can initiate the limited live migration.
26.6.1.5.5. Checking cluster resources after initiating the limited live migration
The following procedure shows you how to check for egress IP resources, egress firewall resources, multicast enabled namespaces, and network policies when your deploying is using OVN-Kubernetes. If you had these resources on OpenShift SDN, you should check them after migration to ensure that they are working properly.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. - You have successfully migrated from OpenShift SDN to OVN-Kubernetes by using the limited live migration.
Procedure
As an OpenShift Container Platform cluster administrator, check for egress firewall resources. You can do this by using the
oc
CLI, or by using the OpenShift Container Platform web console.To check for egress firewall resource by using the
oc
CLI tool:To check for egress firewall resources, enter the following command:
oc get egressfirewalls.k8s.ovn.org -A
$ oc get egressfirewalls.k8s.ovn.org -A
Copy to Clipboard Copied! Example output
NAMESPACE NAME AGE <namespace> <example_egressfirewall> 5d
NAMESPACE NAME AGE <namespace> <example_egressfirewall> 5d
Copy to Clipboard Copied! You can check the intended behavior of an egress firewall resource by using the
-o yaml
flag. For example:oc get egressfirewall <example_egressfirewall> -n <namespace> -o yaml
$ oc get egressfirewall <example_egressfirewall> -n <namespace> -o yaml
Copy to Clipboard Copied! Example output
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: <example_egress_policy> namespace: <namespace> spec: egress: - type: Allow to: cidrSelector: 192.168.0.0/16 - type: Deny to: cidrSelector: 0.0.0.0/0
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: <example_egress_policy> namespace: <namespace> spec: egress: - type: Allow to: cidrSelector: 192.168.0.0/16 - type: Deny to: cidrSelector: 0.0.0.0/0
Copy to Clipboard Copied! Ensure that the behavior of this resource is intended because it could have changed after migration. For more information about egress firewalls, see "Configuring an egress firewall for a project".
To check for egress firewall resources by using the OpenShift Container Platform web console:
- On the OpenShift Container Platform web console, click Observe → Metrics.
-
In the Expression box, type
ovnkube_controller_num_egress_firewall_rules
and click Run queries. If you have egress firewall resources, they are returned in the Expression box.
Check your cluster for egress IP resources. You can do this by using the
oc
CLI, or by using the OpenShift Container Platform web console.To check for egress IPs by using the
oc
CLI tool:To list the namespace with egress IP resources, enter the following command:
oc get egressip
$ oc get egressip
Copy to Clipboard Copied! Example output
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egress-sample 192.0.2.10 ip-10-0-42-79.us-east-2.compute.internal 192.0.2.10 egressip-sample-2 192.0.2.14 ip-10-0-42-79.us-east-2.compute.internal 192.0.2.14
NAME EGRESSIPS ASSIGNED NODE ASSIGNED EGRESSIPS egress-sample 192.0.2.10 ip-10-0-42-79.us-east-2.compute.internal 192.0.2.10 egressip-sample-2 192.0.2.14 ip-10-0-42-79.us-east-2.compute.internal 192.0.2.14
Copy to Clipboard Copied! To provide detailed information about an egress IP, enter the following command:
oc get egressip <egressip_name> -o yaml
$ oc get egressip <egressip_name> -o yaml
Copy to Clipboard Copied! Example output
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip-sample"},"spec":{"egressIPs":["192.0.2.12","192.0.2.13"],"namespaceSelector":{"matchLabels":{"name":"my-namespace"}}}} creationTimestamp: "2024-06-27T15:48:36Z" generation: 7 name: egressip-sample resourceVersion: "125511578" uid: b65833c8-781f-4cc9-bc96-d970259a7631 spec: egressIPs: - 192.0.2.12 - 192.0.2.13 namespaceSelector: matchLabels: name: my-namespace
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: annotations: kubectl.kubernetes.io/last-applied-configuration: | {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip-sample"},"spec":{"egressIPs":["192.0.2.12","192.0.2.13"],"namespaceSelector":{"matchLabels":{"name":"my-namespace"}}}} creationTimestamp: "2024-06-27T15:48:36Z" generation: 7 name: egressip-sample resourceVersion: "125511578" uid: b65833c8-781f-4cc9-bc96-d970259a7631 spec: egressIPs: - 192.0.2.12 - 192.0.2.13 namespaceSelector: matchLabels: name: my-namespace
Copy to Clipboard Copied! Repeat this for all egress IPs. Ensure that the behavior of each resource is intended because it could have changed after migration. For more information about EgressIPs, see "Configuring an EgressIP address".
To check for egress IPs by using the OpenShift Container Platform web console:
- On the OpenShift Container Platform web console, click Observe → Metrics.
-
In the Expression box, type
ovnkube_clustermanager_num_egress_ips
and click Run queries. If you have egress firewall resources, they are returned in the Expression box.
Check your cluster for multicast enabled namespaces. You can only do this by using the
oc
CLI.To locate namespaces with multicast enabled, enter the following command:
oc get namespace -o json | jq -r '.items[] | select(.metadata.annotations."k8s.ovn.org/multicast-enabled" == "true") | .metadata.name'
$ oc get namespace -o json | jq -r '.items[] | select(.metadata.annotations."k8s.ovn.org/multicast-enabled" == "true") | .metadata.name'
Copy to Clipboard Copied! Example output
namespace1 namespace3
namespace1 namespace3
Copy to Clipboard Copied! To describe each multicast enabled namespace, enter the following command:
oc describe namespace <namespace>
$ oc describe namespace <namespace>
Copy to Clipboard Copied! Example output
Name: my-namespace Labels: kubernetes.io/metadata.name=my-namespace pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/audit-version=v1.24 pod-security.kubernetes.io/warn=restricted pod-security.kubernetes.io/warn-version=v1.24 Annotations: k8s.ovn.org/multicast-enabled: true openshift.io/sa.scc.mcs: s0:c25,c0 openshift.io/sa.scc.supplemental-groups: 1000600000/10000 openshift.io/sa.scc.uid-range: 1000600000/10000 Status: Active
Name: my-namespace Labels: kubernetes.io/metadata.name=my-namespace pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/audit-version=v1.24 pod-security.kubernetes.io/warn=restricted pod-security.kubernetes.io/warn-version=v1.24 Annotations: k8s.ovn.org/multicast-enabled: true openshift.io/sa.scc.mcs: s0:c25,c0 openshift.io/sa.scc.supplemental-groups: 1000600000/10000 openshift.io/sa.scc.uid-range: 1000600000/10000 Status: Active
Copy to Clipboard Copied! Ensure that multicast functionality is correctly configured and working as expected in each namespace. For more information, see "Enabling multicast for a project".
Check your cluster’s network policies. You can only do this by using the
oc
CLI.To obtain information about network policies within a namespace, enter the following command:
oc get networkpolicy -n <namespace>
$ oc get networkpolicy -n <namespace>
Copy to Clipboard Copied! Example output
NAME POD-SELECTOR AGE allow-multicast app=my-app 11m
NAME POD-SELECTOR AGE allow-multicast app=my-app 11m
Copy to Clipboard Copied! To provide detailed information about the network policy, enter the following command:
oc describe networkpolicy allow-multicast -n <namespace>
$ oc describe networkpolicy allow-multicast -n <namespace>
Copy to Clipboard Copied! Example output
Name: allow-multicast Namespace: my-namespace Created on: 2024-07-24 14:55:03 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: app=my-app Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: IPBlock: CIDR: 224.0.0.0/4 Except: Allowing egress traffic: To Port: <any> (traffic allowed to all ports) To: IPBlock: CIDR: 224.0.0.0/4 Except: Policy Types: Ingress, Egress
Name: allow-multicast Namespace: my-namespace Created on: 2024-07-24 14:55:03 -0400 EDT Labels: <none> Annotations: <none> Spec: PodSelector: app=my-app Allowing ingress traffic: To Port: <any> (traffic allowed to all ports) From: IPBlock: CIDR: 224.0.0.0/4 Except: Allowing egress traffic: To Port: <any> (traffic allowed to all ports) To: IPBlock: CIDR: 224.0.0.0/4 Except: Policy Types: Ingress, Egress
Copy to Clipboard Copied! Ensure that the behavior of the network policy is as intended. Optimization for network policies differ between SDN and OVN-K, so users might need to adjust their policies to achieve optimal performance for different CNIs. For more information, see "About network policy".
26.6.1.6. Checking limited live migration metrics
Metrics are available to monitor the progress of the limited live migration. Metrics can be viewed on the OpenShift Container Platform web console, or by using the oc
CLI.
Prerequisites
- You have initiated a limited live migration to OVN-Kubernetes.
Procedure
To view limited live migration metrics on the OpenShift Container Platform web console:
- Click Observe → Metrics.
- In the Expression box, type openshift_network and click the openshift_network_operator_live_migration_condition option.
To view metrics by using the
oc
CLI:Enter the following command to generate a token for the
prometheus-k8s
service account in theopenshift-monitoring
namespace:token=`oc create token prometheus-k8s -n openshift-monitoring`
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
Copy to Clipboard Copied! Enter the following command to request information about the
openshift_network_operator_live_migration_condition
metric:oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=openshift_network_operator_live_migration_condition' | jq
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=openshift_network_operator_live_migration_condition' | jq
Copy to Clipboard Copied! Example output
"status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "openshift_network_operator_live_migration_condition", "container": "network-operator", "endpoint": "metrics", "instance": "10.0.83.62:9104", "job": "metrics", "namespace": "openshift-network-operator", "pod": "network-operator-6c87754bc6-c8qld", "prometheus": "openshift-monitoring/k8s", "service": "metrics", "type": "NetworkTypeMigrationInProgress" }, "value": [ 1717653579.587, "1" ] }, ...
"status": "success", "data": { "resultType": "vector", "result": [ { "metric": { "__name__": "openshift_network_operator_live_migration_condition", "container": "network-operator", "endpoint": "metrics", "instance": "10.0.83.62:9104", "job": "metrics", "namespace": "openshift-network-operator", "pod": "network-operator-6c87754bc6-c8qld", "prometheus": "openshift-monitoring/k8s", "service": "metrics", "type": "NetworkTypeMigrationInProgress" }, "value": [ 1717653579.587, "1" ] }, ...
Copy to Clipboard Copied!
The table in "Information about limited live migration metrics" shows you the available metrics and the label values populated from the openshift_network_operator_live_migration_condition
expression. Use this information to monitor progress or to troubleshoot the migration.
26.6.1.6.1. Information about limited live migration metrics
The following table shows you the available metrics and the label values populated from the openshift_network_operator_live_migration_condition
expression. Use this information to monitor progress or to troubleshoot the migration.
Metric | Label values |
---|---|
|
|
|
|
26.6.3. Offline migration to the OVN-Kubernetes network plugin overview
The offline migration method is a manual process that includes some downtime, during which your cluster is unreachable. You can use an Ansible playbook that automates the offline migration steps so that you can save time. These methods are primarily used for self-managed OpenShift Container Platform deployments, and are an alternative to the limited live migration procedure. Use these methods only when you cannot perform a limited live migration to the OVN-Kubernetes network plugin.
Although a rollback procedure is provided, the offline migration is intended to be a one-way process.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
The following sections provide more information about the offline migration method.
26.6.3.1. Supported platforms when using the offline migration methods
The following table provides information about the supported platforms for the manual offline migration type.
Platform | Offline migration |
---|---|
Bare metal hardware (IPI and UPI) | ✓ |
Amazon Web Services (AWS) (IPI and UPI) | ✓ |
Google Cloud Platform (GCP) (IPI and UPI) | ✓ |
IBM Cloud® (IPI and UPI) | ✓ |
Microsoft Azure (IPI and UPI) | ✓ |
Red Hat OpenStack Platform (RHOSP) (IPI and UPI) | ✓ |
VMware vSphere (IPI and UPI) | ✓ |
AliCloud (IPI and UPI) | ✓ |
Nutanix (IPI and UPI) | ✓ |
26.6.3.2. Considerations for the offline migration methods to the OVN-Kubernetes network plugin
If you have more than 150 nodes in your OpenShift Container Platform cluster, then open a support case for consultation on your migration to the OVN-Kubernetes network plugin.
The subnets assigned to nodes and the IP addresses assigned to individual pods are not preserved during the migration.
While the OVN-Kubernetes network plugin implements many of the capabilities present in the OpenShift SDN network plugin, the configuration is not the same.
If your cluster uses any of the following OpenShift SDN network plugin capabilities, you must manually configure the same capability in the OVN-Kubernetes network plugin:
- Namespace isolation
- Egress router pods
-
Before migrating to OVN-Kubernetes, ensure that the following IP address ranges are not in use:
100.64.0.0/16
,169.254.169.0/29
,100.88.0.0/16
,fd98::/64
,fd69::/125
, andfd97::/64
. OVN-Kubernetes uses these ranges internally. Do not include any of these ranges in any other CIDR definitions in your cluster or infrastructure. -
If your
openshift-sdn
cluster with Precision Time Protocol (PTP) uses the User Datagram Protocol (UDP) for hardware time stamping and you migrate to the OVN-Kubernetes plugin, the hardware time stamping cannot be applied to primary interface devices, such as an Open vSwitch (OVS) bridge. As a result, UDP version 4 configurations cannot work with abr-ex
interface. Like OpenShift SDN, OVN-Kubernetes resources require
ClusterAdmin
privileges. Migrating from OpenShift SDN to OVN-Kubernetes does not automatically update role-base access control (RBAC) resources. OpenShift SDN resources granted to a project administrator through theaggregate-to-admin
ClusterRole
must be manually reviewed and adjusted, as these changes are not included in the migration process.After migration, manual verification of RBAC resources is required.
The following sections highlight the differences in configuration between the aforementioned capabilities in OVN-Kubernetes and OpenShift SDN network plugins.
Primary network interface
The OpenShift SDN plugin allows application of the NodeNetworkConfigurationPolicy
(NNCP) custom resource (CR) to the primary interface on a node. The OVN-Kubernetes network plugin does not have this capability.
If you have an NNCP applied to the primary interface, you must delete the NNCP before migrating to the OVN-Kubernetes network plugin. Deleting the NNCP does not remove the configuration from the primary interface, but with OVN-Kubernetes, the Kubernetes NMState cannot manage this configuration. Instead, the configure-ovs.sh
shell script manages the primary interface and the configuration attached to this interface.
Namespace isolation
OVN-Kubernetes supports only the network policy isolation mode.
For a cluster using OpenShift SDN that is configured in either the multitenant or subnet isolation mode, you can still migrate to the OVN-Kubernetes network plugin. Note that after the migration operation, multitenant isolation mode is dropped, so you must manually configure network policies to achieve the same level of project-level isolation for pods and services.
Egress IP addresses
OpenShift SDN supports two different Egress IP modes:
- In the automatically assigned approach, an egress IP address range is assigned to a node.
- In the manually assigned approach, a list of one or more egress IP addresses is assigned to a node.
The migration process supports migrating Egress IP configurations that use the automatically assigned mode.
The differences in configuring an egress IP address between OVN-Kubernetes and OpenShift SDN is described in the following table:
OVN-Kubernetes | OpenShift SDN |
---|---|
|
|
For more information on using egress IP addresses in OVN-Kubernetes, see "Configuring an egress IP address".
Egress network policies
The difference in configuring an egress network policy, also known as an egress firewall, between OVN-Kubernetes and OpenShift SDN is described in the following table:
OVN-Kubernetes | OpenShift SDN |
---|---|
|
|
Because the name of an EgressFirewall
object can only be set to default
, after the migration all migrated EgressNetworkPolicy
objects are named default
, regardless of what the name was under OpenShift SDN.
If you subsequently rollback to OpenShift SDN, all EgressNetworkPolicy
objects are named default
as the prior name is lost.
For more information on using an egress firewall in OVN-Kubernetes, see "Configuring an egress firewall for a project".
Egress router pods
OVN-Kubernetes supports egress router pods in redirect mode. OVN-Kubernetes does not support egress router pods in HTTP proxy mode or DNS proxy mode.
When you deploy an egress router with the Cluster Network Operator, you cannot specify a node selector to control which node is used to host the egress router pod.
Multicast
The difference between enabling multicast traffic on OVN-Kubernetes and OpenShift SDN is described in the following table:
OVN-Kubernetes | OpenShift SDN |
---|---|
|
|
For more information on using multicast in OVN-Kubernetes, see "Enabling multicast for a project".
Network policies
OVN-Kubernetes fully supports the Kubernetes NetworkPolicy
API in the networking.k8s.io/v1
API group. No changes are necessary in your network policies when migrating from OpenShift SDN.
26.6.3.3. How the offline migration process works
The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.
User-initiated steps | Migration activity |
---|---|
Set the |
|
Update the |
|
Reboot each node in the cluster. |
|
26.6.3.4. Using an Ansible playbook to migrate to the OVN-Kubernetes network plugin
As a cluster administrator, you can use an Ansible collection, network.offline_migration_sdn_to_ovnk
, to migrate from the OpenShift SDN Container Network Interface (CNI) network plugin to the OVN-Kubernetes plugin for your cluster. The Ansible collection includes the following playbooks:
-
playbooks/playbook-migration.yml
: Includes playbooks that execute in a sequence where each playbook represents a step in the migration process. -
playbooks/playbook-rollback.yml
: Includes playbooks that execute in a sequence where each playbook represents a step in the rollback process.
Prerequisites
-
You installed the
python3
package, minimum version 3.10. -
You installed the
jmespath
andjq
packages. - You logged in to the Red Hat Hybrid Cloud Console and opened the Ansible Automation Platform web console.
-
You created a security group rule that allows User Datagram Protocol (UDP) packets on port
6081
for all nodes on all cloud platforms. If you do not do this task, your cluster might fail to schedule pods. Check if your cluster uses static routes or routing policies in the host network.
-
If true, a later procedure step requires that you set the
routingViaHost
parameter totrue
and theipForwarding
parameter toGlobal
in thegatewayConfig
section of theplaybooks/playbook-migration.yml
file.
-
If true, a later procedure step requires that you set the
-
If the OpenShift-SDN plugin uses the
100.64.0.0/16
and100.88.0.0/16
address ranges, you patched the address ranges. For more information, see "Patching OVN-Kubernetes address ranges" in the Additional resources section.
Procedure
Install the
ansible-core
package, minimum version 2.15. The following example command shows how to install theansible-core
package on Red Hat Enterprise Linux (RHEL):sudo dnf install -y ansible-core
$ sudo dnf install -y ansible-core
Copy to Clipboard Copied! Create an
ansible.cfg
file and add information similar to the following example to the file. Ensure that file exists in the same directory as where theansible-galaxy
commands and the playbooks run.cat << EOF >> ansible.cfg [galaxy] server_list = automation_hub, validated [galaxy_server.automation_hub] url=https://console.redhat.com/api/automation-hub/content/published/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= #[galaxy_server.release_galaxy] #url=https://galaxy.ansible.com/ [galaxy_server.validated] url=https://console.redhat.com/api/automation-hub/content/validated/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= EOF
$ cat << EOF >> ansible.cfg [galaxy] server_list = automation_hub, validated [galaxy_server.automation_hub] url=https://console.redhat.com/api/automation-hub/content/published/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= #[galaxy_server.release_galaxy] #url=https://galaxy.ansible.com/ [galaxy_server.validated] url=https://console.redhat.com/api/automation-hub/content/validated/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= EOF
Copy to Clipboard Copied! From the Ansible Automation Platform web console, go to the Connect to Hub page and complete the following steps:
- In the Offline token section of the page, click the Load token button.
- After the token loads, click the Copy to clipboard icon.
-
Open the
ansible.cfg
file and paste the API token in thetoken=
parameter. The API token is required for authenticating against the server URL specified in theansible.cfg
file.
Install the
network.offline_migration_sdn_to_ovnk
Ansible collection by entering the followingansible-galaxy
command:ansible-galaxy collection install network.offline_migration_sdn_to_ovnk
$ ansible-galaxy collection install network.offline_migration_sdn_to_ovnk
Copy to Clipboard Copied! Verify that the
network.offline_migration_sdn_to_ovnk
Ansible collection is installed on your system:ansible-galaxy collection list | grep network.offline_migration_sdn_to_ovnk
$ ansible-galaxy collection list | grep network.offline_migration_sdn_to_ovnk
Copy to Clipboard Copied! Example output
network.offline_migration_sdn_to_ovnk 1.0.2
network.offline_migration_sdn_to_ovnk 1.0.2
Copy to Clipboard Copied! The
network.offline_migration_sdn_to_ovnk
Ansible collection is saved in the default path of~/.ansible/collections/ansible_collections/network/offline_migration_sdn_to_ovnk/
.Configure migration features in the
playbooks/playbook-migration.yml
file:# ... migration_interface_name: eth0 migration_disable_auto_migration: true migration_egress_ip: false migration_egress_firewall: false migration_multicast: false migration_routing_via_host: true migration_ip_forwarding: Global migration_cidr: "10.240.0.0/14" migration_prefix: 23 migration_mtu: 1400 migration_geneve_port: 6081 migration_ipv4_subnet: "100.64.0.0/16" # ...
# ... migration_interface_name: eth0 migration_disable_auto_migration: true migration_egress_ip: false migration_egress_firewall: false migration_multicast: false migration_routing_via_host: true migration_ip_forwarding: Global migration_cidr: "10.240.0.0/14" migration_prefix: 23 migration_mtu: 1400 migration_geneve_port: 6081 migration_ipv4_subnet: "100.64.0.0/16" # ...
Copy to Clipboard Copied! migration_interface_name
-
If you use an
NodeNetworkConfigurationPolicy
(NNCP) resource on a primary interface, specify the interface name in themigration-playbook.yml
file so that the NNCP resource gets deleted on the primary interface during the migration process. migration_disable_auto_migration
-
Disables the auto-migration of OpenShift SDN CNI plug-in features to the OVN-Kubernetes plugin. If you disable auto-migration of features, you must also set the
migration_egress_ip
,migration_egress_firewall
, andmigration_multicast
parameters tofalse
. If you need to enable auto-migration of features, set the parameter tofalse
. migration_routing_via_host
-
Set to
true
to configure local gateway mode orfalse
to configure shared gateway mode for nodes in your cluster. The default value isfalse
. In local gateway mode, traffic is routed through the host network stack. In shared gateway mode, traffic is not routed through the host network stack. migration_ip_forwarding
-
If you configured local gateway mode, set IP forwarding to
Global
if you need the host network of the node to act as a router for traffic not related to OVN-Kubernetes. migration_cidr
-
Specifies a Classless Inter-Domain Routing (CIDR) IP address block for your cluster. You cannot use any CIDR block that overlaps the
100.64.0.0/16
CIDR block, because the OVN-Kubernetes network provider uses this block internally. migration_prefix
- Ensure that you specify a prefix value, which is the slice of the CIDR block apportioned to each node in your cluster.
migration_mtu
- Optional parameter that sets a specific maximum transmission unit (MTU) to your cluster network after the migration process.
migration_geneve_port
-
Optional parameter that sets a Geneve port for OVN-Kubernetes. The default port is
6081
. migration_ipv4_subnet
-
Optional parameter that sets an IPv4 address range for internal use by OVN-Kubernetes. The default value for the parameter is
100.64.0.0/16
.
To run the
playbooks/playbook-migration.yml
file, enter the following command:ansible-playbook -v playbooks/playbook-migration.yml
$ ansible-playbook -v playbooks/playbook-migration.yml
Copy to Clipboard Copied!
26.6.3.5. Migrating to the OVN-Kubernetes network plugin by using the offline migration method
As a cluster administrator, you can change the network plugin for your cluster to OVN-Kubernetes. During the migration, you must reboot every node in your cluster.
While performing the migration, your cluster is unavailable and workloads might be interrupted. Perform the migration only when an interruption in service is acceptable.
Prerequisites
- You have a cluster configured with the OpenShift SDN CNI network plugin in the network policy isolation mode.
-
You installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role. - You have a recent backup of the etcd database.
- You can manually reboot each node.
- You checked that your cluster is in a known good state without any errors.
-
You created a security group rule that allows User Datagram Protocol (UDP) packets on port
6081
for all nodes on all cloud platforms. - You removed webhooks. Alternatively, you can set a timeout value for each webhook, which is detailed in the procedure. If you did not complete one of these tasks, your cluster might fail to schedule pods.
Procedure
If you did not remove webhooks, set the timeout value for each webhook to
3
seconds by creating aValidatingWebhookConfiguration
custom resource and then specify the timeout value for thetimeoutSeconds
parameter:oc patch ValidatingWebhookConfiguration <webhook_name> --type='json' \ -p '[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 3}]'
oc patch ValidatingWebhookConfiguration <webhook_name> --type='json' \
1 -p '[{"op": "replace", "path": "/webhooks/0/timeoutSeconds", "value": 3}]'
Copy to Clipboard Copied! - 1
- Where
<webhook_name>
is the name of your webhook.
To backup the configuration for the cluster network, enter the following command:
oc get Network.config.openshift.io cluster -o yaml > cluster-openshift-sdn.yaml
$ oc get Network.config.openshift.io cluster -o yaml > cluster-openshift-sdn.yaml
Copy to Clipboard Copied! Verify that the
OVN_SDN_MIGRATION_TIMEOUT
environment variable is set and is equal to0s
by running the following command:#!/bin/bash if [ -n "$OVN_SDN_MIGRATION_TIMEOUT" ] && [ "$OVN_SDN_MIGRATION_TIMEOUT" = "0s" ]; then unset OVN_SDN_MIGRATION_TIMEOUT fi #loops the timeout command of the script to repeatedly check the cluster Operators until all are available. co_timeout=${OVN_SDN_MIGRATION_TIMEOUT:-1200s} timeout "$co_timeout" bash <<EOT until oc wait co --all --for='condition=AVAILABLE=True' --timeout=10s && \ oc wait co --all --for='condition=PROGRESSING=False' --timeout=10s && \ oc wait co --all --for='condition=DEGRADED=False' --timeout=10s; do sleep 10 echo "Some ClusterOperators Degraded=False,Progressing=True,or Available=False"; done EOT
#!/bin/bash if [ -n "$OVN_SDN_MIGRATION_TIMEOUT" ] && [ "$OVN_SDN_MIGRATION_TIMEOUT" = "0s" ]; then unset OVN_SDN_MIGRATION_TIMEOUT fi #loops the timeout command of the script to repeatedly check the cluster Operators until all are available. co_timeout=${OVN_SDN_MIGRATION_TIMEOUT:-1200s} timeout "$co_timeout" bash <<EOT until oc wait co --all --for='condition=AVAILABLE=True' --timeout=10s && \ oc wait co --all --for='condition=PROGRESSING=False' --timeout=10s && \ oc wait co --all --for='condition=DEGRADED=False' --timeout=10s; do sleep 10 echo "Some ClusterOperators Degraded=False,Progressing=True,or Available=False"; done EOT
Copy to Clipboard Copied! Remove the configuration from the Cluster Network Operator (CNO) configuration object by running the following command:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{"spec":{"migration":null}}'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{"spec":{"migration":null}}'
Copy to Clipboard Copied! . Delete the
NodeNetworkConfigurationPolicy
(NNCP) custom resource (CR) that defines the primary network interface for the OpenShift SDN network plugin by completing the following steps:Check that the existing NNCP CR bonded the primary interface to your cluster by entering the following command:
oc get nncp
$ oc get nncp
Copy to Clipboard Copied! Example output
NAME STATUS REASON bondmaster0 Available SuccessfullyConfigured
NAME STATUS REASON bondmaster0 Available SuccessfullyConfigured
Copy to Clipboard Copied! Network Manager stores the connection profile for the bonded primary interface in the
/etc/NetworkManager/system-connections
system path.Remove the NNCP from your cluster:
oc delete nncp <nncp_manifest_filename>
$ oc delete nncp <nncp_manifest_filename>
Copy to Clipboard Copied!
To prepare all the nodes for the migration, set the
migration
field on the CNO configuration object by running the following command:oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": { "networkType": "OVNKubernetes" } } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": { "networkType": "OVNKubernetes" } } }'
Copy to Clipboard Copied! NoteThis step does not deploy OVN-Kubernetes immediately. Instead, specifying the
migration
field triggers the Machine Config Operator (MCO) to apply new machine configs to all the nodes in the cluster in preparation for the OVN-Kubernetes deployment.Check that the reboot is finished by running the following command:
oc get mcp
$ oc get mcp
Copy to Clipboard Copied! Check that all cluster Operators are available by running the following command:
oc get co
$ oc get co
Copy to Clipboard Copied! Alternatively: You can disable automatic migration of several OpenShift SDN capabilities to the OVN-Kubernetes equivalents:
- Egress IPs
- Egress firewall
- Multicast
To disable automatic migration of the configuration for any of the previously noted OpenShift SDN features, specify the following keys:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": { "networkType": "OVNKubernetes", "features": { "egressIP": <bool>, "egressFirewall": <bool>, "multicast": <bool> } } } }'
Copy to Clipboard Copied! where:
bool
: Specifies whether to enable migration of the feature. The default istrue
.
Optional: You can customize the following settings for OVN-Kubernetes to meet your network infrastructure requirements:
Maximum transmission unit (MTU). Consider the following before customizing the MTU for this optional step:
- If you use the default MTU, and you want to keep the default MTU during migration, this step can be ignored.
- If you used a custom MTU, and you want to keep the custom MTU during migration, you must declare the custom MTU value in this step.
This step does not work if you want to change the MTU value during migration. Instead, you must first follow the instructions for "Changing the cluster MTU". You can then keep the custom MTU value by performing this procedure and declaring the custom MTU value in this step.
NoteOpenShift-SDN and OVN-Kubernetes have different overlay overhead. MTU values should be selected by following the guidelines found on the "MTU value selection" page.
- Geneve (Generic Network Virtualization Encapsulation) overlay network port
- OVN-Kubernetes IPv4 internal subnet
To customize either of the previously noted settings, enter and customize the following command. If you do not need to change the default value, omit the key from the patch.
oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{ "spec":{ "defaultNetwork":{ "ovnKubernetesConfig":{ "mtu":<mtu>, "genevePort":<port>, "ipv4": "InternalJoinSubnet": "<ipv4_subnet>" }}}}'
Copy to Clipboard Copied! where:
mtu
-
The MTU for the Geneve overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to
100
less than the smallest node MTU value. port
-
The UDP port for the Geneve overlay network. If a value is not specified, the default is
6081
. The port cannot be the same as the VXLAN port that is used by OpenShift SDN. The default value for the VXLAN port is4789
. ipv4_subnet
-
An IPv4 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is
100.64.0.0/16
.
Example patch command to update
mtu
fieldoc patch Network.operator.openshift.io cluster --type=merge \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{ "spec":{ "defaultNetwork":{ "ovnKubernetesConfig":{ "mtu":1200 }}}}'
Copy to Clipboard Copied! As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
oc get mcp
$ oc get mcp
Copy to Clipboard Copied! A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.NoteBy default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
Confirm the status of the new machine configuration on the hosts:
To list the machine configuration state and the name of the applied machine configuration, enter the following command:
oc describe node | egrep "hostname|machineconfig"
$ oc describe node | egrep "hostname|machineconfig"
Copy to Clipboard Copied! Example output
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
Copy to Clipboard Copied! Verify that the following statements are true:
-
The value of
machineconfiguration.openshift.io/state
field isDone
. -
The value of the
machineconfiguration.openshift.io/currentConfig
field is equal to the value of themachineconfiguration.openshift.io/desiredConfig
field.
-
The value of
To confirm that the machine config is correct, enter the following command:
oc get machineconfig <config_name> -o yaml | grep ExecStart
$ oc get machineconfig <config_name> -o yaml | grep ExecStart
Copy to Clipboard Copied! where
<config_name>
is the name of the machine config from themachineconfiguration.openshift.io/currentConfig
field.The machine config must include the following update to the systemd configuration:
ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
Copy to Clipboard Copied! If a node is stuck in the
NotReady
state, investigate the machine config daemon pod logs and resolve any errors.To list the pods, enter the following command:
oc get pod -n openshift-machine-config-operator
$ oc get pod -n openshift-machine-config-operator
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE machine-config-controller-75f756f89d-sjp8b 1/1 Running 0 37m machine-config-daemon-5cf4b 2/2 Running 0 43h machine-config-daemon-7wzcd 2/2 Running 0 43h machine-config-daemon-fc946 2/2 Running 0 43h machine-config-daemon-g2v28 2/2 Running 0 43h machine-config-daemon-gcl4f 2/2 Running 0 43h machine-config-daemon-l5tnv 2/2 Running 0 43h machine-config-operator-79d9c55d5-hth92 1/1 Running 0 37m machine-config-server-bsc8h 1/1 Running 0 43h machine-config-server-hklrm 1/1 Running 0 43h machine-config-server-k9rtx 1/1 Running 0 43h
NAME READY STATUS RESTARTS AGE machine-config-controller-75f756f89d-sjp8b 1/1 Running 0 37m machine-config-daemon-5cf4b 2/2 Running 0 43h machine-config-daemon-7wzcd 2/2 Running 0 43h machine-config-daemon-fc946 2/2 Running 0 43h machine-config-daemon-g2v28 2/2 Running 0 43h machine-config-daemon-gcl4f 2/2 Running 0 43h machine-config-daemon-l5tnv 2/2 Running 0 43h machine-config-operator-79d9c55d5-hth92 1/1 Running 0 37m machine-config-server-bsc8h 1/1 Running 0 43h machine-config-server-hklrm 1/1 Running 0 43h machine-config-server-k9rtx 1/1 Running 0 43h
Copy to Clipboard Copied! The names for the config daemon pods are in the following format:
machine-config-daemon-<seq>
. The<seq>
value is a random five character alphanumeric sequence.Display the pod log for the first machine config daemon pod shown in the previous output by enter the following command:
oc logs <pod> -n openshift-machine-config-operator
$ oc logs <pod> -n openshift-machine-config-operator
Copy to Clipboard Copied! where
pod
is the name of a machine config daemon pod.- Resolve any errors in the logs shown by the output from the previous command.
To start the migration, configure the OVN-Kubernetes network plugin by using one of the following commands:
To specify the network provider without changing the cluster network IP address block, enter the following command:
oc patch Network.config.openshift.io cluster \ --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }'
$ oc patch Network.config.openshift.io cluster \ --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }'
Copy to Clipboard Copied! To specify a different cluster network IP address block, enter the following command:
oc patch Network.config.openshift.io cluster \ --type='merge' --patch '{
$ oc patch Network.config.openshift.io cluster \ --type='merge' --patch '{ "spec": { "clusterNetwork": [ { "cidr": "<cidr>", "hostPrefix": <prefix> } ], "networkType": "OVNKubernetes" } }'
Copy to Clipboard Copied! where
cidr
is a CIDR block andprefix
is the slice of the CIDR block apportioned to each node in your cluster. You cannot use any CIDR block that overlaps with the100.64.0.0/16
CIDR block because the OVN-Kubernetes network provider uses this block internally.ImportantYou cannot change the service network address block during the migration.
Verify that the Multus daemon set rollout is complete before continuing with subsequent steps:
oc -n openshift-multus rollout status daemonset/multus
$ oc -n openshift-multus rollout status daemonset/multus
Copy to Clipboard Copied! The name of the Multus pods is in the form of
multus-<xxxxx>
where<xxxxx>
is a random sequence of letters. It might take several moments for the pods to restart.Example output
Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated... ... Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available... daemon set "multus" successfully rolled out
Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated... ... Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available... daemon set "multus" successfully rolled out
Copy to Clipboard Copied! To complete changing the network plugin, reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:
ImportantThe following scripts reboot all of the nodes in the cluster at the same time. This can cause your cluster to be unstable. Another option is to reboot your nodes manually one at a time. Rebooting nodes one-by-one causes considerable downtime in a cluster with many nodes.
Cluster Operators will not work correctly before you reboot the nodes.
With the
oc rsh
command, you can use a bash script similar to the following:#!/bin/bash readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')" for i in "${POD_NODES[@]}" do read -r POD NODE <<< "$i" until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1 do echo "cannot reboot node $NODE, retry" && sleep 3 done done
#!/bin/bash readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')" for i in "${POD_NODES[@]}" do read -r POD NODE <<< "$i" until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1 do echo "cannot reboot node $NODE, retry" && sleep 3 done done
Copy to Clipboard Copied! With the
ssh
command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.#!/bin/bash for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}') do echo "reboot node $ip" ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3 done
#!/bin/bash for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}') do echo "reboot node $ip" ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3 done
Copy to Clipboard Copied!
Confirm that the migration succeeded:
To confirm that the network plugin is OVN-Kubernetes, enter the following command. The value of
status.networkType
must beOVNKubernetes
.oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
$ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
Copy to Clipboard Copied! To confirm that the cluster nodes are in the
Ready
state, enter the following command:oc get nodes
$ oc get nodes
Copy to Clipboard Copied! To confirm that your pods are not in an error state, enter the following command:
oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'
$ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'
Copy to Clipboard Copied! If pods on a node are in an error state, reboot that node.
To confirm that all of the cluster Operators are not in an abnormal state, enter the following command:
oc get co
$ oc get co
Copy to Clipboard Copied! The status of every cluster Operator must be the following:
AVAILABLE="True"
,PROGRESSING="False"
,DEGRADED="False"
. If a cluster Operator is not available or degraded, check the logs for the cluster Operator for more information.
Complete the following steps only if the migration succeeds and your cluster is in a good state:
To remove the migration configuration from the CNO configuration object, enter the following command:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": null } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": null } }'
Copy to Clipboard Copied! To remove custom configuration for the OpenShift SDN network provider, enter the following command:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "defaultNetwork": { "openshiftSDNConfig": null } } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "defaultNetwork": { "openshiftSDNConfig": null } } }'
Copy to Clipboard Copied! To remove the OpenShift SDN network provider namespace, enter the following command:
oc delete namespace openshift-sdn
$ oc delete namespace openshift-sdn
Copy to Clipboard Copied! After a successful migration operation, remove the
network.openshift.io/network-type-migration-
annotation from thenetwork.config
custom resource by entering the following command:oc annotate network.config cluster network.openshift.io/network-type-migration-
$ oc annotate network.config cluster network.openshift.io/network-type-migration-
Copy to Clipboard Copied!
Next steps
- Optional: After cluster migration, you can convert your IPv4 single-stack cluster to a dual-network cluster network that supports IPv4 and IPv6 address families. For more information, see "Converting to IPv4/IPv6 dual-stack networking".
26.6.4. Understanding changes to external IP behavior in OVN-Kubernetes
When migrating from OpenShift SDN to OVN-Kubernetes (OVN-K), services that use external IPs might become inaccessible across namespaces due to network policy enforcement.
In OpenShift SDN, external IPs were accessible across namespaces by default. However, in OVN-K, network policies strictly enforce multitenant isolation, preventing access to services exposed via external IPs from other namespaces.
To ensure access, consider the following alternatives:
- Use an ingress or route: Instead of exposing services by using external IPs, configure an ingress or route to allow external access while maintaining security controls.
-
Adjust the
NetworkPolicy
custom resource (CR): Modify aNetworkPolicy
CR to explicitly allow access from required namespaces and ensure that traffic is allowed to the designated service ports. Without explicitly allowing traffic to the required ports, access might still be blocked, even if the namespace is allowed. -
Use a
LoadBalancer
service: If applicable, deploy aLoadBalancer
service instead of relying on external IPs. For more information about configuring see "NetworkPolicy and external IPs in OVN-Kubernetes".
26.7. Rolling back to the OpenShift SDN network provider
As a cluster administrator, you can roll back to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin using either the offline migration method, or the limited live migration method. This can only be done after the migration to the OVN-Kubernetes network plugin has successfully completed.
- If you used the offline migration method to migrate to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin, you should use the offline migration rollback method.
- If you used the limited live migration method to migrate to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin, you should use the limited live migration rollback method.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
26.7.1. Using the offline migration method to roll back to the OpenShift SDN network plugin
Cluster administrators can roll back to the OpenShift SDN Container Network Interface (CNI) network plugin by using the offline migration method. During the migration you must manually reboot every node in your cluster. With the offline migration method, there is some downtime, during which your cluster is unreachable.
You must wait until the migration process from OpenShift SDN to OVN-Kubernetes network plugin is successful before initiating a rollback.
If a rollback to OpenShift SDN is required, the following table describes the process.
User-initiated steps | Migration activity |
---|---|
Suspend the MCO to ensure that it does not interrupt the migration. | The MCO stops. |
Set the |
|
Update the |
|
Reboot each node in the cluster. |
|
Enable the MCO after all nodes in the cluster reboot. |
|
Prerequisites
-
The OpenShift CLI (
oc
) is installed. - Access to the cluster as a user with the cluster-admin role is available.
- The cluster is installed on infrastructure configured with the OVN-Kubernetes network plugin.
- A recent backup of the etcd database is available.
- A manual reboot can be triggered for each node.
- The cluster is in a known good state, without any errors.
Procedure
Stop all of the machine configuration pools managed by the Machine Config Operator (MCO):
Stop the
master
configuration pool by entering the following command in your CLI:oc patch MachineConfigPool master --type='merge' --patch \ '{ "spec": { "paused": true } }'
$ oc patch MachineConfigPool master --type='merge' --patch \ '{ "spec": { "paused": true } }'
Copy to Clipboard Copied! Stop the
worker
machine configuration pool by entering the following command in your CLI:oc patch MachineConfigPool worker --type='merge' --patch \ '{ "spec":{ "paused": true } }'
$ oc patch MachineConfigPool worker --type='merge' --patch \ '{ "spec":{ "paused": true } }'
Copy to Clipboard Copied!
To prepare for the migration, set the migration field to
null
by entering the following command in your CLI:oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": null } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": null } }'
Copy to Clipboard Copied! Check that the migration status is empty for the
Network.config.openshift.io
object by entering the following command in your CLI. Empty command output indicates that the object is not in a migration operation.oc get Network.config cluster -o jsonpath='{.status.migration}'
$ oc get Network.config cluster -o jsonpath='{.status.migration}'
Copy to Clipboard Copied! Apply the patch to the
Network.operator.openshift.io
object to set the network plugin back to OpenShift SDN by entering the following command in your CLI:oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": { "networkType": "OpenShiftSDN" } } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": { "networkType": "OpenShiftSDN" } } }'
Copy to Clipboard Copied! ImportantIf you applied the patch to the
Network.config.openshift.io
object before the patch operation finalizes on theNetwork.operator.openshift.io
object, the Cluster Network Operator (CNO) enters into a degradation state and this causes a slight delay until the CNO recovers from the degraded state.Confirm that the migration status of the network plugin for the
Network.config.openshift.io cluster
object isOpenShiftSDN
by entering the following command in your CLI:oc get Network.config cluster -o jsonpath='{.status.migration.networkType}'
$ oc get Network.config cluster -o jsonpath='{.status.migration.networkType}'
Copy to Clipboard Copied! Apply the patch to the
Network.config.openshift.io
object to set the network plugin back to OpenShift SDN by entering the following command in your CLI:oc patch Network.config.openshift.io cluster --type='merge' \ --patch '{ "spec": { "networkType": "OpenShiftSDN" } }'
$ oc patch Network.config.openshift.io cluster --type='merge' \ --patch '{ "spec": { "networkType": "OpenShiftSDN" } }'
Copy to Clipboard Copied! Optional: Disable automatic migration of several OVN-Kubernetes capabilities to the OpenShift SDN equivalents:
- Egress IPs
- Egress firewall
- Multicast
To disable automatic migration of the configuration for any of the previously noted OpenShift SDN features, specify the following keys:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": { "networkType": "OpenShiftSDN", "features": { "egressIP": <bool>, "egressFirewall": <bool>, "multicast": <bool> } } } }'
Copy to Clipboard Copied! where:
bool
: Specifies whether to enable migration of the feature. The default istrue
.Optional: You can customize the following settings for OpenShift SDN to meet your network infrastructure requirements:
- Maximum transmission unit (MTU)
- VXLAN port
To customize either or both of the previously noted settings, customize and enter the following command in your CLI. If you do not need to change the default value, omit the key from the patch.
oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{ "spec":{ "defaultNetwork":{ "openshiftSDNConfig":{ "mtu":<mtu>, "vxlanPort":<port> }}}}'
Copy to Clipboard Copied! mtu
-
The MTU for the VXLAN overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to
50
less than the smallest node MTU value. port
-
The UDP port for the VXLAN overlay network. If a value is not specified, the default is
4789
. The port cannot be the same as the Geneve port that is used by OVN-Kubernetes. The default value for the Geneve port is6081
.
Example patch command
oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{
$ oc patch Network.operator.openshift.io cluster --type=merge \ --patch '{ "spec":{ "defaultNetwork":{ "openshiftSDNConfig":{ "mtu":1200 }}}}'
Copy to Clipboard Copied! Reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:
With the
oc rsh
command, you can use a bash script similar to the following:#!/bin/bash readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')" for i in "${POD_NODES[@]}" do read -r POD NODE <<< "$i" until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1 do echo "cannot reboot node $NODE, retry" && sleep 3 done done
#!/bin/bash readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')" for i in "${POD_NODES[@]}" do read -r POD NODE <<< "$i" until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1 do echo "cannot reboot node $NODE, retry" && sleep 3 done done
Copy to Clipboard Copied! With the
ssh
command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.#!/bin/bash for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}') do echo "reboot node $ip" ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3 done
#!/bin/bash for ip in $(oc get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}') do echo "reboot node $ip" ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3 done
Copy to Clipboard Copied!
Wait until the Multus daemon set rollout completes. Run the following command to see your rollout status:
oc -n openshift-multus rollout status daemonset/multus
$ oc -n openshift-multus rollout status daemonset/multus
Copy to Clipboard Copied! The name of the Multus pods is in the form of
multus-<xxxxx>
where<xxxxx>
is a random sequence of letters. It might take several moments for the pods to restart.Example output
Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated... ... Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available... daemon set "multus" successfully rolled out
Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated... ... Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available... daemon set "multus" successfully rolled out
Copy to Clipboard Copied! After the nodes in your cluster have rebooted and the multus pods are rolled out, start all of the machine configuration pools by running the following commands::
Start the master configuration pool:
oc patch MachineConfigPool master --type='merge' --patch \ '{ "spec": { "paused": false } }'
$ oc patch MachineConfigPool master --type='merge' --patch \ '{ "spec": { "paused": false } }'
Copy to Clipboard Copied! Start the worker configuration pool:
oc patch MachineConfigPool worker --type='merge' --patch \ '{ "spec": { "paused": false } }'
$ oc patch MachineConfigPool worker --type='merge' --patch \ '{ "spec": { "paused": false } }'
Copy to Clipboard Copied!
As the MCO updates machines in each config pool, it reboots each node.
By default the MCO updates a single machine per pool at a time, so the time that the migration requires to complete grows with the size of the cluster.
Confirm the status of the new machine configuration on the hosts:
To list the machine configuration state and the name of the applied machine configuration, enter the following command in your CLI:
oc describe node | egrep "hostname|machineconfig"
$ oc describe node | egrep "hostname|machineconfig"
Copy to Clipboard Copied! Example output
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
Copy to Clipboard Copied! Verify that the following statements are true:
-
The value of
machineconfiguration.openshift.io/state
field isDone
. -
The value of the
machineconfiguration.openshift.io/currentConfig
field is equal to the value of themachineconfiguration.openshift.io/desiredConfig
field.
-
The value of
To confirm that the machine config is correct, enter the following command in your CLI:
oc get machineconfig <config_name> -o yaml
$ oc get machineconfig <config_name> -o yaml
Copy to Clipboard Copied! where
<config_name>
is the name of the machine config from themachineconfiguration.openshift.io/currentConfig
field.
Confirm that the migration succeeded:
To confirm that the network plugin is OpenShift SDN, enter the following command in your CLI. The value of
status.networkType
must beOpenShiftSDN
.oc get Network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
$ oc get Network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
Copy to Clipboard Copied! To confirm that the cluster nodes are in the
Ready
state, enter the following command in your CLI:oc get nodes
$ oc get nodes
Copy to Clipboard Copied! If a node is stuck in the
NotReady
state, investigate the machine config daemon pod logs and resolve any errors.To list the pods, enter the following command in your CLI:
oc get pod -n openshift-machine-config-operator
$ oc get pod -n openshift-machine-config-operator
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE machine-config-controller-75f756f89d-sjp8b 1/1 Running 0 37m machine-config-daemon-5cf4b 2/2 Running 0 43h machine-config-daemon-7wzcd 2/2 Running 0 43h machine-config-daemon-fc946 2/2 Running 0 43h machine-config-daemon-g2v28 2/2 Running 0 43h machine-config-daemon-gcl4f 2/2 Running 0 43h machine-config-daemon-l5tnv 2/2 Running 0 43h machine-config-operator-79d9c55d5-hth92 1/1 Running 0 37m machine-config-server-bsc8h 1/1 Running 0 43h machine-config-server-hklrm 1/1 Running 0 43h machine-config-server-k9rtx 1/1 Running 0 43h
NAME READY STATUS RESTARTS AGE machine-config-controller-75f756f89d-sjp8b 1/1 Running 0 37m machine-config-daemon-5cf4b 2/2 Running 0 43h machine-config-daemon-7wzcd 2/2 Running 0 43h machine-config-daemon-fc946 2/2 Running 0 43h machine-config-daemon-g2v28 2/2 Running 0 43h machine-config-daemon-gcl4f 2/2 Running 0 43h machine-config-daemon-l5tnv 2/2 Running 0 43h machine-config-operator-79d9c55d5-hth92 1/1 Running 0 37m machine-config-server-bsc8h 1/1 Running 0 43h machine-config-server-hklrm 1/1 Running 0 43h machine-config-server-k9rtx 1/1 Running 0 43h
Copy to Clipboard Copied! The names for the config daemon pods are in the following format:
machine-config-daemon-<seq>
. The<seq>
value is a random five character alphanumeric sequence.To display the pod log for each machine config daemon pod shown in the previous output, enter the following command in your CLI:
oc logs <pod> -n openshift-machine-config-operator
$ oc logs <pod> -n openshift-machine-config-operator
Copy to Clipboard Copied! where
pod
is the name of a machine config daemon pod.- Resolve any errors in the logs shown by the output from the previous command.
To confirm that your pods are not in an error state, enter the following command in your CLI:
oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'
$ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'
Copy to Clipboard Copied! If pods on a node are in an error state, reboot that node.
Complete the following steps only if the migration succeeds and your cluster is in a good state:
To remove the migration configuration from the Cluster Network Operator configuration object, enter the following command in your CLI:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": null } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "migration": null } }'
Copy to Clipboard Copied! To remove the OVN-Kubernetes configuration, enter the following command in your CLI:
oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "defaultNetwork": { "ovnKubernetesConfig":null } } }'
$ oc patch Network.operator.openshift.io cluster --type='merge' \ --patch '{ "spec": { "defaultNetwork": { "ovnKubernetesConfig":null } } }'
Copy to Clipboard Copied! To remove the OVN-Kubernetes network provider namespace, enter the following command in your CLI:
oc delete namespace openshift-ovn-kubernetes
$ oc delete namespace openshift-ovn-kubernetes
Copy to Clipboard Copied!
26.7.2. Using an Ansible playbook to roll back to the OpenShift SDN network plugin
As a cluster administrator, you can use the playbooks/playbook-rollback.yml
from the network.offline_migration_sdn_to_ovnk
Ansible collection to roll back from the OVN-Kubernetes plugin to the OpenShift SDN Container Network Interface (CNI) network plugin.
Prerequisites
-
You installed the
python3
package, minimum version 3.10. -
You installed the
jmespath
andjq
packages. - You logged in to the Red Hat Hybrid Cloud Console and opened the Ansible Automation Platform web console.
-
You created a security group rule that allows User Datagram Protocol (UDP) packets on port
6081
for all nodes on all cloud platforms. If you do not do this task, your cluster might fail to schedule pods.
Procedure
Install the
ansible-core
package, minimum version 2.15. The following example command shows how to install theansible-core
package on Red Hat Enterprise Linux (RHEL):sudo dnf install -y ansible-core
$ sudo dnf install -y ansible-core
Copy to Clipboard Copied! Create an
ansible.cfg
file and add information similar to the following example to the file. Ensure that file exists in the same directory as where theansible-galaxy
commands and the playbooks run.cat << EOF >> ansible.cfg [galaxy] server_list = automation_hub, validated [galaxy_server.automation_hub] url=https://console.redhat.com/api/automation-hub/content/published/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= #[galaxy_server.release_galaxy] #url=https://galaxy.ansible.com/ [galaxy_server.validated] url=https://console.redhat.com/api/automation-hub/content/validated/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= EOF
$ cat << EOF >> ansible.cfg [galaxy] server_list = automation_hub, validated [galaxy_server.automation_hub] url=https://console.redhat.com/api/automation-hub/content/published/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= #[galaxy_server.release_galaxy] #url=https://galaxy.ansible.com/ [galaxy_server.validated] url=https://console.redhat.com/api/automation-hub/content/validated/ auth_url=https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token token= EOF
Copy to Clipboard Copied! From the Ansible Automation Platform web console, go to the Connect to Hub page and complete the following steps:
- In the Offline token section of the page, click the Load token button.
- After the token loads, click the Copy to clipboard icon.
-
Open the
ansible.cfg
file and paste the API token in thetoken=
parameter. The API token is required for authenticating against the server URL specified in theansible.cfg
file.
Install the
network.offline_migration_sdn_to_ovnk
Ansible collection by entering the followingansible-galaxy
command:ansible-galaxy collection install network.offline_migration_sdn_to_ovnk
$ ansible-galaxy collection install network.offline_migration_sdn_to_ovnk
Copy to Clipboard Copied! Verify that the
network.offline_migration_sdn_to_ovnk
Ansible collection is installed on your system:ansible-galaxy collection list | grep network.offline_migration_sdn_to_ovnk
$ ansible-galaxy collection list | grep network.offline_migration_sdn_to_ovnk
Copy to Clipboard Copied! Example output
network.offline_migration_sdn_to_ovnk 1.0.2
network.offline_migration_sdn_to_ovnk 1.0.2
Copy to Clipboard Copied! The
network.offline_migration_sdn_to_ovnk
Ansible collection is saved in the default path of~/.ansible/collections/ansible_collections/network/offline_migration_sdn_to_ovnk/
.Configure rollback features in the
playbooks/playbook-migration.yml
file:... ...
# ... rollback_disable_auto_migration: true rollback_egress_ip: false rollback_egress_firewall: false rollback_multicast: false rollback_mtu: 1400 rollback_vxlanPort: 4790 # ...
Copy to Clipboard Copied! rollback_disable_auto_migration
-
Disables the auto-migration of OVN-Kubernetes plug-in features to the OpenShift SDN CNI plug-in. If you disable auto-migration of features, you must also set the
rollback_egress_ip
,rollback_egress_firewall
, androllback_multicast
parameters tofalse
. If you need to enable auto-migration of features, set the parameter tofalse
. rollback_mtu
- Optional parameter that sets a specific maximum transmission unit (MTU) to your cluster network after the migration process.
rollback_vxlanPort
-
Optional parameter that sets a VXLAN (Virtual Extensible LAN) port for use by OpenShift SDN CNI plug-in. The default value for the parameter is
4790
.
To run the
playbooks/playbook-rollback.yml
file, enter the following command:ansible-playbook -v playbooks/playbook-rollback.yml
$ ansible-playbook -v playbooks/playbook-rollback.yml
Copy to Clipboard Copied!
26.7.3. Using the limited live migration method to roll back to the OpenShift SDN network plugin
As a cluster administrator, you can roll back to the OpenShift SDN Container Network Interface (CNI) network plugin by using the limited live migration method. During the migration with this method, nodes are automatically rebooted and service to the cluster is not interrupted.
You must wait until the migration process from OpenShift SDN to OVN-Kubernetes network plugin is successful before initiating a rollback.
If a rollback to OpenShift SDN is required, the following table describes the process.
User-initiated steps | Migration activity |
---|---|
Patch the cluster-level networking configuration by changing the |
|
Prerequisites
-
The OpenShift CLI (
oc
) is installed. - Access to the cluster as a user with the cluster-admin role is available.
- The cluster is installed on infrastructure configured with the OVN-Kubernetes network plugin.
- A recent backup of the etcd database is available.
- A manual reboot can be triggered for each node.
- The cluster is in a known good state, without any errors.
Procedure
To initiate the rollback to OpenShift SDN, enter the following command:
oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OpenShiftSDN"}}'
$ oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OpenShiftSDN"}}'
Copy to Clipboard Copied! To watch the progress of your migration, enter the following command:
watch -n1 'oc get network.config/cluster -o json | jq ".status.conditions[]|\"\\(.type) \\(.status) \\(.reason) \\(.message)\"" -r | column --table --table-columns NAME,STATUS,REASON,MESSAGE --table-columns-limit 4; echo; oc get mcp -o wide; echo; oc get node -o "custom-columns=NAME:metadata.name,STATE:metadata.annotations.machineconfiguration\\.openshift\\.io/state,DESIRED:metadata.annotations.machineconfiguration\\.openshift\\.io/desiredConfig,CURRENT:metadata.annotations.machineconfiguration\\.openshift\\.io/currentConfig,REASON:metadata.annotations.machineconfiguration\\.openshift\\.io/reason"'
$ watch -n1 'oc get network.config/cluster -o json | jq ".status.conditions[]|\"\\(.type) \\(.status) \\(.reason) \\(.message)\"" -r | column --table --table-columns NAME,STATUS,REASON,MESSAGE --table-columns-limit 4; echo; oc get mcp -o wide; echo; oc get node -o "custom-columns=NAME:metadata.name,STATE:metadata.annotations.machineconfiguration\\.openshift\\.io/state,DESIRED:metadata.annotations.machineconfiguration\\.openshift\\.io/desiredConfig,CURRENT:metadata.annotations.machineconfiguration\\.openshift\\.io/currentConfig,REASON:metadata.annotations.machineconfiguration\\.openshift\\.io/reason"'
Copy to Clipboard Copied! The command prints the following information every second:
-
The conditions on the status of the
network.config.openshift.io/cluster
object, reporting the progress of the migration. -
The status of different nodes with respect to the
machine-config-operator
resource, including whether they are upgrading or have been upgraded, as well as their current and desired configurations.
-
The conditions on the status of the
Complete the following steps only if the migration succeeds and your cluster is in a good state:
Remove the
network.openshift.io/network-type-migration=
annotation from thenetwork.config
custom resource by entering the following command:oc annotate network.config cluster network.openshift.io/network-type-migration-
$ oc annotate network.config cluster network.openshift.io/network-type-migration-
Copy to Clipboard Copied! Remove the OVN-Kubernetes network provider namespace by entering the following command:
oc delete namespace openshift-ovn-kubernetes
$ oc delete namespace openshift-ovn-kubernetes
Copy to Clipboard Copied!
26.8. Converting to IPv4/IPv6 dual-stack networking
As a cluster administrator, you can convert your IPv4 single-stack cluster to a dual-network cluster network that supports IPv4 and IPv6 address families. After converting to dual-stack, all newly created pods are dual-stack enabled.
-
While using dual-stack networking, you cannot use IPv4-mapped IPv6 addresses, such as
::FFFF:198.51.100.1
, where IPv6 is required. - A dual-stack network is supported on clusters provisioned on bare metal, IBM Power®, IBM Z® infrastructure, single-node OpenShift, and VMware vSphere.
26.8.1. Converting to a dual-stack cluster network
As a cluster administrator, you can convert your single-stack cluster network to a dual-stack cluster network.
After converting to dual-stack networking only newly created pods are assigned IPv6 addresses. Any pods created before the conversion must be recreated to receive an IPv6 address.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - Your cluster uses the OVN-Kubernetes network plugin.
- The cluster nodes have IPv6 addresses.
- You have configured an IPv6-enabled router based on your infrastructure.
Procedure
To specify IPv6 address blocks for the cluster and service networks, create a file containing the following YAML:
- op: add path: /spec/clusterNetwork/- value: cidr: fd01::/48 hostPrefix: 64 - op: add path: /spec/serviceNetwork/- value: fd02::/112
- op: add path: /spec/clusterNetwork/- value:
1 cidr: fd01::/48 hostPrefix: 64 - op: add path: /spec/serviceNetwork/- value: fd02::/112
2 Copy to Clipboard Copied! - 1
- Specify an object with the
cidr
andhostPrefix
parameters. The host prefix must be64
or greater. The IPv6 Classless Inter-Domain Routing (CIDR) prefix must be large enough to accommodate the specified host prefix. - 2
- Specify an IPv6 CIDR with a prefix of
112
. Kubernetes uses only the lowest 16 bits. For a prefix of112
, IP addresses are assigned from112
to128
bits.
To patch the cluster network configuration, enter the following command:
oc patch network.config.openshift.io cluster \ --type='json' --patch-file <file>.yaml
$ oc patch network.config.openshift.io cluster \ --type='json' --patch-file <file>.yaml
Copy to Clipboard Copied! where:
file
Specifies the name of the file you created in the previous step.
Example output
network.config.openshift.io/cluster patched
network.config.openshift.io/cluster patched
Copy to Clipboard Copied!
Verification
Complete the following step to verify that the cluster network recognizes the IPv6 address blocks that you specified in the previous procedure.
Display the network configuration:
oc describe network
$ oc describe network
Copy to Clipboard Copied! Example output
Status: Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Cidr: fd01::/48 Host Prefix: 64 Cluster Network MTU: 1400 Network Type: OVNKubernetes Service Network: 172.30.0.0/16 fd02::/112
Status: Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Cidr: fd01::/48 Host Prefix: 64 Cluster Network MTU: 1400 Network Type: OVNKubernetes Service Network: 172.30.0.0/16 fd02::/112
Copy to Clipboard Copied!
26.8.2. Converting to a single-stack cluster network
As a cluster administrator, you can convert your dual-stack cluster network to a single-stack cluster network.
If you originally converted your IPv4 single-stack cluster network to a dual-stack cluster, you can convert only back to the IPv4 single-stack cluster and not an IPv6 single-stack cluster network. The same restriction applies for converting back to an IPv6 single-stack cluster network.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - Your cluster uses the OVN-Kubernetes network plugin.
- The cluster nodes have IPv6 addresses.
- You have enabled dual-stack networking.
Procedure
Edit the
networks.config.openshift.io
custom resource (CR) by running the following command:oc edit networks.config.openshift.io
$ oc edit networks.config.openshift.io
Copy to Clipboard Copied! -
Remove the IPv4 or IPv6 configuration that you added to the
cidr
and thehostPrefix
parameters from completing the "Converting to a dual-stack cluster network " procedure steps.
26.9. Configuring OVN-Kubernetes internal IP address subnets
As a cluster administrator, you can change the IP address ranges that the OVN-Kubernetes network plugin uses for the join and transit subnets.
26.9.1. Configuring the OVN-Kubernetes join subnet
You can change the join subnet used by OVN-Kubernetes to avoid conflicting with any existing subnets already in use in your environment.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges. - Ensure that the cluster uses the OVN-Kubernetes network plugin.
Procedure
To change the OVN-Kubernetes join subnet, enter the following command:
oc patch network.operator.openshift.io cluster --type='merge' \ -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":
$ oc patch network.operator.openshift.io cluster --type='merge' \ -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig": {"ipv4":{"internalJoinSubnet": "<join_subnet>"}, "ipv6":{"internalJoinSubnet": "<join_subnet>"}}}}}'
Copy to Clipboard Copied! where:
<join_subnet>
-
Specifies an IP address subnet for internal use by OVN-Kubernetes. The subnet must be larger than the number of nodes in the cluster and it must be large enough to accommodate one IP address per node in the cluster. This subnet cannot overlap with any other subnets used by OpenShift Container Platform or on the host itself. The default value for IPv4 is
100.64.0.0/16
and the default value for IPv6 isfd98::/64
.
Example output
network.operator.openshift.io/cluster patched
network.operator.openshift.io/cluster patched
Copy to Clipboard Copied!
Verification
To confirm that the configuration is active, enter the following command:
oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.defaultNetwork}"
$ oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.defaultNetwork}"
Copy to Clipboard Copied! The command operation can take up to 30 minutes for this change to take effect.
Example output
{ "ovnKubernetesConfig": { "ipv4": { "internalJoinSubnet": "100.64.1.0/16" }, }, "type": "OVNKubernetes" }
{ "ovnKubernetesConfig": { "ipv4": { "internalJoinSubnet": "100.64.1.0/16" }, }, "type": "OVNKubernetes" }
Copy to Clipboard Copied!
26.9.2. Configuring the OVN-Kubernetes transit subnet
You can change the transit subnet used by OVN-Kubernetes to avoid conflicting with any existing subnets already in use in your environment.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges. - Ensure that the cluster uses the OVN-Kubernetes network plugin.
Procedure
To change the OVN-Kubernetes transit subnet, enter the following command:
oc patch network.operator.openshift.io cluster --type='merge' \ -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":
$ oc patch network.operator.openshift.io cluster --type='merge' \ -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig": {"ipv4":{"internalTransitSwitchSubnet": "<transit_subnet>"}, "ipv6":{"internalTransitSwitchSubnet": "<transit_subnet>"}}}}}'
Copy to Clipboard Copied! where:
<transit_subnet>
-
Specifies an IP address subnet for the distributed transit switch that enables east-west traffic. This subnet cannot overlap with any other subnets used by OVN-Kubernetes or on the host itself. The default value for IPv4 is
100.88.0.0/16
and the default value for IPv6 isfd97::/64
.
Example output
network.operator.openshift.io/cluster patched
network.operator.openshift.io/cluster patched
Copy to Clipboard Copied!
Verification
To confirm that the configuration is active, enter the following command:
oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.defaultNetwork}"
$ oc get network.operator.openshift.io \ -o jsonpath="{.items[0].spec.defaultNetwork}"
Copy to Clipboard Copied! It can take up to 30 minutes for this change to take effect.
Example output
{ "ovnKubernetesConfig": { "ipv4": { "internalTransitSwitchSubnet": "100.88.1.0/16" }, }, "type": "OVNKubernetes" }
{ "ovnKubernetesConfig": { "ipv4": { "internalTransitSwitchSubnet": "100.88.1.0/16" }, }, "type": "OVNKubernetes" }
Copy to Clipboard Copied!
26.10. Logging for egress firewall and network policy rules
As a cluster administrator, you can configure audit logging for your cluster and enable logging for one or more namespaces. OpenShift Container Platform produces audit logs for both egress firewalls and network policies.
Audit logging is available for only the OVN-Kubernetes network plugin.
26.10.1. Audit logging
The OVN-Kubernetes network plugin uses Open Virtual Network (OVN) ACLs to manage egress firewalls and network policies. Audit logging exposes allow and deny ACL events.
You can configure the destination for audit logs, such as a syslog server or a UNIX domain socket. Regardless of any additional configuration, an audit log is always saved to /var/log/ovn/acl-audit-log.log
on each OVN-Kubernetes pod in the cluster.
You can enable audit logging for each namespace by annotating each namespace configuration with a k8s.ovn.org/acl-logging
section. In the k8s.ovn.org/acl-logging
section, you must specify allow
, deny
, or both values to enable audit logging for a namespace.
A network policy does not support setting the Pass
action set as a rule.
The ACL-logging implementation logs access control list (ACL) events for a network. You can view these logs to analyze any potential security issues.
Example namespace annotation
kind: Namespace apiVersion: v1 metadata: name: example1 annotations: k8s.ovn.org/acl-logging: |- { "deny": "info", "allow": "info" }
kind: Namespace
apiVersion: v1
metadata:
name: example1
annotations:
k8s.ovn.org/acl-logging: |-
{
"deny": "info",
"allow": "info"
}
To view the default ACL logging configuration values, see the policyAuditConfig
object in the cluster-network-03-config.yml
file. If required, you can change the ACL logging configuration values for log file parameters in this file.
The logging message format is compatible with syslog as defined by RFC5424. The syslog facility is configurable and defaults to local0
. The following example shows key parameters and their values outputted in a log message:
Example logging message that outputs parameters and their values
<timestamp>|<message_serial>|acl_log(ovn_pinctrl0)|<severity>|name="<acl_name>", verdict="<verdict>", severity="<severity>", direction="<direction>": <flow>
<timestamp>|<message_serial>|acl_log(ovn_pinctrl0)|<severity>|name="<acl_name>", verdict="<verdict>", severity="<severity>", direction="<direction>": <flow>
Where:
-
<timestamp>
states the time and date for the creation of a log message. -
<message_serial>
lists the serial number for a log message. -
acl_log(ovn_pinctrl0)
is a literal string that prints the location of the log message in the OVN-Kubernetes plugin. -
<severity>
sets the severity level for a log message. If you enable audit logging that supportsallow
anddeny
tasks then two severity levels show in the log message output. -
<name>
states the name of the ACL-logging implementation in the OVN Network Bridging Database (nbdb
) that was created by the network policy. -
<verdict>
can be eitherallow
ordrop
. -
<direction>
can be eitherto-lport
orfrom-lport
to indicate that the policy was applied to traffic going to or away from a pod. -
<flow>
shows packet information in a format equivalent to theOpenFlow
protocol. This parameter comprises Open vSwitch (OVS) fields.
The following example shows OVS fields that the flow
parameter uses to extract packet information from system memory:
Example of OVS fields used by the flow
parameter to extract packet information
<proto>,vlan_tci=0x0000,dl_src=<src_mac>,dl_dst=<source_mac>,nw_src=<source_ip>,nw_dst=<target_ip>,nw_tos=<tos_dscp>,nw_ecn=<tos_ecn>,nw_ttl=<ip_ttl>,nw_frag=<fragment>,tp_src=<tcp_src_port>,tp_dst=<tcp_dst_port>,tcp_flags=<tcp_flags>
<proto>,vlan_tci=0x0000,dl_src=<src_mac>,dl_dst=<source_mac>,nw_src=<source_ip>,nw_dst=<target_ip>,nw_tos=<tos_dscp>,nw_ecn=<tos_ecn>,nw_ttl=<ip_ttl>,nw_frag=<fragment>,tp_src=<tcp_src_port>,tp_dst=<tcp_dst_port>,tcp_flags=<tcp_flags>
Where:
-
<proto>
states the protocol. Valid values aretcp
andudp
. -
vlan_tci=0x0000
states the VLAN header as0
because a VLAN ID is not set for internal pod network traffic. -
<src_mac>
specifies the source for the Media Access Control (MAC) address. -
<source_mac>
specifies the destination for the MAC address. -
<source_ip>
lists the source IP address -
<target_ip>
lists the target IP address. -
<tos_dscp>
states Differentiated Services Code Point (DSCP) values to classify and prioritize certain network traffic over other traffic. -
<tos_ecn>
states Explicit Congestion Notification (ECN) values that indicate any congested traffic in your network. -
<ip_ttl>
states the Time To Live (TTP) information for an packet. -
<fragment>
specifies what type of IP fragments or IP non-fragments to match. -
<tcp_src_port>
shows the source for the port for TCP and UDP protocols. -
<tcp_dst_port>
lists the destination port for TCP and UDP protocols. -
<tcp_flags>
supports numerous flags such asSYN
,ACK
,PSH
and so on. If you need to set multiple values then each value is separated by a vertical bar (|
). The UDP protocol does not support this parameter.
For more information about the previous field descriptions, go to the OVS manual page for ovs-fields
.
Example ACL deny log entry for a network policy
2023-11-02T16:28:54.139Z|00004|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:28:55.187Z|00005|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:28:57.235Z|00006|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
2023-11-02T16:28:54.139Z|00004|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
2023-11-02T16:28:55.187Z|00005|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
2023-11-02T16:28:57.235Z|00006|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
The following table describes namespace annotation values:
Field | Description |
---|---|
|
Blocks namespace access to any traffic that matches an ACL rule with the |
|
Permits namespace access to any traffic that matches an ACL rule with the |
|
A |
26.10.2. Audit configuration
The configuration for audit logging is specified as part of the OVN-Kubernetes cluster network provider configuration. The following YAML illustrates the default values for the audit logging:
Audit logging configuration
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: defaultNetwork: ovnKubernetesConfig: policyAuditConfig: destination: "null" maxFileSize: 50 rateLimit: 20 syslogFacility: local0
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
defaultNetwork:
ovnKubernetesConfig:
policyAuditConfig:
destination: "null"
maxFileSize: 50
rateLimit: 20
syslogFacility: local0
The following table describes the configuration fields for audit logging.
Field | Type | Description |
---|---|---|
| integer |
The maximum number of messages to generate every second per node. The default value is |
| integer |
The maximum size for the audit log in bytes. The default value is |
| integer | The maximum number of log files that are retained. |
| string | One of the following additional audit log targets:
|
| string |
The syslog facility, such as |
26.10.3. Configuring egress firewall and network policy auditing for a cluster
As a cluster administrator, you can customize audit logging for your cluster.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges.
Procedure
To customize the audit logging configuration, enter the following command:
oc edit network.operator.openshift.io/cluster
$ oc edit network.operator.openshift.io/cluster
Copy to Clipboard Copied! TipYou can alternatively customize and apply the following YAML to configure audit logging:
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: defaultNetwork: ovnKubernetesConfig: policyAuditConfig: destination: "null" maxFileSize: 50 rateLimit: 20 syslogFacility: local0
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: defaultNetwork: ovnKubernetesConfig: policyAuditConfig: destination: "null" maxFileSize: 50 rateLimit: 20 syslogFacility: local0
Copy to Clipboard Copied!
Verification
To create a namespace with network policies complete the following steps:
Create a namespace for verification:
cat <<EOF| oc create -f - kind: Namespace apiVersion: v1 metadata: name: verify-audit-logging annotations: k8s.ovn.org/acl-logging: '{ "deny": "alert", "allow": "alert" }' EOF
$ cat <<EOF| oc create -f - kind: Namespace apiVersion: v1 metadata: name: verify-audit-logging annotations: k8s.ovn.org/acl-logging: '{ "deny": "alert", "allow": "alert" }' EOF
Copy to Clipboard Copied! Example output
namespace/verify-audit-logging created
namespace/verify-audit-logging created
Copy to Clipboard Copied! Create network policies for the namespace:
cat <<EOF| oc create -n verify-audit-logging -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: deny-all spec: podSelector: matchLabels: policyTypes: - Ingress - Egress --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-same-namespace namespace: verify-audit-logging spec: podSelector: {} policyTypes: - Ingress - Egress ingress: - from: - podSelector: {} egress: - to: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: verify-audit-logging EOF
$ cat <<EOF| oc create -n verify-audit-logging -f - apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: deny-all spec: podSelector: matchLabels: policyTypes: - Ingress - Egress --- apiVersion: networking.k8s.io/v1 kind: NetworkPolicy metadata: name: allow-from-same-namespace namespace: verify-audit-logging spec: podSelector: {} policyTypes: - Ingress - Egress ingress: - from: - podSelector: {} egress: - to: - namespaceSelector: matchLabels: kubernetes.io/metadata.name: verify-audit-logging EOF
Copy to Clipboard Copied! Example output
networkpolicy.networking.k8s.io/deny-all created networkpolicy.networking.k8s.io/allow-from-same-namespace created
networkpolicy.networking.k8s.io/deny-all created networkpolicy.networking.k8s.io/allow-from-same-namespace created
Copy to Clipboard Copied!
Create a pod for source traffic in the
default
namespace:cat <<EOF| oc create -n default -f - apiVersion: v1 kind: Pod metadata: name: client spec: containers: - name: client image: registry.access.redhat.com/rhel7/rhel-tools command: ["/bin/sh", "-c"] args: ["sleep inf"] EOF
$ cat <<EOF| oc create -n default -f - apiVersion: v1 kind: Pod metadata: name: client spec: containers: - name: client image: registry.access.redhat.com/rhel7/rhel-tools command: ["/bin/sh", "-c"] args: ["sleep inf"] EOF
Copy to Clipboard Copied! Create two pods in the
verify-audit-logging
namespace:for name in client server; do
$ for name in client server; do cat <<EOF| oc create -n verify-audit-logging -f - apiVersion: v1 kind: Pod metadata: name: ${name} spec: containers: - name: ${name} image: registry.access.redhat.com/rhel7/rhel-tools command: ["/bin/sh", "-c"] args: ["sleep inf"] EOF done
Copy to Clipboard Copied! Example output
pod/client created pod/server created
pod/client created pod/server created
Copy to Clipboard Copied! To generate traffic and produce network policy audit log entries, complete the following steps:
Obtain the IP address for pod named
server
in theverify-audit-logging
namespace:POD_IP=$(oc get pods server -n verify-audit-logging -o jsonpath='{.status.podIP}')
$ POD_IP=$(oc get pods server -n verify-audit-logging -o jsonpath='{.status.podIP}')
Copy to Clipboard Copied! Ping the IP address from the previous command from the pod named
client
in thedefault
namespace and confirm that all packets are dropped:oc exec -it client -n default -- /bin/ping -c 2 $POD_IP
$ oc exec -it client -n default -- /bin/ping -c 2 $POD_IP
Copy to Clipboard Copied! Example output
PING 10.128.2.55 (10.128.2.55) 56(84) bytes of data. --- 10.128.2.55 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 2041ms
PING 10.128.2.55 (10.128.2.55) 56(84) bytes of data. --- 10.128.2.55 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 2041ms
Copy to Clipboard Copied! Ping the IP address saved in the
POD_IP
shell environment variable from the pod namedclient
in theverify-audit-logging
namespace and confirm that all packets are allowed:oc exec -it client -n verify-audit-logging -- /bin/ping -c 2 $POD_IP
$ oc exec -it client -n verify-audit-logging -- /bin/ping -c 2 $POD_IP
Copy to Clipboard Copied! Example output
PING 10.128.0.86 (10.128.0.86) 56(84) bytes of data. 64 bytes from 10.128.0.86: icmp_seq=1 ttl=64 time=2.21 ms 64 bytes from 10.128.0.86: icmp_seq=2 ttl=64 time=0.440 ms --- 10.128.0.86 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms rtt min/avg/max/mdev = 0.440/1.329/2.219/0.890 ms
PING 10.128.0.86 (10.128.0.86) 56(84) bytes of data. 64 bytes from 10.128.0.86: icmp_seq=1 ttl=64 time=2.21 ms 64 bytes from 10.128.0.86: icmp_seq=2 ttl=64 time=0.440 ms --- 10.128.0.86 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1001ms rtt min/avg/max/mdev = 0.440/1.329/2.219/0.890 ms
Copy to Clipboard Copied!
Display the latest entries in the network policy audit log:
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do
$ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do oc exec -it $pod -n openshift-ovn-kubernetes -- tail -4 /var/log/ovn/acl-audit-log.log done
Copy to Clipboard Copied! Example output
2023-11-02T16:28:54.139Z|00004|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:28:55.187Z|00005|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:28:57.235Z|00006|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:49:57.909Z|00028|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:57.909Z|00029|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00030|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00031|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
2023-11-02T16:28:54.139Z|00004|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:28:55.187Z|00005|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:28:57.235Z|00006|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn 2023-11-02T16:49:57.909Z|00028|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:57.909Z|00029|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00030|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00031|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
Copy to Clipboard Copied!
26.10.4. Enabling egress firewall and network policy audit logging for a namespace
As a cluster administrator, you can enable audit logging for a namespace.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges.
Procedure
To enable audit logging for a namespace, enter the following command:
oc annotate namespace <namespace> \ k8s.ovn.org/acl-logging='{ "deny": "alert", "allow": "notice" }'
$ oc annotate namespace <namespace> \ k8s.ovn.org/acl-logging='{ "deny": "alert", "allow": "notice" }'
Copy to Clipboard Copied! where:
<namespace>
- Specifies the name of the namespace.
TipYou can alternatively apply the following YAML to enable audit logging:
kind: Namespace apiVersion: v1 metadata: name: <namespace> annotations: k8s.ovn.org/acl-logging: |- { "deny": "alert", "allow": "notice" }
kind: Namespace apiVersion: v1 metadata: name: <namespace> annotations: k8s.ovn.org/acl-logging: |- { "deny": "alert", "allow": "notice" }
Copy to Clipboard Copied! Example output
namespace/verify-audit-logging annotated
namespace/verify-audit-logging annotated
Copy to Clipboard Copied!
Verification
Display the latest entries in the audit log:
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do
$ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do oc exec -it $pod -n openshift-ovn-kubernetes -- tail -4 /var/log/ovn/acl-audit-log.log done
Copy to Clipboard Copied! Example output
2023-11-02T16:49:57.909Z|00028|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:57.909Z|00029|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00030|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00031|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
2023-11-02T16:49:57.909Z|00028|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:57.909Z|00029|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00030|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0 2023-11-02T16:49:58.932Z|00031|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
Copy to Clipboard Copied!
26.10.5. Disabling egress firewall and network policy audit logging for a namespace
As a cluster administrator, you can disable audit logging for a namespace.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges.
Procedure
To disable audit logging for a namespace, enter the following command:
oc annotate --overwrite namespace <namespace> k8s.ovn.org/acl-logging-
$ oc annotate --overwrite namespace <namespace> k8s.ovn.org/acl-logging-
Copy to Clipboard Copied! where:
<namespace>
- Specifies the name of the namespace.
TipYou can alternatively apply the following YAML to disable audit logging:
kind: Namespace apiVersion: v1 metadata: name: <namespace> annotations: k8s.ovn.org/acl-logging: null
kind: Namespace apiVersion: v1 metadata: name: <namespace> annotations: k8s.ovn.org/acl-logging: null
Copy to Clipboard Copied! Example output
namespace/verify-audit-logging annotated
namespace/verify-audit-logging annotated
Copy to Clipboard Copied!
26.11. Configuring IPsec encryption
By enabling IPsec, you can encrypt both internal pod-to-pod cluster traffic between nodes and external traffic between pods and IPsec endpoints external to your cluster. All pod-to-pod network traffic between nodes on the OVN-Kubernetes cluster network is encrypted with IPsec in Transport mode.
IPsec is disabled by default. You can enable IPsec either during or after installing the cluster. For information about cluster installation, see OpenShift Container Platform installation overview.
Upgrading your cluster to OpenShift Container Platform 4.15 when the libreswan
and NetworkManager-libreswan
packages have different OpenShift Container Platform versions causes two consecutive compute node reboot operations. For the first reboot, the Cluster Network Operator (CNO) applies the IPsec configuration to compute nodes. For the second reboot, the Machine Config Operator (MCO) applies the latest machine configs to the cluster.
To combine the CNO and MCO updates into a single node reboot, complete the following tasks:
-
Before upgrading your cluster, set the
paused
parameter totrue
in theMachineConfigPools
custom resource (CR) that groups compute nodes. -
After you upgrade your cluster, set the parameter to
false
.
For more information, see Performing a Control Plane Only update.
The following support limitations exist for IPsec on a OpenShift Container Platform cluster:
- On IBM Cloud®, IPsec supports only NAT-T. Encapsulating Security Payload (ESP) is not supported on this platform.
- If your cluster uses hosted control planes for Red Hat OpenShift Container Platform, IPsec is not supported for IPsec encryption of either pod-to-pod or traffic to external hosts.
- Using ESP hardware offloading on any network interface is not supported if one or more of those interfaces is attached to Open vSwitch (OVS). Enabling IPsec for your cluster triggers the use of IPsec with interfaces attached to OVS. By default, OpenShift Container Platform disables ESP hardware offloading on any interfaces attached to OVS.
- If you enabled IPsec for network interfaces that are not attached to OVS, a cluster administrator must manually disable ESP hardware offloading on each interface that is not attached to OVS.
-
IPsec is not supported on Red Hat Enterprise Linux (RHEL) compute nodes because of a
libreswan
incompatiblility issue between a host and anovn-ipsec
container that exist in each compute node. See (OCPBUGS-53316).
The following list outlines key tasks in the IPsec documentation:
- Enable and disable IPsec after cluster installation.
- Configure IPsec encryption for traffic between the cluster and external hosts.
- Verify that IPsec encrypts traffic between pods on different nodes.
26.11.1. Modes of operation
When using IPsec on your OpenShift Container Platform cluster, you can choose from the following operating modes:
Mode | Description | Default |
---|---|---|
| No traffic is encrypted. This is the cluster default. | Yes |
| Pod-to-pod traffic is encrypted as described in "Types of network traffic flows encrypted by pod-to-pod IPsec". Traffic to external nodes may be encrypted after you complete the required configuration steps for IPsec. | No |
| Traffic to external nodes may be encrypted after you complete the required configuration steps for IPsec. | No |
26.11.2. Prerequisites
For IPsec support for encrypting traffic to external hosts, ensure that the following prerequisites are met:
-
The OVN-Kubernetes network plugin must be configured in local gateway mode, where
ovnKubernetesConfig.gatewayConfig.routingViaHost=true
. The NMState Operator is installed. This Operator is required for specifying the IPsec configuration. For more information, see About the Kubernetes NMState Operator.
NoteThe NMState Operator is supported on Google Cloud Platform (GCP) only for configuring IPsec.
-
The Butane tool (
butane
) is installed. To install Butane, see Installing Butane.
These prerequisites are required to add certificates into the host NSS database and to configure IPsec to communicate with external hosts.
26.11.3. Network connectivity requirements when IPsec is enabled
You must configure the network connectivity between machines to allow OpenShift Container Platform cluster components to communicate. Each machine must be able to resolve the hostnames of all other machines in the cluster.
Protocol | Port | Description |
---|---|---|
UDP |
| IPsec IKE packets |
| IPsec NAT-T packets | |
ESP | N/A | IPsec Encapsulating Security Payload (ESP) |
26.11.4. IPsec encryption for pod-to-pod traffic
For IPsec encryption of pod-to-pod traffic, the following sections describe which specific pod-to-pod traffic is encrypted, what kind of encryption protocol is used, and how X.509 certificates are handled. These sections do not apply to IPsec encryption between the cluster and external hosts, which you must configure manually for your specific external network infrastructure.
26.11.4.1. Types of network traffic flows encrypted by pod-to-pod IPsec
With IPsec enabled, only the following network traffic flows between pods are encrypted:
- Traffic between pods on different nodes on the cluster network
- Traffic from a pod on the host network to a pod on the cluster network
The following traffic flows are not encrypted:
- Traffic between pods on the same node on the cluster network
- Traffic between pods on the host network
- Traffic from a pod on the cluster network to a pod on the host network
The encrypted and unencrypted flows are illustrated in the following diagram:

26.11.4.2. Encryption protocol and IPsec mode
The encrypt cipher used is AES-GCM-16-256
. The integrity check value (ICV) is 16
bytes. The key length is 256
bits.
The IPsec mode used is Transport mode, a mode that encrypts end-to-end communication by adding an Encapsulated Security Payload (ESP) header to the IP header of the original packet and encrypts the packet data. OpenShift Container Platform does not currently use or support IPsec Tunnel mode for pod-to-pod communication.
26.11.4.3. Security certificate generation and rotation
The Cluster Network Operator (CNO) generates a self-signed X.509 certificate authority (CA) that is used by IPsec for encryption. Certificate signing requests (CSRs) from each node are automatically fulfilled by the CNO.
The CA is valid for 10 years. The individual node certificates are valid for 5 years and are automatically rotated after 4 1/2 years elapse.
26.11.5. IPsec encryption for external traffic
OpenShift Container Platform supports IPsec encryption for traffic to external hosts with TLS certificates that you must supply.
26.11.5.1. Supported platforms
This feature is supported on the following platforms:
- Bare metal
- Google Cloud Platform (GCP)
- Red Hat OpenStack Platform (RHOSP)
- VMware vSphere
If you have Red Hat Enterprise Linux (RHEL) worker nodes, these do not support IPsec encryption for external traffic.
If your cluster uses hosted control planes for Red Hat OpenShift Container Platform, configuring IPsec for encrypting traffic to external hosts is not supported.
26.11.5.2. Limitations
Ensure that the following prohibitions are observed:
- IPv6 configuration is not currently supported by the NMState Operator when configuring IPsec for external traffic.
-
Certificate common names (CN) in the provided certificate bundle must not begin with the
ovs_
prefix, because this naming can conflict with pod-to-pod IPsec CN names in the Network Security Services (NSS) database of each node.
26.11.6. Enabling IPsec encryption
As a cluster administrator, you can enable pod-to-pod IPsec encryption, IPsec encryption between the cluster, and external IPsec endpoints.
You can configure IPsec in either of the following modes:
-
Full
: Encryption for pod-to-pod and external traffic -
External
: Encryption for external traffic
If you configure IPsec in Full
mode, you must also complete the "Configuring IPsec encryption for external traffic" procedure.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
You are logged in to the cluster as a user with
cluster-admin
privileges. -
You have reduced the size of your cluster MTU by
46
bytes to allow for the overhead of the IPsec ESP header.
Procedure
To enable IPsec encryption, enter the following command:
oc patch networks.operator.openshift.io cluster --type=merge -p \ '{
$ oc patch networks.operator.openshift.io cluster --type=merge -p \ '{ "spec":{ "defaultNetwork":{ "ovnKubernetesConfig":{ "ipsecConfig":{ "mode":"<mode">
1 }}}}}'
Copy to Clipboard Copied! - 1
- Specify
External
to encrypt traffic to external hosts or specifyFull
to encrypt pod-to-pod traffic and, optionally, traffic to external hosts. By default, IPsec is disabled.
- Encrypt external traffic with IPsec by completing the "Configuring IPsec encryption for external traffic" procedure.
Verification
To find the names of the OVN-Kubernetes data plane pods, enter the following command:
oc get pods -n openshift-ovn-kubernetes -l=app=ovnkube-node
$ oc get pods -n openshift-ovn-kubernetes -l=app=ovnkube-node
Copy to Clipboard Copied! Example output
ovnkube-node-5xqbf 8/8 Running 0 28m ovnkube-node-6mwcx 8/8 Running 0 29m ovnkube-node-ck5fr 8/8 Running 0 31m ovnkube-node-fr4ld 8/8 Running 0 26m ovnkube-node-wgs4l 8/8 Running 0 33m ovnkube-node-zfvcl 8/8 Running 0 34m ...
ovnkube-node-5xqbf 8/8 Running 0 28m ovnkube-node-6mwcx 8/8 Running 0 29m ovnkube-node-ck5fr 8/8 Running 0 31m ovnkube-node-fr4ld 8/8 Running 0 26m ovnkube-node-wgs4l 8/8 Running 0 33m ovnkube-node-zfvcl 8/8 Running 0 34m ...
Copy to Clipboard Copied! Verify that IPsec is enabled on your cluster by running the following command:
NoteAs a cluster administrator, you can verify that IPsec is enabled between pods on your cluster when IPsec is configured in
Full
mode. This step does not verify whether IPsec is working between your cluster and external hosts.oc -n openshift-ovn-kubernetes rsh ovnkube-node-<XXXXX> ovn-nbctl --no-leader-only get nb_global . ipsec
$ oc -n openshift-ovn-kubernetes rsh ovnkube-node-<XXXXX> ovn-nbctl --no-leader-only get nb_global . ipsec
1 Copy to Clipboard Copied! - 1
- Where
<XXXXX>
specifies the random sequence of letters for a pod from the previous step.
Example output
true
true
Copy to Clipboard Copied!
26.11.7. Configuring IPsec encryption for external traffic
As a cluster administrator, to encrypt external traffic with IPsec you must configure IPsec for your network infrastructure, including providing PKCS#12 certificates. Because this procedure uses Butane to create machine configs, you must have the butane
command installed.
After you apply the machine config, the Machine Config Operator reboots affected nodes in your cluster to rollout the new machine config.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
You have installed the
butane
utility on your local computer. - You have installed the NMState Operator on the cluster.
-
You are logged in to the cluster as a user with
cluster-admin
privileges. - You have an existing PKCS#12 certificate for the IPsec endpoint and a CA cert in PEM format.
-
You enabled IPsec in either
Full
orExternal
mode on your cluster. -
The OVN-Kubernetes network plugin must be configured in local gateway mode, where
ovnKubernetesConfig.gatewayConfig.routingViaHost=true
.
Procedure
Create an IPsec configuration with an NMState Operator node network configuration policy. For more information, see Libreswan as an IPsec VPN implementation.
To identify the IP address of the cluster node that is the IPsec endpoint, enter the following command:
oc get nodes
$ oc get nodes
Copy to Clipboard Copied! Create a file named
ipsec-config.yaml
that contains a node network configuration policy for the NMState Operator, such as in the following examples. For an overview aboutNodeNetworkConfigurationPolicy
objects, see The Kubernetes NMState project.Example NMState IPsec transport configuration
apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: ipsec-config spec: nodeSelector: kubernetes.io/hostname: "<hostname>" desiredState: interfaces: - name: <interface_name> type: ipsec libreswan: left: <cluster_node> leftid: '%fromcert' leftrsasigkey: '%cert' leftcert: left_server leftmodecfgclient: false right: <external_host> rightid: '%fromcert' rightrsasigkey: '%cert' rightsubnet: <external_address>/32 ikev2: insist type: transport
apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: ipsec-config spec: nodeSelector: kubernetes.io/hostname: "<hostname>"
1 desiredState: interfaces: - name: <interface_name>
2 type: ipsec libreswan: left: <cluster_node>
3 leftid: '%fromcert' leftrsasigkey: '%cert' leftcert: left_server leftmodecfgclient: false right: <external_host>
4 rightid: '%fromcert' rightrsasigkey: '%cert' rightsubnet: <external_address>/32
5 ikev2: insist type: transport
Copy to Clipboard Copied! - 1
- Specifies the host name to apply the policy to. This host serves as the left side host in the IPsec configuration.
- 2
- Specifies the name of the interface to create on the host.
- 3
- Specifies the host name of the cluster node that terminates the IPsec tunnel on the cluster side. The name should match SAN
[Subject Alternate Name]
from your supplied PKCS#12 certificates. - 4
- Specifies the external host name, such as
host.example.com
. The name should match the SAN[Subject Alternate Name]
from your supplied PKCS#12 certificates. - 5
- Specifies the IP address of the external host, such as
10.1.2.3/32
.
Example NMState IPsec tunnel configuration
apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: ipsec-config spec: nodeSelector: kubernetes.io/hostname: "<hostname>" desiredState: interfaces: - name: <interface_name> type: ipsec libreswan: left: <cluster_node> leftid: '%fromcert' leftmodecfgclient: false leftrsasigkey: '%cert' leftcert: left_server right: <external_host> rightid: '%fromcert' rightrsasigkey: '%cert' rightsubnet: <external_address>/32 ikev2: insist type: tunnel
apiVersion: nmstate.io/v1 kind: NodeNetworkConfigurationPolicy metadata: name: ipsec-config spec: nodeSelector: kubernetes.io/hostname: "<hostname>"
1 desiredState: interfaces: - name: <interface_name>
2 type: ipsec libreswan: left: <cluster_node>
3 leftid: '%fromcert' leftmodecfgclient: false leftrsasigkey: '%cert' leftcert: left_server right: <external_host>
4 rightid: '%fromcert' rightrsasigkey: '%cert' rightsubnet: <external_address>/32
5 ikev2: insist type: tunnel
Copy to Clipboard Copied! - 1
- Specifies the host name to apply the policy to. This host serves as the left side host in the IPsec configuration.
- 2
- Specifies the name of the interface to create on the host.
- 3
- Specifies the host name of the cluster node that terminates the IPsec tunnel on the cluster side. The name should match SAN
[Subject Alternate Name]
from your supplied PKCS#12 certificates. - 4
- Specifies the external host name, such as
host.example.com
. The name should match the SAN[Subject Alternate Name]
from your supplied PKCS#12 certificates. - 5
- Specifies the IP address of the external host, such as
10.1.2.3/32
.
To configure the IPsec interface, enter the following command:
oc create -f ipsec-config.yaml
$ oc create -f ipsec-config.yaml
Copy to Clipboard Copied!
Provide the following certificate files to add to the Network Security Services (NSS) database on each host. These files are imported as part of the Butane configuration in subsequent steps.
-
left_server.p12
: The certificate bundle for the IPsec endpoints -
ca.pem
: The certificate authority that you signed your certificates with
-
Create a machine config to add your certificates to the cluster:
To create Butane config files for the control plane and worker nodes, enter the following command:
NoteThe Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in
0
. For example,4.15.0
. See "Creating machine configs with Butane" for information about Butane.for role in master worker; do
$ for role in master worker; do cat >> "99-ipsec-${role}-endpoint-config.bu" <<-EOF variant: openshift version: 4.15.0 metadata: name: 99-${role}-import-certs labels: machineconfiguration.openshift.io/role: $role systemd: units: - name: ipsec-import.service enabled: true contents: | [Unit] Description=Import external certs into ipsec NSS Before=ipsec.service [Service] Type=oneshot ExecStart=/usr/local/bin/ipsec-addcert.sh RemainAfterExit=false StandardOutput=journal [Install] WantedBy=multi-user.target storage: files: - path: /etc/pki/certs/ca.pem mode: 0400 overwrite: true contents: local: ca.pem - path: /etc/pki/certs/left_server.p12 mode: 0400 overwrite: true contents: local: left_server.p12 - path: /usr/local/bin/ipsec-addcert.sh mode: 0740 overwrite: true contents: inline: | #!/bin/bash -e echo "importing cert to NSS" certutil -A -n "CA" -t "CT,C,C" -d /var/lib/ipsec/nss/ -i /etc/pki/certs/ca.pem pk12util -W "" -i /etc/pki/certs/left_server.p12 -d /var/lib/ipsec/nss/ certutil -M -n "left_server" -t "u,u,u" -d /var/lib/ipsec/nss/ EOF done
Copy to Clipboard Copied! To transform the Butane files that you created in the previous step into machine configs, enter the following command:
for role in master worker; do
$ for role in master worker; do butane -d . 99-ipsec-${role}-endpoint-config.bu -o ./99-ipsec-$role-endpoint-config.yaml done
Copy to Clipboard Copied!
To apply the machine configs to your cluster, enter the following command:
for role in master worker; do
$ for role in master worker; do oc apply -f 99-ipsec-${role}-endpoint-config.yaml done
Copy to Clipboard Copied! ImportantAs the Machine Config Operator (MCO) updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated before external IPsec connectivity is available.
Check the machine config pool status by entering the following command:
oc get mcp
$ oc get mcp
Copy to Clipboard Copied! A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.NoteBy default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
To confirm that IPsec machine configs rolled out successfully, enter the following commands:
Confirm that the IPsec machine configs were created:
oc get mc | grep ipsec
$ oc get mc | grep ipsec
Copy to Clipboard Copied! Example output
80-ipsec-master-extensions 3.2.0 6d15h 80-ipsec-worker-extensions 3.2.0 6d15h
80-ipsec-master-extensions 3.2.0 6d15h 80-ipsec-worker-extensions 3.2.0 6d15h
Copy to Clipboard Copied! Confirm that the that the IPsec extension are applied to control plane nodes:
oc get mcp master -o yaml | grep 80-ipsec-master-extensions -c
$ oc get mcp master -o yaml | grep 80-ipsec-master-extensions -c
Copy to Clipboard Copied! Expected output
2
2
Copy to Clipboard Copied! Confirm that the that the IPsec extension are applied to worker nodes:
oc get mcp worker -o yaml | grep 80-ipsec-worker-extensions -c
$ oc get mcp worker -o yaml | grep 80-ipsec-worker-extensions -c
Copy to Clipboard Copied! Expected output
2
2
Copy to Clipboard Copied!
26.11.8. Disabling IPsec encryption for an external IPsec endpoint
As a cluster administrator, you can remove an existing IPsec tunnel to an external host.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
You are logged in to the cluster as a user with
cluster-admin
privileges. -
You enabled IPsec in either
Full
orExternal
mode on your cluster.
Procedure
Create a file named
remove-ipsec-tunnel.yaml
with the following YAML:kind: NodeNetworkConfigurationPolicy apiVersion: nmstate.io/v1 metadata: name: <name> spec: nodeSelector: kubernetes.io/hostname: <node_name> desiredState: interfaces: - name: <tunnel_name> type: ipsec state: absent
kind: NodeNetworkConfigurationPolicy apiVersion: nmstate.io/v1 metadata: name: <name> spec: nodeSelector: kubernetes.io/hostname: <node_name> desiredState: interfaces: - name: <tunnel_name> type: ipsec state: absent
Copy to Clipboard Copied! where:
name
- Specifies a name for the node network configuration policy.
node_name
- Specifies the name of the node where the IPsec tunnel that you want to remove exists.
tunnel_name
- Specifies the interface name for the existing IPsec tunnel.
To remove the IPsec tunnel, enter the following command:
oc apply -f remove-ipsec-tunnel.yaml
$ oc apply -f remove-ipsec-tunnel.yaml
Copy to Clipboard Copied!
26.11.9. Disabling IPsec encryption
As a cluster administrator, you can disable IPsec encryption.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges.
Procedure
To disable IPsec encryption, enter the following command:
oc patch networks.operator.openshift.io cluster --type=merge \ -p '{
$ oc patch networks.operator.openshift.io cluster --type=merge \ -p '{ "spec":{ "defaultNetwork":{ "ovnKubernetesConfig":{ "ipsecConfig":{ "mode":"Disabled" }}}}}'
Copy to Clipboard Copied! -
Optional: You can increase the size of your cluster MTU by
46
bytes because there is no longer any overhead from the IPsec ESP header in IP packets.
26.11.10. Additional resources
- Configuring a VPN with IPsec in Red Hat Enterprise Linux (RHEL) 9
- Installing Butane
- About the OVN-Kubernetes Container Network Interface (CNI) network plugin
- Changing the MTU for the cluster network
- link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.15/html-single/operator_apis/#network-operator-openshift-io-v1[Network [operator.openshift.io/v1\] API
26.12. Configure an external gateway on the default network
As a cluster administrator, you can configure an external gateway on the default network.
This feature offers the following benefits:
- Granular control over egress traffic on a per-namespace basis
- Flexible configuration of static and dynamic external gateway IP addresses
- Support for both IPv4 and IPv6 address families
26.12.1. Prerequisites
- Your cluster uses the OVN-Kubernetes network plugin.
- Your infrastructure is configured to route traffic from the secondary external gateway.
26.12.2. How OpenShift Container Platform determines the external gateway IP address
You configure a secondary external gateway with the AdminPolicyBasedExternalRoute
custom resource (CR) from the k8s.ovn.org
API group. The CR supports static and dynamic approaches to specifying an external gateway’s IP address.
Each namespace that a AdminPolicyBasedExternalRoute
CR targets cannot be selected by any other AdminPolicyBasedExternalRoute
CR. A namespace cannot have concurrent secondary external gateways.
Changes to policies are isolated in the controller. If a policy fails to apply, changes to other policies do not trigger a retry of other policies. Policies are only re-evaluated, applying any differences that might have occurred by the change, when updates to the policy itself or related objects to the policy such as target namespaces, pod gateways, or namespaces hosting them from dynamic hops are made.
- Static assignment
- You specify an IP address directly.
- Dynamic assignment
You specify an IP address indirectly, with namespace and pod selectors, and an optional network attachment definition.
- If the name of a network attachment definition is provided, the external gateway IP address of the network attachment is used.
-
If the name of a network attachment definition is not provided, the external gateway IP address for the pod itself is used. However, this approach works only if the pod is configured with
hostNetwork
set totrue
.
26.12.3. AdminPolicyBasedExternalRoute object configuration
You can define an AdminPolicyBasedExternalRoute
object, which is cluster scoped, with the following properties. A namespace can be selected by only one AdminPolicyBasedExternalRoute
CR at a time.
Field | Type | Description |
---|---|---|
|
|
Specifies the name of the |
|
|
Specifies a namespace selector that the routing polices apply to. Only from: namespaceSelector: matchLabels: kubernetes.io/metadata.name: novxlan-externalgw-ecmp-4059
A namespace can only be targeted by one |
|
|
Specifies the destinations where the packets are forwarded to. Must be either or both of |
Field | Type | Description |
---|---|---|
|
| Specifies an array of static IP addresses. |
|
| Specifies an array of pod selectors corresponding to pods configured with a network attachment definition to use as the external gateway target. |
Field | Type | Description |
---|---|---|
|
| Specifies either an IPv4 or IPv6 address of the next destination hop. |
|
|
Optional: Specifies whether Bi-Directional Forwarding Detection (BFD) is supported by the network. The default value is |
Field | Type | Description |
---|---|---|
|
| Specifies a [set-based](https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement) label selector to filter the pods in the namespace that match this network configuration. |
|
|
Specifies a |
|
|
Optional: Specifies whether Bi-Directional Forwarding Detection (BFD) is supported by the network. The default value is |
|
| Optional: Specifies the name of a network attachment definition. The name must match the list of logical networks associated with the pod. If this field is not specified, the host network of the pod is used. However, the pod must be configure as a host network pod to use the host network. |
26.12.3.1. Example secondary external gateway configurations
In the following example, the AdminPolicyBasedExternalRoute
object configures two static IP addresses as external gateways for pods in namespaces with the kubernetes.io/metadata.name: novxlan-externalgw-ecmp-4059
label.
apiVersion: k8s.ovn.org/v1 kind: AdminPolicyBasedExternalRoute metadata: name: default-route-policy spec: from: namespaceSelector: matchLabels: kubernetes.io/metadata.name: novxlan-externalgw-ecmp-4059 nextHops: static: - ip: "172.18.0.8" - ip: "172.18.0.9"
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
name: default-route-policy
spec:
from:
namespaceSelector:
matchLabels:
kubernetes.io/metadata.name: novxlan-externalgw-ecmp-4059
nextHops:
static:
- ip: "172.18.0.8"
- ip: "172.18.0.9"
In the following example, the AdminPolicyBasedExternalRoute
object configures a dynamic external gateway. The IP addresses used for the external gateway are derived from the additional network attachments associated with each of the selected pods.
apiVersion: k8s.ovn.org/v1 kind: AdminPolicyBasedExternalRoute metadata: name: shadow-traffic-policy spec: from: namespaceSelector: matchLabels: externalTraffic: "" nextHops: dynamic: - podSelector: matchLabels: gatewayPod: "" namespaceSelector: matchLabels: shadowTraffic: "" networkAttachmentName: shadow-gateway - podSelector: matchLabels: gigabyteGW: "" namespaceSelector: matchLabels: gatewayNamespace: "" networkAttachmentName: gateway
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
name: shadow-traffic-policy
spec:
from:
namespaceSelector:
matchLabels:
externalTraffic: ""
nextHops:
dynamic:
- podSelector:
matchLabels:
gatewayPod: ""
namespaceSelector:
matchLabels:
shadowTraffic: ""
networkAttachmentName: shadow-gateway
- podSelector:
matchLabels:
gigabyteGW: ""
namespaceSelector:
matchLabels:
gatewayNamespace: ""
networkAttachmentName: gateway
In the following example, the AdminPolicyBasedExternalRoute
object configures both static and dynamic external gateways.
apiVersion: k8s.ovn.org/v1 kind: AdminPolicyBasedExternalRoute metadata: name: multi-hop-policy spec: from: namespaceSelector: matchLabels: trafficType: "egress" nextHops: static: - ip: "172.18.0.8" - ip: "172.18.0.9" dynamic: - podSelector: matchLabels: gatewayPod: "" namespaceSelector: matchLabels: egressTraffic: "" networkAttachmentName: gigabyte
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
name: multi-hop-policy
spec:
from:
namespaceSelector:
matchLabels:
trafficType: "egress"
nextHops:
static:
- ip: "172.18.0.8"
- ip: "172.18.0.9"
dynamic:
- podSelector:
matchLabels:
gatewayPod: ""
namespaceSelector:
matchLabels:
egressTraffic: ""
networkAttachmentName: gigabyte
26.12.4. Configure a secondary external gateway
You can configure an external gateway on the default network for a namespace in your cluster.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
-
Create a YAML file that contains an
AdminPolicyBasedExternalRoute
object. To create an admin policy based external route, enter the following command:
oc create -f <file>.yaml
$ oc create -f <file>.yaml
Copy to Clipboard Copied! where:
<file>
- Specifies the name of the YAML file that you created in the previous step.
Example output
adminpolicybasedexternalroute.k8s.ovn.org/default-route-policy created
adminpolicybasedexternalroute.k8s.ovn.org/default-route-policy created
Copy to Clipboard Copied! To confirm that the admin policy based external route was created, enter the following command:
oc describe apbexternalroute <name> | tail -n 6
$ oc describe apbexternalroute <name> | tail -n 6
Copy to Clipboard Copied! where:
<name>
-
Specifies the name of the
AdminPolicyBasedExternalRoute
object.
Example output
Status: Last Transition Time: 2023-04-24T15:09:01Z Messages: Configured external gateway IPs: 172.18.0.8 Status: Success Events: <none>
Status: Last Transition Time: 2023-04-24T15:09:01Z Messages: Configured external gateway IPs: 172.18.0.8 Status: Success Events: <none>
Copy to Clipboard Copied!
26.12.5. Additional resources
- For more information about additional network attachments, see Understanding multiple networks
26.13. Configuring an egress firewall for a project
As a cluster administrator, you can create an egress firewall for a project that restricts egress traffic leaving your OpenShift Container Platform cluster.
26.13.1. How an egress firewall works in a project
As a cluster administrator, you can use an egress firewall to limit the external hosts that some or all pods can access from within the cluster. An egress firewall supports the following scenarios:
- A pod can only connect to internal hosts and cannot initiate connections to the public internet.
- A pod can only connect to the public internet and cannot initiate connections to internal hosts that are outside the OpenShift Container Platform cluster.
- A pod cannot reach specified internal subnets or hosts outside the OpenShift Container Platform cluster.
- A pod can connect to only specific external hosts.
For example, you can allow one project access to a specified IP range but deny the same access to a different project. Or you can restrict application developers from updating from Python pip mirrors, and force updates to come only from approved sources.
Egress firewall does not apply to the host network namespace. Pods with host networking enabled are unaffected by egress firewall rules.
You configure an egress firewall policy by creating an EgressFirewall custom resource (CR) object. The egress firewall matches network traffic that meets any of the following criteria:
- An IP address range in CIDR format
- A DNS name that resolves to an IP address
- A port number
- A protocol that is one of the following protocols: TCP, UDP, and SCTP
If your egress firewall includes a deny rule for 0.0.0.0/0
, access to your OpenShift Container Platform API servers is blocked. You must either add allow rules for each IP address or use the nodeSelector
type allow rule in your egress policy rules to connect to API servers.
The following example illustrates the order of the egress firewall rules necessary to ensure API server access:
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: default namespace: <namespace> spec: egress: - to: cidrSelector: <api_server_address_range> type: Allow # ... - to: cidrSelector: 0.0.0.0/0 type: Deny
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
name: default
namespace: <namespace>
spec:
egress:
- to:
cidrSelector: <api_server_address_range>
type: Allow
# ...
- to:
cidrSelector: 0.0.0.0/0
type: Deny
To find the IP address for your API servers, run oc get ep kubernetes -n default
.
For more information, see BZ#1988324.
Egress firewall rules do not apply to traffic that goes through routers. Any user with permission to create a Route CR object can bypass egress firewall policy rules by creating a route that points to a forbidden destination.
26.13.1.1. Limitations of an egress firewall
An egress firewall has the following limitations:
- No project can have more than one EgressFirewall object.
- A maximum of one EgressFirewall object with a maximum of 8,000 rules can be defined per project.
- If you are using the OVN-Kubernetes network plugin with shared gateway mode in Red Hat OpenShift Networking, return ingress replies are affected by egress firewall rules. If the egress firewall rules drop the ingress reply destination IP, the traffic is dropped.
Violating any of these restrictions results in a broken egress firewall for the project. Consequently, all external network traffic is dropped, which can cause security risks for your organization.
An Egress Firewall resource can be created in the kube-node-lease
, kube-public
, kube-system
, openshift
and openshift-
projects.
26.13.1.2. Matching order for egress firewall policy rules
The egress firewall policy rules are evaluated in the order that they are defined, from first to last. The first rule that matches an egress connection from a pod applies. Any subsequent rules are ignored for that connection.
26.13.1.3. How Domain Name Server (DNS) resolution works
If you use DNS names in any of your egress firewall policy rules, proper resolution of the domain names is subject to the following restrictions:
- Domain name updates are polled based on a time-to-live (TTL) duration. By default, the duration is 30 minutes. When the egress firewall controller queries the local name servers for a domain name, if the response includes a TTL and the TTL is less than 30 minutes, the controller sets the duration for that DNS name to the returned value. Each DNS name is queried after the TTL for the DNS record expires.
- The pod must resolve the domain from the same local name servers when necessary. Otherwise the IP addresses for the domain known by the egress firewall controller and the pod can be different. If the IP addresses for a hostname differ, the egress firewall might not be enforced consistently.
- Because the egress firewall controller and pods asynchronously poll the same local name server, the pod might obtain the updated IP address before the egress controller does, which causes a race condition. Due to this current limitation, domain name usage in EgressFirewall objects is only recommended for domains with infrequent IP address changes.
Using DNS names in your egress firewall policy does not affect local DNS resolution through CoreDNS.
However, if your egress firewall policy uses domain names, and an external DNS server handles DNS resolution for an affected pod, you must include egress firewall rules that permit access to the IP addresses of your DNS server.
26.13.2. EgressFirewall custom resource (CR) object
You can define one or more rules for an egress firewall. A rule is either an Allow
rule or a Deny
rule, with a specification for the traffic that the rule applies to.
The following YAML describes an EgressFirewall CR object:
EgressFirewall object
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: <name> spec: egress: ...
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
name: <name>
spec:
egress:
...
26.13.2.1. EgressFirewall rules
The following YAML describes an egress firewall rule object. The user can select either an IP address range in CIDR format, a domain name, or use the nodeSelector
to allow or deny egress traffic. The egress
stanza expects an array of one or more objects.
Egress policy rule stanza
egress: - type: <type> to: cidrSelector: <cidr> dnsName: <dns_name> nodeSelector: <label_name>: <label_value> ports: ...
egress:
- type: <type>
to:
cidrSelector: <cidr>
dnsName: <dns_name>
nodeSelector: <label_name>: <label_value>
ports:
...
- 1
- The type of rule. The value must be either
Allow
orDeny
. - 2
- A stanza describing an egress traffic match rule that specifies the
cidrSelector
field or thednsName
field. You cannot use both fields in the same rule. - 3
- An IP address range in CIDR format.
- 4
- A DNS domain name.
- 5
- Labels are key/value pairs that the user defines. Labels are attached to objects, such as pods. The
nodeSelector
allows for one or more node labels to be selected and attached to pods. - 6
- Optional: A stanza describing a collection of network ports and protocols for the rule.
Ports stanza
ports: - port: <port> protocol: <protocol>
ports:
- port: <port>
protocol: <protocol>
26.13.2.2. Example EgressFirewall CR objects
The following example defines several egress firewall policy rules:
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: default spec: egress: - type: Allow to: cidrSelector: 1.2.3.0/24 - type: Deny to: cidrSelector: 0.0.0.0/0
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
name: default
spec:
egress:
- type: Allow
to:
cidrSelector: 1.2.3.0/24
- type: Deny
to:
cidrSelector: 0.0.0.0/0
- 1
- A collection of egress firewall policy rule objects.
The following example defines a policy rule that denies traffic to the host at the 172.16.1.1/32
IP address, if the traffic is using either the TCP protocol and destination port 80
or any protocol and destination port 443
.
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: default spec: egress: - type: Deny to: cidrSelector: 172.16.1.1/32 ports: - port: 80 protocol: TCP - port: 443
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
name: default
spec:
egress:
- type: Deny
to:
cidrSelector: 172.16.1.1/32
ports:
- port: 80
protocol: TCP
- port: 443
26.13.2.3. Example nodeSelector for EgressFirewall
As a cluster administrator, you can allow or deny egress traffic to nodes in your cluster by specifying a label using nodeSelector
. Labels can be applied to one or more nodes. The following is an example with the region=east
label:
apiVersion: k8s.ovn.org/v1 kind: EgressFirewall metadata: name: default spec: egress: - to: nodeSelector: matchLabels: region: east type: Allow
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
name: default
spec:
egress:
- to:
nodeSelector:
matchLabels:
region: east
type: Allow
Instead of adding manual rules per node IP address, use node selectors to create a label that allows pods behind an egress firewall to access host network pods.
26.13.3. Creating an egress firewall policy object
As a cluster administrator, you can create an egress firewall policy object for a project.
If the project already has an EgressFirewall object defined, you must edit the existing policy to make changes to the egress firewall rules.
Prerequisites
- A cluster that uses the OVN-Kubernetes network plugin.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster as a cluster administrator.
Procedure
Create a policy rule:
-
Create a
<policy_name>.yaml
file where<policy_name>
describes the egress policy rules. - In the file you created, define an egress policy object.
-
Create a
Enter the following command to create the policy object. Replace
<policy_name>
with the name of the policy and<project>
with the project that the rule applies to.oc create -f <policy_name>.yaml -n <project>
$ oc create -f <policy_name>.yaml -n <project>
Copy to Clipboard Copied! In the following example, a new EgressFirewall object is created in a project named
project1
:oc create -f default.yaml -n project1
$ oc create -f default.yaml -n project1
Copy to Clipboard Copied! Example output
egressfirewall.k8s.ovn.org/v1 created
egressfirewall.k8s.ovn.org/v1 created
Copy to Clipboard Copied! -
Optional: Save the
<policy_name>.yaml
file so that you can make changes later.
26.14. Viewing an egress firewall for a project
As a cluster administrator, you can list the names of any existing egress firewalls and view the traffic rules for a specific egress firewall.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
26.14.1. Viewing an EgressFirewall object
You can view an EgressFirewall object in your cluster.
Prerequisites
- A cluster using the OVN-Kubernetes network plugin.
-
Install the OpenShift Command-line Interface (CLI), commonly known as
oc
. - You must log in to the cluster.
Procedure
Optional: To view the names of the EgressFirewall objects defined in your cluster, enter the following command:
oc get egressfirewall --all-namespaces
$ oc get egressfirewall --all-namespaces
Copy to Clipboard Copied! To inspect a policy, enter the following command. Replace
<policy_name>
with the name of the policy to inspect.oc describe egressfirewall <policy_name>
$ oc describe egressfirewall <policy_name>
Copy to Clipboard Copied! Example output
Name: default Namespace: project1 Created: 20 minutes ago Labels: <none> Annotations: <none> Rule: Allow to 1.2.3.0/24 Rule: Allow to www.example.com Rule: Deny to 0.0.0.0/0
Name: default Namespace: project1 Created: 20 minutes ago Labels: <none> Annotations: <none> Rule: Allow to 1.2.3.0/24 Rule: Allow to www.example.com Rule: Deny to 0.0.0.0/0
Copy to Clipboard Copied!
26.15. Editing an egress firewall for a project
As a cluster administrator, you can modify network traffic rules for an existing egress firewall.
26.15.1. Editing an EgressFirewall object
As a cluster administrator, you can update the egress firewall for a project.
Prerequisites
- A cluster using the OVN-Kubernetes network plugin.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster as a cluster administrator.
Procedure
Find the name of the EgressFirewall object for the project. Replace
<project>
with the name of the project.oc get -n <project> egressfirewall
$ oc get -n <project> egressfirewall
Copy to Clipboard Copied! Optional: If you did not save a copy of the EgressFirewall object when you created the egress network firewall, enter the following command to create a copy.
oc get -n <project> egressfirewall <name> -o yaml > <filename>.yaml
$ oc get -n <project> egressfirewall <name> -o yaml > <filename>.yaml
Copy to Clipboard Copied! Replace
<project>
with the name of the project. Replace<name>
with the name of the object. Replace<filename>
with the name of the file to save the YAML to.After making changes to the policy rules, enter the following command to replace the EgressFirewall object. Replace
<filename>
with the name of the file containing the updated EgressFirewall object.oc replace -f <filename>.yaml
$ oc replace -f <filename>.yaml
Copy to Clipboard Copied!
26.16. Removing an egress firewall from a project
As a cluster administrator, you can remove an egress firewall from a project to remove all restrictions on network traffic from the project that leaves the OpenShift Container Platform cluster.
26.16.1. Removing an EgressFirewall object
As a cluster administrator, you can remove an egress firewall from a project.
Prerequisites
- A cluster using the OVN-Kubernetes network plugin.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster as a cluster administrator.
Procedure
Find the name of the EgressFirewall object for the project. Replace
<project>
with the name of the project.oc get -n <project> egressfirewall
$ oc get -n <project> egressfirewall
Copy to Clipboard Copied! Enter the following command to delete the EgressFirewall object. Replace
<project>
with the name of the project and<name>
with the name of the object.oc delete -n <project> egressfirewall <name>
$ oc delete -n <project> egressfirewall <name>
Copy to Clipboard Copied!
26.17. Configuring an egress IP address
As a cluster administrator, you can configure the OVN-Kubernetes Container Network Interface (CNI) network plugin to assign one or more egress IP addresses to a namespace, or to specific pods in a namespace.
Configuring egress IPs is not supported for ROSA with HCP clusters at this time.
26.17.1. Egress IP address architectural design and implementation
The OpenShift Container Platform egress IP address functionality allows you to ensure that the traffic from one or more pods in one or more namespaces has a consistent source IP address for services outside the cluster network.
For example, you might have a pod that periodically queries a database that is hosted on a server outside of your cluster. To enforce access requirements for the server, a packet filtering device is configured to allow traffic only from specific IP addresses. To ensure that you can reliably allow access to the server from only that specific pod, you can configure a specific egress IP address for the pod that makes the requests to the server.
An egress IP address assigned to a namespace is different from an egress router, which is used to send traffic to specific destinations.
In some cluster configurations, application pods and ingress router pods run on the same node. If you configure an egress IP address for an application project in this scenario, the IP address is not used when you send a request to a route from the application project.
Egress IP addresses must not be configured in any Linux network configuration files, such as ifcfg-eth0
.
26.17.1.1. Platform support
Support for the egress IP address functionality on various platforms is summarized in the following table:
Platform | Supported |
---|---|
Bare metal | Yes |
VMware vSphere | Yes |
Red Hat OpenStack Platform (RHOSP) | Yes |
Amazon Web Services (AWS) | Yes |
Google Cloud Platform (GCP) | Yes |
Microsoft Azure | Yes |
IBM Z® and IBM® LinuxONE | Yes |
IBM Z® and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM | Yes |
IBM Power® | Yes |
Nutanix | Yes |
The assignment of egress IP addresses to control plane nodes with the EgressIP feature is not supported on a cluster provisioned on Amazon Web Services (AWS). (BZ#2039656).
26.17.1.2. Public cloud platform considerations
Typically, public cloud providers place a limit on egress IPs. This means that there is a constraint on the absolute number of assignable IP addresses per node for clusters provisioned on public cloud infrastructure. The maximum number of assignable IP addresses per node, or the IP capacity, can be described in the following formula:
IP capacity = public cloud default capacity - sum(current IP assignments)
IP capacity = public cloud default capacity - sum(current IP assignments)
While the Egress IPs capability manages the IP address capacity per node, it is important to plan for this constraint in your deployments. For example, if a public cloud provider limits IP address capacity to 10 IP addresses per node, and you have 8 nodes, the total number of assignable IP addresses is only 80. To achieve a higher IP address capacity, you would need to allocate additional nodes. For example, if you needed 150 assignable IP addresses, you would need to allocate 7 additional nodes.
To confirm the IP capacity and subnets for any node in your public cloud environment, you can enter the oc get node <node_name> -o yaml
command. The cloud.network.openshift.io/egress-ipconfig
annotation includes capacity and subnet information for the node.
The annotation value is an array with a single object with fields that provide the following information for the primary network interface:
-
interface
: Specifies the interface ID on AWS and Azure and the interface name on GCP. -
ifaddr
: Specifies the subnet mask for one or both IP address families. -
capacity
: Specifies the IP address capacity for the node. On AWS, the IP address capacity is provided per IP address family. On Azure and GCP, the IP address capacity includes both IPv4 and IPv6 addresses.
Automatic attachment and detachment of egress IP addresses for traffic between nodes are available. This allows for traffic from many pods in namespaces to have a consistent source IP address to locations outside of the cluster. This also supports OpenShift SDN and OVN-Kubernetes, which is the default networking plugin in Red Hat OpenShift Networking in OpenShift Container Platform 4.15.
The RHOSP egress IP address feature creates a Neutron reservation port called egressip-<IP address>
. Using the same RHOSP user as the one used for the OpenShift Container Platform cluster installation, you can assign a floating IP address to this reservation port to have a predictable SNAT address for egress traffic. When an egress IP address on an RHOSP network is moved from one node to another, because of a node failover, for example, the Neutron reservation port is removed and recreated. This means that the floating IP association is lost and you need to manually reassign the floating IP address to the new reservation port.
When an RHOSP cluster administrator assigns a floating IP to the reservation port, OpenShift Container Platform cannot delete the reservation port. The CloudPrivateIPConfig
object cannot perform delete and move operations until an RHOSP cluster administrator unassigns the floating IP from the reservation port.
The following examples illustrate the annotation from nodes on several public cloud providers. The annotations are indented for readability.
Example cloud.network.openshift.io/egress-ipconfig
annotation on AWS
cloud.network.openshift.io/egress-ipconfig: [ { "interface":"eni-078d267045138e436", "ifaddr":{"ipv4":"10.0.128.0/18"}, "capacity":{"ipv4":14,"ipv6":15} } ]
cloud.network.openshift.io/egress-ipconfig: [
{
"interface":"eni-078d267045138e436",
"ifaddr":{"ipv4":"10.0.128.0/18"},
"capacity":{"ipv4":14,"ipv6":15}
}
]
Example cloud.network.openshift.io/egress-ipconfig
annotation on GCP
cloud.network.openshift.io/egress-ipconfig: [ { "interface":"nic0", "ifaddr":{"ipv4":"10.0.128.0/18"}, "capacity":{"ip":14} } ]
cloud.network.openshift.io/egress-ipconfig: [
{
"interface":"nic0",
"ifaddr":{"ipv4":"10.0.128.0/18"},
"capacity":{"ip":14}
}
]
The following sections describe the IP address capacity for supported public cloud environments for use in your capacity calculation.
26.17.1.2.1. Amazon Web Services (AWS) IP address capacity limits
On AWS, constraints on IP address assignments depend on the instance type configured. For more information, see IP addresses per network interface per instance type
26.17.1.2.2. Google Cloud Platform (GCP) IP address capacity limits
On GCP, the networking model implements additional node IP addresses through IP address aliasing, rather than IP address assignments. However, IP address capacity maps directly to IP aliasing capacity.
The following capacity limits exist for IP aliasing assignment:
- Per node, the maximum number of IP aliases, both IPv4 and IPv6, is 100.
- Per VPC, the maximum number of IP aliases is unspecified, but OpenShift Container Platform scalability testing reveals the maximum to be approximately 15,000.
For more information, see Per instance quotas and Alias IP ranges overview.
26.17.1.2.3. Microsoft Azure IP address capacity limits
On Azure, the following capacity limits exist for IP address assignment:
- Per NIC, the maximum number of assignable IP addresses, for both IPv4 and IPv6, is 256.
- Per virtual network, the maximum number of assigned IP addresses cannot exceed 65,536.
For more information, see Networking limits.
26.17.1.3. Considerations for using an egress IP on additional network interfaces
In OpenShift Container Platform, egress IPs provide administrators a way to control network traffic. Egress IPs can be used with the br-ex
, or primary, network interface, which is a Linux bridge interface associated with Open vSwitch, or they can be used with additional network interfaces.
You can inspect your network interface type by running the following command:
ip -details link show
$ ip -details link show
The primary network interface is assigned a node IP address which also contains a subnet mask. Information for this node IP address can be retrieved from the Kubernetes node object for each node within your cluster by inspecting the k8s.ovn.org/node-primary-ifaddr
annotation. In an IPv4 cluster, this annotation is similar to the following example: "k8s.ovn.org/node-primary-ifaddr: {"ipv4":"192.168.111.23/24"}"
.
If the egress IP is not within the subnet of the primary network interface subnet, you can use an egress IP on another Linux network interface that is not of the primary network interface type. By doing so, OpenShift Container Platform administrators are provided with a greater level of control over networking aspects such as routing, addressing, segmentation, and security policies. This feature provides users with the option to route workload traffic over specific network interfaces for purposes such as traffic segmentation or meeting specialized requirements.
If the egress IP is not within the subnet of the primary network interface, then the selection of another network interface for egress traffic might occur if they are present on a node.
You can determine which other network interfaces might support egress IPs by inspecting the k8s.ovn.org/host-cidrs
Kubernetes node annotation. This annotation contains the addresses and subnet mask found for the primary network interface. It also contains additional network interface addresses and subnet mask information. These addresses and subnet masks are assigned to network interfaces that use the longest prefix match routing mechanism to determine which network interface supports the egress IP.
OVN-Kubernetes provides a mechanism to control and direct outbound network traffic from specific namespaces and pods. This ensures that it exits the cluster through a particular network interface and with a specific egress IP address.
Requirements for assigning an egress IP to a network interface that is not the primary network interface
For users who want an egress IP and traffic to be routed over a particular interface that is not the primary network interface, the following conditions must be met:
- OpenShift Container Platform is installed on a bare metal cluster. This feature is disabled within cloud or hypervisor environments.
- Your OpenShift Container Platform pods are not configured as host-networked.
- If a network interface is removed or if the IP address and subnet mask which allows the egress IP to be hosted on the interface is removed, then the egress IP is reconfigured. Consequently, it could be assigned to another node and interface.
IP forwarding must be enabled for the network interface. To enable IP forwarding, you can use the
oc edit network.operator
command and edit the object like the following example:# ... spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 defaultNetwork: ovnKubernetesConfig: gatewayConfig: ipForwarding: Global # ...
# ... spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 defaultNetwork: ovnKubernetesConfig: gatewayConfig: ipForwarding: Global # ...
Copy to Clipboard Copied!
26.17.1.4. Assignment of egress IPs to pods
To assign one or more egress IPs to a namespace or specific pods in a namespace, the following conditions must be satisfied:
-
At least one node in your cluster must have the
k8s.ovn.org/egress-assignable: ""
label. -
An
EgressIP
object exists that defines one or more egress IP addresses to use as the source IP address for traffic leaving the cluster from pods in a namespace.
If you create EgressIP
objects prior to labeling any nodes in your cluster for egress IP assignment, OpenShift Container Platform might assign every egress IP address to the first node with the k8s.ovn.org/egress-assignable: ""
label.
To ensure that egress IP addresses are widely distributed across nodes in the cluster, always apply the label to the nodes you intent to host the egress IP addresses before creating any EgressIP
objects.
26.17.1.5. Assignment of egress IPs to nodes
When creating an EgressIP
object, the following conditions apply to nodes that are labeled with the k8s.ovn.org/egress-assignable: ""
label:
- An egress IP address is never assigned to more than one node at a time.
- An egress IP address is equally balanced between available nodes that can host the egress IP address.
If the
spec.EgressIPs
array in anEgressIP
object specifies more than one IP address, the following conditions apply:- No node will ever host more than one of the specified IP addresses.
- Traffic is balanced roughly equally between the specified IP addresses for a given namespace.
- If a node becomes unavailable, any egress IP addresses assigned to it are automatically reassigned, subject to the previously described conditions.
When a pod matches the selector for multiple EgressIP
objects, there is no guarantee which of the egress IP addresses that are specified in the EgressIP
objects is assigned as the egress IP address for the pod.
Additionally, if an EgressIP
object specifies multiple egress IP addresses, there is no guarantee which of the egress IP addresses might be used. For example, if a pod matches a selector for an EgressIP
object with two egress IP addresses, 10.10.20.1
and 10.10.20.2
, either might be used for each TCP connection or UDP conversation.
26.17.1.6. Architectural diagram of an egress IP address configuration
The following diagram depicts an egress IP address configuration. The diagram describes four pods in two different namespaces running on three nodes in a cluster. The nodes are assigned IP addresses from the 192.168.126.0/18
CIDR block on the host network.
Both Node 1 and Node 3 are labeled with k8s.ovn.org/egress-assignable: ""
and thus available for the assignment of egress IP addresses.
The dashed lines in the diagram depict the traffic flow from pod1, pod2, and pod3 traveling through the pod network to egress the cluster from Node 1 and Node 3. When an external service receives traffic from any of the pods selected by the example EgressIP
object, the source IP address is either 192.168.126.10
or 192.168.126.102
. The traffic is balanced roughly equally between these two nodes.
The following resources from the diagram are illustrated in detail:
Namespace
objectsThe namespaces are defined in the following manifest:
Namespace objects
apiVersion: v1 kind: Namespace metadata: name: namespace1 labels: env: prod --- apiVersion: v1 kind: Namespace metadata: name: namespace2 labels: env: prod
apiVersion: v1 kind: Namespace metadata: name: namespace1 labels: env: prod --- apiVersion: v1 kind: Namespace metadata: name: namespace2 labels: env: prod
Copy to Clipboard Copied! EgressIP
objectThe following
EgressIP
object describes a configuration that selects all pods in any namespace with theenv
label set toprod
. The egress IP addresses for the selected pods are192.168.126.10
and192.168.126.102
.EgressIP
objectapiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egressips-prod spec: egressIPs: - 192.168.126.10 - 192.168.126.102 namespaceSelector: matchLabels: env: prod status: items: - node: node1 egressIP: 192.168.126.10 - node: node3 egressIP: 192.168.126.102
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egressips-prod spec: egressIPs: - 192.168.126.10 - 192.168.126.102 namespaceSelector: matchLabels: env: prod status: items: - node: node1 egressIP: 192.168.126.10 - node: node3 egressIP: 192.168.126.102
Copy to Clipboard Copied! For the configuration in the previous example, OpenShift Container Platform assigns both egress IP addresses to the available nodes. The
status
field reflects whether and where the egress IP addresses are assigned.
26.17.2. EgressIP object
The following YAML describes the API for the EgressIP
object. The scope of the object is cluster-wide; it is not created in a namespace.
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: <name> spec: egressIPs: - <ip_address> namespaceSelector: ... podSelector: ...
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
name: <name>
spec:
egressIPs:
- <ip_address>
namespaceSelector:
...
podSelector:
...
- 1
- The name for the
EgressIPs
object. - 2
- An array of one or more IP addresses.
- 3
- One or more selectors for the namespaces to associate the egress IP addresses with.
- 4
- Optional: One or more selectors for pods in the specified namespaces to associate egress IP addresses with. Applying these selectors allows for the selection of a subset of pods within a namespace.
The following YAML describes the stanza for the namespace selector:
Namespace selector stanza
namespaceSelector: matchLabels: <label_name>: <label_value>
namespaceSelector:
matchLabels:
<label_name>: <label_value>
- 1
- One or more matching rules for namespaces. If more than one match rule is provided, all matching namespaces are selected.
The following YAML describes the optional stanza for the pod selector:
Pod selector stanza
podSelector: matchLabels: <label_name>: <label_value>
podSelector:
matchLabels:
<label_name>: <label_value>
- 1
- Optional: One or more matching rules for pods in the namespaces that match the specified
namespaceSelector
rules. If specified, only pods that match are selected. Others pods in the namespace are not selected.
In the following example, the EgressIP
object associates the 192.168.126.11
and 192.168.126.102
egress IP addresses with pods that have the app
label set to web
and are in the namespaces that have the env
label set to prod
:
Example EgressIP
object
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egress-group1 spec: egressIPs: - 192.168.126.11 - 192.168.126.102 podSelector: matchLabels: app: web namespaceSelector: matchLabels: env: prod
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
name: egress-group1
spec:
egressIPs:
- 192.168.126.11
- 192.168.126.102
podSelector:
matchLabels:
app: web
namespaceSelector:
matchLabels:
env: prod
In the following example, the EgressIP
object associates the 192.168.127.30
and 192.168.127.40
egress IP addresses with any pods that do not have the environment
label set to development
:
Example EgressIP
object
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egress-group2 spec: egressIPs: - 192.168.127.30 - 192.168.127.40 namespaceSelector: matchExpressions: - key: environment operator: NotIn values: - development
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
name: egress-group2
spec:
egressIPs:
- 192.168.127.30
- 192.168.127.40
namespaceSelector:
matchExpressions:
- key: environment
operator: NotIn
values:
- development
26.17.3. The egressIPConfig object
As a feature of egress IP, the reachabilityTotalTimeoutSeconds
parameter configures the EgressIP node reachability check total timeout in seconds. If the EgressIP node cannot be reached within this timeout, the node is declared down.
You can set a value for the reachabilityTotalTimeoutSeconds
in the configuration file for the egressIPConfig
object. Setting a large value might cause the EgressIP implementation to react slowly to node changes. The implementation reacts slowly for EgressIP nodes that have an issue and are unreachable.
If you omit the reachabilityTotalTimeoutSeconds
parameter from the egressIPConfig
object, the platform chooses a reasonable default value, which is subject to change over time. The current default is 1
second. A value of 0
disables the reachability check for the EgressIP node.
The following egressIPConfig
object describes changing the reachabilityTotalTimeoutSeconds
from the default 1
second probes to 5
second probes:
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: clusterNetwork: - cidr: 10.128.0.0/14 hostPrefix: 23 defaultNetwork: ovnKubernetesConfig: egressIPConfig: reachabilityTotalTimeoutSeconds: 5 gatewayConfig: routingViaHost: false genevePort: 6081
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
clusterNetwork:
- cidr: 10.128.0.0/14
hostPrefix: 23
defaultNetwork:
ovnKubernetesConfig:
egressIPConfig:
reachabilityTotalTimeoutSeconds: 5
gatewayConfig:
routingViaHost: false
genevePort: 6081
- 1
- The
egressIPConfig
holds the configurations for the options of theEgressIP
object. By changing these configurations, you can extend theEgressIP
object. - 2
- The value for
reachabilityTotalTimeoutSeconds
accepts integer values from0
to60
. A value of0
disables the reachability check of the egressIP node. Setting a value from1
to60
corresponds to the timeout in seconds for a probe to send the reachability check to the node.
26.17.4. Labeling a node to host egress IP addresses
You can apply the k8s.ovn.org/egress-assignable=""
label to a node in your cluster so that OpenShift Container Platform can assign one or more egress IP addresses to the node.
Prerequisites
-
Install the OpenShift CLI (
oc
). - Log in to the cluster as a cluster administrator.
Procedure
To label a node so that it can host one or more egress IP addresses, enter the following command:
oc label nodes <node_name> k8s.ovn.org/egress-assignable=""
$ oc label nodes <node_name> k8s.ovn.org/egress-assignable=""
1 Copy to Clipboard Copied! - 1
- The name of the node to label.
TipYou can alternatively apply the following YAML to add the label to a node:
apiVersion: v1 kind: Node metadata: labels: k8s.ovn.org/egress-assignable: "" name: <node_name>
apiVersion: v1 kind: Node metadata: labels: k8s.ovn.org/egress-assignable: "" name: <node_name>
Copy to Clipboard Copied!
26.17.5. Next steps
26.18. Assigning an egress IP address
As a cluster administrator, you can assign an egress IP address for traffic leaving the cluster from a namespace or from specific pods in a namespace.
26.18.1. Assigning an egress IP address to a namespace
You can assign one or more egress IP addresses to a namespace or to specific pods in a namespace.
Prerequisites
-
Install the OpenShift CLI (
oc
). - Log in to the cluster as a cluster administrator.
- Configure at least one node to host an egress IP address.
Procedure
Create an
EgressIP
object:-
Create a
<egressips_name>.yaml
file where<egressips_name>
is the name of the object. In the file that you created, define an
EgressIP
object, as in the following example:apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egress-project1 spec: egressIPs: - 192.168.127.10 - 192.168.127.11 namespaceSelector: matchLabels: env: qa
apiVersion: k8s.ovn.org/v1 kind: EgressIP metadata: name: egress-project1 spec: egressIPs: - 192.168.127.10 - 192.168.127.11 namespaceSelector: matchLabels: env: qa
Copy to Clipboard Copied!
-
Create a
To create the object, enter the following command.
oc apply -f <egressips_name>.yaml
$ oc apply -f <egressips_name>.yaml
1 Copy to Clipboard Copied! - 1
- Replace
<egressips_name>
with the name of the object.
Example output
egressips.k8s.ovn.org/<egressips_name> created
egressips.k8s.ovn.org/<egressips_name> created
Copy to Clipboard Copied! -
Optional: Store the
<egressips_name>.yaml
file so that you can make changes later. Add labels to the namespace that requires egress IP addresses. To add a label to the namespace of an
EgressIP
object defined in step 1, run the following command:oc label ns <namespace> env=qa
$ oc label ns <namespace> env=qa
1 Copy to Clipboard Copied! - 1
- Replace
<namespace>
with the namespace that requires egress IP addresses.
Verification
To show all egress IPs that are in use in your cluster, enter the following command:
oc get egressip -o yaml
$ oc get egressip -o yaml
Copy to Clipboard Copied! NoteThe command
oc get egressip
only returns one egress IP address regardless of how many are configured. This is not a bug and is a limitation of Kubernetes. As a workaround, you can pass in the-o yaml
or-o json
flags to return all egress IPs addresses in use.Example output
... ...
# ... spec: egressIPs: - 192.168.127.10 - 192.168.127.11 # ...
Copy to Clipboard Copied!
26.19. Configuring an egress service
As a cluster administrator, you can configure egress traffic for pods behind a load balancer service by using an egress service.
Egress service is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can use the EgressService
custom resource (CR) to manage egress traffic in the following ways:
Assign a load balancer service IP address as the source IP address for egress traffic for pods behind the load balancer service.
Assigning the load balancer IP address as the source IP address in this context is useful to present a single point of egress and ingress. For example, in some scenarios, an external system communicating with an application behind a load balancer service can expect the source and destination IP address for the application to be the same.
NoteWhen you assign the load balancer service IP address to egress traffic for pods behind the service, OVN-Kubernetes restricts the ingress and egress point to a single node. This limits the load balancing of traffic that MetalLB typically provides.
Assign the egress traffic for pods behind a load balancer to a different network than the default node network.
This is useful to assign the egress traffic for applications behind a load balancer to a different network than the default network. Typically, the different network is implemented by using a VRF instance associated with a network interface.
26.19.1. Egress service custom resource
Define the configuration for an egress service in an EgressService
custom resource. The following YAML describes the fields for the configuration of an egress service:
apiVersion: k8s.ovn.org/v1 kind: EgressService metadata: name: <egress_service_name> namespace: <namespace> spec: sourceIPBy: <egress_traffic_ip> nodeSelector: matchLabels: node-role.kubernetes.io/<role>: "" network: <egress_traffic_network>
apiVersion: k8s.ovn.org/v1
kind: EgressService
metadata:
name: <egress_service_name>
namespace: <namespace>
spec:
sourceIPBy: <egress_traffic_ip>
nodeSelector:
matchLabels:
node-role.kubernetes.io/<role>: ""
network: <egress_traffic_network>
- 1
- Specify the name for the egress service. The name of the
EgressService
resource must match the name of the load-balancer service that you want to modify. - 2
- Specify the namespace for the egress service. The namespace for the
EgressService
must match the namespace of the load-balancer service that you want to modify. The egress service is namespace-scoped. - 3
- Specify the source IP address of egress traffic for pods behind a service. Valid values are
LoadBalancerIP
orNetwork
. Use theLoadBalancerIP
value to assign theLoadBalancer
service ingress IP address as the source IP address for egress traffic. SpecifyNetwork
to assign the network interface IP address as the source IP address for egress traffic. - 4
- Optional: If you use the
LoadBalancerIP
value for thesourceIPBy
specification, a single node handles theLoadBalancer
service traffic. Use thenodeSelector
field to limit which node can be assigned this task. When a node is selected to handle the service traffic, OVN-Kubernetes labels the node in the following format:egress-service.k8s.ovn.org/<svc-namespace>-<svc-name>: ""
. When thenodeSelector
field is not specified, any node can manage theLoadBalancer
service traffic. - 5
- Optional: Specify the routing table for egress traffic. If you do not include the
network
specification, the egress service uses the default host network.
Example egress service specification
apiVersion: k8s.ovn.org/v1 kind: EgressService metadata: name: test-egress-service namespace: test-namespace spec: sourceIPBy: "LoadBalancerIP" nodeSelector: matchLabels: vrf: "true" network: "2"
apiVersion: k8s.ovn.org/v1
kind: EgressService
metadata:
name: test-egress-service
namespace: test-namespace
spec:
sourceIPBy: "LoadBalancerIP"
nodeSelector:
matchLabels:
vrf: "true"
network: "2"
26.19.2. Deploying an egress service
You can deploy an egress service to manage egress traffic for pods behind a LoadBalancer
service.
The following example configures the egress traffic to have the same source IP address as the ingress IP address of the LoadBalancer
service.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. -
You configured MetalLB
BGPPeer
resources.
Procedure
Create an
IPAddressPool
CR with the desired IP for the service:Create a file, such as
ip-addr-pool.yaml
, with content like the following example:apiVersion: metallb.io/v1beta1 kind: IPAddressPool metadata: name: example-pool namespace: metallb-system spec: addresses: - 172.19.0.100/32
apiVersion: metallb.io/v1beta1 kind: IPAddressPool metadata: name: example-pool namespace: metallb-system spec: addresses: - 172.19.0.100/32
Copy to Clipboard Copied! Apply the configuration for the IP address pool by running the following command:
oc apply -f ip-addr-pool.yaml
$ oc apply -f ip-addr-pool.yaml
Copy to Clipboard Copied!
Create
Service
andEgressService
CRs:Create a file, such as
service-egress-service.yaml
, with content like the following example:apiVersion: v1 kind: Service metadata: name: example-service namespace: example-namespace annotations: metallb.universe.tf/address-pool: example-pool spec: selector: app: example ports: - name: http protocol: TCP port: 8080 targetPort: 8080 type: LoadBalancer --- apiVersion: k8s.ovn.org/v1 kind: EgressService metadata: name: example-service namespace: example-namespace spec: sourceIPBy: "LoadBalancerIP" nodeSelector: matchLabels: node-role.kubernetes.io/worker: ""
apiVersion: v1 kind: Service metadata: name: example-service namespace: example-namespace annotations: metallb.universe.tf/address-pool: example-pool
1 spec: selector: app: example ports: - name: http protocol: TCP port: 8080 targetPort: 8080 type: LoadBalancer --- apiVersion: k8s.ovn.org/v1 kind: EgressService metadata: name: example-service namespace: example-namespace spec: sourceIPBy: "LoadBalancerIP"
2 nodeSelector:
3 matchLabels: node-role.kubernetes.io/worker: ""
Copy to Clipboard Copied! - 1
- The
LoadBalancer
service uses the IP address assigned by MetalLB from theexample-pool
IP address pool. - 2
- This example uses the
LoadBalancerIP
value to assign the ingress IP address of theLoadBalancer
service as the source IP address of egress traffic. - 3
- When you specify the
LoadBalancerIP
value, a single node handles theLoadBalancer
service’s traffic. In this example, only nodes with theworker
label can be selected to handle the traffic. When a node is selected, OVN-Kubernetes labels the node in the following formategress-service.k8s.ovn.org/<svc-namespace>-<svc-name>: ""
.
NoteIf you use the
sourceIPBy: "LoadBalancerIP"
setting, you must specify the load-balancer node in theBGPAdvertisement
custom resource (CR).Apply the configuration for the service and egress service by running the following command:
oc apply -f service-egress-service.yaml
$ oc apply -f service-egress-service.yaml
Copy to Clipboard Copied!
Create a
BGPAdvertisement
CR to advertise the service:Create a file, such as
service-bgp-advertisement.yaml
, with content like the following example:apiVersion: metallb.io/v1beta1 kind: BGPAdvertisement metadata: name: example-bgp-adv namespace: metallb-system spec: ipAddressPools: - example-pool nodeSelector: - matchLabels: egress-service.k8s.ovn.org/example-namespace-example-service: ""
apiVersion: metallb.io/v1beta1 kind: BGPAdvertisement metadata: name: example-bgp-adv namespace: metallb-system spec: ipAddressPools: - example-pool nodeSelector: - matchLabels: egress-service.k8s.ovn.org/example-namespace-example-service: ""
1 Copy to Clipboard Copied! - 1
- In this example, the
EgressService
CR configures the source IP address for egress traffic to use the load-balancer service IP address. Therefore, you must specify the load-balancer node for return traffic to use the same return path for the traffic originating from the pod.
Verification
Verify that you can access the application endpoint of the pods running behind the MetalLB service by running the following command:
curl <external_ip_address>:<port_number>
$ curl <external_ip_address>:<port_number>
1 Copy to Clipboard Copied! - 1
- Update the external IP address and port number to suit your application endpoint.
-
If you assigned the
LoadBalancer
service’s ingress IP address as the source IP address for egress traffic, verify this configuration by using tools such astcpdump
to analyze packets received at the external client.
26.20. Considerations for the use of an egress router pod
26.20.1. About an egress router pod
The OpenShift Container Platform egress router pod redirects traffic to a specified remote server from a private source IP address that is not used for any other purpose. An egress router pod can send network traffic to servers that are set up to allow access only from specific IP addresses.
The egress router pod is not intended for every outgoing connection. Creating large numbers of egress router pods can exceed the limits of your network hardware. For example, creating an egress router pod for every project or application could exceed the number of local MAC addresses that the network interface can handle before reverting to filtering MAC addresses in software.
The egress router image is not compatible with Amazon AWS, Azure Cloud, or any other cloud platform that does not support layer 2 manipulations due to their incompatibility with macvlan traffic.
26.20.1.1. Egress router modes
In redirect mode, an egress router pod configures iptables
rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that need to use the reserved source IP address must be configured to access the service for the egress router rather than connecting directly to the destination IP. You can access the destination service and port from the application pod by using the curl
command. For example:
curl <router_service_IP> <port>
$ curl <router_service_IP> <port>
The egress router CNI plugin supports redirect mode only. This is a difference with the egress router implementation that you can deploy with OpenShift SDN. Unlike the egress router for OpenShift SDN, the egress router CNI plugin does not support HTTP proxy mode or DNS proxy mode.
26.20.1.2. Egress router pod implementation
The egress router implementation uses the egress router Container Network Interface (CNI) plugin. The plugin adds a secondary network interface to a pod.
An egress router is a pod that has two network interfaces. For example, the pod can have eth0
and net1
network interfaces. The eth0
interface is on the cluster network and the pod continues to use the interface for ordinary cluster-related network traffic. The net1
interface is on a secondary network and has an IP address and gateway for that network. Other pods in the OpenShift Container Platform cluster can access the egress router service and the service enables the pods to access external services. The egress router acts as a bridge between pods and an external system.
Traffic that leaves the egress router exits through a node, but the packets have the MAC address of the net1
interface from the egress router pod.
When you add an egress router custom resource, the Cluster Network Operator creates the following objects:
-
The network attachment definition for the
net1
secondary network interface of the pod. - A deployment for the egress router.
If you delete an egress router custom resource, the Operator deletes the two objects in the preceding list that are associated with the egress router.
26.20.1.3. Deployment considerations
An egress router pod adds an additional IP address and MAC address to the primary network interface of the node. As a result, you might need to configure your hypervisor or cloud provider to allow the additional address.
- Red Hat OpenStack Platform (RHOSP)
If you deploy OpenShift Container Platform on RHOSP, you must allow traffic from the IP and MAC addresses of the egress router pod on your OpenStack environment. If you do not allow the traffic, then communication will fail:
openstack port set --allowed-address \ ip_address=<ip_address>,mac_address=<mac_address> <neutron_port_uuid>
$ openstack port set --allowed-address \ ip_address=<ip_address>,mac_address=<mac_address> <neutron_port_uuid>
Copy to Clipboard Copied! - VMware vSphere
- If you are using VMware vSphere, see the VMware documentation for securing vSphere standard switches. View and change VMware vSphere default settings by selecting the host virtual switch from the vSphere Web Client.
Specifically, ensure that the following are enabled:
26.20.1.4. Failover configuration
To avoid downtime, the Cluster Network Operator deploys the egress router pod as a deployment resource. The deployment name is egress-router-cni-deployment
. The pod that corresponds to the deployment has a label of app=egress-router-cni
.
To create a new service for the deployment, use the oc expose deployment/egress-router-cni-deployment --port <port_number>
command or create a file like the following example:
apiVersion: v1 kind: Service metadata: name: app-egress spec: ports: - name: tcp-8080 protocol: TCP port: 8080 - name: tcp-8443 protocol: TCP port: 8443 - name: udp-80 protocol: UDP port: 80 type: ClusterIP selector: app: egress-router-cni
apiVersion: v1
kind: Service
metadata:
name: app-egress
spec:
ports:
- name: tcp-8080
protocol: TCP
port: 8080
- name: tcp-8443
protocol: TCP
port: 8443
- name: udp-80
protocol: UDP
port: 80
type: ClusterIP
selector:
app: egress-router-cni
26.21. Deploying an egress router pod in redirect mode
As a cluster administrator, you can deploy an egress router pod to redirect traffic to specified destination IP addresses from a reserved source IP address.
The egress router implementation uses the egress router Container Network Interface (CNI) plugin.
26.21.1. Egress router custom resource
Define the configuration for an egress router pod in an egress router custom resource. The following YAML describes the fields for the configuration of an egress router in redirect mode:
apiVersion: network.operator.openshift.io/v1 kind: EgressRouter metadata: name: <egress_router_name> namespace: <namespace> spec: addresses: [ { ip: "<egress_router>", gateway: "<egress_gateway>" } ] mode: Redirect redirect: { redirectRules: [ { destinationIP: "<egress_destination>", port: <egress_router_port>, targetPort: <target_port>, protocol: <network_protocol> }, ... ], fallbackIP: "<egress_destination>" }
apiVersion: network.operator.openshift.io/v1
kind: EgressRouter
metadata:
name: <egress_router_name>
namespace: <namespace>
spec:
addresses: [
{
ip: "<egress_router>",
gateway: "<egress_gateway>"
}
]
mode: Redirect
redirect: {
redirectRules: [
{
destinationIP: "<egress_destination>",
port: <egress_router_port>,
targetPort: <target_port>,
protocol: <network_protocol>
},
...
],
fallbackIP: "<egress_destination>"
}
- 1
- Optional: The
namespace
field specifies the namespace to create the egress router in. If you do not specify a value in the file or on the command line, thedefault
namespace is used. - 2
- The
addresses
field specifies the IP addresses to configure on the secondary network interface. - 3
- The
ip
field specifies the reserved source IP address and netmask from the physical network that the node is on to use with egress router pod. Use CIDR notation to specify the IP address and netmask. - 4
- The
gateway
field specifies the IP address of the network gateway. - 5
- Optional: The
redirectRules
field specifies a combination of egress destination IP address, egress router port, and protocol. Incoming connections to the egress router on the specified port and protocol are routed to the destination IP address. - 6
- Optional: The
targetPort
field specifies the network port on the destination IP address. If this field is not specified, traffic is routed to the same network port that it arrived on. - 7
- The
protocol
field supports TCP, UDP, or SCTP. - 8
- Optional: The
fallbackIP
field specifies a destination IP address. If you do not specify any redirect rules, the egress router sends all traffic to this fallback IP address. If you specify redirect rules, any connections to network ports that are not defined in the rules are sent by the egress router to this fallback IP address. If you do not specify this field, the egress router rejects connections to network ports that are not defined in the rules.
Example egress router specification
apiVersion: network.operator.openshift.io/v1 kind: EgressRouter metadata: name: egress-router-redirect spec: networkInterface: { macvlan: { mode: "Bridge" } } addresses: [ { ip: "192.168.12.99/24", gateway: "192.168.12.1" } ] mode: Redirect redirect: { redirectRules: [ { destinationIP: "10.0.0.99", port: 80, protocol: UDP }, { destinationIP: "203.0.113.26", port: 8080, targetPort: 80, protocol: TCP }, { destinationIP: "203.0.113.27", port: 8443, targetPort: 443, protocol: TCP } ] }
apiVersion: network.operator.openshift.io/v1
kind: EgressRouter
metadata:
name: egress-router-redirect
spec:
networkInterface: {
macvlan: {
mode: "Bridge"
}
}
addresses: [
{
ip: "192.168.12.99/24",
gateway: "192.168.12.1"
}
]
mode: Redirect
redirect: {
redirectRules: [
{
destinationIP: "10.0.0.99",
port: 80,
protocol: UDP
},
{
destinationIP: "203.0.113.26",
port: 8080,
targetPort: 80,
protocol: TCP
},
{
destinationIP: "203.0.113.27",
port: 8443,
targetPort: 443,
protocol: TCP
}
]
}
26.21.2. Deploying an egress router in redirect mode
You can deploy an egress router to redirect traffic from its own reserved source IP address to one or more destination IP addresses.
After you add an egress router, the client pods that need to use the reserved source IP address must be modified to connect to the egress router rather than connecting directly to the destination IP.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
- Create an egress router definition.
To ensure that other pods can find the IP address of the egress router pod, create a service that uses the egress router, as in the following example:
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: web-app protocol: TCP port: 8080 type: ClusterIP selector: app: egress-router-cni
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: web-app protocol: TCP port: 8080 type: ClusterIP selector: app: egress-router-cni
1 Copy to Clipboard Copied! - 1
- Specify the label for the egress router. The value shown is added by the Cluster Network Operator and is not configurable.
After you create the service, your pods can connect to the service. The egress router pod redirects traffic to the corresponding port on the destination IP address. The connections originate from the reserved source IP address.
Verification
To verify that the Cluster Network Operator started the egress router, complete the following procedure:
View the network attachment definition that the Operator created for the egress router:
oc get network-attachment-definition egress-router-cni-nad
$ oc get network-attachment-definition egress-router-cni-nad
Copy to Clipboard Copied! The name of the network attachment definition is not configurable.
Example output
NAME AGE egress-router-cni-nad 18m
NAME AGE egress-router-cni-nad 18m
Copy to Clipboard Copied! View the deployment for the egress router pod:
oc get deployment egress-router-cni-deployment
$ oc get deployment egress-router-cni-deployment
Copy to Clipboard Copied! The name of the deployment is not configurable.
Example output
NAME READY UP-TO-DATE AVAILABLE AGE egress-router-cni-deployment 1/1 1 1 18m
NAME READY UP-TO-DATE AVAILABLE AGE egress-router-cni-deployment 1/1 1 1 18m
Copy to Clipboard Copied! View the status of the egress router pod:
oc get pods -l app=egress-router-cni
$ oc get pods -l app=egress-router-cni
Copy to Clipboard Copied! Example output
NAME READY STATUS RESTARTS AGE egress-router-cni-deployment-575465c75c-qkq6m 1/1 Running 0 18m
NAME READY STATUS RESTARTS AGE egress-router-cni-deployment-575465c75c-qkq6m 1/1 Running 0 18m
Copy to Clipboard Copied! - View the logs and the routing table for the egress router pod.
Get the node name for the egress router pod:
POD_NODENAME=$(oc get pod -l app=egress-router-cni -o jsonpath="{.items[0].spec.nodeName}")
$ POD_NODENAME=$(oc get pod -l app=egress-router-cni -o jsonpath="{.items[0].spec.nodeName}")
Copy to Clipboard Copied! Enter into a debug session on the target node. This step instantiates a debug pod called
<node_name>-debug
:oc debug node/$POD_NODENAME
$ oc debug node/$POD_NODENAME
Copy to Clipboard Copied! Set
/host
as the root directory within the debug shell. The debug pod mounts the root file system of the host in/host
within the pod. By changing the root directory to/host
, you can run binaries from the executable paths of the host:chroot /host
# chroot /host
Copy to Clipboard Copied! From within the
chroot
environment console, display the egress router logs:cat /tmp/egress-router-log
# cat /tmp/egress-router-log
Copy to Clipboard Copied! Example output
2021-04-26T12:27:20Z [debug] Called CNI ADD 2021-04-26T12:27:20Z [debug] Gateway: 192.168.12.1 2021-04-26T12:27:20Z [debug] IP Source Addresses: [192.168.12.99/24] 2021-04-26T12:27:20Z [debug] IP Destinations: [80 UDP 10.0.0.99/30 8080 TCP 203.0.113.26/30 80 8443 TCP 203.0.113.27/30 443] 2021-04-26T12:27:20Z [debug] Created macvlan interface 2021-04-26T12:27:20Z [debug] Renamed macvlan to "net1" 2021-04-26T12:27:20Z [debug] Adding route to gateway 192.168.12.1 on macvlan interface 2021-04-26T12:27:20Z [debug] deleted default route {Ifindex: 3 Dst: <nil> Src: <nil> Gw: 10.128.10.1 Flags: [] Table: 254} 2021-04-26T12:27:20Z [debug] Added new default route with gateway 192.168.12.1 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p UDP --dport 80 -j DNAT --to-destination 10.0.0.99 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p TCP --dport 8080 -j DNAT --to-destination 203.0.113.26:80 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p TCP --dport 8443 -j DNAT --to-destination 203.0.113.27:443 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat -o net1 -j SNAT --to-source 192.168.12.99
2021-04-26T12:27:20Z [debug] Called CNI ADD 2021-04-26T12:27:20Z [debug] Gateway: 192.168.12.1 2021-04-26T12:27:20Z [debug] IP Source Addresses: [192.168.12.99/24] 2021-04-26T12:27:20Z [debug] IP Destinations: [80 UDP 10.0.0.99/30 8080 TCP 203.0.113.26/30 80 8443 TCP 203.0.113.27/30 443] 2021-04-26T12:27:20Z [debug] Created macvlan interface 2021-04-26T12:27:20Z [debug] Renamed macvlan to "net1" 2021-04-26T12:27:20Z [debug] Adding route to gateway 192.168.12.1 on macvlan interface 2021-04-26T12:27:20Z [debug] deleted default route {Ifindex: 3 Dst: <nil> Src: <nil> Gw: 10.128.10.1 Flags: [] Table: 254} 2021-04-26T12:27:20Z [debug] Added new default route with gateway 192.168.12.1 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p UDP --dport 80 -j DNAT --to-destination 10.0.0.99 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p TCP --dport 8080 -j DNAT --to-destination 203.0.113.26:80 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p TCP --dport 8443 -j DNAT --to-destination 203.0.113.27:443 2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat -o net1 -j SNAT --to-source 192.168.12.99
Copy to Clipboard Copied! The logging file location and logging level are not configurable when you start the egress router by creating an
EgressRouter
object as described in this procedure.From within the
chroot
environment console, get the container ID:crictl ps --name egress-router-cni-pod | awk '{print $1}'
# crictl ps --name egress-router-cni-pod | awk '{print $1}'
Copy to Clipboard Copied! Example output
CONTAINER bac9fae69ddb6
CONTAINER bac9fae69ddb6
Copy to Clipboard Copied! Determine the process ID of the container. In this example, the container ID is
bac9fae69ddb6
:crictl inspect -o yaml bac9fae69ddb6 | grep 'pid:' | awk '{print $2}'
# crictl inspect -o yaml bac9fae69ddb6 | grep 'pid:' | awk '{print $2}'
Copy to Clipboard Copied! Example output
68857
68857
Copy to Clipboard Copied! Enter the network namespace of the container:
nsenter -n -t 68857
# nsenter -n -t 68857
Copy to Clipboard Copied! Display the routing table:
ip route
# ip route
Copy to Clipboard Copied! In the following example output, the
net1
network interface is the default route. Traffic for the cluster network uses theeth0
network interface. Traffic for the192.168.12.0/24
network uses thenet1
network interface and originates from the reserved source IP address192.168.12.99
. The pod routes all other traffic to the gateway at IP address192.168.12.1
. Routing for the service network is not shown.Example output
default via 192.168.12.1 dev net1 10.128.10.0/23 dev eth0 proto kernel scope link src 10.128.10.18 192.168.12.0/24 dev net1 proto kernel scope link src 192.168.12.99 192.168.12.1 dev net1
default via 192.168.12.1 dev net1 10.128.10.0/23 dev eth0 proto kernel scope link src 10.128.10.18 192.168.12.0/24 dev net1 proto kernel scope link src 192.168.12.99 192.168.12.1 dev net1
Copy to Clipboard Copied!
26.22. Enabling multicast for a project
26.22.1. About multicast
With IP multicast, data is broadcast to many IP addresses simultaneously.
- At this time, multicast is best used for low-bandwidth coordination or service discovery and not a high-bandwidth solution.
-
By default, network policies affect all connections in a namespace. However, multicast is unaffected by network policies. If multicast is enabled in the same namespace as your network policies, it is always allowed, even if there is a
deny-all
network policy. Cluster administrators should consider the implications to the exemption of multicast from network policies before enabling it.
Multicast traffic between OpenShift Container Platform pods is disabled by default. If you are using the OVN-Kubernetes network plugin, you can enable multicast on a per-project basis.
26.22.2. Enabling multicast between pods
You can enable multicast between pods for your project.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
You must log in to the cluster with a user that has the
cluster-admin
role.
Procedure
Run the following command to enable multicast for a project. Replace
<namespace>
with the namespace for the project you want to enable multicast for.oc annotate namespace <namespace> \ k8s.ovn.org/multicast-enabled=true
$ oc annotate namespace <namespace> \ k8s.ovn.org/multicast-enabled=true
Copy to Clipboard Copied! TipYou can alternatively apply the following YAML to add the annotation:
apiVersion: v1 kind: Namespace metadata: name: <namespace> annotations: k8s.ovn.org/multicast-enabled: "true"
apiVersion: v1 kind: Namespace metadata: name: <namespace> annotations: k8s.ovn.org/multicast-enabled: "true"
Copy to Clipboard Copied!
Verification
To verify that multicast is enabled for a project, complete the following procedure:
Change your current project to the project that you enabled multicast for. Replace
<project>
with the project name.oc project <project>
$ oc project <project>
Copy to Clipboard Copied! Create a pod to act as a multicast receiver:
cat <<EOF| oc create -f - apiVersion: v1 kind: Pod metadata: name: mlistener labels: app: multicast-verify spec: containers: - name: mlistener image: registry.access.redhat.com/ubi9 command: ["/bin/sh", "-c"] args: ["dnf -y install socat hostname && sleep inf"] ports: - containerPort: 30102 name: mlistener protocol: UDP EOF
$ cat <<EOF| oc create -f - apiVersion: v1 kind: Pod metadata: name: mlistener labels: app: multicast-verify spec: containers: - name: mlistener image: registry.access.redhat.com/ubi9 command: ["/bin/sh", "-c"] args: ["dnf -y install socat hostname && sleep inf"] ports: - containerPort: 30102 name: mlistener protocol: UDP EOF
Copy to Clipboard Copied! Create a pod to act as a multicast sender:
cat <<EOF| oc create -f - apiVersion: v1 kind: Pod metadata: name: msender labels: app: multicast-verify spec: containers: - name: msender image: registry.access.redhat.com/ubi9 command: ["/bin/sh", "-c"] args: ["dnf -y install socat && sleep inf"] EOF
$ cat <<EOF| oc create -f - apiVersion: v1 kind: Pod metadata: name: msender labels: app: multicast-verify spec: containers: - name: msender image: registry.access.redhat.com/ubi9 command: ["/bin/sh", "-c"] args: ["dnf -y install socat && sleep inf"] EOF
Copy to Clipboard Copied! In a new terminal window or tab, start the multicast listener.
Get the IP address for the Pod:
POD_IP=$(oc get pods mlistener -o jsonpath='{.status.podIP}')
$ POD_IP=$(oc get pods mlistener -o jsonpath='{.status.podIP}')
Copy to Clipboard Copied! Start the multicast listener by entering the following command:
oc exec mlistener -i -t -- \ socat UDP4-RECVFROM:30102,ip-add-membership=224.1.0.1:$POD_IP,fork EXEC:hostname
$ oc exec mlistener -i -t -- \ socat UDP4-RECVFROM:30102,ip-add-membership=224.1.0.1:$POD_IP,fork EXEC:hostname
Copy to Clipboard Copied!
Start the multicast transmitter.
Get the pod network IP address range:
CIDR=$(oc get Network.config.openshift.io cluster \ -o jsonpath='{.status.clusterNetwork[0].cidr}')
$ CIDR=$(oc get Network.config.openshift.io cluster \ -o jsonpath='{.status.clusterNetwork[0].cidr}')
Copy to Clipboard Copied! To send a multicast message, enter the following command:
oc exec msender -i -t -- \ /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"
$ oc exec msender -i -t -- \ /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"
Copy to Clipboard Copied! If multicast is working, the previous command returns the following output:
mlistener
mlistener
Copy to Clipboard Copied!
26.23. Disabling multicast for a project
26.23.1. Disabling multicast between pods
You can disable multicast between pods for your project.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
You must log in to the cluster with a user that has the
cluster-admin
role.
Procedure
Disable multicast by running the following command:
oc annotate namespace <namespace> \ k8s.ovn.org/multicast-enabled-
$ oc annotate namespace <namespace> \
1 k8s.ovn.org/multicast-enabled-
Copy to Clipboard Copied! - 1
- The
namespace
for the project you want to disable multicast for.
TipYou can alternatively apply the following YAML to delete the annotation:
apiVersion: v1 kind: Namespace metadata: name: <namespace> annotations: k8s.ovn.org/multicast-enabled: null
apiVersion: v1 kind: Namespace metadata: name: <namespace> annotations: k8s.ovn.org/multicast-enabled: null
Copy to Clipboard Copied!
26.24. Tracking network flows
As a cluster administrator, you can collect information about pod network flows from your cluster to assist with the following areas:
- Monitor ingress and egress traffic on the pod network.
- Troubleshoot performance issues.
- Gather data for capacity planning and security audits.
When you enable the collection of the network flows, only the metadata about the traffic is collected. For example, packet data is not collected, but the protocol, source address, destination address, port numbers, number of bytes, and other packet-level information is collected.
The data is collected in one or more of the following record formats:
- NetFlow
- sFlow
- IPFIX
When you configure the Cluster Network Operator (CNO) with one or more collector IP addresses and port numbers, the Operator configures Open vSwitch (OVS) on each node to send the network flows records to each collector.
You can configure the Operator to send records to more than one type of network flow collector. For example, you can send records to NetFlow collectors and also send records to sFlow collectors.
When OVS sends data to the collectors, each type of collector receives identical records. For example, if you configure two NetFlow collectors, OVS on a node sends identical records to the two collectors. If you also configure two sFlow collectors, the two sFlow collectors receive identical records. However, each collector type has a unique record format.
Collecting the network flows data and sending the records to collectors affects performance. Nodes process packets at a slower rate. If the performance impact is too great, you can delete the destinations for collectors to disable collecting network flows data and restore performance.
Enabling network flow collectors might have an impact on the overall performance of the cluster network.
26.24.1. Network object configuration for tracking network flows
The fields for configuring network flows collectors in the Cluster Network Operator (CNO) are shown in the following table:
Field | Type | Description |
---|---|---|
|
|
The name of the CNO object. This name is always |
|
|
One or more of |
|
| A list of IP address and network port pairs for up to 10 collectors. |
|
| A list of IP address and network port pairs for up to 10 collectors. |
|
| A list of IP address and network port pairs for up to 10 collectors. |
After applying the following manifest to the CNO, the Operator configures Open vSwitch (OVS) on each node in the cluster to send network flows records to the NetFlow collector that is listening at 192.168.1.99:2056
.
Example configuration for tracking network flows
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: exportNetworkFlows: netFlow: collectors: - 192.168.1.99:2056
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
name: cluster
spec:
exportNetworkFlows:
netFlow:
collectors:
- 192.168.1.99:2056
26.24.2. Adding destinations for network flows collectors
As a cluster administrator, you can configure the Cluster Network Operator (CNO) to send network flows metadata about the pod network to a network flows collector.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. - You have a network flows collector and know the IP address and port that it listens on.
Procedure
Create a patch file that specifies the network flows collector type and the IP address and port information of the collectors:
spec: exportNetworkFlows: netFlow: collectors: - 192.168.1.99:2056
spec: exportNetworkFlows: netFlow: collectors: - 192.168.1.99:2056
Copy to Clipboard Copied! Configure the CNO with the network flows collectors:
oc patch network.operator cluster --type merge -p "$(cat <file_name>.yaml)"
$ oc patch network.operator cluster --type merge -p "$(cat <file_name>.yaml)"
Copy to Clipboard Copied! Example output
network.operator.openshift.io/cluster patched
network.operator.openshift.io/cluster patched
Copy to Clipboard Copied!
Verification
Verification is not typically necessary. You can run the following command to confirm that Open vSwitch (OVS) on each node is configured to send network flows records to one or more collectors.
View the Operator configuration to confirm that the
exportNetworkFlows
field is configured:oc get network.operator cluster -o jsonpath="{.spec.exportNetworkFlows}"
$ oc get network.operator cluster -o jsonpath="{.spec.exportNetworkFlows}"
Copy to Clipboard Copied! Example output
{"netFlow":{"collectors":["192.168.1.99:2056"]}}
{"netFlow":{"collectors":["192.168.1.99:2056"]}}
Copy to Clipboard Copied! View the network flows configuration in OVS from each node:
for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}');
$ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}'); do ; echo; echo $pod; oc -n openshift-ovn-kubernetes exec -c ovnkube-controller $pod \ -- bash -c 'for type in ipfix sflow netflow ; do ovs-vsctl find $type ; done'; done
Copy to Clipboard Copied! Example output
ovnkube-node-xrn4p _uuid : a4d2aaca-5023-4f3d-9400-7275f92611f9 active_timeout : 60 add_id_to_interface : false engine_id : [] engine_type : [] external_ids : {} targets : ["192.168.1.99:2056"] ovnkube-node-z4vq9 _uuid : 61d02fdb-9228-4993-8ff5-b27f01a29bd6 active_timeout : 60 add_id_to_interface : false engine_id : [] engine_type : [] external_ids : {} targets : ["192.168.1.99:2056"]- ...
ovnkube-node-xrn4p _uuid : a4d2aaca-5023-4f3d-9400-7275f92611f9 active_timeout : 60 add_id_to_interface : false engine_id : [] engine_type : [] external_ids : {} targets : ["192.168.1.99:2056"] ovnkube-node-z4vq9 _uuid : 61d02fdb-9228-4993-8ff5-b27f01a29bd6 active_timeout : 60 add_id_to_interface : false engine_id : [] engine_type : [] external_ids : {} targets : ["192.168.1.99:2056"]- ...
Copy to Clipboard Copied!
26.24.3. Deleting all destinations for network flows collectors
As a cluster administrator, you can configure the Cluster Network Operator (CNO) to stop sending network flows metadata to a network flows collector.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
Remove all network flows collectors:
oc patch network.operator cluster --type='json' \ -p='[{"op":"remove", "path":"/spec/exportNetworkFlows"}]'
$ oc patch network.operator cluster --type='json' \ -p='[{"op":"remove", "path":"/spec/exportNetworkFlows"}]'
Copy to Clipboard Copied! Example output
network.operator.openshift.io/cluster patched
network.operator.openshift.io/cluster patched
Copy to Clipboard Copied!
26.25. Configuring hybrid networking
As a cluster administrator, you can configure the Red Hat OpenShift Networking OVN-Kubernetes network plugin to allow Linux and Windows nodes to host Linux and Windows workloads, respectively.
26.25.1. Configuring hybrid networking with OVN-Kubernetes
You can configure your cluster to use hybrid networking with the OVN-Kubernetes network plugin. This allows a hybrid cluster that supports different node networking configurations.
This configuration is necessary to run both Linux and Windows nodes in the same cluster.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster as a user with
cluster-admin
privileges. - Ensure that the cluster uses the OVN-Kubernetes network plugin.
Procedure
To configure the OVN-Kubernetes hybrid network overlay, enter the following command:
oc patch networks.operator.openshift.io cluster --type=merge \ -p '{
$ oc patch networks.operator.openshift.io cluster --type=merge \ -p '{ "spec":{ "defaultNetwork":{ "ovnKubernetesConfig":{ "hybridOverlayConfig":{ "hybridClusterNetwork":[ { "cidr": "<cidr>", "hostPrefix": <prefix> } ], "hybridOverlayVXLANPort": <overlay_port> } } } } }'
Copy to Clipboard Copied! where:
cidr
- Specify the CIDR configuration used for nodes on the additional overlay network. This CIDR must not overlap with the cluster network CIDR.
hostPrefix
-
Specifies the subnet prefix length to assign to each individual node. For example, if
hostPrefix
is set to23
, then each node is assigned a/23
subnet out of the givencidr
, which allows for 510 (2^(32 - 23) - 2) pod IP addresses. If you are required to provide access to nodes from an external network, configure load balancers and routers to manage the traffic. hybridOverlayVXLANPort
-
Specify a custom VXLAN port for the additional overlay network. This is required for running Windows nodes in a cluster installed on vSphere, and must not be configured for any other cloud provider. The custom port can be any open port excluding the default
4789
port. For more information on this requirement, see the Microsoft documentation on Pod-to-pod connectivity between hosts is broken.
NoteWindows Server Long-Term Servicing Channel (LTSC): Windows Server 2019 is not supported on clusters with a custom
hybridOverlayVXLANPort
value because this Windows server version does not support selecting a custom VXLAN port.Example output
network.operator.openshift.io/cluster patched
network.operator.openshift.io/cluster patched
Copy to Clipboard Copied! To confirm that the configuration is active, enter the following command. It can take several minutes for the update to apply.
oc get network.operator.openshift.io -o jsonpath="{.items[0].spec.defaultNetwork.ovnKubernetesConfig}"
$ oc get network.operator.openshift.io -o jsonpath="{.items[0].spec.defaultNetwork.ovnKubernetesConfig}"
Copy to Clipboard Copied!
Chapter 27. OpenShift SDN network plugin
27.1. About the OpenShift SDN network plugin
Part of Red Hat OpenShift Networking, OpenShift SDN is a network plugin that uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Container Platform cluster. This pod network is established and maintained by OpenShift SDN, which configures an overlay network using Open vSwitch (OVS).
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
27.1.1. OpenShift SDN network isolation modes
OpenShift SDN provides three SDN modes for configuring the pod network:
-
Network policy mode allows project administrators to configure their own isolation policies using
NetworkPolicy
objects. Network policy is the default mode in OpenShift Container Platform 4.15. - Multitenant mode provides project-level isolation for pods and services. Pods from different projects cannot send packets to or receive packets from pods and services of a different project. You can disable isolation for a project, allowing it to send network traffic to all pods and services in the entire cluster and receive network traffic from those pods and services.
- Subnet mode provides a flat pod network where every pod can communicate with every other pod and service. The network policy mode provides the same functionality as subnet mode.
27.1.2. Supported network plugin feature matrix
Red Hat OpenShift Networking offers two options for the network plugin, OpenShift SDN and OVN-Kubernetes, for the network plugin. The following table summarizes the current feature support for both network plugins:
Feature | OpenShift SDN | OVN-Kubernetes |
---|---|---|
Egress IPs | Supported | Supported |
Egress firewall | Supported | Supported [1] |
Egress router | Supported | Supported [2] |
Hybrid networking | Not supported | Supported |
IPsec encryption for intra-cluster communication | Not supported | Supported |
IPv4 single-stack | Supported | Supported |
IPv6 single-stack | Not supported | Supported [3] |
IPv4/IPv6 dual-stack | Not Supported | Supported [4] |
IPv6/IPv4 dual-stack | Not supported | Supported [5] |
Kubernetes network policy | Supported | Supported |
Kubernetes network policy logs | Not supported | Supported |
Hardware offloading | Not supported | Supported |
Multicast | Supported | Supported |
- Egress firewall is also known as egress network policy in OpenShift SDN. This is not the same as network policy egress.
- Egress router for OVN-Kubernetes supports only redirect mode.
- IPv6 single-stack networking on a bare-metal platform.
- IPv4/IPv6 dual-stack networking on bare-metal, VMware vSphere (installer-provisioned infrastructure installations only), IBM Power®, IBM Z®, and RHOSP platforms.
- IPv6/IPv4 dual-stack networking on bare-metal, VMware vSphere (installer-provisioned infrastructure installations only), and IBM Power® platforms.
27.2. Configuring egress IPs for a project
As a cluster administrator, you can configure the OpenShift SDN Container Network Interface (CNI) network plugin to assign one or more egress IP addresses to a project.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
27.2.1. Egress IP address architectural design and implementation
The OpenShift Container Platform egress IP address functionality allows you to ensure that the traffic from one or more pods in one or more namespaces has a consistent source IP address for services outside the cluster network.
For example, you might have a pod that periodically queries a database that is hosted on a server outside of your cluster. To enforce access requirements for the server, a packet filtering device is configured to allow traffic only from specific IP addresses. To ensure that you can reliably allow access to the server from only that specific pod, you can configure a specific egress IP address for the pod that makes the requests to the server.
An egress IP address assigned to a namespace is different from an egress router, which is used to send traffic to specific destinations.
In some cluster configurations, application pods and ingress router pods run on the same node. If you configure an egress IP address for an application project in this scenario, the IP address is not used when you send a request to a route from the application project.
An egress IP address is implemented as an additional IP address on the primary network interface of a node and must be in the same subnet as the primary IP address of the node. The additional IP address must not be assigned to any other node in the cluster.
Egress IP addresses must not be configured in any Linux network configuration files, such as ifcfg-eth0
.
27.2.1.1. Platform support
Support for the egress IP address functionality on various platforms is summarized in the following table:
Platform | Supported |
---|---|
Bare metal | Yes |
VMware vSphere | Yes |
Red Hat OpenStack Platform (RHOSP) | Yes |
Amazon Web Services (AWS) | Yes |
Google Cloud Platform (GCP) | Yes |
Microsoft Azure | Yes |
IBM Z® and IBM® LinuxONE | Yes |
IBM Z® and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM | Yes |
IBM Power® | Yes |
Nutanix | Yes |
The assignment of egress IP addresses to control plane nodes with the EgressIP feature is not supported on a cluster provisioned on Amazon Web Services (AWS). (BZ#2039656).
27.2.1.2. Public cloud platform considerations
Typically, public cloud providers place a limit on egress IPs. This means that there is a constraint on the absolute number of assignable IP addresses per node for clusters provisioned on public cloud infrastructure. The maximum number of assignable IP addresses per node, or the IP capacity, can be described in the following formula:
IP capacity = public cloud default capacity - sum(current IP assignments)
IP capacity = public cloud default capacity - sum(current IP assignments)
While the Egress IPs capability manages the IP address capacity per node, it is important to plan for this constraint in your deployments. For example, if a public cloud provider limits IP address capacity to 10 IP addresses per node, and you have 8 nodes, the total number of assignable IP addresses is only 80. To achieve a higher IP address capacity, you would need to allocate additional nodes. For example, if you needed 150 assignable IP addresses, you would need to allocate 7 additional nodes.
To confirm the IP capacity and subnets for any node in your public cloud environment, you can enter the oc get node <node_name> -o yaml
command. The cloud.network.openshift.io/egress-ipconfig
annotation includes capacity and subnet information for the node.
The annotation value is an array with a single object with fields that provide the following information for the primary network interface:
-
interface
: Specifies the interface ID on AWS and Azure and the interface name on GCP. -
ifaddr
: Specifies the subnet mask for one or both IP address families. -
capacity
: Specifies the IP address capacity for the node. On AWS, the IP address capacity is provided per IP address family. On Azure and GCP, the IP address capacity includes both IPv4 and IPv6 addresses.
Automatic attachment and detachment of egress IP addresses for traffic between nodes are available. This allows for traffic from many pods in namespaces to have a consistent source IP address to locations outside of the cluster. This also supports OpenShift SDN and OVN-Kubernetes, which is the default networking plugin in Red Hat OpenShift Networking in OpenShift Container Platform 4.15.
The RHOSP egress IP address feature creates a Neutron reservation port called egressip-<IP address>
. Using the same RHOSP user as the one used for the OpenShift Container Platform cluster installation, you can assign a floating IP address to this reservation port to have a predictable SNAT address for egress traffic. When an egress IP address on an RHOSP network is moved from one node to another, because of a node failover, for example, the Neutron reservation port is removed and recreated. This means that the floating IP association is lost and you need to manually reassign the floating IP address to the new reservation port.
When an RHOSP cluster administrator assigns a floating IP to the reservation port, OpenShift Container Platform cannot delete the reservation port. The CloudPrivateIPConfig
object cannot perform delete and move operations until an RHOSP cluster administrator unassigns the floating IP from the reservation port.
The following examples illustrate the annotation from nodes on several public cloud providers. The annotations are indented for readability.
Example cloud.network.openshift.io/egress-ipconfig
annotation on AWS
cloud.network.openshift.io/egress-ipconfig: [ { "interface":"eni-078d267045138e436", "ifaddr":{"ipv4":"10.0.128.0/18"}, "capacity":{"ipv4":14,"ipv6":15} } ]
cloud.network.openshift.io/egress-ipconfig: [
{
"interface":"eni-078d267045138e436",
"ifaddr":{"ipv4":"10.0.128.0/18"},
"capacity":{"ipv4":14,"ipv6":15}
}
]
Example cloud.network.openshift.io/egress-ipconfig
annotation on GCP
cloud.network.openshift.io/egress-ipconfig: [ { "interface":"nic0", "ifaddr":{"ipv4":"10.0.128.0/18"}, "capacity":{"ip":14} } ]
cloud.network.openshift.io/egress-ipconfig: [
{
"interface":"nic0",
"ifaddr":{"ipv4":"10.0.128.0/18"},
"capacity":{"ip":14}
}
]
The following sections describe the IP address capacity for supported public cloud environments for use in your capacity calculation.
27.2.1.2.1. Amazon Web Services (AWS) IP address capacity limits
On AWS, constraints on IP address assignments depend on the instance type configured. For more information, see IP addresses per network interface per instance type
27.2.1.2.2. Google Cloud Platform (GCP) IP address capacity limits
On GCP, the networking model implements additional node IP addresses through IP address aliasing, rather than IP address assignments. However, IP address capacity maps directly to IP aliasing capacity.
The following capacity limits exist for IP aliasing assignment:
- Per node, the maximum number of IP aliases, both IPv4 and IPv6, is 100.
- Per VPC, the maximum number of IP aliases is unspecified, but OpenShift Container Platform scalability testing reveals the maximum to be approximately 15,000.
For more information, see Per instance quotas and Alias IP ranges overview.
27.2.1.2.3. Microsoft Azure IP address capacity limits
On Azure, the following capacity limits exist for IP address assignment:
- Per NIC, the maximum number of assignable IP addresses, for both IPv4 and IPv6, is 256.
- Per virtual network, the maximum number of assigned IP addresses cannot exceed 65,536.
For more information, see Networking limits.
27.2.1.3. Limitations
The following limitations apply when using egress IP addresses with the OpenShift SDN network plugin:
- You cannot use manually assigned and automatically assigned egress IP addresses on the same nodes.
- If you manually assign egress IP addresses from an IP address range, you must not make that range available for automatic IP assignment.
- You cannot share egress IP addresses across multiple namespaces using the OpenShift SDN egress IP address implementation.
If you need to share IP addresses across namespaces, the OVN-Kubernetes network plugin egress IP address implementation allows you to span IP addresses across multiple namespaces.
If you use OpenShift SDN in multitenant mode, you cannot use egress IP addresses with any namespace that is joined to another namespace by the projects that are associated with them. For example, if project1
and project2
are joined by running the oc adm pod-network join-projects --to=project1 project2
command, neither project can use an egress IP address. For more information, see BZ#1645577.
27.2.1.4. IP address assignment approaches
You can assign egress IP addresses to namespaces by setting the egressIPs
parameter of the NetNamespace
object. After an egress IP address is associated with a project, OpenShift SDN allows you to assign egress IP addresses to hosts in two ways:
- In the automatically assigned approach, an egress IP address range is assigned to a node.
- In the manually assigned approach, a list of one or more egress IP address is assigned to a node.
Namespaces that request an egress IP address are matched with nodes that can host those egress IP addresses, and then the egress IP addresses are assigned to those nodes. If the egressIPs
parameter is set on a NetNamespace
object, but no node hosts that egress IP address, then egress traffic from the namespace will be dropped.
High availability of nodes is automatic. If a node that hosts an egress IP address is unreachable and there are nodes that are able to host that egress IP address, then the egress IP address will move to a new node. When the unreachable node comes back online, the egress IP address automatically moves to balance egress IP addresses across nodes.
27.2.1.4.1. Considerations when using automatically assigned egress IP addresses
When using the automatic assignment approach for egress IP addresses the following considerations apply:
-
You set the
egressCIDRs
parameter of each node’sHostSubnet
resource to indicate the range of egress IP addresses that can be hosted by a node. OpenShift Container Platform sets theegressIPs
parameter of theHostSubnet
resource based on the IP address range you specify.
If the node hosting the namespace’s egress IP address is unreachable, OpenShift Container Platform will reassign the egress IP address to another node with a compatible egress IP address range. The automatic assignment approach works best for clusters installed in environments with flexibility in associating additional IP addresses with nodes.
27.2.1.4.2. Considerations when using manually assigned egress IP addresses
This approach allows you to control which nodes can host an egress IP address.
If your cluster is installed on public cloud infrastructure, you must ensure that each node that you assign egress IP addresses to has sufficient spare capacity to host the IP addresses. For more information, see "Platform considerations" in a previous section.
When using the manual assignment approach for egress IP addresses the following considerations apply:
-
You set the
egressIPs
parameter of each node’sHostSubnet
resource to indicate the IP addresses that can be hosted by a node. - Multiple egress IP addresses per namespace are supported.
If a namespace has multiple egress IP addresses and those addresses are hosted on multiple nodes, the following additional considerations apply:
- If a pod is on a node that is hosting an egress IP address, that pod always uses the egress IP address on the node.
- If a pod is not on a node that is hosting an egress IP address, that pod uses an egress IP address at random.
27.2.2. Configuring automatically assigned egress IP addresses for a namespace
In OpenShift Container Platform you can enable automatic assignment of an egress IP address for a specific namespace across one or more nodes.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. -
You have installed the OpenShift CLI (
oc
).
Procedure
Update the
NetNamespace
object with the egress IP address using the following JSON:oc patch netnamespace <project_name> --type=merge -p \ '{
$ oc patch netnamespace <project_name> --type=merge -p \ '{ "egressIPs": [ "<ip_address>" ] }'
Copy to Clipboard Copied! where:
<project_name>
- Specifies the name of the project.
<ip_address>
-
Specifies one or more egress IP addresses for the
egressIPs
array.
For example, to assign
project1
to an IP address of 192.168.1.100 andproject2
to an IP address of 192.168.1.101:oc patch netnamespace project1 --type=merge -p \ '{"egressIPs": ["192.168.1.100"]}' oc patch netnamespace project2 --type=merge -p \ '{"egressIPs": ["192.168.1.101"]}'
$ oc patch netnamespace project1 --type=merge -p \ '{"egressIPs": ["192.168.1.100"]}' $ oc patch netnamespace project2 --type=merge -p \ '{"egressIPs": ["192.168.1.101"]}'
Copy to Clipboard Copied! NoteBecause OpenShift SDN manages the
NetNamespace
object, you can make changes only by modifying the existingNetNamespace
object. Do not create a newNetNamespace
object.Indicate which nodes can host egress IP addresses by setting the
egressCIDRs
parameter for each host using the following JSON:oc patch hostsubnet <node_name> --type=merge -p \ '{
$ oc patch hostsubnet <node_name> --type=merge -p \ '{ "egressCIDRs": [ "<ip_address_range>", "<ip_address_range>" ] }'
Copy to Clipboard Copied! where:
<node_name>
- Specifies a node name.
<ip_address_range>
-
Specifies an IP address range in CIDR format. You can specify more than one address range for the
egressCIDRs
array.
For example, to set
node1
andnode2
to host egress IP addresses in the range 192.168.1.0 to 192.168.1.255:oc patch hostsubnet node1 --type=merge -p \ '{"egressCIDRs": ["192.168.1.0/24"]}' oc patch hostsubnet node2 --type=merge -p \ '{"egressCIDRs": ["192.168.1.0/24"]}'
$ oc patch hostsubnet node1 --type=merge -p \ '{"egressCIDRs": ["192.168.1.0/24"]}' $ oc patch hostsubnet node2 --type=merge -p \ '{"egressCIDRs": ["192.168.1.0/24"]}'
Copy to Clipboard Copied! OpenShift Container Platform automatically assigns specific egress IP addresses to available nodes in a balanced way. In this case, it assigns the egress IP address 192.168.1.100 to
node1
and the egress IP address 192.168.1.101 tonode2
or vice versa.
27.2.3. Configuring manually assigned egress IP addresses for a namespace
In OpenShift Container Platform you can associate one or more egress IP addresses with a namespace.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. -
You have installed the OpenShift CLI (
oc
).
Procedure
Update the
NetNamespace
object by specifying the following JSON object with the desired IP addresses:oc patch netnamespace <project_name> --type=merge -p \ '{
$ oc patch netnamespace <project_name> --type=merge -p \ '{ "egressIPs": [ "<ip_address>" ] }'
Copy to Clipboard Copied! where:
<project_name>
- Specifies the name of the project.
<ip_address>
-
Specifies one or more egress IP addresses for the
egressIPs
array.
For example, to assign the
project1
project to the IP addresses192.168.1.100
and192.168.1.101
:oc patch netnamespace project1 --type=merge \ -p '{"egressIPs": ["192.168.1.100","192.168.1.101"]}'
$ oc patch netnamespace project1 --type=merge \ -p '{"egressIPs": ["192.168.1.100","192.168.1.101"]}'
Copy to Clipboard Copied! To provide high availability, set the
egressIPs
value to two or more IP addresses on different nodes. If multiple egress IP addresses are set, then pods use all egress IP addresses roughly equally.NoteBecause OpenShift SDN manages the
NetNamespace
object, you can make changes only by modifying the existingNetNamespace
object. Do not create a newNetNamespace
object.Manually assign the egress IP address to the node hosts.
If your cluster is installed on public cloud infrastructure, you must confirm that the node has available IP address capacity.
Set the
egressIPs
parameter on theHostSubnet
object on the node host. Using the following JSON, include as many IP addresses as you want to assign to that node host:oc patch hostsubnet <node_name> --type=merge -p \ '{
$ oc patch hostsubnet <node_name> --type=merge -p \ '{ "egressIPs": [ "<ip_address>", "<ip_address>" ] }'
Copy to Clipboard Copied! where:
<node_name>
- Specifies a node name.
<ip_address>
-
Specifies an IP address. You can specify more than one IP address for the
egressIPs
array.
For example, to specify that
node1
should have the egress IPs192.168.1.100
,192.168.1.101
, and192.168.1.102
:oc patch hostsubnet node1 --type=merge -p \ '{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'
$ oc patch hostsubnet node1 --type=merge -p \ '{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'
Copy to Clipboard Copied! In the previous example, all egress traffic for
project1
will be routed to the node hosting the specified egress IP, and then connected through Network Address Translation (NAT) to that IP address.
27.2.4. Additional resources
- If you are configuring manual egress IP address assignment, see Platform considerations for information about IP capacity planning.
27.3. Configuring an egress firewall for a project
As a cluster administrator, you can create an egress firewall for a project that restricts egress traffic leaving your OpenShift Container Platform cluster.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
27.3.1. How an egress firewall works in a project
As a cluster administrator, you can use an egress firewall to limit the external hosts that some or all pods can access from within the cluster. An egress firewall supports the following scenarios:
- A pod can only connect to internal hosts and cannot initiate connections to the public internet.
- A pod can only connect to the public internet and cannot initiate connections to internal hosts that are outside the OpenShift Container Platform cluster.
- A pod cannot reach specified internal subnets or hosts outside the OpenShift Container Platform cluster.
- A pod can connect to only specific external hosts.
For example, you can allow one project access to a specified IP range but deny the same access to a different project. Or you can restrict application developers from updating from Python pip mirrors, and force updates to come only from approved sources.
Egress firewall does not apply to the host network namespace. Pods with host networking enabled are unaffected by egress firewall rules.
You configure an egress firewall policy by creating an EgressNetworkPolicy custom resource (CR) object. The egress firewall matches network traffic that meets any of the following criteria:
- An IP address range in CIDR format
- A DNS name that resolves to an IP address
If your egress firewall includes a deny rule for 0.0.0.0/0
, access to your OpenShift Container Platform API servers is blocked. You must either add allow rules for each IP address or use the nodeSelector
type allow rule in your egress policy rules to connect to API servers.
The following example illustrates the order of the egress firewall rules necessary to ensure API server access:
apiVersion: network.openshift.io/v1 kind: EgressNetworkPolicy metadata: name: default namespace: <namespace> spec: egress: - to: cidrSelector: <api_server_address_range> type: Allow # ... - to: cidrSelector: 0.0.0.0/0 type: Deny
apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
name: default
namespace: <namespace>
spec:
egress:
- to:
cidrSelector: <api_server_address_range>
type: Allow
# ...
- to:
cidrSelector: 0.0.0.0/0
type: Deny
To find the IP address for your API servers, run oc get ep kubernetes -n default
.
For more information, see BZ#1988324.
You must have OpenShift SDN configured to use either the network policy or multitenant mode to configure an egress firewall.
If you use network policy mode, an egress firewall is compatible with only one policy per namespace and will not work with projects that share a network, such as global projects.
Egress firewall rules do not apply to traffic that goes through routers. Any user with permission to create a Route CR object can bypass egress firewall policy rules by creating a route that points to a forbidden destination.
27.3.1.1. Limitations of an egress firewall
An egress firewall has the following limitations:
No project can have more than one EgressNetworkPolicy object.
ImportantThe creation of more than one EgressNetworkPolicy object is allowed, however it should not be done. When you create more than one EgressNetworkPolicy object, the following message is returned:
dropping all rules
. In actuality, all external traffic is dropped, which can cause security risks for your organization.- A maximum of one EgressNetworkPolicy object with a maximum of 1,000 rules can be defined per project.
-
The
default
project cannot use an egress firewall. When using the OpenShift SDN network plugin in multitenant mode, the following limitations apply:
-
Global projects cannot use an egress firewall. You can make a project global by using the
oc adm pod-network make-projects-global
command. -
Projects merged by using the
oc adm pod-network join-projects
command cannot use an egress firewall in any of the joined projects.
-
Global projects cannot use an egress firewall. You can make a project global by using the
-
If you create a selectorless service and manually define endpoints or
EndpointSlices
that point to external IPs, traffic to the service IP might still be allowed, even if yourEgressNetworkPolicy
is configured to deny all egress traffic. This occurs because OpenShift SDN does not fully enforce egress network policies for these external endpoints. Consequently, this might result in unexpected access to external services.
Violating any of these restrictions results in a broken egress firewall for the project. Consequently, all external network traffic is dropped, which can cause security risks for your organization.
An Egress Firewall resource can be created in the kube-node-lease
, kube-public
, kube-system
, openshift
and openshift-
projects.
27.3.1.2. Matching order for egress firewall policy rules
The egress firewall policy rules are evaluated in the order that they are defined, from first to last. The first rule that matches an egress connection from a pod applies. Any subsequent rules are ignored for that connection.
27.3.1.3. How Domain Name Server (DNS) resolution works
If you use DNS names in any of your egress firewall policy rules, proper resolution of the domain names is subject to the following restrictions:
- Domain name updates are polled based on a time-to-live (TTL) duration. By default, the duration is 30 seconds. When the egress firewall controller queries the local name servers for a domain name, if the response includes a TTL that is less than 30 seconds, the controller sets the duration to the returned value. If the TTL in the response is greater than 30 minutes, the controller sets the duration to 30 minutes. If the TTL is between 30 seconds and 30 minutes, the controller ignores the value and sets the duration to 30 seconds.
- The pod must resolve the domain from the same local name servers when necessary. Otherwise the IP addresses for the domain known by the egress firewall controller and the pod can be different. If the IP addresses for a hostname differ, the egress firewall might not be enforced consistently.
- Because the egress firewall controller and pods asynchronously poll the same local name server, the pod might obtain the updated IP address before the egress controller does, which causes a race condition. Due to this current limitation, domain name usage in EgressNetworkPolicy objects is only recommended for domains with infrequent IP address changes.
Using DNS names in your egress firewall policy does not affect local DNS resolution through CoreDNS.
However, if your egress firewall policy uses domain names, and an external DNS server handles DNS resolution for an affected pod, you must include egress firewall rules that permit access to the IP addresses of your DNS server.
27.3.2. EgressNetworkPolicy custom resource (CR) object
You can define one or more rules for an egress firewall. A rule is either an Allow
rule or a Deny
rule, with a specification for the traffic that the rule applies to.
The following YAML describes an EgressNetworkPolicy CR object:
EgressNetworkPolicy object
apiVersion: network.openshift.io/v1 kind: EgressNetworkPolicy metadata: name: <name> spec: egress: ...
apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
name: <name>
spec:
egress:
...
27.3.2.1. EgressNetworkPolicy rules
The following YAML describes an egress firewall rule object. The user can select either an IP address range in CIDR format, a domain name, or use the nodeSelector
to allow or deny egress traffic. The egress
stanza expects an array of one or more objects.
Egress policy rule stanza
egress: - type: <type> to: cidrSelector: <cidr> dnsName: <dns_name>
egress:
- type: <type>
to:
cidrSelector: <cidr>
dnsName: <dns_name>
27.3.2.2. Example EgressNetworkPolicy CR objects
The following example defines several egress firewall policy rules:
apiVersion: network.openshift.io/v1 kind: EgressNetworkPolicy metadata: name: default spec: egress: - type: Allow to: cidrSelector: 1.2.3.0/24 - type: Allow to: dnsName: www.example.com - type: Deny to: cidrSelector: 0.0.0.0/0
apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
name: default
spec:
egress:
- type: Allow
to:
cidrSelector: 1.2.3.0/24
- type: Allow
to:
dnsName: www.example.com
- type: Deny
to:
cidrSelector: 0.0.0.0/0
- 1
- A collection of egress firewall policy rule objects.
27.3.3. Creating an egress firewall policy object
As a cluster administrator, you can create an egress firewall policy object for a project.
If the project already has an EgressNetworkPolicy object defined, you must edit the existing policy to make changes to the egress firewall rules.
Prerequisites
- A cluster that uses the OpenShift SDN network plugin.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster as a cluster administrator.
Procedure
Create a policy rule:
-
Create a
<policy_name>.yaml
file where<policy_name>
describes the egress policy rules. - In the file you created, define an egress policy object.
-
Create a
Enter the following command to create the policy object. Replace
<policy_name>
with the name of the policy and<project>
with the project that the rule applies to.oc create -f <policy_name>.yaml -n <project>
$ oc create -f <policy_name>.yaml -n <project>
Copy to Clipboard Copied! In the following example, a new EgressNetworkPolicy object is created in a project named
project1
:oc create -f default.yaml -n project1
$ oc create -f default.yaml -n project1
Copy to Clipboard Copied! Example output
egressnetworkpolicy.network.openshift.io/v1 created
egressnetworkpolicy.network.openshift.io/v1 created
Copy to Clipboard Copied! -
Optional: Save the
<policy_name>.yaml
file so that you can make changes later.
27.4. Editing an egress firewall for a project
As a cluster administrator, you can modify network traffic rules for an existing egress firewall.
27.4.1. Viewing an EgressNetworkPolicy object
You can view an EgressNetworkPolicy object in your cluster.
Prerequisites
- A cluster using the OpenShift SDN network plugin.
-
Install the OpenShift Command-line Interface (CLI), commonly known as
oc
. - You must log in to the cluster.
Procedure
Optional: To view the names of the EgressNetworkPolicy objects defined in your cluster, enter the following command:
oc get egressnetworkpolicy --all-namespaces
$ oc get egressnetworkpolicy --all-namespaces
Copy to Clipboard Copied! To inspect a policy, enter the following command. Replace
<policy_name>
with the name of the policy to inspect.oc describe egressnetworkpolicy <policy_name>
$ oc describe egressnetworkpolicy <policy_name>
Copy to Clipboard Copied! Example output
Name: default Namespace: project1 Created: 20 minutes ago Labels: <none> Annotations: <none> Rule: Allow to 1.2.3.0/24 Rule: Allow to www.example.com Rule: Deny to 0.0.0.0/0
Name: default Namespace: project1 Created: 20 minutes ago Labels: <none> Annotations: <none> Rule: Allow to 1.2.3.0/24 Rule: Allow to www.example.com Rule: Deny to 0.0.0.0/0
Copy to Clipboard Copied!
27.5. Editing an egress firewall for a project
As a cluster administrator, you can modify network traffic rules for an existing egress firewall.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
27.5.1. Editing an EgressNetworkPolicy object
As a cluster administrator, you can update the egress firewall for a project.
Prerequisites
- A cluster using the OpenShift SDN network plugin.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster as a cluster administrator.
Procedure
Find the name of the EgressNetworkPolicy object for the project. Replace
<project>
with the name of the project.oc get -n <project> egressnetworkpolicy
$ oc get -n <project> egressnetworkpolicy
Copy to Clipboard Copied! Optional: If you did not save a copy of the EgressNetworkPolicy object when you created the egress network firewall, enter the following command to create a copy.
oc get -n <project> egressnetworkpolicy <name> -o yaml > <filename>.yaml
$ oc get -n <project> egressnetworkpolicy <name> -o yaml > <filename>.yaml
Copy to Clipboard Copied! Replace
<project>
with the name of the project. Replace<name>
with the name of the object. Replace<filename>
with the name of the file to save the YAML to.After making changes to the policy rules, enter the following command to replace the EgressNetworkPolicy object. Replace
<filename>
with the name of the file containing the updated EgressNetworkPolicy object.oc replace -f <filename>.yaml
$ oc replace -f <filename>.yaml
Copy to Clipboard Copied!
27.6. Removing an egress firewall from a project
As a cluster administrator, you can remove an egress firewall from a project to remove all restrictions on network traffic from the project that leaves the OpenShift Container Platform cluster.
OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead.
27.6.1. Removing an EgressNetworkPolicy object
As a cluster administrator, you can remove an egress firewall from a project.
Prerequisites
- A cluster using the OpenShift SDN network plugin.
-
Install the OpenShift CLI (
oc
). - You must log in to the cluster as a cluster administrator.
Procedure
Find the name of the EgressNetworkPolicy object for the project. Replace
<project>
with the name of the project.oc get -n <project> egressnetworkpolicy
$ oc get -n <project> egressnetworkpolicy
Copy to Clipboard Copied! Enter the following command to delete the EgressNetworkPolicy object. Replace
<project>
with the name of the project and<name>
with the name of the object.oc delete -n <project> egressnetworkpolicy <name>
$ oc delete -n <project> egressnetworkpolicy <name>
Copy to Clipboard Copied!
27.7. Considerations for the use of an egress router pod
27.7.1. About an egress router pod
The OpenShift Container Platform egress router pod redirects traffic to a specified remote server from a private source IP address that is not used for any other purpose. An egress router pod can send network traffic to servers that are set up to allow access only from specific IP addresses.
The egress router pod is not intended for every outgoing connection. Creating large numbers of egress router pods can exceed the limits of your network hardware. For example, creating an egress router pod for every project or application could exceed the number of local MAC addresses that the network interface can handle before reverting to filtering MAC addresses in software.
The egress router image is not compatible with Amazon AWS, Azure Cloud, or any other cloud platform that does not support layer 2 manipulations due to their incompatibility with macvlan traffic.
27.7.1.1. Egress router modes
In redirect mode, an egress router pod configures iptables
rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that need to use the reserved source IP address must be configured to access the service for the egress router rather than connecting directly to the destination IP. You can access the destination service and port from the application pod by using the curl
command. For example:
curl <router_service_IP> <port>
$ curl <router_service_IP> <port>
In HTTP proxy mode, an egress router pod runs as an HTTP proxy on port 8080
. This mode only works for clients that are connecting to HTTP-based or HTTPS-based services, but usually requires fewer changes to the client pods to get them to work. Many programs can be told to use an HTTP proxy by setting an environment variable.
In DNS proxy mode, an egress router pod runs as a DNS proxy for TCP-based services from its own IP address to one or more destination IP addresses. To make use of the reserved, source IP address, client pods must be modified to connect to the egress router pod rather than connecting directly to the destination IP address. This modification ensures that external destinations treat traffic as though it were coming from a known source.
Redirect mode works for all services except for HTTP and HTTPS. For HTTP and HTTPS services, use HTTP proxy mode. For TCP-based services with IP addresses or domain names, use DNS proxy mode.
27.7.1.2. Egress router pod implementation
The egress router pod setup is performed by an initialization container. That container runs in a privileged context so that it can configure the macvlan interface and set up iptables
rules. After the initialization container finishes setting up the iptables
rules, it exits. Next the egress router pod executes the container to handle the egress router traffic. The image used varies depending on the egress router mode.
The environment variables determine which addresses the egress-router image uses. The image configures the macvlan interface to use EGRESS_SOURCE
as its IP address, with EGRESS_GATEWAY
as the IP address for the gateway.
Network Address Translation (NAT) rules are set up so that connections to the cluster IP address of the pod on any TCP or UDP port are redirected to the same port on IP address specified by the EGRESS_DESTINATION
variable.
If only some of the nodes in your cluster are capable of claiming the specified source IP address and using the specified gateway, you can specify a nodeName
or nodeSelector
to identify which nodes are acceptable.
27.7.1.3. Deployment considerations
An egress router pod adds an additional IP address and MAC address to the primary network interface of the node. As a result, you might need to configure your hypervisor or cloud provider to allow the additional address.
- Red Hat OpenStack Platform (RHOSP)
If you deploy OpenShift Container Platform on RHOSP, you must allow traffic from the IP and MAC addresses of the egress router pod on your OpenStack environment. If you do not allow the traffic, then communication will fail:
openstack port set --allowed-address \ ip_address=<ip_address>,mac_address=<mac_address> <neutron_port_uuid>
$ openstack port set --allowed-address \ ip_address=<ip_address>,mac_address=<mac_address> <neutron_port_uuid>
Copy to Clipboard Copied! - VMware vSphere
- If you are using VMware vSphere, see the VMware documentation for securing vSphere standard switches. View and change VMware vSphere default settings by selecting the host virtual switch from the vSphere Web Client.
Specifically, ensure that the following are enabled:
27.7.1.4. Failover configuration
To avoid downtime, you can deploy an egress router pod with a Deployment
resource, as in the following example. To create a new Service
object for the example deployment, use the oc expose deployment/egress-demo-controller
command.
apiVersion: apps/v1 kind: Deployment metadata: name: egress-demo-controller spec: replicas: 1 selector: matchLabels: name: egress-router template: metadata: name: egress-router labels: name: egress-router annotations: pod.network.openshift.io/assign-macvlan: "true" spec: initContainers: ... containers: ...
apiVersion: apps/v1
kind: Deployment
metadata:
name: egress-demo-controller
spec:
replicas: 1
selector:
matchLabels:
name: egress-router
template:
metadata:
name: egress-router
labels:
name: egress-router
annotations:
pod.network.openshift.io/assign-macvlan: "true"
spec:
initContainers:
...
containers:
...
27.8. Deploying an egress router pod in redirect mode
As a cluster administrator, you can deploy an egress router pod that is configured to redirect traffic to specified destination IP addresses.
27.8.1. Egress router pod specification for redirect mode
Define the configuration for an egress router pod in the Pod
object. The following YAML describes the fields for the configuration of an egress router pod in redirect mode:
apiVersion: v1 kind: Pod metadata: name: egress-1 labels: name: egress-1 annotations: pod.network.openshift.io/assign-macvlan: "true" spec: initContainers: - name: egress-router image: registry.redhat.io/openshift4/ose-egress-router securityContext: privileged: true env: - name: EGRESS_SOURCE value: <egress_router> - name: EGRESS_GATEWAY value: <egress_gateway> - name: EGRESS_DESTINATION value: <egress_destination> - name: EGRESS_ROUTER_MODE value: init containers: - name: egress-router-wait image: registry.redhat.io/openshift4/ose-pod
apiVersion: v1
kind: Pod
metadata:
name: egress-1
labels:
name: egress-1
annotations:
pod.network.openshift.io/assign-macvlan: "true"
spec:
initContainers:
- name: egress-router
image: registry.redhat.io/openshift4/ose-egress-router
securityContext:
privileged: true
env:
- name: EGRESS_SOURCE
value: <egress_router>
- name: EGRESS_GATEWAY
value: <egress_gateway>
- name: EGRESS_DESTINATION
value: <egress_destination>
- name: EGRESS_ROUTER_MODE
value: init
containers:
- name: egress-router-wait
image: registry.redhat.io/openshift4/ose-pod
- 1
- The annotation tells OpenShift Container Platform to create a macvlan network interface on the primary network interface controller (NIC) and move that macvlan interface into the pod’s network namespace. You must include the quotation marks around the
"true"
value. To have OpenShift Container Platform create the macvlan interface on a different NIC interface, set the annotation value to the name of that interface. For example,eth1
. - 2
- IP address from the physical network that the node is on that is reserved for use by the egress router pod. Optional: You can include the subnet length, the
/24
suffix, so that a proper route to the local subnet is set. If you do not specify a subnet length, then the egress router can access only the host specified with theEGRESS_GATEWAY
variable and no other hosts on the subnet. - 3
- Same value as the default gateway used by the node.
- 4
- External server to direct traffic to. Using this example, connections to the pod are redirected to
203.0.113.25
, with a source IP address of192.168.12.99
.
Example egress router pod specification
apiVersion: v1 kind: Pod metadata: name: egress-multi labels: name: egress-multi annotations: pod.network.openshift.io/assign-macvlan: "true" spec: initContainers: - name: egress-router image: registry.redhat.io/openshift4/ose-egress-router securityContext: privileged: true env: - name: EGRESS_SOURCE value: 192.168.12.99/24 - name: EGRESS_GATEWAY value: 192.168.12.1 - name: EGRESS_DESTINATION value: | 80 tcp 203.0.113.25 8080 tcp 203.0.113.26 80 8443 tcp 203.0.113.26 443 203.0.113.27 - name: EGRESS_ROUTER_MODE value: init containers: - name: egress-router-wait image: registry.redhat.io/openshift4/ose-pod
apiVersion: v1
kind: Pod
metadata:
name: egress-multi
labels:
name: egress-multi
annotations:
pod.network.openshift.io/assign-macvlan: "true"
spec:
initContainers:
- name: egress-router
image: registry.redhat.io/openshift4/ose-egress-router
securityContext:
privileged: true
env:
- name: EGRESS_SOURCE
value: 192.168.12.99/24
- name: EGRESS_GATEWAY
value: 192.168.12.1
- name: EGRESS_DESTINATION
value: |
80 tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27
- name: EGRESS_ROUTER_MODE
value: init
containers:
- name: egress-router-wait
image: registry.redhat.io/openshift4/ose-pod
27.8.2. Egress destination configuration format
When an egress router pod is deployed in redirect mode, you can specify redirection rules by using one or more of the following formats:
-
<port> <protocol> <ip_address>
- Incoming connections to the given<port>
should be redirected to the same port on the given<ip_address>
.<protocol>
is eithertcp
orudp
. -
<port> <protocol> <ip_address> <remote_port>
- As above, except that the connection is redirected to a different<remote_port>
on<ip_address>
. -
<ip_address>
- If the last line is a single IP address, then any connections on any other port will be redirected to the corresponding port on that IP address. If there is no fallback IP address then connections on other ports are rejected.
In the example that follows several rules are defined:
-
The first line redirects traffic from local port
80
to port80
on203.0.113.25
. -
The second and third lines redirect local ports
8080
and8443
to remote ports80
and443
on203.0.113.26
. - The last line matches traffic for any ports not specified in the previous rules.
Example configuration
80 tcp 203.0.113.25 8080 tcp 203.0.113.26 80 8443 tcp 203.0.113.26 443 203.0.113.27
80 tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27
27.8.3. Deploying an egress router pod in redirect mode
In redirect mode, an egress router pod sets up iptables rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that need to use the reserved source IP address must be configured to access the service for the egress router rather than connecting directly to the destination IP. You can access the destination service and port from the application pod by using the curl
command. For example:
curl <router_service_IP> <port>
$ curl <router_service_IP> <port>
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
- Create an egress router pod.
To ensure that other pods can find the IP address of the egress router pod, create a service to point to the egress router pod, as in the following example:
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: http port: 80 - name: https port: 443 type: ClusterIP selector: name: egress-1
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: http port: 80 - name: https port: 443 type: ClusterIP selector: name: egress-1
Copy to Clipboard Copied! Your pods can now connect to this service. Their connections are redirected to the corresponding ports on the external server, using the reserved egress IP address.
27.9. Deploying an egress router pod in HTTP proxy mode
As a cluster administrator, you can deploy an egress router pod configured to proxy traffic to specified HTTP and HTTPS-based services.
27.9.1. Egress router pod specification for HTTP mode
Define the configuration for an egress router pod in the Pod
object. The following YAML describes the fields for the configuration of an egress router pod in HTTP mode:
apiVersion: v1 kind: Pod metadata: name: egress-1 labels: name: egress-1 annotations: pod.network.openshift.io/assign-macvlan: "true" spec: initContainers: - name: egress-router image: registry.redhat.io/openshift4/ose-egress-router securityContext: privileged: true env: - name: EGRESS_SOURCE value: <egress-router> - name: EGRESS_GATEWAY value: <egress-gateway> - name: EGRESS_ROUTER_MODE value: http-proxy containers: - name: egress-router-pod image: registry.redhat.io/openshift4/ose-egress-http-proxy env: - name: EGRESS_HTTP_PROXY_DESTINATION value: |- ... ...
apiVersion: v1
kind: Pod
metadata:
name: egress-1
labels:
name: egress-1
annotations:
pod.network.openshift.io/assign-macvlan: "true"
spec:
initContainers:
- name: egress-router
image: registry.redhat.io/openshift4/ose-egress-router
securityContext:
privileged: true
env:
- name: EGRESS_SOURCE
value: <egress-router>
- name: EGRESS_GATEWAY
value: <egress-gateway>
- name: EGRESS_ROUTER_MODE
value: http-proxy
containers:
- name: egress-router-pod
image: registry.redhat.io/openshift4/ose-egress-http-proxy
env:
- name: EGRESS_HTTP_PROXY_DESTINATION
value: |-
...
...
- 1
- The annotation tells OpenShift Container Platform to create a macvlan network interface on the primary network interface controller (NIC) and move that macvlan interface into the pod’s network namespace. You must include the quotation marks around the
"true"
value. To have OpenShift Container Platform create the macvlan interface on a different NIC interface, set the annotation value to the name of that interface. For example,eth1
. - 2
- IP address from the physical network that the node is on that is reserved for use by the egress router pod. Optional: You can include the subnet length, the
/24
suffix, so that a proper route to the local subnet is set. If you do not specify a subnet length, then the egress router can access only the host specified with theEGRESS_GATEWAY
variable and no other hosts on the subnet. - 3
- Same value as the default gateway used by the node.
- 4
- A string or YAML multi-line string specifying how to configure the proxy. Note that this is specified as an environment variable in the HTTP proxy container, not with the other environment variables in the init container.
27.9.2. Egress destination configuration format
When an egress router pod is deployed in HTTP proxy mode, you can specify redirection rules by using one or more of the following formats. Each line in the configuration specifies one group of connections to allow or deny:
-
An IP address allows connections to that IP address, such as
192.168.1.1
. -
A CIDR range allows connections to that CIDR range, such as
192.168.1.0/24
. -
A hostname allows proxying to that host, such as
www.example.com
. -
A domain name preceded by
*.
allows proxying to that domain and all of its subdomains, such as*.example.com
. -
A
!
followed by any of the previous match expressions denies the connection instead. -
If the last line is
*
, then anything that is not explicitly denied is allowed. Otherwise, anything that is not allowed is denied.
You can also use *
to allow connections to all remote destinations.
Example configuration
!*.example.com !192.168.1.0/24 192.168.2.1 *
!*.example.com
!192.168.1.0/24
192.168.2.1
*
27.9.3. Deploying an egress router pod in HTTP proxy mode
In HTTP proxy mode, an egress router pod runs as an HTTP proxy on port 8080
. This mode only works for clients that are connecting to HTTP-based or HTTPS-based services, but usually requires fewer changes to the client pods to get them to work. Many programs can be told to use an HTTP proxy by setting an environment variable.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
- Create an egress router pod.
To ensure that other pods can find the IP address of the egress router pod, create a service to point to the egress router pod, as in the following example:
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: http-proxy port: 8080 type: ClusterIP selector: name: egress-1
apiVersion: v1 kind: Service metadata: name: egress-1 spec: ports: - name: http-proxy port: 8080
1 type: ClusterIP selector: name: egress-1
Copy to Clipboard Copied! - 1
- Ensure the
http
port is set to8080
.
To configure the client pod (not the egress proxy pod) to use the HTTP proxy, set the
http_proxy
orhttps_proxy
variables:apiVersion: v1 kind: Pod metadata: name: app-1 labels: name: app-1 spec: containers: env: - name: http_proxy value: http://egress-1:8080/ - name: https_proxy value: http://egress-1:8080/ ...
apiVersion: v1 kind: Pod metadata: name: app-1 labels: name: app-1 spec: containers: env: - name: http_proxy value: http://egress-1:8080/
1 - name: https_proxy value: http://egress-1:8080/ ...
Copy to Clipboard Copied! - 1
- The service created in the previous step.
NoteUsing the
http_proxy
andhttps_proxy
environment variables is not necessary for all setups. If the above does not create a working setup, then consult the documentation for the tool or software you are running in the pod.
27.10. Deploying an egress router pod in DNS proxy mode
As a cluster administrator, you can deploy an egress router pod configured to proxy traffic to specified DNS names and IP addresses.
27.10.1. Egress router pod specification for DNS mode
Define the configuration for an egress router pod in the Pod
object. The following YAML describes the fields for the configuration of an egress router pod in DNS mode:
apiVersion: v1 kind: Pod metadata: name: egress-1 labels: name: egress-1 annotations: pod.network.openshift.io/assign-macvlan: "true" spec: initContainers: - name: egress-router image: registry.redhat.io/openshift4/ose-egress-router securityContext: privileged: true env: - name: EGRESS_SOURCE value: <egress-router> - name: EGRESS_GATEWAY value: <egress-gateway> - name: EGRESS_ROUTER_MODE value: dns-proxy containers: - name: egress-router-pod image: registry.redhat.io/openshift4/ose-egress-dns-proxy securityContext: privileged: true env: - name: EGRESS_DNS_PROXY_DESTINATION value: |- ... - name: EGRESS_DNS_PROXY_DEBUG value: "1" ...
apiVersion: v1
kind: Pod
metadata:
name: egress-1
labels:
name: egress-1
annotations:
pod.network.openshift.io/assign-macvlan: "true"
spec:
initContainers:
- name: egress-router
image: registry.redhat.io/openshift4/ose-egress-router
securityContext:
privileged: true
env:
- name: EGRESS_SOURCE
value: <egress-router>
- name: EGRESS_GATEWAY
value: <egress-gateway>
- name: EGRESS_ROUTER_MODE
value: dns-proxy
containers:
- name: egress-router-pod
image: registry.redhat.io/openshift4/ose-egress-dns-proxy
securityContext:
privileged: true
env:
- name: EGRESS_DNS_PROXY_DESTINATION
value: |-
...
- name: EGRESS_DNS_PROXY_DEBUG
value: "1"
...
- 1
- The annotation tells OpenShift Container Platform to create a macvlan network interface on the primary network interface controller (NIC) and move that macvlan interface into the pod’s network namespace. You must include the quotation marks around the
"true"
value. To have OpenShift Container Platform create the macvlan interface on a different NIC interface, set the annotation value to the name of that interface. For example,eth1
. - 2
- IP address from the physical network that the node is on that is reserved for use by the egress router pod. Optional: You can include the subnet length, the
/24
suffix, so that a proper route to the local subnet is set. If you do not specify a subnet length, then the egress router can access only the host specified with theEGRESS_GATEWAY
variable and no other hosts on the subnet. - 3
- Same value as the default gateway used by the node.
- 4
- Specify a list of one or more proxy destinations.
- 5
- Optional: Specify to output the DNS proxy log output to
stdout
.
27.10.2. Egress destination configuration format
When the router is deployed in DNS proxy mode, you specify a list of port and destination mappings. A destination may be either an IP address or a DNS name.
An egress router pod supports the following formats for specifying port and destination mappings:
- Port and remote address
-
You can specify a source port and a destination host by using the two field format:
<port> <remote_address>
.
The host can be an IP address or a DNS name. If a DNS name is provided, DNS resolution occurs at runtime. For a given host, the proxy connects to the specified source port on the destination host when connecting to the destination host IP address.
Port and remote address pair example
80 172.16.12.11 100 example.com
80 172.16.12.11
100 example.com
- Port, remote address, and remote port
-
You can specify a source port, a destination host, and a destination port by using the three field format:
<port> <remote_address> <remote_port>
.
The three field format behaves identically to the two field version, with the exception that the destination port can be different than the source port.
Port, remote address, and remote port example
8080 192.168.60.252 80 8443 web.example.com 443
8080 192.168.60.252 80
8443 web.example.com 443
27.10.3. Deploying an egress router pod in DNS proxy mode
In DNS proxy mode, an egress router pod acts as a DNS proxy for TCP-based services from its own IP address to one or more destination IP addresses.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
- Create an egress router pod.
Create a service for the egress router pod:
Create a file named
egress-router-service.yaml
that contains the following YAML. Setspec.ports
to the list of ports that you defined previously for theEGRESS_DNS_PROXY_DESTINATION
environment variable.apiVersion: v1 kind: Service metadata: name: egress-dns-svc spec: ports: ... type: ClusterIP selector: name: egress-dns-proxy
apiVersion: v1 kind: Service metadata: name: egress-dns-svc spec: ports: ... type: ClusterIP selector: name: egress-dns-proxy
Copy to Clipboard Copied!