Networking


OpenShift Container Platform 4.16

Configuring and managing cluster networking

Red Hat OpenShift Documentation Team

Abstract

This document provides instructions for configuring and managing your OpenShift Container Platform cluster network, including DNS, ingress, and the Pod network.

Chapter 1. About networking

Red Hat OpenShift Networking is an ecosystem of features, plugins and advanced networking capabilities that extend Kubernetes networking with the advanced networking-related features that your cluster needs to manage its network traffic for one or multiple hybrid clusters. This ecosystem of networking capabilities integrates ingress, egress, load balancing, high-performance throughput, security, inter- and intra-cluster traffic management and provides role-based observability tooling to reduce its natural complexities.

Note

OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead. For more information, see OpenShift SDN CNI removal.

The following list highlights some of the most commonly used Red Hat OpenShift Networking features available on your cluster:

  • Primary cluster network provided by either of the following Container Network Interface (CNI) plugins:

  • Certified 3rd-party alternative primary network plugins
  • Cluster Network Operator for network plugin management
  • Ingress Operator for TLS encrypted web traffic
  • DNS Operator for name assignment
  • MetalLB Operator for traffic load balancing on bare metal clusters
  • IP failover support for high-availability
  • Additional hardware network support through multiple CNI plugins, including for macvlan, ipvlan, and SR-IOV hardware networks
  • IPv4, IPv6, and dual stack addressing
  • Hybrid Linux-Windows host clusters for Windows-based workloads
  • Red Hat OpenShift Service Mesh for discovery, load balancing, service-to-service authentication, failure recovery, metrics, and monitoring of services
  • Single-node OpenShift
  • Network Observability Operator for network debugging and insights
  • Submariner for inter-cluster networking
  • Red Hat Service Interconnect for layer 7 inter-cluster networking

Chapter 2. Understanding networking

Cluster Administrators have several options for exposing applications that run inside a cluster to external traffic and securing network connections:

  • Service types, such as node ports or load balancers
  • API resources, such as Ingress and Route

By default, Kubernetes allocates each pod an internal IP address for applications running within the pod. Pods and their containers can network, but clients outside the cluster do not have networking access. When you expose your application to external traffic, giving each pod its own IP address means that pods can be treated like physical hosts or virtual machines in terms of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.

Note

Some cloud platforms offer metadata APIs that listen on the 169.254.169.254 IP address, a link-local IP address in the IPv4 169.254.0.0/16 CIDR block.

This CIDR block is not reachable from the pod network. Pods that need access to these IP addresses must be given host network access by setting the spec.hostNetwork field in the pod spec to true.

If you allow a pod host network access, you grant the pod privileged access to the underlying network infrastructure.

2.1. OpenShift Container Platform DNS

If you are running multiple services, such as front-end and back-end services for use with multiple pods, environment variables are created for user names, service IPs, and more so the front-end pods can communicate with the back-end services. If the service is deleted and recreated, a new IP address can be assigned to the service, and requires the front-end pods to be recreated to pick up the updated values for the service IP environment variable. Additionally, the back-end service must be created before any of the front-end pods to ensure that the service IP is generated properly, and that it can be provided to the front-end pods as an environment variable.

For this reason, OpenShift Container Platform has a built-in DNS so that the services can be reached by the service DNS as well as the service IP/port.

2.2. OpenShift Container Platform Ingress Operator

When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients.

The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route and Kubernetes Ingress resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy type and internal load balancing, provide ways to publish Ingress Controller endpoints.

2.2.1. Comparing routes and Ingress

The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster. The most common way to manage Ingress traffic is with the Ingress Controller. You can scale and replicate this pod like any other regular pod. This router service is based on HAProxy, which is an open source load balancer solution.

The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.

Ingress traffic accesses services in the cluster through a route. Routes and Ingress are the main resources for handling Ingress traffic. Ingress provides features similar to a route, such as accepting external requests and delegating them based on the route. However, with Ingress you can only allow certain types of connections: HTTP/2, HTTPS and server name identification (SNI), and TLS with certificate. In OpenShift Container Platform, routes are generated to meet the conditions specified by the Ingress resource.

This glossary defines common terms that are used in the networking content.

authentication
To control access to an OpenShift Container Platform cluster, a cluster administrator can configure user authentication and ensure only approved users access the cluster. To interact with an OpenShift Container Platform cluster, you must authenticate to the OpenShift Container Platform API. You can authenticate by providing an OAuth access token or an X.509 client certificate in your requests to the OpenShift Container Platform API.
AWS Load Balancer Operator
The AWS Load Balancer (ALB) Operator deploys and manages an instance of the aws-load-balancer-controller.
Cluster Network Operator
The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation.
config map
A config map provides a way to inject configuration data into pods. You can reference the data stored in a config map in a volume of type ConfigMap. Applications running in a pod can use this data.
custom resource (CR)
A CR is extension of the Kubernetes API. You can create custom resources.
DNS
Cluster DNS is a DNS server which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
DNS Operator
The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform.
deployment
A Kubernetes resource object that maintains the life cycle of an application.
domain
Domain is a DNS name serviced by the Ingress Controller.
egress
The process of data sharing externally through a network’s outbound traffic from a pod.
External DNS Operator
The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform.
HTTP-based route
An HTTP-based route is an unsecured route that uses the basic HTTP routing protocol and exposes a service on an unsecured application port.
Ingress
The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster.
Ingress Controller
The Ingress Operator manages Ingress Controllers. Using an Ingress Controller is the most common way to allow external access to an OpenShift Container Platform cluster.
installer-provisioned infrastructure
The installation program deploys and configures the infrastructure that the cluster runs on.
kubelet
A primary node agent that runs on each node in the cluster to ensure that containers are running in a pod.
Kubernetes NMState Operator
The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across the OpenShift Container Platform cluster’s nodes with NMState.
kube-proxy
Kube-proxy is a proxy service which runs on each node and helps in making services available to the external host. It helps in forwarding the request to correct containers and is capable of performing primitive load balancing.
load balancers
OpenShift Container Platform uses load balancers for communicating from outside the cluster with services running in the cluster.
MetalLB Operator
As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type LoadBalancer is added to the cluster, MetalLB can add an external IP address for the service.
multicast
With IP multicast, data is broadcast to many IP addresses simultaneously.
namespaces
A namespace isolates specific system resources that are visible to all processes. Inside a namespace, only processes that are members of that namespace can see those resources.
networking
Network information of a OpenShift Container Platform cluster.
node
A worker machine in the OpenShift Container Platform cluster. A node is either a virtual machine (VM) or a physical machine.
OpenShift Container Platform Ingress Operator
The Ingress Operator implements the IngressController API and is the component responsible for enabling external access to OpenShift Container Platform services.
pod
One or more containers with shared resources, such as volume and IP addresses, running in your OpenShift Container Platform cluster. A pod is the smallest compute unit defined, deployed, and managed.
PTP Operator
The PTP Operator creates and manages the linuxptp services.
route
The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.
scaling
Increasing or decreasing the resource capacity.
service
Exposes a running application on a set of pods.
Single Root I/O Virtualization (SR-IOV) Network Operator
The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
software-defined networking (SDN)
OpenShift Container Platform uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Container Platform cluster.
Note

OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead. For more information, see OpenShift SDN CNI removal.

Stream Control Transmission Protocol (SCTP)
SCTP is a reliable message based protocol that runs on top of an IP network.
taint
Taints and tolerations ensure that pods are scheduled onto appropriate nodes. You can apply one or more taints on a node.
toleration
You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods with matching taints.
web console
A user interface (UI) to manage OpenShift Container Platform.

Chapter 3. Accessing hosts

To establish secure administrative access to OpenShift Container Platform instances and control plane nodes, create a bastion host.

Configuring a bastion host provides an entry point for Secure Shell (SSH) traffic, ensuring that your cluster remains protected while allowing for remote management.

To establish Secure Shell (SSH) access to OpenShift Container Platform hosts on Amazon EC2 instances that lack public IP addresses, configure a bastion host or secure gateway. Defining this access path ensures that you can safely manage and troubleshoot your private infrastructure within an installer-provisioned environment.

Procedure

  1. Create a security group that allows SSH access into the virtual private cloud (VPC) that the openshift-install command-line interface creates.
  2. Create an Amazon EC2 instance on one of the public subnets the installation program created.
  3. Associate a public IP address with the Amazon EC2 instance that you created.

    Unlike with the OpenShift Container Platform installation, associate the Amazon EC2 instance you created with an SSH keypair. The operating system selection is not important for this instance, because the instanace serves as an SSH bastion to bridge the internet into the VPC of your OpenShift Container Platform cluster. The Amazon Machine Image (AMI) you use does matter. With Red Hat Enterprise Linux CoreOS (RHCOS), for example, you can provide keys through Ignition by using a similar method to the installation program.

  4. After you provisioned your Amazon EC2 instance and can SSH into the instance, add the SSH key that you associated with your OpenShift Container Platform installation. This key can be different from the key for the bastion instance, but this is not a strict requirement.

    Note

    Use direct SSH access only for disaster recovery. When the Kubernetes API is responsive, run privileged pods instead.

  5. Run oc get nodes, inspect the output, and choose one of the nodes that is a control plane. The hostname looks similar to ip-10-0-1-163.ec2.internal.
  6. From the bastion SSH host that you manually deployed into Amazon EC2, SSH into that control plane host by entering the following command. Ensure that you use the same SSH key that you specified during installation:

    $ ssh -i <ssh-key-path> core@<control_plane_hostname>
    Copy to Clipboard Toggle word wrap

Chapter 4. Networking dashboards

To monitor and analyze network performance within your cluster, view networking metrics in the OpenShift Container Platform web console. By accessing these dashboards through ObserveDashboards, you can identify traffic patterns and troubleshoot connectivity issues to ensure consistent workload availability.

Network Observability Operator
If you have the Network Observability Operator installed, you can view network traffic metrics dashboards by selecting the Netobserv dashboard from the Dashboards drop-down list. For more information about metrics available in this Dashboard, see Network Observability metrics dashboards.
Networking and OVN-Kubernetes dashboard

You can view both general networking metrics and OVN-Kubernetes metrics from the dashboard.

To view general networking metrics, select Networking/Linux Subsystem Stats from the Dashboards drop-down list. You can view the following networking metrics from the dashboard: Network Utilisation, Network Saturation, and Network Errors.

To view OVN-Kubernetes metrics select Networking/Infrastructure from the Dashboards drop-down list. You can view the following OVN-Kubernetes metrics: Networking Configuration, TCP Latency Probes, Control Plane Resources, and Worker Resources.

Ingress Operator dashboard

You can view networking metrics handled by the Ingress Operator from the dashboard. This includes metrics like the following:

  • Incoming and outgoing bandwidth
  • HTTP error rates
  • HTTP server response latency

    To view these Ingress metrics, select Networking/Ingress from the Dashboards drop-down list. You can view Ingress metrics for the following categories: Top 10 Per Route, Top 10 Per Namespace, and Top 10 Per Shard.

Chapter 5. Networking Operators

5.1. Kubernetes NMState Operator

The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across the OpenShift Container Platform cluster’s nodes with NMState. The Kubernetes NMState Operator provides users with functionality to configure various network interface types, DNS, and routing on cluster nodes. Additionally, the daemons on the cluster nodes periodically report on the state of each node’s network interfaces to the API server.

Important

Red Hat supports the Kubernetes NMState Operator in production environments on bare-metal, IBM Power®, IBM Z®, IBM® LinuxONE, VMware vSphere, and OpenStack installations.

Before you can use NMState with OpenShift Container Platform, you must install the Kubernetes NMState Operator. After you install the Kubernetes NMState Operator, you can complete the following tasks:

  • Observing and updating the node network state and configuration
  • Creating a manifest object that includes a customized br-ex bridge For more information on these tasks, see the Additional resources section

Before you can use NMState with OpenShift Container Platform, you must install the Kubernetes NMState Operator.

Note

The Kubernetes NMState Operator updates the network configuration of a secondary NIC. The Operator cannot update the network configuration of the primary NIC, or update the br-ex bridge on most on-premise networks.

On a bare-metal platform, using the Kubernetes NMState Operator to update the br-ex bridge network configuration is only supported if you set the br-ex bridge as the interface in a machine config manifest file. To update the br-ex bridge as a postinstallation task, you must set the br-ex bridge as the interface in the NMState configuration of the NodeNetworkConfigurationPolicy custom resource (CR) for your cluster. For more information, see Creating a manifest object that includes a customized br-ex bridge in Postinstallation configuration.

OpenShift Container Platform uses nmstate to report on and configure the state of the node network. This makes it possible to modify the network policy configuration, such as by creating a Linux bridge on all nodes, by applying a single configuration manifest to the cluster.

Node networking is monitored and updated by the following objects:

NodeNetworkState
Reports the state of the network on that node.
NodeNetworkConfigurationPolicy
Describes the requested network configuration on nodes. You update the node network configuration, including adding and removing interfaces, by applying a NodeNetworkConfigurationPolicy CR to the cluster.
NodeNetworkConfigurationEnactment
Reports the network policies enacted upon each node.
Note

Do not make configuration changes to the br-ex bridge or its underlying interfaces as a postinstallation task.

5.1.1. Installing the Kubernetes NMState Operator

You can install the Kubernetes NMState Operator by using the web console or the CLI.

You can install the Kubernetes NMState Operator by using the web console. After you install the Kubernetes NMState Operator, the Operator has deployed the NMState State Controller as a daemon set across all of the cluster nodes.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Select OperatorsOperatorHub.
  2. In the search field below All Items, enter nmstate and click Enter to search for the Kubernetes NMState Operator.
  3. Click on the Kubernetes NMState Operator search result.
  4. Click on Install to open the Install Operator window.
  5. Click Install to install the Operator.
  6. After the Operator finishes installing, click View Operator.
  7. Under Provided APIs, click Create Instance to open the dialog box for creating an instance of kubernetes-nmstate.
  8. In the Name field of the dialog box, ensure the name of the instance is nmstate.

    Note

    The name restriction is a known issue. The instance is a singleton for the entire cluster.

  9. Accept the default settings and click Create to create the instance.

You can install the Kubernetes NMState Operator by using the OpenShift CLI (oc). After it is installed, the Operator can deploy the NMState State Controller as a daemon set across all of the cluster nodes.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Create the nmstate Operator namespace:

    $ cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-nmstate
    spec:
      finalizers:
      - kubernetes
    EOF
    Copy to Clipboard Toggle word wrap
  2. Create the OperatorGroup:

    $ cat << EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-nmstate
      namespace: openshift-nmstate
    spec:
      targetNamespaces:
      - openshift-nmstate
    EOF
    Copy to Clipboard Toggle word wrap
  3. Subscribe to the nmstate Operator:

    $ cat << EOF| oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: kubernetes-nmstate-operator
      namespace: openshift-nmstate
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: kubernetes-nmstate-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
    Copy to Clipboard Toggle word wrap
  4. Confirm the ClusterServiceVersion (CSV) status for the nmstate Operator deployment equals Succeeded:

    $ oc get clusterserviceversion -n openshift-nmstate \
     -o custom-columns=Name:.metadata.name,Phase:.status.phase
    Copy to Clipboard Toggle word wrap
  5. Create an instance of the nmstate Operator:

    $ cat << EOF | oc apply -f -
    apiVersion: nmstate.io/v1
    kind: NMState
    metadata:
      name: nmstate
    EOF
    Copy to Clipboard Toggle word wrap
  6. If your cluster has problems with the DNS health check probe because of DNS connectivity issues, you can add the following DNS host name configuration to the NMState CRD to build in health checks that can resolve these issues:

    apiVersion: nmstate.io/v1
    kind: NMState
    metadata:
      name: nmstate
    spec:
      probeConfiguration:
        dns:
          host: redhat.com
    # ...
    Copy to Clipboard Toggle word wrap
    1. Apply the DNS host name configuration to your cluster network by running the following command. Ensure that you replace <filename> with the name of your CRD file.

      $ oc apply -f <filename>.yaml
      Copy to Clipboard Toggle word wrap
    2. Monitor the nmstate CRD until the resource reaches the Available condition by running the following command. Ensure that you set a value for the --timeout option so that if the Available condition is not met within this set maximum waiting time, the command times out and generates an error message.

      $ oc wait --for=condition=Available nmstate/nmstate --timeout=600s
      Copy to Clipboard Toggle word wrap

Verification

  1. Verify that all pods for the NMState Operator have the Running status by entering the following command:

    $ oc get pod -n openshift-nmstate
    Copy to Clipboard Toggle word wrap

You can use the Operator Lifecycle Manager (OLM) to uninstall the Kubernetes NMState Operator, but by design OLM does not delete any associated custom resource definitions (CRDs), custom resources (CRs), or API Services.

Before you uninstall the Kubernetes NMState Operator from the Subcription resource used by OLM, identify what Kubernetes NMState Operator resources to delete. This identification ensures that you can delete resources without impacting your running cluster.

If you need to reinstall the Kubernetes NMState Operator, see "Installing the Kubernetes NMState Operator by using the CLI" or "Installing the Kubernetes NMState Operator by using the web console".

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have installed the jq CLI tool.
  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Unsubscribe the Kubernetes NMState Operator from the Subcription resource by running the following command:

    $ oc delete --namespace openshift-nmstate subscription kubernetes-nmstate-operator
    Copy to Clipboard Toggle word wrap
  2. Find the ClusterServiceVersion (CSV) resource that associates with the Kubernetes NMState Operator:

    $ oc get --namespace openshift-nmstate clusterserviceversion
    Copy to Clipboard Toggle word wrap

    Example output that lists a CSV resource

    NAME                              	  DISPLAY                   	VERSION   REPLACES     PHASE
    kubernetes-nmstate-operator.v4.18.0   Kubernetes NMState Operator   4.18.0           	   Succeeded
    Copy to Clipboard Toggle word wrap

  3. Delete the CSV resource. After you delete the file, OLM deletes certain resources, such as RBAC, that it created for the Operator.

    $ oc delete --namespace openshift-nmstate clusterserviceversion kubernetes-nmstate-operator.v4.18.0
    Copy to Clipboard Toggle word wrap
  4. Delete the nmstate CR and any associated Deployment resources by running the following commands:

    $ oc -n openshift-nmstate delete nmstate nmstate
    Copy to Clipboard Toggle word wrap
    $ oc delete --all deployments --namespace=openshift-nmstate
    Copy to Clipboard Toggle word wrap
  5. After you deleted the nmstate CR, remove the nmstate-console-plugin console plugin name from the console.operator.openshift.io/cluster CR.

    1. Store the position of the nmstate-console-plugin entry that exists among the list of enable plugins by running the following command. The following command uses the jq CLI tool to store the index of the entry in an environment variable named INDEX:

      INDEX=$(oc get console.operator.openshift.io cluster -o json | jq -r '.spec.plugins | to_entries[] | select(.value == "nmstate-console-plugin") | .key')
      Copy to Clipboard Toggle word wrap
    2. Remove the nmstate-console-plugin entry from the console.operator.openshift.io/cluster CR by running the following patch command:

      $ oc patch console.operator.openshift.io cluster --type=json -p "[{\"op\": \"remove\", \"path\": \"/spec/plugins/$INDEX\"}]" 
      1
      Copy to Clipboard Toggle word wrap
      1
      INDEX is an auxiliary variable. You can specify a different name for this variable.
  6. Delete all the custom resource definitions (CRDs), such as nmstates.nmstate.io, by running the following commands:

    $ oc delete crd nmstates.nmstate.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd nodenetworkconfigurationenactments.nmstate.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd nodenetworkstates.nmstate.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd nodenetworkconfigurationpolicies.nmstate.io
    Copy to Clipboard Toggle word wrap
  7. Delete the namespace:

    $ oc delete namespace kubernetes-nmstate
    Copy to Clipboard Toggle word wrap

5.2. AWS Load Balancer Operator

5.2.1. AWS Load Balancer Operator release notes

The release notes for the AWS Load Balancer (ALB) Operator summarize all new features and enhancements, notable technical changes, major corrections from the previous version, and any known bugs upon general availability.

Important

The AWS Load Balancer (ALB) Operator is only supported on the x86_64 architecture.

These release notes track the development of the AWS Load Balancer Operator in OpenShift Container Platform.

For an overview of the AWS Load Balancer Operator, see AWS Load Balancer Operator in OpenShift Container Platform.

Note

AWS Load Balancer Operator currently does not support AWS GovCloud.

5.2.1.1. AWS Load Balancer Operator 1.2.0

The AWS Load Balancer Operator 1.2.0 release notes summarize all new features and enhancements, notable technical changes, major corrections from the previous version, and any known bugs upon general availability.

The following advisory is available for the AWS Load Balancer Operator version 1.2.0:

5.2.1.2. AWS Load Balancer Operator 1.1.1

The AWS Load Balancer Operator 1.1.1 release notes summarize all new features and enhancements, notable technical changes, major corrections from the previous version, and any known bugs upon general availability.

The following advisory is available for the AWS Load Balancer Operator version 1.1.1:

5.2.1.3. AWS Load Balancer Operator 1.1.0

The AWS Load Balancer Operator 1.1.0 release notes summarize all new features and enhancements, notable technical changes, major corrections from the previous version, and any known bugs upon general availability.

The AWS Load Balancer Operator version 1.1.0 supports the AWS Load Balancer Controller version 2.4.4.

The following advisory is available for the AWS Load Balancer Operator version 1.1.0:

  • RHEA-2023:6218 Release of AWS Load Balancer Operator on OperatorHub Enhancement Advisory Update

    Notable changes
  • This release uses the Kubernetes API version 0.27.2.

    New features
  • The AWS Load Balancer Operator now supports a standardized Security Token Service (STS) flow by using the Cloud Credential Operator.

    Bug fixes
  • A FIPS-compliant cluster must use TLS version 1.2. Previously, webhooks for the AWS Load Balancer Controller only accepted TLS 1.3 as the minimum version, resulting in an error such as the following on a FIPS-compliant cluster:

    remote error: tls: protocol version not supported
    Copy to Clipboard Toggle word wrap

    Now, the AWS Load Balancer Controller accepts TLS 1.2 as the minimum TLS version, resolving this issue. (OCPBUGS-14846)

5.2.1.4. AWS Load Balancer Operator 1.0.1

The AWS Load Balancer Operator 1.0.1 release notes summarize all new features and enhancements, notable technical changes, major corrections from the previous version, and any known bugs upon general availability.

The following advisory is available for the AWS Load Balancer Operator version 1.0.1:

5.2.1.5. AWS Load Balancer Operator 1.0.0

The AWS Load Balancer Operator 1.0.0 release notes summarize all new features and enhancements, notable technical changes, major corrections from the previous version, and any known bugs upon general availability.

The AWS Load Balancer Operator is now generally available with this release. The AWS Load Balancer Operator version 1.0.0 supports the AWS Load Balancer Controller version 2.4.4.

The following advisory is available for the AWS Load Balancer Operator version 1.0.0:

Important

The AWS Load Balancer (ALB) Operator version 1.x.x cannot upgrade automatically from the Technology Preview version 0.x.x. To upgrade from an earlier version, you must uninstall the ALB operands and delete the aws-load-balancer-operator namespace.

Notable changes
  • This release uses the new v1 API version.
Bug fixes
  • Previously, the controller provisioned by the AWS Load Balancer Operator did not properly use the configuration for the cluster-wide proxy. These settings are now applied appropriately to the controller. (OCPBUGS-4052, OCPBUGS-5295)
5.2.1.6. Earlier versions

To evaluate the AWS Load Balancer Operator, use the two earliest versions, which are available as a Technology Preview. Do not use these versions in a production cluster.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The following advisory is available for the AWS Load Balancer Operator version 0.2.0:

The following advisory is available for the AWS Load Balancer Operator version 0.0.1:

To deploy and manage the AWS Load Balancer Controller, install the AWS Load Balancer Operator from the software catalog by using the OpenShift Container Platform web console or CLI. You can use the Operator to integrate AWS load balancers directly into your cluster infrastructure.

5.2.2.1. AWS Load Balancer Operator considerations

To ensure a successful deployment, review the limitations of the AWS Load Balancer Operator. Understanding these constraints helps avoid compatibility issues and ensures the Operator meets your architectural requirements before installation.

Review the following limitations before installing and using the AWS Load Balancer Operator:

  • The IP traffic mode only works on AWS Elastic Kubernetes Service (EKS). The AWS Load Balancer Operator disables the IP traffic mode for the AWS Load Balancer Controller. As a result of disabling the IP traffic mode, the AWS Load Balancer Controller cannot use the pod readiness gate.
  • The AWS Load Balancer Operator adds command-line flags such as --disable-ingress-class-annotation and --disable-ingress-group-name-annotation to the AWS Load Balancer Controller. Therefore, the AWS Load Balancer Operator does not allow using the kubernetes.io/ingress.class and alb.ingress.kubernetes.io/group.name annotations in the Ingress resource.
  • The AWS Load Balancer Operator requires that the service type is NodePort and not LoadBalancer or ClusterIP.
5.2.2.2. Deploying the AWS Load Balancer Operator

After you deploy the The AWS Load Balancer Operator, the Operator automatically tags public subnets if the kubernetes.io/role/elb tag is missing. The Operator then identifies specific network resources in the underlying AWS cloud to ensure successful cluster integration.

The AWS Load Balancer Operator detects the following information from the underlying AWS cloud:

  • The ID of the virtual private cloud (VPC) on which the cluster hosting the Operator is deployed.
  • Public and private subnets of the discovered VPC.

The AWS Load Balancer Operator supports the Kubernetes service resource of type LoadBalancer by using Network Load Balancer (NLB) with the instance target type only.

Procedure

  1. To deploy the AWS Load Balancer Operator on-demand from OperatorHub, create a Subscription object by running the following command:

    $ oc -n aws-load-balancer-operator get sub aws-load-balancer-operator --template='{{.status.installplan.name}}{{"\n"}}'
    Copy to Clipboard Toggle word wrap
  2. Check if the status of an install plan is Complete by running the following command:

    $ oc -n aws-load-balancer-operator get ip <install_plan_name> --template='{{.status.phase}}{{"\n"}}'
    Copy to Clipboard Toggle word wrap
  3. View the status of the aws-load-balancer-operator-controller-manager deployment by running the following command:

    $ oc get -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                           READY     UP-TO-DATE   AVAILABLE   AGE
    aws-load-balancer-operator-controller-manager  1/1       1            1           23h
    Copy to Clipboard Toggle word wrap

To provision an AWS Application Load Balancer in an AWS VPC cluster extended into an Outpost, configure the AWS Load Balancer Operator. Note that the Operator cannot provision AWS Network Load Balancers because AWS Outposts does not support them.

You can create an AWS Application Load Balancer either in the cloud subnet or in the Outpost subnet.

An Application Load Balancer in the cloud can attach to cloud-based compute nodes. An Application Load Balancer in the Outpost can attach to edge compute nodes.

You must annotate Ingress resources with the Outpost subnet or the VPC subnet, but not both.

Prerequisites

  • You have extended an AWS VPC cluster into an Outpost.
  • You have installed the OpenShift CLI (oc).
  • You have installed the AWS Load Balancer Operator and created the AWS Load Balancer Controller.

Procedure

  • Configure the Ingress resource to use a specified subnet:

    Example Ingress resource configuration

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: <application_name>
      annotations:
        alb.ingress.kubernetes.io/subnets: <subnet_id>
    spec:
      ingressClassName: alb
      rules:
        - http:
            paths:
              - path: /
                pathType: Exact
                backend:
                  service:
                    name: <application_name>
                    port:
                      number: 80
    Copy to Clipboard Toggle word wrap

    where:

    <subnet_id>
    Specifies the subnet to use. To use the Application Load Balancer in an Outpost, specify the Outpost subnet ID. To use the Application Load Balancer in the cloud, you must specify at least two subnets in different availability zones.

To install the Amazon Web Services (AWS) Load Balancer Operator on a cluster that uses the Security Token Service (STS), prepare the cluster by configuring the CredentialsRequest object. This ensures the Operator can bootstrap the AWS Load Balancer Controller and access the required secrets.

The AWS Load Balancer Operator waits until the required secrets are created and available.

Before you start any Security Token Service (STS) procedures, ensure that you meet the following prerequisites:

  • You installed the OpenShift CLI (oc).
  • You know the infrastructure ID of your cluster. To show this ID, run the following command in your CLI:

    $ oc get infrastructure cluster -o=jsonpath="{.status.infrastructureName}"
    Copy to Clipboard Toggle word wrap
  • You know the OpenID Connect (OIDC) DNS information for your cluster. To show this information, enter the following command in your CLI:

    $ oc get authentication.config cluster -o=jsonpath="{.spec.serviceAccountIssuer}"
    Copy to Clipboard Toggle word wrap

    where:

    {.spec.serviceAccountIssuer}
    Specifies an OIDC DNS URL. An example URL is https://rh-oidc.s3.us-east-1.amazonaws.com/28292va7ad7mr9r4he1fb09b14t59t4f.
  • You logged into the AWS management console, navigated to IAMAccess managementIdentity providers, and located the OIDC Amazon Resource Name (ARN) information. An OIDC ARN example is arn:aws:iam::777777777777:oidc-provider/<oidc_dns_url>.

To install the Amazon Web Services (AWS) Load Balancer Operator on a cluster by using STS, configure an additional Identity and Access Management (IAM) role. This role enables the Operator to interact with subnets and Virtual Private Clouds (VPCs), allowing the Operator to generate the CredentialsRequest object required for bootstrapping.

You can create the IAM role by using the following options:

  • Using the Cloud Credential Operator utility (ccoctl) and a predefined CredentialsRequest object.
  • Using the AWS CLI and predefined AWS manifests.

Use the AWS CLI if your environment does not support the ccoctl command.

To enable the AWS Load Balancer Operator to interact with subnets and VPCs, create an AWS IAM role by using the Cloud Credential Operator utility (ccoctl). By doing this task, you can generate the necessary credentials for the operator to function correctly within the cluster environment.

Prerequisites

  • You must extract and prepare the ccoctl binary.

Procedure

  1. Download the CredentialsRequest custom resource (CR) and store it in a directory by running the following command:

    $ curl --create-dirs -o <credentials_requests_dir>/operator.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-credentials-request.yaml
    Copy to Clipboard Toggle word wrap
  2. Use the ccoctl utility to create an AWS IAM role by running the following command:

    $ ccoctl aws create-iam-roles \
        --name <name> \
        --region=<aws_region> \
        --credentials-requests-dir=<credentials_requests_dir> \
        --identity-provider-arn <oidc_arn>
    Copy to Clipboard Toggle word wrap

    Example output

    2023/09/12 11:38:57 Role arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-operator created
    2023/09/12 11:38:57 Saved credentials configuration to: /home/user/<credentials_requests_dir>/manifests/aws-load-balancer-operator-aws-load-balancer-operator-credentials.yaml
    2023/09/12 11:38:58 Updated Role policy for Role <name>-aws-load-balancer-operator-aws-load-balancer-operator created
    Copy to Clipboard Toggle word wrap

    where:

    <name>

    Specifies the Amazon Resource Name (ARN) for an AWS IAM role that was created for the AWS Load Balancer Operator, such as arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-operator.

    Note

    The length of an AWS IAM role name must be less than or equal to 12 characters.

To enable the AWS Load Balancer Operator to interact with subnets and VPCs, create an AWS IAM role by using the AWS CLI. This enables the Operator to access and manage the necessary network resources within the cluster.

Prerequisites

  • You must have access to the AWS Command Line Interface (aws).

Procedure

  1. Generate a trust policy file by using your identity provider by running the following command:

    $ cat <<EOF > albo-operator-trust-policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "<oidc_arn>"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "<cluster_oidc_endpoint>:sub": "system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-operator-controller-manager"
                    }
                }
            }
        ]
    }
    EOF
    Copy to Clipboard Toggle word wrap

    where:

    <oidc_arn>
    Specifies the Amazon Resource Name (ARN) of the OIDC identity provider, such as arn:aws:iam::777777777777:oidc-provider/rh-oidc.s3.us-east-1.amazonaws.com/28292va7ad7mr9r4he1fb09b14t59t4f.
    serviceaccount
    Specifies the service account for the AWS Load Balancer Controller. An example of <cluster_oidc_endpoint> is rh-oidc.s3.us-east-1.amazonaws.com/28292va7ad7mr9r4he1fb09b14t59t4f.
  2. Create the IAM role with the generated trust policy by running the following command:

    $ aws iam create-role --role-name albo-operator --assume-role-policy-document file://albo-operator-trust-policy.json
    Copy to Clipboard Toggle word wrap

    Example output

    ROLE	arn:aws:iam::<aws_account_number>:role/albo-operator	2023-08-02T12:13:22Z 
    1
    
    ASSUMEROLEPOLICYDOCUMENT	2012-10-17
    STATEMENT	sts:AssumeRoleWithWebIdentity	Allow
    STRINGEQUALS	system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-controller-manager
    PRINCIPAL	arn:aws:iam:<aws_account_number>:oidc-provider/<cluster_oidc_endpoint>
    Copy to Clipboard Toggle word wrap

    where:

    <aws_account_number>
    Specifies the ARN of the created AWS IAM role for the AWS Load Balancer Operator, such as arn:aws:iam::777777777777:role/albo-operator.
  3. Download the permission policy for the AWS Load Balancer Operator by running the following command:

    $ curl -o albo-operator-permission-policy.json https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-permission-policy.json
    Copy to Clipboard Toggle word wrap
  4. Attach the permission policy for the AWS Load Balancer Controller to the IAM role by running the following command:

    $ aws iam put-role-policy --role-name albo-operator --policy-name perms-policy-albo-operator --policy-document file://albo-operator-permission-policy.json
    Copy to Clipboard Toggle word wrap

To authorize the AWS Load Balancer Operator, configure the Amazon Resource Name (ARN) role as an environment variable by using the CLI. This ensures the Operator has the necessary permissions to manage resources within the cluster.

Prerequisites

  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create the aws-load-balancer-operator project by running the following command:

    $ oc new-project aws-load-balancer-operator
    Copy to Clipboard Toggle word wrap
  2. Create the OperatorGroup object by running the following command:

    $ cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: aws-load-balancer-operator
      namespace: aws-load-balancer-operator
    spec:
      targetNamespaces: []
    EOF
    Copy to Clipboard Toggle word wrap
  3. Create the Subscription object by running the following command:

    $ cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: aws-load-balancer-operator
      namespace: aws-load-balancer-operator
    spec:
      channel: stable-v1
      name: aws-load-balancer-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
      config:
        env:
        - name: ROLEARN
          value: "<albo_role_arn>"
    EOF
    Copy to Clipboard Toggle word wrap

    where:

    <albo_role_arn>

    Specifies the ARN role to be used in the CredentialsRequest to provision the AWS credentials for the AWS Load Balancer Operator. An example for <albo_role_arn> is arn:aws:iam::<aws_account_number>:role/albo-operator.

    Note

    The AWS Load Balancer Operator waits until the secret is created before moving to the Available status.

To authorize the AWS Load Balancer Controller, configure the CredentialsRequest object with a manually provisioned IAM role. This ensures the controller functions correctly by using the specific permissions defined in your manual provisioning process.

You can create the IAM role by using the following options:

  • Using the Cloud Credential Operator utility (ccoctl) and a predefined CredentialsRequest object.
  • Using the AWS CLI and predefined AWS manifests.

If your environment does not support the ccoctl command.ws-short CLI, use the AWS CLI.

To enable the AWS Load Balancer Controller to interact with subnets and VPCs, create an IAM role by using the Cloud Credential Operator utility (ccoctl). This utility ensures the controller has the specific permissions required to manage network resources within the cluster.

Prerequisites

  • You must extract and prepare the ccoctl binary.

Procedure

  1. Download the CredentialsRequest custom resource (CR) and store it in a directory by running the following command:

    $ curl --create-dirs -o <credentials_requests_dir>/controller.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/controller/controller-credentials-request.yaml
    Copy to Clipboard Toggle word wrap
  2. Use the ccoctl utility to create an AWS IAM role by running the following command:

    $ ccoctl aws create-iam-roles \
        --name <name> \
        --region=<aws_region> \
        --credentials-requests-dir=<credentials_requests_dir> \
        --identity-provider-arn <oidc_arn>
    Copy to Clipboard Toggle word wrap

    Example output

    2023/09/12 11:38:57 Role arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-controller created
    2023/09/12 11:38:57 Saved credentials configuration to: /home/user/<credentials_requests_dir>/manifests/aws-load-balancer-operator-aws-load-balancer-controller-credentials.yaml
    2023/09/12 11:38:58 Updated Role policy for Role <name>-aws-load-balancer-operator-aws-load-balancer-controller created
    Copy to Clipboard Toggle word wrap

    where:

    <name>

    Specifies the Amazon Resource Name (ARN) for an AWS IAM role that was created for the AWS Load Balancer Controller, such as arn:aws:iam::777777777777:role/<name>-aws-load-balancer-operator-aws-load-balancer-controller.

    Note

    The length of an AWS IAM role name must be less than or equal to 12 characters.

To enable the AWS Load Balancer Controller to interact with subnets and Virtual Private Clouds (VPCs), create an IAM role by using the AWS CLI. This ensures the controller has the specific permissions required to manage network resources within the cluster.

Prerequisites

  • You must have access to the AWS command-line interface (aws).

Procedure

  1. Generate a trust policy file using your identity provider by running the following command:

    $ cat <<EOF > albo-controller-trust-policy.json
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": "<oidc_arn>"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        "<cluster_oidc_endpoint>:sub": "system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-operator-controller-manager"
                    }
                }
            }
        ]
    }
    EOF
    Copy to Clipboard Toggle word wrap

    where:

    <oidc_arn>
    Specifies the Amazon Resource Name (ARN) of the OIDC identity provider, such as arn:aws:iam::777777777777:oidc-provider/rh-oidc.s3.us-east-1.amazonaws.com/28292va7ad7mr9r4he1fb09b14t59t4f.
    serviceaccount
    Specifies the service account for the AWS Load Balancer Controller. An example of <cluster_oidc_endpoint> is rh-oidc.s3.us-east-1.amazonaws.com/28292va7ad7mr9r4he1fb09b14t59t4f.
  2. Create an AWS IAM role with the generated trust policy by running the following command:

    $ aws iam create-role --role-name albo-controller --assume-role-policy-document file://albo-controller-trust-policy.json
    Copy to Clipboard Toggle word wrap

    Example output

    ROLE	arn:aws:iam::<aws_account_number>:role/albo-controller	2023-08-02T12:13:22Z 
    1
    
    ASSUMEROLEPOLICYDOCUMENT	2012-10-17
    STATEMENT	sts:AssumeRoleWithWebIdentity	Allow
    STRINGEQUALS	system:serviceaccount:aws-load-balancer-operator:aws-load-balancer-operator-controller-manager
    PRINCIPAL	arn:aws:iam:<aws_account_number>:oidc-provider/<cluster_oidc_endpoint>
    Copy to Clipboard Toggle word wrap

    where:

    <aws_account_number>
    Specifies the ARN for an AWS IAM role for the AWS Load Balancer Controller, such as arn:aws:iam::777777777777:role/albo-controller.
  3. Download the permission policy for the AWS Load Balancer Controller by running the following command:

    $ curl -o albo-controller-permission-policy.json https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/assets/iam-policy.json
    Copy to Clipboard Toggle word wrap
  4. Attach the permission policy for the AWS Load Balancer Controller to an AWS IAM role by running the following command:

    $ aws iam put-role-policy --role-name albo-controller --policy-name perms-policy-albo-controller --policy-document file://albo-controller-permission-policy.json
    Copy to Clipboard Toggle word wrap
  5. Create a YAML file that defines the AWSLoadBalancerController object:

    Example sample-aws-lb-manual-creds.yaml file

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController
    metadata:
      name: cluster
    spec:
      credentialsRequestConfig:
        stsIAMRoleARN: <albc_role_arn>
    Copy to Clipboard Toggle word wrap

    where:

    kind
    Specifies the AWSLoadBalancerController object.
    metatdata.name
    Specifies the AWS Load Balancer Controller name. All related resources use this instance name as a suffix.
    stsIAMRoleARN
    Specifies the ARN role for the AWS Load Balancer Controller. The CredentialsRequest object uses this ARN role to provision the AWS credentials. An example of <albc_role_arn> is arn:aws:iam::777777777777:role/albo-controller.

5.2.4. Installing the AWS Load Balancer Operator

The AWS Load Balancer Operator deploys and manages the AWS Load Balancer Controller. You can install the AWS Load Balancer Operator from the software catalog by using OpenShift Container Platform web console or CLI.

To deploy the AWS Load Balancer Operator, install the Operator by using the web console. You can manage the lifecycle of the Operator by using a graphical interface.

Prerequisites

  • You have logged in to the OpenShift Container Platform web console as a user with cluster-admin permissions.
  • Your cluster is configured with AWS as the platform type and cloud provider.
  • If you are using a security token service (STS) or user-provisioned infrastructure, follow the related preparation steps. For example, if you are using AWS Security Token Service, see "Preparing for the AWS Load Balancer Operator on a cluster using the AWS Security Token Service (STS)".

Procedure

  1. Navigate to OperatorsOperatorHub in the OpenShift Container Platform web console.
  2. Select the AWS Load Balancer Operator. You can use the Filter by keyword text box or the filter list to search for the AWS Load Balancer Operator from the list of Operators.
  3. Select the aws-load-balancer-operator namespace.
  4. On the Install Operator page, select the following options:

    1. For the Update the channel option, select stable-v1.
    2. For the Installation mode option, select All namespaces on the cluster (default).
    3. For the Installed Namespace option, select aws-load-balancer-operator. If the aws-load-balancer-operator namespace does not exist, it gets created during the Operator installation.
    4. Select Update approval as Automatic or Manual. By default, the Update approval is set to Automatic. If you select automatic updates, the Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without any intervention. If you select manual updates, the OLM creates an update request. As a cluster administrator, you must then manually approve that update request to have the Operator update to the newer version.
  5. Click Install.

Verification

  • Verify that the AWS Load Balancer Operator shows the Status as Succeeded on the Installed Operators dashboard.

To deploy the AWS Load Balancer Controller, install the AWS Load Balancer Operator by using the command-line interface (CLI).

Prerequisites

  • You are logged in to the OpenShift Container Platform web console as a user with cluster-admin permissions.
  • Your cluster is configured with AWS as the platform type and cloud provider.
  • You have logged into the OpenShift CLI (oc).

Procedure

  1. Create a Namespace object:

    1. Create a YAML file that defines the Namespace object:

      Example namespace.yaml file

      apiVersion: v1
      kind: Namespace
      metadata:
        name: aws-load-balancer-operator
      # ...
      Copy to Clipboard Toggle word wrap

    2. Create the Namespace object by running the following command:

      $ oc apply -f namespace.yaml
      Copy to Clipboard Toggle word wrap
  2. Create an OperatorGroup object:

    1. Create a YAML file that defines the OperatorGroup object:

      Example operatorgroup.yaml file

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: aws-lb-operatorgroup
        namespace: aws-load-balancer-operator
      spec:
        upgradeStrategy: Default
      Copy to Clipboard Toggle word wrap

    2. Create the OperatorGroup object by running the following command:

      $ oc apply -f operatorgroup.yaml
      Copy to Clipboard Toggle word wrap
  3. Create a Subscription object:

    1. Create a YAML file that defines the Subscription object:

      Example subscription.yaml file

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: aws-load-balancer-operator
        namespace: aws-load-balancer-operator
      spec:
        channel: stable-v1
        installPlanApproval: Automatic
        name: aws-load-balancer-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      Copy to Clipboard Toggle word wrap

    2. Create the Subscription object by running the following command:

      $ oc apply -f subscription.yaml
      Copy to Clipboard Toggle word wrap

Verification

  1. Get the name of the install plan from the subscription:

    $ oc -n aws-load-balancer-operator \
      get subscription aws-load-balancer-operator \
      --template='{{.status.installplan.name}}{{"\n"}}'
    Copy to Clipboard Toggle word wrap
  2. Check the status of the install plan:

    $ oc -n aws-load-balancer-operator \
      get ip <install_plan_name> \
      --template='{{.status.phase}}{{"\n"}}'
    Copy to Clipboard Toggle word wrap

    The output must be Complete.

5.2.4.3. Creating the AWS Load Balancer Controller

You can install only a single instance of the AWSLoadBalancerController object in a cluster. You can create the AWS Load Balancer Controller by using CLI. The AWS Load Balancer Operator reconciles only the cluster named resource.

Prerequisites

  • You have created the echoserver namespace.
  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create a YAML file that defines the AWSLoadBalancerController object:

    Example sample-aws-lb.yaml file

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController
    metadata:
      name: cluster
    spec:
      subnetTagging: Auto
      additionalResourceTags:
      - key: example.org/security-scope
        value: staging
      ingressClass: alb
      config:
        replicas: 2
      enabledAddons:
        - AWSWAFv2
    Copy to Clipboard Toggle word wrap

    where:

    kind
    Specifies the AWSLoadBalancerController object.
    metadata.name
    Specifies the AWS Load Balancer Controller name. The Operator adds this instance name as a suffix to all related resources.
    spec.subnetTagging

    Specifies the subnet tagging method for the AWS Load Balancer Controller. The following values are valid:

    • Auto: The AWS Load Balancer Operator determines the subnets that belong to the cluster and tags them appropriately. The Operator cannot determine the role correctly if the internal subnet tags are not present on internal subnet.
    • Manual: You manually tag the subnets that belong to the cluster with the appropriate role tags. Use this option if you installed your cluster on user-provided infrastructure.
    spec.additionalResourceTags
    Specifies the tags used by the AWS Load Balancer Controller when it provisions AWS resources.
    ingressClass
    Specifies the ingress class name. The default value is alb.
    config.replicas
    Specifies the number of replicas of the AWS Load Balancer Controller.
    enabledAddons
    Specifies annotations as an add-on for the AWS Load Balancer Controller.
    AWSWAFv2
    Specifies that enablement of the alb.ingress.kubernetes.io/wafv2-acl-arn annotation.
  2. Create the AWSLoadBalancerController object by running the following command:

    $ oc create -f sample-aws-lb.yaml
    Copy to Clipboard Toggle word wrap
  3. Create a YAML file that defines the Deployment resource:

    Example sample-aws-lb.yaml file

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: <echoserver>
      namespace: echoserver
    spec:
      selector:
        matchLabels:
          app: echoserver
      replicas: 3
      template:
        metadata:
          labels:
            app: echoserver
        spec:
          containers:
            - image: openshift/origin-node
              command:
               - "/bin/socat"
              args:
                - TCP4-LISTEN:8080,reuseaddr,fork
                - EXEC:'/bin/bash -c \"printf \\\"HTTP/1.0 200 OK\r\n\r\n\\\"; sed -e \\\"/^\r/q\\\"\"'
              imagePullPolicy: Always
              name: echoserver
              ports:
                - containerPort: 8080
    Copy to Clipboard Toggle word wrap

    where:

    kind
    Specifies the deployment resource.
    metadata.name
    Specifies the deployment name.
    spec.replicas
    Specifies the number of replicas of the deployment.
  4. Create a YAML file that defines the Service resource:

    Example service-albo.yaml file

    apiVersion: v1
    kind: Service
    metadata:
      name: <echoserver>
      namespace: echoserver
    spec:
      ports:
        - port: 80
          targetPort: 8080
          protocol: TCP
      type: NodePort
      selector:
        app: echoserver
    Copy to Clipboard Toggle word wrap

    where:

    apiVersion
    Specifies the service resource.
    metadata.name
    Specifies the service name.
  5. Create a YAML file that defines the Ingress resource:

    Example ingress-albo.yaml file

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: <name>
      namespace: echoserver
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: alb
      rules:
        - http:
            paths:
              - path: /
                pathType: Exact
                backend:
                  service:
                    name: <echoserver>
                    port:
                      number: 80
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies a name for the Ingress resource.
    service.name
    Specifies the service name.

Verification

  • Save the status of the Ingress resource in the HOST variable by running the following command:

    $ HOST=$(oc get ingress -n echoserver echoserver --template='{{(index .status.loadBalancer.ingress 0).hostname}}')
    Copy to Clipboard Toggle word wrap
  • Verify the status of the Ingress resource by running the following command:

    $ curl $HOST
    Copy to Clipboard Toggle word wrap

5.2.5. Configuring the AWS Load Balancer Operator

To automate the provisioning of AWS Load Balancers for your applications, configure the AWS Load Balancer Operator. This setup ensures that the Operator correctly manages ingress resources and external access to your cluster.

You can configure the cluster-wide proxy in the AWS Load Balancer Operator. After configuring the cluster-wide proxy, Operator Lifecycle Manager (OLM) automatically updates all the deployments of the Operators with the environment variables.

Environment variables include HTTP_PROXY, HTTPS_PROXY, and NO_PROXY. These variables are populated to the managed controller by the AWS Load Balancer Operator.

Procedure

  1. Create the config map to contain the certificate authority (CA) bundle in the aws-load-balancer-operator namespace by running the following command:

    $ oc -n aws-load-balancer-operator create configmap trusted-ca
    Copy to Clipboard Toggle word wrap
  2. To inject the trusted CA bundle into the config map, add the config.openshift.io/inject-trusted-cabundle=true label to the config map by running the following command:

    $ oc -n aws-load-balancer-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
    Copy to Clipboard Toggle word wrap
  3. Update the AWS Load Balancer Operator subscription to access the config map in the AWS Load Balancer Operator deployment by running the following command:

    $ oc -n aws-load-balancer-operator patch subscription aws-load-balancer-operator --type='merge' -p '{"spec":{"config":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}],"volumes":[{"name":"trusted-ca","configMap":{"name":"trusted-ca"}}],"volumeMounts":[{"name":"trusted-ca","mountPath":"/etc/pki/tls/certs/albo-tls-ca-bundle.crt","subPath":"ca-bundle.crt"}]}}}'
    Copy to Clipboard Toggle word wrap
  4. After the AWS Load Balancer Operator is deployed, verify that the CA bundle is added to the aws-load-balancer-operator-controller-manager deployment by running the following command:

    $ oc -n aws-load-balancer-operator exec deploy/aws-load-balancer-operator-controller-manager -c manager -- bash -c "ls -l /etc/pki/tls/certs/albo-tls-ca-bundle.crt; printenv TRUSTED_CA_CONFIGMAP_NAME"
    Copy to Clipboard Toggle word wrap

    Example output

    -rw-r--r--. 1 root 1000690000 5875 Jan 11 12:25 /etc/pki/tls/certs/albo-tls-ca-bundle.crt
    trusted-ca
    Copy to Clipboard Toggle word wrap

  5. Optional: Restart deployment of the AWS Load Balancer Operator every time the config map changes by running the following command:

    $ oc -n aws-load-balancer-operator rollout restart deployment/aws-load-balancer-operator-controller-manager
    Copy to Clipboard Toggle word wrap

To secure traffic for your domain, configure TLS termination on the AWS Load Balancer. This setup routes traffic to the pods of a service while ensuring that encrypted connections are decrypted at the load balancer level.

Prerequisites

  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create a YAML file that defines the AWSLoadBalancerController resource:

    Example add-tls-termination-albc.yaml file

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController
    metadata:
      name: cluster
    spec:
      subnetTagging: Auto
      ingressClass: tls-termination
    # ...
    Copy to Clipboard Toggle word wrap

    where:

    spec.ingressClass
    Specifies the ingress class name. If the ingress class is not present in your cluster the AWS Load Balancer Controller creates one. The AWS Load Balancer Controller reconciles the additional ingress class values if spec.controller is set to ingress.k8s.aws/alb.
  2. Create a YAML file that defines the Ingress resource:

    Example add-tls-termination-ingress.yaml file

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: <example>
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:xxxxx
    spec:
      ingressClassName: tls-termination
      rules:
      - host: example.com
        http:
            paths:
              - path: /
                pathType: Exact
                backend:
                  service:
                    name: <example_service>
                    port:
                      number: 80
    # ...
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the ingress name.
    annotations.alb.ingress.kubernetes.io/scheme
    Specifies the controller that provisions the load balancer for ingress. The provisioning happens in a public subnet to access the load balancer over the internet.
    annotations.alb.ingress.kubernetes.io/certificate-arn
    Specifies the Amazon Resource Name (ARN) of the certificate that you attach to the load balancer.
    spec.ingressClassName
    Specifies the ingress class name.
    rules.host
    Specifies the domain for traffic routing.
    backend.service
    Specifies the service for traffic routing.

To route traffic to different services within a single domain, configure multiple ingress resources on a single AWS Load Balancer. This setup allows each resource to provide different endpoints while sharing the same load balancing infrastructure.

Prerequisites

  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create an IngressClassParams resource YAML file, for example, sample-single-lb-params.yaml, as follows:

    apiVersion: elbv2.k8s.aws/v1beta1
    kind: IngressClassParams
    metadata:
      name: single-lb-params
    spec:
      group:
        name: single-lb
    Copy to Clipboard Toggle word wrap

    where:

    apiVersion
    Specifies the API group and version of the IngressClassParams resource.
    metadata.name
    Specifies the IngressClassParams resource name.
    spec.group.name
    Specifies the IngressGroup resource name. All of the Ingress resources of this class belong to this IngressGroup.
  2. Create the IngressClassParams resource by running the following command:

    $ oc create -f sample-single-lb-params.yaml
    Copy to Clipboard Toggle word wrap
  3. Create the IngressClass resource YAML file, for example, sample-single-lb-class.yaml, as follows:

    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: single-lb
    spec:
      controller: ingress.k8s.aws/alb
      parameters:
        apiGroup: elbv2.k8s.aws
        kind: IngressClassParams
        name: single-lb-params
    Copy to Clipboard Toggle word wrap

    where:

    apiVersion
    Specifies the API group and version of the IngressClass resource.
    metadata.name
    Specifies the ingress class name.
    spec.controller
    Specifies the controller name. The ingress.k8s.aws/alb value denotes that all ingress resources of this class should be managed by the AWS Load Balancer Controller.
    parameters.apiGroup
    Specifies the API group of the IngressClassParams resource.
    parameters.kind
    Specifies the resource type of the IngressClassParams resource.
    parameters.name
    Specifies the IngressClassParams resource name.
  4. Create the IngressClass resource by running the following command:

    $ oc create -f sample-single-lb-class.yaml
    Copy to Clipboard Toggle word wrap
  5. Create the AWSLoadBalancerController resource YAML file, for example, sample-single-lb.yaml, as follows:

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController
    metadata:
      name: cluster
    spec:
      subnetTagging: Auto
      ingressClass: single-lb
    Copy to Clipboard Toggle word wrap

    where:

    spec.ingressClass
    Specifies the name of the IngressClass resource.
  6. Create the AWSLoadBalancerController resource by running the following command:

    $ oc create -f sample-single-lb.yaml
    Copy to Clipboard Toggle word wrap
  7. Create the Ingress resource YAML file, for example, sample-multiple-ingress.yaml, as follows:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-1
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/group.order: "1"
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: single-lb
      rules:
      - host: example.com
        http:
            paths:
            - path: /blog
              pathType: Prefix
              backend:
                service:
                  name: example-1
                  port:
                    number: 80
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-2
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/group.order: "2"
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: single-lb
      rules:
      - host: example.com
        http:
            paths:
            - path: /store
              pathType: Prefix
              backend:
                service:
                  name: example-2
                  port:
                    number: 80
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-3
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/group.order: "3"
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: single-lb
      rules:
      - host: example.com
        http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: example-3
                  port:
                    number: 80
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the ingress name.
    alb.ingress.kubernetes.io/scheme
    Specifies the load balancer to provision in the public subnet to access the internet.
    alb.ingress.kubernetes.io/group.order
    Specifies the order in which the rules from the multiple ingress resources are matched when the request is received at the load balancer.
    alb.ingress.kubernetes.io/target-type
    Specifies that the load balancer will target OpenShift Container Platform nodes to reach the service.
    spec.ingressClassName
    Specifies the ingress class that belongs to this ingress.
    rules.host
    Specifies a domain name used for request routing.
    http.paths.path
    Specifies the path that must route to the service.
    backend.service.name
    Specifies the service name that serves the endpoint configured in the Ingress resource.
    port.number
    Specifies the port on the service that serves the endpoint.
  8. Create the Ingress resource by running the following command:

    $ oc create -f sample-multiple-ingress.yaml
    Copy to Clipboard Toggle word wrap
5.2.5.4. AWS Load Balancer Operator logs

To troubleshoot the AWS Load Balancer Operator, view the logs using the oc logs command. By viewing the logs, you can diagnose issues and monitor the activity of the Operator.

Procedure

  • View the logs of the AWS Load Balancer Operator by running the following command:

    $ oc logs -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager -c manager
    Copy to Clipboard Toggle word wrap

5.3. External DNS Operator

5.3.1. External DNS Operator release notes

The External DNS Operator deploys and manages ExternalDNS to provide name resolution for services and routes. This enables your external DNS provider to resolve hostnames directly to OpenShift Container Platform resources.

Important

The External DNS Operator is only supported on the x86_64 architecture.

These release notes track the development of the External DNS Operator in OpenShift Container Platform.

5.3.1.1. External DNS Operator 1.2

The External DNS Operator 1.2 release notes summarize all new features and enhancements, notable technical changes, major corrections from previous versions, and any known bugs upon general availability.

External DNS Operator 1.2.0

The following advisory is available for the External DNS Operator version 1.2.0:

5.3.1.2. External DNS Operator 1.1

The External DNS Operator 1.1 release notes summarize all new features and enhancements, notable technical changes, major corrections from previous versions, and any known bugs upon general availability.

External DNS Operator 1.1.1

The following advisory is available for the External DNS Operator version 1.1.1:

External DNS Operator 1.1.0

This release included a rebase of the operand from the upstream project version 0.13.1. The following advisory is available for the External DNS Operator version 1.1.0:

5.3.1.3. External DNS Operator 1.0

The External DNS Operator 1.0 release notes summarize all new features and enhancements, notable technical changes, major corrections from previous versions, and any known bugs upon general availability.

External DNS Operator 1.0.1

The following advisory is available for the External DNS Operator version 1.0.1:

External DNS Operator 1.0.0

The following advisory is available for the External DNS Operator version 1.0.0:

5.3.2. Understanding the External DNS Operator

To provide name resolution for services and routes from an External DNS provider to OpenShift Container Platform, use the External DNS Operator. This Operator deploys and manages ExternalDNS to synchronize your cluster resources with the external provider.

To prevent configuration errors when deploying the ExternalDNS resource, review the domain name limitations enforced by the External DNS Operator. Understanding these constraints ensures that your requested hostnames and domains are compatible with your underlying DNS provider.

The External DNS Operator uses the TXT registry that adds the prefix for TXT records. This reduces the maximum length of the domain name for TXT records. A DNS record cannot be present without a corresponding TXT record, so the domain name of the DNS record must follow the same limit as the TXT records. For example, a DNS record of <domain_name_from_source> results in a TXT record of external-dns-<record_type>-<domain_name_from_source>.

The domain name of the DNS records generated by the External DNS Operator has the following limitations:

Expand
Record typeNumber of characters

CNAME

44

Wildcard CNAME records on AzureDNS

42

A

48

Wildcard A records on AzureDNS

46

The following error shows in the External DNS Operator logs if the generated domain name exceeds any of the domain name limitations:

time="2022-09-02T08:53:57Z" level=error msg="Failure in zone test.example.io. [Id: /hostedzone/Z06988883Q0H0RL6UMXXX]"
time="2022-09-02T08:53:57Z" level=error msg="InvalidChangeBatch: [FATAL problem: DomainLabelTooLong (Domain label is too long) encountered with 'external-dns-a-hello-openshift-aaaaaaaaaa-bbbbbbbbbb-ccccccc']\n\tstatus code: 400, request id: e54dfd5a-06c6-47b0-bcb9-a4f7c3a4e0c6"
Copy to Clipboard Toggle word wrap
5.3.2.2. Deploying the External DNS Operator

You can deploy the External DNS Operator on-demand from the Software Catalog. Deploying the External DNS Operator creates a Subscription object.

The External DNS Operator implements the External DNS API from the olm.openshift.io API group. The External DNS Operator updates services, routes, and external DNS providers.

Prerequisites

  • You have installed the yq CLI tool.

Procedure

  1. Check the name of an install plan, such as install-zcvlr, by running the following command:

    $ oc -n external-dns-operator get sub external-dns-operator -o yaml | yq '.status.installplan.name'
    Copy to Clipboard Toggle word wrap
  2. Check if the status of an install plan is Complete by running the following command:

    $ oc -n external-dns-operator get ip <install_plan_name> -o yaml | yq '.status.phase'
    Copy to Clipboard Toggle word wrap
  3. View the status of the external-dns-operator deployment by running the following command:

    $ oc get -n external-dns-operator deployment/external-dns-operator
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                    READY     UP-TO-DATE   AVAILABLE   AGE
    external-dns-operator   1/1       1            1           23h
    Copy to Clipboard Toggle word wrap

5.3.2.3. Viewing External DNS Operator logs

To troubleshoot DNS configuration issues, view the External DNS Operator logs. Use the oc logs command to retrieve diagnostic information directly from the Operator pod.

Procedure

  • View the logs of the External DNS Operator by running the following command:

    $ oc logs -n external-dns-operator deployment/external-dns-operator -c external-dns-operator
    Copy to Clipboard Toggle word wrap

5.3.3. Installing the External DNS Operator

To manage DNS records on your cloud infrastructure, install the External DNS Operator. This Operator supports deployment on major cloud providers, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.

You can install the External DNS Operator by using the OpenShift Container Platform OperatorHub. You can then manage the Operator lifecycle directly from the web console.

Procedure

  1. Click OperatorsOperatorHub in the OpenShift Container Platform web console.
  2. Click External DNS Operator. You can use the Filter by keyword text box or the filter list to search for External DNS Operator from the list of Operators.
  3. Select the external-dns-operator namespace.
  4. On the External DNS Operator page, click Install.
  5. On the Install Operator page, ensure that you selected the following options:

    1. Update the channel as stable-v1.
    2. Installation mode as A specific name on the cluster.
    3. Installed namespace as external-dns-operator. If namespace external-dns-operator does not exist, the Operator gets created during the Operator installation.
    4. Select Approval Strategy as Automatic or Manual. The Approval Strategy defaults to Automatic.
    5. Click Install.

      If you select Automatic updates, the Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without any intervention.

      If you select Manual updates, the OLM creates an update request. As a cluster administrator, you must then manually approve that update request to have the Operator updated to the new version.

Verification

  • Verify that the External DNS Operator shows the Status as Succeeded on the Installed Operators dashboard.

You can use the OpenShift CLI (oc) to install the External DNS Operator. The Operator manages the installation process directly from your terminal without you having to use the web console.

Prerequisites

  • You are logged in to the OpenShift CLI (oc).

Procedure

  1. Create a Namespace object:

    1. Create a YAML file that defines the Namespace object:

      Example namespace.yaml file

      apiVersion: v1
      kind: Namespace
      metadata:
        name: external-dns-operator
      # ...
      Copy to Clipboard Toggle word wrap

    2. Create the Namespace object by running the following command:

      $ oc apply -f namespace.yaml
      Copy to Clipboard Toggle word wrap
  2. Create an OperatorGroup object:

    1. Create a YAML file that defines the OperatorGroup object:

      Example operatorgroup.yaml file

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: external-dns-operator
        namespace: external-dns-operator
      spec:
        upgradeStrategy: Default
        targetNamespaces:
        - external-dns-operator
      # ...
      Copy to Clipboard Toggle word wrap

    2. Create the OperatorGroup object by running the following command:

      $ oc apply -f operatorgroup.yaml
      Copy to Clipboard Toggle word wrap
  3. Create a Subscription object:

    1. Create a YAML file that defines the Subscription object:

      Example subscription.yaml file

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: external-dns-operator
        namespace: external-dns-operator
      spec:
        channel: stable-v1
        installPlanApproval: Automatic
        name: external-dns-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      # ...
      Copy to Clipboard Toggle word wrap

    2. Create the Subscription object by running the following command:

      $ oc apply -f subscription.yaml
      Copy to Clipboard Toggle word wrap

Verification

  1. Get the name of the install plan from the subscription by running the following command:

    $ oc -n external-dns-operator \
      get subscription external-dns-operator \
      --template='{{.status.installplan.name}}{{"\n"}}'
    Copy to Clipboard Toggle word wrap
  2. Verify that the status of the install plan is Complete by running the following command:

    $ oc -n external-dns-operator \
      get ip <install_plan_name> \
      --template='{{.status.phase}}{{"\n"}}'
    Copy to Clipboard Toggle word wrap
  3. Verify that the status of the external-dns-operator pod is Running by running the following command:

    $ oc -n external-dns-operator get pod
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                     READY   STATUS    RESTARTS   AGE
    external-dns-operator-5584585fd7-5lwqm   2/2     Running   0          11m
    Copy to Clipboard Toggle word wrap

  4. Verify that the catalog source of the subscription is redhat-operators by running the following command:

    $ oc -n external-dns-operator get subscription
    Copy to Clipboard Toggle word wrap
  5. Check the external-dns-operator version by running the following command:

    $ oc -n external-dns-operator get csv
    Copy to Clipboard Toggle word wrap

To customize the behavior of the External DNS Operator, configure the available parameters in the ExternalDNS custom resource (CR). By configuraing parameters, you can control how the Operator synchronizes services and routes with your external DNS provider.

To customize the behavior of the External DNS Operator, configure the available parameters in the ExternalDNS custom resource (CR). By configuring parameters, you can control how the Operator synchronizes services and routes with your external DNS provider.

Expand
ParameterDescription

spec

Enables the type of a cloud provider.

spec:
  provider:
    type: AWS
    aws:
      credentials:
        name: aws-access-key
Copy to Clipboard Toggle word wrap
  • provider.type: Specifies available options such as AWS, Google Cloud, Azure, and Infoblox.
  • provider.aws.credentials.name: Specifies a secret name for your cloud provider.

zones

Enables you to specify DNS zones by their domains. If you do not specify zones, the ExternalDNS resource discovers all of the zones present in your cloud provider account.

zones:
- "<zone_id>"
Copy to Clipboard Toggle word wrap
  • <zone_id>: Specifies the name of DNS zones.

domains

Enables you to specify AWS zones by their domains. If you do not specify domains, the ExternalDNS resource discovers all of the zones present in your cloud provider account.

domains:
- filterType: Include
  matchType: Exact
  name: "myzonedomain1.com"
- filterType: Include
  matchType: Pattern
  pattern: ".*\\.otherzonedomain\\.com"
Copy to Clipboard Toggle word wrap
  • domains.filterType: Specifies that the ExternalDNS resource includes the domain name.
  • domains.matchType: Specifies that the domain matching has to be exact as opposed to regular expression match.
  • domains.name: Specifies the name of the domain.
  • filterType.matchType: Specifies the regex-domain-filter flag in the ExternalDNS resource. You can limit possible domains by using a Regex filter.
  • filterType.pattern: Specifies the regex pattern to be used by the ExternalDNS resource to filter the domains of the target zones.

source

Enables you to specify the source for the DNS records, Service or Route.

source:
  type: Service
  service:
    serviceType:
      - LoadBalancer
      - ClusterIP
  labelFilter:
    matchLabels:
      external-dns.mydomain.org/publish: "yes"
  hostnameAnnotation: "Allow"
  fqdnTemplate:
  - "{{.Name}}.myzonedomain.com"
Copy to Clipboard Toggle word wrap
  • source: Specifies the settings for the source of DNS records.
  • source.type: Specifies that the ExternalDNS CR uses the Service type as the source for creating DNS records.
  • service.serviceType: Specifies the service-type-filter flag in the ExternalDNS resource. The serviceType contains the following fields: default: LoadBalancer; expected: ClusterIP; NodePort; LoadBalancer; ExternalName.
  • service.labelFilter: Specifies that the controller considers only those resources that match with label filter.
  • hostnameAnnotation: Specifies that the default value for hostnameAnnotation is Ignore which instructs ExternalDNS to generate DNS records by using the templates specified in the field fqdnTemplates. When the value is Allow the DNS records get generated based on the value specified in the external-dns.alpha.kubernetes.io/hostname annotation.
  • fqdnTemplate: Specifies that the External DNS Operator uses a string to generate DNS names from sources that do not define a hostname, or to add a hostname suffix when paired with the fake source.
source:
  type: OpenShiftRoute
  openshiftRouteOptions:
    routerName: default
    labelFilter:
      matchLabels:
        external-dns.mydomain.org/publish: "yes"
Copy to Clipboard Toggle word wrap
  • source.type: Specifies the creation of DNS records.
  • openshiftRouteOptions.routerName: Specifies if the source type is OpenShiftRoute. If so, you can pass the Ingress Controller name. The ExternalDNS resource uses the canonical name of the Ingress Controller as the target for CNAME records.

5.3.5. Creating DNS records on AWS

To create DNS records on AWS and AWS GovCloud, use the External DNS Operator. The Operator manages external name resolution for your cluster services directly through the Operator.

You can create DNS records on a public hosted zone for AWS by using the Red Hat External DNS Operator. You can use the same instructions to create DNS records on a hosted zone for AWS GovCloud.

Procedure

  1. Check the user profile by running the following command. The profile, such as system:admin, must have access to the kube-system namespace. If you do not have the credentials, you can fetch the credentials from the kube-system namespace to use the cloud provider client by running the following command:

    $ oc whoami
    Copy to Clipboard Toggle word wrap
  2. Fetch the values from the aws-creds secret that exists in the kube-system namespace.

    $ export AWS_ACCESS_KEY_ID=$(oc get secrets aws-creds -n kube-system  --template={{.data.aws_access_key_id}} | base64 -d)
    Copy to Clipboard Toggle word wrap
    $ export AWS_SECRET_ACCESS_KEY=$(oc get secrets aws-creds -n kube-system  --template={{.data.aws_secret_access_key}} | base64 -d)
    Copy to Clipboard Toggle word wrap
  3. Get the routes to check the domain:

    $ oc get routes --all-namespaces | grep console
    Copy to Clipboard Toggle word wrap

    Example output

    openshift-console          console             console-openshift-console.apps.testextdnsoperator.apacshift.support                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.testextdnsoperator.apacshift.support                     downloads           http    edge/Redirect          None
    Copy to Clipboard Toggle word wrap

  4. Get the list of DNS zones and find the DNS zone that corresponds to the domain of the route that you previously queried:

    $ aws route53 list-hosted-zones | grep testextdnsoperator.apacshift.support
    Copy to Clipboard Toggle word wrap

    Example output

    HOSTEDZONES	terraform	/hostedzone/Z02355203TNN1XXXX1J6O	testextdnsoperator.apacshift.support.	5
    Copy to Clipboard Toggle word wrap

  5. Create the ExternalDNS CR for the route source:

    $ cat <<EOF | oc create -f -
    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-aws
    spec:
      domains:
      - filterType: Include
        matchType: Exact
        name: testextdnsoperator.apacshift.support
      provider:
        type: AWS
      source:
        type: OpenShiftRoute
        openshiftRouteOptions:
          routerName: default
    EOF
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the name of the external DNS resource.
    spec.domains
    By default all hosted zones are selected as potential targets. You can include a hosted zone that you need.
    domains.matchType
    Specifies that the matching of the domain from the target zone has to be exact. Exact as opposed to regular expression match.
    domains.name
    Specifies the exact domain of the zone you want to update. The hostname of the routes must be subdomains of the specified domain.
    provider.type
    Specifies the AWS Route53 DNS provider.
    source
    Specifies the options for the source of DNS records.
    source.type
    Specifies the OpenShiftRoute resource as the source for the DNS records which gets created in the previously specified DNS provider.
    openshiftRouteOptions.routerName
    If the source is OpenShiftRoute, then you can pass the OpenShift Ingress Controller name. External DNS Operator selects the canonical hostname of that router as the target while creating the CNAME record.
  6. Check the records created for OpenShift Container Platform routes by using the following command:

    $ aws route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console
    Copy to Clipboard Toggle word wrap

To create DNS records in a different AWS account, configure the ExternalDNS Operator to use a shared Virtual Private Cloud (VPC). Your organization can then use a single Route 53 instance for name resolution across multiple accounts and projects.

Prerequisites

  • You have created two Amazon AWS accounts: one with a VPC and a Route 53 private hosted zone configured (Account A), and another for installing a cluster (Account B).
  • You have created an IAM Policy and IAM Role with the appropriate permissions in Account A for Account B to create DNS records in the Route 53 hosted zone of Account A.
  • You have installed a cluster in Account B into the existing VPC for Account A.
  • You have installed the ExternalDNS Operator in the cluster in Account B.

Procedure

  1. Get the Role ARN of the IAM Role that you created to allow Account B to access Account A’s Route 53 hosted zone by running the following command:

    $ aws --profile account-a iam get-role --role-name user-rol1 | head -1
    Copy to Clipboard Toggle word wrap

    Example output

    ROLE	arn:aws:iam::1234567890123:role/user-rol1	2023-09-14T17:21:54+00:00	3600	/	AROA3SGB2ZRKRT5NISNJN	user-rol1
    Copy to Clipboard Toggle word wrap

  2. Locate the private hosted zone to use with Account A’s credentials by running the following command:

    $ aws --profile account-a route53 list-hosted-zones | grep testextdnsoperator.apacshift.support
    Copy to Clipboard Toggle word wrap

    Example output

    HOSTEDZONES	terraform	/hostedzone/Z02355203TNN1XXXX1J6O	testextdnsoperator.apacshift.support. 5
    Copy to Clipboard Toggle word wrap

  3. Create the ExternalDNS object by running the following command:

    $ cat <<EOF | oc create -f -
    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-aws
    spec:
      domains:
      - filterType: Include
        matchType: Exact
        name: testextdnsoperator.apacshift.support
      provider:
        type: AWS
        aws:
          assumeRole:
            arn: arn:aws:iam::12345678901234:role/user-rol1
      source:
        type: OpenShiftRoute
        openshiftRouteOptions:
          routerName: default
    EOF
    Copy to Clipboard Toggle word wrap

    where:

    arn
    Specifies the Role ARN to have DNS records created in Account A.
  4. Check the records created for OpenShift Container Platform routes by entering the following command:

    $ aws --profile account-a route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console-openshift-console
    Copy to Clipboard Toggle word wrap

5.3.6. Creating DNS records on Azure

To create DNS records on Microsoft Azure, use the External DNS Operator. By using this Operator, you can manage external name resolution for your cluster services.

Important

Using the External DNS Operator on a Microsoft Entra Workload ID-enabled cluster or a cluster that runs in Microsoft Azure Government (MAG) regions is not supported.

5.3.6.1. Creating DNS records on an Azure DNS zone

To create DNS records on a public or private DNS zone for Azure, use the External DNS Operator. The Operator manages external name resolution for your cluster.

Prerequisites

  • You must have administrator privileges.
  • The admin user must have access to the kube-system namespace.

Procedure

  1. Fetch the credentials from the kube-system namespace to use the cloud provider client by running the following command:

    $ CLIENT_ID=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_client_id}} | base64 -d)
    Copy to Clipboard Toggle word wrap
    $ CLIENT_SECRET=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_client_secret}} | base64 -d)
    Copy to Clipboard Toggle word wrap
    $ RESOURCE_GROUP=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_resourcegroup}} | base64 -d)
    Copy to Clipboard Toggle word wrap
    $ SUBSCRIPTION_ID=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_subscription_id}} | base64 -d)
    Copy to Clipboard Toggle word wrap
    $ TENANT_ID=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_tenant_id}} | base64 -d)
    Copy to Clipboard Toggle word wrap
  2. Log in to Azure by running the following command:

    $ az login --service-principal -u "${CLIENT_ID}" -p "${CLIENT_SECRET}" --tenant "${TENANT_ID}"
    Copy to Clipboard Toggle word wrap
  3. Get a list of routes by running the following command:

    $ oc get routes --all-namespaces | grep console
    Copy to Clipboard Toggle word wrap

    Example output

    openshift-console          console             console-openshift-console.apps.test.azure.example.com                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.test.azure.example.com                     downloads           http    edge/Redirect          None
    Copy to Clipboard Toggle word wrap

  4. Get a list of DNS zones.

    1. For public DNS zones, enter the following command:

      $ az network dns zone list --resource-group "${RESOURCE_GROUP}"
      Copy to Clipboard Toggle word wrap
    2. For private DNS zones, enter the following command:

      $ az network private-dns zone list -g "${RESOURCE_GROUP}"
      Copy to Clipboard Toggle word wrap
  5. Create a YAML file, for example, external-dns-sample-azure.yaml, that defines the ExternalDNS object:

    Example external-dns-sample-azure.yaml file

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-azure
    spec:
      zones:
      - "/subscriptions/1234567890/resourceGroups/test-azure-xxxxx-rg/providers/Microsoft.Network/dnszones/test.azure.example.com"
      provider:
        type: Azure
      source:
        openshiftRouteOptions:
          routerName: default
        type: OpenShiftRoute
    # ...
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the External DNS name.
    spec.zones
    Specifies the zone ID. For a private DNS zone, change dnszones to privateDnsZones.
    provider.type
    Specifies the provider type.
    source.openshiftRouteOptions
    Specifies the options for the source of DNS records.
    routerName
    If the source type is OpenShiftRoute, you can pass the OpenShift Ingress Controller name. The External DNS Operator selects the canonical hostname of that router as the target while creating the CNAME record.
    source.type
    Specifies the route resource as the source for the Azure DNS records.

Troubleshooting

  1. Check the records created for the routes.

    1. For public DNS zones, enter the following command:

      $ az network dns record-set list -g "${RESOURCE_GROUP}" -z "${ZONE_NAME}" | grep console
      Copy to Clipboard Toggle word wrap
    2. For private DNS zones, enter the following command:

      $ az network private-dns record-set list -g "${RESOURCE_GROUP}" -z "${ZONE_NAME}" | grep console
      Copy to Clipboard Toggle word wrap

To create DNS records on Google Cloud, use the External DNS Operator. The DNS Operator manages external name resolution for your cluster services.

Important

Using the External DNS Operator on a cluster with Google Cloud Workload Identity enabled is not supported. For more information about the Google Cloud Workload Identity, see Google Cloud Workload Identity.

To create DNS records on Google Cloud, use the External DNS Operator. The DNS Operator manages external name resolution for your cluster services.

Prerequisites

  • You must have administrator privileges.

Procedure

  1. Copy the gcp-credentials secret in the encoded-gcloud.json file by running the following command:

    $ oc get secret gcp-credentials -n kube-system --template='{{$v := index .data "service_account.json"}}{{$v}}' | base64 -d - > decoded-gcloud.json
    Copy to Clipboard Toggle word wrap
  2. Export your Google credentials by running the following command:

    $ export GOOGLE_CREDENTIALS=decoded-gcloud.json
    Copy to Clipboard Toggle word wrap
  3. Activate your account by using the following command:

    $ gcloud auth activate-service-account  <client_email as per decoded-gcloud.json> --key-file=decoded-gcloud.json
    Copy to Clipboard Toggle word wrap
  4. Set your project by running the following command:

    $ gcloud config set project <project_id as per decoded-gcloud.json>
    Copy to Clipboard Toggle word wrap
  5. Get a list of routes by running the following command:

    $ oc get routes --all-namespaces | grep console
    Copy to Clipboard Toggle word wrap

    Example output

    openshift-console          console             console-openshift-console.apps.test.gcp.example.com                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.test.gcp.example.com                     downloads           http    edge/Redirect          None
    Copy to Clipboard Toggle word wrap

  6. Get a list of managed zones, such as qe-cvs4g-private-zone test.gcp.example.com, by running the following command:

    $ gcloud dns managed-zones list | grep test.gcp.example.com
    Copy to Clipboard Toggle word wrap
  7. Create a YAML file, for example, external-dns-sample-gcp.yaml, that defines the ExternalDNS object:

    Example external-dns-sample-gcp.yaml file

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-gcp
    spec:
      domains:
        - filterType: Include
          matchType: Exact
          name: test.gcp.example.com
      provider:
        type: GCP
      source:
        openshiftRouteOptions:
          routerName: default
        type: OpenShiftRoute
    # ...
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the External DNS name.
    spec.domains.filterType
    By default, all hosted zones are selected as potential targets. You can include your hosted zone.
    spec.domains.matchType
    Specifies the domain of the target that must match the string defined by the name key.
    spec.domains.name
    Specifies the exact domain of the zone you want to update. The hostname of the routes must be subdomains of the specified domain.
    spec.provider.type
    Specifies the provider type.
    source.openshiftRouteOptions
    Specifies options for the source of DNS records.
    openshiftRouteOptions.routerName
    If the source type is OpenShiftRoute, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating a CNAME record.
    type
    Specifies the route resource as the source for Google Cloud DNS records.
  8. Check the DNS records created for OpenShift Container Platform routes by running the following command:

    $ gcloud dns record-sets list --zone=qe-cvs4g-private-zone | grep console
    Copy to Clipboard Toggle word wrap

5.3.8. Creating DNS records on Infoblox

To create DNS records on Infoblox, use the External DNS Operator. The Operator manages external name resolution for your cluster services.

To create DNS records on Infoblox, use the External DNS Operator. The Operator manages external name resolution for your cluster services.

Prerequisites

  • You have access to the OpenShift CLI (oc).
  • You have access to the Infoblox UI.

Procedure

  1. Create a secret object with Infoblox credentials by running the following command:

    $ oc -n external-dns-operator create secret generic infoblox-credentials --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_USERNAME=<infoblox_username> --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_PASSWORD=<infoblox_password>
    Copy to Clipboard Toggle word wrap
  2. Get a list of routes by running the following command:

    $ oc get routes --all-namespaces | grep console
    Copy to Clipboard Toggle word wrap

    Example output

    openshift-console          console             console-openshift-console.apps.test.example.com                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.test.example.com                     downloads           http    edge/Redirect          None
    Copy to Clipboard Toggle word wrap

  3. Create a YAML file, for example, external-dns-sample-infoblox.yaml, that defines the ExternalDNS object:

    Example external-dns-sample-infoblox.yaml file

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-infoblox
    spec:
      provider:
        type: Infoblox
        infoblox:
          credentials:
            name: infoblox-credentials
          gridHost: ${INFOBLOX_GRID_PUBLIC_IP}
          wapiPort: 443
          wapiVersion: "2.3.1"
      domains:
      - filterType: Include
        matchType: Exact
        name: test.example.com
      source:
        type: OpenShiftRoute
        openshiftRouteOptions:
          routerName: default
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the External DNS name.
    provider.type
    Specifies the provider type.
    source.type
    Specifies options for the source of DNS records.
    routerName
    If the source type is OpenShiftRoute, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating a CNAME record.
  4. Create the ExternalDNS resource on Infoblox by running the following command:

    $ oc create -f external-dns-sample-infoblox.yaml
    Copy to Clipboard Toggle word wrap
  5. From the Infoblox UI, check the DNS records created for console routes:

    1. Click Data ManagementDNSZones.
    2. Select the zone name.

To propagate proxy settings to your deployed Operators, configure the cluster-wide proxy. The Operator Lifecycle Manager (OLM) automatically updates these Operators with the new HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables.

To enable the External DNS Operator to authenticate with the cluster-wide proxy, configure the Operator to trust the certificate authority (CA) of the proxy. This ensures secure communication when routing DNS traffic through the proxy.

Procedure

  1. Create the config map to contain the CA bundle in the external-dns-operator namespace by running the following command:

    $ oc -n external-dns-operator create configmap trusted-ca
    Copy to Clipboard Toggle word wrap
  2. To inject the trusted CA bundle into the config map, add the config.openshift.io/inject-trusted-cabundle=true label to the config map by running the following command:

    $ oc -n external-dns-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
    Copy to Clipboard Toggle word wrap
  3. Update the subscription of the External DNS Operator by running the following command:

    $ oc -n external-dns-operator patch subscription external-dns-operator --type='json' -p='[{"op": "add", "path": "/spec/config", "value":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}]}}]'
    Copy to Clipboard Toggle word wrap

Verification

  • After deploying the External DNS Operator, verify that the trusted CA environment variable is added by running the following command. The output must show trusted-ca for the external-dns-operator deployment.

    $ oc -n external-dns-operator exec deploy/external-dns-operator -c external-dns-operator -- printenv TRUSTED_CA_CONFIGMAP_NAME
    Copy to Clipboard Toggle word wrap

5.4. MetalLB Operator

5.4.1. About MetalLB and the MetalLB Operator

In OpenShift Container Platform clusters running on bare metal or without a cloud load balancer, you can use the MetalLB Operator to assign external IP addresses to LoadBalancer services. These services receive external IPs on the host network.

5.4.1.1. When to use MetalLB

To get fault-tolerant access to applications through an external IP on bare metal in OpenShift Container Platform, you can use MetalLB.

Using MetalLB is valuable when you have a bare-metal cluster, or an infrastructure that is like bare metal, and you want fault-tolerant access to an application through an external IP address.

You must configure your networking infrastructure to ensure that network traffic for the external IP address is routed from clients to the host network for the cluster.

After deploying MetalLB with the MetalLB Operator, when you add a service of type LoadBalancer, MetalLB provides a platform-native load balancer.

When external traffic enters your OpenShift Container Platform cluster through a MetalLB LoadBalancer service, the return traffic to the client has the external IP address of the load balancer as the source IP.

MetalLB operating in layer2 mode provides support for failover by utilizing a mechanism similar to IP failover. However, instead of relying on the virtual router redundancy protocol (VRRP) and keepalived, MetalLB leverages a gossip-based protocol to identify instances of node failure. When a failover is detected, another node assumes the role of the leader node, and a gratuitous ARP message is dispatched to broadcast this change.

MetalLB operating in layer3 or border gateway protocol (BGP) mode delegates failure detection to the network. The BGP router or routers that the OpenShift Container Platform nodes have established a connection with will identify any node failure and terminate the routes to that node.

Using MetalLB instead of IP failover is preferable for ensuring high availability of pods and services.

5.4.1.2. MetalLB Operator custom resources

In OpenShift Container Platform, you configure MetalLB deployment and IP advertisement through custom resources that the MetalLB Operator monitors. The resources include MetalLB, IPAddressPool, L2Advertisement, BGPAdvertisement, BGPPeer, and BFDProfile.

MetalLB
When you add a MetalLB custom resource to the cluster, the MetalLB Operator deploys MetalLB on the cluster. The Operator only supports a single instance of the custom resource. If the instance is deleted, the Operator removes MetalLB from the cluster.
IPAddressPool

MetalLB requires one or more pools of IP addresses that it can assign to a service when you add a service of type LoadBalancer. An IPAddressPool includes a list of IP addresses. The list can be a single IP address that is set using a range, such as 1.1.1.1-1.1.1.1, a range specified in CIDR notation, a range specified as a starting and ending address separated by a hyphen, or a combination of the three. An IPAddressPool requires a name. The documentation uses names like doc-example, doc-example-reserved, and doc-example-ipv6. The MetalLB controller assigns IP addresses from a pool of addresses in an IPAddressPool. L2Advertisement and BGPAdvertisement custom resources enable the advertisement of a given IP from a given pool. You can assign IP addresses from an IPAddressPool to services and namespaces by using the spec.serviceAllocation specification in the IPAddressPool custom resource.

Note

A single IPAddressPool can be referenced by a L2 advertisement and a BGP advertisement.

BGPPeer
The BGP peer custom resource identifies the BGP router for MetalLB to communicate with, the AS number of the router, the AS number for MetalLB, and customizations for route advertisement. MetalLB advertises the routes for service load-balancer IP addresses to one or more BGP peers.
BFDProfile
The BFD profile custom resource configures Bidirectional Forwarding Detection (BFD) for a BGP peer. BFD provides faster path failure detection than BGP alone provides.
L2Advertisement
The L2Advertisement custom resource advertises an IP coming from an IPAddressPool using the L2 protocol.
BGPAdvertisement
The BGPAdvertisement custom resource advertises an IP coming from an IPAddressPool using the BGP protocol.

After you add the MetalLB custom resource to the cluster and the Operator deploys MetalLB, the controller and speaker MetalLB software components begin running.

MetalLB validates all relevant custom resources.

5.4.1.3. MetalLB software components

In OpenShift Container Platform, you get external IPs for LoadBalancer services from two MetalLB components. The controller assigns IPs from address pools, and the speaker advertises them via layer 2 or BGP.

When you install the MetalLB Operator, the metallb-operator-controller-manager deployment starts a pod. The pod is the implementation of the Operator. The pod monitors for changes to all the relevant resources.

When the Operator starts an instance of MetalLB, it starts a controller deployment and a speaker daemon set.

Note

You can configure deployment specifications in the MetalLB custom resource to manage how controller and speaker pods deploy and run in your cluster. For more information about these deployment specifications, see the Additional resources section.

controller

The Operator starts the deployment and a single pod. When you add a service of type LoadBalancer, Kubernetes uses the controller to allocate an IP address from an address pool. In case of a service failure, verify you have the following entry in your controller pod logs:

Example output

"event":"ipAllocated","ip":"172.22.0.201","msg":"IP address assigned by controller
Copy to Clipboard Toggle word wrap

speaker

The Operator starts a daemon set for speaker pods. By default, a pod is started on each node in your cluster. You can limit the pods to specific nodes by specifying a node selector in the MetalLB custom resource when you start MetalLB. If the controller allocated the IP address to the service and service is still unavailable, read the speaker pod logs. If the speaker pod is unavailable, run the oc describe pod -n command.

For layer 2 mode, after the controller allocates an IP address for the service, the speaker pods use an algorithm to determine which speaker pod on which node will announce the load balancer IP address. The algorithm involves hashing the node name and the load balancer IP address. For more information, see "MetalLB and external traffic policy". The speaker uses Address Resolution Protocol (ARP) to announce IPv4 addresses and Neighbor Discovery Protocol (NDP) to announce IPv6 addresses.

For Border Gateway Protocol (BGP) mode, after the controller allocates an IP address for the service, each speaker pod advertises the load balancer IP address with its BGP peers. You can configure which nodes start BGP sessions with BGP peers.

Requests for the load balancer IP address are routed to the node with the speaker that announces the IP address. After the node receives the packets, the service proxy routes the packets to an endpoint for the service. The endpoint can be on the same node in the optimal case, or it can be on another node. The service proxy chooses an endpoint each time a connection is established.

5.4.1.4. MetalLB and external traffic policy

External traffic policy for MetalLB LoadBalancer services determines how the service proxy distributes traffic to pods. Set the policy to cluster for uniform distribution or to local to preserve client IP addresses.

With layer 2 mode, one node in your cluster receives all the traffic for the service IP address.

With BGP mode, a router on the host network opens a connection to one of the nodes in the cluster for a new client connection.

How your cluster handles the traffic after it enters the node is affected by the external traffic policy.

cluster

This is the default value for spec.externalTrafficPolicy.

With the cluster traffic policy, after the node receives the traffic, the service proxy distributes the traffic to all the pods in your service. This policy provides uniform traffic distribution across the pods, but it obscures the client IP address and it can appear to the application in your pods that the traffic originates from the node rather than the client.

local

With the local traffic policy, after the node receives the traffic, the service proxy only sends traffic to the pods on the same node. For example, if the speaker pod on node A announces the external service IP, then all traffic is sent to node A. After the traffic enters node A, the service proxy only sends traffic to pods for the service that are also on node A. Pods for the service that are on additional nodes do not receive any traffic from node A. Pods for the service on additional nodes act as replicas in case failover is needed.

This policy does not affect the client IP address. Application pods can determine the client IP address from the incoming connections.

Note

The following information is important when configuring the external traffic policy in BGP mode.

Although MetalLB advertises the load balancer IP address from all the eligible nodes, the number of nodes loadbalancing the service can be limited by the capacity of the router to establish equal-cost multipath (ECMP) routes. If the number of nodes advertising the IP is greater than the ECMP group limit of the router, the router will use less nodes than the ones advertising the IP.

For example, if the external traffic policy is set to local and the router has an ECMP group limit set to 16 and the pods implementing a LoadBalancer service are deployed on 30 nodes, this would result in pods deployed on 14 nodes not receiving any traffic. In this situation, it would be preferable to set the external traffic policy for the service to cluster.

5.4.1.5. MetalLB concepts for layer 2 mode

MetalLB in layer 2 mode announces the external IP for a LoadBalancer service from one node via ARP or NDP. All traffic for the service goes through that node, and failover to another node is automatic when the node becomes unavailable.

Note

In layer 2 mode, MetalLB relies on ARP and NDP. These protocols implement local address resolution within a specific subnet. In this context, the client must be able to reach the VIP assigned by MetalLB that exists on the same subnet as the nodes announcing the service in order for MetalLB to work.

The speaker pod responds to ARP requests for IPv4 services and NDP requests for IPv6.

In layer 2 mode, all traffic for a service IP address is routed through one node. After traffic enters the node, the service proxy for the CNI network provider distributes the traffic to all the pods for the service.

Because all traffic for a service enters through a single node in layer 2 mode, in a strict sense, MetalLB does not implement a load balancer for layer 2. Rather, MetalLB implements a failover mechanism for layer 2 so that when a speaker pod becomes unavailable, a speaker pod on a different node can announce the service IP address.

When a node becomes unavailable, failover is automatic. The speaker pods on the other nodes detect that a node is unavailable and a new speaker pod and node take ownership of the service IP address from the failed node.

The preceding graphic shows the following concepts related to MetalLB:

  • An application is available through a service that has a cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 192.168.100.200.
  • Nodes 1 and 3 have a pod for the application.
  • The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods.
  • Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
  • The speaker pod on node 1 uses ARP to announce the external IP address for the service, 192.168.100.200. The speaker pod that announces the external IP address must be on the same node as an endpoint for the service and the endpoint must be in the Ready condition.
  • Client traffic is routed to the host network and connects to the 192.168.100.200 IP address. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.

    • If the external traffic policy for the service is set to cluster, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running. Only that node can receive traffic for the service.
    • If the external traffic policy for the service is set to local, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running and at least an endpoint of the service. Only that node can receive traffic for the service. In the preceding graphic, either node 1 or 3 would advertise 192.168.100.200.
  • If node 1 becomes unavailable, the external IP address fails over to another node. On another node that has an instance of the application pod and service endpoint, the speaker pod begins to announce the external IP address, 192.168.100.200 and the new node receives the client traffic. In the diagram, the only candidate is node 3.
5.4.1.6. MetalLB concepts for BGP mode

MetalLB in border gateway protocol (BGP) mode advertises load balancer IP addresses to BGP peers from each speaker pod. The router sends traffic to one of the nodes, so load is distributed across nodes and the router switches to another node when one becomes unavailable.

It is also possible to advertise the IPs coming from a given pool to a specific set of peers by adding an optional list of BGP peers.

BGP peers are commonly network routers that are configured to use the BGP protocol. When a router receives traffic for the load balancer IP address, the router picks one of the nodes with a speaker pod that advertised the IP address. The router sends the traffic to that node. After traffic enters the node, the service proxy for the CNI network plugin distributes the traffic to all the pods for the service.

The directly-connected router on the same layer 2 network segment as the cluster nodes can be configured as a BGP peer. If the directly-connected router is not configured as a BGP peer, you need to configure your network so that packets for load balancer IP addresses are routed between the BGP peers and the cluster nodes that run the speaker pods.

Each time a router receives new traffic for the load balancer IP address, it creates a new connection to a node. Each router manufacturer has an implementation-specific algorithm for choosing which node to initiate the connection with. However, the algorithms commonly are designed to distribute traffic across the available nodes for the purpose of balancing the network load.

If a node becomes unavailable, the router initiates a new connection with another node that has a speaker pod that advertises the load balancer IP address.

Figure 5.1. MetalLB topology diagram for BGP mode

The preceding graphic shows the following concepts related to MetalLB:

  • An application is available through a service that has an IPv4 cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 203.0.113.200.
  • Nodes 2 and 3 have a pod for the application.
  • The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods. You can configure MetalLB to specify which nodes run the speaker pods.
  • Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
  • Each speaker pod starts a BGP session with all BGP peers and advertises the load balancer IP addresses or aggregated routes to the BGP peers. The speaker pods advertise that they are part of Autonomous System 65010. The diagram shows a router, R1, as a BGP peer within the same Autonomous System. However, you can configure MetalLB to start BGP sessions with peers that belong to other Autonomous Systems.
  • All the nodes with a speaker pod that advertises the load balancer IP address can receive traffic for the service.

    • If the external traffic policy for the service is set to cluster, all the nodes where a speaker pod is running advertise the 203.0.113.200 load balancer IP address and all the nodes with a speaker pod can receive traffic for the service. The host prefix is advertised to the router peer only if the external traffic policy is set to cluster.
    • If the external traffic policy for the service is set to local, then all the nodes where a speaker pod is running and at least an endpoint of the service is running can advertise the 203.0.113.200 load balancer IP address. Only those nodes can receive traffic for the service. In the preceding graphic, nodes 2 and 3 would advertise 203.0.113.200.
  • You can configure MetalLB to control which speaker pods start BGP sessions with specific BGP peers by specifying a node selector when you add a BGP peer custom resource.
  • Any routers, such as R1, that are configured to use BGP can be set as BGP peers.
  • Client traffic is routed to one of the nodes on the host network. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.
  • If a node becomes unavailable, the router detects the failure and initiates a new connection with another node. You can configure MetalLB to use a Bidirectional Forwarding Detection (BFD) profile for BGP peers. BFD provides faster link failure detection so that routers can initiate new connections earlier than without BFD.
5.4.1.7. Limitations and restrictions

MetalLB has limitations for infrastructure, layer 2 mode, and BGP mode in OpenShift Container Platform. Consider infrastructure fit, layer 2 single-node bandwidth and failover, and BGP connection resets and single ASN when you plan your deployment.

You can use MetalLB for bare metal and on-premise installations that lack a native load balancer.

In addition to bare-metal installations, installations of OpenShift Container Platform on some infrastructures might not include a native load-balancer capability. For example, the following infrastructures can benefit from adding the MetalLB Operator:

  • Bare metal
  • VMware vSphere
  • IBM Z® and IBM® LinuxONE
  • IBM Z® and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM
  • IBM Power®

MetalLB Operator and MetalLB are supported with the OpenShift SDN and OVN-Kubernetes network providers.

5.4.1.7.2. Limitations for layer 2 mode

In OpenShift Container Platform, MetalLB layer 2 mode is limited to single-node bandwidth and failover depends on client ARP handling. Avoid using the same VLAN for MetalLB and an additional network to prevent connection failures.

5.4.1.7.2.1. Single-node bottleneck

MetalLB routes all traffic for a service through a single node, the node can become a bottleneck and limit performance.

Layer 2 mode limits the ingress bandwidth for your service to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to direct traffic.

5.4.1.7.2.2. Slow failover performance

Failover between nodes depends on cooperation from the clients. When a failover occurs, MetalLB sends gratuitous ARP packets to notify clients that the MAC address associated with the service IP has changed.

Most client operating systems handle gratuitous ARP packets correctly and update their neighbor caches promptly. When clients update their caches quickly, failover completes within a few seconds. Clients typically fail over to a new node within 10 seconds. However, some client operating systems either do not handle gratuitous ARP packets at all or have outdated implementations that delay the cache update.

Recent versions of common operating systems such as Windows, macOS, and Linux implement layer 2 failover correctly. Issues with slow failover are not expected except for older and less common client operating systems.

To minimize the impact from a planned failover on outdated clients, keep the old node running for a few minutes after flipping leadership. The old node can continue to forward traffic for outdated clients until their caches refresh.

During an unplanned failover, the service IPs are unreachable until the outdated clients refresh their cache entries.

Using the same VLAN for both MetalLB and an additional network interface set up on a source pod might result in a connection failure. This occurs when both the MetalLB IP and the source pod reside on the same node.

To avoid connection failures, place the MetalLB IP in a different subnet from the one where the source pod resides. This configuration ensures that traffic from the source pod will take the default gateway. Consequently, the traffic can effectively reach its destination by using the OVN overlay network, ensuring that the connection functions as intended.

5.4.1.7.3. Limitations for BGP mode

In OpenShift Container Platform, MetalLB border gateway protocol (BGP) mode can reset active connections when a BGP session terminates and requires a single ASN and router ID for all BGP peers. Use a node selector when adding a BGP peer to limit which nodes run BGP sessions and reduce the impact of node faults.

MetalLB shares a limitation that is common to BGP-based load balancing. When a BGP session terminates, such as when a node fails or when a speaker pod restarts, the session termination might result in resetting all active connections. End users can experience a Connection reset by peer message.

The consequence of a terminated BGP session is implementation-specific for each router manufacturer. However, you can anticipate that a change in the number of speaker pods affects the number of BGP sessions and that active connections with BGP peers will break.

To avoid or reduce the likelihood of a service interruption, you can specify a node selector when you add a BGP peer. By limiting the number of nodes that start BGP sessions, a fault on a node that does not have a BGP session has no affect on connections to the service.

When you add a BGP peer custom resource, you specify the spec.myASN field to identify the Autonomous System Number (ASN) that MetalLB belongs to. OpenShift Container Platform uses an implementation of BGP with MetalLB that requires MetalLB to belong to a single ASN. If you attempt to add a BGP peer and specify a different value for spec.myASN than an existing BGP peer custom resource, you receive an error.

Similarly, when you add a BGP peer custom resource, the spec.routerID field is optional. If you specify a value for this field, you must specify the same value for all other BGP peer custom resources that you add.

The limitation to support a single ASN and single router ID is a difference with the community-supported implementation of MetalLB.

5.4.2. Installing the MetalLB Operator

As a cluster administrator, you can add the MetalLB Operator so that the Operator can manage the lifecycle for an instance of MetalLB on your cluster.

MetalLB and IP failover are incompatible. If you configured IP failover for your cluster, perform the steps to remove IP failover before you install the Operator.

As a cluster administrator, you can install the MetalLB Operator by using the OpenShift Container Platform web console.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the OpenShift Container Platform web console, navigate to OperatorsOperatorHub.
  2. Type a keyword into the Filter by keyword box or scroll to find the Operator you want. For example, type metallb to find the MetalLB Operator.

    You can also filter options by Infrastructure Features. For example, select Disconnected if you want to see Operators that work in disconnected environments, also known as restricted network environments.

  3. On the Install Operator page, accept the defaults and click Install.

Verification

  1. To confirm that the installation is successful:

    1. Navigate to the OperatorsInstalled Operators page.
    2. Check that the Operator is installed in the openshift-operators namespace and that its status is Succeeded.
  2. If the Operator is not installed successfully, check the status of the Operator and review the logs:

    1. Navigate to the OperatorsInstalled Operators page and inspect the Status column for any errors or failures.
    2. Navigate to the WorkloadsPods page and check the logs in any pods in the openshift-operators project that are reporting issues.
5.4.2.2. Installing from OperatorHub using the CLI

To install the MetalLB Operator from the software catalog in OpenShift Container Platform without using the web console, you can use the OpenShift CLI (oc).

It is recommended that when using the CLI you install the Operator in the metallb-system namespace.

Prerequisites

  • A cluster installed on bare-metal hardware.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a namespace for the MetalLB Operator by entering the following command:

    $ cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: metallb-system
    EOF
    Copy to Clipboard Toggle word wrap
  2. Create an Operator group custom resource (CR) in the namespace:

    $ cat << EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: metallb-operator
      namespace: metallb-system
    EOF
    Copy to Clipboard Toggle word wrap
  3. Confirm the Operator group is installed in the namespace:

    $ oc get operatorgroup -n metallb-system
    Copy to Clipboard Toggle word wrap

    Example output

    NAME               AGE
    metallb-operator   14m
    Copy to Clipboard Toggle word wrap

  4. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, metallb-sub.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: metallb-operator-sub
        namespace: metallb-system
      spec:
        channel: stable
        name: metallb-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      Copy to Clipboard Toggle word wrap
      • For the spec.source parameter, must specify the redhat-operators value.
    2. To create the Subscription CR, run the following command:

      $ oc create -f metallb-sub.yaml
      Copy to Clipboard Toggle word wrap
  5. Optional: To ensure BGP and BFD metrics appear in Prometheus, you can label the namespace as in the following command:

    $ oc label ns metallb-system "openshift.io/cluster-monitoring=true"
    Copy to Clipboard Toggle word wrap

Verification

The verification steps assume the MetalLB Operator is installed in the metallb-system namespace.

  1. Confirm the install plan is in the namespace:

    $ oc get installplan -n metallb-system
    Copy to Clipboard Toggle word wrap

    Example output

    NAME            CSV                                   APPROVAL    APPROVED
    install-wzg94   metallb-operator.4.16.0-nnnnnnnnnnnn   Automatic   true
    Copy to Clipboard Toggle word wrap

    Note

    Installation of the Operator might take a few seconds.

  2. To verify that the Operator is installed, enter the following command and then check that output shows Succeeded for the Operator:

    $ oc get clusterserviceversion -n metallb-system \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase
    Copy to Clipboard Toggle word wrap
5.4.2.3. Starting MetalLB on your cluster

To start MetalLB on your cluster after installing the MetalLB Operator in OpenShift Container Platform, you create a single MetalLB custom resource.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the MetalLB Operator.

Procedure

  1. Create a single instance of a MetalLB custom resource:

    $ cat << EOF | oc apply -f -
    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    EOF
    Copy to Clipboard Toggle word wrap
    • For the metdata.namespace parameter, substitute metallb-system with openshift-operators if you installed the MetalLB Operator using the web console.

Verification

Confirm that the deployment for the MetalLB controller and the daemon set for the MetalLB speaker are running.

  1. Verify that the deployment for the controller is running:

    $ oc get deployment -n metallb-system controller
    Copy to Clipboard Toggle word wrap

    Example output

    NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    controller   1/1     1            1           11m
    Copy to Clipboard Toggle word wrap

  2. Verify that the daemon set for the speaker is running:

    $ oc get daemonset -n metallb-system speaker
    Copy to Clipboard Toggle word wrap

    Example output

    NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
    speaker   6         6         6       6            6           kubernetes.io/os=linux   18m
    Copy to Clipboard Toggle word wrap

    The example output indicates 6 speaker pods. The number of speaker pods in your cluster might differ from the example output. Make sure the output indicates one pod for each node in your cluster.

5.4.2.4. Deployment specifications for MetalLB

Deployment specifications in the MetalLB custom resource control how the MetalLB controller and speaker pods deploy and run in OpenShift Container Platform.

Use deployment specifications to manage the following tasks:

  • Select nodes for MetalLB pod deployment.
  • Manage scheduling by using pod priority and pod affinity.
  • Assign CPU limits for MetalLB pods.
  • Assign a container RuntimeClass for MetalLB pods.
  • Assign metadata for MetalLB pods.
5.4.2.4.1. Limit speaker pods to specific nodes

You can limit MetalLB speaker pods to specific nodes in OpenShift Container Platform by configuring a node selector in the MetalLB custom resource. Only nodes that run a speaker pod advertise load balancer IP addresses, so you control which nodes serve MetalLB traffic.

The most common reason to limit the speaker pods to specific nodes is to ensure that only nodes with network interfaces on specific networks advertise load balancer IP addresses.

If you limit the speaker pods to specific nodes and specify local for the external traffic policy of a service, then you must ensure that the application pods for the service are deployed to the same nodes.

Example configuration to limit speaker pods to worker nodes

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  speakerTolerations:
  - key: "Example"
    operator: "Exists"
    effect: "NoExecute"
Copy to Clipboard Toggle word wrap

  • In this example configuration, the spec.nodeSelector field assigns the speaker pods to worker nodes. You can specify labels that you assigned to nodes or any valid node selector.
  • In this example configuration, spec.speakerToTolerations pod that this toleration is attached to tolerates any taint that matches the key and effect values by using the operator value.

After you apply a manifest with the spec.nodeSelector field, you can check the number of pods that the Operator deployed with the oc get daemonset -n metallb-system speaker command. Similarly, you can display the nodes that match your labels with a command like oc get nodes -l node-role.kubernetes.io/worker=.

You can optionally allow the node to control which speaker pods should, or should not, be scheduled on them by using affinity rules. You can also limit these pods by applying a list of tolerations. For more information about affinity rules, taints, and tolerations, see the additional resources.

To control scheduling of MetalLB controller and speaker pods in OpenShift Container Platform, you can assign pod priority and pod affinity in the MetalLB custom resource. You create a PriorityClass and set priorityClassName and affinity in the MetalLB spec, then apply the configuration.

The pod priority indicates the relative importance of a pod on a node and schedules the pod based on this priority. Set a high priority on your controller or speaker pod to ensure scheduling priority over other pods on the node.

Pod affinity manages relationships among pods. Assign pod affinity to the controller or speaker pods to control on what node the scheduler places the pod in the context of pod relationships. For example, you can use pod affinity rules to ensure that certain pods are located on the same node or nodes, which can help improve network communication and reduce latency between those components.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.
  • You have installed the MetalLB Operator.
  • You have started the MetalLB Operator on your cluster.

Procedure

  1. Create a PriorityClass custom resource, such as myPriorityClass.yaml, to configure the priority level. This example defines a PriorityClass named high-priority with a value of 1000000. Pods that are assigned this priority class are considered higher priority during scheduling compared to pods with lower priority classes:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
    Copy to Clipboard Toggle word wrap
  2. Apply the PriorityClass custom resource configuration:

    $ oc apply -f myPriorityClass.yaml
    Copy to Clipboard Toggle word wrap
  3. Create a MetalLB custom resource, such as MetalLBPodConfig.yaml, to specify the priorityClassName and podAffinity values:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      controllerConfig:
        priorityClassName: high-priority
        affinity:
          podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                 app: metallb
              topologyKey: kubernetes.io/hostname
      speakerConfig:
        priorityClassName: high-priority
        affinity:
          podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                 app: metallb
              topologyKey: kubernetes.io/hostname
    Copy to Clipboard Toggle word wrap

    where:

    spec.controllerConfig.priorityClassName
    Specifies the priority class for the MetalLB controller pods. In this case, it is set to high-priority.
    spec.controllerConfig.affinity.podAffinity
    Specifies that you are configuring pod affinity rules. These rules dictate how pods are scheduled in relation to other pods or nodes. This configuration instructs the scheduler to schedule pods that have the label app: metallb onto nodes that share the same hostname. This helps to co-locate MetalLB-related pods on the same nodes, potentially optimizing network communication, latency, and resource usage between these pods.
  4. Apply the MetalLB custom resource configuration by running the following command:

    $ oc apply -f MetalLBPodConfig.yaml
    Copy to Clipboard Toggle word wrap

Verification

  • To view the priority class that you assigned to pods in the metallb-system namespace, run the following command:

    $ oc get pods -n metallb-system -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                                 PRIORITY
    controller-584f5c8cd8-5zbvg                          high-priority
    metallb-operator-controller-manager-9c8d9985-szkqg   <none>
    metallb-operator-webhook-server-c895594d4-shjgx      <none>
    speaker-dddf7                                        high-priority
    Copy to Clipboard Toggle word wrap

  • Verify that the scheduler placed pods according to pod affinity rules by viewing the metadata for the node of the pod. For example:

    $ oc get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name -n metallb-system
    Copy to Clipboard Toggle word wrap

To manage compute resources on nodes running MetalLB in OpenShift Container Platform, you can assign CPU limits to the controller and speaker pods in the MetalLB custom resource. This ensures that all pods on the node have the necessary compute resources to manage workloads and cluster housekeeping.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.
  • You have installed the MetalLB Operator.

Procedure

  1. Create a MetalLB custom resource file, such as CPULimits.yaml, to specify the cpu value for the controller and speaker pods:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      controllerConfig:
        resources:
          limits:
            cpu: "200m"
      speakerConfig:
        resources:
          limits:
            cpu: "300m"
    Copy to Clipboard Toggle word wrap
  2. Apply the MetalLB custom resource configuration:

    $ oc apply -f CPULimits.yaml
    Copy to Clipboard Toggle word wrap

Verification

  • To view compute resources for a pod, run the following command, replacing <pod_name> with your target pod:

    $ oc describe pod <pod_name>
    Copy to Clipboard Toggle word wrap

5.4.3. Upgrading the MetalLB Operator

The Subscription custom resource (CR) for the MetalLB Operator is used to manage whether the Operator is upgraded automatically or manually.

By default, the Subscription CR assigns the namespace to metallb-system and automatically sets the installPlanApproval parameter to Automatic. This means that when Red Hat-provided Operator catalogs include a newer version of the MetalLB Operator, the MetalLB Operator is automatically upgraded.

If you need to manually control upgrading the MetalLB Operator, set the installPlanApproval parameter to Manual.

5.4.3.1. Manually upgrading the MetalLB Operator

To manually control when the MetalLB Operator upgrades in OpenShift Container Platform, you set installPlanApproval to Manual in the Subscription custom resource and approve the install plan. You then verify the upgrade by using the ClusterServiceVersion status.

Prerequisites

  • You updated your cluster to the latest z-stream release.
  • You used OperatorHub to install the MetalLB Operator.
  • Access the cluster as a user with the cluster-admin role.

Procedure

  1. Get the YAML definition of the metallb-operator subscription in the metallb-system namespace by entering the following command:

    $ oc -n metallb-system get subscription metallb-operator -o yaml
    Copy to Clipboard Toggle word wrap
  2. Edit the Subscription CR by setting the installPlanApproval parameter to Manual:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: metallb-operator
      namespace: metallb-system
    # ...
    spec:
       channel: stable
       installPlanApproval: Manual
       name: metallb-operator
       source: redhat-operators
       sourceNamespace: openshift-marketplace
    # ...
    Copy to Clipboard Toggle word wrap
  3. Find the latest OpenShift Container Platform 4.16 version of the MetalLB Operator by entering the following command:

    $ oc -n metallb-system get csv
    Copy to Clipboard Toggle word wrap
  4. Check the install plan that exists in the namespace by entering the following command.

    $ oc -n metallb-system get installplan
    Copy to Clipboard Toggle word wrap

    Example output that shows install-tsz2g as a manual install plan

    NAME            CSV                                     APPROVAL    APPROVED
    install-shpmd   metallb-operator.v4.16.0-202502261233   Automatic   true
    install-tsz2g   metallb-operator.v4.16.0-202503102139   Manual      false
    Copy to Clipboard Toggle word wrap

  5. Edit the install plan that exists in the namespace by entering the following command. Ensure that you replace <name_of_installplan> with the name of the install plan, such as install-tsz2g.

    $ oc edit installplan <name_of_installplan> -n metallb-system
    Copy to Clipboard Toggle word wrap
    1. With the install plan open in your editor, set the spec.approval parameter to Manual and set the spec.approved parameter to true.

      Note

      After you edit the install plan, the upgrade operation starts. If you enter the oc -n metallb-system get csv command during the upgrade operation, the output might show the Replacing or the Pending status.

Verification

  • To verify that the Operator is upgraded, enter the following command and then check that output shows Succeeded for the Operator:

    $ oc -n metallb-system get csv
    Copy to Clipboard Toggle word wrap
5.4.3.2. Additional resources

With the Cluster Network Operator, you can manage networking in OpenShift Container Platform, including how to view status, enable IP forwarding, and collect logs.

You can use the Cluster Network Operator (CNO) to deploy and manage cluster network components on an OpenShift Container Platform cluster, including the Container Network Interface (CNI) network plugin selected for the cluster during installation.

5.5.1. Cluster Network Operator

The Cluster Network Operator implements the network API from the operator.openshift.io API group. The Operator deploys the OVN-Kubernetes network plugin, or the network provider plugin that you selected during cluster installation, by using a daemon set.

The Cluster Network Operator is deployed during installation as a Kubernetes Deployment.

Procedure

  1. Run the following command to view the Deployment status:

    $ oc get -n openshift-network-operator deployment/network-operator
    Copy to Clipboard Toggle word wrap

    Example output

    NAME               READY   UP-TO-DATE   AVAILABLE   AGE
    network-operator   1/1     1            1           56m
    Copy to Clipboard Toggle word wrap

  2. Run the following command to view the state of the Cluster Network Operator:

    $ oc get clusteroperator/network
    Copy to Clipboard Toggle word wrap

    Example output

    NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
    network   4.16.1     True        False         False      50m
    Copy to Clipboard Toggle word wrap

    The following fields provide information about the status of the operator: AVAILABLE, PROGRESSING, and DEGRADED. The AVAILABLE field is True when the Cluster Network Operator reports an available status condition.

5.5.2. Viewing the cluster network configuration

You can view your OpenShift Container Platform cluster network configuration by using the oc describe command for the network.config/cluster resource.

Procedure

  • Use the oc describe command to view the cluster network configuration:

    $ oc describe network.config/cluster
    Copy to Clipboard Toggle word wrap

    Example output

    Name:         cluster
    Namespace:
    Labels:       <none>
    Annotations:  <none>
    API Version:  config.openshift.io/v1
    Kind:         Network
    Metadata:
      Creation Timestamp:  2024-08-08T11:25:56Z
      Generation:          3
      Resource Version:    29821
      UID:                 808dd2be-5077-4ff7-b6bb-21b7110126c7
    Spec:
      Cluster Network:
        Cidr:         10.128.0.0/14
        Host Prefix:  23
      Network Type:   OVNKubernetes
      Service Network:
        172.30.0.0/16
    Status
      Cluster Network:
        Cidr:               10.128.0.0/14
        Host Prefix:        23
      Cluster Network MTU:  8951
      Network Type:         OVNKubernetes
      Service Network:
        172.30.0.0/16
    Events:  <none>
    Copy to Clipboard Toggle word wrap

    where:

    spec
    Specifies the field that displays the configured state of the cluster network.
    Status
    Displays the current state of the cluster network configuration.

5.5.3. Viewing Cluster Network Operator status

You can inspect the status and view the details of the Cluster Network Operator by using the oc describe command.

Procedure

  • Run the following command to view the status of the Cluster Network Operator:

    $ oc describe clusteroperators/network
    Copy to Clipboard Toggle word wrap

5.5.4. Enabling IP forwarding globally

From OpenShift Container Platform 4.14 onward, OVN-Kubernetes disables global IP forwarding by default. By setting the Cluster Network Operator gatewayConfig.ipForwarding spec to Global, you can enable cluster-wide forwarding.

Procedure

  1. Backup the existing network configuration by running the following command:

    $ oc get network.operator cluster -o yaml > network-config-backup.yaml
    Copy to Clipboard Toggle word wrap
  2. Run the following command to modify the existing network configuration:

    $ oc edit network.operator cluster
    Copy to Clipboard Toggle word wrap
    1. Add or update the following block under spec as illustrated in the following example:

      spec:
        clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
        serviceNetwork:
        - 172.30.0.0/16
        networkType: OVNKubernetes
        clusterNetworkMTU: 8900
        defaultNetwork:
          ovnKubernetesConfig:
            gatewayConfig:
              ipForwarding: Global
      Copy to Clipboard Toggle word wrap
    2. Save and close the file.
  3. After applying the changes, the OpenShift Cluster Network Operator (CNO) applies the update across the cluster. You can monitor the progress by using the following command:

    $ oc get clusteroperators network
    Copy to Clipboard Toggle word wrap

    The status should eventually report as Available, Progressing=False, and Degraded=False.

  4. Alternatively, you can enable IP forwarding globally by running the following command:

    $ oc patch network.operator cluster -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding": "Global"}}}}}' --type=merge
    Copy to Clipboard Toggle word wrap
    Note

    The other valid option for this parameter is Restricted in case you want to revert this change. Restricted is the default and with that setting global IP address forwarding is disabled.

5.5.5. Viewing Cluster Network Operator logs

You can view Cluster Network Operator logs by using the oc logs command.

Procedure

  • Run the following command to view the logs of the Cluster Network Operator:

    $ oc logs --namespace=openshift-network-operator deployment/network-operator
    Copy to Clipboard Toggle word wrap

5.5.6. Cluster Network Operator configuration

To manage cluster networking, configure the Cluster Network Operator (CNO) Network custom resource (CR) named cluster so the cluster uses the correct IP ranges and network plugin settings for reliable pod and service connectivity. Some settings and fields are inherited at the time of install.

The CNO configuration inherits the following fields during cluster installation from the Network API in the Network.config.openshift.io API group:

clusterNetwork
IP address pools from which pod IP addresses are allocated.
serviceNetwork
IP address pool for services.
defaultNetwork.type
Cluster network plugin. OVNKubernetes is the only supported plugin during installation.
Note

After cluster installation, you can only modify the clusterNetwork IP address range. The default network type can only be changed from OpenShift SDN to OVN-Kubernetes through migration.

You can specify the cluster network plugin configuration for your cluster by setting the fields for the defaultNetwork object in the CNO object named cluster.

The fields for the Cluster Network Operator (CNO) are described in the following table:

Expand
Table 5.1. Cluster Network Operator configuration object
FieldTypeDescription

metadata.name

string

The name of the CNO object. This name is always cluster.

spec.clusterNetwork

array

A list specifying the blocks of IP addresses from which pod IP addresses are allocated and the subnet prefix length assigned to each individual node in the cluster. For example:

spec:
  clusterNetwork:
  - cidr: 10.128.0.0/19
    hostPrefix: 23
  - cidr: 10.128.32.0/19
    hostPrefix: 23
Copy to Clipboard Toggle word wrap

spec.serviceNetwork

array

A block of IP addresses for services. The OpenShift SDN and OVN-Kubernetes network plugins support only a single IP address block for the service network. For example:

spec:
  serviceNetwork:
  - 172.30.0.0/14
Copy to Clipboard Toggle word wrap

This value is ready-only and inherited from the Network.config.openshift.io object named cluster during cluster installation.

spec.defaultNetwork

object

Configures the network plugin for the cluster network.

spec.kubeProxyConfig

object

The fields for this object specify the kube-proxy configuration. If you are using the OVN-Kubernetes cluster network plugin, the kube-proxy configuration has no effect.

Important

For a cluster that needs to deploy objects across multiple networks, ensure that you specify the same value for the clusterNetwork.hostPrefix parameter for each network type that is defined in the install-config.yaml file. Setting a different value for each clusterNetwork.hostPrefix parameter can impact the OVN-Kubernetes network plugin, where the plugin cannot effectively route object traffic among different nodes.

5.5.6.2. defaultNetwork object configuration

The values for the defaultNetwork object are defined in the following table:

Expand
Table 5.2. defaultNetwork object
FieldTypeDescription

type

string

OVNKubernetes. The Red Hat OpenShift Networking network plugin is selected during installation. This value cannot be changed after cluster installation.

Note

OpenShift Container Platform uses the OVN-Kubernetes network plugin by default. OpenShift SDN is no longer available as an installation choice for new clusters.

ovnKubernetesConfig

object

This object is only valid for the OVN-Kubernetes network plugin.

The following table describes the configuration fields for the OpenShift SDN network plugin:

Expand
Table 5.3. openshiftSDNConfig object
FieldTypeDescription

mode

string

The network isolation mode for OpenShift SDN.

mtu

integer

The maximum transmission unit (MTU) for the VXLAN overlay network. This value is normally configured automatically.

vxlanPort

integer

The port to use for all VXLAN packets. The default value is 4789.

Example OpenShift SDN configuration

defaultNetwork:
  type: OpenShiftSDN
  openshiftSDNConfig:
    mode: NetworkPolicy
    mtu: 1450
    vxlanPort: 4789
Copy to Clipboard Toggle word wrap

The following table describes the configuration fields for the OVN-Kubernetes network plugin:

Expand
Table 5.4. ovnKubernetesConfig object
FieldTypeDescription

mtu

integer

The maximum transmission unit (MTU) for the Geneve (Generic Network Virtualization Encapsulation) overlay network. This value is normally configured automatically.

genevePort

integer

The UDP port for the Geneve overlay network.

ipsecConfig

object

An object describing the IPsec mode for the cluster.

ipv4

object

Specifies a configuration object for IPv4 settings.

ipv6

object

Specifies a configuration object for IPv6 settings.

policyAuditConfig

object

Specify a configuration object for customizing network policy audit logging. If unset, the defaults audit log settings are used.

gatewayConfig

object

Optional: Specify a configuration object for customizing how egress traffic is sent to the node gateway. Valid values are Shared and Local. The default value is Shared. In the default setting, the Open vSwitch (OVS) outputs traffic directly to the node IP interface. In the Local setting, it traverses the host network; consequently, it gets applied to the routing table of the host.

Note

While migrating egress traffic, you can expect some disruption to workloads and service traffic until the Cluster Network Operator (CNO) successfully rolls out the changes.

Expand
Table 5.5. ovnKubernetesConfig.ipv4 object
FieldTypeDescription

internalTransitSwitchSubnet

string

If your existing network infrastructure overlaps with the 100.88.0.0/16 IPv4 subnet, you can specify a different IP address range for internal use by OVN-Kubernetes. The subnet for the distributed transit switch that enables east-west traffic. This subnet cannot overlap with any other subnets used by OVN-Kubernetes or on the host itself. It must be large enough to accommodate one IP address per node in your cluster.

The default value is 100.88.0.0/16.

internalJoinSubnet

string

If your existing network infrastructure overlaps with the 100.64.0.0/16 IPv4 subnet, you can specify a different IP address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. For example, if the clusterNetwork.cidr value is 10.128.0.0/14 and the clusterNetwork.hostPrefix value is /23, then the maximum number of nodes is 2^(23-14)=512.

The default value is 100.64.0.0/16.

Expand
Table 5.6. ovnKubernetesConfig.ipv6 object
FieldTypeDescription

internalTransitSwitchSubnet

string

If your existing network infrastructure overlaps with the fd97::/64 IPv6 subnet, you can specify a different IP address range for internal use by OVN-Kubernetes. The subnet for the distributed transit switch that enables east-west traffic. This subnet cannot overlap with any other subnets used by OVN-Kubernetes or on the host itself. It must be large enough to accommodate one IP address per node in your cluster.

The default value is fd97::/64.

internalJoinSubnet

string

If your existing network infrastructure overlaps with the fd98::/64 IPv6 subnet, you can specify a different IP address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster.

The default value is fd98::/64.

Expand
Table 5.7. policyAuditConfig object
FieldTypeDescription

rateLimit

integer

The maximum number of messages to generate every second per node. The default value is 20 messages per second.

maxFileSize

integer

The maximum size for the audit log in bytes. The default value is 50000000 or 50 MB.

maxLogFiles

integer

The maximum number of log files that are retained.

destination

string

One of the following additional audit log targets:

libc
The libc syslog() function of the journald process on the host.
udp:<host>:<port>
A syslog server. Replace <host>:<port> with the host and port of the syslog server.
unix:<file>
A Unix Domain Socket file specified by <file>.
null
Do not send the audit logs to any additional target.

syslogFacility

string

The syslog facility, such as kern, as defined by RFC5424. The default value is local0.

Expand
Table 5.8. gatewayConfig object
FieldTypeDescription

routingViaHost

boolean

Set this field to true to send egress traffic from pods to the host networking stack. For highly-specialized installations and applications that rely on manually configured routes in the kernel routing table, you might want to route egress traffic to the host networking stack. By default, egress traffic is processed in OVN to exit the cluster and is not affected by specialized routes in the kernel routing table. The default value is false.

This field has an interaction with the Open vSwitch hardware offloading feature. If you set this field to true, you do not receive the performance benefits of the offloading because egress traffic is processed by the host networking stack.

ipForwarding

object

You can control IP forwarding for all traffic on OVN-Kubernetes managed interfaces by using the ipForwarding specification in the Network resource. Specify Restricted to only allow IP forwarding for Kubernetes related traffic. Specify Global to allow forwarding of all IP traffic. For new installations, the default is Restricted. For updates to OpenShift Container Platform 4.14 or later, the default is Global.

ipv4

object

Optional: Specify an object to configure the internal OVN-Kubernetes masquerade address for host to service traffic for IPv4 addresses.

ipv6

object

Optional: Specify an object to configure the internal OVN-Kubernetes masquerade address for host to service traffic for IPv6 addresses.

Expand
Table 5.9. gatewayConfig.ipv4 object
FieldTypeDescription

internalMasqueradeSubnet

string

The masquerade IPv4 addresses that are used internally to enable host to service traffic. The host is configured with these IP addresses as well as the shared gateway bridge interface. The default value is 169.254.169.0/29.

Expand
Table 5.10. gatewayConfig.ipv6 object
FieldTypeDescription

internalMasqueradeSubnet

string

The masquerade IPv6 addresses that are used internally to enable host to service traffic. The host is configured with these IP addresses as well as the shared gateway bridge interface. The default value is fd69::/125.

Expand
Table 5.11. ipsecConfig object
FieldTypeDescription

mode

string

Specifies the behavior of the IPsec implementation. Must be one of the following values:

  • Disabled: IPsec is not enabled on cluster nodes.
  • External: IPsec is enabled for network traffic with external hosts.
  • Full: IPsec is enabled for pod traffic and network traffic with external hosts.
Note

You can only change the configuration for your cluster network plugin during cluster installation, except for the gatewayConfig field that can be changed at runtime as a postinstallation activity.

Example OVN-Kubernetes configuration with IPSec enabled

defaultNetwork:
  type: OVNKubernetes
  ovnKubernetesConfig:
    mtu: 1400
    genevePort: 6081
    ipsecConfig:
      mode: Full
Copy to Clipboard Toggle word wrap

Important

Using OVNKubernetes can lead to a stack exhaustion problem on IBM Power®.

The values for the kubeProxyConfig object are defined in the following table:

Expand
Table 5.12. kubeProxyConfig object
FieldTypeDescription

iptablesSyncPeriod

string

The refresh period for iptables rules. The default value is 30s. Valid suffixes include s, m, and h and are described in the Go time package documentation.

Note

Because of performance improvements introduced in OpenShift Container Platform 4.3 and greater, adjusting the iptablesSyncPeriod parameter is no longer necessary.

proxyArguments.iptables-min-sync-period

array

The minimum duration before refreshing iptables rules. This field ensures that the refresh does not happen too frequently. Valid suffixes include s, m, and h and are described in the Go time package. The default value is:

kubeProxyConfig:
  proxyArguments:
    iptables-min-sync-period:
    - 0s
Copy to Clipboard Toggle word wrap

A complete CNO configuration is specified in the following example:

Example Cluster Network Operator object

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  networkType: OVNKubernetes
Copy to Clipboard Toggle word wrap

5.6. DNS Operator in OpenShift Container Platform

In OpenShift Container Platform, the DNS Operator deploys and manages a CoreDNS instance to provide a name resolution service to pods inside the cluster, enables DNS-based Kubernetes Service discovery, and resolves internal cluster.local names.

5.6.1. Checking the status of the DNS Operator

The DNS Operator implements the dns API from the operator.openshift.io API group. The Operator deploys CoreDNS using a daemon set, creates a service for the daemon set, and configures the kubelet to instruct pods to use the CoreDNS service IP address for name resolution.

Procedure

The DNS Operator is deployed during installation with a Deployment object.

  1. Use the oc get command to view the deployment status:

    $ oc get -n openshift-dns-operator deployment/dns-operator
    Copy to Clipboard Toggle word wrap

    Example output

    NAME           READY     UP-TO-DATE   AVAILABLE   AGE
    dns-operator   1/1       1            1           23h
    Copy to Clipboard Toggle word wrap

  2. Use the oc get command to view the state of the DNS Operator:

    $ oc get clusteroperator/dns
    Copy to Clipboard Toggle word wrap

    Example output

    NAME      VERSION     AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    dns       4.1.15-0.11  True        False         False      92m
    Copy to Clipboard Toggle word wrap

    AVAILABLE, PROGRESSING, and DEGRADED provide information about the status of the Operator. AVAILABLE is True when at least 1 pod from the CoreDNS daemon set reports an Available status condition, and the DNS service has a cluster IP address.

5.6.2. View the default DNS

Every new OpenShift Container Platform installation has a dns.operator named default.

Procedure

  1. Use the oc describe command to view the default dns:

    $ oc describe dns.operator/default
    Copy to Clipboard Toggle word wrap

    Example output

    Name:         default
    Namespace:
    Labels:       <none>
    Annotations:  <none>
    API Version:  operator.openshift.io/v1
    Kind:         DNS
    ...
    Status:
      Cluster Domain:  cluster.local 
    1
    
      Cluster IP:      172.30.0.10 
    2
    
    ...
    Copy to Clipboard Toggle word wrap

    1 1 1
    The Cluster Domain field is the base DNS domain used to construct fully qualified pod and service domain names.
    2
    The Cluster IP is the address pods query for name resolution. The IP is defined as the 10th address in the service CIDR range.
  2. To find the service CIDR range, such as 172.30.0.0/16, of your cluster, use the oc get command:

    $ oc get networks.config/cluster -o jsonpath='{$.status.serviceNetwork}'
    Copy to Clipboard Toggle word wrap

5.6.3. Using DNS forwarding

You can use DNS forwarding to override the default forwarding configuration in the /etc/resolv.conf file in the following ways:

  • Specify name servers (spec.servers) for every zone. If the forwarded zone is the ingress domain managed by OpenShift Container Platform, then the upstream name server must be authorized for the domain.
  • Provide a list of upstream DNS servers (spec.upstreamResolvers).
  • Change the default forwarding policy.
Note

A DNS forwarding configuration for the default domain can have both the default servers specified in the /etc/resolv.conf file and the upstream DNS servers.

Procedure

  1. Modify the DNS Operator object named default:

    $ oc edit dns.operator/default
    Copy to Clipboard Toggle word wrap

    After you issue the previous command, the Operator creates and updates the config map named dns-default with additional server configuration blocks based on spec.servers. If none of the servers have a zone that matches the query, then name resolution falls back to the upstream DNS servers.

    Configuring DNS forwarding

    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      cache:
        negativeTTL: 0s
        positiveTTL: 0s
      logLevel: Normal
      nodePlacement: {}
      operatorLogLevel: Normal
      servers:
      - name: example-server 
    1
    
        zones:
        - example.com 
    2
    
        forwardPlugin:
          policy: Random 
    3
    
          upstreams: 
    4
    
          - 1.1.1.1
          - 2.2.2.2:5353
      upstreamResolvers: 
    5
    
        policy: Random 
    6
    
        protocolStrategy: ""  
    7
    
        transportConfig: {}  
    8
    
        upstreams:
        - type: SystemResolvConf 
    9
    
        - type: Network
          address: 1.2.3.4 
    10
    
          port: 53 
    11
    
        status:
          clusterDomain: cluster.local
          clusterIP: x.y.z.10
          conditions:
    ...
    Copy to Clipboard Toggle word wrap

    1
    Must comply with the rfc6335 service name syntax.
    2
    Must conform to the definition of a subdomain in the rfc1123 service name syntax. The cluster domain, cluster.local, is an invalid subdomain for the zones field.
    3
    Defines the policy to select upstream resolvers listed in the forwardPlugin. Default value is Random. You can also use the values RoundRobin, and Sequential.
    4
    A maximum of 15 upstreams is allowed per forwardPlugin.
    5
    You can use upstreamResolvers to override the default forwarding policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers declared in /etc/resolv.conf.
    6
    Determines the order in which upstream servers listed in upstreams are selected for querying. You can specify one of these values: Random, RoundRobin, or Sequential. The default value is Sequential.
    7
    When omitted, the platform chooses a default, normally the protocol of the original client request. Set to TCP to specify that the platform should use TCP for all upstream DNS requests, even if the client request uses UDP.
    8
    Used to configure the transport type, server name, and optional custom CA or CA bundle to use when forwarding DNS requests to an upstream resolver.
    9
    You can specify two types of upstreams: SystemResolvConf or Network. SystemResolvConf configures the upstream to use /etc/resolv.conf and Network defines a Networkresolver. You can specify one or both.
    10
    If the specified type is Network, you must provide an IP address. The address field must be a valid IPv4 or IPv6 address.
    11
    If the specified type is Network, you can optionally provide a port. The port field must have a value between 1 and 65535. If you do not specify a port for the upstream, the default port is 853.

5.6.4. Checking DNS Operator status

You can inspect the status and view the details of the DNS Operator using the oc describe command.

Procedure

  • View the status of the DNS Operator:

    $ oc describe clusteroperators/dns
    Copy to Clipboard Toggle word wrap

    Though the messages and spelling might vary in a specific release, the expected status output looks like:

    Status:
      Conditions:
        Last Transition Time:  <date>
        Message:               DNS "default" is available.
        Reason:                AsExpected
        Status:                True
        Type:                  Available
        Last Transition Time:  <date>
        Message:               Desired and current number of DNSes are equal
        Reason:                AsExpected
        Status:                False
        Type:                  Progressing
        Last Transition Time:  <date>
        Reason:                DNSNotDegraded
        Status:                False
        Type:                  Degraded
        Last Transition Time:  <date>
        Message:               DNS default is upgradeable: DNS Operator can be upgraded
        Reason:                DNSUpgradeable
        Status:                True
        Type:                  Upgradeable
    Copy to Clipboard Toggle word wrap

5.6.5. Viewing DNS Operator logs

You can view DNS Operator logs by using the oc logs command.

Procedure

  • View the logs of the DNS Operator:

    $ oc logs -n openshift-dns-operator deployment/dns-operator -c dns-operator
    Copy to Clipboard Toggle word wrap

5.6.6. Setting the CoreDNS log level

Log levels for CoreDNS and the CoreDNS Operator are set by using different methods. You can configure the CoreDNS log level to determine the amount of detail in logged error messages. The valid values for CoreDNS log level are Normal, Debug, and Trace. The default logLevel is Normal.

Note

The CoreDNS error log level is always enabled. The following log level settings report different error responses:

  • logLevel: Normal enables the "errors" class: log . { class error }.
  • logLevel: Debug enables the "denial" class: log . { class denial error }.
  • logLevel: Trace enables the "all" class: log . { class all }.

Procedure

  • To set logLevel to Debug, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
    Copy to Clipboard Toggle word wrap
  • To set logLevel to Trace, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge
    Copy to Clipboard Toggle word wrap

Verification

  • To ensure the desired log level was set, check the config map:

    $ oc get configmap/dns-default -n openshift-dns -o yaml
    Copy to Clipboard Toggle word wrap

    For example, after setting the logLevel to Trace, you should see this stanza in each server block:

    errors
    log . {
        class all
    }
    Copy to Clipboard Toggle word wrap

5.6.7. Viewing the CoreDNS logs

You can view CoreDNS logs by using the oc logs command.

Procedure

  • View the logs of a specific CoreDNS pod by entering the following command:

    $ oc -n openshift-dns logs -c dns <core_dns_pod_name>
    Copy to Clipboard Toggle word wrap
  • Follow the logs of all CoreDNS pods by entering the following command:

    $ oc -n openshift-dns logs -c dns -l dns.operator.openshift.io/daemonset-dns=default -f --max-log-requests=<number> 
    1
    Copy to Clipboard Toggle word wrap
    1
    Specifies the number of DNS pods to stream logs from. The maximum is 6.

5.6.8. Setting the CoreDNS Operator log level

Log levels for CoreDNS and CoreDNS Operator are set by using different methods. Cluster administrators can configure the Operator log level to more quickly track down OpenShift DNS issues. The valid values for operatorLogLevel are Normal, Debug, and Trace. Trace has the most detailed information. The default operatorlogLevel is Normal. There are seven logging levels for Operator issues: Trace, Debug, Info, Warning, Error, Fatal, and Panic. After the logging level is set, log entries with that severity or anything above it will be logged.

  • operatorLogLevel: "Normal" sets logrus.SetLogLevel("Info").
  • operatorLogLevel: "Debug" sets logrus.SetLogLevel("Debug").
  • operatorLogLevel: "Trace" sets logrus.SetLogLevel("Trace").

Procedure

  • To set operatorLogLevel to Debug, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Debug"}}' --type=merge
    Copy to Clipboard Toggle word wrap
  • To set operatorLogLevel to Trace, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Trace"}}' --type=merge
    Copy to Clipboard Toggle word wrap

Verification

  1. To review the resulting change, enter the following command:

    $ oc get dnses.operator -A -oyaml
    Copy to Clipboard Toggle word wrap

    You should see two log level entries. The operatorLogLevel applies to OpenShift DNS Operator issues, and the logLevel applies to the daemonset of CoreDNS pods:

     logLevel: Trace
     operatorLogLevel: Debug
    Copy to Clipboard Toggle word wrap
  2. To review the logs for the daemonset, enter the following command:

    $ oc logs -n openshift-dns ds/dns-default
    Copy to Clipboard Toggle word wrap

5.6.9. Tuning the CoreDNS cache

For CoreDNS, you can configure the maximum duration of both successful or unsuccessful caching, also known respectively as positive or negative caching. Tuning the cache duration of DNS query responses can reduce the load for any upstream DNS resolvers.

Warning

Setting TTL fields to low values could lead to an increased load on the cluster, any upstream resolvers, or both.

Procedure

  1. Edit the DNS Operator object named default by running the following command:

    $ oc edit dns.operator.openshift.io/default
    Copy to Clipboard Toggle word wrap
  2. Modify the time-to-live (TTL) caching values:

    Configuring DNS caching

    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      cache:
        positiveTTL: 1h 
    1
    
        negativeTTL: 0.5h10m 
    2
    Copy to Clipboard Toggle word wrap

    1
    The string value 1h is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be 0s and the cluster uses the internal default value of 900s as a fallback.
    2
    The string value can be a combination of units such as 0.5h10m and is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be 0s and the cluster uses the internal default value of 30s as a fallback.

Verification

  1. To review the change, look at the config map again by running the following command:

    $ oc get configmap/dns-default -n openshift-dns -o yaml
    Copy to Clipboard Toggle word wrap
  2. Verify that you see entries that look like the following example:

           cache 3600 {
                denial 9984 2400
            }
    Copy to Clipboard Toggle word wrap

5.6.10. Advanced tasks

The DNS Operator manages the CoreDNS component to provide a name resolution service for pods and services in the cluster. The managementState of the DNS Operator is set to Managed by default, which means that the DNS Operator is actively managing its resources. You can change it to Unmanaged, which means the DNS Operator is not managing its resources.

The following are use cases for changing the DNS Operator managementState:

  • You are a developer and want to test a configuration change to see if it fixes an issue in CoreDNS. You can stop the DNS Operator from overwriting the configuration change by setting the managementState to Unmanaged.
  • You are a cluster administrator and have reported an issue with CoreDNS, but need to apply a workaround until the issue is fixed. You can set the managementState field of the DNS Operator to Unmanaged to apply the workaround.

Procedure

  1. Change managementState to Unmanaged in the DNS Operator:

    oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Unmanaged"}}'
    Copy to Clipboard Toggle word wrap
  2. Review managementState of the DNS Operator by using the jsonpath command-line JSON parser:

    $ oc get dns.operator.openshift.io default -ojsonpath='{.spec.managementState}'
    Copy to Clipboard Toggle word wrap
    Note

    You cannot upgrade while the managementState is set to Unmanaged.

5.6.10.2. Controlling DNS pod placement

The DNS Operator has two daemon sets: one for CoreDNS called dns-default and one for managing the /etc/hosts file called node-resolver.

You might find a need to control which nodes have CoreDNS pods assigned and running, although this is not a common operation. For example, if the cluster administrator has configured security policies that can prohibit communication between pairs of nodes, that would necessitate restricting the set of nodes on which the daemonset for CoreDNS runs. If DNS pods are running on some nodes in the cluster and the nodes where DNS pods are not running have network connectivity to nodes where DNS pods are running, DNS service will be available to all pods.

The node-resolver daemon set must run on every node host because it adds an entry for the cluster image registry to support pulling images. The node-resolver pods have only one job: to look up the image-registry.openshift-image-registry.svc service’s cluster IP address and add it to /etc/hosts on the node host so that the container runtime can resolve the service name.

As a cluster administrator, you can use a custom node selector to configure the daemon set for CoreDNS to run or not run on certain nodes.

Prerequisites

  • You installed the oc CLI.
  • You are logged in to the cluster as a user with cluster-admin privileges.
  • Your DNS Operator managementState is set to Managed.

Procedure

  • To allow the daemon set for CoreDNS to run on certain nodes, configure a taint and toleration:

    1. Modify the DNS Operator object named default:

      $ oc edit dns.operator/default
      Copy to Clipboard Toggle word wrap
    2. Specify a taint key and a toleration for the taint:

       spec:
         nodePlacement:
           tolerations:
           - effect: NoExecute
             key: "dns-only"
             operators: Equal
             value: abc
             tolerationSeconds: 3600 
      1
      Copy to Clipboard Toggle word wrap
      1
      If the taint is dns-only, it can be tolerated indefinitely. You can omit tolerationSeconds.
5.6.10.3. Configuring DNS forwarding with TLS

When working in a highly regulated environment, you might need the ability to secure DNS traffic when forwarding requests to upstream resolvers so that you can ensure additional DNS traffic and data privacy.

Be aware that CoreDNS caches forwarded connections for 10 seconds. CoreDNS will hold a TCP connection open for those 10 seconds if no request is issued.

Note

With large clusters, ensure that your DNS server is aware that it might get many new connections to hold open because you can initiate a connection per node. Set up your DNS hierarchy accordingly to avoid performance issues.

Procedure

  1. Modify the DNS Operator object named default:

    $ oc edit dns.operator/default
    Copy to Clipboard Toggle word wrap

    Cluster administrators can configure transport layer security (TLS) for forwarded DNS queries.

    Configuring DNS forwarding with TLS

    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      servers:
      - name: example-server 
    1
    
        zones:
        - example.com 
    2
    
        forwardPlugin:
          transportConfig:
            transport: TLS 
    3
    
            tls:
              caBundle:
                name: mycacert
              serverName: dnstls.example.com  
    4
    
          policy: Random 
    5
    
          upstreams: 
    6
    
          - 1.1.1.1
          - 2.2.2.2:5353
      upstreamResolvers: 
    7
    
        transportConfig:
          transport: TLS
          tls:
            caBundle:
              name: mycacert
            serverName: dnstls.example.com
        upstreams:
        - type: Network 
    8
    
          address: 1.2.3.4 
    9
    
          port: 53 
    10
    Copy to Clipboard Toggle word wrap

    1
    Must comply with the rfc6335 service name syntax.
    2
    Must conform to the definition of a subdomain in the rfc1123 service name syntax. The cluster domain, cluster.local, is an invalid subdomain for the zones field. The cluster domain, cluster.local, is an invalid subdomain for zones.
    3
    When configuring TLS for forwarded DNS queries, set the transport field to have the value TLS.
    4
    When configuring TLS for forwarded DNS queries, this is a mandatory server name used as part of the server name indication (SNI) to validate the upstream TLS server certificate.
    5
    Defines the policy to select upstream resolvers. Default value is Random. You can also use the values RoundRobin, and Sequential.
    6
    Required. Use it to provide upstream resolvers. A maximum of 15 upstreams entries are allowed per forwardPlugin entry.
    7
    Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in /etc/resolv.conf.
    8
    Only the Network type is allowed when using TLS and you must provide an IP address. Network type indicates that this upstream resolver should handle forwarded requests separately from the upstream resolvers listed in /etc/resolv.conf.
    9
    The address field must be a valid IPv4 or IPv6 address.
    10
    You can optionally provide a port. The port must have a value between 1 and 65535. If you do not specify a port for the upstream, the default port is 853.
    Note

    If servers is undefined or invalid, the config map only contains the default server.

Verification

  1. View the config map:

    $ oc get configmap/dns-default -n openshift-dns -o yaml
    Copy to Clipboard Toggle word wrap

    Sample DNS ConfigMap based on TLS forwarding example

    apiVersion: v1
    data:
      Corefile: |
        example.com:5353 {
            forward . 1.1.1.1 2.2.2.2:5353
        }
        bar.com:5353 example.com:5353 {
            forward . 3.3.3.3 4.4.4.4:5454 
    1
    
        }
        .:5353 {
            errors
            health
            kubernetes cluster.local in-addr.arpa ip6.arpa {
                pods insecure
                upstream
                fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            forward . /etc/resolv.conf 1.2.3.4:53 {
                policy Random
            }
            cache 30
            reload
        }
    kind: ConfigMap
    metadata:
      labels:
        dns.operator.openshift.io/owning-dns: default
      name: dns-default
      namespace: openshift-dns
    Copy to Clipboard Toggle word wrap

    1
    Changes to the forwardPlugin triggers a rolling update of the CoreDNS daemon set.

The Ingress Operator implements the IngressController API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.

When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients.

The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route and Kubernetes Ingress resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy type and internal load balancing, provide ways to publish Ingress Controller endpoints.

5.7.2. The Ingress configuration asset

The installation program generates an asset with an Ingress resource in the config.openshift.io API group, cluster-ingress-02-config.yml.

YAML Definition of the Ingress resource

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  domain: apps.openshiftdemos.com
Copy to Clipboard Toggle word wrap

The installation program stores this asset in the cluster-ingress-02-config.yml file in the manifests/ directory. This Ingress resource defines the cluster-wide configuration for Ingress. This Ingress configuration is used as follows:

  • The Ingress Operator uses the domain from the cluster Ingress configuration as the domain for the default Ingress Controller.
  • The OpenShift API Server Operator uses the domain from the cluster Ingress configuration. This domain is also used when generating a default host for a Route resource that does not specify an explicit host.

5.7.3. Ingress Controller configuration parameters

The IngressController custom resource (CR) includes optional configuration parameters that you can configure to meet specific needs for your organization.

Expand
ParameterDescription

domain

domain is a DNS name serviced by the Ingress Controller and is used to configure multiple features:

  • For the LoadBalancerService endpoint publishing strategy, domain is used to configure DNS records. See endpointPublishingStrategy.
  • When using a generated default certificate, the certificate is valid for domain and its subdomains. See defaultCertificate.
  • The value is published to individual Route statuses so that users know where to target external DNS records.

The domain value must be unique among all Ingress Controllers and cannot be updated.

If empty, the default value is ingress.config.openshift.io/cluster .spec.domain.

replicas

replicas is the number of Ingress Controller replicas. If not set, the default value is 2.

endpointPublishingStrategy

endpointPublishingStrategy is used to publish the Ingress Controller endpoints to other networks, enable load balancer integrations, and provide access to other systems.

For cloud environments, use the loadBalancer field to configure the endpoint publishing strategy for your Ingress Controller.

On Google Cloud, AWS, and Azure you can configure the following endpointPublishingStrategy fields:

  • loadBalancer.scope
  • loadBalancer.allowedSourceRanges

If not set, the default value is based on infrastructure.config.openshift.io/cluster .status.platform:

  • Azure: LoadBalancerService (with External scope)
  • Google Cloud: LoadBalancerService (with External scope)

For most platforms, the endpointPublishingStrategy value can be updated. On Google Cloud, you can configure the following endpointPublishingStrategy fields:

  • loadBalancer.scope
  • loadbalancer.providerParameters.gcp.clientAccess

For non-cloud environments, such as a bare-metal platform, use the NodePortService, HostNetwork, or Private fields to configure the endpoint publishing strategy for your Ingress Controller.

If you do not set a value in one of these fields, the default value is based on binding ports specified in the .status.platform value in the IngressController CR.

If you need to update the endpointPublishingStrategy value after your cluster is deployed, you can configure the following endpointPublishingStrategy fields:

  • hostNetwork.protocol
  • nodePort.protocol
  • private.protocol

defaultCertificate

The defaultCertificate value is a reference to a secret that contains the default certificate that is served by the Ingress Controller. When Routes do not specify their own certificate, defaultCertificate is used.

The secret must contain the following keys and data: * tls.crt: certificate file contents * tls.key: key file contents

If not set, a wildcard certificate is automatically generated and used. The certificate is valid for the Ingress Controller domain and subdomains, and the generated certificate’s CA is automatically integrated with the cluster’s trust store.

The in-use certificate, whether generated or user-specified, is automatically integrated with OpenShift Container Platform built-in OAuth server.

namespaceSelector

namespaceSelector is used to filter the set of namespaces serviced by the Ingress Controller. This is useful for implementing shards.

routeSelector

routeSelector is used to filter the set of Routes serviced by the Ingress Controller. This is useful for implementing shards.

nodePlacement

nodePlacement enables explicit control over the scheduling of the Ingress Controller.

If not set, the defaults values are used.

Note

The nodePlacement parameter includes two parts, nodeSelector and tolerations. For example:

nodePlacement:
 nodeSelector:
   matchLabels:
     kubernetes.io/os: linux
 tolerations:
 - effect: NoSchedule
   operator: Exists
Copy to Clipboard Toggle word wrap

tlsSecurityProfile

tlsSecurityProfile specifies settings for TLS connections for Ingress Controllers.

If not set, the default value is based on the apiservers.config.openshift.io/cluster resource.

When using the Old, Intermediate, and Modern profile types, the effective profile configuration is subject to change between releases. For example, given a specification to use the Intermediate profile deployed on release X.Y.Z, an upgrade to release X.Y.Z+1 may cause a new profile configuration to be applied to the Ingress Controller, resulting in a rollout.

The minimum TLS version for Ingress Controllers is 1.1, and the maximum TLS version is 1.3.

Note

Ciphers and the minimum TLS version of the configured security profile are reflected in the TLSProfile status.

Important

The Ingress Operator converts the TLS 1.0 of an Old or Custom profile to 1.1.

clientTLS

clientTLS authenticates client access to the cluster and services; as a result, mutual TLS authentication is enabled. If not set, then client TLS is not enabled.

clientTLS has the required subfields, spec.clientTLS.clientCertificatePolicy and spec.clientTLS.ClientCA.

The ClientCertificatePolicy subfield accepts one of the two values: Required or Optional. The ClientCA subfield specifies a config map that is in the openshift-config namespace. The config map should contain a CA certificate bundle.

The AllowedSubjectPatterns is an optional value that specifies a list of regular expressions, which are matched against the distinguished name on a valid client certificate to filter requests. The regular expressions must use PCRE syntax. At least one pattern must match a client certificate’s distinguished name; otherwise, the Ingress Controller rejects the certificate and denies the connection. If not specified, the Ingress Controller does not reject certificates based on the distinguished name.

routeAdmission

routeAdmission defines a policy for handling new route claims, such as allowing or denying claims across namespaces.

namespaceOwnership describes how hostname claims across namespaces should be handled. The default is Strict.

  • Strict: does not allow routes to claim the same hostname across namespaces.
  • InterNamespaceAllowed: allows routes to claim different paths of the same hostname across namespaces.

wildcardPolicy describes how routes with wildcard policies are handled by the Ingress Controller.

  • WildcardsAllowed: Indicates routes with any wildcard policy are admitted by the Ingress Controller.
  • WildcardsDisallowed: Indicates only routes with a wildcard policy of None are admitted by the Ingress Controller. Updating wildcardPolicy from WildcardsAllowed to WildcardsDisallowed causes admitted routes with a wildcard policy of Subdomain to stop working. These routes must be recreated to a wildcard policy of None to be readmitted by the Ingress Controller. WildcardsDisallowed is the default setting.

IngressControllerLogging

logging defines parameters for what is logged where. If this field is empty, operational logs are enabled but access logs are disabled.

  • access describes how client requests are logged. If this field is empty, access logging is disabled.

    • destination describes a destination for log messages.

      • type is the type of destination for logs:

        • Container specifies that logs should go to a sidecar container. The Ingress Operator configures the container, named logs, on the Ingress Controller pod and configures the Ingress Controller to write logs to the container. The expectation is that the administrator configures a custom logging solution that reads logs from this container. Using container logs means that logs may be dropped if the rate of logs exceeds the container runtime capacity or the custom logging solution capacity.
        • Syslog specifies that logs are sent to a Syslog endpoint. The administrator must specify an endpoint that can receive Syslog messages. The expectation is that the administrator has configured a custom Syslog instance.
      • container describes parameters for the Container logging destination type. Currently there are no parameters for container logging, so this field must be empty.
      • syslog describes parameters for the Syslog logging destination type:

        • address is the IP address of the syslog endpoint that receives log messages.
        • port is the UDP port number of the syslog endpoint that receives log messages.
        • maxLength is the maximum length of the syslog message. It must be between 480 and 4096 bytes. If this field is empty, the maximum length is set to the default value of 1024 bytes.
        • facility specifies the syslog facility of log messages. If this field is empty, the facility is local1. Otherwise, it must specify a valid syslog facility: kern, user, mail, daemon, auth, syslog, lpr, news, uucp, cron, auth2, ftp, ntp, audit, alert, cron2, local0, local1, local2, local3. local4, local5, local6, or local7.
    • httpLogFormat specifies the format of the log message for an HTTP request. If this field is empty, log messages use the implementation’s default HTTP log format. For HAProxy’s default HTTP log format, see the HAProxy documentation.

httpHeaders

httpHeaders defines the policy for HTTP headers.

By setting the forwardedHeaderPolicy for the IngressControllerHTTPHeaders, you specify when and how the Ingress Controller sets the Forwarded, X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Proto, and X-Forwarded-Proto-Version HTTP headers.

By default, the policy is set to Append.

  • Append specifies that the Ingress Controller appends the headers, preserving any existing headers.
  • Replace specifies that the Ingress Controller sets the headers, removing any existing headers.
  • IfNone specifies that the Ingress Controller sets the headers if they are not already set.
  • Never specifies that the Ingress Controller never sets the headers, preserving any existing headers.

By setting headerNameCaseAdjustments, you can specify case adjustments that can be applied to HTTP header names. Each adjustment is specified as an HTTP header name with the desired capitalization. For example, specifying X-Forwarded-For indicates that the x-forwarded-for HTTP header should be adjusted to have the specified capitalization.

These adjustments are only applied to cleartext, edge-terminated, and re-encrypt routes, and only when using HTTP/1.

For request headers, these adjustments are applied only for routes that have the haproxy.router.openshift.io/h1-adjust-case=true annotation. For response headers, these adjustments are applied to all HTTP responses. If this field is empty, no request headers are adjusted.

actions specifies options for performing certain actions on headers. Headers cannot be set or deleted for TLS passthrough connections. The actions field has additional subfields spec.httpHeader.actions.response and spec.httpHeader.actions.request:

  • The response subfield specifies a list of HTTP response headers to set or delete.
  • The request subfield specifies a list of HTTP request headers to set or delete.

httpCompression

httpCompression defines the policy for HTTP traffic compression.

  • mimeTypes defines a list of MIME types to which compression should be applied. For example, text/css; charset=utf-8, text/html, text/*, image/svg+xml, application/octet-stream, X-custom/customsub, using the format pattern, type/subtype; [;attribute=value]. The types are: application, image, message, multipart, text, video, or a custom type prefaced by X-; e.g. To see the full notation for MIME types and subtypes, see RFC1341

httpErrorCodePages

httpErrorCodePages specifies custom HTTP error code response pages. By default, an IngressController uses error pages built into the IngressController image.

httpCaptureCookies

httpCaptureCookies specifies HTTP cookies that you want to capture in access logs. If the httpCaptureCookies field is empty, the access logs do not capture the cookies.

For any cookie that you want to capture, the following parameters must be in your IngressController configuration:

  • name specifies the name of the cookie.
  • maxLength specifies tha maximum length of the cookie.
  • matchType specifies if the field name of the cookie exactly matches the capture cookie setting or is a prefix of the capture cookie setting. The matchType field uses the Exact and Prefix parameters.

For example:

  httpCaptureCookies:
  - matchType: Exact
    maxLength: 128
    name: MYCOOKIE
Copy to Clipboard Toggle word wrap

httpCaptureHeaders

httpCaptureHeaders specifies the HTTP headers that you want to capture in the access logs. If the httpCaptureHeaders field is empty, the access logs do not capture the headers.

httpCaptureHeaders contains two lists of headers to capture in the access logs. The two lists of header fields are request and response. In both lists, the name field must specify the header name and the maxlength field must specify the maximum length of the header. For example:

  httpCaptureHeaders:
    request:
    - maxLength: 256
      name: Connection
    - maxLength: 128
      name: User-Agent
    response:
    - maxLength: 256
      name: Content-Type
    - maxLength: 256
      name: Content-Length
Copy to Clipboard Toggle word wrap

tuningOptions

tuningOptions specifies options for tuning the performance of Ingress Controller pods.

  • clientFinTimeout specifies how long a connection is held open while waiting for the client response to the server closing the connection. The default timeout is 1s.
  • clientTimeout specifies how long a connection is held open while waiting for a client response. The default timeout is 30s.
  • headerBufferBytes specifies how much memory is reserved, in bytes, for Ingress Controller connection sessions. This value must be at least 16384 if HTTP/2 is enabled for the Ingress Controller. If not set, the default value is 32768 bytes. Setting this field not recommended because headerBufferBytes values that are too small can break the Ingress Controller, and headerBufferBytes values that are too large could cause the Ingress Controller to use significantly more memory than necessary.
  • headerBufferMaxRewriteBytes specifies how much memory should be reserved, in bytes, from headerBufferBytes for HTTP header rewriting and appending for Ingress Controller connection sessions. The minimum value for headerBufferMaxRewriteBytes is 4096. headerBufferBytes must be greater than headerBufferMaxRewriteBytes for incoming HTTP requests. If not set, the default value is 8192 bytes. Setting this field not recommended because headerBufferMaxRewriteBytes values that are too small can break the Ingress Controller and headerBufferMaxRewriteBytes values that are too large could cause the Ingress Controller to use significantly more memory than necessary.
  • healthCheckInterval specifies how long the router waits between health checks. The default is 5s.
  • serverFinTimeout specifies how long a connection is held open while waiting for the server response to the client that is closing the connection. The default timeout is 1s.
  • serverTimeout specifies how long a connection is held open while waiting for a server response. The default timeout is 30s.
  • threadCount specifies the number of threads to create per HAProxy process. Creating more threads allows each Ingress Controller pod to handle more connections, at the cost of more system resources being used. HAProxy supports up to 64 threads. If this field is empty, the Ingress Controller uses the default value of 4 threads. The default value can change in future releases. Setting this field is not recommended because increasing the number of HAProxy threads allows Ingress Controller pods to use more CPU time under load, and prevent other pods from receiving the CPU resources they need to perform. Reducing the number of threads can cause the Ingress Controller to perform poorly.
  • tlsInspectDelay specifies how long the router can hold data to find a matching route. Setting this value too short can cause the router to fall back to the default certificate for edge-terminated, reencrypted, or passthrough routes, even when using a better matched certificate. The default inspect delay is 5s.
  • tunnelTimeout specifies how long a tunnel connection, including websockets, remains open while the tunnel is idle. The default timeout is 1h.
  • maxConnections specifies the maximum number of simultaneous connections that can be established per HAProxy process. Increasing this value allows each ingress controller pod to handle more connections at the cost of additional system resources. Permitted values are 0, -1, any value within the range 2000 and 2000000, or the field can be left empty.

    • If this field is left empty or has the value 0, the Ingress Controller will use the default value of 50000. This value is subject to change in future releases.
    • If the field has the value of -1, then HAProxy will dynamically compute a maximum value based on the available ulimits in the running container. This process results in a large computed value that will incur significant memory usage compared to the current default value of 50000.
    • If the field has a value that is greater than the current operating system limit, the HAProxy process will not start.
    • If you choose a discrete value and the router pod is migrated to a new node, it is possible the new node does not have an identical ulimit configured. In such cases, the pod fails to start.
    • If you have nodes with different ulimits configured, and you choose a discrete value, it is recommended to use the value of -1 for this field so that the maximum number of connections is calculated at runtime.

logEmptyRequests

logEmptyRequests specifies connections for which no request is received and logged. These empty requests come from load balancer health probes or web browser speculative connections (preconnect) and logging these requests can be undesirable. However, these requests can be caused by network errors, in which case logging empty requests can be useful for diagnosing the errors. These requests can be caused by port scans, and logging empty requests can aid in detecting intrusion attempts. Allowed values for this field are Log and Ignore. The default value is Log.

The LoggingPolicy type accepts either one of two values:

  • Log: Setting this value to Log indicates that an event should be logged.
  • Ignore: Setting this value to Ignore sets the dontlognull option in the HAproxy configuration.

HTTPEmptyRequestsPolicy

HTTPEmptyRequestsPolicy describes how HTTP connections are handled if the connection times out before a request is received. Allowed values for this field are Respond and Ignore. The default value is Respond.

The HTTPEmptyRequestsPolicy type accepts either one of two values:

  • Respond: If the field is set to Respond, the Ingress Controller sends an HTTP 400 or 408 response, logs the connection if access logging is enabled, and counts the connection in the appropriate metrics.
  • Ignore: Setting this option to Ignore adds the http-ignore-probes parameter in the HAproxy configuration. If the field is set to Ignore, the Ingress Controller closes the connection without sending a response, then logs the connection, or incrementing metrics.

These connections come from load balancer health probes or web browser speculative connections (preconnect) and can be safely ignored. However, these requests can be caused by network errors, so setting this field to Ignore can impede detection and diagnosis of problems. These requests can be caused by port scans, in which case logging empty requests can aid in detecting intrusion attempts.

5.7.3.1. Ingress Controller TLS security profiles

TLS security profiles provide a way for servers to regulate which ciphers a connecting client can use when connecting to the server.

5.7.3.1.1. Understanding TLS security profiles

You can use a TLS (Transport Layer Security) security profile, as described in this section, to define which TLS ciphers are required by various OpenShift Container Platform components.

The OpenShift Container Platform TLS security profiles are based on Mozilla recommended configurations.

You can specify one of the following TLS security profiles for each component:

Expand
Table 5.13. TLS security profiles
ProfileDescription

Old

This profile is intended for use with legacy clients or libraries. The profile is based on the Old backward compatibility recommended configuration.

The Old profile requires a minimum TLS version of 1.0.

Note

For the Ingress Controller, the minimum TLS version is converted from 1.0 to 1.1.

Intermediate

This profile is the default TLS security profile for the Ingress Controller, kubelet, and control plane. The profile is based on the Intermediate compatibility recommended configuration.

The Intermediate profile requires a minimum TLS version of 1.2.

Note

This profile is the recommended configuration for the majority of clients.

Modern

This profile is intended for use with modern clients that have no need for backwards compatibility. This profile is based on the Modern compatibility recommended configuration.

The Modern profile requires a minimum TLS version of 1.3.

Custom

This profile allows you to define the TLS version and ciphers to use.

Warning

Use caution when using a Custom profile, because invalid configurations can cause problems.

Note

When using one of the predefined profile types, the effective profile configuration is subject to change between releases. For example, given a specification to use the Intermediate profile deployed on release X.Y.Z, an upgrade to release X.Y.Z+1 might cause a new profile configuration to be applied, resulting in a rollout.

To configure a TLS security profile for an Ingress Controller, edit the IngressController custom resource (CR) to specify a predefined or custom TLS security profile. If a TLS security profile is not configured, the default value is based on the TLS security profile set for the API server.

Sample IngressController CR that configures the Old TLS security profile

apiVersion: operator.openshift.io/v1
kind: IngressController
 ...
spec:
  tlsSecurityProfile:
    old: {}
    type: Old
 ...
Copy to Clipboard Toggle word wrap

The TLS security profile defines the minimum TLS version and the TLS ciphers for TLS connections for Ingress Controllers.

You can see the ciphers and the minimum TLS version of the configured TLS security profile in the IngressController custom resource (CR) under Status.Tls Profile and the configured TLS security profile under Spec.Tls Security Profile. For the Custom TLS security profile, the specific ciphers and minimum TLS version are listed under both parameters.

Note

The HAProxy Ingress Controller image supports TLS 1.3 and the Modern profile.

The Ingress Operator also converts the TLS 1.0 of an Old or Custom profile to 1.1.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Edit the IngressController CR in the openshift-ingress-operator project to configure the TLS security profile:

    $ oc edit IngressController default -n openshift-ingress-operator
    Copy to Clipboard Toggle word wrap
  2. Add the spec.tlsSecurityProfile field:

    Sample IngressController CR for a Custom profile

    apiVersion: operator.openshift.io/v1
    kind: IngressController
     ...
    spec:
      tlsSecurityProfile:
        type: Custom 
    1
    
        custom: 
    2
    
          ciphers: 
    3
    
          - ECDHE-ECDSA-CHACHA20-POLY1305
          - ECDHE-RSA-CHACHA20-POLY1305
          - ECDHE-RSA-AES128-GCM-SHA256
          - ECDHE-ECDSA-AES128-GCM-SHA256
          minTLSVersion: VersionTLS11
     ...
    Copy to Clipboard Toggle word wrap

    1
    Specify the TLS security profile type (Old, Intermediate, or Custom). The default is Intermediate.
    2
    Specify the appropriate field for the selected type:
    • old: {}
    • intermediate: {}
    • custom:
    3
    For the custom type, specify a list of TLS ciphers and minimum accepted TLS version.
  3. Save the file to apply the changes.

Verification

  • Verify that the profile is set in the IngressController CR:

    $ oc describe IngressController default -n openshift-ingress-operator
    Copy to Clipboard Toggle word wrap

    Example output

    Name:         default
    Namespace:    openshift-ingress-operator
    Labels:       <none>
    Annotations:  <none>
    API Version:  operator.openshift.io/v1
    Kind:         IngressController
     ...
    Spec:
     ...
      Tls Security Profile:
        Custom:
          Ciphers:
            ECDHE-ECDSA-CHACHA20-POLY1305
            ECDHE-RSA-CHACHA20-POLY1305
            ECDHE-RSA-AES128-GCM-SHA256
            ECDHE-ECDSA-AES128-GCM-SHA256
          Min TLS Version:  VersionTLS11
        Type:               Custom
     ...
    Copy to Clipboard Toggle word wrap

5.7.3.1.3. Configuring mutual TLS authentication

You can configure the Ingress Controller to enable mutual TLS (mTLS) authentication by setting a spec.clientTLS value. The clientTLS value configures the Ingress Controller to verify client certificates. This configuration includes setting a clientCA value, which is a reference to a config map. The config map contains the PEM-encoded CA certificate bundle that is used to verify a client’s certificate. Optionally, you can also configure a list of certificate subject filters.

If the clientCA value specifies an X509v3 certificate revocation list (CRL) distribution point, the Ingress Operator downloads and manages a CRL config map based on the HTTP URI X509v3 CRL Distribution Point specified in each provided certificate. The Ingress Controller uses this config map during mTLS/TLS negotiation. Requests that do not provide valid certificates are rejected.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have a PEM-encoded CA certificate bundle.
  • If your CA bundle references a CRL distribution point, you must have also included the end-entity or leaf certificate to the client CA bundle. This certificate must have included an HTTP URI under CRL Distribution Points, as described in RFC 5280. For example:

     Issuer: C=US, O=Example Inc, CN=Example Global G2 TLS RSA SHA256 2020 CA1
             Subject: SOME SIGNED CERT            X509v3 CRL Distribution Points:
                    Full Name:
                      URI:http://crl.example.com/example.crl
    Copy to Clipboard Toggle word wrap

Procedure

  1. In the openshift-config namespace, create a config map from your CA bundle:

    $ oc create configmap \
      router-ca-certs-default \
      --from-file=ca-bundle.pem=client-ca.crt \
    1
    
      -n openshift-config
    Copy to Clipboard Toggle word wrap
    1
    The config map data key must be ca-bundle.pem, and the data value must be a CA certificate in PEM format.
  2. Edit the IngressController resource in the openshift-ingress-operator project:

    $ oc edit IngressController default -n openshift-ingress-operator
    Copy to Clipboard Toggle word wrap
  3. Add the spec.clientTLS field and subfields to configure mutual TLS:

    Sample IngressController CR for a clientTLS profile that specifies filtering patterns

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        clientTLS:
          clientCertificatePolicy: Required
          clientCA:
            name: router-ca-certs-default
          allowedSubjectPatterns:
          - "^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"
    Copy to Clipboard Toggle word wrap

  4. Optional, get the Distinguished Name (DN) for allowedSubjectPatterns by entering the following command.

    $ openssl x509 -in custom-cert.pem -noout -subject
    Copy to Clipboard Toggle word wrap

    Example output

    subject=C=US, ST=NC, O=Security, OU=OpenShift, CN=example.com
    Copy to Clipboard Toggle word wrap

5.7.4. View the default Ingress Controller

The Ingress Operator is a core feature of OpenShift Container Platform and is enabled out of the box.

Every new OpenShift Container Platform installation has an ingresscontroller named default. It can be supplemented with additional Ingress Controllers. If the default ingresscontroller is deleted, the Ingress Operator will automatically recreate it within a minute.

Procedure

  • View the default Ingress Controller:

    $ oc describe --namespace=openshift-ingress-operator ingresscontroller/default
    Copy to Clipboard Toggle word wrap

5.7.5. View Ingress Operator status

You can view and inspect the status of your Ingress Operator.

Procedure

  • View your Ingress Operator status:

    $ oc describe clusteroperators/ingress
    Copy to Clipboard Toggle word wrap

5.7.6. View Ingress Controller logs

You can view your Ingress Controller logs.

Procedure

  • View your Ingress Controller logs:

    $ oc logs --namespace=openshift-ingress-operator deployments/ingress-operator -c <container_name>
    Copy to Clipboard Toggle word wrap

5.7.7. View Ingress Controller status

Your can view the status of a particular Ingress Controller.

Procedure

  • View the status of an Ingress Controller:

    $ oc describe --namespace=openshift-ingress-operator ingresscontroller/<name>
    Copy to Clipboard Toggle word wrap

5.7.8. Creating a custom Ingress Controller

As a cluster administrator, you can create a new custom Ingress Controller. Because the default Ingress Controller might change during OpenShift Container Platform updates, creating a custom Ingress Controller can be helpful when maintaining a configuration manually that persists across cluster updates.

This example provides a minimal spec for a custom Ingress Controller. To further customize your custom Ingress Controller, see "Configuring the Ingress Controller".

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a YAML file that defines the custom IngressController object:

    Example custom-ingress-controller.yaml file

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
        name: <custom_name> 
    1
    
        namespace: openshift-ingress-operator
    spec:
        defaultCertificate:
            name: <custom-ingress-custom-certs> 
    2
    
        replicas: 1 
    3
    
        domain: <custom_domain> 
    4
    Copy to Clipboard Toggle word wrap

    1
    Specify the a custom name for the IngressController object.
    2
    Specify the name of the secret with the custom wildcard certificate.
    3
    Minimum replica needs to be ONE
    4
    Specify the domain to your domain name. The domain specified on the IngressController object and the domain used for the certificate must match. For example, if the domain value is "custom_domain.mycompany.com", then the certificate must have SAN *.custom_domain.mycompany.com (with the *. added to the domain).
  2. Create the object by running the following command:

    $ oc create -f custom-ingress-controller.yaml
    Copy to Clipboard Toggle word wrap

5.7.9. Configuring the Ingress Controller

5.7.9.1. Setting a custom default certificate

As an administrator, you can configure an Ingress Controller to use a custom certificate by creating a Secret resource and editing the IngressController custom resource (CR).

Prerequisites

  • You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority or by a private trusted certificate authority that you configured in a custom PKI.
  • Your certificate meets the following requirements:

    • The certificate is valid for the ingress domain.
    • The certificate uses the subjectAltName extension to specify a wildcard domain, such as *.apps.ocp4.example.com.
  • You must have an IngressController CR, which includes just having the default IngressController CR. You can run the following command to check that you have an IngressController CR:

    $ oc --namespace openshift-ingress-operator get ingresscontrollers
    Copy to Clipboard Toggle word wrap
Note

If you have intermediate certificates, they must be included in the tls.crt file of the secret containing a custom default certificate. Order matters when specifying a certificate; list your intermediate certificate(s) after any server certificate(s).

Procedure

The following assumes that the custom certificate and key pair are in the tls.crt and tls.key files in the current working directory. Substitute the actual path names for tls.crt and tls.key. You also may substitute another name for custom-certs-default when creating the Secret resource and referencing it in the IngressController CR.

Note

This action will cause the Ingress Controller to be redeployed, using a rolling deployment strategy.

  1. Create a Secret resource containing the custom certificate in the openshift-ingress namespace using the tls.crt and tls.key files.

    $ oc --namespace openshift-ingress create secret tls custom-certs-default --cert=tls.crt --key=tls.key
    Copy to Clipboard Toggle word wrap
  2. Update the IngressController CR to reference the new certificate secret:

    $ oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
      --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
    Copy to Clipboard Toggle word wrap
  3. Verify the update was effective:

    $ echo Q |\
      openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null |\
      openssl x509 -noout -subject -issuer -enddate
    Copy to Clipboard Toggle word wrap

    where:

    <domain>
    Specifies the base domain name for your cluster.

    Example output

    subject=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = *.apps.example.com
    issuer=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = example.com
    notAfter=May 10 08:32:45 2022 GM
    Copy to Clipboard Toggle word wrap

    Tip

    You can alternatively apply the following YAML to set a custom default certificate:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      defaultCertificate:
        name: custom-certs-default
    Copy to Clipboard Toggle word wrap

    The certificate secret name should match the value used to update the CR.

Once the IngressController CR has been modified, the Ingress Operator updates the Ingress Controller’s deployment to use the custom certificate.

5.7.9.2. Removing a custom default certificate

As an administrator, you can remove a custom certificate that you configured an Ingress Controller to use.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).
  • You previously configured a custom default certificate for the Ingress Controller.

Procedure

  • To remove the custom certificate and restore the certificate that ships with OpenShift Container Platform, enter the following command:

    $ oc patch -n openshift-ingress-operator ingresscontrollers/default \
      --type json -p $'- op: remove\n  path: /spec/defaultCertificate'
    Copy to Clipboard Toggle word wrap

    There can be a delay while the cluster reconciles the new certificate configuration.

Verification

  • To confirm that the original cluster certificate is restored, enter the following command:

    $ echo Q | \
      openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null | \
      openssl x509 -noout -subject -issuer -enddate
    Copy to Clipboard Toggle word wrap

    where:

    <domain>
    Specifies the base domain name for your cluster.

    Example output

    subject=CN = *.apps.<domain>
    issuer=CN = ingress-operator@1620633373
    notAfter=May 10 10:44:36 2023 GMT
    Copy to Clipboard Toggle word wrap

5.7.9.3. Autoscaling an Ingress Controller

You can automatically scale an Ingress Controller to dynamically meet routing performance or availability requirements. For example, the requirement to increase throughput.

The following procedure provides an example for scaling up the default Ingress Controller.

Prerequisites

  • You have the OpenShift CLI (oc) installed.
  • You have access to an OpenShift Container Platform cluster as a user with the cluster-admin role.
  • On VMware vSphere, bare-metal, and Nutanix installer-provisioned infrastructure, scaling up Ingress Controller pods does not improve external traffic performance. To improve performance, ensure that you complete the following prerequisites:

    • You manually configured a user-managed load balancer for your cluster.
    • You ensured that the load balancer was configured for the cluster nodes that handle incoming traffic from the Ingress Controller.
  • You installed the Custom Metrics Autoscaler Operator and an associated KEDA Controller.

    • You can install the Operator by using OperatorHub on the web console. After you install the Operator, you can create an instance of KedaController.

Procedure

  1. Create a service account to authenticate with Thanos by running the following command:

    $ oc create -n openshift-ingress-operator serviceaccount thanos && oc describe -n openshift-ingress-operator serviceaccount thanos
    Copy to Clipboard Toggle word wrap

    Example output

    Name:                thanos
    Namespace:           openshift-ingress-operator
    Labels:              <none>
    Annotations:         <none>
    Image pull secrets:  thanos-dockercfg-kfvf2
    Mountable secrets:   thanos-dockercfg-kfvf2
    Tokens:              <none>
    Events:              <none>
    Copy to Clipboard Toggle word wrap

  2. Manually create the service account secret token with the following command:

    $ oc apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: thanos-token
      namespace: openshift-ingress-operator
      annotations:
        kubernetes.io/service-account.name: thanos
    type: kubernetes.io/service-account-token
    EOF
    Copy to Clipboard Toggle word wrap
  3. Define a TriggerAuthentication object within the openshift-ingress-operator namespace by using the service account’s token.

    1. Create the TriggerAuthentication object and pass the value of the secret variable to the TOKEN parameter:

      $ oc apply -f - <<EOF
      apiVersion: keda.sh/v1alpha1
      kind: TriggerAuthentication
      metadata:
        name: keda-trigger-auth-prometheus
        namespace: openshift-ingress-operator
      spec:
        secretTargetRef:
        - parameter: bearerToken
          name: thanos-token
          key: token
        - parameter: ca
          name: thanos-token
          key: ca.crt
      EOF
      Copy to Clipboard Toggle word wrap
  4. Create and apply a role for reading metrics from Thanos:

    1. Create a new role, thanos-metrics-reader.yaml, that reads metrics from pods and nodes:

      thanos-metrics-reader.yaml

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: thanos-metrics-reader
        namespace: openshift-ingress-operator
      rules:
      - apiGroups:
        - ""
        resources:
        - pods
        - nodes
        verbs:
        - get
      - apiGroups:
        - metrics.k8s.io
        resources:
        - pods
        - nodes
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        - ""
        resources:
        - namespaces
        verbs:
        - get
      Copy to Clipboard Toggle word wrap

    2. Apply the new role by running the following command:

      $ oc apply -f thanos-metrics-reader.yaml
      Copy to Clipboard Toggle word wrap
  5. Add the new role to the service account by entering the following commands:

    $ oc adm policy -n openshift-ingress-operator add-role-to-user thanos-metrics-reader -z thanos --role-namespace=openshift-ingress-operator
    Copy to Clipboard Toggle word wrap
    $ oc adm policy -n openshift-ingress-operator add-cluster-role-to-user cluster-monitoring-view -z thanos
    Copy to Clipboard Toggle word wrap
    Note

    The argument add-cluster-role-to-user is only required if you use cross-namespace queries. The following step uses a query from the kube-metrics namespace which requires this argument.

  6. Create a new ScaledObject YAML file, ingress-autoscaler.yaml, that targets the default Ingress Controller deployment:

    Example ScaledObject definition

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: ingress-scaler
      namespace: openshift-ingress-operator
    spec:
      scaleTargetRef: 
    1
    
        apiVersion: operator.openshift.io/v1
        kind: IngressController
        name: default
        envSourceContainerName: ingress-operator
      minReplicaCount: 1
      maxReplicaCount: 20 
    2
    
      cooldownPeriod: 1
      pollingInterval: 1
      triggers:
      - type: prometheus
        metricType: AverageValue
        metadata:
          serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091 
    3
    
          namespace: openshift-ingress-operator 
    4
    
          metricName: 'kube-node-role'
          threshold: '1'
          query: 'sum(kube_node_role{role="worker",service="kube-state-metrics"})' 
    5
    
          authModes: "bearer"
        authenticationRef:
          name: keda-trigger-auth-prometheus
    Copy to Clipboard Toggle word wrap

    1
    The custom resource that you are targeting. In this case, the Ingress Controller.
    2
    Optional: The maximum number of replicas. If you omit this field, the default maximum is set to 100 replicas.
    3
    The Thanos service endpoint in the openshift-monitoring namespace.
    4
    The Ingress Operator namespace.
    5
    This expression evaluates to however many worker nodes are present in the deployed cluster.
    Important

    If you are using cross-namespace queries, you must target port 9091 and not port 9092 in the serverAddress field. You also must have elevated privileges to read metrics from this port.

  7. Apply the custom resource definition by running the following command:

    $ oc apply -f ingress-autoscaler.yaml
    Copy to Clipboard Toggle word wrap

Verification

  • Verify that the default Ingress Controller is scaled out to match the value returned by the kube-state-metrics query by running the following commands:

    • Use the grep command to search the Ingress Controller YAML file for the number of replicas:

      $ oc get -n openshift-ingress-operator ingresscontroller/default -o yaml | grep replicas:
      Copy to Clipboard Toggle word wrap
    • Get the pods in the openshift-ingress project:

      $ oc get pods -n openshift-ingress
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-7b5df44ff-l9pmm   2/2     Running   0          17h
      router-default-7b5df44ff-s5sl5   2/2     Running   0          3d22h
      router-default-7b5df44ff-wwsth   2/2     Running   0          66s
      Copy to Clipboard Toggle word wrap

5.7.9.4. Scaling an Ingress Controller

Manually scale an Ingress Controller to meeting routing performance or availability requirements such as the requirement to increase throughput. oc commands are used to scale the IngressController resource. The following procedure provides an example for scaling up the default IngressController.

Note

Scaling is not an immediate action, as it takes time to create the desired number of replicas.

Prerequisites

  • On VMware vSphere, bare-metal, and Nutanix installer-provisioned infrastructure, scaling up Ingress Controller pods does not improve external traffic performance. To improve performance, ensure that you complete the following prerequisites:

    • You manually configured a user-managed load balancer for your cluster.
    • You ensured that the load balancer was configured for the cluster nodes that handle incoming traffic from the Ingress Controller.

Procedure

  1. View the current number of available replicas for the default IngressController:

    $ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
    Copy to Clipboard Toggle word wrap
  2. Scale the default IngressController to the desired number of replicas by using the oc patch command. The following example scales the default IngressController to 3 replicas.

    $ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 3}}' --type=merge
    Copy to Clipboard Toggle word wrap
  3. Verify that the default IngressController scaled to the number of replicas that you specified:

    $ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
    Copy to Clipboard Toggle word wrap
    Tip

    You can alternatively apply the following YAML to scale an Ingress Controller to three replicas:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 3               
    1
    Copy to Clipboard Toggle word wrap
    1
    If you need a different amount of replicas, change the replicas value.
5.7.9.5. Configuring Ingress access logging

You can configure the Ingress Controller to enable access logs. If you have clusters that do not receive much traffic, then you can log to a sidecar. If you have high traffic clusters, to avoid exceeding the capacity of the logging stack or to integrate with a logging infrastructure outside of OpenShift Container Platform, you can forward logs to a custom syslog endpoint. You can also specify the format for access logs.

Container logging is useful to enable access logs on low-traffic clusters when there is no existing Syslog logging infrastructure, or for short-term use while diagnosing problems with the Ingress Controller.

Syslog is needed for high-traffic clusters where access logs could exceed the OpenShift Logging stack’s capacity, or for environments where any logging solution needs to integrate with an existing Syslog logging infrastructure. The Syslog use-cases can overlap.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

Configure Ingress access logging to a sidecar.

  • To configure Ingress access logging, you must specify a destination using spec.logging.access.destination. To specify logging to a sidecar container, you must specify Container spec.logging.access.destination.type. The following example is an Ingress Controller definition that logs to a Container destination:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Container
    Copy to Clipboard Toggle word wrap
  • When you configure the Ingress Controller to log to a sidecar, the operator creates a container named logs inside the Ingress Controller Pod:

    $ oc -n openshift-ingress logs deployment.apps/router-default -c logs
    Copy to Clipboard Toggle word wrap

    Example output

    2020-05-11T19:11:50.135710+00:00 router-default-57dfc6cd95-bpmk6 router-default-57dfc6cd95-bpmk6 haproxy[108]: 174.19.21.82:39654 [11/May/2020:19:11:50.133] public be_http:hello-openshift:hello-openshift/pod:hello-openshift:hello-openshift:10.128.2.12:8080 0/0/1/0/1 200 142 - - --NI 1/1/0/0/0 0/0 "GET / HTTP/1.1"
    Copy to Clipboard Toggle word wrap

Configure Ingress access logging to a Syslog endpoint.

  • To configure Ingress access logging, you must specify a destination using spec.logging.access.destination. To specify logging to a Syslog endpoint destination, you must specify Syslog for spec.logging.access.destination.type. If the destination type is Syslog, you must also specify a destination endpoint using spec.logging.access.destination.syslog.address and you can specify a facility using spec.logging.access.destination.syslog.facility. The following example is an Ingress Controller definition that logs to a Syslog destination:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Syslog
            syslog:
              address: 1.2.3.4
              port: 10514
    Copy to Clipboard Toggle word wrap
    Note

    The syslog destination port must be UDP.

    The syslog destination address must be an IP address. It does not support DNS hostname.

Configure Ingress access logging with a specific log format.

  • You can specify spec.logging.access.httpLogFormat to customize the log format. The following example is an Ingress Controller definition that logs to a syslog endpoint with IP address 1.2.3.4 and port 10514:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Syslog
            syslog:
              address: 1.2.3.4
              port: 10514
          httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'
    Copy to Clipboard Toggle word wrap

Disable Ingress access logging.

  • To disable Ingress access logging, leave spec.logging or spec.logging.access empty:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access: null
    Copy to Clipboard Toggle word wrap

Allow the Ingress Controller to modify the HAProxy log length when using a sidecar.

  • Use spec.logging.access.destination.syslog.maxLength if you are using spec.logging.access.destination.type: Syslog.

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Syslog
            syslog:
              address: 1.2.3.4
              maxLength: 4096
              port: 10514
    Copy to Clipboard Toggle word wrap
  • Use spec.logging.access.destination.container.maxLength if you are using spec.logging.access.destination.type: Container.

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            container:
              maxLength: 8192
            type: Container
          httpCaptureHeaders:
            request:
            - maxLength: 128
              name: X-Forwarded-For
    Copy to Clipboard Toggle word wrap
  • To view the original client source IP address by using the X-Forwarded-For header in the Ingress access logs, see the "Capturing Original Client IP from the X-Forwarded-For Header in Ingress and Application Logs" Red Hat Knowledgebase solution.
5.7.9.6. Setting Ingress Controller thread count

A cluster administrator can set the thread count to increase the amount of incoming connections a cluster can handle. You can patch an existing Ingress Controller to increase the amount of threads.

Prerequisites

  • The following assumes that you already created an Ingress Controller.

Procedure

  • Update the Ingress Controller to increase the number of threads:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"threadCount": 8}}}'
    Copy to Clipboard Toggle word wrap
    Note

    If you have a node that is capable of running large amounts of resources, you can configure spec.nodePlacement.nodeSelector with labels that match the capacity of the intended node, and configure spec.tuningOptions.threadCount to an appropriately high value.

When creating an Ingress Controller on cloud platforms, the Ingress Controller is published by a public cloud load balancer by default. As an administrator, you can create an Ingress Controller that uses an internal cloud load balancer.

Warning

If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.

Important

If you want to change the scope for an IngressController, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope parameter after the custom resource (CR) is created.

Figure 5.2. Diagram of LoadBalancer

The preceding graphic shows the following concepts pertaining to OpenShift Container Platform Ingress LoadBalancerService endpoint publishing strategy:

  • You can load balance externally, using the cloud provider load balancer, or internally, using the OpenShift Ingress Controller Load Balancer.
  • You can use the single IP address of the load balancer and more familiar ports, such as 8080 and 4200 as shown on the cluster depicted in the graphic.
  • Traffic from the external load balancer is directed at the pods, and managed by the load balancer, as depicted in the instance of a down node. See the Kubernetes Services documentation for implementation details.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IngressController custom resource (CR) in a file named <name>-ingress-controller.yaml, such as in the following example:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: <name> 
    1
    
    spec:
      domain: <domain> 
    2
    
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: Internal 
    3
    Copy to Clipboard Toggle word wrap
    1
    Replace <name> with a name for the IngressController object.
    2
    Specify the domain for the application published by the controller.
    3
    Specify a value of Internal to use an internal load balancer.
  2. Create the Ingress Controller defined in the previous step by running the following command:

    $ oc create -f <name>-ingress-controller.yaml 
    1
    Copy to Clipboard Toggle word wrap
    1
    Replace <name> with the name of the IngressController object.
  3. Optional: Confirm that the Ingress Controller was created by running the following command:

    $ oc --all-namespaces=true get ingresscontrollers
    Copy to Clipboard Toggle word wrap

An Ingress Controller created on Google Cloud with an internal load balancer generates an internal IP address for the service. A cluster administrator can specify the global access option, which enables clients in any region within the same VPC network and compute region as the load balancer, to reach the workloads running on your cluster.

For more information, see the Google Cloud documentation for global access.

Prerequisites

  • You deployed an OpenShift Container Platform cluster on Google Cloud infrastructure.
  • You configured an Ingress Controller to use an internal load balancer.
  • You installed the OpenShift CLI (oc).

Procedure

  1. Configure the Ingress Controller resource to allow global access.

    Note

    You can also create an Ingress Controller and specify the global access option.

    1. Configure the Ingress Controller resource:

      $ oc -n openshift-ingress-operator edit ingresscontroller/default
      Copy to Clipboard Toggle word wrap
    2. Edit the YAML file:

      Sample clientAccess configuration to Global

        spec:
          endpointPublishingStrategy:
            loadBalancer:
              providerParameters:
                gcp:
                  clientAccess: Global 
      1
      
                type: GCP
              scope: Internal
            type: LoadBalancerService
      Copy to Clipboard Toggle word wrap

      1
      Set gcp.clientAccess to Global.
    3. Save the file to apply the changes.
  2. Run the following command to verify that the service allows global access:

    $ oc -n openshift-ingress edit svc/router-default -o yaml
    Copy to Clipboard Toggle word wrap

    The output shows that global access is enabled for Google Cloud with the annotation, networking.gke.io/internal-load-balancer-allow-global-access.

A cluster administrator can set the health check interval to define how long the router waits between two consecutive health checks. This value is applied globally as a default for all routes. The default value is 5 seconds.

Prerequisites

  • The following assumes that you already created an Ingress Controller.

Procedure

  • Update the Ingress Controller to change the interval between back end health checks:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"healthCheckInterval": "8s"}}}'
    Copy to Clipboard Toggle word wrap
    Note

    To override the healthCheckInterval for a single route, use the route annotation router.openshift.io/haproxy.health.check.interval

You can configure the default Ingress Controller for your cluster to be internal by deleting and recreating it.

Warning

If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.

Important

If you want to change the scope for an IngressController, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope parameter after the custom resource (CR) is created.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Configure the default Ingress Controller for your cluster to be internal by deleting and recreating it.

    $ oc replace --force --wait --filename - <<EOF
    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: default
    spec:
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: Internal
    EOF
    Copy to Clipboard Toggle word wrap
5.7.9.11. Configuring the route admission policy

Administrators and application developers can run applications in multiple namespaces with the same domain name. This is for organizations where multiple teams develop microservices that are exposed on the same hostname.

Warning

Allowing claims across namespaces should only be enabled for clusters with trust between namespaces, otherwise a malicious user could take over a hostname. For this reason, the default admission policy disallows hostname claims across namespaces.

Prerequisites

  • Cluster administrator privileges.

Procedure

  • Edit the .spec.routeAdmission field of the ingresscontroller resource variable using the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
    Copy to Clipboard Toggle word wrap

    Sample Ingress Controller configuration

    spec:
      routeAdmission:
        namespaceOwnership: InterNamespaceAllowed
    ...
    Copy to Clipboard Toggle word wrap

    Tip

    You can alternatively apply the following YAML to configure the route admission policy:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      routeAdmission:
        namespaceOwnership: InterNamespaceAllowed
    Copy to Clipboard Toggle word wrap
5.7.9.12. Using wildcard routes

The HAProxy Ingress Controller has support for wildcard routes. The Ingress Operator uses wildcardPolicy to configure the ROUTER_ALLOW_WILDCARD_ROUTES environment variable of the Ingress Controller.

The default behavior of the Ingress Controller is to admit routes with a wildcard policy of None, which is backwards compatible with existing IngressController resources.

Procedure

  1. Configure the wildcard policy.

    1. Use the following command to edit the IngressController resource:

      $ oc edit IngressController
      Copy to Clipboard Toggle word wrap
    2. Under spec, set the wildcardPolicy field to WildcardsDisallowed or WildcardsAllowed:

      spec:
        routeAdmission:
          wildcardPolicy: WildcardsDisallowed # or WildcardsAllowed
      Copy to Clipboard Toggle word wrap
5.7.9.13. HTTP header configuration

To customize request and response headers for your applications, configure the Ingress Controller or apply specific route annotations. Understanding the interaction between these configuration methods ensures you effectively manage global and route-specific header policies.

You can also set certain headers by using route annotations. The various ways of configuring headers can present challenges when working together.

Note

You can only set or delete headers within an IngressController or Route CR, you cannot append them. If an HTTP header is set with a value, that value must be complete and not require appending in the future. In situations where it makes sense to append a header, such as the X-Forwarded-For header, use the spec.httpHeaders.forwardedHeaderPolicy field, instead of spec.httpHeaders.actions.

Order of precedence

When the same HTTP header is modified both in the Ingress Controller and in a route, HAProxy prioritizes the actions in certain ways depending on whether it is a request or response header.

  • For HTTP response headers, actions specified in the Ingress Controller are executed after the actions specified in a route. This means that the actions specified in the Ingress Controller take precedence.
  • For HTTP request headers, actions specified in a route are executed after the actions specified in the Ingress Controller. This means that the actions specified in the route take precedence.

For example, a cluster administrator sets the X-Frame-Options response header with the value DENY in the Ingress Controller using the following configuration:

Example IngressController spec

apiVersion: operator.openshift.io/v1
kind: IngressController
# ...
spec:
  httpHeaders:
    actions:
      response:
      - name: X-Frame-Options
        action:
          type: Set
          set:
            value: DENY
Copy to Clipboard Toggle word wrap

A route owner sets the same response header that the cluster administrator set in the Ingress Controller, but with the value SAMEORIGIN using the following configuration:

Example Route spec

apiVersion: route.openshift.io/v1
kind: Route
# ...
spec:
  httpHeaders:
    actions:
      response:
      - name: X-Frame-Options
        action:
          type: Set
          set:
            value: SAMEORIGIN
Copy to Clipboard Toggle word wrap

When both the IngressController spec and Route spec are configuring the X-Frame-Options response header, then the value set for this header at the global level in the Ingress Controller takes precedence, even if a specific route allows frames. For a request header, the Route spec value overrides the IngressController spec value.

This prioritization occurs because the haproxy.config file uses the following logic, where the Ingress Controller is considered the front end and individual routes are considered the back end. The header value DENY applied to the front end configurations overrides the same header with the value SAMEORIGIN that is set in the back end:

frontend public
  http-response set-header X-Frame-Options 'DENY'

frontend fe_sni
  http-response set-header X-Frame-Options 'DENY'

frontend fe_no_sni
  http-response set-header X-Frame-Options 'DENY'

backend be_secure:openshift-monitoring:alertmanager-main
  http-response set-header X-Frame-Options 'SAMEORIGIN'
Copy to Clipboard Toggle word wrap

Additionally, any actions defined in either the Ingress Controller or a route override values set using route annotations.

Special case headers
The following headers are either prevented entirely from being set or deleted, or allowed under specific circumstances:
Expand
Table 5.14. Special case header configuration options
Header nameConfigurable using IngressController specConfigurable using Route specReason for disallowmentConfigurable using another method

proxy

No

No

The proxy HTTP request header can be used to exploit vulnerable CGI applications by injecting the header value into the HTTP_PROXY environment variable. The proxy HTTP request header is also non-standard and prone to error during configuration.

No

host

No

Yes

When the host HTTP request header is set using the IngressController CR, HAProxy can fail when looking up the correct route.

No

strict-transport-security

No

No

The strict-transport-security HTTP response header is already handled using route annotations and does not need a separate implementation.

Yes: the haproxy.router.openshift.io/hsts_header route annotation

cookie and set-cookie

No

No

The cookies that HAProxy sets are used for session tracking to map client connections to particular back-end servers. Allowing these headers to be set could interfere with HAProxy’s session affinity and restrict HAProxy’s ownership of a cookie.

Yes:

  • the haproxy.router.openshift.io/disable_cookie route annotation
  • the haproxy.router.openshift.io/cookie_name route annotation

You can set or delete certain HTTP request and response headers for compliance purposes or other reasons. You can set or delete these headers either for all routes served by an Ingress Controller or for specific routes.

For example, you might want to migrate an application running on your cluster to use mutual TLS, which requires that your application checks for an X-Forwarded-Client-Cert request header, but the OpenShift Container Platform default Ingress Controller provides an X-SSL-Client-Der request header.

The following procedure modifies the Ingress Controller to set the X-Forwarded-Client-Cert request header, and delete the X-SSL-Client-Der request header.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have access to an OpenShift Container Platform cluster as a user with the cluster-admin role.

Procedure

  1. Edit the Ingress Controller resource:

    $ oc -n openshift-ingress-operator edit ingresscontroller/default
    Copy to Clipboard Toggle word wrap
  2. Replace the X-SSL-Client-Der HTTP request header with the X-Forwarded-Client-Cert HTTP request header:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      httpHeaders:
        actions: 
    1
    
          request: 
    2
    
          - name: X-Forwarded-Client-Cert 
    3
    
            action:
              type: Set 
    4
    
              set:
               value: "%{+Q}[ssl_c_der,base64]" 
    5
    
          - name: X-SSL-Client-Der
            action:
              type: Delete
    Copy to Clipboard Toggle word wrap
    1
    The list of actions you want to perform on the HTTP headers.
    2
    The type of header you want to change. In this case, a request header.
    3
    The name of the header you want to change. For a list of available headers you can set or delete, see HTTP header configuration.
    4
    The type of action being taken on the header. This field can have the value Set or Delete.
    5
    When setting HTTP headers, you must provide a value. The value can be a string from a list of available directives for that header, for example DENY, or it can be a dynamic value that will be interpreted using HAProxy’s dynamic value syntax. In this case, a dynamic value is added.
    Note

    For setting dynamic header values for HTTP responses, allowed sample fetchers are res.hdr and ssl_c_der. For setting dynamic header values for HTTP requests, allowed sample fetchers are req.hdr and ssl_c_der. Both request and response dynamic values can use the lower and base64 converters.

  3. Save the file to apply the changes.
5.7.9.15. Using X-Forwarded headers

You configure the HAProxy Ingress Controller to specify a policy for how to handle HTTP headers including Forwarded and X-Forwarded-For. The Ingress Operator uses the HTTPHeaders field to configure the ROUTER_SET_FORWARDED_HEADERS environment variable of the Ingress Controller.

Procedure

  1. Configure the HTTPHeaders field for the Ingress Controller.

    1. Use the following command to edit the IngressController resource:

      $ oc edit IngressController
      Copy to Clipboard Toggle word wrap
    2. Under spec, set the HTTPHeaders policy field to Append, Replace, IfNone, or Never:

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpHeaders:
          forwardedHeaderPolicy: Append
      Copy to Clipboard Toggle word wrap
5.7.9.15.1. Example use cases

As a cluster administrator, you can:

  • Configure an external proxy that injects the X-Forwarded-For header into each request before forwarding it to an Ingress Controller.

    To configure the Ingress Controller to pass the header through unmodified, you specify the never policy. The Ingress Controller then never sets the headers, and applications receive only the headers that the external proxy provides.

  • Configure the Ingress Controller to pass the X-Forwarded-For header that your external proxy sets on external cluster requests through unmodified.

    To configure the Ingress Controller to set the X-Forwarded-For header on internal cluster requests, which do not go through the external proxy, specify the if-none policy. If an HTTP request already has the header set through the external proxy, then the Ingress Controller preserves it. If the header is absent because the request did not come through the proxy, then the Ingress Controller adds the header.

As an application developer, you can:

  • Configure an application-specific external proxy that injects the X-Forwarded-For header.

    To configure an Ingress Controller to pass the header through unmodified for an application’s Route, without affecting the policy for other Routes, add an annotation haproxy.router.openshift.io/set-forwarded-headers: if-none or haproxy.router.openshift.io/set-forwarded-headers: never on the Route for the application.

    Note

    You can set the haproxy.router.openshift.io/set-forwarded-headers annotation on a per route basis, independent from the globally set value for the Ingress Controller.

You can enable or disable transparent end-to-end HTTP/2 connectivity in HAProxy. This allows application owners to make use of HTTP/2 protocol capabilities, including single connection, header compression, binary streams, and more.

You can enable or disable HTTP/2 connectivity for an individual Ingress Controller or for the entire cluster.

To enable the use of HTTP/2 for the connection from the client to HAProxy, a route must specify a custom certificate. A route that uses the default certificate cannot use HTTP/2. This restriction is necessary to avoid problems from connection coalescing, where the client re-uses a connection for different routes that use the same certificate.

The connection from HAProxy to the application pod can use HTTP/2 only for re-encrypt routes and not for edge-terminated or insecure routes. This restriction is because HAProxy uses Application-Level Protocol Negotiation (ALPN), which is a TLS extension, to negotiate the use of HTTP/2 with the back-end. The implication is that end-to-end HTTP/2 is possible with passthrough and re-encrypt and not with edge-terminated routes.

Note

You can use HTTP/2 with an insecure route whether the Ingress Controller has HTTP/2 enabled or disabled.

Important

For non-passthrough routes, the Ingress Controller negotiates its connection to the application independently of the connection from the client. This means a client may connect to the Ingress Controller and negotiate HTTP/1.1, and the Ingress Controller might then connect to the application, negotiate HTTP/2, and forward the request from the client HTTP/1.1 connection using the HTTP/2 connection to the application. This poses a problem if the client subsequently tries to upgrade its connection from HTTP/1.1 to the WebSocket protocol, because the Ingress Controller cannot forward WebSocket to HTTP/2 and cannot upgrade its HTTP/2 connection to WebSocket. Consequently, if you have an application that is intended to accept WebSocket connections, it must not allow negotiating the HTTP/2 protocol or else clients will fail to upgrade to the WebSocket protocol.

5.7.9.16.1. Enabling HTTP/2

You can enable HTTP/2 on a specific Ingress Controller, or you can enable HTTP/2 for the entire cluster.

Procedure

  • To enable HTTP/2 on a specific Ingress Controller, enter the oc annotate command:

    $ oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=true 
    1
    Copy to Clipboard Toggle word wrap
    1
    Replace <ingresscontroller_name> with the name of an Ingress Controller to enable HTTP/2.
  • To enable HTTP/2 for the entire cluster, enter the oc annotate command:

    $ oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
    Copy to Clipboard Toggle word wrap
Tip

Alternatively, you can apply the following YAML code to enable HTTP/2:

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
  annotations:
    ingress.operator.openshift.io/default-enable-http2: "true"
Copy to Clipboard Toggle word wrap
5.7.9.16.2. Disabling HTTP/2

You can disable HTTP/2 on a specific Ingress Controller, or you can disable HTTP/2 for the entire cluster.

Procedure

  • To disable HTTP/2 on a specific Ingress Controller, enter the oc annotate command:

    $ oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=false 
    1
    Copy to Clipboard Toggle word wrap
    1
    Replace <ingresscontroller_name> with the name of an Ingress Controller to disable HTTP/2.
  • To disable HTTP/2 for the entire cluster, enter the oc annotate command:

    $ oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=false
    Copy to Clipboard Toggle word wrap
Tip

Alternatively, you can apply the following YAML code to disable HTTP/2:

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
  annotations:
    ingress.operator.openshift.io/default-enable-http2: "false"
Copy to Clipboard Toggle word wrap

A cluster administrator can configure the PROXY protocol when an Ingress Controller uses either the HostNetwork, NodePortService, or Private endpoint publishing strategy types. The PROXY protocol enables the load balancer to preserve the original client addresses for connections that the Ingress Controller receives. The original client addresses are useful for logging, filtering, and injecting HTTP headers. In the default configuration, the connections that the Ingress Controller receives only contain the source address that is associated with the load balancer.

Warning

The default Ingress Controller with installer-provisioned clusters on non-cloud platforms that use a Keepalived Ingress Virtual IP (VIP) do not support the PROXY protocol.

The PROXY protocol enables the load balancer to preserve the original client addresses for connections that the Ingress Controller receives. The original client addresses are useful for logging, filtering, and injecting HTTP headers. In the default configuration, the connections that the Ingress Controller receives contain only the source IP address that is associated with the load balancer.

Important

For a passthrough route configuration, servers in OpenShift Container Platform clusters cannot observe the original client source IP address. If you need to know the original client source IP address, configure Ingress access logging for your Ingress Controller so that you can view the client source IP addresses.

For re-encrypt and edge routes, the OpenShift Container Platform router sets the Forwarded and X-Forwarded-For headers so that application workloads check the client source IP address.

For more information about Ingress access logging, see "Configuring Ingress access logging".

Configuring the PROXY protocol for an Ingress Controller is not supported when using the LoadBalancerService endpoint publishing strategy type. This restriction is because when OpenShift Container Platform runs in a cloud platform, and an Ingress Controller specifies that a service load balancer should be used, the Ingress Operator configures the load balancer service and enables the PROXY protocol based on the platform requirement for preserving source addresses.

Important

You must configure both OpenShift Container Platform and the external load balancer to use either the PROXY protocol or TCP.

This feature is not supported in cloud deployments. This restriction is because when OpenShift Container Platform runs in a cloud platform, and an Ingress Controller specifies that a service load balancer should be used, the Ingress Operator configures the load balancer service and enables the PROXY protocol based on the platform requirement for preserving source addresses.

Important

You must configure both OpenShift Container Platform and the external load balancer to either use the PROXY protocol or to use Transmission Control Protocol (TCP).

Prerequisites

  • You created an Ingress Controller.

Procedure

  1. Edit the Ingress Controller resource by entering the following command in your CLI:

    $ oc -n openshift-ingress-operator edit ingresscontroller/default
    Copy to Clipboard Toggle word wrap
  2. Set the PROXY configuration:

    • If your Ingress Controller uses the HostNetwork endpoint publishing strategy type, set the spec.endpointPublishingStrategy.hostNetwork.protocol subfield to PROXY:

      Sample hostNetwork configuration to PROXY

      # ...
        spec:
          endpointPublishingStrategy:
            hostNetwork:
              protocol: PROXY
            type: HostNetwork
      # ...
      Copy to Clipboard Toggle word wrap

    • If your Ingress Controller uses the NodePortService endpoint publishing strategy type, set the spec.endpointPublishingStrategy.nodePort.protocol subfield to PROXY:

      Sample nodePort configuration to PROXY

      # ...
        spec:
          endpointPublishingStrategy:
            nodePort:
              protocol: PROXY
            type: NodePortService
      # ...
      Copy to Clipboard Toggle word wrap

    • If your Ingress Controller uses the Private endpoint publishing strategy type, set the spec.endpointPublishingStrategy.private.protocol subfield to PROXY:

      Sample private configuration to PROXY

      # ...
        spec:
          endpointPublishingStrategy:
            private:
              protocol: PROXY
          type: Private
      # ...
      Copy to Clipboard Toggle word wrap

As a cluster administrator, you can specify an alternative to the default cluster domain for user-created routes by configuring the appsDomain field. The appsDomain field is an optional domain for OpenShift Container Platform to use instead of the default, which is specified in the domain field. If you specify an alternative domain, it overrides the default cluster domain for the purpose of determining the default host for a new route.

For example, you can use the DNS domain for your company as the default domain for routes and ingresses for applications running on your cluster.

Prerequisites

  • You deployed an OpenShift Container Platform cluster.
  • You installed the oc command-line interface.

Procedure

  1. Configure the appsDomain field by specifying an alternative default domain for user-created routes.

    1. Edit the ingress cluster resource:

      $ oc edit ingresses.config/cluster -o yaml
      Copy to Clipboard Toggle word wrap
    2. Edit the YAML file:

      Sample appsDomain configuration to test.example.com

      apiVersion: config.openshift.io/v1
      kind: Ingress
      metadata:
        name: cluster
      spec:
        domain: apps.example.com            
      1
      
        appsDomain: <test.example.com>      
      2
      Copy to Clipboard Toggle word wrap

      1
      Specifies the default domain. You cannot modify the default domain after installation.
      2
      Optional: Domain for OpenShift Container Platform infrastructure to use for application routes. Instead of the default prefix, apps, you can use an alternative prefix like test.
  2. Verify that an existing route contains the domain name specified in the appsDomain field by exposing the route and verifying the route domain change:

    Note

    Wait for the openshift-apiserver finish rolling updates before exposing the route.

    1. Expose the route by entering the following command. The command outputs route.route.openshift.io/hello-openshift exposed to designate exposure of the route.

      $ oc expose service hello-openshift
      Copy to Clipboard Toggle word wrap
    2. Get a list of routes by running the following command:

      $ oc get routes
      Copy to Clipboard Toggle word wrap

      Example output

      NAME              HOST/PORT                                   PATH   SERVICES          PORT       TERMINATION   WILDCARD
      hello-openshift   hello_openshift-<my_project>.test.example.com
      hello-openshift   8080-tcp                 None
      Copy to Clipboard Toggle word wrap

5.7.9.19. Converting HTTP header case

HAProxy lowercases HTTP header names by default; for example, changing Host: xyz.com to host: xyz.com. If legacy applications are sensitive to the capitalization of HTTP header names, use the Ingress Controller spec.httpHeaders.headerNameCaseAdjustments API field for a solution to accommodate legacy applications until they can be fixed.

Important

OpenShift Container Platform includes HAProxy 2.8. If you want to update to this version of the web-based load balancer, ensure that you add the spec.httpHeaders.headerNameCaseAdjustments section to your cluster’s configuration file.

As a cluster administrator, you can convert the HTTP header case by entering the oc patch command or by setting the HeaderNameCaseAdjustments field in the Ingress Controller YAML file.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  • Capitalize an HTTP header by using the oc patch command.

    1. Change the HTTP header from host to Host by running the following command:

      $ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"httpHeaders":{"headerNameCaseAdjustments":["Host"]}}}'
      Copy to Clipboard Toggle word wrap
    2. Create a Route resource YAML file so that the annotation can be applied to the application.

      Example of a route named my-application

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        annotations:
          haproxy.router.openshift.io/h1-adjust-case: true 
      1
      
        name: <application_name>
        namespace: <application_name>
      # ...
      Copy to Clipboard Toggle word wrap

      1
      Set haproxy.router.openshift.io/h1-adjust-case so that the Ingress Controller can adjust the host request header as specified.
  • Specify adjustments by configuring the HeaderNameCaseAdjustments field in the Ingress Controller YAML configuration file.

    1. The following example Ingress Controller YAML file adjusts the host header to Host for HTTP/1 requests to appropriately annotated routes:

      Example Ingress Controller YAML

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpHeaders:
          headerNameCaseAdjustments:
          - Host
      Copy to Clipboard Toggle word wrap

    2. The following example route enables HTTP response header name case adjustments by using the haproxy.router.openshift.io/h1-adjust-case annotation:

      Example route YAML

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        annotations:
          haproxy.router.openshift.io/h1-adjust-case: true 
      1
      
        name: my-application
        namespace: my-application
      spec:
        to:
          kind: Service
          name: my-application
      Copy to Clipboard Toggle word wrap

      1
      Set haproxy.router.openshift.io/h1-adjust-case to true.
5.7.9.20. Using router compression

You configure the HAProxy Ingress Controller to specify router compression globally for specific MIME types. You can use the mimeTypes variable to define the formats of MIME types to which compression is applied. The types are: application, image, message, multipart, text, video, or a custom type prefaced by "X-". To see the full notation for MIME types and subtypes, see RFC1341.

Note

Memory allocated for compression can affect the max connections. Additionally, compression of large buffers can cause latency, like heavy regex or long lists of regex.

Not all MIME types benefit from compression, but HAProxy still uses resources to try to compress if instructed to. Generally, text formats, such as html, css, and js, formats benefit from compression, but formats that are already compressed, such as image, audio, and video, benefit little in exchange for the time and resources spent on compression.

Procedure

  1. Configure the httpCompression field for the Ingress Controller.

    1. Use the following command to edit the IngressController resource:

      $ oc edit -n openshift-ingress-operator ingresscontrollers/default
      Copy to Clipboard Toggle word wrap
    2. Under spec, set the httpCompression policy field to mimeTypes and specify a list of MIME types that should have compression applied:

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpCompression:
          mimeTypes:
          - "text/html"
          - "text/css; charset=utf-8"
          - "application/json"
         ...
      Copy to Clipboard Toggle word wrap
5.7.9.21. Exposing router metrics

You can expose the HAProxy router metrics by default in Prometheus format on the default stats port, 1936. The external metrics collection and aggregation systems such as Prometheus can access the HAProxy router metrics. You can view the HAProxy router metrics in a browser in the HTML and comma separated values (CSV) format.

Prerequisites

  • You configured your firewall to access the default stats port, 1936.

Procedure

  1. Get the router pod name by running the following command:

    $ oc get pods -n openshift-ingress
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                              READY   STATUS    RESTARTS   AGE
    router-default-76bfffb66c-46qwp   1/1     Running   0          11h
    Copy to Clipboard Toggle word wrap

  2. Get the router’s username and password, which the router pod stores in the /var/lib/haproxy/conf/metrics-auth/statsUsername and /var/lib/haproxy/conf/metrics-auth/statsPassword files:

    1. Get the username by running the following command:

      $ oc rsh <router_pod_name> cat metrics-auth/statsUsername
      Copy to Clipboard Toggle word wrap
    2. Get the password by running the following command:

      $ oc rsh <router_pod_name> cat metrics-auth/statsPassword
      Copy to Clipboard Toggle word wrap
  3. Get the router IP and metrics certificates by running the following command:

    $ oc describe pod <router_pod>
    Copy to Clipboard Toggle word wrap
  4. Get the raw statistics in Prometheus format by running the following command:

    $ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
    Copy to Clipboard Toggle word wrap
  5. Access the metrics securely by running the following command:

    $ curl -u user:password https://<router_IP>:<stats_port>/metrics -k
    Copy to Clipboard Toggle word wrap
  6. Access the default stats port, 1936, by running the following command:

    $ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
    Copy to Clipboard Toggle word wrap

    Example 5.1. Example output

    ...
    # HELP haproxy_backend_connections_total Total number of connections.
    # TYPE haproxy_backend_connections_total gauge
    haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route"} 0
    haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route-alt"} 0
    haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route01"} 0
    ...
    # HELP haproxy_exporter_server_threshold Number of servers tracked and the current threshold value.
    # TYPE haproxy_exporter_server_threshold gauge
    haproxy_exporter_server_threshold{type="current"} 11
    haproxy_exporter_server_threshold{type="limit"} 500
    ...
    # HELP haproxy_frontend_bytes_in_total Current total of incoming bytes.
    # TYPE haproxy_frontend_bytes_in_total gauge
    haproxy_frontend_bytes_in_total{frontend="fe_no_sni"} 0
    haproxy_frontend_bytes_in_total{frontend="fe_sni"} 0
    haproxy_frontend_bytes_in_total{frontend="public"} 119070
    ...
    # HELP haproxy_server_bytes_in_total Current total of incoming bytes.
    # TYPE haproxy_server_bytes_in_total gauge
    haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_no_sni",service=""} 0
    haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_sni",service=""} 0
    haproxy_server_bytes_in_total{namespace="default",pod="docker-registry-5-nk5fz",route="docker-registry",server="10.130.0.89:5000",service="docker-registry"} 0
    haproxy_server_bytes_in_total{namespace="default",pod="hello-rc-vkjqx",route="hello-route",server="10.130.0.90:8080",service="hello-svc-1"} 0
    ...
    Copy to Clipboard Toggle word wrap
  7. Launch the stats window by entering the following URL in a browser:

    http://<user>:<password>@<router_IP>:<stats_port>
    Copy to Clipboard Toggle word wrap
  8. Optional: Get the stats in CSV format by entering the following URL in a browser:

    http://<user>:<password>@<router_ip>:1936/metrics;csv
    Copy to Clipboard Toggle word wrap

As a cluster administrator, you can specify a custom error code response page for either 503, 404, or both error pages. The HAProxy router serves a 503 error page when the application pod is not running or a 404 error page when the requested URL does not exist. For example, if you customize the 503 error code response page, then the page is served when the application pod is not running, and the default 404 error code HTTP response page is served by the HAProxy router for an incorrect route or a non-existing route.

Custom error code response pages are specified in a config map then patched to the Ingress Controller. The config map keys have two available file names as follows: error-page-503.http and error-page-404.http.

Custom HTTP error code response pages must follow the HAProxy HTTP error page configuration guidelines. Here is an example of the default OpenShift Container Platform HAProxy router http 503 error code response page. You can use the default content as a template for creating your own custom page.

By default, the HAProxy router serves only a 503 error page when the application is not running or when the route is incorrect or non-existent. This default behavior is the same as the behavior on OpenShift Container Platform 4.8 and earlier. If a config map for the customization of an HTTP error code response is not provided, and you are using a custom HTTP error code response page, the router serves a default 404 or 503 error code response page.

Note

If you use the OpenShift Container Platform default 503 error code page as a template for your customizations, the headers in the file require an editor that can use CRLF line endings.

Procedure

  1. Create a config map named my-custom-error-code-pages in the openshift-config namespace:

    $ oc -n openshift-config create configmap my-custom-error-code-pages \
      --from-file=error-page-503.http \
      --from-file=error-page-404.http
    Copy to Clipboard Toggle word wrap
    Important

    If you do not specify the correct format for the custom error code response page, a router pod outage occurs. To resolve this outage, you must delete or correct the config map and delete the affected router pods so they can be recreated with the correct information.

  2. Patch the Ingress Controller to reference the my-custom-error-code-pages config map by name:

    $ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
    Copy to Clipboard Toggle word wrap

    The Ingress Operator copies the my-custom-error-code-pages config map from the openshift-config namespace to the openshift-ingress namespace. The Operator names the config map according to the pattern, <your_ingresscontroller_name>-errorpages, in the openshift-ingress namespace.

  3. Display the copy:

    $ oc get cm default-errorpages -n openshift-ingress
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                       DATA   AGE
    default-errorpages         2      25s  
    1
    Copy to Clipboard Toggle word wrap

    1
    The example config map name is default-errorpages because the default Ingress Controller custom resource (CR) was patched.
  4. Confirm that the config map containing the custom error response page mounts on the router volume where the config map key is the filename that has the custom HTTP error code response:

    • For 503 custom HTTP custom error code response:

      $ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-503.http
      Copy to Clipboard Toggle word wrap
    • For 404 custom HTTP custom error code response:

      $ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-404.http
      Copy to Clipboard Toggle word wrap

Verification

Verify your custom error code HTTP response:

  1. Create a test project and application:

    $ oc new-project test-ingress
    Copy to Clipboard Toggle word wrap
    $ oc new-app django-psql-example
    Copy to Clipboard Toggle word wrap
  2. For 503 custom http error code response:

    1. Stop all the pods for the application.
    2. Run the following curl command or visit the route hostname in the browser:

      $ curl -vk <route_hostname>
      Copy to Clipboard Toggle word wrap
  3. For 404 custom http error code response:

    1. Visit a non-existent route or an incorrect route.
    2. Run the following curl command or visit the route hostname in the browser:

      $ curl -vk <route_hostname>
      Copy to Clipboard Toggle word wrap
  4. Check if the errorfile attribute is properly in the haproxy.config file:

    $ oc -n openshift-ingress rsh <router> cat /var/lib/haproxy/conf/haproxy.config | grep errorfile
    Copy to Clipboard Toggle word wrap

A cluster administrator can set the maximum number of simultaneous connections for OpenShift router deployments. You can patch an existing Ingress Controller to increase the maximum number of connections.

Prerequisites

  • The following assumes that you already created an Ingress Controller

Procedure

  • Update the Ingress Controller to change the maximum number of connections for HAProxy:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"maxConnections": 7500}}}'
    Copy to Clipboard Toggle word wrap
    Warning

    If you set the spec.tuningOptions.maxConnections value greater than the current operating system limit, the HAProxy process will not start. See the table in the "Ingress Controller configuration parameters" section for more information about this parameter.

The Ingress Node Firewall Operator provides a stateless, eBPF-based firewall for managing node-level ingress traffic in OpenShift Container Platform.

5.8.1. Ingress Node Firewall Operator

The Ingress Node Firewall Operator provides ingress firewall rules at a node level by deploying the daemon set to nodes you specify and manage in the firewall configurations. To deploy the daemon set, you create an IngressNodeFirewallConfig custom resource (CR). The Operator applies the IngressNodeFirewallConfig CR to create ingress node firewall daemon set daemon, which run on all nodes that match the nodeSelector.

You configure rules of the IngressNodeFirewall CR and apply them to clusters using the nodeSelector and setting values to "true".

Important

The Ingress Node Firewall Operator supports only stateless firewall rules.

Network interface controllers (NICs) that do not support native XDP drivers will run at a lower performance.

For OpenShift Container Platform 4.14 or later, you must run Ingress Node Firewall Operator on RHEL 9.0 or later.

As a cluster administrator, you can install the Ingress Node Firewall Operator by using the OpenShift Container Platform CLI or the web console.

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have an account with administrator privileges.

Procedure

  1. To create the openshift-ingress-node-firewall namespace, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/enforce-version: v1.24
      name: openshift-ingress-node-firewall
    EOF
    Copy to Clipboard Toggle word wrap
  2. To create an OperatorGroup CR, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: ingress-node-firewall-operators
      namespace: openshift-ingress-node-firewall
    EOF
    Copy to Clipboard Toggle word wrap
  3. Subscribe to the Ingress Node Firewall Operator.

    1. To create a Subscription CR for the Ingress Node Firewall Operator, enter the following command:

      $ cat << EOF| oc create -f -
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: ingress-node-firewall-sub
        namespace: openshift-ingress-node-firewall
      spec:
        name: ingress-node-firewall
        channel: stable
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      EOF
      Copy to Clipboard Toggle word wrap
  4. To verify that the Operator is installed, enter the following command:

    $ oc get ip -n openshift-ingress-node-firewall
    Copy to Clipboard Toggle word wrap

    Example output

    NAME            CSV                                         APPROVAL    APPROVED
    install-5cvnz   ingress-node-firewall.4.16.0-202211122336   Automatic   true
    Copy to Clipboard Toggle word wrap

  5. To verify the version of the Operator, enter the following command:

    $ oc get csv -n openshift-ingress-node-firewall
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                        DISPLAY                          VERSION               REPLACES                                    PHASE
    ingress-node-firewall.4.16.0-202211122336   Ingress Node Firewall Operator   4.16.0-202211122336   ingress-node-firewall.4.16.0-202211102047   Succeeded
    Copy to Clipboard Toggle word wrap

As a cluster administrator, you can install the Operator using the web console.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have an account with administrator privileges.

Procedure

  1. Install the Ingress Node Firewall Operator:

    1. In the OpenShift Container Platform web console, click OperatorsOperatorHub.
    2. Select Ingress Node Firewall Operator from the list of available Operators, and then click Install.
    3. On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
    4. Click Install.
  2. Verify that the Ingress Node Firewall Operator is installed successfully:

    1. Navigate to the OperatorsInstalled Operators page.
    2. Ensure that Ingress Node Firewall Operator is listed in the openshift-ingress-node-firewall project with a Status of InstallSucceeded.

      Note

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not have a Status of InstallSucceeded, troubleshoot using the following steps:

      • Inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.
      • Navigate to the WorkloadsPods page and check the logs for pods in the openshift-ingress-node-firewall project.
      • Check the namespace of the YAML file. If the annotation is missing, you can add the annotation workload.openshift.io/allowed=management to the Operator namespace with the following command:

        $ oc annotate ns/openshift-ingress-node-firewall workload.openshift.io/allowed=management
        Copy to Clipboard Toggle word wrap
        Note

        For single-node OpenShift clusters, the openshift-ingress-node-firewall namespace requires the workload.openshift.io/allowed=management annotation.

5.8.3. Deploying Ingress Node Firewall Operator

Prerequisite

  • The Ingress Node Firewall Operator is installed.

Procedure

To deploy the Ingress Node Firewall Operator, create a IngressNodeFirewallConfig custom resource that will deploy the Operator’s daemon set. You can deploy one or multiple IngressNodeFirewall CRDs to nodes by applying firewall rules.

  1. Create the IngressNodeFirewallConfig inside the openshift-ingress-node-firewall namespace named ingressnodefirewallconfig.
  2. Run the following command to deploy Ingress Node Firewall Operator rules:

    $ oc apply -f rule.yaml
    Copy to Clipboard Toggle word wrap

The fields for the Ingress Node Firewall configuration object are described in the following table:

Expand
Table 5.15. Ingress Node Firewall Configuration object
FieldTypeDescription

metadata.name

string

The name of the CR object. The name of the firewall rules object must be ingressnodefirewallconfig.

metadata.namespace

string

Namespace for the Ingress Firewall Operator CR object. The IngressNodeFirewallConfig CR must be created inside the openshift-ingress-node-firewall namespace.

spec.nodeSelector

string

A node selection constraint used to target nodes through specified node labels. For example:

spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
Copy to Clipboard Toggle word wrap
Note

One label used in nodeSelector must match a label on the nodes in order for the daemon set to start. For example, if the node labels node-role.kubernetes.io/worker and node-type.kubernetes.io/vm are applied to a node, then at least one label must be set using nodeSelector for the daemon set to start.

Note

The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector.

A complete Ingress Node Firewall Configuration is specified in the following example:

Example Ingress Node Firewall Configuration object

apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewallConfig
metadata:
  name: ingressnodefirewallconfig
  namespace: openshift-ingress-node-firewall
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
Copy to Clipboard Toggle word wrap

Note

The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector.

5.8.3.3. Ingress Node Firewall rules object

The fields for the Ingress Node Firewall rules object are described in the following table:

Expand
Table 5.16. Ingress Node Firewall rules object
FieldTypeDescription

metadata.name

string

The name of the CR object.

interfaces

array

The fields for this object specify the interfaces to apply the firewall rules to. For example, - en0 and - en1.

nodeSelector

array

You can use nodeSelector to select the nodes to apply the firewall rules to. Set the value of your named nodeselector labels to true to apply the rule.

ingress

object

ingress allows you to configure the rules that allow outside access to the services on your cluster.

5.8.3.3.1. Ingress object configuration

The values for the ingress object are defined in the following table:

Expand
Table 5.17. ingress object
FieldTypeDescription

sourceCIDRs

array

Allows you to set the CIDR block. You can configure multiple CIDRs from different address families.

Note

Different CIDRs allow you to use the same order rule. In the case that there are multiple IngressNodeFirewall objects for the same nodes and interfaces with overlapping CIDRs, the order field will specify which rule is applied first. Rules are applied in ascending order.

rules

array

Ingress firewall rules.order objects are ordered starting at 1 for each source.CIDR with up to 100 rules per CIDR. Lower order rules are executed first.

rules.protocolConfig.protocol supports the following protocols: TCP, UDP, SCTP, ICMP and ICMPv6. ICMP and ICMPv6 rules can match against ICMP and ICMPv6 types or codes. TCP, UDP, and SCTP rules can match against a single destination port or a range of ports using <start : end-1> format.

Set rules.action to allow to apply the rule or deny to disallow the rule.

Note

Ingress firewall rules are verified using a verification webhook that blocks any invalid configuration. The verification webhook prevents you from blocking any critical cluster services such as the API server.

A complete Ingress Node Firewall configuration is specified in the following example:

Example Ingress Node Firewall configuration

apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
  name: ingressnodefirewall
spec:
  interfaces:
  - eth0
  nodeSelector:
    matchLabels:
      <ingress_firewall_label_name>: <label_value> 
1

  ingress:
  - sourceCIDRs:
       - 172.16.0.0/12
    rules:
    - order: 10
      protocolConfig:
        protocol: ICMP
        icmp:
          icmpType: 8 #ICMP Echo request
      action: Deny
    - order: 20
      protocolConfig:
        protocol: TCP
        tcp:
          ports: "8000-9000"
      action: Deny
  - sourceCIDRs:
       - fc00:f853:ccd:e793::0/64
    rules:
    - order: 10
      protocolConfig:
        protocol: ICMPv6
        icmpv6:
          icmpType: 128 #ICMPV6 Echo request
      action: Deny
Copy to Clipboard Toggle word wrap

1
A <label_name> and a <label_value> must exist on the node and must match the nodeselector label and value applied to the nodes you want the ingressfirewallconfig CR to run on. The <label_value> can be true or false. By using nodeSelector labels, you can target separate groups of nodes to apply different rules to using the ingressfirewallconfig CR.

Zero trust Ingress Node Firewall rules can provide additional security to multi-interface clusters. For example, you can use zero trust Ingress Node Firewall rules to drop all traffic on a specific interface except for SSH.

A complete configuration of a zero trust Ingress Node Firewall rule set is specified in the following example:

Important

Users need to add all ports their application will use to their allowlist in the following case to ensure proper functionality.

Example zero trust Ingress Node Firewall rules

apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
 name: ingressnodefirewall-zero-trust
spec:
 interfaces:
 - eth1 
1

 nodeSelector:
   matchLabels:
     <ingress_firewall_label_name>: <label_value> 
2

 ingress:
 - sourceCIDRs:
      - 0.0.0.0/0 
3

   rules:
   - order: 10
     protocolConfig:
       protocol: TCP
       tcp:
         ports: 22
     action: Allow
   - order: 20
     action: Deny 
4
Copy to Clipboard Toggle word wrap

1
Network-interface cluster
2
The <label_name> and <label_value> needs to match the nodeSelector label and value applied to the specific nodes with which you wish to apply the ingressfirewallconfig CR.
3
0.0.0.0/0 set to match any CIDR
4
action set to Deny

Procedure

  1. Run the following command to view all current rules :

    $ oc get ingressnodefirewall
    Copy to Clipboard Toggle word wrap
  2. Choose one of the returned <resource> names and run the following command to view the rules or configs:

    $ oc get <resource> <name> -o yaml
    Copy to Clipboard Toggle word wrap
  • Run the following command to list installed Ingress Node Firewall custom resource definitions (CRD):

    $ oc get crds | grep ingressnodefirewall
    Copy to Clipboard Toggle word wrap

    Example output

    NAME               READY   UP-TO-DATE   AVAILABLE   AGE
    ingressnodefirewallconfigs.ingressnodefirewall.openshift.io       2022-08-25T10:03:01Z
    ingressnodefirewallnodestates.ingressnodefirewall.openshift.io    2022-08-25T10:03:00Z
    ingressnodefirewalls.ingressnodefirewall.openshift.io             2022-08-25T10:03:00Z
    Copy to Clipboard Toggle word wrap

  • Run the following command to view the state of the Ingress Node Firewall Operator:

    $ oc get pods -n openshift-ingress-node-firewall
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                       READY  STATUS         RESTARTS  AGE
    ingress-node-firewall-controller-manager   2/2    Running        0         5d21h
    ingress-node-firewall-daemon-pqx56         3/3    Running        0         5d21h
    Copy to Clipboard Toggle word wrap

    The following fields provide information about the status of the Operator: READY, STATUS, AGE, and RESTARTS. The STATUS field is Running when the Ingress Node Firewall Operator is deploying a daemon set to the assigned nodes.

  • Run the following command to collect all ingress firewall node pods' logs:

    $ oc adm must-gather – gather_ingress_node_firewall
    Copy to Clipboard Toggle word wrap

    The logs are available in the sos node’s report containing eBPF bpftool outputs at /sos_commands/ebpf. These reports include lookup tables used or updated as the ingress firewall XDP handles packet processing, updates statistics, and emits events.

5.9. SR-IOV Operator

5.9.1. Installing the SR-IOV Network Operator

To manage SR-IOV network devices and network attachments on your cluster, install the Single Root I/O Virtualization (SR-IOV) Network Operator. By using this Operator, you can centralize the configuration and lifecycle management of your SR-IOV resources.

As a cluster administrator, you can install the Single Root I/O Virtualization (SR-IOV) Network Operator by using the OpenShift Container Platform CLI or the web console.

You can use the CLI to install the SR-IOV Network Operator. By using the CLI, you can deploy the Operator directly from your terminal to manage SR-IOV network devices and attachments without navigating the web console.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have an account with cluster-admin privileges.
  • You installed a cluster on bare-metal hardware, and you ensured that cluster nodes have hardware that supports SR-IOV.

Procedure

  1. Create the openshift-sriov-network-operator namespace by entering the following command:

    $ cat << EOF| oc create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sriov-network-operator
      annotations:
        workload.openshift.io/allowed: management
    EOF
    Copy to Clipboard Toggle word wrap
  2. Create an OperatorGroup custom resource (CR) by entering the following command:

    $ cat << EOF| oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator
    EOF
    Copy to Clipboard Toggle word wrap
  3. Create a Subscription CR for the SR-IOV Network Operator by entering the following command:

    $ cat << EOF| oc create -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: sriov-network-operator-subscription
      namespace: openshift-sriov-network-operator
    spec:
      channel: stable
      name: sriov-network-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
    Copy to Clipboard Toggle word wrap
  4. Create an SriovoperatorConfig resource by entering the following command:

    $ cat <<EOF | oc create -f -
    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      enableInjector: true
      enableOperatorWebhook: true
      logLevel: 2
      disableDrain: false
    EOF
    Copy to Clipboard Toggle word wrap

Verification

  • To verify that the Operator is installed, enter the following command and then check that the output shows Succeeded for the Operator:

    $ oc get csv -n openshift-sriov-network-operator \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase
    Copy to Clipboard Toggle word wrap

You can use the web console to install the SR-IOV Network Operator. By using the web console, you can deploy the Operator and manage SR-IOV network devices and attachments directly from a graphical interface without having to the CLI.

Prerequisites

  • You have an account with cluster-admin privileges.
  • You installed a cluster on bare-metal hardware, and you ensured that cluster nodes have hardware that supports SR-IOV.

Procedure

  1. Install the SR-IOV Network Operator:

    1. In the OpenShift Container Platform web console, click EcosystemSoftware Catalog.
    2. Select SR-IOV Network Operator from the list of available Operators, and then click Install.
    3. On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
    4. Click Install.

Verification

  1. Navigate to the EcosystemInstalled Operators page.
  2. Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

    Note

    During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

  3. If the Operator does not show as installed, complete any of the following steps to troubleshoot the issue:

    • Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
    • Navigate to the WorkloadsPods page and check the logs for pods in the openshift-sriov-network-operator project.
    • Check the namespace of the YAML file. If the annotation is missing, you can add the annotation workload.openshift.io/allowed=management to the Operator namespace with the following command:

      $ oc annotate ns/openshift-sriov-network-operator workload.openshift.io/allowed=management
      Copy to Clipboard Toggle word wrap
      Note

      For single-node OpenShift clusters, the annotation workload.openshift.io/allowed=management is required for the namespace.

5.9.2. Configuring the SR-IOV Network Operator

To manage SR-IOV network devices and network attachments in your cluster, use the Single Root I/O Virtualization (SR-IOV) Network Operator.

5.9.2.1. Configuring the SR-IOV Network Operator

To manage SR-IOV network devices and network attachments in your cluster, configure the Single Root I/O Virtualization (SR-IOV) Network Operator. You can then deploy the Operator components to your cluster.

Procedure

  1. Create a SriovOperatorConfig custom resource (CR). The following example creates a file named sriovOperatorConfig.yaml:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      disableDrain: false
      enableInjector: true
      enableOperatorWebhook: true
      logLevel: 2
      featureGates:
        metricsExporter: false
    # ...
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specifies the name of the SR-IOV Network Operator instance. The only valid name for the SriovOperatorConfig resource is default and the name must be in the namespace where the Operator is deployed.
    spec.enableInjector
    Specifies if any network-resources-injector pod can run in the namespace. If not specified in the CR or explicitly set to true, defaults to false or <none>, preventing any network-resources-injector pod from running in the namespace. The recommended setting is true.
    spec.enableOperatorWebhook
    Specifies if any operator-webhook pods can run in the namespace. The enableOperatorWebhook field, if not specified in the CR or explicitly set to true, defaults to false or <none>, preventing any operator-webhook pod from running in the namespace. The recommended setting is true.
  2. Apply the resource to your cluster by running the following command:

    $ oc apply -f sriovOperatorConfig.yaml
    Copy to Clipboard Toggle word wrap

To customize the SR-IOV Network Operator, configure the sriovoperatorconfig custom resource. The reference lists the parameters available for controlling the global settings and deployment behavior of the Operator.

The following table describes the sriovoperatorconfig CR fields:

Expand
Table 5.18. SR-IOV Network Operator config custom resource
FieldTypeDescription

metadata.name

string

Specifies the name of the SR-IOV Network Operator instance. The default value is default. Do not set a different value.

metadata.namespace

string

Specifies the namespace of the SR-IOV Network Operator instance. The default value is openshift-sriov-network-operator. Do not set a different value.

spec.configDaemonNodeSelector

string

Specifies the node selection to control scheduling the SR-IOV Network Config Daemon on selected nodes. By default, this field is not set and the Operator deploys the SR-IOV Network Config daemon set on compute nodes.

spec.disableDrain

boolean

Specifies whether to disable the node draining process or enable the node draining process when you apply a new policy to configure the NIC on a node. Setting this field to true facilitates software development and installing OpenShift Container Platform on a single node. By default, this field is not set. For single-node clusters, set this field to true after installing the Operator. This field must remain set to true.

spec.enableInjector

boolean

Specifies whether to enable or disable the Network Resources Injector daemon set.

spec.enableOperatorWebhook

boolean

Specifies whether to enable or disable the Operator Admission Controller webhook daemon set.

spec.logLevel

integer

Specifies the log verbosity level of the Operator. By default, this field is set to 0, which shows only basic logs. Set to 2 to show all the available logs.

spec.featureGates

map[string]bool

Specifies whether to enable or disable the optional features. For example, metricsExporter.

spec.featureGates.metricsExporter

boolean

Specifies whether to enable or disable the SR-IOV Network Operator metrics. By default, this field is set to false.

spec.featureGates.mellanoxFirmwareReset

boolean

Specifies whether to reset the firmware on virtual function (VF) changes in the SR-IOV Network Operator. Some chipsets, such as the Intel C740 Series, do not completely power off the PCI-E devices, which is required to configure VFs on NVIDIA/Mellanox NICs. By default, this field is set to false.

Important

Kubernetes NMState Operator is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

5.9.2.3. About the Network Resources Injector

To automate network configuration for your workloads, use the Network Resources Injector. This Kubernetes Dynamic Admission Controller intercepts pod creation requests to automatically inject the necessary network resources and parameters defined for your cluster.

The Network Resources Injector provides the following capabilities:

  • Mutation of resource requests and limits in a pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
  • Mutation of a pod specification with a Downward API volume to expose pod annotations, labels, and huge pages requests and limits. Containers that run in the pod can access the exposed information as files under the /etc/podnetinfo path.

The SR-IOV Network Operator enables the Network Resources Injector when the enableInjector is set to true in the SriovOperatorConfig CR. The network-resources-injector pod runs as a daemon set on all control plane nodes. The following is an example of Network Resources Injector pods running in a cluster with three control plane nodes:

$ oc get pods -n openshift-sriov-network-operator
Copy to Clipboard Toggle word wrap

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-5cz5p          1/1     Running   0          10m
network-resources-injector-dwqpx          1/1     Running   0          10m
network-resources-injector-lktz5          1/1     Running   0          10m
Copy to Clipboard Toggle word wrap

By default, the failurePolicy field in the Network Resources Injector webhook is set to Ignore. This default setting prevents pod creation from being blocked if the webhook is unavailable.

If you set the failurePolicy field to Fail, and the Network Resources Injector webhook is unavailable, the webhook attempts to mutate all pod creation and update requests. This behavior can block pod creation and disrupt normal cluster operations. To prevent such issues, you can enable the featureGates.resourceInjectorMatchCondition feature in the SriovOperatorConfig object to limit the scope of the Network Resources Injector webhook. If this feature is enabled, the webhook applies only to pods with the secondary network annotation k8s.v1.cni.cncf.io/networks.

If you set the failurePolicy field to Fail after enabling the resourceInjectorMatchCondition feature, the webhook applies only to pods with the secondary network annotation k8s.v1.cni.cncf.io/networks. If the webhook is unavailable, the cluster still deploys pods without this annotation; this prevents unnecessary disruptions to cluster operations.

The featureGates.resourceInjectorMatchCondition feature is disabled by default. To enable this feature, set the featureGates.resourceInjectorMatchCondition field to true in the SriovOperatorConfig object.

Example SriovOperatorConfig object configuration

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovOperatorConfig
metadata:
  name: default
  namespace: sriov-network-operator
spec:
# ...
  featureGates:
    resourceInjectorMatchCondition: true
# ...
Copy to Clipboard Toggle word wrap

To control the automatic configuration of your cluster workloads, enable or disable the Network Resources Injector. By adjusting these settings you can better manage whether the Kubernetes Dynamic Admission Controller automatically injects network resources and parameters into pods during their creation.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

Procedure

  • Set the enableInjector field. Replace <value> with false to disable the feature or true to enable the feature.

    $ oc patch sriovoperatorconfig default \
      --type=merge -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableInjector": <value> } }'
    Copy to Clipboard Toggle word wrap
    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      enableInjector: <value>
    # ...
    Copy to Clipboard Toggle word wrap

To validate network configurations and prevent errors, rely on the SR-IOV Network Operator admission controller webhook. This Kubernetes Dynamic Admission Controller intercepts API requests to ensure that your SR-IOV resource definitions and pod specifications comply with cluster policies.

The SR-IOV Network Operator Admission Controller webhook provides the following capabilities:

  • Validation of the SriovNetworkNodePolicy CR when it is created or updated.
  • Mutation of the SriovNetworkNodePolicy CR by setting the default value for the priority and deviceType fields when the CR is created or updated.

The SR-IOV Network Operator Admission Controller webhook is enabled by the Operator when the enableOperatorWebhook is set to true in the SriovOperatorConfig CR. The operator-webhook pod runs as a daemon set on all control plane nodes.

Note

Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices. For information about configuring unsupported devices, see "Configuring the SR-IOV Network Operator to use an unsupported NIC".

The following is an example of the Operator Admission Controller webhook pods running in a cluster with three control plane nodes:

$ oc get pods -n openshift-sriov-network-operator
Copy to Clipboard Toggle word wrap

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
operator-webhook-9jkw6                    1/1     Running   0          16m
operator-webhook-kbr5p                    1/1     Running   0          16m
operator-webhook-rpfrl                    1/1     Running   0          16m
Copy to Clipboard Toggle word wrap

To manage validation of your network configurations, enable or disable the SR-IOV Network Operator admission controller webhook. When enabled, the Operator automatically verifies SR-IOV resource definitions and pod specifications against cluster policies.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

Procedure

  • Set the enableOperatorWebhook field. Replace <value> with false to disable the feature or true to enable it:

    $ oc patch sriovoperatorconfig default --type=merge \
      -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableOperatorWebhook": <value> } }'
    Copy to Clipboard Toggle word wrap
    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      enableOperatorWebhook: <value>
    # ...
    Copy to Clipboard Toggle word wrap

To specify which nodes host the SR-IOV Network Config daemon, configure a custom node selector by using node labels. By completing this task, you can restrict deployment to specific nodes instead of the default compute nodes.

The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, the daemon is deployed to all the compute nodes in the cluster.

Important

When you update the configDaemonNodeSelector field, the SR-IOV Network Config daemon is recreated on each selected node. While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.

Procedure

  • To update the node selector for the Operator, enter the following command:

    $ oc patch sriovoperatorconfig default --type=json \
      -n openshift-sriov-network-operator \
      --patch '[{
          "op": "replace",
          "path": "/spec/configDaemonNodeSelector",
          "value": {<node_label>}
        }]'
    Copy to Clipboard Toggle word wrap

    Replace <node_label> with a label to apply as in the following example: "node-role.kubernetes.io/worker": "".

    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      configDaemonNodeSelector:
        <node_label>
    # ...
    Copy to Clipboard Toggle word wrap

By default, the SR-IOV Network Operator drains workloads from a node before every policy change. The Operator performs this action to ensure that no workloads are using the virtual functions before the reconfiguration. As a result, you must configure the Operator to not drain workloads from the single node.

For installations on a single node, other nodes do not receive the workloads.

Important

After performing the following procedure to disable draining workloads, you must remove any workload that uses an SR-IOV network interface before you change any SR-IOV network node policy.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

Procedure

  • To set the disableDrain field to true and the configDaemonNodeSelector field to node-role.kubernetes.io/master: "", enter the following command:

    $ oc patch sriovoperatorconfig default --type=merge -n openshift-sriov-network-operator --patch '{ "spec": { "disableDrain": true, "configDaemonNodeSelector": { "node-role.kubernetes.io/master": "" } } }'
    Copy to Clipboard Toggle word wrap
    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      disableDrain: true
      configDaemonNodeSelector:
       node-role.kubernetes.io/master: ""
    # ...
    Copy to Clipboard Toggle word wrap
Important

Hosted control planes on the AWS platform is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.

Prerequisites

You must configure and deploy the hosted cluster on AWS. For more information, see Configuring the hosting cluster on AWS (Technology Preview).

Procedure

  1. Create a namespace and an Operator group:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sriov-network-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator
    Copy to Clipboard Toggle word wrap
  2. Create a subscription to the SR-IOV Operator:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: sriov-network-operator-subsription
      namespace: openshift-sriov-network-operator
    spec:
      channel: stable
      name: sriov-network-operator
      config:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    Copy to Clipboard Toggle word wrap

Verification

  1. To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:

    $ oc get csv -n openshift-sriov-network-operator
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                         DISPLAY                   VERSION               REPLACES                                     PHASE
    sriov-network-operator.4.16.0-202211021237   SR-IOV Network Operator   4.16.0-202211021237   sriov-network-operator.4.16.0-202210290517   Succeeded
    Copy to Clipboard Toggle word wrap

  2. To verify that the SR-IOV pods are deployed, run the following command:

    $ oc get pods -n openshift-sriov-network-operator
    Copy to Clipboard Toggle word wrap
5.9.2.9. About the SR-IOV network metrics exporter

To monitor the networking activity of SR-IOV pods, enable the SR-IOV network metrics exporter. This tool exposes metrics for SR-IOV virtual functions (VFs) in Prometheus format, so that you can query and visualize data through the OpenShift Container Platform web console.

When you query the SR-IOV VF metrics by using the web console, the SR-IOV network metrics exporter fetches and returns the VF network statistics along with the name and namespace of the pod that the VF is attached to.

The following table describes the SR-IOV VF metrics that the metrics exporter reads and exposes in Prometheus format:

Expand
Table 5.19. SR-IOV VF metrics
MetricDescriptionExample PromQL query to examine the VF metric

sriov_vf_rx_bytes

Received bytes per virtual function.

sriov_vf_rx_bytes * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_tx_bytes

Transmitted bytes per virtual function.

sriov_vf_tx_bytes * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_rx_packets

Received packets per virtual function.

sriov_vf_rx_packets * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_tx_packets

Transmitted packets per virtual function.

sriov_vf_tx_packets * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_rx_dropped

Dropped packets upon receipt per virtual function.

sriov_vf_rx_dropped * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_tx_dropped

Dropped packets during transmission per virtual function.

sriov_vf_tx_dropped * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_rx_multicast

Received multicast packets per virtual function.

sriov_vf_rx_multicast * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_vf_rx_broadcast

Received broadcast packets per virtual function.

sriov_vf_rx_broadcast * on (pciAddr,node) group_left(pod,namespace,dev_type) sriov_kubepoddevice

sriov_kubepoddevice

Virtual functions linked to active pods.

-

You can also combine these queries by using the kube-state-metrics tool to get more information about the SR-IOV pods. For example, you can use the following query to get the VF network statistics along with the application name from the standard Kubernetes pod label:

(sriov_vf_tx_packets * on (pciAddr,node)  group_left(pod,namespace)  sriov_kubepoddevice) * on (pod,namespace) group_left (label_app_kubernetes_io_name) kube_pod_labels
Copy to Clipboard Toggle word wrap

To enable the SR-IOV network metrics exporter, set the spec.featureGates.metricsExporter field to true. Because the exporter is disabled by default, you must explicitly activate this feature gate to start exposing metrics for your SR-IOV devices.

Important

When the metrics exporter is enabled, the SR-IOV Network Operator deploys the metrics exporter only on nodes with SR-IOV capabilities.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed the SR-IOV Network Operator.

Procedure

  1. Enable cluster monitoring by running the following command:

    $ oc label ns/openshift-sriov-network-operator openshift.io/cluster-monitoring=true
    Copy to Clipboard Toggle word wrap

    To enable cluster monitoring, you must add the openshift.io/cluster-monitoring=true label in the namespace where you have installed the SR-IOV Network Operator.

  2. Set the spec.featureGates.metricsExporter field to true by running the following command:

    $ oc patch -n openshift-sriov-network-operator sriovoperatorconfig/default \
        --type='merge' -p='{"spec": {"featureGates": {"metricsExporter": true}}}'
    Copy to Clipboard Toggle word wrap

Verification

  1. Check that the SR-IOV network metrics exporter is enabled by running the following command:

    $ oc get pods -n openshift-sriov-network-operator
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                     READY   STATUS    RESTARTS   AGE
    operator-webhook-hzfg4                   1/1     Running   0          5d22h
    sriov-network-config-daemon-tr54m        1/1     Running   0          5d22h
    sriov-network-metrics-exporter-z5d7t     1/1     Running   0          10s
    sriov-network-operator-cc6fd88bc-9bsmt   1/1     Running   0          5d22h
    Copy to Clipboard Toggle word wrap

    Ensure that sriov-network-metrics-exporter pod is in the READY state.

  2. Optional: Examine the SR-IOV virtual function (VF) metrics by using the OpenShift Container Platform web console. For more information, see "Querying metrics".

5.9.3. Uninstalling the SR-IOV Network Operator

To uninstall the SR-IOV Network Operator, you must delete any running SR-IOV workloads, uninstall the Operator, and delete the webhooks that the Operator used.

5.9.3.1. Uninstalling the SR-IOV Network Operator

You can remove the SR-IOV Network Operator from your cluster by uninstalling the Operator. This ensures that the Operator and its associated resources are deleted when you no longer need to manage SR-IOV network devices.

Prerequisites

  • You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
  • You have the SR-IOV Network Operator installed.

Procedure

  1. Delete all SR-IOV custom resources (CRs):

    $ oc delete sriovnetwork -n openshift-sriov-network-operator --all
    Copy to Clipboard Toggle word wrap
    $ oc delete sriovnetworknodepolicy -n openshift-sriov-network-operator --all
    Copy to Clipboard Toggle word wrap
    $ oc delete sriovibnetwork -n openshift-sriov-network-operator --all
    Copy to Clipboard Toggle word wrap
  2. Follow the instructions in the "Deleting Operators from a cluster" section to remove the SR-IOV Network Operator from your cluster.
  3. Delete the SR-IOV custom resource definitions that remain in the cluster after the SR-IOV Network Operator is uninstalled:

    $ oc delete crd sriovibnetworks.sriovnetwork.openshift.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd sriovnetworknodepolicies.sriovnetwork.openshift.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd sriovnetworknodestates.sriovnetwork.openshift.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd sriovnetworkpoolconfigs.sriovnetwork.openshift.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd sriovnetworks.sriovnetwork.openshift.io
    Copy to Clipboard Toggle word wrap
    $ oc delete crd sriovoperatorconfigs.sriovnetwork.openshift.io
    Copy to Clipboard Toggle word wrap
  4. Delete the SR-IOV webhooks:

    $ oc delete mutatingwebhookconfigurations network-resources-injector-config
    Copy to Clipboard Toggle word wrap
    $ oc delete MutatingWebhookConfiguration sriov-operator-webhook-config
    Copy to Clipboard Toggle word wrap
    $ oc delete ValidatingWebhookConfiguration sriov-operator-webhook-config
    Copy to Clipboard Toggle word wrap
  5. Delete the SR-IOV Network Operator namespace:

    $ oc delete namespace openshift-sriov-network-operator
    Copy to Clipboard Toggle word wrap

Chapter 6. Network security

6.1. Understanding network policy APIs

Kubernetes offers two features that users can use to enforce network security. One feature that allows users to enforce network policy is the NetworkPolicy API that is designed mainly for application developers and namespace tenants to protect their namespaces by creating namespace-scoped policies.

The second feature is AdminNetworkPolicy which consists of two APIs: the AdminNetworkPolicy (ANP) API and the BaselineAdminNetworkPolicy (BANP) API. ANP and BANP are designed for cluster and network administrators to protect their entire cluster by creating cluster-scoped policies. Cluster administrators can use ANPs to enforce non-overridable policies that take precedence over NetworkPolicy objects. Administrators can use BANP to set up and enforce optional cluster-scoped network policy rules that are overridable by users using NetworkPolicy objects when necessary. When used together, ANP, BANP, and network policy can achieve full multi-tenant isolation that administrators can use to secure their cluster.

OVN-Kubernetes CNI in OpenShift Container Platform implements these network policies using Access Control List (ACL) Tiers to evaluate and apply them. ACLs are evaluated in descending order from Tier 1 to Tier 3.

Tier 1 evaluates AdminNetworkPolicy (ANP) objects. Tier 2 evaluates NetworkPolicy objects. Tier 3 evaluates BaselineAdminNetworkPolicy (BANP) objects.

ANPs are evaluated first. When the match is an ANP allow or deny rule, any existing NetworkPolicy and BaselineAdminNetworkPolicy (BANP) objects in the cluster are skipped from evaluation. When the match is an ANP pass rule, then evaluation moves from tier 1 of the ACL to tier 2 where the NetworkPolicy policy is evaluated. If no NetworkPolicy matches the traffic then evaluation moves from tier 2 ACLs to tier 3 ACLs where BANP is evaluated.

The following table explains key differences between the cluster scoped AdminNetworkPolicy API and the namespace scoped NetworkPolicy API.

Expand
Policy elementsAdminNetworkPolicyNetworkPolicy

Applicable user

Cluster administrator or equivalent

Namespace owners

Scope

Cluster

Namespaced

Drop traffic

Supported with an explicit Deny action set as a rule.

Supported via implicit Deny isolation at policy creation time.

Delegate traffic

Supported with an Pass action set as a rule.

Not applicable

Allow traffic

Supported with an explicit Allow action set as a rule.

The default action for all rules is to allow.

Rule precedence within the policy

Depends on the order in which they appear within an ANP. The higher the rule’s position the higher the precedence.

Rules are additive

Policy precedence

Among ANPs the priority field sets the order for evaluation. The lower the priority number higher the policy precedence.

There is no policy ordering between policies.

Feature precedence

Evaluated first via tier 1 ACL and BANP is evaluated last via tier 3 ACL.

Enforced after ANP and before BANP, they are evaluated in tier 2 of the ACL.

Matching pod selection

Can apply different rules across namespaces.

Can apply different rules across pods in single namespace.

Cluster egress traffic

Supported via nodes and networks peers

Supported through ipBlock field along with accepted CIDR syntax.

Cluster ingress traffic

Not supported

Not supported

Fully qualified domain names (FQDN) peer support

Not supported

Not supported

Namespace selectors

Supports advanced selection of Namespaces with the use of namespaces.matchLabels field

Supports label based namespace selection with the use of namespaceSelector field

6.2. Admin network policy

6.2.1. OVN-Kubernetes AdminNetworkPolicy

6.2.1.1. AdminNetworkPolicy

An AdminNetworkPolicy (ANP) is a cluster-scoped custom resource definition (CRD). As a OpenShift Container Platform administrator, you can use ANP to secure your network by creating network policies before creating namespaces. Additionally, you can create network policies on a cluster-scoped level that is non-overridable by NetworkPolicy objects.

The key difference between AdminNetworkPolicy and NetworkPolicy objects are that the former is for administrators and is cluster scoped while the latter is for tenant owners and is namespace scoped.

An ANP allows administrators to specify the following:

  • A priority value that determines the order of its evaluation. The lower the value the higher the precedence.
  • A set of pods that consists of a set of namespaces or namespace on which the policy is applied.
  • A list of ingress rules to be applied for all ingress traffic towards the subject.
  • A list of egress rules to be applied for all egress traffic from the subject.
6.2.1.1.1. AdminNetworkPolicy example

Example 6.1. Example YAML file for an ANP

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: sample-anp-deny-pass-rules 
1

spec:
  priority: 50 
2

  subject:
    namespaces:
      matchLabels:
          kubernetes.io/metadata.name: example.name 
3

  ingress: 
4

  - name: "deny-all-ingress-tenant-1" 
5

    action: "Deny"
    from:
    - pods:
        namespaceSelector:
          matchLabels:
            custom-anp: tenant-1
        podSelector:
          matchLabels:
            custom-anp: tenant-1 
6

  egress:
7

  - name: "pass-all-egress-to-tenant-1"
    action: "Pass"
    to:
    - pods:
        namespaceSelector:
          matchLabels:
            custom-anp: tenant-1
        podSelector:
          matchLabels:
            custom-anp: tenant-1
Copy to Clipboard Toggle word wrap
1
Specify a name for your ANP.
2
The spec.priority field supports a maximum of 100 ANPs in the range of values 0-99 in a cluster. The lower the value, the higher the precedence because the range is read in order from the lowest to highest value. Because there is no guarantee which policy takes precedence when ANPs are created at the same priority, set ANPs at different priorities so that precedence is deliberate.
3
Specify the namespace to apply the ANP resource.
4
ANP have both ingress and egress rules. ANP rules for spec.ingress field accepts values of Pass, Deny, and Allow for the action field.
5
Specify a name for the ingress.name.
6
Specify podSelector.matchLabels to select pods within the namespaces selected by namespaceSelector.matchLabels as ingress peers.
7
ANPs have both ingress and egress rules. ANP rules for spec.egress field accepts values of Pass, Deny, and Allow for the action field.

Additional resources

6.2.1.1.2. AdminNetworkPolicy actions for rules

As an administrator, you can set Allow, Deny, or Pass as the action field for your AdminNetworkPolicy rules. Because OVN-Kubernetes uses a tiered ACLs to evaluate network traffic rules, ANP allow you to set very strong policy rules that can only be changed by an administrator modifying them, deleting the rule, or overriding them by setting a higher priority rule.

6.2.1.1.2.1. AdminNetworkPolicy Allow example

The following ANP that is defined at priority 9 ensures all ingress traffic is allowed from the monitoring namespace towards any tenant (all other namespaces) in the cluster.

Example 6.2. Example YAML file for a strong Allow ANP

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: allow-monitoring
spec:
  priority: 9
  subject:
    namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
  ingress:
  - name: "allow-ingress-from-monitoring"
    action: "Allow"
    from:
    - namespaces:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
# ...
Copy to Clipboard Toggle word wrap

This is an example of a strong Allow ANP because it is non-overridable by all the parties involved. No tenants can block themselves from being monitored using NetworkPolicy objects and the monitoring tenant also has no say in what it can or cannot monitor.

6.2.1.1.2.2. AdminNetworkPolicy Deny example

The following ANP that is defined at priority 5 ensures all ingress traffic from the monitoring namespace is blocked towards restricted tenants (namespaces that have labels security: restricted).

Example 6.3. Example YAML file for a strong Deny ANP

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: block-monitoring
spec:
  priority: 5
  subject:
    namespaces:
      matchLabels:
        security: restricted
  ingress:
  - name: "deny-ingress-from-monitoring"
    action: "Deny"
    from:
    - namespaces:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
# ...
Copy to Clipboard Toggle word wrap

This is a strong Deny ANP that is non-overridable by all the parties involved. The restricted tenant owners cannot authorize themselves to allow monitoring traffic, and the infrastructure’s monitoring service cannot scrape anything from these sensitive namespaces.

When combined with the strong Allow example, the block-monitoring ANP has a lower priority value giving it higher precedence, which ensures restricted tenants are never monitored.

6.2.1.1.2.3. AdminNetworkPolicy Pass example

The following ANP that is defined at priority 7 ensures all ingress traffic from the monitoring namespace towards internal infrastructure tenants (namespaces that have labels security: internal) are passed on to tier 2 of the ACLs and evaluated by the namespaces’ NetworkPolicy objects.

Example 6.4. Example YAML file for a strong Pass ANP

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: pass-monitoring
spec:
  priority: 7
  subject:
    namespaces:
      matchLabels:
        security: internal
  ingress:
  - name: "pass-ingress-from-monitoring"
    action: "Pass"
    from:
    - namespaces:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
# ...
Copy to Clipboard Toggle word wrap

This example is a strong Pass action ANP because it delegates the decision to NetworkPolicy objects defined by tenant owners. This pass-monitoring ANP allows all tenant owners grouped at security level internal to choose if their metrics should be scraped by the infrastructures' monitoring service using namespace scoped NetworkPolicy objects.

6.2.2. OVN-Kubernetes BaselineAdminNetworkPolicy

6.2.2.1. BaselineAdminNetworkPolicy

BaselineAdminNetworkPolicy (BANP) is a cluster-scoped custom resource definition (CRD). As a OpenShift Container Platform administrator, you can use BANP to setup and enforce optional baseline network policy rules that are overridable by users using NetworkPolicy objects if need be. Rule actions for BANP are allow or deny.

The BaselineAdminNetworkPolicy resource is a cluster singleton object that can be used as a guardrail policy incase a passed traffic policy does not match any NetworkPolicy objects in the cluster. A BANP can also be used as a default security model that provides guardrails that intra-cluster traffic is blocked by default and a user will need to use NetworkPolicy objects to allow known traffic. You must use default as the name when creating a BANP resource.

A BANP allows administrators to specify:

  • A subject that consists of a set of namespaces or namespace.
  • A list of ingress rules to be applied for all ingress traffic towards the subject.
  • A list of egress rules to be applied for all egress traffic from the subject.
6.2.2.1.1. BaselineAdminNetworkPolicy example

Example 6.5. Example YAML file for BANP

apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
  name: default 
1

spec:
  subject:
    namespaces:
      matchLabels:
          kubernetes.io/metadata.name: example.name 
2

  ingress: 
3

  - name: "deny-all-ingress-from-tenant-1" 
4

    action: "Deny"
    from:
    - pods:
        namespaceSelector:
          matchLabels:
            custom-banp: tenant-1 
5

        podSelector:
          matchLabels:
            custom-banp: tenant-1 
6

  egress:
  - name: "allow-all-egress-to-tenant-1"
    action: "Allow"
    to:
    - pods:
        namespaceSelector:
          matchLabels:
            custom-banp: tenant-1
        podSelector:
          matchLabels:
            custom-banp: tenant-1
Copy to Clipboard Toggle word wrap
1
The policy name must be default because BANP is a singleton object.
2
Specify the namespace to apply the ANP to.
3
BANP have both ingress and egress rules. BANP rules for spec.ingress and spec.egress fields accepts values of Deny and Allow for the action field.
4
Specify a name for the ingress.name
5
Specify the namespaces to select the pods from to apply the BANP resource.
6
Specify podSelector.matchLabels name of the pods to apply the BANP resource.
6.2.2.1.2. BaselineAdminNetworkPolicy Deny example

The following BANP singleton ensures that the administrator has set up a default deny policy for all ingress monitoring traffic coming into the tenants at internal security level. When combined with the "AdminNetworkPolicy Pass example", this deny policy acts as a guardrail policy for all ingress traffic that is passed by the ANP pass-monitoring policy.

Example 6.6. Example YAML file for a guardrail Deny rule

apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
  name: default
spec:
  subject:
    namespaces:
      matchLabels:
        security: internal
  ingress:
  - name: "deny-ingress-from-monitoring"
    action: "Deny"
    from:
    - namespaces:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
# ...
Copy to Clipboard Toggle word wrap

You can use an AdminNetworkPolicy resource with a Pass value for the action field in conjunction with the BaselineAdminNetworkPolicy resource to create a multi-tenant policy. This multi-tenant policy allows one tenant to collect monitoring data on their application while simultaneously not collecting data from a second tenant.

As an administrator, if you apply both the "AdminNetworkPolicy Pass action example" and the "BaselineAdminNetwork Policy Deny example", tenants are then left with the ability to choose to create a NetworkPolicy resource that will be evaluated before the BANP.

For example, Tenant 1 can set up the following NetworkPolicy resource to monitor ingress traffic:

Example 6.7. Example NetworkPolicy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: tenant 1
spec:
  podSelector:
  policyTypes:
    - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
# ...
Copy to Clipboard Toggle word wrap

In this scenario, Tenant 1’s policy would be evaluated after the "AdminNetworkPolicy Pass action example" and before the "BaselineAdminNetwork Policy Deny example", which denies all ingress monitoring traffic coming into tenants with security level internal. With Tenant 1’s NetworkPolicy object in place, they will be able to collect data on their application. Tenant 2, however, who does not have any NetworkPolicy objects in place, will not be able to collect data. As an administrator, you have not by default monitored internal tenants, but instead, you created a BANP that allows tenants to use NetworkPolicy objects to override the default behavior of your BANP.

6.2.3. Monitoring ANP and BANP

AdminNetworkPolicy and BaselineAdminNetworkPolicy resources have metrics that can be used for monitoring and managing your policies. See the following table for more details on the metrics.

6.2.3.1. Metrics for AdminNetworkPolicy
Expand
NameDescriptionExplanation

ovnkube_controller_admin_network_policies

Not applicable

The total number of AdminNetworkPolicy resources in the cluster.

ovnkube_controller_baseline_admin_network_policies

Not applicable

The total number of BaselineAdminNetworkPolicy resources in the cluster. The value should be 0 or 1.

ovnkube_controller_admin_network_policies_rules

  • direction: specifies either Ingress or Egress.
  • action: specifies either Pass, Allow, or Deny.

The total number of rules across all ANP policies in the cluster grouped by direction and action.

ovnkube_controller_baseline_admin_network_policies_rules

  • direction: specifies either Ingress or Egress.
  • action: specifies either Allow or Deny.

The total number of rules across all BANP policies in the cluster grouped by direction and action.

ovnkube_controller_admin_network_policies_db_objects

table_name: specifies either ACL or Address_Set

The total number of OVN Northbound database (nbdb) objects that are created by all the ANP in the cluster grouped by the table_name.

ovnkube_controller_baseline_admin_network_policies_db_objects

table_name: specifies either ACL or Address_Set

The total number of OVN Northbound database (nbdb) objects that are created by all the BANP in the cluster grouped by the table_name.

This section explains nodes and networks peers. Administrators can use the examples in this section to design AdminNetworkPolicy and BaselineAdminNetworkPolicy to control northbound traffic in their cluster.

In addition to supporting east-west traffic controls, ANP and BANP also allow administrators to control their northbound traffic leaving the cluster or traffic leaving the node to other nodes in the cluster. End-users can do the following:

  • Implement egress traffic control towards cluster nodes using nodes egress peer
  • Implement egress traffic control towards Kubernetes API servers using nodes or networks egress peers
  • Implement egress traffic control towards external destinations outside the cluster using networks peer
Note

For ANP and BANP, nodes and networks peers can be specified for egress rules only.

Using the nodes peer administrators can control egress traffic from pods to nodes in the cluster. A benefit of this is that you do not have to change the policy when nodes are added to or deleted from the cluster.

The following example allows egress traffic to the Kubernetes API server on port 6443 by any of the namespaces with a restricted, confidential, or internal level of security using the node selector peer. It also denies traffic to all worker nodes in your cluster from any of the namespaces with a restricted, confidential, or internal level of security.

Example 6.8. Example of ANP Allow egress using nodes peer

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: egress-security-allow
spec:
  egress:
  - action: Deny
    to:
    - nodes:
        matchExpressions:
        - key: node-role.kubernetes.io/worker
          operator: Exists
  - action: Allow
    name: allow-to-kubernetes-api-server-and-engr-dept-pods
    ports:
    - portNumber:
        port: 6443
        protocol: TCP
    to:
    - nodes: 
1

        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
    - pods: 
2

        namespaceSelector:
          matchLabels:
            dept: engr
        podSelector: {}
  priority: 55
  subject: 
3

    namespaces:
      matchExpressions:
      - key: security 
4

        operator: In
        values:
        - restricted
        - confidential
        - internal
Copy to Clipboard Toggle word wrap
1
Specifies a node or set of nodes in the cluster using the matchExpressions field.
2
Specifies all the pods labeled with dept: engr.
3
Specifies the subject of the ANP which includes any namespaces that match the labels used by the network policy. The example matches any of the namespaces with restricted, confidential, or internal level of security.
4
Specifies key/value pairs for matchExpressions field.

Cluster administrators can use CIDR ranges in networks peer and apply a policy to control egress traffic leaving from pods and going to a destination configured at the IP address that is within the CIDR range specified with networks field.

The following example uses networks peer and combines ANP and BANP policies to restrict egress traffic.

Important

Use the empty selector ({}) in the namespace field for ANP and BANP with caution. When using an empty selector, it also selects OpenShift namespaces.

If you use values of 0.0.0.0/0 in a ANP or BANP Deny rule, you must set a higher priority ANP Allow rule to necessary destinations before setting the Deny to 0.0.0.0/0.

Example 6.9. Example of ANP and BANP using networks peers

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: network-as-egress-peer
spec:
  priority: 70
  subject:
    namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
  egress:
  - name: "deny-egress-to-external-dns-servers"
    action: "Deny"
    to:
    - networks:
1

      - 8.8.8.8/32
      - 8.8.4.4/32
      - 208.67.222.222/32
    ports:
      - portNumber:
          protocol: UDP
          port: 53
  - name: "allow-all-egress-to-intranet"
    action: "Allow"
    to:
    - networks: 
2

      - 89.246.180.0/22
      - 60.45.72.0/22
  - name: "allow-all-intra-cluster-traffic"
    action: "Allow"
    to:
    - namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
  - name: "pass-all-egress-to-internet"
    action: "Pass"
    to:
    - networks:
      - 0.0.0.0/0 
3

---
apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
  name: default
spec:
  subject:
    namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
  egress:
  - name: "deny-all-egress-to-internet"
    action: "Deny"
    to:
    - networks:
      - 0.0.0.0/0 
4

---
Copy to Clipboard Toggle word wrap
1
Use networks to specify a range of CIDR networks outside of the cluster.
2
Specifies the CIDR ranges for the intra-cluster traffic from your resources.
3 4
Specifies a Deny egress to everything by setting networks values to 0.0.0.0/0. Make sure you have a higher priority Allow rule to necessary destinations before setting a Deny to 0.0.0.0/0 because this will deny all traffic including to Kubernetes API and DNS servers.

Collectively the network-as-egress-peer ANP and default BANP using networks peers enforces the following egress policy:

  • All pods cannot talk to external DNS servers at the listed IP addresses.
  • All pods can talk to rest of the company’s intranet.
  • All pods can talk to other pods, nodes, and services.
  • All pods cannot talk to the internet. Combining the last ANP Pass rule and the strong BANP Deny rule a guardrail policy is created that secures traffic in the cluster.

Cluster administrators can combine nodes and networks peer in your ANP and BANP policies.

Example 6.10. Example of nodes and networks peer

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: egress-peer-1 
1

spec:
  egress: 
2

  - action: "Allow"
    name: "allow-egress"
    to:
    - nodes:
        matchExpressions:
        - key: worker-group
          operator: In
          values:
          - workloads # Egress traffic from nodes with label worker-group: workloads is allowed.
    - networks:
      - 104.154.164.170/32
    - pods:
        namespaceSelector:
          matchLabels:
            apps: external-apps
        podSelector:
          matchLabels:
            app: web # This rule in the policy allows the traffic directed to pods labeled apps: web in projects with apps: external-apps to leave the cluster.
  - action: "Deny"
    name: "deny-egress"
    to:
    - nodes:
        matchExpressions:
        - key: worker-group
          operator: In
          values:
          - infra # Egress traffic from nodes with label worker-group: infra is denied.
    - networks:
      - 104.154.164.160/32 # Egress traffic to this IP address from cluster is denied.
    - pods:
        namespaceSelector:
          matchLabels:
            apps: internal-apps
        podSelector: {}
  - action: "Pass"
    name: "pass-egress"
    to:
    - nodes:
        matchExpressions:
        - key: node-role.kubernetes.io/worker
          operator: Exists # All other egress traffic is passed to NetworkPolicy or BANP for evaluation.
  priority: 30 
3

  subject: 
4

    namespaces:
      matchLabels:
        apps: all-apps
Copy to Clipboard Toggle word wrap
1
Specifies the name of the policy.
2
For nodes and networks peers, you can only use northbound traffic controls in ANP as egress.
3
Specifies the priority of the ANP, determining the order in which they should be evaluated. Lower priority rules have higher precedence. ANP accepts values of 0-99 with 0 being the highest priority and 99 being the lowest.
4
Specifies the set of pods in the cluster on which the rules of the policy are to be applied. In the example, any pods with the apps: all-apps label across all namespaces are the subject of the policy.

6.2.5. Troubleshooting AdminNetworkPolicy

6.2.5.1. Checking creation of ANP

To check that your AdminNetworkPolicy (ANP) and BaselineAdminNetworkPolicy (BANP) are created correctly, check the status outputs of the following commands: oc describe anp or oc describe banp.

A good status indicates OVN DB plumbing was successful and the SetupSucceeded.

Example 6.11. Example ANP with a good status

...
Conditions:
Last Transition Time:  2024-06-08T20:29:00Z
Message:               Setting up OVN DB plumbing was successful
Reason:                SetupSucceeded
Status:                True
Type:                  Ready-In-Zone-ovn-control-plane Last Transition Time:  2024-06-08T20:29:00Z
Message:               Setting up OVN DB plumbing was successful
Reason:                SetupSucceeded
Status:                True
Type:                  Ready-In-Zone-ovn-worker
Last Transition Time:  2024-06-08T20:29:00Z
Message:               Setting up OVN DB plumbing was successful
Reason:                SetupSucceeded
Status:                True
Type:                  Ready-In-Zone-ovn-worker2
...
Copy to Clipboard Toggle word wrap

If plumbing is unsuccessful, an error is reported from the respective zone controller.

Example 6.12. Example of an ANP with a bad status and error message

...
Status:
  Conditions:
    Last Transition Time:  2024-06-25T12:47:44Z
    Message:               error attempting to add ANP cluster-control with priority 600 because, OVNK only supports priority ranges 0-99
    Reason:                SetupFailed
    Status:                False
    Type:                  Ready-In-Zone-example-worker-1.example.example-org.net
    Last Transition Time:  2024-06-25T12:47:45Z
    Message:               error attempting to add ANP cluster-control with priority 600 because, OVNK only supports priority ranges 0-99
    Reason:                SetupFailed
    Status:                False
    Type:                  Ready-In-Zone-example-worker-0.example.example-org.net
    Last Transition Time:  2024-06-25T12:47:44Z
    Message:               error attempting to add ANP cluster-control with priority 600 because, OVNK only supports priority ranges 0-99
    Reason:                SetupFailed
    Status:                False
    Type:                  Ready-In-Zone-example-ctlplane-1.example.example-org.net
    Last Transition Time:  2024-06-25T12:47:44Z
    Message:               error attempting to add ANP cluster-control with priority 600 because, OVNK only supports priority ranges 0-99
    Reason:                SetupFailed
    Status:                False
    Type:                  Ready-In-Zone-example-ctlplane-2.example.example-org.net
    Last Transition Time:  2024-06-25T12:47:44Z
    Message:               error attempting to add ANP cluster-control with priority 600 because, OVNK only supports priority ranges 0-99
    Reason:                SetupFailed
    Status:                False
    Type:                  Ready-In-Zone-example-ctlplane-0.example.example-org.net
    ```
Copy to Clipboard Toggle word wrap

See the following section for nbctl commands to help troubleshoot unsuccessful policies.

6.2.5.1.1. Using nbctl commands for ANP and BANP

To troubleshoot an unsuccessful setup, start by looking at OVN Northbound database (nbdb) objects including ACL, AdressSet, and Port_Group. To view the nbdb, you need to be inside the pod on that node to view the objects in that node’s database.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • The OpenShift CLI (oc) installed.
Note

To run ovn nbctl commands in a cluster, you must open a remote shell into the `nbdb`on the relevant node.

The following policy was used to generate outputs.

Example 6.13. AdminNetworkPolicy used to generate outputs

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  name: cluster-control
spec:
  priority: 34
  subject:
    namespaces:
      matchLabels:
        anp: cluster-control-anp # Only namespaces with this label have this ANP
  ingress:
  - name: "allow-from-ingress-router" # rule0
    action: "Allow"
    from:
    - namespaces:
        matchLabels:
          policy-group.network.openshift.io/ingress: ""
  - name: "allow-from-monitoring" # rule1
    action: "Allow"
    from:
    - namespaces:
        matchLabels:
          kubernetes.io/metadata.name: openshift-monitoring
    ports:
    - portNumber:
        protocol: TCP
        port: 7564
    - namedPort: "scrape"
  - name: "allow-from-open-tenants" # rule2
    action: "Allow"
    from:
    - namespaces: # open tenants
        matchLabels:
          tenant: open
  - name: "pass-from-restricted-tenants" # rule3
    action: "Pass"
    from:
    - namespaces: # restricted tenants
        matchLabels:
          tenant: restricted
  - name: "default-deny" # rule4
    action: "Deny"
    from:
    - namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
  egress:
  - name: "allow-to-dns" # rule0
    action: "Allow"
    to:
    - pods:
        namespaceSelector:
          matchlabels:
            kubernetes.io/metadata.name: openshift-dns
        podSelector:
          matchlabels:
            app: dns
    ports:
    - portNumber:
        protocol: UDP
        port: 5353
  - name: "allow-to-kapi-server" # rule1
    action: "Allow"
    to:
    - nodes:
        matchExpressions:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
    ports:
    - portNumber:
        protocol: TCP
        port: 6443
  - name: "allow-to-splunk" # rule2
    action: "Allow"
    to:
    - namespaces:
        matchlabels:
          tenant: splunk
    ports:
    - portNumber:
        protocol: TCP
        port: 8991
    - portNumber:
        protocol: TCP
        port: 8992
  - name: "allow-to-open-tenants-and-intranet-and-worker-nodes" # rule3
    action: "Allow"
    to:
    - nodes: # worker-nodes
        matchExpressions:
        - key: node-role.kubernetes.io/worker
          operator: Exists
    - networks: # intranet
      - 172.29.0.0/30
      - 10.0.54.0/19
      - 10.0.56.38/32
      - 10.0.69.0/24
    - namespaces: # open tenants
        matchLabels:
          tenant: open
  - name: "pass-to-restricted-tenants" # rule4
    action: "Pass"
    to:
    - namespaces: # restricted tenants
        matchLabels:
          tenant: restricted
  - name: "default-deny"
    action: "Deny"
    to:
    - networks:
      - 0.0.0.0/0
Copy to Clipboard Toggle word wrap

Procedure

  1. List pods with node information by running the following command:

    $ oc get pods -n openshift-ovn-kubernetes -owide
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                     READY   STATUS    RESTARTS   AGE   IP           NODE                                       NOMINATED NODE   READINESS GATES
    ovnkube-control-plane-5c95487779-8k9fd   2/2     Running   0          34m   10.0.0.5     ci-ln-0tv5gg2-72292-6sjw5-master-0         <none>           <none>
    ovnkube-control-plane-5c95487779-v2xn8   2/2     Running   0          34m   10.0.0.3     ci-ln-0tv5gg2-72292-6sjw5-master-1         <none>           <none>
    ovnkube-node-524dt                       8/8     Running   0          33m   10.0.0.4     ci-ln-0tv5gg2-72292-6sjw5-master-2         <none>           <none>
    ovnkube-node-gbwr9                       8/8     Running   0          24m   10.0.128.4   ci-ln-0tv5gg2-72292-6sjw5-worker-c-s9gqt   <none>           <none>
    ovnkube-node-h4fpx                       8/8     Running   0          33m   10.0.0.5     ci-ln-0tv5gg2-72292-6sjw5-master-0         <none>           <none>
    ovnkube-node-j4hzw                       8/8     Running   0          24m   10.0.128.2   ci-ln-0tv5gg2-72292-6sjw5-worker-a-hzbh5   <none>           <none>
    ovnkube-node-wdhgv                       8/8     Running   0          33m   10.0.0.3     ci-ln-0tv5gg2-72292-6sjw5-master-1         <none>           <none>
    ovnkube-node-wfncn                       8/8     Running   0          24m   10.0.128.3   ci-ln-0tv5gg2-72292-6sjw5-worker-b-5bb7f   <none>           <none>
    Copy to Clipboard Toggle word wrap

  2. Navigate into a pod to look at the northbound database by running the following command:

    $ oc rsh -c nbdb -n openshift-ovn-kubernetes ovnkube-node-524dt
    Copy to Clipboard Toggle word wrap
  3. Run the following command to look at the ACLs nbdb:

    $ ovn-nbctl find ACL 'external_ids{>=}{"k8s.ovn.org/owner-type"=AdminNetworkPolicy,"k8s.ovn.org/name"=cluster-control}'
    Copy to Clipboard Toggle word wrap
    Where, cluster-control
    Specifies the name of the AdminNetworkPolicy you are troubleshooting.
    AdminNetworkPolicy
    Specifies the type: AdminNetworkPolicy or BaselineAdminNetworkPolicy.

    Example 6.14. Example output for ACLs

    _uuid               : 0d5e4722-b608-4bb1-b625-23c323cc9926
    action              : allow-related
    direction           : to-lport
    external_ids        : {direction=Ingress, gress-index="2", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:2:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "outport == @a14645450421485494999 && ((ip4.src == $a13730899355151937870))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Ingress:2"
    options             : {}
    priority            : 26598
    severity            : []
    tier                : 1
    
    _uuid               : b7be6472-df67-439c-8c9c-f55929f0a6e0
    action              : drop
    direction           : from-lport
    external_ids        : {direction=Egress, gress-index="5", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:5:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "inport == @a14645450421485494999 && ((ip4.dst == $a11452480169090787059))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Egress:5"
    options             : {apply-after-lb="true"}
    priority            : 26595
    severity            : []
    tier                : 1
    
    _uuid               : 5a6e5bb4-36eb-4209-b8bc-c611983d4624
    action              : pass
    direction           : to-lport
    external_ids        : {direction=Ingress, gress-index="3", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:3:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "outport == @a14645450421485494999 && ((ip4.src == $a764182844364804195))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Ingress:3"
    options             : {}
    priority            : 26597
    severity            : []
    tier                : 1
    
    _uuid               : 04f20275-c410-405c-a923-0e677f767889
    action              : pass
    direction           : from-lport
    external_ids        : {direction=Egress, gress-index="4", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:4:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "inport == @a14645450421485494999 && ((ip4.dst == $a5972452606168369118))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Egress:4"
    options             : {apply-after-lb="true"}
    priority            : 26596
    severity            : []
    tier                : 1
    
    _uuid               : 4b5d836a-e0a3-4088-825e-f9f0ca58e538
    action              : drop
    direction           : to-lport
    external_ids        : {direction=Ingress, gress-index="4", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:4:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "outport == @a14645450421485494999 && ((ip4.src == $a13814616246365836720))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Ingress:4"
    options             : {}
    priority            : 26596
    severity            : []
    tier                : 1
    
    _uuid               : 5d09957d-d2cc-4f5a-9ddd-b97d9d772023
    action              : allow-related
    direction           : from-lport
    external_ids        : {direction=Egress, gress-index="2", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:2:tcp", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=tcp}
    label               : 0
    log                 : false
    match               : "inport == @a14645450421485494999 && ((ip4.dst == $a18396736153283155648)) && tcp && tcp.dst=={8991,8992}"
    meter               : acl-logging
    name                : "ANP:cluster-control:Egress:2"
    options             : {apply-after-lb="true"}
    priority            : 26598
    severity            : []
    tier                : 1
    
    _uuid               : 1a68a5ed-e7f9-47d0-b55c-89184d97e81a
    action              : allow-related
    direction           : from-lport
    external_ids        : {direction=Egress, gress-index="1", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:1:tcp", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=tcp}
    label               : 0
    log                 : false
    match               : "inport == @a14645450421485494999 && ((ip4.dst == $a10706246167277696183)) && tcp && tcp.dst==6443"
    meter               : acl-logging
    name                : "ANP:cluster-control:Egress:1"
    options             : {apply-after-lb="true"}
    priority            : 26599
    severity            : []
    tier                : 1
    
    _uuid               : aa1a224d-7960-4952-bdfb-35246bafbac8
    action              : allow-related
    direction           : to-lport
    external_ids        : {direction=Ingress, gress-index="1", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:1:tcp", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=tcp}
    label               : 0
    log                 : false
    match               : "outport == @a14645450421485494999 && ((ip4.src == $a6786643370959569281)) && tcp && tcp.dst==7564"
    meter               : acl-logging
    name                : "ANP:cluster-control:Ingress:1"
    options             : {}
    priority            : 26599
    severity            : []
    tier                : 1
    
    _uuid               : 1a27d30e-3f96-4915-8ddd-ade7f22c117b
    action              : allow-related
    direction           : from-lport
    external_ids        : {direction=Egress, gress-index="3", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:3:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "inport == @a14645450421485494999 && ((ip4.dst == $a10622494091691694581))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Egress:3"
    options             : {apply-after-lb="true"}
    priority            : 26597
    severity            : []
    tier                : 1
    
    _uuid               : b23a087f-08f8-4225-8c27-4a9a9ee0c407
    action              : allow-related
    direction           : from-lport
    external_ids        : {direction=Egress, gress-index="0", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:0:udp", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=udp}
    label               : 0
    log                 : false
    match               : "inport == @a14645450421485494999 && ((ip4.dst == $a13517855690389298082)) && udp && udp.dst==5353"
    meter               : acl-logging
    name                : "ANP:cluster-control:Egress:0"
    options             : {apply-after-lb="true"}
    priority            : 26600
    severity            : []
    tier                : 1
    
    _uuid               : d14ed5cf-2e06-496e-8cae-6b76d5dd5ccd
    action              : allow-related
    direction           : to-lport
    external_ids        : {direction=Ingress, gress-index="0", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:0:None", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=None}
    label               : 0
    log                 : false
    match               : "outport == @a14645450421485494999 && ((ip4.src == $a14545668191619617708))"
    meter               : acl-logging
    name                : "ANP:cluster-control:Ingress:0"
    options             : {}
    priority            : 26600
    severity            : []
    tier                : 1
    Copy to Clipboard Toggle word wrap
    Note

    The outputs for ingress and egress show you the logic of the policy in the ACL. For example, every time a packet matches the provided match the action is taken.

    1. Examine the specific ACL for the rule by running the following command:

      $ ovn-nbctl find ACL 'external_ids{>=}{"k8s.ovn.org/owner-type"=AdminNetworkPolicy,direction=Ingress,"k8s.ovn.org/name"=cluster-control,gress-index="1"}'
      Copy to Clipboard Toggle word wrap
      Where, cluster-control
      Specifies the name of your ANP.
      Ingress
      Specifies the direction of traffic either of type Ingress or Egress.
      1
      Specifies the rule you want to look at.

      For the example ANP named cluster-control at priority 34, the following is an example output for Ingress rule 1:

      Example 6.15. Example output

      _uuid               : aa1a224d-7960-4952-bdfb-35246bafbac8
      action              : allow-related
      direction           : to-lport
      external_ids        : {direction=Ingress, gress-index="1", "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:1:tcp", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy, port-policy-protocol=tcp}
      label               : 0
      log                 : false
      match               : "outport == @a14645450421485494999 && ((ip4.src == $a6786643370959569281)) && tcp && tcp.dst==7564"
      meter               : acl-logging
      name                : "ANP:cluster-control:Ingress:1"
      options             : {}
      priority            : 26599
      severity            : []
      tier                : 1
      Copy to Clipboard Toggle word wrap
  4. Run the following command to look at address sets in the nbdb:

    $ ovn-nbctl find Address_Set 'external_ids{>=}{"k8s.ovn.org/owner-type"=AdminNetworkPolicy,"k8s.ovn.org/name"=cluster-control}'
    Copy to Clipboard Toggle word wrap

    Example 6.16. Example outputs for Address_Set

    _uuid               : 56e89601-5552-4238-9fc3-8833f5494869
    addresses           : ["192.168.194.135", "192.168.194.152", "192.168.194.193", "192.168.194.254"]
    external_ids        : {direction=Egress, gress-index="1", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:1:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a10706246167277696183
    
    _uuid               : 7df9330d-380b-4bdb-8acd-4eddeda2419c
    addresses           : ["10.132.0.10", "10.132.0.11", "10.132.0.12", "10.132.0.13", "10.132.0.14", "10.132.0.15", "10.132.0.16", "10.132.0.17", "10.132.0.5", "10.132.0.7", "10.132.0.71", "10.132.0.75", "10.132.0.8", "10.132.0.81", "10.132.0.9", "10.132.2.10", "10.132.2.11", "10.132.2.12", "10.132.2.14", "10.132.2.15", "10.132.2.3", "10.132.2.4", "10.132.2.5", "10.132.2.6", "10.132.2.7", "10.132.2.8", "10.132.2.9", "10.132.3.64", "10.132.3.65", "10.132.3.72", "10.132.3.73", "10.132.3.76", "10.133.0.10", "10.133.0.11", "10.133.0.12", "10.133.0.13", "10.133.0.14", "10.133.0.15", "10.133.0.16", "10.133.0.17", "10.133.0.18", "10.133.0.19", "10.133.0.20", "10.133.0.21", "10.133.0.22", "10.133.0.23", "10.133.0.24", "10.133.0.25", "10.133.0.26", "10.133.0.27", "10.133.0.28", "10.133.0.29", "10.133.0.30", "10.133.0.31", "10.133.0.32", "10.133.0.33", "10.133.0.34", "10.133.0.35", "10.133.0.36", "10.133.0.37", "10.133.0.38", "10.133.0.39", "10.133.0.40", "10.133.0.41", "10.133.0.42", "10.133.0.44", "10.133.0.45", "10.133.0.46", "10.133.0.47", "10.133.0.48", "10.133.0.5", "10.133.0.6", "10.133.0.7", "10.133.0.8", "10.133.0.9", "10.134.0.10", "10.134.0.11", "10.134.0.12", "10.134.0.13", "10.134.0.14", "10.134.0.15", "10.134.0.16", "10.134.0.17", "10.134.0.18", "10.134.0.19", "10.134.0.20", "10.134.0.21", "10.134.0.22", "10.134.0.23", "10.134.0.24", "10.134.0.25", "10.134.0.26", "10.134.0.27", "10.134.0.28", "10.134.0.30", "10.134.0.31", "10.134.0.32", "10.134.0.33", "10.134.0.34", "10.134.0.35", "10.134.0.36", "10.134.0.37", "10.134.0.38", "10.134.0.4", "10.134.0.42", "10.134.0.9", "10.135.0.10", "10.135.0.11", "10.135.0.12", "10.135.0.13", "10.135.0.14", "10.135.0.15", "10.135.0.16", "10.135.0.17", "10.135.0.18", "10.135.0.19", "10.135.0.23", "10.135.0.24", "10.135.0.26", "10.135.0.27", "10.135.0.29", "10.135.0.3", "10.135.0.4", "10.135.0.40", "10.135.0.41", "10.135.0.42", "10.135.0.43", "10.135.0.44", "10.135.0.5", "10.135.0.6", "10.135.0.7", "10.135.0.8", "10.135.0.9"]
    external_ids        : {direction=Ingress, gress-index="4", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:4:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a13814616246365836720
    
    _uuid               : 84d76f13-ad95-4c00-8329-a0b1d023c289
    addresses           : ["10.132.3.76", "10.135.0.44"]
    external_ids        : {direction=Egress, gress-index="4", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:4:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a5972452606168369118
    
    _uuid               : 0c53e917-f7ee-4256-8f3a-9522c0481e52
    addresses           : ["10.132.0.10", "10.132.0.11", "10.132.0.12", "10.132.0.13", "10.132.0.14", "10.132.0.15", "10.132.0.16", "10.132.0.17", "10.132.0.5", "10.132.0.7", "10.132.0.71", "10.132.0.75", "10.132.0.8", "10.132.0.81", "10.132.0.9", "10.132.2.10", "10.132.2.11", "10.132.2.12", "10.132.2.14", "10.132.2.15", "10.132.2.3", "10.132.2.4", "10.132.2.5", "10.132.2.6", "10.132.2.7", "10.132.2.8", "10.132.2.9", "10.132.3.64", "10.132.3.65", "10.132.3.72", "10.132.3.73", "10.132.3.76", "10.133.0.10", "10.133.0.11", "10.133.0.12", "10.133.0.13", "10.133.0.14", "10.133.0.15", "10.133.0.16", "10.133.0.17", "10.133.0.18", "10.133.0.19", "10.133.0.20", "10.133.0.21", "10.133.0.22", "10.133.0.23", "10.133.0.24", "10.133.0.25", "10.133.0.26", "10.133.0.27", "10.133.0.28", "10.133.0.29", "10.133.0.30", "10.133.0.31", "10.133.0.32", "10.133.0.33", "10.133.0.34", "10.133.0.35", "10.133.0.36", "10.133.0.37", "10.133.0.38", "10.133.0.39", "10.133.0.40", "10.133.0.41", "10.133.0.42", "10.133.0.44", "10.133.0.45", "10.133.0.46", "10.133.0.47", "10.133.0.48", "10.133.0.5", "10.133.0.6", "10.133.0.7", "10.133.0.8", "10.133.0.9", "10.134.0.10", "10.134.0.11", "10.134.0.12", "10.134.0.13", "10.134.0.14", "10.134.0.15", "10.134.0.16", "10.134.0.17", "10.134.0.18", "10.134.0.19", "10.134.0.20", "10.134.0.21", "10.134.0.22", "10.134.0.23", "10.134.0.24", "10.134.0.25", "10.134.0.26", "10.134.0.27", "10.134.0.28", "10.134.0.30", "10.134.0.31", "10.134.0.32", "10.134.0.33", "10.134.0.34", "10.134.0.35", "10.134.0.36", "10.134.0.37", "10.134.0.38", "10.134.0.4", "10.134.0.42", "10.134.0.9", "10.135.0.10", "10.135.0.11", "10.135.0.12", "10.135.0.13", "10.135.0.14", "10.135.0.15", "10.135.0.16", "10.135.0.17", "10.135.0.18", "10.135.0.19", "10.135.0.23", "10.135.0.24", "10.135.0.26", "10.135.0.27", "10.135.0.29", "10.135.0.3", "10.135.0.4", "10.135.0.40", "10.135.0.41", "10.135.0.42", "10.135.0.43", "10.135.0.44", "10.135.0.5", "10.135.0.6", "10.135.0.7", "10.135.0.8", "10.135.0.9"]
    external_ids        : {direction=Egress, gress-index="2", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:2:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a18396736153283155648
    
    _uuid               : 5228bf1b-dfd8-40ec-bfa8-95c5bf9aded9
    addresses           : []
    external_ids        : {direction=Ingress, gress-index="0", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:0:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a14545668191619617708
    
    _uuid               : 46530d69-70da-4558-8c63-884ec9dc4f25
    addresses           : ["10.132.2.10", "10.132.2.5", "10.132.2.6", "10.132.2.7", "10.132.2.8", "10.132.2.9", "10.133.0.47", "10.134.0.33", "10.135.0.10", "10.135.0.11", "10.135.0.12", "10.135.0.19", "10.135.0.24", "10.135.0.7", "10.135.0.8", "10.135.0.9"]
    external_ids        : {direction=Ingress, gress-index="1", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:1:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a6786643370959569281
    
    _uuid               : 65fdcdea-0b9f-4318-9884-1b51d231ad1d
    addresses           : ["10.132.3.72", "10.135.0.42"]
    external_ids        : {direction=Ingress, gress-index="2", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:2:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a13730899355151937870
    
    _uuid               : 73eabdb0-36bf-4ca3-b66d-156ac710df4c
    addresses           : ["10.0.32.0/19", "10.0.56.38/32", "10.0.69.0/24", "10.132.3.72", "10.135.0.42", "172.29.0.0/30", "192.168.194.103", "192.168.194.2"]
    external_ids        : {direction=Egress, gress-index="3", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:3:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a10622494091691694581
    
    _uuid               : 50cdbef2-71b5-474b-914c-6fcd1d7712d3
    addresses           : ["10.132.0.10", "10.132.0.11", "10.132.0.12", "10.132.0.13", "10.132.0.14", "10.132.0.15", "10.132.0.16", "10.132.0.17", "10.132.0.5", "10.132.0.7", "10.132.0.71", "10.132.0.75", "10.132.0.8", "10.132.0.81", "10.132.0.9", "10.132.2.10", "10.132.2.11", "10.132.2.12", "10.132.2.14", "10.132.2.15", "10.132.2.3", "10.132.2.4", "10.132.2.5", "10.132.2.6", "10.132.2.7", "10.132.2.8", "10.132.2.9", "10.132.3.64", "10.132.3.65", "10.132.3.72", "10.132.3.73", "10.132.3.76", "10.133.0.10", "10.133.0.11", "10.133.0.12", "10.133.0.13", "10.133.0.14", "10.133.0.15", "10.133.0.16", "10.133.0.17", "10.133.0.18", "10.133.0.19", "10.133.0.20", "10.133.0.21", "10.133.0.22", "10.133.0.23", "10.133.0.24", "10.133.0.25", "10.133.0.26", "10.133.0.27", "10.133.0.28", "10.133.0.29", "10.133.0.30", "10.133.0.31", "10.133.0.32", "10.133.0.33", "10.133.0.34", "10.133.0.35", "10.133.0.36", "10.133.0.37", "10.133.0.38", "10.133.0.39", "10.133.0.40", "10.133.0.41", "10.133.0.42", "10.133.0.44", "10.133.0.45", "10.133.0.46", "10.133.0.47", "10.133.0.48", "10.133.0.5", "10.133.0.6", "10.133.0.7", "10.133.0.8", "10.133.0.9", "10.134.0.10", "10.134.0.11", "10.134.0.12", "10.134.0.13", "10.134.0.14", "10.134.0.15", "10.134.0.16", "10.134.0.17", "10.134.0.18", "10.134.0.19", "10.134.0.20", "10.134.0.21", "10.134.0.22", "10.134.0.23", "10.134.0.24", "10.134.0.25", "10.134.0.26", "10.134.0.27", "10.134.0.28", "10.134.0.30", "10.134.0.31", "10.134.0.32", "10.134.0.33", "10.134.0.34", "10.134.0.35", "10.134.0.36", "10.134.0.37", "10.134.0.38", "10.134.0.4", "10.134.0.42", "10.134.0.9", "10.135.0.10", "10.135.0.11", "10.135.0.12", "10.135.0.13", "10.135.0.14", "10.135.0.15", "10.135.0.16", "10.135.0.17", "10.135.0.18", "10.135.0.19", "10.135.0.23", "10.135.0.24", "10.135.0.26", "10.135.0.27", "10.135.0.29", "10.135.0.3", "10.135.0.4", "10.135.0.40", "10.135.0.41", "10.135.0.42", "10.135.0.43", "10.135.0.44", "10.135.0.5", "10.135.0.6", "10.135.0.7", "10.135.0.8", "10.135.0.9"]
    external_ids        : {direction=Egress, gress-index="0", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:0:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a13517855690389298082
    
    _uuid               : 32a42f32-2d11-43dd-979d-a56d7ee6aa57
    addresses           : ["10.132.3.76", "10.135.0.44"]
    external_ids        : {direction=Ingress, gress-index="3", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Ingress:3:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a764182844364804195
    
    _uuid               : 8fd3b977-6e1c-47aa-82b7-e3e3136c4a72
    addresses           : ["0.0.0.0/0"]
    external_ids        : {direction=Egress, gress-index="5", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:5:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a11452480169090787059
    Copy to Clipboard Toggle word wrap
    1. Examine the specific address set of the rule by running the following command:

      $ ovn-nbctl find Address_Set 'external_ids{>=}{"k8s.ovn.org/owner-type"=AdminNetworkPolicy,direction=Egress,"k8s.ovn.org/name"=cluster-control,gress-index="5"}'
      Copy to Clipboard Toggle word wrap

      Example 6.17. Example outputs for Address_Set

      _uuid               : 8fd3b977-6e1c-47aa-82b7-e3e3136c4a72
      addresses           : ["0.0.0.0/0"]
      external_ids        : {direction=Egress, gress-index="5", ip-family=v4, "k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control:Egress:5:v4", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
      name                : a11452480169090787059
      Copy to Clipboard Toggle word wrap
  5. Run the following command to look at the port groups in the nbdb:

    $ ovn-nbctl find Port_Group 'external_ids{>=}{"k8s.ovn.org/owner-type"=AdminNetworkPolicy,"k8s.ovn.org/name"=cluster-control}'
    Copy to Clipboard Toggle word wrap

    Example 6.18. Example outputs for Port_Group

    _uuid               : f50acf71-7488-4b9a-b7b8-c8a024e99d21
    acls                : [04f20275-c410-405c-a923-0e677f767889, 0d5e4722-b608-4bb1-b625-23c323cc9926, 1a27d30e-3f96-4915-8ddd-ade7f22c117b, 1a68a5ed-e7f9-47d0-b55c-89184d97e81a, 4b5d836a-e0a3-4088-825e-f9f0ca58e538, 5a6e5bb4-36eb-4209-b8bc-c611983d4624, 5d09957d-d2cc-4f5a-9ddd-b97d9d772023, aa1a224d-7960-4952-bdfb-35246bafbac8, b23a087f-08f8-4225-8c27-4a9a9ee0c407, b7be6472-df67-439c-8c9c-f55929f0a6e0, d14ed5cf-2e06-496e-8cae-6b76d5dd5ccd]
    external_ids        : {"k8s.ovn.org/id"="default-network-controller:AdminNetworkPolicy:cluster-control", "k8s.ovn.org/name"=cluster-control, "k8s.ovn.org/owner-controller"=default-network-controller, "k8s.ovn.org/owner-type"=AdminNetworkPolicy}
    name                : a14645450421485494999
    ports               : [5e75f289-8273-4f8a-8798-8c10f7318833, de7e1b71-6184-445d-93e7-b20acadf41ea]
    Copy to Clipboard Toggle word wrap

6.3. Network policy

6.3.1. About network policy

As a developer, you can define network policies that restrict traffic to pods in your cluster.

6.3.1.1. About network policy

In a cluster that uses a network plugin that supports a Kubernetes network policy, a NetworkPolicy object controls network isolation. In OpenShift Container Platform 4.16, OpenShift SDN supports using network policy in its default network isolation mode.

Warning
  • On OpenShift SDN: A network policy does not apply to the host network namespace. Network policy rules do not impact pods configured with host networking enabled. Network policy rules might impact pods connected to host-networked pods.
  • On Openshift-OVN-Kubernetes: Network policies do impact pods that have host networking enabled, so you must explicitly allow a connection to these pods in your network policy rules. If a namespace has any network policy applied, traffic originating from system components, such as openshift-ingress or openshift-kube-apiserver, get dropped by default; You must explicitly enable this traffic to allow it through.
  • Using the namespaceSelector field without the podSelector field set to {} will not include hostNetwork pods. You must use the podSelector set to {} with the namespaceSelector field in order to target hostNetwork pods when creating network policies.
  • Network policies cannot block traffic from localhost or from their resident nodes.

By default, all pods in a project are accessible from other pods and network endpoints. To isolate one or more pods in a project, you can create NetworkPolicy objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy objects within their own project.

If a pod is matched by selectors in one or more NetworkPolicy objects, then the pod will accept only connections that are allowed by at least one of those NetworkPolicy objects. A pod that is not selected by any NetworkPolicy objects is fully accessible.

A network policy applies to only the Transmission Control Protocol (TCP), User Datagram Protocol (UDP), Internet Control Message Protocol (ICMP), and Stream Control Transmission Protocol (SCTP) protocols. Other protocols are not affected.

The following example NetworkPolicy objects demonstrate supporting different scenarios:

  • Deny all traffic:

    To make a project deny by default, add a NetworkPolicy object that matches all pods but accepts no traffic:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: deny-by-default
    spec:
      podSelector: {}
      ingress: []
    Copy to Clipboard Toggle word wrap
  • Only allow connections from the OpenShift Container Platform Ingress Controller:

    To make a project allow only connections from the OpenShift Container Platform Ingress Controller, add the following NetworkPolicy object.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-openshift-ingress
    spec:
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              policy-group.network.openshift.io/ingress: ""
      podSelector: {}
      policyTypes:
      - Ingress
    Copy to Clipboard Toggle word wrap
  • Only accept connections from pods within a project:

    Important

    To allow ingress connections from hostNetwork pods in the same namespace, you need to apply the allow-from-hostnetwork policy together with the allow-same-namespace policy.

    To make pods accept connections from other pods in the same project, but reject all other connections from pods in other projects, add the following NetworkPolicy object:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-same-namespace
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector: {}
    Copy to Clipboard Toggle word wrap
  • Only allow HTTP and HTTPS traffic based on pod labels:

    To enable only HTTP and HTTPS access to the pods with a specific label (role=frontend in following example), add a NetworkPolicy object similar to the following:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-http-and-https
    spec:
      podSelector:
        matchLabels:
          role: frontend
      ingress:
      - ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
    Copy to Clipboard Toggle word wrap
  • Accept connections by using both namespace and pod selectors:

    To match network traffic by combining namespace and pod selectors, you can use a NetworkPolicy object similar to the following:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-pod-and-namespace-both
    spec:
      podSelector:
        matchLabels:
          name: test-pods
      ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                project: project_name
            podSelector:
              matchLabels:
                name: test-pods
    Copy to Clipboard Toggle word wrap

NetworkPolicy objects are additive, which means you can combine multiple NetworkPolicy objects together to satisfy complex network requirements.

For example, for the NetworkPolicy objects defined in previous samples, you can define both allow-same-namespace and allow-http-and-https policies within the same project. Thus allowing the pods with the label role=frontend, to accept any connection allowed by each policy. That is, connections on any port from pods in the same namespace, and connections on ports 80 and 443 from pods in any namespace.

Use the following NetworkPolicy to allow external traffic regardless of the router configuration:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-router
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/ingress: ""
1

  podSelector: {}
  policyTypes:
  - Ingress
Copy to Clipboard Toggle word wrap
1
policy-group.network.openshift.io/ingress:"" label supports both OpenShift-SDN and OVN-Kubernetes.

Add the following allow-from-hostnetwork NetworkPolicy object to direct traffic from the host network pods.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-hostnetwork
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/host-network: ""
  podSelector: {}
  policyTypes:
  - Ingress
Copy to Clipboard Toggle word wrap

Use a network policy to isolate pods that are differentiated from one another by labels within a namespace.

It is inefficient to apply NetworkPolicy objects to large numbers of individual pods in a single namespace. Pod labels do not exist at the IP address level, so a network policy generates a separate Open vSwitch (OVS) flow rule for every possible link between every pod selected with a podSelector.

For example, if the spec podSelector and the ingress podSelector within a NetworkPolicy object each match 200 pods, then 40,000 (200*200) OVS flow rules are generated. This might slow down a node.

When designing your network policy, refer to the following guidelines:

  • Reduce the number of OVS flow rules by using namespaces to contain groups of pods that need to be isolated.

    NetworkPolicy objects that select a whole namespace, by using the namespaceSelector or an empty podSelector, generate only a single OVS flow rule that matches the VXLAN virtual network ID (VNID) of the namespace.

  • Keep the pods that do not need to be isolated in their original namespace, and move the pods that require isolation into one or more different namespaces.
  • Create additional targeted cross-namespace network policies to allow the specific traffic that you do want to allow from the isolated pods.

Learn how to optimize OVN-Kubernetes network policies to reduce flow count and ensure external IP traffic is allowed when needed.

When designing your network policy, refer to the following guidelines:

  • For network policies with the same spec.podSelector spec, it is more efficient to use one network policy with multiple ingress or egress rules, than multiple network policies with subsets of ingress or egress rules.
  • Every ingress or egress rule based on the podSelector or namespaceSelector spec generates the number of OVS flows proportional to number of pods selected by network policy + number of pods selected by ingress or egress rule. Therefore, it is preferable to use the podSelector or namespaceSelector spec that can select as many pods as you need in one rule, instead of creating individual rules for every pod.

    For example, the following policy contains two rules:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: test-network-policy
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend
      - from:
        - podSelector:
            matchLabels:
              role: backend
    Copy to Clipboard Toggle word wrap

    The following policy expresses those same two rules as one:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: test-network-policy
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector:
            matchExpressions:
            - {key: role, operator: In, values: [frontend, backend]}
    Copy to Clipboard Toggle word wrap

    The same guideline applies to the spec.podSelector spec. If you have the same ingress or egress rules for different network policies, it might be more efficient to create one network policy with a common spec.podSelector spec. For example, the following two policies have different rules:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy1
    spec:
      podSelector:
        matchLabels:
          role: db
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy2
    spec:
      podSelector:
        matchLabels:
          role: client
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend
    Copy to Clipboard Toggle word wrap

    The following network policy expresses those same two rules as one:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy3
    spec:
      podSelector:
        matchExpressions:
        - {key: role, operator: In, values: [db, client]}
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend
    Copy to Clipboard Toggle word wrap

    You can apply this optimization when only multiple selectors are expressed as one. In cases where selectors are based on different labels, it may not be possible to apply this optimization. In those cases, consider applying some new labels for network policy optimization specifically.

In OVN-Kubernetes, the NetworkPolicy custom resource (CR) enforces strict isolation rules. If a service is exposed using an external IP, a network policy can block access from other namespaces unless explicitly configured to allow traffic.

To allow access to external IPs across namespaces, create a NetworkPolicy CR that explicitly permits ingress from the required namespaces and ensures traffic is allowed to the designated service ports. Without allowing traffic to the required ports, access might still be restricted.

Example output

  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    annotations:
    name: <policy_name>
    namespace: openshift-ingress
  spec:
    ingress:
    - ports:
      - port: 80
        protocol: TCP
    - ports:
      - port: 443
        protocol: TCP
    - from:
      - namespaceSelector:
          matchLabels:
          kubernetes.io/metadata.name: <my_namespace>
    podSelector: {}
    policyTypes:
    - Ingress
Copy to Clipboard Toggle word wrap

where:

<policy_name>
Specifies your name for the policy.
<my_namespace>
Specifies the name of the namespace where the policy is deployed.

For more details, see "About network policy".

6.3.1.4. Next steps

6.3.2. Creating a network policy

As a cluster administrator, you can create a network policy for a namespace.

6.3.2.1. Example NetworkPolicy object

The following configuration annotates an example NetworkPolicy object:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-27107
spec:
  podSelector:
    matchLabels:
      app: mongodb
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: app
    ports:
    - protocol: TCP
      port: 27017
Copy to Clipboard Toggle word wrap

where:

name
The name of the NetworkPolicy object.
spec.podSelector
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
ingress.from.podSelector
A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
ingress.ports
A list of one or more destination ports on which to accept traffic.
6.3.2.2. Creating a network policy using the CLI

To define granular rules describing ingress or egress network traffic allowed for namespaces in your cluster, you can create a network policy.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy rule.

    1. Create a <policy_name>.yaml file:

      $ touch <policy_name>.yaml
      Copy to Clipboard Toggle word wrap

      where:

      <policy_name>
      Specifies the network policy file name.
    2. Define a network policy in the created file. The following example denies ingress traffic from all pods in all namespaces. This is a fundamental policy, blocking all cross-pod networking other than cross-pod traffic allowed by the configuration of other Network Policies.

      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      spec:
        podSelector: {}
        policyTypes:
        - Ingress
        ingress: []
      Copy to Clipboard Toggle word wrap

      The following example configuration allows ingress traffic from all pods in the same namespace:

      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: allow-same-namespace
      spec:
        podSelector:
        ingress:
        - from:
          - podSelector: {}
      # ...
      Copy to Clipboard Toggle word wrap

      The following example allows ingress traffic to one pod from a particular namespace. This policy allows traffic to pods that have the pod-a label from pods running in namespace-y.

      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: allow-traffic-pod
      spec:
        podSelector:
         matchLabels:
            pod: pod-a
        policyTypes:
        - Ingress
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                 kubernetes.io/metadata.name: namespace-y
      # ...
      Copy to Clipboard Toggle word wrap

      The following example configuration restricts traffic to a service. This policy when applied ensures every pod with both labels app=bookstore and role=api can only be accessed by pods with label app=bookstore. In this example the application could be a REST API server, marked with labels app=bookstore and role=api.

      This example configuration addresses the following use cases:

      • Restricting the traffic to a service to only the other microservices that need to use it.
      • Restricting the connections to a database to only permit the application using it.

        kind: NetworkPolicy
        apiVersion: networking.k8s.io/v1
        metadata:
          name: api-allow
        spec:
          podSelector:
            matchLabels:
              app: bookstore
              role: api
          ingress:
          - from:
              - podSelector:
                  matchLabels:
                    app: bookstore
        # ...
        Copy to Clipboard Toggle word wrap
  2. To create the network policy object, enter the following command. Successful output lists the name of the policy object and the created status.

    $ oc apply -f <policy_name>.yaml -n <namespace>
    Copy to Clipboard Toggle word wrap

    where:

    <policy_name>
    Specifies the network policy file name.
    <namespace>
    Optional parameter. If you defined the object in a different namespace than the current namespace, the parameter specifices the namespace.

    Successful output lists the name of the policy object and the created status.

    Note

    If you log in to the web console with cluster-admin privileges, you have a choice of creating a network policy in any namespace in the cluster directly in YAML or from a form in the web console.

The default deny all network policy blocks all cross-pod networking other than network traffic allowed by the configuration of other deployed network policies and traffic between host-networked pods. This procedure enforces a strong deny policy by applying a deny-by-default policy in the my-project namespace.

Warning

Without configuring a NetworkPolicy custom resource (CR) that allows traffic communication, the following policy might cause communication problems across your cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create the following YAML that defines a deny-by-default policy to deny ingress from all pods in all namespaces. Save the YAML in the deny-by-default.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: deny-by-default
      namespace: my-project
    spec:
      podSelector: {}
      ingress: []
    Copy to Clipboard Toggle word wrap

    where:

    namespace
    Specifies the namespace in which to deploy the policy. For example, the my-project namespace.
    podSelector
    If this field is empty, the configuration matches all the pods. Therefore, the policy applies to all pods in the my-project namespace.
    ingress
    Where [] indicates that no ingress rules are specified. This causes incoming traffic to be dropped to all pods.
  2. Apply the policy by entering the following command. Successful output lists the name of the policy object and the created status.

    $ oc apply -f deny-by-default.yaml
    Copy to Clipboard Toggle word wrap

With the deny-by-default policy in place you can proceed to configure a policy that allows traffic from external clients to a pod with the label app=web.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows external service from the public Internet directly or by using a Load Balancer to access the pod. Traffic is only allowed to a pod with the label app=web.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy that allows traffic from the public Internet directly or by using a load balancer to access the pod. Save the YAML in the web-allow-external.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    spec:
      policyTypes:
      - Ingress
      podSelector:
        matchLabels:
          app: web
      ingress:
        - {}
    Copy to Clipboard Toggle word wrap
  2. Apply the policy by entering the following command. Successful output lists the name of the policy object and the created status.

    $ oc apply -f web-allow-external.yaml
    Copy to Clipboard Toggle word wrap

    This policy allows traffic from all resources, including external traffic as illustrated in the following diagram:

You can configure a policy that allows traffic from all pods in all namespaces to a particular application.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy that allows traffic from all pods in all namespaces to a particular application. Save the YAML in the web-allow-all-namespaces.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    spec:
      podSelector:
        matchLabels:
          app: web
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector: {}
    Copy to Clipboard Toggle word wrap

    where:

    app
    Applies the policy only to app:web pods in default namespace.
    namespaceSelector

    Selects all pods in all namespaces.

    Note

    By default, if you do not specify a namespaceSelector parameter in the policy object, no namespaces get selected. This means the policy allows traffic only from the namespace where the network policy deployes.

  2. Apply the policy by entering the following command. Successful output lists the name of the policy object and the created status.

    $ oc apply -f web-allow-all-namespaces.yaml
    Copy to Clipboard Toggle word wrap

Verification

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
    Copy to Clipboard Toggle word wrap
  2. Run the following command to deploy an alpine image in the secondary namespace and to start a shell:

    $ oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
    Copy to Clipboard Toggle word wrap
  3. Run the following command in the shell and observe that the service allows the request:

    # wget -qO- --timeout=2 http://web.default
    Copy to Clipboard Toggle word wrap
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    Copy to Clipboard Toggle word wrap

You can configure a policy that allows traffic to a pod with the label app=web from a particular namespace. This configuration is useful in the following use cases:

  • Restrict traffic to a production database only to namespaces that have production workloads deployed.
  • Enable monitoring tools deployed to a particular namespace to scrape metrics from the current namespace.
Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy that allows traffic from all pods in a particular namespaces with a label purpose=production. Save the YAML in the web-allow-prod.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: web-allow-prod
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: web
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              purpose: production
    Copy to Clipboard Toggle word wrap

    where:

    app
    Applies the policy only to app:web pods in the default namespace.
    purpose
    Restricts traffic to only pods in namespaces that have the label purpose=production.
  2. Apply the policy by entering the following command. Successful output lists the name of the policy object and the created status.

    $ oc apply -f web-allow-prod.yaml
    Copy to Clipboard Toggle word wrap

Verification

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
    Copy to Clipboard Toggle word wrap
  2. Run the following command to create the prod namespace:

    $ oc create namespace prod
    Copy to Clipboard Toggle word wrap
  3. Run the following command to label the prod namespace:

    $ oc label namespace/prod purpose=production
    Copy to Clipboard Toggle word wrap
  4. Run the following command to create the dev namespace:

    $ oc create namespace dev
    Copy to Clipboard Toggle word wrap
  5. Run the following command to label the dev namespace:

    $ oc label namespace/dev purpose=testing
    Copy to Clipboard Toggle word wrap
  6. Run the following command to deploy an alpine image in the dev namespace and to start a shell:

    $ oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
    Copy to Clipboard Toggle word wrap
  7. Run the following command in the shell and observe the reason for the blocked request. For example, expected output states wget: download timed out.

    # wget -qO- --timeout=2 http://web.default
    Copy to Clipboard Toggle word wrap
  8. Run the following command to deploy an alpine image in the prod namespace and start a shell:

    $ oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
    Copy to Clipboard Toggle word wrap
  9. Run the following command in the shell and observe that the request is allowed:

    # wget -qO- --timeout=2 http://web.default
    Copy to Clipboard Toggle word wrap
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    Copy to Clipboard Toggle word wrap

6.3.3. Viewing a network policy

As a cluster administrator, you can view a network policy for a namespace.

6.3.3.1. Example NetworkPolicy object

The following configuration annotates an example NetworkPolicy object:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-27107
spec:
  podSelector:
    matchLabels:
      app: mongodb
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: app
    ports:
    - protocol: TCP
      port: 27017
Copy to Clipboard Toggle word wrap

where:

name
The name of the NetworkPolicy object.
spec.podSelector
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
ingress.from.podSelector
A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
ingress.ports
A list of one or more destination ports on which to accept traffic.
6.3.3.2. Viewing network policies using the CLI

You can examine the network policies in a namespace.

Note

If you log in with cluster-admin privileges, you can edit network policies in any namespace in the cluster.

Note

If you log in with cluster-admin privileges, you can edit network policies in any namespace in the cluster. In the web console, you can edit policies directly in YAML or by using the Actions menu.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace where the network policy exists.

Procedure

  1. List network policies in a namespace.

    1. To view network policy objects defined in a namespace enter the following command:

      $ oc get networkpolicy
      Copy to Clipboard Toggle word wrap
    2. Optional: To examine a specific network policy enter the following command:

      $ oc describe networkpolicy <policy_name> -n <namespace>
      Copy to Clipboard Toggle word wrap

      where:

      <policy_name>
      Specifies the name of the network policy to inspect.
      <namespace>

      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.

      $ oc describe networkpolicy allow-same-namespace
      Copy to Clipboard Toggle word wrap
      Name:         allow-same-namespace
      Namespace:    ns1
      Created on:   2021-05-24 22:28:56 -0400 EDT
      Labels:       <none>
      Annotations:  <none>
      Spec:
        PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
        Allowing ingress traffic:
          To Port: <any> (traffic allowed to all ports)
          From:
            PodSelector: <none>
        Not affecting egress traffic
        Policy Types: Ingress
      Copy to Clipboard Toggle word wrap

6.3.4. Editing a network policy

As a cluster administrator, you can edit an existing network policy for a namespace.

6.3.4.1. Editing a network policy

To modify existing policy configurations, you can edit a network policy in a namespace. Edit policies by modifying the policy file and applying it with oc apply, or by using the oc edit command directly.

Note

If you log in with cluster-admin privileges, you can edit network policies in any namespace in the cluster.

Note

If you log in with cluster-admin privileges, you can edit network policies in any namespace in the cluster. In the web console, you can edit policies directly in YAML or by using the Actions menu.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace where the network policy exists.

Procedure

  1. Optional: To list the network policy objects in a namespace, enter the following command:

    $ oc get network policy -n <namespace>
    Copy to Clipboard Toggle word wrap

    where:

    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
  2. Edit the network policy object.

    1. If you saved the network policy definition in a file, edit the file and make any necessary changes, and then enter the following command.

      $ oc apply -n <namespace> -f <policy_file>.yaml
      Copy to Clipboard Toggle word wrap

      where:

      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
      <policy_file>
      Specifies the name of the file containing the network policy.
    2. If you need to update the network policy object directly, enter the following command:

      $ oc edit network policy <policy_name> -n <namespace>
      Copy to Clipboard Toggle word wrap

      where:

      <policy_name>
      Specifies the name of the network policy.
      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
  3. Confirm that the network policy object is updated.

    $ oc describe networkpolicy <policy_name> -n <namespace>
    Copy to Clipboard Toggle word wrap

    where:

    <policy_name>
    Specifies the name of the network policy.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
6.3.4.2. Example NetworkPolicy object

The following configuration annotates an example NetworkPolicy object:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-27107
spec:
  podSelector:
    matchLabels:
      app: mongodb
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: app
    ports:
    - protocol: TCP
      port: 27017
Copy to Clipboard Toggle word wrap

where:

name
The name of the NetworkPolicy object.
spec.podSelector
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
ingress.from.podSelector
A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
ingress.ports
A list of one or more destination ports on which to accept traffic.

6.3.5. Deleting a network policy

As a cluster administrator, you can delete a network policy from a namespace.

6.3.5.1. Deleting a network policy using the CLI

You can delete a network policy in a namespace.

Note

If you log in with cluster-admin privileges, you can delete network policies in any namespace in the cluster.

Note

If you log in with cluster-admin privileges, you can delete network policies in any namespace in the cluster. In the web console, you can delete policies directly in YAML or by using the Actions menu.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You logged in to the cluster with a user with admin privileges.
  • You are working in the namespace where the network policy exists.

Procedure

  • To delete a network policy object, enter the following command. Successful output lists the name of the policy object and the deleted status.

    $ oc delete networkpolicy <policy_name> -n <namespace>
    Copy to Clipboard Toggle word wrap

    where:

    <policy_name>
    Specifies the name of the network policy.
    <namespace>
    Optional parameter. If you defined the object in a different namespace than the current namespace, the parameter specifices the namespace.

As a cluster administrator, you can modify the new project template to automatically include network policies when you create a new project. If you do not yet have a customized template for new projects, you must first create one.

6.3.6.1. Modifying the template for new projects

As a cluster administrator, you can modify the default project template so that new projects are created using your custom requirements.

To create your own custom project template:

Prerequisites

  • You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.

Procedure

  1. Log in as a user with cluster-admin privileges.
  2. Generate the default project template:

    $ oc adm create-bootstrap-project-template -o yaml > template.yaml
    Copy to Clipboard Toggle word wrap
  3. Use a text editor to modify the generated template.yaml file by adding objects or modifying existing objects.
  4. The project template must be created in the openshift-config namespace. Load your modified template:

    $ oc create -f template.yaml -n openshift-config
    Copy to Clipboard Toggle word wrap
  5. Edit the project configuration resource using the web console or CLI.

    • Using the web console:

      1. Navigate to the AdministrationCluster Settings page.
      2. Click Configuration to view all configuration resources.
      3. Find the entry for Project and click Edit YAML.
    • Using the CLI:

      1. Edit the project.config.openshift.io/cluster resource:

        $ oc edit project.config.openshift.io/cluster
        Copy to Clipboard Toggle word wrap
  6. Update the spec section to include the projectRequestTemplate and name parameters, and set the name of your uploaded project template. The default name is project-request.

    Project configuration resource with custom project template

    apiVersion: config.openshift.io/v1
    kind: Project
    metadata:
    # ...
    spec:
      projectRequestTemplate:
        name: <template_name>
    # ...
    Copy to Clipboard Toggle word wrap

  7. After you save your changes, create a new project to verify that your changes were successfully applied.

As a cluster administrator, you can add network policies to the default template for new projects. OpenShift Container Platform will automatically create all the NetworkPolicy objects specified in the template in the project.

Prerequisites

  • Your cluster uses a default CNI network plugin that supports NetworkPolicy objects, such as the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You must log in to the cluster with a user with cluster-admin privileges.
  • You must have created a custom default project template for new projects.

Procedure

  1. Edit the default template for a new project by running the following command:

    $ oc edit template <project_template> -n openshift-config
    Copy to Clipboard Toggle word wrap

    Replace <project_template> with the name of the default template that you configured for your cluster. The default template name is project-request.

  2. In the template, add each NetworkPolicy object as an element to the objects parameter. The objects parameter accepts a collection of one or more objects.

    In the following example, the objects parameter collection includes several NetworkPolicy objects.

    objects:
    - apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-same-namespace
      spec:
        podSelector: {}
        ingress:
        - from:
          - podSelector: {}
    - apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-openshift-ingress
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                policy-group.network.openshift.io/ingress:
        podSelector: {}
        policyTypes:
        - Ingress
    - apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-kube-apiserver-operator
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: openshift-kube-apiserver-operator
            podSelector:
              matchLabels:
                app: kube-apiserver-operator
        policyTypes:
        - Ingress
    ...
    Copy to Clipboard Toggle word wrap
  3. Optional: Create a new project and confirm the successful creation of your network policy objects.

    1. Create a new project:

      $ oc new-project <project> 
      1
      Copy to Clipboard Toggle word wrap
      1
      Replace <project> with the name for the project you are creating.
    2. Confirm that the network policy objects in the new project template exist in the new project:

      $ oc get networkpolicy
      Copy to Clipboard Toggle word wrap

      Expected output:

      NAME                           POD-SELECTOR   AGE
      allow-from-openshift-ingress   <none>         7s
      allow-from-same-namespace      <none>         7s
      Copy to Clipboard Toggle word wrap

As a cluster administrator, you can configure your network policies to provide multitenant network isolation.

Note

If you are using the OpenShift SDN network plugin, configuring network policies as described in this section provides network isolation similar to multitenant mode but with network policy mode set.

You can configure your project to isolate it from pods and services in other project namespaces.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.

Procedure

  1. Create the following NetworkPolicy objects:

    1. A policy named allow-from-openshift-ingress.

      $ cat << EOF| oc create -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-openshift-ingress
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                policy-group.network.openshift.io/ingress: ""
        podSelector: {}
        policyTypes:
        - Ingress
      EOF
      Copy to Clipboard Toggle word wrap
      Note

      policy-group.network.openshift.io/ingress: "" is the preferred namespace selector label for OpenShift SDN. You can use the network.openshift.io/policy-group: ingress namespace selector label, but this is a legacy label.

    2. A policy named allow-from-openshift-monitoring:

      $ cat << EOF| oc create -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-openshift-monitoring
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                network.openshift.io/policy-group: monitoring
        podSelector: {}
        policyTypes:
        - Ingress
      EOF
      Copy to Clipboard Toggle word wrap
    3. A policy named allow-same-namespace:

      $ cat << EOF| oc create -f -
      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: allow-same-namespace
      spec:
        podSelector:
        ingress:
        - from:
          - podSelector: {}
      EOF
      Copy to Clipboard Toggle word wrap
    4. A policy named allow-from-kube-apiserver-operator:

      $ cat << EOF| oc create -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-kube-apiserver-operator
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: openshift-kube-apiserver-operator
            podSelector:
              matchLabels:
                app: kube-apiserver-operator
        policyTypes:
        - Ingress
      EOF
      Copy to Clipboard Toggle word wrap

      For more details, see New kube-apiserver-operator webhook controller validating health of webhook.

  2. Optional: To confirm that the network policies exist in your current project, enter the following command:

    $ oc describe networkpolicy
    Copy to Clipboard Toggle word wrap

    Example output

    Name:         allow-from-openshift-ingress
    Namespace:    example1
    Created on:   2020-06-09 00:28:17 -0400 EDT
    Labels:       <none>
    Annotations:  <none>
    Spec:
      PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
      Allowing ingress traffic:
        To Port: <any> (traffic allowed to all ports)
        From:
          NamespaceSelector: policy-group.network.openshift.io/ingress:
      Not affecting egress traffic
      Policy Types: Ingress
    
    
    Name:         allow-from-openshift-monitoring
    Namespace:    example1
    Created on:   2020-06-09 00:29:57 -0400 EDT
    Labels:       <none>
    Annotations:  <none>
    Spec:
      PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
      Allowing ingress traffic:
        To Port: <any> (traffic allowed to all ports)
        From:
          NamespaceSelector: network.openshift.io/policy-group: monitoring
      Not affecting egress traffic
      Policy Types: Ingress
    Copy to Clipboard Toggle word wrap

6.3.7.2. Next steps

6.4. Audit logging for network security

The OVN-Kubernetes network plugin uses Open Virtual Network (OVN) access control lists (ACLs) to manage AdminNetworkPolicy, BaselineAdminNetworkPolicy, NetworkPolicy, and EgressFirewall objects. Audit logging exposes allow and deny ACL events for NetworkPolicy, EgressFirewall and BaselineAdminNetworkPolicy custom resources (CR). Logging also exposes allow, deny, and pass ACL events for AdminNetworkPolicy (ANP) CR.

Note

Audit logging is available for only the OVN-Kubernetes network plugin.

6.4.1. Audit configuration

The configuration for audit logging is specified as part of the OVN-Kubernetes cluster network provider configuration. The following YAML illustrates the default values for the audit logging:

Audit logging configuration

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    ovnKubernetesConfig:
      policyAuditConfig:
        destination: "null"
        maxFileSize: 50
        rateLimit: 20
        syslogFacility: local0
Copy to Clipboard Toggle word wrap

The following table describes the configuration fields for audit logging.

Expand
Table 6.1. policyAuditConfig object
FieldTypeDescription

rateLimit

integer

The maximum number of messages to generate every second per node. The default value is 20 messages per second.

maxFileSize

integer

The maximum size for the audit log in bytes. The default value is 50000000 or 50 MB.

maxLogFiles

integer

The maximum number of log files that are retained.

destination

string

One of the following additional audit log targets:

libc
The libc syslog() function of the journald process on the host.
udp:<host>:<port>
A syslog server. Replace <host>:<port> with the host and port of the syslog server.
unix:<file>
A Unix Domain Socket file specified by <file>.
null
Do not send the audit logs to any additional target.

syslogFacility

string

The syslog facility, such as kern, as defined by RFC5424. The default value is local0.

6.4.2. Audit logging

You can configure the destination for audit logs, such as a syslog server or a UNIX domain socket. Regardless of any additional configuration, an audit log is always saved to /var/log/ovn/acl-audit-log.log on each OVN-Kubernetes pod in the cluster.

You can enable audit logging for each namespace by annotating each namespace configuration with a k8s.ovn.org/acl-logging section. In the k8s.ovn.org/acl-logging section, you must specify allow, deny, or both values to enable audit logging for a namespace.

Note

A network policy does not support setting the Pass action set as a rule.

The ACL-logging implementation logs access control list (ACL) events for a network. You can view these logs to analyze any potential security issues.

Example namespace annotation

kind: Namespace
apiVersion: v1
metadata:
  name: example1
  annotations:
    k8s.ovn.org/acl-logging: |-
      {
        "deny": "info",
        "allow": "info"
      }
Copy to Clipboard Toggle word wrap

To view the default ACL logging configuration values, see the policyAuditConfig object in the cluster-network-03-config.yml file. If required, you can change the ACL logging configuration values for log file parameters in this file.

The logging message format is compatible with syslog as defined by RFC5424. The syslog facility is configurable and defaults to local0. The following example shows key parameters and their values outputted in a log message:

Example logging message that outputs parameters and their values

<timestamp>|<message_serial>|acl_log(ovn_pinctrl0)|<severity>|name="<acl_name>", verdict="<verdict>", severity="<severity>", direction="<direction>": <flow>
Copy to Clipboard Toggle word wrap

Where:

  • <timestamp> states the time and date for the creation of a log message.
  • <message_serial> lists the serial number for a log message.
  • acl_log(ovn_pinctrl0) is a literal string that prints the location of the log message in the OVN-Kubernetes plugin.
  • <severity> sets the severity level for a log message. If you enable audit logging that supports allow and deny tasks then two severity levels show in the log message output.
  • <name> states the name of the ACL-logging implementation in the OVN Network Bridging Database (nbdb) that was created by the network policy.
  • <verdict> can be either allow or drop.
  • <direction> can be either to-lport or from-lport to indicate that the policy was applied to traffic going to or away from a pod.
  • <flow> shows packet information in a format equivalent to the OpenFlow protocol. This parameter comprises Open vSwitch (OVS) fields.

The following example shows OVS fields that the flow parameter uses to extract packet information from system memory:

Example of OVS fields used by the flow parameter to extract packet information

<proto>,vlan_tci=0x0000,dl_src=<src_mac>,dl_dst=<source_mac>,nw_src=<source_ip>,nw_dst=<target_ip>,nw_tos=<tos_dscp>,nw_ecn=<tos_ecn>,nw_ttl=<ip_ttl>,nw_frag=<fragment>,tp_src=<tcp_src_port>,tp_dst=<tcp_dst_port>,tcp_flags=<tcp_flags>
Copy to Clipboard Toggle word wrap

Where:

  • <proto> states the protocol. Valid values are tcp and udp.
  • vlan_tci=0x0000 states the VLAN header as 0 because a VLAN ID is not set for internal pod network traffic.
  • <src_mac> specifies the source for the Media Access Control (MAC) address.
  • <source_mac> specifies the destination for the MAC address.
  • <source_ip> lists the source IP address
  • <target_ip> lists the target IP address.
  • <tos_dscp> states Differentiated Services Code Point (DSCP) values to classify and prioritize certain network traffic over other traffic.
  • <tos_ecn> states Explicit Congestion Notification (ECN) values that indicate any congested traffic in your network.
  • <ip_ttl> states the Time To Live (TTP) information for an packet.
  • <fragment> specifies what type of IP fragments or IP non-fragments to match.
  • <tcp_src_port> shows the source for the port for TCP and UDP protocols.
  • <tcp_dst_port> lists the destination port for TCP and UDP protocols.
  • <tcp_flags> supports numerous flags such as SYN, ACK, PSH and so on. If you need to set multiple values then each value is separated by a vertical bar (|). The UDP protocol does not support this parameter.
Note

For more information about the previous field descriptions, go to the OVS manual page for ovs-fields.

Example ACL deny log entry for a network policy

2023-11-02T16:28:54.139Z|00004|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
2023-11-02T16:28:55.187Z|00005|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
2023-11-02T16:28:57.235Z|00006|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
Copy to Clipboard Toggle word wrap

The following table describes namespace annotation values:

Expand
Table 6.2. Audit logging namespace annotation for k8s.ovn.org/acl-logging
FieldDescription

deny

Blocks namespace access to any traffic that matches an ACL rule with the deny action. The field supports alert, warning, notice, info, or debug values.

allow

Permits namespace access to any traffic that matches an ACL rule with the allow action. The field supports alert, warning, notice, info, or debug values.

pass

A pass action applies to an admin network policy’s ACL rule. A pass action allows either the network policy in the namespace or the baseline admin network policy rule to evaluate all incoming and outgoing traffic. A network policy does not support a pass action.

6.4.3. AdminNetworkPolicy audit logging

Audit logging is enabled per AdminNetworkPolicy CR by annotating an ANP policy with the k8s.ovn.org/acl-logging key such as in the following example:

Example 6.19. Example of annotation for AdminNetworkPolicy CR

apiVersion: policy.networking.k8s.io/v1alpha1
kind: AdminNetworkPolicy
metadata:
  annotations:
    k8s.ovn.org/acl-logging: '{ "deny": "alert", "allow": "alert", "pass" : "warning" }'
  name: anp-tenant-log
spec:
  priority: 5
  subject:
    namespaces:
      matchLabels:
        tenant: backend-storage # Selects all pods owned by storage tenant.
  ingress:
    - name: "allow-all-ingress-product-development-and-customer" # Product development and customer tenant ingress to backend storage.
      action: "Allow"
      from:
      - pods:
          namespaceSelector:
            matchExpressions:
            - key: tenant
              operator: In
              values:
              - product-development
              - customer
          podSelector: {}
    - name: "pass-all-ingress-product-security"
      action: "Pass"
      from:
      - namespaces:
          matchLabels:
              tenant: product-security
    - name: "deny-all-ingress" # Ingress to backend from all other pods in the cluster.
      action: "Deny"
      from:
      - namespaces: {}
  egress:
    - name: "allow-all-egress-product-development"
      action: "Allow"
      to:
      - pods:
          namespaceSelector:
            matchLabels:
              tenant: product-development
          podSelector: {}
    - name: "pass-egress-product-security"
      action: "Pass"
      to:
      - namespaces:
           matchLabels:
             tenant: product-security
    - name: "deny-all-egress" # Egress from backend denied to all other pods.
      action: "Deny"
      to:
      - namespaces: {}
Copy to Clipboard Toggle word wrap

Logs are generated whenever a specific OVN ACL is hit and meets the action criteria set in your logging annotation. For example, an event in which any of the namespaces with the label tenant: product-development accesses the namespaces with the label tenant: backend-storage, a log is generated.

Note

ACL logging is limited to 60 characters. If your ANP name field is long, the rest of the log will be truncated.

The following is a direction index for the examples log entries that follow:

Expand
DirectionRule

Ingress

Rule0
Allow from tenant product-development and customer to tenant backend-storage; Ingress0: Allow
Rule1
Pass from product-security`to tenant `backend-storage; Ingress1: Pass
Rule2
Deny ingress from all pods; Ingress2: Deny

Egress

Rule0
Allow to product-development; Egress0: Allow
Rule1
Pass to product-security; Egress1: Pass
Rule2
Deny egress to all other pods; Egress2: Deny

Example 6.20. Example ACL log entry for Allow action of the AdminNetworkPolicy named anp-tenant-log with Ingress:0 and Egress:0

2024-06-10T16:27:45.194Z|00052|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:1a,dl_dst=0a:58:0a:80:02:19,nw_src=10.128.2.26,nw_dst=10.128.2.25,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=57814,tp_dst=8080,tcp_flags=syn
2024-06-10T16:28:23.130Z|00059|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:18,dl_dst=0a:58:0a:80:02:19,nw_src=10.128.2.24,nw_dst=10.128.2.25,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=38620,tp_dst=8080,tcp_flags=ack
2024-06-10T16:28:38.293Z|00069|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Egress:0", verdict=allow, severity=alert, direction=from-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:19,dl_dst=0a:58:0a:80:02:1a,nw_src=10.128.2.25,nw_dst=10.128.2.26,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=47566,tp_dst=8080,tcp_flags=fin|ack=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=55704,tp_dst=8080,tcp_flags=ack
Copy to Clipboard Toggle word wrap

Example 6.21. Example ACL log entry for Pass action of the AdminNetworkPolicy named anp-tenant-log with Ingress:1 and Egress:1

2024-06-10T16:33:12.019Z|00075|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Ingress:1", verdict=pass, severity=warning, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:1b,dl_dst=0a:58:0a:80:02:19,nw_src=10.128.2.27,nw_dst=10.128.2.25,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=37394,tp_dst=8080,tcp_flags=ack
2024-06-10T16:35:04.209Z|00081|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Egress:1", verdict=pass, severity=warning, direction=from-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:19,dl_dst=0a:58:0a:80:02:1b,nw_src=10.128.2.25,nw_dst=10.128.2.27,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=34018,tp_dst=8080,tcp_flags=ack
Copy to Clipboard Toggle word wrap

Example 6.22. Example ACL log entry for Deny action of the AdminNetworkPolicy named anp-tenant-log with Egress:2 and Ingress2

2024-06-10T16:43:05.287Z|00087|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Egress:2", verdict=drop, severity=alert, direction=from-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:19,dl_dst=0a:58:0a:80:02:18,nw_src=10.128.2.25,nw_dst=10.128.2.24,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=51598,tp_dst=8080,tcp_flags=syn
2024-06-10T16:44:43.591Z|00090|acl_log(ovn_pinctrl0)|INFO|name="ANP:anp-tenant-log:Ingress:2", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:1c,dl_dst=0a:58:0a:80:02:19,nw_src=10.128.2.28,nw_dst=10.128.2.25,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=33774,tp_dst=8080,tcp_flags=syn
Copy to Clipboard Toggle word wrap

The following table describes ANP annotation:

Expand
Table 6.3. Audit logging AdminNetworkPolicy annotation
AnnotationValue

k8s.ovn.org/acl-logging

You must specify at least one of Allow, Deny, or Pass to enable audit logging for a namespace.

Deny
Optional: Specify alert, warning, notice, info, or debug.
Allow
Optional: Specify alert, warning, notice, info, or debug.
Pass
Optional: Specify alert, warning, notice, info, or debug.

6.4.4. BaselineAdminNetworkPolicy audit logging

Audit logging is enabled in the BaselineAdminNetworkPolicy CR by annotating an BANP policy with the k8s.ovn.org/acl-logging key such as in the following example:

Example 6.23. Example of annotation for BaselineAdminNetworkPolicy CR

apiVersion: policy.networking.k8s.io/v1alpha1
kind: BaselineAdminNetworkPolicy
metadata:
  annotations:
    k8s.ovn.org/acl-logging: '{ "deny": "alert", "allow": "alert"}'
  name: default
spec:
  subject:
    namespaces:
      matchLabels:
          tenant: workloads # Selects all workload pods in the cluster.
  ingress:
  - name: "default-allow-dns" # This rule allows ingress from dns tenant to all workloads.
    action: "Allow"
    from:
    - namespaces:
          matchLabels:
            tenant: dns
  - name: "default-deny-dns" # This rule denies all ingress from all pods to workloads.
    action: "Deny"
    from:
    - namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
  egress:
  - name: "default-deny-dns" # This rule denies all egress from workloads. It will be applied when no ANP or network policy matches.
    action: "Deny"
    to:
    - namespaces: {} # Use the empty selector with caution because it also selects OpenShift namespaces as well.
Copy to Clipboard Toggle word wrap

In the example, an event in which any of the namespaces with the label tenant: dns accesses the namespaces with the label tenant: workloads, a log is generated.

The following is a direction index for the examples log entries that follow:

Expand
DirectionRule

Ingress

Rule0
Allow from tenant dns to tenant workloads; Ingress0: Allow
Rule1
Deny to tenant workloads from all pods; Ingress1: Deny

Egress

Rule0
Deny to all pods; Egress0: Deny

Example 6.24. Example ACL allow log entry for Allow action of default BANP with Ingress:0

2024-06-10T18:11:58.263Z|00022|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:57,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.87,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=60510,tp_dst=8080,tcp_flags=syn
2024-06-10T18:11:58.264Z|00023|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:57,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.87,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=60510,tp_dst=8080,tcp_flags=psh|ack
2024-06-10T18:11:58.264Z|00024|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:57,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.87,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=60510,tp_dst=8080,tcp_flags=ack
2024-06-10T18:11:58.264Z|00025|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:57,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.87,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=60510,tp_dst=8080,tcp_flags=ack
2024-06-10T18:11:58.264Z|00026|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:57,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.87,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=60510,tp_dst=8080,tcp_flags=fin|ack
2024-06-10T18:11:58.264Z|00027|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:0", verdict=allow, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:57,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.87,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=60510,tp_dst=8080,tcp_flags=ack
Copy to Clipboard Toggle word wrap

Example 6.25. Example ACL allow log entry for Allow action of default BANP with Egress:0 and Ingress:1

2024-06-10T18:09:57.774Z|00016|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Egress:0", verdict=drop, severity=alert, direction=from-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:56,dl_dst=0a:58:0a:82:02:57,nw_src=10.130.2.86,nw_dst=10.130.2.87,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=45614,tp_dst=8080,tcp_flags=syn
2024-06-10T18:09:58.809Z|00017|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Egress:0", verdict=drop, severity=alert, direction=from-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:56,dl_dst=0a:58:0a:82:02:57,nw_src=10.130.2.86,nw_dst=10.130.2.87,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=45614,tp_dst=8080,tcp_flags=syn
2024-06-10T18:10:00.857Z|00018|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Egress:0", verdict=drop, severity=alert, direction=from-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:56,dl_dst=0a:58:0a:82:02:57,nw_src=10.130.2.86,nw_dst=10.130.2.87,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=45614,tp_dst=8080,tcp_flags=syn
2024-06-10T18:10:25.414Z|00019|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:1", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:58,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.88,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=40630,tp_dst=8080,tcp_flags=syn
2024-06-10T18:10:26.457Z|00020|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:1", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:58,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.88,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=40630,tp_dst=8080,tcp_flags=syn
2024-06-10T18:10:28.505Z|00021|acl_log(ovn_pinctrl0)|INFO|name="BANP:default:Ingress:1", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:82:02:58,dl_dst=0a:58:0a:82:02:56,nw_src=10.130.2.88,nw_dst=10.130.2.86,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,tp_src=40630,tp_dst=8080,tcp_flags=syn
Copy to Clipboard Toggle word wrap

The following table describes BANP annotation:

Expand
Table 6.4. Audit logging BaselineAdminNetworkPolicy annotation
AnnotationValue

k8s.ovn.org/acl-logging

You must specify at least one of Allow or Deny to enable audit logging for a namespace.

Deny
Optional: Specify alert, warning, notice, info, or debug.
Allow
Optional: Specify alert, warning, notice, info, or debug.

As a cluster administrator, you can customize audit logging for your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  • To customize the audit logging configuration, enter the following command:

    $ oc edit network.operator.openshift.io/cluster
    Copy to Clipboard Toggle word wrap
    Tip

    You can also customize and apply the following YAML to configure audit logging:

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      defaultNetwork:
        ovnKubernetesConfig:
          policyAuditConfig:
            destination: "null"
            maxFileSize: 50
            rateLimit: 20
            syslogFacility: local0
    Copy to Clipboard Toggle word wrap

Verification

  1. To create a namespace with network policies complete the following steps:

    1. Create a namespace for verification:

      $ cat <<EOF| oc create -f -
      kind: Namespace
      apiVersion: v1
      metadata:
        name: verify-audit-logging
        annotations:
          k8s.ovn.org/acl-logging: '{ "deny": "alert", "allow": "alert" }'
      EOF
      Copy to Clipboard Toggle word wrap

      Successful output lists the namespace with the network policy and the created status.

    2. Create network policies for the namespace:

      $ cat <<EOF| oc create -n verify-audit-logging -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: deny-all
      spec:
        podSelector:
          matchLabels:
        policyTypes:
        - Ingress
        - Egress
      ---
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-same-namespace
        namespace: verify-audit-logging
      spec:
        podSelector: {}
        policyTypes:
         - Ingress
         - Egress
        ingress:
          - from:
              - podSelector: {}
        egress:
          - to:
             - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: verify-audit-logging
      EOF
      Copy to Clipboard Toggle word wrap

      Example output

      networkpolicy.networking.k8s.io/deny-all created
      networkpolicy.networking.k8s.io/allow-from-same-namespace created
      Copy to Clipboard Toggle word wrap

  2. Create a pod for source traffic in the default namespace:

    $ cat <<EOF| oc create -n default -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: client
    spec:
      containers:
        - name: client
          image: registry.access.redhat.com/rhel7/rhel-tools
          command: ["/bin/sh", "-c"]
          args:
            ["sleep inf"]
    EOF
    Copy to Clipboard Toggle word wrap
  3. Create two pods in the verify-audit-logging namespace:

    $ for name in client server; do
    cat <<EOF| oc create -n verify-audit-logging -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: ${name}
    spec:
      containers:
        - name: ${name}
          image: registry.access.redhat.com/rhel7/rhel-tools
          command: ["/bin/sh", "-c"]
          args:
            ["sleep inf"]
    EOF
    done
    Copy to Clipboard Toggle word wrap

    Successful output lists the two pods, such as pod/client and pod/server, and the created status.

  4. To generate traffic and produce network policy audit log entries, complete the following steps:

    1. Obtain the IP address for pod named server in the verify-audit-logging namespace:

      $ POD_IP=$(oc get pods server -n verify-audit-logging -o jsonpath='{.status.podIP}')
      Copy to Clipboard Toggle word wrap
    2. Ping the IP address from an earlier command from the pod named client in the default namespace and confirm the all packets are dropped:

      $ oc exec -it client -n default -- /bin/ping -c 2 $POD_IP
      Copy to Clipboard Toggle word wrap

      Example output

      PING 10.128.2.55 (10.128.2.55) 56(84) bytes of data.
      
      --- 10.128.2.55 ping statistics ---
      2 packets transmitted, 0 received, 100% packet loss, time 2041ms
      Copy to Clipboard Toggle word wrap

    3. From the client pod in the verify-audit-logging namespace, ping the IP address stored in the POD_IP shell environment variable and confirm the system allows all packets.

      $ oc exec -it client -n verify-audit-logging -- /bin/ping -c 2 $POD_IP
      Copy to Clipboard Toggle word wrap

      Example output

      PING 10.128.0.86 (10.128.0.86) 56(84) bytes of data.
      64 bytes from 10.128.0.86: icmp_seq=1 ttl=64 time=2.21 ms
      64 bytes from 10.128.0.86: icmp_seq=2 ttl=64 time=0.440 ms
      
      --- 10.128.0.86 ping statistics ---
      2 packets transmitted, 2 received, 0% packet loss, time 1001ms
      rtt min/avg/max/mdev = 0.440/1.329/2.219/0.890 ms
      Copy to Clipboard Toggle word wrap

  5. Display the latest entries in the network policy audit log:

    $ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do
        oc exec -it $pod -n openshift-ovn-kubernetes -- tail -4 /var/log/ovn/acl-audit-log.log
      done
    Copy to Clipboard Toggle word wrap

    Example output

    2023-11-02T16:28:54.139Z|00004|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
    2023-11-02T16:28:55.187Z|00005|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
    2023-11-02T16:28:57.235Z|00006|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:Ingress", verdict=drop, severity=alert, direction=to-lport: tcp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:01,dl_dst=0a:58:0a:81:02:23,nw_src=10.131.0.39,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=62,nw_frag=no,tp_src=58496,tp_dst=8080,tcp_flags=syn
    2023-11-02T16:49:57.909Z|00028|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    2023-11-02T16:49:57.909Z|00029|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    2023-11-02T16:49:58.932Z|00030|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    2023-11-02T16:49:58.932Z|00031|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    Copy to Clipboard Toggle word wrap

As a cluster administrator, you can enable audit logging for a namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  • To enable audit logging for a namespace, enter the following command:

    $ oc annotate namespace <namespace> \
      k8s.ovn.org/acl-logging='{ "deny": "alert", "allow": "notice" }'
    Copy to Clipboard Toggle word wrap

    where:

    <namespace>
    Specifies the name of the namespace.
    Tip

    You can also apply the following YAML to enable audit logging:

    kind: Namespace
    apiVersion: v1
    metadata:
      name: <namespace>
      annotations:
        k8s.ovn.org/acl-logging: |-
          {
            "deny": "alert",
            "allow": "notice"
          }
    Copy to Clipboard Toggle word wrap

    Successful output lists the audit logging name and the annotated status.

Verification

  • Display the latest entries in the audit log:

    $ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do
        oc exec -it $pod -n openshift-ovn-kubernetes -- tail -4 /var/log/ovn/acl-audit-log.log
      done
    Copy to Clipboard Toggle word wrap

    Example output

    2023-11-02T16:49:57.909Z|00028|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    2023-11-02T16:49:57.909Z|00029|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    2023-11-02T16:49:58.932Z|00030|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Egress:0", verdict=allow, severity=alert, direction=from-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    2023-11-02T16:49:58.932Z|00031|acl_log(ovn_pinctrl0)|INFO|name="NP:verify-audit-logging:allow-from-same-namespace:Ingress:0", verdict=allow, severity=alert, direction=to-lport: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:81:02:22,dl_dst=0a:58:0a:81:02:23,nw_src=10.129.2.34,nw_dst=10.129.2.35,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
    Copy to Clipboard Toggle word wrap

As a cluster administrator, you can disable audit logging for a namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  • To disable audit logging for a namespace, enter the following command:

    $ oc annotate --overwrite namespace <namespace> k8s.ovn.org/acl-logging-
    Copy to Clipboard Toggle word wrap

    where:

    <namespace>
    Specifies the name of the namespace.
    Tip

    You can also apply the following YAML to disable audit logging:

    kind: Namespace
    apiVersion: v1
    metadata:
      name: <namespace>
      annotations:
        k8s.ovn.org/acl-logging: null
    Copy to Clipboard Toggle word wrap

    Successful output lists the audit logging name and the annotated status.

6.5. Egress Firewall

6.5.1. Viewing an egress firewall for a project

As a cluster administrator, you can list the names of any existing egress firewalls and view the traffic rules for a specific egress firewall.

Note

OpenShift SDN CNI is deprecated as of OpenShift Container Platform 4.14. As of OpenShift Container Platform 4.15, the network plugin is not an option for new installations. In a subsequent future release, the OpenShift SDN network plugin is planned to be removed and no longer supported. Red Hat will provide bug fixes and support for this feature until it is removed, but this feature will no longer receive enhancements. As an alternative to OpenShift SDN CNI, you can use OVN Kubernetes CNI instead. For more information, see OpenShift SDN CNI removal.

You can view an EgressNetworkPolicy CR in your cluster.

Prerequisites

  • A cluster using the OpenShift SDN network plugin.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster.

Procedure

  1. Optional: To view the names of the EgressNetworkPolicy CRs defined in your cluster, enter the following command:

    $ oc get egressnetworkpolicy --all-namespaces
    Copy to Clipboard Toggle word wrap
  2. To inspect a policy, enter the following command. Replace <policy_name> with the name of the policy to inspect.

    $ oc describe egressnetworkpolicy <policy_name>
    Copy to Clipboard Toggle word wrap

    Example output

    Name:		default
    Namespace:	project1
    Created:	20 minutes ago
    Labels:		<none>
    Annotations:	<none>
    Rule:		Allow to 1.2.3.0/24
    Rule:		Allow to www.example.com
    Rule:		Deny to 0.0.0.0/0
    Copy to Clipboard Toggle word wrap

You can view an EgressFirewall CR in your cluster.

Prerequisites

  • A cluster using the OVN-Kubernetes network plugin.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster.

Procedure

  1. Optional: To view the names of the EgressFirewall CR defined in your cluster, enter the following command:

    $ oc get egressfirewall --all-namespaces
    Copy to Clipboard Toggle word wrap
  2. To inspect a policy, enter the following command. Replace <policy_name> with the name of the policy to inspect.

    $ oc describe egressfirewall <policy_name>
    Copy to Clipboard Toggle word wrap

    Example output

    Name:		default
    Namespace:	project1
    Created:	20 minutes ago
    Labels:		<none>
    Annotations:	<none>
    Rule:		Allow to 1.2.3.0/24
    Rule:		Allow to www.example.com
    Rule:		Deny to 0.0.0.0/0
    Copy to Clipboard Toggle word wrap

6.5.2. Editing an egress firewall for a project

As a cluster administrator, you can modify network traffic rules for an existing egress firewall.

As a cluster administrator, you can update the egress firewall for a project.

Prerequisites

  • A cluster using the OVN-Kubernetes network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Find the name of the EgressFirewall CR for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressfirewall
    Copy to Clipboard Toggle word wrap
  2. Optional: If you did not save a copy of the EgressFirewall object when you created the egress network firewall, enter the following command to create a copy.

    $ oc get -n <project> egressfirewall <name> -o yaml > <filename>.yaml
    Copy to Clipboard Toggle word wrap

    Replace <project> with the name of the project. Replace <name> with the name of the object. Replace <filename> with the name of the file to save the YAML to.

  3. After making changes to the policy rules, enter the following command to replace the EgressFirewall CR. Replace <filename> with the name of the file containing the updated EgressFirewall CR.

    $ oc replace -f <filename>.yaml
    Copy to Clipboard Toggle word wrap

6.5.3. Removing an egress firewall from a project

As a cluster administrator, you can remove an egress firewall from a project to remove all restrictions on network traffic from the project that leaves the OpenShift Container Platform cluster.

6.5.3.1. Removing an EgressFirewall CR

As a cluster administrator, you can remove an egress firewall from a project.

Prerequisites

  • A cluster using the OVN-Kubernetes network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Find the name of the EgressFirewall CR for the project. Replace <project> with the name of the project.

    $ oc get egressfirewall -n <project>
    Copy to Clipboard Toggle word wrap
  2. Delete the EgressFirewall CR by entering the following command. Replace <project> with the name of the project and <name> with the name of the object.

    $ oc delete -n <project> egressfirewall <name>
    Copy to Clipboard Toggle word wrap

As a cluster administrator, you can create an egress firewall for a project that restricts egress traffic leaving your OpenShift Container Platform cluster.

6.5.4.1. How an egress firewall works in a project

As a cluster administrator, you can use an egress firewall to limit the external hosts that some or all pods can access from within the cluster. An egress firewall supports the following scenarios:

  • A pod can only connect to internal hosts and cannot initiate connections to the public internet.
  • A pod can only connect to the public internet and cannot initiate connections to internal hosts that are outside the OpenShift Container Platform cluster.
  • A pod cannot reach specified internal subnets or hosts outside the OpenShift Container Platform cluster.
  • A pod can only connect to specific external hosts.

For example, you can allow one project access to a specified IP range but deny the same access to a different project. Or, you can restrict application developers from updating from Python pip mirrors, and force updates to come only from approved sources.

You configure an egress firewall policy by creating an EgressFirewall custom resource (CR). The egress firewall matches network traffic that meets any of the following criteria:

  • An IP address range in CIDR format
  • A DNS name that resolves to an IP address
  • A port number
  • A protocol that is one of the following protocols: TCP, UDP, and SCTP
6.5.4.1.1. Limitations of an egress firewall

An egress firewall has the following limitations:

  • No project can have more than one EgressFirewall CR.
  • Egress firewall rules do not apply to traffic that goes through routers. Any user with permission to create a Route CR object can bypass egress firewall policy rules by creating a route that points to a forbidden destination.
  • Egress firewall does not apply to the host network namespace. Pods with host networking enabled are unaffected by egress firewall rules.
  • If your egress firewall includes a deny rule for 0.0.0.0/0, access to your OpenShift Container Platform API servers is blocked. You must either add allow rules for each IP address or use the nodeSelector type allow rule in your egress policy rules to connect to API servers.

    The following example illustrates the order of the egress firewall rules necessary to ensure API server access:

    EgressFirewall API server access example

    apiVersion: k8s.ovn.org/v1
    kind: EgressFirewall
    metadata:
      name: default
      namespace: <namespace> 
    1
    
    spec:
      egress:
      - to:
          cidrSelector: <api_server_address_range> 
    2
    
        type: Allow
    # ...
      - to:
          cidrSelector: 0.0.0.0/0 
    3
    
        type: Deny
    Copy to Clipboard Toggle word wrap

    where:

    <namespace>
    Specifies the namespace for the egress firewall.
    <api_server_address_range>
    Specifies the IP address range that includes your OpenShift Container Platform API servers.
    <cidrSelector>

    Specifies a value of 0.0.0.0/0 to set a global deny rule that prevents access to the OpenShift Container Platform API servers.

    To find the IP address for your API servers, run oc get ep kubernetes -n default.

    For more information, see BZ#1988324.

  • A maximum of one EgressFirewall object with a maximum of 8,000 rules can be defined per project.
  • If you are using the OVN-Kubernetes network plugin with shared gateway mode in Red Hat OpenShift Networking, return ingress replies are affected by egress firewall rules. If the egress firewall rules drop the ingress reply destination IP, the traffic is dropped.
  • In general, using Domain Name Server (DNS) names in your egress firewall policy does not affect local DNS resolution through CoreDNS. However, if your egress firewall policy uses domain names and an external DNS server handles DNS resolution for an affected pod, you must include egress firewall rules that permit access to the IP addresses of your DNS server.

Violating any of these restrictions results in a broken egress firewall for the project. Consequently, all external network traffic is dropped, which can cause security risks for your organization.

An EgressFirewall resource is created in the kube-node-lease, kube-public, kube-system, openshift and openshift- projects.

OVN-Kubernetes evaluates egress firewall policy rules in the order they are defined in, from first to last. The first rule that matches an egress connection from a pod applies. Any subsequent rules are ignored for that connection.

If you use DNS names in any of your egress firewall policy rules, proper resolution of the domain names is subject to the following restrictions:

  • Domain name updates are polled based on a time-to-live (TTL) duration. By default, the duration is 30 minutes. When the egress firewall controller queries the local name servers for a domain name, if the response includes a TTL and the TTL is less than 30 minutes, the controller sets the duration for that DNS name to the returned value. Each DNS name is queried after the TTL for the DNS record expires.
  • The pod must resolve the domain from the same local name servers when necessary. Otherwise the IP addresses for the domain known by the egress firewall controller and the pod can be different. If the IP addresses for a hostname differ, the egress firewall might not be enforced consistently.
  • Because the egress firewall controller and pods asynchronously poll the same local name server, the pod might obtain the updated IP address before the egress controller does, which causes a race condition. Due to this current limitation, domain name usage in EgressFirewall objects is only recommended for domains with infrequent IP address changes.

There might be situations where the IP addresses associated with a DNS record change frequently, or you might want to specify wildcard domain names in your egress firewall policy rules.

In this situation, the OVN-Kubernetes cluster manager creates a DNSNameResolver custom resource object for each unique DNS name used in your egress firewall policy rules. This custom resource stores the following information:

Important

Improved DNS resolution for egress firewall rules is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Example DNSNameResolver CR definition

apiVersion: networking.openshift.io/v1alpha1
kind: DNSNameResolver
spec:
  name: www.example.com.
status:
  resolvedNames:
  - dnsName: www.example.com.
    resolvedAddress:
    - ip: "1.2.3.4"
      ttlSeconds: 60
      lastLookupTime: "2023-08-08T15:07:04Z"
Copy to Clipboard Toggle word wrap

where:

<name>
Specifies the DNS name. This can be either a standard DNS name or a wildcard DNS name. For a wildcard DNS name, the DNS name resolution information contains all of the DNS names that match the wildcard DNS name.
<dnsName>
Specifies the resolved DNS name matching the spec.name field. If the spec.name field contains a wildcard DNS name, then multiple dnsName entries are created that contain the standard DNS names that match the wildcard DNS name when resolved. If the wildcard DNS name can also be successfully resolved, then this field also stores the wildcard DNS name. <ip> Specifies the current IP addresses associated with the DNS name.
<ttlSeconds>
Specifies the last time-to-live (TTL) duration.
<lastLookupTime>
Specifies the last lookup time.

If during DNS resolution the DNS name in the query matches any name defined in a DNSNameResolver CR, then the previous information is updated accordingly in the CR status field. For unsuccessful DNS wildcard name lookups, the request is retried after a default TTL of 30 minutes.

The OVN-Kubernetes cluster manager watches for updates to an EgressFirewall custom resource object, and creates, modifies, or deletes DNSNameResolver CRs associated with those egress firewall policies when that update occurs.

Warning

Do not modify DNSNameResolver custom resources directly. This can lead to unwanted behavior of your egress firewall.

6.5.4.2. EgressFirewall custom resource (CR)

You can define one or more rules for an egress firewall. A rule is either an Allow rule or a Deny rule, with a specification for the traffic that the rule applies to.

The following YAML describes an EgressFirewall CR:

EgressFirewall object

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: <ovn>
spec:
  egress: <egress_rules>
    ...
Copy to Clipboard Toggle word wrap

where:

<ovn>
The name for the object must be default.
<egress_rules>
Specifies a collection of one or more egress network policy rules as described in the following section.
6.5.4.2.1. EgressFirewall rules

The following YAML describes the rules for an EgressFirewall resource. The user can select either an IP address range in CIDR format, a domain name, or use the nodeSelector field to allow or deny egress traffic. The egress stanza expects an array of one or more objects.

Egress policy rule stanza

egress:
- type: <type>
  to:
    cidrSelector: <cidr_range>
    dnsName: <dns_name>
    nodeSelector: <label_name>: <label_value>
  ports: <optional_port>
      ...
Copy to Clipboard Toggle word wrap

where:

<type>
Specifies the type of rule. The value must be either Allow or Deny.
<to>
Specifies a stanza describing an egress traffic match rule that specifies the cidrSelector field or the dnsName field. You cannot use both fields in the same rule.
<cidr_range>
Specifies an IP address range in CIDR format.
<dns_name>
Specifies a DNS domain name.
<nodeSelector>
Specifies labels which are key and value pairs that the user defines. Labels are attached to objects, such as pods. The nodeSelector allows for one or more node labels to be selected and attached to pods.
<ports>
Specifies an optional field that describes a collection of network ports and protocols for the rule.

Ports stanza

ports:
- port:
  protocol:
Copy to Clipboard Toggle word wrap

where:

<port>
Specifies a network port, such as 80 or 443. If you specify a value for this field, you must also specify a value for the protocol field.
<protocol>
Specifies a network protocol. The value must be either TCP, UDP, or SCTP.
6.5.4.2.2. Example EgressFirewall CR

The following example defines several egress firewall policy rules:

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress: 
1

  - type: Allow
    to:
      cidrSelector: 1.2.3.0/24
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
Copy to Clipboard Toggle word wrap

where:

<egress>
Specifies a collection of egress firewall policy rule objects.

The following example defines a policy rule that denies traffic to the host at the 172.16.1.1/32 IP address, if the traffic is using either the TCP protocol and destination port 80 or any protocol and destination port 443.

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Deny
    to:
      cidrSelector: 172.16.1.1/32
    ports:
    - port: 80
      protocol: TCP
    - port: 443
Copy to Clipboard Toggle word wrap

As a cluster administrator, you can allow or deny egress traffic to nodes in your cluster by specifying a label using nodeSelector field. Labels can be applied to one or more nodes. Labels can be helpful because instead of adding manual rules per node IP address, you can use node selectors to create a label that allows pods behind an egress firewall to access host network pods. The following is an example with the region=east label:

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
    egress:
    - to:
        nodeSelector:
          matchLabels:
            region: east
      type: Allow
Copy to Clipboard Toggle word wrap

As a cluster administrator, you can create an egress firewall policy object for a project.

Important

If the project already has an EgressFirewall resource, you must edit the existing policy to make changes to egress firewall rules.

Prerequisites

  • A cluster that uses the OVN-Kubernetes network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Create a policy rule:

    1. Create a <policy_name>.yaml file where <policy_name> describes the egress policy rules.
    2. Define the EgressFirewall object in the file.
  2. Create the policy object by entering the following command. Replace <policy_name> with the name of the policy and <project> with the project that the rule applies to.

    $ oc create -f <policy_name>.yaml -n <project>
    Copy to Clipboard Toggle word wrap

    Successful output lists the egressfirewall.k8s.ovn.org/v1 name and the created status.

  3. Optional: Save the <policy_name>.yaml file so that you can make changes later.

6.6. Configuring IPsec encryption

By enabling IPsec, you can encrypt both internal pod-to-pod cluster traffic between nodes and external traffic between pods and IPsec endpoints external to your cluster. All pod-to-pod network traffic between nodes on the OVN-Kubernetes cluster network is encrypted with IPsec in Transport mode.

IPsec is disabled by default. You can enable IPsec either during or after installing the cluster. For information about cluster installation, see OpenShift Container Platform installation overview.

Note

Upgrading your cluster to OpenShift Container Platform 4.16 when the libreswan and NetworkManager-libreswan packages have different OpenShift Container Platform versions causes two consecutive compute node reboot operations. For the first reboot, the Cluster Network Operator (CNO) applies the IPsec configuration to compute nodes. For the second reboot, the Machine Config Operator (MCO) applies the latest machine configs to the cluster.

To combine the CNO and MCO updates into a single node reboot, complete the following tasks:

  • Before upgrading your cluster, set the paused parameter to true in the MachineConfigPools custom resource (CR) that groups compute nodes.
  • After you upgrade your cluster, set the parameter to false.

For more information, see Performing a Control Plane Only update.

The following support limitations exist for IPsec on a OpenShift Container Platform cluster:

  • On IBM Cloud®, IPsec supports only NAT-T. Encapsulating Security Payload (ESP) is not supported on this platform.
  • If your cluster uses hosted control planes for Red Hat OpenShift Container Platform, IPsec is not supported for IPsec encryption of either pod-to-pod or traffic to external hosts.
  • Using ESP hardware offloading on any network interface is not supported if one or more of those interfaces is attached to Open vSwitch (OVS). Enabling IPsec for your cluster triggers the use of IPsec with interfaces attached to OVS. By default, OpenShift Container Platform disables ESP hardware offloading on any interfaces attached to OVS.
  • If you enabled IPsec for network interfaces that are not attached to OVS, a cluster administrator must manually disable ESP hardware offloading on each interface that is not attached to OVS.
  • IPsec is not supported on Red Hat Enterprise Linux (RHEL) compute nodes because of a libreswan incompatiblility issue between a host and an ovn-ipsec container that exist in each compute node. See (OCPBUGS-53316).

The following list outlines key tasks in the IPsec documentation:

  • Enable and disable IPsec after cluster installation.
  • Configure IPsec encryption for traffic between the cluster and external hosts.
  • Verify that IPsec encrypts traffic between pods on different nodes.

6.6.1. Modes of operation

When using IPsec on your OpenShift Container Platform cluster, you can choose from the following operating modes:

Expand
Table 6.5. IPsec modes of operation
ModeDescriptionDefault

Disabled

No traffic is encrypted. This is the cluster default.

Yes

Full

Pod-to-pod traffic is encrypted as described in "Types of network traffic flows encrypted by pod-to-pod IPsec". Traffic to external nodes may be encrypted after you complete the required configuration steps for IPsec.

No

External

Traffic to external nodes may be encrypted after you complete the required configuration steps for IPsec.

No

6.6.2. Prerequisites

For IPsec support for encrypting traffic to external hosts, ensure that you meet the following prerequisites:

  • Set routingViaHost=true in the ovnKubernetesConfig.gatewayConfig specification of the OVN-Kubernetes network plugin.
  • Install the NMState Operator. This Operator is required for specifying the IPsec configuration. For more information, see Kubernetes NMState Operator.

    Note

    The NMState Operator is supported on Google Cloud only for configuring IPsec.

  • The Butane tool (butane) is installed. To install Butane, see Installing Butane.

These prerequisites are required to add certificates into the host NSS database and to configure IPsec to communicate with external hosts.

You must configure the network connectivity between machines to allow OpenShift Container Platform cluster components to communicate. Each machine must be able to resolve the hostnames of all other machines in the cluster.

Expand
Table 6.6. Ports used for all-machine to all-machine communications
ProtocolPortDescription

UDP

500

IPsec IKE packets

4500

IPsec NAT-T packets

ESP

N/A

IPsec Encapsulating Security Payload (ESP)

6.6.4. IPsec encryption for pod-to-pod traffic

For IPsec encryption of pod-to-pod traffic, the following sections describe which specific pod-to-pod traffic is encrypted, what kind of encryption protocol is used, and how X.509 certificates are handled. These sections do not apply to IPsec encryption between the cluster and external hosts, which you must configure manually for your specific external network infrastructure.

With IPsec enabled, only the following network traffic flows between pods are encrypted:

  • Traffic between pods on different nodes on the cluster network
  • Traffic from a pod on the host network to a pod on the cluster network

The following traffic flows are not encrypted:

  • Traffic between pods on the same node on the cluster network
  • Traffic between pods on the host network
  • Traffic from a pod on the cluster network to a pod on the host network

The encrypted and unencrypted flows are illustrated in the following diagram:

6.6.4.2. Encryption protocol and IPsec mode

The encrypt cipher used is AES-GCM-16-256. The integrity check value (ICV) is 16 bytes. The key length is 256 bits.

The IPsec mode used is Transport mode, a mode that encrypts end-to-end communication by adding an Encapsulated Security Payload (ESP) header to the IP header of the original packet and encrypts the packet data. OpenShift Container Platform does not currently use or support IPsec Tunnel mode for pod-to-pod communication.

The Cluster Network Operator (CNO) generates a self-signed X.509 certificate authority (CA) that is used by IPsec for encryption. Certificate signing requests (CSRs) from each node are automatically fulfilled by the CNO.

The CA is valid for 10 years. The individual node certificates are valid for 5 years and are automatically rotated after 4 1/2 years elapse.

6.6.5. IPsec encryption for external traffic

OpenShift Container Platform supports the use of IPsec to encrypt traffic destined for external hosts, ensuring confidentiality and integrity of data in transit. This feature relies on X.509 certificates that you must supply.

6.6.5.1. Supported platforms

This feature is supported on the following platforms:

  • Bare metal
  • Google Cloud
  • Red Hat OpenStack Platform (RHOSP)
  • VMware vSphere
Important

If you have Red Hat Enterprise Linux (RHEL) compute nodes, these do not support IPsec encryption for external traffic.

If your cluster uses hosted control planes for Red Hat OpenShift Container Platform, configuring IPsec for encrypting traffic to external hosts is not supported.

6.6.5.2. Limitations

Ensure that the following prohibitions are observed:

  • IPv6 configuration is not currently supported by the NMState Operator when configuring IPsec for external traffic.
  • Certificate common names (CN) in the provided certificate bundle must not begin with the ovs_ prefix, because this naming can conflict with pod-to-pod IPsec CN names in the Network Security Services (NSS) database of each node.

6.6.6. Enabling IPsec encryption

As a cluster administrator, you can enable pod-to-pod IPsec encryption, IPsec encryption between the cluster, and external IPsec endpoints.

You can configure IPsec in either of the following modes:

  • Full: Encryption for pod-to-pod and external traffic
  • External: Encryption for external traffic
Note

If you configure IPsec in Full mode, you must also complete the "Configuring IPsec encryption for external traffic" procedure.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You are logged in to the cluster as a user with cluster-admin privileges.
  • You have reduced the size of your cluster MTU by 46 bytes to allow for the overhead of the IPsec ESP header.

Procedure

  1. To enable IPsec encryption, enter the following command:

    $ oc patch networks.operator.openshift.io cluster --type=merge -p \
      '{
      "spec":{
        "defaultNetwork":{
          "ovnKubernetesConfig":{
            "ipsecConfig":{
              "mode":"<mode"> 
    1
    
            }}}}}'
    Copy to Clipboard Toggle word wrap
    1 1 1
    Specify External to encrypt traffic to external hosts or specify Full to encrypt pod-to-pod traffic and, optionally, traffic to external hosts. By default, IPsec is disabled.
  2. Encrypt external traffic with IPsec by completing the "Configuring IPsec encryption for external traffic" procedure.

Verification

  1. To find the names of the OVN-Kubernetes data plane pods, enter the following command:

    $ oc get pods -n openshift-ovn-kubernetes -l=app=ovnkube-node
    Copy to Clipboard Toggle word wrap

    Example output

    ovnkube-node-5xqbf                       8/8     Running   0              28m
    ovnkube-node-6mwcx                       8/8     Running   0              29m
    ovnkube-node-ck5fr                       8/8     Running   0              31m
    ovnkube-node-fr4ld                       8/8     Running   0              26m
    ovnkube-node-wgs4l                       8/8     Running   0              33m
    ovnkube-node-zfvcl                       8/8     Running   0              34m
    ...
    Copy to Clipboard Toggle word wrap

  2. Verify that you enabled IPsec on your cluster by running the following command:

    Note

    As a cluster administrator, you can verify that you enabled IPsec between pods on your cluster when you configured IPsec in Full mode. This step does not verify whether IPsec is working between your cluster and external hosts.

    $ oc -n openshift-ovn-kubernetes rsh ovnkube-node-<XXXXX> ovn-nbctl --no-leader-only get nb_global . ipsec 
    1
    Copy to Clipboard Toggle word wrap

    where: <XXXXX> specifies the random sequence of letters for a pod from an earlier step.

    Successful output from the command shows the status as true.

As a cluster administrator, to encrypt external traffic with IPsec you must configure IPsec for your network infrastructure, including providing PKCS#12 certificates. Because this procedure uses Butane to create machine configs, you must have the butane tool installed.

Note

After you apply the machine config, the Machine Config Operator (MCO) reboots affected nodes in your cluster to rollout the new machine config.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You have installed the butane tool on your local computer.
  • You have installed the NMState Operator on the cluster.
  • You logged in to the cluster as a user with cluster-admin privileges.
  • You have an existing PKCS#12 certificate for the IPsec endpoint and a CA cert in Privacy Enhanced Mail (PEM) format.
  • You enabled IPsec in either Full or External mode on your cluster.
  • You must set the routingViaHost parameter to true in the ovnKubernetesConfig.gatewayConfig specification of the OVN-Kubernetes network plugin.

Procedure

  1. Create an IPsec configuration with an NMState Operator node network configuration policy. For more information, see Configuring an IPsec based VPN connection by using nmstatectl.

    1. To identify the IP address of the cluster node that is the IPsec endpoint, enter the following command:

      $ oc get nodes
      Copy to Clipboard Toggle word wrap
    2. Create a file named ipsec-config.yaml that has a node network configuration policy for the NMState Operator, such as in the following examples. For an overview about NodeNetworkConfigurationPolicy objects, see The Kubernetes NMState project.

      Example NMState IPsec transport configuration

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: ipsec-config
      spec:
        nodeSelector:
          kubernetes.io/hostname: "<hostname>"
        desiredState:
          interfaces:
          - name: <interface_name>
            type: ipsec
            libreswan:
              left: <cluster_node>
              leftid: '%fromcert'
              leftrsasigkey: '%cert'
              leftcert: left_server
              leftmodecfgclient: false
              right: <external_host>
              rightid: '%fromcert'
              rightrsasigkey: '%cert'
              rightsubnet: <external_address>/32
              ikev2: insist
              type: transport
      Copy to Clipboard Toggle word wrap

      where:

      kubernetes.io/hostname
      Specifies the hostname to apply the policy to. This host serves as the left side host in the IPsec configuration.
      name
      Specifies the name of the interface to create on the host.
      left
      Specifies the hostname of the cluster node that terminates the IPsec tunnel on the cluster side. The name must match the SAN [Subject Alternate Name] from your supplied PKCS#12 certificates.
      right
      Specifies the external hostname, such as host.example.com. The name should match the SAN [Subject Alternate Name] from your supplied PKCS#12 certificates.
      rightsubnet

      Specifies the IP address of the external host, such as 10.1.2.3/32.

      Example NMState IPsec tunnel configuration

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: ipsec-config
      spec:
        nodeSelector:
          kubernetes.io/hostname: "<hostname>"
        desiredState:
          interfaces:
          - name: <interface_name>
            type: ipsec
            libreswan:
              left: <cluster_node>
              leftid: '%fromcert'
              leftmodecfgclient: false
              leftrsasigkey: '%cert'
              leftcert: left_server
              right: <external_host>
              rightid: '%fromcert'
              rightrsasigkey: '%cert'
              rightsubnet: <external_address>/32
              ikev2: insist
              type: tunnel
      Copy to Clipboard Toggle word wrap

    3. To configure the IPsec interface, enter the following command:

      $ oc create -f ipsec-config.yaml
      Copy to Clipboard Toggle word wrap
  2. Give the following certificate files to add to the Network Security Services (NSS) database on each host. These files are imported as part of the Butane configuration in the next steps.

    • left_server.p12: The certificate bundle for the IPsec endpoints
    • ca.pem: The certificate authority that you signed your certificates with
  3. Create a machine config to add your certificates to the cluster.
  4. Read the password from a mounted secret file:

    $ password=$(cat run/secrets/<left_server_password>)
    Copy to Clipboard Toggle word wrap
    • left_server_password:: The name of the file that contains the password. This file exists in the mounted secret.
  5. Use the pk12util tool, which comes prepackaged with Red Hat Enterprise Linux (RHEL), to specify a password that protects PKCS#12 files by entering the following command. Ensure that you replace the <password> value with your password.

    $ pk12util -W "<password>" -i /etc/pki/certs/left_server.p12 -d /var/lib/ipsec/nss/
    Copy to Clipboard Toggle word wrap
  6. To create Butane config files for the control plane and compute nodes, enter the following command:

    Note

    The Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in 0. For example, 4.16.0. See "Creating machine configs with Butane" for information about Butane.

    $ for role in master worker; do
      cat >> "99-ipsec-${role}-endpoint-config.bu" <<-EOF
      variant: openshift
      version: 4.16.0
      metadata:
        name: 99-${role}-import-certs
        labels:
          machineconfiguration.openshift.io/role: $role
      systemd:
        units:
        - name: ipsec-import.service
          enabled: true
          contents: |
            [Unit]
            Description=Import external certs into ipsec NSS
            Before=ipsec.service
    
            [Service]
            Type=oneshot
            ExecStart=/usr/local/bin/ipsec-addcert.sh
            RemainAfterExit=false
            StandardOutput=journal
    
            [Install]
            WantedBy=multi-user.target
      storage:
        files:
        - path: /etc/pki/certs/ca.pem
          mode: 0400
          overwrite: true
          contents:
            local: ca.pem
        - path: /etc/pki/certs/left_server.p12
          mode: 0400
          overwrite: true
          contents:
            local: left_server.p12
        - path: /usr/local/bin/ipsec-addcert.sh
          mode: 0740
          overwrite: true
          contents:
            inline: |
              #!/bin/bash -e
              echo "importing cert to NSS"
              certutil -A -n "CA" -t "CT,C,C" -d /var/lib/ipsec/nss/ -i /etc/pki/certs/ca.pem
              pk12util -W "" -i /etc/pki/certs/left_server.p12 -d /var/lib/ipsec/nss/
              certutil -M -n "left_server" -t "u,u,u" -d /var/lib/ipsec/nss/
    EOF
    done
    Copy to Clipboard Toggle word wrap
  7. To transform the Butane files that you created in the earlier step into machine configs, enter the following command:

    $ for role in master worker; do
      butane -d . 99-ipsec-${role}-endpoint-config.bu -o ./99-ipsec-$role-endpoint-config.yaml
    done
    Copy to Clipboard Toggle word wrap
  8. To apply the machine configs to your cluster, enter the following command:

    $ for role in master worker; do
      oc apply -f 99-ipsec-${role}-endpoint-config.yaml
    done
    Copy to Clipboard Toggle word wrap
    Important

    As the Machine Config Operator (MCO) updates machines in each machine config pool, it reboots each node one by one. You must wait for all the nodes to update before external IPsec connectivity is available.

Verification

  1. Check the machine config pool status by entering the following command:

    $ oc get mcp
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  2. To confirm that IPsec machine configs rolled out successfully, enter the following commands:

    1. Confirm the creation of the IPsec machine configs:

      $ oc get mc | grep ipsec
      Copy to Clipboard Toggle word wrap

      Example output

      80-ipsec-master-extensions        3.2.0        6d15h
      80-ipsec-worker-extensions        3.2.0        6d15h
      Copy to Clipboard Toggle word wrap

    2. Confirm you have applied the IPsec extension to control plane nodes:

      $ oc get mcp master -o yaml | grep 80-ipsec-master-extensions -c
      Copy to Clipboard Toggle word wrap
    3. Confirm the application of the IPsec extension to compute nodes. Example output would show 2.

      $ oc get mcp worker -o yaml | grep 80-ipsec-worker-extensions -c
      Copy to Clipboard Toggle word wrap

As a cluster administrator, you can remove an existing IPsec tunnel to an external host.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You are logged in to the cluster as a user with cluster-admin privileges.
  • You enabled IPsec in either Full or External mode on your cluster.

Procedure

  1. Create a file named remove-ipsec-tunnel.yaml with the following YAML:

    kind: NodeNetworkConfigurationPolicy
    apiVersion: nmstate.io/v1
    metadata:
      name: <name>
    spec:
      nodeSelector:
        kubernetes.io/hostname: <node_name>
      desiredState:
        interfaces:
        - name: <tunnel_name>
          type: ipsec
          state: absent
    Copy to Clipboard Toggle word wrap

    where:

    name
    Specifies a name for the node network configuration policy.
    node_name
    Specifies the name of the node where the IPsec tunnel that you want to remove exists.
    tunnel_name
    Specifies the interface name for the existing IPsec tunnel.
  2. To remove the IPsec tunnel, enter the following command:

    $ oc apply -f remove-ipsec-tunnel.yaml
    Copy to Clipboard Toggle word wrap

6.6.10. Disabling IPsec encryption

As a cluster administrator, you can disable IPsec encryption.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  1. To disable IPsec encryption, enter the following command:

    $ oc patch networks.operator.openshift.io cluster --type=merge \
    -p '{
      "spec":{
        "defaultNetwork":{
          "ovnKubernetesConfig":{
            "ipsecConfig":{
              "mode":"Disabled"
            }}}}}'
    Copy to Clipboard Toggle word wrap
  2. Optional: You can increase the size of your cluster MTU by 46 bytes because there is no longer any overhead from the IPsec ESP header in IP packets.

6.7. Zero trust networking

Zero trust is an approach to designing security architectures based on the premise that every interaction begins in an untrusted state. This contrasts with traditional architectures, which might determine trustworthiness based on whether communication starts inside a firewall. More specifically, zero trust attempts to close gaps in security architectures that rely on implicit trust models and one-time authentication.

OpenShift Container Platform can add some zero trust networking capabilities to containers running on the platform without requiring changes to the containers or the software running in them. There are also several products that Red Hat offers that can further augment the zero trust networking capabilities of containers. If you have the ability to change the software running in the containers, then there are other projects that Red Hat supports that can add further capabilities.

Explore the following targeted capabilities of zero trust networking.

6.7.1. Root of trust

Public certificates and private keys are critical to zero trust networking. These are used to identify components to one another, authenticate, and to secure traffic. The certificates are signed by other certificates, and there is a chain of trust to a root certificate authority (CA). Everything participating in the network needs to ultimately have the public key for a root CA so that it can validate the chain of trust. For public-facing things, these are usually the set of root CAs that are globally known, and whose keys are distributed with operating systems, web browsers, and so on. However, it is possible to run a private CA for a cluster or a corporation if the certificate of the private CA is distributed to all parties.

Leverage:

  • OpenShift Container Platform: OpenShift creates a cluster CA at installation that is used to secure the cluster resources. However, OpenShift Container Platform can also create and sign certificates for services in the cluster, and can inject the cluster CA bundle into a pod if requested. Service certificates created and signed by OpenShift Container Platform have a 26-month time to live (TTL) and are rotated automatically at 13 months. They can also be rotated manually if necessary.
  • OpenShift cert-manager Operator: cert-manager allows you to request keys that are signed by an external root of trust. There are many configurable issuers to integrate with external issuers, along with ways to run with a delegated signing certificate. The cert-manager API can be used by other software in zero trust networking to request the necessary certificates (for example, Red Hat OpenShift Service Mesh), or can be used directly by customer software.

6.7.2. Traffic authentication and encryption

Ensure that all traffic on the wire is encrypted and the endpoints are identifiable. An example of this is Mutual TLS, or mTLS, which is a method for mutual authentication.

Leverage:

  • OpenShift Container Platform: With transparent pod-to-pod IPsec, the source and destination of the traffic can be identified by the IP address. There is the capability for egress traffic to be encrypted using IPsec. By using the egress IP feature, the source IP address of the traffic can be used to identify the source of the traffic inside the cluster.
  • Red Hat OpenShift Service Mesh: Provides powerful mTLS capabilities that can transparently augment traffic leaving a pod to provide authentication and encryption.
  • OpenShift cert-manager Operator: Use custom resource definitions (CRDs) to request certificates that can be mounted for your programs to use for SSL/TLS protocols.

6.7.3. Identification and authentication

After you have the ability to mint certificates using a CA, you can use it to establish trust relationships by verification of the identity of the other end of a connection — either a user or a client machine. This also requires management of certificate lifecycles to limit use if compromised.

Leverage:

  • OpenShift Container Platform: Cluster-signed service certificates to ensure that a client is talking to a trusted endpoint. This requires that the service uses SSL/TLS and that the client uses the cluster CA. The client identity must be provided using some other means.
  • Red Hat Single Sign-On: Provides request authentication integration with enterprise user directories or third-party identity providers.
  • Red Hat OpenShift Service Mesh: Transparent upgrade of connections to mTLS, auto-rotation, custom certificate expiration, and request authentication with JSON web token (JWT).
  • OpenShift cert-manager Operator: Creation and management of certificates for use by your application. Certificates can be controlled by CRDs and mounted as secrets, or your application can be changed to interact directly with the cert-manager API.

6.7.4. Inter-service authorization

It is critical to be able to control access to services based on the identity of the requester. This is done by the platform and does not require each application to implement it. That allows better auditing and inspection of the policies.

Leverage:

6.7.5. Transaction-level verification

In addition to the ability to identify and authenticate connections, it is also useful to control access to individual transactions. This can include rate-limiting by source, observability, and semantic validation that a transaction is well formed.

Leverage:

6.7.6. Risk assessment

As the number of security policies in a cluster increase, visualization of what the policies allow and deny becomes increasingly important. These tools make it easier to create, visualize, and manage cluster security policies.

Leverage:

After deploying applications on a cluster, it becomes challenging to manage all of the objects that make up the security rules. It becomes critical to be able to apply site-wide policies and audit the deployed objects for compliance with the policies. This should allow for delegation of some permissions to users and cluster administrators within defined bounds, and should allow for exceptions to the policies if necessary.

Leverage:

After you have a running cluster, you want to be able to observe the traffic and verify that the traffic comports with the defined rules. This is important for intrusion detection, forensics, and is helpful for operational load management.

Leverage:

6.7.9. Endpoint security

It is important to be able to trust that the software running the services in your cluster has not been compromised. For example, you might need to ensure that certified images are run on trusted hardware, and have policies to only allow connections to or from an endpoint based on endpoint characteristics.

Leverage:

  • OpenShift Container Platform: Secureboot can ensure that the nodes in the cluster are running trusted software, so the platform itself (including the container runtime) have not been tampered with. You can configure OpenShift Container Platform to only run images that have been signed by certain signatures.
  • Red Hat Trusted Artifact Signer: This can be used in a trusted build chain and produce signed container images.

6.7.10. Extending trust outside of the cluster

You might want to extend trust outside of the cluster by allowing a cluster to mint CAs for a subdomain. Alternatively, you might want to attest to workload identity in the cluster to a remote endpoint.

Leverage:

  • OpenShift cert-manager Operator: You can use cert-manager to manage delegated CAs so that you can distribute trust across different clusters, or through your organization.
  • Red Hat OpenShift Service Mesh: Can use SPIFFE to provide remote attestation of workloads to endpoints running in remote or local clusters.

As a cluster administrator, when you create an Ingress Controller, the Operator manages the DNS records automatically. This has some limitations when the required DNS zone is different from the cluster DNS zone or when the DNS zone is hosted outside the cloud provider.

As a cluster administrator, you can configure an Ingress Controller to stop automatic DNS management and start manual DNS management. Set dnsManagementPolicy to specify when it should be automatically or manually managed.

When you change an Ingress Controller from Managed to Unmanaged DNS management policy, the Operator does not clean up the previous wildcard DNS record provisioned on the cloud. When you change an Ingress Controller from Unmanaged to Managed DNS management policy, the Operator attempts to create the DNS record on the cloud provider if it does not exist or updates the DNS record if it already exists.

Important

When you set dnsManagementPolicy to unmanaged, you have to manually manage the lifecycle of the wildcard DNS record on the cloud provider.

7.1. Managed DNS management policy

The Managed DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is automatically managed by the Operator.

7.2. Unmanaged DNS management policy

The Unmanaged DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is not automatically managed, instead it becomes the responsibility of the cluster administrator.

Note

On the AWS cloud platform, if the domain on the Ingress Controller does not match with dnsConfig.Spec.BaseDomain then the DNS management policy is automatically set to Unmanaged.

As a cluster administrator, you can create a new custom Ingress Controller with the Unmanaged DNS management policy.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a custom resource (CR) file named sample-ingress.yaml containing the following:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: <name>
    spec:
      domain: <domain>
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: External
          dnsManagementPolicy: Unmanaged
    Copy to Clipboard Toggle word wrap

    where:

    metadata.name
    Specify the <name> with a name for the IngressController object.
    spec.domain
    Specify the domain based on the DNS record that was created as a prerequisite.
    loadBalancer.scope
    Specify the scope as External to expose the load balancer externally. loadBalancer.dnsManagementPolicy: Specifies if the Ingress Controller is managing the lifecycle of the wildcard DNS record associated with the load balancer. The valid values are Managed and Unmanaged. The default value is Managed.
  2. Save the file to apply the changes.

    oc apply -f <name>.yaml 
    1
    Copy to Clipboard Toggle word wrap

    Inspect the output and confirm that dnsManagementPolicy is set to Unmanaged.

7.4. Modifying an existing Ingress Controller

As a cluster administrator, you can modify an existing Ingress Controller to manually manage the DNS record lifecycle.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Modify the chosen IngressController to set dnsManagementPolicy:

    SCOPE=$(oc -n openshift-ingress-operator get ingresscontroller <name> -o=jsonpath="{.status.endpointPublishingStrategy.loadBalancer.scope}")
    Copy to Clipboard Toggle word wrap
    oc -n openshift-ingress-operator patch ingresscontrollers/<name> --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"dnsManagementPolicy":"Unmanaged", "scope":"${SCOPE}"}}}}'
    Copy to Clipboard Toggle word wrap
  2. Optional: You can delete the associated DNS record in the cloud provider.

Chapter 8. Verifying connectivity to an endpoint

The Cluster Network Operator (CNO) runs a controller, the connectivity check controller, that performs a connection health check between resources within your cluster. By reviewing the results of the health checks, you can diagnose connection problems or eliminate network connectivity as the cause of an issue that you are investigating.

8.1. Connection health checks that are performed

To verify that cluster resources are reachable, a TCP connection is made to each of the following cluster API services:

  • Kubernetes API server service
  • Kubernetes API server endpoints
  • OpenShift API server service
  • OpenShift API server endpoints
  • Load balancers

To verify that services and service endpoints are reachable on every node in the cluster, a TCP connection is made to each of the following targets:

  • Health check target service
  • Health check target endpoints

8.2. Implementation of connection health checks

The connectivity check controller orchestrates connection verification checks in your cluster. The results for the connection tests are stored in PodNetworkConnectivity objects in the openshift-network-diagnostics namespace. Connection tests are performed every minute in parallel.

The Cluster Network Operator (CNO) deploys several resources to the cluster to send and receive connectivity health checks:

Health check source
This program deploys in a single pod replica set managed by a Deployment object. The program consumes PodNetworkConnectivity objects and connects to the spec.targetEndpoint specified in each object.
Health check target
A pod deployed as part of a daemon set on every node in the cluster. The pod listens for inbound health checks. The presence of this pod on every node allows for the testing of connectivity to each node.

You can configure the nodes which network connectivity sources and targets run on with a node selector. Additionally, you can specify permissible tolerations for source and target pods. The configuration is defined in the singleton cluster custom resource of the Network API in the config.openshift.io/v1 API group.

Pod scheduling occurs after you have updated the configuration. Therefore, you must apply node labels that you intend to use in your selectors before updating the configuration. Labels applied after updating your network connectivity check pod placement are ignored.

Refer to the default configuration in the following YAML:

Default configuration for connectivity source and target pods

apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  # ...
    networkDiagnostics: 
1

      mode: "All" 
2

      sourcePlacement: 
3

        nodeSelector:
          checkNodes: groupA
        tolerations:
        - key: myTaint
          effect: NoSchedule
          operator: Exists
      targetPlacement: 
4

        nodeSelector:
          checkNodes: groupB
        tolerations:
        - key: myOtherTaint
          effect: NoExecute
          operator: Exists
Copy to Clipboard Toggle word wrap

1 1 1
Specifies the network diagnostics configuration. If a value is not specified or an empty object is specified, and spec.disableNetworkDiagnostics=true is set in the network.operator.openshift.io custom resource named cluster, network diagnostics are disabled. If set, this value overrides spec.disableNetworkDiagnostics=true.
2
Specifies the diagnostics mode. The value can be the empty string, All, or Disabled. The empty string is equivalent to specifying All.
3
Optional: Specifies a selector for connectivity check source pods. You can use the nodeSelector and tolerations fields to further specify the sourceNode pods. These are optional for both source and target pods. You can omit them, use both, or use only one of them.
4
Optional: Specifies a selector for connectivity check target pods. You can use the nodeSelector and tolerations fields to further specify the targetNode pods. These are optional for both source and target pods. You can omit them, use both, or use only one of them.

8.3. Configuring pod connectivity check placement

As a cluster administrator, you can configure which nodes the connectivity check pods run by modifying the network.config.openshift.io object named cluster.

Prerequisites

  • Install the OpenShift CLI (oc).

Procedure

  1. Edit the connectivity check configuration by entering the following command:

    $ oc edit network.config.openshift.io cluster
    Copy to Clipboard Toggle word wrap
  2. In the text editor, update the networkDiagnostics stanza to specify the node selectors that you want for the source and target pods.
  3. Save your changes and exit the text editor.

Verification

  • Verify that the source and target pods are running on the intended nodes by entering the following command:
$ oc get pods -n openshift-network-diagnostics -o wide
Copy to Clipboard Toggle word wrap

Example output

NAME                                    READY   STATUS    RESTARTS   AGE     IP           NODE                                        NOMINATED NODE   READINESS GATES
network-check-source-84c69dbd6b-p8f7n   1/1     Running   0          9h      10.131.0.8   ip-10-0-40-197.us-east-2.compute.internal   <none>           <none>
network-check-target-46pct              1/1     Running   0          9h      10.131.0.6   ip-10-0-40-197.us-east-2.compute.internal   <none>           <none>
network-check-target-8kwgf              1/1     Running   0          9h      10.128.2.4   ip-10-0-95-74.us-east-2.compute.internal    <none>           <none>
network-check-target-jc6n7              1/1     Running   0          9h      10.129.2.4   ip-10-0-21-151.us-east-2.compute.internal   <none>           <none>
network-check-target-lvwnn              1/1     Running   0          9h      10.128.0.7   ip-10-0-17-129.us-east-2.compute.internal   <none>           <none>
network-check-target-nslvj              1/1     Running   0          9h      10.130.0.7   ip-10-0-89-148.us-east-2.compute.internal   <none>           <none>
network-check-target-z2sfx              1/1     Running   0          9h      10.129.0.4   ip-10-0-60-253.us-east-2.compute.internal   <none>           <none>
Copy to Clipboard Toggle word wrap

8.4. PodNetworkConnectivityCheck object fields

The PodNetworkConnectivityCheck object fields are described in the following tables.

Expand
Table 8.1. PodNetworkConnectivityCheck object fields
FieldTypeDescription

metadata.name

string

The name of the object in the following format: <source>-to-<target>. The destination described by <target> includes one of following strings:

  • load-balancer-api-external
  • load-balancer-api-internal
  • kubernetes-apiserver-endpoint
  • kubernetes-apiserver-service-cluster
  • network-check-target
  • openshift-apiserver-endpoint
  • openshift-apiserver-service-cluster

metadata.namespace

string

The namespace that the object is associated with. This value is always openshift-network-diagnostics.

spec.sourcePod

string

The name of the pod where the connection check originates, such as network-check-source-596b4c6566-rgh92.

spec.targetEndpoint

string

The target of the connection check, such as api.devcluster.example.com:6443.

spec.tlsClientCert

object

Configuration for the TLS certificate to use.

spec.tlsClientCert.name

string

The name of the TLS certificate used, if any. The default value is an empty string.

status

object

An object representing the condition of the connection test and logs of recent connection successes and failures.

status.conditions

array

The latest status of the connection check and any previous statuses.

status.failures

array

Connection test logs from unsuccessful attempts.

status.outages

array

Connect test logs covering the time periods of any outages.

status.successes

array

Connection test logs from successful attempts.

The following table describes the fields for objects in the status.conditions array:

Expand
Table 8.2. status.conditions
FieldTypeDescription

lastTransitionTime

string

The time that the condition of the connection transitioned from one status to another.

message

string

The details about last transition in a human readable format.

reason

string

The last status of the transition in a machine readable format.

status

string

The status of the condition.

type

string

The type of the condition.

The following table describes the fields for objects in the status.conditions array:

Expand
Table 8.3. status.outages
FieldTypeDescription

end

string

The timestamp from when the connection failure is resolved.

endLogs

array

Connection log entries, including the log entry related to the successful end of the outage.

message

string

A summary of outage details in a human readable format.

start

string

The timestamp from when the connection failure is first detected.

startLogs

array

Connection log entries, including the original failure.

8.4.1. Connection log fields

The fields for a connection log entry are described in the following table. The object is used in the following fields:

  • status.failures[]
  • status.successes[]
  • status.outages[].startLogs[]
  • status.outages[].endLogs[]
Expand
Table 8.4. Connection log object
FieldTypeDescription

latency

string

Records the duration of the action.

message

string

Provides the status in a human readable format.

reason

string

Provides the reason for status in a machine readable format. The value is one of TCPConnect, TCPConnectError, DNSResolve, DNSError.

success

boolean

Indicates if the log entry is a success or failure.

time

string

The start time of connection check.

As a cluster administrator, you can verify the connectivity of an endpoint, such as an API server, load balancer, service, or pod, and verify that network diagnostics is enabled.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Confirm that network diagnostics are enable by entering the following command:

    $ oc get network.config.openshift.io cluster -o yaml
    Copy to Clipboard Toggle word wrap

    Example output

      # ...
      status:
      # ...
        conditions:
        - lastTransitionTime: "2024-05-27T08:28:39Z"
          message: ""
          reason: AsExpected
          status: "True"
          type: NetworkDiagnosticsAvailable
    Copy to Clipboard Toggle word wrap

  2. List the current PodNetworkConnectivityCheck objects by entering the following command:

    $ oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                                                                                                                AGE
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0   75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1   73m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2   75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-service-cluster                               75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-default-service-cluster                                 75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-external                                         75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-internal                                         75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-0            75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-1            75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-2            75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh      74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-c-n8mbf      74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-d-4hnrz      74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-service-cluster                               75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0    75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1    75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2    74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-service-cluster                                75m
    Copy to Clipboard Toggle word wrap

  3. View the connection test logs:

    1. From the output of the previous command, identify the endpoint that you want to review the connectivity logs for.
    2. View the object by entering the following command:

      $ oc get podnetworkconnectivitycheck <name> \
        -n openshift-network-diagnostics -o yaml
      Copy to Clipboard Toggle word wrap

      where <name> specifies the name of the PodNetworkConnectivityCheck object.

      Example output

      apiVersion: controlplane.operator.openshift.io/v1alpha1
      kind: PodNetworkConnectivityCheck
      metadata:
        name: network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0
        namespace: openshift-network-diagnostics
        ...
      spec:
        sourcePod: network-check-source-7c88f6d9f-hmg2f
        targetEndpoint: 10.0.0.4:6443
        tlsClientCert:
          name: ""
      status:
        conditions:
        - lastTransitionTime: "2021-01-13T20:11:34Z"
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnectSuccess
          status: "True"
          type: Reachable
        failures:
        - latency: 2.241775ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed
            to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect:
            connection refused'
          reason: TCPConnectError
          success: false
          time: "2021-01-13T20:10:34Z"
        - latency: 2.582129ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed
            to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect:
            connection refused'
          reason: TCPConnectError
          success: false
          time: "2021-01-13T20:09:34Z"
        - latency: 3.483578ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed
            to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect:
            connection refused'
          reason: TCPConnectError
          success: false
          time: "2021-01-13T20:08:34Z"
        outages:
        - end: "2021-01-13T20:11:34Z"
          endLogs:
          - latency: 2.032018ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              tcp connection to 10.0.0.4:6443 succeeded'
            reason: TCPConnect
            success: true
            time: "2021-01-13T20:11:34Z"
          - latency: 2.241775ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:10:34Z"
          - latency: 2.582129ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:09:34Z"
          - latency: 3.483578ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:08:34Z"
          message: Connectivity restored after 2m59.999789186s
          start: "2021-01-13T20:08:34Z"
          startLogs:
          - latency: 3.483578ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:08:34Z"
        successes:
        - latency: 2.845865ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:14:34Z"
        - latency: 2.926345ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:13:34Z"
        - latency: 2.895796ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:12:34Z"
        - latency: 2.696844ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:11:34Z"
        - latency: 1.502064ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:10:34Z"
        - latency: 1.388857ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:09:34Z"
        - latency: 1.906383ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:08:34Z"
        - latency: 2.089073ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:07:34Z"
        - latency: 2.156994ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:06:34Z"
        - latency: 1.777043ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:05:34Z"
      Copy to Clipboard Toggle word wrap

As a cluster administrator, you can change the maximum transmission unit (MTU) for the cluster network after cluster installation. This change is disruptive as cluster nodes must be rebooted to finalize the MTU change.

You can change the MTU only for clusters that use the OVN-Kubernetes plugin or the OpenShift SDN network plugin.

9.1. About the cluster MTU

During installation, the cluster network maximum transmission unit (MTU) is set automatically based on the primary network interface MTU of cluster nodes. Typically, you do not need to override the detected MTU, but in some instances you must override it.

You might want to change the MTU of the cluster network for one of the following reasons:

  • The MTU detected during cluster installation is not correct for your infrastructure.
  • Your cluster infrastructure now requires a different MTU, such as from the addition of nodes that need a different MTU for optimal performance.

Only the OVN-Kubernetes network plugin supports changing the MTU value.

9.1.1. Service interruption considerations

When you initiate a maximum transmission unit (MTU) change on your cluster the following effects might impact service availability:

  • At least two rolling reboots are required to complete the migration to a new MTU. During this time, some nodes are not available as they restart.
  • Specific applications deployed to the cluster with shorter timeout intervals than the absolute TCP timeout interval might experience disruption during the MTU change.

9.1.2. MTU value selection

When planning your maximum transmission unit (MTU) migration there are two related but distinct MTU values to consider.

  • Hardware MTU: This MTU value is set based on the specifics of your network infrastructure.
  • Cluster network MTU: This MTU value is always less than your hardware MTU to account for the cluster network overlay overhead. The specific overhead is determined by your network plugin. For OVN-Kubernetes, the overhead is 100 bytes. For OpenShift SDN, the overhead is 50 bytes.

If your cluster requires different MTU values for different nodes, you must subtract the overhead value for your network plugin from the lowest MTU value that is used by any node in your cluster. For example, if some nodes in your cluster have an MTU of 9001, and some have an MTU of 1500, you must set this value to 1400.

Important

To avoid selecting an MTU value that is not acceptable by a node, verify the maximum MTU value (maxmtu) that is accepted by the network interface by using the ip -d link command.

9.1.3. How the migration process works

The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.

Expand
Table 9.1. Live migration of the cluster MTU
User-initiated stepsOpenShift Container Platform activity

Set the following values in the Cluster Network Operator configuration:

  • spec.migration.mtu.machine.to
  • spec.migration.mtu.network.from
  • spec.migration.mtu.network.to

Cluster Network Operator (CNO): Confirms that each field is set to a valid value.

  • The mtu.machine.to must be set to either the new hardware MTU or to the current hardware MTU if the MTU for the hardware is not changing. This value is transient and is used as part of the migration process. If you set a hardware MTU different from the current value, you must manually configure it to persist. Use methods such as a machine config, DHCP setting, or kernel command line.
  • The mtu.network.from field must equal the network.status.clusterNetworkMTU field, which is the current MTU of the cluster network.
  • The mtu.network.to field must be set to the target cluster network MTU and must be lower than the hardware MTU to allow for the overlay overhead of the network plugin. The overhead for OVN-Kubernetes is 100 bytes and for OpenShift SDN is 50 bytes.

If the values provided are valid, the CNO writes out a new temporary configuration with the MTU for the cluster network set to the value of the mtu.network.to field.

Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster.

Reconfigure the MTU of the primary network interface for the nodes on the cluster. You can use one of the following methods to accomplish this:

  • Deploying a new NetworkManager connection profile with the MTU change
  • Changing the MTU through a DHCP server setting
  • Changing the MTU through boot parameters

N/A

Set the mtu value in the CNO configuration for the network plugin and set spec.migration to null.

Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster with the new MTU configuration.

9.2. Changing the cluster network MTU

As a cluster administrator, you can increase or decrease the maximum transmission unit (MTU) for your cluster.

Important

You cannot roll back an MTU value for nodes during the MTU migration process, but you can roll back the value after the MTU migration process completes.

The migration is disruptive and nodes in your cluster might be temporarily unavailable as the MTU update takes effect.

The following procedures describe how to change the cluster network MTU by using machine configs, Dynamic Host Configuration Protocol (DHCP), or an ISO image. If you use either the DHCP or ISO approaches, you must refer to configuration artifacts that you kept after installing your cluster to complete the procedure.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have access to the cluster using an account with cluster-admin permissions.
  • You have identified the target MTU for your cluster.

    • The MTU for the OVN-Kubernetes network plugin must be set to 100 less than the lowest hardware MTU value in your cluster.
    • The MTU for the OpenShift SDN network plugin must be set to 50 less than the lowest hardware MTU value in your cluster.
  • If your nodes are physical machines, ensure that the cluster network and the connected network switches support jumbo frames.
  • If your nodes are virtual machines (VMs), ensure that the hypervisor and the connected network switches support jumbo frames.

Procedure

  1. To obtain the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster
    Copy to Clipboard Toggle word wrap

    Example output

    ...
    Status:
      Cluster Network:
        Cidr:               10.217.0.0/22
        Host Prefix:        23
      Cluster Network MTU:  1400
      Network Type:         OVNKubernetes
      Service Network:
        10.217.4.0/23
    ...
    Copy to Clipboard Toggle word wrap

  2. Prepare your configuration for the hardware MTU:

    • If your hardware MTU is specified with DHCP, update your DHCP configuration such as with the following dnsmasq configuration:

      dhcp-option-force=26,<mtu>
      Copy to Clipboard Toggle word wrap

      where:

      <mtu>
      Specifies the hardware MTU for the DHCP server to advertise.
    • If your hardware MTU is specified with a kernel command line with PXE, update that configuration accordingly.
    • If your hardware MTU is specified in a NetworkManager connection configuration, complete the following steps. This approach is the default for OpenShift Container Platform if you do not explicitly specify your network configuration with DHCP, a kernel command line, or some other method. Your cluster nodes must all use the same underlying network configuration for the following procedure to work unmodified.

      1. Find the primary network interface:

        • If you are using the OpenShift SDN network plugin, enter the following command:

          $ oc debug node/<node_name> -- chroot /host ip route list match 0.0.0.0/0 | awk '{print $5 }'
          Copy to Clipboard Toggle word wrap

          where:

          <node_name>
          Specifies the name of a node in your cluster.
        • If you are using the OVN-Kubernetes network plugin, enter the following command:

          $ oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0
          Copy to Clipboard Toggle word wrap

          where:

          <node_name>
          Specifies the name of a node in your cluster.
      2. Create the following NetworkManager configuration in the <interface>-mtu.conf file:

        Example NetworkManager connection configuration

        [connection-<interface>-mtu]
        match-device=interface-name:<interface>
        ethernet.mtu=<mtu>
        Copy to Clipboard Toggle word wrap

        where:

        <mtu>
        Specifies the new hardware MTU value.
        <interface>
        Specifies the primary network interface name.
      3. Create two MachineConfig objects, one for the control plane nodes and another for the worker nodes in your cluster:

        1. Create the following Butane config in the control-plane-interface.bu file:

          Note

          The Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in 0. For example, 4.16.0. See "Creating machine configs with Butane" for information about Butane.

          variant: openshift
          version: 4.16.0
          metadata:
            name: 01-control-plane-interface
            labels:
              machineconfiguration.openshift.io/role: master
          storage:
            files:
              - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 
          1
          
                contents:
                  local: <interface>-mtu.conf 
          2
          
                mode: 0600
          Copy to Clipboard Toggle word wrap
          1
          Specify the NetworkManager connection name for the primary network interface.
          2
          Specify the local filename for the updated NetworkManager configuration file from the previous step.
        2. Create the following Butane config in the worker-interface.bu file:

          Note

          The Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in 0. For example, 4.16.0. See "Creating machine configs with Butane" for information about Butane.

          variant: openshift
          version: 4.16.0
          metadata:
            name: 01-worker-interface
            labels:
              machineconfiguration.openshift.io/role: worker
          storage:
            files:
              - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 
          1
          
                contents:
                  local: <interface>-mtu.conf 
          2
          
                mode: 0600
          Copy to Clipboard Toggle word wrap
          1
          Specify the NetworkManager connection name for the primary network interface.
          2
          Specify the local filename for the updated NetworkManager configuration file from the previous step.
        3. Create MachineConfig objects from the Butane configs by running the following command:

          $ for manifest in control-plane-interface worker-interface; do
              butane --files-dir . $manifest.bu > $manifest.yaml
            done
          Copy to Clipboard Toggle word wrap
          Warning

          Do not apply these machine configs until explicitly instructed later in this procedure. Applying these machine configs now causes a loss of stability for the cluster.

  3. To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'
    Copy to Clipboard Toggle word wrap

    where:

    <overlay_from>
    Specifies the current cluster network MTU value.
    <overlay_to>
    Specifies the target MTU for the cluster network. This value is set relative to the value of <machine_to>. For OVN-Kubernetes, this value must be 100 less than the value of <machine_to>. For OpenShift SDN, this value must be 50 less than the value of <machine_to>.
    <machine_to>
    Specifies the MTU for the primary network interface on the underlying host network.

    Example that increases the cluster MTU

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'
    Copy to Clipboard Toggle word wrap

  4. As the Machine Config Operator updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get machineconfigpools
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the Machine Config Operator updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  5. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"
      Copy to Clipboard Toggle word wrap

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done
      Copy to Clipboard Toggle word wrap

    2. Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    3. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart
      Copy to Clipboard Toggle word wrap

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      ExecStart=/usr/local/bin/mtu-migration.sh
      Copy to Clipboard Toggle word wrap
  6. Update the underlying network interface MTU value:

    • If you are specifying the new MTU with a NetworkManager connection configuration, enter the following command. The MachineConfig Operator automatically performs a rolling reboot of the nodes in your cluster.

      $ for manifest in control-plane-interface worker-interface; do
          oc create -f $manifest.yaml
        done
      Copy to Clipboard Toggle word wrap
    • If you are specifying the new MTU with a DHCP server option or a kernel command line and PXE, make the necessary changes for your infrastructure.
  7. As the Machine Config Operator updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get machineconfigpools
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the Machine Config Operator updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  8. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"
      Copy to Clipboard Toggle word wrap

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done
      Copy to Clipboard Toggle word wrap

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep path:
      Copy to Clipboard Toggle word wrap

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      If the machine config is successfully deployed, the previous output contains the /etc/NetworkManager/conf.d/99-<interface>-mtu.conf file path and the ExecStart=/usr/local/bin/mtu-migration.sh line.

  9. Finalize the MTU migration for your plugin. In both example commands, <mtu> specifies the new cluster network MTU that you specified with <overlay_to>.

    • To finalize the MTU migration, enter the following command for the OVN-Kubernetes network plugin:

      $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
        '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'
      Copy to Clipboard Toggle word wrap
    • To finalize the MTU migration, enter the following command for the OpenShift SDN network plugin:

      $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
        '{"spec": { "migration": null, "defaultNetwork":{ "openshiftSDNConfig": { "mtu": <mtu> }}}}'
      Copy to Clipboard Toggle word wrap
  10. After finalizing the MTU migration, each machine config pool node is rebooted one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get machineconfigpools
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

Verification

  1. To get the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster
    Copy to Clipboard Toggle word wrap
  2. Get the current MTU for the primary network interface of a node:

    1. To list the nodes in your cluster, enter the following command:

      $ oc get nodes
      Copy to Clipboard Toggle word wrap
    2. To obtain the current MTU setting for the primary network interface on a node, enter the following command:

      $ oc debug node/<node> -- chroot /host ip address show <interface>
      Copy to Clipboard Toggle word wrap

      where:

      <node>
      Specifies a node from the output from the previous step.
      <interface>
      Specifies the primary network interface name for the node.

      Example output

      ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051
      Copy to Clipboard Toggle word wrap

9.2.1. Checking the current cluster MTU value

To ensure network stability and performance in a hybrid environment where part of your cluster is in the cloud and part is an on-premise environment, you can obtain the current maximum transmission unit (MTU) for the cluster network.

Procedure

  • To obtain the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster
    Copy to Clipboard Toggle word wrap

    Example output

    ...
    Status:
      Cluster Network:
        Cidr:               10.217.0.0/22
        Host Prefix:        23
      Cluster Network MTU:  1400
      Network Type:         OVNKubernetes
      Service Network:
        10.217.4.0/23
    ...
    Copy to Clipboard Toggle word wrap

9.2.2. Preparing your hardware MTU configuration

Many ways exist to configure the hardware maximum transmission unit (MTU) for your cluster nodes. The following examples show only the most common methods. Verify the correctness of your infrastructure MTU. Select your preferred method for configuring your hardware MTU in the cluster nodes.

Procedure

  1. Prepare your configuration for the hardware MTU:

    • If your hardware MTU is specified with DHCP, update your DHCP configuration such as with the following dnsmasq configuration:

      dhcp-option-force=26,<mtu>
      Copy to Clipboard Toggle word wrap

      where:

      <mtu>
      Specifies the hardware MTU for the DHCP server to advertise.
    • If your hardware MTU is specified with a kernel command line with PXE, update that configuration accordingly.
    • If your hardware MTU is specified in a NetworkManager connection configuration, complete the following steps. This approach is the default for OpenShift Container Platform if you do not explicitly specify your network configuration with DHCP, a kernel command line, or some other method. Your cluster nodes must all use the same underlying network configuration for the following procedure to work unmodified.

      1. Find the primary network interface by entering the following command:

        $ oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0
        Copy to Clipboard Toggle word wrap

        where:

        <node_name>
        Specifies the name of a node in your cluster.
      2. Create the following NetworkManager configuration in the <interface>-mtu.conf file:

        [connection-<interface>-mtu]
        match-device=interface-name:<interface>
        ethernet.mtu=<mtu>
        Copy to Clipboard Toggle word wrap

        where:

        <interface>
        Specifies the primary network interface name.
        <mtu>
        Specifies the new hardware MTU value.

9.2.3. Creating MachineConfig objects

Use the following procedure to create the MachineConfig objects.

Procedure

  1. Create two MachineConfig objects, one for the control plane nodes and another for the worker nodes in your cluster:

    1. Create the following Butane config in the control-plane-interface.bu file:

      Note

      The Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in 0. For example, 4.16.0. See "Creating machine configs with Butane" for information about Butane.

      variant: openshift
      version: 4.16.0
      metadata:
        name: 01-control-plane-interface
        labels:
          machineconfiguration.openshift.io/role: master
      storage:
        files:
          - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 
      1
      
            contents:
              local: <interface>-mtu.conf 
      2
      
            mode: 0600
      Copy to Clipboard Toggle word wrap
      1
      Specifies the NetworkManager connection name for the primary network interface.
      2
      Specifies the local filename for the updated NetworkManager configuration file from an earlier step.
    2. Create the following Butane config in the worker-interface.bu file:

      Note

      The Butane version you specify in the config file should match the OpenShift Container Platform version and always ends in 0. For example, 4.16.0. See "Creating machine configs with Butane" for information about Butane.

      variant: openshift
      version: 4.16.0
      metadata:
        name: 01-worker-interface
        labels:
          machineconfiguration.openshift.io/role: worker
      storage:
        files:
          - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 
      1
      
            contents:
              local: <interface>-mtu.conf 
      2
      
            mode: 0600
      Copy to Clipboard Toggle word wrap
      1
      Specifies the NetworkManager connection name for the primary network interface.
      2
      Specifies the local filename for the updated NetworkManager configuration file from an earlier step.
  2. Create MachineConfig objects from the Butane configs by running the following command:

    $ for manifest in control-plane-interface worker-interface; do
        butane --files-dir . $manifest.bu > $manifest.yaml
      done
    Copy to Clipboard Toggle word wrap
    Warning

    Do not apply these machine configs until explicitly instructed later in this procedure. Applying these machine configs now causes a loss of stability for the cluster.

9.2.4. Beginning the MTU migration

Start the maximum transmission unit (MTU) migration by specifying the migration configuration for the cluster network and machine interfaces. The Machine Config Operator performs a rolling reboot of the nodes to prepare the cluster for the MTU change.

Procedure

  1. To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'
    Copy to Clipboard Toggle word wrap

    where:

    <overlay_from>
    Specifies the current cluster network MTU value.
    <overlay_to>
    Specifies the target MTU for the cluster network. This value is set relative to the value of <machine_to>. For OVN-Kubernetes, this value must be 100 less than the value of <machine_to>.
    <machine_to>
    Specifies the MTU for the primary network interface on the underlying host network.
    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'
    Copy to Clipboard Toggle word wrap
  2. As the Machine Config Operator updates machines in each machine config pool, the Operator reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get machineconfigpools
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the Machine Config Operator updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

9.2.5. Verifying the machine configuration

Verify the machine configuration on your hosts to confirm that the maximum transmission unit (MTU) migration applied successfully. Checking the configuration state and system settings help ensures that the nodes use the correct migration script.

Procedure

  • Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"
      Copy to Clipboard Toggle word wrap

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done
      Copy to Clipboard Toggle word wrap

    2. Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    3. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart
      Copy to Clipboard Toggle word wrap

      where:

      <config_name>
      Specifies the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      ExecStart=/usr/local/bin/mtu-migration.sh
      Copy to Clipboard Toggle word wrap

9.2.6. Applying the new hardware MTU value

Use the following procedure to apply the new hardware maximum transmission unit (MTU) value.

Procedure

  1. Update the underlying network interface MTU value:

    • If you are specifying the new MTU with a NetworkManager connection configuration, enter the following command. The MachineConfig Operator automatically performs a rolling reboot of the nodes in your cluster.

      $ for manifest in control-plane-interface worker-interface; do
          oc create -f $manifest.yaml
        done
      Copy to Clipboard Toggle word wrap
    • If you are specifying the new MTU with a DHCP server option or a kernel command line and PXE, make the necessary changes for your infrastructure.
  2. As the Machine Config Operator updates machines in each machine config pool, the Operator reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get machineconfigpools
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the Machine Config Operator updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  3. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"
      Copy to Clipboard Toggle word wrap

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done
      Copy to Clipboard Toggle word wrap

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep path:
      Copy to Clipboard Toggle word wrap

      where:

      <config_name>
      Specifies the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      If the machine config is successfully deployed, the previous output contains the /etc/NetworkManager/conf.d/99-<interface>-mtu.conf file path and the ExecStart=/usr/local/bin/mtu-migration.sh line.

9.2.7. Finalizing the MTU migration

Finalize the MTU migration to apply the new maximum transmission unit (MTU) settings to the OVN-Kubernetes network plugin. This updates the cluster configuration and triggers a rolling reboot of the nodes to complete the process.

Procedure

  1. To finalize the MTU migration, enter the following command for the OVN-Kubernetes network plugin:

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'
    Copy to Clipboard Toggle word wrap

    where:

    <mtu>
    Specifies the new cluster network MTU that you specified with <overlay_to>.
  2. After finalizing the MTU migration, each machine config pool node is rebooted one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get machineconfigpools
    Copy to Clipboard Toggle word wrap

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

Verification

  1. To get the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster
    Copy to Clipboard Toggle word wrap
  2. Get the current MTU for the primary network interface of a node:

    1. To list the nodes in your cluster, enter the following command:

      $ oc get nodes
      Copy to Clipboard Toggle word wrap
    2. To obtain the current MTU setting for the primary network interface on a node, enter the following command:

      $ oc adm node-logs <node> -u ovs-configuration | grep configure-ovs.sh | grep mtu | grep <interface> | head -1
      Copy to Clipboard Toggle word wrap

      where:

      <node>
      Specifies a node from the output from the previous step.
      <interface>
      Specifies the primary network interface name for the node.

      Example output

      ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051
      Copy to Clipboard Toggle word wrap

During cluster installation, you can configure the node port range to meet the requirements of your cluster. After cluster installation, only a cluster administrator can expand the range as a postinstallation task. If your cluster uses a large number of node ports, consider increasing the available port range according to the requirements of your cluster.

If you do not set a node port range during cluster installation, the default range of 30000-32768 applies to your cluster. In this situation, you can expand the range on either side, but you must preserve 30000-32768 within your new port range.

Important

Red Hat has not performed testing outside the default port range of 30000-32768. For ranges outside the default port range, ensure that you test to verify the expanding node port range does not impact your cluster. In particular, ensure that there is:

  • No overlap with any ports already in use by host processes
  • No overlap with any ports already in use by pods that are configured with host networking

If you expanded the range and a port allocation issue occurs, create a new cluster and set the required range for it.

If you expand the node port range and OpenShift CLI (oc) stops working because of a port conflict with the OpenShift Container Platform API server, you must create a new cluster.

10.1. Expanding the node port range

You can expand the node port range for your cluster. After you install your OpenShift Container Platform cluster, you cannot shrink the node port range on either side of the currently configured range.

Important

Red Hat has not performed testing outside the default port range of 30000-32768. For ranges outside the default port range, ensure that you test to verify that expanding your node port range does not impact your cluster. If you expanded the range and a port allocation issue occurs, create a new cluster and set the required range for it.

Prerequisites

  • Installed the OpenShift CLI (oc).
  • Logged in to the cluster as a user with cluster-admin privileges.
  • You ensured that your cluster infrastructure allows access to the ports that exist in the extended range. For example, if you expand the node port range to 30000-32900, your firewall or packet filtering configuration must allow the inclusive port range of 30000-32900.

Procedure

  • To expand the range for the serviceNodePortRange parameter in the network.config.openshift.io object that your cluster uses to manage traffic for pods, enter the following command:

    $ oc patch network.config.openshift.io cluster --type=merge -p \
      '{
        "spec":
          { "serviceNodePortRange": "<port_range>" }
      }'
    Copy to Clipboard Toggle word wrap

    where:

    <port_range>
    specifies your expanded range, such as 30000-32900.
    Tip

    You can also apply the following YAML to update the node port range:

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      serviceNodePortRange: "<port_range>"
    # ...
    Copy to Clipboard Toggle word wrap

    Example output

    network.config.openshift.io/cluster patched
    Copy to Clipboard Toggle word wrap

Verification

  • To confirm that the updated configuration is active, enter the following command. The update can take several minutes to apply.

    $ oc get configmaps -n openshift-kube-apiserver config \
      -o jsonpath="{.data['config\.yaml']}" | \
      grep -Eo '"service-node-port-range":["[[:digit:]]+-[[:digit:]]+"]'
    Copy to Clipboard Toggle word wrap

    Example output

    "service-node-port-range":["30000-32900"]
    Copy to Clipboard Toggle word wrap

Chapter 11. Configuring the cluster network range

To expand the cluster network range in OpenShift Container Platform to support more nodes and IP addresses, you can modify the cluster network CIDR mask after cluster installation. This procedure requires the OVN-Kubernetes network plugin and provides more IP space for additional nodes.

For example, if you deployed a cluster and specified 10.128.0.0/19 as the cluster network range and a host prefix of 23, you are limited to 16 nodes. You can expand that to 510 nodes by changing the CIDR mask on a cluster to /14.

When expanding the cluster network address range, your cluster must use the OVN-Kubernetes network plugin. Other network plugins are not supported.

The following limitations apply when modifying the cluster network IP address range:

  • The CIDR mask size specified must always be smaller than the currently configured CIDR mask size, because you can only increase IP space by adding more nodes to an installed cluster
  • The host prefix cannot be modified
  • Pods that are configured with an overridden default gateway must be recreated after the cluster network expands

To expand the cluster network IP address range in OpenShift Container Platform to support more nodes, you can modify the cluster network CIDR mask using the oc patch command.

Note

This change requires rolling out a new Operator configuration across the cluster, and can take up to 30 minutes to take effect.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in to the cluster with a user with cluster-admin privileges.
  • You have ensured that the cluster uses the OVN-Kubernetes network plugin.

Procedure

  1. To obtain the cluster network range and host prefix for your cluster, enter the following command:

    $ oc get network.operator.openshift.io \
      -o jsonpath="{.items[0].spec.clusterNetwork}"
    Copy to Clipboard Toggle word wrap

    Example output

    [{"cidr":"10.217.0.0/22","hostPrefix":23}]
    Copy to Clipboard Toggle word wrap

  2. To expand the cluster network IP address range, enter the following command. Use the CIDR IP address range and host prefix returned from the output of the previous command.

    $ oc patch Network.config.openshift.io cluster --type='merge' --patch \
      '{
        "spec":{
          "clusterNetwork": [ {"cidr":"<network>/<cidr>","hostPrefix":<prefix>} ],
          "networkType": "OVNKubernetes"
        }
      }'
    Copy to Clipboard Toggle word wrap

    where:

    <network>
    Specifies the network part of the cidr field that you obtained from the previous step. You cannot change this value.
    <cidr>
    Specifies the network prefix length. For example, 14. Change this value to a smaller number than the value from the output in the previous step to expand the cluster network range.
    <prefix>
    Specifies the current host prefix for your cluster. This value must be the same value for the hostPrefix field that you obtained from the previous step.

    Example command

    $ oc patch Network.config.openshift.io cluster --type='merge' --patch \
      '{
        "spec":{
          "clusterNetwork": [ {"cidr":"10.217.0.0/14","hostPrefix": 23} ],
          "networkType": "OVNKubernetes"
        }
      }'
    Copy to Clipboard Toggle word wrap

    Example output

    network.config.openshift.io/cluster patched
    Copy to Clipboard Toggle word wrap

  3. To confirm that the configuration is active, enter the following command. It can take up to 30 minutes for this change to take effect.

    $ oc get network.operator.openshift.io \
      -o jsonpath="{.items[0].spec.clusterNetwork}"
    Copy to Clipboard Toggle word wrap

    Example output

    [{"cidr":"10.217.0.0/14","hostPrefix":23}]
    Copy to Clipboard Toggle word wrap

Chapter 12. Configuring IP failover

To provide high availability for Virtual IP addresses and ensure services remain accessible when nodes fail in OpenShift Container Platform, you can configure IP failover using Keepalived.

IP failover uses Keepalived to host a set of externally accessible Virtual IP (VIP) addresses on a set of hosts. Each VIP address is only serviced by a single host at a time. Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to determine which host, from the set of hosts, services which VIP. If a host becomes unavailable, or if the service that Keepalived is watching does not respond, the VIP is switched to another host from the set. This means a VIP is always serviced as long as a host is available.

Every VIP in the set is serviced by a node selected from the set. If a single node is available, the VIPs are served. There is no way to explicitly distribute the VIPs over the nodes, so there can be nodes with no VIPs and other nodes with many VIPs. If there is only one node, all VIPs are on it.

The administrator must ensure that all of the VIP addresses meet the following requirements:

  • Accessible on the configured hosts from outside the cluster.
  • Not used for any other purpose within the cluster.

Keepalived on each node determines whether the needed service is running. If it is, VIPs are supported and Keepalived participates in the negotiation to determine which node serves the VIP. For a node to participate, the service must be listening on the watch port on a VIP or the check must be disabled.

Note

Each VIP in the set might be served by a different node.

IP failover monitors a port on each VIP to determine whether the port is reachable on the node. If the port is not reachable, the VIP is not assigned to the node. If the port is set to 0, this check is suppressed. The check script does the needed testing.

When a node running Keepalived passes the check script, the VIP on that node can enter the master state based on its priority and the priority of the current master and as determined by the preemption strategy.

A cluster administrator can provide a script through the OPENSHIFT_HA_NOTIFY_SCRIPT variable, and this script is called whenever the state of the VIP on the node changes. Keepalived uses the master state when it is servicing the VIP, the backup state when another node is servicing the VIP, or in the fault state when the check script fails. The notify script is called with the new state whenever the state changes.

You can create an IP failover deployment configuration on OpenShift Container Platform. The IP failover deployment configuration specifies the set of VIP addresses, and the set of nodes on which to service them. A cluster can have multiple IP failover deployment configurations, with each managing its own set of unique VIP addresses. Each node in the IP failover configuration runs an IP failover pod, and this pod runs Keepalived.

When using VIPs to access a pod with host networking, the application pod runs on all nodes that are running the IP failover pods. This enables any of the IP failover nodes to become the master and service the VIPs when needed. If application pods are not running on all nodes with IP failover, either some IP failover nodes never service the VIPs or some application pods never receive any traffic. Use the same selector and replication count, for both IP failover and the application pods, to avoid this mismatch.

While using VIPs to access a service, any of the nodes can be in the IP failover set of nodes, since the service is reachable on all nodes, no matter where the application pod is running. Any of the IP failover nodes can become master at any time. The service can either use external IPs and a service port or it can use a NodePort. Setting up a NodePort is a privileged operation.

When using external IPs in the service definition, the VIPs are set to the external IPs, and the IP failover monitoring port is set to the service port. When using a node port, the port is open on every node in the cluster, and the service load-balances traffic from whatever node currently services the VIP. In this case, the IP failover monitoring port is set to the NodePort in the service definition.

Important

Even though a service VIP is highly available, performance can still be affected. Keepalived makes sure that each of the VIPs is serviced by some node in the configuration, and several VIPs can end up on the same node even when other nodes have none. Strategies that externally load-balance across a set of VIPs can be thwarted when IP failover puts multiple VIPs on the same node.

When you use ExternalIP, you can set up IP failover to have the same VIP range as the ExternalIP range. You can also disable the monitoring port. In this case, all of the VIPs appear on same node in the cluster. Any user can set up a service with an ExternalIP and make it highly available.

Important

There are a maximum of 254 VIPs in the cluster.

12.1. IP failover environment variables

The IP failover environment variables reference lists all variables you can use to configure IP failover in OpenShift Container Platform, including VIP addresses, monitoring ports, and network interfaces.

Expand
Table 12.1. IP failover environment variables
Variable NameDefaultDescription

OPENSHIFT_HA_MONITOR_PORT

80

The IP failover pod tries to open a TCP connection to this port on each Virtual IP (VIP). If connection is established, the service is considered to be running. If this port is set to 0, the test always passes.

OPENSHIFT_HA_NETWORK_INTERFACE

 

The interface name that IP failover uses to send Virtual Router Redundancy Protocol (VRRP) traffic. The default value is eth0.

If your cluster uses the OVN-Kubernetes network plugin, set this value to br-ex to avoid packet loss. For a cluster that uses the OVN-Kubernetes network plugin, all listening interfaces do not serve VRRP but instead expect inbound traffic over a br-ex bridge.

OPENSHIFT_HA_REPLICA_COUNT

2

The number of replicas to create. This must match spec.replicas value in IP failover deployment configuration.

OPENSHIFT_HA_VIRTUAL_IPS

 

The list of IP address ranges to replicate. This must be provided. For example, 1.2.3.4-6,1.2.3.9.

OPENSHIFT_HA_VRRP_ID_OFFSET

0

The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is 0, and the allowed range is 0 through 255.

OPENSHIFT_HA_VIP_GROUPS

 

The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the OPENSHIFT_HA_VIP_GROUPS variable.

OPENSHIFT_HA_IPTABLES_CHAIN

INPUT

The name of the iptables chain, to automatically add an iptables rule to allow the VRRP traffic on. If the value is not set, an iptables rule is not added. If the chain does not exist, it is not created.

OPENSHIFT_HA_CHECK_SCRIPT

 

The full path name in the pod file system of a script that is periodically run to verify the application is operating.

OPENSHIFT_HA_CHECK_INTERVAL

2

The period, in seconds, that the check script is run.

OPENSHIFT_HA_NOTIFY_SCRIPT

 

The full path name in the pod file system of a script that is run whenever the state changes.

OPENSHIFT_HA_PREEMPTION

preempt_nodelay 300

The strategy for handling a new higher priority host. The nopreempt strategy does not move master from the lower priority host to the higher priority host.

12.2. Configuring IP failover in your cluster

To configure IP failover in your OpenShift Container Platform cluster and provide high availability for Virtual IP addresses, you can create a deployment that runs Keepalived on selected nodes to monitor services and fail over VIPs when nodes become unavailable.

The IP failover deployment ensures that a failover pod runs on each of the nodes matching the constraints or the label used. The pod, which runs Keepalived, can monitor an endpoint and use Virtual Router Redundancy Protocol (VRRP) to fail over the virtual IP (VIP) from one node to another if the first node cannot reach the service or endpoint.

For production use, set a selector that selects at least two nodes, and set replicas equal to the number of selected nodes.

Prerequisites

Procedure

  1. Create an IP failover service account:

    $ oc create sa ipfailover
    Copy to Clipboard Toggle word wrap
  2. Update security context constraints (SCC) for hostNetwork:

    $ oc adm policy add-scc-to-user privileged -z ipfailover
    Copy to Clipboard Toggle word wrap
    $ oc adm policy add-scc-to-user hostnetwork -z ipfailover
    Copy to Clipboard Toggle word wrap
  3. Red Hat OpenStack Platform (RHOSP) only: Complete the following steps to make a failover VIP address reachable on RHOSP ports.

    1. Use the RHOSP CLI to show the default RHOSP API and VIP addresses in the allowed_address_pairs parameter of your RHOSP cluster:

      $ openstack port show <cluster_name> -c allowed_address_pairs
      Copy to Clipboard Toggle word wrap

      Output example

      *Field*                  *Value*
      allowed_address_pairs    ip_address='192.168.0.5', mac_address='fa:16:3e:31:f9:cb'
                               ip_address='192.168.0.7', mac_address='fa:16:3e:31:f9:cb'
      Copy to Clipboard Toggle word wrap

    2. Set a different VIP address for the IP failover deployment and make the address reachable on RHOSP ports by entering the following command in the RHOSP CLI. Do not set any default RHOSP API and VIP addresses as the failover VIP address for the IP failover deployment.

      Example of adding the 1.1.1.1 failover IP address as an allowed address on RHOSP ports.

      $ openstack port set <cluster_name> --allowed-address ip-address=1.1.1.1,mac-address=fa:fa:16:3e:31:f9:cb
      Copy to Clipboard Toggle word wrap

    3. Create a deployment YAML file to configure IP failover for your deployment. See "Example deployment YAML for IP failover configuration" in a later step.
    4. Specify the following specification in the IP failover deployment so that you pass the failover VIP address to the OPENSHIFT_HA_VIRTUAL_IPS environment variable:

      Example of adding the 1.1.1.1 VIP address to OPENSHIFT_HA_VIRTUAL_IPS

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: ipfailover-keepalived
      # ...
            spec:
                env:
                - name: OPENSHIFT_HA_VIRTUAL_IPS
                value: "1.1.1.1"
      # ...
      Copy to Clipboard Toggle word wrap

  4. Create a deployment YAML file to configure IP failover.

    Note

    For Red Hat OpenStack Platform (RHOSP), you do not need to re-create the deployment YAML file. You already created this file as part of the earlier instructions.

    Example deployment YAML for IP failover configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ipfailover-keepalived
      labels:
        ipfailover: hello-openshift
    spec:
      strategy:
        type: Recreate
      replicas: 2
      selector:
        matchLabels:
          ipfailover: hello-openshift
      template:
        metadata:
          labels:
            ipfailover: hello-openshift
        spec:
          serviceAccountName: ipfailover
          privileged: true
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/worker: ""
          containers:
          - name: openshift-ipfailover
            image: registry.redhat.io/openshift4/ose-keepalived-ipfailover-rhel9:v4.16
            ports:
            - containerPort: 63000
              hostPort: 63000
            imagePullPolicy: IfNotPresent
            securityContext:
              privileged: true
            volumeMounts:
            - name: lib-modules
              mountPath: /lib/modules
              readOnly: true
            - name: host-slash
              mountPath: /host
              readOnly: true
              mountPropagation: HostToContainer
            - name: etc-sysconfig
              mountPath: /etc/sysconfig
              readOnly: true
            - name: config-volume
              mountPath: /etc/keepalive
            env:
            - name: OPENSHIFT_HA_CONFIG_NAME
              value: "ipfailover"
            - name: OPENSHIFT_HA_VIRTUAL_IPS
              value: "1.1.1.1-2"
            - name: OPENSHIFT_HA_VIP_GROUPS
              value: "10"
            - name: OPENSHIFT_HA_NETWORK_INTERFACE
              value: "ens3" #The host interface to assign the VIPs
            - name: OPENSHIFT_HA_MONITOR_PORT
              value: "30060"
            - name: OPENSHIFT_HA_VRRP_ID_OFFSET
              value: "0"
            - name: OPENSHIFT_HA_REPLICA_COUNT
              value: "2" #Must match the number of replicas in the deployment
            - name: OPENSHIFT_HA_USE_UNICAST
              value: "false"
            #- name: OPENSHIFT_HA_UNICAST_PEERS
              #value: "10.0.148.40,10.0.160.234,10.0.199.110"
            - name: OPENSHIFT_HA_IPTABLES_CHAIN
              value: "INPUT"
            #- name: OPENSHIFT_HA_NOTIFY_SCRIPT
            #  value: /etc/keepalive/mynotifyscript.sh
            - name: OPENSHIFT_HA_CHECK_SCRIPT
              value: "/etc/keepalive/mycheckscript.sh"
            - name: OPENSHIFT_HA_PREEMPTION
              value: "preempt_delay 300"
            - name: OPENSHIFT_HA_CHECK_INTERVAL
              value: "2"
            livenessProbe:
              initialDelaySeconds: 10
              exec:
                command:
                - pgrep
                - keepalived
          volumes:
          - name: lib-modules
            hostPath:
              path: /lib/modules
          - name: host-slash
            hostPath:
              path: /
          - name: etc-sysconfig
            hostPath:
              path: /etc/sysconfig
          # config-volume contains the check script
          # created with `oc create configmap keepalived-checkscript --from-file=mycheckscript.sh`
          - configMap:
              defaultMode: 0755
              name: keepalived-checkscript
            name: config-volume
          imagePullSecrets:
            - name: openshift-pull-secret
    Copy to Clipboard Toggle word wrap

    where:

    ipfailover-keepalived
    Specifies the name of the IP failover deployment.
    OPENSHIFT_HA_VIRTUAL_IPS
    Specifies the lis t of IP address ranges to replicate. This must be provided. For example, 1.2.3.4-6,1.2.3.9.
    OPENSHIFT_HA_VIP_GROUPS
    Specifies the number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the OPENSHIFT_HA_VIP_GROUPS variable.
    OPENSHIFT_HA_NETWORK_INTERFACE
    Specifies the interface name that IP failover uses to send VRRP traffic. By default, eth0 is used.
    OPENSHIFT_HA_MONITOR_PORT
    Specifies the IP failover pod tries to open a TCP connection to this port on each VIP. If connection is established, the service is considered to be running. If this port is set to 0, the test always passes. The default value is 80.
    OPENSHIFT_HA_VRRP_ID_OFFSET
    Specifies the offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is 10, and the allowed range is 0 through 255.
    OPENSHIFT_HA_REPLICA_COUNT
    Specifies the number of replicas to create. This must match spec.replicas value in IP failover configuration. The default value is 2.
    OPENSHIFT_HA_USE_UNICAST
    Specifies whether to use unicast mode for VRRP. The default value is false.
    OPENSHIFT_HA_UNICAST_PEERS
    Specifies the list of IP addresses of the unicast peers. This must be provided if OPENSHIFT_HA_USE_UNICAST is set to true.
    OPENSHIFT_HA_IPTABLES_CHAIN
    Specifies the name of the iptables chain to automatically add an iptables rule to allow the VRRP traffic on. If the value is not set, an iptables rule is not added. If the chain does not exist, it is not created, and Keepalived operates in unicast mode. The default is INPUT.
    OPENSHIFT_HA_NOTIFY_SCRIPT
    Specifies the full path name in the pod file system of a script that is run whenever the state changes.
    OPENSHIFT_HA_CHECK_SCRIPT
    Specifies the full path name in the pod file system of a script that is periodically run to verify the application is operating.
    OPENSHIFT_HA_PREEMPTION
    Specifies the strategy for handling a new higher priority host. The default value is preempt_delay 300, which causes a Keepalived instance to take over a VIP after 5 minutes if a lower-priority master is holding the VIP.
    OPENSHIFT_HA_CHECK_INTERVAL
    Specifies the period, in seconds, that the check script is run. The default value is 2.
    openshift-pull-secret
    Specifies the name of the pull secret to use for the IP failover deployment. Create the pull secret before creating the deployment, otherwise you will get an error when creating the deployment.

12.3. Configuring check and notify scripts

To customize health monitoring for IP failover and receive notifications when VIP state changes in OpenShift Container Platform, you can configure check and notify scripts by using ConfigMap objects.

The check and notify scripts run inside the IP failover pod and use the pod file system rather than the host file system. The host file system is available to the pod under the /hosts mount path. When configuring a check or notify script, you must provide the full path to the script.

Each IP failover pod manages a Keepalived daemon that controls one or more virtual IP (VIP) addresses on the node where the pod is running. Keepalived tracks the state of each VIP on the node, which can be master, backup, or fault.

The full path names of the check and notify scripts are added to the Keepalived configuration file, /etc/keepalived/keepalived.conf, which is loaded each time Keepalived starts. You add the scripts to the pod by using a ConfigMap object, as described in the following sections.

Check script

Keepalived monitors application health by periodically running an optional, user-supplied check script. For example, the script can test a web server by issuing a request and verifying the response. If you do not provide a check script, Keepalived runs a default script that tests the TCP connection. This default test is suppressed when the monitor port is set to 0.

If the check script returns a non-zero value, the node enters the backup state and any VIPs that it holds are reassigned.

Notify script

As a cluster administrator, you can provide an optional notify script that Keepalived calls whenever the VIP state changes. Keepalived passes the following parameters to the notify script:

  • $1group or instance
  • $2 — Name of the group or instance
  • $3 — New state: master, backup, or fault

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.

Procedure

  1. Create the desired script and create a ConfigMap object to hold it. The script has no input arguments and must return 0 for OK and 1 for fail.

    The check script, mycheckscript.sh:

    #!/bin/bash
        # Whatever tests are needed
        # E.g., send request and verify response
    exit 0
    Copy to Clipboard Toggle word wrap
  2. Create the ConfigMap object by running the following command:

    $ oc create configmap mycustomcheck --from-file=mycheckscript.sh
    Copy to Clipboard Toggle word wrap
  3. Add the script to the pod. The defaultMode for the mounted ConfigMap object files must able to run by using oc commands or by editing the IP failover configuration.

    1. Add the script to the pod by running the following command. A value of 0755, 493 decimal, is typical. For example:

      $ oc set env deploy/ipfailover-keepalived \
          OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
      Copy to Clipboard Toggle word wrap
      $ oc set volume deploy/ipfailover-keepalived --add --overwrite \
          --name=config-volume \
          --mount-path=/etc/keepalive \
          --source='{"configMap": { "name": "mycustomcheck", "defaultMode": 493}}'
      Copy to Clipboard Toggle word wrap
      Note

      The oc set env command is whitespace sensitive. There must be no whitespace on either side of the = sign.

    2. Alternatively, edit the ipfailover-keepalived configuration by running the following command:

      $ oc edit deploy ipfailover-keepalived
      Copy to Clipboard Toggle word wrap

      Example ipfailover-keepalived configuration

          spec:
            containers:
            - env:
              - name: OPENSHIFT_HA_CHECK_SCRIPT
                value: /etc/keepalive/mycheckscript.sh
      ...
              volumeMounts:
              - mountPath: /etc/keepalive
                name: config-volume
            dnsPolicy: ClusterFirst
      ...
            volumes:
            - configMap:
                defaultMode: 0755
                name: customrouter
              name: config-volume
      ...
      Copy to Clipboard Toggle word wrap

      where:

      spec.container.env.name
      Specifies the OPENSHIFT_HA_CHECK_SCRIPT environment variable to point to the mounted script file.
      spec.container.volumeMounts
      Specifies the spec.container.volumeMounts field to create the mount point.
      spec.volumes
      Specifies a new spec.volumes field to mention the config map.
      spec.volumes.configMap.defaultMode
      Specifies run permission on the files. When read back, it is displayed in decimal, 493.
    3. Save the changes and exit the editor. This restarts the ipfailover-keepalived configuration.

12.4. Configuring VRRP preemption

To control VIP preemption behavior when nodes recover in OpenShift Container Platform, you can configure the OPENSHIFT_HA_PREEMPTION variable to set a delay before higher priority VIPs take over or disable preemption entirely.

When a virtual IP (VIP) on a node recovers from the fault state, it enters the backup state if it has a lower priority than the VIP currently in the master state.

There are two options for the OPENSHIFT_HA_PREEMPTION variable:

  • nopreempt: When set, the master role does not move from a lower-priority VIP to a higher-priority VIP.
  • preempt_delay 300: When set, Keepalived waits 300 seconds before moving the master role to the higher-priority VIP.

In the following example, the OPENSHIFT_HA_PREEMPTION value is set to preempt_delay 300.

Procedure

  • To specify preemption enter oc edit deploy ipfailover-keepalived to edit the router deployment configuration:

    $ oc edit deploy ipfailover-keepalived
    Copy to Clipboard Toggle word wrap
    # ...
        spec:
          containers:
          - env:
            - name: OPENSHIFT_HA_PREEMPTION
              value: preempt_delay 300
    #...
    Copy to Clipboard Toggle word wrap

12.5. Deploying multiple IP failover instances

When deploying multiple IP failover instances in OpenShift Container Platform, each Keepalived daemon assigns unique VRRP IDs to virtual IP addresses. Configure the OPENSHIFT_HA_VRRP_ID_OFFSET variable to prevent VRRP ID range overlaps between different IP failover configurations.

Each IP failover pod created by an IP failover configuration (one pod per node or replica) runs a Keepalived daemon. When multiple IP failover configurations are present, additional pods are created, and their Keepalived daemons participate together in Virtual Router Redundancy Protocol (VRRP) negotiation. This negotiation determines which node services each virtual IP (VIP).

For each VIP, Keepalived assigns a unique internal vrrp-id. During VRRP negotiation, these vrrp-id values are used to select the node that services the corresponding VIP.

The IP failover pod assigns vrrp-id values sequentially to the VIPs defined in the IP failover configuration, starting from the value specified by OPENSHIFT_HA_VRRP_ID_OFFSET. Valid vrrp-id values are in the range 1..255.

When you deploy multiple IP failover configurations, ensure that the configured offset leaves sufficient space for additional VIPs and prevents vrrp-id ranges from overlapping across configurations.

To configure IP failover for more than 254 Virtual IP addresses in OpenShift Container Platform, you can use the OPENSHIFT_HA_VIP_GROUPS variable to group multiple addresses together. By using the OPENSHIFT_HA_VIP_GROUPS variable, you can change the number of VIPs per VRRP instance and define the number of VIP groups available for each VRRP instance when configuring IP failover.

Grouping VIPs creates a wider range of allocation of VIPs per VRRP in the case of VRRP failover events, and is useful when all hosts in the cluster have access to a service locally. For example, when a service is being exposed with an ExternalIP.

Note

As a rule for failover, do not limit services, such as the router, to one specific host. Instead, services should be replicated to each host so that in the case of IP failover, the services do not have to be recreated on the new host.

Note

If you are using OpenShift Container Platform health checks, the nature of IP failover and groups means that all instances in the group are not checked. For that reason, the Kubernetes health checks must be used to ensure that services are live.

Prerequisites

  • You are logged in to the cluster with a user with cluster-admin privileges.

Procedure

  • To change the number of IP addresses assigned to each group, change the value for the OPENSHIFT_HA_VIP_GROUPS variable, for example:

    Example Deployment YAML for IP failover configuration

    ...
        spec:
            env:
            - name: OPENSHIFT_HA_VIP_GROUPS
              value: "3"
    ...
    Copy to Clipboard Toggle word wrap

    In this example, the OPENSHIFT_HA_VIP_GROUPS variable is set to 3. In an environment with seven VIPs, it creates three groups, assigning three VIPs to the first group, and two VIPs to the two remaining groups.

    Note

    If the number of groups set by OPENSHIFT_HA_VIP_GROUPS is fewer than the number of IP addresses set to fail over, the group contains more than one IP address, and all of the addresses move as a single unit.

12.7. High availability For ExternalIP

High availability for ExternalIP in non-cloud clusters of OpenShift Container Platform combines IP failover with ExternalIP auto-assignment to ensure services remain accessible when nodes fail. You can configure this by using the same CIDR range for both ExternalIP auto-assignment and IP failover.

To configure high availability for ExternalIP, you can specify a spec.ExternalIP.autoAssignCIDRs range of the cluster network configuration, and then use the same range in creating the IP failover configuration.

Because IP failover can support up to a maximum of 255 VIPs for the entire cluster, the spec.ExternalIP.autoAssignCIDRs must be /24 or smaller.

12.8. Removing IP failover

To remove IP failover from your OpenShift Container Platform cluster and clean up iptables rules and virtual IP addresses, you can delete the deployment and service account, then run a cleanup job on each configured node.

When IP failover is initially configured, the worker nodes in the cluster are modified with an iptables rule that explicitly allows multicast packets on 224.0.0.18 for Keepalived. Because of the change to the nodes, removing IP failover requires running a job to remove the iptables rule and removing the virtual IP addresses used by Keepalived.

Procedure

  1. Optional: Identify and delete any check and notify scripts that are stored as config maps:

    1. Identify whether any pods for IP failover use a config map as a volume:

      $ oc get pod -l ipfailover \
        -o jsonpath="\
      {range .items[?(@.spec.volumes[*].configMap)]}
      {'Namespace: '}{.metadata.namespace}
      {'Pod:       '}{.metadata.name}
      {'Volumes that use config maps:'}
      {range .spec.volumes[?(@.configMap)]}  {'volume:    '}{.name}
        {'configMap: '}{.configMap.name}{'\n'}{end}
      {end}"
      Copy to Clipboard Toggle word wrap

      Example output

      Namespace: default
      Pod:       keepalived-worker-59df45db9c-2x9mn
      Volumes that use config maps:
        volume:    config-volume
        configMap: mycustomcheck
      Copy to Clipboard Toggle word wrap

    2. If the preceding step provided the names of config maps that are used as volumes, delete the config maps:

      $ oc delete configmap <configmap_name>
      Copy to Clipboard Toggle word wrap
  2. Identify an existing deployment for IP failover:

    $ oc get deployment -l ipfailover
    Copy to Clipboard Toggle word wrap

    Example output

    NAMESPACE   NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    default     ipfailover   2/2     2            2           105d
    Copy to Clipboard Toggle word wrap

  3. Delete the deployment:

    $ oc delete deployment <ipfailover_deployment_name>
    Copy to Clipboard Toggle word wrap
  4. Remove the ipfailover service account:

    $ oc delete sa ipfailover
    Copy to Clipboard Toggle word wrap
  5. Run a job that removes the IP tables rule that was added when IP failover was initially configured:

    1. Create a file such as remove-ipfailover-job.yaml with contents that are similar to the following example:

      apiVersion: batch/v1
      kind: Job
      metadata:
        generateName: remove-ipfailover-
        labels:
          app: remove-ipfailover
      spec:
        template:
          metadata:
            name: remove-ipfailover
          spec:
            containers:
            - name: remove-ipfailover
              image: registry.redhat.io/openshift4/ose-keepalived-ipfailover-rhel9:v4.16
              command: ["/var/lib/ipfailover/keepalived/remove-failover.sh"]
            nodeSelector:
              kubernetes.io/hostname: <host_name>
            restartPolicy: Never
      Copy to Clipboard Toggle word wrap
      • The nodeSelector is likely the same as the selector used in the old IP failover deployment.
      • Run the job for each node in your cluster that was configured for IP failover and replace the hostname each time.
    2. Run the job:

      $ oc create -f remove-ipfailover-job.yaml
      Copy to Clipboard Toggle word wrap

      Example output

      job.batch/remove-ipfailover-2h8dm created
      Copy to Clipboard Toggle word wrap

Verification

  • Confirm that the job removed the initial configuration for IP failover.

    $ oc logs job/remove-ipfailover-2h8dm
    Copy to Clipboard Toggle word wrap

    Example output

    remove-failover.sh: OpenShift IP Failover service terminating.
      - Removing ip_vs module ...
      - Cleaning up ...
      - Releasing VIPs  (interface eth0) ...
    Copy to Clipboard Toggle word wrap

To modify kernel parameters and interface attributes at runtime in OpenShift Container Platform, you can use the tuning Container Network Interface (CNI) meta plugin. The plugin operates in a chain with a main CNI plugin and allows you to change sysctls and interface attributes such as promiscuous mode, all-multicast mode, MTU, and MAC address.

To configure interface-level network sysctls in OpenShift Container Platform, you can use the tuning CNI meta plugin in a network attachment definition. Configure the net.ipv4.conf.IFNAME.accept_redirects sysctl to enable accepting and sending ICMP-redirected packets.

Procedure

  1. Create a network attachment definition, such as tuning-example.yaml, with the following content:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: <name>
      namespace: default
    spec:
      config: '{
        "cniVersion": "0.4.0",
        "name": "<name>",
        "plugins": [{
           "type": "<main_CNI_plugin>"
          },
          {
           "type": "tuning",
           "sysctl": {
                "net.ipv4.conf.IFNAME.accept_redirects": "1"
            }
          }
         ]
    }
    Copy to Clipboard Toggle word wrap

    where:

    name
    Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
    namespace
    Specifies the namespace that the object is associated with.
    cniVersion
    Specifies the CNI specification version.
    name
    Specifies the name for the configuration. It is recommended to match the configuration name to the name value of the network attachment definition.
    main_CNI_plugin
    Specifies the name of the main CNI plugin to configure.
    tuning
    Specifies the name of the CNI meta plugin.
    sysctl
    Specifies the sysctl to set. The interface name is represented by the IFNAME token and is replaced with the actual name of the interface at runtime.

    Example network attachment definition

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: tuningnad
      namespace: default
    spec:
      config: '{
        "cniVersion": "0.4.0",
        "name": "tuningnad",
        "plugins": [{
          "type": "bridge"
          },
          {
          "type": "tuning",
          "sysctl": {
             "net.ipv4.conf.IFNAME.accept_redirects": "1"
            }
        }
      ]
    }'
    Copy to Clipboard Toggle word wrap

  2. Apply the YAML by running the following command:

    $ oc apply -f tuning-example.yaml
    Copy to Clipboard Toggle word wrap

    Example output

    networkattachmentdefinition.k8.cni.cncf.io/tuningnad created
    Copy to Clipboard Toggle word wrap

  3. Create a pod such as examplepod.yaml with the network attachment definition similar to the following:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tunepod
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/networks: tuningnad
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
    Copy to Clipboard Toggle word wrap

    where:

    k8s.v1.cni.cncf.io/networks
    Specifies the name of the configured NetworkAttachmentDefinition.
    runAsUser
    Specifies which user ID the container is run with.
    runAsGroup
    Specifies which primary group ID the containers is run with.
    allowPrivilegeEscalation
    Specifies if a pod can request to allow privilege escalation. If unspecified, it defaults to true. This boolean directly controls whether the no_new_privs flag gets set on the container process.
    capabilities
    Specifies privileged actions without giving full root access. This policy ensures all capabilities are dropped from the pod.
    runAsNonRoot: true
    Specifies that the container will run with a user with any UID other than 0.
    seccompProfile
    Specifies the default seccomp profile for a pod or container workload.
  4. Apply the yaml by running the following command:

    $ oc apply -f examplepod.yaml
    Copy to Clipboard Toggle word wrap
  5. Verify that the pod is created by running the following command:

    $ oc get pod
    Copy to Clipboard Toggle word wrap

    Example output

    NAME      READY   STATUS    RESTARTS   AGE
    tunepod   1/1     Running   0          47s
    Copy to Clipboard Toggle word wrap

  6. Log in to the pod by running the following command:

    $ oc rsh tunepod
    Copy to Clipboard Toggle word wrap
  7. Verify the values of the configured sysctl flags. For example, find the value net.ipv4.conf.net1.accept_redirects by running the following command:

    sh-4.4# sysctl net.ipv4.conf.net1.accept_redirects
    Copy to Clipboard Toggle word wrap

    Expected output

    net.ipv4.conf.net1.accept_redirects = 1
    Copy to Clipboard Toggle word wrap

To enable all-multicast mode on network interfaces in OpenShift Container Platform, you can use the tuning Container Network Interface (CNI) meta plugin in a network attachment definition. When enabled, the interface receives all multicast packets on the network.

Procedure

  1. Create a network attachment definition, such as tuning-example.yaml, with the following content:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: <name>
      namespace: default
    spec:
      config: '{
        "cniVersion": "0.4.0",
        "name": "<name>",
        "plugins": [{
           "type": "<main_CNI_plugin>"
          },
          {
           "type": "tuning",
           "allmulti": true
            }
          }
         ]
    }
    Copy to Clipboard Toggle word wrap

    where:

    name
    Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
    namespace
    Specifies the namespace that the object is associated with.
    cniVersion
    Specifies the CNI specification version.
    name
    Specifies the name for the configuration. Match the configuration name to the name value of the network attachment definition.
    main_CNI_plugin
    Specifies the name of the main CNI plugin to configure.
    tuning
    Specifies the name of the CNI meta plugin.
    allmulti
    Specifies the all-multicast mode of interface. If enabled, all multicast packets on the network will be received by the interface.

    Example network attachment definition

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: setallmulti
      namespace: default
    spec:
      config: '{
        "cniVersion": "0.4.0",
        "name": "setallmulti",
        "plugins": [
          {
            "type": "bridge"
          },
          {
            "type": "tuning",
            "allmulti": true
          }
        ]
      }'
    Copy to Clipboard Toggle word wrap

  2. Apply the settings specified in the YAML file by running the following command:

    $ oc apply -f tuning-allmulti.yaml
    Copy to Clipboard Toggle word wrap

    Example output

    networkattachmentdefinition.k8s.cni.cncf.io/setallmulti created
    Copy to Clipboard Toggle word wrap

  3. Create a pod with a network attachment definition similar to that specified in the following examplepod.yaml sample file:

    apiVersion: v1
    kind: Pod
    metadata:
      name: allmultipod
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/networks: setallmulti
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
    Copy to Clipboard Toggle word wrap

    where:

    k8s.v1.cni.cncf.io/networks
    Specifies the name of the configured NetworkAttachmentDefinition.
    runAsUser
    Specifies which user ID the container is run with.
    runAsGroup
    Specifies which primary group ID the containers is run with.
    allowPrivilegeEscalation
    Specifies if a pod can request to allow privilege escalation. If unspecified, it defaults to true. This boolean directly controls whether the no_new_privs flag gets set on the container process.
    capabilities
    Specifies privileged actions without giving full root access. This policy ensures all capabilities are dropped from the pod.
    runAsNonRoot: true
    Specifies that the container will run with a user with any UID other than 0.
    seccompProfile
    Specifies the default seccomp profile for a pod or container workload.
  4. Apply the settings specified in the YAML file by running the following command:

    $ oc apply -f examplepod.yaml
    Copy to Clipboard Toggle word wrap
  5. Verify that the pod is created by running the following command:

    $ oc get pod
    Copy to Clipboard Toggle word wrap

    Example output

    NAME          READY   STATUS    RESTARTS   AGE
    allmultipod   1/1     Running   0          23s
    Copy to Clipboard Toggle word wrap

  6. Log in to the pod by running the following command:

    $ oc rsh allmultipod
    Copy to Clipboard Toggle word wrap
  7. List all the interfaces associated with the pod by running the following command:

    sh-4.4# ip link
    Copy to Clipboard Toggle word wrap

    Example output

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0@if22: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8901 qdisc noqueue state UP mode DEFAULT group default
        link/ether 0a:58:0a:83:00:10 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    3: net1@if24: <BROADCAST,MULTICAST,ALLMULTI,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
        link/ether ee:9b:66:a4:ec:1d brd ff:ff:ff:ff:ff:ff link-netnsid 0
    Copy to Clipboard Toggle word wrap

    where:

    eth0@if22
    Specifies the primary interface.
    net1@if24
    Specifies the secondary interface configured with the network-attachment-definition that supports the all-multicast mode (ALLMULTI flag).

As a cluster administrator, you can use the Stream Control Transmission Protocol (SCTP) on a bare-metal cluster.

As a cluster administrator, you can enable SCTP on the hosts in the cluster. On Red Hat Enterprise Linux CoreOS (RHCOS), the SCTP module is disabled by default.

SCTP is a reliable message based protocol that runs on top of an IP network.

When enabled, you can use SCTP as a protocol with pods, services, and network policy. A Service object must be defined with the type parameter set to either the ClusterIP or NodePort value.

14.1.1. Example configurations using SCTP protocol

You can configure a pod or service to use SCTP by setting the protocol parameter to the SCTP value in the pod or service object.

In the following example, a pod is configured to use SCTP:

apiVersion: v1
kind: Pod
metadata:
  namespace: project1
  name: example-pod
spec:
  containers:
    - name: example-pod
...
      ports:
        - containerPort: 30100
          name: sctpserver
          protocol: SCTP
Copy to Clipboard Toggle word wrap

In the following example, a service is configured to use SCTP:

apiVersion: v1
kind: Service
metadata:
  namespace: project1
  name: sctpserver
spec:
...
  ports:
    - name: sctpserver
      protocol: SCTP
      port: 30100
      targetPort: 30100
  type: ClusterIP
Copy to Clipboard Toggle word wrap

In the following example, a NetworkPolicy object is configured to apply to SCTP network traffic on port 80 from any pods with a specific label:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-sctp-on-http
spec:
  podSelector:
    matchLabels:
      role: web
  ingress:
  - ports:
    - protocol: SCTP
      port: 80
Copy to Clipboard Toggle word wrap

As a cluster administrator, you can load and enable the blacklisted SCTP kernel module on worker nodes in your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a file named load-sctp-module.yaml that contains the following YAML definition:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: load-sctp-module
      labels:
        machineconfiguration.openshift.io/role: worker
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
            - path: /etc/modprobe.d/sctp-blacklist.conf
              mode: 0644
              overwrite: true
              contents:
                source: data:,
            - path: /etc/modules-load.d/sctp-load.conf
              mode: 0644
              overwrite: true
              contents:
                source: data:,sctp
    Copy to Clipboard Toggle word wrap
  2. To create the MachineConfig object, enter the following command:

    $ oc create -f load-sctp-module.yaml
    Copy to Clipboard Toggle word wrap
  3. Optional: To watch the status of the nodes while the MachineConfig Operator applies the configuration change, enter the following command. When the status of a node transitions to Ready, the configuration update is applied.

    $ oc get nodes
    Copy to Clipboard Toggle word wrap

You can verify that SCTP is working on a cluster by creating a pod with an application that listens for SCTP traffic, associating it with a service, and then connecting to the exposed service.

Prerequisites

  • Access to the internet from the cluster to install the nc package.
  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a pod starts an SCTP listener:

    1. Create a file named sctp-server.yaml that defines a pod with the following YAML:

      apiVersion: v1
      kind: Pod
      metadata:
        name: sctpserver
        labels:
          app: sctpserver
      spec:
        containers:
          - name: sctpserver
            image: registry.access.redhat.com/ubi9/ubi
            command: ["/bin/sh", "-c"]
            args:
              ["dnf install -y nc && sleep inf"]
            ports:
              - containerPort: 30102
                name: sctpserver
                protocol: SCTP
      Copy to Clipboard Toggle word wrap
    2. Create the pod by entering the following command:

      $ oc create -f sctp-server.yaml
      Copy to Clipboard Toggle word wrap
  2. Create a service for the SCTP listener pod.

    1. Create a file named sctp-service.yaml that defines a service with the following YAML:

      apiVersion: v1
      kind: Service
      metadata:
        name: sctpservice
        labels:
          app: sctpserver
      spec:
        type: NodePort
        selector:
          app: sctpserver
        ports:
          - name: sctpserver
            protocol: SCTP
            port: 30102
            targetPort: 30102
      Copy to Clipboard Toggle word wrap
    2. To create the service, enter the following command:

      $ oc create -f sctp-service.yaml
      Copy to Clipboard Toggle word wrap
  3. Create a pod for the SCTP client.

    1. Create a file named sctp-client.yaml with the following YAML:

      apiVersion: v1
      kind: Pod
      metadata:
        name: sctpclient
        labels:
          app: sctpclient
      spec:
        containers:
          - name: sctpclient
            image: registry.access.redhat.com/ubi9/ubi
            command: ["/bin/sh", "-c"]
            args:
              ["dnf install -y nc && sleep inf"]
      Copy to Clipboard Toggle word wrap
    2. To create the Pod object, enter the following command:

      $ oc apply -f sctp-client.yaml
      Copy to Clipboard Toggle word wrap
  4. Run an SCTP listener on the server.

    1. To connect to the server pod, enter the following command:

      $ oc rsh sctpserver
      Copy to Clipboard Toggle word wrap
    2. To start the SCTP listener, enter the following command:

      $ nc -l 30102 --sctp
      Copy to Clipboard Toggle word wrap
  5. Connect to the SCTP listener on the server.

    1. Open a new terminal window or tab in your terminal program.
    2. Obtain the IP address of the sctpservice service. Enter the following command:

      $ oc get services sctpservice -o go-template='{{.spec.clusterIP}}{{"\n"}}'
      Copy to Clipboard Toggle word wrap
    3. To connect to the client pod, enter the following command:

      $ oc rsh sctpclient
      Copy to Clipboard Toggle word wrap
    4. To start the SCTP client, enter the following command. Replace <cluster_IP> with the cluster IP address of the sctpservice service.

      # nc <cluster_IP> 30102 --sctp
      Copy to Clipboard Toggle word wrap

Chapter 15. Using PTP hardware

Precision Time Protocol (PTP) is used to synchronize clocks in a network. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy, and is more accurate than Network Time Protocol (NTP).

Important

If your openshift-sdn cluster with PTP uses the User Datagram Protocol (UDP) for hardware time stamping and you migrate to the OVN-Kubernetes plugin, the hardware time stamping cannot be applied to primary interface devices, such as an Open vSwitch (OVS) bridge. As a result, UDP version 4 configurations cannot work with a br-ex interface.

You can configure linuxptp services and use PTP-capable hardware in OpenShift Container Platform cluster nodes.

Use the OpenShift Container Platform web console or OpenShift CLI (oc) to install PTP by deploying the PTP Operator. The PTP Operator creates and manages the linuxptp services and provides the following features:

  • Discovery of the PTP-capable devices in the cluster.
  • Management of the configuration of linuxptp services.
  • Notification of PTP clock events that negatively affect the performance and reliability of your application with the PTP Operator cloud-event-proxy sidecar.
Note

The PTP Operator works with PTP-capable devices on clusters provisioned only on bare-metal infrastructure.

15.1.1. Elements of a PTP domain

PTP is used to synchronize multiple nodes connected in a network, with clocks for each node. The clocks synchronized by PTP are organized in a leader-follower hierarchy. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. Follower clocks are synchronized to leader clocks, and follower clocks can themselves be the source for other downstream clocks.

Figure 15.1. PTP nodes in the network

The three primary types of PTP clocks are described below.

Grandmaster clock
The grandmaster clock provides standard time information to other clocks across the network and ensures accurate and stable synchronisation. It writes time stamps and responds to time requests from other clocks. Grandmaster clocks synchronize to a Global Navigation Satellite System (GNSS) time source. The Grandmaster clock is the authoritative source of time in the network and is responsible for providing time synchronization to all other devices.
Boundary clock
The boundary clock has ports in two or more communication paths and can be a source and a destination to other destination clocks at the same time. The boundary clock works as a destination clock upstream. The destination clock receives the timing message, adjusts for delay, and then creates a new source time signal to pass down the network. The boundary clock produces a new timing packet that is still correctly synced with the source clock and can reduce the number of connected devices reporting directly to the source clock.
Ordinary clock
The ordinary clock has a single port connection that can play the role of source or destination clock, depending on its position in the network. The ordinary clock can read and write timestamps.
15.1.1.1. Advantages of PTP over NTP

One of the main advantages that PTP has over NTP is the hardware support present in various network interface controllers (NIC) and network switches. The specialized hardware allows PTP to account for delays in message transfer and improves the accuracy of time synchronization. To achieve the best possible accuracy, it is recommended that all networking components between PTP clocks are PTP hardware enabled.

Hardware-based PTP provides optimal accuracy, since the NIC can timestamp the PTP packets at the exact moment they are sent and received. Compare this to software-based PTP, which requires additional processing of the PTP packets by the operating system.

Important

Before enabling PTP, ensure that NTP is disabled for the required nodes. You can disable the chrony time service (chronyd) using a MachineConfig custom resource. For more information, see Disabling chrony time service.

OpenShift Container Platform uses the PTP Operator with linuxptp and gpsd packages for high precision network synchronization. The linuxptp package provides tools and daemons for PTP timing in networks. Cluster hosts with Global Navigation Satellite System (GNSS) capable NICs use gpsd to interface with GNSS clock sources.

The linuxptp package includes the ts2phc, pmc, ptp4l, and phc2sys programs for system clock synchronization.

ts2phc

ts2phc synchronizes the PTP hardware clock (PHC) across PTP devices with a high degree of precision. ts2phc is used in grandmaster clock configurations. It receives the precision timing signal a high precision clock source such as Global Navigation Satellite System (GNSS). GNSS provides an accurate and reliable source of synchronized time for use in large distributed networks. GNSS clocks typically provide time information with a precision of a few nanoseconds.

The ts2phc system daemon sends timing information from the grandmaster clock to other PTP devices in the network by reading time information from the grandmaster clock and converting it to PHC format. PHC time is used by other devices in the network to synchronize their clocks with the grandmaster clock.

pmc
pmc implements a PTP management client (pmc) according to IEEE standard 1588.1588. pmc provides basic management access for the ptp4l system daemon. pmc reads from standard input and sends the output over the selected transport, printing any replies it receives.
ptp4l

ptp4l implements the PTP boundary clock and ordinary clock and runs as a system daemon. ptp4l does the following:

  • Synchronizes the PHC to the source clock with hardware time stamping
  • Synchronizes the system clock to the source clock with software time stamping
phc2sys
phc2sys synchronizes the system clock to the PHC on the network interface controller (NIC). The phc2sys system daemon continuously monitors the PHC for timing information. When it detects a timing error, the PHC corrects the system clock.

The gpsd package includes the ubxtool, gspipe, gpsd, programs for GNSS clock synchronization with the host clock.

ubxtool
ubxtool CLI allows you to communicate with a u-blox GPS system. The ubxtool CLI uses the u-blox binary protocol to communicate with the GPS.
gpspipe
gpspipe connects to gpsd output and pipes it to stdout.
gpsd
gpsd is a service daemon that monitors one or more GPS or AIS receivers connected to the host.

OpenShift Container Platform supports receiving precision PTP timing from Global Navigation Satellite System (GNSS) sources and grandmaster clocks (T-GM) in the cluster.

Important

OpenShift Container Platform supports PTP timing from GNSS sources with Intel E810 Westport Channel NICs only.

Figure 15.2. Overview of Synchronization with GNSS and T-GM

Global Navigation Satellite System (GNSS)

GNSS is a satellite-based system used to provide positioning, navigation, and timing information to receivers around the globe. In PTP, GNSS receivers are often used as a highly accurate and stable reference clock source. These receivers receive signals from multiple GNSS satellites, allowing them to calculate precise time information. The timing information obtained from GNSS is used as a reference by the PTP grandmaster clock.

By using GNSS as a reference, the grandmaster clock in the PTP network can provide highly accurate timestamps to other devices, enabling precise synchronization across the entire network.

Digital Phase-Locked Loop (DPLL)
DPLL provides clock synchronization between different PTP nodes in the network. DPLL compares the phase of the local system clock signal with the phase of the incoming synchronization signal, for example, PTP messages from the PTP grandmaster clock. The DPLL continuously adjusts the local clock frequency and phase to minimize the phase difference between the local clock and the reference clock.

A leap second is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) to keep it synchronized with International Atomic Time (TAI). UTC leap seconds are unpredictable. Internationally agreed leap seconds are listed in leap-seconds.list. This file is regularly updated by the International Earth Rotation and Reference Systems Service (IERS). An unhandled leap second can have a significant impact on far edge RAN networks. It can cause the far edge RAN application to immediately disconnect voice calls and data sessions.

Cloud native applications such as virtual RAN (vRAN) require access to notifications about hardware timing events that are critical to the functioning of the overall network. PTP clock synchronization errors can negatively affect the performance and reliability of your low-latency application, for example, a vRAN application running in a distributed unit (DU).

Loss of PTP synchronization is a critical error for a RAN network. If synchronization is lost on a node, the radio might be shut down and the network Over the Air (OTA) traffic might be shifted to another node in the wireless network. Fast event notifications mitigate against workload errors by allowing cluster nodes to communicate PTP clock sync status to the vRAN application running in the DU.

Event notifications are available to vRAN applications running on the same DU node. A publish/subscribe REST API passes events notifications to the messaging bus. Publish/subscribe messaging, or pub-sub messaging, is an asynchronous service-to-service communication architecture where any message published to a topic is immediately received by all of the subscribers to the topic.

The PTP Operator generates fast event notifications for every PTP-capable network interface. You can access the events by using a cloud-event-proxy sidecar container over an HTTP message bus.

Note

PTP fast event notifications are available for network interfaces configured to use PTP ordinary clocks, PTP grandmaster clocks, or PTP boundary clocks.

15.1.5. 2-card E810 NIC configuration reference

OpenShift Container Platform supports single and dual-NIC Intel E810 hardware for PTP timing in grandmaster clocks (T-GM) and boundary clocks (T-BC).

Dual NIC grandmaster clock

You can use a cluster host that has dual-NIC hardware as PTP grandmaster clock. One NIC receives timing information from the global navigation satellite system (GNSS). The second NIC receives the timing information from the first using the SMA1 Tx/Rx connections on the E810 NIC faceplate. The system clock on the cluster host is synchronized from the NIC that is connected to the GNSS satellite.

Dual NIC grandmaster clocks are a feature of distributed RAN (D-RAN) configurations where the Remote Radio Unit (RRU) and Baseband Unit (BBU) are located at the same radio cell site. D-RAN distributes radio functions across multiple sites, with backhaul connections linking them to the core network.

Figure 15.3. Dual NIC grandmaster clock

Note

In a dual-NIC T-GM configuration, a single ts2phc program operate on two PTP hardware clocks (PHCs), one for each NIC.

Dual NIC boundary clock

For 5G telco networks that deliver mid-band spectrum coverage, each virtual distributed unit (vDU) requires connections to 6 radio units (RUs). To make these connections, each vDU host requires 2 NICs configured as boundary clocks.

Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l instances for each NIC feeding the downstream clocks.

Highly available system clock with dual-NIC boundary clocks

You can configure Intel E810-XXVDA4 Salem channel dual-NIC hardware as dual PTP boundary clocks that provide timing for a highly available system clock. This configuration is useful when you have multiple time sources on different NICs. High availability ensures that the node does not lose timing synchronization if one of the two timing sources is lost or disconnected.

Each NIC is connected to the same upstream leader clock. Highly available boundary clocks use multiple PTP domains to synchronize with the target system clock. When a T-BC is highly available, the host system clock can maintain the correct offset even if one or more ptp4l instances syncing the NIC PHC clock fails. If any single SFP port or cable failure occurs, the boundary clock stays in sync with the leader clock.

Boundary clock leader source selection is done using the A-BMCA algorithm. For more information, see ITU-T recommendation G.8275.1.

15.1.6. 3-card Intel E810 PTP grandmaster clock

OpenShift Container Platform supports cluster hosts with 3 Intel E810 NICs as PTP grandmaster clocks (T-GM).

3-card grandmaster clock

You can use a cluster host that has 3 NICs as PTP grandmaster clock. One NIC receives timing information from the global navigation satellite system (GNSS). The second and third NICs receive the timing information from the first by using the SMA1 Tx/Rx connections on the E810 NIC faceplate. The system clock on the cluster host is synchronized from the NIC that is connected to the GNSS satellite.

3-card NIC grandmaster clocks can be used for distributed RAN (D-RAN) configurations where the Radio Unit (RU) is connected directly to the distributed unit (DU) without a front haul switch, for example, if the RU and DU are located in the same radio cell site. D-RAN distributes radio functions across multiple sites, with backhaul connections linking them to the core network.

Figure 15.4. 3-card Intel E810 PTP grandmaster clock

Note

In a 3-card T-GM configuration, a single ts2phc process reports as 3 ts2phc instances in the system.

15.2. Configuring PTP devices

The PTP Operator adds the NodePtpDevice.ptp.openshift.io custom resource definition (CRD) to OpenShift Container Platform.

When installed, the PTP Operator searches your cluster for Precision Time Protocol (PTP) capable network devices on each node. The Operator creates and updates a NodePtpDevice custom resource (CR) object for each node that provides a compatible PTP-capable network device.

Network interface controller (NIC) hardware with built-in PTP capabilities sometimes require a device-specific configuration. You can use hardware-specific NIC features for supported hardware with the PTP Operator by configuring a plugin in the PtpConfig custom resource (CR). The linuxptp-daemon service uses the named parameters in the plugin stanza to start linuxptp processes, ptp4l and phc2sys, based on the specific hardware configuration.

Important

In OpenShift Container Platform 4.16, the Intel E810 NIC is supported with a PtpConfig plugin.

15.2.1. Installing the PTP Operator using the CLI

As a cluster administrator, you can install the Operator by using the CLI.

Prerequisites

  • A cluster installed on bare-metal hardware with nodes that have hardware that supports PTP.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a namespace for the PTP Operator.

    1. Save the following YAML in the ptp-namespace.yaml file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-ptp
        annotations:
          workload.openshift.io/allowed: management
        labels:
          name: openshift-ptp
          openshift.io/cluster-monitoring: "true"
      Copy to Clipboard Toggle word wrap
    2. Create the Namespace CR:

      $ oc create -f ptp-namespace.yaml
      Copy to Clipboard Toggle word wrap
  2. Create an Operator group for the PTP Operator.

    1. Save the following YAML in the ptp-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: ptp-operators
        namespace: openshift-ptp
      spec:
        targetNamespaces:
        - openshift-ptp
      Copy to Clipboard Toggle word wrap
    2. Create the OperatorGroup CR:

      $ oc create -f ptp-operatorgroup.yaml
      Copy to Clipboard Toggle word wrap
  3. Subscribe to the PTP Operator.

    1. Save the following YAML in the ptp-sub.yaml file:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: ptp-operator-subscription
        namespace: openshift-ptp
      spec:
        channel: "stable"
        name: ptp-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      Copy to Clipboard Toggle word wrap
    2. Create the Subscription CR:

      $ oc create -f ptp-sub.yaml
      Copy to Clipboard Toggle word wrap
  4. To verify that the Operator is installed, enter the following command:

    $ oc get csv -n openshift-ptp -o custom-columns=Name:.metadata.name,Phase:.status.phase
    Copy to Clipboard Toggle word wrap

    Example output

    Name                         Phase
    4.16.0-202301261535          Succeeded
    Copy to Clipboard Toggle word wrap

As a cluster administrator, you can install the PTP Operator by using the web console.

Note

You have to create the namespace and Operator group as mentioned in the previous section.

Procedure

  1. Install the PTP Operator using the OpenShift Container Platform web console:

    1. In the OpenShift Container Platform web console, click OperatorsOperatorHub.
    2. Choose PTP Operator from the list of available Operators, and then click Install.
    3. On the Install Operator page, under A specific namespace on the cluster select openshift-ptp. Then, click Install.
  2. Optional: Verify that the PTP Operator installed successfully:

    1. Switch to the OperatorsInstalled Operators page.
    2. Ensure that PTP Operator is listed in the openshift-ptp project with a Status of InstallSucceeded.

      Note

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, to troubleshoot further:

      • Go to the OperatorsInstalled Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
      • Go to the WorkloadsPods page and check the logs for pods in the openshift-ptp project.

Identify PTP-capable network devices that exist in your cluster so that you can configure them

Prerequisties

  • You installed the PTP Operator.

Procedure

  • To return a complete list of PTP capable network devices in your cluster, run the following command:

    $ oc get NodePtpDevice -n openshift-ptp -o yaml
    Copy to Clipboard Toggle word wrap

    Example output

    apiVersion: v1
    items:
    - apiVersion: ptp.openshift.io/v1
      kind: NodePtpDevice
      metadata:
        creationTimestamp: "2022-01-27T15:16:28Z"
        generation: 1
        name: dev-worker-0 
    1
    
        namespace: openshift-ptp
        resourceVersion: "6538103"
        uid: d42fc9ad-bcbf-4590-b6d8-b676c642781a
      spec: {}
      status:
        devices: 
    2
    
        - name: eno1
        - name: eno2
        - name: eno3
        - name: eno4
        - name: enp5s0f0
        - name: enp5s0f1
    ...
    Copy to Clipboard Toggle word wrap

    1
    The value for the name parameter is the same as the name of the parent node.
    2
    The devices collection includes a list of the PTP capable devices that the PTP Operator discovers for the node.

You can configure the linuxptp services (ptp4l, phc2sys, ts2phc) as grandmaster clock (T-GM) by creating a PtpConfig custom resource (CR) that configures the host NIC.

The ts2phc utility allows you to synchronize the system clock with the PTP grandmaster clock so that the node can stream precision clock signal to downstream PTP ordinary clocks and boundary clocks.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as T-GM for an Intel Westport Channel E810-XXVDA4T network interface.

To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.

Prerequisites

  • For T-GM clocks in production environments, install an Intel E810 Westport Channel NIC in the bare-metal cluster host.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the PtpConfig CR. For example:

    1. Depending on your requirements, use one of the following T-GM configurations for your deployment. Save the YAML in the grandmaster-clock-ptp-config.yaml file:

      Example 15.1. PTP grandmaster clock configuration for E810 NIC

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: grandmaster
        namespace: openshift-ptp
        annotations: {}
      spec:
        profile:
          - name: "grandmaster"
            ptp4lOpts: "-2 --summary_interval -4"
            phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_master -n 24
            ptpSchedulingPolicy: SCHED_FIFO
            ptpSchedulingPriority: 10
            ptpSettings:
              logReduce: "true"
            plugins:
              e810:
                enableDefaultConfig: false
                settings:
                  LocalMaxHoldoverOffSet: 1500
                  LocalHoldoverTimeout: 14400
                  MaxInSpecOffset: 1500
                pins: $e810_pins
                #  "$iface_master":
                #    "U.FL2": "0 2"
                #    "U.FL1": "0 1"
                #    "SMA2": "0 2"
                #    "SMA1": "0 1"
                ublxCmds:
                  - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1
                      - "-P"
                      - "29.20"
                      - "-z"
                      - "CFG-HW-ANT_CFG_VOLTCTRL,1"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -e GPS
                      - "-P"
                      - "29.20"
                      - "-e"
                      - "GPS"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d Galileo
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "Galileo"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d GLONASS
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "GLONASS"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d BeiDou
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "BeiDou"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d SBAS
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "SBAS"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000
                      - "-P"
                      - "29.20"
                      - "-t"
                      - "-w"
                      - "5"
                      - "-v"
                      - "1"
                      - "-e"
                      - "SURVEYIN,600,50000"
                    reportOutput: true
                  - args: #ubxtool -P 29.20 -p MON-HW
                      - "-P"
                      - "29.20"
                      - "-p"
                      - "MON-HW"
                    reportOutput: true
                  - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248
                      - "-P"
                      - "29.20"
                      - "-p"
                      - "CFG-MSG,1,38,248"
                    reportOutput: true
            ts2phcOpts: " "
            ts2phcConf: |
              [nmea]
              ts2phc.master 1
              [global]
              use_syslog  0
              verbose 1
              logging_level 7
              ts2phc.pulsewidth 100000000
              #cat /dev/GNSS to find available serial port
              #example value of gnss_serialport is /dev/ttyGNSS_1700_0
              ts2phc.nmea_serialport $gnss_serialport
              [$iface_master]
              ts2phc.extts_polarity rising
              ts2phc.extts_correction 0
            ptp4lConf: |
              [$iface_master]
              masterOnly 1
              [$iface_master_1]
              masterOnly 1
              [$iface_master_2]
              masterOnly 1
              [$iface_master_3]
              masterOnly 1
              [global]
              #
              # Default Data Set
              #
              twoStepFlag 1
              priority1 128
              priority2 128
              domainNumber 24
              #utc_offset 37
              clockClass 6
              clockAccuracy 0x27
              offsetScaledLogVariance 0xFFFF
              free_running 0
              freq_est_interval 1
              dscp_event 0
              dscp_general 0
              dataset_comparison G.8275.x
              G.8275.defaultDS.localPriority 128
              #
              # Port Data Set
              #
              logAnnounceInterval -3
              logSyncInterval -4
              logMinDelayReqInterval -4
              logMinPdelayReqInterval 0
              announceReceiptTimeout 3
              syncReceiptTimeout 0
              delayAsymmetry 0
              fault_reset_interval -4
              neighborPropDelayThresh 20000000
              masterOnly 0
              G.8275.portDS.localPriority 128
              #
              # Run time options
              #
              assume_two_step 0
              logging_level 6
              path_trace_enabled 0
              follow_up_info 0
              hybrid_e2e 0
              inhibit_multicast_service 0
              net_sync_monitor 0
              tc_spanning_tree 0
              tx_timestamp_timeout 50
              unicast_listen 0
              unicast_master_table 0
              unicast_req_duration 3600
              use_syslog 1
              verbose 0
              summary_interval -4
              kernel_leap 1
              check_fup_sync 0
              clock_class_threshold 7
              #
              # Servo Options
              #
              pi_proportional_const 0.0
              pi_integral_const 0.0
              pi_proportional_scale 0.0
              pi_proportional_exponent -0.3
              pi_proportional_norm_max 0.7
              pi_integral_scale 0.0
              pi_integral_exponent 0.4
              pi_integral_norm_max 0.3
              step_threshold 2.0
              first_step_threshold 0.00002
              clock_servo pi
              sanity_freq_limit  200000000
              ntpshm_segment 0
              #
              # Transport options
              #
              transportSpecific 0x0
              ptp_dst_mac 01:1B:19:00:00:00
              p2p_dst_mac 01:80:C2:00:00:0E
              udp_ttl 1
              udp6_scope 0x0E
              uds_address /var/run/ptp4l
              #
              # Default interface options
              #
              clock_type BC
              network_transport L2
              delay_mechanism E2E
              time_stamping hardware
              tsproc_mode filter
              delay_filter moving_median
              delay_filter_length 10
              egressLatency 0
              ingressLatency 0
              boundary_clock_jbod 0
              #
              # Clock description
              #
              productDescription ;;
              revisionData ;;
              manufacturerIdentity 00:00:00
              userDescription ;
              timeSource 0x20
        recommend:
          - profile: "grandmaster"
            priority: 4
            match:
              - nodeLabel: "node-role.kubernetes.io/$mcp"
      Copy to Clipboard Toggle word wrap
      Note

      For E810 Westport Channel NICs, set the value for ts2phc.nmea_serialport to /dev/gnss0.

    2. Create the CR by running the following command:

      $ oc create -f grandmaster-clock-ptp-config.yaml
      Copy to Clipboard Toggle word wrap

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE
      linuxptp-daemon-74m2g         3/3     Running   3          4d15h   10.16.230.7    compute-1.example.com
      ptp-operator-5f4f48d7c-x7zkf  1/1     Running   1          4d15h   10.128.1.145   compute-1.example.com
      Copy to Clipboard Toggle word wrap

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
      Copy to Clipboard Toggle word wrap

      Example output

      ts2phc[94980.334]: [ts2phc.0.config] nmea delay: 98690975 ns
      ts2phc[94980.334]: [ts2phc.0.config] ens3f0 extts index 0 at 1676577329.999999999 corr 0 src 1676577330.901342528 diff -1
      ts2phc[94980.334]: [ts2phc.0.config] ens3f0 master offset         -1 s2 freq      -1
      ts2phc[94980.441]: [ts2phc.0.config] nmea sentence: GNRMC,195453.00,A,4233.24427,N,07126.64420,W,0.008,,160223,,,A,V
      phc2sys[94980.450]: [ptp4l.0.config] CLOCK_REALTIME phc offset       943 s2 freq  -89604 delay    504
      phc2sys[94980.512]: [ptp4l.0.config] CLOCK_REALTIME phc offset      1000 s2 freq  -89264 delay    474
      Copy to Clipboard Toggle word wrap

You can configure the linuxptp services (ptp4l, phc2sys, ts2phc) as a grandmaster clock (T-GM) for 2 E810 NICs by creating a PtpConfig custom resource (CR) that configures the NICs.

You can configure the linuxptp services as a T-GM for the following E810 NICs:

  • Intel E810-XXVDA4T Westport Channel NIC
  • Intel E810-CQDA2T Logan Beach NIC

For distributed RAN (D-RAN) use cases, you can configure PTP for 2 NICs as follows:

  • NIC 1 is synced to the global navigation satellite system (GNSS) time source.
  • NIC 2 is synced to the 1PPS timing output provided by NIC one. This configuration is provided by the PTP hardware plugin in the PtpConfig CR.

The 2-card PTP T-GM configuration uses one instance of ptp4l and one instance of ts2phc. The ptp4l and ts2phc programs are each configured to operate on two PTP hardware clocks (PHCs), one for each NIC. The host system clock is synchronized from the NIC that is connected to the GNSS time source.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as T-GM for dual Intel Westport Channel E810-XXVDA4T network interfaces.

To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.

Prerequisites

  • For T-GM clocks in production environments, install two Intel E810 Westport Channel NICs in the bare-metal cluster host.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the PtpConfig CR. For example:

    1. Save the following YAML in the grandmaster-clock-ptp-config-dual-nics.yaml file:

      Example 15.2. PTP grandmaster clock configuration for dual E810 NICs

      # In this example two cards $iface_nic1 and $iface_nic2 are connected via
      # SMA1 ports by a cable and $iface_nic2 receives 1PPS signals from $iface_nic1
      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: grandmaster
        namespace: openshift-ptp
        annotations: {}
      spec:
        profile:
          - name: "grandmaster"
            ptp4lOpts: "-2 --summary_interval -4"
            phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_nic1 -n 24
            ptpSchedulingPolicy: SCHED_FIFO
            ptpSchedulingPriority: 10
            ptpSettings:
              logReduce: "true"
            plugins:
              e810:
                enableDefaultConfig: false
                settings:
                  LocalMaxHoldoverOffSet: 1500
                  LocalHoldoverTimeout: 14400
                  MaxInSpecOffset: 1500
                pins: $e810_pins
                #  "$iface_nic1":
                #    "U.FL2": "0 2"
                #    "U.FL1": "0 1"
                #    "SMA2": "0 2"
                #    "SMA1": "2 1"
                #  "$iface_nic2":
                #    "U.FL2": "0 2"
                #    "U.FL1": "0 1"
                #    "SMA2": "0 2"
                #    "SMA1": "1 1"
                ublxCmds:
                  - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1
                      - "-P"
                      - "29.20"
                      - "-z"
                      - "CFG-HW-ANT_CFG_VOLTCTRL,1"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -e GPS
                      - "-P"
                      - "29.20"
                      - "-e"
                      - "GPS"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d Galileo
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "Galileo"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d GLONASS
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "GLONASS"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d BeiDou
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "BeiDou"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -d SBAS
                      - "-P"
                      - "29.20"
                      - "-d"
                      - "SBAS"
                    reportOutput: false
                  - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000
                      - "-P"
                      - "29.20"
                      - "-t"
                      - "-w"
                      - "5"
                      - "-v"
                      - "1"
                      - "-e"
                      - "SURVEYIN,600,50000"
                    reportOutput: true
                  - args: #ubxtool -P 29.20 -p MON-HW
                      - "-P"
                      - "29.20"
                      - "-p"
                      - "MON-HW"
                    reportOutput: true
                  - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248
                      - "-P"
                      - "29.20"
                      - "-p"
                      - "CFG-MSG,1,38,248"
                    reportOutput: true
            ts2phcOpts: " "
            ts2phcConf: |
              [nmea]
              ts2phc.master 1
              [global]
              use_syslog  0
              verbose 1
              logging_level 7
              ts2phc.pulsewidth 100000000
              #cat /dev/GNSS to find available serial port
              #example value of gnss_serialport is /dev/ttyGNSS_1700_0
              ts2phc.nmea_serialport $gnss_serialport
              [$iface_nic1]
              ts2phc.extts_polarity rising
              ts2phc.extts_correction 0
              [$iface_nic2]
              ts2phc.master 0
              ts2phc.extts_polarity rising
              #this is a measured value in nanoseconds to compensate for SMA cable delay
              ts2phc.extts_correction -10
            ptp4lConf: |
              [$iface_nic1]
              masterOnly 1
              [$iface_nic1_1]
              masterOnly 1
              [$iface_nic1_2]
              masterOnly 1
              [$iface_nic1_3]
              masterOnly 1
              [$iface_nic2]
              masterOnly 1
              [$iface_nic2_1]
              masterOnly 1
              [$iface_nic2_2]
              masterOnly 1
              [$iface_nic2_3]
              masterOnly 1
              [global]
              #
              # Default Data Set
              #
              twoStepFlag 1
              priority1 128
              priority2 128
              domainNumber 24
              #utc_offset 37
              clockClass 6
              clockAccuracy 0x27
              offsetScaledLogVariance 0xFFFF
              free_running 0
              freq_est_interval 1
              dscp_event 0
              dscp_general 0
              dataset_comparison G.8275.x
              G.8275.defaultDS.localPriority 128
              #
              # Port Data Set
              #
              logAnnounceInterval -3
              logSyncInterval -4
              logMinDelayReqInterval -4
              logMinPdelayReqInterval 0
              announceReceiptTimeout 3
              syncReceiptTimeout 0
              delayAsymmetry 0
              fault_reset_interval -4
              neighborPropDelayThresh 20000000
              masterOnly 0
              G.8275.portDS.localPriority 128
              #
              # Run time options
              #
              assume_two_step 0
              logging_level 6
              path_trace_enabled 0
              follow_up_info 0
              hybrid_e2e 0
              inhibit_multicast_service 0
              net_sync_monitor 0
              tc_spanning_tree 0
              tx_timestamp_timeout 50
              unicast_listen 0
              unicast_master_table 0
              unicast_req_duration 3600
              use_syslog 1
              verbose 0
              summary_interval -4
              kernel_leap 1
              check_fup_sync 0
              clock_class_threshold 7
              #
              # Servo Options
              #
              pi_proportional_const 0.0
              pi_integral_const 0.0
              pi_proportional_scale 0.0
              pi_proportional_exponent -0.3
              pi_proportional_norm_max 0.7
              pi_integral_scale 0.0
              pi_integral_exponent 0.4
              pi_integral_norm_max 0.3
              step_threshold 2.0
              first_step_threshold 0.00002
              clock_servo pi
              sanity_freq_limit  200000000
              ntpshm_segment 0
              #
              # Transport options
              #
              transportSpecific 0x0
              ptp_dst_mac 01:1B:19:00:00:00
              p2p_dst_mac 01:80:C2:00:00:0E
              udp_ttl 1
              udp6_scope 0x0E
              uds_address /var/run/ptp4l
              #
              # Default interface options
              #
              clock_type BC
              network_transport L2
              delay_mechanism E2E
              time_stamping hardware
              tsproc_mode filter
              delay_filter moving_median
              delay_filter_length 10
              egressLatency 0
              ingressLatency 0
              boundary_clock_jbod 1
              #
              # Clock description
              #
              productDescription ;;
              revisionData ;;
              manufacturerIdentity 00:00:00
              userDescription ;
              timeSource 0x20
        recommend:
          - profile: "grandmaster"
            priority: 4
            match:
              - nodeLabel: "node-role.kubernetes.io/$mcp"
      Copy to Clipboard Toggle word wrap
      Note

      For E810 Westport Channel NICs, set the value for ts2phc.nmea_serialport to /dev/gnss0.

    2. Create the CR by running the following command:

      $ oc create -f grandmaster-clock-ptp-config-dual-nics.yaml
      Copy to Clipboard Toggle word wrap

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE
      linuxptp-daemon-74m2g         3/3     Running   3          4d15h   10.16.230.7    compute-1.example.com
      ptp-operator-5f4f48d7c-x7zkf  1/1     Running   1          4d15h   10.128.1.145   compute-1.example.com
      Copy to Clipboard Toggle word wrap

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
      Copy to Clipboard Toggle word wrap

      Example output

      ts2phc[509863.660]: [ts2phc.0.config] nmea delay: 347527248 ns
      ts2phc[509863.660]: [ts2phc.0.config] ens2f0 extts index 0 at 1705516553.000000000 corr 0 src 1705516553.652499081 diff 0
      ts2phc[509863.660]: [ts2phc.0.config] ens2f0 master offset          0 s2 freq      -0
      I0117 18:35:16.000146 1633226 stats.go:57] state updated for ts2phc =s2
      I0117 18:35:16.000163 1633226 event.go:417] dpll State s2, gnss State s2, tsphc state s2, gm state s2,
      ts2phc[1705516516]:[ts2phc.0.config] ens2f0 nmea_status 1 offset 0 s2
      GM[1705516516]:[ts2phc.0.config] ens2f0 T-GM-STATUS s2
      ts2phc[509863.677]: [ts2phc.0.config] ens7f0 extts index 0 at 1705516553.000000010 corr -10 src 1705516553.652499081 diff 0
      ts2phc[509863.677]: [ts2phc.0.config] ens7f0 master offset          0 s2 freq      -0
      I0117 18:35:16.016597 1633226 stats.go:57] state updated for ts2phc =s2
      phc2sys[509863.719]: [ptp4l.0.config] CLOCK_REALTIME phc offset        -6 s2 freq  +15441 delay    510
      phc2sys[509863.782]: [ptp4l.0.config] CLOCK_REALTIME phc offset        -7 s2 freq  +15438 delay    502
      Copy to Clipboard Toggle word wrap

You can configure the linuxptp services (ptp4l, phc2sys, ts2phc) as a grandmaster clock (T-GM) for 3 E810 NICs by creating a PtpConfig custom resource (CR) that configures the NICs.

You can configure the linuxptp services as a T-GM with 3 NICs for the following E810 NICs:

  • Intel E810-XXVDA4T Westport Channel NIC
  • Intel E810-CQDA2T Logan Beach NIC

For distributed RAN (D-RAN) use cases, you can configure PTP for 3 NICs as follows:

  • NIC 1 is synced to the Global Navigation Satellite System (GNSS)
  • NICs 2 and 3 are synced to NIC 1 with 1PPS faceplate connections

Use the following example PtpConfig CRs as the basis to configure linuxptp services as a 3-card Intel E810 T-GM.

Prerequisites

  • For T-GM clocks in production environments, install 3 Intel E810 NICs in the bare-metal cluster host.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the PtpConfig CR. For example:

    1. Save the following YAML in the three-nic-grandmaster-clock-ptp-config.yaml file:

      Example 15.3. PTP grandmaster clock configuration for 3 E810 NICs

      # In this example, the three cards are connected via SMA cables:
      # - $iface_timeTx1 has the GNSS signal input
      # - $iface_timeTx2 SMA1 is connected to $iface_timeTx1 SMA1
      # - $iface_timeTx3 SMA1 is connected to $iface_timeTx1 SMA2
      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: gm-3card
        namespace: openshift-ptp
        annotations:
          ran.openshift.io/ztp-deploy-wave: "10"
      spec:
        profile:
        - name: grandmaster
          ptp4lOpts: -2 --summary_interval -4
          phc2sysOpts: -r -u 0 -m -N 8 -R 16 -s $iface_timeTx1 -n 24
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptpSettings:
            logReduce: "true"
          plugins:
            e810:
              enableDefaultConfig: false
              settings:
                LocalHoldoverTimeout: 14400
                LocalMaxHoldoverOffSet: 1500
                MaxInSpecOffset: 1500
              pins:
                # Syntax guide:
                # - The 1st number in each pair must be one of:
                #    0 - Disabled
                #    1 - RX
                #    2 - TX
                # - The 2nd number in each pair must match the channel number
                $iface_timeTx1:
                  SMA1: 2 1
                  SMA2: 2 2
                  U.FL1: 0 1
                  U.FL2: 0 2
                $iface_timeTx2:
                  SMA1: 1 1
                  SMA2: 0 2
                  U.FL1: 0 1
                  U.FL2: 0 2
                $iface_timeTx3:
                  SMA1: 1 1
                  SMA2: 0 2
                  U.FL1: 0 1
                  U.FL2: 0 2
              ublxCmds:
                - args: #ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1
                    - "-P"
                    - "29.20"
                    - "-z"
                    - "CFG-HW-ANT_CFG_VOLTCTRL,1"
                  reportOutput: false
                - args: #ubxtool -P 29.20 -e GPS
                    - "-P"
                    - "29.20"
                    - "-e"
                    - "GPS"
                  reportOutput: false
                - args: #ubxtool -P 29.20 -d Galileo
                    - "-P"
                    - "29.20"
                    - "-d"
                    - "Galileo"
                  reportOutput: false
                - args: #ubxtool -P 29.20 -d GLONASS
                    - "-P"
                    - "29.20"
                    - "-d"
                    - "GLONASS"
                  reportOutput: false
                - args: #ubxtool -P 29.20 -d BeiDou
                    - "-P"
                    - "29.20"
                    - "-d"
                    - "BeiDou"
                  reportOutput: false
                - args: #ubxtool -P 29.20 -d SBAS
                    - "-P"
                    - "29.20"
                    - "-d"
                    - "SBAS"
                  reportOutput: false
                - args: #ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000
                    - "-P"
                    - "29.20"
                    - "-t"
                    - "-w"
                    - "5"
                    - "-v"
                    - "1"
                    - "-e"
                    - "SURVEYIN,600,50000"
                  reportOutput: true
                - args: #ubxtool -P 29.20 -p MON-HW
                    - "-P"
                    - "29.20"
                    - "-p"
                    - "MON-HW"
                  reportOutput: true
                - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248
                    - "-P"
                    - "29.20"
                    - "-p"
                    - "CFG-MSG,1,38,248"
                  reportOutput: true
          ts2phcOpts: " "
          ts2phcConf: |
            [nmea]
            ts2phc.master 1
            [global]
            use_syslog  0
            verbose 1
            logging_level 7
            ts2phc.pulsewidth 100000000
            #example value of nmea_serialport is /dev/gnss0
            ts2phc.nmea_serialport (?<gnss_serialport>[/\w\s/]+)
            leapfile /usr/share/zoneinfo/leap-seconds.list
            [$iface_timeTx1]
            ts2phc.extts_polarity rising
            ts2phc.extts_correction 0
            [$iface_timeTx2]
            ts2phc.master 0
            ts2phc.extts_polarity rising
            #this is a measured value in nanoseconds to compensate for SMA cable delay
            ts2phc.extts_correction -10
            [$iface_timeTx3]
            ts2phc.master 0
            ts2phc.extts_polarity rising
            #this is a measured value in nanoseconds to compensate for SMA cable delay
            ts2phc.extts_correction -10
          ptp4lConf: |
            [$iface_timeTx1]
            masterOnly 1
            [$iface_timeTx1_1]
            masterOnly 1
            [$iface_timeTx1_2]
            masterOnly 1
            [$iface_timeTx1_3]
            masterOnly 1
            [$iface_timeTx2]
            masterOnly 1
            [$iface_timeTx2_1]
            masterOnly 1
            [$iface_timeTx2_2]
            masterOnly 1
            [$iface_timeTx2_3]
            masterOnly 1
            [$iface_timeTx3]
            masterOnly 1
            [$iface_timeTx3_1]
            masterOnly 1
            [$iface_timeTx3_2]
            masterOnly 1
            [$iface_timeTx3_3]
            masterOnly 1
            [global]
            #
            # Default Data Set
            #
            twoStepFlag 1
            priority1 128
            priority2 128
            domainNumber 24
            #utc_offset 37
            clockClass 6
            clockAccuracy 0x27
            offsetScaledLogVariance 0xFFFF
            free_running 0
            freq_est_interval 1
            dscp_event 0
            dscp_general 0
            dataset_comparison G.8275.x
            G.8275.defaultDS.localPriority 128
            #
            # Port Data Set
            #
            logAnnounceInterval -3
            logSyncInterval -4
            logMinDelayReqInterval -4
            logMinPdelayReqInterval 0
            announceReceiptTimeout 3
            syncReceiptTimeout 0
            delayAsymmetry 0
            fault_reset_interval -4
            neighborPropDelayThresh 20000000
            masterOnly 0
            G.8275.portDS.localPriority 128
            #
            # Run time options
            #
            assume_two_step 0
            logging_level 6
            path_trace_enabled 0
            follow_up_info 0
            hybrid_e2e 0
            inhibit_multicast_service 0
            net_sync_monitor 0
            tc_spanning_tree 0
            tx_timestamp_timeout 50
            unicast_listen 0
            unicast_master_table 0
            unicast_req_duration 3600
            use_syslog 1
            verbose 0
            summary_interval -4
            kernel_leap 1
            check_fup_sync 0
            clock_class_threshold 7
            #
            # Servo Options
            #
            pi_proportional_const 0.0
            pi_integral_const 0.0
            pi_proportional_scale 0.0
            pi_proportional_exponent -0.3
            pi_proportional_norm_max 0.7
            pi_integral_scale 0.0
            pi_integral_exponent 0.4
            pi_integral_norm_max 0.3
            step_threshold 2.0
            first_step_threshold 0.00002
            clock_servo pi
            sanity_freq_limit 200000000
            ntpshm_segment 0
            #
            # Transport options
            #
            transportSpecific 0x0
            ptp_dst_mac 01:1B:19:00:00:00
            p2p_dst_mac 01:80:C2:00:00:0E
            udp_ttl 1
            udp6_scope 0x0E
            uds_address /var/run/ptp4l
            #
            # Default interface options
            #
            clock_type BC
            network_transport L2
            delay_mechanism E2E
            time_stamping hardware
            tsproc_mode filter
            delay_filter moving_median
            delay_filter_length 10
            egressLatency 0
            ingressLatency 0
            boundary_clock_jbod 1
            #
            # Clock description
            #
            productDescription ;;
            revisionData ;;
            manufacturerIdentity 00:00:00
            userDescription ;
            timeSource 0x20
          ptpClockThreshold:
            holdOverTimeout: 5
            maxOffsetThreshold: 1500
            minOffsetThreshold: -1500
        recommend:
        - profile: grandmaster
          priority: 4
          match:
          - nodeLabel: node-role.kubernetes.io/$mcp
      Copy to Clipboard Toggle word wrap
      Note

      Set the value for ts2phc.nmea_serialport to /dev/gnss0.

    2. Create the CR by running the following command:

      $ oc create -f three-nic-grandmaster-clock-ptp-config.yaml
      Copy to Clipboard Toggle word wrap

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE
      linuxptp-daemon-74m3q         3/3     Running   3          4d15h   10.16.230.7    compute-1.example.com
      ptp-operator-5f4f48d7c-x6zkn  1/1     Running   1          4d15h   10.128.1.145   compute-1.example.com
      Copy to Clipboard Toggle word wrap

    2. Check that the profile is correct. Run the following command, and examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile:

      $ oc logs linuxptp-daemon-74m3q -n openshift-ptp -c linuxptp-daemon-container
      Copy to Clipboard Toggle word wrap

      Example output

      ts2phc[2527.586]: [ts2phc.0.config:7] adding tstamp 1742826342.000000000 to clock /dev/ptp11 
      1
      
      ts2phc[2527.586]: [ts2phc.0.config:7] adding tstamp 1742826342.000000000 to clock /dev/ptp7 
      2
      
      ts2phc[2527.586]: [ts2phc.0.config:7] adding tstamp 1742826342.000000000 to clock /dev/ptp14 
      3
      
      ts2phc[2527.586]: [ts2phc.0.config:7] nmea delay: 56308811 ns
      ts2phc[2527.586]: [ts2phc.0.config:6] /dev/ptp14 offset          0 s2 freq      +0 
      4
      
      ts2phc[2527.587]: [ts2phc.0.config:6] /dev/ptp7 offset          0 s2 freq      +0 
      5
      
      ts2phc[2527.587]: [ts2phc.0.config:6] /dev/ptp11 offset          0 s2 freq      -0 
      6
      
      I0324 14:25:05.000439  106907 stats.go:61] state updated for ts2phc =s2
      I0324 14:25:05.000504  106907 event.go:419] dpll State s2, gnss State s2, tsphc state s2, gm state s2,
      I0324 14:25:05.000906  106907 stats.go:61] state updated for ts2phc =s2
      I0324 14:25:05.001059  106907 stats.go:61] state updated for ts2phc =s2
      ts2phc[1742826305]:[ts2phc.0.config] ens4f0 nmea_status 1 offset 0 s2
      GM[1742826305]:[ts2phc.0.config] ens4f0 T-GM-STATUS s2 
      7
      Copy to Clipboard Toggle word wrap

      1 2 3
      ts2phc is updating the PTP hardware clock.
      4 5 6
      Estimated PTP device offset between PTP device and the reference clock is 0 nanoseconds. The PTP device is in sync with the leader clock.
      7
      T-GM is in a locked state (s2).

The following reference information describes the configuration options for the PtpConfig custom resource (CR) that configures the linuxptp services (ptp4l, phc2sys, ts2phc) as a grandmaster clock.

Expand
Table 15.1. PtpConfig configuration options for PTP Grandmaster clock
PtpConfig CR fieldDescription

plugins

Specify an array of .exec.cmdline options that configure the NIC for grandmaster clock operation. Grandmaster clock configuration requires certain PTP pins to be disabled.

The plugin mechanism allows the PTP Operator to do automated hardware configuration. For the Intel Westport Channel NIC, when enableDefaultConfig is true, The PTP Operator runs a hard-coded script to do the required configuration for the NIC.

ptp4lOpts

Specify system configuration options for the ptp4l service. The options should not include the network interface name -i <interface> and service config file -f /etc/ptp4l.conf because the network interface name and the service config file are automatically appended.

ptp4lConf

Specify the required configuration to start ptp4l as a grandmaster clock. For example, the ens2f1 interface synchronizes downstream connected devices. For grandmaster clocks, set clockClass to 6 and set clockAccuracy to 0x27. Set timeSource to 0x20 for when receiving the timing signal from a Global navigation satellite system (GNSS).

tx_timestamp_timeout

Specify the maximum amount of time to wait for the transmit (TX) timestamp from the sender before discarding the data.

boundary_clock_jbod

Specify the JBOD boundary clock time delay value. This value is used to correct the time values that are passed between the network time devices.

phc2sysOpts

Specify system config options for the phc2sys service. If this field is empty the PTP Operator does not start the phc2sys service.

Note

Ensure that the network interface listed here is configured as grandmaster and is referenced as required in the ts2phcConf and ptp4lConf fields.

ptpSchedulingPolicy

Configure the scheduling policy for ptp4l and phc2sys processes. Default value is SCHED_OTHER. Use SCHED_FIFO on systems that support FIFO scheduling.

ptpSchedulingPriority

Set an integer value from 1-65 to configure FIFO priority for ptp4l and phc2sys processes when ptpSchedulingPolicy is set to SCHED_FIFO. The ptpSchedulingPriority field is not used when ptpSchedulingPolicy is set to SCHED_OTHER.

ptpClockThreshold

Optional. If ptpClockThreshold stanza is not present, default values are used for ptpClockThreshold fields. Stanza shows default ptpClockThreshold values. ptpClockThreshold values configure how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

ts2phcConf

Sets the configuration for the ts2phc command.

leapfile is the default path to the current leap seconds definition file in the PTP Operator container image.

ts2phc.nmea_serialport is the serial port device that is connected to the NMEA GPS clock source. When configured, the GNSS receiver is accessible on /dev/gnss<id>. If the host has multiple GNSS receivers, you can find the correct device by enumerating either of the following devices:

  • /sys/class/net/<eth_port>/device/gnss/
  • /sys/class/gnss/gnss<id>/device/

ts2phcOpts

Set options for the ts2phc command.

recommend

Specify an array of one or more recommend objects that define rules on how the profile should be applied to nodes.

.recommend.profile

Specify the .recommend.profile object name that is defined in the profile section.

.recommend.priority

Specify the priority with an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10. If a node can be matched with multiple profiles according to rules defined in the match field, the profile with the higher priority is applied to that node.

.recommend.match

Specify .recommend.match rules with nodeLabel or nodeName values.

.recommend.match.nodeLabel

Set nodeLabel with the key of the node.Labels field from the node object by using the oc get nodes --show-labels command. For example, node-role.kubernetes.io/worker.

.recommend.match.nodeName

Set nodeName with the value of the node.Name field from the node object by using the oc get nodes command. For example, compute-1.example.com.

The following table describes the PTP grandmaster clock (T-GM) gm.ClockClass states. Clock class states categorize T-GM clocks based on their accuracy and stability with regard to the Primary Reference Time Clock (PRTC) or other timing source.

Holdover specification is the amount of time a PTP clock can maintain synchronization without receiving updates from the primary time source.

Expand
Table 15.2. T-GM clock class states
Clock class stateDescription

gm.ClockClass 6

T-GM clock is connected to a PRTC in LOCKED mode. For example, the PRTC is traceable to a GNSS time source.

gm.ClockClass 7

T-GM clock is in HOLDOVER mode, and within holdover specification. The clock source might not be traceable to a category 1 frequency source.

gm.ClockClass 248

T-GM clock is in FREERUN mode.

For more information, see "Phase/time traceability information", ITU-T G.8275.1/Y.1369.1 Recommendations.

Use this information to understand how to use the Intel E810-XXVDA4T hardware plugin to configure the E810 network interface as PTP grandmaster clock. Hardware pin configuration determines how the network interface interacts with other components and devices in the system. The E810-XXVDA4T NIC has four connectors for external 1PPS signals: SMA1, SMA2, U.FL1, and U.FL2.

Expand
Table 15.3. Intel E810 NIC hardware connectors configuration
Hardware pinRecommended settingDescription

U.FL1

0 1

Disables the U.FL1 connector input. The U.FL1 connector is output-only.

U.FL2

0 2

Disables the U.FL2 connector output. The U.FL2 connector is input-only.

SMA1

0 1

Disables the SMA1 connector input. The SMA1 connector is bidirectional.

SMA2

0 2

Disables the SMA2 connector output. The SMA2 connector is bidirectional.

Note

SMA1 and U.FL1 connectors share channel one. SMA2 and U.FL2 connectors share channel two.

Set spec.profile.plugins.e810.ublxCmds parameters to configure the GNSS clock in the PtpConfig custom resource (CR).

Important

You must configure an offset value to compensate for T-GM GPS antenna cable signal delay. To configure the optimal T-GM antenna offset value, make precise measurements of the GNSS antenna cable signal delay. Red Hat cannot assist in this measurement or provide any values for the required delay offsets.

Each of these ublxCmds stanzas correspond to a configuration that is applied to the host NIC by using ubxtool commands. For example:

ublxCmds:
  - args:
      - "-P"
      - "29.20"
      - "-z"
      - "CFG-HW-ANT_CFG_VOLTCTRL,1"
      - "-z"
      - "CFG-TP-ANT_CABLEDELAY,<antenna_delay_offset>"
1

    reportOutput: false
Copy to Clipboard Toggle word wrap
1
Measured T-GM antenna delay offset in nanoseconds. To get the required delay offset value, you must measure the cable delay using external test equipment.

The following table describes the equivalent ubxtool commands:

Expand
Table 15.4. Intel E810 ublxCmds configuration
ubxtool commandDescription

ubxtool -P 29.20 -z CFG-HW-ANT_CFG_VOLTCTRL,1 -z CFG-TP-ANT_CABLEDELAY,<antenna_delay_offset>

Enables antenna voltage control, allows antenna status to be reported in the UBX-MON-RF and UBX-INF-NOTICE log messages, and sets a <antenna_delay_offset> value in nanoseconds that offsets the GPS antenna cable signal delay.

ubxtool -P 29.20 -e GPS

Enables the antenna to receive GPS signals.

ubxtool -P 29.20 -d Galileo

Configures the antenna to receive signal from the Galileo GPS satellite.

ubxtool -P 29.20 -d GLONASS

Disables the antenna from receiving signal from the GLONASS GPS satellite.

ubxtool -P 29.20 -d BeiDou

Disables the antenna from receiving signal from the BeiDou GPS satellite.

ubxtool -P 29.20 -d SBAS

Disables the antenna from receiving signal from the SBAS GPS satellite.

ubxtool -P 29.20 -t -w 5 -v 1 -e SURVEYIN,600,50000

Configures the GNSS receiver survey-in process to improve its initial position estimate. This can take up to 24 hours to achieve an optimal result.

ubxtool -P 29.20 -p MON-HW

Runs a single automated scan of the hardware and reports on the NIC state and configuration settings.

Use this information to understand how to use the Intel E810-XXVDA4T hardware plugin to configure a pair of E810 network interfaces as PTP grandmaster clock (T-GM).

Before you configure the dual-NIC cluster host, you must connect the two NICs with an SMA1 cable using the 1PPS faceplace connections.

When you configure a dual-NIC T-GM, you need to compensate for the 1PPS signal delay that occurs when you connect the NICs using the SMA1 connection ports. Various factors such as cable length, ambient temperature, and component and manufacturing tolerances can affect the signal delay. To compensate for the delay, you must calculate the specific value that you use to offset the signal delay.

Expand
Table 15.5. E810 dual-NIC T-GM PtpConfig CR reference
PtpConfig fieldDescription

spec.profile.plugins.e810.pins

Configure the E810 hardware pins using the PTP Operator E810 hardware plugin.

  • Pin 2 1 enables the 1PPS OUT connection for SMA1 on NIC one.
  • Pin 1 1 enables the 1PPS IN connection for SMA1 on NIC two.

spec.profile.ts2phcConf

Use the ts2phcConf field to configure parameters for NIC one and NIC two. Set ts2phc.master 0 for NIC two. This configures the timing source for NIC two from the 1PPS input, not GNSS. Configure the ts2phc.extts_correction value for NIC two to compensate for the delay that is incurred for the specific SMA cable and cable length that you use. The value that you configure depends on your specific measurements and SMA1 cable length.

spec.profile.ptp4lConf

Set the value of boundary_clock_jbod to 1 to enable support for multiple NICs.

15.2.5.4. 3-card E810 NIC configuration reference

Use this information to understand how to configure 3 E810 NICs as PTP grandmaster clock (T-GM).

Before you configure the 3-card cluster host, you must connect the 3 NICs by using the 1PPS faceplate connections. The primary NIC 1PPS_out outputs feed the other 2 NICs.

When you configure a 3-card T-GM, you need to compensate for the 1PPS signal delay that occurs when you connect the NICs by using the SMA1 connection ports. Various factors such as cable length, ambient temperature, and component and manufacturing tolerances can affect the signal delay. To compensate for the delay, you must calculate the specific value that you use to offset the signal delay.

Expand
Table 15.6. 3-card E810 T-GM PtpConfig CR reference
PtpConfig fieldDescription

spec.profile.plugins.e810.pins

Configure the E810 hardware pins with the PTP Operator E810 hardware plugin.

  • $iface_timeTx1.SMA1 enables the 1PPS OUT connection for SMA1 on NIC 1.
  • $iface_timeTx1.SMA2 enables the 1PPS OUT connection for SMA2 on NIC 1.
  • $iface_timeTx2.SMA1 and $iface_timeTx3.SMA1 enables the 1PPS IN connection for SMA1 on NIC 2 and NIC 3.
  • $iface_timeTx2.SMA2 and $iface_timeTx3.SMA2 disables the SMA2 connection on NIC 2 and NIC 3.

spec.profile.ts2phcConf

Use the ts2phcConf field to configure parameters for the NICs. Set ts2phc.master 0 for NIC 2 and NIC 3. This configures the timing source for NIC 2 and NIC 3 from the 1PPS input, not GNSS. Configure the ts2phc.extts_correction value for NIC 2 and NIC 3 to compensate for the delay that is incurred for the specific SMA cable and cable length that you use. The value that you configure depends on your specific measurements and SMA1 cable length.

spec.profile.ptp4lConf

Set the value of boundary_clock_jbod to 1 to enable support for multiple NICs.

The PTP Operator container image includes the latest leap-seconds.list file that is available at the time of release. You can configure the PTP Operator to automatically update the leap second file by using Global Positioning System (GPS) announcements.

Leap second information is stored in an automatically generated ConfigMap resource named leap-configmap in the openshift-ptp namespace. The PTP Operator mounts the leap-configmap resource as a volume in the linuxptp-daemon pod that is accessible by the ts2phc process.

If the GPS satellite broadcasts new leap second data, the PTP Operator updates the leap-configmap resource with the new data. The ts2phc process picks up the changes automatically.

Note

The following procedure is provided as reference. The 4.16 version of the PTP Operator enables automatic leap second management by default.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed the PTP Operator and configured a PTP grandmaster clock (T-GM) in the cluster.

Procedure

  1. Configure automatic leap second handling in the phc2sysOpts section of the PtpConfig CR. Set the following options:

    phc2sysOpts: -r -u 0 -m -N 8 -R 16 -S 2 -s ens2f0 -n 24 
    1
    Copy to Clipboard Toggle word wrap
    Note

    Previously, the T-GM required an offset adjustment in the phc2sys configuration (-O -37) to account for historical leap seconds. This is no longer needed.

  2. Configure the Intel e810 NIC to enable periodical reporting of NAV-TIMELS messages by the GPS receiver in the spec.profile.plugins.e810.ublxCmds section of the PtpConfig CR. For example:

    - args: #ubxtool -P 29.20 -p CFG-MSG,1,38,248
        - "-P"
        - "29.20"
        - "-p"
        - "CFG-MSG,1,38,248"
    Copy to Clipboard Toggle word wrap

Verification

  1. Validate that the configured T-GM is receiving NAV-TIMELS messages from the connected GPS. Run the following command:

    $ oc -n openshift-ptp -c linuxptp-daemon-container exec -it $(oc -n openshift-ptp get pods -o name | grep daemon) -- ubxtool -t -p NAV-TIMELS -P 29.20
    Copy to Clipboard Toggle word wrap

    Example output

    1722509534.4417
    UBX-NAV-STATUS:
      iTOW 384752000 gpsFix 5 flags 0xdd fixStat 0x0 flags2 0x8
      ttff 18261, msss 1367642864
    
    1722509534.4419
    UBX-NAV-TIMELS:
      iTOW 384752000 version 0 reserved2 0 0 0 srcOfCurrLs 2
      currLs 18 srcOfLsChange 2 lsChange 0 timeToLsEvent 70376866
      dateOfLsGpsWn 2441 dateOfLsGpsDn 7 reserved2 0 0 0
      valid x3
    
    1722509534.4421
    UBX-NAV-CLOCK:
      iTOW 384752000 clkB 784281 clkD 435 tAcc 3 fAcc 215
    
    1722509535.4477
    UBX-NAV-STATUS:
      iTOW 384753000 gpsFix 5 flags 0xdd fixStat 0x0 flags2 0x8
      ttff 18261, msss 1367643864
    
    1722509535.4479
    UBX-NAV-CLOCK:
      iTOW 384753000 clkB 784716 clkD 435 tAcc 3 fAcc 218
    Copy to Clipboard Toggle word wrap

  2. Validate that the leap-configmap resource has been successfully generated by the PTP Operator and is up to date with the latest version of the leap-seconds.list. Run the following command:

    $ oc -n openshift-ptp get configmap leap-configmap -o jsonpath='{.data.<node_name>}' 
    1
    Copy to Clipboard Toggle word wrap
    1 1
    Replace <node_name> with the node where you have installed and configured the PTP T-GM clock with automatic leap second management. Escape special characters in the node name. For example, node-1\.example\.com.

    Example output

    # Do not edit
    # This file is generated automatically by linuxptp-daemon
    #$  3913697179
    #@  4291747200
    2272060800     10    # 1 Jan 1972
    2287785600     11    # 1 Jul 1972
    2303683200     12    # 1 Jan 1973
    2335219200     13    # 1 Jan 1974
    2366755200     14    # 1 Jan 1975
    2398291200     15    # 1 Jan 1976
    2429913600     16    # 1 Jan 1977
    2461449600     17    # 1 Jan 1978
    2492985600     18    # 1 Jan 1979
    2524521600     19    # 1 Jan 1980
    2571782400     20    # 1 Jul 1981
    2603318400     21    # 1 Jul 1982
    2634854400     22    # 1 Jul 1983
    2698012800     23    # 1 Jul 1985
    2776982400     24    # 1 Jan 1988
    2840140800     25    # 1 Jan 1990
    2871676800     26    # 1 Jan 1991
    2918937600     27    # 1 Jul 1992
    2950473600     28    # 1 Jul 1993
    2982009600     29    # 1 Jul 1994
    3029443200     30    # 1 Jan 1996
    3076704000     31    # 1 Jul 1997
    3124137600     32    # 1 Jan 1999
    3345062400     33    # 1 Jan 2006
    3439756800     34    # 1 Jan 2009
    3550089600     35    # 1 Jul 2012
    3644697600     36    # 1 Jul 2015
    3692217600     37    # 1 Jan 2017
    
    #h  e65754d4 8f39962b aa854a61 661ef546 d2af0bfa
    Copy to Clipboard Toggle word wrap

You can configure the linuxptp services (ptp4l, phc2sys) as boundary clock by creating a PtpConfig custom resource (CR) object.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as the boundary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the following PtpConfig CR, and then save the YAML in the boundary-clock-ptp-config.yaml file.

    Example PTP boundary clock configuration

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: boundary-clock
      namespace: openshift-ptp
      annotations: {}
    spec:
      profile:
        - name: boundary-clock
          ptp4lOpts: "-2"
          phc2sysOpts: "-a -r -n 24"
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptpSettings:
            logReduce: "true"
          ptp4lConf: |
            # The interface name is hardware-specific
            [$iface_slave]
            masterOnly 0
            [$iface_master_1]
            masterOnly 1
            [$iface_master_2]
            masterOnly 1
            [$iface_master_3]
            masterOnly 1
            [global]
            #
            # Default Data Set
            #
            twoStepFlag 1
            slaveOnly 0
            priority1 128
            priority2 128
            domainNumber 24
            #utc_offset 37
            clockClass 248
            clockAccuracy 0xFE
            offsetScaledLogVariance 0xFFFF
            free_running 0
            freq_est_interval 1
            dscp_event 0
            dscp_general 0
            dataset_comparison G.8275.x
            G.8275.defaultDS.localPriority 128
            #
            # Port Data Set
            #
            logAnnounceInterval -3
            logSyncInterval -4
            logMinDelayReqInterval -4
            logMinPdelayReqInterval -4
            announceReceiptTimeout 3
            syncReceiptTimeout 0
            delayAsymmetry 0
            fault_reset_interval -4
            neighborPropDelayThresh 20000000
            masterOnly 0
            G.8275.portDS.localPriority 128
            #
            # Run time options
            #
            assume_two_step 0
            logging_level 6
            path_trace_enabled 0
            follow_up_info 0
            hybrid_e2e 0
            inhibit_multicast_service 0
            net_sync_monitor 0
            tc_spanning_tree 0
            tx_timestamp_timeout 50
            unicast_listen 0
            unicast_master_table 0
            unicast_req_duration 3600
            use_syslog 1
            verbose 0
            summary_interval 0
            kernel_leap 1
            check_fup_sync 0
            clock_class_threshold 135
            #
            # Servo Options
            #
            pi_proportional_const 0.0
            pi_integral_const 0.0
            pi_proportional_scale 0.0
            pi_proportional_exponent -0.3
            pi_proportional_norm_max 0.7
            pi_integral_scale 0.0
            pi_integral_exponent 0.4
            pi_integral_norm_max 0.3
            step_threshold 2.0
            first_step_threshold 0.00002
            max_frequency 900000000
            clock_servo pi
            sanity_freq_limit 200000000
            ntpshm_segment 0
            #
            # Transport options
            #
            transportSpecific 0x0
            ptp_dst_mac 01:1B:19:00:00:00
            p2p_dst_mac 01:80:C2:00:00:0E
            udp_ttl 1
            udp6_scope 0x0E
            uds_address /var/run/ptp4l
            #
            # Default interface options
            #
            clock_type BC
            network_transport L2
            delay_mechanism E2E
            time_stamping hardware
            tsproc_mode filter
            delay_filter moving_median
            delay_filter_length 10
            egressLatency 0
            ingressLatency 0
            boundary_clock_jbod 0
            #
            # Clock description
            #
            productDescription ;;
            revisionData ;;
            manufacturerIdentity 00:00:00
            userDescription ;
            timeSource 0xA0
      recommend:
        - profile: boundary-clock
          priority: 4
          match:
            - nodeLabel: "node-role.kubernetes.io/$mcp"
    Copy to Clipboard Toggle word wrap

    Expand
    Table 15.7. PTP boundary clock CR configuration options
    CR fieldDescription

    name

    The name of the PtpConfig CR.

    profile

    Specify an array of one or more profile objects.

    name

    Specify the name of a profile object which uniquely identifies a profile object.

    ptp4lOpts

    Specify system config options for the ptp4l service. The options should not include the network interface name -i <interface> and service config file -f /etc/ptp4l.conf because the network interface name and the service config file are automatically appended.

    ptp4lConf

    Specify the required configuration to start ptp4l as boundary clock. For example, ens1f0 synchronizes from a grandmaster clock and ens1f3 synchronizes connected devices.

    <interface_1>

    The interface that receives the synchronization clock.

    <interface_2>

    The interface that sends the synchronization clock.

    tx_timestamp_timeout

    For Intel Columbiaville 800 Series NICs, set tx_timestamp_timeout to 50.

    boundary_clock_jbod

    For Intel Columbiaville 800 Series NICs, ensure boundary_clock_jbod is set to 0. For Intel Fortville X710 Series NICs, ensure boundary_clock_jbod is set to 1.

    phc2sysOpts

    Specify system config options for the phc2sys service. If this field is empty, the PTP Operator does not start the phc2sys service.

    ptpSchedulingPolicy

    Scheduling policy for ptp4l and phc2sys processes. Default value is SCHED_OTHER. Use SCHED_FIFO on systems that support FIFO scheduling.

    ptpSchedulingPriority

    Integer value from 1-65 used to set FIFO priority for ptp4l and phc2sys processes when ptpSchedulingPolicy is set to SCHED_FIFO. The ptpSchedulingPriority field is not used when ptpSchedulingPolicy is set to SCHED_OTHER.

    ptpClockThreshold

    Optional. If ptpClockThreshold is not present, default values are used for the ptpClockThreshold fields. ptpClockThreshold configures how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

    recommend

    Specify an array of one or more recommend objects that define rules on how the profile should be applied to nodes.

    .recommend.profile

    Specify the .recommend.profile object name defined in the profile section.

    .recommend.priority

    Specify the priority with an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10. If a node can be matched with multiple profiles according to rules defined in the match field, the profile with the higher priority is applied to that node.

    .recommend.match

    Specify .recommend.match rules with nodeLabel or nodeName values.

    .recommend.match.nodeLabel

    Set nodeLabel with the key of the node.Labels field from the node object by using the oc get nodes --show-labels command. For example, node-role.kubernetes.io/worker.

    .recommend.match.nodeName

    Set nodeName with the value of the node.Name field from the node object by using the oc get nodes command. For example, compute-1.example.com.

  2. Create the CR by running the following command:

    $ oc create -f boundary-clock-ptp-config.yaml
    Copy to Clipboard Toggle word wrap

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE
      linuxptp-daemon-4xkbb           1/1     Running   0          43m   10.1.196.24      compute-0.example.com
      linuxptp-daemon-tdspf           1/1     Running   0          43m   10.1.196.25      compute-1.example.com
      ptp-operator-657bbb64c8-2f8sj   1/1     Running   0          43m   10.129.0.61      control-plane-1.example.com
      Copy to Clipboard Toggle word wrap

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
      Copy to Clipboard Toggle word wrap

      Example output

      I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile
      I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to:
      I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------
      I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1
      I1115 09:41:17.117616 4143292 daemon.go:102] Interface:
      I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2
      I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24
      I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
      Copy to Clipboard Toggle word wrap

You can configure the linuxptp services (ptp4l, phc2sys) as boundary clocks for dual-NIC hardware by creating a PtpConfig custom resource (CR) object for each NIC.

Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l instances for each NIC feeding the downstream clocks.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create two separate PtpConfig CRs, one for each NIC, using the reference CR in "Configuring linuxptp services as a boundary clock" as the basis for each CR. For example:

    1. Create boundary-clock-ptp-config-nic1.yaml, specifying values for phc2sysOpts:

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: boundary-clock-ptp-config-nic1
        namespace: openshift-ptp
      spec:
        profile:
        - name: "profile1"
          ptp4lOpts: "-2 --summary_interval -4"
          ptp4lConf: | 
      1
      
            [ens5f1]
            masterOnly 1
            [ens5f0]
            masterOnly 0
          ...
          phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 
      2
      Copy to Clipboard Toggle word wrap
      1
      Specify the required interfaces to start ptp4l as a boundary clock. For example, ens5f0 synchronizes from a grandmaster clock and ens5f1 synchronizes connected devices.
      2
      Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
    2. Create boundary-clock-ptp-config-nic2.yaml, removing the phc2sysOpts field altogether to disable the phc2sys service for the second NIC:

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: boundary-clock-ptp-config-nic2
        namespace: openshift-ptp
      spec:
        profile:
        - name: "profile2"
          ptp4lOpts: "-2 --summary_interval -4"
          ptp4lConf: | 
      1
      
            [ens7f1]
            masterOnly 1
            [ens7f0]
            masterOnly 0
      ...
      Copy to Clipboard Toggle word wrap
      1
      Specify the required interfaces to start ptp4l as a boundary clock on the second NIC.
      Note

      You must completely remove the phc2sysOpts field from the second PtpConfig CR to disable the phc2sys service on the second NIC.

  2. Create the dual-NIC PtpConfig CRs by running the following commands:

    1. Create the CR that configures PTP for the first NIC:

      $ oc create -f boundary-clock-ptp-config-nic1.yaml
      Copy to Clipboard Toggle word wrap
    2. Create the CR that configures PTP for the second NIC:

      $ oc create -f boundary-clock-ptp-config-nic2.yaml
      Copy to Clipboard Toggle word wrap

Verification

  • Check that the PTP Operator has applied the PtpConfig CRs for both NICs. Examine the logs for the linuxptp daemon corresponding to the node that has the dual-NIC hardware installed. For example, run the following command:

    $ oc logs linuxptp-daemon-cvgr6 -n openshift-ptp -c linuxptp-daemon-container
    Copy to Clipboard Toggle word wrap

    Example output

    ptp4l[80828.335]: [ptp4l.1.config] master offset          5 s2 freq   -5727 path delay       519
    ptp4l[80828.343]: [ptp4l.0.config] master offset         -5 s2 freq  -10607 path delay       533
    phc2sys[80828.390]: [ptp4l.0.config] CLOCK_REALTIME phc offset         1 s2 freq  -87239 delay    539
    Copy to Clipboard Toggle word wrap

You can configure the linuxptp services ptp4l and phc2sys as a highly available (HA) system clock for dual PTP boundary clocks (T-BC).

The highly available system clock uses multiple time sources from dual-NIC Intel E810 Salem channel hardware configured as two boundary clocks. Two boundary clocks instances participate in the HA setup, each with its own configuration profile. You connect each NIC to the same upstream leader clock with separate ptp4l instances for each NIC feeding the downstream clocks.

Create two PtpConfig custom resource (CR) objects that configure the NICs as T-BC and a third PtpConfig CR that configures high availability between the two NICs.

Important

You set phc2SysOpts options once in the PtpConfig CR that configures HA. Set the phc2sysOpts field to an empty string in the PtpConfig CRs that configure the two NICs. This prevents individual phc2sys processes from being set up for the two profiles.

The third PtpConfig CR configures a highly available system clock service. The CR sets the ptp4lOpts field to an empty string to prevent the ptp4l process from running. The CR adds profiles for the ptp4l configurations under the spec.profile.ptpSettings.haProfiles key and passes the kernel socket path of those profiles to the phc2sys service. When a ptp4l failure occurs, the phc2sys service switches to the backup ptp4l configuration. When the primary profile becomes active again, the phc2sys service reverts to the original state.

Important

Ensure that you set spec.recommend.priority to the same value for all three PtpConfig CRs that you use to configure HA.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.
  • Configure a cluster node with Intel E810 Salem channel dual-NIC.

Procedure

  1. Create two separate PtpConfig CRs, one for each NIC, using the CRs in "Configuring linuxptp services as boundary clocks for dual-NIC hardware" as a reference for each CR.

    1. Create the ha-ptp-config-nic1.yaml file, specifying an empty string for the phc2sysOpts field. For example:

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: ha-ptp-config-nic1
        namespace: openshift-ptp
      spec:
        profile:
        - name: "ha-ptp-config-profile1"
          ptp4lOpts: "-2 --summary_interval -4"
          ptp4lConf: | 
      1
      
            [ens5f1]
            masterOnly 1
            [ens5f0]
            masterOnly 0
          #...
          phc2sysOpts: "" 
      2
      Copy to Clipboard Toggle word wrap
      1
      Specify the required interfaces to start ptp4l as a boundary clock. For example, ens5f0 synchronizes from a grandmaster clock and ens5f1 synchronizes connected devices.
      2
      Set phc2sysOpts with an empty string. These values are populated from the spec.profile.ptpSettings.haProfiles field of the PtpConfig CR that configures high availability.
    2. Apply the PtpConfig CR for NIC 1 by running the following command:

      $ oc create -f ha-ptp-config-nic1.yaml
      Copy to Clipboard Toggle word wrap
    3. Create the ha-ptp-config-nic2.yaml file, specifying an empty string for the phc2sysOpts field. For example:

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: ha-ptp-config-nic2
        namespace: openshift-ptp
      spec:
        profile:
        - name: "ha-ptp-config-profile2"
          ptp4lOpts: "-2 --summary_interval -4"
          ptp4lConf: |
            [ens7f1]
            masterOnly 1
            [ens7f0]
            masterOnly 0
          #...
          phc2sysOpts: ""
      Copy to Clipboard Toggle word wrap
    4. Apply the PtpConfig CR for NIC 2 by running the following command:

      $ oc create -f ha-ptp-config-nic2.yaml
      Copy to Clipboard Toggle word wrap
  2. Create the PtpConfig CR that configures the HA system clock. For example:

    1. Create the ptp-config-for-ha.yaml file. Set haProfiles to match the metadata.name fields that are set in the PtpConfig CRs that configure the two NICs. For example: haProfiles: ha-ptp-config-nic1,ha-ptp-config-nic2

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: boundary-ha
        namespace: openshift-ptp
        annotations: {}
      spec:
        profile:
          - name: "boundary-ha"
            ptp4lOpts: "" 
      1
      
            phc2sysOpts: "-a -r -n 24"
            ptpSchedulingPolicy: SCHED_FIFO
            ptpSchedulingPriority: 10
            ptpSettings:
              logReduce: "true"
              haProfiles: "$profile1,$profile2"
        recommend:
          - profile: "boundary-ha"
            priority: 4
            match:
              - nodeLabel: "node-role.kubernetes.io/$mcp"
      Copy to Clipboard Toggle word wrap
      1
      Set the ptp4lOpts field to an empty string. If it is not empty, the p4ptl process starts with a critical error.
    Important

    Do not apply the high availability PtpConfig CR before the PtpConfig CRs that configure the individual NICs.

    1. Apply the HA PtpConfig CR by running the following command:

      $ oc create -f ptp-config-for-ha.yaml
      Copy to Clipboard Toggle word wrap

Verification

  • Verify that the PTP Operator has applied the PtpConfig CRs correctly. Perform the following steps:

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE
      linuxptp-daemon-4xkrb           1/1     Running   0          43m   10.1.196.24      compute-0.example.com
      ptp-operator-657bbq64c8-2f8sj   1/1     Running   0          43m   10.129.0.61      control-plane-1.example.com
      Copy to Clipboard Toggle word wrap

      Note

      There should be only one linuxptp-daemon pod.

    2. Check that the profile is correct by running the following command. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile.

      $ oc logs linuxptp-daemon-4xkrb -n openshift-ptp -c linuxptp-daemon-container
      Copy to Clipboard Toggle word wrap

      Example output

      I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile
      I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to:
      I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------
      I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: ha-ptp-config-profile1
      I1115 09:41:17.117616 4143292 daemon.go:102] Interface:
      I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2
      I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24
      I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
      Copy to Clipboard Toggle word wrap

You can configure linuxptp services (ptp4l, phc2sys) as ordinary clock by creating a PtpConfig custom resource (CR) object.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as an ordinary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is required only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the following PtpConfig CR, and then save the YAML in the ordinary-clock-ptp-config.yaml file.

    Example PTP ordinary clock configuration

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: ordinary-clock
      namespace: openshift-ptp
      annotations: {}
    spec:
      profile:
        - name: ordinary-clock
          # The interface name is hardware-specific
          interface: $interface
          ptp4lOpts: "-2 -s"
          phc2sysOpts: "-a -r -n 24"
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptpSettings:
            logReduce: "true"
          ptp4lConf: |
            [global]
            #
            # Default Data Set
            #
            twoStepFlag 1
            slaveOnly 1
            priority1 128
            priority2 128
            domainNumber 24
            #utc_offset 37
            clockClass 255
            clockAccuracy 0xFE
            offsetScaledLogVariance 0xFFFF
            free_running 0
            freq_est_interval 1
            dscp_event 0
            dscp_general 0
            dataset_comparison G.8275.x
            G.8275.defaultDS.localPriority 128
            #
            # Port Data Set
            #
            logAnnounceInterval -3
            logSyncInterval -4
            logMinDelayReqInterval -4
            logMinPdelayReqInterval -4
            announceReceiptTimeout 3
            syncReceiptTimeout 0
            delayAsymmetry 0
            fault_reset_interval -4
            neighborPropDelayThresh 20000000
            masterOnly 0
            G.8275.portDS.localPriority 128
            #
            # Run time options
            #
            assume_two_step 0
            logging_level 6
            path_trace_enabled 0
            follow_up_info 0
            hybrid_e2e 0
            inhibit_multicast_service 0
            net_sync_monitor 0
            tc_spanning_tree 0
            tx_timestamp_timeout 50
            unicast_listen 0
            unicast_master_table 0
            unicast_req_duration 3600
            use_syslog 1
            verbose 0
            summary_interval 0
            kernel_leap 1
            check_fup_sync 0
            clock_class_threshold 7
            #
            # Servo Options
            #
            pi_proportional_const 0.0
            pi_integral_const 0.0
            pi_proportional_scale 0.0
            pi_proportional_exponent -0.3
            pi_proportional_norm_max 0.7
            pi_integral_scale 0.0
            pi_integral_exponent 0.4
            pi_integral_norm_max 0.3
            step_threshold 2.0
            first_step_threshold 0.00002
            max_frequency 900000000
            clock_servo pi
            sanity_freq_limit 200000000
            ntpshm_segment 0
            #
            # Transport options
            #
            transportSpecific 0x0
            ptp_dst_mac 01:1B:19:00:00:00
            p2p_dst_mac 01:80:C2:00:00:0E
            udp_ttl 1
            udp6_scope 0x0E
            uds_address /var/run/ptp4l
            #
            # Default interface options
            #
            clock_type OC
            network_transport L2
            delay_mechanism E2E
            time_stamping hardware
            tsproc_mode filter
            delay_filter moving_median
            delay_filter_length 10
            egressLatency 0
            ingressLatency 0
            boundary_clock_jbod 0
            #
            # Clock description
            #
            productDescription ;;
            revisionData ;;
            manufacturerIdentity 00:00:00
            userDescription ;
            timeSource 0xA0
      recommend:
        - profile: ordinary-clock
          priority: 4
          match:
            - nodeLabel: "node-role.kubernetes.io/$mcp"
    Copy to Clipboard Toggle word wrap

    Expand
    Table 15.8. PTP ordinary clock CR configuration options
    CR fieldDescription

    name

    The name of the PtpConfig CR.

    profile

    Specify an array of one or more profile objects. Each profile must be uniquely named.

    interface

    Specify the network interface to be used by the ptp4l service, for example ens787f1.

    ptp4lOpts

    Specify system config options for the ptp4l service, for example -2 to select the IEEE 802.3 network transport. The options should not include the network interface name -i <interface> and service config file -f /etc/ptp4l.conf because the network interface name and the service config file are automatically appended. Append --summary_interval -4 to use PTP fast events with this interface.

    phc2sysOpts

    Specify system config options for the phc2sys service. If this field is empty, the PTP Operator does not start the phc2sys service. For Intel Columbiaville 800 Series NICs, set phc2sysOpts options to -a -r -m -n 24 -N 8 -R 16. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.

    ptp4lConf

    Specify a string that contains the configuration to replace the default /etc/ptp4l.conf file. To use the default configuration, leave the field empty.

    tx_timestamp_timeout

    For Intel Columbiaville 800 Series NICs, set tx_timestamp_timeout to 50.

    boundary_clock_jbod

    For Intel Columbiaville 800 Series NICs, set boundary_clock_jbod to 0.

    ptpSchedulingPolicy

    Scheduling policy for ptp4l and phc2sys processes. Default value is SCHED_OTHER. Use SCHED_FIFO on systems that support FIFO scheduling.

    ptpSchedulingPriority

    Integer value from 1-65 used to set FIFO priority for ptp4l and phc2sys processes when ptpSchedulingPolicy is set to SCHED_FIFO. The ptpSchedulingPriority field is not used when ptpSchedulingPolicy is set to SCHED_OTHER.

    ptpClockThreshold

    Optional. If ptpClockThreshold is not present, default values are used for the ptpClockThreshold fields. ptpClockThreshold configures how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

    recommend

    Specify an array of one or more recommend objects that define rules on how the profile should be applied to nodes.

    .recommend.profile

    Specify the .recommend.profile object name defined in the profile section.

    .recommend.priority

    Set .recommend.priority to 0 for ordinary clock.

    .recommend.match

    Specify .recommend.match rules with nodeLabel or nodeName values.

    .recommend.match.nodeLabel

    Set nodeLabel with the key of the node.Labels field from the node object by using the oc get nodes --show-labels command. For example, node-role.kubernetes.io/worker.

    .recommend.match.nodeName

    Set nodeName with the value of the node.Name field from the node object by using the oc get nodes command. For example, compute-1.example.com.

  2. Create the PtpConfig CR by running the following command:

    $ oc create -f ordinary-clock-ptp-config.yaml
    Copy to Clipboard Toggle word wrap

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE
      linuxptp-daemon-4xkbb           1/1     Running   0          43m   10.1.196.24      compute-0.example.com
      linuxptp-daemon-tdspf           1/1     Running   0          43m   10.1.196.25      compute-1.example.com
      ptp-operator-657bbb64c8-2f8sj   1/1     Running   0          43m   10.129.0.61      control-plane-1.example.com
      Copy to Clipboard Toggle word wrap

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
      Copy to Clipboard Toggle word wrap

      Example output

      I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile
      I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to:
      I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------
      I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1
      I1115 09:41:17.117616 4143292 daemon.go:102] Interface: ens787f1
      I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 -s
      I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24
      I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
      Copy to Clipboard Toggle word wrap

The following table describes the changes that you must make to the reference PTP configuration to use Intel Columbiaville E800 series NICs as ordinary clocks. Make the changes in a PtpConfig custom resource (CR) that you apply to the cluster.

Expand
Table 15.9. Recommended PTP settings for Intel Columbiaville NIC
PTP configurationRecommended setting

phc2sysOpts

-a -r -m -n 24 -N 8 -R 16

tx_timestamp_timeout

50

boundary_clock_jbod

0

Note

For phc2sysOpts, -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.

In telco or other deployment types that require low latency performance, PTP daemon threads run in a constrained CPU footprint alongside the rest of the infrastructure components. By default, PTP threads run with the SCHED_OTHER policy. Under high load, these threads might not get the scheduling latency they require for error-free operation.

To mitigate against potential scheduling latency errors, you can configure the PTP Operator linuxptp services to allow threads to run with a SCHED_FIFO policy. If SCHED_FIFO is set for a PtpConfig CR, then ptp4l and phc2sys will run in the parent container under chrt with a priority set by the ptpSchedulingPriority field of the PtpConfig CR.

Note

Setting ptpSchedulingPolicy is optional, and is only required if you are experiencing latency errors.

Procedure

  1. Edit the PtpConfig CR profile:

    $ oc edit PtpConfig -n openshift-ptp
    Copy to Clipboard Toggle word wrap
  2. Change the ptpSchedulingPolicy and ptpSchedulingPriority fields:

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: <ptp_config_name>
      namespace: openshift-ptp
    ...
    spec:
      profile:
      - name: "profile1"
    ...
        ptpSchedulingPolicy: SCHED_FIFO 
    1
    
        ptpSchedulingPriority: 10 
    2
    Copy to Clipboard Toggle word wrap
    1
    Scheduling policy for ptp4l and phc2sys processes. Use SCHED_FIFO on systems that support FIFO scheduling.
    2
    Required. Sets the integer value 1-65 used to configure FIFO priority for ptp4l and phc2sys processes.
  3. Save and exit to apply the changes to the PtpConfig CR.

Verification

  1. Get the name of the linuxptp-daemon pod and corresponding node where the PtpConfig CR has been applied:

    $ oc get pods -n openshift-ptp -o wide
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
    linuxptp-daemon-gmv2n           3/3     Running   0          1d17h   10.1.196.24   compute-0.example.com
    linuxptp-daemon-lgm55           3/3     Running   0          1d17h   10.1.196.25   compute-1.example.com
    ptp-operator-3r4dcvf7f4-zndk7   1/1     Running   0          1d7h    10.129.0.61   control-plane-1.example.com
    Copy to Clipboard Toggle word wrap

  2. Check that the ptp4l process is running with the updated chrt FIFO priority:

    $ oc -n openshift-ptp logs linuxptp-daemon-lgm55 -c linuxptp-daemon-container|grep chrt
    Copy to Clipboard Toggle word wrap

    Example output

    I1216 19:24:57.091872 1600715 daemon.go:285] /bin/chrt -f 65 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config -2  --summary_interval -4 -m
    Copy to Clipboard Toggle word wrap

The linuxptp daemon generates logs that you can use for debugging purposes. In telco or other deployment types that feature a limited storage capacity, these logs can add to the storage demand.

To reduce the number log messages, you can configure the PtpConfig custom resource (CR) to exclude log messages that report the master offset value. The master offset log message reports the difference between the current node’s clock and the master clock in nanoseconds.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Edit the PtpConfig CR:

    $ oc edit PtpConfig -n openshift-ptp
    Copy to Clipboard Toggle word wrap
  2. In spec.profile, add the ptpSettings.logReduce specification and set the value to true:

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: <ptp_config_name>
      namespace: openshift-ptp
    ...
    spec:
      profile:
      - name: "profile1"
    ...
        ptpSettings:
          logReduce: "true"
    Copy to Clipboard Toggle word wrap
    Note

    For debugging purposes, you can revert this specification to False to include the master offset messages.

  3. Save and exit to apply the changes to the PtpConfig CR.

Verification

  1. Get the name of the linuxptp-daemon pod and corresponding node where the PtpConfig CR has been applied:

    $ oc get pods -n openshift-ptp -o wide
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
    linuxptp-daemon-gmv2n           3/3     Running   0          1d17h   10.1.196.24   compute-0.example.com
    linuxptp-daemon-lgm55           3/3     Running   0          1d17h   10.1.196.25   compute-1.example.com
    ptp-operator-3r4dcvf7f4-zndk7   1/1     Running   0          1d7h    10.129.0.61   control-plane-1.example.com
    Copy to Clipboard Toggle word wrap

  2. Verify that master offset messages are excluded from the logs by running the following command:

    $ oc -n openshift-ptp logs <linux_daemon_container> -c linuxptp-daemon-container | grep "master offset" 
    1
    Copy to Clipboard Toggle word wrap
    1
    <linux_daemon_container> is the name of the linuxptp-daemon pod, for example linuxptp-daemon-gmv2n.

    When you configure the logReduce specification, this command does not report any instances of master offset in the logs of the linuxptp daemon.

Troubleshoot common problems with the PTP Operator by performing the following steps.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator on a bare-metal cluster with hosts that support PTP.

Procedure

  1. Check the Operator and operands are successfully deployed in the cluster for the configured nodes.

    $ oc get pods -n openshift-ptp -o wide
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
    linuxptp-daemon-lmvgn           3/3     Running   0          4d17h   10.1.196.24   compute-0.example.com
    linuxptp-daemon-qhfg7           3/3     Running   0          4d17h   10.1.196.25   compute-1.example.com
    ptp-operator-6b8dcbf7f4-zndk7   1/1     Running   0          5d7h    10.129.0.61   control-plane-1.example.com
    Copy to Clipboard Toggle word wrap

    Note

    When the PTP fast event bus is enabled, the number of ready linuxptp-daemon pods is 3/3. If the PTP fast event bus is not enabled, 2/2 is displayed.

  2. Check that supported hardware is found in the cluster.

    $ oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                                  AGE
    control-plane-0.example.com           10d
    control-plane-1.example.com           10d
    compute-0.example.com                 10d
    compute-1.example.com                 10d
    compute-2.example.com                 10d
    Copy to Clipboard Toggle word wrap

  3. Check the available PTP network interfaces for a node:

    $ oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io <node_name> -o yaml
    Copy to Clipboard Toggle word wrap

    where:

    <node_name>

    Specifies the node you want to query, for example, compute-0.example.com.

    Example output

    apiVersion: ptp.openshift.io/v1
    kind: NodePtpDevice
    metadata:
      creationTimestamp: "2021-09-14T16:52:33Z"
      generation: 1
      name: compute-0.example.com
      namespace: openshift-ptp
      resourceVersion: "177400"
      uid: 30413db0-4d8d-46da-9bef-737bacd548fd
    spec: {}
    status:
      devices:
      - name: eno1
      - name: eno2
      - name: eno3
      - name: eno4
      - name: enp5s0f0
      - name: enp5s0f1
    Copy to Clipboard Toggle word wrap

  4. Check that the PTP interface is successfully synchronized to the primary clock by accessing the linuxptp-daemon pod for the corresponding node.

    1. Get the name of the linuxptp-daemon pod and corresponding node you want to troubleshoot by running the following command:

      $ oc get pods -n openshift-ptp -o wide
      Copy to Clipboard Toggle word wrap

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
      linuxptp-daemon-lmvgn           3/3     Running   0          4d17h   10.1.196.24   compute-0.example.com
      linuxptp-daemon-qhfg7           3/3     Running   0          4d17h   10.1.196.25   compute-1.example.com
      ptp-operator-6b8dcbf7f4-zndk7   1/1     Running   0          5d7h    10.129.0.61   control-plane-1.example.com
      Copy to Clipboard Toggle word wrap

    2. Remote shell into the required linuxptp-daemon container:

      $ oc rsh -n openshift-ptp -c linuxptp-daemon-container <linux_daemon_container>
      Copy to Clipboard Toggle word wrap

      where:

      <linux_daemon_container>
      is the container you want to diagnose, for example linuxptp-daemon-lmvgn.
    3. In the remote shell connection to the linuxptp-daemon container, use the PTP Management Client (pmc) tool to diagnose the network interface. Run the following pmc command to check the sync status of the PTP device, for example ptp4l.

      # pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'
      Copy to Clipboard Toggle word wrap

      Example output when the node is successfully synced to the primary clock

      sending: GET PORT_DATA_SET
          40a6b7.fffe.166ef0-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET
              portIdentity            40a6b7.fffe.166ef0-1
              portState               SLAVE
              logMinDelayReqInterval  -4
              peerMeanPathDelay       0
              logAnnounceInterval     -3
              announceReceiptTimeout  3
              logSyncInterval         -4
              delayMechanism          1
              logMinPdelayReqInterval -4
              versionNumber           2
      Copy to Clipboard Toggle word wrap

  5. For GNSS-sourced grandmaster clocks, verify that the in-tree NIC ice driver is correct by running the following command, for example:

    $ oc rsh -n openshift-ptp -c linuxptp-daemon-container linuxptp-daemon-74m2g ethtool -i ens7f0
    Copy to Clipboard Toggle word wrap

    Example output

    driver: ice
    version: 5.14.0-356.bz2232515.el9.x86_64
    firmware-version: 4.20 0x8001778b 1.3346.0
    Copy to Clipboard Toggle word wrap

  6. For GNSS-sourced grandmaster clocks, verify that the linuxptp-daemon container is receiving signal from the GNSS antenna. If the container is not receiving the GNSS signal, the /dev/gnss0 file is not populated. To verify, run the following command:

    $ oc rsh -n openshift-ptp -c linuxptp-daemon-container linuxptp-daemon-jnz6r cat /dev/gnss0
    Copy to Clipboard Toggle word wrap

    Example output

    $GNRMC,125223.00,A,4233.24463,N,07126.64561,W,0.000,,300823,,,A,V*0A
    $GNVTG,,T,,M,0.000,N,0.000,K,A*3D
    $GNGGA,125223.00,4233.24463,N,07126.64561,W,1,12,99.99,98.6,M,-33.1,M,,*7E
    $GNGSA,A,3,25,17,19,11,12,06,05,04,09,20,,,99.99,99.99,99.99,1*37
    $GPGSV,3,1,10,04,12,039,41,05,31,222,46,06,50,064,48,09,28,064,42,1*62
    Copy to Clipboard Toggle word wrap

You can get the digital phase-locked loop (DPLL) firmware version for the Clock Generation Unit (CGU) in an Intel 800 series NIC by opening a debug shell to the cluster node and querying the NIC hardware.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed an Intel 800 series NIC in the cluster host.
  • You have installed the PTP Operator on a bare-metal cluster with hosts that support PTP.

Procedure

  1. Start a debug pod by running the following command:

    $ oc debug node/<node_name>
    Copy to Clipboard Toggle word wrap

    where:

    <node_name>
    Is the node where you have installed the Intel 800 series NIC.
  2. Check the CGU firmware version in the NIC by using the devlink tool and the bus and device name where the NIC is installed. For example, run the following command:

    sh-4.4# devlink dev info <bus_name>/<device_name> | grep cgu
    Copy to Clipboard Toggle word wrap

    where:

    <bus_name>
    Is the bus where the NIC is installed. For example, pci.
    <device_name>
    Is the NIC device name. For example, 0000:51:00.0.

    Example output

    cgu.id 36 
    1
    
    fw.cgu 8032.16973825.6021 
    2
    Copy to Clipboard Toggle word wrap

    1
    CGU hardware revision number
    2
    The DPLL firmware version running in the CGU, where the DPLL firmware version is 6201, and the DPLL model is 8032. The string 16973825 is a shorthand representation of the binary version of the DPLL firmware version (1.3.0.1).
    Note

    The firmware version has a leading nibble and 3 octets for each part of the version number. The number 16973825 in binary is 0001 0000 0011 0000 0000 0000 0001. Use the binary value to decode the firmware version. For example:

    Expand
    Table 15.10. DPLL firmware version
    Binary partDecimal value

    0001

    1

    0000 0011

    3

    0000 0000

    0

    0000 0001

    1

15.2.13. Collecting PTP Operator data

You can use the oc adm must-gather command to collect information about your cluster, including features and objects associated with PTP Operator.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).
  • You have installed the PTP Operator.

Procedure

  • To collect PTP Operator data with must-gather, you must specify the PTP Operator must-gather image.

    $ oc adm must-gather --image=registry.redhat.io/openshift4/ptp-must-gather-rhel9:v4.16
    Copy to Clipboard Toggle word wrap

When developing consumer applications that make use of Precision Time Protocol (PTP) events on a bare-metal cluster node, you deploy your consumer application in a separate application pod. The consumer application subscribes to PTP events by using the PTP events REST API v2.

Note

The following information provides general guidance for developing consumer applications that use PTP events. A complete events consumer application example is outside the scope of this information.

Use the Precision Time Protocol (PTP) fast event REST API v2 to subscribe cluster applications to PTP events that the bare-metal cluster node generates.

Note

The fast events notifications framework uses a REST API for communication. The PTP events REST API v1 and v2 are based on the O-RAN O-Cloud Notification API Specification for Event Consumers 4.0 that is available from O-RAN ALLIANCE Specifications.

Only the PTP events REST API v2 is O-RAN v4 compliant.

Applications subscribe to PTP events by using an O-RAN v4 compatible REST API in the producer-side cloud event proxy sidecar. The cloud-event-proxy sidecar container can access the same resources as the primary application container without using any of the resources of the primary application and with no significant latency.

Figure 15.5. Overview of consuming PTP fast events from the PTP event producer REST API v2

20 Event is generated on the cluster host
The linuxptp-daemon process in the PTP Operator-managed pod runs as a Kubernetes DaemonSet and manages the various linuxptp processes (ptp4l, phc2sys, and optionally for grandmaster clocks, ts2phc). The linuxptp-daemon passes the event to the UNIX domain socket.
20 Event is passed to the cloud-event-proxy sidecar
The PTP plugin reads the event from the UNIX domain socket and passes it to the cloud-event-proxy sidecar in the PTP Operator-managed pod. cloud-event-proxy delivers the event from the Kubernetes infrastructure to Cloud-Native Network Functions (CNFs) with low latency.
20 Event is published
The cloud-event-proxy sidecar in the PTP Operator-managed pod processes the event and publishes the event by using the PTP events REST API v2.
20 Consumer application requests a subscription and receives the subscribed event
The consumer application sends an API request to the producer cloud-event-proxy sidecar to create a PTP events subscription. Once subscribed, the consumer application listens to the address specified in the resource qualifier and receives and processes the PTP events.

To start using PTP fast event notifications for a network interface in your cluster, you must enable the fast event publisher in the PTP Operator PtpOperatorConfig custom resource (CR) and configure ptpClockThreshold values in a PtpConfig CR that you create.

Prerequisites

  • You have installed the OpenShift Container Platform CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed the PTP Operator.

Procedure

  1. Modify the default PTP Operator config to enable PTP fast events.

    1. Save the following YAML in the ptp-operatorconfig.yaml file:

      apiVersion: ptp.openshift.io/v1
      kind: PtpOperatorConfig
      metadata:
        name: default
        namespace: openshift-ptp
      spec:
        daemonNodeSelector:
          node-role.kubernetes.io/worker: ""
        ptpEventConfig:
          apiVersion: "2.0" 
      1
      
          enableEventPublisher: true 
      2
      Copy to Clipboard Toggle word wrap
      1
      Enable the PTP events REST API v2 for the PTP event producer by setting the ptpEventConfig.apiVersion to "2.0". The default value is "1.0".
      2
      Enable PTP fast event notifications by setting enableEventPublisher to true.
      Note

      In OpenShift Container Platform 4.13 or later, you do not need to set the spec.ptpEventConfig.transportHost field in the PtpOperatorConfig resource when you use HTTP transport for PTP events.

    2. Update the PtpOperatorConfig CR:

      $ oc apply -f ptp-operatorconfig.yaml
      Copy to Clipboard Toggle word wrap
  2. Create a PtpConfig custom resource (CR) for the PTP enabled interface, and set the required values for ptpClockThreshold and ptp4lOpts. The following YAML illustrates the required values that you must set in the PtpConfig CR:

    spec:
      profile:
      - name: "profile1"
        interface: "enp5s0f0"
        ptp4lOpts: "-2 -s --summary_interval -4" 
    1
    
        phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 
    2
    
        ptp4lConf: "" 
    3
    
        ptpClockThreshold: 
    4
    
          holdOverTimeout: 5
          maxOffsetThreshold: 100
          minOffsetThreshold: -100
    Copy to Clipboard Toggle word wrap
    1
    Append --summary_interval -4 to use PTP fast events.
    2
    Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
    3
    Specify a string that contains the configuration to replace the default /etc/ptp4l.conf file. To use the default configuration, leave the field empty.
    4
    Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. The stanza shows default ptpClockThreshold values. The ptpClockThreshold values configure how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

PTP event consumer applications require the following features:

  1. A web service running with a POST handler to receive the cloud native PTP events JSON payload
  2. A createSubscription function to subscribe to the PTP events producer
  3. A getCurrentState function to poll the current state of the PTP events producer

The following example Go snippets illustrate these requirements:

Example PTP events consumer server function in Go

func server() {
  http.HandleFunc("/event", getEvent)
  http.ListenAndServe(":9043", nil)
}

func getEvent(w http.ResponseWriter, req *http.Request) {
  defer req.Body.Close()
  bodyBytes, err := io.ReadAll(req.Body)
  if err != nil {
    log.Errorf("error reading event %v", err)
  }
  e := string(bodyBytes)
  if e != "" {
    processEvent(bodyBytes)
    log.Infof("received event %s", string(bodyBytes))
  }
  w.WriteHeader(http.StatusNoContent)
}
Copy to Clipboard Toggle word wrap

Example PTP events createSubscription function in Go

import (
"github.com/redhat-cne/sdk-go/pkg/pubsub"
"github.com/redhat-cne/sdk-go/pkg/types"
v1pubsub "github.com/redhat-cne/sdk-go/v1/pubsub"
)

// Subscribe to PTP events using v2 REST API
s1,_:=createsubscription("/cluster/node/<node_name>/sync/sync-status/sync-state")
s2,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/lock-state")
s3,_:=createsubscription("/cluster/node/<node_name>/sync/gnss-status/gnss-sync-status")
s4,_:=createsubscription("/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state")
s5,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/clock-class")

// Create PTP event subscriptions POST
func createSubscription(resourceAddress string) (sub pubsub.PubSub, err error) {
  var status int
  apiPath := "/api/ocloudNotifications/v2/"
  localAPIAddr := "consumer-events-subscription-service.cloud-events.svc.cluster.local:9043" // vDU service API address
  apiAddr := "ptp-event-publisher-service-<node_name>.openshift-ptp.svc.cluster.local:9043" 
1

  apiVersion := "2.0"

  subURL := &types.URI{URL: url.URL{Scheme: "http",
    Host: apiAddr
    Path: fmt.Sprintf("%s%s", apiPath, "subscriptions")}}
  endpointURL := &types.URI{URL: url.URL{Scheme: "http",
    Host: localAPIAddr,
    Path: "event"}}

  sub = v1pubsub.NewPubSub(endpointURL, resourceAddress, apiVersion)
  var subB []byte

  if subB, err = json.Marshal(&sub); err == nil {
    rc := restclient.New()
    if status, subB = rc.PostWithReturn(subURL, subB); status != http.StatusCreated {
      err = fmt.Errorf("error in subscription creation api at %s, returned status %d", subURL, status)
    } else {
      err = json.Unmarshal(subB, &sub)
    }
  } else {
    err = fmt.Errorf("failed to marshal subscription for %s", resourceAddress)
  }
  return
}
Copy to Clipboard Toggle word wrap

1
Replace <node_name> with the FQDN of the node that is generating the PTP events. For example, compute-1.example.com.

Example PTP events consumer getCurrentState function in Go

//Get PTP event state for the resource
func getCurrentState(resource string) {
  //Create publisher
  url := &types.URI{URL: url.URL{Scheme: "http",
    Host: "ptp-event-publisher-service-<node_name>.openshift-ptp.svc.cluster.local:9043", 
1

    Path: fmt.SPrintf("/api/ocloudNotifications/v2/%s/CurrentState",resource}}
  rc := restclient.New()
  status, event := rc.Get(url)
  if status != http.StatusOK {
    log.Errorf("CurrentState:error %d from url %s, %s", status, url.String(), event)
  } else {
    log.Debugf("Got CurrentState: %s ", event)
  }
}
Copy to Clipboard Toggle word wrap

1
Replace <node_name> with the FQDN of the node that is generating the PTP events. For example, compute-1.example.com.

Use the following example PTP event consumer custom resources (CRs) as a reference when deploying your PTP events consumer application for use with the PTP events REST API v2.

Reference cloud event consumer namespace

apiVersion: v1
kind: Namespace
metadata:
  name: cloud-events
  labels:
    security.openshift.io/scc.podSecurityLabelSync: "false"
    pod-security.kubernetes.io/audit: "privileged"
    pod-security.kubernetes.io/enforce: "privileged"
    pod-security.kubernetes.io/warn: "privileged"
    name: cloud-events
    openshift.io/cluster-monitoring: "true"
  annotations:
    workload.openshift.io/allowed: management
Copy to Clipboard Toggle word wrap

Reference cloud event consumer deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-consumer-deployment
  namespace: cloud-events
  labels:
    app: consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
      labels:
        app: consumer
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      serviceAccountName: consumer-sa
      containers:
        - name: cloud-event-consumer
          image: cloud-event-consumer
          imagePullPolicy: Always
          args:
            - "--local-api-addr=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043"
            - "--api-path=/api/ocloudNotifications/v2/"
            - "--api-addr=127.0.0.1:8089"
            - "--api-version=2.0"
            - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: CONSUMER_TYPE
              value: "PTP"
            - name: ENABLE_STATUS_CHECK
              value: "true"
      volumes:
        - name: pubsubstore
          emptyDir: {}
Copy to Clipboard Toggle word wrap

Reference cloud event consumer service account

apiVersion: v1
kind: ServiceAccount
metadata:
  name: consumer-sa
  namespace: cloud-events
Copy to Clipboard Toggle word wrap

Reference cloud event consumer service

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  name: consumer-events-subscription-service
  namespace: cloud-events
  labels:
    app: consumer-service
spec:
  ports:
    - name: sub-port
      port: 9043
  selector:
    app: consumer
  sessionAffinity: None
  type: ClusterIP
Copy to Clipboard Toggle word wrap

Deploy your cloud-event-consumer application container and subscribe the cloud-event-consumer application to PTP events posted by the cloud-event-proxy container in the pod managed by the PTP Operator.

Subscribe consumer applications to PTP events by sending a POST request to http://ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions passing the appropriate subscription request payload.

Note

9043 is the default port for the cloud-event-proxy container deployed in the PTP event producer pod. You can configure a different port for your application as required.

Verify that the cloud-event-consumer container in the application pod is receiving Precision Time Protocol (PTP) events.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed and configured the PTP Operator.
  • You have deployed a cloud events application pod and PTP events consumer application.

Procedure

  1. Check the logs for the deployed events consumer application. For example, run the following command:

    $ oc -n cloud-events logs -f deployment/cloud-consumer-deployment
    Copy to Clipboard Toggle word wrap

    Example output

    time = "2024-09-02T13:49:01Z"
    level = info msg = "transport host path is set to  ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043"
    time = "2024-09-02T13:49:01Z"
    level = info msg = "apiVersion=2.0, updated apiAddr=ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043, apiPath=/api/ocloudNotifications/v2/"
    time = "2024-09-02T13:49:01Z"
    level = info msg = "Starting local API listening to :9043"
    time = "2024-09-02T13:49:06Z"
    level = info msg = "transport host path is set to  ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043"
    time = "2024-09-02T13:49:06Z"
    level = info msg = "checking for rest service health"
    time = "2024-09-02T13:49:06Z"
    level = info msg = "health check http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/health"
    time = "2024-09-02T13:49:07Z"
    level = info msg = "rest service returned healthy status"
    time = "2024-09-02T13:49:07Z"
    level = info msg = "healthy publisher; subscribing to events"
    time = "2024-09-02T13:49:07Z"
    level = info msg = "received event {\"specversion\":\"1.0\",\"id\":\"ab423275-f65d-4760-97af-5b0b846605e4\",\"source\":\"/sync/ptp-status/clock-class\",\"type\":\"event.sync.ptp-status.ptp-clock-class-change\",\"time\":\"2024-09-02T13:49:07.226494483Z\",\"data\":{\"version\":\"1.0\",\"values\":[{\"ResourceAddress\":\"/cluster/node/compute-1.example.com/ptp-not-set\",\"data_type\":\"metric\",\"value_type\":\"decimal64.3\",\"value\":\"0\"}]}}"
    Copy to Clipboard Toggle word wrap

  2. Optional. Test the REST API by using oc and port-forwarding port 9043 from the linuxptp-daemon deployment. For example, run the following command:

    $ oc port-forward -n openshift-ptp ds/linuxptp-daemon 9043:9043
    Copy to Clipboard Toggle word wrap

    Example output

    Forwarding from 127.0.0.1:9043 -> 9043
    Forwarding from [::1]:9043 -> 9043
    Handling connection for 9043
    Copy to Clipboard Toggle word wrap

    Open a new shell prompt and test the REST API v2 endpoints:

    $ curl -X GET http://localhost:9043/api/ocloudNotifications/v2/health
    Copy to Clipboard Toggle word wrap

    Example output

    OK
    Copy to Clipboard Toggle word wrap

15.3.8. Monitoring PTP fast event metrics

You can monitor PTP fast events metrics from cluster nodes where the linuxptp-daemon is running. You can also monitor PTP fast event metrics in the OpenShift Container Platform web console by using the preconfigured and self-updating Prometheus monitoring stack.

Prerequisites

  • Install the OpenShift Container Platform CLI oc.
  • Log in as a user with cluster-admin privileges.
  • Install and configure the PTP Operator on a node with PTP-capable hardware.

Procedure

  1. Start a debug pod for the node by running the following command:

    $ oc debug node/<node_name>
    Copy to Clipboard Toggle word wrap
  2. Check for PTP metrics exposed by the linuxptp-daemon container. For example, run the following command:

    sh-4.4# curl http://localhost:9091/metrics
    Copy to Clipboard Toggle word wrap

    Example output

    # HELP cne_api_events_published Metric to get number of events published by the rest api
    # TYPE cne_api_events_published gauge
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",status="success"} 1
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",status="success"} 94
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/ptp-status/class-change",status="success"} 18
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",status="success"} 27
    Copy to Clipboard Toggle word wrap

  3. Optional. You can also find PTP events in the logs for the cloud-event-proxy container. For example, run the following command:

    $ oc logs -f linuxptp-daemon-cvgr6 -n openshift-ptp -c cloud-event-proxy
    Copy to Clipboard Toggle word wrap
  4. To view the PTP event in the OpenShift Container Platform web console, copy the name of the PTP metric you want to query, for example, openshift_ptp_offset_ns.
  5. In the OpenShift Container Platform web console, click ObserveMetrics.
  6. Paste the PTP metric name into the Expression field, and click Run queries.

15.3.9. PTP fast event metrics reference

The following table describes the PTP fast events metrics that are available from cluster nodes where the linuxptp-daemon service is running.

Expand
Table 15.11. PTP fast event metrics
MetricDescriptionExample

openshift_ptp_clock_class

Returns the PTP clock class for the interface. Possible values for PTP clock class are 6 (LOCKED), 7 (PRC UNLOCKED IN-SPEC), 52 (PRC UNLOCKED OUT-OF-SPEC), 187 (PRC UNLOCKED OUT-OF-SPEC), 135 (T-BC HOLDOVER IN-SPEC), 165 (T-BC HOLDOVER OUT-OF-SPEC), 248 (DEFAULT), or 255 (SLAVE ONLY CLOCK).

{node="compute-1.example.com",process="ptp4l"} 6

openshift_ptp_clock_state

Returns the current PTP clock state for the interface. Possible values for PTP clock state are FREERUN, LOCKED, or HOLDOVER.

{iface="CLOCK_REALTIME", node="compute-1.example.com", process="phc2sys"} 1

openshift_ptp_delay_ns

Returns the delay in nanoseconds between the primary clock sending the timing packet and the secondary clock receiving the timing packet.

{from="master", iface="ens2fx", node="compute-1.example.com", process="ts2phc"} 0

openshift_ptp_ha_profile_status

Returns the current status of the highly available system clock when there are multiple time sources on different NICs. Possible values are 0 (INACTIVE) and 1 (ACTIVE).

{node="node1",process="phc2sys",profile="profile1"} 1{node="node1",process="phc2sys",profile="profile2"} 0

openshift_ptp_frequency_adjustment_ns

Returns the frequency adjustment in nanoseconds between 2 PTP clocks. For example, between the upstream clock and the NIC, between the system clock and the NIC, or between the PTP hardware clock (phc) and the NIC.

{from="phc", iface="CLOCK_REALTIME", node="compute-1.example.com", process="phc2sys"} -6768

openshift_ptp_interface_role

Returns the configured PTP clock role for the interface. Possible values are 0 (PASSIVE), 1 (SLAVE), 2 (MASTER), 3 (FAULTY), 4 (UNKNOWN), or 5 (LISTENING).

{iface="ens2f0", node="compute-1.example.com", process="ptp4l"} 2

openshift_ptp_max_offset_ns

Returns the maximum offset in nanoseconds between 2 clocks or interfaces. For example, between the upstream GNSS clock and the NIC (ts2phc), or between the PTP hardware clock (phc) and the system clock (phc2sys).

{from="master", iface="ens2fx", node="compute-1.example.com", process="ts2phc"} 1.038099569e+09

openshift_ptp_offset_ns

Returns the offset in nanoseconds between the DPLL clock or the GNSS clock source and the NIC hardware clock.

{from="phc", iface="CLOCK_REALTIME", node="compute-1.example.com", process="phc2sys"} -9

openshift_ptp_process_restart_count

Returns a count of the number of times the ptp4l and ts2phc processes were restarted.

{config="ptp4l.0.config", node="compute-1.example.com",process="phc2sys"} 1

openshift_ptp_process_status

Returns a status code that shows whether the PTP processes are running or not.

{config="ptp4l.0.config", node="compute-1.example.com",process="phc2sys"} 1

openshift_ptp_threshold

Returns values for HoldOverTimeout, MaxOffsetThreshold, and MinOffsetThreshold.

  • holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected.
  • maxOffsetThreshold and minOffsetThreshold are offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l) values that you configure in the PtpConfig CR for the NIC.

{node="compute-1.example.com", profile="grandmaster", threshold="HoldOverTimeout"} 5

The following table describes the PTP fast event metrics that are available only when PTP grandmaster clock (T-GM) is enabled.

Expand
Table 15.12. PTP fast event metrics when T-GM is enabled
MetricDescriptionExample

openshift_ptp_frequency_status

Returns the current status of the digital phase-locked loop (DPLL) frequency for the NIC. Possible values are -1 (UNKNOWN), 0 (INVALID), 1 (FREERUN), 2 (LOCKED), 3 (LOCKED_HO_ACQ), or 4 (HOLDOVER).

{from="dpll",iface="ens2fx",node="compute-1.example.com",process="dpll"} 3

openshift_ptp_nmea_status

Returns the current status of the NMEA connection. NMEA is the protocol that is used for 1PPS NIC connections. Possible values are 0 (UNAVAILABLE) and 1 (AVAILABLE).

{iface="ens2fx",node="compute-1.example.com",process="ts2phc"} 1

openshift_ptp_phase_status

Returns the status of the DPLL phase for the NIC. Possible values are -1 (UNKNOWN), 0 (INVALID), 1 (FREERUN), 2 (LOCKED), 3 (LOCKED_HO_ACQ), or 4 (HOLDOVER).

{from="dpll",iface="ens2fx",node="compute-1.example.com",process="dpll"} 3

openshift_ptp_pps_status

Returns the current status of the NIC 1PPS connection. You use the 1PPS connection to synchronize timing between connected NICs. Possible values are 0 (UNAVAILABLE) and 1 (AVAILABLE).

{from="dpll",iface="ens2fx",node="compute-1.example.com",process="dpll"} 1

openshift_ptp_gnss_status

Returns the current status of the global navigation satellite system (GNSS) connection. GNSS provides satellite-based positioning, navigation, and timing services globally. Possible values are 0 (NOFIX), 1 (DEAD RECKONING ONLY), 2 (2D-FIX), 3 (3D-FIX), 4 (GPS+DEAD RECKONING FIX), 5, (TIME ONLY FIX).

{from="gnss",iface="ens2fx",node="compute-1.example.com",process="gnss"} 3

15.4. PTP events REST API v2 reference

Use the following REST API v2 endpoints to subscribe the cloud-event-consumer application to Precision Time Protocol (PTP) events posted at http://localhost:9043/api/ocloudNotifications/v2 in the PTP events producer pod.

15.4.1. PTP events REST API v2 endpoints

15.4.1.1. api/ocloudNotifications/v2/subscriptions
HTTP method

GET api/ocloudNotifications/v2/subscriptions

Description

Returns a list of subscriptions. If subscriptions exist, a 200 OK status code is returned along with the list of subscriptions.

Example API response

[
 {
  "ResourceAddress": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
  "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
  "SubscriptionId": "ccedbf08-3f96-4839-a0b6-2eb0401855ed",
  "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/ccedbf08-3f96-4839-a0b6-2eb0401855ed"
 },
 {
  "ResourceAddress": "/cluster/node/compute-1.example.com/sync/ptp-status/clock-class",
  "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
  "SubscriptionId": "a939a656-1b7d-4071-8cf1-f99af6e931f2",
  "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/a939a656-1b7d-4071-8cf1-f99af6e931f2"
 },
 {
  "ResourceAddress": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
  "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
  "SubscriptionId": "ba4564a3-4d9e-46c5-b118-591d3105473c",
  "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/ba4564a3-4d9e-46c5-b118-591d3105473c"
 },
 {
  "ResourceAddress": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",
  "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
  "SubscriptionId": "ea0d772e-f00a-4889-98be-51635559b4fb",
  "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/ea0d772e-f00a-4889-98be-51635559b4fb"
 },
 {
  "ResourceAddress": "/cluster/node/compute-1.example.com/sync/sync-status/sync-state",
  "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
  "SubscriptionId": "762999bf-b4a0-4bad-abe8-66e646b65754",
  "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/762999bf-b4a0-4bad-abe8-66e646b65754"
 }
]
Copy to Clipboard Toggle word wrap

HTTP method

POST api/ocloudNotifications/v2/subscriptions

Description

Creates a new subscription for the required event by passing the appropriate payload.

You can subscribe to the following PTP events:

  • sync-state events
  • lock-state events
  • gnss-sync-status events events
  • os-clock-sync-state events
  • clock-class events
Expand
Table 15.13. Query parameters
ParameterType

subscription

data

Example sync-state subscription payload

{
"EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
"ResourceAddress": "/cluster/node/{node_name}/sync/sync-status/sync-state"
}
Copy to Clipboard Toggle word wrap

Example PTP lock-state events subscription payload

{
"EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
"ResourceAddress": "/cluster/node/{node_name}/sync/ptp-status/lock-state"
}
Copy to Clipboard Toggle word wrap

Example PTP gnss-sync-status events subscription payload

{
"EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
"ResourceAddress": "/cluster/node/{node_name}/sync/gnss-status/gnss-sync-status"
}
Copy to Clipboard Toggle word wrap

Example PTP os-clock-sync-state events subscription payload

{
"EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
"ResourceAddress": "/cluster/node/{node_name}/sync/sync-status/os-clock-sync-state"
}
Copy to Clipboard Toggle word wrap

Example PTP clock-class events subscription payload

{
"EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
"ResourceAddress": "/cluster/node/{node_name}/sync/ptp-status/clock-class"
}
Copy to Clipboard Toggle word wrap

Example API response

{
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
    "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
    "SubscriptionId": "620283f3-26cd-4a6d-b80a-bdc4b614a96a",
    "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/620283f3-26cd-4a6d-b80a-bdc4b614a96a"
}
Copy to Clipboard Toggle word wrap

The following subscription status events are possible:

Expand
Table 15.14. PTP events REST API v2 subscription status codes
Status codeDescription

201 Created

Indicates that the subscription is created

400 Bad Request

Indicates that the server could not process the request because it was malformed or invalid

404 Not Found

Indicates that the subscription resource is not available

409 Conflict

Indicates that the subscription already exists

HTTP method

DELETE api/ocloudNotifications/v2/subscriptions

Description

Deletes all subscriptions.

Example API response

{
"status": "deleted all subscriptions"
}
Copy to Clipboard Toggle word wrap

HTTP method

GET api/ocloudNotifications/v2/subscriptions/{subscription_id}

Description

Returns details for the subscription with ID subscription_id.

Expand
Table 15.15. Global path parameters
ParameterType

subscription_id

string

Example API response

{
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
    "EndpointUri": "http://consumer-events-subscription-service.cloud-events.svc.cluster.local:9043/event",
    "SubscriptionId": "620283f3-26cd-4a6d-b80a-bdc4b614a96a",
    "UriLocation": "http://ptp-event-publisher-service-compute-1.openshift-ptp.svc.cluster.local:9043/api/ocloudNotifications/v2/subscriptions/620283f3-26cd-4a6d-b80a-bdc4b614a96a"
}
Copy to Clipboard Toggle word wrap

HTTP method

DELETE api/ocloudNotifications/v2/subscriptions/{subscription_id}

Description

Deletes the subscription with ID subscription_id.

Expand
Table 15.16. Global path parameters
ParameterType

subscription_id

string

Expand
Table 15.17. HTTP response codes
HTTP responseDescription

204 No Content

Success

15.4.1.3. api/ocloudNotifications/v2/health
HTTP method

GET api/ocloudNotifications/v2/health/

Description

Returns the health status for the ocloudNotifications REST API.

Expand
Table 15.18. HTTP response codes
HTTP responseDescription

200 OK

Success

15.4.1.4. api/ocloudNotifications/v2/publishers
HTTP method

GET api/ocloudNotifications/v2/publishers

Description

Returns a list of publisher details for the cluster node. The system generates notifications when the relevant equipment state changes.

You can use equipment synchronization status subscriptions together to deliver a detailed view of the overall synchronization health of the system.

Example API response

[
  {
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/sync-status/sync-state",
    "EndpointUri": "http://localhost:9043/api/ocloudNotifications/v2/dummy",
    "SubscriptionId": "4ea72bfa-185c-4703-9694-cdd0434cd570",
    "UriLocation": "http://localhost:9043/api/ocloudNotifications/v2/publishers/4ea72bfa-185c-4703-9694-cdd0434cd570"
  },
  {
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
    "EndpointUri": "http://localhost:9043/api/ocloudNotifications/v2/dummy",
    "SubscriptionId": "71fbb38e-a65d-41fc-823b-d76407901087",
    "UriLocation": "http://localhost:9043/api/ocloudNotifications/v2/publishers/71fbb38e-a65d-41fc-823b-d76407901087"
  },
  {
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/ptp-status/clock-class",
    "EndpointUri": "http://localhost:9043/api/ocloudNotifications/v2/dummy",
    "SubscriptionId": "7bc27cad-03f4-44a9-8060-a029566e7926",
    "UriLocation": "http://localhost:9043/api/ocloudNotifications/v2/publishers/7bc27cad-03f4-44a9-8060-a029566e7926"
  },
  {
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
    "EndpointUri": "http://localhost:9043/api/ocloudNotifications/v2/dummy",
    "SubscriptionId": "6e7b6736-f359-46b9-991c-fbaed25eb554",
    "UriLocation": "http://localhost:9043/api/ocloudNotifications/v2/publishers/6e7b6736-f359-46b9-991c-fbaed25eb554"
  },
  {
    "ResourceAddress": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",
    "EndpointUri": "http://localhost:9043/api/ocloudNotifications/v2/dummy",
    "SubscriptionId": "31bb0a45-7892-45d4-91dd-13035b13ed18",
    "UriLocation": "http://localhost:9043/api/ocloudNotifications/v2/publishers/31bb0a45-7892-45d4-91dd-13035b13ed18"
  }
]
Copy to Clipboard Toggle word wrap

Expand
Table 15.19. HTTP response codes
HTTP responseDescription

200 OK

Success

HTTP method

GET api/ocloudNotifications/v2/cluster/node/{node_name}/sync/ptp-status/lock-state/CurrentState

GET api/ocloudNotifications/v2/cluster/node/{node_name}/sync/sync-status/os-clock-sync-state/CurrentState

GET api/ocloudNotifications/v2/cluster/node/{node_name}/sync/ptp-status/clock-class/CurrentState

GET api/ocloudNotifications/v2/cluster/node/{node_name}/sync/sync-status/sync-state/CurrentState

GET api/ocloudNotifications/v2/cluster/node/{node_name}/sync/gnss-status/gnss-sync-state/CurrentState

Description

Returns the current state of the os-clock-sync-state, clock-class, lock-state, gnss-sync-status, or sync-state events for the cluster node.

  • os-clock-sync-state notifications describe the host operating system clock synchronization state. Can be in LOCKED or FREERUN state.
  • clock-class notifications describe the current state of the PTP clock class.
  • lock-state notifications describe the current status of the PTP equipment lock state. Can be in LOCKED, HOLDOVER or FREERUN state.
  • sync-state notifications describe the current status of the least synchronized of the PTP clock lock-state and os-clock-sync-state states.
  • gnss-sync-status notifications describe the GNSS clock synchronization state.
Expand
Table 15.20. Global path parameters
ParameterType

resource_address

string

Example lock-state API response

{
  "id": "c1ac3aa5-1195-4786-84f8-da0ea4462921",
  "type": "event.sync.ptp-status.ptp-state-change",
  "source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
  "dataContentType": "application/json",
  "time": "2023-01-10T02:41:57.094981478Z",
  "data": {
    "version": "1.0",
    "values": [
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/ens5fx/master",
        "data_type": "notification",
        "value_type": "enumeration",
        "value": "LOCKED"
      },
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/ens5fx/master",
        "data_type": "metric",
        "value_type": "decimal64.3",
        "value": "29"
      }
    ]
  }
}
Copy to Clipboard Toggle word wrap

Example os-clock-sync-state API response

{
  "specversion": "0.3",
  "id": "4f51fe99-feaa-4e66-9112-66c5c9b9afcb",
  "source": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
  "type": "event.sync.sync-status.os-clock-sync-state-change",
  "subject": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
  "datacontenttype": "application/json",
  "time": "2022-11-29T17:44:22.202Z",
  "data": {
    "version": "1.0",
    "values": [
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/CLOCK_REALTIME",
        "data_type": "notification",
        "value_type": "enumeration",
        "value": "LOCKED"
      },
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/CLOCK_REALTIME",
        "data_type": "metric",
        "value_type": "decimal64.3",
        "value": "27"
      }
    ]
  }
}
Copy to Clipboard Toggle word wrap

Example clock-class API response

{
  "id": "064c9e67-5ad4-4afb-98ff-189c6aa9c205",
  "type": "event.sync.ptp-status.ptp-clock-class-change",
  "source": "/cluster/node/compute-1.example.com/sync/ptp-status/clock-class",
  "dataContentType": "application/json",
  "time": "2023-01-10T02:41:56.785673989Z",
  "data": {
    "version": "1.0",
    "values": [
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/ens5fx/master",
        "data_type": "metric",
        "value_type": "decimal64.3",
        "value": "165"
      }
    ]
  }
}
Copy to Clipboard Toggle word wrap

Example sync-state API response

{
    "specversion": "0.3",
    "id": "8c9d6ecb-ae9f-4106-82c4-0a778a79838d",
    "source": "/sync/sync-status/sync-state",
    "type": "event.sync.sync-status.synchronization-state-change",
    "subject": "/cluster/node/compute-1.example.com/sync/sync-status/sync-state",
    "datacontenttype": "application/json",
    "time": "2024-08-28T14:50:57.327585316Z",
    "data":
    {
        "version": "1.0",
        "values": [
        {
            "ResourceAddress": "/cluster/node/compute-1.example.com/sync/sync-status/sync-state",
            "data_type": "notification",
            "value_type": "enumeration",
            "value": "LOCKED"
        }]
    }
}
Copy to Clipboard Toggle word wrap

Example gnss-sync-state API response

{
  "id": "435e1f2a-6854-4555-8520-767325c087d7",
  "type": "event.sync.gnss-status.gnss-state-change",
  "source": "/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",
  "dataContentType": "application/json",
  "time": "2023-09-27T19:35:33.42347206Z",
  "data": {
    "version": "1.0",
    "values": [
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/ens2fx/master",
        "data_type": "notification",
        "value_type": "enumeration",
        "value": "SYNCHRONIZED"
      },
      {
        "ResourceAddress": "/cluster/node/compute-1.example.com/ens2fx/master",
        "data_type": "metric",
        "value_type": "decimal64.3",
        "value": "5"
      }
    ]
  }
}
Copy to Clipboard Toggle word wrap

When developing consumer applications that make use of Precision Time Protocol (PTP) events on a bare-metal cluster node, you deploy your consumer application in a separate application pod. The consumer application subscribes to PTP events by using the PTP events REST API v1.

Note

The following information provides general guidance for developing consumer applications that use PTP events. A complete events consumer application example is outside the scope of this information.

Important

The PTP events REST API v1 will be deprecated in a future release. When developing applications that use PTP events, use the PTP events REST API v2.

Use the Precision Time Protocol (PTP) fast event REST API v2 to subscribe cluster applications to PTP events that the bare-metal cluster node generates.

Note

The fast events notifications framework uses a REST API for communication. The PTP events REST API v1 and v2 are based on the O-RAN O-Cloud Notification API Specification for Event Consumers 4.0 that is available from O-RAN ALLIANCE Specifications.

Only the PTP events REST API v2 is O-RAN v4 compliant.

Applications run the cloud-event-proxy container in a sidecar pattern to subscribe to PTP events. The cloud-event-proxy sidecar container can access the same resources as the primary application container without using any of the resources of the primary application and with no significant latency.

Figure 15.6. Overview of PTP fast events with consumer sidecar and HTTP message transport

20 Event is generated on the cluster host
linuxptp-daemon in the PTP Operator-managed pod runs as a Kubernetes DaemonSet and manages the various linuxptp processes (ptp4l, phc2sys, and optionally for grandmaster clocks, ts2phc). The linuxptp-daemon passes the event to the UNIX domain socket.
20 Event is passed to the cloud-event-proxy sidecar
The PTP plugin reads the event from the UNIX domain socket and passes it to the cloud-event-proxy sidecar in the PTP Operator-managed pod. cloud-event-proxy delivers the event from the Kubernetes infrastructure to Cloud-Native Network Functions (CNFs) with low latency.
20 Event is persisted
The cloud-event-proxy sidecar in the PTP Operator-managed pod processes the event and publishes the cloud-native event by using a REST API.
20 Message is transported
The message transporter transports the event to the cloud-event-proxy sidecar in the application pod over HTTP.
20 Event is available from the REST API
The cloud-event-proxy sidecar in the Application pod processes the event and makes it available by using the REST API.
20 Consumer application requests a subscription and receives the subscribed event
The consumer application sends an API request to the cloud-event-proxy sidecar in the application pod to create a PTP events subscription. The cloud-event-proxy sidecar creates an HTTP messaging listener protocol for the resource specified in the subscription.

The cloud-event-proxy sidecar in the application pod receives the event from the PTP Operator-managed pod, unwraps the cloud events object to retrieve the data, and posts the event to the consumer application. The consumer application listens to the address specified in the resource qualifier and receives and processes the PTP event.

To start using PTP fast event notifications for a network interface in your cluster, you must enable the fast event publisher in the PTP Operator PtpOperatorConfig custom resource (CR) and configure ptpClockThreshold values in a PtpConfig CR that you create.

Prerequisites

  • You have installed the OpenShift Container Platform CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed the PTP Operator.

Procedure

  1. Modify the default PTP Operator config to enable PTP fast events.

    1. Save the following YAML in the ptp-operatorconfig.yaml file:

      apiVersion: ptp.openshift.io/v1
      kind: PtpOperatorConfig
      metadata:
        name: default
        namespace: openshift-ptp
      spec:
        daemonNodeSelector:
          node-role.kubernetes.io/worker: ""
        ptpEventConfig:
          enableEventPublisher: true 
      1
      Copy to Clipboard Toggle word wrap
      1
      Enable PTP fast event notifications by setting enableEventPublisher to true.
      Note

      In OpenShift Container Platform 4.13 or later, you do not need to set the spec.ptpEventConfig.transportHost field in the PtpOperatorConfig resource when you use HTTP transport for PTP events.

    2. Update the PtpOperatorConfig CR:

      $ oc apply -f ptp-operatorconfig.yaml
      Copy to Clipboard Toggle word wrap
  2. Create a PtpConfig custom resource (CR) for the PTP enabled interface, and set the required values for ptpClockThreshold and ptp4lOpts. The following YAML illustrates the required values that you must set in the PtpConfig CR:

    spec:
      profile:
      - name: "profile1"
        interface: "enp5s0f0"
        ptp4lOpts: "-2 -s --summary_interval -4" 
    1
    
        phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 
    2
    
        ptp4lConf: "" 
    3
    
        ptpClockThreshold: 
    4
    
          holdOverTimeout: 5
          maxOffsetThreshold: 100
          minOffsetThreshold: -100
    Copy to Clipboard Toggle word wrap
    1
    Append --summary_interval -4 to use PTP fast events.
    2
    Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
    3
    Specify a string that contains the configuration to replace the default /etc/ptp4l.conf file. To use the default configuration, leave the field empty.
    4
    Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. The stanza shows default ptpClockThreshold values. The ptpClockThreshold values configure how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

15.5.4. PTP events consumer application reference

PTP event consumer applications require the following features:

  1. A web service running with a POST handler to receive the cloud native PTP events JSON payload
  2. A createSubscription function to subscribe to the PTP events producer
  3. A getCurrentState function to poll the current state of the PTP events producer

The following example Go snippets illustrate these requirements:

Example PTP events consumer server function in Go

func server() {
  http.HandleFunc("/event", getEvent)
  http.ListenAndServe("localhost:8989", nil)
}

func getEvent(w http.ResponseWriter, req *http.Request) {
  defer req.Body.Close()
  bodyBytes, err := io.ReadAll(req.Body)
  if err != nil {
    log.Errorf("error reading event %v", err)
  }
  e := string(bodyBytes)
  if e != "" {
    processEvent(bodyBytes)
    log.Infof("received event %s", string(bodyBytes))
  } else {
    w.WriteHeader(http.StatusNoContent)
  }
}
Copy to Clipboard Toggle word wrap

Example PTP events createSubscription function in Go

import (
"github.com/redhat-cne/sdk-go/pkg/pubsub"
"github.com/redhat-cne/sdk-go/pkg/types"
v1pubsub "github.com/redhat-cne/sdk-go/v1/pubsub"
)

// Subscribe to PTP events using REST API
s1,_:=createsubscription("/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state") 
1

s2,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/class-change")
s3,_:=createsubscription("/cluster/node/<node_name>/sync/ptp-status/lock-state")

// Create PTP event subscriptions POST
func createSubscription(resourceAddress string) (sub pubsub.PubSub, err error) {
  var status int
      apiPath:= "/api/ocloudNotifications/v1/"
      localAPIAddr:=localhost:8989 // vDU service API address
      apiAddr:= "localhost:8089" // event framework API address

  subURL := &types.URI{URL: url.URL{Scheme: "http",
    Host: apiAddr
    Path: fmt.Sprintf("%s%s", apiPath, "subscriptions")}}
  endpointURL := &types.URI{URL: url.URL{Scheme: "http",
    Host: localAPIAddr,
    Path: "event"}}

  sub = v1pubsub.NewPubSub(endpointURL, resourceAddress)
  var subB []byte

  if subB, err = json.Marshal(&sub); err == nil {
    rc := restclient.New()
    if status, subB = rc.PostWithReturn(subURL, subB); status != http.StatusCreated {
      err = fmt.Errorf("error in subscription creation api at %s, returned status %d", subURL, status)
    } else {
      err = json.Unmarshal(subB, &sub)
    }
  } else {
    err = fmt.Errorf("failed to marshal subscription for %s", resourceAddress)
  }
  return
}
Copy to Clipboard Toggle word wrap

1
Replace <node_name> with the FQDN of the node that is generating the PTP events. For example, compute-1.example.com.

Example PTP events consumer getCurrentState function in Go

//Get PTP event state for the resource
func getCurrentState(resource string) {
  //Create publisher
  url := &types.URI{URL: url.URL{Scheme: "http",
    Host: localhost:8989,
    Path: fmt.SPrintf("/api/ocloudNotifications/v1/%s/CurrentState",resource}}
  rc := restclient.New()
  status, event := rc.Get(url)
  if status != http.StatusOK {
    log.Errorf("CurrentState:error %d from url %s, %s", status, url.String(), event)
  } else {
    log.Debugf("Got CurrentState: %s ", event)
  }
}
Copy to Clipboard Toggle word wrap

Use the following example cloud-event-proxy deployment and subscriber service CRs as a reference when deploying your PTP events consumer application.

Reference cloud-event-proxy deployment with HTTP transport

apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-consumer-deployment
  namespace: <namespace>
  labels:
    app: consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      serviceAccountName: sidecar-consumer-sa
      containers:
        - name: event-subscriber
          image: event-subscriber-app
        - name: cloud-event-proxy-as-sidecar
          image: openshift4/ose-cloud-event-proxy
          args:
            - "--metrics-addr=127.0.0.1:9091"
            - "--store-path=/store"
            - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043"
            - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
            - "--api-port=8089"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - name: pubsubstore
              mountPath: /store
          ports:
            - name: metrics-port
              containerPort: 9091
            - name: sub-port
              containerPort: 9043
          volumes:
            - name: pubsubstore
              emptyDir: {}
Copy to Clipboard Toggle word wrap

Reference cloud-event-proxy subscriber service

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret
  name: consumer-events-subscription-service
  namespace: cloud-events
  labels:
    app: consumer-service
spec:
  ports:
    - name: sub-port
      port: 9043
  selector:
    app: consumer
  clusterIP: None
  sessionAffinity: None
  type: ClusterIP
Copy to Clipboard Toggle word wrap

Deploy your cloud-event-consumer application container and cloud-event-proxy sidecar container in a separate application pod.

Subscribe the cloud-event-consumer application to PTP events posted by the cloud-event-proxy container at http://localhost:8089/api/ocloudNotifications/v1/ in the application pod.

Note

9089 is the default port for the cloud-event-consumer container deployed in the application pod. You can configure a different port for your application as required.

Verify that the cloud-event-proxy container in the application pod is receiving PTP events.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed and configured the PTP Operator.

Procedure

  1. Get the list of active linuxptp-daemon pods. Run the following command:

    $ oc get pods -n openshift-ptp
    Copy to Clipboard Toggle word wrap

    Example output

    NAME                    READY   STATUS    RESTARTS   AGE
    linuxptp-daemon-2t78p   3/3     Running   0          8h
    linuxptp-daemon-k8n88   3/3     Running   0          8h
    Copy to Clipboard Toggle word wrap

  2. Access the metrics for the required consumer-side cloud-event-proxy container by running the following command:

    $ oc exec -it <linuxptp-daemon> -n openshift-ptp -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics
    Copy to Clipboard Toggle word wrap

    where:

    <linuxptp-daemon>

    Specifies the pod you want to query, for example, linuxptp-daemon-2t78p.

    Example output

    # HELP cne_transport_connections_resets Metric to get number of connection resets
    # TYPE cne_transport_connections_resets gauge
    cne_transport_connection_reset 1
    # HELP cne_transport_receiver Metric to get number of receiver created
    # TYPE cne_transport_receiver gauge
    cne_transport_receiver{address="/cluster/node/compute-1.example.com/ptp",status="active"} 2
    cne_transport_receiver{address="/cluster/node/compute-1.example.com/redfish/event",status="active"} 2
    # HELP cne_transport_sender Metric to get number of sender created
    # TYPE cne_transport_sender gauge
    cne_transport_sender{address="/cluster/node/compute-1.example.com/ptp",status="active"} 1
    cne_transport_sender{address="/cluster/node/compute-1.example.com/redfish/event",status="active"} 1
    # HELP cne_events_ack Metric to get number of events produced
    # TYPE cne_events_ack gauge
    cne_events_ack{status="success",type="/cluster/node/compute-1.example.com/ptp"} 18
    cne_events_ack{status="success",type="/cluster/node/compute-1.example.com/redfish/event"} 18
    # HELP cne_events_transport_published Metric to get number of events published by the transport
    # TYPE cne_events_transport_published gauge
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/ptp",status="failed"} 1
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/ptp",status="success"} 18
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/redfish/event",status="failed"} 1
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 18
    # HELP cne_events_transport_received Metric to get number of events received  by the transport
    # TYPE cne_events_transport_received gauge
    cne_events_transport_received{address="/cluster/node/compute-1.example.com/ptp",status="success"} 18
    cne_events_transport_received{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 18
    # HELP cne_events_api_published Metric to get number of events published by the rest api
    # TYPE cne_events_api_published gauge
    cne_events_api_published{address="/cluster/node/compute-1.example.com/ptp",status="success"} 19
    cne_events_api_published{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 19
    # HELP cne_events_received Metric to get number of events received
    # TYPE cne_events_received gauge
    cne_events_received{status="success",type="/cluster/node/compute-1.example.com/ptp"} 18
    cne_events_received{status="success",type="/cluster/node/compute-1.example.com/redfish/event"} 18
    # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
    # TYPE promhttp_metric_handler_requests_in_flight gauge
    promhttp_metric_handler_requests_in_flight 1
    # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
    # TYPE promhttp_metric_handler_requests_total counter
    promhttp_metric_handler_requests_total{code="200"} 4
    promhttp_metric_handler_requests_total{code="500"} 0
    promhttp_metric_handler_requests_total{code="503"} 0
    Copy to Clipboard Toggle word wrap

15.5.8. Monitoring PTP fast event metrics

You can monitor PTP fast events metrics from cluster nodes where the linuxptp-daemon is running. You can also monitor PTP fast event metrics in the OpenShift Container Platform web console by using the preconfigured and self-updating Prometheus monitoring stack.

Prerequisites

  • Install the OpenShift Container Platform CLI oc.
  • Log in as a user with cluster-admin privileges.
  • Install and configure the PTP Operator on a node with PTP-capable hardware.

Procedure

  1. Start a debug pod for the node by running the following command:

    $ oc debug node/<node_name>
    Copy to Clipboard Toggle word wrap
  2. Check for PTP metrics exposed by the linuxptp-daemon container. For example, run the following command:

    sh-4.4# curl http://localhost:9091/metrics
    Copy to Clipboard Toggle word wrap

    Example output

    # HELP cne_api_events_published Metric to get number of events published by the rest api
    # TYPE cne_api_events_published gauge
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/gnss-status/gnss-sync-status",status="success"} 1
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",status="success"} 94
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/ptp-status/class-change",status="success"} 18
    cne_api_events_published{address="/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",status="success"} 27
    Copy to Clipboard Toggle word wrap

  3. Optional. You can also find PTP events in the logs for the cloud-event-proxy container. For example, run the following command:

    $ oc logs -f linuxptp-daemon-cvgr6 -n openshift-ptp -c cloud-event-proxy
    Copy to Clipboard Toggle word wrap
  4. To view the PTP event in the OpenShift Container Platform web console, copy the name of the PTP metric you want to query, for example, openshift_ptp_offset_ns.
  5. In the OpenShift Container Platform web console, click ObserveMetrics.
  6. Paste the PTP metric name into the Expression field, and click Run queries.

15.5.9. PTP fast event metrics reference

The following table describes the PTP fast events metrics that are available from cluster nodes where the linuxptp-daemon service is running.

Expand
Table 15.21. PTP fast event metrics
MetricDescriptionExample

openshift_ptp_clock_class

Returns the PTP clock class for the interface. Possible values for PTP clock class are 6 (LOCKED), 7 (PRC UNLOCKED IN-SPEC), 52 (PRC UNLOCKED OUT-OF-SPEC), 187 (PRC UNLOCKED OUT-OF-SPEC), 135 (T-BC HOLDOVER IN-SPEC), 165 (T-BC HOLDOVER OUT-OF-SPEC), 248 (DEFAULT), or 255 (SLAVE ONLY CLOCK).

{node="compute-1.example.com",process="ptp4l"} 6

openshift_ptp_clock_state

Returns the current PTP clock state for the interface. Possible values for PTP clock state are FREERUN, LOCKED, or HOLDOVER.

{iface="CLOCK_REALTIME", node="compute-1.example.com", process="phc2sys"} 1

openshift_ptp_delay_ns

Returns the delay in nanoseconds between the primary clock sending the timing packet and the secondary clock receiving the timing packet.

{from="master", iface="ens2fx", node="compute-1.example.com", process="ts2phc"} 0

openshift_ptp_ha_profile_status

Returns the current status of the highly available system clock when there are multiple time sources on different NICs. Possible values are 0 (INACTIVE) and 1 (ACTIVE).

{node="node1",process="phc2sys",profile="profile1"} 1{node="node1",process="phc2sys",profile="profile2"} 0

openshift_ptp_frequency_adjustment_ns

Returns the frequency adjustment in nanoseconds between 2 PTP clocks. For example, between the upstream clock and the NIC, between the system clock and the NIC, or between the PTP hardware clock (phc) and the NIC.

{from="phc", iface="CLOCK_REALTIME", node="compute-1.example.com", process="phc2sys"} -6768

openshift_ptp_interface_role

Returns the configured PTP clock role for the interface. Possible values are 0 (PASSIVE), 1 (SLAVE), 2 (MASTER), 3 (FAULTY), 4 (UNKNOWN), or 5 (LISTENING).

{iface="ens2f0", node="compute-1.example.com", process="ptp4l"} 2

openshift_ptp_max_offset_ns

Returns the maximum offset in nanoseconds between 2 clocks or interfaces. For example, between the upstream GNSS clock and the NIC (ts2phc), or between the PTP hardware clock (phc) and the system clock (phc2sys).

{from="master", iface="ens2fx", node="compute-1.example.com", process="ts2phc"} 1.038099569e+09

openshift_ptp_offset_ns

Returns the offset in nanoseconds between the DPLL clock or the GNSS clock source and the NIC hardware clock.

{from="phc", iface="CLOCK_REALTIME", node="compute-1.example.com", process="phc2sys"} -9

openshift_ptp_process_restart_count

Returns a count of the number of times the ptp4l and ts2phc processes were restarted.

{config="ptp4l.0.config", node="compute-1.example.com",process="phc2sys"} 1

openshift_ptp_process_status

Returns a status code that shows whether the PTP processes are running or not.

{config="ptp4l.0.config", node="compute-1.example.com",process="phc2sys"} 1

openshift_ptp_threshold

Returns values for HoldOverTimeout, MaxOffsetThreshold, and MinOffsetThreshold.

  • holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected.
  • maxOffsetThreshold and minOffsetThreshold are offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l) values that you configure in the PtpConfig CR for the NIC.

{node="compute-1.example.com", profile="grandmaster", threshold="HoldOverTimeout"} 5

The following table describes the PTP fast event metrics that are available only when PTP grandmaster clock (T-GM) is enabled.

Expand
Table 15.22. PTP fast event metrics when T-GM is enabled
MetricDescriptionExample

openshift_ptp_frequency_status

Returns the current status of the digital phase-locked loop (DPLL) frequency for the NIC. Possible values are -1 (UNKNOWN), 0 (INVALID), 1 (FREERUN), 2 (LOCKED), 3 (LOCKED_HO_ACQ), or 4 (HOLDOVER).

{from="dpll",iface="ens2fx",node="compute-1.example.com",process="dpll"} 3

openshift_ptp_nmea_status

Returns the current status of the NMEA connection. NMEA is the protocol that is used for 1PPS NIC connections. Possible values are 0 (UNAVAILABLE) and 1 (AVAILABLE).

{iface="ens2fx",node="compute-1.example.com",process="ts2phc"} 1

openshift_ptp_phase_status

Returns the status of the DPLL phase for the NIC. Possible values are -1 (UNKNOWN), 0 (INVALID), 1 (FREERUN), 2 (LOCKED), 3 (LOCKED_HO_ACQ), or 4 (HOLDOVER).

{from="dpll",iface="ens2fx",node="compute-1.example.com",process="dpll"} 3

openshift_ptp_pps_status

Returns the current status of the NIC 1PPS connection. You use the 1PPS connection to synchronize timing between connected NICs. Possible values are 0 (UNAVAILABLE) and 1 (AVAILABLE).

{from="dpll",iface="ens2fx",node="compute-1.example.com",process="dpll"} 1

openshift_ptp_gnss_status

Returns the current status of the global navigation satellite system (GNSS) connection. GNSS provides satellite-based positioning, navigation, and timing services globally. Possible values are 0 (NOFIX), 1 (DEAD RECKONING ONLY), 2 (2D-FIX), 3 (3D-FIX), 4 (GPS+DEAD RECKONING FIX), 5, (TIME ONLY FIX).

{from="gnss",iface="ens2fx",node="compute-1.example.com",process="gnss"} 3

15.6. PTP events REST API v1 reference

Use the following Precision Time Protocol (PTP) fast event REST API v1 endpoints to subscribe the cloud-event-consumer application to PTP events posted by the cloud-event-proxy container at http://localhost:8089/api/ocloudNotifications/v1/ in the application pod.

Important

The PTP events REST API v1 will be deprecated in a future release. When developing applications that use PTP events, use the PTP events REST API v2.

The following API endpoints are available: