Networking
Configuring and managing cluster networking
Abstract
Chapter 1. About networking
Red Hat OpenShift Networking is an ecosystem of features, plugins and advanced networking capabilities that extend Kubernetes networking with the advanced networking-related features that your cluster needs to manage its network traffic for one or multiple hybrid clusters. This ecosystem of networking capabilities integrates ingress, egress, load balancing, high-performance throughput, security, inter- and intra-cluster traffic management and provides role-based observability tooling to reduce its natural complexities.
The following list highlights some of the most commonly used Red Hat OpenShift Networking features available on your cluster:
Primary cluster network provided by either of the following Container Network Interface (CNI) plugins:
- OVN-Kubernetes network plugin, the default plugin
- OpenShift SDN network plugin
- Certified 3rd-party alternative primary network plugins
- Cluster Network Operator for network plugin management
- Ingress Operator for TLS encrypted web traffic
- DNS Operator for name assignment
- MetalLB Operator for traffic load balancing on bare metal clusters
- IP failover support for high-availability
- Additional hardware network support through multiple CNI plugins, including for macvlan, ipvlan, and SR-IOV hardware networks
- IPv4, IPv6, and dual stack addressing
- Hybrid Linux-Windows host clusters for Windows-based workloads
- Red Hat OpenShift Service Mesh for discovery, load balancing, service-to-service authentication, failure recovery, metrics, and monitoring of services
- Single-node OpenShift
- Network Observability Operator for network debugging and insights
- Submariner for inter-cluster networking
- Red Hat Service Interconnect for layer 7 inter-cluster networking
Chapter 2. Understanding networking
Cluster Administrators have several options for exposing applications that run inside a cluster to external traffic and securing network connections:
- Service types, such as node ports or load balancers
-
API resources, such as
Ingress
andRoute
By default, Kubernetes allocates each pod an internal IP address for applications running within the pod. Pods and their containers can network, but clients outside the cluster do not have networking access. When you expose your application to external traffic, giving each pod its own IP address means that pods can be treated like physical hosts or virtual machines in terms of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.
Some cloud platforms offer metadata APIs that listen on the 169.254.169.254 IP address, a link-local IP address in the IPv4 169.254.0.0/16
CIDR block.
This CIDR block is not reachable from the pod network. Pods that need access to these IP addresses must be given host network access by setting the spec.hostNetwork
field in the pod spec to true
.
If you allow a pod host network access, you grant the pod privileged access to the underlying network infrastructure.
2.1. OpenShift Container Platform DNS
If you are running multiple services, such as front-end and back-end services for use with multiple pods, environment variables are created for user names, service IPs, and more so the front-end pods can communicate with the back-end services. If the service is deleted and recreated, a new IP address can be assigned to the service, and requires the front-end pods to be recreated to pick up the updated values for the service IP environment variable. Additionally, the back-end service must be created before any of the front-end pods to ensure that the service IP is generated properly, and that it can be provided to the front-end pods as an environment variable.
For this reason, OpenShift Container Platform has a built-in DNS so that the services can be reached by the service DNS as well as the service IP/port.
2.2. OpenShift Container Platform Ingress Operator
When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController
API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.
The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route
and Kubernetes Ingress
resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy
type and internal load balancing, provide ways to publish Ingress Controller endpoints.
2.2.1. Comparing routes and Ingress
The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster. The most common way to manage Ingress traffic is with the Ingress Controller. You can scale and replicate this pod like any other regular pod. This router service is based on HAProxy, which is an open source load balancer solution.
The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.
Ingress traffic accesses services in the cluster through a route. Routes and Ingress are the main resources for handling Ingress traffic. Ingress provides features similar to a route, such as accepting external requests and delegating them based on the route. However, with Ingress you can only allow certain types of connections: HTTP/2, HTTPS and server name identification (SNI), and TLS with certificate. In OpenShift Container Platform, routes are generated to meet the conditions specified by the Ingress resource.
2.3. Glossary of common terms for OpenShift Container Platform networking
This glossary defines common terms that are used in the networking content.
- authentication
- To control access to an OpenShift Container Platform cluster, a cluster administrator can configure user authentication and ensure only approved users access the cluster. To interact with an OpenShift Container Platform cluster, you must authenticate to the OpenShift Container Platform API. You can authenticate by providing an OAuth access token or an X.509 client certificate in your requests to the OpenShift Container Platform API.
- AWS Load Balancer Operator
-
The AWS Load Balancer (ALB) Operator deploys and manages an instance of the
aws-load-balancer-controller
. - Cluster Network Operator
- The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation.
- config map
-
A config map provides a way to inject configuration data into pods. You can reference the data stored in a config map in a volume of type
ConfigMap
. Applications running in a pod can use this data. - custom resource (CR)
- A CR is extension of the Kubernetes API. You can create custom resources.
- DNS
- Cluster DNS is a DNS server which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
- DNS Operator
- The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform.
- deployment
- A Kubernetes resource object that maintains the life cycle of an application.
- domain
- Domain is a DNS name serviced by the Ingress Controller.
- egress
- The process of data sharing externally through a network’s outbound traffic from a pod.
- External DNS Operator
- The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform.
- HTTP-based route
- An HTTP-based route is an unsecured route that uses the basic HTTP routing protocol and exposes a service on an unsecured application port.
- Ingress
- The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster.
- Ingress Controller
- The Ingress Operator manages Ingress Controllers. Using an Ingress Controller is the most common way to allow external access to an OpenShift Container Platform cluster.
- installer-provisioned infrastructure
- The installation program deploys and configures the infrastructure that the cluster runs on.
- kubelet
- A primary node agent that runs on each node in the cluster to ensure that containers are running in a pod.
- Kubernetes NMState Operator
- The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across the OpenShift Container Platform cluster’s nodes with NMState.
- kube-proxy
- Kube-proxy is a proxy service which runs on each node and helps in making services available to the external host. It helps in forwarding the request to correct containers and is capable of performing primitive load balancing.
- load balancers
- OpenShift Container Platform uses load balancers for communicating from outside the cluster with services running in the cluster.
- MetalLB Operator
-
As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type
LoadBalancer
is added to the cluster, MetalLB can add an external IP address for the service. - multicast
- With IP multicast, data is broadcast to many IP addresses simultaneously.
- namespaces
- A namespace isolates specific system resources that are visible to all processes. Inside a namespace, only processes that are members of that namespace can see those resources.
- networking
- Network information of a OpenShift Container Platform cluster.
- node
- A worker machine in the OpenShift Container Platform cluster. A node is either a virtual machine (VM) or a physical machine.
- OpenShift Container Platform Ingress Operator
-
The Ingress Operator implements the
IngressController
API and is the component responsible for enabling external access to OpenShift Container Platform services. - pod
- One or more containers with shared resources, such as volume and IP addresses, running in your OpenShift Container Platform cluster. A pod is the smallest compute unit defined, deployed, and managed.
- PTP Operator
-
The PTP Operator creates and manages the
linuxptp
services. - route
- The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.
- scaling
- Increasing or decreasing the resource capacity.
- service
- Exposes a running application on a set of pods.
- Single Root I/O Virtualization (SR-IOV) Network Operator
- The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
- software-defined networking (SDN)
- OpenShift Container Platform uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Container Platform cluster.
- Stream Control Transmission Protocol (SCTP)
- SCTP is a reliable message based protocol that runs on top of an IP network.
- taint
- Taints and tolerations ensure that pods are scheduled onto appropriate nodes. You can apply one or more taints on a node.
- toleration
- You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods with matching taints.
- web console
- A user interface (UI) to manage OpenShift Container Platform.
Chapter 3. Accessing hosts
Learn how to create a bastion host to access OpenShift Container Platform instances and access the control plane nodes with secure shell (SSH) access.
3.1. Accessing hosts on Amazon Web Services in an installer-provisioned infrastructure cluster
The OpenShift Container Platform installer does not create any public IP addresses for any of the Amazon Elastic Compute Cloud (Amazon EC2) instances that it provisions for your OpenShift Container Platform cluster. To be able to SSH to your OpenShift Container Platform hosts, you must follow this procedure.
Procedure
-
Create a security group that allows SSH access into the virtual private cloud (VPC) created by the
openshift-install
command. - Create an Amazon EC2 instance on one of the public subnets the installer created.
Associate a public IP address with the Amazon EC2 instance that you created.
Unlike with the OpenShift Container Platform installation, you should associate the Amazon EC2 instance you created with an SSH keypair. It does not matter what operating system you choose for this instance, as it will simply serve as an SSH bastion to bridge the internet into your OpenShift Container Platform cluster’s VPC. The Amazon Machine Image (AMI) you use does matter. With Red Hat Enterprise Linux CoreOS (RHCOS), for example, you can provide keys via Ignition, like the installer does.
After you provisioned your Amazon EC2 instance and can SSH into it, you must add the SSH key that you associated with your OpenShift Container Platform installation. This key can be different from the key for the bastion instance, but does not have to be.
NoteDirect SSH access is only recommended for disaster recovery. When the Kubernetes API is responsive, run privileged pods instead.
-
Run
oc get nodes
, inspect the output, and choose one of the nodes that is a master. The hostname looks similar toip-10-0-1-163.ec2.internal
. From the bastion SSH host you manually deployed into Amazon EC2, SSH into that control plane host. Ensure that you use the same SSH key you specified during the installation:
$ ssh -i <ssh-key-path> core@<master-hostname>
Chapter 4. Networking Operators overview
OpenShift Container Platform supports multiple types of networking Operators. You can manage the cluster networking using these networking Operators.
4.1. Cluster Network Operator
The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation. For more information, see Cluster Network Operator in OpenShift Container Platform.
4.2. DNS Operator
The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform. For more information, see DNS Operator in OpenShift Container Platform.
4.3. Ingress Operator
When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to external clients. The Ingress Operator implements the Ingress Controller API and is responsible for enabling external access to OpenShift Container Platform cluster services. For more information, see Ingress Operator in OpenShift Container Platform.
4.4. External DNS Operator
The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform. For more information, see Understanding the External DNS Operator.
4.5. Ingress Node Firewall Operator
The Ingress Node Firewall Operator uses an extended Berkley Packet Filter (eBPF) and eXpress Data Path (XDP) plugin to process node firewall rules, update statistics and generate events for dropped traffic. The operator manages ingress node firewall resources, verifies firewall configuration, does not allow incorrectly configured rules that can prevent cluster access, and loads ingress node firewall XDP programs to the selected interfaces in the rule’s object(s). For more information, see Understanding the Ingress Node Firewall Operator
4.6. Network Observability Operator
The Network Observability Operator is an optional Operator that allows cluster administrators to observe the network traffic for OpenShift Container Platform clusters. The Network Observability Operator uses the eBPF technology to create network flows. The network flows are then enriched with OpenShift Container Platform information and stored in Loki. You can view and analyze the stored network flows information in the OpenShift Container Platform console for further insight and troubleshooting. For more information, see About Network Observability Operator.
Chapter 5. Cluster Network Operator in OpenShift Container Platform
You can use the Cluster Network Operator (CNO) to deploy and manage cluster network components on an OpenShift Container Platform cluster, including the Container Network Interface (CNI) network plugin selected for the cluster during installation.
5.1. Cluster Network Operator
The Cluster Network Operator implements the network
API from the operator.openshift.io
API group. The Operator deploys the OVN-Kubernetes network plugin, or the network provider plugin that you selected during cluster installation, by using a daemon set.
Procedure
The Cluster Network Operator is deployed during installation as a Kubernetes Deployment
.
Run the following command to view the Deployment status:
$ oc get -n openshift-network-operator deployment/network-operator
Example output
NAME READY UP-TO-DATE AVAILABLE AGE network-operator 1/1 1 1 56m
Run the following command to view the state of the Cluster Network Operator:
$ oc get clusteroperator/network
Example output
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE network 4.5.4 True False False 50m
The following fields provide information about the status of the operator:
AVAILABLE
,PROGRESSING
, andDEGRADED
. TheAVAILABLE
field isTrue
when the Cluster Network Operator reports an available status condition.
5.2. Viewing the cluster network configuration
Every new OpenShift Container Platform installation has a network.config
object named cluster
.
Procedure
Use the
oc describe
command to view the cluster network configuration:$ oc describe network.config/cluster
Example output
Name: cluster Namespace: Labels: <none> Annotations: <none> API Version: config.openshift.io/v1 Kind: Network Metadata: Self Link: /apis/config.openshift.io/v1/networks/cluster Spec: 1 Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Network Type: OpenShiftSDN Service Network: 172.30.0.0/16 Status: 2 Cluster Network: Cidr: 10.128.0.0/14 Host Prefix: 23 Cluster Network MTU: 8951 Network Type: OpenShiftSDN Service Network: 172.30.0.0/16 Events: <none>
5.3. Viewing Cluster Network Operator status
You can inspect the status and view the details of the Cluster Network Operator using the oc describe
command.
Procedure
Run the following command to view the status of the Cluster Network Operator:
$ oc describe clusteroperators/network
5.4. Viewing Cluster Network Operator logs
You can view Cluster Network Operator logs by using the oc logs
command.
Procedure
Run the following command to view the logs of the Cluster Network Operator:
$ oc logs --namespace=openshift-network-operator deployment/network-operator
5.5. Cluster Network Operator configuration
The configuration for the cluster network is specified as part of the Cluster Network Operator (CNO) configuration and stored in a custom resource (CR) object that is named cluster
. The CR specifies the fields for the Network
API in the operator.openshift.io
API group.
The CNO configuration inherits the following fields during cluster installation from the Network
API in the Network.config.openshift.io
API group and these fields cannot be changed:
clusterNetwork
- IP address pools from which pod IP addresses are allocated.
serviceNetwork
- IP address pool for services.
defaultNetwork.type
- Cluster network plugin, such as OpenShift SDN or OVN-Kubernetes.
After cluster installation, you cannot modify the fields listed in the previous section.
You can specify the cluster network plugin configuration for your cluster by setting the fields for the defaultNetwork
object in the CNO object named cluster
.
5.5.1. Cluster Network Operator configuration object
The fields for the Cluster Network Operator (CNO) are described in the following table:
Field | Type | Description |
---|---|---|
|
|
The name of the CNO object. This name is always |
|
| A list specifying the blocks of IP addresses from which pod IP addresses are allocated and the subnet prefix length assigned to each individual node in the cluster. For example: spec: clusterNetwork: - cidr: 10.128.0.0/19 hostPrefix: 23 - cidr: 10.128.32.0/19 hostPrefix: 23
This value is ready-only and inherited from the |
|
| A block of IP addresses for services. The OpenShift SDN and OVN-Kubernetes network plugins support only a single IP address block for the service network. For example: spec: serviceNetwork: - 172.30.0.0/14
This value is ready-only and inherited from the |
|
| Configures the network plugin for the cluster network. |
|
| The fields for this object specify the kube-proxy configuration. If you are using the OVN-Kubernetes cluster network plugin, the kube-proxy configuration has no effect. |
defaultNetwork object configuration
The values for the defaultNetwork
object are defined in the following table:
Field | Type | Description |
---|---|---|
|
|
Either Note OpenShift Container Platform uses the OVN-Kubernetes network plugin by default. |
|
| This object is only valid for the OpenShift SDN network plugin. |
|
| This object is only valid for the OVN-Kubernetes network plugin. |
Configuration for the OpenShift SDN network plugin
The following table describes the configuration fields for the OpenShift SDN network plugin:
Field | Type | Description |
---|---|---|
|
| The network isolation mode for OpenShift SDN. |
|
| The maximum transmission unit (MTU) for the VXLAN overlay network. This value is normally configured automatically. |
|
|
The port to use for all VXLAN packets. The default value is |
You can only change the configuration for your cluster network plugin during cluster installation.
Example OpenShift SDN configuration
defaultNetwork: type: OpenShiftSDN openshiftSDNConfig: mode: NetworkPolicy mtu: 1450 vxlanPort: 4789
Configuration for the OVN-Kubernetes network plugin
The following table describes the configuration fields for the OVN-Kubernetes network plugin:
Field | Type | Description |
---|---|---|
|
| The maximum transmission unit (MTU) for the Geneve (Generic Network Virtualization Encapsulation) overlay network. This value is normally configured automatically. |
|
| The UDP port for the Geneve overlay network. |
|
| If the field is present, IPsec is enabled for the cluster. |
|
| Specify a configuration object for customizing network policy audit logging. If unset, the defaults audit log settings are used. |
|
| Optional: Specify a configuration object for customizing how egress traffic is sent to the node gateway. Note While migrating egress traffic, you can expect some disruption to workloads and service traffic until the Cluster Network Operator (CNO) successfully rolls out the changes. |
|
If your existing network infrastructure overlaps with the This field cannot be changed after installation. |
The default value is |
|
If your existing network infrastructure overlaps with the This field cannot be changed after installation. |
The default value is |
Field | Type | Description |
---|---|---|
| integer |
The maximum number of messages to generate every second per node. The default value is |
| integer |
The maximum size for the audit log in bytes. The default value is |
| string | One of the following additional audit log targets:
|
| string |
The syslog facility, such as |
Field | Type | Description |
---|---|---|
|
|
Set this field to
This field has an interaction with the Open vSwitch hardware offloading feature. If you set this field to |
You can only change the configuration for your cluster network plugin during cluster installation, except for the gatewayConfig
field that can be changed at runtime as a postinstallation activity.
Example OVN-Kubernetes configuration with IPSec enabled
defaultNetwork: type: OVNKubernetes ovnKubernetesConfig: mtu: 1400 genevePort: 6081 ipsecConfig: {}
kubeProxyConfig object configuration
The values for the kubeProxyConfig
object are defined in the following table:
Field | Type | Description |
---|---|---|
|
|
The refresh period for Note
Because of performance improvements introduced in OpenShift Container Platform 4.3 and greater, adjusting the |
|
|
The minimum duration before refreshing kubeProxyConfig: proxyArguments: iptables-min-sync-period: - 0s |
5.5.2. Cluster Network Operator example configuration
A complete CNO configuration is specified in the following example:
Example Cluster Network Operator object
apiVersion: operator.openshift.io/v1 kind: Network metadata: name: cluster spec: clusterNetwork: 1 - cidr: 10.128.0.0/14 hostPrefix: 23 serviceNetwork: 2 - 172.30.0.0/16 defaultNetwork: 3 type: OpenShiftSDN openshiftSDNConfig: mode: NetworkPolicy mtu: 1450 vxlanPort: 4789 kubeProxyConfig: iptablesSyncPeriod: 30s proxyArguments: iptables-min-sync-period: - 0s
5.6. Additional resources
Chapter 6. DNS Operator in OpenShift Container Platform
The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods, enabling DNS-based Kubernetes Service discovery in OpenShift Container Platform.
6.1. DNS Operator
The DNS Operator implements the dns
API from the operator.openshift.io
API group. The Operator deploys CoreDNS using a daemon set, creates a service for the daemon set, and configures the kubelet to instruct pods to use the CoreDNS service IP address for name resolution.
Procedure
The DNS Operator is deployed during installation with a Deployment
object.
Use the
oc get
command to view the deployment status:$ oc get -n openshift-dns-operator deployment/dns-operator
Example output
NAME READY UP-TO-DATE AVAILABLE AGE dns-operator 1/1 1 1 23h
Use the
oc get
command to view the state of the DNS Operator:$ oc get clusteroperator/dns
Example output
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE dns 4.1.0-0.11 True False False 92m
AVAILABLE
,PROGRESSING
andDEGRADED
provide information about the status of the operator.AVAILABLE
isTrue
when at least 1 pod from the CoreDNS daemon set reports anAvailable
status condition.
6.2. Changing the DNS Operator managementState
DNS manages the CoreDNS component to provide a name resolution service for pods and services in the cluster. The managementState
of the DNS Operator is set to Managed
by default, which means that the DNS Operator is actively managing its resources. You can change it to Unmanaged
, which means the DNS Operator is not managing its resources.
The following are use cases for changing the DNS Operator managementState
:
-
You are a developer and want to test a configuration change to see if it fixes an issue in CoreDNS. You can stop the DNS Operator from overwriting the fix by setting the
managementState
toUnmanaged
. -
You are a cluster administrator and have reported an issue with CoreDNS, but need to apply a workaround until the issue is fixed. You can set the
managementState
field of the DNS Operator toUnmanaged
to apply the workaround.
Procedure
Change
managementState
DNS Operator:oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Unmanaged"}}'
6.3. Controlling DNS pod placement
The DNS Operator has two daemon sets: one for CoreDNS and one for managing the /etc/hosts
file. The daemon set for /etc/hosts
must run on every node host to add an entry for the cluster image registry to support pulling images. Security policies can prohibit communication between pairs of nodes, which prevents the daemon set for CoreDNS from running on every node.
As a cluster administrator, you can use a custom node selector to configure the daemon set for CoreDNS to run or not run on certain nodes.
Prerequisites
-
You installed the
oc
CLI. -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
To prevent communication between certain nodes, configure the
spec.nodePlacement.nodeSelector
API field:Modify the DNS Operator object named
default
:$ oc edit dns.operator/default
Specify a node selector that includes only control plane nodes in the
spec.nodePlacement.nodeSelector
API field:spec: nodePlacement: nodeSelector: node-role.kubernetes.io/worker: ""
To allow the daemon set for CoreDNS to run on nodes, configure a taint and toleration:
Modify the DNS Operator object named
default
:$ oc edit dns.operator/default
Specify a taint key and a toleration for the taint:
spec: nodePlacement: tolerations: - effect: NoExecute key: "dns-only" operators: Equal value: abc tolerationSeconds: 3600 1
- 1
- If the taint is
dns-only
, it can be tolerated indefinitely. You can omittolerationSeconds
.
6.4. View the default DNS
Every new OpenShift Container Platform installation has a dns.operator
named default
.
Procedure
Use the
oc describe
command to view the defaultdns
:$ oc describe dns.operator/default
Example output
Name: default Namespace: Labels: <none> Annotations: <none> API Version: operator.openshift.io/v1 Kind: DNS ... Status: Cluster Domain: cluster.local 1 Cluster IP: 172.30.0.10 2 ...
To find the service CIDR of your cluster, use the
oc get
command:$ oc get networks.config/cluster -o jsonpath='{$.status.serviceNetwork}'
Example output
[172.30.0.0/16]
6.5. Using DNS forwarding
You can use DNS forwarding to override the default forwarding configuration in the /etc/resolv.conf
file in the following ways:
- Specify name servers for every zone. If the forwarded zone is the Ingress domain managed by OpenShift Container Platform, then the upstream name server must be authorized for the domain.
- Provide a list of upstream DNS servers.
- Change the default forwarding policy.
A DNS forwarding configuration for the default domain can have both the default servers specified in the /etc/resolv.conf
file and the upstream DNS servers.
Procedure
Modify the DNS Operator object named
default
:$ oc edit dns.operator/default
After you issue the previous command, the Operator creates and updates the config map named
dns-default
with additional server configuration blocks based onServer
. If none of the servers have a zone that matches the query, then name resolution falls back to the upstream DNS servers.Configuring DNS forwarding
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: servers: - name: example-server 1 zones: 2 - example.com forwardPlugin: policy: Random 3 upstreams: 4 - 1.1.1.1 - 2.2.2.2:5353 upstreamResolvers: 5 policy: Random 6 upstreams: 7 - type: SystemResolvConf 8 - type: Network address: 1.2.3.4 9 port: 53 10
- 1
- Must comply with the
rfc6335
service name syntax. - 2
- Must conform to the definition of a subdomain in the
rfc1123
service name syntax. The cluster domain,cluster.local
, is an invalid subdomain for thezones
field. - 3
- Defines the policy to select upstream resolvers. Default value is
Random
. You can also use the valuesRoundRobin
, andSequential
. - 4
- A maximum of 15
upstreams
is allowed perforwardPlugin
. - 5
- Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in
/etc/resolv.conf
. - 6
- Determines the order in which upstream servers are selected for querying. You can specify one of these values:
Random
,RoundRobin
, orSequential
. The default value isSequential
. - 7
- Optional. You can use it to provide upstream resolvers.
- 8
- You can specify two types of
upstreams
-SystemResolvConf
andNetwork
.SystemResolvConf
configures the upstream to use/etc/resolv.conf
andNetwork
defines aNetworkresolver
. You can specify one or both. - 9
- If the specified type is
Network
, you must provide an IP address. Theaddress
field must be a valid IPv4 or IPv6 address. - 10
- If the specified type is
Network
, you can optionally provide a port. Theport
field must have a value between1
and65535
. If you do not specify a port for the upstream, by default port 853 is tried.
Optional: When working in a highly regulated environment, you might need the ability to secure DNS traffic when forwarding requests to upstream resolvers so that you can ensure additional DNS traffic and data privacy. Cluster administrators can configure transport layer security (TLS) for forwarded DNS queries.
Configuring DNS forwarding with TLS
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: servers: - name: example-server 1 zones: 2 - example.com forwardPlugin: transportConfig: transport: TLS 3 tls: caBundle: name: mycacert serverName: dnstls.example.com 4 policy: Random 5 upstreams: 6 - 1.1.1.1 - 2.2.2.2:5353 upstreamResolvers: 7 transportConfig: transport: TLS tls: caBundle: name: mycacert serverName: dnstls.example.com upstreams: - type: Network 8 address: 1.2.3.4 9 port: 53 10
- 1
- Must comply with the
rfc6335
service name syntax. - 2
- Must conform to the definition of a subdomain in the
rfc1123
service name syntax. The cluster domain,cluster.local
, is an invalid subdomain for thezones
field. The cluster domain,cluster.local
, is an invalidsubdomain
forzones
. - 3
- When configuring TLS for forwarded DNS queries, set the
transport
field to have the valueTLS
. By default, CoreDNS caches forwarded connections for 10 seconds. CoreDNS will hold a TCP connection open for those 10 seconds if no request is issued. With large clusters, ensure that your DNS server is aware that it might get many new connections to hold open because you can initiate a connection per node. Set up your DNS hierarchy accordingly to avoid performance issues. - 4
- When configuring TLS for forwarded DNS queries, this is a mandatory server name used as part of the server name indication (SNI) to validate the upstream TLS server certificate.
- 5
- Defines the policy to select upstream resolvers. Default value is
Random
. You can also use the valuesRoundRobin
, andSequential
. - 6
- Required. You can use it to provide upstream resolvers. A maximum of 15
upstreams
entries are allowed perforwardPlugin
entry. - 7
- Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in
/etc/resolv.conf
. - 8
Network
type indicates that this upstream resolver should handle forwarded requests separately from the upstream resolvers listed in/etc/resolv.conf
. Only theNetwork
type is allowed when using TLS and you must provide an IP address.- 9
- The
address
field must be a valid IPv4 or IPv6 address. - 10
- You can optionally provide a port. The
port
must have a value between1
and65535
. If you do not specify a port for the upstream, by default port 853 is tried.
NoteIf
servers
is undefined or invalid, the config map only contains the default server.
Verification
View the config map:
$ oc get configmap/dns-default -n openshift-dns -o yaml
Sample DNS ConfigMap based on previous sample DNS
apiVersion: v1 data: Corefile: | example.com:5353 { forward . 1.1.1.1 2.2.2.2:5353 } bar.com:5353 example.com:5353 { forward . 3.3.3.3 4.4.4.4:5454 1 } .:5353 { errors health kubernetes cluster.local in-addr.arpa ip6.arpa { pods insecure upstream fallthrough in-addr.arpa ip6.arpa } prometheus :9153 forward . /etc/resolv.conf 1.2.3.4:53 { policy Random } cache 30 reload } kind: ConfigMap metadata: labels: dns.operator.openshift.io/owning-dns: default name: dns-default namespace: openshift-dns
- 1
- Changes to the
forwardPlugin
triggers a rolling update of the CoreDNS daemon set.
Additional resources
- For more information on DNS forwarding, see the CoreDNS forward documentation.
6.6. DNS Operator status
You can inspect the status and view the details of the DNS Operator using the oc describe
command.
Procedure
View the status of the DNS Operator:
$ oc describe clusteroperators/dns
6.7. DNS Operator logs
You can view DNS Operator logs by using the oc logs
command.
Procedure
View the logs of the DNS Operator:
$ oc logs -n openshift-dns-operator deployment/dns-operator -c dns-operator
6.8. Setting the CoreDNS log level
You can configure the CoreDNS log level to determine the amount of detail in logged error messages. The valid values for CoreDNS log level are Normal
, Debug
, and Trace
. The default logLevel
is Normal
.
The errors plugin is always enabled. The following logLevel
settings report different error responses:
-
logLevel
:Normal
enables the "errors" class:log . { class error }
. -
logLevel
:Debug
enables the "denial" class:log . { class denial error }
. -
logLevel
:Trace
enables the "all" class:log . { class all }
.
Procedure
To set
logLevel
toDebug
, enter the following command:$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
To set
logLevel
toTrace
, enter the following command:$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge
Verification
To ensure the desired log level was set, check the config map:
$ oc get configmap/dns-default -n openshift-dns -o yaml
6.9. Setting the CoreDNS Operator log level
Cluster administrators can configure the Operator log level to more quickly track down OpenShift DNS issues. The valid values for operatorLogLevel
are Normal
, Debug
, and Trace
. Trace
has the most detailed information. The default operatorlogLevel
is Normal
. There are seven logging levels for issues: Trace, Debug, Info, Warning, Error, Fatal and Panic. After the logging level is set, log entries with that severity or anything above it will be logged.
-
operatorLogLevel: "Normal"
setslogrus.SetLogLevel("Info")
. -
operatorLogLevel: "Debug"
setslogrus.SetLogLevel("Debug")
. -
operatorLogLevel: "Trace"
setslogrus.SetLogLevel("Trace")
.
Procedure
To set
operatorLogLevel
toDebug
, enter the following command:$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Debug"}}' --type=merge
To set
operatorLogLevel
toTrace
, enter the following command:$ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Trace"}}' --type=merge
6.10. Tuning the CoreDNS cache
You can configure the maximum duration of both successful or unsuccessful caching, also known as positive or negative caching respectively, done by CoreDNS. Tuning the duration of caching of DNS query responses can reduce the load for any upstream DNS resolvers.
Procedure
Edit the DNS Operator object named
default
by running the following command:$ oc edit dns.operator.openshift.io/default
Modify the time-to-live (TTL) caching values:
Configuring DNS caching
apiVersion: operator.openshift.io/v1 kind: DNS metadata: name: default spec: cache: positiveTTL: 1h 1 negativeTTL: 0.5h10m 2
- 1
- The string value
1h
is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be0s
and the cluster uses the internal default value of900s
as a fallback. - 2
- The string value can be a combination of units such as
0.5h10m
and is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be0s
and the cluster uses the internal default value of30s
as a fallback.
WarningSetting TTL fields to low values could lead to an increased load on the cluster, any upstream resolvers, or both.
Chapter 7. Ingress Operator in OpenShift Container Platform
7.1. OpenShift Container Platform Ingress Operator
When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController
API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.
The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route
and Kubernetes Ingress
resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy
type and internal load balancing, provide ways to publish Ingress Controller endpoints.
7.2. The Ingress configuration asset
The installation program generates an asset with an Ingress
resource in the config.openshift.io
API group, cluster-ingress-02-config.yml
.
YAML Definition of the Ingress
resource
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster spec: domain: apps.openshiftdemos.com
The installation program stores this asset in the cluster-ingress-02-config.yml
file in the manifests/
directory. This Ingress
resource defines the cluster-wide configuration for Ingress. This Ingress configuration is used as follows:
- The Ingress Operator uses the domain from the cluster Ingress configuration as the domain for the default Ingress Controller.
-
The OpenShift API Server Operator uses the domain from the cluster Ingress configuration. This domain is also used when generating a default host for a
Route
resource that does not specify an explicit host.
7.3. Ingress Controller configuration parameters
The Infrastructure
custom resource (CR) includes optional configuration parameters that you can configure to meet specific needs for your organization.
Parameter | Description |
---|---|
|
The
If empty, the default value is |
|
|
|
For cloud environments, use the
On GCP, AWS, and Azure you can configure the following
If not set, the default value is based on
For most platforms, the
For non-cloud environments, such as a bare-metal platform, use the
If you do not set a value in one of these fields, the default value is based on binding ports specified in the
If you need to update the
|
|
The
The secret must contain the following keys and data: *
If not set, a wildcard certificate is automatically generated and used. The certificate is valid for the Ingress Controller The in-use certificate, whether generated or user-specified, is automatically integrated with OpenShift Container Platform built-in OAuth server. |
|
|
|
|
|
If not set, the defaults values are used. Note
The nodePlacement: nodeSelector: matchLabels: kubernetes.io/os: linux tolerations: - effect: NoSchedule operator: Exists |
|
If not set, the default value is based on the
When using the
The minimum TLS version for Ingress Controllers is Note
Ciphers and the minimum TLS version of the configured security profile are reflected in the Important
The Ingress Operator converts the TLS |
|
The
The |
|
|
|
|
|
By setting the
By default, the policy is set to
By setting These adjustments are only applied to cleartext, edge-terminated, and re-encrypt routes, and only when using HTTP/1.
For request headers, these adjustments are applied only for routes that have the |
|
|
|
|
|
For any cookie that you want to capture, the following parameters must be in your
For example: httpCaptureCookies: - matchType: Exact maxLength: 128 name: MYCOOKIE |
|
httpCaptureHeaders: request: - maxLength: 256 name: Connection - maxLength: 128 name: User-Agent response: - maxLength: 256 name: Content-Type - maxLength: 256 name: Content-Length |
|
|
|
The
|
|
The
These connections come from load balancer health probes or web browser speculative connections (preconnect) and can be safely ignored. However, these requests can be caused by network errors, so setting this field to |
7.3.1. Ingress Controller TLS security profiles
TLS security profiles provide a way for servers to regulate which ciphers a connecting client can use when connecting to the server.
7.3.1.1. Understanding TLS security profiles
You can use a TLS (Transport Layer Security) security profile to define which TLS ciphers are required by various OpenShift Container Platform components. The OpenShift Container Platform TLS security profiles are based on Mozilla recommended configurations.
You can specify one of the following TLS security profiles for each component:
Profile | Description |
---|---|
| This profile is intended for use with legacy clients or libraries. The profile is based on the Old backward compatibility recommended configuration.
The Note For the Ingress Controller, the minimum TLS version is converted from 1.0 to 1.1. |
| This profile is the recommended configuration for the majority of clients. It is the default TLS security profile for the Ingress Controller, kubelet, and control plane. The profile is based on the Intermediate compatibility recommended configuration.
The |
| This profile is intended for use with modern clients that have no need for backwards compatibility. This profile is based on the Modern compatibility recommended configuration.
The |
| This profile allows you to define the TLS version and ciphers to use. Warning
Use caution when using a |
When using one of the predefined profile types, the effective profile configuration is subject to change between releases. For example, given a specification to use the Intermediate profile deployed on release X.Y.Z, an upgrade to release X.Y.Z+1 might cause a new profile configuration to be applied, resulting in a rollout.
7.3.1.2. Configuring the TLS security profile for the Ingress Controller
To configure a TLS security profile for an Ingress Controller, edit the IngressController
custom resource (CR) to specify a predefined or custom TLS security profile. If a TLS security profile is not configured, the default value is based on the TLS security profile set for the API server.
Sample IngressController
CR that configures the Old
TLS security profile
apiVersion: operator.openshift.io/v1 kind: IngressController ... spec: tlsSecurityProfile: old: {} type: Old ...
The TLS security profile defines the minimum TLS version and the TLS ciphers for TLS connections for Ingress Controllers.
You can see the ciphers and the minimum TLS version of the configured TLS security profile in the IngressController
custom resource (CR) under Status.Tls Profile
and the configured TLS security profile under Spec.Tls Security Profile
. For the Custom
TLS security profile, the specific ciphers and minimum TLS version are listed under both parameters.
The HAProxy Ingress Controller image supports TLS 1.3
and the Modern
profile.
The Ingress Operator also converts the TLS 1.0
of an Old
or Custom
profile to 1.1
.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Edit the
IngressController
CR in theopenshift-ingress-operator
project to configure the TLS security profile:$ oc edit IngressController default -n openshift-ingress-operator
Add the
spec.tlsSecurityProfile
field:Sample
IngressController
CR for aCustom
profileapiVersion: operator.openshift.io/v1 kind: IngressController ... spec: tlsSecurityProfile: type: Custom 1 custom: 2 ciphers: 3 - ECDHE-ECDSA-CHACHA20-POLY1305 - ECDHE-RSA-CHACHA20-POLY1305 - ECDHE-RSA-AES128-GCM-SHA256 - ECDHE-ECDSA-AES128-GCM-SHA256 minTLSVersion: VersionTLS11 ...
- Save the file to apply the changes.
Verification
Verify that the profile is set in the
IngressController
CR:$ oc describe IngressController default -n openshift-ingress-operator
Example output
Name: default Namespace: openshift-ingress-operator Labels: <none> Annotations: <none> API Version: operator.openshift.io/v1 Kind: IngressController ... Spec: ... Tls Security Profile: Custom: Ciphers: ECDHE-ECDSA-CHACHA20-POLY1305 ECDHE-RSA-CHACHA20-POLY1305 ECDHE-RSA-AES128-GCM-SHA256 ECDHE-ECDSA-AES128-GCM-SHA256 Min TLS Version: VersionTLS11 Type: Custom ...
7.3.1.3. Configuring mutual TLS authentication
You can configure the Ingress Controller to enable mutual TLS (mTLS) authentication by setting a spec.clientTLS
value. The clientTLS
value configures the Ingress Controller to verify client certificates. This configuration includes setting a clientCA
value, which is a reference to a config map. The config map contains the PEM-encoded CA certificate bundle that is used to verify a client’s certificate. Optionally, you can also configure a list of certificate subject filters.
If the clientCA
value specifies an X509v3 certificate revocation list (CRL) distribution point, the Ingress Operator downloads and manages a CRL config map based on the HTTP URI X509v3 CRL Distribution Point
specified in each provided certificate. The Ingress Controller uses this config map during mTLS/TLS negotiation. Requests that do not provide valid certificates are rejected.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. - You have a PEM-encoded CA certificate bundle.
If your CA bundle references a CRL distribution point, you must have also included the end-entity or leaf certificate to the client CA bundle. This certificate must have included an HTTP URI under
CRL Distribution Points
, as described in RFC 5280. For example:Issuer: C=US, O=Example Inc, CN=Example Global G2 TLS RSA SHA256 2020 CA1 Subject: SOME SIGNED CERT X509v3 CRL Distribution Points: Full Name: URI:http://crl.example.com/example.crl
Procedure
In the
openshift-config
namespace, create a config map from your CA bundle:$ oc create configmap \ router-ca-certs-default \ --from-file=ca-bundle.pem=client-ca.crt \1 -n openshift-config
- 1
- The config map data key must be
ca-bundle.pem
, and the data value must be a CA certificate in PEM format.
Edit the
IngressController
resource in theopenshift-ingress-operator
project:$ oc edit IngressController default -n openshift-ingress-operator
Add the
spec.clientTLS
field and subfields to configure mutual TLS:Sample
IngressController
CR for aclientTLS
profile that specifies filtering patternsapiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: clientTLS: clientCertificatePolicy: Required clientCA: name: router-ca-certs-default allowedSubjectPatterns: - "^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"
-
Optional, get the Distinguished Name (DN) for
allowedSubjectPatterns
by entering the following command.
$ openssl x509 -in custom-cert.pem -noout -subject subject= /CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift
7.4. View the default Ingress Controller
The Ingress Operator is a core feature of OpenShift Container Platform and is enabled out of the box.
Every new OpenShift Container Platform installation has an ingresscontroller
named default. It can be supplemented with additional Ingress Controllers. If the default ingresscontroller
is deleted, the Ingress Operator will automatically recreate it within a minute.
Procedure
View the default Ingress Controller:
$ oc describe --namespace=openshift-ingress-operator ingresscontroller/default
7.5. View Ingress Operator status
You can view and inspect the status of your Ingress Operator.
Procedure
View your Ingress Operator status:
$ oc describe clusteroperators/ingress
7.6. View Ingress Controller logs
You can view your Ingress Controller logs.
Procedure
View your Ingress Controller logs:
$ oc logs --namespace=openshift-ingress-operator deployments/ingress-operator -c <container_name>
7.7. View Ingress Controller status
Your can view the status of a particular Ingress Controller.
Procedure
View the status of an Ingress Controller:
$ oc describe --namespace=openshift-ingress-operator ingresscontroller/<name>
7.8. Configuring the Ingress Controller
7.8.1. Setting a custom default certificate
As an administrator, you can configure an Ingress Controller to use a custom certificate by creating a Secret resource and editing the IngressController
custom resource (CR).
Prerequisites
- You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority or by a private trusted certificate authority that you configured in a custom PKI.
Your certificate meets the following requirements:
- The certificate is valid for the ingress domain.
-
The certificate uses the
subjectAltName
extension to specify a wildcard domain, such as*.apps.ocp4.example.com
.
You must have an
IngressController
CR. You may use the default one:$ oc --namespace openshift-ingress-operator get ingresscontrollers
Example output
NAME AGE default 10m
If you have intermediate certificates, they must be included in the tls.crt
file of the secret containing a custom default certificate. Order matters when specifying a certificate; list your intermediate certificate(s) after any server certificate(s).
Procedure
The following assumes that the custom certificate and key pair are in the tls.crt
and tls.key
files in the current working directory. Substitute the actual path names for tls.crt
and tls.key
. You also may substitute another name for custom-certs-default
when creating the Secret resource and referencing it in the IngressController CR.
This action will cause the Ingress Controller to be redeployed, using a rolling deployment strategy.
Create a Secret resource containing the custom certificate in the
openshift-ingress
namespace using thetls.crt
andtls.key
files.$ oc --namespace openshift-ingress create secret tls custom-certs-default --cert=tls.crt --key=tls.key
Update the IngressController CR to reference the new certificate secret:
$ oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \ --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
Verify the update was effective:
$ echo Q |\ openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null |\ openssl x509 -noout -subject -issuer -enddate
where:
<domain>
- Specifies the base domain name for your cluster.
Example output
subject=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = *.apps.example.com issuer=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = example.com notAfter=May 10 08:32:45 2022 GM
TipYou can alternatively apply the following YAML to set a custom default certificate:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: defaultCertificate: name: custom-certs-default
The certificate secret name should match the value used to update the CR.
Once the IngressController CR has been modified, the Ingress Operator updates the Ingress Controller’s deployment to use the custom certificate.
7.8.2. Removing a custom default certificate
As an administrator, you can remove a custom certificate that you configured an Ingress Controller to use.
Prerequisites
-
You have access to the cluster as a user with the
cluster-admin
role. -
You have installed the OpenShift CLI (
oc
). - You previously configured a custom default certificate for the Ingress Controller.
Procedure
To remove the custom certificate and restore the certificate that ships with OpenShift Container Platform, enter the following command:
$ oc patch -n openshift-ingress-operator ingresscontrollers/default \ --type json -p $'- op: remove\n path: /spec/defaultCertificate'
There can be a delay while the cluster reconciles the new certificate configuration.
Verification
To confirm that the original cluster certificate is restored, enter the following command:
$ echo Q | \ openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null | \ openssl x509 -noout -subject -issuer -enddate
where:
<domain>
- Specifies the base domain name for your cluster.
Example output
subject=CN = *.apps.<domain> issuer=CN = ingress-operator@1620633373 notAfter=May 10 10:44:36 2023 GMT
7.8.3. Autoscaling an Ingress Controller
Automatically scale an Ingress Controller to dynamically meet routing performance or availability requirements such as the requirement to increase throughput. The following procedure provides an example for scaling up the default IngressController
.
Prerequisites
-
You have the OpenShift CLI (
oc
) installed. -
You have access to an OpenShift Container Platform cluster as a user with the
cluster-admin
role. - You have the Custom Metrics Autoscaler Operator installed.
Procedure
Create a project in the
openshift-ingress-operator
namespace by running the following command:$ oc project openshift-ingress-operator
Enable OpenShift monitoring for user-defined projects by creating and applying a config map:
Create a new
ConfigMap
object,cluster-monitoring-config.yaml
:cluster-monitoring-config.yaml
apiVersion: v1 kind: ConfigMap metadata: name: cluster-monitoring-config namespace: openshift-monitoring data: config.yaml: | enableUserWorkload: true 1
- 1
- When set to
true
, theenableUserWorkload
parameter enables monitoring for user-defined projects in a cluster.
Apply the config map by running the following command:
$ oc apply -f cluster-monitoring-config.yaml
Create a service account to authenticate with Thanos by running the following command:
$ oc create serviceaccount thanos && oc describe serviceaccount thanos
Example output
Name: thanos Namespace: openshift-ingress-operator Labels: <none> Annotations: <none> Image pull secrets: thanos-dockercfg-b4l9s Mountable secrets: thanos-dockercfg-b4l9s Tokens: thanos-token-c422q Events: <none>
Define a
TriggerAuthentication
object within theopenshift-ingress-operator
namespace using the service account’s token.Define the variable
secret
that contains the secret by running the following command:$ secret=$(oc get secret | grep thanos-token | head -n 1 | awk '{ print $1 }')
Create the
TriggerAuthentication
object and pass the value of thesecret
variable to theTOKEN
parameter:$ oc process TOKEN="$secret" -f - <<EOF | oc apply -f - apiVersion: template.openshift.io/v1 kind: Template parameters: - name: TOKEN objects: - apiVersion: keda.sh/v1alpha1 kind: TriggerAuthentication metadata: name: keda-trigger-auth-prometheus spec: secretTargetRef: - parameter: bearerToken name: \${TOKEN} key: token - parameter: ca name: \${TOKEN} key: ca.crt EOF
Create and apply a role for reading metrics from Thanos:
Create a new role,
thanos-metrics-reader.yaml
, that reads metrics from pods and nodes:thanos-metrics-reader.yaml
apiVersion: rbac.authorization.k8s.io/v1 kind: Role metadata: name: thanos-metrics-reader rules: - apiGroups: - "" resources: - pods - nodes verbs: - get - apiGroups: - metrics.k8s.io resources: - pods - nodes verbs: - get - list - watch - apiGroups: - "" resources: - namespaces verbs: - get
Apply the new role by running the following command:
$ oc apply -f thanos-metrics-reader.yaml
Add the new role to the service account by entering the following commands:
$ oc adm policy add-role-to-user thanos-metrics-reader -z thanos --role=namespace=openshift-ingress-operator
$ oc adm policy -n openshift-ingress-operator add-cluster-role-to-user cluster-monitoring-view -z thanos
NoteThe argument
add-cluster-role-to-user
is only required if you use cross-namespace queries. The following step uses a query from thekube-metrics
namespace which requires this argument.Create a new
ScaledObject
YAML file,ingress-autoscaler.yaml
, that targets the default Ingress Controller deployment:Example
ScaledObject
definitionapiVersion: keda.sh/v1alpha1 kind: ScaledObject metadata: name: ingress-scaler spec: scaleTargetRef: 1 apiVersion: operator.openshift.io/v1 kind: IngressController name: default envSourceContainerName: ingress-operator minReplicaCount: 1 maxReplicaCount: 20 2 cooldownPeriod: 1 pollingInterval: 1 triggers: - type: prometheus metricType: AverageValue metadata: serverAddress: https://<example-cluster>:9091 3 namespace: openshift-ingress-operator 4 metricName: 'kube-node-role' threshold: '1' query: 'sum(kube_node_role{role="worker",service="kube-state-metrics"})' 5 authModes: "bearer" authenticationRef: name: keda-trigger-auth-prometheus
- 1
- The custom resource that you are targeting. In this case, the Ingress Controller.
- 2
- Optional: The maximum number of replicas. If you omit this field, the default maximum is set to 100 replicas.
- 3
- The cluster address and port.
- 4
- The Ingress Operator namespace.
- 5
- This expression evaluates to however many worker nodes are present in the deployed cluster.
ImportantIf you are using cross-namespace queries, you must target port 9091 and not port 9092 in the
serverAddress
field. You also must have elevated privileges to read metrics from this port.Apply the custom resource definition by running the following command:
$ oc apply -f ingress-autoscaler.yaml
Verification
Verify that the default Ingress Controller is scaled out to match the value returned by the
kube-state-metrics
query by running the following commands:Use the
grep
command to search the Ingress Controller YAML file for replicas:$ oc get ingresscontroller/default -o yaml | grep replicas:
Example output
replicas: 3
Get the pods in the
openshift-ingress
project:$ oc get pods -n openshift-ingress
Example output
NAME READY STATUS RESTARTS AGE router-default-7b5df44ff-l9pmm 2/2 Running 0 17h router-default-7b5df44ff-s5sl5 2/2 Running 0 3d22h router-default-7b5df44ff-wwsth 2/2 Running 0 66s
Additional resources
7.8.4. Scaling an Ingress Controller
Manually scale an Ingress Controller to meeting routing performance or availability requirements such as the requirement to increase throughput. oc
commands are used to scale the IngressController
resource. The following procedure provides an example for scaling up the default IngressController
.
Scaling is not an immediate action, as it takes time to create the desired number of replicas.
Procedure
View the current number of available replicas for the default
IngressController
:$ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
Example output
2
Scale the default
IngressController
to the desired number of replicas using theoc patch
command. The following example scales the defaultIngressController
to 3 replicas:$ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 3}}' --type=merge
Example output
ingresscontroller.operator.openshift.io/default patched
Verify that the default
IngressController
scaled to the number of replicas that you specified:$ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'
Example output
3
TipYou can alternatively apply the following YAML to scale an Ingress Controller to three replicas:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 3 1
- 1
- If you need a different amount of replicas, change the
replicas
value.
7.8.5. Configuring Ingress access logging
You can configure the Ingress Controller to enable access logs. If you have clusters that do not receive much traffic, then you can log to a sidecar. If you have high traffic clusters, to avoid exceeding the capacity of the logging stack or to integrate with a logging infrastructure outside of OpenShift Container Platform, you can forward logs to a custom syslog endpoint. You can also specify the format for access logs.
Container logging is useful to enable access logs on low-traffic clusters when there is no existing Syslog logging infrastructure, or for short-term use while diagnosing problems with the Ingress Controller.
Syslog is needed for high-traffic clusters where access logs could exceed the OpenShift Logging stack’s capacity, or for environments where any logging solution needs to integrate with an existing Syslog logging infrastructure. The Syslog use-cases can overlap.
Prerequisites
-
Log in as a user with
cluster-admin
privileges.
Procedure
Configure Ingress access logging to a sidecar.
To configure Ingress access logging, you must specify a destination using
spec.logging.access.destination
. To specify logging to a sidecar container, you must specifyContainer
spec.logging.access.destination.type
. The following example is an Ingress Controller definition that logs to aContainer
destination:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Container
When you configure the Ingress Controller to log to a sidecar, the operator creates a container named
logs
inside the Ingress Controller Pod:$ oc -n openshift-ingress logs deployment.apps/router-default -c logs
Example output
2020-05-11T19:11:50.135710+00:00 router-default-57dfc6cd95-bpmk6 router-default-57dfc6cd95-bpmk6 haproxy[108]: 174.19.21.82:39654 [11/May/2020:19:11:50.133] public be_http:hello-openshift:hello-openshift/pod:hello-openshift:hello-openshift:10.128.2.12:8080 0/0/1/0/1 200 142 - - --NI 1/1/0/0/0 0/0 "GET / HTTP/1.1"
Configure Ingress access logging to a Syslog endpoint.
To configure Ingress access logging, you must specify a destination using
spec.logging.access.destination
. To specify logging to a Syslog endpoint destination, you must specifySyslog
forspec.logging.access.destination.type
. If the destination type isSyslog
, you must also specify a destination endpoint usingspec.logging.access.destination.syslog.endpoint
and you can specify a facility usingspec.logging.access.destination.syslog.facility
. The following example is an Ingress Controller definition that logs to aSyslog
destination:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 port: 10514
NoteThe
syslog
destination port must be UDP.
Configure Ingress access logging with a specific log format.
You can specify
spec.logging.access.httpLogFormat
to customize the log format. The following example is an Ingress Controller definition that logs to asyslog
endpoint with IP address 1.2.3.4 and port 10514:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: destination: type: Syslog syslog: address: 1.2.3.4 port: 10514 httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'
Disable Ingress access logging.
To disable Ingress access logging, leave
spec.logging
orspec.logging.access
empty:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: replicas: 2 logging: access: null
7.8.6. Setting Ingress Controller thread count
A cluster administrator can set the thread count to increase the amount of incoming connections a cluster can handle. You can patch an existing Ingress Controller to increase the amount of threads.
Prerequisites
- The following assumes that you already created an Ingress Controller.
Procedure
Update the Ingress Controller to increase the number of threads:
$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"threadCount": 8}}}'
NoteIf you have a node that is capable of running large amounts of resources, you can configure
spec.nodePlacement.nodeSelector
with labels that match the capacity of the intended node, and configurespec.tuningOptions.threadCount
to an appropriately high value.
7.8.7. Configuring an Ingress Controller to use an internal load balancer
When creating an Ingress Controller on cloud platforms, the Ingress Controller is published by a public cloud load balancer by default. As an administrator, you can create an Ingress Controller that uses an internal cloud load balancer.
If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.
If you want to change the scope
for an IngressController
, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope
parameter after the custom resource (CR) is created.
Figure 7.1. Diagram of LoadBalancer
The preceding graphic shows the following concepts pertaining to OpenShift Container Platform Ingress LoadBalancerService endpoint publishing strategy:
- You can load balance externally, using the cloud provider load balancer, or internally, using the OpenShift Ingress Controller Load Balancer.
- You can use the single IP address of the load balancer and more familiar ports, such as 8080 and 4200 as shown on the cluster depicted in the graphic.
- Traffic from the external load balancer is directed at the pods, and managed by the load balancer, as depicted in the instance of a down node. See the Kubernetes Services documentation for implementation details.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create an
IngressController
custom resource (CR) in a file named<name>-ingress-controller.yaml
, such as in the following example:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: <name> 1 spec: domain: <domain> 2 endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: Internal 3
Create the Ingress Controller defined in the previous step by running the following command:
$ oc create -f <name>-ingress-controller.yaml 1
- 1
- Replace
<name>
with the name of theIngressController
object.
Optional: Confirm that the Ingress Controller was created by running the following command:
$ oc --all-namespaces=true get ingresscontrollers
7.8.8. Configuring global access for an Ingress Controller on GCP
An Ingress Controller created on GCP with an internal load balancer generates an internal IP address for the service. A cluster administrator can specify the global access option, which enables clients in any region within the same VPC network and compute region as the load balancer, to reach the workloads running on your cluster.
For more information, see the GCP documentation for global access.
Prerequisites
- You deployed an OpenShift Container Platform cluster on GCP infrastructure.
- You configured an Ingress Controller to use an internal load balancer.
-
You installed the OpenShift CLI (
oc
).
Procedure
Configure the Ingress Controller resource to allow global access.
NoteYou can also create an Ingress Controller and specify the global access option.
Configure the Ingress Controller resource:
$ oc -n openshift-ingress-operator edit ingresscontroller/default
Edit the YAML file:
Sample
clientAccess
configuration toGlobal
spec: endpointPublishingStrategy: loadBalancer: providerParameters: gcp: clientAccess: Global 1 type: GCP scope: Internal type: LoadBalancerService
- 1
- Set
gcp.clientAccess
toGlobal
.
- Save the file to apply the changes.
Run the following command to verify that the service allows global access:
$ oc -n openshift-ingress edit svc/router-default -o yaml
The output shows that global access is enabled for GCP with the annotation,
networking.gke.io/internal-load-balancer-allow-global-access
.
7.8.9. Setting the Ingress Controller health check interval
A cluster administrator can set the health check interval to define how long the router waits between two consecutive health checks. This value is applied globally as a default for all routes. The default value is 5 seconds.
Prerequisites
- The following assumes that you already created an Ingress Controller.
Procedure
Update the Ingress Controller to change the interval between back end health checks:
$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"healthCheckInterval": "8s"}}}'
NoteTo override the
healthCheckInterval
for a single route, use the route annotationrouter.openshift.io/haproxy.health.check.interval
7.8.10. Configuring the default Ingress Controller for your cluster to be internal
You can configure the default
Ingress Controller for your cluster to be internal by deleting and recreating it.
If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.
If you want to change the scope
for an IngressController
, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope
parameter after the custom resource (CR) is created.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Configure the
default
Ingress Controller for your cluster to be internal by deleting and recreating it.$ oc replace --force --wait --filename - <<EOF apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: default spec: endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: Internal EOF
7.8.11. Configuring the route admission policy
Administrators and application developers can run applications in multiple namespaces with the same domain name. This is for organizations where multiple teams develop microservices that are exposed on the same hostname.
Allowing claims across namespaces should only be enabled for clusters with trust between namespaces, otherwise a malicious user could take over a hostname. For this reason, the default admission policy disallows hostname claims across namespaces.
Prerequisites
- Cluster administrator privileges.
Procedure
Edit the
.spec.routeAdmission
field of theingresscontroller
resource variable using the following command:$ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge
Sample Ingress Controller configuration
spec: routeAdmission: namespaceOwnership: InterNamespaceAllowed ...
TipYou can alternatively apply the following YAML to configure the route admission policy:
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: routeAdmission: namespaceOwnership: InterNamespaceAllowed
7.8.12. Using wildcard routes
The HAProxy Ingress Controller has support for wildcard routes. The Ingress Operator uses wildcardPolicy
to configure the ROUTER_ALLOW_WILDCARD_ROUTES
environment variable of the Ingress Controller.
The default behavior of the Ingress Controller is to admit routes with a wildcard policy of None
, which is backwards compatible with existing IngressController
resources.
Procedure
Configure the wildcard policy.
Use the following command to edit the
IngressController
resource:$ oc edit IngressController
Under
spec
, set thewildcardPolicy
field toWildcardsDisallowed
orWildcardsAllowed
:spec: routeAdmission: wildcardPolicy: WildcardsDisallowed # or WildcardsAllowed
7.8.13. Using X-Forwarded headers
You configure the HAProxy Ingress Controller to specify a policy for how to handle HTTP headers including Forwarded
and X-Forwarded-For
. The Ingress Operator uses the HTTPHeaders
field to configure the ROUTER_SET_FORWARDED_HEADERS
environment variable of the Ingress Controller.
Procedure
Configure the
HTTPHeaders
field for the Ingress Controller.Use the following command to edit the
IngressController
resource:$ oc edit IngressController
Under
spec
, set theHTTPHeaders
policy field toAppend
,Replace
,IfNone
, orNever
:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: forwardedHeaderPolicy: Append
Example use cases
As a cluster administrator, you can:
Configure an external proxy that injects the
X-Forwarded-For
header into each request before forwarding it to an Ingress Controller.To configure the Ingress Controller to pass the header through unmodified, you specify the
never
policy. The Ingress Controller then never sets the headers, and applications receive only the headers that the external proxy provides.Configure the Ingress Controller to pass the
X-Forwarded-For
header that your external proxy sets on external cluster requests through unmodified.To configure the Ingress Controller to set the
X-Forwarded-For
header on internal cluster requests, which do not go through the external proxy, specify theif-none
policy. If an HTTP request already has the header set through the external proxy, then the Ingress Controller preserves it. If the header is absent because the request did not come through the proxy, then the Ingress Controller adds the header.
As an application developer, you can:
Configure an application-specific external proxy that injects the
X-Forwarded-For
header.To configure an Ingress Controller to pass the header through unmodified for an application’s Route, without affecting the policy for other Routes, add an annotation
haproxy.router.openshift.io/set-forwarded-headers: if-none
orhaproxy.router.openshift.io/set-forwarded-headers: never
on the Route for the application.NoteYou can set the
haproxy.router.openshift.io/set-forwarded-headers
annotation on a per route basis, independent from the globally set value for the Ingress Controller.
7.8.14. Enabling HTTP/2 Ingress connectivity
You can enable transparent end-to-end HTTP/2 connectivity in HAProxy. It allows application owners to make use of HTTP/2 protocol capabilities, including single connection, header compression, binary streams, and more.
You can enable HTTP/2 connectivity for an individual Ingress Controller or for the entire cluster.
To enable the use of HTTP/2 for the connection from the client to HAProxy, a route must specify a custom certificate. A route that uses the default certificate cannot use HTTP/2. This restriction is necessary to avoid problems from connection coalescing, where the client re-uses a connection for different routes that use the same certificate.
The connection from HAProxy to the application pod can use HTTP/2 only for re-encrypt routes and not for edge-terminated or insecure routes. This restriction is because HAProxy uses Application-Level Protocol Negotiation (ALPN), which is a TLS extension, to negotiate the use of HTTP/2 with the back-end. The implication is that end-to-end HTTP/2 is possible with passthrough and re-encrypt and not with insecure or edge-terminated routes.
Using WebSockets with a re-encrypt route and with HTTP/2 enabled on an Ingress Controller requires WebSocket support over HTTP/2. WebSockets over HTTP/2 is a feature of HAProxy 2.4, which is unsupported in OpenShift Container Platform at this time.
For non-passthrough routes, the Ingress Controller negotiates its connection to the application independently of the connection from the client. This means a client may connect to the Ingress Controller and negotiate HTTP/1.1, and the Ingress Controller may then connect to the application, negotiate HTTP/2, and forward the request from the client HTTP/1.1 connection using the HTTP/2 connection to the application. This poses a problem if the client subsequently tries to upgrade its connection from HTTP/1.1 to the WebSocket protocol, because the Ingress Controller cannot forward WebSocket to HTTP/2 and cannot upgrade its HTTP/2 connection to WebSocket. Consequently, if you have an application that is intended to accept WebSocket connections, it must not allow negotiating the HTTP/2 protocol or else clients will fail to upgrade to the WebSocket protocol.
Procedure
Enable HTTP/2 on a single Ingress Controller.
To enable HTTP/2 on an Ingress Controller, enter the
oc annotate
command:$ oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=true
Replace
<ingresscontroller_name>
with the name of the Ingress Controller to annotate.
Enable HTTP/2 on the entire cluster.
To enable HTTP/2 for the entire cluster, enter the
oc annotate
command:$ oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
TipYou can alternatively apply the following YAML to add the annotation:
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster annotations: ingress.operator.openshift.io/default-enable-http2: "true"
7.8.15. Configuring the PROXY protocol for an Ingress Controller
A cluster administrator can configure the PROXY protocol when an Ingress Controller uses either the HostNetwork
or NodePortService
endpoint publishing strategy types. The PROXY protocol enables the load balancer to preserve the original client addresses for connections that the Ingress Controller receives. The original client addresses are useful for logging, filtering, and injecting HTTP headers. In the default configuration, the connections that the Ingress Controller receives only contain the source address that is associated with the load balancer.
This feature is not supported in cloud deployments. This restriction is because when OpenShift Container Platform runs in a cloud platform, and an IngressController specifies that a service load balancer should be used, the Ingress Operator configures the load balancer service and enables the PROXY protocol based on the platform requirement for preserving source addresses.
You must configure both OpenShift Container Platform and the external load balancer to either use the PROXY protocol or to use TCP.
The PROXY protocol is unsupported for the default Ingress Controller with installer-provisioned clusters on non-cloud platforms that use a Keepalived Ingress VIP.
Prerequisites
- You created an Ingress Controller.
Procedure
Edit the Ingress Controller resource:
$ oc -n openshift-ingress-operator edit ingresscontroller/default
Set the PROXY configuration:
If your Ingress Controller uses the hostNetwork endpoint publishing strategy type, set the
spec.endpointPublishingStrategy.hostNetwork.protocol
subfield toPROXY
:Sample
hostNetwork
configuration toPROXY
spec: endpointPublishingStrategy: hostNetwork: protocol: PROXY type: HostNetwork
If your Ingress Controller uses the NodePortService endpoint publishing strategy type, set the
spec.endpointPublishingStrategy.nodePort.protocol
subfield toPROXY
:Sample
nodePort
configuration toPROXY
spec: endpointPublishingStrategy: nodePort: protocol: PROXY type: NodePortService
7.8.16. Specifying an alternative cluster domain using the appsDomain option
As a cluster administrator, you can specify an alternative to the default cluster domain for user-created routes by configuring the appsDomain
field. The appsDomain
field is an optional domain for OpenShift Container Platform to use instead of the default, which is specified in the domain
field. If you specify an alternative domain, it overrides the default cluster domain for the purpose of determining the default host for a new route.
For example, you can use the DNS domain for your company as the default domain for routes and ingresses for applications running on your cluster.
Prerequisites
- You deployed an OpenShift Container Platform cluster.
-
You installed the
oc
command line interface.
Procedure
Configure the
appsDomain
field by specifying an alternative default domain for user-created routes.Edit the ingress
cluster
resource:$ oc edit ingresses.config/cluster -o yaml
Edit the YAML file:
Sample
appsDomain
configuration totest.example.com
apiVersion: config.openshift.io/v1 kind: Ingress metadata: name: cluster spec: domain: apps.example.com 1 appsDomain: <test.example.com> 2
Verify that an existing route contains the domain name specified in the
appsDomain
field by exposing the route and verifying the route domain change:NoteWait for the
openshift-apiserver
finish rolling updates before exposing the route.Expose the route:
$ oc expose service hello-openshift route.route.openshift.io/hello-openshift exposed
Example output:
$ oc get routes NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD hello-openshift hello_openshift-<my_project>.test.example.com hello-openshift 8080-tcp None
7.8.17. Converting HTTP header case
HAProxy lowercases HTTP header names by default; for example, changing Host: xyz.com
to host: xyz.com
. If legacy applications are sensitive to the capitalization of HTTP header names, use the Ingress Controller spec.httpHeaders.headerNameCaseAdjustments
API field for a solution to accommodate legacy applications until they can be fixed.
OpenShift Container Platform includes HAProxy 2.2. If you want to update to this version of the web-based load balancer, ensure that you add the spec.httpHeaders.headerNameCaseAdjustments
section to your cluster’s configuration file.
As a cluster administrator, you can convert the HTTP header case by entering the oc patch
command or by setting the HeaderNameCaseAdjustments
field in the Ingress Controller YAML file.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). -
You have access to the cluster as a user with the
cluster-admin
role.
Procedure
Capitalize an HTTP header by using the
oc patch
command.Change the HTTP header from
host
toHost
by running the following command:$ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"httpHeaders":{"headerNameCaseAdjustments":["Host"]}}}'
Create a
Route
resource YAML file so that the annotation can be applied to the application.Example of a route named
my-application
apiVersion: route.openshift.io/v1 kind: Route metadata: annotations: haproxy.router.openshift.io/h1-adjust-case: true 1 name: <application_name> namespace: <application_name> # ...
- 1
- Set
haproxy.router.openshift.io/h1-adjust-case
so that the Ingress Controller can adjust thehost
request header as specified.
Specify adjustments by configuring the
HeaderNameCaseAdjustments
field in the Ingress Controller YAML configuration file.The following example Ingress Controller YAML file adjusts the
host
header toHost
for HTTP/1 requests to appropriately annotated routes:Example Ingress Controller YAML
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpHeaders: headerNameCaseAdjustments: - Host
The following example route enables HTTP response header name case adjustments by using the
haproxy.router.openshift.io/h1-adjust-case
annotation:Example route YAML
apiVersion: route.openshift.io/v1 kind: Route metadata: annotations: haproxy.router.openshift.io/h1-adjust-case: true 1 name: my-application namespace: my-application spec: to: kind: Service name: my-application
- 1
- Set
haproxy.router.openshift.io/h1-adjust-case
to true.
7.8.18. Using router compression
You configure the HAProxy Ingress Controller to specify router compression globally for specific MIME types. You can use the mimeTypes
variable to define the formats of MIME types to which compression is applied. The types are: application, image, message, multipart, text, video, or a custom type prefaced by "X-". To see the full notation for MIME types and subtypes, see RFC1341.
Memory allocated for compression can affect the max connections. Additionally, compression of large buffers can cause latency, like heavy regex or long lists of regex.
Not all MIME types benefit from compression, but HAProxy still uses resources to try to compress if instructed to. Generally, text formats, such as html, css, and js, formats benefit from compression, but formats that are already compressed, such as image, audio, and video, benefit little in exchange for the time and resources spent on compression.
Procedure
Configure the
httpCompression
field for the Ingress Controller.Use the following command to edit the
IngressController
resource:$ oc edit -n openshift-ingress-operator ingresscontrollers/default
Under
spec
, set thehttpCompression
policy field tomimeTypes
and specify a list of MIME types that should have compression applied:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: default namespace: openshift-ingress-operator spec: httpCompression: mimeTypes: - "text/html" - "text/css; charset=utf-8" - "application/json" ...
7.8.19. Exposing router metrics
You can expose the HAProxy router metrics by default in Prometheus format on the default stats port, 1936. The external metrics collection and aggregation systems such as Prometheus can access the HAProxy router metrics. You can view the HAProxy router metrics in a browser in the HTML and comma separated values (CSV) format.
Prerequisites
- You configured your firewall to access the default stats port, 1936.
Procedure
Get the router pod name by running the following command:
$ oc get pods -n openshift-ingress
Example output
NAME READY STATUS RESTARTS AGE router-default-76bfffb66c-46qwp 1/1 Running 0 11h
Get the router’s username and password, which the router pod stores in the
/var/lib/haproxy/conf/metrics-auth/statsUsername
and/var/lib/haproxy/conf/metrics-auth/statsPassword
files:Get the username by running the following command:
$ oc rsh <router_pod_name> cat metrics-auth/statsUsername
Get the password by running the following command:
$ oc rsh <router_pod_name> cat metrics-auth/statsPassword
Get the router IP and metrics certificates by running the following command:
$ oc describe pod <router_pod>
Get the raw statistics in Prometheus format by running the following command:
$ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
Access the metrics securely by running the following command:
$ curl -u user:password https://<router_IP>:<stats_port>/metrics -k
Access the default stats port, 1936, by running the following command:
$ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
Example 7.1. Example output
... # HELP haproxy_backend_connections_total Total number of connections. # TYPE haproxy_backend_connections_total gauge haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route"} 0 haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route-alt"} 0 haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route01"} 0 ... # HELP haproxy_exporter_server_threshold Number of servers tracked and the current threshold value. # TYPE haproxy_exporter_server_threshold gauge haproxy_exporter_server_threshold{type="current"} 11 haproxy_exporter_server_threshold{type="limit"} 500 ... # HELP haproxy_frontend_bytes_in_total Current total of incoming bytes. # TYPE haproxy_frontend_bytes_in_total gauge haproxy_frontend_bytes_in_total{frontend="fe_no_sni"} 0 haproxy_frontend_bytes_in_total{frontend="fe_sni"} 0 haproxy_frontend_bytes_in_total{frontend="public"} 119070 ... # HELP haproxy_server_bytes_in_total Current total of incoming bytes. # TYPE haproxy_server_bytes_in_total gauge haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_no_sni",service=""} 0 haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_sni",service=""} 0 haproxy_server_bytes_in_total{namespace="default",pod="docker-registry-5-nk5fz",route="docker-registry",server="10.130.0.89:5000",service="docker-registry"} 0 haproxy_server_bytes_in_total{namespace="default",pod="hello-rc-vkjqx",route="hello-route",server="10.130.0.90:8080",service="hello-svc-1"} 0 ...
Launch the stats window by entering the following URL in a browser:
http://<user>:<password>@<router_IP>:<stats_port>
Optional: Get the stats in CSV format by entering the following URL in a browser:
http://<user>:<password>@<router_ip>:1936/metrics;csv
7.8.20. Customizing HAProxy error code response pages
As a cluster administrator, you can specify a custom error code response page for either 503, 404, or both error pages. The HAProxy router serves a 503 error page when the application pod is not running or a 404 error page when the requested URL does not exist. For example, if you customize the 503 error code response page, then the page is served when the application pod is not running, and the default 404 error code HTTP response page is served by the HAProxy router for an incorrect route or a non-existing route.
Custom error code response pages are specified in a config map then patched to the Ingress Controller. The config map keys have two available file names as follows: error-page-503.http
and error-page-404.http
.
Custom HTTP error code response pages must follow the HAProxy HTTP error page configuration guidelines. Here is an example of the default OpenShift Container Platform HAProxy router http 503 error code response page. You can use the default content as a template for creating your own custom page.
By default, the HAProxy router serves only a 503 error page when the application is not running or when the route is incorrect or non-existent. This default behavior is the same as the behavior on OpenShift Container Platform 4.8 and earlier. If a config map for the customization of an HTTP error code response is not provided, and you are using a custom HTTP error code response page, the router serves a default 404 or 503 error code response page.
If you use the OpenShift Container Platform default 503 error code page as a template for your customizations, the headers in the file require an editor that can use CRLF line endings.
Procedure
Create a config map named
my-custom-error-code-pages
in theopenshift-config
namespace:$ oc -n openshift-config create configmap my-custom-error-code-pages \ --from-file=error-page-503.http \ --from-file=error-page-404.http
ImportantIf you do not specify the correct format for the custom error code response page, a router pod outage occurs. To resolve this outage, you must delete or correct the config map and delete the affected router pods so they can be recreated with the correct information.
Patch the Ingress Controller to reference the
my-custom-error-code-pages
config map by name:$ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge
The Ingress Operator copies the
my-custom-error-code-pages
config map from theopenshift-config
namespace to theopenshift-ingress
namespace. The Operator names the config map according to the pattern,<your_ingresscontroller_name>-errorpages
, in theopenshift-ingress
namespace.Display the copy:
$ oc get cm default-errorpages -n openshift-ingress
Example output
NAME DATA AGE default-errorpages 2 25s 1
- 1
- The example config map name is
default-errorpages
because thedefault
Ingress Controller custom resource (CR) was patched.
Confirm that the config map containing the custom error response page mounts on the router volume where the config map key is the filename that has the custom HTTP error code response:
For 503 custom HTTP custom error code response:
$ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-503.http
For 404 custom HTTP custom error code response:
$ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-404.http
Verification
Verify your custom error code HTTP response:
Create a test project and application:
$ oc new-project test-ingress
$ oc new-app django-psql-example
For 503 custom http error code response:
- Stop all the pods for the application.
Run the following curl command or visit the route hostname in the browser:
$ curl -vk <route_hostname>
For 404 custom http error code response:
- Visit a non-existent route or an incorrect route.
Run the following curl command or visit the route hostname in the browser:
$ curl -vk <route_hostname>
Check if the
errorfile
attribute is properly in thehaproxy.config
file:$ oc -n openshift-ingress rsh <router> cat /var/lib/haproxy/conf/haproxy.config | grep errorfile
7.8.21. Setting the Ingress Controller maximum connections
A cluster administrator can set the maximum number of simultaneous connections for OpenShift router deployments. You can patch an existing Ingress Controller to increase the maximum number of connections.
Prerequisites
- The following assumes that you already created an Ingress Controller
Procedure
Update the Ingress Controller to change the maximum number of connections for HAProxy:
$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"maxConnections": 7500}}}'
WarningIf you set the
spec.tuningOptions.maxConnections
value greater than the current operating system limit, the HAProxy process will not start. See the table in the "Ingress Controller configuration parameters" section for more information about this parameter.
7.9. Additional resources
Chapter 8. Ingress Node Firewall Operator in OpenShift Container Platform
The Ingress Node Firewall Operator allows administrators to manage firewall configurations at the node level.
8.1. Ingress Node Firewall Operator
The Ingress Node Firewall Operator provides ingress firewall rules at a node level by deploying the daemon set to nodes you specify and manage in the firewall configurations. To deploy the daemon set, you create an IngressNodeFirewallConfig
custom resource (CR). The Operator applies the IngressNodeFirewallConfig
CR to create ingress node firewall daemon set daemon
, which run on all nodes that match the nodeSelector
.
You configure rules
of the IngressNodeFirewall
CR and apply them to clusters using the nodeSelector
and setting values to "true".
The Ingress Node Firewall Operator supports only stateless firewall rules.
The maximum transmission units (MTU) parameter is 4Kb (kilobytes) in OpenShift Container Platform 4.12.
Network interface controllers (NICs) that do not support native XDP drivers will run at a lower performance.
Ingress Node Firewall Operator is not supported on Amazon Web Services (AWS) with the default OpenShift installation or on Red Hat OpenShift Service on AWS (ROSA). For more information on Red Hat OpenShift Service on AWS support and ingress, see Ingress Operator in Red Hat OpenShift Service on AWS.
8.2. Installing the Ingress Node Firewall Operator
As a cluster administrator, you can install the Ingress Node Firewall Operator by using the OpenShift Container Platform CLI or the web console.
8.2.1. Installing the Ingress Node Firewall Operator using the CLI
As a cluster administrator, you can install the Operator using the CLI.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have an account with administrator privileges.
Procedure
To create the
openshift-ingress-node-firewall
namespace, enter the following command:$ cat << EOF| oc create -f - apiVersion: v1 kind: Namespace metadata: labels: pod-security.kubernetes.io/enforce: privileged pod-security.kubernetes.io/enforce-version: v1.24 name: openshift-ingress-node-firewall EOF
To create an
OperatorGroup
CR, enter the following command:$ cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ingress-node-firewall-operators namespace: openshift-ingress-node-firewall EOF
Subscribe to the Ingress Node Firewall Operator.
To create a
Subscription
CR for the Ingress Node Firewall Operator, enter the following command:$ cat << EOF| oc create -f - apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ingress-node-firewall-sub namespace: openshift-ingress-node-firewall spec: name: ingress-node-firewall channel: stable source: redhat-operators sourceNamespace: openshift-marketplace EOF
To verify that the Operator is installed, enter the following command:
$ oc get ip -n openshift-ingress-node-firewall
Example output
NAME CSV APPROVAL APPROVED install-5cvnz ingress-node-firewall.4.12.0-202211122336 Automatic true
To verify the version of the Operator, enter the following command:
$ oc get csv -n openshift-ingress-node-firewall
Example output
NAME DISPLAY VERSION REPLACES PHASE ingress-node-firewall.4.12.0-202211122336 Ingress Node Firewall Operator 4.12.0-202211122336 ingress-node-firewall.4.12.0-202211102047 Succeeded
8.2.2. Installing the Ingress Node Firewall Operator using the web console
As a cluster administrator, you can install the Operator using the web console.
Prerequisites
-
You have installed the OpenShift CLI (
oc
). - You have an account with administrator privileges.
Procedure
Install the Ingress Node Firewall Operator:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Select Ingress Node Firewall Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
- Click Install.
Verify that the Ingress Node Firewall Operator is installed successfully:
- Navigate to the Operators → Installed Operators page.
Ensure that Ingress Node Firewall Operator is listed in the openshift-ingress-node-firewall project with a Status of InstallSucceeded.
NoteDuring installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not have a Status of InstallSucceeded, troubleshoot using the following steps:
- Inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.
-
Navigate to the Workloads → Pods page and check the logs for pods in the
openshift-ingress-node-firewall
project. Check the namespace of the YAML file. If the annotation is missing, you can add the annotation
workload.openshift.io/allowed=management
to the Operator namespace with the following command:$ oc annotate ns/openshift-ingress-node-firewall workload.openshift.io/allowed=management
NoteFor single-node OpenShift clusters, the
openshift-ingress-node-firewall
namespace requires theworkload.openshift.io/allowed=management
annotation.
8.3. Deploying Ingress Node Firewall Operator
Prerequisite
- The Ingress Node Firewall Operator is installed.
Procedure
To deploy the Ingress Node Firewall Operator, create a IngressNodeFirewallConfig
custom resource that will deploy the Operator’s daemon set. You can deploy one or multiple IngressNodeFirewall
CRDs to nodes by applying firewall rules.
-
Create the
IngressNodeFirewallConfig
inside theopenshift-ingress-node-firewall
namespace namedingressnodefirewallconfig
. Run the following command to deploy Ingress Node Firewall Operator rules:
$ oc apply -f rule.yaml
8.3.1. Ingress Node Firewall configuration object
The fields for the Ingress Node Firewall configuration object are described in the following table:
Field | Type | Description |
---|---|---|
|
|
The name of the CR object. The name of the firewall rules object must be |
|
|
Namespace for the Ingress Firewall Operator CR object. The |
|
| A node selection constraint used to target nodes through specified node labels. For example: spec: nodeSelector: node-role.kubernetes.io/worker: "" Note
One label used in |
The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector
.
Ingress Node Firewall Operator example configuration
A complete Ingress Node Firewall Configuration is specified in the following example:
Example Ingress Node Firewall Configuration object
apiVersion: ingressnodefirewall.openshift.io/v1alpha1 kind: IngressNodeFirewallConfig metadata: name: ingressnodefirewallconfig namespace: openshift-ingress-node-firewall spec: nodeSelector: node-role.kubernetes.io/worker: ""
The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector
.
8.3.2. Ingress Node Firewall rules object
The fields for the Ingress Node Firewall rules object are described in the following table:
Field | Type | Description |
---|---|---|
|
| The name of the CR object. |
|
|
The fields for this object specify the interfaces to apply the firewall rules to. For example, |
|
|
You can use |
|
|
|
Ingress object configuration
The values for the ingress
object are defined in the following table:
Field | Type | Description |
---|---|---|
|
| Allows you to set the CIDR block. You can configure multiple CIDRs from different address families. Note
Different CIDRs allow you to use the same order rule. In the case that there are multiple |
|
|
Ingress firewall
Set Note Ingress firewall rules are verified using a verification webhook that blocks any invalid configuration. The verification webhook prevents you from blocking any critical cluster services such as the API server or SSH. |
Ingress Node Firewall rules object example
A complete Ingress Node Firewall configuration is specified in the following example:
Example Ingress Node Firewall configuration
apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
name: ingressnodefirewall
spec:
interfaces:
- eth0
nodeSelector:
matchLabels:
<ingress_firewall_label_name>: <label_value> 1
ingress:
- sourceCIDRs:
- 172.16.0.0/12
rules:
- order: 10
protocolConfig:
protocol: ICMP
icmp:
icmpType: 8 #ICMP Echo request
action: Deny
- order: 20
protocolConfig:
protocol: TCP
tcp:
ports: "8000-9000"
action: Deny
- sourceCIDRs:
- fc00:f853:ccd:e793::0/64
rules:
- order: 10
protocolConfig:
protocol: ICMPv6
icmpv6:
icmpType: 128 #ICMPV6 Echo request
action: Deny
- 1
- A <label_name> and a <label_value> must exist on the node and must match the
nodeselector
label and value applied to the nodes you want theingressfirewallconfig
CR to run on. The <label_value> can betrue
orfalse
. By usingnodeSelector
labels, you can target separate groups of nodes to apply different rules to using theingressfirewallconfig
CR.
Zero trust Ingress Node Firewall rules object example
Zero trust Ingress Node Firewall rules can provide additional security to multi-interface clusters. For example, you can use zero trust Ingress Node Firewall rules to drop all traffic on a specific interface except for SSH.
A complete configuration of a zero trust Ingress Node Firewall rule set is specified in the following example:
Users need to add all ports their application will use to their allowlist in the following case to ensure proper functionality.
Example zero trust Ingress Node Firewall rules
apiVersion: ingressnodefirewall.openshift.io/v1alpha1 kind: IngressNodeFirewall metadata: name: ingressnodefirewall-zero-trust spec: interfaces: - eth1 1 nodeSelector: matchLabels: <ingress_firewall_label_name>: <label_value> 2 ingress: - sourceCIDRs: - 0.0.0.0/0 3 rules: - order: 10 protocolConfig: protocol: TCP tcp: ports: 22 action: Allow - order: 20 action: Deny 4
8.4. Viewing Ingress Node Firewall Operator rules
Procedure
Run the following command to view all current rules :
$ oc get ingressnodefirewall
Choose one of the returned
<resource>
names and run the following command to view the rules or configs:$ oc get <resource> <name> -o yaml
8.5. Troubleshooting the Ingress Node Firewall Operator
Run the following command to list installed Ingress Node Firewall custom resource definitions (CRD):
$ oc get crds | grep ingressnodefirewall
Example output
NAME READY UP-TO-DATE AVAILABLE AGE ingressnodefirewallconfigs.ingressnodefirewall.openshift.io 2022-08-25T10:03:01Z ingressnodefirewallnodestates.ingressnodefirewall.openshift.io 2022-08-25T10:03:00Z ingressnodefirewalls.ingressnodefirewall.openshift.io 2022-08-25T10:03:00Z
Run the following command to view the state of the Ingress Node Firewall Operator:
$ oc get pods -n openshift-ingress-node-firewall
Example output
NAME READY STATUS RESTARTS AGE ingress-node-firewall-controller-manager 2/2 Running 0 5d21h ingress-node-firewall-daemon-pqx56 3/3 Running 0 5d21h
The following fields provide information about the status of the Operator:
READY
,STATUS
,AGE
, andRESTARTS
. TheSTATUS
field isRunning
when the Ingress Node Firewall Operator is deploying a daemon set to the assigned nodes.Run the following command to collect all ingress firewall node pods' logs:
$ oc adm must-gather – gather_ingress_node_firewall
The logs are available in the sos node’s report containing eBPF
bpftool
outputs at/sos_commands/ebpf
. These reports include lookup tables used or updated as the ingress firewall XDP handles packet processing, updates statistics, and emits events.
Chapter 9. Configuring an Ingress Controller for manual DNS Management
As a cluster administrator, when you create an Ingress Controller, the Operator manages the DNS records automatically. This has some limitations when the required DNS zone is different from the cluster DNS zone or when the DNS zone is hosted outside the cloud provider.
As a cluster administrator, you can configure an Ingress Controller to stop automatic DNS management and start manual DNS management. Set dnsManagementPolicy
to specify when it should be automatically or manually managed.
When you change an Ingress Controller from Managed
to Unmanaged
DNS management policy, the Operator does not clean up the previous wildcard DNS record provisioned on the cloud. When you change an Ingress Controller from Unmanaged
to Managed
DNS management policy, the Operator attempts to create the DNS record on the cloud provider if it does not exist or updates the DNS record if it already exists.
When you set dnsManagementPolicy
to unmanaged
, you have to manually manage the lifecycle of the wildcard DNS record on the cloud provider.
9.1. Managed
DNS management policy
The Managed
DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is automatically managed by the Operator.
9.2. Unmanaged
DNS management policy
The Unmanaged
DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is not automatically managed, instead it becomes the responsibility of the cluster administrator.
On the AWS cloud platform, if the domain on the Ingress Controller does not match with dnsConfig.Spec.BaseDomain
then the DNS management policy is automatically set to Unmanaged
.
9.3. Creating a custom Ingress Controller with the Unmanaged
DNS management policy
As a cluster administrator, you can create a new custom Ingress Controller with the Unmanaged
DNS management policy.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a custom resource (CR) file named
sample-ingress.yaml
containing the following:apiVersion: operator.openshift.io/v1 kind: IngressController metadata: namespace: openshift-ingress-operator name: <name> 1 spec: domain: <domain> 2 endpointPublishingStrategy: type: LoadBalancerService loadBalancer: scope: External 3 dnsManagementPolicy: Unmanaged 4
- 1
- Specify the
<name>
with a name for theIngressController
object. - 2
- Specify the
domain
based on the DNS record that was created as a prerequisite. - 3
- Specify the
scope
asExternal
to expose the load balancer externally. - 4
dnsManagementPolicy
indicates if the Ingress Controller is managing the lifecycle of the wildcard DNS record associated with the load balancer. The valid values areManaged
andUnmanaged
. The default value isManaged
.
Save the file to apply the changes.
oc apply -f <name>.yaml 1
9.4. Modifying an existing Ingress Controller
As a cluster administrator, you can modify an existing Ingress Controller to manually manage the DNS record lifecycle.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Modify the chosen
IngressController
to setdnsManagementPolicy
:SCOPE=$(oc -n openshift-ingress-operator get ingresscontroller <name> -o=jsonpath="{.status.endpointPublishingStrategy.loadBalancer.scope}") oc -n openshift-ingress-operator patch ingresscontrollers/<name> --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"dnsManagementPolicy":"Unmanaged", "scope":"${SCOPE}"}}}}'
- Optional: You can delete the associated DNS record in the cloud provider.
9.5. Additional resources
Chapter 10. Configuring the Ingress Controller endpoint publishing strategy
The endpointPublishingStrategy
is used to publish the Ingress Controller endpoints to other networks, enable load balancer integrations, and provide access to other systems.
On Red Hat OpenStack Platform (RHOSP), the LoadBalancerService
endpoint publishing strategy is supported only if a cloud provider is configured to create health monitors. For RHOSP 16.2, this strategy is possible only if you use the Amphora Octavia provider.
For more information, see the "Setting RHOSP Cloud Controller Manager options" section of the RHOSP installation documentation.
10.1. Ingress Controller endpoint publishing strategy
NodePortService
endpoint publishing strategy
The NodePortService
endpoint publishing strategy publishes the Ingress Controller using a Kubernetes NodePort service.
In this configuration, the Ingress Controller deployment uses container networking. A NodePortService
is created to publish the deployment. The specific node ports are dynamically allocated by OpenShift Container Platform; however, to support static port allocations, your changes to the node port field of the managed NodePortService
are preserved.
Figure 10.1. Diagram of NodePortService
The preceding graphic shows the following concepts pertaining to OpenShift Container Platform Ingress NodePort endpoint publishing strategy:
- All the available nodes in the cluster have their own, externally accessible IP addresses. The service running in the cluster is bound to the unique NodePort for all the nodes.
-
When the client connects to a node that is down, for example, by connecting the
10.0.128.4
IP address in the graphic, the node port directly connects the client to an available node that is running the service. In this scenario, no load balancing is required. As the image shows, the10.0.128.4
address is down and another IP address must be used instead.
The Ingress Operator ignores any updates to .spec.ports[].nodePort
fields of the service.
By default, ports are allocated automatically and you can access the port allocations for integrations. However, sometimes static port allocations are necessary to integrate with existing infrastructure which may not be easily reconfigured in response to dynamic ports. To achieve integrations with static node ports, you can update the managed service resource directly.
For more information, see the Kubernetes Services documentation on NodePort
.
HostNetwork
endpoint publishing strategy
The HostNetwork
endpoint publishing strategy publishes the Ingress Controller on node ports where the Ingress Controller is deployed.
An Ingress Controller with the HostNetwork
endpoint publishing strategy can have only one pod replica per node. If you want n replicas, you must use at least n nodes where those replicas can be scheduled. Because each pod replica requests ports 80
and 443
on the node host where it is scheduled, a replica cannot be scheduled to a node if another pod on the same node is using those ports.
The HostNetwork
object has a hostNetwork
field with the following default values for the optional binding ports: httpPort: 80
, httpsPort: 443
, and statsPort: 1936
. By specifying different binding ports for your network, you can deploy multiple Ingress Controllers on the same node for the HostNetwork
strategy.
Example
apiVersion: operator.openshift.io/v1 kind: IngressController metadata: name: internal namespace: openshift-ingress-operator spec: domain: example.com endpointPublishingStrategy: type: HostNetwork hostNetwork: httpPort: 80 httpsPort: 443 statsPort: 1936
10.1.1. Configuring the Ingress Controller endpoint publishing scope to Internal
When a cluster administrator installs a new cluster without specifying that the cluster is private, the default Ingress Controller is created with a scope
set to External
. Cluster administrators can change an External
scoped Ingress Controller to Internal
.
Prerequisites
-
You installed the
oc
CLI.
Procedure
To change an
External
scoped Ingress Controller toInternal
, enter the following command:$ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}'
To check the status of the Ingress Controller, enter the following command:
$ oc -n openshift-ingress-operator get ingresscontrollers/default -o yaml
The
Progressing
status condition indicates whether you must take further action. For example, the status condition can indicate that you need to delete the service by entering the following command:$ oc -n openshift-ingress delete services/router-default
If you delete the service, the Ingress Operator recreates it as
Internal
.
10.1.2. Configuring the Ingress Controller endpoint publishing scope to External
When a cluster administrator installs a new cluster without specifying that the cluster is private, the default Ingress Controller is created with a scope
set to External
.
The Ingress Controller’s scope can be configured to be Internal
during installation or after, and cluster administrators can change an Internal
Ingress Controller to External
.
On some platforms, it is necessary to delete and recreate the service.
Changing the scope can cause disruption to Ingress traffic, potentially for several minutes. This applies to platforms where it is necessary to delete and recreate the service, because the procedure can cause OpenShift Container Platform to deprovision the existing service load balancer, provision a new one, and update DNS.
Prerequisites
-
You installed the
oc
CLI.
Procedure
To change an
Internal
scoped Ingress Controller toExternal
, enter the following command:$ oc -n openshift-ingress-operator patch ingresscontrollers/private --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"External"}}}}'
To check the status of the Ingress Controller, enter the following command:
$ oc -n openshift-ingress-operator get ingresscontrollers/default -o yaml
The
Progressing
status condition indicates whether you must take further action. For example, the status condition can indicate that you need to delete the service by entering the following command:$ oc -n openshift-ingress delete services/router-default
If you delete the service, the Ingress Operator recreates it as
External
.
10.2. Additional resources
Chapter 11. Verifying connectivity to an endpoint
The Cluster Network Operator (CNO) runs a controller, the connectivity check controller, that performs a connection health check between resources within your cluster. By reviewing the results of the health checks, you can diagnose connection problems or eliminate network connectivity as the cause of an issue that you are investigating.
11.1. Connection health checks performed
To verify that cluster resources are reachable, a TCP connection is made to each of the following cluster API services:
- Kubernetes API server service
- Kubernetes API server endpoints
- OpenShift API server service
- OpenShift API server endpoints
- Load balancers
To verify that services and service endpoints are reachable on every node in the cluster, a TCP connection is made to each of the following targets:
- Health check target service
- Health check target endpoints
11.2. Implementation of connection health checks
The connectivity check controller orchestrates connection verification checks in your cluster. The results for the connection tests are stored in PodNetworkConnectivity
objects in the openshift-network-diagnostics
namespace. Connection tests are performed every minute in parallel.
The Cluster Network Operator (CNO) deploys several resources to the cluster to send and receive connectivity health checks:
- Health check source
-
This program deploys in a single pod replica set managed by a
Deployment
object. The program consumesPodNetworkConnectivity
objects and connects to thespec.targetEndpoint
specified in each object. - Health check target
- A pod deployed as part of a daemon set on every node in the cluster. The pod listens for inbound health checks. The presence of this pod on every node allows for the testing of connectivity to each node.
11.3. PodNetworkConnectivityCheck object fields
The PodNetworkConnectivityCheck
object fields are described in the following tables.
Field | Type | Description |
---|---|---|
|
|
The name of the object in the following format:
|
|
|
The namespace that the object is associated with. This value is always |
|
|
The name of the pod where the connection check originates, such as |
|
|
The target of the connection check, such as |
|
| Configuration for the TLS certificate to use. |
|
| The name of the TLS certificate used, if any. The default value is an empty string. |
|
| An object representing the condition of the connection test and logs of recent connection successes and failures. |
|
| The latest status of the connection check and any previous statuses. |
|
| Connection test logs from unsuccessful attempts. |
|
| Connect test logs covering the time periods of any outages. |
|
| Connection test logs from successful attempts. |
The following table describes the fields for objects in the status.conditions
array:
Field | Type | Description |
---|---|---|
|
| The time that the condition of the connection transitioned from one status to another. |
|
| The details about last transition in a human readable format. |
|
| The last status of the transition in a machine readable format. |
|
| The status of the condition. |
|
| The type of the condition. |
The following table describes the fields for objects in the status.conditions
array:
Field | Type | Description |
---|---|---|
|
| The timestamp from when the connection failure is resolved. |
|
| Connection log entries, including the log entry related to the successful end of the outage. |
|
| A summary of outage details in a human readable format. |
|
| The timestamp from when the connection failure is first detected. |
|
| Connection log entries, including the original failure. |
Connection log fields
The fields for a connection log entry are described in the following table. The object is used in the following fields:
-
status.failures[]
-
status.successes[]
-
status.outages[].startLogs[]
-
status.outages[].endLogs[]
Field | Type | Description |
---|---|---|
|
| Records the duration of the action. |
|
| Provides the status in a human readable format. |
|
|
Provides the reason for status in a machine readable format. The value is one of |
|
| Indicates if the log entry is a success or failure. |
|
| The start time of connection check. |
11.4. Verifying network connectivity for an endpoint
As a cluster administrator, you can verify the connectivity of an endpoint, such as an API server, load balancer, service, or pod.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role.
Procedure
To list the current
PodNetworkConnectivityCheck
objects, enter the following command:$ oc get podnetworkconnectivitycheck -n openshift-network-diagnostics
Example output
NAME AGE network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 73m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-default-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-external 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-internal 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-c-n8mbf 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-d-4hnrz 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-service-cluster 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1 75m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2 74m network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-service-cluster 75m
View the connection test logs:
- From the output of the previous command, identify the endpoint that you want to review the connectivity logs for.
To view the object, enter the following command:
$ oc get podnetworkconnectivitycheck <name> \ -n openshift-network-diagnostics -o yaml
where
<name>
specifies the name of thePodNetworkConnectivityCheck
object.Example output
apiVersion: controlplane.operator.openshift.io/v1alpha1 kind: PodNetworkConnectivityCheck metadata: name: network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0 namespace: openshift-network-diagnostics ... spec: sourcePod: network-check-source-7c88f6d9f-hmg2f targetEndpoint: 10.0.0.4:6443 tlsClientCert: name: "" status: conditions: - lastTransitionTime: "2021-01-13T20:11:34Z" message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnectSuccess status: "True" type: Reachable failures: - latency: 2.241775ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:10:34Z" - latency: 2.582129ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:09:34Z" - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" outages: - end: "2021-01-13T20:11:34Z" endLogs: - latency: 2.032018ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T20:11:34Z" - latency: 2.241775ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:10:34Z" - latency: 2.582129ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:09:34Z" - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" message: Connectivity restored after 2m59.999789186s start: "2021-01-13T20:08:34Z" startLogs: - latency: 3.483578ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect: connection refused' reason: TCPConnectError success: false time: "2021-01-13T20:08:34Z" successes: - latency: 2.845865ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:14:34Z" - latency: 2.926345ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:13:34Z" - latency: 2.895796ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:12:34Z" - latency: 2.696844ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:11:34Z" - latency: 1.502064ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:10:34Z" - latency: 1.388857ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:09:34Z" - latency: 1.906383ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:08:34Z" - latency: 2.089073ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:07:34Z" - latency: 2.156994ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:06:34Z" - latency: 1.777043ms message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp connection to 10.0.0.4:6443 succeeded' reason: TCPConnect success: true time: "2021-01-13T21:05:34Z"
Chapter 12. Changing the MTU for the cluster network
As a cluster administrator, you can change the MTU for the cluster network after cluster installation. This change is disruptive as cluster nodes must be rebooted to finalize the MTU change. You can change the MTU only for clusters using the OVN-Kubernetes or OpenShift SDN network plugins.
12.1. About the cluster MTU
During installation the maximum transmission unit (MTU) for the cluster network is detected automatically based on the MTU of the primary network interface of nodes in the cluster. You do not normally need to override the detected MTU.
You might want to change the MTU of the cluster network for several reasons:
- The MTU detected during cluster installation is not correct for your infrastructure
- Your cluster infrastructure now requires a different MTU, such as from the addition of nodes that need a different MTU for optimal performance
You can change the cluster MTU for only the OVN-Kubernetes and OpenShift SDN cluster network plugins.
12.1.1. Service interruption considerations
When you initiate an MTU change on your cluster the following effects might impact service availability:
- At least two rolling reboots are required to complete the migration to a new MTU. During this time, some nodes are not available as they restart.
- Specific applications deployed to the cluster with shorter timeout intervals than the absolute TCP timeout interval might experience disruption during the MTU change.
12.1.2. MTU value selection
When planning your MTU migration there are two related but distinct MTU values to consider.
- Hardware MTU: This MTU value is set based on the specifics of your network infrastructure.
Cluster network MTU: This MTU value is always less than your hardware MTU to account for the cluster network overlay overhead. The specific overhead is determined by your network plugin:
-
OVN-Kubernetes:
100
bytes -
OpenShift SDN:
50
bytes
-
OVN-Kubernetes:
If your cluster requires different MTU values for different nodes, you must subtract the overhead value for your network plugin from the lowest MTU value that is used by any node in your cluster. For example, if some nodes in your cluster have an MTU of 9001
, and some have an MTU of 1500
, you must set this value to 1400
.
To avoid selecting an MTU value that is not acceptable by a node, verify the maximum MTU value (maxmtu
) that is accepted by the network interface by using the ip -d link
command.
12.1.3. How the migration process works
The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.
User-initiated steps | OpenShift Container Platform activity |
---|---|
Set the following values in the Cluster Network Operator configuration:
| Cluster Network Operator (CNO): Confirms that each field is set to a valid value.
If the values provided are valid, the CNO writes out a new temporary configuration with the MTU for the cluster network set to the value of the Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster. |
Reconfigure the MTU of the primary network interface for the nodes on the cluster. You can use a variety of methods to accomplish this, including:
| N/A |
Set the | Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster with the new MTU configuration. |
12.2. Changing the cluster MTU
As a cluster administrator, you can change the maximum transmission unit (MTU) for your cluster. The migration is disruptive and nodes in your cluster might be temporarily unavailable as the MTU update rolls out.
The following procedure describes how to change the cluster MTU by using either machine configs, DHCP, or an ISO. If you use the DHCP or ISO approach, you must refer to configuration artifacts that you kept after installing your cluster to complete the procedure.
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges. You identified the target MTU for your cluster. The correct MTU varies depending on the network plugin that your cluster uses:
-
OVN-Kubernetes: The cluster MTU must be set to
100
less than the lowest hardware MTU value in your cluster. -
OpenShift SDN: The cluster MTU must be set to
50
less than the lowest hardware MTU value in your cluster.
-
OVN-Kubernetes: The cluster MTU must be set to
Procedure
To increase or decrease the MTU for the cluster network complete the following procedure.
To obtain the current MTU for the cluster network, enter the following command:
$ oc describe network.config cluster
Example output
... Status: Cluster Network: Cidr: 10.217.0.0/22 Host Prefix: 23 Cluster Network MTU: 1400 Network Type: OpenShiftSDN Service Network: 10.217.4.0/23 ...
Prepare your configuration for the hardware MTU:
If your hardware MTU is specified with DHCP, update your DHCP configuration such as with the following dnsmasq configuration:
dhcp-option-force=26,<mtu>
where:
<mtu>
- Specifies the hardware MTU for the DHCP server to advertise.
- If your hardware MTU is specified with a kernel command line with PXE, update that configuration accordingly.
If your hardware MTU is specified in a NetworkManager connection configuration, complete the following steps. This approach is the default for OpenShift Container Platform if you do not explicitly specify your network configuration with DHCP, a kernel command line, or some other method. Your cluster nodes must all use the same underlying network configuration for the following procedure to work unmodified.
Find the primary network interface:
If you are using the OpenShift SDN network plugin, enter the following command:
$ oc debug node/<node_name> -- chroot /host ip route list match 0.0.0.0/0 | awk '{print $5 }'
where:
<node_name>
- Specifies the name of a node in your cluster.
If you are using the OVN-Kubernetes network plugin, enter the following command:
$ oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0
where:
<node_name>
- Specifies the name of a node in your cluster.
Create the following NetworkManager configuration in the
<interface>-mtu.conf
file:Example NetworkManager connection configuration
[connection-<interface>-mtu] match-device=interface-name:<interface> ethernet.mtu=<mtu>
where:
<mtu>
- Specifies the new hardware MTU value.
<interface>
- Specifies the primary network interface name.
Create two
MachineConfig
objects, one for the control plane nodes and another for the worker nodes in your cluster:Create the following Butane config in the
control-plane-interface.bu
file:variant: openshift version: 4.12.0 metadata: name: 01-control-plane-interface labels: machineconfiguration.openshift.io/role: master storage: files: - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 1 contents: local: <interface>-mtu.conf 2 mode: 0600
Create the following Butane config in the
worker-interface.bu
file:variant: openshift version: 4.12.0 metadata: name: 01-worker-interface labels: machineconfiguration.openshift.io/role: worker storage: files: - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 1 contents: local: <interface>-mtu.conf 2 mode: 0600
Create
MachineConfig
objects from the Butane configs by running the following command:$ for manifest in control-plane-interface worker-interface; do butane --files-dir . $manifest.bu > $manifest.yaml done
To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'
where:
<overlay_from>
- Specifies the current cluster network MTU value.
<overlay_to>
-
Specifies the target MTU for the cluster network. This value is set relative to the value for
<machine_to>
and for OVN-Kubernetes must be100
less and for OpenShift SDN must be50
less. <machine_to>
- Specifies the MTU for the primary network interface on the underlying host network.
Example that increases the cluster MTU
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'
As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
$ oc get mcp
A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.NoteBy default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
Confirm the status of the new machine configuration on the hosts:
To list the machine configuration state and the name of the applied machine configuration, enter the following command:
$ oc describe node | egrep "hostname|machineconfig"
Example output
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
Verify that the following statements are true:
-
The value of
machineconfiguration.openshift.io/state
field isDone
. -
The value of the
machineconfiguration.openshift.io/currentConfig
field is equal to the value of themachineconfiguration.openshift.io/desiredConfig
field.
-
The value of
To confirm that the machine config is correct, enter the following command:
$ oc get machineconfig <config_name> -o yaml | grep ExecStart
where
<config_name>
is the name of the machine config from themachineconfiguration.openshift.io/currentConfig
field.The machine config must include the following update to the systemd configuration:
ExecStart=/usr/local/bin/mtu-migration.sh
Update the underlying network interface MTU value:
If you are specifying the new MTU with a NetworkManager connection configuration, enter the following command. The MachineConfig Operator automatically performs a rolling reboot of the nodes in your cluster.
$ for manifest in control-plane-interface worker-interface; do oc create -f $manifest.yaml done
- If you are specifying the new MTU with a DHCP server option or a kernel command line and PXE, make the necessary changes for your infrastructure.
As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
$ oc get mcp
A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.NoteBy default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
Confirm the status of the new machine configuration on the hosts:
To list the machine configuration state and the name of the applied machine configuration, enter the following command:
$ oc describe node | egrep "hostname|machineconfig"
Example output
kubernetes.io/hostname=master-0 machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b machineconfiguration.openshift.io/reason: machineconfiguration.openshift.io/state: Done
Verify that the following statements are true:
-
The value of
machineconfiguration.openshift.io/state
field isDone
. -
The value of the
machineconfiguration.openshift.io/currentConfig
field is equal to the value of themachineconfiguration.openshift.io/desiredConfig
field.
-
The value of
To confirm that the machine config is correct, enter the following command:
$ oc get machineconfig <config_name> -o yaml | grep path:
where
<config_name>
is the name of the machine config from themachineconfiguration.openshift.io/currentConfig
field.If the machine config is successfully deployed, the previous output contains the
/etc/NetworkManager/conf.d/99-<interface>-mtu.conf
file path and theExecStart=/usr/local/bin/mtu-migration.sh
line.
To finalize the MTU migration, enter one of the following commands:
If you are using the OVN-Kubernetes network plugin:
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'
where:
<mtu>
-
Specifies the new cluster network MTU that you specified with
<overlay_to>
.
If you are using the OpenShift SDN network plugin:
$ oc patch Network.operator.openshift.io cluster --type=merge --patch \ '{"spec": { "migration": null, "defaultNetwork":{ "openshiftSDNConfig": { "mtu": <mtu> }}}}'
where:
<mtu>
-
Specifies the new cluster network MTU that you specified with
<overlay_to>
.
After finalizing the MTU migration, each MCP node is rebooted one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:
$ oc get mcp
A successfully updated node has the following status:
UPDATED=true
,UPDATING=false
,DEGRADED=false
.
Verification
You can verify that a node in your cluster uses an MTU that you specified in the previous procedure.
To get the current MTU for the cluster network, enter the following command:
$ oc describe network.config cluster
Get the current MTU for the primary network interface of a node.
To list the nodes in your cluster, enter the following command:
$ oc get nodes
To obtain the current MTU setting for the primary network interface on a node, enter the following command:
$ oc debug node/<node> -- chroot /host ip address show <interface>
where:
<node>
- Specifies a node from the output from the previous step.
<interface>
- Specifies the primary network interface name for the node.
Example output
ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051
12.3. Additional resources
Chapter 13. Configuring the node port service range
As a cluster administrator, you can expand the available node port range. If your cluster uses of a large number of node ports, you might need to increase the number of available ports.
The default port range is 30000-32767
. You can never reduce the port range, even if you first expand it beyond the default range.
13.1. Prerequisites
-
Your cluster infrastructure must allow access to the ports that you specify within the expanded range. For example, if you expand the node port range to
30000-32900
, the inclusive port range of32768-32900
must be allowed by your firewall or packet filtering configuration.
13.2. Expanding the node port range
You can expand the node port range for the cluster.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in to the cluster with a user with
cluster-admin
privileges.
Procedure
To expand the node port range, enter the following command. Replace
<port>
with the largest port number in the new range.$ oc patch network.config.openshift.io cluster --type=merge -p \ '{ "spec": { "serviceNodePortRange": "30000-<port>" } }'
TipYou can alternatively apply the following YAML to update the node port range:
apiVersion: config.openshift.io/v1 kind: Network metadata: name: cluster spec: serviceNodePortRange: "30000-<port>"
Example output
network.config.openshift.io/cluster patched
To confirm that the configuration is active, enter the following command. It can take several minutes for the update to apply.
$ oc get configmaps -n openshift-kube-apiserver config \ -o jsonpath="{.data['config\.yaml']}" | \ grep -Eo '"service-node-port-range":["[[:digit:]]+-[[:digit:]]+"]'
Example output
"service-node-port-range":["30000-33000"]
13.3. Additional resources
Chapter 14. Configuring IP failover
This topic describes configuring IP failover for pods and services on your OpenShift Container Platform cluster.
IP failover uses Keepalived to host a set of externally accessible Virtual IP (VIP) addresses on a set of hosts. Each VIP address is only serviced by a single host at a time. Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to determine which host, from the set of hosts, services which VIP. If a host becomes unavailable, or if the service that Keepalived is watching does not respond, the VIP is switched to another host from the set. This means a VIP is always serviced as long as a host is available.
Every VIP in the set is serviced by a node selected from the set. If a single node is available, the VIPs are served. There is no way to explicitly distribute the VIPs over the nodes, so there can be nodes with no VIPs and other nodes with many VIPs. If there is only one node, all VIPs are on it.
The administrator must ensure that all of the VIP addresses meet the following requirements:
- Accessible on the configured hosts from outside the cluster.
- Not used for any other purpose within the cluster.
Keepalived on each node determines whether the needed service is running. If it is, VIPs are supported and Keepalived participates in the negotiation to determine which node serves the VIP. For a node to participate, the service must be listening on the watch port on a VIP or the check must be disabled.
Each VIP in the set might be served by a different node.
IP failover monitors a port on each VIP to determine whether the port is reachable on the node. If the port is not reachable, the VIP is not assigned to the node. If the port is set to 0
, this check is suppressed. The check script does the needed testing.
When a node running Keepalived passes the check script, the VIP on that node can enter the master
state based on its priority and the priority of the current master and as determined by the preemption strategy.
A cluster administrator can provide a script through the OPENSHIFT_HA_NOTIFY_SCRIPT
variable, and this script is called whenever the state of the VIP on the node changes. Keepalived uses the master
state when it is servicing the VIP, the backup
state when another node is servicing the VIP, or in the fault
state when the check script fails. The notify script is called with the new state whenever the state changes.
You can create an IP failover deployment configuration on OpenShift Container Platform. The IP failover deployment configuration specifies the set of VIP addresses, and the set of nodes on which to service them. A cluster can have multiple IP failover deployment configurations, with each managing its own set of unique VIP addresses. Each node in the IP failover configuration runs an IP failover pod, and this pod runs Keepalived.
When using VIPs to access a pod with host networking, the application pod runs on all nodes that are running the IP failover pods. This enables any of the IP failover nodes to become the master and service the VIPs when needed. If application pods are not running on all nodes with IP failover, either some IP failover nodes never service the VIPs or some application pods never receive any traffic. Use the same selector and replication count, for both IP failover and the application pods, to avoid this mismatch.
While using VIPs to access a service, any of the nodes can be in the IP failover set of nodes, since the service is reachable on all nodes, no matter where the application pod is running. Any of the IP failover nodes can become master at any time. The service can either use external IPs and a service port or it can use a NodePort
. Setting up a NodePort
is a privileged operation.
When using external IPs in the service definition, the VIPs are set to the external IPs, and the IP failover monitoring port is set to the service port. When using a node port, the port is open on every node in the cluster, and the service load-balances traffic from whatever node currently services the VIP. In this case, the IP failover monitoring port is set to the NodePort
in the service definition.
Even though a service VIP is highly available, performance can still be affected. Keepalived makes sure that each of the VIPs is serviced by some node in the configuration, and several VIPs can end up on the same node even when other nodes have none. Strategies that externally load-balance across a set of VIPs can be thwarted when IP failover puts multiple VIPs on the same node.
When you use ExternalIP
, you can set up IP failover to have the same VIP range as the ExternalIP
range. You can also disable the monitoring port. In this case, all of the VIPs appear on same node in the cluster. Any user can set up a service with an ExternalIP
and make it highly available.
There are a maximum of 254 VIPs in the cluster.
14.1. IP failover environment variables
The following table contains the variables used to configure IP failover.
Variable Name | Default | Description |
---|---|---|
|
|
The IP failover pod tries to open a TCP connection to this port on each Virtual IP (VIP). If connection is established, the service is considered to be running. If this port is set to |
|
The interface name that IP failover uses to send Virtual Router Redundancy Protocol (VRRP) traffic. The default value is | |
|
|
The number of replicas to create. This must match |
|
The list of IP address ranges to replicate. This must be provided. For example, | |
|
|
The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is |
|
The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the | |
| INPUT |
The name of the iptables chain, to automatically add an |
| The full path name in the pod file system of a script that is periodically run to verify the application is operating. | |
|
| The period, in seconds, that the check script is run. |
| The full path name in the pod file system of a script that is run whenever the state changes. | |
|
|
The strategy for handling a new higher priority host. The |
14.2. Configuring IP failover in your cluster
As a cluster administrator, you can configure IP failover on an entire cluster, or on a subset of nodes, as defined by the label selector. You can also configure multiple IP failover deployments in your cluster, where each one is independent of the others.
The IP failover deployment ensures that a failover pod runs on each of the nodes matching the constraints or the label used.
This pod runs Keepalived, which can monitor an endpoint and use Virtual Router Redundancy Protocol (VRRP) to fail over the virtual IP (VIP) from one node to another if the first node cannot reach the service or endpoint.
For production use, set a selector
that selects at least two nodes, and set replicas
equal to the number of selected nodes.
Prerequisites
-
You are logged in to the cluster as a user with
cluster-admin
privileges. - You created a pull secret.
Red Hat OpenStack Platform (RHOSP) only:
- You installed an RHOSP client (RHCOS documentation) on the target environment.
-
You also downloaded the RHOSP
openrc.sh
rc file (RHCOS documentation).
Procedure
Create an IP failover service account:
$ oc create sa ipfailover
Update security context constraints (SCC) for
hostNetwork
:$ oc adm policy add-scc-to-user privileged -z ipfailover
$ oc adm policy add-scc-to-user hostnetwork -z ipfailover
Red Hat OpenStack Platform (RHOSP) only: Complete the following steps to make a failover VIP address reachable on RHOSP ports.
Use the RHOSP CLI to show the default RHOSP API and VIP addresses in the
allowed_address_pairs
parameter of your RHOSP cluster:$ openstack port show <cluster_name> -c allowed_address_pairs
Output example
*Field* *Value* allowed_address_pairs ip_address='192.168.0.5', mac_address='fa:16:3e:31:f9:cb' ip_address='192.168.0.7', mac_address='fa:16:3e:31:f9:cb'
Set a different VIP address for the IP failover deployment and make the address reachable on RHOSP ports by entering the following command in the RHOSP CLI. Do not set any default RHOSP API and VIP addresses as the failover VIP address for the IP failover deployment.
Example of adding the
1.1.1.1
failover IP address as an allowed address on RHOSP ports.$ openstack port set <cluster_name> --allowed-address ip-address=1.1.1.1,mac-address=fa:fa:16:3e:31:f9:cb
- Create a deployment YAML file to configure IP failover for your deployment. See "Example deployment YAML for IP failover configuration" in a later step.
Specify the following specification in the IP failover deployment so that you pass the failover VIP address to the
OPENSHIFT_HA_VIRTUAL_IPS
environment variable:Example of adding the
1.1.1.1
VIP address toOPENSHIFT_HA_VIRTUAL_IPS
apiVersion: apps/v1 kind: Deployment metadata: name: ipfailover-keepalived # ... spec: env: - name: OPENSHIFT_HA_VIRTUAL_IPS value: "1.1.1.1" # ...
Create a deployment YAML file to configure IP failover.
NoteFor Red Hat OpenStack Platform (RHOSP), you do not need to re-create the deployment YAML file. You already created this file as part of the earlier instructions.
Example deployment YAML for IP failover configuration
apiVersion: apps/v1 kind: Deployment metadata: name: ipfailover-keepalived 1 labels: ipfailover: hello-openshift spec: strategy: type: Recreate replicas: 2 selector: matchLabels: ipfailover: hello-openshift template: metadata: labels: ipfailover: hello-openshift spec: serviceAccountName: ipfailover privileged: true hostNetwork: true nodeSelector: node-role.kubernetes.io/worker: "" containers: - name: openshift-ipfailover image: quay.io/openshift/origin-keepalived-ipfailover ports: - containerPort: 63000 hostPort: 63000 imagePullPolicy: IfNotPresent securityContext: privileged: true volumeMounts: - name: lib-modules mountPath: /lib/modules readOnly: true - name: host-slash mountPath: /host readOnly: true mountPropagation: HostToContainer - name: etc-sysconfig mountPath: /etc/sysconfig readOnly: true - name: config-volume mountPath: /etc/keepalive env: - name: OPENSHIFT_HA_CONFIG_NAME value: "ipfailover" - name: OPENSHIFT_HA_VIRTUAL_IPS 2 value: "1.1.1.1-2" - name: OPENSHIFT_HA_VIP_GROUPS 3 value: "10" - name: OPENSHIFT_HA_NETWORK_INTERFACE 4 value: "ens3" #The host interface to assign the VIPs - name: OPENSHIFT_HA_MONITOR_PORT 5 value: "30060" - name: OPENSHIFT_HA_VRRP_ID_OFFSET 6 value: "0" - name: OPENSHIFT_HA_REPLICA_COUNT 7 value: "2" #Must match the number of replicas in the deployment - name: OPENSHIFT_HA_USE_UNICAST value: "false" #- name: OPENSHIFT_HA_UNICAST_PEERS #value: "10.0.148.40,10.0.160.234,10.0.199.110" - name: OPENSHIFT_HA_IPTABLES_CHAIN 8 value: "INPUT" #- name: OPENSHIFT_HA_NOTIFY_SCRIPT 9 # value: /etc/keepalive/mynotifyscript.sh - name: OPENSHIFT_HA_CHECK_SCRIPT 10 value: "/etc/keepalive/mycheckscript.sh" - name: OPENSHIFT_HA_PREEMPTION 11 value: "preempt_delay 300" - name: OPENSHIFT_HA_CHECK_INTERVAL 12 value: "2" livenessProbe: initialDelaySeconds: 10 exec: command: - pgrep - keepalived volumes: - name: lib-modules hostPath: path: /lib/modules - name: host-slash hostPath: path: / - name: etc-sysconfig hostPath: path: /etc/sysconfig # config-volume contains the check script # created with `oc create configmap keepalived-checkscript --from-file=mycheckscript.sh` - configMap: defaultMode: 0755 name: keepalived-checkscript name: config-volume imagePullSecrets: - name: openshift-pull-secret 13
- 1
- The name of the IP failover deployment.
- 2
- The list of IP address ranges to replicate. This must be provided. For example,
1.2.3.4-6,1.2.3.9
. - 3
- The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the
OPENSHIFT_HA_VIP_GROUPS
variable. - 4
- The interface name that IP failover uses to send VRRP traffic. By default,
eth0
is used. - 5
- The IP failover pod tries to open a TCP connection to this port on each VIP. If connection is established, the service is considered to be running. If this port is set to
0
, the test always passes. The default value is80
. - 6
- The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is
0
, and the allowed range is0
through255
. - 7
- The number of replicas to create. This must match
spec.replicas
value in IP failover deployment configuration. The default value is2
. - 8
- The name of the
iptables
chain to automatically add aniptables
rule to allow the VRRP traffic on. If the value is not set, aniptables
rule is not added. If the chain does not exist, it is not created, and Keepalived operates in unicast mode. The default isINPUT
. - 9
- The full path name in the pod file system of a script that is run whenever the state changes.
- 10
- The full path name in the pod file system of a script that is periodically run to verify the application is operating.
- 11
- The strategy for handling a new higher priority host. The default value is
preempt_delay 300
, which causes a Keepalived instance to take over a VIP after 5 minutes if a lower-priority master is holding the VIP. - 12
- The period, in seconds, that the check script is run. The default value is
2
. - 13
- Create the pull secret before creating the deployment, otherwise you will get an error when creating the deployment.
14.3. Configuring check and notify scripts
Keepalived monitors the health of the application by periodically running an optional user-supplied check script. For example, the script can test a web server by issuing a request and verifying the response. As cluster administrator, you can provide an optional notify script, which is called whenever the state changes.
The check and notify scripts run in the IP failover pod and use the pod file system, not the host file system. However, the IP failover pod makes the host file system available under the /hosts
mount path. When configuring a check or notify script, you must provide the full path to the script. The recommended approach for providing the scripts is to use a ConfigMap
object.
The full path names of the check and notify scripts are added to the Keepalived configuration file, _/etc/keepalived/keepalived.conf
, which is loaded every time Keepalived starts. The scripts can be added to the pod with a ConfigMap
object as described in the following methods.
Check script
When a check script is not provided, a simple default script is run that tests the TCP connection. This default test is suppressed when the monitor port is 0
.
Each IP failover pod manages a Keepalived daemon that manages one or more virtual IP (VIP) addresses on the node where the pod is running. The Keepalived daemon keeps the state of each VIP for that node. A particular VIP on a particular node might be in master
, backup
, or fault
state.
If the check script returns non-zero, the node enters the backup
state, and any VIPs it holds are reassigned.
Notify script
Keepalived passes the following three parameters to the notify script:
-
$1
-group
orinstance
-
$2
- Name of thegroup
orinstance
-
$3
- The new state:master
,backup
, orfault
Prerequisites
-
You installed the OpenShift CLI (
oc
). -
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
Create the desired script and create a
ConfigMap
object to hold it. The script has no input arguments and must return0
forOK
and1
forfail
.The check script,
mycheckscript.sh
:#!/bin/bash # Whatever tests are needed # E.g., send request and verify response exit 0
Create the
ConfigMap
object :$ oc create configmap mycustomcheck --from-file=mycheckscript.sh
Add the script to the pod. The
defaultMode
for the mountedConfigMap
object files must able to run by usingoc
commands or by editing the deployment configuration. A value of0755
,493
decimal, is typical:$ oc set env deploy/ipfailover-keepalived \ OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
$ oc set volume deploy/ipfailover-keepalived --add --overwrite \ --name=config-volume \ --mount-path=/etc/keepalive \ --source='{"configMap": { "name": "mycustomcheck", "defaultMode": 493}}'
NoteThe
oc set env
command is whitespace sensitive. There must be no whitespace on either side of the=
sign.TipYou can alternatively edit the
ipfailover-keepalived
deployment configuration:$ oc edit deploy ipfailover-keepalived
spec: containers: - env: - name: OPENSHIFT_HA_CHECK_SCRIPT 1 value: /etc/keepalive/mycheckscript.sh ... volumeMounts: 2 - mountPath: /etc/keepalive name: config-volume dnsPolicy: ClusterFirst ... volumes: 3 - configMap: defaultMode: 0755 4 name: customrouter name: config-volume ...
- 1
- In the
spec.container.env
field, add theOPENSHIFT_HA_CHECK_SCRIPT
environment variable to point to the mounted script file. - 2
- Add the
spec.container.volumeMounts
field to create the mount point. - 3
- Add a new
spec.volumes
field to mention the config map. - 4
- This sets run permission on the files. When read back, it is displayed in decimal,
493
.
Save the changes and exit the editor. This restarts
ipfailover-keepalived
.
14.4. Configuring VRRP preemption
When a Virtual IP (VIP) on a node leaves the fault
state by passing the check script, the VIP on the node enters the backup
state if it has lower priority than the VIP on the node that is currently in the master
state. The nopreempt
strategy does not move master
from the lower priority VIP on the host to the higher priority VIP on the host. With preempt_delay 300
, the default, Keepalived waits the specified 300 seconds and moves master
to the higher priority VIP on the host.
Procedure
To specify preemption enter
oc edit deploy ipfailover-keepalived
to edit the router deployment configuration:$ oc edit deploy ipfailover-keepalived
... spec: containers: - env: - name: OPENSHIFT_HA_PREEMPTION 1 value: preempt_delay 300 ...
- 1
- Set the
OPENSHIFT_HA_PREEMPTION
value:-
preempt_delay 300
: Keepalived waits the specified 300 seconds and movesmaster
to the higher priority VIP on the host. This is the default value. -
nopreempt
: does not movemaster
from the lower priority VIP on the host to the higher priority VIP on the host.
-
14.5. Deploying multiple IP failover instances
Each IP failover pod managed by the IP failover deployment configuration, 1
pod per node or replica, runs a Keepalived daemon. As more IP failover deployment configurations are configured, more pods are created and more daemons join into the common Virtual Router Redundancy Protocol (VRRP) negotiation. This negotiation is done by all the Keepalived daemons and it determines which nodes service which virtual IPs (VIP).
Internally, Keepalived assigns a unique vrrp-id
to each VIP. The negotiation uses this set of vrrp-ids
, when a decision is made, the VIP corresponding to the winning vrrp-id
is serviced on the winning node.
Therefore, for every VIP defined in the IP failover deployment configuration, the IP failover pod must assign a corresponding vrrp-id
. This is done by starting at OPENSHIFT_HA_VRRP_ID_OFFSET
and sequentially assigning the vrrp-ids
to the list of VIPs. The vrrp-ids
can have values in the range 1..255
.
When there are multiple IP failover deployment configurations, you must specify OPENSHIFT_HA_VRRP_ID_OFFSET
so that there is room to increase the number of VIPs in the deployment configuration and none of the vrrp-id
ranges overlap.
14.6. Configuring IP failover for more than 254 addresses
IP failover management is limited to 254 groups of Virtual IP (VIP) addresses. By default OpenShift Container Platform assigns one IP address to each group. You can use the OPENSHIFT_HA_VIP_GROUPS
variable to change this so multiple IP addresses are in each group and define the number of VIP groups available for each Virtual Router Redundancy Protocol (VRRP) instance when configuring IP failover.
Grouping VIPs creates a wider range of allocation of VIPs per VRRP in the case of VRRP failover events, and is useful when all hosts in the cluster have access to a service locally. For example, when a service is being exposed with an ExternalIP
.
As a rule for failover, do not limit services, such as the router, to one specific host. Instead, services should be replicated to each host so that in the case of IP failover, the services do not have to be recreated on the new host.
If you are using OpenShift Container Platform health checks, the nature of IP failover and groups means that all instances in the group are not checked. For that reason, the Kubernetes health checks must be used to ensure that services are live.
Prerequisites
-
You are logged in to the cluster with a user with
cluster-admin
privileges.
Procedure
To change the number of IP addresses assigned to each group, change the value for the
OPENSHIFT_HA_VIP_GROUPS
variable, for example:Example
Deployment
YAML for IP failover configuration... spec: env: - name: OPENSHIFT_HA_VIP_GROUPS 1 value: "3" ...
- 1
- If
OPENSHIFT_HA_VIP_GROUPS
is set to3
in an environment with seven VIPs, it creates three groups, assigning three VIPs to the first group, and two VIPs to the two remaining groups.
If the number of groups set by OPENSHIFT_HA_VIP_GROUPS
is fewer than the number of IP addresses set to fail over, the group contains more than one IP address, and all of the addresses move as a single unit.
14.7. High availability For ExternalIP
In non-cloud clusters, IP failover and ExternalIP
to a service can be combined. The result is high availability services for users that create services using ExternalIP
.
The approach is to specify an spec.ExternalIP.autoAssignCIDRs
range of the cluster network configuration, and then use the same range in creating the IP failover configuration.
Because IP failover can support up to a maximum of 255 VIPs for the entire cluster, the spec.ExternalIP.autoAssignCIDRs
must be /24
or smaller.
Additional resources
14.8. Removing IP failover
When IP failover is initially configured, the worker nodes in the cluster are modified with an iptables
rule that explicitly allows multicast packets on 224.0.0.18
for Keepalived. Because of the change to the nodes, removing IP failover requires running a job to remove the iptables
rule and removing the virtual IP addresses used by Keepalived.
Procedure
Optional: Identify and delete any check and notify scripts that are stored as config maps:
Identify whether any pods for IP failover use a config map as a volume:
$ oc get pod -l ipfailover \ -o jsonpath="\ {range .items[?(@.spec.volumes[*].configMap)]} {'Namespace: '}{.metadata.namespace} {'Pod: '}{.metadata.name} {'Volumes that use config maps:'} {range .spec.volumes[?(@.configMap)]} {'volume: '}{.name} {'configMap: '}{.configMap.name}{'\n'}{end} {end}"
Example output
Namespace: default Pod: keepalived-worker-59df45db9c-2x9mn Volumes that use config maps: volume: config-volume configMap: mycustomcheck
If the preceding step provided the names of config maps that are used as volumes, delete the config maps:
$ oc delete configmap <configmap_name>
Identify an existing deployment for IP failover:
$ oc get deployment -l ipfailover
Example output
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE default ipfailover 2/2 2 2 105d
Delete the deployment:
$ oc delete deployment <ipfailover_deployment_name>
Remove the
ipfailover
service account:$ oc delete sa ipfailover
Run a job that removes the IP tables rule that was added when IP failover was initially configured:
Create a file such as
remove-ipfailover-job.yaml
with contents that are similar to the following example:apiVersion: batch/v1 kind: Job metadata: generateName: remove-ipfailover- labels: app: remove-ipfailover spec: template: metadata: name: remove-ipfailover spec: containers: - name: remove-ipfailover image: quay.io/openshift/origin-keepalived-ipfailover:4.12 command: ["/var/lib/ipfailover/keepalived/remove-failover.sh"] nodeSelector: 1 kubernetes.io/hostname: <host_name> 2 restartPolicy: Never
Run the job:
$ oc create -f remove-ipfailover-job.yaml
Example output
job.batch/remove-ipfailover-2h8dm created
Verification
Confirm that the job removed the initial configuration for IP failover.
$ oc logs job/remove-ipfailover-2h8dm
Example output
remove-failover.sh: OpenShift IP Failover service terminating. - Removing ip_vs module ... - Cleaning up ... - Releasing VIPs (interface eth0) ...
Chapter 15. Configuring interface-level network sysctls
In Linux, sysctl allows an administrator to modify kernel parameters at runtime. You can modify interface-level network sysctls using the tuning Container Network Interface (CNI) meta plugin. The tuning CNI meta plugin operates in a chain with a main CNI plugin as illustrated.
The main CNI plugin assigns the interface and passes this to the tuning CNI meta plugin at runtime. You can change some sysctls and several interface attributes (promiscuous mode, all-multicast mode, MTU, and MAC address) in the network namespace by using the tuning CNI meta plugin. In the tuning CNI meta plugin configuration, the interface name is represented by the IFNAME
token, and is replaced with the actual name of the interface at runtime.
In OpenShift Container Platform, the tuning CNI meta plugin only supports changing interface-level network sysctls.
15.1. Configuring the tuning CNI
The following procedure configures the tuning CNI to change the interface-level network net.ipv4.conf.IFNAME.accept_redirects
sysctl. This example enables accepting and sending ICMP-redirected packets.
Procedure
Create a network attachment definition, such as
tuning-example.yaml
, with the following content:apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: <name> 1 namespace: default 2 spec: config: '{ "cniVersion": "0.4.0", 3 "name": "<name>", 4 "plugins": [{ "type": "<main_CNI_plugin>" 5 }, { "type": "tuning", 6 "sysctl": { "net.ipv4.conf.IFNAME.accept_redirects": "1" 7 } } ] }
- 1
- Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
- 2
- Specifies the namespace that the object is associated with.
- 3
- Specifies the CNI specification version.
- 4
- Specifies the name for the configuration. It is recommended to match the configuration name to the name value of the network attachment definition.
- 5
- Specifies the name of the main CNI plugin to configure.
- 6
- Specifies the name of the CNI meta plugin.
- 7
- Specifies the sysctl to set.
An example yaml file is shown here:
apiVersion: "k8s.cni.cncf.io/v1" kind: NetworkAttachmentDefinition metadata: name: tuningnad namespace: default spec: config: '{ "cniVersion": "0.4.0", "name": "tuningnad", "plugins": [{ "type": "bridge" }, { "type": "tuning", "sysctl": { "net.ipv4.conf.IFNAME.accept_redirects": "1" } } ] }'
Apply the yaml by running the following command:
$ oc apply -f tuning-example.yaml
Example output
networkattachmentdefinition.k8.cni.cncf.io/tuningnad created
Create a pod such as
examplepod.yaml
with the network attachment definition similar to the following:apiVersion: v1 kind: Pod metadata: name: tunepod namespace: default annotations: k8s.v1.cni.cncf.io/networks: tuningnad 1 spec: containers: - name: podexample image: centos command: ["/bin/bash", "-c", "sleep INF"] securityContext: runAsUser: 2000 2 runAsGroup: 3000 3 allowPrivilegeEscalation: false 4 capabilities: 5 drop: ["ALL"] securityContext: runAsNonRoot: true 6 seccompProfile: 7 type: RuntimeDefault
- 1
- Specify the name of the configured
NetworkAttachmentDefinition
. - 2
runAsUser
controls which user ID the container is run with.- 3
runAsGroup
controls which primary group ID the containers is run with.- 4
allowPrivilegeEscalation
determines if a pod can request to allow privilege escalation. If unspecified, it defaults to true. This boolean directly controls whether theno_new_privs
flag gets set on the container process.- 5
capabilities
permit privileged actions without giving full root access. This policy ensures all capabilities are dropped from the pod.- 6
runAsNonRoot: true
requires that the container will run with a user with any UID other than 0.- 7
RuntimeDefault
enables the default seccomp profile for a pod or container workload.
Apply the yaml by running the following command:
$ oc apply -f examplepod.yaml
Verify that the pod is created by running the following command:
$ oc get pod
Example output
NAME READY STATUS RESTARTS AGE tunepod 1/1 Running 0 47s
Log in to the pod by running the following command:
$ oc rsh tunepod
Verify the values of the configured sysctl flags. For example, find the value
net.ipv4.conf.net1.accept_redirects
by running the following command:sh-4.4# sysctl net.ipv4.conf.net1.accept_redirects
Expected output
net.ipv4.conf.net1.accept_redirects = 1
15.2. Additional resources
Chapter 16. Using the Stream Control Transmission Protocol (SCTP) on a bare metal cluster
As a cluster administrator, you can use the Stream Control Transmission Protocol (SCTP) on a cluster.
16.1. Support for Stream Control Transmission Protocol (SCTP) on OpenShift Container Platform
As a cluster administrator, you can enable SCTP on the hosts in the cluster. On Red Hat Enterprise Linux CoreOS (RHCOS), the SCTP module is disabled by default.
SCTP is a reliable message based protocol that runs on top of an IP network.
When enabled, you can use SCTP as a protocol with pods, services, and network policy. A Service
object must be defined with the type
parameter set to either the ClusterIP
or NodePort
value.
16.1.1. Example configurations using SCTP protocol
You can configure a pod or service to use SCTP by setting the protocol
parameter to the SCTP
value in the pod or service object.
In the following example, a pod is configured to use SCTP:
apiVersion: v1 kind: Pod metadata: namespace: project1 name: example-pod spec: containers: - name: example-pod ... ports: - containerPort: 30100 name: sctpserver protocol: SCTP
In the following example, a service is configured to use SCTP:
apiVersion: v1 kind: Service metadata: namespace: project1 name: sctpserver spec: ... ports: - name: sctpserver protocol: SCTP port: 30100 targetPort: 30100 type: ClusterIP
In the following example, a NetworkPolicy
object is configured to apply to SCTP network traffic on port 80
from any pods with a specific label:
kind: NetworkPolicy apiVersion: networking.k8s.io/v1 metadata: name: allow-sctp-on-http spec: podSelector: matchLabels: role: web ingress: - ports: - protocol: SCTP port: 80
16.2. Enabling Stream Control Transmission Protocol (SCTP)
As a cluster administrator, you can load and enable the blacklisted SCTP kernel module on worker nodes in your cluster.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role.
Procedure
Create a file named
load-sctp-module.yaml
that contains the following YAML definition:apiVersion: machineconfiguration.openshift.io/v1 kind: MachineConfig metadata: name: load-sctp-module labels: machineconfiguration.openshift.io/role: worker spec: config: ignition: version: 3.2.0 storage: files: - path: /etc/modprobe.d/sctp-blacklist.conf mode: 0644 overwrite: true contents: source: data:, - path: /etc/modules-load.d/sctp-load.conf mode: 0644 overwrite: true contents: source: data:,sctp
To create the
MachineConfig
object, enter the following command:$ oc create -f load-sctp-module.yaml
Optional: To watch the status of the nodes while the MachineConfig Operator applies the configuration change, enter the following command. When the status of a node transitions to
Ready
, the configuration update is applied.$ oc get nodes
16.3. Verifying Stream Control Transmission Protocol (SCTP) is enabled
You can verify that SCTP is working on a cluster by creating a pod with an application that listens for SCTP traffic, associating it with a service, and then connecting to the exposed service.
Prerequisites
-
Access to the internet from the cluster to install the
nc
package. -
Install the OpenShift CLI (
oc
). -
Access to the cluster as a user with the
cluster-admin
role.
Procedure
Create a pod starts an SCTP listener:
Create a file named
sctp-server.yaml
that defines a pod with the following YAML:apiVersion: v1 kind: Pod metadata: name: sctpserver labels: app: sctpserver spec: containers: - name: sctpserver image: registry.access.redhat.com/ubi8/ubi command: ["/bin/sh", "-c"] args: ["dnf install -y nc && sleep inf"] ports: - containerPort: 30102 name: sctpserver protocol: SCTP
Create the pod by entering the following command:
$ oc create -f sctp-server.yaml
Create a service for the SCTP listener pod.
Create a file named
sctp-service.yaml
that defines a service with the following YAML:apiVersion: v1 kind: Service metadata: name: sctpservice labels: app: sctpserver spec: type: NodePort selector: app: sctpserver ports: - name: sctpserver protocol: SCTP port: 30102 targetPort: 30102
To create the service, enter the following command:
$ oc create -f sctp-service.yaml
Create a pod for the SCTP client.
Create a file named
sctp-client.yaml
with the following YAML:apiVersion: v1 kind: Pod metadata: name: sctpclient labels: app: sctpclient spec: containers: - name: sctpclient image: registry.access.redhat.com/ubi8/ubi command: ["/bin/sh", "-c"] args: ["dnf install -y nc && sleep inf"]
To create the
Pod
object, enter the following command:$ oc apply -f sctp-client.yaml
Run an SCTP listener on the server.
To connect to the server pod, enter the following command:
$ oc rsh sctpserver
To start the SCTP listener, enter the following command:
$ nc -l 30102 --sctp
Connect to the SCTP listener on the server.
- Open a new terminal window or tab in your terminal program.
Obtain the IP address of the
sctpservice
service. Enter the following command:$ oc get services sctpservice -o go-template='{{.spec.clusterIP}}{{"\n"}}'
To connect to the client pod, enter the following command:
$ oc rsh sctpclient
To start the SCTP client, enter the following command. Replace
<cluster_IP>
with the cluster IP address of thesctpservice
service.# nc <cluster_IP> 30102 --sctp
Chapter 17. Using PTP hardware
You can configure linuxptp
services and use PTP-capable hardware in OpenShift Container Platform cluster nodes.
17.1. About PTP hardware
You can use the OpenShift Container Platform console or OpenShift CLI (oc
) to install PTP by deploying the PTP Operator. The PTP Operator creates and manages the linuxptp
services and provides the following features:
- Discovery of the PTP-capable devices in the cluster.
-
Management of the configuration of
linuxptp
services. -
Notification of PTP clock events that negatively affect the performance and reliability of your application with the PTP Operator
cloud-event-proxy
sidecar.
The PTP Operator works with PTP-capable devices on clusters provisioned only on bare-metal infrastructure.
17.2. About PTP
Precision Time Protocol (PTP) is used to synchronize clocks in a network. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy, and is more accurate than Network Time Protocol (NTP).
The linuxptp
package includes the ptp4l
and phc2sys
programs for clock synchronization. ptp4l
implements the PTP boundary clock and ordinary clock. ptp4l
synchronizes the PTP hardware clock to the source clock with hardware time stamping and synchronizes the system clock to the source clock with software time stamping. phc2sys
is used for hardware time stamping to synchronize the system clock to the PTP hardware clock on the network interface controller (NIC).
17.2.1. Elements of a PTP domain
PTP is used to synchronize multiple nodes connected in a network, with clocks for each node. The clocks synchronized by PTP are organized in a source-destination hierarchy. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. Destination clocks are synchronized to source clocks, and destination clocks can themselves be the source for other downstream clocks. The following types of clocks can be included in configurations:
- Grandmaster clock
- The grandmaster clock provides standard time information to other clocks across the network and ensures accurate and stable synchronisation. It writes time stamps and responds to time requests from other clocks. Grandmaster clocks can be synchronized to a Global Positioning System (GPS) time source.
- Ordinary clock
- The ordinary clock has a single port connection that can play the role of source or destination clock, depending on its position in the network. The ordinary clock can read and write time stamps.
- Boundary clock
- The boundary clock has ports in two or more communication paths and can be a source and a destination to other destination clocks at the same time. The boundary clock works as a destination clock upstream. The destination clock receives the timing message, adjusts for delay, and then creates a new source time signal to pass down the network. The boundary clock produces a new timing packet that is still correctly synced with the source clock and can reduce the number of connected devices reporting directly to the source clock.
17.2.2. Advantages of PTP over NTP
One of the main advantages that PTP has over NTP is the hardware support present in various network interface controllers (NIC) and network switches. The specialized hardware allows PTP to account for delays in message transfer and improves the accuracy of time synchronization. To achieve the best possible accuracy, it is recommended that all networking components between PTP clocks are PTP hardware enabled.
Hardware-based PTP provides optimal accuracy, since the NIC can time stamp the PTP packets at the exact moment they are sent and received. Compare this to software-based PTP, which requires additional processing of the PTP packets by the operating system.
Before enabling PTP, ensure that NTP is disabled for the required nodes. You can disable the chrony time service (chronyd
) using a MachineConfig
custom resource. For more information, see Disabling chrony time service.
17.2.3. Using PTP with dual NIC hardware
OpenShift Container Platform supports single and dual NIC hardware for precision PTP timing in the cluster.
For 5G telco networks that deliver mid-band spectrum coverage, each virtual distributed unit (vDU) requires connections to 6 radio units (RUs). To make these connections, each vDU host requires 2 NICs configured as boundary clocks.
Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l
instances for each NIC feeding the downstream clocks.
17.3. Installing the PTP Operator using the CLI
As a cluster administrator, you can install the Operator by using the CLI.
Prerequisites
- A cluster installed on bare-metal hardware with nodes that have hardware that supports PTP.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges.
Procedure
Create a namespace for the PTP Operator.
Save the following YAML in the
ptp-namespace.yaml
file:apiVersion: v1 kind: Namespace metadata: name: openshift-ptp annotations: workload.openshift.io/allowed: management labels: name: openshift-ptp openshift.io/cluster-monitoring: "true"
Create the
Namespace
CR:$ oc create -f ptp-namespace.yaml
Create an Operator group for the PTP Operator.
Save the following YAML in the
ptp-operatorgroup.yaml
file:apiVersion: operators.coreos.com/v1 kind: OperatorGroup metadata: name: ptp-operators namespace: openshift-ptp spec: targetNamespaces: - openshift-ptp
Create the
OperatorGroup
CR:$ oc create -f ptp-operatorgroup.yaml
Subscribe to the PTP Operator.
Save the following YAML in the
ptp-sub.yaml
file:apiVersion: operators.coreos.com/v1alpha1 kind: Subscription metadata: name: ptp-operator-subscription namespace: openshift-ptp spec: channel: "stable" name: ptp-operator source: redhat-operators sourceNamespace: openshift-marketplace
Create the
Subscription
CR:$ oc create -f ptp-sub.yaml
To verify that the Operator is installed, enter the following command:
$ oc get csv -n openshift-ptp -o custom-columns=Name:.metadata.name,Phase:.status.phase
Example output
Name Phase 4.12.0-202301261535 Succeeded
17.4. Installing the PTP Operator using the web console
As a cluster administrator, you can install the PTP Operator using the web console.
You have to create the namespace and Operator group as mentioned in the previous section.
Procedure
Install the PTP Operator using the OpenShift Container Platform web console:
- In the OpenShift Container Platform web console, click Operators → OperatorHub.
- Choose PTP Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under A specific namespace on the cluster select openshift-ptp. Then, click Install.
Optional: Verify that the PTP Operator installed successfully:
- Switch to the Operators → Installed Operators page.
Ensure that PTP Operator is listed in the openshift-ptp project with a Status of InstallSucceeded.
NoteDuring installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.
If the Operator does not appear as installed, to troubleshoot further:
- Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
-
Go to the Workloads → Pods page and check the logs for pods in the
openshift-ptp
project.
17.5. Configuring PTP devices
The PTP Operator adds the NodePtpDevice.ptp.openshift.io
custom resource definition (CRD) to OpenShift Container Platform.
When installed, the PTP Operator searches your cluster for PTP-capable network devices on each node. It creates and updates a NodePtpDevice
custom resource (CR) object for each node that provides a compatible PTP-capable network device.
17.5.1. Discovering PTP capable network devices in your cluster
To return a complete list of PTP capable network devices in your cluster, run the following command:
$ oc get NodePtpDevice -n openshift-ptp -o yaml
Example output
apiVersion: v1 items: - apiVersion: ptp.openshift.io/v1 kind: NodePtpDevice metadata: creationTimestamp: "2022-01-27T15:16:28Z" generation: 1 name: dev-worker-0 1 namespace: openshift-ptp resourceVersion: "6538103" uid: d42fc9ad-bcbf-4590-b6d8-b676c642781a spec: {} status: devices: 2 - name: eno1 - name: eno2 - name: eno3 - name: eno4 - name: enp5s0f0 - name: enp5s0f1 ...
17.5.2. Configuring linuxptp services as a grandmaster clock
You can configure the linuxptp
services (ptp4l
, phc2sys
, ts2phc
) as grandmaster clock by creating a PtpConfig
custom resource (CR) that configures the host NIC.
The ts2phc
utility allows you to synchronize the system clock with the PTP grandmaster clock so that the node can stream precision clock signal to downstream PTP ordinary clocks and boundary clocks.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as the grandmaster clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
- Install an Intel Westport Channel network interface in the bare-metal cluster host.
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the
PtpConfig
resource. For example:Save the following YAML in the
grandmaster-clock-ptp-config.yaml
file:Example PTP grandmaster clock configuration
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: grandmaster-clock namespace: openshift-ptp annotations: {} spec: profile: - name: grandmaster-clock # The interface name is hardware-specific interface: $interface ptp4lOpts: "-2" phc2sysOpts: "-a -r -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: grandmaster-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Create the CR by running the following command:
$ oc create -f grandmaster-clock-ptp-config.yaml
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:$ oc get pods -n openshift-ptp -o wide
Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-74m2g 3/3 Running 3 4d15h 10.16.230.7 compute-1.example.com ptp-operator-5f4f48d7c-x7zkf 1/1 Running 1 4d15h 10.128.1.145 compute-1.example.com
Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:$ oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container
Example output
ts2phc[94980.334]: [ts2phc.0.config] nmea delay: 98690975 ns ts2phc[94980.334]: [ts2phc.0.config] ens3f0 extts index 0 at 1676577329.999999999 corr 0 src 1676577330.901342528 diff -1 ts2phc[94980.334]: [ts2phc.0.config] ens3f0 master offset -1 s2 freq -1 ts2phc[94980.441]: [ts2phc.0.config] nmea sentence: GNRMC,195453.00,A,4233.24427,N,07126.64420,W,0.008,,160223,,,A,V phc2sys[94980.450]: [ptp4l.0.config] CLOCK_REALTIME phc offset 943 s2 freq -89604 delay 504 phc2sys[94980.512]: [ptp4l.0.config] CLOCK_REALTIME phc offset 1000 s2 freq -89264 delay 474
17.5.3. Configuring linuxptp services as an ordinary clock
You can configure linuxptp
services (ptp4l
, phc2sys
) as ordinary clock by creating a PtpConfig
custom resource (CR) object.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as an ordinary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is required only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the following
PtpConfig
CR, and then save the YAML in theordinary-clock-ptp-config.yaml
file.Example PTP ordinary clock configuration
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: ordinary-clock namespace: openshift-ptp annotations: {} spec: profile: - name: ordinary-clock # The interface name is hardware-specific interface: $interface ptp4lOpts: "-2 -s" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | [global] # # Default Data Set # twoStepFlag 1 slaveOnly 1 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 255 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 7 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type OC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: ordinary-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Table 17.1. PTP ordinary clock CR configuration options Custom resource field Description name
The name of the
PtpConfig
CR.profile
Specify an array of one or more
profile
objects. Each profile must be uniquely named.interface
Specify the network interface to be used by the
ptp4l
service, for exampleens787f1
.ptp4lOpts
Specify system config options for the
ptp4l
service, for example-2
to select the IEEE 802.3 network transport. The options should not include the network interface name-i <interface>
and service config file-f /etc/ptp4l.conf
because the network interface name and the service config file are automatically appended. Append--summary_interval -4
to use PTP fast events with this interface.phc2sysOpts
Specify system config options for the
phc2sys
service. If this field is empty, the PTP Operator does not start thephc2sys
service. For Intel Columbiaville 800 Series NICs, setphc2sysOpts
options to-a -r -m -n 24 -N 8 -R 16
.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics.ptp4lConf
Specify a string that contains the configuration to replace the default
/etc/ptp4l.conf
file. To use the default configuration, leave the field empty.tx_timestamp_timeout
For Intel Columbiaville 800 Series NICs, set
tx_timestamp_timeout
to50
.boundary_clock_jbod
For Intel Columbiaville 800 Series NICs, set
boundary_clock_jbod
to0
.ptpSchedulingPolicy
Scheduling policy for
ptp4l
andphc2sys
processes. Default value isSCHED_OTHER
. UseSCHED_FIFO
on systems that support FIFO scheduling.ptpSchedulingPriority
Integer value from 1-65 used to set FIFO priority for
ptp4l
andphc2sys
processes whenptpSchedulingPolicy
is set toSCHED_FIFO
. TheptpSchedulingPriority
field is not used whenptpSchedulingPolicy
is set toSCHED_OTHER
.ptpClockThreshold
Optional. If
ptpClockThreshold
is not present, default values are used for theptpClockThreshold
fields.ptpClockThreshold
configures how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.recommend
Specify an array of one or more
recommend
objects that define rules on how theprofile
should be applied to nodes..recommend.profile
Specify the
.recommend.profile
object name defined in theprofile
section..recommend.priority
Set
.recommend.priority
to0
for ordinary clock..recommend.match
Specify
.recommend.match
rules withnodeLabel
ornodeName
values..recommend.match.nodeLabel
Set
nodeLabel
with thekey
of thenode.Labels
field from the node object by using theoc get nodes --show-labels
command. For example,node-role.kubernetes.io/worker
..recommend.match.nodeName
Set
nodeName
with the value of thenode.Name
field from the node object by using theoc get nodes
command. For example,compute-1.example.com
.Create the
PtpConfig
CR by running the following command:$ oc create -f ordinary-clock-ptp-config.yaml
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:$ oc get pods -n openshift-ptp -o wide
Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-4xkbb 1/1 Running 0 43m 10.1.196.24 compute-0.example.com linuxptp-daemon-tdspf 1/1 Running 0 43m 10.1.196.25 compute-1.example.com ptp-operator-657bbb64c8-2f8sj 1/1 Running 0 43m 10.129.0.61 control-plane-1.example.com
Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:$ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
Example output
I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to: I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------ I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1 I1115 09:41:17.117616 4143292 daemon.go:102] Interface: ens787f1 I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 -s I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24 I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
Additional resources
- For more information about FIFO priority scheduling on PTP hardware, see Configuring FIFO priority scheduling for PTP hardware.
- For more information about configuring PTP fast events, see Configuring the PTP fast event notifications publisher.
17.5.4. Configuring linuxptp services as a boundary clock
You can configure the linuxptp
services (ptp4l
, phc2sys
) as boundary clock by creating a PtpConfig
custom resource (CR) object.
Use the following example PtpConfig
CR as the basis to configure linuxptp
services as the boundary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts
, ptp4lConf
, and ptpClockThreshold
. ptpClockThreshold
is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create the following
PtpConfig
CR, and then save the YAML in theboundary-clock-ptp-config.yaml
file.Example PTP boundary clock configuration
apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock namespace: openshift-ptp annotations: {} spec: profile: - name: boundary-clock ptp4lOpts: "-2" phc2sysOpts: "-a -r -n 24" ptpSchedulingPolicy: SCHED_FIFO ptpSchedulingPriority: 10 ptpSettings: logReduce: "true" ptp4lConf: | # The interface name is hardware-specific [$iface_slave] masterOnly 0 [$iface_master_1] masterOnly 1 [$iface_master_2] masterOnly 1 [$iface_master_3] masterOnly 1 [global] # # Default Data Set # twoStepFlag 1 slaveOnly 0 priority1 128 priority2 128 domainNumber 24 #utc_offset 37 clockClass 248 clockAccuracy 0xFE offsetScaledLogVariance 0xFFFF free_running 0 freq_est_interval 1 dscp_event 0 dscp_general 0 dataset_comparison G.8275.x G.8275.defaultDS.localPriority 128 # # Port Data Set # logAnnounceInterval -3 logSyncInterval -4 logMinDelayReqInterval -4 logMinPdelayReqInterval -4 announceReceiptTimeout 3 syncReceiptTimeout 0 delayAsymmetry 0 fault_reset_interval -4 neighborPropDelayThresh 20000000 masterOnly 0 G.8275.portDS.localPriority 128 # # Run time options # assume_two_step 0 logging_level 6 path_trace_enabled 0 follow_up_info 0 hybrid_e2e 0 inhibit_multicast_service 0 net_sync_monitor 0 tc_spanning_tree 0 tx_timestamp_timeout 50 unicast_listen 0 unicast_master_table 0 unicast_req_duration 3600 use_syslog 1 verbose 0 summary_interval 0 kernel_leap 1 check_fup_sync 0 clock_class_threshold 135 # # Servo Options # pi_proportional_const 0.0 pi_integral_const 0.0 pi_proportional_scale 0.0 pi_proportional_exponent -0.3 pi_proportional_norm_max 0.7 pi_integral_scale 0.0 pi_integral_exponent 0.4 pi_integral_norm_max 0.3 step_threshold 2.0 first_step_threshold 0.00002 max_frequency 900000000 clock_servo pi sanity_freq_limit 200000000 ntpshm_segment 0 # # Transport options # transportSpecific 0x0 ptp_dst_mac 01:1B:19:00:00:00 p2p_dst_mac 01:80:C2:00:00:0E udp_ttl 1 udp6_scope 0x0E uds_address /var/run/ptp4l # # Default interface options # clock_type BC network_transport L2 delay_mechanism E2E time_stamping hardware tsproc_mode filter delay_filter moving_median delay_filter_length 10 egressLatency 0 ingressLatency 0 boundary_clock_jbod 0 # # Clock description # productDescription ;; revisionData ;; manufacturerIdentity 00:00:00 userDescription ; timeSource 0xA0 recommend: - profile: boundary-clock priority: 4 match: - nodeLabel: "node-role.kubernetes.io/$mcp"
Table 17.2. PTP boundary clock CR configuration options Custom resource field Description name
The name of the
PtpConfig
CR.profile
Specify an array of one or more
profile
objects.name
Specify the name of a profile object which uniquely identifies a profile object.
ptp4lOpts
Specify system config options for the
ptp4l
service. The options should not include the network interface name-i <interface>
and service config file-f /etc/ptp4l.conf
because the network interface name and the service config file are automatically appended.ptp4lConf
Specify the required configuration to start
ptp4l
as boundary clock. For example,ens1f0
synchronizes from a grandmaster clock andens1f3
synchronizes connected devices.<interface_1>
The interface that receives the synchronization clock.
<interface_2>
The interface that sends the synchronization clock.
tx_timestamp_timeout
For Intel Columbiaville 800 Series NICs, set
tx_timestamp_timeout
to50
.boundary_clock_jbod
For Intel Columbiaville 800 Series NICs, ensure
boundary_clock_jbod
is set to0
. For Intel Fortville X710 Series NICs, ensureboundary_clock_jbod
is set to1
.phc2sysOpts
Specify system config options for the
phc2sys
service. If this field is empty, the PTP Operator does not start thephc2sys
service.ptpSchedulingPolicy
Scheduling policy for ptp4l and phc2sys processes. Default value is
SCHED_OTHER
. UseSCHED_FIFO
on systems that support FIFO scheduling.ptpSchedulingPriority
Integer value from 1-65 used to set FIFO priority for
ptp4l
andphc2sys
processes whenptpSchedulingPolicy
is set toSCHED_FIFO
. TheptpSchedulingPriority
field is not used whenptpSchedulingPolicy
is set toSCHED_OTHER
.ptpClockThreshold
Optional. If
ptpClockThreshold
is not present, default values are used for theptpClockThreshold
fields.ptpClockThreshold
configures how long after the PTP master clock is disconnected before PTP events are triggered.holdOverTimeout
is the time value in seconds before the PTP clock event state changes toFREERUN
when the PTP master clock is disconnected. ThemaxOffsetThreshold
andminOffsetThreshold
settings configure offset values in nanoseconds that compare against the values forCLOCK_REALTIME
(phc2sys
) or master offset (ptp4l
). When theptp4l
orphc2sys
offset value is outside this range, the PTP clock state is set toFREERUN
. When the offset value is within this range, the PTP clock state is set toLOCKED
.recommend
Specify an array of one or more
recommend
objects that define rules on how theprofile
should be applied to nodes..recommend.profile
Specify the
.recommend.profile
object name defined in theprofile
section..recommend.priority
Specify the
priority
with an integer value between0
and99
. A larger number gets lower priority, so a priority of99
is lower than a priority of10
. If a node can be matched with multiple profiles according to rules defined in thematch
field, the profile with the higher priority is applied to that node..recommend.match
Specify
.recommend.match
rules withnodeLabel
ornodeName
values..recommend.match.nodeLabel
Set
nodeLabel
with thekey
of thenode.Labels
field from the node object by using theoc get nodes --show-labels
command. For example,node-role.kubernetes.io/worker
..recommend.match.nodeName
Set
nodeName
with the value of thenode.Name
field from the node object by using theoc get nodes
command. For example,compute-1.example.com
.Create the CR by running the following command:
$ oc create -f boundary-clock-ptp-config.yaml
Verification
Check that the
PtpConfig
profile is applied to the node.Get the list of pods in the
openshift-ptp
namespace by running the following command:$ oc get pods -n openshift-ptp -o wide
Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-4xkbb 1/1 Running 0 43m 10.1.196.24 compute-0.example.com linuxptp-daemon-tdspf 1/1 Running 0 43m 10.1.196.25 compute-1.example.com ptp-operator-657bbb64c8-2f8sj 1/1 Running 0 43m 10.129.0.61 control-plane-1.example.com
Check that the profile is correct. Examine the logs of the
linuxptp
daemon that corresponds to the node you specified in thePtpConfig
profile. Run the following command:$ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container
Example output
I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to: I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------ I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1 I1115 09:41:17.117616 4143292 daemon.go:102] Interface: I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24 I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------
Additional resources
- For more information about FIFO priority scheduling on PTP hardware, see Configuring FIFO priority scheduling for PTP hardware.
- For more information about configuring PTP fast events, see Configuring the PTP fast event notifications publisher.
17.5.5. Configuring linuxptp services as boundary clocks for dual NIC hardware
Precision Time Protocol (PTP) hardware with dual NIC configured as boundary clocks is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
You can configure the linuxptp
services (ptp4l
, phc2sys
) as boundary clocks for dual NIC hardware by creating a PtpConfig
custom resource (CR) object for each NIC.
Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l
instances for each NIC feeding the downstream clocks.
Prerequisites
-
Install the OpenShift CLI (
oc
). -
Log in as a user with
cluster-admin
privileges. - Install the PTP Operator.
Procedure
Create two separate
PtpConfig
CRs, one for each NIC, using the reference CR in "Configuring linuxptp services as a boundary clock" as the basis for each CR. For example:Create
boundary-clock-ptp-config-nic1.yaml
, specifying values forphc2sysOpts
:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock-ptp-config-nic1 namespace: openshift-ptp spec: profile: - name: "profile1" ptp4lOpts: "-2 --summary_interval -4" ptp4lConf: | 1 [ens5f1] masterOnly 1 [ens5f0] masterOnly 0 ... phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 2
- 1
- Specify the required interfaces to start
ptp4l
as a boundary clock. For example,ens5f0
synchronizes from a grandmaster clock andens5f1
synchronizes connected devices. - 2
- Required
phc2sysOpts
values.-m
prints messages tostdout
. Thelinuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics.
Create
boundary-clock-ptp-config-nic2.yaml
, removing thephc2sysOpts
field altogether to disable thephc2sys
service for the second NIC:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: boundary-clock-ptp-config-nic2 namespace: openshift-ptp spec: profile: - name: "profile2" ptp4lOpts: "-2 --summary_interval -4" ptp4lConf: | 1 [ens7f1] masterOnly 1 [ens7f0] masterOnly 0 ...
- 1
- Specify the required interfaces to start
ptp4l
as a boundary clock on the second NIC.
NoteYou must completely remove the
phc2sysOpts
field from the secondPtpConfig
CR to disable thephc2sys
service on the second NIC.
Create the dual NIC
PtpConfig
CRs by running the following commands:Create the CR that configures PTP for the first NIC:
$ oc create -f boundary-clock-ptp-config-nic1.yaml
Create the CR that configures PTP for the second NIC:
$ oc create -f boundary-clock-ptp-config-nic2.yaml
Verification
Check that the PTP Operator has applied the
PtpConfig
CRs for both NICs. Examine the logs for thelinuxptp
daemon corresponding to the node that has the dual NIC hardware installed. For example, run the following command:$ oc logs linuxptp-daemon-cvgr6 -n openshift-ptp -c linuxptp-daemon-container
Example output
ptp4l[80828.335]: [ptp4l.1.config] master offset 5 s2 freq -5727 path delay 519 ptp4l[80828.343]: [ptp4l.0.config] master offset -5 s2 freq -10607 path delay 533 phc2sys[80828.390]: [ptp4l.0.config] CLOCK_REALTIME phc offset 1 s2 freq -87239 delay 539
17.5.6. Intel Columbiaville E800 series NIC as PTP ordinary clock reference
The following table describes the changes that you must make to the reference PTP configuration in order to use Intel Columbiaville E800 series NICs as ordinary clocks. Make the changes in a PtpConfig
custom resource (CR) that you apply to the cluster.
PTP configuration | Recommended setting |
---|---|
|
|
|
|
|
|
For phc2sysOpts
, -m
prints messages to stdout
. The linuxptp-daemon
DaemonSet
parses the logs and generates Prometheus metrics.
Additional resources
-
For a complete example CR that configures
linuxptp
services as an ordinary clock with PTP fast events, see Configuring linuxptp services as ordinary clock.
17.5.7. Configuring FIFO priority scheduling for PTP hardware
In telco or other deployment configurations that require low latency performance, PTP daemon threads run in a constrained CPU footprint alongside the rest of the infrastructure components. By default, PTP threads run with the SCHED_OTHER
policy. Under high load, these threads might not get the scheduling latency they require for error-free operation.
To mitigate against potential scheduling latency errors, you can configure the PTP Operator linuxptp
services to allow threads to run with a SCHED_FIFO
policy. If SCHED_FIFO
is set for a PtpConfig
CR, then ptp4l
and phc2sys
will run in the parent container under chrt
with a priority set by the ptpSchedulingPriority
field of the PtpConfig
CR.
Setting ptpSchedulingPolicy
is optional, and is only required if you are experiencing latency errors.
Procedure
Edit the
PtpConfig
CR profile:$ oc edit PtpConfig -n openshift-ptp
Change the
ptpSchedulingPolicy
andptpSchedulingPriority
fields:apiVersion: ptp.openshift.io/v1 kind: PtpConfig metadata: name: <ptp_config_name> namespace: openshift-ptp ... spec: profile: - name: "profile1" ... ptpSchedulingPolicy: SCHED_FIFO 1 ptpSchedulingPriority: 10 2
-
Save and exit to apply the changes to the
PtpConfig
CR.
Verification
Get the name of the
linuxptp-daemon
pod and corresponding node where thePtpConfig
CR has been applied:$ oc get pods -n openshift-ptp -o wide
Example output
NAME READY STATUS RESTARTS AGE IP NODE linuxptp-daemon-gmv2n 3/3 Running 0 1d17h 10.1.196.24 compute-0.example.com linuxptp-daemon-lgm55 3/3 Running 0 1d17h 10.1.196.25 compute-1.example.com ptp-operator-3r4dcvf7f4-zndk7 1/1 Running 0 1d7h 10.129.0.61 control-plane-1.example.com
Check that the
ptp4l
process is running with the updatedchrt
FIFO priority:$ oc -n openshift-ptp logs linuxptp-daemon-lgm55 -c linuxptp-daemon-container|grep chrt
Example output
I1216 19:24:57.091872 1600715 daemon.go:285] /bin/chrt -f 65 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config -2 --summary_interval -4 -m
17.5.8. Configuring log filtering for linuxptp services
The linuxptp
daemon generates logs that you can use for debugging purposes. In telco or other deployment configurations that feature a limited storage capacity, these logs can add to the storage demand.
To reduce the number log messages, you can configure the PtpConfig
custom resource (CR) to exclude log messages that report the master offset
value. The master offset
log message reports the difference between the current node’s clock and the master clock in nanoseconds.
Prerequisites
- Install the OpenShif