Networking


OpenShift Container Platform 4.13

Configuring and managing cluster networking

Red Hat OpenShift Documentation Team

Abstract

This document provides instructions for configuring and managing your OpenShift Container Platform cluster network, including DNS, ingress, and the Pod network.

Chapter 1. About networking

Red Hat OpenShift Networking is an ecosystem of features, plugins, and advanced networking capabilities that extends Kubernetes networking with the advanced features your cluster needs to manage network traffic for one or multiple hybrid clusters. This ecosystem integrates ingress, egress, load balancing, high-performance throughput, security, and inter- and intra-cluster traffic management, and it provides role-based observability tooling to reduce the natural complexity of networking.

The following list highlights some of the most commonly used Red Hat OpenShift Networking features available on your cluster:

  • Primary cluster network provided by either of the following Container Network Interface (CNI) plugins:

    • OVN-Kubernetes network plugin, the default plugin
    • OpenShift SDN network plugin

  • Certified 3rd-party alternative primary network plugins
  • Cluster Network Operator for network plugin management
  • Ingress Operator for TLS encrypted web traffic
  • DNS Operator for name assignment
  • MetalLB Operator for traffic load balancing on bare metal clusters
  • IP failover support for high-availability
  • Additional hardware network support through multiple CNI plugins, including for macvlan, ipvlan, and SR-IOV hardware networks
  • IPv4, IPv6, and dual stack addressing
  • Hybrid Linux-Windows host clusters for Windows-based workloads
  • Red Hat OpenShift Service Mesh for discovery, load balancing, service-to-service authentication, failure recovery, metrics, and monitoring of services
  • Single-node OpenShift
  • Network Observability Operator for network debugging and insights
  • Submariner for inter-cluster networking
  • Red Hat Service Interconnect for layer 7 inter-cluster networking

Chapter 2. Understanding networking

Cluster administrators have several options for exposing applications that run inside a cluster to external traffic and for securing network connections:

  • Service types, such as node ports or load balancers
  • API resources, such as Ingress and Route

By default, Kubernetes allocates each pod an internal IP address for applications running within the pod. Pods and their containers can network, but clients outside the cluster do not have networking access. When you expose your application to external traffic, giving each pod its own IP address means that pods can be treated like physical hosts or virtual machines in terms of port allocation, networking, naming, service discovery, load balancing, application configuration, and migration.

Note

Some cloud platforms offer metadata APIs that listen on the 169.254.169.254 IP address, a link-local IP address in the IPv4 169.254.0.0/16 CIDR block.

This CIDR block is not reachable from the pod network. Pods that need access to these IP addresses must be given host network access by setting the spec.hostNetwork field in the pod spec to true.

If you allow a pod host network access, you grant the pod privileged access to the underlying network infrastructure.
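
For example, a minimal pod manifest that requests host network access might look like the following sketch. The pod name, namespace, and image are placeholder values, and your cluster security context constraints must also permit host network access:

apiVersion: v1
kind: Pod
metadata:
  name: metadata-reader          # placeholder name
  namespace: example-project     # placeholder namespace
spec:
  hostNetwork: true              # grants access to the host network, including link-local addresses
  containers:
  - name: client
    image: registry.example.com/tools:latest   # placeholder image
    command: ["sleep", "infinity"]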

2.1. OpenShift Container Platform DNS

If you are running multiple services, such as front-end and back-end services for use with multiple pods, environment variables are created for user names, service IPs, and more so that the front-end pods can communicate with the back-end services. If the service is deleted and recreated, a new IP address can be assigned to the service, which requires the front-end pods to be recreated to pick up the updated value of the service IP environment variable. Additionally, the back-end service must be created before any of the front-end pods to ensure that the service IP is generated properly and can be provided to the front-end pods as an environment variable.

For this reason, OpenShift Container Platform has a built-in DNS so that the services can be reached by the service DNS as well as the service IP/port.
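
For example, a front-end pod can reach a back-end service by its DNS name instead of its IP address. The service name, namespace, and port in the following sketch are placeholders:

$ curl http://backend.myproject.svc.cluster.local:8080

From a pod in the same namespace, the short name backend also resolves, because the cluster DNS search domains are added to each pod resolver configuration.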

2.2. OpenShift Container Platform Ingress Operator

When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.

The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route and Kubernetes Ingress resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy type and internal load balancing, provide ways to publish Ingress Controller endpoints.

2.2.1. Comparing routes and Ingress

The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster. The most common way to manage Ingress traffic is with the Ingress Controller. You can scale and replicate this pod like any other regular pod. This router service is based on HAProxy, which is an open source load balancer solution.

The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.

Ingress traffic accesses services in the cluster through a route. Routes and Ingress are the main resources for handling Ingress traffic. Ingress provides features similar to a route, such as accepting external requests and delegating them based on the route. However, with Ingress you can only allow certain types of connections: HTTP/2, HTTPS and server name indication (SNI), and TLS with certificates. In OpenShift Container Platform, routes are generated to meet the conditions specified by the Ingress resource.
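
For example, the following minimal Route manifest is a sketch that exposes a service over plain HTTP; the service name, namespace, hostname, and target port are placeholders. You can create an equivalent unsecured route with the oc expose service command.

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: frontend
  namespace: myproject
spec:
  host: www.example.com     # optional; omit to use the generated default host
  to:
    kind: Service
    name: frontend
  port:
    targetPort: 8080        # the service port that receives the traffic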

2.3. Glossary of common terms for OpenShift Container Platform networking

This glossary defines common terms that are used in the networking content.

authentication
To control access to an OpenShift Container Platform cluster, a cluster administrator can configure user authentication and ensure only approved users access the cluster. To interact with an OpenShift Container Platform cluster, you must authenticate to the OpenShift Container Platform API. You can authenticate by providing an OAuth access token or an X.509 client certificate in your requests to the OpenShift Container Platform API.
AWS Load Balancer Operator
The AWS Load Balancer (ALB) Operator deploys and manages an instance of the aws-load-balancer-controller.
Cluster Network Operator
The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation.
config map
A config map provides a way to inject configuration data into pods. You can reference the data stored in a config map in a volume of type ConfigMap. Applications running in a pod can use this data.
custom resource (CR)
A CR is an extension of the Kubernetes API. You can create custom resources.
DNS
Cluster DNS is a DNS server which serves DNS records for Kubernetes services. Containers started by Kubernetes automatically include this DNS server in their DNS searches.
DNS Operator
The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform.
deployment
A Kubernetes resource object that maintains the life cycle of an application.
domain
Domain is a DNS name serviced by the Ingress Controller.
egress
The process of data sharing externally through a network’s outbound traffic from a pod.
External DNS Operator
The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform.
HTTP-based route
An HTTP-based route is an unsecured route that uses the basic HTTP routing protocol and exposes a service on an unsecured application port.
Ingress
The Kubernetes Ingress resource in OpenShift Container Platform implements the Ingress Controller with a shared router service that runs as a pod inside the cluster.
Ingress Controller
The Ingress Operator manages Ingress Controllers. Using an Ingress Controller is the most common way to allow external access to an OpenShift Container Platform cluster.
installer-provisioned infrastructure
The installation program deploys and configures the infrastructure that the cluster runs on.
kubelet
A primary node agent that runs on each node in the cluster to ensure that containers are running in a pod.
Kubernetes NMState Operator
The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across the OpenShift Container Platform cluster’s nodes with NMState.
kube-proxy
Kube-proxy is a proxy service that runs on each node and helps make services available to external hosts. It forwards requests to the correct containers and can perform primitive load balancing.
load balancers
OpenShift Container Platform uses load balancers for communicating from outside the cluster with services running in the cluster.
MetalLB Operator
As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type LoadBalancer is added to the cluster, MetalLB can add an external IP address for the service.
multicast
With IP multicast, data is broadcast to many IP addresses simultaneously.
namespaces
A namespace isolates specific system resources that are visible to all processes. Inside a namespace, only processes that are members of that namespace can see those resources.
networking
Network information of an OpenShift Container Platform cluster.
node
A worker machine in the OpenShift Container Platform cluster. A node is either a virtual machine (VM) or a physical machine.
OpenShift Container Platform Ingress Operator
The Ingress Operator implements the IngressController API and is the component responsible for enabling external access to OpenShift Container Platform services.
pod
One or more containers with shared resources, such as volume and IP addresses, running in your OpenShift Container Platform cluster. A pod is the smallest compute unit defined, deployed, and managed.
PTP Operator
The PTP Operator creates and manages the linuxptp services.
route
The OpenShift Container Platform route provides Ingress traffic to services in the cluster. Routes provide advanced features that might not be supported by standard Kubernetes Ingress Controllers, such as TLS re-encryption, TLS passthrough, and split traffic for blue-green deployments.
scaling
Increasing or decreasing the resource capacity.
service
Exposes a running application on a set of pods.
Single Root I/O Virtualization (SR-IOV) Network Operator
The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.
software-defined networking (SDN)
OpenShift Container Platform uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Container Platform cluster.
Stream Control Transmission Protocol (SCTP)
SCTP is a reliable, message-based protocol that runs on top of an IP network.
taint
Taints and tolerations ensure that pods are scheduled onto appropriate nodes. You can apply one or more taints on a node.
toleration
You can apply tolerations to pods. Tolerations allow the scheduler to schedule pods with matching taints.
web console
A user interface (UI) to manage OpenShift Container Platform.

Chapter 3. Accessing hosts

Learn how to create a bastion host to access OpenShift Container Platform instances and access the control plane nodes with secure shell (SSH) access.

3.1. Accessing hosts on Amazon Web Services in an installer-provisioned infrastructure cluster

The OpenShift Container Platform installer does not create any public IP addresses for any of the Amazon Elastic Compute Cloud (Amazon EC2) instances that it provisions for your OpenShift Container Platform cluster. To be able to SSH to your OpenShift Container Platform hosts, you must follow this procedure.

Procedure

  1. Create a security group that allows SSH access into the virtual private cloud (VPC) created by the openshift-install command.
  2. Create an Amazon EC2 instance on one of the public subnets the installer created.
  3. Associate a public IP address with the Amazon EC2 instance that you created.

    Unlike with the OpenShift Container Platform installation, you should associate the Amazon EC2 instance you created with an SSH keypair. It does not matter what operating system you choose for this instance, as it will simply serve as an SSH bastion to bridge the internet into your OpenShift Container Platform cluster’s VPC. The Amazon Machine Image (AMI) you use does not really matter. With Red Hat Enterprise Linux CoreOS (RHCOS), for example, you can provide keys via Ignition, like the installer does.

  4. After you provisioned your Amazon EC2 instance and can SSH into it, you must add the SSH key that you associated with your OpenShift Container Platform installation. This key can be different from the key for the bastion instance, but does not have to be.

    Note

    Direct SSH access is only recommended for disaster recovery. When the Kubernetes API is responsive, run privileged pods instead.

  5. Run oc get nodes, inspect the output, and choose one of the control plane nodes. The hostname looks similar to ip-10-0-1-163.ec2.internal.
  6. From the bastion SSH host you manually deployed into Amazon EC2, SSH into that control plane host. Ensure that you use the same SSH key you specified during the installation:

    $ ssh -i <ssh-key-path> core@<master-hostname>
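
For example, one way to complete the previous steps is to load the installation key into an SSH agent and use agent forwarding through the bastion. The bastion user name depends on the AMI that you chose, and the key paths and addresses are placeholders:

$ eval "$(ssh-agent -s)"
$ ssh-add <installation-ssh-key-path>
$ ssh -A -i <bastion-key-path> ec2-user@<bastion-public-ip>
$ ssh core@ip-10-0-1-163.ec2.internal   # run from the bastion shell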

Chapter 4. Networking Operators overview

OpenShift Container Platform supports multiple types of networking Operators. You can manage the cluster networking using these networking Operators.

4.1. Cluster Network Operator

The Cluster Network Operator (CNO) deploys and manages the cluster network components in an OpenShift Container Platform cluster. This includes deployment of the Container Network Interface (CNI) network plugin selected for the cluster during installation. For more information, see Cluster Network Operator in OpenShift Container Platform.

4.2. DNS Operator

The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods. This enables DNS-based Kubernetes Service discovery in OpenShift Container Platform. For more information, see DNS Operator in OpenShift Container Platform.

4.3. Ingress Operator

When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to external clients. The Ingress Operator implements the Ingress Controller API and is responsible for enabling external access to OpenShift Container Platform cluster services. For more information, see Ingress Operator in OpenShift Container Platform.

4.4. External DNS Operator

The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform. For more information, see Understanding the External DNS Operator.

4.5. Ingress Node Firewall Operator

The Ingress Node Firewall Operator uses an extended Berkeley Packet Filter (eBPF) and eXpress Data Path (XDP) plugin to process node firewall rules, update statistics, and generate events for dropped traffic. The Operator manages ingress node firewall resources, verifies firewall configuration, rejects incorrectly configured rules that can prevent cluster access, and loads ingress node firewall XDP programs to the interfaces that are selected in the rule's objects. For more information, see Understanding the Ingress Node Firewall Operator.

4.6. Network Observability Operator

The Network Observability Operator is an optional Operator that allows cluster administrators to observe the network traffic for OpenShift Container Platform clusters. The Network Observability Operator uses the eBPF technology to create network flows. The network flows are then enriched with OpenShift Container Platform information and stored in Loki. You can view and analyze the stored network flows information in the OpenShift Container Platform console for further insight and troubleshooting. For more information, see About Network Observability Operator.

Chapter 5. Cluster Network Operator in OpenShift Container Platform

You can use the Cluster Network Operator (CNO) to deploy and manage cluster network components on an OpenShift Container Platform cluster, including the Container Network Interface (CNI) network plugin selected for the cluster during installation.

5.1. Cluster Network Operator

The Cluster Network Operator implements the network API from the operator.openshift.io API group. The Operator deploys the OVN-Kubernetes network plugin, or the network provider plugin that you selected during cluster installation, by using a daemon set.

Procedure

The Cluster Network Operator is deployed during installation as a Kubernetes Deployment.

  1. Run the following command to view the Deployment status:

    $ oc get -n openshift-network-operator deployment/network-operator

    Example output

    NAME               READY   UP-TO-DATE   AVAILABLE   AGE
    network-operator   1/1     1            1           56m

  2. Run the following command to view the state of the Cluster Network Operator:

    $ oc get clusteroperator/network

    Example output

    NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
    network   4.5.4     True        False         False      50m

    The following fields provide information about the status of the operator: AVAILABLE, PROGRESSING, and DEGRADED. The AVAILABLE field is True when the Cluster Network Operator reports an available status condition.
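
If you want to check a single status condition, for example in a script, you can use a jsonpath query similar to the following sketch:

$ oc get clusteroperator/network -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'

Example output

True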

5.2. Viewing the cluster network configuration

Every new OpenShift Container Platform installation has a network.config object named cluster.

Procedure

  • Use the oc describe command to view the cluster network configuration:

    $ oc describe network.config/cluster

    Example output

    Name:         cluster
    Namespace:
    Labels:       <none>
    Annotations:  <none>
    API Version:  config.openshift.io/v1
    Kind:         Network
    Metadata:
      Self Link:           /apis/config.openshift.io/v1/networks/cluster
    Spec: 1
      Cluster Network:
        Cidr:         10.128.0.0/14
        Host Prefix:  23
      Network Type:   OpenShiftSDN
      Service Network:
        172.30.0.0/16
    Status: 2
      Cluster Network:
        Cidr:               10.128.0.0/14
        Host Prefix:        23
      Cluster Network MTU:  8951
      Network Type:         OpenShiftSDN
      Service Network:
        172.30.0.0/16
    Events:  <none>

    1
    The Spec field displays the configured state of the cluster network.
    2
    The Status field displays the current state of the cluster network configuration.

5.3. Viewing Cluster Network Operator status

You can inspect the status and view the details of the Cluster Network Operator using the oc describe command.

Procedure

  • Run the following command to view the status of the Cluster Network Operator:

    $ oc describe clusteroperators/network

5.4. Viewing Cluster Network Operator logs

You can view Cluster Network Operator logs by using the oc logs command.

Procedure

  • Run the following command to view the logs of the Cluster Network Operator:

    $ oc logs --namespace=openshift-network-operator deployment/network-operator

5.5. Cluster Network Operator configuration

The configuration for the cluster network is specified as part of the Cluster Network Operator (CNO) configuration and stored in a custom resource (CR) object that is named cluster. The CR specifies the fields for the Network API in the operator.openshift.io API group.

The CNO configuration inherits the following fields during cluster installation from the Network API in the Network.config.openshift.io API group and these fields cannot be changed:

clusterNetwork
IP address pools from which pod IP addresses are allocated.
serviceNetwork
IP address pool for services.
defaultNetwork.type
Cluster network plugin, such as OpenShift SDN or OVN-Kubernetes.
Note

After cluster installation, you cannot modify the fields listed in the previous section.

You can specify the cluster network plugin configuration for your cluster by setting the fields for the defaultNetwork object in the CNO object named cluster.
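
To review the current values of these fields, including the defaultNetwork object, you can inspect the CNO object directly, for example:

$ oc get networks.operator.openshift.io cluster -o yaml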

5.5.1. Cluster Network Operator configuration object

The fields for the Cluster Network Operator (CNO) are described in the following table:

Table 5.1. Cluster Network Operator configuration object
FieldTypeDescription

metadata.name

string

The name of the CNO object. This name is always cluster.

spec.clusterNetwork

array

A list specifying the blocks of IP addresses from which pod IP addresses are allocated and the subnet prefix length assigned to each individual node in the cluster. For example:

spec:
  clusterNetwork:
  - cidr: 10.128.0.0/19
    hostPrefix: 23
  - cidr: 10.128.32.0/19
    hostPrefix: 23

This value is read-only and inherited from the Network.config.openshift.io object named cluster during cluster installation.

spec.serviceNetwork

array

A block of IP addresses for services. The OpenShift SDN and OVN-Kubernetes network plugins support only a single IP address block for the service network. For example:

spec:
  serviceNetwork:
  - 172.30.0.0/14

This value is read-only and inherited from the Network.config.openshift.io object named cluster during cluster installation.

spec.defaultNetwork

object

Configures the network plugin for the cluster network.

spec.kubeProxyConfig

object

The fields for this object specify the kube-proxy configuration. If you are using the OVN-Kubernetes cluster network plugin, the kube-proxy configuration has no effect.

defaultNetwork object configuration

The values for the defaultNetwork object are defined in the following table:

Table 5.2. defaultNetwork object
FieldTypeDescription

type

string

Either OpenShiftSDN or OVNKubernetes. The Red Hat OpenShift Networking network plugin is selected during installation. This value cannot be changed after cluster installation.

Note

OpenShift Container Platform uses the OVN-Kubernetes network plugin by default.

openshiftSDNConfig

object

This object is only valid for the OpenShift SDN network plugin.

ovnKubernetesConfig

object

This object is only valid for the OVN-Kubernetes network plugin.

Configuration for the OpenShift SDN network plugin

The following table describes the configuration fields for the OpenShift SDN network plugin:

Table 5.3. openshiftSDNConfig object
FieldTypeDescription

mode

string

The network isolation mode for OpenShift SDN.

mtu

integer

The maximum transmission unit (MTU) for the VXLAN overlay network. This value is normally configured automatically.

vxlanPort

integer

The port to use for all VXLAN packets. The default value is 4789.

Note

You can only change the configuration for your cluster network plugin during cluster installation.

Example OpenShift SDN configuration

defaultNetwork:
  type: OpenShiftSDN
  openshiftSDNConfig:
    mode: NetworkPolicy
    mtu: 1450
    vxlanPort: 4789

Configuration for the OVN-Kubernetes network plugin

The following table describes the configuration fields for the OVN-Kubernetes network plugin:

Table 5.4. ovnKubernetesConfig object
FieldTypeDescription

mtu

integer

The maximum transmission unit (MTU) for the Geneve (Generic Network Virtualization Encapsulation) overlay network. This value is normally configured automatically.

genevePort

integer

The UDP port for the Geneve overlay network.

ipsecConfig

object

If the field is present, IPsec is enabled for the cluster.

policyAuditConfig

object

Specify a configuration object for customizing network policy audit logging. If unset, the default audit log settings are used.

gatewayConfig

object

Optional: Specify a configuration object for customizing how egress traffic is sent to the node gateway.

Note

While migrating egress traffic, you can expect some disruption to workloads and service traffic until the Cluster Network Operator (CNO) successfully rolls out the changes.

v4InternalSubnet

If your existing network infrastructure overlaps with the 100.64.0.0/16 IPv4 subnet, you can specify a different IP address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. For example, if the clusterNetwork.cidr value is 10.128.0.0/14 and the clusterNetwork.hostPrefix value is /23, then the maximum number of nodes is 2^(23-14)=512.

This field cannot be changed after installation.

The default value is 100.64.0.0/16.

v6InternalSubnet

If your existing network infrastructure overlaps with the fd98::/48 IPv6 subnet, you can specify a different IP address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster.

This field cannot be changed after installation.

The default value is fd98::/48.

Table 5.5. policyAuditConfig object
FieldTypeDescription

rateLimit

integer

The maximum number of messages to generate every second per node. The default value is 20 messages per second.

maxFileSize

integer

The maximum size for the audit log in bytes. The default value is 50000000 or 50 MB.

maxLogFiles

integer

The maximum number of log files that are retained.

destination

string

One of the following additional audit log targets:

libc
The libc syslog() function of the journald process on the host.
udp:<host>:<port>
A syslog server. Replace <host>:<port> with the host and port of the syslog server.
unix:<file>
A Unix Domain Socket file specified by <file>.
null
Do not send the audit logs to any additional target.

syslogFacility

string

The syslog facility, such as kern, as defined by RFC5424. The default value is local0.

Table 5.6. gatewayConfig object
FieldTypeDescription

routingViaHost

boolean

Set this field to true to send egress traffic from pods to the host networking stack. For highly-specialized installations and applications that rely on manually configured routes in the kernel routing table, you might want to route egress traffic to the host networking stack. By default, egress traffic is processed in OVN to exit the cluster and is not affected by specialized routes in the kernel routing table. The default value is false.

This field has an interaction with the Open vSwitch hardware offloading feature. If you set this field to true, you do not receive the performance benefits of the offloading because egress traffic is processed by the host networking stack.

Note

You can only change the configuration for your cluster network plugin during cluster installation, except for the gatewayConfig field that can be changed at runtime as a postinstallation activity.

Example OVN-Kubernetes configuration with IPSec enabled

defaultNetwork:
  type: OVNKubernetes
  ovnKubernetesConfig:
    mtu: 1400
    genevePort: 6081
    ipsecConfig: {}
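
Because the gatewayConfig field can be changed after installation, you can enable routing egress traffic through the host networking stack with a patch similar to the following sketch. Whether this is appropriate depends on your environment:

$ oc patch networks.operator.openshift.io cluster --type=merge --patch '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'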

kubeProxyConfig object configuration

The values for the kubeProxyConfig object are defined in the following table:

Table 5.7. kubeProxyConfig object
FieldTypeDescription

iptablesSyncPeriod

string

The refresh period for iptables rules. The default value is 30s. Valid suffixes include s, m, and h and are described in the Go time package documentation.

Note

Because of performance improvements introduced in OpenShift Container Platform 4.3 and greater, adjusting the iptablesSyncPeriod parameter is no longer necessary.

proxyArguments.iptables-min-sync-period

array

The minimum duration before refreshing iptables rules. This field ensures that the refresh does not happen too frequently. Valid suffixes include s, m, and h and are described in the Go time package. The default value is:

kubeProxyConfig:
  proxyArguments:
    iptables-min-sync-period:
    - 0s

5.5.2. Cluster Network Operator example configuration

A complete CNO configuration is specified in the following example:

Example Cluster Network Operator object

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork: 1
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork: 2
  - 172.30.0.0/16
  defaultNetwork: 3
    type: OpenShiftSDN
    openshiftSDNConfig:
      mode: NetworkPolicy
      mtu: 1450
      vxlanPort: 4789
  kubeProxyConfig:
    iptablesSyncPeriod: 30s
    proxyArguments:
      iptables-min-sync-period:
      - 0s

1 2 3
Configured only during cluster installation.


Chapter 6. DNS Operator in OpenShift Container Platform

The DNS Operator deploys and manages CoreDNS to provide a name resolution service to pods, enabling DNS-based Kubernetes Service discovery in OpenShift Container Platform.

6.1. DNS Operator

The DNS Operator implements the dns API from the operator.openshift.io API group. The Operator deploys CoreDNS using a daemon set, creates a service for the daemon set, and configures the kubelet to instruct pods to use the CoreDNS service IP address for name resolution.

Procedure

The DNS Operator is deployed during installation with a Deployment object.

  1. Use the oc get command to view the deployment status:

    $ oc get -n openshift-dns-operator deployment/dns-operator

    Example output

    NAME           READY     UP-TO-DATE   AVAILABLE   AGE
    dns-operator   1/1       1            1           23h

  2. Use the oc get command to view the state of the DNS Operator:

    $ oc get clusteroperator/dns

    Example output

    NAME      VERSION     AVAILABLE   PROGRESSING   DEGRADED   SINCE
    dns       4.1.0-0.11  True        False         False      92m

    AVAILABLE, PROGRESSING, and DEGRADED provide information about the status of the Operator. AVAILABLE is True when at least one pod from the CoreDNS daemon set reports an Available status condition.

6.2. Changing the DNS Operator managementState

DNS manages the CoreDNS component to provide a name resolution service for pods and services in the cluster. The managementState of the DNS Operator is set to Managed by default, which means that the DNS Operator is actively managing its resources. You can change it to Unmanaged, which means the DNS Operator is not managing its resources.

The following are use cases for changing the DNS Operator managementState:

  • You are a developer and want to test a configuration change to see if it fixes an issue in CoreDNS. You can stop the DNS Operator from overwriting the fix by setting the managementState to Unmanaged.
  • You are a cluster administrator and have reported an issue with CoreDNS, but need to apply a workaround until the issue is fixed. You can set the managementState field of the DNS Operator to Unmanaged to apply the workaround.

Procedure

  • Change the managementState of the DNS Operator:

    $ oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Unmanaged"}}'
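
To return the DNS Operator to its default behavior after testing, set the managementState back to Managed with a similar patch:

$ oc patch dns.operator.openshift.io default --type merge --patch '{"spec":{"managementState":"Managed"}}'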

6.3. Controlling DNS pod placement

The DNS Operator has two daemon sets: one for CoreDNS and one for managing the /etc/hosts file. The daemon set for /etc/hosts must run on every node host to add an entry for the cluster image registry to support pulling images. Security policies can prohibit communication between pairs of nodes, which prevents the daemon set for CoreDNS from running on every node.

As a cluster administrator, you can use a custom node selector to configure the daemon set for CoreDNS to run or not run on certain nodes.

Prerequisites

  • You installed the oc CLI.
  • You are logged in to the cluster with a user with cluster-admin privileges.

Procedure

  • To run the daemon set for CoreDNS only on certain nodes, configure the spec.nodePlacement.nodeSelector API field:

    1. Modify the DNS Operator object named default:

      $ oc edit dns.operator/default
    2. Specify a node selector that includes only control plane nodes in the spec.nodePlacement.nodeSelector API field:

       spec:
         nodePlacement:
           nodeSelector:
             node-role.kubernetes.io/control-plane: ""
  • To allow the daemon set for CoreDNS to run on nodes, configure a taint and toleration:

    1. Modify the DNS Operator object named default:

      $ oc edit dns.operator/default
    2. Specify a taint key and a toleration for the taint:

       spec:
         nodePlacement:
           tolerations:
           - effect: NoExecute
             key: "dns-only"
             operator: Equal
             value: abc
             tolerationSeconds: 3600 1
      1
      If the taint is dns-only, it can be tolerated indefinitely. You can omit tolerationSeconds.

6.4. View the default DNS

Every new OpenShift Container Platform installation has a dns.operator named default.

Procedure

  1. Use the oc describe command to view the default dns:

    $ oc describe dns.operator/default

    Example output

    Name:         default
    Namespace:
    Labels:       <none>
    Annotations:  <none>
    API Version:  operator.openshift.io/v1
    Kind:         DNS
    ...
    Status:
      Cluster Domain:  cluster.local 1
      Cluster IP:      172.30.0.10 2
    ...

    1
    The Cluster Domain field is the base DNS domain used to construct fully qualified pod and service domain names.
    2
    The Cluster IP is the address pods query for name resolution. The IP is defined as the 10th address in the service CIDR range.
  2. To find the service CIDR of your cluster, use the oc get command:

    $ oc get networks.config/cluster -o jsonpath='{$.status.serviceNetwork}'

Example output

[172.30.0.0/16]

6.5. Using DNS forwarding

You can use DNS forwarding to override the default forwarding configuration in the /etc/resolv.conf file in the following ways:

  • Specify name servers for every zone. If the forwarded zone is the Ingress domain managed by OpenShift Container Platform, then the upstream name server must be authorized for the domain.
  • Provide a list of upstream DNS servers.
  • Change the default forwarding policy.
Note

A DNS forwarding configuration for the default domain can have both the default servers specified in the /etc/resolv.conf file and the upstream DNS servers.

Procedure

  1. Modify the DNS Operator object named default:

    $ oc edit dns.operator/default

    After you issue the previous command, the Operator creates and updates the config map named dns-default with additional server configuration blocks based on Server. If none of the servers have a zone that matches the query, then name resolution falls back to the upstream DNS servers.

    Configuring DNS forwarding

    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      servers:
      - name: example-server 1
        zones: 2
        - example.com
        forwardPlugin:
          policy: Random 3
          upstreams: 4
          - 1.1.1.1
          - 2.2.2.2:5353
      upstreamResolvers: 5
        policy: Random 6
        upstreams: 7
        - type: SystemResolvConf 8
        - type: Network
          address: 1.2.3.4 9
          port: 53 10

    1
    Must comply with the rfc6335 service name syntax.
    2
    Must conform to the definition of a subdomain in the rfc1123 service name syntax. The cluster domain, cluster.local, is an invalid subdomain for the zones field.
    3
    Defines the policy to select upstream resolvers. The default value is Random. You can also use the values RoundRobin and Sequential.
    4
    A maximum of 15 upstreams is allowed per forwardPlugin.
    5
    Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in /etc/resolv.conf.
    6
    Determines the order in which upstream servers are selected for querying. You can specify one of these values: Random, RoundRobin, or Sequential. The default value is Sequential.
    7
    Optional. You can use it to provide upstream resolvers.
    8
    You can specify two types of upstreams: SystemResolvConf and Network. SystemResolvConf configures the upstream to use /etc/resolv.conf, and Network defines a network resolver. You can specify one or both.
    9
    If the specified type is Network, you must provide an IP address. The address field must be a valid IPv4 or IPv6 address.
    10
    If the specified type is Network, you can optionally provide a port. The port field must have a value between 1 and 65535. If you do not specify a port for the upstream, port 53 is used by default.
  2. Optional: When working in a highly regulated environment, you might need the ability to secure DNS traffic when forwarding requests to upstream resolvers so that you can ensure additional DNS traffic and data privacy. Cluster administrators can configure transport layer security (TLS) for forwarded DNS queries.

    Configuring DNS forwarding with TLS

    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      servers:
      - name: example-server 1
        zones: 2
        - example.com
        forwardPlugin:
          transportConfig:
            transport: TLS 3
            tls:
              caBundle:
                name: mycacert
              serverName: dnstls.example.com  4
          policy: Random 5
          upstreams: 6
          - 1.1.1.1
          - 2.2.2.2:5353
      upstreamResolvers: 7
        transportConfig:
          transport: TLS
          tls:
            caBundle:
              name: mycacert
            serverName: dnstls.example.com
        upstreams:
        - type: Network 8
          address: 1.2.3.4 9
          port: 53 10

    1
    Must comply with the rfc6335 service name syntax.
    2
    Must conform to the definition of a subdomain in the rfc1123 service name syntax. The cluster domain, cluster.local, is an invalid subdomain for the zones field.
    3
    When configuring TLS for forwarded DNS queries, set the transport field to have the value TLS. By default, CoreDNS caches forwarded connections for 10 seconds. CoreDNS will hold a TCP connection open for those 10 seconds if no request is issued. With large clusters, ensure that your DNS server is aware that it might get many new connections to hold open because you can initiate a connection per node. Set up your DNS hierarchy accordingly to avoid performance issues.
    4
    When configuring TLS for forwarded DNS queries, this is a mandatory server name used as part of the server name indication (SNI) to validate the upstream TLS server certificate.
    5
    Defines the policy to select upstream resolvers. The default value is Random. You can also use the values RoundRobin and Sequential.
    6
    Required. You can use it to provide upstream resolvers. A maximum of 15 upstreams is allowed per forwardPlugin entry.
    7
    Optional. You can use it to override the default policy and forward DNS resolution to the specified DNS resolvers (upstream resolvers) for the default domain. If you do not provide any upstream resolvers, the DNS name queries go to the servers in /etc/resolv.conf.
    8
    Network type indicates that this upstream resolver should handle forwarded requests separately from the upstream resolvers listed in /etc/resolv.conf. Only the Network type is allowed when using TLS and you must provide an IP address.
    9
    The address field must be a valid IPv4 or IPv6 address.
    10
    You can optionally provide a port. The port must have a value between 1 and 65535. If you do not specify a port for the upstream, by default port 853 is tried.
    Note

    If servers is undefined or invalid, the config map only contains the default server.

Verification

  1. View the config map:

    $ oc get configmap/dns-default -n openshift-dns -o yaml

    Sample DNS ConfigMap based on previous sample DNS

    apiVersion: v1
    data:
      Corefile: |
        example.com:5353 {
            forward . 1.1.1.1 2.2.2.2:5353
        }
        bar.com:5353 example.com:5353 {
            forward . 3.3.3.3 4.4.4.4:5454 1
        }
        .:5353 {
            errors
            health
            kubernetes cluster.local in-addr.arpa ip6.arpa {
                pods insecure
                upstream
                fallthrough in-addr.arpa ip6.arpa
            }
            prometheus :9153
            forward . /etc/resolv.conf 1.2.3.4:53 {
                policy Random
            }
            cache 30
            reload
        }
    kind: ConfigMap
    metadata:
      labels:
        dns.operator.openshift.io/owning-dns: default
      name: dns-default
      namespace: openshift-dns

    1
    Changes to the forwardPlugin trigger a rolling update of the CoreDNS daemon set.


6.6. DNS Operator status

You can inspect the status and view the details of the DNS Operator using the oc describe command.

Procedure

View the status of the DNS Operator:

$ oc describe clusteroperators/dns

6.7. DNS Operator logs

You can view DNS Operator logs by using the oc logs command.

Procedure

View the logs of the DNS Operator:

$ oc logs -n openshift-dns-operator deployment/dns-operator -c dns-operator

6.8. Setting the CoreDNS log level

You can configure the CoreDNS log level to determine the amount of detail in logged error messages. The valid values for CoreDNS log level are Normal, Debug, and Trace. The default logLevel is Normal.

Note

The errors plugin is always enabled. The following logLevel settings report different error responses:

  • logLevel: Normal enables the "errors" class: log . { class error }.
  • logLevel: Debug enables the "denial" class: log . { class denial error }.
  • logLevel: Trace enables the "all" class: log . { class all }.

Procedure

  • To set logLevel to Debug, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Debug"}}' --type=merge
  • To set logLevel to Trace, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"logLevel":"Trace"}}' --type=merge

Verification

  • To ensure the desired log level was set, check the config map:

    $ oc get configmap/dns-default -n openshift-dns -o yaml
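
For example, after setting logLevel to Debug, the Corefile in the config map should contain the denial class for the errors log plugin. A check similar to the following sketch confirms it; the exact formatting of the output can vary:

$ oc get configmap/dns-default -n openshift-dns -o yaml | grep -A2 'log .'

Example output

    log . {
        class denial error
    }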

6.9. Setting the CoreDNS Operator log level

Cluster administrators can configure the Operator log level to more quickly track down OpenShift DNS issues. The valid values for operatorLogLevel are Normal, Debug, and Trace. Trace has the most detailed information. The default operatorLogLevel is Normal. There are seven logging levels for issues: Trace, Debug, Info, Warning, Error, Fatal, and Panic. After the logging level is set, log entries with that severity or anything above it are logged.

  • operatorLogLevel: "Normal" sets logrus.SetLogLevel("Info").
  • operatorLogLevel: "Debug" sets logrus.SetLogLevel("Debug").
  • operatorLogLevel: "Trace" sets logrus.SetLogLevel("Trace").

Procedure

  • To set operatorLogLevel to Debug, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Debug"}}' --type=merge
  • To set operatorLogLevel to Trace, enter the following command:

    $ oc patch dnses.operator.openshift.io/default -p '{"spec":{"operatorLogLevel":"Trace"}}' --type=merge

6.10. Tuning the CoreDNS cache

You can configure the maximum duration of both successful and unsuccessful caching, also known as positive and negative caching respectively, that CoreDNS performs. Tuning the duration of caching of DNS query responses can reduce the load on any upstream DNS resolvers.

Procedure

  1. Edit the DNS Operator object named default by running the following command:

    $ oc edit dns.operator.openshift.io/default
  2. Modify the time-to-live (TTL) caching values:

    Configuring DNS caching

    apiVersion: operator.openshift.io/v1
    kind: DNS
    metadata:
      name: default
    spec:
      cache:
        positiveTTL: 1h 1
        negativeTTL: 0.5h10m 2

    1
    The string value 1h is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be 0s and the cluster uses the internal default value of 900s as a fallback.
    2
    The string value can be a combination of units such as 0.5h10m and is converted to its respective number of seconds by CoreDNS. If this field is omitted, the value is assumed to be 0s and the cluster uses the internal default value of 30s as a fallback.
    Warning

    Setting TTL fields to low values could lead to an increased load on the cluster, any upstream resolvers, or both.

Chapter 7. Ingress Operator in OpenShift Container Platform

7.1. OpenShift Container Platform Ingress Operator

When you create your OpenShift Container Platform cluster, pods and services running on the cluster are each allocated their own IP addresses. The IP addresses are accessible to other pods and services running nearby but are not accessible to outside clients. The Ingress Operator implements the IngressController API and is the component responsible for enabling external access to OpenShift Container Platform cluster services.

The Ingress Operator makes it possible for external clients to access your service by deploying and managing one or more HAProxy-based Ingress Controllers to handle routing. You can use the Ingress Operator to route traffic by specifying OpenShift Container Platform Route and Kubernetes Ingress resources. Configurations within the Ingress Controller, such as the ability to define endpointPublishingStrategy type and internal load balancing, provide ways to publish Ingress Controller endpoints.

7.2. The Ingress configuration asset

The installation program generates an asset with an Ingress resource in the config.openshift.io API group, cluster-ingress-02-config.yml.

YAML Definition of the Ingress resource

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  domain: apps.openshiftdemos.com

The installation program stores this asset in the cluster-ingress-02-config.yml file in the manifests/ directory. This Ingress resource defines the cluster-wide configuration for Ingress. This Ingress configuration is used as follows:

  • The Ingress Operator uses the domain from the cluster Ingress configuration as the domain for the default Ingress Controller.
  • The OpenShift API Server Operator uses the domain from the cluster Ingress configuration. This domain is also used when generating a default host for a Route resource that does not specify an explicit host.
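
For example, with the apps.openshiftdemos.com domain shown in the previous asset, a Route named frontend in the myproject namespace that omits spec.host receives a generated default host of the form <route-name>-<namespace>.<domain>; the route and namespace names here are placeholders:

frontend-myproject.apps.openshiftdemos.com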

7.3. Ingress Controller configuration parameters

The IngressController custom resource (CR) includes optional configuration parameters that you can configure to meet specific needs for your organization.

ParameterDescription

domain

domain is a DNS name serviced by the Ingress Controller and is used to configure multiple features:

  • For the LoadBalancerService endpoint publishing strategy, domain is used to configure DNS records. See endpointPublishingStrategy.
  • When using a generated default certificate, the certificate is valid for domain and its subdomains. See defaultCertificate.
  • The value is published to individual Route statuses so that users know where to target external DNS records.

The domain value must be unique among all Ingress Controllers and cannot be updated.

If empty, the default value is ingress.config.openshift.io/cluster .spec.domain.

replicas

replicas is the number of Ingress Controller replicas. If not set, the default value is 2.

endpointPublishingStrategy

endpointPublishingStrategy is used to publish the Ingress Controller endpoints to other networks, enable load balancer integrations, and provide access to other systems.

For cloud environments, use the loadBalancer field to configure the endpoint publishing strategy for your Ingress Controller.

On GCP, AWS, and Azure you can configure the following endpointPublishingStrategy fields:

  • loadBalancer.scope
  • loadBalancer.allowedSourceRanges

If not set, the default value is based on infrastructure.config.openshift.io/cluster .status.platform:

  • Amazon Web Services (AWS): LoadBalancerService (with External scope)
  • Azure: LoadBalancerService (with External scope)
  • Google Cloud Platform (GCP): LoadBalancerService (with External scope)

For most platforms, the endpointPublishingStrategy value can be updated. On GCP, you can configure the following endpointPublishingStrategy fields:

  • loadBalancer.scope
  • loadbalancer.providerParameters.gcp.clientAccess

For non-cloud environments, such as a bare-metal platform, use the NodePortService, HostNetwork, or Private fields to configure the endpoint publishing strategy for your Ingress Controller.

If you do not set a value in one of these fields, the default value is based on binding ports specified in the .status.platform value in the IngressController CR.

If you need to update the endpointPublishingStrategy value after your cluster is deployed, you can configure the following endpointPublishingStrategy fields:

  • hostNetwork.protocol
  • nodePort.protocol
  • private.protocol

defaultCertificate

The defaultCertificate value is a reference to a secret that contains the default certificate that is served by the Ingress Controller. When Routes do not specify their own certificate, defaultCertificate is used.

The secret must contain the following keys and data:

  • tls.crt: certificate file contents
  • tls.key: key file contents

If not set, a wildcard certificate is automatically generated and used. The certificate is valid for the Ingress Controller domain and subdomains, and the generated certificate’s CA is automatically integrated with the cluster’s trust store.

The in-use certificate, whether generated or user-specified, is automatically integrated with the OpenShift Container Platform built-in OAuth server.

namespaceSelector

namespaceSelector is used to filter the set of namespaces serviced by the Ingress Controller. This is useful for implementing shards.

routeSelector

routeSelector is used to filter the set of Routes serviced by the Ingress Controller. This is useful for implementing shards.

nodePlacement

nodePlacement enables explicit control over the scheduling of the Ingress Controller.

If not set, the default values are used.

Note

The nodePlacement parameter includes two parts, nodeSelector and tolerations. For example:

nodePlacement:
 nodeSelector:
   matchLabels:
     kubernetes.io/os: linux
 tolerations:
 - effect: NoSchedule
   operator: Exists

tlsSecurityProfile

tlsSecurityProfile specifies settings for TLS connections for Ingress Controllers.

If not set, the default value is based on the apiservers.config.openshift.io/cluster resource.

When using the Old, Intermediate, and Modern profile types, the effective profile configuration is subject to change between releases. For example, given a specification to use the Intermediate profile deployed on release X.Y.Z, an upgrade to release X.Y.Z+1 may cause a new profile configuration to be applied to the Ingress Controller, resulting in a rollout.

The minimum TLS version for Ingress Controllers is 1.1, and the maximum TLS version is 1.3.

Note

Ciphers and the minimum TLS version of the configured security profile are reflected in the TLSProfile status.

Important

The Ingress Operator converts the TLS 1.0 of an Old or Custom profile to 1.1.

clientTLS

clientTLS authenticates client access to the cluster and services; as a result, mutual TLS authentication is enabled. If not set, then client TLS is not enabled.

clientTLS has the required subfields, spec.clientTLS.clientCertificatePolicy and spec.clientTLS.clientCA.

The clientCertificatePolicy subfield accepts one of two values: Required or Optional. The clientCA subfield specifies a config map that is in the openshift-config namespace. The config map should contain a CA certificate bundle.

The allowedSubjectPatterns subfield is an optional value that specifies a list of regular expressions, which are matched against the distinguished name on a valid client certificate to filter requests. The regular expressions must use PCRE syntax. At least one pattern must match a client certificate’s distinguished name; otherwise, the Ingress Controller rejects the certificate and denies the connection. If not specified, the Ingress Controller does not reject certificates based on the distinguished name.

routeAdmission

routeAdmission defines a policy for handling new route claims, such as allowing or denying claims across namespaces.

namespaceOwnership describes how hostname claims across namespaces should be handled. The default is Strict.

  • Strict: does not allow routes to claim the same hostname across namespaces.
  • InterNamespaceAllowed: allows routes to claim different paths of the same hostname across namespaces.

wildcardPolicy describes how routes with wildcard policies are handled by the Ingress Controller.

  • WildcardsAllowed: Indicates routes with any wildcard policy are admitted by the Ingress Controller.
  • WildcardsDisallowed: Indicates only routes with a wildcard policy of None are admitted by the Ingress Controller. Updating wildcardPolicy from WildcardsAllowed to WildcardsDisallowed causes admitted routes with a wildcard policy of Subdomain to stop working. These routes must be recreated to a wildcard policy of None to be readmitted by the Ingress Controller. WildcardsDisallowed is the default setting.

IngressControllerLogging

logging defines parameters for what is logged where. If this field is empty, operational logs are enabled but access logs are disabled.

  • access describes how client requests are logged. If this field is empty, access logging is disabled.

    • destination describes a destination for log messages.

      • type is the type of destination for logs:

        • Container specifies that logs should go to a sidecar container. The Ingress Operator configures the container, named logs, on the Ingress Controller pod and configures the Ingress Controller to write logs to the container. The expectation is that the administrator configures a custom logging solution that reads logs from this container. Using container logs means that logs may be dropped if the rate of logs exceeds the container runtime capacity or the custom logging solution capacity.
        • Syslog specifies that logs are sent to a Syslog endpoint. The administrator must specify an endpoint that can receive Syslog messages. The expectation is that the administrator has configured a custom Syslog instance.
      • container describes parameters for the Container logging destination type. Currently there are no parameters for container logging, so this field must be empty.
      • syslog describes parameters for the Syslog logging destination type:

        • address is the IP address of the syslog endpoint that receives log messages.
        • port is the UDP port number of the syslog endpoint that receives log messages.
        • maxLength is the maximum length of the syslog message. It must be between 480 and 4096 bytes. If this field is empty, the maximum length is set to the default value of 1024 bytes.
        • facility specifies the syslog facility of log messages. If this field is empty, the facility is local1. Otherwise, it must specify a valid syslog facility: kern, user, mail, daemon, auth, syslog, lpr, news, uucp, cron, auth2, ftp, ntp, audit, alert, cron2, local0, local1, local2, local3, local4, local5, local6, or local7.
    • httpLogFormat specifies the format of the log message for an HTTP request. If this field is empty, log messages use the implementation’s default HTTP log format. For HAProxy’s default HTTP log format, see the HAProxy documentation.
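For example, a minimal access logging fragment that sends logs to a Syslog endpoint and sets the optional maxLength, facility, and httpLogFormat parameters might look like the following; the address, port, and format values are illustrative:

  logging:
    access:
      destination:
        type: Syslog
        syslog:
          address: 1.2.3.4
          port: 10514
          maxLength: 4096
          facility: local0
      httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'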

httpHeaders

httpHeaders defines the policy for HTTP headers.

By setting the forwardedHeaderPolicy for the IngressControllerHTTPHeaders, you specify when and how the Ingress Controller sets the Forwarded, X-Forwarded-For, X-Forwarded-Host, X-Forwarded-Port, X-Forwarded-Proto, and X-Forwarded-Proto-Version HTTP headers.

By default, the policy is set to Append.

  • Append specifies that the Ingress Controller appends the headers, preserving any existing headers.
  • Replace specifies that the Ingress Controller sets the headers, removing any existing headers.
  • IfNone specifies that the Ingress Controller sets the headers if they are not already set.
  • Never specifies that the Ingress Controller never sets the headers, preserving any existing headers.

By setting headerNameCaseAdjustments, you can specify case adjustments that can be applied to HTTP header names. Each adjustment is specified as an HTTP header name with the desired capitalization. For example, specifying X-Forwarded-For indicates that the x-forwarded-for HTTP header should be adjusted to have the specified capitalization.

These adjustments are only applied to cleartext, edge-terminated, and re-encrypt routes, and only when using HTTP/1.

For request headers, these adjustments are applied only for routes that have the haproxy.router.openshift.io/h1-adjust-case=true annotation. For response headers, these adjustments are applied to all HTTP responses. If this field is empty, no request headers are adjusted.
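For example, a minimal httpHeaders fragment that sets the forwarding policy and adjusts the case of the host header might look like the following; the values are illustrative:

  httpHeaders:
    forwardedHeaderPolicy: IfNone
    headerNameCaseAdjustments:
    - Host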

httpCompression

httpCompression defines the policy for HTTP traffic compression.

  • mimeTypes defines a list of MIME types to which compression should be applied, using the format pattern type/subtype[;attribute=value]. For example: text/css; charset=utf-8, text/html, text/*, image/svg+xml, application/octet-stream, and X-custom/customsub. The types are: application, image, message, multipart, text, video, or a custom type prefaced by X-. To see the full notation for MIME types and subtypes, see RFC 1341.

httpErrorCodePages

httpErrorCodePages specifies custom HTTP error code response pages. By default, an IngressController uses error pages built into the IngressController image.
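As a minimal sketch, assuming the custom error pages are stored in a config map that httpErrorCodePages references by name (the config map name shown here is hypothetical), the field might be set as follows:

  httpErrorCodePages:
    name: my-custom-error-code-pages  # hypothetical config map name containing the custom error pages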

httpCaptureCookies

httpCaptureCookies specifies HTTP cookies that you want to capture in access logs. If the httpCaptureCookies field is empty, the access logs do not capture the cookies.

For any cookie that you want to capture, the following parameters must be in your IngressController configuration:

  • name specifies the name of the cookie.
  • maxLength specifies the maximum length of the cookie.
  • matchType specifies if the field name of the cookie exactly matches the capture cookie setting or is a prefix of the capture cookie setting. The matchType field uses the Exact and Prefix parameters.

For example:

  httpCaptureCookies:
  - matchType: Exact
    maxLength: 128
    name: MYCOOKIE

httpCaptureHeaders

httpCaptureHeaders specifies the HTTP headers that you want to capture in the access logs. If the httpCaptureHeaders field is empty, the access logs do not capture the headers.

httpCaptureHeaders contains two lists of headers to capture in the access logs. The two lists of header fields are request and response. In both lists, the name field must specify the header name and the maxLength field must specify the maximum length of the header. For example:

  httpCaptureHeaders:
    request:
    - maxLength: 256
      name: Connection
    - maxLength: 128
      name: User-Agent
    response:
    - maxLength: 256
      name: Content-Type
    - maxLength: 256
      name: Content-Length

tuningOptions

tuningOptions specifies options for tuning the performance of Ingress Controller pods.

  • clientFinTimeout specifies how long a connection is held open while waiting for the client response to the server closing the connection. The default timeout is 1s.
  • clientTimeout specifies how long a connection is held open while waiting for a client response. The default timeout is 30s.
  • headerBufferBytes specifies how much memory is reserved, in bytes, for Ingress Controller connection sessions. This value must be at least 16384 if HTTP/2 is enabled for the Ingress Controller. If not set, the default value is 32768 bytes. Setting this field is not recommended because headerBufferBytes values that are too small can break the Ingress Controller, and headerBufferBytes values that are too large could cause the Ingress Controller to use significantly more memory than necessary.
  • headerBufferMaxRewriteBytes specifies how much memory should be reserved, in bytes, from headerBufferBytes for HTTP header rewriting and appending for Ingress Controller connection sessions. The minimum value for headerBufferMaxRewriteBytes is 4096. headerBufferBytes must be greater than headerBufferMaxRewriteBytes for incoming HTTP requests. If not set, the default value is 8192 bytes. Setting this field is not recommended because headerBufferMaxRewriteBytes values that are too small can break the Ingress Controller, and headerBufferMaxRewriteBytes values that are too large could cause the Ingress Controller to use significantly more memory than necessary.
  • healthCheckInterval specifies how long the router waits between health checks. The default is 5s.
  • serverFinTimeout specifies how long a connection is held open while waiting for the server response to the client that is closing the connection. The default timeout is 1s.
  • serverTimeout specifies how long a connection is held open while waiting for a server response. The default timeout is 30s.
  • threadCount specifies the number of threads to create per HAProxy process. Creating more threads allows each Ingress Controller pod to handle more connections, at the cost of more system resources being used. HAProxy supports up to 64 threads. If this field is empty, the Ingress Controller uses the default value of 4 threads. The default value can change in future releases. Setting this field is not recommended because increasing the number of HAProxy threads allows Ingress Controller pods to use more CPU time under load, which can prevent other pods from receiving the CPU resources they need. Reducing the number of threads can cause the Ingress Controller to perform poorly.
  • tlsInspectDelay specifies how long the router can hold data to find a matching route. Setting this value too short can cause the router to fall back to the default certificate for edge-terminated, re-encrypt, or passthrough routes, even when a better-matched certificate is available. The default inspect delay is 5s.
  • tunnelTimeout specifies how long a tunnel connection, including websockets, remains open while the tunnel is idle. The default timeout is 1h.
  • maxConnections specifies the maximum number of simultaneous connections that can be established per HAProxy process. Increasing this value allows each Ingress Controller pod to handle more connections at the cost of additional system resources. Permitted values are 0, -1, any value in the range 2000 to 2000000, or the field can be left empty.

    • If this field is left empty or has the value 0, the Ingress Controller will use the default value of 50000. This value is subject to change in future releases.
    • If the field has the value of -1, then HAProxy will dynamically compute a maximum value based on the available ulimits in the running container. This process results in a large computed value that will incur significant memory usage compared to the current default value of 50000.
    • If the field has a value that is greater than the current operating system limit, the HAProxy process will not start.
    • If you choose a discrete value and the router pod is migrated to a new node, it is possible the new node does not have an identical ulimit configured. In such cases, the pod fails to start.
    • If you have nodes with different ulimits configured, and you choose a discrete value, it is recommended to use the value of -1 for this field so that the maximum number of connections is calculated at runtime.
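For example, a minimal tuningOptions fragment that sets a few of the fields described above might look like the following; the values are illustrative only and, as noted above, changing several of these fields is not recommended in most environments:

  tuningOptions:
    clientTimeout: 45s
    serverTimeout: 45s
    threadCount: 8
    maxConnections: 100000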

logEmptyRequests

logEmptyRequests specifies whether connections for which no request is received are logged. These empty requests come from load balancer health probes or web browser speculative connections (preconnect), and logging these requests can be undesirable. However, these requests can also be caused by network errors, in which case logging empty requests can be useful for diagnosing the errors. These requests can also be caused by port scans, and logging empty requests can aid in detecting intrusion attempts. Allowed values for this field are Log and Ignore. The default value is Log.

The LoggingPolicy type accepts one of two values:

  • Log: Setting this value to Log indicates that an event should be logged.
  • Ignore: Setting this value to Ignore sets the dontlognull option in the HAProxy configuration.
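For example, a minimal sketch, assuming logEmptyRequests sits under logging.access in the IngressController spec, that stops the router from logging empty connections:

  logging:
    access:
      destination:
        type: Container
      logEmptyRequests: Ignore  # placement under logging.access is an assumption based on the description above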

HTTPEmptyRequestsPolicy

HTTPEmptyRequestsPolicy describes how HTTP connections are handled if the connection times out before a request is received. Allowed values for this field are Respond and Ignore. The default value is Respond.

The HTTPEmptyRequestsPolicy type accepts one of two values:

  • Respond: If the field is set to Respond, the Ingress Controller sends an HTTP 400 or 408 response, logs the connection if access logging is enabled, and counts the connection in the appropriate metrics.
  • Ignore: Setting this option to Ignore adds the http-ignore-probes parameter to the HAProxy configuration. If the field is set to Ignore, the Ingress Controller closes the connection without sending a response, logging the connection, or incrementing metrics.

These connections come from load balancer health probes or web browser speculative connections (preconnect) and can be safely ignored. However, these requests can also be caused by network errors, so setting this field to Ignore can impede detection and diagnosis of problems. These requests can also be caused by port scans, in which case logging empty requests can aid in detecting intrusion attempts.
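For example, a minimal sketch, assuming the policy is set with the httpEmptyRequestsPolicy field of the IngressController spec, that tells the Ingress Controller to ignore empty requests:

  httpEmptyRequestsPolicy: Ignore  # field placement in the spec is an assumption based on the description above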

7.3.1. Ingress Controller TLS security profiles

TLS security profiles provide a way for servers to regulate which ciphers a connecting client can use when connecting to the server.

7.3.1.1. Understanding TLS security profiles

You can use a TLS (Transport Layer Security) security profile to define which TLS ciphers are required by various OpenShift Container Platform components. The OpenShift Container Platform TLS security profiles are based on Mozilla recommended configurations.

You can specify one of the following TLS security profiles for each component:

Table 7.1. TLS security profiles
Profile       Description

Old

This profile is intended for use with legacy clients or libraries. The profile is based on the Old backward compatibility recommended configuration.

The Old profile requires a minimum TLS version of 1.0.

Note

For the Ingress Controller, the minimum TLS version is converted from 1.0 to 1.1.

Intermediate

This profile is the recommended configuration for the majority of clients. It is the default TLS security profile for the Ingress Controller, kubelet, and control plane. The profile is based on the Intermediate compatibility recommended configuration.

The Intermediate profile requires a minimum TLS version of 1.2.

Modern

This profile is intended for use with modern clients that have no need for backwards compatibility. This profile is based on the Modern compatibility recommended configuration.

The Modern profile requires a minimum TLS version of 1.3.

Custom

This profile allows you to define the TLS version and ciphers to use.

Warning

Use caution when using a Custom profile, because invalid configurations can cause problems.

Note

When using one of the predefined profile types, the effective profile configuration is subject to change between releases. For example, given a specification to use the Intermediate profile deployed on release X.Y.Z, an upgrade to release X.Y.Z+1 might cause a new profile configuration to be applied, resulting in a rollout.

7.3.1.2. Configuring the TLS security profile for the Ingress Controller

To configure a TLS security profile for an Ingress Controller, edit the IngressController custom resource (CR) to specify a predefined or custom TLS security profile. If a TLS security profile is not configured, the default value is based on the TLS security profile set for the API server.

Sample IngressController CR that configures the Old TLS security profile

apiVersion: operator.openshift.io/v1
kind: IngressController
 ...
spec:
  tlsSecurityProfile:
    old: {}
    type: Old
 ...

The TLS security profile defines the minimum TLS version and the TLS ciphers for TLS connections for Ingress Controllers.

You can see the ciphers and the minimum TLS version of the configured TLS security profile in the IngressController custom resource (CR) under Status.Tls Profile and the configured TLS security profile under Spec.Tls Security Profile. For the Custom TLS security profile, the specific ciphers and minimum TLS version are listed under both parameters.

Note

The HAProxy Ingress Controller image supports TLS 1.3 and the Modern profile.

The Ingress Operator also converts the TLS 1.0 of an Old or Custom profile to 1.1.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Edit the IngressController CR in the openshift-ingress-operator project to configure the TLS security profile:

    $ oc edit IngressController default -n openshift-ingress-operator
  2. Add the spec.tlsSecurityProfile field:

    Sample IngressController CR for a Custom profile

    apiVersion: operator.openshift.io/v1
    kind: IngressController
     ...
    spec:
      tlsSecurityProfile:
        type: Custom 1
        custom: 2
          ciphers: 3
          - ECDHE-ECDSA-CHACHA20-POLY1305
          - ECDHE-RSA-CHACHA20-POLY1305
          - ECDHE-RSA-AES128-GCM-SHA256
          - ECDHE-ECDSA-AES128-GCM-SHA256
          minTLSVersion: VersionTLS11
     ...

    1
    Specify the TLS security profile type (Old, Intermediate, or Custom). The default is Intermediate.
    2
    Specify the appropriate field for the selected type:
    • old: {}
    • intermediate: {}
    • custom:
    3
    For the custom type, specify a list of TLS ciphers and minimum accepted TLS version.
  3. Save the file to apply the changes.

Verification

  • Verify that the profile is set in the IngressController CR:

    $ oc describe IngressController default -n openshift-ingress-operator

    Example output

    Name:         default
    Namespace:    openshift-ingress-operator
    Labels:       <none>
    Annotations:  <none>
    API Version:  operator.openshift.io/v1
    Kind:         IngressController
     ...
    Spec:
     ...
      Tls Security Profile:
        Custom:
          Ciphers:
            ECDHE-ECDSA-CHACHA20-POLY1305
            ECDHE-RSA-CHACHA20-POLY1305
            ECDHE-RSA-AES128-GCM-SHA256
            ECDHE-ECDSA-AES128-GCM-SHA256
          Min TLS Version:  VersionTLS11
        Type:               Custom
     ...

7.3.1.3. Configuring mutual TLS authentication

You can configure the Ingress Controller to enable mutual TLS (mTLS) authentication by setting a spec.clientTLS value. The clientTLS value configures the Ingress Controller to verify client certificates. This configuration includes setting a clientCA value, which is a reference to a config map. The config map contains the PEM-encoded CA certificate bundle that is used to verify a client’s certificate. Optionally, you can also configure a list of certificate subject filters.

If the clientCA value specifies an X509v3 certificate revocation list (CRL) distribution point, the Ingress Operator downloads and manages a CRL config map based on the HTTP URI X509v3 CRL Distribution Point specified in each provided certificate. The Ingress Controller uses this config map during mTLS/TLS negotiation. Requests that do not provide valid certificates are rejected.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have a PEM-encoded CA certificate bundle.
  • If your CA bundle references a CRL distribution point, you must also include the end-entity or leaf certificate in the client CA bundle. This certificate must include an HTTP URI under CRL Distribution Points, as described in RFC 5280. For example:

     Issuer: C=US, O=Example Inc, CN=Example Global G2 TLS RSA SHA256 2020 CA1
             Subject: SOME SIGNED CERT
             X509v3 CRL Distribution Points:
                    Full Name:
                      URI:http://crl.example.com/example.crl

Procedure

  1. In the openshift-config namespace, create a config map from your CA bundle:

    $ oc create configmap \
       router-ca-certs-default \
       --from-file=ca-bundle.pem=client-ca.crt \1
       -n openshift-config
    1
    The config map data key must be ca-bundle.pem, and the data value must be a CA certificate in PEM format.
  2. Edit the IngressController resource in the openshift-ingress-operator project:

    $ oc edit IngressController default -n openshift-ingress-operator
  3. Add the spec.clientTLS field and subfields to configure mutual TLS:

    Sample IngressController CR for a clientTLS profile that specifies filtering patterns

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        clientTLS:
          clientCertificatePolicy: Required
          clientCA:
            name: router-ca-certs-default
          allowedSubjectPatterns:
          - "^/CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift$"

  4. Optional: Get the Distinguished Name (DN) for allowedSubjectPatterns by entering the following command:

    $ openssl x509 -in custom-cert.pem -noout -subject

    Example output

    subject= /CN=example.com/ST=NC/C=US/O=Security/OU=OpenShift

7.4. View the default Ingress Controller

The Ingress Operator is a core feature of OpenShift Container Platform and is enabled out of the box.

Every new OpenShift Container Platform installation has an ingresscontroller named default. It can be supplemented with additional Ingress Controllers. If the default ingresscontroller is deleted, the Ingress Operator will automatically recreate it within a minute.

Procedure

  • View the default Ingress Controller:

    $ oc describe --namespace=openshift-ingress-operator ingresscontroller/default

7.5. View Ingress Operator status

You can view and inspect the status of your Ingress Operator.

Procedure

  • View your Ingress Operator status:

    $ oc describe clusteroperators/ingress

7.6. View Ingress Controller logs

You can view your Ingress Controller logs.

Procedure

  • View your Ingress Controller logs:

    $ oc logs --namespace=openshift-ingress-operator deployments/ingress-operator -c <container_name>

7.7. View Ingress Controller status

You can view the status of a particular Ingress Controller.

Procedure

  • View the status of an Ingress Controller:

    $ oc describe --namespace=openshift-ingress-operator ingresscontroller/<name>

7.8. Configuring the Ingress Controller

7.8.1. Setting a custom default certificate

As an administrator, you can configure an Ingress Controller to use a custom certificate by creating a Secret resource and editing the IngressController custom resource (CR).

Prerequisites

  • You must have a certificate/key pair in PEM-encoded files, where the certificate is signed by a trusted certificate authority or by a private trusted certificate authority that you configured in a custom PKI.
  • Your certificate meets the following requirements:

    • The certificate is valid for the ingress domain.
    • The certificate uses the subjectAltName extension to specify a wildcard domain, such as *.apps.ocp4.example.com.
  • You must have an IngressController CR. You may use the default one:

    $ oc --namespace openshift-ingress-operator get ingresscontrollers

    Example output

    NAME      AGE
    default   10m

Note

If you have intermediate certificates, they must be included in the tls.crt file of the secret containing a custom default certificate. Order matters when specifying a certificate; list your intermediate certificate(s) after any server certificate(s).
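
For example, a minimal sketch of building such a combined file, using hypothetical file names, concatenates the server certificate first and the intermediate CA certificate after it:

  $ cat server.crt intermediate-ca.crt > tls.crt

The resulting tls.crt file is the one referenced in the procedure that follows.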

Procedure

The following assumes that the custom certificate and key pair are in the tls.crt and tls.key files in the current working directory. Substitute the actual path names for tls.crt and tls.key. You also may substitute another name for custom-certs-default when creating the Secret resource and referencing it in the IngressController CR.

Note

This action will cause the Ingress Controller to be redeployed, using a rolling deployment strategy.

  1. Create a Secret resource containing the custom certificate in the openshift-ingress namespace using the tls.crt and tls.key files.

    $ oc --namespace openshift-ingress create secret tls custom-certs-default --cert=tls.crt --key=tls.key
  2. Update the IngressController CR to reference the new certificate secret:

    $ oc patch --type=merge --namespace openshift-ingress-operator ingresscontrollers/default \
      --patch '{"spec":{"defaultCertificate":{"name":"custom-certs-default"}}}'
  3. Verify the update was effective:

    $ echo Q |\
      openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null |\
      openssl x509 -noout -subject -issuer -enddate

    where:

    <domain>
    Specifies the base domain name for your cluster.

    Example output

    subject=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = *.apps.example.com
    issuer=C = US, ST = NC, L = Raleigh, O = RH, OU = OCP4, CN = example.com
    notAfter=May 10 08:32:45 2022 GMT

    Tip

    You can alternatively apply the following YAML to set a custom default certificate:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      defaultCertificate:
        name: custom-certs-default

    The certificate secret name should match the value used to update the CR.

Once the IngressController CR has been modified, the Ingress Operator updates the Ingress Controller’s deployment to use the custom certificate.

7.8.2. Removing a custom default certificate

As an administrator, you can remove a custom certificate that you configured an Ingress Controller to use.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).
  • You previously configured a custom default certificate for the Ingress Controller.

Procedure

  • To remove the custom certificate and restore the certificate that ships with OpenShift Container Platform, enter the following command:

    $ oc patch -n openshift-ingress-operator ingresscontrollers/default \
      --type json -p $'- op: remove\n  path: /spec/defaultCertificate'

    There can be a delay while the cluster reconciles the new certificate configuration.

Verification

  • To confirm that the original cluster certificate is restored, enter the following command:

    $ echo Q | \
      openssl s_client -connect console-openshift-console.apps.<domain>:443 -showcerts 2>/dev/null | \
      openssl x509 -noout -subject -issuer -enddate

    where:

    <domain>
    Specifies the base domain name for your cluster.

    Example output

    subject=CN = *.apps.<domain>
    issuer=CN = ingress-operator@1620633373
    notAfter=May 10 10:44:36 2023 GMT

7.8.3. Autoscaling an Ingress Controller

You can automatically scale an Ingress Controller to dynamically meet routing performance or availability requirements, such as the requirement to increase throughput.

The following procedure provides an example for scaling up the default Ingress Controller.

Prerequisites

  • You have the OpenShift CLI (oc) installed.
  • You have access to an OpenShift Container Platform cluster as a user with the cluster-admin role.
  • You installed the Custom Metrics Autoscaler Operator and an associated KEDA Controller.

    • You can install the Operator by using OperatorHub on the web console. After you install the Operator, you can create an instance of KedaController.

Procedure

  1. Create a service account to authenticate with Thanos by running the following command:

    $ oc create -n openshift-ingress-operator serviceaccount thanos && oc describe -n openshift-ingress-operator serviceaccount thanos

    Example output

    Name:                thanos
    Namespace:           openshift-ingress-operator
    Labels:              <none>
    Annotations:         <none>
    Image pull secrets:  thanos-dockercfg-kfvf2
    Mountable secrets:   thanos-dockercfg-kfvf2
    Tokens:              thanos-token-c422q
    Events:              <none>

  2. Manually create the service account secret token with the following command:

    $ oc apply -f - <<EOF
    apiVersion: v1
    kind: Secret
    metadata:
      name: thanos-token
      namespace: openshift-ingress-operator
      annotations:
        kubernetes.io/service-account.name: thanos
    type: kubernetes.io/service-account-token
    EOF
  3. Define a TriggerAuthentication object within the openshift-ingress-operator namespace by using the service account’s token.

    1. Define the secret variable that contains the secret by running the following command:

      $ secret=$(oc get secret -n openshift-ingress-operator | grep thanos-token | head -n 1 | awk '{ print $1 }')
    2. Create the TriggerAuthentication object and pass the value of the secret variable to the TOKEN parameter:

      $ oc process TOKEN="$secret" -f - <<EOF | oc apply -n openshift-ingress-operator -f -
      apiVersion: template.openshift.io/v1
      kind: Template
      parameters:
      - name: TOKEN
      objects:
      - apiVersion: keda.sh/v1alpha1
        kind: TriggerAuthentication
        metadata:
          name: keda-trigger-auth-prometheus
        spec:
          secretTargetRef:
          - parameter: bearerToken
            name: \${TOKEN}
            key: token
          - parameter: ca
            name: \${TOKEN}
            key: ca.crt
      EOF
  4. Create and apply a role for reading metrics from Thanos:

    1. Create a new role, thanos-metrics-reader.yaml, that reads metrics from pods and nodes:

      thanos-metrics-reader.yaml

      apiVersion: rbac.authorization.k8s.io/v1
      kind: Role
      metadata:
        name: thanos-metrics-reader
        namespace: openshift-ingress-operator
      rules:
      - apiGroups:
        - ""
        resources:
        - pods
        - nodes
        verbs:
        - get
      - apiGroups:
        - metrics.k8s.io
        resources:
        - pods
        - nodes
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        - ""
        resources:
        - namespaces
        verbs:
        - get

    2. Apply the new role by running the following command:

      $ oc apply -f thanos-metrics-reader.yaml
  5. Add the new role to the service account by entering the following commands:

    $ oc adm policy -n openshift-ingress-operator add-role-to-user thanos-metrics-reader -z thanos --role-namespace=openshift-ingress-operator
    $ oc adm policy -n openshift-ingress-operator add-cluster-role-to-user cluster-monitoring-view -z thanos
    Note

    The argument add-cluster-role-to-user is only required if you use cross-namespace queries. The following step uses a query from the kube-metrics namespace which requires this argument.

  6. Create a new ScaledObject YAML file, ingress-autoscaler.yaml, that targets the default Ingress Controller deployment:

    Example ScaledObject definition

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: ingress-scaler
      namespace: openshift-ingress-operator
    spec:
      scaleTargetRef: 1
        apiVersion: operator.openshift.io/v1
        kind: IngressController
        name: default
        envSourceContainerName: ingress-operator
      minReplicaCount: 1
      maxReplicaCount: 20 2
      cooldownPeriod: 1
      pollingInterval: 1
      triggers:
      - type: prometheus
        metricType: AverageValue
        metadata:
          serverAddress: https://thanos-querier.openshift-monitoring.svc.cluster.local:9091 3
          namespace: openshift-ingress-operator 4
          metricName: 'kube-node-role'
          threshold: '1'
          query: 'sum(kube_node_role{role="worker",service="kube-state-metrics"})' 5
          authModes: "bearer"
        authenticationRef:
          name: keda-trigger-auth-prometheus

    1
    The custom resource that you are targeting. In this case, the Ingress Controller.
    2
    Optional: The maximum number of replicas. If you omit this field, the default maximum is set to 100 replicas.
    3
    The Thanos service endpoint in the openshift-monitoring namespace.
    4
    The Ingress Operator namespace.
    5
    This expression evaluates to however many worker nodes are present in the deployed cluster.
    Important

    If you are using cross-namespace queries, you must target port 9091 and not port 9092 in the serverAddress field. You also must have elevated privileges to read metrics from this port.

  7. Apply the custom resource definition by running the following command:

    $ oc apply -f ingress-autoscaler.yaml

Verification

  • Verify that the default Ingress Controller is scaled out to match the value returned by the kube-state-metrics query by running the following commands:

    • Use the grep command to search the Ingress Controller YAML file for replicas:

      $ oc get -n openshift-ingress-operator ingresscontroller/default -o yaml | grep replicas:

      Example output

        replicas: 3

    • Get the pods in the openshift-ingress project:

      $ oc get pods -n openshift-ingress

      Example output

      NAME                             READY   STATUS    RESTARTS   AGE
      router-default-7b5df44ff-l9pmm   2/2     Running   0          17h
      router-default-7b5df44ff-s5sl5   2/2     Running   0          3d22h
      router-default-7b5df44ff-wwsth   2/2     Running   0          66s

7.8.4. Scaling an Ingress Controller

Manually scale an Ingress Controller to meet routing performance or availability requirements, such as the requirement to increase throughput. oc commands are used to scale the IngressController resource. The following procedure provides an example for scaling up the default IngressController.

Note

Scaling is not an immediate action, as it takes time to create the desired number of replicas.

Procedure

  1. View the current number of available replicas for the default IngressController:

    $ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'

    Example output

    2

  2. Scale the default IngressController to the desired number of replicas using the oc patch command. The following example scales the default IngressController to 3 replicas:

    $ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"replicas": 3}}' --type=merge

    Example output

    ingresscontroller.operator.openshift.io/default patched

  3. Verify that the default IngressController scaled to the number of replicas that you specified:

    $ oc get -n openshift-ingress-operator ingresscontrollers/default -o jsonpath='{$.status.availableReplicas}'

    Example output

    3

    Tip

    You can alternatively apply the following YAML to scale an Ingress Controller to three replicas:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 3               1
    1
    If you need a different amount of replicas, change the replicas value.

7.8.5. Configuring Ingress access logging

You can configure the Ingress Controller to enable access logs. If you have clusters that do not receive much traffic, then you can log to a sidecar. If you have high traffic clusters, to avoid exceeding the capacity of the logging stack or to integrate with a logging infrastructure outside of OpenShift Container Platform, you can forward logs to a custom syslog endpoint. You can also specify the format for access logs.

Container logging is useful to enable access logs on low-traffic clusters when there is no existing Syslog logging infrastructure, or for short-term use while diagnosing problems with the Ingress Controller.

Syslog is needed for high-traffic clusters where access logs could exceed the OpenShift Logging stack’s capacity, or for environments where any logging solution needs to integrate with an existing Syslog logging infrastructure. The Syslog use-cases can overlap.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

Configure Ingress access logging to a sidecar.

  • To configure Ingress access logging, you must specify a destination using spec.logging.access.destination. To specify logging to a sidecar container, you must specify Container spec.logging.access.destination.type. The following example is an Ingress Controller definition that logs to a Container destination:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Container
  • When you configure the Ingress Controller to log to a sidecar, the operator creates a container named logs inside the Ingress Controller Pod:

    $ oc -n openshift-ingress logs deployment.apps/router-default -c logs

    Example output

    2020-05-11T19:11:50.135710+00:00 router-default-57dfc6cd95-bpmk6 router-default-57dfc6cd95-bpmk6 haproxy[108]: 174.19.21.82:39654 [11/May/2020:19:11:50.133] public be_http:hello-openshift:hello-openshift/pod:hello-openshift:hello-openshift:10.128.2.12:8080 0/0/1/0/1 200 142 - - --NI 1/1/0/0/0 0/0 "GET / HTTP/1.1"

Configure Ingress access logging to a Syslog endpoint.

  • To configure Ingress access logging, you must specify a destination using spec.logging.access.destination. To specify logging to a Syslog endpoint destination, you must specify Syslog for spec.logging.access.destination.type. If the destination type is Syslog, you must also specify a destination endpoint using spec.logging.access.destination.syslog.endpoint and you can specify a facility using spec.logging.access.destination.syslog.facility. The following example is an Ingress Controller definition that logs to a Syslog destination:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Syslog
            syslog:
              address: 1.2.3.4
              port: 10514
    Note

    The syslog destination port must be UDP.

Configure Ingress access logging with a specific log format.

  • You can specify spec.logging.access.httpLogFormat to customize the log format. The following example is an Ingress Controller definition that logs to a syslog endpoint with IP address 1.2.3.4 and port 10514:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access:
          destination:
            type: Syslog
            syslog:
              address: 1.2.3.4
              port: 10514
          httpLogFormat: '%ci:%cp [%t] %ft %b/%s %B %bq %HM %HU %HV'

Disable Ingress access logging.

  • To disable Ingress access logging, leave spec.logging or spec.logging.access empty:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      replicas: 2
      logging:
        access: null

7.8.6. Setting Ingress Controller thread count

A cluster administrator can set the thread count to increase the number of incoming connections a cluster can handle. You can patch an existing Ingress Controller to increase the number of threads.

Prerequisites

  • The following assumes that you already created an Ingress Controller.

Procedure

  • Update the Ingress Controller to increase the number of threads:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"threadCount": 8}}}'
    Note

    If you have a node that is capable of running large amounts of resources, you can configure spec.nodePlacement.nodeSelector with labels that match the capacity of the intended node, and configure spec.tuningOptions.threadCount to an appropriately high value.

7.8.7. Configuring an Ingress Controller to use an internal load balancer

When creating an Ingress Controller on cloud platforms, the Ingress Controller is published by a public cloud load balancer by default. As an administrator, you can create an Ingress Controller that uses an internal cloud load balancer.

Warning

If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.

Important

If you want to change the scope for an IngressController, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope parameter after the custom resource (CR) is created.

Figure 7.1. Diagram of LoadBalancer

OpenShift Container Platform Ingress LoadBalancerService endpoint publishing strategy

The preceding graphic shows the following concepts pertaining to OpenShift Container Platform Ingress LoadBalancerService endpoint publishing strategy:

  • You can load balance externally, using the cloud provider load balancer, or internally, using the OpenShift Ingress Controller Load Balancer.
  • You can use the single IP address of the load balancer and more familiar ports, such as 8080 and 4200 as shown on the cluster depicted in the graphic.
  • Traffic from the external load balancer is directed at the pods, and managed by the load balancer, as depicted in the instance of a down node. See the Kubernetes Services documentation for implementation details.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IngressController custom resource (CR) in a file named <name>-ingress-controller.yaml, such as in the following example:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: <name> 1
    spec:
      domain: <domain> 2
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: Internal 3
    1
    Replace <name> with a name for the IngressController object.
    2
    Specify the domain for the application published by the controller.
    3
    Specify a value of Internal to use an internal load balancer.
  2. Create the Ingress Controller defined in the previous step by running the following command:

    $ oc create -f <name>-ingress-controller.yaml 1
    1
    Replace <name> with the name of the IngressController object.
  3. Optional: Confirm that the Ingress Controller was created by running the following command:

    $ oc --all-namespaces=true get ingresscontrollers

7.8.8. Configuring global access for an Ingress Controller on GCP

An Ingress Controller created on GCP with an internal load balancer generates an internal IP address for the service. A cluster administrator can specify the global access option, which enables clients in any region within the same VPC network and compute region as the load balancer to reach the workloads running on your cluster.

For more information, see the GCP documentation for global access.

Prerequisites

  • You deployed an OpenShift Container Platform cluster on GCP infrastructure.
  • You configured an Ingress Controller to use an internal load balancer.
  • You installed the OpenShift CLI (oc).

Procedure

  1. Configure the Ingress Controller resource to allow global access.

    Note

    You can also create an Ingress Controller and specify the global access option.

    1. Configure the Ingress Controller resource:

      $ oc -n openshift-ingress-operator edit ingresscontroller/default
    2. Edit the YAML file:

      Sample clientAccess configuration to Global

        spec:
          endpointPublishingStrategy:
            loadBalancer:
              providerParameters:
                gcp:
                  clientAccess: Global 1
                type: GCP
              scope: Internal
            type: LoadBalancerService

      1
      Set gcp.clientAccess to Global.
    3. Save the file to apply the changes.
  2. Run the following command to verify that the service allows global access:

    $ oc -n openshift-ingress edit svc/router-default -o yaml

    The output shows that global access is enabled for GCP with the annotation, networking.gke.io/internal-load-balancer-allow-global-access.

7.8.9. Setting the Ingress Controller health check interval

A cluster administrator can set the health check interval to define how long the router waits between two consecutive health checks. This value is applied globally as a default for all routes. The default value is 5 seconds.

Prerequisites

  • The following assumes that you already created an Ingress Controller.

Procedure

  • Update the Ingress Controller to change the interval between back end health checks:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"healthCheckInterval": "8s"}}}'
    Note

    To override the healthCheckInterval for a single route, use the route annotation router.openshift.io/haproxy.health.check.interval, as shown in the following example.
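
    For example, a minimal sketch of a Route that overrides the interval; the route name, namespace, service name, and interval value are illustrative:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: my-route
        namespace: my-project
        annotations:
          router.openshift.io/haproxy.health.check.interval: 10s
      spec:
        to:
          kind: Service
          name: my-service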

7.8.10. Configuring the default Ingress Controller for your cluster to be internal

You can configure the default Ingress Controller for your cluster to be internal by deleting and recreating it.

Warning

If your cloud provider is Microsoft Azure, you must have at least one public load balancer that points to your nodes. If you do not, all of your nodes will lose egress connectivity to the internet.

Important

If you want to change the scope for an IngressController, you can change the .spec.endpointPublishingStrategy.loadBalancer.scope parameter after the custom resource (CR) is created.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Configure the default Ingress Controller for your cluster to be internal by deleting and recreating it.

    $ oc replace --force --wait --filename - <<EOF
    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: default
    spec:
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: Internal
    EOF

7.8.11. Configuring the route admission policy

Administrators and application developers can run applications in multiple namespaces with the same domain name. This is for organizations where multiple teams develop microservices that are exposed on the same hostname.

Warning

Allowing claims across namespaces should only be enabled for clusters with trust between namespaces, otherwise a malicious user could take over a hostname. For this reason, the default admission policy disallows hostname claims across namespaces.

Prerequisites

  • Cluster administrator privileges.

Procedure

  • Edit the .spec.routeAdmission field of the ingresscontroller resource variable using the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge

    Sample Ingress Controller configuration

    spec:
      routeAdmission:
        namespaceOwnership: InterNamespaceAllowed
    ...

    Tip

    You can alternatively apply the following YAML to configure the route admission policy:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      routeAdmission:
        namespaceOwnership: InterNamespaceAllowed

7.8.12. Using wildcard routes

The HAProxy Ingress Controller has support for wildcard routes. The Ingress Operator uses wildcardPolicy to configure the ROUTER_ALLOW_WILDCARD_ROUTES environment variable of the Ingress Controller.

The default behavior of the Ingress Controller is to admit routes with a wildcard policy of None, which is backwards compatible with existing IngressController resources.

Procedure

  1. Configure the wildcard policy.

    1. Use the following command to edit the IngressController resource:

      $ oc edit IngressController
    2. Under spec, set the wildcardPolicy field to WildcardsDisallowed or WildcardsAllowed:

      spec:
        routeAdmission:
          wildcardPolicy: WildcardsDisallowed # or WildcardsAllowed

7.8.13. Using X-Forwarded headers

You configure the HAProxy Ingress Controller to specify a policy for how to handle HTTP headers including Forwarded and X-Forwarded-For. The Ingress Operator uses the HTTPHeaders field to configure the ROUTER_SET_FORWARDED_HEADERS environment variable of the Ingress Controller.

Procedure

  1. Configure the HTTPHeaders field for the Ingress Controller.

    1. Use the following command to edit the IngressController resource:

      $ oc edit IngressController
    2. Under spec, set the HTTPHeaders policy field to Append, Replace, IfNone, or Never:

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpHeaders:
          forwardedHeaderPolicy: Append

Example use cases

As a cluster administrator, you can:

  • Configure an external proxy that injects the X-Forwarded-For header into each request before forwarding it to an Ingress Controller.

    To configure the Ingress Controller to pass the header through unmodified, you specify the never policy. The Ingress Controller then never sets the headers, and applications receive only the headers that the external proxy provides.

  • Configure the Ingress Controller to pass the X-Forwarded-For header that your external proxy sets on external cluster requests through unmodified.

    To configure the Ingress Controller to set the X-Forwarded-For header on internal cluster requests, which do not go through the external proxy, specify the if-none policy. If an HTTP request already has the header set through the external proxy, then the Ingress Controller preserves it. If the header is absent because the request did not come through the proxy, then the Ingress Controller adds the header.

As an application developer, you can:

  • Configure an application-specific external proxy that injects the X-Forwarded-For header.

    To configure an Ingress Controller to pass the header through unmodified for an application’s Route, without affecting the policy for other Routes, add an annotation haproxy.router.openshift.io/set-forwarded-headers: if-none or haproxy.router.openshift.io/set-forwarded-headers: never on the Route for the application.

    Note

    You can set the haproxy.router.openshift.io/set-forwarded-headers annotation on a per route basis, independent from the globally set value for the Ingress Controller.
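
    For example, a minimal sketch of a Route that opts out of the global policy; the application name and namespace are illustrative:

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: my-application
        namespace: my-application
        annotations:
          haproxy.router.openshift.io/set-forwarded-headers: if-none
      spec:
        to:
          kind: Service
          name: my-application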

7.8.14. Enabling HTTP/2 Ingress connectivity

You can enable transparent end-to-end HTTP/2 connectivity in HAProxy. It allows application owners to make use of HTTP/2 protocol capabilities, including single connection, header compression, binary streams, and more.

You can enable HTTP/2 connectivity for an individual Ingress Controller or for the entire cluster.

To enable the use of HTTP/2 for the connection from the client to HAProxy, a route must specify a custom certificate. A route that uses the default certificate cannot use HTTP/2. This restriction is necessary to avoid problems from connection coalescing, where the client re-uses a connection for different routes that use the same certificate.

The connection from HAProxy to the application pod can use HTTP/2 only for re-encrypt routes and not for edge-terminated or insecure routes. This restriction is because HAProxy uses Application-Level Protocol Negotiation (ALPN), which is a TLS extension, to negotiate the use of HTTP/2 with the back-end. The implication is that end-to-end HTTP/2 is possible with passthrough and re-encrypt and not with insecure or edge-terminated routes.

Warning

Using WebSockets with a re-encrypt route and with HTTP/2 enabled on an Ingress Controller requires WebSocket support over HTTP/2. WebSockets over HTTP/2 is a feature of HAProxy 2.4, which is unsupported in OpenShift Container Platform at this time.

Important

For non-passthrough routes, the Ingress Controller negotiates its connection to the application independently of the connection from the client. This means a client may connect to the Ingress Controller and negotiate HTTP/1.1, and the Ingress Controller may then connect to the application, negotiate HTTP/2, and forward the request from the client HTTP/1.1 connection using the HTTP/2 connection to the application. This poses a problem if the client subsequently tries to upgrade its connection from HTTP/1.1 to the WebSocket protocol, because the Ingress Controller cannot forward WebSocket to HTTP/2 and cannot upgrade its HTTP/2 connection to WebSocket. Consequently, if you have an application that is intended to accept WebSocket connections, it must not allow negotiating the HTTP/2 protocol or else clients will fail to upgrade to the WebSocket protocol.

Procedure

Enable HTTP/2 on a single Ingress Controller.

  • To enable HTTP/2 on an Ingress Controller, enter the oc annotate command:

    $ oc -n openshift-ingress-operator annotate ingresscontrollers/<ingresscontroller_name> ingress.operator.openshift.io/default-enable-http2=true

    Replace <ingresscontroller_name> with the name of the Ingress Controller to annotate.

Enable HTTP/2 on the entire cluster.

  • To enable HTTP/2 for the entire cluster, enter the oc annotate command:

    $ oc annotate ingresses.config/cluster ingress.operator.openshift.io/default-enable-http2=true
    Tip

    You can alternatively apply the following YAML to add the annotation:

    apiVersion: config.openshift.io/v1
    kind: Ingress
    metadata:
      name: cluster
      annotations:
        ingress.operator.openshift.io/default-enable-http2: "true"

7.8.15. Configuring the PROXY protocol for an Ingress Controller

A cluster administrator can configure the PROXY protocol when an Ingress Controller uses either the HostNetwork or NodePortService endpoint publishing strategy types. The PROXY protocol enables the load balancer to preserve the original client addresses for connections that the Ingress Controller receives. The original client addresses are useful for logging, filtering, and injecting HTTP headers. In the default configuration, the connections that the Ingress Controller receives only contain the source address that is associated with the load balancer.

This feature is not supported in cloud deployments. This restriction is because when OpenShift Container Platform runs in a cloud platform, and an IngressController specifies that a service load balancer should be used, the Ingress Operator configures the load balancer service and enables the PROXY protocol based on the platform requirement for preserving source addresses.

Important

You must configure both OpenShift Container Platform and the external load balancer to either use the PROXY protocol or to use TCP.

Warning

The PROXY protocol is unsupported for the default Ingress Controller with installer-provisioned clusters on non-cloud platforms that use a Keepalived Ingress VIP.

Prerequisites

  • You created an Ingress Controller.

Procedure

  1. Edit the Ingress Controller resource:

    $ oc -n openshift-ingress-operator edit ingresscontroller/default
  2. Set the PROXY configuration:

    • If your Ingress Controller uses the HostNetwork endpoint publishing strategy type, set the spec.endpointPublishingStrategy.hostNetwork.protocol subfield to PROXY:

      Sample hostNetwork configuration to PROXY

        spec:
          endpointPublishingStrategy:
            hostNetwork:
              protocol: PROXY
            type: HostNetwork

    • If your Ingress Controller uses the NodePortService endpoint publishing strategy type, set the spec.endpointPublishingStrategy.nodePort.protocol subfield to PROXY:

      Sample nodePort configuration to PROXY

        spec:
          endpointPublishingStrategy:
            nodePort:
              protocol: PROXY
            type: NodePortService

7.8.16. Specifying an alternative cluster domain using the appsDomain option

As a cluster administrator, you can specify an alternative to the default cluster domain for user-created routes by configuring the appsDomain field. The appsDomain field is an optional domain for OpenShift Container Platform to use instead of the default, which is specified in the domain field. If you specify an alternative domain, it overrides the default cluster domain for the purpose of determining the default host for a new route.

For example, you can use the DNS domain for your company as the default domain for routes and ingresses for applications running on your cluster.

Prerequisites

  • You deployed an OpenShift Container Platform cluster.
  • You installed the oc command line interface.

Procedure

  1. Configure the appsDomain field by specifying an alternative default domain for user-created routes.

    1. Edit the ingress cluster resource:

      $ oc edit ingresses.config/cluster -o yaml
    2. Edit the YAML file:

      Sample appsDomain configuration to test.example.com

      apiVersion: config.openshift.io/v1
      kind: Ingress
      metadata:
        name: cluster
      spec:
        domain: apps.example.com            1
        appsDomain: <test.example.com>      2

      1
      Specifies the default domain. You cannot modify the default domain after installation.
      2
      Optional: Domain for OpenShift Container Platform infrastructure to use for application routes. Instead of the default prefix, apps, you can use an alternative prefix like test.
  2. Verify that an existing route contains the domain name specified in the appsDomain field by exposing the route and verifying the route domain change:

    Note

    Wait for the openshift-apiserver to finish rolling updates before exposing the route.

    1. Expose the route:

      $ oc expose service hello-openshift
      route.route.openshift.io/hello-openshift exposed

      Example output:

      $ oc get routes
      NAME              HOST/PORT                                              PATH   SERVICES          PORT       TERMINATION   WILDCARD
      hello-openshift   hello_openshift-<my_project>.test.example.com                 hello-openshift   8080-tcp                 None

7.8.17. Converting HTTP header case

HAProxy lowercases HTTP header names by default; for example, changing Host: xyz.com to host: xyz.com. If legacy applications are sensitive to the capitalization of HTTP header names, use the Ingress Controller spec.httpHeaders.headerNameCaseAdjustments API field for a solution to accommodate legacy applications until they can be fixed.

Important

OpenShift Container Platform includes HAProxy 2.2. If you want to update to this version of the web-based load balancer, ensure that you add the spec.httpHeaders.headerNameCaseAdjustments section to your cluster’s configuration file.

As a cluster administrator, you can convert the HTTP header case by entering the oc patch command or by setting the HeaderNameCaseAdjustments field in the Ingress Controller YAML file.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  • Capitalize an HTTP header by using the oc patch command.

    1. Change the HTTP header from host to Host by running the following command:

      $ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"httpHeaders":{"headerNameCaseAdjustments":["Host"]}}}'
    2. Create a Route resource YAML file so that the annotation can be applied to the application.

      Example of a route named my-application

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        annotations:
          haproxy.router.openshift.io/h1-adjust-case: "true" 1
        name: <application_name>
        namespace: <application_name>
      # ...

      1
      Set haproxy.router.openshift.io/h1-adjust-case so that the Ingress Controller can adjust the host request header as specified.
  • Specify adjustments by configuring the HeaderNameCaseAdjustments field in the Ingress Controller YAML configuration file.

    1. The following example Ingress Controller YAML file adjusts the host header to Host for HTTP/1 requests to appropriately annotated routes:

      Example Ingress Controller YAML

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpHeaders:
          headerNameCaseAdjustments:
          - Host

    2. The following example route enables HTTP response header name case adjustments by using the haproxy.router.openshift.io/h1-adjust-case annotation:

      Example route YAML

      apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        annotations:
          haproxy.router.openshift.io/h1-adjust-case: "true" 1
        name: my-application
        namespace: my-application
      spec:
        to:
          kind: Service
          name: my-application

      1
      Set haproxy.router.openshift.io/h1-adjust-case to true.
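
If you want to confirm that the router picked up an adjustment, you can search the rendered HAProxy configuration inside a router pod. This is a hedged sketch: it assumes the adjustment is rendered as an h1-case-adjust entry in haproxy.config, which can vary by router version, and the pod name is a placeholder.

  $ oc -n openshift-ingress rsh <router_pod> grep -i h1-case-adjust /var/lib/haproxy/conf/haproxy.config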

7.8.18. Using router compression

You configure the HAProxy Ingress Controller to specify router compression globally for specific MIME types. You can use the mimeTypes variable to define the formats of MIME types to which compression is applied. The types are: application, image, message, multipart, text, video, or a custom type prefaced by "X-". To see the full notation for MIME types and subtypes, see RFC1341.

Note

Memory allocated for compression can affect the maximum number of connections. Additionally, compression of large buffers can introduce latency, as can heavy use of regular expressions or long lists of regular expressions.

Not all MIME types benefit from compression, but HAProxy still uses resources to try to compress if it is instructed to. Generally, text formats, such as html, css, and js, benefit from compression, but formats that are already compressed, such as image, audio, and video, benefit little in exchange for the time and resources spent on compression.

Procedure

  1. Configure the httpCompression field for the Ingress Controller.

    1. Use the following command to edit the IngressController resource:

      $ oc edit -n openshift-ingress-operator ingresscontrollers/default
    2. Under spec, set the httpCompression policy field to mimeTypes and specify a list of MIME types that should have compression applied:

      apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: default
        namespace: openshift-ingress-operator
      spec:
        httpCompression:
          mimeTypes:
          - "text/html"
          - "text/css; charset=utf-8"
          - "application/json"
         ...
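
To confirm that compression is applied, you can request a compressible resource through a route and check for a content-encoding response header in the reply. This is a minimal sketch, assuming a route that serves one of the configured MIME types, such as text/html; the route hostname is a placeholder:

  $ curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" http://<route_hostname>/ | grep -i content-encoding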

7.8.19. Exposing router metrics

You can expose the HAProxy router metrics by default in Prometheus format on the default stats port, 1936. External metrics collection and aggregation systems, such as Prometheus, can access the HAProxy router metrics. You can also view the HAProxy router metrics in a browser in HTML and comma-separated values (CSV) formats.

Prerequisites

  • You configured your firewall to access the default stats port, 1936.

Procedure

  1. Get the router pod name by running the following command:

    $ oc get pods -n openshift-ingress

    Example output

    NAME                              READY   STATUS    RESTARTS   AGE
    router-default-76bfffb66c-46qwp   1/1     Running   0          11h

  2. Get the router’s username and password, which the router pod stores in the /var/lib/haproxy/conf/metrics-auth/statsUsername and /var/lib/haproxy/conf/metrics-auth/statsPassword files:

    1. Get the username by running the following command:

      $ oc rsh <router_pod_name> cat metrics-auth/statsUsername
    2. Get the password by running the following command:

      $ oc rsh <router_pod_name> cat metrics-auth/statsPassword
  3. Get the router IP and metrics certificates by running the following command:

    $ oc describe pod <router_pod>
  4. Get the raw statistics in Prometheus format by running the following command:

    $ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics
  5. Access the metrics securely by running the following command:

    $ curl -u <user>:<password> https://<router_IP>:<stats_port>/metrics -k
  6. Access the default stats port, 1936, by running the following command:

    $ curl -u <user>:<password> http://<router_IP>:<stats_port>/metrics

    Example 7.1. Example output

    ...
    # HELP haproxy_backend_connections_total Total number of connections.
    # TYPE haproxy_backend_connections_total gauge
    haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route"} 0
    haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route-alt"} 0
    haproxy_backend_connections_total{backend="http",namespace="default",route="hello-route01"} 0
    ...
    # HELP haproxy_exporter_server_threshold Number of servers tracked and the current threshold value.
    # TYPE haproxy_exporter_server_threshold gauge
    haproxy_exporter_server_threshold{type="current"} 11
    haproxy_exporter_server_threshold{type="limit"} 500
    ...
    # HELP haproxy_frontend_bytes_in_total Current total of incoming bytes.
    # TYPE haproxy_frontend_bytes_in_total gauge
    haproxy_frontend_bytes_in_total{frontend="fe_no_sni"} 0
    haproxy_frontend_bytes_in_total{frontend="fe_sni"} 0
    haproxy_frontend_bytes_in_total{frontend="public"} 119070
    ...
    # HELP haproxy_server_bytes_in_total Current total of incoming bytes.
    # TYPE haproxy_server_bytes_in_total gauge
    haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_no_sni",service=""} 0
    haproxy_server_bytes_in_total{namespace="",pod="",route="",server="fe_sni",service=""} 0
    haproxy_server_bytes_in_total{namespace="default",pod="docker-registry-5-nk5fz",route="docker-registry",server="10.130.0.89:5000",service="docker-registry"} 0
    haproxy_server_bytes_in_total{namespace="default",pod="hello-rc-vkjqx",route="hello-route",server="10.130.0.90:8080",service="hello-svc-1"} 0
    ...
  7. Launch the stats window by entering the following URL in a browser:

    http://<user>:<password>@<router_IP>:<stats_port>
  8. Optional: Get the stats in CSV format by entering the following URL in a browser:

    http://<user>:<password>@<router_ip>:1936/metrics;csv
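
After you can reach the metrics endpoint, you can filter for a single metric on the command line. This is a minimal sketch that reuses a metric name from the example output above; the credentials and router IP are placeholders:

  $ curl -s -u <user>:<password> http://<router_IP>:1936/metrics | grep haproxy_backend_connections_total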

7.8.20. Customizing HAProxy error code response pages

As a cluster administrator, you can specify a custom error code response page for either 503, 404, or both error pages. The HAProxy router serves a 503 error page when the application pod is not running or a 404 error page when the requested URL does not exist. For example, if you customize the 503 error code response page, then the page is served when the application pod is not running, and the default 404 error code HTTP response page is served by the HAProxy router for an incorrect route or a non-existing route.

Custom error code response pages are specified in a config map and then patched to the Ingress Controller. The config map keys have two available file names as follows: error-page-503.http and error-page-404.http.

Custom HTTP error code response pages must follow the HAProxy HTTP error page configuration guidelines. You can use the default OpenShift Container Platform HAProxy router http 503 error code response page as a template for creating your own custom page.
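
The default page is not reproduced in this section; the following is a minimal sketch of the raw HTTP response format that an error-page-503.http file follows. The headers and HTML body are illustrative and are not the exact default content shipped with the router:

  HTTP/1.0 503 Service Unavailable
  Pragma: no-cache
  Cache-Control: private, max-age=0, no-cache, no-store
  Connection: close
  Content-Type: text/html

  <html>
    <body>
      <h1>503 Service Unavailable</h1>
      <p>The application is not currently serving requests. Try again later.</p>
    </body>
  </html>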

By default, the HAProxy router serves only a 503 error page when the application is not running or when the route is incorrect or non-existent. This default behavior is the same as the behavior on OpenShift Container Platform 4.8 and earlier. If you customize HTTP error code responses but the config map does not provide a page for a particular error code, the router serves the default 404 or 503 error code response page.

Note

If you use the OpenShift Container Platform default 503 error code page as a template for your customizations, the headers in the file require an editor that can use CRLF line endings.

Procedure

  1. Create a config map named my-custom-error-code-pages in the openshift-config namespace:

    $ oc -n openshift-config create configmap my-custom-error-code-pages \
    --from-file=error-page-503.http \
    --from-file=error-page-404.http
    Important

    If you do not specify the correct format for the custom error code response page, a router pod outage occurs. To resolve this outage, you must delete or correct the config map and delete the affected router pods so they can be recreated with the correct information.

  2. Patch the Ingress Controller to reference the my-custom-error-code-pages config map by name:

    $ oc patch -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"httpErrorCodePages":{"name":"my-custom-error-code-pages"}}}' --type=merge

    The Ingress Operator copies the my-custom-error-code-pages config map from the openshift-config namespace to the openshift-ingress namespace. The Operator names the config map according to the pattern, <your_ingresscontroller_name>-errorpages, in the openshift-ingress namespace.

  3. Display the copy:

    $ oc get cm default-errorpages -n openshift-ingress

    Example output

    NAME                       DATA   AGE
    default-errorpages         2      25s  1

    1
    The example config map name is default-errorpages because the default Ingress Controller custom resource (CR) was patched.
  4. Confirm that the config map containing the custom error response page mounts on the router volume where the config map key is the filename that has the custom HTTP error code response:

    • For the 503 custom HTTP error code response:

      $ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-503.http
    • For the 404 custom HTTP error code response:

      $ oc -n openshift-ingress rsh <router_pod> cat /var/lib/haproxy/conf/error_code_pages/error-page-404.http

Verification

Verify your custom error code HTTP response:

  1. Create a test project and application:

    $ oc new-project test-ingress
    $ oc new-app django-psql-example
  2. For the 503 custom HTTP error code response:

    1. Stop all the pods for the application.
    2. Run the following curl command or visit the route hostname in the browser:

      $ curl -vk <route_hostname>
  3. For the 404 custom HTTP error code response:

    1. Visit a non-existent route or an incorrect route.
    2. Run the following curl command or visit the route hostname in the browser:

      $ curl -vk <route_hostname>
  4. Check that the errorfile attribute is properly set in the haproxy.config file:

    $ oc -n openshift-ingress rsh <router> cat /var/lib/haproxy/conf/haproxy.config | grep errorfile
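
    A hedged sketch of the kind of entries you might see when both custom pages are configured; the errorfile directive is standard HAProxy syntax, but the exact rendered lines can vary by router version:

      errorfile 503 /var/lib/haproxy/conf/error_code_pages/error-page-503.http
      errorfile 404 /var/lib/haproxy/conf/error_code_pages/error-page-404.http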

7.8.21. Setting the Ingress Controller maximum connections

A cluster administrator can set the maximum number of simultaneous connections for OpenShift router deployments. You can patch an existing Ingress Controller to increase the maximum number of connections.

Prerequisites

  • You have created an Ingress Controller.

Procedure

  • Update the Ingress Controller to change the maximum number of connections for HAProxy:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge -p '{"spec":{"tuningOptions": {"maxConnections": 7500}}}'
    Warning

    If you set the spec.tuningOptions.maxConnections value greater than the current operating system limit, the HAProxy process will not start. See the table in the "Ingress Controller configuration parameters" section for more information about this parameter.
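
To see the connection limit that HAProxy is currently running with, you can inspect the rendered configuration inside a router pod. This is a minimal sketch; the pod name is a placeholder, and where the maxconn setting appears in haproxy.config can vary:

  $ oc -n openshift-ingress rsh <router_pod> grep -w maxconn /var/lib/haproxy/conf/haproxy.config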

7.9. Additional resources

Chapter 8. Ingress Node Firewall Operator in OpenShift Container Platform

The Ingress Node Firewall Operator allows administrators to manage firewall configurations at the node level.

8.1. Ingress Node Firewall Operator

The Ingress Node Firewall Operator provides ingress firewall rules at the node level by deploying a daemon set to the nodes that you specify and manage in the firewall configurations. To deploy the daemon set, you create an IngressNodeFirewallConfig custom resource (CR). The Operator applies the IngressNodeFirewallConfig CR to create an ingress node firewall daemon set, which runs on all nodes that match the nodeSelector.

You configure firewall rules in the IngressNodeFirewall CR and apply them to the cluster by using the nodeSelector and setting the label values to "true".

Important

The Ingress Node Firewall Operator supports only stateless firewall rules.

The maximum transmission unit (MTU) parameter is 4 KB (kilobytes) in OpenShift Container Platform 4.13.

Network interface controllers (NICs) that do not support native XDP drivers will run at a lower performance.

Ingress Node Firewall Operator is not supported on Amazon Web Services (AWS) with the default OpenShift installation or on Red Hat OpenShift Service on AWS (ROSA). For more information on Red Hat OpenShift Service on AWS support and ingress, see Ingress Operator in Red Hat OpenShift Service on AWS.

8.2. Installing the Ingress Node Firewall Operator

As a cluster administrator, you can install the Ingress Node Firewall Operator by using the OpenShift Container Platform CLI or the web console.

8.2.1. Installing the Ingress Node Firewall Operator using the CLI

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have an account with administrator privileges.

Procedure

  1. To create the openshift-ingress-node-firewall namespace, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      labels:
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/enforce-version: v1.24
      name: openshift-ingress-node-firewall
    EOF
  2. To create an OperatorGroup CR, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: ingress-node-firewall-operators
      namespace: openshift-ingress-node-firewall
    EOF
  3. Subscribe to the Ingress Node Firewall Operator.

    1. To create a Subscription CR for the Ingress Node Firewall Operator, enter the following command:

      $ cat << EOF| oc create -f -
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: ingress-node-firewall-sub
        namespace: openshift-ingress-node-firewall
      spec:
        name: ingress-node-firewall
        channel: stable
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      EOF
  4. To verify that the Operator is installed, enter the following command:

    $ oc get ip -n openshift-ingress-node-firewall

    Example output

    NAME            CSV                                         APPROVAL    APPROVED
    install-5cvnz   ingress-node-firewall.4.13.0-202211122336   Automatic   true

  5. To verify the version of the Operator, enter the following command:

    $ oc get csv -n openshift-ingress-node-firewall

    Example output

    NAME                                        DISPLAY                          VERSION               REPLACES                                    PHASE
    ingress-node-firewall.4.13.0-202211122336   Ingress Node Firewall Operator   4.13.0-202211122336   ingress-node-firewall.4.13.0-202211102047   Succeeded

8.2.2. Installing the Ingress Node Firewall Operator using the web console

As a cluster administrator, you can install the Operator using the web console.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have an account with administrator privileges.

Procedure

  1. Install the Ingress Node Firewall Operator:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
    2. Select Ingress Node Firewall Operator from the list of available Operators, and then click Install.
    3. On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
    4. Click Install.
  2. Verify that the Ingress Node Firewall Operator is installed successfully:

    1. Navigate to the Operators → Installed Operators page.
    2. Ensure that Ingress Node Firewall Operator is listed in the openshift-ingress-node-firewall project with a Status of InstallSucceeded.

      Note

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not have a Status of InstallSucceeded, troubleshoot using the following steps:

      • Inspect the Operator Subscriptions and Install Plans tabs for any failures or errors under Status.
      • Navigate to the Workloads → Pods page and check the logs for pods in the openshift-ingress-node-firewall project.
      • Check the namespace of the YAML file. If the annotation is missing, you can add the annotation workload.openshift.io/allowed=management to the Operator namespace with the following command:

        $ oc annotate ns/openshift-ingress-node-firewall workload.openshift.io/allowed=management
        Note

        For single-node OpenShift clusters, the openshift-ingress-node-firewall namespace requires the workload.openshift.io/allowed=management annotation.

8.3. Deploying Ingress Node Firewall Operator

Prerequisite

  • The Ingress Node Firewall Operator is installed.

Procedure

To deploy the Ingress Node Firewall Operator, create an IngressNodeFirewallConfig custom resource that deploys the Operator's daemon set. You can deploy one or multiple IngressNodeFirewall CRs to nodes by applying firewall rules.

  1. Create the IngressNodeFirewallConfig custom resource named ingressnodefirewallconfig inside the openshift-ingress-node-firewall namespace.
  2. Run the following command to deploy the Ingress Node Firewall Operator rules (a fuller sketch follows this procedure):

    $ oc apply -f rule.yaml
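
A fuller sketch of the deployment flow follows. The label do-node-ingress-firewall=true and the file names are assumptions for illustration; the label must match the nodeSelector values in your IngressNodeFirewall CR, and complete example resources are shown in the next sections:

  $ oc label node <node_name> do-node-ingress-firewall=true
  $ oc apply -f ingressnodefirewallconfig.yaml
  $ oc apply -f rule.yaml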

8.3.1. Ingress Node Firewall configuration object

The fields for the Ingress Node Firewall configuration object are described in the following table:

Table 8.1. Ingress Node Firewall Configuration object
FieldTypeDescription

metadata.name

string

The name of the CR object. The name of the firewall configuration object must be ingressnodefirewallconfig.

metadata.namespace

string

Namespace for the Ingress Firewall Operator CR object. The IngressNodeFirewallConfig CR must be created inside the openshift-ingress-node-firewall namespace.

spec.nodeSelector

string

A node selection constraint used to target nodes through specified node labels. For example:

spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""
Note

One label used in nodeSelector must match a label on the nodes in order for the daemon set to start. For example, if the node labels node-role.kubernetes.io/worker and node-type.kubernetes.io/vm are applied to a node, then at least one label must be set using nodeSelector for the daemon set to start.

Note

The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector.

Ingress Node Firewall Operator example configuration

A complete Ingress Node Firewall Configuration is specified in the following example:

Example Ingress Node Firewall Configuration object

apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewallConfig
metadata:
  name: ingressnodefirewallconfig
  namespace: openshift-ingress-node-firewall
spec:
  nodeSelector:
    node-role.kubernetes.io/worker: ""

Note

The Operator consumes the CR and creates an ingress node firewall daemon set on all the nodes that match the nodeSelector.

8.3.2. Ingress Node Firewall rules object

The fields for the Ingress Node Firewall rules object are described in the following table:

Table 8.2. Ingress Node Firewall rules object
FieldTypeDescription

metadata.name

string

The name of the CR object.

interfaces

array

The fields for this object specify the interfaces to apply the firewall rules to. For example, - en0 and - en1.

nodeSelector

array

You can use nodeSelector to select the nodes to apply the firewall rules to. Set the value of your named nodeselector labels to true to apply the rule.

ingress

object

ingress allows you to configure the rules that allow outside access to the services on your cluster.

Ingress object configuration

The values for the ingress object are defined in the following table:

Table 8.3. ingress object
FieldTypeDescription

sourceCIDRs

array

Allows you to set the CIDR block. You can configure multiple CIDRs from different address families.

Note

Different CIDRs allow you to use the same order rule. In the case that there are multiple IngressNodeFirewall objects for the same nodes and interfaces with overlapping CIDRs, the order field will specify which rule is applied first. Rules are applied in ascending order.

rules

array

Ingress firewall rules. The rules.order values are ordered starting at 1 for each sourceCIDRs entry, with up to 100 rules per CIDR. Lower order rules are executed first.

rules.protocolConfig.protocol supports the following protocols: TCP, UDP, SCTP, ICMP and ICMPv6. ICMP and ICMPv6 rules can match against ICMP and ICMPv6 types or codes. TCP, UDP, and SCTP rules can match against a single destination port or a range of ports using <start : end-1> format.

Set rules.action to allow to apply the rule or deny to disallow the rule.

Note

Ingress firewall rules are verified using a verification webhook that blocks any invalid configuration. The verification webhook prevents you from blocking any critical cluster services such as the API server or SSH.

Ingress Node Firewall rules object example

A complete Ingress Node Firewall configuration is specified in the following example:

Example Ingress Node Firewall configuration

apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
  name: ingressnodefirewall
spec:
  interfaces:
  - eth0
  nodeSelector:
    matchLabels:
      <ingress_firewall_label_name>: <label_value> 1
  ingress:
  - sourceCIDRs:
       - 172.16.0.0/12
    rules:
    - order: 10
      protocolConfig:
        protocol: ICMP
        icmp:
          icmpType: 8 #ICMP Echo request
      action: Deny
    - order: 20
      protocolConfig:
        protocol: TCP
        tcp:
          ports: "8000-9000"
      action: Deny
  - sourceCIDRs:
       - fc00:f853:ccd:e793::0/64
    rules:
    - order: 10
      protocolConfig:
        protocol: ICMPv6
        icmpv6:
          icmpType: 128 #ICMPV6 Echo request
      action: Deny

1
A <label_name> and a <label_value> must exist on the node and must match the nodeselector label and value applied to the nodes you want the ingressfirewallconfig CR to run on. The <label_value> can be true or false. By using nodeSelector labels, you can target separate groups of nodes to apply different rules to using the ingressfirewallconfig CR.
Zero trust Ingress Node Firewall rules object example

Zero trust Ingress Node Firewall rules can provide additional security to multi-interface clusters. For example, you can use zero trust Ingress Node Firewall rules to drop all traffic on a specific interface except for SSH.

A complete configuration of a zero trust Ingress Node Firewall rule set is specified in the following example:

Important

To ensure proper functionality with the following zero trust configuration, you must add every port that your applications use to the allowlist.

Example zero trust Ingress Node Firewall rules

apiVersion: ingressnodefirewall.openshift.io/v1alpha1
kind: IngressNodeFirewall
metadata:
 name: ingressnodefirewall-zero-trust
spec:
 interfaces:
 - eth1 1
 nodeSelector:
   matchLabels:
     <ingress_firewall_label_name>: <label_value> 2
 ingress:
 - sourceCIDRs:
      - 0.0.0.0/0 3
   rules:
   - order: 10
     protocolConfig:
       protocol: TCP
       tcp:
         ports: 22
     action: Allow
   - order: 20
     action: Deny 4

1
The network interface on the cluster nodes that the zero trust rules apply to.
2
The <label_name> and <label_value> need to match the nodeSelector label and value applied to the specific nodes to which you want to apply the ingressfirewallconfig CR.
3
0.0.0.0/0 set to match any CIDR
4
action set to Deny

8.4. Viewing Ingress Node Firewall Operator rules

Procedure

  1. Run the following command to view all current rules:

    $ oc get ingressnodefirewall
  2. Choose one of the returned <resource> names and run the following command to view the rules or configs:

    $ oc get <resource> <name> -o yaml
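
    For example, assuming you created the ingressnodefirewall-zero-trust object from the earlier example in this chapter:

      $ oc get ingressnodefirewall ingressnodefirewall-zero-trust -o yaml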

8.5. Troubleshooting the Ingress Node Firewall Operator

  • Run the following command to list installed Ingress Node Firewall custom resource definitions (CRD):

    $ oc get crds | grep ingressnodefirewall

    Example output

     NAME                                                              CREATED AT
    ingressnodefirewallconfigs.ingressnodefirewall.openshift.io       2022-08-25T10:03:01Z
    ingressnodefirewallnodestates.ingressnodefirewall.openshift.io    2022-08-25T10:03:00Z
    ingressnodefirewalls.ingressnodefirewall.openshift.io             2022-08-25T10:03:00Z

  • Run the following command to view the state of the Ingress Node Firewall Operator:

    $ oc get pods -n openshift-ingress-node-firewall

    Example output

    NAME                                       READY  STATUS         RESTARTS  AGE
    ingress-node-firewall-controller-manager   2/2    Running        0         5d21h
    ingress-node-firewall-daemon-pqx56         3/3    Running        0         5d21h

    The following fields provide information about the status of the Operator: READY, STATUS, AGE, and RESTARTS. The STATUS field is Running when the Ingress Node Firewall Operator is deploying a daemon set to the assigned nodes.

  • Run the following command to collect all ingress firewall node pods' logs:

    $ oc adm must-gather -- gather_ingress_node_firewall

    The logs are available in the sos node’s report containing eBPF bpftool outputs at /sos_commands/ebpf. These reports include lookup tables used or updated as the ingress firewall XDP handles packet processing, updates statistics, and emits events.

Chapter 9. Configuring an Ingress Controller for manual DNS Management

As a cluster administrator, when you create an Ingress Controller, the Operator manages the DNS records automatically. This has some limitations when the required DNS zone is different from the cluster DNS zone or when the DNS zone is hosted outside the cloud provider.

As a cluster administrator, you can configure an Ingress Controller to stop automatic DNS management and start manual DNS management. Set dnsManagementPolicy to specify when it should be automatically or manually managed.

When you change an Ingress Controller from Managed to Unmanaged DNS management policy, the Operator does not clean up the previous wildcard DNS record provisioned on the cloud. When you change an Ingress Controller from Unmanaged to Managed DNS management policy, the Operator attempts to create the DNS record on the cloud provider if it does not exist or updates the DNS record if it already exists.

Important

When you set dnsManagementPolicy to unmanaged, you have to manually manage the lifecycle of the wildcard DNS record on the cloud provider.

9.1. Managed DNS management policy

The Managed DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is automatically managed by the Operator.

9.2. Unmanaged DNS management policy

The Unmanaged DNS management policy for Ingress Controllers ensures that the lifecycle of the wildcard DNS record on the cloud provider is not automatically managed; instead, it becomes the responsibility of the cluster administrator.

Note

On the AWS cloud platform, if the domain on the Ingress Controller does not match with dnsConfig.Spec.BaseDomain then the DNS management policy is automatically set to Unmanaged.

9.3. Creating a custom Ingress Controller with the Unmanaged DNS management policy

As a cluster administrator, you can create a new custom Ingress Controller with the Unmanaged DNS management policy.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a custom resource (CR) file named sample-ingress.yaml containing the following:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      namespace: openshift-ingress-operator
      name: <name> 1
    spec:
      domain: <domain> 2
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: External 3
          dnsManagementPolicy: Unmanaged 4
    1
    Specify the <name> with a name for the IngressController object.
    2
    Specify the domain based on the DNS record that was created as a prerequisite.
    3
    Specify the scope as External to expose the load balancer externally.
    4
    dnsManagementPolicy indicates if the Ingress Controller is managing the lifecycle of the wildcard DNS record associated with the load balancer. The valid values are Managed and Unmanaged. The default value is Managed.
  2. Save the file and apply the changes by running the following command:

    $ oc apply -f sample-ingress.yaml
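
Because the DNS management policy is Unmanaged, you must create the wildcard DNS record for the specified domain yourself. A minimal sketch of retrieving the load balancer address to point that record at; it assumes the router service follows the usual router-<name> naming convention in the openshift-ingress namespace, and on providers that return an IP address you would query .ip instead of .hostname:

  $ oc -n openshift-ingress get service router-<name> -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'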

9.4. Modifying an existing Ingress Controller

As a cluster administrator, you can modify an existing Ingress Controller to manually manage the DNS record lifecycle.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Modify the chosen IngressController to set dnsManagementPolicy (a verification sketch follows this procedure):

    $ SCOPE=$(oc -n openshift-ingress-operator get ingresscontroller <name> -o=jsonpath="{.status.endpointPublishingStrategy.loadBalancer.scope}")

    $ oc -n openshift-ingress-operator patch ingresscontrollers/<name> --type=merge --patch="{\"spec\":{\"endpointPublishingStrategy\":{\"type\":\"LoadBalancerService\",\"loadBalancer\":{\"dnsManagementPolicy\":\"Unmanaged\", \"scope\":\"${SCOPE}\"}}}}"
  2. Optional: You can delete the associated DNS record in the cloud provider.
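
To confirm that the patch took effect, you can read the field back. This is a minimal sketch; the Ingress Controller name is a placeholder:

  $ oc -n openshift-ingress-operator get ingresscontroller <name> -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.dnsManagementPolicy}'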

9.5. Additional resources

Chapter 10. Verifying connectivity to an endpoint

The Cluster Network Operator (CNO) runs a controller, the connectivity check controller, that performs a connection health check between resources within your cluster. By reviewing the results of the health checks, you can diagnose connection problems or eliminate network connectivity as the cause of an issue that you are investigating.

10.1. Connection health checks performed

To verify that cluster resources are reachable, a TCP connection is made to each of the following cluster API services:

  • Kubernetes API server service
  • Kubernetes API server endpoints
  • OpenShift API server service
  • OpenShift API server endpoints
  • Load balancers

To verify that services and service endpoints are reachable on every node in the cluster, a TCP connection is made to each of the following targets:

  • Health check target service
  • Health check target endpoints

10.2. Implementation of connection health checks

The connectivity check controller orchestrates connection verification checks in your cluster. The results for the connection tests are stored in PodNetworkConnectivityCheck objects in the openshift-network-diagnostics namespace. Connection tests are performed every minute in parallel.

The Cluster Network Operator (CNO) deploys several resources to the cluster to send and receive connectivity health checks:

Health check source
This program deploys in a single pod replica set managed by a Deployment object. The program consumes PodNetworkConnectivityCheck objects and connects to the spec.targetEndpoint specified in each object.
Health check target
A pod deployed as part of a daemon set on every node in the cluster. The pod listens for inbound health checks. The presence of this pod on every node allows for the testing of connectivity to each node.
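
To see these components on your cluster, you can list the pods in the diagnostics namespace. This is a minimal sketch; the exact pod names vary by cluster:

  $ oc get pods -n openshift-network-diagnostics -o wide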

10.3. PodNetworkConnectivityCheck object fields

The PodNetworkConnectivityCheck object fields are described in the following tables.

Table 10.1. PodNetworkConnectivityCheck object fields
FieldTypeDescription

metadata.name

string

The name of the object in the following format: <source>-to-<target>. The destination described by <target> includes one of following strings:

  • load-balancer-api-external
  • load-balancer-api-internal
  • kubernetes-apiserver-endpoint
  • kubernetes-apiserver-service-cluster
  • network-check-target
  • openshift-apiserver-endpoint
  • openshift-apiserver-service-cluster

metadata.namespace

string

The namespace that the object is associated with. This value is always openshift-network-diagnostics.

spec.sourcePod

string

The name of the pod where the connection check originates, such as network-check-source-596b4c6566-rgh92.

spec.targetEndpoint

string

The target of the connection check, such as api.devcluster.example.com:6443.

spec.tlsClientCert

object

Configuration for the TLS certificate to use.

spec.tlsClientCert.name

string

The name of the TLS certificate used, if any. The default value is an empty string.

status

object

An object representing the condition of the connection test and logs of recent connection successes and failures.

status.conditions

array

The latest status of the connection check and any previous statuses.

status.failures

array

Connection test logs from unsuccessful attempts.

status.outages

array

Connection test logs covering the time periods of any outages.

status.successes

array

Connection test logs from successful attempts.

The following table describes the fields for objects in the status.conditions array:

Table 10.2. status.conditions
FieldTypeDescription

lastTransitionTime

string

The time that the condition of the connection transitioned from one status to another.

message

string

The details about last transition in a human readable format.

reason

string

The last status of the transition in a machine readable format.

status

string

The status of the condition.

type

string

The type of the condition.

The following table describes the fields for objects in the status.outages array:

Table 10.3. status.outages
FieldTypeDescription

end

string

The timestamp from when the connection failure is resolved.

endLogs

array

Connection log entries, including the log entry related to the successful end of the outage.

message

string

A summary of outage details in a human readable format.

start

string

The timestamp from when the connection failure is first detected.

startLogs

array

Connection log entries, including the original failure.

Connection log fields

The fields for a connection log entry are described in the following table. The object is used in the following fields:

  • status.failures[]
  • status.successes[]
  • status.outages[].startLogs[]
  • status.outages[].endLogs[]
Table 10.4. Connection log object
FieldTypeDescription

latency

string

Records the duration of the action.

message

string

Provides the status in a human readable format.

reason

string

Provides the reason for status in a machine readable format. The value is one of TCPConnect, TCPConnectError, DNSResolve, DNSError.

success

boolean

Indicates if the log entry is a success or failure.

time

string

The start time of connection check.

10.4. Verifying network connectivity for an endpoint

As a cluster administrator, you can verify the connectivity of an endpoint, such as an API server, load balancer, service, or pod.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. To list the current PodNetworkConnectivityCheck objects, enter the following command:

    $ oc get podnetworkconnectivitycheck -n openshift-network-diagnostics

    Example output

    NAME                                                                                                                                AGE
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0   75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1   73m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2   75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-service-cluster                               75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-default-service-cluster                                 75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-external                                         75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-load-balancer-api-internal                                         75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-0            75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-1            75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-master-2            75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh      74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-c-n8mbf      74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-ci-ln-x5sv9rb-f76d1-4rzrp-worker-d-4hnrz      74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-network-check-target-service-cluster                               75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0    75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-1    75m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-2    74m
    network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-openshift-apiserver-service-cluster                                75m

  2. View the connection test logs:

    1. From the output of the previous command, identify the endpoint that you want to review the connectivity logs for.
    2. To view the object, enter the following command:

      $ oc get podnetworkconnectivitycheck <name> \
        -n openshift-network-diagnostics -o yaml

      where <name> specifies the name of the PodNetworkConnectivityCheck object.

      Example output

      apiVersion: controlplane.operator.openshift.io/v1alpha1
      kind: PodNetworkConnectivityCheck
      metadata:
        name: network-check-source-ci-ln-x5sv9rb-f76d1-4rzrp-worker-b-6xdmh-to-kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0
        namespace: openshift-network-diagnostics
        ...
      spec:
        sourcePod: network-check-source-7c88f6d9f-hmg2f
        targetEndpoint: 10.0.0.4:6443
        tlsClientCert:
          name: ""
      status:
        conditions:
        - lastTransitionTime: "2021-01-13T20:11:34Z"
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnectSuccess
          status: "True"
          type: Reachable
        failures:
        - latency: 2.241775ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed
            to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect:
            connection refused'
          reason: TCPConnectError
          success: false
          time: "2021-01-13T20:10:34Z"
        - latency: 2.582129ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed
            to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect:
            connection refused'
          reason: TCPConnectError
          success: false
          time: "2021-01-13T20:09:34Z"
        - latency: 3.483578ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: failed
            to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443: connect:
            connection refused'
          reason: TCPConnectError
          success: false
          time: "2021-01-13T20:08:34Z"
        outages:
        - end: "2021-01-13T20:11:34Z"
          endLogs:
          - latency: 2.032018ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              tcp connection to 10.0.0.4:6443 succeeded'
            reason: TCPConnect
            success: true
            time: "2021-01-13T20:11:34Z"
          - latency: 2.241775ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:10:34Z"
          - latency: 2.582129ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:09:34Z"
          - latency: 3.483578ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:08:34Z"
          message: Connectivity restored after 2m59.999789186s
          start: "2021-01-13T20:08:34Z"
          startLogs:
          - latency: 3.483578ms
            message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0:
              failed to establish a TCP connection to 10.0.0.4:6443: dial tcp 10.0.0.4:6443:
              connect: connection refused'
            reason: TCPConnectError
            success: false
            time: "2021-01-13T20:08:34Z"
        successes:
        - latency: 2.845865ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:14:34Z"
        - latency: 2.926345ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:13:34Z"
        - latency: 2.895796ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:12:34Z"
        - latency: 2.696844ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:11:34Z"
        - latency: 1.502064ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:10:34Z"
        - latency: 1.388857ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:09:34Z"
        - latency: 1.906383ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:08:34Z"
        - latency: 2.089073ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:07:34Z"
        - latency: 2.156994ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:06:34Z"
        - latency: 1.777043ms
          message: 'kubernetes-apiserver-endpoint-ci-ln-x5sv9rb-f76d1-4rzrp-master-0: tcp
            connection to 10.0.0.4:6443 succeeded'
          reason: TCPConnect
          success: true
          time: "2021-01-13T21:05:34Z"

Chapter 11. Changing the MTU for the cluster network

As a cluster administrator, you can change the MTU for the cluster network after cluster installation. This change is disruptive as cluster nodes must be rebooted to finalize the MTU change. You can change the MTU only for clusters using the OVN-Kubernetes or OpenShift SDN network plugins.

11.1. About the cluster MTU

During installation the maximum transmission unit (MTU) for the cluster network is detected automatically based on the MTU of the primary network interface of nodes in the cluster. You do not normally need to override the detected MTU.

You might want to change the MTU of the cluster network for several reasons:

  • The MTU detected during cluster installation is not correct for your infrastructure
  • Your cluster infrastructure now requires a different MTU, such as from the addition of nodes that need a different MTU for optimal performance

You can change the cluster MTU for only the OVN-Kubernetes and OpenShift SDN cluster network plugins.

11.1.1. Service interruption considerations

When you initiate an MTU change on your cluster, the following effects might impact service availability:

  • At least two rolling reboots are required to complete the migration to a new MTU. During this time, some nodes are not available as they restart.
  • Specific applications deployed to the cluster with shorter timeout intervals than the absolute TCP timeout interval might experience disruption during the MTU change.

11.1.2. MTU value selection

When planning your MTU migration there are two related but distinct MTU values to consider.

  • Hardware MTU: This MTU value is set based on the specifics of your network infrastructure.
  • Cluster network MTU: This MTU value is always less than your hardware MTU to account for the cluster network overlay overhead. The specific overhead is determined by your network plugin:

    • OVN-Kubernetes: 100 bytes
    • OpenShift SDN: 50 bytes

If your cluster requires different MTU values for different nodes, you must subtract the overhead value for your network plugin from the lowest MTU value that is used by any node in your cluster. For example, if some nodes in your cluster have an MTU of 9001, and some have an MTU of 1500, you must set this value to 1400.

Important

To avoid selecting an MTU value that is not acceptable by a node, verify the maximum MTU value (maxmtu) that is accepted by the network interface by using the ip -d link command.
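
For example, a minimal sketch of checking maxmtu on a node's interface by using a debug pod; the node and interface names are placeholders, and not every driver reports a maxmtu value:

  $ oc debug node/<node_name> -- chroot /host ip -d link show <interface> | grep -o 'maxmtu [0-9]*'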

11.1.3. How the migration process works

The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.

Table 11.1. Live migration of the cluster MTU
User-initiated stepsOpenShift Container Platform activity

Set the following values in the Cluster Network Operator configuration:

  • spec.migration.mtu.machine.to
  • spec.migration.mtu.network.from
  • spec.migration.mtu.network.to

Cluster Network Operator (CNO): Confirms that each field is set to a valid value.

  • The mtu.machine.to must be set to either the new hardware MTU or to the current hardware MTU if the MTU for the hardware is not changing. This value is transient and is used as part of the migration process. Separately, if you specify a hardware MTU that is different from your existing hardware MTU value, you must manually configure the MTU to persist by other means, such as with a machine config, DHCP setting, or a Linux kernel command line.
  • The mtu.network.from field must equal the network.status.clusterNetworkMTU field, which is the current MTU of the cluster network.
  • The mtu.network.to field must be set to the target cluster network MTU and must be lower than the hardware MTU to allow for the overlay overhead of the network plugin. For OVN-Kubernetes, the overhead is 100 bytes and for OpenShift SDN the overhead is 50 bytes.

If the values provided are valid, the CNO writes out a new temporary configuration with the MTU for the cluster network set to the value of the mtu.network.to field.

Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster.

Reconfigure the MTU of the primary network interface for the nodes on the cluster. You can use a variety of methods to accomplish this, including:

  • Deploying a new NetworkManager connection profile with the MTU change
  • Changing the MTU through a DHCP server setting
  • Changing the MTU through boot parameters

N/A

Set the mtu value in the CNO configuration for the network plugin and set spec.migration to null.

Machine Config Operator (MCO): Performs a rolling reboot of each node in the cluster with the new MTU configuration.

11.2. Changing the cluster MTU

As a cluster administrator, you can change the maximum transmission unit (MTU) for your cluster. The migration is disruptive and nodes in your cluster might be temporarily unavailable as the MTU update rolls out.

The following procedure describes how to change the cluster MTU by using either machine configs, DHCP, or an ISO. If you use the DHCP or ISO approach, you must refer to configuration artifacts that you kept after installing your cluster to complete the procedure.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You identified the target MTU for your cluster. The correct MTU varies depending on the network plugin that your cluster uses:

    • OVN-Kubernetes: The cluster MTU must be set to 100 less than the lowest hardware MTU value in your cluster.
    • OpenShift SDN: The cluster MTU must be set to 50 less than the lowest hardware MTU value in your cluster.

Procedure

To increase or decrease the MTU for the cluster network, complete the following procedure.

  1. To obtain the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster

    Example output

    ...
    Status:
      Cluster Network:
        Cidr:               10.217.0.0/22
        Host Prefix:        23
      Cluster Network MTU:  1400
      Network Type:         OpenShiftSDN
      Service Network:
        10.217.4.0/23
    ...

  2. Prepare your configuration for the hardware MTU:

    • If your hardware MTU is specified with DHCP, update your DHCP configuration such as with the following dnsmasq configuration:

      dhcp-option-force=26,<mtu>

      where:

      <mtu>
      Specifies the hardware MTU for the DHCP server to advertise.
    • If your hardware MTU is specified with a kernel command line with PXE, update that configuration accordingly.
    • If your hardware MTU is specified in a NetworkManager connection configuration, complete the following steps. This approach is the default for OpenShift Container Platform if you do not explicitly specify your network configuration with DHCP, a kernel command line, or some other method. Your cluster nodes must all use the same underlying network configuration for the following procedure to work unmodified.

      1. Find the primary network interface:

        • If you are using the OpenShift SDN network plugin, enter the following command:

          $ oc debug node/<node_name> -- chroot /host ip route list match 0.0.0.0/0 | awk '{print $5 }'

          where:

          <node_name>
          Specifies the name of a node in your cluster.
        • If you are using the OVN-Kubernetes network plugin, enter the following command:

          $ oc debug node/<node_name> -- chroot /host nmcli -g connection.interface-name c show ovs-if-phys0

          where:

          <node_name>
          Specifies the name of a node in your cluster.
      2. Create the following NetworkManager configuration in the <interface>-mtu.conf file:

        Example NetworkManager connection configuration

        [connection-<interface>-mtu]
        match-device=interface-name:<interface>
        ethernet.mtu=<mtu>

        where:

        <mtu>
        Specifies the new hardware MTU value.
        <interface>
        Specifies the primary network interface name.
      3. Create two MachineConfig objects, one for the control plane nodes and another for the worker nodes in your cluster:

        1. Create the following Butane config in the control-plane-interface.bu file:

          variant: openshift
          version: 4.13.0
          metadata:
            name: 01-control-plane-interface
            labels:
              machineconfiguration.openshift.io/role: master
          storage:
            files:
              - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 1
                contents:
                  local: <interface>-mtu.conf 2
                mode: 0600
          1
          Specify the NetworkManager connection name for the primary network interface.
          2
          Specify the local filename for the updated NetworkManager configuration file from the previous step.
        2. Create the following Butane config in the worker-interface.bu file:

          variant: openshift
          version: 4.13.0
          metadata:
            name: 01-worker-interface
            labels:
              machineconfiguration.openshift.io/role: worker
          storage:
            files:
              - path: /etc/NetworkManager/conf.d/99-<interface>-mtu.conf 1
                contents:
                  local: <interface>-mtu.conf 2
                mode: 0600
          1
          Specify the NetworkManager connection name for the primary network interface.
          2
          Specify the local filename for the updated NetworkManager configuration file from the previous step.
        3. Create MachineConfig objects from the Butane configs by running the following command:

          $ for manifest in control-plane-interface worker-interface; do
              butane --files-dir . $manifest.bu > $manifest.yaml
            done
  3. To begin the MTU migration, specify the migration configuration by entering the following command. The Machine Config Operator performs a rolling reboot of the nodes in the cluster in preparation for the MTU change.

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": { "mtu": { "network": { "from": <overlay_from>, "to": <overlay_to> } , "machine": { "to" : <machine_to> } } } } }'

    where:

    <overlay_from>
    Specifies the current cluster network MTU value.
    <overlay_to>
    Specifies the target MTU for the cluster network. This value is set relative to the value of <machine_to>: for OVN-Kubernetes it must be 100 less, and for OpenShift SDN it must be 50 less.
    <machine_to>
    Specifies the MTU for the primary network interface on the underlying host network.

    Example that increases the cluster MTU

    $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
      '{"spec": { "migration": { "mtu": { "network": { "from": 1400, "to": 9000 } , "machine": { "to" : 9100} } } } }'

  4. As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.
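
    As an alternative to repeatedly running oc get mcp, you can block until every pool reports the Updated condition. The following command is a sketch that relies on the standard oc wait behavior; adjust the timeout to the size of your cluster:

    $ oc wait mcp --all --for=condition=Updated --timeout=60m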

  5. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of the machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      ExecStart=/usr/local/bin/mtu-migration.sh
  6. Update the underlying network interface MTU value:

    • If you are specifying the new MTU with a NetworkManager connection configuration, enter the following command. The Machine Config Operator automatically performs a rolling reboot of the nodes in your cluster.

      $ for manifest in control-plane-interface worker-interface; do
          oc create -f $manifest.yaml
        done
    • If you are specifying the new MTU with a DHCP server option or a kernel command line and PXE, make the necessary changes for your infrastructure.
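
      For example, if your nodes get their network configuration from a dnsmasq-based DHCP server, the interface MTU can be advertised with DHCP option 26. The following snippet is only an illustrative sketch; the value and server configuration must match your infrastructure:

      # Hypothetical dnsmasq configuration snippet that advertises an MTU of 9100
      dhcp-option=option:mtu,9100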
  7. As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  8. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of the machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep path:

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      If the machine config is successfully deployed, the previous output contains the /etc/NetworkManager/conf.d/99-<interface>-mtu.conf file path and the ExecStart=/usr/local/bin/mtu-migration.sh line.

  9. To finalize the MTU migration, enter one of the following commands:

    • If you are using the OVN-Kubernetes network plugin:

      $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
        '{"spec": { "migration": null, "defaultNetwork":{ "ovnKubernetesConfig": { "mtu": <mtu> }}}}'

      where:

      <mtu>
      Specifies the new cluster network MTU that you specified with <overlay_to>.
    • If you are using the OpenShift SDN network plugin:

      $ oc patch Network.operator.openshift.io cluster --type=merge --patch \
        '{"spec": { "migration": null, "defaultNetwork":{ "openshiftSDNConfig": { "mtu": <mtu> }}}}'

      where:

      <mtu>
      Specifies the new cluster network MTU that you specified with <overlay_to>.
  10. After finalizing the MTU migration, each MCP node is rebooted one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

Verification

You can verify that a node in your cluster uses an MTU that you specified in the previous procedure.

  1. To get the current MTU for the cluster network, enter the following command:

    $ oc describe network.config cluster
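
    If you only need the numeric value, the cluster MTU is also reported in the resource status. The following jsonpath query is a sketch; the field is populated after the migration is finalized:

    $ oc get network.config cluster -o jsonpath='{.status.clusterNetworkMTU}{"\n"}'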
  2. Get the current MTU for the primary network interface of a node.

    1. To list the nodes in your cluster, enter the following command:

      $ oc get nodes
    2. To obtain the current MTU setting for the primary network interface on a node, enter the following command:

      $ oc debug node/<node> -- chroot /host ip address show <interface>

      where:

      <node>
      Specifies a node from the output from the previous step.
      <interface>
      Specifies the primary network interface name for the node.

      Example output

      ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8051

11.3. Additional resources

Chapter 12. Configuring the node port service range

As a cluster administrator, you can expand the available node port range. If your cluster uses a large number of node ports, you might need to increase the number of available ports.

The default port range is 30000-32767. You can never reduce the port range, even if you first expand it beyond the default range.

12.1. Prerequisites

  • Your cluster infrastructure must allow access to the ports that you specify within the expanded range. For example, if you expand the node port range to 30000-32900, the inclusive port range of 32768-32900 must be allowed by your firewall or packet filtering configuration.
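
    For example, if the packet filtering is handled by a Linux host that runs firewalld, a hypothetical way to allow the additional ports is shown in the following sketch. Replace the zone and port range with values that match your environment:

    $ sudo firewall-cmd --zone=public --permanent --add-port=32768-32900/tcp
    $ sudo firewall-cmd --reload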

12.2. Expanding the node port range

You can expand the node port range for the cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster as a user with cluster-admin privileges.

Procedure

  1. To expand the node port range, enter the following command. Replace <port> with the largest port number in the new range.

    $ oc patch network.config.openshift.io cluster --type=merge -p \
      '{
        "spec":
          { "serviceNodePortRange": "30000-<port>" }
      }'
    Tip

    You can alternatively apply the following YAML to update the node port range:

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      serviceNodePortRange: "30000-<port>"

    Example output

    network.config.openshift.io/cluster patched

  2. To confirm that the configuration is active, enter the following command. It can take several minutes for the update to apply.

    $ oc get configmaps -n openshift-kube-apiserver config \
      -o jsonpath="{.data['config\.yaml']}" | \
      grep -Eo '"service-node-port-range":["[[:digit:]]+-[[:digit:]]+"]'

    Example output

    "service-node-port-range":["30000-33000"]

12.3. Additional resources

Chapter 13. Configuring the cluster network range

As a cluster administrator, you can expand the cluster network range after cluster installation. You might want to expand the cluster network range if you need more IP addresses for additional nodes.

For example, if you deployed a cluster and specified 10.128.0.0/19 as the cluster network range and a host prefix of 23, you are limited to 16 nodes. You can expand that to 510 nodes by changing the CIDR mask on a cluster to /14.
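
The node counts follow from how many host-prefix-sized subnets fit into the cluster network CIDR: each node is assigned one /23 subnet, so the upper bound is 2^(hostPrefix - mask). The following shell arithmetic is a quick sanity check; note that the /14 figure quoted above is slightly below the raw bound:

$ echo $(( 2 ** (23 - 19) ))  # 16 node subnets for a /19 cluster network with hostPrefix 23
$ echo $(( 2 ** (23 - 14) ))  # 512 node subnets for a /14 cluster network with hostPrefix 23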

When expanding the cluster network address range, your cluster must use the OVN-Kubernetes network plugin. Other network plugins are not supported.

The following limitations apply when modifying the cluster network IP address range:

  • The CIDR mask size specified must always be smaller than the currently configured CIDR mask size, because you can only increase IP space by adding more nodes to an installed cluster.
  • The host prefix cannot be modified.
  • Pods that are configured with an overridden default gateway must be recreated after the cluster network expands.

13.1. Expanding the cluster network IP address range

You can expand the IP address range for the cluster network. Because this change requires rolling out a new Operator configuration across the cluster, it can take up to 30 minutes to take effect.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster as a user with cluster-admin privileges.
  • Ensure that the cluster uses the OVN-Kubernetes network plugin.

Procedure

  1. To obtain the cluster network range and host prefix for your cluster, enter the following command:

    $ oc get network.operator.openshift.io \
      -o jsonpath="{.items[0].spec.clusterNetwork}"

    Example output

    [{"cidr":"10.217.0.0/22","hostPrefix":23}]

  2. To expand the cluster network IP address range, enter the following command. Use the CIDR IP address range and host prefix returned from the output of the previous command.

    $ oc patch Network.config.openshift.io cluster --type='merge' --patch \
      '{
        "spec":{
          "clusterNetwork": [ {"cidr":"<network>/<cidr>","hostPrefix":<prefix>} ],
          "networkType": "OVNKubernetes"
        }
      }'

    where:

    <network>
    Specifies the network part of the cidr field that you obtained from the previous step. You cannot change this value.
    <cidr>
    Specifies the network prefix length. For example, 14. Change this value to a smaller number than the value from the output in the previous step to expand the cluster network range.
    <prefix>
    Specifies the current host prefix for your cluster. This value must be the same value for the hostPrefix field that you obtained from the previous step.

    Example command

    $ oc patch Network.config.openshift.io cluster --type='merge' --patch \
      '{
        "spec":{
          "clusterNetwork": [ {"cidr":"10.217.0.0/14","hostPrefix": 23} ],
          "networkType": "OVNKubernetes"
        }
      }'

    Example output

    network.config.openshift.io/cluster patched

  3. To confirm that the configuration is active, enter the following command. It can take up to 30 minutes for this change to take effect.

    $ oc get network.operator.openshift.io \
      -o jsonpath="{.items[0].spec.clusterNetwork}"

    Example output

    [{"cidr":"10.217.0.0/14","hostPrefix":23}]

13.2. Additional resources

Chapter 14. Configuring IP failover

This topic describes configuring IP failover for pods and services on your OpenShift Container Platform cluster.

IP failover uses Keepalived to host a set of externally accessible Virtual IP (VIP) addresses on a set of hosts. Each VIP address is only serviced by a single host at a time. Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to determine which host, from the set of hosts, services which VIP. If a host becomes unavailable, or if the service that Keepalived is watching does not respond, the VIP is switched to another host from the set. This means a VIP is always serviced as long as a host is available.

Every VIP in the set is serviced by a node selected from the set. If a single node is available, the VIPs are served. There is no way to explicitly distribute the VIPs over the nodes, so there can be nodes with no VIPs and other nodes with many VIPs. If there is only one node, all VIPs are on it.

The administrator must ensure that all of the VIP addresses meet the following requirements:

  • Accessible on the configured hosts from outside the cluster.
  • Not used for any other purpose within the cluster.

Keepalived on each node determines whether the needed service is running. If it is, VIPs are supported and Keepalived participates in the negotiation to determine which node serves the VIP. For a node to participate, the service must be listening on the watch port on a VIP or the check must be disabled.

Note

Each VIP in the set might be served by a different node.

IP failover monitors a port on each VIP to determine whether the port is reachable on the node. If the port is not reachable, the VIP is not assigned to the node. If the port is set to 0, this check is suppressed. The check script does the needed testing.

When a node running Keepalived passes the check script, the VIP on that node can enter the master state based on its priority and the priority of the current master and as determined by the preemption strategy.

A cluster administrator can provide a script through the OPENSHIFT_HA_NOTIFY_SCRIPT variable, and this script is called whenever the state of the VIP on the node changes. Keepalived uses the master state when it is servicing the VIP, the backup state when another node is servicing the VIP, or the fault state when the check script fails. The notify script is called with the new state whenever the state changes.

You can create an IP failover deployment configuration on OpenShift Container Platform. The IP failover deployment configuration specifies the set of VIP addresses, and the set of nodes on which to service them. A cluster can have multiple IP failover deployment configurations, with each managing its own set of unique VIP addresses. Each node in the IP failover configuration runs an IP failover pod, and this pod runs Keepalived.

When using VIPs to access a pod with host networking, the application pod runs on all nodes that are running the IP failover pods. This enables any of the IP failover nodes to become the master and service the VIPs when needed. If application pods are not running on all nodes with IP failover, either some IP failover nodes never service the VIPs or some application pods never receive any traffic. Use the same selector and replication count for both IP failover and the application pods to avoid this mismatch.

While using VIPs to access a service, any of the nodes can be in the IP failover set of nodes, since the service is reachable on all nodes, no matter where the application pod is running. Any of the IP failover nodes can become master at any time. The service can either use external IPs and a service port or it can use a NodePort. Setting up a NodePort is a privileged operation.

When using external IPs in the service definition, the VIPs are set to the external IPs, and the IP failover monitoring port is set to the service port. When using a node port, the port is open on every node in the cluster, and the service load-balances traffic from whatever node currently services the VIP. In this case, the IP failover monitoring port is set to the NodePort in the service definition.

Important

Even though a service VIP is highly available, performance can still be affected. Keepalived makes sure that each of the VIPs is serviced by some node in the configuration, and several VIPs can end up on the same node even when other nodes have none. Strategies that externally load-balance across a set of VIPs can be thwarted when IP failover puts multiple VIPs on the same node.

When you use ExternalIP, you can set up IP failover to have the same VIP range as the ExternalIP range. You can also disable the monitoring port. In this case, all of the VIPs appear on the same node in the cluster. Any user can set up a service with an ExternalIP and make it highly available.

Important

There is a maximum of 254 VIPs in the cluster.

14.1. IP failover environment variables

The following table contains the variables used to configure IP failover.

Table 14.1. IP failover environment variables
OPENSHIFT_HA_MONITOR_PORT
Default: 80
The IP failover pod tries to open a TCP connection to this port on each Virtual IP (VIP). If the connection is established, the service is considered to be running. If this port is set to 0, the test always passes.

OPENSHIFT_HA_NETWORK_INTERFACE
The interface name that IP failover uses to send Virtual Router Redundancy Protocol (VRRP) traffic. The default value is eth0.

OPENSHIFT_HA_REPLICA_COUNT
Default: 2
The number of replicas to create. This must match the spec.replicas value in the IP failover deployment configuration.

OPENSHIFT_HA_VIRTUAL_IPS
The list of IP address ranges to replicate. This value must be provided. For example, 1.2.3.4-6,1.2.3.9.

OPENSHIFT_HA_VRRP_ID_OFFSET
Default: 0
The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is 0, and the allowed range is 0 through 255.

OPENSHIFT_HA_VIP_GROUPS
The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the OPENSHIFT_HA_VIRTUAL_IPS variable.

OPENSHIFT_HA_IPTABLES_CHAIN
Default: INPUT
The name of the iptables chain to which an iptables rule is automatically added to allow VRRP traffic. If the value is not set, an iptables rule is not added. If the chain does not exist, it is not created.

OPENSHIFT_HA_CHECK_SCRIPT
The full path name in the pod file system of a script that is periodically run to verify that the application is operating.

OPENSHIFT_HA_CHECK_INTERVAL
Default: 2
The period, in seconds, that the check script is run.

OPENSHIFT_HA_NOTIFY_SCRIPT
The full path name in the pod file system of a script that is run whenever the state changes.

OPENSHIFT_HA_PREEMPTION
Default: preempt_delay 300
The strategy for handling a new higher priority host. The nopreempt strategy does not move the master from the lower priority host to the higher priority host.

14.2. Configuring IP failover in your cluster

As a cluster administrator, you can configure IP failover on an entire cluster, or on a subset of nodes, as defined by the label selector. You can also configure multiple IP failover deployments in your cluster, where each one is independent of the others.

The IP failover deployment ensures that a failover pod runs on each of the nodes matching the constraints or the label used.

This pod runs Keepalived, which can monitor an endpoint and use Virtual Router Redundancy Protocol (VRRP) to fail over the virtual IP (VIP) from one node to another if the first node cannot reach the service or endpoint.

For production use, set a selector that selects at least two nodes, and set replicas equal to the number of selected nodes.
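
For example, if the deployment selects worker nodes with the node-role.kubernetes.io/worker label, as in the example later in this section, you can count the matching nodes and use that number for the replicas value. This is a sketch, not a required step:

$ oc get nodes -l node-role.kubernetes.io/worker= --no-headers | wc -l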

Prerequisites

  • You are logged in to the cluster as a user with cluster-admin privileges.
  • You created a pull secret.

Procedure

  1. Create an IP failover service account:

    $ oc create sa ipfailover
  2. Update security context constraints (SCC) for hostNetwork:

    $ oc adm policy add-scc-to-user privileged -z ipfailover
    $ oc adm policy add-scc-to-user hostnetwork -z ipfailover
  3. Red Hat OpenStack Platform (RHOSP) only: Complete the following steps to make a failover VIP address reachable on RHOSP ports.

    1. Use the RHOSP CLI to show the default RHOSP API and VIP addresses in the allowed_address_pairs parameter of your RHOSP cluster:

      $ openstack port show <cluster_name> -c allowed_address_pairs

      Example output

      *Field*                  *Value*
      allowed_address_pairs    ip_address='192.168.0.5', mac_address='fa:16:3e:31:f9:cb'
                               ip_address='192.168.0.7', mac_address='fa:16:3e:31:f9:cb'

    2. Set a different VIP address for the IP failover deployment and make the address reachable on RHOSP ports by entering the following command in the RHOSP CLI. Do not set any default RHOSP API and VIP addresses as the failover VIP address for the IP failover deployment.

      Example of adding the 1.1.1.1 failover IP address as an allowed address on RHOSP ports.

      $ openstack port set <cluster_name> --allowed-address ip-address=1.1.1.1,mac-address=fa:16:3e:31:f9:cb

    3. Create a deployment YAML file to configure IP failover for your deployment. See "Example deployment YAML for IP failover configuration" in a later step.
    4. Specify the following specification in the IP failover deployment so that you pass the failover VIP address to the OPENSHIFT_HA_VIRTUAL_IPS environment variable:

      Example of adding the 1.1.1.1 VIP address to OPENSHIFT_HA_VIRTUAL_IPS

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: ipfailover-keepalived
      # ...
            spec:
                env:
                - name: OPENSHIFT_HA_VIRTUAL_IPS
                  value: "1.1.1.1"
      # ...

  4. Create a deployment YAML file to configure IP failover.

    Note

    For Red Hat OpenStack Platform (RHOSP), you do not need to re-create the deployment YAML file. You already created this file as part of the earlier instructions.

    Example deployment YAML for IP failover configuration

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: ipfailover-keepalived 1
      labels:
        ipfailover: hello-openshift
    spec:
      strategy:
        type: Recreate
      replicas: 2
      selector:
        matchLabels:
          ipfailover: hello-openshift
      template:
        metadata:
          labels:
            ipfailover: hello-openshift
        spec:
          serviceAccountName: ipfailover
          hostNetwork: true
          nodeSelector:
            node-role.kubernetes.io/worker: ""
          containers:
          - name: openshift-ipfailover
            image: registry.redhat.io/openshift4/ose-keepalived-ipfailover:v4.13
            ports:
            - containerPort: 63000
              hostPort: 63000
            imagePullPolicy: IfNotPresent
            securityContext:
              privileged: true
            volumeMounts:
            - name: lib-modules
              mountPath: /lib/modules
              readOnly: true
            - name: host-slash
              mountPath: /host
              readOnly: true
              mountPropagation: HostToContainer
            - name: etc-sysconfig
              mountPath: /etc/sysconfig
              readOnly: true
            - name: config-volume
              mountPath: /etc/keepalive
            env:
            - name: OPENSHIFT_HA_CONFIG_NAME
              value: "ipfailover"
            - name: OPENSHIFT_HA_VIRTUAL_IPS 2
              value: "1.1.1.1-2"
            - name: OPENSHIFT_HA_VIP_GROUPS 3
              value: "10"
            - name: OPENSHIFT_HA_NETWORK_INTERFACE 4
              value: "ens3" #The host interface to assign the VIPs
            - name: OPENSHIFT_HA_MONITOR_PORT 5
              value: "30060"
            - name: OPENSHIFT_HA_VRRP_ID_OFFSET 6
              value: "0"
            - name: OPENSHIFT_HA_REPLICA_COUNT 7
              value: "2" #Must match the number of replicas in the deployment
            - name: OPENSHIFT_HA_USE_UNICAST
              value: "false"
            #- name: OPENSHIFT_HA_UNICAST_PEERS
              #value: "10.0.148.40,10.0.160.234,10.0.199.110"
            - name: OPENSHIFT_HA_IPTABLES_CHAIN 8
              value: "INPUT"
            #- name: OPENSHIFT_HA_NOTIFY_SCRIPT 9
            #  value: /etc/keepalive/mynotifyscript.sh
            - name: OPENSHIFT_HA_CHECK_SCRIPT 10
              value: "/etc/keepalive/mycheckscript.sh"
            - name: OPENSHIFT_HA_PREEMPTION 11
              value: "preempt_delay 300"
            - name: OPENSHIFT_HA_CHECK_INTERVAL 12
              value: "2"
            livenessProbe:
              initialDelaySeconds: 10
              exec:
                command:
                - pgrep
                - keepalived
          volumes:
          - name: lib-modules
            hostPath:
              path: /lib/modules
          - name: host-slash
            hostPath:
              path: /
          - name: etc-sysconfig
            hostPath:
              path: /etc/sysconfig
          # config-volume contains the check script
          # created with `oc create configmap keepalived-checkscript --from-file=mycheckscript.sh`
          - configMap:
              defaultMode: 0755
              name: keepalived-checkscript
            name: config-volume
          imagePullSecrets:
            - name: openshift-pull-secret 13

    1
    The name of the IP failover deployment.
    2
    The list of IP address ranges to replicate. This must be provided. For example, 1.2.3.4-6,1.2.3.9.
    3
    The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the OPENSHIFT_HA_VIRTUAL_IPS variable.
    4
    The interface name that IP failover uses to send VRRP traffic. By default, eth0 is used.
    5
    The IP failover pod tries to open a TCP connection to this port on each VIP. If connection is established, the service is considered to be running. If this port is set to 0, the test always passes. The default value is 80.
    6
    The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is 0, and the allowed range is 0 through 255.
    7
    The number of replicas to create. This must match the spec.replicas value in the IP failover deployment configuration. The default value is 2.
    8
    The name of the iptables chain to automatically add an iptables rule to allow the VRRP traffic on. If the value is not set, an iptables rule is not added. If the chain does not exist, it is not created, and Keepalived operates in unicast mode. The default is INPUT.
    9
    The full path name in the pod file system of a script that is run whenever the state changes.
    10
    The full path name in the pod file system of a script that is periodically run to verify the application is operating.
    11
    The strategy for handling a new higher priority host. The default value is preempt_delay 300, which causes a Keepalived instance to take over a VIP after 5 minutes if a lower-priority master is holding the VIP.
    12
    The period, in seconds, that the check script is run. The default value is 2.
    13
    Create the pull secret before creating the deployment, otherwise you will get an error when creating the deployment.

14.3. Configuring check and notify scripts

Keepalived monitors the health of the application by periodically running an optional user-supplied check script. For example, the script can test a web server by issuing a request and verifying the response. As cluster administrator, you can provide an optional notify script, which is called whenever the state changes.

The check and notify scripts run in the IP failover pod and use the pod file system, not the host file system. However, the IP failover pod makes the host file system available under the /host mount path. When configuring a check or notify script, you must provide the full path to the script. The recommended approach for providing the scripts is to use a ConfigMap object.

The full path names of the check and notify scripts are added to the Keepalived configuration file, /etc/keepalived/keepalived.conf, which is loaded every time Keepalived starts. The scripts can be added to the pod with a ConfigMap object as described in the following methods.

Check script

When a check script is not provided, a simple default script is run that tests the TCP connection. This default test is suppressed when the monitor port is 0.

Each IP failover pod manages a Keepalived daemon that manages one or more virtual IP (VIP) addresses on the node where the pod is running. The Keepalived daemon keeps the state of each VIP for that node. A particular VIP on a particular node might be in master, backup, or fault state.

If the check script returns non-zero, the node enters the backup state, and any VIPs it holds are reassigned.
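
For example, a check script for a web application might issue an HTTP request and translate the result into the exit codes that Keepalived expects. The following sketch assumes a hypothetical health endpoint listening on port 8080 on the node:

#!/bin/bash
# Hypothetical check: exit 0 (OK) only if the local web server answers within 2 seconds
if curl -fsS --max-time 2 http://127.0.0.1:8080/healthz > /dev/null; then
  exit 0
else
  exit 1
fi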

Notify script

Keepalived passes the following three parameters to the notify script:

  • $1 - group or instance
  • $2 - Name of the group or instance
  • $3 - The new state: master, backup, or fault
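
A minimal notify script can simply record each transition that Keepalived reports through these parameters. The log path in the following sketch is hypothetical:

#!/bin/bash
# $1 = group or instance, $2 = name of the group or instance, $3 = new state (master, backup, or fault)
echo "$(date) $1 $2 entered state $3" >> /tmp/keepalived-state.log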

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster as a user with cluster-admin privileges.

Procedure

  1. Create the desired script and create a ConfigMap object to hold it. The script has no input arguments and must return 0 for OK and 1 for fail.

    The check script, mycheckscript.sh:

    #!/bin/bash
    # Whatever tests are needed
    # E.g., send request and verify response
    exit 0
  2. Create the ConfigMap object :

    $ oc create configmap mycustomcheck --from-file=mycheckscript.sh
  3. Add the script to the pod. The defaultMode for the mounted ConfigMap object files must allow the script to run; you can set it by using oc commands or by editing the deployment configuration. A value of 0755, 493 decimal, is typical:

    $ oc set env deploy/ipfailover-keepalived \
        OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
    $ oc set volume deploy/ipfailover-keepalived --add --overwrite \
        --name=config-volume \
        --mount-path=/etc/keepalive \
        --source='{"configMap": { "name": "mycustomcheck", "defaultMode": 493}}'
    Note

    The oc set env command is whitespace sensitive. There must be no whitespace on either side of the = sign.

    Tip

    You can alternatively edit the ipfailover-keepalived deployment configuration:

    $ oc edit deploy ipfailover-keepalived
        spec:
          containers:
          - env:
            - name: OPENSHIFT_HA_CHECK_SCRIPT  1
              value: /etc/keepalive/mycheckscript.sh
    ...
            volumeMounts: 2
            - mountPath: /etc/keepalive
              name: config-volume
          dnsPolicy: ClusterFirst
    ...
          volumes: 3
          - configMap:
              defaultMode: 0755 4
              name: mycustomcheck
            name: config-volume
    ...
    1
    In the spec.container.env field, add the OPENSHIFT_HA_CHECK_SCRIPT environment variable to point to the mounted script file.
    2
    Add the spec.container.volumeMounts field to create the mount point.
    3
    Add a new spec.volumes field to mention the config map.
    4
    This sets run permission on the files. When read back, it is displayed in decimal, 493.

    Save the changes and exit the editor. This restarts ipfailover-keepalived.

14.4. Configuring VRRP preemption

When a Virtual IP (VIP) on a node leaves the fault state by passing the check script, the VIP on the node enters the backup state if it has lower priority than the VIP on the node that is currently in the master state. The nopreempt strategy does not move master from the lower priority VIP on the host to the higher priority VIP on the host. With preempt_delay 300, the default, Keepalived waits the specified 300 seconds and moves master to the higher priority VIP on the host.

Procedure

  • To specify preemption enter oc edit deploy ipfailover-keepalived to edit the router deployment configuration:

    $ oc edit deploy ipfailover-keepalived
    ...
        spec:
          containers:
          - env:
            - name: OPENSHIFT_HA_PREEMPTION  1
              value: preempt_delay 300
    ...
    1
    Set the OPENSHIFT_HA_PREEMPTION value:
    • preempt_delay 300: Keepalived waits the specified 300 seconds and moves master to the higher priority VIP on the host. This is the default value.
    • nopreempt: does not move master from the lower priority VIP on the host to the higher priority VIP on the host.

14.5. Deploying multiple IP failover instances

Each IP failover pod managed by the IP failover deployment configuration, 1 pod per node or replica, runs a Keepalived daemon. As more IP failover deployment configurations are configured, more pods are created and more daemons join into the common Virtual Router Redundancy Protocol (VRRP) negotiation. This negotiation is done by all the Keepalived daemons and it determines which nodes service which virtual IPs (VIP).

Internally, Keepalived assigns a unique vrrp-id to each VIP. The negotiation uses this set of vrrp-ids; when a decision is made, the VIP corresponding to the winning vrrp-id is serviced on the winning node.

Therefore, for every VIP defined in the IP failover deployment configuration, the IP failover pod must assign a corresponding vrrp-id. This is done by starting at OPENSHIFT_HA_VRRP_ID_OFFSET and sequentially assigning the vrrp-ids to the list of VIPs. The vrrp-ids can have values in the range 1..255.

When there are multiple IP failover deployment configurations, you must specify OPENSHIFT_HA_VRRP_ID_OFFSET so that there is room to increase the number of VIPs in the deployment configuration and none of the vrrp-id ranges overlap.
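
For example, with two IP failover deployments you can keep their vrrp-id ranges apart by giving the second deployment an offset that is larger than the number of VIPs in the first. The addresses and offsets in the following environment snippets are illustrative only:

# Deployment 1: three VIPs, vrrp-ids assigned starting at offset 0
- name: OPENSHIFT_HA_VIRTUAL_IPS
  value: "192.168.1.10-12"
- name: OPENSHIFT_HA_VRRP_ID_OFFSET
  value: "0"
# Deployment 2: two VIPs, an offset of 10 keeps its vrrp-ids clear of deployment 1
- name: OPENSHIFT_HA_VIRTUAL_IPS
  value: "192.168.1.20-21"
- name: OPENSHIFT_HA_VRRP_ID_OFFSET
  value: "10"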

14.6. Configuring IP failover for more than 254 addresses

IP failover management is limited to 254 groups of Virtual IP (VIP) addresses. By default, OpenShift Container Platform assigns one IP address to each group. You can use the OPENSHIFT_HA_VIP_GROUPS variable to change this so that each group contains multiple IP addresses, and to define the number of VIP groups available for each Virtual Router Redundancy Protocol (VRRP) instance when configuring IP failover.

Grouping VIPs creates a wider range of allocation of VIPs per VRRP in the case of VRRP failover events, and is useful when all hosts in the cluster have access to a service locally. For example, when a service is being exposed with an ExternalIP.

Note

As a rule for failover, do not limit services, such as the router, to one specific host. Instead, services should be replicated to each host so that in the case of IP failover, the services do not have to be recreated on the new host.

Note

If you are using OpenShift Container Platform health checks, the nature of IP failover and groups means that not all instances in the group are checked. For that reason, the Kubernetes health checks must be used to ensure that services are live.

Prerequisites

  • You are logged in to the cluster as a user with cluster-admin privileges.

Procedure

  • To change the number of IP addresses assigned to each group, change the value for the OPENSHIFT_HA_VIP_GROUPS variable, for example:

    Example Deployment YAML for IP failover configuration

    ...
        spec:
            env:
            - name: OPENSHIFT_HA_VIP_GROUPS 1
              value: "3"
    ...

    1
    If OPENSHIFT_HA_VIP_GROUPS is set to 3 in an environment with seven VIPs, it creates three groups, assigning three VIPs to the first group, and two VIPs to the two remaining groups.
Note

If the number of groups set by OPENSHIFT_HA_VIP_GROUPS is fewer than the number of IP addresses set to fail over, the group contains more than one IP address, and all of the addresses move as a single unit.

14.7. High availability for ExternalIP

In non-cloud clusters, IP failover and ExternalIP to a service can be combined. The result is high availability services for users that create services using ExternalIP.

The approach is to specify a spec.externalIP.autoAssignCIDRs range in the cluster network configuration, and then use the same range when creating the IP failover configuration.

Because IP failover can support a maximum of 255 VIPs for the entire cluster, the spec.externalIP.autoAssignCIDRs range must be /24 or smaller.
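
The following sketch shows how the two settings line up, assuming the illustrative range 192.168.132.0/29. The same addresses appear in the autoAssignCIDRs field of the cluster Network configuration and in the OPENSHIFT_HA_VIRTUAL_IPS variable of the IP failover deployment:

# Cluster network configuration (network.config.openshift.io/cluster)
spec:
  externalIP:
    autoAssignCIDRs:
    - 192.168.132.0/29

# Matching VIP range in the IP failover deployment
- name: OPENSHIFT_HA_VIRTUAL_IPS
  value: "192.168.132.1-6"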

14.8. Removing IP failover

When IP failover is initially configured, the worker nodes in the cluster are modified with an iptables rule that explicitly allows multicast packets on 224.0.0.18 for Keepalived. Because of the change to the nodes, removing IP failover requires running a job to remove the iptables rule and removing the virtual IP addresses used by Keepalived.

Procedure

  1. Optional: Identify and delete any check and notify scripts that are stored as config maps:

    1. Identify whether any pods for IP failover use a config map as a volume:

      $ oc get pod -l ipfailover \
        -o jsonpath="\
      {range .items[?(@.spec.volumes[*].configMap)]}
      {'Namespace: '}{.metadata.namespace}
      {'Pod:       '}{.metadata.name}
      {'Volumes that use config maps:'}
      {range .spec.volumes[?(@.configMap)]}  {'volume:    '}{.name}
        {'configMap: '}{.configMap.name}{'\n'}{end}
      {end}"

      Example output

      Namespace: default
      Pod:       keepalived-worker-59df45db9c-2x9mn
      Volumes that use config maps:
        volume:    config-volume
        configMap: mycustomcheck

    2. If the preceding step provided the names of config maps that are used as volumes, delete the config maps:

      $ oc delete configmap <configmap_name>
  2. Identify an existing deployment for IP failover:

    $ oc get deployment -l ipfailover

    Example output

    NAMESPACE   NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    default     ipfailover   2/2     2            2           105d

  3. Delete the deployment:

    $ oc delete deployment <ipfailover_deployment_name>
  4. Remove the ipfailover service account:

    $ oc delete sa ipfailover
  5. Run a job that removes the iptables rule that was added when IP failover was initially configured:

    1. Create a file such as remove-ipfailover-job.yaml with contents that are similar to the following example:

      apiVersion: batch/v1
      kind: Job
      metadata:
        generateName: remove-ipfailover-
        labels:
          app: remove-ipfailover
      spec:
        template:
          metadata:
            name: remove-ipfailover
          spec:
            containers:
            - name: remove-ipfailover
              image: registry.redhat.io/openshift4/ose-keepalived-ipfailover:v4.13
              command: ["/var/lib/ipfailover/keepalived/remove-failover.sh"]
            nodeSelector: 1
              kubernetes.io/hostname: <host_name>  2
            restartPolicy: Never
      1
      The nodeSelector is likely the same as the selector used in the old IP failover deployment.
      2
      Run the job for each node in your cluster that was configured for IP failover and replace the hostname each time.
    2. Run the job:

      $ oc create -f remove-ipfailover-job.yaml

      Example output

      job.batch/remove-ipfailover-2h8dm created

Verification

  • Confirm that the job removed the initial configuration for IP failover.

    $ oc logs job/remove-ipfailover-2h8dm

    Example output

    remove-failover.sh: OpenShift IP Failover service terminating.
      - Removing ip_vs module ...
      - Cleaning up ...
      - Releasing VIPs  (interface eth0) ...

Chapter 15. Configuring interface-level network sysctls

In Linux, sysctl allows an administrator to modify kernel parameters at runtime. You can modify interface-level network sysctls using the tuning Container Network Interface (CNI) meta plugin. The tuning CNI meta plugin operates in a chain with a main CNI plugin.

The main CNI plugin assigns the interface and passes this to the tuning CNI meta plugin at runtime. You can change some sysctls and several interface attributes (promiscuous mode, all-multicast mode, MTU, and MAC address) in the network namespace by using the tuning CNI meta plugin. In the tuning CNI meta plugin configuration, the interface name is represented by the IFNAME token, and is replaced with the actual name of the interface at runtime.

Note

In OpenShift Container Platform, the tuning CNI meta plugin only supports changing interface-level network sysctls.

15.1. Configuring the tuning CNI

The following procedure configures the tuning CNI to change the interface-level network net.ipv4.conf.IFNAME.accept_redirects sysctl. This example enables accepting ICMP redirect packets.

Procedure

  1. Create a network attachment definition, such as tuning-example.yaml, with the following content:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: <name> 1
      namespace: default 2
    spec:
      config: '{
        "cniVersion": "0.4.0", 3
        "name": "<name>", 4
        "plugins": [{
           "type": "<main_CNI_plugin>" 5
          },
          {
           "type": "tuning", 6
           "sysctl": {
                "net.ipv4.conf.IFNAME.accept_redirects": "1" 7
            }
          }
         ]
    }'
    1
    Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
    2
    Specifies the namespace that the object is associated with.
    3
    Specifies the CNI specification version.
    4
    Specifies the name for the configuration. It is recommended to match the configuration name to the name value of the network attachment definition.
    5
    Specifies the name of the main CNI plugin to configure.
    6
    Specifies the name of the CNI meta plugin.
    7
    Specifies the sysctl to set.

    An example yaml file is shown here:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: tuningnad
      namespace: default
    spec:
      config: '{
        "cniVersion": "0.4.0",
        "name": "tuningnad",
        "plugins": [{
          "type": "bridge"
          },
          {
          "type": "tuning",
          "sysctl": {
             "net.ipv4.conf.IFNAME.accept_redirects": "1"
            }
        }
      ]
    }'
  2. Apply the yaml by running the following command:

    $ oc apply -f tuning-example.yaml

    Example output

    networkattachmentdefinition.k8s.cni.cncf.io/tuningnad created

  3. Create a pod such as examplepod.yaml with the network attachment definition similar to the following:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tunepod
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/networks: tuningnad 1
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000 2
          runAsGroup: 3000 3
          allowPrivilegeEscalation: false 4
          capabilities: 5
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true 6
        seccompProfile: 7
          type: RuntimeDefault
    1
    Specify the name of the configured NetworkAttachmentDefinition.
    2
    runAsUser controls which user ID the container is run with.
    3
    runAsGroup controls which primary group ID the container is run with.
    4
    allowPrivilegeEscalation determines if a pod can request to allow privilege escalation. If unspecified, it defaults to true. This boolean directly controls whether the no_new_privs flag gets set on the container process.
    5
    capabilities permit privileged actions without giving full root access. This policy ensures all capabilities are dropped from the pod.
    6
    runAsNonRoot: true requires that the container will run with a user with any UID other than 0.
    7
    RuntimeDefault enables the default seccomp profile for a pod or container workload.
  4. Apply the yaml by running the following command:

    $ oc apply -f examplepod.yaml
  5. Verify that the pod is created by running the following command:

    $ oc get pod

    Example output

    NAME      READY   STATUS    RESTARTS   AGE
    tunepod   1/1     Running   0          47s

  6. Log in to the pod by running the following command:

    $ oc rsh tunepod
  7. Verify the values of the configured sysctl flags. For example, find the value net.ipv4.conf.net1.accept_redirects by running the following command:

    sh-4.4# sysctl net.ipv4.conf.net1.accept_redirects

    Expected output

    net.ipv4.conf.net1.accept_redirects = 1

15.2. Additional resources

Chapter 16. Using the Stream Control Transmission Protocol (SCTP) on a bare metal cluster

As a cluster administrator, you can use the Stream Control Transmission Protocol (SCTP) on a cluster.

16.1. Support for Stream Control Transmission Protocol (SCTP) on OpenShift Container Platform

As a cluster administrator, you can enable SCTP on the hosts in the cluster. On Red Hat Enterprise Linux CoreOS (RHCOS), the SCTP module is disabled by default.

SCTP is a reliable, message-based protocol that runs on top of an IP network.

When enabled, you can use SCTP as a protocol with pods, services, and network policy. A Service object must be defined with the type parameter set to either the ClusterIP or NodePort value.

16.1.1. Example configurations using SCTP protocol

You can configure a pod or service to use SCTP by setting the protocol parameter to the SCTP value in the pod or service object.

In the following example, a pod is configured to use SCTP:

apiVersion: v1
kind: Pod
metadata:
  namespace: project1
  name: example-pod
spec:
  containers:
    - name: example-pod
...
      ports:
        - containerPort: 30100
          name: sctpserver
          protocol: SCTP

In the following example, a service is configured to use SCTP:

apiVersion: v1
kind: Service
metadata:
  namespace: project1
  name: sctpserver
spec:
...
  ports:
    - name: sctpserver
      protocol: SCTP
      port: 30100
      targetPort: 30100
  type: ClusterIP

In the following example, a NetworkPolicy object is configured to apply to SCTP network traffic on port 80 from any pods with a specific label:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-sctp-on-http
spec:
  podSelector:
    matchLabels:
      role: web
  ingress:
  - ports:
    - protocol: SCTP
      port: 80

16.2. Enabling Stream Control Transmission Protocol (SCTP)

As a cluster administrator, you can load and enable the blacklisted SCTP kernel module on worker nodes in your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a file named load-sctp-module.yaml that contains the following YAML definition:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: load-sctp-module
      labels:
        machineconfiguration.openshift.io/role: worker
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
            - path: /etc/modprobe.d/sctp-blacklist.conf
              mode: 0644
              overwrite: true
              contents:
                source: data:,
            - path: /etc/modules-load.d/sctp-load.conf
              mode: 0644
              overwrite: true
              contents:
                source: data:,sctp
  2. To create the MachineConfig object, enter the following command:

    $ oc create -f load-sctp-module.yaml
  3. Optional: To watch the status of the nodes while the Machine Config Operator applies the configuration change, enter the following command. When the status of a node transitions to Ready, the configuration update is applied.

    $ oc get nodes
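
    After the nodes return to Ready, you can optionally confirm that the module is loaded on a node. The following command is a sketch; replace <node_name> with a worker node from the previous output:

    $ oc debug node/<node_name> -- chroot /host lsmod | grep sctp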

16.3. Verifying Stream Control Transmission Protocol (SCTP) is enabled

You can verify that SCTP is working on a cluster by creating a pod with an application that listens for SCTP traffic, associating it with a service, and then connecting to the exposed service.

Prerequisites

  • Access to the internet from the cluster to install the nc package.
  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a pod that starts an SCTP listener:

    1. Create a file named sctp-server.yaml that defines a pod with the following YAML:

      apiVersion: v1
      kind: Pod
      metadata:
        name: sctpserver
        labels:
          app: sctpserver
      spec:
        containers:
          - name: sctpserver
            image: registry.access.redhat.com/ubi9/ubi
            command: ["/bin/sh", "-c"]
            args:
              ["dnf install -y nc && sleep inf"]
            ports:
              - containerPort: 30102
                name: sctpserver
                protocol: SCTP
    2. Create the pod by entering the following command:

      $ oc create -f sctp-server.yaml
  2. Create a service for the SCTP listener pod.

    1. Create a file named sctp-service.yaml that defines a service with the following YAML:

      apiVersion: v1
      kind: Service
      metadata:
        name: sctpservice
        labels:
          app: sctpserver
      spec:
        type: NodePort
        selector:
          app: sctpserver
        ports:
          - name: sctpserver
            protocol: SCTP
            port: 30102
            targetPort: 30102
    2. To create the service, enter the following command:

      $ oc create -f sctp-service.yaml
  3. Create a pod for the SCTP client.

    1. Create a file named sctp-client.yaml with the following YAML:

      apiVersion: v1
      kind: Pod
      metadata:
        name: sctpclient
        labels:
          app: sctpclient
      spec:
        containers:
          - name: sctpclient
            image: registry.access.redhat.com/ubi9/ubi
            command: ["/bin/sh", "-c"]
            args:
              ["dnf install -y nc && sleep inf"]
    2. To create the Pod object, enter the following command:

      $ oc apply -f sctp-client.yaml
  4. Run an SCTP listener on the server.

    1. To connect to the server pod, enter the following command:

      $ oc rsh sctpserver
    2. To start the SCTP listener, enter the following command:

      $ nc -l 30102 --sctp
  5. Connect to the SCTP listener on the server.

    1. Open a new terminal window or tab in your terminal program.
    2. Obtain the IP address of the sctpservice service. Enter the following command:

      $ oc get services sctpservice -o go-template='{{.spec.clusterIP}}{{"\n"}}'
    3. To connect to the client pod, enter the following command:

      $ oc rsh sctpclient
    4. To start the SCTP client, enter the following command. Replace <cluster_IP> with the cluster IP address of the sctpservice service.

      # nc <cluster_IP> 30102 --sctp

Chapter 17. Using PTP hardware

You can configure linuxptp services and use PTP-capable hardware in OpenShift Container Platform cluster nodes.

17.1. About PTP hardware

You can use the OpenShift Container Platform console or OpenShift CLI (oc) to install PTP by deploying the PTP Operator. The PTP Operator creates and manages the linuxptp services and provides the following features:

  • Discovery of the PTP-capable devices in the cluster.
  • Management of the configuration of linuxptp services.
  • Notification of PTP clock events that negatively affect the performance and reliability of your application with the PTP Operator cloud-event-proxy sidecar.
Note

The PTP Operator works with PTP-capable devices on clusters provisioned only on bare-metal infrastructure.

17.2. About PTP

Precision Time Protocol (PTP) is used to synchronize clocks in a network. When used in conjunction with hardware support, PTP is capable of sub-microsecond accuracy, and is more accurate than Network Time Protocol (NTP).

17.2.1. Elements of a PTP domain

PTP is used to synchronize multiple nodes connected in a network, with clocks for each node. The clocks synchronized by PTP are organized in a source-destination hierarchy. The hierarchy is created and updated automatically by the best master clock (BMC) algorithm, which runs on every clock. Destination clocks are synchronized to source clocks, and destination clocks can themselves be the source for other downstream clocks. The three primary types of PTP clocks are described below.

Grandmaster clock
The grandmaster clock provides standard time information to other clocks across the network and ensures accurate and stable synchronization. It writes time stamps and responds to time requests from other clocks. Grandmaster clocks synchronize to a Global Navigation Satellite System (GNSS) time source. The grandmaster clock is the authoritative source of time in the network and is responsible for providing time synchronization to all other devices.
Ordinary clock
The ordinary clock has a single port connection that can play the role of source or destination clock, depending on its position in the network. The ordinary clock can read and write time stamps.
Boundary clock
The boundary clock has ports in two or more communication paths and can be a source and a destination to other destination clocks at the same time. The boundary clock works as a destination clock upstream. The destination clock receives the timing message, adjusts for delay, and then creates a new source time signal to pass down the network. The boundary clock produces a new timing packet that is still correctly synced with the source clock and can reduce the number of connected devices reporting directly to the source clock.

17.2.2. Advantages of PTP over NTP

One of the main advantages that PTP has over NTP is the hardware support present in various network interface controllers (NIC) and network switches. The specialized hardware allows PTP to account for delays in message transfer and improves the accuracy of time synchronization. To achieve the best possible accuracy, it is recommended that all networking components between PTP clocks are PTP hardware enabled.

Hardware-based PTP provides optimal accuracy, since the NIC can time stamp the PTP packets at the exact moment they are sent and received. Compare this to software-based PTP, which requires additional processing of the PTP packets by the operating system.

Important

Before enabling PTP, ensure that NTP is disabled for the required nodes. You can disable the chrony time service (chronyd) using a MachineConfig custom resource. For more information, see Disabling chrony time service.

17.2.3. Using PTP with dual NIC hardware

OpenShift Container Platform supports single and dual NIC hardware for precision PTP timing in the cluster.

For 5G telco networks that deliver mid-band spectrum coverage, each virtual distributed unit (vDU) requires connections to 6 radio units (RUs). To make these connections, each vDU host requires 2 NICs configured as boundary clocks.

Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l instances for each NIC feeding the downstream clocks.

17.3. Overview of linuxptp in OpenShift Container Platform nodes

OpenShift Container Platform uses PTP and linuxptp for high precision system timing in bare-metal infrastructure. The linuxptp package includes the ts2phc, pmc, ptp4l, and phc2sys programs for system clock synchronization.

ts2phc

ts2phc synchronizes the PTP hardware clock (PHC) across PTP devices with a high degree of precision. ts2phc is used in grandmaster clock configurations. It receives the precision timing signal from a high precision clock source, such as a Global Navigation Satellite System (GNSS). GNSS provides an accurate and reliable source of synchronized time for use in large distributed networks. GNSS clocks typically provide time information with a precision of a few nanoseconds.

The ts2phc system daemon sends timing information from the grandmaster clock to other PTP devices in the network by reading time information from the grandmaster clock and converting it to PHC format. PHC time is used by other devices in the network to synchronize their clocks with the grandmaster clock.

pmc
pmc implements a PTP management client (pmc) according to IEEE standard 1588. pmc provides basic management access for the ptp4l system daemon. pmc reads from standard input and sends the output over the selected transport, printing any replies it receives.
ptp4l

ptp4l implements the PTP boundary clock and ordinary clock and runs as a system daemon. ptp4l does the following:

  • Synchronizes the PHC to the source clock with hardware time stamping
  • Synchronizes the system clock to the source clock with software time stamping
phc2sys
phc2sys synchronizes the system clock to the PHC on the network interface controller (NIC). The phc2sys system daemon continuously monitors the PHC for timing information. When it detects a timing error, phc2sys corrects the system clock.
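
For reference, outside of the PTP Operator these daemons can be run manually on a Linux host with the linuxptp package installed. The interface name in the following sketch is illustrative:

# Run ptp4l as an ordinary clock on interface ens3 and print messages to stdout
$ sudo ptp4l -i ens3 -m
# Synchronize the system clock to the PHC of ens3, waiting for ptp4l to synchronize first
$ sudo phc2sys -s ens3 -w -m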

17.4. Installing the PTP Operator using the CLI

As a cluster administrator, you can install the PTP Operator by using the CLI.

Prerequisites

  • A cluster installed on bare-metal infrastructure with nodes that have PTP-capable hardware.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a namespace for the PTP Operator.

    1. Save the following YAML in the ptp-namespace.yaml file:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: openshift-ptp
        annotations:
          workload.openshift.io/allowed: management
        labels:
          name: openshift-ptp
          openshift.io/cluster-monitoring: "true"
    2. Create the Namespace CR:

      $ oc create -f ptp-namespace.yaml
  2. Create an Operator group for the PTP Operator.

    1. Save the following YAML in the ptp-operatorgroup.yaml file:

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: ptp-operators
        namespace: openshift-ptp
      spec:
        targetNamespaces:
        - openshift-ptp
    2. Create the OperatorGroup CR:

      $ oc create -f ptp-operatorgroup.yaml
  3. Subscribe to the PTP Operator.

    1. Save the following YAML in the ptp-sub.yaml file:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: ptp-operator-subscription
        namespace: openshift-ptp
      spec:
        channel: "stable"
        name: ptp-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
    2. Create the Subscription CR:

      $ oc create -f ptp-sub.yaml
  4. To verify that the Operator is installed, enter the following command:

    $ oc get csv -n openshift-ptp -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Example output

    Name                         Phase
    4.13.0-202301261535          Succeeded

17.5. Installing the PTP Operator using the web console

As a cluster administrator, you can install the PTP Operator using the web console.

Note

You must create the namespace and Operator group as described in the previous section.

Procedure

  1. Install the PTP Operator using the OpenShift Container Platform web console:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
    2. Choose PTP Operator from the list of available Operators, and then click Install.
    3. On the Install Operator page, under A specific namespace on the cluster select openshift-ptp. Then, click Install.
  2. Optional: Verify that the PTP Operator installed successfully:

    1. Switch to the Operators → Installed Operators page.
    2. Ensure that PTP Operator is listed in the openshift-ptp project with a Status of InstallSucceeded.

      Note

      During installation an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, to troubleshoot further:

      • Go to the Operators → Installed Operators page and inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
      • Go to the Workloads → Pods page and check the logs for pods in the openshift-ptp project.

17.6. Configuring PTP devices

The PTP Operator adds the NodePtpDevice.ptp.openshift.io custom resource definition (CRD) to OpenShift Container Platform.

When installed, the PTP Operator searches your cluster for PTP-capable network devices on each node. It creates and updates a NodePtpDevice custom resource (CR) object for each node that provides a compatible PTP-capable network device.

17.6.1. Discovering PTP capable network devices in your cluster

  • To return a complete list of PTP capable network devices in your cluster, run the following command:

    $ oc get NodePtpDevice -n openshift-ptp -o yaml

    Example output

    apiVersion: v1
    items:
    - apiVersion: ptp.openshift.io/v1
      kind: NodePtpDevice
      metadata:
        creationTimestamp: "2022-01-27T15:16:28Z"
        generation: 1
        name: dev-worker-0 1
        namespace: openshift-ptp
        resourceVersion: "6538103"
        uid: d42fc9ad-bcbf-4590-b6d8-b676c642781a
      spec: {}
      status:
        devices: 2
        - name: eno1
        - name: eno2
        - name: eno3
        - name: eno4
        - name: enp5s0f0
        - name: enp5s0f1
    ...

    1
    The value for the name parameter is the same as the name of the parent node.
    2
    The devices collection includes a list of the PTP capable devices that the PTP Operator discovers for the node.
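
    Optionally, to list only the discovered device names for a single node, you can use a jsonpath query. This is a convenience sketch that uses the example node name dev-worker-0 from the output above:

    $ oc get NodePtpDevice dev-worker-0 -n openshift-ptp -o jsonpath='{.status.devices[*].name}'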

17.6.2. Configuring linuxptp services as a grandmaster clock

You can configure the linuxptp services (ptp4l, phc2sys, ts2phc) as a grandmaster clock (T-GM) by creating a PtpConfig custom resource (CR) that configures the host NIC.

The ts2phc utility allows you to synchronize the system clock with the PTP grandmaster clock so that the node can stream a precision clock signal to downstream PTP ordinary clocks and boundary clocks.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as the grandmaster clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.
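
As a reference only, the following sketch shows how a ptpClockThreshold stanza might be added to a profile entry in the PtpConfig spec. The values shown are illustrative defaults; adjust them for your environment:

  spec:
    profile:
      - name: grandmaster-clock
        # ... other profile fields ...
        ptpClockThreshold:
          holdOverTimeout: 5        # seconds before the clock event state changes to FREERUN
          maxOffsetThreshold: 100   # nanoseconds
          minOffsetThreshold: -100  # nanoseconds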

Prerequisites

  • For T-GM clocks in production environments, install an Intel E810 Westport Channel NIC in the bare-metal cluster host.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the PtpConfig resource. For example:

    1. Depending on your requirements, use one of the following T-GM configurations for your deployment. Save the YAML in the grandmaster-clock-ptp-config.yaml file:

      Example 17.1. Example PTP grandmaster clock configuration

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: grandmaster-clock
        namespace: openshift-ptp
        annotations: {}
      spec:
        profile:
          - name: grandmaster-clock
            # The interface name is hardware-specific
            interface: $interface
            ptp4lOpts: "-2"
            phc2sysOpts: "-a -r -r -n 24"
            ptpSchedulingPolicy: SCHED_FIFO
            ptpSchedulingPriority: 10
            ptpSettings:
              logReduce: "true"
            ptp4lConf: |
              [global]
              #
              # Default Data Set
              #
              twoStepFlag 1
              slaveOnly 0
              priority1 128
              priority2 128
              domainNumber 24
              #utc_offset 37
              clockClass 255
              clockAccuracy 0xFE
              offsetScaledLogVariance 0xFFFF
              free_running 0
              freq_est_interval 1
              dscp_event 0
              dscp_general 0
              dataset_comparison G.8275.x
              G.8275.defaultDS.localPriority 128
              #
              # Port Data Set
              #
              logAnnounceInterval -3
              logSyncInterval -4
              logMinDelayReqInterval -4
              logMinPdelayReqInterval -4
              announceReceiptTimeout 3
              syncReceiptTimeout 0
              delayAsymmetry 0
              fault_reset_interval -4
              neighborPropDelayThresh 20000000
              masterOnly 0
              G.8275.portDS.localPriority 128
              #
              # Run time options
              #
              assume_two_step 0
              logging_level 6
              path_trace_enabled 0
              follow_up_info 0
              hybrid_e2e 0
              inhibit_multicast_service 0
              net_sync_monitor 0
              tc_spanning_tree 0
              tx_timestamp_timeout 50
              unicast_listen 0
              unicast_master_table 0
              unicast_req_duration 3600
              use_syslog 1
              verbose 0
              summary_interval 0
              kernel_leap 1
              check_fup_sync 0
              clock_class_threshold 7
              #
              # Servo Options
              #
              pi_proportional_const 0.0
              pi_integral_const 0.0
              pi_proportional_scale 0.0
              pi_proportional_exponent -0.3
              pi_proportional_norm_max 0.7
              pi_integral_scale 0.0
              pi_integral_exponent 0.4
              pi_integral_norm_max 0.3
              step_threshold 2.0
              first_step_threshold 0.00002
              max_frequency 900000000
              clock_servo pi
              sanity_freq_limit 200000000
              ntpshm_segment 0
              #
              # Transport options
              #
              transportSpecific 0x0
              ptp_dst_mac 01:1B:19:00:00:00
              p2p_dst_mac 01:80:C2:00:00:0E
              udp_ttl 1
              udp6_scope 0x0E
              uds_address /var/run/ptp4l
              #
              # Default interface options
              #
              clock_type OC
              network_transport L2
              delay_mechanism E2E
              time_stamping hardware
              tsproc_mode filter
              delay_filter moving_median
              delay_filter_length 10
              egressLatency 0
              ingressLatency 0
              boundary_clock_jbod 0
              #
              # Clock description
              #
              productDescription ;;
              revisionData ;;
              manufacturerIdentity 00:00:00
              userDescription ;
              timeSource 0xA0
        recommend:
          - profile: grandmaster-clock
            priority: 4
            match:
              - nodeLabel: "node-role.kubernetes.io/$mcp"
      Note

      The example PTP grandmaster clock configuration is for test purposes only and is not intended for production.

      Example 17.2. PTP grandmaster clock configuration for E810 NIC

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: grandmaster
        namespace: openshift-ptp
        annotations:
          ran.openshift.io/ztp-deploy-wave: "10"
      spec:
        profile:
        - name: "grandmaster"
          ptp4lOpts: "-2 --summary_interval -4"
          phc2sysOpts: -r -u 0 -m -O -37 -N 8 -R 16 -s $iface_master -n 24
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptpSettings:
            logReduce: "true"
          plugins:
            e810:
              enableDefaultConfig: true
          ts2phcOpts: " "
          ts2phcConf: |
            [nmea]
            ts2phc.master 1
            [global]
            use_syslog  0
            verbose 1
            logging_level 7
            ts2phc.pulsewidth 100000000
            ts2phc.nmea_serialport $gnss_serialport
            leapfile  /usr/share/zoneinfo/leap-seconds.list
            [$iface_master]
            ts2phc.extts_polarity rising
            ts2phc.extts_correction 0
          ptp4lConf: |
            [$iface_master]
            masterOnly 1
            [$iface_master_1]
            masterOnly 1
            [$iface_master_2]
            masterOnly 1
            [$iface_master_3]
            masterOnly 1
            [global]
            #
            # Default Data Set
            #
            twoStepFlag 1
            priority1 128
            priority2 128
            domainNumber 24
            #utc_offset 37
            clockClass 6
            clockAccuracy 0x27
            offsetScaledLogVariance 0xFFFF
            free_running 0
            freq_est_interval 1
            dscp_event 0
            dscp_general 0
            dataset_comparison G.8275.x
            G.8275.defaultDS.localPriority 128
            #
            # Port Data Set
            #
            logAnnounceInterval -3
            logSyncInterval -4
            logMinDelayReqInterval -4
            logMinPdelayReqInterval 0
            announceReceiptTimeout 3
            syncReceiptTimeout 0
            delayAsymmetry 0
            fault_reset_interval -4
            neighborPropDelayThresh 20000000
            masterOnly 0
            G.8275.portDS.localPriority 128
            #
            # Run time options
            #
            assume_two_step 0
            logging_level 6
            path_trace_enabled 0
            follow_up_info 0
            hybrid_e2e 0
            inhibit_multicast_service 0
            net_sync_monitor 0
            tc_spanning_tree 0
            tx_timestamp_timeout 50
            unicast_listen 0
            unicast_master_table 0
            unicast_req_duration 3600
            use_syslog 1
            verbose 0
            summary_interval -4
            kernel_leap 1
            check_fup_sync 0
            clock_class_threshold 7
            #
            # Servo Options
            #
            pi_proportional_const 0.0
            pi_integral_const 0.0
            pi_proportional_scale 0.0
            pi_proportional_exponent -0.3
            pi_proportional_norm_max 0.7
            pi_integral_scale 0.0
            pi_integral_exponent 0.4
            pi_integral_norm_max 0.3
            step_threshold 2.0
            first_step_threshold 0.00002
            clock_servo pi
            sanity_freq_limit  200000000
            ntpshm_segment 0
            #
            # Transport options
            #
            transportSpecific 0x0
            ptp_dst_mac 01:1B:19:00:00:00
            p2p_dst_mac 01:80:C2:00:00:0E
            udp_ttl 1
            udp6_scope 0x0E
            uds_address /var/run/ptp4l
            #
            # Default interface options
            #
            clock_type BC
            network_transport L2
            delay_mechanism E2E
            time_stamping hardware
            tsproc_mode filter
            delay_filter moving_median
            delay_filter_length 10
            egressLatency 0
            ingressLatency 0
            boundary_clock_jbod 0
            #
            # Clock description
            #
            productDescription ;;
            revisionData ;;
            manufacturerIdentity 00:00:00
            userDescription ;
            timeSource 0x20
        recommend:
        - profile: "grandmaster"
          priority: 4
          match:
          - nodeLabel: "node-role.kubernetes.io/$mcp"
    2. Create the CR by running the following command:

      $ oc create -f grandmaster-clock-ptp-config.yaml

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide

      Example output

      NAME                          READY   STATUS    RESTARTS   AGE     IP             NODE
      linuxptp-daemon-74m2g         3/3     Running   3          4d15h   10.16.230.7    compute-1.example.com
      ptp-operator-5f4f48d7c-x7zkf  1/1     Running   1          4d15h   10.128.1.145   compute-1.example.com

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-74m2g -n openshift-ptp -c linuxptp-daemon-container

      Example output

      ts2phc[94980.334]: [ts2phc.0.config] nmea delay: 98690975 ns
      ts2phc[94980.334]: [ts2phc.0.config] ens3f0 extts index 0 at 1676577329.999999999 corr 0 src 1676577330.901342528 diff -1
      ts2phc[94980.334]: [ts2phc.0.config] ens3f0 master offset         -1 s2 freq      -1
      ts2phc[94980.441]: [ts2phc.0.config] nmea sentence: GNRMC,195453.00,A,4233.24427,N,07126.64420,W,0.008,,160223,,,A,V
      phc2sys[94980.450]: [ptp4l.0.config] CLOCK_REALTIME phc offset       943 s2 freq  -89604 delay    504
      phc2sys[94980.512]: [ptp4l.0.config] CLOCK_REALTIME phc offset      1000 s2 freq  -89264 delay    474

17.6.2.1. Grandmaster clock PtpConfig configuration reference

The following reference information describes the configuration options for the PtpConfig custom resource (CR) that configures the linuxptp services (ptp4l, phc2sys, ts2phc) as a grandmaster clock.

Table 17.1. PtpConfig configuration options for PTP Grandmaster clock
PtpConfig CR field | Description

plugins

Specify an array of .exec.cmdline options that configure the NIC for grandmaster clock operation. Grandmaster clock configuration requires certain PTP pins to be disabled.

The plugin mechanism allows the PTP Operator to do automated hardware configuration. For the Intel Westport Channel NIC, when enableDefaultConfig is true, the PTP Operator runs a hard-coded script to do the required configuration for the NIC.

ptp4lOpts

Specify system configuration options for the ptp4l service. The options should not include the network interface name -i <interface> and service config file -f /etc/ptp4l.conf because the network interface name and the service config file are automatically appended.

ptp4lConf

Specify the required configuration to start ptp4l as a grandmaster clock. For example, the ens2f1 interface synchronizes downstream connected devices. For grandmaster clocks, set clockClass to 6 and set clockAccuracy to 0x27. Set timeSource to 0x20 when the clock receives its timing signal from a Global Navigation Satellite System (GNSS).

tx_timestamp_timeout

Specify the maximum amount of time to wait for the transmit (TX) timestamp from the sender before discarding the data.

boundary_clock_jbod

Specify the JBOD boundary clock time delay value. This value is used to correct the time values that are passed between the network time devices.

phc2sysOpts

Specify system config options for the phc2sys service. If this field is empty the PTP Operator does not start the phc2sys service.

Note

Ensure that the network interface listed here is configured as grandmaster and is referenced as required in the ts2phcConf and ptp4lConf fields.

ptpSchedulingPolicy

Configure the scheduling policy for ptp4l and phc2sys processes. Default value is SCHED_OTHER. Use SCHED_FIFO on systems that support FIFO scheduling.

ptpSchedulingPriority

Set an integer value from 1-65 to configure FIFO priority for ptp4l and phc2sys processes when ptpSchedulingPolicy is set to SCHED_FIFO. The ptpSchedulingPriority field is not used when ptpSchedulingPolicy is set to SCHED_OTHER.

ptpClockThreshold

Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. ptpClockThreshold configures how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

ts2phcConf

Sets the configuration for the ts2phc command.

leapfile is the default path to the current leap seconds definition file in the PTP Operator container image.

ts2phc.nmea_serialport is the serial port device that is connected to the NMEA GPS clock source. When configured, the GNSS receiver is accessible on /dev/gnss<id>. If the host has multiple GNSS receivers, you can find the correct device by enumerating either of the following devices:

  • /sys/class/net/<eth_port>/device/gnss/
  • /sys/class/gnss/gnss<id>/device/

ts2phcOpts

Set options for the ts2phc command.

recommend

Specify an array of one or more recommend objects that define rules on how the profile should be applied to nodes.

.recommend.profile

Specify the .recommend.profile object name that is defined in the profile section.

.recommend.priority

Specify the priority with an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10. If a node can be matched with multiple profiles according to rules defined in the match field, the profile with the higher priority is applied to that node.

.recommend.match

Specify .recommend.match rules with nodeLabel or nodeName values.

.recommend.match.nodeLabel

Set nodeLabel with the key of the node.Labels field from the node object by using the oc get nodes --show-labels command. For example, node-role.kubernetes.io/worker.

.recommend.match.nodeName

Set nodeName with the value of the node.Name field from the node object by using the oc get nodes command. For example, compute-1.example.com.

17.6.3. Configuring linuxptp services as an ordinary clock

You can configure linuxptp services (ptp4l, phc2sys) as an ordinary clock by creating a PtpConfig custom resource (CR) object.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as an ordinary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is required only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the following PtpConfig CR, and then save the YAML in the ordinary-clock-ptp-config.yaml file.

    Example PTP ordinary clock configuration

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: ordinary-clock
      namespace: openshift-ptp
      annotations: {}
    spec:
      profile:
        - name: ordinary-clock
          # The interface name is hardware-specific
          interface: $interface
          ptp4lOpts: "-2 -s"
          phc2sysOpts: "-a -r -n 24"
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptpSettings:
            logReduce: "true"
          ptp4lConf: |
            [global]
            #
            # Default Data Set
            #
            twoStepFlag 1
            slaveOnly 1
            priority1 128
            priority2 128
            domainNumber 24
            #utc_offset 37
            clockClass 255
            clockAccuracy 0xFE
            offsetScaledLogVariance 0xFFFF
            free_running 0
            freq_est_interval 1
            dscp_event 0
            dscp_general 0
            dataset_comparison G.8275.x
            G.8275.defaultDS.localPriority 128
            #
            # Port Data Set
            #
            logAnnounceInterval -3
            logSyncInterval -4
            logMinDelayReqInterval -4
            logMinPdelayReqInterval -4
            announceReceiptTimeout 3
            syncReceiptTimeout 0
            delayAsymmetry 0
            fault_reset_interval -4
            neighborPropDelayThresh 20000000
            masterOnly 0
            G.8275.portDS.localPriority 128
            #
            # Run time options
            #
            assume_two_step 0
            logging_level 6
            path_trace_enabled 0
            follow_up_info 0
            hybrid_e2e 0
            inhibit_multicast_service 0
            net_sync_monitor 0
            tc_spanning_tree 0
            tx_timestamp_timeout 50
            unicast_listen 0
            unicast_master_table 0
            unicast_req_duration 3600
            use_syslog 1
            verbose 0
            summary_interval 0
            kernel_leap 1
            check_fup_sync 0
            clock_class_threshold 7
            #
            # Servo Options
            #
            pi_proportional_const 0.0
            pi_integral_const 0.0
            pi_proportional_scale 0.0
            pi_proportional_exponent -0.3
            pi_proportional_norm_max 0.7
            pi_integral_scale 0.0
            pi_integral_exponent 0.4
            pi_integral_norm_max 0.3
            step_threshold 2.0
            first_step_threshold 0.00002
            max_frequency 900000000
            clock_servo pi
            sanity_freq_limit 200000000
            ntpshm_segment 0
            #
            # Transport options
            #
            transportSpecific 0x0
            ptp_dst_mac 01:1B:19:00:00:00
            p2p_dst_mac 01:80:C2:00:00:0E
            udp_ttl 1
            udp6_scope 0x0E
            uds_address /var/run/ptp4l
            #
            # Default interface options
            #
            clock_type OC
            network_transport L2
            delay_mechanism E2E
            time_stamping hardware
            tsproc_mode filter
            delay_filter moving_median
            delay_filter_length 10
            egressLatency 0
            ingressLatency 0
            boundary_clock_jbod 0
            #
            # Clock description
            #
            productDescription ;;
            revisionData ;;
            manufacturerIdentity 00:00:00
            userDescription ;
            timeSource 0xA0
      recommend:
        - profile: ordinary-clock
          priority: 4
          match:
            - nodeLabel: "node-role.kubernetes.io/$mcp"

    Table 17.2. PTP ordinary clock CR configuration options
    Custom resource field | Description

    name

    The name of the PtpConfig CR.

    profile

    Specify an array of one or more profile objects. Each profile must be uniquely named.

    interface

    Specify the network interface to be used by the ptp4l service, for example ens787f1.

    ptp4lOpts

    Specify system config options for the ptp4l service, for example -2 to select the IEEE 802.3 network transport. The options should not include the network interface name -i <interface> and service config file -f /etc/ptp4l.conf because the network interface name and the service config file are automatically appended. Append --summary_interval -4 to use PTP fast events with this interface.

    phc2sysOpts

    Specify system config options for the phc2sys service. If this field is empty, the PTP Operator does not start the phc2sys service. For Intel Columbiaville 800 Series NICs, set phc2sysOpts options to -a -r -m -n 24 -N 8 -R 16. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.

    ptp4lConf

    Specify a string that contains the configuration to replace the default /etc/ptp4l.conf file. To use the default configuration, leave the field empty.

    tx_timestamp_timeout

    For Intel Columbiaville 800 Series NICs, set tx_timestamp_timeout to 50.

    boundary_clock_jbod

    For Intel Columbiaville 800 Series NICs, set boundary_clock_jbod to 0.

    ptpSchedulingPolicy

    Scheduling policy for ptp4l and phc2sys processes. Default value is SCHED_OTHER. Use SCHED_FIFO on systems that support FIFO scheduling.

    ptpSchedulingPriority

    Integer value from 1-65 used to set FIFO priority for ptp4l and phc2sys processes when ptpSchedulingPolicy is set to SCHED_FIFO. The ptpSchedulingPriority field is not used when ptpSchedulingPolicy is set to SCHED_OTHER.

    ptpClockThreshold

    Optional. If ptpClockThreshold is not present, default values are used for the ptpClockThreshold fields. ptpClockThreshold configures how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

    recommend

    Specify an array of one or more recommend objects that define rules on how the profile should be applied to nodes.

    .recommend.profile

    Specify the .recommend.profile object name defined in the profile section.

    .recommend.priority

    Set .recommend.priority to 0 for ordinary clock.

    .recommend.match

    Specify .recommend.match rules with nodeLabel or nodeName values.

    .recommend.match.nodeLabel

    Set nodeLabel with the key of the node.Labels field from the node object by using the oc get nodes --show-labels command. For example, node-role.kubernetes.io/worker.

    .recommend.match.nodeName

    Set nodeName with the value of the node.Name field from the node object by using the oc get nodes command. For example, compute-1.example.com.

  2. Create the PtpConfig CR by running the following command:

    $ oc create -f ordinary-clock-ptp-config.yaml

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE
      linuxptp-daemon-4xkbb           1/1     Running   0          43m   10.1.196.24      compute-0.example.com
      linuxptp-daemon-tdspf           1/1     Running   0          43m   10.1.196.25      compute-1.example.com
      ptp-operator-657bbb64c8-2f8sj   1/1     Running   0          43m   10.129.0.61      control-plane-1.example.com

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container

      Example output

      I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile
      I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to:
      I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------
      I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1
      I1115 09:41:17.117616 4143292 daemon.go:102] Interface: ens787f1
      I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2 -s
      I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24
      I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------

Additional resources

17.6.4. Configuring linuxptp services as a boundary clock

You can configure the linuxptp services (ptp4l, phc2sys) as a boundary clock by creating a PtpConfig custom resource (CR) object.

Note

Use the following example PtpConfig CR as the basis to configure linuxptp services as the boundary clock for your particular hardware and environment. This example CR does not configure PTP fast events. To configure PTP fast events, set appropriate values for ptp4lOpts, ptp4lConf, and ptpClockThreshold. ptpClockThreshold is used only when events are enabled. See "Configuring the PTP fast event notifications publisher" for more information.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create the following PtpConfig CR, and then save the YAML in the boundary-clock-ptp-config.yaml file.

    Example PTP boundary clock configuration

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: boundary-clock
      namespace: openshift-ptp
      annotations: {}
    spec:
      profile:
        - name: boundary-clock
          ptp4lOpts: "-2"
          phc2sysOpts: "-a -r -n 24"
          ptpSchedulingPolicy: SCHED_FIFO
          ptpSchedulingPriority: 10
          ptpSettings:
            logReduce: "true"
          ptp4lConf: |
            # The interface name is hardware-specific
            [$iface_slave]
            masterOnly 0
            [$iface_master_1]
            masterOnly 1
            [$iface_master_2]
            masterOnly 1
            [$iface_master_3]
            masterOnly 1
            [global]
            #
            # Default Data Set
            #
            twoStepFlag 1
            slaveOnly 0
            priority1 128
            priority2 128
            domainNumber 24
            #utc_offset 37
            clockClass 248
            clockAccuracy 0xFE
            offsetScaledLogVariance 0xFFFF
            free_running 0
            freq_est_interval 1
            dscp_event 0
            dscp_general 0
            dataset_comparison G.8275.x
            G.8275.defaultDS.localPriority 128
            #
            # Port Data Set
            #
            logAnnounceInterval -3
            logSyncInterval -4
            logMinDelayReqInterval -4
            logMinPdelayReqInterval -4
            announceReceiptTimeout 3
            syncReceiptTimeout 0
            delayAsymmetry 0
            fault_reset_interval -4
            neighborPropDelayThresh 20000000
            masterOnly 0
            G.8275.portDS.localPriority 128
            #
            # Run time options
            #
            assume_two_step 0
            logging_level 6
            path_trace_enabled 0
            follow_up_info 0
            hybrid_e2e 0
            inhibit_multicast_service 0
            net_sync_monitor 0
            tc_spanning_tree 0
            tx_timestamp_timeout 50
            unicast_listen 0
            unicast_master_table 0
            unicast_req_duration 3600
            use_syslog 1
            verbose 0
            summary_interval 0
            kernel_leap 1
            check_fup_sync 0
            clock_class_threshold 135
            #
            # Servo Options
            #
            pi_proportional_const 0.0
            pi_integral_const 0.0
            pi_proportional_scale 0.0
            pi_proportional_exponent -0.3
            pi_proportional_norm_max 0.7
            pi_integral_scale 0.0
            pi_integral_exponent 0.4
            pi_integral_norm_max 0.3
            step_threshold 2.0
            first_step_threshold 0.00002
            max_frequency 900000000
            clock_servo pi
            sanity_freq_limit 200000000
            ntpshm_segment 0
            #
            # Transport options
            #
            transportSpecific 0x0
            ptp_dst_mac 01:1B:19:00:00:00
            p2p_dst_mac 01:80:C2:00:00:0E
            udp_ttl 1
            udp6_scope 0x0E
            uds_address /var/run/ptp4l
            #
            # Default interface options
            #
            clock_type BC
            network_transport L2
            delay_mechanism E2E
            time_stamping hardware
            tsproc_mode filter
            delay_filter moving_median
            delay_filter_length 10
            egressLatency 0
            ingressLatency 0
            boundary_clock_jbod 0
            #
            # Clock description
            #
            productDescription ;;
            revisionData ;;
            manufacturerIdentity 00:00:00
            userDescription ;
            timeSource 0xA0
      recommend:
        - profile: boundary-clock
          priority: 4
          match:
            - nodeLabel: "node-role.kubernetes.io/$mcp"

    Table 17.3. PTP boundary clock CR configuration options
    Custom resource field | Description

    name

    The name of the PtpConfig CR.

    profile

    Specify an array of one or more profile objects.

    name

    Specify the name of the profile object. The name must uniquely identify the profile object.

    ptp4lOpts

    Specify system config options for the ptp4l service. The options should not include the network interface name -i <interface> and service config file -f /etc/ptp4l.conf because the network interface name and the service config file are automatically appended.

    ptp4lConf

    Specify the required configuration to start ptp4l as boundary clock. For example, ens1f0 synchronizes from a grandmaster clock and ens1f3 synchronizes connected devices.

    <interface_1>

    The interface that receives the synchronization clock.

    <interface_2>

    The interface that sends the synchronization clock.

    tx_timestamp_timeout

    For Intel Columbiaville 800 Series NICs, set tx_timestamp_timeout to 50.

    boundary_clock_jbod

    For Intel Columbiaville 800 Series NICs, ensure boundary_clock_jbod is set to 0. For Intel Fortville X710 Series NICs, ensure boundary_clock_jbod is set to 1.

    phc2sysOpts

    Specify system config options for the phc2sys service. If this field is empty, the PTP Operator does not start the phc2sys service.

    ptpSchedulingPolicy

    Scheduling policy for ptp4l and phc2sys processes. Default value is SCHED_OTHER. Use SCHED_FIFO on systems that support FIFO scheduling.

    ptpSchedulingPriority

    Integer value from 1-65 used to set FIFO priority for ptp4l and phc2sys processes when ptpSchedulingPolicy is set to SCHED_FIFO. The ptpSchedulingPriority field is not used when ptpSchedulingPolicy is set to SCHED_OTHER.

    ptpClockThreshold

    Optional. If ptpClockThreshold is not present, default values are used for the ptpClockThreshold fields. ptpClockThreshold configures how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.

    recommend

    Specify an array of one or more recommend objects that define rules on how the profile should be applied to nodes.

    .recommend.profile

    Specify the .recommend.profile object name defined in the profile section.

    .recommend.priority

    Specify the priority with an integer value between 0 and 99. A larger number gets lower priority, so a priority of 99 is lower than a priority of 10. If a node can be matched with multiple profiles according to rules defined in the match field, the profile with the higher priority is applied to that node.

    .recommend.match

    Specify .recommend.match rules with nodeLabel or nodeName values.

    .recommend.match.nodeLabel

    Set nodeLabel with the key of the node.Labels field from the node object by using the oc get nodes --show-labels command. For example, node-role.kubernetes.io/worker.

    .recommend.match.nodeName

    Set nodeName with the value of the node.Name field from the node object by using the oc get nodes command. For example, compute-1.example.com.

  2. Create the CR by running the following command:

    $ oc create -f boundary-clock-ptp-config.yaml

Verification

  1. Check that the PtpConfig profile is applied to the node.

    1. Get the list of pods in the openshift-ptp namespace by running the following command:

      $ oc get pods -n openshift-ptp -o wide

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE
      linuxptp-daemon-4xkbb           1/1     Running   0          43m   10.1.196.24      compute-0.example.com
      linuxptp-daemon-tdspf           1/1     Running   0          43m   10.1.196.25      compute-1.example.com
      ptp-operator-657bbb64c8-2f8sj   1/1     Running   0          43m   10.129.0.61      control-plane-1.example.com

    2. Check that the profile is correct. Examine the logs of the linuxptp daemon that corresponds to the node you specified in the PtpConfig profile. Run the following command:

      $ oc logs linuxptp-daemon-4xkbb -n openshift-ptp -c linuxptp-daemon-container

      Example output

      I1115 09:41:17.117596 4143292 daemon.go:107] in applyNodePTPProfile
      I1115 09:41:17.117604 4143292 daemon.go:109] updating NodePTPProfile to:
      I1115 09:41:17.117607 4143292 daemon.go:110] ------------------------------------
      I1115 09:41:17.117612 4143292 daemon.go:102] Profile Name: profile1
      I1115 09:41:17.117616 4143292 daemon.go:102] Interface:
      I1115 09:41:17.117620 4143292 daemon.go:102] Ptp4lOpts: -2
      I1115 09:41:17.117623 4143292 daemon.go:102] Phc2sysOpts: -a -r -n 24
      I1115 09:41:17.117626 4143292 daemon.go:116] ------------------------------------

Additional resources

17.6.5. Configuring linuxptp services as boundary clocks for dual NIC hardware

You can configure the linuxptp services (ptp4l, phc2sys) as boundary clocks for dual-NIC hardware by creating a PtpConfig custom resource (CR) object for each NIC.

Dual NIC hardware allows you to connect each NIC to the same upstream leader clock with separate ptp4l instances for each NIC feeding the downstream clocks.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Create two separate PtpConfig CRs, one for each NIC, using the reference CR in "Configuring linuxptp services as a boundary clock" as the basis for each CR. For example:

    1. Create boundary-clock-ptp-config-nic1.yaml, specifying values for phc2sysOpts:

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: boundary-clock-ptp-config-nic1
        namespace: openshift-ptp
      spec:
        profile:
        - name: "profile1"
          ptp4lOpts: "-2 --summary_interval -4"
          ptp4lConf: | 1
            [ens5f1]
            masterOnly 1
            [ens5f0]
            masterOnly 0
          ...
          phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 2
      1
      Specify the required interfaces to start ptp4l as a boundary clock. For example, ens5f0 synchronizes from a grandmaster clock and ens5f1 synchronizes connected devices.
      2
      Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
    2. Create boundary-clock-ptp-config-nic2.yaml, removing the phc2sysOpts field altogether to disable the phc2sys service for the second NIC:

      apiVersion: ptp.openshift.io/v1
      kind: PtpConfig
      metadata:
        name: boundary-clock-ptp-config-nic2
        namespace: openshift-ptp
      spec:
        profile:
        - name: "profile2"
          ptp4lOpts: "-2 --summary_interval -4"
          ptp4lConf: | 1
            [ens7f1]
            masterOnly 1
            [ens7f0]
            masterOnly 0
      ...
      1
      Specify the required interfaces to start ptp4l as a boundary clock on the second NIC.
      Note

      You must completely remove the phc2sysOpts field from the second PtpConfig CR to disable the phc2sys service on the second NIC.

  2. Create the dual NIC PtpConfig CRs by running the following commands:

    1. Create the CR that configures PTP for the first NIC:

      $ oc create -f boundary-clock-ptp-config-nic1.yaml
    2. Create the CR that configures PTP for the second NIC:

      $ oc create -f boundary-clock-ptp-config-nic2.yaml

Verification

  • Check that the PTP Operator has applied the PtpConfig CRs for both NICs. Examine the logs for the linuxptp daemon corresponding to the node that has the dual NIC hardware installed. For example, run the following command:

    $ oc logs linuxptp-daemon-cvgr6 -n openshift-ptp -c linuxptp-daemon-container

    Example output

    ptp4l[80828.335]: [ptp4l.1.config] master offset          5 s2 freq   -5727 path delay       519
    ptp4l[80828.343]: [ptp4l.0.config] master offset         -5 s2 freq  -10607 path delay       533
    phc2sys[80828.390]: [ptp4l.0.config] CLOCK_REALTIME phc offset         1 s2 freq  -87239 delay    539

17.6.6. Intel Columbiaville E800 series NIC as PTP ordinary clock reference

The following table describes the changes that you must make to the reference PTP configuration in order to use Intel Columbiaville E800 series NICs as ordinary clocks. Make the changes in a PtpConfig custom resource (CR) that you apply to the cluster.

Table 17.4. Recommended PTP settings for Intel Columbiaville NIC
PTP configuration | Recommended setting

phc2sysOpts

-a -r -m -n 24 -N 8 -R 16

tx_timestamp_timeout

50

boundary_clock_jbod

0

Note

For phc2sysOpts, -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
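
The following sketch shows how these recommended settings might be combined in a PtpConfig profile for an ordinary clock. The interface name ens787f1 is a placeholder taken from the ordinary clock example earlier in this chapter:

  spec:
    profile:
      - name: ordinary-clock
        interface: ens787f1
        ptp4lOpts: "-2 -s"
        phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16"
        ptp4lConf: |
          [global]
          tx_timestamp_timeout 50
          boundary_clock_jbod 0
          # ... remaining settings from the reference ordinary clock configuration ...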

Additional resources

17.6.7. Configuring FIFO priority scheduling for PTP hardware

In telco or other deployment configurations that require low latency performance, PTP daemon threads run in a constrained CPU footprint alongside the rest of the infrastructure components. By default, PTP threads run with the SCHED_OTHER policy. Under high load, these threads might not get the scheduling latency they require for error-free operation.

To mitigate against potential scheduling latency errors, you can configure the PTP Operator linuxptp services to allow threads to run with a SCHED_FIFO policy. If SCHED_FIFO is set for a PtpConfig CR, ptp4l and phc2sys run under chrt in the parent container with a priority set by the ptpSchedulingPriority field of the PtpConfig CR.

Note

Setting ptpSchedulingPolicy is optional, and is only required if you are experiencing latency errors.

Procedure

  1. Edit the PtpConfig CR profile:

    $ oc edit PtpConfig -n openshift-ptp
  2. Change the ptpSchedulingPolicy and ptpSchedulingPriority fields:

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: <ptp_config_name>
      namespace: openshift-ptp
    ...
    spec:
      profile:
      - name: "profile1"
    ...
        ptpSchedulingPolicy: SCHED_FIFO 1
        ptpSchedulingPriority: 10 2
    1
    Scheduling policy for ptp4l and phc2sys processes. Use SCHED_FIFO on systems that support FIFO scheduling.
    2
    Required. Sets the integer value 1-65 used to configure FIFO priority for ptp4l and phc2sys processes.
  3. Save and exit to apply the changes to the PtpConfig CR.

Verification

  1. Get the name of the linuxptp-daemon pod and corresponding node where the PtpConfig CR has been applied:

    $ oc get pods -n openshift-ptp -o wide

    Example output

    NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
    linuxptp-daemon-gmv2n           3/3     Running   0          1d17h   10.1.196.24   compute-0.example.com
    linuxptp-daemon-lgm55           3/3     Running   0          1d17h   10.1.196.25   compute-1.example.com
    ptp-operator-3r4dcvf7f4-zndk7   1/1     Running   0          1d7h    10.129.0.61   control-plane-1.example.com

  2. Check that the ptp4l process is running with the updated chrt FIFO priority:

    $ oc -n openshift-ptp logs linuxptp-daemon-lgm55 -c linuxptp-daemon-container|grep chrt

    Example output

    I1216 19:24:57.091872 1600715 daemon.go:285] /bin/chrt -f 65 /usr/sbin/ptp4l -f /var/run/ptp4l.0.config -2  --summary_interval -4 -m

17.6.8. Configuring log filtering for linuxptp services

The linuxptp daemon generates logs that you can use for debugging purposes. In telco or other deployment configurations that feature a limited storage capacity, these logs can add to the storage demand.

To reduce the number of log messages, you can configure the PtpConfig custom resource (CR) to exclude log messages that report the master offset value. The master offset log message reports the difference between the current node’s clock and the master clock in nanoseconds.
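
For example, the master offset messages that are excluded look like the following line, which is taken from an example output earlier in this chapter:

  ptp4l[80828.335]: [ptp4l.1.config] master offset          5 s2 freq   -5727 path delay       519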

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator.

Procedure

  1. Edit the PtpConfig CR:

    $ oc edit PtpConfig -n openshift-ptp
  2. In spec.profile, add the ptpSettings.logReduce specification and set the value to true:

    apiVersion: ptp.openshift.io/v1
    kind: PtpConfig
    metadata:
      name: <ptp_config_name>
      namespace: openshift-ptp
    ...
    spec:
      profile:
      - name: "profile1"
    ...
        ptpSettings:
          logReduce: "true"
    Note

    For debugging purposes, you can set this specification to "false" to include the master offset messages.

  3. Save and exit to apply the changes to the PtpConfig CR.

Verification

  1. Get the name of the linuxptp-daemon pod and corresponding node where the PtpConfig CR has been applied:

    $ oc get pods -n openshift-ptp -o wide

    Example output

    NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
    linuxptp-daemon-gmv2n           3/3     Running   0          1d17h   10.1.196.24   compute-0.example.com
    linuxptp-daemon-lgm55           3/3     Running   0          1d17h   10.1.196.25   compute-1.example.com
    ptp-operator-3r4dcvf7f4-zndk7   1/1     Running   0          1d7h    10.129.0.61   control-plane-1.example.com

  2. Verify that master offset messages are excluded from the logs by running the following command:

    $ oc -n openshift-ptp logs <linux_daemon_container> -c linuxptp-daemon-container | grep "master offset" 1
    1
    <linux_daemon_container> is the name of the linuxptp-daemon pod, for example linuxptp-daemon-gmv2n.

    When you configure the logReduce specification, this command does not report any instances of master offset in the logs of the linuxptp daemon.

17.7. Troubleshooting common PTP Operator issues

Troubleshoot common problems with the PTP Operator by performing the following steps.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the PTP Operator on a bare-metal cluster with hosts that support PTP.

Procedure

  1. Check that the Operator and operands are successfully deployed in the cluster for the configured nodes.

    $ oc get pods -n openshift-ptp -o wide

    Example output

    NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
    linuxptp-daemon-lmvgn           3/3     Running   0          4d17h   10.1.196.24   compute-0.example.com
    linuxptp-daemon-qhfg7           3/3     Running   0          4d17h   10.1.196.25   compute-1.example.com
    ptp-operator-6b8dcbf7f4-zndk7   1/1     Running   0          5d7h    10.129.0.61   control-plane-1.example.com

    Note

    When the PTP fast event bus is enabled, the number of ready linuxptp-daemon pods is 3/3. If the PTP fast event bus is not enabled, 2/2 is displayed.

  2. Check that supported hardware is found in the cluster.

    $ oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io

    Example output

    NAME                                  AGE
    control-plane-0.example.com           10d
    control-plane-1.example.com           10d
    compute-0.example.com                 10d
    compute-1.example.com                 10d
    compute-2.example.com                 10d

  3. Check the available PTP network interfaces for a node:

    $ oc -n openshift-ptp get nodeptpdevices.ptp.openshift.io <node_name> -o yaml

    where:

    <node_name>

    Specifies the node you want to query, for example, compute-0.example.com.

    Example output

    apiVersion: ptp.openshift.io/v1
    kind: NodePtpDevice
    metadata:
      creationTimestamp: "2021-09-14T16:52:33Z"
      generation: 1
      name: compute-0.example.com
      namespace: openshift-ptp
      resourceVersion: "177400"
      uid: 30413db0-4d8d-46da-9bef-737bacd548fd
    spec: {}
    status:
      devices:
      - name: eno1
      - name: eno2
      - name: eno3
      - name: eno4
      - name: enp5s0f0
      - name: enp5s0f1

  4. Check that the PTP interface is successfully synchronized to the primary clock by accessing the linuxptp-daemon pod for the corresponding node.

    1. Get the name of the linuxptp-daemon pod and corresponding node you want to troubleshoot by running the following command:

      $ oc get pods -n openshift-ptp -o wide

      Example output

      NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE
      linuxptp-daemon-lmvgn           3/3     Running   0          4d17h   10.1.196.24   compute-0.example.com
      linuxptp-daemon-qhfg7           3/3     Running   0          4d17h   10.1.196.25   compute-1.example.com
      ptp-operator-6b8dcbf7f4-zndk7   1/1     Running   0          5d7h    10.129.0.61   control-plane-1.example.com

    2. Remote shell into the required linuxptp-daemon container:

      $ oc rsh -n openshift-ptp -c linuxptp-daemon-container <linux_daemon_container>

      where:

      <linux_daemon_container>
      is the container you want to diagnose, for example linuxptp-daemon-lmvgn.
    3. In the remote shell connection to the linuxptp-daemon container, use the PTP Management Client (pmc) tool to diagnose the network interface. Run the following pmc command to check the sync status of the PTP device, for example ptp4l.

      # pmc -u -f /var/run/ptp4l.0.config -b 0 'GET PORT_DATA_SET'

      Example output when the node is successfully synced to the primary clock

      sending: GET PORT_DATA_SET
          40a6b7.fffe.166ef0-1 seq 0 RESPONSE MANAGEMENT PORT_DATA_SET
              portIdentity            40a6b7.fffe.166ef0-1
              portState               SLAVE
              logMinDelayReqInterval  -4
              peerMeanPathDelay       0
              logAnnounceInterval     -3
              announceReceiptTimeout  3
              logSyncInterval         -4
              delayMechanism          1
              logMinPdelayReqInterval -4
              versionNumber           2

17.7.1. Collecting Precision Time Protocol (PTP) Operator data

You can use the oc adm must-gather CLI command to collect information about your cluster, including features and objects associated with Precision Time Protocol (PTP) Operator.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).
  • You have installed the PTP Operator.

Procedure

  • To collect PTP Operator data with must-gather, you must specify the PTP Operator must-gather image.

    $ oc adm must-gather --image=registry.redhat.io/openshift4/ptp-must-gather-rhel8:v4.13

17.8. PTP hardware fast event notifications framework

Cloud native applications such as virtual RAN (vRAN) require access to notifications about hardware timing events that are critical to the functioning of the overall network. PTP clock synchronization errors can negatively affect the performance and reliability of your low-latency application, for example, a vRAN application running in a distributed unit (DU).

17.8.1. About PTP and clock synchronization error events

Loss of PTP synchronization is a critical error for a RAN network. If synchronization is lost on a node, the radio might be shut down and the network Over the Air (OTA) traffic might be shifted to another node in the wireless network. Fast event notifications mitigate against workload errors by allowing cluster nodes to communicate PTP clock sync status to the vRAN application running in the DU.

Event notifications are available to vRAN applications running on the same DU node. A publish-subscribe REST API passes event notifications to the messaging bus. Publish-subscribe messaging, or pub-sub messaging, is an asynchronous service-to-service communication architecture where any message published to a topic is immediately received by all of the subscribers to the topic.

The PTP Operator generates fast event notifications for every PTP-capable network interface. You can access the events by using a cloud-event-proxy sidecar container over an HTTP or Advanced Message Queuing Protocol (AMQP) message bus.

Note

PTP fast event notifications are available for network interfaces configured to use PTP ordinary clocks or PTP boundary clocks.

Note

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information, see Red Hat AMQ Interconnect support status.

17.8.2. About the PTP fast event notifications framework

Use the Precision Time Protocol (PTP) fast event notifications framework to subscribe cluster applications to PTP events that the bare-metal cluster node generates.

Note

The fast event notifications framework uses a REST API for communication. The REST API is based on the O-RAN O-Cloud Notification API Specification for Event Consumers 3.0 that is available from O-RAN ALLIANCE Specifications.

The framework consists of a publisher, subscriber, and an AMQ or HTTP messaging protocol to handle communications between the publisher and subscriber applications. Applications run the cloud-event-proxy container in a sidecar pattern to subscribe to PTP events. The cloud-event-proxy sidecar container can access the same resources as the primary application container without using any of the resources of the primary application and with no significant latency.

Note

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information, see Red Hat AMQ Interconnect support status.

Figure 17.1. Overview of PTP fast events

1. Event is generated on the cluster host
   linuxptp-daemon in the PTP Operator-managed pod runs as a Kubernetes DaemonSet and manages the various linuxptp processes (ptp4l, phc2sys, and optionally for grandmaster clocks, ts2phc). The linuxptp-daemon passes the event to the UNIX domain socket.
2. Event is passed to the cloud-event-proxy sidecar
   The PTP plugin reads the event from the UNIX domain socket and passes it to the cloud-event-proxy sidecar in the PTP Operator-managed pod. cloud-event-proxy delivers the event from the Kubernetes infrastructure to Cloud-Native Network Functions (CNFs) with low latency.
3. Event is persisted
   The cloud-event-proxy sidecar in the PTP Operator-managed pod processes the event and publishes the cloud-native event by using a REST API.
4. Message is transported
   The message transporter transports the event to the cloud-event-proxy sidecar in the application pod over HTTP or AMQP 1.0 QPID.
5. Event is available from the REST API
   The cloud-event-proxy sidecar in the application pod processes the event and makes it available by using the REST API.
6. Consumer application requests a subscription and receives the subscribed event
   The consumer application sends an API request to the cloud-event-proxy sidecar in the application pod to create a PTP events subscription. The cloud-event-proxy sidecar creates an AMQ or HTTP messaging listener protocol for the resource specified in the subscription.

The cloud-event-proxy sidecar in the application pod receives the event from the PTP Operator-managed pod, unwraps the cloud events object to retrieve the data, and posts the event to the consumer application. The consumer application listens to the address specified in the resource qualifier and receives and processes the PTP event.

17.8.3. Configuring the PTP fast event notifications publisher

To start using PTP fast event notifications for a network interface in your cluster, you must enable the fast event publisher in the PTP Operator PtpOperatorConfig custom resource (CR) and configure ptpClockThreshold values in a PtpConfig CR that you create.

Prerequisites

  • You have installed the OpenShift Container Platform CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed the PTP Operator.

Procedure

  1. Modify the default PTP Operator config to enable PTP fast events.

    1. Save the following YAML in the ptp-operatorconfig.yaml file:

      apiVersion: ptp.openshift.io/v1
      kind: PtpOperatorConfig
      metadata:
        name: default
        namespace: openshift-ptp
      spec:
        daemonNodeSelector:
          node-role.kubernetes.io/worker: ""
        ptpEventConfig:
          enableEventPublisher: true 1
      1
      Set enableEventPublisher to true to enable PTP fast event notifications.
    Note

    In OpenShift Container Platform 4.13 or later, you do not need to set the spec.ptpEventConfig.transportHost field in the PtpOperatorConfig resource when you use HTTP transport for PTP events. Set transportHost only when you use AMQP transport for PTP events.

    2. Update the PtpOperatorConfig CR:

      $ oc apply -f ptp-operatorconfig.yaml
  2. Create a PtpConfig custom resource (CR) for the PTP enabled interface, and set the required values for ptpClockThreshold and ptp4lOpts. The following YAML illustrates the required values that you must set in the PtpConfig CR:

    spec:
      profile:
      - name: "profile1"
        interface: "enp5s0f0"
        ptp4lOpts: "-2 -s --summary_interval -4" 1
        phc2sysOpts: "-a -r -m -n 24 -N 8 -R 16" 2
        ptp4lConf: "" 3
        ptpClockThreshold: 4
          holdOverTimeout: 5
          maxOffsetThreshold: 100
          minOffsetThreshold: -100
    1
    Append --summary_interval -4 to use PTP fast events.
    2
    Required phc2sysOpts values. -m prints messages to stdout. The linuxptp-daemon DaemonSet parses the logs and generates Prometheus metrics.
    3
    Specify a string that contains the configuration to replace the default /etc/ptp4l.conf file. To use the default configuration, leave the field empty.
    4
    Optional. If the ptpClockThreshold stanza is not present, default values are used for the ptpClockThreshold fields. The stanza shows the default ptpClockThreshold values. The ptpClockThreshold values configure how long after the PTP master clock is disconnected before PTP events are triggered. holdOverTimeout is the time value in seconds before the PTP clock event state changes to FREERUN when the PTP master clock is disconnected. The maxOffsetThreshold and minOffsetThreshold settings configure offset values in nanoseconds that compare against the values for CLOCK_REALTIME (phc2sys) or master offset (ptp4l). When the ptp4l or phc2sys offset value is outside this range, the PTP clock state is set to FREERUN. When the offset value is within this range, the PTP clock state is set to LOCKED.
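
    After you apply the PtpConfig CR, you can optionally confirm the configuration. The following commands are a minimal sketch; they assume the default resource names shown in this procedure:

    $ oc get ptpoperatorconfig default -n openshift-ptp -o yaml

    $ oc get ptpconfig -n openshift-ptp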

17.8.4. Migrating consumer applications to use HTTP transport for PTP or bare-metal events

If you have previously deployed PTP or bare-metal events consumer applications, you need to update the applications to use HTTP message transport.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have updated the PTP Operator or Bare Metal Event Relay to version 4.13+ which uses HTTP transport by default.

Procedure

  1. Update your events consumer application to use HTTP transport. Set the http-event-publishers variable for the cloud event sidecar deployment.

    For example, in a cluster with PTP events configured, the following YAML snippet illustrates a cloud event sidecar deployment:

    containers:
      - name: cloud-event-sidecar
        image: cloud-event-sidecar
        args:
          - "--metrics-addr=127.0.0.1:9091"
          - "--store-path=/store"
          - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043"
          - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043" 1
          - "--api-port=8089"
    1
    The PTP Operator automatically resolves NODE_NAME to the host that is generating the PTP events. For example, compute-1.example.com.

    In a cluster with bare-metal events configured, set the http-event-publishers field to hw-event-publisher-service.openshift-bare-metal-events.svc.cluster.local:9043 in the cloud event sidecar deployment CR.

  2. Deploy the consumer-events-subscription-service service alongside the events consumer application. For example:

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret
      name: consumer-events-subscription-service
      namespace: cloud-events
      labels:
        app: consumer-service
    spec:
      ports:
        - name: sub-port
          port: 9043
      selector:
        app: consumer
      clusterIP: None
      sessionAffinity: None
      type: ClusterIP
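
    After you create the service, you can optionally verify that it exists in the cloud-events namespace. The file name below is only an example:

    $ oc apply -f consumer-events-subscription-service.yaml

    $ oc get svc consumer-events-subscription-service -n cloud-events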

17.8.5. Installing the AMQ messaging bus

To pass PTP fast event notifications between publisher and subscriber on a node, you can install and configure an AMQ messaging bus to run locally on the node. To use AMQ messaging, you must install the AMQ Interconnect Operator.

Note

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information, see Red Hat AMQ Interconnect support status.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  • Install the AMQ Interconnect Operator to its own amq-interconnect namespace. See the Red Hat AMQ Interconnect documentation for installation instructions.

Verification

  1. Check that the AMQ Interconnect Operator is available and the required pods are running:

    $ oc get pods -n amq-interconnect

    Example output

    NAME                                    READY   STATUS    RESTARTS   AGE
    amq-interconnect-645db76c76-k8ghs       1/1     Running   0          23h
    interconnect-operator-5cb5fc7cc-4v7qm   1/1     Running   0          23h

  2. Check that the required linuxptp-daemon PTP event producer pods are running in the openshift-ptp namespace.

    $ oc get pods -n openshift-ptp

    Example output

    NAME                     READY   STATUS    RESTARTS       AGE
    linuxptp-daemon-2t78p    3/3     Running   0              12h
    linuxptp-daemon-k8n88    3/3     Running   0              12h

17.8.6. Subscribing DU applications to PTP events REST API reference

Use the PTP event notifications REST API to subscribe a distributed unit (DU) application to the PTP events that are generated on the parent node.

Subscribe applications to PTP events by using the resource address /cluster/node/<node_name>/ptp, where <node_name> is the cluster node running the DU application.

Deploy your cloud-event-consumer DU application container and cloud-event-proxy sidecar container in a separate DU application pod. The cloud-event-consumer DU application subscribes to the cloud-event-proxy container in the application pod.

Use the following API endpoints to subscribe the cloud-event-consumer DU application to PTP events posted by the cloud-event-proxy container at http://localhost:8089/api/ocloudNotifications/v1/ in the DU application pod:

  • /api/ocloudNotifications/v1/subscriptions

    • POST: Creates a new subscription
    • GET: Retrieves a list of subscriptions
    • DELETE: Deletes all subscriptions
  • /api/ocloudNotifications/v1/subscriptions/<subscription_id>

    • GET: Returns details for the specified subscription ID
    • DELETE: Deletes the subscription associated with the specified subscription ID
  • /api/ocloudNotifications/v1/health

    • GET: Returns the health status of ocloudNotifications API
  • api/ocloudNotifications/v1/publishers

    • GET: Returns an array of os-clock-sync-state, ptp-clock-class-change, and lock-state messages for the cluster node
  • /api/ocloudnotifications/v1/{resource_address}/CurrentState

    • GET: Returns the current state of one of the following event types: os-clock-sync-state, ptp-clock-class-change, or lock-state events
Note

9089 is the default port for the cloud-event-consumer container deployed in the application pod. You can configure a different port for your DU application as required.
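
For example, you can exercise these endpoints from inside the DU application pod with curl. The following commands are a minimal sketch; the payload fields match the lock-state subscription example later in this document, and you must replace <node_name> with the name of your node:

$ curl -X POST -H "Content-Type: application/json" \
    -d '{"endpointUri": "http://localhost:9089/event", "resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state"}' \
    http://localhost:8089/api/ocloudNotifications/v1/subscriptions

$ curl http://localhost:8089/api/ocloudNotifications/v1/subscriptions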

17.8.6.1. api/ocloudNotifications/v1/subscriptions
HTTP method

GET api/ocloudNotifications/v1/subscriptions

Description

Returns a list of subscriptions. If subscriptions exist, a 200 OK status code is returned along with the list of subscriptions.

Example API response

[
 {
  "id": "75b1ad8f-c807-4c23-acf5-56f4b7ee3826",
  "endpointUri": "http://localhost:9089/event",
  "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/75b1ad8f-c807-4c23-acf5-56f4b7ee3826",
  "resource": "/cluster/node/compute-1.example.com/ptp"
 }
]

HTTP method

POST api/ocloudNotifications/v1/subscriptions

Description

Creates a new subscription. If a subscription is successfully created, or if it already exists, a 201 Created status code is returned.

Table 17.5. Query parameters
Parameter       Type
subscription    data

Example payload

{
  "uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions",
  "resource": "/cluster/node/compute-1.example.com/ptp"
}

HTTP method

DELETE api/ocloudNotifications/v1/subscriptions

Description

Deletes all subscriptions.

Example API response

{
"status": "deleted all subscriptions"
}

17.8.6.2. api/ocloudNotifications/v1/subscriptions/{subscription_id}
HTTP method

GET api/ocloudNotifications/v1/subscriptions/{subscription_id}

Description

Returns details for the subscription with ID subscription_id.

Table 17.6. Global path parameters
Parameter         Type
subscription_id   string

Example API response

{
  "id":"48210fb3-45be-4ce0-aa9b-41a0e58730ab",
  "endpointUri": "http://localhost:9089/event",
  "uriLocation":"http://localhost:8089/api/ocloudNotifications/v1/subscriptions/48210fb3-45be-4ce0-aa9b-41a0e58730ab",
  "resource":"/cluster/node/compute-1.example.com/ptp"
}

HTTP method

DELETE api/ocloudNotifications/v1/subscriptions/{subscription_id}

Description

Deletes the subscription with ID subscription_id.

Table 17.7. Global path parameters
Parameter         Type
subscription_id   string

Example API response

{
"status": "OK"
}

17.8.6.3. api/ocloudNotifications/v1/health
HTTP method

GET api/ocloudNotifications/v1/health/

Description

Returns the health status for the ocloudNotifications REST API.

Example API response

OK
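
For example, you can check the health endpoint from inside the application pod with curl. This is a minimal sketch that assumes the default API port 8089:

$ curl http://localhost:8089/api/ocloudNotifications/v1/health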

17.8.6.4. api/ocloudNotifications/v1/publishers
HTTP method

GET api/ocloudNotifications/v1/publishers

Description

Returns an array of os-clock-sync-state, ptp-clock-class-change, and lock-state details for the cluster node. The system generates notifications when the relevant equipment state changes.

  • os-clock-sync-state notifications describe the host operating system clock synchronization state. Can be in LOCKED or FREERUN state.
  • ptp-clock-class-change notifications describe the current state of the PTP clock class.
  • lock-state notifications describe the current status of the PTP equipment lock state. Can be in LOCKED, HOLDOVER or FREERUN state.

Example API response

[
  {
    "id": "0fa415ae-a3cf-4299-876a-589438bacf75",
    "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
    "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/0fa415ae-a3cf-4299-876a-589438bacf75",
    "resource": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state"
  },
  {
    "id": "28cd82df-8436-4f50-bbd9-7a9742828a71",
    "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
    "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/28cd82df-8436-4f50-bbd9-7a9742828a71",
    "resource": "/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change"
  },
  {
    "id": "44aa480d-7347-48b0-a5b0-e0af01fa9677",
    "endpointUri": "http://localhost:9085/api/ocloudNotifications/v1/dummy",
    "uriLocation": "http://localhost:9085/api/ocloudNotifications/v1/publishers/44aa480d-7347-48b0-a5b0-e0af01fa9677",
    "resource": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state"
  }
]

You can find os-clock-sync-state, ptp-clock-class-change and lock-state events in the logs for the cloud-event-proxy container. For example:

$ oc logs -f linuxptp-daemon-cvgr6 -n openshift-ptp -c cloud-event-proxy

Example os-clock-sync-state event

{
   "id":"c8a784d1-5f4a-4c16-9a81-a3b4313affe5",
   "type":"event.sync.sync-status.os-clock-sync-state-change",
   "source":"/cluster/compute-1.example.com/ptp/CLOCK_REALTIME",
   "dataContentType":"application/json",
   "time":"2022-05-06T15:31:23.906277159Z",
   "data":{
      "version":"v1",
      "values":[
         {
            "resource":"/sync/sync-status/os-clock-sync-state",
            "dataType":"notification",
            "valueType":"enumeration",
            "value":"LOCKED"
         },
         {
            "resource":"/sync/sync-status/os-clock-sync-state",
            "dataType":"metric",
            "valueType":"decimal64.3",
            "value":"-53"
         }
      ]
   }
}

Example ptp-clock-class-change event

{
   "id":"69eddb52-1650-4e56-b325-86d44688d02b",
   "type":"event.sync.ptp-status.ptp-clock-class-change",
   "source":"/cluster/compute-1.example.com/ptp/ens2fx/master",
   "dataContentType":"application/json",
   "time":"2022-05-06T15:31:23.147100033Z",
   "data":{
      "version":"v1",
      "values":[
         {
            "resource":"/sync/ptp-status/ptp-clock-class-change",
            "dataType":"metric",
            "valueType":"decimal64.3",
            "value":"135"
         }
      ]
   }
}

Example lock-state event

{
   "id":"305ec18b-1472-47b3-aadd-8f37933249a9",
   "type":"event.sync.ptp-status.ptp-state-change",
   "source":"/cluster/compute-1.example.com/ptp/ens2fx/master",
   "dataContentType":"application/json",
   "time":"2022-05-06T15:31:23.467684081Z",
   "data":{
      "version":"v1",
      "values":[
         {
            "resource":"/sync/ptp-status/lock-state",
            "dataType":"notification",
            "valueType":"enumeration",
            "value":"LOCKED"
         },
         {
            "resource":"/sync/ptp-status/lock-state",
            "dataType":"metric",
            "valueType":"decimal64.3",
            "value":"62"
         }
      ]
   }
}

17.8.6.5. api/ocloudNotifications/v1/{resource_address}/CurrentState
HTTP method

GET api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/lock-state/CurrentState

GET api/ocloudNotifications/v1/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state/CurrentState

GET api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change/CurrentState

Description

Configure the CurrentState API endpoint to return the current state of the os-clock-sync-state, ptp-clock-class-change, or lock-state events for the cluster node.

  • os-clock-sync-state notifications describe the host operating system clock synchronization state. Can be in LOCKED or FREERUN state.
  • ptp-clock-class-change notifications describe the current state of the PTP clock class.
  • lock-state notifications describe the current status of the PTP equipment lock state. Can be in LOCKED, HOLDOVER or FREERUN state.
Table 17.8. Global path parameters
Parameter          Type
resource_address   string

Example lock-state API response

{
  "id": "c1ac3aa5-1195-4786-84f8-da0ea4462921",
  "type": "event.sync.ptp-status.ptp-state-change",
  "source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
  "dataContentType": "application/json",
  "time": "2023-01-10T02:41:57.094981478Z",
  "data": {
    "version": "v1",
    "values": [
      {
        "resource": "/cluster/node/compute-1.example.com/ens5fx/master",
        "dataType": "notification",
        "valueType": "enumeration",
        "value": "LOCKED"
      },
      {
        "resource": "/cluster/node/compute-1.example.com/ens5fx/master",
        "dataType": "metric",
        "valueType": "decimal64.3",
        "value": "29"
      }
    ]
  }
}

Example os-clock-sync-state API response

{
  "specversion": "0.3",
  "id": "4f51fe99-feaa-4e66-9112-66c5c9b9afcb",
  "source": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
  "type": "event.sync.sync-status.os-clock-sync-state-change",
  "subject": "/cluster/node/compute-1.example.com/sync/sync-status/os-clock-sync-state",
  "datacontenttype": "application/json",
  "time": "2022-11-29T17:44:22.202Z",
  "data": {
    "version": "v1",
    "values": [
      {
        "resource": "/cluster/node/compute-1.example.com/CLOCK_REALTIME",
        "dataType": "notification",
        "valueType": "enumeration",
        "value": "LOCKED"
      },
      {
        "resource": "/cluster/node/compute-1.example.com/CLOCK_REALTIME",
        "dataType": "metric",
        "valueType": "decimal64.3",
        "value": "27"
      }
    ]
  }
}

Example ptp-clock-class-change API response

{
  "id": "064c9e67-5ad4-4afb-98ff-189c6aa9c205",
  "type": "event.sync.ptp-status.ptp-clock-class-change",
  "source": "/cluster/node/compute-1.example.com/sync/ptp-status/ptp-clock-class-change",
  "dataContentType": "application/json",
  "time": "2023-01-10T02:41:56.785673989Z",
  "data": {
    "version": "v1",
    "values": [
      {
        "resource": "/cluster/node/compute-1.example.com/ens5fx/master",
        "dataType": "metric",
        "valueType": "decimal64.3",
        "value": "165"
      }
    ]
  }
}

17.8.7. Monitoring PTP fast event metrics

You can monitor PTP fast event metrics from cluster nodes where the linuxptp-daemon is running. You can also monitor PTP fast event metrics in the OpenShift Container Platform web console by using the preconfigured and self-updating Prometheus monitoring stack.

Prerequisites

  • Install the OpenShift Container Platform CLI oc.
  • Log in as a user with cluster-admin privileges.
  • Install and configure the PTP Operator on a node with PTP-capable hardware.

Procedure

  1. Check for exposed PTP metrics on any node where the linuxptp-daemon is running. For example, run the following command:

    $ curl http://<node_name>:9091/metrics

    Example output

    # HELP openshift_ptp_clock_state 0 = FREERUN, 1 = LOCKED, 2 = HOLDOVER
    # TYPE openshift_ptp_clock_state gauge
    openshift_ptp_clock_state{iface="ens1fx",node="compute-1.example.com",process="ptp4l"} 1
    openshift_ptp_clock_state{iface="ens3fx",node="compute-1.example.com",process="ptp4l"} 1
    openshift_ptp_clock_state{iface="ens5fx",node="compute-1.example.com",process="ptp4l"} 1
    openshift_ptp_clock_state{iface="ens7fx",node="compute-1.example.com",process="ptp4l"} 1
    # HELP openshift_ptp_delay_ns
    # TYPE openshift_ptp_delay_ns gauge
    openshift_ptp_delay_ns{from="master",iface="ens1fx",node="compute-1.example.com",process="ptp4l"} 842
    openshift_ptp_delay_ns{from="master",iface="ens3fx",node="compute-1.example.com",process="ptp4l"} 480
    openshift_ptp_delay_ns{from="master",iface="ens5fx",node="compute-1.example.com",process="ptp4l"} 584
    openshift_ptp_delay_ns{from="master",iface="ens7fx",node="compute-1.example.com",process="ptp4l"} 482
    openshift_ptp_delay_ns{from="phc",iface="CLOCK_REALTIME",node="compute-1.example.com",process="phc2sys"} 547
    # HELP openshift_ptp_offset_ns
    # TYPE openshift_ptp_offset_ns gauge
    openshift_ptp_offset_ns{from="master",iface="ens1fx",node="compute-1.example.com",process="ptp4l"} -2
    openshift_ptp_offset_ns{from="master",iface="ens3fx",node="compute-1.example.com",process="ptp4l"} -44
    openshift_ptp_offset_ns{from="master",iface="ens5fx",node="compute-1.example.com",process="ptp4l"} -8
    openshift_ptp_offset_ns{from="master",iface="ens7fx",node="compute-1.example.com",process="ptp4l"} 3
    openshift_ptp_offset_ns{from="phc",iface="CLOCK_REALTIME",node="compute-1.example.com",process="phc2sys"} 12

  2. To view the PTP event in the OpenShift Container Platform web console, copy the name of the PTP metric you want to query, for example, openshift_ptp_offset_ns.
  3. In the OpenShift Container Platform web console, click Observe → Metrics.
  4. Paste the PTP metric name into the Expression field, and click Run queries.
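
Alternatively, you can filter for the same metric directly from the node metrics endpoint. For example:

$ curl -s http://<node_name>:9091/metrics | grep openshift_ptp_offset_ns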

Chapter 18. Developing PTP events consumer applications

When developing consumer applications that make use of Precision Time Protocol (PTP) events on a bare-metal cluster node, you need to deploy your consumer application and a cloud-event-proxy container in a separate application pod. The cloud-event-proxy container receives the events from the PTP Operator pod and passes them to the consumer application. The consumer application subscribes to the events posted in the cloud-event-proxy container by using a REST API.

For more information about deploying PTP events applications, see About the PTP fast event notifications framework.

Note

The following information provides general guidance for developing consumer applications that use PTP events. A complete events consumer application example is outside the scope of this information.

18.1. PTP events consumer application reference

PTP event consumer applications require the following features:

  1. A web service running with a POST handler to receive the cloud native PTP events JSON payload
  2. A createSubscription function to subscribe to the PTP events producer
  3. A getCurrentState function to poll the current state of the PTP events producer

The following example Go snippets illustrate these requirements:

Example PTP events consumer server function in Go

import (
  "io"
  "net/http"

  log "github.com/sirupsen/logrus" // assumes a logrus-style logger; substitute your application logger
)

func server() {
  http.HandleFunc("/event", getEvent)
  http.ListenAndServe("localhost:8989", nil)
}

func getEvent(w http.ResponseWriter, req *http.Request) {
  defer req.Body.Close()
  bodyBytes, err := io.ReadAll(req.Body)
  if err != nil {
    log.Errorf("error reading event %v", err)
  }
  e := string(bodyBytes)
  if e != "" {
    processEvent(bodyBytes) // processEvent is the application-specific handler for the PTP event payload
    log.Infof("received event %s", string(bodyBytes))
  } else {
    w.WriteHeader(http.StatusNoContent)
  }
}

Example PTP events createSubscription function in Go

import (
  "encoding/json"
  "fmt"
  "net/http"
  "net/url"

  "github.com/redhat-cne/sdk-go/pkg/pubsub"
  "github.com/redhat-cne/sdk-go/pkg/types"
  v1pubsub "github.com/redhat-cne/sdk-go/v1/pubsub"
)

// Subscribe to PTP events using the REST API
s1,_:=createSubscription("/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state") 1
s2,_:=createSubscription("/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change")
s3,_:=createSubscription("/cluster/node/<node_name>/sync/ptp-status/lock-state")

// Create PTP event subscriptions POST
func createSubscription(resourceAddress string) (sub pubsub.PubSub, err error) {
  var status int
  apiPath := "/api/ocloudNotifications/v1/"
  localAPIAddr := "localhost:8989" // vDU service API address
  apiAddr := "localhost:8089"      // event framework API address

  subURL := &types.URI{URL: url.URL{Scheme: "http",
    Host: apiAddr,
    Path: fmt.Sprintf("%s%s", apiPath, "subscriptions")}}
  endpointURL := &types.URI{URL: url.URL{Scheme: "http",
    Host: localAPIAddr,
    Path: "event"}}

  sub = v1pubsub.NewPubSub(endpointURL, resourceAddress)
  var subB []byte

  if subB, err = json.Marshal(&sub); err == nil {
    rc := restclient.New() // restclient is the REST client helper used by the events consumer example
    if status, subB = rc.PostWithReturn(subURL, subB); status != http.StatusCreated {
      err = fmt.Errorf("error in subscription creation api at %s, returned status %d", subURL, status)
    } else {
      err = json.Unmarshal(subB, &sub)
    }
  } else {
    err = fmt.Errorf("failed to marshal subscription for %s", resourceAddress)
  }
  return
}

1
Replace <node_name> with the FQDN of the node that is generating the PTP events. For example, compute-1.example.com.

Example PTP events consumer getCurrentState function in Go

// Get the current PTP event state for the resource
func getCurrentState(resource string) {
  // Build the CurrentState REST API URL for the resource
  url := &types.URI{URL: url.URL{Scheme: "http",
    Host: "localhost:8989",
    Path: fmt.Sprintf("/api/ocloudNotifications/v1/%s/CurrentState", resource)}}
  rc := restclient.New()
  status, event := rc.Get(url)
  if status != http.StatusOK {
    log.Errorf("CurrentState:error %d from url %s, %s", status, url.String(), event)
  } else {
    log.Debugf("Got CurrentState: %s ", event)
  }
}

18.2. Reference cloud-event-proxy deployment and service CRs

Use the following example cloud-event-proxy deployment and subscriber service CRs as a reference when deploying your PTP events consumer application.

Note

HTTP transport is the default transport for PTP and bare-metal events. Use HTTP transport instead of AMQP for PTP and bare-metal events where possible. AMQ Interconnect is EOL from 30 June 2024. Extended life cycle support (ELS) for AMQ Interconnect ends 29 November 2029. For more information, see Red Hat AMQ Interconnect support status.

Reference cloud-event-proxy deployment with HTTP transport

apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-consumer-deployment
  namespace: <namespace>
  labels:
    app: consumer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: consumer
  template:
    metadata:
      labels:
        app: consumer
    spec:
      serviceAccountName: sidecar-consumer-sa
      containers:
        - name: event-subscriber
          image: event-subscriber-app
        - name: cloud-event-proxy-as-sidecar
          image: openshift4/ose-cloud-event-proxy
          args:
            - "--metrics-addr=127.0.0.1:9091"
            - "--store-path=/store"
            - "--transport-host=consumer-events-subscription-service.cloud-events.svc.cluster.local:9043"
            - "--http-event-publishers=ptp-event-publisher-service-NODE_NAME.openshift-ptp.svc.cluster.local:9043"
            - "--api-port=8089"
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: NODE_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: pubsubstore
              mountPath: /store
          ports:
            - name: metrics-port
              containerPort: 9091
            - name: sub-port
              containerPort: 9043
      volumes:
        - name: pubsubstore
          emptyDir: {}
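
After you adapt the reference deployment, you can apply it and confirm that the consumer pod is running. The file name and namespace below are placeholders:

$ oc apply -f event-consumer-deployment.yaml

$ oc get pods -n <namespace> -l app=consumer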

Reference cloud-event-proxy deployment with AMQ transport

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloud-event-proxy-sidecar
  namespace: cloud-events
  labels:
    app: cloud-event-proxy
spec:
  selector:
    matchLabels:
      app: cloud-event-proxy
  template:
    metadata:
      labels:
        app: cloud-event-proxy
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      containers:
        - name: cloud-event-sidecar
          image: openshift4/ose-cloud-event-proxy
          args:
            - "--metrics-addr=127.0.0.1:9091"
            - "--store-path=/store"
            - "--transport-host=amqp://router.router.svc.cluster.local"
            - "--api-port=8089"
          env:
            - name: <node_name>
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: <node_ip>
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          volumeMounts:
            - name: pubsubstore
              mountPath: /store
          ports:
            - name: metrics-port
              containerPort: 9091
            - name: sub-port
              containerPort: 9043
      volumes:
        - name: pubsubstore
          emptyDir: {}

Reference cloud-event-proxy subscriber service

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
    service.alpha.openshift.io/serving-cert-secret-name: sidecar-consumer-secret
  name: consumer-events-subscription-service
  namespace: cloud-events
  labels:
    app: consumer-service
spec:
  ports:
    - name: sub-port
      port: 9043
  selector:
    app: consumer
  clusterIP: None
  sessionAffinity: None
  type: ClusterIP

18.3. PTP events available from the cloud-event-proxy sidecar REST API

PTP events consumer applications can poll the PTP events producer for the following PTP timing events.

Table 18.1. PTP events available from the cloud-event-proxy sidecar
/cluster/node/<node_name>/sync/ptp-status/lock-state
  Describes the current status of the PTP equipment lock state. Can be in LOCKED, HOLDOVER, or FREERUN state.

/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state
  Describes the host operating system clock synchronization state. Can be in LOCKED or FREERUN state.

/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change
  Describes the current state of the PTP clock class.

18.4. Subscribing the consumer application to PTP events

Before the PTP events consumer application can poll for events, you need to subscribe the application to the event producer.

18.4.1. Subscribing to PTP lock-state events

To create a subscription for PTP lock-state events, send a POST action to the cloud event API at http://localhost:8081/api/ocloudNotifications/v1/subscriptions with the following payload:

{
"endpointUri": "http://localhost:8989/event",
"resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state",
}
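
For example, a minimal curl command for this request, using the API address given above, looks like the following:

$ curl -X POST -H "Content-Type: application/json" \
    -d '{"endpointUri": "http://localhost:8989/event", "resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state"}' \
    http://localhost:8081/api/ocloudNotifications/v1/subscriptions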

Example response

{
"id": "e23473d9-ba18-4f78-946e-401a0caeff90",
"endpointUri": "http://localhost:8989/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90",
"resource": "/cluster/node/<node_name>/sync/ptp-status/lock-state",
}

18.4.2. Subscribing to PTP os-clock-sync-state events

To create a subscription for PTP os-clock-sync-state events, send a POST action to the cloud event API at http://localhost:8081/api/ocloudNotifications/v1/subscriptions with the following payload:

{
"endpointUri": "http://localhost:8989/event",
"resource": "/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state",
}

Example response

{
"id": "e23473d9-ba18-4f78-946e-401a0caeff90",
"endpointUri": "http://localhost:8989/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90",
"resource": "/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state",
}

18.4.3. Subscribing to PTP ptp-clock-class-change events

To create a subscription for PTP ptp-clock-class-change events, send a POST action to the cloud event API at http://localhost:8081/api/ocloudNotifications/v1/subscriptions with the following payload:

{
"endpointUri": "http://localhost:8989/event",
"resource": "/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change",
}

Example response

{
"id": "e23473d9-ba18-4f78-946e-401a0caeff90",
"endpointUri": "http://localhost:8989/event",
"uriLocation": "http://localhost:8089/api/ocloudNotifications/v1/subscriptions/e23473d9-ba18-4f78-946e-401a0caeff90",
"resource": "/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change",
}

18.5. Getting the current PTP clock status

To get the current PTP status for the node, send a GET action to one of the following event REST APIs:

  • http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/lock-state/CurrentState
  • http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/sync-status/os-clock-sync-state/CurrentState
  • http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/ptp-clock-class-change/CurrentState
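
For example, a minimal lock-state query with curl, replacing <node_name> with the name of your node:

$ curl http://localhost:8081/api/ocloudNotifications/v1/cluster/node/<node_name>/sync/ptp-status/lock-state/CurrentState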

The response is a cloud native event JSON object. For example:

Example lock-state API response

{
  "id": "c1ac3aa5-1195-4786-84f8-da0ea4462921",
  "type": "event.sync.ptp-status.ptp-state-change",
  "source": "/cluster/node/compute-1.example.com/sync/ptp-status/lock-state",
  "dataContentType": "application/json",
  "time": "2023-01-10T02:41:57.094981478Z",
  "data": {
    "version": "v1",
    "values": [
      {
        "resource": "/cluster/node/compute-1.example.com/ens5fx/master",
        "dataType": "notification",
        "valueType": "enumeration",
        "value": "LOCKED"
      },
      {
        "resource": "/cluster/node/compute-1.example.com/ens5fx/master",
        "dataType": "metric",
        "valueType": "decimal64.3",
        "value": "29"
      }
    ]
  }
}

18.6. Verifying that the PTP events consumer application is receiving events

Verify that the cloud-event-proxy container in the application pod is receiving PTP events.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.
  • You have installed and configured the PTP Operator.

Procedure

  1. Get the list of active linuxptp-daemon pods. Run the following command:

    $ oc get pods -n openshift-ptp

    Example output

    NAME                    READY   STATUS    RESTARTS   AGE
    linuxptp-daemon-2t78p   3/3     Running   0          8h
    linuxptp-daemon-k8n88   3/3     Running   0          8h

  2. Access the metrics for the required consumer-side cloud-event-proxy container by running the following command:

    $ oc exec -it <linuxptp-daemon> -n openshift-ptp -c cloud-event-proxy -- curl 127.0.0.1:9091/metrics

    where:

    <linuxptp-daemon>

    Specifies the pod you want to query, for example, linuxptp-daemon-2t78p.

    Example output

    # HELP cne_transport_connections_resets Metric to get number of connection resets
    # TYPE cne_transport_connections_resets gauge
    cne_transport_connection_reset 1
    # HELP cne_transport_receiver Metric to get number of receiver created
    # TYPE cne_transport_receiver gauge
    cne_transport_receiver{address="/cluster/node/compute-1.example.com/ptp",status="active"} 2
    cne_transport_receiver{address="/cluster/node/compute-1.example.com/redfish/event",status="active"} 2
    # HELP cne_transport_sender Metric to get number of sender created
    # TYPE cne_transport_sender gauge
    cne_transport_sender{address="/cluster/node/compute-1.example.com/ptp",status="active"} 1
    cne_transport_sender{address="/cluster/node/compute-1.example.com/redfish/event",status="active"} 1
    # HELP cne_events_ack Metric to get number of events produced
    # TYPE cne_events_ack gauge
    cne_events_ack{status="success",type="/cluster/node/compute-1.example.com/ptp"} 18
    cne_events_ack{status="success",type="/cluster/node/compute-1.example.com/redfish/event"} 18
    # HELP cne_events_transport_published Metric to get number of events published by the transport
    # TYPE cne_events_transport_published gauge
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/ptp",status="failed"} 1
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/ptp",status="success"} 18
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/redfish/event",status="failed"} 1
    cne_events_transport_published{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 18
    # HELP cne_events_transport_received Metric to get number of events received  by the transport
    # TYPE cne_events_transport_received gauge
    cne_events_transport_received{address="/cluster/node/compute-1.example.com/ptp",status="success"} 18
    cne_events_transport_received{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 18
    # HELP cne_events_api_published Metric to get number of events published by the rest api
    # TYPE cne_events_api_published gauge
    cne_events_api_published{address="/cluster/node/compute-1.example.com/ptp",status="success"} 19
    cne_events_api_published{address="/cluster/node/compute-1.example.com/redfish/event",status="success"} 19
    # HELP cne_events_received Metric to get number of events received
    # TYPE cne_events_received gauge
    cne_events_received{status="success",type="/cluster/node/compute-1.example.com/ptp"} 18
    cne_events_received{status="success",type="/cluster/node/compute-1.example.com/redfish/event"} 18
    # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
    # TYPE promhttp_metric_handler_requests_in_flight gauge
    promhttp_metric_handler_requests_in_flight 1
    # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
    # TYPE promhttp_metric_handler_requests_total counter
    promhttp_metric_handler_requests_total{code="200"} 4
    promhttp_metric_handler_requests_total{code="500"} 0
    promhttp_metric_handler_requests_total{code="503"} 0

Chapter 19. External DNS Operator

19.1. External DNS Operator in OpenShift Container Platform

The External DNS Operator deploys and manages ExternalDNS to provide the name resolution for services and routes from the external DNS provider to OpenShift Container Platform.

19.1.1. External DNS Operator

The External DNS Operator implements the External DNS API from the olm.openshift.io API group. The External DNS Operator updates services, routes, and external DNS providers.

Prerequisites

  • You have installed the yq CLI tool.

Procedure

You can deploy the External DNS Operator on demand from the OperatorHub. Deploying the External DNS Operator creates a Subscription object.

  1. Check the name of an install plan by running the following command:

    $ oc -n external-dns-operator get sub external-dns-operator -o yaml | yq '.status.installplan.name'

    Example output

    install-zcvlr

  2. Check if the status of an install plan is Complete by running the following command:

    $ oc -n external-dns-operator get ip <install_plan_name> -o yaml | yq '.status.phase'

    Example output

    Complete

  3. View the status of the external-dns-operator deployment by running the following command:

    $ oc get -n external-dns-operator deployment/external-dns-operator

    Example output

    NAME                    READY     UP-TO-DATE   AVAILABLE   AGE
    external-dns-operator   1/1       1            1           23h

19.1.2. External DNS Operator logs

You can view External DNS Operator logs by using the oc logs command.

Procedure

  1. View the logs of the External DNS Operator by running the following command:

    $ oc logs -n external-dns-operator deployment/external-dns-operator -c external-dns-operator
19.1.2.1. External DNS Operator domain name limitations

The External DNS Operator uses the TXT registry, which adds a prefix to TXT records. This reduces the maximum length of the domain name for TXT records. A DNS record cannot be present without a corresponding TXT record, so the domain name of the DNS record must follow the same limit as the TXT records. For example, a DNS record of <domain_name_from_source> results in a TXT record of external-dns-<record_type>-<domain_name_from_source>.

The domain name of the DNS records generated by the External DNS Operator has the following limitations:

Record type                           Number of characters
CNAME                                 44
Wildcard CNAME records on AzureDNS    42
A                                     48
Wildcard A records on AzureDNS        46
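
These limits appear to follow from the 63-character maximum length of a DNS label minus the length of the TXT registry prefix. For example, the external-dns-cname- prefix is 19 characters, which leaves 44 characters for the CNAME record name. You can check the prefix length with a quick shell command:

$ echo -n "external-dns-cname-" | wc -c

Example output

19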

The following error appears in the External DNS Operator logs if the generated domain name exceeds any of the domain name limitations:

time="2022-09-02T08:53:57Z" level=error msg="Failure in zone test.example.io. [Id: /hostedzone/Z06988883Q0H0RL6UMXXX]"
time="2022-09-02T08:53:57Z" level=error msg="InvalidChangeBatch: [FATAL problem: DomainLabelTooLong (Domain label is too long) encountered with 'external-dns-a-hello-openshift-aaaaaaaaaa-bbbbbbbbbb-ccccccc']\n\tstatus code: 400, request id: e54dfd5a-06c6-47b0-bcb9-a4f7c3a4e0c6"

19.2. Installing External DNS Operator on cloud providers

You can install the External DNS Operator on cloud providers such as AWS, Azure, and GCP.

19.2.1. Installing the External DNS Operator with OperatorHub

You can install the External DNS Operator by using the OpenShift Container Platform OperatorHub.

Procedure

  1. Click Operators → OperatorHub in the OpenShift Container Platform web console.
  2. Click External DNS Operator. You can use the Filter by keyword text box or the filter list to search for External DNS Operator from the list of Operators.
  3. Select the external-dns-operator namespace.
  4. On the External DNS Operator page, click Install.
  5. On the Install Operator page, ensure that you selected the following options:

    1. Update the channel as stable-v1.
    2. Installation mode as A specific namespace on the cluster.
    3. Installed namespace as external-dns-operator. If namespace external-dns-operator does not exist, it gets created during the Operator installation.
    4. Select Approval Strategy as Automatic or Manual. Approval Strategy is set to Automatic by default.
    5. Click Install.

If you select Automatic updates, the Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without any intervention.

If you select Manual updates, the OLM creates an update request. As a cluster administrator, you must then manually approve that update request to have the Operator updated to the new version.

Verification

Verify that the External DNS Operator shows the Status as Succeeded on the Installed Operators dashboard.

19.2.2. Installing the External DNS Operator by using the CLI

You can install the External DNS Operator by using the CLI.

Prerequisites

  • You are logged in to the OpenShift Container Platform web console as a user with cluster-admin permissions.
  • You are logged into the OpenShift CLI (oc).

Procedure

  1. Create a Namespace object:

    1. Create a YAML file that defines the Namespace object:

      Example namespace.yaml file

      apiVersion: v1
      kind: Namespace
      metadata:
        name: external-dns-operator

    2. Create the Namespace object by running the following command:

      $ oc apply -f namespace.yaml
  2. Create an OperatorGroup object:

    1. Create a YAML file that defines the OperatorGroup object:

      Example operatorgroup.yaml file

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: external-dns-operator
        namespace: external-dns-operator
      spec:
        upgradeStrategy: Default
        targetNamespaces:
        - external-dns-operator

    2. Create the OperatorGroup object by running the following command:

      $ oc apply -f operatorgroup.yaml
  3. Create a Subscription object:

    1. Create a YAML file that defines the Subscription object:

      Example subscription.yaml file

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: external-dns-operator
        namespace: external-dns-operator
      spec:
        channel: stable-v1
        installPlanApproval: Automatic
        name: external-dns-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace

    2. Create the Subscription object by running the following command:

      $ oc apply -f subscription.yaml

Verification

  1. Get the name of the install plan from the subscription by running the following command:

    $ oc -n external-dns-operator \
        get subscription external-dns-operator \
        --template='{{.status.installplan.name}}{{"\n"}}'
  2. Verify that the status of the install plan is Complete by running the following command:

    $ oc -n external-dns-operator \
        get ip <install_plan_name> \
        --template='{{.status.phase}}{{"\n"}}'
  3. Verify that the status of the external-dns-operator pod is Running by running the following command:

    $ oc -n external-dns-operator get pod

    Example output

    NAME                                     READY   STATUS    RESTARTS   AGE
    external-dns-operator-5584585fd7-5lwqm   2/2     Running   0          11m

  4. Verify that the catalog source of the subscription is redhat-operators by running the following command:

    $ oc -n external-dns-operator get subscription

    Example output

    NAME                    PACKAGE                 SOURCE             CHANNEL
    external-dns-operator   external-dns-operator   redhat-operators   stable-v1

  5. Check the external-dns-operator version by running the following command:

    $ oc -n external-dns-operator get csv

    Example output

    NAME                           DISPLAY                VERSION   REPLACES   PHASE
    external-dns-operator.v<1.y.z>   ExternalDNS Operator   <1.y.z>                Succeeded

19.3. External DNS Operator configuration parameters

The External DNS Operator includes the following configuration parameters.

19.3.1. External DNS Operator configuration parameters

The External DNS Operator includes the following configuration parameters:

Parameter            Description

spec

Enables the type of a cloud provider.

spec:
  provider:
    type: AWS 1
    aws:
      credentials:
        name: aws-access-key 2
1
Defines available options such as AWS, GCP, Azure, and Infoblox.
2
Defines a secret name for your cloud provider.

zones

Enables you to specify DNS zones by their domains. If you do not specify zones, the ExternalDNS resource discovers all of the zones present in your cloud provider account.

zones:
- "myzoneid" 1
1
Specifies the name of DNS zones.

domains

Enables you to specify AWS zones by their domains. If you do not specify domains, the ExternalDNS resource discovers all of the zones present in your cloud provider account.

domains:
- filterType: Include 1
  matchType: Exact 2
  name: "myzonedomain1.com" 3
- filterType: Include
  matchType: Pattern 4
  pattern: ".*\\.otherzonedomain\\.com" 5
1
Ensures that the ExternalDNS resource includes the domain name.
2
Instructs ExternalDNS that the domain matching has to be exact as opposed to regular expression match.
3
Defines the name of the domain.
4
Sets the regex-domain-filter flag in the ExternalDNS resource. You can limit possible domains by using a Regex filter.
5
Defines the regex pattern to be used by the ExternalDNS resource to filter the domains of the target zones.

source

Enables you to specify the source for the DNS records, Service or Route.

source: 1
  type: Service 2
  service:
    serviceType: 3
      - LoadBalancer
      - ClusterIP
  labelFilter: 4
    matchLabels:
      external-dns.mydomain.org/publish: "yes"
  hostnameAnnotation: "Allow" 5
  fqdnTemplate:
  - "{{.Name}}.myzonedomain.com" 6
1
Defines the settings for the source of DNS records.
2
The ExternalDNS resource uses the Service type as the source for creating DNS records.
3
Sets the service-type-filter flag in the ExternalDNS resource. The serviceType contains the following fields:
  • default: LoadBalancer
  • expected: ClusterIP
  • NodePort
  • LoadBalancer
  • ExternalName
4
Ensures that the controller considers only those resources which matches with label filter.
5
The default value for hostnameAnnotation is Ignore, which instructs ExternalDNS to generate DNS records using the templates specified in the fqdnTemplate field. When the value is Allow, the DNS records are generated based on the value specified in the external-dns.alpha.kubernetes.io/hostname annotation.
6
The External DNS Operator uses a string to generate DNS names from sources that don’t define a hostname, or to add a hostname suffix when paired with the fake source.
source:
  type: OpenShiftRoute 1
  openshiftRouteOptions:
    routerName: default 2
    labelFilter:
      matchLabels:
        external-dns.mydomain.org/publish: "yes"
1
Creates DNS records.
2
If the source type is OpenShiftRoute, then you can pass the Ingress Controller name. The ExternalDNS resource uses the canonical name of the Ingress Controller as the target for CNAME records.

19.4. Creating DNS records on AWS

You can create DNS records on AWS and AWS GovCloud by using External DNS Operator.

19.4.1. Creating DNS records on a public hosted zone for AWS by using Red Hat External DNS Operator

You can create DNS records on a public hosted zone for AWS by using the Red Hat External DNS Operator. You can use the same instructions to create DNS records on a hosted zone for AWS GovCloud.

Procedure

  1. Check the user. The user must have access to the kube-system namespace. If you do not have the credentials, you can fetch the credentials from the kube-system namespace to use the cloud provider client:

    $ oc whoami

    Example output

    system:admin

  2. Fetch the values from aws-creds secret present in kube-system namespace.

    $ export AWS_ACCESS_KEY_ID=$(oc get secrets aws-creds -n kube-system  --template={{.data.aws_access_key_id}} | base64 -d)
    $ export AWS_SECRET_ACCESS_KEY=$(oc get secrets aws-creds -n kube-system  --template={{.data.aws_secret_access_key}} | base64 -d)
  3. Get the routes to check the domain:

    $ oc get routes --all-namespaces | grep console

    Example output

    openshift-console          console             console-openshift-console.apps.testextdnsoperator.apacshift.support                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.testextdnsoperator.apacshift.support                     downloads           http    edge/Redirect          None

  4. Get the list of dns zones to find the one which corresponds to the previously found route’s domain:

    $ aws route53 list-hosted-zones | grep testextdnsoperator.apacshift.support

    Example output

    HOSTEDZONES	terraform	/hostedzone/Z02355203TNN1XXXX1J6O	testextdnsoperator.apacshift.support.	5

  5. Create ExternalDNS resource for route source:

    $ cat <<EOF | oc create -f -
    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-aws 1
    spec:
      domains:
      - filterType: Include   2
        matchType: Exact   3
        name: testextdnsoperator.apacshift.support 4
      provider:
        type: AWS 5
      source:  6
        type: OpenShiftRoute 7
        openshiftRouteOptions:
          routerName: default 8
    EOF
    1
    Defines the name of external DNS resource.
    2
    By default all hosted zones are selected as potential targets. You can include a hosted zone that you need.
    3
    The matching of the target zone’s domain has to be exact (as opposed to regular expression match).
    4
    Specify the exact domain of the zone you want to update. The hostname of the routes must be subdomains of the specified domain.
    5
    Defines the AWS Route53 DNS provider.
    6
    Defines options for the source of DNS records.
    7
    Defines OpenShift route resource as the source for the DNS records which gets created in the previously specified DNS provider.
    8
    If the source is OpenShiftRoute, then you can pass the OpenShift Ingress Controller name. External DNS Operator selects the canonical hostname of that router as the target while creating CNAME record.
  6. Check the records created for OCP routes using the following command:

    $ aws route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'CNAME']" | grep console
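
    Optionally, you can also confirm that the corresponding TXT registry records were created, for example:

    $ aws route53 list-resource-record-sets --hosted-zone-id Z02355203TNN1XXXX1J6O --query "ResourceRecordSets[?Type == 'TXT']" | grep console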

19.5. Creating DNS records on Azure

You can create DNS records on Azure by using the External DNS Operator.

Important

Using the External DNS Operator on a Microsoft Entra Workload ID-enabled cluster or a cluster that runs in Microsoft Azure Government (MAG) regions is not supported.

19.5.1. Creating DNS records on an Azure public DNS zone

You can create DNS records on a public DNS zone for Azure by using the External DNS Operator.

Prerequisites

  • You must have administrator privileges.
  • The admin user must have access to the kube-system namespace.

Procedure

  1. Fetch the credentials from the kube-system namespace to use the cloud provider client by running the following command:

    $ CLIENT_ID=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_client_id}} | base64 -d)
    $ CLIENT_SECRET=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_client_secret}} | base64 -d)
    $ RESOURCE_GROUP=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_resourcegroup}} | base64 -d)
    $ SUBSCRIPTION_ID=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_subscription_id}} | base64 -d)
    $ TENANT_ID=$(oc get secrets azure-credentials  -n kube-system  --template={{.data.azure_tenant_id}} | base64 -d)
  2. Log in to Azure by running the following command:

    $ az login --service-principal -u "${CLIENT_ID}" -p "${CLIENT_SECRET}" --tenant "${TENANT_ID}"
  3. Get a list of routes by running the following command:

    $ oc get routes --all-namespaces | grep console

    Example output

    openshift-console          console             console-openshift-console.apps.test.azure.example.com                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.test.azure.example.com                     downloads           http    edge/Redirect          None

  4. Get a list of DNS zones by running the following command:

    $ az network dns zone list --resource-group "${RESOURCE_GROUP}"
  5. Create a YAML file, for example, external-dns-sample-azure.yaml, that defines the ExternalDNS object:

    Example external-dns-sample-azure.yaml file

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-azure 1
    spec:
      zones:
      - "/subscriptions/1234567890/resourceGroups/test-azure-xxxxx-rg/providers/Microsoft.Network/dnszones/test.azure.example.com" 2
      provider:
        type: Azure 3
      source:
        openshiftRouteOptions: 4
          routerName: default 5
        type: OpenShiftRoute 6

    1
    Specifies the External DNS name.
    2
    Defines the zone ID.
    3
    Defines the provider type.
    4
    You can define options for the source of DNS records.
    5
    If the source type is OpenShiftRoute, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating the CNAME record.
    6
    Defines the route resource as the source for the Azure DNS records.
  6. Check the DNS records created for OpenShift Container Platform routes by running the following command:

    $ az network dns record-set list -g "${RESOURCE_GROUP}"  -z test.azure.example.com | grep console
    Note

    To create records in a private hosted zone on Azure private DNS, specify the private zone under the zones field. This populates the provider type as azure-private-dns in the ExternalDNS container arguments.
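
    For example, a minimal sketch of an ExternalDNS object that targets a private zone might look like the following. The subscription ID, resource group, and zone name are placeholders used for illustration only; only the zones value changes compared to the public zone example:

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-azure-private
    spec:
      zones:
      - "/subscriptions/1234567890/resourceGroups/test-azure-xxxxx-rg/providers/Microsoft.Network/privateDnsZones/test.azure.example.com"
      provider:
        type: Azure
      source:
        type: OpenShiftRoute
        openshiftRouteOptions:
          routerName: default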

19.6. Creating DNS records on GCP

You can create DNS records on Google Cloud Platform (GCP) by using the External DNS Operator.

Important

Using the External DNS Operator on a cluster with GCP Workload Identity enabled is not supported. For more information about the GCP Workload Identity, see Using manual mode with GCP Workload Identity.

19.6.1. Creating DNS records on a public managed zone for GCP

You can create DNS records on a public managed zone for GCP by using the External DNS Operator.

Prerequisites

  • You must have administrator privileges.

Procedure

  1. Copy the gcp-credentials secret to the decoded-gcloud.json file by running the following command:

    $ oc get secret gcp-credentials -n kube-system --template='{{$v := index .data "service_account.json"}}{{$v}}' | base64 -d - > decoded-gcloud.json
  2. Export your Google credentials by running the following command:

    $ export GOOGLE_CREDENTIALS=decoded-gcloud.json
  3. Activate your account by using the following command:

    $ gcloud auth activate-service-account  <client_email as per decoded-gcloud.json> --key-file=decoded-gcloud.json
  4. Set your project by running the following command:

    $ gcloud config set project <project_id as per decoded-gcloud.json>
  5. Get a list of routes by running the following command:

    $ oc get routes --all-namespaces | grep console

    Example output

    openshift-console          console             console-openshift-console.apps.test.gcp.example.com                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.test.gcp.example.com                     downloads           http    edge/Redirect          None

  6. Get a list of managed zones by running the following command:

    $ gcloud dns managed-zones list | grep test.gcp.example.com

    Example output

    qe-cvs4g-private-zone test.gcp.example.com

  7. Create a YAML file, for example, external-dns-sample-gcp.yaml, that defines the ExternalDNS object:

    Example external-dns-sample-gcp.yaml file

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-gcp 1
    spec:
      domains:
        - filterType: Include 2
          matchType: Exact 3
          name: test.gcp.example.com 4
      provider:
        type: GCP 5
      source:
        openshiftRouteOptions: 6
          routerName: default 7
        type: OpenShiftRoute 8

    1
    Specifies the External DNS name.
    2
    By default, all hosted zones are selected as potential targets. You can include your hosted zone.
    3
    The domain of the target must match the string defined by the name key.
    4
    Specify the exact domain of the zone you want to update. The hostname of the routes must be subdomains of the specified domain.
    5
    Defines the provider type.
    6
    You can define options for the source of DNS records.
    7
    If the source type is OpenShiftRoute, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating the CNAME record.
    8
    Defines the route resource as the source for GCP DNS records.
  8. Check the DNS records created for OpenShift Container Platform routes by running the following command:

    $ gcloud dns record-sets list --zone=qe-cvs4g-private-zone | grep console

19.7. Creating DNS records on Infoblox

You can create DNS records on Infoblox by using the External DNS Operator.

19.7.1. Creating DNS records on a public DNS zone on Infoblox

You can create DNS records on a public DNS zone on Infoblox by using the External DNS Operator.

Prerequisites

  • You have access to the OpenShift CLI (oc).
  • You have access to the Infoblox UI.

Procedure

  1. Create a secret object with Infoblox credentials by running the following command:

    $ oc -n external-dns-operator create secret generic infoblox-credentials --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_USERNAME=<infoblox_username> --from-literal=EXTERNAL_DNS_INFOBLOX_WAPI_PASSWORD=<infoblox_password>
  2. Get a list of routes by running the following command:

    $ oc get routes --all-namespaces | grep console

    Example output

    openshift-console          console             console-openshift-console.apps.test.example.com                       console             https   reencrypt/Redirect     None
    openshift-console          downloads           downloads-openshift-console.apps.test.example.com                     downloads           http    edge/Redirect          None

  3. Create a YAML file, for example, external-dns-sample-infoblox.yaml, that defines the ExternalDNS object:

    Example external-dns-sample-infoblox.yaml file

    apiVersion: externaldns.olm.openshift.io/v1beta1
    kind: ExternalDNS
    metadata:
      name: sample-infoblox 1
    spec:
      provider:
        type: Infoblox 2
        infoblox:
          credentials:
            name: infoblox-credentials
          gridHost: ${INFOBLOX_GRID_PUBLIC_IP}
          wapiPort: 443
          wapiVersion: "2.3.1"
      domains:
      - filterType: Include
        matchType: Exact
        name: test.example.com
      source:
        type: OpenShiftRoute 3
        openshiftRouteOptions:
          routerName: default 4

    1
    Specifies the External DNS name.
    2
    Defines the provider type.
    3
    Defines the OpenShift route resource as the source for the DNS records created in Infoblox.
    4
    If the source type is OpenShiftRoute, you can pass the OpenShift Ingress Controller name. External DNS selects the canonical hostname of that router as the target while creating the CNAME record.
  4. Create the ExternalDNS resource on Infoblox by running the following command:

    $ oc create -f external-dns-sample-infoblox.yaml
  5. From the Infoblox UI, check the DNS records created for console routes:

    1. Click Data Management → DNS → Zones.
    2. Select the zone name.

19.8. Configuring the cluster-wide proxy on the External DNS Operator

After configuring the cluster-wide proxy, the Operator Lifecycle Manager (OLM) triggers automatic updates to all of the deployed Operators with the new contents of the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables.
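
For reference, the cluster-wide proxy is defined in the cluster Proxy object. A minimal sketch, with placeholder proxy endpoints and an optional trusted CA bundle, looks similar to the following:

apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  httpProxy: http://<proxy_host>:<proxy_port>
  httpsProxy: https://<proxy_host>:<proxy_port>
  noProxy: example.com
  trustedCA:
    name: user-ca-bundle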

19.8.1. Trusting the certificate authority of the cluster-wide proxy

You can configure the External DNS Operator to trust the certificate authority of the cluster-wide proxy.

Procedure

  1. Create the config map to contain the CA bundle in the external-dns-operator namespace by running the following command:

    $ oc -n external-dns-operator create configmap trusted-ca
  2. To inject the trusted CA bundle into the config map, add the config.openshift.io/inject-trusted-cabundle=true label to the config map by running the following command:

    $ oc -n external-dns-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
  3. Update the subscription of the External DNS Operator by running the following command:

    $ oc -n external-dns-operator patch subscription external-dns-operator --type='json' -p='[{"op": "add", "path": "/spec/config", "value":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}]}}]'
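
    After the patch is applied, the spec.config stanza of the Subscription resembles the following sketch. Other Subscription fields are omitted for brevity:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: external-dns-operator
      namespace: external-dns-operator
    spec:
      # ...
      config:
        env:
        - name: TRUSTED_CA_CONFIGMAP_NAME
          value: trusted-ca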

Verification

  • After the deployment of the External DNS Operator is completed, verify that the trusted CA environment variable is added to the external-dns-operator deployment by running the following command:

    $ oc -n external-dns-operator exec deploy/external-dns-operator -c external-dns-operator -- printenv TRUSTED_CA_CONFIGMAP_NAME

    Example output

    trusted-ca

Chapter 20. Network policy

20.1. About network policy

As a developer, you can define network policies that restrict traffic to pods in your cluster.

20.1.1. About network policy

In a cluster using a network plugin that supports Kubernetes network policy, network isolation is controlled entirely by NetworkPolicy objects. In OpenShift Container Platform 4.13, OpenShift SDN supports using network policy in its default network isolation mode.

Warning

Network policy does not apply to the host network namespace. Pods with host networking enabled are unaffected by network policy rules. However, pods connecting to the host-networked pods might be affected by the network policy rules.

Network policies cannot block traffic from localhost or from their resident nodes.

By default, all pods in a project are accessible from other pods and network endpoints. To isolate one or more pods in a project, you can create NetworkPolicy objects in that project to indicate the allowed incoming connections. Project administrators can create and delete NetworkPolicy objects within their own project.

If a pod is matched by selectors in one or more NetworkPolicy objects, then the pod will accept only connections that are allowed by at least one of those NetworkPolicy objects. A pod that is not selected by any NetworkPolicy objects is fully accessible.

A network policy applies to only the TCP, UDP, ICMP, and SCTP protocols. Other protocols are not affected.
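
For example, a minimal sketch of a policy that restricts ingress to specific UDP and SCTP ports looks like the following. The role=dns label and the port numbers are hypothetical:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-udp-and-sctp
spec:
  podSelector:
    matchLabels:
      role: dns
  ingress:
  - ports:
    - protocol: UDP
      port: 53
    - protocol: SCTP
      port: 9999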

The following example NetworkPolicy objects demonstrate supporting different scenarios:

  • Deny all traffic:

    To make a project deny by default, add a NetworkPolicy object that matches all pods but accepts no traffic:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: deny-by-default
    spec:
      podSelector: {}
      ingress: []
  • Only allow connections from the OpenShift Container Platform Ingress Controller:

    To make a project allow only connections from the OpenShift Container Platform Ingress Controller, add the following NetworkPolicy object.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-from-openshift-ingress
    spec:
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              network.openshift.io/policy-group: ingress
      podSelector: {}
      policyTypes:
      - Ingress
  • Only accept connections from pods within a project:

    Important

    To allow ingress connections from hostNetwork pods in the same namespace, you need to apply the allow-from-hostnetwork policy together with the allow-same-namespace policy.

    To make pods accept connections from other pods in the same project, but reject all other connections from pods in other projects, add the following NetworkPolicy object:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-same-namespace
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector: {}
  • Only allow HTTP and HTTPS traffic based on pod labels:

    To enable only HTTP and HTTPS access to the pods with a specific label (role=frontend in following example), add a NetworkPolicy object similar to the following:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-http-and-https
    spec:
      podSelector:
        matchLabels:
          role: frontend
      ingress:
      - ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
  • Accept connections by using both namespace and pod selectors:

    To match network traffic by combining namespace and pod selectors, you can use a NetworkPolicy object similar to the following:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-pod-and-namespace-both
    spec:
      podSelector:
        matchLabels:
          name: test-pods
      ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                project: project_name
            podSelector:
              matchLabels:
                name: test-pods

NetworkPolicy objects are additive, which means you can combine multiple NetworkPolicy objects together to satisfy complex network requirements.

For example, you can define both the allow-same-namespace and the allow-http-and-https policies from the previous samples within the same project. Pods with the label role=frontend then accept any connection allowed by either policy: connections on any port from pods in the same namespace, and connections on ports 80 and 443 from pods in any namespace.

20.1.1.1. Using the allow-from-router network policy

Use the following NetworkPolicy to allow external traffic regardless of the router configuration:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-router
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/ingress: "" 1
  podSelector: {}
  policyTypes:
  - Ingress
1
The policy-group.network.openshift.io/ingress: "" label supports both OpenShift SDN and OVN-Kubernetes.
20.1.1.2. Using the allow-from-hostnetwork network policy

Add the following allow-from-hostnetwork NetworkPolicy object to allow traffic from the host network pods.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-hostnetwork
spec:
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/host-network: ""
  podSelector: {}
  policyTypes:
  - Ingress

20.1.2. Optimizations for network policy with OpenShift SDN

Use a network policy to isolate pods that are differentiated from one another by labels within a namespace.

It is inefficient to apply NetworkPolicy objects to large numbers of individual pods in a single namespace. Pod labels do not exist at the IP address level, so a network policy generates a separate Open vSwitch (OVS) flow rule for every possible link between every pod selected with a podSelector.

For example, if the spec podSelector and the ingress podSelector within a NetworkPolicy object each match 200 pods, then 40,000 (200*200) OVS flow rules are generated. This might slow down a node.
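
The following minimal sketch shows the shape of such a policy. The app=web and app=client labels are hypothetical; if each selects 200 pods, the policy generates one OVS flow rule per pod pair:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-client-to-web
spec:
  podSelector:
    matchLabels:
      app: web
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: client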

When designing your network policy, refer to the following guidelines:

  • Reduce the number of OVS flow rules by using namespaces to contain groups of pods that need to be isolated.

    NetworkPolicy objects that select a whole namespace, by using the namespaceSelector or an empty podSelector, generate only a single OVS flow rule that matches the VXLAN virtual network ID (VNID) of the namespace.

  • Keep the pods that do not need to be isolated in their original namespace, and move the pods that require isolation into one or more different namespaces.
  • Create additional targeted cross-namespace network policies to allow the specific traffic that you do want from the isolated pods, as shown in the following example.
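
    For example, a minimal sketch of a targeted cross-namespace policy might allow only metrics scraping into the isolated namespace. The monitoring namespace name and the port are hypothetical:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: allow-metrics-from-monitoring
    spec:
      podSelector: {}
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
        ports:
        - protocol: TCP
          port: 8443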

20.1.3. Optimizations for network policy with OVN-Kubernetes network plugin

When designing your network policy, refer to the following guidelines:

  • For network policies with the same spec.podSelector spec, it is more efficient to use one network policy with multiple ingress or egress rules, than multiple network policies with subsets of ingress or egress rules.
  • Every ingress or egress rule that is based on the podSelector or namespaceSelector spec generates a number of OVS flows proportional to the number of pods selected by the network policy plus the number of pods selected by the ingress or egress rule. Therefore, it is preferable to use a podSelector or namespaceSelector spec that selects as many pods as you need in one rule, instead of creating individual rules for every pod.

    For example, the following policy contains two rules:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: test-network-policy
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend
      - from:
        - podSelector:
            matchLabels:
              role: backend

    The following policy expresses those same two rules as one:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: test-network-policy
    spec:
      podSelector: {}
      ingress:
      - from:
        - podSelector:
            matchExpressions:
            - {key: role, operator: In, values: [frontend, backend]}

    The same guideline applies to the spec.podSelector spec. If you have the same ingress or egress rules for different network policies, it might be more efficient to create one network policy with a common spec.podSelector spec. For example, the following two policies have different rules:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy1
    spec:
      podSelector:
        matchLabels:
          role: db
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend
    ---
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy2
    spec:
      podSelector:
        matchLabels:
          role: client
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend

    The following network policy expresses those same two rules as one:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy3
    spec:
      podSelector:
        matchExpressions:
        - {key: role, operator: In, values: [db, client]}
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend

    You can apply this optimization only when multiple selectors can be expressed as one. When the selectors are based on different labels, this optimization might not be possible. In those cases, consider applying new labels specifically for network policy optimization, as in the following sketch.
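
    For example, a minimal sketch of this approach, assuming you apply a hypothetical netpol-group=frontend-allowed label to both the db and the client pods, looks like the following:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: policy4
    spec:
      podSelector:
        matchLabels:
          netpol-group: frontend-allowed
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: frontend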

20.1.4. Next steps

20.1.5. Additional resources

20.2. Creating a network policy

As a user with the admin role, you can create a network policy for a namespace.

20.2.1. Example NetworkPolicy object

The following annotates an example NetworkPolicy object:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-27107 1
spec:
  podSelector: 2
    matchLabels:
      app: mongodb
  ingress:
  - from:
    - podSelector: 3
        matchLabels:
          app: app
    ports: 4
    - protocol: TCP
      port: 27017
1
The name of the NetworkPolicy object.
2
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
3
A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
4
A list of one or more destination ports on which to accept traffic.

20.2.2. Creating a network policy using the CLI

To define granular rules describing ingress or egress network traffic allowed for namespaces in your cluster, you can create a network policy.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy rule:

    1. Create a <policy_name>.yaml file:

      $ touch <policy_name>.yaml

      where:

      <policy_name>
      Specifies the network policy file name.
    2. Define a network policy in the file that you just created, such as in the following examples:

      Deny ingress from all pods in all namespaces

      This is a fundamental policy that blocks all cross-pod networking other than the cross-pod traffic allowed by other network policies.

      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: deny-by-default
      spec:
        podSelector:
        ingress: []

      Allow ingress from all pods in the same namespace

      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: allow-same-namespace
      spec:
        podSelector:
        ingress:
        - from:
          - podSelector: {}

      Allow ingress traffic to one pod from a particular namespace

      This policy allows traffic to pods labeled pod-a from pods running in namespace-y.

      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: allow-traffic-pod
      spec:
        podSelector:
         matchLabels:
            pod: pod-a
        policyTypes:
        - Ingress
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                 kubernetes.io/metadata.name: namespace-y
  2. To create the network policy object, enter the following command:

    $ oc apply -f <policy_name>.yaml -n <namespace>

    where:

    <policy_name>
    Specifies the network policy file name.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.

    Example output

    networkpolicy.networking.k8s.io/deny-by-default created

Note

If you log in to the web console with cluster-admin privileges, you have a choice of creating a network policy in any namespace in the cluster directly in YAML or from a form in the web console.

20.2.3. Creating a default deny all network policy

This is a fundamental policy, blocking all cross-pod networking other than the network traffic allowed by other deployed network policies. This procedure enforces a deny-by-default policy.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create the following YAML that defines a deny-by-default policy to deny ingress from all pods in all namespaces. Save the YAML in the deny-by-default.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: deny-by-default
      namespace: default 1
    spec:
      podSelector: {} 2
      ingress: [] 3
    1
    namespace: default deploys this policy to the default namespace.
    2
    podSelector: is empty, which means it matches all pods. Therefore, the policy applies to all pods in the default namespace.
    3
    There are no ingress rules specified. This causes all incoming traffic to the pods to be dropped.
  2. Apply the policy by entering the following command:

    $ oc apply -f deny-by-default.yaml

    Example output

    networkpolicy.networking.k8s.io/deny-by-default created

20.2.4. Creating a network policy to allow traffic from external clients

With the deny-by-default policy in place you can proceed to configure a policy that allows traffic from external clients to a pod with the label app=web.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows external services, either directly from the public internet or by using a load balancer, to access the pod. Traffic is allowed only to pods with the label app=web.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy that allows traffic from the public Internet directly or by using a load balancer to access the pod. Save the YAML in the web-allow-external.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: web-allow-external
      namespace: default
    spec:
      policyTypes:
      - Ingress
      podSelector:
        matchLabels:
          app: web
      ingress:
        - {}
  2. Apply the policy by entering the following command:

    $ oc apply -f web-allow-external.yaml

    Example output

    networkpolicy.networking.k8s.io/web-allow-external created

This policy allows traffic from all resources, including external traffic as illustrated in the following diagram:

Allow traffic from external clients

20.2.5. Creating a network policy allowing traffic to an application from all namespaces

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows traffic from all pods in all namespaces to a particular application.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy that allows traffic from all pods in all namespaces to a particular application. Save the YAML in the web-allow-all-namespaces.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: web-allow-all-namespaces
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: web 1
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector: {} 2
    1
    Applies the policy only to app:web pods in the default namespace.
    2
    Selects all pods in all namespaces.
    Note

    If you omit the namespaceSelector, no namespaces are selected, which means the policy allows traffic only from the namespace that the network policy is deployed to.

  2. Apply the policy by entering the following command:

    $ oc apply -f web-allow-all-namespaces.yaml

    Example output

    networkpolicy.networking.k8s.io/web-allow-all-namespaces created

Verification

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
  2. Run the following command to deploy an alpine image in the secondary namespace and to start a shell:

    $ oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
  3. Run the following command in the shell and observe that the request is allowed:

    # wget -qO- --timeout=2 http://web.default

    Expected output

    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

20.2.6. Creating a network policy allowing traffic to an application from a namespace

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows traffic to a pod with the label app=web from a particular namespace. You might want to do this to:

  • Restrict traffic to a production database only to namespaces where production workloads are deployed.
  • Enable monitoring tools deployed to a particular namespace to scrape metrics from the current namespace.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace that the network policy applies to.

Procedure

  1. Create a policy that allows traffic from all pods in namespaces that have the label purpose=production. Save the YAML in the web-allow-prod.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: web-allow-prod
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: web 1
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              purpose: production 2
    1
    Applies the policy only to app:web pods in the default namespace.
    2
    Restricts traffic to only pods in namespaces that have the label purpose=production.
  2. Apply the policy by entering the following command:

    $ oc apply -f web-allow-prod.yaml

    Example output

    networkpolicy.networking.k8s.io/web-allow-prod created

Verification

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
  2. Run the following command to create the prod namespace:

    $ oc create namespace prod
  3. Run the following command to label the prod namespace:

    $ oc label namespace/prod purpose=production
  4. Run the following command to create the dev namespace:

    $ oc create namespace dev
  5. Run the following command to label the dev namespace:

    $ oc label namespace/dev purpose=testing
  6. Run the following command to deploy an alpine image in the dev namespace and to start a shell:

    $ oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
  7. Run the following command in the shell and observe that the request is blocked:

    # wget -qO- --timeout=2 http://web.default

    Expected output

    wget: download timed out

  8. Run the following command to deploy an alpine image in the prod namespace and start a shell:

    $ oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
  9. Run the following command in the shell and observe that the request is allowed:

    # wget -qO- --timeout=2 http://web.default

    Expected output

    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

20.2.7. Additional resources

20.3. Viewing a network policy

As a user with the admin role, you can view a network policy for a namespace.

20.3.1. Example NetworkPolicy object

The following annotates an example NetworkPolicy object:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-27107 1
spec:
  podSelector: 2
    matchLabels:
      app: mongodb
  ingress:
  - from:
    - podSelector: 3
        matchLabels:
          app: app
    ports: 4
    - protocol: TCP
      port: 27017
1
The name of the NetworkPolicy object.
2
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
3
A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
4
A list of one or more destination ports on which to accept traffic.

20.3.2. Viewing network policies using the CLI

You can examine the network policies in a namespace.

Note

If you log in with a user with the cluster-admin role, then you can view any network policy in the cluster.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace where the network policy exists.

Procedure

  • List network policies in a namespace:

    • To view network policy objects defined in a namespace, enter the following command:

      $ oc get networkpolicy
    • Optional: To examine a specific network policy, enter the following command:

      $ oc describe networkpolicy <policy_name> -n <namespace>

      where:

      <policy_name>
      Specifies the name of the network policy to inspect.
      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.

      For example:

      $ oc describe networkpolicy allow-same-namespace

      Output for oc describe command

      Name:         allow-same-namespace
      Namespace:    ns1
      Created on:   2021-05-24 22:28:56 -0400 EDT
      Labels:       <none>
      Annotations:  <none>
      Spec:
        PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
        Allowing ingress traffic:
          To Port: <any> (traffic allowed to all ports)
          From:
            PodSelector: <none>
        Not affecting egress traffic
        Policy Types: Ingress

Note

If you log in to the web console with cluster-admin privileges, you have a choice of viewing a network policy in any namespace in the cluster directly in YAML or from a form in the web console.

20.4. Editing a network policy

As a user with the admin role, you can edit an existing network policy for a namespace.

20.4.1. Editing a network policy

You can edit a network policy in a namespace.

Note

If you log in with a user with the cluster-admin role, then you can edit a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace where the network policy exists.

Procedure

  1. Optional: To list the network policy objects in a namespace, enter the following command:

    $ oc get networkpolicy

    where:

    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
  2. Edit the network policy object.

    • If you saved the network policy definition in a file, edit the file and make any necessary changes, and then enter the following command.

      $ oc apply -n <namespace> -f <policy_file>.yaml

      where:

      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
      <policy_file>
      Specifies the name of the file containing the network policy.
    • If you need to update the network policy object directly, enter the following command:

      $ oc edit networkpolicy <policy_name> -n <namespace>

      where:

      <policy_name>
      Specifies the name of the network policy.
      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
  3. Confirm that the network policy object is updated.

    $ oc describe networkpolicy <policy_name> -n <namespace>

    where:

    <policy_name>
    Specifies the name of the network policy.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Note

If you log in to the web console with cluster-admin privileges, you have a choice of editing a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.

20.4.2. Example NetworkPolicy object

The following annotates an example NetworkPolicy object:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-27107 1
spec:
  podSelector: 2
    matchLabels:
      app: mongodb
  ingress:
  - from:
    - podSelector: 3
        matchLabels:
          app: app
    ports: 4
    - protocol: TCP
      port: 27017
1
The name of the NetworkPolicy object.
2
A selector that describes the pods to which the policy applies. The policy object can only select pods in the project that defines the NetworkPolicy object.
3
A selector that matches the pods from which the policy object allows ingress traffic. The selector matches pods in the same namespace as the NetworkPolicy.
4
A list of one or more destination ports on which to accept traffic.

20.4.3. Additional resources

20.5. Deleting a network policy

As a user with the admin role, you can delete a network policy from a namespace.

20.5.1. Deleting a network policy using the CLI

You can delete a network policy in a namespace.

Note

If you log in with a user with the cluster-admin role, then you can delete any network policy in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.
  • You are working in the namespace where the network policy exists.

Procedure

  • To delete a network policy object, enter the following command:

    $ oc delete networkpolicy <policy_name> -n <namespace>

    where:

    <policy_name>
    Specifies the name of the network policy.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.

    Example output

    networkpolicy.networking.k8s.io/default-deny deleted

Note

If you log in to the web console with cluster-admin privileges, you have a choice of deleting a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.

20.6. Defining a default network policy for projects

As a cluster administrator, you can modify the new project template to automatically include network policies when you create a new project. If you do not yet have a customized template for new projects, you must first create one.

20.6.1. Modifying the template for new projects

As a cluster administrator, you can modify the default project template so that new projects are created using your custom requirements.

To create your own custom project template:

Procedure

  1. Log in as a user with cluster-admin privileges.
  2. Generate the default project template:

    $ oc adm create-bootstrap-project-template -o yaml > template.yaml
  3. Use a text editor to modify the generated template.yaml file by adding objects or modifying existing objects.
  4. The project template must be created in the openshift-config namespace. Load your modified template:

    $ oc create -f template.yaml -n openshift-config
  5. Edit the project configuration resource using the web console or CLI.

    • Using the web console:

      1. Navigate to the Administration → Cluster Settings page.
      2. Click Configuration to view all configuration resources.
      3. Find the entry for Project and click Edit YAML.
    • Using the CLI:

      1. Edit the project.config.openshift.io/cluster resource:

        $ oc edit project.config.openshift.io/cluster
  6. Update the spec section to include the projectRequestTemplate and name parameters, and set the name of your uploaded project template. The default name is project-request.

    Project configuration resource with custom project template

    apiVersion: config.openshift.io/v1
    kind: Project
    metadata:
    # ...
    spec:
      projectRequestTemplate:
        name: <template_name>
    # ...

  7. After you save your changes, create a new project to verify that your changes were successfully applied.

20.6.2. Adding network policies to the new project template

As a cluster administrator, you can add network policies to the default template for new projects. OpenShift Container Platform automatically creates all of the NetworkPolicy objects specified in the template when a new project is created.

Prerequisites

  • Your cluster uses a default CNI network plugin that supports NetworkPolicy objects, such as the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You must log in to the cluster with a user with cluster-admin privileges.
  • You must have created a custom default project template for new projects.

Procedure

  1. Edit the default template for a new project by running the following command:

    $ oc edit template <project_template> -n openshift-config

    Replace <project_template> with the name of the default template that you configured for your cluster. The default template name is project-request.

  2. In the template, add each NetworkPolicy object as an element to the objects parameter. The objects parameter accepts a collection of one or more objects.

    In the following example, the objects parameter collection includes several NetworkPolicy objects.

    objects:
    - apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-same-namespace
      spec:
        podSelector: {}
        ingress:
        - from:
          - podSelector: {}
    - apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-openshift-ingress
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                network.openshift.io/policy-group: ingress
        podSelector: {}
        policyTypes:
        - Ingress
    - apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-kube-apiserver-operator
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: openshift-kube-apiserver-operator
            podSelector:
              matchLabels:
                app: kube-apiserver-operator
        policyTypes:
        - Ingress
    ...
  3. Optional: Create a new project to confirm that your network policy objects are created successfully by running the following commands:

    1. Create a new project:

      $ oc new-project <project> 1
      1
      Replace <project> with the name for the project you are creating.
    2. Confirm that the network policy objects in the new project template exist in the new project:

      $ oc get networkpolicy
      NAME                           POD-SELECTOR   AGE
      allow-from-openshift-ingress   <none>         7s
      allow-from-same-namespace      <none>         7s

20.7. Configuring multitenant isolation with network policy

As a cluster administrator, you can configure your network policies to provide multitenant network isolation.

Note

If you are using the OpenShift SDN network plugin, configuring network policies as described in this section provides network isolation similar to multitenant mode but with network policy mode set.

20.7.1. Configuring multitenant isolation by using network policy

You can configure your project to isolate it from pods and services in other project namespaces.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with admin privileges.

Procedure

  1. Create the following NetworkPolicy objects:

    1. A policy named allow-from-openshift-ingress.

      $ cat << EOF| oc create -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-openshift-ingress
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                policy-group.network.openshift.io/ingress: ""
        podSelector: {}
        policyTypes:
        - Ingress
      EOF
      Note

      policy-group.network.openshift.io/ingress: "" is the preferred namespace selector label for OpenShift SDN. You can use the network.openshift.io/policy-group: ingress namespace selector label, but this is a legacy label.

    2. A policy named allow-from-openshift-monitoring:

      $ cat << EOF| oc create -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-openshift-monitoring
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                network.openshift.io/policy-group: monitoring
        podSelector: {}
        policyTypes:
        - Ingress
      EOF
    3. A policy named allow-same-namespace:

      $ cat << EOF| oc create -f -
      kind: NetworkPolicy
      apiVersion: networking.k8s.io/v1
      metadata:
        name: allow-same-namespace
      spec:
        podSelector:
        ingress:
        - from:
          - podSelector: {}
      EOF
    4. A policy named allow-from-kube-apiserver-operator:

      $ cat << EOF| oc create -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-kube-apiserver-operator
      spec:
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: openshift-kube-apiserver-operator
            podSelector:
              matchLabels:
                app: kube-apiserver-operator
        policyTypes:
        - Ingress
      EOF

      For more details, see New kube-apiserver-operator webhook controller validating health of webhook.

  2. Optional: To confirm that the network policies exist in your current project, enter the following command:

    $ oc describe networkpolicy

    Example output

    Name:         allow-from-openshift-ingress
    Namespace:    example1
    Created on:   2020-06-09 00:28:17 -0400 EDT
    Labels:       <none>
    Annotations:  <none>
    Spec:
      PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
      Allowing ingress traffic:
        To Port: <any> (traffic allowed to all ports)
        From:
          NamespaceSelector: network.openshift.io/policy-group: ingress
      Not affecting egress traffic
      Policy Types: Ingress
    
    
    Name:         allow-from-openshift-monitoring
    Namespace:    example1
    Created on:   2020-06-09 00:29:57 -0400 EDT
    Labels:       <none>
    Annotations:  <none>
    Spec:
      PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
      Allowing ingress traffic:
        To Port: <any> (traffic allowed to all ports)
        From:
          NamespaceSelector: network.openshift.io/policy-group: monitoring
      Not affecting egress traffic
      Policy Types: Ingress

20.7.2. Next steps

20.7.3. Additional resources

Chapter 21. CIDR range definitions

You must specify non-overlapping ranges for the following CIDR ranges.

Note

Machine CIDR ranges cannot be changed after creating your cluster.

Important

OVN-Kubernetes, the default network provider in OpenShift Container Platform 4.11 to 4.13, uses the following IP address ranges internally: 100.64.0.0/16, 169.254.169.0/29, fd98::/64, and fd69::/125. If your cluster uses OVN-Kubernetes, do not include any of these IP address ranges in any other CIDR definitions in your cluster.

21.1. Machine CIDR

In the Machine classless inter-domain routing (CIDR) field, you must specify the IP address range for machines or cluster nodes.

The default is 10.0.0.0/16. This range must not conflict with any connected networks.

21.2. Service CIDR

In the Service CIDR field, you must specify the IP address range for services. The range must be large enough to accommodate your workload. The address block must not overlap with any external service accessed from within the cluster. The default is 172.30.0.0/16.

21.3. Pod CIDR

In the pod CIDR field, you must specify the IP address range for pods.

The pod CIDR is the same as the clusterNetwork CIDR and the cluster CIDR. The range must be large enough to accommodate your workload. The address block must not overlap with any external service accessed from within the cluster. The default is 10.128.0.0/14. You can expand the range after cluster installation.

21.4. Host Prefix

In the Host Prefix field, you must specify the subnet prefix length assigned to pods scheduled to individual machines. The host prefix determines the pod IP address pool for each machine.

For example, if the host prefix is set to /23, each machine is assigned a /23 subnet from the pod CIDR address range. The default is /23, allowing 510 cluster nodes, and 510 pod IP addresses per node.
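
The following minimal sketch shows how these four values map onto the networking stanza of an install-config.yaml file, using the default values described in this chapter:

networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 10.0.0.0/16
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16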

Chapter 22. AWS Load Balancer Operator

22.1. AWS Load Balancer Operator release notes

The AWS Load Balancer (ALB) Operator deploys and manages an instance of the AWSLoadBalancerController resource.

Important

The AWS Load Balancer (ALB) Operator is only supported on the x86_64 architecture.

These release notes track the development of the AWS Load Balancer Operator in OpenShift Container Platform.

For an overview of the AWS Load Balancer Operator, see AWS Load Balancer Operator in OpenShift Container Platform.

Note

AWS Load Balancer Operator currently does not support AWS GovCloud.

22.1.1. AWS Load Balancer Operator 1.0.0

The AWS Load Balancer Operator is now generally available with this release. The AWS Load Balancer Operator version 1.0.0 supports the AWS Load Balancer Controller version 2.4.4.

The following advisory is available for the AWS Load Balancer Operator version 1.0.0:

22.1.1.1. Notable changes
  • This release uses the new v1 API version.
22.1.1.2. Bug fixes
  • Previously, the controller provisioned by the AWS Load Balancer Operator did not properly use the configuration for the cluster-wide proxy. These settings are now applied appropriately to the controller. (OCPBUGS-4052, OCPBUGS-5295)

22.1.2. Earlier versions

The two earliest versions of the AWS Load Balancer Operator are available as a Technology Preview. These versions should not be used in a production cluster. For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

The following advisory is available for the AWS Load Balancer Operator version 0.2.0:

The following advisory is available for the AWS Load Balancer Operator version 0.0.1:

22.2. AWS Load Balancer Operator in OpenShift Container Platform

The AWS Load Balancer Operator deploys and manages the AWS Load Balancer Controller. You can install the AWS Load Balancer Operator from OperatorHub by using OpenShift Container Platform web console or CLI.

22.2.1. AWS Load Balancer Operator considerations

Review the following limitations before installing and using the AWS Load Balancer Operator:

  • The IP traffic mode only works on AWS Elastic Kubernetes Service (EKS). The AWS Load Balancer Operator disables the IP traffic mode for the AWS Load Balancer Controller. As a result of disabling the IP traffic mode, the AWS Load Balancer Controller cannot use the pod readiness gate.
  • The AWS Load Balancer Operator adds command-line flags such as --disable-ingress-class-annotation and --disable-ingress-group-name-annotation to the AWS Load Balancer Controller. Therefore, the AWS Load Balancer Operator does not allow using the kubernetes.io/ingress.class and alb.ingress.kubernetes.io/group.name annotations in the Ingress resource. See the example after this list for selecting the ingress class without the annotation.
  • You have configured the AWS Load Balancer Operator so that the SVC type is NodePort (not LoadBalancer or ClusterIP).
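
Because the kubernetes.io/ingress.class annotation is disabled, select the controller through the spec.ingressClassName field instead. The following is a minimal sketch; the alb ingress class name, the echoserver service, and the scheme annotation value are hypothetical and depend on your AWSLoadBalancerController configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  ingressClassName: alb
  rules:
  - http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: echoserver
            port:
              number: 80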

22.2.2. AWS Load Balancer Operator

The AWS Load Balancer Operator can tag the public subnets if the kubernetes.io/role/elb tag is missing. Also, the AWS Load Balancer Operator detects the following information from the underlying AWS cloud:

  • The ID of the virtual private cloud (VPC) in which the cluster hosting the Operator is deployed.
  • Public and private subnets of the discovered VPC.

The AWS Load Balancer Operator supports the Kubernetes service resource of type LoadBalancer by using Network Load Balancer (NLB) with the instance target type only.
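
For example, a minimal sketch of such a service, assuming the standard AWS Load Balancer Controller annotations and a hypothetical example-app workload, looks like the following:

apiVersion: v1
kind: Service
metadata:
  name: example-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  type: LoadBalancer
  selector:
    app: example-app
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP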

Prerequisites

  • You must have the AWS credentials secret. The credentials are used to provide subnet tagging and VPC discovery.

Procedure

  1. After you deploy the AWS Load Balancer Operator on demand from OperatorHub by creating a Subscription object, get the name of the install plan by running the following command:

    $ oc -n aws-load-balancer-operator get sub aws-load-balancer-operator --template='{{.status.installplan.name}}{{"\n"}}'

    Example output

    install-zlfbt

  2. Check if the status of an install plan is Complete by running the following command:

    $ oc -n aws-load-balancer-operator get ip <install_plan_name> --template='{{.status.phase}}{{"\n"}}'

    Example output

    Complete

  3. View the status of the aws-load-balancer-operator-controller-manager deployment by running the following command:

    $ oc get -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager

    Example output

    NAME                                           READY     UP-TO-DATE   AVAILABLE   AGE
    aws-load-balancer-operator-controller-manager  1/1       1            1           23h

22.2.3. AWS Load Balancer Operator logs

You can view the AWS Load Balancer Operator logs by using the oc logs command.

Procedure

  • View the logs of the AWS Load Balancer Operator by running the following command:

    $ oc logs -n aws-load-balancer-operator deployment/aws-load-balancer-operator-controller-manager -c manager

22.3. Installing the AWS Load Balancer Operator

The AWS Load Balancer Operator deploys and manages the AWS Load Balancer Controller. You can install the AWS Load Balancer Operator from the OperatorHub by using OpenShift Container Platform web console or CLI.

22.3.1. Installing the AWS Load Balancer Operator by using the web console

You can install the AWS Load Balancer Operator by using the web console.

Prerequisites

  • You have logged in to the OpenShift Container Platform web console as a user with cluster-admin permissions.
  • Your cluster is configured with AWS as the platform type and cloud provider.
  • If you are using a security token service (STS) or user-provisioned infrastructure, follow the related preparation steps. For example, if you are using AWS Security Token Service, see "Preparing for the AWS Load Balancer Operator on a cluster using the AWS Security Token Service (STS)".

Procedure

  1. Navigate to Operators → OperatorHub in the OpenShift Container Platform web console.
  2. Select the AWS Load Balancer Operator. You can use the Filter by keyword text box or use the filter list to search for the AWS Load Balancer Operator from the list of Operators.
  3. Select the aws-load-balancer-operator namespace.
  4. On the Install Operator page, select the following options:

    1. Select stable-v1 as the Update channel.
    2. Select All namespaces on the cluster (default) as the Installation mode.
    3. Select aws-load-balancer-operator as the Installed Namespace. If the aws-load-balancer-operator namespace does not exist, it is created during the Operator installation.
    4. Select Update approval as Automatic or Manual. By default, Update approval is set to Automatic. If you select automatic updates, the Operator Lifecycle Manager (OLM) automatically upgrades the running instance of your Operator without any intervention. If you select manual updates, OLM creates an update request. As a cluster administrator, you must then manually approve that update request to update the Operator to the new version.
  5. Click Install.

Verification

  • Verify that the AWS Load Balancer Operator shows the Status as Succeeded on the Installed Operators dashboard.

22.3.2. Installing the AWS Load Balancer Operator by using the CLI

You can install the AWS Load Balancer Operator by using the CLI.

Prerequisites

  • You are logged in to the OpenShift Container Platform web console as a user with cluster-admin permissions.
  • Your cluster is configured with AWS as the platform type and cloud provider.
  • You are logged into the OpenShift CLI (oc).

Procedure

  1. Create a Namespace object:

    1. Create a YAML file that defines the Namespace object:

      Example namespace.yaml file

      apiVersion: v1
      kind: Namespace
      metadata:
        name: aws-load-balancer-operator

    2. Create the Namespace object by running the following command:

      $ oc apply -f namespace.yaml
  2. Create a CredentialsRequest object:

    1. Create a YAML file that defines the CredentialsRequest object:

      Example credentialsrequest.yaml file

      apiVersion: cloudcredential.openshift.io/v1
      kind: CredentialsRequest
      metadata:
        name: aws-load-balancer-operator
        namespace: openshift-cloud-credential-operator
      spec:
        providerSpec:
          apiVersion: cloudcredential.openshift.io/v1
          kind: AWSProviderSpec
          statementEntries:
            - action:
                - ec2:DescribeSubnets
              effect: Allow
              resource: "*"
            - action:
                - ec2:CreateTags
                - ec2:DeleteTags
              effect: Allow
              resource: arn:aws:ec2:*:*:subnet/*
            - action:
                - ec2:DescribeVpcs
              effect: Allow
              resource: "*"
        secretRef:
          name: aws-load-balancer-operator
          namespace: aws-load-balancer-operator
        serviceAccountNames:
          - aws-load-balancer-operator-controller-manager

    2. Create the CredentialsRequest object by running the following command:

      $ oc apply -f credentialsrequest.yaml
  3. Create an OperatorGroup object:

    1. Create a YAML file that defines the OperatorGroup object:

      Example operatorgroup.yaml file

      apiVersion: operators.coreos.com/v1
      kind: OperatorGroup
      metadata:
        name: aws-lb-operatorgroup
        namespace: aws-load-balancer-operator
      spec:
        upgradeStrategy: Default

    2. Create the OperatorGroup object by running the following command:

      $ oc apply -f operatorgroup.yaml
  4. Create a Subscription object:

    1. Create a YAML file that defines the Subscription object:

      Example subscription.yaml file

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: aws-load-balancer-operator
        namespace: aws-load-balancer-operator
      spec:
        channel: stable-v1
        installPlanApproval: Automatic
        name: aws-load-balancer-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace

    2. Create the Subscription object by running the following command:

      $ oc apply -f subscription.yaml

Verification

  1. Get the name of the install plan from the subscription:

    $ oc -n aws-load-balancer-operator \
        get subscription aws-load-balancer-operator \
        --template='{{.status.installplan.name}}{{"\n"}}'
  2. Check the status of the install plan:

    $ oc -n aws-load-balancer-operator \
        get ip <install_plan_name> \
        --template='{{.status.phase}}{{"\n"}}'

    The output must be Complete.
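
Optionally, you can also confirm that the ClusterServiceVersion (CSV) for the Operator reports the Succeeded phase (a sketch; the CSV name and version vary by release):

$ oc -n aws-load-balancer-operator get csv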

22.4. Preparing for the AWS Load Balancer Operator on a cluster using the AWS Security Token Service

You can install the AWS Load Balancer Operator on a cluster that uses STS. Follow these steps to prepare your cluster before installing the Operator.

The AWS Load Balancer Operator relies on the CredentialsRequest object to bootstrap the Operator and the AWS Load Balancer Controller. The AWS Load Balancer Operator waits until the required secrets are created and available. The Cloud Credential Operator does not provision the secrets automatically in the STS cluster. You must set the credentials secrets manually by using the ccoctl binary.

If you do not want to provision the credentials secret by using the Cloud Credential Operator, you can configure the AWSLoadBalancerController instance on the STS cluster by specifying the credentials secret in the AWS Load Balancer Controller custom resource (CR).

22.4.1. Bootstrapping AWS Load Balancer Operator on Security Token Service cluster

Prerequisites

  • You must extract and prepare the ccoctl binary.

Procedure

  1. Create the aws-load-balancer-operator namespace by running the following command:

    $ oc create namespace aws-load-balancer-operator
  2. Download the CredentialsRequest custom resource (CR) of the AWS Load Balancer Operator, and create a directory to store it by running the following command:

    $ curl --create-dirs -o <path-to-credrequests-dir>/cr.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/operator-credentials-request.yaml
  3. Use the ccoctl tool to process CredentialsRequest objects of the AWS Load Balancer Operator, by running the following command:

    $ ccoctl aws create-iam-roles \
        --name <name> --region=<aws_region> \
        --credentials-requests-dir=<path-to-credrequests-dir> \
        --identity-provider-arn <oidc-arn>
  4. Apply the secrets generated in the manifests directory of your cluster by running the following command:

    $ ls manifests/*-credentials.yaml | xargs -I{} oc apply -f {}
  5. Verify that the credentials secret of the AWS Load Balancer Operator is created by running the following command:

    $ oc -n aws-load-balancer-operator get secret aws-load-balancer-operator --template='{{index .data "credentials"}}' | base64 -d

    Example output

    [default]
    sts_regional_endpoints = regional
    role_arn = arn:aws:iam::999999999999:role/aws-load-balancer-operator-aws-load-balancer-operator
    web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token

22.4.2. Configuring AWS Load Balancer Operator on Security Token Service cluster by using managed CredentialsRequest objects

Prerequisites

  • You must extract and prepare the ccoctl binary.

Procedure

  1. The AWS Load Balancer Operator creates the CredentialsRequest object in the openshift-cloud-credential-operator namespace for each AWSLoadBalancerController custom resource (CR). You can extract and save the created CredentialsRequest object in a directory by running the following command:

    $ oc get credentialsrequest -n openshift-cloud-credential-operator  \
        aws-load-balancer-controller-<cr-name> -o yaml > <path-to-credrequests-dir>/cr.yaml 1
    1
    The aws-load-balancer-controller-<cr-name> parameter specifies the credentials request name created by the AWS Load Balancer Operator. <cr-name> specifies the name of the AWS Load Balancer Controller instance.
  2. Use the ccoctl tool to process all CredentialsRequest objects in the credrequests directory by running the following command:

    $ ccoctl aws create-iam-roles \
        --name <name> --region=<aws_region> \
        --credentials-requests-dir=<path-to-credrequests-dir> \
        --identity-provider-arn <oidc-arn>
  3. Apply the secrets generated in the manifests directory to your cluster by running the following command:

    $ ls manifests/*-credentials.yaml | xargs -I{} oc apply -f {}
  4. Verify that the aws-load-balancer-controller pod is created:

    $ oc -n aws-load-balancer-operator get pods
    NAME                                                            READY   STATUS    RESTARTS   AGE
    aws-load-balancer-controller-cluster-9b766d6-gg82c              1/1     Running   0          137m
    aws-load-balancer-operator-controller-manager-b55ff68cc-85jzg   2/2     Running   0          3h26m

22.4.3. Configuring the AWS Load Balancer Operator on Security Token Service cluster by using specific credentials

You can specify the credential secret by using the spec.credentials field in the AWS Load Balancer Controller custom resource (CR). You can use the predefined CredentialsRequest object of the controller to know which roles are required.

Prerequisites

  • You must extract and prepare the ccoctl binary.

Procedure

  1. Download the CredentialsRequest custom resource (CR) of the AWS Load Balancer Controller, and create a directory to store it by running the following command:

    $ curl --create-dirs -o <path-to-credrequests-dir>/cr.yaml https://raw.githubusercontent.com/openshift/aws-load-balancer-operator/main/hack/controller/controller-credentials-request.yaml
  2. Use the ccoctl tool to process the CredentialsRequest object of the controller:

    $ ccoctl aws create-iam-roles \
            --name <name> --region=<aws_region> \
            --credentials-requests-dir=<path-to-credrequests-dir> \
            --identity-provider-arn <oidc-arn>
  3. Apply the secrets to your cluster:

    $ ls manifests/*-credentials.yaml | xargs -I{} oc apply -f {}
  4. Verify the credentials secret has been created for use by the controller:

    $ oc -n aws-load-balancer-operator get secret aws-load-balancer-controller-manual-cluster --template='{{index .data "credentials"}}' | base64 -d

    Example output

    [default]
    sts_regional_endpoints = regional
    role_arn = arn:aws:iam::999999999999:role/aws-load-balancer-operator-aws-load-balancer-controller
    web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token

  5. Create the AWSLoadBalancerController resource YAML file, for example, sample-aws-lb-manual-creds.yaml, as follows:

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController 1
    metadata:
      name: cluster 2
    spec:
      credentials:
        name: <secret-name> 3
    1
    Defines the AWSLoadBalancerController resource.
    2
    Defines the AWS Load Balancer Controller instance name. This instance name gets added as a suffix to all related resources.
    3
    Specifies the secret name containing AWS credentials that the controller uses.
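
To create the AWSLoadBalancerController resource from this file, run the following command (a brief sketch that assumes the file name shown in the previous step):

$ oc create -f sample-aws-lb-manual-creds.yaml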

22.4.4. Additional resources

22.5. Creating an instance of the AWS Load Balancer Controller

After installing the AWS Load Balancer Operator, you can create the AWS Load Balancer Controller.

22.5.1. Creating the AWS Load Balancer Controller

You can install only a single instance of the AWSLoadBalancerController object in a cluster. You can create the AWS Load Balancer Controller by using the CLI. The AWS Load Balancer Operator reconciles only the resource named cluster.

Prerequisites

  • You have created the echoserver namespace.
  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create a YAML file that defines the AWSLoadBalancerController object:

    Example sample-aws-lb.yaml file

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController 1
    metadata:
      name: cluster 2
    spec:
      subnetTagging: Auto 3
      additionalResourceTags: 4
      - key: example.org/security-scope
        value: staging
      ingressClass: alb 5
      config:
        replicas: 2 6
      enabledAddons: 7
        - AWSWAFv2 8

    1
    Defines the AWSLoadBalancerController object.
    2
    Defines the AWS Load Balancer Controller name. This instance name gets added as a suffix to all related resources.
    3
    Configures the subnet tagging method for the AWS Load Balancer Controller. The following values are valid:
    • Auto: The AWS Load Balancer Operator determines the subnets that belong to the cluster and tags them appropriately. The Operator cannot determine the role correctly if the internal subnet tags are not present on the internal subnets.
    • Manual: You manually tag the subnets that belong to the cluster with the appropriate role tags. Use this option if you installed your cluster on user-provided infrastructure.
    4
    Defines the tags used by the AWS Load Balancer Controller when it provisions AWS resources.
    5
    Defines the ingress class name. The default value is alb.
    6
    Specifies the number of replicas of the AWS Load Balancer Controller.
    7
    Specifies the add-ons for the AWS Load Balancer Controller that are enabled through annotations.
    8
    Enables the alb.ingress.kubernetes.io/wafv2-acl-arn annotation.
  2. Create the AWSLoadBalancerController object by running the following command:

    $ oc create -f sample-aws-lb.yaml
  3. Create a YAML file that defines the Deployment resource:

    Example sample-aws-lb.yaml file

    apiVersion: apps/v1
    kind: Deployment 1
    metadata:
      name: <echoserver> 2
      namespace: echoserver
    spec:
      selector:
        matchLabels:
          app: echoserver
      replicas: 3 3
      template:
        metadata:
          labels:
            app: echoserver
        spec:
          containers:
            - image: openshift/origin-node
              command:
               - "/bin/socat"
              args:
                - TCP4-LISTEN:8080,reuseaddr,fork
                - EXEC:'/bin/bash -c \"printf \\\"HTTP/1.0 200 OK\r\n\r\n\\\"; sed -e \\\"/^\r/q\\\"\"'
              imagePullPolicy: Always
              name: echoserver
              ports:
                - containerPort: 8080

    1
    Defines the deployment resource.
    2
    Specifies the deployment name.
    3
    Specifies the number of replicas of the deployment.
  4. Create a YAML file that defines the Service resource:

    Example service-albo.yaml file:

    apiVersion: v1
    kind: Service 1
    metadata:
      name: <echoserver> 2
      namespace: echoserver
    spec:
      ports:
        - port: 80
          targetPort: 8080
          protocol: TCP
      type: NodePort
      selector:
        app: echoserver

    1
    Defines the service resource.
    2
    Specifies the service name.
  5. Create a YAML file that defines the Ingress resource:

    Example ingress-albo.yaml file:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: <name> 1
      namespace: echoserver
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: alb
      rules:
        - http:
            paths:
              - path: /
                pathType: Exact
                backend:
                  service:
                    name: <echoserver> 2
                    port:
                      number: 80

    1
    Specify a name for the Ingress resource.
    2
    Specifies the service name.

Verification

  • Save the status of the Ingress resource in the HOST variable by running the following command:

    $ HOST=$(oc get ingress -n echoserver echoserver --template='{{(index .status.loadBalancer.ingress 0).hostname}}')
  • Verify the status of the Ingress resource by running the following command:

    $ curl $HOST

22.6. Serving multiple ingress resources through a single AWS Load Balancer

You can route the traffic to different services that are part of a single domain through a single AWS Load Balancer. Each Ingress resource provides different endpoints of the domain.

22.6.1. Creating multiple ingress resources through a single AWS Load Balancer

You can route the traffic to multiple ingress resources through a single AWS Load Balancer by using the CLI.

Prerequisites

  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create an IngressClassParams resource YAML file, for example, sample-single-lb-params.yaml, as follows:

    apiVersion: elbv2.k8s.aws/v1beta1 1
    kind: IngressClassParams
    metadata:
      name: single-lb-params 2
    spec:
      group:
        name: single-lb 3
    1
    Defines the API group and version of the IngressClassParams resource.
    2
    Specifies the IngressClassParams resource name.
    3
    Specifies the IngressGroup resource name. All of the Ingress resources of this class belong to this IngressGroup.
  2. Create the IngressClassParams resource by running the following command:

    $ oc create -f sample-single-lb-params.yaml
  3. Create the IngressClass resource YAML file, for example, sample-single-lb-class.yaml, as follows:

    apiVersion: networking.k8s.io/v1 1
    kind: IngressClass
    metadata:
      name: single-lb 2
    spec:
      controller: ingress.k8s.aws/alb 3
      parameters:
        apiGroup: elbv2.k8s.aws 4
        kind: IngressClassParams 5
        name: single-lb-params 6
    1
    Defines the API group and version of the IngressClass resource.
    2
    Specifies the ingress class name.
    3
    Defines the controller name. The ingress.k8s.aws/alb value denotes that all ingress resources of this class should be managed by the AWS Load Balancer Controller.
    4
    Defines the API group of the IngressClassParams resource.
    5
    Defines the resource type of the IngressClassParams resource.
    6
    Defines the IngressClassParams resource name.
  4. Create the IngressClass resource by running the following command:

    $ oc create -f sample-single-lb-class.yaml
  5. Create the AWSLoadBalancerController resource YAML file, for example, sample-single-lb.yaml, as follows:

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController
    metadata:
      name: cluster
    spec:
      subnetTagging: Auto
      ingressClass: single-lb 1
    1
    Defines the name of the IngressClass resource.
  6. Create the AWSLoadBalancerController resource by running the following command:

    $ oc create -f sample-single-lb.yaml
  7. Create the Ingress resource YAML file, for example, sample-multiple-ingress.yaml, as follows:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-1 1
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing 2
        alb.ingress.kubernetes.io/group.order: "1" 3
        alb.ingress.kubernetes.io/target-type: instance 4
    spec:
      ingressClassName: single-lb 5
      rules:
      - host: example.com 6
        http:
            paths:
            - path: /blog 7
              pathType: Prefix
              backend:
                service:
                  name: example-1 8
                  port:
                    number: 80 9
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-2
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/group.order: "2"
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: single-lb
      rules:
      - host: example.com
        http:
            paths:
            - path: /store
              pathType: Prefix
              backend:
                service:
                  name: example-2
                  port:
                    number: 80
    ---
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: example-3
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing
        alb.ingress.kubernetes.io/group.order: "3"
        alb.ingress.kubernetes.io/target-type: instance
    spec:
      ingressClassName: single-lb
      rules:
      - host: example.com
        http:
            paths:
            - path: /
              pathType: Prefix
              backend:
                service:
                  name: example-3
                  port:
                    number: 80
    1
    Specifies the ingress name.
    2
    Indicates the load balancer to provision in the public subnet to access the internet.
    3
    Specifies the order in which the rules from the multiple ingress resources are matched when the request is received at the load balancer.
    4
    Indicates that the load balancer will target OpenShift Container Platform nodes to reach the service.
    5
    Specifies the ingress class that belongs to this ingress.
    6
    Defines a domain name used for request routing.
    7
    Defines the path that must route to the service.
    8
    Defines the service name that serves the endpoint configured in the Ingress resource.
    9
    Defines the port on the service that serves the endpoint.
  8. Create the Ingress resource by running the following command:

    $ oc create -f sample-multiple-ingress.yaml
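
Verification

  • As a rough check (a sketch that assumes the Ingress resources were created in your current project), confirm that all three Ingress resources report the same load balancer hostname, which indicates that they are served by a single AWS Load Balancer:

    $ for name in example-1 example-2 example-3; do oc get ingress $name --template='{{(index .status.loadBalancer.ingress 0).hostname}}{{"\n"}}'; done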

22.7. Adding TLS termination

You can add TLS termination on the AWS Load Balancer.

22.7.1. Adding TLS termination on the AWS Load Balancer

You can route the traffic for the domain to pods of a service and add TLS termination on the AWS Load Balancer.

Prerequisites

  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create a YAML file that defines the AWSLoadBalancerController resource:

    Example add-tls-termination-albc.yaml file

    apiVersion: networking.olm.openshift.io/v1
    kind: AWSLoadBalancerController
    metadata:
      name: cluster
    spec:
      subnetTagging: Auto
      ingressClass: tls-termination 1

    1
    Defines the ingress class name. If the ingress class is not present in your cluster, the AWS Load Balancer Controller creates one. The AWS Load Balancer Controller reconciles the additional ingress class values if spec.controller is set to ingress.k8s.aws/alb.
  2. Create a YAML file that defines the Ingress resource:

    Example add-tls-termination-ingress.yaml file

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: <example> 1
      annotations:
        alb.ingress.kubernetes.io/scheme: internet-facing 2
        alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-west-2:xxxxx 3
    spec:
      ingressClassName: tls-termination 4
      rules:
      - host: <example.com> 5
        http:
            paths:
              - path: /
                pathType: Exact
                backend:
                  service:
                    name: <example-service> 6
                    port:
                      number: 80

    1
    Specifies the ingress name.
    2
    The controller provisions the load balancer for ingress in a public subnet to access the load balancer over the internet.
    3
    The Amazon Resource Name (ARN) of the certificate that you attach to the load balancer.
    4
    Defines the ingress class name.
    5
    Defines the domain for traffic routing.
    6
    Defines the service for traffic routing.
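
As a follow-up (a sketch that assumes the file names used in the preceding steps and that DNS for the domain already resolves to the provisioned load balancer), apply both files and send a request to confirm that TLS is terminated at the load balancer:

$ oc apply -f add-tls-termination-albc.yaml
$ oc apply -f add-tls-termination-ingress.yaml
$ curl -I https://<example.com>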

22.8. Configuring cluster-wide proxy

You can configure the cluster-wide proxy in the AWS Load Balancer Operator. After you configure the cluster-wide proxy, Operator Lifecycle Manager (OLM) automatically updates all the deployments of the Operators with environment variables such as HTTP_PROXY, HTTPS_PROXY, and NO_PROXY. The AWS Load Balancer Operator then propagates these variables to the managed AWS Load Balancer Controller.
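
To confirm that the proxy variables reach the Operator deployment, you can list the environment of the controller manager (a quick sketch; which variables appear depends on your cluster-wide proxy configuration):

$ oc -n aws-load-balancer-operator set env deployment/aws-load-balancer-operator-controller-manager --list | grep -i proxy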

22.8.1. Trusting the certificate authority of the cluster-wide proxy

  1. Create the config map to contain the certificate authority (CA) bundle in the aws-load-balancer-operator namespace by running the following command:

    $ oc -n aws-load-balancer-operator create configmap trusted-ca
  2. To inject the trusted CA bundle into the config map, add the config.openshift.io/inject-trusted-cabundle=true label to the config map by running the following command:

    $ oc -n aws-load-balancer-operator label cm trusted-ca config.openshift.io/inject-trusted-cabundle=true
  3. Update the AWS Load Balancer Operator subscription to access the config map in the AWS Load Balancer Operator deployment by running the following command:

    $ oc -n aws-load-balancer-operator patch subscription aws-load-balancer-operator --type='merge' -p '{"spec":{"config":{"env":[{"name":"TRUSTED_CA_CONFIGMAP_NAME","value":"trusted-ca"}],"volumes":[{"name":"trusted-ca","configMap":{"name":"trusted-ca"}}],"volumeMounts":[{"name":"trusted-ca","mountPath":"/etc/pki/tls/certs/albo-tls-ca-bundle.crt","subPath":"ca-bundle.crt"}]}}}'
  4. After the AWS Load Balancer Operator is deployed, verify that the CA bundle is added to the aws-load-balancer-operator-controller-manager deployment by running the following command:

    $ oc -n aws-load-balancer-operator exec deploy/aws-load-balancer-operator-controller-manager -c manager -- bash -c "ls -l /etc/pki/tls/certs/albo-tls-ca-bundle.crt; printenv TRUSTED_CA_CONFIGMAP_NAME"

    Example output

    -rw-r--r--. 1 root 1000690000 5875 Jan 11 12:25 /etc/pki/tls/certs/albo-tls-ca-bundle.crt
    trusted-ca

  5. Optional: Restart deployment of the AWS Load Balancer Operator every time the config map changes by running the following command:

    $ oc -n aws-load-balancer-operator rollout restart deployment/aws-load-balancer-operator-controller-manager

22.8.2. Additional resources

Chapter 23. Multiple networks

23.1. Understanding multiple networks

In Kubernetes, container networking is delegated to networking plugins that implement the Container Network Interface (CNI).

OpenShift Container Platform uses the Multus CNI plugin to allow chaining of CNI plugins. During cluster installation, you configure your default pod network. The default network handles all ordinary network traffic for the cluster. You can define an additional network based on the available CNI plugins and attach one or more of these networks to your pods. You can define more than one additional network for your cluster, depending on your needs. This gives you flexibility when you configure pods that deliver network functionality, such as switching or routing.

23.1.1. Usage scenarios for an additional network

You can use an additional network in situations where network isolation is needed, including data plane and control plane separation. Isolating network traffic is useful for the following performance and security reasons:

Performance
You can send traffic on two different planes to manage how much traffic is on each plane.
Security
You can send sensitive traffic onto a network plane that is managed specifically for security considerations, and you can separate private data that must not be shared between tenants or customers.

All of the pods in the cluster still use the cluster-wide default network to maintain connectivity across the cluster. Every pod has an eth0 interface that is attached to the cluster-wide pod network. You can view the interfaces for a pod by using the oc exec -it <pod_name> -- ip a command. If you add additional network interfaces that use Multus CNI, they are named net1, net2, …​, netN.

To attach additional network interfaces to a pod, you must create configurations that define how the interfaces are attached. You specify each interface by using a NetworkAttachmentDefinition custom resource (CR). A CNI configuration inside each of these CRs defines how that interface is created.

23.1.2. Additional networks in OpenShift Container Platform

OpenShift Container Platform provides several CNI plugins for creating additional networks in your cluster, including bridge, host-device, vlan, ipvlan, macvlan, and OVN-Kubernetes. The configuration for each plugin type is described in the "Configurations for additional network types" section of this chapter.

23.2. Configuring an additional network

As a cluster administrator, you can configure an additional network for your cluster. The supported network types include bridge, host-device, vlan, ipvlan, macvlan, and OVN-Kubernetes additional networks, which are described in the following sections.

23.2.1. Approaches to managing an additional network

You can manage the life cycle of an additional network by using one of two approaches. The approaches are mutually exclusive, and you can use only one approach for managing an additional network at a time. With either approach, the additional network is managed by a Container Network Interface (CNI) plugin that you configure.

For an additional network, IP addresses are provisioned through an IP Address Management (IPAM) CNI plugin that you configure as part of the additional network. The IPAM plugin supports a variety of IP address assignment approaches including DHCP and static assignment.

  • Modify the Cluster Network Operator (CNO) configuration: The CNO automatically creates and manages the NetworkAttachmentDefinition object. In addition to managing the object lifecycle, the CNO ensures that a DHCP server is available for an additional network that uses a DHCP-assigned IP address.
  • Applying a YAML manifest: You can manage the additional network directly by creating a NetworkAttachmentDefinition object. This approach allows for the chaining of CNI plugins.
Note

When deploying OpenShift Container Platform nodes with multiple network interfaces on Red Hat OpenStack Platform (RHOSP) with OVN SDN, DNS configuration of the secondary interface might take precedence over the DNS configuration of the primary interface. In this case, remove the DNS nameservers for the subnet id that is attached to the secondary interface:

$ openstack subnet set --dns-nameserver 0.0.0.0 <subnet_id>

23.2.2. Configuration for an additional network attachment

An additional network is configured by using the NetworkAttachmentDefinition API in the k8s.cni.cncf.io API group.

Important

Do not store any sensitive information or a secret in the NetworkAttachmentDefinition object because this information is accessible by the project administration user.

The configuration for the API is described in the following table:

Table 23.1. NetworkAttachmentDefinition API fields
FieldTypeDescription

metadata.name

string

The name for the additional network.

metadata.namespace

string

The namespace that the object is associated with.

spec.config

string

The CNI plugin configuration in JSON format.

23.2.2.1. Configuration of an additional network through the Cluster Network Operator

The configuration for an additional network attachment is specified as part of the Cluster Network Operator (CNO) configuration.

The following YAML describes the configuration parameters for managing an additional network with the CNO:

Cluster Network Operator configuration

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  # ...
  additionalNetworks: 1
  - name: <name> 2
    namespace: <namespace> 3
    rawCNIConfig: |- 4
      {
        ...
      }
    type: Raw

1
An array of one or more additional network configurations.
2
The name for the additional network attachment that you are creating. The name must be unique within the specified namespace.
3
The namespace to create the network attachment in. If you do not specify a value then the default namespace is used.
Important

To prevent namespace issues for the OVN-Kubernetes network plugin, do not name your additional network attachment default, because this namespace is reserved for the default additional network attachment.

4
A CNI plugin configuration in JSON format.
23.2.2.2. Configuration of an additional network from a YAML manifest

The configuration for an additional network is specified from a YAML configuration file, such as in the following example:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: <name> 1
spec:
  config: |- 2
    {
      ...
    }
1
The name for the additional network attachment that you are creating.
2
A CNI plugin configuration in JSON format.
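
For reference, the following is a minimal concrete example, assuming a bridge plugin with DHCP-based IPAM; the object name and bridge settings are illustrative:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: work-network
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "work-network",
      "type": "bridge",
      "isGateway": true,
      "ipam": {
        "type": "dhcp"
      }
    }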

23.2.3. Configurations for additional network types

The specific configuration fields for additional networks are described in the following sections.

23.2.3.1. Configuration for a bridge additional network

The following object describes the configuration parameters for the bridge CNI plugin:

Table 23.2. Bridge CNI plugin JSON configuration object
FieldTypeDescription

cniVersion

string

The CNI specification version. The 0.3.1 value is required.

name

string

The value for the name parameter you provided previously for the CNO configuration.

type

string

The name of the CNI plugin to configure: bridge.

ipam

object

The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition.

bridge

string

Optional: Specify the name of the virtual bridge to use. If the bridge interface does not exist on the host, it is created. The default value is cni0.

ipMasq

boolean

Optional: Set to true to enable IP masquerading for traffic that leaves the virtual network. The source IP address for all traffic is rewritten to the bridge’s IP address. If the bridge does not have an IP address, this setting has no effect. The default value is false.

isGateway

boolean

Optional: Set to true to assign an IP address to the bridge. The default value is false.

isDefaultGateway

boolean

Optional: Set to true to configure the bridge as the default gateway for the virtual network. The default value is false. If isDefaultGateway is set to true, then isGateway is also set to true automatically.

forceAddress

boolean

Optional: Set to true to allow assignment of a previously assigned IP address to the virtual bridge. When set to false, if an IPv4 address or an IPv6 address from overlapping subsets is assigned to the virtual bridge, an error occurs. The default value is false.

hairpinMode

boolean

Optional: Set to true to allow the virtual bridge to send an Ethernet frame back through the virtual port it was received on. This mode is also known as reflective relay. The default value is false.

promiscMode

boolean

Optional: Set to true to enable promiscuous mode on the bridge. The default value is false.

vlan

string

Optional: Specify a virtual LAN (VLAN) tag as an integer value. By default, no VLAN tag is assigned.

preserveDefaultVlan

string

Optional: Indicates whether the default vlan must be preserved on the veth end connected to the bridge. Defaults to true.

mtu

integer

Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel.

enabledad

boolean

Optional: Enables duplicate address detection for the container side veth. The default value is false.

macspoofchk

boolean

Optional: Enables MAC spoof checking, which limits the traffic originating from the container to the MAC address of the interface. The default value is false.

Note

The VLAN parameter configures the VLAN tag on the host end of the veth and also enables the vlan_filtering feature on the bridge interface.

Note

To configure the uplink for an L2 network, you must allow the VLAN on the uplink interface by using the following command:

$  bridge vlan add vid VLAN_ID dev DEV
23.2.3.1.1. bridge configuration example

The following example configures an additional network named bridge-net:

{
  "cniVersion": "0.3.1",
  "name": "bridge-net",
  "type": "bridge",
  "isGateway": true,
  "vlan": 2,
  "ipam": {
    "type": "dhcp"
    }
}
23.2.3.2. Configuration for a host device additional network
Note

Specify your network device by setting only one of the following parameters: device, hwaddr, kernelpath, or pciBusID.

The following object describes the configuration parameters for the host-device CNI plugin:

Table 23.3. Host device CNI plugin JSON configuration object
FieldTypeDescription

cniVersion

string

The CNI specification version. The 0.3.1 value is required.

name

string

The value for the name parameter you provided previously for the CNO configuration.

type

string

The name of the CNI plugin to configure: host-device.

device

string

Optional: The name of the device, such as eth0.

hwaddr

string

Optional: The device hardware MAC address.

kernelpath

string

Optional: The Linux kernel device path, such as /sys/devices/pci0000:00/0000:00:1f.6.

pciBusID

string

Optional: The PCI address of the network device, such as 0000:00:1f.6.

23.2.3.2.1. host-device configuration example

The following example configures an additional network named hostdev-net:

{
  "cniVersion": "0.3.1",
  "name": "hostdev-net",
  "type": "host-device",
  "device": "eth1"
}
23.2.3.3. Configuration for a VLAN additional network

The following object describes the configuration parameters for the VLAN CNI plugin:

Table 23.4. VLAN CNI plugin JSON configuration object
FieldTypeDescription

cniVersion

string

The CNI specification version. The 0.3.1 value is required.

name

string

The value for the name parameter you provided previously for the CNO configuration.

type

string

The name of the CNI plugin to configure: vlan.

master

string

The Ethernet interface to associate with the network attachment. If a master is not specified, the interface for the default network route is used.

vlanId

integer

Set the ID of the VLAN.

ipam

object

The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition.

mtu

integer

Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel.

dns

object

Optional: DNS information to return, for example, a priority-ordered list of DNS nameservers.

linkInContainer

boolean

Optional: Specifies if the master interface is in the container network namespace or the main network namespace.

23.2.3.3.1. vlan configuration example

The following example configures an additional network named vlan-net:

{
  "name": "vlan-net",
  "cniVersion": "0.3.1",
  "type": "vlan",
  "master": "eth0",
  "mtu": 1500,
  "vlanId": 5,
  "linkInContainer": false,
  "ipam": {
      "type": "host-local",
      "subnet": "10.1.1.0/24"
  },
  "dns": {
      "nameservers": [ "10.1.1.1", "8.8.8.8" ]
  }
}
23.2.3.4. Configuration for an IPVLAN additional network

The following object describes the configuration parameters for the IPVLAN CNI plugin:

Table 23.5. IPVLAN CNI plugin JSON configuration object
FieldTypeDescription

cniVersion

string

The CNI specification version. The 0.3.1 value is required.

name

string

The value for the name parameter you provided previously for the CNO configuration.

type

string

The name of the CNI plugin to configure: ipvlan.

ipam

object

The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition. This is required unless the plugin is chained.

mode

string

Optional: The operating mode for the virtual network. The value must be l2, l3, or l3s. The default value is l2.

master

string

Optional: The Ethernet interface to associate with the network attachment. If a master is not specified, the interface for the default network route is used.

mtu

integer

Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel.

Note
  • The ipvlan object does not allow virtual interfaces to communicate with the master interface. Therefore the container will not be able to reach the host by using the ipvlan interface. Be sure that the container joins a network that provides connectivity to the host, such as a network supporting the Precision Time Protocol (PTP).
  • A single master interface cannot simultaneously be configured to use both macvlan and ipvlan.
  • For IP allocation schemes that cannot be interface agnostic, the ipvlan plugin can be chained with an earlier plugin that handles this logic. If the master is omitted, then the previous result must contain a single interface name for the ipvlan plugin to enslave. If ipam is omitted, then the previous result is used to configure the ipvlan interface.
23.2.3.4.1. ipvlan configuration example

The following example configures an additional network named ipvlan-net:

{
  "cniVersion": "0.3.1",
  "name": "ipvlan-net",
  "type": "ipvlan",
  "master": "eth1",
  "mode": "l3",
  "ipam": {
    "type": "static",
    "addresses": [
       {
         "address": "192.168.10.10/24"
       }
    ]
  }
}
23.2.3.5. Configuration for a MACVLAN additional network

The following object describes the configuration parameters for the macvlan CNI plugin:

Table 23.6. MACVLAN CNI plugin JSON configuration object
FieldTypeDescription

cniVersion

string

The CNI specification version. The 0.3.1 value is required.

name

string

The value for the name parameter you provided previously for the CNO configuration.

type

string

The name of the CNI plugin to configure: macvlan.

ipam

object

The configuration object for the IPAM CNI plugin. The plugin manages IP address assignment for the attachment definition.

mode

string

Optional: Configures traffic visibility on the virtual network. Must be either bridge, passthru, private, or vepa. If a value is not provided, the default value is bridge.

master

string

Optional: The host network interface to associate with the newly created macvlan interface. If a value is not specified, then the default route interface is used.

mtu

string

Optional: Set the maximum transmission unit (MTU) to the specified value. The default value is automatically set by the kernel.

Note

If you specify the master key for the plugin configuration, use a different physical network interface than the one that is associated with your primary network plugin to avoid possible conflicts.

23.2.3.5.1. macvlan configuration example

The following example configures an additional network named macvlan-net:

{
  "cniVersion": "0.3.1",
  "name": "macvlan-net",
  "type": "macvlan",
  "master": "eth1",
  "mode": "bridge",
  "ipam": {
    "type": "dhcp"
    }
}
23.2.3.6. Configuration for an OVN-Kubernetes additional network

The Red Hat OpenShift Networking OVN-Kubernetes network plugin allows the configuration of secondary network interfaces for pods. To configure secondary network interfaces, you must define the configurations in the NetworkAttachmentDefinition custom resource definition (CRD).

Important

Configuration for an OVN-Kubernetes additional network is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Note

Pod and multi-network policy creation might remain in a pending state until the OVN-Kubernetes control plane agent in the nodes processes the associated network-attachment-definition CR.

The following sections provide example configurations for each of the topologies that OVN-Kubernetes currently allows for secondary networks.

Note

Network names must be unique. For example, creating multiple NetworkAttachmentDefinition CRDs with different configurations that reference the same network is unsupported.

23.2.3.6.1. OVN-Kubernetes network plugin JSON configuration table

The following table describes the configuration parameters for the OVN-Kubernetes CNI network plugin:

Table 23.7. OVN-Kubernetes network plugin JSON configuration table
FieldTypeDescription

cniVersion

string

The CNI specification version. The required value is 0.3.1.

name

string

The name of the network. These networks are not namespaced. For example, you can have a network named l2-network referenced from two different NetworkAttachmentDefinitions that exist on two different namespaces. This ensures that pods making use of the NetworkAttachmentDefinition on their own different namespaces can communicate over the same secondary network. However, those two different NetworkAttachmentDefinitions must also share the same network specific parameters such as topology, subnets, mtu, and excludeSubnets.

type

string

The name of the CNI plugin to configure. The required value is ovn-k8s-cni-overlay.

topology

string

The topological configuration for the network. The required value is layer2.

subnets

string

The subnet to use for the network across the cluster.

For "topology":"layer2" deployments, IPv6 (2001:DBB::/64) and dual-stack (192.168.100.0/24,2001:DBB::/64) subnets are supported.

mtu

string

The maximum transmission unit (MTU). The default value, 1300, is automatically set by the kernel.

netAttachDefName

string

The metadata namespace and name of the network attachment definition object where this configuration is included. For example, if this configuration is defined in a NetworkAttachmentDefinition in namespace ns1 named l2-network, this should be set to ns1/l2-network.

excludeSubnets

string

A comma-separated list of CIDRs and IPs. IPs are removed from the assignable IP pool, and are never passed to the pods. When omitted, the logical switch implementing the network only provides layer 2 communication, and users must configure IPs for the pods. Port security only prevents MAC spoofing.

23.2.3.6.2. Configuration for a switched topology

The switched (layer 2) topology networks interconnect the workloads through a cluster-wide logical switch. This configuration can be used for IPv6 and dual-stack deployments.

Note

Layer 2 switched topology networks only allow for the transfer of data packets between pods within a cluster.

The following CNI configuration, which you include in the spec.config field of a NetworkAttachmentDefinition custom resource (CR), describes the fields needed to configure a switched secondary network. A sketch of the full object appears after the configuration.

    {
            "cniVersion": "0.3.1",
            "name": "l2-network",
            "type": "ovn-k8s-cni-overlay",
            "topology":"layer2",
            "subnets": "10.100.200.0/24",
            "mtu": 1300,
            "netAttachDefName": "ns1/l2-network",
            "excludeSubnets": "10.100.200.0/29"
    }
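
For completeness, the following sketch shows the full NetworkAttachmentDefinition object that carries this configuration; the namespace and name match the netAttachDefName value shown above:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-network
  namespace: ns1
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "l2-network",
      "type": "ovn-k8s-cni-overlay",
      "topology": "layer2",
      "subnets": "10.100.200.0/24",
      "mtu": 1300,
      "netAttachDefName": "ns1/l2-network",
      "excludeSubnets": "10.100.200.0/29"
    }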
23.2.3.6.3. Configuring pods for additional networks

You must specify the secondary network attachments through the k8s.v1.cni.cncf.io/networks annotation.

The following example provisions a pod with a secondary attachment that uses the switched network attachment configuration presented in this guide.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: l2-network
  name: tinypod
  namespace: ns1
spec:
  containers:
  - args:
    - pause
    image: k8s.gcr.io/e2e-test-images/agnhost:2.36
    imagePullPolicy: IfNotPresent
    name: agnhost-container
23.2.3.6.4. Configuring pods with a static IP address

The following example provisions a pod with a static IP address.

Note
  • You can only specify the IP address for a pod’s secondary network attachment for layer 2 attachments.
  • Specifying a static IP address for the pod is only possible when the attachment configuration does not feature subnets.
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {
        "name": "l2-network", 1
        "mac": "02:03:04:05:06:07", 2
        "interface": "myiface1", 3
        "ips": [
          "192.0.2.20/24"
          ] 4
      }
    ]'
  name: tinypod
  namespace: ns1
spec:
  containers:
  - args:
    - pause
    image: k8s.gcr.io/e2e-test-images/agnhost:2.36
    imagePullPolicy: IfNotPresent
    name: agnhost-container
1
The name of the network. This value must be unique across all NetworkAttachmentDefinitions.
2
The MAC address to be assigned for the interface.
3
The name of the network interface to be created for the pod.
4
The IP addresses to be assigned to the network interface.

23.2.4. Configuration of IP address assignment for an additional network

The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.

You can use the following IP address assignment types:

  • Static assignment.
  • Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.
  • Dynamic assignment through the Whereabouts IPAM CNI plugin.
23.2.4.1. Static IP address assignment configuration

The following table describes the configuration for static IP address assignment:

Table 23.8. ipam static configuration object
FieldTypeDescription

type

string

The IPAM address type. The value static is required.

addresses

array

An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.

routes

array

An array of objects specifying routes to configure inside the pod.

dns

array

Optional: An array of objects specifying the DNS configuration.

The addresses array requires objects with the following fields:

Table 23.9. ipam.addresses[] array
FieldTypeDescription

address

string

An IP address and network prefix that you specify. For example, if you specify 10.10.21.10/24, then the additional network is assigned an IP address of 10.10.21.10 and the netmask is 255.255.255.0.

gateway

string

The default gateway to route egress network traffic to.

Table 23.10. ipam.routes[] array
FieldTypeDescription

dst

string

The IP address range in CIDR format, such as 192.168.17.0/24 or 0.0.0.0/0 for the default route.

gw

string

The gateway where network traffic is routed.

Table 23.11. ipam.dns object
FieldTypeDescription

nameservers

array

An array of one or more IP addresses to send DNS queries to.

domain

array

The default domain to append to a hostname. For example, if the domain is set to example.com, a DNS lookup query for example-host is rewritten as example-host.example.com.

search

array

An array of domain names to append to an unqualified hostname, such as example-host, during a DNS lookup query.

Static IP address assignment configuration example

{
  "ipam": {
    "type": "static",
      "addresses": [
        {
          "address": "191.168.1.7/24"
        }
      ]
  }
}
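
The following expanded example is a sketch that also sets a gateway, a default route, and DNS configuration by using the fields described in the preceding tables; all addresses are illustrative:

{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "192.168.1.7/24",
        "gateway": "192.168.1.1"
      }
    ],
    "routes": [
      {
        "dst": "0.0.0.0/0",
        "gw": "192.168.1.1"
      }
    ],
    "dns": {
      "nameservers": ["192.168.1.1"],
      "domain": "example.com",
      "search": ["example.com"]
    }
  }
}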

23.2.4.2. Dynamic IP address (DHCP) assignment configuration

The following JSON describes the configuration for dynamic IP address assignment with DHCP.

Renewal of DHCP leases

A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.

To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:

Example shim network attachment definition

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    type: Raw
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "ipam": {
          "type": "dhcp"
        }
      }
  # ...

Table 23.12. ipam DHCP configuration object
FieldTypeDescription

type

string

The IPAM address type. The value dhcp is required.

Dynamic IP address (DHCP) assignment configuration example

{
  "ipam": {
    "type": "dhcp"
  }
}

23.2.4.3. Dynamic IP address assignment configuration with Whereabouts

The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.

The following table describes the configuration for dynamic IP address assignment with Whereabouts:

Table 23.13. ipam whereabouts configuration object
FieldTypeDescription

type

string

The IPAM address type. The value whereabouts is required.

range

string

An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses.

exclude

array

Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned.

Dynamic IP address assignment configuration example that uses Whereabouts

{
  "ipam": {
    "type": "whereabouts",
    "range": "192.0.2.192/27",
    "exclude": [
       "192.0.2.192/30",
       "192.0.2.196/32"
    ]
  }
}

23.2.4.4. Creating a whereabouts-reconciler daemon set

The Whereabouts reconciler is responsible for managing dynamic IP address assignments for the pods within a cluster by using the Whereabouts IP Address Management (IPAM) solution. It ensures that each pod gets a unique IP address from the specified IP address range. It also handles IP address releases when pods are deleted or scaled down.

Note

You can also use a NetworkAttachmentDefinition custom resource (CR) for dynamic IP address assignment.

The whereabouts-reconciler daemon set is automatically created when you configure an additional network through the Cluster Network Operator. It is not automatically created when you configure an additional network from a YAML manifest.

To trigger the deployment of the whereabouts-reconciler daemon set, you must manually create a whereabouts-shim network attachment by editing the Cluster Network Operator custom resource (CR) file.

Use the following procedure to deploy the whereabouts-reconciler daemon set.

Procedure

  1. Edit the Network.operator.openshift.io custom resource (CR) by running the following command:

    $ oc edit network.operator.openshift.io cluster
  2. Include the additionalNetworks section shown in this example YAML extract within the spec definition of the custom resource (CR):

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    # ...
    spec:
      additionalNetworks:
      - name: whereabouts-shim
        namespace: default
        rawCNIConfig: |-
          {
           "name": "whereabouts-shim",
           "cniVersion": "0.3.1",
           "type": "bridge",
           "ipam": {
             "type": "whereabouts"
           }
          }
        type: Raw
    # ...
  3. Save the file and exit the text editor.
  4. Verify that the whereabouts-reconciler daemon set deployed successfully by running the following command:

    $ oc get all -n openshift-multus | grep whereabouts-reconciler

    Example output

    pod/whereabouts-reconciler-jnp6g 1/1 Running 0 6s
    pod/whereabouts-reconciler-k76gg 1/1 Running 0 6s
    pod/whereabouts-reconciler-k86t9 1/1 Running 0 6s
    pod/whereabouts-reconciler-p4sxw 1/1 Running 0 6s
    pod/whereabouts-reconciler-rvfdv 1/1 Running 0 6s
    pod/whereabouts-reconciler-svzw9 1/1 Running 0 6s
    daemonset.apps/whereabouts-reconciler 6 6 6 6 6 kubernetes.io/os=linux 6s

23.2.4.5. Configuring the Whereabouts IP reconciler schedule

The Whereabouts IPAM CNI plugin runs the IP reconciler daily. This process cleans up any stranded IP allocations, which might otherwise exhaust the available IP addresses and prevent new pods from being allocated an IP.

Use this procedure to change the frequency at which the IP reconciler runs.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have deployed the whereabouts-reconciler daemon set, and the whereabouts-reconciler pods are up and running.

Procedure

  1. Run the following command to create a ConfigMap object named whereabouts-config in the openshift-multus namespace with a specific cron expression for the IP reconciler:

    $ oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/15 * * * *"

    This cron expression indicates the IP reconciler runs every 15 minutes. Adjust the expression based on your specific requirements.

    Note

    The whereabouts-reconciler daemon set can only consume a cron expression pattern that includes five asterisks. The sixth, which is used to denote seconds, is currently not supported.

  2. Retrieve information about resources related to the whereabouts-reconciler daemon set and pods within the openshift-multus namespace by running the following command:

    $ oc get all -n openshift-multus | grep whereabouts-reconciler

    Example output

    pod/whereabouts-reconciler-2p7hw                   1/1     Running   0             4m14s
    pod/whereabouts-reconciler-76jk7                   1/1     Running   0             4m14s
    pod/whereabouts-reconciler-94zw6                   1/1     Running   0             4m14s
    pod/whereabouts-reconciler-mfh68                   1/1     Running   0             4m14s
    pod/whereabouts-reconciler-pgshz                   1/1     Running   0             4m14s
    pod/whereabouts-reconciler-xn5xz                   1/1     Running   0             4m14s
    daemonset.apps/whereabouts-reconciler          6         6         6       6            6           kubernetes.io/os=linux   4m16s

  3. Run the following command to verify that the whereabouts-reconciler pod runs the IP reconciler with the configured interval:

    $ oc -n openshift-multus logs whereabouts-reconciler-2p7hw

    Example output

    2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_33_54.1375928161": CREATE
    2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_33_54.1375928161": CHMOD
    2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..data_tmp": RENAME
    2024-02-02T16:33:54Z [verbose] using expression: */15 * * * *
    2024-02-02T16:33:54Z [verbose] configuration updated to file "/cron-schedule/..data". New cron expression: */15 * * * *
    2024-02-02T16:33:54Z [verbose] successfully updated CRON configuration id "00c2d1c9-631d-403f-bb86-73ad104a6817" - new cron expression: */15 * * * *
    2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/config": CREATE
    2024-02-02T16:33:54Z [debug] event not relevant: "/cron-schedule/..2024_02_02_16_26_17.3874177937": REMOVE
    2024-02-02T16:45:00Z [verbose] starting reconciler run
    2024-02-02T16:45:00Z [debug] NewReconcileLooper - inferred connection data
    2024-02-02T16:45:00Z [debug] listing IP pools
    2024-02-02T16:45:00Z [debug] no IP addresses to cleanup
    2024-02-02T16:45:00Z [verbose] reconciler success
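
If you need to change the schedule after the whereabouts-config ConfigMap already exists, one option is to regenerate the ConfigMap and replace the existing object. The following is a minimal sketch; the */30 expression is an example value only:

$ oc create configmap whereabouts-config -n openshift-multus \
    --from-literal=reconciler_cron_expression="*/30 * * * *" \
    --dry-run=client -o yaml | oc replace -f -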

23.2.5. Creating an additional network attachment with the Cluster Network Operator

The Cluster Network Operator (CNO) manages additional network definitions. When you specify an additional network to create, the CNO creates the NetworkAttachmentDefinition object automatically.

Important

Do not edit the NetworkAttachmentDefinition objects that the Cluster Network Operator manages. Doing so might disrupt network traffic on your additional network.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Optional: Create the namespace for the additional networks:

    $ oc create namespace <namespace_name>
  2. To edit the CNO configuration, enter the following command:

    $ oc edit networks.operator.openshift.io cluster
  3. Modify the CR that you are creating by adding the configuration for the additional network that you are creating, as in the following example CR.

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      # ...
      additionalNetworks:
      - name: tertiary-net
        namespace: namespace2
        type: Raw
        rawCNIConfig: |-
          {
            "cniVersion": "0.3.1",
            "name": "tertiary-net",
            "type": "ipvlan",
            "master": "eth1",
            "mode": "l2",
            "ipam": {
              "type": "static",
              "addresses": [
                {
                  "address": "192.168.1.23/24"
                }
              ]
            }
          }
  4. Save your changes and quit the text editor to commit your changes.

Verification

  • Confirm that the CNO created the NetworkAttachmentDefinition object by running the following command. There might be a delay before the CNO creates the object.

    $ oc get network-attachment-definitions -n <namespace>

    where:

    <namespace>
    Specifies the namespace for the network attachment that you added to the CNO configuration.

    Example output

    NAME                 AGE
    test-network-1       14m

23.2.6. Creating an additional network attachment by applying a YAML manifest

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a YAML file with your additional network configuration, such as in the following example:

    apiVersion: k8s.cni.cncf.io/v1
    kind: NetworkAttachmentDefinition
    metadata:
      name: next-net
    spec:
      config: |-
        {
          "cniVersion": "0.3.1",
          "name": "work-network",
          "type": "host-device",
          "device": "eth1",
          "ipam": {
            "type": "dhcp"
          }
        }
  2. To create the additional network, enter the following command:

    $ oc apply -f <file>.yaml

    where:

    <file>
    Specifies the name of the file that contains the YAML manifest.

23.3. About virtual routing and forwarding

23.3.1. About virtual routing and forwarding

Virtual routing and forwarding (VRF) devices combined with IP rules provide the ability to create virtual routing and forwarding domains. VRF reduces the number of permissions needed by CNF, and provides increased visibility of the network topology of secondary networks. VRF is used to provide multi-tenancy functionality, for example, where each tenant has its own unique routing tables and requires different default gateways.

Processes can bind a socket to the VRF device. Packets through the bound socket use the routing table associated with the VRF device. An important feature of VRF is that it impacts only OSI model layer 3 traffic and above, so L2 tools, such as LLDP, are not affected. This allows higher priority IP rules, such as policy-based routing, to take precedence over the VRF device rules directing specific traffic.

23.3.1.1. Benefits of secondary networks for pods for telecommunications operators

In telecommunications use cases, each CNF can potentially be connected to multiple different networks sharing the same address space. These secondary networks can potentially conflict with the cluster’s main network CIDR. Using the CNI VRF plugin, network functions can be connected to different customers' infrastructure by using the same IP address, keeping different customers isolated. These IP addresses can overlap with the OpenShift Container Platform IP space. The CNI VRF plugin also reduces the number of permissions needed by a CNF and increases the visibility of network topologies of secondary networks.

23.4. Configuring multi-network policy

As a cluster administrator, you can configure multi-network policy for additional networks. You can specify multi-network policy for SR-IOV and macvlan additional networks. Macvlan additional networks are fully supported. Other types of additional networks, such as ipvlan, are not supported.

Important

Support for configuring multi-network policies for SR-IOV additional networks is a Technology Preview feature and is only supported with kernel network interface cards (NICs). SR-IOV is not supported for Data Plane Development Kit (DPDK) applications.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

Note

Configured network policies are ignored in IPv6 networks.

23.4.1. Differences between multi-network policy and network policy

Although the MultiNetworkPolicy API implements the NetworkPolicy API, there are several important differences:

  • You must use the MultiNetworkPolicy API:

    apiVersion: k8s.cni.cncf.io/v1beta1
    kind: MultiNetworkPolicy
  • You must use the multi-networkpolicy resource name when using the CLI to interact with multi-network policies. For example, you can view a multi-network policy object with the oc get multi-networkpolicy <name> command where <name> is the name of a multi-network policy.
  • You must specify an annotation with the name of the network attachment definition that defines the macvlan or SR-IOV additional network:

    apiVersion: k8s.cni.cncf.io/v1beta1
    kind: MultiNetworkPolicy
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/policy-for: <network_name>

    where:

    <network_name>
    Specifies the name of a network attachment definition.

23.4.2. Enabling multi-network policy for the cluster

As a cluster administrator, you can enable multi-network policy support on your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  1. Create the multinetwork-enable-patch.yaml file with the following YAML:

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      useMultiNetworkPolicy: true
  2. Configure the cluster to enable multi-network policy:

    $ oc patch network.operator.openshift.io cluster --type=merge --patch-file=multinetwork-enable-patch.yaml

    Example output

    network.operator.openshift.io/cluster patched
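
As an optional check, you can read the field back from the cluster to confirm that the setting was applied. The following is a minimal sketch that uses a JSONPath query and is expected to print true:

$ oc get network.operator.openshift.io cluster -o jsonpath='{.spec.useMultiNetworkPolicy}'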

23.4.3. Working with multi-network policy

As a cluster administrator, you can create, edit, view, and delete multi-network policies.

23.4.3.1. Prerequisites
  • You have enabled multi-network policy support for your cluster.
23.4.3.2. Creating a multi-network policy using the CLI

To define granular rules describing ingress or egress network traffic allowed for namespaces in your cluster, you can create a multi-network policy.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace that the multi-network policy applies to.

Procedure

  1. Create a policy rule:

    1. Create a <policy_name>.yaml file:

      $ touch <policy_name>.yaml

      where:

      <policy_name>
      Specifies the multi-network policy file name.
    2. Define a multi-network policy in the file that you just created, such as in the following examples:

      Deny ingress from all pods in all namespaces

      This is a fundamental policy, blocking all cross-pod networking other than cross-pod traffic allowed by the configuration of other network policies.

      apiVersion: k8s.cni.cncf.io/v1beta1
      kind: MultiNetworkPolicy
      metadata:
        name: deny-by-default
        annotations:
          k8s.v1.cni.cncf.io/policy-for: <network_name>
      spec:
        podSelector:
        ingress: []

      where:

      <network_name>
      Specifies the name of a network attachment definition.

      Allow ingress from all pods in the same namespace

      apiVersion: k8s.cni.cncf.io/v1beta1
      kind: MultiNetworkPolicy
      metadata:
        name: allow-same-namespace
        annotations:
          k8s.v1.cni.cncf.io/policy-for: <network_name>
      spec:
        podSelector:
        ingress:
        - from:
          - podSelector: {}

      where:

      <network_name>
      Specifies the name of a network attachment definition.

      Allow ingress traffic to one pod from a particular namespace

      This policy allows traffic to pods with the label pod=pod-a from pods running in namespace-y.

      apiVersion: k8s.cni.cncf.io/v1beta1
      kind: MultiNetworkPolicy
      metadata:
        name: allow-traffic-pod
        annotations:
          k8s.v1.cni.cncf.io/policy-for: <network_name>
      spec:
        podSelector:
          matchLabels:
            pod: pod-a
        policyTypes:
        - Ingress
        ingress:
        - from:
          - namespaceSelector:
              matchLabels:
                kubernetes.io/metadata.name: namespace-y

      where:

      <network_name>
      Specifies the name of a network attachment definition.

      Restrict traffic to a service

      When applied, this policy ensures that every pod with both the app=bookstore and role=api labels can be accessed only by pods with the app=bookstore label. In this example, the application could be a REST API server, marked with the labels app=bookstore and role=api.

      This example addresses the following use cases:

      • Restricting the traffic to a service to only the other microservices that need to use it.
      • Restricting the connections to a database to only permit the application using it.

        apiVersion: k8s.cni.cncf.io/v1beta1
        kind: MultiNetworkPolicy
        metadata:
          name: api-allow
          annotations:
            k8s.v1.cni.cncf.io/policy-for: <network_name>
        spec:
          podSelector:
            matchLabels:
              app: bookstore
              role: api
          ingress:
          - from:
              - podSelector:
                  matchLabels:
                    app: bookstore

        where:

        <network_name>
        Specifies the name of a network attachment definition.
  2. To create the multi-network policy object, enter the following command:

    $ oc apply -f <policy_name>.yaml -n <namespace>

    where:

    <policy_name>
    Specifies the multi-network policy file name.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.

    Example output

    multinetworkpolicy.k8s.cni.cncf.io/deny-by-default created

Note

If you log in to the web console with cluster-admin privileges, you have a choice of creating a network policy in any namespace in the cluster directly in YAML or from a form in the web console.

23.4.3.3. Editing a multi-network policy

You can edit a multi-network policy in a namespace.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace where the multi-network policy exists.

Procedure

  1. Optional: To list the multi-network policy objects in a namespace, enter the following command:

    $ oc get multi-networkpolicy -n <namespace>

    where:

    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
  2. Edit the multi-network policy object.

    • If you saved the multi-network policy definition in a file, edit the file and make any necessary changes, and then enter the following command.

      $ oc apply -n <namespace> -f <policy_file>.yaml

      where:

      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
      <policy_file>
      Specifies the name of the file containing the network policy.
    • If you need to update the multi-network policy object directly, enter the following command:

      $ oc edit multi-networkpolicy <policy_name> -n <namespace>

      where:

      <policy_name>
      Specifies the name of the network policy.
      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
  3. Confirm that the multi-network policy object is updated.

    $ oc describe multi-networkpolicy <policy_name> -n <namespace>

    where:

    <policy_name>
    Specifies the name of the multi-network policy.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Note

If you log in to the web console with cluster-admin privileges, you have a choice of editing a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.

23.4.3.4. Viewing multi-network policies using the CLI

You can examine the multi-network policies in a namespace.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace where the multi-network policy exists.

Procedure

  • List multi-network policies in a namespace:

    • To view multi-network policy objects defined in a namespace, enter the following command:

      $ oc get multi-networkpolicy
    • Optional: To examine a specific multi-network policy, enter the following command:

      $ oc describe multi-networkpolicy <policy_name> -n <namespace>

      where:

      <policy_name>
      Specifies the name of the multi-network policy to inspect.
      <namespace>
      Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.
Note

If you log in to the web console with cluster-admin privileges, you have a choice of viewing a network policy in any namespace in the cluster directly in YAML or from a form in the web console.

23.4.3.5. Deleting a multi-network policy using the CLI

You can delete a multi-network policy in a namespace.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace where the multi-network policy exists.

Procedure

  • To delete a multi-network policy object, enter the following command:

    $ oc delete multi-networkpolicy <policy_name> -n <namespace>

    where:

    <policy_name>
    Specifies the name of the multi-network policy.
    <namespace>
    Optional: Specifies the namespace if the object is defined in a different namespace than the current namespace.

    Example output

    multinetworkpolicy.k8s.cni.cncf.io/default-deny deleted

Note

If you log in to the web console with cluster-admin privileges, you have a choice of deleting a network policy in any namespace in the cluster directly in YAML or from the policy in the web console through the Actions menu.

23.4.3.6. Creating a default deny all multi-network policy

This is a fundamental policy, blocking all cross-pod networking other than network traffic allowed by the configuration of other deployed network policies. This procedure enforces a deny-by-default policy.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace that the multi-network policy applies to.

Procedure

  1. Create the following YAML that defines a deny-by-default policy to deny ingress from all pods in all namespaces. Save the YAML in the deny-by-default.yaml file:

    apiVersion: k8s.cni.cncf.io/v1beta1
    kind: MultiNetworkPolicy
    metadata:
      name: deny-by-default
      namespace: default 1
      annotations:
        k8s.v1.cni.cncf.io/policy-for: <network_name> 2
    spec:
      podSelector: {} 3
      ingress: [] 4
    1
    namespace: default deploys this policy to the default namespace.
    2
    <network_name>: Specifies the name of a network attachment definition.
    3
    podSelector: is empty, which means it matches all pods. Therefore, the policy applies to all pods in the default namespace.
    4
    There are no ingress rules specified. This causes incoming traffic to be dropped to all pods.
  2. Apply the policy by entering the following command:

    $ oc apply -f deny-by-default.yaml

    Example output

    multinetworkpolicy.k8s.cni.cncf.io/deny-by-default created

23.4.3.7. Creating a multi-network policy to allow traffic from external clients

With the deny-by-default policy in place you can proceed to configure a policy that allows traffic from external clients to a pod with the label app=web.

Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows traffic from the public Internet, either directly or by using a load balancer, to access the pod. Traffic is allowed only to pods with the label app=web.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace that the multi-network policy applies to.

Procedure

  1. Create a policy that allows traffic from the public Internet directly or by using a load balancer to access the pod. Save the YAML in the web-allow-external.yaml file:

    apiVersion: k8s.cni.cncf.io/v1beta1
    kind: MultiNetworkPolicy
    metadata:
      name: web-allow-external
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/policy-for: <network_name>
    spec:
      policyTypes:
      - Ingress
      podSelector:
        matchLabels:
          app: web
      ingress:
        - {}
  2. Apply the policy by entering the following command:

    $ oc apply -f web-allow-external.yaml

    Example output

    multinetworkpolicy.k8s.cni.cncf.io/web-allow-external created

This policy allows traffic from all resources, including external traffic as illustrated in the following diagram:

Allow traffic from external clients
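
To confirm the behavior, you can reuse the verification pattern from the later procedures in this section. The following is a minimal sketch that assumes a namespace named secondary exists; because the policy allows all ingress to pods labeled app=web, the request succeeds:

$ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
$ oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
# wget -qO- --timeout=2 http://web.default
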
23.4.3.8. Creating a multi-network policy allowing traffic to an application from all namespaces
Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows traffic from all pods in all namespaces to a particular application.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace that the multi-network policy applies to.

Procedure

  1. Create a policy that allows traffic from all pods in all namespaces to a particular application. Save the YAML in the web-allow-all-namespaces.yaml file:

    apiVersion: k8s.cni.cncf.io/v1beta1
    kind: MultiNetworkPolicy
    metadata:
      name: web-allow-all-namespaces
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/policy-for: <network_name>
    spec:
      podSelector:
        matchLabels:
          app: web 1
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector: {} 2
    1
    Applies the policy only to app:web pods in the default namespace.
    2
    Selects all pods in all namespaces.
    Note

    By default, if you omit specifying a namespaceSelector, it does not select any namespaces, which means the policy allows traffic only from the namespace that the network policy is deployed to.

  2. Apply the policy by entering the following command:

    $ oc apply -f web-allow-all-namespaces.yaml

    Example output

    multinetworkpolicy.k8s.cni.cncf.io/web-allow-all-namespaces created

Verification

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
  2. Run the following command to deploy an alpine image in the secondary namespace and to start a shell:

    $ oc run test-$RANDOM --namespace=secondary --rm -i -t --image=alpine -- sh
  3. Run the following command in the shell and observe that the request is allowed:

    # wget -qO- --timeout=2 http://web.default

    Expected output

    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

23.4.3.9. Creating a multi-network policy allowing traffic to an application from a namespace
Note

If you log in with a user with the cluster-admin role, then you can create a network policy in any namespace in the cluster.

Follow this procedure to configure a policy that allows traffic to a pod with the label app=web from a particular namespace. You might want to do this to:

  • Restrict traffic to a production database only to namespaces where production workloads are deployed.
  • Enable monitoring tools deployed to a particular namespace to scrape metrics from the current namespace.

Prerequisites

  • Your cluster uses a network plugin that supports NetworkPolicy objects, such as the OVN-Kubernetes network plugin or the OpenShift SDN network plugin with mode: NetworkPolicy set. This mode is the default for OpenShift SDN.
  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You are working in the namespace that the multi-network policy applies to.

Procedure

  1. Create a policy that allows traffic from all pods in namespaces with the label purpose=production. Save the YAML in the web-allow-prod.yaml file:

    apiVersion: k8s.cni.cncf.io/v1beta1
    kind: MultiNetworkPolicy
    metadata:
      name: web-allow-prod
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/policy-for: <network_name>
    spec:
      podSelector:
        matchLabels:
          app: web 1
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              purpose: production 2
    1
    Applies the policy only to app:web pods in the default namespace.
    2
    Restricts traffic to only pods in namespaces that have the label purpose=production.
  2. Apply the policy by entering the following command:

    $ oc apply -f web-allow-prod.yaml

    Example output

    multinetworkpolicy.k8s.cni.cncf.io/web-allow-prod created

Verification

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
  2. Run the following command to create the prod namespace:

    $ oc create namespace prod
  3. Run the following command to label the prod namespace:

    $ oc label namespace/prod purpose=production
  4. Run the following command to create the dev namespace:

    $ oc create namespace dev
  5. Run the following command to label the dev namespace:

    $ oc label namespace/dev purpose=testing
  6. Run the following command to deploy an alpine image in the dev namespace and to start a shell:

    $ oc run test-$RANDOM --namespace=dev --rm -i -t --image=alpine -- sh
  7. Run the following command in the shell and observe that the request is blocked:

    # wget -qO- --timeout=2 http://web.default

    Expected output

    wget: download timed out

  8. Run the following command to deploy an alpine image in the prod namespace and start a shell:

    $ oc run test-$RANDOM --namespace=prod --rm -i -t --image=alpine -- sh
  9. Run the following command in the shell and observe that the request is allowed:

    # wget -qO- --timeout=2 http://web.default

    Expected output

    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

23.4.4. Additional resources

23.5. Attaching a pod to an additional network

As a cluster user you can attach a pod to an additional network.

23.5.1. Adding a pod to an additional network

You can add a pod to an additional network. The pod continues to send normal cluster-related network traffic over the default network.

Additional networks are attached to a pod when the pod is created. However, you cannot attach additional networks to a pod that already exists.

The pod must be in the same namespace as the additional network.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster.

Procedure

  1. Add an annotation to the Pod object. Only one of the following annotation formats can be used:

    1. To attach an additional network without any customization, add an annotation with the following format. Replace <network> with the name of the additional network to associate with the pod:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] 1
      1
      To specify more than one additional network, separate each network with a comma. Do not include whitespace before or after a comma. If you specify the same additional network multiple times, the pod has multiple network interfaces attached to that network. A complete Pod manifest that uses this annotation format is sketched at the end of this procedure.
    2. To attach an additional network with customizations, add an annotation with the following format:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: |-
            [
              {
                "name": "<network>", 1
                "namespace": "<namespace>", 2
                "default-route": ["<default-route>"] 3
              }
            ]
      1
      Specify the name of the additional network defined by a NetworkAttachmentDefinition object.
      2
      Specify the namespace where the NetworkAttachmentDefinition object is defined.
      3
      Optional: Specify an override for the default route, such as 192.168.17.1.
  2. To create the pod, enter the following command. Replace <name> with the name of the pod.

    $ oc create -f <name>.yaml
  3. Optional: To confirm that the annotation exists in the Pod CR, enter the following command, replacing <name> with the name of the pod.

    $ oc get pod <name> -o yaml

    In the following example, the example-pod pod is attached to the net1 additional network:

    $ oc get pod example-pod -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-bridge
        k8s.v1.cni.cncf.io/network-status: |- 1
          [{
              "name": "openshift-sdn",
              "interface": "eth0",
              "ips": [
                  "10.128.2.14"
              ],
              "default": true,
              "dns": {}
          },{
              "name": "macvlan-bridge",
              "interface": "net1",
              "ips": [
                  "20.2.2.100"
              ],
              "mac": "22:2f:60:a5:f8:00",
              "dns": {}
          }]
      name: example-pod
      namespace: default
    spec:
      ...
    status:
      ...
    1
    The k8s.v1.cni.cncf.io/network-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the pod. The annotation value is stored as a plain text value.
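
For reference, the following is a minimal sketch of a complete Pod manifest that uses the annotation format from step 1. The pod name, the net1 network name, and the container image are example values only:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: net1
spec:
  containers:
  - name: example-pod
    command: ["/bin/bash", "-c", "sleep 2000000000000"]
    image: centos/tools
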
23.5.1.1. Specifying pod-specific addressing and routing options

When attaching a pod to an additional network, you may want to specify further properties about that network in a particular pod. This allows you to change some aspects of routing, as well as specify static IP addresses and MAC addresses. To accomplish this, you can use the JSON formatted annotations.

Prerequisites

  • The pod must be in the same namespace as the additional network.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster.

Procedure

To add a pod to an additional network while specifying addressing and/or routing options, complete the following steps:

  1. Edit the Pod resource definition. If you are editing an existing Pod resource, run the following command to edit its definition in the default editor. Replace <name> with the name of the Pod resource to edit.

    $ oc edit pod <name>
  2. In the Pod resource definition, add the k8s.v1.cni.cncf.io/networks parameter to the pod metadata mapping. The k8s.v1.cni.cncf.io/networks parameter accepts a JSON string of a list of objects that reference NetworkAttachmentDefinition custom resource (CR) names and specify additional properties.

    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: '[<network>[,<network>,...]]' 1
    1
    Replace <network> with a JSON object as shown in the following examples. The single quotes are required.
  3. In the following example the annotation specifies which network attachment will have the default route, using the default-route parameter.

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
        {
          "name": "net1"
        },
        {
          "name": "net2", 1
          "default-route": ["192.0.2.1"] 2
        }]'
    spec:
      containers:
      - name: example-pod
        command: ["/bin/bash", "-c", "sleep 2000000000000"]
        image: centos/tools
    1
    The name key is the name of the additional network to associate with the pod.
    2
    The default-route key specifies the gateway that traffic is routed over if no other routing entry matches in the routing table. If more than one default-route key is specified, the pod fails to become active.

The default route will cause any traffic that is not specified in other routes to be routed to the gateway.

Important

Setting the default route to an interface other than the default network interface for OpenShift Container Platform might cause traffic that is intended for pod-to-pod communication to be routed over another interface.

To verify the routing properties of a pod, the oc command may be used to execute the ip command within a pod.

$ oc exec -it <pod_name> -- ip route
Note

You may also reference the pod’s k8s.v1.cni.cncf.io/network-status to see which additional network has been assigned the default route, by the presence of the default-route key in the JSON-formatted list of objects.
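
One way to inspect that annotation is to print the pod's annotations and look for the default-route key in the k8s.v1.cni.cncf.io/network-status entry. The following is a minimal sketch, where <pod_name> is a placeholder:

$ oc get pod <pod_name> -o jsonpath='{.metadata.annotations}'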

To set a static IP address or MAC address for a pod, you can use the JSON formatted annotations. This requires you to create networks that specifically allow for this functionality, which you can specify in a rawCNIConfig for the CNO.

  1. Edit the CNO CR by running the following command:

    $ oc edit networks.operator.openshift.io cluster

The following YAML describes the configuration parameters for the CNO:

Cluster Network Operator YAML configuration

name: <name> 1
namespace: <namespace> 2
rawCNIConfig: '{ 3
  ...
}'
type: Raw

1
Specify a name for the additional network attachment that you are creating. The name must be unique within the specified namespace.
2
Specify the namespace to create the network attachment in. If you do not specify a value, then the default namespace is used.
3
Specify the CNI plugin configuration in JSON format, which is based on the following template.

The following object describes the configuration parameters for utilizing static MAC address and IP address using the macvlan CNI plugin:

macvlan CNI plugin JSON configuration object using static IP and MAC address

{
  "cniVersion": "0.3.1",
  "name": "<name>", 1
  "plugins": [{ 2
      "type": "macvlan",
      "capabilities": { "ips": true }, 3
      "master": "eth0", 4
      "mode": "bridge",
      "ipam": {
        "type": "static"
      }
    }, {
      "capabilities": { "mac": true }, 5
      "type": "tuning"
    }]
}

1
Specifies the name for the additional network attachment to create. The name must be unique within the specified namespace.
2
Specifies an array of CNI plugin configurations. The first object specifies a macvlan plugin configuration and the second object specifies a tuning plugin configuration.
3
Specifies that a request is made to enable the static IP address functionality of the CNI plugin runtime configuration capabilities.
4
Specifies the interface that the macvlan plugin uses.
5
Specifies that a request is made to enable the static MAC address functionality of a CNI plugin.

The preceding network attachment can be referenced in a JSON formatted annotation, along with keys that specify which static IP and MAC address are assigned to a given pod.

Edit the pod with:

$ oc edit pod <name>

Pod annotation that references the network attachment and specifies a static IP and MAC address

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      {
        "name": "<name>", 1
        "ips": [ "192.0.2.205/24" ], 2
        "mac": "CA:FE:C0:FF:EE:00" 3
      }
    ]'

1
Use the <name> as provided when creating the rawCNIConfig above.
2
Provide an IP address including the subnet mask.
3
Provide the MAC address.
Note

Static IP addresses and MAC addresses do not have to be used at the same time. You can use them individually or together.

To verify the IP address and MAC properties of a pod with additional networks, use the oc command to execute the ip command within a pod.

$ oc exec -it <pod_name> -- ip a

23.6. Removing a pod from an additional network

As a cluster user you can remove a pod from an additional network.

23.6.1. Removing a pod from an additional network

You can remove a pod from an additional network only by deleting the pod.

Prerequisites

  • An additional network is attached to the pod.
  • Install the OpenShift CLI (oc).
  • Log in to the cluster.

Procedure

  • To delete the pod, enter the following command:

    $ oc delete pod <name> -n <namespace>
    • <name> is the name of the pod.
    • <namespace> is the namespace that contains the pod.

23.7. Editing an additional network

As a cluster administrator you can modify the configuration for an existing additional network.

23.7.1. Modifying an additional network attachment definition

As a cluster administrator, you can make changes to an existing additional network. Any existing pods attached to the additional network will not be updated.

Prerequisites

  • You have configured an additional network for your cluster.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

To edit an additional network for your cluster, complete the following steps:

  1. Run the following command to edit the Cluster Network Operator (CNO) CR in your default text editor:

    $ oc edit networks.operator.openshift.io cluster
  2. In the additionalNetworks collection, update the additional network with your changes.
  3. Save your changes and quit the text editor to commit your changes.
  4. Optional: Confirm that the CNO updated the NetworkAttachmentDefinition object by running the following command. Replace <network-name> with the name of the additional network to display. There might be a delay before the CNO updates the NetworkAttachmentDefinition object to reflect your changes.

    $ oc get network-attachment-definitions <network-name> -o yaml

    For example, the following console output displays a NetworkAttachmentDefinition object that is named net1:

    $ oc get network-attachment-definitions net1 -o go-template='{{printf "%s\n" .spec.config}}'
    { "cniVersion": "0.3.1", "type": "macvlan",
    "master": "ens5",
    "mode": "bridge",
    "ipam":       {"type":"static","routes":[{"dst":"0.0.0.0/0","gw":"10.128.2.1"}],"addresses":[{"address":"10.128.2.100/23","gateway":"10.128.2.1"}],"dns":{"nameservers":["172.30.0.10"],"domain":"us-west-2.compute.internal","search":["us-west-2.compute.internal"]}} }

23.8. Removing an additional network

As a cluster administrator you can remove an additional network attachment.

23.8.1. Removing an additional network attachment definition

As a cluster administrator, you can remove an additional network from your OpenShift Container Platform cluster. The additional network is not removed from any pods it is attached to.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

To remove an additional network from your cluster, complete the following steps:

  1. Edit the Cluster Network Operator (CNO) in your default text editor by running the following command:

    $ oc edit networks.operator.openshift.io cluster
  2. Modify the CR by removing the configuration from the additionalNetworks collection for the network attachment definition you are removing.

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      additionalNetworks: [] 1
    1
    If you are removing the configuration mapping for the only additional network attachment definition in the additionalNetworks collection, you must specify an empty collection.
  3. Save your changes and quit the text editor to commit your changes.
  4. Optional: Confirm that the additional network CR was deleted by running the following command:

    $ oc get network-attachment-definition --all-namespaces

23.9. Assigning a secondary network to a VRF

As a cluster administrator, you can configure an additional network for a virtual routing and forwarding (VRF) domain by using the CNI VRF plugin. The virtual network that this plugin creates is associated with the physical interface that you specify.

Using a secondary network with a VRF instance has the following advantages:

Workload isolation
Isolate workload traffic by configuring a VRF instance for the additional network.
Improved security
Enable improved security through isolated network paths in the VRF domain.
Multi-tenancy support
Support multi-tenancy through network segmentation with a unique routing table in the VRF domain for each tenant.
Note

Applications that use VRFs must bind to a specific device. The common usage is to use the SO_BINDTODEVICE option for a socket. The SO_BINDTODEVICE option binds the socket to the device that is specified in the passed interface name, for example, eth1. To use the SO_BINDTODEVICE option, the application must have CAP_NET_RAW capabilities.

Using a VRF through the ip vrf exec command is not supported in OpenShift Container Platform pods. To use VRF, bind applications directly to the VRF interface.

23.9.1. Creating an additional network attachment with the CNI VRF plugin

The Cluster Network Operator (CNO) manages additional network definitions. When you specify an additional network to create, the CNO creates the NetworkAttachmentDefinition custom resource (CR) automatically.

Note

Do not edit the NetworkAttachmentDefinition CRs that the Cluster Network Operator manages. Doing so might disrupt network traffic on your additional network.

To create an additional network attachment with the CNI VRF plugin, perform the following procedure.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in to the OpenShift cluster as a user with cluster-admin privileges.

Procedure

  1. Create the Network custom resource (CR) for the additional network attachment and insert the rawCNIConfig configuration for the additional network, as in the following example CR. Save the YAML as the file additional-network-attachment.yaml.

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      additionalNetworks:
        - name: test-network-1
          namespace: additional-network-1
          type: Raw
          rawCNIConfig: '{
            "cniVersion": "0.3.1",
            "name": "macvlan-vrf",
            "plugins": [  1
            {
              "type": "macvlan",
              "master": "eth1",
              "ipam": {
                  "type": "static",
                  "addresses": [
                  {
                      "address": "191.168.1.23/24"
                  }
                  ]
              }
            },
            {
              "type": "vrf", 2
              "vrfname": "vrf-1",  3
              "table": 1001   4
            }]
          }'
    1
    plugins must be a list. The first item in the list must be the secondary network underpinning the VRF network. The second item in the list is the VRF plugin configuration.
    2
    type must be set to vrf.
    3
    vrfname is the name of the VRF that the interface is assigned to. If it does not exist in the pod, it is created.
    4
    Optional. table is the routing table ID. By default, the tableid parameter is used. If it is not specified, the CNI assigns a free routing table ID to the VRF.
    Note

    VRF functions correctly only when the resource is of type netdevice.

  2. Create the Network resource:

    $ oc create -f additional-network-attachment.yaml
  3. Confirm that the CNO created the NetworkAttachmentDefinition CR by running the following command. Replace <namespace> with the namespace that you specified when configuring the network attachment, for example, additional-network-1.

    $ oc get network-attachment-definitions -n <namespace>

    Example output

    NAME                       AGE
    additional-network-1       14m

    Note

    There might be a delay before the CNO creates the CR.

Verification

  1. Create a pod and assign it to the additional network with the VRF instance:

    1. Create a YAML file that defines the Pod resource:

      Example pod-additional-net.yaml file

      apiVersion: v1
      kind: Pod
      metadata:
       name: pod-additional-net
       annotations:
         k8s.v1.cni.cncf.io/networks: '[
             {
                     "name": "test-network-1" 1
             }
       ]'
      spec:
       containers:
       - name: example-pod-1
         command: ["/bin/bash", "-c", "sleep 9000000"]
         image: centos:8

      1
      Specify the name of the additional network with the VRF instance.
    2. Create the Pod resource by running the following command:

      $ oc create -f pod-additional-net.yaml

      Example output

      pod/test-pod created

  2. Verify that the pod network attachment is connected to the VRF additional network. Start a remote session with the pod and run the following command:

    $ ip vrf show

    Example output

    Name              Table
    -----------------------
    vrf-1             1001

  3. Confirm that the VRF interface is the controller for the additional interface:

    $ ip link

    Example output

    5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode

Chapter 24. Hardware networks

24.1. About Single Root I/O Virtualization (SR-IOV) hardware networks

The Single Root I/O Virtualization (SR-IOV) specification is a standard for a type of PCI device assignment that can share a single device with multiple pods.

SR-IOV can segment a compliant network device, recognized on the host node as a physical function (PF), into multiple virtual functions (VFs). The VF is used like any other network device. The SR-IOV network device driver for the device determines how the VF is exposed in the container:

  • netdevice driver: A regular kernel network device in the netns of the container
  • vfio-pci driver: A character device mounted in the container

You can use SR-IOV network devices with additional networks on your OpenShift Container Platform cluster installed on bare metal or Red Hat OpenStack Platform (RHOSP) infrastructure for applications that require high bandwidth or low latency.

You can configure multi-network policies for SR-IOV networks. Support for this feature is Technology Preview, and SR-IOV additional networks are only supported with kernel NICs. They are not supported for Data Plane Development Kit (DPDK) applications.

Note

Creating multi-network policies on SR-IOV networks might not deliver the same performance to applications compared to SR-IOV networks without a multi-network policy configured.

Important

Multi-network policies for SR-IOV network is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

You can enable SR-IOV on a node by using the following command:

$ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"

24.1.1. Components that manage SR-IOV network devices

The SR-IOV Network Operator creates and manages the components of the SR-IOV stack. It performs the following functions:

  • Orchestrates discovery and management of SR-IOV network devices
  • Generates NetworkAttachmentDefinition custom resources for the SR-IOV Container Network Interface (CNI)
  • Creates and updates the configuration of the SR-IOV network device plugin
  • Creates node specific SriovNetworkNodeState custom resources
  • Updates the spec.interfaces field in each SriovNetworkNodeState custom resource

The Operator provisions the following components:

SR-IOV network configuration daemon
A daemon set that is deployed on worker nodes when the SR-IOV Network Operator starts. The daemon is responsible for discovering and initializing SR-IOV network devices in the cluster.
SR-IOV Network Operator webhook
A dynamic admission controller webhook that validates the Operator custom resource and sets appropriate default values for unset fields.
SR-IOV Network resources injector
A dynamic admission controller webhook that provides functionality for patching Kubernetes pod specifications with requests and limits for custom network resources such as SR-IOV VFs. The SR-IOV network resources injector adds the resource field to only the first container in a pod automatically.
SR-IOV network device plugin
A device plugin that discovers, advertises, and allocates SR-IOV network virtual function (VF) resources. Device plugins are used in Kubernetes to enable the use of limited resources, typically in physical devices. Device plugins give the Kubernetes scheduler awareness of resource availability, so that the scheduler can schedule pods on nodes with sufficient resources.
SR-IOV CNI plugin
A CNI plugin that attaches VF interfaces allocated from the SR-IOV network device plugin directly into a pod.
SR-IOV InfiniBand CNI plugin
A CNI plugin that attaches InfiniBand (IB) VF interfaces allocated from the SR-IOV network device plugin directly into a pod.
Note

The SR-IOV Network resources injector and SR-IOV Network Operator webhook are enabled by default and can be disabled by editing the default SriovOperatorConfig CR. Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices.
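
As an illustration of that kind of change, the following patch is a minimal sketch that disables the resources injector by setting the enableInjector field in the default SriovOperatorConfig CR; the enableOperatorWebhook field can be toggled in the same way. Verify the field names against your installed Operator version before applying the patch:

$ oc patch sriovoperatorconfig default -n openshift-sriov-network-operator \
    --type=merge --patch '{"spec": {"enableInjector": false}}'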

24.1.1.1. Supported platforms

The SR-IOV Network Operator is supported on the following platforms:

  • Bare metal
  • Red Hat OpenStack Platform (RHOSP)
24.1.1.2. Supported devices

OpenShift Container Platform supports the following network interface controllers:

Table 24.1. Supported network interface controllers
Manufacturer    Model                                                                Vendor ID    Device ID

Broadcom        BCM57414                                                             14e4         16d7
Broadcom        BCM57508                                                             14e4         1750
Broadcom        BCM57504                                                             14e4         1751
Intel           X710                                                                 8086         1572
Intel           XL710                                                                8086         1583
Intel           X710 Base T                                                          8086         15ff
Intel           XXV710                                                               8086         158b
Intel           E810-CQDA2                                                           8086         1592
Intel           E810-2CQDA2                                                          8086         1592
Intel           E810-XXVDA2                                                          8086         159b
Intel           E810-XXVDA4                                                          8086         1593
Intel           E810-XXVDA4T                                                         8086         1593
Mellanox        MT27700 Family [ConnectX‑4]                                          15b3         1013
Mellanox        MT27710 Family [ConnectX‑4 Lx]                                       15b3         1015
Mellanox        MT27800 Family [ConnectX‑5]                                          15b3         1017
Mellanox        MT28880 Family [ConnectX‑5 Ex]                                       15b3         1019
Mellanox        MT28908 Family [ConnectX‑6]                                          15b3         101b
Mellanox        MT2892 Family [ConnectX‑6 Dx]                                        15b3         101d
Mellanox        MT2894 Family [ConnectX‑6 Lx]                                        15b3         101f
Mellanox        MT42822 BlueField‑2 in ConnectX‑6 NIC mode                           15b3         a2d6
Pensando [1]    DSC-25 dual-port 25G distributed services card for ionic driver      0x1dd8       0x1002
Pensando [1]    DSC-100 dual-port 100G distributed services card for ionic driver    0x1dd8       0x1003
Silicom         STS Family                                                           8086         1591

  1. OpenShift SR-IOV is supported, but you must set a static Virtual Function (VF) media access control (MAC) address by using the SR-IOV CNI config file when using SR-IOV.
Note

For the most up-to-date list of supported cards and compatible OpenShift Container Platform versions available, see Openshift Single Root I/O Virtualization (SR-IOV) and PTP hardware networks Support Matrix.

24.1.1.3. Automated discovery of SR-IOV network devices

The SR-IOV Network Operator searches your cluster for SR-IOV capable network devices on worker nodes. The Operator creates and updates a SriovNetworkNodeState custom resource (CR) for each worker node that provides a compatible SR-IOV network device.

The CR is assigned the same name as the worker node. The status.interfaces list provides information about the network devices on a node.
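
To see which nodes the Operator has discovered, you can list these CRs. The following is a minimal sketch:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator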

Important

Do not modify a SriovNetworkNodeState object. The Operator creates and manages these resources automatically.

24.1.1.3.1. Example SriovNetworkNodeState object

The following YAML is an example of a SriovNetworkNodeState object created by the SR-IOV Network Operator:

An SriovNetworkNodeState object

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  name: node-25 1
  namespace: openshift-sriov-network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovNetworkNodePolicy
    name: default
spec:
  dpConfigVersion: "39824"
status:
  interfaces: 2
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f0
    pciAddress: "0000:18:00.0"
    totalvfs: 8
    vendor: 15b3
  - deviceID: "1017"
    driver: mlx5_core
    mtu: 1500
    name: ens785f1
    pciAddress: "0000:18:00.1"
    totalvfs: 8
    vendor: 15b3
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f0
    pciAddress: 0000:81:00.0
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens817f1
    pciAddress: 0000:81:00.1
    totalvfs: 64
    vendor: "8086"
  - deviceID: 158b
    driver: i40e
    mtu: 1500
    name: ens803f0
    pciAddress: 0000:86:00.0
    totalvfs: 64
    vendor: "8086"
  syncStatus: Succeeded

1
The value of the name field is the same as the name of the worker node.
2
The interfaces stanza includes a list of all of the SR-IOV devices discovered by the Operator on the worker node.
24.1.1.4. Example use of a virtual function in a pod

You can run a remote direct memory access (RDMA) or a Data Plane Development Kit (DPDK) application in a pod with SR-IOV VF attached.

This example shows a pod using a virtual function (VF) in RDMA mode:

Pod spec that uses RDMA mode

apiVersion: v1
kind: Pod
metadata:
  name: rdma-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-rdma-mlnx
spec:
  containers:
  - name: testpmd
    image: <RDMA_image>
    imagePullPolicy: IfNotPresent
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
    command: ["sleep", "infinity"]

The following example shows a pod with a VF in DPDK mode:

Pod spec that uses DPDK mode

apiVersion: v1
kind: Pod
metadata:
  name: dpdk-app
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-dpdk-net
spec:
  containers:
  - name: testpmd
    image: <DPDK_image>
    securityContext:
      runAsUser: 0
      capabilities:
        add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"]
    volumeMounts:
    - mountPath: /dev/hugepages
      name: hugepage
    resources:
      limits:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
      requests:
        memory: "1Gi"
        cpu: "2"
        hugepages-1Gi: "4Gi"
    command: ["sleep", "infinity"]
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages
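As with any pod specification, you can save either of the preceding examples to a file and create the pod with the CLI. The file name in the following sketch is illustrative:

$ oc create -f dpdk-pod.yaml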

24.1.1.5. DPDK library for use with container applications

An optional library, app-netutil, provides several API methods for gathering network information about a pod from within a container running in that pod.

This library can assist with integrating SR-IOV virtual functions (VFs) in Data Plane Development Kit (DPDK) mode into the container. The library provides both a Golang API and a C API.

Currently there are three API methods implemented:

GetCPUInfo()
This function determines which CPUs are available to the container and returns the list.
GetHugepages()
This function determines the amount of huge page memory requested in the Pod spec for each container and returns the values.
GetInterfaces()
This function determines the set of interfaces in the container and returns the list. The return value includes the interface type and type-specific data for each interface.

The repository for the library includes a sample Dockerfile to build a container image, dpdk-app-centos. The container image can run one of the following DPDK sample applications, depending on an environment variable in the pod specification: l2fwd, l3fwd, or testpmd. The container image provides an example of integrating the app-netutil library into the container image itself. The library can also integrate into an init container. The init container can collect the required data and pass the data to an existing DPDK workload.

24.1.1.6. Huge pages resource injection for Downward API

When a pod specification includes a resource request or limit for huge pages, the Network Resources Injector automatically adds Downward API fields to the pod specification to provide the huge pages information to the container.

The Network Resources Injector adds a volume that is named podnetinfo and is mounted at /etc/podnetinfo for each container in the pod. The volume uses the Downward API and includes a file for huge pages requests and limits. The file naming convention is as follows:

  • /etc/podnetinfo/hugepages_1G_request_<container-name>
  • /etc/podnetinfo/hugepages_1G_limit_<container-name>
  • /etc/podnetinfo/hugepages_2M_request_<container-name>
  • /etc/podnetinfo/hugepages_2M_limit_<container-name>

The paths specified in the previous list are compatible with the app-netutil library. By default, the library is configured to search for resource information in the /etc/podnetinfo directory. If you specify the Downward API path items manually, the app-netutil library searches for the following paths in addition to the paths in the previous list.

  • /etc/podnetinfo/hugepages_request
  • /etc/podnetinfo/hugepages_limit
  • /etc/podnetinfo/hugepages_1G_request
  • /etc/podnetinfo/hugepages_1G_limit
  • /etc/podnetinfo/hugepages_2M_request
  • /etc/podnetinfo/hugepages_2M_limit

As with the paths that the Network Resources Injector can create, the paths in the preceding list can optionally end with a _<container-name> suffix.
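For example, a container with the podnetinfo volume mounted can read its own 1G huge pages request directly from the file system. The container name app in the following sketch is illustrative; substitute the name of your container:

$ cat /etc/podnetinfo/hugepages_1G_request_app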

24.1.2. Additional resources

24.1.3. Next steps

24.2. Installing the SR-IOV Network Operator

You can install the Single Root I/O Virtualization (SR-IOV) Network Operator on your cluster to manage SR-IOV network devices and network attachments.

24.2.1. Installing SR-IOV Network Operator

As a cluster administrator, you can install the SR-IOV Network Operator by using the OpenShift Container Platform CLI or the web console.

24.2.1.1. CLI: Installing the SR-IOV Network Operator

As a cluster administrator, you can install the Operator using the CLI.

Prerequisites

  • A cluster installed on bare-metal hardware with nodes that support SR-IOV.
  • Install the OpenShift CLI (oc).
  • An account with cluster-admin privileges.

Procedure

  1. To create the openshift-sriov-network-operator namespace, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sriov-network-operator
      annotations:
        workload.openshift.io/allowed: management
    EOF
  2. To create an OperatorGroup CR, enter the following command:

    $ cat << EOF| oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator
    EOF
  3. Subscribe to the SR-IOV Network Operator.

    1. Run the following command to get the OpenShift Container Platform major and minor version. It is required for the channel value in the next step.

      $ OC_VERSION=$(oc version -o yaml | grep openshiftVersion | \
          grep -o '[0-9]*[.][0-9]*' | head -1)
    2. To create a Subscription CR for the SR-IOV Network Operator, enter the following command:

      $ cat << EOF| oc create -f -
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: sriov-network-operator-subscription
        namespace: openshift-sriov-network-operator
      spec:
        channel: "${OC_VERSION}"
        name: sriov-network-operator
        source: redhat-operators
        sourceNamespace: openshift-marketplace
      EOF
  4. To verify that the Operator is installed, enter the following command:

    $ oc get csv -n openshift-sriov-network-operator \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Example output

    Name                                         Phase
    sriov-network-operator.4.13.0-202310121402   Succeeded

24.2.1.2. Web console: Installing the SR-IOV Network Operator

As a cluster administrator, you can install the Operator using the web console.

Prerequisites

  • A cluster installed on bare-metal hardware with nodes that support SR-IOV.
  • Install the OpenShift CLI (oc).
  • An account with cluster-admin privileges.

Procedure

  1. Install the SR-IOV Network Operator:

    1. In the OpenShift Container Platform web console, click Operators → OperatorHub.
    2. Select SR-IOV Network Operator from the list of available Operators, and then click Install.
    3. On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
    4. Click Install.
  2. Verify that the SR-IOV Network Operator is installed successfully:

    1. Navigate to the Operators → Installed Operators page.
    2. Ensure that SR-IOV Network Operator is listed in the openshift-sriov-network-operator project with a Status of InstallSucceeded.

      Note

      During installation, an Operator might display a Failed status. If the installation later succeeds with an InstallSucceeded message, you can ignore the Failed message.

      If the Operator does not appear as installed, complete the following troubleshooting steps:

      • Inspect the Operator Subscriptions and Install Plans tabs for any failure or errors under Status.
      • Navigate to the Workloads → Pods page and check the logs for pods in the openshift-sriov-network-operator project.
      • Check that the Operator namespace has the required annotation. If the annotation is missing, you can add the annotation workload.openshift.io/allowed=management to the Operator namespace with the following command:

        $ oc annotate ns/openshift-sriov-network-operator workload.openshift.io/allowed=management
        Note

        For single-node OpenShift clusters, the annotation workload.openshift.io/allowed=management is required for the namespace.

24.2.2. Next steps

24.3. Configuring the SR-IOV Network Operator

The Single Root I/O Virtualization (SR-IOV) Network Operator manages the SR-IOV network devices and network attachments in your cluster.

24.3.1. Configuring the SR-IOV Network Operator

Important

Modifying the SR-IOV Network Operator configuration is not normally necessary. The default configuration is recommended for most use cases. Complete the steps to modify the relevant configuration only if the default behavior of the Operator is not compatible with your use case.

The SR-IOV Network Operator adds the SriovOperatorConfig.sriovnetwork.openshift.io CustomResourceDefinition resource. The Operator automatically creates a SriovOperatorConfig custom resource (CR) named default in the openshift-sriov-network-operator namespace.

Note

The default CR contains the SR-IOV Network Operator configuration for your cluster. To change the Operator configuration, you must modify this CR.

24.3.1.1. SR-IOV Network Operator config custom resource

The fields for the SriovOperatorConfig custom resource are described in the following table:

Table 24.2. SR-IOV Network Operator config custom resource
Field | Type | Description

metadata.name

string

Specifies the name of the SR-IOV Network Operator instance. The default value is default. Do not set a different value.

metadata.namespace

string

Specifies the namespace of the SR-IOV Network Operator instance. The default value is openshift-sriov-network-operator. Do not set a different value.

spec.configDaemonNodeSelector

string

Specifies the node selection to control scheduling the SR-IOV Network Config Daemon on selected nodes. By default, this field is not set and the Operator deploys the SR-IOV Network Config daemon set on worker nodes.

spec.disableDrain

boolean

Specifies whether to disable or enable the node draining process when you apply a new policy to configure the NIC on a node. Setting this field to true facilitates software development and installing OpenShift Container Platform on a single node. By default, this field is not set.

For single-node clusters, set this field to true after installing the Operator. This field must remain set to true.

spec.enableInjector

boolean

Specifies whether to enable or disable the Network Resources Injector daemon set. By default, this field is set to true.

spec.enableOperatorWebhook

boolean

Specifies whether to enable or disable the Operator Admission Controller webhook daemon set. By default, this field is set to true.

spec.logLevel

integer

Specifies the log verbosity level of the Operator. Set to 0 to show only the basic logs. Set to 2 to show all the available logs. By default, this field is set to 2.
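Before you change any of these fields, you can review the current Operator configuration by displaying the default CR, for example:

$ oc get sriovoperatorconfig default -n openshift-sriov-network-operator -o yaml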

24.3.1.2. About the Network Resources Injector

The Network Resources Injector is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:

  • Mutation of resource requests and limits in a pod specification to add an SR-IOV resource name according to an SR-IOV network attachment definition annotation.
  • Mutation of a pod specification with a Downward API volume to expose pod annotations, labels, and huge pages requests and limits. Containers that run in the pod can access the exposed information as files under the /etc/podnetinfo path.

By default, the Network Resources Injector is enabled by the SR-IOV Network Operator and runs as a daemon set on all control plane nodes. The following is an example of Network Resources Injector pods running in a cluster with three control plane nodes:

$ oc get pods -n openshift-sriov-network-operator

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
network-resources-injector-5cz5p          1/1     Running   0          10m
network-resources-injector-dwqpx          1/1     Running   0          10m
network-resources-injector-lktz5          1/1     Running   0          10m

24.3.1.3. About the SR-IOV Network Operator admission controller webhook

The SR-IOV Network Operator Admission Controller webhook is a Kubernetes Dynamic Admission Controller application. It provides the following capabilities:

  • Validation of the SriovNetworkNodePolicy CR when it is created or updated.
  • Mutation of the SriovNetworkNodePolicy CR by setting the default value for the priority and deviceType fields when the CR is created or updated.

By default the SR-IOV Network Operator Admission Controller webhook is enabled by the Operator and runs as a daemon set on all control plane nodes.

Note

Use caution when disabling the SR-IOV Network Operator Admission Controller webhook. You can disable the webhook under specific circumstances, such as troubleshooting, or if you want to use unsupported devices. For information about configuring unsupported devices, see Configuring the SR-IOV Network Operator to use an unsupported NIC.

The following is an example of the Operator Admission Controller webhook pods running in a cluster with three control plane nodes:

$ oc get pods -n openshift-sriov-network-operator

Example output

NAME                                      READY   STATUS    RESTARTS   AGE
operator-webhook-9jkw6                    1/1     Running   0          16m
operator-webhook-kbr5p                    1/1     Running   0          16m
operator-webhook-rpfrl                    1/1     Running   0          16m

24.3.1.4. About custom node selectors

The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.

24.3.1.5. Disabling or enabling the Network Resources Injector

To disable or enable the Network Resources Injector, which is enabled by default, complete the following procedure.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

Procedure

  • Set the enableInjector field. Replace <value> with false to disable the feature or true to enable the feature.

    $ oc patch sriovoperatorconfig default \
      --type=merge -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableInjector": <value> } }'
    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      enableInjector: <value>
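    For example, the following sketch shows the same patch command with the placeholder replaced by false to disable the Network Resources Injector:

    $ oc patch sriovoperatorconfig default \
      --type=merge -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableInjector": false } }'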
24.3.1.6. Disabling or enabling the SR-IOV Network Operator admission controller webhook

To disable or enable the admission controller webhook, which is enabled by default, complete the following procedure.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

Procedure

  • Set the enableOperatorWebhook field. Replace <value> with false to disable the feature or true to enable it:

    $ oc patch sriovoperatorconfig default --type=merge \
      -n openshift-sriov-network-operator \
      --patch '{ "spec": { "enableOperatorWebhook": <value> } }'
    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      enableOperatorWebhook: <value>
24.3.1.7. Configuring a custom NodeSelector for the SR-IOV Network Config daemon

The SR-IOV Network Config daemon discovers and configures the SR-IOV network devices on cluster nodes. By default, it is deployed to all the worker nodes in the cluster. You can use node labels to specify on which nodes the SR-IOV Network Config daemon runs.

To specify the nodes where the SR-IOV Network Config daemon is deployed, complete the following procedure.

Important

When you update the configDaemonNodeSelector field, the SR-IOV Network Config daemon is recreated on each selected node. While the daemon is recreated, cluster users are unable to apply any new SR-IOV Network node policy or create new SR-IOV pods.

Procedure

  • To update the node selector for the operator, enter the following command:

    $ oc patch sriovoperatorconfig default --type=json \
      -n openshift-sriov-network-operator \
      --patch '[{
          "op": "replace",
          "path": "/spec/configDaemonNodeSelector",
          "value": {<node_label>}
        }]'

    Replace <node_label> with a label to apply as in the following example: "node-role.kubernetes.io/worker": "".

    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      configDaemonNodeSelector:
        <node_label>
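    For example, the following sketch fills in the placeholder with the worker role label shown earlier so that the config daemon runs only on worker nodes:

    $ oc patch sriovoperatorconfig default --type=json \
      -n openshift-sriov-network-operator \
      --patch '[{
          "op": "replace",
          "path": "/spec/configDaemonNodeSelector",
          "value": {"node-role.kubernetes.io/worker": ""}
        }]'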
24.3.1.8. Configuring the SR-IOV Network Operator for single node installations

By default, the SR-IOV Network Operator drains workloads from a node before every policy change. The Operator performs this action to ensure that there are no workloads using the virtual functions before the reconfiguration.

For installations on a single node, there are no other nodes to receive the workloads. As a result, the Operator must be configured not to drain the workloads from the single node.

Important

After performing the following procedure to disable draining workloads, you must remove any workload that uses an SR-IOV network interface before you change any SR-IOV network node policy.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • You must have installed the SR-IOV Network Operator.

Procedure

  • To set the disableDrain field to true, enter the following command:

    $ oc patch sriovoperatorconfig default --type=merge \
      -n openshift-sriov-network-operator \
      --patch '{ "spec": { "disableDrain": true } }'
    Tip

    You can alternatively apply the following YAML to update the Operator:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovOperatorConfig
    metadata:
      name: default
      namespace: openshift-sriov-network-operator
    spec:
      disableDrain: true
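    To confirm that the change was applied, you can query the field directly; the command prints true when draining is disabled:

    $ oc get sriovoperatorconfig default -n openshift-sriov-network-operator \
      -o jsonpath='{.spec.disableDrain}'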
24.3.1.9. Deploying the SR-IOV Operator for hosted control planes
Important

Hosted control planes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

After you configure and deploy your hosting service cluster, you can create a subscription to the SR-IOV Operator on a hosted cluster. The SR-IOV pod runs on worker machines rather than the control plane.

Prerequisites

  • You must configure and deploy the hosted cluster on AWS. For more information, see Configuring the hosting cluster on AWS (Technology Preview).

Procedure

  1. Create a namespace and an Operator group:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-sriov-network-operator
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: sriov-network-operators
      namespace: openshift-sriov-network-operator
    spec:
      targetNamespaces:
      - openshift-sriov-network-operator
  2. Create a subscription to the SR-IOV Operator:

    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: sriov-network-operator-subscription
      namespace: openshift-sriov-network-operator
    spec:
      channel: "4.13"
      name: sriov-network-operator
      config:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
      source: redhat-operators
      sourceNamespace: openshift-marketplace
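  3. Save the manifests from the previous steps to files and apply them by running the following commands. The file names are illustrative; use the names of the files where you saved the YAML:

    $ oc apply -f sriov-namespace-operatorgroup.yaml
    $ oc apply -f sriov-subscription.yaml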

Verification

  1. To verify that the SR-IOV Operator is ready, run the following command and view the resulting output:

    $ oc get csv -n openshift-sriov-network-operator

    Example output

    NAME                                         DISPLAY                   VERSION               REPLACES                                     PHASE
    sriov-network-operator.4.13.0-202211021237   SR-IOV Network Operator   4.13.0-202211021237   sriov-network-operator.4.13.0-202210290517   Succeeded

  2. To verify that the SR-IOV pods are deployed, run the following command:

    $ oc get pods -n openshift-sriov-network-operator

24.3.2. Next steps

24.4. Configuring an SR-IOV network device

You can configure a Single Root I/O Virtualization (SR-IOV) device in your cluster.

24.4.1. SR-IOV network node configuration object

You specify the SR-IOV network device configuration for a node by creating an SR-IOV network node policy. The API object for the policy is part of the sriovnetwork.openshift.io API group.

The following YAML describes an SR-IOV network node policy:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  resourceName: <sriov_resource_name> 3
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true" 4
  priority: <priority> 5
  mtu: <mtu> 6
  needVhostNet: false 7
  numVfs: <num> 8
  nicSelector: 9
    vendor: "<vendor_code>" 10
    deviceID: "<device_id>" 11
    pfNames: ["<pf_name>", ...] 12
    rootDevices: ["<pci_bus_id>", ...] 13
    netFilter: "<filter_string>" 14
  deviceType: <device_type> 15
  isRdma: false 16
  linkType: <link_type> 17
  eSwitchMode: <mode> 18
  excludeTopology: false 19
1
The name for the custom resource object.
2
The namespace where the SR-IOV Network Operator is installed.
3
The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.

When specifying a name, be sure to use the accepted syntax expression ^[a-zA-Z0-9_]+$ in the resourceName.

4
The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
Important

The SR-IOV Network Operator applies node network configuration policies to nodes in sequence. Before applying node network configuration policies, the SR-IOV Network Operator checks if the machine config pool (MCP) for a node is in an unhealthy state such as Degraded or Updating. If a node is in an unhealthy MCP, the process of applying node network configuration policies to all targeted nodes in the cluster pauses until the MCP returns to a healthy state.

To avoid a node in an unhealthy MCP from blocking the application of node network configuration policies to other nodes, including nodes in other MCPs, you must create a separate node network configuration policy for each MCP.

5
Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
6
Optional: The maximum transmission unit (MTU) of the virtual function. The maximum MTU value can vary for different network interface controller (NIC) models.
Important

If you want to create a virtual function on the default network interface, ensure that the MTU is set to a value that matches the cluster MTU.

7
Optional: Set needVhostNet to true to mount the /dev/vhost-net device in the pod. Use the mounted /dev/vhost-net device with Data Plane Development Kit (DPDK) to forward traffic to the kernel network stack.
8
The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 127.
9
The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally.

If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique.

10
Optional: The vendor hexadecimal code of the SR-IOV network device. The only allowed values are 8086 and 15b3.
11
Optional: The device hexadecimal code of the SR-IOV network device. For example, 101b is the device ID for a Mellanox ConnectX-6 device.
12
Optional: An array of one or more physical function (PF) names for the device.
13
Optional: An array of one or more PCI bus addresses for the PF of the device. Provide the address in the following format: 0000:02:00.1.
14
Optional: The platform-specific network filter. The only supported platform is Red Hat OpenStack Platform (RHOSP). Acceptable values use the following format: openstack/NetworkID:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx. Replace xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with the value from the /var/config/openstack/latest/network_data.json metadata file.
15
Optional: The driver type for the virtual functions. The only allowed values are netdevice and vfio-pci. The default value is netdevice.

For a Mellanox NIC to work in DPDK mode on bare metal nodes, use the netdevice driver type and set isRdma to true.

16
Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false.

If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode.

Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.

Note

You cannot set the isRdma parameter to true for Intel NICs.

17
Optional: The link type for the VFs. The default value is eth for Ethernet. Change this value to 'ib' for InfiniBand.

When linkType is set to ib, isRdma is automatically set to true by the SR-IOV Network Operator webhook. When linkType is set to ib, deviceType should not be set to vfio-pci.

Do not set linkType to 'eth' for SriovNetworkNodePolicy, because this can lead to an incorrect number of available devices reported by the device plugin.

18
Optional: The NIC device mode. The only allowed values are legacy or switchdev.

When eSwitchMode is set to legacy, the default SR-IOV behavior is enabled.

When eSwitchMode is set to switchdev, hardware offloading is enabled.

19
Optional: To exclude advertising an SR-IOV network resource’s NUMA node to the Topology Manager, set the value to true. The default value is false.
24.4.1.1. SR-IOV network node configuration examples

The following example describes the configuration for an InfiniBand device:

Example configuration for an InfiniBand device

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-ib-net-1
  namespace: openshift-sriov-network-operator
spec:
  resourceName: ibnic1
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 4
  nicSelector:
    vendor: "15b3"
    deviceID: "101b"
    rootDevices:
      - "0000:19:00.0"
  linkType: ib
  isRdma: true

The following example describes the configuration for an SR-IOV network device in a RHOSP virtual machine:

Example configuration for an SR-IOV device in a virtual machine

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-sriov-net-openstack-1
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriovnic1
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 1 1
  nicSelector:
    vendor: "15b3"
    deviceID: "101b"
    netFilter: "openstack/NetworkID:ea24bd04-8674-4f69-b0ee-fa0b3bd20509" 2

1
The numVfs field is always set to 1 when configuring the node network policy for a virtual machine.
2
The netFilter field must refer to a network ID when the virtual machine is deployed on RHOSP. Valid values for netFilter are available from an SriovNetworkNodeState object.
24.4.1.2. Virtual function (VF) partitioning for SR-IOV devices

In some cases, you might want to split virtual functions (VFs) from the same physical function (PF) into multiple resource pools. For example, you might want some of the VFs to load with the default driver and the remaining VFs to load with the vfio-pci driver. In such a deployment, you can use the pfNames selector in your SriovNetworkNodePolicy custom resource (CR) to specify a range of VFs for a pool by using the following format: <pfname>#<first_vf>-<last_vf>.

For example, the following YAML shows the selector for an interface named netpf0 with VF 2 through 7:

pfNames: ["netpf0#2-7"]
  • netpf0 is the PF interface name.
  • 2 is the first VF index (0-based) that is included in the range.
  • 7 is the last VF index (0-based) that is included in the range.

You can select VFs from the same PF by using different policy CRs if the following requirements are met:

  • The numVfs value must be identical for policies that select the same PF.
  • The VF index must be in the range of 0 to <numVfs>-1. For example, if you have a policy with numVfs set to 8, then the <first_vf> value must not be smaller than 0, and the <last_vf> must not be larger than 7.
  • The VFs ranges in different policies must not overlap.
  • The <first_vf> must not be larger than the <last_vf>.

The following example illustrates NIC partitioning for an SR-IOV device.

The policy policy-net-1 defines a resource pool net1 that contains VF 0 of PF netpf0 with the default VF driver. The policy policy-net-1-dpdk defines a resource pool net1dpdk that contains VFs 8 to 15 of PF netpf0 with the vfio-pci VF driver.

Policy policy-net-1:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-1
  namespace: openshift-sriov-network-operator
spec:
  resourceName: net1
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  nicSelector:
    pfNames: ["netpf0#0-0"]
  deviceType: netdevice

Policy policy-net-1-dpdk:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: policy-net-1-dpdk
  namespace: openshift-sriov-network-operator
spec:
  resourceName: net1dpdk
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 16
  nicSelector:
    pfNames: ["netpf0#8-15"]
  deviceType: vfio-pci

Verifying that the interface is successfully partitioned

Confirm that the interface is partitioned into virtual functions (VFs) for the SR-IOV device by running the following command.

$ ip link show <interface> 1
1
Replace <interface> with the interface that you specified when partitioning to VFs for the SR-IOV device, for example, ens3f1.

Example output

5: ens3f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 3c:fd:fe:d1:bc:01 brd ff:ff:ff:ff:ff:ff

vf 0     link/ether 5a:e7:88:25:ea:a0 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 1     link/ether 3e:1d:36:d7:3d:49 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 2     link/ether ce:09:56:97:df:f9 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 3     link/ether 5e:91:cf:88:d1:38 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off
vf 4     link/ether e6:06:a1:96:2f:de brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off

24.4.2. Configuring SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io CustomResourceDefinition to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

Note

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.

It might take several minutes for a configuration change to apply.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the SR-IOV Network Operator.
  • You have enough available nodes in your cluster to handle the evicted workload from drained nodes.
  • You have not selected any control plane nodes for SR-IOV network device configuration.

Procedure

  1. Create an SriovNetworkNodePolicy object, and then save the YAML in the <name>-sriov-node-network.yaml file. Replace <name> with the name for this configuration.
  2. Optional: Label the SR-IOV capable cluster nodes with SriovNetworkNodePolicy.Spec.NodeSelector if they are not already labeled. For more information about labeling nodes, see "Understanding how to update labels on nodes".
  3. Create the SriovNetworkNodePolicy object:

    $ oc create -f <name>-sriov-node-network.yaml

    where <name> specifies the name for this configuration.

    After applying the configuration update, all the pods in the openshift-sriov-network-operator namespace transition to the Running status.

  4. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'
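    The command prints the value of the syncStatus field from the node's SriovNetworkNodeState object. A value of Succeeded indicates that the configuration was applied, as in the following sample output:

    Example output

    Succeeded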

24.4.3. Troubleshooting SR-IOV configuration

After you follow the procedure to configure an SR-IOV network device, use the following sections to help address some error conditions.

To display the state of nodes, run the following command:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name>

where <node_name> specifies the name of a node with an SR-IOV network device.

Error output: Cannot allocate memory

"lastSyncError": "write /sys/bus/pci/devices/0000:3b:00.1/sriov_numvfs: cannot allocate memory"

When a node indicates that it cannot allocate memory, check the following items:

  • Confirm that global SR-IOV settings are enabled in the BIOS for the node.
  • Confirm that VT-d is enabled in the BIOS for the node.
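To inspect the full node state, including any lastSyncError message, you can retrieve the object as YAML. The following command is a sketch; replace <node_name> as before:

$ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o yaml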

24.4.4. Assigning an SR-IOV network to a VRF

As a cluster administrator, you can assign an SR-IOV network interface to your VRF domain by using the CNI VRF plugin.

To do this, add the VRF configuration to the optional metaPlugins parameter of the SriovNetwork resource.

Note

Applications that use VRFs need to bind to a specific device. The common usage is to use the SO_BINDTODEVICE option for a socket. SO_BINDTODEVICE binds the socket to a device that is specified in the passed interface name, for example, eth1. To use SO_BINDTODEVICE, the application must have CAP_NET_RAW capabilities.

Using a VRF through the ip vrf exec command is not supported in OpenShift Container Platform pods. To use VRF, bind applications directly to the VRF interface.

24.4.4.1. Creating an additional SR-IOV network attachment with the CNI VRF plugin

The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition custom resource (CR) automatically.

Note

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.

To create an additional SR-IOV network attachment with the CNI VRF plugin, perform the following procedure.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.

Procedure

  1. Create the SriovNetwork custom resource (CR) for the additional SR-IOV network attachment and insert the metaPlugins configuration, as in the following example CR. Save the YAML as the file sriov-network-attachment.yaml.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: example-network
      namespace: additional-sriov-network-1
    spec:
      ipam: |
        {
          "type": "host-local",
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "routes": [{
            "dst": "0.0.0.0/0"
          }],
          "gateway": "10.56.217.1"
        }
      vlan: 0
      resourceName: intelnics
      metaPlugins : |
        {
          "type": "vrf", 1
          "vrfname": "example-vrf-name" 2
        }
    1
    type must be set to vrf.
    2
    vrfname is the name of the VRF that the interface is assigned to. If it does not exist in the pod, it is created.
  2. Create the SriovNetwork resource:

    $ oc create -f sriov-network-attachment.yaml

Verifying that the NetworkAttachmentDefinition CR is successfully created

  • Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command.

    $ oc get network-attachment-definitions -n <namespace> 1
    1
    Replace <namespace> with the namespace that you specified when configuring the network attachment, for example, additional-sriov-network-1.

    Example output

    NAME                            AGE
    additional-sriov-network-1      14m

    Note

    There might be a delay before the SR-IOV Network Operator creates the CR.

Verifying that the additional SR-IOV network attachment is successful

To verify that the VRF CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:

  1. Create an SR-IOV network that uses the VRF CNI.
  2. Assign the network to a pod.
  3. Verify that the pod network attachment is connected to the SR-IOV additional network. Remote shell into the pod and run the following command:

    $ ip vrf show

    Example output

    Name              Table
    -----------------------
    red                 10

  4. Confirm that the VRF interface is the master of the secondary interface:

    $ ip link

    Example output

    ...
    5: net1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master red state UP mode
    ...

24.4.5. Exclude the SR-IOV network topology for NUMA-aware scheduling

You can exclude advertising the Non-Uniform Memory Access (NUMA) node for the SR-IOV network to the Topology Manager for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.

In some scenarios, it is a priority to maximize CPU and memory resources for a pod on a single NUMA node. By not providing a hint to the Topology Manager about the NUMA node for the pod’s SR-IOV network resource, the Topology Manager can deploy the SR-IOV network resource and the pod CPU and memory resources to different NUMA nodes. This can add to network latency because of the data transfer between NUMA nodes. However, the added latency is acceptable in scenarios where workloads require optimal CPU and memory performance.

For example, consider a compute node, compute-1, that features two NUMA nodes: numa0 and numa1. The SR-IOV-enabled NIC is present on numa0. The CPUs available for pod scheduling are present on numa1 only. By setting the excludeTopology specification to true, the Topology Manager can assign CPU and memory resources for the pod to numa1 and can assign the SR-IOV network resource for the same pod to numa0. This is only possible when you set the excludeTopology specification to true. Otherwise, the Topology Manager attempts to place all resources on the same NUMA node.

24.4.5.1. Excluding the SR-IOV network topology for NUMA-aware scheduling

To exclude advertising the SR-IOV network resource’s Non-Uniform Memory Access (NUMA) node to the Topology Manager, you can configure the excludeTopology specification in the SriovNetworkNodePolicy custom resource. Use this configuration for more flexible SR-IOV network deployments during NUMA-aware pod scheduling.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have configured the CPU Manager policy to static. For more information about CPU Manager, see the Additional resources section.
  • You have configured the Topology Manager policy to single-numa-node.
  • You have installed the SR-IOV Network Operator.

Procedure

  1. Create the SriovNetworkNodePolicy CR:

    1. Save the following YAML in the sriov-network-node-policy.yaml file, replacing values in the YAML to match your environment:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkNodePolicy
      metadata:
        name: <policy_name>
        namespace: openshift-sriov-network-operator
      spec:
        resourceName: sriovnuma0 1
        nodeSelector:
          kubernetes.io/hostname: <node_name>
        numVfs: <number_of_Vfs>
        nicSelector: 2
          vendor: "<vendor_ID>"
          deviceID: "<device_ID>"
        deviceType: netdevice
        excludeTopology: true 3
      1
      The resource name of the SR-IOV network device plugin. This YAML uses a sample resourceName value.
      2
      Identify the device for the Operator to configure by using the NIC selector.
      3
      To exclude advertising the NUMA node for the SR-IOV network resource to the Topology Manager, set the value to true. The default value is false.
      Note

      If multiple SriovNetworkNodePolicy resources target the same SR-IOV network resource, the SriovNetworkNodePolicy resources must have the same value for the excludeTopology specification. Otherwise, the conflicting policy is rejected.

    2. Create the SriovNetworkNodePolicy resource by running the following command:

      $ oc create -f sriov-network-node-policy.yaml

      Example output

      sriovnetworknodepolicy.sriovnetwork.openshift.io/policy-for-numa-0 created

  2. Create the SriovNetwork CR:

    1. Save the following YAML in the sriov-network.yaml file, replacing values in the YAML to match your environment:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetwork
      metadata:
        name: sriov-numa-0-network 1
        namespace: openshift-sriov-network-operator
      spec:
        resourceName: sriovnuma0 2
        networkNamespace: <namespace> 3
        ipam: |- 4
          {
            "type": "<ipam_type>",
          }
      1
      Replace sriov-numa-0-network with the name for the SR-IOV network resource.
      2
      Specify the resource name for the SriovNetworkNodePolicy CR from the previous step. This YAML uses a sample resourceName value.
      3
      Enter the namespace for your SR-IOV network resource.
      4
      Enter the IP address management configuration for the SR-IOV network.
    2. Create the SriovNetwork resource by running the following command:

      $ oc create -f sriov-network.yaml

      Example output

      sriovnetwork.sriovnetwork.openshift.io/sriov-numa-0-network created

  3. Create a pod and assign the SR-IOV network resource from the previous step:

    1. Save the following YAML in the sriov-network-pod.yaml file, replacing values in the YAML to match your environment:

      apiVersion: v1
      kind: Pod
      metadata:
        name: <pod_name>
        annotations:
          k8s.v1.cni.cncf.io/networks: |-
            [
              {
                "name": "sriov-numa-0-network", 1
              }
            ]
      spec:
        containers:
        - name: <container_name>
          image: <image>
          imagePullPolicy: IfNotPresent
          command: ["sleep", "infinity"]
      1
      This is the name of the SriovNetwork resource that uses the SriovNetworkNodePolicy resource.
    2. Create the Pod resource by running the following command:

      $ oc create -f sriov-network-pod.yaml

      Example output

      pod/example-pod created

Verification

  1. Verify the status of the pod by running the following command, replacing <pod_name> with the name of the pod:

    $ oc get pod <pod_name>

    Example output

    NAME                                     READY   STATUS    RESTARTS   AGE
    test-deployment-sriov-76cbbf4756-k9v72   1/1     Running   0          45h

  2. Open a debug session with the target pod to verify that the SR-IOV network resources are deployed to a different node than the memory and CPU resources.

    1. Open a debug session with the pod by running the following command, replacing <pod_name> with the target pod name.

      $ oc debug pod/<pod_name>
    2. Set /host as the root directory within the debug shell. The debug pod mounts the root file system from the host in /host within the pod. By changing the root directory to /host, you can run binaries from the host file system:

      $ chroot /host
    3. View information about the CPU allocation by running the following commands:

      $ lscpu | grep NUMA

      Example output

      NUMA node(s):                    2
      NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,...
      NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,...

      $ cat /proc/self/status | grep Cpus

      Example output

      Cpus_allowed:	aa
      Cpus_allowed_list:	1,3,5,7

      $ cat  /sys/class/net/net1/device/numa_node

      Example output

      0

      In this example, CPUs 1,3,5, and 7 are allocated to NUMA node1 but the SR-IOV network resource can use the NIC in NUMA node0.

Note

If the excludeTopology specification is set to true, it is possible that the required resources exist in the same NUMA node.

Additional resources

24.4.6. Next steps

24.5. Configuring an SR-IOV Ethernet network attachment

You can configure an Ethernet network attachment for a Single Root I/O Virtualization (SR-IOV) device in the cluster.

24.5.1. Ethernet device configuration object

You can configure an Ethernet network device by defining an SriovNetwork object.

The following YAML describes an SriovNetwork object:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  resourceName: <sriov_resource_name> 3
  networkNamespace: <target_namespace> 4
  vlan: <vlan> 5
  spoofChk: "<spoof_check>" 6
  ipam: |- 7
    {}
  linkState: <link_state> 8
  maxTxRate: <max_tx_rate> 9
  minTxRate: <min_tx_rate> 10
  vlanQoS: <vlan_qos> 11
  trust: "<trust_vf>" 12
  capabilities: <capabilities> 13
1
A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
2
The namespace where the SR-IOV Network Operator is installed.
3
The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
4
The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
5
Optional: A Virtual LAN (VLAN) ID for the additional network. The integer value must be from 0 to 4095. The default value is 0.
6
Optional: The spoof check mode of the VF. The allowed values are the strings "on" and "off".
Important

You must enclose the value you specify in quotes or the object is rejected by the SR-IOV Network Operator.

7
A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
8
Optional: The link state of the virtual function (VF). Allowed values are enable, disable, and auto.
9
Optional: A maximum transmission rate, in Mbps, for the VF.
10
Optional: A minimum transmission rate, in Mbps, for the VF. This value must be less than or equal to the maximum transmission rate.
Note

Intel NICs do not support the minTxRate parameter. For more information, see BZ#1772847.

11
Optional: An IEEE 802.1p priority level for the VF. The default value is 0.
12
Optional: The trust mode of the VF. The allowed values are the strings "on" and "off".
Important

You must enclose the value that you specify in quotes, or the SR-IOV Network Operator rejects the object.

13
Optional: The capabilities to configure for this additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
24.5.1.1. Configuration of IP address assignment for an additional network

The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.

You can use the following IP address assignment types:

  • Static assignment.
  • Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.
  • Dynamic assignment through the Whereabouts IPAM CNI plugin.
24.5.1.1.1. Static IP address assignment configuration

The following table describes the configuration for static IP address assignment:

Table 24.3. ipam static configuration object
Field | Type | Description

type

string

The IPAM address type. The value static is required.

addresses

array

An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.

routes

array

An array of objects specifying routes to configure inside the pod.

dns

array

Optional: An array of objects specifying the DNS configuration.

The addresses array requires objects with the following fields:

Table 24.4. ipam.addresses[] array
Field | Type | Description

address

string

An IP address and network prefix that you specify. For example, if you specify 10.10.21.10/24, then the additional network is assigned an IP address of 10.10.21.10 and the netmask is 255.255.255.0.

gateway

string

The default gateway to route egress network traffic to.

Table 24.5. ipam.routes[] array
Field | Type | Description

dst

string

The IP address range in CIDR format, such as 192.168.17.0/24 or 0.0.0.0/0 for the default route.

gw

string

The gateway where network traffic is routed.

Table 24.6. ipam.dns object
Field | Type | Description

nameservers

array

An array of one or more IP addresses to send DNS queries to.

domain

array

The default domain to append to a hostname. For example, if the domain is set to example.com, a DNS lookup query for example-host is rewritten as example-host.example.com.

search

array

An array of domain names to append to an unqualified hostname, such as example-host, during a DNS lookup query.

Static IP address assignment configuration example

{
  "ipam": {
    "type": "static",
      "addresses": [
        {
          "address": "191.168.1.7/24"
        }
      ]
  }
}
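The following fuller sketch also sets a gateway, a default route, and a DNS server by using the addresses, routes, and dns fields described in the preceding tables. All addresses in the example are illustrative:

{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "191.168.1.7/24",
        "gateway": "191.168.1.1"
      }
    ],
    "routes": [
      {
        "dst": "0.0.0.0/0",
        "gw": "191.168.1.1"
      }
    ],
    "dns": {
      "nameservers": ["191.168.1.253"]
    }
  }
}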

24.5.1.1.2. Dynamic IP address (DHCP) assignment configuration

The following JSON describes the configuration for dynamic IP address assignment with DHCP.

Renewal of DHCP leases

A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.

The SR-IOV Network Operator does not create a DHCP server deployment; the Cluster Network Operator is responsible for creating the minimal DHCP server deployment.

To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:

Example shim network attachment definition

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    type: Raw
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "ipam": {
          "type": "dhcp"
        }
      }
  # ...

Table 24.7. ipam DHCP configuration object
Field | Type | Description

type

string

The IPAM address type. The value dhcp is required.

Dynamic IP address (DHCP) assignment configuration example

{
  "ipam": {
    "type": "dhcp"
  }
}

24.5.1.1.3. Dynamic IP address assignment configuration with Whereabouts

The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.

The following table describes the configuration for dynamic IP address assignment with Whereabouts:

Table 24.8. ipam whereabouts configuration object
Field | Type | Description

type

string

The IPAM address type. The value whereabouts is required.

range

string

An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses.

exclude

array

Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned.

Dynamic IP address assignment configuration example that uses Whereabouts

{
  "ipam": {
    "type": "whereabouts",
    "range": "192.0.2.192/27",
    "exclude": [
       "192.0.2.192/30",
       "192.0.2.196/32"
    ]
  }
}

24.5.2. Configuring SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating an SriovNetwork object. When you create an SriovNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.

Note

Do not modify or delete an SriovNetwork object if it is attached to any pods in a running state.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a SriovNetwork object, and then save the YAML in the <name>.yaml file, where <name> is a name for this additional network. The object specification might resemble the following example:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: attach1
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: net1
      networkNamespace: project2
      ipam: |-
        {
          "type": "host-local",
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "gateway": "10.56.217.1"
        }
  2. To create the object, enter the following command:

    $ oc create -f <name>.yaml

    where <name> specifies the name of the additional network.

  3. Optional: To confirm that the NetworkAttachmentDefinition object that is associated with the SriovNetwork object that you created in the previous step exists, enter the following command. Replace <namespace> with the networkNamespace you specified in the SriovNetwork object.

    $ oc get net-attach-def -n <namespace>
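    If the object exists, the output lists a NetworkAttachmentDefinition with the same name as the SriovNetwork object, attach1 in the preceding example. The following output is illustrative:

    Example output

    NAME      AGE
    attach1   14m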

24.5.3. Next steps

24.5.4. Additional resources

24.6. Configuring an SR-IOV InfiniBand network attachment

You can configure an InfiniBand (IB) network attachment for a Single Root I/O Virtualization (SR-IOV) device in the cluster.

24.6.1. InfiniBand device configuration object

You can configure an InfiniBand (IB) network device by defining an SriovIBNetwork object.

The following YAML describes an SriovIBNetwork object:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovIBNetwork
metadata:
  name: <name> 1
  namespace: openshift-sriov-network-operator 2
spec:
  resourceName: <sriov_resource_name> 3
  networkNamespace: <target_namespace> 4
  ipam: |- 5
    {}
  linkState: <link_state> 6
  capabilities: <capabilities> 7
1
A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
2
The namespace where the SR-IOV Operator is installed.
3
The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
4
The target namespace for the SriovIBNetwork object. Only pods in the target namespace can attach to the network device.
5
Optional: A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
6
Optional: The link state of virtual function (VF). Allowed values are enable, disable and auto.
7
Optional: The capabilities to configure for this network. You can specify "{ "ips": true }" to enable IP address support or "{ "infinibandGUID": true }" to enable IB Global Unique Identifier (GUID) support.
24.6.1.1. Configuration of IP address assignment for an additional network

The IP address management (IPAM) Container Network Interface (CNI) plugin provides IP addresses for other CNI plugins.

You can use the following IP address assignment types:

  • Static assignment.
  • Dynamic assignment through a DHCP server. The DHCP server you specify must be reachable from the additional network.
  • Dynamic assignment through the Whereabouts IPAM CNI plugin.
24.6.1.1.1. Static IP address assignment configuration

The following table describes the configuration for static IP address assignment:

Table 24.9. ipam static configuration object
FieldTypeDescription

type

string

The IPAM address type. The value static is required.

addresses

array

An array of objects specifying IP addresses to assign to the virtual interface. Both IPv4 and IPv6 IP addresses are supported.

routes

array

An array of objects specifying routes to configure inside the pod.

dns

array

Optional: An array of objects specifying the DNS configuration.

The addresses array requires objects with the following fields:

Table 24.10. ipam.addresses[] array
FieldTypeDescription

address

string

An IP address and network prefix that you specify. For example, if you specify 10.10.21.10/24, then the additional network is assigned an IP address of 10.10.21.10 and the netmask is 255.255.255.0.

gateway

string

The default gateway to route egress network traffic to.

Table 24.11. ipam.routes[] array
FieldTypeDescription

dst

string

The IP address range in CIDR format, such as 192.168.17.0/24 or 0.0.0.0/0 for the default route.

gw

string

The gateway where network traffic is routed.

Table 24.12. ipam.dns object
FieldTypeDescription

nameservers

array

An array of one or more IP addresses to send DNS queries to.

domain

string

The default domain to append to a hostname. For example, if the domain is set to example.com, a DNS lookup query for example-host is rewritten as example-host.example.com.

search

array

An array of domain names to append to an unqualified hostname, such as example-host, during a DNS lookup query.

Static IP address assignment configuration example

{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "191.168.1.7/24"
      }
    ]
  }
}
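
The following sketch combines the addresses, routes, and dns objects from the preceding tables into one static configuration. The addresses, the default route, and the DNS values are placeholders for illustration only:

{
  "ipam": {
    "type": "static",
    "addresses": [
      {
        "address": "10.10.21.10/24",
        "gateway": "10.10.21.254"
      }
    ],
    "routes": [
      {
        "dst": "0.0.0.0/0",
        "gw": "10.10.21.254"
      }
    ],
    "dns": {
      "nameservers": ["10.10.21.1"],
      "domain": "example.com",
      "search": ["example.com"]
    }
  }
}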

24.6.1.1.2. Dynamic IP address (DHCP) assignment configuration

The following JSON describes the configuration for dynamic IP address assignment with DHCP.

Renewal of DHCP leases

A pod obtains its original DHCP lease when it is created. The lease must be periodically renewed by a minimal DHCP server deployment running on the cluster.

To trigger the deployment of the DHCP server, you must create a shim network attachment by editing the Cluster Network Operator configuration, as in the following example:

Example shim network attachment definition

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  additionalNetworks:
  - name: dhcp-shim
    namespace: default
    type: Raw
    rawCNIConfig: |-
      {
        "name": "dhcp-shim",
        "cniVersion": "0.3.1",
        "type": "bridge",
        "ipam": {
          "type": "dhcp"
        }
      }
  # ...

Table 24.13. ipam DHCP configuration object
FieldTypeDescription

type

string

The IPAM address type. The value dhcp is required.

Dynamic IP address (DHCP) assignment configuration example

{
  "ipam": {
    "type": "dhcp"
  }
}

24.6.1.1.3. Dynamic IP address assignment configuration with Whereabouts

The Whereabouts CNI plugin allows the dynamic assignment of an IP address to an additional network without the use of a DHCP server.

The following table describes the configuration for dynamic IP address assignment with Whereabouts:

Table 24.14. ipam whereabouts configuration object
FieldTypeDescription

type

string

The IPAM address type. The value whereabouts is required.

range

string

An IP address and range in CIDR notation. IP addresses are assigned from within this range of addresses.

exclude

array

Optional: A list of zero or more IP addresses and ranges in CIDR notation. IP addresses within an excluded address range are not assigned.

Dynamic IP address assignment configuration example that uses Whereabouts

{
  "ipam": {
    "type": "whereabouts",
    "range": "192.0.2.192/27",
    "exclude": [
       "192.0.2.192/30",
       "192.0.2.196/32"
    ]
  }
}

24.6.2. Configuring SR-IOV additional network

You can configure an additional network that uses SR-IOV hardware by creating an SriovIBNetwork object. When you create an SriovIBNetwork object, the SR-IOV Network Operator automatically creates a NetworkAttachmentDefinition object.

Note

Do not modify or delete an SriovIBNetwork object if it is attached to any pods in a running state.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a SriovIBNetwork object, and then save the YAML in the <name>.yaml file, where <name> is a name for this additional network. The object specification might resemble the following example:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovIBNetwork
    metadata:
      name: attach1
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: net1
      networkNamespace: project2
      ipam: |-
        {
          "type": "host-local",
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "gateway": "10.56.217.1"
        }
  2. To create the object, enter the following command:

    $ oc create -f <name>.yaml

    where <name> specifies the name of the additional network.

  3. Optional: To confirm that the NetworkAttachmentDefinition object that is associated with the SriovIBNetwork object that you created in the previous step exists, enter the following command. Replace <namespace> with the networkNamespace you specified in the SriovIBNetwork object.

    $ oc get net-attach-def -n <namespace>

24.6.3. Next steps

24.6.4. Additional resources

24.7. Adding a pod to an SR-IOV additional network

You can add a pod to an existing Single Root I/O Virtualization (SR-IOV) network.

24.7.1. Runtime configuration for a network attachment

When attaching a pod to an additional network, you can specify a runtime configuration to make specific customizations for the pod. For example, you can request a specific MAC hardware address.

You specify the runtime configuration by setting an annotation in the pod specification. The annotation key is k8s.v1.cni.cncf.io/networks, and it accepts a JSON object that describes the runtime configuration.

24.7.1.1. Runtime configuration for an Ethernet-based SR-IOV attachment

The following JSON describes the runtime configuration options for an Ethernet-based SR-IOV network attachment.

[
  {
    "name": "<name>", 1
    "mac": "<mac_address>", 2
    "ips": ["<cidr_range>"] 3
  }
]
1
The name of the SR-IOV network attachment definition CR.
2
Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
3
Optional: IP addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.

Example runtime configuration

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {
          "name": "net1",
          "mac": "20:04:0f:f1:88:01",
          "ips": ["192.168.10.1/24", "2001::1/64"]
        }
      ]
spec:
  containers:
  - name: sample-container
    image: <image>
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]

24.7.1.2. Runtime configuration for an InfiniBand-based SR-IOV attachment

The following JSON describes the runtime configuration options for an InfiniBand-based SR-IOV network attachment.

[
  {
    "name": "<network_attachment>", 1
    "infiniband-guid": "<guid>", 2
    "ips": ["<cidr_range>"] 3
  }
]
1
The name of the SR-IOV network attachment definition CR.
2
The InfiniBand GUID for the SR-IOV device. To use this feature, you also must specify { "infinibandGUID": true } in the SriovIBNetwork object.
3
The IP addresses for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovIBNetwork object.

Example runtime configuration

apiVersion: v1
kind: Pod
metadata:
  name: sample-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: |-
      [
        {
          "name": "ib1",
          "infiniband-guid": "c2:11:22:33:44:55:66:77",
          "ips": ["192.168.10.1/24", "2001::1/64"]
        }
      ]
spec:
  containers:
  - name: sample-container
    image: <image>
    imagePullPolicy: IfNotPresent
    command: ["sleep", "infinity"]

24.7.2. Adding a pod to an additional network

You can add a pod to an additional network. The pod continues to send normal cluster-related network traffic over the default network.

Additional networks are attached to a pod only when the pod is created. You cannot attach additional networks to a pod that already exists.

The pod must be in the same namespace as the additional network.

Note

The SR-IOV Network Resource Injector adds the resource field to the first container in a pod automatically.

If you are using an Intel network interface controller (NIC) in Data Plane Development Kit (DPDK) mode, only the first container in your pod is configured to access the NIC. Your SR-IOV additional network is configured for DPDK mode if the deviceType is set to vfio-pci in the SriovNetworkNodePolicy object.

You can work around this issue by either ensuring that the container that needs access to the NIC is the first container defined in the Pod object or by disabling the Network Resource Injector. For more information, see BZ#1990953.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster.
  • Install the SR-IOV Operator.
  • Create either an SriovNetwork object or an SriovIBNetwork object to attach the pod to.

Procedure

  1. Add an annotation to the Pod object. Only one of the following annotation formats can be used:

    1. To attach an additional network without any customization, add an annotation with the following format. Replace <network> with the name of the additional network to associate with the pod:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: <network>[,<network>,...] 1
      1
      To specify more than one additional network, separate the network names with commas. Do not include whitespace around the commas. If you specify the same additional network multiple times, the pod has multiple network interfaces attached to that network. A complete pod manifest that uses this simple annotation form is sketched after this procedure.
    2. To attach an additional network with customizations, add an annotation with the following format:

      metadata:
        annotations:
          k8s.v1.cni.cncf.io/networks: |-
            [
              {
                "name": "<network>", 1
                "namespace": "<namespace>", 2
                "default-route": ["<default-route>"] 3
              }
            ]
      1
      Specify the name of the additional network defined by a NetworkAttachmentDefinition object.
      2
      Specify the namespace where the NetworkAttachmentDefinition object is defined.
      3
      Optional: Specify an override for the default route, such as 192.168.17.1.
  2. To create the pod, enter the following command. Replace <name> with the name of the pod.

    $ oc create -f <name>.yaml
  3. Optional: To confirm that the annotation exists in the Pod CR, enter the following command. Replace <name> with the name of the pod.

    $ oc get pod <name> -o yaml

    In the following example, the example-pod pod is attached to the net1 additional network:

    $ oc get pod example-pod -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: macvlan-bridge
        k8s.v1.cni.cncf.io/network-status: |- 1
          [{
              "name": "openshift-sdn",
              "interface": "eth0",
              "ips": [
                  "10.128.2.14"
              ],
              "default": true,
              "dns": {}
          },{
              "name": "macvlan-bridge",
              "interface": "net1",
              "ips": [
                  "20.2.2.100"
              ],
              "mac": "22:2f:60:a5:f8:00",
              "dns": {}
          }]
      name: example-pod
      namespace: default
    spec:
      ...
    status:
      ...
    1
    The k8s.v1.cni.cncf.io/network-status parameter is a JSON array of objects. Each object describes the status of an additional network attached to the pod. The annotation value is stored as a plain text value.
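
For reference, the following sketch shows a minimal pod manifest that uses the simple annotation form from step 1. The net1 network name and the <image> value are placeholders, and net1 is assumed to be an additional network defined in the same namespace as the pod:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: net1
spec:
  containers:
  - name: example-container
    image: <image>
    command: ["sleep", "infinity"]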

24.7.3. Creating a non-uniform memory access (NUMA) aligned SR-IOV pod

You can create a NUMA aligned SR-IOV pod by restricting the SR-IOV and CPU resources that are allocated to the pod to the same NUMA node with the restricted or single-numa-node Topology Manager policies.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have configured the CPU Manager policy to static. For more information on CPU Manager, see the "Additional resources" section.
  • You have configured the Topology Manager policy to single-numa-node.

    Note

    When single-numa-node is unable to satisfy the request, you can configure the Topology Manager policy to restricted. For more flexible SR-IOV network resource scheduling, see Excluding SR-IOV network topology during NUMA-aware scheduling in the Additional resources section.

Procedure

  1. Create the following SR-IOV pod spec, and then save the YAML in the <name>-sriov-pod.yaml file. Replace <name> with a name for this pod.

    The following example shows an SR-IOV pod spec:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample-pod
      annotations:
        k8s.v1.cni.cncf.io/networks: <name> 1
    spec:
      containers:
      - name: sample-container
        image: <image> 2
        command: ["sleep", "infinity"]
        resources:
          limits:
            memory: "1Gi" 3
            cpu: "2" 4
          requests:
            memory: "1Gi"
            cpu: "2"
    1
    Replace <name> with the name of the SR-IOV network attachment definition CR.
    2
    Replace <image> with the name of the sample-pod image.
    3
    To create the SR-IOV pod with guaranteed QoS, set memory limits equal to memory requests.
    4
    To create the SR-IOV pod with guaranteed QoS, set cpu limits equal to cpu requests.
  2. Create the sample SR-IOV pod by running the following command:

    $ oc create -f <filename> 1
    1
    Replace <filename> with the name of the file you created in the previous step.
  3. Confirm that the sample-pod is configured with guaranteed QoS.

    $ oc describe pod sample-pod
  4. Confirm that the sample-pod is allocated with exclusive CPUs.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus
  5. Confirm that the SR-IOV device and CPUs that are allocated for the sample-pod are on the same NUMA node.

    $ oc exec sample-pod -- cat /sys/fs/cgroup/cpuset/cpuset.cpus

24.7.4. A test pod template for clusters that use SR-IOV on OpenStack

The following testpmd pod demonstrates container creation with huge pages, reserved CPUs, and the SR-IOV port.

An example testpmd pod

apiVersion: v1
kind: Pod
metadata:
  name: testpmd-sriov
  namespace: mynamespace
  annotations:
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
# ...
spec:
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
        openshift.io/sriov1: 1
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
        openshift.io/sriov1: 1
    volumeMounts:
      - mountPath: /dev/hugepages
        name: hugepage
        readOnly: False
  runtimeClassName: performance-cnf-performanceprofile 1
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

1
This example assumes that the name of the performance profile is cnf-performanceprofile.

24.7.5. Additional resources

24.8. Configuring interface-level network sysctl settings for SR-IOV networks

As a cluster administrator, you can modify interface-level network sysctls by using the tuning Container Network Interface (CNI) meta plugin for a pod connected to an SR-IOV network device.

24.8.1. Labeling nodes with an SR-IOV enabled NIC

If you want to enable SR-IOV only on SR-IOV-capable nodes, there are a couple of ways to do this:

  1. Install the Node Feature Discovery (NFD) Operator. NFD detects the presence of SR-IOV enabled NICs and labels the nodes with node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable = true.
  2. Examine the SriovNetworkNodeState CR for each node. The interfaces stanza includes a list of all the SR-IOV devices that the SR-IOV Network Operator discovered on the worker node; a query sketch for inspecting this stanza follows this list. Label each node with feature.node.kubernetes.io/network-sriov.capable: "true" by using the following command:

    $ oc label node <node_name> feature.node.kubernetes.io/network-sriov.capable="true"
    Note

    You can label the nodes with whatever name you want.
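
  • As a sketch of how to inspect the interfaces stanza that is mentioned in step 2, the following command prints the names of the SR-IOV devices that the SR-IOV Network Operator discovered on a node. It assumes that the SriovNetworkNodeState object lists the discovered devices under status.interfaces. Replace <node_name> with the name of a worker node:

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.interfaces[*].name}'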

24.8.2. Setting one sysctl flag

You can set interface-level network sysctl settings for a pod connected to an SR-IOV network device.

In this example, net.ipv4.conf.IFNAME.accept_redirects is set to 1 on the created virtual interfaces.

The sysctl-tuning-test namespace is used in this example.

  • Use the following command to create the sysctl-tuning-test namespace:

    $ oc create namespace sysctl-tuning-test
24.8.2.1. Setting one sysctl flag on nodes with SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

Note

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain and reboot the nodes.

It can take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).

Procedure

  1. Create an SriovNetworkNodePolicy custom resource (CR). For example, save the following YAML as the file policyoneflag-sriov-node-network.yaml:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policyoneflag 1
      namespace: openshift-sriov-network-operator 2
    spec:
      resourceName: policyoneflag 3
      nodeSelector: 4
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: 10 5
      numVfs: 5 6
      nicSelector: 7
        pfNames: ["ens5"] 8
      deviceType: "netdevice" 9
      isRdma: false 10
    1
    The name for the custom resource object.
    2
    The namespace where the SR-IOV Network Operator is installed.
    3
    The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
    4
    The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
    5
    Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
    6
    The number of the virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 127.
    7
    The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique.
    8
    Optional: An array of one or more physical function (PF) names for the device.
    9
    Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
    10
    Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.
    Note

    The vfio-pci driver type is not supported.

  2. Create the SriovNetworkNodePolicy object:

    $ oc create -f policyoneflag-sriov-node-network.yaml

    After applying the configuration update, all the pods in the openshift-sriov-network-operator namespace change to the Running status.

  3. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'

    Example output

    Succeeded

24.8.2.2. Configuring sysctl on an SR-IOV network

You can set interface-specific sysctl settings on virtual interfaces created by SR-IOV by adding the tuning configuration to the optional metaPlugins parameter of the SriovNetwork resource.

The SR-IOV Network Operator manages additional network definitions. When you specify an additional SR-IOV network to create, the SR-IOV Network Operator creates the NetworkAttachmentDefinition custom resource (CR) automatically.

Note

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.

To change the interface-level network net.ipv4.conf.IFNAME.accept_redirects sysctl settings, create an additional SR-IOV network with the Container Network Interface (CNI) tuning plugin.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.

Procedure

  1. Create the SriovNetwork custom resource (CR) for the additional SR-IOV network attachment and insert the metaPlugins configuration, as in the following example CR. Save the YAML as the file sriov-network-interface-sysctl.yaml.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: onevalidflag 1
      namespace: openshift-sriov-network-operator 2
    spec:
      resourceName: policyoneflag 3
      networkNamespace: sysctl-tuning-test 4
      ipam: '{ "type": "static" }' 5
      capabilities: '{ "mac": true, "ips": true }' 6
      metaPlugins : | 7
        {
          "type": "tuning",
          "capabilities":{
            "mac":true
          },
          "sysctl":{
             "net.ipv4.conf.IFNAME.accept_redirects": "1"
          }
        }
    1
    A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
    2
    The namespace where the SR-IOV Network Operator is installed.
    3
    The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
    4
    The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
    5
    A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
    6
    Optional: Set capabilities for the additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
    7
    Optional: The metaPlugins parameter is used to add additional capabilities to the device. In this use case set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field.
  2. Create the SriovNetwork resource:

    $ oc create -f sriov-network-interface-sysctl.yaml

Verifying that the NetworkAttachmentDefinition CR is successfully created

  • Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

    $ oc get network-attachment-definitions -n <namespace> 1
    1
    Replace <namespace> with the value for networkNamespace that you specified in the SriovNetwork object. For example, sysctl-tuning-test.

    Example output

    NAME                                  AGE
    onevalidflag                          14m

    Note

    There might be a delay before the SR-IOV Network Operator creates the CR.

Verifying that the additional SR-IOV network attachment is successful

To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:

  1. Create a Pod CR. Save the following YAML as the file examplepod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tunepod
      namespace: sysctl-tuning-test
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {
              "name": "onevalidflag",  1
              "mac": "0a:56:0a:83:04:0c", 2
              "ips": ["10.100.100.200/24"] 3
           }
          ]
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
    1
    The name of the SR-IOV network attachment definition CR.
    2
    Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
    3
    Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.
  2. Create the Pod CR:

    $ oc apply -f examplepod.yaml
  3. Verify that the pod is created by running the following command:

    $ oc get pod -n sysctl-tuning-test

    Example output

    NAME      READY   STATUS    RESTARTS   AGE
    tunepod   1/1     Running   0          47s

  4. Log in to the pod by running the following command:

    $ oc rsh -n sysctl-tuning-test tunepod
  5. Verify the values of the configured sysctl flag. Find the value net.ipv4.conf.IFNAME.accept_redirects by running the following command:

    $ sysctl net.ipv4.conf.net1.accept_redirects

    Example output

    net.ipv4.conf.net1.accept_redirects = 1
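
    Optionally, while you are still logged in to the pod, you can check that the MAC address and IP address requested in the pod annotation were applied to the net1 interface. The following sketch assumes that the ip utility from the iproute2 package is available in the container image:

    $ ip addr show net1

    The output should include the 0a:56:0a:83:04:0c MAC address and the 10.100.100.200/24 address that were set in the pod annotation.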

24.8.3. Configuring sysctl settings for pods associated with a bonded SR-IOV interface

You can set interface-level network sysctl settings for a pod connected to a bonded SR-IOV network device.

In this example, all of the interface-level network sysctl settings that can be configured are set on the bonded interface.

The sysctl-tuning-test namespace is used in this example.

  • Use the following command to create the sysctl-tuning-test namespace:

    $ oc create namespace sysctl-tuning-test
24.8.3.1. Setting all sysctl flags on nodes with bonded SR-IOV network devices

The SR-IOV Network Operator adds the SriovNetworkNodePolicy.sriovnetwork.openshift.io custom resource definition (CRD) to OpenShift Container Platform. You can configure an SR-IOV network device by creating a SriovNetworkNodePolicy custom resource (CR).

Note

When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes.

It might take several minutes for a configuration change to apply.

Follow this procedure to create a SriovNetworkNodePolicy custom resource (CR).

Procedure

  1. Create an SriovNetworkNodePolicy custom resource (CR). Save the following YAML as the file policyallflags-sriov-node-network.yaml. Replace policyallflags with the name for the configuration.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policyallflags 1
      namespace: openshift-sriov-network-operator 2
    spec:
      resourceName: policyallflags 3
      nodeSelector: 4
        node.alpha.kubernetes-incubator.io/nfd-network-sriov.capable: "true"
      priority: 10 5
      numVfs: 5 6
      nicSelector: 7
        pfNames: ["ens1f0"]  8
      deviceType: "netdevice" 9
      isRdma: false 10
    1
    The name for the custom resource object.
    2
    The namespace where the SR-IOV Network Operator is installed.
    3
    The resource name of the SR-IOV network device plugin. You can create multiple SR-IOV network node policies for a resource name.
    4
    The node selector specifies the nodes to configure. Only SR-IOV network devices on the selected nodes are configured. The SR-IOV Container Network Interface (CNI) plugin and device plugin are deployed on selected nodes only.
    5
    Optional: The priority is an integer value between 0 and 99. A smaller value receives higher priority. For example, a priority of 10 is a higher priority than 99. The default value is 99.
    6
    The number of virtual functions (VFs) to create for the SR-IOV physical network device. For an Intel network interface controller (NIC), the number of VFs cannot be larger than the total VFs supported by the device. For a Mellanox NIC, the number of VFs cannot be larger than 127.
    7
    The NIC selector identifies the device for the Operator to configure. You do not have to specify values for all the parameters. It is recommended to identify the network device with enough precision to avoid selecting a device unintentionally. If you specify rootDevices, you must also specify a value for vendor, deviceID, or pfNames. If you specify both pfNames and rootDevices at the same time, ensure that they refer to the same device. If you specify a value for netFilter, then you do not need to specify any other parameter because a network ID is unique.
    8
    Optional: An array of one or more physical function (PF) names for the device.
    9
    Optional: The driver type for the virtual functions. The only allowed value is netdevice. For a Mellanox NIC to work in DPDK mode on bare metal nodes, set isRdma to true.
    10
    Optional: Configures whether to enable remote direct memory access (RDMA) mode. The default value is false. If the isRdma parameter is set to true, you can continue to use the RDMA-enabled VF as a normal network device. A device can be used in either mode. Set isRdma to true and additionally set needVhostNet to true to configure a Mellanox NIC for use with Fast Datapath DPDK applications.
    Note

    The vfio-pci driver type is not supported.

  2. Create the SriovNetworkNodePolicy object:

    $ oc create -f policyallflags-sriov-node-network.yaml

    After applying the configuration update, all the pods in the openshift-sriov-network-operator namespace change to the Running status.

  3. To verify that the SR-IOV network device is configured, enter the following command. Replace <node_name> with the name of a node with the SR-IOV network device that you just configured.

    $ oc get sriovnetworknodestates -n openshift-sriov-network-operator <node_name> -o jsonpath='{.status.syncStatus}'

    Example output

    Succeeded

24.8.3.2. Configuring sysctl on a bonded SR-IOV network

You can set interface-specific sysctl settings on a bonded interface created from two SR-IOV interfaces. Do this by adding the tuning configuration to the optional plugins parameter of the bond network attachment definition.

Note

Do not edit NetworkAttachmentDefinition custom resources that the SR-IOV Network Operator manages. Doing so might disrupt network traffic on your additional network.

To change specific interface-level network sysctl settings, create the SriovNetwork custom resource (CR) with the Container Network Interface (CNI) tuning plugin by using the following procedure.

Prerequisites

  • Install the OpenShift Container Platform CLI (oc).
  • Log in to the OpenShift Container Platform cluster as a user with cluster-admin privileges.

Procedure

  1. Create the SriovNetwork custom resource (CR) for the bonded interface as in the following example CR. Save the YAML as the file sriov-network-attachment.yaml.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: allvalidflags 1
      namespace: openshift-sriov-network-operator 2
    spec:
      resourceName: policyallflags 3
      networkNamespace: sysctl-tuning-test 4
      capabilities: '{ "mac": true, "ips": true }' 5
    1
    A name for the object. The SR-IOV Network Operator creates a NetworkAttachmentDefinition object with the same name.
    2
    The namespace where the SR-IOV Network Operator is installed.
    3
    The value for the spec.resourceName parameter from the SriovNetworkNodePolicy object that defines the SR-IOV hardware for this additional network.
    4
    The target namespace for the SriovNetwork object. Only pods in the target namespace can attach to the additional network.
    5
    Optional: The capabilities to configure for this additional network. You can specify "{ "ips": true }" to enable IP address support or "{ "mac": true }" to enable MAC address support.
  2. Create the SriovNetwork resource:

    $ oc create -f sriov-network-attachment.yaml
  3. Create a bond network attachment definition as in the following example CR. Save the YAML as the file sriov-bond-network-interface.yaml.

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: bond-sysctl-network
      namespace: sysctl-tuning-test
    spec:
      config: '{
      "cniVersion":"0.4.0",
      "name":"bound-net",
      "plugins":[
        {
          "type":"bond", 1
          "mode": "active-backup", 2
          "failOverMac": 1, 3
          "linksInContainer": true, 4
          "miimon": "100",
          "links": [ 5
            {"name": "net1"},
            {"name": "net2"}
          ],
          "ipam":{ 6
            "type":"static"
          }
        },
        {
          "type":"tuning", 7
          "capabilities":{
            "mac":true
          },
          "sysctl":{
            "net.ipv4.conf.IFNAME.accept_redirects": "0",
            "net.ipv4.conf.IFNAME.accept_source_route": "0",
            "net.ipv4.conf.IFNAME.disable_policy": "1",
            "net.ipv4.conf.IFNAME.secure_redirects": "0",
            "net.ipv4.conf.IFNAME.send_redirects": "0",
            "net.ipv6.conf.IFNAME.accept_redirects": "0",
            "net.ipv6.conf.IFNAME.accept_source_route": "1",
            "net.ipv6.neigh.IFNAME.base_reachable_time_ms": "20000",
            "net.ipv6.neigh.IFNAME.retrans_time_ms": "2000"
          }
        }
      ]
    }'
    1
    The type is bond.
    2
    The mode attribute specifies the bonding mode. The bonding modes supported are:
    • balance-rr - 0
    • active-backup - 1
    • balance-xor - 2

      For balance-rr or balance-xor modes, you must set the trust mode to on for the SR-IOV virtual function.

    3
    The failOverMac attribute is mandatory for active-backup mode.
    4
    The linksInContainer=true flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, the Bond CNI looks for these interfaces on the host, which does not work for integration with SR-IOV and Multus.
    5
    The links section defines which interfaces are used to create the bond. By default, Multus names the attached interfaces net1, net2, and so on.
    6
    A configuration object for the IPAM CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition. In this pod example, IP addresses are configured manually, so ipam is set to static.
    7
    Add additional capabilities to the device. For example, set the type field to tuning. Specify the interface-level network sysctl you want to set in the sysctl field. This example sets all interface-level network sysctl settings that can be set.
  4. Create the bond network attachment resource:

    $ oc create -f sriov-bond-network-interface.yaml

Verifying that the NetworkAttachmentDefinition CR is successfully created

  • Confirm that the SR-IOV Network Operator created the NetworkAttachmentDefinition CR by running the following command:

    $ oc get network-attachment-definitions -n <namespace> 1
    1
    Replace <namespace> with the networkNamespace that you specified when configuring the network attachment, for example, sysctl-tuning-test.

    Example output

    NAME                          AGE
    bond-sysctl-network           22m
    allvalidflags                 47m

    Note

    There might be a delay before the SR-IOV Network Operator creates the CR.

Verifying that the additional SR-IOV network resource is successful

To verify that the tuning CNI is correctly configured and the additional SR-IOV network attachment is attached, do the following:

  1. Create a Pod CR. For example, save the following YAML as the file examplepod.yaml:

    apiVersion: v1
    kind: Pod
    metadata:
      name: tunepod
      namespace: sysctl-tuning-test
      annotations:
        k8s.v1.cni.cncf.io/networks: |-
          [
            {"name": "allvalidflags"}, 1
            {"name": "allvalidflags"},
            {
              "name": "bond-sysctl-network",
              "interface": "bond0",
              "mac": "0a:56:0a:83:04:0c", 2
              "ips": ["10.100.100.200/24"] 3
           }
          ]
    spec:
      containers:
      - name: podexample
        image: centos
        command: ["/bin/bash", "-c", "sleep INF"]
        securityContext:
          runAsUser: 2000
          runAsGroup: 3000
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
    1
    The name of the SR-IOV network attachment definition CR.
    2
    Optional: The MAC address for the SR-IOV device that is allocated from the resource type defined in the SR-IOV network attachment definition CR. To use this feature, you also must specify { "mac": true } in the SriovNetwork object.
    3
    Optional: IP addresses for the SR-IOV device that are allocated from the resource type defined in the SR-IOV network attachment definition CR. Both IPv4 and IPv6 addresses are supported. To use this feature, you also must specify { "ips": true } in the SriovNetwork object.
  2. Apply the YAML:

    $ oc apply -f examplepod.yaml
  3. Verify that the pod is created by running the following command:

    $ oc get pod -n sysctl-tuning-test

    Example output

    NAME      READY   STATUS    RESTARTS   AGE
    tunepod   1/1     Running   0          47s

  4. Log in to the pod by running the following command:

    $ oc rsh -n sysctl-tuning-test tunepod
  5. Verify the values of the configured sysctl flags. For example, find the value net.ipv6.neigh.IFNAME.base_reachable_time_ms by running the following command:

    $ sysctl net.ipv6.neigh.bond0.base_reachable_time_ms

    Example output

    net.ipv6.neigh.bond0.base_reachable_time_ms = 20000

24.9. Using high performance multicast

You can use multicast on your Single Root I/O Virtualization (SR-IOV) hardware network.

24.9.1. High performance multicast

The OpenShift SDN network plugin supports multicast between pods on the default network. This is best used for low-bandwidth coordination or service discovery, and not high-bandwidth applications. For applications such as streaming media, like Internet Protocol television (IPTV) and multipoint videoconferencing, you can utilize Single Root I/O Virtualization (SR-IOV) hardware to provide near-native performance.

When using additional SR-IOV interfaces for multicast:

  • Multicast packets must be sent or received by a pod through the additional SR-IOV interface.
  • The physical network that connects the SR-IOV interfaces determines the multicast routing and topology, which is not controlled by OpenShift Container Platform.

24.9.2. Configuring an SR-IOV interface for multicast

The following procedure creates an example SR-IOV interface for multicast.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  1. Create a SriovNetworkNodePolicy object:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: policy-example
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: example
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      numVfs: 4
      nicSelector:
        vendor: "8086"
        pfNames: ['ens803f0']
        rootDevices: ['0000:86:00.0']
  2. Create a SriovNetwork object:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: net-example
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: default
      ipam: | 1
        {
          "type": "host-local", 2
          "subnet": "10.56.217.0/24",
          "rangeStart": "10.56.217.171",
          "rangeEnd": "10.56.217.181",
          "routes": [
            {"dst": "224.0.0.0/5"},
            {"dst": "232.0.0.0/5"}
          ],
          "gateway": "10.56.217.1"
        }
      resourceName: example
    1 2
    If you choose to configure DHCP as IPAM, ensure that you provision the following default routes through your DHCP server: 224.0.0.0/5 and 232.0.0.0/5. This is to override the static multicast route set by the default network provider.
  3. Create a pod with multicast application:

    apiVersion: v1
    kind: Pod
    metadata:
      name: testpmd
      namespace: default
      annotations:
        k8s.v1.cni.cncf.io/networks: net-example
    spec:
      containers:
      - name: example
        image: rhel7:latest
        securityContext:
          capabilities:
            add: ["NET_ADMIN"] 1
        command: [ "sleep", "infinity"]
    1
    The NET_ADMIN capability is required only if your application needs to assign the multicast IP address to the SR-IOV interface. Otherwise, it can be omitted.
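
  4. Optional: To confirm that the multicast routes from the ipam configuration are installed on the additional interface, inspect the routing table inside the pod. The following sketch assumes that the ip utility is available in the container image and that Multus names the additional interface net1:

    $ oc exec -n default testpmd -- ip route

    The output should include routes for 224.0.0.0/5 and 232.0.0.0/5 through the net1 interface.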

24.10. Using DPDK and RDMA

The containerized Data Plane Development Kit (DPDK) application is supported on OpenShift Container Platform. You can use Single Root I/O Virtualization (SR-IOV) network hardware with the Data Plane Development Kit (DPDK) and with remote direct memory access (RDMA).

For information on supported devices, refer to Supported devices.

24.10.1. Using a virtual function in DPDK mode with an Intel NIC

Prerequisites

  • Install the OpenShift CLI (oc).
  • Install the SR-IOV Network Operator.
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create the following SriovNetworkNodePolicy object, and then save the YAML in the intel-dpdk-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: intel-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: intelnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "8086"
        deviceID: "158b"
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: vfio-pci 1
    1
    Specify the driver type for the virtual functions to vfio-pci.
    Note

    See the Configuring SR-IOV network devices section for a detailed explanation on each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

    After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace change to a Running status.

  2. Create the SriovNetworkNodePolicy object by running the following command:

    $ oc create -f intel-dpdk-node-policy.yaml
  3. Create the following SriovNetwork object, and then save the YAML in the intel-dpdk-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: intel-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |-
    # ... 1
      vlan: <vlan>
      resourceName: intelnics
    1
    Specify a configuration object for the ipam CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
    Note

    See the "Configuring SR-IOV additional network" section for a detailed explanation on each option in SriovNetwork.

    An optional library, app-netutil, provides several API methods for gathering network information about a container’s parent pod.

  4. Create the SriovNetwork object by running the following command:

    $ oc create -f intel-dpdk-network.yaml
  5. Create the following Pod spec, and then save the YAML in the intel-dpdk-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> 1
      annotations:
        k8s.v1.cni.cncf.io/networks: intel-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> 2
        securityContext:
          runAsUser: 0
          capabilities:
            add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] 3
        volumeMounts:
        - mountPath: /mnt/huge 4
          name: hugepage
        resources:
          limits:
            openshift.io/intelnics: "1" 5
            memory: "1Gi"
            cpu: "4" 6
            hugepages-1Gi: "4Gi" 7
          requests:
            openshift.io/intelnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1
    Specify the same target_namespace where the SriovNetwork object intel-dpdk-network is created. If you would like to create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork object.
    2
    Specify the DPDK image which includes your application and the DPDK library used by the application.
    3
    Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
    4
    Mount a hugepage volume to the DPDK pod under /mnt/huge. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
    5
    Optional: Specify the number of DPDK devices allocated to DPDK pod. This resource request and limit, if not explicitly specified, will be automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by the SR-IOV Operator. It is enabled by default and can be disabled by setting enableInjector option to false in the default SriovOperatorConfig CR.
    6
    Specify the number of CPUs. The DPDK pod usually requires exclusive CPUs to be allocated from the kubelet. This is achieved by setting CPU Manager policy to static and creating a pod with Guaranteed QoS.
    7
    Specify hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to Nodes. For example, adding the kernel arguments default_hugepagesz=1GB, hugepagesz=1G, and hugepages=16 results in 16 1Gi hugepages being allocated during system boot.
  6. Create the DPDK pod by running the following command:

    $ oc create -f intel-dpdk-pod.yaml
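
  7. Optional: To check which virtual function PCI addresses were allocated to the pod, list the PCIDEVICE_* environment variables inside the container. This sketch assumes the default behavior of the SR-IOV network device plugin, which exposes the allocated PCI addresses to the container as environment variables derived from the resource name, such as PCIDEVICE_OPENSHIFT_IO_INTELNICS:

    $ oc exec -n <target_namespace> dpdk-app -- env | grep PCIDEVICE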

24.10.2. Using a virtual function in DPDK mode with a Mellanox NIC

You can create a network node policy and create a Data Plane Development Kit (DPDK) pod using a virtual function in DPDK mode with a Mellanox NIC.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have installed the Single Root I/O Virtualization (SR-IOV) Network Operator.
  • You have logged in as a user with cluster-admin privileges.

Procedure

  1. Save the following SriovNetworkNodePolicy YAML configuration to an mlx-dpdk-node-policy.yaml file:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-dpdk-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" 1
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice 2
      isRdma: true 3
    1
    Specify the device hex code of the SR-IOV network device.
    2
    Specify the driver type for the virtual functions to netdevice. A Mellanox SR-IOV Virtual Function (VF) can work in DPDK mode without using the vfio-pci device type. The VF device appears as a kernel network interface inside a container.
    3
    Enable Remote Direct Memory Access (RDMA) mode. This is required for Mellanox cards to work in DPDK mode.
    Note

    See Configuring an SR-IOV network device for a detailed explanation of each option in the SriovNetworkNodePolicy object.

    When applying the configuration specified in an SriovNetworkNodePolicy object, the SR-IOV Operator might drain the nodes, and in some cases, reboot nodes. It might take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

    After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace will change to a Running status.

  2. Create the SriovNetworkNodePolicy object by running the following command:

    $ oc create -f mlx-dpdk-node-policy.yaml
  3. Save the following SriovNetwork YAML configuration to an mlx-dpdk-network.yaml file:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-dpdk-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- 1
    ...
      vlan: <vlan>
      resourceName: mlxnics
    1
    Specify a configuration object for the IP Address Management (IPAM) Container Network Interface (CNI) plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
    Note

    See Configuring an SR-IOV network device for a detailed explanation on each option in the SriovNetwork object.

    The optional app-netutil library provides several API methods for gathering network information about the parent pod of a container.

  4. Create the SriovNetwork object by running the following command:

    $ oc create -f mlx-dpdk-network.yaml
  5. Save the following Pod YAML configuration to an mlx-dpdk-pod.yaml file:

    apiVersion: v1
    kind: Pod
    metadata:
      name: dpdk-app
      namespace: <target_namespace> 1
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-dpdk-network
    spec:
      containers:
      - name: testpmd
        image: <DPDK_image> 2
        securityContext:
          runAsUser: 0
          capabilities:
            add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] 3
        volumeMounts:
        - mountPath: /mnt/huge 4
          name: hugepage
        resources:
          limits:
            openshift.io/mlxnics: "1" 5
            memory: "1Gi"
            cpu: "4" 6
            hugepages-1Gi: "4Gi" 7
          requests:
            openshift.io/mlxnics: "1"
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1
    Specify the same target_namespace where SriovNetwork object mlx-dpdk-network is created. To create the pod in a different namespace, change target_namespace in both the Pod spec and SriovNetwork object.
    2
    Specify the DPDK image which includes your application and the DPDK library used by the application.
    3
    Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
    4
    Mount the hugepage volume to the DPDK pod under /mnt/huge. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
    5
    Optional: Specify the number of DPDK devices allocated for the DPDK pod. If not explicitly specified, this resource request and limit is automatically added by the SR-IOV network resource injector. The SR-IOV network resource injector is an admission controller component managed by SR-IOV Operator. It is enabled by default and can be disabled by setting the enableInjector option to false in the default SriovOperatorConfig CR.
    6
    Specify the number of CPUs. The DPDK pod usually requires that exclusive CPUs be allocated from the kubelet. To do this, set the CPU Manager policy to static and create a pod with Guaranteed Quality of Service (QoS).
    7
    Specify hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the DPDK pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to Nodes.
  6. Create the DPDK pod by running the following command:

    $ oc create -f mlx-dpdk-pod.yaml

24.10.3. Overview of achieving a specific DPDK line rate

To achieve a specific Data Plane Development Kit (DPDK) line rate, deploy a Node Tuning Operator and configure Single Root I/O Virtualization (SR-IOV). You must also tune the DPDK settings for the following resources:

  • Isolated CPUs
  • Hugepages
  • The topology scheduler
Note

In previous versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift Container Platform applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.

DPDK test environment

The following diagram shows the components of a traffic-testing environment:

  • Traffic generator: An application that can generate high-volume packet traffic.
  • SR-IOV-supporting NIC: A network interface card compatible with SR-IOV. The card runs a number of virtual functions on a physical interface.
  • Physical Function (PF): A PCI Express (PCIe) function of a network adapter that supports the SR-IOV interface.
  • Virtual Function (VF): A lightweight PCIe function on a network adapter that supports SR-IOV. The VF is associated with the PCIe PF on the network adapter. The VF represents a virtualized instance of the network adapter.
  • Switch: A network switch. Nodes can also be connected back-to-back.
  • testpmd: An example application included with DPDK. The testpmd application can be used to test the DPDK in a packet-forwarding mode. The testpmd application is also an example of how to build a fully-fledged application using the DPDK Software Development Kit (SDK).
  • worker 0 and worker 1: OpenShift Container Platform nodes.

24.10.4. Using SR-IOV and the Node Tuning Operator to achieve a DPDK line rate

You can use the Node Tuning Operator to configure isolated CPUs, hugepages, and a topology scheduler. You can then use the Node Tuning Operator with Single Root I/O Virtualization (SR-IOV) to achieve a specific Data Plane Development Kit (DPDK) line rate.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You have installed the SR-IOV Network Operator.
  • You have logged in as a user with cluster-admin privileges.
  • You have deployed a standalone Node Tuning Operator.

    Note

    In previous versions of OpenShift Container Platform, the Performance Addon Operator was used to implement automatic tuning to achieve low latency performance for OpenShift applications. In OpenShift Container Platform 4.11 and later, this functionality is part of the Node Tuning Operator.

Procedure

  1. Create a PerformanceProfile object based on the following example:

    apiVersion: performance.openshift.io/v2
    kind: PerformanceProfile
    metadata:
      name: performance
    spec:
      globallyDisableIrqLoadBalancing: true
      cpu:
        isolated: 21-51,73-103 1
        reserved: 0-20,52-72 2
      hugepages:
        defaultHugepagesSize: 1G 3
        pages:
          - count: 32
            size: 1G
      net:
        userLevelNetworking: true
      numa:
        topologyPolicy: "single-numa-node"
      nodeSelector:
        node-role.kubernetes.io/worker-cnf: ""
    1
    If hyperthreading is enabled on the system, allocate the relevant sibling threads to the isolated and reserved CPU groups. If the system contains multiple non-uniform memory access nodes (NUMAs), allocate CPUs from both NUMAs to both groups. You can also use the Performance Profile Creator for this task. For more information, see Creating a performance profile.
    2
    You can also specify a list of devices that will have their queues set to the reserved CPU count. For more information, see Reducing NIC queues using the Node Tuning Operator.
    3
    Allocate the number and size of hugepages needed. You can specify the NUMA configuration for the hugepages. By default, the system allocates hugepages evenly across all NUMA nodes on the system. If needed, you can request the use of a realtime kernel for the nodes. See Provisioning a worker with real-time capabilities for more information.
  2. Save the YAML file as mlx-dpdk-perfprofile-policy.yaml.
  3. Apply the performance profile using the following command:

    $ oc create -f mlx-dpdk-perfprofile-policy.yaml
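
The callouts above mention two optional PerformanceProfile settings: per-NUMA hugepage placement and the real-time kernel. The following fragment is a minimal sketch of how those optional fields could be added to the spec of the profile above; the counts and NUMA node numbers are placeholder assumptions:

spec:
  realTimeKernel:
    enabled: true # request the real-time kernel for the selected nodes
  hugepages:
    defaultHugepagesSize: 1G
    pages:
      - count: 16
        size: 1G
        node: 0 # pin these hugepages to NUMA node 0
      - count: 16
        size: 1G
        node: 1 # pin these hugepages to NUMA node 1
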
24.10.4.1. Example SR-IOV Network Operator for virtual functions

You can use the Single Root I/O Virtualization (SR-IOV) Network Operator to allocate and configure Virtual Functions (VFs) from SR-IOV-supporting Physical Function NICs on the nodes.

For more information on deploying the Operator, see Installing the SR-IOV Network Operator. For more information on configuring an SR-IOV network device, see Configuring an SR-IOV network device.

There are some differences between running Data Plane Development Kit (DPDK) workloads on Intel VFs and Mellanox VFs. This section provides object configuration examples for both VF types. The following is an example of an sriovNetworkNodePolicy object used to run DPDK applications on Intel NICs:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci 1
  needVhostNet: true 2
  nicSelector:
    pfNames: ["ens3f0"]
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 10
  priority: 99
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: vfio-pci
  needVhostNet: true
  nicSelector:
    pfNames: ["ens3f1"]
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 10
  priority: 99
  resourceName: dpdk_nic_2
1
For Intel NICs, deviceType must be vfio-pci.
2
If kernel communication with DPDK workloads is required, add needVhostNet: true. This mounts the /dev/net/tun and /dev/vhost-net devices into the container so the application can create a tap device and connect the tap device to the DPDK workload.

The following is an example of an sriovNetworkNodePolicy object for Mellanox NICs:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-1
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice 1
  isRdma: true 2
  nicSelector:
    rootDevices:
      - "0000:5e:00.1"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  priority: 99
  resourceName: dpdk_nic_1
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: dpdk-nic-2
  namespace: openshift-sriov-network-operator
spec:
  deviceType: netdevice
  isRdma: true
  nicSelector:
    rootDevices:
      - "0000:5e:00.0"
  nodeSelector:
    node-role.kubernetes.io/worker-cnf: ""
  numVfs: 5
  priority: 99
  resourceName: dpdk_nic_2
1
For Mellanox devices, the deviceType must be netdevice.
2
For Mellanox devices, isRdma must be true. Mellanox cards are connected to DPDK applications using Flow Bifurcation. This mechanism splits traffic between Linux user space and kernel space, and can enhance line rate processing capability.
24.10.4.2. Example SR-IOV network operator

The following is an example definition of an sriovNetwork object. In this case, Intel and Mellanox configurations are identical:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-1
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.1.0/24"}]],"dataDir":
   "/run/my-orchestrator/container-ipam-state-1"}' 1
  networkNamespace: dpdk-test 2
  spoofChk: "off"
  trust: "on"
  resourceName: dpdk_nic_1 3
---
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: dpdk-network-2
  namespace: openshift-sriov-network-operator
spec:
  ipam: '{"type": "host-local","ranges": [[{"subnet": "10.0.2.0/24"}]],"dataDir":
   "/run/my-orchestrator/container-ipam-state-1"}'
  networkNamespace: dpdk-test
  spoofChk: "off"
  trust: "on"
  resourceName: dpdk_nic_2
1
You can use a different IP Address Management (IPAM) implementation, such as Whereabouts. For more information, see Dynamic IP address assignment configuration with Whereabouts.
2
Specify the networkNamespace where the network attachment definition is created. You must create the sriovNetwork CR under the openshift-sriov-network-operator namespace.
3
The resourceName value must match that of the resourceName created under the sriovNetworkNodePolicy.
24.10.4.3. Example DPDK base workload

The following is an example of a Data Plane Development Kit (DPDK) container:

apiVersion: v1
kind: Namespace
metadata:
  name: dpdk-test
---
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: '[ 1
     {
      "name": "dpdk-network-1",
      "namespace": "dpdk-test"
     },
     {
      "name": "dpdk-network-2",
      "namespace": "dpdk-test"
     }
   ]'
    irq-load-balancing.crio.io: "disable" 2
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
  labels:
    app: dpdk
  name: testpmd
  namespace: dpdk-test
spec:
  runtimeClassName: performance-performance 3
  containers:
    - command:
        - /bin/bash
        - -c
        - sleep INF
      image: registry.redhat.io/openshift4/dpdk-base-rhel8
      imagePullPolicy: Always
      name: dpdk
      resources: 4
        limits:
          cpu: "16"
          hugepages-1Gi: 8Gi
          memory: 2Gi
        requests:
          cpu: "16"
          hugepages-1Gi: 8Gi
          memory: 2Gi
      securityContext:
        capabilities:
          add:
            - IPC_LOCK
            - SYS_RESOURCE
            - NET_RAW
            - NET_ADMIN
        runAsUser: 0
      volumeMounts:
        - mountPath: /mnt/huge
          name: hugepages
  terminationGracePeriodSeconds: 5
  volumes:
    - emptyDir:
        medium: HugePages
      name: hugepages
1
Request the SR-IOV networks you need. Resources for the devices will be injected automatically.
2
Disable CPU and IRQ load balancing for the pod. See Disabling interrupt processing for individual pods for more information.
3
Set the runtimeClassName to performance-performance. Do not set hostNetwork: true or run the pod as privileged.
4
Request an equal number of resources for requests and limits to start the pod with Guaranteed Quality of Service (QoS).
Note

Do not start the pod with SLEEP and then exec into the pod to start the testpmd or the DPDK workload. This can add additional interrupts as the exec process is not pinned to any CPU.

24.10.4.4. Example testpmd script

The following is an example script for running testpmd:

#!/bin/bash
set -ex
export CPU=$(cat /sys/fs/cgroup/cpuset/cpuset.cpus)
echo ${CPU}

dpdk-testpmd -l ${CPU} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1} -a ${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_2} -n 4 -- -i --nb-cores=15 --rxd=4096 --txd=4096 --rxq=7 --txq=7 --forward-mode=mac --eth-peer=0,50:00:00:00:00:01 --eth-peer=1,50:00:00:00:00:02

This example uses two different SriovNetwork CRs. Each environment variable contains the Virtual Function (VF) PCI address that was allocated for the pod. If you attach the same network more than once in the pod definition, the single environment variable contains multiple PCI addresses and you must split it, as shown in the sketch that follows. It is important to configure the correct MAC addresses of the traffic generator. This example uses custom MAC addresses.
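
If the same SR-IOV network is attached to the pod more than once, the injected variable holds a comma-separated list of VF PCI addresses. The following lines are a minimal sketch, assuming a single hypothetical variable PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1 that contains two addresses, of how the script above could split them before passing them to dpdk-testpmd:

# Split a comma-separated list of VF PCI addresses into two variables (sketch).
IFS=',' read -r PCI_1 PCI_2 <<< "${PCIDEVICE_OPENSHIFT_IO_DPDK_NIC_1}"
dpdk-testpmd -l ${CPU} -a ${PCI_1} -a ${PCI_2} -n 4 -- -i --forward-mode=mac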

24.10.5. Using a virtual function in RDMA mode with a Mellanox NIC

Important

RDMA over Converged Ethernet (RoCE) is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

RDMA over Converged Ethernet (RoCE) is the only supported mode when using RDMA on OpenShift Container Platform.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Install the SR-IOV Network Operator.
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create the following SriovNetworkNodePolicy object, and then save the YAML in the mlx-rdma-node-policy.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: mlx-rdma-node-policy
      namespace: openshift-sriov-network-operator
    spec:
      resourceName: mlxnics
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      priority: <priority>
      numVfs: <num>
      nicSelector:
        vendor: "15b3"
        deviceID: "1015" 1
        pfNames: ["<pf_name>", ...]
        rootDevices: ["<pci_bus_id>", "..."]
      deviceType: netdevice 2
      isRdma: true 3
    1
    Specify the device hex code of the SR-IOV network device.
    2
    Specify the driver type for the virtual functions to netdevice.
    3
    Enable RDMA mode.
    Note

    See the Configuring SR-IOV network devices section for a detailed explanation on each option in SriovNetworkNodePolicy.

    When applying the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator may drain the nodes, and in some cases, reboot nodes. It may take several minutes for a configuration change to apply. Ensure that there are enough available nodes in your cluster to handle the evicted workload beforehand.

    After the configuration update is applied, all the pods in the openshift-sriov-network-operator namespace will change to a Running status.

  2. Create the SriovNetworkNodePolicy object by running the following command:

    $ oc create -f mlx-rdma-node-policy.yaml
  3. Create the following SriovNetwork object, and then save the YAML in the mlx-rdma-network.yaml file.

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetwork
    metadata:
      name: mlx-rdma-network
      namespace: openshift-sriov-network-operator
    spec:
      networkNamespace: <target_namespace>
      ipam: |- 1
    # ...
      vlan: <vlan>
      resourceName: mlxnics
    1
    Specify a configuration object for the ipam CNI plugin as a YAML block scalar. The plugin manages IP address assignment for the attachment definition.
    Note

    See the "Configuring SR-IOV additional network" section for a detailed explanation on each option in SriovNetwork.

    An optional library, app-netutil, provides several API methods for gathering network information about a container’s parent pod.

  4. Create the SriovNetwork object by running the following command:

    $ oc create -f mlx-rdma-network.yaml
  5. Create the following Pod spec, and then save the YAML in the mlx-rdma-pod.yaml file.

    apiVersion: v1
    kind: Pod
    metadata:
      name: rdma-app
      namespace: <target_namespace> 1
      annotations:
        k8s.v1.cni.cncf.io/networks: mlx-rdma-network
    spec:
      containers:
      - name: testpmd
        image: <RDMA_image> 2
        securityContext:
          runAsUser: 0
          capabilities:
            add: ["IPC_LOCK","SYS_RESOURCE","NET_RAW"] 3
        volumeMounts:
        - mountPath: /mnt/huge 4
          name: hugepage
        resources:
          limits:
            memory: "1Gi"
            cpu: "4" 5
            hugepages-1Gi: "4Gi" 6
          requests:
            memory: "1Gi"
            cpu: "4"
            hugepages-1Gi: "4Gi"
        command: ["sleep", "infinity"]
      volumes:
      - name: hugepage
        emptyDir:
          medium: HugePages
    1
    Specify the same target_namespace where the SriovNetwork object mlx-rdma-network is created. To create the pod in a different namespace, change target_namespace in both the Pod spec and the SriovNetwork object.
    2
    Specify the RDMA image which includes your application and the RDMA library used by the application.
    3
    Specify additional capabilities required by the application inside the container for hugepage allocation, system resource allocation, and network interface access.
    4
    Mount the hugepage volume to the RDMA pod under /mnt/huge. The hugepage volume is backed by the emptyDir volume type with the medium being Hugepages.
    5
    Specify the number of CPUs. The RDMA pod usually requires that exclusive CPUs be allocated from the kubelet. To do this, set the CPU Manager policy to static and create a pod with Guaranteed Quality of Service (QoS).
    6
    Specify hugepage size hugepages-1Gi or hugepages-2Mi and the quantity of hugepages that will be allocated to the RDMA pod. Configure 2Mi and 1Gi hugepages separately. Configuring 1Gi hugepages requires adding kernel arguments to Nodes.
  6. Create the RDMA pod by running the following command:

    $ oc create -f mlx-rdma-pod.yaml

24.10.6. A test pod template for clusters that use OVS-DPDK on OpenStack

The following testpmd pod demonstrates container creation with huge pages, reserved CPUs, and the SR-IOV port.

An example testpmd pod

apiVersion: v1
kind: Pod
metadata:
  name: testpmd-dpdk
  namespace: mynamespace
  annotations:
    cpu-load-balancing.crio.io: "disable"
    cpu-quota.crio.io: "disable"
# ...
spec:
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
        openshift.io/dpdk1: 1 1
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
        openshift.io/dpdk1: 1
    volumeMounts:
      - mountPath: /mnt/huge
        name: hugepage
        readOnly: False
  runtimeClassName: performance-cnf-performanceprofile 2
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

1
The name dpdk1 in this example refers to a user-created SriovNetworkNodePolicy resource. Substitute the name of the resource that you create.
2
If your performance profile is not named cnf-performanceprofile, replace that string with the correct performance profile name.

24.10.7. A test pod template for clusters that use OVS hardware offloading on OpenStack

The following testpmd pod demonstrates Open vSwitch (OVS) hardware offloading on Red Hat OpenStack Platform (RHOSP).

An example testpmd pod

apiVersion: v1
kind: Pod
metadata:
  name: testpmd-sriov
  namespace: mynamespace
  annotations:
    k8s.v1.cni.cncf.io/networks: hwoffload1
spec:
  runtimeClassName: performance-cnf-performanceprofile 1
  containers:
  - name: testpmd
    command: ["sleep", "99999"]
    image: registry.redhat.io/openshift4/dpdk-base-rhel8:v4.9
    securityContext:
      capabilities:
        add: ["IPC_LOCK","SYS_ADMIN"]
      privileged: true
      runAsUser: 0
    resources:
      requests:
        memory: 1000Mi
        hugepages-1Gi: 1Gi
        cpu: '2'
      limits:
        hugepages-1Gi: 1Gi
        cpu: '2'
        memory: 1000Mi
    volumeMounts:
      - mountPath: /mnt/huge
        name: hugepage
        readOnly: False
  volumes:
  - name: hugepage
    emptyDir:
      medium: HugePages

1
If your performance profile is not named cnf-performanceprofile, replace that string with the correct performance profile name.

24.10.8. Additional resources

24.11. Using pod-level bonding

Bonding at the pod level is vital for enabling workloads inside pods that require high availability and more throughput. With pod-level bonding, you can create a bond interface from multiple single root I/O virtualization (SR-IOV) virtual function interfaces in a kernel mode interface. The SR-IOV virtual functions are passed into the pod and attached to a kernel driver.

One scenario where pod-level bonding is required is creating a bond interface from SR-IOV virtual functions on different physical functions. Creating a bond interface from two different physical functions on the host provides high availability and increased throughput at the pod level.

For guidance on tasks such as creating a SR-IOV network, network policies, network attachment definitions and pods, see Configuring an SR-IOV network device.

24.11.1. Configuring a bond interface from two SR-IOV interfaces

Bonding enables multiple network interfaces to be aggregated into a single logical "bonded" interface. Bond Container Network Interface (Bond-CNI) brings bond capability into containers.

Bond-CNI can be created using Single Root I/O Virtualization (SR-IOV) virtual functions and placing them in the container network namespace.

OpenShift Container Platform only supports Bond-CNI using SR-IOV virtual functions. The SR-IOV Network Operator provides the SR-IOV CNI plugin needed to manage the virtual functions. Other CNIs or types of interfaces are not supported.

Prerequisites

  • The SR-IOV Network Operator must be installed and configured to obtain virtual functions in a container.
  • To configure SR-IOV interfaces, an SR-IOV network and policy must be created for each interface.
  • The SR-IOV Network Operator creates a network attachment definition for each SR-IOV interface, based on the SR-IOV network and policy defined.
  • The linkState is set to the default value auto for the SR-IOV virtual function.
24.11.1.1. Creating a bond network attachment definition

Now that the SR-IOV virtual functions are available, you can create a bond network attachment definition.

apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: bond-net1
      namespace: demo
    spec:
      config: '{
      "type": "bond", 1
      "cniVersion": "0.3.1",
      "name": "bond-net1",
      "mode": "active-backup", 2
      "failOverMac": 1, 3
      "linksInContainer": true, 4
      "miimon": "100",
      "mtu": 1500,
      "links": [ 5
            {"name": "net1"},
            {"name": "net2"}
        ],
      "ipam": {
            "type": "host-local",
            "subnet": "10.56.217.0/24",
            "routes": [{
            "dst": "0.0.0.0/0"
            }],
            "gateway": "10.56.217.1"
        }
      }'
1
The CNI type is always set to bond.
2
The mode attribute specifies the bonding mode.
Note

The bonding modes supported are:

  • balance-rr - 0
  • active-backup - 1
  • balance-xor - 2

For balance-rr or balance-xor modes, you must set the trust mode to on for the SR-IOV virtual function; see the example SriovNetwork sketch after this callout list.

3
The failOverMac attribute is mandatory for active-backup mode and must be set to 1.
4
The linksInContainer=true flag informs the Bond CNI that the required interfaces are to be found inside the container. By default, the Bond CNI looks for these interfaces on the host, which does not work for integration with SR-IOV and Multus.
5
The links section defines which interfaces are used to create the bond. By default, Multus names the attached interfaces net<n>, with <n> starting at 1.
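
The bond attachment uses SR-IOV port networks that must already exist, such as sriovnet1 and sriovnet2 referenced in the pod example in the next section. The following SriovNetwork is a minimal sketch of one such port network with trust set to on, as required for the balance-rr and balance-xor modes; the resourceName value sriovnic1 is a placeholder for a resource that your own SriovNetworkNodePolicy defines:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnet1
  namespace: openshift-sriov-network-operator
spec:
  networkNamespace: demo
  trust: "on"      # required for balance-rr and balance-xor bonding modes
  spoofChk: "off"
  resourceName: sriovnic1 # placeholder resource from your SriovNetworkNodePolicy
  ipam: '{}'       # empty IPAM; the bond attachment assigns the IP address

A second, analogous SriovNetwork object, for example sriovnet2, backs the other bonded port. Because the bond network attachment definition handles IPAM, the port networks can leave their ipam configuration empty.
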
24.11.1.2. Creating a pod using a bond interface
  1. Test the setup by creating a pod with a YAML file named, for example, podbonding.yaml, with content similar to the following:

    apiVersion: v1
    kind: Pod
    metadata:
      name: bondpod1
      namespace: demo
      annotations:
        k8s.v1.cni.cncf.io/networks: demo/sriovnet1, demo/sriovnet2, demo/bond-net1 1
    spec:
      containers:
      - name: podexample
        image: quay.io/openshift/origin-network-interface-bond-cni:4.11.0
        command: ["/bin/bash", "-c", "sleep INF"]
    1
    Note the network annotation: it contains two SR-IOV network attachments, and one bond network attachment. The bond attachment uses the two SR-IOV interfaces as bonded port interfaces.
  2. Apply the YAML by running the following command:

    $ oc apply -f podbonding.yaml
  3. Inspect the pod interfaces with the following command:

    $ oc rsh -n demo bondpod1
    sh-4.4#
    sh-4.4# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    valid_lft forever preferred_lft forever
    3: eth0@if150: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue state UP
    link/ether 62:b1:b5:c8:fb:7a brd ff:ff:ff:ff:ff:ff
    inet 10.244.1.122/24 brd 10.244.1.255 scope global eth0
    valid_lft forever preferred_lft forever
    4: net3: <BROADCAST,MULTICAST,UP,LOWER_UP400> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff 1
    inet 10.56.217.66/24 scope global bond0
    valid_lft forever preferred_lft forever
    43: net1: <BROADCAST,MULTICAST,UP,LOWER_UP800> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff 2
    44: net2: <BROADCAST,MULTICAST,UP,LOWER_UP800> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether 9e:23:69:42:fb:8a brd ff:ff:ff:ff:ff:ff 3
    1
    The bond interface is automatically named net3. To set a specific interface name, add the @name suffix to the pod’s k8s.v1.cni.cncf.io/networks annotation.
    2
    The net1 interface is based on an SR-IOV virtual function.
    3
    The net2 interface is based on an SR-IOV virtual function.
    Note

    If no interface names are configured in the pod annotation, interface names are assigned automatically as net<n>, with <n> starting at 1.

  4. Optional: If you want to set a specific interface name, for example bond0, edit the k8s.v1.cni.cncf.io/networks annotation and set bond0 as the interface name as follows:

    annotations:
      k8s.v1.cni.cncf.io/networks: demo/sriovnet1, demo/sriovnet2, demo/bond-net1@bond0

24.12. Configuring hardware offloading

As a cluster administrator, you can configure hardware offloading on compatible nodes to increase data processing performance and reduce load on host CPUs.

24.12.1. About hardware offloading

Open vSwitch hardware offloading is a method of processing network tasks by diverting them away from the CPU and offloading them to a dedicated processor on a network interface controller. As a result, clusters can benefit from faster data transfer speeds, reduced CPU workloads, and lower computing costs.

The key element for this feature is a modern class of network interface controllers known as SmartNICs. A SmartNIC is a network interface controller that is able to handle computationally-heavy network processing tasks. In the same way that a dedicated graphics card can improve graphics performance, a SmartNIC can improve network performance. In each case, a dedicated processor improves performance for a specific type of processing task.

In OpenShift Container Platform, you can configure hardware offloading for bare metal nodes that have a compatible SmartNIC. Hardware offloading is configured and enabled by the SR-IOV Network Operator.

Hardware offloading is not compatible with all workloads or application types. Only the following two communication types are supported:

  • pod-to-pod
  • pod-to-service, where the service is a ClusterIP service backed by a regular pod

In all cases, hardware offloading takes place only when those pods and services are assigned to nodes that have a compatible SmartNIC. Suppose, for example, that a pod on a node with hardware offloading tries to communicate with a service on a regular node. On the regular node, all the processing takes place in the kernel, so the overall performance of the pod-to-service communication is limited to the maximum performance of that regular node. Hardware offloading is not compatible with DPDK applications.

Enabling hardware offloading on a node, but not configuring pods to use it, can result in decreased throughput performance for pod traffic. You cannot configure hardware offloading for pods that are managed by OpenShift Container Platform.

24.12.2. Supported devices

Hardware offloading is supported on the following network interface controllers:

Table 24.15. Supported network interface controllers
Manufacturer   Model                                        Vendor ID   Device ID
Mellanox       MT27800 Family [ConnectX-5]                  15b3        1017
Mellanox       MT28880 Family [ConnectX-5 Ex]               15b3        1019
Mellanox       MT2892 Family [ConnectX-6 Dx]                15b3        101d
Mellanox       MT2894 Family [ConnectX-6 Lx]                15b3        101f
Mellanox       MT42822 BlueField-2 in ConnectX-6 NIC mode   15b3        a2d6

24.12.3. Prerequisites

24.12.4. Setting the SR-IOV Network Operator into systemd mode

To support hardware offloading, you must first set the SR-IOV Network Operator into systemd mode.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user that has the cluster-admin role.

Procedure

  1. Create a SriovOperatorConfig custom resource (CR) to deploy all the SR-IOV Operator components:

    1. Create a file named sriovOperatorConfig.yaml that contains the following YAML:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovOperatorConfig
      metadata:
        name: default 1
        namespace: openshift-sriov-network-operator
      spec:
        enableInjector: true
        enableOperatorWebhook: true
        configurationMode: "systemd" 2
        logLevel: 2
      1
      The only valid name for the SriovOperatorConfig resource is default and it must be in the namespace where the Operator is deployed.
      2
      Setting the SR-IOV Network Operator into systemd mode is only relevant for Open vSwitch hardware offloading.
    2. Create the resource by running the following command:

      $ oc apply -f sriovOperatorConfig.yaml

24.12.5. Configuring a machine config pool for hardware offloading

To enable hardware offloading, you now create a dedicated machine config pool and configure it to work with the SR-IOV Network Operator.

Prerequisites

  • You have installed the SR-IOV Network Operator and set it into systemd mode.

Procedure

  1. Create a machine config pool for machines you want to use hardware offloading on.

    1. Create a file, such as mcp-offloading.yaml, with content like the following example:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: mcp-offloading 1
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,mcp-offloading]} 2
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/mcp-offloading: "" 3
      1 2
      The name of your machine config pool for hardware offloading.
      3
      This node role label is used to add nodes to the machine config pool.
    2. Apply the configuration for the machine config pool:

      $ oc create -f mcp-offloading.yaml
  2. Add nodes to the machine config pool. Label each node with the node role label of your pool:

    $ oc label node worker-2 node-role.kubernetes.io/mcp-offloading=""
  3. Optional: To verify that the new pool is created, run the following command:

    $ oc get nodes

    Example output

    NAME       STATUS   ROLES                   AGE   VERSION
    master-0   Ready    master                  2d    v1.26.0
    master-1   Ready    master                  2d    v1.26.0
    master-2   Ready    master                  2d    v1.26.0
    worker-0   Ready    worker                  2d    v1.26.0
    worker-1   Ready    worker                  2d    v1.26.0
    worker-2   Ready    mcp-offloading,worker   47h   v1.26.0
    worker-3   Ready    mcp-offloading,worker   47h   v1.26.0

  4. Add this machine config pool to the SriovNetworkPoolConfig custom resource:

    1. Create a file, such as sriov-pool-config.yaml, with content like the following example:

      apiVersion: sriovnetwork.openshift.io/v1
      kind: SriovNetworkPoolConfig
      metadata:
        name: sriovnetworkpoolconfig-offload
        namespace: openshift-sriov-network-operator
      spec:
        ovsHardwareOffloadConfig:
          name: mcp-offloading 1
      1
      The name of your machine config pool for hardware offloading.
    2. Apply the configuration:

      $ oc create -f <SriovNetworkPoolConfig_name>.yaml
      Note

      When you apply the configuration specified in a SriovNetworkPoolConfig object, the SR-IOV Operator drains and restarts the nodes in the machine config pool.

      It might take several minutes for a configuration change to apply.

24.12.6. Configuring the SR-IOV network node policy

You can create an SR-IOV network device configuration for a node by creating an SR-IOV network node policy. To enable hardware offloading, you must define the .spec.eSwitchMode field with the value "switchdev".

The following procedure creates an SR-IOV interface for a network interface controller with hardware offloading.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a file, such as sriov-node-policy.yaml, with content like the following example:

    apiVersion: sriovnetwork.openshift.io/v1
    kind: SriovNetworkNodePolicy
    metadata:
      name: sriov-node-policy 1
      namespace: openshift-sriov-network-operator
    spec:
      deviceType: netdevice 2
      eSwitchMode: "switchdev" 3
      nicSelector:
        deviceID: "1019"
        rootDevices:
        - 0000:d8:00.0
        vendor: "15b3"
        pfNames:
        - ens8f0
      nodeSelector:
        feature.node.kubernetes.io/network-sriov.capable: "true"
      numVfs: 6
      priority: 5
      resourceName: mlxnics

    1
    The name for the custom resource object.
    2
    Required. Hardware offloading is not supported with vfio-pci.
    3
    Required.

  2. Apply the configuration for the policy:

    $ oc create -f sriov-node-policy.yaml
    Note

    When you apply the configuration specified in a SriovNetworkNodePolicy object, the SR-IOV Operator drains and restarts the nodes in the machine config pool.

    It might take several minutes for a configuration change to apply.

24.12.6.1. An example SR-IOV network node policy for OpenStack

The following example describes an SR-IOV interface for a network interface controller (NIC) with hardware offloading on Red Hat OpenStack Platform (RHOSP).

An SR-IOV interface for a NIC with hardware offloading on RHOSP

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: ${name}
  namespace: openshift-sriov-network-operator
spec:
  deviceType: switchdev
  isRdma: true
  nicSelector:
    netFilter: openstack/NetworkID:${net_id}
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: 'true'
  numVfs: 1
  priority: 99
  resourceName: ${name}

24.12.7. Creating a network attachment definition

After you define the machine config pool and the SR-IOV network node policy, you can create a network attachment definition for the network interface card you specified.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.

Procedure

  1. Create a file, such as net-attach-def.yaml, with content like the following example:

    apiVersion: "k8s.cni.cncf.io/v1"
    kind: NetworkAttachmentDefinition
    metadata:
      name: net-attach-def 1
      namespace: net-attach-def 2
      annotations:
        k8s.v1.cni.cncf.io/resourceName: openshift.io/mlxnics 3
    spec:
      config: '{"cniVersion":"0.3.1","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{}}'

    1
    The name for your network attachment definition.
    2
    The namespace for your network attachment definition.
    3
    The value of the spec.resourceName field that you specified in the SriovNetworkNodePolicy object.

  2. Apply the configuration for the network attachment definition:

    $ oc create -f net-attach-def.yaml

Verification

  • Run the following command to see whether the new definition is present:

    $ oc get net-attach-def -A

    Example output

    NAMESPACE         NAME             AGE
    net-attach-def    net-attach-def   43h

24.12.8. Adding the network attachment definition to your pods

After you create the machine config pool, the SriovNetworkPoolConfig and SriovNetworkNodePolicy custom resources, and the network attachment definition, you can apply these configurations to your pods by adding the network attachment definition to your pod specifications.

Procedure

  • In the pod specification, add the .metadata.annotations.k8s.v1.cni.cncf.io/networks field and specify the network attachment definition you created for hardware offloading:

    # ...
    metadata:
      annotations:
        v1.multus-cni.io/default-network: net-attach-def/net-attach-def 1

    1
    The value must be the name and namespace of the network attachment definition you created for hardware offloading.
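
    As an illustration only, the following is a minimal sketch of a complete pod that uses the annotation above; the pod name offload-pod and the container image are placeholder assumptions:

    apiVersion: v1
    kind: Pod
    metadata:
      name: offload-pod                  # hypothetical pod name
      namespace: net-attach-def
      annotations:
        v1.multus-cni.io/default-network: net-attach-def/net-attach-def
    spec:
      containers:
      - name: app
        image: registry.access.redhat.com/ubi8/ubi-minimal # placeholder image
        command: ["sleep", "infinity"]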

24.13. Switching Bluefield-2 from DPU to NIC

You can switch the Bluefield-2 network device from data processing unit (DPU) mode to network interface controller (NIC) mode.

24.13.1. Switching Bluefield-2 from DPU mode to NIC mode

Use the following procedure to switch Bluefield-2 from data processing unit (DPU) mode to network interface controller (NIC) mode.

Important

Currently, only switching Bluefield-2 from DPU to NIC mode is supported. Switching from NIC mode to DPU mode is unsupported.

Prerequisites

  • You have installed the SR-IOV Network Operator. For more information, see "Installing SR-IOV Network Operator".
  • You have updated Bluefield-2 to the latest firmware. For more information, see Firmware for NVIDIA BlueField-2.

Procedure

  1. Add the following labels to each of your worker nodes by entering the following commands:

    $ oc label node <example_node_name_one> node-role.kubernetes.io/sriov=
    $ oc label node <example_node_name_two> node-role.kubernetes.io/sriov=
  2. Create a machine config pool for the SR-IOV Network Operator, for example:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: sriov
    spec:
      machineConfigSelector:
        matchExpressions:
        - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,sriov]}
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/sriov: ""
  3. Apply the following machineconfig.yaml file to the worker nodes:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: sriov
      name: 99-bf2-dpu
    spec:
      config:
        ignition:
          version: 3.2.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,ZmluZF9jb250YWluZXIoKSB7CiAgY3JpY3RsIHBzIC1vIGpzb24gfCBqcSAtciAnLmNvbnRhaW5lcnNbXSB8IHNlbGVjdCgubWV0YWRhdGEubmFtZT09InNyaW92LW5ldHdvcmstY29uZmlnLWRhZW1vbiIpIHwgLmlkJwp9CnVudGlsIG91dHB1dD0kKGZpbmRfY29udGFpbmVyKTsgW1sgLW4gIiRvdXRwdXQiIF1dOyBkbwogIGVjaG8gIndhaXRpbmcgZm9yIGNvbnRhaW5lciB0byBjb21lIHVwIgogIHNsZWVwIDE7CmRvbmUKISBzdWRvIGNyaWN0bCBleGVjICRvdXRwdXQgL2JpbmRhdGEvc2NyaXB0cy9iZjItc3dpdGNoLW1vZGUuc2ggIiRAIgo=
            mode: 0755
            overwrite: true
            path: /etc/default/switch_in_sriov_config_daemon.sh
        systemd:
          units:
          - name: dpu-switch.service
            enabled: true
            contents: |
              [Unit]
              Description=Switch BlueField2 card to NIC/DPU mode
              RequiresMountsFor=%t/containers
              Wants=network.target
              After=network-online.target kubelet.service
              [Service]
              SuccessExitStatus=0 120
              RemainAfterExit=True
              ExecStart=/bin/bash -c '/etc/default/switch_in_sriov_config_daemon.sh nic || shutdown -r now' 1
              Type=oneshot
              [Install]
              WantedBy=multi-user.target
    1
    Optional: You can specify the PCI address of a specific card, for example ExecStart=/bin/bash -c '/etc/default/switch_in_sriov_config_daemon.sh nic 0000:5e:00.0 || echo done'. By default, the first device is selected. If there is more than one device, you must specify which PCI address to use. The PCI address must be the same on all nodes that are switching Bluefield-2 from DPU mode to NIC mode.
  4. Wait for the worker nodes to restart. After restarting, the Bluefield-2 network device on the worker nodes is switched into NIC mode.
  5. Optional: You might need to restart the host hardware because most recent Bluefield-2 firmware releases require a hardware restart to switch into NIC mode.

24.14. Uninstalling the SR-IOV Network Operator

To uninstall the SR-IOV Network Operator, you must delete any running SR-IOV workloads, uninstall the Operator, and delete the webhooks that the Operator used.

24.14.1. Uninstalling the SR-IOV Network Operator

As a cluster administrator, you can uninstall the SR-IOV Network Operator.

Prerequisites

  • You have access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
  • You have the SR-IOV Network Operator installed.

Procedure

  1. Delete all SR-IOV custom resources (CRs):

    $ oc delete sriovnetwork -n openshift-sriov-network-operator --all
    $ oc delete sriovnetworknodepolicy -n openshift-sriov-network-operator --all
    $ oc delete sriovibnetwork -n openshift-sriov-network-operator --all
  2. Follow the instructions in the "Deleting Operators from a cluster" section to remove the SR-IOV Network Operator from your cluster.
  3. Delete the SR-IOV custom resource definitions that remain in the cluster after the SR-IOV Network Operator is uninstalled:

    $ oc delete crd sriovibnetworks.sriovnetwork.openshift.io
    $ oc delete crd sriovnetworknodepolicies.sriovnetwork.openshift.io
    $ oc delete crd sriovnetworknodestates.sriovnetwork.openshift.io
    $ oc delete crd sriovnetworkpoolconfigs.sriovnetwork.openshift.io
    $ oc delete crd sriovnetworks.sriovnetwork.openshift.io
    $ oc delete crd sriovoperatorconfigs.sriovnetwork.openshift.io
  4. Delete the SR-IOV webhooks:

    $ oc delete mutatingwebhookconfigurations network-resources-injector-config
    $ oc delete MutatingWebhookConfiguration sriov-operator-webhook-config
    $ oc delete ValidatingWebhookConfiguration sriov-operator-webhook-config
  5. Delete the SR-IOV Network Operator namespace:

    $ oc delete namespace openshift-sriov-network-operator
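
As an optional check that is not part of the documented procedure, you can confirm that the custom resource definitions and the namespace are gone; these are standard oc queries, shown here as a sketch, and both should return no results after a successful uninstall:

    $ oc get crd | grep sriovnetwork.openshift.io

    $ oc get namespace openshift-sriov-network-operator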

Additional resources

Chapter 25. OVN-Kubernetes network plugin

25.1. About the OVN-Kubernetes network plugin

The OpenShift Container Platform cluster uses a virtualized network for pod and service networks.

Part of Red Hat OpenShift Networking, the OVN-Kubernetes network plugin is the default network provider for OpenShift Container Platform. OVN-Kubernetes is based on Open Virtual Network (OVN) and provides an overlay-based networking implementation.

Important

For a cloud controller manager (CCM) with the --cloud-provider=external option set to cloud-provider-vsphere, a known issue exists for a cluster that operates in a networking environment with multiple subnets.

When you upgrade your cluster from OpenShift Container Platform 4.12 to OpenShift Container Platform 4.13, the CCM selects an incorrect node IP address, and this operation generates an error message in the namespaces/openshift-cloud-controller-manager/pods/vsphere-cloud-controller-manager logs. The error message indicates a mismatch between the node IP address and the vsphere-cloud-controller-manager pod IP address in your cluster.

The known issue might not impact the cluster upgrade operation, but you can set the correct IP address in both the nodeNetworking.external.networkSubnetCidr and the nodeNetworking.internal.networkSubnetCidr parameters for the nodeNetworking object that your cluster uses for its networking requirements.

A cluster that uses the OVN-Kubernetes plugin also runs Open vSwitch (OVS) on each node. OVN configures OVS on each node to implement the declared network configuration.

Note

OVN-Kubernetes is the default networking solution for OpenShift Container Platform and single-node OpenShift deployments.

OVN-Kubernetes, which arose from the OVS project, uses many of the same constructs, such as OpenFlow rules, to determine how packets travel through the network. For more information, see the Open Virtual Network website.

OVN-Kubernetes is a series of daemons for OVS that translate virtual network configurations into OpenFlow rules. OpenFlow is a protocol for communicating with network switches and routers, providing a means for remotely controlling the flow of network traffic on a network device, allowing network administrators to configure, manage, and monitor the flow of network traffic.

OVN-Kubernetes provides advanced functionality that is not available with OpenFlow alone. OVN supports distributed virtual routing, distributed logical switches, access control, DHCP, and DNS. OVN implements distributed virtual routing within logical flows, which equate to OpenFlow flows. For example, if a pod sends a DHCP broadcast request on the network, a logical flow rule matches that packet and responds by providing a gateway, a DNS server, an IP address, and so on.

OVN-Kubernetes runs a daemon on each node. There are daemon sets for the databases and for the OVN controller that run on every node. The OVN controller programs the Open vSwitch daemon on the nodes to support the network provider features: egress IPs, firewalls, routers, hybrid networking, IPsec encryption, IPv6, network policy, network policy logs, hardware offloading, and multicast.

25.1.1. OVN-Kubernetes purpose

The OVN-Kubernetes network plugin is an open-source, fully-featured Kubernetes CNI plugin that uses Open Virtual Network (OVN) to manage network traffic flows. OVN is a community-developed, vendor-agnostic network virtualization solution. The OVN-Kubernetes network plugin:

  • Uses OVN (Open Virtual Network) to manage network traffic flows.
  • Implements Kubernetes network policy support, including ingress and egress rules.
  • Uses the Geneve (Generic Network Virtualization Encapsulation) protocol rather than VXLAN to create an overlay network between nodes.

The OVN-Kubernetes network plugin provides the following advantages over OpenShift SDN.

  • Full support for IPv6 single-stack and IPv4/IPv6 dual-stack networking on supported platforms
  • Support for hybrid clusters with both Linux and Microsoft Windows workloads
  • Optional IPsec encryption of intra-cluster communications
  • Offload of network data processing from host CPU to compatible network cards and data processing units (DPUs)

25.1.2. Supported network plugin feature matrix

Red Hat OpenShift Networking offers two options for the network plugin: OpenShift SDN and OVN-Kubernetes. The following table summarizes the current feature support for both network plugins:

Table 25.1. Default CNI network plugin feature comparison
Feature                                            OpenShift SDN   OVN-Kubernetes

Egress IPs                                         Supported       Supported
Egress firewall                                    Supported       Supported [1]
Egress router                                      Supported       Supported [2]
Hybrid networking                                  Not supported   Supported
IPsec encryption for intra-cluster communication   Not supported   Supported
IPv4 single-stack                                  Supported       Supported
IPv6 single-stack                                  Not supported   Supported [3]
IPv4/IPv6 dual-stack                               Not supported   Supported [4]
IPv6/IPv4 dual-stack                               Not supported   Supported [5]
Kubernetes network policy                          Supported       Supported
Kubernetes network policy logs                     Not supported   Supported
Hardware offloading                                Not supported   Supported
Multicast                                          Supported       Supported

  1. Egress firewall is also known as egress network policy in OpenShift SDN. This is not the same as network policy egress.
  2. Egress router for OVN-Kubernetes supports only redirect mode.
  3. IPv6 single-stack networking on a bare-metal platform.
  4. IPv4/IPv6 dual-stack networking on bare-metal, VMware vSphere (installer-provisioned infrastructure installations only), IBM Power®, and IBM Z® platforms. On VMware vSphere, dual-stack networking limitations exist.
  5. IPv6/IPv4 dual-stack networking on bare-metal and IBM Power® platforms.

Additional resources

25.1.3. OVN-Kubernetes IPv6 and dual-stack limitations

The OVN-Kubernetes network plugin has the following limitations:

  • For clusters configured for dual-stack networking, both IPv4 and IPv6 traffic must use the same network interface as the default gateway. If this requirement is not met, pods on the host in the ovnkube-node daemon set enter the CrashLoopBackOff state. If you display a pod with a command such as oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-node -o yaml, the status field contains more than one message about the default gateway, as shown in the following output:

    I1006 16:09:50.985852   60651 helper_linux.go:73] Found default gateway interface br-ex 192.168.127.1
    I1006 16:09:50.985923   60651 helper_linux.go:73] Found default gateway interface ens4 fe80::5054:ff:febe:bcd4
    F1006 16:09:50.985939   60651 ovnkube.go:130] multiple gateway interfaces detected: br-ex ens4

    The only resolution is to reconfigure the host networking so that both IP families use the same network interface for the default gateway.

  • For clusters configured for dual-stack networking, both the IPv4 and IPv6 routing tables must contain the default gateway. If this requirement is not met, pods on the host in the ovnkube-node daemon set enter the CrashLoopBackOff state. If you display a pod with a command such as oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-node -o yaml, the status field contains more than one message about the default gateway, as shown in the following output:

    I0512 19:07:17.589083  108432 helper_linux.go:74] Found default gateway interface br-ex 192.168.123.1
    F0512 19:07:17.589141  108432 ovnkube.go:133] failed to get default gateway interface

    The only resolution is to reconfigure the host networking so that both IP families contain the default gateway.

25.1.4. Session affinity

Session affinity is a feature that applies to Kubernetes Service objects. You can use session affinity if you want to ensure that each time you connect to a <service_VIP>:<Port>, the traffic is always load balanced to the same back end. For more information, including how to set session affinity based on a client’s IP address, see Session affinity.

Stickiness timeout for session affinity

The OVN-Kubernetes network plugin for OpenShift Container Platform calculates the stickiness timeout for a session from a client based on the last packet. For example, if you run a curl command 10 times, the sticky session timer starts from the tenth packet not the first. As a result, if the client is continuously contacting the service, then the session never times out. The timeout starts when the service has not received a packet for the amount of time set by the timeoutSeconds parameter.
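
For reference, session affinity and its timeout are configured on the Service object itself. The following Service is a minimal sketch using the standard Kubernetes fields; the service name and selector are placeholder assumptions:

apiVersion: v1
kind: Service
metadata:
  name: example-service   # hypothetical service name
spec:
  selector:
    app: example          # hypothetical pod selector
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800 # stickiness window; the timer restarts each time the service receives a packet from the client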

25.2. OVN-Kubernetes architecture

25.2.1. Introduction to OVN-Kubernetes architecture

The following diagram shows the OVN-Kubernetes architecture.

Figure 25.1. OVN-Kubernetes architecture

OVN-Kubernetes architecture

The key components are:

  • Cloud Management System (CMS) - A platform specific client for OVN that provides a CMS specific plugin for OVN integration. The plugin translates the cloud management system’s concept of the logical network configuration, stored in the CMS configuration database in a CMS-specific format, into an intermediate representation understood by OVN.
  • OVN Northbound database (nbdb) - Stores the logical network configuration passed by the CMS plugin.
  • OVN Southbound database (sbdb) - Stores the physical and logical network configuration state for the Open vSwitch (OVS) system on each node, including tables that bind them.
  • ovn-northd - This is the intermediary client between nbdb and sbdb. It translates the logical network configuration in terms of conventional network concepts, taken from the nbdb, into logical data path flows in the sbdb below it. The container name is northd and it runs in the ovnkube-master pods.
  • ovn-controller - This is the OVN agent that interacts with OVS and hypervisors, for any information or update that is needed for sbdb. The ovn-controller reads logical flows from the sbdb, translates them into OpenFlow flows and sends them to the node’s OVS daemon. The container name is ovn-controller and it runs in the ovnkube-node pods.

The OVN northbound database has the logical network configuration passed down to it by the cloud management system (CMS). The OVN northbound database contains the current desired state of the network, presented as a collection of logical ports, logical switches, logical routers, and more. The ovn-northd (northd container) connects to the OVN northbound database and the OVN southbound database. It translates the logical network configuration in terms of conventional network concepts, taken from the OVN northbound database, into logical data path flows in the OVN southbound database.

The OVN southbound database has physical and logical representations of the network and binding tables that link them together. Every node in the cluster is represented in the southbound database, and you can see the ports that are connected to it. The southbound database also contains all the logical flows. The logical flows are shared with the ovn-controller process that runs on each node, and the ovn-controller turns them into OpenFlow rules to program Open vSwitch.

The Kubernetes control plane nodes each contain an ovnkube-master pod which hosts containers for the OVN northbound and southbound databases. All OVN northbound databases form a Raft cluster and all southbound databases form a separate Raft cluster. At any given time a single ovnkube-master is the leader and the other ovnkube-master pods are followers.

25.2.2. Listing all resources in the OVN-Kubernetes project

Finding the resources and containers that run in the OVN-Kubernetes project is important to help you understand the OVN-Kubernetes networking implementation.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • The OpenShift CLI (oc) installed.

Procedure

  1. Run the following command to get all resources, endpoints, and ConfigMaps in the OVN-Kubernetes project:

    $ oc get all,ep,cm -n openshift-ovn-kubernetes

    Example output

    NAME                       READY   STATUS    RESTARTS      AGE
    pod/ovnkube-master-9g7zt   6/6     Running   1 (48m ago)   57m
    pod/ovnkube-master-lqs4v   6/6     Running   0             57m
    pod/ovnkube-master-vxhtq   6/6     Running   0             57m
    pod/ovnkube-node-9k9kc     5/5     Running   0             57m
    pod/ovnkube-node-jg52r     5/5     Running   0             51m
    pod/ovnkube-node-k8wf7     5/5     Running   0             57m
    pod/ovnkube-node-tlwk6     5/5     Running   0             47m
    pod/ovnkube-node-xsvnk     5/5     Running   0             57m
    
    NAME                            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
    service/ovn-kubernetes-master   ClusterIP   None         <none>        9102/TCP            57m
    service/ovn-kubernetes-node     ClusterIP   None         <none>        9103/TCP,9105/TCP   57m
    service/ovnkube-db              ClusterIP   None         <none>        9641/TCP,9642/TCP   57m
    
    NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                                                 AGE
    daemonset.apps/ovnkube-master   3         3         3       3            3           beta.kubernetes.io/os=linux,node-role.kubernetes.io/master=   57m
    daemonset.apps/ovnkube-node     5         5         5       5            5           beta.kubernetes.io/os=linux                                   57m
    
    NAME                              ENDPOINTS                                                        AGE
    endpoints/ovn-kubernetes-master   10.0.132.11:9102,10.0.151.18:9102,10.0.192.45:9102               57m
    endpoints/ovn-kubernetes-node     10.0.132.11:9105,10.0.143.72:9105,10.0.151.18:9105 + 7 more...   57m
    endpoints/ovnkube-db              10.0.132.11:9642,10.0.151.18:9642,10.0.192.45:9642 + 3 more...   57m
    
    NAME                                 DATA   AGE
    configmap/control-plane-status       1      55m
    configmap/kube-root-ca.crt           1      57m
    configmap/openshift-service-ca.crt   1      57m
    configmap/ovn-ca                     1      57m
    configmap/ovnkube-config             1      57m
    configmap/signer-ca                  1      57m

    There are three ovnkube-masters that run on the control plane nodes, and two daemon sets used to deploy the ovnkube-master and ovnkube-node pods. There is one ovnkube-node pod for each node in the cluster. The ovnkube-config ConfigMap has the OpenShift Container Platform OVN-Kubernetes configurations started by ovnkube-master and ovnkube-node.

  2. List all the containers in the ovnkube-master pods by running the following command:

    $ oc get pods ovnkube-master-9g7zt \
    -o jsonpath='{.spec.containers[*].name}' -n openshift-ovn-kubernetes

    Expected output

    northd nbdb kube-rbac-proxy sbdb ovnkube-master ovn-dbchecker

    The ovnkube-master pod is made up of several containers. It is responsible for hosting the northbound database (nbdb container), the southbound database (sbdb container), watching for cluster events for pods, egressIP, namespaces, services, endpoints, egress firewall, and network policy and writing them to the northbound database (ovnkube-master container), as well as managing pod subnet allocation to nodes.

  3. List all the containers in the ovnkube-node pods by running the following command:

    $ oc get pods ovnkube-node-jg52r \
    -o jsonpath='{.spec.containers[*].name}' -n openshift-ovn-kubernetes

    Expected output

    ovn-controller ovn-acl-logging kube-rbac-proxy kube-rbac-proxy-ovn-metrics ovnkube-node

    The ovnkube-node pod has a container (ovn-controller) that resides on each OpenShift Container Platform node. Each node’s ovn-controller connects the OVN northbound to the OVN southbound database to learn about the OVN configuration. The ovn-controller connects southbound to ovs-vswitchd as an OpenFlow controller, for control over network traffic, and to the local ovsdb-server to allow it to monitor and control Open vSwitch configuration.

  4. List the currently elected OVN-Kubernetes master leader by running the following command:

    $ oc get lease -n openshift-ovn-kubernetes

    Expected output

    NAME                    HOLDER                               AGE
    ovn-kubernetes-master   ci-ln-gz990pb-72292-rthz2-master-2   50m
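
For example, to inspect the OVN-Kubernetes configuration held in the ovnkube-config ConfigMap mentioned in the first step, you can run a command such as the following:

$ oc describe configmap ovnkube-config -n openshift-ovn-kubernetes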

25.2.3. Listing the OVN-Kubernetes northbound database contents

To understand logical flow rules, you need to examine the northbound database and understand which objects are present to see how they are translated into logical flow rules. The most up-to-date information is present on the OVN Raft leader, and this procedure describes how to find the Raft leader and subsequently query it to list the OVN northbound database contents.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • The OpenShift CLI (oc) installed.

Procedure

  1. Find the OVN Raft leader for the northbound database.

    Note

    The Raft leader stores the most up-to-date information.

    1. List the pods by running the following command:

      $ oc get po -n openshift-ovn-kubernetes

      Example output

      NAME                   READY   STATUS    RESTARTS       AGE
      ovnkube-master-7j97q   6/6     Running   2 (148m ago)   149m
      ovnkube-master-gt4ms   6/6     Running   1 (140m ago)   147m
      ovnkube-master-mk6p6   6/6     Running   0              148m
      ovnkube-node-8qvtr     5/5     Running   0              149m
      ovnkube-node-fqdc9     5/5     Running   0              149m
      ovnkube-node-tlfwv     5/5     Running   0              149m
      ovnkube-node-wlwkn     5/5     Running   0              142m

    2. Choose one of the master pods at random and run the following command:

      $ oc exec -n openshift-ovn-kubernetes ovnkube-master-7j97q \
      -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl \
      --timeout=3 cluster/status OVN_Northbound

      Example output

      Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
      1c57
      Name: OVN_Northbound
      Cluster ID: c48a (c48aa5c0-a704-4c77-a066-24fe99d9b338)
      Server ID: 1c57 (1c57b6fc-2849-49b7-8679-fbf18bafe339)
      Address: ssl:10.0.147.219:9643
      Status: cluster member
      Role: follower 1
      Term: 5
      Leader: 2b4f 2
      Vote: unknown
      
      Election timer: 10000
      Log: [2, 3018]
      Entries not yet committed: 0
      Entries not yet applied: 0
      Connections: ->0000 ->0000 <-8844 <-2b4f
      Disconnections: 0
      Servers:
          1c57 (1c57 at ssl:10.0.147.219:9643) (self)
          8844 (8844 at ssl:10.0.163.212:9643) last msg 8928047 ms ago
          2b4f (2b4f at ssl:10.0.242.240:9643) last msg 620 ms ago 3

      1
      This pod is identified as a follower
      2
      The leader is identified as 2b4f
      3
      The 2b4f is on IP address 10.0.242.240
    3. Find the ovnkube-master pod running on IP Address 10.0.242.240 using the following command:

      $ oc get po -o wide -n openshift-ovn-kubernetes | grep 10.0.242.240 | grep -v ovnkube-node

      Example output

      ovnkube-master-gt4ms   6/6     Running             1 (143m ago)   150m   10.0.242.240   ip-10-0-242-240.ec2.internal   <none>           <none>

      The ovnkube-master-gt4ms pod runs on IP Address 10.0.242.240.

  2. Run the following command to show all the objects in the northbound database:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-gt4ms \
    -c northd -- ovn-nbctl show

    The output is too long to list here. The list includes the NAT rules, logical switches, load balancers and so on.

    Run the following command to display the options available with the command ovn-nbctl:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-mk6p6 \
    -c northd -- ovn-nbctl --help

    You can narrow down and focus on specific components by using some of the following commands:

  3. Run the following command to show the list of logical routers:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-gt4ms \
    -c northd -- ovn-nbctl lr-list

    Example output

    f971f1f3-5112-402f-9d1e-48f1d091ff04 (GR_ip-10-0-145-205.ec2.internal)
    69c992d8-a4cf-429e-81a3-5361209ffe44 (GR_ip-10-0-147-219.ec2.internal)
    7d164271-af9e-4283-b84a-48f2a44851cd (GR_ip-10-0-163-212.ec2.internal)
    111052e3-c395-408b-97b2-8dd0a20a29a5 (GR_ip-10-0-165-9.ec2.internal)
    ed50ce33-df5d-48e8-8862-2df6a59169a0 (GR_ip-10-0-209-170.ec2.internal)
    f44e2a96-8d1e-4a4d-abae-ed8728ac6851 (GR_ip-10-0-242-240.ec2.internal)
    ef3d0057-e557-4b1a-b3c6-fcc3463790b0 (ovn_cluster_router)

    Note

    From this output you can see there is a router on each node plus an ovn_cluster_router.

  4. Run the following command to show the list of logical switches:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-gt4ms \
    -c northd -- ovn-nbctl ls-list

    Example output

    82808c5c-b3bc-414a-bb59-8fec4b07eb14 (ext_ip-10-0-145-205.ec2.internal)
    3d22444f-0272-4c51-afc6-de9e03db3291 (ext_ip-10-0-147-219.ec2.internal)
    bf73b9df-59ab-4c58-a456-ce8205b34ac5 (ext_ip-10-0-163-212.ec2.internal)
    bee1e8d0-ec87-45eb-b98b-63f9ec213e5e (ext_ip-10-0-165-9.ec2.internal)
    812f08f2-6476-4abf-9a78-635f8516f95e (ext_ip-10-0-209-170.ec2.internal)
    f65e710b-32f9-482b-8eab-8d96a44799c1 (ext_ip-10-0-242-240.ec2.internal)
    84dad700-afb8-4129-86f9-923a1ddeace9 (ip-10-0-145-205.ec2.internal)
    1b7b448b-e36c-4ca3-9f38-4a2cf6814bfd (ip-10-0-147-219.ec2.internal)
    d92d1f56-2606-4f23-8b6a-4396a78951de (ip-10-0-163-212.ec2.internal)
    6864a6b2-de15-4de3-92d8-f95014b6f28f (ip-10-0-165-9.ec2.internal)
    c26bf618-4d7e-4afd-804f-1a2cbc96ec6d (ip-10-0-209-170.ec2.internal)
    ab9a4526-44ed-4f82-ae1c-e20da04947d9 (ip-10-0-242-240.ec2.internal)
    a8588aba-21da-4276-ba0f-9d68e88911f0 (join)

    Note

    From this output you can see there is an ext switch for each node, a switch named after each node, and a join switch.

  5. Run the following command to show the list of load balancers:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-gt4ms \
    -c northd -- ovn-nbctl lb-list

    Example output

    UUID                                    LB                  PROTO      VIP                     IPs
    f0fb50f9-4968-4b55-908c-616bae4db0a2    Service_default/    tcp        172.30.0.1:443          10.0.147.219:6443,10.0.163.212:6443,169.254.169.2:6443
    0dc42012-4f5b-432e-ae01-2cc4bfe81b00    Service_default/    tcp        172.30.0.1:443          10.0.147.219:6443,169.254.169.2:6443,10.0.242.240:6443
    f7fff5d5-5eff-4a40-98b1-3a4ba8f7f69c    Service_default/    tcp        172.30.0.1:443          169.254.169.2:6443,10.0.163.212:6443,10.0.242.240:6443
    12fe57a0-50a4-4a1b-ac10-5f288badee07    Service_default/    tcp        172.30.0.1:443          10.0.147.219:6443,10.0.163.212:6443,10.0.242.240:6443
    3f137fbf-0b78-4875-ba44-fbf89f254cf7    Service_openshif    tcp        172.30.23.153:443       10.130.0.14:8443
    174199fe-0562-4141-b410-12094db922a7    Service_openshif    tcp        172.30.69.51:50051      10.130.0.84:50051
    5ee2d4bd-c9e2-4d16-a6df-f54cd17c9ac3    Service_openshif    tcp        172.30.143.87:9001      10.0.145.205:9001,10.0.147.219:9001,10.0.163.212:9001,10.0.165.9:9001,10.0.209.170:9001,10.0.242.240:9001
    a056ae3d-83f8-45bc-9c80-ef89bce7b162    Service_openshif    tcp        172.30.164.74:443       10.0.147.219:6443,10.0.163.212:6443,10.0.242.240:6443
    bac51f3d-9a6f-4f5e-ac02-28fd343a332a    Service_openshif    tcp        172.30.0.10:53          10.131.0.6:5353
                                                                tcp        172.30.0.10:9154        10.131.0.6:9154
    48105bbc-51d7-4178-b975-417433f9c20a    Service_openshif    tcp        172.30.26.159:2379      10.0.147.219:2379,169.254.169.2:2379,10.0.242.240:2379
                                                                tcp        172.30.26.159:9979      10.0.147.219:9979,169.254.169.2:9979,10.0.242.240:9979
    7de2b8fc-342a-415f-ac13-1a493f4e39c0    Service_openshif    tcp        172.30.53.219:443       10.128.0.7:8443
                                                                tcp        172.30.53.219:9192      10.128.0.7:9192
    2cef36bc-d720-4afb-8d95-9350eff1d27a    Service_openshif    tcp        172.30.81.66:443        10.128.0.23:8443
    365cb6fb-e15e-45a4-a55b-21868b3cf513    Service_openshif    tcp        172.30.96.51:50051      10.130.0.19:50051
    41691cbb-ec55-4cdb-8431-afce679c5e8d    Service_openshif    tcp        172.30.98.218:9099      169.254.169.2:9099
    82df10ba-8143-400b-977a-8f5f416a4541    Service_openshif    tcp        172.30.26.159:2379      10.0.147.219:2379,10.0.163.212:2379,169.254.169.2:2379
                                                                tcp        172.30.26.159:9979      10.0.147.219:9979,10.0.163.212:9979,169.254.169.2:9979
    debe7f3a-39a8-490e-bc0a-ebbfafdffb16    Service_openshif    tcp        172.30.23.244:443       10.128.0.48:8443,10.129.0.27:8443,10.130.0.45:8443
    8a749239-02d9-4dc2-8737-716528e0da7b    Service_openshif    tcp        172.30.124.255:8443     10.128.0.14:8443
    880c7c78-c790-403d-a3cb-9f06592717a3    Service_openshif    tcp        172.30.0.10:53          10.130.0.20:5353
                                                                tcp        172.30.0.10:9154        10.130.0.20:9154
    d2f39078-6751-4311-a161-815bbaf7f9c7    Service_openshif    tcp        172.30.26.159:2379      169.254.169.2:2379,10.0.163.212:2379,10.0.242.240:2379
                                                                tcp        172.30.26.159:9979      169.254.169.2:9979,10.0.163.212:9979,10.0.242.240:9979
    30948278-602b-455c-934a-28e64c46de12    Service_openshif    tcp        172.30.157.35:9443      10.130.0.43:9443
    2cc7e376-7c02-4a82-89e8-dfa1e23fb003    Service_openshif    tcp        172.30.159.212:17698    10.128.0.48:17698,10.129.0.27:17698,10.130.0.45:17698
    e7d22d35-61c2-40c2-bc30-265cff8ed18d    Service_openshif    tcp        172.30.143.87:9001      10.0.145.205:9001,10.0.147.219:9001,10.0.163.212:9001,10.0.165.9:9001,10.0.209.170:9001,169.254.169.2:9001
    75164e75-e0c5-40fb-9636-bfdbf4223a02    Service_openshif    tcp        172.30.150.68:1936      10.129.4.8:1936,10.131.0.10:1936
                                                                tcp        172.30.150.68:443       10.129.4.8:443,10.131.0.10:443
                                                                tcp        172.30.150.68:80        10.129.4.8:80,10.131.0.10:80
    7bc4ee74-dccf-47e9-9149-b011f09aff39    Service_openshif    tcp        172.30.164.74:443       10.0.147.219:6443,10.0.163.212:6443,169.254.169.2:6443
    0db59e74-1cc6-470c-bf44-57c520e0aa8f    Service_openshif    tcp        10.0.163.212:31460
                                                                tcp        10.0.163.212:32361
    c300e134-018c-49af-9f84-9deb1d0715f8    Service_openshif    tcp        172.30.42.244:50051     10.130.0.47:50051
    5e352773-429b-4881-afb3-a13b7ba8b081    Service_openshif    tcp        172.30.244.66:443       10.129.0.8:8443,10.130.0.8:8443
    54b82d32-1939-4465-a87d-f26321442a7a    Service_openshif    tcp        172.30.12.9:8443        10.128.0.35:8443

    Note

    From this truncated output you can see there are many OVN-Kubernetes load balancers. Load balancers in OVN-Kubernetes are representations of services.

25.2.4. Command line arguments for ovn-nbctl to examine northbound database contents

The following table describes the command line arguments that can be used with ovn-nbctl to examine the contents of the northbound database.

Table 25.2. Command line arguments to examine northbound database contents
Argument | Description

ovn-nbctl show

An overview of the northbound database contents.

ovn-nbctl show <switch_or_router>

Show the details associated with the specified switch or router.

ovn-nbctl lr-list

Show the logical routers.

ovn-nbctl lrp-list <router>

Show the ports for the specified router. Use the router name or UUID from the ovn-nbctl lr-list output.

ovn-nbctl lr-nat-list <router>

Show network address translation details for the specified router.

ovn-nbctl ls-list

Show the logical switches.

ovn-nbctl lsp-list <switch>

Show the ports for the specified switch. Use the switch name or UUID from the ovn-nbctl ls-list output.

ovn-nbctl lsp-get-type <port>

Get the type for the logical port.

ovn-nbctl lb-list

Show the load balancers.
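
For example, to list the NAT rules on one of the gateway routers shown earlier in the ovn-nbctl lr-list output, you can run a command similar to the following. The pod and router names are taken from the earlier example output; substitute names from your own cluster:

$ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-gt4ms \
-c northd -- ovn-nbctl lr-nat-list GR_ip-10-0-145-205.ec2.internal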

25.2.5. Listing the OVN-Kubernetes southbound database contents

Logical flow rules are stored in the southbound database, which is a representation of your infrastructure. The most up-to-date information is present on the OVN Raft leader, and this procedure describes how to find the Raft leader and query it to list the OVN southbound database contents.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • The OpenShift CLI (oc) installed.

Procedure

  1. Find the OVN Raft leader for the southbound database.

    Note

    The Raft leader stores the most up-to-date information.

    1. List the pods by running the following command:

      $ oc get po -n openshift-ovn-kubernetes

      Example output

      NAME                   READY   STATUS    RESTARTS       AGE
      ovnkube-master-7j97q   6/6     Running   2 (134m ago)   135m
      ovnkube-master-gt4ms   6/6     Running   1 (126m ago)   133m
      ovnkube-master-mk6p6   6/6     Running   0              134m
      ovnkube-node-8qvtr     5/5     Running   0              135m
      ovnkube-node-bqztb     5/5     Running   0              117m
      ovnkube-node-fqdc9     5/5     Running   0              135m
      ovnkube-node-tlfwv     5/5     Running   0              135m
      ovnkube-node-wlwkn     5/5     Running   0              128m

    2. Choose one of the master pods at random and run the following command to find the OVN southbound Raft leader:

      $ oc exec -n openshift-ovn-kubernetes ovnkube-master-7j97q \
      -- /usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl \
      --timeout=3 cluster/status OVN_Southbound

      Example output

      Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
      1930
      Name: OVN_Southbound
      Cluster ID: f772 (f77273c0-7986-42dd-bd3c-a9f18e25701f)
      Server ID: 1930 (1930f4b7-314b-406f-9dcb-b81fe2729ae1)
      Address: ssl:10.0.147.219:9644
      Status: cluster member
      Role: follower 1
      Term: 3
      Leader: 7081 2
      Vote: unknown
      
      Election timer: 16000
      Log: [2, 2423]
      Entries not yet committed: 0
      Entries not yet applied: 0
      Connections: ->0000 ->7145 <-7081 <-7145
      Disconnections: 0
      Servers:
          7081 (7081 at ssl:10.0.163.212:9644) last msg 59 ms ago 3
          1930 (1930 at ssl:10.0.147.219:9644) (self)
          7145 (7145 at ssl:10.0.242.240:9644) last msg 7871735 ms ago

      1
      This pod is identified as a follower
      2
      The leader is identified as 7081
      3
      The 7081 is on IP address 10.0.163.212
    3. Find the ovnkube-master pod running on IP Address 10.0.163.212 using the following command:

      $ oc get po -o wide -n openshift-ovn-kubernetes | grep 10.0.163.212 | grep -v ovnkube-node

      Example output

      ovnkube-master-mk6p6   6/6     Running   0              136m   10.0.163.212   ip-10-0-163-212.ec2.internal   <none>           <none>

      The ovnkube-master-mk6p6 pod runs on IP Address 10.0.163.212.

  2. Run the following command to show all the information stored in the southbound database:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-mk6p6 \
    -c northd -- ovn-sbctl show

    Example output

    Chassis "8ca57b28-9834-45f0-99b0-96486c22e1be"
        hostname: ip-10-0-156-16.ec2.internal
        Encap geneve
            ip: "10.0.156.16"
            options: {csum="true"}
        Port_Binding k8s-ip-10-0-156-16.ec2.internal
        Port_Binding etor-GR_ip-10-0-156-16.ec2.internal
        Port_Binding jtor-GR_ip-10-0-156-16.ec2.internal
        Port_Binding openshift-ingress-canary_ingress-canary-hsblx
        Port_Binding rtoj-GR_ip-10-0-156-16.ec2.internal
        Port_Binding openshift-monitoring_prometheus-adapter-658fc5967-9l46x
        Port_Binding rtoe-GR_ip-10-0-156-16.ec2.internal
        Port_Binding openshift-multus_network-metrics-daemon-77nvz
        Port_Binding openshift-ingress_router-default-64fd8c67c7-df598
        Port_Binding openshift-dns_dns-default-ttpcq
        Port_Binding openshift-monitoring_alertmanager-main-0
        Port_Binding openshift-e2e-loki_loki-promtail-g2pbh
        Port_Binding openshift-network-diagnostics_network-check-target-m6tn4
        Port_Binding openshift-monitoring_thanos-querier-75b5cf8dcb-qf8qj
        Port_Binding cr-rtos-ip-10-0-156-16.ec2.internal
        Port_Binding openshift-image-registry_image-registry-7b7bc44566-mp9b8

    This detailed output shows the chassis and the ports that are attached to the chassis, which in this case are all of the router ports and anything that runs on the host network. Pods communicate with the wider network by using source network address translation (SNAT): the pod IP address is translated into the IP address of the node that the pod runs on, and the traffic is then sent out to the network.

    In addition to the chassis information, the southbound database stores all the logical flows, and those logical flows are then sent to the ovn-controller running on each of the nodes. The ovn-controller translates the logical flows into OpenFlow rules and ultimately programs Open vSwitch so that your pods can follow the OpenFlow rules and reach the network.

    Run the following command to display the options available with the command ovn-sbctl:

    $ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-mk6p6 \
    -c northd -- ovn-sbctl --help

25.2.6. Command line arguments for ovn-sbctl to examine southbound database contents

The following table describes the command line arguments that can be used with ovn-sbctl to examine the contents of the southbound database.

Table 25.3. Command line arguments to examine southbound database contents
Argument | Description

ovn-sbctl show

Overview of the southbound database contents.

ovn-sbctl list Port_Binding <port>

List the contents of the southbound database for the specified port.

ovn-sbctl dump-flows

List the logical flows.
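
For example, to list the southbound database record for one of the ports shown in the earlier ovn-sbctl show output, you can run a command similar to the following. The pod and port names come from the earlier example output; substitute names from your own cluster:

$ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-mk6p6 \
-c northd -- ovn-sbctl list Port_Binding openshift-dns_dns-default-ttpcq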

25.2.7. OVN-Kubernetes logical architecture

OVN is a network virtualization solution. It creates logical switches and routers. These switches and routers are interconnected to create any network topology. When you run ovnkube-trace with the log level set to 2 or 5, the OVN-Kubernetes logical components are exposed. The following diagram shows how the routers and switches are connected in OpenShift Container Platform.

Figure 25.2. OVN-Kubernetes router and switch components

OVN-Kubernetes logical architecture

The key components involved in packet processing are:

Gateway routers
Gateway routers, sometimes called L3 gateway routers, are typically used between the distributed routers and the physical network. Gateway routers, including their logical patch ports, are bound to a physical location (not distributed), or chassis. The patch ports on this router are known as l3gateway ports in the ovn-southbound database (ovn-sbdb).
Distributed logical routers
Distributed logical routers and the logical switches behind them, to which virtual machines and containers attach, effectively reside on each hypervisor.
Join local switch
Join local switches are used to connect the distributed router and the gateway routers. A join switch reduces the number of IP addresses needed on the distributed router.
Logical switches with patch ports
Logical switches with patch ports are used to virtualize the network stack. They connect remote logical ports through tunnels.
Logical switches with localnet ports
Logical switches with localnet ports are used to connect OVN to the physical network. They connect remote logical ports by bridging the packets to directly connected physical L2 segments using localnet ports.
Patch ports
Patch ports represent connectivity between logical switches and logical routers and between peer logical routers. A single connection has a pair of patch ports at each such point of connectivity, one on each side.
l3gateway ports
l3gateway ports are the port binding entries in the ovn-sbdb for logical patch ports used in the gateway routers. They are called l3gateway ports rather than patch ports to indicate that these ports are bound to a chassis, just like the gateway router itself.
localnet ports
localnet ports are present on the bridged logical switches that allow a connection to a locally accessible network from each ovn-controller instance. This helps model the direct connectivity to the physical network from the logical switches. A logical switch can have only a single localnet port attached to it.
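
For example, to see how the join switch connects the gateway routers to the distributed router, you can list its ports. This is a sketch that reuses the master pod name from the earlier examples; use a master pod from your own cluster:

$ oc exec -n openshift-ovn-kubernetes -it ovnkube-master-gt4ms \
-c northd -- ovn-nbctl lsp-list join
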
25.2.7.1. Installing network-tools on the local host

Install network-tools on your local host to make a collection of tools available for debugging OpenShift Container Platform cluster network issues.

Procedure

  1. Clone the network-tools repository onto your workstation with the following command:

    $ git clone git@github.com:openshift/network-tools.git
  2. Change into the directory for the repository you just cloned:

    $ cd network-tools
  3. Optional: List all available commands:

    $ ./debug-scripts/network-tools -h
25.2.7.2. Running network-tools

Get information about the logical switches and routers by running network-tools.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster as a user with cluster-admin privileges.
  • You have installed network-tools on the local host.

Procedure

  1. List the routers by running the following command:

    $ ./debug-scripts/network-tools ovn-db-run-command ovn-nbctl lr-list

    Example output

    Leader pod is ovnkube-master-vslqm
    5351ddd1-f181-4e77-afc6-b48b0a9df953 (GR_helix13.lab.eng.tlv2.redhat.com)
    ccf9349e-1948-4df8-954e-39fb0c2d4d06 (GR_helix14.lab.eng.tlv2.redhat.com)
    e426b918-75a8-4220-9e76-20b7758f92b7 (GR_hlxcl7-master-0.hlxcl7.lab.eng.tlv2.redhat.com)
    dded77c8-0cc3-4b99-8420-56cd2ae6a840 (GR_hlxcl7-master-1.hlxcl7.lab.eng.tlv2.redhat.com)
    4f6747e6-e7ba-4e0c-8dcd-94c8efa51798 (GR_hlxcl7-master-2.hlxcl7.lab.eng.tlv2.redhat.com)
    52232654-336e-4952-98b9-0b8601e370b4 (ovn_cluster_router)

  2. List the localnet ports by running the following command:

    $ ./debug-scripts/network-tools ovn-db-run-command \
    ovn-sbctl find Port_Binding type=localnet

    Example output

    Leader pod is ovnkube-master-vslqm
    _uuid               : 3de79191-cca8-4c28-be5a-a228f0f9ebfc
    additional_chassis  : []
    additional_encap    : []
    chassis             : []
    datapath            : 3f1a4928-7ff5-471f-9092-fe5f5c67d15c
    encap               : []
    external_ids        : {}
    gateway_chassis     : []
    ha_chassis_group    : []
    logical_port        : br-ex_helix13.lab.eng.tlv2.redhat.com
    mac                 : [unknown]
    nat_addresses       : []
    options             : {network_name=physnet}
    parent_port         : []
    port_security       : []
    requested_additional_chassis: []
    requested_chassis   : []
    tag                 : []
    tunnel_key          : 2
    type                : localnet
    up                  : false
    virtual_parent      : []
    
    _uuid               : dbe21daf-9594-4849-b8f0-5efbfa09a455
    additional_chassis  : []
    additional_encap    : []
    chassis             : []
    datapath            : db2a6067-fe7c-4d11-95a7-ff2321329e11
    encap               : []
    external_ids        : {}
    gateway_chassis     : []
    ha_chassis_group    : []
    logical_port        : br-ex_hlxcl7-master-2.hlxcl7.lab.eng.tlv2.redhat.com
    mac                 : [unknown]
    nat_addresses       : []
    options             : {network_name=physnet}
    parent_port         : []
    port_security       : []
    requested_additional_chassis: []
    requested_chassis   : []
    tag                 : []
    tunnel_key          : 2
    type                : localnet
    up                  : false
    virtual_parent      : []
    
    [...]

  3. List the l3gateway ports by running the following command:

    $ ./debug-scripts/network-tools ovn-db-run-command \
    ovn-sbctl find Port_Binding type=l3gateway

    Example output

    Leader pod is ovnkube-master-vslqm
    _uuid               : 9314dc80-39e1-4af7-9cc0-ae8a9708ed59
    additional_chassis  : []
    additional_encap    : []
    chassis             : 336a923d-99e8-4e71-89a6-12564fde5760
    datapath            : db2a6067-fe7c-4d11-95a7-ff2321329e11
    encap               : []
    external_ids        : {}
    gateway_chassis     : []
    ha_chassis_group    : []
    logical_port        : etor-GR_hlxcl7-master-2.hlxcl7.lab.eng.tlv2.redhat.com
    mac                 : ["52:54:00:3e:95:d3"]
    nat_addresses       : ["52:54:00:3e:95:d3 10.46.56.77"]
    options             : {l3gateway-chassis="7eb1f1c3-87c2-4f68-8e89-60f5ca810971", peer=rtoe-GR_hlxcl7-master-2.hlxcl7.lab.eng.tlv2.redhat.com}
    parent_port         : []
    port_security       : []
    requested_additional_chassis: []
    requested_chassis   : []
    tag                 : []
    tunnel_key          : 1
    type                : l3gateway
    up                  : true
    virtual_parent      : []
    
    _uuid               : ad7eb303-b411-4e9f-8d36-d07f1f268e27
    additional_chassis  : []
    additional_encap    : []
    chassis             : f41453b8-29c5-4f39-b86b-e82cf344bce4
    datapath            : 082e7a60-d9c7-464b-b6ec-117d3426645a
    encap               : []
    external_ids        : {}
    gateway_chassis     : []
    ha_chassis_group    : []
    logical_port        : etor-GR_helix14.lab.eng.tlv2.redhat.com
    mac                 : ["34:48:ed:f3:e2:2c"]
    nat_addresses       : ["34:48:ed:f3:e2:2c 10.46.56.14"]
    options             : {l3gateway-chassis="2e8abe3a-cb94-4593-9037-f5f9596325e2", peer=rtoe-GR_helix14.lab.eng.tlv2.redhat.com}
    parent_port         : []
    port_security       : []
    requested_additional_chassis: []
    requested_chassis   : []
    tag                 : []
    tunnel_key          : 1
    type                : l3gateway
    up                  : true
    virtual_parent      : []
    
    [...]

  4. List the patch ports by running the following command:

    $ ./debug-scripts/network-tools ovn-db-run-command \
    ovn-sbctl find Port_Binding type=patch

    Example output

    Leader pod is ovnkube-master-vslqm
    _uuid               : c48b1380-ff26-4965-a644-6bd5b5946c61
    additional_chassis  : []
    additional_encap    : []
    chassis             : []
    datapath            : 72734d65-fae1-4bd9-a1ee-1bf4e085a060
    encap               : []
    external_ids        : {}
    gateway_chassis     : []
    ha_chassis_group    : []
    logical_port        : jtor-ovn_cluster_router
    mac                 : [router]
    nat_addresses       : []
    options             : {peer=rtoj-ovn_cluster_router}
    parent_port         : []
    port_security       : []
    requested_additional_chassis: []
    requested_chassis   : []
    tag                 : []
    tunnel_key          : 4
    type                : patch
    up                  : false
    virtual_parent      : []
    
    _uuid               : 5df51302-f3cd-415b-a059-ac24389938f7
    additional_chassis  : []
    additional_encap    : []
    chassis             : []
    datapath            : 0551c90f-e891-4909-8e9e-acc7909e06d0
    encap               : []
    external_ids        : {}
    gateway_chassis     : []
    ha_chassis_group    : []
    logical_port        : rtos-hlxcl7-master-1.hlxcl7.lab.eng.tlv2.redhat.com
    mac                 : ["0a:58:0a:82:00:01 10.130.0.1/23"]
    nat_addresses       : []
    options             : {chassis-redirect-port=cr-rtos-hlxcl7-master-1.hlxcl7.lab.eng.tlv2.redhat.com, peer=stor-hlxcl7-master-1.hlxcl7.lab.eng.tlv2.redhat.com}
    parent_port         : []
    port_security       : []
    requested_additional_chassis: []
    requested_chassis   : []
    tag                 : []
    tunnel_key          : 4
    type                : patch
    up                  : false
    virtual_parent      : []
    
    [...]
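
You can query other port types by using the same pattern. For example, to list the chassis-redirect ports (the cr- ports that appear in the earlier ovn-sbctl show output), run the following command:

$ ./debug-scripts/network-tools ovn-db-run-command \
ovn-sbctl find Port_Binding type=chassisredirect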

25.2.8. Additional resources

25.3. Troubleshooting OVN-Kubernetes

OVN-Kubernetes has many sources of built-in health checks and logs.

25.3.1. Monitoring OVN-Kubernetes health by using readiness probes

The ovnkube-master and ovnkube-node pods have containers configured with readiness probes.

Prerequisites

  • Access to the OpenShift CLI (oc).
  • You have access to the cluster with cluster-admin privileges.
  • You have installed jq.

Procedure

  1. Review the details of the ovnkube-master readiness probe by running the following command:

    $ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master \
    -o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'

    The readiness probe for the northbound and southbound database containers in the ovnkube-master pod checks for the health of the Raft cluster hosting the databases.

  2. Review the details of the ovnkube-node readiness probe by running the following command:

    $ oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node \
    -o json | jq '.items[0].spec.containers[] | .name,.readinessProbe'

    The ovnkube-node container in the ovnkube-node pod has a readiness probe to verify the presence of the ovn-kubernetes CNI configuration file, the absence of which would indicate that the pod is not running or is not ready to accept requests to configure pods.

  3. Show all events, including the probe failures, for the namespace by using the following command:

    $ oc get events -n openshift-ovn-kubernetes
  4. Show the events for just this pod:

    $ oc describe pod ovnkube-master-tp2z8 -n openshift-ovn-kubernetes
  5. Show the messages and statuses from the cluster network operator:

    $ oc get co/network -o json | jq '.status.conditions[]'
  6. Show the ready status of each container in ovnkube-master pods by running the following script:

    $ for p in $(oc get pods --selector app=ovnkube-master -n openshift-ovn-kubernetes \
    -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); do echo === $p ===;  \
    oc get pods -n openshift-ovn-kubernetes $p -o json | jq '.status.containerStatuses[] | .name, .ready'; \
    done
    Note

    The expectation is all container statuses are reporting as true. Failure of a readiness probe sets the status to false.
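
A similar loop, using the same commands with the ovnkube-node selector, shows the readiness of each container in the ovnkube-node pods:

$ for p in $(oc get pods --selector app=ovnkube-node -n openshift-ovn-kubernetes \
-o jsonpath='{range.items[*]}{" "}{.metadata.name}'); do echo === $p ===;  \
oc get pods -n openshift-ovn-kubernetes $p -o json | jq '.status.containerStatuses[] | .name, .ready'; \
done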

25.3.2. Viewing OVN-Kubernetes alerts in the console

The Alerting UI provides detailed information about alerts and their governing alerting rules and silences.

Prerequisites

  • You have access to the cluster as a developer or as a user with view permissions for the project that you are viewing metrics for.

Procedure (UI)

  1. In the Administrator perspective, select Observe → Alerting. The three main pages in the Alerting UI in this perspective are the Alerts, Silences, and Alerting Rules pages.
  2. View the rules for OVN-Kubernetes alerts by selecting Observe → Alerting → Alerting Rules.

25.3.3. Viewing OVN-Kubernetes alerts in the CLI

You can get information about alerts and their governing alerting rules and silences from the command line.

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • The OpenShift CLI (oc) installed.
  • You have installed jq.

Procedure

  1. View active or firing alerts by running the following commands.

    1. Set the alert manager route environment variable by running the following command:

      $ ALERT_MANAGER=$(oc get route alertmanager-main -n openshift-monitoring \
      -o jsonpath='{@.spec.host}')
    2. Issue a curl request to the alert manager route API with the correct authorization details requesting specific fields by running the following command:

      $ curl -s -k -H "Authorization: Bearer \
      $(oc create token prometheus-k8s -n openshift-monitoring)" \
      https://$ALERT_MANAGER/api/v1/alerts \
      | jq '.data[] | "\(.labels.severity) \(.labels.alertname) \(.labels.pod) \(.labels.container) \(.labels.endpoint) \(.labels.instance)"'
  2. View alerting rules by running the following command:

    $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s 'http://localhost:9090/api/v1/rules' | jq '.data.groups[].rules[] | select(((.name|contains("ovn")) or (.name|contains("OVN")) or (.name|contains("Ovn")) or (.name|contains("North")) or (.name|contains("South"))) and .type=="alerting")'
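
For example, to narrow the Alertmanager output from the first step to OVN-Kubernetes related alerts, you can replace the jq expression with a filter similar to the following. The alert name pattern is an assumption; adjust it to match your alerting rules:

$ curl -s -k -H "Authorization: Bearer \
$(oc create token prometheus-k8s -n openshift-monitoring)" \
https://$ALERT_MANAGER/api/v1/alerts \
| jq '.data[] | select((.labels.alertname // "") | test("ovn|OVN|North|South"))'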

25.3.4. Viewing the OVN-Kubernetes logs using the CLI

You can view the logs for each of the pods in the ovnkube-master and ovnkube-node pods using the OpenShift CLI (oc).

Prerequisites

  • Access to the cluster as a user with the cluster-admin role.
  • Access to the OpenShift CLI (oc).
  • You have installed jq.

Procedure

  1. View the log for a specific pod:

    $ oc logs -f <pod_name> -c <container_name> -n <namespace>

    where:

    -f
    Optional: Specifies that the output follows what is being written into the logs.
    <pod_name>
    Specifies the name of the pod.
    <container_name>
    Optional: Specifies the name of a container. When a pod has more than one container, you must specify the container name.
    <namespace>
    Specifies the namespace the pod is running in.

    For example:

    $ oc logs ovnkube-master-7h4q7 -n openshift-ovn-kubernetes
    $ oc logs -f ovnkube-master-7h4q7 -n openshift-ovn-kubernetes -c ovn-dbchecker

    The contents of log files are printed out.

  2. Examine the most recent entries in all the containers in the ovnkube-master pods:

    $ for p in $(oc get pods --selector app=ovnkube-master -n openshift-ovn-kubernetes \
    -o jsonpath='{range.items[*]}{" "}{.metadata.name}'); \
    do echo === $p ===; for container in $(oc get pods -n openshift-ovn-kubernetes $p \
    -o json | jq -r '.status.containerStatuses[] | .name');do echo ---$container---; \
    oc logs -c $container $p -n openshift-ovn-kubernetes --tail=5; done; done
  3. View the last 5 lines of every log in every container in an ovnkube-master pod using the following command:

    $ oc logs -l app=ovnkube-master -n openshift-ovn-kubernetes --all-containers --tail 5
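
The same approach works for the ovnkube-node pods. For example, the following command shows the last 5 lines of every container log in the ovnkube-node pods; if your cluster has many nodes, you might also need to raise the oc logs --max-log-requests limit:

$ oc logs -l app=ovnkube-node -n openshift-ovn-kubernetes --all-containers --tail 5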

25.3.5. Viewing the OVN-Kubernetes logs using the web console

You can view the logs for each of the pods in the ovnkube-master and ovnkube-node pods in the web console.

Prerequisites

  • Access to the OpenShift CLI (oc).

Procedure

  1. In the OpenShift Container Platform console, navigate to Workloads → Pods or navigate to the pod through the resource you want to investigate.
  2. Select the openshift-ovn-kubernetes project from the drop-down menu.
  3. Click the name of the pod you want to investigate.
  4. Click Logs. By default for the ovnkube-master pod, the logs associated with the northd container are displayed.
  5. Use the drop-down menu to select logs for each container in turn.
25.3.5.1. Changing the OVN-Kubernetes log levels

The default log level for OVN-Kubernetes is 2. To debug OVN-Kubernetes, set the log level to 5. Follow this procedure to increase the log level of OVN-Kubernetes to help you debug an issue.

Prerequisites

  • You have access to the cluster with cluster-admin privileges.
  • You have access to the OpenShift Container Platform web console.

Procedure

  1. Run the following command to get detailed information for all pods in the OVN-Kubernetes project:

    $ oc get po -o wide -n openshift-ovn-kubernetes

    Example output

    NAME                   READY   STATUS    RESTARTS      AGE   IP             NODE                           NOMINATED NODE   READINESS GATES
    ovnkube-master-84nc9   6/6     Running   0             50m   10.0.134.156   ip-10-0-134-156.ec2.internal   <none>           <none>
    ovnkube-master-gmlqv   6/6     Running   0             50m   10.0.209.180   ip-10-0-209-180.ec2.internal   <none>           <none>
    ovnkube-master-nhts2   6/6     Running   1 (48m ago)   50m   10.0.147.31    ip-10-0-147-31.ec2.internal    <none>           <none>
    ovnkube-node-2cbh8     5/5     Running   0             43m   10.0.217.114   ip-10-0-217-114.ec2.internal   <none>           <none>
    ovnkube-node-6fvzl     5/5     Running   0             50m   10.0.147.31    ip-10-0-147-31.ec2.internal    <none>           <none>
    ovnkube-node-f4lzz     5/5     Running   0             24m   10.0.146.76    ip-10-0-146-76.ec2.internal    <none>           <none>
    ovnkube-node-jf67d     5/5     Running   0             50m   10.0.209.180   ip-10-0-209-180.ec2.internal   <none>           <none>
    ovnkube-node-np9mf     5/5     Running   0             40m   10.0.165.191   ip-10-0-165-191.ec2.internal   <none>           <none>
    ovnkube-node-qjldg     5/5     Running   0             50m   10.0.134.156   ip-10-0-134-156.ec2.internal   <none>           <none>

  2. Create a ConfigMap file similar to the following example and use a filename such as env-overrides.yaml:

    Example ConfigMap file

    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: env-overrides
      namespace: openshift-ovn-kubernetes
    data:
      ip-10-0-217-114.ec2.internal: | 1
        # This sets the log level for the ovn-kubernetes node process:
        OVN_KUBE_LOG_LEVEL=5
        # You might also/instead want to enable debug logging for ovn-controller:
        OVN_LOG_LEVEL=dbg
      ip-10-0-209-180.ec2.internal: |
        # This sets the log level for the ovn-kubernetes node process:
        OVN_KUBE_LOG_LEVEL=5
        # You might also/instead want to enable debug logging for ovn-controller:
        OVN_LOG_LEVEL=dbg
      _master: | 2
        # This sets the log level for the ovn-kubernetes master process as well as the ovn-dbchecker:
        OVN_KUBE_LOG_LEVEL=5
        # You might also/instead want to enable debug logging for northd, nbdb and sbdb on all masters:
        OVN_LOG_LEVEL=dbg

    1
    Specify the name of the node you want to set the debug log level on.
    2
    Specify _master to set the log levels of ovnkube-master components.
  3. Apply the ConfigMap file by using the following command:

    $ oc apply -n openshift-ovn-kubernetes -f env-overrides.yaml

    Example output

    configmap/env-overrides created

  4. Restart the ovnkube pods to apply the new log level by using the following commands:

    $ oc delete pod -n openshift-ovn-kubernetes \
    --field-selector spec.nodeName=ip-10-0-217-114.ec2.internal -l app=ovnkube-node
    $ oc delete pod -n openshift-ovn-kubernetes \
    --field-selector spec.nodeName=ip-10-0-209-180.ec2.internal -l app=ovnkube-node
    $ oc delete pod -n openshift-ovn-kubernetes -l app=ovnkube-master
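
To return to the default log levels later, one approach is to delete the ConfigMap and then restart the affected pods again, as in the previous step. For example:

$ oc delete configmap env-overrides -n openshift-ovn-kubernetes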

25.3.6. Checking the OVN-Kubernetes pod network connectivity

The connectivity check controller, in OpenShift Container Platform 4.10 and later, orchestrates connection verification checks in your cluster. These include the Kubernetes API, the OpenShift API, and individual nodes. The results of the connection tests are stored in PodNetworkConnectivityCheck objects in the openshift-network-diagnostics namespace. Connection tests are performed every minute in parallel.

Prerequisites

  • Access to the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.
  • You have installed jq.

Procedure

  1. To list the current PodNetworkConnectivityCheck objects, enter the following command:

    $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics
  2. View the most recent success for each connection object by using the following command:

    $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \
    -o json | jq '.items[]| .spec.targetEndpoint,.status.successes[0]'
  3. View the most recent failures for each connection object by using the following command:

    $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \
    -o json | jq '.items[]| .spec.targetEndpoint,.status.failures[0]'
  4. View the most recent outages for each connection object by using the following command:

    $ oc get podnetworkconnectivitychecks -n openshift-network-diagnostics \
    -o json | jq '.items[]| .spec.targetEndpoint,.status.outages[0]'

    The connectivity check controller also logs metrics from these checks into Prometheus.

  5. View all the metrics by running the following command:

    $ oc exec prometheus-k8s-0 -n openshift-monitoring -- \
    promtool query instant  http://localhost:9090 \
    '{component="openshift-network-diagnostics"}'
  6. View the latency between the source pod and the OpenShift API service for the last 5 minutes:

    $ oc exec prometheus-k8s-0 -n openshift-monitoring -- \
    promtool query instant  http://localhost:9090 \
    '{component="openshift-network-diagnostics"}'
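
To inspect a single check in detail, including its full success, failure, and outage history, you can describe one of the objects listed in the first step, for example:

$ oc describe podnetworkconnectivitycheck <check_name> -n openshift-network-diagnostics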

25.3.7. Additional resources

25.4. Tracing Openflow with ovnkube-trace

OVN and OVS traffic flows can be simulated in a single utility called ovnkube-trace. The ovnkube-trace utility runs ovn-trace, ovs-appctl ofproto/trace and ovn-detrace and correlates that information in a single output.

You can execute the ovnkube-trace binary from a dedicated container. For releases after OpenShift Container Platform 4.7, you can also copy the binary to a local host and execute it from that host.

Note

The binaries in the Quay images do not currently work for dual IP stack or IPv6-only environments. For those environments, you must build from source.

25.4.1. Installing ovnkube-trace on the local host

The ovnkube-trace tool traces packet simulations for arbitrary UDP or TCP traffic between points in an OVN-Kubernetes driven OpenShift Container Platform cluster. Copy the ovnkube-trace binary to your local host, making it available to run against the cluster.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.

Procedure

  1. Create a pod variable by using the following command:

    $  POD=$(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-master -o name | head -1 | awk -F '/' '{print $NF}')
  2. Run the following command on your local host to copy the binary from the ovnkube-master pods:

    $  oc cp -n openshift-ovn-kubernetes $POD:/usr/bin/ovnkube-trace ovnkube-trace
  3. Make ovnkube-trace executable by running the following command:

    $  chmod +x ovnkube-trace
  4. Display the options available with ovnkube-trace by running the following command:

    $  ./ovnkube-trace -help

    Expected output

    I0111 15:05:27.973305  204872 ovs.go:90] Maximum command line arguments set to: 191102
    Usage of ./ovnkube-trace:
      -dst string
        	dest: destination pod name
      -dst-ip string
        	destination IP address (meant for tests to external targets)
      -dst-namespace string
        	k8s namespace of dest pod (default "default")
      -dst-port string
        	dst-port: destination port (default "80")
      -kubeconfig string
        	absolute path to the kubeconfig file
      -loglevel string
        	loglevel: klog level (default "0")
      -ovn-config-namespace string
        	namespace used by ovn-config itself
      -service string
        	service: destination service name
      -skip-detrace
        	skip ovn-detrace command
      -src string
        	src: source pod name
      -src-namespace string
        	k8s namespace of source pod (default "default")
      -tcp
        	use tcp transport protocol
      -udp
        	use udp transport protocol

    The command-line arguments supported are familiar Kubernetes constructs, such as namespaces, pods, and services, so you do not need to find the MAC address, the IP address of the destination nodes, or the ICMP type.

    The log levels are:

    • 0 (minimal output)
    • 2 (more verbose output showing results of trace commands)
    • 5 (debug output)

25.4.2. Running ovnkube-trace

Run ovn-trace to simulate packet forwarding within an OVN logical network.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You have installed ovnkube-trace on the local host.

Example: Testing that DNS resolution works from a deployed pod

This example illustrates how to test the DNS resolution from a deployed pod to the core DNS pod that runs in the cluster.

Procedure

  1. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
  2. List the pods running in the openshift-dns namespace:

    $ oc get pods -n openshift-dns

    Example output

    NAME                  READY   STATUS    RESTARTS   AGE
    dns-default-467qw     2/2     Running   0          49m
    dns-default-6prvx     2/2     Running   0          53m
    dns-default-fkqr8     2/2     Running   0          53m
    dns-default-qv2rg     2/2     Running   0          49m
    dns-default-s29vr     2/2     Running   0          49m
    dns-default-vdsbn     2/2     Running   0          53m
    node-resolver-6thtt   1/1     Running   0          53m
    node-resolver-7ksdn   1/1     Running   0          49m
    node-resolver-8sthh   1/1     Running   0          53m
    node-resolver-c5ksw   1/1     Running   0          50m
    node-resolver-gbvdp   1/1     Running   0          53m
    node-resolver-sxhkd   1/1     Running   0          50m

  3. Run the following ovnkube-trace command to verify that DNS resolution is working:

    $ ./ovnkube-trace \
      -src-namespace default \ 1
      -src web \ 2
      -dst-namespace openshift-dns \ 3
      -dst dns-default-467qw \ 4
      -udp -dst-port 53 \ 5
      -loglevel 0 6
    1
    Namespace of the source pod
    2
    Source pod name
    3
    Namespace of destination pod
    4
    Destination pod name
    5
    Use the udp transport protocol. Port 53 is the port the DNS service uses.
    6
    Set the log level to 0 (0 is minimal and 5 is debug)

    Expected output

    I0116 10:19:35.601303   17900 ovs.go:90] Maximum command line arguments set to: 191102
    ovn-trace source pod to destination pod indicates success from web to dns-default-467qw
    ovn-trace destination pod to source pod indicates success from dns-default-467qw to web
    ovs-appctl ofproto/trace source pod to destination pod indicates success from web to dns-default-467qw
    ovs-appctl ofproto/trace destination pod to source pod indicates success from dns-default-467qw to web
    ovn-detrace source pod to destination pod indicates success from web to dns-default-467qw
    ovn-detrace destination pod to source pod indicates success from dns-default-467qw to web

    The output indicates success from the deployed pod to the DNS port and also indicates that traffic is successful in the other direction. So you know bidirectional traffic is supported on UDP port 53 if the web pod wants to do DNS resolution from core DNS.

If, for example, that did not work and you wanted to get the ovn-trace, the ovs-appctl ofproto/trace, the ovn-detrace, and more debug-type information, increase the log level to 2 and run the command again as follows:

$ ./ovnkube-trace \
  -src-namespace default \
  -src web \
  -dst-namespace openshift-dns \
  -dst dns-default-467qw \
  -udp -dst-port 53 \
  -loglevel 2

The output from this increased log level is too much to list here. In a failure situation, the output of this command shows which flow is dropping the traffic. For example, an egress or ingress network policy that does not allow that traffic might be configured on the cluster.

Example: Verifying by using debug output a configured default deny

This example illustrates how to use the debug output to identify that an ingress default deny policy blocks traffic.

Procedure

  1. Create the following YAML that defines a deny-by-default policy to deny ingress from all pods in all namespaces. Save the YAML in the deny-by-default.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: deny-by-default
      namespace: default
    spec:
      podSelector: {}
      ingress: []
  2. Apply the policy by entering the following command:

    $ oc apply -f deny-by-default.yaml

    Example output

    networkpolicy.networking.k8s.io/deny-by-default created

  3. Start a web service in the default namespace by entering the following command:

    $ oc run web --namespace=default --image=nginx --labels="app=web" --expose --port=80
  4. Run the following command to create the prod namespace:

    $ oc create namespace prod
  5. Run the following command to label the prod namespace:

    $ oc label namespace/prod purpose=production
  6. Run the following command to deploy an alpine image in the prod namespace and start a shell:

    $ oc run test-6459 --namespace=prod --rm -i -t --image=alpine -- sh
  7. Open another terminal session.
  8. In this new terminal session, run ovnkube-trace to verify the failure in communication between the source pod test-6459 running in the namespace prod and the destination pod running in the default namespace:

    $ ./ovnkube-trace \
     -src-namespace prod \
     -src test-6459 \
     -dst-namespace default \
     -dst web \
     -tcp -dst-port 80 \
     -loglevel 0

    Expected output

    I0116 14:20:47.380775   50822 ovs.go:90] Maximum command line arguments set to: 191102
    ovn-trace source pod to destination pod indicates failure from test-6459 to web

  9. Increase the log level to 2 to expose the reason for the failure by running the following command:

    $ ./ovnkube-trace \
     -src-namespace prod \
     -src test-6459 \
     -dst-namespace default \
     -dst web \
     -tcp -dst-port 80 \
     -loglevel 2

    Expected output

    ct_lb_mark /* default (use --ct to customize) */
    ------------------------------------------------
     3. ls_out_acl_hint (northd.c:6092): !ct.new && ct.est && !ct.rpl && ct_mark.blocked == 0, priority 4, uuid 32d45ad4
        reg0[8] = 1;
        reg0[10] = 1;
        next;
     4. ls_out_acl (northd.c:6435): reg0[10] == 1 && (outport == @a16982411286042166782_ingressDefaultDeny), priority 2000, uuid f730a887 1
        ct_commit { ct_mark.blocked = 1; };

    1
    Ingress traffic is blocked due to the default deny policy being in place
  10. Create a policy that allows traffic from all pods in namespaces with the label purpose=production. Save the YAML in the web-allow-prod.yaml file:

    kind: NetworkPolicy
    apiVersion: networking.k8s.io/v1
    metadata:
      name: web-allow-prod
      namespace: default
    spec:
      podSelector:
        matchLabels:
          app: web
      policyTypes:
      - Ingress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              purpose: production
  11. Apply the policy by entering the following command:

    $ oc apply -f web-allow-prod.yaml
  12. Run ovnkube-trace to verify that traffic is now allowed by entering the following command:

    $ ./ovnkube-trace \
     -src-namespace prod \
     -src test-6459 \
     -dst-namespace default \
     -dst web \
     -tcp -dst-port 80 \
     -loglevel 0

    Expected output

    I0116 14:25:44.055207   51695 ovs.go:90] Maximum command line arguments set to: 191102
    ovn-trace source pod to destination pod indicates success from test-6459 to web
    ovn-trace destination pod to source pod indicates success from web to test-6459
    ovs-appctl ofproto/trace source pod to destination pod indicates success from test-6459 to web
    ovs-appctl ofproto/trace destination pod to source pod indicates success from web to test-6459
    ovn-detrace source pod to destination pod indicates success from test-6459 to web
    ovn-detrace destination pod to source pod indicates success from web to test-6459

  13. In the open shell, run the following command:

     wget -qO- --timeout=2 http://web.default

    Expected output

    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

25.4.3. Additional resources

25.5. Migrating from the OpenShift SDN network plugin

As a cluster administrator, you can migrate to the OVN-Kubernetes network plugin from the OpenShift SDN network plugin.

To learn more about OVN-Kubernetes, read About the OVN-Kubernetes network plugin.

25.5.1. Migration to the OVN-Kubernetes network plugin

Migrating to the OVN-Kubernetes network plugin is a manual process that includes some downtime during which your cluster is unreachable.

Important

Before you migrate your OpenShift Container Platform cluster to use the OVN-Kubernetes network plugin, update your cluster to the latest z-stream release so that all the latest bug fixes apply to your cluster.

Although a rollback procedure is provided, the migration is intended to be a one-way process.

A migration to the OVN-Kubernetes network plugin is supported on the following platforms:

  • Bare metal hardware
  • Amazon Web Services (AWS)
  • Google Cloud Platform (GCP)
  • IBM Cloud®
  • Microsoft Azure
  • Red Hat OpenStack Platform (RHOSP)
  • Red Hat Virtualization (RHV)
  • VMware vSphere
Important

Migrating to or from the OVN-Kubernetes network plugin is not supported for managed OpenShift cloud services such as Red Hat OpenShift Dedicated, Azure Red Hat OpenShift (ARO), and Red Hat OpenShift Service on AWS (ROSA).

Migrating from the OpenShift SDN network plugin to the OVN-Kubernetes network plugin is not supported on Nutanix.

25.5.1.1. Considerations for migrating to the OVN-Kubernetes network plugin

If you have more than 150 nodes in your OpenShift Container Platform cluster, then open a support case for consultation on your migration to the OVN-Kubernetes network plugin.

The subnets assigned to nodes and the IP addresses assigned to individual pods are not preserved during the migration.

While the OVN-Kubernetes network plugin implements many of the capabilities present in the OpenShift SDN network plugin, the configuration is not the same.

  • If your cluster uses any of the following OpenShift SDN network plugin capabilities, you must manually configure the same capability in the OVN-Kubernetes network plugin:

    • Namespace isolation
    • Egress router pods
  • If your cluster or surrounding network uses any part of the 100.64.0.0/16 address range, you must choose another unused IP range by specifying the v4InternalSubnet spec under the spec.defaultNetwork.ovnKubernetesConfig object definition. OVN-Kubernetes uses the IP range 100.64.0.0/16 internally by default.
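
For example, a patch similar to the following sets an alternative internal subnet in the cluster network operator configuration. The 100.68.0.0/16 value is only an illustration; choose a range that is unused in your environment:

$ oc patch network.operator.openshift.io cluster --type=merge \
-p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"v4InternalSubnet":"100.68.0.0/16"}}}}'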

The following sections highlight the differences in configuration between the aforementioned capabilities in OVN-Kubernetes and OpenShift SDN network plugins.

Primary network interface

The OpenShift SDN plugin allows application of the NodeNetworkConfigurationPolicy (NNCP) custom resource (CR) to the primary interface on a node. The OVN-Kubernetes network plugin does not have this capability.

If you have an NNCP applied to the primary interface, you must delete the NNCP before migrating to the OVN-Kubernetes network plugin. Deleting the NNCP does not remove the configuration from the primary interface, but with OVN-Kubernetes, the Kubernetes NMState cannot manage this configuration. Instead, the configure-ovs.sh shell script manages the primary interface and the configuration attached to this interface.

Namespace isolation

OVN-Kubernetes supports only the network policy isolation mode.

Important

For a cluster using OpenShift SDN that is configured in either the multitenant or subnet isolation mode, you can still migrate to the OVN-Kubernetes network plugin. Note that after the migration operation, multitenant isolation mode is dropped, so you must manually configure network policies to achieve the same level of project-level isolation for pods and services.

Egress IP addresses

OpenShift SDN supports two different Egress IP modes:

  • In the automatically assigned approach, an egress IP address range is assigned to a node.
  • In the manually assigned approach, a list of one or more egress IP addresses is assigned to a node.

The migration process supports migrating Egress IP configurations that use the automatically assigned mode.

The differences in configuring an egress IP address between OVN-Kubernetes and OpenShift SDN are described in the following table:

Table 25.4. Differences in egress IP address configuration

OVN-Kubernetes:

  • Create an EgressIPs object
  • Add an annotation on a Node object

OpenShift SDN:

  • Patch a NetNamespace object
  • Patch a HostSubnet object

For more information on using egress IP addresses in OVN-Kubernetes, see "Configuring an egress IP address".
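For example, under OVN-Kubernetes an egress IP is declared with an EgressIP object similar to the following sketch. The IP address and the namespace label are placeholders; see "Configuring an egress IP address" for the full schema:

  $ oc apply -f - <<EOF
  apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    name: egressip-prod
  spec:
    egressIPs:
    - 192.0.2.10
    namespaceSelector:
      matchLabels:
        env: prod
  EOF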

Egress network policies

The difference in configuring an egress network policy, also known as an egress firewall, between OVN-Kubernetes and OpenShift SDN is described in the following table:

Table 25.5. Differences in egress network policy configuration

OVN-Kubernetes:

  • Create an EgressFirewall object in a namespace

OpenShift SDN:

  • Create an EgressNetworkPolicy object in a namespace
Note

Because the name of an EgressFirewall object can only be set to default, after the migration all migrated EgressNetworkPolicy objects are named default, regardless of what the name was under OpenShift SDN.

If you subsequently roll back to OpenShift SDN, all EgressNetworkPolicy objects are named default because the prior name is lost.

For more information on using an egress firewall in OVN-Kubernetes, see "Configuring an egress firewall for a project".
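For example, under OVN-Kubernetes the egress firewall for a project is defined with an EgressFirewall object that must be named default. The following sketch allows traffic to one CIDR block and denies everything else; the CIDR value is a placeholder, and "Configuring an egress firewall for a project" documents the full set of options:

  $ oc apply -n <namespace> -f - <<EOF
  apiVersion: k8s.ovn.org/v1
  kind: EgressFirewall
  metadata:
    name: default
  spec:
    egress:
    - type: Allow
      to:
        cidrSelector: 203.0.113.0/24
    - type: Deny
      to:
        cidrSelector: 0.0.0.0/0
  EOF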

Egress router pods

OVN-Kubernetes supports egress router pods in redirect mode. OVN-Kubernetes does not support egress router pods in HTTP proxy mode or DNS proxy mode.

When you deploy an egress router with the Cluster Network Operator, you cannot specify a node selector to control which node is used to host the egress router pod.

Multicast

The difference between enabling multicast traffic on OVN-Kubernetes and OpenShift SDN is described in the following table:

Table 25.6. Differences in multicast configuration

OVN-Kubernetes:

  • Add an annotation on a Namespace object

OpenShift SDN:

  • Add an annotation on a NetNamespace object

For more information on using multicast in OVN-Kubernetes, see "Enabling multicast for a project".
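For example, the per-project multicast annotations differ between the two plugins. The following commands show the general pattern, with <namespace> as a placeholder; see "Enabling multicast for a project" for the authoritative steps.

OVN-Kubernetes:

  $ oc annotate namespace <namespace> k8s.ovn.org/multicast-enabled=true

OpenShift SDN:

  $ oc annotate netnamespace <namespace> netnamespace.network.openshift.io/multicast-enabled=true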

Network policies

OVN-Kubernetes fully supports the Kubernetes NetworkPolicy API in the networking.k8s.io/v1 API group. No changes are necessary in your network policies when migrating from OpenShift SDN.

25.5.1.2. How the migration process works

The following table summarizes the migration process by pairing each user-initiated step with the actions that the migration performs in response.

Table 25.7. Migrating to OVN-Kubernetes from OpenShift SDN

User-initiated step: Set the migration field of the Network.operator.openshift.io custom resource (CR) named cluster to OVNKubernetes. Make sure the migration field is null before setting it to a value.

Migration activity:
  • Cluster Network Operator (CNO): Updates the status of the Network.config.openshift.io CR named cluster accordingly.
  • Machine Config Operator (MCO): Rolls out an update to the systemd configuration necessary for OVN-Kubernetes; the MCO updates a single machine per pool at a time by default, so the total time the migration takes increases with the size of the cluster.

User-initiated step: Update the networkType field of the Network.config.openshift.io CR.

Migration activity: The CNO performs the following actions:
  • Destroys the OpenShift SDN control plane pods.
  • Deploys the OVN-Kubernetes control plane pods.
  • Updates the Multus objects to reflect the new network plugin.

User-initiated step: Reboot each node in the cluster.

Migration activity: As nodes reboot, the cluster assigns IP addresses to pods on the OVN-Kubernetes cluster network.
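Before you set the migration field, you can confirm that it is currently null. Empty output from a command similar to the following indicates that no migration is in progress:

  $ oc get Network.operator.openshift.io cluster -o jsonpath='{.spec.migration}'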

If a rollback to OpenShift SDN is required, the following table describes the process.

Important

You must wait until the migration process from OpenShift SDN to OVN-Kubernetes network plugin is successful before initiating a rollback.

Table 25.8. Performing a rollback to OpenShift SDN

User-initiated step: Suspend the MCO to ensure that it does not interrupt the migration.

Migration activity: The MCO stops.

User-initiated step: Set the migration field of the Network.operator.openshift.io custom resource (CR) named cluster to OpenShiftSDN. Make sure the migration field is null before setting it to a value.

Migration activity: The CNO updates the status of the Network.config.openshift.io CR named cluster accordingly.

User-initiated step: Update the networkType field.

Migration activity: The CNO performs the following actions:
  • Destroys the OVN-Kubernetes control plane pods.
  • Deploys the OpenShift SDN control plane pods.
  • Updates the Multus objects to reflect the new network plugin.

User-initiated step: Reboot each node in the cluster.

Migration activity: As nodes reboot, the cluster assigns IP addresses to pods on the OpenShift SDN cluster network.

User-initiated step: Enable the MCO after all nodes in the cluster reboot.

Migration activity: The MCO rolls out an update to the systemd configuration necessary for OpenShift SDN; the MCO updates a single machine per pool at a time by default, so the total time the migration takes increases with the size of the cluster.

25.5.2. Migrating to the OVN-Kubernetes network plugin

As a cluster administrator, you can change the network plugin for your cluster to OVN-Kubernetes. During the migration, you must reboot every node in your cluster.

Important

While performing the migration, your cluster is unavailable and workloads might be interrupted. Perform the migration only when an interruption in service is acceptable.

Prerequisites

  • You have a cluster configured with the OpenShift SDN CNI network plugin in the network policy isolation mode.
  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have a recent backup of the etcd database.
  • You can manually reboot each node.
  • You checked that your cluster is in a known good state without any errors.
  • If your cluster runs on a cloud platform, you created a security group rule that allows User Datagram Protocol (UDP) packets on port 6081 between all nodes.
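For example, on AWS you can allow Geneve traffic between cluster nodes by adding a rule to the security group that the nodes use. The security group ID below is a placeholder, and the exact mechanism depends on your platform and how your security groups are organized:

  $ aws ec2 authorize-security-group-ingress \
      --group-id sg-0123456789abcdef0 \
      --protocol udp --port 6081 \
      --source-group sg-0123456789abcdef0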

Procedure

  1. To backup the configuration for the cluster network, enter the following command:

    $ oc get Network.config.openshift.io cluster -o yaml > cluster-openshift-sdn.yaml
  2. Verify that all cluster Operators are available by running the following script. If the OVN_SDN_MIGRATION_TIMEOUT environment variable is set to 0s, the script unsets it so that the default timeout of 1200s is used instead:

    #!/bin/bash

    # If OVN_SDN_MIGRATION_TIMEOUT is set to 0s, unset it so that the
    # default timeout of 1200s is used instead.
    if [ -n "$OVN_SDN_MIGRATION_TIMEOUT" ] && [ "$OVN_SDN_MIGRATION_TIMEOUT" = "0s" ]; then
        unset OVN_SDN_MIGRATION_TIMEOUT
    fi

    # Repeatedly check the cluster Operators until all of them are Available,
    # not Progressing, and not Degraded, or until the timeout expires.
    co_timeout=${OVN_SDN_MIGRATION_TIMEOUT:-1200s}
    timeout "$co_timeout" bash <<EOT
    until
      oc wait co --all --for='condition=AVAILABLE=True' --timeout=10s && \
      oc wait co --all --for='condition=PROGRESSING=False' --timeout=10s && \
      oc wait co --all --for='condition=DEGRADED=False' --timeout=10s;
    do
      sleep 10
      echo "Some cluster Operators are still Degraded=True, Progressing=True, or Available=False"
    done
    EOT
  3. Remove any previous migration configuration from the Cluster Network Operator (CNO) configuration object by running the following command:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
    --patch '{"spec":{"migration":null}}'
  4. Delete the NodeNetworkConfigurationPolicy (NNCP) custom resource (CR) that defines the primary network interface for the OpenShift SDN network plugin by completing the following steps:

    1. Check whether an existing NNCP CR applies a bonded configuration to the primary interface of your cluster by entering the following command:

      $ oc get nncp

      Example output

      NAME          STATUS      REASON
      bondmaster0   Available   SuccessfullyConfigured

      NetworkManager stores the connection profile for the bonded primary interface in the /etc/NetworkManager/system-connections system path.

    2. Remove the NNCP from your cluster:

      $ oc delete nncp <nncp_manifest_filename>
  5. To prepare all the nodes for the migration, set the migration field on the CNO configuration object by running the following command:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": { "networkType": "OVNKubernetes" } } }'
    Note

    This step does not deploy OVN-Kubernetes immediately. Instead, specifying the migration field triggers the Machine Config Operator (MCO) to apply new machine configs to all the nodes in the cluster in preparation for the OVN-Kubernetes deployment.

    1. Check that the reboot is finished by running the following command:

      $ oc get mcp
    2. Check that all cluster Operators are available by running the following command:

      $ oc get co
    3. Optional: As an alternative to the patch shown previously in this step, you can disable automatic migration of several OpenShift SDN capabilities to the OVN-Kubernetes equivalents:

      • Egress IPs
      • Egress firewall
      • Multicast

      To disable automatic migration of the configuration for any of the previously noted OpenShift SDN features, specify the following keys:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{
          "spec": {
            "migration": {
              "networkType": "OVNKubernetes",
              "features": {
                "egressIP": <bool>,
                "egressFirewall": <bool>,
                "multicast": <bool>
              }
            }
          }
        }'

      where:

      bool: Specifies whether to enable migration of the feature. The default is true.

  6. Optional: You can customize the following settings for OVN-Kubernetes to meet your network infrastructure requirements:

    • Maximum transmission unit (MTU). Consider the following before customizing the MTU for this optional step:

      • If you use the default MTU, and you want to keep the default MTU during migration, this step can be ignored.
      • If you used a custom MTU, and you want to keep the custom MTU during migration, you must declare the custom MTU value in this step.
      • This step does not work if you want to change the MTU value during migration. Instead, you must first follow the instructions for "Changing the cluster MTU". You can then keep the custom MTU value by performing this procedure and declaring the custom MTU value in this step.

        Note

        OpenShift-SDN and OVN-Kubernetes have different overlay overhead. MTU values should be selected by following the guidelines found on the "MTU value selection" page.

    • Geneve (Generic Network Virtualization Encapsulation) overlay network port
    • OVN-Kubernetes IPv4 internal subnet
    • OVN-Kubernetes IPv6 internal subnet

    To customize any of the previously noted settings, enter and customize the following command. If you do not need to change the default value, omit the key from the patch.

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "mtu":<mtu>,
              "genevePort":<port>,
              "v4InternalSubnet":"<ipv4_subnet>",
              "v6InternalSubnet":"<ipv6_subnet>"
        }}}}'

    where:

    mtu
    The MTU for the Geneve overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to 100 less than the smallest node MTU value.
    port
    The UDP port for the Geneve overlay network. If a value is not specified, the default is 6081. The port cannot be the same as the VXLAN port that is used by OpenShift SDN. The default value for the VXLAN port is 4789.
    ipv4_subnet
    An IPv4 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is 100.64.0.0/16.
    ipv6_subnet
    An IPv6 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is fd98::/48.

    Example patch command to update mtu field

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "mtu":1200
        }}}}'

  7. As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  8. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
    3. If a node is stuck in the NotReady state, investigate the machine config daemon pod logs and resolve any errors.

      1. To list the pods, enter the following command:

        $ oc get pod -n openshift-machine-config-operator

        Example output

        NAME                                         READY   STATUS    RESTARTS   AGE
        machine-config-controller-75f756f89d-sjp8b   1/1     Running   0          37m
        machine-config-daemon-5cf4b                  2/2     Running   0          43h
        machine-config-daemon-7wzcd                  2/2     Running   0          43h
        machine-config-daemon-fc946                  2/2     Running   0          43h
        machine-config-daemon-g2v28                  2/2     Running   0          43h
        machine-config-daemon-gcl4f                  2/2     Running   0          43h
        machine-config-daemon-l5tnv                  2/2     Running   0          43h
        machine-config-operator-79d9c55d5-hth92      1/1     Running   0          37m
        machine-config-server-bsc8h                  1/1     Running   0          43h
        machine-config-server-hklrm                  1/1     Running   0          43h
        machine-config-server-k9rtx                  1/1     Running   0          43h

        The names for the config daemon pods are in the following format: machine-config-daemon-<seq>. The <seq> value is a random five character alphanumeric sequence.

      2. Display the pod log for the first machine config daemon pod shown in the previous output by entering the following command:

        $ oc logs <pod> -n openshift-machine-config-operator

        where pod is the name of a machine config daemon pod.

      3. Resolve any errors in the logs shown by the output from the previous command.
  9. To start the migration, configure the OVN-Kubernetes network plugin by using one of the following commands:

    • To specify the network provider without changing the cluster network IP address block, enter the following command:

      $ oc patch Network.config.openshift.io cluster \
        --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }'
    • To specify a different cluster network IP address block, enter the following command:

      $ oc patch Network.config.openshift.io cluster \
        --type='merge' --patch '{
          "spec": {
            "clusterNetwork": [
              {
                "cidr": "<cidr>",
                "hostPrefix": <prefix>
              }
            ],
            "networkType": "OVNKubernetes"
          }
        }'

      where cidr is a CIDR block and prefix is the slice of the CIDR block apportioned to each node in your cluster. You cannot use any CIDR block that overlaps with the 100.64.0.0/16 CIDR block because the OVN-Kubernetes network provider uses this block internally.

      Important

      You cannot change the service network address block during the migration.

  10. Verify that the Multus daemon set rollout is complete before continuing with subsequent steps:

    $ oc -n openshift-multus rollout status daemonset/multus

    The name of the Multus pods is in the form of multus-<xxxxx> where <xxxxx> is a random sequence of letters. It might take several moments for the pods to restart.

    Example output

    Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated...
    ...
    Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available...
    daemon set "multus" successfully rolled out

  11. To complete changing the network plugin, reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:

    Important

    The following scripts reboot all of the nodes in the cluster at the same time, which can cause your cluster to become unstable. Another option is to reboot your nodes manually one at a time; rebooting nodes one by one avoids the instability but takes considerably longer in a cluster with many nodes.

    Cluster Operators will not work correctly before you reboot the nodes.

    • With the oc rsh command, you can use a bash script similar to the following:

      #!/bin/bash
      readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')"
      
      for i in "${POD_NODES[@]}"
      do
        read -r POD NODE <<< "$i"
        until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1
          do
            echo "cannot reboot node $NODE, retry" && sleep 3
          done
      done
    • With the ssh command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.

      #!/bin/bash
      
      for ip in $(oc get nodes  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
      do
         echo "reboot node $ip"
         ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
      done
  12. Confirm that the migration succeeded:

    1. To confirm that the network plugin is OVN-Kubernetes, enter the following command. The value of status.networkType must be OVNKubernetes.

      $ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
    2. To confirm that the cluster nodes are in the Ready state, enter the following command:

      $ oc get nodes
    3. To confirm that your pods are not in an error state, enter the following command:

      $ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'

      If pods on a node are in an error state, reboot that node.

    4. To confirm that all of the cluster Operators are not in an abnormal state, enter the following command:

      $ oc get co

      The status of every cluster Operator must be the following: AVAILABLE="True", PROGRESSING="False", DEGRADED="False". If a cluster Operator is not available or degraded, check the logs for the cluster Operator for more information.

  13. Complete the following steps only if the migration succeeds and your cluster is in a good state:

    1. To remove the migration configuration from the CNO configuration object, enter the following command:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "migration": null } }'
    2. To remove custom configuration for the OpenShift SDN network provider, enter the following command:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "defaultNetwork": { "openshiftSDNConfig": null } } }'
    3. To remove the OpenShift SDN network provider namespace, enter the following command:

      $ oc delete namespace openshift-sdn

25.5.3. Additional resources

25.6. Rolling back to the OpenShift SDN network provider

As a cluster administrator, you can roll back to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin only after the migration to the OVN-Kubernetes network plugin is completed and successful.

25.6.1. Migrating to the OpenShift SDN network plugin

Cluster administrators can roll back to the OpenShift SDN Container Network Interface (CNI) network plugin by using the offline migration method. During the migration, you must manually reboot every node in your cluster. With the offline migration method, there is some downtime, during which your cluster is unreachable.

Important

You must wait until the migration process from OpenShift SDN to OVN-Kubernetes network plugin is successful before initiating a rollback.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have a cluster installed on infrastructure configured with the OVN-Kubernetes network plugin.
  • You have a recent backup of the etcd database.
  • You can manually reboot each node.
  • Your cluster is in a known good state, without any errors.

Procedure

  1. Stop all of the machine configuration pools managed by the Machine Config Operator (MCO):

    • Stop the master configuration pool by entering the following command in your CLI:

      $ oc patch MachineConfigPool master --type='merge' --patch \
        '{ "spec": { "paused": true } }'
    • Stop the worker machine configuration pool by entering the following command in your CLI:

      $ oc patch MachineConfigPool worker --type='merge' --patch \
        '{ "spec":{ "paused": true } }'
  2. To prepare for the migration, set the migration field to null by entering the following command in your CLI:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": null } }'
  3. Check that the migration status is empty for the Network.config.openshift.io object by entering the following command in your CLI. Empty command output indicates that the object is not in a migration operation.

    $ oc get Network.config cluster -o jsonpath='{.status.migration}'
  4. Apply the patch to the Network.operator.openshift.io object to set the network plugin back to OpenShift SDN by entering the following command in your CLI:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": { "networkType": "OpenShiftSDN" } } }'
    Important

    If you applied the patch to the Network.config.openshift.io object before the patch operation finalizes on the Network.operator.openshift.io object, the Cluster Network Operator (CNO) enters a degraded state, which causes a slight delay until the CNO recovers.
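    If this happens, you can watch the network cluster Operator until it reports Available=True and Degraded=False again by running a command similar to the following:

    $ oc get co network -w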

  5. Confirm that the migration status of the network plugin for the Network.config.openshift.io cluster object is OpenShiftSDN by entering the following command in your CLI:

    $ oc get Network.config cluster -o jsonpath='{.status.migration.networkType}'
  6. Apply the patch to the Network.config.openshift.io object to set the network plugin back to OpenShift SDN by entering the following command in your CLI:

    $ oc patch Network.config.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "networkType": "OpenShiftSDN" } }'
  7. Optional: Disable automatic migration of several OVN-Kubernetes capabilities to the OpenShift SDN equivalents:

    • Egress IPs
    • Egress firewall
    • Multicast

    To disable automatic migration of the configuration for any of the previously noted OVN-Kubernetes features, specify the following keys:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{
        "spec": {
          "migration": {
            "networkType": "OpenShiftSDN",
            "features": {
              "egressIP": <bool>,
              "egressFirewall": <bool>,
              "multicast": <bool>
            }
          }
        }
      }'

    where:

    bool: Specifies whether to enable migration of the feature. The default is true.

  8. Optional: You can customize the following settings for OpenShift SDN to meet your network infrastructure requirements:

    • Maximum transmission unit (MTU)
    • VXLAN port

    To customize either or both of the previously noted settings, customize and enter the following command in your CLI. If you do not need to change the default value, omit the key from the patch.

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "openshiftSDNConfig":{
              "mtu":<mtu>,
              "vxlanPort":<port>
        }}}}'
    mtu
    The MTU for the VXLAN overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to 50 less than the smallest node MTU value.
    port
    The UDP port for the VXLAN overlay network. If a value is not specified, the default is 4789. The port cannot be the same as the Geneve port that is used by OVN-Kubernetes. The default value for the Geneve port is 6081.

    Example patch command

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "openshiftSDNConfig":{
              "mtu":1200
        }}}}'

  9. Reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:

    • With the oc rsh command, you can use a bash script similar to the following:

      #!/bin/bash
      readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')"
      
      for i in "${POD_NODES[@]}"
      do
        read -r POD NODE <<< "$i"
        until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1
          do
            echo "cannot reboot node $NODE, retry" && sleep 3
          done
      done
    • With the ssh command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.

      #!/bin/bash
      
      for ip in $(oc get nodes  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
      do
         echo "reboot node $ip"
         ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
      done
  10. Wait until the Multus daemon set rollout completes. Run the following command to see your rollout status:

    $ oc -n openshift-multus rollout status daemonset/multus

    The name of the Multus pods is in the form of multus-<xxxxx> where <xxxxx> is a random sequence of letters. It might take several moments for the pods to restart.

    Example output

    Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated...
    ...
    Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available...
    daemon set "multus" successfully rolled out

  11. After the nodes in your cluster have rebooted and the Multus pods are rolled out, start all of the machine configuration pools by running the following commands:

    • Start the master configuration pool:

      $ oc patch MachineConfigPool master --type='merge' --patch \
        '{ "spec": { "paused": false } }'
    • Start the worker configuration pool:

      $ oc patch MachineConfigPool worker --type='merge' --patch \
        '{ "spec": { "paused": false } }'

    As the MCO updates machines in each config pool, it reboots each node.

    By default the MCO updates a single machine per pool at a time, so the time that the migration requires to complete grows with the size of the cluster.

  12. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command in your CLI:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command in your CLI:

      $ oc get machineconfig <config_name> -o yaml

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

  13. Confirm that the migration succeeded:

    1. To confirm that the network plugin is OpenShift SDN, enter the following command in your CLI. The value of status.networkType must be OpenShiftSDN.

      $ oc get Network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
    2. To confirm that the cluster nodes are in the Ready state, enter the following command in your CLI:

      $ oc get nodes
    3. If a node is stuck in the NotReady state, investigate the machine config daemon pod logs and resolve any errors.

      1. To list the pods, enter the following command in your CLI:

        $ oc get pod -n openshift-machine-config-operator

        Example output

        NAME                                         READY   STATUS    RESTARTS   AGE
        machine-config-controller-75f756f89d-sjp8b   1/1     Running   0          37m
        machine-config-daemon-5cf4b                  2/2     Running   0          43h
        machine-config-daemon-7wzcd                  2/2     Running   0          43h
        machine-config-daemon-fc946                  2/2     Running   0          43h
        machine-config-daemon-g2v28                  2/2     Running   0          43h
        machine-config-daemon-gcl4f                  2/2     Running   0          43h
        machine-config-daemon-l5tnv                  2/2     Running   0          43h
        machine-config-operator-79d9c55d5-hth92      1/1     Running   0          37m
        machine-config-server-bsc8h                  1/1     Running   0          43h
        machine-config-server-hklrm                  1/1     Running   0          43h
        machine-config-server-k9rtx                  1/1     Running   0          43h

        The names for the config daemon pods are in the following format: machine-config-daemon-<seq>. The <seq> value is a random five character alphanumeric sequence.

      2. To display the pod log for each machine config daemon pod shown in the previous output, enter the following command in your CLI:

        $ oc logs <pod> -n openshift-machine-config-operator

        where pod is the name of a machine config daemon pod.

      3. Resolve any errors in the logs shown by the output from the previous command.
    4. To confirm that your pods are not in an error state, enter the following command in your CLI:

      $ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'

      If pods on a node are in an error state, reboot that node.

  14. Complete the following steps only if the migration succeeds and your cluster is in a good state:

    1. To remove the migration configuration from the Cluster Network Operator configuration object, enter the following command in your CLI:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "migration": null } }'
    2. To remove the OVN-Kubernetes configuration, enter the following command in your CLI:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "defaultNetwork": { "ovnKubernetesConfig":null } } }'
    3. To remove the OVN-Kubernetes network provider namespace, enter the following command in your CLI:

      $ oc delete namespace openshift-ovn-kubernetes

25.7. Migrating from the Kuryr network plugin to the OVN-Kubernetes network plugin

Important

Migration from Kuryr to OVN-Kubernetes is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

As the administrator of a cluster that runs on Red Hat OpenStack Platform (RHOSP), you can migrate to the OVN-Kubernetes network plugin from the Kuryr SDN network plugin.

To learn more about OVN-Kubernetes, read About the OVN-Kubernetes network plugin.

25.7.1. Migration to the OVN-Kubernetes network provider

You can manually migrate a cluster that runs on Red Hat OpenStack Platform (RHOSP) to the OVN-Kubernetes network provider.

Important

Migration to OVN-Kubernetes is a one-way process. During migration, your cluster will be unreachable for a brief time.

25.7.1.1. Considerations when migrating to the OVN-Kubernetes network provider

Kuryr keeps Kubernetes namespaces in separate RHOSP Networking service (Neutron) subnets. Those subnets, and the IP addresses that are assigned to individual pods, are not preserved during the migration.

25.7.1.2. How the migration process works

The following table summarizes the migration process by relating the steps that you perform with the actions that your cluster and Operators take.

Table 25.9. The Kuryr to OVN-Kubernetes migration process

User-initiated step: Set the migration field of the Network.operator.openshift.io custom resource (CR) named cluster to OVNKubernetes. Verify that the migration field is null before setting it to another value.

Migration activity:
  • Cluster Network Operator (CNO): Updates the status of the Network.config.openshift.io CR named cluster accordingly.
  • Machine Config Operator (MCO): Deploys an update to the systemd configuration that is required by OVN-Kubernetes. By default, the MCO updates a single machine per pool at a time. As a result, large clusters have longer migration times.

User-initiated step: Update the networkType field of the Network.config.openshift.io CR.

Migration activity: The CNO performs the following actions:
  • Destroys the Kuryr control plane pods: Kuryr CNIs and the Kuryr controller.
  • Deploys the OVN-Kubernetes control plane pods.
  • Updates the Multus objects to reflect the new network plugin.

User-initiated step: Reboot each node in the cluster.

Migration activity: As nodes reboot, the cluster assigns IP addresses to pods on the OVN-Kubernetes cluster network.

User-initiated step: Clean up the remaining resources that Kuryr controlled.

Migration activity: The cluster holds RHOSP resources that need to be freed, as well as OpenShift Container Platform resources that need to be configured.

25.7.2. Migrating to the OVN-Kubernetes network plugin

As a cluster administrator, you can change the network plugin for your cluster to OVN-Kubernetes.

Important

During the migration, you must reboot every node in your cluster. Your cluster is unavailable and workloads might be interrupted. Perform the migration only if an interruption in service is acceptable.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have a recent backup of the etcd database.
  • You can manually reboot each node.
  • The cluster you plan to migrate is in a known good state, without any errors.
  • You installed the Python interpreter.
  • You installed the openstacksdk Python package.
  • You installed the openstack CLI tool.
  • You have access to the underlying RHOSP cloud.

Procedure

  1. Back up the configuration for the cluster network by running the following command:

    $ oc get Network.config.openshift.io cluster -o yaml > cluster-kuryr.yaml
  2. To set the CLUSTERID variable, run the following command:

    $ CLUSTERID=$(oc get infrastructure.config.openshift.io cluster -o=jsonpath='{.status.infrastructureName}')
  3. To prepare all the nodes for the migration, set the migration field on the Cluster Network Operator configuration object by running the following command:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": { "networkType": "OVNKubernetes" } } }'
    Note

    This step does not deploy OVN-Kubernetes immediately. Specifying the migration field triggers the Machine Config Operator (MCO) to apply new machine configs to all the nodes in the cluster. This prepares the cluster for the OVN-Kubernetes deployment.

  4. Optional: Customize the following settings for OVN-Kubernetes for your network infrastructure requirements:

    • Maximum transmission unit (MTU)
    • Geneve (Generic Network Virtualization Encapsulation) overlay network port
    • OVN-Kubernetes IPv4 internal subnet
    • OVN-Kubernetes IPv6 internal subnet

    To customize these settings, enter and customize the following command:

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "mtu":<mtu>,
              "genevePort":<port>,
              "v4InternalSubnet":"<ipv4_subnet>",
              "v6InternalSubnet":"<ipv6_subnet>"
        }}}}'

    where:

    mtu
    Specifies the MTU for the Geneve overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to 100 less than the smallest node MTU value.
    port
    Specifies the UDP port for the Geneve overlay network. If a value is not specified, the default is 6081. The port cannot be the same as the VXLAN port that is used by Kuryr. The default value for the VXLAN port is 4789.
    ipv4_subnet
    Specifies an IPv4 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is 100.64.0.0/16.
    ipv6_subnet
    Specifies an IPv6 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is fd98::/48.

    If you do not need to change the default value, omit the key from the patch.

    Example patch command to update mtu field

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "mtu":1200
        }}}}'

  5. Check the machine config pool status by entering the following command:

    $ oc get mcp

    While the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated before continuing.

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the MCO updates one machine per pool at a time. Large clusters take more time to migrate than small clusters.

  6. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

    2. Review the output from the previous step. The following statements must be true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    3. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart

      where:

      <config_name>

      Specifies the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      Example output

      ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes

    4. If a node is stuck in the NotReady state, investigate the machine config daemon pod logs and resolve any errors:

      1. To list the pods, enter the following command:

        $ oc get pod -n openshift-machine-config-operator

        Example output

        NAME                                         READY   STATUS    RESTARTS   AGE
        machine-config-controller-75f756f89d-sjp8b   1/1     Running   0          37m
        machine-config-daemon-5cf4b                  2/2     Running   0          43h
        machine-config-daemon-7wzcd                  2/2     Running   0          43h
        machine-config-daemon-fc946                  2/2     Running   0          43h
        machine-config-daemon-g2v28                  2/2     Running   0          43h
        machine-config-daemon-gcl4f                  2/2     Running   0          43h
        machine-config-daemon-l5tnv                  2/2     Running   0          43h
        machine-config-operator-79d9c55d5-hth92      1/1     Running   0          37m
        machine-config-server-bsc8h                  1/1     Running   0          43h
        machine-config-server-hklrm                  1/1     Running   0          43h
        machine-config-server-k9rtx                  1/1     Running   0          43h

        The names for the config daemon pods are in the following format: machine-config-daemon-<seq>. The <seq> value is a random five character alphanumeric sequence.

      2. Display the pod log for the first machine config daemon pod shown in the previous output by entering the following command:

        $ oc logs <pod> -n openshift-machine-config-operator

        where:

        <pod>
        Specifies the name of a machine config daemon pod.
      3. Resolve any errors in the logs shown by the output from the previous command.
  7. To start the migration, configure the OVN-Kubernetes network plugin by using one of the following commands:

    • To specify the network provider without changing the cluster network IP address block, enter the following command:

      $ oc patch Network.config.openshift.io cluster \
        --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }'
    • To specify a different cluster network IP address block, enter the following command:

      $ oc patch Network.config.openshift.io cluster \
        --type='merge' --patch '{
          "spec": {
            "clusterNetwork": [
              {
                "cidr": "<cidr>",
                "hostPrefix": <prefix>
              }
            ],
            "networkType": "OVNKubernetes"
          }
        }'

      where:

      <cidr>
      Specifies a CIDR block.
      <prefix>

      Specifies a slice of the CIDR block that is apportioned to each node in your cluster.

      Important

      You cannot change the service network address block during the migration.

      You cannot use any CIDR block that overlaps with the 100.64.0.0/16 CIDR block because the OVN-Kubernetes network provider uses this block internally.

  8. Verify that the Multus daemon set rollout is complete by entering the following command:

    $ oc -n openshift-multus rollout status daemonset/multus

    The name of the Multus pods is in the form of multus-<xxxxx>, where <xxxxx> is a random sequence of letters. It might take several moments for the pods to restart.

    Example output

    Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated...
    ...
    Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available...
    daemon set "multus" successfully rolled out

  9. To complete the migration, reboot each node in your cluster. For example, you can use a bash script similar to the following example. The script assumes that you can connect to each host by using ssh and that you have configured sudo to not prompt for a password:

    #!/bin/bash
    
    for ip in $(oc get nodes  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
    do
       echo "reboot node $ip"
       ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
    done
    Note

    If SSH access is not available, you can use the openstack command:

    $ for name in $(openstack server list --name ${CLUSTERID}\* -f value -c Name); do openstack server reboot $name; done

    Alternatively, you might be able to reboot each node through the management portal for your infrastructure provider. Otherwise, contact the appropriate authority to gain access to the virtual machines, either through SSH or through the management portal and the OpenStack client.

Verification

  1. Confirm that the migration succeeded, and then remove the migration resources:

    1. To confirm that the network plugin is OVN-Kubernetes, enter the following command.

      $ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'

      The value of status.networkType must be OVNKubernetes.

    2. To confirm that the cluster nodes are in the Ready state, enter the following command:

      $ oc get nodes
    3. To confirm that your pods are not in an error state, enter the following command:

      $ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'

      If pods on a node are in an error state, reboot that node.

    4. To confirm that all of the cluster Operators are not in an abnormal state, enter the following command:

      $ oc get co

      The status of every cluster Operator must be the following: AVAILABLE="True", PROGRESSING="False", DEGRADED="False". If a cluster Operator is not available or degraded, check the logs for the cluster Operator for more information.

      Important

      Do not proceed if any of the previous verification steps indicate errors. You might encounter pods that have a Terminating state due to finalizers that are removed during clean up. They are not an error indication.

  2. If the migration completed and your cluster is in a good state, remove the migration configuration from the CNO configuration object by entering the following command:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": null } }'

25.7.3. Cleaning up resources after migration

After migration from the Kuryr network plugin to the OVN-Kubernetes network plugin, you must clean up the resources that Kuryr created previously.

Note

The clean-up process relies on a Python virtual environment to ensure that the package versions that you use support tags for Octavia objects. You do not need a virtual environment if you are certain that your environment uses at minimum the following versions:

  • openstacksdk 0.54.0
  • python-openstackclient 5.5.0
  • python-octaviaclient 2.3.0

Prerequisites

  • You installed the OpenShift Container Platform CLI (oc).
  • You installed a Python interpreter.
  • You installed the openstacksdk Python package.
  • You installed the openstack CLI.
  • You have access to the underlying RHOSP cloud.
  • You can access the cluster as a user with the cluster-admin role.

Procedure

  1. Create a clean-up Python virtual environment:

    1. Create a temporary directory for your environment. For example:

      $ python3 -m venv /tmp/venv

      The virtual environment located in the /tmp/venv directory is used in all of the clean-up examples.

    2. Enter the virtual environment. For example:

      $ source /tmp/venv/bin/activate
    3. Upgrade the pip command in the virtual environment by running the following command:

      (venv) $ pip install pip --upgrade
    4. Install the required Python packages by running the following command:

      (venv) $ pip install openstacksdk==0.54.0 python-openstackclient==5.5.0 python-octaviaclient==2.3.0
  2. In your terminal, set variables to cluster and Kuryr identifiers by running the following commands:

    1. Set the cluster ID:

      (venv) $ CLUSTERID=$(oc get infrastructure.config.openshift.io cluster -o=jsonpath='{.status.infrastructureName}')
    2. Set the cluster tag:

      (venv) $ CLUSTERTAG="openshiftClusterID=${CLUSTERID}"
    3. Set the router ID:

      (venv) $ ROUTERID=$(oc get kuryrnetwork -A --no-headers -o custom-columns=":status.routerId"|head -n 1)
  3. Create a Bash function that removes finalizers from specified resources by running the following command:

    (venv) $ function REMFIN {
        local resource=$1
        local finalizer=$2
        for res in $(oc get $resource -A --template='{{range $i,$p := .items}}{{ $p.metadata.name }}|{{ $p.metadata.namespace }}{{"\n"}}{{end}}'); do
            name=${res%%|*}
            ns=${res##*|}
            yaml=$(oc get -n $ns $resource $name -o yaml)
            if echo "${yaml}" | grep -q "${finalizer}"; then
                echo "${yaml}" | grep -v  "${finalizer}" | oc replace -n $ns $resource $name -f -
            fi
        done
    }

    The function takes two parameters: the first parameter is the name of the resource, and the second parameter is the finalizer to remove. For each instance of the named resource in the cluster, the function replaces the resource definition with a copy of itself that excludes the lines containing the specified finalizer.

  4. To remove Kuryr finalizers from services, enter the following command:

    (venv) $ REMFIN services kuryr.openstack.org/service-finalizer
  5. To remove the Kuryr service-subnet-gateway-ip service, enter the following command:

    (venv) $ if $(oc get -n openshift-kuryr service service-subnet-gateway-ip &>/dev/null); then
        oc -n openshift-kuryr delete service service-subnet-gateway-ip
    fi
  6. To remove all tagged RHOSP load balancers from Octavia, enter the following command:

    (venv) $ for lb in $(openstack loadbalancer list --tags $CLUSTERTAG -f value -c id); do
        openstack loadbalancer delete --cascade $lb
    done
  7. To remove Kuryr finalizers from all KuryrLoadBalancer CRs, enter the following command:

    (venv) $ REMFIN kuryrloadbalancers.openstack.org kuryr.openstack.org/kuryrloadbalancer-finalizers
  8. To remove the openshift-kuryr namespace, enter the following command:

    (venv) $ oc delete namespace openshift-kuryr
  9. To remove the Kuryr service subnet from the router, enter the following command:

    (venv) $ openstack router remove subnet $ROUTERID ${CLUSTERID}-kuryr-service-subnet
  10. To remove the Kuryr service network, enter the following command:

    (venv) $ openstack network delete ${CLUSTERID}-kuryr-service-network
  11. To remove Kuryr finalizers from all pods, enter the following command:

    (venv) $ REMFIN pods kuryr.openstack.org/pod-finalizer
  12. To remove Kuryr finalizers from all KuryrPort CRs, enter the following command:

    (venv) $ REMFIN kuryrports.openstack.org kuryr.openstack.org/kuryrport-finalizer

    This command deletes the KuryrPort CRs.

  13. To remove Kuryr finalizers from network policies, enter the following command:

    (venv) $ REMFIN networkpolicy kuryr.openstack.org/networkpolicy-finalizer
  14. To remove Kuryr finalizers from remaining network policies, enter the following command:

    (venv) $ REMFIN kuryrnetworkpolicies.openstack.org kuryr.openstack.org/networkpolicy-finalizer
  15. To remove subports that Kuryr created from trunks, enter the following command:

    (venv) $ read -ra trunks <<< $(python -c "import openstack; n = openstack.connect().network; print(' '.join([x.id for x in n.trunks(any_tags='$CLUSTERTAG')]))") && \
    i=0 && \
    for trunk in "${trunks[@]}"; do
        i=$((i+1))
        echo "Processing trunk $trunk, ${i}/${#trunks[@]}."
        subports=()
        for subport in $(python -c "import openstack; n = openstack.connect().network; print(' '.join([x['port_id'] for x in n.get_trunk('$trunk').sub_ports if '$CLUSTERTAG' in n.get_port(x['port_id']).tags]))"); do
            subports+=("$subport");
        done
        args=()
        for sub in "${subports[@]}" ; do
            args+=("--subport $sub")
        done
        if [ ${#args[@]} -gt 0 ]; then
            openstack network trunk unset ${args[*]} $trunk
        fi
    done
  16. To retrieve all networks and subnets from KuryrNetwork CRs and remove ports, router interfaces and the network itself, enter the following command:

    (venv) $ mapfile -t kuryrnetworks < <(oc get kuryrnetwork -A --template='{{range $i,$p := .items}}{{ $p.status.netId }}|{{ $p.status.subnetId }}{{"\n"}}{{end}}') && \
    i=0 && \
    for kn in "${kuryrnetworks[@]}"; do
        i=$((i+1))
        netID=${kn%%|*}
        subnetID=${kn##*|}
        echo "Processing network $netID, ${i}/${#kuryrnetworks[@]}"
        # Remove all ports from the network.
        for port in $(python -c "import openstack; n = openstack.connect().network; print(' '.join([x.id for x in n.ports(network_id='$netID') if x.device_owner != 'network:router_interface']))"); do
            ( openstack port delete $port ) &
    
            # Only allow 20 jobs in parallel.
            if [[ $(jobs -r -p | wc -l) -ge 20 ]]; then
                wait -n
            fi
        done
        wait
    
        # Remove the subnet from the router.
        openstack router remove subnet $ROUTERID $subnetID
    
        # Remove the network.
        openstack network delete $netID
    done
  17. To remove the Kuryr security group, enter the following command:

    (venv) $ openstack security group delete ${CLUSTERID}-kuryr-pods-security-group
  18. To remove all tagged subnet pools, enter the following command:

    (venv) $ for subnetpool in $(openstack subnet pool list --tags $CLUSTERTAG -f value -c ID); do
        openstack subnet pool delete $subnetpool
    done
  19. To check that all of the networks based on KuryrNetwork CRs were removed, enter the following command:

    (venv) $ networks=$(oc get kuryrnetwork -A --no-headers -o custom-columns=":status.netId") && \
    for existingNet in $(openstack network list --tags $CLUSTERTAG -f value -c ID); do
        if [[ $networks =~ $existingNet ]]; then
            echo "Network still exists: $existingNet"
        fi
    done

    If the command returns any existing networks, investigate and remove them before you continue.

  20. To remove security groups that are related to network policy, enter the following command:

    (venv) $ for sgid in $(openstack security group list -f value -c ID -c Description | grep 'Kuryr-Kubernetes Network Policy' | cut -f 1 -d ' '); do
        openstack security group delete $sgid
    done
  21. To remove finalizers from KuryrNetwork CRs, enter the following command:

    (venv) $ REMFIN kuryrnetworks.openstack.org kuryrnetwork.finalizers.kuryr.openstack.org
  22. To remove the Kuryr router, enter the following command:

    (venv) $ if $(python3 -c "import sys; import openstack; n = openstack.connect().network; r = n.get_router('$ROUTERID'); sys.exit(0) if r.description != 'Created By OpenShift Installer' else sys.exit(1)"); then
        openstack router delete $ROUTERID
    fi

25.7.4. Additional resources

25.8. Converting to IPv4/IPv6 dual-stack networking

As a cluster administrator, you can convert your IPv4 single-stack cluster to a dual-stack cluster network that supports IPv4 and IPv6 address families. After converting to dual-stack, all newly created pods are dual-stack enabled.

Note

A dual-stack network is supported on clusters provisioned on bare metal, VMware vSphere, IBM Power, IBM Z infrastructure, and single node OpenShift clusters.

Note

While using dual-stack networking, you cannot use IPv4-mapped IPv6 addresses, such as ::FFFF:198.51.100.1, where IPv6 is required.

25.8.1. Converting to a dual-stack cluster network

As a cluster administrator, you can convert your single-stack cluster network to a dual-stack cluster network.

Note

After converting to dual-stack networking, only newly created pods are assigned IPv6 addresses. Any pods created before the conversion must be recreated to receive an IPv6 address.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • Your cluster uses the OVN-Kubernetes network plugin.
  • The cluster nodes have IPv6 addresses.
  • You have configured an IPv6-enabled router based on your infrastructure.

Procedure

  1. To specify IPv6 address blocks for the cluster and service networks, create a file containing the following YAML:

    - op: add
      path: /spec/clusterNetwork/-
      value: 1
        cidr: fd01::/48
        hostPrefix: 64
    - op: add
      path: /spec/serviceNetwork/-
      value: fd02::/112 2
    1
    Specify an object with the cidr and hostPrefix fields. The host prefix must be 64 or greater. The IPv6 CIDR prefix must be large enough to accommodate the specified host prefix.
    2
    Specify an IPv6 CIDR with a prefix of 112. Kubernetes uses only the lowest 16 bits. For a prefix of 112, IP addresses are assigned from bits 112 through 128.
  2. To patch the cluster network configuration, enter the following command:

    $ oc patch network.config.openshift.io cluster \
      --type='json' --patch-file <file>.yaml

    where:

    file
    Specifies the name of the file you created in the previous step.

    Example output

    network.config.openshift.io/cluster patched

Verification

Complete the following step to verify that the cluster network recognizes the IPv6 address blocks that you specified in the previous procedure.

  1. Display the network configuration:

    $ oc describe network

    Example output

    Status:
      Cluster Network:
        Cidr:               10.128.0.0/14
        Host Prefix:        23
        Cidr:               fd01::/48
        Host Prefix:        64
      Cluster Network MTU:  1400
      Network Type:         OVNKubernetes
      Service Network:
        172.30.0.0/16
        fd02::/112
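
    Alternatively, the following is a minimal sketch that prints only the active address blocks by using a JSONPath query:

    $ oc get network.config.openshift.io cluster \
      -o jsonpath='{.status.clusterNetwork}{"\n"}{.status.serviceNetwork}{"\n"}'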

25.8.2. Converting to a single-stack cluster network

As a cluster administrator, you can convert your dual-stack cluster network to a single-stack cluster network.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • Your cluster uses the OVN-Kubernetes network plugin.
  • The cluster nodes have IPv6 addresses.
  • You have enabled dual-stack networking.

Procedure

  1. Edit the networks.config.openshift.io custom resource (CR) by running the following command:

    $ oc edit networks.config.openshift.io
  2. Remove the IPv6-specific configuration that you added to the clusterNetwork and serviceNetwork fields in the previous procedure.
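
    If you prefer a patch over an interactive edit, the following is a minimal sketch that removes the IPv6 entries, assuming that they are the second items in the clusterNetwork and serviceNetwork lists. Verify the indexes in your cluster before you apply it:

    $ oc patch network.config.openshift.io cluster --type='json' \
      -p='[{"op":"remove", "path":"/spec/clusterNetwork/1"}, {"op":"remove", "path":"/spec/serviceNetwork/1"}]'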

25.9. Logging for egress firewall and network policy rules

As a cluster administrator, you can configure audit logging for your cluster and enable logging for one or more namespaces. OpenShift Container Platform produces audit logs for both egress firewalls and network policies.

Note

Audit logging is available for only the OVN-Kubernetes network plugin.

25.9.1. Audit logging

The OVN-Kubernetes network plugin uses Open Virtual Network (OVN) ACLs to manage egress firewalls and network policies. Audit logging exposes allow and deny ACL events.

You can configure the destination for audit logs, such as a syslog server or a UNIX domain socket. Regardless of any additional configuration, an audit log is always saved to /var/log/ovn/acl-audit-log.log on each OVN-Kubernetes pod in the cluster.

You can enable audit logging for each namespace by annotating each namespace configuration with a k8s.ovn.org/acl-logging section. In the k8s.ovn.org/acl-logging section, you must specify allow, deny, or both values to enable audit logging for a namespace.

Note

A network policy does not support setting the Pass action as a rule.

The ACL-logging implementation logs access control list (ACL) events for a network. You can view these logs to analyze any potential security issues.

Example namespace annotation

kind: Namespace
apiVersion: v1
metadata:
  name: example1
  annotations:
    k8s.ovn.org/acl-logging: |-
      {
        "deny": "info",
        "allow": "info"
      }

To view the default ACL logging configuration values, see the policyAuditConfig object in the cluster-network-03-config.yml file. If required, you can change the ACL logging configuration values for log file parameters in this file.
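
To inspect the values that are currently in effect on a running cluster, you can also query the Cluster Network Operator configuration directly. The following is a minimal sketch; the output is empty if the defaults have not been overridden:

$ oc get network.operator.openshift.io cluster \
  -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.policyAuditConfig}{"\n"}'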

The logging message format is compatible with syslog as defined by RFC5424. The syslog facility is configurable and defaults to local0. The following example shows key parameters and their values outputted in a log message:

Example logging message that outputs parameters and their values

<timestamp>|<message_serial>|acl_log(ovn_pinctrl0)|<severity>|name="<acl_name>", verdict="<verdict>", severity="<severity>", direction="<direction>": <flow>

Where:

  • <timestamp> states the time and date for the creation of a log message.
  • <message_serial> lists the serial number for a log message.
  • acl_log(ovn_pinctrl0) is a literal string that prints the location of the log message in the OVN-Kubernetes plugin.
  • <severity> sets the severity level for a log message. If you enable audit logging for both allow and deny tasks, two severity levels show in the log message output.
  • <name> states the name of the ACL in the OVN northbound database (nbdb) that was created by the network policy.
  • <verdict> can be either allow or drop.
  • <direction> can be either to-lport or from-lport to indicate that the policy was applied to traffic going to or away from a pod.
  • <flow> shows packet information in a format equivalent to the OpenFlow protocol. This parameter comprises Open vSwitch (OVS) fields.

The following example shows OVS fields that the flow parameter uses to extract packet information from system memory:

Example of OVS fields used by the flow parameter to extract packet information

<proto>,vlan_tci=0x0000,dl_src=<src_mac>,dl_dst=<source_mac>,nw_src=<source_ip>,nw_dst=<target_ip>,nw_tos=<tos_dscp>,nw_ecn=<tos_ecn>,nw_ttl=<ip_ttl>,nw_frag=<fragment>,tp_src=<tcp_src_port>,tp_dst=<tcp_dst_port>,tcp_flags=<tcp_flags>

Where:

  • <proto> states the protocol. Valid values are tcp and udp.
  • vlan_tci=0x0000 states the VLAN header as 0 because a VLAN ID is not set for internal pod network traffic.
  • <src_mac> specifies the source Media Access Control (MAC) address.
  • <source_mac> specifies the destination MAC address.
  • <source_ip> lists the source IP address.
  • <target_ip> lists the target IP address.
  • <tos_dscp> states Differentiated Services Code Point (DSCP) values to classify and prioritize certain network traffic over other traffic.
  • <tos_ecn> states Explicit Congestion Notification (ECN) values that indicate any congested traffic in your network.
  • <ip_ttl> states the Time To Live (TTL) information for a packet.
  • <fragment> specifies what type of IP fragments or IP non-fragments to match.
  • <tcp_src_port> shows the source port for the TCP and UDP protocols.
  • <tcp_dst_port> lists the destination port for the TCP and UDP protocols.
  • <tcp_flags> supports numerous flags such as SYN, ACK, and PSH. If you need to set multiple values, separate each value with a vertical bar (|). The UDP protocol does not support this parameter.
Note

For more information about the previous field descriptions, go to the OVS manual page for ovs-fields.

Example ACL deny log entry for a network policy

Defaulting container name to ovn-controller.
Use 'oc describe pod/ovnkube-node-hdb8v -n openshift-ovn-kubernetes' to see all of the containers in this pod.
2021-06-13T19:33:11.590Z|00005|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_ingressDefaultDeny", verdict=drop, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:39,dl_dst=0a:58:0a:80:02:37,nw_src=10.128.2.57,nw_dst=10.128.2.55,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
2021-06-13T19:33:12.614Z|00006|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_ingressDefaultDeny", verdict=drop, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:39,dl_dst=0a:58:0a:80:02:37,nw_src=10.128.2.57,nw_dst=10.128.2.55,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
2021-06-13T19:44:10.037Z|00007|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_allow-from-same-namespace_0", verdict=allow, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:3b,dl_dst=0a:58:0a:80:02:3a,nw_src=10.128.2.59,nw_dst=10.128.2.58,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
2021-06-13T19:44:11.037Z|00008|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_allow-from-same-namespace_0", verdict=allow, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:3b,dl_dst=0a:58:0a:80:02:3a,nw_src=10.128.2.59,nw_dst=10.128.2.58,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0

The following table describes namespace annotation values:

Table 25.10. Audit logging namespace annotation for k8s.ovn.org/acl-logging
FieldDescription

deny

Blocks namespace access to any traffic that matches an ACL rule with the deny action. The field supports alert, warning, notice, info, or debug values.

allow

Permits namespace access to any traffic that matches an ACL rule with the allow action. The field supports alert, warning, notice, info, or debug values.

pass

A pass action applies to an admin network policy’s ACL rule. A pass action allows either the network policy in the namespace or the baseline admin network policy rule to evaluate all incoming and outgoing traffic. A network policy does not support a pass action.

25.9.2. Audit configuration

The configuration for audit logging is specified as part of the OVN-Kubernetes cluster network provider configuration. The following YAML illustrates the default values for the audit logging:

Audit logging configuration

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    ovnKubernetesConfig:
      policyAuditConfig:
        destination: "null"
        maxFileSize: 50
        rateLimit: 20
        syslogFacility: local0

The following table describes the configuration fields for audit logging.

Table 25.11. policyAuditConfig object
FieldTypeDescription

rateLimit

integer

The maximum number of messages to generate every second per node. The default value is 20 messages per second.

maxFileSize

integer

The maximum size for the audit log in bytes. The default value is 50000000 or 50 MB.

maxLogFiles

integer

The maximum number of log files that are retained.

destination

string

One of the following additional audit log targets:

libc
The libc syslog() function of the journald process on the host.
udp:<host>:<port>
A syslog server. Replace <host>:<port> with the host and port of the syslog server.
unix:<file>
A Unix Domain Socket file specified by <file>.
null
Do not send the audit logs to any additional target.

syslogFacility

string

The syslog facility, such as kern, as defined by RFC5424. The default value is local0.
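
For example, the following is a minimal sketch that sends audit logs to a syslog server by setting the destination field; the server address 1.2.3.4:514 is a placeholder for illustration:

$ oc patch network.operator.openshift.io/cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"policyAuditConfig":{"destination":"udp:1.2.3.4:514"}}}}}'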

25.9.3. Configuring egress firewall and network policy auditing for a cluster

As a cluster administrator, you can customize audit logging for your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  • To customize the audit logging configuration, enter the following command:

    $ oc edit network.operator.openshift.io/cluster
    Tip

    You can alternatively customize and apply the following YAML to configure audit logging:

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      defaultNetwork:
        ovnKubernetesConfig:
          policyAuditConfig:
            destination: "null"
            maxFileSize: 50
            rateLimit: 20
            syslogFacility: local0

Verification

  1. To create a namespace with network policies, complete the following steps:

    1. Create a namespace for verification:

      $ cat <<EOF| oc create -f -
      kind: Namespace
      apiVersion: v1
      metadata:
        name: verify-audit-logging
        annotations:
          k8s.ovn.org/acl-logging: '{ "deny": "alert", "allow": "alert" }'
      EOF

      Example output

      namespace/verify-audit-logging created

    2. Create network policies for the namespace:

      $ cat <<EOF| oc create -n verify-audit-logging -f -
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: deny-all
      spec:
        podSelector:
          matchLabels:
        policyTypes:
        - Ingress
        - Egress
      ---
      apiVersion: networking.k8s.io/v1
      kind: NetworkPolicy
      metadata:
        name: allow-from-same-namespace
        namespace: verify-audit-logging
      spec:
        podSelector: {}
        policyTypes:
         - Ingress
         - Egress
        ingress:
          - from:
              - podSelector: {}
        egress:
          - to:
             - namespaceSelector:
                matchLabels:
                  kubernetes.io/metadata.name: verify-audit-logging
      EOF

      Example output

      networkpolicy.networking.k8s.io/deny-all created
      networkpolicy.networking.k8s.io/allow-from-same-namespace created

  2. Create a pod for source traffic in the default namespace:

    $ cat <<EOF| oc create -n default -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: client
    spec:
      containers:
        - name: client
          image: registry.access.redhat.com/rhel7/rhel-tools
          command: ["/bin/sh", "-c"]
          args:
            ["sleep inf"]
    EOF
  3. Create two pods in the verify-audit-logging namespace:

    $ for name in client server; do
    cat <<EOF| oc create -n verify-audit-logging -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: ${name}
    spec:
      containers:
        - name: ${name}
          image: registry.access.redhat.com/rhel7/rhel-tools
          command: ["/bin/sh", "-c"]
          args:
            ["sleep inf"]
    EOF
    done

    Example output

    pod/client created
    pod/server created

  4. To generate traffic and produce network policy audit log entries, complete the following steps:

    1. Obtain the IP address of the pod named server in the verify-audit-logging namespace:

      $ POD_IP=$(oc get pods server -n verify-audit-logging -o jsonpath='{.status.podIP}')
    2. From the pod named client in the default namespace, ping the IP address from the previous command and confirm that all packets are dropped:

      $ oc exec -it client -n default -- /bin/ping -c 2 $POD_IP

      Example output

      PING 10.128.2.55 (10.128.2.55) 56(84) bytes of data.
      
      --- 10.128.2.55 ping statistics ---
      2 packets transmitted, 0 received, 100% packet loss, time 2041ms

    3. From the pod named client in the verify-audit-logging namespace, ping the IP address saved in the POD_IP shell environment variable and confirm that all packets are allowed:

      $ oc exec -it client -n verify-audit-logging -- /bin/ping -c 2 $POD_IP

      Example output

      PING 10.128.0.86 (10.128.0.86) 56(84) bytes of data.
      64 bytes from 10.128.0.86: icmp_seq=1 ttl=64 time=2.21 ms
      64 bytes from 10.128.0.86: icmp_seq=2 ttl=64 time=0.440 ms
      
      --- 10.128.0.86 ping statistics ---
      2 packets transmitted, 2 received, 0% packet loss, time 1001ms
      rtt min/avg/max/mdev = 0.440/1.329/2.219/0.890 ms

  5. Display the latest entries in the network policy audit log:

    $ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do
        oc exec -it $pod -n openshift-ovn-kubernetes -- tail -4 /var/log/ovn/acl-audit-log.log
      done

    Example output

    Defaulting container name to ovn-controller.
    Use 'oc describe pod/ovnkube-node-hdb8v -n openshift-ovn-kubernetes' to see all of the containers in this pod.
    2021-06-13T19:33:11.590Z|00005|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_ingressDefaultDeny", verdict=drop, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:39,dl_dst=0a:58:0a:80:02:37,nw_src=10.128.2.57,nw_dst=10.128.2.55,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
    2021-06-13T19:33:12.614Z|00006|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_ingressDefaultDeny", verdict=drop, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:39,dl_dst=0a:58:0a:80:02:37,nw_src=10.128.2.57,nw_dst=10.128.2.55,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
    2021-06-13T19:44:10.037Z|00007|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_allow-from-same-namespace_0", verdict=allow, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:3b,dl_dst=0a:58:0a:80:02:3a,nw_src=10.128.2.59,nw_dst=10.128.2.58,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
    2021-06-13T19:44:11.037Z|00008|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_allow-from-same-namespace_0", verdict=allow, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:3b,dl_dst=0a:58:0a:80:02:3a,nw_src=10.128.2.59,nw_dst=10.128.2.58,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0

25.9.4. Enabling egress firewall and network policy audit logging for a namespace

As a cluster administrator, you can enable audit logging for a namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  • To enable audit logging for a namespace, enter the following command:

    $ oc annotate namespace <namespace> \
      k8s.ovn.org/acl-logging='{ "deny": "alert", "allow": "notice" }'

    where:

    <namespace>
    Specifies the name of the namespace.
    Tip

    You can alternatively apply the following YAML to enable audit logging:

    kind: Namespace
    apiVersion: v1
    metadata:
      name: <namespace>
      annotations:
        k8s.ovn.org/acl-logging: |-
          {
            "deny": "alert",
            "allow": "notice"
          }

    Example output

    namespace/verify-audit-logging annotated

Verification

  • Display the latest entries in the audit log:

    $ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node --no-headers=true | awk '{ print $1 }') ; do
        oc exec -it $pod -n openshift-ovn-kubernetes -- tail -4 /var/log/ovn/acl-audit-log.log
      done

    Example output

    Defaulting container name to ovn-controller.
    Use 'oc describe pod/ovnkube-node-hdb8v -n openshift-ovn-kubernetes' to see all of the containers in this pod.
    2021-06-13T19:33:11.590Z|00005|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_ingressDefaultDeny", verdict=drop, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:39,dl_dst=0a:58:0a:80:02:37,nw_src=10.128.2.57,nw_dst=10.128.2.55,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
    2021-06-13T19:33:12.614Z|00006|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_ingressDefaultDeny", verdict=drop, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:39,dl_dst=0a:58:0a:80:02:37,nw_src=10.128.2.57,nw_dst=10.128.2.55,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
    2021-06-13T19:44:10.037Z|00007|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_allow-from-same-namespace_0", verdict=allow, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:3b,dl_dst=0a:58:0a:80:02:3a,nw_src=10.128.2.59,nw_dst=10.128.2.58,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0
    2021-06-13T19:44:11.037Z|00008|acl_log(ovn_pinctrl0)|INFO|name="verify-audit-logging_allow-from-same-namespace_0", verdict=allow, severity=alert: icmp,vlan_tci=0x0000,dl_src=0a:58:0a:80:02:3b,dl_dst=0a:58:0a:80:02:3a,nw_src=10.128.2.59,nw_dst=10.128.2.58,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=8,icmp_code=0

25.9.5. Disabling egress firewall and network policy audit logging for a namespace

As a cluster administrator, you can disable audit logging for a namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  • To disable audit logging for a namespace, enter the following command:

    $ oc annotate --overwrite namespace <namespace> k8s.ovn.org/acl-logging-

    where:

    <namespace>
    Specifies the name of the namespace.
    Tip

    You can alternatively apply the following YAML to disable audit logging:

    kind: Namespace
    apiVersion: v1
    metadata:
      name: <namespace>
      annotations:
        k8s.ovn.org/acl-logging: null

    Example output

    namespace/verify-audit-logging annotated

25.9.6. Additional resources

25.10. Configuring IPsec encryption

With IPsec enabled, all pod-to-pod network traffic between nodes on the OVN-Kubernetes cluster network is encrypted with IPsec Transport mode.

IPsec is disabled by default. It can be enabled either during or after installing the cluster. For information about cluster installation, see OpenShift Container Platform installation overview. If you need to enable IPsec after cluster installation, you must first resize your cluster MTU to account for the overhead of the IPsec ESP IP header.

The following documentation describes how to enable and disable IPsec after cluster installation.

25.10.1. Prerequisites

  • You have decreased the size of the cluster MTU by 46 bytes to allow for the additional overhead of the IPsec ESP header. For more information on resizing the MTU that your cluster uses, see Changing the MTU for the cluster network.

25.10.2. Types of network traffic flows encrypted by IPsec

With IPsec enabled, only the following network traffic flows between pods are encrypted:

  • Traffic between pods on different nodes on the cluster network
  • Traffic from a pod on the host network to a pod on the cluster network

The following traffic flows are not encrypted:

  • Traffic between pods on the same node on the cluster network
  • Traffic between pods on the host network
  • Traffic from a pod on the cluster network to a pod on the host network

The encrypted and unencrypted flows are illustrated in the following diagram:

IPsec encrypted and unencrypted traffic flows
25.10.2.1. Network connectivity requirements when IPsec is enabled

You must configure the network connectivity between machines to allow OpenShift Container Platform cluster components to communicate. Each machine must be able to resolve the hostnames of all other machines in the cluster.

Table 25.12. Ports used for all-machine to all-machine communications
ProtocolPortDescription

UDP

500

IPsec IKE packets

4500

IPsec NAT-T packets

ESP

N/A

IPsec Encapsulating Security Payload (ESP)

25.10.3. Encryption protocol and IPsec mode

The encryption cipher used is AES-GCM-16-256. The integrity check value (ICV) is 16 bytes. The key length is 256 bits.

The IPsec mode used is Transport mode, a mode that encrypts end-to-end communication by adding an Encapsulating Security Payload (ESP) header to the IP header of the original packet and encrypting the packet data. OpenShift Container Platform does not currently use or support IPsec Tunnel mode for pod-to-pod communication.

25.10.4. Security certificate generation and rotation

The Cluster Network Operator (CNO) generates a self-signed X.509 certificate authority (CA) that is used by IPsec for encryption. Certificate signing requests (CSRs) from each node are automatically fulfilled by the CNO.

The CA is valid for 10 years. The individual node certificates are valid for 5 years and are automatically rotated after 4 1/2 years elapse.

25.10.5. Enabling IPsec encryption

As a cluster administrator, you can enable IPsec encryption after cluster installation.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.
  • You have reduced the size of your cluster MTU by 46 bytes to allow for the overhead of the IPsec ESP header.

Procedure

  • To enable IPsec encryption, enter the following command:

    $ oc patch networks.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'

Verification

  1. To find the names of the OVN-Kubernetes control plane pods, enter the following command:

    $ oc get pods -l app=ovnkube-master -n openshift-ovn-kubernetes

    Example output

    NAME                   READY   STATUS    RESTARTS   AGE
    ovnkube-master-fvtnh   6/6     Running   0          122m
    ovnkube-master-hsgmm   6/6     Running   0          122m
    ovnkube-master-qcmdc   6/6     Running   0          122m

  2. Verify that IPsec is enabled on your cluster by running the following command:

    $ oc -n openshift-ovn-kubernetes rsh ovnkube-master-<XXXXX> \
      ovn-nbctl --no-leader-only get nb_global . ipsec

    where:

    <XXXXX>
    Specifies the random sequence of letters for a pod from the previous step.

    Example output

    true

25.10.6. Disabling IPsec encryption

As a cluster administrator, you can disable IPsec encryption only if you enabled IPsec after cluster installation.

Note

If you enabled IPsec when you installed your cluster, you cannot disable IPsec with this procedure.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster with a user with cluster-admin privileges.

Procedure

  1. To disable IPsec encryption, enter the following command:

    $ oc patch networks.operator.openshift.io/cluster --type=json \
      -p='[{"op":"remove", "path":"/spec/defaultNetwork/ovnKubernetesConfig/ipsecConfig"}]'
  2. Optional: You can increase the size of your cluster MTU by 46 bytes because there is no longer any overhead from the IPsec ESP header in IP packets.
  3. Verify that IPsec is disabled on your cluster:

    $ oc -n openshift-ovn-kubernetes -c nbdb rsh ovnkube-master-<XXXXX> \
      ovn-nbctl --no-leader-only get nb_global . ipsec

    where:

    <XXXXX>
    Specifies the random sequence of letters for a pod from the previous step.

    Example output

    false

25.10.7. Additional resources

25.11. Configuring an egress firewall for a project

As a cluster administrator, you can create an egress firewall for a project that restricts egress traffic leaving your OpenShift Container Platform cluster.

25.11.1. How an egress firewall works in a project

As a cluster administrator, you can use an egress firewall to limit the external hosts that some or all pods can access from within the cluster. An egress firewall supports the following scenarios:

  • A pod can only connect to internal hosts and cannot initiate connections to the public internet.
  • A pod can only connect to the public internet and cannot initiate connections to internal hosts that are outside the OpenShift Container Platform cluster.
  • A pod cannot reach specified internal subnets or hosts outside the OpenShift Container Platform cluster.
  • A pod can connect to only specific external hosts.

For example, you can allow one project access to a specified IP range but deny the same access to a different project. Or you can restrict application developers from updating from Python pip mirrors, and force updates to come only from approved sources.

Note

Egress firewall does not apply to the host network namespace. Pods with host networking enabled are unaffected by egress firewall rules.

You configure an egress firewall policy by creating an EgressFirewall custom resource (CR) object. The egress firewall matches network traffic that meets any of the following criteria:

  • An IP address range in CIDR format
  • A DNS name that resolves to an IP address
  • A port number
  • A protocol that is one of the following: TCP, UDP, or SCTP
Important

If your egress firewall includes a deny rule for 0.0.0.0/0, access to your OpenShift Container Platform API servers is blocked. You must either add allow rules for each IP address or use the nodeSelector type allow rule in your egress policy rules to connect to API servers.

The following example illustrates the order of the egress firewall rules necessary to ensure API server access:

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
  namespace: <namespace> 1
spec:
  egress:
  - to:
      cidrSelector: <api_server_address_range> 2
    type: Allow
# ...
  - to:
      cidrSelector: 0.0.0.0/0 3
    type: Deny
1
The namespace for the egress firewall.
2
The IP address range that includes your OpenShift Container Platform API servers.
3
A global deny rule prevents access to the OpenShift Container Platform API servers.

To find the IP address for your API servers, run oc get ep kubernetes -n default.

For more information, see BZ#1988324.

Warning

Egress firewall rules do not apply to traffic that goes through routers. Any user with permission to create a Route CR object can bypass egress firewall policy rules by creating a route that points to a forbidden destination.

25.11.1.1. Limitations of an egress firewall

An egress firewall has the following limitations:

  • No project can have more than one EgressFirewall object.
  • A maximum of one EgressFirewall object with a maximum of 8,000 rules can be defined per project.
  • If you are using the OVN-Kubernetes network plugin with shared gateway mode in Red Hat OpenShift Networking, return ingress replies are affected by egress firewall rules. If the egress firewall rules drop the ingress reply destination IP, the traffic is dropped.

Violating any of these restrictions results in a broken egress firewall for the project. Consequently, all external network traffic is dropped, which can cause security risks for your organization.

An Egress Firewall resource can be created in the kube-node-lease, kube-public, kube-system, openshift and openshift- projects.

25.11.1.2. Matching order for egress firewall policy rules

The egress firewall policy rules are evaluated in the order that they are defined, from first to last. The first rule that matches an egress connection from a pod applies. Any subsequent rules are ignored for that connection.

25.11.1.3. How Domain Name Server (DNS) resolution works

If you use DNS names in any of your egress firewall policy rules, proper resolution of the domain names is subject to the following restrictions:

  • Domain name updates are polled based on a time-to-live (TTL) duration. By default, the duration is 30 minutes. When the egress firewall controller queries the local name servers for a domain name, if the response includes a TTL and the TTL is less than 30 minutes, the controller sets the duration for that DNS name to the returned value. Each DNS name is queried after the TTL for the DNS record expires.
  • The pod must resolve the domain from the same local name servers when necessary. Otherwise the IP addresses for the domain known by the egress firewall controller and the pod can be different. If the IP addresses for a hostname differ, the egress firewall might not be enforced consistently.
  • Because the egress firewall controller and pods asynchronously poll the same local name server, the pod might obtain the updated IP address before the egress controller does, which causes a race condition. Due to this current limitation, domain name usage in EgressFirewall objects is only recommended for domains with infrequent IP address changes.
Note

Using DNS names in your egress firewall policy does not affect local DNS resolution through CoreDNS.

However, if your egress firewall policy uses domain names, and an external DNS server handles DNS resolution for an affected pod, you must include egress firewall rules that permit access to the IP addresses of your DNS server.
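
For example, the following egress stanza is a minimal sketch that allows DNS traffic to an external DNS server before a broader deny rule; the server address 203.0.113.53 and the 53/UDP port are illustrative assumptions:

egress:
- type: Allow
  to:
    cidrSelector: 203.0.113.53/32
  ports:
  - port: 53
    protocol: UDP
- type: Deny
  to:
    cidrSelector: 0.0.0.0/0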

25.11.2. EgressFirewall custom resource (CR) object

You can define one or more rules for an egress firewall. A rule is either an Allow rule or a Deny rule, with a specification for the traffic that the rule applies to.

The following YAML describes an EgressFirewall CR object:

EgressFirewall object

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: <name> 1
spec:
  egress: 2
    ...

1
The name for the object must be default.
2
A collection of one or more egress network policy rules as described in the following section.
25.11.2.1. EgressFirewall rules

The following YAML describes an egress firewall rule object. The user can select either an IP address range in CIDR format, a domain name, or use the nodeSelector to allow or deny egress traffic. The egress stanza expects an array of one or more objects.

Egress policy rule stanza

egress:
- type: <type> 1
  to: 2
    cidrSelector: <cidr> 3
    dnsName: <dns_name> 4
    nodeSelector: <label_name>: <label_value> 5
  ports: 6
      ...

1
The type of rule. The value must be either Allow or Deny.
2
A stanza describing an egress traffic match rule that specifies the cidrSelector field or the dnsName field. You cannot use both fields in the same rule.
3
An IP address range in CIDR format.
4
A DNS domain name.
5
Labels are key/value pairs that the user defines. Labels are attached to objects, such as pods. The nodeSelector allows for one or more node labels to be selected and attached to pods.
6
Optional: A stanza describing a collection of network ports and protocols for the rule.

Ports stanza

ports:
- port: <port> 1
  protocol: <protocol> 2

1
A network port, such as 80 or 443. If you specify a value for this field, you must also specify a value for protocol.
2
A network protocol. The value must be either TCP, UDP, or SCTP.
25.11.2.2. Example EgressFirewall CR objects

The following example defines several egress firewall policy rules:

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress: 1
  - type: Allow
    to:
      cidrSelector: 1.2.3.0/24
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
1
A collection of egress firewall policy rule objects.

The following example defines a policy rule that denies traffic to the host at the 172.16.1.1 IP address, if the traffic is using either the TCP protocol and destination port 80 or any protocol and destination port 443.

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Deny
    to:
      cidrSelector: 172.16.1.1
    ports:
    - port: 80
      protocol: TCP
    - port: 443
25.11.2.3. Example nodeSelector for EgressFirewall

As a cluster administrator, you can allow or deny egress traffic to nodes in your cluster by specifying a label using nodeSelector. Labels can be applied to one or more nodes. The following is an example with the region=east label:

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
    egress:
    - to:
        nodeSelector:
          matchLabels:
            region: east
      type: Allow
Tip

Instead of adding manual rules per node IP address, use node selectors to create a label that allows pods behind an egress firewall to access host network pods.

25.11.3. Creating an egress firewall policy object

As a cluster administrator, you can create an egress firewall policy object for a project.

Important

If the project already has an EgressFirewall object defined, you must edit the existing policy to make changes to the egress firewall rules.

Prerequisites

  • A cluster that uses the OVN-Kubernetes network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Create a policy rule:

    1. Create a <policy_name>.yaml file where <policy_name> describes the egress policy rules.
    2. In the file that you created, define an egress firewall policy object, as in the example file that follows this procedure.
  2. Enter the following command to create the policy object. Replace <policy_name> with the name of the policy and <project> with the project that the rule applies to.

    $ oc create -f <policy_name>.yaml -n <project>

    In the following example, a new EgressFirewall object is created in a project named project1:

    $ oc create -f default.yaml -n project1

    Example output

    egressfirewall.k8s.ovn.org/default created

  3. Optional: Save the <policy_name>.yaml file so that you can make changes later.
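
For reference, the following is a minimal sketch of a default.yaml file for this procedure. It reuses the allow-then-deny pattern from the earlier examples:

apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default
spec:
  egress:
  - type: Allow
    to:
      cidrSelector: 1.2.3.0/24
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0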

25.12. Viewing an egress firewall for a project

As a cluster administrator, you can list the names of any existing egress firewalls and view the traffic rules for a specific egress firewall.

25.12.1. Viewing an EgressFirewall object

You can view an EgressFirewall object in your cluster.

Prerequisites

  • A cluster using the OVN-Kubernetes network plugin.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster.

Procedure

  1. Optional: To view the names of the EgressFirewall objects defined in your cluster, enter the following command:

    $ oc get egressfirewall --all-namespaces
  2. To inspect a policy, enter the following command. Replace <policy_name> with the name of the policy to inspect.

    $ oc describe egressfirewall <policy_name>

    Example output

    Name:		default
    Namespace:	project1
    Created:	20 minutes ago
    Labels:		<none>
    Annotations:	<none>
    Rule:		Allow to 1.2.3.0/24
    Rule:		Allow to www.example.com
    Rule:		Deny to 0.0.0.0/0

25.13. Editing an egress firewall for a project

As a cluster administrator, you can modify network traffic rules for an existing egress firewall.

25.13.1. Editing an EgressFirewall object

As a cluster administrator, you can update the egress firewall for a project.

Prerequisites

  • A cluster using the OVN-Kubernetes network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Find the name of the EgressFirewall object for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressfirewall
  2. Optional: If you did not save a copy of the EgressFirewall object when you created the egress network firewall, enter the following command to create a copy.

    $ oc get -n <project> egressfirewall <name> -o yaml > <filename>.yaml

    Replace <project> with the name of the project. Replace <name> with the name of the object. Replace <filename> with the name of the file to save the YAML to.

  3. After making changes to the policy rules, enter the following command to replace the EgressFirewall object. Replace <filename> with the name of the file containing the updated EgressFirewall object.

    $ oc replace -f <filename>.yaml

25.14. Removing an egress firewall from a project

As a cluster administrator, you can remove an egress firewall from a project to remove all restrictions on network traffic from the project that leaves the OpenShift Container Platform cluster.

25.14.1. Removing an EgressFirewall object

As a cluster administrator, you can remove an egress firewall from a project.

Prerequisites

  • A cluster using the OVN-Kubernetes network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Find the name of the EgressFirewall object for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressfirewall
  2. Enter the following command to delete the EgressFirewall object. Replace <project> with the name of the project and <name> with the name of the object.

    $ oc delete -n <project> egressfirewall <name>

25.15. Configuring an egress IP address

As a cluster administrator, you can configure the OVN-Kubernetes Container Network Interface (CNI) network plugin to assign one or more egress IP addresses to a namespace, or to specific pods in a namespace.

25.15.1. Egress IP address architectural design and implementation

The OpenShift Container Platform egress IP address functionality allows you to ensure that the traffic from one or more pods in one or more namespaces has a consistent source IP address for services outside the cluster network.

For example, you might have a pod that periodically queries a database that is hosted on a server outside of your cluster. To enforce access requirements for the server, a packet filtering device is configured to allow traffic only from specific IP addresses. To ensure that you can reliably allow access to the server from only that specific pod, you can configure a specific egress IP address for the pod that makes the requests to the server.

An egress IP address assigned to a namespace is different from an egress router, which is used to send traffic to specific destinations.

In some cluster configurations, application pods and ingress router pods run on the same node. If you configure an egress IP address for an application project in this scenario, the IP address is not used when you send a request to a route from the application project.

Important

Egress IP addresses must not be configured in any Linux network configuration files, such as ifcfg-eth0.

25.15.1.1. Platform support

Support for the egress IP address functionality on various platforms is summarized in the following table:

PlatformSupported

Bare metal

Yes

VMware vSphere

Yes

Red Hat OpenStack Platform (RHOSP)

Yes

Amazon Web Services (AWS)

Yes

Google Cloud Platform (GCP)

Yes

Microsoft Azure

Yes

IBM Z and IBM® LinuxONE

Yes

IBM Z and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM

Yes

IBM Power

Yes

Important

The assignment of egress IP addresses to control plane nodes with the EgressIP feature is not supported on a cluster provisioned on Amazon Web Services (AWS). (BZ#2039656)

25.15.1.2. Public cloud platform considerations

For clusters provisioned on public cloud infrastructure, there is a constraint on the absolute number of assignable IP addresses per node. The maximum number of assignable IP addresses per node, or the IP capacity, can be described in the following formula:

IP capacity = public cloud default capacity - sum(current IP assignments)

While the Egress IPs capability manages the IP address capacity per node, it is important to plan for this constraint in your deployments. For example, for a cluster installed on bare-metal infrastructure with 8 nodes you can configure 150 egress IP addresses. However, if a public cloud provider limits IP address capacity to 10 IP addresses per node, the total number of assignable IP addresses is only 80. To achieve the same IP address capacity in this example cloud provider, you would need to allocate 7 additional nodes.

To confirm the IP capacity and subnets for any node in your public cloud environment, you can enter the oc get node <node_name> -o yaml command. The cloud.network.openshift.io/egress-ipconfig annotation includes capacity and subnet information for the node.
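
For example, the following is a minimal sketch that prints only the annotation value; <node_name> is a placeholder:

$ oc get node <node_name> \
  -o jsonpath='{.metadata.annotations.cloud\.network\.openshift\.io/egress-ipconfig}{"\n"}'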

The annotation value is an array with a single object with fields that provide the following information for the primary network interface:

  • interface: Specifies the interface ID on AWS and Azure and the interface name on GCP.
  • ifaddr: Specifies the subnet mask for one or both IP address families.
  • capacity: Specifies the IP address capacity for the node. On AWS, the IP address capacity is provided per IP address family. On Azure and GCP, the IP address capacity includes both IPv4 and IPv6 addresses.

Automatic attachment and detachment of egress IP addresses for traffic between nodes are available. This allows for traffic from many pods in namespaces to have a consistent source IP address to locations outside of the cluster. This also supports OpenShift SDN and OVN-Kubernetes, which is the default networking plugin in Red Hat OpenShift Networking in OpenShift Container Platform 4.13.

Note

When an RHOSP cluster administrator assigns a floating IP to the reservation port, OpenShift Container Platform cannot delete the reservation port. The CloudPrivateIPConfig object cannot perform delete and move operations until an RHOSP cluster administrator unassigns the floating IP from the reservation port.

The following examples illustrate the annotation from nodes on several public cloud providers. The annotations are indented for readability.

Example cloud.network.openshift.io/egress-ipconfig annotation on AWS

cloud.network.openshift.io/egress-ipconfig: [
  {
    "interface":"eni-078d267045138e436",
    "ifaddr":{"ipv4":"10.0.128.0/18"},
    "capacity":{"ipv4":14,"ipv6":15}
  }
]

Example cloud.network.openshift.io/egress-ipconfig annotation on GCP

cloud.network.openshift.io/egress-ipconfig: [
  {
    "interface":"nic0",
    "ifaddr":{"ipv4":"10.0.128.0/18"},
    "capacity":{"ip":14}
  }
]

The following sections describe the IP address capacity for supported public cloud environments for use in your capacity calculation.

25.15.1.2.1. Amazon Web Services (AWS) IP address capacity limits

On AWS, constraints on IP address assignments depend on the instance type configured. For more information, see IP addresses per network interface per instance type.

25.15.1.2.2. Google Cloud Platform (GCP) IP address capacity limits

On GCP, the networking model implements additional node IP addresses through IP address aliasing, rather than IP address assignments. However, IP address capacity maps directly to IP aliasing capacity.

The following capacity limits exist for IP aliasing assignment:

  • Per node, the maximum number of IP aliases, both IPv4 and IPv6, is 100.
  • Per VPC, the maximum number of IP aliases is unspecified, but OpenShift Container Platform scalability testing reveals the maximum to be approximately 15,000.

For more information, see Per instance quotas and Alias IP ranges overview.

25.15.1.2.3. Microsoft Azure IP address capacity limits

On Azure, the following capacity limits exist for IP address assignment:

  • Per NIC, the maximum number of assignable IP addresses, for both IPv4 and IPv6, is 256.
  • Per virtual network, the maximum number of assigned IP addresses cannot exceed 65,536.

For more information, see Networking limits.

25.15.1.3. Assignment of egress IPs to pods

To assign one or more egress IPs to a namespace or specific pods in a namespace, the following conditions must be satisfied:

  • At least one node in your cluster must have the k8s.ovn.org/egress-assignable: "" label.
  • An EgressIP object exists that defines one or more egress IP addresses to use as the source IP address for traffic leaving the cluster from pods in a namespace.
Important

If you create EgressIP objects prior to labeling any nodes in your cluster for egress IP assignment, OpenShift Container Platform might assign every egress IP address to the first node with the k8s.ovn.org/egress-assignable: "" label.

To ensure that egress IP addresses are widely distributed across nodes in the cluster, always apply the label to the nodes that you intend to host egress IP addresses before you create any EgressIP objects.

25.15.1.4. Assignment of egress IPs to nodes

When creating an EgressIP object, the following conditions apply to nodes that are labeled with the k8s.ovn.org/egress-assignable: "" label:

  • An egress IP address is never assigned to more than one node at a time.
  • An egress IP address is equally balanced between available nodes that can host the egress IP address.
  • If the spec.EgressIPs array in an EgressIP object specifies more than one IP address, the following conditions apply:

    • No node will ever host more than one of the specified IP addresses.
    • Traffic is balanced roughly equally between the specified IP addresses for a given namespace.
  • If a node becomes unavailable, any egress IP addresses assigned to it are automatically reassigned, subject to the previously described conditions.

When a pod matches the selector for multiple EgressIP objects, there is no guarantee which of the egress IP addresses that are specified in the EgressIP objects is assigned as the egress IP address for the pod.

Additionally, if an EgressIP object specifies multiple egress IP addresses, there is no guarantee which of the egress IP addresses might be used. For example, if a pod matches a selector for an EgressIP object with two egress IP addresses, 10.10.20.1 and 10.10.20.2, either might be used for each TCP connection or UDP conversation.
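
To check which nodes currently host the addresses from an EgressIP object, you can inspect the object status. The following is a minimal sketch; <name> is a placeholder for the EgressIP object name:

$ oc get egressip <name> -o jsonpath='{.status.items}{"\n"}'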

25.15.1.5. Architectural diagram of an egress IP address configuration

The following diagram depicts an egress IP address configuration. The diagram describes four pods in two different namespaces running on three nodes in a cluster. The nodes are assigned IP addresses from the 192.168.126.0/18 CIDR block on the host network.

Both Node 1 and Node 3 are labeled with k8s.ovn.org/egress-assignable: "" and thus available for the assignment of egress IP addresses.

The dashed lines in the diagram depict the traffic flow from pod1, pod2, and pod3 traveling through the pod network to egress the cluster from Node 1 and Node 3. When an external service receives traffic from any of the pods selected by the example EgressIP object, the source IP address is either 192.168.126.10 or 192.168.126.102. The traffic is balanced roughly equally between these two nodes.

The following resources from the diagram are illustrated in detail:

Namespace objects

The namespaces are defined in the following manifest:

Namespace objects

apiVersion: v1
kind: Namespace
metadata:
  name: namespace1
  labels:
    env: prod
---
apiVersion: v1
kind: Namespace
metadata:
  name: namespace2
  labels:
    env: prod

EgressIP object

The following EgressIP object describes a configuration that selects all pods in any namespace with the env label set to prod. The egress IP addresses for the selected pods are 192.168.126.10 and 192.168.126.102.

EgressIP object

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressips-prod
spec:
  egressIPs:
  - 192.168.126.10
  - 192.168.126.102
  namespaceSelector:
    matchLabels:
      env: prod
status:
  items:
  - node: node1
    egressIP: 192.168.126.10
  - node: node3
    egressIP: 192.168.126.102

For the configuration in the previous example, OpenShift Container Platform assigns both egress IP addresses to the available nodes. The status field reflects whether and where the egress IP addresses are assigned.

25.15.2. EgressIP object

The following YAML describes the API for the EgressIP object. The scope of the object is cluster-wide; it is not created in a namespace.

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: <name> 1
spec:
  egressIPs: 2
  - <ip_address>
  namespaceSelector: 3
    ...
  podSelector: 4
    ...
1
The name for the EgressIPs object.
2
An array of one or more IP addresses.
3
One or more selectors for the namespaces to associate the egress IP addresses with.
4
Optional: One or more selectors for pods in the specified namespaces to associate egress IP addresses with. Applying these selectors allows for the selection of a subset of pods within a namespace.

The following YAML describes the stanza for the namespace selector:

Namespace selector stanza

namespaceSelector: 1
  matchLabels:
    <label_name>: <label_value>

1
One or more matching rules for namespaces. If more than one match rule is provided, all matching namespaces are selected.

The following YAML describes the optional stanza for the pod selector:

Pod selector stanza

podSelector: 1
  matchLabels:
    <label_name>: <label_value>

1
Optional: One or more matching rules for pods in the namespaces that match the specified namespaceSelector rules. If specified, only pods that match are selected. Other pods in the namespace are not selected.

In the following example, the EgressIP object associates the 192.168.126.11 and 192.168.126.102 egress IP addresses with pods that have the app label set to web and are in the namespaces that have the env label set to prod:

Example EgressIP object

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-group1
spec:
  egressIPs:
  - 192.168.126.11
  - 192.168.126.102
  podSelector:
    matchLabels:
      app: web
  namespaceSelector:
    matchLabels:
      env: prod

In the following example, the EgressIP object associates the 192.168.127.30 and 192.168.127.40 egress IP addresses with any pods that do not have the environment label set to development:

Example EgressIP object

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-group2
spec:
  egressIPs:
  - 192.168.127.30
  - 192.168.127.40
  namespaceSelector:
    matchExpressions:
    - key: environment
      operator: NotIn
      values:
      - development

25.15.3. The egressIPConfig object

As a feature of egress IP, the reachabilityTotalTimeoutSeconds parameter configures the EgressIP node reachability check total timeout in seconds. If the EgressIP node cannot be reached within this timeout, the node is declared down.

You can set a value for reachabilityTotalTimeoutSeconds in the configuration file for the egressIPConfig object. Setting a large value might cause the EgressIP implementation to react slowly to node changes, such as when an EgressIP node has an issue and becomes unreachable.

If you omit the reachabilityTotalTimeoutSeconds parameter from the egressIPConfig object, the platform chooses a reasonable default value, which is subject to change over time. The current default is 1 second. A value of 0 disables the reachability check for the EgressIP node.

The following egressIPConfig object describes changing the reachabilityTotalTimeoutSeconds from the default 1 second probes to 5 second probes:

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  defaultNetwork:
    ovnKubernetesConfig:
      egressIPConfig: 1
        reachabilityTotalTimeoutSeconds: 5 2
      gatewayConfig:
        routingViaHost: false
      genevePort: 6081
1
The egressIPConfig holds the configurations for the options of the EgressIP object. By changing these configurations, you can extend the EgressIP object.
2
The value for reachabilityTotalTimeoutSeconds accepts integer values from 0 to 60. A value of 0 disables the reachability check of the egressIP node. Setting a value from 1 to 60 corresponds to the timeout in seconds for a probe to send the reachability check to the node.
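
One way to apply this change without editing the full manifest is a merge patch. The following is a minimal sketch; the 5 second timeout is illustrative:

$ oc patch network.operator.openshift.io cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"egressIPConfig":{"reachabilityTotalTimeoutSeconds":5}}}}}'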

25.15.4. Labeling a node to host egress IP addresses

You can apply the k8s.ovn.org/egress-assignable="" label to a node in your cluster so that OpenShift Container Platform can assign one or more egress IP addresses to the node.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster as a cluster administrator.

Procedure

  • To label a node so that it can host one or more egress IP addresses, enter the following command:

    $ oc label nodes <node_name> k8s.ovn.org/egress-assignable="" 1
    1
    The name of the node to label.
    Tip

    You can alternatively apply the following YAML to add the label to a node:

    apiVersion: v1
    kind: Node
    metadata:
      labels:
        k8s.ovn.org/egress-assignable: ""
      name: <node_name>

25.15.5. Next steps

25.15.6. Additional resources

25.16. Assigning an egress IP address

As a cluster administrator, you can assign an egress IP address for traffic leaving the cluster from a namespace or from specific pods in a namespace.

25.16.1. Assigning an egress IP address to a namespace

You can assign one or more egress IP addresses to a namespace or to specific pods in a namespace.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster as a cluster administrator.
  • Configure at least one node to host an egress IP address.

Procedure

  1. Create an EgressIP object:

    1. Create a <egressips_name>.yaml file where <egressips_name> is the name of the object.
    2. In the file that you created, define an EgressIP object, as in the following example:

      apiVersion: k8s.ovn.org/v1
      kind: EgressIP
      metadata:
        name: egress-project1
      spec:
        egressIPs:
        - 192.168.127.10
        - 192.168.127.11
        namespaceSelector:
          matchLabels:
            env: qa
  2. To create the object, enter the following command.

    $ oc apply -f <egressips_name>.yaml 1
    1
    Replace <egressips_name> with the name of the object.

    Example output

    egressips.k8s.ovn.org/<egressips_name> created

  3. Optional: Store the <egressips_name>.yaml file so that you can make changes later.
  4. Add labels to the namespace that requires egress IP addresses. To add a label to the namespace of an EgressIP object defined in step 1, run the following command:

    $ oc label ns <namespace> env=qa 1
    1
    Replace <namespace> with the namespace that requires egress IP addresses.

Verification

  • To show all egress IPs that are in use in your cluster, enter the following command:

    $ oc get egressip -o yaml
    Note

    The oc get egressip command returns only one egress IP address regardless of how many are configured. This is not a bug; it is a limitation of Kubernetes. As a workaround, you can pass the -o yaml or -o json flag to return all egress IP addresses in use.

    Example output

    # ...
    spec:
      egressIPs:
      - 192.168.127.10
      - 192.168.127.11
    # ...
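
    To see which nodes the egress IP addresses were assigned to, you can also inspect the status of the EgressIP object that you created in this procedure. This is a minimal sketch; the exact layout of the status fields can vary between versions:

    $ oc get egressip egress-project1 -o jsonpath='{.status}{"\n"}'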

25.16.2. Additional resources

25.17. Considerations for the use of an egress router pod

25.17.1. About an egress router pod

The OpenShift Container Platform egress router pod redirects traffic to a specified remote server from a private source IP address that is not used for any other purpose. An egress router pod can send network traffic to servers that are set up to allow access only from specific IP addresses.

Note

The egress router pod is not intended for every outgoing connection. Creating large numbers of egress router pods can exceed the limits of your network hardware. For example, creating an egress router pod for every project or application could exceed the number of local MAC addresses that the network interface can handle before reverting to filtering MAC addresses in software.

Important

The egress router image is not compatible with Amazon AWS, Azure Cloud, or any other cloud platform that does not support layer 2 manipulations, because these platforms are incompatible with macvlan traffic.

25.17.1.1. Egress router modes

In redirect mode, an egress router pod configures iptables rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that need to use the reserved source IP address must be configured to access the service for the egress router rather than connecting directly to the destination IP. You can access the destination service and port from the application pod by using the curl command. For example:

$ curl <router_service_IP>:<port>
Note

The egress router CNI plugin supports redirect mode only. This differs from the egress router implementation that you can deploy with OpenShift SDN, which also supports HTTP proxy mode and DNS proxy mode.
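
For example, if the egress router is exposed through a service named egress-1 that listens on port 8080, a client pod in the same namespace that has curl available might connect as follows. The pod and service names here are illustrative only:

$ oc exec <client_pod> -- curl -s http://egress-1:8080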

25.17.1.2. Egress router pod implementation

The egress router implementation uses the egress router Container Network Interface (CNI) plugin. The plugin adds a secondary network interface to a pod.

An egress router is a pod that has two network interfaces. For example, the pod can have eth0 and net1 network interfaces. The eth0 interface is on the cluster network and the pod continues to use the interface for ordinary cluster-related network traffic. The net1 interface is on a secondary network and has an IP address and gateway for that network. Other pods in the OpenShift Container Platform cluster can access the egress router service and the service enables the pods to access external services. The egress router acts as a bridge between pods and an external system.

Traffic that leaves the egress router exits through a node, but the packets have the MAC address of the net1 interface from the egress router pod.

When you add an egress router custom resource, the Cluster Network Operator creates the following objects:

  • The network attachment definition for the net1 secondary network interface of the pod.
  • A deployment for the egress router.

If you delete an egress router custom resource, the Operator deletes the two objects in the preceding list that are associated with the egress router.

25.17.1.3. Deployment considerations

An egress router pod adds an additional IP address and MAC address to the primary network interface of the node. As a result, you might need to configure your hypervisor or cloud provider to allow the additional address.

Red Hat OpenStack Platform (RHOSP)

If you deploy OpenShift Container Platform on RHOSP, you must allow traffic from the IP and MAC addresses of the egress router pod on your OpenStack environment. If you do not allow the traffic, then communication will fail:

$ openstack port set --allowed-address \
  ip_address=<ip_address>,mac_address=<mac_address> <neutron_port_uuid>
Red Hat Virtualization (RHV)
If you are using RHV, you must select No Network Filter for the Virtual network interface controller (vNIC).
VMware vSphere
If you are using VMware vSphere, see the VMware documentation for securing vSphere standard switches. View and change VMware vSphere default settings by selecting the host virtual switch from the vSphere Web Client.

Specifically, ensure that the following are enabled:

  • MAC Address Changes
  • Forged Transmits
  • Promiscuous Mode Operation

25.17.1.4. Failover configuration

To avoid downtime, the Cluster Network Operator deploys the egress router pod as a deployment resource. The deployment name is egress-router-cni-deployment. The pod that corresponds to the deployment has a label of app=egress-router-cni.

To create a new service for the deployment, use the oc expose deployment/egress-router-cni-deployment --port <port_number> command or create a file like the following example:

apiVersion: v1
kind: Service
metadata:
  name: app-egress
spec:
  ports:
  - name: tcp-8080
    protocol: TCP
    port: 8080
  - name: tcp-8443
    protocol: TCP
    port: 8443
  - name: udp-80
    protocol: UDP
    port: 80
  type: ClusterIP
  selector:
    app: egress-router-cni
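
Assuming that you save the preceding manifest as app-egress.yaml (the file name is only an example), you can create the service and confirm that it selects the egress router pod:

$ oc apply -f app-egress.yaml
$ oc get endpoints app-egress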

25.17.2. Additional resources

25.18. Deploying an egress router pod in redirect mode

As a cluster administrator, you can deploy an egress router pod to redirect traffic to specified destination IP addresses from a reserved source IP address.

The egress router implementation uses the egress router Container Network Interface (CNI) plugin.

25.18.1. Egress router custom resource

Define the configuration for an egress router pod in an egress router custom resource. The following YAML describes the fields for the configuration of an egress router in redirect mode:

apiVersion: network.operator.openshift.io/v1
kind: EgressRouter
metadata:
  name: <egress_router_name>
  namespace: <namespace>  1
spec:
  addresses: [  2
    {
      ip: "<egress_router>",  3
      gateway: "<egress_gateway>"  4
    }
  ]
  mode: Redirect
  redirect: {
    redirectRules: [  5
      {
        destinationIP: "<egress_destination>",
        port: <egress_router_port>,
        targetPort: <target_port>,  6
        protocol: <network_protocol>  7
      },
      ...
    ],
    fallbackIP: "<egress_destination>" 8
  }

1
Optional: The namespace field specifies the namespace to create the egress router in. If you do not specify a value in the file or on the command line, the default namespace is used.
2
The addresses field specifies the IP addresses to configure on the secondary network interface.
3
The ip field specifies the reserved source IP address and netmask from the physical network that the node is on to use with the egress router pod. Use CIDR notation to specify the IP address and netmask.
4
The gateway field specifies the IP address of the network gateway.
5
Optional: The redirectRules field specifies a combination of egress destination IP address, egress router port, and protocol. Incoming connections to the egress router on the specified port and protocol are routed to the destination IP address.
6
Optional: The targetPort field specifies the network port on the destination IP address. If this field is not specified, traffic is routed to the same network port that it arrived on.
7
The protocol field supports TCP, UDP, or SCTP.
8
Optional: The fallbackIP field specifies a destination IP address. If you do not specify any redirect rules, the egress router sends all traffic to this fallback IP address. If you specify redirect rules, any connections to network ports that are not defined in the rules are sent by the egress router to this fallback IP address. If you do not specify this field, the egress router rejects connections to network ports that are not defined in the rules.

Example egress router specification

apiVersion: network.operator.openshift.io/v1
kind: EgressRouter
metadata:
  name: egress-router-redirect
spec:
  networkInterface: {
    macvlan: {
      mode: "Bridge"
    }
  }
  addresses: [
    {
      ip: "192.168.12.99/24",
      gateway: "192.168.12.1"
    }
  ]
  mode: Redirect
  redirect: {
    redirectRules: [
      {
        destinationIP: "10.0.0.99",
        port: 80,
        protocol: UDP
      },
      {
        destinationIP: "203.0.113.26",
        port: 8080,
        targetPort: 80,
        protocol: TCP
      },
      {
        destinationIP: "203.0.113.27",
        port: 8443,
        targetPort: 443,
        protocol: TCP
      }
    ]
  }

25.18.2. Deploying an egress router in redirect mode

You can deploy an egress router to redirect traffic from its own reserved source IP address to one or more destination IP addresses.

After you add an egress router, the client pods that need to use the reserved source IP address must be modified to connect to the egress router rather than connecting directly to the destination IP.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an egress router definition. For an example of creating the object from a saved file, see the sketch after this procedure.
  2. To ensure that other pods can find the IP address of the egress router pod, create a service that uses the egress router, as in the following example:

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-1
    spec:
      ports:
      - name: web-app
        protocol: TCP
        port: 8080
      type: ClusterIP
      selector:
        app: egress-router-cni 1

    1
    Specify the label for the egress router. The value shown is added by the Cluster Network Operator and is not configurable.

    After you create the service, your pods can connect to the service. The egress router pod redirects traffic to the corresponding port on the destination IP address. The connections originate from the reserved source IP address.
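
    For example, assuming that you saved the egress router custom resource from the previous section as egress-router-redirect.yaml and the preceding service as egress-1.yaml (both file names are placeholders), you can create both objects as follows:

    $ oc apply -f egress-router-redirect.yaml
    $ oc apply -f egress-1.yaml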

Verification

To verify that the Cluster Network Operator started the egress router, complete the following procedure:

  1. View the network attachment definition that the Operator created for the egress router:

    $ oc get network-attachment-definition egress-router-cni-nad

    The name of the network attachment definition is not configurable.

    Example output

    NAME                    AGE
    egress-router-cni-nad   18m

  2. View the deployment for the egress router pod:

    $ oc get deployment egress-router-cni-deployment

    The name of the deployment is not configurable.

    Example output

    NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
    egress-router-cni-deployment   1/1     1            1           18m

  3. View the status of the egress router pod:

    $ oc get pods -l app=egress-router-cni

    Example output

    NAME                                            READY   STATUS    RESTARTS   AGE
    egress-router-cni-deployment-575465c75c-qkq6m   1/1     Running   0          18m

  4. View the logs and the routing table for the egress router pod.
  1. Get the node name for the egress router pod:

    $ POD_NODENAME=$(oc get pod -l app=egress-router-cni -o jsonpath="{.items[0].spec.nodeName}")
  2. Enter into a debug session on the target node. This step instantiates a debug pod called <node_name>-debug:

    $ oc debug node/$POD_NODENAME
  3. Set /host as the root directory within the debug shell. The debug pod mounts the root file system of the host in /host within the pod. By changing the root directory to /host, you can run binaries from the executable paths of the host:

    # chroot /host
  4. From within the chroot environment console, display the egress router logs:

    # cat /tmp/egress-router-log

    Example output

    2021-04-26T12:27:20Z [debug] Called CNI ADD
    2021-04-26T12:27:20Z [debug] Gateway: 192.168.12.1
    2021-04-26T12:27:20Z [debug] IP Source Addresses: [192.168.12.99/24]
    2021-04-26T12:27:20Z [debug] IP Destinations: [80 UDP 10.0.0.99/30 8080 TCP 203.0.113.26/30 80 8443 TCP 203.0.113.27/30 443]
    2021-04-26T12:27:20Z [debug] Created macvlan interface
    2021-04-26T12:27:20Z [debug] Renamed macvlan to "net1"
    2021-04-26T12:27:20Z [debug] Adding route to gateway 192.168.12.1 on macvlan interface
    2021-04-26T12:27:20Z [debug] deleted default route {Ifindex: 3 Dst: <nil> Src: <nil> Gw: 10.128.10.1 Flags: [] Table: 254}
    2021-04-26T12:27:20Z [debug] Added new default route with gateway 192.168.12.1
    2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p UDP --dport 80 -j DNAT --to-destination 10.0.0.99
    2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p TCP --dport 8080 -j DNAT --to-destination 203.0.113.26:80
    2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat PREROUTING -i eth0 -p TCP --dport 8443 -j DNAT --to-destination 203.0.113.27:443
    2021-04-26T12:27:20Z [debug] Added iptables rule: iptables -t nat -o net1 -j SNAT --to-source 192.168.12.99

    The logging file location and logging level are not configurable when you start the egress router by creating an EgressRouter object as described in this procedure.

  5. From within the chroot environment console, get the container ID:

    # crictl ps --name egress-router-cni-pod | awk '{print $1}'

    Example output

    CONTAINER
    bac9fae69ddb6

  6. Determine the process ID of the container. In this example, the container ID is bac9fae69ddb6:

    # crictl inspect -o yaml bac9fae69ddb6 | grep 'pid:' | awk '{print $2}'

    Example output

    68857

  7. Enter the network namespace of the container:

    # nsenter -n -t 68857
  8. Display the routing table:

    # ip route

    In the following example output, the net1 network interface is the default route. Traffic for the cluster network uses the eth0 network interface. Traffic for the 192.168.12.0/24 network uses the net1 network interface and originates from the reserved source IP address 192.168.12.99. The pod routes all other traffic to the gateway at IP address 192.168.12.1. Routing for the service network is not shown.

    Example output

    default via 192.168.12.1 dev net1
    10.128.10.0/23 dev eth0 proto kernel scope link src 10.128.10.18
    192.168.12.0/24 dev net1 proto kernel scope link src 192.168.12.99
    192.168.12.1 dev net1

25.19. Enabling multicast for a project

25.19.1. About multicast

With IP multicast, data is broadcast to many IP addresses simultaneously.

Important
  • At this time, multicast is best used for low-bandwidth coordination or service discovery, not as a high-bandwidth solution.
  • By default, network policies affect all connections in a namespace. However, multicast is unaffected by network policies. If multicast is enabled in the same namespace as your network policies, it is always allowed, even if there is a deny-all network policy. Cluster administrators should consider the implications to the exemption of multicast from network policies before enabling it.

Multicast traffic between OpenShift Container Platform pods is disabled by default. If you are using the OVN-Kubernetes network plugin, you can enable multicast on a per-project basis.

25.19.2. Enabling multicast between pods

You can enable multicast between pods for your project.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Run the following command to enable multicast for a project. Replace <namespace> with the namespace for the project you want to enable multicast for.

    $ oc annotate namespace <namespace> \
        k8s.ovn.org/multicast-enabled=true
    Tip

    You can alternatively apply the following YAML to add the annotation:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: <namespace>
      annotations:
        k8s.ovn.org/multicast-enabled: "true"

Verification

To verify that multicast is enabled for a project, complete the following procedure:

  1. Change your current project to the project that you enabled multicast for. Replace <project> with the project name.

    $ oc project <project>
  2. Create a pod to act as a multicast receiver:

    $ cat <<EOF| oc create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: mlistener
      labels:
        app: multicast-verify
    spec:
      containers:
        - name: mlistener
          image: registry.access.redhat.com/ubi9
          command: ["/bin/sh", "-c"]
          args:
            ["dnf -y install socat hostname && sleep inf"]
          ports:
            - containerPort: 30102
              name: mlistener
              protocol: UDP
    EOF
  3. Create a pod to act as a multicast sender:

    $ cat <<EOF| oc create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: msender
      labels:
        app: multicast-verify
    spec:
      containers:
        - name: msender
          image: registry.access.redhat.com/ubi9
          command: ["/bin/sh", "-c"]
          args:
            ["dnf -y install socat && sleep inf"]
    EOF
  4. In a new terminal window or tab, start the multicast listener.

    1. Get the IP address for the Pod:

      $ POD_IP=$(oc get pods mlistener -o jsonpath='{.status.podIP}')
    2. Start the multicast listener by entering the following command:

      $ oc exec mlistener -i -t -- \
          socat UDP4-RECVFROM:30102,ip-add-membership=224.1.0.1:$POD_IP,fork EXEC:hostname
  5. Start the multicast transmitter.

    1. Get the pod network IP address range:

      $ CIDR=$(oc get Network.config.openshift.io cluster \
          -o jsonpath='{.status.clusterNetwork[0].cidr}')
    2. To send a multicast message, enter the following command:

      $ oc exec msender -i -t -- \
          /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"

      If multicast is working, the previous command returns the following output:

      mlistener

25.20. Disabling multicast for a project

25.20.1. Disabling multicast between pods

You can disable multicast between pods for your project.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Disable multicast by running the following command:

    $ oc annotate namespace <namespace> \ 1
        k8s.ovn.org/multicast-enabled-
    1
    The namespace for the project you want to disable multicast for.
    Tip

    You can alternatively apply the following YAML to delete the annotation:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: <namespace>
      annotations:
        k8s.ovn.org/multicast-enabled: null

25.21. Tracking network flows

As a cluster administrator, you can collect information about pod network flows from your cluster to assist with the following areas:

  • Monitor ingress and egress traffic on the pod network.
  • Troubleshoot performance issues.
  • Gather data for capacity planning and security audits.

When you enable the collection of network flows, only metadata about the traffic is collected. Packet payloads are not collected, but the protocol, source address, destination address, port numbers, number of bytes, and other packet-level information are collected.

The data is collected in one or more of the following record formats:

  • NetFlow
  • sFlow
  • IPFIX

When you configure the Cluster Network Operator (CNO) with one or more collector IP addresses and port numbers, the Operator configures Open vSwitch (OVS) on each node to send the network flows records to each collector.

You can configure the Operator to send records to more than one type of network flow collector. For example, you can send records to NetFlow collectors and also send records to sFlow collectors.

When OVS sends data to the collectors, each type of collector receives identical records. For example, if you configure two NetFlow collectors, OVS on a node sends identical records to the two collectors. If you also configure two sFlow collectors, the two sFlow collectors receive identical records. However, each collector type has a unique record format.

Collecting the network flows data and sending the records to collectors affects performance. Nodes process packets at a slower rate. If the performance impact is too great, you can delete the destinations for collectors to disable collecting network flows data and restore performance.

Note

Enabling network flow collectors might have an impact on the overall performance of the cluster network.

25.21.1. Network object configuration for tracking network flows

The fields for configuring network flows collectors in the Cluster Network Operator (CNO) are shown in the following table:

Table 25.13. Network flows configuration

Field                                          Type     Description

metadata.name                                  string   The name of the CNO object. This name is always cluster.
spec.exportNetworkFlows                        object   One or more of netFlow, sFlow, or ipfix.
spec.exportNetworkFlows.netFlow.collectors     array    A list of IP address and network port pairs for up to 10 collectors.
spec.exportNetworkFlows.sFlow.collectors       array    A list of IP address and network port pairs for up to 10 collectors.
spec.exportNetworkFlows.ipfix.collectors       array    A list of IP address and network port pairs for up to 10 collectors.

After applying the following manifest to the CNO, the Operator configures Open vSwitch (OVS) on each node in the cluster to send network flows records to the NetFlow collector that is listening at 192.168.1.99:2056.

Example configuration for tracking network flows

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  exportNetworkFlows:
    netFlow:
      collectors:
        - 192.168.1.99:2056
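
Because OVS can send records to more than one collector type at the same time, a configuration can combine formats. The following sketch uses placeholder collector addresses and ports that you would replace with your own:

apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  exportNetworkFlows:
    netFlow:
      collectors:
        - 192.168.1.99:2056
    sFlow:
      collectors:
        - 192.168.1.100:6343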

25.21.2. Adding destinations for network flows collectors

As a cluster administrator, you can configure the Cluster Network Operator (CNO) to send network flows metadata about the pod network to a network flows collector.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.
  • You have a network flows collector and know the IP address and port that it listens on.

Procedure

  1. Create a patch file that specifies the network flows collector type and the IP address and port information of the collectors:

    spec:
      exportNetworkFlows:
        netFlow:
          collectors:
            - 192.168.1.99:2056
  2. Configure the CNO with the network flows collectors:

    $ oc patch network.operator cluster --type merge -p "$(cat <file_name>.yaml)"

    Example output

    network.operator.openshift.io/cluster patched

Verification

Verification is not typically necessary. You can run the following command to confirm that Open vSwitch (OVS) on each node is configured to send network flows records to one or more collectors.

  1. View the Operator configuration to confirm that the exportNetworkFlows field is configured:

    $ oc get network.operator cluster -o jsonpath="{.spec.exportNetworkFlows}"

    Example output

    {"netFlow":{"collectors":["192.168.1.99:2056"]}}

  2. View the network flows configuration in OVS from each node:

    $ for pod in $(oc get pods -n openshift-ovn-kubernetes -l app=ovnkube-node -o jsonpath='{range@.items[*]}{.metadata.name}{"\n"}{end}');
      do
        echo;
        echo $pod;
        oc -n openshift-ovn-kubernetes exec -c ovnkube-node $pod \
          -- bash -c 'for type in ipfix sflow netflow ; do ovs-vsctl find $type ; done';
    done

    Example output

    ovnkube-node-xrn4p
    _uuid               : a4d2aaca-5023-4f3d-9400-7275f92611f9
    active_timeout      : 60
    add_id_to_interface : false
    engine_id           : []
    engine_type         : []
    external_ids        : {}
    targets             : ["192.168.1.99:2056"]
    
    ovnkube-node-z4vq9
    _uuid               : 61d02fdb-9228-4993-8ff5-b27f01a29bd6
    active_timeout      : 60
    add_id_to_interface : false
    engine_id           : []
    engine_type         : []
    external_ids        : {}
    targets             : ["192.168.1.99:2056"]
    
    ...

25.21.3. Deleting all destinations for network flows collectors

As a cluster administrator, you can configure the Cluster Network Operator (CNO) to stop sending network flows metadata to a network flows collector.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in to the cluster with a user with cluster-admin privileges.

Procedure

  1. Remove all network flows collectors:

    $ oc patch network.operator cluster --type='json' \
        -p='[{"op":"remove", "path":"/spec/exportNetworkFlows"}]'

    Example output

    network.operator.openshift.io/cluster patched
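
    To confirm that no collectors remain configured, you can check that the exportNetworkFlows field is no longer set. Empty output indicates that network flows collection is disabled:

    $ oc get network.operator cluster -o jsonpath="{.spec.exportNetworkFlows}"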

25.21.4. Additional resources

25.22. Configuring hybrid networking

As a cluster administrator, you can configure the Red Hat OpenShift Networking OVN-Kubernetes network plugin to allow Linux and Windows nodes to host Linux and Windows workloads, respectively.

25.22.1. Configuring hybrid networking with OVN-Kubernetes

You can configure your cluster to use hybrid networking with the OVN-Kubernetes network plugin. This allows a hybrid cluster that supports different node networking configurations.

Note

This configuration is necessary to run both Linux and Windows nodes in the same cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to the cluster as a user with cluster-admin privileges.
  • Ensure that the cluster uses the OVN-Kubernetes network plugin.

Procedure

  1. To configure the OVN-Kubernetes hybrid network overlay, enter the following command (a filled-in example with sample values appears after this procedure):

    $ oc patch networks.operator.openshift.io cluster --type=merge \
      -p '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "hybridOverlayConfig":{
                "hybridClusterNetwork":[
                  {
                    "cidr": "<cidr>",
                    "hostPrefix": <prefix>
                  }
                ],
                "hybridOverlayVXLANPort": <overlay_port>
              }
            }
          }
        }
      }'

    where:

    cidr
    Specify the CIDR configuration used for nodes on the additional overlay network. This CIDR must not overlap with the cluster network CIDR.
    hostPrefix
    Specifies the subnet prefix length to assign to each individual node. For example, if hostPrefix is set to 23, then each node is assigned a /23 subnet out of the given cidr, which allows for 510 (2^(32 - 23) - 2) pod IP addresses. If you are required to provide access to nodes from an external network, configure load balancers and routers to manage the traffic.
    hybridOverlayVXLANPort
    Specify a custom VXLAN port for the additional overlay network. This is required for running Windows nodes in a cluster installed on vSphere, and must not be configured for any other cloud provider. The custom port can be any open port excluding the default 4789 port. For more information on this requirement, see the Microsoft documentation on Pod-to-pod connectivity between hosts is broken.
    Note

    Windows Server Long-Term Servicing Channel (LTSC): Windows Server 2019 is not supported on clusters with a custom hybridOverlayVXLANPort value because this Windows server version does not support selecting a custom VXLAN port.

    Example output

    network.operator.openshift.io/cluster patched

  2. To confirm that the configuration is active, enter the following command. It can take several minutes for the update to apply.

    $ oc get network.operator.openshift.io -o jsonpath="{.items[0].spec.defaultNetwork.ovnKubernetesConfig}"
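
    For illustration, the patch command from step 1 with sample values filled in might look like the following. The CIDR and host prefix are placeholders that you must adapt to your environment, and hybridOverlayVXLANPort is omitted because it is required only on vSphere:

    $ oc patch networks.operator.openshift.io cluster --type=merge \
      -p '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "hybridOverlayConfig":{
                "hybridClusterNetwork":[
                  {
                    "cidr": "10.132.0.0/14",
                    "hostPrefix": 23
                  }
                ]
              }
            }
          }
        }
      }'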

25.22.2. Additional resources

Chapter 26. OpenShift SDN network plugin

26.1. About the OpenShift SDN network plugin

Part of Red Hat OpenShift Networking, OpenShift SDN is a network plugin that uses a software-defined networking (SDN) approach to provide a unified cluster network that enables communication between pods across the OpenShift Container Platform cluster. This pod network is established and maintained by OpenShift SDN, which configures an overlay network by using Open vSwitch (OVS).

Important

For a cloud controller manager (CCM) with the --cloud-provider=external option set to cloud-provider-vsphere, a known issue exists for a cluster that operates in a networking environment with multiple subnets.

When you upgrade your cluster from OpenShift Container Platform 4.12 to OpenShift Container Platform 4.13, the CCM selects a wrong node IP address and this operation generates an error message in the namespaces/openshift-cloud-controller-manager/pods/vsphere-cloud-controller-manager logs. The error message indicates a mismatch with the node IP address and the vsphere-cloud-controller-manager pod IP address in your cluster.

The known issue might not impact the cluster upgrade operation, but you can set the correct IP address in both the nodeNetworking.external.networkSubnetCidr and the nodeNetworking.internal.networkSubnetCidr parameters for the nodeNetworking object that your cluster uses for its networking requirements.
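
A minimal sketch of how these parameters might be arranged, assuming that the nodeNetworking object sits in the vSphere platform configuration and that each parameter accepts a list of CIDR blocks; the addresses are placeholders, and you should verify the exact location and format against the vSphere configuration documentation for your version:

nodeNetworking:
  external:
    networkSubnetCidr:
    - 192.168.100.0/24
  internal:
    networkSubnetCidr:
    - 192.168.100.0/24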

26.1.1. OpenShift SDN network isolation modes

OpenShift SDN provides three SDN modes for configuring the pod network:

  • Network policy mode allows project administrators to configure their own isolation policies using NetworkPolicy objects. Network policy is the default mode in OpenShift Container Platform 4.13.
  • Multitenant mode provides project-level isolation for pods and services. Pods from different projects cannot send packets to or receive packets from pods and services of a different project. You can disable isolation for a project, allowing it to send network traffic to all pods and services in the entire cluster and receive network traffic from those pods and services.
  • Subnet mode provides a flat pod network where every pod can communicate with every other pod and service. The network policy mode provides the same functionality as subnet mode.

26.1.2. Supported network plugin feature matrix

Red Hat OpenShift Networking offers two options for the network plugin: OpenShift SDN and OVN-Kubernetes. The following table summarizes the current feature support for both network plugins:

Table 26.1. Default CNI network plugin feature comparison

Feature                                            OpenShift SDN    OVN-Kubernetes

Egress IPs                                         Supported        Supported
Egress firewall                                    Supported        Supported [1]
Egress router                                      Supported        Supported [2]
Hybrid networking                                  Not supported    Supported
IPsec encryption for intra-cluster communication   Not supported    Supported
IPv4 single-stack                                  Supported        Supported
IPv6 single-stack                                  Not supported    Supported [3]
IPv4/IPv6 dual-stack                               Not supported    Supported [4]
IPv6/IPv4 dual-stack                               Not supported    Supported [5]
Kubernetes network policy                          Supported        Supported
Kubernetes network policy logs                     Not supported    Supported
Hardware offloading                                Not supported    Supported
Multicast                                          Supported        Supported

  1. Egress firewall is also known as egress network policy in OpenShift SDN. This is not the same as network policy egress.
  2. Egress router for OVN-Kubernetes supports only redirect mode.
  3. IPv6 single-stack networking on a bare-metal platform.
  4. IPv4/IPv6 dual-stack networking on bare-metal, VMware vSphere (installer-provisioned infrastructure installations only), IBM Power®, and IBM Z® platforms. On VMware vSphere, dual-stack networking limitations exist.
  5. IPv6/IPv4 dual-stack networking on bare-metal and IBM Power® platforms.

Additional resources

26.2. Migrating to the OpenShift SDN network plugin

As a cluster administrator, you can migrate to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin.

To learn more about OpenShift SDN, read About the OpenShift SDN network plugin.

26.2.1. How the migration process works

The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.

Table 26.2. Migrating to OpenShift SDN from OVN-Kubernetes

User-initiated step
Set the migration field of the Network.operator.openshift.io custom resource (CR) named cluster to OpenShiftSDN. Make sure the migration field is null before setting it to a value.

Migration activity
Cluster Network Operator (CNO)
Updates the status of the Network.config.openshift.io CR named cluster accordingly.
Machine Config Operator (MCO)
Rolls out an update to the systemd configuration necessary for OpenShift SDN; the MCO updates a single machine per pool at a time by default, causing the total time the migration takes to increase with the size of the cluster.

User-initiated step
Update the networkType field of the Network.config.openshift.io CR.

Migration activity
CNO
Performs the following actions:

  • Destroys the OVN-Kubernetes control plane pods.
  • Deploys the OpenShift SDN control plane pods.
  • Updates the Multus objects to reflect the new network plugin.

User-initiated step
Reboot each node in the cluster.

Migration activity
Cluster
As nodes reboot, the cluster assigns IP addresses to pods on the OpenShift SDN cluster network.

26.2.2. Migrating to the OpenShift SDN network plugin

Cluster administrators can roll back to the OpenShift SDN Container Network Interface (CNI) network plugin by using the offline migration method. During the migration you must manually reboot every node in your cluster. With the offline migration method, there is some downtime, during which your cluster is unreachable.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.
  • A cluster installed on infrastructure configured with the OVN-Kubernetes network plugin.
  • A recent backup of the etcd database is available.
  • A reboot can be triggered manually for each node.
  • The cluster is in a known good state, without any errors.

Procedure

  1. Stop all of the machine configuration pools managed by the Machine Config Operator (MCO):

    • Stop the master configuration pool by entering the following command in your CLI:

      $ oc patch MachineConfigPool master --type='merge' --patch \
        '{ "spec": { "paused": true } }'
    • Stop the worker machine configuration pool by entering the following command in your CLI:

      $ oc patch MachineConfigPool worker --type='merge' --patch \
        '{ "spec":{ "paused": true } }'
  2. To prepare for the migration, set the migration field to null by entering the following command in your CLI:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": null } }'
  3. Check that the migration status is empty for the Network.config.openshift.io object by entering the following command in your CLI. Empty command output indicates that the object is not in a migration operation.

    $ oc get Network.config cluster -o jsonpath='{.status.migration}'
  4. Apply the patch to the Network.operator.openshift.io object to set the network plugin back to OpenShift SDN by entering the following command in your CLI:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": { "networkType": "OpenShiftSDN" } } }'
    Important

    If you applied the patch to the Network.config.openshift.io object before the patch operation finalizes on the Network.operator.openshift.io object, the Cluster Network Operator (CNO) enters a degraded state, which causes a slight delay until the CNO recovers.

  5. Confirm that the migration status of the network plugin for the Network.config.openshift.io cluster object is OpenShiftSDN by entering the following command in your CLI:

    $ oc get Network.config cluster -o jsonpath='{.status.migration.networkType}'
  6. Apply the patch to the Network.config.openshift.io object to set the network plugin back to OpenShift SDN by entering the following command in your CLI:

    $ oc patch Network.config.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "networkType": "OpenShiftSDN" } }'
  7. Optional: Disable automatic migration of several OVN-Kubernetes capabilities to the OpenShift SDN equivalents:

    • Egress IPs
    • Egress firewall
    • Multicast

    To disable automatic migration of the configuration for any of the previously noted OpenShift SDN features, specify the following keys:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{
        "spec": {
          "migration": {
            "networkType": "OpenShiftSDN",
            "features": {
              "egressIP": <bool>,
              "egressFirewall": <bool>,
              "multicast": <bool>
            }
          }
        }
      }'

    where:

    bool: Specifies whether to enable migration of the feature. The default is true.

  8. Optional: You can customize the following settings for OpenShift SDN to meet your network infrastructure requirements:

    • Maximum transmission unit (MTU)
    • VXLAN port

    To customize either or both of the previously noted settings, customize and enter the following command in your CLI. If you do not need to change the default value, omit the key from the patch.

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "openshiftSDNConfig":{
              "mtu":<mtu>,
              "vxlanPort":<port>
        }}}}'
    mtu
    The MTU for the VXLAN overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to 50 less than the smallest node MTU value.
    port
    The UDP port for the VXLAN overlay network. If a value is not specified, the default is 4789. The port cannot be the same as the Geneve port that is used by OVN-Kubernetes. The default value for the Geneve port is 6081.

    Example patch command

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "openshiftSDNConfig":{
              "mtu":1200
        }}}}'

  9. Reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:

    • With the oc rsh command, you can use a bash script similar to the following:

      #!/bin/bash
      readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')"
      
      for i in "${POD_NODES[@]}"
      do
        read -r POD NODE <<< "$i"
        until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1
          do
            echo "cannot reboot node $NODE, retry" && sleep 3
          done
      done
    • With the ssh command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.

      #!/bin/bash
      
      for ip in $(oc get nodes  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
      do
         echo "reboot node $ip"
         ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
      done
  10. Wait until the Multus daemon set rollout completes. Run the following command to see your rollout status:

    $ oc -n openshift-multus rollout status daemonset/multus

    The name of the Multus pods is in the form of multus-<xxxxx> where <xxxxx> is a random sequence of letters. It might take several moments for the pods to restart.

    Example output

    Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated...
    ...
    Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available...
    daemon set "multus" successfully rolled out

  11. After the nodes in your cluster have rebooted and the Multus pods have rolled out, start all of the machine configuration pools by running the following commands:

    • Start the master configuration pool:

      $ oc patch MachineConfigPool master --type='merge' --patch \
        '{ "spec": { "paused": false } }'
    • Start the worker configuration pool:

      $ oc patch MachineConfigPool worker --type='merge' --patch \
        '{ "spec": { "paused": false } }'

    As the MCO updates machines in each config pool, it reboots each node.

    By default the MCO updates a single machine per pool at a time, so the time that the migration requires to complete grows with the size of the cluster.

  12. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command in your CLI:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command in your CLI:

      $ oc get machineconfig <config_name> -o yaml

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

  13. Confirm that the migration succeeded:

    1. To confirm that the network plugin is OpenShift SDN, enter the following command in your CLI. The value of status.networkType must be OpenShiftSDN.

      $ oc get Network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
    2. To confirm that the cluster nodes are in the Ready state, enter the following command in your CLI:

      $ oc get nodes
    3. If a node is stuck in the NotReady state, investigate the machine config daemon pod logs and resolve any errors.

      1. To list the pods, enter the following command in your CLI:

        $ oc get pod -n openshift-machine-config-operator

        Example output

        NAME                                         READY   STATUS    RESTARTS   AGE
        machine-config-controller-75f756f89d-sjp8b   1/1     Running   0          37m
        machine-config-daemon-5cf4b                  2/2     Running   0          43h
        machine-config-daemon-7wzcd                  2/2     Running   0          43h
        machine-config-daemon-fc946                  2/2     Running   0          43h
        machine-config-daemon-g2v28                  2/2     Running   0          43h
        machine-config-daemon-gcl4f                  2/2     Running   0          43h
        machine-config-daemon-l5tnv                  2/2     Running   0          43h
        machine-config-operator-79d9c55d5-hth92      1/1     Running   0          37m
        machine-config-server-bsc8h                  1/1     Running   0          43h
        machine-config-server-hklrm                  1/1     Running   0          43h
        machine-config-server-k9rtx                  1/1     Running   0          43h

        The names for the config daemon pods are in the following format: machine-config-daemon-<seq>. The <seq> value is a random five character alphanumeric sequence.

      2. To display the pod log for each machine config daemon pod shown in the previous output, enter the following command in your CLI:

        $ oc logs <pod> -n openshift-machine-config-operator

        where pod is the name of a machine config daemon pod.

      3. Resolve any errors in the logs shown by the output from the previous command.
    4. To confirm that your pods are not in an error state, enter the following command in your CLI:

      $ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'

      If pods on a node are in an error state, reboot that node.

  14. Complete the following steps only if the migration succeeds and your cluster is in a good state:

    1. To remove the migration configuration from the Cluster Network Operator configuration object, enter the following command in your CLI:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "migration": null } }'
    2. To remove the OVN-Kubernetes configuration, enter the following command in your CLI:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "defaultNetwork": { "ovnKubernetesConfig":null } } }'
    3. To remove the OVN-Kubernetes network provider namespace, enter the following command in your CLI:

      $ oc delete namespace openshift-ovn-kubernetes

26.2.3. Additional resources

26.3. Rolling back to the OVN-Kubernetes network plugin

As a cluster administrator, you can rollback to the OVN-Kubernetes network plugin from the OpenShift SDN network plugin if the migration to OpenShift SDN is unsuccessful.

To learn more about OVN-Kubernetes, read About the OVN-Kubernetes network plugin.

26.3.1. Migrating to the OVN-Kubernetes network plugin

As a cluster administrator, you can change the network plugin for your cluster to OVN-Kubernetes. During the migration, you must reboot every node in your cluster.

Important

While performing the migration, your cluster is unavailable and workloads might be interrupted. Perform the migration only when an interruption in service is acceptable.

Prerequisites

  • You have a cluster configured with the OpenShift SDN CNI network plugin in the network policy isolation mode.
  • You installed the OpenShift CLI (oc).
  • You have access to the cluster as a user with the cluster-admin role.
  • You have a recent backup of the etcd database.
  • You can manually reboot each node.
  • You checked that your cluster is in a known good state without any errors.
  • You created a security group rule that allows User Datagram Protocol (UDP) packets on port 6081 for all nodes on all cloud platforms.

Procedure

  1. To backup the configuration for the cluster network, enter the following command:

    $ oc get Network.config.openshift.io cluster -o yaml > cluster-openshift-sdn.yaml
  2. Check that all cluster Operators are available by running the following script. If the OVN_SDN_MIGRATION_TIMEOUT environment variable is set to 0s, the script unsets it so that the default 1200s timeout is used:

    #!/bin/bash
    
    if [ -n "$OVN_SDN_MIGRATION_TIMEOUT" ] && [ "$OVN_SDN_MIGRATION_TIMEOUT" = "0s" ]; then
        unset OVN_SDN_MIGRATION_TIMEOUT
    fi
    
    #loops the timeout command of the script to repeatedly check the cluster Operators until all are available.
    
    co_timeout=${OVN_SDN_MIGRATION_TIMEOUT:-1200s}
    timeout "$co_timeout" bash <<EOT
    until
      oc wait co --all --for='condition=AVAILABLE=True' --timeout=10s && \
      oc wait co --all --for='condition=PROGRESSING=False' --timeout=10s && \
      oc wait co --all --for='condition=DEGRADED=False' --timeout=10s;
    do
      sleep 10
      echo "Some ClusterOperators Degraded=False,Progressing=True,or Available=False";
    done
    EOT
  3. Remove the configuration from the Cluster Network Operator (CNO) configuration object by running the following command:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
    --patch '{"spec":{"migration":null}}'
  4. Delete the NodeNetworkConfigurationPolicy (NNCP) custom resource (CR) that defines the primary network interface for the OpenShift SDN network plugin by completing the following steps:

    1. Check that the existing NNCP CR bonded the primary interface to your cluster by entering the following command:

      $ oc get nncp

      Example output

      NAME          STATUS      REASON
      bondmaster0   Available   SuccessfullyConfigured

      Network Manager stores the connection profile for the bonded primary interface in the /etc/NetworkManager/system-connections system path.

    2. Remove the NNCP from your cluster:

      $ oc delete nncp <nncp_manifest_filename>
  5. To prepare all the nodes for the migration, set the migration field on the CNO configuration object by running the following command:

    $ oc patch Network.operator.openshift.io cluster --type='merge' \
      --patch '{ "spec": { "migration": { "networkType": "OVNKubernetes" } } }'
    Note

    This step does not deploy OVN-Kubernetes immediately. Instead, specifying the migration field triggers the Machine Config Operator (MCO) to apply new machine configs to all the nodes in the cluster in preparation for the OVN-Kubernetes deployment.

    1. Check that the reboot is finished by running the following command:

      $ oc get mcp
    2. Check that all cluster Operators are available by running the following command:

      $ oc get co
    3. Alternatively: You can disable automatic migration of several OpenShift SDN capabilities to the OVN-Kubernetes equivalents:

      • Egress IPs
      • Egress firewall
      • Multicast

      To disable automatic migration of the configuration for any of the previously noted OpenShift SDN features, specify the following keys:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{
          "spec": {
            "migration": {
              "networkType": "OVNKubernetes",
              "features": {
                "egressIP": <bool>,
                "egressFirewall": <bool>,
                "multicast": <bool>
              }
            }
          }
        }'

      where:

      bool: Specifies whether to enable migration of the feature. The default is true.

  6. Optional: You can customize the following settings for OVN-Kubernetes to meet your network infrastructure requirements:

    • Maximum transmission unit (MTU). Consider the following before customizing the MTU for this optional step:

      • If you use the default MTU, and you want to keep the default MTU during migration, this step can be ignored.
      • If you used a custom MTU, and you want to keep the custom MTU during migration, you must declare the custom MTU value in this step.
      • This step does not work if you want to change the MTU value during migration. Instead, you must first follow the instructions for "Changing the cluster MTU". You can then keep the custom MTU value by performing this procedure and declaring the custom MTU value in this step.

        Note

        OpenShift SDN and OVN-Kubernetes have different overlay overheads. MTU values should be selected by following the guidelines found on the "MTU value selection" page.

    • Geneve (Generic Network Virtualization Encapsulation) overlay network port
    • OVN-Kubernetes IPv4 internal subnet
    • OVN-Kubernetes IPv6 internal subnet

    To customize either of the previously noted settings, enter and customize the following command. If you do not need to change the default value, omit the key from the patch.

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "mtu":<mtu>,
              "genevePort":<port>,
              "v4InternalSubnet":"<ipv4_subnet>",
              "v6InternalSubnet":"<ipv6_subnet>"
        }}}}'

    where:

    mtu
    The MTU for the Geneve overlay network. This value is normally configured automatically, but if the nodes in your cluster do not all use the same MTU, then you must set this explicitly to 100 less than the smallest node MTU value.
    port
    The UDP port for the Geneve overlay network. If a value is not specified, the default is 6081. The port cannot be the same as the VXLAN port that is used by OpenShift SDN. The default value for the VXLAN port is 4789.
    ipv4_subnet
    An IPv4 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is 100.64.0.0/16.
    ipv6_subnet
    An IPv6 address range for internal use by OVN-Kubernetes. You must ensure that the IP address range does not overlap with any other subnet used by your OpenShift Container Platform installation. The IP address range must be larger than the maximum number of nodes that can be added to the cluster. The default value is fd98::/48.

    Example patch command to update mtu field

    $ oc patch Network.operator.openshift.io cluster --type=merge \
      --patch '{
        "spec":{
          "defaultNetwork":{
            "ovnKubernetesConfig":{
              "mtu":1200
        }}}}'

  7. As the MCO updates machines in each machine config pool, it reboots each node one by one. You must wait until all the nodes are updated. Check the machine config pool status by entering the following command:

    $ oc get mcp

    A successfully updated node has the following status: UPDATED=true, UPDATING=false, DEGRADED=false.

    Note

    By default, the MCO updates one machine per pool at a time, causing the total time the migration takes to increase with the size of the cluster.

  8. Confirm the status of the new machine configuration on the hosts:

    1. To list the machine configuration state and the name of the applied machine configuration, enter the following command:

      $ oc describe node | egrep "hostname|machineconfig"

      Example output

      kubernetes.io/hostname=master-0
      machineconfiguration.openshift.io/currentConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/desiredConfig: rendered-master-c53e221d9d24e1c8bb6ee89dd3d8ad7b
      machineconfiguration.openshift.io/reason:
      machineconfiguration.openshift.io/state: Done

      Verify that the following statements are true:

      • The value of machineconfiguration.openshift.io/state field is Done.
      • The value of the machineconfiguration.openshift.io/currentConfig field is equal to the value of the machineconfiguration.openshift.io/desiredConfig field.
    2. To confirm that the machine config is correct, enter the following command:

      $ oc get machineconfig <config_name> -o yaml | grep ExecStart

      where <config_name> is the name of the machine config from the machineconfiguration.openshift.io/currentConfig field.

      The machine config must include the following update to the systemd configuration:

      ExecStart=/usr/local/bin/configure-ovs.sh OVNKubernetes
    3. If a node is stuck in the NotReady state, investigate the machine config daemon pod logs and resolve any errors.

      1. To list the pods, enter the following command:

        $ oc get pod -n openshift-machine-config-operator

        Example output

        NAME                                         READY   STATUS    RESTARTS   AGE
        machine-config-controller-75f756f89d-sjp8b   1/1     Running   0          37m
        machine-config-daemon-5cf4b                  2/2     Running   0          43h
        machine-config-daemon-7wzcd                  2/2     Running   0          43h
        machine-config-daemon-fc946                  2/2     Running   0          43h
        machine-config-daemon-g2v28                  2/2     Running   0          43h
        machine-config-daemon-gcl4f                  2/2     Running   0          43h
        machine-config-daemon-l5tnv                  2/2     Running   0          43h
        machine-config-operator-79d9c55d5-hth92      1/1     Running   0          37m
        machine-config-server-bsc8h                  1/1     Running   0          43h
        machine-config-server-hklrm                  1/1     Running   0          43h
        machine-config-server-k9rtx                  1/1     Running   0          43h

        The names for the config daemon pods are in the following format: machine-config-daemon-<seq>. The <seq> value is a random five character alphanumeric sequence.

      2. Display the pod log for the first machine config daemon pod shown in the previous output by entering the following command:

        $ oc logs <pod> -n openshift-machine-config-operator

        where <pod> is the name of a machine config daemon pod.

      3. Resolve any errors in the logs shown by the output from the previous command.
  9. To start the migration, configure the OVN-Kubernetes network plugin by using one of the following commands:

    • To specify the network provider without changing the cluster network IP address block, enter the following command:

      $ oc patch Network.config.openshift.io cluster \
        --type='merge' --patch '{ "spec": { "networkType": "OVNKubernetes" } }'
    • To specify a different cluster network IP address block, enter the following command:

      $ oc patch Network.config.openshift.io cluster \
        --type='merge' --patch '{
          "spec": {
            "clusterNetwork": [
              {
                "cidr": "<cidr>",
                "hostPrefix": <prefix>
              }
            ],
            "networkType": "OVNKubernetes"
          }
        }'

      where <cidr> is a CIDR block and <prefix> is the slice of the CIDR block apportioned to each node in your cluster. You cannot use any CIDR block that overlaps with the 100.64.0.0/16 CIDR block because the OVN-Kubernetes network provider uses this block internally.

      Important

      You cannot change the service network address block during the migration.

  10. Verify that the Multus daemon set rollout is complete before continuing with subsequent steps:

    $ oc -n openshift-multus rollout status daemonset/multus

    The names of the Multus pods are in the form multus-<xxxxx>, where <xxxxx> is a random sequence of letters. It might take several moments for the pods to restart.

    Example output

    Waiting for daemon set "multus" rollout to finish: 1 out of 6 new pods have been updated...
    ...
    Waiting for daemon set "multus" rollout to finish: 5 of 6 updated pods are available...
    daemon set "multus" successfully rolled out

  11. To complete changing the network plugin, reboot each node in your cluster. You can reboot the nodes in your cluster with either of the following approaches:

    Important

    The following scripts reboot all of the nodes in the cluster at the same time. This can cause your cluster to be unstable. Another option is to reboot your nodes manually one at a time. Rebooting nodes one-by-one causes considerable downtime in a cluster with many nodes.

    Cluster Operators will not work correctly before you reboot the nodes.

    • With the oc rsh command, you can use a bash script similar to the following:

      #!/bin/bash
      readarray -t POD_NODES <<< "$(oc get pod -n openshift-machine-config-operator -o wide| grep daemon|awk '{print $1" "$7}')"
      
      for i in "${POD_NODES[@]}"
      do
        read -r POD NODE <<< "$i"
        until oc rsh -n openshift-machine-config-operator "$POD" chroot /rootfs shutdown -r +1
          do
            echo "cannot reboot node $NODE, retry" && sleep 3
          done
      done
    • With the ssh command, you can use a bash script similar to the following. The script assumes that you have configured sudo to not prompt for a password.

      #!/bin/bash
      
      for ip in $(oc get nodes  -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}')
      do
         echo "reboot node $ip"
         ssh -o StrictHostKeyChecking=no core@$ip sudo shutdown -r -t 3
      done
  12. Confirm that the migration succeeded:

    1. To confirm that the network plugin is OVN-Kubernetes, enter the following command. The value of status.networkType must be OVNKubernetes.

      $ oc get network.config/cluster -o jsonpath='{.status.networkType}{"\n"}'
    2. To confirm that the cluster nodes are in the Ready state, enter the following command:

      $ oc get nodes
    3. To confirm that your pods are not in an error state, enter the following command:

      $ oc get pods --all-namespaces -o wide --sort-by='{.spec.nodeName}'

      If pods on a node are in an error state, reboot that node.

    4. To confirm that no cluster Operators are in an abnormal state, enter the following command:

      $ oc get co

      The status of every cluster Operator must be the following: AVAILABLE="True", PROGRESSING="False", DEGRADED="False". If a cluster Operator is not available or degraded, check the logs for the cluster Operator for more information.
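
      To surface only the cluster Operators that are not in the expected state, you can optionally filter the output, as in the following example. This is an illustrative convenience and not part of the documented procedure; the column positions assume the default oc get co output format:

      $ oc get co --no-headers | awk '$3 != "True" || $4 != "False" || $5 != "False"'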

  13. Complete the following steps only if the migration succeeds and your cluster is in a good state:

    1. To remove the migration configuration from the CNO configuration object, enter the following command:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "migration": null } }'
    2. To remove custom configuration for the OpenShift SDN network provider, enter the following command:

      $ oc patch Network.operator.openshift.io cluster --type='merge' \
        --patch '{ "spec": { "defaultNetwork": { "openshiftSDNConfig": null } } }'
    3. To remove the OpenShift SDN network provider namespace, enter the following command:

      $ oc delete namespace openshift-sdn

26.4. Configuring egress IPs for a project

As a cluster administrator, you can configure the OpenShift SDN Container Network Interface (CNI) network plugin to assign one or more egress IP addresses to a project.

26.4.1. Egress IP address architectural design and implementation

The OpenShift Container Platform egress IP address functionality allows you to ensure that the traffic from one or more pods in one or more namespaces has a consistent source IP address for services outside the cluster network.

For example, you might have a pod that periodically queries a database that is hosted on a server outside of your cluster. To enforce access requirements for the server, a packet filtering device is configured to allow traffic only from specific IP addresses. To ensure that you can reliably allow access to the server from only that specific pod, you can configure a specific egress IP address for the pod that makes the requests to the server.

An egress IP address assigned to a namespace is different from an egress router, which is used to send traffic to specific destinations.

In some cluster configurations, application pods and ingress router pods run on the same node. If you configure an egress IP address for an application project in this scenario, the IP address is not used when you send a request to a route from the application project.

An egress IP address is implemented as an additional IP address on the primary network interface of a node and must be in the same subnet as the primary IP address of the node. The additional IP address must not be assigned to any other node in the cluster.

Important

Egress IP addresses must not be configured in any Linux network configuration files, such as ifcfg-eth0.

26.4.1.1. Platform support

Support for the egress IP address functionality on various platforms is summarized in the following table:

Platform                                                            Supported

Bare metal                                                          Yes
VMware vSphere                                                      Yes
Red Hat OpenStack Platform (RHOSP)                                  Yes
Amazon Web Services (AWS)                                           Yes
Google Cloud Platform (GCP)                                         Yes
Microsoft Azure                                                     Yes
IBM Z and IBM® LinuxONE                                             Yes
IBM Z and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM     Yes
IBM Power                                                           Yes

Important

The assignment of egress IP addresses to control plane nodes with the EgressIP feature is not supported on a cluster provisioned on Amazon Web Services (AWS). (BZ#2039656)

26.4.1.2. Public cloud platform considerations

For clusters provisioned on public cloud infrastructure, there is a constraint on the absolute number of assignable IP addresses per node. The maximum number of assignable IP addresses per node, or the IP capacity, can be described in the following formula:

IP capacity = public cloud default capacity - sum(current IP assignments)

While the Egress IPs capability manages the IP address capacity per node, it is important to plan for this constraint in your deployments. For example, for a cluster installed on bare-metal infrastructure with 8 nodes, you can configure 150 egress IP addresses. However, if a public cloud provider limits IP address capacity to 10 IP addresses per node, the total number of assignable IP addresses is only 80. To achieve the same IP address capacity in this example cloud provider, you would need to allocate 7 additional nodes.

To confirm the IP capacity and subnets for any node in your public cloud environment, you can enter the oc get node <node_name> -o yaml command. The cloud.network.openshift.io/egress-ipconfig annotation includes capacity and subnet information for the node.

The annotation value is an array with a single object with fields that provide the following information for the primary network interface:

  • interface: Specifies the interface ID on AWS and Azure and the interface name on GCP.
  • ifaddr: Specifies the subnet mask for one or both IP address families.
  • capacity: Specifies the IP address capacity for the node. On AWS, the IP address capacity is provided per IP address family. On Azure and GCP, the IP address capacity includes both IPv4 and IPv6 addresses.
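
If you want to extract only the annotation value, you can use a jsonpath query similar to the following. This command is an optional convenience sketch; the official guidance is to review the full node YAML as described earlier in this section:

$ oc get node <node_name> -o jsonpath='{.metadata.annotations.cloud\.network\.openshift\.io/egress-ipconfig}{"\n"}'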

Automatic attachment and detachment of egress IP addresses as traffic moves between nodes is available. This allows traffic from many pods in namespaces to have a consistent source IP address for locations outside of the cluster. This behavior is supported by both OpenShift SDN and OVN-Kubernetes, which is the default networking plugin in Red Hat OpenShift Networking in OpenShift Container Platform 4.13.

Note

The RHOSP egress IP address feature creates a Neutron reservation port called egressip-<IP address>. Using the same RHOSP user as the one used for the OpenShift Container Platform cluster installation, you can assign a floating IP address to this reservation port to have a predictable SNAT address for egress traffic. When an egress IP address on an RHOSP network is moved from one node to another, because of a node failover, for example, the Neutron reservation port is removed and recreated. This means that the floating IP association is lost and you need to manually reassign the floating IP address to the new reservation port.

Note

When an RHOSP cluster administrator assigns a floating IP to the reservation port, OpenShift Container Platform cannot delete the reservation port. The CloudPrivateIPConfig object cannot perform delete and move operations until an RHOSP cluster administrator unassigns the floating IP from the reservation port.

The following examples illustrate the annotation from nodes on several public cloud providers. The annotations are indented for readability.

Example cloud.network.openshift.io/egress-ipconfig annotation on AWS

cloud.network.openshift.io/egress-ipconfig: [
  {
    "interface":"eni-078d267045138e436",
    "ifaddr":{"ipv4":"10.0.128.0/18"},
    "capacity":{"ipv4":14,"ipv6":15}
  }
]

Example cloud.network.openshift.io/egress-ipconfig annotation on GCP

cloud.network.openshift.io/egress-ipconfig: [
  {
    "interface":"nic0",
    "ifaddr":{"ipv4":"10.0.128.0/18"},
    "capacity":{"ip":14}
  }
]

The following sections describe the IP address capacity for supported public cloud environments for use in your capacity calculation.

26.4.1.2.1. Amazon Web Services (AWS) IP address capacity limits

On AWS, constraints on IP address assignments depend on the instance type configured. For more information, see IP addresses per network interface per instance type.

26.4.1.2.2. Google Cloud Platform (GCP) IP address capacity limits

On GCP, the networking model implements additional node IP addresses through IP address aliasing, rather than IP address assignments. However, IP address capacity maps directly to IP aliasing capacity.

The following capacity limits exist for IP aliasing assignment:

  • Per node, the maximum number of IP aliases, both IPv4 and IPv6, is 100.
  • Per VPC, the maximum number of IP aliases is unspecified, but OpenShift Container Platform scalability testing reveals the maximum to be approximately 15,000.

For more information, see Per instance quotas and Alias IP ranges overview.

26.4.1.2.3. Microsoft Azure IP address capacity limits

On Azure, the following capacity limits exist for IP address assignment:

  • Per NIC, the maximum number of assignable IP addresses, for both IPv4 and IPv6, is 256.
  • Per virtual network, the maximum number of assigned IP addresses cannot exceed 65,536.

For more information, see Networking limits.

26.4.1.3. Limitations

The following limitations apply when using egress IP addresses with the OpenShift SDN network plugin:

  • You cannot use manually assigned and automatically assigned egress IP addresses on the same nodes.
  • If you manually assign egress IP addresses from an IP address range, you must not make that range available for automatic IP assignment.
  • You cannot share egress IP addresses across multiple namespaces using the OpenShift SDN egress IP address implementation.

If you need to share IP addresses across namespaces, the OVN-Kubernetes network plugin egress IP address implementation allows you to span IP addresses across multiple namespaces.

Note

If you use OpenShift SDN in multitenant mode, you cannot use egress IP addresses with any namespace that is joined to another namespace by the projects that are associated with them. For example, if project1 and project2 are joined by running the oc adm pod-network join-projects --to=project1 project2 command, neither project can use an egress IP address. For more information, see BZ#1645577.

26.4.1.4. IP address assignment approaches

You can assign egress IP addresses to namespaces by setting the egressIPs parameter of the NetNamespace object. After an egress IP address is associated with a project, OpenShift SDN allows you to assign egress IP addresses to hosts in two ways:

  • In the automatically assigned approach, an egress IP address range is assigned to a node.
  • In the manually assigned approach, a list of one or more egress IP addresses is assigned to a node.

Namespaces that request an egress IP address are matched with nodes that can host those egress IP addresses, and then the egress IP addresses are assigned to those nodes. If the egressIPs parameter is set on a NetNamespace object, but no node hosts that egress IP address, then egress traffic from the namespace will be dropped.

High availability of nodes is automatic. If a node that hosts an egress IP address is unreachable and there are nodes that are able to host that egress IP address, then the egress IP address will move to a new node. When the unreachable node comes back online, the egress IP address automatically moves to balance egress IP addresses across nodes.

26.4.1.4.1. Considerations when using automatically assigned egress IP addresses

When using the automatic assignment approach for egress IP addresses, the following considerations apply:

  • You set the egressCIDRs parameter of each node’s HostSubnet resource to indicate the range of egress IP addresses that can be hosted by a node. OpenShift Container Platform sets the egressIPs parameter of the HostSubnet resource based on the IP address range you specify.

If the node hosting the namespace’s egress IP address is unreachable, OpenShift Container Platform will reassign the egress IP address to another node with a compatible egress IP address range. The automatic assignment approach works best for clusters installed in environments with flexibility in associating additional IP addresses with nodes.

26.4.1.4.2. Considerations when using manually assigned egress IP addresses

This approach allows you to control which nodes can host an egress IP address.

Note

If your cluster is installed on public cloud infrastructure, you must ensure that each node that you assign egress IP addresses to has sufficient spare capacity to host the IP addresses. For more information, see "Platform considerations" in a previous section.

When using the manual assignment approach for egress IP addresses, the following considerations apply:

  • You set the egressIPs parameter of each node’s HostSubnet resource to indicate the IP addresses that can be hosted by a node.
  • Multiple egress IP addresses per namespace are supported.

If a namespace has multiple egress IP addresses and those addresses are hosted on multiple nodes, the following additional considerations apply:

  • If a pod is on a node that is hosting an egress IP address, that pod always uses the egress IP address on the node.
  • If a pod is not on a node that is hosting an egress IP address, that pod uses an egress IP address at random.

26.4.2. Configuring automatically assigned egress IP addresses for a namespace

In OpenShift Container Platform you can enable automatic assignment of an egress IP address for a specific namespace across one or more nodes.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Update the NetNamespace object with the egress IP address using the following JSON:

     $ oc patch netnamespace <project_name> --type=merge -p \
      '{
        "egressIPs": [
          "<ip_address>"
        ]
      }'

    where:

    <project_name>
    Specifies the name of the project.
    <ip_address>
    Specifies one or more egress IP addresses for the egressIPs array.

    For example, to assign project1 to an IP address of 192.168.1.100 and project2 to an IP address of 192.168.1.101:

    $ oc patch netnamespace project1 --type=merge -p \
      '{"egressIPs": ["192.168.1.100"]}'
    $ oc patch netnamespace project2 --type=merge -p \
      '{"egressIPs": ["192.168.1.101"]}'
    Note

    Because OpenShift SDN manages the NetNamespace object, you can make changes only by modifying the existing NetNamespace object. Do not create a new NetNamespace object.

  2. Indicate which nodes can host egress IP addresses by setting the egressCIDRs parameter for each host using the following JSON:

    $ oc patch hostsubnet <node_name> --type=merge -p \
      '{
        "egressCIDRs": [
          "<ip_address_range>", "<ip_address_range>"
        ]
      }'

    where:

    <node_name>
    Specifies a node name.
    <ip_address_range>
    Specifies an IP address range in CIDR format. You can specify more than one address range for the egressCIDRs array.

    For example, to set node1 and node2 to host egress IP addresses in the range 192.168.1.0 to 192.168.1.255:

    $ oc patch hostsubnet node1 --type=merge -p \
      '{"egressCIDRs": ["192.168.1.0/24"]}'
    $ oc patch hostsubnet node2 --type=merge -p \
      '{"egressCIDRs": ["192.168.1.0/24"]}'

    OpenShift Container Platform automatically assigns specific egress IP addresses to available nodes in a balanced way. In this case, it assigns the egress IP address 192.168.1.100 to node1 and the egress IP address 192.168.1.101 to node2 or vice versa.
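
    To confirm which node received which egress IP address, you can optionally inspect the egressIPs field of each HostSubnet object. The node names follow the example above, and this check is an illustrative sketch rather than a required step:

    $ oc get hostsubnet node1 -o jsonpath='{.egressIPs}{"\n"}'
    $ oc get hostsubnet node2 -o jsonpath='{.egressIPs}{"\n"}'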

26.4.3. Configuring manually assigned egress IP addresses for a namespace

In OpenShift Container Platform you can associate one or more egress IP addresses with a namespace.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Update the NetNamespace object by specifying the following JSON object with the desired IP addresses:

     $ oc patch netnamespace <project_name> --type=merge -p \
      '{
        "egressIPs": [
          "<ip_address>"
        ]
      }'

    where:

    <project_name>
    Specifies the name of the project.
    <ip_address>
    Specifies one or more egress IP addresses for the egressIPs array.

    For example, to assign the project1 project to the IP addresses 192.168.1.100 and 192.168.1.101:

    $ oc patch netnamespace project1 --type=merge \
      -p '{"egressIPs": ["192.168.1.100","192.168.1.101"]}'

    To provide high availability, set the egressIPs value to two or more IP addresses on different nodes. If multiple egress IP addresses are set, then pods use all egress IP addresses roughly equally.

    Note

    Because OpenShift SDN manages the NetNamespace object, you can make changes only by modifying the existing NetNamespace object. Do not create a new NetNamespace object.

  2. Manually assign the egress IP address to the node hosts.

    If your cluster is installed on public cloud infrastructure, you must confirm that the node has available IP address capacity.

    Set the egressIPs parameter on the HostSubnet object on the node host. Using the following JSON, include as many IP addresses as you want to assign to that node host:

    $ oc patch hostsubnet <node_name> --type=merge -p \
      '{
        "egressIPs": [
          "<ip_address>",
          "<ip_address>"
          ]
      }'

    where:

    <node_name>
    Specifies a node name.
    <ip_address>
    Specifies an IP address. You can specify more than one IP address for the egressIPs array.

    For example, to specify that node1 should have the egress IPs 192.168.1.100, 192.168.1.101, and 192.168.1.102:

    $ oc patch hostsubnet node1 --type=merge -p \
      '{"egressIPs": ["192.168.1.100", "192.168.1.101", "192.168.1.102"]}'

    In the previous example, all egress traffic for project1 will be routed to the node hosting the specified egress IP, and then connected through Network Address Translation (NAT) to that IP address.
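
    To optionally confirm the assignment, you can inspect the egressIPs field of the NetNamespace object for the project. This check is an illustrative sketch rather than a required step:

    $ oc get netnamespace project1 -o jsonpath='{.egressIPs}{"\n"}'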

26.4.4. Additional resources

  • If you are configuring manual egress IP address assignment, see Platform considerations for information about IP capacity planning.

26.5. Configuring an egress firewall for a project

As a cluster administrator, you can create an egress firewall for a project that restricts egress traffic leaving your OpenShift Container Platform cluster.

26.5.1. How an egress firewall works in a project

As a cluster administrator, you can use an egress firewall to limit the external hosts that some or all pods can access from within the cluster. An egress firewall supports the following scenarios:

  • A pod can only connect to internal hosts and cannot initiate connections to the public internet.
  • A pod can only connect to the public internet and cannot initiate connections to internal hosts that are outside the OpenShift Container Platform cluster.
  • A pod cannot reach specified internal subnets or hosts outside the OpenShift Container Platform cluster.
  • A pod can connect to only specific external hosts.

For example, you can allow one project access to a specified IP range but deny the same access to a different project. Or you can restrict application developers from updating from Python pip mirrors, and force updates to come only from approved sources.

Note

Egress firewall does not apply to the host network namespace. Pods with host networking enabled are unaffected by egress firewall rules.

You configure an egress firewall policy by creating an EgressNetworkPolicy custom resource (CR) object. The egress firewall matches network traffic that meets any of the following criteria:

  • An IP address range in CIDR format
  • A DNS name that resolves to an IP address
Important

If your egress firewall includes a deny rule for 0.0.0.0/0, access to your OpenShift Container Platform API servers is blocked. You must either add allow rules for each IP address or use the nodeSelector type allow rule in your egress policy rules to connect to API servers.

The following example illustrates the order of the egress firewall rules necessary to ensure API server access:

apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: default
  namespace: <namespace> 1
spec:
  egress:
  - to:
      cidrSelector: <api_server_address_range> 2
    type: Allow
# ...
  - to:
      cidrSelector: 0.0.0.0/0 3
    type: Deny
1
The namespace for the egress firewall.
2
The IP address range that includes your OpenShift Container Platform API servers.
3
A global deny rule prevents access to the OpenShift Container Platform API servers.

To find the IP address for your API servers, run oc get ep kubernetes -n default.

For more information, see BZ#1988324.

Important

You must have OpenShift SDN configured to use either the network policy or multitenant mode to configure an egress firewall.

If you use network policy mode, an egress firewall is compatible with only one policy per namespace and will not work with projects that share a network, such as global projects.

Warning

Egress firewall rules do not apply to traffic that goes through routers. Any user with permission to create a Route CR object can bypass egress firewall policy rules by creating a route that points to a forbidden destination.

26.5.1.1. Limitations of an egress firewall

An egress firewall has the following limitations:

  • No project can have more than one EgressNetworkPolicy object.

    Important

    The creation of more than one EgressNetworkPolicy object is allowed; however, it should not be done. When you create more than one EgressNetworkPolicy object, the following message is returned: dropping all rules. In actuality, all external traffic is dropped, which can cause security risks for your organization.

  • A maximum of one EgressNetworkPolicy object with a maximum of 1,000 rules can be defined per project.
  • The default project cannot use an egress firewall.
  • When using the OpenShift SDN network plugin in multitenant mode, the following limitations apply:

    • Global projects cannot use an egress firewall. You can make a project global by using the oc adm pod-network make-projects-global command.
    • Projects merged by using the oc adm pod-network join-projects command cannot use an egress firewall in any of the joined projects.
  • If you create a selectorless service and manually define endpoints or EndpointSlices that point to external IPs, traffic to the service IP might still be allowed, even if your EgressNetworkPolicy is configured to deny all egress traffic. This occurs because OpenShift SDN does not fully enforce egress network policies for these external endpoints. Consequently, this might result in unexpected access to external services.

Violating any of these restrictions results in a broken egress firewall for the project. Consequently, all external network traffic is dropped, which can cause security risks for your organization.

An Egress Firewall resource can be created in the kube-node-lease, kube-public, kube-system, openshift and openshift- projects.

26.5.1.2. Matching order for egress firewall policy rules

The egress firewall policy rules are evaluated in the order that they are defined, from first to last. The first rule that matches an egress connection from a pod applies. Any subsequent rules are ignored for that connection.

26.5.1.3. How Domain Name Server (DNS) resolution works

If you use DNS names in any of your egress firewall policy rules, proper resolution of the domain names is subject to the following restrictions:

  • Domain name updates are polled based on a time-to-live (TTL) duration. By default, the duration is 30 seconds. When the egress firewall controller queries the local name servers for a domain name, if the response includes a TTL that is less than 30 seconds, the controller sets the duration to the returned value. If the TTL in the response is greater than 30 minutes, the controller sets the duration to 30 minutes. If the TTL is between 30 seconds and 30 minutes, the controller ignores the value and sets the duration to 30 seconds.
  • The pod must resolve the domain from the same local name servers when necessary. Otherwise the IP addresses for the domain known by the egress firewall controller and the pod can be different. If the IP addresses for a hostname differ, the egress firewall might not be enforced consistently.
  • Because the egress firewall controller and pods asynchronously poll the same local name server, the pod might obtain the updated IP address before the egress controller does, which causes a race condition. Due to this current limitation, domain name usage in EgressNetworkPolicy objects is only recommended for domains with infrequent IP address changes.
Note

Using DNS names in your egress firewall policy does not affect local DNS resolution through CoreDNS.

However, if your egress firewall policy uses domain names, and an external DNS server handles DNS resolution for an affected pod, you must include egress firewall rules that permit access to the IP addresses of your DNS server.

26.5.2. EgressNetworkPolicy custom resource (CR) object

You can define one or more rules for an egress firewall. A rule is either an Allow rule or a Deny rule, with a specification for the traffic that the rule applies to.

The following YAML describes an EgressNetworkPolicy CR object:

EgressNetworkPolicy object

apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: <name> 1
spec:
  egress: 2
    ...

1
A name for your egress firewall policy.
2
A collection of one or more egress network policy rules as described in the following section.
26.5.2.1. EgressNetworkPolicy rules

The following YAML describes an egress firewall rule object. The user can select either an IP address range in CIDR format, a domain name, or use the nodeSelector to allow or deny egress traffic. The egress stanza expects an array of one or more objects.

Egress policy rule stanza

egress:
- type: <type> 1
  to: 2
    cidrSelector: <cidr> 3
    dnsName: <dns_name> 4

1
The type of rule. The value must be either Allow or Deny.
2
A stanza describing an egress traffic match rule. A value for either the cidrSelector field or the dnsName field for the rule. You cannot use both fields in the same rule.
3
An IP address range in CIDR format.
4
A domain name.
26.5.2.2. Example EgressNetworkPolicy CR objects

The following example defines several egress firewall policy rules:

apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: default
spec:
  egress: 1
  - type: Allow
    to:
      cidrSelector: 1.2.3.0/24
  - type: Allow
    to:
      dnsName: www.example.com
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
1
A collection of egress firewall policy rule objects.

26.5.3. Creating an egress firewall policy object

As a cluster administrator, you can create an egress firewall policy object for a project.

Important

If the project already has an EgressNetworkPolicy object defined, you must edit the existing policy to make changes to the egress firewall rules.

Prerequisites

  • A cluster that uses the OpenShift SDN network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Create a policy rule:

    1. Create a <policy_name>.yaml file where <policy_name> describes the egress policy rules.
    2. In the file you created, define an egress policy object.
  2. Enter the following command to create the policy object. Replace <policy_name> with the name of the policy and <project> with the project that the rule applies to.

    $ oc create -f <policy_name>.yaml -n <project>

    In the following example, a new EgressNetworkPolicy object is created in a project named project1:

    $ oc create -f default.yaml -n project1

    Example output

    egressnetworkpolicy.network.openshift.io/default created

  3. Optional: Save the <policy_name>.yaml file so that you can make changes later.

26.6. Viewing an egress firewall for a project

As a cluster administrator, you can view the name of any existing egress firewall and inspect its network traffic rules.

26.6.1. Viewing an EgressNetworkPolicy object

You can view an EgressNetworkPolicy object in your cluster.

Prerequisites

  • A cluster using the OpenShift SDN network plugin.
  • Install the OpenShift Command-line Interface (CLI), commonly known as oc.
  • You must log in to the cluster.

Procedure

  1. Optional: To view the names of the EgressNetworkPolicy objects defined in your cluster, enter the following command:

    $ oc get egressnetworkpolicy --all-namespaces
  2. To inspect a policy, enter the following command. Replace <policy_name> with the name of the policy to inspect.

    $ oc describe egressnetworkpolicy <policy_name>

    Example output

    Name:		default
    Namespace:	project1
    Created:	20 minutes ago
    Labels:		<none>
    Annotations:	<none>
    Rule:		Allow to 1.2.3.0/24
    Rule:		Allow to www.example.com
    Rule:		Deny to 0.0.0.0/0

26.7. Editing an egress firewall for a project

As a cluster administrator, you can modify network traffic rules for an existing egress firewall.

26.7.1. Editing an EgressNetworkPolicy object

As a cluster administrator, you can update the egress firewall for a project.

Prerequisites

  • A cluster using the OpenShift SDN network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Find the name of the EgressNetworkPolicy object for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressnetworkpolicy
  2. Optional: If you did not save a copy of the EgressNetworkPolicy object when you created the egress network firewall, enter the following command to create a copy.

    $ oc get -n <project> egressnetworkpolicy <name> -o yaml > <filename>.yaml

    Replace <project> with the name of the project. Replace <name> with the name of the object. Replace <filename> with the name of the file to save the YAML to.

  3. After making changes to the policy rules, enter the following command to replace the EgressNetworkPolicy object. Replace <filename> with the name of the file containing the updated EgressNetworkPolicy object.

    $ oc replace -f <filename>.yaml

26.8. Removing an egress firewall from a project

As a cluster administrator, you can remove an egress firewall from a project to remove all restrictions on network traffic from the project that leaves the OpenShift Container Platform cluster.

26.8.1. Removing an EgressNetworkPolicy object

As a cluster administrator, you can remove an egress firewall from a project.

Prerequisites

  • A cluster using the OpenShift SDN network plugin.
  • Install the OpenShift CLI (oc).
  • You must log in to the cluster as a cluster administrator.

Procedure

  1. Find the name of the EgressNetworkPolicy object for the project. Replace <project> with the name of the project.

    $ oc get -n <project> egressnetworkpolicy
  2. Enter the following command to delete the EgressNetworkPolicy object. Replace <project> with the name of the project and <name> with the name of the object.

    $ oc delete -n <project> egressnetworkpolicy <name>

26.9. Considerations for the use of an egress router pod

26.9.1. About an egress router pod

The OpenShift Container Platform egress router pod redirects traffic to a specified remote server from a private source IP address that is not used for any other purpose. An egress router pod can send network traffic to servers that are set up to allow access only from specific IP addresses.

Note

The egress router pod is not intended for every outgoing connection. Creating large numbers of egress router pods can exceed the limits of your network hardware. For example, creating an egress router pod for every project or application could exceed the number of local MAC addresses that the network interface can handle before reverting to filtering MAC addresses in software.

Important

The egress router image is not compatible with Amazon AWS, Azure Cloud, or any other cloud platform that does not support layer 2 manipulations due to their incompatibility with macvlan traffic.

26.9.1.1. Egress router modes

In redirect mode, an egress router pod configures iptables rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that need to use the reserved source IP address must be configured to access the service for the egress router rather than connecting directly to the destination IP. You can access the destination service and port from the application pod by using the curl command. For example:

$ curl <router_service_IP>:<port>

In HTTP proxy mode, an egress router pod runs as an HTTP proxy on port 8080. This mode only works for clients that are connecting to HTTP-based or HTTPS-based services, but usually requires fewer changes to the client pods to get them to work. Many programs can be told to use an HTTP proxy by setting an environment variable.

In DNS proxy mode, an egress router pod runs as a DNS proxy for TCP-based services from its own IP address to one or more destination IP addresses. To make use of the reserved source IP address, client pods must be modified to connect to the egress router pod rather than connecting directly to the destination IP address. This modification ensures that external destinations treat traffic as though it were coming from a known source.

Redirect mode works for all services except for HTTP and HTTPS. For HTTP and HTTPS services, use HTTP proxy mode. For TCP-based services with IP addresses or domain names, use DNS proxy mode.

26.9.1.2. Egress router pod implementation

The egress router pod setup is performed by an initialization container. That container runs in a privileged context so that it can configure the macvlan interface and set up iptables rules. After the initialization container finishes setting up the iptables rules, it exits. Next the egress router pod executes the container to handle the egress router traffic. The image used varies depending on the egress router mode.

The environment variables determine which addresses the egress-router image uses. The image configures the macvlan interface to use EGRESS_SOURCE as its IP address, with EGRESS_GATEWAY as the IP address for the gateway.

Network Address Translation (NAT) rules are set up so that connections to the cluster IP address of the pod on any TCP or UDP port are redirected to the same port on IP address specified by the EGRESS_DESTINATION variable.

If only some of the nodes in your cluster are capable of claiming the specified source IP address and using the specified gateway, you can specify a nodeName or nodeSelector to identify which nodes are acceptable.

26.9.1.3. Deployment considerations

An egress router pod adds an additional IP address and MAC address to the primary network interface of the node. As a result, you might need to configure your hypervisor or cloud provider to allow the additional address.

Red Hat OpenStack Platform (RHOSP)

If you deploy OpenShift Container Platform on RHOSP, you must allow traffic from the IP and MAC addresses of the egress router pod on your OpenStack environment. If you do not allow the traffic, then communication will fail:

$ openstack port set --allowed-address \
  ip_address=<ip_address>,mac_address=<mac_address> <neutron_port_uuid>
Red Hat Virtualization (RHV)
If you are using RHV, you must select No Network Filter for the Virtual network interface controller (vNIC).
VMware vSphere
If you are using VMware vSphere, see the VMware documentation for securing vSphere standard switches. View and change VMware vSphere default settings by selecting the host virtual switch from the vSphere Web Client.

Specifically, ensure that the following are enabled:

  • MAC Address Changes
  • Forged Transmits
  • Promiscuous Mode Operation

26.9.1.4. Failover configuration

To avoid downtime, you can deploy an egress router pod with a Deployment resource, as in the following example. To create a new Service object for the example deployment, use the oc expose deployment/egress-demo-controller command.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: egress-demo-controller
spec:
  replicas: 1 1
  selector:
    matchLabels:
      name: egress-router
  template:
    metadata:
      name: egress-router
      labels:
        name: egress-router
      annotations:
        pod.network.openshift.io/assign-macvlan: "true"
    spec: 2
      initContainers:
        ...
      containers:
        ...
1
Ensure that replicas is set to 1, because only one pod can use a given egress source IP address at any time. This means that only a single copy of the router runs on a node.
2
Specify the Pod object template for the egress router pod.
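
For example, you might expose the deployment with a command similar to the following. The service name and port shown here are illustrative assumptions; choose values that match your egress destination configuration:

$ oc expose deployment/egress-demo-controller --name=egress-demo-svc --port=80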

26.9.2. Additional resources

26.10. Deploying an egress router pod in redirect mode

As a cluster administrator, you can deploy an egress router pod that is configured to redirect traffic to specified destination IP addresses.

26.10.1. Egress router pod specification for redirect mode

Define the configuration for an egress router pod in the Pod object. The following YAML describes the fields for the configuration of an egress router pod in redirect mode:

apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" 1
spec:
  initContainers:
  - name: egress-router
    image: registry.redhat.io/openshift4/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE 2
      value: <egress_router>
    - name: EGRESS_GATEWAY 3
      value: <egress_gateway>
    - name: EGRESS_DESTINATION 4
      value: <egress_destination>
    - name: EGRESS_ROUTER_MODE
      value: init
  containers:
  - name: egress-router-wait
    image: registry.redhat.io/openshift4/ose-pod
1
The annotation tells OpenShift Container Platform to create a macvlan network interface on the primary network interface controller (NIC) and move that macvlan interface into the pod’s network namespace. You must include the quotation marks around the "true" value. To have OpenShift Container Platform create the macvlan interface on a different NIC interface, set the annotation value to the name of that interface. For example, eth1.
2
IP address from the physical network that the node is on that is reserved for use by the egress router pod. Optional: You can include the subnet length, the /24 suffix, so that a proper route to the local subnet is set. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
3
Same value as the default gateway used by the node.
4
External server to direct traffic to. Using this example, connections to the pod are redirected to 203.0.113.25, with a source IP address of 192.168.12.99.

Example egress router pod specification

apiVersion: v1
kind: Pod
metadata:
  name: egress-multi
  labels:
    name: egress-multi
  annotations:
    pod.network.openshift.io/assign-macvlan: "true"
spec:
  initContainers:
  - name: egress-router
    image: registry.redhat.io/openshift4/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE
      value: 192.168.12.99/24
    - name: EGRESS_GATEWAY
      value: 192.168.12.1
    - name: EGRESS_DESTINATION
      value: |
        80   tcp 203.0.113.25
        8080 tcp 203.0.113.26 80
        8443 tcp 203.0.113.26 443
        203.0.113.27
    - name: EGRESS_ROUTER_MODE
      value: init
  containers:
  - name: egress-router-wait
    image: registry.redhat.io/openshift4/ose-pod

26.10.2. Egress destination configuration format

When an egress router pod is deployed in redirect mode, you can specify redirection rules by using one or more of the following formats:

  • <port> <protocol> <ip_address> - Incoming connections to the given <port> should be redirected to the same port on the given <ip_address>. <protocol> is either tcp or udp.
  • <port> <protocol> <ip_address> <remote_port> - As above, except that the connection is redirected to a different <remote_port> on <ip_address>.
  • <ip_address> - If the last line is a single IP address, then any connections on any other port will be redirected to the corresponding port on that IP address. If there is no fallback IP address then connections on other ports are rejected.

In the example that follows several rules are defined:

  • The first line redirects traffic from local port 80 to port 80 on 203.0.113.25.
  • The second and third lines redirect local ports 8080 and 8443 to remote ports 80 and 443 on 203.0.113.26.
  • The last line matches traffic for any ports not specified in the previous rules.

Example configuration

80   tcp 203.0.113.25
8080 tcp 203.0.113.26 80
8443 tcp 203.0.113.26 443
203.0.113.27

26.10.3. Deploying an egress router pod in redirect mode

In redirect mode, an egress router pod sets up iptables rules to redirect traffic from its own IP address to one or more destination IP addresses. Client pods that need to use the reserved source IP address must be configured to access the service for the egress router rather than connecting directly to the destination IP. You can access the destination service and port from the application pod by using the curl command. For example:

$ curl <router_service_IP>:<port>

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an egress router pod.
  2. To ensure that other pods can find the IP address of the egress router pod, create a service to point to the egress router pod, as in the following example:

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-1
    spec:
      ports:
      - name: http
        port: 80
      - name: https
        port: 443
      type: ClusterIP
      selector:
        name: egress-1

    Your pods can now connect to this service. Their connections are redirected to the corresponding ports on the external server, using the reserved egress IP address.
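
    For example, a client pod that includes the curl utility could reach the first destination through the service as follows. The pod name is a placeholder and this check is an illustrative sketch, not a required step:

    $ oc rsh <client_pod> curl -s http://egress-1:80/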

26.10.4. Additional resources

26.11. Deploying an egress router pod in HTTP proxy mode

As a cluster administrator, you can deploy an egress router pod configured to proxy traffic to specified HTTP and HTTPS-based services.

26.11.1. Egress router pod specification for HTTP mode

Define the configuration for an egress router pod in the Pod object. The following YAML describes the fields for the configuration of an egress router pod in HTTP mode:

apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" 1
spec:
  initContainers:
  - name: egress-router
    image: registry.redhat.io/openshift4/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE 2
      value: <egress-router>
    - name: EGRESS_GATEWAY 3
      value: <egress-gateway>
    - name: EGRESS_ROUTER_MODE
      value: http-proxy
  containers:
  - name: egress-router-pod
    image: registry.redhat.io/openshift4/ose-egress-http-proxy
    env:
    - name: EGRESS_HTTP_PROXY_DESTINATION 4
      value: |-
        ...
    ...
1
The annotation tells OpenShift Container Platform to create a macvlan network interface on the primary network interface controller (NIC) and move that macvlan interface into the pod’s network namespace. You must include the quotation marks around the "true" value. To have OpenShift Container Platform create the macvlan interface on a different NIC interface, set the annotation value to the name of that interface. For example, eth1.
2
IP address from the physical network that the node is on that is reserved for use by the egress router pod. Optional: You can include the subnet length, the /24 suffix, so that a proper route to the local subnet is set. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
3
Same value as the default gateway used by the node.
4
A string or YAML multi-line string specifying how to configure the proxy. Note that this is specified as an environment variable in the HTTP proxy container, not with the other environment variables in the init container.

26.11.2. Egress destination configuration format

When an egress router pod is deployed in HTTP proxy mode, you can specify redirection rules by using one or more of the following formats. Each line in the configuration specifies one group of connections to allow or deny:

  • An IP address allows connections to that IP address, such as 192.168.1.1.
  • A CIDR range allows connections to that CIDR range, such as 192.168.1.0/24.
  • A hostname allows proxying to that host, such as www.example.com.
  • A domain name preceded by *. allows proxying to that domain and all of its subdomains, such as *.example.com.
  • A ! followed by any of the previous match expressions denies the connection instead.
  • If the last line is *, then anything that is not explicitly denied is allowed. Otherwise, anything that is not allowed is denied.

You can also use * to allow connections to all remote destinations.

Example configuration

!*.example.com
!192.168.1.0/24
192.168.2.1
*

26.11.3. Deploying an egress router pod in HTTP proxy mode

In HTTP proxy mode, an egress router pod runs as an HTTP proxy on port 8080. This mode only works for clients that are connecting to HTTP-based or HTTPS-based services, but usually requires fewer changes to the client pods to get them to work. Many programs can be told to use an HTTP proxy by setting an environment variable.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an egress router pod.
  2. To ensure that other pods can find the IP address of the egress router pod, create a service to point to the egress router pod, as in the following example:

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-1
    spec:
      ports:
      - name: http-proxy
        port: 8080 1
      type: ClusterIP
      selector:
        name: egress-1
    1
    Ensure the http port is set to 8080.
  3. To configure the client pod (not the egress proxy pod) to use the HTTP proxy, set the http_proxy or https_proxy variables:

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-1
      labels:
        name: app-1
    spec:
      containers:
      - name: app-1
        env:
        - name: http_proxy
          value: http://egress-1:8080/ 1
        - name: https_proxy
          value: http://egress-1:8080/
        ...
    1
    The service created in the previous step.
    Note

    Using the http_proxy and https_proxy environment variables is not necessary for all setups. If the above does not create a working setup, then consult the documentation for the tool or software you are running in the pod.
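
    As an optional sanity check, you can confirm that the proxy variables are set in the running client pod. This is an illustrative sketch that assumes the container image provides the env command:

    $ oc rsh app-1 env | grep -i _proxy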

26.11.4. Additional resources

26.12. Deploying an egress router pod in DNS proxy mode

As a cluster administrator, you can deploy an egress router pod configured to proxy traffic to specified DNS names and IP addresses.

26.12.1. Egress router pod specification for DNS mode

Define the configuration for an egress router pod in the Pod object. The following YAML describes the fields for the configuration of an egress router pod in DNS mode:

apiVersion: v1
kind: Pod
metadata:
  name: egress-1
  labels:
    name: egress-1
  annotations:
    pod.network.openshift.io/assign-macvlan: "true" 1
spec:
  initContainers:
  - name: egress-router
    image: registry.redhat.io/openshift4/ose-egress-router
    securityContext:
      privileged: true
    env:
    - name: EGRESS_SOURCE 2
      value: <egress-router>
    - name: EGRESS_GATEWAY 3
      value: <egress-gateway>
    - name: EGRESS_ROUTER_MODE
      value: dns-proxy
  containers:
  - name: egress-router-pod
    image: registry.redhat.io/openshift4/ose-egress-dns-proxy
    securityContext:
      privileged: true
    env:
    - name: EGRESS_DNS_PROXY_DESTINATION 4
      value: |-
        ...
    - name: EGRESS_DNS_PROXY_DEBUG 5
      value: "1"
    ...
1
The annotation tells OpenShift Container Platform to create a macvlan network interface on the primary network interface controller (NIC) and move that macvlan interface into the pod’s network namespace. You must include the quotation marks around the "true" value. To have OpenShift Container Platform create the macvlan interface on a different NIC interface, set the annotation value to the name of that interface. For example, eth1.
2
IP address from the physical network that the node is on that is reserved for use by the egress router pod. Optional: You can include the subnet length, the /24 suffix, so that a proper route to the local subnet is set. If you do not specify a subnet length, then the egress router can access only the host specified with the EGRESS_GATEWAY variable and no other hosts on the subnet.
3
Same value as the default gateway used by the node.
4
Specify a list of one or more proxy destinations.
5
Optional: Specify to output the DNS proxy log output to stdout.

26.12.2. Egress destination configuration format

When the router is deployed in DNS proxy mode, you specify a list of port and destination mappings. A destination may be either an IP address or a DNS name.

An egress router pod supports the following formats for specifying port and destination mappings:

Port and remote address
You can specify a source port and a destination host by using the two field format: <port> <remote_address>.

The host can be an IP address or a DNS name. If a DNS name is provided, DNS resolution occurs at runtime. With this format, the proxy connects to the destination host on the same port as the specified source port.

Port and remote address pair example

80 172.16.12.11
100 example.com

Port, remote address, and remote port
You can specify a source port, a destination host, and a destination port by using the three field format: <port> <remote_address> <remote_port>.

The three field format behaves identically to the two field version, with the exception that the destination port can be different than the source port.

Port, remote address, and remote port example

8080 192.168.60.252 80
8443 web.example.com 443

26.12.3. Deploying an egress router pod in DNS proxy mode

In DNS proxy mode, an egress router pod acts as a DNS proxy for TCP-based services from its own IP address to one or more destination IP addresses.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an egress router pod.
  2. Create a service for the egress router pod:

    1. Create a file named egress-router-service.yaml that contains the following YAML. Set spec.ports to the list of ports that you defined previously for the EGRESS_DNS_PROXY_DESTINATION environment variable.

      apiVersion: v1
      kind: Service
      metadata:
        name: egress-dns-svc
      spec:
        ports:
          ...
        type: ClusterIP
        selector:
          name: egress-dns-proxy

      For example:

      apiVersion: v1
      kind: Service
      metadata:
        name: egress-dns-svc
      spec:
        ports:
        - name: con1
          protocol: TCP
          port: 80
          targetPort: 80
        - name: con2
          protocol: TCP
          port: 100
          targetPort: 100
        type: ClusterIP
        selector:
          name: egress-dns-proxy
    2. To create the service, enter the following command:

      $ oc create -f egress-router-service.yaml

      Pods can now connect to this service. The connections are proxied to the corresponding ports on the external server, using the reserved egress IP address.
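
      For example, a client pod that includes the curl utility could reach the first port mapping through the service as follows. The pod name is a placeholder and this check is an illustrative sketch, not a required step:

      $ oc rsh <client_pod> curl -s http://egress-dns-svc:80/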

26.12.4. Additional resources

26.13. Configuring an egress router pod destination list from a config map

As a cluster administrator, you can define a ConfigMap object that specifies destination mappings for an egress router pod. The specific format of the configuration depends on the type of egress router pod. For details on the format, refer to the documentation for the specific egress router pod.

26.13.1. Configuring egress router destination mappings with a config map

For a large or frequently-changing set of destination mappings, you can use a config map to externally maintain the list. An advantage of this approach is that permission to edit the config map can be delegated to users without cluster-admin privileges. Because the egress router pod requires a privileged container, it is not possible for users without cluster-admin privileges to edit the pod definition directly.

Note

The egress router pod does not automatically update when the config map changes. You must restart the egress router pod to get updates.
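
For example, assuming the egress router pod is named egress-router-pod and was created from a definition file named egress-router-pod.yaml, you can restart it by deleting and recreating it:

$ oc delete pod/egress-router-pod
$ oc create -f egress-router-pod.yaml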

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a file containing the mapping data for the egress router pod, as in the following example:

    # Egress routes for Project "Test", version 3
    
    80   tcp 203.0.113.25
    
    8080 tcp 203.0.113.26 80
    8443 tcp 203.0.113.26 443
    
    # Fallback
    203.0.113.27

    You can put blank lines and comments into this file.

  2. Create a ConfigMap object from the file:

    $ oc delete configmap egress-routes --ignore-not-found
    $ oc create configmap egress-routes \
      --from-file=destination=my-egress-destination.txt

    In the previous command, the egress-routes value is the name of the ConfigMap object to create and my-egress-destination.txt is the name of the file that the data is read from.

    Tip

    You can alternatively apply the following YAML to create the config map:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: egress-routes
    data:
      destination: |
        # Egress routes for Project "Test", version 3
    
        80   tcp 203.0.113.25
    
        8080 tcp 203.0.113.26 80
        8443 tcp 203.0.113.26 443
    
        # Fallback
        203.0.113.27
  3. Create an egress router pod definition and specify the configMapKeyRef stanza for the EGRESS_DESTINATION field in the environment stanza:

    ...
    env:
    - name: EGRESS_DESTINATION
      valueFrom:
        configMapKeyRef:
          name: egress-routes
          key: destination
    ...

26.13.2. Additional resources

26.14. Enabling multicast for a project

26.14.1. About multicast

With IP multicast, data is broadcast to many IP addresses simultaneously.

Important
  • At this time, multicast is best used for low-bandwidth coordination or service discovery and not a high-bandwidth solution.
  • By default, network policies affect all connections in a namespace. However, multicast is unaffected by network policies. If multicast is enabled in the same namespace as your network policies, it is always allowed, even if there is a deny-all network policy. Cluster administrators should consider the implications of exempting multicast from network policies before enabling it.

Multicast traffic between OpenShift Container Platform pods is disabled by default. If you are using the OpenShift SDN network plugin, you can enable multicast on a per-project basis.

When using the OpenShift SDN network plugin in networkpolicy isolation mode:

  • Multicast packets sent by a pod will be delivered to all other pods in the project, regardless of NetworkPolicy objects. Pods might be able to communicate over multicast even when they cannot communicate over unicast.
  • Multicast packets sent by a pod in one project will never be delivered to pods in any other project, even if there are NetworkPolicy objects that allow communication between the projects.

When using the OpenShift SDN network plugin in multitenant isolation mode:

  • Multicast packets sent by a pod will be delivered to all other pods in the project.
  • Multicast packets sent by a pod in one project will be delivered to pods in other projects only if each project is joined together and multicast is enabled in each joined project.

26.14.2. Enabling multicast between pods

You can enable multicast between pods for your project.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Run the following command to enable multicast for a project. Replace <namespace> with the namespace for the project you want to enable multicast for.

    $ oc annotate netnamespace <namespace> \
        netnamespace.network.openshift.io/multicast-enabled=true

Verification

To verify that multicast is enabled for a project, complete the following procedure:

  1. Change your current project to the project that you enabled multicast for. Replace <project> with the project name.

    $ oc project <project>
  2. Create a pod to act as a multicast receiver:

    $ cat <<EOF| oc create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: mlistener
      labels:
        app: multicast-verify
    spec:
      containers:
        - name: mlistener
          image: registry.access.redhat.com/ubi9
          command: ["/bin/sh", "-c"]
          args:
            ["dnf -y install socat hostname && sleep inf"]
          ports:
            - containerPort: 30102
              name: mlistener
              protocol: UDP
    EOF
  3. Create a pod to act as a multicast sender:

    $ cat <<EOF| oc create -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: msender
      labels:
        app: multicast-verify
    spec:
      containers:
        - name: msender
          image: registry.access.redhat.com/ubi9
          command: ["/bin/sh", "-c"]
          args:
            ["dnf -y install socat && sleep inf"]
    EOF
  4. In a new terminal window or tab, start the multicast listener.

    1. Get the IP address for the Pod:

      $ POD_IP=$(oc get pods mlistener -o jsonpath='{.status.podIP}')
    2. Start the multicast listener by entering the following command:

      $ oc exec mlistener -i -t -- \
          socat UDP4-RECVFROM:30102,ip-add-membership=224.1.0.1:$POD_IP,fork EXEC:hostname
  5. Start the multicast transmitter.

    1. Get the pod network IP address range:

      $ CIDR=$(oc get Network.config.openshift.io cluster \
          -o jsonpath='{.status.clusterNetwork[0].cidr}')
    2. To send a multicast message, enter the following command:

      $ oc exec msender -i -t -- \
          /bin/bash -c "echo | socat STDIO UDP4-DATAGRAM:224.1.0.1:30102,range=$CIDR,ip-multicast-ttl=64"

      If multicast is working, the previous command returns the following output:

      mlistener

26.15. Disabling multicast for a project

26.15.1. Disabling multicast between pods

You can disable multicast between pods for your project.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Disable multicast by running the following command:

    $ oc annotate netnamespace <namespace> \ 1
        netnamespace.network.openshift.io/multicast-enabled-
    1
    The namespace for the project you want to disable multicast for.
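
To confirm that multicast is disabled, you can check that the annotation is no longer present on the netnamespace object. For example, the following command returns no output when multicast is disabled for the project:

$ oc get netnamespace <namespace> -o yaml | grep multicast-enabled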

26.16. Configuring network isolation using OpenShift SDN

When your cluster is configured to use the multitenant isolation mode for the OpenShift SDN network plugin, each project is isolated by default. Network traffic is not allowed between pods or services in different projects in multitenant isolation mode.

You can change the behavior of multitenant isolation for a project in two ways:

  • You can join one or more projects, allowing network traffic between pods and services in different projects.
  • You can disable network isolation for a project. It will be globally accessible, accepting network traffic from pods and services in all other projects. A globally accessible project can access pods and services in all other projects.

26.16.1. Prerequisites

  • You must have a cluster configured to use the OpenShift SDN network plugin in multitenant isolation mode.

26.16.2. Joining projects

You can join two or more projects to allow network traffic between pods and services in different projects.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  1. Use the following command to join projects to an existing project network:

    $ oc adm pod-network join-projects --to=<project1> <project2> <project3>

    Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option to specify projects based upon an associated label, as shown in the example after this procedure.

  2. Optional: Run the following command to view the pod networks that you have joined together:

    $ oc get netnamespaces

    Projects in the same pod-network have the same network ID in the NETID column.
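
The following is a minimal sketch of joining projects by label rather than by name; the team=dev label is an assumption and must match a label that is set on the projects that you want to join:

$ oc adm pod-network join-projects --to=<project1> --selector='team=dev'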

26.16.3. Isolating a project

You can isolate a project so that pods and services in other projects cannot access its pods and services.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • To isolate the projects in the cluster, run the following command:

    $ oc adm pod-network isolate-projects <project1> <project2>

    Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option to specify projects based upon an associated label.

26.16.4. Disabling network isolation for a project

You can disable network isolation for a project.

Prerequisites

  • Install the OpenShift CLI (oc).
  • You must log in to the cluster with a user that has the cluster-admin role.

Procedure

  • Run the following command for the project:

    $ oc adm pod-network make-projects-global <project1> <project2>

    Alternatively, instead of specifying specific project names, you can use the --selector=<project_selector> option to specify projects based upon an associated label.

26.17. Configuring kube-proxy

The Kubernetes network proxy (kube-proxy) runs on each node and is managed by the Cluster Network Operator (CNO). kube-proxy maintains network rules for forwarding connections for endpoints associated with services.

26.17.1. About iptables rules synchronization

The synchronization period determines how frequently the Kubernetes network proxy (kube-proxy) syncs the iptables rules on a node.

A sync begins when either of the following events occurs:

  • An event occurs, such as a service or endpoint being added to or removed from the cluster.
  • The time since the last sync exceeds the sync period defined for kube-proxy.

26.17.2. kube-proxy configuration parameters

You can modify the following kubeProxyConfig parameters.

Note

Because of performance improvements introduced in OpenShift Container Platform 4.3 and greater, adjusting the iptablesSyncPeriod parameter is no longer necessary.

Table 26.3. Parameters

  • iptablesSyncPeriod
    Description: The refresh period for iptables rules.
    Values: A time interval, such as 30s or 2m. Valid suffixes include s, m, and h and are described in the Go time package documentation.
    Default: 30s

  • proxyArguments.iptables-min-sync-period
    Description: The minimum duration before refreshing iptables rules. This parameter ensures that the refresh does not happen too frequently. By default, a refresh starts as soon as a change that affects iptables rules occurs.
    Values: A time interval, such as 30s or 2m. Valid suffixes include s, m, and h and are described in the Go time package documentation.
    Default: 0s

26.17.3. Modifying the kube-proxy configuration

You can modify the Kubernetes network proxy configuration for your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in to a running cluster with the cluster-admin role.

Procedure

  1. Edit the Network.operator.openshift.io custom resource (CR) by running the following command:

    $ oc edit network.operator.openshift.io cluster
  2. Modify the kubeProxyConfig parameter in the CR with your changes to the kube-proxy configuration, such as in the following example CR:

    apiVersion: operator.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      kubeProxyConfig:
        iptablesSyncPeriod: 30s
        proxyArguments:
          iptables-min-sync-period: ["30s"]
  3. Save the file and exit the text editor.

    The syntax is validated by the oc command when you save the file and exit the editor. If your modifications contain a syntax error, the editor opens the file and displays an error message.

  4. Enter the following command to confirm the configuration update:

    $ oc get networks.operator.openshift.io -o yaml

    Example output

    apiVersion: v1
    items:
    - apiVersion: operator.openshift.io/v1
      kind: Network
      metadata:
        name: cluster
      spec:
        clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
        defaultNetwork:
          type: OpenShiftSDN
        kubeProxyConfig:
          iptablesSyncPeriod: 30s
          proxyArguments:
            iptables-min-sync-period:
            - 30s
        serviceNetwork:
        - 172.30.0.0/16
      status: {}
    kind: List

  5. Optional: Enter the following command to confirm that the Cluster Network Operator accepted the configuration change:

    $ oc get clusteroperator network

    Example output

    NAME      VERSION     AVAILABLE   PROGRESSING   DEGRADED   SINCE
    network   4.1.0-0.9   True        False         False      1m

    The AVAILABLE field is True when the configuration update is applied successfully.

Chapter 27. Configuring Routes

27.1. Route configuration

27.1.1. Creating an HTTP-based route

A route allows you to host your application at a public URL. It can either be secure or unsecured, depending on the network security configuration of your application. An HTTP-based route is an unsecured route that uses the basic HTTP routing protocol and exposes a service on an unsecured application port.

The following procedure describes how to create a simple HTTP-based route to a web application, using the hello-openshift application as an example.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in as an administrator.
  • You have a web application that exposes a port and a TCP endpoint listening for traffic on the port.

Procedure

  1. Create a project called hello-openshift by running the following command:

    $ oc new-project hello-openshift
  2. Create a pod in the project by running the following command:

    $ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
  3. Create a service called hello-openshift by running the following command:

    $ oc expose pod/hello-openshift
  4. Create an unsecured route to the hello-openshift application by running the following command:

    $ oc expose svc hello-openshift

Verification

  • To verify the route resource that you created, run the following command:

    $ oc get routes -o yaml <name of resource> 1
    1
    In this example, the route is named hello-openshift.

Sample YAML definition of the created unsecured route:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: hello-openshift
spec:
  host: hello-openshift-hello-openshift.<Ingress_Domain> 1
  port:
    targetPort: 8080 2
  to:
    kind: Service
    name: hello-openshift

1
<Ingress_Domain> is the default ingress domain name. The ingresses.config/cluster object is created during the installation and cannot be changed. If you want to specify a different domain, you can specify an alternative cluster domain using the appsDomain option.
2
targetPort is the target port on pods that is selected by the service that this route points to.
Note

To display your default ingress domain, run the following command:

$ oc get ingresses.config/cluster -o jsonpath={.spec.domain}

27.1.2. Creating a route for Ingress Controller sharding

A route allows you to host your application at a URL. In this case, the hostname is not set and the route uses a subdomain instead. When you specify a subdomain, you automatically use the domain of the Ingress Controller that exposes the route. For situations where a route is exposed by multiple Ingress Controllers, the route is hosted at multiple URLs.

The following procedure describes how to create a route for Ingress Controller sharding, using the hello-openshift application as an example.

Ingress Controller sharding is useful when balancing incoming traffic load among a set of Ingress Controllers and when isolating traffic to a specific Ingress Controller. For example, company A goes to one Ingress Controller and company B to another.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in as a project administrator.
  • You have a web application that exposes a port and an HTTP or TLS endpoint listening for traffic on the port.
  • You have configured the Ingress Controller for sharding.

Procedure

  1. Create a project called hello-openshift by running the following command:

    $ oc new-project hello-openshift
  2. Create a pod in the project by running the following command:

    $ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
  3. Create a service called hello-openshift by running the following command:

    $ oc expose pod/hello-openshift
  4. Create a route definition called hello-openshift-route.yaml:

    YAML definition of the created route for sharding:

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      labels:
        type: sharded 1
      name: hello-openshift-edge
      namespace: hello-openshift
    spec:
      subdomain: hello-openshift 2
      tls:
        termination: edge
      to:
        kind: Service
        name: hello-openshift

    1
    Both the label key and its corresponding label value must match the ones specified in the Ingress Controller. In this example, the Ingress Controller has the label key and value type: sharded.
    2
    The route will be exposed using the value of the subdomain field. When you specify the subdomain field, you must leave the hostname unset. If you specify both the host and subdomain fields, then the route will use the value of the host field, and ignore the subdomain field.
  5. Use hello-openshift-route.yaml to create a route to the hello-openshift application by running the following command:

    $ oc -n hello-openshift create -f hello-openshift-route.yaml

Verification

  • Get the status of the route with the following command:

    $ oc -n hello-openshift get routes/hello-openshift-edge -o yaml

    The resulting Route resource should look similar to the following:

    Example output

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      labels:
        type: sharded
      name: hello-openshift-edge
      namespace: hello-openshift
    spec:
      subdomain: hello-openshift
      tls:
        termination: edge
      to:
        kind: Service
        name: hello-openshift
    status:
      ingress:
      - host: hello-openshift.<apps-sharded.basedomain.example.net> 1
        routerCanonicalHostname: router-sharded.<apps-sharded.basedomain.example.net> 2
        routerName: sharded 3

    1
    The hostname the Ingress Controller, or router, uses to expose the route. The value of the host field is automatically determined by the Ingress Controller, and uses its domain. In this example, the domain of the Ingress Controller is <apps-sharded.basedomain.example.net>.
    2
    The hostname of the Ingress Controller.
    3
    The name of the Ingress Controller. In this example, the Ingress Controller has the name sharded.

27.1.3. Configuring route timeouts

You can configure the default timeouts for an existing route when you have services in need of a low timeout, which is required for Service Level Availability (SLA) purposes, or a high timeout, for cases with a slow back end.

Prerequisites

  • You need a deployed Ingress Controller on a running cluster.

Procedure

  1. Using the oc annotate command, add the timeout to the route:

    $ oc annotate route <route_name> \
        --overwrite haproxy.router.openshift.io/timeout=<timeout><time_unit> 1
    1
    Supported time units are microseconds (us), milliseconds (ms), seconds (s), minutes (m), hours (h), or days (d).

    The following example sets a timeout of two seconds on a route named myroute:

    $ oc annotate route myroute --overwrite haproxy.router.openshift.io/timeout=2s

27.1.4. HTTP Strict Transport Security

HTTP Strict Transport Security (HSTS) policy is a security enhancement, which signals to the browser client that only HTTPS traffic is allowed on the route host. HSTS also optimizes web traffic by signaling HTTPS transport is required, without using HTTP redirects. HSTS is useful for speeding up interactions with websites.

When HSTS policy is enforced, HSTS adds a Strict Transport Security header to HTTP and HTTPS responses from the site. You can use the insecureEdgeTerminationPolicy value in a route to redirect HTTP to HTTPS. When HSTS is enforced, the client changes all requests from the HTTP URL to HTTPS before the request is sent, eliminating the need for a redirect.

Cluster administrators can configure HSTS to do the following:

  • Enable HSTS per-route
  • Disable HSTS per-route
  • Enforce HSTS per-domain, for a set of domains, or use namespace labels in combination with domains
Important

HSTS works only with secure routes, either edge-terminated or re-encrypt. The configuration is ineffective on HTTP or passthrough routes.

27.1.4.1. Enabling HTTP Strict Transport Security per-route

HTTP strict transport security (HSTS) is implemented in the HAProxy template and applied to edge and re-encrypt routes that have the haproxy.router.openshift.io/hsts_header annotation.

Prerequisites

  • You are logged in to the cluster with a user with administrator privileges for the project.
  • You installed the oc CLI.

Procedure

  • To enable HSTS on a route, add the haproxy.router.openshift.io/hsts_header value to the edge-terminated or re-encrypt route. You can use the oc annotate tool to do this by running the following command:

    $ oc annotate route <route_name> -n <namespace> --overwrite=true "haproxy.router.openshift.io/hsts_header"="max-age=31536000;\ 1
    includeSubDomains;preload"
    1
    In this example, the maximum age is set to 31536000 seconds, which is approximately one year.
    Note

    In this example, the equal sign (=) is in quotes. This is required to properly execute the annotate command.

    Example route configured with an annotation

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      annotations:
        haproxy.router.openshift.io/hsts_header: max-age=31536000;includeSubDomains;preload 1 2 3
    ...
    spec:
      host: def.abc.com
      tls:
        termination: "reencrypt"
        ...
      wildcardPolicy: "Subdomain"

    1
    Required. max-age measures the length of time, in seconds, that the HSTS policy is in effect. If set to 0, it negates the policy.
    2
    Optional. When included, includeSubDomains tells the client that all subdomains of the host must have the same HSTS policy as the host.
    3
    Optional. When max-age is greater than 0, you can add preload in haproxy.router.openshift.io/hsts_header to allow external services to include this site in their HSTS preload lists. For example, sites such as Google can construct a list of sites that have preload set. Browsers can then use these lists to determine which sites they can communicate with over HTTPS, even before they have interacted with the site. Without preload set, browsers must have interacted with the site over HTTPS, at least once, to get the header.
27.1.4.2. Disabling HTTP Strict Transport Security per-route

To disable HTTP strict transport security (HSTS) per-route, you can set the max-age value in the route annotation to 0.

Prerequisites

  • You are logged in to the cluster with a user with administrator privileges for the project.
  • You installed the oc CLI.

Procedure

  • To disable HSTS, set the max-age value in the route annotation to 0, by entering the following command:

    $ oc annotate route <route_name> -n <namespace> --overwrite=true "haproxy.router.openshift.io/hsts_header"="max-age=0"
    Tip

    You can alternatively add the following annotation in the route YAML to disable HSTS per-route:

    Example of disabling HSTS per-route

    metadata:
      annotations:
        haproxy.router.openshift.io/hsts_header: max-age=0

  • To disable HSTS for every route in a namespace, enter the following command:

    $ oc annotate route --all -n <namespace> --overwrite=true "haproxy.router.openshift.io/hsts_header"="max-age=0"

Verification

  1. To query the annotation for all routes, enter the following command:

    $ oc get route  --all-namespaces -o go-template='{{range .items}}{{if .metadata.annotations}}{{$a := index .metadata.annotations "haproxy.router.openshift.io/hsts_header"}}{{$n := .metadata.name}}{{with $a}}Name: {{$n}} HSTS: {{$a}}{{"\n"}}{{else}}{{""}}{{end}}{{end}}{{end}}'

    Example output

    Name: routename HSTS: max-age=0

27.1.4.3. Enforcing HTTP Strict Transport Security per-domain

To enforce HTTP Strict Transport Security (HSTS) per-domain for secure routes, add a requiredHSTSPolicies record to the Ingress spec to capture the configuration of the HSTS policy.

If you configure a requiredHSTSPolicy to enforce HSTS, then any newly created route must be configured with a compliant HSTS policy annotation.

Note

To handle upgraded clusters with non-compliant HSTS routes, you can update the manifests at the source and apply the updates.

Note

You cannot use oc expose route or oc create route commands to add a route in a domain that enforces HSTS, because the API for these commands does not accept annotations.

Important

HSTS cannot be applied to insecure, or non-TLS routes, even if HSTS is requested for all routes globally.

Prerequisites

  • You are logged in to the cluster with a user with administrator privileges for the project.
  • You installed the oc CLI.

Procedure

  1. Edit the Ingress config file:

    $ oc edit ingresses.config.openshift.io/cluster

    Example HSTS policy

    apiVersion: config.openshift.io/v1
    kind: Ingress
    metadata:
      name: cluster
    spec:
      domain: 'hello-openshift-default.apps.username.devcluster.openshift.com'
      requiredHSTSPolicies: 1
      - domainPatterns: 2
        - '*hello-openshift-default.apps.username.devcluster.openshift.com'
        - '*hello-openshift-default2.apps.username.devcluster.openshift.com'
        namespaceSelector: 3
          matchLabels:
            myPolicy: strict
        maxAge: 4
          smallestMaxAge: 1
          largestMaxAge: 31536000
        preloadPolicy: RequirePreload 5
        includeSubDomainsPolicy: RequireIncludeSubDomains 6
      - domainPatterns: 7
        - 'abc.example.com'
        - '*xyz.example.com'
        namespaceSelector:
          matchLabels: {}
        maxAge: {}
        preloadPolicy: NoOpinion
        includeSubDomainsPolicy: RequireNoIncludeSubDomains

    1
    Required. requiredHSTSPolicies are validated in order, and the first matching domainPatterns applies.
    2 7
    Required. You must specify at least one domainPatterns hostname. Any number of domains can be listed. You can include multiple sections of enforcing options for different domainPatterns.
    3
    Optional. If you include namespaceSelector, it must match the labels of the project where the routes reside, to enforce the set HSTS policy on the routes. Routes that only match the namespaceSelector and not the domainPatterns are not validated.
    4
    Required. max-age measures the length of time, in seconds, that the HSTS policy is in effect. This policy setting allows for a smallest and largest max-age to be enforced.
    • The largestMaxAge value must be between 0 and 2147483647. It can be left unspecified, which means no upper limit is enforced.
    • The smallestMaxAge value must be between 0 and 2147483647. Enter 0 to disable HSTS for troubleshooting, otherwise enter 1 if you never want HSTS to be disabled. It can be left unspecified, which means no lower limit is enforced.
    5
    Optional. Including preload in haproxy.router.openshift.io/hsts_header allows external services to include this site in their HSTS preload lists. Browsers can then use these lists to determine which sites they can communicate with over HTTPS, before they have interacted with the site. Without preload set, browsers need to interact at least once with the site to get the header. preload can be set with one of the following:
    • RequirePreload: preload is required by the RequiredHSTSPolicy.
    • RequireNoPreload: preload is forbidden by the RequiredHSTSPolicy.
    • NoOpinion: preload does not matter to the RequiredHSTSPolicy.
    6
    Optional. includeSubDomainsPolicy can be set with one of the following:
    • RequireIncludeSubDomains: includeSubDomains is required by the RequiredHSTSPolicy.
    • RequireNoIncludeSubDomains: includeSubDomains is forbidden by the RequiredHSTSPolicy.
    • NoOpinion: includeSubDomains does not matter to the RequiredHSTSPolicy.
  2. You can apply HSTS to all routes in the cluster or in a particular namespace by entering the oc annotate command.

    • To apply HSTS to all routes in the cluster, enter the oc annotate command. For example:

      $ oc annotate route --all --all-namespaces --overwrite=true "haproxy.router.openshift.io/hsts_header"="max-age=31536000"
    • To apply HSTS to all routes in a particular namespace, enter the oc annotate command. For example:

      $ oc annotate route --all -n my-namespace --overwrite=true "haproxy.router.openshift.io/hsts_header"="max-age=31536000"

Verification

You can review the HSTS policy you configured. For example:

  • To review the maxAge set for required HSTS policies, enter the following command:

    $ oc get clusteroperator/ingress -n openshift-ingress-operator -o jsonpath='{range .spec.requiredHSTSPolicies[*]}{.spec.requiredHSTSPolicies.maxAgePolicy.largestMaxAge}{"\n"}{end}'
  • To review the HSTS annotations on all routes, enter the following command:

    $ oc get route  --all-namespaces -o go-template='{{range .items}}{{if .metadata.annotations}}{{$a := index .metadata.annotations "haproxy.router.openshift.io/hsts_header"}}{{$n := .metadata.name}}{{with $a}}Name: {{$n}} HSTS: {{$a}}{{"\n"}}{{else}}{{""}}{{end}}{{end}}{{end}}'

    Example output

    Name: <routename> HSTS: max-age=31536000;preload;includeSubDomains

27.1.5. Throughput issue troubleshooting methods

Sometimes applications deployed by using OpenShift Container Platform can cause network throughput issues, such as unusually high latency between specific services.

If pod logs do not reveal any cause of the problem, use the following methods to analyze performance issues:

  • Use a latency checker, such as ping, or a packet analyzer, such as tcpdump, to analyze traffic between a pod and its node.

    For example, run the tcpdump tool on each pod while reproducing the behavior that led to the issue. Review the captures on both sides to compare send and receive timestamps to analyze the latency of traffic to and from a pod. Latency can occur in OpenShift Container Platform if a node interface is overloaded with traffic from other pods, storage devices, or the data plane.

    $ tcpdump -s 0 -i any -w /tmp/dump.pcap host <podip_1> and host <podip_2> 1
    1
    podip is the IP address for the pod. Run the oc get pod <pod_name> -o wide command to get the IP address of a pod.

    The tcpdump command generates a file at /tmp/dump.pcap containing all traffic between these two pods. You can run the analyzer shortly before the issue is reproduced and stop the analyzer shortly after the issue is finished reproducing to minimize the size of the file. You can also run a packet analyzer between the nodes (eliminating the SDN from the equation) with:

    $ tcpdump -s 0 -i any -w /tmp/dump.pcap port 4789
  • Use a bandwidth measuring tool, such as iperf, to measure streaming throughput and UDP throughput. Locate any bottlenecks by running the tool from the pods first, and then running it from the nodes; see the sketch after this list.

  • In some cases, the cluster may mark the node with the router pod as unhealthy due to latency issues. Use worker latency profiles to adjust the frequency that the cluster waits for a status update from the node before taking action.
  • If your cluster has designated lower-latency and higher-latency nodes, configure the spec.nodePlacement field in the Ingress Controller to control the placement of the router pod.
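
The following is a minimal sketch of a throughput measurement between two pods with iperf3. It assumes that the iperf3 binary is available in both pod images; the pod names and IP address are placeholders. Start the server in one pod, then run the client in the other pod against the server pod IP address:

$ oc exec <server_pod> -- iperf3 -s
$ oc exec <client_pod> -- iperf3 -c <server_pod_ip> -u -b 1G

The -u and -b options measure UDP throughput at the stated target bandwidth; omit them to measure TCP streaming throughput.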

27.1.6. Using cookies to keep route statefulness

OpenShift Container Platform provides sticky sessions, which enables stateful application traffic by ensuring all traffic hits the same endpoint. However, if the endpoint pod terminates, whether through restart, scaling, or a change in configuration, this statefulness can disappear.

OpenShift Container Platform can use cookies to configure session persistence. The Ingress controller selects an endpoint to handle any user requests, and creates a cookie for the session. The cookie is passed back in the response to the request and the user sends the cookie back with the next request in the session. The cookie tells the Ingress Controller which endpoint is handling the session, ensuring that client requests use the cookie so that they are routed to the same pod.

Note

Cookies cannot be set on passthrough routes, because the HTTP traffic cannot be seen. Instead, a number is calculated based on the source IP address, which determines the backend.

If backends change, the traffic can be directed to the wrong server, making it less sticky. If you are using a load balancer, which hides source IP, the same number is set for all connections and traffic is sent to the same pod.
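
For example, you can control the cookie name that is generated for a route by using the router.openshift.io/cookie_name annotation that is listed in Table 27.2. The route name my-route and the cookie name my_cookie below are illustrative only:

$ oc annotate route my-route router.openshift.io/cookie_name="my_cookie"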

27.1.7. Path-based routes

Path-based routes specify a path component that can be compared against a URL, which requires that the traffic for the route be HTTP based. Thus, multiple routes can be served using the same hostname, each with a different path. Routers should match routes based on the most specific path to the least.

The following table shows example routes and their accessibility:

Table 27.1. Route availability

  • Route: www.example.com/test
    When compared to www.example.com/test: Yes
    When compared to www.example.com: No

  • Routes: www.example.com/test and www.example.com
    When compared to www.example.com/test: Yes
    When compared to www.example.com: Yes

  • Route: www.example.com
    When compared to www.example.com/text: Yes (matched by the host, not the route)
    When compared to www.example.com: Yes

An unsecured route with a path

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: route-unsecured
spec:
  host: www.example.com
  path: "/test" 1
  to:
    kind: Service
    name: service-name

1
The path is the only added attribute for a path-based route.
Note

Path-based routing is not available when using passthrough TLS, as the router does not terminate TLS in that case and cannot read the contents of the request.

27.1.8. Route-specific annotations

The Ingress Controller can set the default options for all the routes it exposes. An individual route can override some of these defaults by providing specific configurations in its annotations. Red Hat does not support adding a route annotation to an operator-managed route.

Important

To create a whitelist with multiple source IPs or subnets, use a space-delimited list. Any other delimiter type causes the list to be ignored without a warning or error message.

Table 27.2. Route annotations

  • haproxy.router.openshift.io/balance
    Description: Sets the load-balancing algorithm. Available options are random, source, roundrobin, and leastconn. The default value is source for TLS passthrough routes. For all other routes, the default is random.
    Environment variable used as default: ROUTER_TCP_BALANCE_SCHEME for passthrough routes. Otherwise, use ROUTER_LOAD_BALANCE_ALGORITHM.

  • haproxy.router.openshift.io/disable_cookies
    Description: Disables the use of cookies to track related connections. If set to 'true' or 'TRUE', the balance algorithm is used to choose which back-end serves connections for each incoming HTTP request.

  • router.openshift.io/cookie_name
    Description: Specifies an optional cookie to use for this route. The name must consist of any combination of upper and lower case letters, digits, "_", and "-". The default is the hashed internal key name for the route.

  • haproxy.router.openshift.io/pod-concurrent-connections
    Description: Sets the maximum number of connections that are allowed to a backing pod from a router.
    Note: If there are multiple pods, each can have this many connections. If you have multiple routers, there is no coordination among them, each may connect this many times. If not set, or set to 0, there is no limit.

  • haproxy.router.openshift.io/rate-limit-connections
    Description: Setting 'true' or 'TRUE' enables rate limiting functionality which is implemented through stick-tables on the specific backend per route.
    Note: Using this annotation provides basic protection against denial-of-service attacks.

  • haproxy.router.openshift.io/rate-limit-connections.concurrent-tcp
    Description: Limits the number of concurrent TCP connections made through the same source IP address. It accepts a numeric value.
    Note: Using this annotation provides basic protection against denial-of-service attacks.

  • haproxy.router.openshift.io/rate-limit-connections.rate-http
    Description: Limits the rate at which a client with the same source IP address can make HTTP requests. It accepts a numeric value.
    Note: Using this annotation provides basic protection against denial-of-service attacks.

  • haproxy.router.openshift.io/rate-limit-connections.rate-tcp
    Description: Limits the rate at which a client with the same source IP address can make TCP connections. It accepts a numeric value.
    Note: Using this annotation provides basic protection against denial-of-service attacks.

  • haproxy.router.openshift.io/timeout
    Description: Sets a server-side timeout for the route. (TimeUnits)
    Environment variable used as default: ROUTER_DEFAULT_SERVER_TIMEOUT

  • haproxy.router.openshift.io/timeout-tunnel
    Description: This timeout applies to a tunnel connection, for example, WebSocket over cleartext, edge, reencrypt, or passthrough routes. With cleartext, edge, or reencrypt route types, this annotation is applied as a timeout tunnel with the existing timeout value. For the passthrough route types, the annotation takes precedence over any existing timeout value set.
    Environment variable used as default: ROUTER_DEFAULT_TUNNEL_TIMEOUT

  • ingresses.config/cluster ingress.operator.openshift.io/hard-stop-after
    Description: You can set either an IngressController or the ingress config. This annotation redeploys the router and configures HAProxy to emit the haproxy hard-stop-after global option, which defines the maximum time allowed to perform a clean soft-stop.
    Environment variable used as default: ROUTER_HARD_STOP_AFTER

  • router.openshift.io/haproxy.health.check.interval
    Description: Sets the interval for the back-end health checks. (TimeUnits)
    Environment variable used as default: ROUTER_BACKEND_CHECK_INTERVAL

  • haproxy.router.openshift.io/ip_whitelist
    Description: Sets an allowlist for the route. The allowlist is a space-separated list of IP addresses and CIDR ranges for the approved source addresses. Requests from IP addresses that are not in the allowlist are dropped. The maximum number of IP addresses and CIDR ranges directly visible in the haproxy.config file is 61. [1]

  • haproxy.router.openshift.io/hsts_header
    Description: Sets a Strict-Transport-Security header for the edge terminated or re-encrypt route.

  • haproxy.router.openshift.io/rewrite-target
    Description: Sets the rewrite path of the request on the backend.

  • router.openshift.io/cookie-same-site
    Description: Sets a value to restrict cookies. The values are:
    Lax: the browser does not send cookies on cross-site requests, but does send cookies when users navigate to the origin site from an external site. This is the default browser behavior when the SameSite value is not specified.
    Strict: the browser sends cookies only for same-site requests.
    None: the browser sends cookies for both cross-site and same-site requests.
    This value is applicable to re-encrypt and edge routes only. For more information, see the SameSite cookies documentation.

  • haproxy.router.openshift.io/set-forwarded-headers
    Description: Sets the policy for handling the Forwarded and X-Forwarded-For HTTP headers per route. The values are:
    append: appends the header, preserving any existing header. This is the default value.
    replace: sets the header, removing any existing header.
    never: never sets the header, but preserves any existing header.
    if-none: sets the header if it is not already set.
    Environment variable used as default: ROUTER_SET_FORWARDED_HEADERS

  1. If the number of IP addresses and CIDR ranges in an allowlist exceeds 61, they are written into a separate file that is then referenced from haproxy.config. This file is stored in the /var/lib/haproxy/router/whitelists folder.

    Note

    To ensure that the addresses are written to the allowlist, check that the full list of CIDR ranges is listed in the Ingress Controller configuration file. The etcd object size limit restricts how large a route annotation can be. Because of this limit, there is a threshold for the maximum number of IP addresses and CIDR ranges that you can include in an allowlist.

Note

Environment variables cannot be edited.

Router timeout variables

TimeUnits are represented by a number followed by the unit: us (microseconds), ms (milliseconds, default), s (seconds), m (minutes), h (hours), d (days).

The regular expression is: [1-9][0-9]*(us|ms|s|m|h|d).

  • ROUTER_BACKEND_CHECK_INTERVAL (default: 5000ms)
    Length of time between subsequent liveness checks on back ends.

  • ROUTER_CLIENT_FIN_TIMEOUT (default: 1s)
    Controls the TCP FIN timeout period for the client connecting to the route. If the FIN sent to close the connection does not answer within the given time, HAProxy closes the connection. This is harmless if set to a low value and uses fewer resources on the router.

  • ROUTER_DEFAULT_CLIENT_TIMEOUT (default: 30s)
    Length of time that a client has to acknowledge or send data.

  • ROUTER_DEFAULT_CONNECT_TIMEOUT (default: 5s)
    The maximum connection time.

  • ROUTER_DEFAULT_SERVER_FIN_TIMEOUT (default: 1s)
    Controls the TCP FIN timeout from the router to the pod backing the route.

  • ROUTER_DEFAULT_SERVER_TIMEOUT (default: 30s)
    Length of time that a server has to acknowledge or send data.

  • ROUTER_DEFAULT_TUNNEL_TIMEOUT (default: 1h)
    Length of time for TCP or WebSocket connections to remain open. This timeout period resets whenever HAProxy reloads.

  • ROUTER_SLOWLORIS_HTTP_KEEPALIVE (default: 300s)
    Set the maximum time to wait for a new HTTP request to appear. If this is set too low, it can cause problems with browsers and applications not expecting a small keepalive value.

    Some effective timeout values can be the sum of certain variables, rather than the specific expected timeout. For example, ROUTER_SLOWLORIS_HTTP_KEEPALIVE adjusts timeout http-keep-alive. It is set to 300s by default, but HAProxy also waits on tcp-request inspect-delay, which is set to 5s. In this case, the overall timeout would be 300s plus 5s.

  • ROUTER_SLOWLORIS_TIMEOUT (default: 10s)
    Length of time the transmission of an HTTP request can take.

  • RELOAD_INTERVAL (default: 5s)
    Allows the minimum frequency for the router to reload and accept new changes.

  • ROUTER_METRICS_HAPROXY_TIMEOUT (default: 5s)
    Timeout for the gathering of HAProxy metrics.

A route setting custom timeout

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/timeout: 5500ms 1
...

1
Specifies the new timeout with HAProxy supported units (us, ms, s, m, h, d). If the unit is not provided, ms is the default.
Note

Setting a server-side timeout value for passthrough routes too low can cause WebSocket connections to timeout frequently on that route.

A route that allows only one specific IP address

metadata:
  annotations:
    haproxy.router.openshift.io/ip_whitelist: 192.168.1.10

A route that allows several IP addresses

metadata:
  annotations:
    haproxy.router.openshift.io/ip_whitelist: 192.168.1.10 192.168.1.11 192.168.1.12

A route that allows an IP address CIDR network

metadata:
  annotations:
    haproxy.router.openshift.io/ip_whitelist: 192.168.1.0/24

A route that allows both an IP address and IP address CIDR networks

metadata:
  annotations:
    haproxy.router.openshift.io/ip_whitelist: 180.5.61.153 192.168.1.0/24 10.0.0.0/8
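
A route that sets the load-balancing algorithm, using the haproxy.router.openshift.io/balance annotation described in Table 27.2; roundrobin is one of the documented options

metadata:
  annotations:
    haproxy.router.openshift.io/balance: roundrobin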

A route specifying a rewrite target

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  annotations:
    haproxy.router.openshift.io/rewrite-target: / 1
...

1
Sets / as rewrite path of the request on the backend.

Setting the haproxy.router.openshift.io/rewrite-target annotation on a route specifies that the Ingress Controller should rewrite paths in HTTP requests using this route before forwarding the requests to the backend application. The part of the request path that matches the path specified in spec.path is replaced with the rewrite target specified in the annotation.

The following table provides examples of the path rewriting behavior for various combinations of spec.path, request path, and rewrite target.

Table 27.3. rewrite-target examples

Route.spec.path   Request path   Rewrite target   Forwarded request path

/foo              /foo           /                /
/foo              /foo/          /                /
/foo              /foo/bar       /                /bar
/foo              /foo/bar/      /                /bar/
/foo              /foo           /bar             /bar
/foo              /foo/          /bar             /bar/
/foo              /foo/bar       /baz             /baz/bar
/foo              /foo/bar/      /baz             /baz/bar/
/foo/             /foo           /                N/A (request path does not match route path)
/foo/             /foo/          /                /
/foo/             /foo/bar       /                /bar
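
The following is a minimal sketch of a route that corresponds to the first row of Table 27.3, combining spec.path with the rewrite-target annotation; the route and service names are illustrative only:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: route-rewrite
  annotations:
    haproxy.router.openshift.io/rewrite-target: /
spec:
  host: www.example.com
  path: "/foo"
  to:
    kind: Service
    name: frontend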

27.1.9. Configuring the route admission policy

Administrators and application developers can run applications in multiple namespaces with the same domain name. This is for organizations where multiple teams develop microservices that are exposed on the same hostname.

Warning

Allowing claims across namespaces should only be enabled for clusters with trust between namespaces, otherwise a malicious user could take over a hostname. For this reason, the default admission policy disallows hostname claims across namespaces.

Prerequisites

  • Cluster administrator privileges.

Procedure

  • Edit the .spec.routeAdmission field of the ingresscontroller resource variable using the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default --patch '{"spec":{"routeAdmission":{"namespaceOwnership":"InterNamespaceAllowed"}}}' --type=merge

    Sample Ingress Controller configuration

    spec:
      routeAdmission:
        namespaceOwnership: InterNamespaceAllowed
    ...

    Tip

    You can alternatively apply the following YAML to configure the route admission policy:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      routeAdmission:
        namespaceOwnership: InterNamespaceAllowed

27.1.10. Creating a route through an Ingress object

Some ecosystem components have an integration with Ingress resources but not with route resources. To cover this case, OpenShift Container Platform automatically creates managed route objects when an Ingress object is created. These route objects are deleted when the corresponding Ingress objects are deleted.

Procedure

  1. Define an Ingress object in the OpenShift Container Platform console or by entering the oc create command:

    YAML Definition of an Ingress

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: frontend
      annotations:
        route.openshift.io/termination: "reencrypt" 1
        route.openshift.io/destination-ca-certificate-secret: secret-ca-cert 2
    spec:
      rules:
      - host: www.example.com 3
        http:
          paths:
          - backend:
              service:
                name: frontend
                port:
                  number: 443
            path: /
            pathType: Prefix
      tls:
      - hosts:
        - www.example.com
        secretName: example-com-tls-certificate

    1
    The route.openshift.io/termination annotation can be used to configure the spec.tls.termination field of the Route as Ingress has no field for this. The accepted values are edge, passthrough and reencrypt. All other values are silently ignored. When the annotation value is unset, edge is the default route. The TLS certificate details must be defined in the template file to implement the default edge route.
    3
    When working with an Ingress object, you must specify an explicit hostname, unlike when working with routes. You can use the <host_name>.<cluster_ingress_domain> syntax, for example apps.openshiftdemos.com, to take advantage of the *.<cluster_ingress_domain> wildcard DNS record and serving certificate for the cluster. Otherwise, you must ensure that there is a DNS record for the chosen hostname.
    1. If you specify the passthrough value in the route.openshift.io/termination annotation, set path to '' and pathType to ImplementationSpecific in the spec:

        spec:
          rules:
          - host: www.example.com
            http:
              paths:
              - path: ''
                pathType: ImplementationSpecific
                backend:
                  service:
                    name: frontend
                    port:
                      number: 443
      $ oc apply -f ingress.yaml
    2
    The route.openshift.io/destination-ca-certificate-secret annotation can be used on an Ingress object to define a route with a custom destination CA certificate. The annotation references a kubernetes secret, secret-ca-cert, that will be inserted into the generated route.
    1. To specify a route object with a destination CA from an ingress object, you must create a kubernetes.io/tls or Opaque type secret with a certificate in PEM-encoded format in the data.tls.crt specifier of the secret.
  2. List your routes:

    $ oc get routes

    The result includes an autogenerated route whose name starts with frontend-:

    NAME             HOST/PORT         PATH    SERVICES    PORT    TERMINATION          WILDCARD
    frontend-gnztq   www.example.com           frontend    443     reencrypt/Redirect   None

    If you inspect this route, it looks like this:

    YAML Definition of an autogenerated route

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: frontend-gnztq
      ownerReferences:
      - apiVersion: networking.k8s.io/v1
        controller: true
        kind: Ingress
        name: frontend
        uid: 4e6c59cc-704d-4f44-b390-617d879033b6
    spec:
      host: www.example.com
      path: /
      port:
        targetPort: https
      tls:
        certificate: |
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----
        insecureEdgeTerminationPolicy: Redirect
        key: |
          -----BEGIN RSA PRIVATE KEY-----
          [...]
          -----END RSA PRIVATE KEY-----
        termination: reencrypt
        destinationCACertificate: |
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----
      to:
        kind: Service
        name: frontend

27.1.11. Creating a route using the default certificate through an Ingress object

If you create an Ingress object without specifying any TLS configuration, OpenShift Container Platform generates an insecure route. To create an Ingress object that generates a secure, edge-terminated route using the default ingress certificate, you can specify an empty TLS configuration as follows.

Prerequisites

  • You have a service that you want to expose.
  • You have access to the OpenShift CLI (oc).

Procedure

  1. Create a YAML file for the Ingress object. In this example, the file is called example-ingress.yaml:

    YAML definition of an Ingress object

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: frontend
      ...
    spec:
      rules:
        ...
      tls:
      - {} 1

    1
    Use this exact syntax to specify TLS without specifying a custom certificate.
  2. Create the Ingress object by running the following command:

    $ oc create -f example-ingress.yaml

Verification

  • Verify that OpenShift Container Platform has created the expected route for the Ingress object by running the following command:

    $ oc get routes -o yaml

    Example output

    apiVersion: v1
    items:
    - apiVersion: route.openshift.io/v1
      kind: Route
      metadata:
        name: frontend-j9sdd 1
        ...
      spec:
      ...
        tls: 2
          insecureEdgeTerminationPolicy: Redirect
          termination: edge 3
      ...

    1
    The name of the route includes the name of the Ingress object followed by a random suffix.
    2
    In order to use the default certificate, the route should not specify spec.certificate.
    3
    The route should specify the edge termination policy.

27.1.12. Creating a route using the destination CA certificate in the Ingress annotation

The route.openshift.io/destination-ca-certificate-secret annotation can be used on an Ingress object to define a route with a custom destination CA certificate.

Prerequisites

  • You may have a certificate/key pair in PEM-encoded files, where the certificate is valid for the route host.
  • You may have a separate CA certificate in a PEM-encoded file that completes the certificate chain.
  • You must have a separate destination CA certificate in a PEM-encoded file.
  • You must have a service that you want to expose.

Procedure

  1. Add the route.openshift.io/destination-ca-certificate-secret to the Ingress annotations:

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: frontend
      annotations:
        route.openshift.io/termination: "reencrypt"
        route.openshift.io/destination-ca-certificate-secret: secret-ca-cert 1
    ...
    1
    The annotation references a kubernetes secret.
  2. The secret referenced in this annotation will be inserted into the generated route.

    Example output

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: frontend
      annotations:
        route.openshift.io/termination: reencrypt
        route.openshift.io/destination-ca-certificate-secret: secret-ca-cert
    spec:
    ...
      tls:
        insecureEdgeTerminationPolicy: Redirect
        termination: reencrypt
        destinationCACertificate: |
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----
    ...
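
A minimal sketch of creating the secret-ca-cert secret that the annotation references, assuming the destination CA certificate is in a local PEM-encoded file named ca.crt; the oc create secret generic command stores the file content under the data.tls.crt key of an Opaque secret:

$ oc create secret generic secret-ca-cert --from-file=tls.crt=ca.crt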

27.1.13. Configuring the OpenShift Container Platform Ingress Controller for dual-stack networking

If your OpenShift Container Platform cluster is configured for IPv4 and IPv6 dual-stack networking, your cluster is externally reachable by OpenShift Container Platform routes.

The Ingress Controller automatically serves services that have both IPv4 and IPv6 endpoints, but you can configure the Ingress Controller for single-stack or dual-stack services.

Prerequisites

  • You deployed an OpenShift Container Platform cluster on bare metal.
  • You installed the OpenShift CLI (oc).

Procedure

  1. To have the Ingress Controller serve traffic over IPv4/IPv6 to a workload, you can create a service YAML file or modify an existing service YAML file by setting the ipFamilies and ipFamilyPolicy fields. For example:

    Sample service YAML file

    apiVersion: v1
    kind: Service
    metadata:
      creationTimestamp: yyyy-mm-ddT00:00:00Z
      labels:
        name: <service_name>
        manager: kubectl-create
        operation: Update
        time: yyyy-mm-ddT00:00:00Z
      name: <service_name>
      namespace: <namespace_name>
      resourceVersion: "<resource_version_number>"
      selfLink: "/api/v1/namespaces/<namespace_name>/services/<service_name>"
      uid: <uid_number>
    spec:
      clusterIP: 172.30.0.0/16
      clusterIPs: 1
      - 172.30.0.0/16
      - <second_IP_address>
      ipFamilies: 2
      - IPv4
      - IPv6
      ipFamilyPolicy: RequireDualStack 3
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
      selector:
        name: <namespace_name>
      sessionAffinity: None
      type: ClusterIP
    status:
      loadBalancer: {}

    1
    In a dual-stack instance, there are two different clusterIPs provided.
    2
    For a single-stack instance, enter IPv4 or IPv6. For a dual-stack instance, enter both IPv4 and IPv6.
    3
    For a single-stack instance, enter SingleStack. For a dual-stack instance, enter RequireDualStack.

    These resources generate corresponding endpoints. The Ingress Controller now watches endpointslices.

  2. To view endpoints, enter the following command:

    $ oc get endpoints
  3. To view endpointslices, enter the following command:

    $ oc get endpointslices

27.2. Secured routes

Secure routes provide the ability to use several types of TLS termination to serve certificates to the client. The following sections describe how to create re-encrypt, edge, and passthrough routes with custom certificates.

Important

If you create routes in Microsoft Azure through public endpoints, the resource names are subject to restriction. You cannot create resources that use certain terms. For a list of terms that Azure restricts, see Resolve reserved resource name errors in the Azure documentation.

27.2.1. Creating a re-encrypt route with a custom certificate

You can configure a secure route using reencrypt TLS termination with a custom certificate by using the oc create route command.

Prerequisites

  • You must have a certificate/key pair in PEM-encoded files, where the certificate is valid for the route host.
  • You may have a separate CA certificate in a PEM-encoded file that completes the certificate chain.
  • You must have a separate destination CA certificate in a PEM-encoded file.
  • You must have a service that you want to expose.
Note

Password protected key files are not supported. To remove a passphrase from a key file, use the following command:

$ openssl rsa -in password_protected_tls.key -out tls.key

Procedure

This procedure creates a Route resource with a custom certificate and reencrypt TLS termination. The following assumes that the certificate/key pair is in the tls.crt and tls.key files in the current working directory. You must also specify a destination CA certificate to enable the Ingress Controller to trust the service’s certificate. You may also specify a CA certificate if needed to complete the certificate chain. Substitute the actual path names for tls.crt, tls.key, destca.crt, and (optionally) ca.crt. Substitute the name of the Service resource that you want to expose for frontend. Substitute the appropriate hostname for www.example.com.

  • Create a secure Route resource using reencrypt TLS termination and a custom certificate:

    $ oc create route reencrypt --service=frontend --cert=tls.crt --key=tls.key --dest-ca-cert=destca.crt --ca-cert=ca.crt --hostname=www.example.com

    If you examine the resulting Route resource, it should look similar to the following:

    YAML Definition of the Secure Route

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: frontend
    spec:
      host: www.example.com
      to:
        kind: Service
        name: frontend
      tls:
        termination: reencrypt
        key: |-
          -----BEGIN PRIVATE KEY-----
          [...]
          -----END PRIVATE KEY-----
        certificate: |-
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----
        caCertificate: |-
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----
        destinationCACertificate: |-
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----

    See oc create route reencrypt --help for more options.

27.2.2. Creating an edge route with a custom certificate

You can configure a secure route using edge TLS termination with a custom certificate by using the oc create route command. With an edge route, the Ingress Controller terminates TLS encryption before forwarding traffic to the destination pod. The route specifies the TLS certificate and key that the Ingress Controller uses for the route.

Prerequisites

  • You must have a certificate/key pair in PEM-encoded files, where the certificate is valid for the route host.
  • You may have a separate CA certificate in a PEM-encoded file that completes the certificate chain.
  • You must have a service that you want to expose.
Note

Password protected key files are not supported. To remove a passphrase from a key file, use the following command:

$ openssl rsa -in password_protected_tls.key -out tls.key

Procedure

This procedure creates a Route resource with a custom certificate and edge TLS termination. The following assumes that the certificate/key pair are in the tls.crt and tls.key files in the current working directory. You may also specify a CA certificate if needed to complete the certificate chain. Substitute the actual path names for tls.crt, tls.key, and (optionally) ca.crt. Substitute the name of the service that you want to expose for frontend. Substitute the appropriate hostname for www.example.com.

  • Create a secure Route resource using edge TLS termination and a custom certificate.

    $ oc create route edge --service=frontend --cert=tls.crt --key=tls.key --ca-cert=ca.crt --hostname=www.example.com

    If you examine the resulting Route resource, it should look similar to the following:

    YAML Definition of the Secure Route

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: frontend
    spec:
      host: www.example.com
      to:
        kind: Service
        name: frontend
      tls:
        termination: edge
        key: |-
          -----BEGIN PRIVATE KEY-----
          [...]
          -----END PRIVATE KEY-----
        certificate: |-
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----
        caCertificate: |-
          -----BEGIN CERTIFICATE-----
          [...]
          -----END CERTIFICATE-----

    See oc create route edge --help for more options.

27.2.3. Creating a passthrough route

You can configure a secure route using passthrough termination by using the oc create route command. With passthrough termination, encrypted traffic is sent straight to the destination without the router providing TLS termination. Therefore no key or certificate is required on the route.

Prerequisites

  • You must have a service that you want to expose.

Procedure

  • Create a Route resource:

    $ oc create route passthrough route-passthrough-secured --service=frontend --port=8080

    If you examine the resulting Route resource, it should look similar to the following:

    A Secured Route Using Passthrough Termination

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      name: route-passthrough-secured 1
    spec:
      host: www.example.com
      port:
        targetPort: 8080
      tls:
        termination: passthrough 2
        insecureEdgeTerminationPolicy: None 3
      to:
        kind: Service
        name: frontend

    1
    The name of the object, which is limited to 63 characters.
    2
    The termination field is set to passthrough. This is the only required tls field.
    3
    Optional insecureEdgeTerminationPolicy. The only valid values are None, Redirect, or empty for disabled.

    The destination pod is responsible for serving certificates for the traffic at the endpoint. This is currently the only method that can support requiring client certificates, also known as two-way authentication.

Chapter 28. Configuring ingress cluster traffic

28.1. Configuring ingress cluster traffic overview

OpenShift Container Platform provides the following methods for communicating from outside the cluster with services running in the cluster.

The methods are recommended, in order of preference:

  • If you have HTTP/HTTPS, use an Ingress Controller.
  • If you have a TLS-encrypted protocol other than HTTPS, for example, TLS with the SNI header, use an Ingress Controller.
  • Otherwise, use a Load Balancer, an External IP, or a NodePort.

The following list summarizes each method and its purpose.

Use an Ingress Controller

Allows access to HTTP/HTTPS traffic and TLS-encrypted protocols other than HTTPS (for example, TLS with the SNI header).

Automatically assign an external IP using a load balancer service

Allows traffic to non-standard ports through an IP address assigned from a pool. Most cloud platforms offer a method to start a service with a load-balancer IP address.

About MetalLB and the MetalLB Operator

Allows traffic to a specific IP address or an address from a pool on the machine network. For bare-metal installations or platforms that are like bare metal, MetalLB provides a way to start a service with a load-balancer IP address.

Manually assign an external IP to a service

Allows traffic to non-standard ports through a specific IP address.

Configure a NodePort

Expose a service on all nodes in the cluster.

28.1.1. Comparison: Fault tolerant access to external IP addresses

For the communication methods that provide access to an external IP address, fault tolerant access to the IP address is another consideration. The following features provide fault tolerant access to an external IP address.

IP failover
IP failover manages a pool of virtual IP addresses for a set of nodes. It is implemented with Keepalived and Virtual Router Redundancy Protocol (VRRP). IP failover is a layer 2 mechanism only and relies on multicast. Multicast can have disadvantages for some networks.
MetalLB
MetalLB has a layer 2 mode, but it does not use multicast. Layer 2 mode has a disadvantage that it transfers all traffic for an external IP address through one node.
Manually assigning external IP addresses
You can configure your cluster with an IP address block that is used to assign external IP addresses to services. By default, this feature is disabled. This feature is flexible, but places the largest burden on the cluster or network administrator. The cluster is prepared to receive traffic that is destined for the external IP, but each customer has to decide how they want to route traffic to nodes.

28.2. Configuring ExternalIPs for services

As a cluster administrator, you can designate an IP address block that is external to the cluster that can send traffic to services in the cluster.

This functionality is generally most useful for clusters installed on bare-metal hardware.

28.2.1. Prerequisites

  • Your network infrastructure must route traffic for the external IP addresses to your cluster.

28.2.2. About ExternalIP

For non-cloud environments, OpenShift Container Platform supports the assignment of external IP addresses to a Service object spec.externalIPs[] field through the ExternalIP facility. By setting this field, OpenShift Container Platform assigns an additional virtual IP address to the service. The IP address can be outside the service network defined for the cluster. A service configured with an ExternalIP functions similarly to a service with type=NodePort, allowing you to direct traffic to a local node for load balancing.

You must configure your networking infrastructure to ensure that the external IP address blocks that you define are routed to the cluster. OpenShift Container Platform does not configure the external IP address on the network interfaces of nodes. To handle the traffic, you must configure routing and access to the external IP by using a method such as static Address Resolution Protocol (ARP) entries.
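
As an illustration only, on a Linux router or gateway that you manage, routing and neighbor entries for the example external IP address 192.168.132.253 might look like the following commands. The next-hop node IP address, MAC address, and interface name are assumptions that depend on your environment:

$ ip route add 192.168.132.253/32 via 10.0.128.4
$ ip neigh add 192.168.132.253 lladdr 52:54:00:aa:bb:cc dev eth0 nud permanent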

OpenShift Container Platform extends the ExternalIP functionality in Kubernetes by adding the following capabilities:

  • Restrictions on the use of external IP addresses by users through a configurable policy
  • Allocation of an external IP address automatically to a service upon request
Warning

Disabled by default, use of ExternalIP functionality can be a security risk, because in-cluster traffic to an external IP address is directed to that service. This could allow cluster users to intercept sensitive traffic destined for external resources.

Important

This feature is supported only in non-cloud deployments. For cloud deployments, use the load balancer services for automatic deployment of a cloud load balancer to target the endpoints of a service.

You can use either a MetalLB implementation or an IP failover deployment to attach an ExternalIP resource to a service in the following ways:

Automatic assignment of an external IP
OpenShift Container Platform automatically assigns an IP address from the autoAssignCIDRs CIDR block to the spec.externalIPs[] array when you create a Service object with spec.type=LoadBalancer set. In this case, OpenShift Container Platform implements a non-cloud version of the load balancer service type and assigns IP addresses to the services. Automatic assignment is disabled by default and must be configured by a cluster administrator as described in the following section.
Manual assignment of an external IP
OpenShift Container Platform uses the IP addresses assigned to the spec.externalIPs[] array when you create a Service object. You cannot specify an IP address that is already in use by another service.
28.2.2.1. Configuration for ExternalIP

Use of an external IP address in OpenShift Container Platform is governed by the following fields in the Network.config.openshift.io CR named cluster:

  • spec.externalIP.autoAssignCIDRs defines an IP address block used by the load balancer when choosing an external IP address for the service. OpenShift Container Platform supports only a single IP address block for automatic assignment. This can be simpler than having to manage the port space of a limited number of shared IP addresses when manually assigning ExternalIPs to services. If automatic assignment is enabled, a Service object with spec.type=LoadBalancer is allocated an external IP address.
  • spec.externalIP.policy defines the permissible IP address blocks when manually specifying an IP address. OpenShift Container Platform does not apply policy rules to IP address blocks defined by spec.externalIP.autoAssignCIDRs.

If routed correctly, external traffic from the configured external IP address block can reach service endpoints through any TCP or UDP port that the service exposes.

Important

As a cluster administrator, you must configure routing to externalIPs on both OpenShiftSDN and OVN-Kubernetes network types. You must also ensure that the IP address block you assign terminates at one or more nodes in your cluster. For more information, see Kubernetes External IPs.

OpenShift Container Platform supports both the automatic and manual assignment of IP addresses, and each address is guaranteed to be assigned to a maximum of one service. This ensures that each service can expose its chosen ports regardless of the ports exposed by other services.

Note

To use IP address blocks defined by autoAssignCIDRs in OpenShift Container Platform, you must configure the necessary IP address assignment and routing for your host network.

The following YAML describes a service with an external IP address configured:

Example Service object with spec.externalIPs[] set

apiVersion: v1
kind: Service
metadata:
  name: http-service
spec:
  clusterIP: 172.30.163.110
  externalIPs:
  - 192.168.132.253
  externalTrafficPolicy: Cluster
  ports:
  - name: highport
    nodePort: 31903
    port: 30102
    protocol: TCP
    targetPort: 30102
  selector:
    app: web
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 192.168.132.253

28.2.2.2. Restrictions on the assignment of an external IP address

As a cluster administrator, you can specify IP address blocks to allow and to reject.

Restrictions apply only to users without cluster-admin privileges. A cluster administrator can always set the service spec.externalIPs[] field to any IP address.

You configure IP address policy with a policy object defined by the spec.externalIP.policy field. The policy object has the following shape:

{
  "policy": {
    "allowedCIDRs": [],
    "rejectedCIDRs": []
  }
}

When configuring policy restrictions, the following rules apply:

  • If policy={} is set, then creating a Service object with spec.externalIPs[] set will fail. This is the default for OpenShift Container Platform. The behavior when policy=null is set is identical.
  • If policy is set and either policy.allowedCIDRs[] or policy.rejectedCIDRs[] is set, the following rules apply:

    • If allowedCIDRs[] and rejectedCIDRs[] are both set, then rejectedCIDRs[] has precedence over allowedCIDRs[].
    • If allowedCIDRs[] is set, creating a Service object with spec.externalIPs[] will succeed only if the specified IP addresses are allowed.
    • If rejectedCIDRs[] is set, creating a Service object with spec.externalIPs[] will succeed only if the specified IP addresses are not rejected.
28.2.2.3. Example policy objects

The examples that follow demonstrate several different policy configurations.

  • In the following example, the policy prevents OpenShift Container Platform from creating any service with an external IP address specified:

    Example policy to reject any value specified for Service object spec.externalIPs[]

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      externalIP:
        policy: {}
      ...

  • In the following example, both the allowedCIDRs and rejectedCIDRs fields are set.

    Example policy that includes both allowed and rejected CIDR blocks

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      externalIP:
        policy:
          allowedCIDRs:
          - 172.16.66.10/23
          rejectedCIDRs:
          - 172.16.66.10/24
      ...

  • In the following example, policy is set to null. If set to null, when inspecting the configuration object by entering oc get networks.config.openshift.io -o yaml, the policy field will not appear in the output.

    Example configuration with the policy field set to null

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      externalIP:
        policy: null
      ...

28.2.3. ExternalIP address block configuration

The configuration for ExternalIP address blocks is defined by a Network custom resource (CR) named cluster. The Network CR is part of the config.openshift.io API group.

Important

During cluster installation, the Cluster Version Operator (CVO) automatically creates a Network CR named cluster. Creating any other CR objects of this type is not supported.

The following YAML describes the ExternalIP configuration:

Network.config.openshift.io CR named cluster

apiVersion: config.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  externalIP:
    autoAssignCIDRs: [] 1
    policy: 2
      ...

1
Defines the IP address block in CIDR format that is available for automatic assignment of external IP addresses to a service. Only a single IP address range is allowed.
2
Defines restrictions on manual assignment of an IP address to a service. If no restrictions are defined, specifying the spec.externalIPs[] field in a Service object is not allowed. By default, no restrictions are defined.

The following YAML describes the fields for the policy stanza:

Network.config.openshift.io policy stanza

policy:
  allowedCIDRs: [] 1
  rejectedCIDRs: [] 2

1
A list of allowed IP address ranges in CIDR format.
2
A list of rejected IP address ranges in CIDR format.
Example external IP configurations

Several possible configurations for external IP address pools are displayed in the following examples:

  • The following YAML describes a configuration that enables automatically assigned external IP addresses:

    Example configuration with spec.externalIP.autoAssignCIDRs set

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      ...
      externalIP:
        autoAssignCIDRs:
        - 192.168.132.254/29

  • The following YAML configures policy rules for the allowed and rejected CIDR ranges:

    Example configuration with spec.externalIP.policy set

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      ...
      externalIP:
        policy:
          allowedCIDRs:
          - 192.168.132.0/29
          - 192.168.132.8/29
          rejectedCIDRs:
          - 192.168.132.7/32

28.2.4. Configure external IP address blocks for your cluster

As a cluster administrator, you can configure the following ExternalIP settings:

  • An ExternalIP address block used by OpenShift Container Platform to automatically populate the spec.externalIPs[] array for a Service object.
  • A policy object to restrict what IP addresses may be manually assigned to the spec.externalIPs[] array of a Service object.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Access to the cluster as a user with the cluster-admin role.

Procedure

  1. Optional: To display the current external IP configuration, enter the following command:

    $ oc describe networks.config cluster
  2. To edit the configuration, enter the following command:

    $ oc edit networks.config cluster
  3. Modify the ExternalIP configuration, as in the following example:

    apiVersion: config.openshift.io/v1
    kind: Network
    metadata:
      name: cluster
    spec:
      ...
      externalIP: 1
      ...
    1
    Specify the configuration for the externalIP stanza.
  4. To confirm the updated ExternalIP configuration, enter the following command:

    $ oc get networks.config cluster -o go-template='{{.spec.externalIP}}{{"\n"}}'
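
If you prefer to apply the change without an interactive editor, a merge patch against the same Network object can set the externalIP stanza; the CIDR value in this sketch is illustrative:

$ oc patch networks.config cluster --type=merge --patch '{"spec":{"externalIP":{"autoAssignCIDRs":["192.168.132.254/29"]}}}'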

28.2.5. Next steps

28.3. Configuring ingress cluster traffic using an Ingress Controller

OpenShift Container Platform provides methods for communicating from outside the cluster with services running in the cluster. This method uses an Ingress Controller.

28.3.1. Using Ingress Controllers and routes

The Ingress Operator manages Ingress Controllers and wildcard DNS.

Using an Ingress Controller is the most common way to allow external access to an OpenShift Container Platform cluster.

An Ingress Controller is configured to accept external requests and proxy them based on the configured routes. This is limited to HTTP, HTTPS using SNI, and TLS using SNI, which is sufficient for web applications and services that work over TLS with SNI.

Work with your administrator to configure an Ingress Controller to accept external requests and proxy them based on the configured routes.

The administrator can create a wildcard DNS entry and then set up an Ingress Controller. Then, you can work with the edge Ingress Controller without having to contact the administrators.

By default, every Ingress Controller in the cluster can admit any route created in any project in the cluster.

The Ingress Controller:

  • Has two replicas by default, which means it should be running on two worker nodes.
  • Can be scaled up to have more replicas on more nodes.
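
For example, one way to scale the default Ingress Controller to three replicas is a patch similar to the following:

$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge --patch '{"spec":{"replicas":3}}'
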
Note

The procedures in this section require prerequisites performed by the cluster administrator.

28.3.2. Prerequisites

Before starting the following procedures, the administrator must:

  • Set up the external port to the cluster networking environment so that requests can reach the cluster.
  • Make sure there is at least one user with cluster admin role. To add this role to a user, run the following command:

    $ oc adm policy add-cluster-role-to-user cluster-admin username
  • Have an OpenShift Container Platform cluster with at least one master and at least one node and a system outside the cluster that has network access to the cluster. This procedure assumes that the external system is on the same subnet as the cluster. The additional networking required for external systems on a different subnet is out-of-scope for this topic.

28.3.3. Creating a project and service

If the project and service that you want to expose do not exist, create the project and then create the service.

If the project and service already exist, skip to the procedure on exposing the service to create a route.

Prerequisites

  • Install the OpenShift CLI (oc) and log in as a cluster administrator.

Procedure

  1. Create a new project for your service by running the oc new-project command:

    $ oc new-project <project_name>
  2. Use the oc new-app command to create your service:

    $ oc new-app nodejs:12~https://github.com/sclorg/nodejs-ex.git
  3. To verify that the service was created, run the following command:

    $ oc get svc -n <project_name>

    Example output

    NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    nodejs-ex   ClusterIP   172.30.197.157   <none>        8080/TCP   70s

    Note

    By default, the new service does not have an external IP address.

28.3.4. Exposing the service by creating a route

You can expose the service as a route by using the oc expose command.

Prerequisites

  • You logged into OpenShift Container Platform.

Procedure

  1. Log in to the project where the service you want to expose is located:

    $ oc project <project_name>
  2. Run the oc expose service command to expose the route:

    $ oc expose service nodejs-ex

    Example output

    route.route.openshift.io/nodejs-ex exposed

  3. To verify that the service is exposed, you can use a tool, such as curl, to check that the service is accessible from outside the cluster.

    1. To find the hostname of the route, enter the following command:

      $ oc get route

      Example output

      NAME        HOST/PORT                        PATH   SERVICES    PORT       TERMINATION   WILDCARD
      nodejs-ex   nodejs-ex-myproject.example.com         nodejs-ex   8080-tcp                 None

    2. To check that the host responds to a GET request, enter the following command:

      Example curl command

      $ curl --head nodejs-ex-myproject.example.com

      Example output

      HTTP/1.1 200 OK
      ...

28.3.5. Ingress sharding in OpenShift Container Platform

In OpenShift Container Platform, an Ingress Controller can serve all routes, or it can serve a subset of routes. By default, the Ingress Controller serves any route created in any namespace in the cluster. You can add additional Ingress Controllers to your cluster to optimize routing by creating shards, which are subsets of routes based on selected characteristics. To mark a route as a member of a shard, use labels in the route or namespace metadata field. The Ingress Controller uses selectors, also known as selection expressions, to select a subset of routes from the entire pool of routes to serve.

Ingress sharding is useful in cases where you want to load balance incoming traffic across multiple Ingress Controllers, when you want to isolate traffic to be routed to a specific Ingress Controller, or for a variety of other reasons described in the next section.

By default, each route uses the default domain of the cluster. However, routes can be configured to use the domain of the router instead.
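
For example, you can mark a namespace or an individual route for a shard by adding labels; the type=sharded key and value here are illustrative and must match the selectors that are configured on the target Ingress Controller:

$ oc label namespace <namespace_name> type=sharded
$ oc -n <namespace_name> label route <route_name> type=sharded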

28.3.6. Ingress Controller sharding

You can use Ingress sharding, also known as router sharding, to distribute a set of routes across multiple routers by adding labels to routes, namespaces, or both. The Ingress Controller uses a corresponding set of selectors to admit only the routes that have a specified label. Each Ingress shard comprises the routes that are filtered using a given selection expression.

As the primary mechanism for traffic to enter the cluster, the demands on the Ingress Controller can be significant. As a cluster administrator, you can shard the routes to:

  • Balance Ingress Controllers, or routers, with several routes to speed up responses to changes.
  • Allocate certain routes to have different reliability guarantees than other routes.
  • Allow certain Ingress Controllers to have different policies defined.
  • Allow only specific routes to use additional features.
  • Expose different routes on different addresses so that internal and external users can see different routes, for example.
  • Transfer traffic from one version of an application to another during a blue green deployment.

When Ingress Controllers are sharded, a given route is admitted to zero or more Ingress Controllers in the group. A route’s status describes whether an Ingress Controller has admitted it or not. An Ingress Controller will only admit a route if it is unique to its shard.

An Ingress Controller can use three sharding methods:

  • Adding only a namespace selector to the Ingress Controller, so that all routes in a namespace with labels that match the namespace selector are in the Ingress shard.
  • Adding only a route selector to the Ingress Controller, so that all routes with labels that match the route selector are in the Ingress shard.
  • Adding both a namespace selector and route selector to the Ingress Controller, so that routes with labels that match the route selector in a namespace with labels that match the namespace selector are in the Ingress shard.

With sharding, you can distribute subsets of routes over multiple Ingress Controllers. These subsets can be non-overlapping, also called traditional sharding, or overlapping, otherwise known as overlapped sharding.
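
As a sketch of the third method listed above, an Ingress Controller can combine a namespace selector and a route selector; the controller name, domain, and label values are illustrative:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: finance-external
  namespace: openshift-ingress-operator
spec:
  domain: <apps-finance.basedomain.example.net>
  namespaceSelector:
    matchLabels:
      name: finance
  routeSelector:
    matchLabels:
      type: external

With this configuration, only routes labeled type: external in namespaces labeled name: finance are admitted to the shard.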

28.3.6.1. Traditional sharding example

An example of a configured Ingress Controller finops-router that has the label selector spec.namespaceSelector.matchExpressions with key values set to finance and ops:

Example YAML definition for finops-router

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: finops-router
  namespace: openshift-ingress-operator
spec:
  namespaceSelector:
    matchExpressions:
    - key: name
      operator: In
      values:
      - finance
      - ops

An example of a configured Ingress Controller dev-router that has the label selector spec.namespaceSelector.matchLabels.name with the key value set to dev:

Example YAML definition for dev-router

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: dev-router
  namespace: openshift-ingress-operator
spec:
  namespaceSelector:
    matchLabels:
      name: dev

If all application routes are in separate namespaces, such as each labeled with name:finance, name:ops, and name:dev, the configuration effectively distributes your routes between the two Ingress Controllers. Routes that OpenShift Container Platform uses for the console, authentication, and other internal purposes should not be handled by these Ingress Controllers.

In the previous scenario, sharding becomes a special case of partitioning, with no overlapping subsets. Routes are divided between router shards.
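
For example, the namespace labels that these selectors match might be applied as follows; the namespace names are illustrative:

$ oc label namespace finance name=finance
$ oc label namespace ops name=ops
$ oc label namespace dev name=dev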

Warning

The default Ingress Controller continues to serve all routes unless you configure its namespaceSelector or routeSelector fields to exclude the routes that are meant for other shards. See this Red Hat Knowledgebase solution and the section "Sharding the default Ingress Controller" for more information on how to exclude routes from the default Ingress Controller.

28.3.6.2. Overlapped sharding example

An example of a configured Ingress Controller devops-router that has the label selector spec.namespaceSelector.matchExpressions with key values set to dev and ops:

Example YAML definition for devops-router

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: devops-router
  namespace: openshift-ingress-operator
spec:
  namespaceSelector:
    matchExpressions:
    - key: name
      operator: In
      values:
      - dev
      - ops

The routes in the namespaces labeled name:dev and name:ops are now serviced by two different Ingress Controllers. With this configuration, you have overlapping subsets of routes.

With overlapping subsets of routes you can create more complex routing rules. For example, you can divert higher priority traffic to the dedicated finops-router while sending lower priority traffic to devops-router.

28.3.6.3. Sharding the default Ingress Controller

After creating a new Ingress shard, there might be routes that are admitted to your new Ingress shard that are also admitted by the default Ingress Controller. This is because the default Ingress Controller has no selectors and admits all routes by default.

You can restrict an Ingress Controller from servicing routes with specific labels using either namespace selectors or route selectors. The following procedure restricts the default Ingress Controller from servicing your newly sharded finance, ops, and dev routes using a namespace selector. This adds further isolation to Ingress shards.

Important

You must keep all of OpenShift Container Platform’s administration routes on the same Ingress Controller. Therefore, avoid adding additional selectors to the default Ingress Controller that exclude these essential routes.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in as a project administrator.

Procedure

  1. Modify the default Ingress Controller by running the following command:

    $ oc edit ingresscontroller -n openshift-ingress-operator default
  2. Edit the Ingress Controller to contain a namespaceSelector that excludes the routes with any of the finance, ops, and dev labels:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: default
      namespace: openshift-ingress-operator
    spec:
      namespaceSelector:
        matchExpressions:
          - key: name
            operator: NotIn
            values:
              - finance
              - ops
              - dev

The default Ingress Controller will no longer serve the namespaces labeled name:finance, name:ops, and name:dev.
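
To check which Ingress Controllers admit a particular route after this change, you can inspect the route status; the route and namespace names are placeholders:

$ oc get route <route_name> -n <namespace_name> -o jsonpath='{.status.ingress[*].routerName}{"\n"}'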

28.3.6.4. Ingress sharding and DNS

The cluster administrator is responsible for making a separate DNS entry for each router in a project. A router will not forward unknown routes to another router.

Consider the following example:

  • Router A lives on host 192.168.0.5 and has routes with *.foo.com.
  • Router B lives on host 192.168.1.9 and has routes with *.example.com.

Separate DNS entries must resolve *.foo.com to the node hosting Router A and *.example.com to the node hosting Router B:

  • *.foo.com IN A 192.168.0.5
  • *.example.com IN A 192.168.1.9
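
After the records are in place, you can verify that hostnames under each wildcard resolve to the intended router host; the hostnames in this sketch are illustrative:

$ dig +short app1.foo.com
$ dig +short app2.example.com

Given the records above, the first command should return 192.168.0.5 and the second command should return 192.168.1.9.
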
28.3.6.5. Configuring Ingress Controller sharding by using route labels

Ingress Controller sharding by using route labels means that the Ingress Controller serves any route in any namespace that is selected by the route selector.

Figure 28.1. Ingress sharding using route labels

A diagram showing multiple Ingress Controllers with different route selectors serving any route containing a label that matches a given route selector regardless of the namespace a route belongs to

Ingress Controller sharding is useful when balancing incoming traffic load among a set of Ingress Controllers and when isolating traffic to a specific Ingress Controller. For example, company A goes to one Ingress Controller and company B to another.

Procedure

  1. Edit the router-internal.yaml file:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: sharded
      namespace: openshift-ingress-operator
    spec:
      domain: <apps-sharded.basedomain.example.net> 1
      nodePlacement:
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker: ""
      routeSelector:
        matchLabels:
          type: sharded
    1
    Specify a domain to be used by the Ingress Controller. This domain must be different from the default Ingress Controller domain.
  2. Apply the Ingress Controller router-internal.yaml file:

    $ oc apply -f router-internal.yaml

    The Ingress Controller selects routes in any namespace that have the label type: sharded.

  3. Create a new route using the domain configured in the router-internal.yaml:

    $ oc expose svc <service-name> --hostname <route-name>.apps-sharded.basedomain.example.net
28.3.6.6. Configuring Ingress Controller sharding by using namespace labels

Ingress Controller sharding by using namespace labels means that the Ingress Controller serves any route in any namespace that is selected by the namespace selector.

Figure 28.2. Ingress sharding using namespace labels

A diagram showing multiple Ingress Controllers with different namespace selectors serving routes that belong to the namespace containing a label that matches a given namespace selector

Ingress Controller sharding is useful when balancing incoming traffic load among a set of Ingress Controllers and when isolating traffic to a specific Ingress Controller. For example, company A goes to one Ingress Controller and company B to another.

Procedure

  1. Edit the router-internal.yaml file:

    $ cat router-internal.yaml

    Example output

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: sharded
      namespace: openshift-ingress-operator
    spec:
      domain: <apps-sharded.basedomain.example.net> 1
      nodePlacement:
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker: ""
      namespaceSelector:
        matchLabels:
          type: sharded

    1
    Specify a domain to be used by the Ingress Controller. This domain must be different from the default Ingress Controller domain.
  2. Apply the Ingress Controller router-internal.yaml file:

    $ oc apply -f router-internal.yaml

    The Ingress Controller selects routes in any namespace that has the label type: sharded, as selected by the namespace selector.

  3. Create a new route using the domain configured in the router-internal.yaml:

    $ oc expose svc <service-name> --hostname <route-name>.apps-sharded.basedomain.example.net
28.3.6.7. Creating a route for Ingress Controller sharding

A route allows you to host your application at a URL. In this case, the hostname is not set and the route uses a subdomain instead. When you specify a subdomain, you automatically use the domain of the Ingress Controller that exposes the route. For situations where a route is exposed by multiple Ingress Controllers, the route is hosted at multiple URLs.

The following procedure describes how to create a route for Ingress Controller sharding, using the hello-openshift application as an example.

Ingress Controller sharding is useful when balancing incoming traffic load among a set of Ingress Controllers and when isolating traffic to a specific Ingress Controller. For example, company A goes to one Ingress Controller and company B to another.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in as a project administrator.
  • You have a web application that exposes a port and an HTTP or TLS endpoint listening for traffic on the port.
  • You have configured the Ingress Controller for sharding.

Procedure

  1. Create a project called hello-openshift by running the following command:

    $ oc new-project hello-openshift
  2. Create a pod in the project by running the following command:

    $ oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
  3. Create a service called hello-openshift by running the following command:

    $ oc expose pod/hello-openshift
  4. Create a route definition called hello-openshift-route.yaml:

    YAML definition of the created route for sharding:

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      labels:
        type: sharded 1
      name: hello-openshift-edge
      namespace: hello-openshift
    spec:
      subdomain: hello-openshift 2
      tls:
        termination: edge
      to:
        kind: Service
        name: hello-openshift

    1
    Both the label key and its corresponding label value must match the ones specified in the Ingress Controller. In this example, the Ingress Controller has the label key and value type: sharded.
    2
    The route will be exposed using the value of the subdomain field. When you specify the subdomain field, you must leave the hostname unset. If you specify both the host and subdomain fields, then the route will use the value of the host field, and ignore the subdomain field.
  5. Use hello-openshift-route.yaml to create a route to the hello-openshift application by running the following command:

    $ oc -n hello-openshift create -f hello-openshift-route.yaml

Verification

  • Get the status of the route with the following command:

    $ oc -n hello-openshift get routes/hello-openshift-edge -o yaml

    The resulting Route resource should look similar to the following:

    Example output

    apiVersion: route.openshift.io/v1
    kind: Route
    metadata:
      labels:
        type: sharded
      name: hello-openshift-edge
      namespace: hello-openshift
    spec:
      subdomain: hello-openshift
      tls:
        termination: edge
      to:
        kind: Service
        name: hello-openshift
    status:
      ingress:
      - host: hello-openshift.<apps-sharded.basedomain.example.net> 1
        routerCanonicalHostname: router-sharded.<apps-sharded.basedomain.example.net> 2
        routerName: sharded 3

    1
    The hostname the Ingress Controller, or router, uses to expose the route. The value of the host field is automatically determined by the Ingress Controller, and uses its domain. In this example, the domain of the Ingress Controller is <apps-sharded.basedomain.example.net>.
    2
    The hostname of the Ingress Controller.
    3
    The name of the Ingress Controller. In this example, the Ingress Controller has the name sharded.
Additional resources

28.4. Configuring the Ingress Controller endpoint publishing strategy

The endpointPublishingStrategy is used to publish the Ingress Controller endpoints to other networks, enable load balancer integrations, and provide access to other systems.

Important

On Red Hat OpenStack Platform (RHOSP), the LoadBalancerService endpoint publishing strategy is supported only if a cloud provider is configured to create health monitors. For RHOSP 16.2, this strategy is possible only if you use the Amphora Octavia provider.

For more information, see the "Setting RHOSP Cloud Controller Manager options" section of the RHOSP installation documentation.

28.4.1. Ingress Controller endpoint publishing strategy

NodePortService endpoint publishing strategy

The NodePortService endpoint publishing strategy publishes the Ingress Controller using a Kubernetes NodePort service.

In this configuration, the Ingress Controller deployment uses container networking. A NodePortService is created to publish the deployment. The specific node ports are dynamically allocated by OpenShift Container Platform; however, to support static port allocations, your changes to the node port field of the managed NodePortService are preserved.
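
A minimal IngressController resource that selects this strategy might look like the following sketch; the name and domain are illustrative:

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: nodeport-example
  namespace: openshift-ingress-operator
spec:
  domain: nodeport-example.example.com
  endpointPublishingStrategy:
    type: NodePortService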

Figure 28.3. Diagram of NodePortService

OpenShift Container Platform Ingress NodePort endpoint publishing strategy

The preceding graphic shows the following concepts pertaining to OpenShift Container Platform Ingress NodePort endpoint publishing strategy:

  • All the available nodes in the cluster have their own, externally accessible IP addresses. The service running in the cluster is bound to the unique NodePort for all the nodes.
  • When the client connects to a node that is down, for example, by connecting to the 10.0.128.4 IP address in the graphic, the node port directly connects the client to an available node that is running the service. In this scenario, no load balancing is required. As the image shows, the 10.0.128.4 address is down and another IP address must be used instead.
Note

The Ingress Operator ignores any updates to .spec.ports[].nodePort fields of the service.

By default, ports are allocated automatically and you can access the port allocations for integrations. However, sometimes static port allocations are necessary to integrate with existing infrastructure which may not be easily reconfigured in response to dynamic ports. To achieve integrations with static node ports, you can update the managed service resource directly.

For more information, see the Kubernetes Services documentation on NodePort.
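
As a sketch of such a direct update, a JSON patch can pin the HTTP node port to a fixed value; the service name router-nodeport-default, the assumption that the HTTP port is the first entry in spec.ports, and the port value 30080 all depend on your environment:

$ oc -n openshift-ingress patch service/router-nodeport-default --type=json --patch '[{"op":"replace","path":"/spec/ports/0/nodePort","value":30080}]'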

HostNetwork endpoint publishing strategy

The HostNetwork endpoint publishing strategy publishes the Ingress Controller on node ports where the Ingress Controller is deployed.

An Ingress Controller with the HostNetwork endpoint publishing strategy can have only one pod replica per node. If you want n replicas, you must use at least n nodes where those replicas can be scheduled. Because each pod replica requests ports 80 and 443 on the node host where it is scheduled, a replica cannot be scheduled to a node if another pod on the same node is using those ports.

The HostNetwork object has a hostNetwork field with the following default values for the optional binding ports: httpPort: 80, httpsPort: 443, and statsPort: 1936. By specifying different binding ports for your network, you can deploy multiple Ingress Controllers on the same node for the HostNetwork strategy.

Example

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: internal
  namespace: openshift-ingress-operator
spec:
  domain: example.com
  endpointPublishingStrategy:
    type: HostNetwork
    hostNetwork:
      httpPort: 80
      httpsPort: 443
      statsPort: 1936
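
A second Ingress Controller scheduled to the same nodes could then bind different, non-conflicting ports, for example (the name, domain, and port values are illustrative):

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: internal-alt
  namespace: openshift-ingress-operator
spec:
  domain: example-alt.com
  endpointPublishingStrategy:
    type: HostNetwork
    hostNetwork:
      httpPort: 8080
      httpsPort: 8443
      statsPort: 8936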

28.4.1.1. Configuring the Ingress Controller endpoint publishing scope to Internal

When a cluster administrator installs a new cluster without specifying that the cluster is private, the default Ingress Controller is created with a scope set to External. Cluster administrators can change an External scoped Ingress Controller to Internal.

Prerequisites

  • You installed the oc CLI.

Procedure

  • To change an External scoped Ingress Controller to Internal, enter the following command:

    $ oc -n openshift-ingress-operator patch ingresscontrollers/default --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"Internal"}}}}'
  • To check the status of the Ingress Controller, enter the following command:

    $ oc -n openshift-ingress-operator get ingresscontrollers/default -o yaml
    • The Progressing status condition indicates whether you must take further action. For example, the status condition can indicate that you need to delete the service by entering the following command:

      $ oc -n openshift-ingress delete services/router-default

      If you delete the service, the Ingress Operator recreates it as Internal.

28.4.1.2. Configuring the Ingress Controller endpoint publishing scope to External

When a cluster administrator installs a new cluster without specifying that the cluster is private, the default Ingress Controller is created with a scope set to External.

The Ingress Controller’s scope can be configured to be Internal during installation or after, and cluster administrators can change an Internal Ingress Controller to External.

Important

On some platforms, it is necessary to delete and recreate the service.

Changing the scope can cause disruption to Ingress traffic, potentially for several minutes. This applies to platforms where it is necessary to delete and recreate the service, because the procedure can cause OpenShift Container Platform to deprovision the existing service load balancer, provision a new one, and update DNS.

Prerequisites

  • You installed the oc CLI.

Procedure

  • To change an Internal scoped Ingress Controller to External, enter the following command:

    $ oc -n openshift-ingress-operator patch ingresscontrollers/private --type=merge --patch='{"spec":{"endpointPublishingStrategy":{"type":"LoadBalancerService","loadBalancer":{"scope":"External"}}}}'
  • To check the status of the Ingress Controller, enter the following command:

    $ oc -n openshift-ingress-operator get ingresscontrollers/default -o yaml
    • The Progressing status condition indicates whether you must take further action. For example, the status condition can indicate that you need to delete the service by entering the following command:

      $ oc -n openshift-ingress delete services/router-default

      If you delete the service, the Ingress Operator recreates it as External.

28.4.1.3. Adding a single NodePort service to an Ingress Controller

Instead of creating a NodePort-type Service for each project, you can create a custom Ingress Controller to use the NodePortService endpoint publishing strategy. To prevent port conflicts, consider this configuration for your Ingress Controller when you want to apply a set of routes, through Ingress sharding, to nodes that might already have a HostNetwork Ingress Controller.

Before you set a NodePort-type Service for each project, read the following considerations:

  • You must create a wildcard DNS record for the NodePort Ingress Controller domain. A NodePort Ingress Controller route can be reached from the address of a worker node. For more information about the required DNS records for routes, see "User-provisioned DNS requirements".
  • You must expose a route for your service and specify the --hostname argument for your custom Ingress Controller domain.
  • You must append the port that is assigned to the NodePort-type Service in the route so that you can access application pods.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You are logged in as a user with cluster-admin privileges.
  • You created a wildcard DNS record.

Procedure

  1. Create a custom resource (CR) file for the Ingress Controller:

    Example of a CR file that defines information for the IngressController object

    apiVersion: v1
    items:
    - apiVersion: operator.openshift.io/v1
      kind: IngressController
      metadata:
        name: <custom_ic_name> 1
        namespace: openshift-ingress-operator
      spec:
        replicas: 1
        domain: <custom_ic_domain_name> 2
        nodePlacement:
          nodeSelector:
            matchLabels:
              <key>: <value> 3
        namespaceSelector:
          matchLabels:
            <key>: <value> 4
        endpointPublishingStrategy:
          type: NodePortService
    # ...

    1
    Specify a custom name for the IngressController CR.
    2
    The DNS name that the Ingress Controller services. As an example, the default ingresscontroller domain is apps.ipi-cluster.example.com, so you would specify the <custom_ic_domain_name> as nodeportsvc.ipi-cluster.example.com.
    3
    Specify the label for the nodes that include the custom Ingress Controller.
    4
    Specify the label for a set of namespaces. Substitute <key>:<value> with a map of key-value pairs where <key> is a unique name for the new label and <value> is its value. For example: ingresscontroller: custom-ic.
  2. Add a label to a node by using the oc label node command:

    $ oc label node <node_name> <key>=<value> 1
    1
    Where <key>=<value> must match the key-value pair specified in the nodePlacement section of your IngressController CR.
  3. Create the IngressController object:

    $ oc create -f <ingress_controller_cr>.yaml
  4. Find the port for the service created for the IngressController CR:

    $ oc get svc -n openshift-ingress

    Example output that shows port 80:32432/TCP for the router-nodeport-custom-ic3 service

    NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                     AGE
    router-internal-default      ClusterIP   172.30.195.74    <none>        80/TCP,443/TCP,1936/TCP                     223d
    router-nodeport-custom-ic3   NodePort    172.30.109.219   <none>        80:32432/TCP,443:31366/TCP,1936:30499/TCP   155m

  5. To create a new project, enter the following command:

    $ oc new-project <project_name>
  6. To label the new namespace, enter the following command:

    $ oc label namespace <project_name> <key>=<value> 1
    1
    Where <key>=<value> must match the value in the namespaceSelector section of your Ingress Controller CR.
  7. Create a new application in your cluster:

    $ oc new-app --image=<image_name> 1
    1
    An example of <image_name> is quay.io/openshifttest/hello-openshift:multiarch.
  8. Create a Route object for a service, so that the pod can use the service to expose the application external to the cluster.

    $ oc expose svc/<service_name> --hostname=<svc_name>-<project_name>.<custom_ic_domain_name> 1
    Note

    You must specify the domain name of your custom Ingress Controller in the --hostname argument. If you do not do this, the Ingress Operator uses the default Ingress Controller to serve all the routes for your cluster.

  9. Check that the route has the Admitted status and that it includes metadata for the custom Ingress Controller:

    $ oc get route/hello-openshift -o json | jq '.status.ingress'

    Example output

    # ...
    [
      {
        "conditions": [
          {
            "lastTransitionTime": "2024-05-17T18:25:41Z",
            "status": "True",
            "type": "Admitted"
          }
        ],
        "host": "hello-openshift.nodeportsvc.ipi-cluster.example.com",
        "routerCanonicalHostname": "router-nodeportsvc.nodeportsvc.ipi-cluster.example.com",
        "routerName": "nodeportsvc",
        "wildcardPolicy": "None"
      }
    ]

  10. Update the default IngressController CR to prevent the default Ingress Controller from managing the NodePort-type Service. The default Ingress Controller will continue to monitor all other cluster traffic.

    $ oc patch --type=merge -n openshift-ingress-operator ingresscontroller/default --patch '{"spec":{"namespaceSelector":{"matchExpressions":[{"key":"<key>","operator":"NotIn","values":["<value>"]}]}}}'

Verification

  1. Verify that the DNS entry can route inside and outside of your cluster by entering the following command. The command outputs the IP address of the node that received the label from running the oc label node command earlier in the procedure.

    $ dig +short <svc_name>-<project_name>.<custom_ic_domain_name>
  2. To verify that your cluster uses the IP addresses from external DNS servers for DNS resolution, check the connection of your cluster by entering the following command:

    $ curl <svc_name>-<project_name>.<custom_ic_domain_name>:<port> 1
    1
    Where <port> is the node port from the NodePort-type Service. Based on example output from the oc get svc -n openshift-ingress command, the 80:32432/TCP HTTP route means that 32432 is the node port.

    Example output

    Hello OpenShift!

28.4.2. Additional resources

28.5. Configuring ingress cluster traffic using a load balancer

OpenShift Container Platform provides methods for communicating from outside the cluster with services running in the cluster. This method uses a load balancer.

28.5.1. Using a load balancer to get traffic into the cluster

If you do not need a specific external IP address, you can configure a load balancer service to allow external access to an OpenShift Container Platform cluster.

A load balancer service allocates a unique IP. The load balancer has a single edge router IP, which can be a virtual IP (VIP), but is still a single machine for initial load balancing.

Note

If a pool is configured, it is done at the infrastructure level, not by a cluster administrator.

Note

The procedures in this section require prerequisites performed by the cluster administrator.

28.5.2. Prerequisites

Before starting the following procedures, the administrator must:

  • Set up the external port to the cluster networking environment so that requests can reach the cluster.
  • Make sure there is at least one user with cluster admin role. To add this role to a user, run the following command:

    $ oc adm policy add-cluster-role-to-user cluster-admin username
  • Have an OpenShift Container Platform cluster with at least one master and at least one node and a system outside the cluster that has network access to the cluster. This procedure assumes that the external system is on the same subnet as the cluster. The additional networking required for external systems on a different subnet is out-of-scope for this topic.

28.5.3. Creating a project and service

If the project and service that you want to expose do not exist, create the project and then create the service.

If the project and service already exist, skip to the procedure on exposing the service to create a route.

Prerequisites

  • Install the OpenShift CLI (oc) and log in as a cluster administrator.

Procedure

  1. Create a new project for your service by running the oc new-project command:

    $ oc new-project <project_name>
  2. Use the oc new-app command to create your service:

    $ oc new-app nodejs:12~https://github.com/sclorg/nodejs-ex.git
  3. To verify that the service was created, run the following command:

    $ oc get svc -n <project_name>

    Example output

    NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    nodejs-ex   ClusterIP   172.30.197.157   <none>        8080/TCP   70s

    Note

    By default, the new service does not have an external IP address.

28.5.4. Exposing the service by creating a route

You can expose the service as a route by using the oc expose command.

Prerequisites

  • You logged into OpenShift Container Platform.

Procedure

  1. Log in to the project where the service you want to expose is located:

    $ oc project <project_name>
  2. Run the oc expose service command to expose the route:

    $ oc expose service nodejs-ex

    Example output

    route.route.openshift.io/nodejs-ex exposed

  3. To verify that the service is exposed, you can use a tool, such as curl, to check that the service is accessible from outside the cluster.

    1. To find the hostname of the route, enter the following command:

      $ oc get route

      Example output

      NAME        HOST/PORT                        PATH   SERVICES    PORT       TERMINATION   WILDCARD
      nodejs-ex   nodejs-ex-myproject.example.com         nodejs-ex   8080-tcp                 None

    2. To check that the host responds to a GET request, enter the following command:

      Example curl command

      $ curl --head nodejs-ex-myproject.example.com

      Example output

      HTTP/1.1 200 OK
      ...

28.5.5. Creating a load balancer service

Use the following procedure to create a load balancer service.

Prerequisites

  • Make sure that the project and service you want to expose exist.
  • Your cloud provider supports load balancers.

Procedure

To create a load balancer service:

  1. Log in to OpenShift Container Platform.
  2. Switch to the project where the service that you want to expose is located:

    $ oc project project1
  3. Open a text file on the control plane node and paste the following text, editing the file as needed:

    Sample load balancer configuration file

    apiVersion: v1
    kind: Service
    metadata:
      name: egress-2 1
    spec:
      ports:
      - name: db
        port: 3306 2
      loadBalancerIP:
      loadBalancerSourceRanges: 3
      - 10.0.0.0/8
      - 192.168.0.0/16
      type: LoadBalancer 4
      selector:
        name: mysql 5

    1
    Enter a descriptive name for the load balancer service.
    2
    Enter the same port that the service you want to expose is listening on.
    3
    Enter a list of specific IP addresses to restrict traffic through the load balancer. This field is ignored if the cloud provider does not support the feature.
    4
    Enter LoadBalancer as the type.
    5
    Enter the name of the service.
    Note

    To restrict the traffic through the load balancer to specific IP addresses, it is recommended to use the Ingress Controller field spec.endpointPublishingStrategy.loadBalancer.allowedSourceRanges. Do not set the loadBalancerSourceRanges field.

  4. Save and exit the file.
  5. Run the following command to create the service:

    $ oc create -f <file-name>

    For example:

    $ oc create -f mysql-lb.yaml
  6. Execute the following command to view the new service:

    $ oc get svc

    Example output

    NAME       TYPE           CLUSTER-IP      EXTERNAL-IP                             PORT(S)          AGE
    egress-2   LoadBalancer   172.30.22.226   ad42f5d8b303045-487804948.example.com   3306:30357/TCP   15m

    If a cloud provider is enabled, an external IP address is automatically assigned to the service.

  7. On the control plane node, use a tool, such as cURL, to make sure you can reach the service by using the public IP address:

    $ curl <public-ip>:<port>

    For example:

    $ curl 172.29.121.74:3306

    The examples in this section use a MySQL service, which requires a client application. If you get a string of characters that includes the Got packets out of order message, you are connected to the service.

    If you have a MySQL client, log in with the standard CLI command:

    $ mysql -h 172.30.131.89 -u admin -p

    Example output

    Enter password:
    Welcome to the MariaDB monitor.  Commands end with ; or \g.
    
    MySQL [(none)]>

28.6. Configuring ingress cluster traffic on AWS

OpenShift Container Platform provides methods for communicating from outside the cluster with services running in the cluster. One such method uses a load balancer on AWS: either a Network Load Balancer (NLB) or a Classic Load Balancer (CLB). Both types of load balancers can forward the client’s IP address to the node, but a CLB requires proxy protocol support, which OpenShift Container Platform automatically enables.

There are two ways to configure an Ingress Controller to use an NLB:

  1. By force replacing the Ingress Controller that is currently using a CLB. This deletes the IngressController object and an outage will occur while the new DNS records propagate and the NLB is being provisioned.
  2. By editing an existing Ingress Controller that uses a CLB to use an NLB. This changes the load balancer without having to delete and recreate the IngressController object.

You can also use either method to switch from an NLB to a CLB.

You can configure these load balancers on a new or existing AWS cluster.

28.6.1. Configuring Classic Load Balancer timeouts on AWS

OpenShift Container Platform provides a method for setting a custom timeout period for a specific route or Ingress Controller. Additionally, an AWS Classic Load Balancer (CLB) has its own timeout period with a default time of 60 seconds.

If the timeout period of the CLB is shorter than the route timeout or Ingress Controller timeout, the load balancer can prematurely terminate the connection. You can prevent this problem by increasing both the timeout period of the route and CLB.

28.6.1.1. Configuring route timeouts

You can configure the default timeouts for an existing route when you have services in need of a low timeout, which is required for Service Level Availability (SLA) purposes, or a high timeout, for cases with a slow back end.

Prerequisites

  • You need a deployed Ingress Controller on a running cluster.

Procedure

  1. Using the oc annotate command, add the timeout to the route:

    $ oc annotate route <route_name> \
        --overwrite haproxy.router.openshift.io/timeout=<timeout><time_unit> 1
    1
    Supported time units are microseconds (us), milliseconds (ms), seconds (s), minutes (m), hours (h), or days (d).

    The following example sets a timeout of two seconds on a route named myroute:

    $ oc annotate route myroute --overwrite haproxy.router.openshift.io/timeout=2s
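
    To confirm that the annotation was applied, you can inspect the route. The following is a minimal sketch that assumes the route is named myroute:

    $ oc get route myroute -o jsonpath='{.metadata.annotations.haproxy\.router\.openshift\.io/timeout}{"\n"}'
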
28.6.1.2. Configuring Classic Load Balancer timeouts

You can configure the default timeouts for a Classic Load Balancer (CLB) to extend idle connections.

Prerequisites

  • You must have a deployed Ingress Controller on a running cluster.

Procedure

  1. Set an AWS connection idle timeout of five minutes for the default ingresscontroller by running the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge --patch='{"spec":{"endpointPublishingStrategy": \
        {"type":"LoadBalancerService", "loadBalancer": \
        {"scope":"External", "providerParameters":{"type":"AWS", "aws": \
        {"type":"Classic", "classicLoadBalancer": \
        {"connectionIdleTimeout":"5m"}}}}}}}'
  2. Optional: Restore the default value of the timeout by running the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge --patch='{"spec":{"endpointPublishingStrategy": \
        {"loadBalancer":{"providerParameters":{"aws":{"classicLoadBalancer": \
        {"connectionIdleTimeout":null}}}}}}}'
Note

You must specify the scope field when you change the connection timeout value unless the current scope is already set. After you set the scope field, you do not need to set it again when you restore the default timeout value.
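
To verify the currently configured timeout value, you can query the Ingress Controller. The following is a minimal sketch; an empty result means that the default timeout is in use:

$ oc -n openshift-ingress-operator get ingresscontroller/default \
    -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.classicLoadBalancer.connectionIdleTimeout}{"\n"}'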

28.6.2. Configuring ingress cluster traffic on AWS using a Network Load Balancer

OpenShift Container Platform provides methods for communicating from outside the cluster with services that run in the cluster. One such method uses a Network Load Balancer (NLB). You can configure an NLB on a new or existing AWS cluster.

28.6.2.1. Switching the Ingress Controller from using a Classic Load Balancer to a Network Load Balancer

You can switch the Ingress Controller that is using a Classic Load Balancer (CLB) to one that uses a Network Load Balancer (NLB) on AWS.

Switching between these load balancers will not delete the IngressController object.

Warning

This procedure might cause the following issues:

  • An outage that can last several minutes due to new DNS records propagation, new load balancers provisioning, and other factors. IP addresses and canonical names of the Ingress Controller load balancer might change after applying this procedure.
  • Leaked load balancer resources due to a change in the annotation of the service.

Procedure

  1. Modify the existing Ingress Controller that you want to switch to using an NLB. This example assumes that your default Ingress Controller has an External scope and no other customizations:

    Example ingresscontroller.yaml file

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      creationTimestamp: null
      name: default
      namespace: openshift-ingress-operator
    spec:
      endpointPublishingStrategy:
        loadBalancer:
          scope: External
          providerParameters:
            type: AWS
            aws:
              type: NLB
        type: LoadBalancerService

    Note

    If you do not specify a value for the spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type field, the Ingress Controller uses the spec.loadBalancer.platform.aws.type value from the cluster Ingress configuration that was set during installation.

    Tip

    If your Ingress Controller has other customizations that you want to update, such as changing the domain, consider force replacing the Ingress Controller definition file instead.

  2. Apply the changes to the Ingress Controller YAML file by running the command:

    $ oc apply -f ingresscontroller.yaml

    Expect several minutes of outages while the Ingress Controller updates.
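
    When the update completes, you can confirm that the Ingress Controller now requests an NLB; a minimal sketch:

    $ oc -n openshift-ingress-operator get ingresscontroller/default \
        -o jsonpath='{.spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type}{"\n"}'

    The expected value is NLB.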

28.6.2.2. Switching the Ingress Controller from using a Network Load Balancer to a Classic Load Balancer

You can switch the Ingress Controller that is using a Network Load Balancer (NLB) to one that uses a Classic Load Balancer (CLB) on AWS.

Switching between these load balancers will not delete the IngressController object.

Warning

This procedure might cause an outage that can last several minutes due to new DNS records propagation, new load balancers provisioning, and other factors. IP addresses and canonical names of the Ingress Controller load balancer might change after applying this procedure.

Procedure

  1. Modify the existing Ingress Controller that you want to switch to using a CLB. This example assumes that your default Ingress Controller has an External scope and no other customizations:

    Example ingresscontroller.yaml file

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      creationTimestamp: null
      name: default
      namespace: openshift-ingress-operator
    spec:
      endpointPublishingStrategy:
        loadBalancer:
          scope: External
          providerParameters:
            type: AWS
            aws:
              type: Classic
        type: LoadBalancerService

    Note

    If you do not specify a value for the spec.endpointPublishingStrategy.loadBalancer.providerParameters.aws.type field, the Ingress Controller uses the spec.loadBalancer.platform.aws.type value from the cluster Ingress configuration that was set during installation.

    Tip

    If your Ingress Controller has other customizations that you want to update, such as changing the domain, consider force replacing the Ingress Controller definition file instead.

  2. Apply the changes to the Ingress Controller YAML file by running the command:

    $ oc apply -f ingresscontroller.yaml

    Expect several minutes of outages while the Ingress Controller updates.

28.6.2.3. Replacing Ingress Controller Classic Load Balancer with Network Load Balancer

You can replace an Ingress Controller that is using a Classic Load Balancer (CLB) with one that uses a Network Load Balancer (NLB) on AWS.

Warning

This procedure might cause the following issues:

  • An outage that can last several minutes due to new DNS records propagation, new load balancers provisioning, and other factors. IP addresses and canonical names of the Ingress Controller load balancer might change after applying this procedure.
  • Leaked load balancer resources due to a change in the annotation of the service.

Procedure

  1. Create a file with a new default Ingress Controller. The following example assumes that your default Ingress Controller has an External scope and no other customizations:

    Example ingresscontroller.yml file

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      creationTimestamp: null
      name: default
      namespace: openshift-ingress-operator
    spec:
      endpointPublishingStrategy:
        loadBalancer:
          scope: External
          providerParameters:
            type: AWS
            aws:
              type: NLB
        type: LoadBalancerService

    If your default Ingress Controller has other customizations, ensure that you modify the file accordingly.

    Tip

    If your Ingress Controller has no other customizations and you are only updating the load balancer type, consider following the procedure detailed in "Switching the Ingress Controller from using a Classic Load Balancer to a Network Load Balancer".

  2. Force replace the Ingress Controller YAML file:

    $ oc replace --force --wait -f ingresscontroller.yml

    Wait until the Ingress Controller is replaced. Expect several minutes of outages.
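
    You can watch the load balancer service for the default Ingress Controller until a new external hostname is assigned; a minimal sketch:

    $ oc -n openshift-ingress get svc router-default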

28.6.2.4. Configuring an Ingress Controller Network Load Balancer on an existing AWS cluster

You can create an Ingress Controller backed by an AWS Network Load Balancer (NLB) on an existing cluster.

Prerequisites

  • You must have an installed AWS cluster.
  • PlatformStatus of the infrastructure resource must be AWS.

    • To verify that the PlatformStatus is AWS, run:

      $ oc get infrastructure/cluster -o jsonpath='{.status.platformStatus.type}'
      AWS

Procedure

Create an Ingress Controller backed by an AWS NLB on an existing cluster.

  1. Create the Ingress Controller manifest:

     $ cat ingresscontroller-aws-nlb.yaml

    Example output

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      name: $my_ingress_controller 1
      namespace: openshift-ingress-operator
    spec:
      domain: $my_unique_ingress_domain 2
      endpointPublishingStrategy:
        type: LoadBalancerService
        loadBalancer:
          scope: External 3
          providerParameters:
            type: AWS
            aws:
              type: NLB

    1
    Replace $my_ingress_controller with a unique name for the Ingress Controller.
    2
    Replace $my_unique_ingress_domain with a domain name that is unique among all Ingress Controllers in the cluster. This variable must be a subdomain of the DNS name <clustername>.<domain>.
    3
    You can replace External with Internal to use an internal NLB.
  2. Create the resource in the cluster:

    $ oc create -f ingresscontroller-aws-nlb.yaml
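
    To verify that the Ingress Operator created a load balancer service for the new Ingress Controller, you can list the services in the openshift-ingress namespace; a minimal sketch:

    $ oc -n openshift-ingress get svc
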
Important

Before you can configure an Ingress Controller NLB on a new AWS cluster, you must complete the Creating the installation configuration file procedure.

28.6.2.5. Configuring an Ingress Controller Network Load Balancer on a new AWS cluster

You can create an Ingress Controller backed by an AWS Network Load Balancer (NLB) on a new cluster.

Prerequisites

  • Create the install-config.yaml file and complete any modifications to it.

Procedure

Create an Ingress Controller backed by an AWS NLB on a new cluster.

  1. Change to the directory that contains the installation program and create the manifests:

    $ ./openshift-install create manifests --dir <installation_directory> 1
    1
    For <installation_directory>, specify the name of the directory that contains the install-config.yaml file for your cluster.
  2. Create a file that is named cluster-ingress-default-ingresscontroller.yaml in the <installation_directory>/manifests/ directory:

    $ touch <installation_directory>/manifests/cluster-ingress-default-ingresscontroller.yaml 1
    1
    For <installation_directory>, specify the directory name that contains the manifests/ directory for your cluster.

    After creating the file, several network configuration files are in the manifests/ directory, as shown:

    $ ls <installation_directory>/manifests/cluster-ingress-default-ingresscontroller.yaml

    Example output

    cluster-ingress-default-ingresscontroller.yaml

  3. Open the cluster-ingress-default-ingresscontroller.yaml file in an editor and enter a custom resource (CR) that describes the Operator configuration you want:

    apiVersion: operator.openshift.io/v1
    kind: IngressController
    metadata:
      creationTimestamp: null
      name: default
      namespace: openshift-ingress-operator
    spec:
      endpointPublishingStrategy:
        loadBalancer:
          scope: External
          providerParameters:
            type: AWS
            aws:
              type: NLB
        type: LoadBalancerService
  4. Save the cluster-ingress-default-ingresscontroller.yaml file and quit the text editor.
  5. Optional: Back up the manifests/cluster-ingress-default-ingresscontroller.yaml file. The installation program deletes the manifests/ directory when creating the cluster.

28.6.3. Additional resources

28.7. Configuring ingress cluster traffic for a service external IP

You can use either a MetalLB implementation or an IP failover deployment to attach an ExternalIP resource to a service so that the service is available to traffic outside your OpenShift Container Platform cluster. Hosting an external IP address in this way is only applicable for a cluster installed on bare-metal hardware.

You must ensure that you correctly configure the external network infrastructure to route traffic to the service.

28.7.1. Prerequisites

28.7.2. Attaching an ExternalIP to a service

You can attach an ExternalIP resource to a service. If you configured your cluster to automatically attach the resource to a service, you might not need to manually attach an ExternalIP to the service.

The examples in the procedure use a scenario that manually attaches an ExternalIP resource to a service in a cluster with an IP failover configuration.

Procedure

  1. Confirm compatible IP address ranges for the ExternalIP resource by entering the following command in your CLI:

    $ oc get networks.config cluster -o jsonpath='{.spec.externalIP}{"\n"}'
    Note

    If autoAssignCIDRs is set and you did not specify a value for spec.externalIPs in the ExternalIP resource, OpenShift Container Platform automatically assigns ExternalIP to a new Service object.

  2. Choose one of the following options to attach an ExternalIP resource to the service:

    1. If you are creating a new service, specify one or more valid IP addresses in the spec.externalIPs field. Each address must fall within an allowed range, such as the 192.168.123.0/28 CIDR block that the allowedCIDRs parameter of the cluster ExternalIP configuration defines in this example.

      Example of a service YAML configuration file that specifies an ExternalIP address

      apiVersion: v1
      kind: Service
      metadata:
        name: svc-with-externalip
      spec:
        externalIPs:
        - 192.168.123.10

    2. If you are attaching an ExternalIP to an existing service, enter the following command. Replace <name> with the service name. Replace <ip_address> with a valid ExternalIP address. You can provide multiple IP addresses separated by commas.

      $ oc patch svc <name> -p \
        '{
          "spec": {
            "externalIPs": [ "<ip_address>" ]
          }
        }'

      For example:

      $ oc patch svc mysql-55-rhel7 -p '{"spec":{"externalIPs":["192.174.120.10"]}}'

      Example output

      "mysql-55-rhel7" patched

  3. To confirm that an ExternalIP address is attached to the service, enter the following command. If you specified an ExternalIP for a new service, you must create the service first.

    $ oc get svc

    Example output

    NAME               CLUSTER-IP      EXTERNAL-IP     PORT(S)    AGE
    mysql-55-rhel7     172.30.131.89   192.174.120.10  3306/TCP   13m
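
    If you later need to detach the ExternalIP address from the service, you can clear the field with a patch; a minimal sketch that assumes the same service name as the previous example:

    $ oc patch svc mysql-55-rhel7 -p '{"spec":{"externalIPs":null}}'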

28.7.3. Additional resources

28.8. Configuring ingress cluster traffic by using a NodePort

OpenShift Container Platform provides methods for communicating from outside the cluster with services running in the cluster. One of these methods uses a NodePort.

28.8.1. Using a NodePort to get traffic into the cluster

Use a NodePort-type Service resource to expose a service on a specific port on all nodes in the cluster. The port is specified in the Service resource’s .spec.ports[*].nodePort field.

Important

Using a node port requires additional port resources.

A NodePort exposes the service on a static port on the node’s IP address. NodePorts are in the 30000 to 32767 range by default, which means a NodePort is unlikely to match a service’s intended port. For example, port 8080 may be exposed as port 31020 on the node.

The administrator must ensure the external IP addresses are routed to the nodes.

NodePorts and external IPs are independent and both can be used concurrently.
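
For reference, a Service manifest of type NodePort might look like the following sketch. The service name, selector label, and port values are illustrative assumptions:

apiVersion: v1
kind: Service
metadata:
  name: nodejs-ex-nodeport
spec:
  type: NodePort
  selector:
    app: nodejs-ex
  ports:
  - name: 8080-tcp
    protocol: TCP
    port: 8080
    targetPort: 8080
    nodePort: 30036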

Note

The procedures in this section require prerequisites performed by the cluster administrator.

28.8.2. Prerequisites

Before starting the following procedures, the administrator must:

  • Set up the external port to the cluster networking environment so that requests can reach the cluster.
  • Make sure there is at least one user with the cluster-admin role. To add this role to a user, run the following command:

    $ oc adm policy add-cluster-role-to-user cluster-admin <user_name>
  • Have an OpenShift Container Platform cluster with at least one master and at least one node and a system outside the cluster that has network access to the cluster. This procedure assumes that the external system is on the same subnet as the cluster. The additional networking required for external systems on a different subnet is out-of-scope for this topic.

28.8.3. Creating a project and service

If the project and service that you want to expose do not exist, create the project and then create the service.

If the project and service already exist, skip to the procedure on exposing the service to create a route.

Prerequisites

  • Install the OpenShift CLI (oc) and log in as a cluster administrator.

Procedure

  1. Create a new project for your service by running the oc new-project command:

    $ oc new-project <project_name>
  2. Use the oc new-app command to create your service:

    $ oc new-app nodejs:12~https://github.com/sclorg/nodejs-ex.git
  3. To verify that the service was created, run the following command:

    $ oc get svc -n <project_name>

    Example output

    NAME        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
    nodejs-ex   ClusterIP   172.30.197.157   <none>        8080/TCP   70s

    Note

    By default, the new service does not have an external IP address.

28.8.4. Exposing the service by creating a route

You can expose the service as a route by using the oc expose command.

Prerequisites

  • You logged into OpenShift Container Platform.

Procedure

  1. Log in to the project where the service you want to expose is located:

    $ oc project <project_name>
  2. To expose a node port for the application, modify the service definition by entering the following command:

    $ oc edit svc <service_name>

    Example output

    spec:
      ports:
      - name: 8443-tcp
        nodePort: 30327 1
        port: 8443
        protocol: TCP
        targetPort: 8443
      sessionAffinity: None
      type: NodePort 2

    1
    Optional: Specify a node port value for the application. By default, OpenShift Container Platform selects an available port in the 30000-32767 range.
    2
    Define the service type.
  3. Optional: To confirm the service is available with a node port exposed, enter the following command:

    $ oc get svc -n myproject

    Example output

    NAME                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
    nodejs-ex           ClusterIP   172.30.217.127   <none>        3306/TCP         9m44s
    nodejs-ex-ingress   NodePort    172.30.107.72    <none>        3306:31345/TCP   39s

  4. Optional: To remove the service created automatically by the oc new-app command, enter the following command:

    $ oc delete svc nodejs-ex

Verification

  • To check that the service node port is updated with a port in the 30000-32767 range, enter the following command:

    $ oc get svc

    In the following example output, the updated port is 30327:

    Example output

    NAME    TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
    httpd   NodePort   172.xx.xx.xx    <none>        8443:30327/TCP   109s
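
    You can then test connectivity from outside the cluster by connecting to any node IP address on the assigned node port; a minimal sketch that assumes a node address of 10.0.0.5 and a service that terminates TLS on port 8443:

    $ curl -k https://10.0.0.5:30327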

28.8.5. Additional resources

28.9. Configuring ingress cluster traffic using load balancer allowed source ranges

You can specify a list of IP address ranges for the IngressController. This restricts access to the load balancer service when the endpointPublishingStrategy is LoadBalancerService.

28.9.1. Configuring load balancer allowed source ranges

You can enable and configure the spec.endpointPublishingStrategy.loadBalancer.allowedSourceRanges field. By configuring load balancer allowed source ranges, you can limit the access to the load balancer for the Ingress Controller to a specified list of IP address ranges. The Ingress Operator reconciles the load balancer Service and sets the spec.loadBalancerSourceRanges field based on AllowedSourceRanges.
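
For reference, the corresponding field in the IngressController specification might look like the following sketch. The CIDR values are illustrative:

spec:
  endpointPublishingStrategy:
    type: LoadBalancerService
    loadBalancer:
      scope: External
      allowedSourceRanges:
      - 10.0.0.0/8
      - 192.168.0.0/16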

Note

If you have already set the spec.loadBalancerSourceRanges field or the load balancer service annotation service.beta.kubernetes.io/load-balancer-source-ranges in a previous version of OpenShift Container Platform, the Ingress Controller starts reporting Progressing=True after an upgrade. To fix this, set AllowedSourceRanges, which overwrites the spec.loadBalancerSourceRanges field and clears the service.beta.kubernetes.io/load-balancer-source-ranges annotation. The Ingress Controller then reports Progressing=False again.

Prerequisites

  • You have a deployed Ingress Controller on a running cluster.

Procedure

  • Set the allowed source ranges API for the Ingress Controller by running the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge --patch='{"spec":{"endpointPublishingStrategy": \
        {"type":"LoadBalancerService", "loadbalancer": \
        {"scope":"External", "allowedSourceRanges":["0.0.0.0/0"]}}}}' 1
    1
    The example value 0.0.0.0/0 specifies the allowed source range.

28.9.2. Migrating to load balancer allowed source ranges

If you have already set the annotation service.beta.kubernetes.io/load-balancer-source-ranges, you can migrate to load balancer allowed source ranges. When you set the AllowedSourceRanges, the Ingress Controller sets the spec.loadBalancerSourceRanges field based on the AllowedSourceRanges value and unsets the service.beta.kubernetes.io/load-balancer-source-ranges annotation.

Note

If you have already set the spec.loadBalancerSourceRanges field or the load balancer service annotation service.beta.kubernetes.io/load-balancer-source-ranges in a previous version of OpenShift Container Platform, the Ingress Controller starts reporting Progressing=True after an upgrade. To fix this, set AllowedSourceRanges, which overwrites the spec.loadBalancerSourceRanges field and clears the service.beta.kubernetes.io/load-balancer-source-ranges annotation. The Ingress Controller then reports Progressing=False again.

Prerequisites

  • You have set the service.beta.kubernetes.io/load-balancer-source-ranges annotation.

Procedure

  1. Ensure that the service.beta.kubernetes.io/load-balancer-source-ranges annotation is set:

    $ oc get svc router-default -n openshift-ingress -o yaml

    Example output

    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        service.beta.kubernetes.io/load-balancer-source-ranges: 192.168.0.1/32

  2. Check whether the spec.loadBalancerSourceRanges field is set:

    $ oc get svc router-default -n openshift-ingress -o yaml

    Example output

    ...
    spec:
      loadBalancerSourceRanges:
      - 0.0.0.0/0
    ...

  3. Update your cluster to OpenShift Container Platform 4.13.
  4. Set the allowed source ranges API for the ingresscontroller by running the following command:

    $ oc -n openshift-ingress-operator patch ingresscontroller/default \
        --type=merge --patch='{"spec":{"endpointPublishingStrategy": \
        {"loadBalancer":{"allowedSourceRanges":["0.0.0.0/0"]}}}}' 1
    1
    The example value 0.0.0.0/0 specifies the allowed source range.
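
    To confirm the migration, you can check that the annotation is cleared and that the spec.loadBalancerSourceRanges field is populated; a minimal sketch:

    $ oc -n openshift-ingress get svc router-default \
        -o jsonpath='{.metadata.annotations.service\.beta\.kubernetes\.io/load-balancer-source-ranges}{"\n"}{.spec.loadBalancerSourceRanges}{"\n"}'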

28.9.3. Additional resources

Chapter 29. Kubernetes NMState

29.1. About the Kubernetes NMState Operator

The Kubernetes NMState Operator provides a Kubernetes API for performing state-driven network configuration across the OpenShift Container Platform cluster’s nodes with NMState. The Kubernetes NMState Operator provides users with functionality to configure various network interface types, DNS, and routing on cluster nodes. Additionally, the daemons on the cluster nodes periodically report on the state of each node’s network interfaces to the API server.

Important

Red Hat supports the Kubernetes NMState Operator in production environments on bare-metal, IBM Power, IBM Z, IBM® LinuxONE, VMware vSphere, and OpenStack installations.

Before you can use NMState with OpenShift Container Platform, you must install the Kubernetes NMState Operator.

Note

The Kubernetes NMState Operator updates the network configuration of a secondary NIC. It cannot update the network configuration of the primary NIC or the br-ex bridge.

OpenShift Container Platform uses nmstate to report on and configure the state of the node network. This makes it possible to modify the network policy configuration, such as by creating a Linux bridge on all nodes, by applying a single configuration manifest to the cluster.

Node networking is monitored and updated by the following objects:

NodeNetworkState
Reports the state of the network on that node.
NodeNetworkConfigurationPolicy
Describes the requested network configuration on nodes. You update the node network configuration, including adding and removing interfaces, by applying a NodeNetworkConfigurationPolicy manifest to the cluster.
NodeNetworkConfigurationEnactment
Reports the network policies enacted upon each node.
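
You can inspect each of these object types with the CLI by using the short resource names that appear later in this chapter; a minimal sketch:

$ oc get nns

$ oc get nncp

$ oc get nnce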

29.1.1. Installing the Kubernetes NMState Operator

You can install the Kubernetes NMState Operator by using the web console or the CLI.

29.1.1.1. Installing the Kubernetes NMState Operator using the web console

You can install the Kubernetes NMState Operator by using the web console. After it is installed, the Operator can deploy the NMState State Controller as a daemon set across all of the cluster nodes.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Select Operators → OperatorHub.
  2. In the search field below All Items, enter nmstate and press Enter to search for the Kubernetes NMState Operator.
  3. Click on the Kubernetes NMState Operator search result.
  4. Click on Install to open the Install Operator window.
  5. Click Install to install the Operator.
  6. After the Operator finishes installing, click View Operator.
  7. Under Provided APIs, click Create Instance to open the dialog box for creating an instance of kubernetes-nmstate.
  8. In the Name field of the dialog box, ensure the name of the instance is nmstate.

    Note

    The name restriction is a known issue. The instance is a singleton for the entire cluster.

  9. Accept the default settings and click Create to create the instance.

Summary

Once complete, the Operator has deployed the NMState State Controller as a daemon set across all of the cluster nodes.

29.1.1.2. Installing the Kubernetes NMState Operator using the CLI

You can install the Kubernetes NMState Operator by using the OpenShift CLI (oc). After it is installed, the Operator can deploy the NMState State Controller as a daemon set across all of the cluster nodes.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Create the nmstate Operator namespace:

    $ cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-nmstate
    spec:
      finalizers:
      - kubernetes
    EOF
  2. Create the OperatorGroup:

    $ cat << EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-nmstate
      namespace: openshift-nmstate
    spec:
      targetNamespaces:
      - openshift-nmstate
    EOF
  3. Subscribe to the nmstate Operator:

    $ cat << EOF| oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: kubernetes-nmstate-operator
      namespace: openshift-nmstate
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: kubernetes-nmstate-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  4. Create an instance of the nmstate Operator:

    $ cat << EOF | oc apply -f -
    apiVersion: nmstate.io/v1
    kind: NMState
    metadata:
      name: nmstate
    EOF

Verification

  • Confirm that the deployment for the nmstate operator is running:

    $ oc get clusterserviceversion -n openshift-nmstate \
     -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Example output

    Name                                             Phase
    kubernetes-nmstate-operator.4.13.0-202210210157   Succeeded
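
    You can also confirm that the NMState pods are running in the openshift-nmstate namespace. The exact pod names can vary by version; a minimal sketch:

    $ oc get pods -n openshift-nmstate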

29.2. Observing and updating the node network state and configuration

29.2.1. Viewing the network state of a node

Node network state is the network configuration for all nodes in the cluster. A NodeNetworkState object exists on every node in the cluster. This object is periodically updated and captures the state of the network for that node.

Procedure

  1. List all the NodeNetworkState objects in the cluster:

    $ oc get nns
  2. Inspect a NodeNetworkState object to view the network on that node. The output in this example has been redacted for clarity:

    $ oc get nns node01 -o yaml

    Example output

    apiVersion: nmstate.io/v1
    kind: NodeNetworkState
    metadata:
      name: node01 1
    status:
      currentState: 2
        dns-resolver:
    # ...
        interfaces:
    # ...
        route-rules:
    # ...
        routes:
    # ...
      lastSuccessfulUpdateTime: "2020-01-31T12:14:00Z" 3

    1
    The name of the NodeNetworkState object is taken from the node.
    2
    The currentState contains the complete network configuration for the node, including DNS, interfaces, and routes.
    3
    Timestamp of the last successful update. This is updated periodically as long as the node is reachable and can be used to evaluate the freshness of the report.
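
    To compare the freshness of the reports across all nodes at a glance, you can print the timestamp as a column; a minimal sketch that uses custom columns:

    $ oc get nns -o custom-columns=NODE:.metadata.name,LAST_UPDATE:.status.lastSuccessfulUpdateTime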

29.2.2. Managing policy by using the CLI

29.2.2.1. Creating an interface on nodes

Create an interface on nodes in the cluster by applying a NodeNetworkConfigurationPolicy manifest to the cluster. The manifest details the requested configuration for the interface.

By default, the manifest applies to all nodes in the cluster. To add the interface to specific nodes, add the spec: nodeSelector parameter and the appropriate <key>:<value> for your node selector.

You can configure multiple nmstate-enabled nodes concurrently. The configuration applies to 50% of the nodes in parallel. This strategy prevents the entire cluster from being unavailable if the network connection fails. To apply the policy configuration in parallel to a specific portion of the cluster, use the maxUnavailable field.

Procedure

  1. Create the NodeNetworkConfigurationPolicy manifest. The following example configures a Linux bridge on all worker nodes and configures the DNS resolver:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: br1-eth1-policy 1
    spec:
      nodeSelector: 2
        node-role.kubernetes.io/worker: "" 3
      maxUnavailable: 3 4
      desiredState:
        interfaces:
          - name: br1
            description: Linux bridge with eth1 as a port 5
            type: linux-bridge
            state: up
            ipv4:
              dhcp: true
              enabled: true
              auto-dns: false
            bridge:
              options:
                stp:
                  enabled: false
              port:
                - name: eth1
        dns-resolver: 6
          config:
            search:
            - example.com
            - example.org
            server:
            - 8.8.8.8
    1
    Name of the policy.
    2
    Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
    3
    This example uses the node-role.kubernetes.io/worker: "" node selector to select all worker nodes in the cluster.
    4
    Optional: Specifies the maximum number of nmstate-enabled nodes that the policy configuration can be applied to concurrently. This parameter can be set to either a percentage value (string), for example, "10%", or an absolute value (number), such as 3.
    5
    Optional: Human-readable description for the interface.
    6
    Optional: Specifies the search and server settings for the DNS server.
  2. Create the node network policy:

    $ oc apply -f br1-eth1-policy.yaml 1
    1
    File name of the node network configuration policy manifest.
Additional resources

29.2.3. Confirming node network policy updates on nodes

A NodeNetworkConfigurationPolicy manifest describes your requested network configuration for nodes in the cluster. The node network policy includes your requested network configuration and the status of execution of the policy on the cluster as a whole.

When you apply a node network policy, a NodeNetworkConfigurationEnactment object is created for every node in the cluster. The node network configuration enactment is a read-only object that represents the status of execution of the policy on that node. If the policy fails to be applied on the node, the enactment for that node includes a traceback for troubleshooting.

Procedure

  1. To confirm that a policy has been applied to the cluster, list the policies and their status:

    $ oc get nncp
  2. Optional: If a policy is taking longer than expected to successfully configure, you can inspect the requested state and status conditions of a particular policy:

    $ oc get nncp <policy> -o yaml
  3. Optional: If a policy is taking longer than expected to successfully configure on all nodes, you can list the status of the enactments on the cluster:

    $ oc get nnce
  4. Optional: To view the configuration of a particular enactment, including any error reporting for a failed configuration:

    $ oc get nnce <node>.<policy> -o yaml
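
    If you script these checks, you can also wait for a policy to finish before proceeding; a minimal sketch that assumes the policy reports an Available condition:

    $ oc wait nncp <policy> --for=condition=Available --timeout=2m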

29.2.4. Removing an interface from nodes

You can remove an interface from one or more nodes in the cluster by editing the NodeNetworkConfigurationPolicy object and setting the state of the interface to absent.

Removing an interface from a node does not automatically restore the node network configuration to a previous state. If you want to restore the previous state, you will need to define that node network configuration in the policy.

If you remove a bridge or bonding interface, any node NICs in the cluster that were previously attached or subordinate to that bridge or bonding interface are placed in a down state and become unreachable. To avoid losing connectivity, configure the node NIC in the same policy so that it has a status of up and either DHCP or a static IP address.

Note

Deleting the node network policy that added an interface does not change the configuration of the policy on the node. Although a NodeNetworkConfigurationPolicy is an object in the cluster, it only represents the requested configuration.
Similarly, removing an interface does not delete the policy.

Procedure

  1. Update the NodeNetworkConfigurationPolicy manifest used to create the interface. The following example removes a Linux bridge and configures the eth1 NIC with DHCP to avoid losing connectivity:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: <br1-eth1-policy> 1
    spec:
      nodeSelector: 2
        node-role.kubernetes.io/worker: "" 3
      desiredState:
        interfaces:
        - name: br1
          type: linux-bridge
          state: absent 4
        - name: eth1 5
          type: ethernet 6
          state: up 7
          ipv4:
            dhcp: true 8
            enabled: true 9
    1
    Name of the policy.
    2
    Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
    3
    This example uses the node-role.kubernetes.io/worker: "" node selector to select all worker nodes in the cluster.
    4
    Changing the state to absent removes the interface.
    5
    The name of the interface that is to be unattached from the bridge interface.
    6
    The type of interface. This example creates an Ethernet networking interface.
    7
    The requested state for the interface.
    8
    Optional: If you do not use dhcp, you can either set a static IP or leave the interface without an IP address.
    9
    Enables ipv4 in this example.
  2. Update the policy on the node and remove the interface:

    $ oc apply -f <br1-eth1-policy.yaml> 1
    1
    File name of the policy manifest.

29.2.5. Example policy configurations for different interfaces

Before you read the different example NodeNetworkConfigurationPolicy (NNCP) manifest configurations, consider the following factors when you apply a policy so that your cluster runs at its best performance conditions:

  • When you need to apply a policy to more than one node, create a NodeNetworkConfigurationPolicy manifest for each target node. The Kubernetes NMState Operator applies the policy to each node with an NNCP in an unspecified order. Scoping a policy with this approach reduces the length of time for policy application but risks a cluster-wide outage if an error is in the cluster’s configuration. To avoid this type of error, initially apply NNCP to some nodes, and after you confirm they are configured correctly, proceed with applying the policy to the remaining nodes.
  • When you need to apply a policy to many nodes but you only want to create a single NNCP for all target nodes, the Kubernetes NMState Operator applies the policy to each node in sequence. You can set the speed and coverage of policy application for target nodes with the maxUnavailable parameter in the cluster configuration. By setting a lower percentage value for the parameter, you can reduce the risk of a cluster-wide outage if the outage impacts the small percentage of nodes that are receiving the policy application.
  • Consider specifying all related network configurations in a single policy.
  • When a node restarts, the Kubernetes NMState Operator cannot control the order that it applies policies to nodes. The Kubernetes NMState Operator might apply interdependent policies in a sequence that results in a degraded network object.
29.2.5.1. Example: Linux bridge interface node network configuration policy

Create a Linux bridge interface on nodes in the cluster by applying a NodeNetworkConfigurationPolicy manifest to the cluster.

The following YAML file is an example of a manifest for a Linux bridge interface. It includes sample values that you must replace with your own information.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eth1-policy 1
spec:
  nodeSelector: 2
    kubernetes.io/hostname: <node01> 3
  desiredState:
    interfaces:
      - name: br1 4
        description: Linux bridge with eth1 as a port 5
        type: linux-bridge 6
        state: up 7
        ipv4:
          dhcp: true 8
          enabled: true 9
        bridge:
          options:
            stp:
              enabled: false 10
          port:
            - name: eth1 11
1
Name of the policy.
2
Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
3
This example uses a hostname node selector.
4
Name of the interface.
5
Optional: Human-readable description of the interface.
6
The type of interface. This example creates a bridge.
7
The requested state for the interface after creation.
8
Optional: If you do not use dhcp, you can either set a static IP or leave the interface without an IP address.
9
Enables ipv4 in this example.
10
Disables stp in this example.
11
The node NIC to which the bridge attaches.
29.2.5.2. Example: VLAN interface node network configuration policy

Create a VLAN interface on nodes in the cluster by applying a NodeNetworkConfigurationPolicy manifest to the cluster.

Note

Define all related configurations for the VLAN interface of a node in a single NodeNetworkConfigurationPolicy manifest. For example, define the VLAN interface for a node and the related routes for the VLAN interface in the same NodeNetworkConfigurationPolicy manifest.

When a node restarts, the Kubernetes NMState Operator cannot control the order in which policies are applied. Therefore, if you use separate policies for related network configurations, the Kubernetes NMState Operator might apply these policies in a sequence that results in a degraded network object.

The following YAML file is an example of a manifest for a VLAN interface. It includes sample values that you must replace with your own information.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vlan-eth1-policy 1
spec:
  nodeSelector: 2
    kubernetes.io/hostname: <node01> 3
  desiredState:
    interfaces:
    - name: eth1.102 4
      description: VLAN using eth1 5
      type: vlan 6
      state: up 7
      vlan:
        base-iface: eth1 8
        id: 102 9
1
Name of the policy.
2
Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
3
This example uses a hostname node selector.
4
Name of the interface. When deploying on bare metal, only the <interface_name>.<vlan_number> VLAN format is supported.
5
Optional: Human-readable description of the interface.
6
The type of interface. This example creates a VLAN.
7
The requested state for the interface after creation.
8
The node NIC to which the VLAN is attached.
9
The VLAN tag.
29.2.5.3. Example: Bond interface node network configuration policy

Create a bond interface on nodes in the cluster by applying a NodeNetworkConfigurationPolicy manifest to the cluster.

Note

OpenShift Container Platform only supports the following bond modes:

  • mode=1 active-backup
  • mode=2 balance-xor
  • mode=4 802.3ad

Other bond modes are not supported.

The following YAML file is an example of a manifest for a bond interface. It includes sample values that you must replace with your own information.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond0-eth1-eth2-policy 1
spec:
  nodeSelector: 2
    kubernetes.io/hostname: <node01> 3
  desiredState:
    interfaces:
    - name: bond0 4
      description: Bond with ports eth1 and eth2 5
      type: bond 6
      state: up 7
      ipv4:
        dhcp: true 8
        enabled: true 9
      link-aggregation:
        mode: active-backup 10
        options:
          miimon: '140' 11
        port: 12
        - eth1
        - eth2
      mtu: 1450 13
1
Name of the policy.
2
Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
3
This example uses a hostname node selector.
4
Name of the interface.
5
Optional: Human-readable description of the interface.
6
The type of interface. This example creates a bond.
7
The requested state for the interface after creation.
8
Optional: If you do not use dhcp, you can either set a static IP or leave the interface without an IP address.
9
Enables ipv4 in this example.
10
The driver mode for the bond. This example uses an active backup mode.
11
Optional: This example uses miimon to inspect the bond link every 140ms.
12
The subordinate node NICs in the bond.
13
Optional: The maximum transmission unit (MTU) for the bond. If not specified, this value is set to 1500 by default.
29.2.5.4. Example: Ethernet interface node network configuration policy

Configure an Ethernet interface on nodes in the cluster by applying a NodeNetworkConfigurationPolicy manifest to the cluster.

The following YAML file is an example of a manifest for an Ethernet interface. It includes sample values that you must replace with your own information.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: eth1-policy 1
spec:
  nodeSelector: 2
    kubernetes.io/hostname: <node01> 3
  desiredState:
    interfaces:
    - name: eth1 4
      description: Configuring eth1 on node01 5
      type: ethernet 6
      state: up 7
      ipv4:
        dhcp: true 8
        enabled: true 9
1
Name of the policy.
2
Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
3
This example uses a hostname node selector.
4
Name of the interface.
5
Optional: Human-readable description of the interface.
6
The type of interface. This example creates an Ethernet networking interface.
7
The requested state for the interface after creation.
8
Optional: If you do not use dhcp, you can either set a static IP or leave the interface without an IP address.
9
Enables ipv4 in this example.
29.2.5.5. Example: Multiple interfaces in the same node network configuration policy

You can create multiple interfaces in the same node network configuration policy. These interfaces can reference each other, allowing you to build and deploy a network configuration by using a single policy manifest.

The following example YAML file creates a bond that is named bond10 across two NICs and a VLAN that is named bond10.103 that connects to the bond.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: bond-vlan 1
spec:
  nodeSelector: 2
    kubernetes.io/hostname: <node01> 3
  desiredState:
    interfaces:
    - name: bond10 4
      description: Bonding eth2 and eth3 5
      type: bond 6
      state: up 7
      link-aggregation:
        mode: balance-xor 8
        options:
          miimon: '140' 9
        port: 10
        - eth2
        - eth3
    - name: bond10.103 11
      description: vlan using bond10 12
      type: vlan 13
      state: up 14
      vlan:
         base-iface: bond10 15
         id: 103 16
      ipv4:
        dhcp: true 17
        enabled: true 18
1
Name of the policy.
2
Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster.
3
This example uses a hostname node selector.
4 11
Name of the interface.
5 12
Optional: Human-readable description of the interface.
6 13
The type of interface.
7 14
The requested state for the interface after creation.
8
The driver mode for the bond.
9
Optional: This example uses miimon to inspect the bond link every 140ms.
10
The subordinate node NICs in the bond.
15
The node NIC to which the VLAN is attached.
16
The VLAN tag.
17
Optional: If you do not use dhcp, you can either set a static IP or leave the interface without an IP address.
18
Enables ipv4 in this example.

29.2.6. Capturing the static IP of a NIC attached to a bridge

Important

Capturing the static IP of a NIC is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

29.2.6.1. Example: Linux bridge interface node network configuration policy to inherit static IP address from the NIC attached to the bridge

Create a Linux bridge interface on nodes in the cluster and transfer the static IP configuration of the NIC to the bridge by applying a single NodeNetworkConfigurationPolicy manifest to the cluster.

The following YAML file is an example of a manifest for a Linux bridge interface. It includes sample values that you must replace with your own information.

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: br1-eth1-copy-ipv4-policy 1
spec:
  nodeSelector: 2
    node-role.kubernetes.io/worker: ""
  capture:
    eth1-nic: interfaces.name=="eth1" 3
    eth1-routes: routes.running.next-hop-interface=="eth1"
    br1-routes: capture.eth1-routes | routes.running.next-hop-interface := "br1"
  desiredState:
    interfaces:
      - name: br1
        description: Linux bridge with eth1 as a port
        type: linux-bridge 4
        state: up
        ipv4: "{{ capture.eth1-nic.interfaces.0.ipv4 }}" 5
        bridge:
          options:
            stp:
              enabled: false
          port:
            - name: eth1 6
    routes:
        config: "{{ capture.br1-routes.routes.running }}"
1
The name of the policy.
2
Optional: If you do not include the nodeSelector parameter, the policy applies to all nodes in the cluster. This example uses the node-role.kubernetes.io/worker: "" node selector to select all worker nodes in the cluster.
3
The reference to the node NIC to which the bridge attaches.
4
The type of interface. This example creates a bridge.
5
The IP address of the bridge interface. This value matches the IP address of the NIC which is referenced by the spec.capture.eth1-nic entry.
6
The node NIC to which the bridge attaches.

29.2.7. Examples: IP management

The following example configuration snippets show different methods of IP management.

These examples use the ethernet interface type to simplify the example while showing the related context in the policy configuration. These IP management examples can be used with the other interface types.

29.2.7.1. Static

The following snippet statically configures an IP address on the Ethernet interface:

# ...
    interfaces:
    - name: eth1
      description: static IP on eth1
      type: ethernet
      state: up
      ipv4:
        dhcp: false
        address:
        - ip: 192.168.122.250 1
          prefix-length: 24
        enabled: true
# ...
1
Replace this value with the static IP address for the interface.
29.2.7.2. No IP address

The following snippet ensures that the interface has no IP address:

# ...
    interfaces:
    - name: eth1
      description: No IP on eth1
      type: ethernet
      state: up
      ipv4:
        enabled: false
# ...
29.2.7.3. Dynamic host configuration

The following snippet configures an Ethernet interface that uses a dynamic IP address, gateway address, and DNS:

# ...
    interfaces:
    - name: eth1
      description: DHCP on eth1
      type: ethernet
      state: up
      ipv4:
        dhcp: true
        enabled: true
# ...

The following snippet configures an Ethernet interface that uses a dynamic IP address but does not use a dynamic gateway address or DNS:

# ...
    interfaces:
    - name: eth1
      description: DHCP without gateway or DNS on eth1
      type: ethernet
      state: up
      ipv4:
        dhcp: true
        auto-gateway: false
        auto-dns: false
        enabled: true
# ...
29.2.7.4. DNS

By default, the nmstate API stores DNS values globally rather than storing them in a network interface. For certain situations, you must configure a network interface to store DNS values. To define a DNS configuration for a network interface, you must initially specify the dns-resolver section in the network interface’s YAML configuration file.

Tip

Setting a DNS configuration is comparable to modifying the /etc/resolv.conf file.

Important

You cannot use the br-ex bridge, an OVN-Kubernetes-managed Open vSwitch bridge, as the interface when you configure DNS resolvers.

The following example shows a default situation that stores DNS values globally:

  • Configure a static DNS without a network interface. Note that when updating the /etc/resolv.conf file on a host node, you do not need to specify an interface, IPv4 or IPv6, in the NodeNetworkConfigurationPolicy (NNCP) manifest.

    Example of a DNS configuration for a network interface that globally stores DNS values

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
     name: worker-0-dns-testing
    spec:
      nodeSelector:
        kubernetes.io/hostname: <target_node>
      desiredState:
        dns-resolver:
          config:
            search:
            - example.com
            - example.org
            server:
            - 2001:db8:f::1
            - 192.0.2.251
    # ...

The following examples show situations that require configuring a network interface to store DNS values:

  • If you want to rank a static DNS name server over a dynamic DNS name server, define the interface that runs either the Dynamic Host Configuration Protocol (DHCP) or the IPv6 Autoconfiguration (autoconf) mechanism in the network interface YAML configuration file.

    Example configuration that adds 192.0.2.1 to DNS name servers retrieved from the DHCPv4 network protocol

    # ...
    dns-resolver:
      config:
        server:
        - 192.0.2.1
    interfaces:
      - name: eth1
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: true
          auto-dns: true
    # ...

  • If you need to configure a network interface to store DNS values instead of adopting the default method, which uses the nmstate API to store DNS values globally, you can set static DNS values and static IP addresses in the network interface YAML file.

    Important

    Storing DNS values at the network interface level might cause name resolution issues after you attach the interface to network components, such as an Open vSwitch (OVS) bridge, a Linux bridge, or a bond.

    Example configuration that stores DNS values at the interface level

    # ...
    dns-resolver:
      config:
        search:
        - example.com
        - example.org
        server:
        - 2001:db8:1::d1
        - 2001:db8:1::d2
        - 192.0.2.1
    interfaces:
      - name: eth1
        type: ethernet
        state: up
        ipv4:
          address:
          - ip: 192.0.2.251
            prefix-length: 24
          dhcp: false
          enabled: true
        ipv6:
          address:
          - ip: 2001:db8:1::1
            prefix-length: 64
          dhcp: false
          enabled: true
          autoconf: false
    # ...

  • If you want to set static DNS search domains and dynamic DNS name servers for your network interface, define the dynamic interface that runs either the Dynamic Host Configuration Protocol (DHCP) or the IPv6 Autoconfiguration (autoconf) mechanism in the network interface YAML configuration file.

    Example configuration that sets example.com and example.org static DNS search domains along with dynamic DNS name server settings

    # ...
    dns-resolver:
      config:
        search:
        - example.com
        - example.org
        server: []
    interfaces:
      - name: eth1
        type: ethernet
        state: up
        ipv4:
          enabled: true
          dhcp: true
          auto-dns: true
        ipv6:
          enabled: true
          dhcp: true
          autoconf: true
          auto-dns: true
    # ...

29.2.7.5. Static routing

The following snippet configures a static route and a static IP address on the eth1 interface:

dns-resolver:
  config:
# ...
interfaces:
  - name: eth1
    description: Static routing on eth1
    type: ethernet
    state: up
    ipv4:
      dhcp: false
      enabled: true
      address:
      - ip: 192.0.2.251 1
        prefix-length: 24
routes:
  config:
  - destination: 198.51.100.0/24
    metric: 150
    next-hop-address: 192.0.2.1 2
    next-hop-interface: eth1
    table-id: 254
# ...
1
The static IP address for the Ethernet interface.
2
The next hop address for the node traffic. This address must be in the same subnet as the IP address that is set for the Ethernet interface.
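
After the policy is applied, you can optionally confirm the route from a debug shell on the node. The following is only a sketch of the kind of output to expect, assuming the example addresses in the previous snippet:

$ ip -4 route | grep 198.51.100.0

Example output

198.51.100.0/24 via 192.0.2.1 dev eth1 proto static metric 150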

29.3. Troubleshooting node network configuration

If the node network configuration encounters an issue, the policy is automatically rolled back and the enactments report failure. This includes issues such as:

  • The configuration fails to be applied on the host.
  • The host loses connection to the default gateway.
  • The host loses connection to the API server.

29.3.1. Troubleshooting an incorrect node network configuration policy configuration

You can apply changes to the node network configuration across your entire cluster by applying a node network configuration policy.

If you applied an incorrect configuration, you can use the following example to troubleshoot and correct the failed node network policy. The example attempts to apply a Linux bridge policy to a cluster that has three control plane nodes and three compute nodes. The policy is not applied because the policy references the wrong interface.

To find an error, you need to investigate the available NMState resources. You can then update the policy with the correct configuration.

Prerequisites

  • You ensured that an ens01 interface does not exist on your Linux system.

Procedure

  1. Create a policy on your cluster. The following example creates a simple bridge, br1, that has ens01 as its port:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: ens01-bridge-testfail
    spec:
      desiredState:
        interfaces:
          - name: br1
            description: Linux bridge with the wrong port
            type: linux-bridge
            state: up
            ipv4:
              dhcp: true
              enabled: true
            bridge:
              options:
                stp:
                  enabled: false
              port:
                - name: ens01
    # ...
  2. Apply the policy to your network interface:

    $ oc apply -f ens01-bridge-testfail.yaml

    Example output

    nodenetworkconfigurationpolicy.nmstate.io/ens01-bridge-testfail created

  3. Verify the status of the policy by running the following command:

    $ oc get nncp

    The output shows that the policy failed:

    Example output

    NAME                    STATUS
    ens01-bridge-testfail   FailedToConfigure

    The policy status alone does not indicate if it failed on all nodes or a subset of nodes.

  4. List the node network configuration enactments to see if the policy was successful on any of the nodes. If the policy failed for only a subset of nodes, the output suggests that the problem is with a specific node configuration. If the policy failed on all nodes, the output suggests that the problem is with the policy.

    $ oc get nnce

    The output shows that the policy failed on all nodes:

    Example output

    NAME                                         STATUS
    control-plane-1.ens01-bridge-testfail        FailedToConfigure
    control-plane-2.ens01-bridge-testfail        FailedToConfigure
    control-plane-3.ens01-bridge-testfail        FailedToConfigure
    compute-1.ens01-bridge-testfail              FailedToConfigure
    compute-2.ens01-bridge-testfail              FailedToConfigure
    compute-3.ens01-bridge-testfail              FailedToConfigure

  5. View one of the failed enactments. The following command uses the jsonpath output format to filter the output:

    $ oc get nnce compute-1.ens01-bridge-testfail -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'

    Example output

    [2024-10-10T08:40:46Z INFO  nmstatectl] Nmstate version: 2.2.37
    NmstateError: InvalidArgument: Controller interface br1 is holding unknown port ens01

    The previous example shows the output from an InvalidArgument error that indicates that ens01 is an unknown port. For this example, you might need to change the port configuration in the policy configuration file.

  6. To ensure that the policy is configured properly, view the network configuration for one or all of the nodes by requesting the NodeNetworkState object. The following command returns the network configuration for the control-plane-1 node:

    $ oc get nns control-plane-1 -o yaml

    The output shows that the interface name on the nodes is ens1 but the failed policy incorrectly uses ens01:

    Example output

       - ipv4:
    # ...
          name: ens1
          state: up
          type: ethernet

  7. Correct the error by editing the existing policy:

    $ oc edit nncp ens01-bridge-testfail
    # ...
              port:
                - name: ens1

    Save the policy to apply the correction.

  8. Check the status of the policy to ensure it updated successfully:

    $ oc get nncp

    Example output

    NAME                    STATUS
    ens01-bridge-testfail   SuccessfullyConfigured

    The updated policy is successfully configured on all nodes in the cluster.

29.3.2. Troubleshooting DNS connectivity issues in a disconnected environment

If you experience DNS connectivity issues when configuring nmstate in a disconnected environment, you can configure the DNS server to resolve the list of name servers for the domain root-servers.net.

Important

Ensure that the DNS server includes a name server (NS) entry for the root-servers.net zone. The DNS server does not need to forward a query to an upstream resolver, but the server must return a correct answer for the NS query.

29.3.2.1. Configuring the bind9 DNS named server

For a cluster configured to query a bind9 DNS server, you can add the root-servers.net zone to a configuration file that contains at least one NS record. For example, you can use the /var/named/named.localhost file as a zone file that already meets this criterion.
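
A zone file meets this criterion if it contains a start of authority (SOA) record and at least one NS record. The following sketch shows the general shape of such a file; it is not necessarily the literal content of the /var/named/named.localhost file on your system:

$TTL 1D
@       IN SOA  @ rname.invalid. (
                0       ; serial
                1D      ; refresh
                1H      ; retry
                1W      ; expire
                3H )    ; minimum
        NS      @
        A       127.0.0.1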

Procedure

  1. Add the root-servers.net zone at the end of the /etc/named.conf configuration file by running the following command:

    $ cat >> /etc/named.conf <<EOF
    zone "root-servers.net" IN {
        type master;
        file "named.localhost";
    };
    EOF
  2. Restart the named service by running the following command:

    $ systemctl restart named
  3. Confirm that the root-servers.net zone is present by running the following command:

    $ journalctl -u named | grep root-servers.net

    Example output

    Jul 03 15:16:26 rhel-8-10 bash[xxxx]: zone root-servers.net/IN: loaded serial 0
    Jul 03 15:16:26 rhel-8-10 named[xxxx]: zone root-servers.net/IN: loaded serial 0

  4. Verify that the DNS server can resolve the NS record for the root-servers.net domain by running the following command:

    $ host -t NS root-servers.net. 127.0.0.1

    Example output

    Using domain server:
    Name: 127.0.0.1
    Address: 127.0.0.53
    Aliases:
    root-servers.net name server root-servers.net.

29.3.2.2. Configuring the dnsmasq DNS server

If you are using dnsmasq as the DNS server, you can delegate resolution of the root-servers.net domain to another DNS server. To do so, create a configuration file that resolves root-servers.net by using a DNS server that you specify.

Procedure

  1. Create a configuration file that delegates the domain root-servers.net to another DNS server by running the following command:

    $ echo 'server=/root-servers.net/<DNS_server_IP>' > /etc/dnsmasq.d/delegate-root-servers.net.conf
  2. Restart the dnsmasq service by running the following command:

    $ systemctl restart dnsmasq
  3. Confirm that the root-servers.net domain is delegated to another DNS server by running the following command:

    $ journalctl -u dnsmasq | grep root-servers.net

    Example output

    Jul 03 15:31:25 rhel-8-10 dnsmasq[1342]: using nameserver 192.168.1.1#53 for domain root-servers.net

  4. Verify that the DNS server can resolve the NS record for the root-servers.net domain by running the following command:

    $ host -t NS root-servers.net. 127.0.0.1

    Example output

    Using domain server:
    Name: 127.0.0.1
    Address: 127.0.0.1#53
    Aliases:
    root-servers.net name server root-servers.net.

Chapter 30. Configuring the cluster-wide proxy

Production environments can deny direct access to the internet and instead have an HTTP or HTTPS proxy available. You can configure OpenShift Container Platform to use a proxy by modifying the Proxy object for existing clusters or by configuring the proxy settings in the install-config.yaml file for new clusters.

30.1. Prerequisites

  • Review the sites that your cluster requires access to and determine whether any of them must bypass the proxy. By default, all cluster system egress traffic is proxied, including calls to the cloud provider API for the cloud that hosts your cluster. The system-wide proxy affects system components only, not user workloads. Add sites to the Proxy object’s spec.noProxy field to bypass the proxy if necessary.

    Note

    The Proxy object status.noProxy field is populated with the values of the networking.machineNetwork[].cidr, networking.clusterNetwork[].cidr, and networking.serviceNetwork[] fields from your installation configuration for most installation types.

    For installations on Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, and Red Hat OpenStack Platform (RHOSP), the Proxy object status.noProxy field is also populated with the instance metadata endpoint (169.254.169.254).

    Important

    If your installation type does not include setting the networking.machineNetwork[].cidr field, you must include the machine IP addresses manually in the .status.noProxy field to make sure that the traffic between nodes can bypass the proxy.
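
    For example, you can review the values that the cluster currently excludes from proxying by querying the status of the Proxy object. This is one way to confirm that the expected machine, cluster, and service network ranges are present:

    $ oc get proxy/cluster -o jsonpath='{.status.noProxy}'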

30.2. Enabling the cluster-wide proxy

The Proxy object is used to manage the cluster-wide egress proxy. When a cluster is installed or upgraded without the proxy configured, a Proxy object is still generated but it will have a nil spec. For example:

apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: ""
status:

A cluster administrator can configure the proxy for OpenShift Container Platform by modifying this cluster Proxy object.

Note

Only the Proxy object named cluster is supported, and no additional proxies can be created.

Warning

Enabling the cluster-wide proxy causes the Machine Config Operator (MCO) to trigger a node reboot.

Prerequisites

  • Cluster administrator permissions
  • OpenShift Container Platform oc CLI tool installed

Procedure

  1. Create a config map that contains any additional CA certificates required for proxying HTTPS connections.

    Note

    You can skip this step if the proxy’s identity certificate is signed by an authority from the RHCOS trust bundle.

    1. Create a file called user-ca-bundle.yaml with the following contents, and provide the values of your PEM-encoded certificates:

      apiVersion: v1
      data:
        ca-bundle.crt: | 1
          <MY_PEM_ENCODED_CERTS> 2
      kind: ConfigMap
      metadata:
        name: user-ca-bundle 3
        namespace: openshift-config 4
      1
      This data key must be named ca-bundle.crt.
      2
      One or more PEM-encoded X.509 certificates used to sign the proxy’s identity certificate.
      3
      The config map name that will be referenced from the Proxy object.
      4
      The config map must be in the openshift-config namespace.
    2. Create the config map from this file:

      $ oc create -f user-ca-bundle.yaml
  2. Use the oc edit command to modify the Proxy object:

    $ oc edit proxy/cluster
  3. Configure the necessary fields for the proxy:

    apiVersion: config.openshift.io/v1
    kind: Proxy
    metadata:
      name: cluster
    spec:
      httpProxy: http://<username>:<pswd>@<ip>:<port> 1
      httpsProxy: https://<username>:<pswd>@<ip>:<port> 2
      noProxy: example.com 3
      readinessEndpoints:
      - http://www.google.com 4
      - https://www.google.com
      trustedCA:
        name: user-ca-bundle 5
    1
    A proxy URL to use for creating HTTP connections outside the cluster. The URL scheme must be http.
    2
    A proxy URL to use for creating HTTPS connections outside the cluster. The URL scheme must be either http or https. Specify a URL for the proxy that supports the URL scheme. For example, most proxies will report an error if they are configured to use https but they only support http. This failure message may not propagate to the logs and can appear to be a network connection failure instead. If using a proxy that listens for https connections from the cluster, you may need to configure the cluster to accept the CAs and certificates that the proxy uses.
    3
    A comma-separated list of destination domain names, IP addresses, or other network CIDRs to exclude from proxying.

    Preface a domain with . to match subdomains only. For example, .y.com matches x.y.com, but not y.com. Use * to bypass proxy for all destinations. If you scale up workers that are not included in the network defined by the networking.machineNetwork[].cidr field from the installation configuration, you must add them to this list to prevent connection issues.

    This field is ignored if neither the httpProxy nor the httpsProxy field is set.

    4
    One or more URLs external to the cluster to use to perform a readiness check before writing the httpProxy and httpsProxy values to status.
    5
    A reference to the config map in the openshift-config namespace that contains additional CA certificates required for proxying HTTPS connections. Note that the config map must already exist before referencing it here. This field is required unless the proxy’s identity certificate is signed by an authority from the RHCOS trust bundle.
  4. Save the file to apply the changes.

30.3. Removing the cluster-wide proxy

The cluster Proxy object cannot be deleted. To remove the proxy from a cluster, remove all spec fields from the Proxy object.

Prerequisites

  • Cluster administrator permissions
  • OpenShift Container Platform oc CLI tool installed

Procedure

  1. Use the oc edit command to modify the proxy:

    $ oc edit proxy/cluster
  2. Remove all spec fields from the Proxy object. For example:

    apiVersion: config.openshift.io/v1
    kind: Proxy
    metadata:
      name: cluster
    spec: {}
  3. Save the file to apply the changes.

Chapter 31. Configuring a custom PKI

Some platform components, such as the web console, use Routes for communication and must trust other components' certificates to interact with them. If you are using a custom public key infrastructure (PKI), you must configure it so its privately signed CA certificates are recognized across the cluster.

You can leverage the Proxy API to add cluster-wide trusted CA certificates. You must do this either during installation or at runtime.

  • During installation, configure the cluster-wide proxy. You must define your privately signed CA certificates in the install-config.yaml file’s additionalTrustBundle setting.

    The installation program generates a ConfigMap that is named user-ca-bundle that contains the additional CA certificates you defined. The Cluster Network Operator then creates a trusted-ca-bundle ConfigMap that merges these CA certificates with the Red Hat Enterprise Linux CoreOS (RHCOS) trust bundle; this ConfigMap is referenced in the Proxy object’s trustedCA field.

  • At runtime, modify the default Proxy object to include your privately signed CA certificates (as part of the cluster’s proxy enablement workflow). This involves creating a ConfigMap that contains the privately signed CA certificates that the cluster should trust, and then modifying the proxy resource so that its trustedCA field references that ConfigMap.
Note

The installer configuration’s additionalTrustBundle field and the proxy resource’s trustedCA field are used to manage the cluster-wide trust bundle; additionalTrustBundle is used at install time and the proxy’s trustedCA is used at runtime.

The trustedCA field is a reference to a ConfigMap that contains the user-provided trusted certificate authority (CA) bundle.

31.1. Configuring the cluster-wide proxy during installation

Production environments can deny direct access to the internet and instead have an HTTP or HTTPS proxy available. You can configure a new OpenShift Container Platform cluster to use a proxy by configuring the proxy settings in the install-config.yaml file.

Prerequisites

  • You have an existing install-config.yaml file.
  • You reviewed the sites that your cluster requires access to and determined whether any of them need to bypass the proxy. By default, all cluster egress traffic is proxied, including calls to hosting cloud provider APIs. You added sites to the Proxy object’s spec.noProxy field to bypass the proxy if necessary.

    Note

    The Proxy object status.noProxy field is populated with the values of the networking.machineNetwork[].cidr, networking.clusterNetwork[].cidr, and networking.serviceNetwork[] fields from your installation configuration.

    For installations on Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, and Red Hat OpenStack Platform (RHOSP), the Proxy object status.noProxy field is also populated with the instance metadata endpoint (169.254.169.254).

Procedure

  1. Edit your install-config.yaml file and add the proxy settings. For example:

    apiVersion: v1
    baseDomain: my.domain.com
    proxy:
      httpProxy: http://<username>:<pswd>@<ip>:<port> 1
      httpsProxy: https://<username>:<pswd>@<ip>:<port> 2
      noProxy: ec2.<aws_region>.amazonaws.com,elasticloadbalancing.<aws_region>.amazonaws.com,s3.<aws_region>.amazonaws.com 3
    additionalTrustBundle: | 4
        -----BEGIN CERTIFICATE-----
        <MY_TRUSTED_CA_CERT>
        -----END CERTIFICATE-----
    additionalTrustBundlePolicy: <policy_to_add_additionalTrustBundle> 5
    1
    A proxy URL to use for creating HTTP connections outside the cluster. The URL scheme must be http.
    2
    A proxy URL to use for creating HTTPS connections outside the cluster.
    3
    A comma-separated list of destination domain names, IP addresses, or other network CIDRs to exclude from proxying. Preface a domain with . to match subdomains only. For example, .y.com matches x.y.com, but not y.com. Use * to bypass the proxy for all destinations. If you have added the Amazon EC2, Elastic Load Balancing, and S3 VPC endpoints to your VPC, you must add these endpoints to the noProxy field.
    4
    If provided, the installation program generates a config map that is named user-ca-bundle in the openshift-config namespace that contains one or more additional CA certificates that are required for proxying HTTPS connections. The Cluster Network Operator then creates a trusted-ca-bundle config map that merges these contents with the Red Hat Enterprise Linux CoreOS (RHCOS) trust bundle, and this config map is referenced in the trustedCA field of the Proxy object. The additionalTrustBundle field is required unless the proxy’s identity certificate is signed by an authority from the RHCOS trust bundle.
    5
    Optional: The policy to determine the configuration of the Proxy object to reference the user-ca-bundle config map in the trustedCA field. The allowed values are Proxyonly and Always. Use Proxyonly to reference the user-ca-bundle config map only when http/https proxy is configured. Use Always to always reference the user-ca-bundle config map. The default value is Proxyonly.
    Note

    The installation program does not support the proxy readinessEndpoints field.

    Note

    If the installer times out, restart and then complete the deployment by using the wait-for command of the installer. For example:

    $ ./openshift-install wait-for install-complete --log-level debug
  2. Save the file and reference it when installing OpenShift Container Platform.

The installation program creates a cluster-wide proxy that is named cluster that uses the proxy settings in the provided install-config.yaml file. If no proxy settings are provided, a cluster Proxy object is still created, but it will have a nil spec.

Note

Only the Proxy object named cluster is supported, and no additional proxies can be created.

31.2. Enabling the cluster-wide proxy

The Proxy object is used to manage the cluster-wide egress proxy. When a cluster is installed or upgraded without the proxy configured, a Proxy object is still generated but it will have a nil spec. For example:

apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  name: cluster
spec:
  trustedCA:
    name: ""
status:

A cluster administrator can configure the proxy for OpenShift Container Platform by modifying this cluster Proxy object.

Note

Only the Proxy object named cluster is supported, and no additional proxies can be created.

Warning

Enabling the cluster-wide proxy causes the Machine Config Operator (MCO) to trigger a node reboot.

Prerequisites

  • Cluster administrator permissions
  • OpenShift Container Platform oc CLI tool installed

Procedure

  1. Create a config map that contains any additional CA certificates required for proxying HTTPS connections.

    Note

    You can skip this step if the proxy’s identity certificate is signed by an authority from the RHCOS trust bundle.

    1. Create a file called user-ca-bundle.yaml with the following contents, and provide the values of your PEM-encoded certificates:

      apiVersion: v1
      data:
        ca-bundle.crt: | 1
          <MY_PEM_ENCODED_CERTS> 2
      kind: ConfigMap
      metadata:
        name: user-ca-bundle 3
        namespace: openshift-config 4
      1
      This data key must be named ca-bundle.crt.
      2
      One or more PEM-encoded X.509 certificates used to sign the proxy’s identity certificate.
      3
      The config map name that will be referenced from the Proxy object.
      4
      The config map must be in the openshift-config namespace.
    2. Create the config map from this file:

      $ oc create -f user-ca-bundle.yaml
  2. Use the oc edit command to modify the Proxy object:

    $ oc edit proxy/cluster
  3. Configure the necessary fields for the proxy:

    apiVersion: config.openshift.io/v1
    kind: Proxy
    metadata:
      name: cluster
    spec:
      httpProxy: http://<username>:<pswd>@<ip>:<port> 1
      httpsProxy: https://<username>:<pswd>@<ip>:<port> 2
      noProxy: example.com 3
      readinessEndpoints:
      - http://www.google.com 4
      - https://www.google.com
      trustedCA:
        name: user-ca-bundle 5
    1
    A proxy URL to use for creating HTTP connections outside the cluster. The URL scheme must be http.
    2
    A proxy URL to use for creating HTTPS connections outside the cluster. The URL scheme must be either http or https. Specify a URL for the proxy that supports the URL scheme. For example, most proxies will report an error if they are configured to use https but they only support http. This failure message may not propagate to the logs and can appear to be a network connection failure instead. If using a proxy that listens for https connections from the cluster, you may need to configure the cluster to accept the CAs and certificates that the proxy uses.
    3
    A comma-separated list of destination domain names, IP addresses, or other network CIDRs to exclude from proxying.

    Preface a domain with . to match subdomains only. For example, .y.com matches x.y.com, but not y.com. Use * to bypass proxy for all destinations. If you scale up workers that are not included in the network defined by the networking.machineNetwork[].cidr field from the installation configuration, you must add them to this list to prevent connection issues.

    This field is ignored if neither the httpProxy nor the httpsProxy field is set.

    4
    One or more URLs external to the cluster to use to perform a readiness check before writing the httpProxy and httpsProxy values to status.
    5
    A reference to the config map in the openshift-config namespace that contains additional CA certificates required for proxying HTTPS connections. Note that the config map must already exist before referencing it here. This field is required unless the proxy’s identity certificate is signed by an authority from the RHCOS trust bundle.
  4. Save the file to apply the changes.

31.3. Certificate injection using Operators

After your custom CA certificate is added to the cluster by using a ConfigMap, the Cluster Network Operator merges the user-provided and system CA certificates into a single bundle and injects the merged bundle into the Operator that requests the trust bundle injection.

Important

After adding a config.openshift.io/inject-trusted-cabundle="true" label to the config map, existing data in it is deleted. The Cluster Network Operator takes ownership of a config map and only accepts ca-bundle as data. You must use a separate config map to store service-ca.crt by using the service.beta.openshift.io/inject-cabundle=true annotation or a similar configuration. Adding a config.openshift.io/inject-trusted-cabundle="true" label and service.beta.openshift.io/inject-cabundle=true annotation on the same config map can cause issues.

Operators request this injection by creating an empty ConfigMap with the following label:

config.openshift.io/inject-trusted-cabundle="true"

An example of the empty ConfigMap:

apiVersion: v1
data: {}
kind: ConfigMap
metadata:
  labels:
    config.openshift.io/inject-trusted-cabundle: "true"
  name: ca-inject 1
  namespace: apache
1
Specifies the empty ConfigMap name.

The Operator mounts this ConfigMap into the container’s local trust store.

Note

Adding a trusted CA certificate is only needed if the certificate is not included in the Red Hat Enterprise Linux CoreOS (RHCOS) trust bundle.

Certificate injection is not limited to Operators. The Cluster Network Operator injects certificates across any namespace when an empty ConfigMap is created with the config.openshift.io/inject-trusted-cabundle=true label.
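
For example, you can create and label such an empty ConfigMap from the command line. The name and namespace in the following sketch are chosen to match the example deployment that follows and are not required values:

$ oc create configmap trusted-ca -n my-example-custom-ca-ns

$ oc label configmap trusted-ca -n my-example-custom-ca-ns config.openshift.io/inject-trusted-cabundle=true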

The ConfigMap can reside in any namespace, but the ConfigMap must be mounted as a volume to each container within a pod that requires a custom CA. For example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-example-custom-ca-deployment
  namespace: my-example-custom-ca-ns
spec:
  ...
    spec:
      ...
      containers:
        - name: my-container-that-needs-custom-ca
          volumeMounts:
          - name: trusted-ca
            mountPath: /etc/pki/ca-trust/extracted/pem
            readOnly: true
      volumes:
      - name: trusted-ca
        configMap:
          name: trusted-ca
          items:
            - key: ca-bundle.crt 1
              path: tls-ca-bundle.pem 2
1
ca-bundle.crt is required as the ConfigMap key.
2
tls-ca-bundle.pem is required as the ConfigMap path.

Chapter 32. Load balancing on RHOSP

32.1. Limitations of load balancer services

OpenShift Container Platform clusters on Red Hat OpenStack Platform (RHOSP) use Octavia to handle load balancer services. As a result of this choice, such clusters have a number of functional limitations.

RHOSP Octavia has two supported providers: Amphora and OVN. These providers differ in terms of available features as well as implementation details. These distinctions affect load balancer services that are created on your cluster.

32.1.1. Local external traffic policies

You can set the external traffic policy (ETP) parameter, .spec.externalTrafficPolicy, on a load balancer service to preserve the source IP address of incoming traffic when it reaches service endpoint pods. However, if your cluster uses the Amphora Octavia provider, the source IP of the traffic is replaced with the IP address of the Amphora VM. This behavior does not occur if your cluster uses the OVN Octavia provider.

Having the ETP option set to Local requires that health monitors be created for the load balancer. Without health monitors, traffic can be routed to a node that doesn’t have a functional endpoint, which causes the connection to drop. To force Cloud Provider OpenStack to create health monitors, you must set the value of the create-monitor option in the cloud provider configuration to true.
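
As a sketch, the option belongs in the [LoadBalancer] section of the OpenStack cloud provider configuration. Where that configuration is stored and how it is applied depend on your deployment, so treat the following snippet as an illustration only:

# Excerpt from the OpenStack cloud provider configuration
[LoadBalancer]
create-monitor = true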

In RHOSP 16.2, the OVN Octavia provider does not support health monitors. Therefore, setting the ETP to Local is unsupported.

In RHOSP 16.2, the Amphora Octavia provider does not support HTTP monitors on UDP pools. As a result, UDP load balancer services have UDP-CONNECT monitors created instead. Due to implementation details, this configuration only functions properly with the OVN-Kubernetes CNI plugin. When the OpenShift SDN CNI plugin is used, detection of the nodes that have live UDP services is unreliable.

32.1.2. Load balancer source ranges

Use the .spec.loadBalancerSourceRanges property to restrict the traffic that can pass through the load balancer according to source IP. This property is supported for use with the Amphora Octavia provider only. If your cluster uses the OVN Octavia provider, the option is ignored and traffic is unrestricted.
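
The following minimal sketch shows a LoadBalancer service that restricts incoming traffic to a single source range; the service name, selector, ports, and address range are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: example-lb
spec:
  type: LoadBalancer
  selector:
    app: example
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  loadBalancerSourceRanges:
  - 192.0.2.0/24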

32.2. Using the Octavia OVN load balancer provider driver with Kuryr SDN

Important

Kuryr is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.

For the most recent list of major functionality that has been deprecated or removed within OpenShift Container Platform, refer to the Deprecated and removed features section of the OpenShift Container Platform release notes.

If your OpenShift Container Platform cluster uses Kuryr and was installed on a Red Hat OpenStack Platform (RHOSP) 13 cloud that was later upgraded to RHOSP 16, you can configure it to use the Octavia OVN provider driver.

Important

Kuryr replaces existing load balancers after you change provider drivers. This process results in some downtime.

Prerequisites

  • Install the RHOSP CLI, openstack.
  • Install the OpenShift Container Platform CLI, oc.
  • Verify that the Octavia OVN driver on RHOSP is enabled.

    Tip

    To view a list of available Octavia drivers, on a command line, enter openstack loadbalancer provider list.

    The ovn driver is displayed in the command’s output.

Procedure

To change from the Octavia Amphora provider driver to Octavia OVN:

  1. Open the kuryr-config ConfigMap. On a command line, enter:

    $ oc -n openshift-kuryr edit cm kuryr-config
  2. In the ConfigMap, delete the line that contains kuryr-octavia-provider: default. For example:

    ...
    kind: ConfigMap
    metadata:
      annotations:
        networkoperator.openshift.io/kuryr-octavia-provider: default 1
    ...
    1
    Delete this line. The cluster will regenerate it with ovn as the value.

    Wait for the Cluster Network Operator to detect the modification and to redeploy the kuryr-controller and kuryr-cni pods. This process might take several minutes.

  3. Verify that the kuryr-config ConfigMap annotation is present with ovn as its value. On a command line, enter:

    $ oc -n openshift-kuryr edit cm kuryr-config

    The ovn provider value is displayed in the output:

    ...
    kind: ConfigMap
    metadata:
      annotations:
        networkoperator.openshift.io/kuryr-octavia-provider: ovn
    ...
  4. Verify that RHOSP recreated its load balancers.

    1. On a command line, enter:

      $ openstack loadbalancer list | grep amphora

      A single Amphora load balancer is displayed. For example:

      a4db683b-2b7b-4988-a582-c39daaad7981 | ostest-7mbj6-kuryr-api-loadbalancer  | 84c99c906edd475ba19478a9a6690efd | 172.30.0.1     | ACTIVE              | amphora
    2. Search for ovn load balancers by entering:

      $ openstack loadbalancer list | grep ovn

      The remaining load balancers of the ovn type are displayed. For example:

      2dffe783-98ae-4048-98d0-32aa684664cc | openshift-apiserver-operator/metrics | 84c99c906edd475ba19478a9a6690efd | 172.30.167.119 | ACTIVE              | ovn
      0b1b2193-251f-4243-af39-2f99b29d18c5 | openshift-etcd/etcd                  | 84c99c906edd475ba19478a9a6690efd | 172.30.143.226 | ACTIVE              | ovn
      f05b07fc-01b7-4673-bd4d-adaa4391458e | openshift-dns-operator/metrics       | 84c99c906edd475ba19478a9a6690efd | 172.30.152.27  | ACTIVE              | ovn

32.3. Scaling clusters for application traffic by using Octavia

OpenShift Container Platform clusters that run on Red Hat OpenStack Platform (RHOSP) can use the Octavia load balancing service to distribute traffic across multiple virtual machines (VMs) or floating IP addresses. This feature mitigates the bottleneck that single machines or addresses create.

If your cluster uses Kuryr, the Cluster Network Operator created an internal Octavia load balancer at deployment. You can use this load balancer for application network scaling.

If your cluster does not use Kuryr, you must create your own Octavia load balancer to use it for application network scaling.

32.3.1. Scaling clusters by using Octavia

If you want to use multiple API load balancers, or if your cluster does not use Kuryr, create an Octavia load balancer and then configure your cluster to use it.

Prerequisites

  • Octavia is available on your Red Hat OpenStack Platform (RHOSP) deployment.

Procedure

  1. From a command line, create an Octavia load balancer that uses the Amphora driver:

    $ openstack loadbalancer create --name API_OCP_CLUSTER --vip-subnet-id <id_of_worker_vms_subnet>

    You can use a name of your choice instead of API_OCP_CLUSTER.

  2. After the load balancer becomes active, create listeners:

    $ openstack loadbalancer listener create --name API_OCP_CLUSTER_6443 --protocol HTTPS --protocol-port 6443 API_OCP_CLUSTER
    Note

    To view the status of the load balancer, enter openstack loadbalancer list.

  3. Create a pool that uses the round robin algorithm and has session persistence enabled:

    $ openstack loadbalancer pool create --name API_OCP_CLUSTER_pool_6443 --lb-algorithm ROUND_ROBIN --session-persistence type=SOURCE_IP --listener API_OCP_CLUSTER_6443 --protocol HTTPS
  4. To ensure that control plane machines are available, create a health monitor:

    $ openstack loadbalancer healthmonitor create --delay 5 --max-retries 4 --timeout 10 --type TCP API_OCP_CLUSTER_pool_6443
  5. Add the control plane machines as members of the load balancer pool:

    $ for SERVER in MASTER-0-IP MASTER-1-IP MASTER-2-IP
    do
      openstack loadbalancer member create --address $SERVER  --protocol-port 6443 API_OCP_CLUSTER_pool_6443
    done
  6. Optional: To reuse the cluster API floating IP address, unset it:

    $ openstack floating ip unset $API_FIP
  7. Add either the unset API_FIP or a new address to the created load balancer VIP:

    $ openstack floating ip set  --port $(openstack loadbalancer show -c <vip_port_id> -f value API_OCP_CLUSTER) $API_FIP

Your cluster now uses Octavia for load balancing.

Note

If Kuryr uses the Octavia Amphora driver, all traffic is routed through a single Amphora virtual machine (VM).

You can repeat this procedure to create additional load balancers, which can alleviate the bottleneck.

32.3.2. Scaling clusters that use Kuryr by using Octavia

Important

Kuryr is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.

For the most recent list of major functionality that has been deprecated or removed within OpenShift Container Platform, refer to the Deprecated and removed features section of the OpenShift Container Platform release notes.

If your cluster uses Kuryr, associate the API floating IP address of your cluster with the pre-existing Octavia load balancer.

Prerequisites

  • Your OpenShift Container Platform cluster uses Kuryr.
  • Octavia is available on your Red Hat OpenStack Platform (RHOSP) deployment.

Procedure

  1. Optional: From a command line, to reuse the cluster API floating IP address, unset it:

    $ openstack floating ip unset $API_FIP
  2. Add either the unset API_FIP or a new address to the created load balancer VIP:

    $ openstack floating ip set --port $(openstack loadbalancer show -c <vip_port_id> -f value ${OCP_CLUSTER}-kuryr-api-loadbalancer) $API_FIP

Your cluster now uses Octavia for load balancing.

Note

If Kuryr uses the Octavia Amphora driver, all traffic is routed through a single Amphora virtual machine (VM).

You can repeat this procedure to create additional load balancers, which can alleviate the bottleneck.

32.4. Scaling for ingress traffic by using RHOSP Octavia

Important

Kuryr is a deprecated feature. Deprecated functionality is still included in OpenShift Container Platform and continues to be supported; however, it will be removed in a future release of this product and is not recommended for new deployments.

For the most recent list of major functionality that has been deprecated or removed within OpenShift Container Platform, refer to the Deprecated and removed features section of the OpenShift Container Platform release notes.

You can use Octavia load balancers to scale Ingress controllers on clusters that use Kuryr.

Prerequisites

  • Your OpenShift Container Platform cluster uses Kuryr.
  • Octavia is available on your RHOSP deployment.

Procedure

  1. To copy the current internal router service, on a command line, enter:

    $ oc -n openshift-ingress get svc router-internal-default -o yaml > external_router.yaml
  2. In the external_router.yaml file, change the value of metadata.name to a descriptive name, such as router-external-default, and change the value of spec.type to LoadBalancer.

    Example router file

    apiVersion: v1
    kind: Service
    metadata:
      labels:
        ingresscontroller.operator.openshift.io/owning-ingresscontroller: default
      name: router-external-default 1
      namespace: openshift-ingress
    spec:
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: http
      - name: https
        port: 443
        protocol: TCP
        targetPort: https
      - name: metrics
        port: 1936
        protocol: TCP
        targetPort: 1936
      selector:
        ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
      sessionAffinity: None
      type: LoadBalancer 2

    1
    Ensure that this value is descriptive, like router-external-default.
    2
    Ensure that this value is LoadBalancer.
Note

You can delete timestamps and other information that is irrelevant to load balancing.

  3. From a command line, create a service from the external_router.yaml file:

    $ oc apply -f external_router.yaml
  4. Verify that the external IP address of the service is the same as the one that is associated with the load balancer:

    1. On a command line, retrieve the external IP address of the service:

      $ oc -n openshift-ingress get svc

      Example output

      NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)                                     AGE
      router-external-default   LoadBalancer   172.30.235.33    10.46.22.161   80:30112/TCP,443:32359/TCP,1936:30317/TCP   3m38s
      router-internal-default   ClusterIP      172.30.115.123   <none>         80/TCP,443/TCP,1936/TCP                     22h

    2. Retrieve the IP address of the load balancer:

      $ openstack loadbalancer list | grep router-external

      Example output

      | 21bf6afe-b498-4a16-a958-3229e83c002c | openshift-ingress/router-external-default | 66f3816acf1b431691b8d132cc9d793c | 172.30.235.33  | ACTIVE | octavia |

    3. Verify that the addresses you retrieved in the previous steps are associated with each other in the floating IP list:

      $ openstack floating ip list | grep 172.30.235.33

      Example output

      | e2f80e97-8266-4b69-8636-e58bacf1879e | 10.46.22.161 | 172.30.235.33 | 655e7122-806a-4e0a-a104-220c6e17bda6 | a565e55a-99e7-4d15-b4df-f9d7ee8c9deb | 66f3816acf1b431691b8d132cc9d793c |

You can now use the value of EXTERNAL-IP as the new Ingress address.

Note

If Kuryr uses the Octavia Amphora driver, all traffic is routed through a single Amphora virtual machine (VM).

You can repeat this procedure to create additional load balancers, which can alleviate the bottleneck.

32.5. Services for an external load balancer

You can configure an OpenShift Container Platform cluster on Red Hat OpenStack Platform (RHOSP) to use an external load balancer in place of the default load balancer.

Important

Configuring an external load balancer depends on your vendor’s load balancer.

The information and examples in this section are for guideline purposes only. Consult the vendor documentation for more specific information about the vendor’s load balancer.

Red Hat supports the following services for an external load balancer:

  • Ingress Controller
  • OpenShift API
  • OpenShift MachineConfig API

You can choose whether you want to configure one or all of these services for an external load balancer. Configuring only the Ingress Controller service is a common configuration option. To better understand each service, view the following diagrams:

Figure 32.1. Example network workflow that shows an Ingress Controller operating in an OpenShift Container Platform environment


Figure 32.2. Example network workflow that shows an OpenShift API operating in an OpenShift Container Platform environment


Figure 32.3. Example network workflow that shows an OpenShift MachineConfig API operating in an OpenShift Container Platform environment


The following configuration options are supported for external load balancers:

  • Use a node selector to map the Ingress Controller to a specific set of nodes. You must assign a static IP address to each node in this set, or configure each node to receive the same IP address from the Dynamic Host Configuration Protocol (DHCP). Infrastructure nodes commonly receive this type of configuration.
  • Target all IP addresses on a subnet. This configuration can reduce maintenance overhead, because you can create and destroy nodes within those networks without reconfiguring the load balancer targets. If you deploy your ingress pods by using a machine set on a smaller network, such as a /27 or /28, you can simplify your load balancer targets.

    Tip

    You can list all IP addresses that exist in a network by checking the machine config pool’s resources.

Before you configure an external load balancer for your OpenShift Container Platform cluster, consider the following information:

  • You can use the same front-end IP address for the Ingress Controller’s load balancer and the API load balancer. Check your vendor’s documentation for this capability.
  • For a back-end IP address, ensure that an IP address for an OpenShift Container Platform control plane node does not change during the lifetime of the external load balancer. You can achieve this by completing one of the following actions:

    • Assign a static IP address to each control plane node.
    • Configure each node to receive the same IP address from the DHCP every time the node requests a DHCP lease. Depending on the vendor, the DHCP lease might be in the form of an IP reservation or a static DHCP assignment.
  • Manually define each node that runs the Ingress Controller in the external load balancer for the Ingress Controller back-end service. Otherwise, if the Ingress Controller moves to an undefined node, a connection outage can occur.

32.5.1. Configuring an external load balancer

You can configure an OpenShift Container Platform cluster on Red Hat OpenStack Platform (RHOSP) to use an external load balancer in place of the default load balancer.

Important

Before you configure an external load balancer, ensure that you read the "Services for an external load balancer" section.

Read the following prerequisites that apply to the service that you want to configure for your external load balancer.

Note

MetalLB, which runs on a cluster, functions as an external load balancer.

OpenShift API prerequisites

  • You defined a front-end IP address.
  • TCP ports 6443 and 22623 are exposed on the front-end IP address of your load balancer. Check the following items:

    • Port 6443 provides access to the OpenShift API service.
    • Port 22623 can provide Ignition startup configurations to nodes.
  • The front-end IP address and port 6443 are reachable by all users of your system with a location external to your OpenShift Container Platform cluster.
  • The front-end IP address and port 22623 are reachable only by OpenShift Container Platform nodes.
  • The load balancer backend can communicate with OpenShift Container Platform control plane nodes on ports 6443 and 22623.

Ingress Controller prerequisites

  • You defined a front-end IP address.
  • TCP ports 443 and 80 are exposed on the front-end IP address of your load balancer.
  • The front-end IP address and ports 80 and 443 are reachable by all users of your system with a location external to your OpenShift Container Platform cluster.
  • The front-end IP address and ports 80 and 443 are reachable by all nodes that operate in your OpenShift Container Platform cluster.
  • The load balancer backend can communicate with OpenShift Container Platform nodes that run the Ingress Controller on ports 80, 443, and 1936.

Prerequisite for health check URL specifications

You can configure most load balancers by setting health check URLs that determine if a service is available or unavailable. OpenShift Container Platform provides these health checks for the OpenShift API, Machine Configuration API, and Ingress Controller backend services.

The following examples demonstrate health check specifications for the previously listed backend services:

Example of a Kubernetes API health check specification

Path: HTTPS:6443/readyz
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 10
Interval: 10

Example of a Machine Config API health check specification

Path: HTTPS:22623/healthz
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 10
Interval: 10

Example of an Ingress Controller health check specification

Path: HTTP:1936/healthz/ready
Healthy threshold: 2
Unhealthy threshold: 2
Timeout: 5
Interval: 10

Procedure

  1. Configure HAProxy so that you can enable access to the cluster from your load balancer on ports 6443, 22623, 443, and 80:

    Example HAProxy configuration

    #...
    listen my-cluster-api-6443
        bind 192.168.1.100:6443
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /readyz
      http-check expect status 200
        server my-cluster-master-2 192.168.1.101:6443 check inter 10s rise 2 fall 2
        server my-cluster-master-0 192.168.1.102:6443 check inter 10s rise 2 fall 2
        server my-cluster-master-1 192.168.1.103:6443 check inter 10s rise 2 fall 2
    
    listen my-cluster-machine-config-api-22623
        bind 192.168.1.100:22623
        mode tcp
        balance roundrobin
      option httpchk
      http-check connect
      http-check send meth GET uri /healthz
      http-check expect status 200
        server my-cluster-master-2 192.168.1.101:22623 check inter 10s rise 2 fall 2
        server my-cluster-master-0 192.168.1.102:22623 check inter 10s rise 2 fall 2
        server my-cluster-master-1 192.168.1.103:22623 check inter 10s rise 2 fall 2
    
    listen my-cluster-apps-443
            bind 192.168.1.100:443
            mode tcp
            balance roundrobin
        option httpchk
        http-check connect
        http-check send meth GET uri /healthz/ready
        http-check expect status 200
            server my-cluster-worker-0 192.168.1.111:443 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-1 192.168.1.112:443 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-2 192.168.1.113:443 check port 1936 inter 10s rise 2 fall 2
    
    listen my-cluster-apps-80
            bind 192.168.1.100:80
            mode tcp
            balance roundrobin
        option httpchk
        http-check connect
        http-check send meth GET uri /healthz/ready
        http-check expect status 200
            server my-cluster-worker-0 192.168.1.111:80 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-1 192.168.1.112:80 check port 1936 inter 10s rise 2 fall 2
            server my-cluster-worker-2 192.168.1.113:80 check port 1936 inter 10s rise 2 fall 2
    # ...

  2. Use the curl CLI command to verify that the external load balancer and its resources are operational:

    1. Verify that you can access the Kubernetes API server resource by running the following command and observing the response:

      $ curl https://<loadbalancer_ip_address>:6443/version --insecure

      If the configuration is correct, you receive a JSON object in response:

      {
        "major": "1",
        "minor": "11+",
        "gitVersion": "v1.11.0+ad103ed",
        "gitCommit": "ad103ed",
        "gitTreeState": "clean",
        "buildDate": "2019-01-09T06:44:10Z",
        "goVersion": "go1.10.3",
        "compiler": "gc",
        "platform": "linux/amd64"
      }
    2. Verify that you can access the Machine Config Server resource by running the following command and observing the output:

      $ curl -v https://<loadbalancer_ip_address>:22623/healthz --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      Content-Length: 0
    3. Verify that you can access the Ingress Controller resource on port 80 by running the following command and observing the output:

      $ curl -I -L -H "Host: console-openshift-console.apps.<cluster_name>.<base_domain>" http://<load_balancer_front_end_IP_address>

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 302 Found
      content-length: 0
      location: https://console-openshift-console.apps.ocp4.private.opequon.net/
      cache-control: no-cache
    4. Verify that you can access the Ingress Controller resource on port 443 by running the following command and observing the output:

      $ curl -I -L --insecure --resolve console-openshift-console.apps.<cluster_name>.<base_domain>:443:<Load Balancer Front End IP Address> https://console-openshift-console.apps.<cluster_name>.<base_domain>

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=UlYWOyQ62LWjw2h003xtYSKlh1a0Py2hhctw0WmV2YEdhJjFyQwWcGBsja261dGLgaYO0nxzVErhiXt6QepA7g==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 04 Oct 2023 16:29:38 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=1bf5e9573c9a2760c964ed1659cc1673; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private
  3. Configure the DNS records for your cluster to target the front-end IP addresses of the external load balancer. You must update the records on your DNS server so that the cluster API and applications resolve through the load balancer.

    Examples of modified DNS records

    <load_balancer_ip_address>  A  api.<cluster_name>.<base_domain>
    A record pointing to Load Balancer Front End

    <load_balancer_ip_address>   A apps.<cluster_name>.<base_domain>
    A record pointing to Load Balancer Front End
    Important

    DNS propagation might take some time for each DNS record to become available. Ensure that each DNS record propagates before validating each record.

  4. Use the curl CLI command to verify that the external load balancer and DNS record configuration are operational:

    1. Verify that you can access the cluster API, by running the following command and observing the output:

      $ curl https://api.<cluster_name>.<base_domain>:6443/version --insecure

      If the configuration is correct, you receive a JSON object in response:

      {
        "major": "1",
        "minor": "11+",
        "gitVersion": "v1.11.0+ad103ed",
        "gitCommit": "ad103ed",
        "gitTreeState": "clean",
        "buildDate": "2019-01-09T06:44:10Z",
        "goVersion": "go1.10.3",
        "compiler": "gc",
        "platform": "linux/amd64"
        }
    2. Verify that you can access the cluster machine configuration, by running the following command and observing the output:

      $ curl -v https://api.<cluster_name>.<base_domain>:22623/healthz --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      Content-Length: 0
    3. Verify that you can access each cluster application on port 80 by running the following command and observing the output:

      $ curl http://console-openshift-console.apps.<cluster_name>.<base_domain> -I -L --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 302 Found
      content-length: 0
      location: https://console-openshift-console.apps.<cluster-name>.<base domain>/
      cache-control: no-cache

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=39HoZgztDnzjJkq/JuLJMeoKNXlfiVv2YgZc09c3TBOBU4NI6kDXaJH1LdicNhN1UsQWzon4Dor9GWGfopaTEQ==; Path=/; Secure
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Tue, 17 Nov 2020 08:42:10 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=9b714eb87e93cf34853e87a92d6894be; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private
    4. Verify that you can access each cluster application on port 443, by running the following command and observing the output:

      $ curl https://console-openshift-console.apps.<cluster_name>.<base_domain> -I -L --insecure

      If the configuration is correct, the output from the command shows the following response:

      HTTP/1.1 200 OK
      referrer-policy: strict-origin-when-cross-origin
      set-cookie: csrf-token=UlYWOyQ62LWjw2h003xtYSKlh1a0Py2hhctw0WmV2YEdhJjFyQwWcGBsja261dGLgaYO0nxzVErhiXt6QepA7g==; Path=/; Secure; SameSite=Lax
      x-content-type-options: nosniff
      x-dns-prefetch-control: off
      x-frame-options: DENY
      x-xss-protection: 1; mode=block
      date: Wed, 04 Oct 2023 16:29:38 GMT
      content-type: text/html; charset=utf-8
      set-cookie: 1e2670d92730b515ce3a1bb65da45062=1bf5e9573c9a2760c964ed1659cc1673; path=/; HttpOnly; Secure; SameSite=None
      cache-control: private

Chapter 33. Load balancing with MetalLB

33.1. About MetalLB and the MetalLB Operator

As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type LoadBalancer is added to the cluster, MetalLB can add an external IP address for the service. The external IP address is added to the host network for your cluster.

33.1.1. When to use MetalLB

Using MetalLB is valuable when you have a bare-metal cluster, or an infrastructure that is like bare metal, and you want fault-tolerant access to an application through an external IP address.

You must configure your networking infrastructure to ensure that network traffic for the external IP address is routed from clients to the host network for the cluster.

After deploying MetalLB with the MetalLB Operator, when you add a service of type LoadBalancer, MetalLB provides a platform-native load balancer.

MetalLB operating in layer2 mode provides support for failover by utilizing a mechanism similar to IP failover. However, instead of relying on the virtual router redundancy protocol (VRRP) and keepalived, MetalLB leverages a gossip-based protocol to identify instances of node failure. When a failover is detected, another node assumes the role of the leader node, and a gratuitous ARP message is dispatched to broadcast this change.

MetalLB operating in layer3 or border gateway protocol (BGP) mode delegates failure detection to the network. The BGP router or routers that the OpenShift Container Platform nodes have established a connection with will identify any node failure and terminate the routes to that node.

Using MetalLB instead of IP failover is preferable for ensuring high availability of pods and services.

33.1.2. MetalLB Operator custom resources

The MetalLB Operator monitors its own namespace for the following custom resources:

MetalLB
When you add a MetalLB custom resource to the cluster, the MetalLB Operator deploys MetalLB on the cluster. The Operator only supports a single instance of the custom resource. If the instance is deleted, the Operator removes MetalLB from the cluster.
IPAddressPool

MetalLB requires one or more pools of IP addresses that it can assign to a service when you add a service of type LoadBalancer. An IPAddressPool includes a list of IP addresses. The list can be a single IP address that is set using a range, such as 1.1.1.1-1.1.1.1, a range specified in CIDR notation, a range specified as a starting and ending address separated by a hyphen, or a combination of the three. An IPAddressPool requires a name. The documentation uses names like doc-example, doc-example-reserved, and doc-example-ipv6. The MetalLB controller assigns IP addresses from a pool of addresses in an IPAddressPool. L2Advertisement and BGPAdvertisement custom resources enable the advertisement of a given IP from a given pool. You can assign IP addresses from an IPAddressPool to services and namespaces by using the spec.serviceAllocation specification in the IPAddressPool custom resource.

Note

A single IPAddressPool can be referenced by an L2 advertisement and a BGP advertisement.

BGPPeer
The BGP peer custom resource identifies the BGP router for MetalLB to communicate with, the AS number of the router, the AS number for MetalLB, and customizations for route advertisement. MetalLB advertises the routes for service load-balancer IP addresses to one or more BGP peers. A minimal sketch of this resource is shown at the end of this section.
BFDProfile
The BFD profile custom resource configures Bidirectional Forwarding Detection (BFD) for a BGP peer. BFD provides faster path failure detection than BGP alone provides.
L2Advertisement
The L2Advertisement custom resource advertises an IP coming from an IPAddressPool using the L2 protocol.
BGPAdvertisement
The BGPAdvertisement custom resource advertises an IP coming from an IPAddressPool using the BGP protocol.

After you add the MetalLB custom resource to the cluster and the Operator deploys MetalLB, the controller and speaker MetalLB software components begin running.

MetalLB validates all relevant custom resources.
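
For reference, the following minimal BGPPeer sketch shows the shape of the custom resource described in this section. The field names and API version follow the upstream MetalLB BGPPeer API and are shown here only as a sketch; the spec.myASN and spec.routerID fields are discussed in "Limitations for BGP mode" later in this chapter, and the peerAddress and peerASN values are illustrative assumptions that you must replace with the address and autonomous system number of your own router.

Example BGPPeer custom resource (sketch)

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-peer
  namespace: metallb-system
spec:
  myASN: 64500          # AS number that MetalLB belongs to (example value)
  peerASN: 64501        # AS number of the BGP router (example value)
  peerAddress: 10.0.0.1 # IP address of the BGP router (example value)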

33.1.3. MetalLB software components

When you install the MetalLB Operator, the metallb-operator-controller-manager deployment starts a pod. The pod is the implementation of the Operator. The pod monitors for changes to all the relevant resources.

When the Operator starts an instance of MetalLB, it starts a controller deployment and a speaker daemon set.

Note

You can configure deployment specifications in the MetalLB custom resource to manage how controller and speaker pods deploy and run in your cluster. For more information about these deployment specifications, see the Additional resources section.

controller

The Operator starts the deployment and a single pod. When you add a service of type LoadBalancer, Kubernetes uses the controller to allocate an IP address from an address pool. In case of a service failure, verify you have the following entry in your controller pod logs:

Example output

"event":"ipAllocated","ip":"172.22.0.201","msg":"IP address assigned by controller

speaker

The Operator starts a daemon set for speaker pods. By default, a pod is started on each node in your cluster. You can limit the pods to specific nodes by specifying a node selector in the MetalLB custom resource when you start MetalLB. If the controller allocated the IP address to the service and the service is still unavailable, read the speaker pod logs. If the speaker pod is unavailable, run the oc describe pod -n metallb-system <speaker_pod_name> command to investigate.

For layer 2 mode, after the controller allocates an IP address for the service, the speaker pods use an algorithm to determine which speaker pod on which node will announce the load balancer IP address. The algorithm involves hashing the node name and the load balancer IP address. For more information, see "MetalLB and external traffic policy". The speaker uses Address Resolution Protocol (ARP) to announce IPv4 addresses and Neighbor Discovery Protocol (NDP) to announce IPv6 addresses.

For Border Gateway Protocol (BGP) mode, after the controller allocates an IP address for the service, each speaker pod advertises the load balancer IP address with its BGP peers. You can configure which nodes start BGP sessions with BGP peers.

Requests for the load balancer IP address are routed to the node with the speaker that announces the IP address. After the node receives the packets, the service proxy routes the packets to an endpoint for the service. The endpoint can be on the same node in the optimal case, or it can be on another node. The service proxy chooses an endpoint each time a connection is established.
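
Putting the preceding checks together, the following commands are a troubleshooting sketch. They assume the default metallb-system namespace and that the controller deployment exposes its logs in a container named controller; adjust the names to match your cluster.

$ oc logs -n metallb-system deployment/controller -c controller | grep ipAllocated

$ oc describe pod -n metallb-system <speaker_pod_name>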

33.1.4. MetalLB and external traffic policy

With layer 2 mode, one node in your cluster receives all the traffic for the service IP address. With BGP mode, a router on the host network opens a connection to one of the nodes in the cluster for a new client connection. How your cluster handles the traffic after it enters the node is affected by the external traffic policy.

cluster

This is the default value for spec.externalTrafficPolicy.

With the cluster traffic policy, after the node receives the traffic, the service proxy distributes the traffic to all the pods in your service. This policy provides uniform traffic distribution across the pods, but it obscures the client IP address and it can appear to the application in your pods that the traffic originates from the node rather than the client.

local

With the local traffic policy, after the node receives the traffic, the service proxy only sends traffic to the pods on the same node. For example, if the speaker pod on node A announces the external service IP, then all traffic is sent to node A. After the traffic enters node A, the service proxy only sends traffic to pods for the service that are also on node A. Pods for the service that are on additional nodes do not receive any traffic from node A. Pods for the service on additional nodes act as replicas in case failover is needed.

This policy does not affect the client IP address. Application pods can determine the client IP address from the incoming connections.

Note

The following information is important when configuring the external traffic policy in BGP mode.

Although MetalLB advertises the load balancer IP address from all the eligible nodes, the number of nodes load balancing the service can be limited by the capacity of the router to establish equal-cost multipath (ECMP) routes. If the number of nodes advertising the IP is greater than the ECMP group limit of the router, the router uses fewer nodes than the ones advertising the IP.

For example, if the external traffic policy is set to local and the router has an ECMP group limit set to 16 and the pods implementing a LoadBalancer service are deployed on 30 nodes, this would result in pods deployed on 14 nodes not receiving any traffic. In this situation, it would be preferable to set the external traffic policy for the service to cluster.
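
The external traffic policy is set on the Service object itself. The following minimal sketch shows a LoadBalancer service that requests the local policy; the service name, selector, and ports are placeholders. Note that the field value is capitalized, Cluster or Local, even though this documentation refers to the policies in lowercase.

apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
  externalTrafficPolicy: Local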

33.1.5. MetalLB concepts for layer 2 mode

In layer 2 mode, the speaker pod on one node announces the external IP address for a service to the host network. From a network perspective, the node appears to have multiple IP addresses assigned to a network interface.

Note

In layer 2 mode, MetalLB relies on ARP and NDP. These protocols implement local address resolution within a specific subnet. In this context, the client must be able to reach the VIP assigned by MetalLB that exists on the same subnet as the nodes announcing the service in order for MetalLB to work.

The speaker pod responds to ARP requests for IPv4 services and NDP requests for IPv6.

In layer 2 mode, all traffic for a service IP address is routed through one node. After traffic enters the node, the service proxy for the CNI network provider distributes the traffic to all the pods for the service.

Because all traffic for a service enters through a single node in layer 2 mode, in a strict sense, MetalLB does not implement a load balancer for layer 2. Rather, MetalLB implements a failover mechanism for layer 2 so that when a speaker pod becomes unavailable, a speaker pod on a different node can announce the service IP address.

When a node becomes unavailable, failover is automatic. The speaker pods on the other nodes detect that a node is unavailable and a new speaker pod and node take ownership of the service IP address from the failed node.

Conceptual diagram for MetalLB and layer 2 mode

The preceding graphic shows the following concepts related to MetalLB:

  • An application is available through a service that has a cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 192.168.100.200.
  • Nodes 1 and 3 have a pod for the application.
  • The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods.
  • Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
  • The speaker pod on node 1 uses ARP to announce the external IP address for the service, 192.168.100.200. The speaker pod that announces the external IP address must be on the same node as an endpoint for the service and the endpoint must be in the Ready condition.
  • Client traffic is routed to the host network and connects to the 192.168.100.200 IP address. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.

    • If the external traffic policy for the service is set to cluster, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running. Only that node can receive traffic for the service.
    • If the external traffic policy for the service is set to local, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running and at least an endpoint of the service. Only that node can receive traffic for the service. In the preceding graphic, either node 1 or 3 would advertise 192.168.100.200.
  • If node 1 becomes unavailable, the external IP address fails over to another node. On another node that has an instance of the application pod and service endpoint, the speaker pod begins to announce the external IP address, 192.168.100.200 and the new node receives the client traffic. In the diagram, the only candidate is node 3.
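
To observe which node currently answers for the external IP address in this example, you can inspect the neighbor cache from a client on the same subnet. These are standard Linux utilities rather than MetalLB tooling, and the interface name is a placeholder; the hardware address that is returned should change after a failover.

$ ip neigh show 192.168.100.200

$ arping -I <interface_name> -c 3 192.168.100.200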

33.1.6. MetalLB concepts for BGP mode

In BGP mode, by default each speaker pod advertises the load balancer IP address for a service to each BGP peer. It is also possible to advertise the IPs coming from a given pool to a specific set of peers by adding an optional list of BGP peers. BGP peers are commonly network routers that are configured to use the BGP protocol. When a router receives traffic for the load balancer IP address, the router picks one of the nodes with a speaker pod that advertised the IP address. The router sends the traffic to that node. After traffic enters the node, the service proxy for the CNI network plugin distributes the traffic to all the pods for the service.

The directly-connected router on the same layer 2 network segment as the cluster nodes can be configured as a BGP peer. If the directly-connected router is not configured as a BGP peer, you need to configure your network so that packets for load balancer IP addresses are routed between the BGP peers and the cluster nodes that run the speaker pods.

Each time a router receives new traffic for the load balancer IP address, it creates a new connection to a node. Each router manufacturer has an implementation-specific algorithm for choosing which node to initiate the connection with. However, the algorithms commonly are designed to distribute traffic across the available nodes for the purpose of balancing the network load.

If a node becomes unavailable, the router initiates a new connection with another node that has a speaker pod that advertises the load balancer IP address.

Figure 33.1. MetalLB topology diagram for BGP mode

Speaker pods on host network 10.0.1.0/24 use BGP to advertise the load balancer IP address, 203.0.113.200, to a router.

The preceding graphic shows the following concepts related to MetalLB:

  • An application is available through a service that has an IPv4 cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 203.0.113.200.
  • Nodes 2 and 3 have a pod for the application.
  • The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods. You can configure MetalLB to specify which nodes run the speaker pods.
  • Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
  • Each speaker pod starts a BGP session with all BGP peers and advertises the load balancer IP addresses or aggregated routes to the BGP peers. The speaker pods advertise that they are part of Autonomous System 65010. The diagram shows a router, R1, as a BGP peer within the same Autonomous System. However, you can configure MetalLB to start BGP sessions with peers that belong to other Autonomous Systems.
  • All the nodes with a speaker pod that advertises the load balancer IP address can receive traffic for the service.

    • If the external traffic policy for the service is set to cluster, all the nodes where a speaker pod is running advertise the 203.0.113.200 load balancer IP address and all the nodes with a speaker pod can receive traffic for the service. The host prefix is advertised to the router peer only if the external traffic policy is set to cluster.
    • If the external traffic policy for the service is set to local, then all the nodes where a speaker pod is running and at least an endpoint of the service is running can advertise the 203.0.113.200 load balancer IP address. Only those nodes can receive traffic for the service. In the preceding graphic, nodes 2 and 3 would advertise 203.0.113.200.
  • You can configure MetalLB to control which speaker pods start BGP sessions with specific BGP peers by specifying a node selector when you add a BGP peer custom resource.
  • Any routers, such as R1, that are configured to use BGP can be set as BGP peers.
  • Client traffic is routed to one of the nodes on the host network. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.
  • If a node becomes unavailable, the router detects the failure and initiates a new connection with another node. You can configure MetalLB to use a Bidirectional Forwarding Detection (BFD) profile for BGP peers. BFD provides faster link failure detection so that routers can initiate new connections earlier than without BFD.

33.1.7. Limitations and restrictions

33.1.7.1. Infrastructure considerations for MetalLB

MetalLB is primarily useful for on-premise, bare metal installations because these installations do not include a native load-balancer capability. In addition to bare metal installations, installations of OpenShift Container Platform on some infrastructures might not include a native load-balancer capability. For example, the following infrastructures can benefit from adding the MetalLB Operator:

  • Bare metal
  • VMware vSphere
  • IBM Z and IBM® LinuxONE
  • IBM Z and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM
  • IBM Power

MetalLB Operator and MetalLB are supported with the OpenShift SDN and OVN-Kubernetes network providers.

33.1.7.2. Limitations for layer 2 mode
33.1.7.2.1. Single-node bottleneck

Because MetalLB routes all traffic for a service through a single node, the node can become a bottleneck and limit performance.

Layer 2 mode limits the ingress bandwidth for your service to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to direct traffic.

33.1.7.2.2. Slow failover performance

Failover between nodes depends on cooperation from the clients. When a failover occurs, MetalLB sends gratuitous ARP packets to notify clients that the MAC address associated with the service IP has changed.

Most client operating systems handle gratuitous ARP packets correctly and update their neighbor caches promptly. When clients update their caches quickly, failover completes within a few seconds. Clients typically fail over to a new node within 10 seconds. However, some client operating systems either do not handle gratuitous ARP packets at all or have outdated implementations that delay the cache update.

Recent versions of common operating systems such as Windows, macOS, and Linux implement layer 2 failover correctly. Issues with slow failover are not expected except for older and less common client operating systems.

To minimize the impact from a planned failover on outdated clients, keep the old node running for a few minutes after flipping leadership. The old node can continue to forward traffic for outdated clients until their caches refresh.

During an unplanned failover, the service IPs are unreachable until the outdated clients refresh their cache entries.

33.1.7.2.3. Additional network and MetalLB cannot use the same network

Using the same VLAN for both MetalLB and an additional network interface set up on a source pod might result in a connection failure. This occurs when both the MetalLB IP and the source pod reside on the same node.

To avoid connection failures, place the MetalLB IP in a different subnet from the one where the source pod resides. This configuration ensures that traffic from the source pod will take the default gateway. Consequently, the traffic can effectively reach its destination by using the OVN overlay network, ensuring that the connection functions as intended.

33.1.7.3. Limitations for BGP mode
33.1.7.3.1. Node failure can break all active connections

MetalLB shares a limitation that is common to BGP-based load balancing. When a BGP session terminates, such as when a node fails or when a speaker pod restarts, the session termination might result in resetting all active connections. End users can experience a Connection reset by peer message.

The consequence of a terminated BGP session is implementation-specific for each router manufacturer. However, you can anticipate that a change in the number of speaker pods affects the number of BGP sessions and that active connections with BGP peers will break.

To avoid or reduce the likelihood of a service interruption, you can specify a node selector when you add a BGP peer. By limiting the number of nodes that start BGP sessions, a fault on a node that does not have a BGP session has no effect on connections to the service.
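
For example, to limit BGP sessions to nodes that carry a particular label, you can add a node selector to the BGP peer custom resource. This is a sketch only: the nodeSelectors field follows the BGPPeer API, and the metallb-bgp label is an assumed label that you would first apply to the chosen nodes.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-peer-selective
  namespace: metallb-system
spec:
  myASN: 64500
  peerASN: 64501
  peerAddress: 10.0.0.1
  nodeSelectors:
  - matchLabels:
      metallb-bgp: "true"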

33.1.7.3.2. Support for a single ASN and a single router ID only

When you add a BGP peer custom resource, you specify the spec.myASN field to identify the Autonomous System Number (ASN) that MetalLB belongs to. OpenShift Container Platform uses an implementation of BGP with MetalLB that requires MetalLB to belong to a single ASN. If you attempt to add a BGP peer and specify a different value for spec.myASN than an existing BGP peer custom resource, you receive an error.

Similarly, when you add a BGP peer custom resource, the spec.routerID field is optional. If you specify a value for this field, you must specify the same value for all other BGP peer custom resources that you add.

The restriction to a single ASN and a single router ID is a difference from the community-supported implementation of MetalLB.

33.1.8. Additional resources

33.2. Installing the MetalLB Operator

As a cluster administrator, you can add the MetalLB Operator so that the Operator can manage the lifecycle for an instance of MetalLB on your cluster.

MetalLB and IP failover are incompatible. If you configured IP failover for your cluster, perform the steps to remove IP failover before you install the Operator.

33.2.1. Installing the MetalLB Operator from the OperatorHub using the web console

As a cluster administrator, you can install the MetalLB Operator by using the OpenShift Container Platform web console.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
  2. Type a keyword into the Filter by keyword box or scroll to find the Operator you want. For example, type metallb to find the MetalLB Operator.

    You can also filter options by Infrastructure Features. For example, select Disconnected if you want to see Operators that work in disconnected environments, also known as restricted network environments.

  3. On the Install Operator page, accept the defaults and click Install.

Verification

  1. To confirm that the installation is successful:

    1. Navigate to the Operators → Installed Operators page.
    2. Check that the Operator is installed in the openshift-operators namespace and that its status is Succeeded.
  2. If the Operator is not installed successfully, check the status of the Operator and review the logs:

    1. Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
    2. Navigate to the Workloads → Pods page and check the logs in any pods in the openshift-operators project that are reporting issues.

33.2.2. Installing from OperatorHub using the CLI

Instead of using the OpenShift Container Platform web console, you can install an Operator from OperatorHub using the CLI. You can use the OpenShift CLI (oc) to install the MetalLB Operator.

It is recommended that when using the CLI you install the Operator in the metallb-system namespace.

Prerequisites

  • A cluster installed on bare-metal hardware.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a namespace for the MetalLB Operator by entering the following command:

    $ cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: metallb-system
    EOF
  2. Create an Operator group custom resource (CR) in the namespace:

    $ cat << EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: metallb-operator
      namespace: metallb-system
    EOF
  3. Confirm the Operator group is installed in the namespace:

    $ oc get operatorgroup -n metallb-system

    Example output

    NAME               AGE
    metallb-operator   14m

  4. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, metallb-sub.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: metallb-operator-sub
        namespace: metallb-system
      spec:
        channel: stable
        name: metallb-operator
        source: redhat-operators 1
        sourceNamespace: openshift-marketplace
      1
      You must specify the redhat-operators value.
    2. To create the Subscription CR, run the following command:

      $ oc create -f metallb-sub.yaml
  5. Optional: To ensure BGP and BFD metrics appear in Prometheus, you can label the namespace as in the following command:

    $ oc label ns metallb-system "openshift.io/cluster-monitoring=true"

Verification

The verification steps assume the MetalLB Operator is installed in the metallb-system namespace.

  1. Confirm the install plan is in the namespace:

    $ oc get installplan -n metallb-system

    Example output

    NAME            CSV                                   APPROVAL    APPROVED
    install-wzg94   metallb-operator.4.13.0-nnnnnnnnnnnn   Automatic   true

    Note

    Installation of the Operator might take a few seconds.

  2. To verify that the Operator is installed, enter the following command:

    $ oc get clusterserviceversion -n metallb-system \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Example output

    Name                                  Phase
    metallb-operator.4.13.0-nnnnnnnnnnnn   Succeeded
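
  3. Optional: Confirm that the Operator deployment is available by checking the metallb-operator-controller-manager deployment that is described in "MetalLB software components":

    $ oc get deployment -n metallb-system metallb-operator-controller-manager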

33.2.3. Starting MetalLB on your cluster

After you install the Operator, you need to configure a single instance of a MetalLB custom resource. After you configure the custom resource, the Operator starts MetalLB on your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the MetalLB Operator.

Procedure

This procedure assumes the MetalLB Operator is installed in the metallb-system namespace. If you installed the Operator by using the web console, substitute openshift-operators for the namespace.

  1. Create a single instance of a MetalLB custom resource:

    $ cat << EOF | oc apply -f -
    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    EOF

Verification

Confirm that the deployment for the MetalLB controller and the daemon set for the MetalLB speaker are running.

  1. Verify that the deployment for the controller is running:

    $ oc get deployment -n metallb-system controller

    Example output

    NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    controller   1/1     1            1           11m

  2. Verify that the daemon set for the speaker is running:

    $ oc get daemonset -n metallb-system speaker

    Example output

    NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
    speaker   6         6         6       6            6           kubernetes.io/os=linux   18m

    The example output indicates 6 speaker pods. The number of speaker pods in your cluster might differ from the example output. Make sure the output indicates one pod for each node in your cluster.
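
    To see which node each speaker pod is scheduled on, you can add the standard -o wide option to the same command:

    $ oc get pods -n metallb-system -o wide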

33.2.4. Deployment specifications for MetalLB

When you start an instance of MetalLB using the MetalLB custom resource, you can configure deployment specifications in the MetalLB custom resource to manage how the controller or speaker pods deploy and run in your cluster. Use these deployment specifications to manage the following tasks:

  • Select nodes for MetalLB pod deployment.
  • Manage scheduling by using pod priority and pod affinity.
  • Assign CPU limits for MetalLB pods.
  • Assign a container RuntimeClass for MetalLB pods.
  • Assign metadata for MetalLB pods.
33.2.4.1. Limit speaker pods to specific nodes

By default, when you start MetalLB with the MetalLB Operator, the Operator starts an instance of a speaker pod on each node in the cluster. Only the nodes with a speaker pod can advertise a load balancer IP address. You can configure the MetalLB custom resource with a node selector to specify which nodes run the speaker pods.

The most common reason to limit the speaker pods to specific nodes is to ensure that only nodes with network interfaces on specific networks advertise load balancer IP addresses. Only the nodes with a running speaker pod are advertised as destinations of the load balancer IP address.

If you limit the speaker pods to specific nodes and specify local for the external traffic policy of a service, then you must ensure that the application pods for the service are deployed to the same nodes.

Example configuration to limit speaker pods to worker nodes

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  nodeSelector: 1
    node-role.kubernetes.io/worker: ""
  speakerTolerations: 2
  - key: "Example"
    operator: "Exists"
    effect: "NoExecute"

1
The example configuration assigns the speaker pods to worker nodes, but you can specify labels that you assigned to nodes or any valid node selector.
2
In this example configuration, the pod that this toleration is attached to tolerates any taint that matches the key value and effect value by using the operator.

After you apply a manifest with the spec.nodeSelector field, you can check the number of pods that the Operator deployed with the oc get daemonset -n metallb-system speaker command. Similarly, you can display the nodes that match your labels with a command like oc get nodes -l node-role.kubernetes.io/worker=.

You can optionally use affinity rules to control which nodes the speaker pods are, or are not, scheduled on. You can also limit these pods by applying a list of tolerations. For more information about affinity rules, taints, and tolerations, see the additional resources.

33.2.4.2. Configuring pod priority and pod affinity in a MetalLB deployment

You can optionally assign pod priority and pod affinity rules to controller and speaker pods by configuring the MetalLB custom resource. The pod priority indicates the relative importance of a pod on a node and schedules the pod based on this priority. Set a high priority on your controller or speaker pod to ensure scheduling priority over other pods on the node.

Pod affinity manages relationships among pods. Assign pod affinity to the controller or speaker pods to control on what node the scheduler places the pod in the context of pod relationships. For example, you can use pod affinity rules to ensure that certain pods are located on the same node or nodes, which can help improve network communication and reduce latency between those components.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.
  • You have installed the MetalLB Operator.
  • You have started the MetalLB Operator on your cluster.

Procedure

  1. Create a PriorityClass custom resource, such as myPriorityClass.yaml, to configure the priority level. This example defines a PriorityClass named high-priority with a value of 1000000. Pods that are assigned this priority class are considered higher priority during scheduling compared to pods with lower priority classes:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
  2. Apply the PriorityClass custom resource configuration:

    $ oc apply -f myPriorityClass.yaml
  3. Create a MetalLB custom resource, such as MetalLBPodConfig.yaml, to specify the priorityClassName and podAffinity values:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      controllerConfig:
        priorityClassName: high-priority 1
        affinity:
          podAffinity: 2
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                 app: metallb
              topologyKey: kubernetes.io/hostname
      speakerConfig:
        priorityClassName: high-priority
        affinity:
          podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                 app: metallb
              topologyKey: kubernetes.io/hostname
    1
    Specifies the priority class for the MetalLB controller pods. In this case, it is set to high-priority.
    2
    Specifies that you are configuring pod affinity rules. These rules dictate how pods are scheduled in relation to other pods or nodes. This configuration instructs the scheduler to schedule pods that have the label app: metallb onto nodes that share the same hostname. This helps to co-locate MetalLB-related pods on the same nodes, potentially optimizing network communication, latency, and resource usage between these pods.
  4. Apply the MetalLB custom resource configuration:

    $ oc apply -f MetalLBPodConfig.yaml

Verification

  • To view the priority class that you assigned to pods in the metallb-system namespace, run the following command:

    $ oc get pods -n metallb-system -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName

    Example output

    NAME                                                 PRIORITY
    controller-584f5c8cd8-5zbvg                          high-priority
    metallb-operator-controller-manager-9c8d9985-szkqg   <none>
    metallb-operator-webhook-server-c895594d4-shjgx      <none>
    speaker-dddf7                                        high-priority

  • To verify that the scheduler placed pods according to pod affinity rules, view the metadata for the pod’s node or nodes by running the following command:

    $ oc get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name -n metallb-system
33.2.4.3. Configuring pod CPU limits in a MetalLB deployment

You can optionally assign pod CPU limits to controller and speaker pods by configuring the MetalLB custom resource. Defining CPU limits for the controller or speaker pods helps you to manage compute resources on the node. This ensures all pods on the node have the necessary compute resources to manage workloads and cluster housekeeping.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.
  • You have installed the MetalLB Operator.

Procedure

  1. Create a MetalLB custom resource file, such as CPULimits.yaml, to specify the cpu value for the controller and speaker pods:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      controllerConfig:
        resources:
          limits:
            cpu: "200m"
      speakerConfig:
        resources:
          limits:
            cpu: "300m"
  2. Apply the MetalLB custom resource configuration:

    $ oc apply -f CPULimits.yaml

Verification

  • To view compute resources for a pod, run the following command, replacing <pod_name> with your target pod:

    $ oc describe pod <pod_name> -n metallb-system
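
    Alternatively, to print only the configured CPU limits for each container in the pod, you can use a JSONPath query. This sketch assumes the pod is in the metallb-system namespace:

    $ oc get pod <pod_name> -n metallb-system \
      -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.cpu}{"\n"}{end}'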

33.2.5. Additional resources

33.2.6. Next steps

33.3. Upgrading the MetalLB Operator

If you are currently running version 4.10 or an earlier version of the MetalLB Operator, note that automatic updates to any version later than 4.10 do not work. Upgrading from any version of the MetalLB Operator that is 4.11 or later to a newer version is successful. For example, upgrading from version 4.12 to version 4.13 occurs smoothly.

A summary of the upgrade procedure for the MetalLB Operator from 4.10 and earlier is as follows:

  1. Delete the installed version of the MetalLB Operator, for example, 4.10. Ensure that the namespace and the metallb custom resource are not removed.
  2. Using the CLI, install the MetalLB Operator 4.13 in the same namespace where the previous version of the MetalLB Operator was installed.
Note

This procedure does not apply to automatic z-stream updates of the MetalLB Operator, which follow the standard update process.

For detailed steps to upgrade the MetalLB Operator from 4.10 and earlier, see the guidance that follows. As a cluster administrator, start the upgrade process by deleting the MetalLB Operator by using the OpenShift CLI (oc) or the web console.

33.3.1. Deleting the MetalLB Operator from a cluster using the web console

Cluster administrators can delete installed Operators from a selected namespace by using the web console.

Prerequisites

  • Access to an OpenShift Container Platform cluster web console using an account with cluster-admin permissions.

Procedure

  1. Navigate to the Operators → Installed Operators page.
  2. Search for the MetalLB Operator. Then, click on it.
  3. On the right side of the Operator Details page, select Uninstall Operator from the Actions drop-down menu.

    An Uninstall Operator? dialog box is displayed.

  4. Select Uninstall to remove the Operator, Operator deployments, and pods. Following this action, the Operator stops running and no longer receives updates.

    Note

    This action does not remove resources managed by the Operator, including custom resource definitions (CRDs) and custom resources (CRs). Dashboards and navigation items enabled by the web console and off-cluster resources that continue to run might need manual clean up. To remove these after uninstalling the Operator, you might need to manually delete the Operator CRDs.

33.3.2. Deleting MetalLB Operator from a cluster using the CLI

Cluster administrators can delete installed Operators from a selected namespace by using the CLI.

Prerequisites

  • Access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
  • oc command installed on workstation.

Procedure

  1. Check the current version of the subscribed MetalLB Operator in the currentCSV field:

    $ oc get subscription metallb-operator -n metallb-system -o yaml | grep currentCSV

    Example output

      currentCSV: metallb-operator.4.10.0-202207051316

  2. Delete the subscription:

    $ oc delete subscription metallb-operator -n metallb-system

    Example output

    subscription.operators.coreos.com "metallb-operator" deleted

  3. Delete the CSV for the Operator in the target namespace using the currentCSV value from the previous step:

    $ oc delete clusterserviceversion metallb-operator.4.10.0-202207051316 -n metallb-system

    Example output

    clusterserviceversion.operators.coreos.com "metallb-operator.4.10.0-202207051316" deleted
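
  4. Optional: Confirm that no MetalLB Operator cluster service version remains in the namespace:

    $ oc get clusterserviceversion -n metallb-system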

33.3.3. Editing the MetalLB Operator Operator group

When upgrading from any MetalLB Operator version up to and including 4.10 to 4.11 and later, remove spec.targetNamespaces from the Operator group custom resource (CR). You must remove this field regardless of whether you used the web console or the CLI to delete the MetalLB Operator.

Note

The MetalLB Operator version 4.11 or later only supports the AllNamespaces install mode, whereas 4.10 or earlier versions support OwnNamespace or SingleNamespace modes.

Prerequisites

  • You have access to an OpenShift Container Platform cluster with cluster-admin permissions.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. List the Operator groups in the metallb-system namespace by running the following command:

    $ oc get operatorgroup -n metallb-system

    Example output

    NAME                   AGE
    metallb-system-7jc66   85m

  2. Verify that the spec.targetNamespaces is present in the Operator group CR associated with the metallb-system namespace by running the following command:

    $ oc get operatorgroup metallb-system-7jc66 -n metallb-system -o yaml

    Example output

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      annotations:
        olm.providedAPIs: ""
      creationTimestamp: "2023-10-25T09:42:49Z"
      generateName: metallb-system-
      generation: 1
      name: metallb-system-7jc66
      namespace: metallb-system
      resourceVersion: "25027"
      uid: f5f644a0-eef8-4e31-a306-e2bbcfaffab3
    spec:
      targetNamespaces:
      - metallb-system
      upgradeStrategy: Default
    status:
      lastUpdated: "2023-10-25T09:42:49Z"
      namespaces:
      - metallb-system

  3. Edit the Operator group and remove the targetNamespaces field, and its metallb-system entry, from under the spec section by running the following command:

    $ oc edit operatorgroup metallb-system-7jc66 -n metallb-system

    Example output

    operatorgroup.operators.coreos.com/metallb-system-7jc66 edited

  4. Verify the spec.targetNamespaces is removed from the Operator group custom resource associated with the metallb-system namespace by running the following command:

    $ oc get operatorgroup metallb-system-7jc66 -n metallb-system -o yaml

    Example output

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      annotations:
        olm.providedAPIs: ""
      creationTimestamp: "2023-10-25T09:42:49Z"
      generateName: metallb-system-
      generation: 2
      name: metallb-system-7jc66
      namespace: metallb-system
      resourceVersion: "61658"
      uid: f5f644a0-eef8-4e31-a306-e2bbcfaffab3
    spec:
      upgradeStrategy: Default
    status:
      lastUpdated: "2023-10-25T14:31:30Z"
      namespaces:
      - ""

33.3.4. Upgrading the MetalLB Operator

Prerequisites

  • Access the cluster as a user with the cluster-admin role.

Procedure

  1. Verify that the metallb-system namespace still exists:

    $ oc get namespaces | grep metallb-system

    Example output

    metallb-system                                     Active   31m

  2. Verify the metallb custom resource still exists:

    $ oc get metallb -n metallb-system

    Example output

    NAME      AGE
    metallb   33m

  3. Follow the guidance in "Installing from OperatorHub using the CLI" to install the latest 4.13 version of the MetalLB Operator.

    Note

    When installing the latest 4.13 version of the MetalLB Operator, you must install the Operator to the same namespace it was previously installed to.

  4. Verify that the upgraded version of the Operator is now the 4.13 version:

    $ oc get csv -n metallb-system

    Example output

    NAME                                   DISPLAY            VERSION               REPLACES   PHASE
    metallb-operator.4.13.0-202207051316   MetalLB Operator   4.13.0-202207051316              Succeeded

33.3.5. Additional resources

33.4. Configuring MetalLB address pools

As a cluster administrator, you can add, modify, and delete address pools. The MetalLB Operator uses the address pool custom resources to set the IP addresses that MetalLB can assign to services. The examples in this section assume that the MetalLB Operator is installed in the metallb-system namespace.

33.4.1. About the IPAddressPool custom resource

Note

The address pool custom resource definition (CRD) and API documented in "Load balancing with MetalLB" in OpenShift Container Platform 4.10 can still be used in 4.13. However, the enhanced functionality associated with advertising an IP address from an IPAddressPool with layer 2 protocols, or the BGP protocol, is not supported when using the AddressPool CRD.

The fields for the IPAddressPool custom resource are described in the following tables.

Table 33.1. MetalLB IPAddressPool custom resource
Field | Type | Description

metadata.name

string

Specifies the name for the address pool. When you add a service, you can specify this pool name in the metallb.universe.tf/address-pool annotation to select an IP address from a specific pool. The names doc-example, silver, and gold are used throughout the documentation.

metadata.namespace

string

Specifies the namespace for the address pool. Specify the same namespace that the MetalLB Operator uses.

metadata.label

string

Optional: Specifies the key value pair assigned to the IPAddressPool. This can be referenced by the ipAddressPoolSelectors in the BGPAdvertisement and L2Advertisement CRD to associate the IPAddressPool with the advertisement

spec.addresses

string

Specifies a list of IP addresses for MetalLB Operator to assign to services. You can specify multiple ranges in a single pool; they will all share the same settings. Specify each range in CIDR notation or as starting and ending IP addresses separated with a hyphen.

spec.autoAssign

boolean

Optional: Specifies whether MetalLB automatically assigns IP addresses from this pool. Specify false if you want to explicitly request an IP address from this pool with the metallb.universe.tf/address-pool annotation. The default value is true.

spec.avoidBuggyIPs

boolean

Optional: When enabled, this ensures that IP addresses ending in .0 and .255 are not allocated from the pool. The default value is false. Some older consumer network equipment mistakenly blocks IP addresses ending in .0 and .255.

You can assign IP addresses from an IPAddressPool to services and namespaces by configuring the spec.serviceAllocation specification.

Table 33.2. MetalLB IPAddressPool custom resource spec.serviceAllocation subfields
Field | Type | Description

priority

int

Optional: Defines the priority between IP address pools when more than one IP address pool matches a service or namespace. A lower number indicates a higher priority.

namespaces

array (string)

Optional: Specifies a list of namespaces that you can assign to IP addresses in an IP address pool.

namespaceSelectors

array (LabelSelector)

Optional: Specifies namespace labels that you can assign to IP addresses from an IP address pool by using label selectors in a list format.

serviceSelectors

array (LabelSelector)

Optional: Specifies service labels that you can assign to IP addresses from an address pool by using label selectors in a list format.

33.4.2. Configuring an address pool

As a cluster administrator, you can add address pools to your cluster to control the IP addresses that MetalLB can assign to load-balancer services.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a file, such as ipaddresspool.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      namespace: metallb-system
      name: doc-example
      labels: 1
        zone: east
    spec:
      addresses:
      - 203.0.113.1-203.0.113.10
      - 203.0.113.65-203.0.113.75
    1
    This label assigned to the IPAddressPool can be referenced by the ipAddressPoolSelectors in the BGPAdvertisement CRD to associate the IPAddressPool with the advertisement.
  2. Apply the configuration for the IP address pool:

    $ oc apply -f ipaddresspool.yaml

Verification

  • View the address pool:

    $ oc describe -n metallb-system IPAddressPool doc-example

    Example output

    Name:         doc-example
    Namespace:    metallb-system
    Labels:       zone=east
    Annotations:  <none>
    API Version:  metallb.io/v1beta1
    Kind:         IPAddressPool
    Metadata:
      ...
    Spec:
      Addresses:
        203.0.113.1-203.0.113.10
        203.0.113.65-203.0.113.75
      Auto Assign:  true
    Events:         <none>

Confirm that the address pool name, such as doc-example, and the IP address ranges appear in the output.

33.4.3. Example address pool configurations

33.4.3.1. Example: IPv4 and CIDR ranges

You can specify a range of IP addresses in CIDR notation. You can combine CIDR notation with the notation that uses a hyphen to separate lower and upper bounds.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-cidr
  namespace: metallb-system
spec:
  addresses:
  - 192.168.100.0/24
  - 192.168.200.0/24
  - 192.168.255.1-192.168.255.5
33.4.3.2. Example: Reserve IP addresses

You can set the autoAssign field to false to prevent MetalLB from automatically assigning the IP addresses from the pool. When you add a service, you can request a specific IP address from the pool or you can specify the pool name in an annotation to request any IP address from the pool.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-reserved
  namespace: metallb-system
spec:
  addresses:
  - 10.0.100.0/28
  autoAssign: false
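
To request an address from a reserved pool such as this one, reference the pool from the service by using the metallb.universe.tf/address-pool annotation that is described in the IPAddressPool field table. The following Service sketch uses placeholder values for the name, selector, and ports:

apiVersion: v1
kind: Service
metadata:
  name: example-reserved-service
  annotations:
    metallb.universe.tf/address-pool: doc-example-reserved
spec:
  selector:
    app: example
  ports:
  - port: 443
    targetPort: 8443
  type: LoadBalancer
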
33.4.3.3. Example: IPv4 and IPv6 addresses

You can add address pools that use IPv4 and IPv6. You can specify multiple ranges in the addresses list, just as in the IPv4 examples.

Whether the service is assigned a single IPv4 address, a single IPv6 address, or both is determined by how you add the service. The spec.ipFamilies and spec.ipFamilyPolicy fields control how IP addresses are assigned to the service.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-combined
  namespace: metallb-system
spec:
  addresses:
  - 10.0.100.0/28
  - 2002:2:2::1-2002:2:2::100
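
Whether a service that draws from this pool receives an IPv4 address, an IPv6 address, or both is controlled on the Service object. The following dual-stack sketch uses the standard Kubernetes spec.ipFamilyPolicy and spec.ipFamilies fields; the service name, selector, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: example-dual-stack-service
spec:
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
  - IPv4
  - IPv6
  type: LoadBalancer
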
33.4.3.4. Example: Assign IP address pools to services or namespaces

You can assign IP addresses from an IPAddressPool to services and namespaces that you specify.

If you assign a service or namespace to more than one IP address pool, MetalLB uses an available IP address from the higher-priority IP address pool. If no IP addresses are available from the assigned IP address pools with a high priority, MetalLB uses available IP addresses from an IP address pool with lower priority or no priority.

Note

You can use the matchLabels label selector, the matchExpressions label selector, or both, for the namespaceSelectors and serviceSelectors specifications. This example demonstrates one label selector for each specification.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-service-allocation
  namespace: metallb-system
spec:
  addresses:
    - 192.168.20.0/24
  serviceAllocation:
    priority: 50 1
    namespaces: 2
      - namespace-a
      - namespace-b
    namespaceSelectors: 3
      - matchLabels:
          zone: east
    serviceSelectors: 4
      - matchExpressions:
        - key: security
          operator: In
          values:
          - S1
1
Assign a priority to the address pool. A lower number indicates a higher priority.
2
Assign one or more namespaces to the IP address pool in a list format.
3
Assign one or more namespace labels to the IP address pool by using label selectors in a list format.
4
Assign one or more service labels to the IP address pool by using label selectors in a list format.

33.4.4. Next steps

33.5. About advertising for the IP address pools

You can configure MetalLB so that the IP address is advertised with layer 2 protocols, the BGP protocol, or both. With layer 2, MetalLB provides a fault-tolerant external IP address. With BGP, MetalLB provides fault-tolerance for the external IP address and load balancing.

MetalLB supports advertising using L2 and BGP for the same set of IP addresses.

MetalLB provides the flexibility to assign address pools to specific BGP peers, effectively advertising to a subset of nodes on the network. This allows for more complex configurations, for example facilitating the isolation of nodes or the segmentation of the network.

33.5.1. About the BGPAdvertisement custom resource

The fields for the BGPAdvertisements object are defined in the following table:

Table 33.3. BGPAdvertisements configuration
Field | Type | Description

metadata.name

string

Specifies the name for the BGP advertisement.

metadata.namespace

string

Specifies the namespace for the BGP advertisement. Specify the same namespace that the MetalLB Operator uses.

spec.aggregationLength

integer

Optional: Specifies the number of bits to include in a 32-bit CIDR mask. To aggregate the routes that the speaker advertises to BGP peers, the mask is applied to the routes for several service IP addresses and the speaker advertises the aggregated route. For example, with an aggregation length of 24, the speaker can aggregate several 10.0.1.x/32 service IP addresses and advertise a single 10.0.1.0/24 route.

spec.aggregationLengthV6

integer

Optional: Specifies the number of bits to include in a 128-bit CIDR mask. For example, with an aggregation length of 124, the speaker can aggregate several fc00:f853:0ccd:e799::x/128 service IP addresses and advertise a single fc00:f853:0ccd:e799::0/124 route.

spec.communities

string

Optional: Specifies one or more BGP communities. Each community is specified as two 16-bit values separated by the colon character. Well-known communities must be specified as 16-bit values:

  • NO_EXPORT: 65535:65281
  • NO_ADVERTISE: 65535:65282
  • NO_EXPORT_SUBCONFED: 65535:65283

    Note

    You can also use community objects that are created along with the strings.

spec.localPref

integer

Optional: Specifies the local preference for this advertisement. This BGP attribute applies to BGP sessions within the Autonomous System.

spec.ipAddressPools

string

Optional: The list of IPAddressPools to advertise with this advertisement, selected by name.

spec.ipAddressPoolSelectors

string

Optional: A selector for the IPAddressPools that gets advertised with this advertisement. This is for associating the IPAddressPool to the advertisement based on the label assigned to the IPAddressPool instead of the name itself. If no IPAddressPool is selected by this or by the list, the advertisement is applied to all the IPAddressPools.

spec.nodeSelectors

string

Optional: NodeSelectors limits the nodes that are announced as next hops for the load balancer IP. When empty, all the nodes are announced as next hops.

spec.peers

string

Optional: Peers limits the BGP peers that the IPs of the selected pools are advertised to. When empty, the load balancer IP is announced to all the configured BGP peers.

33.5.2. Configuring MetalLB with a BGP advertisement and a basic use case

Configure MetalLB as follows so that the peer BGP routers receive one 203.0.113.200/32 route and one fc00:f853:ccd:e799::1/128 route for each load-balancer IP address that MetalLB assigns to a service. Because the localPref and communities fields are not specified, the routes are advertised with localPref set to zero and no BGP communities.

33.5.2.1. Example: Advertise a basic address pool configuration with BGP

Configure MetalLB as follows so that the IPAddressPool is advertised with the BGP protocol.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-basic
      spec:
        addresses:
          - 203.0.113.200/30
          - fc00:f853:ccd:e799::/124
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a BGP advertisement.

    1. Create a file, such as bgpadvertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-basic
        namespace: metallb-system
      spec:
        ipAddressPools:
        - doc-example-bgp-basic
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement.yaml

33.5.3. Configuring MetalLB with a BGP advertisement and an advanced use case

Configure MetalLB as follows so that MetalLB assigns IP addresses to load-balancer services in the ranges between 203.0.113.200 and 203.0.113.203 and between fc00:f853:ccd:e799::0 and fc00:f853:ccd:e799::f.

To explain the two BGP advertisements, consider an instance when MetalLB assigns the IP address of 203.0.113.200 to a service. With that IP address as an example, the speaker advertises two routes to BGP peers:

  • 203.0.113.200/32, with localPref set to 100 and the community set to the numeric value of the NO_ADVERTISE community. This specification indicates to the peer routers that they can use this route but they should not propagate information about this route to BGP peers.
  • 203.0.113.200/30, aggregates the load-balancer IP addresses assigned by MetalLB into a single route. MetalLB advertises the aggregated route to BGP peers with the community attribute set to 8000:800. BGP peers propagate the 203.0.113.200/30 route to other BGP peers. When traffic is routed to a node with a speaker, the 203.0.113.200/32 route is used to forward the traffic into the cluster and to a pod that is associated with the service.

As you add more services and MetalLB assigns more load-balancer IP addresses from the pool, peer routers receive one local route, 203.0.113.20x/32, for each service, as well as the 203.0.113.200/30 aggregate route. Each service that you add generates the /30 route, but MetalLB deduplicates the routes to one BGP advertisement before communicating with peer routers.

33.5.3.1. Example: Advertise an advanced address pool configuration with BGP

Configure MetalLB as follows so that the IPAddressPool is advertised with the BGP protocol.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-adv
        labels:
          zone: east
      spec:
        addresses:
          - 203.0.113.200/30
          - fc00:f853:ccd:e799::/124
        autoAssign: false
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a BGP advertisement.

    1. Create a file, such as bgpadvertisement1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-adv-1
        namespace: metallb-system
      spec:
        ipAddressPools:
          - doc-example-bgp-adv
        communities:
          - 65535:65282
        aggregationLength: 32
        localPref: 100
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement1.yaml
    3. Create a file, such as bgpadvertisement2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-adv-2
        namespace: metallb-system
      spec:
        ipAddressPools:
          - doc-example-bgp-adv
        communities:
          - 8000:800
        aggregationLength: 30
        aggregationLengthV6: 124
    4. Apply the configuration:

      $ oc apply -f bgpadvertisement2.yaml

33.5.4. Advertising an IP address pool from a subset of nodes

To advertise an IP address from an IP address pool from a specific set of nodes only, use the .spec.nodeSelector specification in the BGPAdvertisement custom resource. This specification associates a pool of IP addresses with a set of nodes in the cluster. This is useful when you have nodes on different subnets in a cluster and you want to advertise an IP address from an address pool on a specific subnet, for example a public-facing subnet only.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool by using a custom resource:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      namespace: metallb-system
      name: pool1
    spec:
      addresses:
        - 4.4.4.100-4.4.4.200
        - 2001:100:4::200-2001:100:4::400
  2. Control which nodes in the cluster advertise the IP addresses from pool1 by defining the .spec.nodeSelector value in the BGPAdvertisement custom resource:

    apiVersion: metallb.io/v1beta1
    kind: BGPAdvertisement
    metadata:
      name: example
    spec:
      ipAddressPools:
      - pool1
      nodeSelector:
      - matchLabels:
          kubernetes.io/hostname: NodeA
      - matchLabels:
          kubernetes.io/hostname: NodeB

In this example, the IP addresses from pool1 are advertised from NodeA and NodeB only.

33.5.5. About the L2Advertisement custom resource

The fields for the l2Advertisements object are defined in the following table:

Table 33.4. L2 advertisements configuration
Field | Type | Description

metadata.name

string

Specifies the name for the L2 advertisement.

metadata.namespace

string

Specifies the namespace for the L2 advertisement. Specify the same namespace that the MetalLB Operator uses.

spec.ipAddressPools

string

Optional: The list of IPAddressPools to advertise with this advertisement, selected by name.

spec.ipAddressPoolSelectors

string

Optional: A selector for the IPAddressPools that get advertised with this advertisement. Use this field to associate the IPAddressPool with the advertisement based on a label assigned to the IPAddressPool instead of its name. If no IPAddressPool is selected by this field or by the ipAddressPools list, the advertisement is applied to all the IPAddressPools.

spec.nodeSelectors

string

Optional: NodeSelectors limits the nodes to announce as next hops for the load balancer IP. When empty, all the nodes are announced as next hops.

Important

Limiting the nodes to announce as next hops is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

spec.interfaces

string

Optional: The list of interfaces that are used to announce the load balancer IP.

33.5.6. Configuring MetalLB with an L2 advertisement

Configure MetalLB as follows so that the IPAddressPool is advertised with the L2 protocol.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-l2
      spec:
        addresses:
          - 4.4.4.0/24
        autoAssign: false
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create an L2 advertisement.

    1. Create a file, such as l2advertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: l2advertisement
        namespace: metallb-system
      spec:
        ipAddressPools:
         - doc-example-l2
    2. Apply the configuration:

      $ oc apply -f l2advertisement.yaml

33.5.7. Configuring MetalLB with an L2 advertisement and label

The ipAddressPoolSelectors field in the BGPAdvertisement and L2Advertisement custom resource definitions is used to associate the IPAddressPool to the advertisement based on the label assigned to the IPAddressPool instead of the name itself.

This example shows how to configure MetalLB so that the IPAddressPool is advertised with the L2 protocol by configuring the ipAddressPoolSelectors field.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-l2-label
        labels:
          zone: east
      spec:
        addresses:
          - 172.31.249.87/32
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create an L2 advertisement that advertises the IP address by using ipAddressPoolSelectors.

    1. Create a file, such as l2advertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: l2advertisement-label
        namespace: metallb-system
      spec:
        ipAddressPoolSelectors:
          - matchExpressions:
              - key: zone
                operator: In
                values:
                  - east
    2. Apply the configuration:

      $ oc apply -f l2advertisement.yaml

33.5.8. Configuring MetalLB with an L2 advertisement for selected interfaces

By default, the IP addresses from an IP address pool that are assigned to a service are advertised from all the network interfaces. The interfaces field in the L2Advertisement custom resource definition is used to restrict the network interfaces that advertise the IP address pool.

This example shows how to configure MetalLB so that the IP address pool is advertised only from the network interfaces listed in the interfaces field of all nodes.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, and enter the configuration details like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-l2
      spec:
        addresses:
          - 4.4.4.0/24
        autoAssign: false
    2. Apply the configuration for the IP address pool like the following example:

      $ oc apply -f ipaddresspool.yaml
  2. Create an L2 advertisement that advertises the IP address with the interfaces selector.

    1. Create a YAML file, such as l2advertisement.yaml, and enter the configuration details like the following example:

      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: l2advertisement
        namespace: metallb-system
      spec:
        ipAddressPools:
         - doc-example-l2
        interfaces:
         - interfaceA
         - interfaceB
    2. Apply the configuration for the advertisement like the following example:

      $ oc apply -f l2advertisement.yaml
Important

The interface selector does not affect how MetalLB chooses the node to announce a given IP by using L2. The chosen node does not announce the service if the node does not have the selected interface.

33.5.9. Additional resources

33.6. Configuring MetalLB BGP peers

As a cluster administrator, you can add, modify, and delete Border Gateway Protocol (BGP) peers. The MetalLB Operator uses the BGP peer custom resources to identify which peers the MetalLB speaker pods contact to start BGP sessions. The peers receive the route advertisements for the load-balancer IP addresses that MetalLB assigns to services.

33.6.1. About the BGP peer custom resource

The fields for the BGP peer custom resource are described in the following table.

Table 33.5. MetalLB BGP peer custom resource
Field | Type | Description

metadata.name

string

Specifies the name for the BGP peer custom resource.

metadata.namespace

string

Specifies the namespace for the BGP peer custom resource.

spec.myASN

integer

Specifies the Autonomous System number for the local end of the BGP session. Specify the same value in all BGP peer custom resources that you add. The range is 0 to 4294967295.

spec.peerASN

integer

Specifies the Autonomous System number for the remote end of the BGP session. The range is 0 to 4294967295.

spec.peerAddress

string

Specifies the IP address of the peer to contact for establishing the BGP session.

spec.sourceAddress

string

Optional: Specifies the IP address to use when establishing the BGP session. The value must be an IPv4 address.

spec.peerPort

integer

Optional: Specifies the network port of the peer to contact for establishing the BGP session. The range is 0 to 16384.

spec.holdTime

string

Optional: Specifies the duration for the hold time to propose to the BGP peer. The minimum value is 3 seconds (3s). The common units are seconds and minutes, such as 3s, 1m, and 5m30s. To detect path failures more quickly, also configure BFD.

spec.keepaliveTime

string

Optional: Specifies the maximum interval between sending keep-alive messages to the BGP peer. If you specify this field, you must also specify a value for the holdTime field. The specified value must be less than the value for the holdTime field.

spec.routerID

string

Optional: Specifies the router ID to advertise to the BGP peer. If you specify this field, you must specify the same value in every BGP peer custom resource that you add.

spec.password

string

Optional: Specifies the MD5 password to send to the peer for routers that enforce TCP MD5 authenticated BGP sessions.

spec.passwordSecret

string

Optional: Specifies the name of the authentication secret for the BGP peer. The secret must live in the metallb namespace and be of type basic-auth.

spec.bfdProfile

string

Optional: Specifies the name of a BFD profile.

spec.nodeSelectors

object[]

Optional: Specifies a selector, using match expressions and match labels, to control which nodes can connect to the BGP peer.

spec.ebgpMultiHop

boolean

Optional: Specifies that the BGP peer is multiple network hops away. If the BGP peer is not directly connected to the same network, the speaker cannot establish a BGP session unless this field is set to true. This field applies to external BGP. External BGP is the term that is used to describe when a BGP peer belongs to a different Autonomous System.

Note

The passwordSecret field is mutually exclusive with the password field, and contains a reference to a secret that contains the password to use. Setting both fields results in a parsing failure.

33.6.2. Configuring a BGP peer

As a cluster administrator, you can add a BGP peer custom resource to exchange routing information with network routers and advertise the IP addresses for services.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Configure MetalLB with a BGP advertisement.

Procedure

  1. Create a file, such as bgppeer.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta2
    kind: BGPPeer
    metadata:
      namespace: metallb-system
      name: doc-example-peer
    spec:
      peerAddress: 10.0.0.1
      peerASN: 64501
      myASN: 64500
      routerID: 10.10.10.10
  2. Apply the configuration for the BGP peer:

    $ oc apply -f bgppeer.yaml

33.6.3. Configure a specific set of BGP peers for a given address pool

This procedure illustrates how to:

  • Configure a set of address pools (pool1 and pool2).
  • Configure a set of BGP peers (peer1 and peer2).
  • Configure BGP advertisement to assign pool1 to peer1 and pool2 to peer2.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create address pool pool1.

    1. Create a file, such as ipaddresspool1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: pool1
      spec:
        addresses:
          - 4.4.4.100-4.4.4.200
          - 2001:100:4::200-2001:100:4::400
    2. Apply the configuration for the IP address pool pool1:

      $ oc apply -f ipaddresspool1.yaml
  2. Create address pool pool2.

    1. Create a file, such as ipaddresspool2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: pool2
      spec:
        addresses:
          - 5.5.5.100-5.5.5.200
          - 2001:100:5::200-2001:100:5::400
    2. Apply the configuration for the IP address pool pool2:

      $ oc apply -f ipaddresspool2.yaml
  3. Create BGP peer1.

    1. Create a file, such as bgppeer1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        namespace: metallb-system
        name: peer1
      spec:
        peerAddress: 10.0.0.1
        peerASN: 64501
        myASN: 64500
        routerID: 10.10.10.10
    2. Apply the configuration for the BGP peer:

      $ oc apply -f bgppeer1.yaml
  4. Create BGP peer2.

    1. Create a file, such as bgppeer2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        namespace: metallb-system
        name: peer2
      spec:
        peerAddress: 10.0.0.2
        peerASN: 64501
        myASN: 64500
        routerID: 10.10.10.10
    2. Apply the configuration for the BGP peer2:

      $ oc apply -f bgppeer2.yaml
  5. Create BGP advertisement 1.

    1. Create a file, such as bgpadvertisement1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-1
        namespace: metallb-system
      spec:
        ipAddressPools:
          - pool1
        peers:
          - peer1
        communities:
          - 65535:65282
        aggregationLength: 32
        aggregationLengthV6: 128
        localPref: 100
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement1.yaml
  6. Create BGP advertisement 2.

    1. Create a file, such as bgpadvertisement2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-2
        namespace: metallb-system
      spec:
        ipAddressPools:
          - pool2
        peers:
          - peer2
        communities:
          - 65535:65282
        aggregationLength: 32
        aggregationLengthV6: 128
        localPref: 100
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement2.yaml

33.6.4. Example BGP peer configurations

33.6.4.1. Example: Limit which nodes connect to a BGP peer

You can specify the node selectors field to control which nodes can connect to a BGP peer.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-nodesel
  namespace: metallb-system
spec:
  peerAddress: 10.0.20.1
  peerASN: 64501
  myASN: 64500
  nodeSelectors:
  - matchExpressions:
    - key: kubernetes.io/hostname
      operator: In
      values: [compute-1.example.com, compute-2.example.com]
33.6.4.2. Example: Specify a BFD profile for a BGP peer

You can specify a BFD profile to associate with BGP peers. BFD complements BGP by providing more rapid detection of communication failures between peers than BGP alone.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-peer-bfd
  namespace: metallb-system
spec:
  peerAddress: 10.0.20.1
  peerASN: 64501
  myASN: 64500
  holdTime: "10s"
  bfdProfile: doc-example-bfd-profile-full
Note

Deleting the bidirectional forwarding detection (BFD) profile and removing the bfdProfile added to the border gateway protocol (BGP) peer resource does not disable the BFD. Instead, the BGP peer starts using the default BFD profile. To disable BFD from a BGP peer resource, delete the BGP peer configuration and recreate it without a BFD profile. For more information, see BZ#2050824.
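
A hedged sketch of that workaround follows, reusing the peer name from the preceding example. The file name bgppeer-no-bfd.yaml is a placeholder for a manifest that defines the same peer without the bfdProfile field.

$ oc delete bgppeer doc-example-peer-bfd -n metallb-system

$ oc apply -f bgppeer-no-bfd.yaml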

33.6.4.3. Example: Specify BGP peers for dual-stack networking

To support dual-stack networking, add one BGP peer custom resource for IPv4 and one BGP peer custom resource for IPv6.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-dual-stack-ipv4
  namespace: metallb-system
spec:
  peerAddress: 10.0.20.1
  peerASN: 64500
  myASN: 64500
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-dual-stack-ipv6
  namespace: metallb-system
spec:
  peerAddress: 2620:52:0:88::104
  peerASN: 64500
  myASN: 64500

33.6.5. Next steps

33.7. Configuring community alias

As a cluster administrator, you can configure a community alias and use it across different advertisements.

33.7.1. About the community custom resource

The community custom resource is a collection of aliases for communities. Users can define named aliases to be used when advertising ipAddressPools using the BGPAdvertisement. The fields for the community custom resource are described in the following table.

Note

The community CRD applies only to BGPAdvertisement.

Table 33.6. MetalLB community custom resource
Field | Type | Description

metadata.name

string

Specifies the name for the community.

metadata.namespace

string

Specifies the namespace for the community. Specify the same namespace that the MetalLB Operator uses.

spec.communities

string

Specifies a list of BGP community aliases that can be used in BGPAdvertisements. A community alias consists of a pair of name (alias) and value (number:number). Link the BGPAdvertisement to a community alias by referring to the alias name in its spec.communities field.

Table 33.7. CommunityAlias
Field | Type | Description

name

string

The name of the alias for the community.

value

string

The BGP community value corresponding to the given name.

33.7.2. Configuring MetalLB with a BGP advertisement and community alias

Configure MetalLB as follows so that the IPAddressPool is advertised with the BGP protocol and the community alias set to the numeric value of the NO_ADVERTISE community.

In the following example, the peer BGP router doc-example-bgp-peer receives one 203.0.113.200/32 route and one fc00:f853:ccd:e799::1/128 route for each load-balancer IP address that MetalLB assigns to a service. A community alias is configured with the NO_ADVERTISE community.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-community
      spec:
        addresses:
          - 203.0.113.200/30
          - fc00:f853:ccd:e799::/124
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a community alias named community1.

    apiVersion: metallb.io/v1beta1
    kind: Community
    metadata:
      name: community1
      namespace: metallb-system
    spec:
      communities:
        - name: NO_ADVERTISE
          value: '65535:65282'
  3. Create a BGP peer named doc-example-bgp-peer.

    1. Create a file, such as bgppeer.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-peer
      spec:
        peerAddress: 10.0.0.1
        peerASN: 64501
        myASN: 64500
        routerID: 10.10.10.10
    2. Apply the configuration for the BGP peer:

      $ oc apply -f bgppeer.yaml
  4. Create a BGP advertisement with the community alias.

    1. Create a file, such as bgpadvertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgp-community-sample
        namespace: metallb-system
      spec:
        aggregationLength: 32
        aggregationLengthV6: 128
        communities:
          - NO_ADVERTISE 1
        ipAddressPools:
          - doc-example-bgp-community
        peers:
          - doc-example-bgp-peer
      1
      Specify the CommunityAlias.name here and not the community custom resource (CR) name.
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement.yaml

33.8. Configuring MetalLB BFD profiles

As a cluster administrator, you can add, modify, and delete Bidirectional Forwarding Detection (BFD) profiles. The MetalLB Operator uses the BFD profile custom resources to identify which BGP sessions use BFD to provide faster path failure detection than BGP alone provides.

33.8.1. About the BFD profile custom resource

The fields for the BFD profile custom resource are described in the following table.

Table 33.8. BFD profile custom resource
Field | Type | Description

metadata.name

string

Specifies the name for the BFD profile custom resource.

metadata.namespace

string

Specifies the namespace for the BFD profile custom resource.

spec.detectMultiplier

integer

Specifies the detection multiplier to determine packet loss. The remote transmission interval is multiplied by this value to determine the connection loss detection timer.

For example, when the local system has the detect multiplier set to 3 and the remote system has the transmission interval set to 300, the local system detects failures only after 900 ms without receiving packets.

The range is 2 to 255. The default value is 3.

spec.echoMode

boolean

Specifies the echo transmission mode. If you are not using distributed BFD, echo transmission mode works only when the peer is also FRR. The default value is false and echo transmission mode is disabled.

When echo transmission mode is enabled, consider increasing the transmission interval of control packets to reduce bandwidth usage. For example, consider increasing the transmit interval to 2000 ms.

spec.echoInterval

integer

Specifies the minimum transmission interval, less jitter, that this system uses to send and receive echo packets. The range is 10 to 60000. The default value is 50 ms.

spec.minimumTtl

integer

Specifies the minimum expected TTL for an incoming control packet. This field applies to multi-hop sessions only.

The purpose of setting a minimum TTL is to make the packet validation requirements more stringent and avoid receiving control packets from other sessions.

The default value is 254 and indicates that the system expects only one hop between this system and the peer.

spec.passiveMode

boolean

Specifies whether a session is marked as active or passive. A passive session does not attempt to start the connection. Instead, a passive session waits for control packets from a peer before it begins to reply.

Marking a session as passive is useful when you have a router that acts as the central node of a star network and you want to avoid sending control packets that you do not need the system to send.

The default value is false and marks the session as active.

spec.receiveInterval

integer

Specifies the minimum interval that this system is capable of receiving control packets. The range is 10 to 60000. The default value is 300 ms.

spec.transmitInterval

integer

Specifies the minimum transmission interval, less jitter, that this system uses to send control packets. The range is 10 to 60000. The default value is 300 ms.

33.8.2. Configuring a BFD profile

As a cluster administrator, you can add a BFD profile and configure a BGP peer to use the profile. BFD provides faster path failure detection than BGP alone.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a file, such as bfdprofile.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta1
    kind: BFDProfile
    metadata:
      name: doc-example-bfd-profile-full
      namespace: metallb-system
    spec:
      receiveInterval: 300
      transmitInterval: 300
      detectMultiplier: 3
      echoMode: false
      passiveMode: true
      minimumTtl: 254
  2. Apply the configuration for the BFD profile:

    $ oc apply -f bfdprofile.yaml

33.8.3. Next steps

33.9. Configuring services to use MetalLB

As a cluster administrator, when you add a service of type LoadBalancer, you can control how MetalLB assigns an IP address.

33.9.1. Request a specific IP address

Like some other load-balancer implementations, MetalLB accepts the spec.loadBalancerIP field in the service specification.

If the requested IP address is within a range from any address pool, MetalLB assigns the requested IP address. If the requested IP address is not within any range, MetalLB reports a warning.

Example service YAML for a specific IP address

apiVersion: v1
kind: Service
metadata:
  name: <service_name>
  annotations:
    metallb.universe.tf/address-pool: <address_pool_name>
spec:
  selector:
    <label_key>: <label_value>
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer
  loadBalancerIP: <ip_address>

If MetalLB cannot assign the requested IP address, the EXTERNAL-IP for the service reports <pending> and running oc describe service <service_name> includes an event like the following example.

Example event when MetalLB cannot assign a requested IP address

  ...
Events:
  Type     Reason            Age    From                Message
  ----     ------            ----   ----                -------
  Warning  AllocationFailed  3m16s  metallb-controller  Failed to allocate IP for "default/invalid-request": "4.3.2.1" is not allowed in config

33.9.2. Request an IP address from a specific pool

To assign an IP address from a specific pool when you are not concerned with the specific IP address, you can use the metallb.universe.tf/address-pool annotation to request an IP address from the specified address pool.

Example service YAML for an IP address from a specific pool

apiVersion: v1
kind: Service
metadata:
  name: <service_name>
  annotations:
    metallb.universe.tf/address-pool: <address_pool_name>
spec:
  selector:
    <label_key>: <label_value>
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer

If the address pool that you specify for <address_pool_name> does not exist, MetalLB attempts to assign an IP address from any pool that permits automatic assignment.

33.9.3. Accept any IP address

By default, address pools are configured to permit automatic assignment. MetalLB assigns an IP address from these address pools.

To accept any IP address from any pool that is configured for automatic assignment, no special annotation or configuration is required.

Example service YAML for accepting any IP address

apiVersion: v1
kind: Service
metadata:
  name: <service_name>
spec:
  selector:
    <label_key>: <label_value>
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer

33.9.4. Share a specific IP address

By default, services do not share IP addresses. However, if you need to colocate services on a single IP address, you can enable selective IP sharing by adding the metallb.universe.tf/allow-shared-ip annotation to the services.

apiVersion: v1
kind: Service
metadata:
  name: service-http
  annotations:
    metallb.universe.tf/address-pool: doc-example
    metallb.universe.tf/allow-shared-ip: "web-server-svc"  1
spec:
  ports:
    - name: http
      port: 80  2
      protocol: TCP
      targetPort: 8080
  selector:
    <label_key>: <label_value>  3
  type: LoadBalancer
  loadBalancerIP: 172.31.249.7  4
---
apiVersion: v1
kind: Service
metadata:
  name: service-https
  annotations:
    metallb.universe.tf/address-pool: doc-example
    metallb.universe.tf/allow-shared-ip: "web-server-svc"  5
spec:
  ports:
    - name: https
      port: 443  6
      protocol: TCP
      targetPort: 8080
  selector:
    <label_key>: <label_value>  7
  type: LoadBalancer
  loadBalancerIP: 172.31.249.7  8
1 5
Specify the same value for the metallb.universe.tf/allow-shared-ip annotation. This value is referred to as the sharing key.
2 6
Specify different port numbers for the services.
3 7
Specify identical pod selectors if you must specify externalTrafficPolicy: local so the services send traffic to the same set of pods. If you use the cluster external traffic policy, then the pod selectors do not need to be identical.
4 8
Optional: If you specify the three preceding items, MetalLB might colocate the services on the same IP address. To ensure that services share an IP address, specify the IP address to share.

By default, Kubernetes does not allow multiprotocol load balancer services. This limitation would normally make it impossible to run a service like DNS that needs to listen on both TCP and UDP. To work around this limitation of Kubernetes with MetalLB, create two services, as shown in the sketch after this list:

  • For one service, specify TCP and for the second service, specify UDP.
  • In both services, specify the same pod selector.
  • Specify the same sharing key and spec.loadBalancerIP value to colocate the TCP and UDP services on the same IP address.
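
The following is a minimal sketch of that pattern for a hypothetical DNS workload. The service names, the app: dns selector, the sharing key, and the 172.31.249.8 address are illustrative assumptions; the address must belong to one of your address pools.

apiVersion: v1
kind: Service
metadata:
  name: dns-service-tcp
  annotations:
    metallb.universe.tf/allow-shared-ip: "dns-svc"
spec:
  selector:
    app: dns
  ports:
    - name: dns-tcp
      port: 53
      protocol: TCP
      targetPort: 53
  type: LoadBalancer
  loadBalancerIP: 172.31.249.8
---
apiVersion: v1
kind: Service
metadata:
  name: dns-service-udp
  annotations:
    metallb.universe.tf/allow-shared-ip: "dns-svc"
spec:
  selector:
    app: dns
  ports:
    - name: dns-udp
      port: 53
      protocol: UDP
      targetPort: 53
  type: LoadBalancer
  loadBalancerIP: 172.31.249.8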

33.9.5. Configuring a service with MetalLB

You can configure a load-balancing service to use an external IP address from an address pool.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Install the MetalLB Operator and start MetalLB.
  • Configure at least one address pool.
  • Configure your network to route traffic from the clients to the host network for the cluster.

Procedure

  1. Create a <service_name>.yaml file. In the file, ensure that the spec.type field is set to LoadBalancer.

    Refer to the examples for information about how to request the external IP address that MetalLB assigns to the service.

  2. Create the service:

    $ oc apply -f <service_name>.yaml

    Example output

    service/<service_name> created

Verification

  • Describe the service:

    $ oc describe service <service_name>

    Example output

    Name:                     <service_name>
    Namespace:                default
    Labels:                   <none>
     Annotations:              metallb.universe.tf/address-pool: doc-example  1
    Selector:                 app=service_name
     Type:                     LoadBalancer  2
    IP Family Policy:         SingleStack
    IP Families:              IPv4
    IP:                       10.105.237.254
    IPs:                      10.105.237.254
     LoadBalancer Ingress:     192.168.100.5  3
    Port:                     <unset>  80/TCP
    TargetPort:               8080/TCP
    NodePort:                 <unset>  30550/TCP
    Endpoints:                10.244.0.50:8080
    Session Affinity:         None
    External Traffic Policy:  Cluster
     Events:  4
      Type    Reason        Age                From             Message
      ----    ------        ----               ----             -------
      Normal  nodeAssigned  32m (x2 over 32m)  metallb-speaker  announcing from node "<node_name>"

     1
     The annotation is present if you request an IP address from a specific pool.
     2
     The service type must indicate LoadBalancer.
     3
     The load-balancer ingress field indicates the external IP address if the service is assigned correctly.
     4
     The events field indicates the node name that is assigned to announce the external IP address. If you experience an error, the events field indicates the reason for the error.

33.10. MetalLB logging, troubleshooting, and support

If you need to troubleshoot MetalLB configuration, see the following sections for commonly used commands.

33.10.1. Setting the MetalLB logging levels

MetalLB uses FRRouting (FRR) in a container and the default setting of info generates a lot of logging. You can control the verbosity of the generated logs by setting the logLevel field as illustrated in this example.

Gain a deeper insight into MetalLB by setting the logLevel to debug as follows:

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a file, such as setdebugloglevel.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      nodeSelector:
        node-role.kubernetes.io/worker: ""
  2. Apply the configuration:

    $ oc replace -f setdebugloglevel.yaml
    Note

    Use oc replace because the MetalLB custom resource is already created and this procedure changes only the log level.

  3. Display the names of the speaker pods:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME                    READY   STATUS    RESTARTS   AGE
    speaker-2m9pm           4/4     Running   0          9m19s
    speaker-7m4qw           3/4     Running   0          19s
    speaker-szlmx           4/4     Running   0          9m19s

    Note

    Speaker and controller pods are recreated to ensure the updated logging level is applied. The logging level is modified for all the components of MetalLB.

  4. View the speaker logs:

    $ oc logs -n metallb-system speaker-7m4qw -c speaker

    Example output

    {"branch":"main","caller":"main.go:92","commit":"3d052535","goversion":"gc / go1.17.1 / amd64","level":"info","msg":"MetalLB speaker starting (commit 3d052535, branch main)","ts":"2022-05-17T09:55:05Z","version":""}
    {"caller":"announcer.go:110","event":"createARPResponder","interface":"ens4","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"}
    {"caller":"announcer.go:119","event":"createNDPResponder","interface":"ens4","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"}
    {"caller":"announcer.go:110","event":"createARPResponder","interface":"tun0","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"}
    {"caller":"announcer.go:119","event":"createNDPResponder","interface":"tun0","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"}
    I0517 09:55:06.515686      95 request.go:665] Waited for 1.026500832s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
    {"Starting Manager":"(MISSING)","caller":"k8s.go:389","level":"info","ts":"2022-05-17T09:55:08Z"}
    {"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.0.128.4","node event":"NodeJoin","node name":"ci-ln-qb8t3mb-72292-7s7rh-worker-a-vvznj","ts":"2022-05-17T09:55:08Z"}
    {"caller":"service_controller.go:113","controller":"ServiceReconciler","enqueueing":"openshift-kube-controller-manager-operator/metrics","epslice":"{\"metadata\":{\"name\":\"metrics-xtsxr\",\"generateName\":\"metrics-\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"uid\":\"ac6766d7-8504-492c-9d1e-4ae8897990ad\",\"resourceVersion\":\"9041\",\"generation\":4,\"creationTimestamp\":\"2022-05-17T07:16:53Z\",\"labels\":{\"app\":\"kube-controller-manager-operator\",\"endpointslice.kubernetes.io/managed-by\":\"endpointslice-controller.k8s.io\",\"kubernetes.io/service-name\":\"metrics\"},\"annotations\":{\"endpoints.kubernetes.io/last-change-trigger-time\":\"2022-05-17T07:21:34Z\"},\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"name\":\"metrics\",\"uid\":\"0518eed3-6152-42be-b566-0bd00a60faf8\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"discovery.k8s.io/v1\",\"time\":\"2022-05-17T07:20:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:addressType\":{},\"f:endpoints\":{},\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:endpoints.kubernetes.io/last-change-trigger-time\":{}},\"f:generateName\":{},\"f:labels\":{\".\":{},\"f:app\":{},\"f:endpointslice.kubernetes.io/managed-by\":{},\"f:kubernetes.io/service-name\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"0518eed3-6152-42be-b566-0bd00a60faf8\\\"}\":{}}},\"f:ports\":{}}}]},\"addressType\":\"IPv4\",\"endpoints\":[{\"addresses\":[\"10.129.0.7\"],\"conditions\":{\"ready\":true,\"serving\":true,\"terminating\":false},\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"name\":\"kube-controller-manager-operator-6b98b89ddd-8d4nf\",\"uid\":\"dd5139b8-e41c-4946-a31b-1a629314e844\",\"resourceVersion\":\"9038\"},\"nodeName\":\"ci-ln-qb8t3mb-72292-7s7rh-master-0\",\"zone\":\"us-central1-a\"}],\"ports\":[{\"name\":\"https\",\"protocol\":\"TCP\",\"port\":8443}]}","level":"debug","ts":"2022-05-17T09:55:08Z"}

  5. View the FRR logs:

    $ oc logs -n metallb-system speaker-7m4qw -c frr

    Example output

    Started watchfrr
    2022/05/17 09:55:05 ZEBRA: client 16 says hello and bids fair to announce only bgp routes vrf=0
    2022/05/17 09:55:05 ZEBRA: client 31 says hello and bids fair to announce only vnc routes vrf=0
    2022/05/17 09:55:05 ZEBRA: client 38 says hello and bids fair to announce only static routes vrf=0
    2022/05/17 09:55:05 ZEBRA: client 43 says hello and bids fair to announce only bfd routes vrf=0
    2022/05/17 09:57:25.089 BGP: Creating Default VRF, AS 64500
    2022/05/17 09:57:25.090 BGP: dup addr detect enable max_moves 5 time 180 freeze disable freeze_time 0
    2022/05/17 09:57:25.090 BGP: bgp_get: Registering BGP instance (null) to zebra
    2022/05/17 09:57:25.090 BGP: Registering VRF 0
    2022/05/17 09:57:25.091 BGP: Rx Router Id update VRF 0 Id 10.131.0.1/32
    2022/05/17 09:57:25.091 BGP: RID change : vrf VRF default(0), RTR ID 10.131.0.1
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF br0
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ens4
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr 10.0.128.4/32
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr fe80::c9d:84da:4d86:5618/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF lo
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ovs-system
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF tun0
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr 10.131.0.1/23
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr fe80::40f1:d1ff:feb6:5322/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2da49fed
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2da49fed addr fe80::24bd:d1ff:fec1:d88/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2fa08c8c
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2fa08c8c addr fe80::6870:ff:fe96:efc8/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth41e356b7
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth41e356b7 addr fe80::48ff:37ff:fede:eb4b/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth1295c6e2
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth1295c6e2 addr fe80::b827:a2ff:feed:637/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth9733c6dc
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth9733c6dc addr fe80::3cf4:15ff:fe11:e541/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth336680ea
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth336680ea addr fe80::94b1:8bff:fe7e:488c/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vetha0a907b7
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vetha0a907b7 addr fe80::3855:a6ff:fe73:46c3/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf35a4398
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf35a4398 addr fe80::40ef:2fff:fe57:4c4d/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf831b7f4
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf831b7f4 addr fe80::f0d9:89ff:fe7c:1d32/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vxlan_sys_4789
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vxlan_sys_4789 addr fe80::80c1:82ff:fe4b:f078/64
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Timer (start timer expire).
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] BGP_Start (Idle->Connect), fd -1
    2022/05/17 09:57:26.094 BGP: Allocated bnc 10.0.0.1/32(0)(VRF default) peer 0x7f807f7631a0
    2022/05/17 09:57:26.094 BGP: sendmsg_zebra_rnh: sending cmd ZEBRA_NEXTHOP_REGISTER for 10.0.0.1/32 (vrf VRF default)
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Waiting for NHT
    2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Idle to Connect
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] TCP_connection_open_failed (Connect->Active), fd -1
    2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Connect to Active
    2022/05/17 09:57:26.094 ZEBRA: rnh_register msg from client bgp: hdr->length=8, type=nexthop vrf=0
    2022/05/17 09:57:26.094 ZEBRA: 0: Add RNH 10.0.0.1/32 type Nexthop
    2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: Evaluate RNH, type Nexthop (force)
    2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: NH has become unresolved
    2022/05/17 09:57:26.094 ZEBRA: 0: Client bgp registers for RNH 10.0.0.1/32 type Nexthop
    2022/05/17 09:57:26.094 BGP: VRF default(0): Rcvd NH update 10.0.0.1/32(0) - metric 0/0 #nhops 0/0 flags 0x6
    2022/05/17 09:57:26.094 BGP: NH update for 10.0.0.1/32(0)(VRF default) - flags 0x6 chgflags 0x0 - evaluate paths
    2022/05/17 09:57:26.094 BGP: evaluate_paths: Updating peer (10.0.0.1(VRF default)) status with NHT
    2022/05/17 09:57:30.081 ZEBRA: Event driven route-map update triggered
    2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-out
    2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-in
    2022/05/17 09:57:31.104 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
    2022/05/17 09:57:31.104 ZEBRA: 	Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring
    2022/05/17 09:57:31.105 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
    2022/05/17 09:57:31.105 ZEBRA: 	Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring

33.10.1.1. FRRouting (FRR) log levels

The following table describes the FRR logging levels.

Table 33.9. Log levels
Log level | Description

all

Supplies all logging information for all logging levels.

debug

Information that is diagnostically helpful to people. Set to debug to give detailed troubleshooting information.

info

Provides information that always should be logged but under normal circumstances does not require user intervention. This is the default logging level.

warn

Anything that can potentially cause inconsistent MetalLB behaviour. Usually MetalLB automatically recovers from this type of error.

error

Any error that is fatal to the functioning of MetalLB. These errors usually require administrator intervention to fix.

none

Turn off all logging.

33.10.2. Troubleshooting BGP issues

The BGP implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. As a cluster administrator, if you need to troubleshoot BGP configuration issues, you need to run commands in the FRR container.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Display the names of the speaker pods:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME            READY   STATUS    RESTARTS   AGE
    speaker-66bth   4/4     Running   0          56m
    speaker-gvfnf   4/4     Running   0          56m
    ...

  2. Display the running configuration for FRR:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show running-config"

    Example output

    Building configuration...
    
    Current configuration:
    !
    frr version 7.5.1_git
    frr defaults traditional
    hostname some-hostname
    log file /etc/frr/frr.log informational
    log timestamp precision 3
    service integrated-vtysh-config
    !
    router bgp 64500  1
     bgp router-id 10.0.1.2
     no bgp ebgp-requires-policy
     no bgp default ipv4-unicast
     no bgp network import-check
     neighbor 10.0.2.3 remote-as 64500  2
     neighbor 10.0.2.3 bfd profile doc-example-bfd-profile-full  3
     neighbor 10.0.2.3 timers 5 15
     neighbor 10.0.2.4 remote-as 64500  4
     neighbor 10.0.2.4 bfd profile doc-example-bfd-profile-full  5
     neighbor 10.0.2.4 timers 5 15
     !
     address-family ipv4 unicast
      network 203.0.113.200/30   6
      neighbor 10.0.2.3 activate
      neighbor 10.0.2.3 route-map 10.0.2.3-in in
      neighbor 10.0.2.4 activate
      neighbor 10.0.2.4 route-map 10.0.2.4-in in
     exit-address-family
     !
     address-family ipv6 unicast
      network fc00:f853:ccd:e799::/124  7
      neighbor 10.0.2.3 activate
      neighbor 10.0.2.3 route-map 10.0.2.3-in in
      neighbor 10.0.2.4 activate
      neighbor 10.0.2.4 route-map 10.0.2.4-in in
     exit-address-family
    !
    route-map 10.0.2.3-in deny 20
    !
    route-map 10.0.2.4-in deny 20
    !
    ip nht resolve-via-default
    !
    ipv6 nht resolve-via-default
    !
    line vty
    !
    bfd
     profile doc-example-bfd-profile-full  8
      transmit-interval 35
      receive-interval 35
      passive-mode
      echo-mode
      echo-interval 35
      minimum-ttl 10
     !
    !
    end

     1
     The router bgp section indicates the ASN for MetalLB.
     2 4
     Confirm that a neighbor <ip-address> remote-as <peer-ASN> line exists for each BGP peer custom resource that you added.
     3 5 8
     If you configured BFD, confirm that the BFD profile is associated with the correct BGP peer and that the BFD profile appears in the command output.
     6 7
     Confirm that the network <ip-address-range> lines match the IP address ranges that you specified in address pool custom resources that you added.

  3. Display the BGP summary:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp summary"

    Example output

    IPv4 Unicast Summary:
    BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
    BGP table version 1
    RIB entries 1, using 192 bytes of memory
    Peers 2, using 29 KiB of memory
    
    Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
    10.0.2.3        4      64500       387       389        0    0    0 00:32:02            0        1  1
    10.0.2.4        4      64500         0         0        0    0    0    never       Active        0  2
    
    Total number of neighbors 2
    
    IPv6 Unicast Summary:
    BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
    BGP table version 1
    RIB entries 1, using 192 bytes of memory
    Peers 2, using 29 KiB of memory
    
    Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
    10.0.2.3        4      64500       387       389        0    0    0 00:32:02 NoNeg  3
    10.0.2.4        4      64500         0         0        0    0    0    never       Active        0  4
    
    Total number of neighbors 2

     1 3
     Confirm that the output includes a line for each BGP peer custom resource that you added.
     2 4
     Output that shows 0 messages received and messages sent indicates a BGP peer that does not have a BGP session. Check network connectivity and the BGP configuration of the BGP peer.
  4. Display the BGP peers that received an address pool:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp ipv4 unicast 203.0.113.200/30"

    Replace ipv4 with ipv6 to display the BGP peers that received an IPv6 address pool. Replace 203.0.113.200/30 with an IPv4 or IPv6 IP address range from an address pool.

    Example output

    BGP routing table entry for 203.0.113.200/30
    Paths: (1 available, best #1, table default)
      Advertised to non peer-group peers:
       10.0.2.3  1
      Local
        0.0.0.0 from 0.0.0.0 (10.0.1.2)
          Origin IGP, metric 0, weight 32768, valid, sourced, local, best (First path received)
          Last update: Mon Jan 10 19:49:07 2022

     1
     Confirm that the output includes an IP address for a BGP peer.

33.10.3. Troubleshooting BFD issues

The Bidirectional Forwarding Detection (BFD) implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. The BFD implementation relies on BFD peers also being configured as BGP peers with an established BGP session. As a cluster administrator, if you need to troubleshoot BFD configuration issues, you need to run commands in the FRR container.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Display the names of the speaker pods:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME            READY   STATUS    RESTARTS   AGE
    speaker-66bth   4/4     Running   0          26m
    speaker-gvfnf   4/4     Running   0          26m
    ...

  2. Display the BFD peers:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bfd peers brief"

    Example output

    Session count: 2
    SessionId  LocalAddress              PeerAddress              Status
    =========  ============              ===========              ======
     3909139637 10.0.1.2                  10.0.2.3                 up  1

     1
     Confirm that the PeerAddress column includes each BFD peer. If the output does not list a BFD peer IP address that you expected the output to include, troubleshoot BGP connectivity with the peer. If the status field indicates down, check for connectivity on the links and equipment between the node and the peer. You can determine the node name for the speaker pod with a command like oc get pods -n metallb-system speaker-66bth -o jsonpath='{.spec.nodeName}'.

33.10.4. MetalLB metrics for BGP and BFD

OpenShift Container Platform captures the following Prometheus metrics for MetalLB that relate to BGP peers and BFD profiles. An example alerting rule that uses one of these metrics follows the list.

  • metallb_bfd_control_packet_input counts the number of BFD control packets received from each BFD peer.
  • metallb_bfd_control_packet_output counts the number of BFD control packets sent to each BFD peer.
  • metallb_bfd_echo_packet_input counts the number of BFD echo packets received from each BFD peer.
  • metallb_bfd_echo_packet_output counts the number of BFD echo packets sent to each BFD peer.
  • metallb_bfd_session_down_events counts the number of times the BFD session with a peer entered the down state.
  • metallb_bfd_session_up indicates the connection state with a BFD peer. 1 indicates the session is up and 0 indicates the session is down.
  • metallb_bfd_session_up_events counts the number of times the BFD session with a peer entered the up state.
  • metallb_bfd_zebra_notifications counts the number of BFD Zebra notifications for each BFD peer.
  • metallb_bgp_announced_prefixes_total counts the number of load balancer IP address prefixes that are advertised to BGP peers. The terms prefix and aggregated route have the same meaning.
  • metallb_bgp_session_up indicates the connection state with a BGP peer. 1 indicates the session is up and 0 indicates the session is down.
  • metallb_bgp_updates_total counts the number of BGP update messages that were sent to a BGP peer.
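
As an illustration of how these metrics can be used, the following is a minimal sketch of an alerting rule that fires when a BGP session stays down. It assumes that your monitoring stack processes PrometheusRule objects (monitoring.coreos.com/v1) created in the metallb-system namespace; the rule name, alert name, and threshold are illustrative only.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: metallb-bgp-alerts
  namespace: metallb-system
spec:
  groups:
  - name: metallb-bgp
    rules:
    - alert: MetalLBBGPSessionDown
      expr: metallb_bgp_session_up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: A MetalLB BGP session has been down for more than 5 minutes.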

Additional resources

33.10.5. About collecting MetalLB data

You can use the oc adm must-gather CLI command to collect information about your cluster, your MetalLB configuration, and the MetalLB Operator. The following features and objects are associated with MetalLB and the MetalLB Operator:

  • The namespace that the MetalLB Operator is deployed in, and its child objects
  • All MetalLB Operator custom resource definitions (CRDs)

The oc adm must-gather CLI command collects the following information from FRRouting (FRR), which Red Hat uses to implement BGP and BFD:

  • /etc/frr/frr.conf
  • /etc/frr/frr.log
  • /etc/frr/daemons configuration file
  • /etc/frr/vtysh.conf

The log and configuration files in the preceding list are collected from the frr container in each speaker pod.
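
To inspect one of these files directly, without running the oc adm must-gather CLI command, you can read it from the frr container of a speaker pod. The following sketch uses the example speaker pod name from the earlier procedure; substitute a pod name from your cluster:

    $ oc exec -n metallb-system speaker-66bth -c frr -- cat /etc/frr/frr.conf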

In addition to the log and configuration files, the oc adm must-gather CLI command collects the output from the following vtysh commands:

  • show running-config
  • show bgp ipv4
  • show bgp ipv6
  • show bgp neighbor
  • show bfd peer

No additional configuration is required when you run the oc adm must-gather CLI command.
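
For example, the following command collects the default data set, which includes the MetalLB and FRR information described in this section. The --dest-dir flag is optional and writes the output to a directory of your choosing:

    $ oc adm must-gather --dest-dir=./must-gather-metallb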

Chapter 34. Associating secondary interfaces metrics to network attachments

34.1. Extending secondary network metrics for monitoring

Secondary devices, or interfaces, are used for different purposes. It is important to be able to classify them so that you can aggregate the metrics for secondary devices that share the same classification.

Exposed metrics include the interface name but do not specify where the interface originates. This works when there are no additional interfaces, but after secondary interfaces are added it becomes difficult to use the metrics, because interface names alone do not identify the interfaces.

When you add secondary interfaces, their names depend on the order in which they are added, and different secondary interfaces might belong to different networks and serve different purposes.

With the pod_network_name_info metric, you can extend the current metrics with additional information that identifies the interface type. This makes it possible to aggregate the metrics and to add specific alarms for specific interface types.

The network type is generated from the name of the related NetworkAttachmentDefinition, which in turn is used to differentiate classes of secondary networks. For example, interfaces that belong to different networks or that use different CNIs have different network attachment definition names.

34.1.1. Network Metrics Daemon

The Network Metrics Daemon is a daemon component that collects and publishes network-related metrics.

The kubelet already publishes network-related metrics that you can observe. These metrics are:

  • container_network_receive_bytes_total
  • container_network_receive_errors_total
  • container_network_receive_packets_total
  • container_network_receive_packets_dropped_total
  • container_network_transmit_bytes_total
  • container_network_transmit_errors_total
  • container_network_transmit_packets_total
  • container_network_transmit_packets_dropped_total

The labels in these metrics contain, among others:

  • Pod name
  • Pod namespace
  • Interface name (such as eth0)

These metrics work well until new interfaces are added to the pod, for example through Multus, because it is not clear what the interface names refer to.

The interface label refers to the interface name, but it does not indicate what the interface is meant for. With many different interfaces, it is impossible to tell which network the metrics you are monitoring refer to.

This is addressed by introducing the new pod_network_name_info metric described in the following section.

34.1.2. Metrics with network name

This daemon set publishes a pod_network_name_info gauge metric with a fixed value of 0:

pod_network_name_info{interface="net0",namespace="namespacename",network_name="nadnamespace/firstNAD",pod="podname"} 0

The network name label is produced using the annotation added by Multus. It is the concatenation of the namespace that the network attachment definition belongs to and the name of the network attachment definition.

The new metric alone does not provide much value, but combined with the network-related container_network_* metrics, it offers better support for monitoring secondary networks.

Using PromQL queries like the following, you can obtain new metrics that contain the value and the network name retrieved from the k8s.v1.cni.cncf.io/network-status annotation:

(container_network_receive_bytes_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_receive_errors_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_receive_packets_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_receive_packets_dropped_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_transmit_bytes_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_transmit_errors_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_transmit_packets_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
(container_network_transmit_packets_dropped_total) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )
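
For example, to chart the receive throughput of each secondary network, you can aggregate the combined metric by the network_name label. The following query is a minimal sketch that applies rate and sum to one of the queries above; adjust the metric and the time range to suit your monitoring needs:

sum by (network_name) (
  rate(container_network_receive_bytes_total[5m])
  + on(namespace,pod,interface) group_left(network_name)
  (pod_network_name_info)
)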

Legal Notice

Copyright © 2024 Red Hat, Inc.

OpenShift documentation is licensed under the Apache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

Modified versions must remove all Red Hat trademarks.

Portions adapted from https://github.com/kubernetes-incubator/service-catalog/ with modifications by Red Hat.

Red Hat, Red Hat Enterprise Linux, the Red Hat logo, the Shadowman logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.

Linux® is the registered trademark of Linus Torvalds in the United States and other countries.

Java® is a registered trademark of Oracle and/or its affiliates.

XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.

Node.js® is an official trademark of Joyent. Red Hat Software Collections is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

All other trademarks are the property of their respective owners.
