Chapter 11. High availability for pod-level bonds on SR-IOV networks
For workloads that use pod-level bonding with SR-IOV virtual functions (VFs), an underlying physical function (PF) might still report an up state after an upstream switch failure. This creates a silent failure: the attached VFs remain up and pods continue to send traffic to a dead endpoint, which causes packet loss.
The PF Status Relay Operator solves this issue by using Link Aggregation Control Protocol (LACP) as an active health check. In this configuration, each PF is placed in its own single-member LACP bond with the upstream switch. When the Operator detects an LACP failure on a PF’s bond, it changes the link state of the attached VFs from auto to disabled. This action triggers the pod’s active-backup bond to fail over to its backup network path, maintaining high availability.
Configuring LACP state monitoring for SR-IOV networks is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.
For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.
11.1. Installing the PF Status Relay Operator using the CLI
Install the PF Status Relay Operator to enable OpenShift Container Platform to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs).
Prerequisites
- You configured LACP on your upstream switch.
- You configured pod-level bonding for your SR-IOV networks.
- You installed the OpenShift CLI (oc).
- You have cluster-admin privileges.
Procedure
Create the openshift-pf-status-relay-operator namespace.
Create an OperatorGroup custom resource (CR).
Create a Subscription CR for the PF Status Relay Operator.
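The three objects in this procedure could be defined in a single manifest such as the following sketch. The object names, the channel, the package name, and the catalog source are assumptions for illustration and are not taken from this document. Confirm them against the catalog in your cluster before you apply anything.

```yaml
# Sketch only: names, channel, package, and catalog source are assumptions.
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-pf-status-relay-operator
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: pf-status-relay-operator-group          # hypothetical name
  namespace: openshift-pf-status-relay-operator
spec:
  targetNamespaces:
  - openshift-pf-status-relay-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: pf-status-relay-operator-subscription   # hypothetical name
  namespace: openshift-pf-status-relay-operator
spec:
  channel: stable                       # assumption: verify the channel in your catalog
  name: pf-status-relay-operator        # assumption: package name in the catalog
  source: redhat-operators              # assumption: catalog source
  sourceNamespace: openshift-marketplace
```

If you save the sketch to a file, you can create all three objects with a single oc apply -f command against that file.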
Verification
To verify that the Operator is installed, enter the following command and then check that the output shows Succeeded for the Operator:
$ oc get csv -n openshift-pf-status-relay-operator -o custom-columns=Name:.metadata.name,Phase:.status.phase
11.2. Installing the PF Status Relay Operator using the web console
Install the PF Status Relay Operator to enable OpenShift Container Platform to use Link Aggregation Control Protocol (LACP) as an active health check on physical functions (PFs).
Prerequisites
- You configured LACP on your upstream switch.
- You configured pod-level bonding for your SR-IOV networks.
- You have cluster-admin privileges.
Procedure
Install the PF Status Relay Operator:
- In the OpenShift Container Platform web console, click Ecosystem → Software Catalog.
- Select PF Status Relay Operator from the list of available Operators, and then click Install.
- On the Install Operator page, under Installed Namespace, select Operator recommended Namespace.
- Click Install.
Verification
- Verify that the PF Status Relay Operator shows the Status as Succeeded on the Installed Operators dashboard.
11.3. Configuring the PF Status Relay Operator for LACP state monitoring on SR-IOV networks
Use the PF Status Relay Operator to enable Link Aggregation Control Protocol (LACP) state monitoring for workloads that use pod-level bonding with SR-IOV networks. The Operator monitors the LACP state on physical functions (PFs) and changes the link state of the attached virtual functions (VFs) when it detects an upstream failure. Because the failure becomes visible on the VFs attached to a PF, the pod-level bond can fail over to a backup network path in a timely manner, which maintains high availability for your workloads.
Configuring LACP state monitoring for SR-IOV networks involves the following high-level steps:
- Create host-level NIC bonds on worker nodes and configure LACP.
- Define SR-IOV network policies to create virtual functions (VFs) on the bonded interfaces.
- Deploy the PF Status Relay Operator to monitor the LACP state on the PFs.
- Verify that pods using these VFs automatically fail over to a backup network path in case of upstream switch failure.
The following scenario demonstrates how to configure and verify LACP state monitoring for SR-IOV networks. This scenario uses SR-IOV network cards with two ports on each node, worker-0 and worker-1, with both ports connected to a shared switch to support LACP bonding.
Prerequisites
- Nodes must have a NIC that supports SR-IOV.
- The SR-IOV Network Operator is installed.
- The PF Status Relay Operator is installed.
- The physical switch ports connected to the worker nodes are configured for LACP with a fast polling rate.
- The linkState is set to auto or disable for the SR-IOV VFs that you want to monitor. The Operator ignores VFs with the linkState set to enable. The default value for SR-IOV VFs is linkState: auto.
Procedure
Create the project namespace by creating a namespace.yaml file that defines the namespace where you deploy the high-availability pod.
Apply the namespace by running the following command:
$ oc apply -f namespace.yaml
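A minimal namespace.yaml file for this step might look like the following sketch. The namespace name sriov-operator-tests is the one used by the commands later in this procedure. Any labels that your cluster needs for privileged workloads are not shown and depend on your environment.

```yaml
# Sketch of namespace.yaml: the namespace where the high-availability pod is deployed.
apiVersion: v1
kind: Namespace
metadata:
  name: sriov-operator-tests
```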
Configure host-level LACP bonds:
- Create a YAML file, such as nncpBondF0Worker0.yaml, that defines the NodeNetworkConfigurationPolicy resource for the ens5f0 interface on the worker-0 node.
- Create a YAML file, such as nncpBondF1Worker0.yaml, that defines the NodeNetworkConfigurationPolicy resource for the ens5f1 interface on the worker-0 node.
- Apply the resources by running the following commands:
$ oc apply -f nncpBondF0Worker0.yaml
$ oc apply -f nncpBondF1Worker0.yaml
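A NodeNetworkConfigurationPolicy for the first host-level bond, such as nncpBondF0Worker0.yaml, might look like the following sketch. It assumes that the Kubernetes NMState Operator is installed and that the node carries the hostname label worker-0. The policy name, the bond interface name, and the miimon value are hypothetical. The fast LACP rate follows the prerequisite for a fast polling rate on the switch.

```yaml
# Sketch of nncpBondF0Worker0.yaml: a single-member LACP bond on ens5f0.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: nncp-bond-f0-worker-0            # hypothetical name
spec:
  nodeSelector:
    kubernetes.io/hostname: worker-0     # assumption: node hostname label
  desiredState:
    interfaces:
    - name: bond-ens5f0                  # hypothetical bond interface name
      type: bond
      state: up
      link-aggregation:
        mode: 802.3ad                    # LACP
        options:
          miimon: "100"                  # assumption: link check interval in ms
          lacp_rate: fast                # matches the fast polling rate on the switch
        port:
        - ens5f0                         # the only member of this bond
```

The policy for ens5f1 would follow the same pattern with a different policy name, bond name, and port.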
Create SR-IOV network VFs for the bonded interfaces:
- Create a YAML file, such as sriovnetworkpolicy-port1.yaml, that defines the SriovNetworkNodePolicy resource for the ens5f0 interface on the worker-0 node.
- Create a YAML file, such as sriovnetworkpolicy-port2.yaml, that defines the SriovNetworkNodePolicy resource for the ens5f1 interface on the worker-0 node.
- Apply the resources by running the following commands:
$ oc apply -f sriovnetworkpolicy-port1.yaml
$ oc apply -f sriovnetworkpolicy-port2.yaml
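An SriovNetworkNodePolicy for the first port, such as sriovnetworkpolicy-port1.yaml, might look like the following sketch. The policy name, the resource name, and the number of VFs are hypothetical values chosen for illustration, and the namespace assumes a default installation of the SR-IOV Network Operator.

```yaml
# Sketch of sriovnetworkpolicy-port1.yaml: creates VFs on the ens5f0 PF.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: sriovnetworkpolicy-port1                 # hypothetical name
  namespace: openshift-sriov-network-operator    # assumption: default Operator namespace
spec:
  resourceName: port1                    # hypothetical resource name
  nodeSelector:
    kubernetes.io/hostname: worker-0     # assumption: node hostname label
  numVfs: 4                              # hypothetical VF count
  nicSelector:
    pfNames:
    - ens5f0
  deviceType: netdevice                  # kernel network device, needed for bonding in the pod
```

The policy for ens5f1 would differ only in its name, resourceName, and pfNames entry.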
Configure the PF Status Relay Operator:
- Create a YAML file, such as pflacpmonitor.yaml, that defines the PFLACPMonitor resource. In this scenario, the resource configures the Operator to monitor the LACP status of the ens5f0 and ens5f1 bonded interfaces on the worker-0 node.
Important: Use only one PFLACPMonitor custom resource to monitor each network interface on a node. If you create multiple resources that target the same interface, the PF Status Relay Operator will not process the conflicting configurations.
- Apply the PFLACPMonitor resource by running the following command:
$ oc apply -f pflacpmonitor.yaml
Verification
Check the logs of the PF Status Relay Operator to verify that it is monitoring the LACP state:
$ oc logs -n openshift-pf-status-relay-operator <pf_status_relay_operator_pod_name>
Example output
{"time":"2025-07-24T13:35:54.653201692Z","level":"INFO","msg":"lacp is up","interface":"ens5f0"}
{"time":"2025-07-24T13:35:54.65347273Z","level":"INFO","msg":"vf link state was set","id":0,"state":"auto","interface":"ens5f0"}
...
Apply the SriovNetwork resources to make the VFs available for use within the sriov-operator-tests namespace:
- Create a YAML file, such as sriovnetwork-port1.yaml, that defines the SriovNetwork resource for the VFs created on ens5f0.
- Create a YAML file, such as sriovnetwork-port2.yaml, that defines the SriovNetwork resource for the VFs created on ens5f1.
- Apply the resources by running the following commands:
$ oc apply -f sriovnetwork-port1.yaml
$ oc apply -f sriovnetwork-port2.yaml
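An SriovNetwork resource for the VFs on ens5f0, such as sriovnetwork-port1.yaml, might look like the following sketch. The network name and resource name are hypothetical, and the resource name must match whatever your SriovNetworkNodePolicy defines. No IPAM is configured in this sketch on the assumption that the pod-level bond, not the individual VF, carries the IP address.

```yaml
# Sketch of sriovnetwork-port1.yaml: exposes the ens5f0 VFs to the test namespace.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriovnetwork-port1                       # hypothetical name
  namespace: openshift-sriov-network-operator    # assumption: default Operator namespace
spec:
  resourceName: port1                      # hypothetical; must match the node policy
  networkNamespace: sriov-operator-tests   # namespace where the pod runs
  linkState: auto                          # auto or disable, so the Operator does not ignore the VF
```

Keeping linkState at auto matters here because, as stated in the prerequisites, the Operator ignores VFs whose linkState is set to enable.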
Define a high-availability pod that uses the SR-IOV VFs:
- Create a YAML file, such as nad-bond.yaml, that defines the NetworkAttachmentDefinition resource to create an active-backup bond using the two SR-IOV networks. The bond configuration uses the following fields:
  - linksInContainer: true creates the bond inside the pod’s network namespace.
  - mode: active-backup configures the bond to use active-backup mode.
  - links specifies the pod-level interfaces to include in the bond.
Important: The PF Status Relay Operator provides LACP state monitoring for pod-level bonding with the mode: active-backup configuration only.
- Apply the NetworkAttachmentDefinition resource by running the following command:
$ oc apply -f nad-bond.yaml
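A nad-bond.yaml file that uses the bond CNI plugin might look like the following sketch. The attachment name, the failOverMac and miimon settings, and the IPAM subnet are assumptions for illustration. The fields linksInContainer, mode, and links are the ones described in this step.

```yaml
# Sketch of nad-bond.yaml: an active-backup bond over the pod interfaces net1 and net2.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: bond-net1                    # hypothetical name
  namespace: sriov-operator-tests
spec:
  # The JSON below uses hypothetical values for name, failOverMac, miimon, and ipam.
  config: |-
    {
      "type": "bond",
      "cniVersion": "0.3.1",
      "name": "bond-net1",
      "mode": "active-backup",
      "failOverMac": 1,
      "linksInContainer": true,
      "miimon": "100",
      "links": [
        {"name": "net1"},
        {"name": "net2"}
      ],
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24"
      }
    }
```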
- Create a YAML file, such as client-bond.yaml, that defines the Pod resource that uses the VFs from the bonded interfaces in active-backup mode. The pod annotation requests three networks: two SR-IOV VFs, net1 and net2, and one bond, bond0, which uses them.
- Apply the Pod resource by running the following command:
$ oc apply -f client-bond.yaml
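A client-bond.yaml file might look like the following sketch. The pod name and namespace match the oc rsh command used in this procedure. The three network names in the annotation, the container image, and the sleep command are hypothetical, and the network names must match the SriovNetwork and NetworkAttachmentDefinition resources that you created.

```yaml
# Sketch of client-bond.yaml: a pod with two SR-IOV VFs and a bond on top of them.
apiVersion: v1
kind: Pod
metadata:
  name: client-bond
  namespace: sriov-operator-tests
  annotations:
    # Requests net1 and net2 (SR-IOV VFs) and bond0 (the bond that uses them).
    # The network names are hypothetical.
    k8s.v1.cni.cncf.io/networks: |-
      [
        {"name": "sriovnetwork-port1", "interface": "net1"},
        {"name": "sriovnetwork-port2", "interface": "net2"},
        {"name": "bond-net1", "interface": "bond0"}
      ]
spec:
  containers:
  - name: client
    image: registry.access.redhat.com/ubi9/ubi:latest   # assumption: any image with a shell
    command: ["sleep", "infinity"]                      # keeps the pod running for inspection
```

The order of the annotation entries matters in this sketch: the VF interfaces are listed before the bond so that net1 and net2 exist when the bond is created.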
Check that the failover mechanism works:
- Log in to the client-bond pod by running the following command:
$ oc rsh -n sriov-operator-tests client-bond
- Check the initial status of the pod-level bond by running the following command:
sh-4.4# cat /proc/net/bonding/bond0
The output shows that both the net1 and net2 interfaces are up.
- Exit the pod shell.
- Simulate an LACP failure on your upstream physical switch. To simulate this scenario, you can filter LACP traffic on the switch port that you want to test. This ensures that the physical link remains up while LACP polling fails. The command to do this is vendor-dependent.
- Verify the failover inside the pod by logging back in to the client-bond pod and checking the bond status again:
sh-4.4# cat /proc/net/bonding/bond0
The output shows that the net1 interface is down and that the net2 interface is now the active interface. The client-bond pod detects the link state change and switches to the backup network path.