Chapter 5. MetalLB Operator
5.1. About MetalLB and the MetalLB Operator
As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type LoadBalancer is added to the cluster, MetalLB can add an external IP address for the service. The external IP address is added to the host network for your cluster.
5.1.1. When to use MetalLB
Using MetalLB is valuable when you have a bare-metal cluster, or an infrastructure that is like bare metal, and you want fault-tolerant access to an application through an external IP address.
You must configure your networking infrastructure to ensure that network traffic for the external IP address is routed from clients to the host network for the cluster.
After deploying MetalLB with the MetalLB Operator, when you add a service of type LoadBalancer, MetalLB provides a platform-native load balancer.
When external traffic enters your OpenShift Container Platform cluster through a MetalLB LoadBalancer service, the return traffic to the client has the external IP address of the load balancer as the source IP.
MetalLB operating in layer 2 mode provides support for failover by utilizing a mechanism similar to IP failover. However, instead of relying on the virtual router redundancy protocol (VRRP) and keepalived, MetalLB leverages a gossip-based protocol to identify instances of node failure. When a failover is detected, another node assumes the role of the leader node, and a gratuitous ARP message is dispatched to broadcast this change.
MetalLB operating in layer 3 or border gateway protocol (BGP) mode delegates failure detection to the network. The BGP router or routers that the OpenShift Container Platform nodes have established a connection with identify any node failure and terminate the routes to that node.
Using MetalLB instead of IP failover is preferable for ensuring high availability of pods and services.
5.1.2. MetalLB Operator custom resources
The MetalLB Operator monitors its own namespace for the following custom resources:
MetalLB
- When you add a MetalLB custom resource to the cluster, the MetalLB Operator deploys MetalLB on the cluster. The Operator only supports a single instance of the custom resource. If the instance is deleted, the Operator removes MetalLB from the cluster.
IPAddressPool
- MetalLB requires one or more pools of IP addresses that it can assign to a service when you add a service of type LoadBalancer. An IPAddressPool includes a list of IP addresses. The list can be a single IP address that is set using a range, such as 1.1.1.1-1.1.1.1, a range specified in CIDR notation, a range specified as a starting and ending address separated by a hyphen, or a combination of the three. An IPAddressPool requires a name. The documentation uses names like doc-example, doc-example-reserved, and doc-example-ipv6. The MetalLB controller assigns IP addresses from a pool of addresses in an IPAddressPool. L2Advertisement and BGPAdvertisement custom resources enable the advertisement of a given IP from a given pool. You can assign IP addresses from an IPAddressPool to services and namespaces by using the spec.serviceAllocation specification in the IPAddressPool custom resource. A minimal IPAddressPool sketch appears after this list.
  Note: A single IPAddressPool can be referenced by an L2 advertisement and a BGP advertisement.
BGPPeer
- The BGP peer custom resource identifies the BGP router for MetalLB to communicate with, the AS number of the router, the AS number for MetalLB, and customizations for route advertisement. MetalLB advertises the routes for service load-balancer IP addresses to one or more BGP peers.
BFDProfile
- The BFD profile custom resource configures Bidirectional Forwarding Detection (BFD) for a BGP peer. BFD provides faster path failure detection than BGP alone provides.
L2Advertisement
- The L2Advertisement custom resource advertises an IP coming from an IPAddressPool using the L2 protocol.
BGPAdvertisement
- The BGPAdvertisement custom resource advertises an IP coming from an IPAddressPool using the BGP protocol.
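For illustration, a minimal IPAddressPool sketch follows. The doc-example name and the metallb-system namespace match the conventions used in this documentation; the address values are assumed examples.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example
  namespace: metallb-system
spec:
  addresses:
  - 192.168.100.200-192.168.100.210   # a range with a starting and ending address
  - 203.0.113.0/24                     # a range in CIDR notation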
After you add the MetalLB custom resource to the cluster and the Operator deploys MetalLB, the controller and speaker MetalLB software components begin running.
MetalLB validates all relevant custom resources.
5.1.3. MetalLB software components
When you install the MetalLB Operator, the metallb-operator-controller-manager deployment starts a pod. The pod is the implementation of the Operator. The pod monitors for changes to all the relevant resources.
When the Operator starts an instance of MetalLB, it starts a controller deployment and a speaker daemon set.
You can configure deployment specifications in the MetalLB custom resource to manage how controller and speaker pods deploy and run in your cluster. For more information about these deployment specifications, see the Additional resources section.
controller
- The Operator starts the deployment and a single pod. When you add a service of type LoadBalancer, Kubernetes uses the controller to allocate an IP address from an address pool. In case of a service failure, verify you have the following entry in your controller pod logs:

Example output

"event":"ipAllocated","ip":"172.22.0.201","msg":"IP address assigned by controller"

speaker
- The Operator starts a daemon set for speaker pods. By default, a pod is started on each node in your cluster. You can limit the pods to specific nodes by specifying a node selector in the MetalLB custom resource when you start MetalLB. If the controller allocated the IP address to the service and the service is still unavailable, read the speaker pod logs. If the speaker pod is unavailable, run the oc describe pod -n command.

For layer 2 mode, after the controller allocates an IP address for the service, the speaker pods use an algorithm to determine which speaker pod on which node will announce the load balancer IP address. The algorithm involves hashing the node name and the load balancer IP address. For more information, see "MetalLB and external traffic policy". The speaker uses Address Resolution Protocol (ARP) to announce IPv4 addresses and Neighbor Discovery Protocol (NDP) to announce IPv6 addresses.

For Border Gateway Protocol (BGP) mode, after the controller allocates an IP address for the service, each speaker pod advertises the load balancer IP address with its BGP peers. You can configure which nodes start BGP sessions with BGP peers.

Requests for the load balancer IP address are routed to the node with the speaker that announces the IP address. After the node receives the packets, the service proxy routes the packets to an endpoint for the service. The endpoint can be on the same node in the optimal case, or it can be on another node. The service proxy chooses an endpoint each time a connection is established.
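To check the controller pod logs for the ipAllocated entry shown above, a command along the following lines works, assuming MetalLB is installed in the metallb-system namespace:

$ oc logs -n metallb-system deployment/controller | grep ipAllocated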
5.1.4. MetalLB and external traffic policy
With layer 2 mode, one node in your cluster receives all the traffic for the service IP address. With BGP mode, a router on the host network opens a connection to one of the nodes in the cluster for a new client connection. How your cluster handles the traffic after it enters the node is affected by the external traffic policy.
cluster
- This is the default value for spec.externalTrafficPolicy. With the cluster traffic policy, after the node receives the traffic, the service proxy distributes the traffic to all the pods in your service. This policy provides uniform traffic distribution across the pods, but it obscures the client IP address and it can appear to the application in your pods that the traffic originates from the node rather than the client.
local
- With the local traffic policy, after the node receives the traffic, the service proxy only sends traffic to the pods on the same node. For example, if the speaker pod on node A announces the external service IP, then all traffic is sent to node A. After the traffic enters node A, the service proxy only sends traffic to pods for the service that are also on node A. Pods for the service that are on additional nodes do not receive any traffic from node A. Pods for the service on additional nodes act as replicas in case failover is needed. This policy does not affect the client IP address. Application pods can determine the client IP address from the incoming connections.
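For reference, the external traffic policy is set on the Service itself. The following sketch is a hypothetical Service of type LoadBalancer that uses the local policy; the name, selector, and ports are illustrative assumptions:

apiVersion: v1
kind: Service
metadata:
  name: example-service          # hypothetical name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # Cluster is the default
  selector:
    app: example                 # hypothetical label
  ports:
  - port: 80
    targetPort: 8080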
The following information is important when configuring the external traffic policy in BGP mode.
Although MetalLB advertises the load balancer IP address from all the eligible nodes, the number of nodes load balancing the service can be limited by the capacity of the router to establish equal-cost multipath (ECMP) routes. If the number of nodes advertising the IP is greater than the ECMP group limit of the router, the router uses fewer nodes than the number that advertise the IP.
For example, if the external traffic policy is set to local and the router has an ECMP group limit set to 16 and the pods implementing a LoadBalancer service are deployed on 30 nodes, this would result in pods deployed on 14 nodes not receiving any traffic. In this situation, it would be preferable to set the external traffic policy for the service to cluster.
5.1.5. MetalLB concepts for layer 2 mode
In layer 2 mode, the speaker pod on one node announces the external IP address for a service to the host network. From a network perspective, the node appears to have multiple IP addresses assigned to a network interface.
In layer 2 mode, MetalLB relies on ARP and NDP. These protocols implement local address resolution within a specific subnet. In this context, for MetalLB to work, the client must be able to reach the VIP assigned by MetalLB, which exists on the same subnet as the nodes announcing the service.
The speaker pod responds to ARP requests for IPv4 services and NDP requests for IPv6.
In layer 2 mode, all traffic for a service IP address is routed through one node. After traffic enters the node, the service proxy for the CNI network provider distributes the traffic to all the pods for the service.
Because all traffic for a service enters through a single node in layer 2 mode, in a strict sense, MetalLB does not implement a load balancer for layer 2. Rather, MetalLB implements a failover mechanism for layer 2 so that when a speaker pod becomes unavailable, a speaker pod on a different node can announce the service IP address.
When a node becomes unavailable, failover is automatic. The speaker pods on the other nodes detect that a node is unavailable and a new speaker pod and node take ownership of the service IP address from the failed node.
The preceding graphic shows the following concepts related to MetalLB:
- An application is available through a service that has a cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 192.168.100.200.
- Nodes 1 and 3 have a pod for the application.
- The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods.
- Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
- The speaker pod on node 1 uses ARP to announce the external IP address for the service, 192.168.100.200. The speaker pod that announces the external IP address must be on the same node as an endpoint for the service and the endpoint must be in the Ready condition.
- Client traffic is routed to the host network and connects to the 192.168.100.200 IP address. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.
  - If the external traffic policy for the service is set to cluster, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running. Only that node can receive traffic for the service.
  - If the external traffic policy for the service is set to local, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running and at least an endpoint of the service. Only that node can receive traffic for the service. In the preceding graphic, either node 1 or 3 would advertise 192.168.100.200.
- If node 1 becomes unavailable, the external IP address fails over to another node. On another node that has an instance of the application pod and service endpoint, the speaker pod begins to announce the external IP address, 192.168.100.200, and the new node receives the client traffic. In the diagram, the only candidate is node 3.
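The layer 2 announcement described in this section is enabled by an L2Advertisement custom resource that references an address pool. A minimal sketch, assuming a pool named doc-example in the metallb-system namespace:

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2advertisement          # illustrative name
  namespace: metallb-system
spec:
  ipAddressPools:
  - doc-example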
5.1.6. MetalLB concepts for BGP mode
In BGP mode, by default each speaker pod advertises the load balancer IP address for a service to each BGP peer. It is also possible to advertise the IPs coming from a given pool to a specific set of peers by adding an optional list of BGP peers. BGP peers are commonly network routers that are configured to use the BGP protocol. When a router receives traffic for the load balancer IP address, the router picks one of the nodes with a speaker pod that advertised the IP address. The router sends the traffic to that node. After traffic enters the node, the service proxy for the CNI network plugin distributes the traffic to all the pods for the service.
The directly-connected router on the same layer 2 network segment as the cluster nodes can be configured as a BGP peer. If the directly-connected router is not configured as a BGP peer, you need to configure your network so that packets for load balancer IP addresses are routed between the BGP peers and the cluster nodes that run the speaker pods.
Each time a router receives new traffic for the load balancer IP address, it creates a new connection to a node. Each router manufacturer has an implementation-specific algorithm for choosing which node to initiate the connection with. However, the algorithms commonly are designed to distribute traffic across the available nodes for the purpose of balancing the network load.
If a node becomes unavailable, the router initiates a new connection with another node that has a speaker pod that advertises the load balancer IP address.
Figure 5.1. MetalLB topology diagram for BGP mode
The preceding graphic shows the following concepts related to MetalLB:
- An application is available through a service that has an IPv4 cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 203.0.113.200.
- Nodes 2 and 3 have a pod for the application.
- The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods. You can configure MetalLB to specify which nodes run the speaker pods.
- Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
- Each speaker pod starts a BGP session with all BGP peers and advertises the load balancer IP addresses or aggregated routes to the BGP peers. The speaker pods advertise that they are part of Autonomous System 65010. The diagram shows a router, R1, as a BGP peer within the same Autonomous System. However, you can configure MetalLB to start BGP sessions with peers that belong to other Autonomous Systems.
- All the nodes with a speaker pod that advertises the load balancer IP address can receive traffic for the service.
  - If the external traffic policy for the service is set to cluster, all the nodes where a speaker pod is running advertise the 203.0.113.200 load balancer IP address and all the nodes with a speaker pod can receive traffic for the service. The host prefix is advertised to the router peer only if the external traffic policy is set to cluster.
  - If the external traffic policy for the service is set to local, then all the nodes where a speaker pod is running and at least an endpoint of the service is running can advertise the 203.0.113.200 load balancer IP address. Only those nodes can receive traffic for the service. In the preceding graphic, nodes 2 and 3 would advertise 203.0.113.200.
- You can configure MetalLB to control which speaker pods start BGP sessions with specific BGP peers by specifying a node selector when you add a BGP peer custom resource.
- Any routers, such as R1, that are configured to use BGP can be set as BGP peers.
- Client traffic is routed to one of the nodes on the host network. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.
- If a node becomes unavailable, the router detects the failure and initiates a new connection with another node. You can configure MetalLB to use a Bidirectional Forwarding Detection (BFD) profile for BGP peers. BFD provides faster link failure detection so that routers can initiate new connections earlier than without BFD.
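The BGP sessions described in this section are configured with BGPPeer custom resources. The following is a minimal sketch that matches Autonomous System 65010 from the diagram; the peer address is a hypothetical value, and the apiVersion can vary between MetalLB Operator versions:

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: example-peer             # illustrative name
  namespace: metallb-system
spec:
  myASN: 65010                   # the AS number that MetalLB advertises
  peerASN: 65010                 # R1 is shown in the same Autonomous System
  peerAddress: 10.0.0.1          # hypothetical router address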
5.1.7. Limitations and restrictions
5.1.7.1. Infrastructure considerations for MetalLB
MetalLB is primarily useful for on-premise, bare metal installations because these installations do not include a native load-balancer capability. In addition to bare metal installations, installations of OpenShift Container Platform on some infrastructures might not include a native load-balancer capability. For example, the following infrastructures can benefit from adding the MetalLB Operator:
- Bare metal
- VMware vSphere
- IBM Z® and IBM® LinuxONE
- IBM Z® and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM
- IBM Power®
5.1.7.2. Limitations for layer 2 mode
5.1.7.2.1. Single-node bottleneck
Because MetalLB routes all traffic for a service through a single node, the node can become a bottleneck and limit performance.
Layer 2 mode limits the ingress bandwidth for your service to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to direct traffic.
5.1.7.2.2. Slow failover performance
Failover between nodes depends on cooperation from the clients. When a failover occurs, MetalLB sends gratuitous ARP packets to notify clients that the MAC address associated with the service IP has changed.
Most client operating systems handle gratuitous ARP packets correctly and update their neighbor caches promptly. When clients update their caches quickly, failover completes within a few seconds. Clients typically fail over to a new node within 10 seconds. However, some client operating systems either do not handle gratuitous ARP packets at all or have outdated implementations that delay the cache update.
Recent versions of common operating systems such as Windows, macOS, and Linux implement layer 2 failover correctly. Issues with slow failover are not expected except for older and less common client operating systems.
To minimize the impact from a planned failover on outdated clients, keep the old node running for a few minutes after flipping leadership. The old node can continue to forward traffic for outdated clients until their caches refresh.
During an unplanned failover, the service IPs are unreachable until the outdated clients refresh their cache entries.
5.1.7.2.3. Additional Network and MetalLB cannot use same network
Using the same VLAN for both MetalLB and an additional network interface set up on a source pod might result in a connection failure. This occurs when both the MetalLB IP and the source pod reside on the same node.
To avoid connection failures, place the MetalLB IP in a different subnet from the one where the source pod resides. This configuration ensures that traffic from the source pod will take the default gateway. Consequently, the traffic can effectively reach its destination by using the OVN overlay network, ensuring that the connection functions as intended.
5.1.7.3. Limitations for BGP mode
5.1.7.3.1. Node failure can break all active connections
MetalLB shares a limitation that is common to BGP-based load balancing. When a BGP session terminates, such as when a node fails or when a speaker pod restarts, the session termination might result in resetting all active connections. End users can experience a Connection reset by peer message.
The consequence of a terminated BGP session is implementation-specific for each router manufacturer. However, you can anticipate that a change in the number of speaker pods affects the number of BGP sessions and that active connections with BGP peers will break.
To avoid or reduce the likelihood of a service interruption, you can specify a node selector when you add a BGP peer. By limiting the number of nodes that start BGP sessions, a fault on a node that does not have a BGP session has no effect on connections to the service.
5.1.7.3.2. Support for a single ASN and a single router ID only
When you add a BGP peer custom resource, you specify the spec.myASN field to identify the Autonomous System Number (ASN) that MetalLB belongs to. OpenShift Container Platform uses an implementation of BGP with MetalLB that requires MetalLB to belong to a single ASN. If you attempt to add a BGP peer and specify a different value for spec.myASN than an existing BGP peer custom resource, you receive an error.
Similarly, when you add a BGP peer custom resource, the spec.routerID field is optional. If you specify a value for this field, you must specify the same value for all other BGP peer custom resources that you add.
The limitation to support a single ASN and a single router ID is a difference from the community-supported implementation of MetalLB.
5.2. Installing the MetalLB Operator
As a cluster administrator, you can add the MetalLB Operator so that the Operator can manage the lifecycle for an instance of MetalLB on your cluster.
MetalLB and IP failover are incompatible. If you configured IP failover for your cluster, perform the steps to remove IP failover before you install the Operator.
5.2.1. Installing the MetalLB Operator from the OperatorHub using the web console
As a cluster administrator, you can install the MetalLB Operator by using the OpenShift Container Platform web console.
Prerequisites
- Log in as a user with cluster-admin privileges.
Procedure
- In the OpenShift Container Platform web console, navigate to Operators → OperatorHub.
- Type a keyword into the Filter by keyword box or scroll to find the Operator you want. For example, type metallb to find the MetalLB Operator.
  You can also filter options by Infrastructure Features. For example, select Disconnected if you want to see Operators that work in disconnected environments, also known as restricted network environments.
- On the Install Operator page, accept the defaults and click Install.
Verification
To confirm that the installation is successful:
- Navigate to the Operators → Installed Operators page.
- Check that the Operator is installed in the openshift-operators namespace and that its status is Succeeded.
If the Operator is not installed successfully, check the status of the Operator and review the logs:
- Navigate to the Operators → Installed Operators page and inspect the Status column for any errors or failures.
- Navigate to the Workloads → Pods page and check the logs in any pods in the openshift-operators project that are reporting issues.
5.2.2. Installing from OperatorHub using the CLI
Instead of using the OpenShift Container Platform web console, you can install an Operator from OperatorHub using the CLI. You can use the OpenShift CLI (oc) to install the MetalLB Operator.
It is recommended that when using the CLI you install the Operator in the metallb-system namespace.
Prerequisites
- A cluster installed on bare-metal hardware.
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
Procedure
Create a namespace for the MetalLB Operator by entering the following command:
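For example, you can create the metallb-system namespace, which the rest of this section assumes, with:

$ oc create namespace metallb-system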
Create an Operator group custom resource (CR) in the namespace:
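A minimal OperatorGroup sketch follows; the metallb-operator name matches the verification output later in this procedure. Save it to a file and apply it with oc apply.

apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: metallb-operator
  namespace: metallb-system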
Confirm the Operator group is installed in the namespace:
$ oc get operatorgroup -n metallb-system

Example output

NAME               AGE
metallb-operator   14m
Create a Subscription CR:

Define the Subscription CR and save the YAML file, for example, metallb-sub.yaml:
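A sketch of such a Subscription follows. The channel value is an assumption and can differ for your catalog; the source field is the one referenced by callout 1 below.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator-sub     # illustrative name
  namespace: metallb-system
spec:
  channel: stable                # assumed channel
  name: metallb-operator
  source: redhat-operators       # 1
  sourceNamespace: openshift-marketplace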
1 - You must specify the redhat-operators value.
To create the Subscription CR, run the following command:

$ oc create -f metallb-sub.yaml
Optional: To ensure BGP and BFD metrics appear in Prometheus, you can label the namespace as in the following command:
$ oc label ns metallb-system "openshift.io/cluster-monitoring=true"
Verification
The verification steps assume the MetalLB Operator is installed in the metallb-system namespace.
Confirm the install plan is in the namespace:
$ oc get installplan -n metallb-system

Example output
NAME            CSV                                    APPROVAL    APPROVED
install-wzg94   metallb-operator.4.17.0-nnnnnnnnnnnn   Automatic   true

Note: Installation of the Operator might take a few seconds.
To verify that the Operator is installed, enter the following command and then check that output shows Succeeded for the Operator:

$ oc get clusterserviceversion -n metallb-system \
  -o custom-columns=Name:.metadata.name,Phase:.status.phase
5.2.3. Starting MetalLB on your cluster
After you install the Operator, you need to configure a single instance of a MetalLB custom resource. After you configure the custom resource, the Operator starts MetalLB on your cluster.
Prerequisites
- Install the OpenShift CLI (oc).
- Log in as a user with cluster-admin privileges.
- Install the MetalLB Operator.
Procedure
This procedure assumes the MetalLB Operator is installed in the metallb-system namespace. If you installed using the web console, substitute openshift-operators for the namespace.
Create a single instance of a MetalLB custom resource:
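A minimal MetalLB custom resource sketch, assuming the metallb-system namespace used in this procedure:

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system

Save the definition to a file such as metallb.yaml (an illustrative name) and apply it with oc apply -f metallb.yaml.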
Verification
Confirm that the deployment for the MetalLB controller and the daemon set for the MetalLB speaker are running.
Verify that the deployment for the controller is running:
$ oc get deployment -n metallb-system controller

Example output
NAME         READY   UP-TO-DATE   AVAILABLE   AGE
controller   1/1     1            1           11m

Verify that the daemon set for the speaker is running:
$ oc get daemonset -n metallb-system speaker

Example output
NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
speaker   6         6         6       6            6           kubernetes.io/os=linux   18m

The example output indicates 6 speaker pods. The number of speaker pods in your cluster might differ from the example output. Make sure the output indicates one pod for each node in your cluster.
5.2.4. Deployment specifications for MetalLB
When you start an instance of MetalLB using the MetalLB custom resource, you can configure deployment specifications in the MetalLB custom resource to manage how the controller or speaker pods deploy and run in your cluster. Use these deployment specifications to manage the following tasks:
- Select nodes for MetalLB pod deployment.
- Manage scheduling by using pod priority and pod affinity.
- Assign CPU limits for MetalLB pods.
- Assign a container RuntimeClass for MetalLB pods.
- Assign metadata for MetalLB pods.
5.2.4.1. Limit speaker pods to specific nodes
By default, when you start MetalLB with the MetalLB Operator, the Operator starts an instance of a speaker pod on each node in the cluster. Only the nodes with a speaker pod can advertise a load balancer IP address. You can configure the MetalLB custom resource with a node selector to specify which nodes run the speaker pods.
The most common reason to limit the speaker pods to specific nodes is to ensure that only nodes with network interfaces on specific networks advertise load balancer IP addresses. Only the nodes with a running speaker pod are advertised as destinations of the load balancer IP address.
If you limit the speaker pods to specific nodes and specify local for the external traffic policy of a service, then you must ensure that the application pods for the service are deployed to the same nodes.
Example configuration to limit speaker pods to worker nodes
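A sketch of such a configuration follows. The worker-node label is the standard node-role.kubernetes.io/worker label; the toleration values and the speakerTolerations field name are assumptions for illustration. The numbered comments correspond to the callouts below.

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  nodeSelector:                            # 1
    node-role.kubernetes.io/worker: ""
  speakerTolerations:                      # 2
  - key: "example"
    operator: "Equal"
    value: "example-value"
    effect: "NoExecute"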
1 - The example configuration specifies to assign the speaker pods to worker nodes, but you can specify labels that you assigned to nodes or any valid node selector.
2 - In this example configuration, the pod that this toleration is attached to tolerates any taint that matches the key value and effect value using the operator.
After you apply a manifest with the spec.nodeSelector field, you can check the number of pods that the Operator deployed with the oc get daemonset -n metallb-system speaker command. Similarly, you can display the nodes that match your labels with a command like oc get nodes -l node-role.kubernetes.io/worker=.
You can optionally allow nodes to control which speaker pods should, or should not, be scheduled on them by using affinity rules. You can also limit these pods by applying a list of tolerations. For more information about affinity rules, taints, and tolerations, see the additional resources.
5.2.4.2. Configuring pod priority and pod affinity in a MetalLB deployment
You can optionally assign pod priority and pod affinity rules to controller and speaker pods by configuring the MetalLB custom resource. The pod priority indicates the relative importance of a pod on a node and schedules the pod based on this priority. Set a high priority on your controller or speaker pod to ensure scheduling priority over other pods on the node.
Pod affinity manages relationships among pods. Assign pod affinity to the controller or speaker pods to control on what node the scheduler places the pod in the context of pod relationships. For example, you can use pod affinity rules to ensure that certain pods are located on the same node or nodes, which can help improve network communication and reduce latency between those components.
Prerequisites
- You are logged in as a user with cluster-admin privileges.
- You have installed the MetalLB Operator.
- You have started the MetalLB Operator on your cluster.
Procedure
Create a PriorityClass custom resource, such as myPriorityClass.yaml, to configure the priority level. This example defines a PriorityClass named high-priority with a value of 1000000. Pods that are assigned this priority class are considered higher priority during scheduling compared to pods with lower priority classes:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000

Apply the PriorityClass custom resource configuration:

$ oc apply -f myPriorityClass.yaml

Create a MetalLB custom resource, such as MetalLBPodConfig.yaml, to specify the priorityClassName and podAffinity values:
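A sketch of such a MetalLB resource follows. The controllerConfig and speakerConfig field names are assumptions about the Operator's CRD; the priority class and affinity rule correspond to callouts 1 and 2 below.

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  controllerConfig:
    priorityClassName: high-priority       # 1
    affinity:
      podAffinity:                         # 2
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: metallb
          topologyKey: kubernetes.io/hostname
  speakerConfig:
    priorityClassName: high-priority
    affinity:
      podAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: metallb
          topologyKey: kubernetes.io/hostname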
1 - Specifies the priority class for the MetalLB controller pods. In this case, it is set to high-priority.
2 - Specifies that you are configuring pod affinity rules. These rules dictate how pods are scheduled in relation to other pods or nodes. This configuration instructs the scheduler to schedule pods that have the label app: metallb onto nodes that share the same hostname. This helps to co-locate MetalLB-related pods on the same nodes, potentially optimizing network communication, latency, and resource usage between these pods.
Apply the MetalLB custom resource configuration:

$ oc apply -f MetalLBPodConfig.yaml
Verification
To view the priority class that you assigned to pods in the metallb-system namespace, run the following command:

$ oc get pods -n metallb-system -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName

Example output
NAME                                                  PRIORITY
controller-584f5c8cd8-5zbvg                           high-priority
metallb-operator-controller-manager-9c8d9985-szkqg    <none>
metallb-operator-webhook-server-c895594d4-shjgx       <none>
speaker-dddf7                                         high-priority

To verify that the scheduler placed pods according to pod affinity rules, view the metadata for the pod's node or nodes by running the following command:
$ oc get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name -n metallb-system
5.2.4.3. Configuring pod CPU limits in a MetalLB deployment
You can optionally assign pod CPU limits to controller and speaker pods by configuring the MetalLB custom resource. Defining CPU limits for the controller or speaker pods helps you to manage compute resources on the node. This ensures all pods on the node have the necessary compute resources to manage workloads and cluster housekeeping.
Prerequisites
- You are logged in as a user with cluster-admin privileges.
- You have installed the MetalLB Operator.
Procedure
Create a MetalLB custom resource file, such as CPULimits.yaml, to specify the cpu value for the controller and speaker pods:
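A sketch of such a file follows. The controllerConfig and speakerConfig field names are assumptions about the Operator's CRD, and the CPU values are illustrative.

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  controllerConfig:
    resources:
      limits:
        cpu: "200m"
  speakerConfig:
    resources:
      limits:
        cpu: "300m"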
Apply the MetalLB custom resource configuration:

$ oc apply -f CPULimits.yaml
Verification
To view compute resources for a pod, run the following command, replacing <pod_name> with your target pod:

$ oc describe pod <pod_name>
5.2.6. Next steps
5.3. Upgrading the MetalLB Operator
By default, a Subscription custom resource (CR) that subscribes the metallb-system namespace automatically sets the installPlanApproval parameter to Automatic. This means that when Red Hat-provided Operator catalogs include a newer version of the MetalLB Operator, the MetalLB Operator is automatically upgraded.
If you need to manually control upgrading the MetalLB Operator, set the installPlanApproval parameter to Manual.
5.3.1. Manually upgrading the MetalLB Operator
To manually control upgrading the MetalLB Operator, you must edit the Subscription custom resource (CR) that subscribes the metallb-system namespace. A Subscription CR is created as part of the Operator installation and the CR has the installPlanApproval parameter set to Automatic by default.
Prerequisites
- You updated your cluster to the latest z-stream release.
- You used OperatorHub to install the MetalLB Operator.
- Access the cluster as a user with the cluster-admin role.
Procedure
Get the YAML definition of the metallb-operator subscription in the metallb-system namespace by entering the following command:

$ oc -n metallb-system get subscription metallb-operator -o yaml

Edit the Subscription CR by setting the installPlanApproval parameter to Manual:
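After the edit, the relevant part of the Subscription looks like the following sketch; the channel value is an assumption.

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: metallb-operator
  namespace: metallb-system
spec:
  channel: stable                # assumed channel
  installPlanApproval: Manual
  name: metallb-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace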
Find the latest OpenShift Container Platform 4.17 version of the MetalLB Operator by entering the following command:
$ oc -n metallb-system get csv

Check the install plan that exists in the namespace by entering the following command:
$ oc -n metallb-system get installplan

Example output that shows install-tsz2g as a manual install plan
NAME            CSV                                      APPROVAL    APPROVED
install-shpmd   metallb-operator.v4.17.0-202502261233    Automatic   true
install-tsz2g   metallb-operator.v4.17.0-202503102139    Manual      false
Edit the install plan that exists in the namespace by entering the following command. Ensure that you replace <name_of_installplan> with the name of the install plan, such as install-tsz2g.
$ oc edit installplan <name_of_installplan> -n metallb-system
spec.approval
parameter toManual
and set thespec.approved
parameter totrue
.NoteAfter you edit the install plan, the upgrade operation starts. If you enter the
oc -n metallb-system get csv
command during the upgrade operation, the output might show theReplacing
or thePending
status.
Verification
To verify that the Operator is upgraded, enter the following command and then check that output shows Succeeded for the Operator:

$ oc -n metallb-system get csv