Search

Chapter 32. Load balancing with MetalLB

download PDF

32.1. About MetalLB and the MetalLB Operator

As a cluster administrator, you can add the MetalLB Operator to your cluster so that when a service of type LoadBalancer is added to the cluster, MetalLB can add an external IP address for the service. The external IP address is added to the host network for your cluster.

32.1.1. When to use MetalLB

Using MetalLB is valuable when you have a bare-metal cluster, or an infrastructure that is like bare metal, and you want fault-tolerant access to an application through an external IP address.

You must configure your networking infrastructure to ensure that network traffic for the external IP address is routed from clients to the host network for the cluster.

After deploying MetalLB with the MetalLB Operator, when you add a service of type LoadBalancer, MetalLB provides a platform-native load balancer.

MetalLB operating in layer2 mode provides support for failover by utilizing a mechanism similar to IP failover. However, instead of relying on the virtual router redundancy protocol (VRRP) and keepalived, MetalLB leverages a gossip-based protocol to identify instances of node failure. When a failover is detected, another node assumes the role of the leader node, and a gratuitous ARP message is dispatched to broadcast this change.

MetalLB operating in layer3 or border gateway protocol (BGP) mode delegates failure detection to the network. The BGP router or routers that the OpenShift Container Platform nodes have established a connection with will identify any node failure and terminate the routes to that node.

Using MetalLB instead of IP failover is preferable for ensuring high availability of pods and services.

32.1.2. MetalLB Operator custom resources

The MetalLB Operator monitors its own namespace for the following custom resources:

MetalLB
When you add a MetalLB custom resource to the cluster, the MetalLB Operator deploys MetalLB on the cluster. The Operator only supports a single instance of the custom resource. If the instance is deleted, the Operator removes MetalLB from the cluster.
IPAddressPool

MetalLB requires one or more pools of IP addresses that it can assign to a service when you add a service of type LoadBalancer. An IPAddressPool includes a list of IP addresses. The list can be a single IP address that is set using a range, such as 1.1.1.1-1.1.1.1, a range specified in CIDR notation, a range specified as a starting and ending address separated by a hyphen, or a combination of the three. An IPAddressPool requires a name. The documentation uses names like doc-example, doc-example-reserved, and doc-example-ipv6. The MetalLB controller assigns IP addresses from a pool of addresses in an IPAddressPool. L2Advertisement and BGPAdvertisement custom resources enable the advertisement of a given IP from a given pool. You can assign IP addresses from an IPAddressPool to services and namespaces by using the spec.serviceAllocation specification in the IPAddressPool custom resource.

Note

A single IPAddressPool can be referenced by a L2 advertisement and a BGP advertisement.

BGPPeer
The BGP peer custom resource identifies the BGP router for MetalLB to communicate with, the AS number of the router, the AS number for MetalLB, and customizations for route advertisement. MetalLB advertises the routes for service load-balancer IP addresses to one or more BGP peers.
BFDProfile
The BFD profile custom resource configures Bidirectional Forwarding Detection (BFD) for a BGP peer. BFD provides faster path failure detection than BGP alone provides.
L2Advertisement
The L2Advertisement custom resource advertises an IP coming from an IPAddressPool using the L2 protocol.
BGPAdvertisement
The BGPAdvertisement custom resource advertises an IP coming from an IPAddressPool using the BGP protocol.

After you add the MetalLB custom resource to the cluster and the Operator deploys MetalLB, the controller and speaker MetalLB software components begin running.

MetalLB validates all relevant custom resources.

32.1.3. MetalLB software components

When you install the MetalLB Operator, the metallb-operator-controller-manager deployment starts a pod. The pod is the implementation of the Operator. The pod monitors for changes to all the relevant resources.

When the Operator starts an instance of MetalLB, it starts a controller deployment and a speaker daemon set.

Note

You can configure deployment specifications in the MetalLB custom resource to manage how controller and speaker pods deploy and run in your cluster. For more information about these deployment specifications, see the Additional resources section.

controller

The Operator starts the deployment and a single pod. When you add a service of type LoadBalancer, Kubernetes uses the controller to allocate an IP address from an address pool. In case of a service failure, verify you have the following entry in your controller pod logs:

Example output

"event":"ipAllocated","ip":"172.22.0.201","msg":"IP address assigned by controller

speaker

The Operator starts a daemon set for speaker pods. By default, a pod is started on each node in your cluster. You can limit the pods to specific nodes by specifying a node selector in the MetalLB custom resource when you start MetalLB. If the controller allocated the IP address to the service and service is still unavailable, read the speaker pod logs. If the speaker pod is unavailable, run the oc describe pod -n command.

For layer 2 mode, after the controller allocates an IP address for the service, the speaker pods use an algorithm to determine which speaker pod on which node will announce the load balancer IP address. The algorithm involves hashing the node name and the load balancer IP address. For more information, see "MetalLB and external traffic policy". The speaker uses Address Resolution Protocol (ARP) to announce IPv4 addresses and Neighbor Discovery Protocol (NDP) to announce IPv6 addresses.

For Border Gateway Protocol (BGP) mode, after the controller allocates an IP address for the service, each speaker pod advertises the load balancer IP address with its BGP peers. You can configure which nodes start BGP sessions with BGP peers.

Requests for the load balancer IP address are routed to the node with the speaker that announces the IP address. After the node receives the packets, the service proxy routes the packets to an endpoint for the service. The endpoint can be on the same node in the optimal case, or it can be on another node. The service proxy chooses an endpoint each time a connection is established.

32.1.4. MetalLB and external traffic policy

With layer 2 mode, one node in your cluster receives all the traffic for the service IP address. With BGP mode, a router on the host network opens a connection to one of the nodes in the cluster for a new client connection. How your cluster handles the traffic after it enters the node is affected by the external traffic policy.

cluster

This is the default value for spec.externalTrafficPolicy.

With the cluster traffic policy, after the node receives the traffic, the service proxy distributes the traffic to all the pods in your service. This policy provides uniform traffic distribution across the pods, but it obscures the client IP address and it can appear to the application in your pods that the traffic originates from the node rather than the client.

local

With the local traffic policy, after the node receives the traffic, the service proxy only sends traffic to the pods on the same node. For example, if the speaker pod on node A announces the external service IP, then all traffic is sent to node A. After the traffic enters node A, the service proxy only sends traffic to pods for the service that are also on node A. Pods for the service that are on additional nodes do not receive any traffic from node A. Pods for the service on additional nodes act as replicas in case failover is needed.

This policy does not affect the client IP address. Application pods can determine the client IP address from the incoming connections.

Note

The following information is important when configuring the external traffic policy in BGP mode.

Although MetalLB advertises the load balancer IP address from all the eligible nodes, the number of nodes loadbalancing the service can be limited by the capacity of the router to establish equal-cost multipath (ECMP) routes. If the number of nodes advertising the IP is greater than the ECMP group limit of the router, the router will use less nodes than the ones advertising the IP.

For example, if the external traffic policy is set to local and the router has an ECMP group limit set to 16 and the pods implementing a LoadBalancer service are deployed on 30 nodes, this would result in pods deployed on 14 nodes not receiving any traffic. In this situation, it would be preferable to set the external traffic policy for the service to cluster.

32.1.5. MetalLB concepts for layer 2 mode

In layer 2 mode, the speaker pod on one node announces the external IP address for a service to the host network. From a network perspective, the node appears to have multiple IP addresses assigned to a network interface.

Note

In layer 2 mode, MetalLB relies on ARP and NDP. These protocols implement local address resolution within a specific subnet. In this context, the client must be able to reach the VIP assigned by MetalLB that exists on the same subnet as the nodes announcing the service in order for MetalLB to work.

The speaker pod responds to ARP requests for IPv4 services and NDP requests for IPv6.

In layer 2 mode, all traffic for a service IP address is routed through one node. After traffic enters the node, the service proxy for the CNI network provider distributes the traffic to all the pods for the service.

Because all traffic for a service enters through a single node in layer 2 mode, in a strict sense, MetalLB does not implement a load balancer for layer 2. Rather, MetalLB implements a failover mechanism for layer 2 so that when a speaker pod becomes unavailable, a speaker pod on a different node can announce the service IP address.

When a node becomes unavailable, failover is automatic. The speaker pods on the other nodes detect that a node is unavailable and a new speaker pod and node take ownership of the service IP address from the failed node.

Conceptual diagram for MetalLB and layer 2 mode

The preceding graphic shows the following concepts related to MetalLB:

  • An application is available through a service that has a cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 192.168.100.200.
  • Nodes 1 and 3 have a pod for the application.
  • The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods.
  • Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
  • The speaker pod on node 1 uses ARP to announce the external IP address for the service, 192.168.100.200. The speaker pod that announces the external IP address must be on the same node as an endpoint for the service and the endpoint must be in the Ready condition.
  • Client traffic is routed to the host network and connects to the 192.168.100.200 IP address. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.

    • If the external traffic policy for the service is set to cluster, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running. Only that node can receive traffic for the service.
    • If the external traffic policy for the service is set to local, the node that advertises the 192.168.100.200 load balancer IP address is selected from the nodes where a speaker pod is running and at least an endpoint of the service. Only that node can receive traffic for the service. In the preceding graphic, either node 1 or 3 would advertise 192.168.100.200.
  • If node 1 becomes unavailable, the external IP address fails over to another node. On another node that has an instance of the application pod and service endpoint, the speaker pod begins to announce the external IP address, 192.168.100.200 and the new node receives the client traffic. In the diagram, the only candidate is node 3.

32.1.6. MetalLB concepts for BGP mode

In BGP mode, by default each speaker pod advertises the load balancer IP address for a service to each BGP peer. It is also possible to advertise the IPs coming from a given pool to a specific set of peers by adding an optional list of BGP peers. BGP peers are commonly network routers that are configured to use the BGP protocol. When a router receives traffic for the load balancer IP address, the router picks one of the nodes with a speaker pod that advertised the IP address. The router sends the traffic to that node. After traffic enters the node, the service proxy for the CNI network plugin distributes the traffic to all the pods for the service.

The directly-connected router on the same layer 2 network segment as the cluster nodes can be configured as a BGP peer. If the directly-connected router is not configured as a BGP peer, you need to configure your network so that packets for load balancer IP addresses are routed between the BGP peers and the cluster nodes that run the speaker pods.

Each time a router receives new traffic for the load balancer IP address, it creates a new connection to a node. Each router manufacturer has an implementation-specific algorithm for choosing which node to initiate the connection with. However, the algorithms commonly are designed to distribute traffic across the available nodes for the purpose of balancing the network load.

If a node becomes unavailable, the router initiates a new connection with another node that has a speaker pod that advertises the load balancer IP address.

Figure 32.1. MetalLB topology diagram for BGP mode

Speaker pods on host network 10.0.1.0/24 use BGP to advertise the load balancer IP address, 203.0.113.200, to a router.

The preceding graphic shows the following concepts related to MetalLB:

  • An application is available through a service that has an IPv4 cluster IP on the 172.130.0.0/16 subnet. That IP address is accessible from inside the cluster. The service also has an external IP address that MetalLB assigned to the service, 203.0.113.200.
  • Nodes 2 and 3 have a pod for the application.
  • The speaker daemon set runs a pod on each node. The MetalLB Operator starts these pods. You can configure MetalLB to specify which nodes run the speaker pods.
  • Each speaker pod is a host-networked pod. The IP address for the pod is identical to the IP address for the node on the host network.
  • Each speaker pod starts a BGP session with all BGP peers and advertises the load balancer IP addresses or aggregated routes to the BGP peers. The speaker pods advertise that they are part of Autonomous System 65010. The diagram shows a router, R1, as a BGP peer within the same Autonomous System. However, you can configure MetalLB to start BGP sessions with peers that belong to other Autonomous Systems.
  • All the nodes with a speaker pod that advertises the load balancer IP address can receive traffic for the service.

    • If the external traffic policy for the service is set to cluster, all the nodes where a speaker pod is running advertise the 203.0.113.200 load balancer IP address and all the nodes with a speaker pod can receive traffic for the service. The host prefix is advertised to the router peer only if the external traffic policy is set to cluster.
    • If the external traffic policy for the service is set to local, then all the nodes where a speaker pod is running and at least an endpoint of the service is running can advertise the 203.0.113.200 load balancer IP address. Only those nodes can receive traffic for the service. In the preceding graphic, nodes 2 and 3 would advertise 203.0.113.200.
  • You can configure MetalLB to control which speaker pods start BGP sessions with specific BGP peers by specifying a node selector when you add a BGP peer custom resource.
  • Any routers, such as R1, that are configured to use BGP can be set as BGP peers.
  • Client traffic is routed to one of the nodes on the host network. After traffic enters the node, the service proxy sends the traffic to the application pod on the same node or another node according to the external traffic policy that you set for the service.
  • If a node becomes unavailable, the router detects the failure and initiates a new connection with another node. You can configure MetalLB to use a Bidirectional Forwarding Detection (BFD) profile for BGP peers. BFD provides faster link failure detection so that routers can initiate new connections earlier than without BFD.

32.1.7. Limitations and restrictions

32.1.7.1. Infrastructure considerations for MetalLB

MetalLB is primarily useful for on-premise, bare metal installations because these installations do not include a native load-balancer capability. In addition to bare metal installations, installations of OpenShift Container Platform on some infrastructures might not include a native load-balancer capability. For example, the following infrastructures can benefit from adding the MetalLB Operator:

  • Bare metal
  • VMware vSphere
  • IBM Z® and IBM® LinuxONE
  • IBM Z® and IBM® LinuxONE for Red Hat Enterprise Linux (RHEL) KVM
  • IBM Power®

MetalLB Operator and MetalLB are supported with the OpenShift SDN and OVN-Kubernetes network providers.

32.1.7.2. Limitations for layer 2 mode

32.1.7.2.1. Single-node bottleneck

MetalLB routes all traffic for a service through a single node, the node can become a bottleneck and limit performance.

Layer 2 mode limits the ingress bandwidth for your service to the bandwidth of a single node. This is a fundamental limitation of using ARP and NDP to direct traffic.

32.1.7.2.2. Slow failover performance

Failover between nodes depends on cooperation from the clients. When a failover occurs, MetalLB sends gratuitous ARP packets to notify clients that the MAC address associated with the service IP has changed.

Most client operating systems handle gratuitous ARP packets correctly and update their neighbor caches promptly. When clients update their caches quickly, failover completes within a few seconds. Clients typically fail over to a new node within 10 seconds. However, some client operating systems either do not handle gratuitous ARP packets at all or have outdated implementations that delay the cache update.

Recent versions of common operating systems such as Windows, macOS, and Linux implement layer 2 failover correctly. Issues with slow failover are not expected except for older and less common client operating systems.

To minimize the impact from a planned failover on outdated clients, keep the old node running for a few minutes after flipping leadership. The old node can continue to forward traffic for outdated clients until their caches refresh.

During an unplanned failover, the service IPs are unreachable until the outdated clients refresh their cache entries.

32.1.7.2.3. Additional Network and MetalLB cannot use same network

Using the same VLAN for both MetalLB and an additional network interface set up on a source pod might result in a connection failure. This occurs when both the MetalLB IP and the source pod reside on the same node.

To avoid connection failures, place the MetalLB IP in a different subnet from the one where the source pod resides. This configuration ensures that traffic from the source pod will take the default gateway. Consequently, the traffic can effectively reach its destination by using the OVN overlay network, ensuring that the connection functions as intended.

32.1.7.3. Limitations for BGP mode

32.1.7.3.1. Node failure can break all active connections

MetalLB shares a limitation that is common to BGP-based load balancing. When a BGP session terminates, such as when a node fails or when a speaker pod restarts, the session termination might result in resetting all active connections. End users can experience a Connection reset by peer message.

The consequence of a terminated BGP session is implementation-specific for each router manufacturer. However, you can anticipate that a change in the number of speaker pods affects the number of BGP sessions and that active connections with BGP peers will break.

To avoid or reduce the likelihood of a service interruption, you can specify a node selector when you add a BGP peer. By limiting the number of nodes that start BGP sessions, a fault on a node that does not have a BGP session has no affect on connections to the service.

32.1.7.3.2. Support for a single ASN and a single router ID only

When you add a BGP peer custom resource, you specify the spec.myASN field to identify the Autonomous System Number (ASN) that MetalLB belongs to. OpenShift Container Platform uses an implementation of BGP with MetalLB that requires MetalLB to belong to a single ASN. If you attempt to add a BGP peer and specify a different value for spec.myASN than an existing BGP peer custom resource, you receive an error.

Similarly, when you add a BGP peer custom resource, the spec.routerID field is optional. If you specify a value for this field, you must specify the same value for all other BGP peer custom resources that you add.

The limitation to support a single ASN and single router ID is a difference with the community-supported implementation of MetalLB.

32.1.8. Additional resources

32.2. Installing the MetalLB Operator

As a cluster administrator, you can add the MetallB Operator so that the Operator can manage the lifecycle for an instance of MetalLB on your cluster.

MetalLB and IP failover are incompatible. If you configured IP failover for your cluster, perform the steps to remove IP failover before you install the Operator.

32.2.1. Installing the MetalLB Operator from the OperatorHub using the web console

As a cluster administrator, you can install the MetalLB Operator by using the OpenShift Container Platform web console.

Prerequisites

  • Log in as a user with cluster-admin privileges.

Procedure

  1. In the OpenShift Container Platform web console, navigate to Operators OperatorHub.
  2. Type a keyword into the Filter by keyword box or scroll to find the Operator you want. For example, type metallb to find the MetalLB Operator.

    You can also filter options by Infrastructure Features. For example, select Disconnected if you want to see Operators that work in disconnected environments, also known as restricted network environments.

  3. On the Install Operator page, accept the defaults and click Install.

Verification

  1. To confirm that the installation is successful:

    1. Navigate to the Operators Installed Operators page.
    2. Check that the Operator is installed in the openshift-operators namespace and that its status is Succeeded.
  2. If the Operator is not installed successfully, check the status of the Operator and review the logs:

    1. Navigate to the Operators Installed Operators page and inspect the Status column for any errors or failures.
    2. Navigate to the Workloads Pods page and check the logs in any pods in the openshift-operators project that are reporting issues.

32.2.2. Installing from OperatorHub using the CLI

Instead of using the OpenShift Container Platform web console, you can install an Operator from OperatorHub using the CLI. You can use the OpenShift CLI (oc) to install the MetalLB Operator.

It is recommended that when using the CLI you install the Operator in the metallb-system namespace.

Prerequisites

  • A cluster installed on bare-metal hardware.
  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a namespace for the MetalLB Operator by entering the following command:

    $ cat << EOF | oc apply -f -
    apiVersion: v1
    kind: Namespace
    metadata:
      name: metallb-system
    EOF
  2. Create an Operator group custom resource (CR) in the namespace:

    $ cat << EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: metallb-operator
      namespace: metallb-system
    EOF
  3. Confirm the Operator group is installed in the namespace:

    $ oc get operatorgroup -n metallb-system

    Example output

    NAME               AGE
    metallb-operator   14m

  4. Create a Subscription CR:

    1. Define the Subscription CR and save the YAML file, for example, metallb-sub.yaml:

      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        name: metallb-operator-sub
        namespace: metallb-system
      spec:
        channel: stable
        name: metallb-operator
        source: redhat-operators 1
        sourceNamespace: openshift-marketplace
      1
      You must specify the redhat-operators value.
    2. To create the Subscription CR, run the following command:

      $ oc create -f metallb-sub.yaml
  5. Optional: To ensure BGP and BFD metrics appear in Prometheus, you can label the namespace as in the following command:

    $ oc label ns metallb-system "openshift.io/cluster-monitoring=true"

Verification

The verification steps assume the MetalLB Operator is installed in the metallb-system namespace.

  1. Confirm the install plan is in the namespace:

    $ oc get installplan -n metallb-system

    Example output

    NAME            CSV                                   APPROVAL    APPROVED
    install-wzg94   metallb-operator.4.16.0-nnnnnnnnnnnn   Automatic   true

    Note

    Installation of the Operator might take a few seconds.

  2. To verify that the Operator is installed, enter the following command:

    $ oc get clusterserviceversion -n metallb-system \
      -o custom-columns=Name:.metadata.name,Phase:.status.phase

    Example output

    Name                                  Phase
    metallb-operator.4.16.0-nnnnnnnnnnnn   Succeeded

32.2.3. Starting MetalLB on your cluster

After you install the Operator, you need to configure a single instance of a MetalLB custom resource. After you configure the custom resource, the Operator starts MetalLB on your cluster.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the MetalLB Operator.

Procedure

This procedure assumes the MetalLB Operator is installed in the metallb-system namespace. If you installed using the web console substitute openshift-operators for the namespace.

  1. Create a single instance of a MetalLB custom resource:

    $ cat << EOF | oc apply -f -
    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    EOF

Verification

Confirm that the deployment for the MetalLB controller and the daemon set for the MetalLB speaker are running.

  1. Verify that the deployment for the controller is running:

    $ oc get deployment -n metallb-system controller

    Example output

    NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    controller   1/1     1            1           11m

  2. Verify that the daemon set for the speaker is running:

    $ oc get daemonset -n metallb-system speaker

    Example output

    NAME      DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
    speaker   6         6         6       6            6           kubernetes.io/os=linux   18m

    The example output indicates 6 speaker pods. The number of speaker pods in your cluster might differ from the example output. Make sure the output indicates one pod for each node in your cluster.

32.2.4. Deployment specifications for MetalLB

When you start an instance of MetalLB using the MetalLB custom resource, you can configure deployment specifications in the MetalLB custom resource to manage how the controller or speaker pods deploy and run in your cluster. Use these deployment specifications to manage the following tasks:

  • Select nodes for MetalLB pod deployment.
  • Manage scheduling by using pod priority and pod affinity.
  • Assign CPU limits for MetalLB pods.
  • Assign a container RuntimeClass for MetalLB pods.
  • Assign metadata for MetalLB pods.

32.2.4.1. Limit speaker pods to specific nodes

By default, when you start MetalLB with the MetalLB Operator, the Operator starts an instance of a speaker pod on each node in the cluster. Only the nodes with a speaker pod can advertise a load balancer IP address. You can configure the MetalLB custom resource with a node selector to specify which nodes run the speaker pods.

The most common reason to limit the speaker pods to specific nodes is to ensure that only nodes with network interfaces on specific networks advertise load balancer IP addresses. Only the nodes with a running speaker pod are advertised as destinations of the load balancer IP address.

If you limit the speaker pods to specific nodes and specify local for the external traffic policy of a service, then you must ensure that the application pods for the service are deployed to the same nodes.

Example configuration to limit speaker pods to worker nodes

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  nodeSelector:  1
    node-role.kubernetes.io/worker: ""
  speakerTolerations:   2
  - key: "Example"
    operator: "Exists"
    effect: "NoExecute"

1
The example configuration specifies to assign the speaker pods to worker nodes, but you can specify labels that you assigned to nodes or any valid node selector.
2
In this example configuration, the pod that this toleration is attached to tolerates any taint that matches the key value and effect value using the operator.

After you apply a manifest with the spec.nodeSelector field, you can check the number of pods that the Operator deployed with the oc get daemonset -n metallb-system speaker command. Similarly, you can display the nodes that match your labels with a command like oc get nodes -l node-role.kubernetes.io/worker=.

You can optionally allow the node to control which speaker pods should, or should not, be scheduled on them by using affinity rules. You can also limit these pods by applying a list of tolerations. For more information about affinity rules, taints, and tolerations, see the additional resources.

32.2.4.2. Configuring pod priority and pod affinity in a MetalLB deployment

You can optionally assign pod priority and pod affinity rules to controller and speaker pods by configuring the MetalLB custom resource. The pod priority indicates the relative importance of a pod on a node and schedules the pod based on this priority. Set a high priority on your controller or speaker pod to ensure scheduling priority over other pods on the node.

Pod affinity manages relationships among pods. Assign pod affinity to the controller or speaker pods to control on what node the scheduler places the pod in the context of pod relationships. For example, you can use pod affinity rules to ensure that certain pods are located on the same node or nodes, which can help improve network communication and reduce latency between those components.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.
  • You have installed the MetalLB Operator.
  • You have started the MetalLB Operator on your cluster.

Procedure

  1. Create a PriorityClass custom resource, such as myPriorityClass.yaml, to configure the priority level. This example defines a PriorityClass named high-priority with a value of 1000000. Pods that are assigned this priority class are considered higher priority during scheduling compared to pods with lower priority classes:

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: high-priority
    value: 1000000
  2. Apply the PriorityClass custom resource configuration:

    $ oc apply -f myPriorityClass.yaml
  3. Create a MetalLB custom resource, such as MetalLBPodConfig.yaml, to specify the priorityClassName and podAffinity values:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      controllerConfig:
        priorityClassName: high-priority 1
        affinity:
          podAffinity: 2
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                 app: metallb
              topologyKey: kubernetes.io/hostname
      speakerConfig:
        priorityClassName: high-priority
        affinity:
          podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                 app: metallb
              topologyKey: kubernetes.io/hostname
    1
    Specifies the priority class for the MetalLB controller pods. In this case, it is set to high-priority.
    2
    Specifies that you are configuring pod affinity rules. These rules dictate how pods are scheduled in relation to other pods or nodes. This configuration instructs the scheduler to schedule pods that have the label app: metallb onto nodes that share the same hostname. This helps to co-locate MetalLB-related pods on the same nodes, potentially optimizing network communication, latency, and resource usage between these pods.
  4. Apply the MetalLB custom resource configuration:

    $ oc apply -f MetalLBPodConfig.yaml

Verification

  • To view the priority class that you assigned to pods in the metallb-system namespace, run the following command:

    $ oc get pods -n metallb-system -o custom-columns=NAME:.metadata.name,PRIORITY:.spec.priorityClassName

    Example output

    NAME                                                 PRIORITY
    controller-584f5c8cd8-5zbvg                          high-priority
    metallb-operator-controller-manager-9c8d9985-szkqg   <none>
    metallb-operator-webhook-server-c895594d4-shjgx      <none>
    speaker-dddf7                                        high-priority

  • To verify that the scheduler placed pods according to pod affinity rules, view the metadata for the pod’s node or nodes by running the following command:

    $ oc get pod -o=custom-columns=NODE:.spec.nodeName,NAME:.metadata.name -n metallb-system

32.2.4.3. Configuring pod CPU limits in a MetalLB deployment

You can optionally assign pod CPU limits to controller and speaker pods by configuring the MetalLB custom resource. Defining CPU limits for the controller or speaker pods helps you to manage compute resources on the node. This ensures all pods on the node have the necessary compute resources to manage workloads and cluster housekeeping.

Prerequisites

  • You are logged in as a user with cluster-admin privileges.
  • You have installed the MetalLB Operator.

Procedure

  1. Create a MetalLB custom resource file, such as CPULimits.yaml, to specify the cpu value for the controller and speaker pods:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      controllerConfig:
        resources:
          limits:
            cpu: "200m"
      speakerConfig:
        resources:
          limits:
            cpu: "300m"
  2. Apply the MetalLB custom resource configuration:

    $ oc apply -f CPULimits.yaml

Verification

  • To view compute resources for a pod, run the following command, replacing <pod_name> with your target pod:

    $ oc describe pod <pod_name>

32.2.5. Additional resources

32.2.6. Next steps

32.3. Upgrading the MetalLB

If you are currently running version 4.10 or an earlier version of the MetalLB Operator, please note that automatic updates to any version later than 4.10 do not work. Upgrading to a newer version from any version of the MetalLB Operator that is 4.11 or later is successful. For example, upgrading from version 4.12 to version 4.13 will occur smoothly.

A summary of the upgrade procedure for the MetalLB Operator from 4.10 and earlier is as follows:

  1. Delete the installed MetalLB Operator version for example 4.10. Ensure that the namespace and the metallb custom resource are not removed.
  2. Using the CLI, install the MetalLB Operator 4.16 in the same namespace where the previous version of the MetalLB Operator was installed.
Note

This procedure does not apply to automatic z-stream updates of the MetalLB Operator, which follow the standard straightforward method.

For detailed steps to upgrade the MetalLB Operator from 4.10 and earlier, see the guidance that follows. As a cluster administrator, start the upgrade process by deleting the MetalLB Operator by using the OpenShift CLI (oc) or the web console.

32.3.1. Deleting the MetalLB Operator from a cluster using the web console

Cluster administrators can delete installed Operators from a selected namespace by using the web console.

Prerequisites

  • Access to an OpenShift Container Platform cluster web console using an account with cluster-admin permissions.

Procedure

  1. Navigate to the Operators Installed Operators page.
  2. Search for the MetalLB Operator. Then, click on it.
  3. On the right side of the Operator Details page, select Uninstall Operator from the Actions drop-down menu.

    An Uninstall Operator? dialog box is displayed.

  4. Select Uninstall to remove the Operator, Operator deployments, and pods. Following this action, the Operator stops running and no longer receives updates.

    Note

    This action does not remove resources managed by the Operator, including custom resource definitions (CRDs) and custom resources (CRs). Dashboards and navigation items enabled by the web console and off-cluster resources that continue to run might need manual clean up. To remove these after uninstalling the Operator, you might need to manually delete the Operator CRDs.

32.3.2. Deleting MetalLB Operator from a cluster using the CLI

Cluster administrators can delete installed Operators from a selected namespace by using the CLI.

Prerequisites

  • Access to an OpenShift Container Platform cluster using an account with cluster-admin permissions.
  • oc command installed on workstation.

Procedure

  1. Check the current version of the subscribed MetalLB Operator in the currentCSV field:

    $ oc get subscription metallb-operator -n metallb-system -o yaml | grep currentCSV

    Example output

      currentCSV: metallb-operator.4.10.0-202207051316

  2. Delete the subscription:

    $ oc delete subscription metallb-operator -n metallb-system

    Example output

    subscription.operators.coreos.com "metallb-operator" deleted

  3. Delete the CSV for the Operator in the target namespace using the currentCSV value from the previous step:

    $ oc delete clusterserviceversion metallb-operator.4.10.0-202207051316 -n metallb-system

    Example output

    clusterserviceversion.operators.coreos.com "metallb-operator.4.10.0-202207051316" deleted

32.3.3. Editing the MetalLB Operator Operator group

When upgrading from any MetalLB Operator version up to and including 4.10 to 4.11 and later, remove spec.targetNamespaces from the Operator group custom resource (CR). You must remove the spec regardless of whether you used the web console or the CLI to delete the MetalLB Operator.

Note

The MetalLB Operator version 4.11 or later only supports the AllNamespaces install mode, whereas 4.10 or earlier versions support OwnNamespace or SingleNamespace modes.

Prerequisites

  • You have access to an OpenShift Container Platform cluster with cluster-admin permissions.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. List the Operator groups in the metallb-system namespace by running the following command:

    $ oc get operatorgroup -n metallb-system

    Example output

    NAME                   AGE
    metallb-system-7jc66   85m

  2. Verify that the spec.targetNamespaces is present in the Operator group CR associated with the metallb-system namespace by running the following command:

    $ oc get operatorgroup metallb-system-7jc66 -n metallb-system -o yaml

    Example output

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      annotations:
        olm.providedAPIs: ""
      creationTimestamp: "2023-10-25T09:42:49Z"
      generateName: metallb-system-
      generation: 1
      name: metallb-system-7jc66
      namespace: metallb-system
      resourceVersion: "25027"
      uid: f5f644a0-eef8-4e31-a306-e2bbcfaffab3
    spec:
      targetNamespaces:
      - metallb-system
      upgradeStrategy: Default
    status:
      lastUpdated: "2023-10-25T09:42:49Z"
      namespaces:
      - metallb-system

  3. Edit the Operator group and remove the targetNamespaces and metallb-system present under the spec section by running the following command:

    $ oc edit n metallb-system

    Example output

    operatorgroup.operators.coreos.com/metallb-system-7jc66 edited

  4. Verify the spec.targetNamespaces is removed from the Operator group custom resource associated with the metallb-system namespace by running the following command:

    $ oc get operatorgroup metallb-system-7jc66 -n metallb-system -o yaml

    Example output

    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      annotations:
        olm.providedAPIs: ""
      creationTimestamp: "2023-10-25T09:42:49Z"
      generateName: metallb-system-
      generation: 2
      name: metallb-system-7jc66
      namespace: metallb-system
      resourceVersion: "61658"
      uid: f5f644a0-eef8-4e31-a306-e2bbcfaffab3
    spec:
      upgradeStrategy: Default
    status:
      lastUpdated: "2023-10-25T14:31:30Z"
      namespaces:
      - ""

32.3.4. Upgrading the MetalLB Operator

Prerequisites

  • Access the cluster as a user with the cluster-admin role.

Procedure

  1. Verify that the metallb-system namespace still exists:

    $ oc get namespaces | grep metallb-system

    Example output

    metallb-system                                     Active   31m

  2. Verify the metallb custom resource still exists:

    $ oc get metallb -n metallb-system

    Example output

    NAME      AGE
    metallb   33m

  3. Follow the guidance in "Installing from OperatorHub using the CLI" to install the latest 4.16 version of the MetalLB Operator.

    Note

    When installing the latest 4.16 version of the MetalLB Operator, you must install the Operator to the same namespace it was previously installed to.

  4. Verify the upgraded version of the Operator is now the 4.16 version.

    $ oc get csv -n metallb-system

    Example output

    NAME                                   DISPLAY            VERSION               REPLACES   PHASE
    metallb-operator.4.16.0-202207051316   MetalLB Operator   4.16.0-202207051316              Succeeded

32.3.5. Additional resources

32.4. Configuring MetalLB address pools

As a cluster administrator, you can add, modify, and delete address pools. The MetalLB Operator uses the address pool custom resources to set the IP addresses that MetalLB can assign to services. The namespace used in the examples assume the namespace is metallb-system.

32.4.1. About the IPAddressPool custom resource

The fields for the IPAddressPool custom resource are described in the following tables.

Table 32.1. MetalLB IPAddressPool pool custom resource
FieldTypeDescription

metadata.name

string

Specifies the name for the address pool. When you add a service, you can specify this pool name in the metallb.universe.tf/address-pool annotation to select an IP address from a specific pool. The names doc-example, silver, and gold are used throughout the documentation.

metadata.namespace

string

Specifies the namespace for the address pool. Specify the same namespace that the MetalLB Operator uses.

metadata.label

string

Optional: Specifies the key value pair assigned to the IPAddressPool. This can be referenced by the ipAddressPoolSelectors in the BGPAdvertisement and L2Advertisement CRD to associate the IPAddressPool with the advertisement

spec.addresses

string

Specifies a list of IP addresses for MetalLB Operator to assign to services. You can specify multiple ranges in a single pool; they will all share the same settings. Specify each range in CIDR notation or as starting and ending IP addresses separated with a hyphen.

spec.autoAssign

boolean

Optional: Specifies whether MetalLB automatically assigns IP addresses from this pool. Specify false if you want explicitly request an IP address from this pool with the metallb.universe.tf/address-pool annotation. The default value is true.

spec.avoidBuggyIPs

boolean

Optional: This ensures when enabled that IP addresses ending .0 and .255 are not allocated from the pool. The default value is false. Some older consumer network equipment mistakenly block IP addresses ending in .0 and .255.

You can assign IP addresses from an IPAddressPool to services and namespaces by configuring the spec.serviceAllocation specification.

Table 32.2. MetalLB IPAddressPool custom resource spec.serviceAllocation subfields
FieldTypeDescription

priority

int

Optional: Defines the priority between IP address pools when more than one IP address pool matches a service or namespace. A lower number indicates a higher priority.

namespaces

array (string)

Optional: Specifies a list of namespaces that you can assign to IP addresses in an IP address pool.

namespaceSelectors

array (LabelSelector)

Optional: Specifies namespace labels that you can assign to IP addresses from an IP address pool by using label selectors in a list format.

serviceSelectors

array (LabelSelector)

Optional: Specifies service labels that you can assign to IP addresses from an address pool by using label selectors in a list format.

32.4.2. Configuring an address pool

As a cluster administrator, you can add address pools to your cluster to control the IP addresses that MetalLB can assign to load-balancer services.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a file, such as ipaddresspool.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      namespace: metallb-system
      name: doc-example
      labels: 1
        zone: east
    spec:
      addresses:
      - 203.0.113.1-203.0.113.10
      - 203.0.113.65-203.0.113.75
    1
    This label assigned to the IPAddressPool can be referenced by the ipAddressPoolSelectors in the BGPAdvertisement CRD to associate the IPAddressPool with the advertisement.
  2. Apply the configuration for the IP address pool:

    $ oc apply -f ipaddresspool.yaml

Verification

  • View the address pool:

    $ oc describe -n metallb-system IPAddressPool doc-example

    Example output

    Name:         doc-example
    Namespace:    metallb-system
    Labels:       zone=east
    Annotations:  <none>
    API Version:  metallb.io/v1beta1
    Kind:         IPAddressPool
    Metadata:
      ...
    Spec:
      Addresses:
        203.0.113.1-203.0.113.10
        203.0.113.65-203.0.113.75
      Auto Assign:  true
    Events:         <none>

Confirm that the address pool name, such as doc-example, and the IP address ranges appear in the output.

32.4.3. Configure MetalLB address pool for VLAN

As a cluster administrator, you can add address pools to your cluster to control the IP addresses on a created VLAN that MetalLB can assign to load-balancer services

Prerequisites

  • Install the OpenShift CLI (oc).
  • Configure a separate VLAN.
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a file, such as ipaddresspool-vlan.yaml, that is similar to the following example:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      namespace: metallb-system
      name: doc-example-vlan
      labels:
        zone: east 1
    spec:
      addresses:
      - 192.168.100.1-192.168.100.254 2
    1
    This label assigned to the IPAddressPool can be referenced by the ipAddressPoolSelectors in the BGPAdvertisement CRD to associate the IPAddressPool with the advertisement.
    2
    This IP range must match the subnet assigned to the VLAN on your network. To support layer 2 (L2) mode, the IP address range must be within the same subnet as the cluster nodes.
  2. Apply the configuration for the IP address pool:

    $ oc apply -f ipaddresspool-vlan.yaml
  3. To ensure this configuration applies to the VLAN you need to set the spec gatewayConfig.ipForwarding to Global.

    1. Run the following command to edit the network configuration custom resource (CR):

      $ oc edit network.config.openshift/cluster
    2. Update the spec.defaultNetwork.ovnKubernetesConfig section to include the gatewayConfig.ipForwarding set to Global. It should look something like this:

      Example

      ...
      spec:
        clusterNetwork:
          - cidr: 10.128.0.0/14
            hostPrefix: 23
        defaultNetwork:
          type: OVNKubernetes
          ovnKubernetesConfig:
            gatewayConfig:
              ipForwarding: Global
      ...

32.4.4. Example address pool configurations

32.4.4.1. Example: IPv4 and CIDR ranges

You can specify a range of IP addresses in CIDR notation. You can combine CIDR notation with the notation that uses a hyphen to separate lower and upper bounds.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-cidr
  namespace: metallb-system
spec:
  addresses:
  - 192.168.100.0/24
  - 192.168.200.0/24
  - 192.168.255.1-192.168.255.5

32.4.4.2. Example: Reserve IP addresses

You can set the autoAssign field to false to prevent MetalLB from automatically assigning the IP addresses from the pool. When you add a service, you can request a specific IP address from the pool or you can specify the pool name in an annotation to request any IP address from the pool.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-reserved
  namespace: metallb-system
spec:
  addresses:
  - 10.0.100.0/28
  autoAssign: false

32.4.4.3. Example: IPv4 and IPv6 addresses

You can add address pools that use IPv4 and IPv6. You can specify multiple ranges in the addresses list, just like several IPv4 examples.

Whether the service is assigned a single IPv4 address, a single IPv6 address, or both is determined by how you add the service. The spec.ipFamilies and spec.ipFamilyPolicy fields control how IP addresses are assigned to the service.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-combined
  namespace: metallb-system
spec:
  addresses:
  - 10.0.100.0/28
  - 2002:2:2::1-2002:2:2::100

32.4.4.4. Example: Assign IP address pools to services or namespaces

You can assign IP addresses from an IPAddressPool to services and namespaces that you specify.

If you assign a service or namespace to more than one IP address pool, MetalLB uses an available IP address from the higher-priority IP address pool. If no IP addresses are available from the assigned IP address pools with a high priority, MetalLB uses available IP addresses from an IP address pool with lower priority or no priority.

Note

You can use the matchLabels label selector, the matchExpressions label selector, or both, for the namespaceSelectors and serviceSelectors specifications. This example demonstrates one label selector for each specification.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: doc-example-service-allocation
  namespace: metallb-system
spec:
  addresses:
    - 192.168.20.0/24
  serviceAllocation:
    priority: 50 1
    namespaces: 2
      - namespace-a
      - namespace-b
    namespaceSelectors: 3
      - matchLabels:
          zone: east
    serviceSelectors: 4
      - matchExpressions:
        - key: security
          operator: In
          values:
          - S1
1
Assign a priority to the address pool. A lower number indicates a higher priority.
2
Assign one or more namespaces to the IP address pool in a list format.
3
Assign one or more namespace labels to the IP address pool by using label selectors in a list format.
4
Assign one or more service labels to the IP address pool by using label selectors in a list format.

32.4.5. Next steps

32.5. About advertising for the IP address pools

You can configure MetalLB so that the IP address is advertised with layer 2 protocols, the BGP protocol, or both. With layer 2, MetalLB provides a fault-tolerant external IP address. With BGP, MetalLB provides fault-tolerance for the external IP address and load balancing.

MetalLB supports advertising using L2 and BGP for the same set of IP addresses.

MetalLB provides the flexibility to assign address pools to specific BGP peers effectively to a subset of nodes on the network. This allows for more complex configurations, for example facilitating the isolation of nodes or the segmentation of the network.

32.5.1. About the BGPAdvertisement custom resource

The fields for the BGPAdvertisements object are defined in the following table:

Table 32.3. BGPAdvertisements configuration
FieldTypeDescription

metadata.name

string

Specifies the name for the BGP advertisement.

metadata.namespace

string

Specifies the namespace for the BGP advertisement. Specify the same namespace that the MetalLB Operator uses.

spec.aggregationLength

integer

Optional: Specifies the number of bits to include in a 32-bit CIDR mask. To aggregate the routes that the speaker advertises to BGP peers, the mask is applied to the routes for several service IP addresses and the speaker advertises the aggregated route. For example, with an aggregation length of 24, the speaker can aggregate several 10.0.1.x/32 service IP addresses and advertise a single 10.0.1.0/24 route.

spec.aggregationLengthV6

integer

Optional: Specifies the number of bits to include in a 128-bit CIDR mask. For example, with an aggregation length of 124, the speaker can aggregate several fc00:f853:0ccd:e799::x/128 service IP addresses and advertise a single fc00:f853:0ccd:e799::0/124 route.

spec.communities

string

Optional: Specifies one or more BGP communities. Each community is specified as two 16-bit values separated by the colon character. Well-known communities must be specified as 16-bit values:

  • NO_EXPORT: 65535:65281
  • NO_ADVERTISE: 65535:65282
  • NO_EXPORT_SUBCONFED: 65535:65283

    Note

    You can also use community objects that are created along with the strings.

spec.localPref

integer

Optional: Specifies the local preference for this advertisement. This BGP attribute applies to BGP sessions within the Autonomous System.

spec.ipAddressPools

string

Optional: The list of IPAddressPools to advertise with this advertisement, selected by name.

spec.ipAddressPoolSelectors

string

Optional: A selector for the IPAddressPools that gets advertised with this advertisement. This is for associating the IPAddressPool to the advertisement based on the label assigned to the IPAddressPool instead of the name itself. If no IPAddressPool is selected by this or by the list, the advertisement is applied to all the IPAddressPools.

spec.nodeSelectors

string

Optional: NodeSelectors allows to limit the nodes to announce as next hops for the load balancer IP. When empty, all the nodes are announced as next hops.

spec.peers

string

Optional: Peers limits the BGP peer to advertise the IPs of the selected pools to. When empty, the load balancer IP is announced to all the BGP peers configured.

32.5.2. Configuring MetalLB with a BGP advertisement and a basic use case

Configure MetalLB as follows so that the peer BGP routers receive one 203.0.113.200/32 route and one fc00:f853:ccd:e799::1/128 route for each load-balancer IP address that MetalLB assigns to a service. Because the localPref and communities fields are not specified, the routes are advertised with localPref set to zero and no BGP communities.

32.5.2.1. Example: Advertise a basic address pool configuration with BGP

Configure MetalLB as follows so that the IPAddressPool is advertised with the BGP protocol.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-basic
      spec:
        addresses:
          - 203.0.113.200/30
          - fc00:f853:ccd:e799::/124
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a BGP advertisement.

    1. Create a file, such as bgpadvertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-basic
        namespace: metallb-system
      spec:
        ipAddressPools:
        - doc-example-bgp-basic
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement.yaml

32.5.3. Configuring MetalLB with a BGP advertisement and an advanced use case

Configure MetalLB as follows so that MetalLB assigns IP addresses to load-balancer services in the ranges between 203.0.113.200 and 203.0.113.203 and between fc00:f853:ccd:e799::0 and fc00:f853:ccd:e799::f.

To explain the two BGP advertisements, consider an instance when MetalLB assigns the IP address of 203.0.113.200 to a service. With that IP address as an example, the speaker advertises two routes to BGP peers:

  • 203.0.113.200/32, with localPref set to 100 and the community set to the numeric value of the NO_ADVERTISE community. This specification indicates to the peer routers that they can use this route but they should not propagate information about this route to BGP peers.
  • 203.0.113.200/30, aggregates the load-balancer IP addresses assigned by MetalLB into a single route. MetalLB advertises the aggregated route to BGP peers with the community attribute set to 8000:800. BGP peers propagate the 203.0.113.200/30 route to other BGP peers. When traffic is routed to a node with a speaker, the 203.0.113.200/32 route is used to forward the traffic into the cluster and to a pod that is associated with the service.

As you add more services and MetalLB assigns more load-balancer IP addresses from the pool, peer routers receive one local route, 203.0.113.20x/32, for each service, as well as the 203.0.113.200/30 aggregate route. Each service that you add generates the /30 route, but MetalLB deduplicates the routes to one BGP advertisement before communicating with peer routers.

32.5.3.1. Example: Advertise an advanced address pool configuration with BGP

Configure MetalLB as follows so that the IPAddressPool is advertised with the BGP protocol.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-adv
        labels:
          zone: east
      spec:
        addresses:
          - 203.0.113.200/30
          - fc00:f853:ccd:e799::/124
        autoAssign: false
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a BGP advertisement.

    1. Create a file, such as bgpadvertisement1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-adv-1
        namespace: metallb-system
      spec:
        ipAddressPools:
          - doc-example-bgp-adv
        communities:
          - 65535:65282
        aggregationLength: 32
        localPref: 100
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement1.yaml
    3. Create a file, such as bgpadvertisement2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-adv-2
        namespace: metallb-system
      spec:
        ipAddressPools:
          - doc-example-bgp-adv
        communities:
          - 8000:800
        aggregationLength: 30
        aggregationLengthV6: 124
    4. Apply the configuration:

      $ oc apply -f bgpadvertisement2.yaml

32.5.4. Advertising an IP address pool from a subset of nodes

To advertise an IP address from an IP addresses pool, from a specific set of nodes only, use the .spec.nodeSelector specification in the BGPAdvertisement custom resource. This specification associates a pool of IP addresses with a set of nodes in the cluster. This is useful when you have nodes on different subnets in a cluster and you want to advertise an IP addresses from an address pool from a specific subnet, for example a public-facing subnet only.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool by using a custom resource:

    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      namespace: metallb-system
      name: pool1
    spec:
      addresses:
        - 4.4.4.100-4.4.4.200
        - 2001:100:4::200-2001:100:4::400
  2. Control which nodes in the cluster the IP address from pool1 advertises from by defining the .spec.nodeSelector value in the BGPAdvertisement custom resource:

    apiVersion: metallb.io/v1beta1
    kind: BGPAdvertisement
    metadata:
      name: example
    spec:
      ipAddressPools:
      - pool1
      nodeSelector:
      - matchLabels:
          kubernetes.io/hostname: NodeA
      - matchLabels:
          kubernetes.io/hostname: NodeB

In this example, the IP address from pool1 advertises from NodeA and NodeB only.

32.5.5. About the L2Advertisement custom resource

The fields for the l2Advertisements object are defined in the following table:

Table 32.4. L2 advertisements configuration
FieldTypeDescription

metadata.name

string

Specifies the name for the L2 advertisement.

metadata.namespace

string

Specifies the namespace for the L2 advertisement. Specify the same namespace that the MetalLB Operator uses.

spec.ipAddressPools

string

Optional: The list of IPAddressPools to advertise with this advertisement, selected by name.

spec.ipAddressPoolSelectors

string

Optional: A selector for the IPAddressPools that gets advertised with this advertisement. This is for associating the IPAddressPool to the advertisement based on the label assigned to the IPAddressPool instead of the name itself. If no IPAddressPool is selected by this or by the list, the advertisement is applied to all the IPAddressPools.

spec.nodeSelectors

string

Optional: NodeSelectors limits the nodes to announce as next hops for the load balancer IP. When empty, all the nodes are announced as next hops.

Important

Limiting the nodes to announce as next hops is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

spec.interfaces

string

Optional: The list of interfaces that are used to announce the load balancer IP.

32.5.6. Configuring MetalLB with an L2 advertisement

Configure MetalLB as follows so that the IPAddressPool is advertised with the L2 protocol.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-l2
      spec:
        addresses:
          - 4.4.4.0/24
        autoAssign: false
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a L2 advertisement.

    1. Create a file, such as l2advertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: l2advertisement
        namespace: metallb-system
      spec:
        ipAddressPools:
         - doc-example-l2
    2. Apply the configuration:

      $ oc apply -f l2advertisement.yaml

32.5.7. Configuring MetalLB with a L2 advertisement and label

The ipAddressPoolSelectors field in the BGPAdvertisement and L2Advertisement custom resource definitions is used to associate the IPAddressPool to the advertisement based on the label assigned to the IPAddressPool instead of the name itself.

This example shows how to configure MetalLB so that the IPAddressPool is advertised with the L2 protocol by configuring the ipAddressPoolSelectors field.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-l2-label
        labels:
          zone: east
      spec:
        addresses:
          - 172.31.249.87/32
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a L2 advertisement advertising the IP using ipAddressPoolSelectors.

    1. Create a file, such as l2advertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: l2advertisement-label
        namespace: metallb-system
      spec:
        ipAddressPoolSelectors:
          - matchExpressions:
              - key: zone
                operator: In
                values:
                  - east
    2. Apply the configuration:

      $ oc apply -f l2advertisement.yaml

32.5.8. Configuring MetalLB with an L2 advertisement for selected interfaces

By default, the IP addresses from IP address pool that has been assigned to the service, is advertised from all the network interfaces. The interfaces field in the L2Advertisement custom resource definition is used to restrict those network interfaces that advertise the IP address pool.

This example shows how to configure MetalLB so that the IP address pool is advertised only from the network interfaces listed in the interfaces field of all nodes.

Prerequisites

  • You have installed the OpenShift CLI (oc).
  • You are logged in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, and enter the configuration details like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-l2
      spec:
        addresses:
          - 4.4.4.0/24
        autoAssign: false
    2. Apply the configuration for the IP address pool like the following example:

      $ oc apply -f ipaddresspool.yaml
  2. Create a L2 advertisement advertising the IP with interfaces selector.

    1. Create a YAML file, such as l2advertisement.yaml, and enter the configuration details like the following example:

      apiVersion: metallb.io/v1beta1
      kind: L2Advertisement
      metadata:
        name: l2advertisement
        namespace: metallb-system
      spec:
        ipAddressPools:
         - doc-example-l2
         interfaces:
         - interfaceA
         - interfaceB
    2. Apply the configuration for the advertisement like the following example:

      $ oc apply -f l2advertisement.yaml
Important

The interface selector does not affect how MetalLB chooses the node to announce a given IP by using L2. The chosen node does not announce the service if the node does not have the selected interface.

32.5.9. Configuring MetalLB with secondary networks

From OpenShift Container Platform 4.14 the default network behavior is to not allow forwarding of IP packets between network interfaces. Therefore, when MetalLB is configured on a secondary interface, you need to add a machine configuration to enable IP forwarding for only the required interfaces.

Note

OpenShift Container Platform clusters upgraded from 4.13 are not affected because a global parameter is set during upgrade to enable global IP forwarding.

To enable IP forwarding for the secondary interface, you have two options:

  • Enable IP forwarding for all interfaces.
  • Enable IP forwarding for a specific interface.

    Note

    Enabling IP forwarding for a specific interface provides more granular control, while enabling it for all interfaces applies a global setting.

Procedure

  1. Enable forwarding for a specific secondary interface, such as bridge-net by creating and applying a MachineConfig CR.

    1. Create the MachineConfig CR to enable IP forwarding for the specified secondary interface named bridge-net.
    2. Save the following YAML in the enable-ip-forward.yaml file:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: <node_role> 1
        name: 81-enable-global-forwarding
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,`echo -e "net.ipv4.conf.bridge-net.forwarding = 1\nnet.ipv6.conf.bridge-net.forwarding = 1\nnet.ipv4.conf.bridge-net.rp_filter = 0\nnet.ipv6.conf.bridge-net.rp_filter = 0" | base64 -w0`
                 verification: {}
              filesystem: root
              mode: 644
              path: /etc/sysctl.d/enable-global-forwarding.conf
        osImageURL: ""
      1
      Node role where you want to enable IP forwarding, for example, worker
    3. Apply the configuration by running the following command:

      $ oc apply -f enable-ip-forward.yaml
  2. Alternatively, you can enable IP forwarding globally by running the following command:

    $ oc patch network.operator cluster -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"ipForwarding": "Global"}}}}}

32.5.10. Additional resources

32.6. Configuring MetalLB BGP peers

As a cluster administrator, you can add, modify, and delete Border Gateway Protocol (BGP) peers. The MetalLB Operator uses the BGP peer custom resources to identify which peers that MetalLB speaker pods contact to start BGP sessions. The peers receive the route advertisements for the load-balancer IP addresses that MetalLB assigns to services.

32.6.1. About the BGP peer custom resource

The fields for the BGP peer custom resource are described in the following table.

Table 32.5. MetalLB BGP peer custom resource
FieldTypeDescription

metadata.name

string

Specifies the name for the BGP peer custom resource.

metadata.namespace

string

Specifies the namespace for the BGP peer custom resource.

spec.myASN

integer

Specifies the Autonomous System number for the local end of the BGP session. Specify the same value in all BGP peer custom resources that you add. The range is 0 to 4294967295.

spec.peerASN

integer

Specifies the Autonomous System number for the remote end of the BGP session. The range is 0 to 4294967295.

spec.peerAddress

string

Specifies the IP address of the peer to contact for establishing the BGP session.

spec.sourceAddress

string

Optional: Specifies the IP address to use when establishing the BGP session. The value must be an IPv4 address.

spec.peerPort

integer

Optional: Specifies the network port of the peer to contact for establishing the BGP session. The range is 0 to 16384.

spec.holdTime

string

Optional: Specifies the duration for the hold time to propose to the BGP peer. The minimum value is 3 seconds (3s). The common units are seconds and minutes, such as 3s, 1m, and 5m30s. To detect path failures more quickly, also configure BFD.

spec.keepaliveTime

string

Optional: Specifies the maximum interval between sending keep-alive messages to the BGP peer. If you specify this field, you must also specify a value for the holdTime field. The specified value must be less than the value for the holdTime field.

spec.routerID

string

Optional: Specifies the router ID to advertise to the BGP peer. If you specify this field, you must specify the same value in every BGP peer custom resource that you add.

spec.password

string

Optional: Specifies the MD5 password to send to the peer for routers that enforce TCP MD5 authenticated BGP sessions.

spec.passwordSecret

string

Optional: Specifies name of the authentication secret for the BGP Peer. The secret must live in the metallb namespace and be of type basic-auth.

spec.bfdProfile

string

Optional: Specifies the name of a BFD profile.

spec.nodeSelectors

object[]

Optional: Specifies a selector, using match expressions and match labels, to control which nodes can connect to the BGP peer.

spec.ebgpMultiHop

boolean

Optional: Specifies that the BGP peer is multiple network hops away. If the BGP peer is not directly connected to the same network, the speaker cannot establish a BGP session unless this field is set to true. This field applies to external BGP. External BGP is the term that is used to describe when a BGP peer belongs to a different Autonomous System.

Note

The passwordSecret field is mutually exclusive with the password field, and contains a reference to a secret containing the password to use. Setting both fields results in a failure of the parsing.

32.6.2. Configuring a BGP peer

As a cluster administrator, you can add a BGP peer custom resource to exchange routing information with network routers and advertise the IP addresses for services.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Configure MetalLB with a BGP advertisement.

Procedure

  1. Create a file, such as bgppeer.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta2
    kind: BGPPeer
    metadata:
      namespace: metallb-system
      name: doc-example-peer
    spec:
      peerAddress: 10.0.0.1
      peerASN: 64501
      myASN: 64500
      routerID: 10.10.10.10
  2. Apply the configuration for the BGP peer:

    $ oc apply -f bgppeer.yaml

32.6.3. Configure a specific set of BGP peers for a given address pool

This procedure illustrates how to:

  • Configure a set of address pools (pool1 and pool2).
  • Configure a set of BGP peers (peer1 and peer2).
  • Configure BGP advertisement to assign pool1 to peer1 and pool2 to peer2.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create address pool pool1.

    1. Create a file, such as ipaddresspool1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: pool1
      spec:
        addresses:
          - 4.4.4.100-4.4.4.200
          - 2001:100:4::200-2001:100:4::400
    2. Apply the configuration for the IP address pool pool1:

      $ oc apply -f ipaddresspool1.yaml
  2. Create address pool pool2.

    1. Create a file, such as ipaddresspool2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: pool2
      spec:
        addresses:
          - 5.5.5.100-5.5.5.200
          - 2001:100:5::200-2001:100:5::400
    2. Apply the configuration for the IP address pool pool2:

      $ oc apply -f ipaddresspool2.yaml
  3. Create BGP peer1.

    1. Create a file, such as bgppeer1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        namespace: metallb-system
        name: peer1
      spec:
        peerAddress: 10.0.0.1
        peerASN: 64501
        myASN: 64500
        routerID: 10.10.10.10
    2. Apply the configuration for the BGP peer:

      $ oc apply -f bgppeer1.yaml
  4. Create BGP peer2.

    1. Create a file, such as bgppeer2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        namespace: metallb-system
        name: peer2
      spec:
        peerAddress: 10.0.0.2
        peerASN: 64501
        myASN: 64500
        routerID: 10.10.10.10
    2. Apply the configuration for the BGP peer2:

      $ oc apply -f bgppeer2.yaml
  5. Create BGP advertisement 1.

    1. Create a file, such as bgpadvertisement1.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-1
        namespace: metallb-system
      spec:
        ipAddressPools:
          - pool1
        peers:
          - peer1
        communities:
          - 65535:65282
        aggregationLength: 32
        aggregationLengthV6: 128
        localPref: 100
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement1.yaml
  6. Create BGP advertisement 2.

    1. Create a file, such as bgpadvertisement2.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgpadvertisement-2
        namespace: metallb-system
      spec:
        ipAddressPools:
          - pool2
        peers:
          - peer2
        communities:
          - 65535:65282
        aggregationLength: 32
        aggregationLengthV6: 128
        localPref: 100
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement2.yaml

32.6.4. Exposing a service through a network VRF

You can expose a service through a virtual routing and forwarding (VRF) instance by associating a VRF on a network interface with a BGP peer.

Important

Exposing a service through a VRF on a BGP peer is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

By using a VRF on a network interface to expose a service through a BGP peer, you can segregate traffic to the service, configure independent routing decisions, and enable multi-tenancy support on a network interface.

Note

By establishing a BGP session through an interface belonging to a network VRF, MetalLB can advertise services through that interface and enable external traffic to reach the service through this interface. However, the network VRF routing table is different from the default VRF routing table used by OVN-Kubernetes. Therefore, the traffic cannot reach the OVN-Kubernetes network infrastructure.

To enable the traffic directed to the service to reach the OVN-Kubernetes network infrastructure, you must configure routing rules to define the next hops for network traffic. See the NodeNetworkConfigurationPolicy resource in "Managing symmetric routing with MetalLB" in the Additional resources section for more information.

These are the high-level steps to expose a service through a network VRF with a BGP peer:

  1. Define a BGP peer and add a network VRF instance.
  2. Specify an IP address pool for MetalLB.
  3. Configure a BGP route advertisement for MetalLB to advertise a route using the specified IP address pool and the BGP peer associated with the VRF instance.
  4. Deploy a service to test the configuration.

Prerequisites

  • You installed the OpenShift CLI (oc).
  • You logged in as a user with cluster-admin privileges.
  • You defined a NodeNetworkConfigurationPolicy to associate a Virtual Routing and Forwarding (VRF) instance with a network interface. For more information about completing this prerequisite, see the Additional resources section.
  • You installed MetalLB on your cluster.

Procedure

  1. Create a BGPPeer custom resources (CR):

    1. Create a file, such as frrviavrf.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        name: frrviavrf
        namespace: metallb-system
      spec:
        myASN: 100
        peerASN: 200
        peerAddress: 192.168.130.1
        vrf: ens4vrf 1
      1
      Specifies the network VRF instance to associate with the BGP peer. MetalLB can advertise services and make routing decisions based on the routing information in the VRF.
      Note

      You must configure this network VRF instance in a NodeNetworkConfigurationPolicy CR. See the Additional resources for more information.

    2. Apply the configuration for the BGP peer by running the following command:

      $ oc apply -f frrviavrf.yaml
  2. Create an IPAddressPool CR:

    1. Create a file, such as first-pool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        name: first-pool
        namespace: metallb-system
      spec:
        addresses:
        - 192.169.10.0/32
    2. Apply the configuration for the IP address pool by running the following command:

      $ oc apply -f first-pool.yaml
  3. Create a BGPAdvertisement CR:

    1. Create a file, such as first-adv.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: first-adv
        namespace: metallb-system
      spec:
        ipAddressPools:
          - first-pool
        peers:
          - frrviavrf 1
      1
      In this example, MetalLB advertises a range of IP addresses from the first-pool IP address pool to the frrviavrf BGP peer.
    2. Apply the configuration for the BGP advertisement by running the following command:

      $ oc apply -f first-adv.yaml
  4. Create a Namespace, Deployment, and Service CR:

    1. Create a file, such as deploy-service.yaml, with content like the following example:

      apiVersion: v1
      kind: Namespace
      metadata:
        name: test
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: server
        namespace: test
      spec:
        selector:
          matchLabels:
            app: server
        template:
          metadata:
            labels:
              app: server
          spec:
            containers:
            - name: server
              image: registry.redhat.io/ubi9/ubi
              ports:
              - name: http
                containerPort: 30100
              command: ["/bin/sh", "-c"]
              args: ["sleep INF"]
      ---
      apiVersion: v1
      kind: Service
      metadata:
        name: server1
        namespace: test
      spec:
        ports:
        - name: http
          port: 30100
          protocol: TCP
          targetPort: 30100
        selector:
          app: server
        type: LoadBalancer
    2. Apply the configuration for the namespace, deployment, and service by running the following command:

      $ oc apply -f deploy-service.yaml

Verification

  1. Identify a MetalLB speaker pod by running the following command:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME            READY   STATUS    RESTARTS   AGE
    speaker-c6c5f   6/6     Running   0          69m

  2. Verify that the state of the BGP session is Established in the speaker pod by running the following command, replacing the variables to match your configuration:

    $ oc exec -n metallb-system <speaker_pod> -c frr -- vtysh -c "show bgp vrf <vrf_name> neigh"

    Example output

    BGP neighbor is 192.168.30.1, remote AS 200, local AS 100, external link
      BGP version 4, remote router ID 192.168.30.1, local router ID 192.168.30.71
      BGP state = Established, up for 04:20:09
    
    ...

  3. Verify that the service is advertised correctly by running the following command:

    $ oc exec -n metallb-system <speaker_pod> -c frr -- vtysh -c "show bgp vrf <vrf_name> ipv4"

32.6.5. Example BGP peer configurations

32.6.5.1. Example: Limit which nodes connect to a BGP peer

You can specify the node selectors field to control which nodes can connect to a BGP peer.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-nodesel
  namespace: metallb-system
spec:
  peerAddress: 10.0.20.1
  peerASN: 64501
  myASN: 64500
  nodeSelectors:
  - matchExpressions:
    - key: kubernetes.io/hostname
      operator: In
      values: [compute-1.example.com, compute-2.example.com]

32.6.5.2. Example: Specify a BFD profile for a BGP peer

You can specify a BFD profile to associate with BGP peers. BFD compliments BGP by providing more rapid detection of communication failures between peers than BGP alone.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-peer-bfd
  namespace: metallb-system
spec:
  peerAddress: 10.0.20.1
  peerASN: 64501
  myASN: 64500
  holdTime: "10s"
  bfdProfile: doc-example-bfd-profile-full
Note

Deleting the bidirectional forwarding detection (BFD) profile and removing the bfdProfile added to the border gateway protocol (BGP) peer resource does not disable the BFD. Instead, the BGP peer starts using the default BFD profile. To disable BFD from a BGP peer resource, delete the BGP peer configuration and recreate it without a BFD profile. For more information, see BZ#2050824.

32.6.5.3. Example: Specify BGP peers for dual-stack networking

To support dual-stack networking, add one BGP peer custom resource for IPv4 and one BGP peer custom resource for IPv6.

apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-dual-stack-ipv4
  namespace: metallb-system
spec:
  peerAddress: 10.0.20.1
  peerASN: 64500
  myASN: 64500
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: doc-example-dual-stack-ipv6
  namespace: metallb-system
spec:
  peerAddress: 2620:52:0:88::104
  peerASN: 64500
  myASN: 64500

32.6.6. Next steps

32.7. Configuring community alias

As a cluster administrator, you can configure a community alias and use it across different advertisements.

32.7.1. About the community custom resource

The community custom resource is a collection of aliases for communities. Users can define named aliases to be used when advertising ipAddressPools using the BGPAdvertisement. The fields for the community custom resource are described in the following table.

Note

The community CRD applies only to BGPAdvertisement.

Table 32.6. MetalLB community custom resource
FieldTypeDescription

metadata.name

string

Specifies the name for the community.

metadata.namespace

string

Specifies the namespace for the community. Specify the same namespace that the MetalLB Operator uses.

spec.communities

string

Specifies a list of BGP community aliases that can be used in BGPAdvertisements. A community alias consists of a pair of name (alias) and value (number:number). Link the BGPAdvertisement to a community alias by referring to the alias name in its spec.communities field.

Table 32.7. CommunityAlias
FieldTypeDescription

name

string

The name of the alias for the community.

value

string

The BGP community value corresponding to the given name.

32.7.2. Configuring MetalLB with a BGP advertisement and community alias

Configure MetalLB as follows so that the IPAddressPool is advertised with the BGP protocol and the community alias set to the numeric value of the NO_ADVERTISE community.

In the following example, the peer BGP router doc-example-peer-community receives one 203.0.113.200/32 route and one fc00:f853:ccd:e799::1/128 route for each load-balancer IP address that MetalLB assigns to a service. A community alias is configured with the NO_ADVERTISE community.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create an IP address pool.

    1. Create a file, such as ipaddresspool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-community
      spec:
        addresses:
          - 203.0.113.200/30
          - fc00:f853:ccd:e799::/124
    2. Apply the configuration for the IP address pool:

      $ oc apply -f ipaddresspool.yaml
  2. Create a community alias named community1.

    apiVersion: metallb.io/v1beta1
    kind: Community
    metadata:
      name: community1
      namespace: metallb-system
    spec:
      communities:
        - name: NO_ADVERTISE
          value: '65535:65282'
  3. Create a BGP peer named doc-example-bgp-peer.

    1. Create a file, such as bgppeer.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        namespace: metallb-system
        name: doc-example-bgp-peer
      spec:
        peerAddress: 10.0.0.1
        peerASN: 64501
        myASN: 64500
        routerID: 10.10.10.10
    2. Apply the configuration for the BGP peer:

      $ oc apply -f bgppeer.yaml
  4. Create a BGP advertisement with the community alias.

    1. Create a file, such as bgpadvertisement.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: bgp-community-sample
        namespace: metallb-system
      spec:
        aggregationLength: 32
        aggregationLengthV6: 128
        communities:
          - NO_ADVERTISE 1
        ipAddressPools:
          - doc-example-bgp-community
        peers:
          - doc-example-peer
      1
      Specify the CommunityAlias.name here and not the community custom resource (CR) name.
    2. Apply the configuration:

      $ oc apply -f bgpadvertisement.yaml

32.8. Configuring MetalLB BFD profiles

As a cluster administrator, you can add, modify, and delete Bidirectional Forwarding Detection (BFD) profiles. The MetalLB Operator uses the BFD profile custom resources to identify which BGP sessions use BFD to provide faster path failure detection than BGP alone provides.

32.8.1. About the BFD profile custom resource

The fields for the BFD profile custom resource are described in the following table.

Table 32.8. BFD profile custom resource
FieldTypeDescription

metadata.name

string

Specifies the name for the BFD profile custom resource.

metadata.namespace

string

Specifies the namespace for the BFD profile custom resource.

spec.detectMultiplier

integer

Specifies the detection multiplier to determine packet loss. The remote transmission interval is multiplied by this value to determine the connection loss detection timer.

For example, when the local system has the detect multiplier set to 3 and the remote system has the transmission interval set to 300, the local system detects failures only after 900 ms without receiving packets.

The range is 2 to 255. The default value is 3.

spec.echoMode

boolean

Specifies the echo transmission mode. If you are not using distributed BFD, echo transmission mode works only when the peer is also FRR. The default value is false and echo transmission mode is disabled.

When echo transmission mode is enabled, consider increasing the transmission interval of control packets to reduce bandwidth usage. For example, consider increasing the transmit interval to 2000 ms.

spec.echoInterval

integer

Specifies the minimum transmission interval, less jitter, that this system uses to send and receive echo packets. The range is 10 to 60000. The default value is 50 ms.

spec.minimumTtl

integer

Specifies the minimum expected TTL for an incoming control packet. This field applies to multi-hop sessions only.

The purpose of setting a minimum TTL is to make the packet validation requirements more stringent and avoid receiving control packets from other sessions.

The default value is 254 and indicates that the system expects only one hop between this system and the peer.

spec.passiveMode

boolean

Specifies whether a session is marked as active or passive. A passive session does not attempt to start the connection. Instead, a passive session waits for control packets from a peer before it begins to reply.

Marking a session as passive is useful when you have a router that acts as the central node of a star network and you want to avoid sending control packets that you do not need the system to send.

The default value is false and marks the session as active.

spec.receiveInterval

integer

Specifies the minimum interval that this system is capable of receiving control packets. The range is 10 to 60000. The default value is 300 ms.

spec.transmitInterval

integer

Specifies the minimum transmission interval, less jitter, that this system uses to send control packets. The range is 10 to 60000. The default value is 300 ms.

32.8.2. Configuring a BFD profile

As a cluster administrator, you can add a BFD profile and configure a BGP peer to use the profile. BFD provides faster path failure detection than BGP alone.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.

Procedure

  1. Create a file, such as bfdprofile.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta1
    kind: BFDProfile
    metadata:
      name: doc-example-bfd-profile-full
      namespace: metallb-system
    spec:
      receiveInterval: 300
      transmitInterval: 300
      detectMultiplier: 3
      echoMode: false
      passiveMode: true
      minimumTtl: 254
  2. Apply the configuration for the BFD profile:

    $ oc apply -f bfdprofile.yaml

32.8.3. Next steps

32.9. Configuring services to use MetalLB

As a cluster administrator, when you add a service of type LoadBalancer, you can control how MetalLB assigns an IP address.

32.9.1. Request a specific IP address

Like some other load-balancer implementations, MetalLB accepts the spec.loadBalancerIP field in the service specification.

If the requested IP address is within a range from any address pool, MetalLB assigns the requested IP address. If the requested IP address is not within any range, MetalLB reports a warning.

Example service YAML for a specific IP address

apiVersion: v1
kind: Service
metadata:
  name: <service_name>
  annotations:
    metallb.universe.tf/address-pool: <address_pool_name>
spec:
  selector:
    <label_key>: <label_value>
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer
  loadBalancerIP: <ip_address>

If MetalLB cannot assign the requested IP address, the EXTERNAL-IP for the service reports <pending> and running oc describe service <service_name> includes an event like the following example.

Example event when MetalLB cannot assign a requested IP address

  ...
Events:
  Type     Reason            Age    From                Message
  ----     ------            ----   ----                -------
  Warning  AllocationFailed  3m16s  metallb-controller  Failed to allocate IP for "default/invalid-request": "4.3.2.1" is not allowed in config

32.9.2. Request an IP address from a specific pool

To assign an IP address from a specific range, but you are not concerned with the specific IP address, then you can use the metallb.universe.tf/address-pool annotation to request an IP address from the specified address pool.

Example service YAML for an IP address from a specific pool

apiVersion: v1
kind: Service
metadata:
  name: <service_name>
  annotations:
    metallb.universe.tf/address-pool: <address_pool_name>
spec:
  selector:
    <label_key>: <label_value>
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer

If the address pool that you specify for <address_pool_name> does not exist, MetalLB attempts to assign an IP address from any pool that permits automatic assignment.

32.9.3. Accept any IP address

By default, address pools are configured to permit automatic assignment. MetalLB assigns an IP address from these address pools.

To accept any IP address from any pool that is configured for automatic assignment, no special annotation or configuration is required.

Example service YAML for accepting any IP address

apiVersion: v1
kind: Service
metadata:
  name: <service_name>
spec:
  selector:
    <label_key>: <label_value>
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: LoadBalancer

32.9.4. Share a specific IP address

By default, services do not share IP addresses. However, if you need to colocate services on a single IP address, you can enable selective IP sharing by adding the metallb.universe.tf/allow-shared-ip annotation to the services.

apiVersion: v1
kind: Service
metadata:
  name: service-http
  annotations:
    metallb.universe.tf/address-pool: doc-example
    metallb.universe.tf/allow-shared-ip: "web-server-svc"  1
spec:
  ports:
    - name: http
      port: 80  2
      protocol: TCP
      targetPort: 8080
  selector:
    <label_key>: <label_value>  3
  type: LoadBalancer
  loadBalancerIP: 172.31.249.7  4
---
apiVersion: v1
kind: Service
metadata:
  name: service-https
  annotations:
    metallb.universe.tf/address-pool: doc-example
    metallb.universe.tf/allow-shared-ip: "web-server-svc"
spec:
  ports:
    - name: https
      port: 443
      protocol: TCP
      targetPort: 8080
  selector:
    <label_key>: <label_value>
  type: LoadBalancer
  loadBalancerIP: 172.31.249.7
1
Specify the same value for the metallb.universe.tf/allow-shared-ip annotation. This value is referred to as the sharing key.
2
Specify different port numbers for the services.
3
Specify identical pod selectors if you must specify externalTrafficPolicy: local so the services send traffic to the same set of pods. If you use the cluster external traffic policy, then the pod selectors do not need to be identical.
4
Optional: If you specify the three preceding items, MetalLB might colocate the services on the same IP address. To ensure that services share an IP address, specify the IP address to share.

By default, Kubernetes does not allow multiprotocol load balancer services. This limitation would normally make it impossible to run a service like DNS that needs to listen on both TCP and UDP. To work around this limitation of Kubernetes with MetalLB, create two services:

  • For one service, specify TCP and for the second service, specify UDP.
  • In both services, specify the same pod selector.
  • Specify the same sharing key and spec.loadBalancerIP value to colocate the TCP and UDP services on the same IP address.

32.9.5. Configuring a service with MetalLB

You can configure a load-balancing service to use an external IP address from an address pool.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Install the MetalLB Operator and start MetalLB.
  • Configure at least one address pool.
  • Configure your network to route traffic from the clients to the host network for the cluster.

Procedure

  1. Create a <service_name>.yaml file. In the file, ensure that the spec.type field is set to LoadBalancer.

    Refer to the examples for information about how to request the external IP address that MetalLB assigns to the service.

  2. Create the service:

    $ oc apply -f <service_name>.yaml

    Example output

    service/<service_name> created

Verification

  • Describe the service:

    $ oc describe service <service_name>

    Example output

    Name:                     <service_name>
    Namespace:                default
    Labels:                   <none>
    Annotations:              metallb.universe.tf/address-pool: doc-example  1
    Selector:                 app=service_name
    Type:                     LoadBalancer  2
    IP Family Policy:         SingleStack
    IP Families:              IPv4
    IP:                       10.105.237.254
    IPs:                      10.105.237.254
    LoadBalancer Ingress:     192.168.100.5  3
    Port:                     <unset>  80/TCP
    TargetPort:               8080/TCP
    NodePort:                 <unset>  30550/TCP
    Endpoints:                10.244.0.50:8080
    Session Affinity:         None
    External Traffic Policy:  Cluster
    Events:  4
      Type    Reason        Age                From             Message
      ----    ------        ----               ----             -------
      Normal  nodeAssigned  32m (x2 over 32m)  metallb-speaker  announcing from node "<node_name>"

    1
    The annotation is present if you request an IP address from a specific pool.
    2
    The service type must indicate LoadBalancer.
    3
    The load-balancer ingress field indicates the external IP address if the service is assigned correctly.
    4
    The events field indicates the node name that is assigned to announce the external IP address. If you experience an error, the events field indicates the reason for the error.

32.10. Managing symmetric routing with MetalLB

As a cluster administrator, you can effectively manage traffic for pods behind a MetalLB load-balancer service with multiple host interfaces by implementing features from MetalLB, NMState, and OVN-Kubernetes. By combining these features in this context, you can provide symmetric routing, traffic segregation, and support clients on different networks with overlapping CIDR addresses.

To achieve this functionality, learn how to implement virtual routing and forwarding (VRF) instances with MetalLB, and configure egress services.

Important

Configuring symmetric traffic by using a VRF instance with MetalLB and an egress service is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

32.10.1. Challenges of managing symmetric routing with MetalLB

When you use MetalLB with multiple host interfaces, MetalLB exposes and announces a service through all available interfaces on the host. This can present challenges relating to network isolation, asymmetric return traffic and overlapping CIDR addresses.

One option to ensure that return traffic reaches the correct client is to use static routes. However, with this solution, MetalLB cannot isolate the services and then announce each service through a different interface. Additionally, static routing requires manual configuration and requires maintenance if remote sites are added.

A further challenge of symmetric routing when implementing a MetalLB service is scenarios where external systems expect the source and destination IP address for an application to be the same. The default behavior for OpenShift Container Platform is to assign the IP address of the host network interface as the source IP address for traffic originating from pods. This is problematic with multiple host interfaces.

You can overcome these challenges by implementing a configuration that combines features from MetalLB, NMState, and OVN-Kubernetes.

32.10.2. Overview of managing symmetric routing by using VRFs with MetalLB

You can overcome the challenges of implementing symmetric routing by using NMState to configure a VRF instance on a host, associating the VRF instance with a MetalLB BGPPeer resource, and configuring an egress service for egress traffic with OVN-Kubernetes.

Figure 32.2. Network overview of managing symmetric routing by using VRFs with MetalLB

Network overview of managing symmetric routing by using VRFs with MetalLB

The configuration process involves three stages:

1. Define a VRF and routing rules

  • Configure a NodeNetworkConfigurationPolicy custom resource (CR) to associate a VRF instance with a network interface.
  • Use the VRF routing table to direct ingress and egress traffic.

2. Link the VRF to a MetalLB BGPPeer

  • Configure a MetalLB BGPPeer resource to use the VRF instance on a network interface.
  • By associating the BGPPeer resource with the VRF instance, the designated network interface becomes the primary interface for the BGP session, and MetalLB advertises the services through this interface.

3. Configure an egress service

  • Configure an egress service to choose the network associated with the VRF instance for egress traffic.
  • Optional: Configure an egress service to use the IP address of the MetalLB load-balancer service as the source IP for egress traffic.

32.10.3. Configuring symmetric routing by using VRFs with MetalLB

You can configure symmetric network routing for applications behind a MetalLB service that require the same ingress and egress network paths.

This example associates a VRF routing table with MetalLB and an egress service to enable symmetric routing for ingress and egress traffic for pods behind a LoadBalancer service.

Note
  • If you use the sourceIPBy: "LoadBalancerIP" setting in the EgressService CR, you must specify the load-balancer node in the BGPAdvertisement custom resource (CR).
  • You can use the sourceIPBy: "Network" setting on clusters that use OVN-Kubernetes configured with the gatewayConfig.routingViaHost specification set to true only. Additionally, if you use the sourceIPBy: "Network" setting, you must schedule the application workload on nodes configured with the network VRF instance.

Prerequisites

  • Install the OpenShift CLI (oc).
  • Log in as a user with cluster-admin privileges.
  • Install the Kubernetes NMState Operator.
  • Install the MetalLB Operator.

Procedure

  1. Create a NodeNetworkConfigurationPolicy CR to define the VRF instance:

    1. Create a file, such as node-network-vrf.yaml, with content like the following example:

      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
        name: vrfpolicy 1
      spec:
        nodeSelector:
          vrf: "true" 2
        maxUnavailable: 3
        desiredState:
          interfaces:
          - name: ens4vrf 3
            type: vrf 4
            state: up
            vrf:
              port:
              - ens4 5
              route-table-id: 2 6
          - name: ens4 7
            type: ethernet
            state: up
            ipv4:
              address:
              - ip: 192.168.130.130
                prefix-length: 24
              dhcp: false
              enabled: true
          routes: 8
            config:
            - destination: 0.0.0.0/0
              metric: 150
              next-hop-address: 192.168.130.1
              next-hop-interface: ens4
              table-id: 2
          route-rules: 9
            config:
            - ip-to: 172.30.0.0/16
              priority: 998
              route-table: 254 10
            - ip-to: 10.132.0.0/14
              priority: 998
              route-table: 254
      1
      The name of the policy.
      2
      This example applies the policy to all nodes with the label vrf:true.
      3
      The name of the interface.
      4
      The type of interface. This example creates a VRF instance.
      5
      The node interface that the VRF attaches to.
      6
      The name of the route table ID for the VRF.
      7
      The IPv4 address of the interface associated with the VRF.
      8
      Defines the configuration for network routes. The next-hop-address field defines the IP address of the next hop for the route. The next-hop-interface field defines the outgoing interface for the route. In this example, the VRF routing table is 2, which references the ID that you define in the EgressService CR.
      9
      Defines additional route rules. The ip-to fields must match the Cluster Network CIDR and Service Network CIDR. You can view the values for these CIDR address specifications by running the following command: oc describe network.config/cluster.
      10
      The main routing table that the Linux kernel uses when calculating routes has the ID 254.
    2. Apply the policy by running the following command:

      $ oc apply -f node-network-vrf.yaml
  2. Create a BGPPeer custom resource (CR):

    1. Create a file, such as frr-via-vrf.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta2
      kind: BGPPeer
      metadata:
        name: frrviavrf
        namespace: metallb-system
      spec:
        myASN: 100
        peerASN: 200
        peerAddress: 192.168.130.1
        vrf: ens4vrf 1
      1
      Specifies the VRF instance to associate with the BGP peer. MetalLB can advertise services and make routing decisions based on the routing information in the VRF.
    2. Apply the configuration for the BGP peer by running the following command:

      $ oc apply -f frr-via-vrf.yaml
  3. Create an IPAddressPool CR:

    1. Create a file, such as first-pool.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: IPAddressPool
      metadata:
        name: first-pool
        namespace: metallb-system
      spec:
        addresses:
        - 192.169.10.0/32
    2. Apply the configuration for the IP address pool by running the following command:

      $ oc apply -f first-pool.yaml
  4. Create a BGPAdvertisement CR:

    1. Create a file, such as first-adv.yaml, with content like the following example:

      apiVersion: metallb.io/v1beta1
      kind: BGPAdvertisement
      metadata:
        name: first-adv
        namespace: metallb-system
      spec:
        ipAddressPools:
          - first-pool
        peers:
          - frrviavrf 1
        nodeSelectors:
          - matchLabels:
              egress-service.k8s.ovn.org/test-server1: "" 2
      1
      In this example, MetalLB advertises a range of IP addresses from the first-pool IP address pool to the frrviavrf BGP peer.
      2
      In this example, the EgressService CR configures the source IP address for egress traffic to use the load-balancer service IP address. Therefore, you must specify the load-balancer node for return traffic to use the same return path for the traffic originating from the pod.
    2. Apply the configuration for the BGP advertisement by running the following command:

      $ oc apply -f first-adv.yaml
  5. Create an EgressService CR:

    1. Create a file, such as egress-service.yaml, with content like the following example:

      apiVersion: k8s.ovn.org/v1
      kind: EgressService
      metadata:
        name: server1 1
        namespace: test 2
      spec:
        sourceIPBy: "LoadBalancerIP" 3
        nodeSelector:
          matchLabels:
            vrf: "true" 4
        network: "2" 5
      1
      Specify the name for the egress service. The name of the EgressService resource must match the name of the load-balancer service that you want to modify.
      2
      Specify the namespace for the egress service. The namespace for the EgressService must match the namespace of the load-balancer service that you want to modify. The egress service is namespace-scoped.
      3
      This example assigns the LoadBalancer service ingress IP address as the source IP address for egress traffic.
      4
      If you specify LoadBalancer for the sourceIPBy specification, a single node handles the LoadBalancer service traffic. In this example, only a node with the label vrf: "true" can handle the service traffic. If you do not specify a node, OVN-Kubernetes selects a worker node to handle the service traffic. When a node is selected, OVN-Kubernetes labels the node in the following format: egress-service.k8s.ovn.org/<svc_namespace>-<svc_name>: "".
      5
      Specify the routing table for egress traffic.
    2. Apply the configuration for the egress service by running the following command:

      $ oc apply -f egress-service.yaml

Verification

  1. Verify that you can access the application endpoint of the pods running behind the MetalLB service by running the following command:

    $ curl <external_ip_address>:<port_number> 1
    1
    Update the external IP address and port number to suit your application endpoint.
  2. Optional: If you assigned the LoadBalancer service ingress IP address as the source IP address for egress traffic, verify this configuration by using tools such as tcpdump to analyze packets received at the external client.

32.11. Configuring the integration of MetalLB and FRR-K8s

Important

The FRRConfiguration custom resource is a Technology Preview feature only. Technology Preview features are not supported with Red Hat production service level agreements (SLAs) and might not be functionally complete. Red Hat does not recommend using them in production. These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of Red Hat Technology Preview features, see Technology Preview Features Support Scope.

FRRouting (FRR) is a free, open source internet routing protocol suite for Linux and UNIX platforms. FRR-K8s is a Kubernetes based DaemonSet that exposes a subset of the FRR API in a Kubernetes-compliant manner. As a cluster administrator, you can use the FRRConfiguration custom resource (CR) to configure MetalLB to use FRR-K8s as the backend. You can use this to avail of FRR services, for example, receiving routes. If you run MetalLB with FRR-K8s as a backend, MetalLB generates the FRR-K8s configuration corresponding to the MetalLB configuration applied.

MetalLB integration with FRR

32.11.1. Activating the integration of MetalLB and FRR-K8s

The following procedure shows you how to activate FRR-K8s as the backend for MetalLB.

Prerequisites

  • You have a cluster installed on bare-metal hardware.
  • You have installed the OpenShift CLI (oc).
  • You have logged in as a user with cluster-admin privileges.

Procedure

  • Set the bgpBackend field of the MetalLB CR to frr-k8s as in the following example:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      bgpBackend: frr-k8s

32.11.2. FRR configurations

You can create multiple FRRConfiguration CRs to use FRR services in MetalLB. MetalLB generates an FRRConfiguration object which FRR-K8s merges with all other configurations that all users have created.

For example, you can configure FRR-K8s to receive all of the prefixes advertised by a given neighbor. The following example configures FRR-K8s to receive all of the prefixes advertised by a BGPPeer with host 172.18.0.5:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
 name: test
 namespace: metallb-system
spec:
 bgp:
   routers:
   - asn: 64512
     neighbors:
     - address: 172.18.0.5
       asn: 64512
       toReceive:
        allowed:
            mode: all

You can also configure FRR-K8s to always block a set of prefixes, regardless of the configuration applied. This can be useful to avoid routes towards the pods or ClusterIPs CIDRs that might result in cluster malfunctions. The following example blocks the set of prefixes 192.168.1.0/24:

Example MetalLB CR

apiVersion: metallb.io/v1beta1
kind: MetalLB
metadata:
  name: metallb
  namespace: metallb-system
spec:
  bgpBackend: frr-k8s
  frrk8sConfig:
    alwaysBlock:
    - 192.168.1.0/24

You can set FRR-K8s to block the Cluster Network CIDR and Service Network CIDR. You can view the values for these CIDR address specifications by running the following command:

$ oc describe network.config/cluster

32.11.3. Configuring the FRRConfiguration CRD

The following section provides reference examples that use the FRRConfiguration custom resource (CR).

32.11.3.1. The routers field

You can use the routers field to configure multiple routers, one for each Virtual Routing and Forwarding (VRF) resource. For each router, you must define the Autonomous System Number (ASN).

You can also define a list of Border Gateway Protocol (BGP) neighbors to connect to, as in the following example:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
      neighbors:
      - address: 172.30.0.3
        asn: 4200000000
        ebgpMultiHop: true
        port: 180
      - address: 172.18.0.6
        asn: 4200000000
        port: 179

32.11.3.2. The toAdvertise field

By default, FRR-K8s does not advertise the prefixes configured as part of a router configuration. In order to advertise them, you use the toAdvertise field.

You can advertise a subset of the prefixes, as in the following example:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
      neighbors:
      - address: 172.30.0.3
        asn: 4200000000
        ebgpMultiHop: true
        port: 180
        toAdvertise:
          allowed:
            prefixes: 1
            - 192.168.2.0/24
      prefixes:
        - 192.168.2.0/24
        - 192.169.2.0/24

1
Advertises a subset of prefixes.

The following example shows you how to advertise all of the prefixes:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
      neighbors:
      - address: 172.30.0.3
        asn: 4200000000
        ebgpMultiHop: true
        port: 180
        toAdvertise:
          allowed:
            mode: all 1
      prefixes:
        - 192.168.2.0/24
        - 192.169.2.0/24

1
Advertises all prefixes.

32.11.3.3. The toReceive field

By default, FRR-K8s does not process any prefixes advertised by a neighbor. You can use the toReceive field to process such addresses.

You can configure for a subset of the prefixes, as in this example:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
      neighbors:
      - address: 172.18.0.5
          asn: 64512
          port: 179
          toReceive:
            allowed:
              prefixes:
              - prefix: 192.168.1.0/24
              - prefix: 192.169.2.0/24
                ge: 25 1
                le: 28 2

1 2
The prefix is applied if the prefix length is less than or equal to the le prefix length and greater than or equal to the ge prefix length.

The following example configures FRR to handle all the prefixes announced:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
      neighbors:
      - address: 172.18.0.5
          asn: 64512
          port: 179
          toReceive:
            allowed:
              mode: all

32.11.3.4. The bgp field

You can use the bgp field to define various BFD profiles and associate them with a neighbor. In the following example, BFD backs up the BGP session and FRR can detect link failures:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
      neighbors:
      - address: 172.30.0.3
        asn: 64512
        port: 180
        bfdProfile: defaultprofile
    bfdProfiles:
      - name: defaultprofile

32.11.3.5. The nodeSelector field

By default, FRR-K8s applies the configuration to all nodes where the daemon is running. You can use the nodeSelector field to specify the nodes to which you want to apply the configuration. For example:

Example FRRConfiguration CR

apiVersion: frrk8s.metallb.io/v1beta1
kind: FRRConfiguration
metadata:
  name: test
  namespace: frr-k8s-system
spec:
  bgp:
    routers:
    - asn: 64512
  nodeSelector:
    labelSelector:
    foo: "bar"

The fields for the FRRConfiguration custom resource are described in the following table:

Table 32.9. MetalLB FRRConfiguration custom resource
FieldTypeDescription

spec.bgp.routers

array

Specifies the routers that FRR is to configure (one per VRF).

spec.bgp.routers.asn

integer

The autonomous system number to use for the local end of the session.

spec.bgp.routers.id

string

Specifies the ID of the bgp router.

spec.bgp.routers.vrf

string

Specifies the host vrf used to establish sessions from this router.

spec.bgp.routers.neighbors

array

Specifies the neighbors to establish BGP sessions with.

spec.bgp.routers.neighbors.asn

integer

Specifies the autonomous system number to use for the local end of the session.

spec.bgp.routers.neighbors.address

string

Specifies the IP address to establish the session with.

spec.bgp.routers.neighbors.port

integer

Specifies the port to dial when establishing the session. Defaults to 179.

spec.bgp.routers.neighbors.password

string

Specifies the password to use for establishing the BGP session. Password and PasswordSecret are mutually exclusive.

spec.bgp.routers.neighbors.passwordSecret

string

Specifies the name of the authentication secret for the neighbor. The secret must be of type "kubernetes.io/basic-auth", and in the same namespace as the FRR-K8s daemon. The key "password" stores the password in the secret. Password and PasswordSecret are mutually exclusive.

spec.bgp.routers.neighbors.holdTime

duration

Specifies the requested BGP hold time, per RFC4271. Defaults to 180s.

spec.bgp.routers.neighbors.keepaliveTime

duration

Specifies the requested BGP keepalive time, per RFC4271. Defaults to 60s.

spec.bgp.routers.neighbors.connectTime

duration

Specifies how long BGP waits between connection attempts to a neighbor.

spec.bgp.routers.neighbors.ebgpMultiHop

boolean

Indicates if the BGPPeer is multi-hops away.

spec.bgp.routers.neighbors.bfdProfile

string

Specifies the name of the BFD Profile to use for the BFD session associated with the BGP session. If not set, the BFD session is not set up.

spec.bgp.routers.neighbors.toAdvertise.allowed

array

Represents the list of prefixes to advertise to a neighbor, and the associated properties.

spec.bgp.routers.neighbors.toAdvertise.allowed.prefixes

string array

Specifies the list of prefixes to advertise to a neighbor. This list must match the prefixes that you define in the router.

spec.bgp.routers.neighbors.toAdvertise.allowed.mode

string

Specifies the mode to use when handling the prefixes. You can set to filtered to allow only the prefixes in the prefixes list. You can set to all to allow all the prefixes configured on the router.

spec.bgp.routers.neighbors.toAdvertise.withLocalPref

array

Specifies the prefixes associated with an advertised local preference. You must specify the prefixes associated with a local preference in the prefixes allowed to be advertised.

spec.bgp.routers.neighbors.toAdvertise.withLocalPref.prefixes

string array

Specifies the prefixes associated with the local preference.

spec.bgp.routers.neighbors.toAdvertise.withLocalPref.localPref

integer

Specifies the local preference associated with the prefixes.

spec.bgp.routers.neighbors.toAdvertise.withCommunity

array

Specifies the prefixes associated with an advertised BGP community. You must include the prefixes associated with a local preference in the list of prefixes that you want to advertise.

spec.bgp.routers.neighbors.toAdvertise.withCommunity.prefixes

string array

Specifies the prefixes associated with the community.

spec.bgp.routers.neighbors.toAdvertise.withCommunity.community

string

Specifies the community associated with the prefixes.

spec.bgp.routers.neighbors.toReceive

array

Specifies the prefixes to receive from a neighbor.

spec.bgp.routers.neighbors.toReceive.allowed

array

Specifies the information that you want to receive from a neighbor.

spec.bgp.routers.neighbors.toReceive.allowed.prefixes

array

Specifies the prefixes allowed from a neighbor.

spec.bgp.routers.neighbors.toReceive.allowed.mode

string

Specifies the mode to use when handling the prefixes. When set to filtered, only the prefixes in the prefixes list are allowed. When set to all, all the prefixes configured on the router are allowed.

spec.bgp.routers.neighbors.disableMP

boolean

Disables MP BGP to prevent it from separating IPv4 and IPv6 route exchanges into distinct BGP sessions.

spec.bgp.routers.prefixes

string array

Specifies all prefixes to advertise from this router instance.

spec.bgp.bfdProfiles

array

Specifies the list of bfd profiles to use when configuring the neighbors.

spec.bgp.bfdProfiles.name

string

The name of the BFD Profile to be referenced in other parts of the configuration.

spec.bgp.bfdProfiles.receiveInterval

integer

Specifies the minimum interval at which this system can receive control packets, in milliseconds. Defaults to 300ms.

spec.bgp.bfdProfiles.transmitInterval

integer

Specifies the minimum transmission interval, excluding jitter, that this system wants to use to send BFD control packets, in milliseconds. Defaults to 300ms.

spec.bgp.bfdProfiles.detectMultiplier

integer

Configures the detection multiplier to determine packet loss. To determine the connection loss-detection timer, multiply the remote transmission interval by this value.

spec.bgp.bfdProfiles.echoInterval

integer

Configures the minimal echo receive transmission-interval that this system can handle, in milliseconds. Defaults to 50ms.

spec.bgp.bfdProfiles.echoMode

boolean

Enables or disables the echo transmission mode. This mode is disabled by default, and not supported on multihop setups.

spec.bgp.bfdProfiles.passiveMode

boolean

Mark session as passive. A passive session does not attempt to start the connection and waits for control packets from peers before it begins replying.

spec.bgp.bfdProfiles.MinimumTtl

integer

For multihop sessions only. Configures the minimum expected TTL for an incoming BFD control packet.

spec.nodeSelector

string

Limits the nodes that attempt to apply this configuration. If specified, only those nodes whose labels match the specified selectors attempt to apply the configuration. If it is not specified, all nodes attempt to apply this configuration.

status

string

Defines the observed state of FRRConfiguration.

32.11.4. How FRR-K8s merges multiple configurations

In a case where multiple users add configurations that select the same node, FRR-K8s merges the configurations. Each configuration can only extend others. This means that it is possible to add a new neighbor to a router, or to advertise an additional prefix to a neighbor, but not possible to remove a component added by another configuration.

32.11.4.1. Configuration conflicts

Certain configurations can cause conflicts, leading to errors, for example:

  • different ASN for the same router (in the same VRF)
  • different ASN for the same neighbor (with the same IP / port)
  • multiple BFD profiles with the same name but different values

When the daemon finds an invalid configuration for a node, it reports the configuration as invalid and reverts to the previous valid FRR configuration.

32.11.4.2. Merging

When merging, it is possible to do the following actions:

  • Extend the set of IPs that you want to advertise to a neighbor.
  • Add an extra neighbor with its set of IPs.
  • Extend the set of IPs to which you want to associate a community.
  • Allow incoming routes for a neighbor.

Each configuration must be self contained. This means, for example, that it is not possible to allow prefixes that are not defined in the router section by leveraging prefixes coming from another configuration.

If the configurations to be applied are compatible, merging works as follows:

  • FRR-K8s combines all the routers.
  • FRR-K8s merges all prefixes and neighbors for each router.
  • FRR-K8s merges all filters for each neighbor.
Note

A less restrictive filter has precedence over a stricter one. For example, a filter accepting some prefixes has precedence over a filter not accepting any, and a filter accepting all prefixes has precedence over one that accepts some.

32.12. MetalLB logging, troubleshooting, and support

If you need to troubleshoot MetalLB configuration, see the following sections for commonly used commands.

32.12.1. Setting the MetalLB logging levels

MetalLB uses FRRouting (FRR) in a container with the default setting of info generates a lot of logging. You can control the verbosity of the logs generated by setting the logLevel as illustrated in this example.

Gain a deeper insight into MetalLB by setting the logLevel to debug as follows:

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Create a file, such as setdebugloglevel.yaml, with content like the following example:

    apiVersion: metallb.io/v1beta1
    kind: MetalLB
    metadata:
      name: metallb
      namespace: metallb-system
    spec:
      logLevel: debug
      nodeSelector:
        node-role.kubernetes.io/worker: ""
  2. Apply the configuration:

    $ oc replace -f setdebugloglevel.yaml
    Note

    Use oc replace as the understanding is the metallb CR is already created and here you are changing the log level.

  3. Display the names of the speaker pods:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME                    READY   STATUS    RESTARTS   AGE
    speaker-2m9pm           4/4     Running   0          9m19s
    speaker-7m4qw           3/4     Running   0          19s
    speaker-szlmx           4/4     Running   0          9m19s

    Note

    Speaker and controller pods are recreated to ensure the updated logging level is applied. The logging level is modified for all the components of MetalLB.

  4. View the speaker logs:

    $ oc logs -n metallb-system speaker-7m4qw -c speaker

    Example output

    {"branch":"main","caller":"main.go:92","commit":"3d052535","goversion":"gc / go1.17.1 / amd64","level":"info","msg":"MetalLB speaker starting (commit 3d052535, branch main)","ts":"2022-05-17T09:55:05Z","version":""}
    {"caller":"announcer.go:110","event":"createARPResponder","interface":"ens4","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"}
    {"caller":"announcer.go:119","event":"createNDPResponder","interface":"ens4","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"}
    {"caller":"announcer.go:110","event":"createARPResponder","interface":"tun0","level":"info","msg":"created ARP responder for interface","ts":"2022-05-17T09:55:05Z"}
    {"caller":"announcer.go:119","event":"createNDPResponder","interface":"tun0","level":"info","msg":"created NDP responder for interface","ts":"2022-05-17T09:55:05Z"}
    I0517 09:55:06.515686      95 request.go:665] Waited for 1.026500832s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
    {"Starting Manager":"(MISSING)","caller":"k8s.go:389","level":"info","ts":"2022-05-17T09:55:08Z"}
    {"caller":"speakerlist.go:310","level":"info","msg":"node event - forcing sync","node addr":"10.0.128.4","node event":"NodeJoin","node name":"ci-ln-qb8t3mb-72292-7s7rh-worker-a-vvznj","ts":"2022-05-17T09:55:08Z"}
    {"caller":"service_controller.go:113","controller":"ServiceReconciler","enqueueing":"openshift-kube-controller-manager-operator/metrics","epslice":"{\"metadata\":{\"name\":\"metrics-xtsxr\",\"generateName\":\"metrics-\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"uid\":\"ac6766d7-8504-492c-9d1e-4ae8897990ad\",\"resourceVersion\":\"9041\",\"generation\":4,\"creationTimestamp\":\"2022-05-17T07:16:53Z\",\"labels\":{\"app\":\"kube-controller-manager-operator\",\"endpointslice.kubernetes.io/managed-by\":\"endpointslice-controller.k8s.io\",\"kubernetes.io/service-name\":\"metrics\"},\"annotations\":{\"endpoints.kubernetes.io/last-change-trigger-time\":\"2022-05-17T07:21:34Z\"},\"ownerReferences\":[{\"apiVersion\":\"v1\",\"kind\":\"Service\",\"name\":\"metrics\",\"uid\":\"0518eed3-6152-42be-b566-0bd00a60faf8\",\"controller\":true,\"blockOwnerDeletion\":true}],\"managedFields\":[{\"manager\":\"kube-controller-manager\",\"operation\":\"Update\",\"apiVersion\":\"discovery.k8s.io/v1\",\"time\":\"2022-05-17T07:20:02Z\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:addressType\":{},\"f:endpoints\":{},\"f:metadata\":{\"f:annotations\":{\".\":{},\"f:endpoints.kubernetes.io/last-change-trigger-time\":{}},\"f:generateName\":{},\"f:labels\":{\".\":{},\"f:app\":{},\"f:endpointslice.kubernetes.io/managed-by\":{},\"f:kubernetes.io/service-name\":{}},\"f:ownerReferences\":{\".\":{},\"k:{\\\"uid\\\":\\\"0518eed3-6152-42be-b566-0bd00a60faf8\\\"}\":{}}},\"f:ports\":{}}}]},\"addressType\":\"IPv4\",\"endpoints\":[{\"addresses\":[\"10.129.0.7\"],\"conditions\":{\"ready\":true,\"serving\":true,\"terminating\":false},\"targetRef\":{\"kind\":\"Pod\",\"namespace\":\"openshift-kube-controller-manager-operator\",\"name\":\"kube-controller-manager-operator-6b98b89ddd-8d4nf\",\"uid\":\"dd5139b8-e41c-4946-a31b-1a629314e844\",\"resourceVersion\":\"9038\"},\"nodeName\":\"ci-ln-qb8t3mb-72292-7s7rh-master-0\",\"zone\":\"us-central1-a\"}],\"ports\":[{\"name\":\"https\",\"protocol\":\"TCP\",\"port\":8443}]}","level":"debug","ts":"2022-05-17T09:55:08Z"}

  5. View the FRR logs:

    $ oc logs -n metallb-system speaker-7m4qw -c frr

    Example output

    Started watchfrr
    2022/05/17 09:55:05 ZEBRA: client 16 says hello and bids fair to announce only bgp routes vrf=0
    2022/05/17 09:55:05 ZEBRA: client 31 says hello and bids fair to announce only vnc routes vrf=0
    2022/05/17 09:55:05 ZEBRA: client 38 says hello and bids fair to announce only static routes vrf=0
    2022/05/17 09:55:05 ZEBRA: client 43 says hello and bids fair to announce only bfd routes vrf=0
    2022/05/17 09:57:25.089 BGP: Creating Default VRF, AS 64500
    2022/05/17 09:57:25.090 BGP: dup addr detect enable max_moves 5 time 180 freeze disable freeze_time 0
    2022/05/17 09:57:25.090 BGP: bgp_get: Registering BGP instance (null) to zebra
    2022/05/17 09:57:25.090 BGP: Registering VRF 0
    2022/05/17 09:57:25.091 BGP: Rx Router Id update VRF 0 Id 10.131.0.1/32
    2022/05/17 09:57:25.091 BGP: RID change : vrf VRF default(0), RTR ID 10.131.0.1
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF br0
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ens4
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr 10.0.128.4/32
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF ens4 addr fe80::c9d:84da:4d86:5618/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF lo
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF ovs-system
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF tun0
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr 10.131.0.1/23
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF tun0 addr fe80::40f1:d1ff:feb6:5322/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2da49fed
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2da49fed addr fe80::24bd:d1ff:fec1:d88/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth2fa08c8c
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth2fa08c8c addr fe80::6870:ff:fe96:efc8/64
    2022/05/17 09:57:25.091 BGP: Rx Intf add VRF 0 IF veth41e356b7
    2022/05/17 09:57:25.091 BGP: Rx Intf address add VRF 0 IF veth41e356b7 addr fe80::48ff:37ff:fede:eb4b/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth1295c6e2
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth1295c6e2 addr fe80::b827:a2ff:feed:637/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth9733c6dc
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth9733c6dc addr fe80::3cf4:15ff:fe11:e541/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF veth336680ea
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF veth336680ea addr fe80::94b1:8bff:fe7e:488c/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vetha0a907b7
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vetha0a907b7 addr fe80::3855:a6ff:fe73:46c3/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf35a4398
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf35a4398 addr fe80::40ef:2fff:fe57:4c4d/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vethf831b7f4
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vethf831b7f4 addr fe80::f0d9:89ff:fe7c:1d32/64
    2022/05/17 09:57:25.092 BGP: Rx Intf add VRF 0 IF vxlan_sys_4789
    2022/05/17 09:57:25.092 BGP: Rx Intf address add VRF 0 IF vxlan_sys_4789 addr fe80::80c1:82ff:fe4b:f078/64
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Timer (start timer expire).
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] BGP_Start (Idle->Connect), fd -1
    2022/05/17 09:57:26.094 BGP: Allocated bnc 10.0.0.1/32(0)(VRF default) peer 0x7f807f7631a0
    2022/05/17 09:57:26.094 BGP: sendmsg_zebra_rnh: sending cmd ZEBRA_NEXTHOP_REGISTER for 10.0.0.1/32 (vrf VRF default)
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] Waiting for NHT
    2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Connect established_peers 0
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Idle to Connect
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 [FSM] TCP_connection_open_failed (Connect->Active), fd -1
    2022/05/17 09:57:26.094 BGP: bgp_fsm_change_status : vrf default(0), Status: Active established_peers 0
    2022/05/17 09:57:26.094 BGP: 10.0.0.1 went from Connect to Active
    2022/05/17 09:57:26.094 ZEBRA: rnh_register msg from client bgp: hdr->length=8, type=nexthop vrf=0
    2022/05/17 09:57:26.094 ZEBRA: 0: Add RNH 10.0.0.1/32 type Nexthop
    2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: Evaluate RNH, type Nexthop (force)
    2022/05/17 09:57:26.094 ZEBRA: 0:10.0.0.1/32: NH has become unresolved
    2022/05/17 09:57:26.094 ZEBRA: 0: Client bgp registers for RNH 10.0.0.1/32 type Nexthop
    2022/05/17 09:57:26.094 BGP: VRF default(0): Rcvd NH update 10.0.0.1/32(0) - metric 0/0 #nhops 0/0 flags 0x6
    2022/05/17 09:57:26.094 BGP: NH update for 10.0.0.1/32(0)(VRF default) - flags 0x6 chgflags 0x0 - evaluate paths
    2022/05/17 09:57:26.094 BGP: evaluate_paths: Updating peer (10.0.0.1(VRF default)) status with NHT
    2022/05/17 09:57:30.081 ZEBRA: Event driven route-map update triggered
    2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-out
    2022/05/17 09:57:30.081 ZEBRA: Event handler for route-map: 10.0.0.1-in
    2022/05/17 09:57:31.104 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
    2022/05/17 09:57:31.104 ZEBRA: 	Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring
    2022/05/17 09:57:31.105 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_NEWNEIGH(28), len=76, seq=0, pid=0
    2022/05/17 09:57:31.105 ZEBRA: 	Neighbor Entry received is not on a VLAN or a BRIDGE, ignoring

32.12.1.1. FRRouting (FRR) log levels

The following table describes the FRR logging levels.

Table 32.10. Log levels
Log levelDescription

all

Supplies all logging information for all logging levels.

debug

Information that is diagnostically helpful to people. Set to debug to give detailed troubleshooting information.

info

Provides information that always should be logged but under normal circumstances does not require user intervention. This is the default logging level.

warn

Anything that can potentially cause inconsistent MetalLB behaviour. Usually MetalLB automatically recovers from this type of error.

error

Any error that is fatal to the functioning of MetalLB. These errors usually require administrator intervention to fix.

none

Turn off all logging.

32.12.2. Troubleshooting BGP issues

The BGP implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. As a cluster administrator, if you need to troubleshoot BGP configuration issues, you need to run commands in the FRR container.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Display the names of the speaker pods:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME            READY   STATUS    RESTARTS   AGE
    speaker-66bth   4/4     Running   0          56m
    speaker-gvfnf   4/4     Running   0          56m
    ...

  2. Display the running configuration for FRR:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show running-config"

    Example output

    Building configuration...
    
    Current configuration:
    !
    frr version 7.5.1_git
    frr defaults traditional
    hostname some-hostname
    log file /etc/frr/frr.log informational
    log timestamp precision 3
    service integrated-vtysh-config
    !
    router bgp 64500  1
     bgp router-id 10.0.1.2
     no bgp ebgp-requires-policy
     no bgp default ipv4-unicast
     no bgp network import-check
     neighbor 10.0.2.3 remote-as 64500  2
     neighbor 10.0.2.3 bfd profile doc-example-bfd-profile-full  3
     neighbor 10.0.2.3 timers 5 15
     neighbor 10.0.2.4 remote-as 64500
     neighbor 10.0.2.4 bfd profile doc-example-bfd-profile-full
     neighbor 10.0.2.4 timers 5 15
     !
     address-family ipv4 unicast
      network 203.0.113.200/30   4
      neighbor 10.0.2.3 activate
      neighbor 10.0.2.3 route-map 10.0.2.3-in in
      neighbor 10.0.2.4 activate
      neighbor 10.0.2.4 route-map 10.0.2.4-in in
     exit-address-family
     !
     address-family ipv6 unicast
      network fc00:f853:ccd:e799::/124
      neighbor 10.0.2.3 activate
      neighbor 10.0.2.3 route-map 10.0.2.3-in in
      neighbor 10.0.2.4 activate
      neighbor 10.0.2.4 route-map 10.0.2.4-in in
     exit-address-family
    !
    route-map 10.0.2.3-in deny 20
    !
    route-map 10.0.2.4-in deny 20
    !
    ip nht resolve-via-default
    !
    ipv6 nht resolve-via-default
    !
    line vty
    !
    bfd
     profile doc-example-bfd-profile-full
      transmit-interval 35
      receive-interval 35
      passive-mode
      echo-mode
      echo-interval 35
      minimum-ttl 10
     !
    !
    end

    1
    The router bgp section indicates the ASN for MetalLB.
    2
    Confirm that a neighbor <ip-address> remote-as <peer-ASN> line exists for each BGP peer custom resource that you added.
    3
    If you configured BFD, confirm that the BFD profile is associated with the correct BGP peer and that the BFD profile appears in the command output.
    4
    Confirm that the network <ip-address-range> lines match the IP address ranges that you specified in address pool custom resources that you added.
  3. Display the BGP summary:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp summary"

    Example output

    IPv4 Unicast Summary:
    BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
    BGP table version 1
    RIB entries 1, using 192 bytes of memory
    Peers 2, using 29 KiB of memory
    
    Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
    10.0.2.3        4      64500       387       389        0    0    0 00:32:02            0        1  1
    10.0.2.4        4      64500         0         0        0    0    0    never       Active        0  2
    
    Total number of neighbors 2
    
    IPv6 Unicast Summary:
    BGP router identifier 10.0.1.2, local AS number 64500 vrf-id 0
    BGP table version 1
    RIB entries 1, using 192 bytes of memory
    Peers 2, using 29 KiB of memory
    
    Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
    10.0.2.3        4      64500       387       389        0    0    0 00:32:02 NoNeg
    10.0.2.4        4      64500         0         0        0    0    0    never       Active        0
    
    Total number of neighbors 2

    1
    Confirm that the output includes a line for each BGP peer custom resource that you added.
    2
    Output that shows 0 messages received and messages sent indicates a BGP peer that does not have a BGP session. Check network connectivity and the BGP configuration of the BGP peer.
  4. Display the BGP peers that received an address pool:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bgp ipv4 unicast 203.0.113.200/30"

    Replace ipv4 with ipv6 to display the BGP peers that received an IPv6 address pool. Replace 203.0.113.200/30 with an IPv4 or IPv6 IP address range from an address pool.

    Example output

    BGP routing table entry for 203.0.113.200/30
    Paths: (1 available, best #1, table default)
      Advertised to non peer-group peers:
      10.0.2.3  1
      Local
        0.0.0.0 from 0.0.0.0 (10.0.1.2)
          Origin IGP, metric 0, weight 32768, valid, sourced, local, best (First path received)
          Last update: Mon Jan 10 19:49:07 2022

    1
    Confirm that the output includes an IP address for a BGP peer.

32.12.3. Troubleshooting BFD issues

The Bidirectional Forwarding Detection (BFD) implementation that Red Hat supports uses FRRouting (FRR) in a container in the speaker pods. The BFD implementation relies on BFD peers also being configured as BGP peers with an established BGP session. As a cluster administrator, if you need to troubleshoot BFD configuration issues, you need to run commands in the FRR container.

Prerequisites

  • You have access to the cluster as a user with the cluster-admin role.
  • You have installed the OpenShift CLI (oc).

Procedure

  1. Display the names of the speaker pods:

    $ oc get -n metallb-system pods -l component=speaker

    Example output

    NAME            READY   STATUS    RESTARTS   AGE
    speaker-66bth   4/4     Running   0          26m
    speaker-gvfnf   4/4     Running   0          26m
    ...

  2. Display the BFD peers:

    $ oc exec -n metallb-system speaker-66bth -c frr -- vtysh -c "show bfd peers brief"

    Example output

    Session count: 2
    SessionId  LocalAddress              PeerAddress              Status
    =========  ============              ===========              ======
    3909139637 10.0.1.2                  10.0.2.3                 up  <.>

    <.> Confirm that the PeerAddress column includes each BFD peer. If the output does not list a BFD peer IP address that you expected the output to include, troubleshoot BGP connectivity with the peer. If the status field indicates down, check for connectivity on the links and equipment between the node and the peer. You can determine the node name for the speaker pod with a command like oc get pods -n metallb-system speaker-66bth -o jsonpath='{.spec.nodeName}'.

32.12.4. MetalLB metrics for BGP and BFD

OpenShift Container Platform captures the following Prometheus metrics for MetalLB that relate to BGP peers and BFD profiles.

Table 32.11. MetalLB BFD metrics
NameDescription

metallb_bfd_control_packet_input

Counts the number of BFD control packets received from each BFD peer.

metallb_bfd_control_packet_output

Counts the number of BFD control packets sent to each BFD peer.

metallb_bfd_echo_packet_input

Counts the number of BFD echo packets received from each BFD peer.

metallb_bfd_echo_packet_output

Counts the number of BFD echo packets sent to each BFD.

metallb_bfd_session_down_events

Counts the number of times the BFD session with a peer entered the down state.

metallb_bfd_session_up

Indicates the connection state with a BFD peer. 1 indicates the session is up and 0 indicates the session is down.

metallb_bfd_session_up_events

Counts the number of times the BFD session with a peer entered the up state.

metallb_bfd_zebra_notifications

Counts the number of BFD Zebra notifications for each BFD peer.

Table 32.12. MetalLB BGP metrics
NameDescription

metallb_bgp_announced_prefixes_total

Counts the number of load balancer IP address prefixes that are advertised to BGP peers. The terms prefix and aggregated route have the same meaning.

metallb_bgp_session_up

Indicates the connection state with a BGP peer. 1 indicates the session is up and 0 indicates the session is down.

metallb_bgp_updates_total

Counts the number of BGP update messages sent to each BGP peer.

metallb_bgp_opens_sent

Counts the number of BGP open messages sent to each BGP peer.

metallb_bgp_opens_received

Counts the number of BGP open messages received from each BGP peer.

metallb_bgp_notifications_sent

Counts the number of BGP notification messages sent to each BGP peer.

metallb_bgp_updates_total_received

Counts the number of BGP update messages received from each BGP peer.

metallb_bgp_keepalives_sent

Counts the number of BGP keepalive messages sent to each BGP peer.

metallb_bgp_keepalives_received

Counts the number of BGP keepalive messages received from each BGP peer.

metallb_bgp_route_refresh_sent

Counts the number of BGP route refresh messages sent to each BGP peer.

metallb_bgp_total_sent

Counts the number of total BGP messages sent to each BGP peer.

metallb_bgp_total_received

Counts the number of total BGP messages received from each BGP peer.

Additional resources

32.12.5. About collecting MetalLB data

You can use the oc adm must-gather CLI command to collect information about your cluster, your MetalLB configuration, and the MetalLB Operator. The following features and objects are associated with MetalLB and the MetalLB Operator:

  • The namespace and child objects that the MetalLB Operator is deployed in
  • All MetalLB Operator custom resource definitions (CRDs)

The oc adm must-gather CLI command collects the following information from FRRouting (FRR) that Red Hat uses to implement BGP and BFD:

  • /etc/frr/frr.conf
  • /etc/frr/frr.log
  • /etc/frr/daemons configuration file
  • /etc/frr/vtysh.conf

The log and configuration files in the preceding list are collected from the frr container in each speaker pod.

In addition to the log and configuration files, the oc adm must-gather CLI command collects the output from the following vtysh commands:

  • show running-config
  • show bgp ipv4
  • show bgp ipv6
  • show bgp neighbor
  • show bfd peer

No additional configuration is required when you run the oc adm must-gather CLI command.

Additional resources

Red Hat logoGithubRedditYoutubeTwitter

Learn

Try, buy, & sell

Communities

About Red Hat Documentation

We help Red Hat users innovate and achieve their goals with our products and services with content they can trust.

Making open source more inclusive

Red Hat is committed to replacing problematic language in our code, documentation, and web properties. For more details, see the Red Hat Blog.

About Red Hat

We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.

© 2024 Red Hat, Inc.