Chapter 29. Routing from Edge Load Balancers
29.1. Overview
Pods inside of an OpenShift Container Platform cluster are only reachable via their IP addresses on the cluster network. An edge load balancer can be used to accept traffic from outside networks and proxy the traffic to pods inside the OpenShift Container Platform cluster. In cases where the load balancer is not part of the cluster network, routing becomes a hurdle as the internal cluster network is not accessible to the edge load balancer.
To solve this problem where the OpenShift Container Platform cluster is using OpenShift Container Platform SDN as the cluster networking solution, there are two ways to achieve network access to the pods.
29.2. Including the Load Balancer in the SDN
If possible, run an OpenShift Container Platform node instance on the load balancer itself that uses OpenShift SDN as the networking plug-in. This way, the edge machine gets its own Open vSwitch bridge that the SDN automatically configures to provide access to the pods and nodes that reside in the cluster. The routing table is dynamically configured by the SDN as pods are created and deleted, and thus the routing software is able to reach the pods.
Mark the load balancer machine as an unschedulable node so that no pods end up on the load balancer itself:
$ oc adm manage-node <load_balancer_hostname> --schedulable=false
If the load balancer comes packaged as a container, then it is even easier to integrate with OpenShift Container Platform: Simply run the load balancer as a pod with the host port exposed. The pre-packaged HAProxy router in OpenShift Container Platform runs in precisely this fashion.
29.3. Establishing a Tunnel Using a Ramp Node
In some cases, the previous solution is not possible. For example, an F5 BIG-IP® host cannot run an OpenShift Container Platform node instance or the OpenShift Container Platform SDN because F5® uses a custom, incompatible Linux kernel and distribution.
Instead, to enable F5 BIG-IP® to reach pods, you can choose an existing node within the cluster network as a ramp node and establish a tunnel between the F5 BIG-IP® host and the designated ramp node. Because it is otherwise an ordinary OpenShift Container Platform node, the ramp node has the necessary configuration to route traffic to any pod on any node in the cluster network. The ramp node thus assumes the role of a gateway through which the F5 BIG-IP® host has access to the entire cluster network.
Following is an example of establishing an ipip tunnel between an F5 BIG-IP® host and a designated ramp node.
On the F5 BIG-IP® host:
Set the following variables:
# F5_IP=10.3.89.66 1 # RAMP_IP=10.3.89.89 2 # TUNNEL_IP1=10.3.91.216 3 # CLUSTER_NETWORK=10.128.0.0/14 4
- 1 2
- The
F5_IP
andRAMP_IP
variables refer to the F5 BIG-IP® host’s and the ramp node’s IP addresses, respectively, on a shared, internal network. - 3
- An arbitrary, non-conflicting IP address for the F5® host’s end of the ipip tunnel.
- 4
- The overlay network CIDR range that the OpenShift SDN uses to assign addresses to pods.
Delete any old route, self, tunnel and SNAT pool:
# tmsh delete net route $CLUSTER_NETWORK || true # tmsh delete net self SDN || true # tmsh delete net tunnels tunnel SDN || true # tmsh delete ltm snatpool SDN_snatpool || true
Create the new tunnel, self, route and SNAT pool and use the SNAT pool in the virtual servers:
# tmsh create net tunnels tunnel SDN \ \{ description "OpenShift SDN" local-address \ $F5_IP profile ipip remote-address $RAMP_IP \} # tmsh create net self SDN \{ address \ ${TUNNEL_IP1}/24 allow-service all vlan SDN \} # tmsh create net route $CLUSTER_NETWORK interface SDN # tmsh create ltm snatpool SDN_snatpool members add { $TUNNEL_IP1 } # tmsh modify ltm virtual ose-vserver source-address-translation { type snat pool SDN_snatpool } # tmsh modify ltm virtual https-ose-vserver source-address-translation { type snat pool SDN_snatpool }
On the ramp node:
The following creates a configuration that is not persistent, meaning that when the ramp node or the openvswitch service is restarted, the settings disappear.
Set the following variables:
# F5_IP=10.3.89.66 # TUNNEL_IP1=10.3.91.216 # TUNNEL_IP2=10.3.91.217 1 # CLUSTER_NETWORK=10.128.0.0/14 2
Delete any old tunnel:
# ip tunnel del tun1 || true
Create the ipip tunnel on the ramp node, using a suitable L2-connected interface (e.g., eth0):
# ip tunnel add tun1 mode ipip \ remote $F5_IP dev eth0 # ip addr add $TUNNEL_IP2 dev tun1 # ip link set tun1 up # ip route add $TUNNEL_IP1 dev tun1 # ping -c 5 $TUNNEL_IP1
SNAT the tunnel IP with an unused IP from the SDN subnet:
# source /run/openshift-sdn/config.env # tap1=$(ip -o -4 addr list tun0 | awk '{print $4}' | cut -d/ -f1 | head -n 1) # subaddr=$(echo ${OPENSHIFT_SDN_TAP1_ADDR:-"$tap1"} | cut -d "." -f 1,2,3) # export RAMP_SDN_IP=${subaddr}.254
Assign this
RAMP_SDN_IP
as an additional address to tun0 (the local SDN’s gateway):# ip addr add ${RAMP_SDN_IP} dev tun0
Modify the OVS rules for SNAT:
# ipflowopts="cookie=0x999,ip" # arpflowopts="cookie=0x999, table=0, arp" # # ovs-ofctl -O OpenFlow13 add-flow br0 \ "${ipflowopts},nw_src=${TUNNEL_IP1},actions=mod_nw_src:${RAMP_SDN_IP},resubmit(,0)" # ovs-ofctl -O OpenFlow13 add-flow br0 \ "${ipflowopts},nw_dst=${RAMP_SDN_IP},actions=mod_nw_dst:${TUNNEL_IP1},resubmit(,0)" # ovs-ofctl -O OpenFlow13 add-flow br0 \ "${arpflowopts}, arp_tpa=${RAMP_SDN_IP}, actions=output:2" # ovs-ofctl -O OpenFlow13 add-flow br0 \ "${arpflowopts}, priority=200, in_port=2, arp_spa=${RAMP_SDN_IP}, arp_tpa=${CLUSTER_NETWORK}, actions=goto_table:30" # ovs-ofctl -O OpenFlow13 add-flow br0 \ "arp, table=5, priority=300, arp_tpa=${RAMP_SDN_IP}, actions=output:2" # ovs-ofctl -O OpenFlow13 add-flow br0 \ "ip,table=5,priority=300,nw_dst=${RAMP_SDN_IP},actions=output:2" # ovs-ofctl -O OpenFlow13 add-flow br0 "${ipflowopts},nw_dst=${TUNNEL_IP1},actions=output:2"
Optionally, if you do not plan on configuring the ramp node to be highly available, mark the ramp node as unschedulable. Skip this step if you do plan to follow the next section and plan on creating a highly available ramp node.
$ oc adm manage-node <ramp_node_hostname> --schedulable=false
The F5 router plug-in integrates with F5 BIG-IP®.
29.3.1. Configuring a Highly-Available Ramp Node
You can use OpenShift Container Platform’s ipfailover feature, which uses keepalived internally, to make the ramp node highly available from F5 BIG-IP®'s point of view. To do so, first bring up two nodes, for example called ramp-node-1 and ramp-node-2, on the same L2 subnet.
Then, choose some unassigned IP address from within the same subnet to use for your virtual IP, or VIP. This will be set as the RAMP_IP
variable with which you will configure your tunnel on F5 BIG-IP®.
For example, suppose you are using the 10.20.30.0/24 subnet for your ramp nodes, and you have assigned 10.20.30.2 to ramp-node-1 and 10.20.30.3 to ramp-node-2. For your VIP, choose some unassigned address from the same 10.20.30.0/24 subnet, for example 10.20.30.4. Then, to configure ipfailover, mark both nodes with a label, such as f5rampnode:
$ oc label node ramp-node-1 f5rampnode=true $ oc label node ramp-node-2 f5rampnode=true
Similar to instructions from the ipfailover documentation, you must now create a service account and add it to the privileged SCC. First, create the f5ipfailover service account:
$ oc create serviceaccount f5ipfailover -n default
Next, you can add the f5ipfailover service to the privileged SCC. To add the f5ipfailover in the default namespace to the privileged SCC, run:
$ oc adm policy add-scc-to-user privileged system:serviceaccount:default:f5ipfailover
Finally, configure ipfailover using your chosen VIP (the RAMP_IP
variable) and the f5ipfailover service account, assigning the VIP to your two nodes using the f5rampnode label you set earlier:
# RAMP_IP=10.20.30.4
# IFNAME=eth0 1
# oc adm ipfailover <name-tag> \
--virtual-ips=$RAMP_IP \
--interface=$IFNAME \
--watch-port=0 \
--replicas=2 \
--service-account=f5ipfailover \
--selector='f5rampnode=true'
- 1
- The interface where
RAMP_IP
should be configured.
With the above setup, the VIP (the RAMP_IP
variable) is automatically re-assigned when the ramp node host that currently has it assigned fails.