Chapter 12. Configuring IP failover
To provide high availability for Virtual IP addresses and ensure services remain accessible when nodes fail in OpenShift Container Platform, you can configure IP failover using Keepalived.
IP failover uses Keepalived to host a set of externally accessible Virtual IP (VIP) addresses on a set of hosts. Each VIP address is only serviced by a single host at a time. Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to determine which host, from the set of hosts, services which VIP. If a host becomes unavailable, or if the service that Keepalived is watching does not respond, the VIP is switched to another host from the set. This means a VIP is always serviced as long as a host is available.
Every VIP in the set is serviced by a node selected from the set. As long as at least one node is available, the VIPs are served. There is no way to explicitly distribute the VIPs over the nodes, so some nodes can have no VIPs while others have many. If there is only one node, all VIPs are on it.
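To make the one-host-per-VIP rule concrete, the following shell function is a toy model of the outcome of VRRP negotiation for a single VIP: among the nodes whose check passes, the node with the highest priority serves the VIP. It is a simplification, not Keepalived's implementation. Real VRRP exchanges advertisements between hosts, breaks priority ties by comparing IP addresses, and honors the preemption strategy described later. The function name and the name:priority:healthy argument format are hypothetical.

```shell
# Toy model (not Keepalived code): pick the node that would serve one VIP.
# Each argument is a hypothetical "name:priority:healthy" triple, where
# healthy is 1 if the node's check passes and 0 otherwise.
elect_master() {
  best_name=""
  best_prio=-1
  for entry in "$@"; do
    name=${entry%%:*}      # text before the first colon
    rest=${entry#*:}       # text after the first colon
    prio=${rest%%:*}
    healthy=${rest#*:}
    # A node whose check fails never serves the VIP.
    [ "$healthy" = 1 ] || continue
    # Strictly greater: on a tie the first node listed wins (a simplification).
    if [ "$prio" -gt "$best_prio" ]; then
      best_name=$name
      best_prio=$prio
    fi
  done
  # No healthy node means no node serves the VIP.
  [ -n "$best_name" ] || return 1
  printf '%s\n' "$best_name"
}
```

For example, elect_master node-a:100:1 node-b:150:0 prints node-a, because node-b fails its check even though it has the higher priority.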
The administrator must ensure that all of the VIP addresses meet the following requirements:
- Accessible on the configured hosts from outside the cluster.
- Not used for any other purpose within the cluster.
Keepalived on each node determines whether the needed service is running. If it is, VIPs are supported and Keepalived participates in the negotiation to determine which node serves the VIP. For a node to participate, the service must be listening on the watch port on a VIP or the check must be disabled.
Each VIP in the set might be served by a different node.
IP failover monitors a port on each VIP to determine whether the port is reachable on the node. If the port is not reachable, the VIP is not assigned to the node. If the port is set to 0, this check is suppressed. The check script does the needed testing.
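A minimal sketch of the port check described above follows. It assumes bash with /dev/tcp support and the coreutils timeout command are available, and it only approximates the behavior; it is not the script that the IP failover image runs. The function name and the optional timeout argument are hypothetical.

```shell
# Sketch of a VIP port check (not the IP failover image's own script).
# Usage: check_vip_port VIP PORT [TIMEOUT_SECONDS]
# Returns 0 if the check passes and 1 if it fails.
check_vip_port() {
  vip=$1
  port=$2
  wait=${3:-2}
  # A monitor port of 0 suppresses the check, so it always passes.
  if [ "$port" = 0 ]; then
    return 0
  fi
  # Open a TCP connection through bash's /dev/tcp redirection; timeout
  # stops the attempt if the address does not answer.
  if timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$vip" "$port" 2>/dev/null; then
    return 0
  fi
  return 1
}
```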
When a node running Keepalived passes the check script, the VIP on that node can enter the master state, depending on its priority, the priority of the current master, and the preemption strategy.
A cluster administrator can provide a script through the OPENSHIFT_HA_NOTIFY_SCRIPT variable; this script is called whenever the state of the VIP on the node changes. Keepalived uses the master state when it is servicing the VIP, the backup state when another node is servicing the VIP, and the fault state when the check script fails. The notify script is called with the new state whenever the state changes.
You can create an IP failover deployment configuration on OpenShift Container Platform. The IP failover deployment configuration specifies the set of VIP addresses, and the set of nodes on which to service them. A cluster can have multiple IP failover deployment configurations, with each managing its own set of unique VIP addresses. Each node in the IP failover configuration runs an IP failover pod, and this pod runs Keepalived.
When using VIPs to access a pod with host networking, the application pod must run on all nodes that are running the IP failover pods. This enables any of the IP failover nodes to become the master and service the VIPs when needed. If application pods are not running on all nodes with IP failover, either some IP failover nodes never service the VIPs or some application pods never receive any traffic. To avoid this mismatch, use the same selector and replication count for both IP failover and the application pods.
When using VIPs to access a service, any of the nodes can be in the IP failover set of nodes, because the service is reachable on all nodes, no matter where the application pod is running. Any of the IP failover nodes can become master at any time. The service can use either external IPs and a service port, or a NodePort. Setting up a NodePort is a privileged operation.
When using external IPs in the service definition, the VIPs are set to the external IPs, and the IP failover monitoring port is set to the service port. When using a node port, the port is open on every node in the cluster, and the service load-balances traffic from whatever node currently services the VIP. In this case, the IP failover monitoring port is set to the NodePort in the service definition.
Even though a service VIP is highly available, performance can still be affected. Keepalived makes sure that each of the VIPs is serviced by some node in the configuration, and several VIPs can end up on the same node even when other nodes have none. Strategies that externally load-balance across a set of VIPs can be thwarted when IP failover puts multiple VIPs on the same node.
When you use ExternalIP, you can set up IP failover to have the same VIP range as the ExternalIP range. You can also disable the monitoring port. In this case, all of the VIPs appear on the same node in the cluster. Any user can set up a service with an ExternalIP and make it highly available.
There is a maximum of 254 VIPs in the cluster.
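Because the limit counts individual addresses, a range such as 1.2.3.4-6 counts as three VIPs. The sketch below expands a value written in the range syntax of the OPENSHIFT_HA_VIRTUAL_IPS variable (for example, 1.2.3.4-6,1.2.3.9) and checks the total against the 254-address limit. It assumes that a range varies only the last octet, as in the documented example; the function names are hypothetical.

```shell
# Expand a comma-separated VIP list such as "1.2.3.4-6,1.2.3.9" into one
# address per line. Assumes ranges vary only the last octet (A.B.C.X-Y).
expand_vips() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r item; do
    [ -n "$item" ] || continue
    case $item in
      *-*)
        base=${item%-*}      # "1.2.3.4"
        last=${item##*-}     # "6"
        prefix=${base%.*}    # "1.2.3"
        i=${base##*.}        # "4"
        while [ "$i" -le "$last" ]; do
          printf '%s.%s\n' "$prefix" "$i"
          i=$((i + 1))
        done
        ;;
      *) printf '%s\n' "$item" ;;
    esac
  done
}

# Succeed only if the expanded list stays within the cluster-wide limit.
within_vip_limit() {
  count=$(expand_vips "$1" | wc -l | tr -d ' ')
  [ "$count" -le 254 ]
}
```

For example, expand_vips "1.2.3.4-6,1.2.3.9" prints 1.2.3.4, 1.2.3.5, 1.2.3.6, and 1.2.3.9 on separate lines.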
12.1. IP failover environment variables
The IP failover environment variables reference lists all variables you can use to configure IP failover in OpenShift Container Platform, including VIP addresses, monitoring ports, and network interfaces.
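| Variable Name | Default | Description |
|---|---|---|
| OPENSHIFT_HA_MONITOR_PORT | 80 | The IP failover pod tries to open a TCP connection to this port on each Virtual IP (VIP). If the connection is established, the service is considered to be running. If this port is set to 0, the test always passes. |
| OPENSHIFT_HA_NETWORK_INTERFACE | eth0 | The interface name that IP failover uses to send Virtual Router Redundancy Protocol (VRRP) traffic. The default value is eth0. If your cluster uses the OVN-Kubernetes network plugin, set this value to … |
| OPENSHIFT_HA_REPLICA_COUNT | 2 | The number of replicas to create. This must match the spec.replicas value in the IP failover deployment configuration. |
| OPENSHIFT_HA_VIRTUAL_IPS | | The list of IP address ranges to replicate. This must be provided. For example, 1.2.3.4-6,1.2.3.9. |
| OPENSHIFT_HA_VRRP_ID_OFFSET | 10 | The offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is 10, and the allowed range is 0 through 255. |
| OPENSHIFT_HA_VIP_GROUPS | | The number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the OPENSHIFT_HA_VIRTUAL_IPS variable. |
| OPENSHIFT_HA_IPTABLES_CHAIN | INPUT | The name of the iptables chain to which an iptables rule that allows VRRP traffic is automatically added. If the value is not set, an iptables rule is not added. If the chain does not exist, it is not created, and Keepalived operates in unicast mode. |
| OPENSHIFT_HA_CHECK_SCRIPT | | The full path name in the pod file system of a script that is periodically run to verify the application is operating. |
| OPENSHIFT_HA_CHECK_INTERVAL | 2 | The period, in seconds, that the check script is run. |
| OPENSHIFT_HA_NOTIFY_SCRIPT | | The full path name in the pod file system of a script that is run whenever the state changes. |
| OPENSHIFT_HA_PREEMPTION | preempt_delay 300 | The strategy for handling a new higher priority host. The default value, preempt_delay 300, causes a Keepalived instance to take over a VIP after 5 minutes if a lower-priority master is holding the VIP. |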
12.2. Configuring IP failover in your cluster
To configure IP failover in your OpenShift Container Platform cluster and provide high availability for Virtual IP addresses, you can create a deployment that runs Keepalived on selected nodes to monitor services and fail over VIPs when nodes become unavailable.
The IP failover deployment ensures that a failover pod runs on each of the nodes matching the constraints or the label used. The pod, which runs Keepalived, can monitor an endpoint and use Virtual Router Redundancy Protocol (VRRP) to fail over the virtual IP (VIP) from one node to another if the first node cannot reach the service or endpoint.
For production use, set a selector that selects at least two nodes, and set replicas equal to the number of selected nodes.
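Because spec.replicas should equal the number of selected nodes, and OPENSHIFT_HA_REPLICA_COUNT must in turn match spec.replicas, a quick consistency check can catch drift between the three values. The sketch below takes the three counts as arguments so that it runs without a cluster; the function name is hypothetical. On a live cluster you could obtain the inputs with oc get nodes -l <selector> -o name | wc -l and oc get deploy ipfailover-keepalived -o jsonpath='{.spec.replicas}'.

```shell
# Check the IP failover replica settings against the selected nodes.
# Usage: check_replica_settings SELECTED_NODES SPEC_REPLICAS HA_REPLICA_COUNT
# Prints one line per problem and returns 1 if any problem is found.
check_replica_settings() {
  nodes=$1
  replicas=$2
  ha_count=$3
  status=0
  if [ "$nodes" -lt 2 ]; then
    echo "selector matches $nodes node(s); production use needs at least 2"
    status=1
  fi
  if [ "$replicas" -ne "$nodes" ]; then
    echo "spec.replicas is $replicas but $nodes node(s) are selected"
    status=1
  fi
  if [ "$ha_count" -ne "$replicas" ]; then
    echo "OPENSHIFT_HA_REPLICA_COUNT is $ha_count but spec.replicas is $replicas"
    status=1
  fi
  return "$status"
}
```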
Prerequisites
- You have logged in to the cluster as a user with cluster-admin privileges.
- You have created a pull secret.

Red Hat OpenStack Platform (RHOSP) only:

- You have installed an RHOSP client (RHCOS documentation) on the target environment.
- You have downloaded the RHOSP openrc.sh rc file (RHCOS documentation).
Procedure
Create an IP failover service account:

    $ oc create sa ipfailover

Update security context constraints (SCC) for hostNetwork:

    $ oc adm policy add-scc-to-user privileged -z ipfailover
    $ oc adm policy add-scc-to-user hostnetwork -z ipfailover

Red Hat OpenStack Platform (RHOSP) only: Complete the following steps to make a failover VIP address reachable on RHOSP ports.
Use the RHOSP CLI to show the default RHOSP API and VIP addresses in the allowed_address_pairs parameter of your RHOSP cluster:

    $ openstack port show <cluster_name> -c allowed_address_pairs

Example output

    Field                  Value
    allowed_address_pairs  ip_address='192.168.0.5', mac_address='fa:16:3e:31:f9:cb'
                           ip_address='192.168.0.7', mac_address='fa:16:3e:31:f9:cb'

Set a different VIP address for the IP failover deployment and make the address reachable on RHOSP ports by entering the following command in the RHOSP CLI. Do not use any of the default RHOSP API and VIP addresses as the failover VIP address for the IP failover deployment.

Example of adding the 1.1.1.1 failover IP address as an allowed address on RHOSP ports:

    $ openstack port set <cluster_name> --allowed-address ip-address=1.1.1.1,mac-address=fa:16:3e:31:f9:cb

Create a deployment YAML file to configure IP failover for your deployment. See "Example deployment YAML for IP failover configuration" in a later step.

In the IP failover deployment, add the following specification so that the failover VIP address is passed to the OPENSHIFT_HA_VIRTUAL_IPS environment variable:

Example of adding the 1.1.1.1 VIP address to OPENSHIFT_HA_VIRTUAL_IPS
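Because the default RHOSP API and VIP addresses must not be reused as the failover VIP, you can check a candidate address against the allowed_address_pairs output before you add it. The following sketch reads that output on standard input and fails if the candidate already appears in an ip_address='...' entry. It assumes the output format shown in the example above, and the function name is hypothetical.

```shell
# Fail if a candidate failover VIP is already listed in allowed_address_pairs.
# Usage: openstack port show <cluster_name> -c allowed_address_pairs | vip_is_free CANDIDATE
vip_is_free() {
  candidate=$1
  # Extract each address from entries such as ip_address='192.168.0.5'
  # and compare it with the candidate as a fixed, whole-line string.
  if grep -o "ip_address='[^']*'" | sed "s/^ip_address='//; s/'\$//" \
      | grep -Fqx "$candidate"; then
    echo "$candidate is already an allowed address pair; choose another VIP" >&2
    return 1
  fi
  return 0
}
```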
Create a deployment YAML file to configure IP failover.

Note: For Red Hat OpenStack Platform (RHOSP), you do not need to re-create the deployment YAML file. You already created this file as part of the earlier instructions.

Example deployment YAML for IP failover configuration

where:
- ipfailover-keepalived: Specifies the name of the IP failover deployment.
- OPENSHIFT_HA_VIRTUAL_IPS: Specifies the list of IP address ranges to replicate. This must be provided. For example, 1.2.3.4-6,1.2.3.9.
- OPENSHIFT_HA_VIP_GROUPS: Specifies the number of groups to create for VRRP. If not set, a group is created for each virtual IP range specified with the OPENSHIFT_HA_VIRTUAL_IPS variable.
- OPENSHIFT_HA_NETWORK_INTERFACE: Specifies the interface name that IP failover uses to send VRRP traffic. By default, eth0 is used.
- OPENSHIFT_HA_MONITOR_PORT: Specifies the port to which the IP failover pod tries to open a TCP connection on each VIP. If the connection is established, the service is considered to be running. If this port is set to 0, the test always passes. The default value is 80.
- OPENSHIFT_HA_VRRP_ID_OFFSET: Specifies the offset value used to set the virtual router IDs. Using different offset values allows multiple IP failover configurations to exist within the same cluster. The default offset is 10, and the allowed range is 0 through 255.
- OPENSHIFT_HA_REPLICA_COUNT: Specifies the number of replicas to create. This must match the spec.replicas value in the IP failover configuration. The default value is 2.
- OPENSHIFT_HA_USE_UNICAST: Specifies whether to use unicast mode for VRRP. The default value is false.
- OPENSHIFT_HA_UNICAST_PEERS: Specifies the list of IP addresses of the unicast peers. This must be provided if OPENSHIFT_HA_USE_UNICAST is set to true.
- OPENSHIFT_HA_IPTABLES_CHAIN: Specifies the name of the iptables chain to which an iptables rule that allows VRRP traffic is automatically added. If the value is not set, an iptables rule is not added. If the chain does not exist, it is not created, and Keepalived operates in unicast mode. The default is INPUT.
- OPENSHIFT_HA_NOTIFY_SCRIPT: Specifies the full path name in the pod file system of a script that is run whenever the state changes.
- OPENSHIFT_HA_CHECK_SCRIPT: Specifies the full path name in the pod file system of a script that is periodically run to verify the application is operating.
- OPENSHIFT_HA_PREEMPTION: Specifies the strategy for handling a new higher priority host. The default value is preempt_delay 300, which causes a Keepalived instance to take over a VIP after 5 minutes if a lower-priority master is holding the VIP.
- OPENSHIFT_HA_CHECK_INTERVAL: Specifies the period, in seconds, that the check script is run. The default value is 2.
- openshift-pull-secret: Specifies the name of the pull secret to use for the IP failover deployment. Create the pull secret before you create the deployment; otherwise, creating the deployment fails with an error.
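Several rules in this list can be checked before you apply the deployment. The sketch below validates values passed as shell or environment variables: OPENSHIFT_HA_VIRTUAL_IPS must be set, OPENSHIFT_HA_VRRP_ID_OFFSET must be 0 through 255, OPENSHIFT_HA_UNICAST_PEERS must be set when OPENSHIFT_HA_USE_UNICAST is true, and OPENSHIFT_HA_REPLICA_COUNT must match spec.replicas, which the sketch takes as an argument. The function name is hypothetical, and the checks cover only the rules stated above.

```shell
# Validate IP failover settings taken from the environment.
# Usage: validate_ipfailover_env SPEC_REPLICAS
validate_ipfailover_env() {
  spec_replicas=$1
  errors=0
  if [ -z "${OPENSHIFT_HA_VIRTUAL_IPS:-}" ]; then
    echo "OPENSHIFT_HA_VIRTUAL_IPS must be provided" >&2
    errors=$((errors + 1))
  fi
  offset=${OPENSHIFT_HA_VRRP_ID_OFFSET:-10}   # default offset stated above
  case $offset in
    ''|*[!0-9]*) in_range=no ;;               # empty or not a plain integer
    *) if [ "$offset" -le 255 ]; then in_range=yes; else in_range=no; fi ;;
  esac
  if [ "$in_range" = no ]; then
    echo "OPENSHIFT_HA_VRRP_ID_OFFSET must be 0 through 255" >&2
    errors=$((errors + 1))
  fi
  if [ "${OPENSHIFT_HA_USE_UNICAST:-false}" = true ] && [ -z "${OPENSHIFT_HA_UNICAST_PEERS:-}" ]; then
    echo "OPENSHIFT_HA_UNICAST_PEERS is required when OPENSHIFT_HA_USE_UNICAST=true" >&2
    errors=$((errors + 1))
  fi
  # 2 is the default replica count stated above.
  if [ "${OPENSHIFT_HA_REPLICA_COUNT:-2}" != "$spec_replicas" ]; then
    echo "OPENSHIFT_HA_REPLICA_COUNT must match spec.replicas ($spec_replicas)" >&2
    errors=$((errors + 1))
  fi
  [ "$errors" -eq 0 ]
}
```

For example, OPENSHIFT_HA_VIRTUAL_IPS=1.2.3.4-6 OPENSHIFT_HA_USE_UNICAST=true validate_ipfailover_env 2 fails and reports the missing unicast peers.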
12.3. Configuring check and notify scripts
To customize health monitoring for IP failover and receive notifications when VIP state changes in OpenShift Container Platform, you can configure check and notify scripts by using ConfigMap objects.
The check and notify scripts run inside the IP failover pod and use the pod file system rather than the host file system. The host file system is available to the pod under the /hosts mount path. When configuring a check or notify script, you must provide the full path to the script.
Each IP failover pod manages a Keepalived daemon that controls one or more virtual IP (VIP) addresses on the node where the pod is running. Keepalived tracks the state of each VIP on the node, which can be master, backup, or fault.
The full path names of the check and notify scripts are added to the Keepalived configuration file, /etc/keepalived/keepalived.conf, which is loaded each time Keepalived starts. You add the scripts to the pod by using a ConfigMap object, as described in the following sections.
- Check script

  Keepalived monitors application health by periodically running an optional, user-supplied check script. For example, the script can test a web server by issuing a request and verifying the response. If you do not provide a check script, Keepalived runs a default script that tests the TCP connection. This default test is suppressed when the monitor port is set to 0. If the check script returns a non-zero value, the node enters the backup state and any VIPs that it holds are reassigned.

- Notify script

  As a cluster administrator, you can provide an optional notify script that Keepalived calls whenever the VIP state changes. Keepalived passes the following parameters to the notify script:

  - $1: group or instance
  - $2: Name of the group or instance
  - $3: New state: master, backup, or fault
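A notify script receives exactly these three values, so a minimal one can simply log each transition. The sketch below is hypothetical, not a script shipped with IP failover: it lowercases the state in case it arrives as MASTER rather than master, rejects unknown states, and prints one line per transition to standard output, which you can redirect to a log file of your choice. The function name and message format are assumptions.

```shell
#!/bin/bash
# Sketch of a notify script. Keepalived passes:
#   $1 = group or instance, $2 = its name, $3 = the new state.
ha_notify() {
  kind=$1
  name=$2
  # Normalize letter case so both "master" and "MASTER" are accepted.
  state=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
  case $state in
    master|backup|fault) ;;
    *) echo "unexpected state: $3" >&2; return 1 ;;
  esac
  # One line per transition; redirect stdout to keep a log.
  printf '%s %s is now %s\n' "$kind" "$name" "$state"
}

# Run only when invoked with at least the three Keepalived arguments.
if [ "$#" -ge 3 ]; then
  ha_notify "$@"
fi
```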
Prerequisites
- You installed the OpenShift CLI (oc).
- You are logged in to the cluster as a user with cluster-admin privileges.
Procedure
Create the desired script and create a ConfigMap object to hold it. The script has no input arguments and must return 0 for OK and 1 for fail.

The check script, mycheckscript.sh:

    #!/bin/bash
    # Run whatever tests are needed, for example,
    # send a request and verify the response.
    # Exit 0 for OK or 1 for fail.
    exit 0

Create the ConfigMap object by running the following command:

    $ oc create configmap mycustomcheck --from-file=mycheckscript.sh

Add the script to the pod. The defaultMode for the mounted ConfigMap object files must allow the files to run. You can add the script by using oc commands or by editing the IP failover configuration.

Add the script to the pod by running the following commands. A defaultMode value of 0755 (493 decimal) is typical. For example:

    $ oc set env deploy/ipfailover-keepalived \
        OPENSHIFT_HA_CHECK_SCRIPT=/etc/keepalive/mycheckscript.sh
    $ oc set volume deploy/ipfailover-keepalived --add --overwrite \
        --name=config-volume \
        --mount-path=/etc/keepalive \
        --source='{"configMap": { "name": "mycustomcheck", "defaultMode": 493}}'

Note: The oc set env command is whitespace sensitive. There must be no whitespace on either side of the = sign.

Alternatively, edit the ipfailover-keepalived configuration by running the following command:

    $ oc edit deploy ipfailover-keepalived

Example ipfailover-keepalived configuration

where:
- spec.container.env.name: Sets the OPENSHIFT_HA_CHECK_SCRIPT environment variable to point to the mounted script file.
- spec.container.volumeMounts: Adds the spec.container.volumeMounts field to create the mount point.
- spec.volumes: Adds a new spec.volumes field that references the config map.
- spec.volumes.configMap.defaultMode: Sets run permission on the files. When read back, the value is displayed in decimal, 493.
Save the changes and exit the editor. This restarts the ipfailover-keepalived configuration.
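The mycheckscript.sh example above is a placeholder. A slightly more concrete sketch follows: it sends an HTTP GET request and passes only if the status line reports 200. It assumes bash with /dev/tcp support and the timeout command are available in the IP failover image, and the CHECK_HOST, CHECK_PORT, and CHECK_PATH settings and their defaults are hypothetical. It keeps the contract described above: no arguments, 0 for OK, and 1 for fail.

```shell
#!/bin/bash
# Sketch of an application-level check for mycheckscript.sh.
# The settings below are hypothetical; adjust them for your application.
CHECK_HOST=${CHECK_HOST:-127.0.0.1}
CHECK_PORT=${CHECK_PORT:-8080}
CHECK_PATH=${CHECK_PATH:-/}

# Succeed if an HTTP status line reports 200, for example "HTTP/1.1 200 OK".
status_is_ok() {
  line=$(printf '%s' "$1" | tr -d '\r')   # drop the CR that ends HTTP lines
  case $line in
    "HTTP/"*" 200"|"HTTP/"*" 200 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Send "GET PATH" to HOST:PORT and test the first response line.
# Uses bash's /dev/tcp redirection; timeout bounds the whole exchange.
http_ok() {
  first_line=$(timeout 3 bash -c '
    exec 3<>"/dev/tcp/$1/$2" || exit 1
    printf "GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" "$3" "$1" >&3
    IFS= read -r reply <&3 || exit 1
    printf "%s\n" "$reply"
  ' _ "$1" "$2" "$3" 2>/dev/null) || return 1
  status_is_ok "$first_line"
}

# In the real script file, end with the following line so that its status
# (0 = OK, 1 = fail) becomes the script's exit status:
# http_ok "$CHECK_HOST" "$CHECK_PORT" "$CHECK_PATH"
```

The last line is left as a comment here only so the functions can be tried out on their own; in mycheckscript.sh it must be active.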
12.4. Configuring VRRP preemption
To control VIP preemption behavior when nodes recover in OpenShift Container Platform, you can configure the OPENSHIFT_HA_PREEMPTION variable to set a delay before higher priority VIPs take over or disable preemption entirely.
When a virtual IP (VIP) on a node recovers from the fault state, it enters the backup state if it has a lower priority than the VIP currently in the master state.
There are two options for the OPENSHIFT_HA_PREEMPTION variable:
- nopreempt: When set, the master role does not move from a lower-priority VIP to a higher-priority VIP.
- preempt_delay 300: When set, Keepalived waits 300 seconds before moving the master role to the higher-priority VIP.
In the following example, the OPENSHIFT_HA_PREEMPTION value is set to preempt_delay 300.
Procedure
To specify preemption, edit the IP failover deployment by running the following command:

    $ oc edit deploy ipfailover-keepalived
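As an alternative to editing the deployment, you can set the variable with oc set env, for example oc set env deploy/ipfailover-keepalived OPENSHIFT_HA_PREEMPTION="preempt_delay 300"; the value contains a space, so it must be quoted. The small helper below builds a valid value for either option described above. Its name and its argument convention (none, or a number of seconds) are hypothetical.

```shell
# Build a value for OPENSHIFT_HA_PREEMPTION.
# Usage: preemption_value none      prints: nopreempt
#        preemption_value SECONDS   prints: preempt_delay SECONDS
preemption_value() {
  case $1 in
    none) printf 'nopreempt\n' ;;
    ''|*[!0-9]*) echo "expected 'none' or a number of seconds" >&2; return 1 ;;
    *) printf 'preempt_delay %s\n' "$1" ;;
  esac
}
```

For example: oc set env deploy/ipfailover-keepalived OPENSHIFT_HA_PREEMPTION="$(preemption_value 300)".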
12.5. Deploying multiple IP failover instances
When deploying multiple IP failover instances in OpenShift Container Platform, each Keepalived daemon assigns unique VRRP IDs to virtual IP addresses. Configure the OPENSHIFT_HA_VRRP_ID_OFFSET variable to prevent VRRP ID range overlaps between different IP failover configurations.
Each IP failover pod created by an IP failover configuration (one pod per node or replica) runs a Keepalived daemon. When multiple IP failover configurations are present, additional pods are created, and their Keepalived daemons participate together in Virtual Router Redundancy Protocol (VRRP) negotiation. This negotiation determines which node services each virtual IP (VIP).
For each VIP, Keepalived assigns a unique internal vrrp-id. During VRRP negotiation, these vrrp-id values are used to select the node that services the corresponding VIP.
The IP failover pod assigns vrrp-id values sequentially to the VIPs defined in the IP failover configuration, starting from the value specified by OPENSHIFT_HA_VRRP_ID_OFFSET. Valid vrrp-id values are in the range 1..255.
When you deploy multiple IP failover configurations, ensure that the configured offset leaves sufficient space for additional VIPs and prevents vrrp-id ranges from overlapping across configurations.
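To check that two configurations do not collide, compare their vrrp-id ranges. The sketch below assumes, as described above, that a configuration with N VIPs uses N consecutive vrrp-id values beginning at its offset. If your version numbers from offset + 1 instead, both ranges shift by one and the overlap result is unchanged. The function names are hypothetical.

```shell
# Last vrrp-id used by a configuration that starts at OFFSET and
# assigns COUNT consecutive IDs.
vrrp_last_id() {
  echo $(( $1 + $2 - 1 ))
}

# Succeed if configurations A and B would share any vrrp-id.
# Usage: vrrp_ranges_overlap OFFSET_A COUNT_A OFFSET_B COUNT_B
vrrp_ranges_overlap() {
  end_a=$(vrrp_last_id "$1" "$2")
  end_b=$(vrrp_last_id "$3" "$4")
  # Two closed intervals overlap when each one starts before the other ends.
  [ "$1" -le "$end_b" ] && [ "$3" -le "$end_a" ]
}
```

For example, under this assumption a configuration with offset 10 and five VIPs uses IDs 10 through 14, so a second configuration needs an offset of 15 or higher.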
12.6. Configuring IP failover for more than 254 addresses
To configure IP failover for more than 254 Virtual IP addresses in OpenShift Container Platform, you can use the OPENSHIFT_HA_VIP_GROUPS variable to group multiple addresses together. By using the OPENSHIFT_HA_VIP_GROUPS variable, you can change the number of VIPs per VRRP instance and define the number of VIP groups available for each VRRP instance when configuring IP failover.
Grouping VIPs creates a wider range of allocation of VIPs per VRRP in the case of VRRP failover events, and is useful when all hosts in the cluster have access to a service locally, for example, when a service is exposed with an ExternalIP.
As a rule for failover, do not limit services, such as the router, to one specific host. Instead, services should be replicated to each host so that in the case of IP failover, the services do not have to be recreated on the new host.
If you are using OpenShift Container Platform health checks, the nature of IP failover and groups means that not all instances in the group are checked. For that reason, you must use Kubernetes health checks to ensure that services are live.
Prerequisites
- You are logged in to the cluster as a user with cluster-admin privileges.
Procedure
To change the number of IP addresses assigned to each group, change the value of the OPENSHIFT_HA_VIP_GROUPS variable, for example:

Example Deployment YAML for IP failover configuration

In this example, the OPENSHIFT_HA_VIP_GROUPS variable is set to 3. In an environment with seven VIPs, it creates three groups, assigning three VIPs to the first group and two VIPs to each of the two remaining groups.

Note: If the number of groups set by OPENSHIFT_HA_VIP_GROUPS is fewer than the number of IP addresses set to fail over, at least one group contains more than one IP address, and all of the addresses in a group move as a single unit.
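The split in that example is consistent with dividing the VIPs as evenly as possible, with the first groups taking any remainder. The sketch below reproduces that arithmetic under this assumption; it computes group sizes only and says nothing about which addresses IP failover places in which group. The function name is hypothetical.

```shell
# Print the size of each VRRP group when VIP_COUNT addresses are split
# into GROUPS groups as evenly as possible; earlier groups get the remainder.
# Usage: vip_group_sizes VIP_COUNT GROUPS
vip_group_sizes() {
  vips=$1
  groups=$2
  [ "$groups" -gt 0 ] || { echo "GROUPS must be positive" >&2; return 1; }
  base=$(( vips / groups ))
  extra=$(( vips % groups ))
  sizes=""
  g=1
  while [ "$g" -le "$groups" ]; do
    if [ "$g" -le "$extra" ]; then
      size=$(( base + 1 ))
    else
      size=$base
    fi
    sizes="${sizes:+$sizes }$size"   # space-separated, no leading space
    g=$(( g + 1 ))
  done
  printf '%s\n' "$sizes"
}
```

With seven VIPs and OPENSHIFT_HA_VIP_GROUPS set to 3, vip_group_sizes 7 3 prints 3 2 2, matching the example.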
12.7. High availability for ExternalIP
High availability for ExternalIP in non-cloud clusters of OpenShift Container Platform combines IP failover with ExternalIP auto-assignment to ensure services remain accessible when nodes fail. You can configure this by using the same CIDR range for both ExternalIP auto-assignment and IP failover.
To configure high availability for ExternalIP, you can specify a spec.externalIP.autoAssignCIDRs range in the cluster network configuration, and then use the same range when you create the IP failover configuration.
Because IP failover can support up to a maximum of 255 VIPs for the entire cluster, the spec.externalIP.autoAssignCIDRs range must be /24 or smaller.
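To see how the prefix length bounds the number of addresses, note that an IPv4 prefix of length p contains 2^(32 - p) addresses, so a longer prefix means a smaller range. The sketch below computes that count and accepts only ranges of /24 or smaller, as required above. The function names are hypothetical, and the count includes the network and broadcast addresses.

```shell
# Number of IPv4 addresses in a prefix of the given length (0-32),
# including the network and broadcast addresses.
cidr_address_count() {
  echo $(( 1 << (32 - $1) ))
}

# Succeed if a CIDR such as 192.168.132.0/29 is /24 or smaller.
cidr_fits_ipfailover() {
  prefix=${1##*/}
  case $prefix in
    ''|*[!0-9]*) echo "not a CIDR: $1" >&2; return 1 ;;
  esac
  [ "$prefix" -ge 24 ] && [ "$prefix" -le 32 ]
}
```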
12.8. Removing IP failover
To remove IP failover from your OpenShift Container Platform cluster and clean up iptables rules and virtual IP addresses, you can delete the deployment and service account, then run a cleanup job on each configured node.
When IP failover is initially configured, the worker nodes in the cluster are modified with an iptables rule that explicitly allows multicast packets on 224.0.0.18 for Keepalived. Because of the change to the nodes, removing IP failover requires running a job to remove the iptables rule and removing the virtual IP addresses used by Keepalived.
Procedure
Optional: Identify and delete any check and notify scripts that are stored as config maps:
Identify whether any pods for IP failover use a config map as a volume:

Example output

    Namespace: default
    Pod: keepalived-worker-59df45db9c-2x9mn
    Volumes that use config maps:
      volume: config-volume
      configMap: mycustomcheck

If the preceding step provided the names of config maps that are used as volumes, delete the config maps:

    $ oc delete configmap <configmap_name>
Identify an existing deployment for IP failover:

    $ oc get deployment -l ipfailover

Example output

    NAMESPACE   NAME         READY   UP-TO-DATE   AVAILABLE   AGE
    default     ipfailover   2/2     2            2           105d

Delete the deployment:

    $ oc delete deployment <ipfailover_deployment_name>

Remove the ipfailover service account:

    $ oc delete sa ipfailover

Run a job that removes the iptables rule that was added when IP failover was initially configured:
Create a file such as remove-ipfailover-job.yaml with contents that are similar to the following example:

- The nodeSelector is likely the same as the selector used in the old IP failover deployment.
- Run the job for each node in your cluster that was configured for IP failover and replace the hostname each time.
Run the job:

    $ oc create -f remove-ipfailover-job.yaml

Example output

    job.batch/remove-ipfailover-2h8dm created
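Because the job must run once per configured node with the hostname replaced each time, you can generate one file per node from a single template. The sketch below assumes that you saved your job YAML as a template in which the hostname appears as the placeholder __HOSTNAME__; that placeholder, the output file names, and the function name are hypothetical. Create each generated file with oc create -f as in the previous step.

```shell
# Render one job file per node from a template containing __HOSTNAME__.
# Usage: render_removal_jobs TEMPLATE OUTPUT_DIR NODE...
# Prints the path of each generated file.
render_removal_jobs() {
  template=$1
  outdir=$2
  shift 2
  [ -r "$template" ] || { echo "cannot read $template" >&2; return 1; }
  mkdir -p "$outdir"
  for node in "$@"; do
    # Node names are DNS-style names (letters, digits, '-', '.'), which are
    # safe to use in the sed replacement and in file names.
    sed "s/__HOSTNAME__/$node/g" "$template" > "$outdir/remove-ipfailover-$node.yaml"
    printf '%s\n' "$outdir/remove-ipfailover-$node.yaml"
  done
}
```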
Verification
Confirm that the job removed the initial configuration for IP failover.

    $ oc logs job/remove-ipfailover-2h8dm

Example output

    remove-failover.sh: OpenShift IP Failover service terminating.
      - Removing ip_vs module ...
      - Cleaning up ...
      - Releasing VIPs (interface eth0) ...