Chapter 22. Configuring Layer 3 high availability (HA)
22.1. RHOSP Networking service without high availability (HA)
Red Hat OpenStack Platform (RHOSP) Networking service deployments without any high availability (HA) features are vulnerable to physical node failures.
In a typical deployment, projects create virtual routers, which are scheduled to run on physical Networking service Layer 3 (L3) agent nodes. If you lose an L3 agent node, the dependent virtual machines lose connectivity to external networks, any floating IP addresses become unavailable, and connectivity is lost between any networks that the router hosts.
22.2. Overview of Layer 3 high availability (HA)
This active/passive high availability (HA) configuration uses the industry-standard Virtual Router Redundancy Protocol (VRRP, as defined in RFC 3768) to protect project routers and floating IP addresses. A virtual router is randomly scheduled across multiple Red Hat OpenStack Platform (RHOSP) Networking service nodes, with one designated as the active router, and the remainder serving in a standby role.
To deploy Layer 3 (L3) HA, you must maintain similar configuration on the redundant Networking service nodes, including floating IP ranges and access to external networks.
In the following diagram, the active Router1 and Router2 routers are running on separate physical L3 Networking service agent nodes. L3 HA has scheduled backup virtual routers on the corresponding nodes, ready to resume service in the case of a physical node failure. When the L3 agent node fails, L3 HA reschedules the affected virtual router and floating IP addresses to a working node:
During a failover event, instance TCP sessions through floating IPs remain unaffected, and migrate to the new L3 node without disruption. Only SNAT traffic is affected by failover events.
The L3 agent is further protected when in an active/active HA mode.
22.3. Layer 3 high availability (HA) failover conditions
Layer 3 (L3) high availability (HA) for the Red Hat OpenStack Platform (RHOSP) Networking service automatically reschedules protected resources in the following events:
- The Networking service L3 agent node shuts down or otherwise loses power because of a hardware failure.
- The L3 agent node becomes isolated from the physical network and loses connectivity.
Manually stopping the L3 agent service does not induce a failover event.
22.4. Project considerations for Layer 3 high availability (HA)
Red Hat OpenStack Platform (RHOSP) Networking service Layer 3 (L3) high availability (HA) configuration occurs in the back end and is invisible to the project. Projects can continue to create and manage their virtual routers as usual. However, be aware of the following limitations when you design your L3 HA implementation:
- L3 HA supports up to 255 virtual routers per project.
- Internal VRRP messages are transported within a separate internal network, created automatically for each project. This process occurs transparently to the user.
- When implementing high availability (HA) routers on ML2/OVS, each L3 agent spawns haproxy and neutron-keepalived-state-change-monitor processes for each router. Each process consumes approximately 20MB of memory. By default, each HA router resides on three L3 agents and consumes resources on each of the nodes. Therefore, when sizing your RHOSP networks, ensure that you have allocated enough memory to support the number of HA routers that you plan to implement.
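The memory figures above lend themselves to a quick estimate. The following is a minimal sketch, not a Red Hat sizing tool: the function name ha_router_memory_mb is hypothetical, and it assumes the worst case in which every HA router is scheduled on the node being sized, with two processes per router at roughly 20MB each.

```shell
# Hypothetical helper: estimate the memory (in MB) that HA router helper
# processes consume on one L3 agent node.
# Assumptions taken from the text above: 2 processes per router
# (haproxy and neutron-keepalived-state-change-monitor), about 20 MB each.
# Worst case: every HA router is scheduled on this node.
ha_router_memory_mb() {
    hr_routers=$1
    hr_procs_per_router=2
    hr_mb_per_process=20
    echo $(( hr_routers * hr_procs_per_router * hr_mb_per_process ))
}
```

For example, `ha_router_memory_mb 100` estimates the overhead on a node that hosts 100 HA routers. Treat the result as an approximation, because actual process memory varies.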
22.5. High availability (HA) changes to the RHOSP Networking service
The Red Hat OpenStack Platform (RHOSP) Networking service (neutron) API has been updated to allow administrators to set the --ha=True/False flag when creating a router, which overrides the default configuration of l3_ha in /var/lib/config-data/puppet-generated/neutron/etc/neutron/neutron.conf.
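The override rule can be pictured with a small sketch. This is an illustration of the precedence only, not Networking service code: the function name router_is_ha is hypothetical, and it assumes that a per-router flag, when present, always wins over the l3_ha default.

```shell
# Hypothetical helper: resolve whether a newly created router is HA.
# $1 = l3_ha default from neutron.conf (True or False)
# $2 = optional per-router --ha value (True or False); omitted means "not set"
router_is_ha() {
    rh_default=$1
    rh_override=${2:-}
    if [ -n "$rh_override" ]; then
        # A per-router flag overrides the configured default.
        echo "$rh_override"
    else
        echo "$rh_default"
    fi
}
```

For example, `router_is_ha True False` models a deployment where l3_ha is enabled but an administrator creates a single non-HA router.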
High availability (HA) changes to neutron-server:
- Layer 3 (L3) HA assigns the active role randomly, regardless of the scheduler used by the Networking service (whether random or leastrouter).
- The database schema has been modified to handle allocation of virtual IP addresses (VIPs) to virtual routers.
- A transport network is created to direct L3 HA traffic.
HA changes to the Networking service L3 agent:
- A new keepalived manager has been added, providing load-balancing and HA capabilities.
- IP addresses are converted to VIPs.
22.6. Enabling Layer 3 high availability (HA) on RHOSP Networking service nodes
During installation, Red Hat OpenStack Platform (RHOSP) director enables high availability (HA) for virtual routers by default when you have at least two RHOSP Controllers and are not using distributed virtual routing (DVR). Using an RHOSP Orchestration service (heat) parameter, max_l3_agents_per_router, you can set the maximum number of RHOSP Networking service Layer 3 (L3) agents on which an HA router is scheduled.
Prerequisites
- Your RHOSP deployment does not use DVR.
- You have at least two RHOSP Controllers deployed.
Procedure
Log in to the undercloud as the stack user, and source the stackrc file to enable the director command line tools.
Example
$ source ~/stackrc
Create a custom YAML environment file.
Example
$ vi /home/stack/templates/my-neutron-environment.yaml
Tip
The Orchestration service (heat) uses a set of plans called templates to install and configure your environment. You can customize aspects of the overcloud with a custom environment file, which is a special type of template that provides customization for your heat templates.
Set the NeutronL3HA parameter to true in the YAML environment file. This ensures HA is enabled even if director did not set it by default.
parameter_defaults:
  NeutronL3HA: 'true'
Set the maximum number of L3 agents on which an HA router is scheduled.
Set the max_l3_agents_per_router parameter to a value between the minimum and total number of network nodes in your deployment. (A zero value indicates that the router is scheduled on every agent.)
Example
parameter_defaults:
  NeutronL3HA: 'true'
  ControllerExtraConfig:
    neutron::server::max_l3_agents_per_router: 2
In this example, if you deploy four Networking service nodes, only two L3 agents protect each HA virtual router: one active, and one standby.
If you set the value of max_l3_agents_per_router to be greater than the number of available network nodes, you can scale out the number of standby routers by adding new L3 agents. For every new L3 agent node that you deploy, the Networking service schedules additional standby versions of the virtual routers until the max_l3_agents_per_router limit is reached.
Run the openstack overcloud deploy command and include the core heat templates, environment files, and this new custom environment file.
Important
The order of the environment files is important because the parameters and resources defined in subsequent environment files take precedence.
Example
$ openstack overcloud deploy --templates \
 -e [your-environment-files] \
 -e /home/stack/templates/my-neutron-environment.yaml
Note
When NeutronL3HA is set to true, all virtual routers that are created default to HA routers. When you create a router, you can override the HA option by including the --no-ha option in the openstack router create command:
# openstack router create --no-ha <router-name>
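The effect of max_l3_agents_per_router described in the procedure above can be sketched as a small calculation. This is an illustrative model under stated assumptions, not the Networking service scheduler: the function name effective_l3_agents is hypothetical, and it assumes that a zero value means every available agent and that a router is never scheduled on more agents than currently exist.

```shell
# Hypothetical helper: number of L3 agents that host one HA router.
# $1 = max_l3_agents_per_router (0 means "every agent")
# $2 = number of L3 agents currently available
effective_l3_agents() {
    ea_max=$1
    ea_available=$2
    if [ "$ea_max" -eq 0 ] || [ "$ea_max" -gt "$ea_available" ]; then
        # Zero, or a limit above the agent count: bounded by available agents.
        echo "$ea_available"
    else
        echo "$ea_max"
    fi
}
```

For example, `effective_l3_agents 2 4` models the four-node deployment in the procedure, where one active and one standby agent protect each router. Calling `effective_l3_agents 5 3` and then `effective_l3_agents 5 4` shows how adding an agent raises the count until the limit is reached.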
Additional resources
- Environment files in the Customizing your Red Hat OpenStack Platform deployment guide
- Including environment files in overcloud creation in the Customizing your Red Hat OpenStack Platform deployment guide
22.7. Reviewing high availability (HA) RHOSP Networking service node configurations
Procedure
Run the ip address command within the virtual router namespace to return a high availability (HA) device in the result, prefixed with ha-.
# ip netns exec qrouter-b30064f9-414e-4c98-ab42-646197c74020 ip address
<snip>
2794: ha-45249562-ec: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 12:34:56:78:2b:5d brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.2/24 brd 169.254.0.255 scope global ha-54b92d86-4f
With Layer 3 HA enabled, virtual routers and floating IP addresses are protected against individual node failure.
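When a namespace contains many interfaces, it can help to filter the ip address output down to the HA devices. The following is a minimal sketch: the function name list_ha_devices is hypothetical, and it assumes the usual ip address layout in which each interface header line starts with an index, a colon, and the device name.

```shell
# Hypothetical helper: print the names of interfaces that start with "ha-",
# reading `ip address` output from standard input.
list_ha_devices() {
    # Header lines look like "2794: ha-45249562-ec: <FLAGS> mtu 1500 ...".
    # Address lines are indented, so they never match the pattern.
    awk -F': ' '/^[0-9]+: ha-/ {
        name = $2
        sub(/@.*/, "", name)   # drop any "@peer" suffix on the device name
        print name
    }'
}
```

Usage, with the router namespace left as a placeholder:

```shell
ip netns exec qrouter-<router-id> ip address | list_ha_devices
```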