4.3. Network Topology
A cluster’s network is either complicated, or really complicated. The basic cluster involves several sets of logical network pathways. Some of these share physical interfaces, and some require dedicated physical interfaces and VLANs, depending on the degree of robustness required. This example is based on a topology that Red Hat uses to certify Oracle RAC/GFS, but is also suitable for the HA configuration.
Note
Cluster networks require several VLANs and multiple address assignments across those VLANs. If bonds are going to span VLANs or switches, then it might be required to use ARP to ensure the correct behavior in the event of a link failure.
4.3.1. Public Network
The public network is the pathway used by the application tier to access the database. The failure scenario of concern is the loss of an entire node; bonding the public interface protects only against failure of the interface itself, which is less likely, and bonded public interfaces complicate application tier network configuration and failover sequencing. This network is therefore not bonded in our example.
The hostnames of the server nodes are identified by the public address. All other network interfaces are private, but they still may need addresses assigned by network operations.
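For comparison with the bonded private interface configured later, a non-bonded public interface needs only a single network-scripts file. The following is a minimal sketch; the device name and addresses are placeholders, and the real address comes from network operations.
ifcfg-eth0
# Example public interface (placeholder values)
DEVICE=eth0
BOOTPROTO=none
IPADDR=10.1.1.7
NETMASK=255.255.255.0
TYPE=Ethernet
ONBOOT=yes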
Note
Oracle Clusterware (CRS) creates its own set of Virtual IPs (VIPs) on the public interface. This mechanism makes it possible for CRS on another node to provide continued access to the failed node's public address. Bonded public interfaces are not recommended in the presence of CRS VIPs. See Oracle SQL*Net Configuration in both the HA and RAC/GFS chapters.
4.3.2. Red Hat Cluster Suite Network
The Red Hat Cluster Suite network is used by CMAN to monitor and manage the health of the cluster. This network is critical to the proper functioning of the cluster and is the pathway that is bonded most often.
Note
RAC requires GFS clustered file systems, which rely on the Distributed Lock Manager (DLM) running over this network. Oracle Global Cache Services (GCS) is often configured to use this pathway as well. There is a risk of overloading this network, but that is very workload dependent. An advanced administrator may also choose to implement GCS over InfiniBand using Reliable Datagram Sockets (RDS).
The network is private, and only ever used by cluster members. The dual-ported e1000 NIC is used for the Red Hat Cluster Suite heartbeat service or the Oracle RAC Clusterware services.
The file /etc/modprobe.conf contains all four interfaces, and the two ports of the e1000 will be bonded together. The options for bond0 set the bond for failover (not load balancing) with a link-sampling interval of 100 ms. Once /etc/modprobe.conf has been modified, either remove and reload the e1000 kernel module, or the modification will take effect at the next reboot.
alias eth0 tg3
alias eth1 tg3
alias eth2 e1000
alias eth3 e1000
alias bond0 bonding
options bond0 mode=1 miimon=100
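One way to apply the change without rebooting, assuming eth2 and eth3 are not yet configured or carrying traffic, is to remove and reload the driver so the new aliases and bonding options are picked up:
$ sudo modprobe -r e1000     # remove the e1000 module; this takes eth2 and eth3 offline
$ sudo modprobe e1000        # reload it; the aliases in /etc/modprobe.conf now apply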
The configuration of the bond requires three network-scripts files: one for bond0 and one for each of the corresponding slave interfaces, as shown in the following example.
ifcfg-eth2
# Intel Corporation 82546GB Gigabit Ethernet Controller
DEVICE=eth2
HWADDR=00:04:23:D4:88:BE
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=Ethernet
ONBOOT=no

ifcfg-eth3
# Intel Corporation 82546GB Gigabit Ethernet Controller
DEVICE=eth3
HWADDR=00:04:23:D4:88:BF
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
TYPE=Ethernet
ONBOOT=no

ifcfg-bond0
DEVICE=bond0
IPADDR=192.168.2.162
NETMASK=255.255.255.0
NETWORK=192.168.2.0
BROADCAST=192.168.2.255
BOOTPROTO=none
TYPE=Ethernet
ONBOOT=yes
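With the scripts in place, the bond can be brought up and checked through the bonding driver's status file. This is a quick sanity check, not a required configuration step:
$ sudo ifup bond0                # bring up the bond; the network scripts enslave eth2 and eth3
$ cat /proc/net/bonding/bond0    # reports active-backup mode, the MII interval, and the state of each slave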
4.3.3. Fencing Network
When Red Hat Cluster Suite has determined that a cluster node must be removed from the active cluster, it will need to fence this node. The methods used in this cluster are both power-managed. The HP iLO BMC has one Ethernet port, which must be configured, and this information must exactly match the fencing clauses in the /etc/cluster/cluster.conf file. Most IPMI-based interfaces have only one network interface, which may prove to be a single point of failure for the fencing mechanism. A unique feature of Red Hat Cluster Suite is the ability to nest fence domains to provide an alternative fence method in case the BMC pathway fails; a switched Power Distribution Unit (PDU), which also frequently has only one port, can be configured as that backup. We do not recommend the FCP port fencing agent or the T.10 SCSI reservations fence agent for mission-critical database applications. The BMC address and user/password must also be correct in the /etc/cluster/cluster.conf file.
<fencedevices>
  <fencedevice agent="fence_ilo" hostname="192.168.1.7" login="rac" name="jLO7" passwd="jeff99"/>
  <fencedevice agent="fence_ilo" hostname="192.168.1.8" login="rac" name="jLO8" passwd="jeff99"/>
</fencedevices>
Note
You can test the fencing configuration manually with the fence_node command. Test early and often.
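For example, from one node you can ask the cluster to fence the other node, or query the iLO directly with the fence agent (the short options shown are the common agent flags, and the address and credentials are taken from the example above); the victim node should power-cycle if the cluster.conf entries are correct:
$ sudo fence_node rac8-priv                                 # fence via the method configured in cluster.conf
$ sudo fence_ilo -a 192.168.1.8 -l rac -p jeff99 -o status  # query the iLO directly with the fence agent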
4.3.4. Red Hat Cluster Suite Services
There are now enough hardware and software pieces in place that the cluster.conf file can be completed and parts of the cluster can be initialized. Red Hat Cluster Suite consists of a set of services (cman, qdisk, fenced) that ensure cluster integrity. The values below are from the RAC example; the timeouts are good starting points for either configuration, and the comments give the HA equivalents. More details on the RAC example are provided in Chapter 5, RAC/GFS Cluster Configuration; more details on the HA example are provided in Chapter 6, Cold Failover Cluster Configuration.
<cluster config_version="2" name="HA585">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <quorumd interval="7" device="/dev/mapper/qdisk" tko="9" votes="1" log_level="5"/>
  <cman deadnode_timeout="30" expected_nodes="7"/>
  <!-- cman deadnode_timeout="30" expected_votes="3"/ -->
  <!-- totem token="31000" -->
  <multicast addr="225.0.0.12"/>
  <clusternodes>
    <clusternode name="rac7-priv" nodeid="1" votes="1">
      <multicast addr="225.0.0.12" interface="bond0"/>
      <fence>
        <method name="1">
          <device name="jLO7"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rac8-priv" nodeid="2" votes="1">
      <multicast addr="225.0.0.12" interface="bond0"/>
      <fence>
        <method name="1">
          <device name="jLO8"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_ilo" hostname="192.168.1.7" login="rac" name="jLO7" passwd="jeff123456"/>
    <fencedevice agent="fence_ilo" hostname="192.168.1.8" login="rac" name="jLO8" passwd="jeff123456"/>
  </fencedevices>
</cluster>
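Before starting any services, it is worth confirming that the file is at least well-formed XML. This does not validate it against the cluster schema, but it catches mismatched quotes and missing closing tags:
$ sudo xmllint --noout /etc/cluster/cluster.conf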
The cluster node names rac7-priv and rac8-priv need to be resolved and are therefore included in all nodes' /etc/hosts file:
192.168.1.7   rac7-priv.example.com   rac7-priv
192.168.1.8   rac8-priv.example.com   rac8-priv
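A quick check that the private names resolve identically on every node:
$ getent hosts rac7-priv rac8-priv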
Note
When doing initial testing, set the init level to 2 in the /etc/inittab file to simplify node testing. If the configuration is broken and the node reboots back into init 3, the startup will hang, which impedes debugging. Open a window and tail the /var/log/messages file to track your progress.
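The relevant line in /etc/inittab; restore the site default once the node joins the cluster reliably:
id:2:initdefault: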
The qdiskd service is the first service to start and is responsible for parsing the cluster.conf file. Any errors will appear in the /var/log/messages file and qdiskd will exit. If qdiskd starts up, then cman should be started next.
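A first manual start on one node might look like the following; keep a second window tailing the log:
$ sudo service qdiskd start
$ sudo service cman start
$ tail -f /var/log/messages     # in a separate window; watch for qdiskd and cman startup messages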
Assuming no glitches in configuration (consider yourself talented if the node enters the cluster on the first attempt), we can now ensure that the qdiskd and cman services will start on boot:
$ sudo chkconfig --level 3 qdiskd on
$ sudo chkconfig --level 3 cman on
At this point, we should shut down all services on this node and repeat the steps in this chapter for our second node. You can copy the multipath.conf and cluster.conf configuration files to the second node to make things easier (see the example at the end of this section). Now the configuration process diverges to the point that further configuration is very RAC/GFS or HA specific. For information on configuring a RAC/GFS cluster, continue with Chapter 5, RAC/GFS Cluster Configuration. For information on configuring a cold failover HA cluster, continue with Chapter 6, Cold Failover Cluster Configuration.
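For example, assuming the second node is reachable as rac8 (a placeholder hostname) and root access is available over SSH, the two files can be copied with scp:
$ sudo scp /etc/multipath.conf root@rac8:/etc/multipath.conf
$ sudo scp /etc/cluster/cluster.conf root@rac8:/etc/cluster/cluster.conf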