4.2. Storage Topology

Once the server has all the required software installed, configuration can commence. Mapping the external LUNs to the server node requires some incremental configuration on the server, some on the array, and then a return to the server to verify that all LUNs were mapped. Since the storage pathways will be multipathed, all LUNs must be visible down both ports on the HBA before moving on to the multipath configuration.

4.2.1. HBA WWPN Mapping

Fibre Channel HBAs typically have two ports or, for extra redundancy, two single-ported HBAs are deployed. In either case, the World Wide Port Name (WWPN) of each port must be acquired for both nodes and used to register the LUNs so that the storage array will accept the connection request. Install the HBAs in the server before RHEL is installed; this ensures they are configured for use once the installation is complete.
When FCP switch zones are deployed to isolate database traffic to a specific set of FCP ports on the array, the switch can identify the server ports either by physical switch port or by WWPN. Most storage administrators are familiar with this process; the end result must be that two copies of each LUN are visible on each server node.
The storage array typically bundles the LUNs reserved for this cluster into an initiator group, and that group must contain all four WWPNs so that all four requesting HBA ports can see the set of LUNs.
On RHEL, the easiest place to look up the HBA WWPNs is the /sys directory. The switch usually logs the port names as well, so they can also be found there if you know which switch ports the HBAs are connected to.
$ cat /sys/class/fc_host/host0/port_name
0x210000e08b806ba0

$ cat /sys/class/fc_host/host1/port_name
0x210100e08ba06ba0
Use the hexadecimal values from the /sys inquiry; do not use the WWNN (node name). The WWPNs need to be added to the initiator group on the array and to the appropriate zone on the switch. Once these steps are complete, reboot the server and you should see two identical sets of LUNs. Do not proceed to the multipath configuration until both identical sets are visible.
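One way to check the duplicate LUN sets without yet involving the multipath stack is to compare the SCSI device count seen through each FC host. The commands below are a minimal sketch that assumes host0 and host1 are the FC ports from the example above; adjust the host numbers for your system.
$ grep -c "Host: scsi0" /proc/scsi/scsi #LUN count seen through the first HBA port
$ grep -c "Host: scsi1" /proc/scsi/scsi #LUN count seen through the second HBA port
The two counts should match, with each external LUN appearing once per port.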

4.2.2. Multipath Configuration

The Device-Mapper Multipath (DM-Multipath) feature was installed as part of the kickstart and is used to provide pathway redundancy to the LUNs. Configuring DM-Multipath is the next step. Both the Red Hat Cluster Suite quorum disk and the Oracle Clusterware support disks use the resulting DM-Multipath objects. Once DM-Multipath is configured, the block device entries that will be used appear in /dev/mapper.
The installation of DM-Multipath creates an rc service and a disabled /etc/multipath.conf file. The task in this section is to create reasonable aliases for the LUNs and to define how failure processing is managed. The default configuration in this file blacklists everything, so that clause must be modified, removed, or commented out, and then multipath must be restarted or refreshed. Make sure the multipathd daemon is set to run at boot, and reboot the server now to ensure that the duplicate sets of LUNs are visible.
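As shipped, the blacklist clause in /etc/multipath.conf excludes every device. The lines below show the stanza as it typically appears on RHEL 5, commented out so the external LUNs are no longer excluded; treat the exact wording as illustrative for your release.
#blacklist {
#        devnode "*"
#}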
To create aliases for the LUNs, the WWID of each SCSI LUN must be retrieved and used in the alias stanza. WWIDs are gathered by executing the scsi_id command on each LUN.
$ scsi_id -g -s /block/sdc #External LUN, returns 360a9800056724671684a514137392d65
$ scsi_id -g -s /block/sdd #External LUN, returns 360a9800056724671684a502d34555579
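On later RHEL releases the -g -s sysfs form of scsi_id is no longer accepted. If that is the case, the same WWIDs can usually be retrieved with the whitelisted form shown below; the binary location and options are those of the udev package on RHEL 6 and later, so verify them against your release.
$ /lib/udev/scsi_id --whitelisted --device=/dev/sdc
$ /lib/udev/scsi_id --whitelisted --device=/dev/sdd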
The following example of a multipath configuration file shows the Red Hat Cluster Suite quorum disk and, for the RAC/GFS node, the three Oracle Clusterware voting disks. This excerpt shows the stanzas that identify the WWIDs of the LUNs in the multipath.conf file.
multipath {
        no_path_retry   fail
        wwid            360a9800056724671684a514137392d65
        alias           qdisk
}
#The following 3 are voting disks that are necessary ONLY for the RAC/GFS configuration!
multipath {
        no_path_retry   fail
        wwid            360a9800056724671684a502d34555579
        alias           vote1
}
multipath {
        no_path_retry   fail
        wwid            360a9800056724671684a502d34555578
        alias           vote2
}
multipath {
        no_path_retry   fail
        wwid            360a9800056724671684a502d34555577
        alias           vote3
}
The only two parameters in the multipath configuration file that must be changed are path_grouping_policy (set to failover) and path_checker (set to tur). Historically, the default path checker was readsector0 or directio, both of which create an I/O request. For voting disks on heavily loaded clusters, this may cause voting “jitter”. The least invasive path-checking policy is TUR (Test Unit Ready), which rarely disturbs qdisk or Clusterware voting. TUR and zone isolation both reduce voting jitter. The voting LUNs could be further isolated into their own zone, but this would require dedicated WWPN pathways and is likely more trouble than it is worth.
Some storage vendors install their own HBA driver and require specific settings in the multipath.conf file, including priority procedures defined by the prio_callout parameter. Check with the vendor.
The following example shows the remaining portion of the multipath.conf file.
defaults {
        user_friendly_names     yes
        udev_dir                /dev
        polling_interval        10
        selector                "round-robin 0"
        path_grouping_policy    failover
        getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
        prio_callout            /bin/true
        path_checker            tur
        rr_min_io               100
        rr_weight               priorities
        failback                immediate
        no_path_retry           fail
}
Now that the multipath.conf file is complete, try restarting the multipath service.
$ service multipathd restart
$ tail -f /var/log/messages #Should see aliases listed
$ chkconfig multipathd on
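To confirm that the aliases took effect, the maps can be listed directly, and the corresponding nodes should exist under /dev/mapper.
$ multipath -ll #Each alias should show two active paths
$ ls -l /dev/mapper/qdisk /dev/mapper/vote* #vote* aliases exist only in the RAC/GFS configuration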
Customers who want to push the envelope to achieve both performance and reliability might be surprised to find that multibus is slower than failover in certain situations.
Aside from tweaking settings such as failback or a faster polling_interval, the bulk of the recovery latency is in the cluster take-over at the cluster and Oracle recovery layers. If high-speed takeover is a critical requirement, then consider using RAC.

Note

Because RAC (and therefore Clusterware) is certified for use with Red Hat Cluster Suite, customers may choose a third configuration option of using either OCFS2 or ASM. This is an unusual configuration, but it permits RAC/ASM use combined with the superior fencing of Red Hat Cluster Suite. This configuration is not covered in this manual.

4.2.3. qdisk Configuration

A successful DM-Multipath configuration should produce a set of identifiable inodes in the /dev/mapper directory. The /dev/mapper/qdisk inode will need to be initialized and enabled as a service (see the service commands after the cluster.conf excerpt below). This is one of the first pieces of information needed for the /etc/cluster/cluster.conf file.
$ mkqdisk -l HA585 -c /dev/mapper/qdisk
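To confirm that the label was written, the quorum disks can be listed back; /dev/mapper/qdisk should be reported with the HA585 label. The -L option is that of the mkqdisk shipped with the RHEL 5 cman package, so verify it against your release.
$ mkqdisk -L #Lists known quorum disks and their labels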
By convention, the label is the same name as the cluster; in this case, HA585. The quorumd section of the cluster.conf file looks like the following.

<?xml version="1.0"?>
<cluster config_version="1" name="HA585">
        <fence_daemon post_fail_delay="0" post_join_delay="3"/>
        <quorumd interval="7" device="/dev/mapper/qdisk" tko="9" votes="3" log_level="5"/>
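With the label written and the quorumd entry in place, the quorum-disk daemon can be started and made persistent across reboots; the service name below is the one shipped with the RHEL 5 cman package.
$ service qdiskd start
$ chkconfig qdiskd on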

Note

You may need to change the maximum journal size for a partition. The following procedure provides an example of changing the maximum journal size of an existing partition to 400MB.
tune2fs -l /dev/mapper/vg1-oracle | grep -i "journal inode"
debugfs -R "stat <8>" /dev/mapper/vg1-oracle 2>&1 | awk '/Size: /{print $6}'
tune2fs -O ^has_journal /dev/mapper/vg1-oracle
tune2fs -J size=400 /dev/mapper/vg1-oracle

Warning

Fencing in two-node clusters is more prone to fence and quorum race conditions than fencing in clusters with three or more nodes. If node 1 can no longer communicate with node 2, then which node is actually the odd man out? Most of these races are resolved by the quorum disk, which is why it is important for the HA case, and mandatory for RAC/GFS.

Note

Red Hat Cluster Suite must be implemented with qdisk, or the configuration is unsupported. Red Hat Cluster Suite has to retain quorum to support a single, surviving RAC node. This single-node operation is required for certified combinations of RAC/GFS.