Appendix A. Appendix
A.1. Scaling
To scale HCI nodes up or down, apply the same principles (and, for the most part, the same methods) used for scaling Compute or Ceph Storage nodes. Be mindful of the following caveats:
A.1.1. Scaling Up
To scale up HCI nodes in a pure HCI environment, use the same methods for scaling up Compute nodes. See Adding Additional Nodes (from Director Installation and Usage) for details.
The same methods apply for scaling up HCI nodes in a mixed HCI environment. When you tag new nodes, remember to use the correct flavor (in this case, osdcompute). See Section 3.2.3, “Creating and Assigning a New Flavor”.
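For reference, tagging a new node into the osdcompute flavor typically involves setting the node's profile capability and verifying the flavor's matching property. The following is a minimal sketch; the node UUID is a placeholder, and the exact capability string depends on how you created the flavor in Section 3.2.3, “Creating and Assigning a New Flavor”:
$ openstack baremetal node set <NODE_UUID> \
    --property capabilities="profile:osdcompute,boot_option:local"
$ openstack flavor show osdcompute -c properties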
A.1.2. Scaling Down
The process for scaling down HCI nodes (in both pure and mixed HCI environments) can be summarized as follows:
- Disable and rebalance the Ceph OSD services on the HCI node. This step is necessary because the director does not automatically rebalance the Red Hat Ceph Storage cluster when you remove HCI or Ceph Storage nodes. See Scaling Down and Replacing Ceph Storage Nodes (from Red Hat Ceph Storage for the Overcloud). Do not follow the steps in that procedure for removing the node yet; you must first migrate the instances and disable the Compute services on the node.
- Migrate the instances from the HCI nodes. See Migrating Instances for instructions.
- Disable the Compute services on the nodes to prevent them from being used to spawn new instances.
- Remove the node from the overcloud.
For the third and fourth steps (disabling the Compute services and removing the node), see Removing Compute Nodes from Director Installation and Usage.
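As a rough illustration of how these steps map to commands, the sketch below assumes a single HCI node named overcloud-osdcompute-3 hosting OSDs 0 through 2; the OSD IDs, host names, and UUIDs are placeholders, and the documents referenced above remain the authoritative procedures:
# On the HCI node: take its OSDs out so Ceph rebalances, then stop them
$ sudo ceph osd out 0 1 2
$ sudo systemctl stop ceph-osd@0 ceph-osd@1 ceph-osd@2

# From the undercloud: migrate instances, disable the Compute service,
# then remove the node from the overcloud
$ nova live-migration <INSTANCE_UUID>
$ openstack compute service set --disable overcloud-osdcompute-3.localdomain nova-compute
$ openstack overcloud node delete --stack overcloud <NODE_UUID>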
A.2. Upstream Tools
Heat templates, environment files, scripts, and other resources relevant to hyper-convergence in OpenStack are available from the following upstream GitHub repository:
https://github.com/RHsyseng/hci/tree/master/custom-templates
This repository features the scripts in Section A.2.1, “Compute CPU and Memory Calculator” and Section A.2.2, “Custom Script to Configure NUMA Pinning for Ceph OSD Services”.
To use these scripts, clone the repository directly to your undercloud:
$ git clone https://github.com/RHsyseng/hci
A.2.1. Compute CPU and Memory Calculator
Section 4.1, “Reserve CPU and Memory Resources for Compute” explains how to determine suitable values for reserved_host_memory_mb and cpu_allocation_ratio on hyper-converged nodes. You can also use the following Python script (also available from the repository in Section A.2, “Upstream Tools”) to perform the computations for you:
#!/usr/bin/env python
# Filename:                nova_mem_cpu_calc.py
# Supported Language(s):   Python 2.7.x
# Time-stamp:              <2017-03-10 20:31:18 jfulton>
# -------------------------------------------------------
# This program was originally written by Ben England
# -------------------------------------------------------
# Calculates cpu_allocation_ratio and reserved_host_memory
# for nova.conf based on the following inputs:
#
# input command line parameters:
# 1 - total host RAM in GB
# 2 - total host cores
# 3 - Ceph OSDs per server
# 4 - average guest size in GB
# 5 - average guest CPU utilization (0.0 to 1.0)
#
# It assumes that we want to allow 3 GB per OSD
# (based on prior Ceph Hammer testing)
# and that we want to allow an extra 1/2 GB per Nova (KVM guest)
# based on test observations that KVM guests' virtual memory footprint
# was actually significantly bigger than the declared guest memory size
# This is more of a factor for small guests than for large guests.
# -------------------------------------------------------
import sys
from sys import argv

NOTOK = 1  # process exit status signifying failure
MB_per_GB = 1000

GB_per_OSD = 3
GB_overhead_per_guest = 0.5  # based on measurement in test environment
cores_per_OSD = 1.0  # may be a little low in I/O intensive workloads


def usage(msg):
    print msg
    print(
        ("Usage: %s Total-host-RAM-GB Total-host-cores OSDs-per-server " +
         "Avg-guest-size-GB Avg-guest-CPU-util") % sys.argv[0])
    sys.exit(NOTOK)


if len(argv) < 6:
    usage("Too few command line params")
try:
    mem = int(argv[1])
    cores = int(argv[2])
    osds = int(argv[3])
    average_guest_size = int(argv[4])
    average_guest_util = float(argv[5])
except ValueError:
    usage("Non-integer input parameter")

average_guest_util_percent = 100 * average_guest_util

# print inputs
print "Inputs:"
print "- Total host RAM in GB: %d" % mem
print "- Total host cores: %d" % cores
print "- Ceph OSDs per host: %d" % osds
print "- Average guest memory size in GB: %d" % average_guest_size
print "- Average guest CPU utilization: %.0f%%" % average_guest_util_percent

# calculate operating parameters based on memory constraints only
left_over_mem = mem - (GB_per_OSD * osds)
number_of_guests = int(left_over_mem /
                       (average_guest_size + GB_overhead_per_guest))
nova_reserved_mem_MB = MB_per_GB * (
    (GB_per_OSD * osds) +
    (number_of_guests * GB_overhead_per_guest))
nonceph_cores = cores - (cores_per_OSD * osds)
guest_vCPUs = nonceph_cores / average_guest_util
cpu_allocation_ratio = guest_vCPUs / cores

# display outputs including how to tune Nova reserved mem
print "\nResults:"
print "- number of guests allowed based on memory = %d" % number_of_guests
print "- number of guest vCPUs allowed = %d" % int(guest_vCPUs)
print "- nova.conf reserved_host_memory = %d MB" % nova_reserved_mem_MB
print "- nova.conf cpu_allocation_ratio = %f" % cpu_allocation_ratio

if nova_reserved_mem_MB > (MB_per_GB * mem * 0.8):
    print "ERROR: you do not have enough memory to run hyperconverged!"
    sys.exit(NOTOK)

if cpu_allocation_ratio < 0.5:
    print "WARNING: you may not have enough CPU to run hyperconverged!"

if cpu_allocation_ratio > 16.0:
    print(
        "WARNING: do not increase VCPU overcommit ratio " +
        "beyond OSP8 default of 16:1")
    sys.exit(NOTOK)

print "\nCompare \"guest vCPUs allowed\" to \"guests allowed based on memory\" for actual guest count"
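For example, invoking the script for a hypothetical node with 256 GB of RAM, 56 cores, 10 OSDs, an average guest size of 2 GB, and 100% average guest CPU utilization yields values that you can then apply as described in Section 4.1, “Reserve CPU and Memory Resources for Compute” (output abbreviated):
$ python nova_mem_cpu_calc.py 256 56 10 2 1.0
...
- nova.conf reserved_host_memory = 75000 MB
- nova.conf cpu_allocation_ratio = 0.821429
The checks near the end of the script warn you when the reserved memory or the allocation ratio indicates that a hyper-converged deployment is impractical on the given hardware.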
A.2.2. Custom Script to Configure NUMA Pinning for Ceph OSD Services
The Configure Ceph NUMA Pinning section describes the creation of a script that pins the Ceph OSD services to an available NUMA socket. This script, ~/templates/numa-systemd-osd.sh, should:
- Take the network interface used for Ceph network traffic
- Use lstopo to determine that interface’s NUMA socket
- Configure numactl to start the OSD service with a NUMA policy that prefers the NUMA node of the Ceph network’s interface
- Restart each Ceph OSD daemon sequentially so that the service runs with the new NUMA option
The numa-systemd-osd.sh script also attempts to install NUMA configuration tools. As such, the overcloud nodes must also be registered with Red Hat, as described in Registering the Nodes (from Red Hat Ceph Storage for the Overcloud).
The following script performs all of these steps for Mixed HCI deployments, assuming that the hostname of each hyper-converged node contains either ceph or osd-compute. Edit the top-level if statement accordingly if you are customizing the hostnames of hyper-converged nodes:
#!/usr/bin/env bash
{
  if [[ `hostname` = *"ceph"* ]] || [[ `hostname` = *"osd-compute"* ]]; then # 1

    # Verify the passed network interface exists
    if [[ ! $(ip add show $OSD_NUMA_INTERFACE) ]]; then
      exit 1
    fi

    # If NUMA related packages are missing, then install them
    # If packages are baked into image, no install attempted
    for PKG in numactl hwloc; do
      if [[ ! $(rpm -q $PKG) ]]; then
        yum install -y $PKG
        if [[ ! $? ]]; then
          echo "Unable to install $PKG with yum"
          exit 1
        fi
      fi
    done

    if [[ ! $(lstopo-no-graphics | tr -d [:punct:] | egrep "NUMANode|$OSD_NUMA_INTERFACE") ]]; then
      echo "No NUMAnodes found. Exiting."
      exit 1
    fi

    # Find the NUMA socket of the $OSD_NUMA_INTERFACE
    declare -A NUMASOCKET
    while read TYPE SOCKET_NUM NIC ; do
      if [[ "$TYPE" == "NUMANode" ]]; then
        NUMASOCKET=$(echo $SOCKET_NUM | sed s/L//g);
      fi
      if [[ "$NIC" == "$OSD_NUMA_INTERFACE" ]]; then
        # because $NIC is the $OSD_NUMA_INTERFACE,
        # the NUMASOCKET has been set correctly above
        break # so stop looking
      fi
    done < <(lstopo-no-graphics | tr -d [:punct:] | egrep "NUMANode|$OSD_NUMA_INTERFACE")

    if [[ -z $NUMASOCKET ]]; then
      echo "No NUMAnode found for $OSD_NUMA_INTERFACE. Exiting."
      exit 1
    fi

    UNIT='/usr/lib/systemd/system/ceph-osd@.service'

    # Preserve the original ceph-osd start command
    CMD=$(crudini --get $UNIT Service ExecStart)

    if [[ $(echo $CMD | grep numactl) ]]; then
      echo "numactl already in $UNIT. No changes required."
      exit 0
    fi

    # NUMA control options to append in front of $CMD
    NUMA="/usr/bin/numactl -N $NUMASOCKET --preferred=$NUMASOCKET"

    # Update the unit file to start with numactl
    # TODO: why doesn't a copy of $UNIT in /etc/systemd/system work with numactl?
    crudini --verbose --set $UNIT Service ExecStart "$NUMA $CMD"

    # Reload so updated file is used
    systemctl daemon-reload

    # Restart OSDs with NUMA policy (print results for log)
    OSD_IDS=$(ls /var/lib/ceph/osd | awk 'BEGIN { FS = "-" } ; { print $2 }')
    for OSD_ID in $OSD_IDS; do
      echo -e "\nStatus of OSD $OSD_ID before unit file update\n"
      systemctl status ceph-osd@$OSD_ID
      echo -e "\nRestarting OSD $OSD_ID..."
      systemctl restart ceph-osd@$OSD_ID
      echo -e "\nStatus of OSD $OSD_ID after unit file update\n"
      systemctl status ceph-osd@$OSD_ID
    done
  fi
} 2>&1 > /root/post_deploy_heat_output.txt
- 1: The top-level if statement assumes that the hostname of each hyper-converged node contains either ceph or osd-compute. Edit the if statement accordingly if you are customizing the hostnames of hyper-converged nodes. Likewise, on a Pure HCI deployment all Compute nodes are hyper-converged; in that case, change the top-level if statement to:

if [[ `hostname` = *"compute"* ]]; then
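If you want to verify the script's behavior on a single hyper-converged node before applying it through your deployment, one approach is to copy it to the node, set OSD_NUMA_INTERFACE by hand, and then inspect the updated unit file and the log the script writes. The interface name em2 and the script location below are only examples; use the interface that carries your Ceph storage network traffic:
$ sudo -i
# export OSD_NUMA_INTERFACE=em2
# bash /root/numa-systemd-osd.sh
# grep numactl /usr/lib/systemd/system/ceph-osd@.service
# less /root/post_deploy_heat_output.txt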