Administration Guide

Red Hat Gluster Storage 3.4

Configuring and Managing Red Hat Gluster Storage

Red Hat Gluster Storage Documentation Team

Abstract

Red Hat Gluster Storage Administration Guide describes the configuration and management of Red Hat Gluster Storage for On-Premise.

Part I. Preface

Chapter 1. Preface

1.1. About Red Hat Gluster Storage

Red Hat Gluster Storage is a software-only, scale-out storage solution that provides flexible and agile unstructured data storage for the enterprise.
Red Hat Gluster Storage provides new opportunities to unify data storage and infrastructure, increase performance, and improve availability and manageability in order to meet a broader set of an organization’s storage challenges and needs.
The product can be installed and managed on-premises, or in a public cloud.

1.2. About glusterFS

glusterFS aggregates various storage servers over network interconnects into one large parallel network file system. Based on a stackable user space design, it delivers exceptional performance for diverse workloads and is a key building block of Red Hat Gluster Storage.
The POSIX-compatible glusterFS servers, which use the XFS file system to store data on disks, can be accessed using industry-standard access protocols, including Network File System (NFS) and Server Message Block (SMB), also known as CIFS.

1.3. About On-premises Installation

Red Hat Gluster Storage for On-Premise allows physical storage to be utilized as a virtualized, scalable, and centrally managed pool of storage.
Red Hat Gluster Storage can be installed on commodity servers resulting in a powerful, massively scalable, and highly available NAS environment.

Part II. Overview

Chapter 2. Architecture and Concepts

This chapter provides an overview of Red Hat Gluster Storage architecture and storage concepts.

2.1. Architecture

At the core of the Red Hat Gluster Storage design is a completely new method of architecting storage. The result is a system that has immense scalability, is highly resilient, and offers extraordinary performance.
In a scale-out system, one of the biggest challenges is keeping track of the logical and physical locations of data and metadata. Most distributed systems solve this problem by creating a metadata server to track the location of data and metadata. As traditional systems add more files, more servers, or more disks, the central metadata server becomes a performance bottleneck, as well as a central point of failure.
Unlike other traditional storage solutions, Red Hat Gluster Storage does not need a metadata server, and locates files algorithmically using an elastic hashing algorithm. This no-metadata server architecture ensures better performance, linear scalability, and reliability.

Figure 2.1. Red Hat Gluster Storage Architecture

2.2. On-premises Architecture

Red Hat Gluster Storage for On-premises enables enterprises to treat physical storage as a virtualized, scalable, and centrally managed storage pool by using commodity storage hardware.
It supports multi-tenancy by partitioning users or groups into logical volumes on shared storage. It enables users to eliminate, decrease, or manage their dependence on high-cost, monolithic and difficult-to-deploy storage arrays.
You can add capacity in a matter of minutes across a wide variety of workloads without affecting performance. Storage can also be centrally managed across a variety of workloads, thus increasing storage efficiency.

Figure 2.2. Red Hat Gluster Storage for On-premises Architecture

Red Hat Gluster Storage for On-premises is based on glusterFS, an open source distributed file system with a modular, stackable design, and a unique no-metadata server architecture. This no-metadata server architecture ensures better performance, linear scalability, and reliability.

2.3. Storage Concepts

The following are common file system and storage terms used throughout the Red Hat Gluster Storage Administration Guide.
Brick
The glusterFS basic unit of storage, represented by an export directory on a server in the trusted storage pool. A brick is expressed by combining a server with an export directory in the following format:
SERVER:EXPORT
For example:
myhostname:/exports/myexportdir/
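Because a brick is just SERVER:EXPORT, scripts can split it with plain shell parameter expansion. The helper names below (brick_server, brick_export, brick_is_valid) are hypothetical, and the sketch assumes host names or IPv4 addresses, since an IPv6 address would itself contain colons.

```sh
# Split a brick written as SERVER:EXPORT into its two parts.
# These helpers are illustrative only; they are not part of any Gluster tool.
brick_server() {
    # Drop everything from the first colon onward.
    printf '%s\n' "${1%%:*}"
}

brick_export() {
    # Drop everything up to and including the first colon.
    printf '%s\n' "${1#*:}"
}

brick_is_valid() {
    case "$1" in *:*) ;; *) return 1 ;; esac                # must contain a colon
    [ -n "${1%%:*}" ] || return 1                           # server part must be non-empty
    case "${1#*:}" in /*) return 0 ;; *) return 1 ;; esac   # export must be an absolute path
}

brick_server myhostname:/exports/myexportdir/   # myhostname
brick_export myhostname:/exports/myexportdir/   # /exports/myexportdir/
```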
Volume
A volume is a logical collection of bricks. Most of the Red Hat Gluster Storage management operations happen on the volume.
Translator
A translator connects to one or more subvolumes, performs an operation on the requests that pass through it, and offers the result as a subvolume connection.
Subvolume
A brick after being processed by at least one translator.
Volfile
Volume (vol) files are configuration files that determine the behavior of your Red Hat Gluster Storage trusted storage pool. At a high level, GlusterFS has three entities: the server, the client, and the management daemon. Each of these entities has its own volume files. Volume files for servers and clients are generated by the management daemon when a volume is created.
Server and client vol files are located in the /var/lib/glusterd/vols/VOLNAME directory. The management daemon vol file is named glusterd.vol and is located in the /etc/glusterfs/ directory.

Warning

You must not modify any vol file in /var/lib/glusterd manually as Red Hat does not support vol files that are not generated by the management daemon.
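To inspect (never edit) the vol files generated for a volume, you can list them read-only. This sketch assumes the default /var/lib/glusterd working directory; the list_volfiles name and the GLUSTERD_WORKDIR override are hypothetical, added only so the function can be pointed at another directory.

```sh
# List the vol files that glusterd generated for one volume, without touching them.
# GLUSTERD_WORKDIR is a hypothetical override; the default matches the path above.
list_volfiles() {
    volname=$1
    dir="${GLUSTERD_WORKDIR:-/var/lib/glusterd}/vols/$volname"
    [ -d "$dir" ] || { echo "no such volume directory: $dir" >&2; return 1; }
    for f in "$dir"/*.vol; do
        # Skip the literal pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example: list_volfiles sample_volname
```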
glusterd
glusterd is the glusterFS Management Service that must run on all servers in the trusted storage pool.
Cluster
A trusted pool of linked computers working together, resembling a single computing resource. In Red Hat Gluster Storage, a cluster is also referred to as a trusted storage pool.
Client
The machine that mounts a volume (this may also be a server).
File System
A method of storing and organizing computer files. A file system organizes files into a database for the storage, manipulation, and retrieval by the computer's operating system.
Source: Wikipedia
Distributed File System
A file system that allows multiple clients to concurrently access data which is spread across servers/bricks in a trusted storage pool. Data sharing among multiple locations is fundamental to all distributed file systems.
Virtual File System (VFS)
VFS is a kernel software layer that handles all system calls related to the standard Linux file system. It provides a common interface to several kinds of file systems.
POSIX
Portable Operating System Interface (for Unix) (POSIX) is the name of a family of related standards specified by the IEEE to define the application programming interface (API), as well as shell and utilities interfaces, for software that is compatible with variants of the UNIX operating system. Red Hat Gluster Storage exports a fully POSIX compatible file system.
Metadata
Metadata is data providing information about other pieces of data.
FUSE
Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a "bridge" to the kernel interfaces.
Source: Wikipedia
Geo-Replication
Geo-replication provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LAN), Wide Area Networks (WAN), and the Internet.
N-way Replication
Local synchronous data replication that is typically deployed across a campus or across Amazon Web Services Availability Zones.
Petabyte
A petabyte is a unit of information equal to one quadrillion bytes, or 1000 terabytes. The unit symbol for the petabyte is PB. The prefix peta- (P) indicates a power of 1000:
1 PB = 1,000,000,000,000,000 B = 1000^5 B = 10^15 B.
The term "pebibyte" (PiB), using a binary prefix, is used for the corresponding power of 1024.
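The gap between the decimal and binary units can be checked with shell integer arithmetic, which is 64-bit in common shells and therefore large enough for both values.

```sh
# Decimal petabyte versus binary pebibyte, in bytes.
PB=$((1000 * 1000 * 1000 * 1000 * 1000))   # 1000^5 = 10^15
PIB=$((1024 * 1024 * 1024 * 1024 * 1024))  # 1024^5 = 2^50

echo "1 PB  = $PB bytes"                 # 1 PB  = 1000000000000000 bytes
echo "1 PiB = $PIB bytes"                # 1 PiB = 1125899906842624 bytes
echo "PiB - PB = $((PIB - PB)) bytes"    # PiB - PB = 125899906842624 bytes
```

A pebibyte is therefore about 12.6% larger than a petabyte, which matters when comparing capacity figures reported in different units.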
Source: Wikipedia
RAID
Redundant Array of Independent Disks (RAID) is a technology that provides increased storage reliability through redundancy. It combines multiple low-cost, less-reliable disk drive components into a logical unit where all drives in the array are interdependent.
RRDNS
Round Robin Domain Name System (RRDNS) is a method to distribute load across application servers. RRDNS is implemented by creating multiple records with the same name and different IP addresses in the zone file of a DNS server.
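In zone-file terms, a round-robin name is simply several A records sharing one owner name. The sketch below prints such records; rrdns_records is a hypothetical helper, and gluster.example.com with the 192.0.2.x documentation addresses are placeholders.

```sh
# Print one A record per address, all with the same owner name, in zone-file syntax.
rrdns_records() {
    name=$1; shift
    for ip in "$@"; do
        printf '%s. IN A %s\n' "$name" "$ip"
    done
}

rrdns_records gluster.example.com 192.0.2.11 192.0.2.12 192.0.2.13
# gluster.example.com. IN A 192.0.2.11
# gluster.example.com. IN A 192.0.2.12
# gluster.example.com. IN A 192.0.2.13
```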
Server
The machine (virtual or bare metal) that hosts the file system in which data is stored.
Block Storage
Block special files, or block devices, correspond to devices through which the system moves data in the form of blocks. These device nodes often represent addressable devices such as hard disks, CD-ROM drives, or memory regions. As of Red Hat Gluster Storage 3.4, block storage supports only Container-Native Storage (CNS) and Container-Ready Storage (CRS) use cases. Block storage can be created and configured for this use case by using the gluster-block command line tool. For more information, see Container-Native Storage for OpenShift Container Platform.
Scale-Up Storage
Increases the capacity of the storage device in a single dimension. For example, adding more disk capacity to a trusted storage pool.
Scale-Out Storage
Increases the capability of a storage device in multiple dimensions. For example, adding more systems of the same size, or adding servers to a trusted storage pool, which increases CPU, disk capacity, and throughput for the trusted storage pool.
Trusted Storage Pool
A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of only that server.
Namespace
An abstract container or environment that is created to hold a logical grouping of unique identifiers or symbols. Each Red Hat Gluster Storage trusted storage pool exposes a single namespace as a POSIX mount point which contains every file in the trusted storage pool.
User Space
Applications running in user space do not directly interact with hardware, instead using the kernel to moderate access. User space applications are generally more portable than applications in kernel space. glusterFS is a user space application.
Distributed Hash Table Terminology

Hashed subvolume
A Distributed Hash Table Translator subvolume to which the file or directory name is hashed.
Cached subvolume
A Distributed Hash Table Translator subvolume where the file content is actually present. For directories, the concept of a cached subvolume is not relevant; the term is loosely used to mean subvolumes that are not the hashed subvolume.
Linkto-file
For a newly created file, the hashed and cached subvolumes are the same. When a directory entry operation such as rename (which can change the file's name, and therefore its hashed subvolume) is performed on the file, the file's data is not moved to the new hashed subvolume. Instead, a file with the same name is created on the newly hashed subvolume. This file acts only as a pointer to the node where the data is present: the name of the cached subvolume is stored in its extended attributes. This file on the newly hashed subvolume is called a linkto-file. Linkto-files are relevant only for non-directory entities.
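On a brick, linkto-files can usually be recognized as zero-length files whose only mode bit is the sticky bit (mode 1000, shown by ls as ---------T), with the cached subvolume name stored in the trusted.glusterfs.dht.linkto extended attribute. The sketch below only reads the brick; the function names are hypothetical, the getfattr step must run as root on the server, and the .glusterfs directory is skipped because it holds hard links to brick files.

```sh
# Find candidate linkto-files on a brick: zero-length regular files whose mode is
# exactly 1000 (sticky bit only). The .glusterfs directory is pruned from the search.
find_linkto_candidates() {
    brick_dir=$1
    find "$brick_dir" -path "$brick_dir/.glusterfs" -prune -o \
        -type f -perm 1000 -size 0 -print
}

# Print the cached subvolume recorded in one linkto-file (run as root on the server).
show_linkto_target() {
    getfattr --absolute-names -n trusted.glusterfs.dht.linkto -e text "$1"
}

# Example (read-only):
# find_linkto_candidates /rhgs/brick1/b1 | while IFS= read -r f; do show_linkto_target "$f"; done
```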
Directory Layout
The directory layout helps determine where files in a gluster volume are stored.
When a client creates or requests a file, the DHT translator hashes the file's path to create an integer. Each directory in a gluster subvolume holds files that have integers in a specific range, so the hash of any given file maps to a specific subvolume in the gluster volume. The directory layout determines which integer ranges are assigned to a given directory across all subvolumes.
Directory layouts are assigned when a directory is first created, and can be reassigned by running a rebalance operation on the volume. If a brick or subvolume is offline when a directory is created, it will not be part of the layout until after a rebalance is run.
You should rebalance a volume to recalculate its directory layout after bricks are added to the volume. See Section 11.11, “Rebalancing Volumes” for more information.
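The range idea can be sketched in a few lines: split the 32-bit hash space into one contiguous range per subvolume, hash the file name, and pick the subvolume whose range contains the hash. Because any client can repeat this calculation, no metadata server is needed to locate a file. This is a simplification under stated assumptions: cksum (a CRC) stands in for Gluster's own hash function, the ranges are equal-sized, and one layout is used everywhere, whereas the real DHT translator assigns ranges per directory and records them in the trusted.glusterfs.dht extended attribute. The function names are hypothetical.

```sh
HASH_SPACE=4294967296   # 2^32 possible hash values

# layout_range INDEX COUNT: print "START END" of the range owned by subvolume INDEX (0-based).
layout_range() {
    size=$((HASH_SPACE / $2))
    start=$(($1 * size))
    if [ "$1" -eq $(($2 - 1)) ]; then
        end=$((HASH_SPACE - 1))   # the last range absorbs any remainder
    else
        end=$((start + size - 1))
    fi
    printf '%s %s\n' "$start" "$end"
}

# hashed_subvol NAME COUNT: print the index of the subvolume whose range contains hash(NAME).
hashed_subvol() {
    # cksum prints "CRC LENGTH"; the CRC is an unsigned 32-bit value used as a stand-in hash.
    hash=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
    size=$((HASH_SPACE / $2))
    idx=$((hash / size))
    if [ "$idx" -ge "$2" ]; then idx=$(($2 - 1)); fi
    printf '%s\n' "$idx"
}

layout_range 0 4             # 0 1073741823
layout_range 3 4             # 3221225472 4294967295
hashed_subvol report.txt 4   # the same index on every client
```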
Fix Layout
A command that is executed during the rebalance process.
The rebalance process itself comprises two stages:
  1. Fixes the layouts of directories to accommodate any subvolumes that are added or removed. It also heals the directories, checks whether the layout is non-contiguous, and persists the layout in extended attributes, if needed. It also ensures that the directories have the same attributes across all the subvolumes.
  2. Migrates the data from the cached-subvolume to the hashed-subvolume.
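In command form, the first stage can be run on its own with fix-layout, while a plain rebalance start runs both stages. The wrapper below is a hypothetical convenience around the gluster volume rebalance commands; GLUSTER defaults to the real gluster binary and can be set to echo for a dry run.

```sh
GLUSTER=${GLUSTER:-gluster}   # set GLUSTER=echo to print the commands instead of running them

# rebalance_volume VOLNAME layout-only|full|status
rebalance_volume() {
    volname=$1
    case "$2" in
        layout-only) $GLUSTER volume rebalance "$volname" fix-layout start ;;  # stage 1 only
        full)        $GLUSTER volume rebalance "$volname" start ;;             # stages 1 and 2
        status)      $GLUSTER volume rebalance "$volname" status ;;
        *) echo "usage: rebalance_volume VOLNAME layout-only|full|status" >&2; return 2 ;;
    esac
}
```

For example, after adding bricks you might run rebalance_volume sample_volname full and then poll rebalance_volume sample_volname status until migration completes.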

Part III. Configure and Verify

Chapter 3. Considerations for Red Hat Gluster Storage

3.1. Firewall and Port Access

Red Hat Gluster Storage requires access to a number of ports in order to work properly. Ensure that port access is available as indicated in Section 3.1.2, “Port Access Requirements”.

3.1.1. Configuring the Firewall

Firewall configuration tools differ between Red Hat Enterprise Linux 6 and Red Hat Enterprise Linux 7.
For Red Hat Enterprise Linux 6, use the iptables command to open a port:
# iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
# service iptables save
For Red Hat Enterprise Linux 7, if Red Hat Gluster Storage uses the default ports, it is usually simpler to add the glusterfs service than to open individual ports. The first command changes the runtime configuration and the second makes the change permanent:
# firewall-cmd --zone=zone_name --add-service=glusterfs
# firewall-cmd --zone=zone_name --add-service=glusterfs --permanent
However, if the default ports are already in use by other services, open each specific port that Red Hat Gluster Storage uses instead:
# firewall-cmd --zone=zone_name --add-port=port/protocol
# firewall-cmd --zone=zone_name --add-port=port/protocol --permanent
For example:
# firewall-cmd --zone=public --add-port=5667/tcp
# firewall-cmd --zone=public --add-port=5667/tcp --permanent
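When several ports must be opened, the runtime and permanent commands can be looped over a list. This is a sketch: open_ports is a hypothetical helper, and FIREWALL_CMD can be set to echo to preview the commands. firewall-cmd also accepts a port range such as 49152-49154/tcp.

```sh
FIREWALL_CMD=${FIREWALL_CMD:-firewall-cmd}   # set to echo for a dry run

# open_ports ZONE PORT/PROTOCOL...
open_ports() {
    zone=$1; shift
    for port in "$@"; do
        # Apply to the runtime configuration first, then persist the same rule.
        $FIREWALL_CMD --zone="$zone" --add-port="$port" || return 1
        $FIREWALL_CMD --zone="$zone" --add-port="$port" --permanent || return 1
    done
}

# Example: glusterd, the events daemon, and three brick ports in the public zone.
# open_ports public 24007/tcp 24009/tcp 49152-49154/tcp
```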

3.1.2. Port Access Requirements

Table 3.1. Open the following ports on all storage servers
Connection source | TCP ports | UDP ports | Recommended for | Used for
--- | --- | --- | --- | ---
Any authorized network entity with a valid SSH key | 22 | - | All configurations | Remote backup using geo-replication
Any authorized network entity; be cautious not to clash with other RPC services | 111 | 111 | All configurations | RPC port mapper and RPC bind
Any authorized SMB/CIFS client | 139 and 445 | 137 and 138 | Sharing storage using SMB/CIFS | SMB/CIFS protocol
Any authorized NFS clients | 2049 | 2049 | Sharing storage using Gluster NFS or NFS-Ganesha | Exports using NFS protocol
All servers in the Samba-CTDB cluster | 4379 | - | Sharing storage using SMB and Gluster NFS | CTDB
Any authorized network entity | 24007 | - | All configurations | Management processes using glusterd
Any authorized network entity | 24009 | - | All configurations | Gluster events daemon
Any network entity monitored by Nagios | 5666 | - | Monitoring using Red Hat Gluster Storage Console and Nagios | NRPE service
NFSv3 clients | 662 | 662 | Sharing storage using NFS-Ganesha and Gluster NFS | statd
NFSv3 clients | 32803 | 32803 | Sharing storage using NFS-Ganesha and Gluster NFS | NLM protocol
NFSv3 clients sending mount requests | - | 32769 | Sharing storage using Gluster NFS | Gluster NFS MOUNT protocol
NFSv3 clients sending mount requests | 20048 | 20048 | Sharing storage using NFS-Ganesha | NFS-Ganesha MOUNT protocol
NFS clients | 875 | 875 | Sharing storage using NFS-Ganesha | NFS-Ganesha RQUOTA protocol (fetching quota information)
Servers in pacemaker/corosync cluster | 2224 | - | Sharing storage using NFS-Ganesha | pcsd
Servers in pacemaker/corosync cluster | 3121 | - | Sharing storage using NFS-Ganesha | pacemaker_remote
Servers in pacemaker/corosync cluster | - | 5404 and 5405 | Sharing storage using NFS-Ganesha | corosync
Servers in pacemaker/corosync cluster | 21064 | - | Sharing storage using NFS-Ganesha | dlm
Any authorized network entity accessing the gluster-swift proxy server in SSL/TLS mode (an SSL/TLS certificate is required) | 443 | - | Object storage configurations | HTTPS requests
Any authorized network entity with valid object server gluster-swift credentials | 6010 | - | Object storage configurations | Object server
Any authorized network entity with valid container server gluster-swift credentials | 6011 | - | Object storage configurations | Container server
Any authorized network entity with valid gluster-swift account credentials | 6012 | - | Object storage configurations | Account server
Any authorized network entity with valid gluster-swift proxy credentials | 8080 | - | Object storage configurations | Proxy server
Any authorized network entity | 49152 - 49664 | - | All configurations | Brick communication ports. The total number of ports required depends on the number of bricks on the node: one port is required for each brick on the machine.
Table 3.2. Open the following ports on NFS-Ganesha and Gluster NFS storage clients
Connection source | TCP ports | UDP ports | Recommended for | Used for
--- | --- | --- | --- | ---
NFSv3 servers | 662 | 662 | Sharing storage using NFS-Ganesha and Gluster NFS | statd
NFSv3 servers | 32803 | 32803 | Sharing storage using NFS-Ganesha and Gluster NFS | NLM protocol
Table 3.3. Open the following ports on all Nagios servers
Connection source | TCP ports | UDP ports | Recommended for | Used for
--- | --- | --- | --- | ---
Console clients | 80 | - | Monitoring using Red Hat Gluster Storage Console and Nagios | HTTP protocol when the Nagios server runs on a Red Hat Gluster Storage server
Console clients | 443 | - | Monitoring using Red Hat Gluster Storage Console and Nagios | HTTPS protocol when the Nagios server runs on a Red Hat Gluster Storage server
Servers monitored by Nagios | 5667 | - | Monitoring using Red Hat Gluster Storage Console and Nagios | NSCA service when the Nagios server runs on a Red Hat Gluster Storage server
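Table 3.1 lists one brick port per brick, starting at 49152. Assuming ports are allocated consecutively from that base (the usual starting point, but an assumption that may not hold after bricks are removed or restarted), the range to open on a node can be computed as below; brick_port_range is a hypothetical helper.

```sh
BASE_PORT=49152   # first brick port listed in Table 3.1
MAX_PORT=49664    # last brick port listed in Table 3.1

# brick_port_range N: print a firewall-cmd style port range for N bricks on one node.
brick_port_range() {
    count=$1
    [ "$count" -ge 1 ] || { echo "brick count must be at least 1" >&2; return 1; }
    last=$((BASE_PORT + count - 1))
    if [ "$last" -gt "$MAX_PORT" ]; then
        echo "$count bricks exceed the documented range ending at $MAX_PORT" >&2
        return 1
    fi
    printf '%s-%s/tcp\n' "$BASE_PORT" "$last"
}

brick_port_range 3   # 49152-49154/tcp
```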

3.2. Feature Compatibility Support

Red Hat Gluster Storage supports a number of features. Most features can be used together, but there are some exceptions. This section identifies which features are supported and compatible with each other, to help you plan your Red Hat Gluster Storage deployment.

Note

Internet Protocol Version 6 (IPv6) support is available only for Red Hat Hyperconverged Infrastructure for Virtualization environments and not for Red Hat Gluster Storage standalone environments.
Features in the following table are supported from the specified version and later.
Table 3.4. Features supported by Red Hat Gluster Storage version
Feature | Version
--- | ---
Arbiter bricks | 3.2
Bitrot detection | 3.1
Erasure coding | 3.1
Google Compute Engine | 3.1.3
Metadata caching | 3.2
Microsoft Azure | 3.1.3
NFS version 4 | 3.1
SELinux | 3.1
Sharding | 3.2.0
Snapshots | 3.0
Snapshots, cloning | 3.1.3
Snapshots, user-serviceable | 3.0.3
Tiering | 3.1.2
Volume Shadow Copy (VSS) | 3.1.3
Table 3.5. Features supported by volume type
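Volume Type | Sharding | Tiering | Quota | Snapshots | Geo-Rep | Bitrot
--- | --- | --- | --- | --- | --- | ---
Arbitrated-Replicated | Yes | No | Yes | Yes | Yes | Yes
Distributed | No | Yes | Yes | Yes | Yes | Yes
Distributed-Dispersed | No | Yes | Yes | Yes | Yes | Yes
Distributed-Replicated | Yes | Yes | Yes | Yes | Yes | Yes
Replicated | Yes | Yes | Yes | Yes | Yes | Yes
Sharded | N/A | No | No | No | Yes | No
Tiered | No | N/A | Limited[a] | Limited[a] | Limited[a] | Limited[a]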
[a] See Section 17.3. Tiering Limitations in the Red Hat Gluster Storage 3.4 Administration Guide for details.
Table 3.6. Features supported by client protocol
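Feature | FUSE | Gluster-NFS | NFS-Ganesha | SMB | Swift/S3
--- | --- | --- | --- | --- | ---
Arbiter | Yes | Yes | Yes | Yes | No
Bitrot detection | Yes | Yes | No | Yes | No
dm-cache | Yes | Yes | Yes | Yes | Yes
Encryption (TLS-SSL) | Yes | Yes | Yes | Yes | No
Erasure coding | Yes | Yes | Yes | Yes | No
Export subdirectory | Yes | Yes | Yes | N/A | N/A
Geo-replication | Yes | Yes | Yes | Yes | Yes
Quota | Yes | Yes | Yes | Yes | No
RDMA | Yes | No | No | No | N/A
Snapshots | Yes | Yes | Yes | Yes | Yes
Snapshot cloning | Yes | Yes | Yes | Yes | Yes
Tiering | Yes | Yes | N/A | N/A | N/A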

Chapter 4. Adding Servers to the Trusted Storage Pool

A storage pool is a network of storage servers.
When the first server starts, the storage pool consists of that server alone. To add storage servers to the storage pool, run the probe command from a running, trusted storage server.

Important

Before adding servers to the trusted storage pool, you must ensure that the ports specified in Chapter 3, Considerations for Red Hat Gluster Storage are open.
On Red Hat Enterprise Linux 7, enable the glusterFS firewall service in the active zones for runtime and permanent mode using the following commands:
To get a list of active zones, run the following command:
# firewall-cmd --get-active-zones
To allow the firewall service in the active zones, run the following commands:
# firewall-cmd --zone=zone_name --add-service=glusterfs
# firewall-cmd --zone=zone_name --add-service=glusterfs --permanent
For more information about using firewalls, see section Using Firewalls in the Red Hat Enterprise Linux 7 Security Guide: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Security_Guide/sec-Using_Firewalls.html.

Note

When any two gluster commands are executed concurrently on the same volume, the following error is displayed:
Another transaction is in progress.
This behavior prevents two or more commands from simultaneously modifying a volume configuration, which could leave the volume in an inconsistent state. Concurrent commands of this kind are common in environments with monitoring frameworks such as the Red Hat Gluster Storage Console, Red Hat Enterprise Virtualization Manager, and Nagios. For example, in a four-node Red Hat Gluster Storage trusted storage pool, this message is displayed when the gluster volume status VOLNAME command is executed from two of the nodes simultaneously.
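Scripts that run alongside such monitoring frameworks may hit this error intermittently. One option is to retry the command after a short pause. The sketch below is hypothetical: gluster_retry, GLUSTER_RETRIES, and GLUSTER_RETRY_DELAY are illustrative names, and the retry decision is based only on the message text quoted above.

```sh
GLUSTER=${GLUSTER:-gluster}

# gluster_retry ARGS...: run "gluster ARGS...", retrying while another transaction holds the lock.
gluster_retry() {
    attempts=${GLUSTER_RETRIES:-5}
    while :; do
        if out=$($GLUSTER "$@" 2>&1); then
            printf '%s\n' "$out"
            return 0
        fi
        case "$out" in
            *"Another transaction is in progress"*) ;;   # lock held: try again
            *) printf '%s\n' "$out" >&2; return 1 ;;     # any other error: give up
        esac
        attempts=$((attempts - 1))
        if [ "$attempts" -le 0 ]; then
            printf '%s\n' "$out" >&2
            return 1
        fi
        sleep "${GLUSTER_RETRY_DELAY:-2}"
    done
}

# Example: gluster_retry volume status VOLNAME
```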

4.1. Adding Servers to the Trusted Storage Pool

The gluster peer probe [server] command is used to add servers to the trusted storage pool.

Note

Probing a node that runs a higher version of Red Hat Gluster Storage from a node that runs a lower version is not supported.

Adding Three Servers to a Trusted Storage Pool

Add three servers (server2, server3, and server4) to a trusted storage pool that initially contains only Server1, resulting in a four-server pool.

Prerequisites

  • The glusterd service must be running on all storage servers requiring addition to the trusted storage pool. See Chapter 24, Starting and Stopping the glusterd service for service start and stop commands.
  • Server1, the trusted storage server, is started.
  • The host names of the target servers must be resolvable by DNS.
  1. Run gluster peer probe [server] from Server1 to add additional servers to the trusted storage pool.

    Note

    • Self-probing Server1 will result in an error because it is part of the trusted storage pool by default.
    • All the servers in the Trusted Storage Pool must have RDMA devices if either RDMA or RDMA,TCP volumes are created in the storage pool. The peer probe must be performed using IP/hostname assigned to the RDMA device.
    # gluster peer probe server2
    Probe successful
    
    # gluster peer probe server3
    Probe successful
    
    # gluster peer probe server4
    Probe successful
  2. Verify the peer status from all servers using the following command:
    # gluster peer status
    Number of Peers: 3
    
    Hostname: server2
    Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
    State: Peer in Cluster (Connected)
    
    Hostname: server3
    Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
    State: Peer in Cluster (Connected)
    
    Hostname: server4
    Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7
    State: Peer in Cluster (Connected)
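The two steps above can be scripted: probe each new server in turn, then confirm that every peer reported by gluster peer status is connected. probe_servers and all_peers_connected are hypothetical helpers; the parser assumes the output format shown above, and GLUSTER can be set to echo for a dry run.

```sh
GLUSTER=${GLUSTER:-gluster}   # set GLUSTER=echo for a dry run

# probe_servers SERVER...: probe each server, stopping at the first failure.
probe_servers() {
    for server in "$@"; do
        $GLUSTER peer probe "$server" || return 1
    done
}

# all_peers_connected: read "gluster peer status" output on stdin and succeed only if
# the number of connected peers equals the reported "Number of Peers".
all_peers_connected() {
    awk -F': ' '
        /^ *Number of Peers:/ { expected = $2 + 0; seen = 1 }
        /^ *State: Peer in Cluster \(Connected\)/ { connected++ }
        END { exit !(seen && connected + 0 == expected) }
    '
}

# Example:
# probe_servers server2 server3 server4
# gluster peer status | all_peers_connected && echo "all peers connected"
```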

Important

If the existing trusted storage pool has a geo-replication session, then after adding the new server to the trusted storage pool, perform the steps listed at Section 10.5, “Starting Geo-replication on a Newly Added Brick, Node, or Volume”.

Note

Verify that time is synchronized on all Gluster nodes by using the following command:
# for peer in $(gluster peer status | awk -F': ' '/^Hostname/ {print $2}'); do clockdiff "$peer"; done

4.2. Removing Servers from the Trusted Storage Pool

Warning

Before detaching a peer from the trusted storage pool, make sure that the clients are not using the node. If backup servers were not set at mount time using the backup-volfile-servers option, remount the volume on the client using the IP address or FQDN of another server in the trusted storage pool to avoid inconsistencies.
Run gluster peer detach server to remove a server from the storage pool.
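To avoid having to remount later, clients can list backup volfile servers at mount time. A minimal sketch, assuming the native client and the backup-volfile-servers mount option named above, which takes a colon-separated list of servers; mount_with_backups is hypothetical and MOUNT_CMD can be set to echo to preview the command.

```sh
MOUNT_CMD=${MOUNT_CMD:-mount}   # set to echo for a dry run

# mount_with_backups SERVER VOLNAME MOUNTPOINT [BACKUP_SERVER...]
mount_with_backups() {
    server=$1; volname=$2; mountpoint=$3; shift 3
    if [ "$#" -eq 0 ]; then
        $MOUNT_CMD -t glusterfs "$server:/$volname" "$mountpoint"
        return
    fi
    # Join the backup servers with colons, as the mount option expects.
    backups=$(printf '%s:' "$@")
    backups=${backups%:}
    $MOUNT_CMD -t glusterfs -o "backup-volfile-servers=$backups" \
        "$server:/$volname" "$mountpoint"
}

# Example: mount_with_backups server1 VOLNAME /mnt/glusterfs server2 server3
```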

Removing One Server from the Trusted Storage Pool

Remove one server from the Trusted Storage Pool, and check the peer status of the storage pool.

Prerequisites

  1. Run gluster peer detach [server] to remove the server from the trusted storage pool.
    # gluster peer detach server4
    Detach successful
  2. Verify the peer status from all servers using the following command:
    # gluster peer status
    Number of Peers: 2
    
    Hostname: server2
    Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
    State: Peer in Cluster (Connected)
    
    Hostname: server3
    Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
    State: Peer in Cluster (Connected)
    

Chapter 5. Setting Up Storage Volumes

A Red Hat Gluster Storage volume is a logical collection of bricks, where each brick is an export directory on a server in the trusted storage pool. Most of the Red Hat Gluster Storage Server management operations are performed on the volume. For detailed information about configuring Red Hat Gluster Storage to enhance performance, see Chapter 20, Tuning for Performance.

Warning

Red Hat does not support writing data directly into the bricks. Read and write data only through the Native Client, or through NFS or SMB mounts.

Note

Red Hat Gluster Storage supports IP over InfiniBand (IPoIB). To use this feature, install the InfiniBand packages on all Red Hat Gluster Storage servers and clients by running yum groupinstall "Infiniband Support".

Volume Types

Distributed
Distributes files across bricks in the volume.
Use this volume type where scaling and redundancy requirements are not important, or are provided by other hardware or software layers.
See Section 5.5, “Creating Distributed Volumes” for additional information about this volume type.
Replicated
Replicates files across bricks in the volume.
Use this volume type in environments where high-availability and high-reliability are critical.
See Section 5.6, “Creating Replicated Volumes” for additional information about this volume type.
Distributed Replicated
Distributes files across replicated bricks in the volume.
Use this volume type in environments where high-reliability and scalability are critical. This volume type offers improved read performance in most environments.
See Section 5.7, “Creating Distributed Replicated Volumes” for additional information about this volume type.
Arbitrated Replicated
Replicates files across two bricks in a replica set, and replicates only metadata to the third brick.
Use this volume type in environments where consistency is critical, but underlying storage space is at a premium.
See Section 5.8, “Creating Arbitrated Replicated Volumes” for additional information about this volume type.
Dispersed
Disperses the file's data across the bricks in the volume.
Use this volume type where you need a configurable level of reliability with minimal wasted space.
See Section 5.9, “Creating Dispersed Volumes” for additional information about this volume type.
Distributed Dispersed
Distributes files across dispersed subvolumes.
Use this volume type where you need a configurable level of reliability with minimal wasted space.
See Section 5.10, “Creating Distributed Dispersed Volumes” for additional information about this volume type.
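The volume types above correspond to options of the gluster volume create command. The sketch below is hypothetical (create_volume and its type keywords are illustrative): it assumes three-way replication for the replicated types and a 4+2 dispersed layout, and supplying more than one set of bricks yields the distributed variant of each type. GLUSTER can be set to echo to preview the command.

```sh
GLUSTER=${GLUSTER:-gluster}   # set GLUSTER=echo for a dry run

# create_volume TYPE VOLNAME BRICK...
# TYPE is one of: distributed, replicated, arbitrated, dispersed (illustrative keywords).
create_volume() {
    type=$1; volname=$2; shift 2
    case "$type" in
        distributed) set_size=1; opts="" ;;
        replicated)  set_size=3; opts="replica 3" ;;
        arbitrated)  set_size=3; opts="replica 3 arbiter 1" ;;
        dispersed)   set_size=6; opts="disperse 6 redundancy 2" ;;
        *) echo "unknown volume type: $type" >&2; return 2 ;;
    esac
    # Bricks form sets of set_size; several sets make the volume distributed as well.
    if [ "$#" -eq 0 ] || [ $(($# % set_size)) -ne 0 ]; then
        echo "need a non-zero multiple of $set_size bricks, got $#" >&2
        return 1
    fi
    # $opts is deliberately unquoted so that it splits into separate arguments.
    $GLUSTER volume create "$volname" $opts "$@"
}

# Example: create_volume arbitrated vol1 server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1
```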

5.1. Setting up Gluster Storage Volumes using gdeploy

The gdeploy tool automates the process of creating, formatting, and mounting bricks. With gdeploy, the manual steps listed between Section 5.4 Formatting and Mounting Bricks and Section 5.10 Creating Distributed Dispersed Volumes are automated.
When setting up a new trusted storage pool, gdeploy is often the preferred method, because manually executing numerous commands is error-prone.
The advantages of using gdeploy to automate brick creation are as follows:
  • The backend on several machines can be set up from a single laptop or desktop. This saves time and scales well as the number of nodes in the trusted storage pool increases.
  • Flexibility in choosing the drives to configure (for example, sd or vd devices).
  • Flexibility in naming the logical volumes (LV) and volume groups (VG).

5.1.1. Getting Started

Prerequisites

  1. Generate the passphrase-less SSH keys for the nodes which are going to be part of the trusted storage pool by running the following command:
    # ssh-keygen -t rsa -N ''
  2. Set up key-based SSH authentication access between the gdeploy controller and servers by running the following command:
    # ssh-copy-id -i root@server

    Note

    If you are using a Red Hat Gluster Storage node as the deployment node and not an external node, then the key-based SSH authentication must be set up for the Red Hat Gluster Storage node from where the installation is performed.
  3. Enable the repository required to install Ansible by running the following command:
    # subscription-manager repos --enable=rhel-7-server-ansible-2-rpms
  4. Install Ansible by running the following command:
    # yum install ansible
  5. You must also ensure the following:
    • Devices should be raw and unused
    • Default system locale must be set to en_US
      For information on the system locale, see Setting the System Locale in the Red Hat Enterprise Linux 7 System Administrator's Guide.
    • For multiple devices, use multiple volume groups, thinpool, and thinvol in the gdeploy configuration file
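With several nodes, steps 1 and 2 can be looped. A sketch assuming the default RSA key path ~/.ssh/id_rsa and root logins as shown above; setup_ssh_keys is hypothetical, and SSH_COPY_ID can be set to echo for a dry run.

```sh
SSH_COPY_ID=${SSH_COPY_ID:-ssh-copy-id}   # set to echo for a dry run
KEY="$HOME/.ssh/id_rsa"

# setup_ssh_keys HOST...: create a passphrase-less key once, then copy it to each host.
setup_ssh_keys() {
    if [ ! -f "$KEY" ]; then
        ssh-keygen -t rsa -N '' -f "$KEY" || return 1
    fi
    for host in "$@"; do
        $SSH_COPY_ID -i "$KEY.pub" "root@$host" || return 1
    done
}

# Example: setup_ssh_keys 10.70.46.13 10.70.46.17
```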
For more information, see Installing Ansible to Support Gdeploy in the Red Hat Gluster Storage 3.4 Installation Guide.
gdeploy can be used to deploy Red Hat Gluster Storage in two ways:
  • Using a node in a trusted storage pool
  • Using a machine outside the trusted storage pool
Using a node in a trusted storage pool

The gdeploy package is bundled as part of the initial installation of Red Hat Gluster Storage.

Using a machine outside the trusted storage pool

Ensure that the machine is subscribed to the required Red Hat Gluster Storage channels. For more information, see Subscribing to the Red Hat Gluster Storage Server Channels in the Red Hat Gluster Storage 3.4 Installation Guide.

Execute the following command to install gdeploy:
# yum install gdeploy
For more information on installing gdeploy, see the Installing Ansible to Support Gdeploy section in the Red Hat Gluster Storage 3.4 Installation Guide.

5.1.2. Setting up a Trusted Storage Pool

Creating a trusted storage pool manually is tedious, and becomes more so as the number of nodes in the trusted storage pool grows. With gdeploy, a single configuration file can be used to set up a trusted storage pool. When gdeploy is installed, a sample configuration file is created at:
/usr/share/doc/gdeploy/examples/gluster.conf.sample

Note

The trusted storage pool can be created either by performing each task independently (such as setting up the backend, creating a volume, and mounting volumes) or by combining all of them into a single configuration.
For example, for a basic trusted storage pool of a 2 x 2 replicated volume the configuration details in the configuration file will be as follows:
2x2-volume-create.conf:
#
# Usage:
#       gdeploy -c 2x2-volume-create.conf
#
# This sets up the backend first and then creates the volume using the
# bricks that were set up.
#
#

[hosts]
10.70.46.13
10.70.46.17


# Common backend setup for 2 of the hosts.
[backend-setup]
devices=sdb,sdc
vgs=vg1,vg2
pools=pool1,pool2
lvs=lv1,lv2
mountpoints=/rhgs/brick1,/rhgs/brick2
brick_dirs=/rhgs/brick1/b1,/rhgs/brick2/b2

# If backend-setup is different for each host
# [backend-setup:10.70.46.13]
# devices=sdb
# brick_dirs=/rhgs/brick1
#
# [backend-setup:10.70.46.17]
# devices=sda,sdb,sdc
# brick_dirs=/rhgs/brick{1,2,3}
#

[volume]
action=create
volname=sample_volname
replica=yes
replica_count=2
force=yes


[clients]
action=mount
volname=sample_volname
hosts=10.70.46.15
fstype=glusterfs
client_mount_points=/mnt/gluster
With this configuration, gdeploy sets up the backend on hosts 10.70.46.13 and 10.70.46.17 using the devices /dev/sdb and /dev/sdc, creates a 2 x 2 distributed replicated volume named sample_volname from the resulting bricks, and mounts it on client 10.70.46.15 at /mnt/gluster.
For more information on possible values, see Section 5.1.7, “Configuration File”.
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
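The "2 x 2" notation is the number of replica sets multiplied by the replica count. The following Python sketch, a hypothetical helper that is not part of gdeploy, shows the arithmetic for the configuration above. It assumes that bricks are listed one brick directory at a time across all hosts; because Gluster groups consecutive bricks in the brick list into replica sets, this ordering places the two copies of each replica set on different hosts. gdeploy's actual brick ordering may differ.

```python
def volume_layout(hosts, brick_dirs, replica_count):
    """Return (number_of_replica_sets, replica_count) for a distributed-replicated volume."""
    total_bricks = len(hosts) * len(brick_dirs)
    if total_bricks % replica_count != 0:
        raise ValueError(
            f"{total_bricks} bricks cannot be split into replica sets of {replica_count}"
        )
    return total_bricks // replica_count, replica_count


def replica_sets(hosts, brick_dirs, replica_count):
    """Group <host>:<brick_dir> names into replica sets.

    Assumption (hypothetical ordering): bricks are ordered brick directory first,
    then host, so consecutive bricks, which Gluster places in the same replica
    set, sit on different hosts.
    """
    bricks = [f"{host}:{brick_dir}" for brick_dir in brick_dirs for host in hosts]
    return [bricks[i:i + replica_count] for i in range(0, len(bricks), replica_count)]


# Values taken from 2x2-volume-create.conf above.
hosts = ["10.70.46.13", "10.70.46.17"]
brick_dirs = ["/rhgs/brick1/b1", "/rhgs/brick2/b2"]
print("%d x %d" % volume_layout(hosts, brick_dirs, 2))
for replica_set in replica_sets(hosts, brick_dirs, 2):
    print(replica_set)
```

Each printed replica set holds one brick from each host, so losing a single host leaves a copy of every file available.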

Note

You can create a new configuration file by using the template file available at /usr/share/doc/gdeploy/examples/gluster.conf.sample as a reference. To invoke the new configuration file, run the gdeploy -c /path_to_file/config.txt command.
To set up only the backend, see Section 5.1.3, “Setting up the Backend”.
To create only a volume, see Section 5.1.4, “Creating Volumes”.
To mount only clients, see Section 5.1.5, “Mounting Clients”.

5.1.3. Setting up the Backend

To set up a Red Hat Gluster Storage volume, LVM thin provisioning (thin-p) must be set up on the storage disks. If the trusted storage pool contains many machines, these tasks take a long time, because they involve a large number of commands and are error-prone. With gdeploy, a single configuration file can be used to set up the backend. The backend is set up when a fresh trusted storage pool is created, because bricks must be set up before a volume can be created. When gdeploy is installed, a sample configuration file is created at:
/usr/share/doc/gdeploy/examples/gluster.conf.sample
A backend can be set up in two ways:
  • Using the [backend-setup] module
  • Creating Physical Volume (PV), Volume Group (VG), and Logical Volume (LV) individually

Note

For Red Hat Enterprise Linux 6, the xfsprogs package must be installed before setting up the backend bricks using gdeploy.
5.1.3.1. Using the [backend-setup] Module
The backend can be set up on specific machines or on all machines. The backend-setup module internally creates the PV, VG, and LV, and mounts the device. Thin-provisioned logical volumes are created according to Red Hat performance recommendations.
Depending on the requirement, the backend can be set up in one of the following ways:
  • Generic
  • Specific
Generic

If the disk names are uniform across the machines, the backend setup can be written as shown below. The backend is set up for all hosts in the [hosts] section.

For more information on possible values, see Section 5.1.7, “Configuration File”
Example configuration file: backend-setup-generic.conf
#
# Usage:
#       gdeploy -c backend-setup-generic.conf
#
# This configuration creates backend for GlusterFS clusters
#

[hosts]
10.70.46.130
10.70.46.32
10.70.46.110
10.70.46.77

# Backend setup for all the nodes in the `hosts' section. This will create
# PV, VG, and LV with gdeploy generated names.
[backend-setup]
devices=vdb
Specific

If disk names vary across the machines in the cluster, the backend setup can be written for specific machines with specific disk names. gdeploy allows host-specific setup within a single configuration file.

For more information on possible values, see Section 5.1.7, “Configuration File”
Example configuration file: backend-setup-hostwise.conf
#
# Usage:
#       gdeploy -c backend-setup-hostwise.conf
#
# This configuration creates backend for GlusterFS clusters
#

[hosts]
10.70.46.130
10.70.46.32
10.70.46.110
10.70.46.77

# Backend setup for 10.70.46.77 with default gdeploy-generated names for
# Volume Groups and Logical Volumes. Volume group names will be GLUSTER_vg1,
# GLUSTER_vg2...
[backend-setup:10.70.46.77]
devices=vda,vdb

# Backend setup for the remaining 3 hosts in the `hosts' section with custom names
# for Volume Groups and Logical Volumes.
[backend-setup:10.70.46.{130,32,110}]
devices=vdb,vdc,vdd
vgs=vg1,vg2,vg3
pools=pool1,pool2,pool3
lvs=lv1,lv2,lv3
mountpoints=/rhgs/brick1,/rhgs/brick2,/rhgs/brick3
brick_dirs=/rhgs/brick1/b1,/rhgs/brick2/b2,/rhgs/brick3/b3
5.1.3.2. Creating Backend by Setting up PV, VG, and LV
If you need more control over setting up the backend, the PV, VG, and LV can be created individually. The LV module provides the flexibility to create more than one LV on a VG. For example, the backend-setup module sets up a thin pool by default and applies the default performance recommendations. However, if your use case demands more than one LV, or a combination of thin and thick logical volumes, backend-setup is not suitable. Use the PV, VG, and LV modules instead.
For more information on possible values, see Section 5.1.7, “Configuration File”
The following example shows how to create four logical volumes on a single volume group: a thick LV, a thin pool, and two thin LVs created from that pool.
[hosts]
10.70.46.130
10.70.46.32

[pv]
action=create
devices=vdb

[vg1]
action=create
vgname=RHS_vg1
pvname=vdb

[lv1]
action=create
vgname=RHS_vg1
lvname=engine_lv
lvtype=thick
size=10GB
mount=/rhgs/brick1

[lv2]
action=create
vgname=RHS_vg1
poolname=lvthinpool
lvtype=thinpool
poolmetadatasize=200MB
chunksize=1024k
size=30GB

[lv3]
action=create
lvname=lv_vmaddldisks
poolname=lvthinpool
vgname=RHS_vg1
lvtype=thinlv
mount=/rhgs/brick2
virtualsize=9GB

[lv4]
action=create
lvname=lv_vmrootdisks
poolname=lvthinpool
vgname=RHS_vg1
size=19GB
lvtype=thinlv
mount=/rhgs/brick3
virtualsize=19GB
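In the example above, the two thin LVs (9GB and 19GB virtual size) are created from a 30GB thin pool. Thin LVs may be overprovisioned, so it can be useful to compare their total virtual size against the pool size before running gdeploy. The following Python sketch is a hypothetical pre-flight check, not part of gdeploy; it assumes the size strings use the MB/GB/TB suffixes shown in this guide and treats them as binary units (1 GB = 1024 MB).

```python
import re

# Assumption: MB/GB/TB are treated as binary units, expressed here in MiB.
_UNIT_MIB = {"MB": 1, "GB": 1024, "TB": 1024 * 1024}


def to_mib(size):
    """Convert a size string such as '30GB' or '200MB' to MiB."""
    match = re.fullmatch(r"(\d+)(MB|GB|TB)", size.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(match.group(1)) * _UNIT_MIB[match.group(2)]


def overprovision_ratio(pool_size, thin_virtual_sizes):
    """Ratio of the summed thin-LV virtual sizes to the thin pool size.

    A value above 1.0 means the pool is overprovisioned and can fill up
    before the thin LVs reach their virtual size.
    """
    total_virtual = sum(to_mib(size) for size in thin_virtual_sizes)
    return total_virtual / to_mib(pool_size)


# Values taken from the [lv2], [lv3], and [lv4] sections above.
ratio = overprovision_ratio("30GB", ["9GB", "19GB"])
print(f"thin LVs use {ratio:.0%} of the thin pool")
```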
Example to extend an existing VG:
#
# Extends a given VG. pvname and vgname are mandatory. In this example, the
# VG `RHS_vg1' is extended by adding the PV vdd. If the PV is not already
# present, gdeploy creates it.
#
[hosts]
10.70.46.130
10.70.46.32

[vg2]
action=extend
vgname=RHS_vg1
pvname=vdd

5.1.4. Creating Volumes

Setting up a volume involves writing long commands in which the hostnames or IP addresses and the brick order must be chosen carefully, which is error-prone. gdeploy simplifies this task. When gdeploy is installed, a sample configuration file is created at:
/usr/share/doc/gdeploy/examples/gluster.conf.sample
For example, the configuration file for a basic trusted storage pool with a 2 x 2 replicated volume is as follows:
[hosts]
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4

[volume]
action=create
volname=glustervol
transport=tcp,rdma
replica=yes
replica_count=2
force=yes
For more information on possible values, see Section 5.1.7, “Configuration File”
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Creating Multiple Volumes

Note

Creating multiple volumes is supported only in gdeploy 2.0 and later. Check your gdeploy version before trying this configuration.
When creating multiple volumes in a single configuration, the [volume] sections must be numbered. For example, if there are two volumes, the sections are named [volume1] and [volume2].
vol-create.conf
[hosts]
10.70.46.130
10.70.46.32

[backend-setup]
devices=vdb,vdc
mountpoints=/mnt/data1,/mnt/data2

[volume1]
action=create
volname=vol-one
transport=tcp
replica=yes
replica_count=2
brick_dirs=/mnt/data1/1

[volume2]
action=create
volname=vol-two
transport=tcp
replica=yes
replica_count=2
brick_dirs=/mnt/data2/2
With gdeploy 2.0, a volume can be created with multiple volume options set by using the key and value options. The number of keys must match the number of values; each key is paired with the value in the same position.
[hosts]
10.70.46.130
10.70.46.32

[backend-setup]
devices=vdb,vdc
mountpoints=/mnt/data1,/mnt/data2

[volume1]
action=create
volname=vol-one
transport=tcp
replica=yes
replica_count=2
key=group,storage.owner-uid,storage.owner-gid,features.shard,features.shard-block-size,performance.low-prio-threads,cluster.data-self-heal-algorithm
value=virt,36,36,on,512MB,32,full
brick_dirs=/mnt/data1/1

[volume2]
action=create
volname=vol-two
transport=tcp
replica=yes
key=group,storage.owner-uid,storage.owner-gid,features.shard,features.shard-block-size,performance.low-prio-threads,cluster.data-self-heal-algorithm
value=virt,36,36,on,512MB,32,full
replica_count=2
brick_dirs=/mnt/data2/2
The above configuration will create two volumes with multiple volume options set.
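Because each key is paired with the value in the same position, a single missing or extra entry shifts every following option. The following Python sketch, a hypothetical helper that is not part of gdeploy, pairs the key and value lists from the [volume1] section above and prints the equivalent gluster volume set commands, which could also be run manually on a node in the trusted storage pool.

```python
def volume_options(keys, values):
    """Pair a comma-separated key list with a comma-separated value list."""
    key_list = [key.strip() for key in keys.split(",")]
    value_list = [value.strip() for value in values.split(",")]
    if len(key_list) != len(value_list):
        raise ValueError(f"{len(key_list)} keys but {len(value_list)} values")
    return list(zip(key_list, value_list))


# Values taken from the [volume1] section above.
keys = ("group,storage.owner-uid,storage.owner-gid,features.shard,"
        "features.shard-block-size,performance.low-prio-threads,"
        "cluster.data-self-heal-algorithm")
values = "virt,36,36,on,512MB,32,full"

for key, value in volume_options(keys, values):
    print(f"gluster volume set vol-one {key} {value}")
```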

5.1.5. Mounting Clients

Instead of logging in to every client that has to be mounted, you can use gdeploy to mount clients remotely. When gdeploy is installed, a sample configuration file is created at:
/usr/share/doc/gdeploy/examples/gluster.conf.sample
The following is an example of the modifications to the configuration file needed to mount clients:
[clients]
action=mount
hosts=10.70.46.159
fstype=glusterfs
client_mount_points=/mnt/gluster
volname=10.0.0.1:glustervol

Note

If fstype is nfs, specify the NFS version using the nfs-version option. The default version is 3.
For more information on possible values, see Section 5.1.7, “Configuration File”
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
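The [clients] section above is roughly equivalent to running a mount command on each client. The following Python sketch is a hypothetical illustration, not gdeploy's implementation: it splits a volname in the <hostname>:<volname> format and builds the standard mount command line for a native (FUSE) or NFS mount. The nfs_version parameter is a hypothetical name corresponding to the NFS version discussed in the note above, and defaults to 3.

```python
def mount_command(volname, mount_point, fstype="glusterfs", nfs_version=3):
    """Build a mount command line (as an argv list) for one client."""
    host, sep, volume = volname.partition(":")
    if not sep or not host or not volume:
        raise ValueError("volname must be in the format <hostname>:<volname>")
    source = f"{host}:/{volume}"
    if fstype == "glusterfs":
        # Native FUSE mount.
        return ["mount", "-t", "glusterfs", source, mount_point]
    if fstype == "nfs":
        return ["mount", "-t", "nfs", "-o", f"vers={nfs_version}", source, mount_point]
    raise ValueError(f"unsupported fstype: {fstype!r}")


# Values taken from the [clients] example above.
print(" ".join(mount_command("10.0.0.1:glustervol", "/mnt/gluster")))
```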

5.1.6. Configuring a Volume

Volumes can be configured remotely using the configuration file, without having to log in to the trusted storage pool. For more information regarding the sections and options in the configuration file, see Section 5.1.7, “Configuration File”.
5.1.6.1. Adding and Removing a Brick
The configuration file can be modified to add or remove a brick:
Adding a Brick

Modify the [volume] section in the configuration file to add a brick. For example:

[volume]
action=add-brick
volname=10.0.0.1:glustervol
bricks=10.0.0.1:/rhgs/new_brick
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Removing a Brick

Modify the [volume] section in the configuration file to remove a brick. For example:

[volume]
action=remove-brick
volname=10.0.0.1:glustervol
bricks=10.0.0.2:/rhgs/brick
state=commit
Other options for state are stop, start, and force.
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
For more information on possible values, see Section 5.1.7, “Configuration File”
5.1.6.2. Rebalancing a Volume
Modify the [volume] section in the configuration file to rebalance a volume. For example:
[volume]
action=rebalance
volname=10.70.46.13:glustervol
state=start
Other options for state are stop and fix-layout.
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
For more information on possible values, see Section 5.1.7, “Configuration File”
5.1.6.3. Starting, Stopping, or Deleting a Volume
The configuration file can be modified to start, stop, or delete a volume:
Starting a Volume

Modify the [volume] section in the configuration file to start a volume. For example:

[volume]
action=start
volname=10.0.0.1:glustervol
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Stopping a Volume

Modify the [volume] section in the configuration file to stop a volume. For example:

[volume]
action=stop
volname=10.0.0.1:glustervol
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
Deleting a Volume

Modify the [volume] section in the configuration file to delete a volume. For example:

[volume]
action=delete
volname=10.70.46.13:glustervol
After modifying the configuration file, invoke the configuration using the command:
# gdeploy -c conf.txt
For more information on possible values, see Section 5.1.7, “Configuration File”

5.1.7. Configuration File

The configuration file includes the various options that can be used to change the settings for gdeploy. The following options are currently supported:
  • [hosts]
  • [devices]
  • [disktype]
  • [diskcount]
  • [stripesize]
  • [vgs]
  • [pools]
  • [lvs]
  • [mountpoints]
  • [peer]
  • [clients]
  • [volume]
  • [backend-setup]
  • [pv]
  • [vg]
  • [lv]
  • [RH-subscription]
  • [yum]
  • [shell]
  • [update-file]
  • [service]
  • [script]
  • [firewalld]
The options are briefly explained in the following list:
  • hosts

    This is a mandatory section that contains the IP addresses or hostnames of the machines in the trusted storage pool. List each hostname or IP address on a separate line.

    For example:
    [hosts]
    10.0.0.1
    10.0.0.2
  • devices

    This is a generic section that applies to all hosts listed in the [hosts] section. However, if host-specific sections such as [hostname] or [IP-address] are present, the data in generic sections such as [devices] is ignored, because host-specific data takes precedence. This is an optional section.

    For example:
    [devices]
    /dev/sda
    /dev/sdb

    Note

    When configuring the backend setup, the devices should be either listed in this section or in the host specific section.
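The precedence rule can be expressed as a small lookup. The following Python sketch assumes the configuration has already been parsed into plain Python values; the function and its arguments are hypothetical and do not reflect gdeploy's internal data structures. A host with its own section uses only that data, every other host falls back to the generic [devices] list, and a host with neither is reported as an error.

```python
def devices_per_host(hosts, generic_devices=None, host_specific=None):
    """Resolve the device list for each host in the [hosts] section."""
    host_specific = host_specific or {}
    resolved = {}
    for host in hosts:
        if host in host_specific:
            # Host-specific data takes precedence over the generic section.
            resolved[host] = list(host_specific[host])
        elif generic_devices:
            resolved[host] = list(generic_devices)
        else:
            raise ValueError(f"no devices listed for host {host}")
    return resolved


print(devices_per_host(
    ["10.0.0.1", "10.0.0.2"],
    generic_devices=["/dev/sda", "/dev/sdb"],
    host_specific={"10.0.0.2": ["/dev/vdb"]},
))
```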
  • disktype

    This section specifies the disk configuration that is used while setting up the backend. gdeploy supports RAID 10, RAID 6, RAID 5, and JBOD configurations. This is an optional section and if the field is left empty, JBOD is taken as the default configuration. Valid values for this field are raid10, raid6, raid5, and jbod.

    For example:
    [disktype]
    raid6
  • diskcount

    This section specifies the number of data disks in the setup. This is a mandatory field if a RAID disk type is specified under [disktype]. If the [disktype] is JBOD the [diskcount] value is ignored. This parameter is host specific.

    For example:
    [diskcount]
    10
  • stripesize

    This section specifies the stripe_unit size in KB.

    Case 1: This field is not necessary if the [disktype] is JBOD, and any given value will be ignored.
    Case 2: This is a mandatory field if [disktype] is specified as RAID 5 or RAID 6.
    For [disktype] RAID 10, the default value is taken as 256KB. Red Hat does not recommend changing this value. If you specify any other value the following warning is displayed:
    "Warning: We recommend a stripe unit size of 256KB for RAID 10"

    Note

    Do not add any suffixes like K, KB, M, etc. This parameter is host specific and can be added in the hosts section.
    For example:
    [stripesize]
    128
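The rules for [disktype], [diskcount], and [stripesize] can be checked before running gdeploy. The following Python sketch is a hypothetical pre-flight check, not part of gdeploy. It encodes only the rules stated above: JBOD is the default and ignores [diskcount] and [stripesize], [diskcount] is mandatory for RAID types, [stripesize] is mandatory for RAID 5 and RAID 6, and RAID 10 warns about any stripe unit size other than 256.

```python
VALID_DISKTYPES = ("raid10", "raid6", "raid5", "jbod")
RAID10_WARNING = "Warning: We recommend a stripe unit size of 256KB for RAID 10"


def check_disk_settings(disktype=None, diskcount=None, stripesize=None):
    """Validate [disktype]/[diskcount]/[stripesize]; return a list of warnings.

    stripesize is a plain integer in KB, without a suffix.
    """
    disktype = (disktype or "jbod").lower()  # an empty value means JBOD
    if disktype not in VALID_DISKTYPES:
        raise ValueError(f"invalid disktype: {disktype}")
    if disktype == "jbod":
        return []  # diskcount and stripesize are ignored for JBOD
    if diskcount is None:
        raise ValueError(f"diskcount is mandatory for {disktype}")
    if disktype in ("raid5", "raid6") and stripesize is None:
        raise ValueError(f"stripesize is mandatory for {disktype}")
    if disktype == "raid10" and stripesize not in (None, 256):
        return [RAID10_WARNING]
    return []


print(check_disk_settings("raid6", diskcount=10, stripesize=128))
```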
  • vgs

    This section is deprecated in gdeploy 2.0; for gdeploy 2.0, see [backend-setup]. This section specifies the volume group names for the devices listed in [devices]. The number of volume groups in the [vgs] section should match the number of devices in [devices]. If the volume group names are missing, the volume groups are named GLUSTER_vg{1, 2, 3, ...} by default.

    For example:
    [vgs]
    CUSTOM_vg1
    CUSTOM_vg2
  • pools

    This section is deprecated in gdeploy 2.0; for gdeploy 2.0, see [backend-setup]. This section specifies the pool names for the volume groups specified in the [vgs] section. The number of pools listed in the [pools] section should match the number of volume groups in the [vgs] section. If the pool names are missing, the pools are named GLUSTER_pool{1, 2, 3, ...}.

    For example:
    [pools]
    CUSTOM_pool1
    CUSTOM_pool2
  • lvs

    This section is deprecated in gdeploy 2.0; for gdeploy 2.0, see [backend-setup]. This section provides the logical volume names for the volume groups specified in [vgs]. The number of logical volumes listed in the [lvs] section should match the number of volume groups listed in [vgs]. If the logical volume names are missing, they are named GLUSTER_lv{1, 2, 3, ...}.

    For example:
    [lvs]
    CUSTOM_lv1
    CUSTOM_lv2
  • mountpoints

    This section is deprecated in gdeploy 2.0; for gdeploy 2.0, see [backend-setup]. This section specifies the brick mount points for the logical volumes. The number of mount points should match the number of logical volumes specified in [lvs]. If the mount points are missing, they are named /gluster/brick{1, 2, 3, ...}.

    For example:
    [mountpoints]
    /rhgs/brick1
    /rhgs/brick2
  • peer

    This section specifies the configuration for trusted storage pool (TSP) management. It makes all the hosts specified in the [hosts] section either probe each other to create the trusted storage pool, or detach all of them from the trusted storage pool. The only option in this section is 'action', whose value can be either probe or detach.

    For example:
    [peer]
    action=probe
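Probing builds the trusted storage pool from one node: each of the other hosts is added with the gluster peer probe command, and detaching uses gluster peer detach. The following Python sketch is a hypothetical illustration of the equivalent manual commands, not gdeploy's implementation; it assumes the commands are run on the first host listed in [hosts].

```python
def peer_commands(hosts, action="probe"):
    """Return the gluster peer commands to run on hosts[0] for the other hosts."""
    if action not in ("probe", "detach"):
        raise ValueError(f"action must be probe or detach, not {action!r}")
    if not hosts:
        raise ValueError("the [hosts] section is empty")
    # The first host runs the commands; it does not need to probe itself.
    return [["gluster", "peer", action, host] for host in hosts[1:]]


for command in peer_commands(["10.0.0.1", "10.0.0.2", "10.0.0.3"]):
    print(" ".join(command))
```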
  • clients

    This section specifies the client hosts and client_mount_points used to mount the created Gluster storage volume. The 'action' option tells the framework which action to perform; its values are 'mount' and 'unmount'. The client hosts field is mandatory. If the mount points are not specified, /mnt/gluster is used for all hosts by default.

    The fstype option specifies how the Gluster volume is mounted. The default is glusterfs (FUSE mount). The volume can also be mounted as NFS. Each client can use a different mount type, specified as a comma-separated list. The following fields are included:
    * action
    * hosts
    * fstype
    * client_mount_points
    For example:
    [clients]
    action=mount
    hosts=10.0.0.10
    fstype=nfs
    options=vers=3
    client_mount_points=/mnt/rhs
  • volume

    This section specifies the configuration options for the volume. The following fields are included in this section:

    * action
    * volname
    * transport
    * replica
    * replica_count
    * disperse
    * disperse_count
    * redundancy_count
    * force
    • action

      This option specifies what action must be performed in the volume. The choices can be [create, delete, add-brick, remove-brick].

      create: This choice is used to create a volume.
      delete: If the delete choice is used, all the options other than 'volname' will be ignored.
      add-brick or remove-brick: If add-brick or remove-brick is chosen, the extra option bricks must be provided, with a comma-separated list of brick names in the format <hostname>:<brick path>. For remove-brick, the state option must also be provided, specifying the state of the volume after brick removal.
    • volname

      This option specifies the volume name. The default name is glustervol.

      Note

      • For a volume operation, the 'hosts' section can be omitted, provided volname is in the format <hostname>:<volname>, where hostname is the hostname or IP address of one of the nodes in the cluster.
      • Only single volume creation/deletion/configuration is supported.
    • transport

      This option specifies the transport type. The default is tcp. Options are tcp, rdma, or tcp,rdma.

    • replica

      This option specifies whether the volume should be of type replica. Options are yes and no. The default is no. If 'replica' is set to yes, 'replica_count' must be provided.

    • disperse

      This option specifies if the volume should be of type disperse. Options are yes and no. Default is no.

    • disperse_count

      This field is optional even if 'disperse' is yes. If not specified, the number of bricks specified in the command line is taken as the disperse_count value.

    • redundancy_count

      If this value is not specified and 'disperse' is yes, its default value is computed to generate an optimal configuration.

    • force

      This is an optional field and can be used during volume creation to forcefully create the volume.

    For example:
    [volume]
    action=create
    volname=glustervol
    transport=tcp,rdma
    replica=yes
    replica_count=3
    force=yes
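The [volume] options map closely onto the arguments of the gluster volume create command. The following Python sketch is a hypothetical illustration, not gdeploy's implementation: it builds the equivalent command line for a replicated volume. Brick paths are not part of the [volume] section, so the brick names in the usage example are hypothetical; dispersed volumes are not covered.

```python
def volume_create_command(volname, bricks, transport="tcp", replica_count=None, force=False):
    """Build a `gluster volume create` argv list from [volume]-style options."""
    command = ["gluster", "volume", "create", volname]
    if replica_count:
        if len(bricks) % replica_count != 0:
            raise ValueError("the brick count must be a multiple of replica_count")
        command += ["replica", str(replica_count)]
    command += ["transport", transport]
    command += list(bricks)
    if force:
        command.append("force")
    return command


# Hypothetical bricks on three nodes, matching replica_count=3 in the example above.
bricks = ["10.0.0.1:/rhgs/brick1/b1", "10.0.0.2:/rhgs/brick1/b1", "10.0.0.3:/rhgs/brick1/b1"]
print(" ".join(volume_create_command("glustervol", bricks, "tcp,rdma", 3, force=True)))
```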
  • backend-setup

    Available in gdeploy 2.0. This section sets up the backend for use with GlusterFS volumes. If more than one backend setup is needed, number the sections, for example [backend-setup1], [backend-setup2], and so on.

    backend-setup section supports the following variables:
    • devices: This replaces the [pvs] section in gdeploy 1.x. devices variable lists the raw disks which should be used for backend setup. For example:
      [backend-setup]
      devices=sda,sdb,sdc
      This is a mandatory field.
    • dalign:
      The Logical Volume Manager can use a portion of the physical volume for storing its metadata while the rest is used as the data portion. Align the I/O at the Logical Volume Manager (LVM) layer using the dalign option while creating the physical volume. For example:
      [backend-setup]
      devices=sdb,sdc,sdd,sde
      dalign=256k
      For JBOD, use an alignment value of 256K. For hardware RAID, the alignment value should be obtained by multiplying the RAID stripe unit size with the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
      The following example is appropriate for 12 disks in a RAID 6 configuration with a stripe unit size of 128 KiB:
      [backend-setup]
      devices=sdb,sdc,sdd,sde
      dalign=1280k
      The following example is appropriate for 12 disks in a RAID 10 configuration with a stripe unit size of 256 KiB:
      [backend-setup]
      devices=sdb,sdc,sdd,sde
      dalign=1536k
      To view the previously configured physical volume settings for the dalign option, run the pvs -o +pe_start device command. For example:
      # pvs -o +pe_start /dev/sdb
      PV         VG   Fmt  Attr PSize PFree 1st PE
      /dev/sdb        lvm2 a--  9.09t 9.09t   1.25m
      You can also set the dalign option in the PV section.
    • vgs: This is an optional variable. This variable replaces the [vgs] section in gdeploy 1.x. The vgs variable lists the names to use when creating volume groups. The number of VG names should match the number of devices; if the variable is left blank, gdeploy generates names for the VGs. For example:
      [backend-setup]
      devices=sda,sdb,sdc
      vgs=custom_vg1,custom_vg2,custom_vg3
      A pattern such as custom_vg{1..3} can be provided for vgs; this creates three VGs.
      [backend-setup]
      devices=sda,sdb,sdc
      vgs=custom_vg{1..3}
    • pools: This is an optional variable. This variable replaces the [pools] section in gdeploy 1.x. pools lists the thin pool names for the volume.
      [backend-setup]
      devices=sda,sdb,sdc
      vgs=custom_vg1,custom_vg2,custom_vg3
      pools=custom_pool1,custom_pool2,custom_pool3
      As with vgs, a pattern can be provided for thin pool names, for example custom_pool{1..3}.
    • lvs: This is an optional variable. This variable replaces the [lvs] section in gdeploy 1.x. lvs lists the logical volume names for the volume.
      [backend-setup]
      devices=sda,sdb,sdc
      vgs=custom_vg1,custom_vg2,custom_vg3
      pools=custom_pool1,custom_pool2,custom_pool3
      lvs=custom_lv1,custom_lv2,custom_lv3
      Patterns for LV names can be provided as with vgs, for example custom_lv{1..3}.
    • mountpoints: This variable replaces the [mountpoints] section in gdeploy 1.x. mountpoints lists the mount points where the logical volumes should be mounted. The number of mount points should equal the number of logical volumes. For example:
      [backend-setup]
      devices=sda,sdb,sdc
      vgs=custom_vg1,custom_vg2,custom_vg3
      pools=custom_pool1,custom_pool2,custom_pool3
      lvs=custom_lv1,custom_lv2,custom_lv3
      mountpoints=/gluster/data1,/gluster/data2,/gluster/data3
    • ssd: This variable is set if caching has to be added. For example, a backend setup with an SSD for caching is:
      [backend-setup]
      ssd=sdc
      vgs=RHS_vg1
      datalv=lv_data
      cachedatalv=lv_cachedata:1G
      cachemetalv=lv_cachemeta:230G

      Note

      Specifying the name of the data LV is necessary when adding an SSD. Make sure the datalv has already been created, or create it in one of the earlier backend-setup sections.
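The name patterns accepted by vgs, pools, and lvs, and the rule that each list must line up with the devices, can be illustrated in a few lines. The following Python sketch is a hypothetical helper that mirrors the behavior described above; it is not gdeploy's own parser and only handles the {start..end} numeric form shown in this guide.

```python
import re

_PATTERN = re.compile(r"(.*)\{(\d+)\.\.(\d+)\}(.*)")


def expand_names(value):
    """Expand 'custom_vg{1..3}' or 'a,b,c' into a list of names."""
    names = []
    for item in value.split(","):
        match = _PATTERN.fullmatch(item.strip())
        if match is None:
            names.append(item.strip())
            continue
        prefix, start, end, suffix = match.groups()
        names.extend(f"{prefix}{i}{suffix}" for i in range(int(start), int(end) + 1))
    return names


def names_for_devices(devices, value):
    """Return one name per device, or raise if the counts differ."""
    device_list = devices.split(",")
    names = expand_names(value)
    if len(names) != len(device_list):
        raise ValueError(f"{len(names)} names for {len(device_list)} devices")
    return dict(zip(device_list, names))


print(names_for_devices("sda,sdb,sdc", "custom_vg{1..3}"))
```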
  • PV

    Available in gdeploy 2.0. If you need more control over setting up the backend and do not want to use the backend-setup section, use the pv, vg, and lv modules. The pv module supports the following variables:

    • action: Mandatory. Supports two values, 'create' and 'resize'
      Example: Creating physical volumes
      [pv]
      action=create
      devices=vdb,vdc,vdd
      Example: Creating physical volumes on a specific host
      [pv:10.0.5.2]
      action=create
      devices=vdb,vdc,vdd
    • devices: Mandatory. The list of devices to use for pv creation.
    • expand: Used when action=resize.
      Example: Expanding an already created pv
      [pv]
      action=resize
      devices=vdb
      expand=yes
    • shrink: Used when action=resize.
      Example: Shrinking an already created pv
      [pv]
      action=resize
      devices=vdb
      shrink=100G
    • dalign:
      The Logical Volume Manager can use a portion of the physical volume for storing its metadata while the rest is used as the data portion. Align the I/O at the Logical Volume Manager (LVM) layer using the dalign option while creating the physical volume. For example:
      [pv]
      action=create
      devices=sdb,sdc,sdd,sde
      dalign=256k
      For JBOD, use an alignment value of 256K. For hardware RAID, the alignment value should be obtained by multiplying the RAID stripe unit size with the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
      The following example is appropriate for 12 disks in a RAID 6 configuration with a stripe unit size of 128 KiB:
      [pv]
      action=create
      devices=sdb,sdc,sdd,sde
      dalign=1280k
      The following example is appropriate for 12 disks in a RAID 10 configuration with a stripe unit size of 256 KiB:
      [pv]
      action=create
      devices=sdb,sdc,sdd,sde
      dalign=1536k
      To view the previously configured physical volume settings for the dalign option, run the pvs -o +pe_start device command. For example:
      # pvs -o +pe_start /dev/sdb
      PV         VG   Fmt  Attr PSize PFree 1st PE
      /dev/sdb        lvm2 a--  9.09t 9.09t   1.25m
      You can also set the dalign option in the backend-setup section.
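The dalign rule described above (256K for JBOD, otherwise the stripe unit size multiplied by the number of data disks) can be written as a small function. The following Python sketch is a hypothetical helper, not part of gdeploy. The RAID 6 and RAID 10 data-disk counts follow the examples above; the RAID 5 case assumes one disk's worth of parity, which is standard for RAID 5 but is not stated in this section.

```python
def data_disks(disktype, diskcount):
    """Number of data disks in a hardware RAID set of diskcount disks."""
    if disktype == "raid6":
        return diskcount - 2  # two disks' worth of parity
    if disktype == "raid5":
        return diskcount - 1  # one disk's worth of parity
    if disktype == "raid10":
        return diskcount // 2  # half the disks hold mirror copies
    raise ValueError(f"unsupported disktype: {disktype}")


def dalign(disktype, diskcount=None, stripe_unit_kib=None):
    """Return a dalign value string such as '1280k'."""
    if disktype == "jbod":
        return "256k"
    return f"{stripe_unit_kib * data_disks(disktype, diskcount)}k"


print(dalign("raid6", 12, 128))   # 12-disk RAID 6, 128 KiB stripe unit
print(dalign("raid10", 12, 256))  # 12-disk RAID 10, 256 KiB stripe unit
```

A dalign of 1280k is 1.25 MiB, which matches the 1st PE column in the pvs output above.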
  • VG

    Available in gdeploy 2.0. This module is used to create and extend volume groups. The vg module supports the following variables.

    • action - Action can be one of create or extend.
    • pvname - PVs to use to create the volume group. For more than one PV, use comma-separated values.
    • vgname - The name of the VG. If no name is provided, GLUSTER_vg is used as the default name.
    • one-to-one - If set to yes, a one-to-one mapping is created between PVs and VGs.
    If action is set to extend, the VG is extended to include the given PVs.
    Example1: Create a vg named images_vg with two PVs
    [vg]
    action=create
    vgname=images_vg
    pvname=sdb,sdc
    Example2: Create two vgs named rhgs_vg1 and rhgs_vg2 with two PVs
    [vg]
    action=create
    vgname=rhgs_vg
    pvname=sdb,sdc
    one-to-one=yes
    Example3: Extend an existing vg with the given disk.
    [vg]
    action=extend
    vgname=rhgs_images
    pvname=sdc
  • LV

    Available in gdeploy 2.0. This module is used to create logical volumes, set up caching, and convert or change logical volumes. The lv module supports the following variables:

    action - The action variable allows four values: 'create', 'setup-cache', 'convert', and 'change'. If the action is 'create', the following options are supported:
    • lvname: The name of the logical volume, this is an optional field. Default is GLUSTER_lv
    • poolname - Name of the thinpool volume name, this is an optional field. Default is GLUSTER_pool
    • lvtype - Type of the logical volume to be created; allowed values are 'thin' and 'thick'. This is an optional field; the default is thick.
    • size - Size of the logical volume. The default is to use all available space on the VG.
    • extent - Extent size, default is 100%FREE
    • force - Force LV creation without asking any questions. Allowed values are 'yes' and 'no'. This is an optional field; the default is yes.
    • vgname - Name of the volume group to use.
    • pvname - Name of the physical volume to use.
    • chunksize - The size of the chunk unit used for snapshots, cache pools, and thin pools. By default this is specified in kilobytes. For RAID 5 and 6 volumes, gdeploy calculates the default chunksize by multiplying the stripe size and the disk count. For RAID 10, the default chunksize is 256 KB. See Section 20.2, “Brick Configuration” for details.

      Warning

      Red Hat recommends using at least the default chunksize. If the chunksize is too small and your volume runs out of space for metadata, the volume is unable to create data. This includes the data required to increase the size of the metadata pool or to migrate data away from a volume that has run out of metadata space. Red Hat recommends monitoring your logical volumes to ensure that they are expanded or more storage created before metadata volumes become completely full.
    • poolmetadatasize - Sets the size of the pool's metadata logical volume. Allocate the maximum metadata size (16 GiB) if possible. If you allocate less than the maximum, allocate at least 0.5% of the pool size to ensure that you do not run out of metadata space.

      Warning

      If your metadata pool runs out of space, you cannot create data. This includes the data required to increase the size of the metadata pool or to migrate data away from a volume that has run out of metadata space. Monitor your metadata pool using the lvs -o+metadata_percent command and ensure that it does not run out of space.
    • virtualsize - Creates a thinly provisioned device or a sparse device of the given size.
    • mkfs - Creates a filesystem of the given type. Default is to use xfs.
    • mkfs-opts - mkfs options.
    • mount - Mount the logical volume.
    If the action is setup-cache, the following options are supported:
    • ssd - Name of the SSD device used to set up the cache, for example sda or vda.
    • vgname - Name of the volume group.
    • poolname - Name of the pool.
    • cache_meta_lv - Due to requirements from dm-cache (the kernel driver), LVM further splits the cache pool LV into two devices - the cache data LV and cache metadata LV. Provide the cache_meta_lv name here.
    • cache_meta_lvsize - Size of the cache meta lv.
    • cache_lv - Name of the cache data lv.
    • cache_lvsize - Size of the cache data.
    • force - Force
    If the action is convert, the following options are supported:
    • lvtype - type of the lv, available options are thin and thick
    • force - Force the lvconvert, default is yes.
    • vgname - Name of the volume group.
    • poolmetadata - Specifies cache or thin pool metadata logical volume.
    • cachemode - Allowed values writeback, writethrough. Default is writethrough.
    • cachepool - This argument is necessary when converting a logical volume to a cache LV. Name of the cachepool.
    • lvname - Name of the logical volume.
    • chunksize - The size of the chunk unit used for snapshots, cache pools, and thin pools. By default this is specified in kilobytes. For RAID 5 and 6 volumes, gdeploy calculates the default chunksize by multiplying the stripe size and the disk count. For RAID 10, the default chunksize is 256 KB. See Section 20.2, “Brick Configuration” for details.

      Warning

      Red Hat recommends using at least the default chunksize. If the chunksize is too small and your volume runs out of space for metadata, the volume is unable to create data. Red Hat recommends monitoring your logical volumes to ensure that they are expanded or more storage created before metadata volumes become completely full.
    • poolmetadataspare - Controls creation and maintenance of the pool metadata spare logical volume, which is used for automated pool recovery.
    • thinpool - Specifies or converts a logical volume into a thin pool's data volume. The volume's name or path must be given.
    If the action is change, the following options are supported:
    • lvname - Name of the logical volume.
    • vgname - Name of the volume group.
    • zero - Set zeroing mode for thin pool.
    Example 1: Create a thin pool
    [lv]
    action=create
    vgname=RHGS_vg1
    poolname=lvthinpool
    lvtype=thinpool
    poolmetadatasize=200MB
    chunksize=1024k
    size=30GB
    Example 2: Create a thick LV
    [lv]
    action=create
    vgname=RHGS_vg1
    lvname=engine_lv
    lvtype=thick
    size=10GB
    mount=/rhgs/brick1
    If there is more than one LV, create the LVs in numbered sections, such as [lv1], [lv2], and so on.
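The default chunksize rule described for the convert action can be sketched as a small function. This is a minimal illustration, assuming hypothetical labels "raid5", "raid6" and "raid10" for the RAID type and a stripe size given in KB; which disks make up the disk count is described in Section 20.2, "Brick Configuration", and gdeploy reads these values from its own configuration rather than from a function like this.

```python
# Sketch of the default chunksize rule described above for the lv module.
# The raid_type labels ("raid5", "raid6", "raid10") are hypothetical inputs;
# gdeploy itself derives the RAID layout from its own configuration.

def default_chunksize_kb(raid_type: str, stripe_size_kb: int = 0, disk_count: int = 0) -> int:
    """Return the default thin pool chunksize in KB for a given RAID layout."""
    raid = raid_type.lower()
    if raid in ("raid5", "raid6"):
        if stripe_size_kb <= 0 or disk_count <= 0:
            raise ValueError("RAID 5/6 needs a positive stripe size and disk count")
        # RAID 5 and 6: stripe size multiplied by the disk count.
        return stripe_size_kb * disk_count
    if raid == "raid10":
        # RAID 10: fixed default of 256 KB.
        return 256
    raise ValueError(f"no default chunksize described for {raid_type!r}")


print(default_chunksize_kb("raid6", stripe_size_kb=128, disk_count=10))  # 1280
print(default_chunksize_kb("raid10"))  # 256
```

A result of 1280 would correspond to chunksize=1280k in an [lv] section; the warning above still applies if you choose a smaller value.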
  • RH-subscription

    Available in gdeploy 2.0. This module is used to register and unregister systems, attach pools, and enable or disable repos. The RH-subscription module allows the following variables:

    If the action is register, the following options are supported:
    • username/activationkey: Username or activation key.
    • password/activationkey: Password or activation key.
    • auto-attach: true/false
    • pool: Name of the pool.
    • repos: Repos to subscribe to.
    • disable-repos: Repo names to disable. Leaving this option blank will disable all the repos.
    • ignore_register_errors: If set to no, gdeploy will exit if system registration fails.
    • If the action is attach-pool, the following options are supported:
      • pool - Pool name to be attached.
      • ignore_attach_pool_errors - If set to no, gdeploy fails if attach-pool fails.
    • If the action is enable-repos, the following options are supported:
      • repos - List of comma separated repos that are to be subscribed to.
      • ignore_enable_errors - If set to no, gdeploy fails if enable-repos fails.
    • If the action is disable-repos, the following options are supported:
      • repos - List of comma separated repos that are to be disabled.
      • ignore_disable_errors - If set to no, gdeploy fails if disable-repos fails.
    • If the action is unregister, the systems are unregistered. The following option is supported:
      • ignore_unregister_errors - If set to no, gdeploy fails if unregistering fails.
    Example 1: Subscribe to Red Hat Subscription network:
    [RH-subscription1]
    action=register
    username=qa@redhat.com
    password=<passwd>
    pool=<pool>
    ignore_register_errors=no
    Example 2: Disable all the repos:
    [RH-subscription2]
    action=disable-repos
    repos=
    Example 3: Enable a few repos
    [RH-subscription3]
    action=enable-repos
    repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rhel-7-server-rhev-mgmt-agent-rpms
    ignore_enable_errors=no
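Because the examples above use a plain [section] and key=value layout, the register and enable-repos sections can also be generated from a script. The following is a minimal sketch, assuming the gdeploy configuration file is INI-compatible so that Python's configparser can write it; the function name subscription_config is hypothetical.

```python
import configparser
import io

# Sketch: build the register and enable-repos sections programmatically.
# Assumes the gdeploy configuration file is INI-compatible, which matches
# the [section] / key=value layout used throughout this chapter.

def subscription_config(username: str, password: str, pool: str, repos: list) -> str:
    cfg = configparser.ConfigParser(interpolation=None)  # passwords may contain '%'
    cfg["RH-subscription1"] = {
        "action": "register",
        "username": username,
        "password": password,
        "pool": pool,
        "ignore_register_errors": "no",
    }
    cfg["RH-subscription2"] = {
        "action": "enable-repos",
        "repos": ",".join(repos),  # comma separated, no spaces
        "ignore_enable_errors": "no",
    }
    buf = io.StringIO()
    cfg.write(buf, space_around_delimiters=False)  # write key=value, as in the examples
    return buf.getvalue()


print(subscription_config("<user>@redhat.com", "<password>", "<pool-id>",
                          ["rhel-7-server-rpms", "rh-gluster-3-for-rhel-7-server-rpms"]))
```

Interpolation is disabled so that a password containing % is written literally instead of being rejected.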
  • yum

    Available in gdeploy 2.0. This module is used to install or remove RPM packages. The yum module can also add repos at install time.

    The action variable allows two values: install and remove.
    If the action is install the following options are supported:
    • packages - Comma separated list of packages that are to be installed.
    • repos - The repositories to be added.
    • gpgcheck - Whether to check GPG signatures. Allowed values are yes and no.
    • update - Whether yum update has to be initiated.
    If the action is remove, only one option has to be provided:
    • remove - The comma separated list of packages to be removed.
    For example:
    [yum1]
    action=install
    gpgcheck=no
    # Repos should be URLs; for example: http://repo-pointing-glusterfs-builds
    repos=<glusterfs.repo>,<vdsm.repo>
    packages=vdsm,vdsm-gluster,ovirt-hosted-engine-setup,screen,gluster-nagios-addons,xauth
    update=yes
    To install a package on a particular host only, append the host name to the section name:
    [yum2:host1]
    action=install
    gpgcheck=no
    packages=rhevm-appliance
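Section names in this chapter combine three parts: the module name, an optional number that distinguishes repeated sections ([lv1], [yum2]) and an optional :host suffix that restricts the section to one host ([yum2:host1]). The sketch below splits a name into those parts. It is a hypothetical helper that mirrors the convention shown in the examples; gdeploy's own parser may treat edge cases differently.

```python
import re

# Hypothetical helper mirroring the section naming convention shown in this
# chapter: an optional number distinguishes repeated sections ([lv1], [lv2]) and
# an optional ":host" suffix restricts a section to one host ([yum2:host1]).
# gdeploy's own parser may behave differently in edge cases.
SECTION_RE = re.compile(r"(?P<module>.+?)(?P<index>\d*)(?::(?P<host>.+))?")


def parse_section(name: str):
    """Split a section name into (module, index or None, host or None)."""
    match = SECTION_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"cannot parse section name {name!r}")
    index = match.group("index")
    return match.group("module"), int(index) if index else None, match.group("host")


for section in ("yum1", "yum2:host1", "RH-subscription3", "shell"):
    print(section, "->", parse_section(section))
```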
  • shell

    Available in gdeploy 2.0. This module allows the user to run shell commands on the remote nodes.

    Currently, the shell module supports a single action, execute, and a command variable that takes any valid shell command as its value.
    The following configuration executes vdsm-tool on all the nodes.
    [shell]
    action=execute
    command=vdsm-tool configure --force
  • update-file

    Available in gdeploy 2.0. The update-file module allows users to copy a file, edit a line in a file, or add new lines to a file. The action variable can be copy, edit, or add.

    When the action variable is set to copy, the following variables are supported.
    • src - The source path of the file to be copied from.
    • dest - The destination path on the remote machine to where the file is to be copied to.
    When the action variable is set to edit, the following variables are supported.
    • dest - The destination file name which has to be edited.
    • replace - A regular expression that matches the line to be replaced.
    • line - Text that replaces the matched line.
    When the action variable is set to add, the following variables are supported.
    • dest - File on the remote machine to which a line has to be added.
    • line - Line to be added to the file. The line is added at the end of the file.
    Example 1: Copy a file to a remote machine.
    [update-file]
    action=copy
    src=/tmp/foo.cfg
    dest=/etc/nagios/nrpe.cfg
    Example 2: Edit a line on the remote machine. In the following example, lines that contain allowed_hosts are replaced with allowed_hosts=host.redhat.com.
    [update-file]
    action=edit
    dest=/etc/nagios/nrpe.cfg
    replace=allowed_hosts
    line=allowed_hosts=host.redhat.com
    Example 3: Add a line to the end of a file
    [update-file]
    action=add
    dest=/etc/ntp.conf
    line=server clock.redhat.com iburst
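To make the edit and add actions concrete, the following sketch applies them to a file's contents in memory. It assumes that every line matching the replace regular expression is replaced, as the wording of Example 2 suggests; the exact matching rules gdeploy applies on the remote node (for example, how multiple matches are handled) are an assumption here, and the function names are hypothetical.

```python
import re

# Sketch of what the edit and add actions do to a file's contents, following
# the descriptions above. Assumption: every line matching the regular
# expression is replaced; gdeploy's exact matching rules may differ.

def edit_lines(text: str, replace: str, line: str) -> str:
    """action=edit: replace each line that matches the regex `replace` with `line`."""
    pattern = re.compile(replace)
    lines = [line if pattern.search(old) else old for old in text.splitlines()]
    return "".join(entry + "\n" for entry in lines)


def add_line(text: str, line: str) -> str:
    """action=add: append `line` at the end of the file."""
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new line starts on its own line
    return text + line + "\n"


nrpe = "allowed_hosts=127.0.0.1\nserver_port=5666\n"
print(edit_lines(nrpe, "allowed_hosts", "allowed_hosts=host.redhat.com"), end="")
```

Running Example 2's values against this sample nrpe.cfg content rewrites only the allowed_hosts line and leaves server_port untouched.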
  • service

    Available in gdeploy 2.0. The service module allows the user to start, stop, restart, reload, enable, or disable a service. The action variable specifies which of these operations to perform.

    When the action variable is set to any of start, stop, restart, reload, enable, or disable, the service variable specifies the service to act on.
    • service - Name of the service to start, stop, and so on.
    Example: enable the ntpd service and restart it.
    [service1]
    action=enable
    service=ntpd
    [service2]
    action=restart
    service=ntpd
  • script

    Available in gdeploy 2.0. The script module enables the user to execute a script or binary on the remote machines. The action variable is set to execute, and the user can specify two variables, file and args:

    • file - An executable on the local machine.
    • args - Arguments to the above program.
    Example: Execute the script disable-multipath.sh on all the remote nodes listed in the [hosts] section.
    [script]
    action=execute
    file=/usr/share/ansible/gdeploy/scripts/disable-multipath.sh
  • firewalld

    Available in gdeploy 2.0. The firewalld module allows the user to manipulate firewall rules. The action variable supports two values, add and delete. Both add and delete support the following variables:

    • ports/services - The ports or services to add to, or delete from, the firewall.
    • permanent - Whether to make the entry permanent. Allowed values are true and false.
    • zone - The firewall zone. The default zone is public.
    For example:
    [firewalld]
    action=add
    ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp
    services=glusterfs
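The ports value is a comma separated list of port/protocol entries, where an entry can also be a range such as 5900-6923/tcp. The sketch below expands such a value so that overlaps (5900/tcp is already inside 5900-6923/tcp in the example) and missing entries (such as 2049/udp) are easy to spot. The helper names parse_ports and is_open are hypothetical; gdeploy passes the values to firewalld itself.

```python
# Sketch: expand a firewalld "ports" value (as used in the [firewalld] examples)
# into (first, last, protocol) tuples so that overlaps and gaps are easy to spot.
# The helper names are hypothetical; gdeploy passes the values to firewalld itself.

def parse_ports(spec: str):
    ranges = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        port_part, proto = item.split("/")
        if "-" in port_part:
            first, last = (int(p) for p in port_part.split("-"))
        else:
            first = last = int(port_part)
        if not 0 < first <= last <= 65535:
            raise ValueError(f"invalid port entry: {item}")
        ranges.append((first, last, proto))
    return ranges


def is_open(ranges, port: int, proto: str) -> bool:
    """True if port/proto falls inside any parsed entry."""
    return any(first <= port <= last and p == proto for first, last, p in ranges)


ranges = parse_ports("111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp")
print(is_open(ranges, 5900, "tcp"))  # True (also covered by 5900-6923/tcp)
print(is_open(ranges, 2049, "udp"))  # False
```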

5.1.8. Deploying NFS Ganesha using gdeploy

gdeploy supports the deployment and configuration of NFS Ganesha on Red Hat Gluster Storage 3.4, starting with gdeploy version 2.0.1.
NFS-Ganesha is a user space file server for the NFS protocol. For more information about NFS-Ganesha, see https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/#nfs_ganesha
5.1.8.1. Prerequisites
Ensure that the following prerequisites are met:
Subscribing to Subscription Manager

You must register with Subscription Manager and obtain the NFS Ganesha packages before continuing further.

Add the following details to the configuration file to register with Subscription Manager:
[RH-subscription1]
action=register
username=<user>@redhat.com
password=<password>
pool=<pool-id>
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
Enabling Repos

To enable the required repos, add the following details in the configuration file:

[RH-subscription2]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rh-gluster-3-nfs-for-rhel-7-server-rpms,rhel-ha-for-rhel-7-server-rpms,rhel-7-server-ansible-2-rpms
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
Enabling Firewall Ports

To enable the firewall ports, add the following details in the configuration file:

[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp
services=glusterfs,nlm,nfs,rpc-bind,high-availability,mountd,rquota

Note

To ensure that NFS client UDP mounts do not fail, add port 2049/udp to the [firewalld] section of the gdeploy configuration file.
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
Installing the Required Package:

To install the required package, add the following details in the configuration file:

[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=glusterfs-ganesha
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
5.1.8.2. Supported Actions
The NFS Ganesha module in gdeploy allows the user to perform the following actions:
  • Creating a Cluster
  • Destroying a Cluster
  • Adding a Node
  • Deleting a Node
  • Exporting a Volume
  • Unexporting a Volume
  • Refreshing NFS Ganesha Configuration
Creating a Cluster

This action creates a fresh NFS-Ganesha setup on a given volume. For this action, the [nfs-ganesha] section of the configuration file supports the following variables:

  • ha-name: This is an optional variable. By default it is ganesha-ha-360.
  • cluster-nodes: This is a required argument. This variable expects a comma separated list of cluster node names, which are used to form the cluster.
  • vip: This is a required argument. This variable expects a comma separated list of IP addresses, which are used as the virtual IP addresses.
  • volname: This is an optional variable if the configuration contains the [volume] section.
For example: To create an NFS-Ganesha cluster, add the following details in the configuration file:
[hosts]
host-1.example.com
host-2.example.com

[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick

[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp,662/tcp,662/udp
services=glusterfs,nlm,nfs,rpc-bind,high-availability,mountd,rquota

[volume]
action=create
volname=ganesha
transport=tcp
replica_count=2
force=yes

#Creating a high availability cluster and exporting the volume
[nfs-ganesha]
action=create-cluster
ha-name=ganesha-ha-360
cluster-nodes=host-1.example.com,host-2.example.com
vip=10.70.44.121,10.70.44.122
volname=ganesha
ignore_ganesha_errors=no
In the above example, it is assumed that the required packages are installed, a volume is created and NFS-Ganesha is enabled on it.
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
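Before running a create-cluster configuration, a few of the requirements listed above can be checked locally. This is a sketch, not gdeploy's own validation: it checks that cluster-nodes and vip are present, that every VIP is a valid IP address, and that volname is present when there is no [volume] section. It also assumes one virtual IP per cluster node, as in the example above. The function names are hypothetical.

```python
import configparser
import ipaddress

# Sketch of a pre-flight check for an [nfs-ganesha] create-cluster section.
# Assumption (not stated by gdeploy): one virtual IP per cluster node, as in
# the example above. The checks are illustrative, not gdeploy's own validation.

def split_csv(value):
    return [v.strip() for v in (value or "").split(",") if v.strip()]


def check_create_cluster(text: str) -> list:
    cfg = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    cfg.read_string(text)  # [hosts] entries are keys without values
    sec = cfg["nfs-ganesha"]
    problems = []
    nodes = split_csv(sec.get("cluster-nodes"))
    vips = split_csv(sec.get("vip"))
    if not nodes:
        problems.append("cluster-nodes is required")
    if not vips:
        problems.append("vip is required")
    for vip in vips:
        try:
            ipaddress.ip_address(vip)
        except ValueError:
            problems.append(f"not an IP address: {vip}")
    if nodes and vips and len(nodes) != len(vips):
        problems.append("number of VIPs differs from number of cluster nodes")
    if "volname" not in sec and not cfg.has_section("volume"):
        problems.append("volname is required when there is no [volume] section")
    return problems
```

Calling check_create_cluster() on the text of the example configuration above should return an empty list; removing one of the two VIPs would report the mismatch.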
Destroying a Cluster

The destroy-cluster action disables NFS Ganesha. It allows one variable, cluster-nodes.

For example: To destroy an NFS-Ganesha cluster, add the following details in the configuration file:
[hosts]
host-1.example.com
host-2.example.com

# To destroy the high availability cluster

[nfs-ganesha]
action=destroy-cluster
cluster-nodes=host-1.example.com,host-2.example.com
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Adding a Node

The add-node action allows three variables:

  • nodes: Accepts a list of comma separated hostnames that have to be added to the cluster
  • vip: Accepts a list of comma separated ip addresses.
  • cluster_nodes: Accepts a list of comma separated nodes of the NFS Ganesha cluster.
For example, to add a node, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com
host-3.example.com

[peer]
action=probe

[clients]
action=mount
volname=host-3.example.com:gluster_shared_storage
hosts=host-3.example.com
fstype=glusterfs
client_mount_points=/var/run/gluster/shared_storage/


[nfs-ganesha]
action=add-node
nodes=host-3.example.com
cluster_nodes=host-1.example.com,host-2.example.com
vip=10.0.0.33
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Deleting a Node

The delete-node action takes one variable, nodes, which specifies the node or nodes to delete from the NFS Ganesha cluster in a comma delimited list.

For example:
[hosts]
host-1.example.com
host-2.example.com
host-3.example.com
host-4.example.com

[nfs-ganesha]
action=delete-node
nodes=host-2.example.com
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Exporting a Volume

This action exports a volume. The export-volume action supports one variable, volname.

For example, to export a volume, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com

[nfs-ganesha]
action=export-volume
volname=ganesha
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Unexporting a Volume

This action unexports a volume. The unexport-volume action supports one variable, volname.

For example, to unexport a volume, add the following details to the configuration file:
[hosts]
host-1.example.com
host-2.example.com

[nfs-ganesha]
action=unexport-volume
volname=ganesha
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Refreshing NFS Ganesha Configuration

This action adds, deletes, or modifies lines, or adds a config block, in the configuration file, and then runs refresh-config on the cluster.

The action refresh-config supports the following variables:
  • del-config-lines
  • block-name
  • config-block
  • volname
  • ha-conf-dir
  • update_config_lines
Example 1 - To add a client block and run refresh-config, add the following details to the configuration file:

Note

refresh-config with a client block has a few limitations:
  • Works for only one client
  • User cannot delete a line from a config block
[hosts]
host1-example.com
host2-example.com

[nfs-ganesha]
action=refresh-config
# Default block name is 'client'
block-name=client
config-block=clients = 10.0.0.1;|allow_root_access = true;|access_type = "RO";|Protocols = "2", "3";|anonymous_uid = 1440;|anonymous_gid = 72;
volname=ganesha
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
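In Example 1, the config-block value packs several configuration statements into a single line, separated by the | character. The sketch below shows how such a value breaks down into individual statements. It assumes that each statement ends with a semicolon, as in the example; the brace layout mirrors NFS-Ganesha block syntax but is illustrative, not the exact text gdeploy writes. The function name render_block is hypothetical.

```python
# Sketch: show how a '|'-separated config-block value (Example 1) breaks down
# into individual configuration lines. Assumption: each statement ends with
# ';' as in the example; the brace layout mirrors NFS-Ganesha block syntax
# but is illustrative, not the exact text gdeploy writes.

def render_block(block_name: str, config_block: str, indent: str = "    ") -> str:
    statements = [part.strip() for part in config_block.split("|") if part.strip()]
    for statement in statements:
        if not statement.endswith(";"):
            raise ValueError(f"statement without trailing ';': {statement!r}")
    body = "\n".join(indent + statement for statement in statements)
    return f"{block_name} {{\n{body}\n}}"


print(render_block("client", 'clients = 10.0.0.1;|allow_root_access = true;|access_type = "RO";'
                             '|Protocols = "2", "3";|anonymous_uid = 1440;|anonymous_gid = 72;'))
```

The check for a trailing semicolon catches a common mistake when editing the long single-line value by hand.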
Example 2 - To delete a line and run refresh-config add the following details to the configuration file:
[hosts]
host1-example.com
host2-example.com


[nfs-ganesha]
action=refresh-config
del-config-lines=client
volname=ganesha
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Example 3 - To run refresh-config on a volume add the following details to the configuration file:
[hosts]
host1-example.com
host2-example.com


[nfs-ganesha]
action=refresh-config
volname=ganesha
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Example 4 - To modify a line and run refresh-config add the following details to the configuration file:
[hosts]
host1-example.com
host2-example.com


[nfs-ganesha]
action=refresh-config
update_config_lines=Access_type = "RO";
#update_config_lines=Protocols = "4";
#update_config_lines=clients = 10.0.0.1;
volname=ganesha
Execute the configuration using the following command:
# gdeploy -c <config_file_name>

5.1.9. Deploying Samba / CTDB using gdeploy

The Server Message Block (SMB) protocol can be used to access Red Hat Gluster Storage volumes by exporting directories in GlusterFS volumes as SMB shares on the server. In Red Hat Gluster Storage, Samba is used to share volumes through the SMB protocol.
5.1.9.1. Prerequisites
Ensure that the following prerequisites are met:
Subscribing to Subscription Manager

You must register with Subscription Manager and obtain the Samba packages before continuing further.

Add the following details to the configuration file to register with Subscription Manager:
[RH-subscription1]
action=register
username=<user>@redhat.com
password=<password>
pool=<pool-id>
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
Enabling Repos

To enable the required repos, add the following details in the configuration file:

[RH-subscription2]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rh-gluster-3-samba-for-rhel-7-server-rpms,rhel-7-server-ansible-2-rpms
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
Enabling Firewall Ports

To enable the firewall ports, add the following details in the configuration file:

[firewalld]
action=add
ports=54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,4379/tcp
services=glusterfs,samba,high-availability
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
Installing the Required Package:

To install the required package, add the following details in the configuration file:

[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=samba,samba-client,glusterfs-server,ctdb
Execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
5.1.9.2. Setting up Samba
Samba can be enabled in two ways:
  • Enabling Samba on an existing volume
  • Enabling Samba while creating a volume
Enabling Samba on an existing volume

If a Red Hat Gluster Storage volume is already present, set the action to smb-setup in the [volume] section. All the hosts in the cluster must be listed, because gdeploy updates the glusterd configuration files on each of the hosts.

For example, to enable Samba on an existing volume, add the following details to the configuration file:
[hosts]
10.70.37.192
10.70.37.88

[volume]
action=smb-setup
volname=samba1
force=yes
smb_username=smbuser
smb_mountpoint=/mnt/smb

Note

Ensure that the hosts are not part of the CTDB cluster.
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
Enabling Samba while creating a Volume

To set up Samba while creating a volume, set the smb variable to yes in the configuration file.

For example, to enable Samba while creating a volume, add the following details to the configuration file:
[hosts]
10.70.37.192
10.70.37.88

[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick

[volume]
action=create
volname=samba1
smb=yes
force=yes
smb_username=smbuser
smb_mountpoint=/mnt/smb
Execute the configuration using the following command:
# gdeploy -c <config_file_name>

Note

In both cases, smb_username and smb_mountpoint are required if Samba has to be set up with the ACLs set correctly.
5.1.9.3. Setting up CTDB
Using CTDB requires setting up a separate volume in order to protect the CTDB lock file. Red Hat recommends a replicated volume where the replica count is equal to the number of servers being used as Samba servers.
The following configuration file sets up a CTDB volume across two hosts that are also Samba servers.
[hosts]
10.70.37.192
10.70.37.88

[volume]
action=create
volname=ctdb
transport=tcp
replica_count=2
force=yes

[ctdb]
action=setup
public_address=10.70.37.6/24 eth0,10.70.37.8/24 eth0
volname=ctdb
You can configure the CTDB cluster to use separate IP addresses by using the ctdb_nodes parameter, as shown in the following example.
[hosts]
10.70.37.192
10.70.37.88

[volume]
action=create
volname=ctdb
transport=tcp
replica_count=2
force=yes

[ctdb]
action=setup
public_address=10.70.37.6/24 eth0,10.70.37.8/24 eth0
ctdb_nodes=192.168.1.1,192.168.2.5
volname=ctdb
Execute the configuration using the following command:
# gdeploy -c <config_file_name>
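Each entry in the public_address value combines an address with its prefix length and the network interface it is assigned to, for example 10.70.37.6/24 eth0. The following sketch splits the value into those parts and validates the addresses. Each entry has the same "address/prefix interface" shape as a line in CTDB's public_addresses file; how gdeploy writes that file is not shown in this guide, and the helper names are hypothetical.

```python
import ipaddress

# Sketch: split the public_address value used in the [ctdb] section into
# (address/prefix, interface) pairs. Each entry has the same "address/prefix
# interface" shape as a line in CTDB's public_addresses file; how gdeploy
# writes that file is an assumption not covered here.

def parse_public_addresses(value: str):
    entries = []
    for item in value.split(","):
        addr, _, iface = item.strip().partition(" ")
        iface = iface.strip()
        if not iface:
            raise ValueError(f"missing interface name in {item!r}")
        entries.append((ipaddress.ip_interface(addr), iface))  # validates address/prefix
    return entries


def public_addresses_lines(entries) -> str:
    return "".join(f"{addr.with_prefixlen} {iface}\n" for addr, iface in entries)


entries = parse_public_addresses("10.70.37.6/24 eth0,10.70.37.8/24 eth0")
print(public_addresses_lines(entries), end="")
```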

5.1.10. Enabling SSL on a Volume

You can create volumes with SSL enabled, or enable SSL on an existing volume, using gdeploy (v2.0.1 onwards). This section explains how to write the configuration files for gdeploy to enable SSL.
5.1.10.1. Creating a Volume and Enabling SSL
To create a volume and enable SSL on it, add the following details to the configuration file:
[hosts]
10.70.37.147
10.70.37.47

[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick

[volume]
action=create
volname=vol1
transport=tcp
replica_count=2
force=yes
enable_ssl=yes
ssl_clients=10.70.37.107,10.70.37.173
brick_dirs=/data/1

[clients]
action=mount
hosts=10.70.37.173,10.70.37.107
volname=vol1
fstype=glusterfs
client_mount_points=/mnt/data
In the above example, a volume named vol1 is created and SSL is enabled on it. gdeploy creates self-signed certificates.
After adding the details to the configuration file, execute the following command to run the configuration file:
# gdeploy -c <config_file_name>
5.1.10.2. Enabling SSL on an Existing Volume
To enable SSL on an existing volume, add the following details to the configuration file:
[hosts]
10.70.37.147
10.70.37.47

# It is important for the clients to be unmounted before setting up SSL
[clients1]
action=unmount
hosts=10.70.37.173,10.70.37.107
client_mount_points=/mnt/data

[volume]
action=enable-ssl
volname=vol2
ssl_clients=10.70.37.107,10.70.37.173

[clients2]
action=mount
hosts=10.70.37.173,10.70.37.107
volname=vol2
fstype=glusterfs
client_mount_points=/mnt/data
After adding the details to the configuration file, execute the following command to run the configuration file:
# gdeploy -c <config_file_name>

5.1.11. Limiting Gluster Resources

When Red Hat Gluster Storage is deployed on the same machine as other resource-intensive software and services, it can be useful to limit the resources that glusterd attempts to use, to avoid resource contention between processes.
To limit the resources available to glusterd on a Red Hat Enterprise Linux 7 based installation of Red Hat Gluster Storage 3.2 or higher, set slice_setup=yes in the [service] section that starts the glusterd service. This applies a set of resource limitations for the glusterd service and all of its child processes.
[hosts]
192.168.100.101
192.168.100.102
192.168.100.103

[service]
action=start
service=glusterd
slice_setup=yes
The resource limitations set cannot be customized using gdeploy, but they can be manually modified outside the scope of gdeploy, for example, by using systemctl.
If you use a version of Red Hat Gluster Storage that is based on Red Hat Enterprise Linux 6, you cannot set up resource management using gdeploy. See Chapter 19, Managing Resource Usage for details.
For more information about resource management, see the Red Hat Enterprise Linux Resource Management Guide.

5.1.12. Gdeploy log files

Because gdeploy is usually run by non-privileged users, by default, gdeploy log files are written to /home/username/.gdeploy/logs/gdeploy.log instead of the /var/log directory.
You can change the log location by setting a different location as the value of the GDEPLOY_LOGFILE environment variable. For example, to set the gdeploy log location to /var/log/gdeploy/gdeploy.log for this session, run the following command:
$ export GDEPLOY_LOGFILE=/var/log/gdeploy/gdeploy.log
To persistently set this as the default log location for this user, add the same command as a separate line in the /home/username/.bash_profile file for that user.
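The log-location rule above can be summarized in code. If GDEPLOY_LOGFILE is set, its value is used; otherwise the log goes under the user's home directory. This is a minimal sketch that uses Path.home() to stand in for /home/username. It assumes that an empty GDEPLOY_LOGFILE is treated as unset, which this guide does not state.

```python
import os
from pathlib import Path

# Sketch of the log-location rule described above: GDEPLOY_LOGFILE wins when set;
# otherwise logs go to ~/.gdeploy/logs/gdeploy.log for the current user.
# Path.home() stands in for /home/username; treating an empty value as unset
# is an assumption.

def gdeploy_log_path(env=None) -> Path:
    env = os.environ if env is None else env
    override = env.get("GDEPLOY_LOGFILE")
    if override:
        return Path(override)
    return Path.home() / ".gdeploy" / "logs" / "gdeploy.log"


print(gdeploy_log_path())
```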

5.2. Managing Volumes using Heketi

Heketi provides a RESTful management interface which can be used to manage the lifecycle of Red Hat Gluster Storage volumes. With Heketi, cloud services like OpenStack Manila, Kubernetes, and OpenShift can dynamically provision Red Hat Gluster Storage volumes with any of the supported durability types. Heketi automatically determines the location for bricks across the cluster, making sure to place bricks and their replicas across different failure domains. Heketi also supports any number of Red Hat Gluster Storage clusters, allowing cloud services to provide network file storage without being limited to a single Red Hat Gluster Storage cluster.
With Heketi, the administrator no longer manages or configures bricks, disks, or trusted storage pools. The Heketi service manages all hardware for the administrator, allocating storage on demand. Any disks registered with Heketi must be provided in raw format; Heketi then manages them using LVM.

Note

The replica 3 volume type is the default and the only supported volume type that can be created using Heketi.
Figure 5.1. Heketi volume creation

A create volume request to Heketi leads it to select bricks spread across 2 zones and 4 nodes. After the volume is created in Red Hat Gluster Storage, Heketi provides the volume information to the service that initially made the request.
Heketi can be configured and executed using the CLI or the API. The following sections describe configuring Heketi using the CLI.

5.2.1. Prerequisites

Ensure that the following requirements are met:
Configure SSH access
Configure key-based SSH authentication without a password for the Heketi user. For a non-root user:
  • Ensure that the user and server specified when copying SSH keys match the user provided to Heketi in the Heketi configuration file.
  • Ensure the user can use sudo by disabling requiretty in the /etc/sudoers file and adding sudo: true to the sshexec configuration section in the Heketi configuration file.
Configure the firewall
Ensure that Heketi can accept TCP requests over the port specified in the heketi.json file. For example, on Red Hat Enterprise Linux 7 based installations, run the following commands:
# firewall-cmd --zone=zone_name --add-port=port/tcp
# firewall-cmd --zone=zone_name --add-port=port/tcp --permanent
On Red Hat Enterprise Linux 6 based installations, run the following commands:
# iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport port -j ACCEPT
# service iptables save
Start glusterd
After Red Hat Gluster Storage is installed, ensure that the glusterd service is started.
Ensure disks are raw format
Disks to be registered with Heketi must be in the raw format.

5.2.2. Installing Heketi

Note

Heketi is supported only on Red Hat Enterprise Linux 7.
After installing Red Hat Gluster Storage 3.4, execute the following command to install the heketi-client:
# yum install heketi-client
The heketi-client package provides the heketi-cli command line tool.
Execute the following command to install heketi:
# yum install heketi
For more information about subscribing to the required channels and installing Red Hat Gluster Storage, see the Red Hat Gluster Storage Installation Guide.

5.2.3. Starting the Heketi Server

Before starting the server, ensure that the following prerequisites are met:
  • Generate the passphrase-less SSH keys for the nodes which are going to be part of the trusted storage pool by running the following command:
    # ssh-keygen -f /etc/heketi/heketi_key -t rsa -N ''
  • Change the owner and group of the Heketi keys using the following command:
    # chown heketi:heketi /etc/heketi/heketi_key*
  • Set up key-based SSH authentication access between Heketi and the Red Hat Gluster Storage servers by running the following command:
    # ssh-copy-id -i /etc/heketi/heketi_key.pub root@server
  • As a non-root user, set up password-less SSH access between Heketi and the Red Hat Gluster Storage servers by running the following command:
    $ ssh-copy-id -i /etc/heketi/heketi_key.pub user@server
    Note

    To run SSH as a non-root user, the user name in <user>@server for ssh-copy-id must match the user name provided to Heketi in the Heketi configuration file.
  • Set up the heketi.json configuration file. The file is located at /etc/heketi/heketi.json. The configuration file has the information required to run the Heketi server. The config file must be in JSON format with the following settings:
    • port: string, Heketi REST service port number
    • use_auth: bool, Enable JWT Authentication
    • jwt: map, JWT Authentication settings
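      • admin: map, Settings for the Heketi administrator, who has access to all APIs
        • key: string, Secret key for the administrator
      • user: map, Settings for the user that has access only to volume requests (the /volumes endpoint)
        • key: string, Secret key for the user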
    • glusterfs: map, Red Hat Gluster Storage settings
      • executor: string, Determines the type of command executor to use. Possible values are:
        • mock: Does not send any commands out to servers. Can be used for development and tests
        • ssh: Sends commands to real systems over ssh
      • db: string, Location of Heketi database
      • sshexec: map, SSH configuration
        • keyfile: string, File with private ssh key
        • user: string, SSH user
    The following is an example of the JSON file:
    {
      "_port_comment": "Heketi Server Port Number",
      "port": "8080",
    
      "_use_auth": "Enable JWT authorization. Please enable for deployment",
      "use_auth": false,
    
      "_jwt": "Private keys for access",
      "jwt": {
        "_admin": "Admin has access to all APIs",
        "admin": {
          "key": "My Secret"
        },
        "_user": "User only has access to /volumes endpoint",
        "user": {
          "key": "My Secret"
        }
      },
    
      "_glusterfs_comment": "GlusterFS Configuration",
      "glusterfs": {
        "_executor_comment": [
          "Execute plugin. Possible choices: mock, ssh",
          "mock: This setting is used for testing and development.",
          "      It will not send commands to any node.",
          "ssh:  This setting will notify Heketi to ssh to the nodes.",
          "      It will need the values in sshexec to be configured.",
          "kubernetes: Communicate with GlusterFS containers over",
          "            Kubernetes exec api."
        ],
        "executor": "ssh",
    
        "_sshexec_comment": "SSH username and private key file information",
        "sshexec": {
          "keyfile": "path/to/private_key",
          "user": "sshuser",
          "port": "Optional: ssh port.  Default is 22",
          "fstab": "Optional: Specify fstab file on node.  Default is /etc/fstab",
          "sudo": "Optional: set to true if SSH as a non root user. Default is false."
        },
    
        "_kubeexec_comment": "Kubernetes configuration",
        "kubeexec": {
          "host" :"https://kubernetes.host:8443",
          "cert" : "/path/to/crt.file",
          "insecure": false,
          "user": "kubernetes username",
          "password": "password for kubernetes user",
          "namespace": "OpenShift project or Kubernetes namespace",
          "fstab": "Optional: Specify fstab file on node.  Default is /etc/fstab"
        },
    
        "_db_comment": "Database file name",
        "db": "/var/lib/heketi/heketi.db",
    
        "_loglevel_comment": [
          "Set log level. Choices are:",
          "  none, critical, error, warning, info, debug",
          "Default is warning"
        ],
        "loglevel" : "debug"
      }
    }
    

    Note

    The location for the private SSH key that is created must be set in the keyfile setting of the configuration file, and the key should be readable by the heketi user.
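A few of the settings listed above can be checked before you start the server. The sketch below loads heketi.json and reports obvious problems. It is not Heketi's own validation. The rules, for example that sudo must be a JSON boolean, are assumptions drawn from the field descriptions. For example, the sample file above would be flagged because its sudo field still contains a placeholder string. The function name check_heketi_config is hypothetical.

```python
import json

# Sketch of a sanity check for heketi.json, based on the settings listed above.
# It is not Heketi's own validation; the rules below (for example, that "sudo"
# must be a JSON boolean) are assumptions drawn from the field descriptions.
EXECUTORS = {"mock", "ssh", "kubernetes"}


def check_heketi_config(text: str) -> list:
    cfg = json.loads(text)
    problems = []
    port = cfg.get("port")
    if not (isinstance(port, str) and port.isdigit()):
        problems.append('port should be a string of digits, such as "8080"')
    if cfg.get("use_auth") is True:
        jwt = cfg.get("jwt", {})
        for role in ("admin", "user"):
            if not jwt.get(role, {}).get("key"):
                problems.append(f"jwt.{role}.key is required when use_auth is true")
    gluster = cfg.get("glusterfs", {})
    executor = gluster.get("executor")
    if executor not in EXECUTORS:
        problems.append(f"unknown executor: {executor!r}")
    if executor == "ssh":
        sshexec = gluster.get("sshexec", {})
        for field in ("keyfile", "user"):
            if not sshexec.get(field):
                problems.append(f"glusterfs.sshexec.{field} is required for the ssh executor")
        if "sudo" in sshexec and not isinstance(sshexec["sudo"], bool):
            problems.append("glusterfs.sshexec.sudo should be true or false")
    if not gluster.get("db"):
        problems.append("glusterfs.db is missing")
    return problems
```

To check the installed file, you could pass the contents of /etc/heketi/heketi.json to check_heketi_config() and print each reported problem.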
5.2.3.1. Starting the Server
For Red Hat Enterprise Linux 7

  1. Enable heketi by executing the following command:
    # systemctl enable heketi
  2. Start the Heketi server, by executing the following command:
    # systemctl start heketi
  3. To check the status of the Heketi server, execute the following command:
    # systemctl status heketi
  4. To check the logs, execute the following command:
    # journalctl -u heketi

Note

After Heketi is configured to manage the trusted storage pool, gluster commands should not be run on it, as this makes the Heketi database inconsistent and leads to unexpected behavior in Heketi.
5.2.3.2. Verifying the Configuration
To verify that the server is running, execute the following step:
If Heketi is not set up with authentication, use curl to verify the configuration:
# curl http://<server:port>/hello
You can also verify the configuration using the heketi-cli when authentication is enabled:
# heketi-cli --server http://<server:port> --user <user> --secret <secret> cluster list

5.2.4. Setting up the Topology

Setting up the topology allows Heketi to determine which nodes, disks, and clusters to use.
5.2.4.1. Prerequisites
You must determine the node failure domains and clusters of nodes. A failure domain is a value assigned to a set of nodes that share the same switch, power supply, or any other component whose failure would cause them to fail at the same time. Heketi uses this information to ensure that replicas are created across failure domains, so that cloud services get volumes that are resilient to both data unavailability and data loss.
You must also determine which nodes constitute a cluster. Heketi supports multiple Red Hat Gluster Storage clusters, which gives cloud services the option of specifying a set of clusters where a volume must be created. This lets cloud services and administrators create SSD, SAS, SATA, or any other type of cluster that provides a specific quality of service to users.
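To illustrate why zone values matter, the following Python sketch picks nodes for one replica set by taking nodes round-robin from each zone, so copies land in different failure domains whenever enough zones exist. This is a hypothetical helper written for illustration; it is not Heketi's actual allocation code, and the round-robin policy is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def pick_replica_nodes(node_zones: Dict[str, int], replica: int) -> List[str]:
    """Pick `replica` distinct nodes, spreading them across zones.

    Illustration only (not Heketi's allocator): take the first node of every
    zone, then the second node of every zone, and so on.
    """
    if len(node_zones) < replica:
        raise ValueError(f"need {replica} nodes, topology has {len(node_zones)}")
    by_zone: Dict[int, List[str]] = defaultdict(list)
    for node, zone in sorted(node_zones.items()):
        by_zone[zone].append(node)
    zones = [by_zone[z] for z in sorted(by_zone)]
    picked: List[str] = []
    depth = 0
    while len(picked) < replica:
        for nodes in zones:
            if depth < len(nodes) and len(picked) < replica:
                picked.append(nodes[depth])
        depth += 1
    return picked
```

With three zones, a three-way replica set uses one node per zone. With only two zones, two copies necessarily share a zone, but never a node.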

Note

Heketi currently has no mechanism to inspect an existing system and build its database from it. Therefore, you must configure a new trusted storage pool for Heketi to use.
5.2.4.2. Topology Setup
The command line client loads the information about creating a cluster, adding nodes to that cluster, and then adding disks to each of those nodes. This information is described in the topology file. To load a topology file with heketi-cli, execute the following commands:

Note

A sample, formatted topology file (topology-sample.json) is installed with the ‘heketi-client’ package in the /usr/share/heketi/ directory.
# export HEKETI_CLI_SERVER=http://<heketi_server:port>
# heketi-cli topology load --json=<topology_file>
Where topology_file is a file in JSON format describing the clusters, nodes, and disks to add to Heketi. The format of the file is as follows:
clusters: Array of clusters
  • Each element in the array is a map that describes the cluster as follows:
    • nodes: Array of nodes in a cluster
      Each element in the array is a map that describes the node as follows:
      • node: Same as Node Add, except there is no need to supply the cluster ID.
      • devices: Name of each disk to be added
      • zone: The failure domain in which the node exists.
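For larger pools, generating the topology file is less error-prone than editing it by hand. The following Python sketch builds a dictionary with the same nesting as the sample file below (clusters, nodes, node.hostnames.manage/storage, node.zone, devices) and serializes it with the standard json module. The helper name build_topology and the tuple layout of its input are assumptions made for this example.

```python
import json
from typing import Iterable, List, Tuple

# (management address, storage address, zone, device paths)
NodeSpec = Tuple[str, str, int, List[str]]


def build_topology(clusters: Iterable[Iterable[NodeSpec]]) -> dict:
    """Return a dict laid out like the topology sample file."""
    return {
        "clusters": [
            {
                "nodes": [
                    {
                        "node": {
                            "hostnames": {"manage": [manage], "storage": [storage]},
                            "zone": zone,
                        },
                        "devices": list(devices),
                    }
                    for manage, storage, zone, devices in cluster
                ]
            }
            for cluster in clusters
        ]
    }


if __name__ == "__main__":
    disks = [f"/dev/sd{c}" for c in "bcdefghi"]
    topo = build_topology([[("10.0.0.1", "10.0.0.1", 1, disks),
                            ("10.0.0.2", "10.0.0.2", 2, disks)]])
    print(json.dumps(topo, indent=4))
```

Redirect the printed JSON to a file and pass that file to heketi-cli topology load --json=<topology_file>.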
For example:
  1. Topology file:
    {
        "clusters": [
            {
                "nodes": [
                    {
                        "node": {
                            "hostnames": {
                                "manage": [
                                    "10.0.0.1"
                                ],
                                "storage": [
                                    "10.0.0.1"
                                ]
                            },
                            "zone": 1
                        },
                        "devices": [
                            "/dev/sdb",
                            "/dev/sdc",
                            "/dev/sdd",
                            "/dev/sde",
                            "/dev/sdf",
                            "/dev/sdg",
                            "/dev/sdh",
                            "/dev/sdi"
                        ]
                    },
                    {
                        "node": {
                            "hostnames": {
                                "manage": [
                                    "10.0.0.2"
                                ],
                                "storage": [
                                    "10.0.0.2"
                                ]
                            },
                            "zone": 2
                        },
                        "devices": [
                            "/dev/sdb",
                            "/dev/sdc",
                            "/dev/sdd",
                            "/dev/sde",
                            "/dev/sdf",
                            "/dev/sdg",
                            "/dev/sdh",
                            "/dev/sdi"
                        ]
                    },
    
    .......
    .......
  2. Load the Heketi JSON file:
    # heketi-cli topology load --json=topology_libvirt.json
    Creating cluster ... ID: a0d9021ad085b30124afbcf8df95ec06
            Creating node 192.168.10.100 ... ID: b455e763001d7903419c8ddd2f58aea0
                    Adding device /dev/vdb ... OK
                    Adding device /dev/vdc ... OK
    …….
            Creating node 192.168.10.101 ... ID: 4635bc1fe7b1394f9d14827c7372ef54
                    Adding device /dev/vdb ... OK
                    Adding device /dev/vdc ... OK
    ………….
    
  3. Execute the following command to check the details of a particular node:
    # heketi-cli node info b455e763001d7903419c8ddd2f58aea0
    Node Id: b455e763001d7903419c8ddd2f58aea0
    Cluster Id: a0d9021ad085b30124afbcf8df95ec06
    Zone: 1
    Management Hostname: 192.168.10.100
    Storage Hostname: 192.168.10.100
    Devices:
    Id:0ddba53c70537938f3f06a65a4a7e88b   Name:/dev/vdi            Size (GiB):499     Used (GiB):0       Free (GiB):499
    Id:4fae3aabbaf79d779795824ca6dc433a   Name:/dev/vdg            Size (GiB):499     Used (GiB):0       Free (GiB):499
    …………….
  4. Execute the following command to check the details of the cluster:
    # heketi-cli cluster info a0d9021ad085b30124afbcf8df95ec06
    Cluster id: a0d9021ad085b30124afbcf8df95ec06
    Nodes:
    4635bc1fe7b1394f9d14827c7372ef54
    802a3bfab2d0295772ea4bd39a97cd5e
    b455e763001d7903419c8ddd2f58aea0
    ff9eeb735da341f8772d9415166b3f9d
    Volumes:
  5. To check the details of the device, execute the following command:
    # heketi-cli device info 0ddba53c70537938f3f06a65a4a7e88b
    Device Id: 0ddba53c70537938f3f06a65a4a7e88b
    Name: /dev/vdi
    Size (GiB): 499
    Used (GiB): 0
    Free (GiB): 499
    Bricks:
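The node info output is formatted for people to read. If you want to total free capacity across devices in a script, the following Python sketch parses device lines in exactly the layout shown in the node info example above. It is an assumption that this layout is stable across heketi-cli versions, so verify the regular expression against your own output; parse_devices is a hypothetical helper.

```python
import re
from typing import Dict, List

# Matches lines such as:
# Id:<id>   Name:/dev/vdi   Size (GiB):499   Used (GiB):0   Free (GiB):499
DEVICE_LINE = re.compile(
    r"Id:(?P<id>\w+)\s+Name:(?P<name>\S+)\s+"
    r"Size \(GiB\):(?P<size>\d+)\s+Used \(GiB\):(?P<used>\d+)\s+"
    r"Free \(GiB\):(?P<free>\d+)"
)


def parse_devices(node_info: str) -> List[Dict[str, object]]:
    """Extract id, name, and size/used/free (GiB) for each listed device."""
    devices = []
    for m in DEVICE_LINE.finditer(node_info):
        devices.append({
            "id": m["id"],
            "name": m["name"],
            "size": int(m["size"]),
            "used": int(m["used"]),
            "free": int(m["free"]),
        })
    return devices
```

Summing the free field of the result gives the capacity still available on that node.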
    

5.2.5. Creating a Volume

After Heketi is set up, you can use the CLI to create a volume.
  1. The syntax for creating a volume is as follows:
    # heketi-cli volume create --size=<size in GB> [options]
  2. For example, after setting up the topology file with two nodes in one failure domain and two nodes in another failure domain, create a 100 GB volume using the following command:
    # heketi-cli volume create --size=100
    Name: vol_0729fe8ce9cee6eac9ccf01f84dc88cc
    Size: 100
    Id: 0729fe8ce9cee6eac9ccf01f84dc88cc
    Cluster Id: a0d9021ad085b30124afbcf8df95ec06
    Mount: 192.168.10.101:vol_0729fe8ce9cee6eac9ccf01f84dc88cc
    Mount Options: backupvolfile-servers=192.168.10.100,192.168.10.102
    Durability Type: replicate
    Replica: 3
    Snapshot: Disabled
    
    Bricks:
    Id: 8998961142c1b51ab82d14a4a7f4402d
    Path: /var/lib/heketi/mounts/vg_0ddba53c70537938f3f06a65a4a7e88b/brick_8998961142c1b51ab82d14a4a7f4402d/brick
    Size (GiB): 50
    Node: b455e763001d7903419c8ddd2f58aea0
    Device: 0ddba53c70537938f3f06a65a4a7e88b
     …………….
    
  3. To check the details of the device, execute the following command:
    # heketi-cli device info 0ddba53c70537938f3f06a65a4a7e88b
    Device Id: 0ddba53c70537938f3f06a65a4a7e88b
    Name: /dev/vdi
    Size (GiB): 499
    Used (GiB): 201
    Free (GiB): 298
    Bricks:
    Id:0f1766cc142f1828d13c01e6eed12c74   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_0ddba53c70537938f3f06a65a4a7e88b/brick_0f1766cc142f1828d13c01e6eed12c74/brick
    Id:5d944c47779864b428faa3edcaac6902   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_0ddba53c70537938f3f06a65a4a7e88b/brick_5d944c47779864b428faa3edcaac6902/brick
    Id:8998961142c1b51ab82d14a4a7f4402d   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_0ddba53c70537938f3f06a65a4a7e88b/brick_8998961142c1b51ab82d14a4a7f4402d/brick
    Id:a11e7246bb21b34a157e0e1fd598b3f9   Size (GiB):50      Path: /var/lib/heketi/mounts/vg_0ddba53c70537938f3f06a65a4a7e88b/brick_a11e7246bb21b34a157e0e1fd598b3f9/brick
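When planning capacity, remember that a replicated volume consumes its size once per copy. The following Python sketch estimates the number of bricks and the raw space for a volume, assuming every brick has the same size; it ignores LVM and thin-pool metadata overhead. The 50 GiB brick size comes from the sample output above; how Heketi chooses brick sizes is not described here, and raw_footprint is a hypothetical helper.

```python
def raw_footprint(volume_size: int, replica: int, brick_size: int) -> dict:
    """Estimate replica sets, bricks, and raw GiB used by a replicated volume.

    Assumes uniform brick size and ignores LVM/thin-pool metadata.
    """
    if volume_size <= 0 or replica < 1 or brick_size <= 0:
        raise ValueError("sizes and replica count must be positive")
    replica_sets = -(-volume_size // brick_size)  # integer ceiling
    bricks = replica_sets * replica
    return {"replica_sets": replica_sets, "bricks": bricks, "raw_gib": bricks * brick_size}
```

For the 100 GB volume with Replica: 3 and 50 GiB bricks, the estimate is 2 replica sets, 6 bricks, and 300 GiB of raw space across the pool.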

5.2.6. Expanding a Volume

Heketi expands a volume by using the add-brick command. You must provide the volume ID to expand a volume.
  1. Find the volume ID using the volume list command.
    # heketi-cli volume list
    Id:9d219903604cabed5ba234f4f04b2270    Cluster:dab7237f6d6d4825fca8b83a0fac24ac    Name:vol_9d219903604cabed5ba234f4f04b2270
    Id:a8770efe13a2269a051712905449f1c1    Cluster:dab7237f6d6d4825fca8b83a0fac24ac    Name:user1vol1
  2. Pass this volume ID to heketi-cli to expand the volume:
    # heketi-cli volume expand --volume <volume_id> --expand-size <size>
    For example:
    # heketi-cli volume expand --volume a8770efe13a2269a051712905449f1c1 --expand-size 30
    Name: user1vol1
    Size: 130
    Volume Id: a8770efe13a2269a051712905449f1c1
    Cluster Id: dab7237f6d6d4825fca8b83a0fac24ac
    Mount: 192.168.21.14:user1vol1
    Mount Options: backup-volfile-servers=192.168.21.15,192.168.21.16
    Block: false
    Free Size: 0
    Block Volumes: []
    Durability Type: replicate
    Distributed+Replica: 3
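If you script volume expansion, you usually know the volume name rather than its ID. The following Python sketch maps names to IDs from volume list output in the layout shown in step 1, and builds the argument list for the expand command using only the --volume and --expand-size options documented above. The helper names are hypothetical, and the output layout is assumed to match the example.

```python
import re
from typing import Dict, List

# Matches lines such as: Id:<id>    Cluster:<cluster_id>    Name:<name>
VOLUME_LINE = re.compile(r"Id:(?P<id>\w+)\s+Cluster:(?P<cluster>\w+)\s+Name:(?P<name>\S+)")


def volume_ids_by_name(volume_list: str) -> Dict[str, str]:
    """Map each volume name to its ID from `heketi-cli volume list` output."""
    return {m["name"]: m["id"] for m in VOLUME_LINE.finditer(volume_list)}


def expand_argv(volume_id: str, expand_size: int) -> List[str]:
    """Argument list for the heketi-cli volume expand command."""
    return ["heketi-cli", "volume", "expand",
            "--volume", volume_id, "--expand-size", str(expand_size)]
```

The resulting list can be passed to subprocess.run(argv, check=True) on a host where heketi-cli and HEKETI_CLI_SERVER are configured.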

5.2.7. Deleting a Volume

To delete a volume, execute the following command:
# heketi-cli volume delete <vol_id>
For example:
# heketi-cli volume delete 0729fe8ce9cee6eac9ccf01f84dc88cc
Volume 0729fe8ce9cee6eac9ccf01f84dc88cc deleted

5.3. About Encrypted Disk

Red Hat Gluster Storage provides the ability to create bricks on encrypted devices to restrict data access. Encrypted bricks can be used to create Red Hat Gluster Storage volumes.
For information on creating encrypted disks, refer to the following product documentation:
  • For RHEL 6, see Disk Encryption Appendix of the Red Hat Enterprise Linux 6 Installation Guide.
  • For RHEL 7, see Encryption of the Red Hat Enterprise Linux 7 Security Guide.

5.4. Formatting and Mounting Bricks

To create a Red Hat Gluster Storage volume, specify the bricks that comprise the volume. After creating the volume, the volume must be started before it can be mounted.

5.4.1. Creating Bricks Manually

Important

  • Red Hat supports formatting a Logical Volume using the XFS file system on the bricks.
5.4.1.1. Creating a Thinly Provisioned Logical Volume
  1. Create a physical volume (PV) by using the pvcreate command.
    # pvcreate --dataalignment alignment_value device
    For example:
    # pvcreate --dataalignment 1280K /dev/sdb
    Here, /dev/sdb is a storage device.
    Use the correct dataalignment option based on your device. For more information, see Section 20.2, “Brick Configuration”.

    Note

    The device name and the alignment value will vary based on the device you are using.
  2. Create a Volume Group (VG) from the PV using the vgcreate command:
    # vgcreate --physicalextentsize alignment_value volgroup device
    For example:
    # vgcreate --physicalextentsize 1280K rhs_vg /dev/sdb
  3. Create a thin pool using the following command:
    # lvcreate --thin volgroup/poolname --size pool_sz --chunksize chunk_sz --poolmetadatasize metadev_sz --zero n
    
    For example:
    # lvcreate --thin rhs_vg/rhs_pool --size 2T --chunksize 1280K --poolmetadatasize 16G --zero n
    Ensure you read Chapter 20, Tuning for Performance to select appropriate values for chunksize and poolmetadatasize.
  4. Create a thinly provisioned volume that uses the previously created pool by running the lvcreate command with the --virtualsize and --thin options:
    # lvcreate --virtualsize size --thin volgroup/poolname --name volname
    For example:
    # lvcreate --virtualsize 1G --thin rhs_vg/rhs_pool --name rhs_lv
    Red Hat recommends creating only one LV in a thin pool.
  5. Format bricks using the supported XFS configuration, mount the bricks, and verify the bricks are mounted correctly. To enhance the performance of Red Hat Gluster Storage, ensure you read Chapter 20, Tuning for Performance before formatting the bricks.

    Important

    Snapshots are not supported on bricks formatted with external log devices. Do not use -l logdev=device option with mkfs.xfs command for formatting the Red Hat Gluster Storage bricks.
    # mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 device
    device is the thin LV created in the previous step. The inode size is set to 512 bytes to accommodate the extended attributes used by Red Hat Gluster Storage.
  6. Run # mkdir /mountpoint to create the directory on which the brick will be mounted. For example:
    # mkdir /rhgs
  7. Add an entry in /etc/fstab:
    /dev/volgroup/volname /mountpoint  xfs rw,inode64,noatime,nouuid,x-systemd.device-timeout=10min  1 2
    For example:
    /dev/rhs_vg/rhs_lv /rhgs  xfs rw,inode64,noatime,nouuid,x-systemd.device-timeout=10min  1 2
  8. Run mount /mountpoint to mount the brick.
  9. Run the df -h command to verify the brick is successfully mounted:
    # df -h
    /dev/rhs_vg/rhs_lv   16G  1.2G   15G   7% /rhgs
  10. If SELinux is enabled, set the SELinux labels for the bricks manually using the following commands:
    # semanage fcontext -a -t glusterd_brick_t /rhgs/brick1
    # restorecon -Rv /rhgs/brick1
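The sample values in these steps fit together: for a RAID device, the alignment passed to pvcreate and vgcreate (1280K) is commonly the full stripe width, that is, the stripe unit given to mkfs.xfs (su=128k) multiplied by the number of data disks (sw=10). The following Python sketch derives those values and the /etc/fstab line from a few inputs. The RAID geometry is an assumption for illustration; take the real values from Section 20.2, “Brick Configuration”. The helper names are hypothetical.

```python
FSTAB_OPTS = "rw,inode64,noatime,nouuid,x-systemd.device-timeout=10min"


def raid_alignment_kib(stripe_unit_kib: int, data_disks: int) -> int:
    """Full stripe width in KiB: stripe unit times the number of data disks."""
    return stripe_unit_kib * data_disks


def mkfs_data_option(stripe_unit_kib: int, data_disks: int) -> str:
    """Value for the mkfs.xfs -d option (stripe unit and stripe width)."""
    return f"su={stripe_unit_kib}k,sw={data_disks}"


def fstab_entry(vg: str, lv: str, mountpoint: str) -> str:
    """/etc/fstab line in the format used in step 7."""
    return f"/dev/{vg}/{lv} {mountpoint}  xfs {FSTAB_OPTS}  1 2"
```

For a 10-data-disk RAID 6 set with a 128 KiB stripe unit, these produce 1280 (use it as 1280K), su=128k,sw=10, and the fstab line shown for /dev/rhs_vg/rhs_lv.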

5.4.2. Using Subdirectory as the Brick for Volume

You can create XFS file systems, mount them, and use them as bricks when creating a Red Hat Gluster Storage volume. However, if a mount point becomes unavailable, data is written directly to the root file system in the unmounted directory.
For example, the /rhgs directory is the mounted file system and is used as the brick for volume creation. If the mount point becomes unavailable for any reason, writes continue to go to the /rhgs directory, which is now on the root file system.
To avoid this issue, use the following procedure.
During Red Hat Gluster Storage setup, create an XFS file system and mount it. After mounting, create a subdirectory and use this subdirectory as the brick for volume creation. Here, the XFS file system is mounted as /rhgs. After the file system is available, create a directory called /rhgs/brick1 and use it for volume creation. Ensure that no more than one brick is created from a single mount. This approach has the following advantages:
  • When the /rhgs file system is unavailable, the /rhgs/brick1 directory no longer exists on the system. Hence, no data is lost by being written to a different location.
  • This does not require any additional file system for nesting.
Perform the following to use subdirectories as bricks for creating a volume:
  1. Create the brick1 subdirectory in the mounted file system.
    # mkdir /rhgs/brick1
    Repeat the above steps on all nodes.
  2. Create the Red Hat Gluster Storage volume using the subdirectories as bricks.
    # gluster volume create distdata01 ad-rhs-srv1:/rhgs/brick1 ad-rhs-srv2:/rhgs/brick1
  3. Start the Red Hat Gluster Storage volume.
    # gluster volume start distdata01
  4. Verify the status of the volume.
    # gluster volume status distdata01
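The subdirectory layout works because the brick's parent directory is the mount point itself. The following Python sketch is a pre-flight check you might run on each node before creating the volume: it reports whether the directory containing a brick path is currently a mount point. The function name is hypothetical, and it checks only the mount, not the file system type.

```python
import os


def brick_parent_is_mounted(brick_path: str) -> bool:
    """Return True if the directory containing the brick is a mount point.

    For a brick such as /rhgs/brick1 this checks /rhgs. If the brick file
    system is not mounted, /rhgs is an ordinary directory on the root file
    system and the function returns False.
    """
    parent = os.path.dirname(os.path.abspath(brick_path))
    return os.path.ismount(parent)
```

For example, brick_parent_is_mounted("/rhgs/brick1") should return True on every node before you run gluster volume create.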

Note

If multiple bricks are used from the same server, then ensure the bricks are mounted in the following format. For example:
# df -h

/dev/rhs_vg/rhs_lv1   16G  1.2G   15G   7% /rhgs1
/dev/rhs_vg/rhs_lv2   16G  1.2G   15G   7% /rhgs2
Create a distributed volume with 2 bricks from each server. For example:
# gluster volume create test-volume server1:/rhgs1/brick1 server2:/rhgs1/brick1 server1:/rhgs2/brick2 server2:/rhgs2/brick2

5.4.3. Reusing a Brick from a Deleted Volume

Bricks can be reused from deleted volumes; however, some steps are required to make the brick reusable.
Brick with a File System Suitable for Reformatting (Optimal Method)
Run # mkfs.xfs -f -i size=512 device to reformat the brick to the supported requirements and make it available for immediate reuse in a new volume.

Note

All data will be erased when the brick is reformatted.
File System on a Parent of a Brick Directory
If the file system cannot be reformatted, remove the whole brick directory and create it again.

5.4.4. Cleaning An Unusable Brick

If the file system associated with the brick cannot be reformatted, and the brick directory cannot be removed, perform the following steps:
  1. Delete all previously existing data in the brick, including the .glusterfs subdirectory.
  2. Run # setfattr -x trusted.glusterfs.volume-id brick and # setfattr -x trusted.gfid brick to remove the attributes from the root of the brick.
  3. Run # getfattr -d -m . brick to examine the attributes set on the brick. Take note of the attributes.
  4. Run # setfattr -x attribute brick to remove the attributes relating to the glusterFS file system.
    The trusted.glusterfs.dht attribute for a distributed volume is one such example of attributes that need to be removed.
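Steps 2 to 4 can also be scripted. The following Python sketch lists the extended attributes on the brick root with os.listxattr and removes those whose names start with trusted.glusterfs. or trusted.gfid, the attributes named in the steps above. Other attributes, such as security.selinux, are left alone. The prefix list is an assumption, so extend it only after checking the getfattr output. The script must run as root on Linux, because unprivileged processes cannot see trusted.* attributes. The helper names are hypothetical.

```python
import os
from typing import Iterable, List

# Prefixes of the glusterFS attributes named in the steps above.
GLUSTER_XATTR_PREFIXES = ("trusted.glusterfs.", "trusted.gfid")


def gluster_xattrs(names: Iterable[str]) -> List[str]:
    """Select the glusterFS-related attribute names, sorted."""
    return sorted(n for n in names if n.startswith(GLUSTER_XATTR_PREFIXES))


def strip_gluster_xattrs(brick: str) -> List[str]:
    """Remove glusterFS attributes from the brick root (Linux, run as root)."""
    removed = gluster_xattrs(os.listxattr(brick))
    for name in removed:
        os.removexattr(brick, name)
    return removed
```

Run getfattr -d -m . brick again afterwards to confirm that only non-glusterFS attributes remain.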

5.5. Creating Distributed Volumes

This type of volume spreads files across the bricks in the volume.
Illustration of a distributed volume consisting of two servers. Two files are shown on the server1 brick, and one file is shown on the server2 brick. The distributed volume is set to a single mount point.

Figure 5.2. Illustration of a Distributed Volume

Warning

Distributed volumes can suffer significant data loss during a disk or server failure because directory contents are spread randomly across the bricks in the volume.
Use distributed volumes where scalable storage is required and redundancy is either not important or is provided by other hardware or software layers.
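A distributed volume places each file on exactly one brick, chosen by hashing the file name into ranges that are assigned to the bricks. The following Python sketch illustrates that idea. It uses zlib.crc32 as a stand-in hash and splits the hash space into equal ranges; glusterFS uses its own hash function and per-directory layout ranges, so this shows the principle, not the real placement.

```python
import zlib
from typing import List


def brick_for(filename: str, bricks: List[str]) -> str:
    """Map a file name to one brick by hashing it into equal ranges.

    zlib.crc32 is a stand-in for the glusterFS hash; real layouts are
    stored per directory and need not be equal-sized.
    """
    h = zlib.crc32(filename.encode("utf-8"))  # unsigned 32-bit value
    width = 2**32 // len(bricks)
    # min() keeps the top few hash values inside the last range.
    return bricks[min(h // width, len(bricks) - 1)]
```

Because each name maps to a single brick, losing that brick makes its files unavailable, which is the data-loss risk described in the warning.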

Create a Distributed Volume

Use gluster volume create command to create different types of volumes, and gluster volume info command to verify successful volume creation.

Prerequisites

  1. Run the gluster volume create command to create the distributed volume.
    The syntax is gluster volume create NEW-VOLNAME [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
    Red Hat recommends disabling the performance.client-io-threads option on distributed volumes, as this option tends to worsen performance. Run the following command to disable performance.client-io-threads:
    # gluster volume set VOLNAME performance.client-io-threads off

    Example 5.1. Distributed Volume with Two Storage Servers

    # gluster volume create test-volume server1:/rhgs/brick1 server2:/rhgs/brick1
    Creation of test-volume has been successful
    Please start the volume to access data.

    Example 5.2. Distributed Volume over InfiniBand with Four Servers

    # gluster volume create test-volume transport rdma server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 server4:/rhgs/brick1
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Optionally, run the gluster volume info command to display the volume information.
    # gluster volume info
    Volume Name: test-volume
    Type: Distribute
    Status: Created
    Number of Bricks: 2
    Transport-type: tcp
    Bricks:
    Brick1: server1:/rhgs/brick1
    Brick2: server2:/rhgs/brick1
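If you create volumes from a script, building the argument list explicitly avoids shell quoting problems. The following Python sketch assembles a gluster volume create command in the order given by the syntax above (name, optional replica COUNT, transport, bricks) and rejects a brick count that is not a multiple of the replica count. It always passes transport explicitly, which is equivalent to the default when the value is tcp. The helper name is hypothetical.

```python
from typing import List, Optional


def volume_create_argv(name: str, bricks: List[str],
                       replica: Optional[int] = None,
                       transport: str = "tcp") -> List[str]:
    """Build: gluster volume create NAME [replica N] transport T BRICK..."""
    if transport not in ("tcp", "rdma", "tcp,rdma"):
        raise ValueError(f"unsupported transport: {transport}")
    if not bricks:
        raise ValueError("at least one brick is required")
    if replica is not None and (replica < 2 or len(bricks) % replica):
        raise ValueError("replica count must be >= 2 and divide the brick count")
    argv = ["gluster", "volume", "create", name]
    if replica is not None:
        argv += ["replica", str(replica)]
    argv += ["transport", transport]
    return argv + list(bricks)
```

Pass the result to subprocess.run(argv, check=True) on a node of the trusted storage pool.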

5.6. Creating Replicated Volumes

A replicated volume creates copies of files across multiple bricks in the volume. Use replicated volumes in environments where high availability and high reliability are critical.
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
Prerequisites

5.6.1. Creating Two-way Replicated Volumes (Deprecated)

Warning

As of Red Hat Gluster Storage 3.4, two-way replication without arbiter bricks is considered deprecated. Existing volumes that use two-way replication without arbiter bricks remain supported for this release. New volumes with this configuration are not supported. Red Hat no longer recommends the use of two-way replication without arbiter bricks, and plans to remove support entirely in future versions of Red Hat Gluster Storage. This change affects both replicated and distributed-replicated volumes that do not use arbiter bricks.
Two-way replication without arbiter bricks is being deprecated because it does not provide adequate protection from split-brain conditions. Even in distributed-replicated configurations, two-way replication cannot ensure that the correct copy of a conflicting file is selected without the use of a tie-breaking node.
While a dummy node can be used as an interim solution for this problem, Red Hat strongly recommends that all volumes that currently use two-way replication without arbiter bricks are migrated to use either arbitrated replication or three-way replication.
Instructions for migrating a two-way replicated volume without arbiter bricks to an arbitrated replicated volume are available in the Red Hat Gluster Storage 3.4 Administration Guide: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html-single/administration_guide/#sect-Convert_Rep_to_Arbiter.
A two-way replicated volume creates two copies of files across the bricks in the volume. The number of bricks must be a multiple of two for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
Illustration of a Two-way Replicated Volume

Figure 5.3. Illustration of a Two-way Replicated Volume

Creating two-way replicated volumes
  1. Run the gluster volume create command to create the replicated volume.
    The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.3. Replicated Volume with Two Storage Servers

    The order in which bricks are specified determines how they are replicated with each other. For example, every 2 bricks, where 2 is the replica count, form a replica set. This is illustrated in Figure 5.3, “Illustration of a Two-way Replicated Volume”.
    # gluster volume create test-volume replica 2 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Optionally, run the gluster volume info command to display the volume information.
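Because every COUNT consecutive bricks on the command line form a replica set, a typo in brick order can put two copies on the same server. The following Python sketch groups bricks into replica sets in command-line order and reports any set whose bricks share a server (the part before the colon). Both helper names are hypothetical.

```python
from typing import List


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Split bricks, in command-line order, into consecutive replica sets."""
    if replica < 1 or len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_sharing_a_server(bricks: List[str], replica: int) -> List[List[str]]:
    """Return the replica sets in which two bricks are on the same server."""
    bad = []
    for rs in replica_sets(bricks, replica):
        hosts = [b.split(":", 1)[0] for b in rs]
        if len(set(hosts)) < len(hosts):
            bad.append(rs)
    return bad
```

An empty result from sets_sharing_a_server means every copy in each set is on a different server.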

Important

You must set client-side quorum on replicated volumes to prevent split-brain scenarios. For more information on setting client-side quorum, see Section 11.15.1.2, “Configuring Client-Side Quorum”

5.6.2. Creating Three-way Replicated Volumes

A three-way replicated volume creates three copies of files across multiple bricks in the volume. The number of bricks must be equal to the replica count for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
Synchronous three-way replication is now fully supported in Red Hat Gluster Storage. It is recommended that three-way replicated volumes use JBOD, but use of hardware RAID with three-way replicated volumes is also supported.
Illustration of a Three-way Replicated Volume

Figure 5.4. Illustration of a Three-way Replicated Volume

Creating three-way replicated volumes
  1. Run the gluster volume create command to create the replicated volume.
    The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.4. Replicated Volume with Three Storage Servers

    The order in which bricks are specified determines how bricks are replicated with each other. For example, every 3 bricks, where 3 is the replica count, form a replica set. This is illustrated in Figure 5.4, “Illustration of a Three-way Replicated Volume”.
    # gluster volume create test-volume replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/brick3
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Optionally, run the gluster volume info command to display the volume information.

Important

By default, the client-side quorum is enabled on three-way replicated volumes to minimize split-brain scenarios. For more information on client-side quorum, see Section 11.15.1.2, “Configuring Client-Side Quorum”
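The following Python sketch models the rule commonly documented for the auto client-side quorum type: writes to a replica set are allowed when more than half of its bricks are online, or when exactly half are online and one of them is the first brick. Treat this rule as an assumption and confirm it against Section 11.15.1.2 for your release. The function name is hypothetical.

```python
from typing import Sequence


def auto_quorum_met(bricks_up: Sequence[bool]) -> bool:
    """bricks_up[i] is True if brick i of one replica set is online.

    Assumed "auto" rule: more than half online, or exactly half online
    including the first brick of the set.
    """
    n = len(bricks_up)
    if n == 0:
        raise ValueError("a replica set has at least one brick")
    up = sum(bool(b) for b in bricks_up)
    if 2 * up > n:
        return True
    return 2 * up == n and bool(bricks_up[0])
```

For a three-way replica set, writes continue with one brick down but stop with two bricks down, which is how quorum limits split-brain.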

5.6.3. Creating Sharded Replicated Volumes

Sharding breaks files into smaller pieces so that they can be distributed across the bricks that comprise a volume. This is enabled on a per-volume basis.
When sharding is enabled, files written to a volume are divided into pieces. The size of the pieces depends on the value of the volume's features.shard-block-size parameter. The first piece is written to a brick and given a GFID like a normal file. Subsequent pieces are distributed evenly between bricks in the volume (sharded bricks are distributed by default), but they are written to that brick's .shard directory, and are named with the GFID and a number indicating the order of the pieces. For example, if a file is split into four pieces, the first piece is named GFID and stored normally. The other three pieces are named GFID.1, GFID.2, and GFID.3 respectively. They are placed in the .shard directory and distributed evenly between the various bricks in the volume.
Because sharding distributes files across the bricks in a volume, it lets you store files with a larger aggregate size than any individual brick in the volume. Because the file pieces are smaller, heal operations are faster, and geo-replicated deployments can sync the small pieces of a file that have changed, rather than syncing the entire aggregate file.
Sharding also lets you increase volume capacity by adding bricks to a volume in an ad-hoc fashion.
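The naming scheme described above can be expressed in a few lines. The following Python sketch lists the expected piece names for a file of a given size: the first piece keeps the file's GFID, and each later piece is GFID.1, GFID.2, and so on, with at most shard-block-size bytes per piece. The function name is hypothetical, and sizes are plain byte counts.

```python
from typing import List


def shard_names(gfid: str, file_size: int, block_size: int) -> List[str]:
    """Expected piece names for a sharded file.

    The first piece is stored under the file's own GFID; the remaining
    pieces live in the brick's .shard directory as GFID.1, GFID.2, ...
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    pieces = max(1, -(-file_size // block_size))  # integer ceiling, at least 1
    return [gfid] + [f"{gfid}.{i}" for i in range(1, pieces)]
```

A file exactly four blocks long yields GFID, GFID.1, GFID.2, and GFID.3, matching the four-piece example; one extra byte adds GFID.4.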
5.6.3.1. Supported use cases
Sharding has one supported use case: in the context of providing Red Hat Gluster Storage as a storage domain for Red Hat Enterprise Virtualization, to provide storage for live virtual machine images. Note that sharding is also a requirement for this use case, as it provides significant performance improvements over previous implementations.

Important

Quotas are not compatible with sharding.

Important

Sharding is supported in new deployments only, as there is currently no upgrade path for this feature.

Example 5.5. Example: Three-way replicated sharded volume

  1. Before you start your volume, enable sharding on the volume.
    # gluster volume set test-volume features.shard enable
  2. Start the volume and ensure it is working as expected.
    # gluster volume start test-volume
    # gluster volume info test-volume
5.6.3.2. Configuration Options
Sharding is enabled and configured at the volume level. The configuration options are as follows.
features.shard
Enables or disables sharding on a specified volume. Valid values are enable and disable. The default value is disable.
# gluster volume set volname features.shard enable
Note that this only affects files created after this command is run; files created before this command is run retain their old behaviour.
features.shard-block-size
Specifies the maximum size of the file pieces when sharding is enabled. The supported value for this parameter is 512MB.
# gluster volume set volname features.shard-block-size 512MB
Note that this only affects files created after this command is run; files created before this command is run retain their old behaviour.
5.6.3.3. Finding the pieces of a sharded file
When you enable sharding, you might want to check that it is working correctly, or see how a particular file has been sharded across your volume.
To find the pieces of a file, you need to know that file's GFID. To obtain a file's GFID, run:
# getfattr -d -m. -e hex path_to_file
Once you have the GFID, you can run the following command on your bricks to see how this file has been distributed:
# ls /rhgs/*/.shard -lh | grep GFID
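The getfattr command prints trusted.gfid as a hex value such as 0x3c5e...4b5c, while GFIDs are normally written in the dashed UUID form. The following Python sketch extracts the GFID from getfattr -e hex output and converts it to that form for use in the grep above. That the .shard entries use the dashed form is an assumption here; compare with an ls of the .shard directory if the grep finds nothing. The function name is hypothetical.

```python
import uuid


def gfid_from_getfattr(output: str) -> str:
    """Return trusted.gfid from `getfattr -d -m. -e hex` output as a dashed UUID."""
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("trusted.gfid="):
            value = line.split("=", 1)[1]
            if value.startswith("0x"):
                value = value[2:]
            return str(uuid.UUID(hex=value))
    raise ValueError("no trusted.gfid attribute in getfattr output")
```

Pipe the getfattr output into this function and use the returned string in place of GFID in the ls | grep command.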

5.7. Creating Distributed Replicated Volumes

Use distributed replicated volumes in environments where both scalable storage and high reliability are critical. Distributed replicated volumes also offer improved read performance in most environments.

Note

The number of bricks must be a multiple of the replica count for a distributed replicated volume. Also, the order in which bricks are specified has a great effect on data protection. Each replica_count consecutive bricks in the list you give will form a replica set, with all replica sets combined into a distribute set. To ensure that replica-set members are not placed on the same node, list the first brick on every server, then the second brick on every server in the same order, and so on.
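The ordering rule in this note (first brick on every server, then the second brick on every server in the same order, and so on) is easy to get wrong by hand. The following Python sketch produces that order from a per-server list of brick paths. The helper name and input shape are hypothetical.

```python
from typing import Dict, List


def interleave_bricks(bricks_by_server: Dict[str, List[str]]) -> List[str]:
    """Order bricks as server1:first, server2:first, ..., server1:second, ..."""
    servers = list(bricks_by_server)  # keeps the caller's server order
    depth = max((len(p) for p in bricks_by_server.values()), default=0)
    ordered: List[str] = []
    for i in range(depth):
        for server in servers:
            paths = bricks_by_server[server]
            if i < len(paths):
                ordered.append(f"{server}:{paths[i]}")
    return ordered
```

With as many servers as the replica count, each group of replica-count consecutive bricks in the result then spans every server, so no replica set has two members on one node.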
Prerequisites

5.7.1. Creating Two-way Distributed Replicated Volumes

Warning

Support for two-way replication is planned for deprecation and removal in future versions of Red Hat Gluster Storage. This will affect both replicated and distributed-replicated volumes.
Support is being removed because two-way replication does not provide adequate protection from split-brain conditions. While a dummy node can be used as an interim solution for this problem, Red Hat recommends that all volumes that currently use two-way replication are migrated to use either arbitrated replication or three-way replication.
Instructions for migrating a two-way replicated volume to an arbitrated replicated volume are available in Section 5.8.5, “Converting to an arbitrated volume”.
Two-way distributed replicated volumes distribute and create two copies of files across the bricks in a volume. The number of bricks must be a multiple of the replica count for a replicated volume. To protect against server and disk failures, the bricks of the volume should be from different servers.
Illustration of a Two-way Distributed Replicated Volume

Figure 5.5. Illustration of a Two-way Distributed Replicated Volume

Creating two-way distributed replicated volumes
  1. Run the gluster volume create command to create the distributed replicated volume.
    The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.6. Four Node Distributed Replicated Volume with a Two-way Replication

    The order in which bricks are specified determines how they are replicated with each other. For example, the first two bricks specified replicate each other where 2 is the replica count.
    # gluster volume create test-volume replica 2 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 server4:/rhgs/brick1
    Creation of test-volume has been successful
    Please start the volume to access data.

    Example 5.7. Six Node Distributed Replicated Volume with a Two-way Replication

    # gluster volume create test-volume replica 2 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 server4:/rhgs/brick1 server5:/rhgs/brick1 server6:/rhgs/brick1
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Optionally, run the gluster volume info command to display the volume information.

Important

You must set both server-side quorum and client-side quorum on distributed-replicated volumes to prevent split-brain scenarios. For more information on setting quorums, see Section 11.15.1, “Preventing Split-brain”.

5.7.2. Creating Three-way Distributed Replicated Volumes

Three-way distributed replicated volumes distribute and create three copies of files across multiple bricks in the volume. The number of bricks must be a multiple of the replica count for a replicated volume. To protect against server and disk failures, it is recommended that the bricks of the volume are from different servers.
Synchronous three-way distributed replication is now fully supported in Red Hat Gluster Storage. It is recommended that three-way distributed replicated volumes use JBOD, but use of hardware RAID with three-way distributed replicated volumes is also supported.
Figure 5.6. Illustration of a Three-way Distributed Replicated Volume

Creating three-way distributed replicated volumes
  1. Run the gluster volume create command to create the distributed replicated volume.
    The syntax is # gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.8. Six Node Distributed Replicated Volume with a Three-way Replication

    The order in which bricks are specified determines how bricks are replicated with each other. With a replica count of 3, the first three bricks specified form the first replica set, the next three bricks form the second replica set, and so on.
    # gluster volume create test-volume replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 server4:/rhgs/brick1 server5:/rhgs/brick1 server6:/rhgs/brick1
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful
  3. Optionally, run the gluster volume info command to display the volume information.
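The recommendation to place the bricks of each replica set on different servers can be checked before the volume is created. The following bash sketch assumes HOST:PATH brick arguments and uses a hypothetical helper name, check_replica_hosts. It only inspects the brick list passed to it and does not contact any server.

```shell
#!/usr/bin/env bash
# check_replica_hosts COUNT BRICK... (hypothetical helper)
# Returns non-zero and prints a warning when two bricks of the same
# replica set (consecutive groups of COUNT bricks) share a host.
check_replica_hosts() {
  local count=$1
  shift
  if (( count < 1 )) || (( $# % count != 0 )); then
    echo "error: $# bricks is not a multiple of replica count $count" >&2
    return 1
  fi
  local bricks=("$@")
  local status=0 setno=0 i j host seen
  for (( i = 0; i < ${#bricks[@]}; i += count )); do
    setno=$(( setno + 1 ))
    seen=" "
    for (( j = i; j < i + count; j++ )); do
      host=${bricks[j]%%:*}   # strip ":PATH" to get the host
      if [[ $seen == *" $host "* ]]; then
        echo "warning: replica set $setno has more than one brick on $host" >&2
        status=1
      fi
      seen+="$host "
    done
  done
  return "$status"
}

# Example 5.8: each replica set spans three different servers
check_replica_hosts 3 server1:/rhgs/brick1 server2:/rhgs/brick1 \
  server3:/rhgs/brick1 server4:/rhgs/brick1 server5:/rhgs/brick1 \
  server6:/rhgs/brick1 && echo "replica sets use distinct hosts"
```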

Important

By default, the client-side quorum is enabled on three-way distributed replicated volumes. You must also set server-side quorum on the distributed-replicated volumes to prevent split-brain scenarios. For more information on setting quorums, see Section 11.15.1, “Preventing Split-brain”.

5.8. Creating Arbitrated Replicated Volumes

An arbitrated replicated volume is similar to a two-way replicated volume, in that it contains two full copies of the files in the volume. Arbitrated volumes have an extra arbiter brick for every two data bricks in the volume. Arbiter bricks do not store file data; they only store file names, structure, and metadata. Arbiter bricks use client quorum to compare metadata on the arbiter with the metadata of the other nodes to ensure consistency in the volume and prevent split-brain conditions.

Advantages of arbitrated replicated volumes

Better consistency
When an arbiter is configured, arbitration logic uses client-side quorum in auto mode to prevent file operations that would lead to split-brain conditions.
Less disk space required
Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller than the other bricks in the volume.
Fewer nodes required
The node that contains the arbiter brick of one volume can be configured with the data brick of another volume. This "chaining" configuration allows you to use fewer nodes to fulfill your overall storage requirements.
Easy migration from deprecated two-way replicated volumes
Red Hat Gluster Storage can convert a two-way replicated volume without arbiter bricks into an arbitrated replicated volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.

Limitations of arbitrated replicated volumes

  • Arbitrated replicated volumes provide better data consistency than a two-way replicated volume that does not have arbiter bricks. However, because arbiter bricks store only metadata, arbitrated replicated volumes provide the same level of availability as a two-way replicated volume that does not have arbiter bricks. To achieve high availability, use a three-way replicated volume instead of an arbitrated replicated volume.
  • Tiering is not compatible with arbitrated replicated volumes.
  • Arbitrated volumes can only be configured in sets of three bricks at a time. Red Hat Gluster Storage can convert an existing two-way replicated volume without arbiter bricks into an arbitrated replicated volume by adding an arbiter brick to that volume. See Section 5.8.5, “Converting to an arbitrated volume” for details.

5.8.1. Arbitrated volume requirements

This section outlines the requirements of a supported arbitrated volume deployment.
5.8.1.1. System requirements for nodes hosting arbiter bricks
The minimum system requirements for a node that contains an arbiter brick differ depending on the configuration choices made by the administrator. See Section 5.8.4, “Creating multiple arbitrated replicated volumes across fewer total nodes” for details about the differences between the dedicated arbiter and chained arbiter configurations.
Table 5.1. Requirements for arbitrated configurations on physical machines
Configuration type | Min CPU | Min RAM | NIC | Arbiter brick size | Max latency
Dedicated arbiter | 64-bit quad-core processor with 2 sockets | 8 GB[a] | Match to other nodes in the storage pool | 1 TB to 4 TB[b] | 5 ms[c]
Chained arbiter | Match to other nodes in the storage pool | Match to other nodes in the storage pool | Match to other nodes in the storage pool | 1 TB to 4 TB[d] | 5 ms[e]
[a] More RAM may be necessary depending on the combined capacity of the arbiter bricks on the node.
[b] Arbiter and data bricks can be configured on the same device provided that the data and arbiter bricks belong to different replica sets. See Section 5.8.1.2, “Arbiter capacity requirements” for further details on sizing arbiter bricks.
[c] This is the maximum round-trip latency requirement between all nodes, irrespective of whether a node hosts arbiter bricks. See KCS#413623 to learn how to determine latency between nodes.
[d] Multiple bricks can be created on a single RAIDed physical device. For details, see Section 20.2, “Brick Configuration”.
[e] This is the maximum round-trip latency requirement between all nodes, irrespective of whether a node hosts arbiter bricks. See KCS#413623 to learn how to determine latency between nodes.
The requirements for arbitrated configurations on virtual machines are:
  • minimum 4 vCPUs
  • minimum 16 GB RAM
  • 1 TB to 4 TB of virtual disk space
  • maximum 5 ms latency
5.8.1.2. Arbiter capacity requirements
Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller than the other bricks in the volume or replica set. The required size for an arbiter brick depends on the number of files being stored on the volume.
The recommended minimum arbiter brick size can be calculated with the following formula:
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / average file size in KB)
For example, if you have two 1 TB data bricks, and the average size of the files is 2 GB, then the recommended minimum size for your arbiter brick is 2 MB, as shown in the following example:
minimum arbiter brick size  = 4 KB * ( 1 TB / 2 GB )
                            = 4 KB * ( 1000000000 KB / 2000000 KB )
                            = 4 KB * 500
                            = 2000 KB
                            = 2 MB
If sharding is enabled, and your shard-block-size is smaller than the average file size in KB, then you need to use the following formula instead, because each shard also has a metadata file:
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / shard block size in KB )
Alternatively, if you know how many files you will store in a volume, the recommended minimum arbiter brick size is the maximum number of files multiplied by 4 KB. For example, if you expect to have 200,000 files on your volume, your arbiter brick should be at least 800,000 KB, or 0.8 GB, in size.
Red Hat also recommends overprovisioning where possible so that there is no short-term need to increase the size of the arbiter brick.
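The sizing rules above reduce to integer arithmetic. The following bash sketch uses hypothetical function names and assumes all sizes are given in KB with the decimal units of the worked example (1 TB = 1000000000 KB). Shell arithmetic is integer-only, so the result is rounded up to the next whole KB.

```shell
#!/usr/bin/env bash
# arbiter_min_kb LARGEST_BRICK_KB AVG_FILE_KB [SHARD_BLOCK_KB]
# Hypothetical helper: minimum arbiter brick size in KB, rounded up.
arbiter_min_kb() {
  local brick_kb=$1 per_file_kb=$2 shard_kb=${3:-0}
  # With sharding, each shard has its own metadata file, so divide by the
  # shard block size when it is smaller than the average file size.
  if (( shard_kb > 0 && shard_kb < per_file_kb )); then
    per_file_kb=$shard_kb
  fi
  echo $(( (4 * brick_kb + per_file_kb - 1) / per_file_kb ))
}

# arbiter_min_kb_for_files MAX_FILES
# Hypothetical helper: 4 KB of arbiter space per expected file.
arbiter_min_kb_for_files() {
  echo $(( 4 * $1 ))
}

arbiter_min_kb 1000000000 2000000   # 1 TB brick, 2 GB average file: 2000 (2 MB)
arbiter_min_kb_for_files 200000     # 200,000 files: 800000 (0.8 GB)
```

Overprovisioning, as recommended above, means treating these values as a floor rather than a target.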

5.8.2. Arbitration logic

In an arbitrated volume, whether a file operation is permitted depends on the current state of the bricks in the volume. The following table describes arbitration behavior in all possible volume states.
Table 5.2. Allowed operations for current volume state
Volume state | Arbitration behavior
All bricks available | All file operations are permitted.
Arbiter and 1 data brick available | If the arbiter's metadata does not agree with the available data brick, write operations fail with ENOTCONN (since the brick that is correct is not available); other file operations are permitted. If the arbiter's metadata agrees with the available data brick, all file operations are permitted.
Arbiter down, data bricks available | All file operations are permitted. The arbiter's records are healed when it becomes available.
Only one brick available | If the available brick is a data brick, client quorum is not met, and the volume enters an EROFS state. If the available brick is the arbiter, all file operations fail with ENOTCONN.
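To make the table easier to reason about, the following bash sketch encodes it as a lookup for a single replica set. It models only the documented outcomes and is not how gluster implements arbitration. The function name arbitration_behavior and its yes/no arguments are hypothetical.

```shell
#!/usr/bin/env bash
# arbitration_behavior DATA_BRICKS_UP ARBITER_UP [ARBITER_AGREES]
# Hypothetical helper modeling Table 5.2 for one replica set.
#   DATA_BRICKS_UP: 0, 1, or 2
#   ARBITER_UP, ARBITER_AGREES: yes or no (AGREES matters only with 1 data brick)
arbitration_behavior() {
  local data_up=$1 arbiter_up=$2 agrees=${3:-yes}
  case "$data_up:$arbiter_up" in
    2:yes) echo "all file operations permitted" ;;
    2:no)  echo "all file operations permitted; arbiter healed when available" ;;
    1:yes)
      if [[ $agrees == yes ]]; then
        echo "all file operations permitted"
      else
        echo "writes fail with ENOTCONN; other file operations permitted"
      fi
      ;;
    1:no)  echo "client quorum not met; volume enters EROFS state" ;;
    0:yes) echo "all file operations fail with ENOTCONN" ;;
    *)     echo "state not covered by Table 5.2"; return 1 ;;
  esac
}

arbitration_behavior 1 yes no   # arbiter disagrees with the surviving data brick
arbitration_behavior 1 no       # only a data brick is left
```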

5.8.3. Creating an arbitrated replicated volume

The command for creating an arbitrated replicated volume has the following syntax:
# gluster volume create VOLNAME replica 3 arbiter 1 HOST1:DATA_BRICK1 HOST2:DATA_BRICK2 HOST3:ARBITER_BRICK3
This creates a volume with one arbiter brick for every two data bricks. The arbiter is the last brick in every set of three bricks.

Note

The syntax of this command is misleading. There are a total of 3 bricks in this set. This command creates a volume with two bricks that replicate all data and one arbiter brick that replicates only metadata.
In the following example, the bricks on server3 and server6 are the arbiter bricks. Note that because multiple sets of three bricks are provided, this creates a distributed replicated volume with arbiter bricks.
# gluster volume create testvol replica 3 arbiter 1 \
server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick \
server4:/bricks/brick server5:/bricks/brick server6:/bricks/arbiter_brick
# gluster volume info testvol
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: ed9fa4d5-37f1-49bb-83c3-925e90fab1bc
Status: Created
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: server1:/bricks/brick
Brick2: server2:/bricks/brick
Brick3: server3:/bricks/arbiter_brick (arbiter)
Brick4: server4:/bricks/brick
Brick5: server5:/bricks/brick
Brick6: server6:/bricks/arbiter_brick (arbiter)
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
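Which bricks become arbiters, and the resulting N x (2 + 1) layout, can be previewed from the brick list alone. The following bash sketch uses a hypothetical helper, arbiter_layout. Its output only imitates the shape of the Number of Bricks and BrickN lines; it is not produced by gluster.

```shell
#!/usr/bin/env bash
# arbiter_layout BRICK... (hypothetical helper)
# For "replica 3 arbiter 1", every third brick in the list is an arbiter.
# Prints a layout summary in the style of "gluster volume info".
arbiter_layout() {
  if (( $# == 0 || $# % 3 != 0 )); then
    echo "error: replica 3 arbiter 1 needs bricks in multiples of three" >&2
    return 1
  fi
  echo "Number of Bricks: $(( $# / 3 )) x (2 + 1) = $#"
  local n=0 brick
  for brick in "$@"; do
    n=$(( n + 1 ))
    if (( n % 3 == 0 )); then
      echo "Brick$n: $brick (arbiter)"
    else
      echo "Brick$n: $brick"
    fi
  done
}

arbiter_layout server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick \
  server4:/bricks/brick server5:/bricks/brick server6:/bricks/arbiter_brick
```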

5.8.4. Creating multiple arbitrated replicated volumes across fewer total nodes

If you are configuring more than one arbitrated-replicated volume, or a single volume with multiple replica sets, you can use fewer nodes in total by using either of the following techniques:
  • Chain multiple arbitrated replicated volumes together, by placing the arbiter brick for one volume on the same node as a data brick for another volume. Chaining is useful for write-heavy workloads when file size is closer to metadata file size (that is, from 32–128 KiB). This avoids all metadata I/O going through a single disk.
    In arbitrated distributed-replicated volumes, you can also place an arbiter brick on the same node as another replica sub-volume's data brick, since these do not share the same data.
  • Place the arbiter bricks from multiple volumes on a single dedicated node. A dedicated arbiter node is suited to write-heavy workloads with larger files, and read-heavy workloads.

Example 5.9. Example of a dedicated configuration

The following commands create two arbitrated replicated volumes, firstvol and secondvol. Server3 contains the arbiter bricks of both volumes.
# gluster volume create firstvol replica 3 arbiter 1 server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick
# gluster volume create secondvol replica 3 arbiter 1 server4:/bricks/brick server5:/bricks/brick server3:/bricks/arbiter_brick2
Dedicated Arbiter Node Configuration
Two gluster volumes configured across five servers to create two three-way arbitrated replicated volumes, with the arbiter bricks on a dedicated arbiter node.

Example 5.10. Example of a chained configuration

The following command configures an arbitrated replicated volume with six sub-volumes chained across six servers in a 6 x (2 + 1) configuration.
# gluster volume create arbrepvol replica 3 arbiter 1 \
server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/arbiter_brick1 \
server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/arbiter_brick2 \
server3:/bricks/brick3 server4:/bricks/brick3 server5:/bricks/arbiter_brick3 \
server4:/bricks/brick4 server5:/bricks/brick4 server6:/bricks/arbiter_brick4 \
server5:/bricks/brick5 server6:/bricks/brick5 server1:/bricks/arbiter_brick5 \
server6:/bricks/brick6 server1:/bricks/brick6 server2:/bricks/arbiter_brick6
6 x (2 + 1) Arbitrated Distributed-Replicated Configuration
Six replicated gluster sub-volumes chained across six servers to create a 6 x (2 + 1) arbitrated distributed-replicated configuration.
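The chained pattern in Example 5.10 is a simple rotation: replica set i places data bricks on servers i and i+1 and its arbiter on server i+2, wrapping around after the last server. The following bash sketch generates that brick list for N servers. The host names (serverN) and brick paths are assumptions copied from the example, and chained_bricks is a hypothetical helper that only prints text.

```shell
#!/usr/bin/env bash
# chained_bricks N (hypothetical helper)
# Prints a chained "replica 3 arbiter 1" brick list for N servers:
# replica set i uses data bricks on servers i and i+1 and an arbiter on
# server i+2, wrapping around. Host names and paths follow Example 5.10.
chained_bricks() {
  local n=$1 i b c
  local out=()
  if (( n < 3 )); then
    echo "error: chaining needs at least three servers" >&2
    return 1
  fi
  for (( i = 1; i <= n; i++ )); do
    b=$(( i % n + 1 ))          # next server, wrapping to server1
    c=$(( (i + 1) % n + 1 ))    # the server after that hosts the arbiter
    out+=("server$i:/bricks/brick$i" "server$b:/bricks/brick$i" "server$c:/bricks/arbiter_brick$i")
  done
  echo "${out[*]}"
}

# Print (do not run) the Example 5.10 command for review:
echo gluster volume create arbrepvol replica 3 arbiter 1 $(chained_bricks 6)
```

Each server ends up hosting data bricks for two replica sets and the arbiter brick for one, which is what spreads metadata I/O across all nodes.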

5.8.5. Converting to an arbitrated volume

You can convert a replicated volume into an arbitrated volume by adding new arbiter bricks for each replicated sub-volume, or replacing replica bricks with arbiter bricks.

Procedure 5.1. Converting a replica 2 volume to an arbitrated volume

Warning

If you follow this process with geo-replication configured, you run the risk of data loss when converting a volume. This race condition is tracked by Bug 1683893 and the workaround is available in the Red Hat Gluster Storage Release Notes.
  1. Verify that healing is not in progress

    # gluster volume heal VOLNAME info
    Wait until the number of pending heal entries is 0 before proceeding.
  2. Disable and stop self-healing

    Run the following commands to disable data, metadata, and entry self-heal, and the self-heal daemon.
    # gluster volume set VOLNAME cluster.data-self-heal off
    # gluster volume set VOLNAME cluster.metadata-self-heal off
    # gluster volume set VOLNAME cluster.entry-self-heal off
    # gluster volume set VOLNAME self-heal-daemon off
  3. Add arbiter bricks to the volume

    Convert the volume by adding an arbiter brick for each replicated sub-volume.
    # gluster volume add-brick VOLNAME replica 3 arbiter 1 HOST:arbiter-brick-path
    For example, if you have an existing two-way replicated volume called testvol, and a new brick for the arbiter to use, you can add a brick as an arbiter with the following command:
    # gluster volume add-brick testvol replica 3 arbiter 1 server:/bricks/arbiter_brick
    If you have an existing two-way distributed-replicated volume, you need a new brick for each sub-volume in order to convert it to an arbitrated distributed-replicated volume, for example:
    # gluster volume add-brick testvol replica 3 arbiter 1 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
  4. Wait for client volfiles to update

    This takes about 5 minutes.
  5. Verify that bricks added successfully

    # gluster volume info VOLNAME
    # gluster volume status VOLNAME
  6. Re-enable self-healing

    Run the following commands to re-enable self-healing on the servers.
    # gluster volume set VOLNAME cluster.data-self-heal on
    # gluster volume set VOLNAME cluster.metadata-self-heal on
    # gluster volume set VOLNAME cluster.entry-self-heal on
    # gluster volume set VOLNAME self-heal-daemon on
  7. Verify all entries are healed

    # gluster volume heal VOLNAME info
    Wait until the number of pending heal entries is 0 to ensure that all heals completed successfully.
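Steps 1 and 7 both wait for the pending heal count to reach 0. The following bash sketch totals the counts so the wait can be scripted. It assumes that each brick section of the gluster volume heal VOLNAME info output contains a line of the form Number of entries: N. pending_heal_entries is a hypothetical helper, and any non-numeric count (for example, from a disconnected brick) is treated as 0, so check brick status separately.

```shell
#!/usr/bin/env bash
# pending_heal_entries (hypothetical helper)
# Reads "gluster volume heal VOLNAME info" output on stdin and prints the
# sum of all "Number of entries: N" values.
pending_heal_entries() {
  awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Example polling loop (run on a server; VOLNAME is a placeholder):
# until [ "$(gluster volume heal VOLNAME info | pending_heal_entries)" -eq 0 ]; do
#   sleep 10
# done
```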

Procedure 5.2. Converting a replica 3 volume to an arbitrated volume

Warning

If you follow this process with geo-replication configured, you run the risk of data loss when converting a volume. This race condition is tracked by Bug 1683893 and the workaround is available in the Red Hat Gluster Storage Release Notes.
  1. Verify that healing is not in progress

    # gluster volume heal VOLNAME info
    Wait until the number of pending heal entries is 0 before proceeding.
  2. Reduce the replica count of the volume to 2

    Remove one brick from every sub-volume in the volume so that the replica count is reduced to 2. For example, in a replica 3 volume that distributes data across 2 sub-volumes, run the following command:
    # gluster volume remove-brick VOLNAME replica 2 HOST:subvol1-brick-path HOST:subvol2-brick-path force

    Note

    In a distributed replicated volume, data is distributed across sub-volumes, and replicated across bricks in a sub-volume. This means that to reduce the replica count of a volume, you need to remove a brick from every sub-volume.
    Bricks are grouped by sub-volume in the gluster volume info output. If the replica count is 3, the first 3 bricks form the first sub-volume, the next 3 bricks form the second sub-volume, and so on.
    # gluster volume info VOLNAME
    [...]
    Number of Bricks: 2 x 3 = 6
    Transport-type: tcp
    Bricks:
    Brick1: node1:/test1/brick
    Brick2: node2:/test2/brick
    Brick3: node3:/test3/brick
    Brick4: node1:/test4/brick
    Brick5: node2:/test5/brick
    Brick6: node3:/test6/brick
    [...]
    In this volume, data is distributed across two sub-volumes, which each consist of three bricks. The first sub-volume consists of bricks 1, 2, and 3. The second sub-volume consists of bricks 4, 5, and 6. Removing any one brick from each sub-volume using the following command reduces the replica count to 2 as required.
    # gluster volume remove-brick VOLNAME replica 2 HOST:subvol1-brick-path HOST:subvol2-brick-path force
  3. Disable and stop self-healing

    Run the following commands to disable data, metadata, and entry self-heal, and the self-heal daemon.
    # gluster volume set VOLNAME cluster.data-self-heal off
    # gluster volume set VOLNAME cluster.metadata-self-heal off
    # gluster volume set VOLNAME cluster.entry-self-heal off
    # gluster volume set VOLNAME self-heal-daemon off
  4. Add arbiter bricks to the volume

    Convert the volume by adding an arbiter brick for each replicated sub-volume.
    # gluster volume add-brick VOLNAME replica 3 arbiter 1 HOST:arbiter-brick-path
    For example, if you have an existing replicated volume:
    # gluster volume add-brick testvol replica 3 arbiter 1 server:/bricks/arbiter_brick
    If you have an existing distributed-replicated volume:
    # gluster volume add-brick testvol replica 3 arbiter 1 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
  5. Wait for client volfiles to update

    This takes about 5 minutes. Verify that this is complete by running the following command on each client.
    # grep -ir connected mount-path/.meta/graphs/active/volname-client-*/private
    The number of times connected=1 appears in the output is the number of bricks connected to the client.
  6. Verify that bricks added successfully

    # gluster volume info VOLNAME
    # gluster volume status VOLNAME
  7. Re-enable self-healing

    Run the following commands to re-enable self-healing on the servers.
    # gluster volume set VOLNAME cluster.data-self-heal on
    # gluster volume set VOLNAME cluster.metadata-self-heal on
    # gluster volume set VOLNAME cluster.entry-self-heal on
    # gluster volume set VOLNAME self-heal-daemon on
  8. Verify all entries are healed

    # gluster volume heal VOLNAME info
    Wait until the number of pending heal entries is 0 to ensure that all heals completed successfully.
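The connected=1 count from step 5 can be compared with the number of bricks you expect the client to see. The following bash sketch is a hypothetical helper, check_connected, that reads the grep output on stdin. You must supply the expected count yourself, for example 6 for two sub-volumes that each have 2 data bricks and 1 arbiter brick.

```shell
#!/usr/bin/env bash
# check_connected EXPECTED (hypothetical helper)
# Reads the output of the grep command from step 5 on stdin, counts lines
# ending in connected=1, and succeeds only if the count matches EXPECTED.
check_connected() {
  local expected=$1 actual
  actual=$(grep -c 'connected=1$')
  echo "$actual of $expected bricks connected"
  [ "$actual" -eq "$expected" ]
}

# Example (run on each client; mount-path and volname are placeholders):
# grep -ir connected mount-path/.meta/graphs/active/volname-client-*/private \
#   | check_connected 6
```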

5.8.6. Converting an arbitrated volume to a three-way replicated volume

You can convert an arbitrated volume into a three-way replicated volume or a three-way distributed replicated volume by replacing the arbiter bricks with full bricks for each replicated sub-volume.

Warning

If you follow this process with geo-replication configured, you run the risk of data loss when converting a volume. This race condition is tracked by Bug 1683893 and the workaround is available in the Red Hat Gluster Storage Release Notes.

Procedure 5.3. Converting an arbitrated volume to a replica 3 volume

  1. Verify that healing is not in progress

    # gluster volume heal VOLNAME info
    Wait until the number of pending heal entries is 0 before proceeding.
  2. Remove arbiter bricks from the volume

    Check which bricks are listed as (arbiter), and then remove those bricks from the volume.
    # gluster volume info VOLNAME
    # gluster volume remove-brick VOLNAME replica 2 HOST:arbiter-brick-path force
  3. Disable and stop self-healing

    Run the following commands to disable data, metadata, and entry self-heal, and the self-heal daemon.
    # gluster volume set VOLNAME cluster.data-self-heal off
    # gluster volume set VOLNAME cluster.metadata-self-heal off
    # gluster volume set VOLNAME cluster.entry-self-heal off
    # gluster volume set VOLNAME self-heal-daemon off
  4. Add full bricks to the volume

    Convert the volume by adding a brick for each replicated sub-volume.
    # gluster volume add-brick VOLNAME replica 3 HOST:brick-path
    For example, if you have an existing arbitrated replicated volume:
    # gluster volume add-brick testvol replica 3 server:/bricks/brick
    If you have an existing arbitrated distributed-replicated volume:
    # gluster volume add-brick testvol replica 3 server1:/bricks/brick1 server2:/bricks/brick2
  5. Wait for client volfiles to update

    This takes about 5 minutes.
  6. Verify that bricks added successfully

    # gluster volume info VOLNAME
    # gluster volume status VOLNAME
  7. Re-enable self-healing

    Run the following commands to re-enable self-healing on the servers.
    # gluster volume set VOLNAME cluster.data-self-heal on
    # gluster volume set VOLNAME cluster.metadata-self-heal on
    # gluster volume set VOLNAME cluster.entry-self-heal on
    # gluster volume set VOLNAME self-heal-daemon on
  8. Verify all entries are healed

    # gluster volume heal VOLNAME info
    Wait until the number of pending heal entries is 0 to ensure that all heals completed successfully.
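Step 2 asks you to find the bricks listed as (arbiter). The following bash sketch extracts them from gluster volume info output so they can be reviewed before running remove-brick. It assumes brick lines look like BrickN: HOST:PATH (arbiter), as in the example output in Section 5.8.3; arbiter_bricks is a hypothetical helper.

```shell
#!/usr/bin/env bash
# arbiter_bricks (hypothetical helper)
# Reads "gluster volume info VOLNAME" output on stdin and prints the
# HOST:PATH of every brick marked (arbiter), space-separated on one line.
arbiter_bricks() {
  awk '/^Brick[0-9]+: .*\(arbiter\)[ \t]*$/ { printf "%s%s", sep, $2; sep = " " }
       END { print "" }'
}

# Example (VOLNAME is a placeholder); review the list before removing:
# gluster volume info VOLNAME | arbiter_bricks
```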

5.8.7. Tuning recommendations for arbitrated volumes

Red Hat recommends the following when arbitrated volumes are in use:
  • For dedicated arbiter nodes, use JBOD for arbiter bricks, and RAID6 for data bricks.
  • For chained arbiter volumes, use the same RAID6 drive for both data and arbiter bricks.
See Chapter 20, Tuning for Performance for more information on enhancing performance that is not specific to the use of arbiter volumes.

5.9. Creating Dispersed Volumes

Dispersed volumes are based on erasure coding. Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces and stored across a set of different locations. This allows the recovery of the data stored on one or more bricks in case of failure. The number of bricks that can fail without losing data is configured by setting the redundancy count.
A dispersed volume requires less storage space than a replicated volume. For example, a 4 + 2 dispersed volume (redundancy level 2) requires 1.5 TB of raw capacity to store 1 TB of data, whereas a two-way replicated volume requires 2 TB. In a dispersed volume, each brick stores a portion of the data along with parity or redundancy information, and the volume can sustain the loss of as many bricks as its redundancy level.
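The capacity comparison above follows from the brick layout. A dispersed volume with k data bricks and m redundancy bricks needs data * (k + m) / k of raw capacity, while an r-way replicated volume needs data * r. The following bash sketch uses hypothetical helper names and decimal GB (1 TB = 1000 GB), rounding up to whole GB.

```shell
#!/usr/bin/env bash
# dispersed_raw_gb DATA_GB K M (hypothetical helper)
# Raw capacity for a k + m dispersed volume, rounded up to whole GB.
dispersed_raw_gb() {
  local data_gb=$1 k=$2 m=$3
  echo $(( (data_gb * (k + m) + k - 1) / k ))
}

# replicated_raw_gb DATA_GB REPLICA_COUNT (hypothetical helper)
replicated_raw_gb() {
  echo $(( $1 * $2 ))
}

dispersed_raw_gb 1000 4 2    # 4 + 2: 1500
replicated_raw_gb 1000 2     # replica 2: 2000
dispersed_raw_gb 1000 8 2    # 8 + 2: 1250
```

The overhead depends on the ratio of m to k, not on the redundancy level alone, which is why the 8 + 2 layout is cheaper than 4 + 2 even though both tolerate two brick failures.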

Important

Dispersed volume configuration is supported only on JBOD storage. For more information, see Section 20.1.2, “JBOD”.
Figure 5.7. Illustration of a Dispersed Volume

The data protection offered by erasure coding can be represented in simple form by the following equation: n = k + m. Here, n is the total number of bricks, k is the number of data bricks, and m is the redundancy count. Any k of the n bricks are sufficient to recover the data, so the volume can tolerate the failure of any m bricks. With this release, the following configurations are supported:
  • 6 bricks with redundancy level 2 (4 + 2)
  • 10 bricks with redundancy level 2 (8 + 2)
  • 11 bricks with redundancy level 3 (8 + 3)
  • 12 bricks with redundancy level 4 (8 + 4)
  • 20 bricks with redundancy level 4 (16 + 4)
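Because only specific layouts are supported, a script that builds dispersed volumes might check k and m first. The following bash sketch encodes the list above. is_supported_disperse is a hypothetical helper, and its list must be updated if the supported configurations change in a later release.

```shell
#!/usr/bin/env bash
# is_supported_disperse K M (hypothetical helper)
# Succeeds only for the k + m layouts listed as supported in this release.
is_supported_disperse() {
  case "$1+$2" in
    4+2 | 8+2 | 8+3 | 8+4 | 16+4) return 0 ;;
    *) return 1 ;;
  esac
}

is_supported_disperse 4 2 && echo "4 + 2 is supported"
is_supported_disperse 3 1 || echo "3 + 1 is not supported"
```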
For optimal fault tolerance, create each brick on a separate server. Creating multiple bricks on a single server is supported, but the more bricks there are on a single server, the greater the risk to availability and consistency when that single server becomes unavailable.
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
Prerequisites

Important

Red Hat recommends that you review the Dispersed Volume configuration recommendations explained in Section 11.16, “Recommended Configurations - Dispersed Volume” before creating the Dispersed volume.
To create a dispersed volume
  1. Run the gluster volume create command to create the dispersed volume.
    The syntax is # gluster volume create NEW-VOLNAME [disperse-data COUNT] [redundancy COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The number of bricks required to create a disperse volume is the sum of disperse-data count and redundancy count.
    The disperse-data count option specifies the number of bricks that are part of the dispersed volume, excluding the redundant bricks. For example, if the total number of bricks is 6 and the redundancy count is 2, then the disperse-data count is 4 (6 - 2 = 4). If only the redundancy count option is specified, the disperse-data count is computed automatically by subtracting the redundancy count from the total number of bricks specified.
    Redundancy determines how many bricks can be lost without interrupting the operation of the volume. If the redundancy count is not specified, it is automatically set to the optimal value for the configuration, and a warning message is displayed.
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.11. Dispersed Volume with Six Storage Servers

    # gluster volume create test-volume disperse-data 4 redundancy 2 transport tcp server1:/rhgs1/brick1 server2:/rhgs2/brick2 server3:/rhgs3/brick3 server4:/rhgs4/brick4 server5:/rhgs5/brick5 server6:/rhgs6/brick6
    Creation of test-volume has been successful
    Please start the volume to access data.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful

    Important

    The open-behind volume option is enabled by default. If you are accessing the dispersed volume using the SMB protocol, you must disable the open-behind volume option to avoid a performance bottleneck with large-file workloads. Run the following command to disable the open-behind volume option:
    # gluster volume set VOLNAME open-behind off
    For information on the open-behind volume option, see Section 11.1, “Configuring Volume Options”.
  3. Optionally, run the gluster volume info command to display the volume information.
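The disperse-data calculation described in step 1 can be sketched as follows. The range check is a basic sanity check added for illustration, not the full set of rules that gluster volume create applies, and disperse_data_count is a hypothetical helper.

```shell
#!/usr/bin/env bash
# disperse_data_count TOTAL_BRICKS REDUNDANCY (hypothetical helper)
# Prints the disperse-data count (k) implied by the total brick count (n)
# and the redundancy count (m), that is, k = n - m.
disperse_data_count() {
  local total=$1 redundancy=$2
  if (( redundancy < 1 || redundancy >= total )); then
    echo "error: redundancy must be between 1 and $(( total - 1 ))" >&2
    return 1
  fi
  echo $(( total - redundancy ))
}

disperse_data_count 6 2    # six bricks, redundancy 2: 4
```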

5.10. Creating Distributed Dispersed Volumes

Distributed dispersed volumes support the same configurations of erasure coding as dispersed volumes. The number of bricks in a distributed dispersed volume must be a multiple of (k + m). With this release, the following configurations are supported:
  • Multiple disperse sets containing 6 bricks with redundancy level 2
  • Multiple disperse sets containing 10 bricks with redundancy level 2
  • Multiple disperse sets containing 11 bricks with redundancy level 3
  • Multiple disperse sets containing 12 bricks with redundancy level 4
  • Multiple disperse sets containing 20 bricks with redundancy level 4

Important

Distributed dispersed volume configuration is supported only on JBOD storage. For more information, see Section 20.1.2, “JBOD”.
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
Prerequisites
Figure 5.8. Illustration of a Distributed Dispersed Volume

Creating distributed dispersed volumes

Important

Red Hat recommends that you review the Distributed Dispersed Volume configuration recommendations explained in Section 11.16, “Recommended Configurations - Dispersed Volume” before creating the Distributed Dispersed volume.
  1. Run the gluster volume create command to create the distributed dispersed volume.
    The syntax is # gluster volume create NEW-VOLNAME disperse-data COUNT [redundancy COUNT] [transport tcp | rdma | tcp,rdma] NEW-BRICK...
    The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.

    Example 5.12. Distributed Dispersed Volume with Six Storage Servers

    # gluster volume create test-volume disperse-data 4 redundancy 2 transport tcp server1:/rhgs1/brick1 server2:/rhgs2/brick2 server3:/rhgs3/brick3 server4:/rhgs4/brick4 server5:/rhgs5/brick5 server6:/rhgs6/brick6 server1:/rhgs7/brick7 server2:/rhgs8/brick8 server3:/rhgs9/brick9 server4:/rhgs10/brick10 server5:/rhgs11/brick11 server6:/rhgs12/brick12
    Creation of test-volume has been successful
    Please start the volume to access data.
    The above example is illustrated in Figure 5.8, “Illustration of a Distributed Dispersed Volume”. In the illustration and example, you are creating 12 bricks from 6 servers.
  2. Run # gluster volume start VOLNAME to start the volume.
    # gluster volume start test-volume
    Starting test-volume has been successful

    Important

    The open-behind volume option is enabled by default. If you are accessing the distributed dispersed volume using the SMB protocol, you must disable the open-behind volume option to avoid a performance bottleneck with large-file workloads. Run the following command to disable the open-behind volume option:
    # gluster volume set VOLNAME open-behind off
    For information on the open-behind volume option, see Section 11.1, “Configuring Volume Options”.
  3. Optionally, run the gluster volume info command to display the volume information.
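In a distributed dispersed volume, consecutive groups of k + m bricks form the disperse sets, so the brick order also determines how many bricks of one set share a server. If one server holds more than m bricks of a set, losing that server makes the set unavailable. The following bash sketch reports the worst case per set; disperse_sets_report is a hypothetical helper that assumes HOST:PATH brick arguments.

```shell
#!/usr/bin/env bash
# disperse_sets_report K M BRICK... (hypothetical helper)
# Splits the brick list into disperse sets of K + M bricks, in the order
# given, and reports the largest number of bricks any one host holds in
# each set. Returns non-zero if that number exceeds M for any set.
disperse_sets_report() {
  local k=$1 m=$2
  shift 2
  if (( k < 1 || m < 1 )); then
    echo "error: k and m must both be at least 1" >&2
    return 1
  fi
  local size=$(( k + m ))
  if (( $# % size != 0 )); then
    echo "error: $# bricks is not a multiple of k + m = $size" >&2
    return 1
  fi
  local bricks=("$@")
  local status=0 setno=0 i j worst
  for (( i = 0; i < ${#bricks[@]}; i += size )); do
    setno=$(( setno + 1 ))
    # Count bricks per host within this set and keep the highest count.
    worst=$(for (( j = i; j < i + size; j++ )); do echo "${bricks[j]%%:*}"; done |
      sort | uniq -c | sort -rn | awk 'NR == 1 { print $1 }')
    echo "set $setno: at most $worst brick(s) on one host"
    if (( worst > m )); then
      status=1
    fi
  done
  return "$status"
}

# Example 5.12: two 4 + 2 sets, each spread across six servers
disperse_sets_report 4 2 server1:/rhgs1/brick1 server2:/rhgs2/brick2 \
  server3:/rhgs3/brick3 server4:/rhgs4/brick4 server5:/rhgs5/brick5 \
  server6:/rhgs6/brick6 server1:/rhgs7/brick7 server2:/rhgs8/brick8 \
  server3:/rhgs9/brick9 server4:/rhgs10/brick10 server5:/rhgs11/brick11 \
  server6:/rhgs12/brick12
```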

5.11. Starting Volumes

Volumes must be started before they can be mounted.
To start a volume, run # gluster volume start VOLNAME

Note

Every volume that is created is exported through the SMB protocol by default. To disable this, see Section 6.3.7, “Disabling SMB Shares” before starting the volume.
For example, to start test-volume:
# gluster volume start test-volume
Starting test-volume has been successful

Chapter 6. Creating Access to Volumes

Red Hat Gluster Storage volumes can be accessed using a number of technologies:
Cross Protocol Data Access

Because of differences in locking semantics, a single Red Hat Gluster Storage volume cannot be concurrently accessed by multiple protocols. Current support for concurrent access is defined in the following table.

Table 6.1. Cross Protocol Data Access Matrix
Protocol | SMB | Gluster NFS | NFS-Ganesha | Native FUSE | Object
SMB | Yes | No | No | No | No
Gluster NFS | No | Yes | No | No | No
NFS-Ganesha | No | No | Yes | No | No
Native FUSE | No | No | No | Yes | Yes [a]
Object | No | No | No | Yes [a] | Yes
[a] For more information, see Section 6.5, “Managing Object Store”.
Access Protocols Supportability

The following table provides the support matrix for the supported access protocols with TCP/RDMA.

Table 6.2. Access Protocol Supportability Matrix
Access protocol | TCP | RDMA
FUSE | Yes | Yes
SMB | Yes | No
NFS | Yes | Yes

Important

Red Hat Gluster Storage requires certain ports to be open. You must ensure that the firewall settings allow access to the ports listed at Chapter 3, Considerations for Red Hat Gluster Storage.
The gluster user is created as part of the gluster installation. Its purpose is to provide privileged access to libgfapi-based applications (for example, nfs-ganesha and glusterfs-coreutils). Write access to the statedump directory is restricted for normal users, so an application running as a normal user fails when it attempts to write a state dump to this directory. To allow the application to write to this location, add the user that runs the application to the gluster group. After the user is added, restart the gluster processes to apply the new group membership.

6.1. Native Client

Native Client is a FUSE-based client running in user space. Native Client is the recommended method for accessing Red Hat Gluster Storage volumes when high concurrency and high write performance are required.
This section introduces Native Client and describes how to perform the following:
  • Install Native Client packages
  • Mount Red Hat Gluster Storage volumes (manually and automatically)
  • Verify that the Gluster Storage volume has mounted successfully
Table 6.3. Red Hat Gluster Storage Support Matrix
Red Hat Enterprise Linux version   Red Hat Gluster Storage version   Native client version
6.5                                3.0                               3.0, 2.1*
6.6                                3.0.2, 3.0.3, 3.0.4               3.0, 2.1*
6.7                                3.1, 3.1.1, 3.1.2                 3.1, 3.0, 2.1*
6.8                                3.1.3                             3.1.3
6.9                                3.2                               3.2, 3.1.3*
6.9                                3.3                               3.3, 3.2
6.9                                3.3.1                             3.3.1, 3.3, 3.2
6.10                               3.4                               3.4, 3.3.z
7.1                                3.1, 3.1.1                        3.1.1, 3.1, 3.0
7.2                                3.1.2                             3.1.2, 3.1, 3.0
7.2                                3.1.3                             3.1.3
7.3                                3.2                               3.2, 3.1.3
7.4                                3.2                               3.2, 3.1.3
7.4                                3.3                               3.3, 3.2
7.4                                3.3.1                             3.3.1, 3.3, 3.2
7.5                                3.3.1, 3.4                        3.3.z, 3.4.z
7.6                                3.3.1, 3.4                        3.3.z, 3.4.z

Warning

If you want to access a volume being provided by a server using Red Hat Gluster Storage 3.1.3 or higher, your client must also be using Red Hat Gluster Storage 3.1.3 or higher. Accessing these volumes from earlier client versions can result in data becoming unavailable and problems with directory operations. This requirement exists because Red Hat Gluster Storage 3.1.3 changed how the Distributed Hash Table works in order to improve directory consistency and remove the effects seen in BZ#1115367 and BZ#1118762.

Warning

The following issues are observed and recorded for Red Hat Gluster Storage 3.2 on RHEL 6.x and 7.x using Native Client 3.1.3:
  • gluster volume heal VOLNAME info is unresponsive for some volumes. (BZ#1500542)
  • Gluster brick process crashes frequently. (BZ#1510725)
  • Multiple disconnects on NFS mounts. (BZ#1425740)

Warning

  • For Red Hat Gluster Storage 3.4, Red Hat supports Red Hat Gluster Storage 3.3 and 3.4 clients only.
  • For Red Hat Gluster Storage 3.2, you need Red Hat Gluster Storage 3.2 clients. This version is not compatible with earlier client versions.
For more information on release versions, see https://access.redhat.com/solutions/543123.

6.1.1. Installing Native Client

After installing the client operating system, register the target system to Red Hat Network and subscribe to the Red Hat Enterprise Linux Server channel.

Important

All clients must be of the same version. Red Hat strongly recommends upgrading the servers before upgrading the clients.

Use the Command Line to Register and Subscribe a System to Red Hat Subscription Management

Register the system using the command line, and subscribe to the correct repositories.

Prerequisites

  • Know the user name and password of the Red Hat Subscription Manager account with Red Hat Gluster Storage entitlements.
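  1. Run the subscription-manager register command and enter your Red Hat Subscription Manager user name and password when prompted to register the system with Red Hat Subscription Manager. If a pool has to be attached, list the available pools with subscription-manager list --available and attach the appropriate pool with subscription-manager attach --pool=POOLID.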
    # subscription-manager register
  2. Depending on your client, run one of the following commands to subscribe to the correct repositories.
    • For Red Hat Enterprise Linux 7.x clients:
      # subscription-manager repos --enable=rhel-7-server-rpms --enable=rh-gluster-3-client-for-rhel-7-server-rpms

      Note

      The following command can also be used, but Red Hat Gluster Storage may deprecate support for this repository in future releases.
      # subscription-manager repos --enable=rhel-7-server-rh-common-rpms
    • For Red Hat Enterprise Linux 6.1 and later clients:
      # subscription-manager repos --enable=rhel-6-server-rpms --enable=rhel-6-server-rhs-client-1-rpms
    • For Red Hat Enterprise Linux 5.7 and later clients:
      # subscription-manager repos --enable=rhel-5-server-rpms --enable=rhel-5-server-rhs-client-1-rpms
    For more information, see Section 3.2 Registering from the Command Line in Using and Configuring Red Hat Subscription Management.
  3. Verify that the system is subscribed to the required repositories.
    # yum repolist
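When one provisioning script has to handle RHEL 5, 6, and 7 clients, the repository arguments from step 2 can be selected by the RHEL major version. This is a minimal bash sketch; client_repo_args is a hypothetical helper and only repeats the repository names listed above.

```bash
#!/usr/bin/env bash
# Sketch: print the subscription-manager --enable arguments for the
# client repositories of a given RHEL major version (5, 6, or 7).
client_repo_args() {
    case "$1" in
        7) printf '%s\n' "--enable=rhel-7-server-rpms --enable=rh-gluster-3-client-for-rhel-7-server-rpms" ;;
        6) printf '%s\n' "--enable=rhel-6-server-rpms --enable=rhel-6-server-rhs-client-1-rpms" ;;
        5) printf '%s\n' "--enable=rhel-5-server-rpms --enable=rhel-5-server-rhs-client-1-rpms" ;;
        *) echo "unsupported RHEL major version: $1" >&2; return 1 ;;
    esac
}
```

For example, # subscription-manager repos $(client_repo_args 7), followed by # yum repolist. The command substitution is deliberately unquoted so that the two --enable arguments are passed separately. On RHEL 6 and 7, rpm -E %rhel usually prints the major version and could replace the literal 7; treat that as an assumption and pass the version explicitly if the macro is not defined on your client.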

Use the Web Interface to Register and Subscribe a System

Register the system using the web interface, and subscribe to the correct channels.

Prerequisites

  • Know the user name and password of the Red Hat Subscription Management (RHSM) account with Red Hat Gluster Storage entitlements.
  1. Log on to Red Hat Subscription Management (https://access.redhat.com/management).
  2. Click the Systems link at the top of the screen.
  3. Click the name of the system to which the Red Hat Gluster Storage Native Client channel must be appended.
  4. Click Alter Channel Subscriptions in the Subscribed Channels section of the screen.
  5. Expand the node for Additional Services Channels for Red Hat Enterprise Linux 7 for x86_64 or Red Hat Enterprise Linux 6 for x86_64 or for Red Hat Enterprise Linux 5 for x86_64 depending on the client platform.
  6. Click the Change Subscriptions button to finalize the changes.
    When the page refreshes, select the Details tab to verify the system is subscribed to the appropriate channels.

Install Native Client Packages

Install Native Client packages from Red Hat Network
  1. Run the yum install command to install the native client RPM packages.
    # yum install glusterfs glusterfs-fuse
  2. For Red Hat Enterprise Linux 5.x client systems, run the modprobe command to load FUSE modules before mounting Red Hat Gluster Storage volumes.
    # modprobe fuse
    For more information on loading modules at boot time, see https://access.redhat.com/knowledge/solutions/47028.

6.1.2. Upgrading Native Client

Before upgrading the Native Client, subscribe the clients to the channels mentioned in Section 6.1.1, “Installing Native Client”.

Warning

If you want to access a volume being provided by a server using Red Hat Gluster Storage 3.1.3 or higher, your client must also be using Red Hat Gluster Storage 3.1.3 or higher. Accessing these volumes from earlier client versions can result in data becoming unavailable and problems with directory operations. This requirement exists because Red Hat Gluster Storage 3.1.3 changed how the Distributed Hash Table works in order to improve directory consistency and remove the effects seen in BZ#1115367 and BZ#1118762.
  1. Unmount gluster volumes

    Unmount any gluster volumes prior to upgrading the native client.
    # umount /mnt/glusterfs
  2. Upgrade the client

    Run the yum update command to upgrade the native client:
    # yum update glusterfs glusterfs-fuse
  3. Remount gluster volumes

    Remount the volumes as described in Section 6.1.3, “Mounting Red Hat Gluster Storage Volumes”.
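The three upgrade steps can be combined into one function. This is a minimal bash sketch under these assumptions: it runs as root, every glusterfs mount on the client has an /etc/fstab entry (so mount -a -t glusterfs can restore it), and the run helper and DRY_RUN variable are hypothetical additions that print the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: unmount, upgrade, and remount the native client in one step.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

upgrade_native_client() {
    # 1. Unmount every FUSE glusterfs mount (mount type fuse.glusterfs).
    run umount -a -t fuse.glusterfs || return 1
    # 2. Upgrade the native client packages; -y answers the yum prompt.
    run yum -y update glusterfs glusterfs-fuse || return 1
    # 3. Remount the entries of type glusterfs listed in /etc/fstab.
    run mount -a -t glusterfs
}
```

Running it first with DRY_RUN=1 shows the exact commands before anything is unmounted.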

6.1.3. Mounting Red Hat Gluster Storage Volumes

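After installing Native Client, the Red Hat Gluster Storage volumes must be mounted to access data. Three methods are available:
  • Section 6.1.3.2, “Mounting Volumes Manually”
  • Section 6.1.3.3, “Mounting Volumes Automatically”
  • Section 6.1.3.4, “Manually Mounting Sub-directories Using Native Client”
After mounting a volume, test the mounted volume using the procedure described in Section 6.1.3.5, “Testing Mounted Volumes”.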

Note

  • Clients should be on the same version as the server, or at least on the version immediately previous to the server version. For Red Hat Gluster Storage 3.4, the recommended native client version is either 3.4 or 3.3.z. For other versions, see Section 6.1, “Native Client”.
  • Server names selected during volume creation should be resolvable in the client machine. Use appropriate /etc/hosts entries, or a DNS server to resolve server names to IP addresses.
  • Internet Protocol Version 6 (IPv6) support is available only for Red Hat Hyperconverged Infrastructure for Virtualization environments and not for Red Hat Gluster Storage standalone environments.
6.1.3.1. Mount Commands and Options
The following options are available when using the mount -t glusterfs command. All options must be separated with commas.
# mount -t glusterfs -o backup-volfile-servers=volfile_server2:volfile_server3:...:volfile_serverN,transport-type=tcp,log-level=WARNING,reader-thread-count=2,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs
backup-volfile-servers=<volfile_server2>:<volfile_server3>:...:<volfile_serverN>
Specifies a list of backup volfile servers for mounting the client. If this option is specified when mounting the FUSE client and the first volfile server fails, the servers listed in the backup-volfile-servers option are used as volfile servers until the mount succeeds.

Note

This option was earlier specified as backupvolfile-server which is no longer valid.
log-level
Logs only messages of the specified severity level or higher in the log file.
log-file
Logs the messages in the specified file.
transport-type
Specifies the transport type that the FUSE client must use to communicate with bricks. If the volume was created with only one transport type, that type is the default when no value is specified. For a tcp,rdma volume, tcp is the default.
dump-fuse
Creates a dump of the FUSE traffic between the glusterfs client (the FUSE userspace server) and the kernel. Because glusterfs volumes are mounted with the standard mount(8) command, this feature is provided as a mount option.
# mount -t glusterfs -odump-fuse=filename hostname:/volname mount-path
For example,
# mount -t glusterfs -odump-fuse=/dumpfile  10.70.43.18:/arbiter /mnt/arbiter
The above command generates a binary file with the name dumpfile.

Note

The fuse dump grows large over time, especially when the client is under heavy load, so it is not intended to be used during normal operation. Use it only to capture a dump of a particular scenario for diagnostic purposes.
To stop dumping, unmount the volume and remount it without the dump-fuse option.
ro
Mounts the file system with read-only permissions.
acl
Enables POSIX Access Control List on mount. See Section 6.4.4, “Checking ACL enablement on a mounted volume” for further information.
background-qlen=n
Sets the number of requests (n) that FUSE can queue before subsequent requests are denied. The default value of n is 64.
enable-ino32
Enables file system to present 32-bit inodes instead of 64-bit inodes.
reader-thread-count=n
Sets the number of FUSE reader threads (n). More reader threads can give better I/O performance. The default value of n is 1.
lru-limit
Sets the maximum number of inodes kept in the least recently used (LRU) list, which holds non-referenced inodes. When the limit is reached, inodes are cleared from the LRU list.
For example,
# mount -olru-limit=NNNN -t glusterfs hostname:/volname /mnt/mountdir
where NNNN is a positive integer. The default value of NNNN is 128k (131072), and the recommended value is 20000 or above. An lru-limit of 0 means that inodes are never invalidated from the LRU list.
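Because all options must be separated with commas, and backup-volfile-servers itself uses colons, a stray space (for example, between an option name and its value) silently splits the -o argument. The following minimal bash sketch builds both lists from separate words; join_by is a hypothetical helper, not part of glusterfs.

```bash
#!/usr/bin/env bash
# Sketch: join the remaining arguments with a one-character separator.
# "$*" expands to all arguments joined by the first character of IFS,
# and IFS is made local so the caller's IFS is unchanged.
join_by() {
    local IFS=$1
    shift
    printf '%s\n' "$*"
}
```

For example:
# opts=$(join_by , "backup-volfile-servers=$(join_by : server2 server3)" log-level=WARNING reader-thread-count=2)
# mount -t glusterfs -o "$opts" server1:/test-volume /mnt/glusterfs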
6.1.3.2. Mounting Volumes Manually

Manually Mount a Red Hat Gluster Storage Volume or Subdirectory

Create a mount point and run the following command as required:
For a Red Hat Gluster Storage Volume
mount -t glusterfs HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR
For a Red Hat Gluster Storage Volume's Subdirectory
mount -t glusterfs HOSTNAME|IPADDRESS:/VOLNAME/SUBDIRECTORY /MOUNTDIR

Note

The server specified in the mount command is used to fetch the glusterFS configuration volfile, which describes the volume name. The client then communicates directly with the servers mentioned in the volfile (which may not actually include the server used for mount).
  1. If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.
    # mkdir /mnt/glusterfs
  2. Run the mount -t glusterfs command, using the key in the task summary as a guide.
    1. For a Red Hat Gluster Storage Volume:
      # mount -t glusterfs server1:/test-volume /mnt/glusterfs
    2. For a Red Hat Gluster Storage Volume's Subdirectory
      # mount -t glusterfs server1:/test-volume/sub-dir /mnt/glusterfs
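The two manual steps can be wrapped so that rerunning them is harmless. This is a minimal bash sketch under these assumptions: mount_gluster, run, and DRY_RUN are hypothetical names, and mountpoint(1) from util-linux is available to detect an existing mount.

```bash
#!/usr/bin/env bash
# Sketch: create the mount point if needed and mount a volume or a
# subdirectory with the native client, skipping an existing mount.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# spec: HOSTNAME|IPADDRESS:/VOLNAME[/SUBDIRECTORY]
mount_gluster() {
    local spec=$1 dir=$2
    # mountpoint -q exits 0 only if dir is already a mount point.
    if mountpoint -q "$dir" 2>/dev/null; then
        echo "$dir is already mounted" >&2
        return 0
    fi
    run mkdir -p "$dir" || return 1
    run mount -t glusterfs "$spec" "$dir"
}
```

For example, mount_gluster server1:/test-volume/sub-dir /mnt/glusterfs performs steps 1 and 2 for a subdirectory.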
6.1.3.3. Mounting Volumes Automatically
Volumes can be mounted automatically each time the system starts.
The server specified in the mount command is used to fetch the glusterFS configuration volfile, which describes the volume name. The client then communicates directly with the servers mentioned in the volfile (which may not actually include the server used for mount).
Mounting a Volume Automatically
Mount a Red Hat Gluster Storage Volume automatically at server start.
  1. Open the /etc/fstab file in a text editor.
  2. Append the following configuration to the fstab file:
    For a Red Hat Gluster Storage Volume
    HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR glusterfs defaults,_netdev 0 0
    For a Red Hat Gluster Storage Volume's Subdirectory
    HOSTNAME|IPADDRESS:/VOLNAME/SUBDIRECTORY /MOUNTDIR glusterfs defaults,_netdev 0 0
    Using the example server names, the entry contains the following replaced values.
    server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0
    OR
    server1:/test-volume/subdir /mnt/glusterfs glusterfs defaults,_netdev 0 0
    To specify the transport type, use an entry like the following:
    server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev,transport=tcp 0 0
    OR
    server1:/test-volume/sub-dir /mnt/glusterfs glusterfs defaults,_netdev,transport=tcp 0 0
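Editing /etc/fstab from a script is safer when the entry is generated in one place and appended only once. This is a minimal bash sketch; gluster_fstab_entry and add_fstab_entry are hypothetical helpers, and the fstab path is a parameter so the logic can be tried on a scratch file first.

```bash
#!/usr/bin/env bash
# Sketch: build a glusterfs fstab line and append it if it is not present.

# Usage: gluster_fstab_entry SPEC MOUNTDIR [EXTRA_OPTIONS]
# SPEC is HOSTNAME|IPADDRESS:/VOLNAME[/SUBDIRECTORY].
gluster_fstab_entry() {
    local spec=$1 dir=$2 extra=${3:-}
    local opts=defaults,_netdev
    if [ -n "$extra" ]; then
        opts="$opts,$extra"      # for example transport=tcp
    fi
    printf '%s %s glusterfs %s 0 0\n' "$spec" "$dir" "$opts"
}

# Usage: add_fstab_entry LINE [FILE]; FILE defaults to /etc/fstab.
# grep -qxF matches the whole line literally, so reruns add nothing.
add_fstab_entry() {
    local line=$1 file=${2:-/etc/fstab}
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

For example, add_fstab_entry "$(gluster_fstab_entry server1:/test-volume /mnt/glusterfs transport=tcp)" produces the transport example shown above.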
6.1.3.4. Manually Mounting Sub-directories Using Native Client
With Red Hat Gluster Storage 3.4, you can share a single Gluster volume with different clients so that each client mounts only a subset of the volume namespace. This feature is similar to the NFS subdirectory mount feature, where you can export a subdirectory of an already exported volume. You can also use this feature to restrict full access to a particular volume.
Mounting subdirectories provides the following benefits:
  • Provides namespace isolation so that multiple users can access the storage without risking namespace collision with other users.
  • Prevents the root file system from becoming full in the event of a mount failure.
You can mount a subdirectory using native client by running either of the following commands:
# mount -t glusterfs hostname:/volname/subdir /mount-point
OR
# mount -t glusterfs hostname:/volname -osubdir-mount=subdir /mount-point
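To control which subdirectories can be mounted, and by which clients, set the auth.allow volume option. For example:
# gluster volume set test-vol auth.allow "/(192.168.10.*|192.168.11.*),/subdir1(192.168.1.*),/subdir2(192.168.8.*)"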
In the above example:
  • The auth.allow option allows only the directories specified as the value of the auth.allow option to be mounted.
  • Each group in the auth.allow value is separated by a comma (,).
  • Each group consists of a directory followed by parentheses, (), that contain the IP addresses allowed to mount it.
  • All subdirectory paths start with /. Paths are not relative to the volume; each is an absolute path that takes / as the root directory of the volume.

Note

By default, auth.allow is set to *, which allows all clients to mount any subdirectory in a volume.
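Long auth.allow values are easy to mistype. This minimal bash sketch assembles the value from DIRECTORY=ADDRESSES words, where ADDRESSES uses | between patterns as in the example above, and rejects relative paths. auth_allow_value is a hypothetical helper; its output is passed unchanged to gluster volume set.

```bash
#!/usr/bin/env bash
# Sketch: build an auth.allow value such as
#   /(192.168.10.*|192.168.11.*),/subdir1(192.168.1.*)
# from arguments of the form DIRECTORY=ADDRESSES.
auth_allow_value() {
    local out= entry dir addrs
    for entry in "$@"; do
        dir=${entry%%=*}      # text before the first "="
        addrs=${entry#*=}     # text after the first "="
        case "$dir" in
            /*) ;;            # paths are absolute, rooted at the volume
            *) echo "not an absolute path: $dir" >&2; return 1 ;;
        esac
        out="${out:+$out,}$dir($addrs)"   # groups are comma-separated
    done
    printf '%s\n' "$out"
}
```

Quote each argument with single quotes so that the shell does not expand * or interpret |, for example:
# gluster volume set test-vol auth.allow "$(auth_allow_value '/=192.168.10.*|192.168.11.*' '/subdir1=192.168.1.*' '/subdir2=192.168.8.*')"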
6.1.3.5. Testing Mounted Volumes

Testing Mounted Red Hat Gluster Storage Volumes

Using the command-line, verify the Red Hat Gluster Storage volumes have been successfully mounted. All three commands can be run in the order listed, or used independently to verify a volume has been successfully mounted.
  1. Run the mount command to check whether the volume was successfully mounted.
    # mount
    server1:/test-volume on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
    OR
    # mount
    server1:/test-volume/sub-dir on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
    If the transport option is used while mounting a volume, the mount status has the transport type appended to the volume name. For example, for transport=tcp:
    # mount
    server1:/test-volume.tcp on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
    OR
    # mount
    server1:/test-volume/sub-dir.tcp on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
  2. Run the df command to display the aggregated storage space from all the bricks in a volume.
    # df -h /mnt/glusterfs
    Filesystem           Size  Used  Avail  Use%  Mounted on
    server1:/test-volume  28T  22T   5.4T   82%   /mnt/glusterfs
  3. Move to the mount directory using the cd command, and list the contents.
    # cd /mnt/glusterfs
    # ls
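Step 1 can be scripted, for example in a monitoring check. This minimal bash sketch succeeds only if the directory appears in the mount table with the fuse.glusterfs type. It assumes a Linux /proc/mounts (a second argument can point it at another file), and is_glusterfs_mount is a hypothetical helper. Because /proc/mounts escapes spaces in paths as \040, the directory must not contain spaces for this simple comparison.

```bash
#!/usr/bin/env bash
# Sketch: return 0 if DIR is mounted with the native client.
# Field 2 of the mount table is the mount point, field 3 the type.
is_glusterfs_mount() {
    local dir=$1 mounts=${2:-/proc/mounts}
    awk -v d="$dir" '$2 == d && $3 == "fuse.glusterfs" { found = 1 }
                     END { exit !found }' "$mounts"
}
```

For example: # is_glusterfs_mount /mnt/glusterfs && df -h /mnt/glusterfs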

6.2. NFS

Red Hat Gluster Storage has two NFS server implementations, Gluster NFS and NFS-Ganesha. Gluster NFS supports only the NFSv3 protocol, whereas NFS-Ganesha supports both the NFSv3 and NFSv4 protocols.

6.2.1. Support Matrix

The following table contains the feature matrix of the NFS support on Red Hat Gluster Storage 3.1 and later:
Table 6.4. NFS Support Matrix
Features                         glusterFS NFS (NFSv3)   NFS-Ganesha (NFSv3)   NFS-Ganesha (NFSv4)
Root-squash                      Yes                     Yes                   Yes
All-squash                       No                      Yes                   Yes
Sub-directory exports            Yes                     Yes                   Yes
Locking                          Yes                     Yes                   Yes
Client based export permissions  Yes                     Yes                   Yes
Netgroups                        Yes                     Yes                   Yes
Mount protocols                  UDP, TCP                UDP, TCP              Only TCP
NFS transport protocols          TCP                     UDP, TCP              TCP
AUTH_UNIX                        Yes                     Yes                   Yes
AUTH_NONE                        Yes                     Yes                   Yes
AUTH_KRB                         No                      Yes                   Yes
ACLs                             Yes                     No                    Yes
Delegations                      N/A                     N/A                   No
High availability                Yes [a]                 Yes                   Yes
Multi-head                       Yes                     Yes                   Yes
Gluster RDMA volumes             Yes                     Not supported         Not supported
DRC                              Not supported           Yes                   Yes
Dynamic exports                  No                      Yes                   Yes
pseudofs                         N/A                     N/A                   Yes
NFSv4.1                          N/A                     N/A                   Not supported
[a] With certain limitations. For more information, see Section 6.2.2.1, “Setting up CTDB for Gluster NFS”.

Note

  • Red Hat does not recommend running NFS-Ganesha with any other NFS servers, such as, kernel-NFS and Gluster NFS servers.
  • Only one of NFS-Ganesha, gluster-NFS or kernel-NFS servers can be enabled on a given machine/host as all NFS implementations use the port 2049 and only one can be active at a given time. Hence you must disable kernel-NFS before NFS-Ganesha is started.
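Disabling kernel-NFS is typically a service stop plus a boot-time disable. This minimal bash sketch assumes a Red Hat Enterprise Linux 7 host where the kernel NFS server runs as the systemd unit nfs-server; disable_kernel_nfs, run, and DRY_RUN are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: free port 2049 by stopping the kernel NFS server before
# NFS-Ganesha is started (assumes RHEL 7 and the nfs-server unit).

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

disable_kernel_nfs() {
    run systemctl stop nfs-server || return 1      # stop it now
    run systemctl disable nfs-server               # keep it off after reboot
}
```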

6.2.2. Gluster NFS

Linux and other operating systems that support the NFSv3 standard can use NFS to access the Red Hat Gluster Storage volumes.

Note

From the Red Hat Gluster Storage 3.2 release onwards, the Gluster NFS server is disabled by default for any new volumes that are created. If needed, you can enable the Gluster NFS server on a new volume explicitly by running the following command on any one of the server nodes:
# gluster volume set VOLNAME nfs.disable off
However, existing volumes that use the Gluster NFS server are not affected by an upgrade to Red Hat Gluster Storage 3.2; the Gluster NFS server remains implicitly enabled for them.
Differences in implementation of the NFSv3 standard in operating systems may result in some operational issues. If issues are encountered when using NFSv3, contact Red Hat support to receive more information on Red Hat Gluster Storage client operating system compatibility, and information about known issues affecting NFSv3.
NFS ACL v3 is supported, which allows getfacl and setfacl operations on NFS clients. The following options are provided to configure the Access Control Lists (ACL) in the glusterFS NFS server with the nfs.acl option. For example:
  • To set nfs.acl ON, run the following command:
    # gluster volume set VOLNAME nfs.acl on
  • To set nfs.acl OFF, run the following command:
    # gluster volume set VOLNAME nfs.acl off

Note

ACL is ON by default.
Red Hat Gluster Storage includes Network Lock Manager (NLM) v4. The NLM protocol allows NFSv3 clients to lock files across the network. NLM is required for applications running on top of NFSv3 mount points to use the standard fcntl() (POSIX) and flock() (BSD) lock system calls to synchronize access across clients.
This section describes how to use NFS to mount Red Hat Gluster Storage volumes (both manually and automatically) and how to verify that the volume has been mounted successfully.

Important

On Red Hat Enterprise Linux 7, enable the firewall service in the active zones for runtime and permanent mode using the following commands:
To get a list of active zones, run the following command:
# firewall-cmd --get-active-zones
To allow the firewall service in the active zones, run the following commands:
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind --permanent
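When several zones are active, the two commands have to be repeated for each zone. This minimal bash sketch parses the zone names from firewall-cmd --get-active-zones output (zone names start in the first column; the indented lines under them list interfaces or sources) and applies a rule to each zone at runtime and permanently. The helper names and DRY_RUN are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: apply one firewall-cmd rule to every active zone.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Read "firewall-cmd --get-active-zones" output on stdin and print
# only the zone names (lines that do not start with whitespace).
active_zone_names() {
    awk '/^[^[:space:]]/ { print $1 }'
}

# Usage: allow_in_active_zones "ZONES" RULE...
# ZONES is a whitespace-separated list of zone names.
allow_in_active_zones() {
    local zones=$1 zone
    shift
    for zone in $zones; do
        run firewall-cmd --zone="$zone" "$@" || return 1
        run firewall-cmd --zone="$zone" "$@" --permanent || return 1
    done
}
```

For example:
# zones=$(firewall-cmd --get-active-zones | active_zone_names)
# allow_in_active_zones "$zones" --add-service=nfs --add-service=rpc-bind
The same helper could open the CTDB port described in the next section with --add-port=4379/tcp.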
6.2.2.1. Setting up CTDB for Gluster NFS
In a replicated volume environment, the CTDB software (Cluster Trivial Database) has to be configured to provide high availability and lock synchronization for Samba shares. CTDB provides high availability by adding virtual IP addresses (VIPs) and a heartbeat service.
When a node in the trusted storage pool fails, CTDB enables a different node to take over the virtual IP addresses that the failed node was hosting. This ensures the IP addresses for the services provided are always available. However, locks are not migrated as part of failover.

Important

On Red Hat Enterprise Linux 7, enable the CTDB firewall service in the active zones for runtime and permanent mode using the below commands:
To get a list of active zones, run the following command:
# firewall-cmd --get-active-zones
To add ports to the active zones, run the following commands:
# firewall-cmd --zone=zone_name --add-port=4379/tcp
# firewall-cmd --zone=zone_name --add-port=4379/tcp  --permanent

Note

Amazon Elastic Compute Cloud (EC2) does not support VIPs and is hence not compatible with this solution.
6.2.2.1.1. Prerequisites
Follow these steps before configuring CTDB on a Red Hat Gluster Storage Server:
  • If you already have an older version of CTDB (version <= ctdb1.x), then remove CTDB by executing the following command:
    # yum remove ctdb
    After removing the older version, proceed with installing the latest CTDB.

    Note

    Ensure that the system is subscribed to the samba channel to get the latest CTDB packages.
  • Install the latest version of CTDB on all the nodes that are used as NFS servers:
    # yum install ctdb
  • CTDB uses TCP port 4379 by default. Ensure that this port is accessible between the Red Hat Gluster Storage servers.
6.2.2.1.2. Port and Firewall Information for Gluster NFS
On the GNFS-Client machine, configure firewalld to add ports used by statd, nlm and portmapper services by executing the following commands:
# firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
    --add-port=32803/tcp --add-port=32769/udp \
    --add-port=111/tcp --add-port=111/udp
# firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
    --add-port=32803/tcp --add-port=32769/udp \
    --add-port=111/tcp --add-port=111/udp --permanent
Execute the following steps on the client machine:
  • Edit /etc/sysconfig/nfs file as mentioned below:
    # sed -i '/STATD_PORT/s/^#//' /etc/sysconfig/nfs
  • Restart the services:
    • For Red Hat Enterprise Linux 6:
      # service nfslock restart
      # service nfs restart
    • For Red Hat Enterprise Linux 7:
      # systemctl restart nfs-config
      # systemctl restart rpc-statd
      # systemctl restart nfs-mountd
      # systemctl restart nfslock
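The client-side steps above can be scripted. This minimal bash sketch runs the same sed expression on a given file (default /etc/sysconfig/nfs) and restarts the services listed for the given RHEL major version. configure_statd_port, restart_nfs_client_services, run, and DRY_RUN are hypothetical names; DRY_RUN only affects the restart commands.

```bash
#!/usr/bin/env bash
# Sketch of the client-side statd configuration and service restarts.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Uncomment the STATD_PORT line (same sed expression as above).
configure_statd_port() {
    sed -i '/STATD_PORT/s/^#//' "${1:-/etc/sysconfig/nfs}"
}

# Restart the NFS client services for RHEL 6 or RHEL 7.
restart_nfs_client_services() {
    local svc
    case "$1" in
        6)
            run service nfslock restart || return 1
            run service nfs restart ;;
        7)
            for svc in nfs-config rpc-statd nfs-mountd nfslock; do
                run systemctl restart "$svc" || return 1
            done ;;
        *)
            echo "unsupported RHEL major version: $1" >&2
            return 1 ;;
    esac
}
```

For example, configure_statd_port followed by restart_nfs_client_services 7 on a Red Hat Enterprise Linux 7 client.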
6.2.2.1.3. Configuring CTDB on Red Hat Gluster Storage Server
To configure CTDB on Red Hat Gluster Storage server, execute the following steps:
  1. Create a replicate volume. This volume will host only a zero byte lock file, hence choose minimal sized bricks. To create a replicate volume run the following command:
    # gluster volume create VOLNAME replica N IPADDRESS1:/BRICKPATH1 ... IPADDRESSN:/BRICKPATHN
    where,
    N: The number of nodes that are used as Gluster NFS servers. Each node must host one brick.
    For example:
    # gluster volume create ctdb replica 3 10.16.157.75:/rhgs/brick1/ctdb/b1 10.16.157.78:/rhgs/brick1/ctdb/b2 10.16.157.81:/rhgs/brick1/ctdb/b3
  2. In the following files, replace "all" in the statement META="all" with the newly created volume name:
    /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
    /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
    For example:
    META="all"
      to
    META="ctdb"
  3. Start the volume.
    # gluster volume start ctdb
    As part of the start process, the S29CTDBsetup.sh script runs on all Red Hat Gluster Storage servers, adds an entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes with Gluster NFS server. It also enables automatic start of CTDB service on reboot.

    Note

    When you stop the special CTDB volume, the S29CTDB-teardown.sh script runs on all Red Hat Gluster Storage servers and removes an entry in /etc/fstab for the mount and unmounts the volume at /gluster/lock.
  4. Verify that the file /etc/sysconfig/ctdb exists on all the nodes that are used as Gluster NFS servers. This file contains the CTDB configuration recommended for Red Hat Gluster Storage.
  5. Create the /etc/ctdb/nodes file on all the nodes that are used as Gluster NFS servers, and add the IP addresses of these nodes to the file.
    10.16.157.0
    10.16.157.3
    10.16.157.6
    The IPs listed here are the private IPs of NFS servers.
  6. On all the nodes that are used as Gluster NFS servers and require IP failover, create the /etc/ctdb/public_addresses file and add the virtual IPs that CTDB should create to this file. Add these IP addresses in the following format:
    <Virtual IP>/<routing prefix> <node interface>
    For example:
    192.168.1.20/24 eth0
    192.168.1.21/24 eth0
  7. Start the CTDB service on all the nodes by executing the following command:
    # service ctdb start
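The file edits in steps 2, 5, and 6 can be scripted and run on each Gluster NFS node. This is a minimal bash sketch; set_ctdb_meta, write_ctdb_nodes, and write_public_addresses are hypothetical helpers, and the volume name, IP addresses, and interface are placeholders supplied by the caller. Each helper overwrites or edits the file it is given, so review the files before starting CTDB.

```bash
#!/usr/bin/env bash
# Sketch of the CTDB file edits from steps 2, 5, and 6.

# Step 2: replace META="all" with META="VOLNAME" in a hook script.
set_ctdb_meta() {
    local vol=$1 hook=$2
    sed -i "s/META=\"all\"/META=\"$vol\"/" "$hook"
}

# Step 5: write one private IP address per line to the nodes file.
write_ctdb_nodes() {
    local file=$1
    shift
    printf '%s\n' "$@" > "$file"
}

# Step 6: write "<Virtual IP>/<routing prefix> <node interface>" lines.
write_public_addresses() {
    local file=$1 iface=$2 vip
    shift 2
    : > "$file"                      # start with an empty file
    for vip in "$@"; do
        printf '%s %s\n' "$vip" "$iface" >> "$file"
    done
}
```

For example, using the values from the steps above:
# set_ctdb_meta ctdb /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
# set_ctdb_meta ctdb /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
# write_ctdb_nodes /etc/ctdb/nodes 10.16.157.0 10.16.157.3 10.16.157.6
# write_public_addresses /etc/ctdb/public_addresses eth0 192.168.1.20/24 192.168.1.21/24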

Note

CTDB with gNFS only provides node level high availability and is not capable of detecting NFS service failure. Therefore, CTDB does not provide high availability if the NFS service goes down while the node is still up and running.
6.2.2.2. Using Gluster NFS to Mount Red Hat Gluster Storage Volumes
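You can use either of the following methods to mount Red Hat Gluster Storage volumes:
  • Section 6.2.2.2.1, “Manually Mounting Volumes Using Gluster NFS”
  • Section 6.2.2.2.2, “Automatically Mounting Volumes Using Gluster NFS”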

Note

Currently, the GlusterFS NFS server supports only version 3 of the NFS protocol. As a preferred option, configure version 3 as the default version in the /etc/nfsmount.conf file by adding the following line to the file:
Defaultvers=3
If the file is not modified, add vers=3 manually to all mount commands.
# mount nfsserver:export -o vers=3 /MOUNTPOINT
The RDMA support in GlusterFS mentioned in the previous sections applies to communication between the bricks and the FUSE mount, GFAPI, or NFS server. The NFS kernel client still communicates with the GlusterFS NFS server over TCP.
For volumes that were created with only one transport type, communication between the GlusterFS NFS server and the bricks uses that transport type. For a tcp,rdma volume, the transport can be changed using the nfs.transport-type volume set option.
After mounting a volume, you can test the mounted volume using the procedure described in Section 6.2.2.2.4, “Testing Volumes Mounted Using Gluster NFS”.
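Before deciding whether vers=3 must be added to every mount command, you can check whether a client already defaults to NFSv3. This minimal bash sketch only looks for an uncommented Defaultvers=3 line and does not check which section it is in; on Red Hat Enterprise Linux the global defaults are expected under the [ NFSMount_Global_Options ] section, which is an assumption to verify against nfsmount.conf(5) on your client. defaults_to_nfsv3 is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: return 0 if an nfsmount.conf file (default /etc/nfsmount.conf)
# contains an uncommented "Defaultvers=3" line.
defaults_to_nfsv3() {
    grep -Eq '^[[:space:]]*Defaultvers[[:space:]]*=[[:space:]]*3[[:space:]]*$' \
        "${1:-/etc/nfsmount.conf}"
}
```

For example: # defaults_to_nfsv3 || echo "add -o vers=3 to Gluster NFS mount commands"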
6.2.2.2.1. Manually Mounting Volumes Using Gluster NFS
Create a mount point and run the mount command to manually mount a Red Hat Gluster Storage volume using Gluster NFS.
  1. If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.
    # mkdir /mnt/glusterfs
  2. Run the correct mount command for the system.
    For Linux
    # mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs
    For Solaris
    # mount -o vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
Manually Mount a Red Hat Gluster Storage Volume using Gluster NFS over TCP
Create a mount point and run the mount command to manually mount a Red Hat Gluster Storage volume using Gluster NFS over TCP.

Note

The glusterFS NFS server does not support UDP. If an NFS client, such as a Solaris client, connects using UDP by default, the following message appears:
requested NFS version or transport protocol is not supported
The nfs.mount-udp option is supported for mounting a volume; it is disabled by default. Note the following limitations:
  • If nfs.mount-udp is enabled, the MOUNT protocol needed for NFSv3 can handle requests from NFS-clients that require MOUNT over UDP. This is useful for at least some versions of Solaris, IBM AIX and HP-UX.
  • Currently, MOUNT over UDP does not have support for mounting subdirectories on a volume. Mounting server:/volume/subdir exports is only functional when MOUNT over TCP is used.
  • MOUNT over UDP does not currently have support for different authentication options that MOUNT over TCP honors. Enabling nfs.mount-udp may give more permissions to NFS clients than intended via various authentication options like nfs.rpc-auth-allow, nfs.rpc-auth-reject and nfs.export-dir.
  1. If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.
    # mkdir /mnt/glusterfs
  2. Run the correct mount command for the system, specifying the TCP protocol option for the system.
    For Linux
    # mount -t nfs -o vers=3,mountproto=tcp server1:/test-volume /mnt/glusterfs
    For Solaris
    # mount -o proto=tcp,vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
6.2.2.2.2. Automatically Mounting Volumes Using Gluster NFS
Red Hat Gluster Storage volumes can be mounted automatically using Gluster NFS, each time the system starts.

Note

In addition to the tasks described below, Red Hat Gluster Storage supports the standard auto-mounting method of Linux, UNIX, and similar operating systems for Gluster NFS mounts.
Update the /etc/auto.master and /etc/auto.misc files, and restart the autofs service. Whenever a user or process then attempts to access the directory, it is mounted in the background on demand.
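For the autofs method, an /etc/auto.misc map entry has the form KEY -OPTIONS LOCATION. This minimal bash sketch prints such an entry for a Gluster NFS volume. It assumes the usual "/misc /etc/auto.misc" line is present in /etc/auto.master, so KEY is mounted at /misc/KEY on first access; autofs_misc_entry is a hypothetical helper, and the options should be checked against autofs(5) on your system.

```bash
#!/usr/bin/env bash
# Sketch: print an /etc/auto.misc entry that mounts a volume over
# Gluster NFS on demand.
# Usage: autofs_misc_entry KEY HOSTNAME|IPADDRESS:/VOLNAME
autofs_misc_entry() {
    local key=$1 spec=$2
    # Gluster NFS supports only NFSv3, so request vers=3 explicitly.
    printf '%s -fstype=nfs,vers=3 %s\n' "$key" "$spec"
}
```

For example, append the entry and restart autofs (systemctl restart autofs on RHEL 7, service autofs restart on RHEL 6):
# autofs_misc_entry glusterfs server1:/test-volume >> /etc/auto.misc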
Mounting a Volume Automatically using NFS
Mount a Red Hat Gluster Storage Volume automatically using NFS at server start.
  1. Open the /etc/fstab file in a text editor.
  2. Append the following configuration to the fstab file.
    HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR nfs defaults,_netdev 0 0
    Using the example server names, the entry contains the following replaced values.
    server1:/test-volume /mnt/glusterfs nfs defaults,_netdev 0 0
Mounting a Volume Automatically using NFS over TCP
Mount a Red Hat Gluster Storage Volume automatically using NFS over TCP at server start.
  1. Open the /etc/fstab file in a text editor.
  2. Append the following configuration to the fstab file.
    HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR nfs defaults,_netdev,mountproto=tcp 0 0
    Using the example server names, the entry contains the following replaced values.
    server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0
6.2.2.2.3. Automatically Mounting Subdirectories Using NFS
The nfs.export-dir and nfs.export-dirs options provide granular control to restrict or allow specific clients to mount a sub-directory. These clients can be authenticated during a sub-directory mount with an IP address, a host name, or a Classless Inter-Domain Routing (CIDR) range.
nfs.export-dirs
This option is enabled by default. It allows the sub-directories of exported volumes to be mounted by clients without needing to export individual sub-directories. When enabled, all sub-directories of all volumes are exported. When disabled, sub-directories must be exported individually in order to mount them on clients.
To disable this option for a volume, run the following command:
# gluster volume set VOLNAME nfs.export-dirs off
nfs.export-dir
When nfs.export-dirs is set to on, the nfs.export-dir option allows you to specify one or more sub-directories to export, rather than exporting all subdirectories (nfs.export-dirs on), or only exporting individually exported subdirectories (nfs.export-dirs off).
To export certain subdirectories, run the following command:
# gluster volume set VOLNAME nfs.export-dir subdirectory
The subdirectory path should be the path from the root of the volume. For example, in a volume with six subdirectories, to export the first three subdirectories, the command would be the following:
# gluster volume set myvolume nfs.export-dir /dir1,/dir2,/dir3
Subdirectories can also be exported based on the IP address, hostname, or a Classless Inter-Domain Routing (CIDR) range by adding these details in parentheses after the directory path:
# gluster volume set VOLNAME nfs.export-dir "subdirectory(IPADDRESS),subdirectory(HOSTNAME),subdirectory(CIDR)"
For example:
# gluster volume set myvolume nfs.export-dir "/dir1(192.168.10.101),/dir2(storage.example.com),/dir3(192.168.98.0/24)"
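The nfs.export-dir value is a comma-separated list of paths. Each path can optionally be followed by a client in parentheses. The following minimal bash sketch assembles such a value. The function build_export_dir_value and its path=client argument syntax are hypothetical. Quote the result when you pass it to gluster, because an unquoted parenthesis is a syntax error in the shell.

```shell
# Hypothetical helper: build an nfs.export-dir value from "path" or
# "path=client" arguments, where client is an IP address, hostname, or CIDR.
build_export_dir_value() {
  local arg path client value=""
  for arg in "$@"; do
    path=${arg%%=*}                  # text before the first "="
    if [[ $arg == *=* ]]; then
      client=${arg#*=}               # text after the first "="
      path="${path}(${client})"
    fi
    value+="${value:+,}${path}"      # comma only between entries
  done
  printf '%s\n' "$value"
}
```

For example: gluster volume set myvolume nfs.export-dir "$(build_export_dir_value /dir1=192.168.10.101 /dir2=storage.example.com /dir3=192.168.98.0/24)"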
6.2.2.2.4. Testing Volumes Mounted Using Gluster NFS
You can confirm that Red Hat Gluster Storage directories are mounting successfully.
To test mounted volumes

Testing Mounted Red Hat Gluster Storage Volumes

Using the command-line, verify the Red Hat Gluster Storage volumes have been successfully mounted. All three commands can be run in the order listed, or used independently to verify a volume has been successfully mounted.
  1. Run the mount command to check whether the volume was successfully mounted.
    # mount
    server1:/test-volume on /mnt/glusterfs type nfs (rw,addr=server1)
  2. Run the df command to display the aggregated storage space from all the bricks in a volume.
    # df -h /mnt/glusterfs
    Filesystem              Size Used Avail Use% Mounted on
    server1:/test-volume    28T  22T  5.4T  82%  /mnt/glusterfs
  3. Move to the mount directory using the cd command, and list the contents.
    # cd /mnt/glusterfs
    # ls
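To script the first check, you can read the mount table directly instead of parsing the mount output. The following sketch assumes a Linux client where /proc/mounts is available. The name mount_fstype is hypothetical. Mount points that contain spaces, which /proc/mounts escapes as \040, are not handled.

```shell
# Hypothetical helper: print the filesystem type mounted at a directory,
# read from a mounts table (defaults to /proc/mounts). Prints nothing if
# the directory is not a mount point; the last matching entry wins.
mount_fstype() {
  local dir=$1 table=${2:-/proc/mounts}
  awk -v d="$dir" '$2 == d { t = $3 } END { if (t != "") print t }' "$table"
}
```

For example: [ "$(mount_fstype /mnt/glusterfs)" = nfs ] && echo "mounted over NFS"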
6.2.2.3. Troubleshooting Gluster NFS
Q:
The mount command on the NFS client fails with RPC Error: Program not registered. This error is encountered due to one of the following reasons:
  • The NFS server is not running. You can check the status using the following command:
    # gluster volume status
  • The volume is not started. You can check the status using the following command:
    # gluster volume info
  • rpcbind was restarted. To check if rpcbind is running, execute the following command:
    # ps ax | grep rpcbind
A:
  • If the NFS server is not running, then restart the NFS server using the following command:
    # gluster volume start VOLNAME
  • If the volume is not started, then start the volume using the following command:
    # gluster volume start VOLNAME
  • If both rpcbind and the NFS server are running, restart the NFS server using the following commands:
    # gluster volume stop VOLNAME
    # gluster volume start VOLNAME
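You can narrow down the rpcbind and NFS server checks with rpcinfo -p, which lists the RPC programs registered with rpcbind. The following bash sketch uses a hypothetical function, rpc_program_registered, to test whether a program number appears in that output. 100003 is the standard program number for NFS, and 100005 is the standard number for MOUNT.

```shell
# Hypothetical helper: succeed if an RPC program number appears in the
# first column of "rpcinfo -p" output read from standard input.
rpc_program_registered() {
  local prog=$1
  awk -v p="$prog" '$1 == p { found = 1 } END { exit !found }'
}
```

For example: rpcinfo -p localhost | rpc_program_registered 100005 || echo "MOUNT is not registered"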
Q:
The rpcbind service is not running on the NFS client. This could be due to the following reasons:
  • The portmap service is not running.
  • Another instance of kernel NFS server or glusterNFS server is running.
A:
Start the rpcbind service by running the following command:
# service rpcbind start
Q:
The NFS server glusterfsd starts, but the initialization fails with the rpc-service: portmap registration of program failed error message in the log.
A:
NFS start-up succeeds, but the initialization of the NFS service can still fail, which prevents clients from accessing the mount points. You can confirm this situation from the following error messages in the log file:
[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could not register with portmap
[2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed
[2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
[2010-05-26 23:33:47] C [nfs.c:531:notify] nfs: Failed to initialize protocols
[2010-05-26 23:33:49] E [rpcsvc.c:2614:rpcsvc_program_unregister_portmap] rpc-service: Could not unregister with portmap
[2010-05-26 23:33:49] E [rpcsvc.c:2731:rpcsvc_program_unregister] rpc-service: portmap unregistration of program failed
[2010-05-26 23:33:49] E [rpcsvc.c:2744:rpcsvc_program_unregister] rpc-service: Program unregistration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
  1. Start the rpcbind service on the NFS server by running the following command:
    # service rpcbind start
    After starting rpcbind service, glusterFS NFS server needs to be restarted.
  2. Stop another NFS server running on the same machine.
    Such an error is also seen when there is another NFS server running on the same machine but it is not the glusterFS NFS server. On Linux systems, this could be the kernel NFS server. Resolution involves stopping the other NFS server or not running the glusterFS NFS server on the machine. Before stopping the kernel NFS server, ensure that no critical service depends on access to that NFS server's exports.
    On Linux, kernel NFS servers can be stopped by using either of the following commands depending on the distribution in use:
    # service nfs-kernel-server stop
    # service nfs stop
  3. Restart glusterFS NFS server.
Q:
The NFS server start-up fails with the message Port is already in use in the log file.
A:
This error can occur if a glusterFS NFS server is already running on the same machine. You can confirm this situation from the log file if the following error lines exist:
[2010-05-26 23:40:49] E [rpc-socket.c:126:rpcsvc_socket_listen] rpc-socket: binding socket failed:Address already in use
[2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use
[2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection
[2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed
[2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
[2010-05-26 23:40:49] C [nfs.c:531:notify] nfs: Failed to initialize protocols
In this release, the glusterFS NFS server does not support running multiple NFS servers on the same machine. To resolve the issue, shut down one of the glusterFS NFS servers.
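Both failures above are identified by specific log lines, so a script can search for those lines directly. This is a minimal sketch. The function diagnose_gnfs_log is hypothetical and matches only the two message strings quoted in the answers above. Pass it the path of the Gluster NFS log (nfs.log) on your system.

```shell
# Hypothetical helper: print one label for each known start-up failure
# whose message appears in the given Gluster NFS log file.
diagnose_gnfs_log() {
  local log=$1
  grep -q 'portmap registration of program failed' "$log" && echo portmap-registration-failed
  grep -q 'Port is already in use' "$log" && echo port-already-in-use
  return 0
}
```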
Q:
The mount command fails with NFS server failed error:
mount: mount to NFS server '10.1.10.11' failed: timed out (retrying).
A:
Review and apply the suggested solutions to correct the issue.
  • Disable name lookup requests from NFS server to a DNS server.
    The NFS server attempts to authenticate NFS clients by performing a reverse DNS lookup to match host names in the volume file with the client IP addresses. There can be a situation where the NFS server either is not able to connect to the DNS server or the DNS server is taking too long to respond to DNS request. These delays can result in delayed replies from the NFS server to the NFS client resulting in the timeout error.
    NFS server provides a work-around that disables DNS requests, instead relying only on the client IP addresses for authentication. The following option can be added for successful mounting in such situations:
    option nfs.addr.namelookup off

    Note

    Remember that disabling name lookup forces the NFS server to authenticate clients using only IP addresses. If the authentication rules in the volume file use host names, those rules will fail and client mounting will fail.
  • The NFS client uses an NFS version other than version 3 by default.
    glusterFS NFS server supports version 3 of NFS protocol by default. In recent Linux kernels, the default NFS version has been changed from 3 to 4. It is possible that the client machine is unable to connect to the glusterFS NFS server because it is using version 4 messages which are not understood by glusterFS NFS server. The timeout can be resolved by forcing the NFS client to use version 3. The vers option to mount command is used for this purpose:
    # mount nfsserver:export -o vers=3 /MOUNTPOINT
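After you remount with vers=3, you can confirm which NFS version the client actually negotiated. The following sketch assumes a Linux client whose /proc/mounts lists a vers= option for NFS mounts. The name nfs_mount_version is hypothetical.

```shell
# Hypothetical helper: print the vers= value from the options of the NFS
# mount at DIR, read from a mounts table (defaults to /proc/mounts).
nfs_mount_version() {
  local dir=$1 table=${2:-/proc/mounts}
  awk -v d="$dir" '$2 == d && $3 ~ /^nfs/ {
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
      if (opts[i] ~ /^vers=/) v = substr(opts[i], 6)   # text after "vers="
  } END { if (v != "") print v }' "$table"
}
```

For example, nfs_mount_version /MOUNTPOINT should print 3 after the vers=3 mount succeeds.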
Q:
The showmount command fails with clnt_create: RPC: Unable to receive error. This error is encountered due to the following reasons:
  • The firewall might have blocked the port.
  • rpcbind might not be running.
A:
Check the firewall settings. Open port 111 for portmap requests and replies, and open the ports that the glusterFS NFS server uses for its requests and replies. The glusterFS NFS server operates over the following port numbers: 38465, 38466, and 38467.
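You can make this firewall change with firewall-cmd, which is used elsewhere in this chapter. The following sketch prints the commands rather than running them. The zone name is an argument, and print_gnfs_firewall_cmds is hypothetical. Opening 38465-38467 for TCP only is an assumption, because the answer above does not name the protocol.

```shell
# Hypothetical helper: print (not run) a runtime and a permanent firewall-cmd
# command that open port 111 and the glusterFS NFS port range in a zone.
print_gnfs_firewall_cmds() {
  local zone=$1 perm
  for perm in "" " --permanent"; do
    echo "firewall-cmd --zone=${zone} --add-port=111/tcp --add-port=111/udp --add-port=38465-38467/tcp${perm}"
  done
}
```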
Q:
The application fails with Invalid argument or Value too large for defined data type
A:
These two errors generally happen for 32-bit NFS clients, or applications that do not support 64-bit inode numbers or large files.
Use the following volume option from the command-line interface to make glusterFS NFS return 32-bit inode numbers instead:
# gluster volume set VOLNAME nfs.enable-ino32 <on | off>
This option is off by default, which permits NFS to return 64-bit inode numbers by default.
Applications that will benefit from this option include those that are:
  • built and run on 32-bit machines, which do not support large files by default,
  • built to 32-bit standards on 64-bit systems.
Applications which can be rebuilt from source are recommended to be rebuilt using the following flag with gcc:
-D_FILE_OFFSET_BITS=64
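To check whether a file on the mount has an inode number that a 32-bit application cannot represent, compare the number reported by stat -c %i with 4294967295, the largest unsigned 32-bit value. The following bash sketch performs that comparison. The name inode_fits_32bit is hypothetical. The function compares lengths first because 64-bit inode numbers above 9223372036854775807 overflow the signed integer arithmetic in bash.

```shell
# Hypothetical helper: return 0 if a decimal inode number fits in an
# unsigned 32-bit integer, 1 if it does not, 2 if the input is not a number.
inode_fits_32bit() {
  local ino=$1
  [[ $ino =~ ^[0-9]+$ ]] || return 2
  (( ${#ino} < 10 )) && return 0      # at most 9 digits always fits
  (( ${#ino} > 10 )) && return 1      # 11 or more digits never fits
  (( 10#$ino <= 4294967295 ))         # 10 digits: safe to compare numerically
}
```

For example: inode_fits_32bit "$(stat -c %i /mnt/glusterfs/somefile)" || echo "inode number exceeds 32 bits" (somefile is a placeholder).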
Q:
After the machine that is running NFS server is restarted the client fails to reclaim the locks held earlier.
A:
The Network Status Monitor (NSM) service daemon (rpc.statd) is started before gluster NFS server. Hence, NSM sends a notification to the client to reclaim the locks. When the clients send the reclaim request, the NFS server does not respond as it is not started yet. Hence the client request fails.
Solution: To resolve the issue, prevent the NSM daemon from starting when the server starts.
Run chkconfig --list nfslock to check if NSM is configured during OS boot.
If any of the entries are on, run chkconfig nfslock off to disable NSM clients during boot, which resolves the issue.
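You can script the chkconfig check by searching for any runlevel that is marked on. The following sketch reads chkconfig --list nfslock output on standard input. The name nfslock_on_at_boot is hypothetical, and the function assumes the usual "0:off 1:off ..." layout of that output.

```shell
# Hypothetical helper: succeed if the chkconfig listing on standard input
# shows the service set to "on" in any runlevel 0-6.
nfslock_on_at_boot() {
  grep -Eq '(^|[[:space:]])[0-6]:on([[:space:]]|$)'
}
```

For example: chkconfig --list nfslock | nfslock_on_at_boot && chkconfig nfslock off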
Q:
The rpc actor failed to complete successfully error is displayed in the nfs.log, even after the volume is mounted successfully.
A:
gluster NFS supports only NFS version 3. When the mount command does not specify a version, nfs-utils first tries to negotiate version 4 and then falls back to version 3. This negotiation causes the following messages in both the server log and the nfs.log file:
[2013-06-25 00:03:38.160547] W [rpcsvc.c:180:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4)
[2013-06-25 00:03:38.160669] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
To resolve the issue, declare NFS version 3 and the noacl option in the mount command as follows:
# mount -t nfs -o vers=3,noacl server1:/test-volume /mnt/glusterfs
Q:
The mount command fails with No such file or directory.
A:
This error occurs when the volume specified in the mount command does not exist. Verify the volume name using the gluster volume info command.

6.2.3. NFS Ganesha

NFS-Ganesha is a user space file server for the NFS protocol with support for NFSv3 and NFSv4.
Red Hat Gluster Storage 3.4 is supported with the community’s V2.5 stable release of NFS-Ganesha on Red Hat Enterprise Linux 7. For the features of NFS-Ganesha that are supported, see Supported Features of NFS-Ganesha.

Note

To install NFS-Ganesha, refer to Deploying NFS-Ganesha on Red Hat Gluster Storage in the Red Hat Gluster Storage 3.4 Installation Guide.
Red Hat Gluster Storage does not support NFSv4 delegations. For more information, refer to the Support matrix.
6.2.3.1. Supported Features of NFS-Ganesha
The following list briefly describes the supported features of NFS-Ganesha:
Highly Available Active-Active NFS-Ganesha

In a highly available active-active environment, if an NFS-Ganesha server that is connected to an NFS client running a particular application goes down, the application or NFS client is seamlessly connected to another NFS-Ganesha server without any administrative intervention.

Data coherency across the multi-head NFS-Ganesha servers in the cluster is achieved using Gluster’s Upcall infrastructure. This is a generic and extensible framework that sends notifications to the respective glusterfs clients (in this case, the NFS-Ganesha servers) when changes are detected in the back-end file system.
Dynamic Export of Volumes

NFS-Ganesha supports adding and removing exports dynamically. Dynamic exports are managed through the DBus interface. DBus is a system-local IPC mechanism for system management and peer-to-peer application communication.

Exporting Multiple Entries

In NFS-Ganesha, multiple Red Hat Gluster Storage volumes or sub-directories can be exported simultaneously.

Pseudo File System

NFS-Ganesha creates and maintains an NFSv4 pseudo file system, which provides clients with seamless access to all exported objects on the server.

Access Control List

The NFS-Ganesha NFSv4 protocol includes integrated support for Access Control Lists (ACLs), which are similar to those used by Windows. These ACLs can be used to identify a trustee and to specify the access rights allowed or denied for that trustee. This feature is disabled by default.

Note

AUDIT and ALARM ACE types are not currently supported.
6.2.3.2. Setting up NFS Ganesha
To set up NFS-Ganesha, follow the steps in the following sections.

Note

You can also set up NFS-Ganesha using gdeploy, which automates the steps below. For more information, see "Deploying NFS-Ganesha".
6.2.3.2.1. Port and Firewall Information for NFS-Ganesha
Ensure that the following ports and firewall services are open.
The following table lists the port details for an NFS-Ganesha cluster setup:
Table 6.5. NFS Port Details
Service               Port Number      Protocol
sshd                  22               TCP
rpcbind/portmapper    111              TCP/UDP
NFS                   2049             TCP/UDP
mountd                20048            TCP/UDP
NLM                   32803            TCP/UDP
RQuota                875              TCP/UDP
statd                 662              TCP/UDP
pcsd                  2224             TCP
pacemaker_remote      3121             TCP
corosync              5404 and 5405    UDP
dlm                   21064            TCP
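You can turn Table 6.5 into a single firewall-cmd invocation. In the following bash sketch, the port/protocol list is transcribed from the table, and each TCP/UDP row is expanded into two entries. The function print_ganesha_port_cmd is hypothetical and only prints the command. Run the printed command once as printed and once with --permanent appended, as the firewall steps in this section do.

```shell
# Hypothetical helper: print one firewall-cmd command that opens every
# port/protocol pair listed in Table 6.5 in the given zone.
print_ganesha_port_cmd() {
  local zone=$1 p args=""
  local ports=(
    22/tcp 111/tcp 111/udp 2049/tcp 2049/udp 20048/tcp 20048/udp
    32803/tcp 32803/udp 875/tcp 875/udp 662/tcp 662/udp
    2224/tcp 3121/tcp 5404/udp 5405/udp 21064/tcp
  )
  for p in "${ports[@]}"; do
    args+=" --add-port=${p}"
  done
  echo "firewall-cmd --zone=${zone}${args}"
}
```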

Note

The port details for the Red Hat Gluster Storage services are listed under section 3. Verifying Port Access.
Defining Service Ports

Ensure the statd service is configured to use the ports mentioned above by executing the following commands on every node in the nfs-ganesha cluster:

  1. Edit /etc/sysconfig/nfs file as mentioned below:
    # sed -i '/STATD_PORT/s/^#//' /etc/sysconfig/nfs
  2. Restart the statd service:
    For Red Hat Enterprise Linux 7:
    # systemctl restart nfs-config
    # systemctl restart rpc-statd

Note

For the NFS client to use the locking functionality, the ports used by the LOCKD and STATD daemons must be configured and opened through firewalld on the client machine:
  1. Edit '/etc/sysconfig/nfs' using following commands:
    # sed -i '/STATD_PORT/s/^#//' /etc/sysconfig/nfs
    # sed -i '/LOCKD_TCPPORT/s/^#//' /etc/sysconfig/nfs
    # sed -i '/LOCKD_UDPPORT/s/^#//' /etc/sysconfig/nfs
  2. Restart the services:
    For Red Hat Enterprise Linux 7:
    # systemctl restart nfs-config
    # systemctl restart rpc-statd
    # systemctl restart nfslock
  3. Open the ports that were configured in the first step using the following commands:
    # firewall-cmd --zone=zone_name --add-port=662/tcp --add-port=662/udp \
    --add-port=32803/tcp --add-port=32769/udp
    
    # firewall-cmd --zone=zone_name --add-port=662/tcp --add-port=662/udp \
    --add-port=32803/tcp --add-port=32769/udp --permanent
  4. To ensure that NFS client UDP mounts do not fail, open port 2049 by executing the following commands:
    # firewall-cmd --zone=zone_name --add-port=2049/udp
    # firewall-cmd --zone=zone_name --add-port=2049/udp --permanent
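Step 3 opens fixed port numbers. To open the ports that are actually configured, you can read STATD_PORT, LOCKD_TCPPORT, and LOCKD_UDPPORT from /etc/sysconfig/nfs and print the matching --add-port arguments. The following is a minimal sketch. The name lock_port_args is hypothetical, and the function assumes plain KEY=value lines. Lines that are still commented out are ignored.

```shell
# Hypothetical helper: print firewall-cmd --add-port arguments for the
# uncommented statd and lockd port settings in an nfs sysconfig file.
lock_port_args() {
  local file=${1:-/etc/sysconfig/nfs}
  awk -F= '
    { gsub(/[" \t]/, "", $2) }   # drop quotes and blanks around the value
    $1 == "STATD_PORT"    { printf "--add-port=%s/tcp --add-port=%s/udp\n", $2, $2 }
    $1 == "LOCKD_TCPPORT" { printf "--add-port=%s/tcp\n", $2 }
    $1 == "LOCKD_UDPPORT" { printf "--add-port=%s/udp\n", $2 }
  ' "$file"
}
```

For example: firewall-cmd --zone=zone_name $(lock_port_args). Leave the substitution unquoted so that each argument is passed separately.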
  • Firewall Settings

    On Red Hat Enterprise Linux 7, enable the firewall services mentioned below.
    1. Get a list of active zones using the following command:
      # firewall-cmd --get-active-zones
    2. To allow the firewall services in the active zones, run the following commands:
      # firewall-cmd --zone=zone_name --add-service=nlm  --add-service=nfs  --add-service=rpc-bind  --add-service=high-availability --add-service=mountd --add-service=rquota
      
      # firewall-cmd --zone=zone_name  --add-service=nlm  --add-service=nfs  --add-service=rpc-bind  --add-service=high-availability --add-service=mountd --add-service=rquota --permanent
      
      # firewall-cmd --zone=zone_name --add-port=662/tcp --add-port=662/udp
      
      # firewall-cmd --zone=zone_name --add-port=662/tcp --add-port=662/udp --permanent
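You can combine steps 1 and 2 so that the service commands are generated for every active zone. The following sketch assumes that firewall-cmd --get-active-zones prints each zone name on an unindented line, followed by indented interfaces: or sources: lines. The names active_zone_names and print_ganesha_service_cmds are hypothetical. The second function prints commands rather than running them.

```shell
# Hypothetical helper: read "firewall-cmd --get-active-zones" output on
# standard input and print only the zone names (the unindented lines).
active_zone_names() {
  awk 'NF && $0 !~ /^[ \t]/ { print $1 }'
}

# Hypothetical helper: print the runtime and permanent add-service commands
# from step 2 for every zone name given as an argument.
print_ganesha_service_cmds() {
  local zone perm
  local svcs="--add-service=nlm --add-service=nfs --add-service=rpc-bind --add-service=high-availability --add-service=mountd --add-service=rquota"
  for zone in "$@"; do
    for perm in "" " --permanent"; do
      echo "firewall-cmd --zone=${zone} ${svcs}${perm}"
    done
  done
}
```

For example: print_ganesha_service_cmds $(firewall-cmd --get-active-zones | active_zone_names)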
6.2.3.2.2. Prerequisites to run NFS-Ganesha
Ensure that the following prerequisites are taken into consideration before you run NFS-Ganesha in your environment:
  • A Red Hat Gluster Storage volume must be available for export, and the NFS-Ganesha RPMs must be installed.
  • Ensure that the fencing agents are configured. For more information on configuring fencing agents, refer to the following documentation:
  • Only one of NFS-Ganesha, gluster-NFS or kernel-NFS servers can be enabled on a given machine/host as all NFS implementations use the port 2049 and only one can be active at a given time. Hence you must disable kernel-NFS before NFS-Ganesha is started.
    Disable the kernel-nfs using the following command:
    For Red Hat Enterprise Linux 7

    # systemctl stop nfs-server
    # systemctl disable nfs-server
    To verify if kernel-nfs is disabled, execute the following command:
    # systemctl status nfs-server
    The service should be in stopped state.

    Note

    Gluster NFS will be stopped automatically when NFS-Ganesha is enabled.
    Ensure that none of the volumes have the variable nfs.disable set to 'off'.
  • Configure the ports as described in Port/Firewall Information for NFS-Ganesha.
  • Edit the ganesha-ha.conf file based on your environment.
  • Reserve virtual IPs on the network for each of the servers configured in the ganesha-ha.conf file. Ensure that these IPs are different from the hosts' static IPs and are not used anywhere else in the trusted storage pool or in the subnet.
  • Ensure that all the nodes in the cluster are DNS resolvable. For example, you can populate the /etc/hosts with the details of all the nodes in the cluster.
  • Make sure that SELinux is in Enforcing mode.
  • Start network service on all machines using the following command:
    For Red Hat Enterprise Linux 7:
    # systemctl start network
  • Create and mount a gluster shared volume by executing the following command:
    # gluster volume set all cluster.enable-shared-storage enable
    volume set: success
    
  • Create a directory named nfs-ganesha under /var/run/gluster/shared_storage
  • Copy the ganesha.conf and ganesha-ha.conf files from /etc/ganesha to /var/run/gluster/shared_storage/nfs-ganesha.
  • Enable the glusterfssharedstorage.service service using the following command:
    # systemctl enable glusterfssharedstorage.service
  • Enable the nfs-ganesha service using the following command:
    # systemctl enable nfs-ganesha
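Before you move on to the cluster configuration, a quick check can confirm that the shared-storage directory and both configuration files are in place. The following sketch uses a hypothetical helper, check_ganesha_shared_dir, which defaults to the /var/run/gluster/shared_storage/nfs-ganesha path used above. It does not verify the other prerequisites.

```shell
# Hypothetical helper: confirm that the nfs-ganesha directory on shared
# storage exists and contains ganesha.conf and ganesha-ha.conf. Prints each
# missing item and returns non-zero if anything is missing.
check_ganesha_shared_dir() {
  local dir=${1:-/var/run/gluster/shared_storage/nfs-ganesha} f rc=0
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir"
    return 1
  fi
  for f in ganesha.conf ganesha-ha.conf; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing file: $dir/$f"
      rc=1
    fi
  done
  return "$rc"
}
```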
6.2.3.2.3. Configuring the Cluster Services
The HA cluster is maintained using Pacemaker and Corosync. Pacemaker acts as a resource manager, and Corosync provides the communication layer of the cluster. For more information about Pacemaker/Corosync, see the documentation under the Clustering section of the Red Hat Enterprise Linux 7 documentation: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/

Note

It is recommended to use 3 or more nodes to configure NFS Ganesha HA cluster, in order to maintain cluster quorum.
  1. Enable the pacemaker service using the following command:
    For Red Hat Enterprise Linux 7:
    # systemctl enable pacemaker.service
  2. Start the pcsd service using the following command.
    For Red Hat Enterprise Linux 7:
    # systemctl start pcsd

    Note

    • To start pcsd by default after the system is rebooted, execute the following command:
      For Red Hat Enterprise Linux 7:
      # systemctl enable pcsd
  3. Set a password for the user ‘hacluster’ on all the nodes using the following command. Use the same password for all the nodes:
    # echo <password> | passwd --stdin hacluster
  4. Perform cluster authentication between the nodes, where the username is ‘hacluster’ and the password is the one you set in the previous step. Run the following command on every node:
    # pcs cluster auth <hostname1> <hostname2> ...

    Note

    The hostname of all the nodes in the Ganesha-HA cluster must be included in the command when executing it on every node.
    For example, in a four node cluster; nfs1, nfs2, nfs3, and nfs4, execute the following command on every node:
    # pcs cluster auth nfs1 nfs2 nfs3 nfs4
    Username: hacluster
    Password:
    nfs1: Authorized
    nfs2: Authorized
    nfs3: Authorized
    nfs4: Authorized
  5. Key-based SSH authentication without password for the root user has to be enabled on all the HA nodes. Follow these steps:
    1. On one of the nodes (node1) in the cluster, run:
      # ssh-keygen -f /var/lib/glusterd/nfs/secret.pem -t rsa -N ''
    2. Deploy the generated public key from node1 to all the nodes (including node1) by executing the following command for every node:
      # ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@<node-ip/hostname>
    3. Copy the ssh keypair from node1 to all the nodes in the Ganesha-HA cluster by executing the following command for every node:
      # scp -i /var/lib/glusterd/nfs/secret.pem /var/lib/glusterd/nfs/secret.* root@<node-ip/hostname>:/var/lib/glusterd/nfs/
  6. As part of cluster setup, port 875 is used to bind to the Rquota service. If this port is already in use, assign a different port to this service by modifying the following line in the ‘/etc/ganesha/ganesha.conf’ file on all the nodes.
    # Use a non-privileged port for RQuota
    Rquota_Port = 875;
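Sub-steps 2 and 3 of step 5 are repeated for every node. The following sketch prints the ssh-copy-id and scp commands for a list of node names so that no node is skipped. The name print_key_distribution_cmds is hypothetical. Run the printed commands on node1, where the key pair was generated.

```shell
# Hypothetical helper: print (not run) the key deployment and key pair copy
# commands from step 5 for every node name given as an argument.
print_key_distribution_cmds() {
  local key=/var/lib/glusterd/nfs/secret.pem node
  for node in "$@"; do
    echo "ssh-copy-id -i ${key}.pub root@${node}"
    echo "scp -i ${key} /var/lib/glusterd/nfs/secret.* root@${node}:/var/lib/glusterd/nfs/"
  done
}
```

For example: print_key_distribution_cmds nfs1 nfs2 nfs3 nfs4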
6.2.3.2.4. Creating the ganesha-ha.conf file
The ganesha-ha.conf.sample file is created in /etc/ganesha when Red Hat Gluster Storage is installed. Rename the file to ganesha-ha.conf and make the changes based on your environment.
  1. Create a directory named nfs-ganesha under /var/run/gluster/shared_storage
  2. Copy the ganesha.conf and ganesha-ha.conf files from /etc/ganesha to /var/run/gluster/shared_storage/nfs-ganesha.
Sample ganesha-ha.conf file:
# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="ganesha-ha-360"
#
#
# You may use short names or long names; you may not use IP addresses.
# Once you select one, stay with it as it will be mildly unpleasant to clean
# up if you switch later on. Ensure that all names - short and/or long - are in
# DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that form the ganesha HA
# cluster. Hostname is specified.
HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
#
# Virtual IPs for each of the nodes specified above.
VIP_server1="10.0.2.1"
VIP_server2="10.0.2.2"
#VIP_server1_lab_redhat_com="10.0.2.1"
#VIP_server2_lab_redhat_com="10.0.2.2"
....
....

Note

  • Pacemaker handles the creation of the VIP and assigning an interface.
  • Ensure that the VIP is in the same network range.
  • Ensure that the HA_CLUSTER_NODES are specified as hostnames. Using IP addresses will cause clustering to fail.
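A small check can catch a node in HA_CLUSTER_NODES that has no virtual IP line, or a node that is listed by IP address. The following bash sketch is hypothetical (check_ganesha_ha_conf). It assumes that the variable name is VIP_ followed by the node name exactly as listed, with dots replaced by underscores, as the commented-out lines in the sample suggest. Other characters, such as hyphens, are not handled.

```shell
# Hypothetical helper: report nodes in HA_CLUSTER_NODES that lack a VIP_
# line or are given as IP addresses. Returns non-zero on any problem.
check_ganesha_ha_conf() {
  local conf=$1 nodes node var rc=0
  local -a list
  nodes=$(sed -n 's/^HA_CLUSTER_NODES="\(.*\)"$/\1/p' "$conf")
  if [ -z "$nodes" ]; then
    echo "HA_CLUSTER_NODES not set"
    return 1
  fi
  IFS=',' read -r -a list <<< "$nodes"
  for node in "${list[@]}"; do
    node=${node// /}                          # tolerate spaces after commas
    if [[ $node =~ ^[0-9.]+$ ]]; then
      echo "IP address used instead of hostname: $node"
      rc=1
    fi
    var="VIP_${node//./_}"                    # dots become underscores
    if ! grep -q "^${var}=" "$conf"; then
      echo "no ${var} entry for node $node"
      rc=1
    fi
  done
  return "$rc"
}
```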
6.2.3.2.5. Configuring NFS-Ganesha using Gluster CLI
Setting up the HA cluster

To set up the HA cluster, perform the following step:

  1. Enable NFS-Ganesha by executing the following command:
    # gluster nfs-ganesha enable

    Note

    Before enabling or disabling NFS-Ganesha, ensure that all the nodes that are part of the NFS-Ganesha cluster are up.
    For example,
    # gluster nfs-ganesha enable
    Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue?
     (y/n) y
    This will take a few minutes to complete. Please wait ..
    nfs-ganesha : success

    Note

    After enabling NFS-Ganesha, if rpcinfo -p shows a statd port other than 662, restart the statd service:
    For Red Hat Enterprise Linux 7:
    # systemctl restart rpc-statd
    Tearing down the HA cluster

    To tear down the HA cluster, execute the following command:

    # gluster nfs-ganesha disable
    For example,
    # gluster nfs-ganesha disable
    Disabling NFS-Ganesha will tear down entire ganesha cluster across the trusted pool. Do you still want to continue?
    (y/n) y
    This will take a few minutes to complete. Please wait ..
    nfs-ganesha : success
    Verifying the status of the HA cluster

    To verify the status of the HA cluster, execute the following script:

    # /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
    For example:
    # /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
     Online: [ server1 server2 server3 server4 ]
    server1-cluster_ip-1 server1
    server2-cluster_ip-1 server2
    server3-cluster_ip-1 server3
    server4-cluster_ip-1 server4
    Cluster HA Status: HEALTHY
    

    Note

    • It is recommended to manually restart the ganesha.nfsd service after the node is rebooted, to fail back the VIPs.
    • Disabling NFS Ganesha does not enable Gluster NFS by default. If required, Gluster NFS must be enabled manually.
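You can script two of the checks above: the statd port after NFS-Ganesha is enabled, and the HA status report. In the following sketch, statd_ports reads rpcinfo -p output and prints the ports registered for RPC program 100024, the NSM status program served by rpc.statd. The function ha_status_healthy reads ganesha-ha.sh --status output and succeeds only on the HEALTHY line shown in the example. Both names are hypothetical.

```shell
# Hypothetical helper: print the distinct ports that "rpcinfo -p" output
# on standard input lists for program 100024 (status / rpc.statd).
statd_ports() {
  awk '$1 == 100024 { print $4 }' | sort -un
}

# Hypothetical helper: succeed only if the ganesha-ha.sh --status output on
# standard input contains the line "Cluster HA Status: HEALTHY".
ha_status_healthy() {
  grep -q '^[[:space:]]*Cluster HA Status: HEALTHY[[:space:]]*$'
}
```

For example, rpcinfo -p | statd_ports should print 662. To check the cluster: /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha | ha_status_healthy || echo "cluster is not healthy"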
6.2.3.2.6. Exporting and Unexporting Volumes through NFS-Ganesha
Exporting Volumes through NFS-Ganesha

To export a Red Hat Gluster Storage volume, execute the following command:

# gluster volume set <volname> ganesha.enable on
For example:
# gluster vol set testvol ganesha.enable on
volume set: success
Unexporting Volumes through NFS-Ganesha

To unexport a Red Hat Gluster Storage volume, execute the following command:

# gluster volume set <volname> ganesha.enable off
This command unexports the Red Hat Gluster Storage volume without affecting other exports.
For example:
# gluster vol set testvol ganesha.enable off
volume set: success
6.2.3.2.7. Verifying the NFS-Ganesha Status
To verify the status of the volume set options, follow the guidelines mentioned below:
  • Check if NFS-Ganesha is started by executing the following commands:
    On Red Hat Enterprise Linux 7:
    # systemctl status nfs-ganesha
    For example:
    # systemctl  status nfs-ganesha
       nfs-ganesha.service - NFS-Ganesha file server
       Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled)
       Active: active (running) since Tue 2015-07-21 05:08:22 IST; 19h ago
       Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
       Main PID: 15440 (ganesha.nfsd)
       CGroup: /system.slice/nfs-ganesha.service
                   └─15440 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
       Jul 21 05:08:22 server1 systemd[1]: Started NFS-Ganesha file server.
    
    
  • Check if the volume is exported.
    # showmount -e localhost
    For example:
    # showmount -e localhost
    Export list for localhost:
    /volname (everyone)
  • The logs of ganesha.nfsd daemon are written to /var/log/ganesha/ganesha.log. Check the log file on noticing any unexpected behavior.
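You can automate the showmount check by searching for the export path after the "Export list" header line. The helper export_listed is hypothetical. It matches the path exactly, so pass the path with its leading slash, for example /volname.

```shell
# Hypothetical helper: read "showmount -e" output on standard input and
# succeed if the given export path appears (the header line is skipped).
export_listed() {
  local path=$1
  awk -v p="$path" 'NR > 1 && $1 == p { found = 1 } END { exit !found }'
}
```

For example: showmount -e localhost | export_listed /testvol || echo "testvol is not exported"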
6.2.3.3. Accessing NFS-Ganesha Exports
NFS-Ganesha exports can be accessed by mounting them in either NFSv3 or NFSv4 mode. Since this is an active-active HA configuration, the mount operation can be performed from the VIP of any node.
For better large-file performance with workloads generated on Red Hat Enterprise Linux 7 clients, it is recommended to set the following tunables before mounting the volume:
  1. Execute the following commands to set the tunable:
    # sysctl -w sunrpc.tcp_slot_table_entries=128
    # echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
    # echo 128 > /proc/sys/sunrpc/tcp_max_slot_table_entries
  2. To make the tunable persistent on reboot, execute the following commands:
    # echo "options sunrpc tcp_slot_table_entries=128" >> /etc/modprobe.d/sunrpc.conf
    # echo "options sunrpc tcp_max_slot_table_entries=128" >>  /etc/modprobe.d/sunrpc.conf
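You can read the values back to confirm that they took effect. The helper check_sunrpc_slots is hypothetical. Its directory argument defaults to /proc/sys/sunrpc and exists only so that the function can be tested against a copy. These files are present only when the sunrpc module is loaded.

```shell
# Hypothetical helper: compare both sunrpc slot-table tunables with an
# expected value (default 128). Prints each mismatch; non-zero on failure.
check_sunrpc_slots() {
  local want=${1:-128} dir=${2:-/proc/sys/sunrpc} name val rc=0
  for name in tcp_slot_table_entries tcp_max_slot_table_entries; do
    if ! val=$(cat "$dir/$name" 2>/dev/null); then
      echo "$name: not readable"
      rc=1
    elif [ "$val" != "$want" ]; then
      echo "$name: $val (expected $want)"
      rc=1
    fi
  done
  return "$rc"
}
```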

Note

Ensure that NFS clients and NFS-Ganesha servers in the cluster are DNS resolvable with unique host-names to use file locking through Network Lock Manager (NLM) protocol.
6.2.3.3.1. Mounting exports in NFSv3 Mode
To mount an export in NFSv3 mode, execute the following command:
# mount -t nfs -o vers=3 virtual_ip:/volname /mountpoint
For example:
# mount -t nfs -o vers=3 10.70.0.0:/testvol /mnt
6.2.3.3.2. Mounting exports in NFSv4 Mode
To mount an export in NFSv4 mode, execute the following command:
# mount -t nfs -o vers=4.0 virtual_ip:/volname /mountpoint
For example:
# mount -t nfs -o vers=4.0 10.70.0.0:/testvol /mnt
6.2.3.3.3. Finding clients of an NFS server using dbus
To display the IP addresses of clients that have mounted the NFS exports, execute the following command:
# dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ClientMgr org.ganesha.nfsd.clientmgr.ShowClients

Note

If the NFS export is unmounted or if a client is disconnected from the server, it may take a few minutes for this to be updated in the command output.
6.2.3.4. Modifying the NFS-Ganesha HA Setup
To modify the existing HA cluster and to change the default values of the exports use the ganesha-ha.sh script located at /usr/libexec/ganesha/.
6.2.3.4.1. Adding a Node to the Cluster
Before adding a node to the cluster, ensure that the firewall services are enabled as mentioned in Port Information for NFS-Ganesha and also the prerequisites mentioned in section Pre-requisites to run NFS-Ganesha are met.

Note

Since shared storage and /var/lib/glusterd/nfs/secret.pem SSH key are already generated, those steps should not be repeated.
To add a node to the cluster, execute the following command on any of the nodes in the existing NFS-Ganesha cluster:
# /usr/libexec/ganesha/ganesha-ha.sh --add <HA_CONF_DIR> <HOSTNAME> <NODE-VIP>
where,
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is /run/gluster/shared_storage/nfs-ganesha.
HOSTNAME: Hostname of the new node to be added
NODE-VIP: Virtual IP of the new node to be added.
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --add /var/run/gluster/shared_storage/nfs-ganesha server16 10.00.00.01
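A pre-flight wrapper can catch common argument mistakes before the add operation runs. The following sketch is hypothetical (print_add_node_cmd). It checks that the directory contains ganesha-ha.conf, that the new node is named by hostname rather than IP address (per the note on HA_CLUSTER_NODES), and that the VIP looks like a dotted-quad IPv4 address. The octet range is not validated. The function then prints the ganesha-ha.sh command instead of running it.

```shell
# Hypothetical helper: validate arguments, then print (not run) the
# ganesha-ha.sh --add command. Errors go to stderr with a non-zero status.
print_add_node_cmd() {
  local dir=$1 host=$2 vip=$3
  local ipv4_re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  if [ ! -f "$dir/ganesha-ha.conf" ]; then
    echo "no ganesha-ha.conf in $dir" >&2; return 1
  fi
  if [[ $host =~ ^[0-9.]+$ ]]; then
    echo "use a hostname, not an IP address: $host" >&2; return 1
  fi
  if [[ ! $vip =~ $ipv4_re ]]; then
    echo "not a dotted-quad IPv4 address: $vip" >&2; return 1
  fi
  echo "/usr/libexec/ganesha/ganesha-ha.sh --add $dir $host $vip"
}
```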
6.2.3.4.2. Deleting a Node in the Cluster
To delete a node from the cluster, execute the following command on any of the nodes in the existing NFS-Ganesha cluster:
# /usr/libexec/ganesha/ganesha-ha.sh --delete <HA_CONF_DIR> <HOSTNAME>
where,
HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located at /run/gluster/shared_storage/nfs-ganesha.
HOSTNAME: Hostname of the node to be deleted.
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --delete /var/run/gluster/shared_storage/nfs-ganesha  server16
6.2.3.5. Modifying the Default Export Configurations
It is recommended to use gluster CLI options to export or unexport volumes through NFS-Ganesha. However, this section provides some information on changing configurable parameters in NFS-Ganesha. Such parameter changes require NFS-Ganesha to be started manually.
For the various supported export options, see the ganesha-export-config(8) man page.
To modify the default export configurations perform the following steps on any of the nodes in the existing ganesha cluster:
  1. Edit or add the required fields in the corresponding export file located at /run/gluster/shared_storage/nfs-ganesha/exports/.
  2. Execute the following command:
    # /usr/libexec/ganesha/ganesha-ha.sh --refresh-config <HA_CONF_DIR> <volname>
where:
  • HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located at /run/gluster/shared_storage/nfs-ganesha.
  • volname: The name of the volume whose export configuration has to be changed.
Sample export configuration file:
The following is the default set of parameters required to export any entry. The values shown are the defaults used by the CLI options to start or stop NFS-Ganesha.
# cat export.conf

EXPORT{
    Export_Id = 1 ;   # Export ID unique to each export
    Path = "volume_path";  # Path of the volume to be exported. Eg: "/test_volume"

    FSAL {
        name = GLUSTER;
        hostname = "10.xx.xx.xx";  # IP of one of the nodes in the trusted pool
        volume = "volume_name";     # Volume name. Eg: "test_volume"
    }

    Access_type = RW;     # Access permissions
    Squash = No_root_squash; # To enable/disable root squashing
    Disable_ACL = TRUE;     # To enable/disable ACL
    Pseudo = "pseudo_path";     # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
    Protocols = "3", "4" ;     # NFS protocols supported
    Transports = "UDP", "TCP" ; # Transport protocols supported
    SecType = "sys";     # Security flavors supported
}
The following sections describe various configurations possible through NFS-Ganesha. Minor changes to the export.conf file are needed to achieve the expected behavior.
  • Providing Permissions for Specific Clients
  • Enabling and Disabling NFSv4 ACLs
  • Providing Pseudo Path for NFSv4 Mount
  • Exporting Subdirectories
6.2.3.5.1. Providing Permissions for Specific Clients
The parameter values and permission values given in the EXPORT block apply to any client that mounts the exported volume. To provide specific permissions to specific clients, introduce a client block inside the EXPORT block.
For example, to assign specific permissions for client 10.00.00.01, add the following block in the EXPORT block.
client {
        clients = 10.00.00.01;  # IP of the client.
        access_type = "RO"; # Read-only permissions
        Protocols = "3"; # Allow only NFSv3 protocol.
        anonymous_uid = 1440;
        anonymous_gid = 72;
  }
All the other clients inherit the permissions that are declared outside the client block.
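When the same restriction must be applied to several hosts, generating the client block can keep the syntax consistent. The following Bash function is a hypothetical sketch, not part of NFS-Ganesha or the gluster CLI. The access_type values it accepts (None, RW, RO, MDONLY, MDONLY_RO) are an assumption to check against the ganesha-export-config(8) man page, as is passing several clients as a comma-separated list.

```bash
# Hypothetical generator for a client sub-block; paste its output inside
# the EXPORT { ... } block of the export file.
make_client_block() {
    local clients=$1 access=$2
    # Reject access types outside the assumed set.
    case $access in
        RO|RW|None|MDONLY|MDONLY_RO) ;;
        *) echo "error: unknown access_type '$access'" >&2; return 1 ;;
    esac
    printf 'client {\n'
    printf '        clients = %s;\n' "$clients"
    printf '        access_type = "%s";\n' "$access"
    printf '}\n'
}
```

For example, make_client_block 10.00.00.01 RO prints a read-only client block for that address.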
6.2.3.5.2. Enabling and Disabling NFSv4 ACLs
To enable NFSv4 ACLs, edit the following parameter:
Disable_ACL = FALSE;

Note

NFS clients should remount their share after enabling/disabling ACLs on the NFS-Ganesha server.
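Changing a parameter such as Disable_ACL by hand across many export files is error-prone. The following Bash function is a hypothetical sketch that uses GNU sed to rewrite a Key = value; line in place and keep a .bak copy. After editing, run ganesha-ha.sh --refresh-config as described in Modifying the Default Export Configurations.

```bash
# Hypothetical helper: rewrite "Key = value;" in an export file in place,
# keeping a .bak copy. The key is matched case-insensitively at the start of
# a line; VALUE must not contain "/" because it is used in a sed expression.
set_export_param() {
    local file=$1 key=$2 value=$3
    if ! grep -qiE "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
        echo "error: $key not found in $file" >&2
        return 1
    fi
    # GNU sed: -i.bak edits in place with a backup; the I flag ignores case.
    # Only the text between "=" and the first ";" is replaced, so a trailing
    # comment on the same line is preserved.
    sed -i.bak -E "s/^([[:space:]]*${key}[[:space:]]*=)[^;#]*;/\1 ${value};/I" "$file"
}
```

For example, set_export_param /run/gluster/shared_storage/nfs-ganesha/exports/export.<volname>.conf Disable_ACL FALSE enables NFSv4 ACLs for that export, where <volname> is a placeholder for your volume name.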
6.2.3.5.3. Providing Pseudo Path for NFSv4 Mount
To set the NFSv4 pseudo path, edit the following parameter:
Pseudo = "pseudo_path"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
Use this path when mounting the export in NFSv4 mode.
6.2.3.5.4. Exporting Subdirectories
Execute the following commands to export subdirectories:
  1. Stop the volume by executing the following command:
    # gluster volume stop <volname>
  2. To export subdirectories within a volume, edit the following parameters in the export.conf file.
    Path = "path_to_subdirectory";  # Path of the volume to be exported. Eg: "/test_volume/test_subdir"
    
     FSAL {
      name = GLUSTER;
      hostname = "10.xx.xx.xx";  # IP of one of the nodes in the trusted pool
      volume = "volume_name";  # Volume name. Eg: "test_volume"
      volpath = "path_to_subdirectory_with_respect_to_volume"; #Subdirectory path from the root of the volume. Eg: "/test_subdir"
     }
  3. Change Export_Id to an unused value. It should preferably be a large value so that it is not re-used for other volumes.
  4. Restart the volume to export the subdirectory.
    # gluster volume start <volname>

Note

If there are multiple sub-directories to be exported, create EXPORT blocks for each such sub-directory and then restart the nfs-ganesha service.
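Step 3 calls for an unused Export_Id. Assuming each export file in /run/gluster/shared_storage/nfs-ganesha/exports/ is named *.conf and declares its IDs as Export_Id = N, the following hypothetical Bash helper prints one more than the largest Export_Id it finds. Add a larger offset if you prefer to keep subdirectory IDs well apart from volume IDs.

```bash
# Hypothetical helper: print max(Export_Id) + 1 over all *.conf files in DIR,
# or 1 if no Export_Id is found. Multiple EXPORT blocks per file are counted.
next_export_id() {
    local dir=$1
    # Pull "Export_Id = N" from every file, keep only N, then take max + 1.
    grep -hioE 'Export_Id[[:space:]]*=[[:space:]]*[0-9]+' "$dir"/*.conf 2>/dev/null \
        | grep -oE '[0-9]+$' \
        | awk 'BEGIN { m = 0 } $1 + 0 > m { m = $1 + 0 } END { print m + 1 }'
}
```

For example, next_export_id /run/gluster/shared_storage/nfs-ganesha/exports prints a candidate value for the new EXPORT block.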
6.2.3.5.5. Configuring Upcall Poll Interval
By default, the interval between upcall polls is 10 microseconds. With large numbers of threads, this results in high CPU consumption.
To reduce CPU consumption, configure the interval between two upcall polls with the up_poll_usec option, choosing a value that suits the workload. The value can be increased from the default of 10 microseconds up to 60000000 microseconds (60 seconds).
You need to add this option to the FSAL Gluster block in the export.<volume name>.conf file.
For example, to set the upcall poll interval to 1 millisecond (1000 microseconds), add the up_poll_usec in the FSAL Gluster block in export.<volume name>.conf file.
FSAL {
    name = GLUSTER;
    hostname = "10.xx.xx.xx";  # IP of one of the nodes in the trusted pool
    volume = "volume_name";    # Volume name. Eg: "test_volume"
    up_poll_usec = 1000;       # Upcall poll interval in microseconds
}
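Because up_poll_usec is given in microseconds, it is easy to enter a value in the wrong unit. This hypothetical Bash check rejects anything outside the range described above (10 to 60000000 microseconds) before the value is written to the FSAL block. The 1 millisecond in the example corresponds to 1000 microseconds.

```bash
# Hypothetical check: succeed only for a whole number of microseconds
# between 10 and 60000000 (60 s), the range described for up_poll_usec.
valid_up_poll_usec() {
    local v=$1
    [[ $v =~ ^[0-9]+$ ]] || return 1
    # 10# forces base 10 so values with leading zeros are not read as octal.
    (( 10#$v >= 10 && 10#$v <= 60000000 ))
}
```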
6.2.3.5.6. Enabling all_squash option
To enable all_squash, edit the following parameter:
Squash = all_squash; # Map all users to the anonymous UID/GID
6.2.3.6. Configuring Kerberized NFS-Ganesha
Execute the following steps on all the machines:
  1. Install the krb5-workstation and the ntpdate packages on all the machines:
    # yum install krb5-workstation
    # yum install ntpdate

    Note

    • The krb5-libs package will be updated as a dependent package.
  2. Configure ntpdate to use a valid time server for your environment:
    # echo <valid_time_server> >> /etc/ntp/step-tickers
    
    # systemctl enable ntpdate
    
    # systemctl start ntpdate
  3. Ensure that all systems can resolve each other by FQDN in DNS.
  4. Configure the /etc/krb5.conf file with the settings for your environment. For example:
    [logging]
      default = FILE:/var/log/krb5libs.log
      kdc = FILE:/var/log/krb5kdc.log
      admin_server = FILE:/var/log/kadmind.log
    
    [libdefaults]
      dns_lookup_realm = false
      ticket_lifetime = 24h
      renew_lifetime = 7d
      forwardable = true
      rdns = false
      default_realm = EXAMPLE.COM
      default_ccache_name = KEYRING:persistent:%{uid}
    
    [realms]
      EXAMPLE.COM = {
        kdc = kerberos.example.com
        admin_server = kerberos.example.com
      }
    
    [domain_realm]
      .example.com = EXAMPLE.COM
      example.com = EXAMPLE.COM

    Note

    For further details regarding the file configuration, refer to man krb5.conf.
  5. On the NFS-server and client, update the /etc/idmapd.conf file by making the required change. For example:
    Domain = example.com
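The /etc/idmapd.conf Domain setting from step 5 must match on the NFS server and the client. As a minimal sketch, the following hypothetical Bash helper prints the effective Domain value from an idmapd.conf file, skipping commented-out lines, so that you can run it on each machine and compare the results.

```bash
# Hypothetical helper: print the first uncommented Domain value from an
# idmapd.conf file, with surrounding whitespace trimmed.
idmapd_domain() {
    awk -F= '
        /^[[:space:]]*[#;]/ { next }                    # skip comment lines
        $1 ~ /^[[:space:]]*Domain[[:space:]]*$/ {
            v = $2
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)  # trim spaces
            print v
            exit
        }' "$1"
}
```

For example, idmapd_domain /etc/idmapd.conf should print example.com on both the server and the client.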
6.2.3.6.1. Setting up the NFS-Ganesha Server
Execute the following steps to set up the NFS-Ganesha server:

Note

Before setting up the NFS-Ganesha server, make sure to set up the KDC based on the requirements.
  1. Install the following packages:
    # yum install nfs-utils
    # yum install rpcbind
  2. Install the relevant gluster and NFS-Ganesha RPMs. For more information, see the Red Hat Gluster Storage 3.4 Installation Guide.
  3. Create a Kerberos principal and add it to krb5.keytab on the NFS-Ganesha server:
    # kadmin
    kadmin: addprinc -randkey nfs/<host_name>@EXAMPLE.COM
    kadmin: ktadd nfs/<host_name>@EXAMPLE.COM
    For example:
    # kadmin
    Authenticating as principal root/admin@EXAMPLE.COM with password.
    Password for root/admin@EXAMPLE.COM:
    
    kadmin:  addprinc -randkey nfs/<host_name>@EXAMPLE.COM
    WARNING: no policy specified for nfs/<host_name>@EXAMPLE.COM; defaulting to no policy
    Principal "nfs/<host_name>@EXAMPLE.COM" created.
    
    
    kadmin:  ktadd nfs/<host_name>@EXAMPLE.COM
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type des3-cbc-sha1 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type arcfour-hmac added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia256-cts-cmac added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia128-cts-cmac added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-hmac-sha1 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-cbc-md5 added to keytab FILE:/etc/krb5.keytab.
  4. Update the /etc/ganesha/ganesha.conf file as follows:
    NFS_KRB5
    {
            PrincipalName = nfs ;
            KeytabPath = /etc/krb5.keytab ;
            Active_krb5 = true ;
    }
  5. Configure the SecType parameter in the volume export file (in /var/run/gluster/shared_storage/nfs-ganesha/exports) with the appropriate Kerberos security flavor supported by NFS-Ganesha: krb5, krb5i, or krb5p.
  6. Create an unprivileged user and ensure that the users that are created are resolvable to the UIDs through the central user database. For example:
    # useradd guest

    Note

    The username of this user has to be the same as the one on the NFS-client.
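Two small checks can catch common mistakes in steps 3 and 5. The following Bash sketch defines two hypothetical helpers. keytab_has_nfs_principal reads the output of klist -k (from the krb5-workstation package) and confirms that the nfs/<host_name>@<REALM> principal was added to the keytab. make_sectype_line builds a SecType line from the Kerberos flavors listed in step 5, reusing the quoted-list syntax of the Protocols line in the sample export file.

```bash
# Hypothetical helper: read `klist -k` output on stdin and succeed only if
# the nfs/<host_name>@<REALM> principal appears in the principal column.
keytab_has_nfs_principal() {
    local want="nfs/$1@$2"
    awk -v want="$want" '$2 == want { found = 1 } END { exit !found }'
}

# Hypothetical helper: build a SecType line for the export file from one or
# more flavors, rejecting anything other than sys, krb5, krb5i or krb5p.
make_sectype_line() {
    local f out=""
    (( $# > 0 )) || return 1
    for f in "$@"; do
        case $f in
            sys|krb5|krb5i|krb5p) ;;
            *) echo "error: unsupported flavor '$f'" >&2; return 1 ;;
        esac
        out+="${out:+, }\"$f\""   # join as "a", "b", "c"
    done
    printf 'SecType = %s;\n' "$out"
}
```

For example, klist -k /etc/krb5.keytab | keytab_has_nfs_principal <host_name> EXAMPLE.COM checks the keytab, and make_sectype_line krb5 krb5i krb5p prints SecType = "krb5", "krb5i", "krb5p";.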
6.2.3.6.2. Setting up the NFS Client
Execute the following steps to set up the NFS client:

Note

For detailed information on setting up NFS clients for security on Red Hat Enterprise Linux, see Section 8.8.2 NFS Security in the Red Hat Enterprise Linux 7 Storage Administration Guide.
  1. Install the following packages:
    # yum install nfs-utils
    # yum install rpcbind
  2. Create a Kerberos principal and add it to krb5.keytab on the client side:
    # kadmin
    kadmin: addprinc -randkey host/<host_name>@EXAMPLE.COM
    kadmin: ktadd host/<host_name>@EXAMPLE.COM
    For example:
    # kadmin
    Authenticating as principal root/admin@EXAMPLE.COM with password.
    Password for root/admin@EXAMPLE.COM:
    
    kadmin:  addprinc -randkey host/<host_name>@EXAMPLE.COM
    WARNING: no policy specified for host/<host_name>@EXAMPLE.COM; defaulting to no policy
    Principal "host/<host_name>@EXAMPLE.COM" created.
    
    kadmin:  ktadd host/<host_name>@EXAMPLE.COM
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type des3-cbc-sha1 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type arcfour-hmac added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia256-cts-cmac added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia128-cts-cmac added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-hmac-sha1 added to keytab FILE:/etc/krb5.keytab.
    Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-cbc-md5 added to keytab FILE:/etc/krb5.keytab.
  3. Check the status of the nfs-client.target service and, if it is not already running, start and enable it:
    # systemctl status nfs-client.target
    # systemctl start nfs-client.target
    # systemctl enable nfs-client.target
  4. Create an unprivileged user and ensure that the users that are created are resolvable to the UIDs through the central user database. For example:
    # useradd guest

    Note

    The username of this user has to be the same as the one on the NFS-server.
  5. Mount the volume, specifying the Kerberos security type:
    # mount -t nfs -o sec=krb5 <host_name>:/testvolume /mnt
    As root, all access should be granted.
    For example, creating a directory on the mount point, and all other operations as root, should succeed:
    # mkdir <directory name>
  6. Log in as the guest user from the mount point:
    # cd /mnt
    # su guest
    Without a Kerberos ticket, all access to /mnt should be denied. For example:
    $ ls
    ls: cannot open directory .: Permission denied
  7. Get a Kerberos ticket for the guest user and access /mnt:
    $ kinit
    Password for guest@EXAMPLE.COM:
    
    $ ls
    <directory created>

    Important

    With this ticket, some access to /mnt must be allowed. Directories on the NFS server that "guest" does not have permission to access should still deny access, as expected.
6.2.3.7. NFS-Ganesha Service Downtime
In a highly available active-active environment, if an NFS-Ganesha server that is connected to an NFS client running a particular application goes down, the application or NFS client is seamlessly connected to another NFS-Ganesha server without any administrative intervention. However, there is a delay, or fail-over time, before the connection to the other NFS-Ganesha server is established. This delay can also occur during fail-back, that is, when the connection is reset to the original node or server.
The following list describes how the time taken for the NFS server to detect a server reboot or resume is calculated.
  • If the ganesha.nfsd process dies (crash, OOM kill, or administrator kill), the maximum time to detect it and put the ganesha cluster into grace is 20 seconds, plus whatever time Pacemaker needs to effect the fail-over.

    Note

    The time taken to detect that the service is down can be changed by running the following commands on all the nodes:
    # pcs resource op remove nfs-mon monitor
    # pcs resource op add nfs-mon monitor interval=<interval_period_value>
  • If the whole node dies (including network failure), the downtime is the total of the time Pacemaker needs to detect that the node is gone, the time to put the cluster into grace, and the time to effect the fail-over. This is approximately 20 seconds.
  • So the maximum fail-over time is approximately 20-22 seconds, and the average time is typically less. In other words, NFS clients can take 20-22 seconds to detect a server reboot or to resume I/O.
6.2.3.7.1. Modifying the Fail-over Time
After failover, there is a short period of time during which clients try to reclaim their lost OPEN/LOCK state. Servers block certain file operations during this period, as per the NFS specification. The file operations blocked are as follows:
Table 6.6.
Protocol   FOPs
NFSv3      SETATTR
NLM        LOCK, UNLOCK, SHARE, UNSHARE, CANCEL, LOCKT
NFSv4      LOCK, LOCKT, OPEN, REMOVE, RENAME, SETATTR

Note

LOCK, SHARE, and UNSHARE are blocked only if they are requested with reclaim set to FALSE.
OPEN will be blocked if requested with claim type other than CLAIM_PREVIOUS or CLAIM_DELEGATE_PREV.
The default value for the grace period is 90 seconds. This value can be changed by adding the following lines in the /etc/ganesha/ganesha.conf file.
NFSv4 {
Grace_Period=<grace_period_value_in_sec>;
}
After editing the /etc/ganesha/ganesha.conf file, restart the NFS-Ganesha service by running the following command on all the nodes:
On Red Hat Enterprise Linux 7

# systemctl restart nfs-ganesha
6.2.3.8. Manually Configuring NFS-Ganesha Exports
It is recommended to use gluster CLI options to export or unexport volumes through NFS-Ganesha. However, this section provides some information on changing configurable parameters in NFS-Ganesha. Such parameter changes require NFS-Ganesha to be started manually.
To modify the default export configurations perform the following steps on any of the nodes in the existing ganesha cluster:
  1. Edit/add the required fields in the corresponding export configuration file in the /run/gluster/shared_storage/nfs-ganesha/exports directory.
  2. Execute the following command:
    # /usr/libexec/ganesha/ganesha-ha.sh --refresh-config <HA_CONF_DIR> <volname>
where:
  • HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located at /run/gluster/shared_storage/nfs-ganesha.
  • volname: The name of the volume whose export configuration has to be changed.
Sample export configuration file:
The following is the default set of parameters required to export any entry. The values shown are the defaults used by the CLI options to start or stop NFS-Ganesha.
# cat export.conf

EXPORT{
    Export_Id = 1 ;   # Export ID unique to each export
    Path = "volume_path";  # Path of the volume to be exported. Eg: "/test_volume"

    FSAL {
        name = GLUSTER;
        hostname = "10.xx.xx.xx";  # IP of one of the nodes in the trusted pool
        volume = "volume_name";     # Volume name. Eg: "test_volume"
    }

    Access_type = RW;     # Access permissions
    Squash = No_root_squash; # To enable/disable root squashing
    Disable_ACL = TRUE;     # To enable/disable ACL
    Pseudo = "pseudo_path";     # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
    Protocols = "3", "4" ;     # NFS protocols supported
    Transports = "UDP", "TCP" ; # Transport protocols supported
    SecType = "sys";     # Security flavors supported
}
The following sections describe various configurations possible through NFS-Ganesha. Minor changes to the export.conf file are needed to achieve the expected behavior.
  • Exporting Subdirectories
  • Providing Permissions for Specific Clients
  • Enabling and Disabling NFSv4 ACLs
  • Providing Pseudo Path for NFSv4 Mount
Exporting Subdirectories

To export subdirectories within a volume, edit the following parameters in the export.conf file.

Path = "path_to_subdirectory";  # Path of the volume to be exported. Eg: "/test_volume/test_subdir"

 FSAL {
  name = GLUSTER;
  hostname = "10.xx.xx.xx";  # IP of one of the nodes in the trusted pool
  volume = "volume_name";  # Volume name. Eg: "test_volume"
  volpath = "path_to_subdirectory_with_respect_to_volume"; #Subdirectory path from the root of the volume. Eg: "/test_subdir"
 }
Providing Permissions for Specific Clients

The parameter values and permission values given in the EXPORT block apply to any client that mounts the exported volume. To provide specific permissions to specific clients, introduce a client block inside the EXPORT block.

For example, to assign specific permissions for client 10.00.00.01, add the following block in the EXPORT block.
client {
        clients = 10.00.00.01;  # IP of the client.
        allow_root_access = true;
        access_type = "RO"; # Read-only permissions
        Protocols = "3"; # Allow only NFSv3 protocol.
        anonymous_uid = 1440;
        anonymous_gid = 72;
  }
All the other clients inherit the permissions that are declared outside the client block.
Enabling and Disabling NFSv4 ACLs

To enable NFSv4 ACLs, edit the following parameter:

Disable_ACL = FALSE;
Providing Pseudo Path for NFSv4 Mount

To set the NFSv4 pseudo path, edit the following parameter:

Pseudo = "pseudo_path"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
Use this path when mounting the export in NFSv4 mode.
6.2.3.9. Troubleshooting
Mandatory checks

Perform the following checks for any issue or failure that you encounter:

  • Make sure all the prerequisites are met.
  • Execute the following commands to check the status of the services:
    # service nfs-ganesha status
    # service pcsd status
    # service pacemaker status
    # pcs status
  • Review the following logs to understand the cause of failure.
    /var/log/ganesha/ganesha.log
    /var/log/ganesha/ganesha-gfapi.log
    /var/log/messages
    /var/log/pcsd.log
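The service checks above can be scripted. This Bash sketch is a hypothetical helper that uses systemctl is-active in place of the service command. It covers only nfs-ganesha, pcsd, and pacemaker, so run pcs status separately for the cluster view.

```bash
# Hypothetical helper: report whether each service is active and return
# non-zero if any of them is not.
check_ganesha_services() {
    local svc rc=0
    for svc in nfs-ganesha pcsd pacemaker; do
        if systemctl is-active --quiet "$svc"; then
            echo "OK       $svc"
        else
            echo "INACTIVE $svc"
            rc=1
        fi
    done
    return "$rc"
}
```

Run check_ganesha_services on each node; an INACTIVE line indicates which of the logs listed above to review first.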
    
  • Situation

    NFS-Ganesha fails to start.

    Solution

    Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:

    1. Ensure the kernel and gluster nfs services are inactive.
    2. Ensure that port 875 is free for the RQUOTA service.
    3. Ensure that the shared storage volume mount exists on the server after node reboot/shutdown. If it does not, then mount the shared storage volume manually using the following command:
      # mount -t glusterfs <local_node's_hostname>:gluster_shared_storage /var/run/gluster/shared_storage
    For more information, see Manually Configuring NFS-Ganesha Exports.
  • Situation

    NFS-Ganesha port 875 is unavailable.

    Solution

    Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:

    1. Run the following command to extract the PID of the process using port 875:
      # netstat -anlp | grep 875
    2. Determine if the process using port 875 is an important system or user process.
    3. Perform one of the following depending upon the importance of the process:
      • If the process using port 875 is an important system or user process:
        1. Assign a different port to this service by modifying the following line in the /etc/ganesha/ganesha.conf file on all the nodes:
          # Use a non-privileged port for RQuota
          Rquota_Port = port_number;
        2. Run the following commands after modifying the port number:
          # semanage port -a -t mountd_port_t -p tcp port_number
          # semanage port -a -t mountd_port_t -p udp port_number
        3. Run the following command to restart NFS-Ganesha:
          # systemctl restart nfs-ganesha
      • If the process using port 875 is not an important system or user process:
        1. Run the following command to kill the process using port 875:
          # kill pid;
          Use the process ID extracted from the previous step.
        2. Run the following command to ensure that the process is killed and port 875 is free to use:
          # ps aux | grep pid;
        3. Run the following command to restart NFS-Ganesha:
          systemctl restart nfs-ganesha
        4. If required, restart the killed process.
  • Situation

    NFS-Ganesha Cluster setup fails.

    Solution

    Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps.

    1. Ensure the kernel and gluster nfs services are inactive.
    2. Ensure that the pcs cluster auth command is executed on all the nodes with the same password for the user hacluster.
    3. Ensure that the shared storage volume is mounted on all the nodes.
    4. Ensure that the name of the HA Cluster does not exceed 15 characters.
    5. Ensure that UDP multicast packets can be pinged using omping.
    6. Ensure that Virtual IPs are not assigned to any NIC.
  • Situation

    NFS-Ganesha has started and fails to export a volume.

    Solution

    Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:

    1. Ensure that the volume is in the Started state using the following command:
      # gluster volume status <volname>
      
    2. Execute the following commands to check the status of the services:
      # service nfs-ganesha status
      # showmount -e localhost
    3. Review the following logs to understand the cause of failure.
      /var/log/ganesha/ganesha.log
      /var/log/ganesha/ganesha-gfapi.log
      /var/log/messages
    4. Ensure that the dbus service is running using the following command:
      # service messagebus status
    5. If the volume is not in a started state, run the following command to start the volume.
      # gluster volume start <volname>
      If the volume is not exported as part of volume start, run the following command to re-export the volume:
      # /usr/libexec/ganesha/dbus-send.sh /var/run/gluster/shared_storage on <volname>
  • Situation

    Adding a new node to the HA cluster fails.

    Solution

    Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:

    1. Ensure that you run the following command from one of the nodes that is already part of the cluster:
      # /usr/libexec/ganesha/ganesha-ha.sh --add <HA_CONF_DIR> <NODE-HOSTNAME> <NODE-VIP>
    2. Ensure that gluster_shared_storage volume is mounted on the node that needs to be added.
    3. Make sure that all the nodes of the cluster are DNS resolvable from the node that needs to be added.
    4. Execute the following command for each of the hosts in the HA cluster on the node that needs to be added:
      # pcs cluster auth <hostname>
  • Situation

    Cleanup required when nfs-ganesha HA cluster setup fails.

    Solution

    To restore the machines to their original state, execute the following commands on each node forming the cluster:

    # /usr/libexec/ganesha/ganesha-ha.sh --teardown /var/run/gluster/shared_storage/nfs-ganesha
    # /usr/libexec/ganesha/ganesha-ha.sh --cleanup /var/run/gluster/shared_storage/nfs-ganesha
    # systemctl stop nfs-ganesha
  • Situation

    Permission issues.

    Solution

    By default, the root squash option is disabled when you start NFS-Ganesha using the CLI. If you encounter any permission issues, check the UNIX permissions of the exported entry.
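Some of the checks in the troubleshooting situations above can be scripted. The following Bash sketch defines three hypothetical helpers. pids_on_port parses netstat -anlp output and matches the local port exactly, unlike grep 875, which also matches ports such as 8750. check_ha_name enforces the 15-character limit on the HA cluster name, assuming the name is set on an HA_NAME="..." line in ganesha-ha.conf. volume_exported checks showmount -e output for /VOLNAME, assuming the export path is the volume name prefixed with /.

```bash
# Hypothetical helper: read `netstat -anlp` output on stdin and print the
# PIDs whose local address ($4) ends in exactly :PORT.
pids_on_port() {
    local port=$1
    awk -v p=":$port" '
        ($1 ~ /^(tcp|udp)/) && substr($4, length($4) - length(p) + 1) == p {
            split($NF, a, "/")          # last field is "PID/Program name"
            if (a[1] ~ /^[0-9]+$/) print a[1]
        }' | sort -un
}

# Hypothetical helper: fail if HA_NAME is missing or longer than 15 characters.
check_ha_name() {
    local conf=$1 name
    name=$(grep -E '^HA_NAME=' "$conf" | tail -n 1 | cut -d= -f2- | tr -d '"')
    if [[ -z $name ]]; then
        echo "error: HA_NAME not set in $conf" >&2
        return 1
    fi
    if (( ${#name} > 15 )); then
        echo "error: HA_NAME '$name' is ${#name} characters (max 15)" >&2
        return 1
    fi
}

# Hypothetical helper: read `showmount -e` output on stdin and succeed if
# /VOLNAME is in the export list (the first line is a header).
volume_exported() {
    awk -v want="/$1" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

For example: netstat -anlp | pids_on_port 875, check_ha_name /var/run/gluster/shared_storage/nfs-ganesha/ganesha-ha.conf, and showmount -e localhost | volume_exported VOLNAME.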

6.3. SMB

You can access Red Hat Gluster Storage volumes using the Server Message Block (SMB) protocol by exporting directories in Red Hat Gluster Storage volumes as SMB shares on the server.
This section describes how to enable SMB shares, how to mount SMB shares manually and automatically on Microsoft Windows and macOS based clients, and how to verify that the share has been mounted successfully.

Important

To export Red Hat Gluster Storage volumes with Samba, setting up CTDB is mandatory.
Follow the process outlined in Overview of configuring SMB shares. The details of this overview are provided in the rest of this section.

6.3.1. Requirements for using SMB with Red Hat Gluster Storage

  • Samba is the server software used to export Linux filesystems with the SMB protocol. For exporting Red Hat Gluster Storage volumes with Samba, it is mandatory to have CTDB configured, which is a component of Samba. For information on subscribing to the correct channels for SMB support, see Subscribing to the Red Hat Gluster Storage server channels in the Red Hat Gluster Storage 3.4 Installation Guide.
  • Enable the Samba firewall service in the active zones for runtime and permanent mode. The following commands are for systems based on Red Hat Enterprise Linux 7.
    To get a list of active zones, run the following command:
    # firewall-cmd --get-active-zones
    To allow the firewall services in the active zones, run the following commands:
    # firewall-cmd --zone=zone_name --add-service=samba
    # firewall-cmd --zone=zone_name --add-service=samba --permanent
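When several zones are active, the two firewall-cmd commands must be repeated for each of them. The following Bash sketch automates that loop. The helper names and the DRY_RUN variable are hypothetical; the parsing assumes that firewall-cmd --get-active-zones prints each zone name in column 1 with indented detail lines beneath it.

```bash
# Hypothetical helper: print zone names from `firewall-cmd --get-active-zones`
# output, where zone names start in column 1 and detail lines are indented.
active_zone_names() {
    awk 'NF && $0 !~ /^[[:space:]]/ { print $1 }'
}

# Add the samba service to every active zone, for runtime and permanent mode.
# Set DRY_RUN=1 to print the firewall-cmd changes instead of applying them.
allow_samba_in_active_zones() {
    local zone
    local -a run=()
    [[ ${DRY_RUN:-0} == 1 ]] && run=(echo)
    for zone in $(firewall-cmd --get-active-zones | active_zone_names); do
        "${run[@]}" firewall-cmd --zone="$zone" --add-service=samba
        "${run[@]}" firewall-cmd --zone="$zone" --add-service=samba --permanent
    done
}
```

Run DRY_RUN=1 allow_samba_in_active_zones first to review the commands. The same loop can open the CTDB port by replacing --add-service=samba with --add-port=4379/tcp.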

6.3.2. Setting up CTDB for Samba

To export Red Hat Gluster Storage volumes with Samba, configure CTDB (Cluster Trivial Database).
CTDB provides high availability by adding virtual IP addresses (VIPs) and a heartbeat service. When a node in the trusted storage pool fails, CTDB enables a different node to take over the virtual IP addresses that the failed node was hosting. This ensures the IP addresses for the services provided are always available.

Important

Amazon Elastic Compute Cloud (EC2) does not support VIPs and is hence not compatible with this solution.

Prerequisites

  • If you already have an older version of CTDB (version <= ctdb1.x), then remove CTDB by executing the following command:
    # yum remove ctdb
    After removing the older version, proceed with installing the latest CTDB.

    Note

    Ensure that the system is subscribed to the samba channel to get the latest CTDB packages.
  • Install the latest version of CTDB on all the nodes that are used as Samba servers, using the following command:
    # yum install ctdb
  • In a CTDB-based high availability Samba environment, locks are not migrated on failover.
  • Enable the CTDB firewall service in the active zones for runtime and permanent mode. The following commands are for systems based on Red Hat Enterprise Linux 7.
    To get a list of active zones, run the following command:
    # firewall-cmd --get-active-zones
    To add ports to the active zones, run the following commands:
    # firewall-cmd --zone=zone_name --add-port=4379/tcp
    # firewall-cmd --zone=zone_name --add-port=4379/tcp --permanent

Configuring CTDB on Red Hat Gluster Storage Server

  1. Create a new replicated volume to house the CTDB lock file. The lock file has a size of zero bytes, so use small bricks.
    To create a replicated volume run the following command, replacing N with the number of nodes to replicate across:
    # gluster volume create volname replica N ip_address_1:brick_path ... ip_address_N:brick_path
    For example:
    # gluster volume create ctdb replica 3 10.16.157.75:/rhgs/brick1/ctdb/b1 10.16.157.78:/rhgs/brick1/ctdb/b2 10.16.157.81:/rhgs/brick1/ctdb/b3
  2. In the following files, replace "all" in the statement META="all" with the newly created volume name, for example, META="ctdb".
    /var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
    /var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
  3. In the /etc/samba/smb.conf file, add the following line in the global section on all the nodes:
    clustering=yes
  4. Start the volume.
    The S29CTDBsetup.sh script runs on all Red Hat Gluster Storage servers, adds an entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes with Samba server. It also enables automatic start of CTDB service on reboot.

    Note

    When you stop the special CTDB volume, the S29CTDB-teardown.sh script runs on all Red Hat Gluster Storage servers and removes an entry in /etc/fstab for the mount and unmounts the volume at /gluster/lock.
  5. Verify that the /etc/sysconfig/ctdb file exists on all nodes that are used as a Samba server. This file contains CTDB configuration details recommended for Red Hat Gluster Storage.
  6. Create the /etc/ctdb/nodes file on all the nodes that are used as Samba servers and add the IP addresses of these nodes to the file.
    10.16.157.0
    10.16.157.3
    10.16.157.6
    The IP addresses listed here are the private IP addresses of Samba servers.
  7. On nodes that are used as Samba servers and require IP failover, create the /etc/ctdb/public_addresses file. Add any virtual IP addresses that CTDB should create to the file in the following format:
    VIP/routing_prefix network_interface
    For example:
    192.168.1.20/24 eth0
    192.168.1.21/24 eth0
  8. Start the CTDB service on all the nodes.
    # service ctdb start
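A malformed /etc/ctdb/public_addresses entry is easy to overlook, so a syntax check before starting CTDB can help. This awk-based Bash function is a hypothetical sketch: it accepts only IPv4 entries in the VIP/routing_prefix network_interface format described in step 7 and reports the line number of any entry that does not match.

```bash
# Hypothetical checker for /etc/ctdb/public_addresses: every non-blank,
# non-comment line must look like "VIP/routing_prefix network_interface".
check_public_addresses() {
    awk '
        BEGIN { bad = 0 }
        /^[[:space:]]*$/ || /^[[:space:]]*#/ { next }   # skip blanks and comments
        {
            # Expect two fields: an IPv4 address with /prefix, then an interface.
            ok = (NF == 2 && $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/)
            if (ok) { split($1, a, "/"); ok = (a[2] + 0 <= 32) }
            if (!ok) { printf "line %d: invalid entry: %s\n", NR, $0; bad = 1 }
        }
        END { exit bad }' "$1"
}
```

For example, run check_public_addresses /etc/ctdb/public_addresses on each node before service ctdb start.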

6.3.3. Sharing Volumes over SMB

After you follow this process, any gluster volumes configured on servers that run Samba are exported automatically on starting the volume.
The procedure to share volumes over Samba differs depending on the Samba version you use.

If you are using an older version of Samba:

  1. Enable SMB specific caching:
    # gluster volume set VOLNAME performance.cache-samba-metadata on
    You can also enable generic metadata caching to improve performance. See Section 20.7, “Directory Operations” for details.
  2. Restart the glusterd service on each Red Hat Gluster Storage node.
  3. To ensure proper lock and I/O coherence, run:
    # gluster volume set VOLNAME storage.batch-fsync-delay-usec 0

If you are using Samba-4.8.5-104 or later:

  1. To export a gluster volume as an SMB share through Samba, one of the following volume options is required: user.cifs or user.smb.
    To enable user.cifs volume option, run:
    # gluster volume set VOLNAME user.cifs enable
    And to enable user.smb, run:
    # gluster volume set VOLNAME user.smb enable
    Red Hat Gluster Storage 3.4 introduces a group command samba for configuring the necessary volume options for Samba-CTDB setup.
  2. Execute the following command to configure the volume options for the Samba-CTDB:
    # gluster volume set VOLNAME group samba
    This command enables the following options for the Samba-CTDB setup:
    • performance.readdir-ahead: on
    • performance.parallel-readdir: on
    • performance.nl-cache-timeout: 600
    • performance.nl-cache: on
    • performance.cache-samba-metadata: on
    • network.inode-lru-limit: 200000
    • performance.md-cache-timeout: 600
    • performance.cache-invalidation: on
    • features.cache-invalidation-timeout: 600
    • features.cache-invalidation: on
    • performance.stat-prefetch: on
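To confirm that the group command took effect, you can compare the volume's current settings against the list above. The Python sketch below is illustrative only. It assumes you have captured the output of gluster volume get VOLNAME all as text and that this output is a two-column table with an "Option Value" header and a dashed underline. If your version prints a different layout, adjust parse_volume_get accordingly.

```python
# The options set by "gluster volume set VOLNAME group samba", as listed above.
SAMBA_GROUP = {
    "performance.readdir-ahead": "on",
    "performance.parallel-readdir": "on",
    "performance.nl-cache-timeout": "600",
    "performance.nl-cache": "on",
    "performance.cache-samba-metadata": "on",
    "network.inode-lru-limit": "200000",
    "performance.md-cache-timeout": "600",
    "performance.cache-invalidation": "on",
    "features.cache-invalidation-timeout": "600",
    "features.cache-invalidation": "on",
    "performance.stat-prefetch": "on",
}


def parse_volume_get(output):
    """Map option name -> value from two-column "Option  Value" output (assumed layout)."""
    values = {}
    for line in output.splitlines():
        fields = line.split(None, 1)
        # Skip blank lines, the "Option Value" header, and its "------" underline.
        if len(fields) != 2 or fields[0] in ("Option", "------"):
            continue
        values[fields[0]] = fields[1].strip()
    return values


def missing_samba_options(output):
    """Return {option: (expected, actual)} for every group option not yet applied."""
    current = parse_volume_get(output)
    return {
        option: (expected, current.get(option))
        for option, expected in SAMBA_GROUP.items()
        if current.get(option) != expected
    }
```

An empty result means every option in the samba group has the expected value.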

Then, for all Samba versions:

  1. Verify that the volume can be accessed from the SMB/CIFS share:
    # smbclient -L <hostname> -U%
    For example:
    # smbclient -L rhs-vm1 -U%
    Domain=[MYGROUP] OS=[Unix] Server=[Samba 4.1.17]
    
         Sharename       Type      Comment
         ---------       ----      -------
         IPC$            IPC       IPC Service (Samba Server Version 4.1.17)
         gluster-vol1    Disk      For samba share of volume vol1
    Domain=[MYGROUP] OS=[Unix] Server=[Samba 4.1.17]
    
         Server               Comment
         ---------            -------
    
         Workgroup            Master
         ---------            -------
  2. Verify that the user can access the SMB/CIFS share by running the following command:
    # smbclient //VIP/gluster-volname -U username%password
    For example:
    # smbclient //10.0.0.1/gluster-vol1 -U root%redhat
    Domain=[MYGROUP] OS=[Unix] Server=[Samba 4.1.17]
    smb: \> mkdir test
    smb: \> cd test\
    smb: \test\> pwd
    Current directory is \\10.0.0.1\gluster-vol1\test\
    smb: \test\>
  3. Configure this share so that a client can mount it using the address of any server in the trusted storage pool that provides this volume.
    1. Open the /etc/samba/smb.conf file in a text editor and add the following lines for a simple configuration:
      [gluster-VOLNAME]
      comment = For samba share of volume VOLNAME
      vfs objects = glusterfs
      glusterfs:volume = VOLNAME
      glusterfs:logfile = /var/log/samba/VOLNAME.log
      glusterfs:loglevel = 7
      path = /
      read only = no
      guest ok = yes
      The configuration options are described in the following table:
      Table 6.7. Configuration Options
      Configuration option | Required? | Default value | Description
      path | Yes | n/a | The path, relative to the root of the gluster volume, that is shared; / represents the root of the volume. Exporting a subdirectory of a volume is supported: path = /subdir exports only that subdirectory.
      glusterfs:volume | Yes | n/a | The name of the volume that is shared.
      glusterfs:logfile | No | NULL | Path to the log file used by the gluster modules that the vfs plugin loads. The standard Samba variable substitutions described in smb.conf are supported.
      glusterfs:loglevel | No | 7 | Equivalent to the client-log-level option of gluster. The default value, 7, corresponds to the INFO level.
      glusterfs:volfile_server | No | localhost | The gluster server contacted to fetch the volfile for the volume. The value is a whitespace-separated list of elements, each of the form unix+/path/to/socket/file or [tcp+]IP|hostname|\[IPv6\][:port].
    2. Run service smb [re]start to start or restart the smb service.
    3. Run smbpasswd to set the SMB password.
      # smbpasswd -a username
      Specify the SMB password. This password is used during the SMB mount.
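The glusterfs:volfile_server grammar in Table 6.7 is compact. The Python sketch below spells out one reading of it: whitespace-separated elements, an optional tcp+ prefix, unix+ for socket paths, and IPv6 addresses enclosed in literal brackets with an optional :port. It is our own illustration, not Samba's parser. The socket path and host names in the usage example are hypothetical, and a missing port is reported as None (meaning the default applies) rather than as a specific number.

```python
from typing import List, NamedTuple, Optional


class VolfileServer(NamedTuple):
    transport: str        # "tcp" or "unix"
    host: str             # hostname, IP address, or UNIX socket path
    port: Optional[int]   # None when no port is given (the default is used)


def parse_volfile_servers(value: str) -> List[VolfileServer]:
    """Parse a glusterfs:volfile_server value into its whitespace-separated elements."""
    servers = []
    for element in value.split():
        if element.startswith("unix+"):
            servers.append(VolfileServer("unix", element[len("unix+"):], None))
            continue
        if element.startswith("tcp+"):  # the "tcp+" prefix is optional
            element = element[len("tcp+"):]
        port = None
        if element.startswith("["):
            # Bracketed IPv6 literal, optionally followed by ":port".
            end = element.index("]")  # raises ValueError if the bracket is unclosed
            host, rest = element[1:end], element[end + 1:]
            if rest:
                if not rest.startswith(":"):
                    raise ValueError(f"unexpected text after IPv6 address: {element!r}")
                port = int(rest[1:])
        elif ":" in element:
            host, port_text = element.rsplit(":", 1)
            if ":" in host:
                raise ValueError(f"IPv6 addresses must be enclosed in brackets: {element!r}")
            port = int(port_text)
        else:
            host = element
        servers.append(VolfileServer("tcp", host, port))
    return servers
```

For example, parse_volfile_servers("unix+/run/glusterd.socket tcp+rhs-vm1:24007 [fe80::1]") yields one UNIX-socket entry and two TCP entries, the last one without an explicit port.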

6.3.4. Configuring User Access to Shared Volumes

6.3.4.1. Configuring the Apple Create Context for macOS users
  1. Add the following lines to the [global] section of the smb.conf file. Note that the indentation level shown is required.
                fruit:aapl = yes
                ea support = yes
  2. Load the vfs_fruit module and its dependencies by adding the following line to your volume's export configuration block in the smb.conf file.
    vfs objects = fruit streams_xattr glusterfs
    For example:
    [gluster-volname]
    comment = For samba share of volume smbshare
    vfs objects = fruit streams_xattr glusterfs
    glusterfs:volume = volname
    glusterfs:logfile = /var/log/samba/glusterfs-volname-fruit.%M.log
    glusterfs:loglevel = 7
    path = /
    read only = no
    guest ok = yes
    
    fruit:encoding = native
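After editing smb.conf, a quick automated check can catch a missing global option or a reordered vfs objects line. This Python sketch uses the standard library configparser as an approximate smb.conf reader. That is our assumption, not something Samba provides, and testparm remains the authoritative validator. Two settings matter here. Only = may act as a delimiter, because keys such as glusterfs:volume contain a colon. Interpolation must also be off, because values such as %M are Samba substitutions. The required module order is taken from the example above.

```python
import configparser


def load_smb_conf(text):
    """Parse smb.conf-style text with the standard library's configparser."""
    parser = configparser.ConfigParser(
        delimiters=("=",),     # keys such as "glusterfs:volume" contain ':'
        interpolation=None,    # values such as "%M" are Samba substitutions
        strict=False,          # tolerate repeated sections or options
    )
    parser.read_string(text)
    return parser


def check_fruit_share(parser, share):
    """Return a list of problems with the macOS (vfs_fruit) settings for a share."""
    problems = []
    for option in ("fruit:aapl", "ea support"):
        if parser.get("global", option, fallback="no").strip().lower() != "yes":
            problems.append(f"[global] {option} is not set to yes")
    modules = parser.get(share, "vfs objects", fallback="").split()
    if modules[-3:] != ["fruit", "streams_xattr", "glusterfs"]:
        problems.append(f"[{share}] vfs objects should end with: fruit streams_xattr glusterfs")
    return problems
```

An empty list from check_fruit_share(load_smb_conf(open("/etc/samba/smb.conf").read()), "gluster-volname") means both global options and the module order match the example.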
6.3.4.2. Configuring read/write access for a non-privileged user
  1. Add the user on all the Samba servers based on your configuration:
    # adduser username
  2. Add the user to the list of Samba users on all Samba servers and assign password by executing the following command:
    # smbpasswd -a username
  3. From any other Samba server, mount the volume using the FUSE protocol.
    # mount -t glusterfs -o acl ip-address:/volname /mountpoint
    For example:
    # mount -t glusterfs -o acl rhs-a:/repvol /mnt
  4. Use the setfacl command to provide the required permissions for directory access to the user.
    # setfacl -m user:username:rwx mountpoint
    For example:
    # setfacl -m user:cifsuser:rwx /mnt

6.3.5. Mounting Volumes using SMB

6.3.5.1. Manually mounting volumes exported with SMB on Red Hat Enterprise Linux
  1. Install the cifs-utils package on the client.
    # yum install cifs-utils
  2. Run mount -t cifs to mount the exported SMB share, using the syntax example as guidance.
    # mount -t cifs -o user=username,pass=password  //hostname/gluster-volname /mountpoint
    The sec=ntlmssp parameter is also required when mounting a volume on Red Hat Enterprise Linux 6.
    # mount -t cifs -o user=username,pass=password,sec=ntlmssp //hostname/gluster-volname /mountpoint
    For example:
    # mount -t cifs -o user=cifsuser,pass=redhat,sec=ntlmssp //server1/gluster-repvol /cifs
  3. Run # smbstatus -S on the server to display the status of the volume:
    Service        pid     machine             Connected at
    -------------------------------------------------------------------
    gluster-VOLNAME 11967   __ffff_192.168.1.60  Mon Aug  6 02:23:25 2012
6.3.5.2. Manually mounting volumes exported with SMB on Microsoft Windows
6.3.5.2.1. Using Microsoft Windows Explorer to manually mount a volume
  1. In Windows Explorer, click Tools > Map Network Drive… to open the Map Network Drive screen.
  2. Choose the drive letter using the Drive drop-down list.
  3. In the Folder text box, specify the path of the server and the shared resource in the following format: \\SERVER_NAME\VOLNAME.
  4. Click Finish to complete the process, and display the network drive in Windows Explorer.
  5. Navigate to the network drive to verify it has mounted correctly.
6.3.5.2.2. Using Microsoft Windows command line interface to manually mount a volume
  1. Click Start > Run, and then type cmd.
  2. Enter net use z: \\SERVER_NAME\VOLNAME, where z: is the drive letter to assign to the shared volume.
    For example, net use y: \\server1\test-volume
  3. Navigate to the network drive to verify it has mounted correctly.
6.3.5.3. Manually mounting volumes exported with SMB on macOS

Prerequisites

  • Ensure that your Samba configuration allows the use of the SMB Apple Create Context. See Section 6.3.4.1, “Configuring the Apple Create Context for macOS users”.
  • Ensure that the username you're using is on the list of allowed users for the volume.

Manual mounting process

  1. In the Finder, click Go > Connect to Server.
  2. In the Server Address field, type the IP address or hostname of a Red Hat Gluster Storage server that hosts the volume you want to mount.
  3. Click Connect.
  4. When prompted, select Registered User to connect to the volume using a valid username and password.
    If required, enter your user name and password, then select the server volumes or shared folders that you want to mount.
    To make it easier to connect to the computer in the future, select Remember this password in my keychain to add your user name and password for the computer to your keychain.
For further information about mounting volumes on macOS, see the Apple Support documentation: https://support.apple.com/kb/PH25269?locale=en_US.
6.3.5.4. Configuring automatic mounting for volumes exported with SMB on Red Hat Enterprise Linux
  1. Open the /etc/fstab file in a text editor and add a line containing the following details:
    \\HOSTNAME|IPADDRESS\SHARE_NAME MOUNTDIR cifs OPTIONS DUMP FSCK
    In the OPTIONS column, ensure that you specify the credentials option, with a value of the path to the file that contains the username and/or password.
    For example, using the example server names, the entry is:
    \\server1\test-volume /mnt/glusterfs cifs credentials=/etc/samba/passwd,_netdev 0 0
    The sec=ntlmssp parameter is also required when mounting a volume on Red Hat Enterprise Linux 6, for example:
    \\server1\test-volume /mnt/glusterfs cifs credentials=/etc/samba/passwd,_netdev,sec=ntlmssp 0 0
    See the mount.cifs man page for more information about these options.
  2. Run # smbstatus -S on the server to display the status of the volume:
    Service        pid     machine             Connected at
    -------------------------------------------------------------------
    gluster-VOLNAME 11967   __ffff_192.168.1.60  Mon Aug  6 02:23:25 2012
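The credentials file named in the fstab entry holds a password, so it should be readable only by its owner. The Python sketch below writes such a file using the username=/password=/domain= key-value format described in the mount.cifs(8) man page, and then builds a matching fstab line. It is an illustration under stated assumptions. The share is written in the forward-slash //server/share form that mount.cifs accepts, and the server, share, and path values in the usage example are the example names used above.

```python
import os


def credentials_file(username, password, domain=None):
    """Return credentials-file text in the key=value format described in mount.cifs(8)."""
    lines = [f"username={username}", f"password={password}"]
    if domain:
        lines.append(f"domain={domain}")
    return "\n".join(lines) + "\n"


def write_credentials(path, text):
    """Write the credentials file so that only its owner can read or write it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.chmod(path, 0o600)  # also tighten permissions if the file already existed


def fstab_entry(server, share, mountdir, credentials, rhel6=False):
    """Return an /etc/fstab line for an SMB share exported by Red Hat Gluster Storage."""
    options = [f"credentials={credentials}", "_netdev"]
    if rhel6:
        options.append("sec=ntlmssp")  # required on Red Hat Enterprise Linux 6
    return f"//{server}/{share} {mountdir} cifs {','.join(options)} 0 0"
```

For example, fstab_entry("server1", "test-volume", "/mnt/glusterfs", "/etc/samba/passwd") returns "//server1/test-volume /mnt/glusterfs cifs credentials=/etc/samba/passwd,_netdev 0 0".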
6.3.5.5. Configuring automatic mounting for volumes exported with SMB on Microsoft Windows
  1. In Windows Explorer, click Tools > Map Network Drive… to open the Map Network Drive screen.
  2. Choose the drive letter using the Drive drop-down list.
  3. In the Folder text box, specify the path of the server and the shared resource in the following format: \\SERVER_NAME\VOLNAME.
  4. Click the Reconnect at logon check box.
  5. Click Finish to complete the process, and display the network drive in Windows Explorer.
  6. If the Windows Security screen pops up, enter the username and password and click OK.
  7. Navigate to the network drive to verify it has mounted correctly.
6.3.5.6. Configuring automatic mounting for volumes exported with SMB on macOS
  1. Manually mount the volume using the process outlined in Section 6.3.5.3, “Manually mounting volumes exported with SMB on macOS”.
  2. Choose Apple menu > System Preferences, then click Users & Groups > Username > Login Items.
  3. Drag and drop the mounted volume into the login items list.
    Check Hide if you want to prevent the drive's window from opening every time you boot or log in.
For further information about mounting volumes on macOS, see the Apple Support documentation: https://support.apple.com/kb/PH25269?locale=en_US.

6.3.6. Starting and Verifying your Configuration

Perform the following to start and verify your configuration:

Verify the Configuration

Verify that the virtual IP (VIP) addresses of a server that is shut down are carried over to another server in the replicated volume.
  1. Verify that CTDB is running using the following commands:
    # ctdb status
    # ctdb ip
    # ctdb ping -n all
  2. Mount a Red Hat Gluster Storage volume using any one of the VIPs.
  3. Run # ctdb ip to locate the physical server serving the VIP.
  4. Shut down the CTDB VIP server to verify successful configuration.
    When the Red Hat Gluster Storage server serving the VIP is shut down there will be a pause for a few seconds, then I/O will resume.
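Steps 3 and 4 can be made more systematic by saving the output of ctdb ip before and after shutting down the node and comparing the two. The Python sketch below does that comparison. It assumes that ctdb ip prints a "Public IPs on node N" header followed by one "address PNN" line per VIP, with -1 marking an unassigned address. Check this against the output on your own nodes before relying on it.

```python
def parse_ctdb_ip(output):
    """Map each public IP to the PNN of the node serving it (-1 if unassigned)."""
    owners = {}
    for line in output.splitlines():
        fields = line.split()
        # Data lines look like "192.168.1.20 0"; the header line has more fields.
        if len(fields) == 2 and fields[1].lstrip("-").isdigit():
            owners[fields[0]] = int(fields[1])
    return owners


def check_failover(before, after, stopped_pnn):
    """Compare `ctdb ip` output captured before and after stopping node stopped_pnn.

    Returns (moved, stranded): where each VIP of the stopped node went, and the
    VIPs that are still unassigned or still attributed to the stopped node.
    """
    old, new = parse_ctdb_ip(before), parse_ctdb_ip(after)
    moved = {ip: new.get(ip) for ip, pnn in old.items() if pnn == stopped_pnn}
    stranded = sorted(ip for ip, pnn in moved.items() if pnn in (None, -1, stopped_pnn))
    return moved, stranded
```

An empty stranded list indicates that every VIP of the stopped node was taken over by another node.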

6.3.7. Disabling SMB Shares

To stop automatic sharing on all nodes for all volumes, perform the following steps:

  1. On all Red Hat Gluster Storage servers, with elevated privileges, navigate to the /var/lib/glusterd/hooks/1/start/post directory.
  2. Rename S30samba-start.sh to K30samba-start.sh.
    For more information about these scripts, see Section 13.2, “Prepackaged Scripts”.
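Because the rename must be repeated on every server, you may want to script it. The following Python sketch performs the rename in the hook directory named above. It does nothing if the script is already disabled, and it refuses to overwrite an existing K30samba-start.sh. The helper name is ours, and running it on each node (for example over SSH) is left to your own tooling.

```python
from pathlib import Path

# Hook directory from the step above.
HOOK_DIR = Path("/var/lib/glusterd/hooks/1/start/post")


def disable_samba_start_hook(hook_dir=HOOK_DIR):
    """Rename S30samba-start.sh to K30samba-start.sh; return True if a rename happened."""
    hook_dir = Path(hook_dir)
    enabled = hook_dir / "S30samba-start.sh"
    disabled = hook_dir / "K30samba-start.sh"
    if not enabled.exists():
        return False  # already disabled on this node, or the hook is not installed
    if disabled.exists():
        raise FileExistsError(f"{disabled} already exists; not overwriting it")
    enabled.rename(disabled)
    return True
```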
To stop automatic sharing on all nodes for one particular volume:

  1. Run the following command to disable automatic SMB sharing per-volume:
    # gluster volume set <VOLNAME> user.smb disable

6.3.8. Accessing Snapshots in Windows

A snapshot is a read-only point-in-time copy of the volume. Windows has a built-in mechanism for browsing snapshots through the Volume Shadow Copy Service (VSS). With this feature, users can access previous versions of any file or folder in a few steps.

Note

Shadow Copy (also known as Volume Shadow Copy Service, or VSS) is a technology included in Microsoft Windows that can both take and display snapshots of computer files or volumes. Currently, only viewing snapshots is supported. Creating snapshots through this interface is NOT supported.
6.3.8.1. Configuring Shadow Copy
To configure shadow copy, set the following options in the smb.conf file, located at /etc/samba/smb.conf.

Note

Ensure that the shadow_copy2 module is enabled in smb.conf. To enable it, add shadow_copy2 to the vfs objects option.
For example:
vfs objects = shadow_copy2 glusterfs
Table 6.8. Configuration Options
Configuration option | Required? | Default value | Description
shadow:snapdir | Yes | n/a | Path to the directory where snapshots are kept. The snapdir name should be .snaps.
shadow:basedir | Yes | n/a | Path to the base directory that snapshots are taken from. The basedir value should be /.
shadow:sort | Optional | unsorted | Supported values are asc and desc. Sorts the shadow copy directories before they are sent to the client, which can be useful because UNIX file systems usually do not list directory entries in alphabetical order. When sorting is enabled, descending order is typically used.
shadow:localtime | Optional | UTC | Indicates whether the snapshot names are in UTC/GMT or in local time.
shadow:format | Yes | n/a | The format specification for snapshot names. The format must be compatible with the conversion specifications recognized by str[fp]time. The default value is _GMT-%Y.%m.%d-%H.%M.%S.
shadow:fixinodes | Optional | No | If enabled, this module modifies the apparent inode number of files in the snapshot directories using a hash of the file's path. This is needed for snapshot systems where the snapshots have the same device:inode number as the original files (as happens with GPFS snapshots). On such systems, if this option is not set, the Restore button in the shadow copy UI fails with a sharing violation.
shadow:snapprefix | Optional | n/a | Regular expression that matches the prefix of the snapshot name. Red Hat Gluster Storage supports only Basic Regular Expressions (BRE).
shadow:delimiter | Optional | _GMT | Delimiter that separates shadow:snapprefix from shadow:format.
The following is an example share definition in the smb.conf file:
[gluster-vol0]
comment = For samba share of volume vol0
vfs objects = shadow_copy2 glusterfs
glusterfs:volume = vol0
glusterfs:logfile = /var/log/samba/glusterfs-vol0.%M.log
glusterfs:loglevel = 3
path = /
read only = no
guest ok = yes
shadow:snapdir = /.snaps
shadow:basedir = /
shadow:sort = desc
shadow:snapprefix = ^S[A-Za-z0-9]*p$
shadow:format = _GMT-%Y.%m.%d-%H.%M.%S
In this example, the shadow: parameters enable shadow copy for the share. Not all of them are mandatory; for example, shadow:sort and shadow:snapprefix are optional (see Table 6.8).
Shadow copy filters the snapshots according to these smb.conf entries and shows only the snapshots that match. In this example, the snapshot name prefix (the part before the _GMT delimiter) must start with S and end with p, with any alphanumeric characters in between. For example, of the following snapshots, Windows shows the first two and ignores the last one. These options therefore let you control which snapshots are shown.
Snap_GMT-2016.06.06-06.06.06
Sl123p_GMT-2016.07.07-07.07.07
xyz_GMT-2016.08.08-08.08.08
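The filtering just described can be reproduced outside Samba to test a snapprefix expression before you deploy it. The Python sketch below is an approximation of our understanding of vfs_shadow_copy2, not its actual code. The name is split at the first shadow:delimiter, the prefix is matched against shadow:snapprefix, and the remainder is parsed with shadow:format. The expression ^S[A-Za-z0-9]*p$ has the same meaning in BRE and in Python's re syntax, but more complex BRE patterns may not translate directly.

```python
import re
from datetime import datetime

# Values from the smb.conf example above.
SNAPPREFIX = re.compile(r"^S[A-Za-z0-9]*p$")  # same meaning in BRE and Python syntax
DELIMITER = "_GMT"                             # shadow:delimiter default
FORMAT = "_GMT-%Y.%m.%d-%H.%M.%S"              # shadow:format


def snapshot_time(name):
    """Return the snapshot timestamp if the snapshot would be listed, else None."""
    index = name.find(DELIMITER)
    if index == -1:
        return None
    prefix, stamp = name[:index], name[index:]  # e.g. "Snap", "_GMT-2016.06.06-06.06.06"
    if not SNAPPREFIX.search(prefix):
        return None
    try:
        return datetime.strptime(stamp, FORMAT)
    except ValueError:
        return None


def visible_snapshots(names):
    """Keep matching snapshots, newest first, as with shadow:sort = desc."""
    timed = [(snapshot_time(n), n) for n in names]
    return [n for t, n in sorted((p for p in timed if p[0] is not None), reverse=True)]
```

Applied to the three names listed above, visible_snapshots keeps Sl123p_GMT-2016.07.07-07.07.07 and Snap_GMT-2016.06.06-06.06.06, in that order, and drops xyz_GMT-2016.08.08-08.08.08.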
After editing the smb.conf file, execute the following steps to enable snapshot access:
  1. Run service smb [re]start to start or restart the smb service.
  2. Enable User Serviceable Snapshot (USS) for Samba. For more information see Section 8.13, “User Serviceable Snapshots”
6.3.8.2. Accessing Snapshot
To access snapshot on the Windows system, execute the following steps:
  1. Right-click the file or directory whose previous version you need.
  2. Click on Restore previous versions.
  3. In the dialog box, select the Date/Time of the previous version of the file, and select either Open, Restore, or Copy.
    where,
    Open: Lets you open the required version of the file in read-only mode.
    Restore: Restores the file back to the selected version.
    Copy: Lets you copy the file to a different location.

    Figure 6.1. Accessing Snapshot

6.3.9. Tuning Performance

This section describes how to improve system performance in an SMB environment. The enhancement tasks are:
  • Enabling Metadata Caching to improve the performance of SMB access of Red Hat Gluster Storage volumes.
  • Enhancing Directory Listing Performance
  • Enhancing File/Directory Create Performance
Each of these tasks is described in the following sections.
6.3.9.1. Enabling Metadata Caching
Enable metadata caching to improve the performance of directory operations. Execute the following commands from any one of the nodes on the trusted storage pool in the order mentioned below.

Note

If the majority of the workload modifies the same set of files and directories simultaneously from multiple clients, enabling metadata caching might not provide the desired performance improvement.
  1. Execute the following command to enable metadata caching and cache invalidation:
    # gluster volume set <volname> group metadata-cache
    This is a group option, which sets multiple volume options in a single command.
  2. To increase the number of files that can be cached, execute the following command:
    # gluster volume set <VOLNAME> network.inode-lru-limit <n>
    Here, n is set to 50000. It can be increased if the number of active files in the volume is very high; note that increasing this number increases the memory footprint of the brick processes.
6.3.9.2. Enhancing Directory Listing Performance
Directory listing becomes slower as the number of bricks or nodes in a volume increases, even when the number of files and directories remains unchanged. Enabling the parallel readdir volume option makes directory listing performance independent of the number of nodes or bricks in the volume, so scaling out the volume does not reduce directory listing performance.

Note

You can expect a performance increase only if the distribute count of the volume is 2 or greater and the directory is small (fewer than 3000 entries). The larger the distribute count, the greater the performance benefit.
To enable parallel readdir execute the following commands:
  1. Check whether the performance.readdir-ahead option is enabled by executing the following command:
    # gluster volume get <VOLNAME> performance.readdir-ahead
    If the performance.readdir-ahead is not enabled then execute the following command:
    # gluster volume set <VOLNAME> performance.readdir-ahead on
  2. Execute the following command to enable parallel-readdir option:
    # gluster volume set <VOLNAME> performance.parallel-readdir on

    Note

    If there are more than 50 bricks in the volume, it is recommended to increase the cache size (performance.rda-cache-limit) above its default value of 10 MB:
    # gluster volume set <VOLNAME> performance.rda-cache-limit <CACHE SIZE>
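The guidance in this section reduces to a few conditions, which the Python sketch below turns into the list of commands that apply to a given volume. It only assembles the gluster commands shown above and does not run them. The distribute count, brick count, and current readdir-ahead state are inputs you supply, and the rda-cache-limit value is yours to choose, since this section does not prescribe one. The per-directory condition (fewer than 3000 entries) cannot be judged at the volume level, so it is left to you.

```python
def parallel_readdir_commands(volname, distribute_count, brick_count,
                              readdir_ahead_enabled, rda_cache_limit=None):
    """Return the gluster commands from this section that apply to a volume."""
    if distribute_count < 2:
        return []  # no performance increase is expected, per the note above
    commands = []
    if not readdir_ahead_enabled:
        commands.append(f"gluster volume set {volname} performance.readdir-ahead on")
    commands.append(f"gluster volume set {volname} performance.parallel-readdir on")
    if brick_count > 50:
        if rda_cache_limit is None:
            raise ValueError("more than 50 bricks: choose an rda-cache-limit above 10 MB")
        commands.append(
            f"gluster volume set {volname} performance.rda-cache-limit {rda_cache_limit}")
    return commands
```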
6.3.9.3. Enhancing File/Directory Create Performance
Before a file is created or renamed, lookups (5-6 of them over SMB) are sent to verify whether the file already exists. Serving these lookups from the cache where possible can increase create and rename performance over SMB several-fold.
  1. Execute the following command to enable negative-lookup cache:
     # gluster volume set <volname> group nl-cache
       volume set success

    Note

    The above command also enables cache-invalidation and increases the timeout to 10 minutes.

6.4. POSIX Access Control Lists

Basic Linux file system permissions are assigned based on three user types: the owning user, members of the owning group, and all other users. POSIX Access Control Lists (ACLs) work around the limitations of this system by allowing administrators to also configure file and directory access permissions based on any user and any group, rather than just the owning user and group.
This section covers how to view and set access control lists, and how to ensure this feature is enabled on your Red Hat Gluster Storage volumes. For more detailed information about how ACLs work, see the Red Hat Enterprise Linux 7 System Administrator's Guide: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/System_Administrators_Guide/ch-Access_Control_Lists.html.

6.4.1. Setting ACLs with setfacl

The setfacl command lets you modify the ACLs of a specified file or directory. You can add access rules for a file with the -m option, or remove access rules with the -x option. The basic syntax is as follows:
# setfacl option access_rule file_path
The syntax of an access rule depends on which roles need to obey the rule.
Rules for users start with u:
# setfacl -m u:user:perms file_path
For example, setfacl -m u:fred:rw /mnt/data gives the user fred read and write access to the /mnt/data directory.
setfacl -x u:fred /mnt/data removes the access rule for the user fred from the /mnt/data directory. When you remove a rule with -x, omit the permissions field.
Rules for groups start with g:
# setfacl -m g:group:perms file_path
For example, setfacl -m g:admins:rwx /etc/fstab gives users in the admins group read, write, and execute permissions to the /etc/fstab file.
setfacl -x g:newbies /mnt/harmful_script.sh removes the access rule for the newbies group from /mnt/harmful_script.sh.
Rules for other users start with o:
# setfacl -m o:perms file_path
For example, setfacl -m o:r /mnt/data/public gives users without any specific rules about their username or group permission to read files in the /mnt/data/public directory.
Rules for setting a maximum access level using an effective rights mask start with m:
# setfacl -m m:mask file_path
For example, setfacl -m m:r-x /mount/harmless_script.sh limits the permissions granted by named user, named group, and owning group entries on /mount/harmless_script.sh to at most read and execute.
You can set the default ACLs for a directory by adding d: to the beginning of any rule, or make a rule recursive with the -R option. For example, setfacl -Rm d:g:admins:rwx /etc gives all members of the admins group read, write, and execute access to any file created under the /etc directory after the point when setfacl is run.

6.4.2. Checking current ACLs with getfacl

The getfacl command lets you check the current ACLs of a file or directory. The syntax for this command is as follows:
# getfacl file_path
This prints a summary of current ACLs for that file. For example:
# getfacl /mnt/gluster/data/test/sample.jpg
# owner: antony
# group: antony
user::rw-
group::rw-
other::r--
If a directory has default ACLs set, these are prefixed with default:, like so:
# getfacl /mnt/gluster/data/doc
# owner: antony
# group: antony
user::rw-
user:john:r--
group::r--
mask::r--
other::r--
default:user::rwx
default:user:antony:rwx
default:group::r-x
default:mask::rwx
default:other::r-x
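The mask entry shown in these listings limits what named users, named groups, and the owning group actually receive. The Python sketch below parses getfacl output in the layout shown above and computes those effective rights. It is meant for reasoning about ACLs, not as a replacement for getfacl, which prints its own #effective: notes when the mask removes permissions. The helper names are ours.

```python
def parse_getfacl(text):
    """Split getfacl output into access and default ACL dicts.

    Keys are "tag:qualifier" (for example "user:john", or "user:" for the
    owning user); values are permission strings such as "r-x".
    """
    access, default = {}, {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop "# owner:" lines and "#effective:" notes
        if not line:
            continue
        target = access
        if line.startswith("default:"):
            target, line = default, line[len("default:"):]
        tag, qualifier, perms = line.split(":")
        target[f"{tag}:{qualifier}"] = perms
    return access, default


def apply_mask(perms, mask):
    """Keep only the permission bits that the mask also grants."""
    return "".join(p if m != "-" else "-" for p, m in zip(perms, mask))


def effective_rights(acl):
    """Apply the mask entry to named users, named groups, and the owning group."""
    mask = acl.get("mask:")
    result = {}
    for key, perms in acl.items():
        tag, qualifier = key.split(":")
        limited = (tag == "user" and qualifier) or tag == "group"
        result[key] = apply_mask(perms, mask) if mask and limited else perms
    return result
```

With mask::r-x, for example, an entry user:john:rwx takes effect as r-x, while the owning user and other entries are unaffected.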

6.4.3. Mounting volumes with ACLs enabled

To mount a volume with ACLs enabled using the Native FUSE Client, use the acl mount option. For further information, see Section 6.1.3, “Mounting Red Hat Gluster Storage Volumes”.
ACLs are enabled by default on volumes mounted using the NFS and SMB access protocols. To check whether ACLs are enabled on other mounted volumes, see Section 6.4.4, “Checking ACL enablement on a mounted volume”.

6.4.4. Checking ACL enablement on a mounted volume

The following table shows you how to verify that ACLs are enabled on a mounted volume, based on the type of client your volume is mounted with.
Table 6.9. 
Client type    How to check    Further info
Native FUSE
Check the output of the mount command for the default_permissions option:
# mount | grep mountpoint
If default_permissions appears in the output for a mounted volume, ACLs are not enabled on that volume.
Check the output of the ps aux command for the gluster FUSE mount process (glusterfs):
# ps aux | grep gluster
root     30548  0.0  0.7 548408 13868 ?        Ssl  12:39   0:00 /usr/local/sbin/glusterfs --acl --volfile-server=127.0.0.2 --volfile-id=testvol /mnt/fuse_mnt
If --acl appears in the output for a mounted volume, ACLs are enabled on that volume.
See Section 6.1, “Native Client” for more information.
Gluster Native NFS
On the server side, check the output of the gluster volume info volname command. If nfs.acl appears in the output with the value off, that volume has ACLs disabled. If nfs.acl does not appear, ACLs are enabled (the default state).
On the client side, check the output of the mount command for the volume. If noacl appears in the output, ACLs are disabled on the mount point. If this does not appear in the output, the client checks that the server uses ACLs, and uses ACLs if server support is enabled.
Refer to the output of gluster volume set help pertaining to NFS, or see the Red Hat Enterprise Linux Storage Administration Guide for more information: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-nfs.html
NFS Ganesha
On the server side, check the volume's export configuration file, /run/gluster/shared_storage/nfs-ganesha/exports/export.volname.conf. If the Disable_ACL option is set to true, ACLs are disabled. Otherwise, ACLs are enabled for that volume.

Note

NFS-Ganesha supports the ACLs standardized in the NFSv4 protocol, but not the NFSACL protocol used for NFSv3 mounts. Only NFSv4 mounts can set ACLs.
There is no option to disable NFSv4 ACLs on the client side, so as long as the server supports ACLs, clients can set ACLs on the mount point.
See Section 6.2.3, “NFS Ganesha” for more information. For client side settings, refer to the Red Hat Enterprise Linux Storage Administration Guide: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-nfs.html
Samba
POSIX ACLs are enabled by default when using Samba to access a Red Hat Gluster Storage volume.
See Section 6.3, “SMB” for more information.
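The Native FUSE and Gluster NFS checks in Table 6.9 are simple text tests, so they are easy to script when auditing many mounts. The Python sketch below applies the rules from the table to a captured mount line, ps line, and gluster volume info output. The sample mount options in the usage example are hypothetical, and the parser assumes the "(option,option,...)" suffix that the mount command prints.

```python
def fuse_acl_from_mount(line):
    """Native FUSE: ACLs are off if default_permissions is among the mount options."""
    options = line[line.rindex("(") + 1:line.rindex(")")].split(",")
    return "default_permissions" not in options


def fuse_acl_from_ps(line):
    """Native FUSE: ACLs are on if the glusterfs process was started with --acl."""
    return "--acl" in line.split()


def gnfs_acl_enabled(volume_info):
    """Gluster NFS: ACLs are on unless `gluster volume info` lists nfs.acl: off."""
    for line in volume_info.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "nfs.acl":
            return value.strip() != "off"
    return True  # nfs.acl not listed: the default (enabled) applies
```

For example, fuse_acl_from_mount("rhs-a:/repvol on /mnt type fuse.glusterfs (rw,relatime,default_permissions,allow_other)") returns False.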

6.5. Managing Object Store

Object Store provides a system for data storage that enables users to access the same data, both as an object and as a file, thus simplifying management and controlling storage costs.
Red Hat Gluster Storage is based on glusterFS, an open source distributed file system. Object Store technology is built upon OpenStack Swift, which allows users to store and retrieve files and content as objects through a simple web service REST (Representational State Transfer) interface. Red Hat Gluster Storage uses glusterFS as a back-end file system for OpenStack Swift. It combines OpenStack Swift's REST interface for storing and retrieving files over the web with glusterFS features such as scalability, high availability, replication, and elastic volume management for data management at the disk level.
Object Store technology enables enterprises to adopt and deploy cloud storage solutions. It allows users to access and modify data as objects from a REST interface along with the ability to access and modify files from NAS interfaces. In addition to decreasing cost and making it faster and easier to access object data, it also delivers massive scalability, high availability and replication of object storage. Infrastructure as a Service (IaaS) providers can utilize Object Store technology to enable their own cloud storage service. Enterprises can use this technology to accelerate the process of preparing file-based applications for the cloud and simplify new application development for cloud computing environments.
OpenStack Swift is open source software for creating redundant, scalable object storage using clusters of standardized servers to store petabytes of accessible data. It is not a file system or real-time data storage system, but rather a long-term storage system for more permanent, static data that can be retrieved, leveraged, and updated.

6.5.1. Architecture Overview

OpenStack Swift and Red Hat Gluster Storage integration consists of:
The following diagram illustrates OpenStack Object Storage integration with Red Hat Gluster Storage:

Figure 6.2. Object Store Architecture

Important

On Red Hat Enterprise Linux 7, enable the Object Store firewall service in the active zones for runtime and permanent mode using the following commands:
To get a list of active zones, run the following command:
# firewall-cmd  --get-active-zones
To add ports to the active zones, run the following commands:
# firewall-cmd  --zone=zone_name  --add-port=6010/tcp  --add-port=6011/tcp --add-port=6012/tcp  --add-port=8080/tcp

# firewall-cmd  --zone=zone_name --add-port=6010/tcp  --add-port=6011/tcp --add-port=6012/tcp  --add-port=8080/tcp   --permanent
Add the port number 443 only if your swift proxy server is configured with SSL. To add the port number, run the following commands:
# firewall-cmd --zone=zone_name --add-port=443/tcp
# firewall-cmd --zone=zone_name --add-port=443/tcp --permanent

6.5.2. Components of Object Store

The major components of Object Storage are:
Proxy Server
The Proxy Server is responsible for connecting to the rest of the OpenStack Object Storage architecture. For each request, it looks up the location of the account, container, or object in the ring and routes the request accordingly. The public API is also exposed through the proxy server. When objects are streamed to or from an object server, they are streamed directly through the proxy server to or from the user – the proxy server does not spool them.
The Ring
The Ring maps swift accounts to the appropriate Red Hat Gluster Storage volume. When other components need to perform any operation on an object, container, or account, they need to interact with the Ring to determine the correct Red Hat Gluster Storage volume.
Object and Object Server
An object is the basic storage entity and any optional metadata that represents the data you store. When you upload data, the data is stored as-is (with no compression or encryption).
The Object Server is a very simple storage server that can store, retrieve, and delete objects stored on local devices.
Container and Container Server
A container is a storage compartment for your data and provides a way for you to organize your data. Containers can be visualized as directories in a Linux system. However, unlike directories, containers cannot be nested. Data must be stored in a container and hence the objects are created within a container.
The Container Server’s primary job is to handle listings of objects. The listing is done by querying the glusterFS mount point with a path. This query returns a list of all files and directories present under that container.
Accounts and Account Servers
The OpenStack Swift system is designed to be used by many different storage consumers.
The Account Server is very similar to the Container Server, except that it is responsible for listing containers rather than objects. In Object Store, each Red Hat Gluster Storage volume is an account.
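To make the account, container, and object relationships concrete, the Python sketch below maps a Swift request path of the form /v1/account/container/object onto a volume and a file path. Several details are assumptions, not documented behavior. The AUTH_ reseller prefix is stripped from the account name to obtain the volume name. Volumes are mounted under /mnt/gluster-object. Slashes in an object name become subdirectories. Verify each of these against your own deployment.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote

MOUNT_ROOT = "/mnt/gluster-object"  # assumed mount root for Object Store volumes
RESELLER_PREFIX = "AUTH_"           # assumed account-name prefix


class ObjectLocation(NamedTuple):
    volume: str                 # account name without the prefix = volume name
    container: Optional[str]    # top-level directory on the volume
    obj: Optional[str]          # file path inside the container
    fs_path: str                # corresponding path on the glusterFS mount


def locate(request_path):
    """Map /v1/<account>[/<container>[/<object>]] to a volume and file path."""
    parts = request_path.lstrip("/").split("/", 3)
    if len(parts) < 2 or parts[0] != "v1":
        raise ValueError("expected /v1/<account>[/<container>[/<object>]]")
    account = unquote(parts[1])
    container = unquote(parts[2]) if len(parts) > 2 and parts[2] else None
    obj = unquote(parts[3]) if len(parts) > 3 and parts[3] else None
    volume = account[len(RESELLER_PREFIX):] if account.startswith(RESELLER_PREFIX) else account
    fs_path = "/".join(p for p in (MOUNT_ROOT, volume, container, obj) if p)
    return ObjectLocation(volume, container, obj, fs_path)
```

Under these assumptions, locate("/v1/AUTH_test/photos/2016/cat.jpg") resolves to the volume test, the container photos, and the file /mnt/gluster-object/test/photos/2016/cat.jpg.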
Authentication and Access Permissions
Object Store provides an option of using an authentication service to authenticate and authorize user access. Once the authentication service correctly identifies the user, it will provide a token which must be passed to Object Store for all subsequent container and object operations.
Other than using your own authentication services, the following authentication services are supported by Object Store:
  • Authenticate Object Store against an external OpenStack Keystone server.
    Each Red Hat Gluster Storage volume is mapped to a single account. Each account can have multiple users with different privileges based on the group and role they are assigned to. After authenticating using accountname:username and password, the user is issued a token, which is used for all subsequent REST requests.
    Integration with Keystone

    When you integrate Red Hat Gluster Storage Object Store with Keystone authentication, you must ensure that the Swift account name and Red Hat Gluster Storage volume name are the same. It is common that Red Hat Gluster Storage volumes are created before exposing them through the Red Hat Gluster Storage Object Store.

    When working with Keystone, account names are defined by Keystone as the tenant ID. You must create the Red Hat Gluster Storage volume using the Keystone tenant ID as the volume name. This means that you must create the Keystone tenant before creating the Red Hat Gluster Storage volume.

    Important

    Red Hat Gluster Storage does not contain any Keystone server components. It acts only as a Keystone client. After you create a volume for Keystone, ensure that you export this volume so that it can be accessed using the object storage interface. For more information on exporting volumes, see Section 6.5.7.8, “Exporting the Red Hat Gluster Storage Volumes”.
    Integration with GSwauth

    GSwauth is a Web Server Gateway Interface (WSGI) middleware that uses a Red Hat Gluster Storage volume as its backing store to maintain its metadata. The benefit of this authentication service is that, because the metadata is saved on a Red Hat Gluster Storage volume, it is available to all proxy servers.

    To protect the metadata, the Red Hat Gluster Storage volume should be mountable only by the systems running the proxy servers. For more information on mounting volumes, see Chapter 6, Creating Access to Volumes.
    Integration with TempAuth

    You can also use the TempAuth authentication service to test Red Hat Gluster Storage Object Store in the data center.

6.5.3. Advantages of using Object Store

The advantages of using Object Store include:
  • Default object size limit of 1 TiB
  • Unified view of data across NAS and Object Storage technologies
  • High availability
  • Scalability
  • Replication
  • Elastic Volume Management

6.5.4. Limitations

This section lists the limitations of using Red Hat Gluster Storage Object Store:
  • Object Name
    Object Store imposes the following constraints on the object name to maintain the compatibility with network file access:
    • Object names must not be prefixed or suffixed by a '/' character. For example, /a/b and a/b/ are not valid object names.
    • Object names must not contain multiple contiguous '/' characters. For example, a//b is not a valid object name.
  • Account Management
    • Object Store does not allow account management even though OpenStack Swift allows the management of accounts. This limitation is because Object Store treats accounts equivalent to the Red Hat Gluster Storage volumes.
    • Object Store does not support account names (i.e. Red Hat Gluster Storage volume names) having an underscore.
    • In Object Store, every account must map to a Red Hat Gluster Storage volume.
  • Subdirectory Listing
    The headers Content-Type: application/directory and Content-Length: 0 can be used to create subdirectory objects under a container, but a GET request on a subdirectory does not list the objects under it.
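The object-name and account-name rules above can be checked on the client before a request is sent. The following is a minimal Bash sketch; the function names is_valid_object_name and is_valid_account_name are hypothetical and are not part of Red Hat Gluster Storage. Only the rules listed in this section are checked, and rejecting an empty name is an added assumption.

```shell
# Hypothetical client-side checks for the Object Store naming limitations.

# Returns 0 if the object name satisfies the '/' rules listed above.
is_valid_object_name() {
    local name="$1"
    [ -n "$name" ] || return 1      # assumption: empty names are rejected too
    case "$name" in
        /*|*/) return 1 ;;          # leading or trailing '/'
        *//*)  return 1 ;;          # multiple contiguous '/' characters
    esac
    return 0
}

# Returns 0 if the account (volume) name contains no underscore.
is_valid_account_name() {
    local name="$1"
    [ -n "$name" ] || return 1
    case "$name" in
        *_*) return 1 ;;
    esac
    return 0
}
```

For example, is_valid_object_name "a//b" fails, while is_valid_object_name "dir/file.txt" succeeds.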

6.5.5. Swift API Support Matrix

Subject to the limitations mentioned in Section 6.5.4, “Limitations”, the following table describes the support status of the current Swift API features:
Table 6.10. Supported Features
Feature | Status
Authentication | Supported
Get Account Metadata | Supported
Swift ACLs | Supported
List Containers | Supported
Delete Container | Supported
Create Container | Supported
Get Container Metadata | Supported
Update Container Metadata | Supported
Delete Container Metadata | Supported
List Objects | Supported
Static Website | Supported
Create/Update an Object | Supported
Create Large Object | Supported
Delete Object | Supported
Get Object | Supported
Copy Object | Supported
Get Object Metadata | Supported
Add/Update Object Metadata | Supported
Temp URL Operations | Supported
Expiring Objects | Supported
Object Versioning | Supported
Cross-Origin Resource Sharing (CORS) | Supported
Bulk Upload | Supported
Account Quota | Unsupported
Container Quota | Unsupported

6.5.6. Prerequisites

Ensure that you do the following before using Red Hat Gluster Storage Object Store.
  • Ensure that the openstack-swift-* and swiftonfile packages have matching version numbers.
    # rpm -qa | grep swift
    openstack-swift-container-1.13.1-6.el7ost.noarch
    openstack-swift-object-1.13.1-6.el7ost.noarch
    swiftonfile-1.13.1-6.el7rhgs.noarch
    openstack-swift-proxy-1.13.1-6.el7ost.noarch
    openstack-swift-doc-1.13.1-6.el7ost.noarch
    openstack-swift-1.13.1-6.el7ost.noarch
    openstack-swift-account-1.13.1-6.el7ost.noarch
  • Ensure that the gluster-swift services are owned by and run as the root user, not the swift user as in a typical OpenStack installation.
    # cd /usr/lib/systemd/system
    # sed -i s/User=swift/User=root/ openstack-swift-proxy.service openstack-swift-account.service openstack-swift-container.service openstack-swift-object.service openstack-swift-object-expirer.service
  • Start the memcached service:
    # service memcached start
  • Ensure that the ports for the Object, Container, Account, and Proxy servers are open. Note that the ports used for these servers are configurable. The ports listed in Table 6.11, “Ports required for Red Hat Gluster Storage Object Store” are the default values.
    Table 6.11. Ports required for Red Hat Gluster Storage Object Store
    Server | Port
    Object Server | 6010
    Container Server | 6011
    Account Server | 6012
    Proxy Server (HTTPS) | 443
    Proxy Server (HTTP) | 8080
  • Create and mount a Red Hat Gluster Storage volume for use as a Swift account. For information on creating Red Hat Gluster Storage volumes, see Chapter 5, Setting Up Storage Volumes. For information on mounting Red Hat Gluster Storage volumes, see Chapter 6, Creating Access to Volumes.
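The package version check in the first prerequisite can be automated. In the hypothetical helper below, rpm prints each package's name and version-release using its --qf (--queryformat) option, and the distribution tag (such as .el7ost or .el7rhgs) is stripped before comparing. This assumes that only the version-release numbers, 1.13.1-6 in the example above, need to match; the function name swift_versions_match is illustrative.

```shell
# Hypothetical helper: reads "NAME VERSION-RELEASE" lines on stdin and
# succeeds only if every package has the same version-release once the
# distribution tag (for example .el7ost or .el7rhgs) is removed.
swift_versions_match() {
    awk '
        NF >= 2 {
            v = $2
            sub(/\.el[0-9].*$/, "", v)      # drop the dist tag
            if (!(v in seen)) { seen[v] = 1; distinct++ }
            total++
        }
        END { ok = (total > 0 && distinct == 1); exit !ok }'
}

# Usage on a storage node:
#   rpm -qa --qf '%{NAME} %{VERSION}-%{RELEASE}\n' \
#     | grep -E '^(openstack-swift|swiftonfile)' \
#     | swift_versions_match && echo "versions match"
```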

6.5.7. Configuring the Object Store

This section provides instructions on how to configure Object Store in your storage environment.

Warning

When you install Red Hat Gluster Storage 3.2 or later, the /etc/swift directory contains both *.conf and *.conf-gluster files. You must delete the *.conf files and create new configuration files based on the *.conf-gluster templates. Otherwise, inappropriate Python packages are loaded and the components may not work as expected.
If you are upgrading to Red Hat Gluster Storage 3.2 or later, the older configuration files are retained and the new configuration files are created with the .rpmnew extension. Delete the old .conf files and the account-server, container-server, and object-server directories so that it is clear which configuration is loaded.
6.5.7.1. Configuring a Proxy Server
Create a new configuration file /etc/swift/proxy-server.conf by referencing the template file available at /etc/swift/proxy-server.conf-gluster.
6.5.7.1.1. Configuring a Proxy Server for HTTPS
By default, the proxy server handles only HTTP requests. To configure the proxy server to process HTTPS requests, perform the following steps:
  1. Create a self-signed certificate for SSL using the following commands:
    # cd /etc/swift
    # openssl req -new -x509 -nodes -out cert.crt -keyout cert.key
  2. Add the following lines to /etc/swift/proxy-server.conf under [DEFAULT]:
    bind_port = 443
    cert_file = /etc/swift/cert.crt
    key_file = /etc/swift/cert.key

Important

When Object Storage is deployed on two or more machines, not all nodes in your trusted storage pool are used. Installing a load balancer enables you to utilize all the nodes in your trusted storage pool by distributing the proxy server requests equally to all storage nodes.
Memcached allows nodes' states to be shared across multiple proxy servers. Edit the memcache_servers configuration option in the proxy-server.conf and list all memcached servers.
Following is an example listing the memcached servers in the proxy-server.conf file.
[filter:cache]
use = egg:swift#memcache
memcache_servers = 192.168.1.20:11211,192.168.1.21:11211,192.168.1.22:11211
In this example, each memcached server listens on port 11211. List the memcached servers in the same order in all proxy-server.conf files.
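Because the memcached list must be identical and in the same order on every proxy node, it can help to generate the memcache_servers line from a single host list. The following hypothetical helper is a sketch that appends the default memcached port 11211 when a host has no port; it does not handle IPv6 address literals.

```shell
# Hypothetical helper: prints a memcache_servers line for proxy-server.conf.
# Each argument is a memcached host, optionally as host:port.
memcache_servers_line() {
    local out="" host
    for host in "$@"; do
        case "$host" in
            *:*) ;;                          # port already supplied
            *)   host="$host:11211" ;;       # assume the default port
        esac
        out="${out:+$out,}$host"             # comma-separate, keep order
    done
    printf 'memcache_servers = %s\n' "$out"
}
```

Running memcache_servers_line 192.168.1.20 192.168.1.21 192.168.1.22 on each node produces the same line as the example above, in the same order.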
6.5.7.2. Configuring the Authentication Service
This section provides information on configuring Keystone, GSwauth, and TempAuth authentication services.
6.5.7.2.1. Integrating with the Keystone Authentication Service
  • To configure Keystone, add authtoken and keystoneauth to /etc/swift/proxy-server.conf pipeline as shown below:
    [pipeline:main]
    pipeline = catch_errors healthcheck proxy-logging cache authtoken keystoneauth proxy-logging proxy-server
  • Add the following sections to /etc/swift/proxy-server.conf file by referencing the example below as a guideline. You must substitute the values according to your setup:
    [filter:authtoken]
    paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
    signing_dir = /etc/swift
    auth_host = keystone.server.com
    auth_port = 35357
    auth_protocol = http
    auth_uri = http://keystone.server.com:5000
    # if it is defined
    admin_tenant_name = services
    admin_user = swift
    admin_password = adminpassword
    delay_auth_decision = 1
    
    [filter:keystoneauth]
    use = egg:swift#keystoneauth
    operator_roles = admin, SwiftOperator
    is_admin = true
    cache = swift.cache
Verify the Integrated Setup

Verify that the Red Hat Gluster Storage Object Store has been configured successfully by running the following command:

$ swift -V 2 -A http://keystone.server.com:5000/v2.0 -U tenant_name:user -K password stat
6.5.7.2.2. Integrating with the GSwauth Authentication Service
Integrating GSwauth

Perform the following steps to integrate GSwauth:

  1. Create and start a Red Hat Gluster Storage volume to store metadata.
    # gluster volume create NEW-VOLNAME NEW-BRICK
    # gluster volume start NEW-VOLNAME
    For example:
    # gluster volume create gsmetadata server1:/rhgs/brick1
    # gluster volume start gsmetadata
  2. Run gluster-swift-gen-builders tool with all the volumes to be accessed using the Swift client including gsmetadata volume:
    # gluster-swift-gen-builders gsmetadata [VOLUME...]
  3. Edit the /etc/swift/proxy-server.conf pipeline as shown below:
    [pipeline:main]
    pipeline = catch_errors cache gswauth proxy-server
  4. Add the following section to /etc/swift/proxy-server.conf file by referencing the example below as a guideline. You must substitute the values according to your setup.
    [filter:gswauth]
    use = egg:gluster_swift#gswauth
    set log_name = gswauth
    super_admin_key = gswauthkey
    metadata_volume = gsmetadata
    auth_type = sha1
    auth_type_salt = swauthsalt

    Important

    Secure the proxy-server.conf file, which contains the super_admin_key option, to prevent unprivileged access.
  5. Restart the proxy server by running the following command:
    # swift-init proxy restart
Advanced Options:

You can set the following advanced options for GSwauth WSGI filter:

  • default-swift-cluster: The default storage-URL for the newly created accounts. When you attempt to authenticate for the first time, the access token and the storage-URL where data for the given account is stored will be returned.
  • token_life: The default token life, in seconds. The default value is 86400 (24 hours).
  • max_token_life: The maximum token life. You can set a token lifetime when requesting a new token with the x-auth-token-lifetime header. If the requested value is greater than max_token_life, the max_token_life value is used.
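The interaction between token_life, max_token_life, and the x-auth-token-lifetime header can be modelled as follows. This is a hypothetical sketch of the rule as described above, not GSwauth's own code; all values are in seconds.

```shell
# Hypothetical model of the token lifetime rule described above.
#   $1 = requested x-auth-token-lifetime (empty if the header is not sent)
#   $2 = token_life, $3 = max_token_life
effective_token_life() {
    local requested="$1" token_life="$2" max_token_life="$3"
    if [ -z "$requested" ]; then
        printf '%s\n' "$token_life"        # no header: default token life
    elif [ "$requested" -gt "$max_token_life" ]; then
        printf '%s\n' "$max_token_life"    # request capped at the maximum
    else
        printf '%s\n' "$requested"
    fi
}
```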
GSwauth Common Options of CLI Tools

GSwauth provides CLI tools to facilitate managing accounts and users. All tools have some options in common:

  • -A, --admin-url: The URL to the auth. The default URL is http://127.0.0.1:8080/auth/.
  • -U, --admin-user: The user with administrator rights to perform the action. The default user is .super_admin.
  • -K, --admin-key: The key for the user with administrator rights to perform the action. There is no default value.
Preparing Red Hat Gluster Storage Volumes to Save Metadata

Prepare the Red Hat Gluster Storage volume for gswauth to save its metadata by running the following command:

# gswauth-prep [option]
For example:
# gswauth-prep -A http://10.20.30.40:8080/auth/ -K gswauthkey
6.5.7.2.2.1. Managing Account Services in GSwauth
Creating Accounts

Create an account for GSwauth. This account is mapped to a Red Hat Gluster Storage volume.

# gswauth-add-account [option] <account_name>
For example:
# gswauth-add-account -K gswauthkey <account_name>
Deleting an Account

Ensure that all users of this account are deleted before you delete the account. To delete an account:

# gswauth-delete-account [option] <account_name>
For example:
# gswauth-delete-account -K gswauthkey test
Setting the Account Service

Sets a service URL for an account. Only a user with the reseller admin role can set the service URL. This command can be used to change the default storage URL for a given account. By default, all accounts have the same storage URL, which is set using the default-swift-cluster option.

# gswauth-set-account-service [options] <account> <service> <name> <value>
For example:
# gswauth-set-account-service -K gswauthkey test storage local http://newhost:8080/v1/AUTH_test
6.5.7.2.2.2. Managing User Services in GSwauth
User Roles

The following user roles are supported in GSwauth:

  • A regular user has no rights. Regular users must be granted read and write privileges using Swift ACLs.
  • The admin user is a super-user at the account level. This user can create and delete users for that account. These members will have both write and read privileges to all stored objects in that account.
  • The reseller admin user is a super-user at the cluster level. This user can create and delete accounts and users and has read and write privileges to all accounts under that cluster.
  • GSwauth maintains its own Swift account to store all of its metadata on accounts and users. The .super_admin role provides access to this GSwauth account and has all privileges to act on any other account or user.
The following table provides user access right information.
Table 6.12. User Role/Group with Allowed Actions
Role/Group | Allowed Actions
.super_admin (username) | Get Account List; Get Account Details; Create Account; Delete Account; Get User Details; Create admin user; Create reseller_admin user; Create regular user; Delete admin user
.reseller_admin (group) | Get Account List; Get Account Details; Create Account; Delete Account; Get User Details; Create admin user; Create regular user; Delete admin user
.admin (group) | Get Account Details; Get User Details; Create admin user; Create regular user; Delete admin user
regular user (type) | No administrative actions
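Table 6.12 can also be read as a simple lookup from role or group to allowed actions. The following sketch encodes it in a hypothetical shell function; the action keywords (such as create-reseller-admin) are invented for this illustration and are not GSwauth command names.

```shell
# Hypothetical encoding of Table 6.12. Returns 0 if the role or group in $1
# may perform the action keyword in $2 (keywords are illustrative only).
gswauth_allowed() {
    local role="$1" action="$2" actions
    [ -n "$action" ] || return 1
    case "$role" in
        .super_admin)
            actions="list-accounts get-account create-account delete-account get-user create-admin create-reseller-admin create-user delete-admin" ;;
        .reseller_admin)
            actions="list-accounts get-account create-account delete-account get-user create-admin create-user delete-admin" ;;
        .admin)
            actions="get-account get-user create-admin create-user delete-admin" ;;
        *)
            actions="" ;;                    # regular users: no admin actions
    esac
    case " $actions " in
        *" $action "*) return 0 ;;           # exact word match in the list
    esac
    return 1
}
```

For example, gswauth_allowed .super_admin create-reseller-admin succeeds, while the same call for .reseller_admin fails, matching the table.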
Creating Users

If you create a user for an account that does not exist, the account is created before the user.

Add the -r flag to create a reseller admin user or the -a flag to create an admin user. To change the password or role of a user, run the same command with the new values.
# gswauth-add-user [option] <account_name> <user> <password>
For example:
# gswauth-add-user -K gswauthkey -a test ana anapwd
Deleting a User

Delete a user by running the following command:

# gswauth-delete-user [option] <account_name> <user>
For example:
# gswauth-delete-user -K gswauthkey test ana
Authenticating a User with the Swift Client

There are two methods to access data using the Swift client. The first and simpler method is to provide the user name and password every time; the Swift client acquires the token from GSwauth.

For example:
$ swift -A http://127.0.0.1:8080/auth/v1.0 -U test:ana -K anapwd upload container1 README.md
The second method is a two-step process, first you must authenticate with a username and password to obtain a token and the storage URL. Then, you can make the object requests to the storage URL with the given token.
Tokens expire, so the authentication process must be repeated periodically.
Authenticate a user with the cURL command:
# curl -v -H 'X-Storage-User: test:ana' -H 'X-Storage-Pass: anapwd' -k http://localhost:8080/auth/v1.0
...
< X-Auth-Token: AUTH_tk7e68ef4698f14c7f95af07ab7b298610
< X-Storage-Url: http://127.0.0.1:8080/v1/AUTH_test
...
Now, use the returned token and storage URL to access the object storage using the Swift client:
$ swift --os-auth-token=AUTH_tk7e68ef4698f14c7f95af07ab7b298610 --os-storage-url=http://127.0.0.1:8080/v1/AUTH_test upload container1 README.md
README.md
$ swift --os-auth-token=AUTH_tk7e68ef4698f14c7f95af07ab7b298610 --os-storage-url=http://127.0.0.1:8080/v1/AUTH_test list container1
README.md
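The two-step method can be scripted by extracting the X-Auth-Token and X-Storage-Url headers from the curl response. The helper below, get_header, is a hypothetical sketch that reads curl output on standard input and accepts both curl -i header lines and curl -v lines prefixed with "< ". The commented usage lines reuse the example account test:ana and require a running proxy server.

```shell
# Hypothetical helper: prints the value of response header $1 from curl
# output on stdin. Header names are matched case-insensitively, a leading
# "< " (curl -v) and a trailing carriage return are removed.
get_header() {
    awk -v name="$1" '
        { sub(/\r$/, ""); sub(/^< /, "") }
        tolower(substr($0, 1, length(name) + 1)) == (tolower(name) ":") {
            value = substr($0, length(name) + 2)
            sub(/^[ \t]+/, "", value)
            print value
            exit
        }'
}

# Usage against a running proxy server:
#   resp=$(curl -s -i -H 'X-Storage-User: test:ana' -H 'X-Storage-Pass: anapwd' \
#          http://localhost:8080/auth/v1.0)
#   token=$(printf '%s\n' "$resp" | get_header X-Auth-Token)
#   url=$(printf '%s\n' "$resp" | get_header X-Storage-Url)
#   swift --os-auth-token="$token" --os-storage-url="$url" list container1
```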

Important

Reseller admins must always use the second method to acquire a token when accessing accounts other than their own. The first method, which uses the user name and password, gives them access only to their own account.
6.5.7.2.2.3. Managing Accounts and Users Information
Obtaining Accounts and User Information

You can obtain account and user information, including stored passwords.

# gswauth-list [options] [account] [user]
For example:
# gswauth-list -K gswauthkey test ana
+----------+
|  Groups  |
+----------+
| test:ana |
|   test   |
|  .admin  |
+----------+
  • If [account] and [user] are omitted, all the accounts will be listed.
  • If [account] is included but not [user], a list of users within that account will be listed.
  • If [account] and [user] are included, a list of groups that the user belongs to will be listed.
  • If the [user] is .groups, the active groups for that account will be listed.
The default output format is tabular. Add the -p option for plain text output or the -j option for JSON output.
Changing User Password

You can change the password of the user, account administrator, and reseller_admin roles.

  • Change the password of a regular user by running the following command:
    # gswauth-add-user -U account1:user1 -K old_passwd account1 user1 new_passwd
  • Change the password of an account administrator by running the following command:
    # gswauth-add-user -U account1:admin -K old_passwd -a account1 admin new_passwd
  • Change the password of the reseller_admin by running the following command:
    # gswauth-add-user -U account1:radmin -K old_passwd -r account1 radmin new_passwd
Cleaning Up Expired Tokens

Users with .super_admin role can delete the expired tokens.

You also have the option to provide the expected life of tokens, delete all tokens or delete all tokens for a given account.
# gswauth-cleanup-tokens [options]
For example:
# gswauth-cleanup-tokens -K gswauthkey --purge test
The tokens are deleted from disk, but they still persist in memcached.
You can add the following options while cleaning up the tokens:
  • -t, --token-life: The expected life of tokens. Token objects modified more than the given number of seconds ago are checked for expiration (default: 86400).
  • --purge: Purges all the tokens for a given account whether the tokens have expired or not.
  • --purge-all: Purges all the tokens for all the accounts and users whether the tokens have expired or not.
6.5.7.2.3. Integrating with the TempAuth Authentication Service

Warning

TempAuth authentication service must only be used in test deployments and not for production.
TempAuth is automatically installed when you install Red Hat Gluster Storage. TempAuth stores user and password information as cleartext in a single proxy-server.conf file. In your /etc/swift/proxy-server.conf file, enable TempAuth in the pipeline and add user information in the [filter:tempauth] section, using the example below as a reference.
[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache tempauth proxy-logging proxy-server

[filter:tempauth]
use = egg:swift#tempauth
user_admin_admin = admin .admin .reseller_admin
user_test_tester = testing .admin
user_test_tester2 = testing2
You can add users to the account in the following format:
user_accountname_username = password [.admin]
Here the accountname is the Red Hat Gluster Storage volume used to store objects.
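TempAuth entries follow the user_accountname_username format shown above, so they can be generated rather than typed by hand. The helper below is a hypothetical sketch. It rejects account names that contain an underscore, because Object Store does not support volume names with underscores (see Section 6.5.4, “Limitations”), and it does not handle passwords that contain spaces.

```shell
# Hypothetical helper: prints a TempAuth entry in the format
#   user_<accountname>_<username> = <password> [.admin]
# Pass "admin" as the fourth argument to add the .admin group.
tempauth_user_line() {
    local account="$1" user="$2" password="$3" role="$4"
    case "$account" in
        ''|*_*) echo "invalid account name: $account" >&2; return 1 ;;
    esac
    if [ "$role" = "admin" ]; then
        printf 'user_%s_%s = %s .admin\n' "$account" "$user" "$password"
    else
        printf 'user_%s_%s = %s\n' "$account" "$user" "$password"
    fi
}
```

For example, tempauth_user_line test tester testing admin prints the user_test_tester line from the example above.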
You must restart the Object Store services for the configuration changes to take effect. For information on restarting the services, see Section 6.5.7.9, “Starting and Stopping Server”.
6.5.7.3. Configuring Object Servers
Create a new configuration file /etc/swift/object-server.conf by referencing the template file available at /etc/swift/object-server.conf-gluster.
6.5.7.4. Configuring Container Servers
Create a new configuration file /etc/swift/container-server.conf by referencing the template file available at /etc/swift/container-server.conf-gluster.
6.5.7.5. Configuring Account Servers
Create a new configuration file /etc/swift/account-server.conf by referencing the template file available at /etc/swift/account-server.conf-gluster.
6.5.7.6. Configuring Swift Object and Container Constraints
Create a new configuration file /etc/swift/swift.conf by referencing the template file available at /etc/swift/swift.conf-gluster.
6.5.7.7. Configuring Object Expiration
The Object Expiration feature allows you to schedule automatic deletion of objects stored in the Red Hat Gluster Storage volume. You can use the object expiration feature to specify a lifetime for specific objects in the volume; when the lifetime of an object expires, the object store stops serving that object and shortly thereafter removes it from the Red Hat Gluster Storage volume. For example, you might upload logs to the volume periodically and need to retain them only for a specific amount of time.
The client sets the X-Delete-At or X-Delete-After header during an object PUT or POST, and the Red Hat Gluster Storage volume automatically stops serving that object once it expires.

Note

Expired objects appear in container listings until they are deleted by the object-expirer daemon. This is an expected behavior.
A DELETE object request on an expired object would delete the object from Red Hat Gluster Storage volume (if it is yet to be deleted by the object expirer daemon). However, the client would get a 404 (Not Found) status in return. This is also an expected behavior.
6.5.7.7.1. Setting Up Object Expiration
Object expirer uses a separate account (a Red Hat Gluster Storage volume) named gsexpiring to manage object expiration. Hence, you must create a Red Hat Gluster Storage volume named gsexpiring.
Create a new configuration file /etc/swift/object-expirer.conf by referencing the template file available at /etc/swift/object-expirer.conf-gluster.
6.5.7.7.2. Using Object Expiration
When you use the X-Delete-At or X-Delete-After header during an object PUT or POST, the object is scheduled for deletion. The Red Hat Gluster Storage volume automatically stops serving that object at the specified time and shortly thereafter removes it from the volume.
Use the PUT operation when uploading a new object. To assign expiration headers to existing objects, use the POST operation.
X-Delete-At header

The X-Delete-At header requires a UNIX epoch timestamp, in integer form. For example, 1418884120 represents Thu, 18 Dec 2014 06:28:40 GMT. By setting the header to a specific epoch time, you indicate when you want the object to expire, stop being served, and be deleted completely from the Red Hat Gluster Storage volume. The current time in epoch notation can be found by running this command:

$ date +%s

  • Set the object expiry time during an object PUT with X-Delete-At header using cURL:
    # curl -v -X PUT -H 'X-Delete-At: 1392013619' http://127.0.0.1:8080/v1/AUTH_test/container1/object1 -T ./localfile
    Set the object expiry time during an object PUT with the X-Delete-At header using the swift client:
    # swift --os-auth-token=AUTH_tk99a39aecc3dd4f80b2b1e801d00df846 --os-storage-url=http://127.0.0.1:8080/v1/AUTH_test upload container1 ./localfile --header 'X-Delete-At: 1392013619'
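Because X-Delete-At takes a raw epoch value, it is useful to convert between epoch values and readable dates before setting the header. The following sketch assumes GNU date, as shipped with Red Hat Enterprise Linux; the function names are hypothetical.

```shell
# Assumes GNU date (the -d option and the @EPOCH input form).

# Print an X-Delete-At epoch value as a UTC date.
epoch_to_utc() {
    LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Print the epoch value for a UTC date such as "2014-12-18 06:28:40".
utc_to_epoch() {
    date -u -d "$1" +%s
}
```

For example, epoch_to_utc 1418884120 prints Thu, 18 Dec 2014 06:28:40 GMT, and utc_to_epoch '2014-12-18 06:28:40' prints 1418884120.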
X-Delete-After

The X-Delete-After header takes an integer number of seconds that represents the amount of time from now when you want the object to be deleted.

  • Set the object expiry time with an object PUT with X-Delete-After header using cURL:
    # curl -v -X PUT -H 'X-Delete-After: 3600' http://127.0.0.1:8080/v1/AUTH_test/container1/object1 -T ./localfile
    Set the object expiry time with an object PUT with the X-Delete-After header using the swift client:
    # swift --os-auth-token=AUTH_tk99a39aecc3dd4f80b2b1e801d00df846 --os-storage-url=http://127.0.0.1:8080/v1/AUTH_test upload container1 ./localfile --header 'X-Delete-After: 3600'
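Conceptually, X-Delete-After: N sets the same deadline as an X-Delete-At value of N seconds after the request time. The hypothetical helper below computes that absolute value, which is handy when you want to record or compare expiry times; the optional second argument lets you supply the reference time instead of using the current clock.

```shell
# Hypothetical helper: prints the X-Delete-At value equivalent to an
# X-Delete-After of $1 seconds, measured from $2 (default: current time).
delete_after_to_at() {
    local after="$1" now="${2:-$(date +%s)}"
    case "$after" in
        ''|*[!0-9]*)
            echo "X-Delete-After must be a non-negative integer" >&2
            return 1 ;;
    esac
    echo $(( now + after ))
}
```

For example, delete_after_to_at 3600 1392010019 prints 1392013619, the X-Delete-At value used in the earlier examples.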
6.5.7.7.3. Running Object Expirer Service
The object-expirer service runs once every 300 seconds by default. You can modify the duration by configuring the interval option in the /etc/swift/object-expirer.conf file. On each pass, it queries the gsexpiring account for tracker objects. Based on the timestamp and path present in the name of each tracker object, object-expirer deletes the actual object and the corresponding tracker object.
To start the object-expirer service:
# swift-init object-expirer start
To run the object-expirer once:
# swift-object-expirer -o -v /etc/swift/object-expirer.conf
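The decision the object-expirer makes for each tracker object can be sketched as follows. This assumes the upstream OpenStack Swift naming convention, in which a tracker object is named <timestamp>-<account>/<container>/<object>; that format is an assumption here, so check it against the tracker objects in your gsexpiring volume. The function name tracker_due is hypothetical.

```shell
# Hypothetical sketch of one object-expirer decision, assuming tracker
# names of the form <timestamp>-<account>/<container>/<object>.
# Prints the object path and returns 0 if the object is due at time $2
# (default: now); returns 1 if not yet due, 2 if the name is not understood.
tracker_due() {
    local name="$1" now="${2:-$(date +%s)}" ts path
    case "$name" in
        *-*) ;;
        *)   return 2 ;;
    esac
    ts="${name%%-*}"          # text before the first '-'
    path="${name#*-}"         # text after the first '-'
    case "$ts" in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$ts" -le "$now" ] || return 1
    printf '%s\n' "$path"
}
```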
6.5.7.8. Exporting the Red Hat Gluster Storage Volumes
After creating configuration files, you must now add configuration details for the system to identify the Red Hat Gluster Storage volumes to be accessible as Object Store. These configuration details are added to the ring files. The ring files provide the list of Red Hat Gluster Storage volumes to be accessible using the object storage interface to the Swift on File component.
Create the ring files for the current configurations by running the following command:
# cd /etc/swift
# gluster-swift-gen-builders VOLUME [VOLUME...]
For example,
# cd /etc/swift
# gluster-swift-gen-builders testvol1 testvol2 testvol3
Here testvol1, testvol2, and testvol3 are the Red Hat Gluster Storage volumes which will be mounted locally under the directory mentioned in the object, container, and account configuration files (default value is /mnt/gluster-object). The default value can be changed to a different path by changing the devices configurable option across all account, container, and object configuration files. The path must contain Red Hat Gluster Storage volumes mounted under directories having the same names as volume names. For example, if devices option is set to /home, it is expected that the volume named testvol1 be mounted at /home/testvol1.
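Before running gluster-swift-gen-builders, you can confirm that each volume is present under the devices path. The sketch below is hypothetical; it takes the devices path (by default /mnt/gluster-object) followed by the volume names. It only tests that each directory exists; for a stricter check that the directory is actually a mount point, mountpoint -q PATH from util-linux can be used instead.

```shell
# Hypothetical pre-flight check: $1 is the devices path, the remaining
# arguments are volume names. Reports every <devices>/<volume> directory
# that is missing and returns 1 if any are missing.
check_object_volumes() {
    local devices="${1%/}" vol rc=0
    shift
    for vol in "$@"; do
        if [ ! -d "$devices/$vol" ]; then
            echo "not found: $devices/$vol" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_object_volumes /mnt/gluster-object testvol1 testvol2 testvol3
```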
Note that all the volumes to be accessed using the Swift interface must be passed to the gluster-swift-gen-builders tool, even if they were previously added. The gluster-swift-gen-builders tool creates new ring files every time it runs successfully.
To remove a VOLUME, run gluster-swift-gen-builders only with the volumes which are required to be accessed using the Swift interface.
For example, to remove the testvol2 volume, run the following command:
# gluster-swift-gen-builders testvol1 testvol3
You must restart the Object Store services after creating the new ring files.
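Because gluster-swift-gen-builders always rebuilds the rings from the full list it is given, removing a volume means passing the complete list minus that volume. The hypothetical helper below computes that list; the commented lines show how it might be combined with the commands from this section.

```shell
# Hypothetical helper: prints the volumes in $2... with the volume $1 removed,
# preserving the original order.
volumes_without() {
    local drop="$1" vol out=""
    shift
    for vol in "$@"; do
        [ "$vol" = "$drop" ] && continue
        out="${out:+$out }$vol"
    done
    printf '%s\n' "$out"
}

# Usage (as root, in /etc/swift):
#   gluster-swift-gen-builders $(volumes_without testvol2 testvol1 testvol2 testvol3)
#   swift-init main restart
```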
6.5.7.9. Starting and Stopping Server
You must start or restart the server manually whenever you update or modify the configuration files. These processes must be owned and run by the root user.
  • To start the server, run the following command:
    # swift-init main start
  • To stop the server, run the following command:
    # swift-init main stop
  • To restart the server, run the following command:
    # swift-init main restart

6.5.8. Starting the Services Automatically

To configure the gluster-swift services to start automatically when the system boots, run the following commands:
On Red Hat Enterprise Linux 6:
# chkconfig memcached on
# chkconfig openstack-swift-proxy on
# chkconfig openstack-swift-account on
# chkconfig openstack-swift-container on
# chkconfig openstack-swift-object on
# chkconfig openstack-swift-object-expirer on
On Red Hat Enterprise Linux 7:
# systemctl enable openstack-swift-proxy.service
# systemctl enable openstack-swift-account.service
# systemctl enable openstack-swift-container.service
# systemctl enable openstack-swift-object.service
# systemctl enable openstack-swift-object-expirer.service
# systemctl enable memcached.service
Configuring the gluster-swift services to start at boot time by using the systemctl command may require additional configuration. Refer to https://access.redhat.com/solutions/2043773 for details if you encounter problems.

Important

You must restart all Object Store services whenever you change the configuration and ring files.

6.5.9. Working with the Object Store

For more information on Swift operations, see OpenStack Object Storage API Reference Guide available at http://docs.openstack.org/api/openstack-object-storage/1.0/content/ .
6.5.9.1. Creating Containers and Objects
Creating container and objects in Red Hat Gluster Storage Object Store is very similar to OpenStack swift. For more information on Swift operations, see OpenStack Object Storage API Reference Guide available at http://docs.openstack.org/api/openstack-object-storage/1.0/content/.
6.5.9.2. Creating Subdirectory under Containers
You can create a subdirectory object under a container using the headers Content-Type: application/directory and Content-Length: 0. However, although Object Store returns 200 OK for a GET request on a subdirectory, the response does not list the objects under that subdirectory.
6.5.9.3. Working with Swift ACLs
Swift ACLs work with users and accounts. ACLs are set at the container level and support lists for read and write access. For more information on Swift ACLs, see http://docs.openstack.org/user-guide/content/managing-openstack-object-storage-with-swift-cli.html.

6.6. Checking Client Operating Versions

Different versions of Red Hat Gluster Storage support different features. Servers and clients identify the features that they are capable of supporting using an operating version number, or op-version. The cluster.op-version parameter sets the required operating version for all volumes in a cluster on the server side. Each client supports a range of operating versions that are identified by a minimum (min-op-version) and maximum (max-op-version) supported operating version.
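The compatibility rule described above amounts to a range check: a client can work with the cluster when cluster.op-version falls between the client's min-op-version and max-op-version. The sketch below assumes exactly that rule and is not taken from glusterd itself; the function name and the numbers in the example are illustrative.

```shell
# Hypothetical helper: succeeds if a client whose supported range is
# [min_op, max_op] can work with a cluster running at cluster_op.
# Assumes the simple range rule described in the text.
client_supports_op_version() {
    local cluster_op="$1" min_op="$2" max_op="$3"
    [ "$min_op" -le "$cluster_op" ] && [ "$cluster_op" -le "$max_op" ]
}

# Example with illustrative numbers only:
#   client_supports_op_version 31302 1 31302 && echo "client is compatible"
```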
Check the operating versions of the clients connected to a given volume by running the following command:
For Red Hat Gluster Storage 3.2 and later:
# gluster volume status volname clients
Use all in place of the name of your volume if you want to see the operating versions of clients connected to all volumes in the cluster.

Before Red Hat Gluster Storage 3.2:

  1. Perform a state dump for the volume whose clients you want to check.
    # gluster volume statedump volname
  2. Locate the state dump directory:
    # gluster --print-statedumpdir
  3. Locate the state dump file and grep for client information.
    # grep -A4 "identifier=client_ip" statedumpfile

Chapter 7. Integrating Red Hat Gluster Storage with Windows Active Directory

This chapter describes the tasks necessary to integrate Red Hat Gluster Storage nodes into an existing Windows Active Directory domain. The following diagram shows the architecture of integrating Red Hat Gluster Storage with Windows Active Directory.

Figure 7.1. Active Directory Integration

This section assumes that you have an Active Directory domain installed. Before going into the configuration details, the following table lists the information used in the sections ahead, along with example values.
Table 7.1.
Information                        Example Value
DNS domain name / realm            addom.example.com
NetBIOS domain name                ADDOM
Name of administrative account     administrator
RHGS nodes                         rhs-srv1.addom.example.com, 192.168.56.10
                                   rhs-srv2.addom.example.com, 192.168.56.11
                                   rhs-srv3.addom.example.com, 192.168.56.12
NetBIOS name of the cluster        RHS-SMB

7.1. Prerequisites

Before integration, the following steps have to be completed on an existing Red Hat Gluster Storage environment:
  • Name Resolution

    The Red Hat Gluster Storage nodes must be able to resolve names from the AD domain via DNS. To verify this, use the following command:

    host dc1.addom.example.com
    where addom.example.com is the AD domain and dc1 is the name of a domain controller.
    For example, the /etc/resolv.conf file in a static network configuration could look like this:
    domain addom.example.com
    search addom.example.com
    nameserver 10.11.12.1 # dc1.addom.example.com
    nameserver 10.11.12.2 # dc2.addom.example.com
    This example assumes that both the domain controllers are also the DNS servers of the domain.
  • Kerberos Packages

    If you want to use the Kerberos client utilities, such as kinit and klist, manually install the krb5-workstation package using the following command:

    # yum -y install krb5-workstation
  • Synchronize Time Service

    The time on each Red Hat Gluster Storage node and on the Windows Active Directory server must be synchronized. Otherwise, Kerberos authentication may fail due to clock skew. In environments where time services are not reliable, the best practice is to configure the Red Hat Gluster Storage nodes to synchronize time from the Windows Server.

    On each Red Hat Gluster Storage node, edit the file /etc/ntp.conf so the time is synchronized from a known, reliable time service:
    # Enable writing of statistics records.
    #statistics clockstats cryptostats loopstats peerstats
    server ntp1.addom.example.com
    server 10.11.12.3
    Activate the change on each Red Hat Gluster Storage node by stopping the ntp daemon, updating the time, and then starting the ntp daemon again, using the following commands on each node:
    # service ntpd stop
    
    # service ntpd start
  • Samba Packages

    Install the following Samba packages along with their dependencies:

    • ctdb
    • samba
    • samba-client
    • samba-winbind
    • samba-winbind-modules
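The clock-skew requirement in the Synchronize Time Service prerequisite can be checked with a small shell function such as the following sketch. It is a hypothetical helper. It assumes that both readings are given in seconds, for example Unix epoch times from date +%s. The default tolerance is 300 seconds, the MIT Kerberos default for the clockskew setting in krb5.conf. If your realm uses a different value, pass it as the third argument.

```shell
# Hypothetical helper: succeed when two clock readings, given in seconds,
# differ by at most MAX_SKEW seconds (default 300, the MIT Kerberos default
# "clockskew" in krb5.conf).
# Usage: within_kerberos_skew LOCAL_SECONDS REMOTE_SECONDS [MAX_SKEW]
within_kerberos_skew() {
    local diff max=${3:-300}
    diff=$(( $1 - $2 ))
    # Use the absolute difference; either clock may be ahead.
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -le "$max" ]
}
```

After the domain join, net ads info reports a Server time offset (-26 in the example later in this chapter). Passing it as `within_kerberos_skew 0 -26` succeeds, because 26 seconds is within the default tolerance.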

7.2. Integration

Integrating Red Hat Gluster Storage Servers into an Active Directory domain involves the following series of steps:
  1. Configure Authentication
  2. Join Active Directory Domain
  3. Verify/Test Active Directory and Services

7.2.1. Configure Authentication

To join a cluster to the Active Directory domain, a few files must be edited manually on all nodes.

Note

  • Ensure that CTDB is configured before joining the Active Directory domain. For more information, see Section 6.3.1, Setting up CTDB for Samba, in the Red Hat Gluster Storage Administration Guide.
  • Back up the configuration and Samba’s databases (local and CTDB) before making any changes.
7.2.1.1. Basic Samba Configuration
As of Red Hat Gluster Storage 3.4 Batch 4 Update, the recommended idmap configuration method for new deployments is autorid, not tdb. Red Hat recommends autorid because, in addition to automatically calculating user and group identifiers like tdb, it performs fewer database transactions and read operations. It is also a prerequisite for supporting security identifier history (SID history).

Warning

Do not change the idmap configuration in existing deployments. Doing so requires a large number of changes, such as modifying the permissions and access control lists of all files in the shared file system, which, unless done carefully, can create user access problems. If you do need to change the idmap configuration settings for an existing deployment, contact Red Hat support for assistance.
The Samba configuration file /etc/samba/smb.conf must be identical on all nodes and must contain the relevant parameters for AD. A few other settings are also required to activate the mapping of user and group IDs.
The following example depicts the minimal Samba configuration for AD integration:
[global]
netbios name = RHS-SMB
workgroup = ADDOM
realm = addom.example.com
security = ads
clustering = yes
idmap config * : backend = autorid
idmap config * : range = 1000000-19999999
idmap config * : rangesize = 1000000

# -----------------RHS Options -------------------------
#
# The following line includes RHS-specific configuration options. Be careful with this line.

       include = /etc/samba/rhs-samba.conf

#=================Share Definitions =====================

Warning

The example above is the complete global section required in the smb.conf file. Ensure that nothing else appears in this section in order to prevent gluster mechanisms from changing settings when starting or stopping the ctdb lock volume.
The netbios name parameter consists of a single name, which must be the same on all cluster nodes. Windows clients access the cluster only through that name, either in this short form or as an FQDN. The individual node hostnames (rhs-srv1, rhs-srv2, …) must not be used for the netbios name parameter.

Note

  • The idmap range defines the lowest and highest identifier numbers that can be used. Specify a range large enough to cover the number of objects specified in rangesize.
  • The idmap rangesize specifies the number of identifiers available for each domain range. In this example, there are one million identifiers per domain range. The range parameter (1000000-19999999) provides 19 million identifiers in total, so there are 19 possible domain ranges.
  • If you want to be able to use the individual host names to also access specific nodes, you can add them to the netbios aliases parameter of smb.conf.
  • In an AD environment, it is usually not required to run nmbd. However, if you have to run nmbd, then make sure to set the cluster addresses smb.conf option to the list of public IP addresses of the cluster.
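The arithmetic in the note above can be reproduced with the following shell sketch.

The first function counts how many whole domain ranges of rangesize IDs fit into the configured range. The second applies the ID formula given in the idmap_autorid(8) manual page: ID = (RID % rangesize) + low end of range + range number * rangesize. autorid assigns the range number to each domain itself, so the function takes it as an input; you cannot choose it. Both functions are hypothetical helpers, not Samba tools.

```shell
# Number of whole domain ranges that fit into the inclusive range LOW-HIGH.
# Usage: autorid_range_count LOW HIGH RANGESIZE
autorid_range_count() {
    local low=$1 high=$2 size=$3
    echo $(( (high - low + 1) / size ))
}

# Unix ID for a RID, following the formula in idmap_autorid(8):
#   ID = (RID % RANGESIZE) + LOW + RANGE_NUMBER * RANGESIZE
# RANGE_NUMBER is assigned by autorid; it is passed in here for illustration.
# Usage: autorid_unix_id RID RANGE_NUMBER LOW RANGESIZE
autorid_unix_id() {
    local rid=$1 range_number=$2 low=$3 size=$4
    echo $(( rid % size + low + range_number * size ))
}
```

With the example configuration, `autorid_range_count 1000000 19999999 1000000` prints 19. `autorid_unix_id 500 0 1000000 1000000` prints 1000500, the ID for RID 500 in the first domain range.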
7.2.1.2. Alternative Configuration using ad backend
If you need full control over Active Directory IDs, you can adapt the Samba configuration further by using the idmap_ad module in addition to autorid. The idmap_ad module reads the Unix IDs from special Unix attributes in AD. These attributes must be configured by the AD domain administrator before Samba and winbind can use them.
For Samba to use idmap_ad, the AD domain administrator must prepare the AD domain to use the so-called Unix extensions and must assign Unix IDs to all users and groups that should be able to access the Samba server.
For example, the following extended Samba configuration file uses the idmap_ad backend for the ADDOM domain. The default autorid backend catches all objects from domains other than ADDOM.
[global]
netbios name = RHS-SMB
workgroup = ADDOM
realm = addom.example.com
security = ads
clustering = yes
idmap config * : backend = autorid
idmap config * : range = 1000000-1999999
idmap config ADDOM : backend = ad
idmap config ADDOM : range = 3000000-3999999
idmap config ADDOM : schema mode = rfc2307
winbind nss info = rfc2307

# -------------------RHS Options -------------------------------
#
# The following line includes RHS-specific configuration options. Be careful with this line.

       include = /etc/samba/rhs-samba.conf

#===================Share Definitions =========================

Note

  • The range for the idmap_ad configuration is prescribed by the AD configuration and must be obtained from the AD administrator.
  • Ranges for different idmap configurations must not overlap.
  • The schema mode and the winbind nss info setting should have the same value. If the domain is at level 2003R2 or newer, then rfc2307 is the correct value. For older domains, additional values sfu and sfu20 are available. See the manual pages of idmap_ad and smb.conf for further details.
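The rule that ranges must not overlap can be checked mechanically. The following hypothetical shell helper takes two ranges in the LOW-HIGH form used in smb.conf and succeeds if they share any ID.

Applied to this section's examples, it shows two things. The ranges of the alternative configuration (1000000-1999999 and 3000000-3999999) are disjoint. Pairing the wider autorid range from the basic configuration (1000000-19999999) with the ADDOM range 3000000-3999999 would overlap.

```shell
# Hypothetical helper: succeed when two inclusive ID ranges, each written
# as LOW-HIGH, have at least one ID in common.
# Usage: idmap_ranges_overlap LOW1-HIGH1 LOW2-HIGH2
idmap_ranges_overlap() {
    local a_low=${1%-*} a_high=${1#*-}
    local b_low=${2%-*} b_high=${2#*-}
    # Two intervals overlap when each starts no later than the other ends.
    [ "$a_low" -le "$b_high" ] && [ "$b_low" -le "$a_high" ]
}
```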
7.2.1.3. Verifying the Samba Configuration
Test the new configuration file using the testparm command. For example:
# testparm -s
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Loaded services file OK.

Server role: ROLE_DOMAIN_MEMBER

# Global parameters
[global]
    workgroup = ADDOM
    realm = addom.example.com
    netbios name = RHS-SMB
    security = ADS
    clustering = Yes
    winbind nss info = rfc2307
    idmap config addom : schema mode = rfc2307
    idmap config addom : range = 3000000-3999999
    idmap config addom : backend = ad
    idmap config * : range = 1000000-1999999
    idmap config * : backend = autorid
7.2.1.4. nsswitch Configuration
After Samba is configured, it must be enabled to use the mapped users and groups from AD. This is done through the local Name Service Switch (NSS), which must be made aware of winbind. To use the winbind NSS module, edit the /etc/nsswitch.conf file. Make sure the file contains the winbind entries for the passwd and group databases. For example:
...
passwd: files winbind
group: files winbind
...
This will enable the use of winbind and should make users and groups visible on the individual cluster node once Samba is joined to AD and winbind is started.
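To confirm that the edit took effect, you can run a check like the following sketch on each node. The function name is hypothetical. It succeeds only if both the passwd and group entries of the given file list winbind as a source. It reads /etc/nsswitch.conf by default, and commented-out lines do not count.

```shell
# Hypothetical helper: succeed when both the passwd and group lines of an
# nsswitch.conf-style file list "winbind" as a source. Commented-out lines
# (starting with #) do not match.
# Usage: nss_uses_winbind [FILE]
nss_uses_winbind() {
    local file=${1:-/etc/nsswitch.conf} db
    for db in passwd group; do
        grep -Eq "^[[:space:]]*${db}:(.*[[:space:]])?winbind([[:space:]]|\$)" "$file" || return 1
    done
}
```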

7.2.2. Join Active Directory Domain

Before joining AD, CTDB must be started so that the machine account information can be stored in a database file that CTDB makes available on all cluster nodes. In addition, all other Samba services should be stopped. If key-based SSH authentication without a password has been configured for the root user between the nodes, you can use the onnode tool to run these commands on all nodes from a single node:
# onnode all service ctdb start
# onnode all service winbind stop
# onnode all service smb stop

Note

  • If your configuration has CTDB managing Winbind and Samba, you can temporarily disable them with the following commands. Run these before the stop commands above, to prevent CTDB from going into an unhealthy state when the services are shut down:
    # onnode all ctdb event script disable 49.winbind
    # onnode all ctdb event script disable 50.samba
  • For some versions of RHGS, a bug in the SELinux policy prevents 'ctdb disablescript SCRIPT' from succeeding. If this is the case, 'chmod -x /etc/ctdb/events.d/SCRIPT' can be executed as a workaround from a root shell.
  • Winbind and smb are shut down primarily to prevent access to SMB services during the AD integration. These services may be left running, but access to them should then be prevented by some other means.
The join is initiated via the net utility from a single node:

Warning

The following step must be executed only on one cluster node and should not be repeated on other cluster nodes. CTDB makes sure that the whole cluster is joined by this step.
# net ads join -U Administrator
Enter Administrator's password:
Using short domain name -- ADDOM
Joined 'RHS-SMB' to dns domain 'addom.example.com'
Not doing automatic DNS update in a clustered setup.
After the join succeeds, the cluster IP addresses and the cluster NetBIOS name should be made public on the network. To register multiple public cluster IP addresses in the AD DNS server, use the net utility again:
# net ads dns register rhs-smb <PUBLIC IP 1> <PUBLIC IP 2> ...
This command ensures that the DNS name rhs-smb resolves to the given public IP addresses. The DNS registrations use the cluster machine account for authentication in AD, so this operation can only be done after the join has succeeded.
Registering the NetBIOS name of the cluster is done by the nmbd service. In order to make sure that the nmbd instances on the hosts don’t overwrite each other’s registrations, the ‘cluster addresses’ smb.conf option should be set to the list of public addresses of the whole cluster.

7.2.3. Verify/Test Active Directory and Services

When the join is successful, the Samba and the Winbind daemons can be started.
Start nmbd using the following command:
# onnode all service nmb start
Start the winbind and smb services:
# onnode all service winbind start
# onnode all service smb start

Note

  • If you previously disabled CTDB’s ability to manage Winbind and Samba they can be re-enabled with the following commands:
    # onnode all ctdb event script enable 50.samba
    # onnode all ctdb event script enable 49.winbind
  • For some versions of RHGS, a bug in the SELinux policy prevents 'ctdb enablescript SCRIPT' from succeeding. If this is the case, 'chmod +x /etc/ctdb/events.d/SCRIPT' can be executed as a workaround from a root shell.
  • Ensure that winbind starts after a reboot. To do this, add ‘CTDB_MANAGES_WINBIND=yes’ to the /etc/sysconfig/ctdb file on all nodes.
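Because the CTDB_MANAGES_WINBIND setting must be present on every node, it helps to add it in a way that is safe to repeat. The following shell function is a hypothetical sketch. It replaces an existing uncommented KEY= line or appends one if none exists. It relies on the -i option of GNU sed, which is available on Red Hat Enterprise Linux. Run it as root on each node, for example: set_sysconfig_var /etc/sysconfig/ctdb CTDB_MANAGES_WINBIND yes

```shell
# Hypothetical helper: set KEY=VALUE in a shell-style sysconfig file.
# An existing uncommented "KEY=" line is replaced; otherwise the line is
# appended. Requires GNU sed (-i). VALUE must not contain "|", "&" or "\",
# which are special in the sed replacement below.
# Usage: set_sysconfig_var FILE KEY VALUE
set_sysconfig_var() {
    local file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Running the function twice leaves exactly one CTDB_MANAGES_WINBIND line in the file.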
Execute the following verification steps:
  1. Verify the join.

    Check that the created machine account can be used to authenticate to the AD LDAP server, using the following command:
    # net ads testjoin
    Join is OK
  2. Execute the following command to display the machine account’s LDAP object
    # net ads status -P
    objectClass: top
    objectClass: person
    objectClass: organizationalPerson
    objectClass: user
    objectClass: computer
    cn: rhs-smb
    distinguishedName: CN=rhs-smb,CN=Computers,DC=addom,DC=example,DC=com
    instanceType: 4
    whenCreated: 20150922013713.0Z
    whenChanged: 20151126111120.0Z
    displayName: RHS-SMB$
    uSNCreated: 221763
    uSNChanged: 324438
    name: rhs-smb
    objectGUID: a178177e-4aa4-4abc-9079-d1577e137723
    userAccountControl: 69632
    badPwdCount: 0
    codePage: 0
    countryCode: 0
    badPasswordTime: 130880426605312806
    lastLogoff: 0
    lastLogon: 130930100623392945
    localPolicyFlags: 0
    pwdLastSet: 130930098809021309
    primaryGroupID: 515
    objectSid: S-1-5-21-2562125317-1564930587-1029132327-1196
    accountExpires: 9223372036854775807
    logonCount: 1821
    sAMAccountName: rhs-smb$
    sAMAccountType: 805306369
    dNSHostName: rhs-smb.addom.example.com
    servicePrincipalName: HOST/rhs-smb.addom.example.com
    servicePrincipalName: HOST/RHS-SMB
    objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=addom,DC=example,DC=com
    isCriticalSystemObject: FALSE
    dSCorePropagationData: 16010101000000.0Z
    lastLogonTimestamp: 130929563322279307
    msDS-SupportedEncryptionTypes: 31
    
  3. Execute the following command to display general information about the AD server:
    # net ads info
    LDAP server: 10.11.12.1
    LDAP server name: dc1.addom.example.com
    Realm: ADDOM.EXAMPLE.COM
    Bind Path: dc=ADDOM,dc=EXAMPLE,dc=COM
    LDAP port: 389
    Server time: Thu, 26 Nov 2015 11:15:04 UTC
    KDC server: 10.11.12.1
    Server time offset: -26
  4. Verify that winbind is operating correctly.

    Execute the following command to verify that winbindd can use the machine account to authenticate to AD:
    # wbinfo -t
    checking the trust secret for domain ADDOM via RPC calls succeeded
  5. Execute the following command to resolve the given name to a Windows SID
    # wbinfo --name-to-sid 'ADDOM\Administrator'
    S-1-5-21-2562125317-1564930587-1029132327-500 SID_USER (1)
  6. Execute the following command to verify authentication:
    # wbinfo -a 'ADDOM\user'
    Enter ADDOM\user's password:
    plaintext password authentication succeeded
    Enter ADDOM\user's password:
    challenge/response password authentication succeeded
    or,
    # wbinfo -a 'ADDOM\user%password'
    plaintext password authentication succeeded
    challenge/response password authentication succeeded
  7. Execute the following command to verify that ID mapping works properly:
    # wbinfo --sid-to-uid <SID-OF-ADMIN>
    1000000
  8. Execute the following command to verify that the winbind Name Service Switch module works correctly:
    # getent passwd 'ADDOM\Administrator'
    ADDOM\administrator:*:1000000:1000004::/home/ADDOM/administrator:/bin/false
  9. Execute the following command to verify that Samba can use winbind and the NSS module correctly:
    # smbclient -L rhs-smb -U 'ADDOM\Administrator'
    Domain=[ADDOM] OS=[Windows 6.1] Server=[Samba 4.2.4]
    
            Sharename       Type      Comment
            ---------       ----      -------
            IPC$            IPC       IPC Service (Samba 4.2.4)
    Domain=[ADDOM] OS=[Windows 6.1] Server=[Samba 4.2.4]
    
            Server               Comment
            ---------            -------
            RHS-SMB         Samba 4.2.4
    
            Workgroup            Master
            ---------            -------
            ADDOM             RHS-SMB
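The UID that getent reports in step 8 should fall inside one of the idmap ranges configured in smb.conf. The following hypothetical shell helper extracts the third field of a passwd-format line and checks it against a range. In the usage line, the range is the autorid range from the basic configuration.

```shell
# Hypothetical helper: succeed when the UID (third colon-separated field)
# of a passwd-format line lies within the inclusive range LOW..HIGH.
# Usage: uid_in_idmap_range "PASSWD_LINE" LOW HIGH
uid_in_idmap_range() {
    local uid
    uid=$(printf '%s\n' "$1" | cut -d: -f3)
    [ "$uid" -ge "$2" ] && [ "$uid" -le "$3" ]
}
```

For example: uid_in_idmap_range "$(getent passwd 'ADDOM\Administrator')" 1000000 19999999 && echo "UID is in the autorid range"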
    

Part IV. Manage

Chapter 8. Managing Snapshots

The Red Hat Gluster Storage Snapshot feature enables you to create point-in-time copies of Red Hat Gluster Storage volumes, which you can use to protect data. Users can directly access these read-only snapshot copies to recover from accidental deletion, corruption, or modification of data.

Figure 8.1. Snapshot Architecture

In the Snapshot Architecture diagram, a Red Hat Gluster Storage volume consists of multiple bricks (Brick1, Brick2, and so on), which are spread across one or more nodes. Each brick is made up of an independent thin logical volume (LV). When a snapshot of a volume is taken, a snapshot of each LV is taken and another brick is created from it. Brick1_s1 is an identical image of Brick1. Similarly, an identical image of each brick is created, and these new bricks together form a snapshot volume.
Some features of snapshots are:
  • Crash Consistency

    A crash-consistent snapshot is captured at a particular point in time. When a crash-consistent snapshot is restored, the data is identical to what it was when the snapshot was taken.

    Note

    Currently, application level consistency is not supported.
  • Online Snapshot

    Snapshots are taken online, so the file system and its associated data remain available to clients while the snapshot is being taken.

  • Quorum Based

    The quorum feature ensures that the volume is in a good condition while some bricks are down. In an n-way replication where n <= 2, quorum is not met if any brick is down. In an n-way replication where n >= 3, quorum is met when m bricks are up, where m >= (n/2 + 1) if n is odd, and m >= n/2 with the first brick up if n is even. If quorum is not met, snapshot creation fails.

  • Barrier

    To guarantee crash consistency, some file operations (fops) are blocked during a snapshot operation.

    These fops are blocked until the snapshot is complete, and all other fops are passed through. There is a default time-out of 2 minutes. If the snapshot is not complete within that time, these fops are unbarriered. If the fops are unbarriered before the snapshot is complete, the snapshot operation fails. This ensures that the snapshot is in a consistent state.
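The quorum rule above can be restated as a small shell function. It transcribes the rule literally as written in this section and is for illustration only. glusterd performs the actual check, and the function name and its arguments are hypothetical.

```shell
# Literal transcription of the snapshot quorum rule described above.
# Usage: snapshot_quorum_met N UP FIRST_UP
#   N        replica count
#   UP       number of bricks in the replica set that are up
#   FIRST_UP 1 if the first brick of the set is up, 0 otherwise
snapshot_quorum_met() {
    local n=$1 up=$2 first_up=$3
    if [ "$n" -le 2 ]; then
        # n <= 2: quorum is not met if any brick is down.
        [ "$up" -eq "$n" ]
    elif [ $((n % 2)) -eq 1 ]; then
        # Odd n >= 3: at least n/2 + 1 bricks (integer division) must be up.
        [ "$up" -ge $((n / 2 + 1)) ]
    else
        # Even n >= 4: at least n/2 bricks up, including the first brick.
        [ "$up" -ge $((n / 2)) ] && [ "$first_up" -eq 1 ]
    fi
}
```

For a 3-way replica with one brick down, `snapshot_quorum_met 3 2 0` succeeds. For a 2-way replica with one brick down, `snapshot_quorum_met 2 1 1` fails.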

Note

Taking a snapshot of a Red Hat Gluster Storage volume that hosts virtual machine images is not recommended. A hypervisor-assisted snapshot of the virtual machine is more suitable in this use case.

8.1. Prerequisites

Before using this feature, ensure that the following prerequisites are met:
Recommended Setup

The recommended setup for using Snapshot is described below. In addition, read Chapter 20, Tuning for Performance, for information on enhancing snapshot performance:
  • For each volume brick, create a dedicated thin pool that contains the brick of the volume and its (thin) brick snapshots. With the current thin provisioning design, avoid placing the bricks of different Red Hat Gluster Storage volumes in the same thin pool, as this reduces the performance of snapshot operations, such as snapshot delete, on other unrelated volumes.
  • The recommended thin pool chunk size is 256KB. There may be exceptions to this when detailed information about the customer's workload is available.
  • The recommended pool metadata size is 0.1% of the thin pool size for a chunk size of 256KB or larger. In special cases where a chunk size smaller than 256KB is recommended, use a pool metadata size of 0.5% of the thin pool size.
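The metadata sizing rule above can be expressed as a shell function. This sketch takes the pool size in MiB and the chunk size in KiB and applies 0.1% or 0.5% as described. It rounds up and caps the result at 16 GiB (16384 MiB), the maximum pool metadata size used in the lvcreate example in this section. The function name is hypothetical, and the result is a starting point, not a tuned value.

```shell
# Hypothetical helper: suggested thin-pool metadata size in MiB.
# 0.1% of the pool for chunk sizes of 256 KiB or larger, 0.5% otherwise,
# rounded up and capped at 16384 MiB (16 GiB).
# Usage: thin_pool_metadata_mib POOL_MIB CHUNK_KIB
thin_pool_metadata_mib() {
    local pool=$1 chunk=$2 permille size
    if [ "$chunk" -ge 256 ]; then permille=1; else permille=5; fi
    # Integer ceiling of pool * permille / 1000.
    size=$(( (pool * permille + 999) / 1000 ))
    if [ "$size" -gt 16384 ]; then size=16384; fi
    echo "$size"
}
```

For a 1 TiB (1048576 MiB) pool with 256 KiB chunks, the function prints 1049. You could pass the result to lvcreate as, for example, --poolmetadatasize "$(thin_pool_metadata_mib 1048576 256)m".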
For Example

To create a brick from the device /dev/sda1:
  1. Create a physical volume (PV) by using the pvcreate command.
    pvcreate /dev/sda1
    Use the correct dataalignment option based on your device. For more information, see Section 20.2, “Brick Configuration”.
  2. Create a Volume Group (VG) from the PV using the following command:
    vgcreate dummyvg /dev/sda1
  3. Create a thin-pool using the following command:
    # lvcreate --size 1T --thin dummyvg/dummypool --chunksize 256k --poolmetadatasize 16G  --zero n
    This creates a thin pool of 1 TB with a chunk size of 256 KB. A pool metadata size of 16 GB, the maximum, is used.
  4. Create a thinly provisioned volume from the previously created pool using the following command:
    # lvcreate --virtualsize 1G --thin dummyvg/dummypool --name dummylv
  5. Create an XFS file system on the thin LV, using the recommended options.
    For example,
    mkfs.xfs -f -i size=512 -n size=8192 /dev/dummyvg/dummylv
  6. Mount this logical volume and use the mount path as the brick.
    mount /dev/dummyvg/dummylv /mnt/brick1

8.2. Creating Snapshots

Before creating a snapshot ensure that the following prerequisites are met:
  • Red Hat Gluster Storage volume has to be present and the volume has to be in the Started state.
  • All the bricks of the volume have to be on an independent thin logical volume (LV).
  • Snapshot names must be unique in the cluster.
  • All the bricks of the volume should be up and running, unless it is an n-way replication where n >= 3, in which case quorum must be met. For more information, see Chapter 8, Managing Snapshots.
  • No other volume operation, such as rebalance or add-brick, should be running on the volume.
  • The total number of snapshots in the volume must be less than the effective snap-max-hard-limit. For more information, see Configuring Snapshot Behavior.
  • If you have a geo-replication setup, then pause the geo-replication session if it is running, by executing the following command:
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL pause
    For example,
    # gluster volume geo-replication master-vol example.com::slave-vol pause
    Pausing geo-replication session between master-vol example.com::slave-vol has been successful
    Take the snapshot of the master volume first, and then take the snapshot of the slave volume.
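The limit check in these prerequisites can be sketched as follows. Section 8.7 defines the effective snap-max-hard-limit as the lower of the system and volume limits. The effective soft limit shown by gluster snapshot config (230 for a limit of 256 at 90%) is consistent with rounding the percentage down. These helper functions are hypothetical and only restate that arithmetic; glusterd enforces the real limits.

```shell
# Effective snap-max-hard-limit: the lower of the system and volume limits.
# Usage: effective_hard_limit SYSTEM_LIMIT VOLUME_LIMIT
effective_hard_limit() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Effective soft limit as a snapshot count: PERCENT of HARD_LIMIT, rounded down.
# Usage: effective_soft_limit HARD_LIMIT PERCENT
effective_soft_limit() {
    echo $(( $1 * $2 / 100 ))
}

# Succeed while COUNT is below the effective hard limit, that is, while
# one more snapshot can still be created.
# Usage: can_create_snapshot COUNT EFFECTIVE_HARD_LIMIT
can_create_snapshot() {
    [ "$1" -lt "$2" ]
}
```

For example, with a system limit of 256 and a volume limit of 100, `effective_hard_limit 256 100` prints 100. A volume that already has 100 snapshots therefore cannot take another one.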
To create a snapshot of the volume, run the following command:
# gluster snapshot create <snapname> <volname> [no-timestamp] [description <description>] [force]
where,
  • snapname - Name of the snapshot that will be created.
  • volname - Name of the volume for which the snapshot will be created. Only snapshots of a single volume are supported.
  • description - An optional description of the snapshot, saved along with the snapshot.
  • force - By default, snapshot creation fails if any brick is down. In an n-way replicated Red Hat Gluster Storage volume where n >= 3, a snapshot is allowed even if some of the bricks are down, provided quorum is met. Quorum is checked only when the force option is provided. Otherwise, snapshot creation fails if any brick is down. Refer to the Overview section for more details on quorum.
  • no-timestamp - By default, a timestamp is appended to the snapshot name. To create the snapshot without a timestamp, pass no-timestamp as an argument.

Note

Snapshots are not activated on creation by default; to enable this behavior for all future snapshot creations, set the activate-on-create parameter to enabled.
For Example 1:
# gluster snapshot create snap1 vol1 no-timestamp
snapshot create: success: Snap snap1 created successfully
For Example 2:
# gluster snapshot create snap1 vol1
snapshot create: success: Snap snap1_GMT-2015.07.20-10.02.33 created successfully
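Unless no-timestamp is given, a timestamp is appended to the name, so scripts that later refer to the snapshot need its full name. The following hypothetical sketch extracts that name from the success message shown in the examples above. It assumes exactly that message format and is not an official interface, so check it against the output of your installed version.

```shell
# Hypothetical helper: print the full snapshot name, including any
# appended timestamp, from "gluster snapshot create" output on stdin.
snapshot_name_from_output() {
    sed -n 's/^snapshot create: success: Snap \(.*\) created successfully$/\1/p'
}
```

For example: name=$(gluster snapshot create snap1 vol1 | snapshot_name_from_output)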
Taking a snapshot of a Red Hat Gluster Storage volume creates a read-only Red Hat Gluster Storage volume with a configuration identical to that of the original (parent) volume. The bricks of this new snapshot are mounted as /var/run/gluster/snaps/<snap-volume-name>/brick<bricknumber>.
For example, a snapshot with snap volume name 0888649a92ea45db8c00a615dfc5ea35 and having two bricks will have the following two mount points:
/var/run/gluster/snaps/0888649a92ea45db8c00a615dfc5ea35/brick1
/var/run/gluster/snaps/0888649a92ea45db8c00a615dfc5ea35/brick2
These mounts can also be viewed using the df or mount command.

Note

If you have a geo-replication setup, after creating the snapshot, resume the geo-replication session by running the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL resume
For example,
# gluster volume geo-replication master-vol example.com::slave-vol resume
Resuming geo-replication session between master-vol example.com::slave-vol has been successful
Volume snapshot creation results in the creation of a snapshot pool of blocks that contains a copy of the LVM metadata. After a snapshot is taken, when new data is written to the gluster volume, the snapshot pool is overwritten and the changes are copied to the main gluster volume. As a result, the snapshot pool consumes more metadata space if data changes after the snapshot is taken.

8.3. Cloning a Snapshot

A clone or a writable snapshot is a new volume, which is created from a particular snapshot.
To clone a snapshot, run the following command:
# gluster snapshot clone <clonename> <snapname>
where,
clonename: The name of the clone, that is, the new volume that will be created.
snapname: The name of the snapshot that is being cloned.

Note

  • Unlike restoring a snapshot, cloning retains the original snapshot.
  • The snapshot must be activated and all the snapshot bricks must be running before the clone is taken. The server nodes must also be in quorum.
  • The clone is space efficient: the clone (new volume) and the snapshot share the same LVM backend. LVM space consumption grows as the clone diverges from the snapshot.
For example:
# gluster snapshot clone clone_vol snap1
snapshot clone: success: Clone clone_vol created successfully
To check the status of the new clone, run the following command:
# gluster vol info <clonename>
For example:
# gluster vol info clone_vol

Volume Name: clone_vol
Type: Distribute
Volume ID: cdd59995-9811-4348-8e8d-988720db3ab9
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.00.00.01:/var/run/gluster/snaps/clone_vol/brick1/brick3
Options Reconfigured:
performance.readdir-ahead: on
In this example, the clone is in the Created state, like any newly created volume. It must be started explicitly, for example with gluster volume start clone_vol, before it can be used.

8.4. Listing of Available Snapshots

To list all the snapshots that are taken for a specific volume, run the following command:
# gluster snapshot list [VOLNAME]
where,
  • VOLNAME - This is an optional field. If provided, only the names of the snapshots of that volume are listed.
For Example:
# gluster snapshot list
snap3
# gluster snapshot list test_vol
No snapshots present

8.5. Getting Information of all the Available Snapshots

The following command displays basic information about the snapshots taken. By default, information about all the snapshots in the cluster is displayed:
# gluster snapshot info [(<snapname> | volume VOLNAME)]
where,
  • snapname - This is an optional field. If the snapname is provided then the information about the specified snap is displayed.
  • VOLNAME - This is an optional field. If the VOLNAME is provided the information about all the snaps in the specified volume is displayed.
For Example:
# gluster snapshot info snap3
Snapshot                  : snap3
Snap UUID                 : b2a391ce-f511-478f-83b7-1f6ae80612c8
Created                   : 2014-06-13 09:40:57
Snap Volumes:

     Snap Volume Name          : e4a8f4b70a0b44e6a8bff5da7df48a4d
     Origin Volume name        : test_vol1
     Snaps taken for test_vol1      : 1
     Snaps available for test_vol1  : 255
     Status                    : Started

8.6. Getting the Status of Available Snapshots

This command displays the running status of snapshots. By default, the status of all the snapshots in the cluster is displayed. To check the status of all the snapshots taken for a particular volume, specify a volume name:
# gluster snapshot status [(<snapname> | volume VOLNAME)]
where,
  • snapname - This is an optional field. If the snapname is provided then the status about the specified snap is displayed.
  • VOLNAME - This is an optional field. If the VOLNAME is provided the status about all the snaps in the specified volume is displayed.
For example:
# gluster snapshot status snap3

Snap Name : snap3
Snap UUID : b2a391ce-f511-478f-83b7-1f6ae80612c8

     Brick Path        :   10.70.42.248:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick1/brick1
     Volume Group      :   snap_lvgrp1
     Brick Running     :   Yes
     Brick PID         :   1640
     Data Percentage   :   1.54
     LV Size           :   616.00m


     Brick Path        :   10.70.43.139:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick2/brick3
     Volume Group      :   snap_lvgrp1
     Brick Running     :   Yes
     Brick PID         :   3900
     Data Percentage   :   1.80
     LV Size           :   616.00m


     Brick Path        :   10.70.43.34:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick3/brick4
     Volume Group      :   snap_lvgrp1
     Brick Running     :   Yes
     Brick PID         :   3507
     Data Percentage   :   1.80
     LV Size           :   616.00m

Note

This shows the status of an activated snapshot.
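When monitoring how full the thin volumes of activated snapshots are, you can summarize the Data Percentage lines in this output with a short pipeline. The following hypothetical shell function reads gluster snapshot status output on standard input and prints the highest Data Percentage value. It assumes the "Data Percentage : value" layout shown in the example above.

```shell
# Hypothetical helper: print the highest "Data Percentage" value from
# "gluster snapshot status" output read on standard input.
max_snapshot_data_percentage() {
    awk -F: '/Data Percentage/ { gsub(/[ \t]/, "", $2); print $2 }' |
        LC_ALL=C sort -n | tail -n 1
}
```

For example, gluster snapshot status snap3 | max_snapshot_data_percentage prints 1.80 for the output shown above.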

8.7. Configuring Snapshot Behavior

The configurable parameters for snapshot are:
  • snap-max-hard-limit: If the snapshot count in a volume reaches this limit, no further snapshot creation is allowed. The range is from 1 to 256. Once this limit is reached, you have to remove snapshots before you can create further snapshots. This limit can be set for the system or per volume. If both a system limit and a volume limit are configured, the effective maximum limit is the lower of the two values.
  • snap-max-soft-limit: This is a percentage value. The default value is 90%. This configuration works along with the auto-delete feature. If auto-delete is enabled, it deletes the oldest snapshot when the snapshot count in a volume crosses this limit. When auto-delete is disabled, it does not delete any snapshot, but it displays a warning message to the user.
  • auto-delete: This enables or disables the auto-delete feature. By default, auto-delete is disabled. When enabled, it deletes the oldest snapshot when the snapshot count in a volume crosses the snap-max-soft-limit. When disabled, it does not delete any snapshot, but it displays a warning message to the user.
  • activate-on-create: Snapshots are not activated at creation time by default. If you want created snapshots to immediately be activated after creation, set the activate-on-create parameter to enabled. Note that all volumes are affected by this setting.
  • Displaying the Configuration Values

    To display the existing configuration values for a volume or the entire cluster, run the following command:

    # gluster snapshot config [VOLNAME]
    where:
    • VOLNAME: This is an optional field. The name of the volume for which the configuration values are to be displayed.
    If the volume name is not provided, the configuration values of all volumes are displayed. System configuration details are displayed whether or not a volume name is specified.
    For Example:
    # gluster snapshot config
    
    Snapshot System Configuration:
    snap-max-hard-limit : 256
    snap-max-soft-limit : 90%
    auto-delete : disable
    activate-on-create : disable
    
    Snapshot Volume Configuration:
    
    Volume : test_vol
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 256
    Effective snap-max-soft-limit : 230 (90%)
    
    Volume : test_vol1
    snap-max-hard-limit : 256
    Effective snap-max-hard-limit : 256
    Effective snap-max-soft-limit : 230 (90%)
  • Changing the Configuration Values

    To change the existing configuration values, run the following command:

    # gluster snapshot config [VOLNAME] ([snap-max-hard-limit <count>] [snap-max-soft-limit <percent>]) | ([auto-delete <enable|disable>]) | ([activate-on-create <enable|disable>])
    where:
    • VOLNAME: This is an optional field. The name of the volume for which the configuration values are to be changed. If the volume name is not provided, then running the command will set or change the system limit.
    • snap-max-hard-limit: Maximum hard limit for the system or the specified volume.
    • snap-max-soft-limit: Soft limit, as a percentage, for the system.
    • auto-delete: This enables or disables the auto-delete feature. By default auto-delete is disabled.
    • activate-on-create: This enables or disables the activate-on-create feature for all volumes. By default activate-on-create is disabled.
    For Example:
    # gluster snapshot config test_vol snap-max-hard-limit 100
    Changing snapshot-max-hard-limit will lead to deletion of snapshots if
    they exceed the new limit.
    Do you want to continue? (y/n) y
    snapshot config: snap-max-hard-limit for test_vol set successfully
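The relationship between the configured and the effective limits can be sketched in shell as follows. This is a minimal illustration, not part of the gluster tooling: the function names effective_hard_limit and effective_soft_limit are hypothetical, and the soft-limit calculation assumes integer (truncating) division, which reproduces the value 230 shown above for a hard limit of 256 at 90%.

```shell
# Hypothetical helpers illustrating how the effective limits shown by
# "gluster snapshot config" are derived; they do not query gluster.

# effective_hard_limit SYSTEM_LIMIT [VOLUME_LIMIT]
# Prints the lower of the system-wide and per-volume hard limits.
# If no per-volume limit is given, the system limit applies.
effective_hard_limit() {
    sys_limit=$1
    vol_limit=${2:-$1}
    if [ "$vol_limit" -lt "$sys_limit" ]; then
        echo "$vol_limit"
    else
        echo "$sys_limit"
    fi
}

# effective_soft_limit HARD_LIMIT PERCENT
# Prints the soft limit as a snapshot count: PERCENT of HARD_LIMIT,
# truncated to a whole number (assumed rounding behaviour).
effective_soft_limit() {
    echo $(( $1 * $2 / 100 ))
}
```

For example, with a system limit of 256 and the per-volume limit of 100 set for test_vol above, `effective_soft_limit "$(effective_hard_limit 256 100)" 90` prints 90, so the warning or auto-delete behaviour would apply once the snapshot count of test_vol crosses 90.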

8.8. Activating and Deactivating a Snapshot

Only activated snapshots are accessible; see Section 8.11, “Accessing Snapshots” for details. Because each snapshot is a Red Hat Gluster Storage volume, it consumes some resources, so it is good practice to deactivate snapshots that are not needed and to activate them again when required. To activate a snapshot, run the following command:
# gluster snapshot activate <snapname> [force]
where:
  • snapname: Name of the snapshot to be activated.
  • force: If some bricks of the snapshot volume are down, use the force option to start them.
For Example:
# gluster snapshot activate snap1
To deactivate a snapshot, run the following command:
# gluster snapshot deactivate <snapname>
where:
  • snapname: Name of the snapshot to be deactivated.
For example:
# gluster snapshot deactivate snap1
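When several snapshots need to be deactivated, for example all snapshots of one volume, the command above can be run in a loop. The following is a minimal sketch, assuming that `gluster snapshot list VOLNAME` prints one snapshot name per line and that the gluster global option `--mode=script` is used to suppress any confirmation prompt. The helpers run and deactivate_snapshots are hypothetical; setting DRY_RUN prints the commands instead of running them.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# deactivate_snapshots: read snapshot names (one per line) on stdin and
# deactivate each one. Blank lines are skipped. gluster's stdin is
# redirected from /dev/null so it cannot consume the remaining names.
deactivate_snapshots() {
    while IFS= read -r snap; do
        [ -n "$snap" ] || continue
        run gluster --mode=script snapshot deactivate "$snap" </dev/null
    done
}
```

For example, `gluster snapshot list test_vol | deactivate_snapshots` deactivates each snapshot of test_vol; activating them again works the same way with `snapshot activate`.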

8.9. Deleting Snapshot

Before deleting a snapshot, ensure that the following prerequisites are met:
  • A snapshot with the specified name must exist.
  • Red Hat Gluster Storage nodes must be in quorum.
  • No volume operation (for example, add-brick or rebalance) should be running on the original (parent) volume of the snapshot.
To delete a snapshot, run the following command:
# gluster snapshot delete <snapname>
where,
  • snapname - The name of the snapshot to be deleted.
For Example:
# gluster snapshot delete snap2
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap2: snap removed successfully

Note

A Red Hat Gluster Storage volume cannot be deleted while any snapshot is associated with it. You must delete all of its snapshots before deleting the volume.

8.9.1. Deleting Multiple Snapshots

Multiple snapshots can be deleted using either of the following two commands.
To delete all the snapshots present in a system, execute the following command:
# gluster snapshot delete all
To delete all the snapshots present in a specified volume, execute the following command:
# gluster snapshot delete volume <volname>
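Because a volume cannot be deleted while it still has snapshots, removing a volume together with its snapshots means running the per-volume snapshot delete first. A minimal sketch follows; the function names are hypothetical, it uses `--mode=script` to skip confirmation prompts, and it assumes the volume must be stopped before `gluster volume delete` succeeds. These commands destroy data, so try the function with DRY_RUN=1 first.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# remove_volume_and_snapshots VOLNAME
# Deletes every snapshot of VOLNAME, then stops and deletes the volume.
# The && chain stops at the first command that fails.
remove_volume_and_snapshots() {
    vol=$1
    run gluster --mode=script snapshot delete volume "$vol" &&
        run gluster --mode=script volume stop "$vol" &&
        run gluster --mode=script volume delete "$vol"
}
```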

8.10. Restoring Snapshot

Before restoring a snapshot, ensure that the following prerequisites are met:
  • The specified snapshot must exist.
  • The original (parent) volume of the snapshot must be in a stopped state.
  • Red Hat Gluster Storage nodes must be in quorum.
  • No volume operation (for example, add-brick or rebalance) should be running on the original (parent) volume of the snapshot.
To restore a snapshot, run the following command:
# gluster snapshot restore <snapname>
where,
  • snapname - The name of the snapshot to be restored.
For Example:
# gluster snapshot restore snap1
Snapshot restore: snap1: Snap restored successfully
After the snapshot is restored and the volume is started, trigger a self-heal by running the following command:
# gluster volume heal VOLNAME full
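The stop, restore, start, and self-heal steps can be combined into one ordered sequence. This is a sketch only: the function names are hypothetical, and it uses `--mode=script` to skip the confirmation prompts that stopping a volume or restoring a snapshot may ask for. It does not check quorum or running volume operations, which must still be verified beforehand.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# restore_volume_snapshot VOLNAME SNAPNAME
# 1. stop the parent volume (a restore requires it to be stopped)
# 2. restore the snapshot
# 3. start the volume again
# 4. trigger a full self-heal
restore_volume_snapshot() {
    vol=$1
    snap=$2
    run gluster --mode=script volume stop "$vol" &&
        run gluster --mode=script snapshot restore "$snap" &&
        run gluster --mode=script volume start "$vol" &&
        run gluster --mode=script volume heal "$vol" full
}
```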

    Note

  • In the cluster, identify the nodes participating in the snapshot with the snapshot status command. For example:
     # gluster snapshot status snapname
    
        Snap Name : snapname
        Snap UUID : bded7c02-8119-491b-a7e1-cc8177a5a1cd
    
         Brick Path        :   10.70.43.46:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick2/brick2
         Volume Group      :   snap_lvgrp
         Brick Running     :   Yes
         Brick PID         :   8303
         Data Percentage   :   0.43
         LV Size           :   2.60g
    
    
         Brick Path        :   10.70.42.33:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick3/brick3
         Volume Group      :   snap_lvgrp
         Brick Running     :   Yes
         Brick PID         :   4594
         Data Percentage   :   42.63
         LV Size           :   2.60g
    
    
         Brick Path        :   10.70.42.34:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick4/brick4
         Volume Group      :   snap_lvgrp
         Brick Running     :   Yes
         Brick PID         :   23557
         Data Percentage   :   12.41
         LV Size           :   2.60g
    
    • On the nodes identified above, check whether the geo-replication repository is present in /var/lib/glusterd/snaps/snapname. If it is present on any of these nodes, ensure that it is present in /var/lib/glusterd/snaps/snapname on every node in the cluster; if it is missing on a node, copy it to /var/lib/glusterd/snaps/snapname on that node.
    • Restore the snapshot of the volume using the following command:
      # gluster snapshot restore snapname
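Identifying the participating nodes can be scripted by extracting the host part of each Brick Path line from the snapshot status output. A minimal sketch, assuming the output format shown above and a host name or IPv4 address before the first colon (an IPv6 address would not be extracted correctly); the function name snapshot_nodes is hypothetical.

```shell
# snapshot_nodes: read "gluster snapshot status" output on stdin and print
# each distinct host that holds a brick of the snapshot, one per line.
snapshot_nodes() {
    sed -n 's/^[[:space:]]*Brick Path[[:space:]]*:[[:space:]]*\([^:[:space:]]*\):.*/\1/p' |
        LC_ALL=C sort -u
}
```

For the status output above, `gluster snapshot status snapname | snapshot_nodes` would print 10.70.42.33, 10.70.42.34 and 10.70.43.46, the nodes on which to check /var/lib/glusterd/snaps/snapname.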
Restoring Snapshot of a Geo-replication Volume

If you have a geo-replication setup, perform the following steps to restore a snapshot:

  1. Stop the geo-replication session.
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
  2. Stop the slave volume and then the master volume.
    # gluster volume stop VOLNAME
  3. Restore the snapshots of the slave volume and the master volume.
    # gluster snapshot restore snapname
  4. Start the slave volume first and then the master volume.
    # gluster volume start VOLNAME
  5. Start the geo-replication session.
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
    
  6. Resume the geo-replication session.
    # gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL resume
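Because the order of these steps matters and they span two clusters, it can help to print the full plan before running anything. The following sketch only prints the commands; it does not execute them or connect to the slave cluster. The function name is hypothetical, and it assumes the master and slave snapshots have different names (MASTER_SNAP and SLAVE_SNAP), since each cluster restores its own snapshot.

```shell
# georep_restore_plan MASTER_VOL SLAVE_HOST SLAVE_VOL MASTER_SNAP SLAVE_SNAP
# Prints the restore steps above in order. Each line is prefixed with
# the cluster on which the command is expected to run.
georep_restore_plan() {
    session="$1 $2::$3"
    cat <<EOF
master: gluster volume geo-replication $session stop
slave:  gluster volume stop $3
master: gluster volume stop $1
slave:  gluster snapshot restore $5
master: gluster snapshot restore $4
slave:  gluster volume start $3
master: gluster volume start $1
master: gluster volume geo-replication $session start
master: gluster volume geo-replication $session resume
EOF
}
```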
    

8.11. Accessing Snapshots

A snapshot of a Red Hat Gluster Storage volume can be accessed only through a FUSE mount. Use the following command to mount the snapshot:
mount -t glusterfs <hostname>:/snaps/<snapname>/parent-VOLNAME /mount_point
  • parent-VOLNAME - The name of the volume from which the snapshot was created.
    For example,
    # mount -t glusterfs myhostname:/snaps/snap1/test_vol /mnt
Because a Red Hat Gluster Storage snapshot volume is read-only, no write operations are allowed on this mount. After the snapshot is mounted, its entire content can be accessed in read-only mode.
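The mount path can be assembled from the snapshot name and the parent volume name. A minimal sketch with hypothetical function names; the snapshot must already be activated (see Section 8.8), and setting DRY_RUN prints the commands instead of running them.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# mount_snapshot HOSTNAME SNAPNAME PARENT_VOLNAME MOUNT_POINT
# Creates the mount point if needed and mounts the (already activated)
# snapshot through FUSE at /snaps/<snapname>/<parent-volname>.
mount_snapshot() {
    run mkdir -p "$4" &&
        run mount -t glusterfs "$1:/snaps/$2/$3" "$4"
}
```

For example, `mount_snapshot myhostname snap1 test_vol /mnt` runs the same mount command as the example above.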

Note

NFS and CIFS mounts of snapshot volumes are not supported.
Snapshots can also be accessed through User Serviceable Snapshots. For more information, see Section 8.13, “User Serviceable Snapshots”.

Warning

External snapshots are not supported. These include snapshots of a virtual machine or instance on which Red Hat Gluster Storage Server is installed as a guest OS, and FC/iSCSI SAN snapshots.

8.12. Scheduling of Snapshots

The snapshot scheduler creates snapshots automatically at the configured time interval. Snapshots can be created every hour, on a particular day of the month, in a particular month, or on a particular day of the week. The following sections describe snapshot scheduling in detail.

8.12.1. Prerequisites

  • To initialize the snapshot scheduler, execute the following command on all nodes of the cluster:
    snap_scheduler.py init
    
    This command initializes snap_scheduler and interfaces it with the crond service running on the local node. This must be done before executing any scheduling-related commands from a node.

    Note

    This command must be run on all nodes participating in the scheduling. Other options can be run independently from any node on which initialization has completed successfully.
  • A shared storage volume named gluster_shared_storage is used across nodes to coordinate scheduling operations. It must be mounted at /var/run/gluster/shared_storage on all nodes. For more information, see Section 11.12, “Setting up Shared Storage Volume”.
  • All nodes in the cluster must have their clocks synchronized using NTP or another mechanism. This is a hard requirement for this feature to work.
  • If you are on Red Hat Enterprise Linux 7.1 or later, set the cron_system_cronjob_use_shares boolean to on by running the following command:
    # setsebool -P cron_system_cronjob_use_shares on
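Before initializing the scheduler on a node, you can check that the shared storage is actually mounted. A minimal sketch for Linux that reads the mount table from /proc/mounts; the function name is_mounted is hypothetical. On systems where /var/run is a symbolic link to /run, as on Red Hat Enterprise Linux 7, the mount table may list the path as /run/gluster/shared_storage, so the usage example checks both paths.

```shell
# is_mounted MOUNT_POINT [MOUNTS_FILE]
# Succeeds if MOUNT_POINT is a mount target in MOUNTS_FILE
# (default: /proc/mounts, whose second field is the mount point).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example: `is_mounted /run/gluster/shared_storage || is_mounted /var/run/gluster/shared_storage || echo "gluster_shared_storage is not mounted"`.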
    

8.12.2. Snapshot Scheduler Options

Note

There is a latency of one minute between issuing a command through the helper script and the command taking effect. Hence, snapshot schedules with per-minute granularity are currently not supported.
Enabling Snapshot Scheduler

To enable the snap scheduler, execute the following command:

snap_scheduler.py enable

Note

The snapshot scheduler is disabled by default after initialization.
For example:
# snap_scheduler.py enable
snap_scheduler: Snapshot scheduling is enabled
Disabling Snapshot Scheduler

To disable the snap scheduler, execute the following command:

 snap_scheduler.py disable
For example:
# snap_scheduler.py disable
snap_scheduler: Snapshot scheduling is disabled
Displaying the Status of Snapshot Scheduler

To display the current status (Enabled/Disabled) of the snap scheduler, execute the following command:

snap_scheduler.py status
For example:
# snap_scheduler.py status
snap_scheduler: Snapshot scheduling status: Disabled
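The status output can also be checked from a script, for example to enable the scheduler only when it is currently disabled. A minimal sketch; the function names are hypothetical, and it assumes that an enabled scheduler reports `Snapshot scheduling status: Enabled`, by analogy with the Disabled output shown above.

```shell
# scheduler_is_enabled: read "snap_scheduler.py status" output on stdin and
# succeed only if it reports that scheduling is enabled (assumed wording).
scheduler_is_enabled() {
    grep -q 'Snapshot scheduling status: Enabled'
}

# enable_scheduler_if_disabled: enable scheduling only when it is off.
enable_scheduler_if_disabled() {
    if snap_scheduler.py status | scheduler_is_enabled; then
        echo "snap_scheduler: scheduling already enabled"
    else
        snap_scheduler.py enable
    fi
}
```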
Adding a Snapshot Schedule

To add a snapshot schedule, execute the following command:

snap_scheduler.py add "Job Name" "Schedule" "Volume Name"
where,
Job Name: This name uniquely identifies this particular schedule, and can be used to reference the schedule in later operations such as edit or delete. If a schedule already exists for the specified Job Name, the add command will fail.
Schedule: The schedules are accepted in the format crond understands. For example:
Example of job definition:
.---------------- minute (0 - 59)
| .------------- hour (0 - 23)
| | .---------- day of month (1 - 31)
| | | .------- month (1 - 12) OR jan,feb,mar,apr ...
| | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
| | | | |
* * * * * user-name command to be executed
Volume name: The name of the volume on which the scheduled snapshot operation will be performed.
For example:
# snap_scheduler.py add "Job1" "* * * * *" test_vol
snap_scheduler: Successfully added snapshot schedule
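Because the schedule must be a crond time specification (the five time fields shown above, without the user name and command), a simple check of the field count can catch quoting mistakes before calling the add command. This is a minimal sketch with a hypothetical function name; it counts fields only and does not validate their ranges.

```shell
# valid_schedule "SCHEDULE"
# Succeeds if SCHEDULE has exactly five whitespace-separated fields:
# minute, hour, day of month, month, day of week.
valid_schedule() {
    printf '%s\n' "$1" | awk '{ exit (NF == 5 ? 0 : 1) }'
}
```

For example, `valid_schedule "0 */6 * * *" && snap_scheduler.py add "Job2" "0 */6 * * *" test_vol` adds a hypothetical job Job2 that runs at minute 0 of every sixth hour, which also stays clear of the per-minute granularity that the scheduler does not support.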

Note

The snapshots taken by the scheduler have the following naming convention: Scheduled-<Job Name>-<volume name>_<Timestamp>.
For example:
Scheduled-Job1-test_vol_GMT-2015.06.19-09.47.01
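Because the scheduler names snapshots predictably, the snapshots created by one job can be picked out of a snapshot listing. A minimal sketch, assuming the naming convention in the example above (the job and volume names followed by _GMT- and the timestamp); the function name is hypothetical. Matching on the full prefix avoids confusing Job1 with a job named, for instance, Job10.

```shell
# scheduled_snaps JOB_NAME VOLNAME
# Reads snapshot names on stdin (one per line) and prints only those
# created by the scheduler job JOB_NAME for VOLNAME.
scheduled_snaps() {
    awk -v prefix="Scheduled-$1-${2}_GMT-" 'index($0, prefix) == 1'
}
```

For example, `gluster snapshot list test_vol | scheduled_snaps Job1 test_vol` lists the snapshots that Job1 has taken of test_vol.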
Editing a Snapshot Schedule

To edit an existing snapshot schedule, execute the following command:

snap_scheduler.py edit "Job Name" "Schedule" "Volume Name"
where,
Job Name: The name that uniquely identifies the existing schedule to be edited.
Schedule: The schedules are accepted in the format crond understands. For example:
Example of job definition:
.---------------- minute (0 - 59)
| .------------- hour (0 - 23)
| | .---------- day of month (1 - 31)
| | | .------- month (1 - 12) OR jan,feb,mar,apr ...
| | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
| | | | |
* * * * * user-name command to be executed
Volume name: The name of the volume on which the snapshot schedule will be edited.
For Example:
# snap_scheduler.py edit "Job1" "*/5 * * * *" gluster_shared_storage
snap_scheduler: Successfully edited snapshot schedule
Listing a Snapshot Schedule

To list the existing snapshot schedule, execute the following command:

snap_scheduler.py list
For example:
# snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
--------------------------------------------------------------------
Job0             * * * * *        Snapshot Create  test_vol
Deleting a Snapshot Schedule

To delete an existing snapshot schedule, execute the following command:

snap_scheduler.py delete "Job Name"
where,
Job Name: This name uniquely identifies the particular schedule that has to be deleted.
For example:
# snap_scheduler.py delete Job1
snap_scheduler: Successfully deleted snapshot schedule

8.13. User Serviceable Snapshots

User Serviceable Snapshots provide a quick and easy way to access data stored in snapshotted volumes. This feature is based on the core snapshot feature in Red Hat Gluster Storage. With User Serviceable Snapshots, you can access the activated snapshots of a volume.
Consider a scenario where a user wants to access a file test.txt that was in the home directory a couple of months earlier and was accidentally deleted. The user can go to the virtual .snaps directory inside the home directory and recover test.txt using the cp command.

Note

  • User Serviceable Snapshots are not the recommended option for bulk data access from an earlier snapshot volume. For such scenarios, mount the snapshot volume and access the data from that mount. For more information, see Chapter 8, Managing Snapshots.
  • Each activated snapshot volume, when initialized by User Serviceable Snapshots, consumes some memory. Most of this memory is consumed by housekeeping structures of gfapi and of xlators such as DHT and AFR, so the total memory consumed by a snapshot also depends on the number of bricks. Each brick consumes approximately 10MB of memory; for example, in a 4x2 replica setup the total memory consumed by a snapshot is around 50MB, and for a 6x2 setup it is roughly 90MB.
    As the number of active snapshots grows, the total memory footprint of the snapshot daemon (snapd) also grows. On a low-memory system, snapd can be killed by the OOM killer if there are too many active snapshots.

8.13.1. Enabling and Disabling User Serviceable Snapshot

To enable user serviceable snapshot, run the following command:
# gluster volume set VOLNAME features.uss enable
For example:
# gluster volume set test_vol features.uss enable
volume set: success
To disable user serviceable snapshot run the following command:
# gluster volume set VOLNAME features.uss disable
For example:
# gluster volume set test_vol features.uss disable
volume set: success
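To change this setting on several volumes at once, the command can be looped over a list of volume names. A minimal sketch; the function names are hypothetical, and setting DRY_RUN prints the commands instead of running them.

```shell
# run: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# set_uss enable|disable VOLNAME...
# Sets features.uss to the given state on each listed volume and stops
# at the first volume for which the command fails.
set_uss() {
    state=$1
    case $state in
        enable|disable) ;;
        *) echo "usage: set_uss enable|disable VOLNAME..." >&2; return 2 ;;
    esac
    shift
    for vol in "$@"; do
        run gluster volume set "$vol" features.uss "$state" || return 1
    done
}
```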

8.13.2. Viewing and Retrieving Snapshots using NFS / FUSE

For every snapshot available for a volume, any user who has access to the volume has a read-only view of the volume. You can recover files from different points in time through these read-only views. Each snapshot of the volume is available in the .snaps directory of every directory of the mounted volume.

Note

To access the snapshot you must first mount the volume.
For an NFS mount, refer to Section 6.2.2.2.1, “Manually Mounting Volumes Using Gluster NFS” for more details. The following command is an example:
# mount -t nfs -o vers=3 server1:/test-vol /mnt/glusterfs
For a FUSE mount, refer to Section 6.1.3.2, “Mounting Volumes Manually” for more details. The following command is an example:
# mount -t glusterfs server1:/test-vol /mnt/glusterfs
The .snaps directory is a virtual directory that is not listed by either the ls command or the ls -a option. It contains every snapshot taken for the volume as an individual directory. Each of these snapshot entries in turn contains the data of the directory from which the user is accessing .snaps, as it was when the snapshot was taken.
To view or retrieve a file from a snapshot follow these steps:
  1. Go to the directory where the file was present when the snapshot was taken. For example, if a test.txt file in the root directory of the mount has to be recovered, go to that directory.
    # cd /mnt/glusterfs

    Note

    Because every directory has a virtual .snaps directory, you can enter .snaps from here. Since .snaps is a virtual directory, the ls and ls -a commands do not list it. For example:
    # ls -a
          .  ..  Bob  John  test1.txt  test2.txt
  2. Go to the .snaps directory.
    # cd .snaps
  3. Run the ls command to list all the snapshots.
    For example:
     # ls -p
     snapshot_Dec2014/    snapshot_Nov2014/    snapshot_Oct2014/    snapshot_Sept2014/
  4. Go to the snapshot directory from where the file has to be retrieved.
    For example:
    # cd snapshot_Nov2014
    # ls -p
        John/  test1.txt  test2.txt
  5. Copy the file/directory to the desired location.
    # cp -p test2.txt  $HOME
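The retrieval steps can be condensed into one helper that copies a file from a named snapshot in the .snaps directory of the file's own directory. A minimal sketch with a hypothetical function name; it assumes User Serviceable Snapshots are enabled, the volume is mounted, and SNAPNAME is one of the names listed in .snaps (as in step 3).

```shell
# recover_from_snapshot FILE SNAPNAME DEST
# FILE is the path on the mounted volume where the file used to be.
# Copies <directory of FILE>/.snaps/SNAPNAME/<file name> to DEST,
# preserving mode, ownership and timestamps where possible (cp -p).
recover_from_snapshot() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    cp -p -- "$dir/.snaps/$2/$base" "$3"
}
```

For example, `recover_from_snapshot /mnt/glusterfs/test2.txt snapshot_Nov2014 "$HOME"` performs the same copy as steps 1 to 5.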

8.13.3. Viewing and Retrieving Snapshots using CIFS for Windows Client