Administration Guide
Configuring and Managing Red Hat Gluster Storage
Abstract
Part I. Preface
Chapter 1. Preface
1.1. About Red Hat Gluster Storage
1.2. About glusterFS
1.3. About On-premises Installation
Part II. Overview
Chapter 2. Architecture and Concepts
2.1. Architecture
Figure 2.1. Red Hat Gluster Storage Architecture
2.2. On-premises Architecture
Figure 2.2. Red Hat Gluster Storage for On-premises Architecture
2.3. Storage Concepts
- Brick
- The glusterFS basic unit of storage, represented by an export directory on a server in the trusted storage pool. A brick is expressed by combining a server with an export directory in the following format:
SERVER:EXPORT
For example: myhostname:/exports/myexportdir/
- Volume
- A volume is a logical collection of bricks. Most of the Red Hat Gluster Storage management operations happen on the volume.
- Translator
- A translator connects to one or more subvolumes, does something with them, and offers a subvolume connection.
- Subvolume
- A brick after being processed by at least one translator.
- Volfile
- Volume (vol) files are configuration files that determine the behavior of your Red Hat Gluster Storage trusted storage pool. At a high level, glusterFS has three entities: Server, Client, and Management daemon. Each of these entities has its own volume files. Volume files for servers and clients are generated by the management daemon when a volume is created. Server and Client vol files are located in the /var/lib/glusterd/vols/VOLNAME directory. The management daemon vol file is named glusterd.vol and is located in the /etc/glusterfs/ directory.
Warning
You must not modify any vol file in /var/lib/glusterd manually, as Red Hat does not support vol files that are not generated by the management daemon.
- glusterd
- glusterd is the glusterFS Management Service that must run on all servers in the trusted storage pool.
- Cluster
- A trusted pool of linked computers working together, resembling a single computing resource. In Red Hat Gluster Storage, a cluster is also referred to as a trusted storage pool.
- Client
- The machine that mounts a volume (this may also be a server).
- File System
- A method of storing and organizing computer files. A file system organizes files into a database for the storage, manipulation, and retrieval by the computer's operating system. Source: Wikipedia
- Distributed File System
- A file system that allows multiple clients to concurrently access data which is spread across servers/bricks in a trusted storage pool. Data sharing among multiple locations is fundamental to all distributed file systems.
- Virtual File System (VFS)
- VFS is a kernel software layer that handles all system calls related to the standard Linux file system. It provides a common interface to several kinds of file systems.
- POSIX
- Portable Operating System Interface (for Unix) (POSIX) is the name of a family of related standards specified by the IEEE to define the application programming interface (API), as well as shell and utilities interfaces, for software that is compatible with variants of the UNIX operating system. Red Hat Gluster Storage exports a fully POSIX compatible file system.
- Metadata
- Metadata is data providing information about other pieces of data.
- FUSE
- Filesystem in User space (FUSE) is a loadable kernel module for Unix-like operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a "bridge" to the kernel interfaces. Source: Wikipedia
- Geo-Replication
- Geo-replication provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LAN), Wide Area Networks (WAN), and the Internet.
- N-way Replication
- Local synchronous data replication that is typically deployed across campus or Amazon Web Services Availability Zones.
- Petabyte
- A petabyte is a unit of information equal to one quadrillion bytes, or 1000 terabytes. The unit symbol for the petabyte is PB. The prefix peta- (P) indicates a power of 1000: 1 PB = 1,000,000,000,000,000 B = 1000^5 B = 10^15 B. The term "pebibyte" (PiB), using a binary prefix, is used for the corresponding power of 1024. Source: Wikipedia
- RAID
- Redundant Array of Independent Disks (RAID) is a technology that provides increased storage reliability through redundancy. It combines multiple low-cost, less-reliable disk drive components into a logical unit where all drives in the array are interdependent.
- RRDNS
- Round Robin Domain Name Service (RRDNS) is a method to distribute load across application servers. RRDNS is implemented by creating multiple records with the same name and different IP addresses in the zone file of a DNS server.
- Server
- The machine (virtual or bare metal) that hosts the file system in which data is stored.
- Block Storage
- Block special files, or block devices, correspond to devices through which the system moves data in the form of blocks. These device nodes often represent addressable devices such as hard disks, CD-ROM drives, or memory regions. In Red Hat Gluster Storage 3.4 and later, block storage supports only OpenShift Container Storage converged and independent mode use cases. Block storage can be created and configured for this use case by using the gluster-block command line tool. For more information, see Container-Native Storage for OpenShift Container Platform.
- Scale-Up Storage
- Increases the capacity of the storage device in a single dimension. For example, adding additional disk capacity in a trusted storage pool.
- Scale-Out Storage
- Increases the capability of a storage device in a single dimension. For example, adding more systems of the same size, or adding servers to a trusted storage pool, which increases CPU, disk capacity, and throughput for the trusted storage pool.
- Trusted Storage Pool
- A storage pool is a trusted network of storage servers. When you start the first server, the storage pool consists of only that server.
- Namespace
- An abstract container or environment that is created to hold a logical grouping of unique identifiers or symbols. Each Red Hat Gluster Storage trusted storage pool exposes a single namespace as a POSIX mount point which contains every file in the trusted storage pool.
- User Space
- Applications running in user space do not directly interact with hardware, instead using the kernel to moderate access. User space applications are generally more portable than applications in kernel space. glusterFS is a user space application.
- Hashed subvolume
- A Distributed Hash Table Translator subvolume to which the file or directory name is hashed.
- Cached subvolume
- A Distributed Hash Table Translator subvolume where the file content is actually present. For directories, the concept of cached-subvolume is not relevant. It is loosely used to mean subvolumes which are not hashed-subvolume.
- Linkto-file
- For a newly created file, the hashed and cached subvolumes are the same. When directory entry operations like rename (which can change the name and hence hashed subvolume of the file) are performed on the file, instead of moving the entire data in the file to a new hashed subvolume, a file is created with the same name on the newly hashed subvolume. The purpose of this file is only to act as a pointer to the node where the data is present. In the extended attributes of this file, the name of the cached subvolume is stored. This file on the newly hashed-subvolume is called a linkto-file. The linkto file is relevant only for non-directory entities.
- Directory Layout
- The directory layout helps determine where files in a gluster volume are stored. When a client creates or requests a file, the DHT translator hashes the file's path to create an integer. Each directory in a gluster subvolume holds files that have integers in a specific range, so the hash of any given file maps to a specific subvolume in the gluster volume. The directory layout determines which integer ranges are assigned to a given directory across all subvolumes. Directory layouts are assigned when a directory is first created, and can be reassigned by running a rebalance operation on the volume. If a brick or subvolume is offline when a directory is created, it will not be part of the layout until after a rebalance is run. You should rebalance a volume to recalculate its directory layout after bricks are added to the volume. See Section 11.11, “Rebalancing Volumes” for more information.
- Fix Layout
- A command that is executed during the rebalance process. The rebalance process itself comprises two stages:
- Fixes the layouts of directories to accommodate any subvolumes that are added or removed. It also heals the directories, checks whether the layout is non-contiguous, and persists the layout in extended attributes, if needed. It also ensures that the directories have the same attributes across all the subvolumes.
- Migrates the data from the cached-subvolume to the hashed-subvolume.
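The relationship between a file's hash and a directory layout can be sketched in a few lines of shell. This is only an illustration under stated assumptions: glusterFS uses its own hash function and stores the actual ranges in extended attributes, whereas the sketch below uses `cksum` as a stand-in hash and assumes the 32-bit hash space is divided equally among the subvolumes. The function names `name_hash` and `subvol_for_hash` are hypothetical.

```shell
#!/bin/sh
# Illustrative only: cksum stands in for the real glusterFS hash, and the
# 32-bit hash space is assumed to be split into equal ranges per subvolume.

HASH_SPACE=4294967296   # 2^32 possible hash values

# name_hash NAME: print a 32-bit integer derived from NAME (stand-in hash).
name_hash() {
    printf '%s' "$1" | cksum | awk '{print $1}'
}

# subvol_for_hash HASH COUNT: print the 0-based index of the subvolume whose
# range contains HASH when COUNT subvolumes share the hash space equally.
subvol_for_hash() {
    range=$((HASH_SPACE / $2))
    index=$(($1 / range))
    # The last range absorbs any remainder left by integer division.
    if [ "$index" -ge "$2" ]; then
        index=$(($2 - 1))
    fi
    echo "$index"
}

# Example: which of 3 subvolumes would hold "report.txt" under these assumptions?
subvol_for_hash "$(name_hash report.txt)" 3
```

Changing the subvolume count changes the ranges, which is why files created before bricks were added may no longer sit on their hashed subvolume until the layout is fixed and data is migrated.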
Part III. Configure and Verify
Chapter 3. Considerations for Red Hat Gluster Storage
3.1. Firewall and Port Access
3.1.1. Configuring the Firewall
Use the following iptables command to open a port:
# iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
# service iptables save
Important
# firewall-cmd --zone=zone_name --add-service=glusterfs
# firewall-cmd --zone=zone_name --add-service=glusterfs --permanent
# firewall-cmd --zone=zone_name --add-port=port/protocol
# firewall-cmd --zone=zone_name --add-port=port/protocol --permanent
# firewall-cmd --zone=public --add-port=5667/tcp
# firewall-cmd --zone=public --add-port=5667/tcp --permanent
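When several ports need to be opened, the runtime and permanent commands above can be generated in a loop. The following is a minimal sketch that only prints the commands for review rather than running them; the helper name `open_port_cmds` and the choice of ports in the loop are illustrative assumptions.

```shell
#!/bin/sh
# open_port_cmds ZONE PORT PROTOCOL
# Print (do not run) the runtime and permanent firewall-cmd invocations
# for one port, mirroring the command pair shown above.
open_port_cmds() {
    echo "firewall-cmd --zone=$1 --add-port=$2/$3"
    echo "firewall-cmd --zone=$1 --add-port=$2/$3 --permanent"
}

# Review the generated commands for a few example ports, then run them as root.
for port in 24007 5667; do
    open_port_cmds public "$port" tcp
done
```

Printing the commands first keeps the sketch safe to run on any machine; piping the output to a root shell would apply the changes.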
3.1.2. Port Access Requirements
Connection source | TCP Ports | UDP Ports | Recommended for | Used for |
---|---|---|---|---|
Any authorized network entity with a valid SSH key | 22 | - | All configurations | Remote backup using geo-replication |
Any authorized network entity; be cautious not to clash with other RPC services. | 111 | 111 | All configurations | RPC port mapper and RPC bind |
Any authorized SMB/CIFS client | 139 and 445 | 137 and 138 | Sharing storage using SMB/CIFS | SMB/CIFS protocol |
Any authorized NFS clients | 2049 | 2049 | Sharing storage using Gluster NFS or NFS-Ganesha | Exports using NFS protocol |
All servers in the Samba-CTDB cluster | 4379 | - | Sharing storage using SMB and Gluster NFS | CTDB |
Any authorized network entity | 24007 | - | All configurations | Management processes using glusterd |
Any authorized network entity | 55555 | - | All configurations | Gluster events daemon. If you are upgrading from a previous version of Red Hat Gluster Storage to the latest version 3.5.4, the port used for the glusterevents daemon should be modified to be in the ephemeral range. |
NFSv3 clients | 662 | 662 | Sharing storage using NFS-Ganesha and Gluster NFS | statd |
NFSv3 clients | 32803 | 32803 | Sharing storage using NFS-Ganesha and Gluster NFS | NLM protocol |
NFSv3 clients sending mount requests | - | 32769 | Sharing storage using Gluster NFS | Gluster NFS MOUNT protocol |
NFSv3 clients sending mount requests | 20048 | 20048 | Sharing storage using NFS-Ganesha | NFS-Ganesha MOUNT protocol |
NFS clients | 875 | 875 | Sharing storage using NFS-Ganesha | NFS-Ganesha RQUOTA protocol (fetching quota information) |
Servers in pacemaker/corosync cluster | 2224 | - | Sharing storage using NFS-Ganesha | pcsd |
Servers in pacemaker/corosync cluster | 3121 | - | Sharing storage using NFS-Ganesha | pacemaker_remote |
Servers in pacemaker/corosync cluster | - | 5404 and 5405 | Sharing storage using NFS-Ganesha | corosync |
Servers in pacemaker/corosync cluster | 21064 | - | Sharing storage using NFS-Ganesha | dlm |
Any authorized network entity | 49152 - 49664 | - | All configurations | Brick communication ports. The total number of ports required depends on the number of bricks on the node. One port is required for each brick on the machine. |
Gluster Clients | 1023 or 49152 | - | Applicable when system ports are already being used in the machines. | Communication between brick and client processes. |
Connection source | TCP Ports | UDP Ports | Recommended for | Used for |
---|---|---|---|---|
NFSv3 servers | 662 | 662 | Sharing storage using NFS-Ganesha and Gluster NFS | statd |
NFSv3 servers | 32803 | 32803 | Sharing storage using NFS-Ganesha and Gluster NFS | NLM protocol |
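Because one brick communication port is required for each brick on a node, the range to open can be derived from the brick count. The sketch below assumes that brick ports are allocated consecutively starting at 49152, which the table suggests but does not state outright; the helper name `brick_port_range` is hypothetical.

```shell
#!/bin/sh
# brick_port_range COUNT
# Print the TCP port range needed for COUNT bricks on one node, assuming ports
# are allocated consecutively from 49152 and stay within 49152 - 49664.
brick_port_range() {
    first=49152
    last=$((first + $1 - 1))
    if [ "$1" -lt 1 ] || [ "$last" -gt 49664 ]; then
        echo "brick count out of range" >&2
        return 1
    fi
    echo "$first-$last"
}

# Example: print the firewall-cmd call that would open ports for 4 bricks.
echo "firewall-cmd --zone=public --add-port=$(brick_port_range 4)/tcp"
```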
3.2. Feature Compatibility Support
Note
Feature | Version |
---|---|
Arbiter bricks | 3.2 |
Bitrot detection | 3.1 |
Erasure coding | 3.1 |
Google Compute Engine | 3.1.3 |
Metadata caching | 3.2 |
Microsoft Azure | 3.1.3 |
NFS version 4 | 3.1 |
SELinux | 3.1 |
Sharding | 3.2.0 |
Snapshots | 3.0 |
Snapshots, cloning | 3.1.3 |
Snapshots, user-serviceable | 3.0.3 |
Tiering (Deprecated) | 3.1.2 |
Volume Shadow Copy (VSS) | 3.1.3 |
Volume Type | Sharding | Tiering (Deprecated) | Quota | Snapshots | Geo-Rep | Bitrot |
---|---|---|---|---|---|---|
Arbitrated-Replicated | Yes | No | Yes | Yes | Yes | Yes |
Distributed | No | Yes | Yes | Yes | Yes | Yes |
Distributed-Dispersed | No | Yes | Yes | Yes | Yes | Yes |
Distributed-Replicated | Yes | Yes | Yes | Yes | Yes | Yes |
Replicated | Yes | Yes | Yes | Yes | Yes | Yes |
Sharded | N/A | No | No | No | Yes | No |
Tiered (Deprecated) | No | N/A | Limited[a] | Limited[a] | Limited[a] | Limited[a] |
Feature | FUSE | Gluster-NFS | NFS-Ganesha | SMB |
---|---|---|---|---|
Arbiter | Yes | Yes | Yes | Yes |
Bitrot detection | Yes | Yes | No | Yes |
dm-cache | Yes | Yes | Yes | Yes |
Encryption (TLS-SSL) | Yes | Yes | Yes | Yes |
Erasure coding | Yes | Yes | Yes | Yes |
Export subdirectory | Yes | Yes | Yes | N/A |
Geo-replication | Yes | Yes | Yes | Yes |
Quota (Deprecated). Warning: The quota feature is deprecated in Red Hat Gluster Storage 3.5.3. Red Hat no longer recommends using this feature and does not support it on new deployments or on existing deployments that upgrade to Red Hat Gluster Storage 3.5.3. See Chapter 9, Managing Directory Quotas for more details. | Yes | Yes | Yes | Yes |
RDMA (Deprecated). Warning: Using RDMA as a transport protocol is considered deprecated in Red Hat Gluster Storage 3.5. Red Hat no longer recommends its use, and does not support it on new deployments and existing deployments that upgrade to Red Hat Gluster Storage 3.5.3. | Yes | No | No | No |
Snapshots | Yes | Yes | Yes | Yes |
Snapshot cloning | Yes | Yes | Yes | Yes |
Tiering (Deprecated). Warning: Tiering is considered deprecated as of Red Hat Gluster Storage 3.5. Red Hat no longer recommends its use, and does not support tiering in new deployments and existing deployments that upgrade to Red Hat Gluster Storage 3.5.3. | Yes | Yes | N/A | N/A |
Chapter 4. Adding Servers to the Trusted Storage Pool
Important
# firewall-cmd --get-active-zones
# firewall-cmd --zone=zone_name --add-service=glusterfs
# firewall-cmd --zone=zone_name --add-service=glusterfs --permanent
Note
gluster volume status VOLNAME
command is executed from two of the nodes simultaneously.
4.1. Adding Servers to the Trusted Storage Pool
The gluster peer probe [server] command is used to add servers to the trusted storage pool.
Note
Adding Three Servers to a Trusted Storage Pool
Prerequisites
- The glusterd service must be running on all storage servers requiring addition to the trusted storage pool. See Chapter 22, Starting and Stopping the glusterd service for service start and stop commands.
- Server1, the trusted storage server, is started.
- The host names of the target servers must be resolvable by DNS.
- Run gluster peer probe [server] from Server1 to add additional servers to the trusted storage pool.
Note
- Self-probing Server1 will result in an error because it is part of the trusted storage pool by default.
- All the servers in the Trusted Storage Pool must have RDMA devices if either RDMA or RDMA,TCP volumes are created in the storage pool. The peer probe must be performed using IP/hostname assigned to the RDMA device.
# gluster peer probe server2
Probe successful
# gluster peer probe server3
Probe successful
# gluster peer probe server4
Probe successful
- Verify the peer status from all servers using the following command:
# gluster peer status
Number of Peers: 3

Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)

Hostname: server4
Uuid: 3e0caba-9df7-4f66-8e5d-cbc348f29ff7
State: Peer in Cluster (Connected)
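Checking the peer status by eye becomes tedious as the pool grows. The following sketch filters `gluster peer status` output and prints any peer whose state is not "Peer in Cluster (Connected)". It assumes the output keeps the Hostname/Uuid/State line format shown above; the function name `disconnected_peers` is hypothetical.

```shell
#!/bin/sh
# disconnected_peers: read `gluster peer status` output on stdin and print the
# hostname of every peer whose State line is not "Peer in Cluster (Connected)".
disconnected_peers() {
    awk '
        /^Hostname:/ { host = $2 }
        /^State:/ {
            if ($0 !~ /Peer in Cluster \(Connected\)/) print host
        }
    '
}

# Typical use on a storage server (requires the gluster CLI):
#   gluster peer status | disconnected_peers
```

Empty output means every listed peer reported the connected state.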
Important
Note
# for peer in $(gluster peer status | awk '/^Hostname:/ {print $2}'); do clockdiff "$peer"; done
4.2. Removing Servers from the Trusted Storage Pool
Warning
Run gluster peer detach server to remove a server from the storage pool.
Removing One Server from the Trusted Storage Pool
Prerequisites
- The glusterd service must be running on the server targeted for removal from the storage pool. See Chapter 22, Starting and Stopping the glusterd service for service start and stop commands.
- The host names of the target servers must be resolvable by DNS.
- Run gluster peer detach [server] to remove the server from the trusted storage pool.
# gluster peer detach (server)
All clients mounted through the peer which is getting detached needs to be remounted, using one of the other active peers in the trusted storage pool, this ensures that the client gets notification on any changes done on the gluster configuration and if the same has been done do you want to proceed? (y/n) y
peer detach: success
- Verify the peer status from all servers using the following command:
# gluster peer status
Number of Peers: 2

Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)

Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
Chapter 5. Setting Up Storage Volumes
Warning
Note
Run yum groupinstall "Infiniband Support" to install Infiniband packages.
Volume Types
- Distributed
- Distributes files across bricks in the volume. Use this volume type where scaling and redundancy requirements are not important, or provided by other hardware or software layers. See Section 5.4, “Creating Distributed Volumes” for additional information about this volume type.
- Replicated
- Replicates files across bricks in the volume. Use this volume type in environments where high-availability and high-reliability are critical. See Section 5.5, “Creating Replicated Volumes” for additional information about this volume type.
- Distributed Replicated
- Distributes files across replicated bricks in the volume. Use this volume type in environments where high-reliability and scalability are critical. This volume type offers improved read performance in most environments. See Section 5.6, “Creating Distributed Replicated Volumes” for additional information about this volume type.
- Arbitrated Replicated
- Replicates files across two bricks in a replica set, and replicates only metadata to the third brick. Use this volume type in environments where consistency is critical, but underlying storage space is at a premium. See Section 5.7, “Creating Arbitrated Replicated Volumes” for additional information about this volume type.
- Dispersed
- Disperses the file's data across the bricks in the volume. Use this volume type where you need a configurable level of reliability with a minimum space waste. See Section 5.8, “Creating Dispersed Volumes” for additional information about this volume type.
- Distributed Dispersed
- Distributes the file's data across dispersed subvolumes. Use this volume type where you need a configurable level of reliability with a minimum space waste. See Section 5.9, “Creating Distributed Dispersed Volumes” for additional information about this volume type.
5.1. Setting up Gluster Storage Volumes using gdeploy
- The backend on several machines can be set up from a single laptop or desktop. This saves time and scales well as the number of nodes in the trusted storage pool increases.
- Flexibility in choosing the drives to configure. (sd, vd, ...).
- Flexibility in naming the logical volumes (LV) and volume groups (VG).
5.1.1. Getting Started
- Generate the passphrase-less SSH keys for the nodes which are going to be part of the trusted storage pool by running the following command:
# ssh-keygen -t rsa -N ''
- Set up key-based SSH authentication access between the gdeploy controller and servers by running the following command:
# ssh-copy-id -i root@server
Note
If you are using a Red Hat Gluster Storage node as the deployment node and not an external node, then the key-based SSH authentication must be set up for the Red Hat Gluster Storage node from where the installation is performed.
- Enable the repository required to install Ansible by running the following command:
For Red Hat Enterprise Linux 8
# subscription-manager repos --enable=ansible-2-for-rhel-8-x86_64-rpms
For Red Hat Enterprise Linux 7
# subscription-manager repos --enable=rhel-7-server-ansible-2-rpms
- Install ansible by executing the following command:
# yum install ansible
- You must also ensure the following:
- Devices should be raw and unused
- Default system locale must be set to en_US
For information on system locale, refer to the Setting the System Locale of the Red Hat Enterprise Linux 7 System Administrator's Guide.
- For multiple devices, use multiple volume groups, thinpool, and thinvol in the gdeploy configuration file
- Using a node in a trusted storage pool
- Using a machine outside the trusted storage pool
The gdeploy package is bundled as part of the initial installation of Red Hat Gluster Storage.
You must ensure that Red Hat Gluster Storage is subscribed to the required channels. For more information, see Subscribing to the Red Hat Gluster Storage Server Channels in the Red Hat Gluster Storage 3.5 Installation Guide.
# yum install gdeploy
For more information on installing gdeploy, see the Installing Ansible to Support Gdeploy section in the Red Hat Gluster Storage 3.5 Installation Guide.
5.1.2. Setting up a Trusted Storage Pool
/usr/share/doc/gdeploy/examples/gluster.conf.sample
Note
#
# Usage:
#       gdeploy -c 3x3-volume-create.conf
#
# This does backend setup first and then create the volume using the
# setup bricks.
#
#
[hosts]
10.70.46.13
10.70.46.17
10.70.46.21

# Common backend setup for 2 of the hosts.
[backend-setup]
devices=sdb,sdc,sdd
vgs=vg1,vg2,vg3
pools=pool1,pool2,pool3
lvs=lv1,lv2,lv3
mountpoints=/rhgs/brick1,/rhgs/brick2,/rhgs/brick3
brick_dirs=/rhgs/brick1/b1,/rhgs/brick2/b2,/rhgs/brick3/b3

# If backend-setup is different for each host
# [backend-setup:10.70.46.13]
# devices=sdb
# brick_dirs=/rhgs/brick1
#
# [backend-setup:10.70.46.17]
# devices=sda,sdb,sdc
# brick_dirs=/rhgs/brick{1,2,3}
#

[volume]
action=create
volname=sample_volname
replica=yes
replica_count=3
force=yes

[clients]
action=mount
volname=sample_volname
hosts=10.70.46.15
fstype=glusterfs
client_mount_points=/mnt/gluster
With this configuration, a volume named sample_volname is created on the listed hosts, using /dev/sdb, /dev/sdc, and /dev/sdd as the backend devices.
# gdeploy -c txt.conf
Note
You can create a new configuration file by using the template file available at /usr/share/doc/gdeploy/examples/gluster.conf.sample as a reference. To invoke the new configuration file, run the gdeploy -c /path_to_file/config.txt command.
To only set up the backend, see Section 5.1.3, “Setting up the Backend”.
To only create a volume, see Section 5.1.4, “Creating Volumes”.
To only mount clients, see Section 5.1.5, “Mounting Clients”.
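A configuration file like the sample above can also be produced by a small script, which helps when the host list changes often. The sketch below prints a [hosts] section and a [volume] section using only keys that appear in this chapter's samples. The function name `write_volume_conf`, its argument order, and the output file name are assumptions made for illustration, and the result should be reviewed before it is passed to gdeploy.

```shell
#!/bin/sh
# write_volume_conf VOLNAME REPLICA_COUNT BRICK_DIRS HOST...
# Print a minimal gdeploy configuration with a [hosts] section and a
# [volume] section. BRICK_DIRS is a comma-separated list of brick directories.
write_volume_conf() {
    volname=$1
    replica_count=$2
    brick_dirs=$3
    shift 3
    echo "[hosts]"
    for host in "$@"; do
        echo "$host"
    done
    echo
    echo "[volume]"
    echo "action=create"
    echo "volname=$volname"
    echo "replica=yes"
    echo "replica_count=$replica_count"
    echo "brick_dirs=$brick_dirs"
}

# Example: print a configuration for three hosts. Redirect the output to a
# file such as my-volume.conf (hypothetical name) and run gdeploy -c on it.
write_volume_conf sample_volname 3 /rhgs/brick1/b1 10.70.46.13 10.70.46.17 10.70.46.21
```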
5.1.3. Setting up the Backend
/usr/share/doc/gdeploy/examples/gluster.conf.sample
- Using the [backend-setup] module
- Creating Physical Volume (PV), Volume Group (VG), and Logical Volume (LV) individually
Note
The xfsprogs package must be installed before setting up the backend bricks using gdeploy.
Important
5.1.3.1. Using the [backend-setup] Module
- Generic
- Specific
If the disk names are uniform across the machines, then backend setup can be written as below. The backend is set up for all the hosts in the [hosts] section.
#
# Usage:
#       gdeploy -c backend-setup-generic.conf
#
# This configuration creates backend for GlusterFS clusters
#

[hosts]
10.70.46.130
10.70.46.32
10.70.46.110
10.70.46.77

# Backend setup for all the nodes in the `hosts' section. This will create
# PV, VG, and LV with gdeploy generated names.
[backend-setup]
devices=vdb
If the disk names vary across the machines in the cluster, then backend setup can be written for specific machines with specific disk names. gdeploy allows host-specific setup in a single configuration file.
#
# Usage:
#       gdeploy -c backend-setup-hostwise.conf
#
# This configuration creates backend for GlusterFS clusters
#

[hosts]
10.70.46.130
10.70.46.32
10.70.46.110
10.70.46.77

# Backend setup for 10.70.46.77 with default gdeploy generated names for
# Volume Groups and Logical Volumes. Volume names will be GLUSTER_vg1,
# GLUSTER_vg2...
[backend-setup:10.70.46.77]
devices=vda,vdb

# Backend setup for remaining 3 hosts in the `hosts' section with custom names
# for Volumes Groups and Logical Volumes.
[backend-setup:10.70.46.{130,32,110}]
devices=vdb,vdc,vdd
vgs=vg1,vg2,vg3
pools=pool1,pool2,pool3
lvs=lv1,lv2,lv3
mountpoints=/rhgs/brick1,/rhgs/brick2,/rhgs/brick3
brick_dirs=/rhgs/brick1/b1,/rhgs/brick2/b2,/rhgs/brick3/b3
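When disk names differ on many hosts, the host-specific sections can be generated from an inventory instead of being typed by hand. The sketch below reads "HOST DEVICES" pairs and prints one [backend-setup:HOST] section per host with a devices key, as in the example above. The function name `backend_sections` and the inventory format are assumptions for illustration.

```shell
#!/bin/sh
# backend_sections: read "HOST DEVICES" pairs on stdin (one per line, DEVICES
# comma-separated) and print one [backend-setup:HOST] section per host.
backend_sections() {
    while read -r host devices; do
        [ -n "$host" ] || continue
        printf '[backend-setup:%s]\ndevices=%s\n\n' "$host" "$devices"
    done
}

# Example inventory; append the output below the [hosts] section of a
# gdeploy configuration file.
backend_sections <<'EOF'
10.70.46.77 vda,vdb
10.70.46.130 vdb,vdc,vdd
EOF
```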
5.1.3.2. Creating Backend by Setting up PV, VG, and LV
[hosts]
10.70.46.130
10.70.46.32

[pv]
action=create
devices=vdb

[vg1]
action=create
vgname=RHS_vg1
pvname=vdb

[lv1]
action=create
vgname=RHS_vg1
lvname=engine_lv
lvtype=thick
size=10GB
mount=/rhgs/brick1

[lv2]
action=create
vgname=RHS_vg1
poolname=lvthinpool
lvtype=thinpool
poolmetadatasize=200MB
chunksize=1024k
size=30GB

[lv3]
action=create
lvname=lv_vmaddldisks
poolname=lvthinpool
vgname=RHS_vg1
lvtype=thinlv
mount=/rhgs/brick2
virtualsize=9GB

[lv4]
action=create
lvname=lv_vmrootdisks
poolname=lvthinpool
vgname=RHS_vg1
size=19GB
lvtype=thinlv
mount=/rhgs/brick3
virtualsize=19GB
#
# Extends a given VG. pvname and vgname are mandatory. In this example, the
# vg `RHS_vg1' is extended by adding pv, vdd. If the pv is not already present, it
# is created by gdeploy.
#
[hosts]
10.70.46.130
10.70.46.32

[vg2]
action=extend
vgname=RHS_vg1
pvname=vdd
5.1.4. Creating Volumes
/usr/share/doc/gdeploy/examples/gluster.conf.sample
[hosts]
10.0.0.1
10.0.0.2
10.0.0.3
10.0.0.4

[volume]
action=create
volname=glustervol
transport=tcp,rdma
replica=yes
replica_count=3
brick_dirs=/glus/brick1/b1,/glus/brick1/b1,/glus/brick1/b1
force=yes
# gdeploy -c txt.conf
Note
[hosts]
10.70.46.130
10.70.46.32
10.70.46.16

[backend-setup]
devices=vdb,vdc,vdd,vde
mountpoints=/mnt/data{1-6}
brick_dirs=/mnt/data1/1,/mnt/data2/2,/mnt/data3/3,/mnt/data4/4,/mnt/data5/5,/mnt/data6/6

[volume1]
action=create
volname=vol-one
transport=tcp
replica=yes
replica_count=3
brick_dirs=/mnt/data1/1,/mnt/data2/2,/mnt/data5/5

[volume2]
action=create
volname=vol-two
transport=tcp
replica=yes
replica_count=3
brick_dirs=/mnt/data3/3,/mnt/data4/4,/mnt/data6/6
[hosts]
10.70.46.130
10.70.46.32
10.70.46.16

[backend-setup]
devices=vdb,vdc
mountpoints=/mnt/data{1-6}

[volume1]
action=create
volname=vol-one
transport=tcp
replica=yes
replica_count=3
key=group,storage.owner-uid,storage.owner-gid,features.shard,features.shard-block-size,performance.low-prio-threads,cluster.data-self-heal-algorithm
value=virt,36,36,on,512MB,32,full
brick_dirs=/mnt/data1/1,/mnt/data3/3,/mnt/data5/5

[volume2]
action=create
volname=vol-two
transport=tcp
replica=yes
key=group,storage.owner-uid,storage.owner-gid,features.shard,features.shard-block-size,performance.low-prio-threads,cluster.data-self-heal-algorithm
value=virt,36,36,on,512MB,32,full
replica_count=3
brick_dirs=/mnt/data2/2,/mnt/data4/4,/mnt/data6/6
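The key and value lines in the example above are long, and a missing comma silently shifts every later option. Assuming that gdeploy matches the two comma-separated lists by position, as the equal item counts in the example suggest, the following sketch pairs them up for review before deployment. The function name `pair_options` is hypothetical.

```shell
#!/bin/sh
# pair_options KEYS VALUES
# Print "key=value" for each position in two comma-separated lists, or fail
# if the lists differ in length.
pair_options() {
    keys=$1
    values=$2
    nkeys=$(printf '%s\n' "$keys" | awk -F',' '{print NF}')
    nvalues=$(printf '%s\n' "$values" | awk -F',' '{print NF}')
    if [ "$nkeys" -ne "$nvalues" ]; then
        echo "key/value count mismatch: $nkeys keys, $nvalues values" >&2
        return 1
    fi
    i=1
    while [ "$i" -le "$nkeys" ]; do
        k=$(printf '%s\n' "$keys" | cut -d',' -f"$i")
        v=$(printf '%s\n' "$values" | cut -d',' -f"$i")
        echo "$k=$v"
        i=$((i + 1))
    done
}

# Example using the first four options from the configuration above.
pair_options "group,storage.owner-uid,storage.owner-gid,features.shard" "virt,36,36,on"
```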
5.1.5. Mounting Clients
/usr/share/doc/gdeploy/examples/gluster.conf.sample
[clients]
action=mount
hosts=10.70.46.159
fstype=glusterfs
client_mount_points=/mnt/gluster
volname=10.0.0.1:glustervol
Note
If the file system type (fstype) is NFS, then specify the version with nfs-version. The default version is 3.
# gdeploy -c txt.conf
5.1.6. Configuring a Volume
5.1.6.1. Adding and Removing a Brick
Modify the [volume] section in the configuration file to add a brick. For example:
[volume]
action=add-brick
volname=10.0.0.1:glustervol
bricks=10.0.0.1:/rhgs/new_brick
# gdeploy -c txt.conf
Modify the [volume] section in the configuration file to remove a brick. For example:
[volume]
action=remove-brick
volname=10.0.0.1:glustervol
bricks=10.0.0.2:/rhgs/brick
state=commit
Other options for state are stop, start, and force.
# gdeploy -c txt.conf
5.1.6.2. Rebalancing a Volume
[volume]
action=rebalance
volname=10.70.46.13:glustervol
state=start
Other options for state are stop and fix-layout.
# gdeploy -c txt.conf
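The state values mentioned in this section differ between remove-brick and rebalance, so a quick check before running gdeploy can catch a mismatched combination. The sketch below encodes only the values named in this section; the function name `valid_state` is hypothetical, and gdeploy itself may accept values not listed here.

```shell
#!/bin/sh
# valid_state ACTION STATE
# Succeed if STATE is one of the values this section lists for ACTION.
valid_state() {
    case "$1:$2" in
        remove-brick:commit|remove-brick:start|remove-brick:stop|remove-brick:force) return 0 ;;
        rebalance:start|rebalance:stop|rebalance:fix-layout) return 0 ;;
        *) return 1 ;;
    esac
}

# Example check before writing state=fix-layout into a [volume] section.
if valid_state rebalance fix-layout; then
    echo "ok: rebalance accepts state=fix-layout"
fi
```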
5.1.6.3. Starting, Stopping, or Deleting a Volume
Modify the [volume] section in the configuration file to start a volume. For example:
[volume]
action=start
volname=10.0.0.1:glustervol
# gdeploy -c txt.conf
Modify the [volume] section in the configuration file to stop a volume. For example:
[volume]
action=stop
volname=10.0.0.1:glustervol
# gdeploy -c txt.conf
Modify the [volume] section in the configuration file to delete a volume. For example:
[volume]
action=delete
volname=10.70.46.13:glustervol
# gdeploy -c txt.conf
5.1.7. Configuration File
- [hosts]
- [devices]
- [disktype]
- [diskcount]
- [stripesize]
- [vgs]
- [pools]
- [lvs]
- [mountpoints]
- [peer]
- [clients]
- [volume]
- [backend-setup]
- [pv]
- [vg]
- [lv]
- [RH-subscription]
- [yum]
- [shell]
- [update-file]
- [service]
- [script]
- [firewalld]
- [geo-replication]
- hosts
This is a mandatory section which contains the IP address or hostname of the machines in the trusted storage pool. Each hostname or IP address should be listed in a separate line.
For example:
[hosts]
10.0.0.1
10.0.0.2
- devices
This is a generic section and is applicable to all the hosts listed in the [hosts] section. However, if host-specific sections such as [hostname] or [IP-address] are present, then the data in generic sections like [devices] is ignored. Host-specific data takes precedence. This is an optional section.
For example:
[devices]
/dev/sda
/dev/sdb
Note
When configuring the backend setup, the devices should be listed either in this section or in the host-specific section.
- disktype
This section specifies the disk configuration that is used while setting up the backend. gdeploy supports RAID 10, RAID 6, RAID 5, and JBOD configurations. This is an optional section and if the field is left empty, JBOD is taken as the default configuration. Valid values for this field are raid10, raid6, raid5, and jbod.
For example:
[disktype]
raid6
- diskcount
This section specifies the number of data disks in the setup. This is a mandatory field if a RAID disk type is specified under [disktype]. If the [disktype] is JBOD, the [diskcount] value is ignored. This parameter is host specific.
For example:
[diskcount]
10
- stripesize
This section specifies the stripe_unit size in KB.
Case 1: This field is not necessary if the [disktype] is JBOD, and any given value will be ignored.
Case 2: This is a mandatory field if [disktype] is specified as RAID 5 or RAID 6.
For [disktype] RAID 10, the default value is taken as 256KB. Red Hat does not recommend changing this value. If you specify any other value the following warning is displayed:
"Warning: We recommend a stripe unit size of 256KB for RAID 10"
Note
Do not add any suffixes like K, KB, M, etc. This parameter is host specific and can be added in the hosts section.
For example:
[stripesize]
128
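The rules for [disktype] and [stripesize] described above can be checked before a configuration file is handed to gdeploy. The following sketch mirrors those rules as a standalone shell function; it is not part of gdeploy, and the name `check_disk_settings` and its output convention are assumptions made for illustration.

```shell
#!/bin/sh
# check_disk_settings DISKTYPE [STRIPESIZE]
# Apply the [disktype]/[stripesize] rules described above and print the
# stripe size in KB that would take effect ("ignored" for JBOD).
check_disk_settings() {
    disktype=${1:-jbod}
    stripesize=$2
    case "$stripesize" in
        *[!0-9]*) echo "stripesize must be a plain number in KB" >&2; return 1 ;;
    esac
    case "$disktype" in
        jbod) echo "ignored" ;;
        raid5|raid6)
            [ -n "$stripesize" ] || { echo "stripesize is mandatory for $disktype" >&2; return 1; }
            echo "$stripesize" ;;
        raid10)
            if [ -n "$stripesize" ] && [ "$stripesize" -ne 256 ]; then
                echo "Warning: We recommend a stripe unit size of 256KB for RAID 10" >&2
            fi
            echo "${stripesize:-256}" ;;
        *) echo "unknown disktype: $disktype" >&2; return 1 ;;
    esac
}

# Example: the values from the [disktype] and [stripesize] examples above.
check_disk_settings raid6 128
```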
- vgs
This section is deprecated in gdeploy 2.0. See [backend-setup] for details on gdeploy 2.0. This section specifies the volume group names for the devices listed in [devices]. The number of volume groups in the [vgs] section should match the number of devices in [devices]. If the volume group names are missing, the volume groups are named GLUSTER_vg{1, 2, 3, ...} by default.
For example:
[vgs]
CUSTOM_vg1
CUSTOM_vg2
- pools
This section is deprecated in gdeploy 2.0. See [backend-setup] for details on gdeploy 2.0. This section specifies the pool names for the volume groups specified in the [vgs] section. The number of pools listed in the [pools] section should match the number of volume groups in the [vgs] section. If the pool names are missing, the pools are named GLUSTER_pool{1, 2, 3, ...}.
For example:
[pools]
CUSTOM_pool1
CUSTOM_pool2
- lvs
This section is deprecated in gdeploy 2.0; see [backend-setup] for the gdeploy 2.0 equivalent. This section provides the logical volume names for the volume groups specified in [vgs]. The number of logical volumes listed in the [lvs] section should match the number of volume groups listed in [vgs]. If the logical volume names are missing, the logical volumes are named GLUSTER_lv{1, 2, 3, ...}.
For example:
[lvs]
CUSTOM_lv1
CUSTOM_lv2
- mountpoints
This section is deprecated in gdeploy 2.0; see [backend-setup] for the gdeploy 2.0 equivalent. This section specifies the brick mount points for the logical volumes. The number of mount points should match the number of logical volumes specified in [lvs]. If the mount points are missing, the mount points are named /gluster/brick{1, 2, 3…}.
For example:
[mountpoints]
/rhgs/brick1
/rhgs/brick2
- peer
This section specifies the configuration for Trusted Storage Pool (TSP) management. It makes all the hosts specified in the [hosts] section either probe each other to create the trusted storage pool or detach from the trusted storage pool. The only option in this section is 'action', whose value can be either probe or detach.
For example:
[peer]
action=probe
- clients
This section specifies the client hosts and client_mount_points on which to mount the gluster storage volume that was created. The 'action' option tells the framework which action to perform. The options are 'mount' and 'unmount'. The client hosts field is mandatory. If the mount points are not specified, the default /mnt/gluster is used for all the hosts.
The fstype option specifies how the gluster volume is to be mounted. The default is glusterfs (FUSE mount). The volume can also be mounted as NFS. Each client can use a different mount type, specified as a comma-separated list. The following fields are included:
* action
* hosts
* fstype
* client_mount_points
For example:
[clients]
action=mount
hosts=10.0.0.10
fstype=nfs
options=vers=3
client_mount_points=/mnt/rhs
- volume
This section specifies the configuration options for the volume. The following fields are included in this section:
* action
* volname
* transport
* replica
* replica_count
* disperse
* disperse_count
* redundancy_count
* force
- action
This option specifies the action to perform on the volume. The choices are create, delete, add-brick, and remove-brick.
create: This choice is used to create a volume.
delete: If the delete choice is used, all options other than 'volname' are ignored.
add-brick or remove-brick: If add-brick or remove-brick is chosen, the extra option bricks must be provided with a comma-separated list of brick names in the format <hostname>:<brick path>. For remove-brick, the state option must also be provided, specifying the state of the volume after brick removal.
- volname
This option specifies the volume name. Default name is glustervol.
Note
- For a volume operation, the 'hosts' section can be omitted, provided volname is in the format <hostname>:<volname>, where hostname is the hostname or IP address of one of the nodes in the cluster.
- Only single volume creation/deletion/configuration is supported.
- transport
This option specifies the transport type. Default is tcp. Options are tcp or rdma (Deprecated) or tcp,rdma.
- replica
This option specifies if the volume should be of type replica. Options are yes and no. Default is no. If 'replica' is set to yes, 'replica_count' must also be provided.
- disperse
This option specifies if the volume should be of type disperse. Options are yes and no. Default is no.
- disperse_count
This field is optional even if 'disperse' is yes. If not specified, the number of bricks specified in the command line is taken as the disperse_count value.
- redundancy_count
If this value is not specified and 'disperse' is yes, its default value is computed so that it generates an optimal configuration.
- force
This is an optional field and can be used during volume creation to forcefully create the volume.
For example:
[volume]
action=create
volname=glustervol
transport=tcp,rdma
replica=yes
replica_count=3
force=yes
- backend-setup
Available in gdeploy 2.0. This section sets up the backend for use with a GlusterFS volume. If more than one backend setup is needed, number the sections, for example [backend-setup1], [backend-setup2], ...
The backend-setup section supports the following variables:
- devices: This replaces the [pvs] section in gdeploy 1.x. The devices variable lists the raw disks to use for backend setup. For example:
[backend-setup]
devices=sda,sdb,sdc
This is a mandatory field.
- dalign: The Logical Volume Manager can use a portion of the physical volume for storing its metadata while the rest is used as the data portion. Align the I/O at the Logical Volume Manager (LVM) layer using the dalign option while creating the physical volume. For example:
[backend-setup]
devices=sdb,sdc,sdd,sde
dalign=256k
For JBOD, use an alignment value of 256K. For hardware RAID, the alignment value should be obtained by multiplying the RAID stripe unit size by the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
The following example is appropriate for 12 disks in a RAID 6 configuration with a stripe unit size of 128 KiB:
[backend-setup]
devices=sdb,sdc,sdd,sde
dalign=1280k
The following example is appropriate for 12 disks in a RAID 10 configuration with a stripe unit size of 256 KiB:
[backend-setup]
devices=sdb,sdc,sdd,sde
dalign=1536k
To view the previously configured physical volume settings for the dalign option, run the pvs -o +pe_start device command. For example:
# pvs -o +pe_start /dev/sdb
  PV         VG   Fmt  Attr PSize PFree 1st PE
  /dev/sdb        lvm2 a--  9.09t 9.09t   1.25m
You can also set the dalign option in the PV section.
- vgs: This is an optional variable. This variable replaces the [vgs] section in gdeploy 1.x. The vgs variable lists the names to be used while creating volume groups. The number of VG names should match the number of devices, or the variable should be left blank, in which case gdeploy generates names for the VGs. For example:
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
A pattern such as custom_vg{1..3} can be provided for vgs; this creates three VGs.
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg{1..3}
- pools: This is an optional variable. This variable replaces the [pools] section in gdeploy 1.x. The pools variable lists the thin pool names for the volume.
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
pools=custom_pool1,custom_pool2,custom_pool3
As with vgs, a pattern can be provided for thin pool names. For example, custom_pool{1..3}.
- lvs: This is an optional variable. This variable replaces the [lvs] section in gdeploy 1.x. The lvs variable lists the logical volume names for the volume.
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
pools=custom_pool1,custom_pool2,custom_pool3
lvs=custom_lv1,custom_lv2,custom_lv3
As with vgs, a pattern can be provided for LV names. For example, custom_lv{1..3}.
- mountpoints: This variable deprecates the [mountpoints] section in gdeploy 1.x. The mountpoints variable lists the mount points where the logical volumes should be mounted. The number of mount points should be equal to the number of logical volumes. For example:
[backend-setup]
devices=sda,sdb,sdc
vgs=custom_vg1,custom_vg2,custom_vg3
pools=custom_pool1,custom_pool2,custom_pool3
lvs=custom_lv1,custom_lv2,custom_lv3
mountpoints=/gluster/data1,/gluster/data2,/gluster/data3
- ssd - This variable is set if caching has to be added. For example, the backend setup with ssd for caching should be:
[backend-setup]
ssd=sdc
vgs=RHS_vg1
datalv=lv_data
cachedatalv=lv_cachedata:1G
cachemetalv=lv_cachemeta:230G
Note
Specifying the name of the data LV is necessary while adding SSD. Make sure the datalv is already created; otherwise, create it in one of the earlier 'backend-setup' sections.
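The vgs, pools, and lvs variables accept range patterns such as custom_vg{1..3}. As an illustration only, the following Python sketch shows how such a pattern expands into explicit names and checks that the count matches the number of devices. The function names expand_pattern and names_match_devices are hypothetical, and this is not gdeploy's own implementation.

```python
import re
from typing import List

# Matches a single 'prefix{M..N}suffix' range, for example custom_vg{1..3}.
_RANGE = re.compile(r"^(.*)\{(\d+)\.\.(\d+)\}(.*)$")


def expand_pattern(value: str) -> List[str]:
    """Expand a comma separated value, turning each {M..N} range into names."""
    names: List[str] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        match = _RANGE.match(item)
        if match is None:
            # Plain name without a range pattern: keep it unchanged.
            names.append(item)
            continue
        prefix, start, end, suffix = match.groups()
        names.extend(
            f"{prefix}{n}{suffix}" for n in range(int(start), int(end) + 1)
        )
    return names


def names_match_devices(devices: str, names: str) -> bool:
    """Return True if names is blank or has one entry per device."""
    expanded = expand_pattern(names)
    return not expanded or len(expanded) == len(expand_pattern(devices))


if __name__ == "__main__":
    print(expand_pattern("custom_vg{1..3}"))
    print(names_match_devices("sda,sdb,sdc", "custom_vg{1..3}"))
```

A check like this can be run against a configuration file before deployment to catch a mismatch between the number of devices and the number of names.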
- PV
Available in gdeploy 2.0. If the user needs to have more control over setting up the backend, and does not want to use backend-setup section, then pv, vg, and lv modules are to be used. The pv module supports the following variables.
- action: Mandatory. Supports two values, 'create' and 'resize'.
Example: Creating physical volumes
[pv]
action=create
devices=vdb,vdc,vdd
Example: Creating physical volumes on a specific host
[pv:10.0.5.2]
action=create
devices=vdb,vdc,vdd
- devices: Mandatory. The list of devices to use for pv creation.
- expand: Used when action=resize.
Example: Expanding an already created pv
[pv]
action=resize
devices=vdb
expand=yes
- shrink: Used when action=resize.
Example: Shrinking an already created pv
[pv]
action=resize
devices=vdb
shrink=100G
- dalign: The Logical Volume Manager can use a portion of the physical volume for storing its metadata while the rest is used as the data portion. Align the I/O at the Logical Volume Manager (LVM) layer using the dalign option while creating the physical volume. For example:
[pv]
action=create
devices=sdb,sdc,sdd,sde
dalign=256k
For JBOD, use an alignment value of 256K. For hardware RAID, the alignment value should be obtained by multiplying the RAID stripe unit size by the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
The following example is appropriate for 12 disks in a RAID 6 configuration with a stripe unit size of 128 KiB:
[pv]
action=create
devices=sdb,sdc,sdd,sde
dalign=1280k
The following example is appropriate for 12 disks in a RAID 10 configuration with a stripe unit size of 256 KiB:
[pv]
action=create
devices=sdb,sdc,sdd,sde
dalign=1536k
To view the previously configured physical volume settings for the dalign option, run the pvs -o +pe_start device command. For example:
# pvs -o +pe_start /dev/sdb
  PV         VG   Fmt  Attr PSize PFree 1st PE
  /dev/sdb        lvm2 a--  9.09t 9.09t   1.25m
You can also set the dalign option in the backend-setup section.
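The dalign rule described above (256K for JBOD, otherwise stripe unit size multiplied by the number of data disks) can be sketched in Python as follows. The function names are hypothetical, the RAID 5 case assumes one parity disk, which the text above does not spell out, and this sketch is not part of gdeploy.

```python
def data_disks(disktype: str, total_disks: int) -> int:
    """Number of data disks for a hardware RAID layout."""
    if disktype == "raid6":
        return total_disks - 2   # two disks' worth of parity
    if disktype == "raid5":
        return total_disks - 1   # assumption: one disk's worth of parity
    if disktype == "raid10":
        return total_disks // 2  # half of the disks hold mirror copies
    raise ValueError(f"unsupported disk type: {disktype}")


def dalign_kib(disktype: str, total_disks: int = 0, stripe_unit_kib: int = 0) -> int:
    """Alignment value in KiB to pass as dalign=<value>k."""
    if disktype == "jbod":
        return 256
    return stripe_unit_kib * data_disks(disktype, total_disks)


if __name__ == "__main__":
    # 12 disks in RAID 6 with a 128 KiB stripe unit.
    print(f"dalign={dalign_kib('raid6', 12, 128)}k")
    # 12 disks in RAID 10 with a 256 KiB stripe unit.
    print(f"dalign={dalign_kib('raid10', 12, 256)}k")
```

The two calls reproduce the 1280k and 1536k values used in the examples above; 1280 KiB also corresponds to the 1.25m reported in the 1st PE column of the pvs output.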
- VG
Available in gdeploy 2.0. This module is used to create and extend volume groups. The vg module supports the following variables.
- action - Action can be one of create or extend.
- pvname - PVs to use to create the volume. For more than one PV use comma separated values.
- vgname - The name of the vg. If no name is provided GLUSTER_vg will be used as default name.
- one-to-one - If set to yes, one-to-one mapping will be done between pv and vg.
If action is set to extend, the vg is extended to include the provided PVs.
Example 1: Create a vg named images_vg with two PVs
[vg]
action=create
vgname=images_vg
pvname=sdb,sdc
Example 2: Create two vgs named rhgs_vg1 and rhgs_vg2 with two PVs
[vg]
action=create
vgname=rhgs_vg
pvname=sdb,sdc
one-to-one=yes
Example 3: Extend an existing vg with the given disk.
[vg]
action=extend
vgname=rhgs_images
pvname=sdc
- LV
Available in gdeploy 2.0. This module is used to create, setup-cache, convert, and change logical volumes. The lv module supports the following variables:
action - The action variable allows four values: 'create', 'setup-cache', 'convert', and 'change'.
If the action is 'create', the following options are supported:
- lvname - The name of the logical volume. This is an optional field. Default is GLUSTER_lv.
- poolname - Name of the thin pool volume. This is an optional field. Default is GLUSTER_pool.
- lvtype - Type of the logical volume to be created. Allowed values are 'thin' and 'thick'. This is an optional field; default is thick.
- size - Size of the logical volume. Default is to take all available space on the vg.
- extent - Extent size, default is 100%FREE
- force - Force lv create, do not ask any questions. Allowed values are 'yes' and 'no'. This is an optional field; default is yes.
- vgname - Name of the volume group to use.
- pvname - Name of the physical volume to use.
- chunksize - The size of the chunk unit used for snapshots, cache pools, and thin pools. By default this is specified in kilobytes. For RAID 5 and 6 volumes, gdeploy calculates the default chunksize by multiplying the stripe size and the disk count. For RAID 10, the default chunksize is 256 KB. See Section 19.2, “Brick Configuration” for details.
Warning
Red Hat recommends using at least the default chunksize. If the chunksize is too small and your volume runs out of space for metadata, the volume is unable to create data. This includes the data required to increase the size of the metadata pool or to migrate data away from a volume that has run out of metadata space. Red Hat recommends monitoring your logical volumes to ensure that they are expanded or more storage created before metadata volumes become completely full.
- poolmetadatasize - Sets the size of the pool's metadata logical volume. Allocate the maximum chunk size (16 GiB) if possible. If you allocate less than the maximum, allocate at least 0.5% of the pool size to ensure that you do not run out of metadata space.
Warning
If your metadata pool runs out of space, you cannot create data. This includes the data required to increase the size of the metadata pool or to migrate data away from a volume that has run out of metadata space. Monitor your metadata pool using the lvs -o+metadata_percent command and ensure that it does not run out of space.
- virtualsize - Creates a thinly provisioned device or a sparse device of the given size.
- mkfs - Creates a filesystem of the given type. Default is to use xfs.
- mkfs-opts - mkfs options.
- mount - Mount the logical volume.
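The sizing guidance for chunksize and poolmetadatasize above can be worked through with a short Python sketch. It assumes that "disk count" means the number of data disks, as in the [diskcount] section; the function names are hypothetical and the code is not taken from gdeploy.

```python
GIB = 1024 ** 3
MAX_POOL_METADATA_BYTES = 16 * GIB  # maximum stated in the text above


def default_chunksize_kib(disktype: str, stripe_unit_kib: int, data_disks: int) -> int:
    """Default chunksize in KiB as described for RAID 5, RAID 6, and RAID 10."""
    if disktype in ("raid5", "raid6"):
        return stripe_unit_kib * data_disks
    if disktype == "raid10":
        return 256
    # The text above gives no default for other disk types.
    raise ValueError(f"no documented default for disk type: {disktype}")


def min_poolmetadatasize_bytes(pool_size_bytes: int) -> int:
    """Smallest recommended metadata size: 0.5% of the pool, capped at 16 GiB."""
    half_percent = (pool_size_bytes * 5 + 999) // 1000  # round up
    return min(MAX_POOL_METADATA_BYTES, half_percent)


if __name__ == "__main__":
    # RAID 6 with a 128 KiB stripe unit and 10 data disks.
    print(default_chunksize_kib("raid6", 128, 10), "KiB")
    # A 200 GiB thin pool needs at least 1 GiB of metadata space.
    print(min_poolmetadatasize_bytes(200 * GIB) / GIB, "GiB")
```

For pools larger than 3200 GiB, 0.5% exceeds 16 GiB, so the sketch returns the 16 GiB maximum.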
If the action is setup-cache, the following options are supported:
- ssd - Name of the SSD device used to set up the cache. For example sda/vda/ …
- vgname - Name of the volume group.
- poolname - Name of the pool.
- cache_meta_lv - Due to requirements from dm-cache (the kernel driver), LVM further splits the cache pool LV into two devices - the cache data LV and cache metadata LV. Provide the cache_meta_lv name here.
- cache_meta_lvsize - Size of the cache meta lv.
- cache_lv - Name of the cache data lv.
- cache_lvsize - Size of the cache data.
- force - Force
If the action is convert, the following options are supported:
- lvtype - Type of the lv. Available options are thin and thick.
- force - Force the lvconvert, default is yes.
- vgname - Name of the volume group.
- poolmetadata - Specifies cache or thin pool metadata logical volume.
- cachemode - Allowed values writeback, writethrough. Default is writethrough.
- cachepool - This argument is necessary when converting a logical volume to a cache LV. Name of the cachepool.
- lvname - Name of the logical volume.
- chunksize - The size of the chunk unit used for snapshots, cache pools, and thin pools. By default this is specified in kilobytes. For RAID 5 and 6 volumes, gdeploy calculates the default chunksize by multiplying the stripe size and the disk count. For RAID 10, the default chunksize is 256 KB. See Section 19.2, “Brick Configuration” for details.
Warning
Red Hat recommends using at least the default chunksize. If the chunksize is too small and your volume runs out of space for metadata, the volume is unable to create data. Red Hat recommends monitoring your logical volumes to ensure that they are expanded or more storage created before metadata volumes become completely full.
- poolmetadataspare - Controls creation and maintenance of the pool metadata spare logical volume that will be used for automated pool recovery.
- thinpool - Specifies or converts a logical volume into a thin pool's data volume. The volume's name or path has to be given.
If the action is change, the following options are supported:
- lvname - Name of the logical volume.
- vgname - Name of the volume group.
- zero - Set zeroing mode for thin pool.
Example 1: Create a thin LV
[lv]
action=create
vgname=RHGS_vg1
poolname=lvthinpool
lvtype=thinpool
poolmetadatasize=200MB
chunksize=1024k
size=30GB
Example 2: Create a thick LV
[lv]
action=create
vgname=RHGS_vg1
lvname=engine_lv
lvtype=thick
size=10GB
mount=/rhgs/brick1
If there is more than one LV, the LVs can be created by numbering the LV sections, like [lv1], [lv2] …
- RH-subscription
Available in gdeploy 2.0. This module is used to subscribe, unsubscribe, attach, enable repos etc. The RH-subscription module allows the following variables:
If the action is register, the following options are supported:
- username/activationkey: Username or activationkey.
- password/activationkey: Password or activation key
- auto-attach: true/false
- pool: Name of the pool.
- repos: Repos to subscribe to.
- disable-repos: Repo names to disable. Leaving this option blank will disable all the repos.
- ignore_register_errors: If set to no, gdeploy will exit if system registration fails.
- If the action is attach-pool, the following options are supported:
pool - Pool name to be attached.
ignore_attach_pool_errors - If set to no, gdeploy fails if attach-pool fails.
- If the action is enable-repos, the following options are supported:
repos - List of comma separated repos that are to be subscribed to.
ignore_enable_errors - If set to no, gdeploy fails if enable-repos fails.
- If the action is disable-repos, the following options are supported:
repos - List of comma separated repos that are to be disabled.
ignore_disable_errors - If set to no, gdeploy fails if disable-repos fails.
- If the action is unregister, the systems will be unregistered.
ignore_unregister_errors - If set to no, gdeploy fails if unregistering fails.
Example 1: Subscribe to Red Hat Subscription network:
[RH-subscription1]
action=register
username=qa@redhat.com
password=<passwd>
pool=<pool>
ignore_register_errors=no
Example 2: Disable all the repos:
[RH-subscription2]
action=disable-repos
repos=*
Example 3: Enable a few repos
[RH-subscription3]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rhel-7-server-rhev-mgmt-agent-rpms
ignore_enable_errors=no
- yum
Available in gdeploy 2.0. This module is used to install or remove rpm packages. The yum module can also add repos at install time.
The action variable allows two values, 'install' and 'remove'.
If the action is install, the following options are supported:
- packages - Comma separated list of packages that are to be installed.
- repos - The repositories to be added.
- gpgcheck - yes/no values have to be provided.
- update - Whether yum update has to be initiated.
If the action is remove, only one option has to be provided:
- remove - The comma separated list of packages to be removed.
For example:
[yum1]
action=install
gpgcheck=no
# Repos should be an url; eg: http://repo-pointing-glusterfs-builds
repos=<glusterfs.repo>,<vdsm.repo>
packages=vdsm,vdsm-gluster,ovirt-hosted-engine-setup,screen,xauth
update=yes
Install a package on a particular host.
[yum2:host1]
action=install
gpgcheck=no
packages=rhevm-appliance
- shell
Available in gdeploy 2.0. This module allows the user to run shell commands on the remote nodes.
Currently shell provides a single action variable with the value execute, and a command variable that takes any valid shell command as its value.
The following example executes vdsm-tool on all the nodes.
[shell]
action=execute
command=vdsm-tool configure --force
- update-file
Available in gdeploy 2.0. The update-file module allows users to copy a file, edit a line in a file, or add new lines to a file. The action variable can be any of copy, edit, or add.
When the action variable is set to copy, the following variables are supported.
- src - The source path of the file to be copied from.
- dest - The destination path on the remote machine to which the file is to be copied.
When the action variable is set to edit, the following variables are supported.
- dest - The destination file name which has to be edited.
- replace - A regular expression, which will match a line that will be replaced.
- line - The text that replaces the matched line.
When the action variable is set to add, the following variables are supported.
- dest - File on the remote machine to which a line has to be added.
- line - Line which has to be added to the file. The line is added at the end of the file.
Example 1: Copy a file to a remote machine.
[update-file]
action=copy
src=/tmp/foo.cfg
Example 2: Edit a line in the remote machine. In the following example, lines that have allowed_hosts will be replaced with allowed_hosts=host.redhat.com
[update-file]
action=edit
replace=allowed_hosts
line=allowed_hosts=host.redhat.com
Example 3: Add a line to the end of a file
For Red Hat Enterprise Linux 7:
[update-file]
action=add
dest=/etc/ntp.conf
line=server clock.redhat.com iburst
For Red Hat Enterprise Linux 8:
[update-file]
action=add
dest=/etc/chrony.conf
line=server 0.rhel.pool.ntp.org iburst
- service
Available in gdeploy 2.0. The service module allows the user to start, stop, restart, reload, enable, or disable a service. The action variable specifies these values.
When the action variable is set to any of start, stop, restart, reload, enable, or disable, the service variable specifies which service to act on.
- service - Name of the service to start, stop etc.
For Red Hat Enterprise Linux 7:
Example: enable and start ntp daemon.
[service1]
action=enable
service=ntpd

[service2]
action=restart
service=ntpd
For Red Hat Enterprise Linux 8:
Example: enable and start chrony daemon.
[service1]
action=enable
service=chrony

[service2]
action=restart
service=chrony
- script
Available in gdeploy 2.0. The script module enables the user to execute a script or binary on the remote machine. The action variable is set to execute. The module allows the user to specify two variables, file and args.
- file - An executable on the local machine.
- args - Arguments to the above program.
Example: Execute script disable-multipath.sh on all the remote nodes listed in the 'hosts' section.
[script]
action=execute
file=/usr/share/ansible/gdeploy/scripts/disable-multipath.sh
- firewalld
Available in gdeploy 2.0. The firewalld module allows the user to manipulate firewall rules. The action variable supports two values, 'add' and 'delete'. Both add and delete support the following variables:
- ports/services - The ports or services to add to firewall.
- permanent - Whether to make the entry permanent. Allowed values are true/false
- zone - Default zone is public
For example:
[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp
services=glusterfs
- geo-replication
Available in gdeploy 2.0.2. The geo-replication module allows the user to configure geo-replication, and to control and verify geo-replication sessions. The following are the supported variables:
action - The action to be performed for the geo-replication session.
- create - To create a geo-replication session.
- start - To start a created geo-replication session.
- stop - To stop a started geo-replication session.
- pause - To pause a geo-replication session.
- resume - To resume a paused geo-replication session.
- delete - To delete a geo-replication session.
georepuser - Username to be used for the action being performed.
Important
If the georepuser variable is omitted, the user is assumed to be root user.
mastervol - Master volume details in the following format:
Master_HostName:Master_VolName
slavevol - Slave volume details in the following format:
Slave_HostName:Slave_VolName
slavenodes - Slave node IP addresses in the following format:
Slave1_IPAddress,Slave2_IPAddress
Important
Slave IP addresses must be comma (,) separated.
force - Force the system to perform the action. Allowed values are yes or no.
start - Start the action specified in the configuration file. Allowed values are yes or no. Default value is yes.
For example:
[geo-replication]
action=create
georepuser=testgeorep
mastervol=10.1.1.29:mastervolume
slavevol=10.1.1.25:slavevolume
slavenodes=10.1.1.28,10.1.1.86
force=yes
start=yes
5.1.8. Deploying NFS Ganesha using gdeploy
5.1.8.1. Prerequisites
You must subscribe to subscription manager and obtain the NFS Ganesha packages before continuing further.
[RH-subscription1]
action=register
username=<user>@redhat.com
password=<password>
pool=<pool-id>
# gdeploy -c txt.conf
To enable the required repos, add the following details in the configuration file:
[RH-subscription2]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rh-gluster-3-nfs-for-rhel-7-server-rpms,rhel-ha-for-rhel-7-server-rpms,rhel-7-server-ansible-2-rpms
# gdeploy -c txt.conf
To enable the firewall ports, add the following details in the configuration file:
[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp
services=glusterfs,nlm,nfs,rpc-bind,high-availability,mountd,rquota
# gdeploy -c txt.conf
To install the required package, add the following details in the configuration file:
[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=glusterfs-ganesha
# gdeploy -c txt.conf
5.1.8.2. Supported Actions
- Creating a Cluster
- Destroying a Cluster
- Adding a Node
- Deleting a Node
- Exporting a Volume
- Unexporting a Volume
- Refreshing NFS Ganesha Configuration
Creating a Cluster
This action creates a fresh NFS-Ganesha setup on a given volume. For this action, the [nfs-ganesha] section in the configuration file supports the following variables:
- ha-name: This is an optional variable. By default it is ganesha-ha-360.
- cluster-nodes: This is a required argument. This variable expects comma separated values of cluster node names, which is used to form the cluster.
- vip: This is a required argument. This variable expects comma separated list of ip addresses. These will be the virtual ip addresses.
- volname: This is an optional variable if the configuration contains the [volume] section
[hosts]
host-1.example.com
host-2.example.com
host-3.example.com
host-4.example.com

[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick

[firewalld]
action=add
ports=111/tcp,2049/tcp,54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,16514/tcp,662/tcp,662/udp
services=glusterfs,nlm,nfs,rpc-bind,high-availability,mountd,rquota

[volume]
action=create
volname=ganesha
transport=tcp
replica_count=3
force=yes

#Creating a high availability cluster and exporting the volume
[nfs-ganesha]
action=create-cluster
ha-name=ganesha-ha-360
cluster-nodes=host-1.example.com,host-2.example.com,host-3.example.com,host-4.example.com
vip=10.70.44.121,10.70.44.122
volname=ganesha
ignore_ganesha_errors=no
# gdeploy -c txt.conf
Destroying a Cluster
The destroy-cluster action disables NFS Ganesha. It allows one variable, cluster-nodes.
[hosts]
host-1.example.com
host-2.example.com

# To destroy the high availability cluster
[nfs-ganesha]
action=destroy-cluster
cluster-nodes=host-1.example.com,host-2.example.com
# gdeploy -c txt.conf
Adding a Node
The add-node action allows three variables:
- nodes: Accepts a list of comma separated hostnames that have to be added to the cluster.
- vip: Accepts a list of comma separated ip addresses.
- cluster_nodes: Accepts a list of comma separated nodes of the NFS Ganesha cluster.
[hosts]
host-1.example.com
host-2.example.com
host-3.example.com

[peer]
action=probe

[clients]
action=mount
volname=host-3.example.com:gluster_shared_storage
hosts=host-3.example.com
fstype=glusterfs
client_mount_points=/var/run/gluster/shared_storage/

[nfs-ganesha]
action=add-node
nodes=host-3.example.com
cluster_nodes=host-1.example.com,host-2.example.com
vip=10.0.0.33
# gdeploy -c txt.conf
Deleting a Node
The delete-node action takes one variable, nodes, which specifies the node or nodes to delete from the NFS Ganesha cluster in a comma delimited list.
[hosts]
host-1.example.com
host-2.example.com
host-3.example.com
host-4.example.com

[nfs-ganesha]
action=delete-node
nodes=host-2.example.com
Exporting a Volume
This action exports a volume. The export-volume action supports one variable, volname.
[hosts]
host-1.example.com
host-2.example.com

[nfs-ganesha]
action=export-volume
volname=ganesha
# gdeploy -c txt.conf
Unexporting a Volume
This action unexports a volume. The unexport-volume action supports one variable, volname.
[hosts]
host-1.example.com
host-2.example.com

[nfs-ganesha]
action=unexport-volume
volname=ganesha
# gdeploy -c txt.conf
Refreshing NFS Ganesha Configuration
This action adds or deletes a config block in the configuration file and runs refresh-config on the cluster.
refresh-config supports the following variables:
- del-config-lines
- block-name
- volname
- ha-conf-dir
- update_config_lines
Note
refresh-config with a client block has a few limitations:
- Works for only one client
- User cannot delete a line from a config block
[hosts]
host1-example.com
host2-example.com

[nfs-ganesha]
action=refresh-config
# Default block name is `client'
block-name=client
config-block=clients = 10.0.0.1;|allow_root_access = true;|access_type = "RO";|Protocols = "2", "3";|anonymous_uid = 1440;|anonymous_gid = 72;
volname=ganesha
# gdeploy -c txt.conf
[hosts]
host1-example.com
host2-example.com

[nfs-ganesha]
action=refresh-config
del-config-lines=client
volname=ganesha
# gdeploy -c txt.conf
[hosts]
host1-example.com
host2-example.com

[nfs-ganesha]
action=refresh-config
volname=ganesha
# gdeploy -c txt.conf
[hosts]
host1-example.com
host2-example.com

[nfs-ganesha]
action=refresh-config
update_config_lines=Access_type = "RO";
#update_config_lines=Protocols = "4";
#update_config_lines=clients = 10.0.0.1;
volname=ganesha
# gdeploy -c txt.conf
5.1.9. Deploying Samba / CTDB using gdeploy
5.1.9.1. Prerequisites
You must subscribe to subscription manager and obtain the Samba packages before continuing further.
[RH-subscription1]
action=register
username=<user>@redhat.com
password=<password>
pool=<pool-id>
# gdeploy -c txt.conf
To enable the required repos, add the following details in the configuration file:
For Red Hat Enterprise Linux 7:
[RH-subscription2]
action=enable-repos
repos=rhel-7-server-rpms,rh-gluster-3-for-rhel-7-server-rpms,rh-gluster-3-samba-for-rhel-7-server-rpms,rhel-7-server-ansible-2-rpms
For Red Hat Enterprise Linux 8:
[RH-subscription2]
action=enable-repos
repos=rh-gluster-3-for-rhel-8-x86_64-rpms,ansible-2-for-rhel-8-x86_64-rpms,rhel-8-for-x86_64-baseos-rpms,rhel-8-for-x86_64-appstream-rpms,rhel-8-for-x86_64-highavailability-rpms,rh-gluster-3-samba-for-rhel-8-x86_64-rpms
# gdeploy -c txt.conf
To enable the firewall ports, add the following details in the configuration file:
[firewalld]
action=add
ports=54321/tcp,5900/tcp,5900-6923/tcp,5666/tcp,4379/tcp
services=glusterfs,samba,high-availability
# gdeploy -c txt.conf
To install the required package, add the following details in the configuration file:
[yum]
action=install
repolist=
gpgcheck=no
update=no
packages=samba,samba-client,glusterfs-server,ctdb
# gdeploy -c txt.conf
5.1.9.2. Setting up Samba
- Enabling Samba on an existing volume
- Enabling Samba while creating a volume
If a Red Hat Gluster Storage volume is already present, set the action to smb-setup in the volume section. All the hosts in the cluster must be listed, because gdeploy updates the glusterd configuration files on each of the hosts.
[hosts]
10.70.37.192
10.70.37.88

[volume]
action=smb-setup
volname=samba1
force=yes
smb_username=smbuser
smb_mountpoint=/mnt/smb
# gdeploy -c txt.conf
If Samba has to be set up while creating a volume, the variable smb has to be set to yes in the configuration file.
[hosts]
10.70.37.192
10.70.37.88
10.70.37.65

[backend-setup]
devices=/dev/vdb
vgs=vg1
pools=pool1
lvs=lv1
mountpoints=/mnt/brick

[volume]
action=create
volname=samba1
smb=yes
force=yes
smb_username=smbuser
smb_mountpoint=/mnt/smb
# gdeploy -c txt.conf
Note
smb_username and smb_mountpoint are necessary if Samba has to be set up with the ACLs set correctly.
5.1.9.3. Setting up CTDB
[hosts]
10.70.37.192
10.70.37.88
10.70.37.65

[volume]
action=create
volname=ctdb
transport=tcp
replica_count=3
force=yes

[ctdb]
action=setup
public_address=10.70.37.6/24 eth0,10.70.37.8/24 eth0
volname=ctdb
Set the ctdb_nodes parameter, as shown in the following example.
[hosts]
10.70.37.192
10.70.37.88
10.70.37.65

[volume]
action=create
volname=ctdb
transport=tcp
replica_count=3
force=yes

[ctdb]
action=setup
public_address=10.70.37.6/24 eth0,10.70.37.8/24 eth0
ctdb_nodes=192.168.1.1,192.168.2.5
volname=ctdb
# gdeploy -c txt.conf
5.1.10. Enabling SSL on a Volume
5.1.10.1. Creating a Volume and Enabling SSL
[hosts] 10.70.37.147 10.70.37.47 10.70.37.13 [backend-setup] devices=/dev/vdb vgs=vg1 pools=pool1 lvs=lv1 mountpoints=/mnt/brick [volume] action=create volname=vol1 transport=tcp replica_count=3 force=yes enable_ssl=yes ssl_clients=10.70.37.107,10.70.37.173 brick_dirs=/data/1 [clients] action=mount hosts=10.70.37.173,10.70.37.107 volname=vol1 fstype=glusterfs client_mount_points=/mnt/data
# gdeploy -c txt.conf
5.1.10.2. Enabling SSL on an Existing Volume
[hosts] 10.70.37.147 10.70.37.47 # It is important for the clients to be unmounted before setting up SSL [clients1] action=unmount hosts=10.70.37.173,10.70.37.107 client_mount_points=/mnt/data [volume] action=enable-ssl volname=vol2 ssl_clients=10.70.37.107,10.70.37.173 [clients2] action=mount hosts=10.70.37.173,10.70.37.107 volname=vol2 fstype=glusterfs client_mount_points=/mnt/data
# gdeploy -c txt.conf
5.1.11. Gdeploy log files
/home/username/.gdeploy/logs/gdeploy.log
instead of the /var/log
directory.
GDEPLOY_LOGFILE
environment variable. For example, to set the gdeploy log location to /var/log/gdeploy/gdeploy.log
for this session, run the following command:
$ export GDEPLOY_LOGFILE=/var/log/gdeploy/gdeploy.log
/home/username/.bash_profile
file for that user.
5.2. About Encrypted Disk
- For RHEL 6, refer to Disk Encryption Appendix of the Red Hat Enterprise Linux 6 Installation Guide.
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See the Version Details table in the section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
- For RHEL 7, refer to Encryption of the Red Hat Enterprise Linux 7 Security Guide.
- Starting in RHEL 7.5, Red Hat has implemented an additional component, Network Bound Disk Encryption (NBDE), that can be used to enable LUKS disks remotely during startup. For more information on NBDE, refer to Configuring Automated Unlocking of Encrypted Volumes using Policy-Based Decryption of the Red Hat Enterprise Linux 7 Security Guide.
- For RHEL 8, refer to Encrypting Block Devices Using LUKS of the Red Hat Enterprise Linux 8 Security Guide.
5.3. Formatting and Mounting Bricks
5.3.1. Creating Bricks Manually
Important
- Red Hat supports formatting a Logical Volume using the XFS file system on the bricks.
- Red Hat supports heterogeneous subvolume sizes for distributed volumes (either pure distributed, distributed-replicated or distributed-dispersed). Red Hat does not support heterogeneous brick sizes for bricks of the same subvolume. For example, you can have a distributed-replicated 3x3 volume with 3 bricks of 10GiB, 3 bricks of 50GiB and 3 bricks of 100GiB as long as the 3 10GiB bricks belong to the same replica set, and similarly the 3 50GiB bricks and the 3 100GiB bricks each belong to the same replica set. In this way you will have 1 subvolume of 10GiB, another of 50GiB, and another of 100GiB. The distributed hash table balances the number of assigned files to each subvolume so that the subvolumes get filled proportionally to their size.
5.3.1.1. Creating a Thinly Provisioned Logical Volume
- Create a physical volume (PV) by using the pvcreate command.
# pvcreate --dataalignment alignment_value device
For example:
# pvcreate --dataalignment 1280K /dev/sdb
Here, /dev/sdb is a storage device.
Use the correct dataalignment option based on your device. For more information, see Section 19.2, “Brick Configuration”.
Note
The device name and the alignment value will vary based on the device you are using.
- Create a Volume Group (VG) from the PV using the vgcreate command:
# vgcreate --physicalextentsize alignment_value volgroup device
For example:
# vgcreate --physicalextentsize 1280K rhs_vg /dev/sdb
- Create a thin pool using the following command:
# lvcreate --thin volgroup/poolname --size pool_sz --chunksize chunk_sz --poolmetadatasize metadev_sz --zero n
For example:
# lvcreate --thin rhs_vg/rhs_pool --size 2T --chunksize 1280K --poolmetadatasize 16G --zero n
Ensure you read Chapter 19, Tuning for Performance to select appropriate values for chunksize and poolmetadatasize.
- Create a thinly provisioned volume that uses the previously created pool by running the lvcreate command with the --virtualsize and --thin options:
# lvcreate --virtualsize size --thin volgroup/poolname --name volname
For example:
# lvcreate --virtualsize 1G --thin rhs_vg/rhs_pool --name rhs_lv
It is recommended that only one LV should be created in a thin pool.
- Format bricks using the supported XFS configuration, mount the bricks, and verify the bricks are mounted correctly. To enhance the performance of Red Hat Gluster Storage, ensure you read Chapter 19, Tuning for Performance before formatting the bricks.
Important
Snapshots are not supported on bricks formatted with external log devices. Do not use the -l logdev=device option with the mkfs.xfs command for formatting the Red Hat Gluster Storage bricks.
# mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 device
Here, device is the created thin LV. The inode size is set to 512 bytes to accommodate the extended attributes used by Red Hat Gluster Storage.
- Run # mkdir /mountpoint to create a directory to link the brick to. For example:
# mkdir /rhgs
- Add an entry in /etc/fstab:
/dev/volgroup/volname /mountpoint xfs rw,inode64,noatime,nouuid,x-systemd.device-timeout=10min 1 2
For example:
/dev/rhs_vg/rhs_lv /rhgs xfs rw,inode64,noatime,nouuid,x-systemd.device-timeout=10min 1 2
- Run mount /mountpoint to mount the brick.
- Run the df -h command to verify the brick is successfully mounted:
# df -h
/dev/rhs_vg/rhs_lv 16G 1.2G 15G 7% /rhgs
- If SELinux is enabled, set the SELinux labels manually for the bricks created, using the following commands:
# semanage fcontext -a -t glusterd_brick_t /rhgs/brick1
# restorecon -Rv /rhgs/brick1
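The sequence above can be reviewed as a whole before anything is changed on disk. The following is a minimal dry-run sketch: print_brick_setup is a hypothetical helper (not part of Red Hat Gluster Storage) that only prints the commands from the procedure, and its sizing values are copied from the examples above rather than tuned for any real device.

```shell
# Dry-run sketch: prints the brick preparation commands without executing them.
# print_brick_setup is a hypothetical helper, not a Red Hat Gluster Storage tool.
# The body runs in a subshell so its variables do not leak into the caller.
print_brick_setup() (
    device=$1; vg=$2; pool=$3; lv=$4; mountpoint=$5
    # Sizing values copied from the examples above (assumption);
    # select real values with Chapter 19, Tuning for Performance.
    align=1280K; pool_size=2T; meta_size=16G; lv_size=1G

    echo "pvcreate --dataalignment $align $device"
    echo "vgcreate --physicalextentsize $align $vg $device"
    echo "lvcreate --thin $vg/$pool --size $pool_size --chunksize $align --poolmetadatasize $meta_size --zero n"
    echo "lvcreate --virtualsize $lv_size --thin $vg/$pool --name $lv"
    echo "mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 /dev/$vg/$lv"
    echo "mkdir $mountpoint"
    echo "# /etc/fstab entry: /dev/$vg/$lv $mountpoint xfs rw,inode64,noatime,nouuid,x-systemd.device-timeout=10min 1 2"
    echo "mount $mountpoint"
)

print_brick_setup /dev/sdb rhs_vg rhs_pool rhs_lv /rhgs
```

Printing rather than executing keeps the sketch safe to run on any host; the output can be compared against the steps above before the commands are run by hand.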
5.3.2. Using Subdirectory as the Brick for Volume
/rhgs
directory is the mounted file system and is used as the brick for volume creation. However, for some reason, if the mount point is unavailable, any write continues to happen in the /rhgs
directory, but now this is under root file system.
/bricks
. After the file system is available, create a directory called /rhgs/brick1
and use it for volume creation. Ensure that no more than one brick is created from a single mount. This approach has the following advantages:
- When the /rhgs file system is unavailable, the /rhgs/brick1 directory is no longer available in the system. Hence, there will be no data loss by writing to a different location.
- This does not require any additional file system for nesting.
- Create the brick1 subdirectory in the mounted file system.
# mkdir /rhgs/brick1
Repeat the above steps on all nodes.
- Create the Red Hat Gluster Storage volume using the subdirectories as bricks.
# gluster volume create distdata01 ad-rhs-srv1:/rhgs/brick1 ad-rhs-srv2:/rhgs/brick2
- Start the Red Hat Gluster Storage volume.
# gluster volume start distdata01
- Verify the status of the volume.
# gluster volume status distdata01
Note
# df -h /dev/rhs_vg/rhs_lv1 16G 1.2G 15G 7% /rhgs1 /dev/rhs_vg/rhs_lv2 16G 1.2G 15G 7% /rhgs2
# gluster volume create test-volume server1:/rhgs1/brick1 server2:/rhgs1/brick1 server1:/rhgs2/brick2 server2:/rhgs2/brick2
5.3.3. Reusing a Brick from a Deleted Volume
Run # mkfs.xfs -f -i size=512 device to reformat the brick to supported requirements, and make it available for immediate reuse in a new volume.
Note
5.3.4. Cleaning An Unusable Brick
- Delete all previously existing data in the brick, including the .glusterfs subdirectory.
- Run # setfattr -x trusted.glusterfs.volume-id brick and # setfattr -x trusted.gfid brick to remove the attributes from the root of the brick.
- Run # getfattr -d -m . brick to examine the attributes set on the volume. Take note of the attributes.
- Run # setfattr -x attribute brick to remove the attributes relating to the glusterFS file system.
The trusted.glusterfs.dht attribute for a distributed volume is one such example of attributes that need to be removed.
5.4. Creating Distributed Volumes
Figure 5.1. Illustration of a Distributed Volume
Warning
- No in-service upgrades - distributed only volumes need to be taken offline during upgrades.
- Temporary inconsistencies of directory entries and inodes during eventual node failures.
- I/O operations will block or fail due to node unavailability or eventual node failures.
- Permanent loss of data.
Create a Distributed Volume
Use the gluster volume create command to create different types of volumes, and the gluster volume info command to verify successful volume creation.
Prerequisites
- A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 5.10, “Starting Volumes”.
- Run the gluster volume create command to create the distributed volume.
The syntax is:
# gluster volume create NEW-VOLNAME [transport tcp | rdma (Deprecated) | tcp,rdma] NEW-BRICK...
The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Red Hat recommends disabling the performance.client-io-threads option on distributed volumes, as this option tends to worsen performance. Run the following command to disable performance.client-io-threads:
# gluster volume set VOLNAME performance.client-io-threads off
Example 5.1. Distributed Volume with Two Storage Servers
# gluster v create glustervol server1:/rhgs/brick1 server2:/rhgs/brick1
volume create: glustervol: success: please start the volume to access data
Example 5.2. Distributed Volume over InfiniBand with Four Servers
# gluster v create glustervol transport rdma server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 server4:/rhgs/brick1
volume create: glustervol: success: please start the volume to access data
- Run # gluster volume start VOLNAME to start the volume.
# gluster v start glustervol
volume start: glustervol: success
- Run the gluster volume info command to optionally display the volume information.
The following output is the result of Example 5.1, “Distributed Volume with Two Storage Servers”.
# gluster volume info
Volume Name: glustervol
Type: Distribute
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server1:/rhgs/brick1
Brick2: server2:/rhgs/brick1
5.5. Creating Replicated Volumes
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
- A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 5.10, “Starting Volumes”.
Warning
5.5.1. Creating Three-way Replicated Volumes
Figure 5.2. Illustration of a Three-way Replicated Volume
- Run the gluster volume create command to create the replicated volume.
The syntax is:
# gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma (Deprecated) | tcp,rdma] NEW-BRICK...
The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Example 5.3. Replicated Volume with Three Storage Servers
The order in which bricks are specified determines how bricks are replicated with each other. For example, every n bricks, where n is the replica count (3 in this example), form a replica set. This is illustrated in Figure 5.2, “Illustration of a Three-way Replicated Volume”.
# gluster v create glustervol replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick2 server3:/rhgs/brick3
volume create: glustervol: success: please start the volume to access data
- Run # gluster volume start VOLNAME to start the volume.
# gluster v start glustervol
volume start: glustervol: success
- Run the gluster volume info command to optionally display the volume information.
Important
5.5.2. Creating Sharded Replicated Volumes
.shard
directory, and are named with the GFID and a number indicating the order of the pieces. For example, if a file is split into four pieces, the first piece is named GFID and stored normally. The other three pieces are named GFID.1, GFID.2, and GFID.3 respectively. They are placed in the .shard
directory and distributed evenly between the various bricks in the volume.
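The naming scheme can be illustrated with a short sketch. shard_names is a hypothetical helper, and it assumes that a file is cut into pieces of at most one shard block each, so the piece count is the file size divided by the block size, rounded up. Sizes are given in MB purely for illustration.

```shell
# shard_names GFID FILE_SIZE_MB BLOCK_SIZE_MB
# Hypothetical helper: lists the piece names described above.
# Assumption: every piece holds at most BLOCK_SIZE_MB of data.
shard_names() (
    gfid=$1; file_mb=$2; block_mb=$3
    # Number of pieces, rounded up.
    pieces=$(( (file_mb + block_mb - 1) / block_mb ))
    echo "$gfid"                  # first piece, stored normally
    i=1
    while [ "$i" -lt "$pieces" ]; do
        echo ".shard/$gfid.$i"    # remaining pieces are placed in .shard
        i=$((i + 1))
    done
)

# A 2000 MB file with a 512 MB block size is split into four pieces.
shard_names GFID 2000 512
```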
5.5.2.1. Supported use cases
Important
Important
Example 5.4. Three-way replicated sharded volume
- Set up a three-way replicated volume, as described in the Red Hat Gluster Storage Administration Guide: https://access.redhat.com/documentation/en-US/red_hat_gluster_storage/3.5/html/Administration_Guide/sect-Creating_Replicated_Volumes.html#Creating_Three-way_Replicated_Volumes.
- Before you start your volume, enable sharding on the volume.
# gluster volume set test-volume features.shard enable
- Start the volume and ensure it is working as expected.
# gluster volume start test-volume
# gluster volume info test-volume
5.5.2.2. Configuration Options
- features.shard
- Enables or disables sharding on a specified volume. Valid values are enable and disable. The default value is disable.
# gluster volume set volname features.shard enable
Note that this only affects files created after this command is run; files created before this command is run retain their old behaviour.
- features.shard-block-size
- Specifies the maximum size of the file pieces when sharding is enabled. The supported value for this parameter is 512MB.
# gluster volume set volname features.shard-block-size 512MB
Note that this only affects files created after this command is run; files created before this command is run retain their old behaviour.
5.5.2.3. Finding the pieces of a sharded file
# getfattr -d -m. -e hex path_to_file
# ls /rhgs/*/.shard -lh | grep GFID
5.6. Creating Distributed Replicated Volumes
Note
- A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 5.10, “Starting Volumes”.
5.6.1. Creating Three-way Distributed Replicated Volumes
Figure 5.3. Illustration of a Three-way Distributed Replicated Volume
- Run the gluster volume create command to create the distributed replicated volume.
The syntax is:
# gluster volume create NEW-VOLNAME [replica COUNT] [transport tcp | rdma (Deprecated) | tcp,rdma] NEW-BRICK...
The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Example 5.5. Six Node Distributed Replicated Volume with a Three-way Replication
The order in which bricks are specified determines how bricks are replicated with each other. For example, the first 3 bricks, where 3 is the replica count, form a replica set.
# gluster v create glustervol replica 3 transport tcp server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 server4:/rhgs/brick1 server5:/rhgs/brick1 server6:/rhgs/brick1
volume create: glustervol: success: please start the volume to access data
- Run # gluster volume start VOLNAME to start the volume.
# gluster v start glustervol
volume start: glustervol: success
- Run the gluster volume info command to optionally display the volume information.
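Because the brick order on the command line decides which bricks replicate each other, it can help to preview the grouping before creating the volume. The following sketch uses a hypothetical helper, replica_sets, that only groups its arguments in order; it does not contact glusterd.

```shell
# replica_sets COUNT BRICK...
# Hypothetical helper: shows how consecutive bricks form replica sets.
# The body runs in a subshell, so "exit" only ends the helper.
replica_sets() (
    count=$1; shift
    set_no=1; i=0; line=""
    for brick in "$@"; do
        line="$line $brick"
        i=$((i + 1))
        if [ "$i" -eq "$count" ]; then
            echo "set $set_no:$line"
            set_no=$((set_no + 1)); i=0; line=""
        fi
    done
    # Leftover bricks mean the total is not a multiple of the replica count.
    if [ "$i" -ne 0 ]; then
        echo "error: brick count is not a multiple of $count" >&2
        exit 1
    fi
)

replica_sets 3 server1:/rhgs/brick1 server2:/rhgs/brick1 server3:/rhgs/brick1 \
    server4:/rhgs/brick1 server5:/rhgs/brick1 server6:/rhgs/brick1
```

With the brick list from Example 5.5, the first set contains server1 to server3 and the second contains server4 to server6, so each replica set spans three different servers.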
Important
5.7. Creating Arbitrated Replicated Volumes
Advantages of arbitrated replicated volumes
- Better consistency
- When an arbiter is configured, arbitration logic uses client-side quorum in auto mode to prevent file operations that would lead to split-brain conditions.
- Less disk space required
- Because an arbiter brick only stores file names and metadata, an arbiter brick can be much smaller than the other bricks in the volume.
- Fewer nodes required
- The node that contains the arbiter brick of one volume can be configured with the data brick of another volume. This "chaining" configuration allows you to use fewer nodes to fulfill your overall storage requirements.
- Easy migration from deprecated two-way replicated volumes
- Red Hat Gluster Storage can convert a two-way replicated volume without arbiter bricks into an arbitrated replicated volume. See Section 5.7.5, “Converting to an arbitrated volume” for details.
Limitations of arbitrated replicated volumes
- Arbitrated replicated volumes provide better data consistency than a two-way replicated volume that does not have arbiter bricks. However, because arbiter bricks store only metadata, arbitrated replicated volumes provide the same level of availability as a two-way replicated volume that does not have arbiter bricks. To achieve high availability, you need to use a three-way replicated volume instead of an arbitrated replicated volume.
- Tiering is not compatible with arbitrated replicated volumes.
- Arbitrated volumes can only be configured in sets of three bricks at a time. Red Hat Gluster Storage can convert an existing two-way replicated volume without arbiter bricks into an arbitrated replicated volume by adding an arbiter brick to that volume. See Section 5.7.5, “Converting to an arbitrated volume” for details.
5.7.1. Arbitrated volume requirements
5.7.1.1. System requirements for nodes hosting arbiter bricks
Configuration type | Min CPU | Min RAM | NIC | Arbiter Brick Size | Max Latency |
---|---|---|---|---|---|
Dedicated arbiter | 64-bit quad-core processor with 2 sockets | 8 GB[a] | Match to other nodes in the storage pool | 1 TB to 4 TB[b] | 5 ms[c] |
Chained arbiter | Match to other nodes in the storage pool | Match to other nodes in the storage pool | Match to other nodes in the storage pool | 1 TB to 4 TB[d] | 5 ms[e] |
[a] More RAM may be necessary depending on the combined capacity of the number of arbiter bricks on the node.
[b] Arbiter and data bricks can be configured on the same device provided that the data and arbiter bricks belong to different replica sets. See Section 5.7.1.2, “Arbiter capacity requirements” for further details on sizing arbiter volumes.
[c] This is the maximum round trip latency requirement between all nodes irrespective of the arbiter node. See KCS#413623 to know how to determine latency between nodes.
[d] Multiple bricks can be created on a single RAIDed physical device. Refer to the following product documentation: Section 19.2, “Brick Configuration”.
[e] This is the maximum round trip latency requirement between all nodes irrespective of the arbiter node. See KCS#413623 to know how to determine latency between nodes.
- minimum 4 vCPUs
- minimum 16 GB RAM
- 1 TB to 4 TB of virtual disk space
- maximum 5 ms latency
5.7.1.2. Arbiter capacity requirements
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / average file size in KB)
minimum arbiter brick size = 4 KB * ( 1 TB / 2 GB ) = 4 KB * ( 1000000000 KB / 2000000 KB ) = 4 KB * 500 = 2000 KB = 2 MB
minimum arbiter brick size = 4 KB * ( size in KB of largest data brick in volume or replica set / shard block size in KB )
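The formulas above are simple enough to evaluate in the shell. The sketch below uses a hypothetical helper, arbiter_min_kb, that takes both sizes in KB. It rounds the file count up, which is an assumption on top of the formula, so that the estimate never undershoots. For a sharded volume, pass the shard block size in place of the average file size.

```shell
# arbiter_min_kb LARGEST_DATA_BRICK_KB AVG_FILE_OR_SHARD_BLOCK_KB
# Hypothetical helper: minimum arbiter brick size in KB, 4 KB per file.
arbiter_min_kb() {
    # Estimated file count, rounded up (assumption), times 4 KB of metadata.
    echo $(( 4 * (($1 + $2 - 1) / $2) ))
}

# 1 TB data brick, 2 GB average file size: prints 2000 (KB), that is, 2 MB.
arbiter_min_kb 1000000000 2000000
```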
5.7.2. Arbitration logic
Volume state | Arbitration behavior |
---|---|
All bricks available | All file operations permitted. |
Arbiter and 1 data brick available | If the arbiter does not agree with the available data node, write operations fail with ENOTCONN (since the brick that is correct is not available). Other file operations are permitted. If the arbiter's metadata agrees with the available data node, all file operations are permitted. |
Arbiter down, data bricks available | All file operations are permitted. The arbiter's records are healed when it becomes available. |
Only one brick available | All file operations fail with ENOTCONN. |
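The table can also be read as a small decision function. The sketch below is a model of the table only, not of the glusterFS implementation: arbitration is a hypothetical helper, and treating the case where no brick is available like the single-brick case is an assumption, since the table does not list it.

```shell
# arbitration ARBITER_UP DATA_BRICKS_UP ARBITER_AGREES OPERATION
# Hypothetical model of the table above for one replica set.
# ARBITER_UP and ARBITER_AGREES are 0 or 1; DATA_BRICKS_UP is 0, 1, or 2.
arbitration() (
    arbiter_up=$1; data_up=$2; arbiter_agrees=$3; op=$4
    up=$((arbiter_up + data_up))
    if [ "$up" -le 1 ]; then
        # Only one brick available (no bricks at all is an assumed extension).
        echo ENOTCONN
    elif [ "$arbiter_up" -eq 1 ] && [ "$data_up" -eq 1 ] \
        && [ "$arbiter_agrees" -eq 0 ] && [ "$op" = write ]; then
        # The brick that holds the correct copy is not available.
        echo ENOTCONN
    else
        echo permitted
    fi
)

arbitration 1 1 0 write   # arbiter disagrees with the remaining data brick
arbitration 0 2 0 write   # arbiter down, both data bricks available
```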
5.7.3. Creating an arbitrated replicated volume
# gluster volume create VOLNAME replica 3 arbiter 1 HOST1:DATA_BRICK1 HOST2:DATA_BRICK2 HOST3:ARBITER_BRICK3
Note
# gluster volume create testvol replica 3 arbiter 1 \
server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick \
server4:/bricks/brick server5:/bricks/brick server6:/bricks/arbiter_brick
# gluster volume info testvol
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: ed9fa4d5-37f1-49bb-83c3-925e90fab1bc
Status: Created
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: server1:/bricks/brick
Brick2: server2:/bricks/brick
Brick3: server3:/bricks/arbiter_brick (arbiter)
Brick4: server4:/bricks/brick
Brick5: server5:/bricks/brick
Brick6: server6:/bricks/arbiter_brick (arbiter)
Options Reconfigured:
cluster.granular-entry-heal: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
5.7.4. Creating multiple arbitrated replicated volumes across fewer total nodes
- Chain multiple arbitrated replicated volumes together, by placing the arbiter brick for one volume on the same node as a data brick for another volume. Chaining is useful for write-heavy workloads when file size is closer to metadata file size (that is, from 32–128 KiB). This avoids all metadata I/O going through a single disk.
In arbitrated distributed-replicated volumes, you can also place an arbiter brick on the same node as another replica sub-volume's data brick, since these do not share the same data.
- Place the arbiter bricks from multiple volumes on a single dedicated node. A dedicated arbiter node is suited to write-heavy workloads with larger files, and read-heavy workloads.
Example 5.6. Example of a dedicated configuration
# gluster volume create firstvol replica 3 arbiter 1 server1:/bricks/brick server2:/bricks/brick server3:/bricks/arbiter_brick
# gluster volume create secondvol replica 3 arbiter 1 server4:/bricks/data_brick server5:/bricks/brick server3:/bricks/brick
Example 5.7. Example of a chained configuration
# gluster volume create arbrepvol replica 3 arbiter 1 \
server1:/bricks/brick1 server2:/bricks/brick1 server3:/bricks/arbiter_brick1 \
server2:/bricks/brick2 server3:/bricks/brick2 server4:/bricks/arbiter_brick2 \
server3:/bricks/brick3 server4:/bricks/brick3 server5:/bricks/arbiter_brick3 \
server4:/bricks/brick4 server5:/bricks/brick4 server6:/bricks/arbiter_brick4 \
server5:/bricks/brick5 server6:/bricks/brick5 server1:/bricks/arbiter_brick5 \
server6:/bricks/brick6 server1:/bricks/brick6 server2:/bricks/arbiter_brick6
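The chained layout in Example 5.7 follows a rotation: replica set i places its data bricks on node i and the next node, and its arbiter brick on the node after that, wrapping around at the end. The sketch below reproduces that pattern with a hypothetical helper, chained_bricks. The server1..serverN host names and the /bricks paths are taken from the example and are not required names.

```shell
# chained_bricks N
# Hypothetical helper: prints one line per replica set for N nodes named
# server1..serverN, following the rotation used in Example 5.7.
chained_bricks() (
    n=$1
    if [ "$n" -lt 3 ]; then
        echo "error: at least 3 nodes are needed" >&2
        exit 1
    fi
    i=1
    while [ "$i" -le "$n" ]; do
        second=$(( i % n + 1 ))          # node after node i, wrapping around
        arbiter=$(( (i + 1) % n + 1 ))   # node after that, wrapping around
        echo "server$i:/bricks/brick$i server$second:/bricks/brick$i server$arbiter:/bricks/arbiter_brick$i"
        i=$((i + 1))
    done
)

chained_bricks 6
```

Each output line is one replica set in the order that gluster volume create expects, so the lines can be appended to the create command after they have been reviewed.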
5.7.5. Converting to an arbitrated volume
Procedure 5.1. Converting a replica 2 volume to an arbitrated volume
Warning
Verify that healing is not in progress
# gluster volume heal VOLNAME info
Wait until pending heal entries is 0 before proceeding.
Disable and stop self-healing
Run the following commands to disable data, metadata, and entry self-heal, and the self-heal daemon.
# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off
Add arbiter bricks to the volume
Convert the volume by adding an arbiter brick for each replicated sub-volume.
# gluster volume add-brick VOLNAME replica 3 arbiter 1 HOST:arbiter-brick-path
For example, if you have an existing two-way replicated volume called testvol, and a new brick for the arbiter to use, you can add a brick as an arbiter with the following command:
# gluster volume add-brick testvol replica 3 arbiter 1 server:/bricks/arbiter_brick
If you have an existing two-way distributed-replicated volume, you need a new brick for each sub-volume in order to convert it to an arbitrated distributed-replicated volume, for example:
# gluster volume add-brick testvol replica 3 arbiter 1 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
Wait for client volfiles to update
This takes about 5 minutes.
Verify that bricks added successfully
# gluster volume info VOLNAME
# gluster volume status VOLNAME
Re-enable self-healing
Run the following commands to re-enable self-healing on the servers.
# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on
Verify all entries are healed
# gluster volume heal VOLNAME info
Wait until pending heal entries is 0 to ensure that all heals completed successfully.
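The heal checks in this procedure can be scripted by totalling the pending entries across bricks. The sketch below assumes that the heal info output contains one "Number of entries: N" line per brick; verify this against the output on your systems before relying on it. pending_heals is a hypothetical helper, and the sample text stands in for real command output.

```shell
# pending_heals: reads heal info output on stdin and prints the total
# number of pending entries across all bricks.
# Assumption: each brick reports a line of the form "Number of entries: N".
pending_heals() {
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Sample input in the assumed format (not captured from a real cluster).
pending_heals <<'EOF'
Brick server1:/bricks/brick
Status: Connected
Number of entries: 2

Brick server2:/bricks/brick
Status: Connected
Number of entries: 0
EOF
```

On a storage node, the same helper would read from the real command, for example by piping the output of gluster volume heal VOLNAME info into pending_heals and proceeding only when it prints 0.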
Procedure 5.2. Converting a replica 3 volume to an arbitrated volume
Warning
Verify that healing is not in progress
# gluster volume heal VOLNAME info
Wait until pending heal entries is 0 before proceeding.
Reduce the replica count of the volume to 2
Remove one brick from every sub-volume in the volume so that the replica count is reduced to 2. For example, in a replica 3 volume that distributes data across 2 sub-volumes, run the following command:
# gluster volume remove-brick VOLNAME replica 2 HOST:subvol1-brick-path HOST:subvol2-brick-path force
Note
In a distributed replicated volume, data is distributed across sub-volumes, and replicated across bricks in a sub-volume. This means that to reduce the replica count of a volume, you need to remove a brick from every sub-volume.
Bricks are grouped by sub-volume in the gluster volume info output. If the replica count is 3, the first 3 bricks form the first sub-volume, the next 3 bricks form the second sub-volume, and so on.
# gluster volume info VOLNAME
[...]
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: node1:/test1/brick
Brick2: node2:/test2/brick
Brick3: node3:/test3/brick
Brick4: node1:/test4/brick
Brick5: node2:/test5/brick
Brick6: node3:/test6/brick
[...]
In this volume, data is distributed across two sub-volumes, which each consist of three bricks. The first sub-volume consists of bricks 1, 2, and 3. The second sub-volume consists of bricks 4, 5, and 6. Removing any one brick from each sub-volume using the following command reduces the replica count to 2 as required.
# gluster volume remove-brick VOLNAME replica 2 HOST:subvol1-brick-path HOST:subvol2-brick-path force
Disable and stop self-healing
Run the following commands to disable data, metadata, and entry self-heal, and the self-heal daemon.
# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off
Add arbiter bricks to the volume
Convert the volume by adding an arbiter brick for each replicated sub-volume.
# gluster volume add-brick VOLNAME replica 3 arbiter 1 HOST:arbiter-brick-path
For example, if you have an existing replicated volume:
# gluster volume add-brick testvol replica 3 arbiter 1 server:/bricks/brick
If you have an existing distributed-replicated volume:
# gluster volume add-brick testvol replica 3 arbiter 1 server1:/bricks/arbiter_brick1 server2:/bricks/arbiter_brick2
Wait for client volfiles to update
This takes about 5 minutes. Verify that this is complete by running the following command on each client.
# grep -ir connected mount-path/.meta/graphs/active/volname-client-*/private
The number of times connected=1 appears in the output is the number of bricks connected to the client.
Verify that bricks added successfully
# gluster volume info VOLNAME
# gluster volume status VOLNAME
Re-enable self-healing
Run the following commands to re-enable self-healing on the servers.
# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on
Verify all entries are healed
# gluster volume heal VOLNAME info
Wait until pending heal entries is 0 to ensure that all heals completed successfully.
5.7.6. Converting an arbitrated volume to a three-way replicated volume
Warning
Procedure 5.3. Converting an arbitrated volume to a replica 3 volume
Verify that healing is not in progress
# gluster volume heal VOLNAME info
Wait until pending heal entries is 0 before proceeding.
Remove arbiter bricks from the volume
Check which bricks are listed as (arbiter), and then remove those bricks from the volume.
# gluster volume info VOLNAME
# gluster volume remove-brick VOLNAME replica 2 HOST:arbiter-brick-path force
Disable and stop self-healing
Run the following commands to disable data, metadata, and entry self-heal, and the self-heal daemon.
# gluster volume set VOLNAME cluster.data-self-heal off
# gluster volume set VOLNAME cluster.metadata-self-heal off
# gluster volume set VOLNAME cluster.entry-self-heal off
# gluster volume set VOLNAME self-heal-daemon off
Add full bricks to the volume
Convert the volume by adding a brick for each replicated sub-volume.
# gluster volume add-brick VOLNAME replica 3 HOST:brick-path
For example, if you have an existing arbitrated replicated volume:
# gluster volume add-brick testvol replica 3 server:/bricks/brick
If you have an existing arbitrated distributed-replicated volume:
# gluster volume add-brick testvol replica 3 server1:/bricks/brick1 server2:/bricks/brick2
Wait for client volfiles to update
This takes about 5 minutes.
Verify that bricks added successfully
# gluster volume info VOLNAME
# gluster volume status VOLNAME
Re-enable self-healing
Run the following commands to re-enable self-healing on the servers.
# gluster volume set VOLNAME cluster.data-self-heal on
# gluster volume set VOLNAME cluster.metadata-self-heal on
# gluster volume set VOLNAME cluster.entry-self-heal on
# gluster volume set VOLNAME self-heal-daemon on
Verify all entries are healed
# gluster volume heal VOLNAME info
Wait until pending heal entries is 0 to ensure that all heals completed successfully.
5.7.7. Tuning recommendations for arbitrated volumes
- For dedicated arbiter nodes, use JBOD for arbiter bricks, and RAID6 for data bricks.
- For chained arbiter volumes, use the same RAID6 drive for both data and arbiter bricks.
5.8. Creating Dispersed Volumes
Important
Figure 5.4. Illustration of a Dispersed Volume
n = k + m. Here n is the total number of bricks, and any k bricks out of the n bricks are required for recovery. In other words, the volume can tolerate the failure of up to any m bricks. With this release, the following configurations are supported:
- 6 bricks with redundancy level 2 (4 + 2)
- 10 bricks with redundancy level 2 (8 + 2)
- 11 bricks with redundancy level 3 (8 + 3)
- 12 bricks with redundancy level 4 (8 + 4)
- 20 bricks with redundancy level 4 (16 + 4)
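A planned layout can be checked against this list before any bricks are prepared. The sketch below hard-codes the five supported combinations listed above. disperse_supported is a hypothetical helper, not a gluster subcommand, and it prints the k + m split derived from n = k + m.

```shell
# disperse_supported TOTAL_BRICKS REDUNDANCY
# Hypothetical helper: checks n and m against the supported list above and
# prints the data + redundancy split (k = n - m).
disperse_supported() {
    case "$1+$2" in
        6+2|10+2|11+3|12+4|20+4)
            echo "supported: $(( $1 - $2 )) + $2" ;;
        *)
            echo "unsupported"
            return 1 ;;
    esac
}

disperse_supported 11 3   # prints "supported: 8 + 3"
```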
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
- Create a trusted storage pool as described in Section 4.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 5.10, “Starting Volumes”.
Important
- Run the gluster volume create command to create the dispersed volume.
The syntax is:
# gluster volume create NEW-VOLNAME [disperse-data COUNT] [redundancy COUNT] [transport tcp | rdma (Deprecated) | tcp,rdma] NEW-BRICK...
The number of bricks required to create a disperse volume is the sum of the disperse-data count and the redundancy count.
The disperse-data count option specifies the number of bricks that are part of the dispersed volume, excluding the redundant bricks. For example, if the total number of bricks is 6 and the redundancy count is specified as 2, then the disperse-data count is 4 (6 - 2 = 4). If the disperse-data count option is not specified, and only the redundancy count option is specified, then the disperse-data count is computed automatically by deducting the redundancy count from the specified total number of bricks.
Redundancy determines how many bricks can be lost without interrupting the operation of the volume. If the redundancy count is not specified, it is computed automatically to the optimal value based on the configuration, and a warning message is displayed.
The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Example 5.8. Dispersed Volume with Six Storage Servers
# gluster v create glustervol disperse-data 4 redundancy 2 transport tcp server1:/rhgs1/brick1 server2:/rhgs2/brick2 server3:/rhgs3/brick3 server4:/rhgs4/brick4 server5:/rhgs5/brick5 server6:/rhgs6/brick6
volume create: glustervol: success: please start the volume to access data
- Run # gluster volume start VOLNAME to start the volume.
# gluster v start glustervol
volume start: glustervol: success
Important
The open-behind volume option is enabled by default. If you are accessing the dispersed volume using the SMB protocol, you must disable the open-behind volume option to avoid a performance bottleneck on large-file workloads. Run the following command to disable the open-behind volume option:
# gluster volume set VOLNAME open-behind off
For information on the open-behind volume option, see Section 11.1, “Configuring Volume Options”.
- Run the gluster volume info command to optionally display the volume information.
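The procedure in this section can be previewed as a dry run. The sketch below only prints the commands and never invokes gluster; the function name `print_dispersed_volume_plan` is hypothetical, and the disperse-data count is derived as described above (total bricks minus the redundancy count).

```shell
# Hypothetical dry-run helper: prints the commands, does not run them.
# Usage: print_dispersed_volume_plan VOLNAME REDUNDANCY BRICK...
print_dispersed_volume_plan() (
    vol=$1
    redundancy=$2
    shift 2
    # disperse-data count = total number of bricks - redundancy count
    data=$(( $# - redundancy ))
    if [ "$data" -le 0 ]; then
        echo "error: need more bricks than the redundancy count" >&2
        exit 1
    fi
    echo "gluster volume create $vol disperse-data $data redundancy $redundancy transport tcp $*"
    echo "gluster volume start $vol"
    echo "gluster volume info $vol"
)
```

Called with `glustervol 2` and the six bricks from Example 5.8, the first printed line matches the create command shown in that example. If the volume is accessed over SMB, the open-behind step described above still has to be applied separately.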
5.9. Creating Distributed Dispersed Volumes
- Multiple disperse sets containing 6 bricks with redundancy level 2
- Multiple disperse sets containing 10 bricks with redundancy level 2
- Multiple disperse sets containing 11 bricks with redundancy level 3
- Multiple disperse sets containing 12 bricks with redundancy level 4
- Multiple disperse sets containing 20 bricks with redundancy level 4
Important
Use gluster volume create to create different types of volumes, and gluster volume info to verify successful volume creation.
- A trusted storage pool has been created, as described in Section 4.1, “Adding Servers to the Trusted Storage Pool”.
- Understand how to start and stop volumes, as described in Section 5.10, “Starting Volumes”.
Figure 5.5. Illustration of a Distributed Dispersed Volume
Important
- Run the gluster volume create command to create the distributed dispersed volume.
The syntax is:
# gluster volume create NEW-VOLNAME disperse-data COUNT [redundancy COUNT] [transport tcp | rdma (Deprecated) | tcp,rdma] NEW-BRICK...
The default value for transport is tcp. Other options can be passed such as auth.allow or auth.reject. See Section 11.1, “Configuring Volume Options” for a full list of parameters.
Example 5.9. Distributed Dispersed Volume with Six Storage Servers
# gluster v create glustervol disperse-data 4 redundancy 2 transport tcp server1:/rhgs1/brick1 server2:/rhgs2/brick2 server3:/rhgs3/brick3 server4:/rhgs4/brick4 server5:/rhgs5/brick5 server6:/rhgs6/brick6 server1:/rhgs7/brick7 server2:/rhgs8/brick8 server3:/rhgs9/brick9 server4:/rhgs10/brick10 server5:/rhgs11/brick11 server6:/rhgs12/brick12
volume create: glustervol: success: please start the volume to access data.
The above example is illustrated in Figure 5.5, “Illustration of a Distributed Dispersed Volume”. In the illustration and example, you are creating 12 bricks from 6 servers.
- Run # gluster volume start VOLNAME to start the volume.
# gluster v start glustervol
volume start: glustervol: success
Important
The open-behind volume option is enabled by default. If you are accessing the distributed dispersed volume using the SMB protocol, you must disable the open-behind volume option to avoid a performance bottleneck on large-file workloads. Run the following command to disable the open-behind volume option:
# gluster volume set VOLNAME open-behind off
For information on the open-behind volume option, see Section 11.1, “Configuring Volume Options”.
- Run the gluster volume info command to optionally display the volume information.
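In a distributed dispersed volume the brick list is split into disperse sets of equal size (disperse-data count plus redundancy count), so the total number of bricks has to be a whole multiple of that size. The sketch below checks this arithmetic; the function name `disperse_set_count` is hypothetical and is not a gluster command.

```shell
# Hypothetical helper: prints how many disperse sets a brick list forms,
# or fails if the brick count is not a whole multiple of the set size.
# Usage: disperse_set_count TOTAL_BRICKS DISPERSE_DATA REDUNDANCY
disperse_set_count() (
    total=$1
    set_size=$(( $2 + $3 ))
    if [ "$set_size" -le 0 ] || [ $(( total % set_size )) -ne 0 ]; then
        echo "error: $total bricks cannot be split into sets of $set_size" >&2
        exit 1
    fi
    echo $(( total / set_size ))
)
```

For Example 5.9, `disperse_set_count 12 4 2` prints `2`: the 12 bricks form two (4 + 2) disperse sets. A result of `1` would describe a plain dispersed volume rather than a distributed one.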
5.10. Starting Volumes
# gluster volume start VOLNAME
# gluster v start glustervol
volume start: glustervol: success
Chapter 6. Creating Access to Volumes
Warning
Do not enable the storage.fips-mode-rchecksum volume option on volumes with clients that use Red Hat Gluster Storage 3.4 or earlier.
- Native Client (see Section 6.2, “Native Client”)
- Network File System (NFS) v3 (see Section 6.3, “NFS”)
- Server Message Block (SMB) (see Section 6.4, “SMB”)
6.1. Client Support Information
6.1.1. Cross Protocol Data Access
Protocol | SMB | Gluster NFS | NFS-Ganesha | Native FUSE | Object |
---|---|---|---|---|---|
SMB | Yes | No | No | No | No |
Gluster NFS (Deprecated) | No | Yes | No | No | No |
NFS-Ganesha | No | No | Yes | No | No |
Native FUSE | No | No | No | Yes | Yes [a] |
6.1.2. Client Operating System Protocol Support
Client OS | FUSE | Gluster NFS | NFS-Ganesha | SMB |
---|---|---|---|---|
RHEL 5 | Unsupported | Unsupported | Unsupported | Unsupported |
RHEL 6 | Supported | Deprecated | Unsupported | Supported |
RHEL 7 | Supported | Deprecated | Supported | Supported |
RHEL 8 | Supported | Unsupported | Supported | Supported |
Windows Server 2008, 2012, 2016 | Unsupported | Unsupported | Unsupported | Supported |
Windows 7, 8, 10 | Unsupported | Unsupported | Unsupported | Supported |
Mac OS 10.15 | Unsupported | Unsupported | Unsupported | Supported |
6.1.3. Transport Protocol Support
Access Protocols | TCP | RDMA (Deprecated) |
---|---|---|
FUSE | Yes | Yes |
SMB | Yes | No |
NFS | Yes | Yes |
Warning
Important
6.2. Native Client
- Install Native Client packages
- Mount Red Hat Gluster Storage volumes (manually and automatically)
- Verify that the Gluster Storage volume has mounted successfully
Note
- Red Hat Gluster Storage server supports the Native Client version that is the same as the server version, and the preceding version of Native Client. For a list of releases, see https://access.redhat.com/solutions/543123.
- From Red Hat Gluster Storage 3.5 batch update 7 onwards, glusterfs-6.0-62 and higher versions of the glusterFS Native Client are only available via rh-gluster-3-client-for-rhel-8-x86_64-rpms for Red Hat Gluster Storage based on Red Hat Enterprise Linux 8 (RHEL 8) and rh-gluster-3-client-for-rhel-7-server-rpms for Red Hat Gluster Storage based on RHEL 7.
Red Hat Enterprise Linux version | Red Hat Gluster Storage version | Native client version |
---|---|---|
6.5 | 3.0 | 3.0, 2.1* |
6.6 | 3.0.2, 3.0.3, 3.0.4 | 3.0, 2.1* |
6.7 | 3.1, 3.1.1, 3.1.2 | 3.1, 3.0, 2.1* |
6.8 | 3.1.3 | 3.1.3 |
6.9 | 3.2 | 3.2, 3.1.3* |
6.9 | 3.3 | 3.3, 3.2 |
6.9 | 3.3.1 | 3.3.1, 3.3, 3.2 |
6.10 | 3.4 | 3.5*, 3.4, 3.3.z |
7.1 | 3.1, 3.1.1 | 3.1.1, 3.1, 3.0 |
7.2 | 3.1.2 | 3.1.2, 3.1, 3.0 |
7.2 | 3.1.3 | 3.1.3 |
7.3 | 3.2 | 3.2, 3.1.3 |
7.4 | 3.2 | 3.2, 3.1.3 |
7.4 | 3.3 | 3.3, 3.2 |
7.4 | 3.3.1 | 3.3.1, 3.3, 3.2 |
7.5 | 3.3.1, 3.4 | 3.3.z, 3.4.z |
7.6 | 3.3.1, 3.4 | 3.3.z, 3.4.z |
7.7 | 3.5.1 | 3.4.z, 3.5.z |
7.8 | 3.5.2 | 3.4.z, 3.5.z |
7.9 | 3.5.3, 3.5.4, 3.5.5, 3.5.6, 3.5.7 | 3.4.z, 3.5.z |
8.1 | NA | 3.5 |
8.2 | 3.5.2 | 3.5.z |
8.3 | 3.5.3 | 3.5.z |
8.4 | 3.5.4 | 3.5.z |
8.5 | 3.5.5, 3.5.6 | 3.5.z |
8.6 | 3.5.7 | 3.5.z |
Warning
- For Red Hat Gluster Storage 3.5, Red Hat supports only Red Hat Gluster Storage 3.4 and 3.5 clients.
6.2.1. Installing Native Client
- Use the Command Line to Register and Subscribe a System to Red Hat Subscription Management
- Use the Web Interface to Register and Subscribe a System to Red Hat Subscription Management
Important
Use the Command Line to Register and Subscribe a System to Red Hat Subscription Management
Prerequisites
- Know the user name and password of the Red Hat Subscription Manager account with Red Hat Gluster Storage entitlements.
- Run the subscription-manager register command to list the available pools. Select the appropriate pool and enter your Red Hat Subscription Manager user name and password to register the system with Red Hat Subscription Manager.
# subscription-manager register
- Depending on your client, run one of the following commands to subscribe to the correct repositories.
- For Red Hat Enterprise Linux 8 clients:
# subscription-manager repos --enable=rh-gluster-3-client-for-rhel-8-x86_64-rpms
- For Red Hat Enterprise Linux 7.x clients:
# subscription-manager repos --enable=rhel-7-server-rpms --enable=rh-gluster-3-client-for-rhel-7-server-rpms
Note
The following command can also be used, but Red Hat Gluster Storage may deprecate support for this repository in future releases.
# subscription-manager repos --enable=rhel-7-server-rh-common-rpms
- For Red Hat Enterprise Linux 6.1 and later clients:
# subscription-manager repos --enable=rhel-6-server-rpms --enable=rhel-6-server-rhs-client-1-rpms
For more information on subscriptions, refer to Section 3.1 Registering and attaching a system from the Command Line in Using and Configuring Red Hat Subscription Management.
- Verify that the system is subscribed to the required repositories.
# yum repolist
Use the Web Interface to Register and Subscribe a System to Red Hat Subscription Management
Prerequisites
- Know the user name and password of the Red Hat Subscription Management (RHSM) account with Red Hat Gluster Storage entitlements.
- Log on to Red Hat Subscription Management (https://access.redhat.com/management).
- Click the Systems link at the top of the screen.
- Click the name of the system to which the Red Hat Gluster Storage Native Client channel must be appended.
- Click … in the Subscribed Channels section of the screen.
- Expand the node for Additional Services Channels for Red Hat Enterprise Linux 7 for x86_64 or for x86_64 or for Red Hat Enterprise Linux 5 for x86_64 depending on the client platform.
- Click the button to finalize the changes. When the page refreshes, select the Details tab to verify the system is subscribed to the appropriate channels.
Install Native Client Packages
Prerequisites
- Run the yum install command to install the native client RPM packages.
# yum install glusterfs glusterfs-fuse
- For Red Hat Enterprise Linux 5.x client systems, run the modprobe command to load FUSE modules before mounting Red Hat Gluster Storage volumes.
# modprobe fuse
For more information on loading modules at boot time, see https://access.redhat.com/knowledge/solutions/47028.
6.2.2. Upgrading Native Client
Unmount gluster volumes
Unmount any gluster volumes prior to upgrading the native client.
# umount /mnt/glusterfs
Upgrade the client
Run the yum update command to upgrade the native client:
# yum update glusterfs glusterfs-fuse
Remount gluster volumes
Remount volumes as discussed in Section 6.2.3, “Mounting Red Hat Gluster Storage Volumes”.
6.2.3. Mounting Red Hat Gluster Storage Volumes
Note
- Clients should be on the same version as the server, or at least on the version immediately preceding the server version. For Red Hat Gluster Storage 3.5, the recommended native client version is either 3.4.z or 3.5. For other versions, see Section 6.2, “Native Client”.
- Server names selected during volume creation should be resolvable in the client machine. Use appropriate /etc/hosts entries, or a DNS server, to resolve server names to IP addresses.
- Internet Protocol Version 6 (IPv6) support is available only for Red Hat Hyperconverged Infrastructure for Virtualization environments and not for Red Hat Gluster Storage standalone environments.
6.2.3.1. Mount Commands and Options
mount -t glusterfs
command. All options must be separated with commas.
# mount -t glusterfs -o backup-volfile-servers=volfile_server2:volfile_server3:...:volfile_serverN,transport-type tcp,log-level=WARNING,reader-thread-count=2,log-file=/var/log/gluster.log server1:/test-volume /mnt/glusterfs
- backup-volfile-servers=<volfile_server2>:<volfile_server3>:...:<volfile_serverN>
- List of the backup volfile servers to mount the client. If this option is specified while mounting the FUSE client, then when the first volfile server fails, the servers specified in the backup-volfile-servers option are used as volfile servers to mount the client until the mount is successful.
Note
This option was earlier specified as backupvolfile-server, which is no longer valid.
- log-level
- Logs only messages of the specified level or higher severity in the log-file.
- log-file
- Logs the messages in the specified file.
- transport-type
- Specifies the transport type that the FUSE client must use to communicate with bricks. If the volume was created with only one transport type, then that becomes the default when no value is specified. In the case of a tcp,rdma volume, tcp is the default.
- dump-fuse
- This mount option creates a dump of the FUSE traffic between the glusterfs client (FUSE userspace server) and the kernel. The interface to mount a glusterfs volume is the standard mount(8) command from the CLI. This feature enables the same in the mount option.
# mount -t glusterfs -odump-fuse=filename hostname:/volname mount-path
For example:
# mount -t glusterfs -odump-fuse=/dumpfile 10.70.43.18:/arbiter /mnt/arbiter
The above command generates a binary file with the name dumpfile.
Note
The fusedump grows large with time, notably if the client is under heavy load, so it is not intended for use during normal operation. Use it to capture a dump of a particular scenario for diagnostic purposes. To stop dumping, unmount the volume and remount it without the fusedump option.
- ro
- Mounts the file system with read-only permissions.
- acl
- Enables POSIX Access Control List on mount. See Section 6.5.4, “Checking ACL enablement on a mounted volume” for further information.
- background-qlen=n
- Enables FUSE to handle n number of requests to be queued before subsequent requests are denied. Default value of n is 64.
- enable-ino32
- Enables file system to present 32-bit inodes instead of 64-bit inodes.
- reader-thread-count=n
- Enables FUSE to add n reader threads, which can give better I/O performance. The default value of n is 1.
- lru-limit
- This mount command option clears inodes from the least recently used (lru) list (which keeps non-referenced inodes) after the inode limit has been reached.
For example:
# mount -olru-limit=NNNN -t glusterfs hostname:/volname /mnt/mountdir
Where NNNN is a positive integer. The default value of NNNN is 128k (131072), and the recommended value is 20000 or above. If 0 is specified as the lru-limit, inodes are not invalidated from the lru list.
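Because all mount options are passed as one comma-separated string, and the backup volfile servers as one colon-separated list, it can help to assemble the -o argument programmatically. The following is a minimal sketch; the function names `backup_volfile_opt` and `join_mount_opts` are hypothetical helpers, not part of glusterfs, and the option names are the ones documented above.

```shell
# Hypothetical helpers for assembling the argument of "mount -o".

# Joins backup volfile servers with colons.
# Usage: backup_volfile_opt SERVER...
backup_volfile_opt() (
    IFS=:
    echo "backup-volfile-servers=$*"
)

# Joins individual mount options with commas.
# Usage: join_mount_opts OPTION...
join_mount_opts() (
    IFS=,
    echo "$*"
)

opts=$(join_mount_opts "$(backup_volfile_opt server2 server3)" \
    log-level=WARNING reader-thread-count=2)
# Print the resulting command instead of running it.
echo "mount -t glusterfs -o $opts server1:/test-volume /mnt/glusterfs"
```

The final line prints `mount -t glusterfs -o backup-volfile-servers=server2:server3,log-level=WARNING,reader-thread-count=2 server1:/test-volume /mnt/glusterfs`.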
6.2.3.2. Mounting Volumes Manually
Manually Mount a Red Hat Gluster Storage Volume or Subdirectory
- For a Red Hat Gluster Storage Volume
mount -t glusterfs HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR
- For a Red Hat Gluster Storage Volume's Subdirectory
mount -t glusterfs HOSTNAME|IPADDRESS:/VOLNAME/SUBDIRECTORY /MOUNTDIR
Note
- If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.
# mkdir /mnt/glusterfs
- Run the mount -t glusterfs command, using the key in the task summary as a guide.
- For a Red Hat Gluster Storage Volume:
# mount -t glusterfs server1:/test-volume /mnt/glusterfs
- For a Red Hat Gluster Storage Volume's Subdirectory:
# mount -t glusterfs server1:/test-volume/sub-dir /mnt/glusterfs
6.2.3.3. Mounting Volumes Automatically
- Open the /etc/fstab file in a text editor.
- Append the following configuration to the fstab file:
- For a Red Hat Gluster Storage Volume
HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR glusterfs defaults,_netdev 0 0
- For a Red Hat Gluster Storage Volume's Subdirectory
HOSTNAME|IPADDRESS:/VOLNAME/SUBDIRECTORY /MOUNTDIR glusterfs defaults,_netdev 0 0
Using the example server names, the entry contains the following replaced values.
server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev 0 0
OR
server1:/test-volume/subdir /mnt/glusterfs glusterfs defaults,_netdev 0 0
If you want to specify the transport type, see the following example:
server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev,transport=tcp 0 0
OR
server1:/test-volume/sub-dir /mnt/glusterfs glusterfs defaults,_netdev,transport=tcp 0 0
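The fstab entries above all follow one pattern, which the sketch below reproduces. The function name `gluster_fstab_entry` is hypothetical; it only prints a line and does not modify /etc/fstab.

```shell
# Hypothetical helper: prints an fstab line for a gluster volume.
# Usage: gluster_fstab_entry SERVER VOLNAME[/SUBDIR] MOUNTDIR [EXTRA_OPTS]
gluster_fstab_entry() (
    opts="defaults,_netdev"
    # Append extra options such as transport=tcp when given.
    if [ -n "${4:-}" ]; then
        opts="$opts,$4"
    fi
    echo "$1:/$2 $3 glusterfs $opts 0 0"
)
```

For example, `gluster_fstab_entry server1 test-volume /mnt/glusterfs transport=tcp` prints `server1:/test-volume /mnt/glusterfs glusterfs defaults,_netdev,transport=tcp 0 0`. Review the printed line before appending it to /etc/fstab.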
6.2.3.4. Manually Mounting Sub-directories Using Native Client
- Provides namespace isolation so that multiple users can access the storage without risking namespace collision with other users.
- Prevents the root file system from becoming full in the event of a mount failure.
# mount -t glusterfs hostname:/volname/subdir /mount-point
# mount -t glusterfs hostname:/volname -osubdir-mount=subdir /mount-point
# gluster volume set test-vol auth.allow "/(192.168.10.*|192.168.11.*),/subdir1(192.168.1.*),/subdir2(192.168.8.*)"
- The auth.allow option allows only the directories specified as the value of the auth.allow option to be mounted.
- Each group of auth-allow is separated by a comma (,).
- Each group consists of a directory followed by parentheses, (), which contain the valid IP addresses.
- All subdirectories start with /; that is, there are no relative paths within a volume. Every path is absolute, taking / as the root directory of the volume.
Note
By default, the value of auth.allow is *, where any given subdirectory in a volume can be mounted by all clients.
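The grouping rules for the auth.allow value can be illustrated by building the string from its parts. This is a sketch under the assumption that addresses within a group are joined with |, as in the example command above; the function names `auth_allow_group` and `auth_allow_value` are hypothetical.

```shell
# Hypothetical helpers for composing an auth.allow value.

# Builds one group: a directory followed by its allowed addresses
# in parentheses, joined with "|".
# Usage: auth_allow_group DIRECTORY ADDRESS...
auth_allow_group() (
    dir=$1
    shift
    IFS='|'
    echo "$dir($*)"
)

# Joins groups with commas.
# Usage: auth_allow_value GROUP...
auth_allow_value() (
    IFS=,
    echo "$*"
)
```

Quote address patterns such as `'192.168.1.*'` when calling these helpers, and quote the resulting value when passing it to gluster volume set, so that the shell does not interpret the parentheses or expand the asterisks.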
6.2.3.5. Testing Mounted Volumes
Testing Mounted Red Hat Gluster Storage Volumes
Prerequisites
- Run the mount command to check whether the volume was successfully mounted.
# mount
server1:/test-volume on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
OR
# mount
server1:/test-volume/sub-dir on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
If the transport option is used while mounting a volume, the mount status will have the transport type appended to the volume name. For example, for transport=tcp:
# mount
server1:/test-volume.tcp on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
OR
# mount
server1:/test-volume/sub-dir.tcp on /mnt/glusterfs type fuse.glusterfs (rw,allow_other,default_permissions,max_read=131072)
- Run the df command to display the aggregated storage space from all the bricks in a volume.
# df -h /mnt/glusterfs
Filesystem Size Used Avail Use% Mounted on
server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
- Move to the mount directory using the cd command, and list the contents.
# cd /mnt/glusterfs
# ls
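The first check above can be scripted by filtering the output of mount. The sketch below assumes the "SOURCE on MOUNTDIR type FSTYPE (OPTIONS)" layout printed by mount on Linux; the function name `is_glusterfs_mounted` is hypothetical.

```shell
# Hypothetical helper: reads `mount` output on stdin and succeeds if
# MOUNTDIR is listed with a fuse.glusterfs file system type.
# Usage: mount | is_glusterfs_mounted /mnt/glusterfs
is_glusterfs_mounted() {
    awk -v dir="$1" '
        $2 == "on" && $3 == dir && $4 == "type" &&
            index($5, "fuse.glusterfs") == 1 { found = 1 }
        END { exit found ? 0 : 1 }
    '
}
```

A typical use would be `mount | is_glusterfs_mounted /mnt/glusterfs && echo mounted`. Mount points that contain spaces are not handled by this sketch.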
6.3. NFS
6.3.1. Support Matrix
Features | glusterFS NFS (NFSv3) | NFS-Ganesha (NFSv3) | NFS-Ganesha (NFSv4) |
---|---|---|---|
Root-squash | Yes | Yes | Yes |
All-squash | No | Yes | Yes |
Sub-directory exports | Yes | Yes | Yes |
Locking | Yes | Yes | Yes |
Client based export permissions | Yes | Yes | Yes |
Netgroups | Yes | Yes | Yes |
Mount protocols | UDP, TCP | UDP, TCP | Only TCP |
NFS transport protocols | TCP | UDP, TCP | TCP |
AUTH_UNIX | Yes | Yes | Yes |
AUTH_NONE | Yes | Yes | Yes |
AUTH_KRB | No | Yes | Yes |
ACLs | Yes | No | Yes |
Delegations | N/A | N/A | No |
High availability | Yes (but with certain limitations. For more information see, "Setting up CTDB for NFS") | Yes | Yes |
Multi-head | Yes | Yes | Yes |
Gluster RDMA volumes | Yes | Not supported | Not supported |
DRC | Not supported | Yes | Yes |
Dynamic exports | No | Yes | Yes |
pseudofs | N/A | N/A | Yes |
NFSv4.1 | N/A | N/A | Yes |
Note
- Red Hat does not recommend running NFS-Ganesha with any other NFS servers, such as, kernel-NFS and Gluster NFS servers.
- Only one of NFS-Ganesha, gluster-NFS or kernel-NFS servers can be enabled on a given machine/host as all NFS implementations use the port 2049 and only one can be active at a given time. Hence you must disable kernel-NFS before NFS-Ganesha is started.
6.3.2. Gluster NFS (Deprecated)
Warning
Note
mount -t nfs
” command on the client as below:
# mount -t nfs HOSTNAME:VOLNAME MOUNTPATH
# gluster volume set VOLNAME nfs.disable off
- To set nfs.acl ON, run the following command:
# gluster volume set VOLNAME nfs.acl on
- To set nfs.acl OFF, run the following command:
# gluster volume set VOLNAME nfs.acl off
Note
Important
# firewall-cmd --get-active-zones
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind
# firewall-cmd --zone=zone_name --add-service=nfs --add-service=rpc-bind --permanent
6.3.2.1. Setting up CTDB for Gluster NFS (Deprecated)
Important
# firewall-cmd --get-active-zones
# firewall-cmd --zone=zone_name --add-port=4379/tcp
# firewall-cmd --zone=zone_name --add-port=4379/tcp --permanent
Note
6.3.2.1.1. Prerequisites
- If you already have an older version of CTDB (version <= ctdb1.x), then remove CTDB by executing the following command:
# yum remove ctdb
After removing the older version, proceed with installing the latest CTDB.
Note
Ensure that the system is subscribed to the samba channel to get the latest CTDB packages.
- Install the latest version of CTDB on all the nodes that are used as NFS servers using the following command:
# yum install ctdb
- CTDB uses TCP port 4379 by default. Ensure that this port is accessible between the Red Hat Gluster Storage servers.
6.3.2.1.2. Port and Firewall Information for Gluster NFS
# firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
--add-port=32803/tcp --add-port=32769/udp \
--add-port=111/tcp --add-port=111/udp
# firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
--add-port=32803/tcp --add-port=32769/udp \
--add-port=111/tcp --add-port=111/udp --permanent
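Each port is opened twice: once in the runtime configuration and once with --permanent. The sketch below prints both invocations from a single port list so the two stay in sync. It is a dry run that never calls firewall-cmd, and the function name `print_firewall_port_cmds` is hypothetical.

```shell
# Hypothetical dry-run helper: prints the runtime and permanent
# firewall-cmd invocations for a zone and a list of PORT/PROTOCOL pairs.
# Usage: print_firewall_port_cmds ZONE PORT/PROTO...
print_firewall_port_cmds() (
    zone=$1
    shift
    args=""
    for port in "$@"; do
        args="$args --add-port=$port"
    done
    echo "firewall-cmd --zone=$zone$args"
    echo "firewall-cmd --zone=$zone$args --permanent"
)
```

For example, `print_firewall_port_cmds public 111/tcp 111/udp` prints the two commands `firewall-cmd --zone=public --add-port=111/tcp --add-port=111/udp` and the same command with `--permanent` appended.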
- On Red Hat Enterprise Linux 7, edit the /etc/sysconfig/nfs file as mentioned below:
# sed -i '/STATD_PORT/s/^#//' /etc/sysconfig/nfs
Note
This step is not applicable for Red Hat Enterprise Linux 8.
- Restart the services:
- For Red Hat Enterprise Linux 6:
# service nfslock restart
# service nfs restart
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See the Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
- For Red Hat Enterprise Linux 7:
# systemctl restart nfs-config
# systemctl restart rpc-statd
# systemctl restart nfs-mountd
# systemctl restart nfslock
Note
This step is not applicable for Red Hat Enterprise Linux 8.
6.3.2.1.3. Configuring CTDB on Red Hat Gluster Storage Server
- Create a replicate volume. This volume will host only a zero-byte lock file, so choose minimal-sized bricks. To create a replicate volume, run the following command:
# gluster volume create volname replica n ipaddress:/brick path.......N times
where N is the number of nodes that are used as Gluster NFS servers. Each node must host one brick.
For example:
# gluster volume create ctdb replica 3 10.16.157.75:/rhgs/brick1/ctdb/b1 10.16.157.78:/rhgs/brick1/ctdb/b2 10.16.157.81:/rhgs/brick1/ctdb/b3
- In the following files, replace "all" in the statement META="all" with the newly created volume name:
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
For example, change
META="all"
to
META="ctdb"
- Start the volume.
# gluster volume start ctdb
As part of the start process, the S29CTDBsetup.sh script runs on all Red Hat Gluster Storage servers, adds an entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes with Gluster NFS server. It also enables automatic start of the CTDB service on reboot.
Note
When you stop the special CTDB volume, the S29CTDB-teardown.sh script runs on all Red Hat Gluster Storage servers, removes the entry in /etc/fstab for the mount, and unmounts the volume at /gluster/lock.
- Verify that the file /etc/sysconfig/ctdb exists on all the nodes that are used as Gluster NFS servers. This file contains Red Hat Gluster Storage recommended CTDB configurations.
- Create the /etc/ctdb/nodes file on all the nodes that are used as Gluster NFS servers and add the IPs of these nodes to the file.
10.16.157.0
10.16.157.3
10.16.157.6
The IPs listed here are the private IPs of NFS servers.
- On all the nodes that are used as Gluster NFS servers and require IP failover, create the /etc/ctdb/public_addresses file and add the virtual IPs that CTDB should create to this file. Add these IP addresses in the following format:
<Virtual IP>/<routing prefix> <node interface>
For example:
192.168.1.20/24 eth0
192.168.1.21/24 eth0
- Start the CTDB service on all the nodes by executing the following command:
# service ctdb start
Note
6.3.2.2. Using Gluster NFS to Mount Red Hat Gluster Storage Volumes (Deprecated)
Note
nfsmount.conf
file at /etc/nfsmount.conf
by adding the following text in the file:
Defaultvers=3
vers=3
manually in all the mount commands.
# mount nfsserver:export -o vers=3 /MOUNTPOINT
tcp,rdma
volume it could be changed using the volume set option nfs.transport-type
.
6.3.2.2.1. Manually Mounting Volumes Using Gluster NFS (Deprecated)
mount
command to manually mount a Red Hat Gluster Storage volume using Gluster NFS.
- If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.
# mkdir /mnt/glusterfs
- Run the correct mount command for the system.
- For Linux
# mount -t nfs -o vers=3 server1:/test-volume /mnt/glusterfs
- For Solaris
# mount -o vers=3 nfs://server1:38467/test-volume /mnt/glusterfs
mount
command to manually mount a Red Hat Gluster Storage volume using Gluster NFS over TCP.
Note
requested NFS version or transport protocol is not supported
nfs.mount-udp
is supported for mounting a volume, by default it is disabled. The following are the limitations:
- If
nfs.mount-udp
is enabled, the MOUNT protocol needed for NFSv3 can handle requests from NFS-clients that require MOUNT over UDP. This is useful for at least some versions of Solaris, IBM AIX and HP-UX. - Currently, MOUNT over UDP does not have support for mounting subdirectories on a volume. Mounting
server:/volume/subdir
exports is only functional when MOUNT over TCP is used. - MOUNT over UDP does not currently have support for different authentication options that MOUNT over TCP honors. Enabling
nfs.mount-udp
may give more permissions to NFS clients than intended via various authentication options likenfs.rpc-auth-allow
,nfs.rpc-auth-reject
andnfs.export-dir
.
- If a mount point has not yet been created for the volume, run the mkdir command to create a mount point.
# mkdir /mnt/glusterfs
- Run the correct mount command for the system, specifying the TCP protocol option for the system.
- For Linux
# mount -t nfs -o vers=3,mountproto=tcp server1:/test-volume /mnt/glusterfs
- For Solaris
# mount -o proto=tcp nfs://server1:38467/test-volume /mnt/glusterfs
6.3.2.2.2. Automatically Mounting Volumes Using Gluster NFS (Deprecated)
Note
/etc/auto.master
and /etc/auto.misc
files, and restart the autofs
service. Whenever a user or process attempts to access the directory it will be mounted in the background on-demand.
- Open the /etc/fstab file in a text editor.
- Append the following configuration to the fstab file.
HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR nfs defaults,_netdev 0 0
Using the example server names, the entry contains the following replaced values.
server1:/test-volume /mnt/glusterfs nfs defaults,_netdev 0 0
- Open the /etc/fstab file in a text editor.
- Append the following configuration to the fstab file.
HOSTNAME|IPADDRESS:/VOLNAME /MOUNTDIR nfs defaults,_netdev,mountproto=tcp 0 0
Using the example server names, the entry contains the following replaced values.
server1:/test-volume /mnt/glusterfs nfs defaults,_netdev,mountproto=tcp 0 0
6.3.2.2.3. Automatically Mounting Subdirectories Using NFS (Deprecated)
The nfs.export-dir and nfs.export-dirs options provide granular control to restrict or allow specific clients to mount a sub-directory. These clients can be authenticated during sub-directory mount with either an IP address, a host name, or a Classless Inter-Domain Routing (CIDR) range.
- nfs.export-dirs
- This option is enabled by default. It allows the sub-directories of exported volumes to be mounted by clients without needing to export individual sub-directories. When enabled, all sub-directories of all volumes are exported. When disabled, sub-directories must be exported individually in order to mount them on clients.
To disable this option for all volumes, run the following command:
# gluster volume set VOLNAME nfs.export-dirs off
- nfs.export-dir
- When nfs.export-dirs is set to on, the nfs.export-dir option allows you to specify one or more sub-directories to export, rather than exporting all subdirectories (nfs.export-dirs on), or only exporting individually exported subdirectories (nfs.export-dirs off).
To export certain subdirectories, run the following command:
# gluster volume set VOLNAME nfs.export-dir subdirectory
The subdirectory path should be the path from the root of the volume. For example, in a volume with six subdirectories, to export the first three subdirectories, the command would be the following:
# gluster volume set myvolume nfs.export-dir /dir1,/dir2,/dir3
Subdirectories can also be exported based on the IP address, hostname, or a Classless Inter-Domain Routing (CIDR) range by adding these details in parentheses after the directory path:
# gluster volume set VOLNAME nfs.export-dir subdirectory(IPADDRESS),subdirectory(HOSTNAME),subdirectory(CIDR)
# gluster volume set myvolume nfs.export-dir /dir1(192.168.10.101),/dir2(storage.example.com),/dir3(192.168.98.0/24)
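The nfs.export-dir value is a comma-separated list of directories, each optionally followed by a client restriction in parentheses. The sketch below composes such a value; the function names `export_dir_entry` and `export_dir_value` are hypothetical and are not gluster commands.

```shell
# Hypothetical helpers for composing an nfs.export-dir value.

# Builds one entry; CLIENT may be an IP address, host name, or CIDR range.
# Usage: export_dir_entry /DIRECTORY [CLIENT]
export_dir_entry() {
    if [ -n "${2:-}" ]; then
        echo "$1($2)"
    else
        echo "$1"
    fi
}

# Joins entries with commas.
# Usage: export_dir_value ENTRY...
export_dir_value() (
    IFS=,
    echo "$*"
)
```

When the value is typed or substituted into an interactive shell command, it should be enclosed in double quotes, for example `gluster volume set myvolume nfs.export-dir "$value"`, because an unquoted parenthesis is treated as shell syntax.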
6.3.2.2.4. Testing Volumes Mounted Using Gluster NFS (Deprecated)
Testing Mounted Red Hat Gluster Storage Volumes
Prerequisites
- Run the mount command to check whether the volume was successfully mounted.
# mount
server1:/test-volume on /mnt/glusterfs type nfs (rw,addr=server1)
- Run the df command to display the aggregated storage space from all the bricks in a volume.
# df -h /mnt/glusterfs
Filesystem Size Used Avail Use% Mounted on
server1:/test-volume 28T 22T 5.4T 82% /mnt/glusterfs
- Move to the mount directory using the cd command, and list the contents.
# cd /mnt/glusterfs
# ls
Note
The LOCK functionality in the NFS protocol is advisory. It is recommended to use locks if the same volume is accessed by multiple clients.
6.3.2.3. Troubleshooting Gluster NFS (Deprecated)
- Q: The mount command on the NFS client fails with RPC Error: Program not registered. This error is encountered due to one of the following reasons:
- Q: The rpcbind service is not running on the NFS client. This could be due to the following reasons:
- Q: The NFS server glusterfsd starts but the initialization fails with nfsrpc- service: portmap registration of program failed error message in the log.
- Q: The NFS server start-up fails with the message Port is already in use in the log file.
- Q: The mount command fails with NFS server failed error:
- Q: The showmount command fails with clnt_create: RPC: Unable to receive error. This error is encountered due to the following reasons:
- Q: The application fails with Invalid argument or Value too large for defined data type
- Q: After the machine that is running NFS server is restarted the client fails to reclaim the locks held earlier.
- Q: The rpc actor failed to complete successfully error is displayed in the nfs.log, even after the volume is mounted successfully.
- Q: The mount command fails with No such file or directory.
RPC Error: Program not registered
. This error is encountered due to one of the following reasons:
- The NFS server is not running. You can check the status using the following command:
# gluster volume status
- The volume is not started. You can check the status using the following command:
# gluster volume info
- rpcbind is restarted. To check if rpcbind is running, execute the following command:
# ps ax| grep rpcbind
- If the NFS server is not running, then restart the NFS server using the following command:
# gluster volume start VOLNAME
- If the volume is not started, then start the volume using the following command:
# gluster volume start VOLNAME
- If both rpcbind and the NFS server are running, then restart the NFS server using the following commands:
# gluster volume stop VOLNAME
# gluster volume start VOLNAME
rpcbind
service is not running on the NFS client. This could be due to the following reasons:
- The portmap is not running.
- Another instance of kernel NFS server or glusterNFS server is running.
rpcbind
service by running the following command:
# service rpcbind start
[2010-05-26 23:33:47] E [rpcsvc.c:2598:rpcsvc_program_register_portmap] rpc-service: Could not register with portmap
[2010-05-26 23:33:47] E [rpcsvc.c:2682:rpcsvc_program_register] rpc-service: portmap registration of program failed
[2010-05-26 23:33:47] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:33:47] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
[2010-05-26 23:33:47] C [nfs.c:531:notify] nfs: Failed to initialize protocols
[2010-05-26 23:33:49] E [rpcsvc.c:2614:rpcsvc_program_unregister_portmap] rpc-service: Could not unregister with portmap
[2010-05-26 23:33:49] E [rpcsvc.c:2731:rpcsvc_program_unregister] rpc-service: portmap unregistration of program failed
[2010-05-26 23:33:49] E [rpcsvc.c:2744:rpcsvc_program_unregister] rpc-service: Program unregistration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
- Start the rpcbind service on the NFS server by running the following command:
# service rpcbind start
After starting the rpcbind service, the glusterFS NFS server needs to be restarted.
- Stop another NFS server running on the same machine.
Such an error is also seen when there is another NFS server running on the same machine but it is not the glusterFS NFS server. On Linux systems, this could be the kernel NFS server. Resolution involves stopping the other NFS server or not running the glusterFS NFS server on the machine. Before stopping the kernel NFS server, ensure that no critical service depends on access to that NFS server's exports.
On Linux, kernel NFS servers can be stopped by using either of the following commands, depending on the distribution in use:
# service nfs-kernel-server stop
# service nfs stop
- Restart glusterFS NFS server.
[2010-05-26 23:40:49] E [rpc-socket.c:126:rpcsvc_socket_listen] rpc-socket: binding socket failed: Address already in use
[2010-05-26 23:40:49] E [rpc-socket.c:129:rpcsvc_socket_listen] rpc-socket: Port is already in use
[2010-05-26 23:40:49] E [rpcsvc.c:2636:rpcsvc_stage_program_register] rpc-service: could not create listening connection
[2010-05-26 23:40:49] E [rpcsvc.c:2675:rpcsvc_program_register] rpc-service: stage registration of program failed
[2010-05-26 23:40:49] E [rpcsvc.c:2695:rpcsvc_program_register] rpc-service: Program registration failed: MOUNT3, Num: 100005, Ver: 3, Port: 38465
[2010-05-26 23:40:49] E [nfs.c:125:nfs_init_versions] nfs: Program init failed
[2010-05-26 23:40:49] C [nfs.c:531:notify] nfs: Failed to initialize protocols
mount
command fails with NFS server failed error:
mount: mount to NFS server '10.1.10.11' failed: timed out (retrying).
- Disable name lookup requests from the NFS server to a DNS server.
The NFS server attempts to authenticate NFS clients by performing a reverse DNS lookup to match host names in the volume file with the client IP addresses. There can be a situation where the NFS server either is not able to connect to the DNS server or the DNS server is taking too long to respond to DNS requests. These delays can result in delayed replies from the NFS server to the NFS client, resulting in the timeout error.
The NFS server provides a workaround that disables DNS requests, instead relying only on the client IP addresses for authentication. The following option can be added for successful mounting in such situations:
option nfs.addr.namelookup off
Note
Remember that disabling name lookup forces authentication of clients to use only IP addresses. If the authentication rules in the volume file use host names, those authentication rules will fail and client mounting will fail.
- NFS version used by the NFS client is other than version 3 by default.
The glusterFS NFS server supports version 3 of the NFS protocol by default. In recent Linux kernels, the default NFS version has been changed from 3 to 4. It is possible that the client machine is unable to connect to the glusterFS NFS server because it is using version 4 messages, which are not understood by the glusterFS NFS server. The timeout can be resolved by forcing the NFS client to use version 3. The vers option of the mount command is used for this purpose:
# mount nfsserver:export -o vers=3 /MOUNTPOINT
- The firewall might have blocked the port.
- rpcbind might not be running.
NFS.enable-ino32 <on | off>
off
by default, which permits NFS to return 64-bit inode numbers by default.
- built and run on 32-bit machines, which do not support large files by default,
- built to 32-bit standards on 64-bit systems.
-D_FILE_OFFSET_BITS=64
chkconfig --list nfslock
to check if NSM is configured during OS boot.
on,
run chkconfig nfslock off
to disable NSM clients during boot, which resolves the issue.
rpc actor failed to complete successfully
error is displayed in the nfs.log, even after the volume is mounted successfully.
nfs.log
file.
[2013-06-25 00:03:38.160547] W [rpcsvc.c:180:rpcsvc_program_actor] 0-rpc-service: RPC program version not available (req 100003 4) [2013-06-25 00:03:38.160669] E [rpcsvc.c:448:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
noacl
option in the mount command as follows:
# mount -t nfs -o vers=3,noacl server1:/test-volume /mnt/glusterfs
No such file or directory
.
6.3.3. NFS Ganesha
Note
6.3.3.1. Supported Features of NFS-Ganesha
In a highly available active-active environment, if an NFS-Ganesha server that is connected to an NFS client running a particular application goes down, the application/NFS client is seamlessly connected to another NFS-Ganesha server without any administrative intervention.
NFS-Ganesha supports addition and removal of exports dynamically. Dynamic exports are managed by the DBus interface. DBus is a system local IPC mechanism for system management and peer-to-peer application communication.
In NFS-Ganesha, multiple Red Hat Gluster Storage volumes or sub-directories can be exported simultaneously.
NFS-Ganesha creates and maintains an NFSv4 pseudo-file system, which provides clients with seamless access to all exported objects on the server.
The NFS-Ganesha NFSv4 protocol includes integrated support for Access Control Lists (ACLs), which are similar to those used by Windows. These ACLs can be used to identify a trustee and specify the access rights allowed or denied for that trustee. This feature is disabled by default.
Note
6.3.3.2. Setting up NFS Ganesha
Note
6.3.3.2.1. Port and Firewall Information for NFS-Ganesha
Service | Port Number | Protocol |
sshd | 22 | TCP |
rpcbind/portmapper | 111 | TCP/UDP |
NFS | 2049 | TCP/UDP |
mountd | 20048 | TCP/UDP |
NLM | 32803 | TCP/UDP |
RQuota | 875 | TCP/UDP |
statd | 662 | TCP/UDP |
pcsd | 2224 | TCP |
pacemaker_remote | 3121 | TCP |
corosync | 5404 and 5405 | UDP |
dlm | 21064 | TCP |
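The rows of the table above map directly onto `--add-port=PORT/PROTOCOL` arguments for firewall-cmd. The following is a minimal sketch of that mapping, not part of the product tooling: the `port_args` function name and its `PORT:PROTO[,PROTO]` input format are assumptions made for illustration. The function only prints arguments and never calls firewall-cmd.

```shell
# Hypothetical helper: expand "PORT:PROTO[,PROTO]" entries into
# firewall-cmd --add-port arguments. It only prints the arguments.
port_args() {
    args=""
    for entry in "$@"; do
        port=${entry%%:*}      # text before the first colon
        protos=${entry#*:}     # text after the first colon
        # Turn the comma-separated protocol list into separate words.
        for proto in $(printf '%s' "$protos" | tr ',' ' '); do
            args="$args --add-port=$port/$proto"
        done
    done
    # Drop the leading space added by the first append.
    printf '%s\n' "${args# }"
}

# statd and rpcbind rows from the table.
port_args 662:tcp,udp 111:tcp,udp
```

The printed arguments can be reviewed first and then passed to firewall-cmd together with the `--zone` option for the active zone.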
Note
Ensure the statd service is configured to use the ports mentioned above by executing the following commands on every node in the nfs-ganesha cluster:
- On Red Hat Enterprise Linux 7, edit /etc/sysconfig/nfs file as mentioned below:
# sed -i '/STATD_PORT/s/^#//' /etc/sysconfig/nfs
Note
This step is not applicable for Red Hat Enterprise Linux 8.
- Restart the statd service:
For Red Hat Enterprise Linux 7:
# systemctl restart nfs-config
# systemctl restart rpc-statd
Note
This step is not applicable for Red Hat Enterprise Linux 8.
Note
- Edit /etc/sysconfig/nfs using the following commands:
# sed -i '/STATD_PORT/s/^#//' /etc/sysconfig/nfs
# sed -i '/LOCKD_TCPPORT/s/^#//' /etc/sysconfig/nfs
# sed -i '/LOCKD_UDPPORT/s/^#//' /etc/sysconfig/nfs
- Restart the services:
For Red Hat Enterprise Linux 7:
# systemctl restart nfs-config
# systemctl restart rpc-statd
# systemctl restart nfslock
- Open the ports that are configured in the first step using the following command:
# firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
--add-port=32803/tcp --add-port=32769/udp \
--add-port=111/tcp --add-port=111/udp
# firewall-cmd --zone=public --add-port=662/tcp --add-port=662/udp \
--add-port=32803/tcp --add-port=32769/udp \
--add-port=111/tcp --add-port=111/udp --permanent
- To ensure that NFS client UDP mounts do not fail, open port 2049 by executing the following commands:
# firewall-cmd --zone=zone_name --add-port=2049/udp
# firewall-cmd --zone=zone_name --add-port=2049/udp --permanent
- Firewall Settings
On Red Hat Enterprise Linux 7, enable the firewall services mentioned below.
- Get a list of active zones using the following command:
# firewall-cmd --get-active-zones
- To allow the firewall services in the active zones, run the following commands:
# firewall-cmd --zone=zone_name --add-service=nlm --add-service=nfs --add-service=rpc-bind --add-service=high-availability --add-service=mountd --add-service=rquota
# firewall-cmd --zone=zone_name --add-service=nlm --add-service=nfs --add-service=rpc-bind --add-service=high-availability --add-service=mountd --add-service=rquota --permanent
# firewall-cmd --zone=zone_name --add-port=662/tcp --add-port=662/udp
# firewall-cmd --zone=zone_name --add-port=662/tcp --add-port=662/udp --permanent
6.3.3.2.2. Prerequisites to run NFS-Ganesha
- A Red Hat Gluster Storage volume must be available for export and the NFS-Ganesha rpms must be installed.
- Ensure that the fencing agents are configured. For more information on configuring fencing agents, refer to the following documentation:
- Fencing Configuration section in the High Availability Add-On Administration guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_administration/s1-fenceconfig-haaa
- Fence Devices section in the High Availability Add-On Reference guide: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-guiclustcomponents-haar#s2-guifencedevices-HAAR
Note
The required minimum number of nodes for a highly available installation/configuration of NFS-Ganesha is 3, and the maximum number of supported nodes is 8.
- Only one of NFS-Ganesha, gluster-NFS or kernel-NFS servers can be enabled on a given machine/host, as all NFS implementations use port 2049 and only one can be active at a given time. Hence, you must disable kernel-NFS before NFS-Ganesha is started.
Disable kernel-NFS using the following commands:
For Red Hat Enterprise Linux 7:
# systemctl stop nfs-server
# systemctl disable nfs-server
To verify that kernel-NFS is disabled, execute the following command:
# systemctl status nfs-server
The service should be in the stopped state.
Note
Gluster NFS will be stopped automatically when NFS-Ganesha is enabled.
Ensure that none of the volumes have the variable nfs.disable set to 'off'.
- Ensure that the ports are configured as mentioned in Port/Firewall Information for NFS-Ganesha.
- Edit the ganesha-ha.conf file based on your environment.
- Reserve virtual IPs on the network for each of the servers configured in the ganesha.conf file. Ensure that these IPs are different from the hosts' static IPs and are not used anywhere else in the trusted storage pool or in the subnet.
- Ensure that all the nodes in the cluster are DNS resolvable. For example, you can populate the /etc/hosts with the details of all the nodes in the cluster.
- Make sure SELinux is in Enforcing mode.
- Start the network service on all machines using the following command:
For Red Hat Enterprise Linux 7:
# systemctl start network
- Create and mount a gluster shared volume by executing the following command:
# gluster volume set all cluster.enable-shared-storage enable
volume set: success
For more information, see Section 11.12, “Setting up Shared Storage Volume”.
- Create a directory named nfs-ganesha under /var/run/gluster/shared_storage.
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
- Copy the ganesha.conf and ganesha-ha.conf files from /etc/ganesha to /var/run/gluster/shared_storage/nfs-ganesha.
- Enable the glusterfssharedstorage.service service using the following command:
# systemctl enable glusterfssharedstorage.service
- Enable the nfs-ganesha service using the following command:
# systemctl enable nfs-ganesha
6.3.3.2.3. Configuring the Cluster Services
Note
- Enable the pacemaker service using the following command:
For Red Hat Enterprise Linux 7:
# systemctl enable pacemaker.service
- Start the pcsd service using the following command:
For Red Hat Enterprise Linux 7:
# systemctl start pcsd
Note
- To start pcsd by default after the system is rebooted, execute the following command:
For Red Hat Enterprise Linux 7:
# systemctl enable pcsd
- Set a password for the user ‘hacluster’ on all the nodes using the following command. Use the same password for all the nodes:
# echo <password> | passwd --stdin hacluster
- Perform cluster authentication between the nodes, where the username is ‘hacluster’ and the password is the one you used in the previous step. Execute the following command on every node:
For Red Hat Enterprise Linux 7:
# pcs cluster auth <hostname1> <hostname2> ...
For Red Hat Enterprise Linux 8:
# pcs host auth <hostname1> <hostname2> ...
Note
The hostnames of all the nodes in the Ganesha-HA cluster must be included in the command when executing it on every node.
For example, in a four-node cluster with nodes nfs1, nfs2, nfs3, and nfs4, execute the following command on every node:
For Red Hat Enterprise Linux 7:
# pcs cluster auth nfs1 nfs2 nfs3 nfs4
Username: hacluster
Password:
nfs1: Authorized
nfs2: Authorized
nfs3: Authorized
nfs4: Authorized
For Red Hat Enterprise Linux 8:
# pcs host auth nfs1 nfs2 nfs3 nfs4
Username: hacluster
Password:
nfs1: Authorized
nfs2: Authorized
nfs3: Authorized
nfs4: Authorized
- Key-based SSH authentication without password for the root user has to be enabled on all the HA nodes. Follow these steps:
- On one of the nodes (node1) in the cluster, run:
# ssh-keygen -f /var/lib/glusterd/nfs/secret.pem -t rsa -N ''
- Deploy the generated public key from node1 to all the nodes (including node1) by executing the following command for every node:
# ssh-copy-id -i /var/lib/glusterd/nfs/secret.pem.pub root@<node-ip/hostname>
- Copy the ssh keypair from node1 to all the nodes in the Ganesha-HA cluster by executing the following command for every node:
# scp -i /var/lib/glusterd/nfs/secret.pem /var/lib/glusterd/nfs/secret.* root@<node-ip/hostname>:/var/lib/glusterd/nfs/
- As part of cluster setup, port 875 is used to bind to the Rquota service. If this port is already in use, assign a different port to this service by modifying the following line in the /etc/ganesha/ganesha.conf file on all the nodes:
# Use a non-privileged port for RQuota
Rquota_Port = 875;
6.3.3.2.4. Creating the ganesha-ha.conf file
- Create a directory named nfs-ganesha under /var/run/gluster/shared_storage
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
- Copy the ganesha.conf and ganesha-ha.conf files from /etc/ganesha to /var/run/gluster/shared_storage/nfs-ganesha.
# Name of the HA cluster created.
# must be unique within the subnet
HA_NAME="ganesha-ha-360"
#
#
# You may use short names or long names; you may not use IP addresses.
# Once you select one, stay with it as it will be mildly unpleasant to clean
# up if you switch later on. Ensure that all names - short and/or long - are in
# DNS or /etc/hosts on all machines in the cluster.
#
# The subset of nodes of the Gluster Trusted Pool that form the ganesha HA
# cluster. Hostname is specified.
HA_CLUSTER_NODES="server1.lab.redhat.com,server2.lab.redhat.com,..."
#
# Virtual IPs for each of the nodes specified above.
VIP_server1="10.0.2.1"
VIP_server2="10.0.2.2"
#VIP_server1_lab_redhat_com="10.0.2.1"
#VIP_server2_lab_redhat_com="10.0.2.2"
....
....
Note
- Pacemaker handles the creation of the VIP and assigning an interface.
- Ensure that the VIP is in the same network range.
- Ensure that the HA_CLUSTER_NODES are specified as hostnames. Using IP addresses will cause clustering to fail.
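Two of the rules above lend themselves to a quick pre-check before enabling the cluster: the VIP variable name for a long host name uses underscores in place of dots (as in the commented VIP_server1_lab_redhat_com entry), and HA_CLUSTER_NODES must not contain IP addresses. The following is a minimal sketch under those assumptions. The function names `vip_var_name` and `nodes_are_hostnames` are hypothetical and are not part of ganesha-ha.sh, and the address check only recognizes dotted IPv4 addresses.

```shell
# Hypothetical helper: derive the VIP_ variable name for a node name by
# replacing dots with underscores, matching the commented
# VIP_server1_lab_redhat_com form in the sample file.
vip_var_name() {
    printf 'VIP_%s\n' "$(printf '%s' "$1" | tr '.' '_')"
}

# Hypothetical check: succeed only if no entry of a comma-separated
# node list looks like a dotted IPv4 address.
nodes_are_hostnames() {
    for node in $(printf '%s' "$1" | tr ',' ' '); do
        case $node in
            *[!0-9.]*) ;;       # has a character other than digits/dots: treat as a host name
            *) return 1 ;;      # only digits and dots: looks like an IPv4 address
        esac
    done
    return 0
}

vip_var_name server1.lab.redhat.com
nodes_are_hostnames "server1.lab.redhat.com,server2.lab.redhat.com" && echo "node list OK"
```

A check like this catches only the IP-address mistake; it does not verify that the names resolve through DNS or /etc/hosts.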
6.3.3.2.5. Configuring NFS-Ganesha using Gluster CLI
To set up the HA cluster, enable NFS-Ganesha by executing the following command:
# gluster nfs-ganesha enable
Note
Before enabling or disabling NFS-Ganesha, ensure that all the nodes that are part of the NFS-Ganesha cluster are up.
For example,
# gluster nfs-ganesha enable
Enabling NFS-Ganesha requires Gluster-NFS to be disabled across the trusted pool. Do you still want to continue? (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success
Note
After enabling NFS-Ganesha, if rpcinfo -p shows a statd port other than 662, restart the statd service:
For Red Hat Enterprise Linux 7:
# systemctl restart rpc-statd
Tearing down the HA cluster
To tear down the HA cluster, execute the following command:
# gluster nfs-ganesha disable
For example,
# gluster nfs-ganesha disable
Disabling NFS-Ganesha will tear down entire ganesha cluster across the trusted pool. Do you still want to continue? (y/n) y
This will take a few minutes to complete. Please wait ..
nfs-ganesha : success
Verifying the status of the HA cluster
To verify the status of the HA cluster, execute the following script:
# /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --status /var/run/gluster/shared_storage/nfs-ganesha
Online: [ server1 server2 server3 server4 ]
server1-cluster_ip-1 server1
server2-cluster_ip-1 server2
server3-cluster_ip-1 server3
server4-cluster_ip-1 server4
Cluster HA Status: HEALTHY
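When the status check is scripted, the "Cluster HA Status:" field of the output is the part worth extracting. The following is a minimal sketch that reads status output on standard input. The `ha_status` function name is hypothetical, and HEALTHY is the only status value shown in this guide, so any other value is simply printed as found.

```shell
# Hypothetical helper: print the value that follows "Cluster HA Status:"
# in ganesha-ha.sh --status output supplied on standard input.
ha_status() {
    sed -n 's/.*Cluster HA Status: *\([A-Za-z_]*\).*/\1/p'
}

# Sample input modelled on the example output in this section.
ha_status <<'EOF'
Online: [ server1 server2 ]
server1-cluster_ip-1 server1
server2-cluster_ip-1 server2
Cluster HA Status: HEALTHY
EOF
```

In practice, the output of the --status command shown earlier in this section would be piped into the function instead of the sample text.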
Note
- It is recommended to manually restart the ganesha.nfsd service after the node is rebooted, to fail back the VIPs.
- Disabling NFS-Ganesha does not enable Gluster NFS by default. If required, Gluster NFS must be enabled manually.
Note
- NFS-Ganesha fails to start.
- NFS-Ganesha port 875 is unavailable.
- The ganesha.conf file is available at /etc/ganesha/ganesha.conf.
- Uncomment the line #Enable_RQUOTA = false; to disable RQUOTA.
- Restart the nfs-ganesha service on all nodes.
# systemctl restart nfs-ganesha
6.3.3.2.6. Exporting and Unexporting Volumes through NFS-Ganesha
Note
To export a Red Hat Gluster Storage volume, execute the following command:
# gluster volume set <volname> ganesha.enable on
# gluster vol set testvol ganesha.enable on
volume set: success
To unexport a Red Hat Gluster Storage volume, execute the following command:
# gluster volume set <volname> ganesha.enable off
# gluster vol set testvol ganesha.enable off
volume set: success
6.3.3.2.7. Verifying the NFS-Ganesha Status
- Check if NFS-Ganesha is started by executing the following command:
On Red Hat Enterprise Linux 7:
# systemctl status nfs-ganesha
For example:
# systemctl status nfs-ganesha
nfs-ganesha.service - NFS-Ganesha file server
   Loaded: loaded (/usr/lib/systemd/system/nfs-ganesha.service; disabled)
   Active: active (running) since Tue 2015-07-21 05:08:22 IST; 19h ago
     Docs: http://github.com/nfs-ganesha/nfs-ganesha/wiki
 Main PID: 15440 (ganesha.nfsd)
   CGroup: /system.slice/nfs-ganesha.service
           └─15440 /usr/bin/ganesha.nfsd -L /var/log/ganesha/ganesha.log -f /etc/ganesha/ganesha.conf -N NIV_EVENT
Jul 21 05:08:22 server1 systemd[1]: Started NFS-Ganesha file server.
- Check if the volume is exported.
# showmount -e localhost
For example:
# showmount -e localhost
Export list for localhost:
/volname (everyone)
- The logs of ganesha.nfsd daemon are written to /var/log/ganesha/ganesha.log. Check the log file on noticing any unexpected behavior.
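The export check above can be scripted by looking for the volume's path in the showmount output. The following is a minimal sketch that reads that output on standard input. The `is_exported` function name is hypothetical, and the sketch assumes the export path is "/" followed by the volume name, as in the /volname example; an export with a different path would need a different match.

```shell
# Hypothetical helper: succeed if "/VOLNAME" appears as an export path
# in `showmount -e` output supplied on standard input.
is_exported() {
    # Skip the "Export list for ..." header line, then compare the first
    # field of each remaining line with the expected path.
    awk -v path="/$1" 'NR > 1 && $1 == path { found = 1 } END { exit !found }'
}

# Sample input modelled on the example output in this section.
sample='Export list for localhost:
/testvol (everyone)'
printf '%s\n' "$sample" | is_exported testvol && echo "testvol is exported"
```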
6.3.3.3. Accessing NFS-Ganesha Exports
- Execute the following commands to set the tunable:
# sysctl -w sunrpc.tcp_slot_table_entries=128
# echo 128 > /proc/sys/sunrpc/tcp_slot_table_entries
# echo 128 > /proc/sys/sunrpc/tcp_max_slot_table_entries
- To make the tunable persistent on reboot, execute the following commands:
# echo "options sunrpc tcp_slot_table_entries=128" >> /etc/modprobe.d/sunrpc.conf
# echo "options sunrpc tcp_max_slot_table_entries=128" >> /etc/modprobe.d/sunrpc.conf
Note
6.3.3.3.1. Mounting exports in NFSv3 Mode
# mount -t nfs -o vers=3 virtual_ip:/volname /mountpoint
# mount -t nfs -o vers=3 10.70.0.0:/testvol /mnt
6.3.3.3.2. Mounting exports in NFSv4 Mode
# mount -t nfs -o vers=4 virtual_ip:/volname /mountpoint
# mount -t nfs -o vers=4 10.70.0.0:/testvol /mnt
Important
# mount -t nfs -o vers=4.0 or 4.1 virtual_ip:/volname /mountpoint
# mount -t nfs -o vers=4.1 10.70.0.0:/testvol /mnt
6.3.3.3.3. Finding clients of an NFS server using dbus
# dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ClientMgr org.ganesha.nfsd.clientmgr.ShowClients
Note
6.3.3.3.4. Finding authorized client list and other information from an NFS server using dbus
# dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.DisplayExport uint16:Export_Id
uint16 export_id
string fullpath
string pseudopath
string tag
array[
  struct {
    string client_type
    int32 CIDR_version
    byte CIDR_address
    byte CIDR_mask
    int32 CIDR_proto
    uint32 anonymous_uid
    uint32 anonymous_gid
    uint32 expire_time_attr
    uint32 options
    uint32 set
  }
  struct {
    . . .
  }
  . . .
]
Here, client_type is the client’s IP address; CIDR_version, CIDR_address, CIDR_mask and CIDR_proto are the CIDR representation details of the client; and uint32 anonymous_uid, uint32 anonymous_gid, uint32 expire_time_attr, uint32 options and uint32 set are the client permissions.
# dbus-send --type=method_call --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.DisplayExport uint16:2
method return time=1559209192.642525 sender=:1.5491 -> destination=:1.5510 serial=370 reply_serial=2
   uint16 2
   string "/mani1"
   string "/mani1"
   string ""
   array [
      struct {
         string "10.70.46.107/32"
         int32 0
         byte 0
         byte 255
         int32 1
         uint32 1440
         uint32 72
         uint32 0
         uint32 52441250
         uint32 7340536
      }
      struct {
         string "10.70.47.152/32"
         int32 0
         byte 0
         byte 255
         int32 1
         uint32 1440
         uint32 72
         uint32 0
         uint32 51392994
         uint32 7340536
      }
   ]
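To list only the authorized clients from a DisplayExport reply, the client_type strings can be filtered out of the output. The following is a minimal sketch that reads the reply on standard input. The `export_clients` function name is hypothetical, and the pattern assumes IPv4 clients printed in address/prefix form as in the example above; it relies on `grep -o`, which is available in GNU and BSD grep.

```shell
# Hypothetical helper: print the client address strings found in a
# DisplayExport reply supplied on standard input.
export_clients() {
    # Keep only quoted strings that start with a digit and contain a
    # "/prefix" part, which skips the path and tag strings.
    grep -o 'string "[0-9][0-9.]*/[0-9]*"' | cut -d '"' -f 2
}

# Sample input taken from the reply shown in this section.
printf '%s\n' \
    'string "/mani1"' \
    'string ""' \
    'string "10.70.46.107/32"' \
    'string "10.70.47.152/32"' | export_clients
```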
6.3.3.4. Modifying the NFS-Ganesha HA Setup
6.3.3.4.1. Adding a Node to the Cluster
Note
/var/lib/glusterd/nfs/secret.pem
SSH key are already generated, those steps should not be repeated.
# /usr/libexec/ganesha/ganesha-ha.sh --add <HA_CONF_DIR> <HOSTNAME> <NODE-VIP>
/run/gluster/shared_storage/nfs-ganesha.
# /usr/libexec/ganesha/ganesha-ha.sh --add /var/run/gluster/shared_storage/nfs-ganesha server16 10.00.00.01
Note
6.3.3.4.2. Deleting a Node in the Cluster
# /usr/libexec/ganesha/ganesha-ha.sh --delete <HA_CONF_DIR> <HOSTNAME>
/run/gluster/shared_storage/nfs-ganesha
.
# /usr/libexec/ganesha/ganesha-ha.sh --delete /var/run/gluster/shared_storage/nfs-ganesha server16
Note
6.3.3.4.3. Replacing a Node in the Cluster
- Delete the node from the cluster. Refer to Section 6.3.3.4.2, “Deleting a Node in the Cluster”.
- Create a node with the same hostname. Refer to Section 11.10.2, “Replacing a Host Machine with the Same Hostname”.
Note
It is not required for the new node to have the same name as that of the old node.
- Add the node to the cluster. Refer to Section 6.3.3.4.1, “Adding a Node to the Cluster”.
Note
Ensure that firewall services are enabled as mentioned in Section 6.3.3.2.1, “Port and Firewall Information for NFS-Ganesha” and that the prerequisites in Section 6.3.3.2.2, “Prerequisites to run NFS-Ganesha” are met.
6.3.3.5. Modifying the Default Export Configurations
ganesha-export-config 8
man page.
- Edit/add the required fields in the corresponding export file located at /run/gluster/shared_storage/nfs-ganesha/exports/.
- Execute the following command:
# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config <HA_CONF_DIR> <volname>
- HA_CONF_DIR: The directory path containing the ganesha-ha.conf file. By default it is located at /run/gluster/shared_storage/nfs-ganesha.
- volname: The name of the volume whose export configuration has to be changed.
# cat export.conf
EXPORT{
    Export_Id = 1 ;                 # Export ID unique to each export
    Path = "volume_path";           # Path of the volume to be exported. Eg: "/test_volume"
    FSAL {
        name = GLUSTER;
        hostname = "10.xx.xx.xx";   # IP of one of the nodes in the trusted pool
        volume = "volume_name";     # Volume name. Eg: "test_volume"
    }
    Access_type = RW;               # Access permissions
    Squash = No_root_squash;        # To enable/disable root squashing
    Disable_ACL = true;             # To enable/disable ACL
    Pseudo = "pseudo_path";         # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
    Protocols = "3", "4" ;          # NFS protocols supported
    Transports = "UDP", "TCP" ;     # Transport protocols supported
    SecType = "sys";                # Security flavors supported
}
export.conf
file to see the expected behavior.
- Providing Permissions for Specific Clients
- Enabling and Disabling NFSv4 ACLs
- Providing Pseudo Path for NFSv4 Mount
- Exporting Subdirectories
6.3.3.5.1. Providing Permissions for Specific Clients
EXPORT
block applies to any client that mounts the exported volume. To provide specific permissions to specific clients , introduce a client
block inside the EXPORT
block.
EXPORT
block.
client {
    clients = 10.00.00.01;  # IP of the client.
    access_type = "RO";     # Read-only permissions
    Protocols = "3";        # Allow only NFSv3 protocol.
    anonymous_uid = 1440;
    anonymous_gid = 72;
}
client
block.
6.3.3.5.2. Enabling and Disabling NFSv4 ACLs
Disable_ACL = false;
Note
6.3.3.5.3. Providing Pseudo Path for NFSv4 Mount
Pseudo = "pseudo_path"; # NFSv4 pseudo path for this export. Eg: "/test_volume_pseudo"
6.3.3.5.4. Exporting Subdirectories
- Create a separate export file for the sub-directory.
# cat export.ganesha-dir.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.
EXPORT{
    Export_Id = 3;
    Path = "/ganesha/dir";
    FSAL {
        name = GLUSTER;
        hostname="localhost";
        volume="ganesha";
        volpath="/dir";
    }
    Access_type = RW;
    Disable_ACL = true;
    Squash="No_root_squash";
    Pseudo="/ganesha/dir";
    Protocols = "3", "4";
    Transports = "UDP","TCP";
    SecType = "sys";
}
- Change the Export_Id to any unique unused ID. Edit the Path and Pseudo parameters and add the volpath entry to the export file.
- If a new export file is created for the sub-directory, you must add its entry in the ganesha.conf file.
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.<share-name>.conf"
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
For example:
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.ganesha.conf" --> Volume entry
%include "/var/run/gluster/shared_storage/nfs-ganesha/exports/export.ganesha-dir.conf" --> Subdir entry
- Execute the following script to export the sub-directory shares without disrupting existing clients connected to other shares :
# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config <HA_CONF_DIR> <share-name>
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /run/gluster/shared_storage/nfs-ganesha/ ganesha-dir
- Edit the volume export file with the subdir entry. For example:
# cat export.ganesha.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.
EXPORT{
    Export_Id = 4;
    Path = "/ganesha/dir1";
    FSAL {
        name = GLUSTER;
        hostname="localhost";
        volume="ganesha";
        volpath="/dir1";
    }
    Access_type = RW;
    Disable_ACL = true;
    Squash="No_root_squash";
    Pseudo="/ganesha/dir1";
    Protocols = "3", "4";
    Transports = "UDP","TCP";
    SecType = "sys";
}
- Change the Export_Id to any unique unused ID. Edit the Path and Pseudo parameters and add the volpath entry to the export file.
- Execute the following script to export the sub-directory shares without disrupting existing clients connected to other shares:
# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config <HA_CONF_DIR> <share-name>
For example:
# /usr/libexec/ganesha/ganesha-ha.sh --refresh-config /run/gluster/shared_storage/nfs-ganesha/ ganesha
Note
If the same export file contains multiple EXPORT{} entries, then a volume restart or nfs-ganesha service restart is required.
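Because a file with more than one EXPORT{} entry needs a restart rather than a refresh, it can help to count the entries in an export file before deciding which to do. The following is a minimal sketch that reads an export file on standard input. The `count_export_blocks` function name is hypothetical, and the sketch assumes each EXPORT block opens at the start of its own line, as in the examples above.

```shell
# Hypothetical helper: count the lines that open an EXPORT block in an
# export file supplied on standard input.
count_export_blocks() {
    # grep -c prints 0 and returns non-zero when nothing matches;
    # "|| true" keeps the function's exit status at 0 in that case.
    grep -c '^[[:space:]]*EXPORT[[:space:]]*{' || true
}

# Sample input with two entries in one file.
count_export_blocks <<'EOF'
EXPORT{
    Export_Id = 3;
}
EXPORT{
    Export_Id = 4;
}
EOF
```

A result greater than 1 indicates the case described in the note above.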
6.3.3.5.4.1. Enabling all_squash option
all_squash
, edit the following parameter:
Squash = all_squash ; # To enable/disable root squashing
6.3.3.5.5. Unexporting Subdirectories
- Note the export ID of the share that you want to unexport from the configuration file (/var/run/gluster/shared_storage/nfs-ganesha/exports/file-name.conf).
- Delete the configuration:
- Delete the configuration file (if there is a separate configuration file):
# rm -rf /var/run/gluster/shared_storage/nfs-ganesha/exports/file-name.conf
- Delete the entry of the conf file from /etc/ganesha/ganesha.conf. Remove the line:
%include "/var/run/gluster/shared_storage/nfs-ganesha/export/export.conf"
- Run the following command:
# dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport uint16:export_id
The export_id in the above command should be that of the export entry obtained in step 1.
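Step 1 and the final step can be tied together by reading the Export_Id from the export file before it is deleted and substituting it into the RemoveExport call. The following is a minimal sketch of that idea. The function names `export_id_of` and `remove_export_cmd` are hypothetical; the second function only prints the dbus-send command from this section so that it can be reviewed before it is run.

```shell
# Hypothetical helper: print the first Export_Id value found in an
# export file supplied on standard input.
export_id_of() {
    sed -n 's/.*Export_Id[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: print (not run) the RemoveExport call for an ID.
remove_export_cmd() {
    printf 'dbus-send --print-reply --system --dest=org.ganesha.nfsd /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.RemoveExport uint16:%s\n' "$1"
}

# Sample input modelled on the sub-directory export file shown earlier.
id=$(export_id_of <<'EOF'
EXPORT{
    Export_Id = 3;
    Path = "/ganesha/dir";
}
EOF
)
remove_export_cmd "$id"
```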
6.3.3.6. Configuring Kerberized NFS-Ganesha
Note
- Install the krb5-workstation and the ntpdate (RHEL 7) or the chrony (RHEL 8) packages on all the machines:
# yum install krb5-workstation
For Red Hat Enterprise Linux 7:
# yum install ntpdate
For Red Hat Enterprise Linux 8:
# dnf install chrony
Note
- The krb5-libs package will be updated as a dependent package.
- For RHEL 7, configure ntpdate based on the valid time server according to the environment:
# echo <valid_time_server> >> /etc/ntp/step-tickers
# systemctl enable ntpdate
# systemctl start ntpdate
For RHEL 8, configure chrony based on the valid time server according to the environment:
# vi /etc/chrony.conf
# systemctl enable chronyd
# systemctl start chronyd
For both RHEL 7 and RHEL 8, perform the following steps:
- Ensure that all systems can resolve each other by FQDN in DNS.
- Configure the /etc/krb5.conf file and add relevant changes accordingly. For example:
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log
[libdefaults]
 dns_lookup_realm = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 default_realm = EXAMPLE.COM
 default_ccache_name = KEYRING:persistent:%{uid}
[realms]
 EXAMPLE.COM = {
  kdc = kerberos.example.com
  admin_server = kerberos.example.com
 }
[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM
Note
For further details regarding the file configuration, refer to man krb5.conf.
- On the NFS server and client, update the /etc/idmapd.conf file by making the required change. For example:
Domain = example.com
6.3.3.6.1. Setting up the NFS-Ganesha Server
Note
- Install the following packages:
# yum install nfs-utils
# yum install rpcbind
- Install the relevant gluster and NFS-Ganesha rpms. For more information see, Red Hat Gluster Storage 3.5 Installation Guide.
- Create a Kerberos principal and add it to krb5.keytab on the NFS-Ganesha server:
# kadmin
kadmin: addprinc -randkey nfs/<host_name>@EXAMPLE.COM
kadmin: ktadd nfs/<host_name>@EXAMPLE.COM
For example:
# kadmin
Authenticating as principal root/admin@EXAMPLE.COM with password.
Password for root/admin@EXAMPLE.COM:
kadmin: addprinc -randkey nfs/<host_name>@EXAMPLE.COM
WARNING: no policy specified for nfs/<host_name>@EXAMPLE.COM; defaulting to no policy
Principal "nfs/<host_name>@EXAMPLE.COM" created.
kadmin: ktadd nfs/<host_name>@EXAMPLE.COM
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type des3-cbc-sha1 added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type arcfour-hmac added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia256-cts-cmac added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia128-cts-cmac added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-hmac-sha1 added to keytab FILE:/etc/krb5.keytab.
Entry for principal nfs/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-cbc-md5 added to keytab FILE:/etc/krb5.keytab.
- Update
/etc/ganesha/ganesha.conf
file as mentioned below:NFS_KRB5 { PrincipalName = nfs ; KeytabPath = /etc/krb5.keytab ; Active_krb5 = true ; }
- Based on the different kerberos security flavours (krb5, krb5i and krb5p) supported by nfs-ganesha, configure the 'SecType' parameter in the volume export file (/var/run/gluster/shared_storage/nfs-ganesha/exports) with appropriate security flavour.
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
- Create an unprivileged user and ensure that the users that are created are resolvable to the UIDs through the central user database. For example:
# useradd guest
Note
The username of this user has to be the same as the one on the NFS-client.
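Before moving on to the client, it can help to confirm that the service principal actually landed in the keytab. The following is a minimal sketch, not part of the documented procedure: `keytab_has_principal` is a hypothetical helper name, and the host name in the usage comment is a placeholder. It assumes the usual `klist -k` listing, in which each entry row carries the KVNO in the first column and the principal in the second.

```shell
# Hypothetical helper: succeed if `klist -k` output (read from stdin)
# lists the given principal.
keytab_has_principal() {
    # Entry rows look like "   2 nfs/host@REALM"; the principal is field 2.
    awk -v want="$1" '$2 == want { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example on a configured server (not run here; host name is a placeholder):
#   klist -k /etc/krb5.keytab | keytab_has_principal "nfs/server1.example.com@EXAMPLE.COM" \
#       && echo "principal present"
```

The same helper can be reused on the NFS client with the `host/<host_name>@EXAMPLE.COM` principal.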
6.3.3.6.2. Setting up the NFS Client
Note
- Install the following packages:
# yum install nfs-utils
# yum install rpcbind
- Create a Kerberos principal and add it to krb5.keytab on the client side. For example:
# kadmin
kadmin: addprinc -randkey host/<host_name>@EXAMPLE.COM
kadmin: ktadd host/<host_name>@EXAMPLE.COM
# kadmin
Authenticating as principal root/admin@EXAMPLE.COM with password.
Password for root/admin@EXAMPLE.COM:
kadmin: addprinc -randkey host/<host_name>@EXAMPLE.COM
WARNING: no policy specified for host/<host_name>@EXAMPLE.COM; defaulting to no policy
Principal "host/<host_name>@EXAMPLE.COM" created.
kadmin: ktadd host/<host_name>@EXAMPLE.COM
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type des3-cbc-sha1 added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type arcfour-hmac added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia256-cts-cmac added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type camellia128-cts-cmac added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-hmac-sha1 added to keytab FILE:/etc/krb5.keytab.
Entry for principal host/<host_name>@EXAMPLE.COM with kvno 2, encryption type des-cbc-md5 added to keytab FILE:/etc/krb5.keytab.
- Check the status of nfs-client.target service and start it, if not already started:
# systemctl status nfs-client.target
# systemctl start nfs-client.target
# systemctl enable nfs-client.target
- Create an unprivileged user and ensure that the users that are created are resolvable to the UIDs through the central user database. For example:
# useradd guest
Note
The username of this user has to be the same as the one on the NFS-server.
- Mount the volume specifying the Kerberos security type:
# mount -t nfs -o sec=krb5 <host_name>:/testvolume /mnt
As root, all access should be granted. For example, creating a directory on the mount point and all other operations as root should be successful:
# mkdir <directory name>
- Log in as the guest user:
# su - guest
Without a Kerberos ticket, all access to /mnt should be denied. For example:
# su guest
# ls
ls: cannot open directory .: Permission denied
- Get the Kerberos ticket for the guest and access /mnt:
# kinit
Password for guest@EXAMPLE.COM:
# ls
<directory created>
Important
With this ticket, some access to /mnt must be allowed. If there are directories on the NFS server that "guest" does not have access to, access to those directories should still be denied as expected.
6.3.3.7. NFS-Ganesha Service Downtime
- If ganesha.nfsd dies (crash, OOM kill, or admin kill), the maximum time to detect the failure and put the ganesha cluster into grace is 20 seconds, plus whatever time pacemaker needs to effect the fail-over.
Note
The time taken to detect whether the service is down can be edited using the following commands on all the nodes:
# pcs resource op remove nfs-mon monitor
# pcs resource op add nfs-mon monitor interval=<interval_period_value>
- If the whole node dies (including network failure), the downtime is the total of the time pacemaker needs to detect that the node is gone, the time to put the cluster into grace, and the time to effect the fail-over. This is ~20 seconds.
- So the maximum fail-over time is approximately 20-22 seconds, and the average time is typically less. In other words, the time taken for NFS clients to detect a server reboot or resume I/O is 20-22 seconds.
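The arithmetic behind the 20-22 second figure can be written out as a small sketch. `max_failover_seconds` is a hypothetical helper, and the 2-second pacemaker figure is an assumed input chosen only to reproduce the upper bound quoted above; it is not a measured value.

```shell
# Rough upper bound on fail-over time, in whole seconds:
# detection window plus the time pacemaker needs to effect the fail-over.
max_failover_seconds() {
    # $1 = detection window, $2 = pacemaker fail-over time (assumed input)
    echo $(( $1 + $2 ))
}

max_failover_seconds 20 2   # prints 22
```

If the monitor interval is changed with `pcs resource op add`, the first argument changes accordingly.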
6.3.3.7.1. Modifying the Fail-over Time
Protocols | File Operations |
---|---|
NFSV3 | |
NLM | |
NFSV4 | |
Note
The grace period is set in the /etc/ganesha/ganesha.conf file.
NFSv4 { Grace_Period=<grace_period_value_in_sec>; }
After editing the /etc/ganesha/ganesha.conf file, restart the NFS-Ganesha service using the following command on all the nodes:
# systemctl restart nfs-ganesha
6.3.3.8. Tuning Readdir Performance for NFS-Ganesha
The Dir_Chunk parameter enables the directory content to be read in chunks at an instance. This parameter is enabled by default. The default value of this parameter is 128. The range for this parameter is 1 to UINT32_MAX. To disable this parameter, set the value to 0.
Procedure 6.1. Configuring readdir performance for NFS-Ganesha
- Edit the /etc/ganesha/ganesha.conf file.
- Locate the CACHEINODE block.
- Add the Dir_Chunk parameter inside the block:
CACHEINODE {
Entries_HWMark = 125000;
Chunks_HWMark = 1000;
Dir_Chunk = 128; # Range: 1 to UINT32_MAX, 0 to disable
}
- Save the ganesha.conf file and restart the NFS-Ganesha service on all nodes:
# systemctl restart nfs-ganesha
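A value can be sanity-checked against the documented range before it is written into ganesha.conf. This is a minimal sketch under stated assumptions: `valid_dir_chunk` is a hypothetical helper, UINT32_MAX is taken as 4294967295, and the shell is assumed to compare integers of that size correctly.

```shell
# Hypothetical check: succeed if the argument is an acceptable Dir_Chunk value
# (0 disables chunking; otherwise 1 to UINT32_MAX = 4294967295).
valid_dir_chunk() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    # Integer comparison; assumes the shell handles values of this size.
    [ "$1" -le 4294967295 ]
}

# Example:
#   valid_dir_chunk 128 && echo "ok to use"
```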
6.3.3.9. Troubleshooting NFS Ganesha
Run the following checks for every issue or failure that you encounter:
- Make sure all the prerequisites are met.
- Execute the following commands to check the status of the services:
# service nfs-ganesha status
# service pcsd status
# service pacemaker status
# pcs status
- Review the following logs to understand the cause of failure.
/var/log/ganesha/ganesha.log
/var/log/ganesha/ganesha-gfapi.log
/var/log/messages
/var/log/pcsd.log
- Situation
NFS-Ganesha fails to start.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:
- Ensure the kernel and gluster nfs services are inactive.
- Ensure that the port 875 is free to connect to the RQUOTA service.
- Ensure that the shared storage volume mount exists on the server after node reboot/shutdown. If it does not, then mount the shared storage volume manually using the following command:
# mount -t glusterfs <local_node's_hostname>:gluster_shared_storage /var/run/gluster/shared_storage
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/ .
For more information, see the section Exporting and Unexporting Volumes through NFS-Ganesha.
- Situation
NFS-Ganesha port 875 is unavailable.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:
- Run the following command to extract the PID of the process using port 875:
netstat -anlp | grep 875
- Determine if the process using port 875 is an important system or user process.
- Perform one of the following depending upon the importance of the process:
- If the process using port 875 is an important system or user process:
- Assign a different port to this service by modifying the following line in the /etc/ganesha/ganesha.conf file on all the nodes:
# Use a non-privileged port for RQuota
Rquota_Port = port_number;
- Run the following commands after modifying the port number:
# semanage port -a -t mountd_port_t -p tcp port_number
# semanage port -a -t mountd_port_t -p udp port_number
- Run the following command to restart NFS-Ganesha:
systemctl restart nfs-ganesha
- If the process using port 875 is not an important system or user process:
- Run the following command to kill the process using port 875:
# kill pid;
Use the process ID extracted from the previous step.
- Run the following command to ensure that the process is killed and port 875 is free to use:
# ps aux | grep pid;
- Run the following command to restart NFS-Ganesha:
systemctl restart nfs-ganesha
- If required, restart the killed process.
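A plain `grep 875` also matches unrelated rows such as port 8750 or a PID containing 875. The sketch below narrows the match to sockets whose local address ends in the exact port and prints only the PID. `pids_on_port` is a hypothetical helper; it assumes the common `netstat -anlp` column layout (local address in column 4, `PID/Program` in the last column) and will not handle program names that contain spaces.

```shell
# Read `netstat -anlp` output on stdin; print the PIDs of tcp/udp sockets
# whose local address ends in the given port.
pids_on_port() {
    awk -v port="$1" '
        # Column 4 is the local address; require an exact ":PORT" suffix.
        ($1 ~ /^(tcp|udp)/) && ($4 ~ (":" port "$")) {
            n = split($NF, parts, "/")          # last column: PID/program
            if (n >= 2 && parts[1] ~ /^[0-9]+$/) print parts[1]
        }' | sort -u
}

# Example (not run here):
#   netstat -anlp | pids_on_port 875
```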
- Situation
NFS-Ganesha Cluster setup fails.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps.
- Ensure the kernel and gluster nfs services are inactive.
- Ensure that the pcs cluster auth command is executed on all the nodes with the same password for the user hacluster.
- Ensure that shared volume storage is mounted on all the nodes.
- Ensure that the name of the HA Cluster does not exceed 15 characters.
- Ensure UDP multicast packets are pingable using OMPING.
- Ensure that Virtual IPs are not assigned to any NIC.
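The 15-character limit on the HA cluster name is easy to check up front. A minimal sketch, with `valid_ha_cluster_name` as a hypothetical helper and the sample names invented for illustration:

```shell
# Hypothetical check: succeed if the HA cluster name is non-empty
# and no longer than 15 characters.
valid_ha_cluster_name() {
    name="$1"
    [ -n "$name" ] && [ "${#name}" -le 15 ]
}

# Example:
#   valid_ha_cluster_name "ganesha-ha-360" || echo "name too long"
```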
- Situation
NFS-Ganesha has started and fails to export a volume.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:
- Ensure that the volume is in the Started state using the following command:
# gluster volume status <volname>
- Execute the following commands to check the status of the services:
# service nfs-ganesha status
# showmount -e localhost
- Review the following logs to understand the cause of failure.
/var/log/ganesha/ganesha.log
/var/log/ganesha/ganesha-gfapi.log
/var/log/messages
- Ensure that the dbus service is running using the following command:
# service messagebus status
- If the volume is not in a started state, run the following command to start the volume.
# gluster volume start <volname>
If the volume is not exported as part of volume start, run the following command to re-export the volume:
# /usr/libexec/ganesha/dbus-send.sh /var/run/gluster/shared_storage on <volname>
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/ .
- Situation
Adding a new node to the HA cluster fails.
Solution
Ensure you execute all the mandatory checks to understand the root cause before proceeding with the following steps. Follow the listed steps to fix the issue:
- Ensure that you run the following command from one of the nodes that is already part of the cluster:
# ganesha-ha.sh --add <HA_CONF_DIR> <NODE-HOSTNAME> <NODE-VIP>
- Ensure that gluster_shared_storage volume is mounted on the node that needs to be added.
- Make sure that all the nodes of the cluster are DNS resolvable from the node that needs to be added.
- Execute the following command for each of the hosts in the HA cluster on the node that needs to be added:
For Red Hat Enterprise Linux 7:
# pcs cluster auth <hostname>
For Red Hat Enterprise Linux 8:
# pcs host auth <hostname>
- Situation
Cleanup required when nfs-ganesha HA cluster setup fails.
Solution
To restore the machines to the original state, execute the following commands on each node forming the cluster:
# /usr/libexec/ganesha/ganesha-ha.sh --teardown /var/run/gluster/shared_storage/nfs-ganesha
# /usr/libexec/ganesha/ganesha-ha.sh --cleanup /var/run/gluster/shared_storage/nfs-ganesha
# systemctl stop nfs-ganesha
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/.
- Situation
Permission issues.
Solution
By default, the root squash option is disabled when you start NFS-Ganesha using the CLI. If you encounter any permission issues, check the unix permissions of the exported entry.
6.4. SMB
Warning
Overview of configuring SMB shares
- Verify that your system fulfils the requirements outlined in Section 6.4.1, “Requirements for using SMB with Red Hat Gluster Storage”.
- If you want to share volumes that use replication, set up CTDB: Section 6.4.2, “Setting up CTDB for Samba”.
- Configure your volumes to be shared using SMB: Section 6.4.3, “Sharing Volumes over SMB”.
- If you want to mount volumes on macOS clients: Section 6.4.4.1, “Configuring the Apple Create Context for macOS users”.
- Set up permissions for user access: Section 6.4.4.2, “Configuring read/write access for a non-privileged user”.
- Mount the shared volume on a client:
- Verify that your shared volume is working properly: Section 6.4.6, “Starting and Verifying your Configuration”
6.4.1. Requirements for using SMB with Red Hat Gluster Storage
- Samba is required to provide support and interoperability for the SMB protocol on Red Hat Gluster Storage. Additionally, CTDB is required when you want to share replicated volumes using SMB. See Subscribing to the Red Hat Gluster Storage server channels in the Red Hat Gluster Storage 3.5 Installation Guide for information on subscribing to the correct channels for SMB support.
- Enable the Samba firewall service in the active zones for runtime and permanent mode. The following commands are for systems based on Red Hat Enterprise Linux 7.
To get a list of active zones, run the following command:
# firewall-cmd --get-active-zones
To allow the firewall services in the active zones, run the following commands:
# firewall-cmd --zone=zone_name --add-service=samba
# firewall-cmd --zone=zone_name --add-service=samba --permanent
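When several zones are active, the two firewall commands above have to be repeated per zone. The sketch below extracts the zone names so that they can be looped over. `active_zone_names` is a hypothetical helper; it assumes the usual `firewall-cmd --get-active-zones` layout, in which zone names start in the first column and the `interfaces:` or `sources:` detail lines are indented.

```shell
# Read `firewall-cmd --get-active-zones` output on stdin; print one zone name per line.
active_zone_names() {
    # Zone names start in column 1; detail lines are indented and skipped.
    awk '/^[^[:space:]]/ { print $1 }'
}

# Example (not run here):
#   firewall-cmd --get-active-zones | active_zone_names | while read -r zone; do
#       firewall-cmd --zone="$zone" --add-service=samba
#       firewall-cmd --zone="$zone" --add-service=samba --permanent
#   done
```

The same loop applies to the CTDB port in Section 6.4.2 by swapping `--add-service=samba` for `--add-port=4379/tcp`.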
6.4.2. Setting up CTDB for Samba
Important
Prerequisites
- If you already have an older version of CTDB (version <= ctdb1.x), then remove CTDB by executing the following command:
# yum remove ctdb
After removing the older version, proceed with installing the latest CTDB.
Note
Ensure that the system is subscribed to the samba channel to get the latest CTDB packages.
- Install CTDB on all the nodes that are used as Samba servers to the latest version using the following command:
# yum install ctdb
- In a CTDB based high availability environment of Samba, the locks will not be migrated on failover.
- Enable the CTDB firewall service in the active zones for runtime and permanent mode. The following commands are for systems based on Red Hat Enterprise Linux 7.
To get a list of active zones, run the following command:
# firewall-cmd --get-active-zones
To add ports to the active zones, run the following commands:
# firewall-cmd --zone=zone_name --add-port=4379/tcp
# firewall-cmd --zone=zone_name --add-port=4379/tcp --permanent
Best Practices
- CTDB requires a different broadcast domain from the Gluster internal network. The network used by the Windows clients to access the Gluster volumes exported by Samba must be different from the internal Gluster network. Failing to do so can lead to excessive delays when CTDB fails over between nodes, and to degraded performance when accessing the shares from Windows.
For example, the following is an incorrect setup where CTDB is running in network 192.168.10.X:
Status of volume: ctdb
Gluster process                     TCP Port  RDMA Port  Online  Pid
Brick node1:/rhgs/ctdb/b1           49157     0          Y       30439
Brick node2:/rhgs/ctdb/b1           49157     0          Y       3827
Brick node3:/rhgs/ctdb/b1           49157     0          Y       89421
Self-heal Daemon on localhost       N/A       N/A        Y       183026
Self-heal Daemon on sesdel0207      N/A       N/A        Y       44245
Self-heal Daemon on segotl4158      N/A       N/A        Y       110627
cat ctdb_listnodes
192.168.10.1
192.168.10.2
cat ctdb_ip
Public IPs on node 0
192.168.10.3 0
Note
The host names node1, node2, and node3 are used to set up the bricks and resolve the IPs in the same network 192.168.10.X. The Windows clients are accessing the shares using the internal Gluster network, and this should not be the case.
- Additionally, the CTDB network and the Gluster internal network must run in separate physical interfaces. Red Hat recommends 10GbE interfaces for better performance.
- It is recommended to use the same network bandwidth for Gluster and CTDB networks. Using different network speeds can lead to performance bottlenecks. The same amount of network traffic is expected in both internal and external networks.
Configuring CTDB on Red Hat Gluster Storage Server
- Create a new replicated volume to house the CTDB lock file. The lock file has a size of zero bytes, so use small bricks.
To create a replicated volume, run the following command, replacing N with the number of nodes to replicate across:
# gluster volume create volname replica N ip_address_1:brick_path ... ip_address_N:brick_path
For example:
# gluster volume create ctdb replica 3 10.16.157.75:/rhgs/brick1/ctdb/b1 10.16.157.78:/rhgs/brick1/ctdb/b2 10.16.157.81:/rhgs/brick1/ctdb/b3
- In the following files, replace all in the statement META="all" with the newly created volume name, for example, META="ctdb".
/var/lib/glusterd/hooks/1/start/post/S29CTDBsetup.sh
/var/lib/glusterd/hooks/1/stop/pre/S29CTDB-teardown.sh
- In the /etc/samba/smb.conf file, add the following line in the global section on all the nodes:
clustering=yes
- Start the volume.
# gluster volume start ctdb
The S29CTDBsetup.sh script runs on all Red Hat Gluster Storage servers, adds an entry in /etc/fstab for the mount, and mounts the volume at /gluster/lock on all the nodes with Samba server. It also enables automatic start of the CTDB service on reboot.
Note
When you stop the special CTDB volume, the S29CTDB-teardown.sh script runs on all Red Hat Gluster Storage servers, removes the entry in /etc/fstab for the mount, and unmounts the volume at /gluster/lock.
- Verify that the /etc/ctdb directory exists on all nodes that are used as a Samba server. This directory contains CTDB configuration details recommended for Red Hat Gluster Storage.
- Create the /etc/ctdb/nodes file on all the nodes that are used as Samba servers and add the IP addresses of these nodes to the file.
10.16.157.0
10.16.157.3
10.16.157.6
The IP addresses listed here are the private IP addresses of Samba servers.
- On nodes that are used as Samba servers and require IP failover, create the /etc/ctdb/public_addresses file. Add any virtual IP addresses that CTDB should create to the file in the following format:
VIP/routing_prefix network_interface
For example:
192.168.1.20/24 eth0
192.168.1.21/24 eth0
- Start the CTDB service on all the nodes.
On RHEL 7 and RHEL 8, run:
# systemctl start ctdb
On RHEL 6, run:
# service ctdb start
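A malformed line in the public_addresses file is easy to miss, so a quick shape check before starting CTDB can be useful. The following is a sketch only: `check_public_addresses` is a hypothetical helper, it checks just the `VIP/routing_prefix network_interface` shape for IPv4 entries (an assumption made for brevity), and it does not verify that the address or interface exists.

```shell
# Read a public_addresses-style file on stdin; print the line numbers of
# entries that do not look like "IPv4/prefix interface". Exit non-zero if any.
check_public_addresses() {
    awk '
        /^[[:space:]]*$/ { next }               # ignore blank lines
        NF != 2 || $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\/[0-9]+$/ {
            print NR; bad = 1
        }
        END { exit (bad ? 1 : 0) }'
}

# Example (not run here):
#   check_public_addresses < /etc/ctdb/public_addresses || echo "fix the lines listed above"
```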
6.4.3. Sharing Volumes over SMB
/etc/samba/smb.conf:
[gluster-VOLNAME]
comment = For samba share of volume VOLNAME
vfs objects = glusterfs
glusterfs:volume = VOLNAME
glusterfs:logfile = /var/log/samba/VOLNAME.log
glusterfs:loglevel = 7
path = /
read only = no
guest ok = yes
Configuration Options | Required? | Default Value | Description |
---|---|---|---|
Path | Yes | n/a | It represents the path that is relative to the root of the gluster volume that is being shared. Hence / represents the root of the gluster volume. Exporting a subdirectory of a volume is supported and /subdir in path exports only that subdirectory of the volume. |
glusterfs:volume | Yes | n/a | The volume name that is shared. |
glusterfs:logfile | No | NULL | Path to the log file that will be used by the gluster modules that are loaded by the vfs plugin. Standard Samba variable substitutions as mentioned in smb.conf are supported. |
glusterfs:loglevel | No | 7 | This option is equivalent to the client-log-level option of gluster. 7 is the default value and corresponds to the INFO level. |
glusterfs:volfile_server | No | localhost | The gluster server to be contacted to fetch the volfile for the volume. It takes the value, which is a list of white space separated elements, where each element is unix+/path/to/socket/file or [tcp+]IP|hostname|\[IPv6\][:port] |
If you are using an older version of Samba:
- Enable SMB specific caching:
# gluster volume set VOLNAME performance.cache-samba-metadata on
You can also enable generic metadata caching to improve performance. See Section 19.7, “Directory Operations” for details.
- Restart the glusterd service on each Red Hat Gluster Storage node.
- Verify proper lock and I/O coherence:
# gluster volume set VOLNAME storage.batch-fsync-delay-usec 0
Note
# gluster volume set <volname> performance.write-behind off
If you are using Samba-4.8.5-104 or later:
- To export a gluster volume as an SMB share via Samba, one of the following volume options, user.cifs or user.smb, is required.
To enable the user.cifs volume option, run:
# gluster volume set VOLNAME user.cifs enable
And to enable user.smb, run:
# gluster volume set VOLNAME user.smb enable
Red Hat Gluster Storage 3.4 introduces a group command samba for configuring the necessary volume options for Samba-CTDB setup.
- Execute the following command to configure the volume options for the Samba-CTDB:
# gluster volume set VOLNAME group samba
This command will enable the following options for Samba-CTDB setup:
- performance.readdir-ahead: on
- performance.parallel-readdir: on
- performance.nl-cache-timeout: 600
- performance.nl-cache: on
- performance.cache-samba-metadata: on
- network.inode-lru-limit: 200000
- performance.md-cache-timeout: 600
- performance.cache-invalidation: on
- features.cache-invalidation-timeout: 600
- features.cache-invalidation: on
- performance.stat-prefetch: on
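After applying the group, individual options can be spot-checked against the list above. A minimal sketch: `volume_option_value` is a hypothetical helper that assumes the two-column `Option  Value` layout printed by `gluster volume get VOLNAME all`.

```shell
# Read `gluster volume get VOLNAME all` output on stdin;
# print the value of the named option (nothing if it is absent).
volume_option_value() {
    awk -v opt="$1" '$1 == opt { print $2 }'
}

# Example (not run here):
#   gluster volume get VOLNAME all | volume_option_value performance.nl-cache-timeout
```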
If you are using Samba-4.9.8-109 or later:
- Have a local mount using the native Gluster FUSE protocol on every Gluster node that shares the Gluster volume via Samba. Mount the GlusterFS volume via FUSE and record the FUSE mountpoint for further steps:
Add an entry in /etc/fstab:
localhost:/myvol /mylocal glusterfs defaults,_netdev,acl 0 0
For example:
localhost:/myvol 4117504 1818292 2299212 45% /mylocal
Where the gluster volume is myvol, which will be mounted on /mylocal.
- Edit the samba share configuration file located at /etc/samba/smb.conf:
[gluster-VOLNAME]
comment = For samba share of volume VOLNAME
vfs objects = glusterfs
glusterfs:volume = VOLNAME
glusterfs:logfile = /var/log/samba/VOLNAME.log
glusterfs:loglevel = 7
path = /
read only = no
guest ok = yes
- Edit the vfs objects parameter value to glusterfs_fuse:
vfs objects = glusterfs_fuse
- Edit the path parameter value to the FUSE mountpoint recorded previously. For example:
path = /MOUNTDIR
- With SELinux in Enforcing mode, turn on the SELinux boolean samba_share_fusefs:
# setsebool -P samba_share_fusefs on
Note
- New volumes being created will be automatically configured with the default vfs objects parameter.
- Modifications to the samba share configuration file are retained over restart of volumes until these volumes are deleted using Gluster CLI.
- The Samba hook scripts invoked as part of Gluster CLI operations on a volume VOLNAME will only operate on a Samba share named [gluster-VOLNAME]. In other words, hook scripts will never delete or change the samba share configuration file for a samba share called [VOLNAME].
Then, for all Samba versions:
- Verify that the volume can be accessed from the SMB/CIFS share:
# smbclient -L <hostname> -U%
For example:
# smbclient -L rhs-vm1 -U%
Domain=[MYGROUP] OS=[Unix] Server=[Samba 4.1.17]
Sharename       Type      Comment
---------       ----      -------
IPC$            IPC       IPC Service (Samba Server Version 4.1.17)
gluster-vol1    Disk      For samba share of volume vol1
Domain=[MYGROUP] OS=[Unix] Server=[Samba 4.1.17]
Server               Comment
---------            -------
Workgroup            Master
---------            -------
- To verify that the SMB/CIFS share can be accessed by the user, run the following command:
# smbclient //<hostname>/gluster-<volname> -U <username>%<password>
For example:
# smbclient //10.0.0.1/gluster-vol1 -U root%redhat
Domain=[MYGROUP] OS=[Unix] Server=[Samba 4.1.17]
smb: \> mkdir test
smb: \> cd test\
smb: \test\> pwd
Current directory is \\10.0.0.1\gluster-vol1\test\
smb: \test\>
6.4.4. Configuring User Access to Shared Volumes
6.4.4.1. Configuring the Apple Create Context for macOS users
- Add the following lines to the [global] section of the smb.conf file. Note that the indentation level shown is required.
fruit:aapl = yes
ea support = yes
- Load the vfs_fruit module and its dependencies by adding the following line to your volume's export configuration block in the smb.conf file.
vfs objects = fruit streams_xattr glusterfs
For example:
[gluster-volname]
comment = For samba share of volume smbshare
vfs objects = fruit streams_xattr glusterfs
glusterfs:volume = volname
glusterfs:logfile = /var/log/samba/glusterfs-volname-fruit.%M.log
glusterfs:loglevel = 7
path = /
read only = no
guest ok = yes
fruit:encoding = native
6.4.4.2. Configuring read/write access for a non-privileged user
- Add the user on all the Samba servers based on your configuration:
# adduser username
- Add the user to the list of Samba users on all Samba servers and assign password by executing the following command:
# smbpasswd -a username
- From any other Samba server, mount the volume using the FUSE protocol.
# mount -t glusterfs -o acl ip-address:/volname /mountpoint
For example:
# mount -t glusterfs -o acl rhs-a:/repvol /mnt
- Use the setfacl command to provide the required permissions for directory access to the user.
# setfacl -m user:username:rwx mountpoint
For example:
# setfacl -m user:cifsuser:rwx /mnt
6.4.5. Mounting Volumes using SMB
6.4.5.1. Manually mounting volumes exported with SMB on Red Hat Enterprise Linux
- Install the cifs-utils package on the client.
# yum install cifs-utils
- Run mount -t cifs to mount the exported SMB share, using the syntax example as guidance.
# mount -t cifs -o user=username,pass=password //hostname/gluster-volname /mountpoint
The sec=ntlmssp parameter is also required when mounting a volume on Red Hat Enterprise Linux 6.
# mount -t cifs -o user=username,pass=password,sec=ntlmssp //hostname/gluster-volname /mountpoint
For example:
# mount -t cifs -o user=cifsuser,pass=redhat,sec=ntlmssp //server1/gluster-repvol /cifs
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
- Run # smbstatus -S on the server to display the status of the volume:
Service           pid     machine               Connected at
-------------------------------------------------------------------
gluster-VOLNAME   11967   __ffff_192.168.1.60   Mon Aug 6 02:23:25 2012
6.4.5.2. Manually mounting volumes exported with SMB on Microsoft Windows
6.4.5.2.1. Using Microsoft Windows Explorer to manually mount a volume
- In Windows Explorer, open the Map Network Drive screen.
- Choose the drive letter using the drop-down list.
- In the Folder text box, specify the path of the server and the shared resource in the following format: \\SERVER_NAME\VOLNAME.
- Complete the dialog to display the network drive in Windows Explorer.
- Navigate to the network drive to verify it has mounted correctly.
6.4.5.2.2. Using Microsoft Windows command line interface to manually mount a volume
- Open a command prompt (cmd).
- Enter net use z: \\SERVER_NAME\VOLNAME, where z: is the drive letter to assign to the shared volume.
For example, net use y: \\server1\test-volume
- Navigate to the network drive to verify it has mounted correctly.
6.4.5.3. Manually mounting volumes exported with SMB on macOS
Prerequisites
- Ensure that your Samba configuration allows the use of the SMB Apple Create Context.
- Ensure that the username you're using is on the list of allowed users for the volume.
Manual mounting process
- In the Finder, click Go > Connect to Server.
- In the Server Address field, type the IP address or hostname of a Red Hat Gluster Storage server that hosts the volume you want to mount.
- Click Connect.
- When prompted, select Registered User to connect to the volume using a valid username and password.
If required, enter your user name and password, then select the server volumes or shared folders that you want to mount.
To make it easier to connect to the computer in the future, select Remember this password in my keychain to add your user name and password for the computer to your keychain.
6.4.5.4. Configuring automatic mounting for volumes exported with SMB on Red Hat Enterprise Linux
- Open the /etc/fstab file in a text editor and add a line containing the following details:
\\HOSTNAME|IPADDRESS\SHARE_NAME MOUNTDIR cifs OPTIONS DUMP FSCK
In the OPTIONS column, ensure that you specify the credentials option, with a value of the path to the file that contains the username and/or password.
Using the example server names, the entry contains the following replaced values.
\\server1\test-volume /mnt/glusterfs cifs credentials=/etc/samba/passwd,_netdev 0 0
The sec=ntlmssp parameter is also required when mounting a volume on Red Hat Enterprise Linux 6, for example:
\\server1\test-volume /mnt/glusterfs cifs credentials=/etc/samba/passwd,_netdev,sec=ntlmssp 0 0
See the mount.cifs man page for more information about these options.
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
- Run # smbstatus -S on the client to display the status of the volume:
Service           pid     machine               Connected at
-------------------------------------------------------------------
gluster-VOLNAME   11967   __ffff_192.168.1.60   Mon Aug 6 02:23:25 2012
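The file named by the credentials option uses the `username=` and `password=` keys described in the mount.cifs man page. The sketch below writes such a file with owner-only permissions. `write_cifs_credentials` is a hypothetical helper, and the user name and password in the usage comment are simply the example values used earlier in this section.

```shell
# Hypothetical helper: write a mount.cifs credentials file readable only by its owner.
write_cifs_credentials() {
    file="$1"; user="$2"; pass="$3"
    (
        umask 077    # new files are created without group/other access
        printf 'username=%s\npassword=%s\n' "$user" "$pass" > "$file"
    )
    chmod 600 "$file"    # also tighten permissions if the file already existed
}

# Example (not run here; path matches the fstab entry above):
#   write_cifs_credentials /etc/samba/passwd cifsuser redhat
```

Keeping the password in this file rather than in /etc/fstab avoids exposing it to every local user, since /etc/fstab is world-readable.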
6.4.5.5. Configuring automatic mounting for volumes exported with SMB on Microsoft Windows
- In Windows Explorer, open the Map Network Drive screen.
- Choose the drive letter using the drop-down list.
- In the Folder text box, specify the path of the server and the shared resource in the following format: \\SERVER_NAME\VOLNAME.
- Click the Reconnect at logon check box.
- Complete the dialog to display the network drive in Windows Explorer.
- If the Windows Security screen pops up, enter the username and password and click OK.
- Navigate to the network drive to verify it has mounted correctly.
6.4.5.6. Configuring automatic mounting for volumes exported with SMB on macOS
- Manually mount the volume using the process outlined in Section 6.4.5.3, “Manually mounting volumes exported with SMB on macOS”.
- In the Finder, click System Preferences > Users & Groups > Username > Login Items.
- Drag and drop the mounted volume into the login items list.
Check Hide if you want to prevent the drive's window from opening every time you boot or log in.
6.4.6. Starting and Verifying your Configuration
Verify the Configuration
- Verify that CTDB is running using the following commands:
# ctdb status
# ctdb ip
# ctdb ping -n all
- Mount a Red Hat Gluster Storage volume using any one of the VIPs.
- Run # ctdb ip to locate the physical server serving the VIP.
- Shut down the CTDB VIP server to verify successful configuration.
When the Red Hat Gluster Storage server serving the VIP is shut down, there will be a pause for a few seconds, then I/O will resume.
6.4.8. Accessing Snapshots in Windows
Note
6.4.8.1. Configuring Shadow Copy
Note
vfs objects = shadow_copy2 glusterfs
Configuration Options | Required? | Default Value | Description |
---|---|---|---|
shadow:snapdir | Yes | n/a | Path to the directory where snapshots are kept. The snapdir name should be .snaps. |
shadow:basedir | Yes | n/a | Path to the base directory that snapshots are from. The basedir value should be /. |
shadow:sort | Optional | unsorted | The supported values are asc/desc. By this parameter one can specify that the shadow copy directories should be sorted before they are sent to the client. This can be beneficial as unix filesystems are usually not listed alphabetically sorted. If enabled, it is specified in descending order. |
shadow:localtime | Optional | UTC | This is an optional parameter that indicates whether the snapshot names are in UTC/GMT or in local time. |
shadow:format | Yes | n/a | This parameter specifies the format specification for the naming of snapshots. The format must be compatible with the conversion specifications recognized by str[fp]time. The default value is _GMT-%Y.%m.%d-%H.%M.%S. |
shadow:fixinodes | Optional | No | If you enable shadow:fixinodes then this module will modify the apparent inode number of files in the snapshot directories using a hash of the files path. This is needed for snapshot systems where the snapshots have the same device:inode number as the original files (such as happens with GPFS snapshots). If you don't set this option then the 'restore' button in the shadow copy UI will fail with a sharing violation. |
shadow:snapprefix | Optional | n/a | Regular expression to match prefix of snapshot name. Red Hat Gluster Storage only supports Basic Regular Expression (BRE) |
shadow:delimiter | Optional | _GMT | delimiter is used to separate shadow:snapprefix and shadow:format. |
[gluster-vol0]
comment = For samba share of volume vol0
vfs objects = shadow_copy2 glusterfs
glusterfs:volume = vol0
glusterfs:logfile = /var/log/samba/glusterfs-vol0.%M.log
glusterfs:loglevel = 3
path = /
read only = no
guest ok = yes
shadow:snapdir = /.snaps
shadow:basedir = /
shadow:sort = desc
shadow:snapprefix= ^S[A-Za-z0-9]*p$
shadow:format = _GMT-%Y.%m.%d-%H.%M.%S
Note
vfs objects = shadow_copy2 glusterfs_fuse
[gluster-vol0]
comment = For samba share of volume vol0
vfs objects = shadow_copy2 glusterfs_fuse
path = /MOUNTDIR
read only = no
guest ok = yes
shadow:snapdir = /MOUNTDIR/.snaps
shadow:basedir = /MOUNTDIR
shadow:sort = desc
shadow:snapprefix= ^S[A-Za-z0-9]*p$
shadow:format = _GMT-%Y.%m.%d-%H.%M.%S
Snap_GMT-2016.06.06-06.06.06
Sl123p_GMT-2016.07.07-07.07.07
xyz_GMT-2016.08.08-08.08.08
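The interaction of shadow:snapprefix, shadow:delimiter, and shadow:format can be illustrated with a small Python sketch. It is not Samba's implementation: the function name match_snapshot is hypothetical, the splitting rule (prefix before the first delimiter, timestamp from the delimiter onward) is an assumption based on the option descriptions in the table above, and Python's re module stands in for BRE, which behaves the same for this simple pattern.

```python
import re
from datetime import datetime

# Values taken from the example share definition above.
PREFIX_REGEX = r"^S[A-Za-z0-9]*p$"
TIME_FORMAT = "_GMT-%Y.%m.%d-%H.%M.%S"


def match_snapshot(name, prefix_regex=PREFIX_REGEX,
                   time_format=TIME_FORMAT, delimiter="_GMT"):
    """Return the snapshot time if name is <prefix><timestamp>, else None.

    Illustrative only: the text before the first delimiter is matched
    against the prefix expression, and the rest is parsed with the format.
    """
    idx = name.find(delimiter)
    if idx < 0:
        return None
    prefix, stamp = name[:idx], name[idx:]
    if re.fullmatch(prefix_regex, prefix) is None:
        return None
    try:
        return datetime.strptime(stamp, time_format)
    except ValueError:
        return None


if __name__ == "__main__":
    for snap in ("Snap_GMT-2016.06.06-06.06.06",
                 "Sl123p_GMT-2016.07.07-07.07.07",
                 "xyz_GMT-2016.08.08-08.08.08"):
        print(snap, "->", match_snapshot(snap))
```

Under these assumptions, the first two names match the prefix expression and the third does not, because xyz does not start with S and end with p.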
- Start or restart the smb service.
On RHEL 7 and RHEL 8, run:
# systemctl [re]start smb
On RHEL 6, run:
# service smb [re]start
- Enable User Serviceable Snapshot (USS) for Samba. For more information, see Section 8.13, “User Serviceable Snapshots”.
6.4.8.2. Accessing Snapshot
- Right-click the file or directory for which the previous version is required.
- Click on.
- In the dialog box, select the Date/Time of the previous version of the file, and select either Open, Restore, or Copy, where:
Open: Lets you open the required version of the file in read-only mode.
Restore: Restores the file back to the selected version.
Copy: Lets you copy the file to a different location.
Figure 6.1. Accessing Snapshot
6.4.9. Tuning Performance
- Enabling Metadata Caching to improve the performance of SMB access of Red Hat Gluster Storage volumes.
- Enhancing Directory Listing Performance
- Enhancing File/Directory Create Performance
6.4.9.1. Enabling Metadata Caching
Note
- Execute the following command to enable metadata caching and cache invalidation:
# gluster volume set <volname> group metadata-cache
This is a group set option that sets multiple volume options in a single command.
- To increase the number of files that can be cached, execute the following command:
# gluster volume set <VOLNAME> network.inode-lru-limit <n>
n is set to 50000. It can be increased if the number of active files in the volume is very high. Increasing this number increases the memory footprint of the brick processes.
6.4.9.2. Enhancing Directory Listing Performance
Note
- Verify that the performance.readdir-ahead option is enabled by executing the following command:
# gluster volume get <VOLNAME> performance.readdir-ahead
If performance.readdir-ahead is not enabled, execute the following command:
# gluster volume set <VOLNAME> performance.readdir-ahead on
- Execute the following command to enable the parallel-readdir option:
# gluster volume set <VOLNAME> performance.parallel-readdir on
Note
If there are more than 50 bricks in the volume, it is recommended to increase the cache size to more than 10 MB (the default value):
# gluster volume set <VOLNAME> performance.rda-cache-limit <CACHE SIZE>
6.4.9.3. Enhancing File/Directory Create Performance
- Execute the following command to enable negative-lookup cache:
# gluster volume set <volname> group nl-cache
volume set success
Note
The above command also enables cache-invalidation and increases the timeout to 10 minutes.
6.5. POSIX Access Control Lists
6.5.1. Setting ACLs with setfacl
The setfacl command lets you modify the ACLs of a specified file or directory. You can add access rules for a file with the -m subcommand, or remove access rules for a file with the -x subcommand. The basic syntax is as follows:
# setfacl subcommand access_rule file_path
- Rules for users start with u:
# setfacl -m u:user:perms file_path
For example, setfacl -m u:fred:rw /mnt/data gives the user fred read and write access to the /mnt/data directory.
setfacl -x u::w /works_in_progress/my_presentation.txt prevents all users from writing to the /works_in_progress/my_presentation.txt file (except the owning user and members of the owning group, as these are controlled by POSIX).
- Rules for groups start with g:
# setfacl -m g:group:perms file_path
For example, setfacl -m g:admins:rwx /etc/fstab gives users in the admins group read, write, and execute permissions to the /etc/fstab file.
setfacl -x g:newbies:x /mnt/harmful_script.sh prevents users in the newbies group from executing /mnt/harmful_script.sh.
- Rules for other users start with o:
# setfacl -m o:perms file_path
For example, setfacl -m o:r /mnt/data/public gives users without any specific rules about their username or group permission to read files in the /mnt/data/public directory.
- Rules for setting a maximum access level using an effective rights mask start with m:
# setfacl -m m:mask file_path
For example, setfacl -m m:r-x /mount/harmless_script.sh gives all users a maximum of read and execute access to the /mount/harmless_script.sh file.
You can set a default rule by adding d: to the beginning of any rule, or make a rule recursive with the -R option. For example, setfacl -Rm d:g:admins:rwx /etc gives all members of the admins group read, write, and execute access to any file created under the /etc directory after the point when setfacl is run.
6.5.2. Checking current ACLs with getfacl
The getfacl command lets you check the current ACLs of a file or directory. The syntax for this command is as follows:
# getfacl file_path
# getfacl /mnt/gluster/data/test/sample.jpg
# owner: antony
# group: antony
user::rw-
group::rw-
other::r--
Default ACLs set on a directory are prefixed with default:, like so:
# getfacl /mnt/gluster/data/doc
# owner: antony
# group: antony
user::rw-
user:john:r--
group::r--
mask::r--
other::r--
default:user::rwx
default:user:antony:rwx
default:group::r-x
default:mask::rwx
default:other::r-x
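The mask entry in getfacl output limits the permissions of named users, the owning group, and named groups, but not those of the owning user or of others. The following Python sketch applies that standard POSIX ACL rule to getfacl-style text. The function name effective_acl is hypothetical and the parser handles only the simple access entries shown in the examples above; default: entries are skipped.

```python
def effective_acl(getfacl_output):
    """Map (tag, qualifier) to effective permissions after applying the mask."""
    entries = {}
    mask = None
    for raw in getfacl_output.splitlines():
        # Drop comments such as "# owner: antony" and trailing "#effective" notes.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("default:"):
            continue
        tag, qualifier, perms = line.split(":")
        if tag == "mask":
            mask = perms
        else:
            entries[(tag, qualifier)] = perms

    result = {}
    for (tag, qualifier), perms in entries.items():
        # The mask applies to named users and to all group entries only.
        masked = tag == "group" or (tag == "user" and qualifier != "")
        if masked and mask is not None:
            perms = "".join(p if m != "-" else "-" for p, m in zip(perms, mask))
        result[(tag, qualifier)] = perms
    return result


SAMPLE = """\
# owner: antony
# group: antony
user::rw-
user:john:r--
group::r--
mask::r--
other::r--
default:user::rwx
"""

if __name__ == "__main__":
    for key, perms in effective_acl(SAMPLE).items():
        print(key, perms)
```

With the sample above, the entry for john stays r-- because the mask already allows read. If john had rwx with the same mask, the effective permission would still be r--.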
6.5.3. Mounting volumes with ACLs enabled
To mount a volume with ACLs enabled, use the acl mount option. For further information, see Section 6.2.3, “Mounting Red Hat Gluster Storage Volumes”.
6.5.4. Checking ACL enablement on a mounted volume
Client type | How to check | Further info |
---|---|---|
Native FUSE |
Check the output of the mount command for the default_permissions option:
# mount | grep mountpoint
If default_permissions appears in the output for a mounted volume, ACLs are not enabled on that volume.
Check the output of the ps aux command for the gluster FUSE mount process (glusterfs):
# ps aux | grep gluster
root 30548 0.0 0.7 548408 13868 ? Ssl 12:39 0:00 /usr/local/sbin/glusterfs --acl --volfile-server=127.0.0.2 --volfile-id=testvol /mnt/fuse_mnt
If --acl appears in the output for a mounted volume, ACLs are enabled on that volume.
| See Section 6.2, “Native Client” for more information. |
Gluster Native NFS |
On the server side, check the output of the gluster volume info volname command. If nfs.acl appears in the output, that volume has ACLs disabled. If nfs.acl does not appear, ACLs are enabled (the default state).
On the client side, check the output of the mount command for the volume. If noacl appears in the output, ACLs are disabled on the mount point. If this does not appear in the output, the client checks that the server uses ACLs, and uses ACLs if server support is enabled.
|
Refer to the output of gluster volume set help pertaining to NFS, or see the Red Hat Enterprise Linux Storage Administration Guide for more information: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-nfs.html
|
NFS Ganesha |
On the server side, check the volume's export configuration file, /run/gluster/shared_storage/nfs-ganesha/exports/export.volname.conf. If the Disable_ACL option is set to true, ACLs are disabled. Otherwise, ACLs are enabled for that volume.
Note
NFS-Ganesha supports NFSv4 protocol standardized ACLs but not NFSACL protocol used for NFSv3 mounts. Only NFSv4 mounts can set ACLs.
There is no option to disable NFSv4 ACLs on the client side, so as long as the server supports ACLs, clients can set ACLs on the mount point.
|
See Section 6.3.3, “NFS Ganesha” for more information. For client side settings, refer to the Red Hat Enterprise Linux Storage Administration Guide: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-nfs.html
|
Samba |
POSIX ACLs are enabled by default when using Samba to access a Red Hat Gluster Storage volume.
| See Section 6.4, “SMB” for more information. |
6.6. Checking Client Operating Versions
op-version. The cluster.op-version parameter sets the required operating version for all volumes in a cluster on the server side. Each client supports a range of operating versions that are identified by a minimum (min-op-version) and maximum (max-op-version) supported operating version.
- For Red Hat Gluster Storage 3.2 and later:
# gluster volume status volname clients
Use all in place of the name of your volume if you want to see the operating versions of clients connected to all volumes in the cluster.
- Before Red Hat Gluster Storage 3.2:
- Perform a state dump for the volume whose clients you want to check.
# gluster volume statedump volname
- Locate the state dump directory
# gluster --print-statedumpdir
- Locate the state dump file and grep for client information.
# grep -A4 "identifier=client_ip" statedumpfile
Chapter 7. Integrating Red Hat Gluster Storage with Windows Active Directory
Figure 7.1. Active Directory Integration
Information | Example Value |
---|---|
DNS domain name / realm | addom.example.com |
NetBIOS domain name | ADDOM |
Name of administrative account | administrator |
Red Hat Gluster Storage nodes | rhs-srv1.addom.example.com, 192.168.56.10 rhs-srv2.addom.example.com, 192.168.56.11 rhs-srv3.addom.example.com, 192.168.56.12 |
Netbios name of the cluster | RHS-SMB |
7.1. Prerequisites
- Name Resolution
The Red Hat Gluster Storage nodes must be able to resolve names from the AD domain via DNS. To verify this, use the following command:
host dc1.addom.example.com
where addom.example.com is the AD domain and dc1 is the name of a domain controller.
For example, the /etc/resolv.conf file in a static network configuration could look like this:
domain addom.example.com
search addom.example.com
nameserver 10.11.12.1 # dc1.addom.example.com
nameserver 10.11.12.2 # dc2.addom.example.com
This example assumes that both the domain controllers are also the DNS servers of the domain.
- Kerberos Packages
If you want to use the Kerberos client utilities, such as kinit and klist, manually install the krb5-workstation package using the following command:
# yum -y install krb5-workstation
- Synchronize Time Service
It is essential that the time service on each Red Hat Gluster Storage node and the Windows Active Directory server are synchronized; otherwise, Kerberos authentication may fail due to clock skew. In environments where time services are not reliable, the best practice is to configure the Red Hat Gluster Storage nodes to synchronize time from the Windows Server.
On each Red Hat Gluster Storage node, edit the file /etc/ntp.conf for RHEL 7 or /etc/chrony.conf for RHEL 8 so that the time is synchronized from a known, reliable time service:
# Enable writing of statistics records.
#statistics clockstats cryptostats loopstats peerstats
server 0.rhel.pool.ntp.org iburst
server 1.rhel.pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
Activate the change on each Red Hat Gluster Storage node by stopping the NTP or chrony daemon, updating the time, and then starting the daemon again. Verify the change on both servers using the following commands:
For RHEL 7 and RHEL 8, run:
# systemctl stop ntpd
# systemctl start ntpd
# systemctl stop chronyd
# systemctl start chronyd
For RHEL 6, run:
# service ntpd stop
# service ntpd start
# service chronyd stop
# service chronyd start
For more information on using chrony with RHEL 8, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_basic_system_settings/using-chrony-to-configure-ntp
- Samba Packages
Ensure that the following Samba packages are installed along with their dependencies:
- CTDB
- samba
- samba-client
- samba-winbind
- samba-winbind-modules
7.2. Integration
- Configure Authentication
- Join Active Directory Domain
- Verify/Test Active Directory and Services
7.2.1. Configure Authentication
Note
- Ensure that CTDB is configured before the active directory join. For more information see, Section 6.3.1 Setting up CTDB for Samba in the Red Hat Gluster Storage Administration Guide.
- It is recommended to take backups of the configuration and of Samba’s databases (local and ctdb) before making any changes.
7.2.1.1. Basic Samba Configuration
autorid. Red Hat recommends autorid because, in addition to automatically calculating user and group identifiers like tdb, it performs fewer database transactions and read operations, and is a prerequisite for supporting secure ID history (SID history).
Warning
/etc/samba/smb.conf must be identical on all nodes, and must contain the relevant parameters for AD. In addition, a few other settings are required in order to activate mapping of user and group IDs.
[global]
netbios name = RHS-SMB
workgroup = ADDOM
realm = addom.example.com
security = ads
clustering = yes
idmap config * : backend = autorid
idmap config * : range = 1000000-19999999
idmap config * : rangesize = 1000000
# -----------------RHS Options -------------------------
#
# The following line includes RHS-specific configuration options. Be careful with this line.
include = /etc/samba/rhs-samba.conf
#=================Share Definitions =====================
Warning
The above is the complete global section required in the smb.conf file. Ensure that nothing else appears in this section in order to prevent gluster mechanisms from changing settings when starting or stopping the ctdb lock volume.
The netbios name consists of only one name, which has to be the same name on all cluster nodes. Windows clients will only access the cluster via that name (either in this short form or as an FQDN). The individual node hostname (rhs-srv1, rhs-srv2, …) must not be used for the netbios name parameter.
Note
- The idmap range defines the lowest and highest identifier numbers that can be used. Specify a range large enough to cover the number of objects specified in rangesize.
- The idmap rangesize specifies the number of identifiers available for each domain range. In this case there are one million identifiers per domain range, and the range parameter indicates that there are nearly 19 million identifiers total, meaning that there are a total of 19 possible domain ranges.
- If you want to be able to use the individual host names to also access specific nodes, you can add them to the netbios aliases parameter of smb.conf.
- In an AD environment, it is usually not required to run nmbd. However, if you have to run nmbd, then make sure to set the cluster addresses smb.conf option to the list of public IP addresses of the cluster.
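The arithmetic behind the range and rangesize note can be checked with a short Python sketch. The function name autorid_domain_ranges is hypothetical, and the sketch only reproduces the calculation described above; it does not read smb.conf or query Samba.

```python
def autorid_domain_ranges(id_range, rangesize):
    """Return how many whole domain ranges of `rangesize` IDs fit in `id_range`.

    id_range is a "LOW-HIGH" string as written in smb.conf, inclusive at both ends.
    """
    low, high = (int(part) for part in id_range.split("-"))
    total_ids = high - low + 1
    return total_ids // rangesize


if __name__ == "__main__":
    # Values from the [global] example: 19,000,000 IDs in blocks of 1,000,000.
    print(autorid_domain_ranges("1000000-19999999", 1000000))
```

For the values in the example configuration this yields 19 domain ranges, which agrees with the figure stated in the note.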
7.2.1.2. Alternative Configuration using ad backend
You can also use the idmap_ad module in addition to autorid. The idmap_ad module reads the UNIX IDs from the AD's special UNIX attributes. This has to be configured by the AD domain's administrator before it can be used by Samba and winbind.
To use idmap_ad, the AD domain admin has to prepare the AD domain for using the so-called UNIX extensions and assign UNIX IDs to all users and groups that should be able to access the Samba server.
The following example configures the idmap_ad backend for the ADDOM domain. The default autorid backend catches all objects from domains other than the ADDOM domain.
[global]
netbios name = RHS-SMB
workgroup = ADDOM
realm = addom.example.com
security = ads
clustering = yes
idmap config * : backend = autorid
idmap config * : range = 1000000-1999999
idmap config ADDOM : backend = ad
idmap config ADDOM : range = 3000000-3999999
idmap config ADDOM : schema mode = rfc2307
winbind nss info = rfc2307
# -------------------RHS Options -------------------------------
#
# The following line includes RHS-specific configuration options. Be careful with this line.
include = /etc/samba/rhs-samba.conf
#===================Share Definitions =========================
Note
- The range for the idmap_ad configuration is prescribed by the AD configuration. It has to be obtained from the AD administrator.
- Ranges for different idmap configurations must not overlap.
- The schema mode and the winbind nss info setting should have the same value. If the domain is at level 2003R2 or newer, then rfc2307 is the correct value. For older domains, additional values sfu and sfu20 are available. See the manual pages of idmap_ad and smb.conf for further details.
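The requirement that ranges for different idmap configurations must not overlap can be checked before the configuration is deployed. The following Python sketch is one way to do that; the function names are hypothetical, and the ranges are passed in by hand rather than parsed from smb.conf.

```python
def parse_range(text):
    """Turn a "LOW-HIGH" string into an inclusive (low, high) tuple."""
    low, high = (int(part) for part in text.split("-"))
    return low, high


def overlapping_ranges(configs):
    """Return pairs of idmap domain names whose ID ranges overlap.

    configs maps a domain name (for example "*" or "ADDOM") to a range string.
    """
    items = sorted((parse_range(rng), name) for name, rng in configs.items())
    clashes = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            (_, high1), name1 = items[i]
            (low2, _), name2 = items[j]
            # Items are sorted by lower bound, so an overlap exists exactly
            # when the later range starts at or before the earlier one ends.
            if low2 <= high1:
                clashes.append((name1, name2))
    return clashes


if __name__ == "__main__":
    example = {"*": "1000000-1999999", "ADDOM": "3000000-3999999"}
    print(overlapping_ranges(example))
```

For the two ranges in the example configuration the result is an empty list, meaning that the ranges are disjoint.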
7.2.1.3. Verifying the Samba Configuration
# testparm -s
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Loaded services file OK.
Server role: ROLE_DOMAIN_MEMBER
# Global parameters
[global]
workgroup = ADDOM
realm = addom.example.com
netbios name = RHS-SMB
security = ADS
clustering = Yes
winbind nss info = rfc2307
idmap config addom : schema mode = rfc2307
idmap config addom : range = 3000000-3999999
idmap config addom : backend = ad
idmap config * : range = 1000000-1999999
idmap config * : backend = autorid
7.2.1.4. nsswitch Configuration
Edit the /etc/nsswitch.conf file. Make sure the file contains the winbind entries for the passwd and group databases. For example:
...
passwd: files winbind
group: files winbind
...
visible on the individual cluster node once Samba is joined to AD and winbind is started.
7.2.2. Join Active Directory Domain
On RHEL 7 and RHEL 8, run:
# onnode all systemctl start ctdb
# onnode all systemctl stop winbind
# onnode all systemctl stop smb
On RHEL 6, run:
# onnode all service ctdb start
# onnode all service winbind stop
# onnode all service smb stop
Note
- If your configuration has CTDB managing Winbind and Samba, they can be temporarily disabled with the following commands (to be executed prior to the above stop commands) so as to prevent CTDB going into an unhealthy state when they are shut down:
# onnode all ctdb event script disable legacy 49.winbind
# onnode all ctdb event script disable legacy 50.samba
- For some versions of Red Hat Gluster Storage, a bug in the selinux policy prevents 'ctdb disablescript SCRIPT' from succeeding. If this is the case, 'chmod -x /etc/ctdb/events.d/SCRIPT' can be executed as a workaround from a root shell.
- Shutting down winbind and smb is primarily to prevent access to SMB services during this AD integration. These services may be left running but access to them should be prevented through some other means.
The join is initiated with the net utility from a single node:
Warning
# net ads join -U Administrator
Enter Administrator's password:
Using short domain name -- ADDOM
Joined 'RHS-SMB' to dns domain 'addom.example.com'
Not doing automatic DNS update in a clustered setup.
To register the public IP addresses of the cluster in DNS, the net utility can be used again:
# net ads dns register rhs-smb <PUBLIC IP 1> <PUBLIC IP 2> ...
The DNS name rhs-smb will resolve to the given public IP addresses. The DNS registrations use the cluster machine account for authentication in AD, which means this operation can only be done after the join has succeeded.
7.2.3. Verify/Test Active Directory and Services
On RHEL 7 and RHEL 8, run:
# onnode all systemctl start nmb
# onnode all systemctl start winbind
# onnode all systemctl start smb
On RHEL 6, run:
# onnode all service nmb start
# onnode all service winbind start
# onnode all service smb start
Note
- If you previously disabled CTDB’s ability to manage Winbind and Samba, they can be re-enabled with the following commands:
# onnode all ctdb event script enable legacy 50.samba
# onnode all ctdb event script enable legacy 49.winbind
- With the latest ctdb-4.9.8-105.el7rhgs.x86_64 package, the paths of ctdb managed service scripts have changed. The script files are now available under /etc/ctdb/events/legacy/ after enabling them from /usr/share/ctdb/events/legacy.
- To enable a ctdb event script, execute the following command:
ctdb event script enable legacy 49.winbind
- To enable a ctdb event script on all nodes, execute the following command:
# onnode all ctdb event script enable legacy 49.winbind
- Verify the join by executing the following steps.
Check whether the created machine account can be used to authenticate to the AD LDAP server using the following command:
# net ads testjoin
Join is OK
- Execute the following command to display the machine account’s LDAP object:
# net ads status -P
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: user
objectClass: computer
cn: rhs-smb
distinguishedName: CN=rhs-smb,CN=Computers,DC=addom,DC=example,DC=com
instanceType: 4
whenCreated: 20150922013713.0Z
whenChanged: 20151126111120.0Z
displayName: RHS-SMB$
uSNCreated: 221763
uSNChanged: 324438
name: rhs-smb
objectGUID: a178177e-4aa4-4abc-9079-d1577e137723
userAccountControl: 69632
badPwdCount: 0
codePage: 0
countryCode: 0
badPasswordTime: 130880426605312806
lastLogoff: 0
lastLogon: 130930100623392945
localPolicyFlags: 0
pwdLastSet: 130930098809021309
primaryGroupID: 515
objectSid: S-1-5-21-2562125317-1564930587-1029132327-1196
accountExpires: 9223372036854775807
logonCount: 1821
sAMAccountName: rhs-smb$
sAMAccountType: 805306369
dNSHostName: rhs-smb.addom.example.com
servicePrincipalName: HOST/rhs-smb.addom.example.com
servicePrincipalName: HOST/RHS-SMB
objectCategory: CN=Computer,CN=Schema,CN=Configuration,DC=addom,DC=example,DC=com
isCriticalSystemObject: FALSE
dSCorePropagationData: 16010101000000.0Z
lastLogonTimestamp: 130929563322279307
msDS-SupportedEncryptionTypes: 31
- Execute the following command to display general information about the AD server:
# net ads info
LDAP server: 10.11.12.1
LDAP server name: dc1.addom.example.com
Realm: ADDOM.EXAMPLE.COM
Bind Path: dc=ADDOM,dc=EXAMPLE,dc=COM
LDAP port: 389
Server time: Thu, 26 Nov 2015 11:15:04 UTC
KDC server: 10.11.12.1
Server time offset: -26
- Verify that winbind is operating correctly by executing the following steps.
Execute the following command to verify that winbindd can use the machine account for authentication to AD:
# wbinfo -t
checking the trust secret for domain ADDOM via RPC calls succeeded
- Execute the following command to resolve the given name to a Windows SID:
# wbinfo --name-to-sid 'ADDOM\Administrator'
S-1-5-21-2562125317-1564930587-1029132327-500 SID_USER (1)
- Execute the following command to verify authentication:
# wbinfo -a 'ADDOM\user'
Enter ADDOM\user's password:
plaintext password authentication succeeded
Enter ADDOM\user's password:
challenge/response password authentication succeeded
or,
# wbinfo -a 'ADDOM\user%password'
plaintext password authentication succeeded
challenge/response password authentication succeeded
- Execute the following command to verify that the ID mapping is working properly:
# wbinfo --sid-to-uid <SID-OF-ADMIN>
1000000
- Execute the following command to verify that the winbind Name Service Switch module works correctly:
# getent passwd 'ADDOM\Administrator'
ADDOM\administrator:*:1000000:1000004::/home/ADDOM/administrator:/bin/false
- Execute the following command to verify that Samba can use winbind and the NSS module correctly:
# smbclient -L rhs-smb -U 'ADDOM\Administrator'
Domain=[ADDOM] OS=[Windows 6.1] Server=[Samba 4.2.4]
Sharename Type Comment
--------- ---- -------
IPC$ IPC IPC Service (Samba 4.2.4)
Domain=[ADDOM] OS=[Windows 6.1] Server=[Samba 4.2.4]
Server Comment
--------- -------
RHS-SMB Samba 4.2.4
Workgroup Master
--------- -------
ADDOM RHS-SMB
Part IV. Manage
Chapter 8. Managing Snapshots
Figure 8.1. Snapshot Architecture
- Crash Consistency
A crash consistent snapshot is captured at a particular point in time. When a crash consistent snapshot is restored, the data is identical to what it was at the time the snapshot was taken.
Note
Currently, application level consistency is not supported.
- Online Snapshot
Snapshots are taken online, so the file system and its associated data continue to be available to clients while the snapshot is being taken.
- Barrier
To guarantee crash consistency, some file operations are blocked during a snapshot operation.
These file operations are blocked until the snapshot is complete. All other file operations are passed through. There is a default time-out of 2 minutes; if the snapshot is not complete within that time, these file operations are unbarriered. If the barrier is lifted before the snapshot is complete, the snapshot operation fails. This ensures that the snapshot is in a consistent state.
Note
8.1. Prerequisites
- Snapshot is based on thinly provisioned LVM. Ensure the volume is based on LVM2. Red Hat Gluster Storage is supported on Red Hat Enterprise Linux 6.7 and later, Red Hat Enterprise Linux 7.1 and later, and on Red Hat Enterprise Linux 8.2 and later versions. All these versions of Red Hat Enterprise Linux are based on LVM2 by default. For more information, see https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/thinprovisioned_volumes.html
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See the Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide.
- Each brick must be an independent thinly provisioned logical volume (LV).
- All bricks must be online for snapshot creation.
- The logical volume which contains the brick must not contain any data other than the brick.
- Linear LVM and thin LV are supported with Red Hat Gluster Storage 3.4 and later. For more information, see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html-single/logical_volume_manager_administration/index#LVM_components
- For each volume brick, create a dedicated thin pool that contains the brick of the volume and its (thin) brick snapshots. With the current thin-p design, avoid placing the bricks of different Red Hat Gluster Storage volumes in the same thin pool, as this reduces the performance of snapshot operations, such as snapshot delete, on other unrelated volumes.
- The recommended thin pool chunk size is 256KB. There might be exceptions to this in cases where detailed information about the customer's workload is available.
- The recommended pool metadata size is 0.1% of the thin pool size for a chunk size of 256KB or larger. In special cases, where a chunk size less than 256KB is recommended, use a pool metadata size of 0.5% of the thin pool size.
- Create a physical volume (PV) by using the pvcreate command.
pvcreate /dev/sda1
Use the correct dataalignment option based on your device. For more information, see Section 19.2, “Brick Configuration”.
- Create a Volume Group (VG) from the PV using the following command:
vgcreate dummyvg /dev/sda1
- Create a thin pool using the following command:
# lvcreate --size 1T --thin dummyvg/dummypool --chunksize 256k --poolmetadatasize 16G --zero n
A thin pool of size 1 TB is created, using a chunk size of 256 KB. The maximum pool metadata size of 16 G is used.
- Create a thinly provisioned volume from the previously created pool using the following command:
# lvcreate --virtualsize 1G --thin dummyvg/dummypool --name dummylv
- Create a file system (XFS) on this volume. Use the recommended options to create the XFS file system on the thin LV. For example:
mkfs.xfs -f -i size=512 -n size=8192 /dev/dummyvg/dummylv
- Mount this logical volume and use the mount path as the brick.
mount /dev/dummyvg/dummylv /mnt/brick1
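The pool metadata sizing guideline in the prerequisites (0.1% of the thin pool size for chunk sizes of 256KB or larger, 0.5% for smaller chunk sizes) can be expressed as a small calculation. The Python sketch below is illustrative only: the function name is hypothetical, sizes are plain byte counts, and the result is the guideline value, not the 16 G maximum that the lvcreate example above passes explicitly.

```python
def recommended_metadata_bytes(pool_bytes, chunk_kb):
    """Return the guideline pool metadata size in bytes.

    0.1% of the pool size when the chunk size is 256 KB or larger,
    0.5% of the pool size for smaller chunk sizes.
    """
    per_mille = 1 if chunk_kb >= 256 else 5
    return pool_bytes * per_mille // 1000


if __name__ == "__main__":
    one_tib = 1024 ** 4
    print(recommended_metadata_bytes(one_tib, 256))  # guideline for a 1T pool
    print(recommended_metadata_bytes(one_tib, 128))  # smaller chunk size
```

For a 1T pool with 256 KB chunks, the guideline works out to roughly 1 GiB of metadata.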
8.2. Creating Snapshots
- The Red Hat Gluster Storage volume has to be present and has to be in the Started state.
- All the bricks of the volume have to be on an independent thin logical volume (LV).
- Snapshot names must be unique in the cluster.
- All the bricks of the volume should be up and running, unless it is an n-way replication where n >= 3. In that case, quorum must be met. For more information see Chapter 8, Managing Snapshots
- No other volume operation, such as rebalance or add-brick, should be running on the volume.
- The total number of snapshots in the volume should not be equal to the effective snap-max-hard-limit. For more information see Configuring Snapshot Behavior.
- If you have a geo-replication setup, pause the geo-replication session if it is running, by executing the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL pause
For example,
# gluster volume geo-replication master-vol example.com::slave-vol pause
Pausing geo-replication session between master-vol example.com::slave-vol has been successful
Ensure that you take the snapshot of the master volume first and then take the snapshot of the slave volume.
# gluster snapshot create <snapname> <volname> [no-timestamp] [description <description>] [force]
- snapname - Name of the snapshot that will be created.
- volname - Name of the volume for which the snapshot will be created. Creating a snapshot of only a single volume is supported.
- description - This is an optional field that can be used to provide a description of the snap that will be saved along with the snap.
- force - The behavior of the snapshot creation command remains the same with and without the force option.
- no-timestamp - By default, a timestamp is appended to the snapshot name. If you do not want a timestamp appended, pass no-timestamp as an argument.
Note
activate-on-create parameter to enabled.
# gluster snapshot create snap1 vol1 no-timestamp
snapshot create: success: Snap snap1 created successfully
# gluster snapshot create snap1 vol1
snapshot create: success: Snap snap1_GMT-2015.07.20-10.02.33 created successfully
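The effect of the no-timestamp argument on the final snapshot name can be mimicked in a few lines of Python. This is a sketch that assumes the suffix pattern visible in the example output above (_GMT-YYYY.MM.DD-HH.MM.SS); the function name is hypothetical and the code does not call gluster.

```python
from datetime import datetime, timezone


def snapshot_name(snapname, created, no_timestamp=False):
    """Return the snapshot name as it would appear after creation.

    The suffix pattern is inferred from the example output, not from
    the gluster source code.
    """
    if no_timestamp:
        return snapname
    return snapname + created.strftime("_GMT-%Y.%m.%d-%H.%M.%S")


if __name__ == "__main__":
    when = datetime(2015, 7, 20, 10, 2, 33, tzinfo=timezone.utc)
    print(snapshot_name("snap1", when))
    print(snapshot_name("snap1", when, no_timestamp=True))
```

The suffix has the same shape as the shadow:format value used for Samba shadow copies, which is why timestamped snapshot names can be parsed by that module.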
/var/run/gluster/snaps/<snap-volume-name>/brick<bricknumber>.
For example, a snapshot with the snap volume name 0888649a92ea45db8c00a615dfc5ea35 and having two bricks will have the following two mount points:
/var/run/gluster/snaps/0888649a92ea45db8c00a615dfc5ea35/brick1
/var/run/gluster/snaps/0888649a92ea45db8c00a615dfc5ea35/brick2
These mount points can also be viewed using the df or mount command.
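The mount point pattern for snapshot bricks can be generated programmatically, for example when checking df or mount output in a script. The Python sketch below only builds the expected paths from the pattern given above; the function name is hypothetical and brick numbering is assumed to start at 1, as in the example.

```python
def snapshot_brick_mounts(snap_volume_name, brick_count):
    """Return the expected mount points for the bricks of a snapshot volume."""
    base = "/var/run/gluster/snaps"
    return [f"{base}/{snap_volume_name}/brick{number}"
            for number in range(1, brick_count + 1)]


if __name__ == "__main__":
    for path in snapshot_brick_mounts("0888649a92ea45db8c00a615dfc5ea35", 2):
        print(path)
```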
Note
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL resume
# gluster volume geo-replication master-vol example.com::slave-vol resume
Resuming geo-replication session between master-vol example.com::slave-vol has been successful
8.3. Cloning a Snapshot
# gluster snapshot clone <clonename> <snapname>
Note
- Unlike restoring a snapshot, the original snapshot is still retained after it has been cloned.
- The snapshot should be in the activated state and all the snapshot bricks should be running before taking a clone. Also, the server nodes should be in quorum.
- This is a space-efficient clone, so both the clone (new volume) and the snapshot share the same LVM backend. The space consumption of the LVM grows as the new volume (clone) diverges from the snapshot.
# gluster snapshot clone clone_vol snap1
snapshot clone: success: Clone clone_vol created successfully
# gluster vol info <clonename>
# gluster vol info clone_vol
Volume Name: clone_vol
Type: Distribute
Volume ID: cdd59995-9811-4348-8e8d-988720db3ab9
Status: Created
Number of Bricks: 1
Transport-type: tcp
Bricks:
Brick1: 10.00.00.01:/var/run/gluster/snaps/clone_vol/brick1/brick3
Options Reconfigured:
performance.readdir-ahead: on
The cloned volume is in the Created state, similar to a newly created volume. This volume must be explicitly started before it can be used.
8.4. Listing of Available Snapshots
# gluster snapshot list [VOLNAME]
- VOLNAME - This is an optional field and if provided lists the snapshot names of all snapshots present in the volume.
# gluster snapshot list
snap3
# gluster snapshot list test_vol
No snapshots present
8.5. Getting Information of all the Available Snapshots
# gluster snapshot info [(<snapname> | volume VOLNAME)]
- snapname - This is an optional field. If the snapname is provided then the information about the specified snap is displayed.
- VOLNAME - This is an optional field. If the VOLNAME is provided the information about all the snaps in the specified volume is displayed.
# gluster snapshot info snap3
Snapshot : snap3
Snap UUID : b2a391ce-f511-478f-83b7-1f6ae80612c8
Created : 2014-06-13 09:40:57
Snap Volumes:
Snap Volume Name : e4a8f4b70a0b44e6a8bff5da7df48a4d
Origin Volume name : test_vol1
Snaps taken for test_vol1 : 1
Snaps available for test_vol1 : 255
Status : Started
8.6. Getting the Status of Available Snapshots
# gluster snapshot status [(<snapname> | volume VOLNAME)]
- snapname - This is an optional field. If the snapname is provided, the status of the specified snap is displayed.
- VOLNAME - This is an optional field. If the VOLNAME is provided, the status of all the snaps in the specified volume is displayed.
# gluster snapshot status snap3
Snap Name : snap3
Snap UUID : b2a391ce-f511-478f-83b7-1f6ae80612c8

Brick Path : 10.70.42.248:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick1/brick1
Volume Group : snap_lvgrp1
Brick Running : Yes
Brick PID : 1640
Data Percentage : 1.54
LV Size : 616.00m

Brick Path : 10.70.43.139:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick2/brick3
Volume Group : snap_lvgrp1
Brick Running : Yes
Brick PID : 3900
Data Percentage : 1.80
LV Size : 616.00m

Brick Path : 10.70.43.34:/var/run/gluster/snaps/e4a8f4b70a0b44e6a8bff5da7df48a4d/brick3/brick4
Volume Group : snap_lvgrp1
Brick Running : Yes
Brick PID : 3507
Data Percentage : 1.80
LV Size : 616.00m
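Because cloning requires every snapshot brick to be running, it can help to scan the status output for bricks that are down. The following is a minimal Python sketch, assuming the output keeps the "Key : Value" layout of the sample above; the function names are illustrative and are not part of any Gluster tooling.

```python
def parse_snapshot_status(text):
    """Split `gluster snapshot status` output into one dict per brick."""
    bricks = []
    for line in text.splitlines():
        # Keys never contain a colon, so the first colon is the separator.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines or anything that is not "Key : Value"
        key, value = key.strip(), value.strip()
        if key == "Brick Path":
            bricks.append({})  # a "Brick Path" line starts a new brick record
        if bricks:
            bricks[-1][key] = value
    return bricks


def bricks_not_running(text):
    """Return the brick paths whose 'Brick Running' field is not 'Yes'."""
    return [
        brick["Brick Path"]
        for brick in parse_snapshot_status(text)
        if brick.get("Brick Running") != "Yes"
    ]
```

An empty list from `bricks_not_running` suggests that all bricks listed in the output are up; the snapshot-level lines (Snap Name, Snap UUID) are ignored by this sketch.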
Note
8.7. Configuring Snapshot Behavior
snap-max-hard-limit
: If the snapshot count in a volume reaches this limit, no further snapshot creation is allowed. The range is from 1 to 256. Once this limit is reached, you must remove snapshots before you can create new ones. This limit can be set for the system or per volume. If both the system limit and the volume limit are configured, the effective maximum limit is the lower of the two values.
snap-max-soft-limit
: This is a percentage value. The default value is 90%. This setting works together with the auto-delete feature. If auto-delete is enabled, the oldest snapshot is deleted when the snapshot count in a volume crosses this limit. If auto-delete is disabled, no snapshot is deleted, but a warning message is displayed to the user.
auto-delete
: Enables or disables the auto-delete feature. By default, auto-delete is disabled. When enabled, the oldest snapshot is deleted when the snapshot count in a volume crosses the snap-max-soft-limit. When disabled, no snapshot is deleted, but a warning message is displayed to the user.
activate-on-create
: Snapshots are not activated at creation time by default. To activate snapshots immediately after creation, set the activate-on-create parameter to enable. Note that all volumes are affected by this setting.
- Displaying the Configuration Values
To display the existing configuration values for a volume or the entire cluster, run the following command:
# gluster snapshot config [VOLNAME]
where:
- VOLNAME: This is an optional field. The name of the volume for which the configuration values are to be displayed.
If the volume name is not provided, the configuration values of all the volumes are displayed. System configuration details are displayed irrespective of whether the volume name is specified.
For example:
# gluster snapshot config
Snapshot System Configuration:
snap-max-hard-limit : 256
snap-max-soft-limit : 90%
auto-delete : disable
activate-on-create : disable

Snapshot Volume Configuration:
Volume : test_vol
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)

Volume : test_vol1
snap-max-hard-limit : 256
Effective snap-max-hard-limit : 256
Effective snap-max-soft-limit : 230 (90%)
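The effective values in the output above can be reproduced with a little arithmetic. The following Python sketch assumes the effective hard limit is the lower of the system and volume limits (as described for snap-max-hard-limit) and that the soft limit count is the percentage of the hard limit rounded down, which matches the 230 (90%) shown for a limit of 256 but is otherwise an assumption. The function name is illustrative.

```python
def effective_limits(system_hard, volume_hard=None, soft_percent=90):
    """Return (effective hard limit, effective soft limit as a snapshot count)."""
    # The effective hard limit is the lower of the system and volume limits.
    hard = system_hard if volume_hard is None else min(system_hard, volume_hard)
    # Assumption: the soft limit count is rounded down (256 * 90% -> 230).
    soft = hard * soft_percent // 100
    return hard, soft
```

For example, `effective_limits(256)` gives `(256, 230)`, and a volume limit of 100 under a system limit of 256 gives `(100, 90)`.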
- Changing the Configuration Values
To change the existing configuration values, run the following command:
# gluster snapshot config [VOLNAME] ([snap-max-hard-limit <count>] [snap-max-soft-limit <percent>]) | ([auto-delete <enable|disable>]) | ([activate-on-create <enable|disable>])
where:
- VOLNAME: This is an optional field. The name of the volume for which the configuration values are to be changed. If the volume name is not provided, then running the command will set or change the system limit.
- snap-max-hard-limit: Maximum hard limit for the system or the specified volume.
- snap-max-soft-limit: Soft limit mark for the system.
- auto-delete: This enables or disables the auto-delete feature. By default auto-delete is disabled.
- activate-on-create: This enables or disables the activate-on-create feature for all volumes. By default activate-on-create is disabled.
For example:
# gluster snapshot config test_vol snap-max-hard-limit 100
Changing snapshot-max-hard-limit will lead to deletion of snapshots if they exceed the new limit.
Do you want to continue? (y/n) y
snapshot config: snap-max-hard-limit for test_vol set successfully
8.8. Activating and Deactivating a Snapshot
# gluster snapshot activate <snapname> [force]
- snapname: Name of the snap to be activated.
- force: If some of the bricks of the snapshot volume are down, use the force option to start them.
# gluster snapshot activate snap1
# gluster snapshot deactivate <snapname>
- snapname: Name of the snap to be deactivated.
# gluster snapshot deactivate snap1
8.9. Deleting Snapshot
- Snapshot with the specified name should be present.
- Red Hat Gluster Storage nodes should be in quorum.
- No volume operation (e.g. add-brick, rebalance, etc) should be running on the original / parent volume of the snapshot.
# gluster snapshot delete <snapname>
- snapname - The name of the snapshot to be deleted.
# gluster snapshot delete snap2
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: snap2: snap removed successfully
Note
8.9.1. Deleting Multiple Snapshots
# gluster snapshot delete all
# gluster snapshot delete volume <volname>
8.10. Restoring Snapshot
- The specified snapshot has to be present
- The original / parent volume of the snapshot has to be in a stopped state.
- Red Hat Gluster Storage nodes have to be in quorum.
- No volume operation (e.g. add-brick, rebalance, etc) should be running on the origin or parent volume of the snapshot.
# gluster snapshot restore <snapname>
where,
- snapname - The name of the snapshot to be restored.
For example:
# gluster snapshot restore snap1
Snapshot restore: snap1: Snap restored successfully
After the snapshot is restored and the volume is started, trigger a self-heal by running the following command:
# gluster volume heal VOLNAME full
Note
- The snapshot is deleted once it is restored. To be able to restore to the same point again, take a new snapshot explicitly after restoring.
- After a restore, the brick path of the original volume changes. If you use fstab to mount the bricks of the origin volume, you must fix the fstab entries after the restore. For more information, see https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Installation_Guide/apcs04s07.html
- In the cluster, identify the nodes participating in the snapshot with the snapshot status command. For example:
# gluster snapshot status snapname
Snap Name : snapname
Snap UUID : bded7c02-8119-491b-a7e1-cc8177a5a1cd

Brick Path : 10.70.43.46:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick2/brick2
Volume Group : snap_lvgrp
Brick Running : Yes
Brick PID : 8303
Data Percentage : 0.43
LV Size : 2.60g

Brick Path : 10.70.42.33:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick3/brick3
Volume Group : snap_lvgrp
Brick Running : Yes
Brick PID : 4594
Data Percentage : 42.63
LV Size : 2.60g

Brick Path : 10.70.42.34:/var/run/gluster/snaps/816e8403874f43a78296decd7c127205/brick4/brick4
Volume Group : snap_lvgrp
Brick Running : Yes
Brick PID : 23557
Data Percentage : 12.41
LV Size : 2.60g
- On the nodes identified above, check whether the geo-replication repository is present in /var/lib/glusterd/snaps/snapname. If the repository is present on any of the nodes, ensure that it is present in /var/lib/glusterd/snaps/snapname throughout the cluster. If the geo-replication repository is missing on any node in the cluster, copy it to /var/lib/glusterd/snaps/snapname on that node.
- Restore snapshot of the volume using the following command:
# gluster snapshot restore snapname
If you have a geo-replication setup, then perform the following steps to restore snapshot:
- Stop the geo-replication session.
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL stop
- Stop the slave volume and then the master volume.
# gluster volume stop VOLNAME
- Restore snapshot of the slave volume and the master volume.
# gluster snapshot restore snapname
- Start the slave volume first and then the master volume.
# gluster volume start VOLNAME
- Start the geo-replication session.
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
- Resume the geo-replication session.
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL resume
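The ordering in the geo-replication restore steps matters: the slave is stopped, restored, and started before the master. The Python sketch below only assembles the command strings from the steps above in that order, so the sequence can be reviewed before anything is run. It assumes the master and slave volumes each have their own snapshot name, and it does not decide where each command runs (commands for the slave volume would normally be issued on the slave cluster). The function name and parameters are illustrative.

```python
def georep_restore_commands(master_vol, slave_host, slave_vol,
                            master_snap, slave_snap):
    """Return the restore commands for a geo-replication setup, in order."""
    session = f"gluster volume geo-replication {master_vol} {slave_host}::{slave_vol}"
    return [
        f"{session} stop",                         # stop the session
        f"gluster volume stop {slave_vol}",        # slave volume first ...
        f"gluster volume stop {master_vol}",       # ... then the master volume
        f"gluster snapshot restore {slave_snap}",  # restore the slave snapshot
        f"gluster snapshot restore {master_snap}", # restore the master snapshot
        f"gluster volume start {slave_vol}",       # start the slave first ...
        f"gluster volume start {master_vol}",      # ... then the master
        f"{session} start",                        # start the session
        f"{session} resume",                       # resume the session
    ]
```

Printing the list, for example with `print("\n".join(...))`, gives a checklist that mirrors the steps above.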
8.11. Accessing Snapshots
mount -t glusterfs <hostname>:/snaps/<snapname>/parent-VOLNAME /mount_point
- parent-VOLNAME - Volume name for which we have created the snapshot.
For example,
# mount -t glusterfs myhostname:/snaps/snap1/test_vol /mnt
Note
Warning
8.12. Scheduling of Snapshots
8.12.1. Prerequisites
- To initialize snapshot scheduler on all the nodes of the cluster, execute the following command:
snap_scheduler.py init
This command initializes the snap_scheduler and interfaces it with the crond running on the local node. This is the first step, before executing any scheduling related commands from a node.
Note
This command has to be run on all the nodes participating in the scheduling. Other options can be run independently from any node, where initialization has been successfully completed.
- A shared storage named gluster_shared_storage is used across nodes to co-ordinate the scheduling operations. This shared storage is mounted at /var/run/gluster/shared_storage on all the nodes. For more information see, Section 11.12, “Setting up Shared Storage Volume”
Note
With the release of 3.5 Batch Update 3, the mount point of shared storage is changed from /var/run/gluster/ to /run/gluster/ .
- All nodes in the cluster have their times synced using NTP or any other mechanism. This is a hard requirement for this feature to work.
- If you are on Red Hat Enterprise Linux 7.1 or later, set the cron_system_cronjob_use_shares boolean to on by running the following command:
# setsebool -P cron_system_cronjob_use_shares on
8.12.2. Snapshot Scheduler Options
Note
To enable snap scheduler, execute the following command:
snap_scheduler.py enable
Note
# snap_scheduler.py enable
snap_scheduler: Snapshot scheduling is enabled
To disable snap scheduler, execute the following command:
snap_scheduler.py disable
# snap_scheduler.py disable
snap_scheduler: Snapshot scheduling is disabled
To display the current status (Enabled/Disabled) of the snap scheduler, execute the following command:
snap_scheduler.py status
# snap_scheduler.py status
snap_scheduler: Snapshot scheduling status: Disabled
To add a snapshot schedule, execute the following command:
snap_scheduler.py add "Job Name" "Schedule" "Volume Name"
Example of job definition:
.---------------- minute (0 - 59)
|  .------------- hour (0 - 23)
|  |  .---------- day of month (1 - 31)
|  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
|  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
|  |  |  |  |
*  *  *  *  *  user-name  command to be executed
# snap_scheduler.py add "Job1" "* * * * *" test_vol
snap_scheduler: Successfully added snapshot schedule
Note
Scheduled-Job1-test_vol_GMT-2015.06.19-09.47.01
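The scheduled snapshot name shown above ends in a GMT timestamp, which is useful when sorting or pruning scheduled snapshots. The following Python sketch extracts that timestamp. It assumes only the `_GMT-YYYY.MM.DD-HH.MM.SS` suffix visible in the example name and does not try to split the job name from the volume name, since either may contain hyphens. The function name is illustrative.

```python
import re
from datetime import datetime, timezone

# Matches the trailing "_GMT-2015.06.19-09.47.01" style suffix.
_STAMP = re.compile(r"_GMT-(\d{4}\.\d{2}\.\d{2}-\d{2}\.\d{2}\.\d{2})$")


def snapshot_timestamp(snap_name):
    """Return the UTC creation time encoded in a snapshot name, or None."""
    match = _STAMP.search(snap_name)
    if match is None:
        return None  # the name carries no GMT suffix
    parsed = datetime.strptime(match.group(1), "%Y.%m.%d-%H.%M.%S")
    return parsed.replace(tzinfo=timezone.utc)
```

Names without the suffix, such as manually created snapshots named `snap1`, return `None`.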
To edit an existing snapshot schedule, execute the following command:
snap_scheduler.py edit "Job Name" "Schedule" "Volume Name"
Example of job definition:
.---------------- minute (0 - 59)
|  .------------- hour (0 - 23)
|  |  .---------- day of month (1 - 31)
|  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
|  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
|  |  |  |  |
*  *  *  *  *  user-name  command to be executed
# snap_scheduler.py edit "Job1" "*/5 * * * *" gluster_shared_storage
snap_scheduler: Successfully edited snapshot schedule
To list the existing snapshot schedule, execute the following command:
snap_scheduler.py list
# snap_scheduler.py list
JOB_NAME         SCHEDULE         OPERATION        VOLUME NAME
--------------------------------------------------------------------
Job0             * * * * *        Snapshot Create  test_vol
To delete an existing snapshot schedule, execute the following command:
snap_scheduler.py delete "Job Name"
# snap_scheduler.py delete Job1
snap_scheduler: Successfully deleted snapshot schedule
8.13. User Serviceable Snapshots
test.txt
which was in the Home directory a couple of months earlier and was deleted accidentally. You can now easily go to the virtual .snaps
directory that is inside the home directory and recover the test.txt file using the cp
command.
Note
- User Serviceable Snapshot is not the recommended option for bulk data access from an earlier snapshot volume. For such scenarios, mount the snapshot volume and then access the data. For more information, see Chapter 8, Managing Snapshots.
- Each activated snapshot volume, when initialized by User Serviceable Snapshots, consumes some memory. Most of the memory is consumed by various housekeeping structures of gfapi and xlators like DHT, AFR, etc. The total memory consumption by a snapshot therefore depends on the number of bricks as well. Each brick consumes approximately 10MB of space. For example, in a 4x3 replica setup the total memory consumed by a snapshot is around 50MB, and for a 6x3 setup it is roughly 90MB. As the number of active snapshots grows, the total memory footprint of the snapshot daemon (snapd) also grows. In a low-memory system, the snapshot daemon can therefore get OOM killed if there are too many active snapshots.
8.13.1. Enabling and Disabling User Serviceable Snapshot
# gluster volume set VOLNAME features.uss enable
# gluster volume set test_vol features.uss enable volume set: success
# gluster snapshot activate <snapshot-name>
# gluster volume set VOLNAME features.uss disable
# gluster volume set test_vol features.uss disable volume set: success
8.13.2. Viewing and Retrieving Snapshots using NFS / FUSE
.snaps
directory of every directory of the mounted volume.
Note
# mount -t nfs -o vers=3 server1:/test-vol /mnt/glusterfs
# mount -t glusterfs server1:/test-vol /mnt/glusterfs
The .snaps directory is a virtual directory that is not listed by either the ls command or the ls -a command. The .snaps directory contains every snapshot taken for the given volume as an individual directory. Each of these snapshot entries in turn contains the data of the directory the user is accessing, as it was when the snapshot was taken.
- Go to the folder where the file was present when the snapshot was taken. For example, if you had a test.txt file in the root directory of the mount that has to be recovered, then go to that directory.
# cd /mnt/glusterfs
Note
Since every directory has a virtual .snaps directory, you can enter the .snaps directory from here. Since .snaps is a virtual directory, the ls and ls -a commands will not list the .snaps directory. For example:
# ls -a
. .. Bob John test1.txt test2.txt
- Go to the .snaps folder:
# cd .snaps
- Run the ls command to list all the snaps. For example:
# ls -p
snapshot_Dec2014/ snapshot_Nov2014/ snapshot_Oct2014/ snapshot_Sept2014/
- Go to the snapshot directory from where the file has to be retrieved. For example:
# cd snapshot_Nov2014
# ls -p
John/ test1.txt test2.txt
- Copy the file/directory to the desired location.
# cp -p test2.txt $HOME
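The same recovery can be scripted. The Python sketch below builds the path inside the virtual .snaps directory and copies the file while preserving its metadata, similar to `cp -p`. It assumes the volume is mounted with User Serviceable Snapshots enabled, so that the `.snaps` path resolves even though it is not listed; the function name and arguments are illustrative.

```python
import shutil
from pathlib import Path


def recover_from_snapshot(directory, snapshot, name, destination):
    """Copy `name` as it existed in `snapshot` from `directory` to `destination`."""
    # .snaps is virtual: it is not listed, but it can be addressed by path.
    source = Path(directory) / ".snaps" / snapshot / name
    if not source.is_file():
        raise FileNotFoundError(f"{source} is not a file in snapshot {snapshot}")
    # copy2 preserves timestamps and permission bits, much like `cp -p`.
    return Path(shutil.copy2(source, destination))
```

For example, `recover_from_snapshot("/mnt/glusterfs", "snapshot_Nov2014", "test2.txt", Path.home())` mirrors the manual steps above.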
8.13.3. Viewing and Retrieving Snapshots using CIFS for Windows Client
.snaps
folder of every folder in the root of the CIFS share. The .snaps
folder is a hidden folder which will be displayed only when the following option is set to ON
on the volume using the following command:
# gluster volume set volname features.show-snapshot-directory on
ON
, every Windows client can access the .snaps
folder by following these steps:
- In the Folder options, enable the Show hidden files, folders, and drives option.
- Go to the root of the CIFS share to view the .snaps folder.
Note
The .snaps folder is accessible only in the root of the CIFS share and not in any sub folders.
- The list of snapshots is available in the .snaps folder. You can now access the required file and retrieve it.
8.14. Troubleshooting Snapshots
- Situation
Snapshot creation fails.
Step 1
Check if the bricks are thinly provisioned by following these steps:
- Execute the mount command and check the device name mounted on the brick path. For example:
# mount
/dev/mapper/snap_lvgrp-snap_lgvol on /rhgs/brick1 type xfs (rw)
/dev/mapper/snap_lvgrp1-snap_lgvol1 on /rhgs/brick2 type xfs (rw)
- Run the following command to check if the device has a LV pool name.
lvs device-name
For example:
# lvs -o pool_lv /dev/mapper/snap_lvgrp-snap_lgvol
Pool
snap_thnpool
If the Pool field is empty, then the brick is not thinly provisioned.
- Ensure that the brick is thinly provisioned, and retry the snapshot create command.
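The thin-provisioning check in Step 1 comes down to reading one field. The Python sketch below interprets captured `lvs -o pool_lv` output, assuming the two-line layout from the example above (a `Pool` header followed by the pool name, or nothing for a thick volume). The function name is illustrative.

```python
def thin_pool_name(lvs_output):
    """Return the thin pool name from `lvs -o pool_lv` output, or None."""
    # Drop blank lines; an empty Pool field leaves only the header behind.
    lines = [line.strip() for line in lvs_output.splitlines() if line.strip()]
    if not lines or lines[0] != "Pool":
        raise ValueError("expected a 'Pool' header line")
    return lines[1] if len(lines) > 1 else None
```

A return value of `None` corresponds to an empty Pool field, meaning the brick is not thinly provisioned.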
Step 2
Check if the bricks are down by following these steps:
- Execute the following command to check the status of the volume:
# gluster volume status VOLNAME
- If any bricks are down, then start the bricks by executing the following command:
# gluster volume start VOLNAME force
- To verify if the bricks are up, execute the following command:
# gluster volume status VOLNAME
- Retry the snapshot create command.
Step 3
Check if the node is down by following these steps:
- Execute the following command to check the status of the nodes:
# gluster volume status VOLNAME
- If a brick is not listed in the status, then execute the following command:
# gluster pool list
- If the status of the node hosting the missing brick is Disconnected, then power-up the node.
- Retry the snapshot create command.
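Finding disconnected nodes can be automated from the `gluster pool list` output. The Python sketch below assumes the output has a header row followed by three whitespace-separated columns (UUID, Hostname, State); that layout is not shown in this guide, so verify it against your own output before relying on it. The function name is illustrative.

```python
def disconnected_peers(pool_list_output):
    """Return hostnames whose State column is not 'Connected'."""
    peers = []
    for line in pool_list_output.splitlines():
        fields = line.split()
        # Assumed layout: UUID, Hostname, State. Skip the header row
        # and any line that does not have exactly three columns.
        if len(fields) != 3 or fields[0] == "UUID":
            continue
        _uuid, hostname, state = fields
        if state != "Connected":
            peers.append(hostname)
    return peers
```

An empty result suggests that every peer in the listing is connected.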
Step 4
Check if rebalance is in progress by following these steps:
- Execute the following command to check the rebalance status:
gluster volume rebalance VOLNAME status
- If rebalance is in progress, wait for it to finish.
- Retry the snapshot create command.
- Situation
Snapshot delete fails.
Step 1
Check if the server quorum is met by following these steps:
- Execute the following command to check the peer status:
# gluster pool list
- If nodes are down, and the cluster is not in quorum, then power up the nodes.
- To verify if the cluster is in quorum, execute the following command:
# gluster pool list
- Retry the snapshot delete command.
- Situation
Snapshot delete command fails on some node(s) during commit phase, leaving the system inconsistent.
Solution
- Identify the node(s) where the delete command failed. This information is available in the delete command's error output. For example:
# gluster snapshot delete snapshot1
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Commit failed on 10.00.00.02. Please check log file for details.
Snapshot command failed
- On the node where the delete command failed, bring down glusterd using the following command:
On RHEL 7 and RHEL 8, run
# systemctl stop glusterd
On RHEL 6, run
# service glusterd stop
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide
- Delete that particular snaps repository in /var/lib/glusterd/snaps/ from that node. For example:
# rm -rf /var/lib/glusterd/snaps/snapshot1
- Start glusterd on that node using the following command:
On RHEL 7 and RHEL 8, run
# systemctl start glusterd
On RHEL 6, run
# service glusterd start
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide
- Repeat the 2nd, 3rd, and 4th steps on all the nodes where the commit failed as identified in the 1st step.
- Retry deleting the snapshot. For example:
# gluster snapshot delete snapshot1
- Situation
Snapshot restore fails.
Step 1
Check if the server quorum is met by following these steps:
- Execute the following command to check the peer status:
# gluster pool list
- If nodes are down, and the cluster is not in quorum, then power up the nodes.
- To verify if the cluster is in quorum, execute the following command:
# gluster pool list
- Retry the snapshot restore command.
Step 2
Check if the volume is in Stopped state by following these steps:
- Execute the following command to check the volume info:
# gluster volume info VOLNAME
- If the volume is in Started state, then stop the volume using the following command:
# gluster volume stop VOLNAME
- Retry the snapshot restore command.
- Situation
Snapshot commands fail.
Step 1
Check if there is a mismatch in the operating versions by following these steps:
- Open the following file and check for the operating version:
/var/lib/glusterd/glusterd.info
If the operating-version is lower than 30000, then the snapshot commands are not supported in the version the cluster is operating on.
- Upgrade all nodes in the cluster to Red Hat Gluster Storage 3.2 or higher.
- Retry the snapshot command.
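The operating-version check can be scripted against the contents of glusterd.info. The Python sketch below assumes the file consists of `key=value` lines with an `operating-version` key, and uses the 30000 threshold stated above. The function and constant names are illustrative.

```python
SNAPSHOT_MIN_OP_VERSION = 30000  # threshold stated in the step above


def operating_version(glusterd_info_text):
    """Return the operating-version from glusterd.info contents, or None."""
    for line in glusterd_info_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "operating-version":
            return int(value.strip())
    return None


def snapshots_supported(glusterd_info_text):
    """True if the operating version is at or above the snapshot threshold."""
    version = operating_version(glusterd_info_text)
    return version is not None and version >= SNAPSHOT_MIN_OP_VERSION
```

To use it, read the text of /var/lib/glusterd/glusterd.info on each node and pass it to `snapshots_supported`.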
- Situation
After rolling upgrade, snapshot feature does not work.
Solution
You must make the following changes on the cluster to enable snapshot:
- Restart the volume using the following commands.
# gluster volume stop VOLNAME
# gluster volume start VOLNAME
- Restart glusterd services on all nodes.
On RHEL 7 and RHEL 8, run
# systemctl restart glusterd
On RHEL 6, run
# service glusterd restart
Important
Red Hat Gluster Storage is not supported on Red Hat Enterprise Linux 6 (RHEL 6) from 3.5 Batch Update 1 onwards. See Version Details table in section Red Hat Gluster Storage Software Components and Versions of the Installation Guide
Chapter 9. Managing Directory Quotas
Warning
9.1. Enabling and Disabling Quotas
# gluster volume quota VOLNAME enable
Note
# gluster volume quota VOLNAME disable
Important
quota-remove-xattr.sh
. If you re-enable quotas while the cleanup process is still running, the extended attributes that enable quotas may be removed by the cleanup process. This has negative effects on quota accounting.
9.2. Before Setting a Quota on a Directory
- When specifying a directory to limit with the gluster volume quota command, the directory's path is relative to the Red Hat Gluster Storage volume mount point, not the root directory of the server or client on which the volume is mounted. That is, if the Red Hat Gluster Storage volume is mounted at /mnt/glusterfs and you want to place a limit on the /mnt/glusterfs/dir directory, use /dir as the path when you run the gluster volume quota command, like so:
# gluster volume quota VOLNAME limit-usage /dir hard_limit
- Ensure that at least one brick is available per replica set when you run the gluster volume quota command. A brick is available if a Y appears in the Online column of gluster volume status command output, like so:
# gluster volume status VOLNAME
Status of volume: VOLNAME
Gluster process                     Port    Online  Pid
------------------------------------------------------------
Brick arch:/export/rep1             24010   Y       18474
Brick arch:/export/rep2             24011   Y       18479
NFS Server on localhost             38467   Y       18486
Self-heal Daemon on localhost       N/A     Y       18491
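The Online column check can be done programmatically. The Python sketch below assumes brick rows have exactly the five columns shown in the sample above (the word Brick, the brick, Port, Online, Pid); other releases may print additional port columns, in which case the parsing would need adjusting. It reports offline bricks only and does not group them by replica set. The function name is illustrative.

```python
def offline_bricks(status_output):
    """Return bricks whose Online column is not 'Y'."""
    offline = []
    for line in status_output.splitlines():
        fields = line.split()
        # Assumed brick row layout: Brick HOST:/PATH PORT ONLINE PID
        if len(fields) == 5 and fields[0] == "Brick":
            _label, brick, _port, online, _pid = fields
            if online != "Y":
                offline.append(brick)
    return offline
```

Rows for the NFS server and self-heal daemon are skipped because they do not start with a single `Brick` field followed by four columns.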
9.3. Limiting Disk Usage
9.3.1. Setting Disk Usage Limits
# gluster volume quota VOLNAME limit-usage path hard_limit
/dir
directory on the data
volume to 100 GB, run the following command:
# gluster volume quota data limit-usage /dir 100GB
/dir
directory and all files and directories underneath it from containing more than 100 GB of data cumulatively.
data
volume to 1 TB, set a 1 TB limit on the root directory of the volume, like so:
# gluster volume quota data limit-usage / 1TB
# gluster volume quota data limit-usage / 1TB 75
/var/log/glusterfs/bricks/BRICKPATH.log
.
default-soft-limit
subcommand. For example, to set a default soft limit of 90% on the data volume, run the following command:
# gluster volume quota data default-soft-limit 90
# gluster volume quota VOLNAME list
limit-usage
subcommand.
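The commands in this section combine a hard limit such as 100GB or 1TB with a soft limit expressed as a percentage. To see what a given pair means in bytes, the Python sketch below converts the size string and applies the percentage. It assumes binary units (1GB = 1024^3 bytes) and that the soft limit is a percentage of the hard limit; both are assumptions for illustration, and the function names are not part of the gluster CLI.

```python
# Assumption: sizes use binary multiples (1KB = 1024 bytes).
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4, "PB": 1024**5}


def parse_size(text):
    """Convert a size such as '100GB' or '1TB' to bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # no suffix: treat the value as bytes


def soft_limit_bytes(hard_limit, soft_percent):
    """Return the soft limit in bytes for a hard limit and a percentage."""
    return parse_size(hard_limit) * soft_percent // 100
```

For example, a 1TB hard limit with a soft limit of 75 corresponds to three quarters of 1024^4 bytes under these assumptions.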
9.3.2. Viewing Current Disk Usage Limits
# gluster volume quota VOLNAME list
# gluster volume quota test-volume list
Path          Hard-limit   Soft-limit   Used     Available
--------------------------------------------------------
/             50GB         75%          0Bytes   50.0GB
/dir          10GB         75%          0Bytes   10.0GB
/dir/dir2     20GB         90%          0Bytes   20.0GB
# gluster volume quota VOLNAME list /<directory_name>
# gluster volume quota test-volume list /dir
Path          Hard-limit   Soft-limit   Used     Available
-------------------------------------------------
/dir          10.0GB       75%          0Bytes   10.0GB
# gluster volume quota VOLNAME list DIR1 DIR2
9.3.2.1. Viewing Quota Limit Information Using the df
Utility
df
utility does not take quota limits into account when reporting disk usage. This means that clients accessing directories see the total space available to the volume, rather than the total space allotted to their directory by quotas. You can configure a volume to display the hard quota limit as the total disk space instead by setting quota-deem-statfs
parameter to on
.
quota-deem-statfs
parameter to on
, run the following command:
# gluster volume set VOLNAME quota-deem-statfs on
df
to display the hard quota limit as the total disk space for a client.
quota-deem-statfs
is set to off
:
# df -hT /home
Filesystem            Type            Size  Used  Avail  Use%  Mounted on
server1:/test-volume  fuse.glusterfs  400G  12G   389G   3%    /home
quota-deem-statfs
is set to on
:
# df -hT /home
Filesystem            Type            Size  Used  Avail  Use%  Mounted on
server1:/test-volume  fuse.glusterfs  300G  12G   289G   4%    /home
9.3.3. Setting Quota Check Frequency (Timeouts)
soft-timeout
parameter specifies how often Red Hat Gluster Storage checks space usage when usage has, so far, been below the soft limit set on the directory or volume. The default soft timeout frequency is every 60
seconds.
# gluster volume quota VOLNAME soft-timeout seconds
hard-timeout
parameter specifies how often Red Hat Gluster Storage checks space usage when usage is greater than the soft limit set on the directory or volume. The default hard timeout frequency is every 5
seconds.
# gluster volume quota VOLNAME hard-timeout seconds
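The relationship between the two timeouts can be summarized as a small decision: the soft timeout applies while usage is below the soft limit, and the hard timeout applies once usage exceeds it. The Python sketch below models only that selection, using the defaults of 60 and 5 seconds stated above. Treating usage exactly equal to the soft limit as "not exceeded" is an assumption, and the function name is illustrative.

```python
def quota_check_interval(used_bytes, soft_limit_bytes,
                         soft_timeout=60, hard_timeout=5):
    """Return how often (in seconds) usage is checked for a directory."""
    # Above the soft limit, checks become more frequent (hard-timeout).
    if used_bytes > soft_limit_bytes:
        return hard_timeout
    return soft_timeout
```

With the defaults, a directory below its soft limit is checked every 60 seconds, and one above it every 5 seconds.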
Important
9.3.4. Setting Logging Frequency (Alert Time)
alert-time
parameter configures how frequently usage information is logged after the soft limit has been reached. You can configure alert-time
with the following command:
# gluster volume quota VOLNAME alert-time time
1w
).
Unit of time | Format 1 | Format 2 |
---|---|---|
Second(s) | [integer]s | [integer]sec |
Minute(s) | [integer]m | [integer]min |
Hour(s) | [integer]h | [integer]hr |
Day(s) | [integer]d | [integer]days |
Week(s) | [integer]w | [integer]wk |
# gluster volume quota test-vol alert-time 10m
# gluster volume quota test-vol alert-time 10days
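The two accepted formats per unit in the table above can be normalized to seconds, which is handy when comparing alert-time settings across volumes. The Python sketch below accepts exactly the suffixes listed in the table; the function name is illustrative and not part of the gluster CLI.

```python
import re

# Suffixes from the table above, mapped to seconds.
_UNIT_SECONDS = {
    "s": 1, "sec": 1,
    "m": 60, "min": 60,
    "h": 3600, "hr": 3600,
    "d": 86400, "days": 86400,
    "w": 604800, "wk": 604800,
}
_ALERT_TIME = re.compile(r"(\d+)([a-z]+)")


def alert_time_seconds(value):
    """Convert an alert-time value such as '10m' or '10days' to seconds."""
    match = _ALERT_TIME.fullmatch(value.strip())
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unsupported alert-time value: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```

The two example commands above correspond to 600 seconds (`10m`) and 864000 seconds (`10days`).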
9.3.5. Removing Disk Usage Limits
# gluster volume quota VOLNAME remove DIR
# gluster volume quota test-volume remove /data
volume quota : success
# gluster vol quota VOLNAME remove /
Chapter 10. Managing Geo-replication
10.1. About Geo-replication
- Master – the primary Red Hat Gluster Storage volume.
- Slave – a secondary Red Hat Gluster Storage volume. A slave volume can be a volume on a remote host, such as remote-host::volname.
10.2. Replicated Volumes vs Geo-replication
Replicated Volumes | Geo-replication |
---|---|
Works between all bricks in a replica set, so that changes are synced in both directions. | Works only from the primary (master) volume to the secondary (slave) volume. |
Mirrors data across bricks within one trusted storage pool. | Mirrors data across geographically distributed trusted storage pools. |
Provides high-availability. | Provides data back-up for disaster recovery. |
Synchronous replication: each and every file operation is applied to all the bricks. | Asynchronous replication: checks for changes in files periodically, and syncs them on detecting differences. |
10.3. Preparing to Deploy Geo-replication
10.3.1. Exploring Geo-replication Deployment Scenarios
- Geo-replication over LAN
- Geo-replication over WAN
- Geo-replication over the Internet
- Multi-site cascading geo-replication
10.3.2. Geo-replication Deployment Overview
- Verify that your environment matches the minimum system requirements. See Section 10.3.3, “Prerequisites”.
- Determine the appropriate deployment scenario. See Section 10.3.1, “Exploring Geo-replication Deployment Scenarios”.
- Start geo-replication on the master and slave systems.
- For manual method, see Section 10.4, “Starting Geo-replication”.
- For gdeploy method, see Starting a geo-replication session in Section 10.5.3, “Controlling geo-replication sessions using gdeploy”.
10.3.3. Prerequisites
- The master and slave volumes must use the same version of Red Hat Gluster Storage.
- Nodes in the slave volume must not be part of the master volume. Two separate trusted storage pools are required.
- Disable the performance.quick-read option in the slave volume using the following command:
[slave ~]# gluster volume set slavevol performance.quick-read off
- Time must be synchronized between all master and slave nodes before geo-replication is configured. Red Hat recommends setting up a network time protocol service to keep time synchronized between bricks and servers, and avoid out-of-time synchronization errors.
See Network Time Protocol Setup for more information.
- Add the required port for geo-replication from the ports listed in the Section 3.1.2, “Port Access Requirements”.
- Key-based SSH authentication without a password is required between one node of the master volume (the node from which the geo-replication create command will be executed), and one node of the slave volume (the node whose IP/hostname will be mentioned in the slave name when running the geo-replication create command).
Create the public and private keys using ssh-keygen (without passphrase) on the master node:
# ssh-keygen
Copy the public key to the slave node using the following command:
# ssh-copy-id -i identity_file root@slave_node_IPaddress/Hostname
If you are setting up a non-root geo-replication session, then copy the public key to the respective user location.
Note
- Key-based SSH authentication without a password is only required from the master node to the slave node; the slave node does not need this level of access.
- The ssh-copy-id command does not work if the ssh authorized_keys file is configured in a custom location. You must copy the contents of the .ssh/id_rsa.pub file from the Master and paste it into the authorized_keys file in the custom location on the Slave node.
Gsyncd also requires key-based SSH authentication without a password between every node in the master cluster and every node in the slave cluster. The gluster system:: execute gsec_create command creates secret-pem files on all the nodes in the master, and is used to implement the SSH authentication connection. The push-pem option in the geo-replication create command pushes these keys to all slave nodes.
For more information on the gluster system::execute gsec_create and push-pem commands, see Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session”.
10.3.4. Setting Up your Environment
- Section 10.3.4.1, “Setting Up your Environment for Geo-replication Session” - In this method, the slave mount is owned by the root user.
- Section 10.3.4.2, “Setting Up your Environment for a Secure Geo-replication Slave” - This method is more secure as the slave mount is owned by a normal user.
10.3.4.1. Setting Up your Environment for Geo-replication Session
Creating Geo-replication Sessions
- To create a common pem pub file, run the following command on the master node where the key-based SSH authentication connection is configured:
# gluster system:: execute gsec_create
Alternatively, you can create the pem pub file by running the following command on the master node where the key-based SSH authentication connection is configured. This alternate command generates geo-replication session-specific SSH keys on all the master nodes and collects public keys from all peer nodes. It also provides a detailed view of the command status.
# gluster-georep-sshkey generate
+--------------+-------------+---------------+
| NODE         | NODE STATUS | KEYGEN STATUS |
+--------------+-------------+---------------+
| node1        | UP          | OK            |
| node2        | UP          | OK            |
| node3        | UP          | OK            |
| node4        | UP          | OK            |
| node5        | UP          | OK            |
| localhost    | UP          | OK            |
+--------------+-------------+---------------+
- Create the geo-replication session using the following command. The push-pem option is needed to perform the necessary pem-file setup on the slave nodes.
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL create push-pem [force]
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol create push-pem
Note
- There must be key-based SSH authentication access between the node from which this command is run and the slave host specified in the above command. This command performs the slave verification, which includes checking for a valid slave URL, a valid slave volume, and available space on the slave. If the verification fails, you can use the force option, which ignores the failed verification and creates a geo-replication session.
- The slave volume is in read-only mode by default. However, in case of a failover-failback situation, the original master is made read-only by default, because the session is from the original slave to the original master.
- Enable shared storage for master and slave volumes:
# gluster volume set all cluster.enable-shared-storage enable
For more information on shared storage, see Section 11.12, “Setting up Shared Storage Volume”.
- Configure the meta-volume for geo-replication:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config use_meta_volume true
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-Volume”.
- Start the geo-replication by running the following command on the master node:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start [force]
- Verify the status of the created session by running the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
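The order of the steps above can be easier to follow when laid out as a single sequence. The following is a minimal dry-run sketch that only prints the commands from this section for a given master volume, slave host, and slave volume; it executes nothing. The function name georep_setup_plan is hypothetical and not part of Red Hat Gluster Storage.

```shell
#!/bin/sh
# Dry-run sketch: print the root geo-replication setup commands in order.
# georep_setup_plan is a hypothetical helper name; it runs nothing.
georep_setup_plan() {
    if [ "$#" -ne 3 ]; then
        echo "usage: georep_setup_plan MASTER_VOL SLAVE_HOST SLAVE_VOL" >&2
        return 1
    fi
    # Session prefix shared by the create, config, start, and status steps.
    session="gluster volume geo-replication $1 ${2}::${3}"
    printf '%s\n' \
        "gluster system:: execute gsec_create" \
        "$session create push-pem" \
        "gluster volume set all cluster.enable-shared-storage enable" \
        "$session config use_meta_volume true" \
        "$session start" \
        "$session status"
}

# Example: print the plan for the sample volumes used in this section.
georep_setup_plan Volume1 storage.backup.com slave-vol
```

Review the printed commands and run them one at a time on the master node, checking the result of each step before continuing.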
10.3.4.2. Setting Up your Environment for a Secure Geo-replication Slave
This method relies on mountbroker, an internal service of glusterd that manages the mounts for unprivileged slave accounts. You must perform additional steps to configure glusterd with the appropriate mountbroker access control directives. The following example demonstrates this process:
- On all the slave nodes, create a new group. For example, geogroup.
Note
You must not use multiple groups for the mountbroker setup. You can create multiple user accounts, but the group should be the same for all the non-root users.
- On all the slave nodes, create an unprivileged account. For example, geoaccount. Add geoaccount as a member of the geogroup group.
- On any one of the slave nodes, run the following command to set up the mountbroker root directory and group.
# gluster-mountbroker setup <MOUNT ROOT> <GROUP>
For example:
# gluster-mountbroker setup /var/mountbroker-root geogroup
- On any one of the slave nodes, run the following command to add the volume and user to the mountbroker service.
# gluster-mountbroker add <VOLUME> <USER>
For example:
# gluster-mountbroker add slavevol geoaccount
- Check the status of the setup by running the following command:
# gluster-mountbroker status
NODE        NODE STATUS   MOUNT ROOT                  GROUP          USERS
---------------------------------------------------------------------------------------
localhost   UP            /var/mountbroker-root(OK)   geogroup(OK)   geoaccount(slavevol)
node2       UP            /var/mountbroker-root(OK)   geogroup(OK)   geoaccount(slavevol)
The output displays the mountbroker status for every peer node in the slave cluster.
- Restart the glusterd service on all the slave nodes.
# service glusterd restart
After you set up an auxiliary glusterFS mount for the unprivileged account on all the slave nodes, perform the following steps to set up a non-root geo-replication session:
- Set up key-based SSH authentication from one of the master nodes to the user on one of the slave nodes. For example, to set up key-based SSH authentication to the user geoaccount:
# ssh-keygen
# ssh-copy-id -i identity_file geoaccount@slave_node_IPaddress/Hostname
- Create a common pem pub file by running the following command on the master node where the key-based SSH authentication connection to the user on the slave nodes is configured:
# gluster system:: execute gsec_create
- Create a geo-replication relationship between the master and the slave for the user by running the following command on the master node. For example:
# gluster volume geo-replication MASTERVOL geoaccount@SLAVENODE::slavevol create push-pem
If you have multiple slave volumes or multiple accounts, create a geo-replication session for each particular user and volume. For example:
# gluster volume geo-replication MASTERVOL geoaccount2@SLAVENODE::slavevol2 create push-pem
- Enable shared storage for master and slave volumes:
# gluster volume set all cluster.enable-shared-storage enable
For more information on shared storage, see Section 11.12, “Setting up Shared Storage Volume”.
- On the slave node that was used to create the relationship, run /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh as root with the user name, master volume name, and slave volume name as the arguments. For example:
# /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoaccount MASTERVOL SLAVEVOL_NAME
- Configure the meta-volume for geo-replication:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config use_meta_volume true
For more information on configuring meta-volume, see Section 10.3.5, “Configuring a Meta-Volume”.
- Start the geo-replication with the slave user by running the following command on the master node. For example:
# gluster volume geo-replication MASTERVOL geoaccount@SLAVENODE::slavevol start
- Verify the status of the geo-replication session by running the following command on the master node:
# gluster volume geo-replication MASTERVOL geoaccount@SLAVENODE::slavevol status
After a mountbroker geo-replication session is deleted, you must remove the volumes per mountbroker user.
Important
# gluster-mountbroker remove [--volume volume] [--user user]
For example:
# gluster-mountbroker remove --volume slavevol --user geoaccount
# gluster-mountbroker remove --user geoaccount
# gluster-mountbroker remove --volume slavevol
Important
In a secure geo-replication setup, prefix the unprivileged user account to the slave host in geo-replication commands. For example:
# gluster volume geo-replication MASTERVOL geoaccount@SLAVENODE::slavevol status
In this command, geoaccount is the name of the unprivileged user account.
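The slave-side mountbroker steps differ in where they must run: some on every slave node, some on only one. The following dry-run sketch prints each step with its scope and executes nothing. The function name mountbroker_setup_plan is hypothetical, and the groupadd and useradd commands are an assumption, because this guide does not name the commands used to create the group and account.

```shell
#!/bin/sh
# Dry-run sketch: print the slave-side mountbroker steps with their scope.
# mountbroker_setup_plan is a hypothetical helper name; it runs nothing.
# groupadd/useradd are assumed; the guide does not name these commands.
mountbroker_setup_plan() {
    if [ "$#" -ne 4 ]; then
        echo "usage: mountbroker_setup_plan MOUNT_ROOT GROUP USER VOLUME" >&2
        return 1
    fi
    mount_root=$1
    group=$2
    user=$3
    volume=$4
    # Group and account must exist on every slave node.
    printf 'all slave nodes: groupadd %s\n' "$group"
    printf 'all slave nodes: useradd -G %s %s\n' "$group" "$user"
    # The mountbroker commands are run once, on any one slave node.
    printf 'one slave node: gluster-mountbroker setup %s %s\n' "$mount_root" "$group"
    printf 'one slave node: gluster-mountbroker add %s %s\n' "$volume" "$user"
    printf 'one slave node: gluster-mountbroker status\n'
    # glusterd is restarted everywhere so the new settings take effect.
    printf 'all slave nodes: service glusterd restart\n'
}

# Example: the sample names used in this section.
mountbroker_setup_plan /var/mountbroker-root geogroup geoaccount slavevol
```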
10.3.5. Configuring a Meta-Volume
gluster_shared_storage is the gluster volume used for internal purposes. Setting use_meta_volume to true enables geo-replication to use the shared volume to store lock files, which helps in handling worker failovers. For effective handling of node failovers in the master volume, geo-replication requires this shared storage to be available across all nodes of the cluster. Hence, ensure that a gluster volume named gluster_shared_storage is created in the cluster and is mounted at /var/run/gluster/shared_storage on all the nodes in the cluster. For more information on setting up the shared storage volume, see Section 11.12, “Setting up Shared Storage Volume”.
Note
- Configure the meta-volume for geo-replication:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config use_meta_volume true
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config use_meta_volume true
Important
Ensure that the rsync_full_access on and rsync_client on booleans are set to ON to prevent file permission issues during the rsync operations required by geo-replication.
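Because the meta-volume depends on the shared storage being mounted on every node, it can help to check the mount before setting use_meta_volume. The following is a minimal sketch, assuming a Linux node where the mount table is readable at /proc/mounts. The helper name is_mounted_at is hypothetical. On many systems /var/run is a symbolic link to /run, so the sketch checks both paths; treat that as an assumption to verify on your nodes.

```shell
#!/bin/sh
# is_mounted_at is a hypothetical helper: it succeeds if the given path is
# listed as a mount point (second field) in a /proc/mounts style table.
is_mounted_at() {
    mounts_file=${2:-/proc/mounts}
    # Return 2 when the mount table cannot be read at all.
    [ -r "$mounts_file" ] || return 2
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "$mounts_file"
}

# /var/run is commonly a symlink to /run, so check both spellings.
if is_mounted_at /var/run/gluster/shared_storage ||
   is_mounted_at /run/gluster/shared_storage; then
    echo "shared storage is mounted"
else
    echo "shared storage is not mounted on this node"
fi
```

Run the check on each node of the cluster; a node without the mount cannot take part in the lock handling described above.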
10.4. Starting Geo-replication
10.4.1. Starting a Geo-replication Session
Important
- To start the geo-replication session between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol start
Starting geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
This command will start distributed geo-replication on all the nodes that are part of the master volume. If a node that is part of the master volume is down, the command will still be successful. In a replica pair, the geo-replication session will be active on any of the replica nodes, but remain passive on the others.
After executing the command, it may take a few minutes for the session to initialize and become stable.
Note
If you attempt to create a geo-replication session and the slave already has data, the following error message will be displayed:
slave-node::slave is not empty. Please delete existing files in slave-node::slave and retry, or use force to continue without deleting the existing files.
geo-replication command failed
- To start the geo-replication session forcefully between the hosts:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL start force
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol start force
Starting geo-replication session between Volume1 & storage.backup.com::slave-vol has been successful
This command will force start geo-replication sessions on the nodes that are part of the master volume. If it is unable to successfully start the geo-replication session on any node which is online and part of the master volume, the command will still start the geo-replication sessions on as many nodes as it can. This command can also be used to re-start geo-replication sessions on the nodes where the session has died, or has not started.
10.4.2. Verifying a Successful Geo-replication Deployment
Use the status command to verify the status of geo-replication in your environment:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol status
10.4.3. Displaying Geo-replication Status Information
The status command can be used to display information about a specific geo-replication master session, a master-slave session, or all geo-replication sessions. The status output provides both node-level and brick-level information.
- To display information about all geo-replication sessions, use the following command:
# gluster volume geo-replication status [detail]
- To display information on all geo-replication sessions from a particular master volume, use the following command:
# gluster volume geo-replication MASTER_VOL status [detail]
- To display information about a particular master-slave session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status [detail]
Important
There will be a mismatch between the outputs of the df command (including -h and -k) and the inode usage of the master and slave volumes, even when the data is in full sync. This is due to the extra inode and size consumption by the changelog journaling data, which keeps track of the changes done on the file system on the master volume. Instead of running the df command to verify the status of synchronization, use:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL status detail
- The geo-replication status command output provides the following information:
- Master Node: Master node and hostname as listed in the gluster volume info command output
- Master Vol: Master volume name
- Master Brick: The path of the brick
- Slave User: Slave user name
- Slave: Slave volume name
- Slave Node: IP address or hostname of the slave node to which the master worker is connected.
- Status: The status of the geo-replication worker can be one of the following:
- Initializing: This is the initial phase of the geo-replication session; it remains in this state for a minute in order to make sure no abnormalities are present.
- Created: The geo-replication session is created, but not started.
- Active: The gsync daemon in this node is active and syncing the data.
- Passive: A replica pair of the active node. The data synchronization is handled by the active node. Hence, this node does not sync any data.
- Faulty: The geo-replication session has experienced a problem, and the issue needs to be investigated further. For more information, see Section 10.12, “Troubleshooting Geo-replication”.
- Stopped: The geo-replication session has stopped, but has not been deleted.
- Crawl Status: Crawl status can be one of the following:
- Changelog Crawl: The changelog translator has produced the changelog, and it is being consumed by the gsyncd daemon to sync data.
- Hybrid Crawl: The gsyncd daemon is crawling the glusterFS file system and generating a pseudo changelog to sync data.
- History Crawl: The gsyncd daemon consumes the history changelogs produced by the changelog translator to sync data.
- Last Synced: The last synced time.
- Entry: The number of pending entry (CREATE, MKDIR, RENAME, UNLINK, and so on) operations per session.
- Data: The number of Data operations pending per session.
- Meta: The number of Meta operations pending per session.
- Failures: The number of failures. If the failure count is more than zero, view the log files for errors in the master bricks.
- Checkpoint Time: Displays the date and time of the checkpoint, if set. Otherwise, it displays as N/A.
- Checkpoint Completed: Displays the status of the checkpoint.
- Checkpoint Completion Time: Displays the completion time if Checkpoint is completed. Otherwise, it displays as N/A.
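When scanning status output across many bricks, a short reminder of what each Status value means can be handy. The following sketch maps the status values listed above to a one-line summary. The function name describe_worker_status is hypothetical, and the summaries paraphrase this section; the sketch does not parse real command output.

```shell
#!/bin/sh
# describe_worker_status is a hypothetical helper that maps a geo-replication
# worker Status value to a one-line summary of its meaning.
describe_worker_status() {
    case $1 in
        Initializing) echo "session is initializing; check again shortly" ;;
        Created)      echo "session created but not started" ;;
        Active)       echo "worker is syncing data" ;;
        Passive)      echo "replica of an active worker; not syncing" ;;
        Faulty)       echo "problem detected; investigate" ;;
        Stopped)      echo "session stopped but not deleted" ;;
        # Anything else is not a status value listed in this section.
        *)            echo "unknown status: $1" >&2; return 1 ;;
    esac
}

# Example: summarize a worker reported as Faulty.
describe_worker_status Faulty
```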
10.4.4. Configuring a Geo-replication Session
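To configure a geo-replication session, use the following command:
# gluster volume geo-replication MASTER_VOL SLAVE_HOST::SLAVE_VOL config [Name] [Value]
For example:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config sync_method rsync
To view the list of all option and value pairs, run the command without an option name:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config
To reset an option to its default value, prefix the option name with ! (exclamation mark). For example, to reset log-level to the default value:
# gluster volume geo-replication Volume1 storage.backup.com::slave-vol config '!log-level'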
Warning
Make these configuration changes only when all the peers in the cluster are in the Connected (online) state. If you change the configuration when any of the peers is down, the geo-replication cluster will be in an inconsistent state when the node comes back online.
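The precondition in the warning above can be checked before changing an option. The following is a minimal sketch, assuming that gluster peer status prints one line per peer that begins with State: and contains (Connected) when the peer is online; verify that format on your installation before relying on it. The function name all_peers_connected is hypothetical. It reads peer status text on standard input and fails if any State: line lacks (Connected). Input with no State: lines, such as a pool with no other peers, is treated as success.

```shell
#!/bin/sh
# all_peers_connected is a hypothetical helper. It reads peer status text on
# stdin and fails if any "State:" line does not contain "(Connected)".
# Assumed input format: one "State: ... (Connected)" line per online peer.
all_peers_connected() {
    awk '/^State:/ && $0 !~ /\(Connected\)/ { bad = 1 }
         END { exit bad + 0 }'
}

# Demonstration with sample text. On a real node the input would come from:
#   gluster peer status | all_peers_connected
printf '%s\n' 'Hostname: node2' 'State: Peer in Cluster (Connected)' |
    all_peers_connected && echo "safe to change configuration"
```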
The following table provides an overview of the configurable options for a geo-replication setting:
Option | Description |
---|---|
gluster_log_file LOGFILE | The path to the geo-replication glusterfs log file. |
gluster_log_level LOGFILELEVEL | The log level for glusterfs processes. |
log_file LOGFILE | The path to the geo-replication log file. |
log_level LOGFILELEVEL | The log level for geo-replication. |
changelog_log_level LOGFILELEVEL | The log level for the changelog. The default log level is set to INFO. |
changelog_batch_size SIZEINBYTES | The total size for the changelog in a batch. The default size is set to 727040 bytes. |
ssh_command COMMAND | The SSH command to connect to the remote machine (the default is SSH). |
sync_method NAME | The command to use for setting the synchronizing method for the files. The available options are rsync or tarssh. The default is rsync. The tarssh option allows tar over the Secure Shell protocol. Use the tarssh option to handle workloads of files that have not undergone edits. Note: On RHEL 8.3 or above, before configuring sync_method as tarssh, make sure to install the tar package: # yum install tar |
volume_id=UID | The command to delete the existing master UID for the intermediate/slave node. |
timeout SECONDS | The timeout period in seconds. |
sync_jobs N | The number of sync-jobs represents the maximum number of syncer threads (rsync processes or tar over ssh processes for syncing) inside each worker. The number of workers is always equal to the number of bricks in the master volume. For example, a distributed-replicated volume of (3 x 2) with sync-jobs configured at 3 results in 9 total sync-jobs (aka threads) across all nodes/servers. Active and Passive Workers: The number of active workers is based on the volume configuration. In case of a distribute volume, all bricks (workers) will be active and participate in syncing. In case of a replicate or dispersed volume, one worker from each replicate/disperse group (subvolume) will be active and participate in syncing. This is to avoid duplicate syncing from other bricks. The remaining workers in each replicate/disperse group (subvolume) will be passive. In case the active worker goes down, one of the passive workers from the same replicate/disperse group will become an active worker. |
ignore_deletes | If this option is set to true, a file deleted on the master will not trigger a delete operation on the slave. As a result, the slave will remain as a superset of the master and can be used to recover the master in the event of a crash and/or accidental delete. If this option is set to false, which is the default config option for ignore-deletes, a file deleted on the master will trigger a delete operation on the slave. |
checkpoint [LABEL|now] | Sets a checkpoint with the given option LABEL. If the option is set as now, then the current time will be used as the label. |
sync_acls |