Chapter 1. Planning a GFS2 file system deployment
The Red Hat Global File System 2 (GFS2) file system is a 64-bit symmetric cluster file system which provides a shared name space and manages coherency between multiple nodes sharing a common block device. A GFS2 file system is intended to provide a feature set which is as close as possible to a local file system, while at the same time enforcing full cluster coherency between nodes. To achieve this, the nodes employ a cluster-wide locking scheme for file system resources. This locking scheme uses communication protocols such as TCP/IP to exchange locking information.
In a few cases, the Linux file system API does not allow the clustered nature of GFS2 to be totally transparent; for example, programs using POSIX locks in GFS2 should avoid using the GETLK function, since in a clustered environment the process ID may be for a different node in the cluster. In most cases, however, the functionality of a GFS2 file system is identical to that of a local file system.
The Red Hat Enterprise Linux (RHEL) Resilient Storage Add-On provides GFS2, and it depends on the RHEL High Availability Add-On to provide the cluster management required by GFS2.
The gfs2.ko kernel module implements the GFS2 file system and is loaded on GFS2 cluster nodes.
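For example, you can confirm that the module is available and loaded on a node with the standard module tools; this is a minimal check and assumes the Resilient Storage Add-On packages are already installed:

```
# Show metadata for the gfs2.ko module shipped with the kernel.
modinfo gfs2

# Confirm that the module is currently loaded on this node.
lsmod | grep gfs2
```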
To get the best performance from GFS2, it is important to take into account the performance considerations which stem from the underlying design. Just like a local file system, GFS2 relies on the page cache in order to improve performance by local caching of frequently used data. In order to maintain coherency across the nodes in the cluster, cache control is provided by the glock state machine.
Make sure that your deployment of the Red Hat High Availability Add-On meets your needs and can be supported. Consult with an authorized Red Hat representative to verify your configuration prior to deployment.
1.1. GFS2 file system format version 1802
As of Red Hat Enterprise Linux 9, GFS2 file systems are created with format version 1802.
Format version 1802 enables the following features:
- Extended attributes in the trusted namespace ("trusted.* xattrs") are recognized by gfs2 and gfs2-utils.
- The rgrplvb option is active by default. This allows gfs2 to attach updated resource group data to DLM lock requests, so the node acquiring the lock does not need to update the resource group information from disk. This improves performance in some cases.
File systems created with the new format version cannot be mounted under earlier RHEL versions, and older versions of the fsck.gfs2 utility cannot check them.
Users can create a file system with the older format version by running the mkfs.gfs2 command with the option -o format=1801.
Users can upgrade the format version of an older file system by running tunegfs2 -r 1802 device on an unmounted file system. Downgrading the format version is not supported.
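The following is a minimal sketch of both operations; the cluster name, file system name, journal count, and device path are placeholders:

```
# Create a file system with the older on-disk format version 1801
# (placeholder lock table name, journal count, and device path).
mkfs.gfs2 -t mycluster:mydata1 -j 2 -o format=1801 /dev/vg_gfs2/lv_mydata1

# Upgrade the format version of the unmounted file system to 1802.
tunegfs2 -r 1802 /dev/vg_gfs2/lv_mydata1
```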
1.2. Key GFS2 parameters to determine
There are a number of key GFS2 parameters you should plan for before you install and configure a GFS2 file system.
- GFS2 nodes: Determine which nodes in the cluster will mount the GFS2 file systems.
- Number of file systems: Determine how many GFS2 file systems to create initially. More file systems can be added later.
- File system name: Each GFS2 file system should have a unique name. This name is usually the same as the LVM logical volume name and is used as the DLM lock table name when a GFS2 file system is mounted. For example, this guide uses the file system names mydata1 and mydata2 in some example procedures.
- Journals: Determine the number of journals for your GFS2 file systems. GFS2 requires one journal for each node in the cluster that needs to mount the file system. For example, if you have a 16-node cluster but need to mount the file system from only two nodes, you need only two journals. GFS2 allows you to add journals dynamically at a later point with the gfs2_jadd utility as additional servers mount a file system, as shown in the example following this list.
- Storage devices and partitions: Determine the storage devices and partitions to be used for creating logical volumes (using lvmlockd) for the file systems.
- Time protocol: Make sure that the clocks on the GFS2 nodes are synchronized. It is recommended that you use the Precision Time Protocol (PTP) or, if necessary for your configuration, the Network Time Protocol (NTP) software provided with your Red Hat Enterprise Linux distribution. The system clocks on GFS2 nodes must be within a few minutes of each other to prevent unnecessary inode time stamp updates, which severely impact cluster performance.
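The following is a minimal sketch of the commands these decisions lead to; the cluster name (mycluster), file system name (mydata1), journal count, device path, and mount point are all placeholders for your own configuration:

```
# Create a GFS2 file system named mydata1 with two journals for a cluster
# named mycluster (all names and paths here are placeholders).
mkfs.gfs2 -t mycluster:mydata1 -j 2 /dev/vg_gfs2/lv_mydata1

# Add one more journal later, while the file system is mounted, if an
# additional node needs to mount it.
gfs2_jadd -j 1 /mnt/mydata1

# Confirm that the node's clock is synchronized (when chronyd provides NTP).
chronyc tracking
```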
You may see performance problems with GFS2 when many create and delete operations are issued from more than one node in the same directory at the same time. If this causes performance problems in your system, you should localize file creation and deletions by a node to directories specific to that node as much as possible.
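If you adopt this approach, the layout can be as simple as one working directory per node; a purely illustrative sketch with hypothetical mount point and directory names:

```
# Give each node its own subdirectory so that create and delete activity
# from different nodes does not contend for the same directory locks
# (mount point and directory names are hypothetical).
mkdir -p /mnt/mydata1/node1 /mnt/mydata1/node2
```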
1.3. GFS2 support considerations
To be eligible for support from Red Hat for a cluster running a GFS2 file system, you must take into account the support policies for GFS2 file systems.
For full information about Red Hat’s support policies, requirements, and limitations for RHEL High Availability clusters, see Support Policies for RHEL High Availability Clusters.
1.3.1. Maximum file system and cluster size
The following table summarizes the current maximum file system size and number of nodes that GFS2 supports.
| Parameter | Maximum |
| --- | --- |
| Number of nodes | 16 (x86, Power8 on PowerVM); 4 (s390x under z/VM) |
| File system size | 100TB on all supported architectures |
GFS2 is based on a 64-bit architecture, which can theoretically accommodate an 8 EB file system. If your system requires larger GFS2 file systems than are currently supported, contact your Red Hat service representative.
When determining the size of your file system, you should consider your recovery needs. Running the fsck.gfs2 command on a very large file system can take a long time and consume a large amount of memory. Additionally, in the event of a disk or disk subsystem failure, recovery time is limited by the speed of your backup media. For information about the amount of memory the fsck.gfs2 command requires, see Determining required memory for running fsck.gfs2.
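As a rough sketch with a placeholder device path, the file system must be unmounted on all nodes before it is checked:

```
# Inspect an unmounted GFS2 file system without modifying it; -n answers
# "no" to all repair questions (device path is a placeholder).
fsck.gfs2 -n /dev/vg_gfs2/lv_mydata1
```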
1.3.2. Minimum cluster size
Although a GFS2 file system can be implemented in a standalone system or as part of a cluster configuration, Red Hat does not support the use of GFS2 as a single-node file system, with the following exceptions:
- Red Hat supports single-node GFS2 file systems for mounting snapshots of cluster file systems as might be needed, for example, for backup purposes.
- A single-node cluster mounting GFS2 file systems (which uses DLM) is supported for the purposes of a secondary-site Disaster Recovery (DR) node. This exception is for DR purposes only and not for transferring the main cluster workload to the secondary site. For example, copying off the data from the file system mounted on the secondary site while the primary site is offline is supported. However, migrating a workload from the primary site directly to a single-node cluster secondary site is unsupported. If the full workload needs to be migrated to the single-node secondary site, the secondary site must be the same size as the primary site.
Red Hat recommends that when you mount a GFS2 file system in a single-node cluster you specify the errors=panic mount option so that the single-node cluster panics when a GFS2 withdraw occurs, since the single-node cluster cannot fence itself when it encounters file system errors.
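For illustration, with placeholder device and mount point paths, the option can be passed directly to the mount command; in a Pacemaker-managed cluster it would typically be set in the file system resource's mount options instead:

```
# Mount a GFS2 file system in a single-node cluster with errors=panic
# (device path and mount point are placeholders).
mount -t gfs2 -o errors=panic /dev/vg_gfs2/lv_mydata1 /mnt/mydata1
```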
Red Hat supports a number of high-performance single-node file systems that are optimized for single node and thus have generally lower overhead than a cluster file system. Red Hat recommends using these file systems in preference to GFS2 in cases where only a single node needs to mount the file system. For information about the file systems that Red Hat Enterprise Linux 9 supports, see Managing file systems.
1.4. GFS2 formatting considerations
To format your GFS2 file system to optimize performance, you should take these recommendations into account.
Make sure that your deployment of the Red Hat High Availability Add-On meets your needs and can be supported. Consult with an authorized Red Hat representative to verify your configuration prior to deployment.
File System Size: Smaller Is Better
GFS2 is based on a 64-bit architecture, which can theoretically accommodate an 8 EB file system. However, the current supported maximum size of a GFS2 file system for 64-bit hardware is 100TB.
Note that even though GFS2 large file systems are possible, that does not mean they are recommended. The rule of thumb with GFS2 is that smaller is better: it is better to have 10 1TB file systems than one 10TB file system.
There are several reasons why you should keep your GFS2 file systems small:
- Less time is required to back up each file system.
- Less time is required if you need to check the file system with the fsck.gfs2 command.
- Less memory is required if you need to check the file system with the fsck.gfs2 command.
In addition, fewer resource groups to maintain mean better performance.
Of course, if you make your GFS2 file system too small, you might run out of space, and that has its own consequences. You should consider your own use cases before deciding on a size.
Block Size: Default (4K) Blocks Are Preferred
The mkfs.gfs2 command attempts to estimate an optimal block size based on device topology. In general, 4K blocks are the preferred block size because 4K is the default page size (memory) for Red Hat Enterprise Linux. Unlike some other file systems, GFS2 does most of its operations using 4K kernel buffers. If your block size is 4K, the kernel has to do less work to manipulate the buffers.
It is recommended that you use the default block size, which should yield the highest performance. You may need to use a different block size only if you require efficient storage of many very small files.
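If you do need a non-default block size for many very small files, it is set when the file system is created; a sketch with placeholder names and device path:

```
# Create a GFS2 file system with 1K blocks instead of the default 4K
# (only worth considering for workloads dominated by very small files;
# lock table name, journal count, and device path are placeholders).
mkfs.gfs2 -b 1024 -t mycluster:mydata1 -j 2 /dev/vg_gfs2/lv_mydata1
```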
Journal Size: Default (128MB) Is Usually Optimal
When you run the mkfs.gfs2 command to create a GFS2 file system, you may specify the size of the journals. If you do not specify a size, it will default to 128MB, which should be optimal for most applications.
Some system administrators might think that 128MB is excessive and be tempted to reduce the size of the journal to the minimum of 8MB or a more conservative 32MB. While that might work, it can severely impact performance. As with many journaling file systems, every time GFS2 writes metadata, the metadata is committed to the journal before it is put into place. This ensures that if the system crashes or loses power, you will recover all of the metadata when the journal is automatically replayed at mount time. However, it does not take much file system activity to fill an 8MB journal, and when the journal is full, performance slows because GFS2 has to wait for writes to the storage.
It is generally recommended to use the default journal size of 128MB. If your file system is very small (for example, 5GB), having a 128MB journal might be impractical. If you have a larger file system and can afford the space, using 256MB journals might improve performance.
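For example, a larger journal is requested at creation time with the -J option; a sketch with placeholder names and device path:

```
# Create a GFS2 file system with 256MB journals instead of the default
# 128MB (lock table name, journal count, and device path are placeholders).
mkfs.gfs2 -J 256 -t mycluster:mydata1 -j 2 /dev/vg_gfs2/lv_mydata1
```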
Size and Number of Resource Groups
When a GFS2 file system is created with the mkfs.gfs2 command, it divides the storage into uniform slices known as resource groups. It attempts to estimate an optimal resource group size (ranging from 32MB to 2GB). You can override the default with the -r option of the mkfs.gfs2 command.
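For example, the following sketch creates a file system with 2GB resource groups; the -r value is in megabytes, and the names and device path are placeholders:

```
# Create a GFS2 file system with 2GB resource groups instead of the
# size estimated by mkfs.gfs2 (names and device path are placeholders).
mkfs.gfs2 -r 2048 -t mycluster:mydata1 -j 2 /dev/vg_gfs2/lv_mydata1
```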
Your optimal resource group size depends on how you will use the file system. Consider how full it will be and whether or not it will be severely fragmented.
You should experiment with different resource group sizes to see which results in optimal performance. It is a best practice to experiment with a test cluster before deploying GFS2 into full production.
If your file system has too many resource groups, each of which is too small, block allocations can waste too much time searching tens of thousands of resource groups for a free block. The fuller your file system, the more resource groups will be searched, and every one of them requires a cluster-wide lock. This leads to slow performance.
If, however, your file system has too few resource groups, each of which is too big, block allocations might contend more often for the same resource group lock, which also impacts performance. For example, if you have a 10GB file system that is carved up into five resource groups of 2GB, the nodes in your cluster will fight over those five resource groups more often than if the same file system were carved into 320 resource groups of 32MB. The problem is exacerbated if your file system is nearly full because every block allocation might have to look through several resource groups before it finds one with a free block. GFS2 tries to mitigate this problem in two ways:
- First, when a resource group is completely full, it remembers that and tries to avoid checking it for future allocations until a block is freed from it. If you never delete files, contention will be less severe. However, if your application is constantly deleting blocks and allocating new blocks on a file system that is mostly full, contention will be very high and this will severely impact performance.
- Second, when new blocks are added to an existing file (for example, by appending) GFS2 will attempt to group the new blocks together in the same resource group as the file. This is done to increase performance: on a spinning disk, seek operations take less time when they are physically close together.
The worst case scenario is when there is a central directory in which all the nodes create files because all of the nodes will constantly fight to lock the same resource group.
1.5. Considerations for GFS2 in a cluster
When determining the number of nodes that your system will contain, note that there is a trade-off between high availability and performance. With a larger number of nodes, it becomes increasingly difficult to make workloads scale. For that reason, Red Hat does not support using GFS2 for cluster file system deployments greater than 16 nodes.
Deploying a cluster file system is not a "drop-in" replacement for a single node deployment. Red Hat recommends that you allow a period of around 8-12 weeks of testing on new installations to test the system and ensure that it is working at the required performance level. During this period, any performance or functional issues can be worked out, and any queries should be directed to the Red Hat support team.
Red Hat recommends that customers considering deploying clusters have their configurations reviewed by Red Hat support before deployment to avoid any possible support issues later on.
1.6. Hardware considerations
Take the following hardware considerations into account when deploying a GFS2 file system.
Use higher quality storage options
GFS2 can operate on cheaper shared storage options, such as iSCSI or Fibre Channel over Ethernet (FCoE), but you will get better performance if you buy higher quality storage with larger caching capacity. Red Hat performs most quality, sanity, and performance tests on SAN storage with Fibre Channel interconnect. As a general rule, it is always better to deploy something that has been tested first.
Test network equipment before deploying
Higher quality, faster network equipment makes cluster communications and GFS2 run faster with better reliability. However, you do not have to purchase the most expensive hardware. Some of the most expensive network switches have problems passing multicast packets, which are used for passing fcntl locks (flocks), whereas cheaper commodity network switches are sometimes faster and more reliable. Red Hat recommends trying equipment before deploying it into full production.