20.2. Brick Configuration
Procedure 20.1. Brick Configuration
LVM layer
The steps for creating a brick from a physical device are listed below. An outline of the steps for creating multiple bricks on a physical device is given in Example - Creating multiple bricks on a physical device below.
- Creating the Physical Volume
  The pvcreate command is used to create the physical volume. The Logical Volume Manager can use a portion of the physical volume for storing its metadata while the rest is used as the data portion. Align the I/O at the Logical Volume Manager (LVM) layer using the --dataalignment option while creating the physical volume.
  The command is used in the following format:
  # pvcreate --dataalignment alignment_value disk
  For JBOD, use an alignment value of 256K.
  In the case of hardware RAID, the alignment_value should be obtained by multiplying the RAID stripe unit size by the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
  For example, the following command is appropriate for 12 disks in a RAID 6 configuration with a stripe unit size of 128 KiB:
  # pvcreate --dataalignment 1280k disk
  The following command is appropriate for 12 disks in a RAID 10 configuration with a stripe unit size of 256 KiB:
  # pvcreate --dataalignment 1536k disk
  To view the previously configured physical volume settings for --dataalignment, run the following command:
  # pvs -o +pe_start disk
    PV         VG   Fmt  Attr PSize PFree 1st PE
    /dev/sdb        lvm2 a--  9.09t 9.09t  1.25m
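  Because the alignment value is simply the stripe unit size multiplied by the number of data disks, it can be derived with shell arithmetic. The following sketch uses the RAID 6 example from this section (128 KiB stripe unit, 10 data disks); the numbers are illustrative only:
  # echo "$((128 * 10))k"    # stripe unit in KiB multiplied by the number of data disks
  1280k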
- Creating the Volume Group
  The volume group is created using the vgcreate command.
  For hardware RAID, in order to ensure that logical volumes created in the volume group are aligned with the underlying RAID geometry, it is important to use the --physicalextentsize option. Execute the vgcreate command in the following format:
  # vgcreate --physicalextentsize extent_size VOLGROUP physical_volume
  The extent_size should be obtained by multiplying the RAID stripe unit size by the number of data disks. If 12 disks are used in a RAID 6 configuration, the number of data disks is 10; on the other hand, if 12 disks are used in a RAID 10 configuration, the number of data disks is 6.
  For example, run the following command for RAID 6 storage with a stripe unit size of 128 KiB and 12 disks (10 data disks):
  # vgcreate --physicalextentsize 1280k VOLGROUP physical_volume
  In the case of JBOD, use the vgcreate command in the following format:
  # vgcreate VOLGROUP physical_volume
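  To check that the volume group was created with the intended extent size, you can display it with vgs. This is a quick verification sketch; VOLGROUP is the placeholder volume group name used throughout this section:
  # vgs -o +vg_extent_size VOLGROUP    # the Ext column shows the physical extent size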
- Creating the Thin Pool
  A thin pool provides a common pool of storage for thin logical volumes (LVs) and their snapshot volumes, if any.
  Execute the following command to create a thin pool of a specific size:
  # lvcreate --thin VOLGROUP/POOLNAME --size POOLSIZE --chunksize CHUNKSIZE --poolmetadatasize METASIZE --zero n
  You can also create a thin pool of the maximum possible size for your device by executing the following command:
  # lvcreate --thin VOLGROUP/POOLNAME --extents 100%FREE --chunksize CHUNKSIZE --poolmetadatasize METASIZE --zero n
  Recommended parameter values for thin pool creation
  - poolmetadatasize
    Internally, a thin pool contains a separate metadata device that is used to track the (dynamically) allocated regions of the thin LVs and snapshots. The poolmetadatasize option in the above command refers to the size of the pool metadata device.
    The maximum possible size for a metadata LV is 16 GiB. Red Hat Gluster Storage recommends creating the metadata device of the maximum supported size. You can allocate less than the maximum if space is a concern, but in this case you should allocate a minimum of 0.5% of the pool size.
    Warning
    If your metadata pool runs out of space, you cannot create data. This includes the data required to increase the size of the metadata pool or to migrate data away from a volume that has run out of metadata space. Monitor your metadata pool using the lvs -o+metadata_percent command and ensure that it does not run out of space (see the monitoring example after this parameter list).
  - chunksize
    An important parameter to be specified while creating a thin pool is the chunk size, which is the unit of allocation. For good performance, the chunk size for the thin pool and the parameters of the underlying hardware RAID storage should be chosen so that they work well together.
    For JBOD, use a thin pool chunk size of 256 KiB.
    For RAID 6 storage, the striping parameters should be chosen so that the full stripe size (stripe_unit size * number of data disks) is between 1 MiB and 2 MiB, preferably at the low end of the range. The thin pool chunk size should be chosen to match the RAID 6 full stripe size. Matching the chunk size to the full stripe size aligns thin pool allocations with RAID 6 stripes, which can lead to better performance. Limiting the chunk size to below 2 MiB helps reduce performance problems due to excessive copy-on-write when snapshots are used.
    For example, for RAID 6 with 12 disks (10 data disks), the stripe unit size should be chosen as 128 KiB. This leads to a full stripe size of 1280 KiB (1.25 MiB). The thin pool should then be created with a chunk size of 1280 KiB.
    For RAID 10 storage, the preferred stripe unit size is 256 KiB. This can also serve as the thin pool chunk size. Note that RAID 10 is recommended when the workload has a large proportion of small-file writes or random writes. In this case, a small thin pool chunk size is more appropriate, as it reduces copy-on-write overhead with snapshots.
    If the addressable storage on the device is smaller than the device itself, you need to adjust the recommended chunk size. Calculate the adjustment factor using the following formula:
    adjustment_factor = device_size_in_tb / (preferred_chunk_size_in_kb * 4 / 64)
    Round the adjustment factor up. Then calculate the new chunk size using the following:
    chunk_size = preferred_chunk_size * rounded_adjustment_factor
    A worked example of this calculation is shown after the thin pool creation examples below.
  - block zeroing
    By default, the newly provisioned chunks in a thin pool are zeroed to prevent data leaking between different block devices. In the case of Red Hat Gluster Storage, where data is accessed via a file system, this option can be turned off for better performance with the --zero n option. Note that n does not need to be replaced.
    The following example shows how to create a 2 TB thin pool:
    # lvcreate --thin VOLGROUP/thin_pool --size 2T --chunksize 1280k --poolmetadatasize 16G --zero n
    The following example creates a thin pool that takes up all remaining space once the metadata pool has been created:
    # lvcreate --thin VOLGROUP/thin_pool --extents 100%FREE --chunksize 1280k --poolmetadatasize 16G --zero n
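  In line with the warning in the poolmetadatasize description above, the metadata fill level of the pool can be checked at any time with lvs; VOLGROUP/thin_pool is the pool from the examples above:
  # lvs -o +metadata_percent VOLGROUP/thin_pool    # Meta% shows how full the pool metadata device is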
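  The following is a worked example of the chunk size adjustment formula from the chunksize description above. The 500 TB device size is purely hypothetical; the preferred chunk size is the 1280 KiB RAID 6 value used in this section:
  # echo $((1280 * 4 / 64))            # denominator: preferred_chunk_size_in_kb * 4 / 64
  80
  # echo $(( (500 + 80 - 1) / 80 ))    # adjustment factor for the 500 TB device, rounded up
  7
  # echo "$((1280 * 7))k"              # adjusted chunk size
  8960k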
- Creating a Thin Logical Volume
  After the thin pool has been created as described above, a thinly provisioned logical volume can be created in the thin pool to serve as storage for a brick of a Red Hat Gluster Storage volume.
  # lvcreate --thin --name LV_name --virtualsize LV_size VOLGROUP/thin_pool
- Example - Creating multiple bricks on a physical device
  The steps above (LVM layer) cover the case where a single brick is being created on a physical device. This example shows how to adapt these steps when multiple bricks need to be created on a physical device.
Note
  In the following steps, we assume the following:
  - Two bricks must be created on the same physical device.
  - One brick must be of size 4 TiB and the other of 2 TiB.
  - The device is /dev/sdb, and is a RAID 6 device with 12 disks.
  - The 12-disk RAID 6 device has been created according to the recommendations in this chapter, that is, with a stripe unit size of 128 KiB.
  - Create a single physical volume using pvcreate:
    # pvcreate --dataalignment 1280k /dev/sdb
  - Create a single volume group on the device:
    # vgcreate --physicalextentsize 1280k vg1 /dev/sdb
  - Create a separate thin pool for each brick using the following commands:
    # lvcreate --thin vg1/thin_pool_1 --size 4T --chunksize 1280K --poolmetadatasize 16G --zero n
    # lvcreate --thin vg1/thin_pool_2 --size 2T --chunksize 1280K --poolmetadatasize 16G --zero n
    In the examples above, the size of each thin pool is chosen to be the same as the size of the brick that will be created in it. With thin provisioning, there are many possible ways of managing space, and these options are not discussed in this chapter.
  - Create a thin logical volume for each brick:
    # lvcreate --thin --name lv1 --virtualsize 4T vg1/thin_pool_1
    # lvcreate --thin --name lv2 --virtualsize 2T vg1/thin_pool_2
  - Follow the XFS Recommendations (next step) in this chapter for creating and mounting file systems for each of the thin logical volumes:
    # mkfs.xfs options /dev/vg1/lv1
    # mkfs.xfs options /dev/vg1/lv2
    # mount options /dev/vg1/lv1 mount_point_1
    # mount options /dev/vg1/lv2 mount_point_2
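  For instance, applying the XFS recommendations described in the next step to this RAID 6 example (stripe unit 128 KiB, 10 data disks), the placeholder options above might expand as in the sketch below. The directory block size of 8192 and the mount points /rhgs/brick1 and /rhgs/brick2 are illustrative choices, not fixed values:
  # mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 /dev/vg1/lv1
  # mkfs.xfs -f -i size=512 -n size=8192 -d su=128k,sw=10 /dev/vg1/lv2
  # mkdir -p /rhgs/brick1 /rhgs/brick2
  # mount -t xfs -o inode64,noatime /dev/vg1/lv1 /rhgs/brick1
  # mount -t xfs -o inode64,noatime /dev/vg1/lv2 /rhgs/brick2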
XFS Recommendations
- XFS Inode Size
  As Red Hat Gluster Storage makes extensive use of extended attributes, an XFS inode size of 512 bytes works better with Red Hat Gluster Storage than the default XFS inode size of 256 bytes. The inode size must therefore be set to 512 bytes when formatting Red Hat Gluster Storage bricks. To set the inode size, use the -i size option with the mkfs.xfs command, as shown in the Logical Block Size for the Directory section below.
- XFS RAID Alignment
  When creating an XFS file system, you can explicitly specify the striping parameters of the underlying storage in the following format:
  # mkfs.xfs other_options -d su=stripe_unit_size,sw=stripe_width_in_number_of_disks device
  For RAID 6, ensure that I/O is aligned at the file system layer by providing the striping parameters. For RAID 6 storage with 12 disks, if the recommendations above have been followed, the values are as follows:
  # mkfs.xfs other_options -d su=128k,sw=10 device
  For RAID 10 and JBOD, the -d su=<>,sw=<> option can be omitted. By default, XFS uses the thin pool chunk size and other parameters to make layout decisions.
- Logical Block Size for the Directory
  An XFS file system allows you to select a logical block size for file system directories that is greater than the logical block size of the file system. Increasing the logical block size for directories from the default of 4 KiB decreases directory I/O, which in turn improves the performance of directory operations. To set the block size, use the -n size option with the mkfs.xfs command, as shown in the example after this list.
- Allocation Strategy
  inode32 and inode64 are the two most common allocation strategies for XFS. With the inode32 allocation strategy, XFS places all inodes in the first 1 TiB of the disk, so on a larger disk all inodes are confined to the first 1 TiB. inode32 is the default allocation strategy.
  With the inode64 mount option, inodes are placed near the data, which minimizes disk seeks.
  To set the allocation strategy to inode64 when the file system is mounted, use the -o inode64 option with the mount command, as shown in the Access Time section below.
- Access Time
  If the application does not require the access time on files to be updated, the file system must always be mounted with the noatime mount option. For example:
  # mount -t xfs -o inode64,noatime <logical volume> <mount point>
  This optimization improves the performance of small-file reads by avoiding updates to the XFS inodes when files are read.
  The corresponding /etc/fstab entry for the inode64 and noatime mount options is:
  <logical volume> <mount point> xfs inode64,noatime 0 0
- Allocation groups
  Each XFS file system is partitioned into regions called allocation groups. Allocation groups are similar to the block groups in ext3, but allocation groups are much larger than block groups and are used for scalability and parallelism rather than disk locality. The default size of an allocation group is 1 TiB.
  The allocation group count must be large enough to sustain the concurrent allocation workload. In most cases, the allocation group count chosen by the mkfs.xfs command gives optimal performance. Do not change the allocation group count chosen by mkfs.xfs while formatting the file system.
- Percentage of space allocation to inodes
  If the workload consists of very small files (average file size less than 10 KB), it is recommended to set the maxpct value to 10 while formatting the file system, as shown in the example below.
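Putting the inode size, directory block size, RAID alignment, maxpct, and allocation group recommendations together, a format and verification pass for a small-file workload might look like the following sketch. The device, the mount point, and the directory block size of 8192 are assumed example values:
# mkfs.xfs -f -i size=512,maxpct=10 -n size=8192 -d su=128k,sw=10 /dev/vg1/lv1
# mount -t xfs -o inode64,noatime /dev/vg1/lv1 /rhgs/brick1
# xfs_info /rhgs/brick1 | grep agcount    # report the allocation group count chosen by mkfs.xfs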
Performance tuning option in Red Hat Gluster Storage
A tuned profile is designed to improve performance for a specific use case by tuning system parameters appropriately. Red Hat Gluster Storage includes tuned profiles tailored for its workloads. These profiles are available in both Red Hat Enterprise Linux 6 and Red Hat Enterprise Linux 7.
Table 20.1. Recommended Profiles for Different Workloads
Workload                               Profile Name
Large-file, sequential I/O workloads   rhgs-sequential-io
Small-file workloads                   rhgs-random-io
Random I/O workloads                   rhgs-random-io
Earlier versions of Red Hat Gluster Storage on Red Hat Enterprise Linux 6 recommended the tuned profiles rhs-high-throughput and rhs-virtualization. These profiles are still available on Red Hat Enterprise Linux 6. However, switching to the new profiles is recommended.
To apply the tunings contained in the tuned profile, run the following command after creating a Red Hat Gluster Storage volume:
# tuned-adm profile profile-name
For example:
# tuned-adm profile rhgs-sequential-io
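To confirm which profile is currently in effect on a node, you can query tuned directly:
# tuned-adm list      # list the available profiles, including the rhgs-* profiles
# tuned-adm active    # show the profile that is currently applied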
Writeback Caching
For small-file and random write performance, we strongly recommend writeback cache, that is, non-volatile random-access memory (NVRAM) in your storage controller. For example, normal Dell and HP storage controllers have it. Ensure that NVRAM is enabled, that is, the battery is working. Refer to your hardware documentation for details on enabling NVRAM.
Do not enable writeback caching in the disk drives. This is a policy where the disk drive considers the write complete before the write has actually reached the magnetic media (platter). As a result, the disk write cache might lose its data during a power failure, or even lose metadata, leading to file system corruption.
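How the drive-level write cache is disabled depends on the drive type and on whether the drives sit behind a hardware RAID controller (in which case the controller's management utility should be used). As one illustrative sketch for a SATA/SAS drive exposed directly to the operating system, with /dev/sdX as a placeholder:
# hdparm -W /dev/sdX      # report whether the drive's volatile write cache is enabled
# hdparm -W 0 /dev/sdX    # disable the drive's volatile write cache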
20.2.1. Many Bricks per Node
Configuring Brick Multiplexing
- Set cluster.brick-multiplex to on. This option affects all volumes.
  # gluster volume set all cluster.brick-multiplex on
- Restart all volumes for brick multiplexing to take effect.
  # gluster volume stop VOLNAME
  # gluster volume start VOLNAME
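Once the volumes are restarted, the setting can be checked cluster-wide; this sketch assumes a Red Hat Gluster Storage version in which global options are visible through the volume get interface:
# gluster volume get all cluster.brick-multiplex    # confirm the cluster-wide setting
# pgrep -c glusterfsd                               # with multiplexing on, bricks share a small number of glusterfsd processes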
Important
20.2.2. Port Range Configuration
The port range used by bricks can be configured in the glusterd.vol file. The base-port and max-port options can be used to set the port range. By default, base-port is set to 49152, and max-port is set to 65535.
Important
If there are not enough free ports available between base-port and max-port, newer bricks and volumes fail to start.
Configuring Port Range
- Edit the glusterd.vol file on all the nodes.
  # vi /etc/glusterfs/glusterd.vol
- Remove the comment marker # corresponding to the base-port and max-port options.
  #   option base-port 49152
  #   option max-port 65535
- Define the port numbers in the base-port and max-port options.
  option base-port 49152
  option max-port 65535
- Save the glusterd.vol file and restart the glusterd service on each Red Hat Gluster Storage node.
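For example, after saving the file on a node, restart the service and confirm the ports that the bricks are listening on; VOLNAME is a placeholder volume name:
# systemctl restart glusterd
# gluster volume status VOLNAME    # the Port column shows the port assigned to each brick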